Archive for August, 2007

You’ve Heard of 2nd Life? Now Here’s Something to Avoid: 2nd Search

August 30th, 2007 by Emre Sokullu, Search Evangelist
Second Life has been the cool new thing to do. But in Web search, more often than not, your search is not finished when you type your query and click the submit button. The real challenge starts after you receive the initial search results: now you need to comb through each result to find the information you really wanted. This is the challenge of the 2nd search. 2nd search is a tedious process that requires YOU, not the search engine, to find the information you need in the raw results you got. Given that most high-quality resources are organized in a typical encyclopedia format and consist of never-ending paragraphs, 2nd search (searching inside the found URL) is, as we have all experienced, often harder and more time-consuming than 1st search (Web search).

At hakia.com, we’ve taken a closer look at this problem and developed a product that can significantly shorten your search time. It’s called hakia ScoopBar, and it is powered by hakia’s semantic technology. hakia ScoopBar is a browser toolbar that automatically finds the answer to your question on the Web page you’re visiting, scrolls to the relevant position, and visually highlights the relevant sentence(s). It saves you lots of time by sparing you the hassle of skimming the full document to find the right information. It’s particularly useful on long-tail Web sites and contextually rich sites like Wikipedia and the IRS Web site.
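
This post does not describe how hakia’s semantic technology picks the sentence to highlight. To make the 2nd-search task itself concrete, here is a deliberately naive keyword-overlap baseline, not hakia’s method: it scores each sentence of a page by how many significant words it shares with the question and returns the best one. The noise-word list, the function names, and the sample page text are all hypothetical.

```python
import re

# Hypothetical noise-word list, for illustration only.
NOISE_WORDS = {"a", "an", "the", "of", "in", "on", "at", "to", "was", "is",
               "how", "much", "many", "what", "when", "where", "who", "during"}


def significant_words(text):
    """Lowercase the text, split it into words, and drop noise words."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in NOISE_WORDS}


def best_sentence(question, page_text):
    """Return the sentence of page_text sharing the most significant words with question."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", page_text.strip())
    query_words = significant_words(question)
    # Ties go to the earliest sentence, because max() keeps the first maximum.
    return max(sentences, key=lambda s: len(query_words & significant_words(s)))


question = "how much gold was produced in the period of California Gold Rush"
# Made-up sample page text.
page = ("The California Gold Rush began in 1848. "
        "Prospectors arrived from many countries. "
        "This sample sentence describes the gold produced in the California Gold Rush period.")
print(best_sentence(question, page))  # prints the third sentence
```

A pure word-overlap score like this misses paraphrases and word senses, which is exactly the gap a semantic approach aims to close.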

Now let me explain the concept with a few screenshots. In the example below, I ask hakia.com “how much gold was produced in the period of California Gold Rush” and follow the first link.


[Screenshot: 2nd search example]

Voila, hakia directs me to the Web page that contains the desired information and positions the scroller so that I see the relevant sentence instantly, with no extra effort. With this feature, you no longer need to do a 2nd search.

hakia ScoopBar offers other great features, too. It allows you to copy and save the pertinent information you may want to reference later on, like bookmarking but more specific and targeted. This saves tons of time when collecting research and references, whether you are searching the Web for schoolwork, for work, or just for personal interest.

For more information about the hakia ScoopBar, check out this exclusive page and watch our demo. Or download it and see for yourself. It’s available for Internet Explorer, and a Firefox version is due by October. I have already tested the Firefox version, and it works flawlessly.

And once you download and use the hakia ScoopBar, you’ll save so much time that you’ll have free time for 2nd Life.

SES San Jose – Thanks for Stopping By

August 29th, 2007 by hakia Team

We’re just getting settled after our trip to SES in San Jose, and we want to thank all the people who stopped by the hakia lounge. Special thanks go to all of you who blogged about your hakia encounters and our search music CDs!

It’s great having a chance to talk search with so many people who are so enthusiastic about the topic. We were thrilled by your feedback on our new ScoopBar, which cuts your search time in half. Sure, we got a few “Who needs another toolbar?” looks when we first started talking, but that reaction quickly turned very positive when we showed how the ScoopBar takes you from a result directly to the highlighted text on the destination page, thanks to our semantic capabilities, AND how the “Scoop’n Save” function saves results to a file for future use or sharing.

If you stopped by the booth, thank you again, and please drop us a “hello” comment below.

Visit Us at the hakia Corner in SES San Jose

August 17th, 2007 by hakia Team

If you missed us at SES New York, find us in Booth #133 next week! If we met at the last show, come by to say hello.

We will share with you the latest scoop at hakia.com and answer your questions about semantic search in one-on-one demos. Ask for the password to the private viewing area in hakia Labs for a deeper look into our capabilities. Get your free Search Music CD when you stop by our booth.

And last but not least, be prepared for a few small surprises. See you all in San Jose.

Proper Testing of a Semantic Search Engine: Part-1 The Query Set

August 13th, 2007 by Dr. Riza C Berkan, CEO

In response to the current interest in Semantic Search Engines (SSEs), new debates are emerging in the market as to how good SSEs are, or how good they can get. That brings us to the subject of testing. This is a long subject, so I have divided these considerations into several blog entries. This one focuses on preparing an appropriate query set for testing.

I am seeing a number of attempts to evaluate SSEs, and I am alarmed by how quickly testing can be trivialized without proper guidance or the required background knowledge. So let me start right away with this:

You CANNOT test a Semantic Search Engine using a DOZEN QUERIES!

Using a handful of queries means that the tester and/or evaluator is not aware of the combinatorial space of all applicable considerations. In such cases, there is always an underlying favoritism, backed by convincing arguments built around the selected examples.

Here is the “absolute-minimum” short list of considerations to test SSEs:

QUERY TYPE:

A proper test set must include all the variations of a query/question listed in the table below. The “sampling” entry indicates the minimum number of cases to be tested.

Table-1: Query Types Sampling
Query types: keyword, phrase, sentence, what, where, when, how, why, which, who, is/was/does
Sampling: 11+

These variations test whether the search engine is sensitive to different aspects of the requested information. The minimum sampling is 11+, where the + indicates additional variations such as how much, how many, and whose. A scientific analysis can include more than 100 types of questioning patterns in English.

QUERY LENGTH:

Each of the 11+ query types must be tested at different query lengths. The length of a query can be counted as the number of significant words after noise elimination.

Table-2: Query Length Sampling
Query length (significant words): 1, 2, 3, 4, 5, 6, more than 6
Sampling: 7

This is a very important spectrum. Queries (of any type in Table-1) with 1, 2, or 3 significant words are considered “general” concept questions, whereas queries with more than 3 significant words are specific questions entering the “long-tail” section. Thus, the testing sample has already increased to 11 x 7 = 77. However, if you must shorten it, you should at least sample fat-tail (1 to 3 words) versus long-tail (more than 3 words) queries. This would bring the number of cases to 11 x 2 = 22.
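
The length rule above (count significant words after noise elimination, then bucket) can be sketched as follows, assuming, as in the text, that 1 to 3 significant words count as fat-tail. The noise-word list and function names are hypothetical; the post does not specify how hakia eliminates noise words.

```python
import re

# Hypothetical noise-word list; the real noise elimination is not described in the post.
NOISE_WORDS = {"a", "an", "the", "of", "in", "on", "at", "to", "for", "is", "was",
               "are", "were", "do", "does", "did", "what", "where", "when", "how",
               "why", "which", "who", "much", "many"}


def significant_word_count(query):
    """Number of words left after dropping noise words."""
    words = re.findall(r"[a-z0-9']+", query.lower())
    return sum(1 for w in words if w not in NOISE_WORDS)


def length_bucket(query):
    """Map a query to one of the 7 Table-2 buckets: '1' to '6', or 'more than 6'."""
    n = significant_word_count(query)
    if n == 0:
        raise ValueError("query has no significant words")
    return str(n) if n <= 6 else "more than 6"


def tail(query, fat_tail_max=3):
    """'fat-tail' for up to fat_tail_max significant words, 'long-tail' above that."""
    return "fat-tail" if significant_word_count(query) <= fat_tail_max else "long-tail"


for q in ["hakia",
          "when did the Roman Empire fall?",
          "how much gold was produced in the period of California Gold Rush"]:
    print(length_bucket(q), tail(q), "|", q)
```

Keeping `fat_tail_max` as a parameter makes it easy to move the fat-tail/long-tail boundary without touching the counting logic.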

CONTENT TYPE:

To do a quick job, you can compile 22 queries as outlined above, but in what subject? You have to cover a variety of subjects, because semantic search capability can be more effective in one subject than in another.

Table-3: Content Type Sampling
Content types: medicine, law, politics, entertainment, sports, shopping, tourism, computers, science, education
Sampling: 10

In the most general case, there can be 10 different content areas for SSE testing. This brings the minimum number of sample questions to 11 x 2 x 10 = 220.
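
The counts so far (77, 22, and 220) are just sizes of Cartesian products of the three tables. A minimal sketch that enumerates every cell of the shortened test grid, where each cell needs at least one query; the list and function names are illustrative, and the "fat-tail"/"long-tail" labels stand for the shortened version of Table-2.

```python
from itertools import product

QUERY_TYPES = ["keyword", "phrase", "sentence", "what", "where", "when",
               "how", "why", "which", "who", "is/was/does"]            # Table-1
LENGTHS_FULL = ["1", "2", "3", "4", "5", "6", "more than 6"]           # Table-2
LENGTHS_SHORT = ["fat-tail", "long-tail"]                              # shortened Table-2
CONTENT_TYPES = ["medicine", "law", "politics", "entertainment", "sports",
                 "shopping", "tourism", "computers", "science", "education"]  # Table-3


def build_grid(lengths):
    """Every (query type, length, content area) cell that needs at least one query."""
    return list(product(QUERY_TYPES, lengths, CONTENT_TYPES))


print(len(QUERY_TYPES) * len(LENGTHS_FULL))    # 77
print(len(QUERY_TYPES) * len(LENGTHS_SHORT))   # 22
print(len(build_grid(LENGTHS_SHORT)))          # 220
```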

SENSE DISAMBIGUATION:

The 220 test queries suggested above do not include sense disambiguation tests, which require another set of queries. Of the many ways to compile such a test set, the shortest is to examine the 220 queries and write equivalent articulations of them. For example, “when did the Roman Empire fall?” can be articulated as “when did the Roman empire collapse?” The sense of “fall” in the first query is the same as “collapse” in the second query. If both queries return overlapping results, it means that the SSE is able to detect the right sense of the word.
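
One way to turn “overlapping results” into a number is the Jaccard overlap of the top-k result URLs for the two articulations. This is a sketch under assumptions: Jaccard overlap, the cutoff `k=10`, the `threshold=0.5`, and the example URLs are illustrative choices, not the measure hakia uses.

```python
from typing import Sequence


def result_overlap(results_a: Sequence[str], results_b: Sequence[str], k: int = 10) -> float:
    """Jaccard overlap of the top-k result URLs returned for two queries."""
    top_a, top_b = set(results_a[:k]), set(results_b[:k])
    if not top_a and not top_b:
        return 0.0
    return len(top_a & top_b) / len(top_a | top_b)


def same_sense(results_a: Sequence[str], results_b: Sequence[str],
               threshold: float = 0.5, k: int = 10) -> bool:
    """Treat two articulations as understood alike if their overlap meets the threshold."""
    return result_overlap(results_a, results_b, k) >= threshold


# Hypothetical result lists for "...Roman Empire fall?" and "...Roman empire collapse?"
fall = ["example.com/a", "example.com/b", "example.com/c", "example.com/d"]
collapse = ["example.com/b", "example.com/a", "example.com/c", "example.com/e"]
print(result_overlap(fall, collapse), same_sense(fall, collapse))
```

Here 3 shared URLs out of 5 distinct ones give an overlap of 0.6. Jaccard ignores ranking; a rank-aware measure would be a reasonable alternative.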

For the sake of argument, if we assume that none of the fat-tail queries (1 to 3 significant words) includes an event (verb), then we could at least double the rest of the queries using equivalent articulations. This would require 110 additional queries, bringing the total to 330. Note that this is a very limited test of the depth of the semantic capabilities.
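
Extending the 220-cell grid with one equivalent articulation per long-tail cell reproduces the 330 total. A self-contained sketch; the tuple layout and the "original"/"paraphrase" labels are illustrative.

```python
from itertools import product

QUERY_TYPES = ["keyword", "phrase", "sentence", "what", "where", "when",
               "how", "why", "which", "who", "is/was/does"]
TAILS = ["fat-tail", "long-tail"]
CONTENT_TYPES = ["medicine", "law", "politics", "entertainment", "sports",
                 "shopping", "tourism", "computers", "science", "education"]


def build_plan():
    """Base grid of 220 cells plus one equivalent articulation per long-tail cell."""
    base = [(qt, t, ct, "original")
            for qt, t, ct in product(QUERY_TYPES, TAILS, CONTENT_TYPES)]
    # Assumption taken from the text: fat-tail queries contain no event (verb),
    # so only long-tail cells get a paraphrased ("equivalent articulation") twin.
    paraphrases = [(qt, t, ct, "paraphrase")
                   for qt, t, ct, _ in base if t == "long-tail"]
    return base + paraphrases


plan = build_plan()
print(len(plan))                                          # 330
print(sum(1 for cell in plan if cell[3] == "paraphrase"))  # 110
```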

The conclusion of this first part is this: testing a semantic search engine requires at least 330 queries just to scratch the surface. hakia’s internal fitness tests, for example, use a couple of thousand queries. Therefore, if you see any report or article that evaluates a search engine using a dozen queries, even if it includes valuable insight, it will tell you nothing about the overall state of that search engine.

Preparing test cases is only half of the equation. How to evaluate the results is a whole different story. I will soon post Part-2 to discuss this matter.

Privacy and Alternatives

August 9th, 2007 by Emre Sokullu, Search Evangelist

Charles Knight, on his popular AltSearchEngines blog, keeps an interesting list of search engines that can complement or substitute for the de facto standard, Google. Sometimes he also organizes “A Day without Google” events and calls on people to fast from Google for at least a day and give the others a try. He is not alone, of course; Danny Sullivan of SearchEngineLand is also known for similar efforts. I applaud both of them; the market seems to lack excitement and competition, much like the Windows, Mac, and Linux case.

The reason I mention this is that a new “search engine” hit the homepages of all the big reader-powered, democratized sites like Digg and del.icio.us earlier this week. It’s called Googlonymous (sorry, no link for ethical reasons). The idea is to make your Google queries anonymous to protect you against the hypothetical “Big Eye”. The application is fully illegal, of course, and that’s why I don’t link to it: neither the name nor the use of Google’s backend is legal. So don’t expect it to have a long run.

However, the fact that this site can make such a big splash in one night is a good indicator of people’s concerns. Recently, Ask.com and Live Search have also announced plans to renew their privacy policies. If you are a reader of the hakia blog, you already know that we are sensitive about this as well. So, no worries: if you are concerned about privacy, there are better, fully legal alternatives for you. Just open your eyes and look around.

Information Pollution: Can Semantic Search Save the Day?

August 7th, 2007 by Dr. Riza C Berkan, CEO

Note: This article first appeared in Venturebeat on August 6, 2007.

Folks, we are approaching mega information clutter in the near future. There will be trillions of Web pages. People will have petabytes (quadrillions of bytes) of information on their local computers, and it will look like the biggest mess ever piled up in the history of human civilization. A big portion of this information is junk: irrelevant, accidental, and of bad quality.

It is like we are building cities without any sewer, gutter, garbage collection, or sanitation systems.

Junk email filters and antivirus programs, meanwhile, actually encourage more information pollution by giving us a false sense of orderliness. It is analogous to recycling aluminum cans: recycling makes consumers feel good about purchasing more of them, and we’re seeing a boom in “clean technology.” There is no “information cleaning” technology, however. There never has been, and there never will be.

Our last hope is the “finding” technologies, namely, the search engines. The mess will still be there, but with better search engines we might be able to steer around the garbage. Therefore, the critical question is: how much better do the search engines need to get to save the day?

To see how this question can be answered in general principles, read the full article.

A Message to Firefox and IE 7 Users of hakia.com

August 3rd, 2007 by Kartal Guner, Chief Architect

You can add the hakia search plug-in for shortcut access to the search engine. All you have to do is browse to the hakia website and follow the instructions below.

Firefox:

[Screenshot: Firefox search box]

Click the button to the left of the search box. If you are currently using Google, the button shows the Google “G”, as in the picture. After clicking it, you should see an “Add hakia Search” item toward the bottom of the menu. Click it, and you are done!

IE 7:

[Screenshot: IE 7 search box]

Click the dropdown button to the right of the search box. Expand the “Add Search Providers” item and click “hakia Search”. A verification dialog box then appears, with the option to make hakia your default search. Accept, and you are done!

We have had this feature in place for a long time, but we still get emails from users asking for the plug-in.

After the announcement of the hakia ScoopBar, we have also received a flood of emails inquiring about the Firefox version. We will launch it by October. Please check this space for updates!