Frequently Asked Questions About hakia

January 9th, 2008 by Dr. Riza C Berkan, CEO

Now that we have entered 2008, I have picked 11 questions and answers about hakia that came up frequently during 2007 and that are important for clarifying our position.

Q1: What is the vision of hakia?
We started hakia with the vision that the future of Web search will be much different from what you see today. When I think of a search engine, I see the computer talking back to Mr. Spock in the Star Trek episodes. That “computer” satisfies many of the requirements expected of a search engine, like holding vast volumes of information and being able to reply to a query. What is missing today is the fundamental component of “understanding”, which is the first step toward human-like interaction and cunning precision. With the current state of semantic technologies, this may no longer be a fantasy. We are taking the ambitious route of semantic search even though it could be a thousand-mile journey, knowing that even the first mile will make a huge difference.

Q2: Does the world need a semantic search engine today?
Yes, it does. Many independent studies report a “search fatigue” syndrome among users of today’s search engines like Google and Yahoo. The limitations of these engines are very clear, although they manage very well to draw attention away from them. In short, there is no other technology sector where the leading products are riding on decades-old principles. Think about cars, cell phones, camcorders,…

Q3: What are the limitations of the current search engines?
The underlying principles of the current search engines do not include “understanding”; instead they bank on “approximation” by statistics (i.e., referral statistics or popularity). Therefore, when referral statistics are not available, they fail. This happens when (1) the query is a long-tail query, or (2) the pages are dynamic, so that there is no time to collect statistics. Cases 1 and 2 constitute a huge and growing portion of the available information on the Web. Therefore, using the current search engines, you are half blind (if not more) to the available information on the Web. According to these studies, “search fatigue” accounts for more than 50% of all searches.

Q4: Is “search fatigue” a widely recognized syndrome by the user community?
It may not be as sensational as the bird-flu epidemic, but awareness is definitely increasing. Usually, people are not alarmed when they don’t know what they are missing. This is the challenge we face. We have to be patient to be able to show them what they are missing. Then, it will all break loose.

Q5: How can semantic search technology tackle the search fatigue problem?
First of all, not relying on statistics any longer allows your algorithm to work with full force in the cases of (1) long-tail queries and (2) dynamic content. So, just shaking off this kind of reliance puts you ahead in the game. Second, semantic technology enables concept-based matching, a prerequisite for ultimate relevancy and page-context identification. The query “what drug treats headache” can be matched to “aspirin helps migraine”, where no words match but the concepts do. Likewise, the correct sense of a word can be recognized, as in “beat” meaning rhythm versus physical abuse. As a result, semantic search technology can simultaneously satisfy the requirements of relevancy, credibility, and freshness, which cannot be approximated by “popularity” criteria alone.
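To make the “what drug treats headache” example concrete, here is a minimal sketch in Python. It is only an illustration of the general idea of concept-based matching, not hakia's actual method: the tiny `CONCEPTS` lexicon, the concept labels, and the function names are all hypothetical and hand-built for this one example.

```python
# Hypothetical toy lexicon mapping surface words to concept labels.
# These entries are illustrative assumptions, not hakia's ontology.
CONCEPTS = {
    "drug": "MEDICATION",
    "aspirin": "MEDICATION",
    "treats": "RELIEVE",
    "helps": "RELIEVE",
    "headache": "HEAD_PAIN",
    "migraine": "HEAD_PAIN",
}


def to_concepts(text):
    """Return the set of concept labels for the known words in the text."""
    return {CONCEPTS[w] for w in text.lower().split() if w in CONCEPTS}


def word_overlap(a, b):
    """Plain keyword matching: the words the two texts literally share."""
    return set(a.lower().split()) & set(b.lower().split())


def concept_overlap(a, b):
    """Concept matching: the concept labels the two texts share."""
    return to_concepts(a) & to_concepts(b)


query = "what drug treats headache"
sentence = "aspirin helps migraine"

print(word_overlap(query, sentence))     # no shared words
print(concept_overlap(query, sentence))  # three shared concepts
```

The two texts share no words, yet all three concepts line up. A real system would of course need a far richer representation of meaning than a flat word-to-label table, including a way to pick the right sense of an ambiguous word such as “beat”.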

Q6: What does it take to build a semantic search engine?
A lot of patience, to start with. A semantic search engine must embody the knowledge of concepts and how they are represented in natural languages. We have built these libraries at hakia using Ontological Semantics, an academic discipline of representing meaning in text. Also, you need to invent a new middleware to organize your data. Conventional indexing is a linear (flat) system which cannot handle semantic associations without inflating 1000-fold. So, we invented QDEXing, a new data organization infrastructure to tackle that problem.

Q7: Is hakia a Natural Language search engine?
Yes, but I do not like this terminology at all. It has been widely used in the past to refer to “being able to ask queries in a natural way” or “a search engine that understands your query.” These are immature definitions. First, what is natural or not is unclear. Why not say “free of specific rules” instead? Second, understanding the query/question is only part of the story. The more important part is for algorithms to understand the Web text. Handling the query smartly and then applying a rudimentary Web text analysis will not do the job. The correct definition is “semantic search”. hakia is a semantic search engine that covers natural language search and more, depending on how the term is used and understood.

Q8: What is the difference between semantic search and semantic Web?
Apples and oranges. Semantic search is a centralized system where semantic knowledge is embedded. It can analyze any Web document free of format restrictions. Semantic Web is a new form of the Web. It assumes that people making Web pages will organize the content following a specific format. I have never seen a single format followed by the masses. We have 200 different file formats for images, text, video, etc. Why would this one stick now? It is an idealistic but impractical approach, unless the goal is a small segment of the Web via a handful of followers.

Q9: How does human powered search compare to hakia’s offering?
If your search paradigm relies entirely on human editing, like Wikia Search, then the question is how you are going to handle the cases of (1) the long tail and (2) dynamic content. For the long tail, there are not enough people on the planet to do this job manually, not even to make a small dent. For dynamic content, you need dedicated people watching and scanning every new piece of content and acting upon it immediately. Both are very shaky and unrealistic propositions, not to mention the vulnerability to abuse (the rat race of marketers). As a result, human powered search is not an answer to the search fatigue problem (i.e., cases 1 and 2 above); it will only be effective in the same popularity domain as the current players. At hakia, we believe in “guided” human contribution only as a supplementary part of a semantic search engine. Such examples will come on-line at hakia.com.

Q10: Will better search mean a reduction in clicks on the ads? How will hakia’s business model be affected?
Better search technology will bring better advertisements in response to users’ queries. We are currently working on a “semantic advertising” platform to provide that balance. If you only improve search, but not the ads, then you would be driving people away from the ads. We believe that on-line ads are an integral part of “quality search” when presented with the correct relevancy, frequency, and format.

Q11: When is hakia launching its full version?
What you are seeing today on our BETA site is a primitive version of what we have running behind closed doors. Some people may joke and say that hakia is in the “fool” mode rather than the “stealth” mode. Such assertions are incorrect. We are in a “learning” mode with the BETA operation. We will complete the search engine soon. It is our New Year’s resolution!