Archive for January, 2007

NLP is not just about Asking Questions.

January 29th, 2007 by Dr. Christian Hempelmann, Chief Scientific Officer

There is a common misconception in the market today. When Natural Language Processing (NLP) is mentioned, many people think of “uh… asking a search engine a question.” But NLP is a very broad field, and answering questions with an NLP algorithm is hardly a distinguishing feature. Let’s see what NLP really means.

Shallow NLP, still the dominant approach, uses statistics to extract frequent constellations of words and makes predictions based on them. For example, when the word “chip” frequently occurs near the word “chocolate,” a relation between the two is assumed, so the next time you see “chocolate,” you expect “chip” too. Sadly, statistics alone do not tell us what the nature of that relation is. We only know that there is one; we cannot tell that it is very different from the relation between, say, “butter” and “knife,” or “punch” and “line,” which also frequently occur near each other.
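The co-occurrence counting described above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not any particular engine’s method: the six-sentence corpus is invented, “near” is taken to mean within two tokens, and tokenization is a plain whitespace split.

```python
from collections import Counter

def cooccurrence_counts(sentences, window=2):
    """Count unordered word pairs that occur within `window` tokens of each other."""
    pairs = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()  # naive whitespace tokenization
        for i, left in enumerate(tokens):
            # Look only ahead, so each pair occurrence is counted once.
            for right in tokens[i + 1 : i + 1 + window]:
                pairs[tuple(sorted((left, right)))] += 1
    return pairs

# Hypothetical toy corpus, invented for illustration.
corpus = [
    "she baked chocolate chip cookies",
    "chocolate chip muffins are popular",
    "pass the butter knife please",
    "the butter knife is dull",
    "he forgot the punch line",
    "the punch line fell flat",
]

counts = cooccurrence_counts(corpus)
for pair in [("chip", "chocolate"), ("butter", "knife"), ("line", "punch")]:
    print(pair, counts[pair])  # each of these pairs has a count of 2
```

All three pairs come out with the same count. The numbers say that each pair belongs together, but nothing in them distinguishes a chip made of chocolate from a knife used for butter, or either of those from the fixed expression “punch line.”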

On the other hand, there is knowledge-based NLP, also referred to as “Computational Linguistics” to distinguish it from the use of statistical methods and algorithms that merely happen to be applied to text. The reason is that computational linguistics uses, you guessed it, linguistics: knowledge and theories about how language works. These theories describe the underlying processes that humans use to produce and understand language. Linguists create and test theories, for example, on how sounds form words (phonology), how words form sentences (syntax), and, most importantly, how all of this serves the task we use language for: communicating meaning (semantics).

In order to make something as powerful and dumb as a computer use these often complicated linguistic theories, they have to be fully formalized. I cannot just tell the computer, “in English, sentences start with the subject, but there are exceptions.” I have to tell it what a ‘sentence’ is, what a ‘subject’ is, even what ‘English’ is, and I have to include all the nasty exceptions. As teachers know, this is a good thing anyway: only when you can explain something to an uninitiated audience do you know that you actually have a “theory” about it yourself. To teach linguistic theories to computers, we have to state them, among other things, in formulas. Ironically, this leads some people to think that whenever they apply any type of formula to language, they are doing computational linguistics.
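To see what “fully formalized” demands even for the one rule quoted above, here is a minimal Python sketch. Everything in it is a hypothetical simplification chosen for illustration, not hakia’s grammar: a tiny hand-written lexicon, a ‘subject’ defined as either a pronoun or a determiner plus a noun, and only two exceptions to the subject-first rule (yes/no questions and imperatives).

```python
# Hypothetical toy lexicon: the computer knows nothing unless we list it,
# including the category ("part of speech") of every word.
LEXICON = {
    "the": "DET", "a": "DET",
    "dog": "NOUN", "cat": "NOUN", "ball": "NOUN",
    "she": "PRON", "he": "PRON",
    "chased": "VERB", "sees": "VERB", "see": "VERB", "chase": "VERB",
    "did": "AUX", "does": "AUX",
}

def noun_phrase_end(tags, i):
    """A 'subject' or 'object' here is a pronoun, or a determiner plus a noun.
    Return the index just past it, or None if none starts at position i."""
    if i < len(tags) and tags[i] == "PRON":
        return i + 1
    if i + 1 < len(tags) and tags[i] == "DET" and tags[i + 1] == "NOUN":
        return i + 2
    return None

def predicate_fits(tags, i):
    """A predicate here is a verb, optionally followed by one noun phrase, ending the sentence."""
    if i >= len(tags) or tags[i] != "VERB":
        return False
    if i + 1 == len(tags):
        return True
    return noun_phrase_end(tags, i + 1) == len(tags)

def classify(sentence):
    """Return 'declarative', 'question', 'imperative', or None if no rule applies."""
    words = sentence.lower().rstrip(".?!").split()
    if not words or any(w not in LEXICON for w in words):
        return None  # unknown words: the toy grammar has nothing to say
    tags = [LEXICON[w] for w in words]

    # Default rule: the sentence starts with its subject.
    end = noun_phrase_end(tags, 0)
    if end is not None and predicate_fits(tags, end):
        return "declarative"
    # Exception 1: yes/no questions put an auxiliary before the subject.
    if tags[0] == "AUX":
        end = noun_phrase_end(tags, 1)
        if end is not None and predicate_fits(tags, end):
            return "question"
    # Exception 2: imperatives have no overt subject at all.
    if predicate_fits(tags, 0):
        return "imperative"
    return None

for s in ["The dog chased a cat.", "Did she see the ball?", "Chase the cat!", "Dog the chased."]:
    print(s, "->", classify(s))
```

Even this toy needs explicit definitions of ‘subject’ and ‘predicate’ and a separate branch for each exception, and it still ignores subject-verb agreement, so it wrongly accepts “she see the dog.” Every such gap is one more rule a real grammar would have to spell out.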

This is NLP: untidy, complex, and interesting, just like language itself. Word lists and computers doing tidy statistics on language are not NLP. They are also not really interesting, and they are not going to change the way we search with language. Real NLP can change that, though we haven’t seen much of it in search technology at all. But we are working on it!

Welcome to the New Year!

January 5th, 2007 by Dr. Riza C Berkan, CEO

As we enter 2007, I would like to thank all of the open-minded people out there who have adopted hakia as one of their prime sources for finding relevant information, even at hakia’s early BETA stage.

The year 2007 will reveal the real power of hakia as we continue our development and our analysis of Web pages. We will also roll out features that have never been seen before in the world of search engines, announcing and integrating them into our offering one by one over the next 12 months.

Creating a meaning-based search engine, in the true sense, is a huge undertaking. The human brain takes almost a decade of training to become fully competent in a natural language. Although computers may be considered faster processors of information, the “training” they require still takes considerable time and resources. By analogy, hakia’s cognitive growth this year will be like a baby starting to speak. Needless to say, we will be proud parents!

With all due respect to skeptics, I must conclude this New Year’s message by saying that creating a new search technology is not a football match. It is not about whether the opponents can be beaten at their own game. It is about expanding the horizons and the user base of Web search in general. It is about whether we can use the Internet to solve real-life problems that could never be solved before.

Happy New Year to you all. May your dreams and searches bring on new meaning.