NLP is not just about Asking Questions.

January 29th, 2007 by Dr. Christian Hempelmann, Chief Scientific Officer

There is a common misconception in the market today. When Natural Language Processing (NLP) is mentioned, many people think of “uh… asking a search engine a question.” But NLP is a very broad field, and answering questions with an NLP algorithm is hardly its distinguishing feature. Let’s see what NLP really means.

Shallow NLP, still the dominant approach, uses statistics to find word patterns that recur frequently in language and makes predictions based on them. For example, when the word “chip” occurs frequently near the word “chocolate,” a relation between them is assumed. So next time you see “chocolate,” expect “chip” too. Sadly, based only on statistics, we don’t know what the nature of that relation is. We know that there is one, but not that it is very different from the one that holds between, say, “butter” and “knife,” or “punch” and “line,” which also occur near each other frequently.
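To make the co-occurrence idea concrete, here is a minimal sketch in Python. The toy corpus, the window size of two words, and the names `cooccurrence_counts` and `pair_count` are all assumptions made up for illustration; real statistical systems use far larger corpora and more refined measures than raw counts.

```python
from collections import Counter


def cooccurrence_counts(tokens, window=2):
    """Count unordered word pairs that occur within `window` tokens of each other."""
    counts = Counter()
    for i, word in enumerate(tokens):
        # Look only forward, so each pair of positions is counted once.
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                counts[frozenset((word, other))] += 1
    return counts


# Toy corpus, invented for illustration only.
text = (
    "chocolate chip cookies need chocolate chip dough "
    "a butter knife spreads butter "
    "the punch line ended the joke"
)
counts = cooccurrence_counts(text.split())


def pair_count(a, b):
    """Look up how often two words occurred near each other."""
    return counts[frozenset((a, b))]


for a, b in [("chocolate", "chip"), ("butter", "knife"), ("punch", "line")]:
    print(a, b, pair_count(a, b))
```

All three pairs come back with a count above zero, and that is everything the numbers can say. Nothing in the output separates an ingredient from a tool or from part of a joke.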

On the other hand, there is knowledge-based NLP, also referred to as “Computational Linguistics” to distinguish it from the use of statistical methods and algorithms that happen to be applied to text. The reason is that computational linguistics uses, you guessed it, linguistics: knowledge and theories about how language works. These theories describe the underlying processes that humans use to produce and understand language. Linguists create and test theories, for example, on how sounds form words (phonology), how words form sentences (syntax), and, most importantly, how all of this helps in the task that we use language for, to communicate meaning (semantics).

For something as powerful and dumb as a computer to use these often complicated linguistic theories, they have to be fully formalized. I cannot just tell the computer, “in English sentences start with the subject, but there are exceptions.” I have to tell it what a ‘sentence’ is, what a ‘subject’ is, even what ‘English’ is, and I have to include all the nasty exceptions. This is a good thing anyway, as teachers know: only when you can explain something to an uninitiated audience do you know that you actually have a “theory” about it yourself. To teach linguistic theories to computers, they have to be stated, among other things, in formulas. Ironically, this leads some people to think that whenever they apply any type of formula to language, they are doing computational linguistics.
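A deliberately tiny sketch shows how much has to be spelled out. It is not a real grammar of English: the six-word `LEXICON`, the category labels, and the functions `parse_noun_phrase` and `is_sentence` are hypothetical and exist only to illustrate the point that every term in the rule needs an explicit definition.

```python
# Hypothetical toy lexicon: every word the program may see needs a category.
LEXICON = {
    "the": "DET", "a": "DET",
    "dog": "NOUN", "cat": "NOUN",
    "chases": "VERB", "sees": "VERB",
}


def parse_noun_phrase(tags, start):
    """Return the index just past a noun phrase (DET NOUN or NOUN), or None."""
    i = start
    if i < len(tags) and tags[i] == "DET":
        i += 1
    if i < len(tags) and tags[i] == "NOUN":
        return i + 1
    return None


def is_sentence(words):
    """Toy definition: subject noun phrase, then a verb, then an optional object."""
    try:
        tags = [LEXICON[w.lower()] for w in words]
    except KeyError:
        return False  # unknown word: the toy grammar has nothing to say about it
    after_subject = parse_noun_phrase(tags, 0)
    if after_subject is None or after_subject >= len(tags):
        return False
    if tags[after_subject] != "VERB":
        return False
    rest = after_subject + 1
    if rest == len(tags):
        return True  # verb used without an object
    return parse_noun_phrase(tags, rest) == len(tags)


print(is_sentence("The dog chases a cat".split()))
print(is_sentence("chases the dog".split()))
```

The second call is rejected because this toy rule insists on a subject up front. A perfectly good English command is exactly the kind of exception that would still have to be added by hand, along with every word outside the six in the lexicon.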

This is NLP. Untidy, complex, and interesting, just like language itself. Word lists and computers doing tidy statistics on language are not NLP. They are also not really interesting, and they are not going to change the way we search with language. Real NLP can change that, though we haven’t seen much of it in search technology so far. But we are working on it!
