Did Someone Just Expose Semantic Data?
This is a response to Marshall Kirkpatrick’s recent post, “Did Google Just Expose Semantic Data in Search Results?”
There have been so many trivializing depictions of semantic search and the Semantic Web in the blogosphere that I may have developed an allergic reaction to reading them. Still, Marshall is doing the right thing by provoking us to define this space better.
First of all, what is “semantic data”? Judging from the examples described, I think what is meant is “syntactic extraction.” Sorry to disappoint some of you folks, but solving the extraction problem by fitting syntactic patterns is really not semantic analysis. The extraction problem has been around for many years, and solutions to it are implemented all over the market in enterprise (and government) applications.
Take a word pattern such as “what is the capital of –” or “what is the capital city of –”. Then obtain a two-column list of the world’s capital cities from the Web. After 12 minutes and 34 seconds of programming, you will have an extraction algorithm (extraction from the query) just like the one Google uses in these examples… This is not semantic analysis.
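To make the “12 minutes” point concrete, here is a minimal Python sketch of such a pattern-based extractor. It assumes a hypothetical four-row `CAPITALS` table standing in for the two-column list from the Web, and folds both word patterns into a single regular expression. The names are illustrative only and make no claim about how Google actually implements this.

```python
import re

# Hypothetical two-column table (country -> capital). A real system would
# load a much longer list collected from the Web.
CAPITALS = {
    "france": "Paris",
    "japan": "Tokyo",
    "kenya": "Nairobi",
    "peru": "Lima",
}

# Both word patterns ("capital of" / "capital city of") in one regex.
PATTERN = re.compile(
    r"^what is the capital (?:city )?of (?P<country>[a-z ]+?)\??$",
    re.IGNORECASE,
)


def extract_capital(query):
    """Return the capital if the query fits the pattern, else None."""
    match = PATTERN.match(query.strip())
    if match is None:
        return None  # the wording did not fit the pattern
    country = match.group("country").strip().lower()
    return CAPITALS.get(country)  # None if the country is not in the table


print(extract_capital("What is the capital of France?"))
print(extract_capital("what is the capital city of Kenya"))
print(extract_capital("Which city is the seat of government in France?"))
```

The first two queries return `Paris` and `Nairobi`; the third asks the same kind of question in different words and returns `None`. The matcher knows strings, not what a capital is.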
Go one step further: you can sit down and define patterns until the cows come home, and end up with a large library of extraction algorithms. You might scan through Wikipedia to collect data (if you don’t care about proper authorship and credibility). Then you will have something useful, no doubt about it. However, these are still not semantic analyses.
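Scaling this up does not change its nature. The Python sketch below, with a hypothetical `FACTS` table and a hypothetical `PATTERN_LIBRARY`, routes each hand-written pattern to one relation table. Every new phrasing needs a new pattern, and a paraphrase nobody anticipated simply falls through.

```python
import re

# Hypothetical mini knowledge base: each relation is a two-column table.
FACTS = {
    "capital": {"france": "Paris", "japan": "Tokyo"},
    "currency": {"france": "Euro", "japan": "Yen"},
}

# Hypothetical pattern library: each hand-written pattern points at one table.
PATTERN_LIBRARY = [
    (re.compile(r"^what is the capital (?:city )?of (?P<arg>[a-z ]+?)\??$", re.I), "capital"),
    (re.compile(r"^what is the currency of (?P<arg>[a-z ]+?)\??$", re.I), "currency"),
    (re.compile(r"^what currency does (?P<arg>[a-z ]+?) use\??$", re.I), "currency"),
]


def answer(query):
    """Try each pattern in order; look up the first match, else return None."""
    text = query.strip()
    for pattern, relation in PATTERN_LIBRARY:
        match = pattern.match(text)
        if match:
            return FACTS[relation].get(match.group("arg").strip().lower())
    return None  # no pattern covers this wording


print(answer("What is the capital of Japan?"))
print(answer("What currency does Japan use?"))
print(answer("Which city hosts the French government?"))
```

The first two queries return `Tokyo` and `Yen`, while the third returns `None`. Adding a pattern for that exact wording would fix that one query, which is the “until the cows come home” treadmill: coverage grows one phrasing at a time, and nothing in the library connects “capital” to “seat of government.”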
Bruno Haid expressed his concern using the terminology “structured versus unstructured platform” for the target of extraction. In my book, that still does not differentiate enough between syntax and meaning. For anything to be considered “semantic,” there has to be a model of understanding involving concepts and associations.
I recommend an old article by George A. Miller on the ambiguity of words, which should prompt some thought as to why a syntax-only approach cannot replace meaning. We had posted a fun example here following Bill Gates’ vision. An example of semantic parsing was also posted here previously.
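Miller’s point about ambiguity can be illustrated with a toy model of concepts and associations. The Python sketch below uses a simplified Lesk-style heuristic, a well-known word-sense disambiguation baseline swapped in here purely for illustration (it is not the approach from the posts mentioned above). Each sense of “capital” in a hypothetical hand-made `SENSES` inventory carries a few associated concepts, and the sense whose associations overlap most with the sentence wins. A real system would draw senses from a lexical resource such as WordNet rather than three hand-picked sets.

```python
# Hypothetical toy sense inventory for the ambiguous word "capital".
# Each sense (concept) is paired with a small set of associated concepts.
SENSES = {
    "capital/city": {"city", "country", "government", "seat", "nation", "state"},
    "capital/wealth": {"money", "investment", "invest", "assets", "funds", "venture"},
    "capital/letter": {"letter", "uppercase", "alphabet", "write", "type"},
}


def disambiguate(sentence, senses=SENSES):
    """Pick the sense whose associations share the most words with the
    sentence (simplified Lesk-style overlap). Returns None if none overlap."""
    words = {w.strip(".,?!").lower() for w in sentence.split()}
    best_sense, best_overlap = None, 0
    for sense, associations in senses.items():
        overlap = len(words & associations)
        if overlap > best_overlap:  # ties keep the earlier sense
            best_sense, best_overlap = sense, overlap
    return best_sense


print(disambiguate("How much capital does the venture need to invest?"))
print(disambiguate("Name the capital of the country where the government sits."))
print(disambiguate("Start every sentence with a capital letter."))
```

The three sentences resolve to `capital/wealth`, `capital/city`, and `capital/letter` respectively. Even this crude model does something a fixed surface pattern cannot: the same word maps to different concepts depending on context. It is still far from understanding, but it marks where the line between syntax and meaning begins.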
The most important question is how to implement semantic analysis in a search engine environment. The examples in Marshall’s post do not come close to any kind of semantic analysis beyond a simple extraction operation. Google has not yet shown any clue that would make us think there is an actual semantic back end.
January 12th, 2009 at 2:32 pm
Whether in response to Marshall’s post or by coincidence, Ask.com also announced its new ‘semantic’ search features last week (http://blog.ask.com/2009/01/semantic-search.html), returning results similar to Google’s.
No doubt there’s _some_ NLP taking place here too, but I think (as you also suggest) that the comparatively rudimentary algorithms needed to achieve _these_ results are being overlooked amid the hype of any announcement with the word ‘semantic’ attached to it.
Ironically, it’s the meaning of that very word that is being confused; the need for a clearer definition (beyond a capital S) is far more pressing now.