Semantics In, Popularity Out

May 14th, 2008 by Dr. Riza C Berkan, CEO

We congratulate Powerset on their launch. Some people must have a gun pointed at their heads to rush to a conclusion using one or two examples. Powerset is good, Powerset is bad, etc. Well, I think they are all missing the point. So much for encouragement!

The clear message is this: Semantic technology is here, and it will evolve to challenge and eventually push out the popularity-based search methods. Here are the main reasons why:

1- DYNAMIC CONTENT
Dynamic Web pages and news articles move at such a fast pace that there is no time to collect any kind of statistics (link referrals) for popularity algorithms to do their job. By the time such referrals are made, these pages have become “history”. Thus, the only means to analyze them is via semantic algorithms that do not depend on statistics collection.

2- LONG-TAIL
A recent study shows that the average Web page has 474 words and 41 links, 10 of which point outside the domain. Any linguist would confirm that there can be 1000 queries that can be asked of a Web page of 474 words. If only 10 links point out on average, that means 99% of the meaningful word sequences (queries) are not wrapped around links pointing out to any Web site. That is what creates the long-tail “relevancy” problem. There is so much valuable information left out by a popularity method. We, at hakia, call it “the hidden failure”. Semantic algorithms do not depend on statistics collection, and thus are the only means to tackle the long-tail problem.
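The arithmetic behind the 99% figure can be checked with a short sketch. It uses only the numbers quoted above and assumes, purely for illustration, that each outbound link covers exactly one possible query; the variable names are hypothetical.

```python
# Figures quoted in the post. The one-link-covers-one-query mapping is an
# illustrative assumption, not a measured property of the Web.
possible_queries = 1000  # queries a 474-word page could answer (the post's estimate)
outbound_links = 10      # average number of links pointing outside the domain

covered = outbound_links / possible_queries  # share of queries backed by a link
uncovered = 1 - covered                      # share invisible to link-based ranking

print(f"covered by links: {covered:.0%}")
print(f"not covered:      {uncovered:.0%}")
```

Under that assumption, roughly 1 query in 100 has any link evidence behind it, which is the gap the long-tail argument rests on.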

3- USER INTERACTION
The current generation of Web searchers is accustomed to using a pidgin keyword language. But the average length of a Web query is on the rise. That means elevated expectations, problem solving, and communication in something closer to natural language. Eventually, people would love to talk back and forth with a search engine, pretending to be Mr. Spock. None of this can be handled by popularity algorithms. We need semantic systems to understand text and speech.

4- CREDIBILITY
Search results that are ranked by popularity algorithms are destined to be commercially biased. I am not talking about those “sponsored links.” If you are suffering from back pain, you may have to sift through popular results about massage parlors, spas, and mud baths before you encounter a credible source. With semantic technology, the credibility of a source is not compromised by the ranking algorithm. It can be controlled to the full extent by expert advice.

5- ADVERTISEMENT ACCURACY
As a suitcase producer, you don’t want your ads to be pushed next to a murder story where the body was disposed of using a suitcase. Content understanding is essential in on-line advertising, and can only be delivered on a consistent basis by semantic advertising systems.

At hakia, we call the combination of these five points Quality search, as opposed to Popularity search. Quality is a new perspective for consumers, who had never been exposed to it until recently, and semantic technology is the enabling force behind it.

It is no longer a big secret that all existing search players are also looking into semantic technology. The question at this point is how good and comprehensive these technological developments are. It is just a matter of time until consumers decide the winner and silence all those shotguns. Of course, when the tide changes, we may see roses popping out from their barrels.

For those who are interested, I have written about what it takes to test a semantic search engine properly. It requires at least a couple of hundred queries specially crafted to test competency in various areas. Then, one can compare it with Google, provided they both have the same corpus to work on for the search queries. That’s how it is supposed to be done, instead of a shotgun approach.
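A minimal sketch of such a comparison might look like the following. Everything in it is an assumption made for illustration: the engines are stand-in functions over the same toy corpus, the competency areas and relevance judgments are invented, and precision at k is simply one common scoring choice, not the test procedure hakia describes.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set, Tuple

# A test case: (competency area, query, ids of documents judged relevant).
TestCase = Tuple[str, str, Set[str]]
# An engine is any function mapping a query to a ranked list of document ids.
Engine = Callable[[str], List[str]]


def precision_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k results that were judged relevant."""
    top = ranked[:k]
    if not top:
        return 0.0
    return sum(1 for doc in top if doc in relevant) / len(top)


def compare(engines: Dict[str, Engine], cases: List[TestCase],
            k: int = 3) -> Dict[Tuple[str, str], float]:
    """Average precision@k for every (engine, competency area) pair."""
    scores = defaultdict(list)
    for area, query, relevant in cases:
        for name, search in engines.items():
            scores[(name, area)].append(precision_at_k(search(query), relevant, k))
    return {key: sum(vals) / len(vals) for key, vals in scores.items()}


# Hypothetical query set; a real test would hold a couple of hundred of these.
cases: List[TestCase] = [
    ("synonyms", "back pain", {"d1"}),
    ("questions", "who wrote hamlet", {"d2"}),
]

# Stand-in engines returning canned rankings over the same toy corpus.
canned_a = {"back pain": ["d1", "d3"], "who wrote hamlet": ["d2"]}
canned_b = {"back pain": ["d3", "d4"], "who wrote hamlet": ["d2", "d5"]}
engines: Dict[str, Engine] = {
    "A": lambda q: canned_a.get(q, []),
    "B": lambda q: canned_b.get(q, []),
}

results = compare(engines, cases, k=2)
for (name, area), score in sorted(results.items()):
    print(f"{name:>2} {area:<10} {score:.2f}")
```

Grouping scores by competency area, rather than reporting one overall number, keeps a strength in one area from hiding a weakness in another.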

Congratulations Powerset. Keep it coming.


8 Responses to “Semantics In, Popularity Out”

  1. Mark Johnson Says:

    Thanks for the kind words! Semantic technology companies are all in this together.

    I think you hit it spot on with all counts (except we don’t have ads right now). The only thing that I’d add is that Powerset is trying to include our technology in all search results. For example, if you just type in a topical query, we show Factz that we extracted from many different Wikipedia articles. Also, we bring our search into the enhanced Wikipedia pages, summarizing long passages of content. . . it’s not *all* about the search.

    {mark} powerset product manager

  2. Host Says:

    I searched for ‘Host My Own Website Router Static Ip’ in google and found this your post (‘nnial 2007 – salvatore iaconesi – del.icio.us poetry’) in search results. Not very relevant result, but still interesting to read.

6 Trackbacks/Pingbacks

  1. SEM News: SearchCap: The Day In Search, May 15, 2008 - Search Engine Marketing Says:

    [...] Semantics In, Popularity Out, Hakia Blog [...]

  2. E-researching» Архиви » SearchCap: The Day In Search, May 15, 2008 Says:

    [...] Semantics In, Popularity Out, Hakia Blog [...]

  3. searchenginemarketingvox » Blog Archive » SearchCap: The Day In Search, May 15, 2008 Says:

    [...] Semantics In, Popularity Out, Hakia Blog [...]

  4. Why bother comparing engines? It’s all just semantics… « slewfootsnoop Says:

    [...] is rightly pointed out elsewhere, not only would it take hundreds of carefully crafted searches to fairly put this new source [...]

  5. Alt Search Engines » Blog Archive » Semantics In, Popularity Out Says:

    [...] Guest post by Dr. Riza C Berkan, CEO [...]

  6. ICT magazine » Internet » SearchCap: The Day In Search, May 15, 2008 · Information & Communications Technology magazine Says:

    [...] Semantics In, Popularity Out, Hakia Blog [...]
