Information Pollution, the Murder She Wrote

February 14th, 2008 by Dr. Riza C Berkan, CEO

Readers of our blog may remember my previous post about “information pollution.” A recent Google Blog post identifies one special form of it: “… we have investigated billions of URLs and found more than three million unique URLs on over 180,000 web sites automatically installing malware.”

It is now evident that the decade-old strategy of “cover everything under the sun” no longer adds value to Web search, because the rate of information pollution keeps increasing. Unfortunately, search engines that rely on statistical algorithms and nothing else will continue their uphill battle against information pollution, and against tactics that can easily generate bait for statistical algorithms.

About a decade ago, the Web was more of an unknown, and statistical methods were needed to identify popular pages (in link terms) as a way to approximate their legitimacy and relevance. But now the credibility of Web sites has become highly transparent. The authority map of the WWW is as tangible as the geographic map of the world. How much popularity is needed now?

The title I picked, inspired by the famous TV show, may sound too harsh, but I wanted to make one point clear. Google’s remarkable success by means of popularity algorithms was also the beginning of the spam industry. Commercial success sparked more information pollution, and now we are reaching the point of very low signal-to-noise ratios. Cleaning up the pollution is Google’s main concern, and wisely so if they want to continue on the same path.

But for some of us, it is obvious that the popularity approach is no longer enough, and that some fundamental changes are needed in search philosophy. Semantic search technology is one potential solution, and we are working on it day and night at hakia.

With semantics, information pollution can be reduced to insignificant levels, much as clean-air technologies reduce pollution from energy production. In simple terms, the search algorithm will no longer take link referrals, or any other author- or user-generated statistics, as the only means of rating content. Semantic search algorithms will analyze the content for what it really means.

One thing is for sure: if the spammers do not disappear, they will have to operate on a new level of ingenuity, perhaps as sophisticated as a poet or novelist, to generate content that can fool a semantic algorithm. That also depends on how good the semantic technology will be.

2 Responses to “Information Pollution, the Murder She Wrote”

  1. Nathan Says:

    They’re already doing rudimentary versions of it now (probably simple Markov chain approaches), and it won’t take too long for them to get out of the ’90s and start using better techniques that are harder to detect as spam. Every time you raise the bar, they’ll figure out a simple exploit or pay someone with enough knowledge to beat you at your own game.

    The spammers will never disappear if there is a financial incentive. If you start selling ads, instead of using ask.com, and become popular enough (or have wide arbitration opportunities regardless of popularity) you’ll be a target. And then all I can say is welcome to the club, you have a lot of work left to do.

  2. pop Says:

    It’s always gonna be a bit tricky unless it is human-managed all the time.

    Let’s see what is inside soon at hakia.
