When we say that hakia is currently analyzing Web pages to rank them by semantic methods and credibility criteria, many people ask: what is the “credibility” of a Web page, and how can it be better than “popularity”?
In a nutshell, I can tell you right off the bat that credibility is the real thing, and popularity is an approximation of it (a cheap imitation). Let’s walk through some cases to make this point clearer.
Case-1: Domain name
Let’s take the query “madonna”. Obviously the most credible site is Madonna’s official Web site, madonna.com. Thanks to the fiercely competitive market for Web names, only Madonna can afford this name. Thus, the very first criterion for credibility is the domain and how well that domain is controlled. For example, domains like .mil and .gov are tightly controlled and reserved for official announcements, so junk content is very unlikely to appear there.
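To make the domain criterion concrete, here is a minimal sketch in Python. It is an illustration only, not hakia's implementation: the function name `domain_signal`, the set `CONTROLLED_TLDS`, and the score values 1.0, 0.9, and 0.3 are all assumptions picked for the example.

```python
from urllib.parse import urlparse

# Hypothetical set of tightly controlled top-level domains,
# taken from the .mil and .gov examples in the text.
CONTROLLED_TLDS = {"gov", "mil"}


def domain_signal(url: str, query: str) -> float:
    """Return a rough 0..1 domain score for a URL and a query.

    The score values are arbitrary illustration values, not measured ones.
    The URL must include a scheme (e.g. "http://"), otherwise no host is found.
    """
    host = (urlparse(url).hostname or "").lower()
    labels = host.split(".")
    if labels and labels[0] == "www":
        labels = labels[1:]
    if len(labels) < 2:
        return 0.0  # no usable host name
    tld, name = labels[-1], labels[-2]
    if tld in CONTROLLED_TLDS:
        return 1.0  # officially controlled domain
    if name == query.strip().lower().replace(" ", ""):
        return 0.9  # the domain name is the query itself, e.g. madonna.com
    return 0.3  # no domain-level evidence either way
```

With these assumed values, `domain_signal("http://www.madonna.com/", "madonna")` scores higher than a fan page hosted under an unrelated domain, which is the behavior the Madonna example calls for.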
In this day and age, lists of the highest-quality editorial pages are readily available. One can use Hitwise or Alexa data to go through the highly rated sites, and easily edit this list to assign a rating to each one. CNN.com, for example, is obviously a credible source. The list can be extended to include all company names, with their official pages and all the products they offer. So, if the query is about XBox, the system will know which sites are credible. Popularity algorithms were devised 10 years ago, at a time when such information was not readily available. But today, popularity computation by means of link referrals is like reinventing the wheel.
The page content can be analyzed for proper language, layout, and links. These types of analyses are very common today, but semantic analysis is necessary to assess how well a given query is represented by the page content. Without this crucial element, the credibility assessment is likely to fall short.
Using the three measurements described above, the credibility of a Web page for a given query can be easily assessed. What this method will miss are the “out-of-the-list” items, which are obviously not popular to start with. However, list-based coverage has become the easiest practice today, given the current state of data availability, hardware capacity, and connectivity.
Conclusion: The decade-old exotic popularity methods for ranking a Web page are increasingly becoming obsolete. The next generation of search systems will definitely not need these approaches any longer, thanks to the improvements in content analysis and the availability of credibility measurements. By the same token, manipulating search results by artificial means will become much more difficult.
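One way to picture how the domain, list, and content measurements could come together is a simple weighted average. The sketch below is only an assumption for illustration: the `Signals` class, the `credibility` function, and the default weights are hypothetical, and nothing here describes how hakia actually combines its measurements.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    """Three per-page measurements, each assumed to be scaled to 0..1."""
    domain: float     # how well controlled the domain is
    editorial: float  # rating from a curated list (0.0 if the site is not listed)
    semantic: float   # how well the page content represents the query


def credibility(s: Signals, weights=(0.3, 0.3, 0.4)) -> float:
    """Combine the three signals into one score with a weighted average.

    The default weights are arbitrary placeholders, not tuned values.
    """
    w_domain, w_editorial, w_semantic = weights
    total = w_domain + w_editorial + w_semantic
    weighted = (w_domain * s.domain
                + w_editorial * s.editorial
                + w_semantic * s.semantic)
    return weighted / total


# Hypothetical input values for an official page and an unlisted page.
official = Signals(domain=0.9, editorial=1.0, semantic=0.8)
unlisted = Signals(domain=0.3, editorial=0.0, semantic=0.8)
ranked = sorted([unlisted, official], key=credibility, reverse=True)
```

In this toy setup the unlisted page still earns credit for its content, but it ranks below the official page. That is the “out-of-the-list” limitation in miniature.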
When hakia’s content analysis of the Web pages is completed, these advantages will become highly visible to the end user. Until then, monitor the progress at our beta site, hakia.com.