Archive for September, 2008

hakia Asks Librarians and Information Professionals for Credible Website Submissions

September 23rd, 2008 by hakia Team

Yesterday, at WebSearch University in Washington, D.C., we issued an open call to librarians and information professionals for credible Website submissions. We are glad to report that the immediate feedback has been overwhelmingly positive.

Currently, hakia generates credibility-stamped results for health and medical searches to guide users toward credible Web content. These results come from credible Websites vetted by the Medical Library Association. To see a credibility-stamped result, search for "What causes heart disease?" and mouse over the top search results. We now aim to expand our coverage to all topics.

Librarians and information professionals can now suggest URLs of credible Websites on a given topic by joining the hClub. Our definition of a credible site is transparent: such a site fulfills most of the following criteria:

• Peer review. The publisher of the site must have a peer review process or strict editorial controls to ensure the accuracy, dependability and merit of the published information. Most government institutions, academic journals, and news channels have such review mechanisms in place.
• No commercial bias. The publisher of the site shall have no commercial intent or bias. For example, for travel-related recommendations, consider the U.S. Department of State travel portal rather than Travelocity.
• Currency. The information on the site should be current, and its links should work.
• Source authenticity. The publisher should preferably be the owner/producer of the content.

Upon submission, hakia will process the suggested sites with QDEX (Query Detection and Extraction) technology and make them available to Web searchers in credibility-stamped search results. Each month we will give participants thank-you prizes, ranging from a book donation to two conference grants. To learn more or to suggest credible Websites, please visit http://club.hakia.com/lib/

We look forward to hearing your feedback! This is just the beginning of a long journey.

hakia hosts Alt Search Engine Party

September 17th, 2008 by hakia Team

Last night we hosted the Alt Search Engine party to celebrate Web 2.0 Expo in New York. We would like to thank Charles Knight, the editor of AltSearchEngines, for organizing this great event, which brought together alternative search engines based in New York or visiting the Big Apple.

We enjoyed our drinks, the view of downtown Manhattan and discussions on different ways of searching. We even had some members of the hakia Band play live music. Thank you all for coming!

Semantic Advertising = Long Tail Monetization

September 11th, 2008 by Dr. Riza C Berkan, CEO

One of the most important reasons for developing semantic technology is to increase the efficiency of online advertising. It has two immediate effects that follow one another like a chain reaction. First, meaning-based relevancy increases the frequency of correct impressions (easy to agree with). Second, the number of correct impressions can be increased for long tail queries (harder to visualize). In this blog entry, I want to elaborate on the second point and explain how we tackle it at hakia.

The long tail consists of queries that have never been seen before, and it accounts for a significant fraction of daily queries: several search engines report that it makes up about one third of all daily queries. If 33% of queries are new every day, the share of unique queries accumulated over a year is larger still, resembling the famous iceberg picture.

To see what long tail queries look like, you do not have to review search logs. You can simply look at the word distributions on your own Website. For example, if you have a Web page with 200 words describing your product/idea, ask yourself: how many possible questions/queries could be asked that your page answers? If you are not a trained linguist, you will probably underestimate the number. But we can tell you that for a page with 200 significant words, the number of possible queries/questions can be in the thousands.

These thousands of queries cover all the meaningful variations of the ideas presented on your Web page. Make no mistake, however: if you use a linear system with no semantics involved, the number of word permutations/combinations can run into the billions. Being able to extract only the meaningful ones is therefore the heart of the technological problem.
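To put the "billions" figure in perspective, here is a back-of-the-envelope count. It is a minimal sketch under an illustrative assumption not taken from the post: a "query" is treated as any sequence of k distinct words drawn from the page's 200 significant words, with no regard for meaning. The helper names are hypothetical.

```python
import math

# Illustrative assumption: the page has 200 significant words, and a
# "query" is any choice of k distinct words from that vocabulary.
VOCABULARY_SIZE = 200


def ordered_word_sequences(n: int, k: int) -> int:
    """Count ordered k-word sequences of distinct words from n words."""
    return math.perm(n, k)


def unordered_word_sets(n: int, k: int) -> int:
    """Count k-word sets (word order ignored) from n words."""
    return math.comb(n, k)


# Print the counts for query lengths of 2 to 5 words.
for k in range(2, 6):
    print(
        f"{k} words: {ordered_word_sequences(VOCABULARY_SIZE, k):>16,} ordered, "
        f"{unordered_word_sets(VOCABULARY_SIZE, k):>14,} unordered"
    )
```

Ordered four-word sequences alone come to just over 1.55 billion, and five-word sets (ignoring order) pass 2.5 billion, while the page may support only thousands of meaningful queries. That gap is what a semantic extractor has to close.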

It is infeasible to extract thousands of queries/questions manually for every Web page you have, not to mention creating an advertising campaign by bidding on these terms one by one. So here is a working definition of the long tail: the queries/questions you are not extracting from your Web pages are the long tail queries. The ones you are extracting are the fat tail: popular, short queries that you are most likely using in your advertising campaign today.

Here is an example. If you type “celebrex” into Google, you see the ad for this product, as shown below.

The advertiser bid on the term “celebrex.” They may also have bid on “arthritis”, “arthritis pain”, “arthritis treatment”, and so on. But they have not bid on the query “What can ease arthritis pain, stiffness, and inflammation?” as shown below:

There is no ad for this query because there are zillions of possible queries, and the advertiser had no practical means of creating campaigns for all of them. The cost of extracting them manually and then bidding on them one by one far exceeds the benefits. As a result, long tail queries are not monetized successfully in today’s systems. The next challenge in online advertising is to take advantage of this sleeping giant.

At hakia, we have developed our own Semantic Advertising system to tackle this problem, and we are testing it now. It rests on two important technologies: (1) QDEXing, which extracts all possible meaningful questions from a given page, and (2) automated long tail campaigns, so that advertisers do not have to build them manually.

Returning to the example above, we QDEXed the Celebrex Web pages and created a long tail campaign in hakia’s simulation room. The long tail query “What can ease arthritis pain, stiffness, and inflammation?” now brings up an ad, as shown below.

For Celebrex, QDEXing extracted thousands of queries covering a vast range of semantic variations.

An advertiser in hakia’s system can create a “long tail” campaign simply by pointing to the URLs of his/her products and then making a single bid for the entire campaign. To keep the economics simple, the advertiser pays only for successful long tail queries.

Stay tuned for updates on hakia’s Semantic Advertising System.

Search Engines’ Choices: Google vs. hakia.com

September 5th, 2008 by Melek Pulatkonak, COO

We join others in applauding Google as one of the pioneers of search, a company with a solid stance on its choices and guiding principles. Since search engines control the way we access the Internet, we receive many questions about our algorithm and the choices we make. I was intrigued when I recently read “The Google Dilemma”, a paper by James Grimmelmann, an Associate Professor of Law at New York Law School. He questions some of Google’s choices and poses an interesting question: where is the line between a search engine’s First Amendment right to build its system as it likes and its responsibility as a public corporate citizen guiding users through the Web jungle?

James takes the reader through five seemingly harmless cases and questions how search engines:

• Organize information on the Internet and rank search results, for example for the searches “Mongolian gerbils” and “talentless hack” (read the paper for more)
• Deal with the dilemma of controversial searches, such as a search for “jew”
• Intervene with human oversight when their algorithms are intentionally tricked, as in the case of Search King
• Customize search results to accommodate local laws and regulations, which explains why the search “Tiananmen” shows different images on Google.com and Google.cn

We, the architects of search engines, make choices when building our algorithms. These choices are extremely important: searchers usually look only at the first page of results, so our development decisions limit them to seeing what the Internet has to offer through our lens. Naturally, we are guided and grounded by our beliefs and how we perceive the world. Hence, our corporate values directly affect the Internet’s information seekers, which is a huge responsibility. For instance, Google has built a very elegant search solution that assumes people will not “be evil”. But some people act out of self-promotion, and as a result the search ecosystem suffers from evil-doers who have:

• Made statements with search bombs. Google trusted that people linking to other sites would use a word, phrase or meaning in the link text that actually reflects the linked page; in a small percentage of cases, that trust was not returned.
• Built link farms that unfairly boost the rank of commercial sites
• Used the Internet for political, racist and other extremist propaganda

We at hakia believe that searchers suffer from information pollution and that the time for a guided search experience has arrived.

We believe in letting computers do the work without human interference: our algorithms extract meaning directly from Web text, and we refuse to consider link statistics. Wherever there is an economic benefit, simple rules will always be violated, and it is not humanly possible to monitor all the mischief.

We believe the Internet has reached a mature stage in which information scientists (the librarians) can point users to credible Web sources, an effort already under way with great projects like the LII, the IPL and more.

I think it is best to demonstrate what we mean with an example. Before we do that, I have three disclaimers: 1) we are still in development and are processing credible content recommended by librarians with our QDEX system, and for now our credible content universe is limited to health and medical sources recommended by the Medical Library Association; 2) we will not touch upon the superior relevancy performance of semantic search technology; and 3) to keep this blog entry to a manageable size, we will not discuss hakia’s immunity to search bombs.

We ran the query “what prevents migraine” on both Google and hakia. On the hakia SERP, you will note that we have identified PubMed, MayoClinic and Clinicaltrials.gov as credible source picks and Wikipedia as user-generated content. We want to help searchers identify relevant, fresh information from credible sources, since no searcher can be expected to be an expert in every field and make this determination alone. In this case, the medical librarians were the experts. Compare this with Google’s SERP for the same query: Google has made a different choice.

As hakia matures, credible sources will be surfaced in every search whenever they are the most meaningful match to the query. We will always mark credible search results visually. We believe our approach will offer searchers a different way to access the vast Internet. This is hakia’s way of exercising its First Amendment right and filling its shoes as a public corporate citizen.

As I said, we all make choices when we set out on a journey. Google has made theirs, and we have made ours. But the most important traveler on the Internet’s road is the searcher: you. What will your choice be? Or can you afford to see only one perspective and ignore the other?