Case Study of Contexa at ReadWriteWeb: Context Improves CTR

September 20th, 2009 by hakia Team

Since the launch of the contextual link advertising product on ReadWriteWeb, we at hakia have been anxious to see the results and evaluate the success of our contextual advertising product, Contexa. The Contexa system matches the context of a blog post with a sponsor’s criteria using proprietary semantic technology to deliver relevant ads to the reader. Participating ReadWriteWeb sponsors have provided the contextual engine with up to three “trigger phrases” that define their business. As a reader of the blog, you may have seen the product’s implementation at the bottom of certain blog entries, as shown below. You can see another example here.

rwwad

As we began this exciting journey, the ReadWriteWeb team defined its objective as follows:

1. “To offer value to our readers by providing advertising links in the context of what they are reading, links that would therefore more likely be of interest to them, and
2. To offer a higher level of engagement to our advertisers, resulting in both more branding impressions and more click-throughs.”

The preliminary aggregated statistics of the six participating sponsors (excluding hakia), covering a 40-day period, demonstrate that the Contexa system has met ReadWriteWeb’s objectives:

- The Contexa system increased ad clicks by 14% (i.e., advertisers received, on average, 14% more ad clicks).
- The click-through rate (CTR) for Contexa was more than twice that of ReadWriteWeb’s 125 x 125 banner ads.
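For readers who want to run the same kind of comparison on their own logs, the arithmetic behind the two bullets above is simple. The sketch below is only an illustration: the function names are ours, and the click and impression counts are made up (they are not the ReadWriteWeb figures).

```python
def ctr(clicks: int, impressions: int) -> float:
    """Click-through rate: clicks divided by impressions."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return clicks / impressions


def relative_uplift(baseline: float, treated: float) -> float:
    """Relative change of `treated` over `baseline` (0.14 means +14%)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (treated - baseline) / baseline


# Illustrative numbers only, NOT the ReadWriteWeb data.
banner_ctr = ctr(clicks=50, impressions=100_000)
contextual_ctr = ctr(clicks=120, impressions=100_000)

# How many times higher the contextual CTR is than the banner CTR.
ctr_ratio = contextual_ctr / banner_ctr

# Uplift in total ad clicks for an advertiser, again with made-up totals.
click_uplift = relative_uplift(baseline=1000, treated=1140)
```

Comparing the CTR ratio (rather than raw clicks) is what makes placements with different impression volumes comparable.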

We decided to turn the tables and interview Bernard Lunn, ReadWriteWeb’s COO and Feature Writer, for his feedback! hakia’s own COO, Melek Pulatkonak, poses the questions.

Melek: Bernard, we have been working together on the ReadWriteWeb contextual link advertising system for a while. Has the system met your expectations?

Bernard: Yes, it has. We wanted to see if it would generate a meaningful uplift in CTR, and it has.

Melek: What feedback have you received from advertisers? And what would you recommend to participating advertisers going forward?

Bernard: Advertisers have to get the traffic–relevance balance right. You can drive a lot of clicks with a hot term – something we are writing about a lot – but if the relevance is low, the advertiser won’t get good conversions. As with any new type of advertising, an art and science emerges over time. People know how to buy search terms on Google, but this is a bit different. I think we need to get better at creating more of a feedback loop (e.g. stats on how different terms have performed) so that advertisers can tune their keyword selection accordingly. Each advertiser has different needs and knows its market intimately, so it is best positioned to decide what works and how to tune its selection.

Melek: What’s next? What is your vision?

Bernard: For this first phase, we provided Contexa to our long-term sponsors. In the next phase, we want to offer Contexa as a standalone offering, so that advertisers can purchase keywords (or trigger phrases) directly on ReadWriteWeb. This will be an entry-level self-service advertising option that many smaller startups have requested.

Melek: Anything else you would like to add?

Bernard: Context matters to engagement. That is an obvious statement, but doing it right has been hard, and the opportunities for bloggers to offer ads that engage readers well and offer them value have been limited. Contexa is a good step in this important journey.

We thank the ReadWriteWeb team for working with us closely to create a new contextual ad system for blogs and other publishers. To learn more about Contexa, please contact us at bdev@hakia.com.

A New Commercial Ontology from hakia

July 27th, 2009 by Dr. Riza C Berkan, CEO

We are proud to announce our upcoming Commercial Ontology (CO), perhaps the world’s first. What is a commercial ontology? If you asked this question, you have just touched on an important distinction: fantasy versus reality. In the context of the World Wide Web, the CO is the realistic version of an ontology, for the reasons explained below.

The Realities of the Web

We have accomplished two important innovations in building the CO. First, the development of concepts and lexicons followed a strict guideline based on the realities of Web operations. What were these realities? Most of the search queries on the Web reflect a single dimension of intent, almost exclusively relevant to commercial topics. Note that the interpretation of “commercial topics” must be taken in the broadest sense possible. For example, if you were looking for “the benefits of foot massage” or “the director of the movie Last Emperor”, your queries fall into the same commercial pattern. One particular distinction of queries in the commercial pattern is that they come in short packages that include a name (onomasticon) or refer to something sold, bought, watched, heard, etc.

In contrast, many ontologies (if not all) that have been built to date, or claimed so, are focused on the use of language in the general sense, but not in the sense of commercial patterns on the Web. Therefore, their usefulness when tackling Web search queries is greatly compromised, sometimes to the point of absolute failure. If such an ontology could disambiguate a dozen different senses of the word “kill”, it would be sad news that the last 100,000 queries in the search logs did not include a single occurrence of the word “kill”. Like drowning in 2-inch-deep water, such ontologies will not utilize their disambiguation skills on nearly 80% of the queries because the queries include nothing but onomasticons and/or they are too short (under-articulated).

The Sequence Approach

The second innovation used in the CO is the use of sequences instead of single words. A single word, like “kill”, is the most ambiguous state of information and is hardly used in human communication without a strong underlying/implied context. As a result, building a natural language processing (NLP) system by taking a single word as the unit of computation is an invitation to disaster.

In contrast, word sequences (2 or more words) are inherently safe and highly descriptive. Take “road kill”, for example. This sequence describes the corpse of an animal killed on the road by a passing vehicle. If a language processing system takes sequences as its unit of computation, 99% of the ambiguity problem vanishes. There is no need to process the words “kill” and “road” separately, trace their senses, and locate convergence to identify the meaning of “road kill” if you can just take the sequence “road kill” itself as your unit of computation for mapping. This is depicted below:

road kill

Note the number of traces required in a conventional ontology approach compared to the sequence approach. The sequence approach requires a lot of data storage space (which is dirt cheap) whereas the conventional ontology approach requires a lot of CPU for a simple mapping task (which is expensive). But the bad news does not stop there. The trace routes in a conventional ontology require manual work (impossible to automate) whereas a sequence-based ontology can be easily built via automation.
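To make the storage-versus-CPU trade-off concrete, here is a minimal sketch of what “taking the sequence as the unit of computation” can look like: a table keyed by whole word sequences and a greedy longest-match lookup. This is not the CO implementation; the table contents, `SEQUENCES`, and `map_sequences` are hypothetical names chosen for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical sequence table: each whole word sequence maps directly to a
# meaning, so no per-word sense tracing is needed at lookup time.
SEQUENCES: Dict[Tuple[str, ...], str] = {
    ("road", "kill"): "animal killed on a road by a vehicle",
    ("kill", "switch"): "emergency shut-off mechanism",
}
MAX_LEN = max(len(key) for key in SEQUENCES)


def map_sequences(text: str) -> List[Tuple[str, str]]:
    """Greedy longest-match lookup of word sequences (2 or more words)."""
    words = text.lower().split()
    found: List[Tuple[str, str]] = []
    i = 0
    while i < len(words):
        # Try the longest candidate first, never shorter than 2 words.
        for n in range(min(MAX_LEN, len(words) - i), 1, -1):
            key = tuple(words[i:i + n])
            if key in SEQUENCES:
                found.append((" ".join(key), SEQUENCES[key]))
                i += n
                break
        else:
            i += 1  # no known sequence starts at this word
    return found
```

Each lookup is a dictionary hit on the whole sequence, which is where the “storage instead of CPU” trade shows up: the table grows, but the mapping step stays trivial.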

I realize only a handful of people will understand the second point above. Nevertheless, the scalability and performance of the end product will speak for itself when we put the testing platform on-line.

Usage of the Commercial Ontology

The immediate use of the CO is related to search queries, or document characterizations, that are not tied to any advertising in conventional systems. This unrecognized domain of search queries and characterizations means loss of revenue. hakia’s CO is designed to fill this gap. For example, if the search query or page characterization is “beat generation”, the CO can map it to “literature” on the fly. As a result, systems using the CO will have a much deeper understanding of the incoming terms, and thus will be able to recognize the underlying intent beyond the face value of the words. The same capability can be used in a number of places other than advertising with the same effect.
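As a rough illustration of how such a mapping could fill the advertising gap, the sketch below falls back from an exact-term ad lookup to a category-level lookup. Everything in it is assumed for the example: the dictionaries, the ad label, and the `ads_for` function are hypothetical, and only the “beat generation” to “literature” pair comes from the text above.

```python
from typing import Dict, List

# Hypothetical ontology entries: term -> commercial category.
COMMERCIAL_ONTOLOGY: Dict[str, str] = {
    "beat generation": "literature",
}

# Hypothetical ad inventory, keyed by exact term and by category.
ADS_BY_TERM: Dict[str, List[str]] = {}
ADS_BY_CATEGORY: Dict[str, List[str]] = {
    "literature": ["online bookstore ad"],
}


def ads_for(term: str) -> List[str]:
    """Return ads for a query or page characterization.

    Ads keyed on the exact term win; otherwise the term is mapped to a
    category through the ontology and category-level ads are used.
    """
    normalized = " ".join(term.lower().split())
    direct = ADS_BY_TERM.get(normalized)
    if direct:
        return direct
    category = COMMERCIAL_ONTOLOGY.get(normalized)
    if category is None:
        return []  # still unrecognized: no ad can be served
    return ADS_BY_CATEGORY.get(category, [])
```

With no ad sold against “beat generation” itself, the term would otherwise go unserved; the category fallback is what recovers it.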

Stay tuned for the release of the first version of our commercial ontology.

Everything You Always Wanted to Know About Semantic Search, But Were Afraid to Ask (in SemTech Conferences)

June 24th, 2009 by Dr. Riza C Berkan, CEO

In the wake of the SemTech09 conference, I thought this title would do justice to those mischievous readers who happened to have the good fortune to stumble across this blog posting. The conference was great, neatly organized, carefully secluded in San Jose, California. One of the highlights was the Semantic Search Keynote Panel with all the players on stage (Ask, Bing, Google, hakia, TrueKnowledge and Yahoo!) as seen in the picture below.

semtech09-panel

Bear in mind that semantic technology can be as heavy and stifling to “any” audience as the topic of stem-cell research can be to high-school students. Thanks to Carla Thompson from Guidewire, who did a terrific job coming up with discussion topics and moderating the panel, everyone survived the ordeal without any sign of dozing.

Despite the positive outcome, some responses from the panelists made me wonder if we should go back to the basic question of “What is semantic search?” Or, better to discuss: what is NOT semantic search? Here is my list:

Structured data. Folks, structured data is NOT semantic technology. A database that can pull out a list of beer brands, their manufacturers, and their contact information, given the query “social drinking”, has nothing to do with semantics. I say this because some people seemed to be under the illusion that there must be some kind of semantic technology if a search engine brings such structured data in the SERP. It is a trick as old as the ancient Egyptians who used beads on strings to organize harvesting information. Organized information is not semantics.

Morphology. If a search engine is robust (brings the same results) to a query “top ten” versus “top 10” by recognizing “ten=10”, it would be a stretch of the imagination to call it semantic. Anyone can come up with such a replacement list without a drop of linguistic knowledge. Similarly, distinguishing the name Fisher from the noun fisher by detecting the capitalization of the first letter does not go beyond the application of simple linguistic rules. These capabilities are not semantic search capabilities.
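To show how little is involved, here is roughly all it takes to get both behaviors. The replacement list and the function names are made up for this sketch; no search engine’s actual code is implied.

```python
# Hypothetical replacement list; no linguistic knowledge required.
NUMBER_WORDS = {"one": "1", "two": "2", "three": "3", "ten": "10"}


def normalize_query(query: str) -> str:
    """Rewrite number words as digits so 'top ten' and 'top 10' match."""
    return " ".join(NUMBER_WORDS.get(word, word) for word in query.lower().split())


def looks_like_name(token: str) -> bool:
    """Crude capitalization rule: 'Fisher' is a name, 'fisher' is a noun."""
    return token[:1].isupper()
```

A lookup table and a one-line capitalization check: useful, but nothing in them touches meaning.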

Syntax. It is true that a certain level of semantic information can be salvaged from syntax. Unfortunately, if syntax were enough to detect the meaning of text, then an 8-year-old kid who had developed a perfect reading skill (syntactically parsing strings of letters and words in English) would be expected to understand the meaning of Shakespeare’s works. The difference between reading and understanding is the difference between syntax and semantics. The former requires the skill to parse things out, whereas the latter requires a vast amount of associative knowledge.

Statistics. An infinite number of monkeys with a keyboard would eventually type the complete text of the Declaration of Independence. This is statistically correct. However, if a search engine is expected to become semantically apt using statistical algorithms, one has to wait until the monkeys finish their job. There is no place for statistics in semantics. For example, let’s take this sentence: “Polar bears don’t eat alligator eggs before dawn.” I am sure you have never seen this combination of words before in your life. But the fact that you can understand what it means is simple evidence that the semantic brain does not need statistical sampling. Meaning does not emerge from statistics. It emerges from associative knowledge.

Scalability. Scalability is the narrow bridge between science and technology. What you can carry from the science side to the technology side over this bridge determines the level of capabilities in the real world. The science of semantics is huge, stemming from the basics of philosophy. But Web search is a highly particular problem with stringent constraints (the narrow bridge). Designing semantic algorithms to drive a Web search engine is like walking on eggshells and requires a completely new approach. Therefore, a semantic algorithm can be very sophisticated, but that does not mean it is a semantic search algorithm suitable for the Web.

The five issues I addressed above explain what is NOT semantic search and should guide interested readers in questioning emerging technologies at SemTech10. Structured data, morphology, syntax, statistics, and scalability are the key questions to discuss. Obviously, no one would be afraid to ask these questions, unlike what the title suggests, but if you understood the title, it was your semantic brain in action. That was my last example of “what is semantics” in this article.

Inspired by hakia, Bing introduces categorized search

June 2nd, 2009 by Melek Pulatkonak, COO

catsearch

Bing, the new search engine from Microsoft, just went live and in doing so introduced a similar version of hakia’s categorized search. At its launch in 2006, hakia became the first search engine to provide categorized aspects of search queries via hakia Galleries.

hakia Galleries received industry accolades after their formal introduction in 2007. Our goal has always been to take search beyond 10 blue links. It was then no surprise when Microsoft invited us to show them the inner workings of the hakia Galleries in July 2008, shortly after their acquisition of Powerset. But it was a huge surprise to recently find out that Microsoft introduced categorized search in Bing. Today we checked out the Bing preview and compared Bing’s categorized search feature to its inspiration, hakia Galleries.

hakia Galleries provide categorized aspects of search queries. For example, if you are searching for Obama, you can find information about his official site, headline news, images, biography, speeches, and more (see image below). Powered by semantic search, hakia Galleries provide 17 aspects of this query. We save the user time by answering 17 Obama-related questions in one search. Compare the hakia Obama gallery with the same search at Bing.com (Bing provides only 7 aspects of this search query).

hakiaobama
bingobama1

Let’s look at another example. Search for lung cancer at hakia and Bing. hakia provides the searcher links for the following aspects of this query: Basic Information and FAQ, Image Search, Headline News, Symptoms and Diagnostics, Treatment, Procedures, and Therapy, News, Clinical Trials, Healthcare Facilities and Finding a Physician, Alternative Therapy, For Kids, Research and Statistics, Organizations, Message Boards, and Images. Compare that with Bing’s aspects: articles, symptoms, treatment, prognosis, stages, clinical trials, and images. Look familiar?

hakialc
binglc1

As Danny Sullivan put it aptly in his Bing review, “Probably the most significant change is that Bing now organizes search results into categories (gives Obama example)…The concept of grouping results also isn’t new. Long known as clustering, you can see it in operation at hakia (see Obama there) or Clusty (again, see Obama there).”

At hakia we could not dream of a marketing budget of $80-100 million. But hey, if you are out there to try Bing as an alternative search engine to Google, give the original categorized search a try at hakia.com (one of Bing’s inspirations!). You can surf the hakia Galleries here: http://gallery.hakia.com/ or try your search at hakia.com when you bing and ding.

A New Contextual Advertising Technology from hakia: CONTEXA, launched at ReadWriteWeb

May 19th, 2009 by Kartal Guner, Chief Architect

We are happy to announce that we have launched the new contextual advertising module of our semantic advertising system: CONTEXA. ReadWriteWeb (RWW), one of the world’s top 20 most popular blogs according to Technorati, is our first partner.

CONTEXA provides page-level contextual analysis on-the-fly and outputs keywords that represent the meaning of the page along with their meaning scores. CONTEXA is offered as a service and can be integrated into any ad system. In RWW’s integration, our system matches the contextual representation of a blog page with sponsors’ requirements on-the-fly to provide relevant ads to RWW readers for a richer experience. The red box in the image below shows this step.

rww
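For integrators wondering what the matching step might look like on their side, here is a minimal sketch. It assumes the page analysis has already produced lowercase keywords with meaning scores, and that each sponsor supplies a short list of phrases. The scoring rule (sum of the scores of matching phrases), the `rank_sponsors` function, and all sample data are our own illustration, not the CONTEXA algorithm or its API.

```python
from typing import Dict, List, Tuple


def rank_sponsors(
    page_keywords: Dict[str, float],
    sponsor_phrases: Dict[str, List[str]],
    min_score: float = 0.0,
) -> List[Tuple[str, float]]:
    """Rank sponsors by how strongly their phrases appear in the page.

    `page_keywords` maps a lowercase keyword to its meaning score for the
    page; a sponsor's score is the sum over its phrases found on the page.
    """
    ranked: List[Tuple[str, float]] = []
    for sponsor, phrases in sponsor_phrases.items():
        score = sum(page_keywords.get(p.lower(), 0.0) for p in phrases)
        if score > min_score:
            ranked.append((sponsor, score))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Hypothetical page analysis output and sponsor requirements.
page = {"semantic search": 0.9, "advertising": 0.6, "startups": 0.2}
sponsors = {
    "SponsorA": ["Semantic Search", "nlp"],
    "SponsorB": ["startups"],
    "SponsorC": ["cloud hosting"],
}
matches = rank_sponsors(page, sponsors)
```

Raising `min_score` is one simple way to trade traffic for relevance: fewer pages qualify, but the ones that do are more strongly about the sponsor’s topic.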

We believe that more relevant contextual ads will bring the return of contextual advertising closer to paid-search levels, with a ripple effect from increased CTR to conversion rates to revenue. CONTEXA is powered by hakia’s semantic core technology. To see how CONTEXA works, you can visit our CONTEXA page.

In the fall, we shared with our readers a comparison demo of hakia’s contextual capabilities with those of AdSense and Yahoo. We did not have a chance to do a comparison with Microsoft’s PubCenter. As we move along with ReadWriteWeb’s implementation of CONTEXA, we will report on lessons learned and milestones marked.

We are excited to keep the wheels of innovation turning at hakia, as our industry has plenty of room for improvement. Today, Web users are overwhelmed by the quantity and suffer from the quality of display ads, and quickly learn to ignore a good portion of the Web pages they visit. In the long run, the industry’s focus will have to shift to increasing ad quality and limiting the supply to increase value. The path to this promise goes through enhancements to both contextual and behavioral ad targeting technologies. We are happy to partner with ReadWriteWeb, a kindred-spirited innovator, for the beginning of a journey to provide more relevant contextual ads.

To learn more about CONTEXA, please contact bdev at hakia.com. We are more than happy to set you up with a custom demo.

Once again, hakia is a Webware 100 finalist – Please Vote!

April 2nd, 2009 by hakia Team

webware100.jpg

hakia has experienced amazing momentum over the past year, and we are proud to announce that we are once again a finalist for the prestigious CNET Webware 100 awards! The Webware 100 Awards recognize the 100 best Web 2.0 applications, chosen by Webware readers and Internet users across the globe.

Last year, over 1.9 million votes were cast to select the winners, including hakia in the “Search and Reference” category. To make that a reality once again, please vote for us here!

Thanks to our community of users for supporting the search engine and recognizing the importance of semantic technology for the future of the Internet. We look forward to more progress to come as we near completion of development.

Automated Categorization of Search Results, a New Era?

March 23rd, 2009 by hakia Team

Since the hakia Galleries have been on-line, we have received nothing but praise. Our proprietary approach to “Aspect Categorization” shines with examples in topics ranging from music to health. We currently cover more than a million popular queries.

hakia’s fully automated gallery production, where the search results are categorized according to the query, can be seen at the following demo link, which covers 1425 different car brands and models.

Car Brands & Models.

This is part of our ongoing effort to spread this capability to all search queries, effectively creating a new organization of the content on the entire Web, in a way as distinct as how Wikipedia invented its own style.

Microsoft’s recent news about KUMO and its screen-shots leave no doubt that some people are already convinced this is the way to progress in search.

Aspect categorization is different from what some search engines are already doing. For example, dividing the SERP into Web Results, Videos, News, Images, etc., is not aspect categorization. However, when the categories are related to the query, such as Obama’s Speeches and Quotes, Obama’s Fans, etc. (for the query Obama), then it is aspect categorization.

Aspect categorization in search is a tough business. It requires careful off-line analysis to determine how the categories will be decided algorithmically, which resources will be identified for crawling, and how the results will be detected to fit in.

The effectiveness of this approach in the broad search space is yet to be seen, and the users will have the last word as always. The tech bloggers and authors will be able to make their own judgment and recognize the limitations and imitations. In light of our patent application in progress, we are also anxious to see where all this leads. Some exciting times are ahead. Until then, happy searching at hakia.

Books, Bytes and Trees: What Do You Know?

February 11th, 2009 by hakia Team

We put together a fun quiz and invite you to stop thinking about the economy/stimulus package/your job and take a moment to ponder the size of information overload/resources/pollution in the Internet age. We think about searching it better, all the time!

Here is a teaser, the first question.

hakiaquiz

Take the hakia Quiz now at http://company.hakia.com/quiz/quiz1.html. Enjoy!

hakia ScoopBar Now Highlights Pages Found by Other Search Engines

February 5th, 2009 by hakia Team

A new version of hakia ScoopBar (for both IE and Firefox browsers) has just been released. This version highlights search results in the opened Web pages that are found by hakia, as well as Google, Yahoo, Live, or any other search engine.

An example is shown below for the query “roman invasion of jerusalem” using Google.

gog1

With the hakia ScoopBar installed and the Highlight button activated (as shown above), you can open long documents and the search result will be located on the page by automatic scrolling and highlighting (as shown below).

gog2
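For the curious, the general idea of locating and highlighting a search result inside a long page can be sketched in a few lines. This is only a toy illustration and not how ScoopBar is implemented: it assumes plain text, picks the sentence containing the most distinct query words, and marks matches with square brackets. The function name and the sample text are hypothetical.

```python
import re
from typing import Optional, Tuple


def locate_and_highlight(document: str, query: str) -> Optional[Tuple[int, str]]:
    """Return (sentence index, highlighted sentence) for the best match.

    The best sentence is the one containing the most distinct query words;
    the first such sentence wins on ties. Returns None if nothing matches.
    """
    terms = set(query.lower().split())
    if not terms:
        return None
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    best_index, best_hits = -1, 0
    for index, sentence in enumerate(sentences):
        words = set(re.findall(r"\w+", sentence.lower()))
        hits = len(terms & words)
        if hits > best_hits:
            best_index, best_hits = index, hits
    if best_index < 0:
        return None
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in sorted(terms)) + r")\b",
        re.IGNORECASE,
    )
    highlighted = pattern.sub(lambda m: "[" + m.group(0) + "]", sentences[best_index])
    return best_index, highlighted


sample = (
    "Jerusalem is an ancient city. "
    "The Roman invasion of Jerusalem is covered in chapter three. "
    "Other chapters cover later periods."
)
result = locate_and_highlight(sample, "roman invasion of jerusalem")
```

In a browser, the sentence index would correspond to a scroll target and the brackets to highlight styling.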

Auto-highlighting is becoming increasingly important for tackling the 2nd search problem, especially for longer documents. The hakia team is committed to improving this functionality for Web searchers.

Note that hakia ScoopBar does not monitor user behavior, does not track Web traffic, and comes with an uninstall option. Give it a try and let us know your opinion.

Making Quality the Key to Web Searches

January 13th, 2009 by hakia Team

We are happy to share a commentary by our CEO, Dr. Riza Berkan, for the Project Syndicate that was published in the Japan Times:

In the not-so-distant future, students will be able to graduate from high school without ever touching a book. Twenty years ago, they could graduate from high school without ever using a computer. In only a few decades, computer technology and the Internet have transformed the core principles of information, knowledge, and education.

To read the full article click here.

Project Syndicate is an international association of quality newspapers devoted to bringing distinguished voices from across the world to local audiences everywhere, strengthening their independence, and upgrading their journalistic, editorial, and business capacities.