Semantic Advertising = Long Tail Monetization
One of the most important reasons for developing semantic technology is to increase the efficiency of online advertising. There are two immediate effects, which follow one another like a chain reaction. First, meaning-based relevancy increases the frequency of correct impressions (easy to agree with). Second, the number of correct impressions can be increased against long tail queries (not so easy to visualize). In this blog entry, I want to elaborate on the second point and explain how we tackle it at hakia.
The long tail consists of queries that have never been seen before, and it accounts for a significant fraction of daily traffic. Several search engines report that it makes up about one third of all daily queries. Obviously, if 33% of queries are unique every day, the share of unique queries over a year is much larger, resembling the famous iceberg picture.
To see what long tail queries look like, you do not have to review the search logs. You can simply look at the word distributions on your own Website. For example, if you have a Web page that includes 200 words describing your product/idea, then you can ask this question: How many possible questions/queries can be asked that your page has an answer to? If you are not a trained linguist, you will probably underestimate the number. But we can tell you that the number of possible queries/questions can be in the thousands for a page with 200 significant words.
These thousands of queries include all possible meaningful variations of the ideas you have presented on your Web page. Make no mistake, however: if you use a linear system with no semantics involved, the number of word permutations/combinations can be in the billions. Thus, being able to extract only the meaningful ones is the heart of the technological problem.
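To get a feel for the "billions" figure, here is a back-of-the-envelope count in Python. It is only a sketch under my own simplifying assumptions: a query is treated as a sequence of distinct words drawn from the 200 significant words on the page, and the query lengths (2 to 4 words) are chosen purely for illustration. The function names are made up for this example.

```python
import math

# Page size used in the text: 200 significant words.
SIGNIFICANT_WORDS = 200


def ordered_sequences(n_words: int, length: int) -> int:
    """Count ordered sequences of `length` distinct words (permutations)."""
    return math.perm(n_words, length)


def unordered_sets(n_words: int, length: int) -> int:
    """Count unordered sets of `length` distinct words (combinations)."""
    return math.comb(n_words, length)


# Assumption: query lengths of 2 to 4 words, picked only for illustration.
for length in (2, 3, 4):
    print(
        f"{length} words: "
        f"{ordered_sequences(SIGNIFICANT_WORDS, length):,} permutations, "
        f"{unordered_sets(SIGNIFICANT_WORDS, length):,} combinations"
    )
```

Under these assumptions, four-word sequences alone already number about 1.55 billion, while the meaningful queries are in the thousands. The gap between the two is what a semantic system has to close.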
It is infeasible to extract thousands of queries/questions manually for every Web page you have, not to mention creating an advertising campaign by bidding on these terms one by one. So, here is the definition of the long tail for you: Those queries/questions you are not extracting from your Web pages are the long tail queries. The ones that you are extracting are the fat tail, popular, and short queries, and most likely, you are using them in your advertising campaign today.
Here is an example. If you type “celebrex” into Google, you see the ad for this product, as shown below.
The advertiser bid on the term “celebrex.” They may also have bid on “arthritis”, “arthritis pain”, “arthritis treatment”, and so on. But they have not bid on the query “What can ease arthritis pain, stiffness, and inflammation?” as shown below:
There is no ad for this query because there are zillions of possible queries and the advertiser did not have the means to create campaigns for them. The cost of extracting them manually and then bidding on them one by one far exceeds the benefits. Therefore, long tail queries are not monetized successfully in today’s systems. The next challenge in online advertising is to take advantage of this sleeping giant.
At hakia, we have developed our own Semantic Advertising system to tackle this problem, and we are testing the system now. To tackle the long tail monetization problem, we built two important technologies: (1) QDEXing, which extracts all possible meaningful questions from a given page, and (2) automated long tail campaigns, so that advertisers do not have to build them manually.
Continuing with the example shown above, we have QDEXed the Web pages of Celebrex and created a long tail campaign in hakia’s simulation room. The long tail query “What can ease arthritis pain, stiffness, and inflammation?” brings an ad as shown below.
For Celebrex, there are thousands of queries extracted by QDEXing that include a vast number of semantic variations.
An advertiser in hakia’s system can create a “long tail” campaign by simply pointing to the URLs of his/her products and then making one bid for the entire campaign. To simplify the economics of it, the advertiser only pays for the successful long tail queries.
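To make the "one bid, pay only for success" idea concrete, here is a minimal Python sketch. It is not hakia's implementation. Everything in it is hypothetical: the class and method names, the bid amount, and the matching step, which uses simple normalized string lookup as a stand-in for QDEXing and semantic matching. I also assume that a "successful" query means one that matched the campaign and led to a click.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional, Set


def normalize(query: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in query.lower())
    return " ".join(kept.split())


@dataclass
class LongTailCampaign:
    """Hypothetical campaign: many extracted queries, a single bid."""

    ad_text: str
    bid: float  # one bid covers every query in the campaign
    queries: Set[str] = field(default_factory=set)
    spend: float = 0.0

    def add_extracted_queries(self, extracted: Iterable[str]) -> None:
        # In a real system an extractor would supply these from the
        # advertiser's URLs; here they are passed in by hand.
        self.queries.update(normalize(q) for q in extracted)

    def serve(self, user_query: str) -> Optional[str]:
        """Return the ad if the query matches; showing the ad costs nothing."""
        return self.ad_text if normalize(user_query) in self.queries else None

    def record_click(self, user_query: str) -> None:
        """Charge the single campaign bid only for a matched, clicked query."""
        if normalize(user_query) in self.queries:
            self.spend += self.bid


# Placeholder ad text and a made-up bid amount.
campaign = LongTailCampaign(ad_text="Celebrex ad (placeholder)", bid=0.25)
campaign.add_extracted_queries(
    ["What can ease arthritis pain, stiffness, and inflammation?"]
)
ad = campaign.serve("what can ease arthritis pain, stiffness and inflammation")
```

The point of the sketch is the shape of the data: the advertiser supplies one ad and one bid, the system supplies the query set, and the charge is tied to the outcome rather than to each keyword.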
Stay tuned for updates on hakia’s Semantic Advertising System.



September 12th, 2008 at 6:03 am
It is not just semantics, but in fact the syntactic-semantic search that is most important, as you exemplified. While semantics can be inferred pretty well from the bag-of-words, as most search engines do, and syntax-capturing isn’t always an easy affair, a good blend of the two is required to really capture the “meaning” and provide the relevant targeted ads. Good work and best wishes!
September 25th, 2008 at 4:58 am
Your post is very valuable, thanks
October 7th, 2008 at 12:00 pm
On the other hand, if you type the query “What can ease arthritis pain?” into Google, you do in fact get an ad. I wonder why the longer query you give as an example doesn’t yield any results, since “arthritis” is presumably the keyword of interest in both cases.
October 8th, 2008 at 8:33 am
Interesting, you always learn something.