Archive for the 'News' Category

A New Commercial Ontology from hakia

July 27th, 2009 by Dr. Riza C Berkan, CEO

We are proud to announce our upcoming Commercial Ontology (CO), perhaps the world’s first. What is a commercial ontology? If you asked this question, you have just touched on an important distinction: fantasy versus reality. In the context of the World Wide Web, the CO is the realistic version of an ontology, for the reasons explained below.

The Realities of the Web

We have accomplished two important innovations in building the CO. First, the development of concepts and lexicons followed a strict guideline based on the realities of Web operations. What are these realities? Most search queries on the Web reflect a single dimension of intent, almost exclusively relevant to commercial topics. Note that “commercial topics” must be interpreted in the broadest sense possible. For example, if you were looking for “the benefits of foot massage” or “the director of the movie Last Emperor”, your queries fall into the same commercial pattern. One particular distinction of commercial queries is that they come in short packages that include a name (onomasticon), or they always refer to something sold, bought, watched, heard, etc.

In contrast, many ontologies (if not all) that have been built to date, or are claimed to have been, focus on the use of language in the general sense, not on commercial patterns on the Web. Therefore, their usefulness in tackling Web search queries is greatly compromised, sometimes to the point of absolute failure. If such an ontology could disambiguate a dozen different senses of the word “kill”, it would be sad news that the last 100,000 queries in the search logs did not include a single occurrence of the word “kill”. Like drowning in 2-inch-deep water, such ontologies will not get to use their disambiguation skills in nearly 80% of the queries, because the queries include nothing but onomasticons and/or are too short (under-articulated).

The Sequence Approach

The second innovation in the CO is the use of sequences instead of single words. A single word, like “kill”, is the most ambiguous state of information and is hardly ever used in human communication without a strong underlying or implied context. As a result, building a natural language processing (NLP) system that takes the single word as its unit of computation is an invitation to disaster.

In contrast, word sequences (2 or more words) are inherently safe and highly descriptive. Take “road kill”, for example. This sequence describes the corpse of an animal killed on the road by a passing vehicle. If a language processing system takes sequences as its unit of computation, 99% of the ambiguity problem vanishes. There is no need to process the words “kill” and “road” separately, trace their senses, and locate their convergence to identify the meaning of “road kill” if you can just take the sequence “road kill” itself as your unit of computation for mapping. This is depicted below:

[Figure: tracing “road kill” through a conventional ontology versus the sequence approach]

Note the number of traces required in a conventional ontology approach compared to the sequence approach. The sequence approach requires a lot of data storage space (which is dirt cheap), whereas the conventional ontology approach requires a lot of CPU for a simple mapping task (which is expensive). But the bad news does not stop there. The trace routes in a conventional ontology require manual work (impossible to automate), whereas a sequence-based ontology can easily be built via automation.
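To make the trade-off concrete, here is a minimal Python sketch. It is not hakia's implementation: the sense inventory, the sequence lexicon, and the function names are all hypothetical toy data invented for illustration. It contrasts the number of sense combinations a word-by-word approach would have to weigh with the single lookup a sequence approach needs.

```python
from itertools import product

# Hypothetical toy sense inventory (illustrative only, not hakia's data).
WORD_SENSES = {
    "road": ["paved way", "route to a goal", "sea anchorage"],
    "kill": ["cause death", "stop a process", "hunted prey", "tidal creek"],
}

# Hypothetical sequence lexicon: the whole phrase is the key.
SEQUENCE_LEXICON = {
    "road kill": "animal killed on a road by a passing vehicle",
}


def word_level_traces(phrase):
    """Enumerate every sense combination a word-by-word approach must weigh."""
    senses = [WORD_SENSES[word] for word in phrase.split()]
    return list(product(*senses))


def sequence_lookup(phrase):
    """Resolve the phrase with a single dictionary lookup (storage over CPU)."""
    return SEQUENCE_LEXICON.get(phrase)


traces = word_level_traces("road kill")
print(len(traces), "candidate sense pairs to trace")
print("sequence lookup:", sequence_lookup("road kill"))
```

With these toy numbers, the word-level route has 3 × 4 candidate pairs to trace, while the sequence route is one hash lookup whose cost is paid in storage.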

I realize only a handful of people will understand the second point above. Nevertheless, the scalability and performance of the end product will speak for itself when we put the testing platform on-line.

Usage of the Commercial Ontology

The immediate use of the CO is for search queries, or document characterizations, that are not tied to any advertising in conventional systems. This unrecognized domain of search queries and characterizations means lost revenue. hakia’s CO is designed to fill this gap. For example, if the search query or page characterization is “beat generation”, the CO can map it to “literature” on the fly. As a result, systems using the CO will have a much deeper understanding of the incoming terms and will thus be able to recognize the underlying intent beyond the face value of the words. The same capability can be used in a number of places other than advertising, with the same effect.
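As a rough illustration of this kind of on-the-fly mapping, the Python sketch below scans a query for the longest known word sequence and returns its category. Only the “beat generation” to “literature” pair comes from this post. The other lexicon entry, the function name `map_query`, and the `max_len` parameter are assumptions made up for the example, and the real CO may work quite differently.

```python
# Hypothetical sequence-to-category lexicon. Only the "beat generation"
# entry is taken from the post; the rest is invented for illustration.
CATEGORY_LEXICON = {
    "beat generation": "literature",
    "foot massage": "health",
}


def map_query(query, lexicon=CATEGORY_LEXICON, max_len=4):
    """Return (sequence, category) pairs found by greedy longest match.

    Only sequences of two or more words are considered, so a lone
    ambiguous word is never mapped on its own.
    """
    tokens = query.lower().split()
    matches = []
    i = 0
    while i < len(tokens):
        # Try the longest window first, down to two words.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            sequence = " ".join(tokens[i:i + n])
            if sequence in lexicon:
                matches.append((sequence, lexicon[sequence]))
                i += n
                break
        else:
            i += 1  # no sequence starts here; move on one word
    return matches


print(map_query("beat generation poets"))
print(map_query("the benefits of foot massage"))
```

An ad system could then use the returned category as a fallback keyword when the raw query has no advertiser attached to it.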

Stay tuned for the release of the first version of our commercial ontology.

Inspired by hakia, Bing introduces categorized search

June 2nd, 2009 by Melek Pulatkonak, COO

Bing, the new search engine from Microsoft, just went live and in doing so introduced a similar version of hakia’s categorized search. At its launch in 2006, hakia became the first search engine to provide categorized aspects of search queries via hakia Galleries.

hakia Galleries received industry accolades after their formal introduction in 2007. Our goal has always been to take search beyond 10 blue links. It was then no surprise when Microsoft invited us to show them the inner workings of the hakia Galleries in July 2008, shortly after their acquisition of Powerset. But it was a huge surprise to recently find out that Microsoft introduced categorized search in Bing. Today we checked out the Bing preview and compared Bing’s categorized search feature to its inspiration, hakia Galleries.

hakia Galleries provide categorized aspects of search queries. For example, if you are searching for Obama, you can find information about his official site, headline news, images, biography, speeches, and more (see image below). Powered by semantic search, hakia Galleries provide 17 aspects of this query. We save the user time by answering 17 Obama-related questions in one search. Compare the hakia Obama gallery with the same search at Bing.com (Bing provides only 7 aspects of this search query).

[Screenshot: hakia Gallery for the query “Obama”]
[Screenshot: Bing results for the query “Obama”]

Let’s look at another example. Search for lung cancer at hakia and Bing. hakia provides the searcher links for the following aspects of this query: Basic Information and FAQ, Image Search, Headline News, Symptoms and Diagnostics, Treatment, Procedures, and Therapy, News, Clinical Trials, Healthcare Facilities and Finding a Physician, Alternative Therapy, For Kids, Research and Statistics, Organizations, Message Boards, and Images. Compare that with Bing’s aspects: articles, symptoms, treatment, prognosis, stages, clinical trials, and images. Look familiar?

[Screenshot: hakia Gallery for the query “lung cancer”]
[Screenshot: Bing results for the query “lung cancer”]

As Danny Sullivan put it aptly in his Bing review, “Probably the most significant change is that Bing now organizes search results into categories (gives Obama example)…The concept of grouping results also isn’t new. Long known as clustering, you can see it in operation at hakia (see Obama there) or Clusty (again, see Obama there).”

At hakia we could not dream of a marketing budget of $80-100 million. But hey, if you are out there to try Bing as an alternative search engine to Google, give the original categorized search a try at hakia.com (one of Bing’s inspirations!). You can surf the hakia Galleries here: http://gallery.hakia.com/ or try your search at hakia.com when you bing and ding.

A New Contextual Advertising Technology from hakia: CONTEXA, launched at ReadWriteWeb

May 19th, 2009 by Kartal Guner, Chief Architect

We are happy to announce that we have launched CONTEXA, the new contextual advertising module of our semantic advertising system. ReadWriteWeb (RWW), one of the world’s top 20 most popular blogs according to Technorati, is our first partner.

CONTEXA provides page-level contextual analysis on-the-fly and outputs keywords that represent the meaning of the page along with their meaning score. CONTEXA is offered as a service and can be integrated into any ad system. RWW has integrated CONTEXA where our system matches the contextual representation of a blog page with sponsors’ requirements on-the-fly to provide relevant ads to RWW readers for a richer experience. The red box in the image below shows this step.
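The matching step can be pictured with a small Python sketch. This is not the CONTEXA API: the function `match_ads`, the input shapes, the sponsor names, and the scores are all hypothetical. It only assumes what the paragraph above states, namely that a page is represented by keywords with meaning scores and that sponsors declare the topics they want.

```python
def match_ads(page_keywords, sponsors, min_score=0.0):
    """Rank sponsors by how well their wanted keywords fit a page.

    page_keywords: dict mapping keyword -> meaning score for the page
                   (a stand-in for the output of a contextual analyzer).
    sponsors:      dict mapping sponsor name -> set of wanted keywords.
    Returns (sponsor, score) pairs, best first; sponsors scoring at or
    below min_score are dropped.
    """
    ranked = []
    for name, wanted in sponsors.items():
        score = sum(page_keywords.get(keyword, 0.0) for keyword in wanted)
        if score > min_score:
            ranked.append((name, score))
    # Highest score first; ties broken alphabetically for stable output.
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked


# Hypothetical page representation and sponsor requirements.
page = {"semantic search": 0.75, "advertising": 0.5, "blogging": 0.25}
sponsors = {
    "AdTechCo": {"advertising", "semantic search"},
    "TravelCo": {"flights"},
    "BlogHost": {"blogging"},
}
print(match_ads(page, sponsors))
```

Summing scores is the simplest possible relevance measure; a production system would presumably weight and normalize them, but the shape of the step (page keywords in, ranked sponsors out) stays the same.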

[Screenshot: a ReadWriteWeb page with the ad-matching step marked by a red box]

We believe that more relevant contextual ads will bring the return of contextual advertising closer to paid-search levels, with the ripple effect of increased CTR, conversion rates, and revenue. CONTEXA is powered by hakia’s semantic core technology. To see how CONTEXA works, you can visit our CONTEXA page.

In the fall, we shared with our readers a demo comparing hakia’s contextual capabilities with those of AdSense and Yahoo. We did not have a chance to do a comparison with Microsoft’s PubCenter. As we move along with ReadWriteWeb’s implementation of CONTEXA, we will report on lessons learned and milestones reached.

We are excited to keep the wheels of innovation turning at hakia, as our industry has plenty of room for improvement. Today, Web users are overwhelmed by the quantity of display ads and suffer from their quality, and they quickly learn to ignore a good portion of the Web pages they visit. In the long run, the industry’s focus will have to shift to increasing ad quality and limiting the supply to increase value. The path to this promise goes through enhancements to both contextual and behavioral ad targeting technologies. We are happy to partner with ReadWriteWeb, a kindred-spirited innovator, at the beginning of a journey to provide more relevant contextual ads.

To learn more about CONTEXA, please contact bdev at hakia.com. We are more than happy to set you up with a custom demo.

Once again, hakia is a Webware 100 finalist – Please Vote!

April 2nd, 2009 by hakia Team

hakia has experienced amazing momentum over the past year, and we are proud to announce that we are once again a finalist for the prestigious CNET Webware 100 awards! The Webware 100 Awards recognize the 100 best Web 2.0 applications, chosen by Webware readers and Internet users across the globe.

Last year, over 1.9 million votes were cast to select the winners, including hakia in the “Search and Reference” category. To make that a reality once again, please vote for us here!

Thanks to our community of users for supporting the search engine and recognizing the importance of semantic technology for the future of the Internet. We look forward to more progress to come as we near completion of development.

Books, Bytes and Trees: What Do You Know?

February 11th, 2009 by hakia Team

We put together a fun quiz and invite you to stop thinking about the economy/stimulus package/your job and take a moment to ponder the size of information overload/resources/pollution in the Internet age. We think about searching it better, all the time!

Here is a teaser, the first question.

[Image: the first question of the hakia Quiz]

Take the hakia Quiz now at http://company.hakia.com/quiz/quiz1.html. Enjoy!

hakia ScoopBar, Now Highlights Pages Found by Other Search Engines.

February 5th, 2009 by hakia Team

A new version of hakia ScoopBar (for both IE and Firefox browsers) has just been released. This version highlights search results in the opened Web pages that are found by hakia, as well as Google, Yahoo, Live, or any other search engine.

An example is shown below for the query “roman invasion of jerusalem” using Google.

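[Screenshot: Google results for “roman invasion of jerusalem” with the hakia ScoopBar Highlight button activated]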

With the hakia ScoopBar installed and the Highlight button activated (as shown above), you can open long documents and the search result will be located on the page by automatic scrolling and highlighting (as shown below).

[Screenshot: an opened document scrolled to the highlighted search result]

Auto-highlighting is becoming increasingly important for tackling the second search problem (finding the relevant passage within an opened page), especially for longer documents. The hakia team is committed to improving this functionality for Web searchers.
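For readers curious what locating a result inside a page involves, here is a deliberately naive Python sketch. It is not how ScoopBar works internally: it simply picks the sentence sharing the most words with the query and wraps it in markers. The function names, the `[[`/`]]` markers, and the sample text are assumptions made up for the example.

```python
import re


def best_sentence(document, query):
    """Return the sentence sharing the most distinct words with the query."""
    terms = set(re.findall(r"\w+", query.lower()))
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    best, best_score = None, 0
    for sentence in sentences:
        words = set(re.findall(r"\w+", sentence.lower()))
        score = len(terms & words)
        if score > best_score:
            best, best_score = sentence, score
    return best


def highlight(document, query, start="[[", end="]]"):
    """Wrap the best-matching sentence in markers; leave the rest untouched."""
    sentence = best_sentence(document, query)
    if sentence is None:
        return document
    return document.replace(sentence, start + sentence + end, 1)


page = (
    "The city has a long history. "
    "The Roman invasion of Jerusalem is covered in chapter two. "
    "Maps are in the appendix."
)
print(highlight(page, "roman invasion of jerusalem"))
```

Word overlap is a keyword-level stand-in here; the point is only that the passage has to be found before the browser can scroll to it.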

Note that hakia ScoopBar does not monitor user behavior, does not track Web traffic, and comes with an uninstall option. Give it a try and let us know your opinion.

Making Quality the Key to Web Searches

January 13th, 2009 by hakia Team

We are happy to share a commentary by our CEO, Dr. Riza Berkan, for Project Syndicate that was published in the Japan Times:

In the not-so-distant future, students will be able to graduate from high school without ever touching a book. Twenty years ago, they could graduate from high school without ever using a computer. In only a few decades, computer technology and the Internet have transformed the core principles of information, knowledge, and education.

To read the full article click here.

Project Syndicate is an international association of quality newspapers devoted to bringing distinguished voices from across the world to local audiences everywhere, strengthening their independence and upgrading their journalistic, editorial, and business capacities.

Did Someone Just Expose Semantic Data?

January 12th, 2009 by Dr. Riza C Berkan, CEO

This is a response to Marshall Kirkpatrick’s recent post Did Google Just Expose Semantic Data in Search Results?.

There have been many trivializing depictions of semantic search and semantic Web in the blogosphere, so much so that I might have developed an allergic reaction reading them. However, Marshall is doing the right thing by provoking us to define this space better.

First of all, what is “semantic data”? Judging by the examples described, I think what this means is “syntactic extraction”. Extraction by fitting syntactic patterns, sorry to disappoint some of you folks, is really not semantic analysis. The extraction problem has been around for many years, and extraction is being implemented all over the market in enterprise (and government) applications.

Take a word pattern “what is the capital of –” or “what is the capital city of –”. Then, obtain a two-column list of the capital cities around the world from the Web. After 12 minutes 34 seconds of programming, you will have an extraction algorithm (extraction from the query) just like the one Google uses in these examples… This is not semantic analysis.
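To show how little is involved, here is roughly what those few minutes of programming could produce in Python. This is a sketch of the pattern-plus-lookup idea described above, not a claim about how Google implements it; the three-row table stands in for the two-column list, and the names `CAPITALS`, `PATTERN`, and `answer` are made up for the example.

```python
import re

# Stand-in for the two-column list of countries and capitals.
CAPITALS = {
    "france": "Paris",
    "japan": "Tokyo",
    "turkey": "Ankara",
}

# Matches "what is the capital of X" and "what is the capital city of X",
# with an optional trailing question mark.
PATTERN = re.compile(
    r"^what is the capital(?: city)? of (?P<country>.+?)\??$",
    re.IGNORECASE,
)


def answer(query):
    """Return the capital if the query fits the pattern, else None."""
    match = PATTERN.match(query.strip())
    if match is None:
        return None
    return CAPITALS.get(match.group("country").strip().lower())


print(answer("What is the capital of France?"))
print(answer("what is the capital city of Japan"))
print(answer("capital punishment in France"))
```

The third query shows the limit: the code has no notion of what “capital” means, so anything outside the literal pattern falls through. That is exactly why this counts as syntactic extraction rather than semantic analysis.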

Going one step further, you can sit down and define patterns until the cows come home, and end up with a large library of extraction algorithms. You might scan through Wikipedia to collect data (if you don’t care about proper authorship and credibility). Then you will have something useful, no doubt about it. However, these are not to be considered semantic analyses.

Bruno Haid expressed his concern by using the terminology “structured versus unstructured platform” for the target of extraction. That is still not enough differentiation between syntax versus meaning in my book. For anything to be considered “semantic” there has to be a model of understanding, involving concepts and associations.

I recommend an old article written by George A. Miller on the ambiguity of words, which should inspire a thought as to why a syntax-only approach cannot replace meaning. We had posted a fun example here following Bill Gates’ vision. An example of semantic parsing was also posted here previously.

The most important question is how to implement semantic analysis in a search engine environment. The examples in Marshall’s post do not come close to any kind of semantic analysis beyond a simple extraction operation. Google has not yet shown any clues that would make us think of an actual semantic back-end.

Search Box: Keep Your Curious Visitors on Site

January 6th, 2009 by hakia Team

With the start of 2009, we have just released a new and improved version of hakia Search Box. To see how it works, go to Search Box Page.

One immediate distinguishing feature of hakia Search Box is its flexibility to search in multiple domains as shown below.

[Screenshot: hakia Search Box searching multiple domains]

The second distinguishing feature is its sentence highlighting and semantic precision (especially with complex, long-tail, and unusual queries) as shown below. Note the uninterrupted text snippets (no ellipses) for PubMed and health searches.

[Screenshot: hakia Search Box results with sentence highlighting]

There are several ASP and PHP examples on the page with design options as outlined below:

- Web Plus Search (multiple domains as shown above)
- Site search (pick a site to search only its content)
- PubMed search (search results from 10 million PubMed articles)
- Health search (search results from credible Web sources on health)

It is free up to 30,000 searches per day (which is the highest number offered to date).

Why do you need a good search box on your site? Well, you don’t want those curious visitors to leave your site and go to a search engine. With a good search box, you will keep them on your Web property.

If you already have a search box on your Web site and you are not sure what to do, you can add hakia’s search box as a semantic search option.

Give it a try and let us know.

Make Your Own Digital Newspaper

December 23rd, 2008 by hakia Team

Before heading into the holidays, one may wish that the news we get every day were somehow customized to our interests. For example, “I am not really interested in baseball, or I would like jazz news to appear at first glance, or I need to monitor emerging progress on synthetic insulin, or…” People can have a variety of top-priority interests, but they have to collect this information from different places every day, or by clicking through a bunch of links. Why not have my own newspaper where every column is about an interest I selected, laid out the way I want?

We built my.hakia.com, which does exactly what is described above. A screenshot is shown below.

[Screenshot: a my.hakia.com page with one column per selected interest]

The screenshot above tells the whole story except one important differentiator.

hakia’s semantic technology allows a high level of precision compared to any other similar platform. This enables the user to park highly specific questions against the emerging news. Therefore, my.hakia.com can be considered an “intelligence gathering dashboard”. Let us tell you how.

If you search Google news for Obama’s strategy for the new team, you will see that the results are mostly irrelevant. Try to create a Google alert for this query and see the results for yourself on a continuous basis.

The same search at hakia for Obama’s strategy for the new team produces dead-on results. This is because semantic technology, unlike Google-esque search engines, does not need “link referrals” to pull relevant results. For dynamic content like news, there is no time to collect “link-referral” statistics, and that is why Google-esque search engines fail beyond simple triggers. Try the same query at Yahoo; it displays the same confusion.

This fundamental differentiation is a valuable asset for my.hakia.com users because they will be getting precise results for specific interests that they cannot get anywhere else. Some ideal cases are outlined below:

- Monitor your business competitors
- Get information on latest progress in the treatment of diseases
- Keep an eye on your favorite artists by activity (like album releases)
- Stay in touch with particular economic developments (such as in real estate in your city)

Try my.hakia.com, and tell us if we have met your expectations.

Happy holidays