Archive for January, 2009

Making Quality the Key to Web Searches

January 13th, 2009 by hakia Team

We are happy to share a commentary by our CEO, Dr. Riza Berkan, written for Project Syndicate and published in the Japan Times:

In the not-so-distant future, students will be able to graduate from high school without ever touching a book. Twenty years ago, they could graduate from high school without ever using a computer. In only a few decades, computer technology and the Internet have transformed the core principles of information, knowledge, and education.

To read the full article click here.

Project Syndicate is an international association of quality newspapers devoted to bringing distinguished voices from across the world to local audiences everywhere, strengthening their independence, and upgrading their journalistic, editorial, and business capacities.

Did Someone Just Expose Semantic Data?

January 12th, 2009 by Dr. Riza C Berkan, CEO

This is a response to Marshall Kirkpatrick’s recent post Did Google Just Expose Semantic Data in Search Results?.

There have been many trivializing depictions of semantic search and the semantic Web in the blogosphere, so much so that I might have developed an allergic reaction to reading them. However, Marshall is doing the right thing by provoking us to define this space better.

First of all, what is “semantic data”? Judging from the examples described, I think what is meant is “syntactic extraction.” Extraction by fitting syntactic patterns, sorry to disappoint some of you folks, is really not semantic analysis. The extraction problem has been around for many years, and solutions to it are implemented all over the market in enterprise (and government) applications.

Take a word pattern “what is the capital of –” or “what is the capital city of –”. Then, obtain a two-column list of the world's capital cities from the Web. After 12 minutes 34 seconds of programming, you will have an extraction algorithm (extraction from the query), just like the one Google uses in these examples… This is not semantic analysis.
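The pattern-plus-lookup approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `CAPITALS` table, the `PATTERN` regular expression, and the `extract_capital` function are hypothetical names invented here, the table is a tiny hand-typed stand-in for a list collected from the Web, and nothing in it reflects how Google actually implements its answers.

```python
import re

# Hypothetical two-column list (country -> capital). A real list would be
# collected from the Web, as described in the post; this is a tiny stand-in.
CAPITALS = {
    "france": "Paris",
    "japan": "Tokyo",
    "turkey": "Ankara",
}

# The two word patterns from the text, folded into one regular expression:
# "what is the capital of X" and "what is the capital city of X".
PATTERN = re.compile(
    r"^what is the capital(?: city)? of (?P<country>.+?)\??$",
    re.IGNORECASE,
)


def extract_capital(query):
    """Return the capital if the query fits the pattern, otherwise None."""
    match = PATTERN.match(query.strip())
    if match is None:
        return None  # the query does not fit the syntactic pattern
    country = match.group("country").strip().lower()
    return CAPITALS.get(country)  # None when the country is not in the list


if __name__ == "__main__":
    print(extract_capital("What is the capital of France?"))
    print(extract_capital("Which city is France governed from?"))
```

The second query asks for the same fact in different words, yet the function returns nothing because the wording does not fit the pattern. The code matches strings and looks up a table; it holds no concept of a country, a capital, or the relation between them.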

One step further, you can sit down and define patterns until the cows come home, and end up with a large library of extraction algorithms. You might scan through Wikipedia to collect data (if you don’t care about proper authorship and credibility). Then you will have something useful, no doubt about it. However, none of this should be considered semantic analysis.

Bruno Haid expressed his concern by using the terminology “structured versus unstructured platform” for the target of extraction. In my book, that still does not sufficiently differentiate syntax from meaning. For anything to be considered “semantic” there has to be a model of understanding, involving concepts and associations.

I recommend an old article by George A. Miller on the ambiguity of words, which should inspire some thought as to why a syntax-only approach cannot replace meaning. We posted a fun example here, following Bill Gates’ vision. An example of semantic parsing was also posted here previously.

The most important question is how to implement semantic analysis in a search engine environment. The examples in Marshall’s post do not come close to any kind of semantic analysis beyond a simple extraction operation. Google has not yet given any clue that would suggest an actual semantic back-end.

Search Box: Keep Your Curious Visitors on Site

January 6th, 2009 by hakia Team

With the start of 2009, we have just released a new and improved version of hakia Search Box. To see how it works, go to the Search Box page.

One immediate distinguishing feature of hakia Search Box is its flexibility to search in multiple domains as shown below.

[Image: searchbox]

The second distinguishing feature is its sentence highlighting and semantic precision (especially with complex, long-tail, and unusual queries) as shown below. Note the uninterrupted text snippets (no ellipses) for PubMed and health searches.

[Image: searchbox3]

There are several ASP and PHP examples on the page with design options as outlined below:

- Web Plus search (multiple domains as shown above)
- Site search (pick a site to search only its content)
- PubMed search (search results from 10 million PubMed articles)
- Health search (search results from credible Web sources on health)

It is free for up to 30,000 searches per day (which is the highest number offered to date).

Why do you need a good search box on your site? Well, you don’t want those curious visitors to leave your site and go to a search engine. With a good search box, you will keep them on your Web property.

If you already have a search box on your Web site and you are not sure what to do, you can add hakia’s search box as a semantic search option.

Give it a try and let us know.