Archive for December, 2008

Make Your Own Digital Newspaper

December 23rd, 2008 by hakia Team

Before heading into the holidays, one may wish that the news we get every day were customized to our interests. For example: “I am not really interested in baseball,” or “I would like jazz news to appear at first glance,” or “I need to monitor emerging progress on synthetic insulin,” and so on. People can have a variety of top-priority interests, but they have to collect this information from different places every day, or by clicking through a bunch of links. Why not have my own newspaper where every column is about an interest I selected, laid out the way I want?

We built my.hakia.com, which does exactly what is described above. A screenshot is shown below.

[Screenshot: my.hakia.com]

The screenshot above tells the whole story except for one important differentiator.

hakia’s semantic technology allows a higher level of precision than any other similar platform. This enables the user to park highly specific questions against the emerging news. Therefore, my.hakia.com can be considered an “intelligence gathering dashboard.” Let us tell you how.

If you search Google News for Obama’s strategy for the new team, you will see that the results are mostly irrelevant. Try creating a Google alert for this query and see the results for yourself on a continuous basis.

The same search at hakia for Obama’s strategy for the new team produces dead-on results. This is because semantic technology, unlike Google-esque search engines, does not need “link referrals” to pull relevant results. For dynamic content like news, there is no time to collect link-referral statistics, which is why Google-esque search engines fail beyond simple triggers. Try the same query at Yahoo; it displays the same confusion.

This fundamental difference is a valuable asset for my.hakia.com users because they get precise results for specific interests that they cannot get anywhere else. Some ideal use cases are outlined below:

- Monitor your business competitors
- Get information on the latest progress in the treatment of diseases
- Keep an eye on your favorite artists by activity (like album releases)
- Stay in touch with particular economic developments (such as in real estate in your city)

Try my.hakia.com, and tell us if we have met your expectations.

Happy holidays

Scalability of Semantic Search on the Web

December 4th, 2008 by Dr. Riza C Berkan, CEO

Let’s start with an analogy.

If you have ever had a flat tire, you know what it takes to change one. Using the standard equipment available in the trunk of any car, changing a flat tire takes anywhere from 15 to 30 minutes. The job requires minimal knowledge.

In Formula 1 races, changing a tire takes under 8 seconds. It requires high-tech equipment engineered specifically for the task, and trained professionals to do it fast.

The difference between an arbitrary semantic technology and one that can become a Web search engine is very much like the difference between changing a flat tire on an ordinary car and changing one on a Formula 1 race car.

Powerset’s limited coverage (Wikipedia only) was a recent example that raised awareness of the scalability issue among technology-savvy readers. Without overcoming the scalability challenge, a semantic technology cannot become a Web application, nor can it become a solution in enterprises handling vast amounts of documents.

Compared to conventional indexing search engines (with a popularity flavor), a semantic search engine carries an extra load, because semantic algorithms do much more than indexing search engines do, both at the back end and on the fly. The extra load consists of the concept maps (ontology) and lexicon that are necessary to analyze Web pages as well as incoming queries.

Consequently, if you have an indexing search engine, it becomes a nightmare to add the extra “semantic” load on top of indexing. This is the basic reason why conventional search engines like Google cannot be easily converted into semantic search engines unless the entire infrastructure is redesigned from the bottom up.

At hakia, we have engineered a solution that diminishes the extra “semantic” load by re-inventing the indexing operation so that it is suitable for semantic operations. It is called QDEXing, which stands for “query detection and extraction.”

QDEX is not a table of words versus document IDs; rather, it is a table of extracted queries (word sequences) versus paragraph IDs. It can also be viewed as the decomposition of text into its most meaningful knowledge sequences. Once such a decomposition is done accurately, there is no need for a table-like index, and no need for taking intersections. The extra load evaporates via direct access gateways.
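To make the contrast concrete, here is a minimal sketch in Python. It is not hakia's implementation: the function names are hypothetical, and the "extraction" step is a crude stand-in that simply records every contiguous run of up to three words, whereas the real QDEX extraction is described as semantic. It only illustrates the difference between intersecting word postings and looking up a whole word sequence directly.

```python
from collections import defaultdict


def build_inverted_index(paragraphs):
    """Conventional index: each word maps to the IDs of paragraphs containing it."""
    index = defaultdict(set)
    for pid, text in paragraphs.items():
        for word in text.lower().split():
            index[word].add(pid)
    return index


def search_inverted(index, query):
    """Answer a query by intersecting the posting sets of its words."""
    postings = [index.get(word, set()) for word in query.lower().split()]
    return set.intersection(*postings) if postings else set()


def build_sequence_table(paragraphs, max_len=3):
    """Sequence table: each word sequence maps to paragraph IDs.

    Assumption: every contiguous run of 1..max_len words is stored. This is a
    placeholder for semantic extraction, which would keep far fewer sequences.
    """
    table = defaultdict(set)
    for pid, text in paragraphs.items():
        words = text.lower().split()
        for n in range(1, max_len + 1):
            for start in range(len(words) - n + 1):
                table[" ".join(words[start:start + n])].add(pid)
    return table


def search_sequences(table, query):
    """Answer a query with a single lookup; no intersection is taken."""
    return table.get(" ".join(query.lower().split()), set())


# Hypothetical two-paragraph corpus.
paragraphs = {
    1: "synthetic insulin production improves",
    2: "insulin prices rise while synthetic fuel production stalls",
}

inverted_hits = search_inverted(build_inverted_index(paragraphs), "synthetic insulin")
sequence_hits = search_sequences(build_sequence_table(paragraphs), "synthetic insulin")
```

In this toy corpus the word intersection matches both paragraphs, because each contains "synthetic" and "insulin" somewhere, while the sequence lookup matches only the paragraph where the two words actually occur together.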

Extracting query sequences from a typical Web page (500 words) is the key component of QDEXing. Normally, up to 1000 query sequences that make sense on human inspection could be extracted from a Web page. However, the permutation space of 500 words is huge, and there could be a billion possible sequences. The challenge is for a computer algorithm to find the 1000 that make sense out of that billion. That is where the semantic technology is heavily used.
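A rough back-of-the-envelope check shows how quickly this space grows. The sketch below makes simplifying assumptions that are not in the text: it treats the 500 words as distinct and counts ordered selections of a fixed length without repetition. The function name is hypothetical.

```python
import math

PAGE_WORDS = 500  # typical page length used in the text


def count_ordered_sequences(num_words, length):
    """Number of ordered selections of `length` distinct words out of `num_words`."""
    return math.perm(num_words, length)


# Candidate counts for sequences of 1 to 4 words.
candidates = {
    length: count_ordered_sequences(PAGE_WORDS, length)
    for length in range(1, 5)
}

for length, count in candidates.items():
    print(f"{length}-word sequences: {count:,}")
```

Under these assumptions, three-word sequences alone already number over a hundred million, and four-word sequences number in the tens of billions, so picking out roughly 1000 meaningful ones cannot be done by brute-force enumeration and inspection.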

QDEXing is also the basis of identifying meaningful keywords for advertising applications as I explained in the previous blog article.

The scalability issue does not stop with solutions to the back end. Analyzing the incoming query on the fly and ranking the paragraphs retrieved from the QDEX is the second challenge. We have developed the SemanticRank algorithm just for this purpose, but I will talk about this in another blog post.

Our scalable solution has enabled us to QDEX credible Web pages, which you can see in a separate column of our search results. The document coverage of our QDEXing operation is increasing at a favorable pace, while hakia handles queries with an average response time within 1 second.