Archive for May, 2008

We Thank BlogHer Community for Giving Us Feedback

May 29th, 2008 by Melek Pulatkonak, COO

We asked the BlogHer community to try our BETA search engine with health queries and give us feedback. We received about 300 responses, and were overwhelmed by the compliments. The BlogHers said the following:

What is your first impression of hakia compared to your favorite search engine?
17% said – hakia is better OVERALL.
22% said – hakia is better WHEN THE SEARCH QUERY IS MORE COMPLICATED (OR LONGER).
33% said – hakia is better IN CERTAIN TOPICS.
23% said – hakia is about the SAME as my current search engine.
5% said – hakia is WORSE than my current search engine.

After testing hakia, which one applies to you?
From now on, I will use hakia…
14% said – EXCLUSIVELY or MOST OF THE TIME
35% said – HALF OF THE TIME ALONG WITH MY OTHER FAVORITE SEARCH ENGINE.
28% said – OCCASIONALLY.
3% said – I will NOT USE hakia.
20% said – I have not yet decided.

Will you recommend hakia to your friends?
67% said – I WILL recommend it.
3% said – I WILL NOT recommend it.
30% said – I have not yet decided

The results are remarkably consistent with our earlier survey of 1579 participants, which increases the significance of the feedback.

We thank all participants for the feedback on our semantic search engine. We are constantly improving our technology, especially as we hope to complete the development of our BETA site this year. Your suggestions were therefore truly invaluable.

Farrah and I will be at the BlogHer conference in July. Watch out for the hakia t-shirts to spot us!

hakia Trivia

May 27th, 2008 by hakia Team

For a change of pace, we thought it would be fun to share with you some hakia trivia.

1- What is the most expensive asset in hakia’s office on Wall Street?

a) Victor’s collectible chocolate
b) A bull sculpture
c) Riza’s chess board
d) Barbecue set on the terrace

2- Which one of these facts about a hakia team member is true?

a) He was traded for a mainframe computer
b) He survived a shark attack
c) He is the cousin of Bill Gates

3- What does hakia have that Google doesn’t, involving chickens?

a) A datacenter surrounded by chicken wire
b) More engineers who are allergic to chicken feathers
c) A song dedicated to the query “why did the chicken cross the road?”

4- A company used hakia’s name without permission to do what?

a) Promote their hamburgers with the slogan “Hamburger that can only be found by hakia”
b) Promote their magazine with the slogan “Do you think hakia is a Japanese restaurant?”
c) Promote their video with an opening karate act with the sound “haaaa – kia”.

Come back for the answers.

Bill Gates Speaks Up For Semantics

May 23rd, 2008 by Dr. Christian Hempelmann, Chief Scientific Officer

Bill Gates’ criticism that pure keyword search is “just syntax and not semantics and has limits no matter how much you build those things up” came on the heels of a heated conversation I took part in at the Semantic Technology 2008 conference in San Jose. The session title was: Will Semantics Give Web Search a Face-lift?

It was clear from the outset that very different notions of semantics were in use, so a lively discussion ensued during the Q&A, where the panelists compared their own approaches to taking search to the next level. Since everyone belongs to a different school of thought, we agreed to disagree. Fernando Pereira, Research Director at Google, assumes that semantics can be captured from the use and formatting of language; ironically, he later stated that Wittgensteinian (meaning is use) or Fregean (meaning can be reduced to formal logic) approaches are futile. Google’s approach uses classic statistical machine-learning methods (robust, in the sense of a brick being a robust tool for switching off a light, but as we know non-scalable), so we know that there is no “semantics” focus. Peter Mika, a recent hire at Yahoo!, on the other hand, talked about their new SearchMonkey interface that is, inter alia, to be fed by RDF markup. Obviously, hakia’s position is rather different.

Keyword Co-occurrence Statistics as Semantics (Google)

Fernando killed the light with a rock.

If meaning is co-occurrence for you, then this sentence will be a possible answer to queries about people dying in avalanches. The structure of a web page would not help you much here, either. Seemingly relevant words, not disambiguated as to their actual senses in the given context, will easily mislead you.
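To make the point concrete, here is a toy sketch in Python of scoring by plain keyword overlap. It is only an illustration under stated assumptions, not any search engine's actual ranking: the function names `keywords` and `overlap_score`, the tiny stopword list, and the two example queries are all made up for this post.

```python
import re

# Hypothetical, minimal stopword list chosen for this illustration only.
STOPWORDS = {"the", "a", "with", "by", "in", "of"}

def keywords(text):
    """Lowercase the text and return its set of non-stopword tokens."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def overlap_score(query, sentence):
    """Fraction of the query's keywords that also occur in the sentence."""
    q = keywords(query)
    return len(q & keywords(sentence)) / len(q)

sentence = "Fernando killed the light with a rock."

# A query about a fatal rockfall shares 'killed' and 'rock' with the sentence.
print(overlap_score("hikers killed by falling rock", sentence))  # 0.5

# A query about what the sentence actually means shares no keyword at all.
print(overlap_score("how to switch off a lamp", sentence))       # 0.0
```

With no sense disambiguation, the unrelated query scores higher than the query the sentence actually answers.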

Syntax as Semantics (Powerset)

It should also be mentioned that Ron Kaplan, CSO of Powerset, made a few statements from the audience, including the very telling one that Powerset believes in the “syntactic web”, which pointedly illustrates his belief that you can get to meaning from surface syntax.

If meaning is syntax, then for you the sentence above is not distinguishable from this one:

Ron killed the program with a memory leak.

The surface structures of the sentences are identical, even some words overlap, but killing a light is different from killing a program (not to mention, killing an animate being), and the ‘rock’ is an instrument in the first case, while the ‘memory leak’ is a cause in the second. Syntax does not grant you access to any of these important differences in meaning.

Semantic-Web Markup as Semantics (Yahoo!)

If, on the other hand, you believe in semantic-web-style markup as the solution, then the author of the sentence will have to add tags that clarify that a lamp was switched off, hopefully in the same way that another user has tagged this sentence:

Peter used his usual brick to turn off the lamp.

Semantics as Semantics (hakia)

If, finally, you have access to semantics, your constraints on the different senses of ‘kill’, ‘light’, and ‘rock’ will get you to the meaning automatically, and you will serve the sentence above only as an answer to queries about methods to switch off lamps, and not pollute your results with it otherwise. For more examples, you can read my prior blog posts.

Where we currently are in search is nice, but there is much room for improvement. Non-semantic methods have reached their ceiling. Carefully tested and appropriate semantic methods, based on understanding natural language, will get us to the next stage. We are phasing these in, beta release by beta release, and will show you the difference between real semantics and yesterday’s attempts at avoiding semantics. Stay tuned!

hakia Toolbar Combines Search and Social Networking

May 22nd, 2008 by hakia Team

We have updated our toolbar, the hakia ScoopBar, with a new feature: a shortcut button to hakia’s social networking platform “Meet Others who asked the same query.” The updated toolbar will now allow users to search the Web for information AND/OR search for others looking for the same information. This brings social networking and search one step closer.

A screenshot of an example is below. Please note that we combined two consecutive search screenshots in one.

On the left, the searcher typed “autism and vaccinations” in the hakia ScoopBar and clicked on “search” to access hakia Web search results. When the searcher clicks on a result link, hakia automatically scrolls to the answer text on that Web page and highlights it, thus cutting search time in half.

On the right, the searcher typed “autism and vaccinations” in the hakia ScoopBar and clicked on “Meet Others.” He/she was then immediately placed into a room where others have voluntarily posted comments about “autism and vaccinations.” The searcher can now connect to others without registration and with a single mouse click. Alternatively, he/she can post a message to the room to participate in the conversation.

hakia scoopbar

The new hakia ScoopBar is available in two versions to support both Internet Explorer and Mozilla Firefox. Please click here to download your ScoopBar!

hakia Presented at Cyber Security and Information Intelligence Research Workshop (CSIIRW) at Oak Ridge National Laboratory

May 18th, 2008 by Dr. Christian Hempelmann, Chief Scientific Officer

Last week, we presented a paper at the Cyber Security and Information Intelligence Research Workshop (CSIIRW) at Oak Ridge National Laboratory in Oak Ridge.

The paper focused on the importance of accessing meaning in information assurance and security (IAS) tasks and OntoSem technology as the appropriate technology for this. The joint presentation outlined the OntoSem technology, introduced earlier OntoSem solutions to IAS developed at the Center for Research in Information Assurance and Security (CERIAS), and illustrated OntoSem’s use in hakia’s approach to internet search and its security, the most advanced application of OntoSem. Further illustrations for OntoSem-based IAS applications were provided by the Intelligent Content Monitor of RiverGlass, Inc. and the SENTINEL Classifier/Declassifier of Knowledge-Based Systems, Inc.

The presentation, co-authored with Victor Raskin, Brian Buck, and Arthur Keen, was entitled “Accessing and Manipulating Meaning of Textual and Data Information for Information Assurance and Security and Intelligence Information.”

For those who are interested, here are the URLs of some of the key organizations:

http://www.ioc.ornl.gov/csiirw/
http://www.cerias.purdue.edu
http://www.riverglassinc.com
http://www.kbsi.com

As a technology provider via licensing, we are taking steps toward exciting new partnerships. You can visit hakia lab to get a glimpse of our technology, or join hakia club to be in our distribution network.

Chess Games at hakia

May 17th, 2008 by Melek Pulatkonak, COO

One might say that developing a semantic search engine is a daunting task, not to mention the challenge of marketing and building a business around it. Every move we make at hakia is in essence a chess move. And here is how we make the moves!

webware.png

The length of the table above is one of the closely guarded secrets of hakia. By the way, we are looking for talent to join our team and the chess games. The following skills are a plus: growing tomatoes, singing opera, playing a musical instrument, drawing cartoons, playing soccer, making videos, painting, taking artistic photographs, flying kites, commuting to work by bike, or a deep interest in Vespas. If you are looking for an interesting place to work, give us a ring.

Semantics In, Popularity Out

May 14th, 2008 by Dr. Riza C Berkan, CEO

We congratulate Powerset on their launch. Some people must have a gun pointed at their heads to rush to a conclusion using one or two examples. Powerset is good, Powerset is bad, etc. Well, I think they are all missing the point. So much for encouragement!

The clear message is this: Semantic technology is here, and will evolve to challenge and eventually push out the popularity based search methods. Here are the main reasons why:

1- DYNAMIC CONTENT
Dynamic Web pages and news articles move at such a fast pace that there is no time to collect any kind of statistics (link referrals) for popularity algorithms to do their job. By the time such referrals are made, these pages have become “history”. Thus, the only means to analyze them is via semantic algorithms that do not depend on statistics collection.

2- LONG-TAIL
A recent study shows that the average Web page has 474 words and 41 links, 10 of which point outside the domain. Any linguist would confirm that there can be 1000 queries that can be asked of a Web page of 474 words. If only 10 links point out on average, that means 99% of the meaningful word sequences (queries) are not wrapped around links pointing to any Web site. That is what creates the long-tail “relevancy” problem. There is so much valuable information left out by a popularity method. We, at hakia, call it “the hidden failure”. Semantic algorithms do not depend on statistics collection, and thus are the only means to tackle the long-tail problem.

3- USER INTERACTION
The current generation of Web searchers is accustomed to using the pidgin keyword language. But the average length of a Web query is on the rise. That means elevated expectations, problem solving, and communication in something more like natural language. Eventually, people would love to talk back and forth to a search engine, pretending to be Mr. Spock. None of this can be handled by popularity algorithms. We need semantic systems to understand text and speech.

4- CREDIBILITY
Search results that are ranked by popularity algorithms are destined to be commercially biased. I am not talking about those “sponsored links.” If you are suffering from back pain, you may have to sift through popular results about massage parlors, spas, and mud baths before you encounter a credible source. With semantic technology, credibility of a source is not compromised by the ranking algorithm. It can be controlled to the full extent by expert advice.

5- ADVERTISEMENT ACCURACY
As a suitcase producer, you don’t want your ads to be pushed next to a murder story where the body was disposed of using a suitcase. Content understanding is essential in on-line advertising, and can only be delivered on a consistent basis by semantic advertising systems.
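The 99% figure in the LONG-TAIL point (item 2) is simple arithmetic, and a short Python sketch makes the assumption behind it explicit. The function name `uncovered_percent` is made up for this illustration, and the sketch assumes that each outbound link is wrapped around at most one of the possible queries.

```python
def uncovered_percent(possible_queries: int, outbound_links: int) -> float:
    """Percentage of possible queries that no outbound link is wrapped around.

    Assumes each outbound link covers at most one distinct query.
    """
    covered = min(outbound_links, possible_queries)
    return 100 * (possible_queries - covered) / possible_queries

# Figures from the text: about 1000 possible queries for an average page
# of 474 words, and 10 links pointing outside the domain.
print(uncovered_percent(1000, 10))  # 99.0
```

Counting all 41 links on the average page instead of only the 10 outbound ones would still leave about 96% of the possible queries without a link.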

At hakia, we call the combination of these five points Quality search, as opposed to Popularity search. Quality is the new perspective for consumers who had never been exposed to it until recently, and semantic technology is the enabling force behind it.

It is no longer a big secret that all existing search players are also looking into semantic technology. The question at this point is how good and comprehensive these technological developments are. It is just a matter of time until the consumers decide the winner and silence all those shotguns. Of course, when the tide changes, we may see roses popping out of their barrels.

For those who are interested, I have written about what it takes to test a semantic search engine properly. It requires at least a couple of hundred queries specially crafted to test competency in various areas. Then, one can compare it with Google, provided both have the same corpus to work on for the search queries. That’s how it is supposed to be done, instead of a shotgun approach.

Congratulations Powerset. Keep it coming.

Creating Highly Specific Communities around Search Queries

May 1st, 2008 by Melek Pulatkonak, COO

To some people, it has been somewhat controversial that we combined our quality search with social networking (Meet Others who asked the same query) to give users a more comprehensive picture. However, this combination uniquely enables the creation of highly specific discussion environments and communities.

The best examples come from the medical world where people suffer from specific conditions and they have questions and stories to tell. For example, consider the query below:

what infections are caused by mrsa?

The “quality-stamped” search results of hakia are shown below where credible sources are ranked at the top. And the user can click on the MEET OTHERS button to jump to the discussion room.

webware.png

There are several discussion rooms on this subject ranging from the specific question “what infections are caused by mrsa?” to more general MRSA rooms. With our new “ADD VIDEO” option, the rooms are much more animated than before as seen below.

webware.png

There is no other social networking environment today where one can find (or create) highly specific rooms, to the point of the exact question to define the community. Usually groups are formed in more general topic rooms, and the users must scan through conversation threads to locate the specific interest. The increasing participation is a sign that this combination offers a unique value to the Web searcher.

Give it a try!