With the recent article on ReadWriteWeb, it seems the old debate is back. Are Google’s squared results coming from a real semantic backbone, or is this a good old entity extraction trick that anyone capable of copying and pasting lists could do?
Below we illustrate 10 points that define semantic search, using our online demo, which compares hakia’s enterprise search system with PubMed’s search engine side by side, QDEXing 20 million documents on PubMed.
1- Handling morphological variations
A semantic search engine is expected to handle all morphological variations (tenses, plurals, etc.) on a consistent basis. In other words, the results should not change whether you type “improve”, “improves”, “improving”, “improved”, or “improvement”. The example query “improving quality of life” illustrates that hakia results contain morphological variations of the query.
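To make the idea concrete, here is a minimal Python sketch of morphological normalization. It is a toy suffix stripper written for this example, not hakia’s technology; the suffix list and the function names `toy_stem` and `normalize` are assumptions chosen so that the five variants above collapse to one form.

```python
# Toy suffix stripper: an illustrative assumption, not hakia's morphology engine.
# Suffixes are tried in this order, so longer endings win over shorter ones.
SUFFIXES = ("ement", "ment", "ing", "ed", "es", "e", "s")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(text: str) -> list:
    """Lowercase, split on whitespace, and stem every token."""
    return [toy_stem(token) for token in text.split()]


variants = ["improve", "improves", "improving", "improved", "improvement"]
stems = {toy_stem(v) for v in variants}  # all five variants share one stem
```

With this normalization applied to both queries and documents, “improving quality of life” and “improved quality of life” compare as equal. A real engine would need a proper morphological analyzer, since naive suffix stripping breaks on irregular forms.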
2- Handling synonyms with correct senses
A semantic search engine is expected to handle synonyms (cure, heal, treat, etc.) in the right context and with the correct word senses. For example, the word “treat” can also refer to a social favor, as in “trick or treat”, a sense that would be wrong in a medical context. The example query “is there a cure for ALS” shows that hakia brings results containing synonyms with the correct senses. The level of sense disambiguation in a semantic search engine is a sign of its progress.
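As a rough illustration of sense-aware synonym expansion, the Python sketch below picks a sense for “treat” by counting context cue words. The sense inventory, the cue words, and the `expand` function are all hypothetical and hand-made for this example; real sense disambiguation is far more involved.

```python
# Hypothetical sense inventory; synonym sets and cue words are invented
# for illustration and do not come from any real lexicon.
SENSES = {
    "treat": [
        {"sense": "medical",
         "synonyms": {"treat", "cure", "heal"},
         "cues": {"disease", "patient", "als", "symptom", "therapy"}},
        {"sense": "social",
         "synonyms": {"treat", "reward", "indulge"},
         "cues": {"trick", "candy", "halloween", "dinner"}},
    ],
}


def expand(word, context):
    """Return (sense, synonyms) for the sense whose cues overlap the context most.

    Unknown words are returned unexpanded. On a tie, the first listed sense wins.
    """
    candidates = SENSES.get(word)
    if not candidates:
        return None, {word}
    context_words = {w.lower() for w in context}
    best = max(candidates, key=lambda s: len(s["cues"] & context_words))
    return best["sense"], best["synonyms"]


sense, synonyms = expand("treat", "how to treat a patient with ALS".split())
```

Here the medical cues “patient” and “als” select the medical sense, so the query is expanded with “cure” and “heal” rather than “reward”.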
3- Handling generalizations
A semantic search engine is expected to handle generalizations (disease = GERD, ALS, AIDS, etc.) where the user’s query is expressed in generalized form and the result is expected to be specific. The example query “Which disease has the symptom of coughing?” brings a result set in hakia such that GERD is recognized by the system as the specific answer.
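One simple way to picture generalization handling is an is-a table that maps specific terms to their general category. The Python sketch below is only an assumption-laden toy: the `IS_A` table, the two sample sentences, and the `find` function are invented for this example and say nothing about how hakia stores such knowledge.

```python
# Hypothetical is-a table using the article's own examples.
IS_A = {"GERD": "disease", "ALS": "disease", "AIDS": "disease"}

# Made-up sample sentences standing in for indexed documents.
DOCS = [
    "Chronic coughing is a common symptom of GERD.",
    "ALS affects motor neurons.",
]


def instances_of(category):
    """Return every specific term filed under the general category."""
    return {name for name, parent in IS_A.items() if parent == category}


def find(docs, general_term, keyword):
    """Find docs that contain the keyword plus any instance of the general term."""
    targets = instances_of(general_term)
    hits = []
    for doc in docs:
        words = set(doc.replace(".", "").split())
        if keyword in words:
            # Sorted so the output order is deterministic.
            hits.extend((name, doc) for name in sorted(targets & words))
    return hits


hits = find(DOCS, "disease", "coughing")
```

The query never mentions GERD, yet the general term “disease” is resolved to its specific instances and the matching document is returned with GERD identified as the specific answer.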
4- Handling concept matching
Perhaps the most challenging functionality of all: a semantic search engine is expected to recognize concepts and bring relevant results (political instability = insurgency, unrest, etc.). Usually, the depth of this capability is greater in the verticals where the engine operates, and it would be unrealistic to expect coverage of all subjects under the sun. The example query “political instability” brings a result set in hakia that includes concept matching.
5- Handling knowledge matching
Very similar to the previous item, a semantic search engine is expected to have embedded knowledge and use it to bring relevant results (swine flu = H1N1, flu = influenza). Knowledge match and concept match are similar in principle, yet different in practice in the way the capability is acquired. The example query “swine flu virus” brings a result set in hakia where these kinds of matches are visible.
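A minimal Python sketch of knowledge matching is to rewrite known aliases to one canonical form before comparing query and document text. The `ALIASES` table below contains only the two equivalences named above, and `canonicalize` is a hypothetical helper written for this example, not a description of hakia’s knowledge base.

```python
import re

# Hypothetical alias table built from the two examples in the text.
ALIASES = {"swine flu": "h1n1", "flu": "influenza"}

# Longest aliases first, so "swine flu" is matched before the bare "flu".
# Word boundaries keep "flu" from matching inside "influenza".
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(a) for a in sorted(ALIASES, key=len, reverse=True))
    + r")\b"
)


def canonicalize(text):
    """Lowercase the text and replace each known alias with its canonical name."""
    return _PATTERN.sub(lambda m: ALIASES[m.group(1)], text.lower())


query = canonicalize("Swine flu virus")
document = canonicalize("H1N1 virus")
```

After canonicalization both strings read “h1n1 virus”, so a query about “swine flu virus” can match a document that only mentions H1N1.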
6- Handling natural language queries and questions
A semantic search engine is expected to respond sensibly when the query is in question form (what, where, how, why, etc.). Note that a “search engine” is different from a “question answering” system. A search engine’s main task is to rank search results in the most logical and relevant manner, whereas a question answering system may produce a single extracted entity. The example query “how fast is swine flu spreading?” brings a result set in hakia that sheds light on this capability.
7- Ability to point to an uninterrupted paragraph and the most relevant sentence
Unlike conventional search engines, where a query points to documents, semantic search is expected to do much better. A query must point not only to documents but also to the relevant sections of them. This eliminates the second search, in which the user has to open the documents to find the relevant sections. The previous example query “how fast is swine flu spreading?” shows this capability as displayed below.
8- Ability to enter queries freely, without special formats like quotes or Boolean operators
When entering a query, special format requirements are becoming a thing of the past even with today’s non-semantic search engines. These formats are crude approximations that stand in for meaning matching, and they reveal the underlying weaknesses of the search technology.
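Independently of the hakia demo, the mechanics of pointing to a paragraph and a sentence can be sketched in a few lines of Python. Plain word overlap is used here as a stand-in for meaning match, which is a deliberate simplification; the `best_passage` function and the sample text are made up for this example.

```python
import re


def best_passage(document, query):
    """Return (paragraph, sentence) with the highest query-word overlap.

    Word overlap is only a placeholder for a real meaning-match score.
    Returns (None, None) when no sentence shares a word with the query.
    """
    query_words = set(re.findall(r"\w+", query.lower()))
    best = (0, None, None)
    for paragraph in document.split("\n\n"):
        paragraph = paragraph.strip()
        # Split after sentence-ending punctuation followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph):
            words = set(re.findall(r"\w+", sentence.lower()))
            overlap = len(query_words & words)
            if overlap > best[0]:
                best = (overlap, paragraph, sentence)
    return best[1], best[2]


# Made-up sample text with two paragraphs.
SAMPLE = (
    "Influenza viruses change every year. Vaccines are updated accordingly.\n\n"
    "Swine flu is spreading quickly in several regions. "
    "Health agencies are monitoring it."
)

paragraph, sentence = best_passage(SAMPLE, "how fast is swine flu spreading?")
```

The function returns the whole second paragraph together with its first sentence, so the user sees both the surrounding context and the single most relevant sentence without opening the document.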
9- Ability to operate without relying on statistics, user behavior, and other artificial means
A semantic search engine is expected to bring relevant results by analyzing the content of a page (or document), its source, authors, and the credibility of the results in response to a query. Relying on link referrals, user behavior/tagging, and other artificial means may produce good results when such data is available, but it is outside the realm of semantic search. By not relying on artificial input, semantic search technology is more universal and applicable to any situation, especially enterprise documents and real-time content where such data does not exist.
10- Ability to detect its own performance
When there is no semantic content analysis in a search algorithm, relevancy scores refer to artificial measurements, like how popular the page is. A semantic search engine is expected to produce a relevancy score that reflects the degree of meaning match. This capability gives developers the flexibility to apply meaning thresholds. Accordingly, the search engine can recognize its own poor performance and automatically flag areas where improvement is needed.
In our experience, these 10 points are what make a search engine semantic, and they require an entirely new infrastructure built from the ground up. Being able to implement some of these capabilities at full capacity is rare and often unnecessary, as it would require tremendous resources. Full-capacity semantic search is more feasible when applied to vertical topics, especially when embedded knowledge and concept coverage can be attained at a reasonable cost. Focusing on the delivery of concentrated semantic capabilities in verticals is our new strategy in enterprise search. More on this coming soon.
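A meaning threshold of this kind could look like the Python sketch below. It assumes the engine already produces a meaning-match score between 0 and 1 for each result; the threshold value, the `review_results` function, and the sample scores are hypothetical.

```python
MEANING_THRESHOLD = 0.5  # hypothetical cut-off, not a value used by hakia


def review_results(query, scored_results, threshold=MEANING_THRESHOLD):
    """Keep results at or above the threshold and flag queries with none left.

    scored_results is a list of (document_id, meaning_score) pairs, where the
    score is assumed to lie between 0 and 1.
    """
    kept = [(doc, score) for doc, score in scored_results if score >= threshold]
    return {"query": query, "results": kept, "needs_improvement": not kept}


# Made-up scores for two queries.
good = review_results("swine flu virus", [("doc1", 0.9), ("doc2", 0.3)])
poor = review_results("political instability", [("doc3", 0.2), ("doc4", 0.1)])
```

The first query keeps one confident result, while the second keeps none and is flagged, which is the kind of signal that could feed a list of areas needing better coverage.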