hakia’s Semantic OntoParser
Here, we explain how hakia’s semantic OntoParser takes a sentence, processes it, and produces a text-meaning representation. This is the essence of making computers understand natural languages by means of ontological semantics resources and parsing algorithms.
Take the simple sentence “The outlaws ran cocaine into the United States.” We humans can easily identify the meaning of this sentence: people who habitually commit illegal acts clandestinely transported a psychoactive drug into a country called the United States. We also infer all kinds of other things from our knowledge of the world: the cocaine probably came from South America; it will be sold illegally for profit and consumed, typically by snorting it, by people who will then show certain changes in their behavior and emotions (probably pleasant for them, usually unpleasant for those around them).
Let’s see how close to this understanding a computer can get with ontological semantics. First, OntoParser produces all potential senses of the words in the sentence and breaks the sentence into clauses based on the central events it identifies among those senses. Screenshots from the OntoParser demo are shown below.
“The outlaws” has only one sense, CRIMINAL (note that we use capital letters to indicate concepts, which express senses, as opposed to words in the sentence). “Run,” however, has nine senses in our system, among them RUN, RUN-FOR-OFFICE, and SMUGGLE, and the parser must pick one of them. “Cocaine” has two related senses, both DRUGs, and “United States” likewise has two, both COUNTRYs. With 1 × 9 × 2 × 2 combinations, this simple sentence has 36 potential meanings at this stage.
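The count of 36 is simply the size of the Cartesian product of each word’s sense list. A minimal sketch, assuming a toy sense inventory: only CRIMINAL, RUN, RUN-FOR-OFFICE, and SMUGGLE are named in the text, so labels such as RUN-SENSE-4, DRUG-SENSE-1, and COUNTRY-SENSE-1 are hypothetical placeholders standing in for the unnamed senses.

```python
from itertools import product

# Toy sense inventory. Only CRIMINAL, RUN, RUN-FOR-OFFICE and SMUGGLE come from
# the text; every "*-SENSE-n" label is a hypothetical placeholder.
SENSES = {
    "outlaws": ["CRIMINAL"],
    "run": ["RUN", "RUN-FOR-OFFICE", "SMUGGLE"]
    + [f"RUN-SENSE-{i}" for i in range(4, 10)],  # six placeholders, nine in total
    "cocaine": ["DRUG-SENSE-1", "DRUG-SENSE-2"],
    "United States": ["COUNTRY-SENSE-1", "COUNTRY-SENSE-2"],
}


def candidate_readings(senses):
    """Return every combination that picks exactly one sense per word."""
    words = list(senses)
    return [dict(zip(words, combo)) for combo in product(*(senses[w] for w in words))]


readings = candidate_readings(SENSES)
print(len(readings))  # 36, i.e. 1 x 9 x 2 x 2
```

Enumerating every reading is only feasible for short sentences; the pruning described next is what keeps the search manageable.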
Not all of these combinations are possible, however: CRIMINALs can’t FLOW a DRUG, for example. Such combinations are excluded by matching the properties of the CONCEPTs in our world model, the ontology. FLOW, for instance, allows no agent, only a theme, and that theme must be a liquid. Neither CRIMINAL nor DRUG is a liquid, and only one of them could fit into that EVENT anyway. The parser takes each of the nine candidate EVENTs and tries to fill it with the other OBJECT senses in the sentence as participants.
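The FLOW check can be pictured as a selectional restriction tested against an is-a hierarchy. The following is a minimal sketch under loudly stated assumptions: the tiny PARENT hierarchy, the WATER concept, and the EventFrame class are hypothetical illustrations, not hakia’s actual ontology or code.

```python
from dataclasses import dataclass, field

# Hypothetical is-a links (child -> parent); the real ontology is far richer.
PARENT = {
    "CRIMINAL": "HUMAN",
    "HUMAN": "OBJECT",
    "DRUG": "OBJECT",
    "WATER": "LIQUID",  # hypothetical liquid, only for contrast
    "LIQUID": "OBJECT",
}


def is_a(concept, ancestor):
    """Walk up the is-a chain until we reach the ancestor or run out of parents."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = PARENT.get(concept)
    return False


@dataclass
class EventFrame:
    name: str
    # Maps each allowed case role to the concept its filler must descend from.
    roles: dict = field(default_factory=dict)

    def accepts(self, role, concept):
        """True only if the role exists and the filler satisfies its constraint."""
        return role in self.roles and is_a(concept, self.roles[role])


# FLOW as described in the text: no agent slot, and the theme must be a liquid.
FLOW = EventFrame("FLOW", {"theme": "LIQUID"})

print(FLOW.accepts("agent", "CRIMINAL"))  # False: FLOW has no agent slot
print(FLOW.accepts("theme", "DRUG"))      # False: DRUG is not a LIQUID
print(FLOW.accepts("theme", "WATER"))     # True
```

Because constraints are checked by walking up the hierarchy, one restriction on LIQUID covers every concept filed under it.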
The event SMUGGLE, for example, allows a theme that must be a WEAPON, an ILLEGAL-DRUG, or an IMMIGRANT.
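SMUGGLE’s theme constraint is a disjunction: a filler qualifies if it is, or descends from, any one of the three listed concepts. A minimal sketch, assuming a hypothetical hierarchy fragment in which a COCAINE concept sits under ILLEGAL-DRUG; that placement, like the rest of PARENT, is an assumption made for illustration.

```python
# Hypothetical is-a fragment (child -> parent); placing COCAINE under
# ILLEGAL-DRUG is an assumption made for this example.
PARENT = {
    "COCAINE": "ILLEGAL-DRUG",
    "ILLEGAL-DRUG": "DRUG",
    "DRUG": "OBJECT",
    "CRIMINAL": "HUMAN",
    "IMMIGRANT": "HUMAN",
    "HUMAN": "OBJECT",
    "WEAPON": "OBJECT",
    "COUNTRY": "PLACE",
}

# SMUGGLE's theme may be any one of these concepts (a disjunctive constraint).
SMUGGLE_THEME = {"WEAPON", "ILLEGAL-DRUG", "IMMIGRANT"}


def ancestors(concept):
    """Return the concept itself followed by all of its is-a ancestors."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT.get(concept)
    return chain


def fits_smuggle_theme(concept):
    """True if the concept or any ancestor is one of the allowed theme types."""
    return any(a in SMUGGLE_THEME for a in ancestors(concept))


for c in ["COCAINE", "ILLEGAL-DRUG", "DRUG", "CRIMINAL", "COUNTRY"]:
    print(c, fits_smuggle_theme(c))
```

Note that a generic DRUG does not satisfy the constraint in this sketch, while anything under ILLEGAL-DRUG does. That asymmetry is what lets the more specific sense win.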
The parser fills every EVENT with the possible PARTICIPANTs (case roles) that it selected from the sentence in the previous step. It then weights the candidate EVENTs, and every combination of their PARTICIPANTs, by how well the PARTICIPANTs fit into the EVENTs.
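Filling an EVENT amounts to enumerating the ways the sentence’s OBJECT concepts can be placed into its case roles. In this scheme each object takes at most one role, each role at most one object, and some objects may be left out. A minimal sketch, assuming a hypothetical role set of agent, theme, and destination for SMUGGLE (the text names only its theme); the function role_assignments is illustrative, not OntoParser’s API.

```python
from itertools import product


def role_assignments(roles, objects):
    """Yield every way to place objects into distinct roles of one EVENT.

    Each object either fills exactly one role or is left out, and no role
    receives more than one object (a partial one-to-one mapping).
    """
    options = list(roles) + [None]  # None means "not a participant here"
    for choice in product(options, repeat=len(objects)):
        used = [r for r in choice if r is not None]
        if len(used) == len(set(used)):  # reject two objects in one role
            yield {r: o for r, o in zip(choice, objects) if r is not None}


# Hypothetical role set for SMUGGLE; the text only mentions its theme.
roles = ["agent", "theme", "destination"]
objects = ["CRIMINAL", "ILLEGAL-DRUG", "UNITED-STATES"]
candidates = list(role_assignments(roles, objects))
print(len(candidates))  # 34
```

Even three objects and three roles yield 34 candidate fillings for a single EVENT. This is why the constraint checks and weighting matter.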
For most EVENTs, CRIMINAL can fill the agent slot, but the other CONCEPTs fit nowhere. Fewer EVENTs can also take UNITED-STATES as a theme or location, which earns them a higher score. Fewer still can accommodate all three of the other CONCEPTs (some of them wrongly). SMUGGLE wins this race because ILLEGAL-DRUG is the closest fit to the theme it can take. Finally, the parser outputs the text-meaning representations, ranked from the highest to the lowest score.
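The scoring race can be sketched as follows. Each EVENT gets the score of its best role filling, and each placed filler contributes more the closer it sits to the concept its role requires. This is a hypothetical weighting, since the text does not say how hakia computes its scores. The hierarchy, the role constraints, and the TRANSPORT event (a loose “fits everything” competitor) are all illustrative assumptions.

```python
from itertools import product

# Hypothetical is-a links (child -> parent); not hakia's actual ontology.
PARENT = {
    "CRIMINAL": "HUMAN", "HUMAN": "OBJECT",
    "ILLEGAL-DRUG": "DRUG", "DRUG": "OBJECT",
    "UNITED-STATES": "COUNTRY", "COUNTRY": "PLACE",
}

# Hypothetical case-role constraints: role -> concepts a filler may descend from.
EVENTS = {
    "SMUGGLE": {"agent": {"HUMAN"}, "theme": {"WEAPON", "ILLEGAL-DRUG", "IMMIGRANT"},
                "destination": {"PLACE"}},
    "TRANSPORT": {"agent": {"HUMAN"}, "theme": {"OBJECT"}, "destination": {"PLACE"}},
    "RUN-FOR-OFFICE": {"agent": {"HUMAN"}, "location": {"COUNTRY"}},
    "RUN": {"agent": {"HUMAN"}},
    "FLOW": {"theme": {"LIQUID"}},
}


def distance(concept, allowed):
    """Number of is-a steps from concept up to an allowed concept, or None."""
    steps = 0
    while concept is not None:
        if concept in allowed:
            return steps
        concept, steps = PARENT.get(concept), steps + 1
    return None


def best_fit(roles, objects):
    """Best score over all ways of placing objects into distinct roles.

    Each placed filler adds 1 / (1 + distance): an exact match adds 1.0 and
    looser, more general matches add less. Objects may stay unplaced.
    """
    best = 0.0
    options = list(roles) + [None]
    for choice in product(options, repeat=len(objects)):
        placed = [(r, o) for r, o in zip(choice, objects) if r is not None]
        if len({r for r, _ in placed}) < len(placed):
            continue  # two objects in the same role
        dists = [distance(o, roles[r]) for r, o in placed]
        if None in dists:
            continue  # some filler violates its role constraint
        best = max(best, sum(1 / (1 + d) for d in dists))
    return best


objects = ["CRIMINAL", "ILLEGAL-DRUG", "UNITED-STATES"]
ranking = sorted(((best_fit(roles, objects), name) for name, roles in EVENTS.items()),
                 reverse=True)
for score, name in ranking:
    print(f"{name:15} {score:.3f}")
```

With these made-up constraints the ranking comes out SMUGGLE, TRANSPORT, RUN-FOR-OFFICE, RUN, FLOW. TRANSPORT also accommodates all three concepts, but only through the very general OBJECT theme, so the closer ILLEGAL-DRUG match lets SMUGGLE come out on top.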
This capability is the essence of semantic search, in which the concepts in a query are matched to the concepts in Web pages. Applications that can use this technology include summarization, categorization, classification, abstraction, machine translation, data mining, and more.
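To make “matching concepts rather than words” concrete, the following minimal sketch ranks pages by the overlap between the query’s concept set and each page’s concept set. It swaps in plain Jaccard similarity as a stand-in measure; the text does not say how hakia compares concepts. The page ids, their concept sets, and the SPORT concept are hypothetical.

```python
def jaccard(a, b):
    """Overlap of two concept sets: |a & b| / |a | b| (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_pages(query_concepts, pages):
    """Sort page ids by concept overlap with the query, best match first."""
    return sorted(pages, key=lambda pid: jaccard(query_concepts, pages[pid]), reverse=True)


# Concepts from the example sentence; the pages and their concepts are made up.
query = {"SMUGGLE", "ILLEGAL-DRUG", "UNITED-STATES"}
pages = {
    "page-a": {"SMUGGLE", "ILLEGAL-DRUG", "COUNTRY", "UNITED-STATES", "CRIMINAL"},
    "page-b": {"RUN-FOR-OFFICE", "UNITED-STATES"},
    "page-c": {"RUN", "SPORT"},
}
print(rank_pages(query, pages))  # ['page-a', 'page-b', 'page-c']
```

A keyword engine would treat all three pages as matches for the word “run,” whereas at the concept level only page-a is about smuggling.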



February 18th, 2008 at 8:30 am
Nice explanation. As far as I understand, search relevance for such an engine is determined by a vocabulary and the relations between words. That should work for long phrases, but what about short key phrases of one or two words? This especially concerns one-word queries: for example, Google shows “glossary” information for a short term (the first result is Wikipedia, etc.), while some other search engines prefer to show shops or marketplaces selling the requested product. How do you resolve this ambiguity?
March 16th, 2008 at 12:52 pm
So my understanding is that using hakia.com in another language is very hard and will take a very long time.
March 18th, 2008 at 12:15 am
No, the ontology is language independent. Only the lexicon needs to be translated. That is a one-to-one translation and is readily available.
May 13th, 2008 at 1:29 am
I see, many thanks for replying.
(I hope you reach the position you deserve in the market, Mr. Rıza; we are proud of you.)
January 12th, 2009 at 11:07 am
To Riza C Berkan: I think that “lexicon translation” is not sufficient, just as it is not sufficient for machine translation in general. Your ontology may be language independent, but semantic extraction (or decoding, whatever you call it) is language dependent, and that part is not just a lexicon of words. See, e.g., idioms like “run” ~ “run-for-office”: these relations are language dependent. Anyway, I may be wrong, and in that case it should be very easy to port hakia to other languages. Are you planning anything like that?