Searching News: Nothing but Long Tail

April 20th, 2007 by Dr. Riza C Berkan, CEO

According to Chris Sherman’s recent post, Google will start to mix news with Web search results. We, at hakia, looked at each other and asked “aren’t we already doing that?” The answer is yes, since last August. Not only did we introduce this freshness booster last year, we have also doubled the volume of news that can be mixed with Web search results in our latest BETA update.

The bigger news is that searching “news” is nothing but a long tail experience. Since there is literally no time to collect statistics to determine popularity, most search engines use a shortcut: a long list of “trigger” words. If your query is “Britney,” for example, then you are likely to see some news at a designated spot on the results page. However, you may not see the same news result if you enter “singer with post-partum depression.” The news article for the first query will not show up for the second query because the second query did not contain the right trigger. You have to try this experiment with a news article that is only a few hours old. When the news gets older, it may show up from the archives, and then that is beside the point.
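To make the trigger-word shortcut concrete, here is a minimal sketch in Python. It is only an illustration of the general idea described above, not any search engine's actual code; the trigger list and the function name `should_show_news` are assumptions made up for this example.

```python
# Hypothetical trigger list (an assumption); a real one would be far longer.
TRIGGER_WORDS = {"britney", "election", "earthquake"}


def should_show_news(query: str) -> bool:
    """Return True if any word of the query is on the trigger list."""
    words = (word.strip('.,!?"\'') for word in query.lower().split())
    return any(word in TRIGGER_WORDS for word in words)


# The first query contains a trigger word; the second describes the same
# story in other words and contains none.
print(should_show_news("Britney"))
print(should_show_news("singer with post-partum depression"))
```

The second query gets no news slot even when it is about the same story, which is exactly the long tail problem.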

Google says it will give up the practice of designating a space for news, and start mixing news with Web search results using a new “no-trigger” algorithm. It means the search for better search is on its way, and we wish them good luck in this new endeavor.

The only proper way of handling long tail searches, whether for news or anything else, is to deploy a full-scale, uncompromised semantic search capability. If your algorithm can understand the text and the query, and if it can find the matching concepts (not only the words) in a split second, then you don’t have to collect statistics while the news rots. Nor would you need a trigger list. And that’s the news from hakia.
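As a rough illustration of matching concepts rather than words, consider the toy sketch below. It is not hakia's algorithm: the tiny hand-built concept map, the invented article text, and the function names are all assumptions, and a real semantic engine would need actual language understanding in place of a lookup table.

```python
# Hypothetical hand-built concept map (an assumption for illustration only).
CONCEPTS = {
    "britney": {"singer"},
    "singer": {"singer"},
    "depression": {"depression"},
    "post-partum": {"childbirth"},
    "childbirth": {"childbirth"},
}


def concepts_of(text: str) -> set[str]:
    """Collect the concepts behind every known word in the text."""
    found: set[str] = set()
    for word in text.lower().split():
        found |= CONCEPTS.get(word.strip('.,!?"\''), set())
    return found


def concept_overlap(query: str, article: str) -> set[str]:
    """Return the concepts shared by the query and the article."""
    return concepts_of(query) & concepts_of(article)


# Invented article text, used only to exercise the sketch.
ARTICLE = "Britney speaks about depression after childbirth"
QUERY = "singer with post-partum depression"

# Only one literal word is shared, but several concepts are.
print(set(QUERY.lower().split()) & set(ARTICLE.lower().split()))
print(concept_overlap(QUERY, ARTICLE))
```

No popularity statistics and no trigger list are involved: the fresh article matches the query as soon as both are mapped to concepts.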


16 Responses to “Searching News: Nothing but Long Tail”

  1. Mike Says:

    I love your search engine. Very interesting idea with galleries.

    How can a website be added to your galleries?
    My website is

    http://www.pissedconsumer.com

    Regards
    Mike.

  2. Tommy Says:

    I love your search engine, too!
    Extremely interesting with semantic search.

    How can a website be added?
    My website is

    http://www.hittajurist.se

    Regards
    Tommy.

  3. Marianne McEachern Says:

    Now that news about your Search Engine is being released, a lot of us would like to know how to submit our sites. Will that be something reflected on the site in the near future?

  4. Melek Pulatkonak, COO Says:

    Dear Marianne & Tommy,

    Our user care team will email you to confirm your site submission request.

    Thank you for contacting us!

    Happy searching at hakia.com,

    Melek

  5. Hawaiian Shirts Says:

    How reliable can your semantic search be if your data centers have stale outdated data? IMO this is on the same track as ASK whereas normal everyday people simply wont use it. Relevant and fresh results are the path to consumer trust.

    I was searching for Hawaiian shirt Fridays and it omitted a page that’s currently listed in the top results on 3 major SE.

  6. hakia user care team Says:

    Hi Hawaiian Shirts,

    We agree that relevant and fresh results are the path to consumer trust. To achieve that goal, we have invented a new infrastructure to deploy natural language capabilities. Furthermore, we are the first search engine to mix news and search results.

    Please bear in mind that our search engine is in BETA operation and not a finished product yet. We are in development mode and are continuously improving the search engine with new features and products. Our capabilities will be more visible as we make progress.

    Please keep on searching at hakia.com. We always appreciate a good question and feedback. Our user care team is looking into the search case you have brought to our attention.

    Happy searching at hakia.com,

    hakia user care team

  7. bobby Says:

    i like the labs and semantic search, are you planning on building date sensitive search?? ie: I would like to search for current (only 2007) info/news on a company or person and not be subjected to webpages or news from previous years.

    bobby

  8. 360 Says:

    I would really like to pick up on the point Bobby made here, but at the same time also expand the discussion to different types of searches, as well as the ability to filter your results more broadly. Before I begin, I would just like to point out that I think all this natural language processing and semantics stuff is great and I definitely applaud you for what you’re trying to accomplish here (so I really don’t want anyone – least of all you guys at hakia – to think I’m trying to demean any of this, I’m merely attempting to provide some constructive feedback). I merely feel that with it ‘just’ being a websearch with no real ways for a user to filter or change which results they’re looking at once they have them (I guess beyond continually rephrasing their query), it just feels a tad limiting. Yes, you guys do provide news results within your web search and you clearly mark them as such, but my feeling on them (and this may be the slightly harsh bit, lol), is simply ‘and?…’. What if I ‘only’ want to find all the different headlines on a particular subject, or if I then want to see only articles published on (or between) certain dates? Now, I know how to accomplish this in other search engines, but I can’t seem to see a way within hakia. This goes for other types of content, too, as there have been many instances where I would much rather limit my searches to a specific type of results over just seeing plain old WebPages. These include things like performing a blog search, if I’m more interested in people’s opinions on a particular subject. This is before I’ve even mentioned the ‘multimedia’ content, such as video, audio and images, which I’m well aware are an entirely different kettle of fish, but I still think that there is a lot of scope for allowing the user to choose the content that they wish to search for.
    Now, I’m really sorry if I’ve completely missed the point of what you’re trying to achieve here, but I literally cannot imagine not needing to be able to exclusively search different types of content for my results, as and when the situation arises. Nor can I imagine – regardless of how good you say all your NLP and semantics are – not needing a certain number of filtering options once I’ve received my results. Sticking entirely with the web search here, some common and fairly basic filters include results by country, language, file type, time site last updated etc. Accoona, another search engine that I like (and I believe has at least some elements of AI in it), has some really cool filtering options, but taken straight from the results themselves. So while you have some of what I’ve mentioned, I believe there are also things like actual people or company names mentioned, but it’s all lifted from the results and you can narrow your results based on those. While I think it could still do with an ‘exclude’ option to help you eliminate some of those things from your results if they are unwanted, I still believe it has been pretty well done and is more or less what I’m talking about – though of course there are plenty of other equally valid ways of implementing these sorts of features. The other thing I feel that you are lacking in comparison to your competitors is the ability to ‘exclude’ words and sites from your results. This, within most of the other engines, is achieved through a simple use of the – before a word and the word ‘site:’ to either limit or exclude content from a particular ‘provider’ or domain. Yes, I know you guys are trying to make your search less about keywords and more about concepts and what not, but to me that would make a feature like this even more powerful, as you’d be able to remove unwanted concepts and ideas entirely from your results, instead of having to think of all the different keywords it might be. Other than that, it seems to me at least that the ‘site:’ operator could be implemented in much the same way it always has.
    Once again, for me it just keeps coming back to the fact that once you’ve entered your query and hit the enter button, well then that’s kind of it. I just feel like I wanna be doing more than merely navigating page after page – especially if I keep seeing lots of results that I really don’t want, such as from a particular site that I may know well, or in a language I can’t read etc. Options like this, that help you really delve into the result set you have been given are just as important for things like research as anything else and I really hope that you implement them to some degree within your next beta.
    Anyways, sorry if I’ve stepped on anyone’s toes here and I’m sure you’ve got lots of other really cool features on the horizon, just wanted to add my two cents – hope it’s helped.

  9. hakia user care team Says:

    Hi bobby & 360,

    Thanks for the notes & feedback!

    We wish we could share with you the entire list of our upcoming products. We can just tell you, yes, hakia will offer news search, image search and other products IN ADDITION to providing the users innovative Web search experiences like mixing search and news results to boost freshness of results.

    360, re: your second point of adding filters. Keep on testing us with more questions where you give the engine more information on what you are looking for and keep on checking our progress at hakia.com! We would love to hear back from you toward the end of the year when we are fully developed if you still think filters will be necessary to improve search relevancy.

    Happy searching at hakia.com,

    hakia user care team

  10. 360 Says:

    Hey guys,

    Thanks for the response, it’s really great that you’ve opened up your blogs to comments and are taking the time to respond to people’s individual points and feedback, it really is refreshing. It’s also great to hear at least a little more about what you are planning on doing, such as the different types of search, but I’m most definitely looking forward to seeing what all this ‘innovative’ stuff you’re talking about turns out to be – it’s certainly a very exciting prospect. Regarding your reply about my points on the filter options, I will most definitely take you up on your offer of continuing to test your search engine (though I would probably have kept doing that anyways, lol), to see if I need them by the end of the year – will get back to you later on that one I guess, lol. In the mean time, however, I do have another bit of feedback that I’ve just thought of. This again may seem a slight downer, but is something I’m currently putting down to hakia’s beta status. This concerns the spell checker and the fact that I have noticed that it seems to miss quite a few obviously misspelt words, which I’ve tested within a couple of the other engines (ok specifically google) that catch them. I’ll give you some examples of ones I’ve noticed, but please be aware that while I did stumble upon this due to some fairly sloppy typing, all of these are intentional tests that I’ve run:
    Andomeda (Andromeda)
    Anromeda (Andromeda)
    Promoton (Promotion)
    Promoion (Promotion)
    Engenering (Engineering)
    Busines (Business)
    Nehbours (Neighbours)
    Correnation (Coronation)
    Penicilin (Penicillin)
    Phycology (Psychology)
    Mihgt (Might)
    Unfortunately, this isn’t quite where it ends, as it isn’t as simple as just missing a few words on their own, as I’ve also noticed a couple of times where if a word is spelt incorrectly within a longer query, it can be missed, even though the incorrect word is picked up on its own. Take for example the question: ‘what are some humorus jokes available online?’ Type that in and the mistake – humorus (humorous) – won’t be picked up, even though it is when spelt that way on its own. Another interesting example that I’ve just discovered is that the reverse seems to be true for some of the individual spelling mistakes above – they’re not caught on their own, but seem to get flagged up as part of a longer query.
    Is there any way that these issues will be addressed? The only reason that I’m putting so much emphasis on it is that these mistakes can often bring up radically different results to what their correct spellings would. This could mean that for someone who is relying on, or indeed expecting, the spell checker to act as their safety net (and I’m sure there are many people who do just that), they may not get the results that they were expecting, without realising they’re not actually searching for what they thought they were.
    My final point on the spell checker, is that when you do flag up a word as potentially wrong, could you please highlight the offending word (either in bold, or italics, or through that yellowy highlighter you use on results). While this is not so important for one or two word queries, it certainly makes things a lot easier when it comes to the much longer queries/questions that you’re encouraging the searcher to type in, especially when they go beyond the ‘end’ of the search box.

    Anyway, thanks again and I wish you the best of luck as you continue to search for better search.

  11. hakia user care team Says:

    Hi 360,

    Thank you for giving us a good kick in the tires!

    You have raised a great point. Yes, we know our spell checker is not working best at this time and we are working on it.

    Please note that we have frequent updates and a very different approach to Beta development. By looking at our homepage you can keep track of the Beta versions. For instance, we now have Beta 13 online since February. Beta 14 update is scheduled for May.

    In short, keep on using hakia and you will feel the difference. We will gradually improve with the support and feedback of users like you.

    Happy searching at hakia.com,

    hakia user care team

  12. 360 Says:

    Once again thanks for the response and yay, a new beta update coming soon! It’s nice to see you guys working so hard for us, the results will definitely be worth it!

  13. mara Says:

    Dear Rıza & Melek,

    I don’t know how but one of my sites comes up in your searches, I was wondering how I could add my other blog also. Is it possible that you can submit this one too?
    My blog is:

    odulluseoyarismasi.wordpress.com

    Thanks and Best Wishes,
    Mara

  14. pop Says:

    Well

    guys why dont u just add some more options for beta users, i am not a computer master study guy but do use a lot of online searching. Your site is ok now almost 60:40 ratio in even beta form as compared to other search results.

    thx

  15. pop Says:

    hi
    guys any comments about the unapproved deal between the yahoo and msn ?
    looks like hakia is impacting the market already due to competition of google?

    can u add a page which displays all the feature u added day by day and version wise?
    thx

  16. Hawaiian Shirts Says:

    hakia user care team: I decided to give you the benefit of the doubt, so I came back 2 months later and again searched for “Hawaiian shirt Fridays”. I am saying this in a nice way… I think you have a long way to go and the results I saw are certainly not a sign of advancement in semantic search capability.

    FYI Your top result for my query is a dead link: http://hakia.com/search.aspx?q=Hawaiian+shirt+Fridays

    “http://www hawaiianshirtfriday net” is the bad seed

1 Trackbacks/Pingbacks

  1. Reportlinker's blog : Web 3.0, Vertical Search, Information Industry and Market Research News Says:

    Search Engine add news in their results

    Two news were published last week, telling that search engine are mixing "classical" with news results into their results pages. The start up Hakia (Founded in 2004, based in New York City) has developed a Web’s new meaning-based search
