Quest for perfect Search Engine (Part1)
By Abhi | January 9, 2009
Google catalogs billions of pages per year and by now has a trillion unique URLs at your service for searching. Even setting aside those of us who cannot read for digital or biological reasons, this is far more information than the world’s 6.7 billion human beings could ever read. Search engines are able to keep pace with the information explosion thanks to advances in processing speed and cheaper memory. On the flip side, the time an individual spends finding relevant and authentic information on the web keeps going up. People do not search the web for tons of HTML or for speed, but for relevance. And relevance lies in the eyes of the needy seeker, as the meaning unfolds deep inside his or her neurons.
“Looking for Bill Gates in Europe” makes sense in terms of all the European countries {Germany, England…}, not the six characters ‘E u r o p e’; and if a document about ‘Bill Gates in Europa’ is written in Danish, you miss it. Do not even think of adding “before 1978”, as that will do no good for your valuable time. Because of the limitations of today’s search engines, users have formed a mental map that the top search results are the relevant ones. If they are not, either the keywords were inadequate or it is time for ‘let’s start reading it all till I hit the wall’. But the real issue lies with the search engines, which treat keywords as a collection of characters exactly as they look (Europe).
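To make the contrast concrete, here is a minimal Python sketch comparing literal keyword matching with concept-aware matching. The `CONCEPTS` map, the sample documents, and the function names are all hypothetical; a real semantic engine would draw such relations from a knowledge base or a multilingual lexicon rather than a hard-coded dictionary.

```python
import re

# Hypothetical concept map: a query term expands to related terms.
# Terms without an entry match only themselves.
CONCEPTS = {
    "europe": {"europe", "europa", "germany", "england", "france", "denmark"},
}

# Illustrative sample documents.
DOCS = {
    "doc1": "Bill Gates speaks in Germany",
    "doc2": "Bill Gates in Europa",  # Danish spelling of Europe
    "doc3": "A travel guide to Europe",
}

def tokens(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))

def keyword_search(query, docs):
    """Return ids of docs that contain every query term exactly as typed."""
    q = tokens(query)
    return sorted(d for d, text in docs.items() if q <= tokens(text))

def concept_search(query, docs, concepts):
    """Return ids of docs where each query term, or a related term, appears."""
    q = tokens(query)
    hits = []
    for d, text in docs.items():
        t = tokens(text)
        if all(t & concepts.get(term, {term}) for term in q):
            hits.append(d)
    return sorted(hits)

print(keyword_search("Bill Gates Europe", DOCS))            # []
print(concept_search("Bill Gates Europe", DOCS, CONCEPTS))  # ['doc1', 'doc2']
```

The literal search finds nothing, while the concept-aware search returns the German and the Danish documents. The hard part in practice is building and maintaining the concept map, not the matching loop.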
Finding results based on the actual semantics of a word, phrase, or paragraph is still a distant goal, but there is a silver lining. Though nothing yet satisfies the ever-demanding quest for a perfect search genie, there are some initiatives worth watching.
Powerset
Powerset is an engine for searching Wikipedia articles, and was acquired by Microsoft in August 2008 for a whopping 100 million dollars. The site answers simple questions where it can, interprets simple phrases using natural language processing, and relates your search to newer (but relevant) facts and concepts across the results. It compiles an interesting dossier of results, which can be navigated with comparative ease.
A simple query, “When did Mahatma Gandhi die?”, returned the exact answer. Opening the very first result article (if you want to dig deeper) revealed an interesting navigational structure for the document. The key part is that this navigational structure can be switched to a fact-based one (“Show Factz”), which lets the user navigate through the article’s sentences based on the facts understood from each one. If that isn’t interesting enough, the best is still to come.
A simple search on “Mahatma Gandhi” produced a set of results and a fact bar. This fact bar lists Gandhi’s activities, grouped by action verb, across all the matching Wikipedia articles, so users can jump directly to every article that connects Gandhi to a given action. For focused information on things, places, and people, Wikipedia can be (relatively) trusted, and Powerset can summarize that information for the user. Thumbs up to this application and to Microsoft.
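The fact bar can be pictured as grouping pre-extracted facts by verb. The sketch below is a hypothetical illustration, not Powerset’s implementation: it assumes the hard NLP step (turning sentences into subject-verb-object triples) has already been done, and the `FACTS` list, the placeholder article names, and the `fact_bar` function are invented for the example.

```python
from collections import defaultdict

# Illustrative (subject, verb, object, source article) facts. In a real
# system these would come from parsing article sentences; here they are given.
FACTS = [
    ("Gandhi", "led", "Salt March", "Article A"),
    ("Gandhi", "led", "Non-cooperation movement", "Article B"),
    ("Gandhi", "studied", "law", "Article C"),
    ("Nehru", "led", "Indian National Congress", "Article D"),
]

def fact_bar(subject, facts):
    """Group one subject's facts by verb: verb -> sorted (object, article) pairs."""
    bar = defaultdict(list)
    for subj, verb, obj, article in facts:
        if subj == subject:
            bar[verb].append((obj, article))
    # Sort verbs and their entries so the bar has a stable display order.
    return {verb: sorted(pairs) for verb, pairs in sorted(bar.items())}

for verb, pairs in fact_bar("Gandhi", FACTS).items():
    print(verb, pairs)
```

Each verb becomes one entry in the bar, and each (object, article) pair is a link the user can follow.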
By the way, do try Microsoft’s “Live Search” option that appears on each Powerset result page and see the true value of this tool. The web community will be waiting for Powerset to move beyond Wikipedia into open ground.
Cluuz
Cluuz is another promising engine. It extracts named entities such as people, things, companies, phone numbers, emails, addresses, and domains from the result pages and interrelates them to form denser semantic graphs. This lets the user correlate the search keywords with visible clues and extend the search with just a few clicks. It also picks up images from the result pages, giving you a preview of what to expect on each page. The best part of the website is a graphical component called the “Semantic Graph of a cluster”. It shows which result pages produced which extracted relations, and how many result pages contributed to each concept. A good future enhancement would be the ability to extend the search inside the Semantic Graph itself, without leaving it.
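A rough sketch of the general idea behind such a graph: extract entities from each result page, remember which pages each entity came from, and link entities that appear on the same page. This is an assumption about the technique, not Cluuz’s actual method. Only pattern-like entities (emails and phone numbers) are handled, since recognizing people and companies needs a trained named-entity model; the patterns, sample pages, and function names are all hypothetical.

```python
import re
from collections import defaultdict
from itertools import combinations

# Simplified, hypothetical patterns for pattern-like entities only.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone": re.compile(r"\+?\d[\d -]{7,}\d"),
}

# Illustrative result pages: url -> page text.
PAGES = {
    "http://a.example": "Contact sales@example.com or +1 555 123 4567",
    "http://b.example": "Write to sales@example.com",
    "http://c.example": "Call +1 555 123 4567 or support@example.org",
}

def extract(text):
    """Return the set of (kind, value) entities found in the text."""
    found = set()
    for kind, pattern in PATTERNS.items():
        for match in pattern.findall(text):
            found.add((kind, match))
    return found

def build_graph(pages):
    """Return (entity -> set of source urls, entity pair -> co-occurrence count)."""
    sources = defaultdict(set)
    edges = defaultdict(int)
    for url, text in pages.items():
        entities = sorted(extract(text))
        for entity in entities:
            sources[entity].add(url)
        # Entities found on the same page are linked once per page.
        for a, b in combinations(entities, 2):
            edges[(a, b)] += 1
    return sources, edges

sources, edges = build_graph(PAGES)
for entity, urls in sorted(sources.items()):
    print(entity, "found on", len(urls), "page(s)")
```

Here `len(sources[entity])` answers the “how many result pages lead to one concept” question, and the `edges` counts are what a graph view would draw as links between entities.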
Hakia
If you are not interested in clustering or fact-based grouping, then you should try Hakia. It is a vanilla semantic search engine that relies on the power of its semantic algorithms to produce results. It is currently in beta and shows relevant results for basic questions and simple phrases. One limiting factor, or blessing in disguise, is that it shows results from pages its internal mechanisms have deemed credible (credible web sites recommended by librarians). It also provides APIs for building your own semantic search engine (for all you geeky lads). But when it comes to being truly semantic, this site has a long way to go.
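Restricting results to credible sources can be sketched as a simple domain whitelist. Hakia’s real mechanism is not described in detail here, so `CREDIBLE_DOMAINS` and the helper functions below are hypothetical placeholders, using reserved `.example` names, for a librarian-curated list.

```python
from urllib.parse import urlparse

# Hypothetical whitelist standing in for "credible sites recommended by librarians".
CREDIBLE_DOMAINS = {"library.example", "encyclopedia.example"}

def is_credible(url, whitelist):
    """True if the URL's host is a whitelisted domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in whitelist)

def filter_results(urls, whitelist=CREDIBLE_DOMAINS):
    """Keep only results whose host passes the credibility check, in original order."""
    return [u for u in urls if is_credible(u, whitelist)]

results = [
    "https://www.library.example/gandhi",
    "http://spam.example/library.example",  # whitelisted name only in the path
    "https://encyclopedia.example/europe",
    "https://notlibrary.example/page",      # lookalike host, not a subdomain
]
print(filter_results(results))
```

Checking the parsed host rather than searching the raw URL string keeps lookalike hosts and paths that merely contain a trusted name from slipping through. The trade-off the post mentions is visible too: anything outside the list is dropped, however relevant it is.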
Kosmix
Do you remember the guys who founded Junglee, one of the first shopping search engines, later bought by Amazon? Some of them are back with an engine named Kosmix, which has been getting funded one stage after another. Kosmix offers a much better-organized view of the result world. Searches run against its own indexes (and those of its partners), and the results are organized into categories. The result page is divided by media (audio/video), search engine, blogs, and much more.
News, blogs, and media (published, audio, video) are among the major forms of data available on the web, and also the ones that can unintentionally contaminate your results. Kosmix certainly adds value to searches, since the results are already segregated and you can go directly to the category of your choice. Kosmix is also working on machine-generated related topics for your keywords, though this area still needs improvement.
In Part 2 of this series, “Quest for perfect Search Engine”, I will explore a different stream of search engines: clustering engines. There are a good number of them on the market, and they certainly show more organized results than plain keyword-stemming engines.



