Quest for the Perfect Search Engine (Part 2)

By Abhi | February 9, 2009

Types of Apples

Type “Apple” into a search engine, and you will be rewarded with millions of web fruits. There are many kinds of apples in this world. Or, I should rather say, all kinds of apples exist in all parts of the world but are differentiated by their color, role, age, usability, taste, performance, and of course “The Context”. The previous sentence isn’t weird, since we all know that Apple is also in electronics, that you may love to eat an apple, and that Adam’s apple isn’t named after every individual. Now, should we read the results for all apples to get to the right one, or should we add “microchip” to our apple query to get the right one? Or should we try something else, which clusters the entire web results automatically into visually distinct result sets?

The quest for the perfect search engine continues from Part 1. In the previous post of this series, I looked into interesting engines like Powerset, Hakia, Kosmix and Cluuz. In this post I will discuss another branch of engines, broadly classified as “clustering engines”. So, let’s see what exactly is a

Clustering Engine

“Clustering is the act of grouping entities into clusters. A cluster is a grouping of things/entities that can occur together (in a system). A clustering engine is a software application that can classify a heterogeneous result set into smaller but more coherent, homogeneous subsets.”

Clustering engines group search results into categories based on the similarity of the text, word frequency, and the proximity of the phrases/words found in the resulting documents. Rather than trying to map the user’s keywords to the results semantically or just literally, clustering engines employ the statistical methods of text mining. They use predefined taxonomies and vocabularies to group results into clusters, and name the clusters for the user’s understanding based on weighted graphs (mathematically speaking). Here are a few good clustering engines to try.
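To make the idea of grouping results by word frequency a little more concrete, here is a minimal sketch in Python. It is not how any of the engines below actually work. It simply assumes a greedy, single-pass clustering over bag-of-words vectors compared with cosine similarity, and it names each cluster after its most frequent term. The function names, the tiny stopword list, the 0.2 threshold and the sample snippets are all made up for illustration.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real engines use far larger ones).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "to", "for", "is", "with", "new", "on"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between two word-frequency Counters."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_snippets(snippets, query, threshold=0.2):
    """Greedy single pass: a snippet joins its most similar cluster, or starts a new one."""
    query_terms = set(tokenize(query))
    clusters = []  # each cluster: {"centroid": Counter, "members": [snippet, ...]}
    for snippet in snippets:
        # Drop the query words themselves; every result contains them.
        vec = Counter(w for w in tokenize(snippet) if w not in query_terms)
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({"centroid": Counter(vec), "members": [snippet]})
        else:
            best["centroid"].update(vec)  # accumulate word counts of the cluster
            best["members"].append(snippet)
    return clusters


def label(cluster):
    """Name a cluster after its most frequent term (hypothetical labeling rule)."""
    top = cluster["centroid"].most_common(1)
    return top[0][0] if top else "other"


# Hypothetical result snippets for the query "apple".
snippets = [
    "Apple unveils new iPhone and Mac computers",
    "Apple pie recipe with fresh fruit and cinnamon",
    "Apple iPhone review and Mac computers compared",
    "Apple orchard tips: growing fresh fruit trees",
]
clusters = cluster_snippets(snippets, query="apple")
for cluster in clusters:
    print(label(cluster), "->", cluster["members"])
```

With these sample snippets, the electronics results end up in one cluster and the fruit results in another, without any predefined category list being consulted.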

Kartoo

Kartoo is my personal pick in the visual clustering engines category. Being a fan of visual search, I must say that it has a nice Flash-based layout along with a balanced use of content and display. Once you have reached some kind of conclusion after doing your research on Kartoo maps, you might like to preserve the steps and effort that went into finding the relevant information. So user-friendly features come in handy: the ability to save, load and print your visual map, and to zoom in and out of the clusters with clicks. You don’t have to log in and out to use these features.

A non-visual clustering view is also available at this link: non-visual-kartoo. The non-visual clustering is impressive in its ability to include and exclude clusters for specific views of the results.

Kartoo is a meta search engine, meaning it combines results from various sources. MSN and Yahoo are a few of the search engines behind Kartoo’s nicely clustered results.
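How a meta search engine might combine ranked lists from several sources can be sketched as follows. Kartoo’s real merging logic is not public as far as I know, so this example swaps in a well-known generic technique, reciprocal rank fusion, where each URL scores the sum of 1 / (k + rank) over the lists it appears in. The engine names, the URLs and the constant k = 60 are assumptions for illustration only.

```python
from collections import defaultdict


def merge_results(ranked_lists, k=60):
    """Merge ranked URL lists with reciprocal rank fusion: score = sum of 1 / (k + rank)."""
    scores = defaultdict(float)
    for urls in ranked_lists.values():
        for rank, url in enumerate(urls, start=1):
            scores[url] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken alphabetically to stay deterministic.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical ranked results from two underlying engines.
sources = {
    "engine_a": ["example.com/apple-pie", "example.com/iphone", "example.com/orchard"],
    "engine_b": ["example.com/iphone", "example.com/mac", "example.com/apple-pie"],
}
merged = merge_results(sources)
print(merged)
```

A URL returned by both engines is counted once but scored twice, so duplicates are removed and agreement between sources pushes a result up the merged list.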

Clusty

Clusty, as the name suggests, is a clustering engine (meta search) from a company named Vivisimo. The company provides enterprise, federated and clustering search solutions. Clusty, however, is a free offering for the personal use of web searchers, who can use it to enhance their productivity.

Clusty’s clusters are sourced from engines like live.com, ask.com, Yahoo News, and the Open Directory. The website can cluster web results (obviously). Along with that, you can also cluster Wikipedia, blogs, news, jobs and images. Other interesting clusters can be formed based on the type of search engine used and the type of website (.com, .net) from which the results were gathered.
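Grouping by type of website is the simplest kind of cluster to picture. The sketch below assumes results are plain URLs and groups them by the last label of the host name. It is only an illustration of the idea, not Clusty’s implementation, and the function name and sample URLs are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse


def group_by_tld(urls):
    """Group result URLs by top-level domain, e.g. '.com' or '.net'."""
    groups = defaultdict(list)
    for url in urls:
        host = urlparse(url).hostname or ""
        # Use the text after the last dot of the host name as the group key.
        tld = "." + host.rsplit(".", 1)[-1] if "." in host else "(unknown)"
        groups[tld].append(url)
    return dict(groups)


# Hypothetical result URLs.
results = [
    "http://www.example.com/apple",
    "http://example.net/apple-news",
    "http://example.org/apple-facts",
    "http://shop.example.com/apple-store",
]
by_tld = group_by_tld(results)
for tld, members in by_tld.items():
    print(tld, len(members))
```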

Quintura

“From Russia with Love” comes Quintura. This is one of the most effective clustering engines, with a very beautiful graphical user interface. According to one of its patents, Quintura does context-based search visualization and context management using neural networks. You can cluster web, images, videos and Amazon.

Quintura is also a meta search engine, relying mainly on Yahoo’s index for its clusters and on its own patented technology (7,437,370) for displaying the cloud of the cluster. It has a very nice user interface with “on mouse over” cluster expansion and contraction. Saving and reloading the cluster map are among the features provided for your results. Quintura is definitely one of the finest clustering engines available on the web, with very high ratings.

Here is a list of some other interesting clustering engines. Their order is alphabetical rather than based on features, usability or recommendation:

CarrotSearch

Grokker

Iboogie

Kooltorch

mnemomap

MooTer

qksearch

Webclust

Xclustering

Clustering vs. Semantics in a Nutshell:
Clustering engines fall short of semantic engines in language processing, context understanding, polysemy, synonymy, vernacular, capturing negations, and the usage of ontologies. They rely on the syntactic constructs of language, pattern recognition, phrase proximity, and LSA (latent semantic analysis) rather than venturing into the semantic aspect of context understanding. Currently, there is no accurate semantic search engine in sight, so we can fall back on clustering engines for some more years to come.


In Part 3 of this blog series, I will take a look into the worlds of Google, Yahoo and MSN and their efforts to stay attractive in the future, plus some other catchy efforts.


6 Comments

Neeraj on December 25, 2010 at 12:46.

I could not find any mention of Cuil. Did you not find it worthy enough for a mention?

Abhishek on December 25, 2010 at 12:48.

Hi Neeraj,
Cuil certainly is a promising engine, but it cannot be categorized as a ‘clustering engine’. The category suggestions (from Cuil) that show up to the user are not created on the fly. They exist in some kind of strictly predefined ontology or taxonomy.
Try the word ‘Trigent’ in Cuil and then in any of the suggested engines. Cuil’s clustering does not understand this word and hence will not show any category suggestions, but clustering engines will form a cluster around the word ‘Trigent’. Nonetheless, it is an interesting engine, which will be covered in Part 3 of the series along with zoominfo, Google search wiki, Searchtogather, snap, and some others.
