Introducing Clustering 2.0
Vivisimo introduced high-quality text clustering into the search engine market in 2000, after a couple of years of computer science research on new algorithms by the founders at Carnegie Mellon. The research breakthrough was labelling the clusters, i.e., grouping search results into folder topics. Before that breakthrough, search result clusters had poor labels, so the technology was unusable. The technology was first demonstrated on a university website and later at vivisimo.com, with excellent reviews.
A couple of years later, Vivisimo’s computer scientists developed a way to add linguistic knowledge, to help detect similarity that the clustering algorithms would otherwise miss, and to prevent false similarities. For example, people’s language skills let them realize that kill, murder, slay, and gun down are pretty similar concepts, but make a killing is different (for you non-native speakers, make a killing is about making a large profit), and put the gun down is different too. This engineering breakthrough greatly improved the clustering performance with practical amounts of internal development, for English and other languages (try Japanese here or here).
An earlier blog post goes into more detail on the state of the art of this Clustering 1.0, as well as the end-user value.
Now on to Clustering 2.0!
Although clustering reveals the major topics in the top 200, 500, or more search results, there are always more topics than can be shown without overloading the user with a very long list. There hasn’t been a better approach until now.
With a single click, remix clustering answers the question: What other, subtler topics are there? It works by clustering the same search results again, but with an added input: ignore the topics that the user just saw. Typically, the user will then see new major topics that didn’t quite make the final cut in the previous round, but may still be interesting.
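For the technically curious, the idea can be sketched in a few lines of Python. This is only a toy illustration, not Vivisimo's actual algorithm: it assumes that a folder label is simply a frequent word in the result text, and the names `cluster`, `remix`, `candidate_labels`, and `STOPWORDS` are all made up for this example.

```python
from collections import Counter

# Hypothetical stopword list; a real system would use far richer linguistic knowledge.
STOPWORDS = {"the", "a", "of", "in", "and", "to", "for"}


def candidate_labels(text):
    """Return the set of lowercase non-stopword words in one search result."""
    return {word for word in text.lower().split() if word not in STOPWORDS}


def cluster(results, max_folders=3, ignore=frozenset()):
    """Toy clustering: the most frequent words become folder labels.

    Labels listed in `ignore` are skipped; that is the extra "remix" input.
    Returns a dict mapping each label to the results that mention it.
    """
    ignored = set(ignore)
    counts = Counter()
    for text in results:
        counts.update(candidate_labels(text) - ignored)
    # Rank by descending frequency, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    folders = {}
    for label, _count in ranked[:max_folders]:
        folders[label] = [t for t in results if label in candidate_labels(t)]
    return folders


def remix(results, seen_labels, max_folders=3):
    """Cluster the same results again, ignoring the topics already shown."""
    return cluster(results, max_folders=max_folders, ignore=seen_labels)


if __name__ == "__main__":
    # Made-up result snippets, purely for illustration.
    results = [
        "steelers football schedule",
        "steelers football tickets",
        "steelers stadium tour",
        "penguins hockey schedule",
        "museums tour",
    ]
    first = cluster(results, max_folders=2)
    second = remix(results, seen_labels=first.keys(), max_folders=2)
    print("first round:", list(first))
    print("after remix:", list(second))
```

The search results themselves never change between rounds; only the list of labels to ignore grows, which is why each remix surfaces topics that narrowly missed the previous cut.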
Remix clustering was introduced in Vivisimo Velocity 6.0, our enterprise search platform, which also introduced other user-experience capabilities.
To see remix clustering in action, try searching Clusty.com for our company’s hometown of Pittsburgh. Look at the folders, then click on Remix to the right of the top folder. Notice how you can dig deeper and “tour” Pittsburgh effortlessly, just by remixing. Or pick a topic that you are familiar with, and notice how repeated Remixing will turn up an interesting but unfamiliar topic or two.
To clarify: I’ve been asked whether remix clustering is only for when none of the folder topics looks interesting. Not at all! When I select a book off the bookshelf, it doesn’t mean that every prior book I saw was uninteresting. Instead, it just means that I want to see what else there is. Same thing here: What other topics are there?
The name of the game in search interfaces is to empower the user to see more, effortlessly, and avoid the curse of Information Overlook (pdf - an old thought piece). Clustering 2.0 plays the game very well.
By the way, an obligatory reminder: remix clustering is patent pending.
Tags: clustering, linguistic knowledge, search engine, search results
[…] Here’s a blog you can reference http://searchdoneright.com/2008/01/introducing-clustering-2.0/ […]
[…] right hand corner of the cluster list and you get a new list of clusters. In a blog post titled Introducing Clustering 2.0 Vivisimo CEO Raul Valdes-Perez explains what happens when you click remix: With a single click, […]