We recently released the first version of Cloudster, a distributed implementation of k-means clustering that runs in the cloud on Windows Azure. This release features a complete working environment and a bunch of samples showing how to use the API, e.g. to cluster vectors, images or DNA.

Website of the Cloudster project

For this release, I developed a text sample that clusters documents, representing each one with the vector space model of Salton et al. Such clustering can be used, for instance, in query expansion for search engines; the Clusty search engine currently follows this approach.
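
To give a rough idea of what clustering in the vector space model looks like, here is a minimal single-machine sketch in Python. It is not Cloudster's code or API: the function names (`tfidf_vectors`, `kmeans`), the whitespace tokenisation, the smoothed tf-idf weighting and the choice of seed documents are all assumptions made for illustration. It also compares documents by cosine similarity, which makes it the spherical variant of k-means rather than the plain Euclidean one.

```python
import math
from collections import Counter


def normalise(vec):
    """Scale a sparse vector (dict term -> weight) to unit L2 norm."""
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}


def tfidf_vectors(docs):
    """Represent each document as a unit-length sparse tf-idf vector."""
    tokenized = [doc.lower().split() for doc in docs]  # naive tokeniser (assumption)
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed idf, so a term present in every document keeps a nonzero weight.
        vec = {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1.0)
               for t, c in tf.items()}
        vectors.append(normalise(vec))
    return vectors


def cosine(u, v):
    """Cosine similarity of two unit-length sparse vectors (a dot product)."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def kmeans(vectors, seeds, max_iter=20):
    """Spherical k-means; `seeds` lists the documents used as initial centroids."""
    centroids = [dict(vectors[i]) for i in seeds]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each document joins its most similar centroid.
        new_labels = [
            max(range(len(centroids)), key=lambda c: cosine(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the normalised sum of its members.
        for c in range(len(centroids)):
            total = Counter()
            for v, label in zip(vectors, labels):
                if label == c:
                    for t, w in v.items():
                        total[t] += w
            if total:  # keep the previous centroid if the cluster is empty
                centroids[c] = normalise(total)
    return labels


docs = [
    "azure cloud storage queue",
    "cloud computing on azure workers",
    "dna sequence genome alignment",
    "genome dna mutation analysis",
]
vectors = tfidf_vectors(docs)
labels = kmeans(vectors, seeds=[0, 2])
print(labels)
```

In a distributed setting, the assignment step is the natural part to spread across workers, since each document can be matched to its closest centroid independently of the others.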

Warm thanks to our professor Joannes Vermorel for gathering the team and launching us on this very cool project!

© Stéphane Caron — All content on this website is licensed under the CC BY 4.0 license.