Cloudster
K-means algorithm for cloud computing
Cloudster team: {stephane.caron, guillaume.claret, anisse.ismaili, jacques-henri.jourdan, michael.mathieu, mathieu.prevot, guillaume.seguin, yingjie.xu}@ens.fr
École Normale Supérieure - Department of Computer Science
May 20 2009

1 Cloudster?
    K-means algorithm
    About Cloudster
2 Design
3 User interaction
    The CLI way
    The web way
4 Future enhancements
    Future (possible) core features
    Upcoming samples

Cloudster?

K-means algorithm
Goal: given N objects, optimally partition them into K clusters.
Basic algorithm:
    Randomly initialize groups
    Iterate:
        foreach point p:
            Find nearest centroid C(p)
            Add p to the C(p) group
        Update centroids
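
The basic algorithm can be sketched in a few lines. Cloudster itself is written in C#; the Python below is only an illustration and not the project's code. It assumes points are tuples of floats compared with Euclidean distance, and it initializes by picking K random points as starting centroids, a common variant of "randomly initialize groups". The names `kmeans`, `iterations` and `seed` are hypothetical.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Basic k-means on tuples of floats (Euclidean distance is an assumption)."""
    rng = random.Random(seed)
    # Random initialization: pick k distinct points as starting centroids.
    centroids = rng.sample(points, k)
    groups = []
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            # Find the nearest centroid C(p) and add p to its group.
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            groups[nearest].append(p)
        # Update centroids: mean of each group; an empty group keeps its old one.
        centroids = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids, groups


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
centroids, groups = kmeans(points, k=2)
```

A fixed iteration count keeps the sketch short; a real run would more likely stop once no point changes group.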

About Cloudster
    Generic k-means algorithm implementation: feel free to feed it with your own distance & centroid computation functions!
    Heavily scalable: uses the Windows® Azure cloud-computing platform
    Written in C# & uses the .NET framework
    BSD licensed, open development @ http://cloudster.sourceforge.net
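
To illustrate what "generic" means here, the sketch below takes the distance and centroid computations as parameters, so the clustering loop never looks inside an entity. This is Python for illustration only; Cloudster's actual C# interfaces are not shown in these slides, and `generic_kmeans` and its parameters are hypothetical names.

```python
import random
from statistics import median
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def generic_kmeans(
    entities: Sequence[T],
    k: int,
    distance: Callable[[T, T], float],
    centroid: Callable[[List[T]], T],
    iterations: int = 20,
    seed: int = 0,
) -> List[List[T]]:
    """Cluster entities using only the two user-supplied functions."""
    rng = random.Random(seed)
    centers = rng.sample(list(entities), k)
    clusters: List[List[T]] = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for e in entities:
            best = min(range(k), key=lambda i: distance(e, centers[i]))
            clusters[best].append(e)
        # An empty cluster keeps its previous center.
        centers = [centroid(c) if c else centers[i] for i, c in enumerate(clusters)]
    return clusters


# Example plug-in: absolute difference as distance, median as "centroid".
values = [1.0, 2.0, 3.0, 50.0, 51.0, 52.0]
clusters = generic_kmeans(values, 2, lambda a, b: abs(a - b), median)
```

Swapping in another pair of functions (for example, a sequence-alignment distance) changes what is clustered without touching the loop.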

Design
[Figure: architecture diagram. Labels: Tables, Tasks, Status, Cluster, EntityCluster, Queue, Blob, Worker, CoreLib.dll, ClusterJob, EntityJob, Sample.dll, ICentroid, IDistance]

User interaction

The CLI way
Three separate tools:
    The Builder, which initializes the blob storage and tables and uploads the initial entities
    The Tester, which starts the algorithm (either the sequential version or the cloud-computed one)
    The Evaluator, which computes the score of the current algorithm state
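
The slides do not say which score the Evaluator computes. A common quality measure for a k-means state is the within-cluster sum of squared distances, and the sketch below assumes that measure purely for illustration (Python, Euclidean distance; `clustering_score` is a hypothetical name, not a Cloudster API).

```python
import math


def clustering_score(clusters, centroids):
    """Sum of squared distances from each point to its cluster's centroid."""
    return sum(
        math.dist(point, center) ** 2
        for cluster, center in zip(clusters, centroids)
        for point in cluster
    )


# Two clusters; the first has two points one unit away from its centroid.
clusters = [[(0.0, 0.0), (2.0, 0.0)], [(10.0, 10.0)]]
centroids = [(1.0, 0.0), (10.0, 10.0)]
score = clustering_score(clusters, centroids)  # 1 + 1 + 0
```

With this measure a lower score means tighter clusters, so comparing it between iterations shows whether the algorithm is still improving.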

The web way
    A remote web interface, built on Azure web roles.
    Preferred way of interacting with the cloud (easier, better, faster):
        Unifies the CLI tools into a single interface
        Enables thorough monitoring of algorithm state (tasks, results)
        Enables case-specific visualisations of algorithm results

Future enhancements

Future (possible) core features
    Use reflection to unify the tools involved
    Blob storage handling improvements:
        Assign workers to specific groups of entities
        Improve the entity cache
        Store multiple entities in each blob
    Split computations and storage queries into dedicated threads
    Enable the user to add/remove entities on the fly
    Table repair tool

Upcoming samples
    Sparse vectors sample
    Image comparison sample, based on the GIST algorithm (currently investigating some implementation bugs)
    DNA sequence comparison sample, based on NAligner, using the FASTA file format
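
As an illustration of what a sparse vectors sample might need to supply, here is a minimal sketch of a distance and a centroid function over vectors stored as `{index: value}` dictionaries. The representation, the Euclidean metric and both function names are assumptions; the slides give no detail on the planned sample, and this Python is not Cloudster code.

```python
import math
from collections import defaultdict


def sparse_distance(a, b):
    """Euclidean distance between two sparse vectors stored as {index: value}."""
    keys = a.keys() | b.keys()
    return math.sqrt(sum((a.get(i, 0.0) - b.get(i, 0.0)) ** 2 for i in keys))


def sparse_centroid(vectors):
    """Component-wise mean; an index absent from a vector counts as zero."""
    totals = defaultdict(float)
    for v in vectors:
        for i, x in v.items():
            totals[i] += x
    return {i: s / len(vectors) for i, s in totals.items()}


u = {0: 3.0, 7: 4.0}
v = {7: 4.0, 1000: 2.0}
d = sparse_distance(u, v)
c = sparse_centroid([u, v])
```

Only the indices that actually appear are visited, so the cost depends on the number of non-zero entries rather than on the nominal dimension.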

Questions?