On Using Class-Labels in Evaluation of Clusterings
Ines Färber (•), Stephan Günnemann (•), Hans-Peter Kriegel (◦), Peer Kröger (◦), Emmanuel Müller (•), Erich Schubert (◦), Thomas Seidl (•), Arthur Zimek (◦)
• RWTH Aachen University, Germany
◦ LMU Munich University, Germany
MultiClust at KDD 2010, July 25, 2010
The Dilemma of Evaluation
What would be the optimal clustering solution?
[Figure: two panels, View 1 and View 2]
Introduction
evaluation of clustering solutions:
  evaluation based on internal measures
    + no additional information needed; data independent
    - approaches optimizing the evaluation criteria will always be preferred
  evaluation based on an expert's opinion
    + may reveal new insight into the data
    - very expensive; results are not comparable
  evaluation based on external measures
    + objective evaluation
    - needs a valid ground truth
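To make the notion of an external measure concrete, the following minimal Python sketch scores a clustering against class labels with two common external measures, purity and the Rand index. The toy label lists and the function names are illustrative assumptions and do not come from the talk.

```python
from collections import Counter
from itertools import combinations


def purity(clusters, classes):
    """Fraction of objects that carry the majority class label of their cluster."""
    by_cluster = {}
    for c, y in zip(clusters, classes):
        by_cluster.setdefault(c, []).append(y)
    # For each cluster, count the objects belonging to its most frequent class.
    majority_total = sum(
        Counter(ys).most_common(1)[0][1] for ys in by_cluster.values()
    )
    return majority_total / len(classes)


def rand_index(clusters, classes):
    """Fraction of object pairs on which clustering and class labels agree."""
    pairs = list(combinations(range(len(classes)), 2))
    agree = sum(
        (clusters[i] == clusters[j]) == (classes[i] == classes[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical toy data: six objects, two classes, two clusters.
classes = ["a", "a", "a", "b", "b", "b"]
clusters = [0, 0, 1, 1, 1, 1]

purity_score = purity(clusters, classes)
rand_score = rand_index(clusters, classes)
print(f"purity = {purity_score:.3f}, Rand index = {rand_score:.3f}")
```

Both scores depend entirely on the class labels passed in, which is why this kind of evaluation needs a valid ground truth.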
History of Cluster Evaluation
clustering broke off from classification
  ⇒ assumption: classes stand out by inherent similarity
traditional clustering mainly follows the partitioning approach
external evaluation of traditional clustering:
  ⇒ the original assumption motivated the comparison against class labels
[Figure: UCI iris dataset]
class structure does not necessarily correspond to a clustering structure
  ⇒ classes may split up into several subgroups
  ⇒ there might be smooth transitions between two classes
Multi-View Context
assumption: data groups differently when seen from different perspectives
  ⇒ each object might be grouped in multiple clusters
  ⇒ with each perspective a set of attributes can be associated
[Figure: two panels, View 1 and View 2]
clustering goes beyond the structure of class labels
data items potentially belong to many clusters in differing views
  ⇒ class labels do not meet the assumptions of this scenario
Classes vs. Clusters
commonly observed differences between clusterings and class labelings:
  splitting of classes into multiple clusters
  merging of classes into a single cluster
  missing class outliers
  multiple (overlapping) hidden structures
[Figure: given class label C = shape; alternative labels H = color; groups W, Y, X, Z]
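As an illustration of how splitting and merging affect a label-based score, the sketch below computes pair-counting precision, recall and F1 on a toy example. The data, the function name `pair_scores` and the choice of the pair-counting measure are assumptions made for this illustration; the talk does not prescribe a particular measure.

```python
from itertools import combinations


def pair_scores(clusters, classes):
    """Pair-counting precision, recall and F1 of a clustering versus class labels."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(classes)), 2):
        same_cluster = clusters[i] == clusters[j]
        same_class = classes[i] == classes[j]
        if same_cluster and same_class:
            tp += 1  # pair grouped together in both
        elif same_cluster:
            fp += 1  # grouped by the clustering only
        elif same_class:
            fn += 1  # grouped by the class labels only
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical toy data: two classes with four objects each.
classes = ["A"] * 4 + ["B"] * 4

# Class A is split into two pure clusters; class B is recovered exactly.
split = [0, 0, 1, 1, 2, 2, 2, 2]
# Both classes are merged into a single cluster.
merged = [0] * 8

split_scores = pair_scores(split, classes)
merged_scores = pair_scores(merged, classes)
print("split :", split_scores)
print("merged:", merged_scores)
```

In this toy setting the split clustering keeps perfect precision but loses recall, although every cluster is pure, while the merged clustering keeps perfect recall but loses precision.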
Case Study – Pendigits Dataset
different ways of digit notation
[Figure: handwritten digit examples, annotated with 1 and 2]
different types of digits 9 and 3
  ⇒ almost 30 different groups of digits in contrast to 10 given classes
Case Study – ALOI Dataset
object groups that stand out due to similarity based on:
  color
  shape
  rotation
  object types
  ⇒ feature space influences the clustering result
Can Anything Be Learned?
QUESTIONABLE
EDSC … still widely used!
Challenges
1 ground truth should provide multiple labellings
2 measures should be able to deal with multiple labels
⇒ e.g. label layers
  challenges:
    clustering covers only part of the layer (incompleteness?)
    clusters in one layer vs. multiple layers (purity vs. variety)
    the clustering intersects layers
    the clustering contains newly detected clusters
⇒ e.g. label hierarchies
⇒ e.g. label ontologies
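One way to picture evaluation against label layers is the minimal sketch below, which scores a single clustering against each layer separately and reports the best-matching layer. The layer names (shape and color, borrowed from the earlier example), the toy data, the function name `pair_f1` and the best-match rule are all assumptions for illustration; the talk poses such measures as an open challenge and does not define one.

```python
from itertools import combinations


def pair_f1(clusters, labels):
    """Pair-counting F1 of a clustering versus one labelling."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(labels)), 2):
        same_cluster = clusters[i] == clusters[j]
        same_label = labels[i] == labels[j]
        if same_cluster and same_label:
            tp += 1
        elif same_cluster:
            fp += 1
        elif same_label:
            fn += 1
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Hypothetical ground truth with two label layers for four objects.
layers = {
    "shape": ["circle", "circle", "square", "square"],
    "color": ["red", "blue", "red", "blue"],
}

# A clustering that happens to group the objects by color.
clusters = [0, 1, 0, 1]

# Score the clustering against every layer, then pick the best match.
scores = {name: pair_f1(clusters, labels) for name, labels in layers.items()}
best_layer = max(scores, key=scores.get)
print(scores, "best layer:", best_layer)
```

Evaluated against the shape layer alone, this clustering would look poor even though it recovers the color layer; a best-match rule like this one still does not address clusterings that intersect layers or contain newly detected clusters.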
Challenges
1 ground truth should provide multiple labellings
2 measures should be able to deal with multiple labels
⇒ e.g. label layers
⇒ e.g. label hierarchies
  challenges:
    might be hard to derive
    clustering covers one branch (redundancy?)
    clustering covers one layer (impurity?)
    clustering covers nodes only partially (incompleteness?)
    union of nodes
    newly detected clusters
⇒ e.g. label ontologies
Conclusion
[Figure: database; classification yields a class label C per object (evaluation); clustering yields hidden clusters H1, H2, H3, H4, ... per object (enhanced evaluation)]
proceed in the development of new clustering algorithms
ensure objective clustering evaluation
labeling of data
measures for multiple labels