


  1. Subspace Clustering, Ensemble Clustering, Alternative Clustering, Multiview Clustering: What Can We Learn From Each Other?
     MultiClust@KDD 2010
     Hans-Peter Kriegel, Arthur Zimek
     Ludwig-Maximilians-Universität München, Institute for Informatics, Database Systems Group
     Munich, Germany
     http://www.dbs.ifi.lmu.de
     {kriegel, zimek}@dbs.ifi.lmu.de

  2. Outline
     1. Subspace Clustering
     2. Ensemble Clustering
     3. Alternative Clustering
     4. Multiview Clustering
     5. Discussion
     Kriegel/Zimek: What can we learn from each other? (MultiClust@KDD 2010)

  3. Subspace Clustering
     • Task: identify clusters of similar objects
     • similarity defined w.r.t. a certain subspace of the data space
     • different subspaces for different clusters
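What "similarity w.r.t. a subspace" means can be illustrated with a distance that only looks at selected or weighted attributes. The sketch below is plain Python and assumes a weighted Euclidean distance with a hypothetical per-attribute weight vector `weights` (0/1 weights select a subspace, other non-negative values weight attributes); it is an illustration, not a method from the talk.

```python
import math
from typing import Sequence

def subspace_distance(p: Sequence[float], q: Sequence[float],
                      weights: Sequence[float]) -> float:
    """Weighted Euclidean distance restricted to a (weighted) subspace.

    Assumption (illustrative): weights are non-negative; a weight of 0 removes
    an attribute (selection), other values scale its influence (weighting).
    """
    if not (len(p) == len(q) == len(weights)):
        raise ValueError("p, q and weights must have the same dimensionality")
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(p, q, weights)))

# Two objects that agree on attributes 0 and 1 but differ strongly on attribute 2.
p, q = (1.0, 2.0, 0.0), (1.0, 2.0, 10.0)
print(subspace_distance(p, q, (1, 1, 1)))  # full space -> 10.0
print(subspace_distance(p, q, (1, 1, 0)))  # subspace {0, 1} -> 0.0
```

The weights are fixed by hand here; in subspace clustering, the subspace has to be learned together with the clusters themselves.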

  4. Subspace Clustering
     • Subspaces: different
       – selection
       – weighting
       – combination
       of attributes
     • learn subspace and clustering simultaneously (interdependency)
     • strategies:
       – top-down (learn spatial characteristics of initially built sets of objects)
       – bottom-up (learn 1-d clusters, combine them to 2-d clusters, etc. (APRIORI)) => many irrelevant clusters
     [Figure: relevant attribute / relevant subspace vs. irrelevant attribute]
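The bottom-up strategy can be sketched in the style of grid-based methods such as CLIQUE (named here only as an example of an APRIORI-like approach). Assumptions for this sketch: all attributes are scaled to [0, 1), each is cut into `xi` equal-width intervals, and a grid unit is dense if it holds at least `tau` objects; both parameters are hypothetical. Density is monotone (every projection of a dense k-d unit onto a (k-1)-d subspace is also dense), which is what justifies the APRIORI-style join and pruning.

```python
from collections import Counter
from itertools import combinations

def dense_units(points, xi, tau):
    """Map each subspace (sorted tuple of attribute indices) to its dense grid units."""
    d = len(points[0])
    # Assumption: attribute values lie in [0, 1); discretize into xi intervals.
    cells = [tuple(min(int(v * xi), xi - 1) for v in p) for p in points]

    def support(dims, unit):
        # Number of objects falling into `unit` when projected onto `dims`.
        return sum(all(c[j] == u for j, u in zip(dims, unit)) for c in cells)

    # Level 1: dense 1-d units, one subspace per attribute.
    level = {}
    for j in range(d):
        counts = Counter(c[j] for c in cells)
        units = {(i,) for i, n in counts.items() if n >= tau}
        if units:
            level[(j,)] = units

    result = {}
    while level:
        result.update(level)
        next_level = {}
        # APRIORI join: combine two k-d subspaces sharing their first k-1 attributes.
        for s1, s2 in combinations(sorted(level), 2):
            if s1[:-1] != s2[:-1]:
                continue
            dims = s1 + (s2[-1],)
            units = set()
            for u1 in level[s1]:
                for u2 in level[s2]:
                    if u1[:-1] != u2[:-1]:
                        continue
                    cand = u1 + (u2[-1],)
                    # Prune: every k-d projection of the candidate must be dense.
                    if not all(
                        dims[:i] + dims[i + 1:] in level
                        and cand[:i] + cand[i + 1:] in level[dims[:i] + dims[i + 1:]]
                        for i in range(len(dims))
                    ):
                        continue
                    if support(dims, cand) >= tau:
                        units.add(cand)
            if units:
                next_level[dims] = units
        level = next_level
    return result

points = [(0.10, 0.10, 0.05), (0.12, 0.15, 0.35), (0.15, 0.12, 0.65),
          (0.11, 0.18, 0.95), (0.90, 0.50, 0.50)]
print(dense_units(points, xi=4, tau=3))
# -> {(0,): {(0,)}, (1,): {(0,)}, (0, 1): {(0, 0)}}
```

Even on this toy data, the single 2-d cluster in subspace (0, 1) is reported together with its 1-d projections, which shows how the bottom-up output grows with lower-dimensional copies of each cluster.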

  5. Ensemble Clustering
     • basic idea: combine different clusterings to obtain one single, more reliable clustering
     • tasks:
       – how to create diverse clusterings
       – how to combine different clusterings
     • induce diversity of clusterings:
       – use different feature subsets
       – use different database subsets
       – use different clustering algorithms
     • correspondence between clusterings
       – useful for judging the redundancy of clusters?
       – a lot of different answers
       – but: could it not be that different clusterings are just different, yet both meaningful?
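One common way to combine clusterings without solving the label correspondence problem is evidence accumulation over a co-association matrix (named here as an example technique; the slide does not prescribe one). Assumptions in this plain-Python sketch: clusterings are label lists over the same objects, and two objects are linked in the consensus if more than a hypothetical `threshold` (0.5 by default) of the clusterings put them together; groups are then formed transitively with union-find.

```python
from typing import Dict, List, Sequence

def co_association(clusterings: Sequence[Sequence[int]]) -> List[List[float]]:
    """Fraction of clusterings that put objects i and j into the same cluster."""
    n, m = len(clusterings[0]), len(clusterings)
    counts = [[0] * n for _ in range(n)]
    for labels in clusterings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / m for c in row] for row in counts]

def consensus(clusterings: Sequence[Sequence[int]], threshold: float = 0.5) -> List[int]:
    """Link objects whose co-association exceeds `threshold` (hypothetical default)."""
    C = co_association(clusterings)
    n = len(C)
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if C[i][j] > threshold:
                parent[find(i)] = find(j)

    # Relabel connected components 0, 1, 2, ... in order of first appearance.
    ids: Dict[int, int] = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]

# Three clusterings of five objects; label values need not correspond across runs.
runs = [[0, 0, 1, 1, 2], [1, 1, 0, 0, 0], [0, 0, 0, 1, 1]]
print(consensus(runs))  # -> [0, 0, 1, 1, 1]
```

The third clustering groups objects 0, 1 and 2 together, but only one of three clusterings supports the pairs (0, 2) and (1, 2), so that grouping does not survive in the consensus.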

  6. Alternative Clustering
     • given a clustering, use diversity or non-redundancy as a constraint to find a different clustering
     • techniques:
       – ensemble techniques
       – use different subspaces
     • relationship to subspace clustering:
       – subspace clustering can learn from the treatment of non-redundancy
       – alternative clustering can learn to allow for a certain level of redundancy
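Non-redundancy as a constraint can be sketched as a filter: candidate clusterings (for example, found in different subspaces) are compared with the given clustering by the Rand index, a pair-counting agreement measure that needs no mapping between cluster labels, and only candidates below a hypothetical `max_agreement` bound are kept, least redundant first. Both the measure and the bound are assumptions chosen for illustration.

```python
from itertools import combinations
from typing import Dict, List, Sequence

def rand_index(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of object pairs on which two clusterings agree (same vs. different cluster)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)

def alternatives(given: Sequence[int], candidates: Dict[str, Sequence[int]],
                 max_agreement: float = 0.5) -> List[str]:
    """Candidates whose Rand index with `given` is below max_agreement, least redundant first.

    Assumption (illustrative): max_agreement is a hypothetical, user-chosen
    bound on the admissible level of redundancy.
    """
    scores = {name: rand_index(given, labels) for name, labels in candidates.items()}
    kept = [name for name, s in scores.items() if s < max_agreement]
    return sorted(kept, key=scores.__getitem__)

given = [0, 0, 1, 1]
candidates = {                     # hypothetical clusterings from other subspaces
    "same_split": [1, 1, 0, 0],    # identical partition, only relabeled
    "cross_split": [0, 1, 0, 1],
    "mixed": [0, 0, 0, 1],
}
print(alternatives(given, candidates))                     # -> ['cross_split']
print(alternatives(given, candidates, max_agreement=0.6))  # -> ['cross_split', 'mixed']
```

`same_split` uses different label values but describes the same partition; pair counting still scores it as fully redundant (1.0), so it is rejected regardless of the bound.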

  7. Multiview Clustering
     • seek different clusterings in different subspaces
     • special case of alternative clustering?
       – constraint: orthogonality of subspaces
     • special case of subspace clustering?
       – allowing maximal overlap of clusters
       – seeking minimally redundant clusters by accommodating different concepts
     • emphasizes an observation known from subspace clustering: highly overlapping clusters in different subspaces need be neither redundant nor meaningless
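The orthogonality constraint can be made concrete by projecting the data onto the orthogonal complement of the subspace in which a first clustering was found, so that any further clustering has to rely on different directions. This plain-Python sketch rests on assumptions: the first view is given as a hypothetical list of spanning vectors `first_view`, the subspace may be arbitrarily oriented (not only axis-parallel), and the subsequent clustering step is left out.

```python
from typing import List, Sequence

Vector = List[float]

def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))

def orthonormal_basis(vectors: Sequence[Sequence[float]], eps: float = 1e-12) -> List[Vector]:
    """Gram-Schmidt: orthonormal basis of span(vectors); dependent vectors are dropped."""
    basis: List[Vector] = []
    for v in vectors:
        w = [float(x) for x in v]
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > eps:
            basis.append([wi / norm for wi in w])
    return basis

def project_to_complement(points: Sequence[Sequence[float]],
                          subspace: Sequence[Sequence[float]]) -> List[Vector]:
    """Remove from each point its component inside span(subspace)."""
    basis = orthonormal_basis(subspace)
    residuals = []
    for p in points:
        r = [float(x) for x in p]
        for b in basis:
            c = dot(r, b)
            r = [ri - c * bi for ri, bi in zip(r, b)]
        residuals.append(r)
    return residuals

points = [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)]
first_view = [(1.0, 1.0, 0.0)]  # hypothetical direction used by a first clustering
residual = project_to_complement(points, first_view)
print([[round(x, 6) for x in r] for r in residual])  # -> [[-0.5, 0.5, 3.0], [-0.5, 0.5, 6.0]]
```

This enforces strict orthogonality: a clustering of the residuals cannot reuse any structure along the first view's directions, which is exactly the restriction the slide questions.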

  8. Discussion
     subspace clustering
       • goal: different clusters in different subspaces
       • problem: redundancy of clusters (same clusters reported for different subspaces)
     ensemble clustering
       • goal: different subspaces shall induce the same clusters
       • problem: correspondence of clusterings? What about actually different clusterings?
     alternative clustering
       • goal: given a clustering, find a different clustering
       • problem: which level of redundancy is admissible?
     multiview clustering
       • goal: find different cluster concepts in different subspaces
       • problem: balance between admissible overlap of clusters and difference between concepts

  9. Discussion
     • how should we treat diversity of clustering solutions?
       – should diverse clusterings always be unified (ensemble)?
       – under which conditions is a unification of diverse clusterings meaningful?
     • can we learn from diversity itself?
       – again ensemble: an exceptional clustering in one subspace will be outnumbered and lost
       – could it not be especially interesting?
     • how to treat redundancy (esp. overlap)?
       – when does a cluster qualify as redundant w.r.t. another cluster, and when does it represent a different concept (despite a certain overlap)?
       [Figure: scale from alternative clustering (low redundancy) to subspace clustering (high redundancy)]
     • how to assess similarity between clustering solutions?
       – possible overlap between clusters makes this problem really difficult
       – no simple mapping
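One possible way to operationalize the question of when a cluster is redundant w.r.t. another is to measure object overlap and subspace overlap separately and call a pair redundant only if both are high. This is a hypothetical criterion for illustration, not one proposed in the talk; the Jaccard coefficient and the thresholds `min_object_overlap` and `min_subspace_overlap` are assumptions.

```python
from dataclasses import dataclass
from typing import FrozenSet

@dataclass(frozen=True)
class SubspaceCluster:
    objects: FrozenSet[int]     # ids of the clustered objects
    attributes: FrozenSet[int]  # attribute indices spanning the cluster's subspace

def jaccard(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """|a & b| / |a | b|; two empty sets are treated as identical."""
    return len(a & b) / len(a | b) if a | b else 1.0

def is_redundant(c1: SubspaceCluster, c2: SubspaceCluster,
                 min_object_overlap: float = 0.8,
                 min_subspace_overlap: float = 0.5) -> bool:
    """Hypothetical rule: redundant only if objects AND subspaces largely coincide."""
    return (jaccard(c1.objects, c2.objects) >= min_object_overlap
            and jaccard(c1.attributes, c2.attributes) >= min_subspace_overlap)

a = SubspaceCluster(frozenset({1, 2, 3, 4, 5}), frozenset({0, 1}))
b = SubspaceCluster(frozenset({1, 2, 3, 4, 5, 6}), frozenset({0, 1, 2}))  # similar objects, superspace
c = SubspaceCluster(frozenset({1, 2, 3, 4, 5}), frozenset({3, 4}))        # same objects, disjoint subspace
print(is_redundant(a, b), is_redundant(a, c))  # -> True False
```

Clusters `a` and `c` contain exactly the same objects, yet the rule does not flag them as redundant because they live in disjoint subspaces and may therefore describe different concepts.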
