Clustering Algorithms
Johannes Blömer
WS 2015/16
Introduction

Clustering: techniques for data management and analysis that classify/group a given set of objects into categories/subgroups, or clusters.

Clusters: homogeneous subgroups of objects, such that the similarity between objects in one subgroup is larger than the similarity between objects from different subgroups.

Goals
1 find structures in a large set of objects/data
2 simplify large data sets
Example

[figure: example data sets illustrating possible clusterings]

How do we measure similarity/dissimilarity of objects?
How do we measure the quality of a clustering?
Application areas
1 information retrieval
2 data mining
3 machine learning
4 statistics
5 pattern recognition
6 computer graphics
7 data compression
8 bioinformatics
9 speech recognition
Goals of this course
- different models for clustering
- many important clustering heuristics, including agglomerative clustering, Lloyd's algorithm, and the EM algorithm
- the limitations of these heuristics
- improvements to these heuristics
- various theoretical results about clustering, including NP-hardness results and approximation algorithms
- general techniques to improve the efficiency of heuristics and approximation algorithms, e.g. dimension reduction techniques
Organization
Information about this course:
http://www.cs.uni-paderborn.de/fachgebiete/ag-bloemer/lehre/2015/ws/clusteringalgorithms.html
Here you find:
- announcements
- handouts
- slides
- literature
- lecture notes (will be written and appear as the course progresses)
There is only one tutorial, Thursday 13:00-14:00. It starts next week.
Prerequisites
- design and analysis of algorithms
- basic complexity theory
- probability theory and stochastics
- some linear algebra
Objects
- objects described by $d$ different features
- features continuous or binary
- objects described as elements in $\mathbb{R}^d$ or $\{0,1\}^d$
- objects from $M \subseteq \mathbb{R}^d$ or $M \subseteq \{0,1\}^d$
Distance functions

Definition 1.1
$D : M \times M \to \mathbb{R}$ is called a distance function, if for all $x, y, z \in M$
- $D(x, y) = D(y, x)$ (symmetry)
- $D(x, y) \geq 0$ (positivity).
$D$ is called a metric, if in addition,
- $D(x, y) = 0 \Leftrightarrow x = y$ (reflexivity)
- $D(x, z) \leq D(x, y) + D(y, z)$ (triangle inequality).
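As a quick sanity check, these axioms can be tested numerically on a few sample points. Below is a minimal Python/NumPy sketch (not part of the slides; all names are mine); a passing test is only evidence, not a proof, that $D$ is a metric.

```python
import itertools
import numpy as np

def check_metric(D, points, tol=1e-9):
    """Test symmetry, positivity, reflexivity, and the triangle
    inequality of D on all triples of the given sample points."""
    for x, y, z in itertools.product(points, repeat=3):
        assert abs(D(x, y) - D(y, x)) <= tol            # symmetry
        assert D(x, y) >= -tol                          # positivity
        assert (D(x, y) <= tol) == np.allclose(x, y)    # reflexivity
        assert D(x, z) <= D(x, y) + D(y, z) + tol       # triangle inequality
    return True

points = [np.array(p) for p in [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 4.0)]]
print(check_metric(lambda x, y: np.linalg.norm(x - y), points))  # True
```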
Examples

Example 1.2 (Euclidean distance)
$M = \mathbb{R}^d$,
$$D_{\ell_2}(x, y) = \|x - y\|_2 = \Big( \sum_{i=1}^d |x_i - y_i|^2 \Big)^{1/2},$$
where $x = (x_1, \dots, x_d)$ and $y = (y_1, \dots, y_d)$.

Example 1.3 (squared Euclidean distance)
$M = \mathbb{R}^d$,
$$D_{\ell_2^2}(x, y) = \|x - y\|_2^2 = \sum_{i=1}^d |x_i - y_i|^2.$$
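A small Python/NumPy sketch of both distances (the function names are mine, not from the slides). The squared Euclidean distance is symmetric and nonnegative, hence a distance function, but it violates the triangle inequality, so it is not a metric:

```python
import numpy as np

def d_l2(x, y):
    """Euclidean distance D_l2(x, y) = ||x - y||_2."""
    return float(np.linalg.norm(x - y))

def d_l2_squared(x, y):
    """Squared Euclidean distance D_l2^2(x, y) = ||x - y||_2^2."""
    return float(np.sum((x - y) ** 2))

# The triangle inequality fails for the squared distance:
x, y, z = np.array([0.0]), np.array([1.0]), np.array([2.0])
assert d_l2_squared(x, z) > d_l2_squared(x, y) + d_l2_squared(y, z)  # 4 > 1 + 1
```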
Example 1.4 (Minkowski distances, $\ell_p$-norms)
$M = \mathbb{R}^d$, $p \geq 1$,
$$D_{\ell_p}(x, y) = \|x - y\|_p = \Big( \sum_{i=1}^d |x_i - y_i|^p \Big)^{1/p}.$$

Example 1.5 (maximum distance)
$M = \mathbb{R}^d$,
$$D_{\ell_\infty}(x, y) = \|x - y\|_\infty = \max_{1 \leq i \leq d} |x_i - y_i|.$$
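A sketch of both families (names mine); for $p \to \infty$ the $\ell_p$ distance converges to the maximum distance:

```python
import numpy as np

def d_lp(x, y, p):
    """Minkowski distance (sum_i |x_i - y_i|^p)^(1/p), for p >= 1."""
    return float(np.sum(np.abs(x - y) ** p) ** (1.0 / p))

def d_linf(x, y):
    """Maximum distance max_i |x_i - y_i|."""
    return float(np.max(np.abs(x - y)))

x, y = np.array([1.0, 2.0, 3.0]), np.array([4.0, 0.0, 3.0])
print(d_lp(x, y, 1))   # 5.0  (l_1, "Manhattan" distance)
print(d_lp(x, y, 2))   # ~3.606 (Euclidean distance)
print(d_linf(x, y))    # 3.0  (limit of d_lp as p -> infinity)
```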
Example 1.6 (Pearson correlation)
$M = \mathbb{R}^d$,
$$D_{\mathrm{Pearson}}(x, y) = \frac{1}{2} \left( 1 - \frac{\sum_{i=1}^d (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^d (x_i - \bar{x})^2 \, \sum_{i=1}^d (y_i - \bar{y})^2}} \right),$$
where $\bar{x} = \frac{1}{d} \sum x_i$ and $\bar{y} = \frac{1}{d} \sum y_i$.
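A sketch of the Pearson distance, assuming the $\frac{1}{2}(1 - r)$ reading of the formula above, where $r$ is the sample correlation (function name mine). It lies in $[0, 1]$: 0 for perfectly correlated vectors, 1 for anti-correlated ones:

```python
import numpy as np

def d_pearson(x, y):
    """Pearson distance (1 - r(x, y)) / 2, r the sample correlation."""
    xc, yc = x - x.mean(), y - y.mean()  # center both vectors
    r = np.dot(xc, yc) / np.sqrt(np.dot(xc, xc) * np.dot(yc, yc))
    return (1.0 - r) / 2.0

x = np.array([1.0, 2.0, 3.0])
print(d_pearson(x, 2 * x + 1))  # 0.0: increasing affine transform, r = 1
print(d_pearson(x, -x))         # 1.0: r = -1
```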
Example 1.7 (Mahalanobis divergence)
$A \in \mathbb{R}^{d \times d}$ positive definite, i.e. $x^T A x > 0$ for $x \neq 0$, $M = \mathbb{R}^d$,
$$D_A(x, y) = (x - y)^T A (x - y).$$

Example 1.8 (Itakura-Saito divergence)
$M = \mathbb{R}^d_{>0}$,
$$D_{\mathrm{IS}}(x, y) = \sum_{i=1}^d \left( \frac{x_i}{y_i} - \ln\Big(\frac{x_i}{y_i}\Big) - 1 \right).$$
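A sketch of both divergences (names and the example matrix are mine). For $A = I$ the Mahalanobis divergence reduces to the squared Euclidean distance:

```python
import numpy as np

def d_mahalanobis(x, y, A):
    """Mahalanobis divergence (x - y)^T A (x - y), A positive definite."""
    diff = x - y
    return float(diff @ A @ diff)

def d_itakura_saito(x, y):
    """Itakura-Saito divergence, defined for strictly positive vectors."""
    r = x / y
    return float(np.sum(r - np.log(r) - 1.0))

x, y = np.array([1.0, 2.0]), np.array([2.0, 1.0])
print(d_mahalanobis(x, y, np.eye(2)))  # 2.0, equals ||x - y||_2^2
print(d_itakura_saito(x, x))           # 0.0, divergence of a point to itself
```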
Example 1.9 (Kullback-Leibler divergence)
$M = S_d := \{ x \in \mathbb{R}^d : \forall i : x_i \geq 0, \sum x_i = 1 \}$,
$$D_{\mathrm{KLD}}(x, y) = \sum x_i \ln(x_i / y_i),$$
where by definition $0 \cdot \ln(0) = 0$.

Example 1.10 (generalized KLD)
$M = \mathbb{R}^d_{\geq 0}$,
$$D_{\mathrm{KLD}}(x, y) = \sum \big( x_i \ln(x_i / y_i) - (x_i - y_i) \big).$$
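A sketch of both divergences with the $0 \cdot \ln(0) = 0$ convention handled explicitly by summing only over coordinates with $x_i > 0$ (function names mine; assumes $y_i > 0$ wherever $x_i > 0$):

```python
import numpy as np

def d_kld(x, y):
    """Kullback-Leibler divergence for probability vectors x, y."""
    mask = x > 0  # implements the convention 0 * ln(0) = 0
    return float(np.sum(x[mask] * np.log(x[mask] / y[mask])))

def d_gkld(x, y):
    """Generalized KLD for nonnegative vectors, not necessarily summing to 1."""
    mask = x > 0
    kl = np.sum(x[mask] * np.log(x[mask] / y[mask]))
    return float(kl - np.sum(x - y))

p = np.array([0.5, 0.5, 0.0])
q = np.array([0.25, 0.5, 0.25])
print(d_kld(p, q))   # 0.5 * ln(2) ~ 0.3466
print(d_gkld(p, q))  # same value here, since p and q both sum to 1
```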
Similarity functions

Definition 1.11
$S : M \times M \to \mathbb{R}$ is called a similarity function, if for all $x, y, z \in M$
- $S(x, y) = S(y, x)$ (symmetry)
- $0 \leq S(x, y) \leq 1$ (positivity).
$S$ is called a metric, if in addition,
- $S(x, y) = 1 \Leftrightarrow x = y$ (reflexivity)
- $S(x, y) \, S(y, z) \leq \big( S(x, y) + S(y, z) \big) \, S(x, z)$ (triangle inequality).
Examples

Example 1.12 (Cosine similarity)
$M = \mathbb{R}^d$,
$$S_{\mathrm{CS}}(x, y) = \frac{x^T y}{\|x\| \, \|y\|} \quad \text{or} \quad \bar{S}_{\mathrm{CS}}(x, y) = \frac{1 + S_{\mathrm{CS}}(x, y)}{2}.$$
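A sketch of both variants (names mine). $S_{\mathrm{CS}}$ ranges over $[-1, 1]$; the rescaled version maps it into $[0, 1]$, as required by Definition 1.11:

```python
import numpy as np

def s_cos(x, y):
    """Cosine similarity x^T y / (||x|| ||y||), in [-1, 1]."""
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

def s_cos_rescaled(x, y):
    """Rescaled cosine similarity (1 + S_CS) / 2, in [0, 1]."""
    return (1.0 + s_cos(x, y)) / 2.0

x, y = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(s_cos(x, y))           # 0.0: orthogonal vectors
print(s_cos_rescaled(x, y))  # 0.5
```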
Similarity for binary features
Let $x, y \in \{0,1\}^d$, then
$$n_{b\bar{b}}(x, y) := \big| \{ 1 \leq i \leq d : x_i = b, y_i = \bar{b} \} \big|$$
and for $w \in \mathbb{R}_{\geq 0}$
$$S_w(x, y) := \frac{n_{00}(x, y) + n_{11}(x, y)}{n_{00}(x, y) + n_{11}(x, y) + w \big( n_{01}(x, y) + n_{10}(x, y) \big)}.$$
Popular: $w = 1, 2, \frac{1}{2}$.
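A sketch of $S_w$ for binary vectors (function name mine). For $w = 1$ this is the simple matching coefficient; $w = 2$ penalizes mismatches more heavily, $w = \frac{1}{2}$ less:

```python
import numpy as np

def s_w(x, y, w=1.0):
    """Weighted matching similarity S_w for x, y in {0, 1}^d."""
    n11 = int(np.sum((x == 1) & (y == 1)))  # positions where both are 1
    n00 = int(np.sum((x == 0) & (y == 0)))  # positions where both are 0
    mismatches = int(np.sum(x != y))        # n01 + n10
    return (n00 + n11) / (n00 + n11 + w * mismatches)

x = np.array([1, 0, 1, 1])
y = np.array([1, 1, 0, 1])
print(s_w(x, y, w=1))    # 0.5: 2 matches, 2 mismatches (simple matching)
print(s_w(x, y, w=0.5))  # ~0.667: mismatches weighted less
```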