Hierarchical clustering
David M. Blei
COS424, Princeton University
February 28, 2008
D. Blei Clustering 02 1 / 21
Hierarchical clustering
• Hierarchical clustering is a widely used data analysis tool.
• The idea is to build a binary tree of the data that successively merges similar groups of points.
• Visualizing this tree provides a useful summary of the data.
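The binary tree described above can be represented with a small recursive structure. The sketch below is a minimal illustration in Python (an assumed choice, since the slides use no programming language); the `Node` class and `merge` helper are hypothetical names, not part of the lecture.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """One node of the cluster tree: a leaf holds one data index,
    an internal node joins exactly two child subtrees."""
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    index: Optional[int] = None  # set only for leaves

    def members(self) -> List[int]:
        """Return the data indices under this node, left to right."""
        if self.index is not None:
            return [self.index]
        return self.left.members() + self.right.members()


def merge(a: Node, b: Node) -> Node:
    """Join two groups into a new internal node (one merge step)."""
    return Node(left=a, right=b)


# Build a tiny tree by hand: {0,1} and {2,3}, then both together.
leaves = [Node(index=i) for i in range(4)]
ab = merge(leaves[0], leaves[1])
cd = merge(leaves[2], leaves[3])
root = merge(ab, cd)
print(root.members())  # [0, 1, 2, 3]
```

Each internal node records one merge, so walking the tree from the root down replays the merges in reverse order.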
Hierarchical clustering vs. k-means
• Recall that k-means and k-medoids require
  • A number of clusters k
  • An initial assignment of data to clusters
  • A distance measure between data points d(x_n, x_m)
• Hierarchical clustering only requires a measure of similarity between groups of data points.
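To make the contrast concrete, here is a minimal sketch of how a group-level measure can be built from a pointwise distance d(x_n, x_m). The three rules shown (single, complete and average linkage) are common choices that this section does not define; they are included as assumptions and phrased as distances, so a smaller value means the groups are more similar. Using Euclidean distance via `math.dist` as d is also an assumed choice.

```python
import math
from typing import Callable, List, Sequence

Point = Sequence[float]
Distance = Callable[[Point, Point], float]


def euclidean(x: Point, y: Point) -> float:
    """Assumed pointwise distance d(x_n, x_m)."""
    return math.dist(x, y)


def single_linkage(g: List[Point], h: List[Point], d: Distance = euclidean) -> float:
    # Distance of the closest pair, one point taken from each group.
    return min(d(x, y) for x in g for y in h)


def complete_linkage(g: List[Point], h: List[Point], d: Distance = euclidean) -> float:
    # Distance of the farthest pair, one point taken from each group.
    return max(d(x, y) for x in g for y in h)


def average_linkage(g: List[Point], h: List[Point], d: Distance = euclidean) -> float:
    # Mean distance over all cross-group pairs.
    return sum(d(x, y) for x in g for y in h) / (len(g) * len(h))


g = [(0.0, 0.0), (1.0, 0.0)]
h = [(4.0, 0.0)]
print(single_linkage(g, h), complete_linkage(g, h), average_linkage(g, h))  # 3.0 4.0 3.5
```

Note that none of these functions needs k or an initial assignment; only the group measure is required.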
Agglomerative clustering
• We will talk about agglomerative clustering.
• Algorithm:
  1. Place each data point into its own singleton group.
  2. Repeat: merge the two closest groups.
  3. Until: all the data are merged into a single cluster.
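The three steps above can be sketched as a deliberately naive Python function. It assumes Euclidean points and uses single linkage as the definition of "closest groups"; both are illustrative choices, and the name `agglomerate` and its return format are hypothetical. It recomputes every pairwise group distance at each step, so it is meant for reading, not for large data.

```python
import math
from typing import FrozenSet, List, Sequence, Tuple

Merge = Tuple[FrozenSet[int], FrozenSet[int], float]


def agglomerate(points: Sequence[Sequence[float]]) -> List[Merge]:
    """Naive agglomerative clustering with single linkage (assumed).

    Returns the merges in order as (group_a, group_b, distance),
    where each group is a frozenset of point indices.
    """
    # Step 1: each point starts in its own singleton group.
    groups = [frozenset([i]) for i in range(len(points))]
    merges: List[Merge] = []

    def linkage(g: FrozenSet[int], h: FrozenSet[int]) -> float:
        # Single linkage: distance of the closest cross-group pair.
        return min(math.dist(points[i], points[j]) for i in g for j in h)

    # Steps 2-3: merge the two closest groups until only one remains.
    while len(groups) > 1:
        best = None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                dist = linkage(groups[a], groups[b])
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        dist, a, b = best
        merges.append((groups[a], groups[b], dist))
        merged = groups[a] | groups[b]
        # Delete the larger index first so index a stays valid.
        del groups[b]
        del groups[a]
        groups.append(merged)
    return merges


pts = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (6.5, 0.0)]
for g, h, dist in agglomerate(pts):
    print(sorted(g), sorted(h), dist)
```

On these four points the two tight pairs merge first (at distances 1.0 and 1.5), and the final merge joins the two pairs at 4.0.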
Example
[Figure: the example data as a two-dimensional scatter plot (axes V1 and V2), followed by frames "Example iteration 001" through "Example iteration 024" showing the agglomerative merges step by step.]
Agglomerative clustering
• Each level of the resulting tree is a segmentation of the data.
• The algorithm therefore produces a sequence of groupings, from all singletons down to a single cluster.
• It is up to the user to choose a "natural" clustering from this sequence.
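One way to make the sequence of groupings explicit is to replay the merges in order and record the partition after each one. The sketch below assumes merges are given as (group_a, group_b) pairs of frozensets of point indices, a hypothetical format; `cut` picks the level with a chosen number of groups k, which is only one simple rule for selecting a clustering from the sequence.

```python
from typing import FrozenSet, Iterator, List, Set, Tuple

Group = FrozenSet[int]


def groupings(n: int, merges: List[Tuple[Group, Group]]) -> Iterator[Set[Group]]:
    """Yield the partition at each level, from n singletons to one group."""
    partition = {frozenset([i]) for i in range(n)}
    yield set(partition)  # level 0: every point on its own
    for a, b in merges:
        # Each merge replaces two groups with their union.
        partition.remove(a)
        partition.remove(b)
        partition.add(a | b)
        yield set(partition)  # copy, so later merges do not alter it


def cut(n: int, merges: List[Tuple[Group, Group]], k: int) -> Set[Group]:
    """Return the level of the sequence that has exactly k groups."""
    for level in groupings(n, merges):
        if len(level) == k:
            return level
    raise ValueError(f"no level with {k} groups")


merges = [
    (frozenset({0}), frozenset({1})),
    (frozenset({2}), frozenset({3})),
    (frozenset({0, 1}), frozenset({2, 3})),
]
print([len(p) for p in groupings(4, merges)])  # [4, 3, 2, 1]
print(cut(4, merges, 2))
```

Because each merge reduces the number of groups by exactly one, every k from n down to 1 appears once in the sequence.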
Dendrogram
• Agglomerative clustering is monotonic.
• The similarity between merged clusters decreases monotonically with the level of the merge.
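Monotonicity can be checked numerically. The sketch below runs a naive agglomerative loop and records the linkage distance at each merge; since it works with distances, "similarity decreases" corresponds to the recorded heights never going down. Single and complete linkage, used here as assumed examples, are known to have this property; a passing check on one small, made-up point set illustrates it but does not prove it. The helper names are hypothetical.

```python
import math
from itertools import combinations


def merge_heights(points, linkage):
    """Run naive agglomerative clustering; return the distance of each merge."""
    groups = [[p] for p in points]
    heights = []
    while len(groups) > 1:
        # Pick the pair of groups with the smallest linkage distance.
        a, b = min(combinations(range(len(groups)), 2),
                   key=lambda ab: linkage(groups[ab[0]], groups[ab[1]]))
        heights.append(linkage(groups[a], groups[b]))
        merged = groups[a] + groups[b]
        groups = [g for i, g in enumerate(groups) if i not in (a, b)] + [merged]
    return heights


def single(g, h):
    # Closest cross-group pair.
    return min(math.dist(x, y) for x in g for y in h)


def complete(g, h):
    # Farthest cross-group pair.
    return max(math.dist(x, y) for x in g for y in h)


def is_monotone(heights):
    """True if merge distances never decrease from one level to the next."""
    return all(h1 <= h2 for h1, h2 in zip(heights, heights[1:]))


pts = [(0, 0), (1, 0), (5, 0), (6.5, 0), (3, 4)]
for name, link in [("single", single), ("complete", complete)]:
    hs = merge_heights(pts, link)
    print(name, [round(x, 2) for x in hs], is_monotone(hs))
```

These non-decreasing heights are what let the merges be drawn as a dendrogram, with each merge plotted at its height above the one before it.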