
NPFL103: Information Retrieval (10): Document Clustering
Pavel Pecina, Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics
Outline: Introduction, K-means, Evaluation, How many clusters?, Hierarchical clustering, Variants


  1.–18. Worked Example (figures, one per slide): starting from the current centroids (marked ×), the slides alternate between “Assignment” (every point is labelled with the number of its closest cluster), “Recompute cluster centroids” (each × moves to the mean of its points), and “Assign points to closest centroid”, repeated for several iterations up to the final slide, “Centroids and assignments after convergence”.

  19. K-means is guaranteed to converge: Proof
▶ RSS = sum of all squared distances between document vectors and their closest centroids.
▶ RSS decreases during each reassignment step,
▶ because each vector is moved to a closer centroid.
▶ RSS decreases during each recomputation step.
▶ See the book for a proof.
▶ There is only a finite number of clusterings.
▶ Thus: we must reach a fixed point.
▶ Assumption: ties are broken consistently.
▶ Finite set & monotonically decreasing → convergence.

  20. Convergence and optimality of K-means
▶ K-means is guaranteed to converge…
▶ …but we don’t know how long convergence will take!
▶ If we don’t care about a few docs switching back and forth, then convergence is usually fast (< 10–20 iterations).
▶ However, complete convergence can take many more iterations.
▶ Convergence ≠ optimality:
▶ Convergence does not mean that we converge to the optimal clustering!
▶ This is the great weakness of K-means.
▶ If we start with a bad set of seeds, the resulting clustering can be horrible.

  21. Exercise: Suboptimal clustering
(Figure: six points d1, …, d6 arranged in two horizontal rows of three.)
▶ What is the optimal clustering for K = 2?
▶ Do we converge on this clustering for arbitrary seeds d_i, d_j?

  22. Initialization of K-means
▶ Random seed selection is just one of many ways K-means can be initialized.
▶ Random seed selection is not very robust: it’s easy to get a suboptimal clustering.
▶ Better ways of computing initial centroids:
▶ Select seeds not randomly, but using some heuristic (e.g., filter out outliers or find a set of seeds that has “good coverage” of the document space).
▶ Use hierarchical clustering to find good seeds.
▶ Select i (e.g., i = 10) different random sets of seeds, do a K-means clustering for each, and select the clustering with the lowest RSS.

  23. Time complexity of K-means
▶ Computing one distance of two vectors is O(M).
▶ Reassignment step: O(KNM) (we need to compute KN document–centroid distances).
▶ Recomputation step: O(NM) (we need to add each of the document’s < M values to one of the centroids).
▶ Assume the number of iterations is bounded by I.
▶ Overall complexity: O(IKNM) – linear in all important dimensions.
▶ However: this is not a real worst-case analysis.
▶ In pathological cases, complexity can be worse than linear.
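The two steps above map directly onto code. Below is a minimal NumPy sketch of the algorithm (my own illustration, not the lecture’s reference implementation); the function name `kmeans`, the random seed selection, and the fixed iteration bound are assumptions made for the example.

```python
import numpy as np

def kmeans(X, K, n_iter=20, seed=0):
    """Minimal K-means over an (N, M) array of document vectors.

    Returns (assignments, centroids, RSS)."""
    rng = np.random.default_rng(seed)
    # Random seed selection: K distinct documents become the initial centroids.
    centroids = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    for _ in range(n_iter):
        # Reassignment step, O(KNM): squared distance of every document to every centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        # Recomputation step, O(NM): each centroid becomes the mean of its documents.
        new_centroids = np.array([X[assign == k].mean(axis=0) if np.any(assign == k)
                                  else centroids[k] for k in range(K)])
        if np.allclose(new_centroids, centroids):
            break                                   # fixed point reached
        centroids = new_centroids
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1), centroids, dists.min(axis=1).sum()
```

Both RSS-decreasing steps from the convergence proof are visible here: reassignment never increases a document’s distance to its centroid, and recomputation moves each centroid to the mean of its cluster.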

  24. Evaluation

  25. What is a good clustering?
▶ Internal criteria
▶ Example of an internal criterion: RSS in K-means.
▶ But an internal criterion often does not evaluate the actual utility of a clustering in the application.
▶ Alternative: external criteria
▶ Evaluate with respect to a human-defined classification.

  26. External criteria for clustering quality
▶ Based on a gold standard data set, e.g., the Reuters collection we also used for the evaluation of classification.
▶ Goal: clustering should reproduce the classes in the gold standard.
▶ (But we only want to reproduce how documents are divided into groups, not the class labels.)
▶ First measure for how well we were able to reproduce the classes: purity.

  27. External criterion: Purity

    purity(Ω, C) = (1/N) ∑_k max_j |ω_k ∩ c_j|

▶ Ω = {ω_1, ω_2, …, ω_K} is the set of clusters and C = {c_1, c_2, …, c_J} is the set of classes.
▶ For each cluster ω_k: find the class c_j with the most members n_kj in ω_k.
▶ Sum all n_kj and divide by N, the total number of points.

  28. Example for computing purity
(Figure: 17 points grouped into cluster 1, cluster 2, and cluster 3, drawn with the class symbols x, o, and ⋄.)
To compute purity: 5 = max_j |ω_1 ∩ c_j| (class x, cluster 1); 4 = max_j |ω_2 ∩ c_j| (class o, cluster 2); and 3 = max_j |ω_3 ∩ c_j| (class ⋄, cluster 3). Purity is (1/17) × (5 + 4 + 3) ≈ 0.71.
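A short Python sketch of the purity computation (illustrative; the exact cluster composition below is reconstructed from this example and from the pair counts on the Rand-index slides, so treat it as an assumption).

```python
from collections import Counter

def purity(clusters, classes):
    """Fraction of points that belong to the majority class of their cluster."""
    by_cluster = {}
    for k, c in zip(clusters, classes):
        by_cluster.setdefault(k, []).append(c)
    majority = sum(Counter(members).most_common(1)[0][1]
                   for members in by_cluster.values())
    return majority / len(classes)

# Assumed composition of the o/⋄/x example ("d" stands for ⋄):
clusters = [1] * 6 + [2] * 6 + [3] * 5
classes = (["x"] * 5 + ["o"] +               # cluster 1: 5 x, 1 o
           ["x"] + ["o"] * 4 + ["d"] +       # cluster 2: 1 x, 4 o, 1 ⋄
           ["x"] * 2 + ["d"] * 3)            # cluster 3: 2 x, 3 ⋄
print(purity(clusters, classes))             # (5 + 4 + 3) / 17 ≈ 0.71
```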

  29. Another external criterion: Rand index
▶ Purity can be increased easily by increasing K – a measure that does not have this problem: Rand index.

    RI = (TP + TN) / (TP + FP + FN + TN)

▶ Based on a 2×2 contingency table of all pairs of documents:

                          same cluster           different clusters
      same class          true positives (TP)    false negatives (FN)
      different classes   false positives (FP)   true negatives (TN)

▶ Where TP + FN + FP + TN = (N choose 2) is the total number of pairs for N docs.
▶ Each pair is either positive or negative (the clustering puts the two documents in the same or in different clusters)…
▶ …and either “true” (correct) or “false” (incorrect): the clustering decision is correct or incorrect.

  30. Example: compute the Rand index for the o/⋄/x example
▶ We first compute TP + FP. The three clusters contain 6, 6, and 5 points, respectively, so the total number of “positives”, i.e., pairs of documents that are in the same cluster, is:

    TP + FP = (6 choose 2) + (6 choose 2) + (5 choose 2) = 15 + 15 + 10 = 40

▶ Of these, the x pairs in cluster 1, the o pairs in cluster 2, the ⋄ pairs in cluster 3, and the x pair in cluster 3 are true positives:

    TP = (5 choose 2) + (4 choose 2) + (3 choose 2) + (2 choose 2) = 10 + 6 + 3 + 1 = 20

▶ Thus, FP = 40 − 20 = 20.
▶ FN and TN are computed similarly.

  31. Rand index for the o/⋄/x example

                          same cluster   different clusters
      same class          TP = 20        FN = 24
      different classes   FP = 20        TN = 72

RI is then (20 + 72) / (20 + 20 + 24 + 72) ≈ 0.68.
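For illustration, a brute-force pair count over the same (assumed) label lists used in the purity sketch above:

```python
from itertools import combinations

def rand_index(clusters, classes):
    """Rand index: fraction of document pairs on which clustering and classes agree."""
    tp = fp = fn = tn = 0
    for (k1, c1), (k2, c2) in combinations(zip(clusters, classes), 2):
        same_cluster, same_class = (k1 == k2), (c1 == c2)
        if same_cluster and same_class:
            tp += 1
        elif same_cluster:
            fp += 1
        elif same_class:
            fn += 1
        else:
            tn += 1
    return (tp + tn) / (tp + fp + fn + tn)

# Same assumed o/⋄/x labels as in the purity sketch.
clusters = [1] * 6 + [2] * 6 + [3] * 5
classes = ["x"] * 5 + ["o"] + ["x"] + ["o"] * 4 + ["d"] + ["x"] * 2 + ["d"] * 3
print(rand_index(clusters, classes))   # (20 + 72) / 136 ≈ 0.68
```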

  32. Two other external evaluation measures
▶ Normalized mutual information (NMI)
▶ How much information does the clustering contain about the classification?
▶ Singleton clusters (number of clusters = number of docs) have maximum MI.
▶ Therefore: normalize by the entropy of clusters and classes.
▶ F measure
▶ Like the Rand index, but “precision” and “recall” can be weighted.
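If scikit-learn is available, these measures can be cross-checked against library implementations (an optional aside, not part of the lecture; `rand_score` requires scikit-learn ≥ 0.24, and the label lists are the assumed ones from the sketches above).

```python
from sklearn.metrics import normalized_mutual_info_score, rand_score

clusters = [1] * 6 + [2] * 6 + [3] * 5
classes = ["x"] * 5 + ["o"] + ["x"] + ["o"] * 4 + ["d"] + ["x"] * 2 + ["d"] * 3

print(rand_score(classes, clusters))                    # ≈ 0.68, matches the hand computation
print(normalized_mutual_info_score(classes, clusters))  # ≈ 0.36 with the default (arithmetic-mean) normalization
```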

  33. Evaluation results for the o/⋄/x example

                          purity   NMI    RI     F_5
      lower bound         0.0      0.0    0.0    0.0
      maximum             1.0      1.0    1.0    1.0
      value for example   0.71     0.36   0.68   0.46

All measures range from 0 (bad clustering) to 1 (perfect clustering).

  34. How many clusters?

  35. How many clusters?
▶ The number of clusters K is given in many applications.
▶ E.g., there may be an external constraint on K.
▶ What if there is no external constraint? Is there a “right” number of clusters?
▶ One way to go: define an optimization criterion.
▶ Given docs, find the K for which the optimum is reached.
▶ What optimization criterion can we use?
▶ We can’t use RSS or average squared distance from the centroid as the criterion: that always chooses K = N clusters.

  36. Simple objective function for K: Basic idea
▶ Start with 1 cluster (K = 1).
▶ Keep adding clusters (= keep increasing K).
▶ Add a penalty for each new cluster.
▶ Then trade off cluster penalties against average squared distance from the centroid.
▶ Choose the value of K with the best tradeoff.

  37. Simple objective function for K: Formalization
▶ Given a clustering, define the cost of a document as its (squared) distance to the centroid.
▶ Define the total distortion RSS(K) as the sum of all individual document costs (corresponds to average distance).
▶ Then: penalize each cluster with a cost λ.
▶ Thus for a clustering with K clusters, the total cluster penalty is Kλ.
▶ Define the total cost of a clustering as distortion plus total cluster penalty: RSS(K) + Kλ.
▶ Select the K that minimizes RSS(K) + Kλ.
▶ Still need to determine a good value for λ…
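A sketch of this selection criterion, assuming scikit-learn is installed (its `inertia_` attribute is exactly RSS); the function name `select_k` and the specific parameter choices are mine.

```python
from sklearn.cluster import KMeans

def select_k(X, k_max, lam):
    """Return the K in 1..k_max that minimizes RSS(K) + K * lam."""
    costs = {}
    for k in range(1, k_max + 1):
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
        costs[k] = km.inertia_ + k * lam    # distortion plus total cluster penalty
    return min(costs, key=costs.get)
```

For λ = 0 this just minimizes RSS and therefore picks the largest allowed K; a large λ pushes the choice toward K = 1, which is exactly the tradeoff described above.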

  38. Finding the “knee” in the curve
(Figure: residual sum of squares (roughly 1750–1950) plotted against the number of clusters (2–10).)
Pick the number of clusters where the curve “flattens”. Here: 4 or 9.

  39. Hierarchical clustering

  40. Hierarchical clustering
Our goal in hierarchical clustering is to create a hierarchy like the one we saw earlier in Reuters: a TOP node splitting into regions (e.g., UK, China, Kenya, France) and industries (e.g., poultry, coffee, oil & gas).
▶ We want to create this hierarchy automatically.
▶ We can do this either top-down or bottom-up.
▶ The best known bottom-up method is hierarchical agglomerative clustering.

  41. Hierarchical agglomerative clustering (HAC)
▶ HAC creates a hierarchy in the form of a binary tree.
▶ Assumes a similarity measure for determining the similarity of two clusters.
▶ Up to now, our similarity measures were for documents.
▶ We will look at four different cluster similarity measures.

  42. HAC: Basic algorithm
▶ Start with each document in a separate cluster.
▶ Then repeatedly merge the two clusters that are most similar…
▶ …until there is only one cluster.
▶ The history of merging is a hierarchy in the form of a binary tree.
▶ The standard way of depicting this history is a dendrogram.

  43. A dendrogram
(Figure: a dendrogram with merge similarities on the vertical axis.)
▶ The history of mergers can be read off from bottom to top.
▶ The horizontal line of each merger tells us what the similarity of the merger was.
▶ We can cut the dendrogram at a particular point (e.g., at 0.1 or 0.4) to get a flat clustering.
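For a quick look at dendrograms in practice, SciPy’s hierarchical-clustering utilities can be used (an aside, not part of the slides; note that SciPy merges by *distance*, so the vertical axis is a distance rather than a similarity, and the toy vectors below are made up).

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

# Toy 2-dimensional "document" vectors; in IR these would be tf-idf vectors.
X = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 0.0], [4.0, 1.0], [2.0, 4.0]])

Z = linkage(X, method="single")      # bottom-up merges, single-link criterion
dendrogram(Z, labels=[f"d{i + 1}" for i in range(len(X))])
plt.ylabel("merge distance")
plt.show()
```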

  44. Divisive clustering
▶ Divisive clustering is top-down.
▶ Alternative to HAC (which is bottom-up).
▶ Divisive clustering:
▶ Start with all docs in one big cluster.
▶ Then recursively split clusters.
▶ Eventually each node forms a cluster on its own.
▶ → Bisecting K-means at the end.
▶ For now: HAC (= bottom-up).

  45. Naive HAC algorithm

    SimpleHAC(d_1, …, d_N)
      for n ← 1 to N do
        for i ← 1 to N do
          C[n][i] ← Sim(d_n, d_i)
        I[n] ← 1                          (keeps track of active clusters)
      A ← []                              (collects clustering as a sequence of merges)
      for k ← 1 to N − 1 do
        ⟨i, m⟩ ← arg max_{⟨i,m⟩ : i ≠ m ∧ I[i]=1 ∧ I[m]=1} C[i][m]
        A.Append(⟨i, m⟩)                  (store merge)
        for j ← 1 to N do                 (use i as representative for ⟨i, m⟩)
          C[i][j] ← Sim(⟨i, m⟩, j)
          C[j][i] ← Sim(⟨i, m⟩, j)
        I[m] ← 0                          (deactivate cluster)
      return A

  46. Computational complexity of the naive algorithm
▶ First, we compute the similarity of all N × N pairs of documents.
▶ Then, in each of N iterations:
▶ We scan the O(N × N) similarities to find the maximum similarity.
▶ We merge the two clusters with maximum similarity.
▶ We compute the similarity of the new cluster with all other (surviving) clusters.
▶ There are O(N) iterations, each performing an O(N × N) “scan” operation.
▶ Overall complexity is O(N³).
▶ We’ll look at more efficient algorithms later.

  47. Key question: How to define cluster similarity
▶ Single-link: maximum similarity
▶ Maximum similarity of any two documents.
▶ Complete-link: minimum similarity
▶ Minimum similarity of any two documents.
▶ Centroid: average “intersimilarity”
▶ Average similarity of all document pairs (but excluding pairs of docs in the same cluster).
▶ This is equivalent to the similarity of the centroids.
▶ Group-average: average “intrasimilarity”
▶ Average similarity of all document pairs, including pairs of docs in the same cluster.

  48.–57. Cluster similarity: Example / Larger example (figures): a small example with four points and a larger example with twenty points are each shown four times, highlighting which document pairs determine the cluster similarity under single-link (maximum similarity), complete-link (minimum similarity), centroid (average intersimilarity: similarity of two documents in different clusters), and group-average (average intrasimilarity: similarity of any pair, including pairs within the same cluster).

  58. Single-link HAC
▶ The similarity of two clusters is the maximum intersimilarity – the maximum similarity of a document from the first cluster and a document from the second cluster.
▶ Once we have merged two clusters, how do we update the similarity matrix?
▶ This is simple for single-link:

    sim(ω_i, (ω_k1 ∪ ω_k2)) = max(sim(ω_i, ω_k1), sim(ω_i, ω_k2))

  59. Single-link dendrogram
(Figure: a dendrogram produced by single-link clustering.)
▶ Notice: many small clusters (1 or 2 members) being added to the main cluster.
▶ There is no balanced 2-cluster or 3-cluster clustering that can be derived by cutting the dendrogram.

  60. Complete-link HAC
▶ The similarity of two clusters is the minimum intersimilarity – the minimum similarity of a document from the first cluster and a document from the second cluster.
▶ Once we have merged two clusters, how do we update the similarity matrix?
▶ Again, this is simple:

    sim(ω_i, (ω_k1 ∪ ω_k2)) = min(sim(ω_i, ω_k1), sim(ω_i, ω_k2))

▶ We measure the similarity of two clusters by computing the diameter of the cluster that we would get if we merged them.
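The two update rules make the naive SimpleHAC algorithm easy to translate into code. A NumPy sketch (my own, with assumed names; the input is a precomputed document–document similarity matrix):

```python
import numpy as np

def naive_hac(sim, link="single"):
    """Naive O(N^3) HAC over an N×N similarity matrix `sim`.

    Returns the merge history as a list of (i, m) pairs; cluster i is kept as
    the representative of the merged cluster, as in the SimpleHAC pseudocode."""
    C = np.array(sim, dtype=float)
    np.fill_diagonal(C, -np.inf)            # never merge a cluster with itself
    N = len(C)
    active = np.ones(N, dtype=bool)
    merges = []
    combine = np.maximum if link == "single" else np.minimum   # complete-link otherwise
    for _ in range(N - 1):
        # Scan for the most similar pair of active clusters (the O(N^2) step).
        masked = np.where(np.outer(active, active), C, -np.inf)
        i, m = np.unravel_index(masked.argmax(), masked.shape)
        merges.append((i, m))
        # Update row/column i with the single-link (max) or complete-link (min) rule.
        C[i, :] = combine(C[i, :], C[m, :])
        C[:, i] = C[i, :]
        C[i, i] = -np.inf
        active[m] = False                   # deactivate cluster m
    return merges
```

The O(N²) scan inside the O(N) merge loop gives exactly the O(N³) complexity noted earlier.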

  61. Complete-link dendrogram
(Figure: a dendrogram produced by complete-link clustering.)
▶ Notice that this dendrogram is much more balanced than the single-link one.
▶ We can create a 2-cluster clustering with two clusters of about the same size.

  62.–65. Exercise: Compute single-link and complete-link clusterings (figures): eight points, d1–d4 in a top row and d5–d8 in a bottom row of a small grid; the following slides show the resulting single-link clustering, the complete-link clustering, and finally the two side by side for comparison.

  66. Single-link: Chaining
(Figure: two long parallel rows of points.)
Single-link clustering often produces long, straggly clusters. For most applications, these are undesirable.

  67. What 2-cluster clustering will complete-link produce?
(Figure: five points d1–d5 on a line.)
Coordinates: 1 + 2ε, 4, 5 + 2ε, 6, 7 − ε.

  68. Complete-link: Sensitivity to outliers
(Figure: the same five points d1–d5 on a line.)
▶ The complete-link clustering of this set splits d2 from its right neighbors – clearly undesirable.
▶ The reason is the outlier d1.
▶ This shows that a single outlier can negatively affect the outcome of complete-link clustering.
▶ Single-link clustering does better in this case.

  69. Centroid HAC
▶ The similarity of two clusters is the average intersimilarity – the average similarity of documents from the first cluster with documents from the second cluster.
▶ A naive implementation of this definition is inefficient (O(N²)), but the definition is equivalent to computing the similarity of the centroids:

    sim-cent(ω_i, ω_j) = μ⃗(ω_i) · μ⃗(ω_j)

▶ Hence the name: centroid HAC.
▶ Note: this is the dot product, not cosine similarity!
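The equivalence is easy to check numerically; a small sketch with made-up vectors (linearity of the dot product lets the double sum over pairs factor into the product of the two cluster means):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 3))   # documents of cluster ω_i (rows are vectors)
B = rng.normal(size=(5, 3))   # documents of cluster ω_j

# Average pairwise dot product between the two clusters: the naive O(N^2) definition.
avg_pairwise = np.mean([a @ b for a in A for b in B])

# Dot product of the two centroids.
centroid_sim = A.mean(axis=0) @ B.mean(axis=0)

print(np.isclose(avg_pairwise, centroid_sim))   # True: the two definitions coincide
```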

  70.–71. Exercise: Compute centroid clustering (figures): six points d1–d6 on a small grid; the solution slide shows the resulting centroids µ1, µ2, µ3 after the merges.

  72. Inversion in centroid clustering
▶ In an inversion, the similarity increases during a merge sequence. This results in an “inverted” dendrogram.
▶ Below: the similarity of the first merger (d1 ∪ d2) is −4.0, the similarity of the second merger ((d1 ∪ d2) ∪ d3) is ≈ −3.5.
(Figure: the three points d1, d2, d3 and the corresponding inverted dendrogram.)
