

  1. Clustering: K-Means & Mixture Models. Prof. Mike Hughes, Tufts COMP 135: Introduction to Machine Learning (https://www.cs.tufts.edu/comp/135/2019s/). Many ideas/slides attributable to: Emily Fox (UW), Erik Sudderth (UCI).

  2. What will we learn? (Course-overview diagram: Supervised Learning, with data examples $\{x_n\}_{n=1}^N$, a performance measure, and a task; Unsupervised Learning, which produces a summary of the data $x$; Reinforcement Learning.)

  3. Task: Clustering. (Course map: Supervised Learning, Unsupervised Learning, Reinforcement Learning; clustering sits under Unsupervised Learning.)

  4. Clustering: Unit Objectives
  • Understand key challenges
    • How to choose the number of clusters?
    • How to choose the shape of clusters?
  • K-means clustering (deep dive)
    • Shape: linear boundaries (nearest Euclidean centroid)
    • Explain the algorithm as an instance of “coordinate descent”: update some variables while holding others fixed
    • Need smart initialization and multiple restarts to avoid local optima
  • Mixture models (primer)
    • Advantages of soft assignments and covariances

  5. Examples of Clustering

  6. Clustering Animals by Features

  7. Clustering Images

  8. Image Compression. Original image: each pixel can take one of roughly 256 × 256 × 256 ≈ 16.8 million possible (R, G, B) values. Compressed image (right): each pixel takes one of just 16 fixed (R, G, B) values, shrinking the space of possible pixel values by a factor of about 1 million.

  9. Understanding Genes

  10. How to cluster these points?

  11. How to cluster these points?

  12. Key Questions. (Slide shows the single-centroid cost: $\min_{m \in \mathbb{R}^F} \sum_{n=1}^{N} (x_n - m)^\top (x_n - m)$.)

  13. K-Means

  14. Input:
  • Dataset of N example feature vectors
  • Number of clusters K

  15. K-Means Goals
  • Assign each example to one of K clusters
    • Assumption: clusters are exclusive
  • Minimize Euclidean distance from examples to cluster centers
    • Assumption: isotropic Euclidean distance (all features weighted equally, no covariance modeled) is a good metric for your data

  16. K-Means Output
  • Centroid vectors (one per cluster k in 1, …, K): real-valued, length = # features F
  • Assignments (one per example n in 1, …, N): a one-hot vector indicating which of the K clusters example n is assigned to

  17. Use Euclidean distance
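The distance in question is the standard Euclidean distance between an example $x_n$ and a centroid $m_k$, written here with the deck's F-feature notation:

```latex
\mathrm{dist}(x_n, m_k)
  = \lVert x_n - m_k \rVert_2
  = \sqrt{\textstyle\sum_{f=1}^{F} (x_{nf} - m_{kf})^2}
```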

  18. K-means Optimization Problem
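The slide's formula is not reproduced in this transcript; a standard way to write the K-means cost, using the one-hot assignments $r_{nk}$ and centroids $m_k$ from the output slide, is:

```latex
\min_{\{r_{nk}\},\,\{m_k\}}\;\; \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk}\, \lVert x_n - m_k \rVert_2^2
\qquad \text{s.t.}\quad r_{nk} \in \{0,1\},\;\; \sum_{k=1}^{K} r_{nk} = 1 \;\text{ for every } n
```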

  19. K-Means Algorithm
  Initialize cluster means.
  Repeat until converged:
  1) Update per-example assignments. For each n in 1:N: find the cluster k* whose centroid is closest to $x_n$ in Euclidean distance, and set example n's one-hot assignment to indicate k*.
  2) Update per-cluster centroids. For each k in 1:K: set centroid k to the mean of the data vectors currently assigned to cluster k.
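A minimal NumPy sketch of these two alternating updates (the function name, convergence rule, and random initialization are illustrative choices, not taken from the slides):

```python
import numpy as np

def kmeans(X, K, max_iters=100, seed=0):
    """Plain K-means (Lloyd's algorithm): X is an (N, F) array of feature vectors."""
    rng = np.random.default_rng(seed)
    N, F = X.shape
    # Initialize centroids as K distinct examples chosen uniformly at random.
    centroids = X[rng.choice(N, size=K, replace=False)].astype(float)
    assignments = np.full(N, -1)
    for _ in range(max_iters):
        # 1) Assignment step: each example goes to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)  # (N, K)
        new_assignments = dists.argmin(axis=1)
        if np.array_equal(new_assignments, assignments):
            break  # no assignment changed, so the algorithm has converged
        assignments = new_assignments
        # 2) Centroid step: each centroid moves to the mean of its assigned examples.
        for k in range(K):
            members = X[assignments == k]
            if len(members) > 0:
                centroids[k] = members.mean(axis=0)
    return centroids, assignments
```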

  20. K-Means Algorithm (summary). Initialize cluster means; repeat until converged: 1) update per-example assignments, 2) update per-cluster centroids.

  21. Each update improves the cost (or leaves it unchanged).

  22. K-Means Algorithm as Coordinate Descent (credit: Jake VanderPlas). E-step, or per-example step: update assignments. M-step, or per-centroid step: update centroid locations. Each step yields a cost equal to or lower than before.

  23. Demo! http://stanford.edu/class/ee103/visualizations/kmeans/kmeans.html

  24. Demo 2 (choose the initial clusters): https://www.naftaliharris.com/blog/visualizing-k-means-clustering/ Pick a dataset and fix a K value (e.g. 2 clusters). Can you find a different fixed-point solution from your neighbor? What does this mean about the objective?

  25. K-means Boundaries are Linear

  26. Decisions when applying k-means
  • How to initialize the clusters?
  • How to choose K?

  27. Initialization: K-means++

  28. Possible Initializations
  • Draw K random centroid locations
  • Choose K data vectors as centroids
    • Uniformly at random
  What can go wrong?

  29. Example • Toy example: cluster these 4 points with K=2. (Figure: the four points, with spacings of 1 unit and D units.)

  30. No Guarantees on Cost! BAD solution: the cost scales with the distance D, which could be arbitrarily larger than 1. OPTIMAL solution: the cost is O(1).
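To make the gap concrete, assume for illustration that the four points sit at the corners of a 1-by-D rectangle, $(0,0), (1,0), (0,D), (1,D)$ with $D \gg 1$. Both of the following pairings are fixed points of the K-means updates, yet their costs differ by an arbitrarily large factor:

```latex
\text{Bad but stable pairing: } \{(0,0),(0,D)\},\ \{(1,0),(1,D)\}
\;\Rightarrow\; \text{cost} = 4\left(\tfrac{D}{2}\right)^2 = D^2

\text{Optimal pairing: } \{(0,0),(1,0)\},\ \{(0,D),(1,D)\}
\;\Rightarrow\; \text{cost} = 4\left(\tfrac{1}{2}\right)^2 = 1
```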

  31. Better init: k-means++ (Arthur & Vassilvitskii, SODA '07). Step 1: choose an example uniformly at random as the first centroid. Repeat for k = 2, 3, …, K: choose an example based on its distance from the nearest already-chosen centroid.

  32. k-means++ (Arthur & Vassilvitskii, SODA '07): guarantees on quality. Step 1: choose an example uniformly at random as the first centroid. Repeat for k = 2, 3, …, K: choose an example with probability proportional to its squared distance from the nearest already-chosen centroid. Theorem: this initialization achieves an expected cost within an O(log K) factor of the optimal cost.
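A short NumPy sketch of this seeding rule (the function name and interface are illustrative; it uses the squared-distance weighting from the original paper):

```python
import numpy as np

def kmeans_plusplus_init(X, K, seed=0):
    """k-means++ seeding: returns K initial centroids drawn from the rows of X."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    # Step 1: the first centroid is an example chosen uniformly at random.
    centroids = [X[rng.integers(N)]]
    for _ in range(1, K):
        # Squared distance from every example to its nearest already-chosen centroid.
        d2 = np.min(
            np.linalg.norm(X[:, None, :] - np.array(centroids)[None, :, :], axis=2) ** 2,
            axis=1)
        # Sample the next centroid with probability proportional to that squared distance.
        centroids.append(X[rng.choice(N, p=d2 / d2.sum())])
    return np.array(centroids)
```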

  33. Use cost to decide among multiple runs of k-means
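In practice this means running k-means from several initializations and keeping the lowest-cost run. A sketch using scikit-learn, where X is assumed to be an (N, F) feature array and the number of clusters is illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

# Run k-means from several random initializations and keep the lowest-cost run.
# (scikit-learn's n_init parameter does exactly this internally; the explicit
# loop below just makes the "compare runs by cost" idea visible.)
best_cost, best_model = np.inf, None
for seed in range(10):
    model = KMeans(n_clusters=3, n_init=1, random_state=seed).fit(X)  # X: (N, F) array
    if model.inertia_ < best_cost:   # inertia_ = sum of squared distances (the cost)
        best_cost, best_model = model.inertia_, model
```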

  34. How to pick K in K-means?

  35. Same data. Which K is best?

  36. Use the cost function to pick K? No! As K grows, the globally optimal cost always decreases (local optima may not). In the limit K -> N, the cost is zero.

  37. Add a complexity penalty! We want adding extra clusters to increase the (penalized) cost unless they help "enough".
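One simple way to make this concrete (the linear-in-K penalty below is an illustrative choice, not something specified on the slide) is to select the K that minimizes a penalized cost:

```latex
J(K) \;=\; \underbrace{\sum_{n=1}^{N} \min_{k}\, \lVert x_n - m_k \rVert_2^2}_{\text{K-means cost}}
\;+\; \underbrace{\lambda\, K}_{\text{complexity penalty}},
\qquad \lambda > 0
```

Larger $\lambda$ favors fewer clusters; information criteria (e.g. AIC/BIC for a probabilistic model) play the same role.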

  38. Computation Issues

  39. K-Means Computation
  • Most expensive step: updating assignments (N x K distance calculations)
  • Scalable?
    • Don't need to update all examples; just grab a minibatch
    • Can do stochastic, learning-rate-style updates too
  • Parallelizable?
    • Yes. Given fixed centroids, minibatches of examples can be processed (the assignment step) in parallel
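For reference, scikit-learn ships a minibatch variant along these lines; the parameter values below are illustrative and X is assumed to be an (N, F) feature array:

```python
from sklearn.cluster import MiniBatchKMeans

# MiniBatchKMeans updates centroids from small random minibatches instead of
# the full dataset on every pass, which keeps the assignment step cheap.
model = MiniBatchKMeans(n_clusters=10, batch_size=256, random_state=0)
model.fit(X)                          # X: (N, F) feature array
labels = model.labels_                # hard assignment per example
centroids = model.cluster_centers_    # (10, F) centroid matrix
```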

  40. Improved clustering: Gaussian mixture model

  41. Improving K-Means
  • Assign each example to one of K clusters
    • Assumption: clusters are exclusive
    • Improvement: soft probabilistic assignment
  • Minimize Euclidean distance from examples to cluster centers
    • Assumption: isotropic Euclidean distance (all features weighted equally, no covariance modeled) is a good metric for your data
    • Improvement: model cluster covariance

  42. Gaussian Mixture Model

  43. Gaussian Mixture Model
  • Mean vectors (one per cluster k in 1, …, K): real-valued, length = # features F
  • Covariance matrices (one per cluster k in 1, …, K): F x F square symmetric matrix, positive definite (invertible)
  • Soft assignments (one per example n in 1, …, N): probabilistic! Each vector sums to one
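Put together, these parameters define the usual mixture density (a standard formulation, using the notation above plus mixture weights $\pi_k$, which the transcript does not list explicitly):

```latex
p(x_n) \;=\; \sum_{k=1}^{K} \pi_k \,\mathcal{N}\!\left(x_n \mid \mu_k, \Sigma_k\right),
\qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1
```

The soft assignment for example $n$ is then the posterior responsibility $r_{nk} = \pi_k \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \big/ \sum_j \pi_j \mathcal{N}(x_n \mid \mu_j, \Sigma_j)$, which sums to one over $k$.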

  44. Covariance Models (credit: Jake VanderPlas). Figure compares covariance choices, ranging from most similar to k-means to more flexible.

  45. GMM Training. Maximize the likelihood of the data. Beyond this course: one can show this looks a lot like K-means' simplified objective. Algorithm: coordinate ascent! E-step: update the soft assignments r. M-step: update the means and covariances.
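In practice scikit-learn's EM-based implementation covers this; the snippet below is illustrative (X is assumed to be an (N, F) feature array, and covariance_type controls how flexible each cluster's shape is, with 'spherical' closest to k-means and 'full' the most flexible):

```python
from sklearn.mixture import GaussianMixture

# Fit a K=3 Gaussian mixture with full per-cluster covariances via EM.
gmm = GaussianMixture(n_components=3, covariance_type='full', random_state=0)
gmm.fit(X)                                # X: (N, F) feature array
soft_assignments = gmm.predict_proba(X)   # (N, 3) responsibilities, rows sum to 1
means = gmm.means_                        # (3, F) cluster means
covariances = gmm.covariances_            # (3, F, F) for covariance_type='full'
```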

  46. Special Case
  • K-means is a GMM with:
    • Hard winner-take-all assignments
    • Spherical covariance constraints

  47. Clustering: Unit Objectives
  • Understand key challenges
    • How to choose the number of clusters?
    • How to choose the shape of clusters?
  • K-means clustering (deep dive)
    • Shape: linear boundaries (nearest Euclidean centroid)
    • Explain the algorithm as an instance of “coordinate descent”: update some variables while holding others fixed
    • Need smart initialization and multiple restarts to avoid local optima
  • Mixture models (primer)
    • Advantages of soft assignments and covariances
