Tufts COMP 135: Introduction to Machine Learning https://www.cs.tufts.edu/comp/135/2019s/ Clustering: K-Means & Mixture models Prof. Mike Hughes Many ideas/slides attributable to: Emily Fox (UW), Erik Sudderth (UCI)
What will we learn? (Course overview diagram: Supervised Learning — data examples $\{x_n\}_{n=1}^N$, a performance measure, and a task; Unsupervised Learning — a summary of the data x; Reinforcement Learning.)
Task: Clustering (Overview diagram: clustering highlighted as an Unsupervised Learning task, alongside Supervised Learning and Reinforcement Learning.)
Clustering: Unit Objectives • Understand key challenges • How to choose the number of clusters? • How to choose the shape of clusters? • K-means clustering (deep dive) • Shape: Linear Boundaries (nearest Euclidean centroid) • Explain algorithm as instance of “coordinate descent” • Update some variables while holding others fixed • Need smart init and multiple restarts to avoid local optima • Mixture models (primer) • Advantages of soft assignments and covariances
Examples of Clustering
Clustering Animals by Features
Clustering Images
Image Compression Possible pixel values (R, G, B): 256 × 256 × 256 ≈ 16.8 million. After clustering: one of 16 fixed (R, G, B) centroid values per pixel. The color palette shrinks by a factor of about 1 million (and per-pixel storage drops from 24 bits to 4 bits).
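A minimal sketch of this compression idea, assuming scikit-learn and Pillow are available; the filename "photo.png" and K = 16 are placeholder choices, not anything specified on the slide:

```python
# Sketch: compress an image's color palette with k-means.
import numpy as np
from PIL import Image
from sklearn.cluster import KMeans

img = np.asarray(Image.open("photo.png").convert("RGB"))  # hypothetical input file
pixels = img.reshape(-1, 3).astype(float)                  # one (R, G, B) feature vector per pixel

# Cluster all pixels into K = 16 colors.
kmeans = KMeans(n_clusters=16, n_init=10, random_state=0).fit(pixels)

# Replace every pixel with its nearest centroid color, then restore the image shape.
compressed = kmeans.cluster_centers_[kmeans.labels_].reshape(img.shape)
Image.fromarray(compressed.astype(np.uint8)).save("photo_16colors.png")
```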
Understanding Genes
How to cluster these points?
How to cluster these points?
Key Questions $$\min_{m \in \mathbb{R}^F} \; \sum_{n=1}^{N} (x_n - m)^T (x_n - m)$$
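For intuition, the minimizer of this single-center cost is just the sample mean; a two-line derivation:

```latex
% Set the gradient of J(m) = \sum_n (x_n - m)^T (x_n - m) to zero:
\nabla_m J(m) = -2 \sum_{n=1}^{N} (x_n - m) = 0
\;\;\Longrightarrow\;\;
m^\star = \frac{1}{N} \sum_{n=1}^{N} x_n
```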
K-Means
Input: • Dataset of N example feature vectors • Number of clusters K
K-Means Goals • Assign each example to one of K clusters • Assumption: Clusters are exclusive • Minimize Euclidean distance from examples to cluster centers • Assumption: Isotropic Euclidean distance (all features weighted equally, no covariance modeled) is a good metric for your data
K-Means output • Centroid vectors (one per cluster k in 1, …, K): length = # features F, real-valued • Assignments (one per example n in 1, …, N): one-hot vector indicating which of the K clusters example n is assigned to
Use Euclidean distance: $\text{dist}(x_n, m_k) = \|x_n - m_k\|_2 = \sqrt{\sum_{f=1}^{F} (x_{nf} - m_{kf})^2}$
K-means Optimization Problem $$\min_{m, r} \; \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \|x_n - m_k\|_2^2$$ where each $r_n$ is a one-hot assignment vector and $m_k \in \mathbb{R}^F$ is the centroid of cluster k.
K-Means Algorithm Initialize cluster means $m_1, \dots, m_K$. Repeat until converged: 1) Update per-example assignments: for each n in 1:N, find the cluster k* that minimizes $\|x_n - m_{k^*}\|_2^2$ and set $r_n$ to the one-hot vector indicating k*. 2) Update per-cluster centroids: for each k in 1:K, set $m_k$ to the mean of the data vectors assigned to k.
K-Means Algorithm Initialize cluster means Repeat until converged 1) Update per-example assignment 2) Update per-cluster centroid
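A minimal NumPy sketch of this two-step loop (Lloyd's algorithm); function and variable names are my own, not from the course, and X is assumed to be an N × F array:

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize centroids as K distinct data points chosen at random.
    m = X[rng.choice(len(X), size=K, replace=False)].copy()
    for _ in range(n_iters):
        # 1) Assignment step: each example goes to its nearest centroid.
        dists = ((X[:, None, :] - m[None, :, :]) ** 2).sum(axis=2)  # N x K squared distances
        z = dists.argmin(axis=1)
        # 2) Update step: each centroid moves to the mean of its assigned points
        #    (empty clusters keep their old centroid).
        new_m = np.array([X[z == k].mean(axis=0) if np.any(z == k) else m[k]
                          for k in range(K)])
        if np.allclose(new_m, m):  # centroids stopped moving: converged
            break
        m = new_m
    return m, z
```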
Each update improves the cost (or leaves it unchanged)
K-Means Algo: Coordinate Descent Credit: Jake VanderPlas E-step or per-example step: Update Assignments M-step or per-centroid step: Update Centroid Locations Each step yields a cost equal to or lower than before
Demo! http://stanford.edu/class/ee103/visualizations/kmeans/kmeans.html
Demo 2 (Choose initial clusters) https://www.naftaliharris.com/blog/visualizing-k-means-clustering/ Pick a dataset and fix a K value (e.g. 2 clusters). Can you find a different fixed-point solution from your neighbor? What does this mean about the objective?
K-means Boundaries are Linear (the centroids induce a Voronoi partition: the boundary between any two clusters is the hyperplane equidistant from their centroids)
Decisions when applying k-means • How to initialize the clusters? • How to choose K?
Initialization: K-means++
Possible Initializations • Draw K random centroid locations • Choose K data vectors as centroids • Uniformly at random What can go wrong?
Example • Toy Example: Cluster these 4 points with K=2 (Figure: four points at the corners of a rectangle 1 unit tall and D units wide, with D much larger than 1.)
No Guarantees on Cost! BAD solution: cost scales with the distance D, which could be arbitrarily larger than 1. OPTIMAL solution: cost will be O(1).
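A worked version of this claim, assuming the four points sit at (0,0), (0,1), (D,0), (D,1) as in the figure on the previous slide:

```latex
% BAD init pairs the two bottom points and the two top points:
%   centroids (D/2, 0) and (D/2, 1), so each point is D/2 away.
% OPTIMAL pairs the two left points and the two right points:
%   centroids (0, 1/2) and (D, 1/2), so each point is 1/2 away.
\underbrace{4\left(\tfrac{D}{2}\right)^2 = D^2}_{\text{bad local optimum}}
\qquad \text{vs.} \qquad
\underbrace{4\left(\tfrac{1}{2}\right)^2 = 1}_{\text{optimal}}
```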
Better init: k-means++ Arthur & Vassilvitskii SODA ‘07 Step 1: choose an example uniformly at random as the first centroid. Repeat for k = 2, 3, …, K: choose an example with probability proportional to its squared distance from the nearest existing centroid.
k-means++: Arthur & Vassilvitskii SODA ‘07 Guarantees on Quality Step 1: choose an example uniformly at random as the first centroid. Repeat for k = 2, 3, …, K: choose an example with probability proportional to its squared distance from the nearest existing centroid. Theorem: in expectation, this initialization achieves a cost within an O(log K) factor of the optimal cost.
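A NumPy sketch of this seeding procedure (function name and structure are my own; X is assumed to be an N × F array):

```python
import numpy as np

def kmeans_plus_plus_init(X, K, seed=0):
    """Sketch of k-means++ seeding (Arthur & Vassilvitskii, SODA 2007)."""
    rng = np.random.default_rng(seed)
    centroids = [X[rng.integers(len(X))]]  # step 1: uniform random choice
    for _ in range(1, K):
        # Squared distance from each point to its nearest chosen centroid.
        d2 = np.min(((X[:, None, :] - np.array(centroids)[None, :, :]) ** 2)
                    .sum(axis=2), axis=1)
        # Sample the next centroid with probability proportional to squared distance.
        probs = d2 / d2.sum()
        centroids.append(X[rng.choice(len(X), p=probs)])
    return np.array(centroids)
```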
Use cost to decide among multiple runs of k-means
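A minimal sketch of this multiple-restarts recipe using scikit-learn, where `inertia_` is the k-means cost; X and K = 5 are placeholder choices:

```python
from sklearn.cluster import KMeans

best = None
for seed in range(10):  # 10 independent runs from different random inits
    km = KMeans(n_clusters=5, n_init=1, random_state=seed).fit(X)
    if best is None or km.inertia_ < best.inertia_:  # keep the lowest-cost run
        best = km
# Equivalently, let scikit-learn do the restarts internally: KMeans(n_clusters=5, n_init=10)
```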
How to pick K in K-means?
Same data. Which K is best?
Use cost function? No! The globally optimal cost always decreases as K increases (local optima may not). In the limit K → N, the cost is zero.
Add complexity penalty! We want adding another cluster to increase the penalized cost unless it helps “enough”.
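The slide does not specify the penalty; one common concrete choice is a BIC-style term that grows with the number of centroid parameters. A sketch under that assumption (the weight `lam` and the search range are placeholders):

```python
import numpy as np
from sklearn.cluster import KMeans

def penalized_cost(X, K, lam=1.0):
    # k-means cost plus a BIC-style penalty: lam * (# centroid parameters) * log N.
    km = KMeans(n_clusters=K, n_init=10, random_state=0).fit(X)
    return km.inertia_ + lam * K * X.shape[1] * np.log(len(X))

# Pick the K with the lowest penalized cost.
best_K = min(range(1, 11), key=lambda K: penalized_cost(X, K))
```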
Computation Issues
K-Means Computation • Most expensive step: updating assignments • N × K distance calculations • Scalable? • Don’t need to update all examples each step; just grab a minibatch • Can also use stochastic updates with a decaying learning rate • Parallelizable? • Yes. Given fixed centroids, minibatches of examples (the assignment step) can be processed in parallel
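scikit-learn implements this minibatch idea as MiniBatchKMeans; a usage sketch (X, K = 5, and the batch size are placeholder choices):

```python
from sklearn.cluster import MiniBatchKMeans

# Each iteration updates centroids using only a small random batch of examples.
mbk = MiniBatchKMeans(n_clusters=5, batch_size=256, random_state=0)
mbk.fit(X)              # much faster than full k-means when N is large
labels = mbk.predict(X)  # final hard assignments
```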
Improved clustering: Gaussian mixture model
Improving K-Means • Assign each example to one of K clusters • Assumption: Clusters are exclusive • Improvement: Soft probabilistic assignment • Minimize Euclidean distance from examples to cluster centers • Assumption: Isotropic Euclidean distance (all features weighted equally, no covariance modeled) is a good metric for your data • Improvement: Model cluster covariance
Gaussian Mixture Model
Gaussian Mixture Model • Mean vectors (one per cluster k in 1, …, K): length = # features F, real-valued • Covariance matrices (one per cluster k in 1, …, K): F × F square symmetric matrix, positive definite (invertible) • Soft assignments (one per example n in 1, …, N): probabilistic! Vector sums to one
Covariance Models Credit: Jake VanderPlas (Figure: a spectrum of covariance models, from “most similar to k-means” to “more flexible”.)
GMM Training Maximize the likelihood of the data. Beyond this course: can show that, suitably simplified, this objective looks a lot like the k-means cost. Algorithm: coordinate ascent (expectation-maximization)! E-step: update soft assignments r. M-step: update means and covariances.
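A sketch using scikit-learn's GaussianMixture, which runs exactly this EM loop; `covariance_type` selects among the covariance models from the earlier slide, and X and K = 3 are placeholders:

```python
from sklearn.mixture import GaussianMixture

# covariance_type options: 'spherical' (closest to k-means), 'diag', 'full' (most flexible).
gmm = GaussianMixture(n_components=3, covariance_type='full', random_state=0)
gmm.fit(X)                     # EM: alternate E-steps and M-steps until convergence
resp = gmm.predict_proba(X)    # soft assignments r: each of the N rows sums to 1
means, covs = gmm.means_, gmm.covariances_
```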
Special Case • K-means is a GMM with: • Hard winner-take-all assignments • Spherical covariance constraints
Clustering: Unit Objectives • Understand key challenges • How to choose the number of clusters? • How to choose the shape of clusters? • K-means clustering (deep dive) • Shape: Linear Boundaries (nearest Euclidean centroid) • Explain algorithm as instance of “coordinate descent” • Update some variables while holding others fixed • Need smart init and multiple restarts to avoid local optima • Mixture models (primer) • Advantages of soft assignments and covariances