Contents • Clustering • K-means • Mixture of Gaussians • Expectation Maximization • Variational Methods 1
Introduction to Machine Learning CMU-10701 Clustering and EM Barnabás Póczos & Aarti Singh
Clustering 3
K-means clustering What is clustering? Clustering: the process of grouping a set of objects into classes of similar objects – high intra-class similarity – low inter-class similarity – it is the most common form of unsupervised learning 4
K-means clustering What is Similarity? Hard to define! But we know it when we see it. The real meaning of similarity is a philosophical question. We will take a more pragmatic approach: think in terms of a distance (rather than similarity) between random variables. 5
The K-means Clustering Problem 6
K-means Clustering Problem K-means clustering problem: Partition the n observations into K sets (K ≤ n), S = {S_1, S_2, …, S_K}, such that the sets minimize the within-cluster sum of squares Σ_{i=1}^{K} Σ_{x ∈ S_i} ||x − µ_i||², where µ_i is the mean of the points in S_i. (Figure: example partition with K=3) 7
K-means Clustering Problem K-means clustering problem: Partition the n observations into K sets (K ≤ n), S = {S_1, S_2, …, S_K}, such that the sets minimize the within-cluster sum of squares. How hard is this problem? The problem is NP-hard, but there are good heuristic algorithms that seem to work well in practice: • K-means algorithm • mixture of Gaussians 8
K-means Clustering Alg: Step 1 • Given n objects. • Guess the cluster centers k_1, k_2, k_3. (They were µ_1, …, µ_3 in the previous slide) 9
K-means Clustering Alg: Step 2 • Build a Voronoi diagram based on the cluster centers k_1, k_2, k_3. • Decide the class memberships of the n objects by assigning them to the nearest cluster centers k_1, k_2, k_3. 10
K-means Clustering Alg: Step 3 • Re-estimate the cluster centers (i.e., the centroids or means), assuming the memberships found above are correct. 11
K-means Clustering Alg: Step 4 • Build a new Voronoi diagram. • Decide the class memberships of the n objects based on this diagram 12
K-means Clustering Alg: Step 5 • Re-estimate the cluster centers. 13
K-means Clustering Alg: Step 6 • Stop when everything is settled. (The Voronoi diagrams don’t change anymore) 14
K-means clustering K-means Clustering Algorithm Input – Data + desired number of clusters, K Initialize – the K cluster centers (randomly if necessary) Iterate 1. Decide the class memberships of the n objects by assigning them to the nearest cluster centers 2. Re-estimate the K cluster centers (i.e., the centroids or means), assuming the memberships found above are correct Termination – If none of the n objects changed membership in the last iteration, exit. Otherwise go to 1. 15
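A minimal NumPy sketch of this loop (Lloyd's algorithm), assuming Euclidean distance; the function name kmeans and the choice to initialize from K random data points are illustrative, not part of the lecture:

```python
import numpy as np

def kmeans(X, K, max_iter=100, seed=None):
    """X: (n, d) data matrix. Returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Initialize: pick K distinct data points as the initial cluster centers.
    centers = X[rng.choice(n, size=K, replace=False)].astype(float)
    labels = np.full(n, -1)
    for _ in range(max_iter):
        # Step 1: assign each object to its nearest cluster center.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # (n, K)
        new_labels = dists.argmin(axis=1)
        # Termination: exit if no object changed membership.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 2: re-estimate each center as the mean (centroid) of its members.
        for i in range(K):
            members = X[labels == i]
            if len(members) > 0:
                centers[i] = members.mean(axis=0)
    return centers, labels
```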
K-means clustering K-means Algorithm Computation Complexity • At each iteration, – Computing the distance between each of the n objects and the K cluster centers is O(Kn). – Computing cluster centers: each object gets added once to some cluster: O(n). • Assume these two steps are each done once for l iterations: O(lKn). Can you prove that the K-means algorithm is guaranteed to terminate? 16
K-means clustering Seed Choice 17
K-means clustering Seed Choice 18
K-means clustering Seed Choice The results of the K-means algorithm can vary based on random seed selection. • Some seeds can result in a poor convergence rate, or convergence to sub-optimal clusterings. • The K-means algorithm can easily get stuck in local minima. – Select good seeds using a heuristic (e.g., the object least similar to any existing mean) – Try out multiple starting points (very important!!!) – Initialize with the results of another method. 19
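In practice, the multiple-restart advice above is often delegated to a library. A short sketch with scikit-learn (assuming it is available): n_init controls the number of random restarts, init="k-means++" is one common seeding heuristic, and inertia_ is the within-cluster sum of squares of the best run.

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(0).normal(size=(300, 2))  # toy data for illustration
# Run K-means from 10 different starting points and keep the best clustering.
km = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0).fit(X)
print(km.inertia_)          # within-cluster sum of squares of the best run
print(km.cluster_centers_)  # the K cluster centers
```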
Alternating Optimization 20
K-means clustering K-means Algorithm (more formally) • Randomly initialize k centers µ_1, …, µ_k • Classify: at iteration t, assign each point x_j (j ∈ {1,…,n}) to the nearest center: C(j) ← argmin_i ||µ_i − x_j||² (classification at iteration t) • Recenter: µ_i is the centroid of the new set {x_j : C(j) = i} (re-assign new cluster centers at iteration t) 21
K-means clustering What is K-means optimizing? • Define the following potential function F of centers µ and point allocation C (two equivalent versions) • Optimal solution of the K-means problem: 22
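The potential function itself does not survive the slide extraction; in the standard formulation (assumed here) it is the within-cluster sum of squares, written in two equivalent ways, and the K-means problem is its joint minimization over centers and allocation:

F(\mu, C) \;=\; \sum_{j=1}^{n} \bigl\| x_j - \mu_{C(j)} \bigr\|^2
\;=\; \sum_{i=1}^{K} \sum_{j : C(j) = i} \bigl\| x_j - \mu_i \bigr\|^2,
\qquad
\min_{\mu} \; \min_{C} \; F(\mu, C).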
K-means clustering K-means Algorithm Optimize the potential function: K-means algorithm: (1) Minimize F over the allocation C with the centers fixed – exactly the first step: assign each point to the nearest cluster center. (2) Minimize F over the centers µ with the allocation fixed – exactly the 2nd step (re-center). 23
K-means clustering K-means Algorithm Optimize the potential function: K-means algorithm (coordinate descent on F): (1) Expectation step (2) Maximization step Today, we will see a generalization of this approach: the EM algorithm 24
Gaussian Mixture Model 25
Density Estimation Generative approach • There is a latent parameter Θ • For all i, draw observed x_i given Θ What if the basic model doesn’t fit all data? ⇒ Mixture modeling, partitioning algorithms: different parameters for different parts of the domain. 26
K-means clustering Partitioning Algorithms • K-means – hard assignment: each object belongs to only one cluster • Mixture modeling – soft assignment: probability that an object belongs to a cluster 27
K-means clustering Gaussian Mixture Model Mixture of K Gaussian distributions (multi-modal distribution): • There are K components • Component i has an associated mean vector µ_i • Component i generates data from a Gaussian N(µ_i, σ²I) with a common spherical covariance Each data point is generated using this process: 1) pick component i with probability P(y=i); 2) draw x ~ N(µ_i, σ²I). 28
Gaussian Mixture Model Mixture of K Gaussian distributions (multi-modal distribution): p(x) = Σ_{i=1}^{K} P(y=i) p(x | y=i), where y is the hidden variable (mixture component), x is the observed data, and P(y=i) is the mixture proportion. 29
Mixture of Gaussians Clustering Assume that p(x | y=i) = N(x; µ_i, σ²I), with the same spherical covariance for every component. For a given x we want to decide if it belongs to cluster i or cluster j. Cluster x based on the posteriors P(y=i | x): 30
Mixture of Gaussians Clustering Assume, as above, that each component is Gaussian with the same spherical covariance σ²I. 31
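Under that assumption the log-posterior ratio between two clusters works out to (with π_i denoting P(y=i)):

\log \frac{P(y=i \mid x)}{P(y=j \mid x)}
= \log \frac{\pi_i}{\pi_j}
- \frac{\lVert x - \mu_i \rVert^2 - \lVert x - \mu_j \rVert^2}{2\sigma^2},

which is linear in x, since the \lVert x \rVert^2 terms cancel; this is why the decision boundary on the next slide is piecewise linear.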
Piecewise linear decision boundary 32
MLE for GMM What if we don’t know the parameters? ⇒ Maximum Likelihood Estimate (MLE): choose the parameters that maximize the (marginal) likelihood of the data, ⇒ argmax Π_j p(x_j) = argmax Π_j Σ_i P(y_j=i) p(x_j | y_j=i). 33
K-means and GMM MLE: • What happens if we assume hard assignment? P(y_j = i) = 1 if i = C(j), and 0 otherwise. In this case the MLE estimate is µ_i = (1/|C_i|) Σ_{j: C(j)=i} x_j – the same as K-means!!! 34
General GMM General GMM – Gaussian Mixture Model (multi-modal distribution) • There are K components • Component i has an associated mean vector µ_i • Each component generates data from a Gaussian with mean µ_i and covariance matrix Σ_i. Each data point is generated according to the following recipe: 1) Pick a component at random: choose component i with probability P(y=i) 2) Draw the datapoint x ~ N(µ_i, Σ_i) 35
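A small sketch of this two-step recipe; the mixture weights, means, and covariances below are made-up values for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
weights = np.array([0.5, 0.3, 0.2])                             # P(y=i), sums to 1
means = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])          # mu_i
covs = np.stack([np.eye(2), 0.5 * np.eye(2), 2.0 * np.eye(2)])  # Sigma_i

def sample_gmm(n):
    # 1) Pick a component at random with probability P(y=i).
    ys = rng.choice(len(weights), size=n, p=weights)
    # 2) Draw x ~ N(mu_i, Sigma_i) from the chosen component.
    xs = np.array([rng.multivariate_normal(means[i], covs[i]) for i in ys])
    return xs, ys

X, y = sample_gmm(500)
```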
General GMM GMM – Gaussian Mixture Model (multi-modal distribution): p(x) = Σ_{i=1}^{K} P(y=i) N(x; µ_i, Σ_i), where P(y=i) is the mixture proportion and N(x; µ_i, Σ_i) is the mixture component. 36
General GMM Assume that p(x | y=i) = N(x; µ_i, Σ_i), with a component-specific covariance Σ_i. Clustering based on posteriors: “quadratic decision boundary” – the second-order terms don’t cancel out. 37
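Writing out the same log-posterior ratio with component-specific covariances makes the surviving quadratic term explicit:

\log \frac{P(y=i \mid x)}{P(y=j \mid x)}
= \log \frac{\pi_i}{\pi_j}
- \tfrac{1}{2}\log\frac{\lvert\Sigma_i\rvert}{\lvert\Sigma_j\rvert}
- \tfrac{1}{2}(x-\mu_i)^{\top}\Sigma_i^{-1}(x-\mu_i)
+ \tfrac{1}{2}(x-\mu_j)^{\top}\Sigma_j^{-1}(x-\mu_j),

which remains quadratic in x unless Σ_i = Σ_j.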
General GMM MLE Estimation What if we don’t know the parameters {P(y=i), µ_i, Σ_i}? ⇒ Maximize the marginal likelihood (MLE): argmax Π_j p(x_j) = argmax Π_j Σ_i P(y_j=i) N(x_j; µ_i, Σ_i). This objective is non-linear and not analytically solvable. Maximizing it directly (e.g., by gradient methods) is doable, but often slow. 38
Expectation-Maximization (EM) A general algorithm to deal with hidden data, but we will study it first in the context of unsupervised learning (hidden class labels = clustering). • EM is an optimization strategy for objective functions that can be interpreted as likelihoods in the presence of missing data. • EM is “simpler” than gradient methods: no need to choose a step size. • EM is an iterative algorithm with two linked steps: o E-step: fill in hidden values using inference o M-step: apply standard MLE/MAP method to completed data • We will prove that this procedure monotonically improves the likelihood (or leaves it unchanged). EM always converges to a local optimum of the likelihood. 39
Expectation-Maximization (EM) A simple case: • We have unlabeled data x_1, x_2, …, x_n • We know there are K classes • We know P(y=1)=π_1, P(y=2)=π_2, …, P(y=K)=π_K • We know the common variance σ² • We don’t know µ_1, µ_2, …, µ_K, and we want to learn them. We can write the likelihood of the data (independent data points; marginalize over the class) ⇒ learn µ_1, µ_2, …, µ_K 40
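Written out in standard notation, the marginal log-likelihood being maximized in this simple case is

\log p(x_1, \dots, x_n \mid \mu_1, \dots, \mu_K)
= \sum_{j=1}^{n} \log p(x_j \mid \mu_1, \dots, \mu_K)
= \sum_{j=1}^{n} \log \sum_{i=1}^{K} \pi_i \,\mathcal{N}\!\bigl(x_j;\, \mu_i,\, \sigma^2 I\bigr),

where the first equality uses independence of the data and the second marginalizes over the hidden class label y_j.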
Expectation (E) step We want to learn the means θ = (µ_1, …, µ_K). Our estimate at the end of iteration t−1 is θ^(t−1). At iteration t, construct the function Q (E step). Equivalent to assigning clusters to each data point, as in K-means, but in a soft way. 41
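In the standard form of the E-step for this model, the soft membership weights (responsibilities) and the resulting Q function are

w_{ij}^{(t)} = P\bigl(y_j = i \mid x_j, \theta^{(t-1)}\bigr)
= \frac{\pi_i \exp\bigl(-\lVert x_j - \mu_i^{(t-1)}\rVert^2 / 2\sigma^2\bigr)}
       {\sum_{k=1}^{K} \pi_k \exp\bigl(-\lVert x_j - \mu_k^{(t-1)}\rVert^2 / 2\sigma^2\bigr)},
\qquad
Q\bigl(\theta \mid \theta^{(t-1)}\bigr)
= \sum_{j=1}^{n} \sum_{i=1}^{K} w_{ij}^{(t)}
  \log\bigl[\pi_i\, \mathcal{N}(x_j;\, \mu_i,\, \sigma^2 I)\bigr].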
Maximization (M) step At iteration t, maximize the function Q in θ^(t) (M step). It uses the weights calculated in the E step, and the joint distribution inside Q is simple. Equivalent to updating the cluster centers in K-means. 42
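Maximizing Q over the unknown means gives the familiar weighted-mean update,

\mu_i^{(t)} = \frac{\sum_{j=1}^{n} w_{ij}^{(t)}\, x_j}{\sum_{j=1}^{n} w_{ij}^{(t)}},
\qquad i = 1, \dots, K,

the soft-assignment counterpart of the K-means re-centering step.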
EM for spherical, same variance GMMs E-step: compute the “expected” class of every datapoint, for each class. (In the K-means “E-step” we do hard assignment; EM does soft assignment.) M-step: compute the maximum of the function Q, i.e., update µ given our data’s class-membership distributions (weights). Iterate. This is exactly the same as MLE with weighted data. 43
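A minimal sketch of these two steps for the spherical, same-variance case, assuming the mixture weights pi and the variance sigma2 are known (as in the simple case above); the names em_gmm_means, resp, and mu0 are illustrative:

```python
import numpy as np

def em_gmm_means(X, pi, sigma2, mu0, n_iter=100):
    """EM for a spherical GMM where only the means are unknown.
    X: (n, d) data, pi: (K,) mixture weights, sigma2: common variance,
    mu0: (K, d) initial means. Returns updated means and responsibilities."""
    mu = mu0.astype(float).copy()
    for _ in range(n_iter):
        # E-step: soft assignment of every point to every component.
        sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)   # (n, K)
        log_w = np.log(pi)[None, :] - sq / (2.0 * sigma2)
        log_w -= log_w.max(axis=1, keepdims=True)                  # numerical stability
        resp = np.exp(log_w)
        resp /= resp.sum(axis=1, keepdims=True)                    # responsibilities w_ij
        # M-step: each mean becomes the responsibility-weighted average of the data.
        mu = (resp.T @ X) / resp.sum(axis=0)[:, None]
    return mu, resp

# Example usage with made-up initial values:
# X = np.random.default_rng(0).normal(size=(300, 2))
# mu, resp = em_gmm_means(X, pi=np.ones(3) / 3, sigma2=1.0, mu0=X[:3])
```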