Latent Variable Models and Expectation Maximization
Oliver Schulte - CMPT 726
Bishop PRML Ch. 9
Learning Parameters of Probability Distributions
• We discussed probabilistic models at length
• Assignment 3: given fully observed training data, setting the parameters $\theta_i$ for Bayes nets is straightforward
• However, in many settings not all variables are observed (labelled) in the training data: $x_i = (x_i, h_i)$
• e.g. Speech recognition: we have speech signals, but not phoneme labels
• e.g. Object recognition: we have object labels (car, bicycle), but not part labels (wheel, door, seat)
• Unobserved variables are called latent variables
[Figures from Fergus et al.]
Latent Variable Models: Pros
• Statistically powerful, often good predictions. Many applications:
• Learning with missing data.
• Clustering: "missing" cluster label for data points.
• Principal Component Analysis: data points are generated in a linear fashion from a small set of unobserved components. (more later)
• Matrix Factorization, Recommender Systems:
• Assign users to unobserved "user types", assign items to unobserved "item types".
• Use the similarity between user type and item type to predict a user's preference for an item.
• Winner of the $1M Netflix challenge.
• If the latent variables have an intuitive interpretation (e.g., "action movies", "factors"), the model discovers new features.
Latent Variable Models: Cons
• From a user's point of view, like a black box if the latent variables don't have an intuitive interpretation.
• Statistically, hard to guarantee convergence to a correct model with more data (the identifiability problem).
• Computationally harder: there is usually no closed form for maximum likelihood estimates.
• However, the Expectation-Maximization algorithm provides a general-purpose local search procedure for learning parameters in probabilistic models with latent variables.
Outline
• K-Means
• The Expectation Maximization Algorithm
• EM Example: Gaussian Mixture Models
Unsupervised Learning
• We will start with an unsupervised learning (clustering) problem:
• Given a dataset $\{x_1, \dots, x_N\}$, each $x_i \in \mathbb{R}^D$, partition the dataset into $K$ clusters
• Intuitively, a cluster is a group of points which are close together and far from the others
[Figure (a): scatter plot of the unlabelled data]
Distortion Measure
• Formally, introduce prototypes (or cluster centers) $\mu_k \in \mathbb{R}^D$
• Use binary indicators $r_{nk}$: 1 if point $n$ is in cluster $k$, 0 otherwise (the 1-of-$K$ coding scheme again)
• Find $\{\mu_k\}$ and $\{r_{nk}\}$ to minimize the distortion measure (see the code sketch below)
$$J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \|x_n - \mu_k\|^2$$
[Figures (a), (i): the unlabelled data and a resulting clustering]
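To make the objective concrete, here is a minimal NumPy sketch of the distortion measure (the function name and array layout are my own assumptions, not from the slides): `X` is an $N \times D$ data array, `mu` a $K \times D$ array of centers, and `r` an $N \times K$ binary membership matrix.

```python
import numpy as np

def distortion(X, mu, r):
    """J = sum_n sum_k r_nk * ||x_n - mu_k||^2 (hypothetical helper).

    X:  (N, D) data points
    mu: (K, D) cluster centers
    r:  (N, K) binary memberships, one 1 per row
    """
    # (N, K) matrix of squared distances ||x_n - mu_k||^2
    sq_dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    return float((r * sq_dists).sum())
```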
Minimizing Distortion Measure
• Minimizing
$$J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \|x_n - \mu_k\|^2$$
directly is hard
• However, two things are easy:
• If we know $\mu_k$, minimizing $J$ wrt $r_{nk}$
• If we know $r_{nk}$, minimizing $J$ wrt $\mu_k$
• This suggests an iterative procedure:
• Start with an initial guess for $\mu_k$
• Iterate two steps:
• Minimize $J$ wrt $r_{nk}$
• Minimize $J$ wrt $\mu_k$
• Rinse and repeat until convergence
Determining Membership Variables
• Step 1 of a K-means iteration: minimize the distortion measure $J$ wrt the cluster membership variables $r_{nk}$
$$J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \|x_n - \mu_k\|^2$$
• The terms for different data points $x_n$ are independent; for each data point, set $r_{nk}$ to minimize
$$\sum_{k=1}^{K} r_{nk} \, \|x_n - \mu_k\|^2$$
• Simply set $r_{nk} = 1$ for the cluster center $\mu_k$ with the smallest distance (see the sketch below)
[Figures (a), (b): data before and after the assignment step]
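A minimal NumPy sketch of this assignment step (function and variable names are my own, not from the slides):

```python
import numpy as np

def assign_clusters(X, mu):
    """Step 1: set r_nk = 1 for the nearest center mu_k, 0 otherwise."""
    sq_dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # (N, K)
    nearest = sq_dists.argmin(axis=1)                               # (N,)
    r = np.zeros_like(sq_dists)
    r[np.arange(X.shape[0]), nearest] = 1.0
    return r
```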
Determining Cluster Centers
• Step 2: fix $r_{nk}$ and minimize $J$ wrt the cluster centers $\mu_k$. Switching the order of the sums,
$$J = \sum_{k=1}^{K} \sum_{n=1}^{N} r_{nk} \, \|x_n - \mu_k\|^2,$$
so we can minimize wrt each $\mu_k$ separately
• Take the derivative and set it to zero:
$$2 \sum_{n=1}^{N} r_{nk} (x_n - \mu_k) = 0 \;\Leftrightarrow\; \mu_k = \frac{\sum_n r_{nk} \, x_n}{\sum_n r_{nk}}$$
i.e. the mean of the data points $x_n$ assigned to cluster $k$ (see the sketch below)
[Figures (b), (c): the center update step]
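The corresponding NumPy sketch (again with my own naming; it assumes every cluster has at least one assigned point):

```python
import numpy as np

def update_centers(X, r):
    """Step 2: mu_k = (sum_n r_nk x_n) / (sum_n r_nk)."""
    counts = r.sum(axis=0)              # (K,) number of points per cluster
    return (r.T @ X) / counts[:, None]  # (K, D) means of assigned points
```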
K-means Algorithm
• Start with an initial guess for $\mu_k$
• Iterate two steps:
• Minimize $J$ wrt $r_{nk}$: assign each point to the nearest cluster center
• Minimize $J$ wrt $\mu_k$: set each cluster center to the average of the points in its cluster
• Rinse and repeat until convergence (see the full sketch after this list)
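Putting the two steps together, a self-contained sketch of the full loop (the initialization scheme, the convergence test, and all names are my own assumptions, not prescribed by the slides):

```python
import numpy as np

def kmeans(X, K, max_iters=100, seed=0):
    """Alternate the two minimization steps until memberships stop changing."""
    rng = np.random.default_rng(seed)
    # Initial guess: K distinct data points as centers (one common choice)
    mu = X[rng.choice(X.shape[0], size=K, replace=False)].astype(float)
    prev = None
    for _ in range(max_iters):
        # Step 1: minimize J wrt r_nk -- assign each point to its nearest center
        sq_dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        labels = sq_dists.argmin(axis=1)
        if prev is not None and np.array_equal(labels, prev):
            break  # next step wouldn't change membership -- stop
        prev = labels
        # Step 2: minimize J wrt mu_k -- each center becomes the mean of its points
        for k in range(K):
            if np.any(labels == k):
                mu[k] = X[labels == k].mean(axis=0)
    return mu, labels
```

Since each step can only decrease $J$ and there are finitely many possible memberships, the loop terminates at a local minimum of the distortion measure.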
K-means example
[Figure: panels (a)-(i) showing successive K-means iterations on a 2-D dataset; assignments and cluster centers are updated in turn until the next step doesn't change membership – stop]