Expectation Maximization [KF Chapter 19]
CS 786, University of Waterloo
Lecture 17: June 28, 2012
CS786 Lecture Slides (c) 2012 P. Poupart

Incomplete data
• Complete data
  – Values of all attributes are known
  – Learning is relatively easy
• But many real-world problems have hidden variables (a.k.a. latent variables)
  – Incomplete data
  – Values of some attributes missing
Unsupervised Learning
• Incomplete data → unsupervised learning
• Examples:
  – Categorisation of stars by astronomers
  – Categorisation of species by anthropologists
  – Market segmentation for marketing
  – Pattern identification for fraud detection
  – Research in general!

Maximum Likelihood Learning
• ML learning of Bayes net parameters:
  – θ_{V=true, pa(V)=v} = Pr(V=true | pa(V)=v)
  – θ_{V=true, pa(V)=v} = #[V=true, pa(V)=v] / ( #[V=true, pa(V)=v] + #[V=false, pa(V)=v] )
  – Assumes all attributes have values…
• What if values of some attributes are missing?
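For complete data, the ML estimate above is just a relative frequency computed from counts. A minimal sketch in Python (function and variable names are illustrative, not from the slides):

    # ML estimate of Pr(V = true | pa(V) = v) as a relative frequency,
    # for one fixed parent configuration v.
    def ml_estimate(n_true, n_false):
        return n_true / (n_true + n_false)

    # e.g. 30 records with V=true and 10 with V=false for this parent value:
    theta = ml_estimate(30, 10)   # 0.75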
"Naive" solutions for incomplete data
• Solution #1: Ignore records with missing values
  – But what if all records have missing values (i.e., when a variable is hidden, none of the records have any value for that variable)?
• Solution #2: Ignore hidden variables
  – Model may become significantly more complex!

Heart disease example
[Figure: two Bayes net structures over Smoking, Diet, Exercise and three Symptom variables.
 (a) includes a hidden HeartDisease node; CPT sizes are 2, 2, 2, 54, 6, 6, 6.
 (b) omits it and connects the causes directly to the symptoms; CPT sizes are 2, 2, 2, 54, 162, 486.]
• a) simpler (i.e., fewer CPT parameters: 78 in total)
• b) more complex (i.e., lots of CPT parameters: 708 in total)
"Direct" maximum likelihood
• Solution #3: maximize the likelihood directly
  – Let Z be hidden and E be observable
  – h_ML = argmax_h P(e|h)
         = argmax_h Σ_Z P(e, Z|h)
         = argmax_h Σ_Z Π_i CPT(V_i)
         = argmax_h log Σ_Z Π_i CPT(V_i)
  – Problem: can't push the log past the sum to linearize the product
    (log Π_i CPT(V_i) = Σ_i log CPT(V_i), but log Σ_Z Π_i CPT(V_i) has no such decomposition)

Expectation-Maximization (EM)
• Solution #4: EM algorithm
  – Intuition: if we knew the missing values, computing h_ML would be trivial
• Guess h_ML
• Iterate:
  – Expectation: based on h_ML, compute the expectation of the missing values
  – Maximization: based on the expected missing values, compute a new estimate of h_ML
Expectation-Maximization (EM)
• More formally:
  – Approximate maximum likelihood
  – Iteratively compute:
    h_{i+1} = argmax_h Σ_Z P(Z | h_i, e) log P(e, Z | h)
    (the weighting by P(Z | h_i, e) is the expectation; the argmax over h is the maximization)

Expectation-Maximization (EM)
• Derivation
  – log P(e|h) = log [ P(e, Z|h) / P(Z | e, h) ]
               = log P(e, Z|h) − log P(Z | e, h)
               = Σ_Z P(Z|e,h) log P(e,Z|h) − Σ_Z P(Z|e,h) log P(Z|e,h)
                 (averaging both sides over P(Z|e,h); the left-hand side does not depend on Z)
               ≥ Σ_Z P(Z|e,h) log P(e,Z|h)
                 (the term −Σ_Z P(Z|e,h) log P(Z|e,h) is the entropy of P(Z|e,h), hence non-negative, so dropping it leaves a lower bound)
• EM finds a local maximum of Σ_Z P(Z|e,h) log P(e,Z|h), which is a lower bound of log P(e|h)
Expectation-Maximization (EM)
• Objective: max_h Σ_Z P(Z|e,h) log P(e,Z|h)
• Iterative approach:
  h_{i+1} = argmax_h Σ_Z P(Z | e, h_i) log P(e, Z | h)
• Convergence guaranteed (fixed point):
  h_∞ = argmax_h Σ_Z P(Z | e, h_∞) log P(e, Z | h)
• Monotonic improvement of the likelihood:
  P(e | h_{i+1}) ≥ P(e | h_i)

Optimization Step
• For one data point e:
  h_{i+1} = argmax_h Σ_Z P(Z | h_i, e) log P(e, Z | h)
• For multiple data points:
  h_{i+1} = argmax_h Σ_e n_e Σ_Z P(Z | h_i, e) log P(e, Z | h)
  where n_e is the frequency of e in the dataset
• Compare to ML for complete data:
  h* = argmax_h Σ_d n_d log P(d | h)
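The iteration above has a simple generic shape. A minimal sketch in Python (e_step and m_step are placeholder names, not from the slides; they would be instantiated for a concrete model such as the candy example below):

    # Generic EM skeleton: alternate an expectation step and a maximization step.
    def em(h0, data, e_step, m_step, n_iters=100):
        # e_step(h, data): expected sufficient statistics of the missing values under h
        # m_step(stats):   parameters maximizing the expected complete-data log-likelihood
        h = h0
        for _ in range(n_iters):
            stats = e_step(h, data)
            h = m_step(stats)
        return h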
Optimization Solution
• Since d = <z, e>
• Let n_d = n_e P(z | h_i, e)   (expected frequency)
• As in the complete data case, the optimal parameters are obtained by setting the derivative to 0, which yields relative expected frequencies
• E.g. θ_{V, pa(V)} = P(V | pa(V)) = n_{V, pa(V)} / n_{pa(V)}

Candy Example
• Suppose you buy two bags of candies of unknown type (e.g. flavour ratios)
• You plan to eat sufficiently many candies from each bag to learn their types
• Ignoring your plan, your roommate mixes both bags…
• How can you learn the type of each bag despite the candies being mixed?
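Returning to the Optimization Solution slide: the M-step turns expected counts into relative expected frequencies, exactly as ML does with observed counts. A small sketch (the helper name and dictionary layout are mine, not from the slides):

    # M-step for a single CPT: expected_counts maps (v, u) to the expected
    # count of V = v with parent configuration pa(V) = u.
    def mstep_cpt(expected_counts):
        parent_totals = {}
        for (v, u), n in expected_counts.items():
            parent_totals[u] = parent_totals.get(u, 0.0) + n
        # Normalize within each parent configuration: relative expected frequencies.
        return {(v, u): n / parent_totals[u] for (v, u), n in expected_counts.items()}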
Candy Example
• "Bag" variable is hidden

Unsupervised Clustering
• "Class" variable is hidden
• Naïve Bayes model
[Figure: the candy network — a hidden Bag node with parameters P(Bag=1) and P(F=cherry | Bag=i), and observed children Flavor, Wrapper and Holes — shown alongside its generic form, a hidden class C with observed attributes X.]
Candy Example
• Unknown parameters:
  – θ_i  = P(Bag = i)
  – θ_Fi = P(Flavour = cherry | Bag = i)
  – θ_Wi = P(Wrapper = red | Bag = i)
  – θ_Hi = P(Hole = yes | Bag = i)
• When eating a candy:
  – F, W and H are observable
  – B is hidden

Candy Example
• Let the true parameters be:
  – θ = 0.5, θ_F1 = θ_W1 = θ_H1 = 0.8, θ_F2 = θ_W2 = θ_H2 = 0.3
• After eating 1000 candies:

                 W=red          W=green
                 H=1    H=0     H=1    H=0
  F=cherry       273     93     104     90
  F=lime          79    100      94    167
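For later reference, the same table as a small Python structure (a sketch; the dictionary layout and names are mine):

    # Observed counts after eating 1000 candies, keyed by (flavour, wrapper, holes).
    counts = {
        ('cherry', 'red',   1): 273, ('cherry', 'red',   0):  93,
        ('cherry', 'green', 1): 104, ('cherry', 'green', 0):  90,
        ('lime',   'red',   1):  79, ('lime',   'red',   0): 100,
        ('lime',   'green', 1):  94, ('lime',   'green', 0): 167,
    }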
Candy Example
• EM algorithm
• Guess h_0:
  – θ = 0.6, θ_F1 = θ_W1 = θ_H1 = 0.6, θ_F2 = θ_W2 = θ_H2 = 0.4
• Alternate:
  – Expectation: expected # of candies in each bag
  – Maximization: new parameter estimates

Candy Example
• Expectation: expected # of candies in each bag
  – #[Bag=i] = Σ_j P(B=i | f_j, w_j, h_j)
  – Compute P(B=i | f_j, w_j, h_j) by variable elimination (or any other inference alg.)
• Example:
  – #[Bag=1] = 612
  – #[Bag=2] = 388
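A hedged sketch of this expectation step, continuing from the counts dictionary above. The slide uses variable elimination; for this small naïve Bayes model the posterior can be computed directly from Bayes' rule, which gives the same result:

    # Initial guess h_0 from this slide.
    theta = 0.6                  # P(Bag = 1)
    tF = {1: 0.6, 2: 0.4}        # P(Flavour = cherry | Bag)
    tW = {1: 0.6, 2: 0.4}        # P(Wrapper = red    | Bag)
    tH = {1: 0.6, 2: 0.4}        # P(Holes   = 1      | Bag)

    def posterior_bag1(f, w, h):
        """P(Bag = 1 | flavour, wrapper, holes) under the naive Bayes model."""
        def joint(bag):
            prior = theta if bag == 1 else 1 - theta
            pf = tF[bag] if f == 'cherry' else 1 - tF[bag]
            pw = tW[bag] if w == 'red' else 1 - tW[bag]
            ph = tH[bag] if h == 1 else 1 - tH[bag]
            return prior * pf * pw * ph
        return joint(1) / (joint(1) + joint(2))

    # Expected number of candies from each bag (comes out near 612 and 388).
    n_bag1 = sum(n * posterior_bag1(f, w, h) for (f, w, h), n in counts.items())
    n_bag2 = 1000 - n_bag1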
Candy Example
• Maximization: relative frequency of each bag
  – θ_1 = 612/1000 = 0.612
  – θ_2 = 388/1000 = 0.388

Candy Example
• Expectation: expected # of cherry candies in each bag
  – #[B=i, F=cherry] = Σ_j P(B=i | f_j=cherry, w_j, h_j)
  – Compute P(B=i | f_j=cherry, w_j, h_j) by variable elimination (or any other inference alg.)
• Maximization:
  – θ_F1 = #[B=1, F=cherry] / #[B=1] = 0.668
  – θ_F2 = #[B=2, F=cherry] / #[B=2] = 0.389
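Continuing the sketch, the corresponding maximization step; the values should come out close to the numbers on the slides:

    # New bag priors: relative expected frequencies.
    theta1_new = n_bag1 / 1000.0                     # ~0.612
    theta2_new = n_bag2 / 1000.0                     # ~0.388

    # Expected number of cherry candies attributed to bag 1.
    n_bag1_cherry = sum(n * posterior_bag1(f, w, h)
                        for (f, w, h), n in counts.items() if f == 'cherry')

    # New flavour parameters (560 cherry candies in total).
    tF1_new = n_bag1_cherry / n_bag1                 # ~0.668
    tF2_new = (560 - n_bag1_cherry) / n_bag2         # ~0.389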
Candy Example
[Figure: log-likelihood (y-axis, roughly −2025 to −1975) plotted against EM iteration number (x-axis, 0 to 120); the log-likelihood improves monotonically and levels off.]

Bayesian networks
• EM algorithm for general Bayes nets
• Expectation:
  – #[V_i = v_ij, Pa(V_i) = pa_ik] = expected frequency
• Maximization:
  – θ_{v_ij, pa_ik} = #[V_i = v_ij, Pa(V_i) = pa_ik] / #[Pa(V_i) = pa_ik]
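To tie the pieces together, a hedged sketch of the full EM loop for the candy model (reusing the counts dictionary from the earlier sketch; all function and variable names are mine). It also records the log-likelihood at each iteration, which increases monotonically as in the plot above:

    import math

    def em_candy(counts, theta, tF, tW, tH, n_iters=100):
        # counts: (flavour, wrapper, holes) -> observed frequency
        # theta:  P(Bag=1); tF/tW/tH: per-bag parameters, e.g. {1: 0.6, 2: 0.4}
        log_liks = []
        for _ in range(n_iters):
            # E-step: expected counts for each bag and each feature value.
            exp_n = {1: 0.0, 2: 0.0}
            exp_feat = {(i, k): 0.0 for i in (1, 2) for k in ('F', 'W', 'H')}
            ll = 0.0
            for (f, w, h), n in counts.items():
                joint = {}
                for i in (1, 2):
                    prior = theta if i == 1 else 1 - theta
                    pf = tF[i] if f == 'cherry' else 1 - tF[i]
                    pw = tW[i] if w == 'red' else 1 - tW[i]
                    ph = tH[i] if h == 1 else 1 - tH[i]
                    joint[i] = prior * pf * pw * ph
                marginal = joint[1] + joint[2]
                ll += n * math.log(marginal)
                for i in (1, 2):
                    post = joint[i] / marginal            # P(Bag=i | f, w, h)
                    exp_n[i] += n * post
                    if f == 'cherry': exp_feat[(i, 'F')] += n * post
                    if w == 'red':    exp_feat[(i, 'W')] += n * post
                    if h == 1:        exp_feat[(i, 'H')] += n * post
            log_liks.append(ll)
            # M-step: relative expected frequencies.
            theta = exp_n[1] / (exp_n[1] + exp_n[2])
            tF = {i: exp_feat[(i, 'F')] / exp_n[i] for i in (1, 2)}
            tW = {i: exp_feat[(i, 'W')] / exp_n[i] for i in (1, 2)}
            tH = {i: exp_feat[(i, 'H')] / exp_n[i] for i in (1, 2)}
        return theta, tF, tW, tH, log_liks

    # e.g. starting from h_0 above:
    # em_candy(counts, 0.6, {1: 0.6, 2: 0.4}, {1: 0.6, 2: 0.4}, {1: 0.6, 2: 0.4})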