Accelerating the EM Algorithm for Mixture Density Estimation

Homer Walker
Mathematical Sciences Department, Worcester Polytechnic Institute

Joint work with Josh Plasse (WPI/Imperial College). Research supported in part by DOE Grant DE-SC0004880 and NSF Grant DMS-1337943.

ICERM Workshop, September 4, 2015
Mixture Densities

Consider a (finite) mixture density
\[
p(x \mid \Phi) = \sum_{i=1}^{m} \alpha_i \, p_i(x \mid \phi_i).
\]

Problem: Estimate \(\Phi = (\alpha_1, \ldots, \alpha_m, \phi_1, \ldots, \phi_m)\) using an "unlabeled" sample \(\{x_k\}_{k=1}^{N}\) on the mixture.

Maximum-Likelihood Estimate (MLE): Determine \(\Phi = \arg\max L(\Phi)\), where
\[
L(\Phi) \equiv \sum_{k=1}^{N} \log p(x_k \mid \Phi).
\]
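To make these objects concrete, here is a minimal NumPy/SciPy sketch (not from the talk) that evaluates \(L(\Phi)\) for a univariate normal mixture; the function name and signature are illustrative:

```python
import numpy as np
from scipy.stats import norm

def log_likelihood(x, alphas, mus, sigmas):
    """Log-likelihood L(Phi) = sum_k log p(x_k | Phi) for a univariate
    normal mixture with weights alphas, means mus, and standard
    deviations sigmas."""
    # densities[k, i] = p_i(x_k | phi_i)
    densities = norm.pdf(x[:, None], loc=mus, scale=sigmas)
    return np.sum(np.log(densities @ alphas))
```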
The EM (Expectation-Maximization) Algorithm

The general formulation and name were given in:

A. P. Dempster, N. M. Laird, and D. B. Rubin (1977), Maximum likelihood from incomplete data via the EM algorithm, J. Royal Statist. Soc. Ser. B (Methodological), 39, pp. 1–38.

General idea: Determine the next approximate MLE to maximize the expectation of the complete-data log-likelihood function, given the observed incomplete data and the current approximate MLE.

Marvelous property: The log-likelihood function increases at each iteration.
The EM Algorithm for Mixture Densities

For a mixture density, an EM iteration is
\[
\alpha_i^+ = \frac{1}{N} \sum_{k=1}^{N} \frac{\alpha_i^c \, p_i(x_k \mid \phi_i^c)}{p(x_k \mid \Phi^c)},
\qquad
\phi_i^+ = \arg\max_{\phi_i} \sum_{k=1}^{N} \log p_i(x_k \mid \phi_i) \, \frac{\alpha_i^c \, p_i(x_k \mid \phi_i^c)}{p(x_k \mid \Phi^c)}.
\]

For a derivation, convergence analysis, history, etc., see:

R. A. Redner and HW (1984), Mixture densities, maximum likelihood and the EM algorithm, SIAM Review, 26, pp. 195–239.
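Both updates are driven by the posterior weights \(w_{ki} = \alpha_i^c \, p_i(x_k \mid \phi_i^c) / p(x_k \mid \Phi^c)\). A sketch of computing them for univariate normal components (illustrative code, not the authors'):

```python
import numpy as np
from scipy.stats import norm

def responsibilities(x, alphas, mus, sigmas):
    """Posterior component probabilities
    w[k, i] = alpha_i p_i(x_k | phi_i) / p(x_k | Phi),
    the weights appearing in both EM updates."""
    weighted = alphas * norm.pdf(x[:, None], loc=mus, scale=sigmas)
    return weighted / weighted.sum(axis=1, keepdims=True)
```

The mixing-proportion update is then simply `w.mean(axis=0)`.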
Particular Example: Normal (Gaussian) Mixtures

Assume (multivariate) normal densities. For each i, \(\phi_i = (\mu_i, \Sigma_i)\) and
\[
p_i(x \mid \phi_i) = \frac{1}{(2\pi)^{n/2} (\det \Sigma_i)^{1/2}} \, e^{-(x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i)/2}.
\]

EM iteration: For i = 1, ..., m,
\[
\alpha_i^+ = \frac{1}{N} \sum_{k=1}^{N} \frac{\alpha_i^c \, p_i(x_k \mid \phi_i^c)}{p(x_k \mid \Phi^c)},
\]
\[
\mu_i^+ = \left[ \sum_{k=1}^{N} \frac{\alpha_i^c \, p_i(x_k \mid \phi_i^c)}{p(x_k \mid \Phi^c)} \, x_k \right] \Bigg/ \left[ \sum_{k=1}^{N} \frac{\alpha_i^c \, p_i(x_k \mid \phi_i^c)}{p(x_k \mid \Phi^c)} \right],
\]
\[
\Sigma_i^+ = \left[ \sum_{k=1}^{N} \frac{\alpha_i^c \, p_i(x_k \mid \phi_i^c)}{p(x_k \mid \Phi^c)} \, (x_k - \mu_i^+)(x_k - \mu_i^+)^T \right] \Bigg/ \left[ \sum_{k=1}^{N} \frac{\alpha_i^c \, p_i(x_k \mid \phi_i^c)}{p(x_k \mid \Phi^c)} \right].
\]
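A straightforward NumPy/SciPy sketch of one full EM step implementing these three updates (an illustration under the multivariate-normal assumption, not the authors' code):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_step(X, alphas, mus, Sigmas):
    """One EM iteration for a multivariate normal mixture.
    X: (N, n) sample; alphas: (m,); mus: (m, n); Sigmas: (m, n, n).
    Returns updated (alphas, mus, Sigmas)."""
    N, _ = X.shape
    m = len(alphas)
    # E-step: posterior weights w[k, i]
    dens = np.column_stack([
        multivariate_normal.pdf(X, mean=mus[i], cov=Sigmas[i])
        for i in range(m)
    ])
    w = alphas * dens
    w /= w.sum(axis=1, keepdims=True)
    # M-step: the closed-form updates above
    Nw = w.sum(axis=0)                  # effective counts per component
    alphas_new = Nw / N
    mus_new = (w.T @ X) / Nw[:, None]
    Sigmas_new = np.empty_like(Sigmas)
    for i in range(m):
        d = X - mus_new[i]
        Sigmas_new[i] = (w[:, i, None] * d).T @ d / Nw[i]
    return alphas_new, mus_new, Sigmas_new
```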
EM Iterations Demo

A Univariate Normal Mixture.

◮ \(p_i(x \mid \phi_i) = \frac{1}{\sqrt{2\pi\sigma_i^2}} \, e^{-(x - \mu_i)^2/(2\sigma_i^2)}\) for i = 1, ..., 5.
◮ Sample of 100,000 observations.
  — [α_1, ..., α_5] = [.2, .3, .3, .1, .1],
  — [μ_1, ..., μ_5] = [0, 1, 2, 3, 4],
  — [σ²_1, ..., σ²_5] = [.2, 2, .5, .1, .1].
◮ EM iterations on the means:
\[
\mu_i^+ = \left[ \sum_{k=1}^{N} \frac{\alpha_i \, p_i(x_k \mid \phi_i)}{p(x_k \mid \Phi)} \, x_k \right] \Bigg/ \left[ \sum_{k=1}^{N} \frac{\alpha_i \, p_i(x_k \mid \phi_i)}{p(x_k \mid \Phi)} \right].
\]

[Plot: current mixture-density fit over the sample, x from −3 to 5.]
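The slides don't show how the sample was drawn; one standard way, using the demo's parameters, is to sample component labels first (a sketch; the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_mixture(N, alphas, mus, sigma2s, rng):
    """Draw N observations from a univariate normal mixture by first
    sampling component labels, then sampling from each component."""
    labels = rng.choice(len(alphas), size=N, p=alphas)
    return rng.normal(np.asarray(mus)[labels],
                      np.sqrt(np.asarray(sigma2s)[labels]))

# Parameters from the demo slide
x = sample_mixture(100_000, [.2, .3, .3, .1, .1],
                   [0, 1, 2, 3, 4], [.2, 2, .5, .1, .1], rng)
```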
[Plot, same demo: log residual norm vs. iteration number over 100 EM iterations.]
Anderson Acceleration

Derived from a method of D. G. Anderson, Iterative procedures for nonlinear integral equations, J. Assoc. Comput. Machinery, 12 (1965), pp. 547–560.

Consider a fixed-point iteration \(x_+ = g(x)\), \(g : \mathbb{R}^n \to \mathbb{R}^n\).

Anderson Acceleration: Given \(x_0\) and mMax ≥ 1, set \(x_1 = g(x_0)\).
Iterate: For k = 1, 2, ...
  Set \(m_k = \min\{\text{mMax}, k\}\).
  Set \(F_k = (f_{k-m_k}, \ldots, f_k)\), where \(f_i = g(x_i) - x_i\).
  Solve \(\min_{\alpha \in \mathbb{R}^{m_k+1}} \|F_k \alpha\|_2\) subject to \(\sum_{i=0}^{m_k} \alpha_i = 1\).
  Set \(x_{k+1} = \sum_{i=0}^{m_k} \alpha_i \, g(x_{k-m_k+i})\).
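A compact Python sketch of this scheme. It solves the constrained least-squares step in the equivalent unconstrained "difference" form (cf. Walker and Ni, 2011); the names and defaults are illustrative, not the authors' code:

```python
import numpy as np

def anderson(g, x0, m_max=3, max_iter=100, tol=1e-12):
    """Anderson acceleration for the fixed-point iteration x = g(x)."""
    x = np.asarray(x0, dtype=float)
    gx = g(x)
    G_hist, F_hist = [gx], [gx - x]   # histories of g(x_i) and f_i
    x = gx                            # x_1 = g(x_0)
    for k in range(1, max_iter):
        gx = g(x)
        f = gx - x
        if np.linalg.norm(f) < tol:
            return gx
        G_hist.append(gx)
        F_hist.append(f)
        if len(F_hist) > m_max + 1:   # keep at most m_k + 1 residuals
            G_hist.pop(0)
            F_hist.pop(0)
        # Columns of dF, dG are successive differences of the histories;
        # the constrained problem in alpha is equivalent to an
        # unconstrained least-squares problem in gamma.
        dF = np.column_stack([F_hist[i + 1] - F_hist[i]
                              for i in range(len(F_hist) - 1)])
        dG = np.column_stack([G_hist[i + 1] - G_hist[i]
                              for i in range(len(G_hist) - 1)])
        gamma, *_ = np.linalg.lstsq(dF, f, rcond=None)
        x = gx - dG @ gamma
    return x
```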
EM Iterations Demo (cont.)

Same univariate normal mixture, sample, and EM iteration on the means as in the earlier demo.

[Plot: current mixture-density fit, x from −3 to 5.]
[Plot, same demo: log residual norm vs. iteration number over 100 iterations.]
EM Convergence and "Separation"

Redner–W (1984): For mixture densities, the convergence is linear and depends on the "separation" of the component populations:

◮ "well separated" (fast convergence) if, whenever i ≠ j,
\[
\frac{p_i(x \mid \phi_i^*)}{p(x \mid \Phi^*)} \cdot \frac{p_j(x \mid \phi_j^*)}{p(x \mid \Phi^*)} \approx 0 \quad \text{for all } x \in \mathbb{R}^n;
\]
◮ "poorly separated" (slow convergence) if, for some i ≠ j, this product is significantly different from 0 for some \(x \in \mathbb{R}^n\).
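One illustrative way to gauge separation numerically is to average the product above over the observed sample. This diagnostic is an assumption of this write-up, not a quantity defined on the slide:

```python
import numpy as np
from scipy.stats import norm

def overlap(x, alphas, mus, sigmas, i, j):
    """Sample-average estimate of the separation product
    p_i(x|phi_i)/p(x|Phi) * p_j(x|phi_j)/p(x|Phi) over the sample.
    Values near 0 suggest components i and j are well separated.
    (Illustrative diagnostic only.)"""
    dens = norm.pdf(x[:, None], loc=mus, scale=sigmas)  # p_i(x_k|phi_i)
    p = dens @ alphas                                   # p(x_k|Phi)
    return np.mean((dens[:, i] / p) * (dens[:, j] / p))
```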
Example: EM Convergence and "Separation"

A Univariate Normal Mixture.

◮ \(p_i(x \mid \phi_i) = \frac{1}{\sqrt{2\pi\sigma_i^2}} \, e^{-(x - \mu_i)^2/(2\sigma_i^2)}\) for i = 1, ..., 3.
◮ EM iterations on the means, as before.
◮ Sample of 100,000 observations.
  — [α_1, α_2, α_3] = [.3, .3, .4], [σ²_1, σ²_2, σ²_3] = [1, 1, 1],
  — [μ_1, μ_2, μ_3] = [0, 2, 4], [0, 1, 2], [0, .5, 1].

[Plot: log residual norm vs. iteration number over 100 iterations for each of the three choices of means.]
Experiments with Multivariate Normal Mixtures

Experiment with Anderson acceleration applied to the EM iteration for normal mixtures shown earlier (the updates for \(\alpha_i^+\), \(\mu_i^+\), and \(\Sigma_i^+\), for i = 1, ..., m).

Assume m is known. Ultimate interest: very large N.
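A sketch of how the pieces above could be wired together: flatten \((\alpha, \mu, \Sigma)\) into one vector so the EM map fits the fixed-point form \(x_+ = g(x)\) expected by the `anderson` routine, using the `em_step` sketch from the Gaussian-mixture slide. The packing scheme is an assumption of this write-up; note also that accelerated iterates need not satisfy the mixture constraints (weights summing to 1, positive-definite covariances), a practical issue such experiments must address:

```python
import numpy as np

def pack(alphas, mus, Sigmas):
    """Flatten mixture parameters into one vector for the fixed-point map."""
    return np.concatenate([alphas, mus.ravel(), Sigmas.ravel()])

def unpack(v, m, n):
    """Invert pack(); symmetrize covariances to reduce round-off drift."""
    alphas = v[:m]
    mus = v[m:m + m * n].reshape(m, n)
    Sigmas = v[m + m * n:].reshape(m, n, n)
    return alphas, mus, 0.5 * (Sigmas + Sigmas.transpose(0, 2, 1))

def make_em_map(X, m):
    """Wrap em_step (defined earlier) as g: R^d -> R^d."""
    n = X.shape[1]
    def g(v):
        return pack(*em_step(X, *unpack(v, m, n)))
    return g

# Accelerated EM, given initial parameters alphas0, mus0, Sigmas0:
# v_star = anderson(make_em_map(X, m), pack(alphas0, mus0, Sigmas0))
```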