Accelerating the EM Algorithm for Mixture Density Estimation

Homer Walker
Mathematical Sciences Department, Worcester Polytechnic Institute

Joint work with Josh Plasse (WPI/Imperial College). Research supported in part by DOE Grant DE-SC0004880 and NSF Grant DMS-1337943.

ICERM Workshop, September 4, 2015
Mixture Densities

Consider a (finite) mixture density
\[
p(x \mid \Phi) = \sum_{i=1}^{m} \alpha_i \, p_i(x \mid \phi_i).
\]

Problem: Estimate \(\Phi = (\alpha_1, \ldots, \alpha_m, \phi_1, \ldots, \phi_m)\) using an "unlabeled" sample \(\{x_k\}_{k=1}^{N}\) on the mixture.

Maximum-Likelihood Estimate (MLE): Determine \(\Phi = \arg\max L(\Phi)\), where
\[
L(\Phi) \equiv \sum_{k=1}^{N} \log p(x_k \mid \Phi).
\]
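To make these objects concrete, here is a minimal NumPy/SciPy sketch (not from the talk) that evaluates \(L(\Phi)\) for a univariate normal mixture; the function name and signature are illustrative:

```python
import numpy as np
from scipy.stats import norm

def log_likelihood(x, alphas, mus, sigmas):
    """Log-likelihood L(Phi) = sum_k log p(x_k | Phi) for a univariate
    normal mixture with weights alphas, means mus, and standard
    deviations sigmas."""
    # densities[k, i] = p_i(x_k | phi_i)
    densities = norm.pdf(x[:, None], loc=mus, scale=sigmas)
    return np.sum(np.log(densities @ alphas))
```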
The EM (Expectation-Maximization) Algorithm

The general formulation and name were given in:

A. P. Dempster, N. M. Laird, and D. B. Rubin (1977), Maximum likelihood from incomplete data via the EM algorithm, J. Royal Statist. Soc. Ser. B (Methodological), 39, pp. 1–38.

General idea: Determine the next approximate MLE to maximize the expectation of the complete-data log-likelihood function, given the observed incomplete data and the current approximate MLE.

Marvelous property: The log-likelihood function increases at each iteration.
The EM Algorithm for Mixture Densities

For a mixture density, an EM iteration is
\[
\alpha_i^+ = \frac{1}{N} \sum_{k=1}^{N} \frac{\alpha_i^c \, p_i(x_k \mid \phi_i^c)}{p(x_k \mid \Phi^c)},
\qquad
\phi_i^+ = \arg\max_{\phi_i} \sum_{k=1}^{N} \log p_i(x_k \mid \phi_i) \, \frac{\alpha_i^c \, p_i(x_k \mid \phi_i^c)}{p(x_k \mid \Phi^c)}.
\]

For a derivation, convergence analysis, history, etc., see:

R. A. Redner and HW (1984), Mixture densities, maximum likelihood and the EM algorithm, SIAM Review, 26, pp. 195–239.
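Both updates are driven by the posterior weights \(w_{ki} = \alpha_i^c \, p_i(x_k \mid \phi_i^c) / p(x_k \mid \Phi^c)\). A sketch of computing them for univariate normal components (illustrative code, not the authors'):

```python
import numpy as np
from scipy.stats import norm

def responsibilities(x, alphas, mus, sigmas):
    """Posterior component probabilities
    w[k, i] = alpha_i p_i(x_k | phi_i) / p(x_k | Phi),
    the weights appearing in both EM updates."""
    weighted = alphas * norm.pdf(x[:, None], loc=mus, scale=sigmas)
    return weighted / weighted.sum(axis=1, keepdims=True)
```

The mixing-proportion update is then simply `w.mean(axis=0)`.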
Particular Example: Normal (Gaussian) Mixtures

Assume (multivariate) normal densities. For each i, \(\phi_i = (\mu_i, \Sigma_i)\) and
\[
p_i(x \mid \phi_i) = \frac{1}{(2\pi)^{n/2} (\det \Sigma_i)^{1/2}} \, e^{-(x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i)/2}.
\]

EM iteration: For i = 1, ..., m,
\[
\alpha_i^+ = \frac{1}{N} \sum_{k=1}^{N} \frac{\alpha_i^c \, p_i(x_k \mid \phi_i^c)}{p(x_k \mid \Phi^c)},
\]
\[
\mu_i^+ = \left[ \sum_{k=1}^{N} \frac{\alpha_i^c \, p_i(x_k \mid \phi_i^c)}{p(x_k \mid \Phi^c)} \, x_k \right] \Bigg/ \left[ \sum_{k=1}^{N} \frac{\alpha_i^c \, p_i(x_k \mid \phi_i^c)}{p(x_k \mid \Phi^c)} \right],
\]
\[
\Sigma_i^+ = \left[ \sum_{k=1}^{N} \frac{\alpha_i^c \, p_i(x_k \mid \phi_i^c)}{p(x_k \mid \Phi^c)} \, (x_k - \mu_i^+)(x_k - \mu_i^+)^T \right] \Bigg/ \left[ \sum_{k=1}^{N} \frac{\alpha_i^c \, p_i(x_k \mid \phi_i^c)}{p(x_k \mid \Phi^c)} \right].
\]
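A straightforward NumPy/SciPy sketch of one full EM step implementing these three updates (an illustration under the multivariate-normal assumption, not the authors' code):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_step(X, alphas, mus, Sigmas):
    """One EM iteration for a multivariate normal mixture.
    X: (N, n) sample; alphas: (m,); mus: (m, n); Sigmas: (m, n, n).
    Returns updated (alphas, mus, Sigmas)."""
    N, _ = X.shape
    m = len(alphas)
    # E-step: posterior weights w[k, i]
    dens = np.column_stack([
        multivariate_normal.pdf(X, mean=mus[i], cov=Sigmas[i])
        for i in range(m)
    ])
    w = alphas * dens
    w /= w.sum(axis=1, keepdims=True)
    # M-step: the closed-form updates above
    Nw = w.sum(axis=0)                  # effective counts per component
    alphas_new = Nw / N
    mus_new = (w.T @ X) / Nw[:, None]
    Sigmas_new = np.empty_like(Sigmas)
    for i in range(m):
        d = X - mus_new[i]
        Sigmas_new[i] = (w[:, i, None] * d).T @ d / Nw[i]
    return alphas_new, mus_new, Sigmas_new
```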
EM Iterations Demo

A Univariate Normal Mixture.

◮ \(p_i(x \mid \phi_i) = \frac{1}{\sqrt{2\pi\sigma_i^2}} \, e^{-(x - \mu_i)^2/(2\sigma_i^2)}\) for i = 1, ..., 5.
◮ Sample of 100,000 observations.
  — [α_1, ..., α_5] = [.2, .3, .3, .1, .1],
  — [μ_1, ..., μ_5] = [0, 1, 2, 3, 4],
  — [σ²_1, ..., σ²_5] = [.2, 2, .5, .1, .1].
◮ EM iterations on the means:
\[
\mu_i^+ = \left[ \sum_{k=1}^{N} \frac{\alpha_i \, p_i(x_k \mid \phi_i)}{p(x_k \mid \Phi)} \, x_k \right] \Bigg/ \left[ \sum_{k=1}^{N} \frac{\alpha_i \, p_i(x_k \mid \phi_i)}{p(x_k \mid \Phi)} \right].
\]

[Plot: current mixture-density fit over the sample, x from −3 to 5.]
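The slides don't show how the sample was drawn; one standard way, using the demo's parameters, is to sample component labels first (a sketch; the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_mixture(N, alphas, mus, sigma2s, rng):
    """Draw N observations from a univariate normal mixture by first
    sampling component labels, then sampling from each component."""
    labels = rng.choice(len(alphas), size=N, p=alphas)
    return rng.normal(np.asarray(mus)[labels],
                      np.sqrt(np.asarray(sigma2s)[labels]))

# Parameters from the demo slide
x = sample_mixture(100_000, [.2, .3, .3, .1, .1],
                   [0, 1, 2, 3, 4], [.2, 2, .5, .1, .1], rng)
```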
[Plot, same demo: log residual norm vs. iteration number over 100 EM iterations.]
Anderson Acceleration

Derived from a method of D. G. Anderson, Iterative procedures for nonlinear integral equations, J. Assoc. Comput. Machinery, 12 (1965), pp. 547–560.

Consider a fixed-point iteration \(x_+ = g(x)\), \(g : \mathbb{R}^n \to \mathbb{R}^n\).

Anderson Acceleration: Given \(x_0\) and mMax ≥ 1, set \(x_1 = g(x_0)\).
Iterate: For k = 1, 2, ...
  Set \(m_k = \min\{\text{mMax}, k\}\).
  Set \(F_k = (f_{k-m_k}, \ldots, f_k)\), where \(f_i = g(x_i) - x_i\).
  Solve \(\min_{\alpha \in \mathbb{R}^{m_k+1}} \|F_k \alpha\|_2\) subject to \(\sum_{i=0}^{m_k} \alpha_i = 1\).
  Set \(x_{k+1} = \sum_{i=0}^{m_k} \alpha_i \, g(x_{k-m_k+i})\).
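A compact Python sketch of this scheme. It solves the constrained least-squares step in the equivalent unconstrained "difference" form (cf. Walker and Ni, 2011); the names and defaults are illustrative, not the authors' code:

```python
import numpy as np

def anderson(g, x0, m_max=3, max_iter=100, tol=1e-12):
    """Anderson acceleration for the fixed-point iteration x = g(x)."""
    x = np.asarray(x0, dtype=float)
    gx = g(x)
    G_hist, F_hist = [gx], [gx - x]   # histories of g(x_i) and f_i
    x = gx                            # x_1 = g(x_0)
    for k in range(1, max_iter):
        gx = g(x)
        f = gx - x
        if np.linalg.norm(f) < tol:
            return gx
        G_hist.append(gx)
        F_hist.append(f)
        if len(F_hist) > m_max + 1:   # keep at most m_k + 1 residuals
            G_hist.pop(0)
            F_hist.pop(0)
        # Columns of dF, dG are successive differences of the histories;
        # the constrained problem in alpha is equivalent to an
        # unconstrained least-squares problem in gamma.
        dF = np.column_stack([F_hist[i + 1] - F_hist[i]
                              for i in range(len(F_hist) - 1)])
        dG = np.column_stack([G_hist[i + 1] - G_hist[i]
                              for i in range(len(G_hist) - 1)])
        gamma, *_ = np.linalg.lstsq(dF, f, rcond=None)
        x = gx - dG @ gamma
    return x
```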
EM Iterations Demo (cont.)

Same univariate normal mixture, sample, and EM iteration on the means as in the earlier demo.

[Plot: current mixture-density fit, x from −3 to 5.]
[Plot, same demo: log residual norm vs. iteration number over 100 iterations.]
EM Convergence and "Separation"

Redner–W (1984): For mixture densities, the convergence is linear and depends on the "separation" of the component populations:

◮ "well separated" (fast convergence) if, whenever i ≠ j,
\[
\frac{p_i(x \mid \phi_i^*)}{p(x \mid \Phi^*)} \cdot \frac{p_j(x \mid \phi_j^*)}{p(x \mid \Phi^*)} \approx 0 \quad \text{for all } x \in \mathbb{R}^n;
\]
◮ "poorly separated" (slow convergence) if, for some i ≠ j, this product is significantly different from 0 for some \(x \in \mathbb{R}^n\).
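One illustrative way to gauge separation numerically is to average the product above over the observed sample. This diagnostic is an assumption of this write-up, not a quantity defined on the slide:

```python
import numpy as np
from scipy.stats import norm

def overlap(x, alphas, mus, sigmas, i, j):
    """Sample-average estimate of the separation product
    p_i(x|phi_i)/p(x|Phi) * p_j(x|phi_j)/p(x|Phi) over the sample.
    Values near 0 suggest components i and j are well separated.
    (Illustrative diagnostic only.)"""
    dens = norm.pdf(x[:, None], loc=mus, scale=sigmas)  # p_i(x_k|phi_i)
    p = dens @ alphas                                   # p(x_k|Phi)
    return np.mean((dens[:, i] / p) * (dens[:, j] / p))
```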
Example: EM Convergence and "Separation"

A Univariate Normal Mixture.

◮ \(p_i(x \mid \phi_i) = \frac{1}{\sqrt{2\pi\sigma_i^2}} \, e^{-(x - \mu_i)^2/(2\sigma_i^2)}\) for i = 1, ..., 3.
◮ EM iterations on the means, as before.
◮ Sample of 100,000 observations.
  — [α_1, α_2, α_3] = [.3, .3, .4], [σ²_1, σ²_2, σ²_3] = [1, 1, 1],
  — [μ_1, μ_2, μ_3] = [0, 2, 4], [0, 1, 2], [0, .5, 1].

[Plot: log residual norm vs. iteration number over 100 iterations for each of the three choices of means.]
Experiments with Multivariate Normal Mixtures

Experiment with Anderson acceleration applied to the EM iteration for normal mixtures shown earlier (the updates for \(\alpha_i^+\), \(\mu_i^+\), and \(\Sigma_i^+\), for i = 1, ..., m).

Assume m is known. Ultimate interest: very large N.
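A sketch of how the pieces above could be wired together: flatten \((\alpha, \mu, \Sigma)\) into one vector so the EM map fits the fixed-point form \(x_+ = g(x)\) expected by the `anderson` routine, using the `em_step` sketch from the Gaussian-mixture slide. The packing scheme is an assumption of this write-up; note also that accelerated iterates need not satisfy the mixture constraints (weights summing to 1, positive-definite covariances), a practical issue such experiments must address:

```python
import numpy as np

def pack(alphas, mus, Sigmas):
    """Flatten mixture parameters into one vector for the fixed-point map."""
    return np.concatenate([alphas, mus.ravel(), Sigmas.ravel()])

def unpack(v, m, n):
    """Invert pack(); symmetrize covariances to reduce round-off drift."""
    alphas = v[:m]
    mus = v[m:m + m * n].reshape(m, n)
    Sigmas = v[m + m * n:].reshape(m, n, n)
    return alphas, mus, 0.5 * (Sigmas + Sigmas.transpose(0, 2, 1))

def make_em_map(X, m):
    """Wrap em_step (defined earlier) as g: R^d -> R^d."""
    n = X.shape[1]
    def g(v):
        return pack(*em_step(X, *unpack(v, m, n)))
    return g

# Accelerated EM, given initial parameters alphas0, mus0, Sigmas0:
# v_star = anderson(make_em_map(X, m), pack(alphas0, mus0, Sigmas0))
```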