Parameter constraints.

E-M for GMMs still works if we freeze or constrain some parameters. Examples:

◮ No weights: initialize π = (1/k, …, 1/k) and never update it.

◮ Diagonal covariance matrices: update everything as before, except
$$\Sigma_j := \mathrm{diag}\big((\sigma_j)_1^2, \ldots, (\sigma_j)_d^2\big), \qquad (\sigma_j)_l^2 := \frac{\sum_{i=1}^n R_{ij}\,(x_i - \mu_j)_l^2}{n\pi_j};$$
that is, we use coordinate-wise sample variances weighted by R.

Why is this a good idea? Computation (of inverse), sample complexity, …

38 / 70
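Here is a minimal NumPy sketch of this constrained M-step (my illustration, not code from the slides; the function name and array shapes are assumptions). It implements the coordinate-wise weighted variances above; storing only the diagonal makes evaluating and inverting Σ_j an O(d) operation instead of O(d³) and cuts the number of covariance parameters from O(d²) to d per component.

```python
import numpy as np

def m_step_diagonal(X, R):
    """M-step for a GMM constrained to diagonal covariances.

    X : (n, d) data matrix
    R : (n, k) responsibilities from the E-step
    Returns updated weights pi (k,), means mu (k, d), and per-coordinate
    variances sigma2 (k, d); the full covariance is Sigma_j = diag(sigma2[j]).
    """
    n, d = X.shape

    # Usual weight and mean updates: pi_j = (1/n) sum_i R_ij,
    # mu_j = sum_i R_ij x_i / (n pi_j).
    pi = R.sum(axis=0) / n                         # (k,)
    mu = (R.T @ X) / (n * pi)[:, None]             # (k, d)

    # Constrained covariance update, coordinate-wise:
    # (sigma_j)_l^2 = sum_i R_ij (x_i - mu_j)_l^2 / (n pi_j)
    diff2 = (X[:, None, :] - mu[None, :, :]) ** 2  # (n, k, d)
    sigma2 = np.einsum('ij,ijl->jl', R, diff2) / (n * pi)[:, None]

    return pi, mu, sigma2
```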
Gaussian Mixture Model with diagonal covariances.

[Figure: a sequence of E-M iterations fitting a GMM with diagonal (axis-aligned) covariances to the running 2-d example; only the axis ticks survive in the extracted text.]

39 / 70
Singularities

E-M with GMMs suffers from singularities: trivial situations where the likelihood goes to ∞ but the solution is bad.

◮ Suppose d = 1, k = 2, π_j = 1/2, and n = 3 with x_1 = −1, x_2 = +1, x_3 = +3. Initialize with μ_1 = 0 and σ_1 = 1, but μ_2 = +3 = x_3 and σ_2 = 1/100. Then σ_2 → 0 and L ↑ ∞: the second component's density at x_3 is $\frac{1}{\sigma_2\sqrt{2\pi}}$, which grows without bound as σ_2 shrinks, so the likelihood diverges even though the model explains the remaining data no better.

40 / 70
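A quick numerical check of the slide's example (a sketch I added, not part of the original slides): pin μ_2 to the data point x_3 = +3 and shrink σ_2, and the mixture log-likelihood grows without bound.

```python
import numpy as np

# The slide's example: d = 1, k = 2, pi = (1/2, 1/2), data x = (-1, +1, +3),
# with mu_1 = 0, sigma_1 = 1 fixed and mu_2 pinned to the data point x_3 = +3.
x = np.array([-1.0, 1.0, 3.0])

def normal_pdf(x, mu, sigma):
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

def log_likelihood(sigma_2):
    """Mixture log-likelihood of the three points as a function of sigma_2."""
    mix = 0.5 * normal_pdf(x, 0.0, 1.0) + 0.5 * normal_pdf(x, 3.0, sigma_2)
    return np.log(mix).sum()

for s in [1.0, 1e-2, 1e-4, 1e-8]:
    print(f"sigma_2 = {s:g}: log-likelihood = {log_likelihood(s):.2f}")
# The term 0.5 * N(x_3 | 3, sigma_2) = 0.5 / (sigma_2 * sqrt(2 pi)) blows up,
# so the log-likelihood grows without bound while the fit to x_1, x_2 is unchanged.
```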
Interpolating between k-means and GMM E-M

Same M-step: fix π = (1/k, …, 1/k) and Σ_j = cI for a fixed c > 0.

Same E-step: define $q_{ij} := \tfrac12 \|x_i - \mu_j\|^2$; the E-step chooses
$$R_{ij} := p_\theta(y_i = j \mid x_i) = \frac{p_\theta(y_i = j,\, x_i)}{p_\theta(x_i)} = \frac{p_\theta(y_i = j,\, x_i)}{\sum_{l=1}^k p_\theta(y_i = l,\, x_i)} = \frac{\pi_j\, p_{\mu_j,\Sigma_j}(x_i)}{\sum_{l=1}^k \pi_l\, p_{\mu_l,\Sigma_l}(x_i)} = \frac{\exp(-q_{ij}/c)}{\sum_{l=1}^k \exp(-q_{il}/c)}.$$

Fix i ∈ {1, …, n} and suppose the minimum q_i := min_j q_{ij} is unique:
$$\lim_{c\downarrow 0} R_{ij} = \lim_{c\downarrow 0} \frac{\exp(-q_{ij}/c)}{\sum_{l=1}^k \exp(-q_{il}/c)} = \lim_{c\downarrow 0} \frac{\exp\big((q_i - q_{ij})/c\big)}{\sum_{l=1}^k \exp\big((q_i - q_{il})/c\big)} = \begin{cases} 1 & q_{ij} = q_i, \\ 0 & q_{ij} \neq q_i. \end{cases}$$

That is, R becomes the hard assignment A as c ↓ 0.

41 / 70
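A small numerical sketch of this limit (my own illustration; the data and centers below are made up, not from the slides): with π uniform and Σ_j = cI, the responsibilities are a softmax of −q_ij/c, and as c shrinks they collapse to hard assignments.

```python
import numpy as np

def responsibilities(X, mu, c):
    """E-step with pi = (1/k, ..., 1/k) and Sigma_j = c * I:
    R_ij = exp(-q_ij / c) / sum_l exp(-q_il / c),  q_ij = ||x_i - mu_j||^2 / 2.
    Subtracting the row minimum q_i (the same trick used in the limit above)
    keeps the softmax numerically stable for small c."""
    q = 0.5 * ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # (n, k)
    q = q - q.min(axis=1, keepdims=True)
    W = np.exp(-q / c)
    return W / W.sum(axis=1, keepdims=True)

# Toy data and centers (illustrative values only).
X = np.array([[0.0, 0.0], [1.0, 0.2], [4.0, 4.0]])
mu = np.array([[0.0, 0.0], [4.0, 4.0]])

for c in [10.0, 1.0, 0.01]:
    print(f"c = {c}:\n{np.round(responsibilities(X, mu, c), 3)}")
# As c decreases, each row of R approaches a one-hot vector: the soft
# responsibilities collapse to the hard k-means assignment A.
```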
Interpolating between k-means and GMM E-M (part 2)

We can also interpolate algorithmically, i.e., build algorithms that combine elements of both. Here's something like k-means, but with weights and covariances; a sketch of one such algorithm follows the figure.

[Figure: a sequence of iterations of this hybrid algorithm on the running 2-d example; only the axis ticks survive in the extracted text.]

42 / 70
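The slides don't spell out the hybrid algorithm behind the figure, so the following is only a sketch of one natural choice (names and details are my assumptions, not the slides' algorithm): keep the hard, k-means-style assignment step, but let the M-step refit weights, means, and full covariances as in GMM E-M with 0/1 responsibilities.

```python
import numpy as np

def hybrid_step(X, pi, mu, Sigma):
    """One step of a k-means/GMM hybrid: hard assignments in the E-step,
    full GMM-style weight/mean/covariance updates in the M-step.
    This is one plausible reading of the slide; the figure's exact algorithm
    may differ."""
    n, d = X.shape
    k = len(pi)

    # Hard "E-step": assign each point to its most likely component, using
    # weights and covariances rather than plain Euclidean distance.
    logp = np.empty((n, k))
    for j in range(k):
        diff = X - mu[j]
        inv = np.linalg.inv(Sigma[j])
        _, logdet = np.linalg.slogdet(Sigma[j])
        logp[:, j] = (np.log(pi[j]) - 0.5 * logdet
                      - 0.5 * np.einsum('nd,de,ne->n', diff, inv, diff))
    A = np.argmax(logp, axis=1)

    # "M-step": refit each cluster as in GMM E-M, but with 0/1 responsibilities.
    for j in range(k):
        members = X[A == j]
        if len(members) == 0:
            continue                               # keep old parameters for empty clusters
        pi[j] = len(members) / n
        mu[j] = members.mean(axis=0)
        diff = members - mu[j]
        Sigma[j] = diff.T @ diff / len(members) + 1e-6 * np.eye(d)  # ridge for stability
    pi = pi / pi.sum()                             # renormalize in case of empty clusters
    return pi, mu, Sigma, A
```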