E-M method for latent variable models

Define the augmented likelihood

  L(θ; R) := Σ_{i=1}^n Σ_{j=1}^k R_ij ln p_θ(x_i, y_i = j),

with responsibility matrix R ∈ R_{n,k} := { R ∈ [0,1]^{n×k} : R 1_k = 1_n }.

Alternate two steps: an E-step that updates R with θ fixed, and an M-step that updates θ with R fixed.
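As an illustrative companion to this alternation, here is a minimal NumPy sketch of E-M for a Gaussian mixture with spherical covariances. The function names and the spherical-Gaussian parameterization are my own choices, not anything prescribed by the slides.

```python
import numpy as np

def e_step(X, pi, mu, var):
    """E-step: responsibilities R[i, j] = p(y_i = j | x_i) for spherical Gaussians."""
    # log p(x_i, y_i = j) up to a constant: log pi_j - ||x_i - mu_j||^2 / (2 var_j) - (d/2) log var_j
    d = X.shape[1]
    sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)                         # (n, k)
    log_joint = np.log(pi)[None, :] - sq / (2 * var)[None, :] - 0.5 * d * np.log(var)[None, :]
    log_joint -= log_joint.max(axis=1, keepdims=True)                                # stabilize softmax
    R = np.exp(log_joint)
    return R / R.sum(axis=1, keepdims=True)                                          # rows sum to 1

def m_step(X, R):
    """M-step: maximize L(theta; R) over pi, mu, var with R fixed."""
    n, d = X.shape
    nj = R.sum(axis=0)                         # effective counts, n * pi_j
    pi = nj / n
    mu = (R.T @ X) / nj[:, None]
    sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    var = (R * sq).sum(axis=0) / (d * nj)      # one (spherical) variance per component
    return pi, mu, var

def em(X, k, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), k, replace=False)]
    pi, var = np.full(k, 1.0 / k), np.ones(k)
    for _ in range(iters):
        R = e_step(X, pi, mu, var)       # update R given theta
        pi, mu, var = m_step(X, R)       # update theta given R
    return pi, mu, var
```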



Parameter constraints

E-M for GMMs still works if we freeze or constrain some parameters. Examples:

◮ No weights: initialize π = (1/k, ..., 1/k) and never update it.

◮ Diagonal covariance matrices: update everything as before, except Σ_j := diag((σ_j)²_1, ..., (σ_j)²_d), where

    (σ_j)²_l := ( Σ_{i=1}^n R_ij ((x_i − μ_j)_l)² ) / (n π_j);

  that is, we use coordinate-wise sample variances weighted by R. (A sketch of this update appears after this list.)

Why is this a good idea? Computation (of the inverse), sample complexity, ...
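A minimal sketch of the diagonal-covariance variance update above, assuming X is the (n, d) data matrix, R the (n, k) responsibility matrix, and mu the (k, d) matrix of current means; the function name is hypothetical.

```python
import numpy as np

def diagonal_variance_update(X, R, mu):
    """Coordinate-wise sample variances weighted by R:
    (sigma_j)^2_l = sum_i R[i, j] * (X[i, l] - mu[j, l])^2 / (n * pi_j),
    where n * pi_j = sum_i R[i, j]."""
    n_pi = R.sum(axis=0)                                               # (k,): n * pi_j
    diff_sq = (X[:, None, :] - mu[None, :, :]) ** 2                    # (n, k, d)
    sigma_sq = np.einsum('ij,ijl->jl', R, diff_sq) / n_pi[:, None]     # (k, d)
    # Sigma_j = diag(sigma_sq[j]): only d numbers per component instead of d(d+1)/2.
    return sigma_sq
```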

Gaussian Mixture Model with diagonal covariances

[Figure: animation of E-M iterations fitting a diagonal-covariance GMM to 2-D data; axes span roughly −10 to 15 horizontally and −10 to 10 vertically.]

Singularities

E-M with GMMs suffers from singularities: trivial situations where the likelihood goes to ∞ but the solution is bad.

◮ Suppose d = 1, k = 2, π_j = 1/2, and n = 3 with x_1 = −1, x_2 = +1, and x_3 = +3. Initialize with μ_1 = 0 and σ_1 = 1, but μ_2 = +3 = x_3 and σ_2 = 1/100. Then σ_2 → 0 and L ↑ ∞: the second component sits exactly on x_3, so shrinking σ_2 drives its density at x_3 (and hence the likelihood) to infinity even though the fit is useless. A numeric sketch is below.
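A small numeric sketch of this collapse on the slide's toy data (the log-likelihood computation itself is my own illustration): as σ_2 shrinks, the mixture log-likelihood grows without bound.

```python
import numpy as np

def gauss(x, mu, sigma):
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

x = np.array([-1.0, 1.0, 3.0])            # the three points from the slide
mu1, s1, mu2 = 0.0, 1.0, 3.0              # component 2 sits exactly on x_3 = +3

for s2 in [1e-2, 1e-3, 1e-4]:
    p = 0.5 * gauss(x, mu1, s1) + 0.5 * gauss(x, mu2, s2)   # equal weights pi_j = 1/2
    print(f"sigma_2 = {s2:.0e}: log-likelihood = {np.log(p).sum():.2f}")
# The log-likelihood keeps growing as sigma_2 -> 0, even though the fit is useless.
```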

Interpolating between k-means and GMM E-M

Same M-step: fix π = (1/k, ..., 1/k) and Σ_j = c I for a fixed c > 0.

Same E-step: define q_ij := (1/2) ‖x_i − μ_j‖²; the E-step chooses

  R_ij := p_θ(y_i = j | x_i)
        = p_θ(y_i = j, x_i) / p_θ(x_i)
        = p_θ(y_i = j, x_i) / Σ_{l=1}^k p_θ(y_i = l, x_i)
        = π_j p_{μ_j, Σ_j}(x_i) / Σ_{l=1}^k π_l p_{μ_l, Σ_l}(x_i)
        = exp(−q_ij / c) / Σ_{l=1}^k exp(−q_il / c).

Fix i ∈ {1, ..., n} and suppose the minimum q_i := min_j q_ij is unique:

  lim_{c↓0} R_ij = lim_{c↓0} exp(−q_ij / c) / Σ_{l=1}^k exp(−q_il / c)
                 = lim_{c↓0} exp((q_i − q_ij) / c) / Σ_{l=1}^k exp((q_i − q_il) / c)
                 = 1 if q_ij = q_i,   0 if q_ij ≠ q_i.

That is, R becomes the hard assignment A as c ↓ 0.
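A small numeric sketch of this limit (the data points and the values of c are illustrative, not from the slides): the softmax of −q_ij / c over components approaches a one-hot indicator of the nearest mean as c ↓ 0.

```python
import numpy as np

def soft_assignments(X, mu, c):
    """E-step with pi = (1/k, ..., 1/k) and Sigma_j = c I: R_ij = softmax_j(-q_ij / c)."""
    q = 0.5 * ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)   # q_ij = ||x_i - mu_j||^2 / 2
    logits = -q / c
    logits -= logits.max(axis=1, keepdims=True)                     # subtract q_i / c for stability
    R = np.exp(logits)
    return R / R.sum(axis=1, keepdims=True)

X = np.array([[0.0, 0.0], [4.0, 1.0]])
mu = np.array([[0.0, 1.0], [5.0, 0.0]])
for c in [10.0, 1.0, 0.01]:
    print(f"c = {c}:\n{soft_assignments(X, mu, c).round(3)}")
# As c shrinks, each row of R concentrates on the nearest mean: the hard assignment A.
```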

Interpolating between k-means and GMM E-M (part 2)

We can interpolate algorithmically, meaning we can create algorithms that have elements of both. Here's something like k-means, but with weights and covariances; a hedged sketch of one such algorithm follows the figure.

[Figure: animation of the hybrid algorithm's iterations on 2-D data; axes span roughly −10 to 15 horizontally and −10 to 10 vertically.]
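The slides do not spell the hybrid out, so the following is only a plausible sketch of "something like k-means but with weights and covariances": hard (argmax-posterior) assignments as in k-means, with weights, means, and full covariances refit in the M-step as GMM E-M would with R replaced by the hard assignment A. All names and details here are my assumptions.

```python
import numpy as np

def hard_em_step(X, pi, mu, cov):
    """One iteration: hard (argmax) assignments, then refit weights, means, covariances."""
    n, d = X.shape
    k = len(pi)
    # "E-step": assign each point to the component with the largest posterior (a hard R).
    log_post = np.empty((n, k))
    for j in range(k):
        diff = X - mu[j]
        _, logdet = np.linalg.slogdet(cov[j])
        maha = np.einsum('id,dc,ic->i', diff, np.linalg.inv(cov[j]), diff)
        log_post[:, j] = np.log(pi[j]) - 0.5 * (logdet + maha)
    z = log_post.argmax(axis=1)
    # "M-step": refit pi, mu, Sigma from the hard assignments.
    # (A real implementation should guard against empty or singleton clusters.)
    for j in range(k):
        members = X[z == j]
        pi[j] = len(members) / n
        mu[j] = members.mean(axis=0)
        cov[j] = np.cov(members.T) + 1e-6 * np.eye(d)   # small ridge guards against collapse
    return pi, mu, cov, z
```

With the covariances frozen at c I and the weights frozen at 1/k, this reduces to plain k-means, matching the c ↓ 0 limit on the previous slide.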
