Parameter constraints.

E-M for GMMs still works if we freeze or constrain some parameters. Examples:

◮ No weights: initialize π = (1/k, …, 1/k) and never update it.

◮ Diagonal covariance matrices: update everything as before, except
$$\Sigma_j := \mathrm{diag}\big((\sigma_j)_1^2, \ldots, (\sigma_j)_d^2\big), \qquad (\sigma_j)_l^2 := \frac{\sum_{i=1}^n R_{ij}\,(x_i - \mu_j)_l^2}{n\pi_j};$$
that is, we use coordinate-wise sample variances weighted by R.

Why is this a good idea? Computation (of inverse), sample complexity, …

38 / 70
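Here is a minimal NumPy sketch of this constrained M-step (my illustration, not code from the slides; the function name and array shapes are assumptions). It implements the coordinate-wise weighted variances above; storing only the diagonal makes evaluating and inverting Σ_j an O(d) operation instead of O(d³) and cuts the number of covariance parameters from O(d²) to d per component.

```python
import numpy as np

def m_step_diagonal(X, R):
    """M-step for a GMM constrained to diagonal covariances.

    X : (n, d) data matrix
    R : (n, k) responsibilities from the E-step
    Returns updated weights pi (k,), means mu (k, d), and per-coordinate
    variances sigma2 (k, d); the full covariance is Sigma_j = diag(sigma2[j]).
    """
    n, d = X.shape

    # Usual weight and mean updates: pi_j = (1/n) sum_i R_ij,
    # mu_j = sum_i R_ij x_i / (n pi_j).
    pi = R.sum(axis=0) / n                         # (k,)
    mu = (R.T @ X) / (n * pi)[:, None]             # (k, d)

    # Constrained covariance update, coordinate-wise:
    # (sigma_j)_l^2 = sum_i R_ij (x_i - mu_j)_l^2 / (n pi_j)
    diff2 = (X[:, None, :] - mu[None, :, :]) ** 2  # (n, k, d)
    sigma2 = np.einsum('ij,ijl->jl', R, diff2) / (n * pi)[:, None]

    return pi, mu, sigma2
```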
Gaussian Mixture Model with diagonal covariances.

[Figure: a sequence of E-M iterations fitting a GMM with diagonal (axis-aligned) covariances to the running 2-d example; only the axis ticks survive in the extracted text.]

39 / 70
Singularities

E-M with GMMs suffers from singularities: trivial situations where the likelihood goes to ∞ but the solution is bad.

◮ Suppose d = 1, k = 2, π_j = 1/2, and n = 3 with x_1 = −1, x_2 = +1, x_3 = +3. Initialize with μ_1 = 0 and σ_1 = 1, but μ_2 = +3 = x_3 and σ_2 = 1/100. Then σ_2 → 0 and L ↑ ∞: the second component's density at x_3 is $\frac{1}{\sigma_2\sqrt{2\pi}}$, which grows without bound as σ_2 shrinks, so the likelihood diverges even though the model explains the remaining data no better.

40 / 70
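A quick numerical check of the slide's example (a sketch I added, not part of the original slides): pin μ_2 to the data point x_3 = +3 and shrink σ_2, and the mixture log-likelihood grows without bound.

```python
import numpy as np

# The slide's example: d = 1, k = 2, pi = (1/2, 1/2), data x = (-1, +1, +3),
# with mu_1 = 0, sigma_1 = 1 fixed and mu_2 pinned to the data point x_3 = +3.
x = np.array([-1.0, 1.0, 3.0])

def normal_pdf(x, mu, sigma):
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

def log_likelihood(sigma_2):
    """Mixture log-likelihood of the three points as a function of sigma_2."""
    mix = 0.5 * normal_pdf(x, 0.0, 1.0) + 0.5 * normal_pdf(x, 3.0, sigma_2)
    return np.log(mix).sum()

for s in [1.0, 1e-2, 1e-4, 1e-8]:
    print(f"sigma_2 = {s:g}: log-likelihood = {log_likelihood(s):.2f}")
# The term 0.5 * N(x_3 | 3, sigma_2) = 0.5 / (sigma_2 * sqrt(2 pi)) blows up,
# so the log-likelihood grows without bound while the fit to x_1, x_2 is unchanged.
```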
Interpolating between k-means and GMM E-M

Same M-step: fix π = (1/k, …, 1/k) and Σ_j = cI for a fixed c > 0.

Same E-step: define $q_{ij} := \tfrac12 \|x_i - \mu_j\|^2$; the E-step chooses
$$R_{ij} := p_\theta(y_i = j \mid x_i) = \frac{p_\theta(y_i = j,\, x_i)}{p_\theta(x_i)} = \frac{p_\theta(y_i = j,\, x_i)}{\sum_{l=1}^k p_\theta(y_i = l,\, x_i)} = \frac{\pi_j\, p_{\mu_j,\Sigma_j}(x_i)}{\sum_{l=1}^k \pi_l\, p_{\mu_l,\Sigma_l}(x_i)} = \frac{\exp(-q_{ij}/c)}{\sum_{l=1}^k \exp(-q_{il}/c)}.$$

Fix i ∈ {1, …, n} and suppose the minimum q_i := min_j q_{ij} is unique:
$$\lim_{c\downarrow 0} R_{ij} = \lim_{c\downarrow 0} \frac{\exp(-q_{ij}/c)}{\sum_{l=1}^k \exp(-q_{il}/c)} = \lim_{c\downarrow 0} \frac{\exp\big((q_i - q_{ij})/c\big)}{\sum_{l=1}^k \exp\big((q_i - q_{il})/c\big)} = \begin{cases} 1 & q_{ij} = q_i, \\ 0 & q_{ij} \neq q_i. \end{cases}$$

That is, R becomes the hard assignment A as c ↓ 0.

41 / 70
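A small numerical sketch of this limit (my own illustration; the data and centers below are made up, not from the slides): with π uniform and Σ_j = cI, the responsibilities are a softmax of −q_ij/c, and as c shrinks they collapse to hard assignments.

```python
import numpy as np

def responsibilities(X, mu, c):
    """E-step with pi = (1/k, ..., 1/k) and Sigma_j = c * I:
    R_ij = exp(-q_ij / c) / sum_l exp(-q_il / c),  q_ij = ||x_i - mu_j||^2 / 2.
    Subtracting the row minimum q_i (the same trick used in the limit above)
    keeps the softmax numerically stable for small c."""
    q = 0.5 * ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # (n, k)
    q = q - q.min(axis=1, keepdims=True)
    W = np.exp(-q / c)
    return W / W.sum(axis=1, keepdims=True)

# Toy data and centers (illustrative values only).
X = np.array([[0.0, 0.0], [1.0, 0.2], [4.0, 4.0]])
mu = np.array([[0.0, 0.0], [4.0, 4.0]])

for c in [10.0, 1.0, 0.01]:
    print(f"c = {c}:\n{np.round(responsibilities(X, mu, c), 3)}")
# As c decreases, each row of R approaches a one-hot vector: the soft
# responsibilities collapse to the hard k-means assignment A.
```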
Interpolating between k-means and GMM E-M (part 2)

We can also interpolate algorithmically, i.e., build algorithms that combine elements of both. Here's something like k-means, but with weights and covariances; a sketch of one such algorithm follows the figure.

[Figure: a sequence of iterations of this hybrid algorithm on the running 2-d example; only the axis ticks survive in the extracted text.]

42 / 70
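The slides don't spell out the hybrid algorithm behind the figure, so the following is only a sketch of one natural choice (names and details are my assumptions, not the slides' algorithm): keep the hard, k-means-style assignment step, but let the M-step refit weights, means, and full covariances as in GMM E-M with 0/1 responsibilities.

```python
import numpy as np

def hybrid_step(X, pi, mu, Sigma):
    """One step of a k-means/GMM hybrid: hard assignments in the E-step,
    full GMM-style weight/mean/covariance updates in the M-step.
    This is one plausible reading of the slide; the figure's exact algorithm
    may differ."""
    n, d = X.shape
    k = len(pi)

    # Hard "E-step": assign each point to its most likely component, using
    # weights and covariances rather than plain Euclidean distance.
    logp = np.empty((n, k))
    for j in range(k):
        diff = X - mu[j]
        inv = np.linalg.inv(Sigma[j])
        _, logdet = np.linalg.slogdet(Sigma[j])
        logp[:, j] = (np.log(pi[j]) - 0.5 * logdet
                      - 0.5 * np.einsum('nd,de,ne->n', diff, inv, diff))
    A = np.argmax(logp, axis=1)

    # "M-step": refit each cluster as in GMM E-M, but with 0/1 responsibilities.
    for j in range(k):
        members = X[A == j]
        if len(members) == 0:
            continue                               # keep old parameters for empty clusters
        pi[j] = len(members) / n
        mu[j] = members.mean(axis=0)
        diff = members - mu[j]
        Sigma[j] = diff.T @ diff / len(members) + 1e-6 * np.eye(d)  # ridge for stability
    pi = pi / pi.sum()                             # renormalize in case of empty clusters
    return pi, mu, Sigma, A
```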