Gaussian Mixture Models & EM
CE-717: Machine Learning
Sharif University of Technology
M. Soleymani
Fall 2016
Mixture Models: definition

- Mixture models: linear super-position of K mixture components:

    p(x) = \sum_{k=1}^{K} p(z_k) \, p(x; \theta_k), \qquad \sum_{k=1}^{K} p(z_k) = 1

- p(z_k): the prior probability of the k-th mixture component
- \theta_k: the parameters of the k-th mixture component
- p(x; \theta_k): the probability of x according to the k-th mixture component
- Goal: estimate p(x), e.g., for multi-modal density estimation
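To make the super-position concrete, here is a minimal sketch (not from the slides) that evaluates a two-component 1D Gaussian mixture with NumPy/SciPy; all parameter values are made up for illustration:

    import numpy as np
    from scipy.stats import norm

    # Hypothetical two-component 1D mixture: p(x) = sum_k p(z_k) * p(x; theta_k)
    priors = np.array([0.3, 0.7])   # p(z_1), p(z_2); must sum to 1
    means  = np.array([-1.0, 2.0])  # theta_k = (mu_k, sigma_k) for Gaussian components
    stds   = np.array([0.5, 1.5])

    def mixture_pdf(x):
        # Linear super-position of component densities, weighted by the priors
        return sum(p * norm.pdf(x, mu, s) for p, mu, s in zip(priors, means, stds))

    xs = np.linspace(-4.0, 6.0, 5)
    print(mixture_pdf(xs))  # values of the (multi-modal) density at xs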
- Input: data points \{x^{(i)}\}_{i=1}^{N}
- Goal: find the parameters of the GMM (\pi_k, \mu_k, \Sigma_k, \; k = 1, \dots, K):

    p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)
- Log-likelihood of the data:

    \log p(X; \theta) = \sum_{i=1}^{N} \log \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x^{(i)} \mid \mu_k, \Sigma_k)

- The sum over components appears inside the log, so there is no closed-form solution for the maximum-likelihood parameters.
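The log-likelihood itself is still straightforward to evaluate numerically; a sketch using a stable log-sum-exp over components (function and argument names are illustrative, assuming full-covariance Gaussian components):

    import numpy as np
    from scipy.special import logsumexp
    from scipy.stats import multivariate_normal

    def gmm_log_likelihood(X, pis, mus, Sigmas):
        """log p(X) = sum_i log sum_k pi_k N(x^(i) | mu_k, Sigma_k)."""
        N, K = X.shape[0], len(pis)
        log_probs = np.empty((N, K))
        for k in range(K):
            # log pi_k + log N(x^(i) | mu_k, Sigma_k), for all i at once
            log_probs[:, k] = np.log(pis[k]) + multivariate_normal.logpdf(X, mus[k], Sigmas[k])
        # The sum over k sits inside the log, hence logsumexp per data point
        return logsumexp(log_probs, axis=1).sum()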
- Given the posterior probabilities (responsibilities) \gamma_k^{(i)} = p(z_k^{(i)} = 1 \mid x^{(i)}), the updated parameters have closed form:

    \mu_k^{new} = \frac{1}{N_k} \sum_{i=1}^{N} \gamma_k^{(i)} x^{(i)}

    \Sigma_k^{new} = \frac{1}{N_k} \sum_{i=1}^{N} \gamma_k^{(i)} (x^{(i)} - \mu_k^{new})(x^{(i)} - \mu_k^{new})^T

    \pi_k^{new} = \frac{N_k}{N}, \qquad N_k = \sum_{i=1}^{N} \gamma_k^{(i)}
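These closed-form updates translate almost line-for-line into NumPy; a sketch assuming gamma is the N x K responsibility matrix (names illustrative):

    import numpy as np

    def m_step(X, gamma):
        """Closed-form parameter updates given responsibilities gamma (N x K)."""
        N, K = gamma.shape
        Nk = gamma.sum(axis=0)              # N_k = sum_i gamma_k^(i)
        mus = (gamma.T @ X) / Nk[:, None]   # mu_k^new: responsibility-weighted means
        Sigmas = []
        for k in range(K):
            diff = X - mus[k]               # x^(i) - mu_k^new
            Sigmas.append((gamma[:, k, None] * diff).T @ diff / Nk[k])
        pis = Nk / N                        # pi_k^new = N_k / N
        return pis, mus, np.array(Sigmas)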
- Initialize \mu_k, \Sigma_k, \pi_k
- E step: for i = 1, \dots, N and k = 1, \dots, K, compute the responsibilities

    \gamma_k^{(i)} = p(z_k^{(i)} = 1 \mid x^{(i)}, \theta^{old}) = \frac{\pi_k^{old} \, \mathcal{N}(x^{(i)} \mid \mu_k^{old}, \Sigma_k^{old})}{\sum_{j=1}^{K} \pi_j^{old} \, \mathcal{N}(x^{(i)} \mid \mu_j^{old}, \Sigma_j^{old})}

- M step: for k = 1, \dots, K,

    \mu_k^{new} = \frac{1}{N_k} \sum_{i=1}^{N} \gamma_k^{(i)} x^{(i)}

    \Sigma_k^{new} = \frac{1}{N_k} \sum_{i=1}^{N} \gamma_k^{(i)} (x^{(i)} - \mu_k^{new})(x^{(i)} - \mu_k^{new})^T

    \pi_k^{new} = \frac{N_k}{N}, \qquad N_k = \sum_{i=1}^{N} \gamma_k^{(i)}

- Repeat E and M steps until convergence
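Putting both steps together, a minimal NumPy/SciPy sketch of this loop (not the course's reference implementation; the random initialization and the simple log-likelihood convergence test are illustrative choices):

    import numpy as np
    from scipy.stats import multivariate_normal

    def em_gmm(X, K, max_iter=100, tol=1e-6, seed=0):
        rng = np.random.default_rng(seed)
        N, d = X.shape
        # Initialize: random data points as means, identity covariances, uniform weights
        mus = X[rng.choice(N, size=K, replace=False)]
        Sigmas = np.array([np.eye(d) for _ in range(K)])
        pis = np.full(K, 1.0 / K)
        prev_ll = -np.inf
        for _ in range(max_iter):
            # E step: gamma_k^(i) proportional to pi_k N(x^(i) | mu_k, Sigma_k)
            dens = np.column_stack([pis[k] * multivariate_normal.pdf(X, mus[k], Sigmas[k])
                                    for k in range(K)])
            gamma = dens / dens.sum(axis=1, keepdims=True)
            # M step: responsibility-weighted re-estimation
            Nk = gamma.sum(axis=0)
            mus = (gamma.T @ X) / Nk[:, None]
            for k in range(K):
                diff = X - mus[k]
                Sigmas[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k]
            pis = Nk / N
            # Convergence: stop when the log-likelihood improvement is tiny
            ll = np.log(dens.sum(axis=1)).sum()
            if ll - prev_ll < tol:
                break
            prev_ll = ll
        return pis, mus, Sigmas, gamma

For example, pis, mus, Sigmas, gamma = em_gmm(X, K=3) fits a 3-component model; a practical implementation would also guard against underflow and singular covariance matrices.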
[Figure: density estimation with a 3-component GMM in 2D.]

True parameters:
  \mu_1 = (-2, 3), \Sigma_1 = [[1, 0.5], [0.5, 4]], \pi_1 = 0.6
  \mu_2 = (0, -4), \Sigma_2 = [[1, 0], [0, 1]], \pi_2 = 0.25
  \mu_3 = (3, 2), \Sigma_3 = [[3, 1], [1, 1]], \pi_3 = 0.15

Estimated parameters (two runs):
  Run A: \mu_1 = (0.36, -4.09), \Sigma_1 = [[0.89, 0.26], [0.26, 0.83]], \pi_1 = 0.249; \mu_2 = (3.25, 2.09), \Sigma_2 = [[2.23, 1.08], [1.09, 1.41]], \pi_2 = 0.146; \mu_3 = (-2.11, 3.36), \Sigma_3 = [[1.12, 0.61], [0.61, 3.61]], \pi_3 = 0.604
  Run B: \mu_1 = (1.45, -1.81), \Sigma_1 = [[3.30, 4.76], [4.76, 10.01]], \pi_1 = 0.392; \mu_2 = (-2.20, 3.16), \Sigma_2 = [[1.30, 1.10], [1.10, 2.80]], \pi_2 = 0.429; \mu_3 = (-1.88, 3.74), \Sigma_3 = [[5.83, -0.82], [-0.82, 5.83]], \pi_3 = 0.178

Run A recovers the true components up to a permutation of the labels; Run B appears to have converged to a poorer local optimum.
- k-means can be extended to use covariances → "hard EM" (ellipsoidal k-means).
- EM+GMM has more local minima.
- Useful trick: first run k-means and then use its result to initialize EM (a sketch follows below).
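A sketch of that trick, using SciPy's kmeans2 for the initial hard clustering (the helper name kmeans_init is hypothetical; its output can seed an EM loop such as em_gmm above):

    import numpy as np
    from scipy.cluster.vq import kmeans2

    def kmeans_init(X, K):
        """Initialize GMM parameters from a k-means clustering of X."""
        centers, labels = kmeans2(X, K, minit='++')
        pis = np.array([np.mean(labels == k) for k in range(K)])  # cluster fractions
        # Within-cluster covariances (assumes every cluster is non-empty)
        Sigmas = np.array([np.cov(X[labels == k].T) for k in range(K)])
        return pis, centers, Sigmas  # use instead of a random start for EM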
- This objective does not decouple across components, so we use the EM algorithm to solve it.
- Assumptions: x (observed or known variables), z (unobserved or latent variables).
- If z is relevant to x (in any way), we can hope to extract information about it from the observed samples.
- Initialization: initialize the unknown parameters \theta.
- Iterate the following steps until convergence (formal statement below):
  - Expectation step: find the probability of the unobserved variables given the current parameter estimates.
  - Maximization step: from the observed data and the probability of the unobserved variables, re-estimate the parameters.
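Formally, writing \theta^{old} for the current estimate, one EM iteration is:

    E step:  q(z) \leftarrow p(z \mid x, \theta^{old})
    M step:  \theta^{new} = \arg\max_{\theta} \; \mathbb{E}_{q(z)}\left[\log p(x, z; \theta)\right]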
- EM alternates between:
  - hypothesizing values for the unobserved variables in each data sample
  - learning the parameters
- Given complete data, we have the sufficient statistics, and we can estimate the parameters directly.
- Conversely, computing the probability of the missing data given the parameters is straightforward.
- For any distribution q(z) over the latent variables, define

    G(\theta, q) = \sum_{z} q(z) \log \frac{p(x, z; \theta)}{q(z)}

- Then

    \log p(x; \theta) = G(\theta, q) + KL\big(q(z) \,\|\, p(z \mid x; \theta)\big) \ge G(\theta, q)
- Expected complete-data log-likelihood:

    Q(\theta, \theta^{old}) = \sum_{i=1}^{N} \sum_{z^{(i)}} p(z^{(i)} \mid x^{(i)}, \theta^{old}) \log p(x^{(i)}, z^{(i)}; \theta)
- Maximizing with respect to \pi_k requires a Lagrange multiplier due to the constraint \sum_{k=1}^{K} \pi_k = 1 (derivation sketched below).
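A short derivation of the resulting update:

    \frac{\partial}{\partial \pi_k}\left[Q(\theta, \theta^{old}) + \lambda\left(\sum_{j=1}^{K}\pi_j - 1\right)\right]
      = \frac{\sum_{i=1}^{N}\gamma_k^{(i)}}{\pi_k} + \lambda = 0
      \;\Rightarrow\; \pi_k = -\frac{N_k}{\lambda}

Summing over k and using \sum_k \pi_k = 1 gives \lambda = -N, hence \pi_k = N_k / N, matching the M-step update.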
- It is usually expensive to obtain a large set of labeled data.
- Unlabeled data is often abundant at little or no cost.