Statistical Machine Learning
Lecture 06 Extra: Expectation Maximization
Kristian Kersting, TU Darmstadt, Summer Term 2020
Based on slides from J. Peters
Expectation Maximization: Basic Idea
[Figure: the log-likelihood L(θ) together with the desired lower bound F(q_t, θ), which touches L(θ) at the current parameters θ_t.]
Expectation Maximization
Requirements
1. Guarantee a lower bound (aka surrogate function)
   F(q, θ) ≤ L(θ)  ∀ q, θ
   where q is the "guessed" distribution over the latent variables and θ are the parameters.
2. Choose q* such that the bound touches the log-likelihood at the current parameters:
   F(q*, θ_t) = L(θ_t)
Expectation Maximization
Expectation Step (E-Step): choose q*_{t+1} such that the bound touches the log-likelihood at the current parameters,
   F(q*_{t+1}, θ_t) = L(θ_t)
[Figure: the new bound F(q*_{t+1}, θ) touches L(θ) at θ_t, improving on the previous bound F(q_t, θ).]
Expectation Maximization
Maximization Step (M-Step): find θ* by maximizing the lower bound,
   θ* = arg max_θ F(q*_{t+1}, θ)
[Figure: the maximizer θ* of the bound F(q*_{t+1}, θ).]
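To make the two steps concrete, the following is a minimal sketch of the full EM loop for a two-component 1-D Gaussian mixture. The mixture model, the function name em_gmm_1d, and the initialization are illustrative assumptions, not part of the lecture.

```python
import numpy as np
from scipy.stats import norm

def em_gmm_1d(x, n_iter=50):
    """Minimal EM sketch for a two-component 1-D Gaussian mixture (toy example)."""
    # Rough initial guesses for mixing weight, means, and standard deviations.
    pi = 0.5
    mu = np.array([x.min(), x.max()])
    sigma = np.array([x.std(), x.std()])
    for _ in range(n_iter):
        # E-step: q*(z_i) = p_theta(z_i | x_i), the posterior "responsibilities".
        p0 = pi * norm.pdf(x, mu[0], sigma[0])
        p1 = (1 - pi) * norm.pdf(x, mu[1], sigma[1])
        r = p0 / (p0 + p1)  # responsibility of component 0 for each x_i
        # M-step: theta_{t+1} = arg max_theta F(q*, theta); closed form for Gaussians.
        pi = r.mean()
        mu = np.array([np.average(x, weights=r),
                       np.average(x, weights=1 - r)])
        sigma = np.array([np.sqrt(np.average((x - mu[0]) ** 2, weights=r)),
                          np.sqrt(np.average((x - mu[1]) ** 2, weights=1 - r))])
    return pi, mu, sigma
```

For data drawn from two well-separated Gaussians, the returned means settle on the two cluster centers after a few iterations; both updates are the standard weighted maximum-likelihood estimates.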
Expectation Maximization
Find a lower bound on L(θ):
   L(θ) = Σ_i log p_θ(x_i)
        = Σ_i log ∫ p_θ(x_i, z_i) dz_i
        = Σ_i log ∫ q(z_i) (p_θ(x_i, z_i) / q(z_i)) dz_i
        ≥ Σ_i ∫ q(z_i) log (p_θ(x_i, z_i) / q(z_i)) dz_i ≡ F(q, θ)   (by Jensen's inequality)
   s.t. ∫ q(z_i) dz_i = 1  ∀ i
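A quick numerical sanity check of the bound, assuming a single observation with a discrete latent z ∈ {0, 1} and hypothetical joint values; any choice of q keeps F(q, θ) below L(θ).

```python
import numpy as np

# Toy check of F(q, theta) <= L(theta) for one observation x with z in {0, 1}.
p_joint = np.array([0.1, 0.3])     # hypothetical p_theta(x, z=0), p_theta(x, z=1)
L = np.log(p_joint.sum())          # log p_theta(x) = log sum_z p_theta(x, z)

rng = np.random.default_rng(0)
for _ in range(5):
    q = rng.dirichlet([1.0, 1.0])            # an arbitrary distribution over z
    F = np.sum(q * np.log(p_joint / q))      # sum_z q(z) log p_theta(x, z)/q(z)
    assert F <= L + 1e-12                    # Jensen's inequality holds
```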
Expectation Maximization
Constrained Optimization Problem
   max_q Σ_i ∫ q(z_i) log (p_θ(x_i, z_i) / q(z_i)) dz_i
   s.t. ∫ q(z_i) dz_i = 1  ∀ i
Expectation Maximization
Lagrangian of the constrained problem:
   𝓛 = Σ_i ∫ q(z_i) log (p_θ(x_i, z_i) / q(z_i)) dz_i + Σ_i λ_i (∫ q(z_i) dz_i − 1)

Setting the gradients to zero:
   ∇_{q(z_i)} 𝓛 = log (p_θ(x_i, z_i) / q(z_i)) − 1 + λ_i = 0
   ⟹ q(z_i) = exp(λ_i − 1) p_θ(x_i, z_i)

   ∇_{λ_i} 𝓛 = ∫ q(z_i) dz_i − 1 = 0
   ⟹ ∫ exp(λ_i − 1) p_θ(x_i, z_i) dz_i = 1
   ⟹ λ_i = 1 − log ∫ p_θ(x_i, z_i) dz_i

Hence q(z_i) = p_θ(x_i, z_i) / ∫ p_θ(x_i, z_i) dz_i = p_θ(z_i | x_i)  ≡  E-step
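Reusing the hypothetical two-value toy example from above, one can check numerically that plugging in the posterior q*(z) = p_θ(z | x) makes the bound tight, i.e. F(q*, θ) = L(θ):

```python
import numpy as np

p_joint = np.array([0.1, 0.3])      # hypothetical p_theta(x, z) for z in {0, 1}
L = np.log(p_joint.sum())           # log-likelihood log p_theta(x)

q_star = p_joint / p_joint.sum()    # E-step: q*(z) = p_theta(z | x)
F_star = np.sum(q_star * np.log(p_joint / q_star))
assert np.isclose(F_star, L)        # the bound touches: F(q*, theta) = L(theta)
```

This is exactly requirement 2 from the beginning: with q* equal to the posterior, every ratio p_θ(x, z)/q*(z) collapses to the constant p_θ(x), so Jensen's inequality holds with equality.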
Expectation Maximization
1. We have a lower bound on the likelihood: F(q, θ) ≤ L(θ).
2. We guaranteed that the bound touches: F(q*_{t+1}, θ_t) = L(θ_t).
3. We want to guarantee L(θ_{t+1}) ≥ L(θ_t). Indeed,
   L(θ_{t+1}) ≥ F(q*_{t+1}, θ_{t+1}) = max_θ F(q*_{t+1}, θ) ≥ F(q*_{t+1}, θ_t) = L(θ_t),
   since the maximum over θ is at least the value at θ_t, where the bound is tight.
4. Therefore choose θ_{t+1} as
   θ_{t+1} = arg max_θ F(q*_{t+1}, θ)
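The monotonicity guarantee L(θ_{t+1}) ≥ L(θ_t) can be observed empirically. Below is a sketch that reuses the toy Gaussian-mixture EM from earlier and asserts that the log-likelihood never decreases across iterations; the data and initialization are again hypothetical.

```python
import numpy as np
from scipy.stats import norm

# Empirical check of L(theta_{t+1}) >= L(theta_t) on a toy two-Gaussian mixture.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2, 1, 100), rng.normal(3, 1, 100)])

pi, mu, sigma = 0.5, np.array([-1.0, 1.0]), np.array([1.0, 1.0])
prev_ll = -np.inf
for _ in range(30):
    p0 = pi * norm.pdf(x, mu[0], sigma[0])
    p1 = (1 - pi) * norm.pdf(x, mu[1], sigma[1])
    ll = np.log(p0 + p1).sum()          # L(theta_t)
    assert ll >= prev_ll - 1e-9         # likelihood never decreases
    prev_ll = ll
    r = p0 / (p0 + p1)                  # E-step: posterior responsibilities
    pi = r.mean()                       # M-step: closed-form updates
    mu = np.array([np.average(x, weights=r), np.average(x, weights=1 - r)])
    sigma = np.array([np.sqrt(np.average((x - mu[0]) ** 2, weights=r)),
                      np.sqrt(np.average((x - mu[1]) ** 2, weights=1 - r))])
```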