Approximate Inference
Henrik I. Christensen
Robotics & Intelligent Machines @ GT
Georgia Institute of Technology, Atlanta, GA 30332-0280
hic@cc.gatech.edu
Outline
1 Introduction
2 Variational Inference
3 Variational Mixture of Gaussians
4 Exponential Family
5 Expectation Propagation
6 Summary
Introduction

We are often required to estimate a (conditional) posterior of the form p(Z | X).
The exact solution might be intractable:
1 There might not be a closed-form solution
2 The integration over X or a parameter space \theta might be computationally challenging
3 The set of possible outcomes might be very large or even exponential

Two strategies:
1 Deterministic approximation methods
2 Stochastic sampling (Monte Carlo techniques)

Today we will talk about deterministic techniques.
Variational Inference

In general we have a Bayesian model as seen earlier, i.e.

    \ln p(X) = \ln p(X, Z) - \ln p(Z | X)

We can rewrite this as

    \ln p(X) = \mathcal{L}(q) + \mathrm{KL}(q \| p)

where

    \mathcal{L}(q) = \int q(Z) \ln\left\{ \frac{p(X, Z)}{q(Z)} \right\} dZ
    \mathrm{KL}(q \| p) = -\int q(Z) \ln\left\{ \frac{p(Z | X)}{q(Z)} \right\} dZ

So \mathcal{L}(q) is a lower bound on \ln p(X), and KL is the Kullback-Leibler divergence from q(Z) to p(Z | X).
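A minimal numerical sketch of this decomposition, assuming a toy model with a single discrete latent variable z and one fixed observation x; the names joint and q are illustrative, not part of the lecture material.

    import numpy as np

    rng = np.random.default_rng(0)
    K = 4
    joint = rng.random(K)             # p(x, z) for the observed x, one entry per value of z
    q = rng.random(K); q /= q.sum()   # an arbitrary variational distribution q(z)

    log_px = np.log(joint.sum())                       # ln p(x) = ln sum_z p(x, z)
    posterior = joint / joint.sum()                    # p(z | x)
    L = np.sum(q * (np.log(joint) - np.log(q)))        # L(q) = sum_z q ln{ p(x,z) / q }
    KL = -np.sum(q * (np.log(posterior) - np.log(q)))  # KL(q || p) >= 0

    print(log_px, L + KL)   # the two numbers agree; L(q) alone is a lower bound on ln p(x)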
Factorized Distributions

Assume for now that we can factorize Z into disjoint groups so that

    q(Z) = \prod_{i=1}^{M} q_i(Z_i)

In physics a similar model has been adopted, termed mean field theory.

We can then optimize \mathcal{L}(q) through a component-wise optimization:

    \mathcal{L}(q) = \int \prod_i q_i \left\{ \ln p(X, Z) - \sum_i \ln q_i \right\} dZ
                   = \int q_j \ln \tilde{p}(X, Z_j)\, dZ_j - \int q_j \ln q_j\, dZ_j + \text{const}

where

    \ln \tilde{p}(X, Z_j) = \mathbb{E}_{i \neq j}[\ln p(X, Z)] + \text{const} = \int \ln p(X, Z) \prod_{i \neq j} q_i\, dZ_i + \text{const}
Factorized Distributions

The optimal solution is now

    \ln q_j^*(Z_j) = \mathbb{E}_{i \neq j}[\ln p(X, Z)] + \text{const}

I.e., each factor is optimized in turn with the remaining factors held fixed, maximizing \mathcal{L}(q).
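A small sketch of this coordinate-wise update, assuming the target p(z_1, z_2) is a correlated 2D Gaussian (the classic illustrative case for mean field). The optimal factors are Gaussians whose means are the only coupled quantities; the numbers below are illustrative.

    import numpy as np

    mu = np.array([1.0, -1.0])
    Lambda = np.array([[2.0, 1.2],     # precision matrix of the target Gaussian
                       [1.2, 2.0]])

    m1, m2 = 0.0, 0.0                  # initial means of the factors q(z1), q(z2)
    for _ in range(20):
        # Optimal factors: q*(z1) = N(m1, 1/Lambda[0,0]), q*(z2) = N(m2, 1/Lambda[1,1])
        m1 = mu[0] - Lambda[0, 1] / Lambda[0, 0] * (m2 - mu[1])
        m2 = mu[1] - Lambda[1, 0] / Lambda[1, 1] * (m1 - mu[0])

    print(m1, m2)   # converges to the true means (1, -1); the factorized q underestimates variance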
Variational Mixture of Gaussians

We encounter mixtures of Gaussians all the time.
Examples are multi-wall modelling, ambiguous localization, ...
We have: a set of observed data X, and a set of latent variables Z that describe the mixture.
Mixture of Gaussians - Modelling

We can model the mixture assignments as

    p(Z | \pi) = \prod_{n=1}^{N} \prod_{k=1}^{K} \pi_k^{z_{nk}}

We can also derive the observed conditional

    p(X | Z, \mu, \Lambda) = \prod_{n=1}^{N} \prod_{k=1}^{K} \mathcal{N}(x_n | \mu_k, \Lambda_k^{-1})^{z_{nk}}

We will for now assume that the mixing coefficients are modelled by a Dirichlet prior

    p(\pi) = \mathrm{Dir}(\pi | \alpha_0) = C(\alpha_0) \prod_{k=1}^{K} \pi_k^{\alpha_0 - 1}
Mixture of Gaussians - Modelling

The component parameters can be modelled with a Gaussian-Wishart prior

    p(\mu, \Lambda) = p(\mu | \Lambda)\, p(\Lambda) = \prod_{k=1}^{K} \mathcal{N}(\mu_k | m_0, (\beta_0 \Lambda_k)^{-1})\, \mathcal{W}(\Lambda_k | W_0, \nu_0)

I.e., a total model of

[Graphical model: plate over n = 1, ..., N with latent z_n and observed x_n, governed by \pi, \mu and \Lambda]
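A hedged sketch of ancestral sampling from this generative model; all hyperparameter values (alpha0, beta0, nu0, m0, W0) are illustrative choices, not values from the lecture.

    import numpy as np
    from scipy.stats import wishart, dirichlet

    rng = np.random.default_rng(1)
    K, D, N = 3, 2, 500
    alpha0, beta0, nu0 = 1.0, 1.0, D + 2.0
    m0, W0 = np.zeros(D), np.eye(D)

    pi = dirichlet.rvs([alpha0] * K, random_state=0)[0]            # mixture weights ~ Dir(alpha0)
    Lam = [wishart.rvs(df=nu0, scale=W0, random_state=k) for k in range(K)]
    mu = [rng.multivariate_normal(m0, np.linalg.inv(beta0 * L)) for L in Lam]

    z = rng.choice(K, size=N, p=pi)                                # assignments z_n drawn from pi
    X = np.array([rng.multivariate_normal(mu[k], np.linalg.inv(Lam[k])) for k in z])
    print(X.shape)   # (500, 2): data drawn from the full Bayesian mixture model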
Mixtures of Gaussians - Variational

The full joint model can be seen as

    p(X, Z, \pi, \mu, \Lambda) = p(X | Z, \mu, \Lambda)\, p(Z | \pi)\, p(\pi)\, p(\mu | \Lambda)\, p(\Lambda)

Only X is observed.

We can now consider the selection of a distribution

    q(Z, \pi, \mu, \Lambda) = q(Z)\, q(\pi, \mu, \Lambda)

This is clearly an assumption of independence.

We can use the general result of component-wise optimization

    \ln q^*(Z) = \mathbb{E}_{\pi, \mu, \Lambda}[\ln p(X, Z, \pi, \mu, \Lambda)] + \text{const}

Decomposition gives us

    \ln q^*(Z) = \mathbb{E}_{\pi}[\ln p(Z | \pi)] + \mathbb{E}_{\mu, \Lambda}[\ln p(X | Z, \mu, \Lambda)] + \text{const}
               = \sum_{n=1}^{N} \sum_{k=1}^{K} z_{nk} \ln \rho_{nk} + \text{const}
Mixtures of Gaussians - Variational

We can further obtain

    \ln \rho_{nk} = \mathbb{E}[\ln \pi_k] + \frac{1}{2}\mathbb{E}[\ln |\Lambda_k|] - \frac{D}{2}\ln 2\pi - \frac{1}{2}\mathbb{E}_{\mu_k, \Lambda_k}[(x_n - \mu_k)^T \Lambda_k (x_n - \mu_k)] + \text{const}

Taking the exponential we have

    q^*(Z) \propto \prod_{n=1}^{N} \prod_{k=1}^{K} \rho_{nk}^{z_{nk}}

Using normalization we arrive at

    q^*(Z) = \prod_{n=1}^{N} \prod_{k=1}^{K} r_{nk}^{z_{nk}}

where

    r_{nk} = \frac{\rho_{nk}}{\sum_j \rho_{nj}}
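A minimal sketch of the normalization r_nk = rho_nk / sum_j rho_nj, done in log space for numerical stability; log_rho is assumed to hold ln(rho_nk) for N points and K components.

    import numpy as np
    from scipy.special import logsumexp

    def responsibilities(log_rho):
        """Normalize unnormalized log responsibilities row-wise."""
        return np.exp(log_rho - logsumexp(log_rho, axis=1, keepdims=True))

    log_rho = np.log(np.array([[0.2, 0.5, 0.3],
                               [0.1, 0.1, 0.8]]))
    print(responsibilities(log_rho))   # each row sums to one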
Mixtures of Gaussians - Variational

Just as we saw for EM we can define

    N_k = \sum_{n=1}^{N} r_{nk}
    \bar{x}_k = \frac{1}{N_k} \sum_{n=1}^{N} r_{nk} x_n
    S_k = \frac{1}{N_k} \sum_{n=1}^{N} r_{nk} (x_n - \bar{x}_k)(x_n - \bar{x}_k)^T
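A short sketch of these weighted statistics; r is an N x K responsibility matrix and X an N x D data matrix, both assumed given (the demo data below are random placeholders).

    import numpy as np

    def mixture_statistics(X, r):
        Nk = r.sum(axis=0)                                # effective counts N_k
        xbar = (r.T @ X) / Nk[:, None]                    # weighted means xbar_k
        S = []
        for k in range(r.shape[1]):
            d = X - xbar[k]
            S.append((r[:, k, None] * d).T @ d / Nk[k])   # weighted covariances S_k
        return Nk, xbar, np.array(S)

    rng = np.random.default_rng(0)
    X = rng.normal(size=(6, 2)); r = rng.dirichlet(np.ones(3), size=6)
    Nk, xbar, S = mixture_statistics(X, r)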
Mixtures of Gaussians - Parameters/Mixture

Let's now consider q(\pi, \mu, \Lambda) to arrive at

    \ln q^*(\pi, \mu, \Lambda) = \ln p(\pi) + \sum_{k=1}^{K} \ln p(\mu_k, \Lambda_k) + \mathbb{E}_Z[\ln p(Z | \pi)] + \sum_{k=1}^{K} \sum_{n=1}^{N} \mathbb{E}[z_{nk}] \ln \mathcal{N}(x_n | \mu_k, \Lambda_k^{-1}) + \text{const}

We can partition the problem into

    q(\pi, \mu, \Lambda) = q(\pi) \prod_{k=1}^{K} q(\mu_k, \Lambda_k)

We can derive

    \ln q^*(\pi) = \sum_{k=1}^{K} (\alpha_0 - 1) \ln \pi_k + \sum_{k=1}^{K} \sum_{n=1}^{N} r_{nk} \ln \pi_k + \text{const}

We can now derive

    q^*(\pi) = \mathrm{Dir}(\pi | \alpha) \quad \text{where} \quad \alpha_k = \alpha_0 + N_k
Mixtures of Gaussians - Parameters/Mixture

We can then derive

    q^*(\mu_k, \Lambda_k) = \mathcal{N}(\mu_k | m_k, (\beta_k \Lambda_k)^{-1})\, \mathcal{W}(\Lambda_k | W_k, \nu_k)

where

    \beta_k = \beta_0 + N_k
    m_k = \frac{1}{\beta_k}(\beta_0 m_0 + N_k \bar{x}_k)
    W_k^{-1} = W_0^{-1} + N_k S_k + \frac{\beta_0 N_k}{\beta_0 + N_k} (\bar{x}_k - m_0)(\bar{x}_k - m_0)^T
    \nu_k = \nu_0 + N_k + 1
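A hedged sketch of these parameter updates, together with the Dirichlet update from the previous slide, assuming Nk, xbar and S come from the statistics above and the prior hyperparameters (alpha0, beta0, m0, W0, nu0) are given; the function name is illustrative.

    import numpy as np

    def update_parameters(Nk, xbar, S, alpha0, beta0, m0, W0, nu0):
        alpha = alpha0 + Nk                                 # Dirichlet: alpha_k = alpha_0 + N_k
        beta = beta0 + Nk
        m = (beta0 * m0 + Nk[:, None] * xbar) / beta[:, None]
        W = []
        for k in range(len(Nk)):
            d = (xbar[k] - m0)[:, None]
            Winv = (np.linalg.inv(W0) + Nk[k] * S[k]
                    + (beta0 * Nk[k] / (beta0 + Nk[k])) * (d @ d.T))
            W.append(np.linalg.inv(Winv))
        nu = nu0 + Nk + 1.0                                 # as on the slide
        return alpha, beta, m, np.array(W), nu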
Mixtures of Gaussians - Parameters

We can now arrive at the expectations

    \mathbb{E}_{\mu_k, \Lambda_k}[(x_n - \mu_k)^T \Lambda_k (x_n - \mu_k)] = D \beta_k^{-1} + \nu_k (x_n - m_k)^T W_k (x_n - m_k)

    \ln \tilde{\Lambda}_k = \mathbb{E}[\ln |\Lambda_k|] = \sum_{i=1}^{D} \psi\left(\frac{\nu_k + 1 - i}{2}\right) + D \ln 2 + \ln |W_k|

    \ln \tilde{\pi}_k = \mathbb{E}[\ln \pi_k] = \psi(\alpha_k) - \psi(\hat{\alpha})

where \hat{\alpha} = \sum_k \alpha_k and \psi(\cdot) = \frac{d}{da} \ln \Gamma(a) is the digamma function.
The last two results follow from standard properties of the Wishart and Dirichlet distributions.
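A small sketch of these expectations using SciPy's digamma (psi) function; alpha, nu and W are the variational parameters from the updates above, and the function name is illustrative.

    import numpy as np
    from scipy.special import digamma

    def expectations(alpha, nu, W, D):
        ln_pi_tilde = digamma(alpha) - digamma(alpha.sum())        # E[ln pi_k]
        ln_Lambda_tilde = np.array([
            digamma(0.5 * (nu[k] + 1 - np.arange(1, D + 1))).sum()
            + D * np.log(2.0) + np.log(np.linalg.det(W[k]))
            for k in range(len(nu))
        ])                                                          # E[ln |Lambda_k|]
        return ln_pi_tilde, ln_Lambda_tilde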
Mixtures of Gaussians - Parameters

We can finally find the responsibilities

    r_{nk} \propto \tilde{\pi}_k \tilde{\Lambda}_k^{1/2} \exp\left\{ -\frac{1}{2} \mathbb{E}_{\mu_k, \Lambda_k}[(x_n - \mu_k)^T \Lambda_k (x_n - \mu_k)] \right\}

The optimization is stepwise:
1 Estimate \mu, \Lambda and then r_{nk}
2 Estimate \pi and Z
3 Check for convergence - return to 1 if not converged
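A sketch of the responsibility update, combining the expectations from the previous slide with a log-space normalization; all inputs (X, ln_pi_tilde, ln_Lambda_tilde, m, W, beta, nu) are assumed to come from the updates above, and the function name is illustrative.

    import numpy as np
    from scipy.special import logsumexp

    def e_step(X, ln_pi_tilde, ln_Lambda_tilde, m, W, beta, nu):
        N, D = X.shape
        K = len(ln_pi_tilde)
        log_rho = np.empty((N, K))
        for k in range(K):
            d = X - m[k]
            # E[(x_n - mu_k)^T Lambda_k (x_n - mu_k)] = D/beta_k + nu_k (x_n - m_k)^T W_k (x_n - m_k)
            quad = D / beta[k] + nu[k] * np.einsum('nd,de,ne->n', d, W[k], d)
            log_rho[:, k] = (ln_pi_tilde[k] + 0.5 * ln_Lambda_tilde[k]
                             - 0.5 * D * np.log(2 * np.pi) - 0.5 * quad)
        return np.exp(log_rho - logsumexp(log_rho, axis=1, keepdims=True))   # rows sum to one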
Mixture of Gaussians - Example

[Figure: variational mixture of Gaussians fit shown after 0, 15, 60 and 120 iterations]