Expectation Propagation
CSci 8980: Advanced Topics in Graphical Models
Instructor: Arindam Banerjee
October 26, 2007

Outline: Posterior Estimation · Assumed Density Filtering · Expectation Propagation · Experiments

Posterior Estimation

- Consider a Bayesian model
  - Latent variable $u$ with prior $P^{(0)}(u)$
  - Observable data $D$, such as $\{x_1, \ldots, x_m\}$
- Quantities of interest
  - Posterior over the latent variable, $P(u \mid D)$
  - Likelihood of the observations, $P(D)$
- For conjugate priors, the posterior is in the same family as the prior
- In general, it can be intractable
- What is the best approximation in the (prior) family?

Posterior Estimation (Contd.)

- The likelihood function often factorizes:
  $$P(D \mid u) = \prod_{i=1}^{n} t_i(u)$$
- The true posterior may be intractable:
  $$P(u \mid D) \propto P^{(0)}(u) \prod_{i=1}^{n} t_i(u)$$
- The normalizer $Z$ is the same as the data likelihood, i.e.,
  $$Z = \int_u P^{(0)}(u) \prod_{i=1}^{n} t_i(u) \, du = \int_u P(D \mid u)\, P^{(0)}(u)\, du = P(D)$$
- The two problems are closely related
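
As a quick numerical illustration of the identity $Z = P(D)$, the sketch below uses a toy model of my own choosing (standard normal prior, unit-variance Gaussian factors; none of the names or values come from the slides) and checks the identity by evaluating the unnormalized posterior on a grid:

```python
import numpy as np
from scipy import stats

# Minimal sketch (hypothetical toy setup, not from the slides):
# prior P0(u) = N(u; 0, 1), per-observation factors t_i(u) = N(x_i; u, 1).
rng = np.random.default_rng(0)
x = rng.normal(loc=1.5, scale=1.0, size=5)              # observations D = {x_1, ..., x_m}

u = np.linspace(-10.0, 10.0, 20001)                     # grid over the latent variable u
du = u[1] - u[0]
prior = stats.norm.pdf(u, loc=0.0, scale=1.0)           # P0(u)
factors = stats.norm.pdf(x[:, None], loc=u, scale=1.0)  # t_i(u), one row per observation
unnorm_post = prior * factors.prod(axis=0)              # P0(u) * prod_i t_i(u)

# The normalizer of the unnormalized posterior equals the data likelihood P(D).
Z = unnorm_post.sum() * du
posterior = unnorm_post / Z
print("P(D) via grid quadrature:", Z)
```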

Approximating the Posterior

- Assume the prior $P^{(0)}(u)$ belongs to an exponential family $\mathcal{F}$:
  $$P^{(0)}(u) = \exp\!\big( \langle \theta_0, s(u) \rangle - \psi(\theta_0) \big)$$
- Let $Q(u) \in \mathcal{F}$ be the best approximation to $P(u \mid D)$
- Goal: tractably compute $Q(u)$ when $P(u \mid D)$ is hard to compute
- Approach 1: Assumed density filtering (online Bayesian learning)
- Approach 2: Expectation propagation
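
For concreteness, a standard worked example (an illustration, not taken from the slides) is the univariate Gaussian $N(u; m, v)$ written as a member of $\mathcal{F}$, with sufficient statistics $s(u) = (u, u^2)$ and natural parameter $\theta = (\theta_1, \theta_2)$:
$$N(u; m, v) = \exp\!\Big( \theta_1 u + \theta_2 u^2 - \psi(\theta) \Big), \qquad \theta_1 = \frac{m}{v}, \quad \theta_2 = -\frac{1}{2v},$$
$$\psi(\theta) = \frac{m^2}{2v} + \frac{1}{2}\log(2\pi v) = -\frac{\theta_1^2}{4\theta_2} + \frac{1}{2}\log\!\Big(\frac{\pi}{-\theta_2}\Big).$$
The corresponding mean parameters are $\mathbb{E}[s(u)] = (m,\, v + m^2)$, so matching them fixes the mean and variance; this is the family used in the sketches below.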

Assumed Density Filtering

- Start with an initial guess $Q(u) = P^{(0)}(u)$
- Recall that
  $$P(u \mid D) \propto P^{(0)}(u) \prod_{i=1}^{n} t_i(u)$$
- At each step, update $Q$ to incorporate one factor $t_i(u)$
- Compute the true Bayesian update
  $$\hat{P}(u) = \frac{t_i(u)\, Q(u)}{\int_z t_i(z)\, Q(z)\, dz}$$
- Find $Q^{\text{new}} \in \mathcal{F}$ such that
  $$Q^{\text{new}}(u) = \operatorname*{argmin}_{\tilde{Q} \in \mathcal{F}} \; \mathrm{KL}\big(\hat{P}(u) \,\|\, \tilde{Q}(u)\big)$$
- This is a maximum likelihood estimate with $\hat{P}$ playing the role of the true distribution
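
Below is a minimal sketch of one such update, assuming a one-dimensional latent variable, a Gaussian approximating family, and grid quadrature for the integrals; the particular factor and all names are illustrative choices, not from the slides. Within the Gaussian family, minimizing $\mathrm{KL}(\hat{P} \,\|\, \tilde{Q})$ is achieved by matching the mean and variance of $\hat{P}$.

```python
import numpy as np
from scipy import stats

def adf_update(q_mean, q_var, t_i, u_grid):
    """One ADF step: fold the factor t_i into the Gaussian approximation Q.

    Forms the tilted distribution P_hat(u) ∝ t_i(u) * Q(u) on a grid, then
    projects it back onto the Gaussian family by matching mean and variance,
    which is the KL(P_hat || Q_tilde) minimizer within that family.
    """
    du = u_grid[1] - u_grid[0]
    q = stats.norm.pdf(u_grid, loc=q_mean, scale=np.sqrt(q_var))
    tilted = t_i(u_grid) * q
    Z_i = tilted.sum() * du                                   # normalizer of the tilted distribution
    p_hat = tilted / Z_i
    new_mean = (u_grid * p_hat).sum() * du                    # E_Phat[u]
    new_var = ((u_grid - new_mean) ** 2 * p_hat).sum() * du   # Var_Phat[u]
    return new_mean, new_var, Z_i

# Illustrative use: broad Gaussian prior N(0, 100) and one clutter-like
# mixture factor (a hypothetical choice; any non-conjugate t_i would do).
u_grid = np.linspace(-30.0, 30.0, 6001)
t1 = lambda u: (0.5 * stats.norm.pdf(2.0, loc=u, scale=1.0)
                + 0.5 * stats.norm.pdf(2.0, loc=0.0, scale=10.0))
mean, var, Z1 = adf_update(0.0, 100.0, t1, u_grid)
print("Q_new mean/var:", mean, var, " factor normalizer:", Z1)
```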

Assumed Density Filtering (Contd.)

- To obtain $Q^{\text{new}}$ it is sufficient to do moment matching:
  $$\mu^{\text{new}} = \mathbb{E}_{\hat{P}}[s(u)]$$
- For each factor $t_i(u)$:
  - Compute the means (moments) of $\hat{P}(u) \propto t_i(u)\, Q(u)$
  - Pick $Q^{\text{new}} \in \mathcal{F}$ with these mean parameters
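
In the Gaussian example above (again an illustration rather than material from the slides), matching $\mathbb{E}_{\hat{P}}[s(u)]$ with $s(u) = (u, u^2)$ amounts to setting the new mean and variance to those of the tilted distribution $\hat{P}$:
$$m^{\text{new}} = \mathbb{E}_{\hat{P}}[u], \qquad v^{\text{new}} = \mathbb{E}_{\hat{P}}[u^2] - \big(\mathbb{E}_{\hat{P}}[u]\big)^2,$$
$$Q^{\text{new}}(u) = N\big(u;\, m^{\text{new}},\, v^{\text{new}}\big).$$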

ADF: An Alternative Viewpoint

- For a single factor $t_i(u)$
  - The true posterior is $\hat{P}(u) \propto t_i(u)\, Q(u)$