Approximate Inference: Mean Field Methods


  1. School of Computer Science. Approximate Inference: Mean Field Methods. Probabilistic Graphical Models (10-708), Lecture 17, Nov 12, 2007. Eric Xing. Reading: KF Chap. 12. [Figure: the running cell-signaling network example, with nodes Receptor A (X1), Receptor B (X2), Kinase C (X3), Kinase D (X4), Kinase E (X5), TF F (X6), Gene G (X7), Gene H (X8).] Questions? Recap: Kalman filters; complex models; LBP as Bethe free-energy minimization.

  2. Approximate Inference. Variational methods: for a distribution $p(X \mid \theta)$ associated with a complex graph, computing the marginal (or conditional) probability of arbitrary random variable(s) is intractable. Variational methods formulate probabilistic inference as an optimization problem: $f^* = \arg\max_{f \in S} F(f)$, where $f$ is, e.g., a (tractable) probability distribution, or the solution to certain probabilistic queries.

  3. Exponential Family. Exponential representation of graphical models: $p(X \mid \theta) = \frac{1}{Z}\prod_{c \in C}\psi_c(X_c) = \exp\{\sum_\alpha \theta_\alpha \phi_\alpha(X_{D_\alpha}) - A(\theta)\}$. This includes discrete models, Gaussian, Poisson, exponential, and many others. $E(X) = -\sum_\alpha \theta_\alpha \phi_\alpha(X_{D_\alpha})$ is referred to as the energy of the state, so $p(X \mid \theta) = \exp\{-E(X) - A(\theta)\} = \exp\{-E(x_H, x_E) - A(\theta)\}$, where $x_H$ and $x_E$ denote the hidden and observed variables. Example: the Boltzmann distribution on an atomic lattice, $p(X) = \frac{1}{Z}\exp\{\sum_{i<j}\theta_{ij}X_iX_j + \sum_i \theta_{i0}X_i\}$.
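To make the energy form concrete, here is a minimal sketch (my own illustration, not from the lecture) that evaluates $E(x)$ and the log-partition $A(\theta)$ by brute-force enumeration of a Boltzmann distribution on a tiny 2x2 spin lattice; the coupling values and edge list are arbitrary choices that only work at this toy scale.

```python
import itertools
import numpy as np

theta_pair = 0.5                            # shared coupling theta_ij (illustrative)
theta_node = 0.1                            # shared bias theta_i0 (illustrative)
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]    # 2x2 grid, nodes 0..3

def energy(x):
    # E(x) = -(sum_{i<j} theta_ij x_i x_j + sum_i theta_i0 x_i)
    pair = sum(theta_pair * x[i] * x[j] for i, j in edges)
    node = sum(theta_node * xi for xi in x)
    return -(pair + node)

# exact log-partition A(theta) = log sum_x exp{-E(x)}, feasible only for tiny models
states = list(itertools.product([-1, +1], repeat=4))
A = np.log(sum(np.exp(-energy(x)) for x in states))

def prob(x):
    return np.exp(-energy(x) - A)           # p(x) = exp{-E(x) - A(theta)}

print(A, sum(prob(x) for x in states))      # probabilities sum to 1
```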

  4. Lower bounds of exponential functions. [Figure: $\exp(x)$ plotted against its polynomial lower bounds around $\mu$.] First-order (tangent-line) bound: $\exp(x) \ge \exp(\mu)\,(1 + (x - \mu))$. Third-order bound: $\exp(x) \ge \exp(\mu)\,\big(1 + (x-\mu) + \tfrac{1}{2}(x-\mu)^2 + \tfrac{1}{6}(x-\mu)^3\big)$. Lower bounding likelihood: representing $q(X_H)$ by $\exp\{-E'(X_H)\}$, the first-order bound gives the Lemma: every marginal distribution $q(X_H)$ defines a lower bound of the likelihood, $p(x_E) \ge \int d x_H \, \exp\{-E'(x_H)\}\,\big(1 - A(\theta) - E(x_E, x_H) + E'(x_H)\big)$, where $x_E$ denotes the observed variables (evidence). This is upgradeable to higher-order bounds [Leisink and Kappen, 2000].
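A quick numerical check of both polynomial bounds (my own sketch; the grid and the expansion point $\mu$ are arbitrary). Odd-order Taylor truncations of $\exp$ are global lower bounds because the Lagrange remainder is a nonnegative even power.

```python
import numpy as np

mu = 0.7
x = np.linspace(-3.0, 3.0, 601)

first = np.exp(mu) * (1.0 + (x - mu))
third = np.exp(mu) * (1.0 + (x - mu) + (x - mu)**2 / 2 + (x - mu)**3 / 6)

# both truncations sit below exp(x) everywhere (up to float tolerance)
assert np.all(np.exp(x) >= first - 1e-12)
assert np.all(np.exp(x) >= third - 1e-12)
print(np.max(first - np.exp(x)), np.max(third - np.exp(x)))  # both <= ~0
```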

  5. Lower bounding likelihood (cont.). Representing $q(X_H)$ by $\exp\{-E'(X_H)\}$: Lemma: every marginal distribution $q(X_H)$ defines a lower bound of the likelihood, $\log p(x_E) \ge C - \int d x_H \, q(x_H)\, E(x_E, x_H) - \int d x_H \, q(x_H) \log q(x_H) = C - \langle E \rangle_q + H_q$, where $x_E$ denotes the observed variables (evidence), $\langle E \rangle_q$ is the expected energy, $H_q$ is the entropy of $q$, and $\langle E \rangle_q - H_q$ is the Gibbs free energy. KL and variational (Gibbs) free energy. Kullback-Leibler divergence: $\mathrm{KL}(q \,\|\, p) \equiv \sum_z q(z) \ln \frac{q(z)}{p(z)}$. With "Boltzmann's law" (the definition of energy), $p(z) = \frac{1}{C}\exp\{-E(z)\}$, this becomes $\mathrm{KL}(q \,\|\, p) = \sum_z q(z) E(z) + \sum_z q(z) \ln q(z) + \ln C = G(q) + \ln C$, where $G(q)$ is the Gibbs free energy; it is minimized when $q(Z) = p(Z)$.
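The identity above is easy to verify numerically. The following sketch (mine, with an arbitrary 5-state energy function) checks that $\mathrm{KL}(q \| p) = G(q) + \ln C$ and that $q = p$ minimizes $G$:

```python
import numpy as np

rng = np.random.default_rng(0)
E = rng.normal(size=5)                 # arbitrary energies E(z), z = 0..4
C = np.exp(-E).sum()                   # normalizer: p(z) = exp{-E(z)} / C
p = np.exp(-E) / C

def G(q):
    return q @ E + q @ np.log(q)       # Gibbs free energy <E>_q - H_q

q = rng.dirichlet(np.ones(5))          # an arbitrary distribution q(z)
kl = q @ np.log(q / p)
assert np.isclose(kl, G(q) + np.log(C))   # KL(q||p) = G(q) + ln C
assert G(p) <= G(q)                       # G is minimized at q = p
print(kl, G(p) + np.log(C))               # the latter is ~0
```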

  6. KL and log-likelihood. Jensen's inequality: $\ell(\theta; x) = \log p(x \mid \theta) = \log \sum_z p(x, z \mid \theta) = \log \sum_z q(z \mid x) \frac{p(x, z \mid \theta)}{q(z \mid x)} \ge \sum_z q(z \mid x) \log \frac{p(x, z \mid \theta)}{q(z \mid x)} = \langle \log p(x, z \mid \theta) \rangle_q + H(q) = \mathcal{L}(q)$. KL and the lower bound of the likelihood: expanding $\ell(\theta; x) = \sum_z q(z) \log \frac{p(x, z \mid \theta)}{q(z)} + \sum_z q(z) \log \frac{q(z)}{p(z \mid x, \theta)}$ gives $\ell(\theta; x) = \mathcal{L}(q) + \mathrm{KL}(q \,\|\, p(z \mid x, \theta))$. Setting $q(\cdot) = p(z \mid x)$ closes the gap (cf. EM). A variational representation of probability distributions: $q^* = \arg\max_{q \in Q} \{-\langle E \rangle_q + H_q\} = \arg\min_{q \in Q} \{\langle E \rangle_q - H_q\}$, where $Q$ is the set of realizable distributions, e.g., all valid parameterizations of exponential-family distributions, or marginal polytopes [Wainwright et al. 2003]. Difficulty: $H_q$ is intractable for general $q$. "Solution": approximate $H_q$, and/or relax or tighten $Q$.
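A small sanity check of the decomposition $\log p(x) = \mathcal{L}(q) + \mathrm{KL}(q \| p(z \mid x))$ on a toy discrete latent variable (my own sketch; the joint values are random placeholders):

```python
import numpy as np

rng = np.random.default_rng(1)
p_xz = rng.random(4)                 # joint p(x, z) at the observed x, z = 0..3
log_px = np.log(p_xz.sum())          # log evidence log p(x)
post = p_xz / p_xz.sum()             # posterior p(z | x)

q = rng.dirichlet(np.ones(4))        # any variational q(z)
elbo = q @ np.log(p_xz / q)          # L(q) = <log p(x,z)>_q + H(q)
kl = q @ np.log(q / post)            # KL(q || p(z|x))
assert np.isclose(log_px, elbo + kl)

# setting q = p(z|x) closes the gap: KL = 0 and L(q) = log p(x)  (cf. EM)
assert np.isclose(post @ np.log(p_xz / post), log_px)
```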

  7. Bethe free energy / LBP. We do not optimize $q(X)$ explicitly, but focus on the set of beliefs, e.g., $b = \{b_{ij}(x_i, x_j) = \tau_{ij},\ b_i(x_i) = \tau_i\}$. Relax the optimization problem: approximate the objective, $H_q \approx H_b$, giving the Bethe free energy $F_{\text{Bethe}}(b)$; relax the feasible set from the marginal polytope $M$ to the outer bound $M_o \supseteq M$ of locally consistent beliefs, $M_o = \{\tau \ge 0 \mid \sum_{x_i} \tau_i(x_i) = 1,\ \sum_{x_i} \tau_{ij}(x_i, x_j) = \tau_j(x_j)\}$. Then $b^* = \arg\min_{b \in M_o} \{\langle E \rangle_b - H_b\} = \arg\min_{b \in M_o} F_{\text{Bethe}}(b)$. The loopy BP algorithm is a fixed-point iteration procedure that tries to solve for $b^*$ (a minimal message-passing sketch follows this item). Mean field methods: optimize $q(X_H)$ in a space of tractable families, i.e., subgraphs of $G_p$ over which exact computation of $H_q$ is feasible. This tightens the optimization space: the objective $H_q$ stays exact, and the feasible set is tightened to $T \subseteq Q$: $q^* = \arg\min_{q \in T} (\langle E \rangle_q - H_q)$.
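Below is a minimal sum-product loopy BP sketch on a binary 4-cycle MRF (my own illustration; the random potentials and the undamped sequential schedule are arbitrary simplifications). Its fixed points correspond to stationary points of the Bethe free energy.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]              # a single 4-cycle
psi_node = np.exp(rng.normal(size=(n, 2)))            # node potentials
psi_edge = {e: np.exp(rng.normal(size=(2, 2))) for e in edges}
nbrs = {i: [j for a, b in edges for j in (a, b)
            if i in (a, b) and j != i] for i in range(n)}

# directed messages m[(i, j)](x_j), initialized uniform
m = {(i, j): np.ones(2) / 2 for a, b in edges for i, j in ((a, b), (b, a))}

for _ in range(100):
    for (i, j) in list(m):
        psi = psi_edge[(i, j)] if (i, j) in psi_edge else psi_edge[(j, i)].T
        inc = [m[(k, i)] for k in nbrs[i] if k != j]
        prod = psi_node[i] * (np.prod(inc, axis=0) if inc else 1.0)
        # m_{i->j}(x_j) = sum_{x_i} psi_i(x_i) psi_ij(x_i, x_j) prod_k m_{k->i}(x_i)
        new = prod @ psi
        m[(i, j)] = new / new.sum()

# beliefs b_i(x_i): approximate marginals at a Bethe stationary point
for i in range(n):
    b = psi_node[i] * np.prod([m[(k, i)] for k in nbrs[i]], axis=0)
    print(i, b / b.sum())
```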

  8. Mean Field Approximation. Cluster-based approximation to the Gibbs free energy (Wiegerinck 2001; Xing et al. 2003, 2004). [Figure: the exact Gibbs free energy $G[p(X)]$ over the full model is intractable; it is approximated by cluster-wise free energies $G[\{q_c(X_c)\}]$ over tractable clusters.]

  9. Mean field approximation to the Gibbs free energy. Given a disjoint clustering $\{C_1, \ldots, C_I\}$ of all variables, let $q(\mathbf{X}) = \prod_i q_i(\mathbf{X}_{C_i})$. The mean-field free energy is $G_{\text{MF}} = \sum_{\mathbf{x}} \big(\prod_i q_i(\mathbf{x}_{C_i})\big) E(\mathbf{x}) + \sum_i \sum_{\mathbf{x}_{C_i}} q_i(\mathbf{x}_{C_i}) \ln q_i(\mathbf{x}_{C_i})$; e.g., for naive mean field, $G_{\text{MF}} = \sum_{i<j} \sum_{x_i, x_j} q_i(x_i)\, q_j(x_j)\, \phi_{ij}(x_i, x_j) + \sum_i \sum_{x_i} q_i(x_i)\, \phi_i(x_i) + \sum_i \sum_{x_i} q_i(x_i) \ln q_i(x_i)$. This will never equal the exact Gibbs free energy no matter what clustering is used, but it does always define a lower bound of the likelihood (a numerical check of this bound follows this item). Optimize each $q_i(\mathbf{x}_{C_i})$ by variational calculus, and do inference in each $q_i(\mathbf{x}_{C_i})$ using any tractable algorithm. The Generalized Mean Field theorem. Theorem: the optimum GMF approximation to a cluster marginal is isomorphic to the cluster posterior of the original distribution given internal evidence and its generalized mean fields: $q_i^*(\mathbf{X}_{H,C_i}) = p(\mathbf{X}_{H,C_i} \mid \mathbf{x}_{E,C_i}, \langle \mathbf{X}_{H,\mathrm{MB}_i} \rangle_{q_{j \neq i}})$. GMF algorithm: iterate this update over each $q_i$.
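As a check of the lower-bound claim, this sketch (my own; random couplings, a naive fully factorized $q$ over binary variables) verifies that $-G_{\text{MF}}(q) \le \log Z$ on a 5-node Boltzmann machine small enough to enumerate:

```python
import itertools
import numpy as np

rng = np.random.default_rng(4)
n = 5
W = np.triu(rng.normal(size=(n, n)), k=1)    # couplings theta_ij for i < j
b = rng.normal(size=n)                       # biases theta_i0
states = np.array(list(itertools.product([0, 1], repeat=n)))
logZ = np.log(np.exp(((states @ W) * states).sum(1) + states @ b).sum())

def G_MF(mu):
    # mu[i] = q_i(X_i = 1); expected energy minus entropy of prod_i q_i
    expected_E = -(mu @ W @ mu + b @ mu)
    H = -(mu * np.log(mu) + (1 - mu) * np.log(1 - mu)).sum()
    return expected_E - H

for _ in range(10):
    mu = rng.uniform(0.05, 0.95, size=n)
    assert -G_MF(mu) <= logZ                 # -G_MF lower-bounds log Z
print(logZ, -G_MF(np.full(n, 0.5)))
```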

  10. A generalized mean field algorithm [Xing et al., UAI 2003]. [Two slides of figures illustrating the GMF update schedule and example clusterings; a runnable sketch of the block updates appears after item 12.]

  11. Convergence theorem. Theorem: the GMF algorithm is guaranteed to converge to a local optimum, and provides a lower bound for the likelihood of evidence (or partition function) of the model. The naive mean field approximation: approximate $p(X)$ by a fully factorized $q(X) = \prod_i q_i(X_i)$. For a Boltzmann distribution $p(X) = \exp\{\sum_{i<j} \theta_{ij} X_i X_j + \sum_i \theta_{i0} X_i\}/Z$, the mean field equation matches the Gibbs predictive distribution: $q_i(X_i) = p(X_i \mid \{\langle X_j \rangle_{q_j} : j \in N_i\}) \propto \exp\{\theta_{i0} X_i + X_i \sum_{j \in N_i} \theta_{ij} \langle X_j \rangle_{q_j}\}$. Here $\langle X_j \rangle_{q_j}$ resembles a "message" sent from node $j$ to $i$, and $\{\langle X_j \rangle_{q_j} : j \in N_i\}$ forms the "mean field" applied to $X_i$ from its neighborhood $N_i$ (a runnable sketch follows).
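A minimal coordinate-update sketch of this equation for spins $X_i \in \{-1,+1\}$ (my own illustration; the random couplings and the sequential sweep schedule are arbitrary choices). For $\pm 1$ variables the update reduces to $\langle X_i \rangle = \tanh(\theta_{i0} + \sum_{j \in N_i} \theta_{ij} \langle X_j \rangle_{q_j})$:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 8
theta = np.triu(rng.normal(scale=0.3, size=(n, n)), k=1)
theta = theta + theta.T                      # symmetric couplings, zero diagonal
theta0 = rng.normal(scale=0.1, size=n)

mu = np.zeros(n)                             # mean fields <X_i>, X_i in {-1,+1}
for sweep in range(200):
    for i in range(n):
        # <X_i> = tanh(theta_i0 + sum_{j in N_i} theta_ij <X_j>)
        mu[i] = np.tanh(theta0[i] + theta[i] @ mu)

print(mu)                                    # q_i(X_i = +1) = (1 + mu[i]) / 2
```

Each `mu[j]` plays the role of the "message" from node $j$ described above.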

  12. Generalized MF approximation to Ising models. Cluster marginal of a square block $C_k$: $q(\mathbf{X}_{C_k}) \propto \exp\{\sum_{i,j \in C_k} \theta_{ij} X_i X_j + \sum_{i \in C_k} \theta_{i0} X_i + \sum_{i \in C_k,\ j \in \mathrm{MB}_{C_k}} \theta_{ij} X_i \langle X_j \rangle_{q(\mathbf{X}_{C_{k'}})}\}$, where $k'$ ranges over the blocks in the Markov blanket $\mathrm{MB}_{C_k}$. This is virtually a reparameterized Ising model of small size (sketched below). [Figure: marginals from GMF with 4x4 blocks, GMF with 2x2 blocks, and BP, under attractive (positively weighted) and repulsive (negatively weighted) couplings.]
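A sketch of this block update on a 4x4 Ising grid with 2x2 blocks (my own illustration, assuming random couplings). Within each block the marginal is computed exactly by enumerating its 16 joint states; couplings crossing the block boundary are replaced by the current mean fields of the neighboring blocks, exactly as in the cluster marginal above.

```python
import itertools
import numpy as np

rng = np.random.default_rng(6)
L = 4

def idx(r, c):
    return r * L + c

edges = {}                                   # grid couplings theta_ij
for r in range(L):
    for c in range(L):
        if c + 1 < L:
            edges[(idx(r, c), idx(r, c + 1))] = rng.normal(scale=0.5)
        if r + 1 < L:
            edges[(idx(r, c), idx(r + 1, c))] = rng.normal(scale=0.5)
theta0 = rng.normal(scale=0.1, size=L * L)   # biases theta_i0

blocks = [[idx(r + dr, c + dc) for dr in (0, 1) for dc in (0, 1)]
          for r in (0, 2) for c in (0, 2)]   # disjoint 2x2 clusters
mu = np.zeros(L * L)                         # current mean fields <X_i>

for sweep in range(50):
    for block in blocks:
        inside = set(block)
        states = list(itertools.product([-1, +1], repeat=len(block)))
        logq = np.zeros(len(states))
        for s, x in enumerate(states):
            val = dict(zip(block, x))
            for (i, j), w in edges.items():
                if i in inside and j in inside:
                    logq[s] += w * val[i] * val[j]       # exact within block
                elif i in inside:
                    logq[s] += w * val[i] * mu[j]        # mean field from MB
                elif j in inside:
                    logq[s] += w * mu[i] * val[j]
            logq[s] += sum(theta0[i] * val[i] for i in inside)
        q = np.exp(logq - logq.max())
        q /= q.sum()                                     # exact block marginal
        for k, i in enumerate(block):
            mu[i] = q @ np.array([x[k] for x in states]) # update block means

print(mu.reshape(L, L))
```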
