Probabilistic Graphical Models: Loopy BP and Bethe Free Energy

  1. Probabilistic Graphical Models: Loopy BP and Bethe Free Energy (Fall 2019, Siamak Ravanbakhsh)

  2. Learning objective: loopy belief propagation; its variational derivation: the Bethe approximation

  4. So far... exact inference: variable elimination, which is equivalent to belief propagation (BP) in a clique tree.
     This lecture... what if exact inference is too expensive (i.e., the tree-width is large)? Continue to use BP: loopy BP. Why is this a good idea? We answer using its variational interpretation.

  6. Recap: BP in clique trees
     Sum-product BP message update from cluster (clique) $C_i$ to cluster $C_j$ over their sepset $S_{i,j}$:
     $$\delta_{i \to j}(S_{i,j}) = \sum_{C_i - S_{i,j}} \psi_i(C_i) \prod_{k \in \mathrm{Nb}_i - j} \delta_{k \to i}(S_{i,k})$$
     Schedule: from the leaves towards the root, then back to the leaves.
     Marginal (belief) for each cluster:
     $$p(C_i) \propto \beta_i(C_i) = \psi_i(C_i) \prod_{k \in \mathrm{Nb}_i} \delta_{k \to i}(S_{i,k})$$
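
A minimal Python sketch of the clique-tree update above, using plain dictionaries as factor tables over binary variables. The three-cluster chain {A,B} - {B,C} - {C,D}, its potential values, and the helper names (`make_factor`, `multiply`, `marginalize`, `clique_tree_beliefs`) are hypothetical. Memoized recursion computes each directed message once, which amounts to the leaves-to-root-and-back schedule; the beliefs are compared against brute-force marginals.

```python
import itertools

# A factor is (scope, table): scope is a tuple of variable names, table maps
# each binary joint assignment (a tuple of 0/1) to a non-negative value.

def make_factor(scope, values):
    """Build a factor from values listed in itertools.product order."""
    return scope, dict(zip(itertools.product((0, 1), repeat=len(scope)), values))

def multiply(f, g):
    """Pointwise product of two factors over the union of their scopes."""
    scope = tuple(dict.fromkeys(f[0] + g[0]))  # order-preserving union
    table = {}
    for vals in itertools.product((0, 1), repeat=len(scope)):
        a = dict(zip(scope, vals))
        table[vals] = f[1][tuple(a[v] for v in f[0])] * g[1][tuple(a[v] for v in g[0])]
    return scope, table

def marginalize(f, keep):
    """Sum out every variable of f that is not in `keep`."""
    scope = tuple(v for v in f[0] if v in keep)
    table = {}
    for vals, val in f[1].items():
        a = dict(zip(f[0], vals))
        key = tuple(a[v] for v in scope)
        table[key] = table.get(key, 0.0) + val
    return scope, table

def normalize(f):
    z = sum(f[1].values())
    return f[0], {k: v / z for k, v in f[1].items()}

def clique_tree_beliefs(psi, nb):
    """Sum-product BP on a clique tree; psi: cluster -> psi_i(C_i), nb: cluster -> neighbours."""
    cache = {}

    def message(i, j):
        # delta_{i->j}(S_ij) = sum_{C_i - S_ij} psi_i(C_i) * prod_{k in Nb_i - j} delta_{k->i}(S_ik)
        if (i, j) not in cache:
            f = psi[i]
            for k in nb[i]:
                if k != j:
                    f = multiply(f, message(k, i))
            sepset = set(psi[i][0]) & set(psi[j][0])  # S_ij = C_i intersect C_j
            cache[(i, j)] = marginalize(f, sepset)
        return cache[(i, j)]

    # beta_i(C_i) = psi_i(C_i) * prod_{k in Nb_i} delta_{k->i}(S_ik), then normalized
    beliefs = {}
    for i in psi:
        f = psi[i]
        for k in nb[i]:
            f = multiply(f, message(k, i))
        beliefs[i] = normalize(f)
    return beliefs

# Hypothetical clique tree {A,B} - {B,C} - {C,D} with made-up potentials.
psi = {
    1: make_factor(("A", "B"), [3.0, 1.0, 1.0, 2.0]),
    2: make_factor(("B", "C"), [1.0, 4.0, 2.0, 1.0]),
    3: make_factor(("C", "D"), [2.0, 1.0, 1.0, 5.0]),
}
nb = {1: [2], 2: [1, 3], 3: [2]}
beliefs = clique_tree_beliefs(psi, nb)

# Brute-force reference: multiply all potentials, then marginalize to each cluster.
joint = multiply(multiply(psi[1], psi[2]), psi[3])
exact = {i: normalize(marginalize(joint, set(psi[i][0]))) for i in psi}
```

Factor tables as dictionaries keep the sketch short; an array library would be the natural choice for larger state spaces.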

  8. Clique-tree for tree structures
     [Figure: a tree-structured graph over $x_1, \dots, x_6$, with "one possible clique-tree" and "a different valid clique-tree" built from it]
     Pairwise potentials $\phi_{i,j}(x_i, x_j)$; tree width = 1.
     One possible clique-tree uses one cluster per factor. What are the sepsets?
     For a different valid clique-tree, check the running intersection property.
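
The figure's exact edges are not recoverable from the text, so this sketch assumes a hypothetical tree over $x_1, \dots, x_6$ with edges 1-2, 2-3, 2-4, 4-5, 4-6. It builds one cluster per factor, computes sepsets as cluster intersections, and checks the running intersection property (the clusters containing each variable must form a connected subtree) for three candidate cluster trees. The helper names `satisfies_rip` and `sepsets` are ours.

```python
from collections import deque

# One cluster per pairwise factor of a hypothetical tree:
# edges x1-x2, x2-x3, x2-x4, x4-x5, x4-x6.
clusters = {"12": {1, 2}, "23": {2, 3}, "24": {2, 4}, "45": {4, 5}, "46": {4, 6}}

def satisfies_rip(clusters, tree_edges):
    """Running intersection: the clusters containing each variable form a connected subtree."""
    adj = {c: set() for c in clusters}
    for a, b in tree_edges:
        adj[a].add(b)
        adj[b].add(a)
    for v in set().union(*clusters.values()):
        holding = {c for c, scope in clusters.items() if v in scope}
        start = next(iter(holding))
        seen, queue = {start}, deque([start])
        while queue:  # BFS restricted to the clusters that contain v
            c = queue.popleft()
            for d in adj[c]:
                if d in holding and d not in seen:
                    seen.add(d)
                    queue.append(d)
        if seen != holding:
            return False
    return True

def sepsets(clusters, tree_edges):
    """Sepset of each clique-tree edge: the variables its two clusters share."""
    return {(a, b): clusters[a] & clusters[b] for a, b in tree_edges}

tree_a = [("12", "23"), ("12", "24"), ("24", "45"), ("24", "46")]
tree_b = [("23", "12"), ("23", "24"), ("24", "45"), ("45", "46")]  # a different valid choice
tree_c = [("45", "12"), ("12", "23"), ("23", "24"), ("24", "46")]  # clusters with x4 are split

print(satisfies_rip(clusters, tree_a), satisfies_rip(clusters, tree_b),
      satisfies_rip(clusters, tree_c))  # True True False
print(sepsets(clusters, tree_a))
```

In `tree_c` the cluster {4,5} hangs off {1,2}, which does not contain $x_4$, so the clusters holding $x_4$ are disconnected and the property fails.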

  10. BP for tree structures
      Pairwise potentials $\phi_{i,j}(x_i, x_j)$; one cluster per factor.
      Message update:
      $$\delta_{i \to j}(x_j) = \sum_{x_i} \phi_{i,j}(x_i, x_j) \prod_{k \in \mathrm{Nb}_i - j} \delta_{k \to i}(x_i)$$
      Schedule: from the leaves towards a root, then back to the leaves.
      Marginal (belief) for each cluster:
      $$p(x_i) \propto \prod_{k \in \mathrm{Nb}_i} \delta_{k \to i}(x_i)$$
      $$p(x_i, x_j) \propto \phi_{i,j}(x_i, x_j) \prod_{k \in \mathrm{Nb}_i - j} \delta_{k \to i}(x_i) \prod_{k \in \mathrm{Nb}_j - i} \delta_{k \to j}(x_j)$$
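
A minimal Python sketch of these message and belief formulas on a hypothetical 5-node tree with made-up binary potentials. `bp_tree` and `brute_force` are our own names; the brute-force comparison is only feasible because the example is tiny.

```python
import itertools
import math

def bp_tree(n_vars, phi, n_states=2):
    """Sum-product BP on a tree-structured pairwise model.

    phi maps each tree edge (i, j) to a table phi[(i, j)][x_i][x_j].
    Returns normalized node beliefs p(x_i) and edge beliefs p(x_i, x_j).
    """
    nb = {i: set() for i in range(n_vars)}
    for i, j in phi:
        nb[i].add(j)
        nb[j].add(i)

    def pot(i, j, xi, xj):
        # phi_{i,j}(x_i, x_j); each edge is stored once, in either orientation
        return phi[(i, j)][xi][xj] if (i, j) in phi else phi[(j, i)][xj][xi]

    cache = {}

    def msg(i, j):
        # delta_{i->j}(x_j) = sum_{x_i} phi_{i,j}(x_i, x_j) prod_{k in Nb_i - j} delta_{k->i}(x_i)
        if (i, j) not in cache:
            incoming = [msg(k, i) for k in nb[i] if k != j]
            cache[(i, j)] = [
                sum(pot(i, j, xi, xj) * math.prod(m[xi] for m in incoming)
                    for xi in range(n_states))
                for xj in range(n_states)
            ]
        return cache[(i, j)]

    node = {}
    for i in range(n_vars):
        b = [math.prod(msg(k, i)[x] for k in nb[i]) for x in range(n_states)]
        z = sum(b)
        node[i] = [v / z for v in b]
    edge = {}
    for i, j in phi:
        b = {(xi, xj): phi[(i, j)][xi][xj]
             * math.prod(msg(k, i)[xi] for k in nb[i] if k != j)
             * math.prod(msg(k, j)[xj] for k in nb[j] if k != i)
             for xi in range(n_states) for xj in range(n_states)}
        z = sum(b.values())
        edge[(i, j)] = {key: v / z for key, v in b.items()}
    return node, edge

def brute_force(n_vars, phi, n_states=2):
    """Exact marginals by enumerating all n_states ** n_vars assignments."""
    w = {x: math.prod(t[x[i]][x[j]] for (i, j), t in phi.items())
         for x in itertools.product(range(n_states), repeat=n_vars)}
    z = sum(w.values())
    node = {i: [sum(v for x, v in w.items() if x[i] == s) / z for s in range(n_states)]
            for i in range(n_vars)}
    edge = {(i, j): {(a, b): sum(v for x, v in w.items() if x[i] == a and x[j] == b) / z
                     for a in range(n_states) for b in range(n_states)}
            for (i, j) in phi}
    return node, edge

# Hypothetical tree: edges 0-1, 1-2, 1-3, 3-4 with made-up potentials.
phi = {
    (0, 1): [[2.0, 1.0], [1.0, 3.0]],
    (1, 2): [[1.0, 4.0], [2.0, 1.0]],
    (1, 3): [[5.0, 1.0], [1.0, 1.0]],
    (3, 4): [[1.0, 2.0], [3.0, 1.0]],
}
node_bp, edge_bp = bp_tree(5, phi)
node_ex, edge_ex = brute_force(5, phi)
```

The memoized recursion `msg` only terminates because the graph is a tree; on a graph with loops it would recurse forever, which is one reason loopy BP uses iterative updates instead.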

  11. BP for tree structures: reparametrization
      The graphical model represents
      $$p(x) = \frac{1}{Z} \prod_{i,j \in E} \phi_{i,j}(x_i, x_j) \qquad (*)$$
      Write it in terms of marginals (one cluster per factor):
      $$p(x) = \frac{\prod_{i,j \in E} p_{i,j}(x_i, x_j)}{\prod_i p_i(x_i)^{|\mathrm{Nb}_i| - 1}}$$
      Why is this correct? The denominator adjusts for double-counting: each $x_i$ appears in $|\mathrm{Nb}_i|$ pairwise marginals but should be counted once. Substituting the marginals using BP messages recovers (*).
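
A quick numerical check of the reparametrization, assuming a hypothetical 4-node tree with made-up binary potentials: the exact joint, computed by enumeration, should equal the product of pairwise marginals divided by the singleton marginals raised to $|\mathrm{Nb}_i| - 1$, for every assignment.

```python
import itertools
import math

# Hypothetical tree over 4 binary variables with made-up potentials.
edges = [(0, 1), (1, 2), (1, 3)]
phi = {
    (0, 1): [[2.0, 1.0], [1.0, 3.0]],
    (1, 2): [[1.0, 4.0], [2.0, 1.0]],
    (1, 3): [[5.0, 1.0], [1.0, 1.0]],
}
n = 4
states = list(itertools.product((0, 1), repeat=n))

# (*) p(x) = (1/Z) prod_{(i,j) in E} phi_{i,j}(x_i, x_j), by enumeration
weight = {x: math.prod(phi[(i, j)][x[i]][x[j]] for i, j in edges) for x in states}
Z = sum(weight.values())
p = {x: w / Z for x, w in weight.items()}

def marginal(idx):
    """Marginal of p over the variables in idx, keyed by their values."""
    m = {}
    for x, px in p.items():
        key = tuple(x[i] for i in idx)
        m[key] = m.get(key, 0.0) + px
    return m

p_i = {i: marginal((i,)) for i in range(n)}
p_ij = {e: marginal(e) for e in edges}
degree = {i: sum(i in e for e in edges) for i in range(n)}  # |Nb_i|

# Compare p(x) with prod_E p_ij / prod_i p_i^(|Nb_i| - 1) on every assignment.
max_err = 0.0
for x in states:
    num = math.prod(p_ij[(i, j)][(x[i], x[j])] for i, j in edges)
    den = math.prod(p_i[i][(x[i],)] ** (degree[i] - 1) for i in range(n))
    max_err = max(max_err, abs(num / den - p[x]))
```

`max_err` should be at the level of floating-point round-off. On a graph with loops this identity does not hold in general, which is what makes the Bethe form an approximation there.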

  12. Variational interpretation
      BP as I-projection: $\arg\min_q D(q \,\|\, p)$, where
      $$p(x) = \frac{1}{Z} \prod_{i,j \in E} \phi_{i,j}(x_i, x_j), \qquad q(x) = \frac{\prod_{i,j \in E} q_{i,j}(x_i, x_j)}{\prod_i q_i(x_i)^{|\mathrm{Nb}_i| - 1}}$$
      Write $q$ in terms of the marginals of interest; the minimization gives us the marginals $q_{i,j}$ and $q_i$.

  13. Variational free energy
      $$D(q \,\|\, p) = \sum_x q(x)\big(\ln q(x) - \ln p(x)\big) = -H(q) - \sum_{i,j} \mathbb{E}_q[\ln \phi_{i,j}(x_i, x_j)] + \ln Z$$
      The last term, $\ln Z$, does not depend on $q$ and can be ignored, so the I-projection is equivalent to
      $$\arg\max_q \; H(q) + \sum_{i,j} \mathbb{E}_q[\ln \phi_{i,j}(x_i, x_j)]$$
      This objective is the variational free energy. Because $D(q \,\|\, p) \ge 0$, the free energy is a lower bound on $\ln Z$.
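
A brute-force check of this identity on a hypothetical 3-cycle of binary variables with made-up potentials: for random distributions $q$ over all $2^3$ states, $\ln Z$ minus the free energy should equal $D(q \,\|\, p) \ge 0$. The function names are ours.

```python
import itertools
import math
import random

# Hypothetical 3-cycle of binary variables with made-up pairwise potentials.
edges = [(0, 1), (1, 2), (0, 2)]
phi = {
    (0, 1): [[2.0, 1.0], [1.0, 3.0]],
    (1, 2): [[1.0, 2.0], [2.0, 1.0]],
    (0, 2): [[1.5, 1.0], [1.0, 0.5]],
}
states = list(itertools.product((0, 1), repeat=3))

def sum_log_phi(x):
    """sum_{i,j} ln phi_{i,j}(x_i, x_j) for one joint assignment x."""
    return sum(math.log(phi[(i, j)][x[i]][x[j]]) for i, j in edges)

log_Z = math.log(sum(math.exp(sum_log_phi(x)) for x in states))
p = {x: math.exp(sum_log_phi(x) - log_Z) for x in states}

def free_energy(q):
    """H(q) + E_q[sum_{i,j} ln phi_{i,j}], by brute force over all states."""
    entropy = -sum(qx * math.log(qx) for qx in q.values() if qx > 0)
    energy = sum(qx * sum_log_phi(x) for x, qx in q.items())
    return entropy + energy

def kl(q, r):
    """D(q || r) for distributions given as dicts over the same states."""
    return sum(qx * (math.log(qx) - math.log(r[x])) for x, qx in q.items() if qx > 0)

rng = random.Random(0)
gaps = []
for _ in range(100):
    w = [rng.random() + 1e-3 for _ in states]
    total = sum(w)
    q = {x: wi / total for x, wi in zip(states, w)}
    # (ln Z - F(q), D(q || p)): the two should agree and be non-negative
    gaps.append((log_Z - free_energy(q), kl(q, p)))
```

At $q = p$ the bound is tight: `free_energy(p)` equals `log_Z` up to round-off.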

  14. Simplifying the free energy
      $$p(x) = \frac{1}{Z} \prod_{i,j \in E} \phi_{i,j}(x_i, x_j), \qquad q(x) = \frac{\prod_{i,j \in E} q_{i,j}(x_i, x_j)}{\prod_i q_i(x_i)^{|\mathrm{Nb}_i| - 1}}$$
      $$\arg\min_q D(q \,\|\, p) \equiv \arg\max_q \; H(q) + \sum_{i,j} \mathbb{E}_q[\ln \phi_{i,j}(x_i, x_j)]$$
      So far we did not use the decomposed form of $q$: both the entropy and the energy involve summations over exponentially many terms.

  18. Simplifying the free energy
      Using the decomposed form of $q$, both terms reduce to sums over pairwise and singleton marginals.
      Energy:
      $$\sum_{i,j} \mathbb{E}_q[\ln \phi_{i,j}(x_i, x_j)] = \sum_{i,j \in E} \sum_{x_i, x_j} q_{i,j}(x_i, x_j) \ln \phi_{i,j}(x_i, x_j)$$
      Entropy, which follows from the decomposition of $q$:
      $$H(q) = \sum_{i,j \in E} H(q_{i,j}) - \sum_i (|\mathrm{Nb}_i| - 1) H(q_i)$$
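
A numerical check that the two decomposed terms agree with brute-force sums over all $2^n$ states when $q$ has the tree-structured form above. The tree, its potentials, and the choice of $q$ (the exact distribution of that hypothetical tree model) are assumptions for illustration. The energy identity holds for any $q$ by linearity of expectation; the entropy identity relies on $q$ factorizing as on the slide.

```python
import itertools
import math

# Hypothetical tree over 4 binary variables; q is taken to be the exact
# distribution of this tree model, so it factorizes as on the slide.
edges = [(0, 1), (1, 2), (1, 3)]
phi = {
    (0, 1): [[2.0, 1.0], [1.0, 3.0]],
    (1, 2): [[1.0, 4.0], [2.0, 1.0]],
    (1, 3): [[5.0, 1.0], [1.0, 1.0]],
}
n = 4
states = list(itertools.product((0, 1), repeat=n))
weight = {x: math.prod(phi[(i, j)][x[i]][x[j]] for i, j in edges) for x in states}
Z = sum(weight.values())
q = {x: w / Z for x, w in weight.items()}

def marginal(idx):
    """Marginal of q over the variables in idx."""
    m = {}
    for x, qx in q.items():
        key = tuple(x[i] for i in idx)
        m[key] = m.get(key, 0.0) + qx
    return m

def entropy(dist):
    return -sum(v * math.log(v) for v in dist.values() if v > 0)

q_i = {i: marginal((i,)) for i in range(n)}
q_ij = {e: marginal(e) for e in edges}
degree = {i: sum(i in e for e in edges) for i in range(n)}  # |Nb_i|

# Brute force: sums over all 2**n joint states.
H_full = entropy(q)
energy_full = sum(qx * sum(math.log(phi[(i, j)][x[i]][x[j]]) for i, j in edges)
                  for x, qx in q.items())

# Decomposed: only pairwise and singleton tables are touched.
H_local = (sum(entropy(q_ij[e]) for e in edges)
           - sum((degree[i] - 1) * entropy(q_i[i]) for i in range(n)))
energy_local = sum(q_ij[(i, j)][(a, b)] * math.log(phi[(i, j)][a][b])
                   for i, j in edges for a in (0, 1) for b in (0, 1))
```

The decomposed versions cost time proportional to the number of edges times the size of a pairwise table, instead of exponential in $n$.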

  19. Variational interpretation: marginal constraints
      Objective:
      $$\arg\max \sum_{i,j \in E} H(q_{i,j}) - \sum_i (|\mathrm{Nb}_i| - 1) H(q_i) + \sum_{i,j \in E} \sum_{x_i, x_j} q_{i,j}(x_i, x_j) \ln \phi_{i,j}(x_i, x_j)$$
      The marginals $q_{i,j}, q_i$ should be "valid": a real distribution with these marginals should exist (the marginal polytope).
      Local consistency:
      $$\sum_{x_i} q_{i,j}(x_i, x_j) = q_j(x_j) \quad \forall\, i,j \in E,\ x_j$$
      For tree graphical models this local consistency is enough.
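
To see why local consistency is not enough once there are loops, consider a hypothetical "frustrated" 3-cycle of binary variables: uniform singletons, and every edge puts all its mass on disagreeing pairs. The sketch checks local consistency, then lists the assignments a joint distribution could support; an odd cycle of binary variables cannot have every edge disagree, so no joint distribution has these marginals.

```python
import itertools

# Hypothetical pseudo-marginals on a 3-cycle of binary variables.
edges = [(0, 1), (1, 2), (0, 2)]
q_i = {i: {0: 0.5, 1: 0.5} for i in range(3)}
# every edge puts all of its mass on disagreeing pairs
q_ij = {e: {(0, 0): 0.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 0.0} for e in edges}

def locally_consistent(q_i, q_ij, tol=1e-12):
    """Check sum_{x_i} q_ij = q_j and sum_{x_j} q_ij = q_i on every edge."""
    for (i, j), t in q_ij.items():
        for xj in (0, 1):
            if abs(sum(t[(xi, xj)] for xi in (0, 1)) - q_i[j][xj]) > tol:
                return False
        for xi in (0, 1):
            if abs(sum(t[(xi, xj)] for xj in (0, 1)) - q_i[i][xi]) > tol:
                return False
    return True

print(locally_consistent(q_i, q_ij))  # True

# A joint q(x) with these pairwise marginals could only put mass on assignments
# where every edge table is positive, i.e. every edge disagrees.
support = [x for x in itertools.product((0, 1), repeat=3)
           if all(q_ij[(i, j)][(x[i], x[j])] > 0 for i, j in edges)]
print(support)  # []
```

An empty support means these locally consistent tables lie outside the marginal polytope. On a tree, by contrast, locally consistent tables can always be glued into a joint using the reparametrized form.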

  22. Variational derivation of BP
      $$\arg\max_{\{q\}} \sum_{i,j \in E} H(q_{i,j}) - \sum_i (|\mathrm{Nb}_i| - 1) H(q_i) + \sum_{i,j \in E} \sum_{x_i, x_j} q_{i,j}(x_i, x_j) \ln \phi_{i,j}(x_i, x_j)$$
      subject to:
      locally consistent: $\sum_{x_i} q_{i,j}(x_i, x_j) = q_j(x_j) \quad \forall\, i,j \in E,\ x_j$
      marginal distributions: $q_{i,j}(x_i, x_j) \ge 0 \quad \forall\, i,j \in E,\ x_i, x_j$, and $\sum_{x_i} q_i(x_i) = 1 \quad \forall\, i$
      The BP update is derived from the fixed points (stationarity conditions) of the Lagrangian: the BP messages are the exponential form of the Lagrange multipliers.
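
A sketch of the stationarity argument for this pairwise objective, following the standard Lagrangian route. The multiplier names $\lambda_{j,i}(x_i)$ (enforcing $\sum_{x_j} q_{i,j}(x_i, x_j) = q_i(x_i)$ for edge $\{i,j\}$), the omission of the normalization and non-negativity terms, and the assumption $|\mathrm{Nb}_i| > 1$ are ours.

```latex
% Lagrangian (normalization and non-negativity terms omitted)
\mathcal{L} = \sum_{i,j \in E} H(q_{i,j}) - \sum_i (|\mathrm{Nb}_i| - 1) H(q_i)
  + \sum_{i,j \in E} \sum_{x_i, x_j} q_{i,j}(x_i, x_j) \ln \phi_{i,j}(x_i, x_j)
  + \sum_{i,j \in E} \Big( \sum_{x_i} \lambda_{j,i}(x_i) \Big[ \sum_{x_j} q_{i,j}(x_i, x_j) - q_i(x_i) \Big]
  + \sum_{x_j} \lambda_{i,j}(x_j) \Big[ \sum_{x_i} q_{i,j}(x_i, x_j) - q_j(x_j) \Big] \Big)

% Stationarity in q_{i,j} and in q_i
\frac{\partial \mathcal{L}}{\partial q_{i,j}(x_i, x_j)} = 0
  \;\Rightarrow\; q_{i,j}(x_i, x_j) \propto \phi_{i,j}(x_i, x_j)\, e^{\lambda_{j,i}(x_i) + \lambda_{i,j}(x_j)}
\frac{\partial \mathcal{L}}{\partial q_i(x_i)} = 0
  \;\Rightarrow\; (|\mathrm{Nb}_i| - 1) \ln q_i(x_i) = \sum_{j \in \mathrm{Nb}_i} \lambda_{j,i}(x_i) + \text{const}

% Reparametrize the multipliers with messages
\lambda_{j,i}(x_i) = \sum_{k \in \mathrm{Nb}_i - j} \ln \delta_{k \to i}(x_i)
  \;\Rightarrow\; \sum_{j \in \mathrm{Nb}_i} \lambda_{j,i}(x_i) = (|\mathrm{Nb}_i| - 1) \sum_{k \in \mathrm{Nb}_i} \ln \delta_{k \to i}(x_i)

% The beliefs then take the BP form
q_i(x_i) \propto \prod_{k \in \mathrm{Nb}_i} \delta_{k \to i}(x_i), \qquad
q_{i,j}(x_i, x_j) \propto \phi_{i,j}(x_i, x_j) \prod_{k \in \mathrm{Nb}_i - j} \delta_{k \to i}(x_i) \prod_{k \in \mathrm{Nb}_j - i} \delta_{k \to j}(x_j)

% Imposing \sum_{x_j} q_{i,j}(x_i, x_j) = q_i(x_i) and cancelling \prod_{k \in Nb_i - j} \delta_{k \to i}(x_i)
\delta_{j \to i}(x_i) \propto \sum_{x_j} \phi_{i,j}(x_i, x_j) \prod_{k \in \mathrm{Nb}_j - i} \delta_{k \to j}(x_j)
```

With $i$ and $j$ swapped, the last line is the tree BP message update, now obtained as a stationarity condition that makes no reference to the graph being a tree.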

  26. What happens if there are loops?
      We can still apply the BP update:
      $$\delta_{i \to j}(x_j) \propto \sum_{x_i} \phi_{i,j}(x_i, x_j) \prod_{k \in \mathrm{Nb}_i - j} \delta_{k \to i}(x_i)$$
      "proportional to": normalize each message for numerical stability
      update the messages synchronously or sequentially
      may not converge (oscillating behavior)
      even when it converges, it only gives an approximation: $\hat{p}(x_i) \propto \prod_{k \in \mathrm{Nb}_i} \delta_{k \to i}(x_i)$ is not (proportional to) the exact marginal $p(x_i)$
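
A minimal sketch of loopy BP with normalized messages and a synchronous schedule, assuming a pairwise model stored as edge tables; `loopy_bp`, `exact_marginals`, and the potential values are hypothetical. It is run on a chain (no loops, so BP should reproduce the exact marginals) and on the same chain closed into a 3-cycle, where the beliefs are only an approximation. The iteration cap and tolerance are arbitrary choices, and the returned flag reports whether the messages settled.

```python
import itertools
import math

def loopy_bp(n_vars, phi, n_states=2, max_iters=200, tol=1e-10):
    """Synchronous sum-product on a pairwise model p(x) proportional to prod phi_{i,j}.

    phi maps each edge (i, j) to a table phi[(i, j)][x_i][x_j].
    Returns (approximate node beliefs, whether the messages converged).
    """
    nb = {i: [] for i in range(n_vars)}
    for i, j in phi:
        nb[i].append(j)
        nb[j].append(i)

    def pot(i, j, xi, xj):
        # phi_{i,j}(x_i, x_j); each edge is stored once, in either orientation
        return phi[(i, j)][xi][xj] if (i, j) in phi else phi[(j, i)][xj][xi]

    # delta_{i->j}(x_j) for every directed edge, initialized to uniform
    msgs = {(i, j): [1.0 / n_states] * n_states for i in nb for j in nb[i]}
    converged = False
    for _ in range(max_iters):
        new = {}
        for i, j in msgs:
            m = [sum(pot(i, j, xi, xj)
                     * math.prod(msgs[(k, i)][xi] for k in nb[i] if k != j)
                     for xi in range(n_states))
                 for xj in range(n_states)]
            z = sum(m)  # the "proportional to": normalize for numerical stability
            new[(i, j)] = [v / z for v in m]
        change = max((abs(a - b) for key in msgs for a, b in zip(msgs[key], new[key])),
                     default=0.0)
        msgs = new  # synchronous schedule: every update used the previous round
        if change < tol:
            converged = True
            break

    beliefs = {}
    for i in range(n_vars):
        b = [math.prod(msgs[(k, i)][x] for k in nb[i]) for x in range(n_states)]
        z = sum(b)
        beliefs[i] = [v / z for v in b]
    return beliefs, converged

def exact_marginals(n_vars, phi, n_states=2):
    """Exact node marginals by brute-force enumeration (tiny models only)."""
    w = {x: math.prod(t[x[i]][x[j]] for (i, j), t in phi.items())
         for x in itertools.product(range(n_states), repeat=n_vars)}
    z = sum(w.values())
    return {i: [sum(v for x, v in w.items() if x[i] == s) / z for s in range(n_states)]
            for i in range(n_vars)}

# Hypothetical potentials: a chain 0-1-2, then the same chain closed into a loop.
chain = {(0, 1): [[1.5, 1.0], [1.0, 1.2]], (1, 2): [[1.0, 1.4], [1.3, 1.0]]}
cycle = {**chain, (0, 2): [[1.2, 1.0], [1.0, 1.5]]}

tree_beliefs, tree_ok = loopy_bp(3, chain)
loop_beliefs, loop_ok = loopy_bp(3, cycle)
print(tree_beliefs, exact_marginals(3, chain))  # should agree: the chain has no loops
print(loop_beliefs, exact_marginals(3, cycle))  # generally only approximately equal
```

A sequential schedule would instead overwrite `msgs[(i, j)]` in place as soon as it is computed; with strong potentials either schedule can oscillate, which is why the convergence flag is returned rather than assumed.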

  27. Loopy BP on factor graphs
      $$p(x) = \frac{1}{Z} \prod_I \phi_I(x_I)$$
      where each factor index $I \subseteq \{1, \dots, N\}$ is a subset of variables.
      [Figure: factor graph with variable nodes $x_1, \dots, x_5$ and factor nodes $\phi_{\{1,2,4\}}$ and $\phi_{\{3,5\}}$]
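
The slide stops before giving the factor-graph messages, so this sketch assumes the standard sum-product variable-to-factor and factor-to-variable updates, with normalized, synchronous updates over binary variables. It reuses the slide's factors $\phi_{\{1,2,4\}}$ and $\phi_{\{3,5\}}$ (0-based indices here) and adds a hypothetical $\phi_{\{2,3\}}$ so that messages flow between them; all potential values and function names are made up. With these three factors the factor graph has no loops, so the beliefs should match brute-force marginals; on a loopy factor graph the same code runs but only approximates them.

```python
import itertools
import math

def make_factor(scope, values):
    """Factor over binary variables; values listed in itertools.product order."""
    return scope, dict(zip(itertools.product((0, 1), repeat=len(scope)), values))

def factor_graph_bp(n_vars, factors, max_iters=100, tol=1e-12):
    """Synchronous sum-product on a factor graph p(x) proportional to prod_I phi_I(x_I).

    factors: list of (scope, table); scope is a tuple of variable indices.
    Returns normalized beliefs b_i(x_i) for binary variables.
    """
    var_facs = {i: [a for a, (scope, _) in enumerate(factors) if i in scope]
                for i in range(n_vars)}
    # factor-to-variable messages mu_{I->i}(x_i), initialized to uniform
    f2v = {(a, i): [0.5, 0.5] for a, (scope, _) in enumerate(factors) for i in scope}
    for _ in range(max_iters):
        # variable-to-factor: mu_{i->I}(x_i) = prod_{J containing i, J != I} mu_{J->i}(x_i)
        v2f = {(i, a): [math.prod(f2v[(b, i)][x] for b in var_facs[i] if b != a)
                        for x in (0, 1)]
               for a, i in f2v}
        new = {}
        for a, i in f2v:
            scope, table = factors[a]
            m = [0.0, 0.0]
            for vals, val in table.items():
                # mu_{I->i}(x_i) = sum_{x_{I-i}} phi_I(x_I) prod_{j in I-i} mu_{j->I}(x_j)
                m[vals[scope.index(i)]] += val * math.prod(
                    v2f[(j, a)][xj] for j, xj in zip(scope, vals) if j != i)
            z = sum(m)
            new[(a, i)] = [v / z for v in m]
        change = max(abs(u - w) for k in f2v for u, w in zip(f2v[k], new[k]))
        f2v = new
        if change < tol:
            break
    beliefs = {}
    for i in range(n_vars):
        b = [math.prod(f2v[(a, i)][x] for a in var_facs[i]) for x in (0, 1)]
        z = sum(b)
        beliefs[i] = [v / z for v in b]
    return beliefs

def exact_marginals(n_vars, factors):
    """Exact marginals by enumerating all 2 ** n_vars assignments."""
    w = {x: math.prod(t[tuple(x[i] for i in scope)] for scope, t in factors)
         for x in itertools.product((0, 1), repeat=n_vars)}
    z = sum(w.values())
    return {i: [sum(v for x, v in w.items() if x[i] == s) / z for s in (0, 1)]
            for i in range(n_vars)}

# x1..x5 are indices 0..4: phi_{1,2,4} -> (0, 1, 3), phi_{3,5} -> (2, 4),
# plus a hypothetical phi_{2,3} -> (1, 2). All values are made up.
factors = [
    make_factor((0, 1, 3), [4.0, 1.0, 1.0, 2.0, 1.0, 3.0, 2.0, 1.0]),
    make_factor((2, 4), [1.0, 2.0, 3.0, 1.0]),
    make_factor((1, 2), [2.0, 1.0, 1.0, 2.0]),
]
beliefs = factor_graph_bp(5, factors)
exact = exact_marginals(5, factors)
```

Pairwise BP is the special case where every factor has two variables; the factor-graph form handles higher-order factors such as $\phi_{\{1,2,4\}}$ directly.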
