
Graphical Models: Loopy BP and Bethe Free Energy

Siamak Ravanbakhsh, Winter 2018. Learning objectives: loopy belief propagation and its variational derivation (the Bethe approximation).


  1. Graphical Models: Loopy BP and Bethe Free Energy. Siamak Ravanbakhsh, Winter 2018.

  2. Learning objective: loopy belief propagation and its variational derivation (the Bethe approximation).

  4. So far: exact inference, where variable elimination is equivalent to belief propagation (BP) in a clique tree.
     This class: what if exact inference is too expensive (i.e., the tree-width is large)? Continue to use BP anyway: loopy BP. Why is this a good idea? The answer comes from the variational interpretation.

  6. Recap: BP in clique trees
     Sum-product BP message update, where $C_i$ is a cluster/clique and $S_{i,j}$ is a sepset:
     $\delta_{i \to j}(S_{i,j}) = \sum_{C_i - S_{i,j}} \psi_i(C_i) \prod_{k \in Nb_i - j} \delta_{k \to i}(S_{i,k})$
     Messages are sent from the leaves towards the root, then back to the leaves.
     Marginal (belief) for each cluster:
     $p(C_i) \propto \beta_i(C_i) = \psi_i(C_i) \prod_{k \in Nb_i} \delta_{k \to i}(S_{i,k})$

  8. Clique-tree for tree structures
     [Figure: a tree-structured graph over $x_1, \dots, x_6$ and two clique-trees built from it]
     Pairwise potentials $\phi_{i,j}(x_i, x_j)$; tree width = 1.
     One possible clique-tree: one cluster per factor. What are the sepsets?
     A different valid clique-tree is also possible: check for the running intersection property.

  10. BP for tree structures
      Pairwise potentials $\phi_{i,j}(x_i, x_j)$, one cluster per factor.
      Message update, from the leaves towards a root and then back to the leaves:
      $\delta_{i \to j}(x_j) = \sum_{x_i} \phi_{i,j}(x_i, x_j) \prod_{k \in Nb_i - j} \delta_{k \to i}(x_i)$
      Marginal (belief) for each variable and each cluster:
      $p(x_i) \propto \prod_{k \in Nb_i} \delta_{k \to i}(x_i)$
      $p(x_i, x_j) \propto \phi_{i,j}(x_i, x_j) \prod_{k \in Nb_i - j} \delta_{k \to i}(x_i) \prod_{k \in Nb_j - i} \delta_{k \to j}(x_j)$
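
The message update and the single-variable belief can be sketched in a few lines of Python. This is a minimal illustration, not the course's code: the 4-node tree, the potential values, and the helper names (`pot`, `message`, `bp_marginal`, `brute_force_marginal`) are all assumptions made for the example. Messages are computed recursively, which amounts to the leaves-to-root schedule with the queried node as root.

```python
import itertools
import math

# Hypothetical tree (an assumption, not the figure from the slides).
edges = [(0, 1), (1, 2), (1, 3)]
n_vars, n_states = 4, 2

# Made-up positive pairwise potentials phi[(i, j)][x_i][x_j].
phi = {
    (0, 1): [[1.0, 2.0], [3.0, 1.0]],
    (1, 2): [[2.0, 1.0], [1.0, 4.0]],
    (1, 3): [[1.0, 3.0], [2.0, 1.0]],
}

def pot(i, j, xi, xj):
    """Look up phi_{i,j}(x_i, x_j) regardless of edge orientation."""
    return phi[(i, j)][xi][xj] if (i, j) in phi else phi[(j, i)][xj][xi]

nbrs = {v: [] for v in range(n_vars)}
for i, j in edges:
    nbrs[i].append(j)
    nbrs[j].append(i)

def message(i, j, cache):
    """delta_{i->j}(x_j) = sum_{x_i} phi(x_i, x_j) prod_{k in Nb_i - j} delta_{k->i}(x_i)."""
    if (i, j) not in cache:
        incoming = [message(k, i, cache) for k in nbrs[i] if k != j]
        cache[(i, j)] = [
            sum(pot(i, j, xi, xj) * math.prod(m[xi] for m in incoming)
                for xi in range(n_states))
            for xj in range(n_states)
        ]
    return cache[(i, j)]

def bp_marginal(i):
    """p(x_i) is proportional to the product of all incoming messages."""
    cache = {}
    belief = [math.prod(message(k, i, cache)[xi] for k in nbrs[i])
              for xi in range(n_states)]
    z = sum(belief)
    return [b / z for b in belief]

def brute_force_marginal(i):
    """Reference marginal obtained by summing over all joint states."""
    totals = [0.0] * n_states
    for x in itertools.product(range(n_states), repeat=n_vars):
        totals[x[i]] += math.prod(pot(a, b, x[a], x[b]) for a, b in edges)
    z = sum(totals)
    return [t / z for t in totals]
```

On a tree, `bp_marginal(i)` should agree with `brute_force_marginal(i)` up to floating-point error for every node `i`.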

  11. BP for tree structures: reparametrization
      The graphical model represents
      $p(x) = \frac{1}{Z} \prod_{i,j \in E} \phi_{i,j}(x_i, x_j)$  (*)
      Write it in terms of marginals (one cluster per factor):
      $p(x) = \frac{\prod_{i,j \in E} p(x_i, x_j)}{\prod_i p(x_i)^{|Nb_i| - 1}}$
      Why is this correct? The denominator adjusts for double-counts. Substituting the marginals expressed through BP messages recovers (*).
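
As a small worked instance of the marginal form, consider a 3-node chain $x_1 - x_2 - x_3$ (an assumed example, not one from the slides). Only the middle node has $|Nb_2| - 1 = 1$; both leaves have exponent 0 and drop out of the denominator:

```latex
\begin{aligned}
p(x_1, x_2, x_3)
  &= p(x_1, x_2)\, p(x_3 \mid x_1, x_2) \\
  &= p(x_1, x_2)\, p(x_3 \mid x_2)
     && \text{(in the chain, } x_3 \perp x_1 \mid x_2\text{)} \\
  &= \frac{p(x_1, x_2)\, p(x_2, x_3)}{p(x_2)}
   = \frac{\prod_{i,j \in E} p(x_i, x_j)}{\prod_i p(x_i)^{|Nb_i| - 1}}
\end{aligned}
```

The single factor $p(x_2)$ in the denominator removes the double-count of $x_2$, which appears in both pairwise marginals.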

  12. Variational interpretation
      BP as an I-projection: $\arg\min_q D(q \| p)$, with
      $p(x) = \frac{1}{Z} \prod_{i,j \in E} \phi_{i,j}(x_i, x_j)$
      $q(x) = \frac{\prod_{i,j \in E} q_{i,j}(x_i, x_j)}{\prod_i q_i(x_i)^{|Nb_i| - 1}}$
      Here $q$ is written in terms of the marginals of interest; the minimization gives us the marginals $q_{i,j}, q_i$.

  13. Variational free energy
      $D(q \| p) = \sum_x q(x) (\ln q(x) - \ln p(x)) = -H(q) - E_q[\sum_{i,j \in E} \ln \phi_{i,j}(x_i, x_j)] + \ln Z$
      The $\ln Z$ term can be ignored because it does not depend on $q$, so the I-projection is equivalent to
      $\arg\max_q H(q) + E_q[\sum_{i,j \in E} \ln \phi_{i,j}(x_i, x_j)]$
      The maximized objective is the variational free energy (in the sign convention of these slides). Since $D(q \| p) \geq 0$, it is a lower bound on $\ln Z$.
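
The identity $\ln Z - (H(q) + E_q[\sum \ln \phi]) = D(q \| p)$ can be checked numerically by brute force on a tiny model. The sketch below is only an illustration under assumptions: the 3-variable model, the potential values, and the names `log_score`, `objective`, `kl`, and `random_joint` are invented for the example.

```python
import itertools
import math
import random

# Hypothetical 3-variable binary model with made-up potentials (an assumption).
edges = [(0, 1), (1, 2), (0, 2)]
phi = {e: [[1.0, 2.0], [3.0, 0.5]] for e in edges}
states = list(itertools.product(range(2), repeat=3))

def log_score(x):
    """sum_{i,j} ln phi_{i,j}(x_i, x_j): the unnormalized log-probability."""
    return sum(math.log(phi[(i, j)][x[i]][x[j]]) for i, j in edges)

log_z = math.log(sum(math.exp(log_score(x)) for x in states))
p = {x: math.exp(log_score(x) - log_z) for x in states}

def objective(q):
    """H(q) + E_q[sum ln phi] for a full joint q given as {state: prob}."""
    entropy = -sum(v * math.log(v) for v in q.values() if v > 0)
    energy = sum(v * log_score(x) for x, v in q.items())
    return entropy + energy

def kl(q, p_dist):
    """D(q || p) for joints over the same states."""
    return sum(v * (math.log(v) - math.log(p_dist[x]))
               for x, v in q.items() if v > 0)

rng = random.Random(0)  # fixed seed so the check is reproducible

def random_joint():
    """An arbitrary strictly positive joint distribution over all states."""
    w = [rng.random() + 1e-3 for _ in states]
    total = sum(w)
    return {x: wi / total for x, wi in zip(states, w)}

q = random_joint()
gap = log_z - objective(q)  # equals D(q || p), hence non-negative
```

For any `q`, `gap` matches `kl(q, p)`, and `objective(p)` attains `log_z` exactly, which is the sense in which the objective lower-bounds $\ln Z$.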

  14. Simplifying the free energy
      $\arg\min_q D(q \| p) \equiv \arg\max_q H(q) + E_q[\sum_{i,j \in E} \ln \phi_{i,j}(x_i, x_j)]$, where
      $p(x) = \frac{1}{Z} \prod_{i,j \in E} \phi_{i,j}(x_i, x_j)$ and $q(x) = \frac{\prod_{i,j \in E} q_{i,j}(x_i, x_j)}{\prod_i q_i(x_i)^{|Nb_i| - 1}}$
      So far we did not use the decomposed form of $q$: both the entropy and the energy involve summation over exponentially many terms.

  15. Simplifying the free energy
      Using the decomposition of $q$, both terms of $\arg\max_q H(q) + E_q[\sum_{i,j \in E} \ln \phi_{i,j}(x_i, x_j)]$ reduce to local sums.
      Energy: $E_q[\sum_{i,j \in E} \ln \phi_{i,j}(x_i, x_j)] = \sum_{i,j \in E} \sum_{x_i, x_j} q_{i,j}(x_i, x_j) \ln \phi_{i,j}(x_i, x_j)$
      Entropy: $H(q) = \sum_{i,j \in E} H(q_{i,j}) - \sum_i (|Nb_i| - 1) H(q_i)$, which follows from the decomposition of $q$.
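
The entropy decomposition can be verified numerically on a small tree, where the joint really does factor into its marginals. This is a hedged sketch: the 4-node tree, the potentials, and the helpers `entropy` and `marginal` are assumptions chosen for the example.

```python
import itertools
import math

# Hypothetical 4-node tree with made-up positive potentials (an assumption).
edges = [(0, 1), (1, 2), (1, 3)]
phi = {
    (0, 1): [[1.0, 2.0], [3.0, 1.0]],
    (1, 2): [[2.0, 1.0], [1.0, 4.0]],
    (1, 3): [[1.0, 3.0], [2.0, 1.0]],
}
n_vars = 4
states = list(itertools.product(range(2), repeat=n_vars))

# Exact joint p(x) by brute-force normalization.
weights = {x: math.prod(phi[(i, j)][x[i]][x[j]] for i, j in edges) for x in states}
z = sum(weights.values())
p = {x: w / z for x, w in weights.items()}

def entropy(dist):
    """Shannon entropy (in nats) of a distribution given as {outcome: prob}."""
    return -sum(v * math.log(v) for v in dist.values() if v > 0)

def marginal(dist, idx):
    """Marginal of the joint over the variables listed in idx."""
    out = {}
    for x, v in dist.items():
        key = tuple(x[i] for i in idx)
        out[key] = out.get(key, 0.0) + v
    return out

degree = {i: sum(i in e for e in edges) for i in range(n_vars)}

exact_entropy = entropy(p)
# sum_{ij} H(q_ij) - sum_i (|Nb_i| - 1) H(q_i), evaluated at the true marginals.
bethe_entropy = (
    sum(entropy(marginal(p, e)) for e in edges)
    - sum((degree[i] - 1) * entropy(marginal(p, (i,))) for i in range(n_vars))
)
```

On this tree `bethe_entropy` and `exact_entropy` agree up to rounding; the local sum touches only pairwise and single-variable tables rather than all joint states.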

  16. Variational interpretation: marginal constraints
      Objective: $\arg\max_q \sum_{i,j \in E} H(q_{i,j}) - \sum_i (|Nb_i| - 1) H(q_i) + \sum_{i,j \in E} \sum_{x_i, x_j} q_{i,j}(x_i, x_j) \ln \phi_{i,j}(x_i, x_j)$
      The marginals $q_{i,j}, q_i$ should be "valid": a real distribution with these marginals should exist (the marginal polytope).
      Local consistency: $\sum_{x_i} q_{i,j}(x_i, x_j) = q_j(x_j) \quad \forall i,j \in E, x_j$
      For tree graphical models this local consistency is enough.
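
To see why local consistency is not enough once the graph has a loop, here is a small check on a 3-cycle of binary variables. The pseudo-marginals are a made-up illustration (not from the slides): each pair is forced to disagree, which is locally consistent but cannot come from any joint distribution.

```python
import itertools

# Hypothetical pseudo-marginals on a 3-cycle (an assumption for illustration).
edges = [(0, 1), (1, 2), (0, 2)]
q_single = {i: [0.5, 0.5] for i in range(3)}
# q_pair[(i, j)][x_i][x_j]: all mass on x_i != x_j.
q_pair = {e: [[0.0, 0.5], [0.5, 0.0]] for e in edges}

def locally_consistent():
    """Check that each pairwise table sums to both of its single-variable marginals."""
    for (i, j), table in q_pair.items():
        for xj in range(2):
            if abs(sum(table[xi][xj] for xi in range(2)) - q_single[j][xj]) > 1e-12:
                return False
        for xi in range(2):
            if abs(sum(table[xi][xj] for xj in range(2)) - q_single[i][xi]) > 1e-12:
                return False
    return True

# A joint with these pairwise marginals may only put mass on states in which
# every edge has nonzero pairwise probability.
supported = [
    x for x in itertools.product(range(2), repeat=3)
    if all(q_pair[(i, j)][x[i]][x[j]] > 0 for i, j in edges)
]
```

`locally_consistent()` holds, yet `supported` is empty: three binary variables cannot all pairwise disagree, so no joint distribution has these marginals and the point lies outside the marginal polytope.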

  19. Variational derivation of BP
      $\arg\max_{\{q\}} \sum_{i,j \in E} H(q_{i,j}) - \sum_i (|Nb_i| - 1) H(q_i) + \sum_{i,j \in E} \sum_{x_i, x_j} q_{i,j}(x_i, x_j) \ln \phi_{i,j}(x_i, x_j)$
      subject to
      locally consistent: $\sum_{x_i} q_{i,j}(x_i, x_j) = q_j(x_j) \quad \forall i,j \in E, x_j$
      marginal distributions: $q_{i,j}(x_i, x_j) \geq 0 \quad \forall i,j \in E, x_i, x_j$ and $\sum_{x_i} q_i(x_i) = 1 \quad \forall i$
      The BP update is derived as the "fixed-points" of the Lagrangian. BP messages are the (exponential form of the) Lagrange multipliers.
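
The fixed-point argument can be sketched as follows. The multiplier names $\lambda_{i,j \to i}$ and $\gamma_i$ are notation introduced here for illustration, and the sketch assumes $|Nb_i| > 1$ (for a leaf the $H(q_i)$ term vanishes and $q_i$ is fixed by local consistency alone). Non-negativity is satisfied automatically by the exponential form of the solution.

```latex
\begin{aligned}
\mathcal{L} ={}& \sum_{i,j \in E} H(q_{i,j}) - \sum_i (|Nb_i| - 1) H(q_i)
  + \sum_{i,j \in E} \sum_{x_i, x_j} q_{i,j}(x_i, x_j) \ln \phi_{i,j}(x_i, x_j) \\
 &+ \sum_{i,j \in E} \Big[ \sum_{x_j} \lambda_{i,j \to j}(x_j) \Big( \sum_{x_i} q_{i,j}(x_i, x_j) - q_j(x_j) \Big)
  + \sum_{x_i} \lambda_{i,j \to i}(x_i) \Big( \sum_{x_j} q_{i,j}(x_i, x_j) - q_i(x_i) \Big) \Big] \\
 &+ \sum_i \gamma_i \Big( \sum_{x_i} q_i(x_i) - 1 \Big) \\[4pt]
\frac{\partial \mathcal{L}}{\partial q_{i,j}(x_i, x_j)} = 0
 \;\Rightarrow\;& q_{i,j}(x_i, x_j) \propto \phi_{i,j}(x_i, x_j)\,
   e^{\lambda_{i,j \to i}(x_i) + \lambda_{i,j \to j}(x_j)} \\
\frac{\partial \mathcal{L}}{\partial q_i(x_i)} = 0
 \;\Rightarrow\;& q_i(x_i) \propto \exp\Big( \tfrac{1}{|Nb_i| - 1} \sum_{j \in Nb_i} \lambda_{i,j \to i}(x_i) \Big) \\[4pt]
\text{Setting } \lambda_{i,j \to i}(x_i) = \ln \prod_{k \in Nb_i - j} \delta_{k \to i}(x_i):
 \quad& q_i(x_i) \propto \prod_{k \in Nb_i} \delta_{k \to i}(x_i), \\
 & q_{i,j}(x_i, x_j) \propto \phi_{i,j}(x_i, x_j)
   \prod_{k \in Nb_i - j} \delta_{k \to i}(x_i) \prod_{k \in Nb_j - i} \delta_{k \to j}(x_j) \\[4pt]
\sum_{x_i} q_{i,j}(x_i, x_j) = q_j(x_j)
 \;\Rightarrow\;& \delta_{i \to j}(x_j) \propto \sum_{x_i} \phi_{i,j}(x_i, x_j) \prod_{k \in Nb_i - j} \delta_{k \to i}(x_i)
\end{aligned}
```

The last line is the BP message update: enforcing local consistency on the stationary form of $q_{i,j}$ cancels the common factor $\prod_{k \in Nb_j - i} \delta_{k \to j}(x_j)$ and leaves exactly the sum-product rule.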

  23. What happens if there are loops?
      We can still apply the BP update:
      $\delta_{i \to j}(x_j) \propto \sum_{x_i} \psi_{i,j}(x_i, x_j) \prod_{k \in Nb_i - j} \delta_{k \to i}(x_i)$
      The $\propto$ (proportional to) indicates that the message is normalized for numerical stability.
      Update the messages synchronously or sequentially.
      The updates may not converge (oscillating behavior).
      Even when convergent, BP only gives an approximation: $\hat{p}(x_i) \propto \prod_{k \in Nb_i} \delta_{k \to i}(x_i)$ is not (proportional to) the exact marginal $p(x_i)$.
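
A minimal sketch of synchronous loopy BP with normalized messages, run on a single 3-cycle and compared against brute-force marginals. The graph, the potential values, the tolerance, and the function names (`loopy_bp`, `beliefs`, `exact_marginals`) are assumptions for illustration; convergence on this small example does not imply convergence in general.

```python
import itertools
import math

# Hypothetical 3-cycle over binary variables with made-up potentials.
edges = [(0, 1), (1, 2), (0, 2)]
psi = {
    (0, 1): [[2.0, 1.0], [1.0, 2.0]],
    (1, 2): [[2.0, 1.0], [1.0, 2.0]],
    (0, 2): [[1.0, 3.0], [2.0, 1.0]],
}
nodes = [0, 1, 2]
nbrs = {v: [u for e in edges for u in e if v in e and u != v] for v in nodes}

def pot(i, j, xi, xj):
    """psi_{i,j}(x_i, x_j) regardless of edge orientation."""
    return psi[(i, j)][xi][xj] if (i, j) in psi else psi[(j, i)][xj][xi]

def normalize(v):
    s = sum(v)
    return [a / s for a in v]

def loopy_bp(max_iters=200, tol=1e-10):
    """Synchronous updates: all new messages are computed from the old ones."""
    msgs = {(i, j): [0.5, 0.5] for i in nodes for j in nbrs[i]}
    for _ in range(max_iters):
        new = {}
        for (i, j) in msgs:
            raw = [
                sum(pot(i, j, xi, xj)
                    * math.prod(msgs[(k, i)][xi] for k in nbrs[i] if k != j)
                    for xi in range(2))
                for xj in range(2)
            ]
            new[(i, j)] = normalize(raw)  # normalization for numerical stability
        change = max(abs(a - b) for key in msgs for a, b in zip(new[key], msgs[key]))
        msgs = new
        if change < tol:
            return msgs, True
    return msgs, False  # did not converge within max_iters

def beliefs(msgs):
    """Approximate marginals: normalized product of incoming messages."""
    return {i: normalize([math.prod(msgs[(k, i)][xi] for k in nbrs[i])
                          for xi in range(2)]) for i in nodes}

def exact_marginals():
    totals = {i: [0.0, 0.0] for i in nodes}
    for x in itertools.product(range(2), repeat=3):
        w = math.prod(pot(i, j, x[i], x[j]) for i, j in edges)
        for i in nodes:
            totals[i][x[i]] += w
    return {i: normalize(t) for i, t in totals.items()}

msgs, converged = loopy_bp()
approx, exact = beliefs(msgs), exact_marginals()
max_error = max(abs(approx[i][0] - exact[i][0]) for i in nodes)
```

In this example the messages settle to a fixed point, and `max_error` is small but nonzero, which illustrates that the converged beliefs approximate the marginals rather than reproduce them.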

  27. Loopy BP on factor graphs
      $p(x) = \frac{1}{Z} \prod_I \psi_I(x_I)$, where each $I \subseteq \{1, \dots, N\}$ is a subset of variables.
      [Figure: factor graph with variable nodes $x_1, \dots, x_5$ and factor nodes including $\psi_{\{1,2,3\}}$ and $\psi_{\{3,5\}}$]
      Variable-to-factor message: $\delta_{i \to I}(x_i) \propto \prod_{J \mid i \in J, J \neq I} \delta_{J \to i}(x_i)$
      Factor-to-variable message: $\delta_{I \to i}(x_i) \propto \sum_{x_{I - i}} \psi_I(x_I) \prod_{j \in I - i} \delta_{j \to I}(x_j)$
      After convergence: $\hat{p}(x_i) \propto \prod_{J \mid i \in J} \delta_{J \to i}(x_i)$
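
The two factor-graph message types can be sketched as below. The factor scopes {1,2,3} and {3,5} follow the slide, but the numeric factor values, the omission of $x_4$ (whose factors are not recoverable from the transcript), and all helper names are assumptions made for this illustration.

```python
import itertools
import math

n_states = 2

def table(scope, fn):
    """Build a factor table {assignment of x_I: value} from a function."""
    return {x: fn(x) for x in itertools.product(range(n_states), repeat=len(scope))}

# Scopes from the slide; the values are made up and strictly positive.
factors = {
    (1, 2, 3): table((1, 2, 3), lambda x: 1.0 + x[0] + 2.0 * x[1] * x[2]),
    (3, 5): table((3, 5), lambda x: 2.0 if x[0] == x[1] else 0.5),
}
variables = sorted({i for scope in factors for i in scope})

def normalize(v):
    s = sum(v)
    return [a / s for a in v]

def run_bp(n_iters=20):
    uniform = [1.0 / n_states] * n_states
    v2f = {(i, I): list(uniform) for I in factors for i in I}
    f2v = {(I, i): list(uniform) for I in factors for i in I}
    for _ in range(n_iters):
        # Variable-to-factor: product of messages from all other factors J != I.
        for (i, I) in v2f:
            v2f[(i, I)] = normalize([
                math.prod(f2v[(J, i)][xi] for J in factors if i in J and J != I)
                for xi in range(n_states)
            ])
        # Factor-to-variable: sum out x_{I - i} against the incoming messages.
        for (I, i) in f2v:
            pos = I.index(i)
            out = [0.0] * n_states
            for x_I, val in factors[I].items():
                out[x_I[pos]] += val * math.prod(
                    v2f[(j, I)][x_I[I.index(j)]] for j in I if j != i)
            f2v[(I, i)] = normalize(out)
    return f2v

def beliefs(f2v):
    """p_hat(x_i): normalized product of all factor-to-variable messages into i."""
    return {i: normalize([math.prod(f2v[(J, i)][xi] for J in factors if i in J)
                          for xi in range(n_states)]) for i in variables}

def exact_marginals():
    totals = {i: [0.0] * n_states for i in variables}
    for x in itertools.product(range(n_states), repeat=len(variables)):
        assign = dict(zip(variables, x))
        w = math.prod(tab[tuple(assign[i] for i in I)] for I, tab in factors.items())
        for i in variables:
            totals[i][assign[i]] += w
    return {i: normalize(t) for i, t in totals.items()}
```

Because these two factors form a tree-structured factor graph, `beliefs(run_bp())` should match `exact_marginals()`; adding a factor that closes a loop would turn the same code into an approximate method.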
