Graphical Models: Loopy BP and Bethe Free Energy
Siamak Ravanbakhsh, Winter 2018

Learning objective
loopy belief propagation
its variational derivation: the Bethe approximation

So far...
exact inference: variable elimination
equivalent to belief propagation (BP) in a clique tree

This class...
what if exact inference is too expensive (i.e., the tree-width is large)?
continue to use BP: loopy BP
why is this a good idea? answer using the variational interpretation

Recap: BP in clique trees
sum-product BP message update, with $C_i$ a cluster/clique and $S_{i,j}$ the sepset between clusters $i$ and $j$:
$$\delta_{i \to j}(S_{i,j}) = \sum_{C_i - S_{i,j}} \psi_i(C_i) \prod_{k \in \mathrm{Nb}_i - j} \delta_{k \to i}(S_{i,k})$$
messages are sent from the leaves towards the root, then back to the leaves
marginal (belief) for each cluster:
$$p(C_i) \propto \beta_i(C_i) = \psi_i(C_i) \prod_{k \in \mathrm{Nb}_i} \delta_{k \to i}(S_{i,k})$$

Clique-tree for tree structures
pairwise potentials $\phi_{i,j}(x_i, x_j)$; tree width = 1
[Figure: a tree-structured model over $x_1, \ldots, x_6$ and two clique-trees built from it]
one possible clique-tree: one cluster per factor. What are the sepsets?
a different valid clique-tree: check for the running intersection property

BP for tree structures
pairwise potentials $\phi_{i,j}(x_i, x_j)$, one cluster per factor
message update from $x_i$ to $x_j$:
$$\delta_{i \to j}(x_j) = \sum_{x_i} \phi_{i,j}(x_i, x_j) \prod_{k \in \mathrm{Nb}_i - j} \delta_{k \to i}(x_i)$$
messages are sent from the leaves towards a root, then back to the leaves
marginals (beliefs):
$$p(x_i) \propto \prod_{k \in \mathrm{Nb}_i} \delta_{k \to i}(x_i)$$
$$p(x_i, x_j) \propto \phi_{i,j}(x_i, x_j) \prod_{k \in \mathrm{Nb}_i - j} \delta_{k \to i}(x_i) \prod_{k \in \mathrm{Nb}_j - i} \delta_{k \to j}(x_j)$$

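The tree message update can be sketched in a few lines of plain Python. This is a minimal illustration, not the lecture's code: the star-shaped tree, the potential tables, and the helper names (`phi`, `message`, `bp_marginal`, `brute_force_marginal`) are all assumptions made for the example. It computes messages recursively from the leaves and compares the resulting beliefs with marginals obtained by enumerating all states.

```python
import itertools
import math

# Hypothetical toy model: a star-shaped tree over 4 binary variables
# with edges 0-1, 1-2, 1-3 and arbitrary positive pairwise potentials.
edges = {
    (0, 1): [[1.0, 2.0], [3.0, 1.0]],
    (1, 2): [[2.0, 1.0], [1.0, 4.0]],
    (1, 3): [[1.0, 3.0], [2.0, 2.0]],
}
n_vars, n_states = 4, 2

def phi(i, j, xi, xj):
    """Pairwise potential phi_{i,j}(x_i, x_j); each edge is stored once."""
    if (i, j) in edges:
        return edges[(i, j)][xi][xj]
    return edges[(j, i)][xj][xi]

neighbors = {v: set() for v in range(n_vars)}
for (a, b) in edges:
    neighbors[a].add(b)
    neighbors[b].add(a)

def message(i, j, cache):
    """delta_{i->j}(x_j): sum over x_i of phi times the other incoming messages."""
    if (i, j) not in cache:
        msg = []
        for xj in range(n_states):
            total = 0.0
            for xi in range(n_states):
                incoming = math.prod(
                    message(k, i, cache)[xi] for k in neighbors[i] - {j})
                total += phi(i, j, xi, xj) * incoming
            msg.append(total)
        cache[(i, j)] = msg
    return cache[(i, j)]

def bp_marginal(i):
    """Belief at node i: normalized product of all incoming messages."""
    cache = {}
    belief = [math.prod(message(k, i, cache)[xi] for k in neighbors[i])
              for xi in range(n_states)]
    z = sum(belief)
    return [b / z for b in belief]

def brute_force_marginal(i):
    """Exact marginal by enumerating every joint state (tiny models only)."""
    marg = [0.0] * n_states
    for x in itertools.product(range(n_states), repeat=n_vars):
        marg[x[i]] += math.prod(phi(a, b, x[a], x[b]) for (a, b) in edges)
    z = sum(marg)
    return [m / z for m in marg]

for v in range(n_vars):
    print(v, bp_marginal(v), brute_force_marginal(v))
```

The recursion terminates only because the graph is a tree; on a graph with cycles the same function would recurse forever, which is one way to see why loopy BP needs an iterative schedule instead.
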
BP for tree structures: reparametrization
setting: one cluster per factor
the graphical model represents
$$p(x) = \frac{1}{Z} \prod_{i,j \in E} \phi_{i,j}(x_i, x_j) \qquad (*)$$
write it in terms of marginals:
$$p(x) = \frac{\prod_{i,j \in E} p_{i,j}(x_i, x_j)}{\prod_i p_i(x_i)^{|\mathrm{Nb}_i| - 1}}$$
why is this correct? The denominator adjusts for double-counts: substitute the marginals using the BP messages to get (*).

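The reparametrization can be checked numerically on a small example. The sketch below assumes a hypothetical three-variable chain $x_0 - x_1 - x_2$ with made-up potential tables; it computes the joint and its marginals by enumeration and measures how far the marginal-based formula is from the true joint.

```python
import itertools

# Hypothetical chain x0 - x1 - x2 with arbitrary positive potentials.
phi01 = [[1.0, 2.0], [3.0, 1.0]]
phi12 = [[2.0, 1.0], [1.0, 4.0]]

states = list(itertools.product(range(2), repeat=3))
weights = {x: phi01[x[0]][x[1]] * phi12[x[1]][x[2]] for x in states}
Z = sum(weights.values())
p = {x: w / Z for x, w in weights.items()}

def marginal(keep):
    """Marginal of p over the variables listed in `keep`."""
    out = {}
    for x, px in p.items():
        key = tuple(x[v] for v in keep)
        out[key] = out.get(key, 0.0) + px
    return out

p01, p12, p1 = marginal((0, 1)), marginal((1, 2)), marginal((1,))

def reparam(x):
    """Product of edge marginals divided by p_i^(|Nb_i| - 1).
    Only the middle node has two neighbors, so only p_1 appears."""
    return p01[(x[0], x[1])] * p12[(x[1], x[2])] / p1[(x[1],)]

max_err = max(abs(p[x] - reparam(x)) for x in states)
print("largest deviation from the true joint:", max_err)
```

The end nodes have a single neighbor, so their exponent $|\mathrm{Nb}_i| - 1$ is zero and they drop out of the denominator.
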
Variational interpretation
BP as I-projection:
$$\arg\min_q D(q \,\|\, p)$$
where
$$p(x) = \frac{1}{Z} \prod_{i,j \in E} \phi_{i,j}(x_i, x_j), \qquad q(x) = \frac{\prod_{i,j \in E} q_{i,j}(x_i, x_j)}{\prod_i q_i(x_i)^{|\mathrm{Nb}_i| - 1}}$$
$q$ is written in terms of the marginals of interest; the minimization gives us the marginals $q_{i,j}$, $q_i$

Variational free energy
$$D(q \,\|\, p) = \sum_x q(x) \big(\ln q(x) - \ln p(x)\big) = -H(q) - \mathbb{E}_q\Big[\sum_{i,j \in E} \ln \phi_{i,j}(x_i, x_j)\Big] + \ln Z$$
using $\sum_x q(x) \ln q(x) = -H(q)$ and $\ln p(x) = \sum_{i,j \in E} \ln \phi_{i,j}(x_i, x_j) - \ln Z$
ignore $\ln Z$: it does not depend on $q$
the I-projection is therefore equivalent to
$$\arg\max_q \; H(q) + \mathbb{E}_q\Big[\sum_{i,j \in E} \ln \phi_{i,j}(x_i, x_j)\Big]$$
this objective is the variational free energy; because $D(q \,\|\, p) \ge 0$, it is a lower bound on $\ln Z$

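The lower-bound property is easy to verify by enumeration on a tiny model. The following sketch uses an assumed three-variable chain with made-up potentials and an arbitrary randomly drawn $q$; the names `objective`, `gap_random`, and `gap_exact` are illustrative only. The gap $\ln Z - \big(H(q) + \mathbb{E}_q[\sum \ln \phi]\big)$ equals $D(q \,\|\, p)$, so it should be positive for a generic $q$ and zero for $q = p$.

```python
import itertools
import math
import random

# Hypothetical chain x0 - x1 - x2 with arbitrary positive potentials.
phi01 = [[1.0, 2.0], [3.0, 1.0]]
phi12 = [[2.0, 1.0], [1.0, 4.0]]
states = list(itertools.product(range(2), repeat=3))

def log_weight(x):
    """Sum of log-potentials, i.e. ln of the unnormalized probability."""
    return math.log(phi01[x[0]][x[1]]) + math.log(phi12[x[1]][x[2]])

log_Z = math.log(sum(math.exp(log_weight(x)) for x in states))

def objective(q):
    """H(q) + E_q[sum of ln phi] for a full joint q given as a dict."""
    entropy = -sum(qx * math.log(qx) for qx in q.values() if qx > 0)
    energy = sum(qx * log_weight(x) for x, qx in q.items())
    return entropy + energy

# An arbitrary distribution q over the 8 joint states (fixed seed).
rng = random.Random(0)
raw = {x: rng.random() + 1e-3 for x in states}
total = sum(raw.values())
q_random = {x: v / total for x, v in raw.items()}

# The true distribution p.
p = {x: math.exp(log_weight(x) - log_Z) for x in states}

gap_random = log_Z - objective(q_random)  # equals D(q_random || p)
gap_exact = log_Z - objective(p)          # equals D(p || p)
print(gap_random, gap_exact)
```
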
Simplifying the free energy
$$\arg\min_q D(q \,\|\, p) \;\equiv\; \arg\max_q \; H(q) + \mathbb{E}_q\Big[\sum_{i,j \in E} \ln \phi_{i,j}(x_i, x_j)\Big]$$
with
$$p(x) = \frac{1}{Z} \prod_{i,j \in E} \phi_{i,j}(x_i, x_j), \qquad q(x) = \frac{\prod_{i,j \in E} q_{i,j}(x_i, x_j)}{\prod_i q_i(x_i)^{|\mathrm{Nb}_i| - 1}}$$
so far we did not use the decomposed form of $q$: both the entropy and the energy involve a summation over exponentially many terms
energy term:
$$\mathbb{E}_q\Big[\sum_{i,j \in E} \ln \phi_{i,j}(x_i, x_j)\Big] = \sum_{i,j \in E} \sum_{x_i, x_j} q_{i,j}(x_i, x_j) \ln \phi_{i,j}(x_i, x_j)$$
entropy term (follows from the decomposition of $q$):
$$H(q) = \sum_{i,j \in E} H(q_{i,j}) - \sum_i (|\mathrm{Nb}_i| - 1) H(q_i)$$

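To see that the simplified objective only needs node and edge marginals, the sketch below evaluates it from marginals alone and compares the result with $\ln Z$. The model (a three-variable chain with made-up potentials) and all names are assumptions for illustration. On a tree, plugging in the true marginals should recover $\ln Z$, since the decomposition of the entropy is exact there.

```python
import itertools
import math

# Hypothetical chain x0 - x1 - x2; each edge maps to a 2x2 potential table.
phi = {(0, 1): [[1.0, 2.0], [3.0, 1.0]],
       (1, 2): [[2.0, 1.0], [1.0, 4.0]]}
n_vars = 3
states = list(itertools.product(range(2), repeat=n_vars))
weights = {x: math.prod(t[x[i]][x[j]] for (i, j), t in phi.items())
           for x in states}
Z = sum(weights.values())

def marginal(keep):
    """True marginal over the variables in `keep`, by enumeration."""
    out = {}
    for x, w in weights.items():
        key = tuple(x[v] for v in keep)
        out[key] = out.get(key, 0.0) + w / Z
    return out

def entropy(dist):
    return -sum(v * math.log(v) for v in dist.values() if v > 0)

degree = {v: sum(v in e for e in phi) for v in range(n_vars)}  # |Nb_i|
edge_marg = {e: marginal(e) for e in phi}
node_marg = {v: marginal((v,)) for v in range(n_vars)}

# Energy: sum over edges of sum_{x_i,x_j} q_ij ln phi_ij.
energy = sum(edge_marg[(i, j)][(a, b)] * math.log(phi[(i, j)][a][b])
             for (i, j) in phi for a in range(2) for b in range(2))

# Entropy: sum of edge entropies minus (|Nb_i| - 1) times node entropies.
bethe_entropy = (sum(entropy(edge_marg[e]) for e in phi)
                 - sum((degree[v] - 1) * entropy(node_marg[v])
                       for v in range(n_vars)))

bethe_objective = bethe_entropy + energy
print(bethe_objective, math.log(Z))
```

The cost of evaluating this objective grows with the number of edges and the size of each edge table, not with the number of joint states.
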
Variational interpretation: marginal constraints
$$\arg\max_q \; H(q) + \mathbb{E}_q\Big[\sum_{i,j \in E} \ln \phi_{i,j}(x_i, x_j)\Big]$$
with
$$H(q) = \sum_{i,j \in E} H(q_{i,j}) - \sum_i (|\mathrm{Nb}_i| - 1) H(q_i), \qquad \mathbb{E}_q[\cdot] = \sum_{i,j \in E} \sum_{x_i, x_j} q_{i,j}(x_i, x_j) \ln \phi_{i,j}(x_i, x_j)$$
the marginals $q_{i,j}$, $q_i$ should be "valid": a real distribution with these marginals should exist (the marginal polytope)
local consistency:
$$\sum_{x_i} q_{i,j}(x_i, x_j) = q_j(x_j) \quad \forall\, i,j \in E,\; x_j$$
for tree graphical models this local consistency is enough

Variational derivation of BP
$$\arg\max_{\{q\}} \; \sum_{i,j \in E} H(q_{i,j}) - \sum_i (|\mathrm{Nb}_i| - 1) H(q_i) + \sum_{i,j \in E} \sum_{x_i, x_j} q_{i,j}(x_i, x_j) \ln \phi_{i,j}(x_i, x_j)$$
subject to
locally consistent:
$$\sum_{x_i} q_{i,j}(x_i, x_j) = q_j(x_j) \quad \forall\, i,j \in E,\; x_j$$
marginal distributions:
$$q_{i,j}(x_i, x_j) \ge 0 \quad \forall\, i,j \in E,\; x_i, x_j \qquad\qquad \sum_{x_i} q_i(x_i) = 1 \quad \forall\, i$$
the BP update is derived as the "fixed-points" of the Lagrangian
BP messages are the (exponential form of the) Lagrange multipliers

What happens if there are loops?
We can still apply the BP update:
$$\delta_{i \to j}(x_j) \propto \sum_{x_i} \psi_{i,j}(x_i, x_j) \prod_{k \in \mathrm{Nb}_i - j} \delta_{k \to i}(x_i)$$
"proportional to": normalize the message for numerical stability
update the messages synchronously or sequentially
may not converge (oscillating behavior)
even when convergent, it only gives an approximation:
$$\hat{p}(x_i) \propto \prod_{k \in \mathrm{Nb}_i} \delta_{k \to i}(x_i)$$
is not (proportional to) the exact marginal $p(x_i)$

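A minimal sketch of the loopy update with synchronous scheduling and message normalization follows. The model is an assumed 3-cycle over binary variables with made-up potentials, and every name in the code (`psi`, `loopy_bp`, `beliefs`, `exact_marginals`) is hypothetical. The script prints the BP beliefs next to the exact marginals so the approximation error can be inspected directly.

```python
import itertools
import math

# Hypothetical toy model: a 3-cycle over binary variables.
edges = {
    (0, 1): [[2.0, 1.0], [1.0, 3.0]],
    (1, 2): [[1.0, 2.0], [2.0, 1.0]],
    (0, 2): [[3.0, 1.0], [1.0, 1.0]],
}
n_vars, n_states = 3, 2

def psi(i, j, xi, xj):
    """Pairwise potential psi_{i,j}(x_i, x_j); each edge is stored once."""
    if (i, j) in edges:
        return edges[(i, j)][xi][xj]
    return edges[(j, i)][xj][xi]

neighbors = {v: set() for v in range(n_vars)}
for (a, b) in edges:
    neighbors[a].add(b)
    neighbors[b].add(a)

def normalize(vec):
    s = sum(vec)
    return [v / s for v in vec]

def loopy_bp(max_iters=200, tol=1e-12):
    """Synchronous updates: every new message is computed from the old ones."""
    msgs = {(i, j): [1.0 / n_states] * n_states
            for i in neighbors for j in neighbors[i]}
    for _ in range(max_iters):
        new = {}
        for (i, j) in msgs:
            out = [sum(psi(i, j, xi, xj)
                       * math.prod(msgs[(k, i)][xi] for k in neighbors[i] - {j})
                       for xi in range(n_states))
                   for xj in range(n_states)]
            new[(i, j)] = normalize(out)  # the "proportional to" step
        change = max(abs(a - b) for key in msgs
                     for a, b in zip(new[key], msgs[key]))
        msgs = new
        if change < tol:
            return msgs, True
    return msgs, False  # did not converge within max_iters

def beliefs(msgs):
    return {i: normalize([math.prod(msgs[(k, i)][xi] for k in neighbors[i])
                          for xi in range(n_states)])
            for i in neighbors}

def exact_marginals():
    marg = {i: [0.0] * n_states for i in range(n_vars)}
    for x in itertools.product(range(n_states), repeat=n_vars):
        w = math.prod(psi(a, b, x[a], x[b]) for (a, b) in edges)
        for i in range(n_vars):
            marg[i][x[i]] += w
    return {i: normalize(m) for i, m in marg.items()}

msgs, converged = loopy_bp()
approx, exact = beliefs(msgs), exact_marginals()
for i in range(n_vars):
    print(i, approx[i], exact[i])
```

A sequential schedule would instead write each new message into `msgs` immediately, so later updates in the same sweep already see it.
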
Loopy BP on factor graphs
$$p(x) = \frac{1}{Z} \prod_I \psi_I(x_I)$$
where $I \subseteq \{1, \ldots, N\}$ is a subset of variables
[Figure: factor graph with factor nodes such as $\psi_{\{1,2,3\}}$ and $\psi_{\{3,5\}}$, and variable nodes $x_1, \ldots, x_5$]
variable-to-factor message:
$$\delta_{i \to I}(x_i) \propto \prod_{J \,\mid\, i \in J,\; J \ne I} \delta_{J \to i}(x_i)$$
factor-to-variable message:
$$\delta_{I \to i}(x_i) \propto \sum_{x_{I - i}} \psi_I(x_I) \prod_{j \in I - i} \delta_{j \to I}(x_j)$$
after convergence:
$$\hat{p}(x_i) \propto \prod_{J \,\mid\, i \in J} \delta_{J \to i}(x_i)$$

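The two factor-graph message types can be sketched as follows. This is an illustrative implementation under stated assumptions, not the lecture's code: factors are represented as a dict from a scope tuple $I$ to a Python function $\psi_I$, the example factor graph and its potentials are made up, and each sweep updates all variable-to-factor messages first and then all factor-to-variable messages.

```python
import itertools
import math

def normalize(vec):
    s = sum(vec)
    return [v / s for v in vec]

def factor_graph_bp(factors, n_vars, n_states=2, max_iters=200, tol=1e-12):
    """Loopy BP on a factor graph. `factors` maps a scope tuple I to a
    function psi_I taking a tuple of states in scope order."""
    uniform = [1.0 / n_states] * n_states
    var_to_fac = {(i, I): uniform[:] for I in factors for i in I}
    fac_to_var = {(I, i): uniform[:] for I in factors for i in I}
    converged = False
    for _ in range(max_iters):
        # Variable-to-factor: product over all other factors J containing i.
        new_vf = {}
        for (i, I) in var_to_fac:
            new_vf[(i, I)] = normalize([
                math.prod(fac_to_var[(J, i)][xi]
                          for J in factors if i in J and J != I)
                for xi in range(n_states)])
        # Factor-to-variable: sum out every variable in I except i.
        new_fv = {}
        for (I, i) in fac_to_var:
            out = [0.0] * n_states
            for xI in itertools.product(range(n_states), repeat=len(I)):
                w = factors[I](xI)
                for j, xj in zip(I, xI):
                    if j != i:
                        w *= new_vf[(j, I)][xj]
                out[xI[I.index(i)]] += w
            new_fv[(I, i)] = normalize(out)
        change = max(abs(a - b) for key in fac_to_var
                     for a, b in zip(new_fv[key], fac_to_var[key]))
        var_to_fac, fac_to_var = new_vf, new_fv
        if change < tol:
            converged = True
            break
    # Belief: product of all incoming factor-to-variable messages.
    beliefs = [normalize([math.prod(fac_to_var[(J, i)][xi]
                                    for J in factors if i in J)
                          for xi in range(n_states)])
               for i in range(n_vars)]
    return beliefs, converged

def exact_marginals(factors, n_vars, n_states=2):
    """Brute-force marginals for comparison (tiny models only)."""
    marg = [[0.0] * n_states for _ in range(n_vars)]
    for x in itertools.product(range(n_states), repeat=n_vars):
        w = math.prod(f(tuple(x[v] for v in I)) for I, f in factors.items())
        for v in range(n_vars):
            marg[v][x[v]] += w
    return [normalize(m) for m in marg]

# Hypothetical loopy factor graph over 4 binary variables.
factors = {
    (0, 1, 2): lambda x: 1.0 + x[0] + 2.0 * x[1] * x[2],
    (2, 3): lambda x: 2.0 if x[0] == x[1] else 1.0,
    (0, 3): lambda x: 3.0 if x[0] == x[1] else 1.0,
}
beliefs, converged = factor_graph_bp(factors, n_vars=4)
exact = exact_marginals(factors, n_vars=4)
for v in range(4):
    print(v, beliefs[v], exact[v])
```

Dropping the `(0, 3)` factor turns the example into a tree-structured factor graph, in which case the beliefs should agree with the brute-force marginals.
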