New Generalizations of the Bethe Approximation via Asymptotic Expansion
Ryuhei Mori, Toshiyuki Tanaka
Kyoto University
35th Symposium of Information Theory and Its Application
Beppu, Oita, Japan
13 December 2012
The Bethe approximation
◮ A successful approximation for low-density parity-check codes, compressed sensing, etc.
◮ Comes with an efficient message-passing algorithm: belief propagation (BP).
◮ A fixed point of BP is a stationary point of the Bethe free energy [Yedidia et al. 2005].
Factor graph and partition function
For a factor graph $G$:
◮ $V$: the set of variable nodes
◮ $F$: the set of factor nodes
◮ $\mathcal{X}$: the alphabet
◮ $N$: the number of variables
◮ $d_o$: the degree of node $o$, for $o \in V \cup F$
◮ $f_a$: a non-negative function $f_a : \mathcal{X}^{d_a} \to \mathbb{R}_{\ge 0}$
\[ p(x; G) := \frac{1}{Z(G)} \prod_{a \in F} f_a(x_{\partial a}) \]
\[ Z(G) := \sum_{x \in \mathcal{X}^N} \prod_{a \in F} f_a(x_{\partial a}) \]
[Figure: example factor graph with variable nodes $i_1, \dots, i_7$ and factor nodes $a_1, \dots, a_5$.]
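The definitions of $p(x; G)$ and $Z(G)$ can be made concrete by brute-force enumeration on a tiny model. The sketch below is only an illustration: the three binary variables, the two pairwise factors, and all function names are hypothetical choices, not taken from the slides.

```python
import itertools
import math

# Hypothetical toy factor graph: 3 binary variables, 2 pairwise factors.
# Each factor is (scope, function), where scope lists the variables in ∂a.
factors = [
    ((0, 1), lambda x0, x1: 2.0 if x0 == x1 else 1.0),
    ((1, 2), lambda x1, x2: 3.0 if x1 == x2 else 1.0),
]
alphabet = (0, 1)
num_vars = 3

def weight(x):
    """Unnormalized weight prod_a f_a(x_{∂a}) of a full assignment x."""
    return math.prod(f(*(x[i] for i in scope)) for scope, f in factors)

def partition_function():
    """Z(G): sum of the weights over all |X|^N assignments."""
    return sum(weight(x) for x in itertools.product(alphabet, repeat=num_vars))

def probability(x):
    """p(x; G) = weight(x) / Z(G)."""
    return weight(x) / partition_function()

Z = partition_function()
```

Enumeration costs $|\mathcal{X}|^N$ evaluations, which is the reason approximations such as the Bethe approximation are of interest in the first place.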
The Legendre transformation
\[ -\log Z(G) = \inf_{q \in \mathcal{P}(\mathcal{X}^N)} \Bigl\{ -\sum_{x \in \mathcal{X}^N} q(x) \log \prod_{a \in F} f_a(x_{\partial a}) - H(q) \Bigr\} \]
where $H(q)$ is the Shannon entropy.
$\log Z(G)$ and $-H(q)$ are dual in the sense of the Legendre transformation:
\[ \log Z(G) \longleftrightarrow -H(q) \]
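The variational formula can be checked numerically on a small model: the functional inside the infimum equals $-\log Z(G)$ at $q = p$ and is no smaller for any other $q$. The sketch below assumes a hypothetical three-variable binary model; the helper names `weight`, `gibbs_free_energy`, and `random_distribution` are illustrative only.

```python
import itertools
import math
import random

def weight(x):
    # Hypothetical toy factors on binary variables (not from the slides).
    f_a = 2.0 if x[0] == x[1] else 1.0
    f_b = 3.0 if x[1] == x[2] else 1.0
    return f_a * f_b

states = list(itertools.product((0, 1), repeat=3))
w = [weight(x) for x in states]
Z = sum(w)

def gibbs_free_energy(q):
    """-sum_x q(x) log prod_a f_a(x_{∂a}) - H(q), with 0 log 0 := 0."""
    return sum(qx * (math.log(qx) - math.log(wx)) for qx, wx in zip(q, w) if qx > 0)

# The minimizer is the model distribution p itself.
p = [wx / Z for wx in w]

random.seed(0)

def random_distribution():
    raw = [random.random() for _ in states]
    total = sum(raw)
    return [r / total for r in raw]

# Free-energy values at 100 random trial distributions q.
trial_values = [gibbs_free_energy(random_distribution()) for _ in range(100)]
```

The gap `gibbs_free_energy(q) + log Z` is the Kullback-Leibler divergence from $q$ to $p$, which explains why it is non-negative.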
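The Bethe free energy
\[ -\log Z(G) = \inf_{q \in \mathcal{P}(\mathcal{X}^N)} \Bigl\{ -\sum_{x \in \mathcal{X}^N} q(x) \log \prod_{a \in F} f_a(x_{\partial a}) - H(q) \Bigr\} \]
\[ -\log Z_{\mathrm{Bethe}}(G) = \inf_{(b_i \in \mathcal{P}(\mathcal{X}))_{i \in V},\ (b_a \in \mathcal{P}(\mathcal{X}^{d_a}))_{a \in F}} \Bigl\{ -\sum_{a \in F} \sum_{x_{\partial a} \in \mathcal{X}^{d_a}} b_a(x_{\partial a}) \log f_a(x_{\partial a}) - H_{\mathrm{Bethe}}((b_i)_{i \in V}, (b_a)_{a \in F}) \Bigr\} \]
where
\[ H_{\mathrm{Bethe}}((b_i)_{i \in V}, (b_a)_{a \in F}) := \sum_{a \in F} H(b_a) - \sum_{i \in V} (d_i - 1) H(b_i). \]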
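On a tree factor graph the Bethe functional, evaluated at the exact marginals, reproduces $-\log Z(G)$. The following sketch checks this on a hypothetical three-variable chain; the factor tables are made up, and the marginals are obtained by brute force rather than by BP.

```python
import numpy as np

# Hypothetical chain x0 - f01 - x1 - f12 - x2 with binary variables.
f01 = np.array([[2.0, 1.0], [1.0, 2.0]])
f12 = np.array([[3.0, 1.0], [1.0, 3.0]])

joint = f01[:, :, None] * f12[None, :, :]   # unnormalized weights, shape (2, 2, 2)
Z = joint.sum()
p = joint / Z

# Exact factor and variable marginals, used here as the beliefs b_a and b_i.
b01 = p.sum(axis=2)
b12 = p.sum(axis=0)
b_vars = [p.sum(axis=(1, 2)), p.sum(axis=(0, 2)), p.sum(axis=(0, 1))]
degrees = [1, 2, 1]                          # d_i for x0, x1, x2

def entropy(b):
    b = np.ravel(b)
    return -np.sum(b * np.log(b))

energy = -(np.sum(b01 * np.log(f01)) + np.sum(b12 * np.log(f12)))
h_bethe = entropy(b01) + entropy(b12) - sum(
    (d - 1) * entropy(b) for d, b in zip(degrees, b_vars)
)
bethe_free_energy = energy - h_bethe
```

Only the middle variable has $d_i - 1 \neq 0$, so $H_{\mathrm{Bethe}}$ reduces to $H(b_{01}) + H(b_{12}) - H(b_1)$, which is the exact entropy of a Markov chain.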
Characterizations of the Bethe free energy
◮ Loop calculus [Chertkov and Chernyak 2006, 2007]
\[ Z(G) = Z_{\mathrm{Bethe}} \Bigl( 1 + \sum_{C:\ \text{generalized loop}} r(C) \Bigr) \]
$\longrightarrow$ generalized to non-binary alphabets [This work]
◮ Method of graph cover [Vontobel 2010]
\[ \frac{1}{M} \log \langle Z_{\Sigma_M} \rangle \to \log Z_{\mathrm{Bethe}} \]
$\longrightarrow$ generalized to the second-order analysis [This work]
Loop calculus for the binary alphabet
Lemma (Chertkov and Chernyak 2006; Sudderth et al. 2008)
Assume that the alphabet is binary, i.e., $\mathcal{X} = \{0, 1\}$. Let $\eta_i := \langle X_i \rangle_{b_i} = b_i(1)$. For any stationary point $((b_i), (b_a))$ of the Bethe free energy,
\[ Z(G) = Z_{\mathrm{Bethe}}((b_i)_{i \in V}, (b_a)_{a \in F}) \sum_{E' \subseteq E} Z(E') \]
where
\[ Z(E') := \prod_{i \in V} \Biggl\langle \Biggl( \frac{X_i - \eta_i}{\sqrt{\langle (X_i - \eta_i)^2 \rangle_{b_i}}} \Biggr)^{d_i(E')} \Biggr\rangle_{b_i} \cdot \prod_{a \in F} \Biggl\langle \prod_{i \in \partial a,\ (i,a) \in E'} \frac{X_i - \eta_i}{\sqrt{\langle (X_i - \eta_i)^2 \rangle_{b_i}}} \Biggr\rangle_{b_a}. \]
Generalized loop
\[ \mathcal{G} := \{ E' \subseteq E \mid d_o(E') \neq 1 \text{ for } o \in V \cup F \} \]
\[ Z(G) = Z_{\mathrm{Bethe}}((b_i)_{i \in V}, (b_a)_{a \in F}) \Bigl( 1 + \sum_{E' \in \mathcal{G} \setminus \{\emptyset\}} Z(E') \Bigr). \]
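The set of generalized loops can be enumerated directly from its definition: keep every edge subset in which no node has degree exactly 1. The sketch below does this for a hypothetical factor graph consisting of one cycle plus a dangling variable; the node labels and the helper `is_generalized_loop` are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical factor graph: cycle i1-a1-i2-a2-i3-a3-i1 plus a dangling variable i4 on a1.
cycle_edges = [("i1", "a1"), ("i2", "a1"), ("i2", "a2"),
               ("i3", "a2"), ("i3", "a3"), ("i1", "a3")]
edges = cycle_edges + [("i4", "a1")]

def is_generalized_loop(subset):
    """True if no variable or factor node has degree exactly 1 in the edge subset."""
    degree = Counter()
    for variable, factor in subset:
        degree[variable] += 1
        degree[factor] += 1
    return all(d != 1 for d in degree.values())

# Brute force over all 2^|E| subsets; feasible only for very small graphs.
generalized_loops = [
    frozenset(subset)
    for size in range(len(edges) + 1)
    for subset in combinations(edges, size)
    if is_generalized_loop(subset)
]
nonempty_loops = [loop for loop in generalized_loops if loop]
```

In this example only the empty set and the full cycle survive, because any subset containing the dangling edge leaves `i4` with degree 1.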
Loop calculus for a non-binary alphabet 1/2
Theorem (This work)
For any stationary point $((b_i), (b_a))$ of the Bethe free energy,
\[ Z(G) = Z_{\mathrm{Bethe}}((b_i)_{i \in V}, (b_a)_{a \in F}) \sum_{E' \subseteq E} Z(E') \]
where
\[ Z(E') := \sum_{y \in (\mathcal{X} \setminus \{0\})^{|E'|}} \prod_{i \in V} \Biggl\langle \prod_{a \in \partial i,\ (i,a) \in E'} \frac{\partial \log b_i(X_i)}{\partial \eta_{i, y_{i,a}}} \Biggr\rangle_{b_i} \cdot \prod_{a \in F} \Biggl\langle \prod_{i \in \partial a,\ (i,a) \in E'} \frac{\partial \log b_i(X_i)}{\partial \theta_{i, y_{i,a}}} \Biggr\rangle_{b_a}. \]
Coordinate systems: the natural parameters $(\theta_{i,y})_{y \in \mathcal{X} \setminus \{0\}}$ and the expectation parameters $(\eta_{i,y})_{y \in \mathcal{X} \setminus \{0\}}$.
Loop calculus for a non-binary alphabet 2/2
The Jacobian matrix $\partial \theta / \partial \eta$ is the Fisher information matrix.
Theorem (This work)
If one chooses a sufficient statistic $t_i(x_i)$ for $i \in V$ such that the Fisher information matrix is diagonal at $b_i$, it holds that
\[ Z(E') = \sum_{y \in (\mathcal{X} \setminus \{0\})^{|E'|}} \prod_{i \in V} \Biggl\langle \prod_{a \in \partial i,\ (i,a) \in E'} \frac{t_{i, y_{i,a}}(X_i) - \eta_{i, y_{i,a}}}{\sqrt{\bigl\langle (t_{i, y_{i,a}}(X_i) - \eta_{i, y_{i,a}})^2 \bigr\rangle_{b_i}}} \Biggr\rangle_{b_i} \cdot \prod_{a \in F} \Biggl\langle \prod_{i \in \partial a,\ (i,a) \in E'} \frac{t_{i, y_{i,a}}(X_i) - \eta_{i, y_{i,a}}}{\sqrt{\bigl\langle (t_{i, y_{i,a}}(X_i) - \eta_{i, y_{i,a}})^2 \bigr\rangle_{b_i}}} \Biggr\rangle_{b_a}. \]
Acknowledgment: P. Vontobel, for insightful discussions about normal factor graphs.
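Loop calculus for expectations
Theorem (This work; it can be simplified like the previous theorem)
Let $C \subseteq V$, $F_C := \{ a \in F \mid \partial a \subseteq C \}$ and $g : \mathcal{X}^{|C|} \to \mathbb{R}$. For any $((b_i), (b_a)) \in \mathcal{A}$, it holds that
\[ Z \langle g(X_C) \rangle_p = Z_{\mathrm{Bethe}}((b_i)_{i \in V}, (b_a)_{a \in F}) \sum_{E' \subseteq E \setminus E(F_C)} Z(E') \]
where
\[ Z(E') := \sum_{y \in (\mathcal{X} \setminus \{0\})^{|E'|}} \prod_{i \in V \setminus C} \Biggl\langle \prod_{a \in \partial i,\ (i,a) \in E'} \frac{\partial \log b_i(X_i)}{\partial \eta_{i, y_{i,a}}} \Biggr\rangle_{b_i} \cdot \prod_{a \in F \setminus F_C} \Biggl\langle \prod_{i \in \partial a,\ (i,a) \in E'} \frac{\partial \log b_i(X_i)}{\partial \theta_{i, y_{i,a}}} \Biggr\rangle_{b_a} \cdot \Biggl\langle g(X_C) \prod_{i \in C,\ (i,a) \in E'} \frac{\partial \log b_i(X_i)}{\partial \eta_{i, y_{i,a}}} \Biggr\rangle_{b_C}. \]
Here, $\langle \cdot \rangle_{b_C}$ is a pseudo-expectation with respect to
\[ b_C(x_C) = \prod_{i \in C} b_i(x_i) \prod_{a \in F_C} \frac{b_a(x_{\partial a})}{\prod_{i \in \partial a} b_i(x_i)}. \]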
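Loop calculus for a single-cycle graph
[Figure: a single-cycle factor graph with variable nodes $i_1, i_2, i_3$ and factor nodes $a_1, a_2, a_3$.]
\[ \mathrm{Cor}_{b_{a_k}}[t_{i_k}(X_{i_k}), t_{i_{k+1}}(X_{i_{k+1}})] := \mathrm{Var}_{b_{i_k}}[t_{i_k}(X_{i_k})]^{-\frac{1}{2}}\, \mathrm{Cov}_{b_{a_k}}[t_{i_k}(X_{i_k}), t_{i_{k+1}}(X_{i_{k+1}})]\, \mathrm{Var}_{b_{i_{k+1}}}[t_{i_{k+1}}(X_{i_{k+1}})]^{-\frac{1}{2}}. \]
Corollary (Partition function of a single-cycle factor graph)
\[ Z(G) = Z_{\mathrm{Bethe}}((b_i)_{i \in V}, (b_a)_{a \in F}) \cdot \Bigl( 1 + \mathrm{tr}\bigl( \mathrm{Cor}_{b_{a_1}}[t_{i_1}(X_{i_1}), t_{i_2}(X_{i_2})]\, \mathrm{Cor}_{b_{a_2}}[t_{i_2}(X_{i_2}), t_{i_3}(X_{i_3})] \cdots \mathrm{Cor}_{b_{a_n}}[t_{i_n}(X_{i_n}), t_{i_1}(X_{i_1})] \bigr) \Bigr). \]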
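For a binary alphabet with $t_i(x) = x$, each correlation matrix is a scalar and the corollary reads $Z = Z_{\mathrm{Bethe}} (1 + \prod_k \mathrm{Cor}_k)$. The sketch below checks this numerically under several assumptions of its own: a hypothetical cycle of three random positive pairwise factors, BP run as sequential message sweeps, and all variable names invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
# f[k][x_k, x_{k+1}]: hypothetical positive factor a_k joining i_k and i_{k+1 mod n}.
f = [rng.uniform(0.5, 1.5, size=(2, 2)) for _ in range(n)]

# Exact Z of the cycle: trace of the product of the factor tables.
Z = np.trace(np.linalg.multi_dot(f))

def normalize(v):
    return v / v.sum()

fwd = [np.ones(2) / 2 for _ in range(n)]   # message a_{k-1} -> i_k
bwd = [np.ones(2) / 2 for _ in range(n)]   # message a_k -> i_k
for _ in range(500):                        # sequential BP sweeps around the cycle
    for k in range(n):
        fwd[(k + 1) % n] = normalize(f[k].T @ fwd[k])
    for k in reversed(range(n)):
        bwd[k] = normalize(f[k] @ bwd[(k + 1) % n])

# Beliefs at the (numerically converged) BP fixed point.
b_var = [normalize(fwd[k] * bwd[k]) for k in range(n)]
b_fac = []
for k in range(n):
    b = fwd[k][:, None] * f[k] * bwd[(k + 1) % n][None, :]
    b_fac.append(b / b.sum())

def entropy(b):
    b = np.ravel(b)
    return -np.sum(b * np.log(b))

# log Z_Bethe = sum_a (E_{b_a}[log f_a] + H(b_a)) - sum_i (d_i - 1) H(b_i), with d_i = 2.
log_z_bethe = sum(np.sum(b_fac[k] * np.log(f[k])) + entropy(b_fac[k]) for k in range(n))
log_z_bethe -= sum(entropy(b) for b in b_var)

x = np.array([0.0, 1.0])                    # sufficient statistic t(x) = x

def cor(k):
    """Correlation coefficient of X_k and X_{k+1} under the factor belief b_{a_k}."""
    eta_k, eta_next = b_var[k] @ x, b_var[(k + 1) % n] @ x
    cov = x @ b_fac[k] @ x - eta_k * eta_next
    return cov / np.sqrt(eta_k * (1 - eta_k) * eta_next * (1 - eta_next))

z_loop = np.exp(log_z_bethe) * (1 + np.prod([cor(k) for k in range(n)]))
```

On a cycle the message updates amount to power iteration on the product of the factor tables, so convergence is geometric for strictly positive factors.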
Correlation matrix on a tree factor graph
[Figure: a tree factor graph with variable nodes $i_1, \dots, i_4$ and factor nodes $a_1, a_2, a_3$.]
Corollary (Correlation matrix on a tree factor graph; Watanabe 2010)
\[ \mathrm{Cor}_p[X_1, X_n] = \mathrm{Cor}_p[t_1(X_1), t_2(X_2)]\, \mathrm{Cor}_p[t_2(X_2), t_3(X_3)] \cdots \mathrm{Cor}_p[t_{n-1}(X_{n-1}), t_n(X_n)] \]
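In the binary case with $t(x) = x$ the corollary says that correlation coefficients multiply along a path. The sketch below checks this by brute force on a hypothetical three-variable chain; the factor tables and the helper `correlation` are assumptions made for illustration.

```python
import numpy as np

# Hypothetical chain X1 - X2 - X3 with arbitrary positive pairwise factors.
f12 = np.array([[2.0, 1.0], [0.5, 3.0]])
f23 = np.array([[1.0, 4.0], [2.0, 1.0]])
p = f12[:, :, None] * f23[None, :, :]
p /= p.sum()

x = np.array([0.0, 1.0])

def correlation(i, j):
    """Correlation coefficient of binary variables i < j (0-based axes) under p."""
    pij = p.sum(axis=3 - i - j)              # marginalize out the remaining axis
    mean_i = pij.sum(axis=1) @ x
    mean_j = pij.sum(axis=0) @ x
    cov = x @ pij @ x - mean_i * mean_j
    return cov / np.sqrt(mean_i * (1 - mean_i) * mean_j * (1 - mean_j))

end_to_end = correlation(0, 2)
product_along_path = correlation(0, 1) * correlation(1, 2)
```

For larger alphabets the scalars become matrices and the order of the product matters, as in the corollary.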
Graph cover
[Figure: the base factor graph $G$ with variable nodes $i_1, \dots, i_4$ and factor nodes $a_1, a_2, a_3$, and an $M$-cover $G_\sigma$ (drawn for $M = 3$) with copies $i_k^{(0)}, i_k^{(1)}, i_k^{(2)}$ of each variable node and $a_k^{(0)}, a_k^{(1)}, a_k^{(2)}$ of each factor node.]
\[ Z(G)^M \overset{?}{\approx} Z(G_\sigma) \]
The method of graph cover
Lemma (Vontobel 2010)
\[ \log \langle Z_{\Sigma_M} \rangle = M \log Z_{\mathrm{Bethe}} + o(M) \]
Sketch of the proof. The method of types and the Laplace method.
The second-order analysis for graph cover
Lemma (This work)
\[ \log \langle Z_{\Sigma_M} \rangle = M \log Z_{\mathrm{Bethe}} + \log \sqrt{\zeta(u)} + o(1) \]
where $\zeta(u)$ is the edge zeta function and $u^{a}_{i \to j} = \mathrm{Cor}_{b_a}[t_i(X_i), t_j(X_j)]$.
Sketch of the proof. Laplace method with the central approximation.
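The shape of this expansion, a term linear in $M$ plus a constant plus a vanishing remainder, can be observed by brute force on a single cycle. The sketch below rests on assumptions that are not in the slides: that averaging over $M$-covers of a single cycle reduces to averaging over one uniformly random permutation $\sigma$ of $M$ elements, and that the cover's partition function is $\prod_\ell \mathrm{tr}(T^\ell)$ over the cycle lengths $\ell$ of $\sigma$, where `T` is a hypothetical product of the factor tables around the base cycle.

```python
import itertools
import math
import numpy as np

# Hypothetical product of factor tables around the base cycle (symmetric for simplicity).
T = np.array([[2.0, 1.0], [1.0, 3.0]])
lam2, lam1 = np.linalg.eigvalsh(T)           # eigenvalues in ascending order

def cycle_lengths(perm):
    """Lengths of the cycles of a permutation given as a tuple of images."""
    seen, lengths = set(), []
    for start in range(len(perm)):
        if start in seen:
            continue
        length, j = 0, start
        while j not in seen:
            seen.add(j)
            j = perm[j]
            length += 1
        lengths.append(length)
    return lengths

def average_cover_partition_function(M):
    """Average of prod_l tr(T^l) over all M! permutations (assumed cover model)."""
    traces = [np.trace(np.linalg.matrix_power(T, l)) for l in range(M + 1)]
    total = sum(
        math.prod(traces[l] for l in cycle_lengths(perm))
        for perm in itertools.permutations(range(M))
    )
    return total / math.factorial(M)

def complete_homogeneous(M):
    """h_M(lam1, lam2): closed form of the permutation average for two eigenvalues."""
    return sum(lam1 ** k * lam2 ** (M - k) for k in range(M + 1))

M = 6
log_average = math.log(average_cover_partition_function(M))
two_term_prediction = M * math.log(lam1) - math.log(1 - lam2 / lam1)
```

Here $M \log \lambda_1$ plays the role of $M \log Z_{\mathrm{Bethe}}$ and $-\log(1 - \lambda_2 / \lambda_1)$ the role of $\log \sqrt{\zeta(u)}$; the identification of $Z_{\mathrm{Bethe}}$ with $\lambda_1$ on a single cycle is assumed here, not verified by this block.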
Interpretation of Legendre transformation by large deviation
\[ \log Z(G) = \frac{1}{M} \log Z(G)^M = \lim_{M \to \infty} \frac{1}{M} \log Z(G)^M = -\inf_{p \in \mathcal{P}(\mathcal{X}^N)} \Bigl\{ -\sum_{x \in \mathcal{X}^N} p(x) \log \prod_{a \in F} f_a(x_{\partial a}) - H(p) \Bigr\} \]
From a more detailed analysis (asymptotic expansion):
\[ \log Z(G)^M = M \log Z(G) + \underbrace{\log \sqrt{\frac{\det(J(\theta))}{\prod_x p(x)}}}_{=0} + \frac{1}{M} \cdot 0 + \frac{1}{M^2} \cdot 0 + \cdots \]
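The vanishing of the second-order term rests on the identity $\det(J(\theta)) = \prod_x p(x)$. The sketch below checks it under an assumption the slide does not spell out: that $J(\theta)$ is the Fisher information of the categorical family in natural parameters, with indicator statistics for every symbol except a reference symbol, so that $J$ is the covariance matrix of those indicators. The probabilities are arbitrary.

```python
import numpy as np

# Hypothetical categorical distribution; index 0 is the reference symbol.
p = np.array([0.1, 0.2, 0.3, 0.4])
rest = p[1:]

# Covariance of the indicator statistics 1{X = y} for y != 0.
J = np.diag(rest) - np.outer(rest, rest)

det_J = np.linalg.det(J)
second_order_term = 0.5 * np.log(det_J / np.prod(p))
```

The determinant factorizes as $\prod_{y \neq 0} p(y) \cdot (1 - \sum_{y \neq 0} p(y))$, and the last factor is the probability of the reference symbol.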
Asymptotic expansion and asymptotic Bethe approximation
\[ \log Z(G)^M = M \log Z(G) + \underbrace{\log \sqrt{\frac{\det(J(\theta))}{\prod_x p(x)}}}_{=0} + \frac{1}{M} \cdot 0 + \frac{1}{M^2} \cdot 0 + \cdots \]
\[ \log \langle Z_{\Sigma_M} \rangle = M \log Z_{\mathrm{Bethe}} + \underbrace{\log \sqrt{\frac{\det(\nabla^2 F_{\mathrm{Bethe}})^{-1}}{\prod_{i \in V} \prod_{x_i} b_i(x_i)^{1 - d_i} \prod_{a \in F} \prod_{x_{\partial a}} b_a(x_{\partial a})}}}_{= \log \sqrt{\zeta(u)}\ \text{[Watanabe and Fukumizu 2010]}} + \frac{1}{M} g_1 + \frac{1}{M^2} g_2 + \cdots. \]
By letting $M = 1$:
Definition (Asymptotic Bethe approximation)
For $m = 1, 2, \dots$,
\[ \log Z^{(m)}_{\mathrm{AB}} := \log Z_{\mathrm{Bethe}} + \log \sqrt{\zeta(u)} + g_1 + g_2 + \cdots + g_{m-1}. \]
Edge zeta function
Definition (Prime cycle)
A closed walk $e_1 \rightharpoonup e_2 \rightharpoonup \cdots \rightharpoonup e_n \rightharpoonup e_1$ is a prime cycle $\iff$ it is backtrackless and cannot be expressed as a power of another walk.
Definition (Edge zeta function)
\[ \zeta(u) = \prod_{(e_1 \rightharpoonup e_2 \rightharpoonup \cdots \rightharpoonup e_n \rightharpoonup e_1)\ \text{is a prime cycle}} \frac{1}{\det(I - u_{e_1, e_2} u_{e_2, e_3} \cdots u_{e_n, e_1})}. \]
Lemma (Watanabe-Fukumizu formula; 2010)
\[ \zeta(u)^{-1} = \det\bigl(\nabla^2 F_{\mathrm{Bethe}}((\eta_i), (\eta_{\langle a \rangle}))\bigr) \prod_{i \in V} \det(\mathrm{Var}_{b_i}[t_i(X_i)])^{1 - d_i} \cdot \prod_{a \in F} \det(\mathrm{Var}_{b_a}[t_a(X_{\partial a})]) \]
where $u^{a}_{i \to j} = \mathrm{Cor}_{b_a}[t_i(X_i), t_j(X_j)]$.
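For scalar weights, the product over prime cycles can be compared with the standard determinant expression $\zeta(u)^{-1} = \det(I - W)$, where $W$ is the matrix of weights between consecutive directed edges with backtracking excluded. This determinant formula is a known result about edge zeta functions that the slides do not state; the sketch below only checks it on a hypothetical weighted triangle, whose two prime cycles are its two orientations.

```python
import numpy as np

# Hypothetical scalar weights u_{e, e'} along the clockwise and counter-clockwise
# orientations of a triangle (directed edges 0-2 are clockwise, 3-5 counter-clockwise).
w_cw = [0.3, 0.5, 0.2]
w_ccw = [0.4, 0.1, 0.6]

W = np.zeros((6, 6))
for k in range(3):
    W[k, (k + 1) % 3] = w_cw[k]               # clockwise edge k feeds clockwise edge k+1
    W[3 + k, 3 + (k + 1) % 3] = w_ccw[k]      # same for the reverse orientation

# Determinant form of the inverse zeta function.
zeta_inv_det = np.linalg.det(np.eye(6) - W)

# Product over the two prime cycles (one per orientation).
zeta_inv_product = (1 - np.prod(w_cw)) * (1 - np.prod(w_ccw))
```

Backtracklessness is what keeps the two orientations decoupled: a clockwise edge never feeds its own reverse, so `W` is block diagonal.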
Single-cycle graph
Let
\[ A := \mathrm{Cor}_{b_{a_1}}[t_{i_1}(X_{i_1}), t_{i_2}(X_{i_2})]\, \mathrm{Cor}_{b_{a_2}}[t_{i_2}(X_{i_2}), t_{i_3}(X_{i_3})] \cdots \mathrm{Cor}_{b_{a_n}}[t_{i_n}(X_{i_n}), t_{i_1}(X_{i_1})]. \]
Then the true partition function $Z$ and the asymptotic Bethe approximation $Z^{(1)}_{\mathrm{AB}}$ are
\[ Z = Z_{\mathrm{Bethe}}((b_i)_{i \in V}, (b_a)_{a \in F}) \bigl( 1 + \mathrm{tr}(A) \bigr), \]
\[ Z^{(1)}_{\mathrm{AB}} = Z_{\mathrm{Bethe}}((b_i)_{i \in V}, (b_a)_{a \in F}) \frac{1}{\det(I - A)} = Z_{\mathrm{Bethe}}((b_i)_{i \in V}, (b_a)_{a \in F}) \bigl( 1 + \mathrm{tr}(A) + O(\rho(A)^2) \bigr), \]
where $\rho(A)$ is the spectral radius of $A$.
The asymptotic Bethe approximation is accurate when $A \approx 0$.
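The gap between $1/\det(I - A)$ and $1 + \mathrm{tr}(A)$ can be inspected numerically. The sketch below uses a hypothetical $2 \times 2$ matrix in place of the product of correlation matrices and checks that the gap is of order $\rho(A)^2$; the helper name `ratio_gap` is illustrative.

```python
import numpy as np

def ratio_gap(A):
    """|1/det(I - A) - (1 + tr A)|: gap between Z_AB^(1)/Z_Bethe and Z/Z_Bethe."""
    asymptotic_bethe = 1.0 / np.linalg.det(np.eye(A.shape[0]) - A)
    exact = 1.0 + np.trace(A)
    return abs(asymptotic_bethe - exact)

# Hypothetical stand-in for the product of correlation matrices around the cycle.
A = np.array([[0.02, 0.01],
              [0.00, -0.01]])
rho = max(abs(np.linalg.eigvals(A)))          # spectral radius

gap = ratio_gap(A)
gap_scaled = ratio_gap(0.1 * A)               # expected to shrink roughly 100-fold
```

Scaling `A` by 0.1 shrinks the gap by about two orders of magnitude, consistent with a quadratic remainder.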