  1. Probabilistic & Unsupervised Learning: Belief Propagation

Maneesh Sahani (maneesh@gatsby.ucl.ac.uk)
Gatsby Computational Neuroscience Unit, and MSc ML/CSML, Dept. of Computer Science, University College London
Term 1, Autumn 2014

  2. Recall: Belief Propagation on undirected trees

Joint distribution of an undirected tree:
$$ p(X) = \frac{1}{Z} \prod_{\text{nodes } i} f_i(X_i) \prod_{\text{edges } (ij)} f_{ij}(X_i, X_j) $$

Messages are computed recursively:
$$ M_{j \to i}(X_i) := \sum_{X_j} f_{ij}(X_i, X_j)\, f_j(X_j) \prod_{l \in \mathrm{ne}(j) \setminus i} M_{l \to j}(X_j) $$

Marginal distributions:
$$ p(X_i) \propto f_i(X_i) \prod_{k \in \mathrm{ne}(i)} M_{k \to i}(X_i) $$
$$ p(X_i, X_j) \propto f_{ij}(X_i, X_j)\, f_i(X_i)\, f_j(X_j) \prod_{k \in \mathrm{ne}(i) \setminus j} M_{k \to i}(X_i) \prod_{l \in \mathrm{ne}(j) \setminus i} M_{l \to j}(X_j) $$
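As a concrete illustration (not from the slides), here is a minimal numpy sketch of these recursions on a three-node chain X0 - X1 - X2 with binary variables and arbitrary random positive factors; the BP marginals are checked against brute-force enumeration:

```python
import numpy as np

# A minimal sketch of BP on an undirected tree: the chain X0 - X1 - X2.
# All factor tables here are made up for illustration.
rng = np.random.default_rng(0)
K = 2                                   # number of states per variable
f = [rng.random(K) for _ in range(3)]   # node factors f_i(X_i)
f01 = rng.random((K, K))                # edge factor f_01(X_0, X_1)
f12 = rng.random((K, K))                # edge factor f_12(X_1, X_2)

# Messages, leaves inward then outward (root = X1):
#   M_{j->i}(X_i) = sum_{X_j} f_ij(X_i,X_j) f_j(X_j) prod_{l in ne(j)\i} M_{l->j}(X_j)
m_0to1 = f01.T @ f[0]                   # X0 is a leaf: no incoming messages
m_2to1 = f12 @ f[2]                     # X2 is a leaf
m_1to0 = f01 @ (f[1] * m_2to1)
m_1to2 = f12.T @ (f[1] * m_0to1)

def normalise(v):
    return v / v.sum()

# Marginals: p(X_i) ∝ f_i(X_i) prod_{k in ne(i)} M_{k->i}(X_i)
p0 = normalise(f[0] * m_1to0)
p1 = normalise(f[1] * m_0to1 * m_2to1)
p2 = normalise(f[2] * m_1to2)

# Check against the brute-force joint p(X) ∝ f0 f1 f2 f01 f12.
joint = np.einsum('a,b,c,ab,bc->abc', f[0], f[1], f[2], f01, f12)
joint /= joint.sum()
assert np.allclose(p0, joint.sum(axis=(1, 2)))
assert np.allclose(p1, joint.sum(axis=(0, 2)))
assert np.allclose(p2, joint.sum(axis=(0, 1)))
print(p0, p1, p2)
```

On a tree, these messages are exact after a single inward and outward pass.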

  3. Loopy Belief Propagation

Joint distribution of an undirected graph:
$$ p(X) = \frac{1}{Z} \prod_{\text{nodes } i} f_i(X_i) \prod_{\text{edges } (ij)} f_{ij}(X_i, X_j) $$

Messages are computed recursively (with few guarantees of convergence):
$$ M_{j \to i}(X_i) := \sum_{X_j} f_{ij}(X_i, X_j)\, f_j(X_j) \prod_{l \in \mathrm{ne}(j) \setminus i} M_{l \to j}(X_j) $$

Marginal distributions are approximate in general:
$$ p(X_i) \approx b_i(X_i) \propto f_i(X_i) \prod_{k \in \mathrm{ne}(i)} M_{k \to i}(X_i) $$
$$ p(X_i, X_j) \approx b_{ij}(X_i, X_j) \propto f_{ij}(X_i, X_j)\, f_i(X_i)\, f_j(X_j) \prod_{k \in \mathrm{ne}(i) \setminus j} M_{k \to i}(X_i) \prod_{l \in \mathrm{ne}(j) \setminus i} M_{l \to j}(X_j) $$
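A sketch of the loopy version on a four-node cycle, with illustrative names throughout; the converged beliefs come out close to, but not exactly equal to, the true marginals:

```python
import numpy as np
from itertools import product

# A minimal sketch of loopy BP on a 4-cycle of binary variables, comparing the
# converged beliefs b_i to the exact marginals. Tables are made up.
rng = np.random.default_rng(1)
K, n = 2, 4
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]               # a single loop
f = [rng.random(K) + 0.5 for _ in range(n)]            # node factors
F = {e: rng.random((K, K)) + 0.5 for e in edges}       # edge factors F[(i,j)](X_i, X_j)

def factor(i, j):
    """Edge factor as an (X_i, X_j)-indexed table, whichever way it is stored."""
    return F[(i, j)] if (i, j) in F else F[(j, i)].T

ne = {i: [j for e in edges for j in e if i in e and j != i] for i in range(n)}
M = {(j, i): np.ones(K) for i in range(n) for j in ne[i]}  # messages M_{j->i}

for sweep in range(100):                                # no convergence guarantee
    for (j, i) in list(M):
        incoming = np.prod([M[(l, j)] for l in ne[j] if l != i], axis=0)
        m = factor(i, j) @ (f[j] * incoming)            # sum over X_j
        M[(j, i)] = m / m.sum()                         # normalise for stability

beliefs = [f[i] * np.prod([M[(k, i)] for k in ne[i]], axis=0) for i in range(n)]
beliefs = [b / b.sum() for b in beliefs]

# Exact marginals by brute force over all 2^4 states.
joint = np.zeros((K,) * n)
for x in product(range(K), repeat=n):
    joint[x] = np.prod([f[i][x[i]] for i in range(n)]) * \
               np.prod([factor(i, j)[x[i], x[j]] for (i, j) in edges])
joint /= joint.sum()
for i in range(n):
    exact = joint.sum(axis=tuple(a for a in range(n) if a != i))
    print(i, beliefs[i], exact)                         # close, but not identical
```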

  4. Dealing with loops

◮ Accuracy: BP posterior marginals are approximate on all non-trees, but converged approximations are frequently found to be good.
◮ Convergence: no general guarantee, but BP does converge in some cases:
  ◮ Trees.
  ◮ Graphs with a single loop.
  ◮ Distributions with sufficiently weak interactions.
  ◮ Graphs with long (and weak) loops.
  ◮ Gaussian networks: means correct; variances may also converge.
◮ Damping: a common approach to encourage convergence (cf. EP); see the sketch after this slide:
$$ M^{\text{new}}_{i \to j}(X_j) := (1 - \alpha)\, M^{\text{old}}_{i \to j}(X_j) + \alpha \sum_{X_i} f_{ij}(X_i, X_j)\, f_i(X_i) \prod_{k \in \mathrm{ne}(i) \setminus j} M_{k \to i}(X_i) $$
◮ Grouping variables: variables can be grouped into cliques to improve accuracy:
  ◮ Region graph approximations.
  ◮ Cluster variational method.
  ◮ Junction graph.
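A minimal sketch of the damped update, assuming the message dictionary `M`, node factors `f`, neighbour lists `ne`, and the `factor` helper from the loopy-BP sketch above (alpha = 1 recovers the undamped update):

```python
import numpy as np

def damped_update(M, i, j, f, factor, ne, alpha=0.5):
    """One damped update of the message M_{i->j}(X_j).

    M_new = (1 - alpha) * M_old + alpha * M_BP, with M_BP the plain BP message.
    """
    incoming = np.prod([M[(k, i)] for k in ne[i] if k != j], axis=0)
    m_bp = factor(j, i) @ (f[i] * incoming)   # sum over X_i of f_ij f_i prod(...)
    m_bp = m_bp / m_bp.sum()                  # normalise for numerical stability
    return (1 - alpha) * M[(i, j)] + alpha * m_bp
```

Mixing old and new messages in this way often suppresses the oscillations that prevent plain loopy BP from settling.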

  5. Different Interpretations of Loopy Belief Propagation

Loopy BP can be interpreted as a fixed-point algorithm from a few different perspectives:
◮ Expectation propagation.
◮ Tree-based reparametrisation.
◮ Bethe free energy.


  6. Loopy BP as message-based Expectation Propagation

Approximate each pairwise factor f_ij by a product of messages:
$$ f_{ij}(X_i, X_j) \approx \tilde f_{ij}(X_i, X_j) = M_{i \to j}(X_j)\, M_{j \to i}(X_i) $$

Thus the full joint is approximated by a factorised distribution:
$$ p(X) \approx \frac{1}{Z} \prod_{\text{nodes } i} f_i(X_i) \prod_{\text{edges } (ij)} \tilde f_{ij}(X_i, X_j) = \frac{1}{Z} \prod_{\text{nodes } i} f_i(X_i) \prod_{j \in \mathrm{ne}(i)} M_{j \to i}(X_i) = \prod_{\text{nodes } i} b_i(X_i) $$

but with multiple message factors collected at most X_i.
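The factorisation claim is purely algebraic and holds for any positive messages, as this small check (with made-up message tables on a three-cycle) illustrates:

```python
import numpy as np

# Replacing each pairwise factor by tilde-f_ij = M_{i->j}(X_j) M_{j->i}(X_i)
# turns the joint into a product of per-node beliefs
# b_i(X_i) ∝ f_i(X_i) prod_{j in ne(i)} M_{j->i}(X_i). Random tables throughout.
rng = np.random.default_rng(5)
K = 2
edges = [(0, 1), (1, 2), (2, 0)]                     # a loopy 3-cycle
f = [rng.random(K) for _ in range(3)]
M = {(i, j): rng.random(K)                           # M_{i->j} is a table over X_j
     for (a, b) in edges for (i, j) in [(a, b), (b, a)]}

# Joint with each f_ij replaced by its message product, then normalised.
approx = np.einsum('a,b,c,b,a,c,b,a,c->abc',
                   f[0], f[1], f[2],
                   M[(0, 1)], M[(1, 0)],             # tilde-f_01(X_0, X_1)
                   M[(1, 2)], M[(2, 1)],             # tilde-f_12(X_1, X_2)
                   M[(2, 0)], M[(0, 2)])             # tilde-f_20(X_2, X_0)

# Collect the same terms node by node: the beliefs.
b = [f[0] * M[(1, 0)] * M[(2, 0)],
     f[1] * M[(0, 1)] * M[(2, 1)],
     f[2] * M[(1, 2)] * M[(0, 2)]]
b = [x / x.sum() for x in b]
assert np.allclose(approx / approx.sum(), np.einsum('a,b,c->abc', *b))
print("message-product approximation factorises into beliefs")
```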

  7. Loopy BP as message-based EP

[Figure: two neighbouring nodes X_i and X_j, whose edge factor f_ij is the factor being approximated by the message product.]

The EP updates to the messages are then:

◮ Deletion:
$$ q^{\neg ij}(X) = f_i(X_i)\, f_j(X_j) \prod_{k \in \mathrm{ne}(i) \setminus j} M_{k \to i}(X_i) \prod_{l \in \mathrm{ne}(j) \setminus i} M_{l \to j}(X_j) \prod_{s \neq i,j} f_s(X_s) \prod_{t \in \mathrm{ne}(s)} M_{t \to s}(X_s) $$

◮ Projection:
$$ \{ M^{\text{new}}_{i \to j}, M^{\text{new}}_{j \to i} \} = \operatorname*{argmin}_{M_{i \to j},\, M_{j \to i}} \mathrm{KL}\!\left[ f_{ij}(X_i, X_j)\, q^{\neg ij}(X) \,\middle\|\, M_{j \to i}(X_i)\, M_{i \to j}(X_j)\, q^{\neg ij}(X) \right] $$

Now, q^{¬ij}() factorises ⇒ the right-hand side factorises ⇒ the minimum is achieved by the marginals of f_ij() q^{¬ij}(). Writing q^{¬ij}(X_i) ∝ f_i(X_i) ∏_{k ∈ ne(i)\j} M_{k→i}(X_i) for the X_i marginal of the cavity distribution:
$$ M^{\text{new}}_{j \to i}(X_i)\, q^{\neg ij}(X_i) = \sum_{X_j} f_{ij}(X_i, X_j)\, f_j(X_j) \prod_{l \in \mathrm{ne}(j) \setminus i} M_{l \to j}(X_j) \; f_i(X_i) \prod_{k \in \mathrm{ne}(i) \setminus j} M_{k \to i}(X_i) $$
$$ \Rightarrow \quad M^{\text{new}}_{j \to i}(X_i) = \sum_{X_j} f_{ij}(X_i, X_j)\, f_j(X_j) \prod_{l \in \mathrm{ne}(j) \setminus i} M_{l \to j}(X_j) $$

which is exactly the loopy BP message update.
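A numeric check of the projection step (a sketch with made-up tables): matching the marginals of f_ij · q^{¬ij} and dividing out the cavity marginals reproduces exactly the BP messages:

```python
import numpy as np

# EP projection vs. the BP update
#   M_{j->i}(X_i) = sum_{X_j} f_ij(X_i,X_j) f_j(X_j) prod_{l in ne(j)\i} M_{l->j}(X_j)
# on a single edge, with all incoming-message products lumped into m_to_i, m_to_j.
rng = np.random.default_rng(2)
K = 3
f_i, f_j = rng.random(K), rng.random(K)
f_ij = rng.random((K, K))                       # indexed (X_i, X_j)
m_to_i = rng.random(K)                          # prod of messages into i, except from j
m_to_j = rng.random(K)                          # prod of messages into j, except from i

# Cavity distribution restricted to (X_i, X_j): q^{~ij} factorises.
q_i, q_j = f_i * m_to_i, f_j * m_to_j
tilted = f_ij * np.outer(q_i, q_j)              # f_ij(X_i,X_j) q_i(X_i) q_j(X_j)

# EP projection: new messages chosen so the marginals match the tilted marginals.
M_ji_ep = tilted.sum(axis=1) / q_i              # marginal over X_j, cavity divided out
M_ij_ep = tilted.sum(axis=0) / q_j

# Plain BP messages.
M_ji_bp = f_ij @ (f_j * m_to_j)
M_ij_bp = f_ij.T @ (f_i * m_to_i)
assert np.allclose(M_ji_ep, M_ji_bp) and np.allclose(M_ij_ep, M_ij_bp)
print("EP projection reproduces the BP messages")
```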


  8. Loopy BP as tree-based reparametrisation

Tree-structured distributions can be parametrised in many ways:
$$ p(X) = \frac{1}{Z} \prod_{\text{nodes } i} f_i(X_i) \prod_{\text{edges } (ij)} f_{ij}(X_i, X_j) \qquad \text{undirected tree} \quad (1) $$
$$ \phantom{p(X)} = p(X_r) \prod_{i \neq r} p(X_i \mid X_{\mathrm{pa}(i)}) \qquad \text{directed (rooted) tree} \quad (2) $$
$$ \phantom{p(X)} = \prod_{\text{nodes } i} p(X_i) \prod_{\text{edges } (ij)} \frac{p(X_i, X_j)}{p(X_i)\, p(X_j)} \qquad \text{pairwise marginals} \quad (3) $$
where (3) requires that $\sum_{X_j} p(X_i, X_j) = p(X_i)$.

The undirected tree representation is not unique: multiplying a factor f_ij(X_i, X_j) by g(X_i) and dividing f_i(X_i) by the same g(X_i) does not change the distribution. BP can be seen as an iterative replacement of f_i(X_i) by the local marginal of p_ij(X_i, X_j), along with the corresponding reparametrisation of f_ij(X_i, X_j). Cf. Hugin propagation.

Converged BP on a tree finds p(X_i) and p(X_i, X_j), allowing us to transform (1) into (3).
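A quick check (a sketch on a three-node chain with random factors) that the pairwise-marginal parametrisation (3) reproduces the joint (1):

```python
import numpy as np

# On a tree, p(X) = prod_i p(X_i) prod_(ij) p(X_i,X_j) / (p(X_i) p(X_j))
# equals the original factorised joint. Chain X0 - X1 - X2, binary states.
rng = np.random.default_rng(3)
K = 2
f0, f1, f2 = (rng.random(K) for _ in range(3))
f01, f12 = rng.random((K, K)), rng.random((K, K))

# Exact joint, normalised: form (1).
joint = np.einsum('a,b,c,ab,bc->abc', f0, f1, f2, f01, f12)
joint /= joint.sum()

# Node and pairwise marginals (by brute force here; converged BP gives the same).
p0, p1, p2 = joint.sum((1, 2)), joint.sum((0, 2)), joint.sum((0, 1))
p01, p12 = joint.sum(2), joint.sum(0)

# Form (3): node marginals times the pairwise "correction" ratios.
recon = np.einsum('a,b,c,ab,bc->abc', p0, p1, p2,
                  p01 / np.outer(p0, p1), p12 / np.outer(p1, p2))
assert np.allclose(recon, joint)
print("parametrisations (1) and (3) agree")
```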

  9. Reparametrisation on trees

[Figure: a tree on nodes X_a, ..., X_g with edge factors f_ab, f_ac, f_ad, f_de, f_df, f_dg and unit node factors. Successive panels show one update on edge (ab): the pairwise term 1 · f_ab · 1 is marginalised to give the message M_{b→a}, which becomes the node factor at X_a while the edge factor becomes f_ab / M_{b→a}; the next update, on edge (ac), then starts from M_{b→a} · f_ac · 1.]

$$ p(X) = \prod_{\text{edges } (ij)} f_{ij}(X_i, X_j) \quad \Downarrow \quad p(X) = \prod_{\text{nodes } i} p(X_i) \prod_{\text{edges } (ij)} \frac{p(X_i, X_j)}{p(X_i)\, p(X_j)} $$

Define f^0_{ij} = f_{ij}, f^0_i = p^0_i = 1. Iterate over edges (ij):
$$ p^n(X_i, X_j) = f^{n-1}_i(X_i)\, f^{n-1}_{ij}(X_i, X_j)\, f^{n-1}_j(X_j) $$
$$ f^n_i(X_i) = p^n(X_i) = \sum_{X_j} p^n(X_i, X_j) = f^{n-1}_i(X_i) \underbrace{\sum_{X_j} f^{n-1}_{ij}(X_i, X_j)\, f^{n-1}_j(X_j)}_{M_{j \to i}(X_i)} $$
$$ f^n_{ij}(X_i, X_j) = \frac{f^{n-1}_{ij}(X_i, X_j)}{M_{j \to i}(X_i)} $$
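A sketch of this iteration on a three-node chain (names and sweep schedule are illustrative): after a few sweeps the node factors equal the true marginals, as claimed:

```python
import numpy as np

# Reparametrisation on the chain X0 - X1 - X2: each step folds the message
# M_{j->i} into the node factor f_i and divides it out of the edge factor f_ij,
# so the overall joint is unchanged (up to normalisation).
rng = np.random.default_rng(4)
K = 2
f_node = [np.ones(K), np.ones(K), np.ones(K)]          # f_i^0 = 1
f_edge = {(0, 1): rng.random((K, K)),                  # f_ij^0, indexed (X_i, X_j)
          (1, 2): rng.random((K, K))}

# Exact marginals of the original joint, for comparison at the end.
joint = np.einsum('ab,bc->abc', f_edge[(0, 1)], f_edge[(1, 2)])
joint /= joint.sum()
exact = [joint.sum((1, 2)), joint.sum((0, 2)), joint.sum((0, 1))]

def update(i, j):
    """One reparametrisation step on directed edge j -> i."""
    if (i, j) in f_edge:
        M = f_edge[(i, j)] @ f_node[j]                 # M_{j->i}(X_i)
        f_edge[(i, j)] /= M[:, None]
    else:
        M = f_edge[(j, i)].T @ f_node[j]
        f_edge[(j, i)] /= M[None, :]
    new = f_node[i] * M
    f_node[i] = new / new.sum()                        # keep node factors normalised

for _ in range(5):                                     # a few sweeps suffice on a tree
    for (i, j) in [(0, 1), (1, 0), (1, 2), (2, 1)]:
        update(i, j)

for i in range(3):
    assert np.allclose(f_node[i], exact[i])            # node factors -> true marginals
print("node factors equal the exact marginals")
```

At the fixed point every message is constant, the node factors are the marginals p(X_i), and the edge factors are the ratios p(X_i, X_j) / (p(X_i) p(X_j)), which is exactly parametrisation (3).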
