Probabilistic & Unsupervised Learning

Belief Propagation

Maneesh Sahani
maneesh@gatsby.ucl.ac.uk
Gatsby Computational Neuroscience Unit, and MSc ML/CSML, Dept Computer Science
University College London

Term 1, Autumn 2016
Recall: Belief Propagation on undirected trees

Joint distribution of an undirected tree:
$$p(\mathcal{X}) = \frac{1}{Z} \prod_{\text{nodes } i} f_i(X_i) \prod_{\text{edges } (ij)} f_{ij}(X_i, X_j)$$

Messages computed recursively:
$$M_{j \to i}(X_i) := \sum_{X_j} f_{ij}(X_i, X_j)\, f_j(X_j) \prod_{l \in \mathrm{ne}(j) \setminus i} M_{l \to j}(X_j)$$

Marginal distributions:
$$p(X_i) \propto f_i(X_i) \prod_{k \in \mathrm{ne}(i)} M_{k \to i}(X_i)$$
$$p(X_i, X_j) \propto f_{ij}(X_i, X_j)\, f_i(X_i)\, f_j(X_j) \prod_{k \in \mathrm{ne}(i) \setminus j} M_{k \to i}(X_i) \prod_{l \in \mathrm{ne}(j) \setminus i} M_{l \to j}(X_j)$$
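As a concrete illustration of these recursions, here is a minimal sketch of exact BP on a small binary tree, with the resulting marginals checked against brute-force enumeration. The tree, potentials, and variable names are illustrative assumptions, not taken from the slides.

```python
import numpy as np
from itertools import product

# Toy undirected tree: 0 - 1 - 2, with an extra leaf 3 attached to node 1.
K = 2
nodes = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (1, 3)]

rng = np.random.default_rng(0)
f_node = {i: rng.random(K) + 0.5 for i in nodes}           # f_i(X_i)
f_edge = {e: rng.random((K, K)) + 0.5 for e in edges}      # f_ij(X_i, X_j)

def pot(i, j):
    """f_ij indexed as (X_i, X_j), whichever way the edge is stored."""
    return f_edge[(i, j)] if (i, j) in f_edge else f_edge[(j, i)].T

ne = {i: [j for (a, b) in edges for j in (a, b) if i in (a, b) and j != i] for i in nodes}

def message(j, i):
    # M_{j->i}(X_i) = sum_{X_j} f_ij(X_i,X_j) f_j(X_j) prod_{l in ne(j)\i} M_{l->j}(X_j)
    incoming = np.ones(K)
    for l in ne[j]:
        if l != i:
            incoming *= message(l, j)          # recursion terminates on a tree
    return pot(j, i).T @ (f_node[j] * incoming)

def marginal(i):
    # p(X_i) ∝ f_i(X_i) prod_{k in ne(i)} M_{k->i}(X_i)
    b = f_node[i].copy()
    for k in ne[i]:
        b *= message(k, i)
    return b / b.sum()

# Brute-force check by enumerating all joint configurations.
def joint(x):
    p = np.prod([f_node[i][x[i]] for i in nodes])
    for (a, b) in edges:
        p *= f_edge[(a, b)][x[a], x[b]]
    return p

Z = sum(joint(x) for x in product(range(K), repeat=len(nodes)))
for i in nodes:
    brute = np.array([sum(joint(x) for x in product(range(K), repeat=len(nodes)) if x[i] == s)
                      for s in range(K)]) / Z
    print(i, marginal(i), brute)               # the two agree exactly on a tree
```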
Loopy Belief Propagation

Joint distribution of an undirected graph:
$$p(\mathcal{X}) = \frac{1}{Z} \prod_{\text{nodes } i} f_i(X_i) \prod_{\text{edges } (ij)} f_{ij}(X_i, X_j)$$

Messages computed recursively (with few guarantees of convergence):
$$M_{j \to i}(X_i) := \sum_{X_j} f_{ij}(X_i, X_j)\, f_j(X_j) \prod_{l \in \mathrm{ne}(j) \setminus i} M_{l \to j}(X_j)$$

Marginal distributions are approximate in general:
$$p(X_i) \approx b_i(X_i) \propto f_i(X_i) \prod_{k \in \mathrm{ne}(i)} M_{k \to i}(X_i)$$
$$p(X_i, X_j) \approx b_{ij}(X_i, X_j) \propto f_{ij}(X_i, X_j)\, f_i(X_i)\, f_j(X_j) \prod_{k \in \mathrm{ne}(i) \setminus j} M_{k \to i}(X_i) \prod_{l \in \mathrm{ne}(j) \setminus i} M_{l \to j}(X_j)$$
Dealing with loops

◮ Accuracy: BP posterior marginals are approximate on all non-trees because evidence is over-counted, but converged approximations are frequently found to be good (particularly in their means).
◮ Convergence: no general guarantee, but BP does converge in some cases:
  ◮ Trees.
  ◮ Graphs with a single loop.
  ◮ Distributions with sufficiently weak interactions.
  ◮ Graphs with long (and weak) loops.
  ◮ Gaussian networks: means correct, variances may also converge.
◮ Damping: a common approach to encourage convergence (cf. EP; a code sketch follows after this slide):
$$M^{\text{new}}_{i \to j}(X_j) := (1 - \alpha)\, M^{\text{old}}_{i \to j}(X_j) + \alpha \sum_{X_i} f_{ij}(X_i, X_j)\, f_i(X_i) \prod_{k \in \mathrm{ne}(i) \setminus j} M_{k \to i}(X_i)$$
◮ Grouping variables: variables can be grouped into cliques to improve accuracy.
  ◮ Region graph approximations.
  ◮ Cluster variational method.
  ◮ Junction graph.
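The damped update can be written in a few lines. Below is a minimal sketch of damped loopy BP on a three-node loop with binary variables; the graph, potentials, damping weight α = 0.5, and sweep count are illustrative assumptions, not values from the slides.

```python
import numpy as np

K = 2                                     # states per variable
nodes = [0, 1, 2]
edges = [(0, 1), (1, 2), (2, 0)]          # a single loop

rng = np.random.default_rng(0)
f_node = {i: rng.random(K) + 0.5 for i in nodes}          # f_i(X_i)
f_edge = {e: rng.random((K, K)) + 0.5 for e in edges}     # f_ij(X_i, X_j)

def pot(i, j):
    """f_ij indexed as (X_i, X_j), whichever way the edge is stored."""
    return f_edge[(i, j)] if (i, j) in f_edge else f_edge[(j, i)].T

ne = {i: [j for (a, b) in edges for j in (a, b) if i in (a, b) and j != i] for i in nodes}

# messages M[(j, i)] = M_{j->i}(X_i), initialised uniform
M = {(j, i): np.ones(K) / K for i in nodes for j in ne[i]}

alpha = 0.5                               # damping weight on the new message
for sweep in range(100):
    M_old = {k: v.copy() for k, v in M.items()}
    for (j, i) in M:
        # undamped BP update: sum_{X_j} f_ij(X_i,X_j) f_j(X_j) prod_{l in ne(j)\i} M_{l->j}(X_j)
        incoming = np.ones(K)
        for l in ne[j]:
            if l != i:
                incoming *= M_old[(l, j)]
        new = pot(j, i).T @ (f_node[j] * incoming)
        new /= new.sum()                  # normalise for numerical stability
        M[(j, i)] = (1 - alpha) * M_old[(j, i)] + alpha * new

# approximate marginals: b_i(X_i) ∝ f_i(X_i) prod_{k in ne(i)} M_{k->i}(X_i)
for i in nodes:
    b = f_node[i].copy()
    for k in ne[i]:
        b *= M[(k, i)]
    print(i, b / b.sum())
```

Setting alpha = 1 recovers the undamped loopy BP updates; smaller alpha trades convergence speed for stability.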
Different Interpretations of Loopy Belief Propagation

Loopy BP can be interpreted as a fixed-point algorithm from a few different perspectives:
◮ Expectation propagation.
◮ Tree-based reparametrization.
◮ Bethe free energy.
Loopy BP as message-based Expectation Propagation

Approximate the pairwise factors f_ij by a product of messages:
$$f_{ij}(X_i, X_j) \approx \tilde{f}_{ij}(X_i, X_j) = M_{i \to j}(X_j)\, M_{j \to i}(X_i)$$

Thus, the full joint is approximated by a factorised distribution:
$$p(\mathcal{X}) \approx \frac{1}{Z} \prod_{\text{nodes } i} f_i(X_i) \prod_{\text{edges } (ij)} \tilde{f}_{ij}(X_i, X_j) = \frac{1}{Z} \prod_{\text{nodes } i} \Big( f_i(X_i) \prod_{j \in \mathrm{ne}(i)} M_{j \to i}(X_i) \Big) = \prod_{\text{nodes } i} b_i(X_i)$$

but with multiple factors for most X_i.
Loopy BP as message-based EP

Then the EP updates to the messages are:

◮ Deletion:
$$q^{\neg ij}(\mathcal{X}) = f_i(X_i)\, f_j(X_j) \prod_{k \in \mathrm{ne}(i) \setminus j} M_{k \to i}(X_i) \prod_{l \in \mathrm{ne}(j) \setminus i} M_{l \to j}(X_j) \prod_{s \neq i,j} f_s(X_s) \prod_{t \in \mathrm{ne}(s)} M_{t \to s}(X_s)$$

◮ Projection:
$$\{M^{\text{new}}_{i \to j}, M^{\text{new}}_{j \to i}\} = \operatorname*{argmin}\; \mathrm{KL}\big[\, f_{ij}(X_i, X_j)\, q^{\neg ij}(X_i, X_j) \,\big\|\, M_{j \to i}(X_i)\, M_{i \to j}(X_j)\, q^{\neg ij}(X_i, X_j) \,\big]$$

Now, $q^{\neg ij}(\cdot)$ factors ⇒ the rhs factors ⇒ the minimum is achieved by the marginals of $f_{ij}(\cdot)\, q^{\neg ij}(\cdot)$:
$$M^{\text{new}}_{j \to i}(X_i)\, q^{\neg ij}(X_i) = \sum_{X_j} f_{ij}(X_i, X_j) \Big( f_j(X_j) \prod_{l \in \mathrm{ne}(j) \setminus i} M_{l \to j}(X_j) \Big) \Big( f_i(X_i) \prod_{k \in \mathrm{ne}(i) \setminus j} M_{k \to i}(X_i) \Big)$$
and, since $q^{\neg ij}(X_i) \propto f_i(X_i) \prod_{k \in \mathrm{ne}(i) \setminus j} M_{k \to i}(X_i)$,
$$\Rightarrow\quad M^{\text{new}}_{j \to i}(X_i) = \sum_{X_j} f_{ij}(X_i, X_j)\, f_j(X_j) \prod_{l \in \mathrm{ne}(j) \setminus i} M_{l \to j}(X_j)$$
recovering the standard loopy BP message update.
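To make the last step concrete, here is a small numerical sketch showing that, for discrete variables, dividing the X_i marginal of the tilted distribution f_ij·q^{¬ij} by the cavity marginal q^{¬ij}(X_i) reproduces the standard BP message. The state counts, potentials, and cavity messages are random illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
Ki, Kj = 3, 4                                  # state counts for X_i and X_j

f_ij = rng.random((Ki, Kj)) + 0.1              # pairwise factor f_ij(X_i, X_j)
f_j = rng.random(Kj) + 0.1                     # node factor f_j(X_j)
prod_Mlj = rng.random(Kj) + 0.1                # prod_{l in ne(j)\i} M_{l->j}(X_j)
f_i = rng.random(Ki) + 0.1                     # node factor f_i(X_i)
prod_Mki = rng.random(Ki) + 0.1                # prod_{k in ne(i)\j} M_{k->i}(X_i)

# cavity marginal on (X_i, X_j): q^{~ij} ∝ [f_i prod M_{k->i}] [f_j prod M_{l->j}]
q_cav = np.outer(f_i * prod_Mki, f_j * prod_Mlj)

# tilted distribution f_ij * q^{~ij} and its X_i marginal
tilted = f_ij * q_cav
tilted_i = tilted.sum(axis=1)

# EP update: M_new_{j->i}(X_i) = (X_i marginal of tilted) / q^{~ij}(X_i)
M_ep = tilted_i / (f_i * prod_Mki)

# standard loopy BP message: sum_{X_j} f_ij(X_i,X_j) f_j(X_j) prod M_{l->j}(X_j)
M_bp = f_ij @ (f_j * prod_Mlj)

print(np.allclose(M_ep, M_bp))                 # -> True (exact, since nothing was normalised)
```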
Message-based EP

◮ Thus message-based EP in a loopy graph need not be seen as two separate approximations, one to the sites and one to the cavity (as we had in the EP lecture).
◮ Instead, we can see it as a more severe constraint on the approximate sites: not just to an ExpFam factor, but to a product of ExpFam messages.
◮ On a tree-structured graph the message-factored version of EP finds the same marginals as standard EP.
◮ Messages are calculated in exactly the same way as before (cf. NLSSM).
◮ Pairwise marginals can be found after convergence by computing $\tilde{P}(y_{i-1}, y_i)$ as required (cf. forward-backward for HMMs).