
CS786 Lecture 12 (May 12, 2012): Inference as Optimization (continued)

[KF Chapter 11] CS786, P. Poupart 2012


  1. Cluster Tree Recap
     • Variable elimination:
       – Induces a cluster tree
       – Inference: message propagation on the cluster tree
     • Cluster tree:
       – The graph is a tree (i.e., no loops)
       – Node: a cluster of variables
       – Edge: the subset of variables (a.k.a. sepset) common to the two nodes it connects
       – Satisfies the running intersection property

  2. Cluster Tree Calibration
     • C_i: variables in the cluster at node i; π_i(C_i): factor at node i
     • S_ij: variables in the sepset at edge ij; μ_ij(S_ij): factor at edge ij
     • Calibrated cluster tree: for all edges ij, the sepset factor is the marginal of both cluster factors:
       μ_ij(S_ij) = Σ_{C_i \ S_ij} π_i(C_i) = Σ_{C_j \ S_ij} π_j(C_j)

     Calibration by Message Passing
     • Initialization:
       – Messages: δ_{i→j} ← 1 and δ_{j→i} ← 1 for all edges ij
       – Potentials: π_i ← product of the potentials associated with C_i
     • Update messages until calibration:
       δ_{i→j}(S_ij) ← Σ_{C_i \ S_ij} π_i(C_i) ∏_{k ∈ N(i) \ {j}} δ_{k→i}(S_ki)
     • Return:
       β_i(C_i) ← π_i(C_i) ∏_{k ∈ N(i)} δ_{k→i}(S_ki)
       μ_ij(S_ij) ← δ_{i→j}(S_ij) δ_{j→i}(S_ij)
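The message-passing updates above can be sketched on the smallest non-trivial tree: two clusters sharing a one-variable sepset. The factor values below are made up for illustration; on a tree, one pass in each direction calibrates it.

```python
# Cluster-tree calibration by message passing on a two-cluster tree:
# C1 = {A, B}, C2 = {B, C}, sepset S12 = {B}; all variables binary.
import numpy as np

pi1 = np.array([[0.9, 0.1],    # pi_1(A, B): rows index A, columns index B
                [0.2, 0.8]])
pi2 = np.array([[0.7, 0.3],    # pi_2(B, C): rows index B, columns index C
                [0.4, 0.6]])

# Leaves receive no other incoming messages, so each message is just the
# cluster potential with the non-sepset variable summed out.
delta_12 = pi1.sum(axis=0)     # delta_{1->2}(B): sum out A from pi_1
delta_21 = pi2.sum(axis=1)     # delta_{2->1}(B): sum out C from pi_2

# Beliefs: beta_i = pi_i times the product of incoming messages
beta1 = pi1 * delta_21[np.newaxis, :]
beta2 = pi2 * delta_12[:, np.newaxis]
mu12 = delta_12 * delta_21     # sepset belief mu_12(B)

# Calibration check: both cluster beliefs marginalize to the sepset belief
assert np.allclose(beta1.sum(axis=0), mu12)
assert np.allclose(beta2.sum(axis=1), mu12)

# Normalizing mu_12 gives the marginal Pr(B)
print(mu12 / mu12.sum())       # [0.55 0.45]
```

Note that `beta1`, `beta2`, and `mu12` agree on the shared variable B, which is exactly the calibration condition stated above.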

  3. Properties of Calibrated Trees
     • Normalized β_i is the marginal of C_i, i.e., Pr(C_i) ∝ β_i(C_i)
     • Normalized μ_ij is the marginal of S_ij, i.e., Pr(S_ij) ∝ μ_ij(S_ij)
     • The β_i's and μ_ij's can be used to answer many marginal queries simultaneously

     Loopy Belief Propagation
     • Approximate inference
     • Consider a cluster graph (with loops) instead of a cluster tree:
       – Scalability: clusters can be much smaller
       – Approximation: a calibrated cluster graph does not necessarily yield correct marginals

  4. Cluster Graph
     • Same as a cluster tree, but loops are allowed:
       – Any graph structure is allowed
       – Node: a cluster of variables
       – Edge: the subset of variables (a.k.a. sepset) common to the two nodes it connects
     • Generalized running intersection property:
       – Whenever a variable X is in clusters C_i and C_j, there is exactly one path between C_i and C_j such that X ∈ S_e for every edge e on that path

     Cluster Graph Calibration
     • Same algorithm as cluster tree calibration
     • Disadvantages:
       – Convergence is not guaranteed (damping techniques may be used to encourage convergence)
       – Even when convergence is achieved, the β_i's and μ_ij's are not necessarily the correct marginals of C_i and S_ij
     • Advantage:
       – The approximation is often good in practice, and inference scales linearly with the size of the graph
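The damping mentioned above typically mixes each newly computed message with its previous value before sending it, which smooths the oscillations that can prevent loopy calibration from settling. A minimal sketch (the function name and damping weight are illustrative choices, not from the lecture):

```python
import numpy as np

def damped_message(old_msg, new_msg, lam=0.5):
    """Damped update: convex combination of the previous and the newly
    computed message, renormalized. lam = 1 recovers the undamped update."""
    m = (1.0 - lam) * old_msg + lam * new_msg
    return m / m.sum()

old = np.array([0.5, 0.5])          # previous (uniform) message
new = np.array([0.9, 0.1])          # freshly computed message
print(damped_message(old, new))     # [0.7 0.3]
```

In a loopy run, each δ_{i→j} would be passed through such an update on every iteration until the messages stop changing.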

  5. Expectation Propagation
     • An alternative approximation for inference
     • Idea: stick with the cluster tree, but approximate the messages
     • Consequence: propagate expectations of some statistics instead of full marginals in each sepset

     Example
     [figure on slide]

  6. Cluster Tree with Factored Potentials
     • C_i: variables in the cluster at node i; π_i(C_i): product of factors at node i
     • S_ij: variables in the sepset at edge ij; μ_ij(S_ij): product of factors at edge ij
     • Calibrated cluster tree (same as before): for all edges ij,
       μ_ij(S_ij) = Σ_{C_i \ S_ij} π_i(C_i) = Σ_{C_j \ S_ij} π_j(C_j)

     Calibration with Factored Messages
     • Initialization:
       – Messages: δ_{i→j} ← 1 and δ_{j→i} ← 1 for all edges ij
       – Potentials: π_i ← set of potentials associated with C_i
     • Update messages until calibration:
       δ_{i→j}(S_ij) ← project( Σ_{C_i \ S_ij} π_i(C_i) ∏_{k ∈ N(i) \ {j}} δ_{k→i}(S_ki) )
     • Return:
       β_i(C_i) ← π_i(C_i) ∏_{k ∈ N(i)} δ_{k→i}(S_ki)
       μ_ij(S_ij) ← δ_{i→j}(S_ij) δ_{j→i}(S_ij)
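One concrete choice for the project(·) step above is the factorization projection discussed next: replace a joint sepset belief by the product of its single-variable marginals. A small sketch for a two-variable binary sepset (the function name and belief values are illustrative):

```python
import numpy as np

def project_factorized(belief):
    """Project a joint belief over two binary variables onto the product
    of its marginals. The projection keeps both marginals exact but drops
    the correlation between the variables."""
    b = belief / belief.sum()   # normalize to a distribution
    p0 = b.sum(axis=1)          # marginal of the first variable
    p1 = b.sum(axis=0)          # marginal of the second variable
    return np.outer(p0, p1)

belief = np.array([[0.4, 0.1],
                   [0.2, 0.3]])
approx = project_factorized(belief)

# Both single-variable marginals are preserved by the projection
assert np.allclose(approx.sum(axis=1), belief.sum(axis=1))
assert np.allclose(approx.sum(axis=0), belief.sum(axis=0))
```

This is what makes the factored messages cheap: each projected message is a product of small per-variable factors rather than a full joint table over the sepset.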

  7. Projection
     • Approximate a distribution P by the "closest" distribution Q from some class of distributions
     • Examples:
       – Factorization: joint distribution → product of marginals: P(X) ≈ ∏_i Q(X_i)
       – Mixture of Gaussians → single Gaussian: Σ_i w_i N(x | μ_i, Σ_i) ≈ N(x | μ, Σ)
       – Mixture of Dirichlets → single Dirichlet: Σ_i w_i Dir(x | α_i) ≈ Dir(x | α)

     KL-Divergence
     • A common distance measure for projections
     • KL-divergence (a.k.a. relative entropy) definition:
       KL(P || Q) = Σ_x P(x) log( P(x) / Q(x) )
     • Since KL(P || Q) ≠ KL(Q || P), we can also use
       KL(Q || P) = Σ_x Q(x) log( Q(x) / P(x) )
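As a concrete instance of the projection idea: minimizing KL(P || Q) over single Gaussians Q reduces to matching the mean and variance of the mixture P (moment matching). The sketch below also checks the asymmetry of KL noted above; the mixture parameters are made up for illustration.

```python
import numpy as np

def kl(p, q):
    """KL(P || Q) = sum_x P(x) log(P(x)/Q(x)) for discrete distributions."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * np.log(p / q)))

assert kl([0.5, 0.5], [0.5, 0.5]) == 0.0                          # zero iff P = Q
assert kl([0.9, 0.1], [0.5, 0.5]) != kl([0.5, 0.5], [0.9, 0.1])   # asymmetric

# Projection of a 1-D Gaussian mixture sum_i w_i N(x | mu_i, var_i)
# onto a single N(x | m, v) under KL(P || Q): match the first two moments.
w   = np.array([0.3, 0.7])      # illustrative mixture weights
mu  = np.array([-1.0, 2.0])     # component means
var = np.array([0.5, 1.0])      # component variances

m = float(w @ mu)                        # projected mean
v = float(w @ (var + mu**2) - m**2)      # projected variance
print(m, v)                              # approximately 1.1 and 2.74
```

Matching moments this way is exactly the "propagate expectations of some statistics" step that expectation propagation performs in each sepset.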

  8. Exponential Family
     • Projection by KL-divergence corresponds to matching the expectations of some statistics
     • Exponential family:
       θ: vector of parameters defining P
       f: vector of statistics
       P_θ(x) ∝ exp( ⟨ t(θ), f(x) ⟩ )

     Examples
     • Bernoulli: Pr(X = x) = θ^x (1 − θ)^(1−x)
       t(θ) = ( ln θ, ln(1 − θ) ),  f(x) = ( x, 1 − x )
       Pr(X = x) = exp( ⟨ t(θ), f(x) ⟩ ) = e^{x·ln θ + (1−x)·ln(1−θ)} = θ^x (1 − θ)^(1−x)
     • Gaussian: Pr(x) = N(x | μ, σ²)
       t(θ) = ( μ/σ², −1/(2σ²) ),  f(x) = ( x, x² )
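The Bernoulli example can be verified numerically: exp(⟨t(θ), f(x)⟩) reproduces θ^x (1 − θ)^(1−x) for both values of x (θ = 0.3 is an arbitrary choice for the check).

```python
import numpy as np

theta = 0.3                                        # arbitrary Bernoulli parameter
t = np.array([np.log(theta), np.log(1 - theta)])   # t(theta) = (ln theta, ln(1-theta))

for x in (0, 1):
    f = np.array([x, 1 - x])                       # statistics f(x) = (x, 1-x)
    p_expfam = float(np.exp(t @ f))                # exp(<t(theta), f(x)>)
    p_direct = theta**x * (1 - theta)**(1 - x)     # standard Bernoulli pmf
    assert abs(p_expfam - p_direct) < 1e-12
```

The same check works for the Gaussian with f(x) = (x, x²), except that exp(⟨t(θ), f(x)⟩) only matches N(x | μ, σ²) up to the normalization constant, which is why the slide writes ∝ rather than =.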
