School of Computer Science
Junction Tree Algorithm and a Case Study of the Hidden Markov Model
Probabilistic Graphical Models (10-708), Lecture 6, Oct 3, 2007
Eric Xing
Reading: J-Chap. 12, 17; KF-Chap. 10

[Figure: a signaling-pathway network with nodes Receptor A (X1), Receptor B (X2), Kinase C (X3), Kinase D (X4), Kinase E (X5), TF F (X6), Gene G (X7), Gene H (X8)]

Outline
- So far we have studied exact inference in:
  - Trees: message passing on the original graph (which is itself a tree)
  - Poly-trees and tree-like graphs: message passing on factor trees
- Now we will look into exact inference in arbitrary graphs: the junction tree algorithm
- Inference in the hidden Markov model
Elimination Clique
- Recall that the dependencies induced during marginalization are captured in elimination cliques:
  - Summation <-> elimination
  - Intermediate term <-> elimination clique
- [Figure: eliminating nodes from the eight-node network A-H; each summation creates an elimination clique, e.g. eliminating E couples A, C, and D]
- Can this lead to a generic inference algorithm?

A Clique Tree
- [Figure: a clique tree for the network A-H, with messages m_b, m_c, m_d, m_f, m_g, m_h flowing toward the clique containing E]
- For example, the message created by eliminating E is
  $m_e(a, c, d) = \sum_e p(e \mid c, d)\, m_g(e)\, m_f(a, e)$
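To make this concrete, here is a minimal numerical sketch of the message above, assuming binary variables and arbitrary made-up tables for $p(e \mid c, d)$ and the incoming messages:

```python
import numpy as np

# Minimal sketch of the elimination message
#   m_e(a, c, d) = sum_e p(e | c, d) * m_g(e) * m_f(a, e).
# All variables are assumed binary; the tables hold arbitrary numbers.

rng = np.random.default_rng(0)

p_e_given_cd = rng.random((2, 2, 2))        # indexed [e, c, d]
p_e_given_cd /= p_e_given_cd.sum(axis=0)    # normalize over e
m_g = rng.random(2)                         # incoming message m_g(e)
m_f = rng.random((2, 2))                    # incoming message m_f(a, e)

# Multiply the local conditional by the incoming messages, sum out e.
m_e = np.einsum('ecd,e,ae->acd', p_e_given_cd, m_g, m_f)
print(m_e.shape)   # (2, 2, 2): a function of (a, c, d), as expected
```

The `einsum` pattern makes the structure explicit: multiply the local conditional by all incoming messages, then sum out the eliminated variable.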
From Elimination to Message Passing
- Elimination ≡ message passing on a clique tree
- [Figure: the elimination sequence on the network A-H, shown side by side with the equivalent messages flowing on a clique tree]
- e.g., $m_e(a, c, d) = \sum_e p(e \mid c, d)\, m_g(e)\, m_f(a, e)$
- Messages can be reused

From Elimination to Message Passing (cont.)
- Elimination ≡ message passing on a clique tree
- Another query ...
- [Figure: the same clique tree with messages recomputed for a different query]
- Messages m_f and m_h are reused; the others need to be recomputed
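The reuse pattern can be sketched in code. The chain of four cliques below is hypothetical; the point is only that a message is a function of the directed clique-tree edge it travels along, so a cache keyed by that edge is shared across queries:

```python
from functools import lru_cache

# Toy sketch of message reuse across queries on a clique tree.
neighbors = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}  # a chain of 4 cliques
computed = []  # records which messages were actually (re)computed

@lru_cache(maxsize=None)
def message(src, dst):
    computed.append((src, dst))
    # A real implementation would combine the clique potential with the
    # incoming messages and marginalize; the recursion shape is the same.
    for nbr in neighbors[src]:
        if nbr != dst:
            message(nbr, src)
    return ('m', src, dst)

def query(root):
    for nbr in neighbors[root]:
        message(nbr, root)

query(4)                        # needs 1->2, 2->3, 3->4
n_first = len(computed)
query(3)                        # 1->2 and 2->3 are cached; only 4->3 is new
print(n_first, len(computed))   # 3 4
```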
The Junction Tree Algorithm
- Recall: elimination ≡ message passing on a clique tree
- Junction tree algorithm:
  - computing messages on a clique tree
  - a message-passing protocol on a clique tree
- There are several inference algorithms; some of them operate directly on (special) directed graphs:
  - Forward-backward algorithm for HMMs (we will see it later)
  - Peeling algorithm for trees and phylogenies
- The junction tree algorithm is the most popular and general inference algorithm; it operates on an undirected graph
- To understand the JT algorithm, we need to understand how to compile a directed graph into an undirected graph

Moral Graph
- Note that for both directed GMs and undirected GMs, the joint probability is in a product form:
  BN: $P(X) = \prod_i P(X_i \mid X_{\pi_i})$    MRF: $P(X) = \frac{1}{Z} \prod_{c \in C} \psi_c(X_c)$
- So let's convert local conditional probabilities into potentials; then the second expression becomes generic. But how does this operation affect the directed graph?
- We can think of a conditional probability, e.g., P(C | A, B), as a function of the three variables A, B, and C (we get a real number for each configuration):
  $\Psi(A, B, C) = P(C \mid A, B)$
- Problem: a node and its parents do not generally form a clique in the undirected version of a BN, because co-parents are not connected
- Solution: marry the parents to obtain the "moral graph"
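A minimal sketch of moralization, assuming the graph is given as a child-to-parents map (the input below is the six-node example network from the next slide):

```python
from itertools import combinations

# Moralization: connect ("marry") all co-parents of each node, then drop
# edge directions. X3 has parents X1, X2; X6 has parents X4, X5.
parents = {1: [], 2: [], 3: [1, 2], 4: [3], 5: [3], 6: [4, 5]}

def moralize(parents):
    edges = set()
    for child, pa in parents.items():
        for p in pa:                      # undirected version of each arc
            edges.add(frozenset((p, child)))
        for u, v in combinations(pa, 2):  # marry co-parents
            edges.add(frozenset((u, v)))
    return edges

moral_edges = moralize(parents)
print(sorted(tuple(sorted(e)) for e in moral_edges))
# [(1, 2), (1, 3), (2, 3), (3, 4), (3, 5), (4, 5), (4, 6), (5, 6)]
```

Note the two added "marriage" edges, (1, 2) and (4, 5): they are exactly what puts each node together with all of its parents in one clique.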
Moral Graph (cont.)
- Define the potential on a clique as the product over all conditional probabilities contained within the clique
- Now the product of potentials gives the right answer:
  [Figure: the directed graph on X1, ..., X6 and its moral graph, with the edges X1-X2 and X4-X5 added]
  $P(X_1, X_2, X_3, X_4, X_5, X_6)$
  $= P(X_1) P(X_2) P(X_3 \mid X_1, X_2) P(X_4 \mid X_3) P(X_5 \mid X_3) P(X_6 \mid X_4, X_5)$
  $= \psi(X_1, X_2, X_3)\, \psi(X_3, X_4, X_5)\, \psi(X_4, X_5, X_6)$
  where
  $\psi(X_1, X_2, X_3) = P(X_1) P(X_2) P(X_3 \mid X_1, X_2)$
  $\psi(X_3, X_4, X_5) = P(X_4 \mid X_3) P(X_5 \mid X_3)$
  $\psi(X_4, X_5, X_6) = P(X_6 \mid X_4, X_5)$
- Note that here the interpretation of a potential is ambivalent: it can be built from either marginals or conditionals

Clique Trees
- A clique tree is an (undirected) tree of cliques
  [Figure: the moral graph and a clique tree with cliques (X1, X2, X3), (X3, X4, X5), (X4, X5, X6) and separators X3 and (X4, X5)]
- Consider cases in which two neighboring cliques V and W have an overlap S (e.g., (X1, X2, X3) overlaps with (X3, X4, X5)); associate potentials $\psi(V)$, $\phi(S)$, $\psi(W)$ with the cliques and the separator
- Now we have an alternative representation of the joint in terms of the potentials, developed below
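A small numerical check of the clique-potential construction, assuming binary variables and random CPDs: each CPD is multiplied into exactly one clique potential, and the product of the three potentials recovers the joint.

```python
import numpy as np

rng = np.random.default_rng(1)

def cpd(*shape):                  # random table, normalized over axis 0
    t = rng.random(shape)
    return t / t.sum(axis=0)

p1, p2 = cpd(2), cpd(2)           # P(X1), P(X2)
p3 = cpd(2, 2, 2)                 # P(X3 | X1, X2), indexed [x3, x1, x2]
p4 = cpd(2, 2)                    # P(X4 | X3)
p5 = cpd(2, 2)                    # P(X5 | X3)
p6 = cpd(2, 2, 2)                 # P(X6 | X4, X5), indexed [x6, x4, x5]

# Clique potentials: product of the CPDs assigned to each clique.
psi123 = np.einsum('a,b,cab->abc', p1, p2, p3)   # psi(X1, X2, X3)
psi345 = np.einsum('dc,ec->cde', p4, p5)         # psi(X3, X4, X5)
psi456 = np.einsum('fde->def', p6)               # psi(X4, X5, X6)

joint = np.einsum('abc,cde,def->abcdef', psi123, psi345, psi456)
direct = np.einsum('a,b,cab,dc,ec,fde->abcdef', p1, p2, p3, p4, p5, p6)
print(np.allclose(joint, direct))                # True
```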
Clique Trees (cont.)
- [Figure: the moral graph and the clique tree (X1, X2, X3) - (X3, X4, X5) - (X4, X5, X6)]
- The alternative representation of the joint in terms of the potentials:
  $P(X_1, X_2, X_3, X_4, X_5, X_6)$
  $= P(X_1) P(X_2) P(X_3 \mid X_1, X_2) P(X_4 \mid X_3) P(X_5 \mid X_3) P(X_6 \mid X_4, X_5)$
  $= P(X_1, X_2, X_3)\, \frac{P(X_3, X_4, X_5)}{P(X_3)}\, \frac{P(X_4, X_5, X_6)}{P(X_4, X_5)}$
  $= \frac{\psi(X_1, X_2, X_3)\, \psi(X_3, X_4, X_5)\, \psi(X_4, X_5, X_6)}{\phi(X_3)\, \phi(X_4, X_5)}$
- Now each potential is isomorphic to the cluster marginal of the attendant set of variables
- Generally: $P(X) = \frac{\prod_C \psi_C(X_C)}{\prod_S \phi_S(X_S)}$

Why Is This Useful?
- Propagation of probabilities: now suppose that some evidence has been "absorbed" (i.e., certain values of some nodes have been observed). How do we propagate this effect to the rest of the graph?
- What do we mean by propagate? Can we adjust all the potentials {ψ}, {φ} so that they still represent the correct cluster marginals (or unnormalized equivalents) of their respective attendant variables?
- Utility: local operations! Once the potentials are adjusted with the evidence $X_6 = x_6$ absorbed,
  $P(X_1 \mid X_6 = x_6) \propto \sum_{X_2, X_3} \psi(X_1, X_2, X_3)$
  $P(X_3 \mid X_6 = x_6) \propto \phi(X_3)$
  $P(x_6) = \sum_{X_4, X_5} \psi(X_4, X_5, x_6)$
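A numerical check of the marginal/separator representation, again with binary variables and a random distribution that factorizes along this tree (the construction mirrors the previous sketch):

```python
import numpy as np

rng = np.random.default_rng(2)

def cpd(*shape):
    t = rng.random(shape)
    return t / t.sum(axis=0)

p1, p2, p3 = cpd(2), cpd(2), cpd(2, 2, 2)
p4, p5, p6 = cpd(2, 2), cpd(2, 2), cpd(2, 2, 2)
joint = np.einsum('a,b,cab,dc,ec,fde->abcdef', p1, p2, p3, p4, p5, p6)

m123 = joint.sum(axis=(3, 4, 5))          # P(X1, X2, X3)
m345 = joint.sum(axis=(0, 1, 5))          # P(X3, X4, X5)
m456 = joint.sum(axis=(0, 1, 2))          # P(X4, X5, X6)
s3   = joint.sum(axis=(0, 1, 3, 4, 5))    # P(X3)
s45  = joint.sum(axis=(0, 1, 2, 5))       # P(X4, X5)

# Product of clique marginals over separator marginals recovers the joint.
rebuilt = np.einsum('abc,cde,def->abcdef', m123,
                    m345 / s3[:, None, None], m456 / s45[:, :, None])
print(np.allclose(rebuilt, joint))        # True
```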
Local Consistency
- We have two ways of obtaining $P(S)$ from the neighboring cliques $V$ and $W$:
  $P(S) = \sum_{V \setminus S} \psi(V)$ and $P(S) = \sum_{W \setminus S} \psi(W)$
  and they must be the same
- The following update rule ensures this:
  - Forward update: $\phi_S^{*} = \sum_{V \setminus S} \psi_V$, then $\psi_W^{*} = \frac{\phi_S^{*}}{\phi_S}\, \psi_W$
  - Backward update: $\phi_S^{**} = \sum_{W \setminus S} \psi_W^{*}$, then $\psi_V^{**} = \frac{\phi_S^{**}}{\phi_S^{*}}\, \psi_V$
- Two important identities can be proven:
  - Local consistency: $\sum_{V \setminus S} \psi_V^{**} = \phi_S^{**} = \sum_{W \setminus S} \psi_W^{*}$
  - Invariant joint: $\frac{\psi_V^{**}\, \psi_W^{*}}{\phi_S^{**}} = \frac{\psi_V\, \psi_W}{\phi_S}$

Message Passing Algorithm
- One sweep consists of the forward update $\phi_S^{*} = \sum_{V \setminus S} \psi_V$, $\psi_W^{*} = \frac{\phi_S^{*}}{\phi_S}\, \psi_W$, followed by the backward update $\phi_S^{**} = \sum_{W \setminus S} \psi_W^{*}$, $\psi_V^{**} = \frac{\phi_S^{**}}{\phi_S^{*}}\, \psi_V$
- This simple local message-passing algorithm on a clique tree defines the general probability propagation algorithm for directed graphs!
- Many interesting algorithms are special cases:
  - Forward-backward algorithm for hidden Markov models
  - Kalman filter updates
  - Peeling algorithms for probabilistic trees
- The algorithm seems reasonable. Is it correct?
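Here is a minimal sketch of one forward/backward sweep between two cliques, verifying both identities numerically. The cliques V = (A, B) and W = (B, C) with separator S = {B}, binary variables, and random potentials are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
psi_V = rng.random((2, 2))     # psi_V(a, b)
psi_W = rng.random((2, 2))     # psi_W(b, c)
phi_S = np.ones(2)             # phi_S(b), initialized to 1

# Forward update: V -> W
phi_S_1 = psi_V.sum(axis=0)                     # phi*_S = sum_{V\S} psi_V
psi_W_1 = (phi_S_1 / phi_S)[:, None] * psi_W    # psi*_W

# Backward update: W -> V
phi_S_2 = psi_W_1.sum(axis=1)                   # phi**_S = sum_{W\S} psi*_W
psi_V_2 = psi_V * (phi_S_2 / phi_S_1)[None, :]  # psi**_V

# Local consistency: both cliques now agree on the separator marginal.
print(np.allclose(psi_V_2.sum(axis=0), psi_W_1.sum(axis=1)))  # True

# Invariant joint: psi**_V psi*_W / phi**_S == psi_V psi_W / phi_S.
before = psi_V[:, :, None] * psi_W[None, :, :] / phi_S[None, :, None]
after = psi_V_2[:, :, None] * psi_W_1[None, :, :] / phi_S_2[None, :, None]
print(np.allclose(before, after))                             # True
```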
A Problem
- Consider the following graph and a corresponding clique tree:
  [Figure: the square graph with edges A-B, A-C, B-D, C-D, and the clique tree (A,B) - (B,D) - (C,D) - (A,C)]
- Note that C appears in two non-neighboring cliques
- Question: with the previous message passing, can we ensure that the probabilities associated with C in these two (non-neighboring) cliques are consistent?
- Answer: No. It is not true that, in general, local consistency implies global consistency
- What else do we need to get such a guarantee?

Triangulation
- A triangulated graph is one in which every cycle of four or more nodes has a chord
- We triangulate a graph by adding chords:
  [Figure: the square graph with the chord B-C added; the cliques are now (A,B,C) and (B,C,D)]
- Now we no longer have our global inconsistency problem
- A clique tree for a triangulated graph has the running intersection property: if a node appears in two cliques, it appears everywhere on the path between the cliques
- Thus local consistency implies global consistency
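A small sketch of checking the running intersection property on the square example from the slide. The clique-tree encodings below are made up for illustration; `has_rip` just verifies that, for each variable, the cliques containing it form a connected subtree:

```python
def has_rip(cliques, edges):
    """Check the running intersection property of a clique tree.

    cliques: list of sets of variables; edges: set of (i, j) index pairs
    with i < j, forming a tree over the cliques.
    """
    for v in set().union(*cliques):
        holders = [i for i, c in enumerate(cliques) if v in c]
        # BFS restricted to the cliques that contain v
        seen, frontier = {holders[0]}, [holders[0]]
        while frontier:
            i = frontier.pop()
            for j in holders:
                if j not in seen and (min(i, j), max(i, j)) in edges:
                    seen.add(j)
                    frontier.append(j)
        if len(seen) != len(holders):
            return False
    return True

# The square A-B-D-C: a chain over its edge cliques violates RIP, because
# C (and A) appear in the two non-neighboring end cliques.
square = [{'A', 'B'}, {'B', 'D'}, {'C', 'D'}, {'A', 'C'}]
print(has_rip(square, {(0, 1), (1, 2), (2, 3)}))   # False

# After adding the chord B-C, the cliques are triangles and RIP holds.
triangulated = [{'A', 'B', 'C'}, {'B', 'C', 'D'}]
print(has_rip(triangulated, {(0, 1)}))             # True
```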