Sum-Product: Message Passing (Belief Propagation)
Probabilistic Graphical Models
Sharif University of Technology, Spring 2017
Soleymani
All single-node marginals
- If we need the full set of marginals, repeating the elimination algorithm for each individual variable is wasteful: it does not share intermediate terms.
- Message-passing algorithms on graphs (the messages are the shared intermediate terms): sum-product and junction tree.
- Upon convergence of these algorithms, we obtain marginal probabilities for all cliques of the original graph.
Tree
- Sum-product works only on trees (and, as we will see, it also works on tree-like graphs).
- Directed tree: every node has one parent, except the root.
- Undirected tree: a unique path between any pair of nodes.
Parameterization
- Consider a tree $\mathcal{U}(\mathcal{W}, \mathcal{E})$ with potential functions $\varrho(y_j)$ and $\varrho(y_j, y_k)$:
$$Q(\boldsymbol{y}) = \frac{1}{a} \prod_{j \in \mathcal{W}} \varrho(y_j) \prod_{(j,k) \in \mathcal{E}} \varrho(y_j, y_k)$$
- In directed trees, $Q(\boldsymbol{y}) = Q(y_s) \prod_{(j,k) \in \mathcal{E}} Q(y_k \mid y_j)$, which corresponds to taking $\varrho(y_s) = Q(y_s)$; $\varrho(y_j) = 1$ for all $j \neq s$; $\varrho(y_j, y_k) = Q(y_k \mid y_j)$, where $y_j$ is the parent of $y_k$; and $a = 1$.
- When we have evidence $y_j = \bar{y}_j$ on variable $y_j$, we replace $y_j$ by $\bar{y}_j$ in every factor in which it appears.
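As a concrete illustration of this parameterization, here is a minimal sketch (not from the slides; the chain structure, potential values, and binary variables are made-up assumptions) that stores node and edge potentials and evaluates the unnormalized joint and the normalizer $a$ by brute force:

import itertools
import numpy as np

# Illustrative chain y1 - y2 - y3 with binary variables.
node_pot = {1: np.array([0.7, 0.3]),        # rho(y1): root marginal
            2: np.array([1.0, 1.0]),        # rho(y2) = 1
            3: np.array([1.0, 1.0])}        # rho(y3) = 1
edge_pot = {(1, 2): np.array([[0.9, 0.1],   # rho(y1, y2) = Q(y2 | y1)
                              [0.2, 0.8]]),
            (2, 3): np.array([[0.6, 0.4],   # rho(y2, y3) = Q(y3 | y2)
                              [0.3, 0.7]])}

def unnormalized_joint(y):
    """Product of all node and edge potentials for a full assignment y = {node: value}."""
    p = 1.0
    for j, pot in node_pot.items():
        p *= pot[y[j]]
    for (j, k), pot in edge_pot.items():
        p *= pot[y[j], y[k]]
    return p

# Brute-force normalizer a (feasible only for tiny trees).
a = sum(unnormalized_joint(dict(zip(node_pot, vals)))
        for vals in itertools.product([0, 1], repeat=len(node_pot)))
print("normalizer a =", a)

Because the edge potentials here are conditional distributions and the only non-trivial node potential is the root marginal, the brute-force normalizer comes out as $a = 1$, matching the directed-tree case above.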
Sum-product: elimination view
- Query node $s$; elimination order: the inverse of the topological order.
- Starts from the leaves and generates elimination cliques of size at most two.
- Elimination of each node can be viewed as message passing (or belief propagation): elimination on trees is equivalent to message passing along tree branches.
- Instead of eliminating a node, we preserve it and compute a message from it to its parent. This message is equivalent to the factor that results from eliminating that node and all of the nodes in its subtree.
Messages
[Figure: a rooted tree, with the message that node $k$ sends to node $j$ marked on the edge between them.]
Messages on a tree
- Messages can be reused to find probabilities of different query variables.
- Messages on the tree provide a data structure for caching computations.
[Figure: a tree with nodes $Y_1, \ldots, Y_5$; we need $n_{32}(y_2)$ to find both $Q(Y_1)$ and $Q(Y_2)$.]
Messages and marginal distribution
- Message that $Y_k$ sends to $Y_j$ (a function of $y_j$ only):
$$n_{kj}(y_j) = \sum_{y_k} \varrho(y_k)\, \varrho(y_j, y_k) \prod_{l \in \mathcal{O}(k) \setminus \{j\}} n_{lk}(y_k)$$
where $\mathcal{O}(k)$ denotes the neighbors of node $k$.
- Marginal at the query node $s$:
$$q(y_s) \propto \varrho(y_s) \prod_{l \in \mathcal{O}(s)} n_{ls}(y_s)$$
Messages and marginal: example
$$n_{12}(y_2) = \sum_{y_1} \varrho(y_1)\, \varrho(y_1, y_2)$$
$$q(y_2) \propto \varrho(y_2)\, n_{12}(y_2)\, n_{32}(y_2)\, n_{42}(y_2)$$
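A numeric sketch of these two formulas (the star structure, with $Y_2$ connected to leaves $Y_1$, $Y_3$, $Y_4$, and all potential values are illustrative assumptions; variables are binary):

import numpy as np

# Node potentials rho(y_j) and edge potentials rho(y_k, y_2) for a star tree
# with center Y2 and leaves Y1, Y3, Y4 (values are made up for illustration).
rho_node = {1: np.array([0.6, 0.4]), 2: np.array([0.5, 0.5]),
            3: np.array([0.8, 0.2]), 4: np.array([0.3, 0.7])}
rho_edge = {(1, 2): np.array([[0.9, 0.1], [0.2, 0.8]]),
            (3, 2): np.array([[0.7, 0.3], [0.4, 0.6]]),
            (4, 2): np.array([[0.5, 0.5], [0.1, 0.9]])}

def leaf_message(k, j):
    """n_kj(y_j) = sum_{y_k} rho(y_k) rho(y_k, y_j) for a leaf node k."""
    return rho_edge[(k, j)].T @ rho_node[k]

n12, n32, n42 = leaf_message(1, 2), leaf_message(3, 2), leaf_message(4, 2)

# q(y_2) is proportional to rho(y_2) n_12(y_2) n_32(y_2) n_42(y_2); normalize at the end.
q2 = rho_node[2] * n12 * n32 * n42
print("Q(Y2) =", q2 / q2.sum())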
Computing all node marginals
- We can cover all possible elimination orders (generating only elimination cliques of size 2) by computing all possible messages: $2|\mathcal{E}|$ of them.
- To allow every node to act as the root, we only need to compute these $2|\mathcal{E}|$ messages.
- Messages can be reused, instead of running the elimination algorithm once for each query node.
- Dynamic programming approach: a 2-pass algorithm that saves and reuses messages.
- A pair of messages (one for each direction) is computed for every edge.
Messages required to compute all node marginals
A two-pass message-passing schedule
- Arbitrarily pick a node as the root.
- First pass: starts at the leaves and proceeds inward; each node passes a message to its parent. It continues until the root has obtained messages from all of its adjoining nodes.
- Second pass: starts at the root and passes the messages back out, in the reverse direction. It continues until all leaves have received their messages.
Asynchronous two-pass message-passing
[Figure: first pass, upward (leaves to root); second pass, downward (root to leaves).]
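Below is a compact sketch of this two-pass schedule on an undirected tree (an illustrative implementation, not the slides' code; it assumes discrete variables with potentials stored as numpy arrays, a `neighbors` adjacency dictionary, and an arbitrarily chosen root):

import numpy as np

def two_pass_sum_product(neighbors, rho_node, rho_edge, root):
    """All node marginals of a tree via one upward and one downward pass.

    neighbors: dict node -> list of adjacent nodes (tree structure)
    rho_node:  dict node -> 1-D array of node potentials
    rho_edge:  dict (j, k) -> 2-D array rho[y_j, y_k], stored for one orientation
    """
    def edge_pot(j, k):
        # Fetch rho(y_j, y_k) regardless of which orientation was stored.
        return rho_edge[(j, k)] if (j, k) in rho_edge else rho_edge[(k, j)].T

    # Root the tree: record parents and a preorder of the nodes.
    parent, order, stack = {root: None}, [], [root]
    while stack:
        j = stack.pop()
        order.append(j)
        for k in neighbors[j]:
            if k != parent[j]:
                parent[k] = j
                stack.append(k)

    msg = {}  # msg[(k, j)] = n_kj(y_j)

    def send(k, j):
        # n_kj(y_j) = sum_{y_k} rho(y_k) rho(y_k, y_j) * prod of messages into k except from j
        incoming = rho_node[k].copy()
        for l in neighbors[k]:
            if l != j:
                incoming *= msg[(l, k)]
        msg[(k, j)] = edge_pot(k, j).T @ incoming

    for k in reversed(order):                 # first pass: leaves -> root
        if parent[k] is not None:
            send(k, parent[k])
    for j in order:                           # second pass: root -> leaves
        for k in neighbors[j]:
            if k != parent[j]:
                send(j, k)

    marginals = {}
    for j in neighbors:                       # combine incoming messages at every node
        q = rho_node[j].copy()
        for l in neighbors[j]:
            q *= msg[(l, j)]
        marginals[j] = q / q.sum()
    return marginals

Each edge carries exactly one message per direction, so the whole computation amounts to the $2|\mathcal{E}|$ messages counted earlier.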
Sum-product algorithm: example
[Figures (two slides): the message $n_{21}(y_1)$ being passed on the example tree.]
Parallel message-passing
- Message-passing protocol: a node can send a message to a neighboring node when, and only when, it has received messages from all of its other neighbors.
- Correctness of parallel message-passing on trees: the synchronous implementation is "non-blocking".
- Theorem: this message-passing protocol guarantees obtaining all marginals in the tree.
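A small sketch of this protocol as a "flooding" schedule (illustrative; it reuses the numpy-array data layout of the two-pass sketch above): in every round, each directed edge whose prerequisite messages are already available fires, and on a tree this terminates with all $2|\mathcal{E}|$ messages computed.

def parallel_sum_product(neighbors, rho_node, rho_edge):
    """Send a message on (k, j) as soon as k has heard from all of its
    other neighbors; on a tree every message is eventually sent."""
    def edge_pot(k, j):
        return rho_edge[(k, j)] if (k, j) in rho_edge else rho_edge[(j, k)].T

    pending = {(k, j) for k in neighbors for j in neighbors[k]}
    msg = {}
    while pending:
        # Messages whose prerequisites are all available at the start of this round.
        ready = {(k, j) for (k, j) in pending
                 if all((l, k) in msg for l in neighbors[k] if l != j)}
        for (k, j) in ready:                  # sent "in parallel" within the round
            incoming = rho_node[k].copy()
            for l in neighbors[k]:
                if l != j:
                    incoming *= msg[(l, k)]
            msg[(k, j)] = edge_pot(k, j).T @ incoming
        pending -= ready
    return msg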
Parallel message passing: Example
Tree-like graphs
- The sum-product message-passing idea can also be extended to work on tree-like graphs (e.g., polytrees).
- Although the undirected graph obtained by moralizing a polytree is not a tree, the corresponding factor graph is a tree.
[Figure: a polytree (nodes can have multiple parents), its moralized graph, and its factor graph.]
Recall: Factor graph
$$\varrho(y_1, y_2, y_3) = g(y_1, y_2, y_3)$$
$$\varrho(y_1, y_2, y_3) = g_b(y_1, y_2)\, g_c(y_1, y_3)\, g_d(y_2, y_3)$$
Sum-product on factor trees
- Factor tree: a factor graph with no loops.
- Two types of messages:
- Message that flows from variable node $j$ to factor node $t$:
$$w_{jt}(y_j) = \prod_{u \in \mathcal{O}(j) \setminus \{t\}} \nu_{uj}(y_j)$$
- Message that flows from factor node $t$ to variable node $j$:
$$\nu_{tj}(y_j) = \sum_{\boldsymbol{y}_{\mathcal{O}(t) \setminus \{j\}}} g_t(\boldsymbol{y}_{\mathcal{O}(t)}) \prod_{k \in \mathcal{O}(t) \setminus \{j\}} w_{kt}(y_k)$$
Here $\mathcal{O}(\cdot)$ denotes the neighbors of a node in the factor graph.
Sum-product on factor trees (continued)
- The message-passing schedule introduced for trees can also be used on factor trees.
- When the messages from all the neighbors of a variable node $j$ have been received, its marginal probability is
$$Q(y_j) \propto \prod_{t \in \mathcal{O}(j)} \nu_{tj}(y_j)$$
or equivalently, for any factor node $t \in \mathcal{O}(j)$ (a factor neighboring $Y_j$),
$$Q(y_j) \propto w_{jt}(y_j)\, \nu_{tj}(y_j)$$
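A minimal sketch of the two message types on a tiny factor tree (the chain structure $Y_1 - g_a - Y_2 - g_b - Y_3$, the factor tables, and all names are illustrative assumptions, not from the slides):

import numpy as np

# Factor tables indexed in the order given by `scope`; variables are binary.
factors = {
    "g_a": {"scope": (1, 2), "table": np.array([[0.9, 0.1], [0.2, 0.8]])},
    "g_b": {"scope": (2, 3), "table": np.array([[0.6, 0.4], [0.3, 0.7]])},
}

def variable_to_factor(j, t, msgs):
    """w_jt(y_j): product of nu messages into variable j from all factors except t."""
    w = np.ones(2)
    for (src, dst), nu in msgs.items():
        if dst == j and src != t:
            w *= nu
    return w

def factor_to_variable(t, j, msgs):
    """nu_tj(y_j): weight factor t by the w messages of its other variables, then sum them out."""
    scope, table = factors[t]["scope"], factors[t]["table"]
    result = table.copy()
    for axis, k in enumerate(scope):
        if k != j:
            shape = [1] * table.ndim
            shape[axis] = 2
            result = result * variable_to_factor(k, t, msgs).reshape(shape)
    other_axes = tuple(a for a, k in enumerate(scope) if k != j)
    return result.sum(axis=other_axes)

msgs = {}
msgs[("g_a", 2)] = factor_to_variable("g_a", 2, msgs)   # nu from g_a to Y2
msgs[("g_b", 2)] = factor_to_variable("g_b", 2, msgs)   # nu from g_b to Y2

q2 = msgs[("g_a", 2)] * msgs[("g_b", 2)]                 # Q(y_2) is proportional to the product of incoming nu
print("Q(Y2) =", q2 / q2.sum())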
The relation between sum-product on factor trees and sum-product on undirected trees
- Relation between the $n$ messages of the sum-product algorithm for undirected trees and the $\nu$ messages of the sum-product algorithm for factor trees (here the factor $g_t$ is simply the pairwise potential $\varrho(y_j, y_k)$):
$$\begin{aligned}
\nu_{tj}(y_j) &= \sum_{\boldsymbol{y}_{\mathcal{O}(t) \setminus \{j\}}} g_t(\boldsymbol{y}_{\mathcal{O}(t)}) \prod_{k \in \mathcal{O}(t) \setminus \{j\}} w_{kt}(y_k) \\
&= \sum_{y_k} \varrho(y_j, y_k)\, w_{kt}(y_k) \\
&= \sum_{y_k} \varrho(y_j, y_k) \prod_{u \in \mathcal{O}(k) \setminus \{t\}} \nu_{uk}(y_k) \\
&= \sum_{y_k} \varrho(y_k)\, \varrho(y_j, y_k) \prod_{u \in \mathcal{O}'(k) \setminus \{t\}} \nu_{uk}(y_k)
\end{aligned}$$
where $\mathcal{O}'(k) = \mathcal{O}(k) - \{\text{the factor corresponding to } \varrho(y_k)\}$; the last step separates out the message $\varrho(y_k)$ sent by that unary factor.
Example
References
- D. Koller and N. Friedman, Probabilistic Graphical Models: Principles and Techniques, MIT Press, 2009, Chapter 10.
- M. I. Jordan, An Introduction to Probabilistic Graphical Models, Chapter 4.