Sum-Product: Message Passing (Belief Propagation)
Probabilistic Graphical Models
Sharif University of Technology, Spring 2018
Soleymani
All single-node marginals
- If we need the full set of marginals, repeating the elimination algorithm once per variable is wasteful: it does not share intermediate terms.
- Message-passing algorithms on graphs reuse these shared intermediate terms as messages: sum-product and the junction tree algorithm.
- Upon convergence of these algorithms, we obtain marginal probabilities for all cliques of the original graph.
Trees
- Sum-product works on trees (and, as we will see, also on tree-like graphs).
- Directed tree: every node has exactly one parent, except the root.
- Undirected tree: there is a unique path between any pair of nodes.
Parameterization
- Consider a tree $T = (V, E)$ with potential functions $\psi(x_i)$ and $\psi(x_i, x_j)$:
  $$p(\mathbf{x}) = \frac{1}{Z} \prod_{i \in V} \psi(x_i) \prod_{(i,j) \in E} \psi(x_i, x_j)$$
- In directed trees, $p(\mathbf{x}) = p(x_r) \prod_{(i,j) \in E} p(x_j \mid x_i)$, so we can set $\psi(x_r) = p(x_r)$; $\psi(x_i) = 1$ for all $i \neq r$; and $\psi(x_i, x_j) = p(x_j \mid x_i)$, where $x_i$ is the parent of $x_j$. With this choice, $Z = 1$.
- When we have evidence $x_i = \bar{x}_i$ on a variable, we replace $x_i$ by $\bar{x}_i$ in all factors in which it appears.
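A minimal sketch of this parameterization in Python (the tree shape, table values, and variable names are illustrative assumptions, not from the slides): the 4-node example used in the slides below has root 1, with node 2 a child of 1, and nodes 3 and 4 children of 2.

```python
import numpy as np

# Hypothetical directed tree 1 -> 2, 2 -> 3, 2 -> 4 (root r = 1), binary variables.
# The CPD numbers are made up purely for illustration.
p_x1 = np.array([0.6, 0.4])                     # p(x1)
p_x2_given_x1 = np.array([[0.7, 0.3],           # p(x2 | x1), rows indexed by x1
                          [0.2, 0.8]])
p_x3_given_x2 = np.array([[0.9, 0.1],           # p(x3 | x2)
                          [0.4, 0.6]])
p_x4_given_x2 = np.array([[0.5, 0.5],           # p(x4 | x2)
                          [0.1, 0.9]])

# Undirected parameterization: psi(x_r) = p(x_r), psi(x_i) = 1 otherwise,
# psi(x_i, x_j) = p(x_j | x_i) for each edge (i, j). With this choice Z = 1.
node_pot = {1: p_x1, 2: np.ones(2), 3: np.ones(2), 4: np.ones(2)}
edge_pot = {(1, 2): p_x2_given_x1, (2, 3): p_x3_given_x2, (2, 4): p_x4_given_x2}
```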
Sum-product: elimination view
- For query node $f$, use the elimination order that is the reverse of a topological order: it starts from the leaves and generates elimination cliques of size at most two.
- Eliminating a node can be viewed as message passing (belief propagation): elimination on trees is equivalent to passing messages along tree branches.
- Instead of eliminating a node, we keep it and compute a message from it to its parent.
- This message is exactly the factor produced by eliminating that node together with all of the nodes in its subtree.
Messages
- A node can send a message to a neighbor when (and only when) it has received messages from all of its other neighbors.
(Figure: a rooted tree; the message $m_{ji}(x_i)$ that node $j$ sends to node $i$ on the path toward the root.)
Messages and marginal distribution
- Message that node $j$ sends to node $i$:
  $$m_{ji}(x_i) = \sum_{x_j} \psi(x_j)\,\psi(x_i, x_j) \prod_{k \in \mathcal{N}(j)\setminus\{i\}} m_{kj}(x_j)$$
  which is a function of $x_i$ only.
- Marginal at the query node $f$:
  $$p(x_f) \propto \psi(x_f) \prod_{e \in \mathcal{N}(f)} m_{ef}(x_f)$$
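A minimal sketch of this update with NumPy, continuing the hypothetical potentials above (the function name and the `inbox` dictionary of computed messages are my own conventions, not from the slides):

```python
def message(j, i, node_pot, edge_pot, inbox):
    """Compute m_{ji}(x_i) = sum_{x_j} psi(x_j) psi(x_i, x_j)
    times the product of messages into j from all neighbors except i.
    inbox[(k, j)] holds an already-computed message m_{kj}(x_j)."""
    # Fetch psi(x_i, x_j) as an array indexed [x_i, x_j],
    # whichever way the edge happens to be stored.
    if (i, j) in edge_pot:
        pairwise = edge_pot[(i, j)]
    else:
        pairwise = edge_pot[(j, i)].T
    incoming = node_pot[j].copy()
    for (k, receiver) in inbox:
        if receiver == j and k != i:
            incoming = incoming * inbox[(k, receiver)]
    return pairwise @ incoming   # sums over x_j, leaving a function of x_i
```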
Messages and marginals: example
- Compute $p(x_1)$: $\;p(x_1) \propto \psi(x_1)\, m_{21}(x_1)$
- $m_{21}(x_1) = \sum_{x_2} \psi(x_2)\,\psi(x_1, x_2)\, m_{32}(x_2)\, m_{42}(x_2)$ (the product of the factors that remain after eliminating all variables except $x_1$)
- $m_{32}(x_2) = \sum_{x_3} \psi(x_3)\,\psi(x_2, x_3)$ and $m_{42}(x_2) = \sum_{x_4} \psi(x_4)\,\psi(x_2, x_4)$
Messages and marginals: example
- Compute $p(x_2)$: $\;p(x_2) \propto \psi(x_2)\, m_{12}(x_2)\, m_{32}(x_2)\, m_{42}(x_2)$
- $m_{12}(x_2) = \sum_{x_1} \psi(x_1)\,\psi(x_1, x_2)$
- $m_{32}(x_2) = \sum_{x_3} \psi(x_3)\,\psi(x_2, x_3)$ and $m_{42}(x_2) = \sum_{x_4} \psi(x_4)\,\psi(x_2, x_4)$
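Running the sketch on the hypothetical tree above reproduces these two marginals; note that $m_{32}$ and $m_{42}$ are computed once and then reused for both queries:

```python
inbox = {}
inbox[(3, 2)] = message(3, 2, node_pot, edge_pot, inbox)   # m_32(x2)
inbox[(4, 2)] = message(4, 2, node_pot, edge_pot, inbox)   # m_42(x2)
inbox[(2, 1)] = message(2, 1, node_pot, edge_pot, inbox)   # m_21(x1)
inbox[(1, 2)] = message(1, 2, node_pot, edge_pot, inbox)   # m_12(x2)

p1 = node_pot[1] * inbox[(2, 1)]
p1 = p1 / p1.sum()    # p(x1) = [0.6, 0.4]: the root's marginal is its prior
p2 = node_pot[2] * inbox[(1, 2)] * inbox[(3, 2)] * inbox[(4, 2)]
p2 = p2 / p2.sum()    # p(x2) = [0.5, 0.5] for the made-up tables above
```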
Messages on a tree
- Messages can be reused to answer queries on different variables: the messages on the tree provide a data structure for caching computations.
(Figure: a 5-node tree with $X_2$ adjacent to $X_1$, $X_3$, $X_4$, $X_5$; we need $m_{32}(x_2)$ to find both $p(x_1)$ and $p(x_2)$.)
From elimination to message passing
- Recall the ELIMINATION algorithm:
  - Choose an ordering Z in which the query node f is the final node.
  - Place all potentials on an active list.
  - Eliminate node i by removing all potentials containing i and summing over xi.
  - Place the resulting factor back on the list.
- For a TREE graph:
  - Choose the query node f as the root of the tree.
  - View the tree as a directed tree with edges pointing from f toward the leaves.
  - Use an elimination ordering based on the reverse topological order.
  - Eliminating each node can then be seen as passing a message directly along a tree branch.
- Thus, we can use the tree itself as a data structure to do general inference!
This slide has been adopted from Eric Xing, PGM 10-708, CMU.
Computing all node marginals
- We can realize every elimination order that generates only elimination cliques of size at most two by computing all possible messages: there are $2|E|$ of them, one per direction for each edge.
- To allow every node to serve as the root, we therefore need only these $2|E|$ messages; messages are reused instead of running the elimination algorithm $N$ times.
- This is a dynamic-programming approach: a 2-pass algorithm that saves and reuses messages, after which a pair of messages (one for each direction) has been computed for every edge.
Messages required to compute all node marginals
(Figure: one message per direction on every edge of the tree.)
Computing node marginals: complexity
- Naïve approach: complexity $N \times C$, where $N$ is the number of nodes and $C$ is the complexity of one complete message-passing sweep to a single root.
- Dynamic-programming alternative (the 2-pass algorithm): complexity $2C$.
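As a quick sanity check on these costs: for a chain of $N$ nodes with $K$ states per variable, each message costs $O(K^2)$, so one complete sweep costs $C = O(NK^2)$; the naïve approach then costs $O(N^2K^2)$, while the two-pass schedule computes all $2(N-1)$ messages in $O(NK^2)$ total.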
A two-pass message-passing schedule
- Arbitrarily pick a node as the root.
- First pass: starting at the leaves and proceeding inward, each node passes a message to its parent; this continues until the root has obtained messages from all of its adjoining nodes.
- Second pass: starting at the root, messages are passed back out in the reverse direction; this continues until all leaves have received their messages.
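A sketch of the full two-pass schedule in Python, reusing the `message` helper above (it assumes the edge list forms a tree; the function and variable names are illustrative):

```python
def two_pass_bp(nodes, edges, node_pot, edge_pot):
    """Two-pass sum-product on a tree: collect toward an arbitrary root,
    then distribute back out; returns all single-node marginals."""
    neighbors = {i: set() for i in nodes}
    for (i, j) in edges:
        neighbors[i].add(j)
        neighbors[j].add(i)

    root = nodes[0]                        # any node may serve as the root
    order, parent, stack = [], {root: None}, [root]
    while stack:                           # DFS preorder from the root
        i = stack.pop()
        order.append(i)
        for k in neighbors[i]:
            if k != parent[i]:
                parent[k] = i
                stack.append(k)

    inbox = {}
    for j in reversed(order):              # first pass: leaves toward the root
        if parent[j] is not None:
            inbox[(j, parent[j])] = message(j, parent[j], node_pot, edge_pot, inbox)
    for i in order:                        # second pass: root back toward the leaves
        for k in neighbors[i]:
            if k != parent[i]:
                inbox[(i, k)] = message(i, k, node_pot, edge_pot, inbox)

    marginals = {}
    for i in nodes:
        b = node_pot[i].copy()
        for k in neighbors[i]:
            b = b * inbox[(k, i)]
        marginals[i] = b / b.sum()
    return marginals
```

On the running example, `two_pass_bp([1, 2, 3, 4], [(1, 2), (2, 3), (2, 4)], node_pot, edge_pot)` returns the marginals of all four variables from a single pair of sweeps.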
Asynchronous two-pass message-passing
(Figure: first pass, upward toward the root; second pass, downward toward the leaves.)
Sum-product algorithm: example
(Figures: successive steps of the two-pass schedule on the example tree, e.g., the message $m_{21}(x_1)$ arriving at the root.)
Parallel (synchronous) message-passing
- For a node of degree $d$: whenever messages have arrived on any subset of $d-1$ of its edges, compute the message for the remaining edge and send it.
- Eventually a pair of messages has been computed for each edge, one in each direction, and all incoming messages become available at every node.
Parallel message-passing
- Message-passing protocol: a node can send a message to a neighboring node when, and only when, it has received messages from all of its other neighbors.
- Correctness of parallel message-passing on trees: the synchronous implementation is "non-blocking".
- Theorem: on a tree, this message-passing protocol is guaranteed to yield all marginals.
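A minimal sketch of the synchronous (flooding) schedule, again reusing the `message` helper above; the loop relies on the protocol just stated, so on a tree it always makes progress:

```python
def parallel_bp(nodes, edges, node_pot, edge_pot):
    """Repeatedly send every message whose prerequisites (all other
    incoming messages at the sender) are available, until each edge
    carries one message in each direction (2|E| messages total)."""
    neighbors = {i: set() for i in nodes}
    for (i, j) in edges:
        neighbors[i].add(j)
        neighbors[j].add(i)

    inbox = {}
    while len(inbox) < 2 * len(edges):
        for i in nodes:
            for k in neighbors[i]:
                ready = all((m, i) in inbox for m in neighbors[i] - {k})
                if ready and (i, k) not in inbox:
                    inbox[(i, k)] = message(i, k, node_pot, edge_pot, inbox)
    return inbox
```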
Parallel message passing: example
(Figure: an example run of the parallel schedule.)
Tree-like graphs
- The sum-product message-passing idea can also be extended to tree-like graphs (e.g., polytrees), in which nodes can have multiple parents.
- Although the moralized graph obtained from a polytree is not a tree, the corresponding factor graph is a tree, so sum-product can be run on the factor graph.
(Figure: a polytree, its moralized graph, and its factor graph.)
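For reference, the factor-graph form of the updates (standard in the cited references, though not written out on this slide; the $\mu$ notation is mine) uses two kinds of messages, variable-to-factor and factor-to-variable:

$$\mu_{x \to f}(x) = \prod_{g \in \mathcal{N}(x)\setminus\{f\}} \mu_{g \to x}(x), \qquad
\mu_{f \to x}(x) = \sum_{\mathbf{x}_{\mathcal{N}(f)\setminus\{x\}}} f\big(\mathbf{x}_{\mathcal{N}(f)}\big) \prod_{y \in \mathcal{N}(f)\setminus\{x\}} \mu_{y \to f}(y)$$

On a tree-structured factor graph these updates terminate after one two-pass sweep, exactly as in the variable-only case.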
References
- D. Koller and N. Friedman, Probabilistic Graphical Models: Principles and Techniques, MIT Press, 2009, Chapter 10.
- M. I. Jordan, An Introduction to Probabilistic Graphical Models, Chapter 4.