Junction-tree algorithm
Probabilistic Graphical Models
Sharif University of Technology, Spring 2016
Soleymani
Junction-tree algorithm: a general approach
Junction trees, as opposed to sum-product on trees, can be applied to general graphs.
Junction trees, as opposed to the elimination algorithm, are not "query-oriented": they enable us to record and reuse the intermediate factors to respond to multiple queries simultaneously.
Upon convergence of the algorithm, we obtain marginal probabilities for all cliques of the original graph.
Cluster tree
A cluster tree is a singly connected graph (i.e., exactly one path between each pair of nodes) in which the nodes are cliques of an underlying graph.
A separator set is defined for each linked pair of cliques; it contains the variables in the intersection of the two cliques.
[Figure: graph $X_B - X_C - X_D$ and its cluster tree with cliques $\{X_B, X_C\}$ and $\{X_C, X_D\}$; the edge between them is labeled by the separator set $\{X_C\}$.]
Example: variable elimination and cluster tree
[Figure: a directed graph over $X_1, \dots, X_6$ and its moralized graph.]
Elimination order: $X_6, X_5, X_4, X_3, X_2$
Example: elimination cliques
Elimination order: $X_6, X_5, X_4, X_3, X_2$
[Figure: the sequence of elimination cliques created as $X_6, X_5, X_4, X_3, X_2$ are eliminated in turn.]
Example: cluster tree obtained by VE
The cluster tree contains the cliques (fully connected subsets) generated as elimination executes.
The cluster graph induced by an execution of VE is necessarily a tree; indeed, once a variable has been eliminated, the corresponding elimination clique does not reappear.
Elimination order: $X_6, X_5, X_4, X_3, X_2$
[Figure: the cluster tree built from the elimination cliques of the example graph.]
[Figure: the same cluster tree with the maximal cliques of the moralized graph highlighted.]
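A minimal sketch of this construction: graph-based variable elimination that records the elimination clique formed at each step. The edge list below stands in for the slides' six-node moralized graph, which is assumed here rather than read off the figure; all names are illustrative.

```python
from itertools import combinations

def elimination_cliques(edges, order):
    """Run graph-based variable elimination, recording the clique formed
    at each step (the eliminated node plus its current neighbors)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    cliques = []
    for x in order:
        nbrs = adj.get(x, set())
        cliques.append(nbrs | {x})                 # elimination clique
        for u, v in combinations(nbrs, 2):         # fill-in: connect neighbors
            adj[u].add(v)
            adj[v].add(u)
        for u in nbrs:                             # remove x from the graph
            adj[u].discard(x)
        adj.pop(x, None)
    return cliques

# Assumed moralized graph over X_1..X_6 (illustrative, not the slides' figure):
edges = [(1, 2), (1, 3), (2, 4), (3, 5), (2, 6), (5, 6), (2, 5)]
print(elimination_cliques(edges, [6, 5, 4, 3, 2]))
# -> [{2, 5, 6}, {2, 3, 5}, {2, 4}, {1, 2, 3}, {1, 2}]
```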
Cluster tree usefulness
The cluster tree provides a structure for caching computations: multiple queries can be answered much more efficiently than by performing VE for each one separately.
The cluster tree also dictates a partial order over the operations that are performed on factors, yielding a better computational complexity.
Junction tree property
Junction tree property: if a variable appears in two cliques of the clique tree, it must appear in all cliques on the path connecting them.
For every pair of cliques $C_j$ and $C_k$, all cliques on the path between $C_j$ and $C_k$ contain $S_{jk} = C_j \cap C_k$.
This is also called the running intersection property.
A cluster tree that satisfies the running intersection property is called a clique tree or junction tree.
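An equivalent, checkable statement: for every variable, the cliques containing it must form a connected subtree of the cluster tree. A small sketch, with clique sets and tree edges assumed to match the earlier VE example:

```python
def has_running_intersection(cliques, edges):
    """cliques: dict id -> set of variables; edges: list of tree edges (id, id).
    For each variable, the cliques containing it must form a connected subtree."""
    adj = {i: set() for i in cliques}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    for x in set().union(*cliques.values()):
        holders = {i for i, c in cliques.items() if x in c}
        # depth-first search restricted to cliques that contain x
        start = next(iter(holders))
        seen, stack = {start}, [start]
        while stack:
            for v in adj[stack.pop()]:
                if v in holders and v not in seen:
                    seen.add(v)
                    stack.append(v)
        if seen != holders:
            return False
    return True

# Clique sets and tree edges assumed to match the earlier VE example:
cliques = {0: {2, 5, 6}, 1: {2, 3, 5}, 2: {1, 2, 3}, 3: {2, 4}}
print(has_running_intersection(cliques, [(0, 1), (1, 2), (2, 3)]))  # True
```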
Theorem
The tree induced by a variable elimination algorithm satisfies the running intersection property.
Proof sketch: let $C$ and $C'$ be two clusters that contain $X$, and let $C_X$ be the cluster where $X$ is eliminated. We prove that $X$ must be present in every clique on the path between $C$ and $C_X$ (and similarly on the path between $C_X$ and $C'$).
Idea: the computation at $C_X$ must happen later than the computations at $C$ and $C'$; until $X$ is summed out at $C_X$, every intermediate factor produced along the path still carries $X$ in its scope.
Separation set
Theorem 1: In a clique tree induced by a variable elimination algorithm, let $m_{jk}$ be the message that $C_j$ sends to the neighboring cluster $C_k$; then the scope of this message is $S_{jk} = C_j \cap C_k$.
Theorem 2: A cluster tree satisfies the running intersection property if and only if, for every separation set $S_{jk}$, $V_{\prec(j,k)}$ and $V_{\prec(k,j)}$ are separated in $H$ given $S_{jk}$.
$V_{\prec(j,k)}$: set of all variables in the scopes of the cliques on the $C_j$ side of the edge $(j,k)$
Junction tree algorithm
Given a factorized probability distribution $P$ with Markov network $H$, the algorithm builds a junction tree $T$ based on $H$.
For each clique, it finds the marginal probability over the variables in that clique. Two variants:
Message-passing sum-product (Shafer-Shenoy algorithm): run a message-passing algorithm on the junction tree constructed according to the distribution.
Belief update (Hugin algorithm): preserve local consistency via rescaling (update) equations.
Junction tree algorithm: inference
The junction tree inference algorithm is message passing on a junction tree structure.
Each clique starts with a set of initial factors: we assign each factor of the distribution $P$ to one and only one clique of $T$ whose variables contain the factor's scope.
Each clique sends one message to each neighbor, following a schedule: it multiplies the incoming messages with its own potential, sums out one or more variables, and sends the result as an outgoing message.
After message passing, each clique can compute the marginal over its variables by combining its potential with the messages received from its neighbors.
Junction-tree message passing: Shafer-Shenoy algorithm
Initial clique potentials: $\psi_j = \prod_{\phi \in F_j} \phi$, where $F_j$ is the set of factors assigned to clique $C_j$.
Messages:
$$m_{jk}(S_{jk}) = \sum_{C_j \setminus S_{jk}} \psi_j \prod_{l \in N(j) \setminus \{k\}} m_{lj}(S_{lj})$$
Marginal on a clique as the product of the initial potential and the messages from its neighbors:
$$P(C_j) \propto \psi_j \prod_{l \in N(j)} m_{lj}(S_{lj})$$
Marginal on a separator:
$$P(S_{jk}) \propto m_{jk}(S_{jk}) \, m_{kj}(S_{jk})$$
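A runnable sketch of these three equations on a three-clique chain. The variable names, factor tables, and tree layout are all illustrative assumptions, not taken from the slides:

```python
import numpy as np

# Factors are (vars, table) pairs; all variables binary for simplicity.

def align(factor, order):
    """Broadcast a factor's table to the axis order `order`."""
    fvars, table = factor
    shape = [table.shape[fvars.index(v)] if v in fvars else 1 for v in order]
    perm = [fvars.index(v) for v in order if v in fvars]
    return np.transpose(table, perm).reshape(shape)

def product(factors, order):
    """psi_j times incoming messages, over a common variable order."""
    out = np.ones([1] * len(order))
    for f in factors:
        out = out * align(f, order)
    return (tuple(order), out)

def marginalize(factor, keep):
    """Sum out every variable not in `keep` (the sum in the message equation)."""
    fvars, table = factor
    axes = tuple(i for i, v in enumerate(fvars) if v not in keep)
    return (tuple(v for v in fvars if v in keep), table.sum(axis=axes))

# Chain junction tree: C1={A,B} -S12={B}- C2={B,C} -S23={C}- C3={C,D}.
psi1 = (('A', 'B'), np.array([[0.9, 0.1], [0.2, 0.8]]))  # made-up tables
psi2 = (('B', 'C'), np.array([[0.7, 0.3], [0.4, 0.6]]))
psi3 = (('C', 'D'), np.array([[0.5, 0.5], [0.1, 0.9]]))

# Inward pass toward root C3, then outward pass back.
m12 = marginalize(psi1, {'B'})
m23 = marginalize(product([psi2, m12], ('B', 'C')), {'C'})
m32 = marginalize(psi3, {'C'})
m21 = marginalize(product([psi2, m32], ('B', 'C')), {'B'})

# Clique marginal: P(C2) proportional to psi2 * m12 * m32, then normalize.
_, b2 = product([psi2, m12, m32], ('B', 'C'))
print(b2 / b2.sum())
```

Each message is computed once and reused in every clique and separator marginal, which is exactly the caching benefit noted on the cluster-tree slides.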
Junction-tree algorithm: correctness
If $X$ is eliminated when a message is sent from $C_j$ to a neighboring $C_k$, so that $X \in C_j$ and $X \notin C_k$, then $X$ does not appear in the tree on the $C_k$ side of the edge $(j,k)$ after elimination.
$V_{\prec(j,k)}$: set of all variables in the scopes of the cliques on the $C_j$ side of the edge $(j,k)$
$F_{\prec(j,k)}$: set of factors in the cliques on the $C_j$ side of the edge $(j,k)$
$F_j$: set of factors in the clique $C_j$
Junction-tree algorithm: correctness
Induction on the length of the path from the leaves, to show
$$m_{j \to k}(S_{jk}) = \sum_{V_{\prec(j,k)} \setminus S_{jk}} \; \prod_{\phi \in F_{\prec(j,k)}} \phi$$
Base step: leaves (no incoming messages, so the message is the sum over $C_j \setminus S_{jk}$ of $\psi_j$).
Inductive step: let $C_{j_1}, \dots, C_{j_n}$ be the neighbors of $C_j$ other than $C_k$. Since $V_{\prec(j,k)} \setminus S_{jk}$ is the disjoint union of $C_j \setminus S_{jk}$ and the sets $V_{\prec(j_l,j)} \setminus S_{j_l j}$ for $l = 1, \dots, n$:
$$m_{j \to k}(S_{jk}) = \sum_{C_j \setminus S_{jk}} \Big( \prod_{\phi \in F_j} \phi \Big) \prod_{l=1}^{n} \Big( \sum_{V_{\prec(j_l,j)} \setminus S_{j_l j}} \prod_{\phi \in F_{\prec(j_l,j)}} \phi \Big) = \sum_{C_j \setminus S_{jk}} \psi_j \times m_{j_1 \to j} \times \cdots \times m_{j_n \to j}$$
[Figure: clique $C_j$ with neighbors $C_{j_1}, \dots, C_{j_n}$ sending messages toward $C_k$.]
Message passing schedule
A two-pass message-passing schedule: arbitrarily pick a node as the root.
First pass: starting at the leaves and proceeding inward, each node passes a message to its parent; this continues until the root has obtained messages from all of its adjoining nodes.
Second pass: starting at the root, messages are passed back outward in the reverse direction; this continues until all leaves have received their messages.
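A small sketch of generating such a schedule for an arbitrary tree: a post-order traversal gives the inward pass, and its reversal gives the outward pass. The adjacency map below is an illustrative stand-in for a clique-tree structure:

```python
def two_pass_schedule(neighbors, root):
    """Directed message order for a serial two-pass sweep on a tree.
    neighbors: dict node -> list of adjacent nodes (illustrative layout)."""
    inward = []
    def collect(node, parent):
        for nbr in neighbors[node]:
            if nbr != parent:
                collect(nbr, node)
                inward.append((nbr, node))      # first pass: toward the root
    collect(root, None)
    # second pass: reverse each inward message, in reverse order
    outward = [(v, u) for (u, v) in reversed(inward)]
    return inward + outward

# Example: four cliques rooted at 0, with clique 3 hanging off clique 2.
print(two_pass_schedule({0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}, root=0))
# -> [(1, 0), (3, 2), (2, 0), (0, 2), (2, 3), (0, 1)]
```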
Junction tree algorithm
Belief update perspective: Hugin algorithm.
We define two sets of potential functions:
Clique potentials: on each clique $X_C$, we define a potential function $\psi(X_C)$ that is proportional to the marginal probability on that clique: $\psi(X_C) \propto P(X_C)$.
Separator potentials: on each separator set $X_S$, we define a potential function $\phi(X_S)$ that is proportional to the marginal probability on $X_S$: $\phi(X_S) \propto P(X_S)$.
[Figure: two cliques with potentials $\psi_V$ and $\psi_W$ joined by a separator with potential $\phi_S$.]
This enables us to obtain a local representation of marginal probabilities in cliques.
Extended representation of joint probability
We intend to find an extended representation:
$$P(X) \propto \frac{\prod_C \psi_C(X_C)}{\prod_S \phi_S(X_S)}$$
where the global representation corresponds to the joint probability and the local representations $\psi_C(X_C)$ and $\phi_S(X_S)$ correspond to marginal probabilities.
For example, for the chain $A - B - C$ with cliques $\{A,B\}$ and $\{B,C\}$ and separator $\{B\}$: $P(A,B,C) = P(A,B)\,P(B,C)/P(B)$.
Consistency
Consistency: since the potentials are required to represent marginal probabilities, they must give the same marginals for the variables they have in common.
Consistency is a necessary and sufficient condition for the inference algorithm to find potentials that are marginals.
We will first introduce local consistency:
$$\phi_{S_{jk}} = \sum_{C_j \setminus S_{jk}} \psi_j = \sum_{C_k \setminus S_{jk}} \psi_k$$
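The rescaling (update) equations mentioned on the Hugin slide make one edge locally consistent at a time: project one clique potential onto the separator, $\phi_S^* = \sum_{C_j \setminus S} \psi_j$, then rescale the other clique, $\psi_k^* = \psi_k \, \phi_S^* / \phi_S$. A minimal numeric sketch (tables, axis layout, and numbers are assumed):

```python
import numpy as np

# Cliques C_j = {A, B} and C_k = {B, C} with separator S = {B}; binary
# variables, axes ordered as named. All numbers are made up.
psi_j = np.array([[0.9, 0.1], [0.2, 0.8]])    # axes (A, B)
psi_k = np.array([[0.7, 0.3], [0.4, 0.6]])    # axes (B, C)
phi_s = np.ones(2)                            # separator potential, init to 1

# One update from C_j to C_k:
phi_s_new = psi_j.sum(axis=0)                 # project: sum out C_j \ S (here A)
psi_k = psi_k * (phi_s_new / phi_s)[:, None]  # rescale psi_k by the ratio
phi_s = phi_s_new

# After the reverse update (C_k to C_j), the edge is locally consistent:
phi_s_new = psi_k.sum(axis=1)                 # project: sum out C_k \ S (here C)
psi_j = psi_j * (phi_s_new / phi_s)[None, :]
phi_s = phi_s_new
print(psi_j.sum(axis=0), phi_s)               # equal: local consistency holds
```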