Variable Elimination


  1. Exact Inference: Variable Elimination. Probabilistic Graphical Models, Sharif University of Technology, Spring 2018, Soleymani

  2. Probabilistic Inference and Learning  We now have compact representations of probability distributions (graphical models): a GM M describes a unique probability distribution P. Typical tasks:
Task 1: How do we answer queries about P_M, e.g., P_M(X | Y)? We use inference as a name for the process of computing answers to such queries.
Task 2: How do we estimate a plausible model M from data D?
i. We use learning as a name for the process of obtaining a point estimate of M.
ii. The Bayesian approach instead seeks p(M | D), which is itself an inference problem.
iii. When not all variables are observable, even computing a point estimate of M requires inference to impute the missing data.
This slide has been adopted from Eric Xing, 10-708 PGM, CMU.

  3. Why we need inference  If we know the graphical model, we use inference to compute marginal or conditional distributions efficiently. We also need inference during learning, when we fit a model from incomplete data or when the learning approach is Bayesian (as we will see in the next lectures).

  4. Inference queries  Nodes: 𝒳 = {X_1, ..., X_n}; e: evidence on a set of variables E; X = 𝒳 − E. We query a subset Y of all domain variables X = {Y, Z} and "don't care" about the remaining Z.
Likelihood (probability of evidence): P(e) = Σ_X P(X, e).
Marginal probability distribution: P(Y) = Σ_{𝒳−Y} P(𝒳).
Conditional probability distribution (a posteriori belief): P(X | e) = P(X, e) / Σ_X P(X, e).
Marginalized conditional probability distribution: P(Y | e) = Σ_Z P(Y, Z, e) / Σ_Y Σ_Z P(Y, Z, e), where X = Y ∪ Z.
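A minimal sketch (not from the slides) of these query types on a toy model: a made-up joint distribution over three binary variables X1, X2, X3 stored as a NumPy array, with a likelihood, a marginal, and a conditional computed by brute-force summation. All numbers and names are illustrative.

```python
import numpy as np

# Toy joint P(X1, X2, X3) over binary variables, indexed as joint[x1, x2, x3].
# The entries are made up for illustration and sum to 1.
joint = np.array([[[0.06, 0.09], [0.10, 0.05]],
                  [[0.12, 0.18], [0.25, 0.15]]])
assert np.isclose(joint.sum(), 1.0)

# Likelihood of the evidence X3 = 1:  P(x3=1) = sum_{x1,x2} P(x1, x2, x3=1)
likelihood = joint[:, :, 1].sum()

# Marginal of the query variable X1:  P(X1) = sum_{x2,x3} P(X1, x2, x3)
marginal_x1 = joint.sum(axis=(1, 2))

# Conditional (a posteriori belief):  P(X1 | x3=1) = P(X1, x3=1) / sum_{x1} P(x1, x3=1),
# where the "don't care" variable X2 is summed out of the numerator.
joint_x1_e = joint[:, :, 1].sum(axis=1)
p_x1_given_e = joint_x1_e / joint_x1_e.sum()

print(likelihood, marginal_x1, p_x1_given_e)
```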

  5. Most Probable Assignment (MPA)  Most probable assignment for some variables of interest Y, given evidence E = e:
Y* | e = argmax_Y P(Y | e), the maximum a posteriori configuration of Y.
Applications of MPA:
Classification: find the most likely label, given the evidence.
Explanation: what is the most likely scenario, given the evidence?
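A tiny continuation of the same toy idea (hypothetical numbers): once the conditional P(Y | e) is available as a table, the MPA is just its argmax.

```python
import numpy as np

# Hypothetical conditional table P(Y | e) for a 4-valued query variable Y.
p_y_given_e = np.array([0.10, 0.55, 0.30, 0.05])

# Most probable assignment (maximum a posteriori configuration) of Y given e.
y_star = int(np.argmax(p_y_given_e))
print(y_star)  # -> 1
```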

  6. MPA: Example  This slide has been adopted from Eric Xing, 10-708 PGM, CMU.

  7. Marginal probability: Enumeration  P(Y | e) ∝ P(Y, e), and P(Y, e) = Σ_Z P(Y, Z, e).
Marginal probability: exponential computation is required in general; the problem is #P-complete (enumeration is intractable).
Even for a graph of polynomial size, the computation can be exponential.
We cannot find a general procedure that works efficiently for arbitrary GMs.

  8. Hardness of Inference  Hardness does not mean we cannot solve inference. It implies that we cannot find a general procedure that works efficiently for arbitrary GMs. For particular families of GMs we can have provably efficient procedures: for special graph structures, provably efficient algorithms (avoiding exponential cost) are available.

  9. Exact inference  Exact inference algorithms:
Variable elimination algorithm: general graph, one query.
Belief propagation (sum-product on factor graphs): tree, marginal probabilities on all nodes.
Junction tree algorithm: general graph, marginal probabilities on all clique nodes.

  10. Inference on a chain  B → C → D → E
P(e) = Σ_b Σ_c Σ_d P(b, c, d, e) = Σ_b Σ_c Σ_d P(b) P(c | b) P(d | c) P(e | d)
A naïve summation needs to enumerate over an exponential number of terms.

  11. Inference on a chain: marginalization and elimination  B → C → D → E
P(e) = Σ_d Σ_c Σ_b P(b) P(c | b) P(d | c) P(e | d)
     = Σ_d P(e | d) Σ_c P(d | c) Σ_b P(b) P(c | b)
Eliminating b yields P(c), eliminating c then yields P(d), and the final sum over d yields P(e).
In a chain of n nodes each taking k values, this costs O(n k^2) instead of O(k^n).
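A minimal sketch of this reordering with made-up CPTs for the binary chain B → C → D → E: the naive enumeration over all joint configurations and the elimination that pushes each sum inward (a sequence of small matrix-vector products) give the same P(e).

```python
import numpy as np

k = 2                                  # each variable takes k values
rng = np.random.default_rng(0)

def random_cpt(shape):
    """Random conditional table, normalized over its last axis (illustrative only)."""
    t = rng.random(shape)
    return t / t.sum(axis=-1, keepdims=True)

p_b = random_cpt((k,))                 # P(b)
p_c_b = random_cpt((k, k))             # P(c | b), indexed [b, c]
p_d_c = random_cpt((k, k))             # P(d | c), indexed [c, d]
p_e_d = random_cpt((k, k))             # P(e | d), indexed [d, e]

# Naive enumeration: sum the full joint over b, c, d -- exponentially many terms.
p_e_naive = np.zeros(k)
for b in range(k):
    for c in range(k):
        for d in range(k):
            for e in range(k):
                p_e_naive[e] += p_b[b] * p_c_b[b, c] * p_d_c[c, d] * p_e_d[d, e]

# Elimination: push each sum inward -- a sequence of O(k^2) matrix-vector products.
p_c = p_b @ p_c_b                      # sum_b P(b) P(c|b)  ->  P(c)
p_d = p_c @ p_d_c                      # sum_c P(c) P(d|c)  ->  P(d)
p_e = p_d @ p_e_d                      # sum_d P(d) P(e|d)  ->  P(e)

assert np.allclose(p_e, p_e_naive)
```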

  12. Inference on a chain  X_1 - X_2 - ... - X_{N−1} - X_N
In both directed and undirected graphical models, the joint probability is a factored expression over subsets of the variables. For the undirected chain:
P(x) = (1/Z) ψ_{1,2}(x_1, x_2) ψ_{2,3}(x_2, x_3) ... ψ_{N−1,N}(x_{N−1}, x_N)
P(x_j) = (1/Z) Σ_{x_1} ... Σ_{x_{j−1}} Σ_{x_{j+1}} ... Σ_{x_N} ψ_{1,2}(x_1, x_2) ... ψ_{N−1,N}(x_{N−1}, x_N)
       = (1/Z) [ Σ_{x_{j−1}} ψ(x_{j−1}, x_j) Σ_{x_{j−2}} ψ(x_{j−2}, x_{j−1}) ... Σ_{x_1} ψ(x_1, x_2) ]
              × [ Σ_{x_{j+1}} ψ(x_j, x_{j+1}) Σ_{x_{j+2}} ψ(x_{j+1}, x_{j+2}) ... Σ_{x_N} ψ(x_{N−1}, x_N) ]
Each elimination takes O(|Val(X_k)| × |Val(X_{k+1})|) operations.
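A sketch of this two-sided elimination for an undirected chain with made-up pairwise potentials: the product of the message coming from the left and the message coming from the right gives the unnormalized marginal of x_j, and dividing by its total gives the normalized marginal, so Z never has to be computed separately. Names and values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
N, k = 6, 3                                        # chain length and number of states
# psi[i] couples nodes i and i+1 (0-based): psi[i][a, b] = psi(x_i = a, x_{i+1} = b)
psi = [rng.random((k, k)) for _ in range(N - 1)]

def marginal(j):
    """Unnormalized marginal of x_j, eliminating the left and right parts of the chain."""
    fwd = np.ones(k)
    for i in range(j):                             # eliminate x_0, ..., x_{j-1}
        fwd = fwd @ psi[i]
    bwd = np.ones(k)
    for i in range(N - 2, j - 1, -1):              # eliminate x_{N-1}, ..., x_{j+1}
        bwd = psi[i] @ bwd
    return fwd * bwd

m = marginal(2)
p_xj = m / m.sum()                                 # normalization by Z = m.sum()
print(p_xj)
```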

  13. Inference on a chain: reasons for the improvement  Computing an expression of the form (sum-product inference): Σ_Z Π_{ϕ ∈ Φ} ϕ, where Φ is the set of factors.
We used the structure of the BN to factorize the joint distribution, so the scopes of the resulting factors are limited.
Distributive law: if X ∉ Scope(ϕ_1), then Σ_X ϕ_1 · ϕ_2 = ϕ_1 · Σ_X ϕ_2.
We perform each summation over the product of only a subset of the factors.
We find sub-expressions that can be computed once, saved, and reused in later computations, instead of being computed exponentially many times.
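A quick numerical check of the distributive-law step with two made-up factors ϕ1(A) and ϕ2(A, X): because X is not in the scope of ϕ1, summing X out of the product equals multiplying ϕ1 by the already-summed ϕ2.

```python
import numpy as np

rng = np.random.default_rng(2)
phi1 = rng.random(3)          # phi1(A); X is not in its scope
phi2 = rng.random((3, 4))     # phi2(A, X)

# sum_X phi1(A) * phi2(A, X)  ==  phi1(A) * sum_X phi2(A, X)
lhs = (phi1[:, None] * phi2).sum(axis=1)
rhs = phi1 * phi2.sum(axis=1)
assert np.allclose(lhs, rhs)
```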

  14. Variable elimination algorithm for sum-product inference  Sum out each variable, one at a time:
All factors containing that variable are removed from the set of factors and multiplied to generate a product factor.
The variable is summed out from the product factor, yielding a new factor.
The new factor is added to the set of available factors.
The resulting factor does not necessarily correspond to any probability or conditional probability in the network.

  15. Procedure Sum-Product-VE(Z, G)
    // Z: the variables to be eliminated
    Φ ← all factors of G
    Select an elimination order Z_1, ..., Z_k for Z
    for i = 1, ..., k
        Φ ← Sum-Product-Elim-Var(Φ, Z_i)
    ϕ* ← Π_{ϕ ∈ Φ} ϕ
    return ϕ*

Procedure Sum-Product-Elim-Var(Φ, Z)
    Φ' ← {ϕ ∈ Φ : Z ∈ Scope(ϕ)}
    Φ'' ← Φ − Φ'
    τ ← Σ_Z Π_{ϕ ∈ Φ'} ϕ
    return Φ'' ∪ {τ}

In each elimination step: move all factors that are irrelevant to the variable being eliminated outside of the summation; perform the sum, getting a new factor; insert the new factor into the product. For a directed graph with no evidence, the result needs no normalization.
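The two procedures translate almost line by line into code. Below is a hedged sketch (my own data structures and names, not the slides'): each factor stores the names of the variables in its scope together with a NumPy table whose axes follow that scope, and Sum-Product-VE repeatedly multiplies the factors mentioning the current variable and sums it out.

```python
import numpy as np
from functools import reduce

class Factor:
    """A factor phi(scope): `table` axes follow the variable names in `scope`."""
    def __init__(self, scope, table):
        self.scope = list(scope)
        self.table = np.asarray(table, dtype=float)

def multiply(f1, f2):
    """Factor product: align both tables on the union of the two scopes and multiply."""
    scope = f1.scope + [v for v in f2.scope if v not in f1.scope]
    def expand(f):
        # append length-1 axes for the missing variables, then reorder axes to `scope`
        t = f.table.reshape(f.table.shape + (1,) * (len(scope) - len(f.scope)))
        axes_now = f.scope + [v for v in scope if v not in f.scope]
        return np.transpose(t, [axes_now.index(v) for v in scope])
    return Factor(scope, expand(f1) * expand(f2))

def sum_out(f, var):
    """Marginalize one variable out of a factor."""
    return Factor([v for v in f.scope if v != var],
                  f.table.sum(axis=f.scope.index(var)))

def eliminate_var(factors, var):
    """Sum-Product-Elim-Var: multiply the factors mentioning `var`, then sum it out."""
    used = [f for f in factors if var in f.scope]
    rest = [f for f in factors if var not in f.scope]
    return rest + [sum_out(reduce(multiply, used), var)]

def sum_product_ve(factors, elim_order):
    """Sum-Product-VE: eliminate the variables in order, return the final product phi*."""
    for var in elim_order:
        factors = eliminate_var(factors, var)
    return reduce(multiply, factors)
```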

  16. Procedure Cond-Prob-VE(K, Y, E = e)
    // K: the network over 𝒳; Y: set of query variables; E = e: evidence
    Φ ← the factors parameterizing K
    Replace each ϕ ∈ Φ by its reduction ϕ[E = e]
    Select an elimination order Z_1, ..., Z_k for Z = 𝒳 − Y − E
    for i = 1, ..., k
        Φ ← Sum-Product-Elim-Var(Φ, Z_i)
    ϕ* ← Π_{ϕ ∈ Φ} ϕ
    α ← Σ_{y ∈ Val(Y)} ϕ*(y)
    return α, ϕ*
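Conditioning reuses the same machinery. The sketch below continues the hypothetical helpers from the previous block: it first reduces every factor to the observed evidence values, then eliminates all non-query variables and normalizes, returning the normalized conditional ϕ*/α rather than the pair (α, ϕ*).

```python
import numpy as np

def reduce_evidence(f, evidence):
    """Restrict a factor to the observed values of any evidence variables in its scope."""
    table, scope = f.table, list(f.scope)
    for var, val in evidence.items():
        if var in scope:
            table = np.take(table, val, axis=scope.index(var))
            scope.remove(var)
    return Factor(scope, table)

def cond_prob_ve(factors, query_vars, evidence):
    """Cond-Prob-VE: reduce by the evidence, eliminate non-query variables, normalize."""
    factors = [reduce_evidence(f, evidence) for f in factors]
    remaining = {v for f in factors for v in f.scope}
    elim_order = [v for v in remaining if v not in query_vars]
    phi_star = sum_product_ve(factors, elim_order)
    return Factor(phi_star.scope, phi_star.table / phi_star.table.sum())
```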

  17. Directed example  Query: P(X_2 | X_7 = x_7)
Nodes: X_1, X_2, X_3, X_4, X_5, X_6, X_7, X_8
P(X_2 | x_7) ∝ P(X_2, x_7)
P(x_2, x_7) = Σ_{x_1} Σ_{x_3} Σ_{x_4} Σ_{x_5} Σ_{x_6} Σ_{x_8} P(x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8)
Consider the elimination order X_1, X_3, X_4, X_5, X_6, X_8:
P(x_2, x_7) = Σ_{x_8} Σ_{x_6} Σ_{x_5} Σ_{x_4} Σ_{x_3} Σ_{x_1} P(x_1) P(x_2) P(x_3 | x_1, x_2) P(x_4 | x_3) P(x_5 | x_2) P(x_6 | x_3, x_7) P(x_7 | x_4, x_5) P(x_8 | x_7)

  18. Directed example (cont.)
P(x_2, x_7) = Σ_{x_8} Σ_{x_6} Σ_{x_5} Σ_{x_4} Σ_{x_3} P(x_2) P(x_4 | x_3) P(x_5 | x_2) P(x_6 | x_3, x_7) P(x_7 | x_4, x_5) P(x_8 | x_7) Σ_{x_1} P(x_1) P(x_3 | x_1, x_2)
= Σ_{x_8} Σ_{x_6} Σ_{x_5} Σ_{x_4} Σ_{x_3} P(x_2) P(x_4 | x_3) P(x_5 | x_2) P(x_6 | x_3, x_7) P(x_7 | x_4, x_5) P(x_8 | x_7) m_1(x_2, x_3)
= Σ_{x_8} Σ_{x_6} Σ_{x_5} Σ_{x_4} P(x_2) P(x_5 | x_2) P(x_7 | x_4, x_5) P(x_8 | x_7) Σ_{x_3} P(x_4 | x_3) P(x_6 | x_3, x_7) m_1(x_2, x_3)
= Σ_{x_8} Σ_{x_6} Σ_{x_5} Σ_{x_4} P(x_2) P(x_5 | x_2) P(x_7 | x_4, x_5) P(x_8 | x_7) m_3(x_2, x_4, x_6)
= Σ_{x_8} Σ_{x_6} Σ_{x_5} P(x_2) P(x_5 | x_2) P(x_8 | x_7) Σ_{x_4} P(x_7 | x_4, x_5) m_3(x_2, x_4, x_6)
= Σ_{x_8} Σ_{x_6} Σ_{x_5} P(x_2) P(x_5 | x_2) P(x_8 | x_7) m_4(x_2, x_5, x_6)
= Σ_{x_8} Σ_{x_6} P(x_2) P(x_8 | x_7) Σ_{x_5} P(x_5 | x_2) m_4(x_2, x_5, x_6)
= Σ_{x_8} Σ_{x_6} P(x_2) P(x_8 | x_7) m_5(x_2, x_6)
= Σ_{x_8} P(x_2) P(x_8 | x_7) Σ_{x_6} m_5(x_2, x_6)
= Σ_{x_8} P(x_2) P(x_8 | x_7) m_6(x_2)
= m_8(x_2) m_6(x_2)
Since X_7 = x_7 is observed, it is fixed to its observed value and does not appear as an argument of the intermediate factors m_i.

  19. Conditional probability
P(x_2 | x_7) = m_8(x_2) m_6(x_2) / Σ_{x_2} m_8(x_2) m_6(x_2)
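An end-to-end usage sketch of this example: the factors of the network above are built with random (made-up) binary CPTs, P(X2 | X7 = 1) is computed with the elimination order X1, X3, X4, X5, X6, X8, and the result is checked against brute-force enumeration over the full joint. It assumes the hypothetical Factor, multiply, reduce_evidence and sum_product_ve helpers sketched after slides 15 and 16; all names and CPT values are illustrative.

```python
import numpy as np
from functools import reduce

rng = np.random.default_rng(3)
def cpt(*cards):
    """Random CPT P(child | parents): axes are (parents..., child), normalized over child."""
    t = rng.random(cards)
    return t / t.sum(axis=-1, keepdims=True)

k = 2  # all variables binary
factors = [                                   # one CPT factor per node of the example
    Factor(['x1'], cpt(k)),
    Factor(['x2'], cpt(k)),
    Factor(['x1', 'x2', 'x3'], cpt(k, k, k)), # P(x3 | x1, x2)
    Factor(['x3', 'x4'], cpt(k, k)),          # P(x4 | x3)
    Factor(['x2', 'x5'], cpt(k, k)),          # P(x5 | x2)
    Factor(['x3', 'x7', 'x6'], cpt(k, k, k)), # P(x6 | x3, x7)
    Factor(['x4', 'x5', 'x7'], cpt(k, k, k)), # P(x7 | x4, x5)
    Factor(['x7', 'x8'], cpt(k, k)),          # P(x8 | x7)
]

# P(X2 | X7 = 1) by variable elimination with the order X1, X3, X4, X5, X6, X8.
reduced = [reduce_evidence(f, {'x7': 1}) for f in factors]
phi_star = sum_product_ve(reduced, ['x1', 'x3', 'x4', 'x5', 'x6', 'x8'])
p_ve = phi_star.table / phi_star.table.sum()

# Brute-force check: build the full joint, fix x7 = 1, and marginalize everything else.
joint = reduce(multiply, factors)
keep = [joint.scope.index(v) for v in ('x2', 'x7')]
p_x2_x7 = joint.table.sum(axis=tuple(a for a in range(len(joint.scope)) if a not in keep))
p_bf = p_x2_x7[:, 1] / p_x2_x7[:, 1].sum()    # remaining axes keep their order: (x2, x7)

assert np.allclose(p_ve, p_bf)
print(p_ve)
```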
