  1. Inference in Bayesian Networks CE417: Introduction to Artificial Intelligence Sharif University of Technology Spring 2018 Soleymani Slides are based on Klein and Abbeel, CS188, UC Berkeley.

  2. Bayes' Nets
      Representation
      Conditional Independences
      Probabilistic Inference
       Enumeration (exact, exponential complexity)
       Variable elimination (exact, worst-case exponential complexity, often better)
       Probabilistic inference is NP-complete
       Sampling (approximate)
      Learning Bayes' Nets from Data

  3. Recap: Bayes' Net Representation
      A directed, acyclic graph, one node per random variable
      A conditional probability table (CPT) for each node: a collection of distributions over X, one for each combination of parents' values
      Bayes' nets implicitly encode joint distributions, as a product of local conditional distributions
      To see what probability a BN gives to a full assignment, multiply all the relevant conditionals together:
       P(x1, …, xn) = Πi P(xi | parents(Xi))

  4. Example: Alarm Network
     Nodes: Burglary (B), Earthquake (E), Alarm (A), JohnCalls (J), MaryCalls (M)
     Structure: B → A ← E, A → J, A → M

      B   P(B)        E   P(E)
      +b  0.001       +e  0.002
      -b  0.999       -e  0.998

      B   E   A   P(A|B,E)
      +b  +e  +a  0.95
      +b  +e  -a  0.05
      +b  -e  +a  0.94
      +b  -e  -a  0.06
      -b  +e  +a  0.29
      -b  +e  -a  0.71
      -b  -e  +a  0.001
      -b  -e  -a  0.999

      A   J   P(J|A)      A   M   P(M|A)
      +a  +j  0.9         +a  +m  0.7
      +a  -j  0.1         +a  -m  0.3
      -a  +j  0.05        -a  +m  0.01
      -a  -j  0.95        -a  -m  0.99

     [Demo: BN Applet]
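
To make the product of local conditionals concrete, here is a minimal Python sketch (not from the slides) that stores the CPTs above as plain dictionaries and multiplies the relevant entries for one full assignment; the names P_B, P_E, P_A, P_J, P_M and joint() are our own.

```python
# CPT values copied from the alarm-network slide above.
P_B = {"+b": 0.001, "-b": 0.999}
P_E = {"+e": 0.002, "-e": 0.998}
P_A = {("+b", "+e", "+a"): 0.95, ("+b", "+e", "-a"): 0.05,
       ("+b", "-e", "+a"): 0.94, ("+b", "-e", "-a"): 0.06,
       ("-b", "+e", "+a"): 0.29, ("-b", "+e", "-a"): 0.71,
       ("-b", "-e", "+a"): 0.001, ("-b", "-e", "-a"): 0.999}
P_J = {("+a", "+j"): 0.9, ("+a", "-j"): 0.1,
       ("-a", "+j"): 0.05, ("-a", "-j"): 0.95}
P_M = {("+a", "+m"): 0.7, ("+a", "-m"): 0.3,
       ("-a", "+m"): 0.01, ("-a", "-m"): 0.99}

def joint(b, e, a, j, m):
    """P(b, e, a, j, m) as the product of the five local conditionals."""
    return P_B[b] * P_E[e] * P_A[(b, e, a)] * P_J[(a, j)] * P_M[(a, m)]

# One full assignment: 0.001 * 0.998 * 0.94 * 0.9 * 0.7
print(joint("+b", "-e", "+a", "+j", "+m"))
```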

  5. Video of Demo BN Applet

  6. Example: Alarm Network
     Graph: B → A ← E, A → J, A → M, with the CPTs P(B), P(E), P(A|B,E), P(J|A), P(M|A) as on slide 4

  8. Bayes' Nets
      Representation
      Conditional Independences
      Probabilistic Inference
       Enumeration (exact, exponential complexity)
       Variable elimination (exact, worst-case exponential complexity, often better)
       Inference is NP-complete
       Sampling (approximate)
      Learning Bayes' Nets from Data

  9. Inference
      Inference: calculating some useful quantity from a joint probability distribution
      Examples:
       Posterior probability
       Most likely explanation

  10. Inference by Enumeration
      General case:
       Evidence variables: E1 … Ek = e1 … ek
       Query* variable: Q
       Hidden variables: H1 … Hr
      (together, all variables X1 … Xn)
      We want: P(Q | e1 … ek)
      Step 1: Select the entries consistent with the evidence
      Step 2: Sum out H to get the joint of Query and evidence
      Step 3: Normalize
      * Works fine with multiple query variables, too
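
The three steps fit in a few lines of Python; a minimal sketch (not from the slides) in which the joint is an explicit list of (row, probability) pairs and enumerate_query is our own name.

```python
def enumerate_query(joint, query, evidence):
    """P(query | evidence): select, sum out hidden variables, normalize."""
    # Step 1: select entries consistent with the evidence.
    selected = [(row, p) for row, p in joint
                if all(row[var] == val for var, val in evidence.items())]
    # Step 2: sum out everything except the query variable.
    sums = {}
    for row, p in selected:
        sums[row[query]] = sums.get(row[query], 0.0) + p
    # Step 3: normalize.
    z = sum(sums.values())
    return {val: p / z for val, p in sums.items()}

# Tiny made-up joint over Q and E, just to show the call:
tiny = [
    ({"Q": "+q", "E": "+e"}, 0.12), ({"Q": "+q", "E": "-e"}, 0.18),
    ({"Q": "-q", "E": "+e"}, 0.08), ({"Q": "-q", "E": "-e"}, 0.62),
]
print(enumerate_query(tiny, "Q", {"E": "+e"}))  # {'+q': 0.6, '-q': 0.4}
```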

  11. Inference by Enumeration in Bayes' Net
      Given unlimited time, inference in BNs is easy
      Reminder of inference by enumeration, by example on the alarm network (B → A ← E, A → J, A → M):

  12. Burglary example: full joint probability

      P(b | j, ¬m) = P(j, ¬m, b) / P(j, ¬m)
                   = ( Σ_A Σ_E P(j, ¬m, b, A, E) ) / P(j, ¬m)
                   = ( Σ_A Σ_E P(j | A) P(¬m | A) P(A | b, E) P(b) P(E) ) / P(j, ¬m)

      Short-hands: j: JohnCalls = True; ¬b: Burglary = False; …
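
The nested sum can be checked directly; a sketch reusing the CPT dictionaries P_B, P_E, P_A, P_J, P_M from the alarm-network sketch above (the function name is ours).

```python
def p_b_given(j, m):
    """P(B | j, m): sum out the hidden variables A and E, then normalize over B."""
    unnormalized = {}
    for b in ("+b", "-b"):
        total = 0.0
        for e in ("+e", "-e"):
            for a in ("+a", "-a"):
                total += P_B[b] * P_E[e] * P_A[(b, e, a)] * P_J[(a, j)] * P_M[(a, m)]
        unnormalized[b] = total
    z = sum(unnormalized.values())
    return {b: p / z for b, p in unnormalized.items()}

print(p_b_given("+j", "-m"))  # P(B | j, ¬m)
```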

  13. Inference by Enumeration?

  14. Factor Zoo

  15. Factor Zoo I
      Joint distribution: P(X,Y)
       Entries P(x,y) for all x, y
       Sums to 1

       T     W     P
       hot   sun   0.4
       hot   rain  0.1
       cold  sun   0.2
       cold  rain  0.3

      Selected joint: P(x,Y)
       A slice of the joint distribution
       Entries P(x,y) for fixed x, all y
       Sums to P(x)

       T     W     P
       cold  sun   0.2
       cold  rain  0.3

      Number of capitals = dimensionality of the table

  16. Factor Zoo II
      Single conditional: P(Y | x)
       Entries P(y | x) for fixed x, all y
       Sums to 1

       T     W     P
       cold  sun   0.4
       cold  rain  0.6

      Family of conditionals: P(X | Y)
       Multiple conditionals
       Entries P(x | y) for all x, y
       Sums to |Y|

       T     W     P
       hot   sun   0.8
       hot   rain  0.2
       cold  sun   0.4
       cold  rain  0.6

  17. Factor Zoo III
      Specified family: P(y | X)
       Entries P(y | x) for fixed y, but for all x
       Sums to … who knows!

       T     W     P
       hot   rain  0.2
       cold  rain  0.6

  18. Factor Zoo Summary
      In general, when we write P(Y1 … YN | X1 … XM)
       It is a "factor," a multi-dimensional array
       Its values are P(y1 … yN | x1 … xM)
       Any assigned (= lower-case) X or Y is a dimension missing (selected) from the array
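
One way to hold such factors in code is a flat dictionary from assignments to numbers; a minimal sketch (not from the slides), with row() as our own helper.

```python
def row(**assignment):
    """A hashable, order-independent row key, e.g. row(T="hot", W="sun")."""
    return frozenset(assignment.items())

# The joint P(T, W) from the Factor Zoo I slide:
p_tw = {
    row(T="hot", W="sun"): 0.4, row(T="hot", W="rain"): 0.1,
    row(T="cold", W="sun"): 0.2, row(T="cold", W="rain"): 0.3,
}

# The selected joint P(cold, W) is the slice consistent with T=cold:
slice_cold = {k: v for k, v in p_tw.items() if ("T", "cold") in k}
print(sum(slice_cold.values()))  # 0.5 = P(T=cold)
```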

  19. Example: Traffic Domain
      Random variables:
       R: Raining
       T: Traffic
       L: Late for class!
      Network: R → T → L

      R   P(R)       R   T   P(T|R)      T   L   P(L|T)
      +r  0.1        +r  +t  0.8         +t  +l  0.3
      -r  0.9        +r  -t  0.2         +t  -l  0.7
                     -r  +t  0.1         -t  +l  0.1
                     -r  -t  0.9         -t  -l  0.9

  20. Inference by Enumeration: Procedural Outline
      Track objects called factors
      Initial factors are local CPTs (one per node):
       P(R): +r 0.1, -r 0.9
       P(T|R): +r +t 0.8, +r -t 0.2, -r +t 0.1, -r -t 0.9
       P(L|T): +t +l 0.3, +t -l 0.7, -t +l 0.1, -t -l 0.9
      Any known values are selected
       E.g. if we know L = +l, the initial factors are:
       P(R): +r 0.1, -r 0.9
       P(T|R): +r +t 0.8, +r -t 0.2, -r +t 0.1, -r -t 0.9
       P(+l|T): +t +l 0.3, -t +l 0.1
      Procedure: Join all factors, then eliminate all hidden variables
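
Selecting known values is just filtering rows; a minimal sketch (not from the slides) using the frozenset row keys from the Factor Zoo sketch above, with restrict() as our own name.

```python
def restrict(factor, var, value):
    """Keep only the rows where `var` takes `value`."""
    return {k: v for k, v in factor.items() if (var, value) in k}

p_l_given_t = {
    frozenset({("T", "+t"), ("L", "+l")}): 0.3,
    frozenset({("T", "+t"), ("L", "-l")}): 0.7,
    frozenset({("T", "-t"), ("L", "+l")}): 0.1,
    frozenset({("T", "-t"), ("L", "-l")}): 0.9,
}
# Knowing L = +l keeps exactly the two +l rows shown on the slide.
print(restrict(p_l_given_t, "L", "+l"))
```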

  21. Operation 1: Join Factors
      First basic operation: joining factors
      Combining factors:
       Just like a database join
       Get all factors over the joining variable
       Build a new factor over the union of the variables involved
      Example: Join on R, P(R) × P(T|R) → P(R,T)

       R   P(R)      R   T   P(T|R)     R   T   P(R,T)
       +r  0.1       +r  +t  0.8        +r  +t  0.08
       -r  0.9       +r  -t  0.2        +r  -t  0.02
                     -r  +t  0.1        -r  +t  0.09
                     -r  -t  0.9        -r  -t  0.81

      Computation for each entry: pointwise products
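
The pointwise-product join in the same dictionary representation; a minimal sketch (not from the slides), with join() as our own name.

```python
def join(f1, f2):
    """Multiply rows of two factors that agree on their shared variables."""
    vars1 = {var for key in f1 for var, _ in key}
    vars2 = {var for key in f2 for var, _ in key}
    shared = vars1 & vars2
    out = {}
    for k1, v1 in f1.items():
        for k2, v2 in f2.items():
            if {p for p in k1 if p[0] in shared} == {p for p in k2 if p[0] in shared}:
                out[k1 | k2] = v1 * v2  # union of assignments, product of values
    return out

p_r = {frozenset({("R", "+r")}): 0.1, frozenset({("R", "-r")}): 0.9}
p_t_given_r = {
    frozenset({("R", "+r"), ("T", "+t")}): 0.8,
    frozenset({("R", "+r"), ("T", "-t")}): 0.2,
    frozenset({("R", "-r"), ("T", "+t")}): 0.1,
    frozenset({("R", "-r"), ("T", "-t")}): 0.9,
}
print(join(p_r, p_t_given_r))  # P(R,T): 0.08, 0.02, 0.09, 0.81
```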

  22. Example: Multiple Joins

  23. Example: Multiple Joins

      Join R: P(R) × P(T|R) → P(R,T)
       +r +t 0.08
       +r -t 0.02
       -r +t 0.09
       -r -t 0.81

      Join T: P(R,T) × P(L|T) → P(R,T,L)
       +r +t +l 0.024
       +r +t -l 0.056
       +r -t +l 0.002
       +r -t -l 0.018
       -r +t +l 0.027
       -r +t -l 0.063
       -r -t +l 0.081
       -r -t -l 0.729

  24. Operation 2: Eliminate
      Second basic operation: marginalization
       Take a factor and sum out a variable
       Shrinks a factor to a smaller one
       A projection operation
      Example: sum out R from P(R,T)

       R   T   P(R,T)      T   P(T)
       +r  +t  0.08        +t  0.17
       +r  -t  0.02        -t  0.83
       -r  +t  0.09
       -r  -t  0.81
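
Summing out a variable in the same representation; a minimal sketch (not from the slides), with eliminate() as our own name.

```python
def eliminate(factor, var):
    """Drop `var` from every row and add up the rows that collide."""
    out = {}
    for key, value in factor.items():
        smaller = frozenset(p for p in key if p[0] != var)
        out[smaller] = out.get(smaller, 0.0) + value
    return out

p_rt = {
    frozenset({("R", "+r"), ("T", "+t")}): 0.08,
    frozenset({("R", "+r"), ("T", "-t")}): 0.02,
    frozenset({("R", "-r"), ("T", "+t")}): 0.09,
    frozenset({("R", "-r"), ("T", "-t")}): 0.81,
}
print(eliminate(p_rt, "R"))  # P(T): +t 0.17, -t 0.83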

  25. Multiple Elimination

      P(R,T,L)            Sum out R → P(T,L)     Sum out T → P(L)
      +r +t +l 0.024      +t +l 0.051            +l 0.134
      +r +t -l 0.056      +t -l 0.119            -l 0.866
      +r -t +l 0.002      -t +l 0.083
      +r -t -l 0.018      -t -l 0.747
      -r +t +l 0.027
      -r +t -l 0.063
      -r -t +l 0.081
      -r -t -l 0.729

  26. Thus Far: Multiple Join, Multiple Eliminate (= Inference by Enumeration)

  27. Inference by Enumeration vs. Variable Elimination
      Why is inference by enumeration so slow?
       You join up the whole joint distribution before you sum out the hidden variables
      Idea: interleave joining and marginalizing!
       Called "Variable Elimination"
       Still NP-hard, but usually much faster than inference by enumeration
      First we'll need some new notation: factors

  28. Traffic Domain (R → T → L), query P(L)

      Inference by Enumeration       Variable Elimination
       Join on r                      Join on r
       Join on t                      Eliminate r
       Eliminate r                    Join on t
       Eliminate t                    Eliminate t

  29. Marginalizing Early (= Variable Elimination)

  30. Marginalizing Early! (aka VE)
      Start from the initial factors P(R), P(T|R), P(L|T).

      Join R: P(R) × P(T|R) → P(R,T)
       +r +t 0.08, +r -t 0.02, -r +t 0.09, -r -t 0.81
      Sum out R: → P(T)
       +t 0.17, -t 0.83
      Join T: P(T) × P(L|T) → P(T,L)
       +t +l 0.051, +t -l 0.119, -t +l 0.083, -t -l 0.747
      Sum out T: → P(L)
       +l 0.134, -l 0.866
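
The whole marginalize-early pipeline is two joins and two eliminations; a sketch reusing join(), eliminate(), and the factors p_r, p_t_given_r, p_l_given_t defined in the sketches above.

```python
p_t = eliminate(join(p_r, p_t_given_r), "R")   # join on R, sum out R -> P(T)
p_l = eliminate(join(p_t, p_l_given_t), "T")   # join on T, sum out T -> P(L)
for key, value in p_l.items():
    print(dict(key), round(value, 3))          # {'L': '+l'} 0.134, {'L': '-l'} 0.866
```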

  31. Evidence
      If evidence, start with factors that select that evidence
      No evidence uses these initial factors:
       P(R): +r 0.1, -r 0.9
       P(T|R): +r +t 0.8, +r -t 0.2, -r +t 0.1, -r -t 0.9
       P(L|T): +t +l 0.3, +t -l 0.7, -t +l 0.1, -t -l 0.9
      Computing P(L | +r), the initial factors become:
       P(+r): +r 0.1
       P(T|+r): +r +t 0.8, +r -t 0.2
       P(L|T): +t +l 0.3, +t -l 0.7, -t +l 0.1, -t -l 0.9
      We eliminate all vars other than query + evidence

  32. Evidence II
      Result will be a selected joint of query and evidence
       E.g. for P(L | +r), we would end up with:

       Selected joint P(+r, L)     Normalize → P(L | +r)
       +r +l 0.026                 +l 0.26
       +r -l 0.074                 -l 0.74

      To get our answer, just normalize this! That's it!
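
The last step is a one-line normalization; a minimal sketch (not from the slides), with normalize() as our own name.

```python
def normalize(factor):
    """Divide every entry by the sum of all entries so they sum to 1."""
    z = sum(factor.values())
    return {k: v / z for k, v in factor.items()}

selected = {("+r", "+l"): 0.026, ("+r", "-l"): 0.074}
print(normalize(selected))  # +l 0.26, -l 0.74
```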

  33. Distribution of products over sums
      Exploiting the factorization properties allows sums and products to be interchanged: b × (c + d) requires two operations, while b × c + b × d needs three.
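
This is exactly the saving variable elimination exploits: a factor that does not mention the summed variable can be pulled out of the sum. A toy check with made-up numbers:

```python
g = {"+a": 0.3, "-a": 0.7}  # a factor over A
b = 0.5                     # a factor that does not mention A

distributed = sum(b * g[a] for a in g)  # two multiplications, one addition
factored = b * sum(g[a] for a in g)     # one multiplication, one addition
assert abs(distributed - factored) < 1e-12
```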
