  1. Inference in Bayesian Networks. CE417: Introduction to Artificial Intelligence, Sharif University of Technology, Spring 2019, Soleymani. Slides are based on Klein and Abbeel, CS188, UC Berkeley.

  2. Bayes’ Nets
     - Representation
     - Conditional Independences
     - Probabilistic Inference
       - Enumeration (exact, exponential complexity)
       - Variable elimination (exact, worst-case exponential complexity, often better)
       - Probabilistic inference is NP-complete
       - Sampling (approximate)
     - Learning Bayes’ Nets from Data

  3. Recap: Bayes’ Net Representation
     - A directed, acyclic graph, one node per random variable
     - A conditional probability table (CPT) for each node: a collection of distributions over X, one for each combination of parents’ values
     - Bayes’ nets implicitly encode joint distributions as a product of local conditional distributions
     - To see what probability a BN gives to a full assignment, multiply all the relevant conditionals together:

         P(x_1, ..., x_n) = Π_i P(x_i | parents(X_i))

  4. Example: Alarm Network
     Nodes: Burglary (B) and Earthquake (E) are parents of Alarm (A); Alarm is the parent of JohnCalls (J) and MaryCalls (M).

       B    P(B)          E    P(E)
       +b   0.001         +e   0.002
       -b   0.999         -e   0.998

       B    E    A    P(A|B,E)
       +b   +e   +a   0.95
       +b   +e   -a   0.05
       +b   -e   +a   0.94
       +b   -e   -a   0.06
       -b   +e   +a   0.29
       -b   +e   -a   0.71
       -b   -e   +a   0.001
       -b   -e   -a   0.999

       A    J    P(J|A)        A    M    P(M|A)
       +a   +j   0.9           +a   +m   0.7
       +a   -j   0.1           +a   -m   0.3
       -a   +j   0.05          -a   +m   0.01
       -a   -j   0.95          -a   -m   0.99

     [Demo: BN Applet]
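
To make the product-of-conditionals rule from slide 3 concrete, here is a minimal Python sketch that encodes the CPTs above as plain dicts and scores one full assignment. The dict encoding and all names are illustrative choices, not the course's code.

```python
# Alarm-network CPTs as plain dicts (illustrative encoding).
P_B = {'+b': 0.001, '-b': 0.999}
P_E = {'+e': 0.002, '-e': 0.998}
P_A = {('+b', '+e', '+a'): 0.95,  ('+b', '+e', '-a'): 0.05,
       ('+b', '-e', '+a'): 0.94,  ('+b', '-e', '-a'): 0.06,
       ('-b', '+e', '+a'): 0.29,  ('-b', '+e', '-a'): 0.71,
       ('-b', '-e', '+a'): 0.001, ('-b', '-e', '-a'): 0.999}
P_J = {('+a', '+j'): 0.9,  ('+a', '-j'): 0.1,
       ('-a', '+j'): 0.05, ('-a', '-j'): 0.95}
P_M = {('+a', '+m'): 0.7,  ('+a', '-m'): 0.3,
       ('-a', '+m'): 0.01, ('-a', '-m'): 0.99}

def joint(b, e, a, j, m):
    """P(b, e, a, j, m): the product of the local conditionals (slide 3)."""
    return P_B[b] * P_E[e] * P_A[(b, e, a)] * P_J[(a, j)] * P_M[(a, m)]

# e.g. P(+b, -e, +a, +j, -m) = 0.001 * 0.998 * 0.94 * 0.9 * 0.3
print(joint('+b', '-e', '+a', '+j', '-m'))  # ≈ 0.000253
```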

  5. Video of Demo BN Applet

  6. Example: Alarm Network (the same network and CPTs as slide 4)

  7. Example: Alarm Network (verbatim repeat of slide 6)

  8. Bayes’ Nets
     - Representation
     - Conditional Independences
     - Probabilistic Inference
       - Enumeration (exact, exponential complexity)
       - Variable elimination (exact, worst-case exponential complexity, often better)
       - Inference is NP-complete
       - Sampling (approximate)
     - Learning Bayes’ Nets from Data

  9. Inference
     - Inference: calculating some useful quantity from a joint probability distribution
     - Examples:
       - Posterior probability: P(Q | E_1 = e_1, ..., E_k = e_k)
       - Most likely explanation: argmax_q P(Q = q | E_1 = e_1, ...)

  10. Inference by Enumeration
      - General case:
        - Evidence variables:  E_1, ..., E_k = e_1, ..., e_k
        - Query* variable:     Q
        - Hidden variables:    H_1, ..., H_r
        (together: all variables X_1, ..., X_n)
      - We want: P(Q | e_1, ..., e_k)
        (* works fine with multiple query variables, too)
      - Step 1: Select the entries consistent with the evidence
      - Step 2: Sum out H to get the joint of Query and evidence
      - Step 3: Normalize (multiply by 1/Z)
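
A minimal sketch of the three steps, assuming the joint is given as an explicit table (here the small temperature/weather table that reappears on slide 15). The list-of-pairs representation and all names are illustrative.

```python
# The three enumeration steps over an explicit joint table (illustrative).
joint = [({'T': 'hot',  'W': 'sun'},  0.4),
         ({'T': 'hot',  'W': 'rain'}, 0.1),
         ({'T': 'cold', 'W': 'sun'},  0.2),
         ({'T': 'cold', 'W': 'rain'}, 0.3)]

def enumeration_query(table, q_var, evidence):
    # Step 1: select the entries consistent with the evidence.
    rows = [(a, p) for a, p in table
            if all(a[v] == val for v, val in evidence.items())]
    # Step 2: sum out the hidden variables, keeping only the query variable.
    dist = {}
    for a, p in rows:
        dist[a[q_var]] = dist.get(a[q_var], 0.0) + p
    # Step 3: normalize by Z, the total remaining mass.
    z = sum(dist.values())
    return {val: p / z for val, p in dist.items()}

print(enumeration_query(joint, 'T', {'W': 'sun'}))
# ≈ {'hot': 0.667, 'cold': 0.333}
```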

  11. Inference by Enumeration in Bayes’ Net
      - Given unlimited time, inference in BNs is easy
      - Reminder of inference by enumeration by example (alarm network: B, E → A → J, M):

        P(B | +j, +m) ∝ P(B, +j, +m)
                      = Σ_{e,a} P(B, e, a, +j, +m)
                      = Σ_{e,a} P(B) P(e) P(a | B, e) P(+j | a) P(+m | a)
                      =   P(B) P(+e) P(+a | B, +e) P(+j | +a) P(+m | +a)
                        + P(B) P(+e) P(-a | B, +e) P(+j | -a) P(+m | -a)
                        + P(B) P(-e) P(+a | B, -e) P(+j | +a) P(+m | +a)
                        + P(B) P(-e) P(-a | B, -e) P(+j | -a) P(+m | -a)
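
The same derivation can be run literally. Below is a sketch that re-declares the slide-4 CPT dicts, sums over e and a, and then normalizes; names are illustrative.

```python
from itertools import product

# CPTs from the slide-4 alarm network (same illustrative dict encoding).
P_B = {'+b': 0.001, '-b': 0.999}
P_E = {'+e': 0.002, '-e': 0.998}
P_A = {('+b', '+e', '+a'): 0.95,  ('+b', '+e', '-a'): 0.05,
       ('+b', '-e', '+a'): 0.94,  ('+b', '-e', '-a'): 0.06,
       ('-b', '+e', '+a'): 0.29,  ('-b', '+e', '-a'): 0.71,
       ('-b', '-e', '+a'): 0.001, ('-b', '-e', '-a'): 0.999}
P_J = {('+a', '+j'): 0.9, ('-a', '+j'): 0.05}   # only the +j rows are needed
P_M = {('+a', '+m'): 0.7, ('-a', '+m'): 0.01}   # only the +m rows are needed

# P(B, +j, +m) = sum over e, a of P(B) P(e) P(a|B,e) P(+j|a) P(+m|a)
unnorm = {b: sum(P_B[b] * P_E[e] * P_A[(b, e, a)]
                 * P_J[(a, '+j')] * P_M[(a, '+m')]
                 for e, a in product(('+e', '-e'), ('+a', '-a')))
          for b in P_B}

z = sum(unnorm.values())                      # normalization constant
print({b: p / z for b, p in unnorm.items()})  # ≈ {'+b': 0.284, '-b': 0.716}
```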

  12. Burglary example: full joint probability

      P(b | j, ¬m) = P(j, ¬m, b) / P(j, ¬m)
                   = Σ_e Σ_a P(j, ¬m, b, a, e) / Σ_B Σ_e Σ_a P(j, ¬m, B, a, e)
                   = Σ_e Σ_a P(j | a) P(¬m | a) P(a | b, e) P(b) P(e)
                     / Σ_B Σ_e Σ_a P(j | a) P(¬m | a) P(a | B, e) P(B) P(e)

      Short-hands: j: JohnCalls = True, ¬b: Burglary = False, ...

  13. Inference by Enumeration? P(Antilock | observed variables) = ?

  14. Factor Zoo

  15. Factor Zoo I
      - Joint distribution: P(X, Y)
        - Entries P(x, y) for all x, y
        - Sums to 1

          T     W     P
          hot   sun   0.4
          hot   rain  0.1
          cold  sun   0.2
          cold  rain  0.3

      - Selected joint: P(x, Y)
        - A slice of the joint distribution
        - Entries P(x, y) for fixed x, all y
        - Sums to P(x)

          T     W     P
          cold  sun   0.2
          cold  rain  0.3

      - Number of capitals = dimensionality of the table

  16. Factor Zoo II
      - Single conditional: P(Y | x)
        - Entries P(y | x) for fixed x, all y
        - Sums to 1

          T     W     P
          cold  sun   0.4
          cold  rain  0.6

      - Family of conditionals: P(X | Y)
        - Multiple conditionals
        - Entries P(x | y) for all x, y
        - Sums to |Y|

          T     W     P
          hot   sun   0.8
          hot   rain  0.2
          cold  sun   0.4
          cold  rain  0.6

  17. Factor Zoo III
      - Specified family: P(y | X)
        - Entries P(y | x) for fixed y, but for all x
        - Sums to ... who knows!

          T     W     P
          hot   rain  0.2
          cold  rain  0.6

  18. Factor Zoo Summary
      - In general, when we write P(Y_1 ... Y_N | X_1 ... X_M)
      - It is a “factor,” a multi-dimensional array
      - Its values are P(y_1 ... y_N | x_1 ... x_M)
      - Any assigned (= lower-case) X or Y is a dimension missing (selected) from the array
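
A minimal sketch of this summary, assuming a factor is stored as a dict from value tuples to numbers: assigning (lower-casing) a variable selects the matching rows and drops that dimension. Names are illustrative.

```python
# A factor as a dict from value tuples to numbers (illustrative encoding).
# Family of conditionals P(W | T): one dimension per capital variable.
p_w_given_t = {('hot', 'sun'): 0.8, ('hot', 'rain'): 0.2,
               ('cold', 'sun'): 0.4, ('cold', 'rain'): 0.6}

def select(factor, axis, value):
    """Assign one variable: keep matching rows, drop that dimension."""
    return {key[:axis] + key[axis + 1:]: p
            for key, p in factor.items() if key[axis] == value}

# Single conditional P(W | cold): T is assigned, so only W remains.
print(select(p_w_given_t, 0, 'cold'))  # {('sun',): 0.4, ('rain',): 0.6}
```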

  19. Example: Traffic Domain
      - Random variables:
        - R: Raining
        - T: Traffic
        - L: Late for class!
      - Chain network R → T → L, with CPTs:

          P(R):        P(T|R):         P(L|T):
          +r  0.1      +r +t  0.8      +t +l  0.3
          -r  0.9      +r -t  0.2      +t -l  0.7
                       -r +t  0.1      -t +l  0.1
                       -r -t  0.9      -t -l  0.9

      - P(L) = ?   P(L) = Σ_{r,t} P(r, t, L) = Σ_{r,t} P(r) P(t | r) P(L | t)

  20. Inference by Enumeration: Procedural Outline
      - Track objects called factors
      - Initial factors are local CPTs (one per node):

          +r  0.1      +r +t  0.8      +t +l  0.3
          -r  0.9      +r -t  0.2      +t -l  0.7
                       -r +t  0.1      -t +l  0.1
                       -r -t  0.9      -t -l  0.9

      - Any known values are selected
      - E.g. if we know L = +l, the initial factors are:

          +r  0.1      +r +t  0.8      +t +l  0.3
          -r  0.9      +r -t  0.2      -t +l  0.1
                       -r +t  0.1
                       -r -t  0.9

      - Procedure: Join all factors, then eliminate all hidden variables

  21. Operation 1: Join Factors
      - First basic operation: joining factors
      - Combining factors: just like a database join
        - Get all factors over the joining variable
        - Build a new factor over the union of the variables involved
      - Example: Join on R, i.e. P(R) × P(T|R) → P(R,T):

          P(R):        P(T|R):         P(R,T):
          +r  0.1      +r +t  0.8      +r +t  0.08
          -r  0.9      +r -t  0.2      +r -t  0.02
                       -r +t  0.1      -r +t  0.09
                       -r -t  0.9      -r -t  0.81

      - Computation for each entry: pointwise products
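
A sketch of the join on R, on the same dict representation assumed earlier: one pointwise product per (r, t) pair, reproducing the P(R,T) table above. Names are illustrative.

```python
# Join on R: pointwise products of P(R) and P(T|R), giving P(R,T).
p_r = {'+r': 0.1, '-r': 0.9}
p_t_given_r = {('+r', '+t'): 0.8, ('+r', '-t'): 0.2,
               ('-r', '+t'): 0.1, ('-r', '-t'): 0.9}

def join_on_r(f_r, f_tr):
    """One entry per (r, t): multiply the matching rows of the two factors."""
    return {(r, t): f_r[r] * p for (r, t), p in f_tr.items()}

p_rt = join_on_r(p_r, p_t_given_r)
print(p_rt)  # ≈ {('+r', '+t'): 0.08, ('+r', '-t'): 0.02,
             #    ('-r', '+t'): 0.09, ('-r', '-t'): 0.81}
```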

  22. Example: Multiple Joins

  23. Example: Multiple Joins
      Join R (P(R) × P(T|R) → P(R,T)), then Join T (× P(L|T) → P(R,T,L)):

        After Join R, P(R,T):
          +r +t  0.08
          +r -t  0.02
          -r +t  0.09
          -r -t  0.81

        After Join T, P(R,T,L):
          +r +t +l  0.024
          +r +t -l  0.056
          +r -t +l  0.002
          +r -t -l  0.018
          -r +t +l  0.027
          -r +t -l  0.063
          -r -t +l  0.081
          -r -t -l  0.729

  24. Operation 2: Eliminate
      - Second basic operation: marginalization
      - Take a factor and sum out a variable
        - Shrinks a factor to a smaller one
        - A projection operation
      - Example: sum out R from P(R,T), leaving P(T):

          +r +t  0.08        +t  0.17
          +r -t  0.02   →    -t  0.83
          -r +t  0.09
          -r -t  0.81
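
A sketch of elimination on the dict representation: summing R out of P(R,T) reproduces the 0.17 / 0.83 table above. Names are illustrative.

```python
# Eliminate (sum out) R from P(R,T), leaving the smaller factor P(T).
p_rt = {('+r', '+t'): 0.08, ('+r', '-t'): 0.02,
        ('-r', '+t'): 0.09, ('-r', '-t'): 0.81}

def sum_out_first(factor):
    """Marginalize the first variable of each key tuple."""
    out = {}
    for key, p in factor.items():
        out[key[1:]] = out.get(key[1:], 0.0) + p
    return out

print(sum_out_first(p_rt))  # ≈ {('+t',): 0.17, ('-t',): 0.83}
```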

  25. Multiple Elimination
      Start from P(R,T,L) (slide 23); sum out R, then sum out T:

        P(R,T,L):             Sum out R, P(T,L):    Sum out T, P(L):
        +r +t +l  0.024         +t +l  0.051          +l  0.134
        +r +t -l  0.056         +t -l  0.119          -l  0.866
        +r -t +l  0.002         -t +l  0.083
        +r -t -l  0.018         -t -l  0.747
        -r +t +l  0.027
        -r +t -l  0.063
        -r -t +l  0.081
        -r -t -l  0.729

  26. Thus Far: Multiple Join, Multiple Eliminate (= Inference by Enumeration)
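
Combining the two operations gives inference by enumeration in code: join everything into the full joint P(R,T,L), then marginalize down to P(L). A sketch, with illustrative names:

```python
from itertools import product

# Traffic-domain CPTs (same illustrative dict encoding as before).
p_r = {'+r': 0.1, '-r': 0.9}
p_t_given_r = {('+r', '+t'): 0.8, ('+r', '-t'): 0.2,
               ('-r', '+t'): 0.1, ('-r', '-t'): 0.9}
p_l_given_t = {('+t', '+l'): 0.3, ('+t', '-l'): 0.7,
               ('-t', '+l'): 0.1, ('-t', '-l'): 0.9}

# Join R, then join T: build the full 8-entry joint P(R,T,L).
joint = {(r, t, l): p_r[r] * p_t_given_r[(r, t)] * p_l_given_t[(t, l)]
         for r, t, l in product(('+r', '-r'), ('+t', '-t'), ('+l', '-l'))}

# Sum out R, then T: marginalize all the way down to P(L).
p_l = {}
for (r, t, l), p in joint.items():
    p_l[l] = p_l.get(l, 0.0) + p
print(p_l)  # ≈ {'+l': 0.134, '-l': 0.866}
```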

  27. Inference by Enumeration vs. Variable Elimination
      - Why is inference by enumeration so slow?
        - You join up the whole joint distribution before you sum out the hidden variables
      - Idea: interleave joining and marginalizing!
        - Called “Variable Elimination”
        - Still NP-hard, but usually much faster than inference by enumeration
      - First we’ll need some new notation: factors

  28. Traffic Domain: P(L) = ?   (chain R → T → L)
      - Inference by Enumeration:
          P(L) = Σ_t Σ_r P(L | t) P(r) P(t | r)
          Join on r, join on t, eliminate r, eliminate t
      - Variable Elimination:
          P(L) = Σ_t P(L | t) Σ_r P(r) P(t | r)
          Join on r, eliminate r, join on t, eliminate t

  29. Marginalizing Early (= Variable Elimination)

  30. Marginalizing Early! (aka VE)
      Join R, sum out R, join T, sum out T (P(L|T) is carried along unchanged until the Join T step):

        Join R, P(R,T):       Sum out R, P(T):
        +r +t  0.08             +t  0.17
        +r -t  0.02             -t  0.83
        -r +t  0.09
        -r -t  0.81

        Join T, P(T,L):       Sum out T, P(L):
        +t +l  0.051            +l  0.134
        +t -l  0.119            -l  0.866
        -t +l  0.083
        -t -l  0.747
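
The same query with marginalization done early: each variable is summed out as soon as it has been joined, so no intermediate factor ever holds more than four entries (the enumeration pipeline built an eight-entry joint). A sketch, with illustrative names:

```python
# Traffic-domain CPTs again (illustrative dict encoding).
p_r = {'+r': 0.1, '-r': 0.9}
p_t_given_r = {('+r', '+t'): 0.8, ('+r', '-t'): 0.2,
               ('-r', '+t'): 0.1, ('-r', '-t'): 0.9}
p_l_given_t = {('+t', '+l'): 0.3, ('+t', '-l'): 0.7,
               ('-t', '+l'): 0.1, ('-t', '-l'): 0.9}

# Join R and immediately sum it out: P(T) = sum_r P(r) P(T|r).
p_t = {t: sum(p_r[r] * p_t_given_r[(r, t)] for r in p_r)
       for t in ('+t', '-t')}            # ≈ {'+t': 0.17, '-t': 0.83}

# Join T and sum it out: P(L) = sum_t P(t) P(L|t).
p_l = {l: sum(p_t[t] * p_l_given_t[(t, l)] for t in p_t)
       for l in ('+l', '-l')}
print(p_l)  # ≈ {'+l': 0.134, '-l': 0.866}, same answer, smaller factors
```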

  31. Evidence
      - If evidence, start with factors that select that evidence
      - No evidence uses these initial factors:

          +r  0.1      +r +t  0.8      +t +l  0.3
          -r  0.9      +r -t  0.2      +t -l  0.7
                       -r +t  0.1      -t +l  0.1
                       -r -t  0.9      -t -l  0.9

      - Computing P(L | +r), the initial factors become:

          +r  0.1      +r +t  0.8      +t +l  0.3
                       +r -t  0.2      +t -l  0.7
                                       -t +l  0.1
                                       -t -l  0.9

      - We eliminate all vars other than query + evidence
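
A sketch of evidence selection for the query P(L | +r): restrict every factor mentioning R to its +r rows, eliminate the hidden variable T, then normalize over L. Names are illustrative.

```python
# Evidence selection for P(L | +r) in the traffic domain (illustrative).
p_r = {'+r': 0.1, '-r': 0.9}
p_t_given_r = {('+r', '+t'): 0.8, ('+r', '-t'): 0.2,
               ('-r', '+t'): 0.1, ('-r', '-t'): 0.9}
p_l_given_t = {('+t', '+l'): 0.3, ('+t', '-l'): 0.7,
               ('-t', '+l'): 0.1, ('-t', '-l'): 0.9}

# Select R = +r in every factor that mentions R.
f_r = {'+r': p_r['+r']}
f_t = {t: p for (r, t), p in p_t_given_r.items() if r == '+r'}

# Eliminate the hidden variable T, then normalize over the query L.
unnorm = {l: sum(f_r['+r'] * f_t[t] * p_l_given_t[(t, l)] for t in f_t)
          for l in ('+l', '-l')}
z = sum(unnorm.values())
print({l: p / z for l, p in unnorm.items()})  # ≈ {'+l': 0.26, '-l': 0.74}
```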
