

1. Lecture 7: Inference in Bayesian Networks
   Marco Chiarandini
   Department of Mathematics & Computer Science, University of Southern Denmark
   Slides by Stuart Russell and Peter Norvig

2. Course Overview
   ✔ Introduction
     ✔ Artificial Intelligence
     ✔ Intelligent Agents
   ✔ Search
     ✔ Uninformed Search
     ✔ Heuristic Search
   Uncertain knowledge and Reasoning
     ✔ Probability and Bayesian approach
     Bayesian Networks
     Hidden Markov Chains
     Kalman Filters
   Learning
     Supervised Learning: Bayesian Networks, Neural Networks
     Unsupervised Learning: EM Algorithm
     Reinforcement Learning
   Games and Adversarial Search
     Minimax search and Alpha-beta pruning
     Multiagent search
   Knowledge representation and Reasoning
     Propositional logic
     First order logic
     Inference
   Planning

3. Bayesian networks, resumé
   Encode local conditional independences: Pr(X_i | X_-i) = Pr(X_i | Parents(X_i))
   Thus the global semantics simplifies to the joint probability factorization:
     Pr(X_1, ..., X_n) = ∏_{i=1..n} Pr(X_i | X_1, ..., X_{i-1})     (chain rule)
                       = ∏_{i=1..n} Pr(X_i | Parents(X_i))          (by construction)
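
To make the factorization concrete, here is a minimal Python sketch that evaluates one full joint entry of the burglary network used later in these slides as a product of CPT entries. The CPT rows for Burglary = false (0.29 and 0.001) do not appear in this transcript and are filled in from the standard textbook network, so treat them as assumptions.

    # Joint probability of one full assignment in the burglary network,
    # computed as the product of CPT entries Pr(x_i | parents(x_i)).
    P_B = 0.001                                   # Pr(Burglary = true)
    P_E = 0.002                                   # Pr(Earthquake = true)
    P_A = {(True, True): 0.95, (True, False): 0.94,      # Pr(Alarm = true | B, E);
           (False, True): 0.29, (False, False): 0.001}   # the B = false rows are assumed textbook values
    P_J = {True: 0.90, False: 0.05}               # Pr(JohnCalls = true | Alarm)
    P_M = {True: 0.70, False: 0.01}               # Pr(MaryCalls = true | Alarm)

    def joint(b, e, a, j, m):
        """Pr(b, e, a, j, m) = Pr(b) Pr(e) Pr(a|b,e) Pr(j|a) Pr(m|a)."""
        p = P_B if b else 1 - P_B
        p *= P_E if e else 1 - P_E
        p *= P_A[(b, e)] if a else 1 - P_A[(b, e)]
        p *= P_J[a] if j else 1 - P_J[a]
        p *= P_M[a] if m else 1 - P_M[a]
        return p

    # Probability that the alarm sounds and both neighbours call,
    # with neither a burglary nor an earthquake:
    print(joint(False, False, True, True, True))   # ≈ 0.00063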

4. Outline
   1. Inference in BN

5. Inference tasks
   Simple queries: compute the posterior marginal Pr(X_i | E = e),
     e.g., P(NoGas | Gauge = empty, Lights = on, Starts = false)
   Conjunctive queries: Pr(X_i, X_j | E = e) = Pr(X_i | E = e) Pr(X_j | X_i, E = e)
   Optimal decisions: decision networks include utility information;
     probabilistic inference is required for P(outcome | action, evidence)
   Value of information: which evidence to seek next?
   Sensitivity analysis: which probability values are most critical?
   Explanation: why do I need a new starter motor?

6. Inference by enumeration
   Sum out variables from the joint without actually constructing its explicit representation.
   Simple query on the burglary network (B and E are the parents of A; A is the parent of J and M):
     Pr(B | j, m) = Pr(B, j, m) / P(j, m)
                  = α Pr(B, j, m)
                  = α ∑_e ∑_a Pr(B, e, a, j, m)
   Rewrite full joint entries using products of CPT entries:
     Pr(B | j, m) = α ∑_e ∑_a Pr(B) P(e) Pr(a | B, e) P(j | a) P(m | a)
                  = α Pr(B) ∑_e P(e) ∑_a Pr(a | B, e) P(j | a) P(m | a)
   Recursive depth-first enumeration: O(n) space, O(d^n) time

7. Enumeration algorithm

   function Enumeration-Ask(X, e, bn) returns a distribution over X
     inputs: X, the query variable
             e, observed values for variables E
             bn, a Bayesian network with variables {X} ∪ E ∪ Y
     Q(X) ← a distribution over X, initially empty
     for each value x_i of X do
       Q(x_i) ← Enumerate-All(bn.Vars, e ∪ {X = x_i})
     return Normalize(Q(X))

   function Enumerate-All(vars, e) returns a real number
     if Empty?(vars) then return 1.0
     Y ← First(vars)
     if Y has value y in e
       then return P(y | parents(Y)) × Enumerate-All(Rest(vars), e)
       else return ∑_y P(y | parents(Y)) × Enumerate-All(Rest(vars), e ∪ {Y = y})
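
The following Python sketch mirrors Enumeration-Ask / Enumerate-All, specialised to the burglary network. The dictionary encoding and the helper names are illustrations, not code from the slides; the CPT rows for Burglary = false are assumed textbook values.

    # Burglary network in topological order: variable -> (parents, P(var = true | parent values)).
    NETWORK = {
        "B": ((), {(): 0.001}),
        "E": ((), {(): 0.002}),
        "A": (("B", "E"), {(True, True): 0.95, (True, False): 0.94,
                           (False, True): 0.29, (False, False): 0.001}),
        "J": (("A",), {(True,): 0.90, (False,): 0.05}),
        "M": (("A",), {(True,): 0.70, (False,): 0.01}),
    }

    def prob(var, value, e):
        """P(var = value | parents(var)), read off the CPT under assignment e."""
        parents, cpt = NETWORK[var]
        p_true = cpt[tuple(e[p] for p in parents)]
        return p_true if value else 1.0 - p_true

    def enumerate_all(variables, e):
        if not variables:
            return 1.0
        Y, rest = variables[0], variables[1:]
        if Y in e:                               # evidence variable: use its fixed value
            return prob(Y, e[Y], e) * enumerate_all(rest, e)
        return sum(prob(Y, y, e) * enumerate_all(rest, {**e, Y: y})   # hidden variable: sum it out
                   for y in (True, False))

    def enumeration_ask(X, e):
        """Posterior distribution over X given evidence e, by full enumeration."""
        q = {x: enumerate_all(list(NETWORK), {**e, X: x}) for x in (True, False)}
        norm = sum(q.values())
        return {x: v / norm for x, v in q.items()}

    print(enumeration_ask("B", {"J": True, "M": True}))   # ≈ {True: 0.284, False: 0.716}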

8. Evaluation tree
   [Figure: evaluation tree for Pr(B | j, m) with b fixed, branching on e and then a, with edge
    probabilities P(b) = .001, P(e) = .002, P(¬e) = .998, P(a|b,e) = .95, P(¬a|b,e) = .05,
    P(a|b,¬e) = .94, P(¬a|b,¬e) = .06, P(j|a) = .90, P(j|¬a) = .05, P(m|a) = .70, P(m|¬a) = .01]
   Enumeration is inefficient because of repeated computation:
   e.g., it computes P(j|a) P(m|a) for each value of e.

9. Inference by variable elimination
   Variable elimination: carry out summations right-to-left, storing intermediate results (factors)
   to avoid recomputation.
     Pr(B | j, m) = α Pr(B) ∑_e P(e) ∑_a Pr(a | B, e) P(j | a) P(m | a)
                    (one factor per variable: B, E, A, J, M)
                  = α Pr(B) ∑_e P(e) ∑_a Pr(a | B, e) P(j | a) f_M(a)
                  = α Pr(B) ∑_e P(e) ∑_a Pr(a | B, e) f_J(a) f_M(a)
                  = α Pr(B) ∑_e P(e) ∑_a f_A(a, b, e) f_J(a) f_M(a)
                  = α Pr(B) ∑_e P(e) f_ĀJM(b, e)      (sum out A)
                  = α Pr(B) f_ĒĀJM(b)                 (sum out E)
                  = α f_B(b) × f_ĒĀJM(b)

10. Variable elimination: basic operations
    Summing out a variable from a product of factors:
    1. Move any constant factors outside the summation:
         ∑_x f_1 × ... × f_k = f_1 × ... × f_i × ∑_x f_{i+1} × ... × f_k = f_1 × ... × f_i × f_X̄
       assuming f_1, ..., f_i do not depend on X.
    2. Add up submatrices in the pointwise product of the remaining factors.
    Pointwise product of f_1 and f_2:
      f_1(x_1, ..., x_j, y_1, ..., y_k) × f_2(y_1, ..., y_k, z_1, ..., z_l) = f(x_1, ..., x_j, y_1, ..., y_k, z_1, ..., z_l)
    E.g., f_1(a, b) × f_2(b, c) = f(a, b, c)
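
A minimal Python sketch of these two factor operations, assuming a factor is represented as a tuple of variable names plus a table keyed by Boolean assignments (this representation is an illustration, not the slides' notation):

    from itertools import product

    def pointwise_product(f1, f2):
        """Combine two factors: f(x, y, z) = f1(x, y) * f2(y, z)."""
        vars1, t1 = f1
        vars2, t2 = f2
        out_vars = vars1 + tuple(v for v in vars2 if v not in vars1)
        table = {}
        for assignment in product([True, False], repeat=len(out_vars)):
            val = dict(zip(out_vars, assignment))
            k1 = tuple(val[v] for v in vars1)
            k2 = tuple(val[v] for v in vars2)
            table[assignment] = t1[k1] * t2[k2]
        return out_vars, table

    def sum_out(var, factor):
        """Eliminate var by adding the entries that agree on all other variables."""
        variables, table = factor
        keep = tuple(v for v in variables if v != var)
        out = {}
        for assignment, value in table.items():
            key = tuple(a for v, a in zip(variables, assignment) if v != var)
            out[key] = out.get(key, 0.0) + value
        return keep, out

    # f1(a, b) * f2(b, c) = f(a, b, c), then sum out b to get a factor over (a, c).
    f1 = (("a", "b"), {(True, True): 0.3, (True, False): 0.7,
                       (False, True): 0.9, (False, False): 0.1})
    f2 = (("b", "c"), {(True, True): 0.2, (True, False): 0.8,
                       (False, True): 0.6, (False, False): 0.4})
    print(sum_out("b", pointwise_product(f1, f2)))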

11. Irrelevant variables
    Consider the query P(JohnCalls | Burglary = true) on the burglary network:
      P(J | b) = α P(b) ∑_e P(e) ∑_a P(a | b, e) P(J | a) ∑_m P(m | a)
    The sum over m is identically 1, so M is irrelevant to the query.
    Theorem: Y is irrelevant unless Y ∈ Ancestors({X} ∪ E).
    Here X = JohnCalls, E = {Burglary}, and Ancestors({X} ∪ E) = {Alarm, Earthquake},
    so MaryCalls is irrelevant.
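
As an illustration of the ancestor criterion, a small Python sketch that keeps only the ancestors of {X} ∪ E in the burglary network; the parent lists and helper names are assumptions made for this example.

    PARENTS = {"B": [], "E": [], "A": ["B", "E"], "J": ["A"], "M": ["A"]}

    def ancestors(nodes):
        """All ancestors of the given nodes (including the nodes themselves)."""
        seen, stack = set(), list(nodes)
        while stack:
            n = stack.pop()
            if n not in seen:
                seen.add(n)
                stack.extend(PARENTS[n])
        return seen

    def relevant_variables(X, evidence):
        return ancestors({X} | set(evidence))

    print(relevant_variables("J", {"B"}))   # {'J', 'A', 'B', 'E'} — MaryCalls is pruned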

12. Irrelevant variables contd.
    Defn: the moral graph of a DAG (Bayes net) is obtained by marrying all parents and dropping the arrows.
    Defn: A is m-separated from B by C iff A is separated from B by C in the moral graph.
    Theorem: Y is irrelevant if it is m-separated from X by E.
    For P(JohnCalls | Alarm = true), both Burglary and Earthquake are irrelevant.
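
A plain-Python sketch of the two definitions, again on the burglary network: build the moral graph, then check separation with a breadth-first search that is not allowed to pass through the conditioning set. The encoding is an assumption made for illustration.

    from itertools import combinations
    from collections import deque

    PARENTS = {"B": [], "E": [], "A": ["B", "E"], "J": ["A"], "M": ["A"]}

    def moral_graph(parents):
        """Undirected graph: original edges plus edges between co-parents."""
        adj = {v: set() for v in parents}
        for child, ps in parents.items():
            for p in ps:                          # drop the arrows
                adj[child].add(p); adj[p].add(child)
            for p, q in combinations(ps, 2):      # marry the parents
                adj[p].add(q); adj[q].add(p)
        return adj

    def m_separated(a, b, blockers, parents):
        """True iff every path from a to b in the moral graph passes through blockers."""
        adj = moral_graph(parents)
        seen, queue = {a}, deque([a])
        while queue:
            n = queue.popleft()
            if n == b:
                return False
            for nxt in adj[n]:
                if nxt not in seen and nxt not in blockers:
                    seen.add(nxt); queue.append(nxt)
        return True

    # For P(JohnCalls | Alarm = true), Burglary and Earthquake are m-separated from J by {A}:
    print(m_separated("B", "J", {"A"}, PARENTS))  # True
    print(m_separated("E", "J", {"A"}, PARENTS))  # True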

13. Complexity of exact inference
    Singly connected networks (or polytrees):
    – any two nodes are connected by at most one (undirected) path
    – time and space cost (with variable elimination) are O(d^k n)
    – hence time and space cost are linear in n when k is bounded by a constant
    Multiply connected networks:
    – can reduce 3SAT to exact inference ⟹ NP-hard
    – equivalent to counting 3SAT models ⟹ #P-complete
    Proof of this in one of the exercises for Thursday.

14. Inference by stochastic simulation
    Basic idea:
    – Draw N samples from a sampling distribution S
    – Compute an approximate posterior probability P̂
    – Show this converges to the true probability P
    Outline:
    – Sampling from an empty network
    – Rejection sampling: reject samples disagreeing with evidence
    – Likelihood weighting: use evidence to weight samples
    – Markov chain Monte Carlo (MCMC): sample from a stochastic process whose
      stationary distribution is the true posterior

15. Sampling from an empty network

    function Prior-Sample(bn) returns an event sampled from bn
      inputs: bn, a belief network specifying joint distribution Pr(X_1, ..., X_n)
      x ← an event with n elements
      for i = 1 to n do
        x_i ← a random sample from Pr(X_i | parents(X_i)),
              given the values of Parents(X_i) in x
      return x

    (Ancestor sampling.)

16. Example
    [Network: Cloudy is the parent of Sprinkler and Rain; Sprinkler and Rain are the parents of WetGrass]

    P(C) = .50

    C | P(S|C)        C | P(R|C)
    T | .10           T | .80
    F | .50           F | .20

    S  R | P(W|S,R)
    T  T | .99
    T  F | .90
    F  T | .90
    F  F | .01
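
A Python sketch of Prior-Sample from the previous slide, specialised to this network; the list-of-tuples encoding is an assumption. The final lines empirically check the prior probability 0.324 derived on the next slide.

    import random

    # Each entry: (variable, parents, CPT giving P(var = true | parent values)).
    SPRINKLER_NET = [
        ("Cloudy",    (),                    {(): 0.50}),
        ("Sprinkler", ("Cloudy",),           {(True,): 0.10, (False,): 0.50}),
        ("Rain",      ("Cloudy",),           {(True,): 0.80, (False,): 0.20}),
        ("WetGrass",  ("Sprinkler", "Rain"), {(True, True): 0.99, (True, False): 0.90,
                                              (False, True): 0.90, (False, False): 0.01}),
    ]

    def prior_sample(net):
        """Sample every variable in topological order from P(X_i | parents(X_i))."""
        event = {}
        for var, parents, cpt in net:
            p_true = cpt[tuple(event[p] for p in parents)]
            event[var] = random.random() < p_true
        return event

    # Estimate the prior probability of the event (t, f, t, t), i.e.
    # Cloudy, ¬Sprinkler, Rain, WetGrass, computed exactly on the next slide as 0.324.
    N = 100_000
    target = {"Cloudy": True, "Sprinkler": False, "Rain": True, "WetGrass": True}
    hits = sum(prior_sample(SPRINKLER_NET) == target for _ in range(N))
    print(hits / N)   # ≈ 0.324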

17. Sampling from an empty network contd.
    The probability that Prior-Sample generates a particular event is
      S_PS(x_1, ..., x_n) = P(x_1, ..., x_n)
    i.e., the true prior probability.
    E.g., S_PS(t, f, t, t) = 0.5 × 0.9 × 0.8 × 0.9 = 0.324 = P(t, f, t, t)
    Proof: let N_PS(x_1, ..., x_n) be the number of samples generated for the event x_1, ..., x_n. Then
      lim_{N→∞} P̂(x_1, ..., x_n) = lim_{N→∞} N_PS(x_1, ..., x_n) / N
                                   = S_PS(x_1, ..., x_n)
                                   = ∏_{i=1..n} P(x_i | parents(X_i))
                                   = P(x_1, ..., x_n)
    That is, estimates derived from Prior-Sample are consistent.
    Shorthand: P̂(x_1, ..., x_n) ≈ P(x_1, ..., x_n)

18. Rejection sampling
    P̂r(X | e) is estimated from the samples agreeing with e.

    function Rejection-Sampling(X, e, bn, N) returns an estimate of P(X | e)
      local variables: N, a vector of counts over X, initially zero
      for j = 1 to N do
        x ← Prior-Sample(bn)
        if x is consistent with e then
          N[x] ← N[x] + 1, where x is the value of X in x
      return Normalize(N[X])

    E.g., estimate Pr(Rain | Sprinkler = true) using 100 samples:
    27 samples have Sprinkler = true; of these, 8 have Rain = true and 19 have Rain = false.
    P̂r(Rain | Sprinkler = true) = Normalize(⟨8, 19⟩) = ⟨0.296, 0.704⟩
    Similar to a basic real-world empirical estimation procedure.
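
A Python sketch of rejection sampling for Pr(Rain | Sprinkler = true) on the sprinkler network; the network encoding repeats the prior-sampling sketch above so the block runs on its own, and none of it is code from the slides.

    import random
    from collections import Counter

    SPRINKLER_NET = [
        ("Cloudy",    (),                    {(): 0.50}),
        ("Sprinkler", ("Cloudy",),           {(True,): 0.10, (False,): 0.50}),
        ("Rain",      ("Cloudy",),           {(True,): 0.80, (False,): 0.20}),
        ("WetGrass",  ("Sprinkler", "Rain"), {(True, True): 0.99, (True, False): 0.90,
                                              (False, True): 0.90, (False, False): 0.01}),
    ]

    def prior_sample(net):
        event = {}
        for var, parents, cpt in net:
            p_true = cpt[tuple(event[p] for p in parents)]
            event[var] = random.random() < p_true
        return event

    def rejection_sampling(X, evidence, net, n_samples):
        """Count X only in samples consistent with the evidence, then normalize."""
        counts = Counter()
        for _ in range(n_samples):
            sample = prior_sample(net)
            if all(sample[var] == val for var, val in evidence.items()):
                counts[sample[X]] += 1
        total = sum(counts.values())
        return {value: count / total for value, count in counts.items()}

    print(rejection_sampling("Rain", {"Sprinkler": True}, SPRINKLER_NET, 10_000))
    # ≈ {True: 0.3, False: 0.7}, matching the ⟨0.296, 0.704⟩ estimate on the slide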

19. Analysis of rejection sampling
    Rejection sampling returns consistent posterior estimates.
    Proof:
      P̂r(X | e) = α N_PS(X, e)               (algorithm definition)
                 = N_PS(X, e) / N_PS(e)       (normalized by N_PS(e))
                 ≈ Pr(X, e) / P(e)            (property of Prior-Sample)
                 = Pr(X | e)                  (definition of conditional probability)
    Problem: hopelessly expensive if P(e) is small.
    P(e) drops off exponentially with the number of evidence variables!
