

1. Lecture 7: Inference in Bayesian Networks
   Marco Chiarandini
   Department of Mathematics & Computer Science, University of Southern Denmark
   Slides by Stuart Russell and Peter Norvig

2. Course Overview
   ✔ Introduction
     ✔ Artificial Intelligence
     ✔ Intelligent Agents
   ✔ Search
     ✔ Uninformed Search
     ✔ Heuristic Search
   Uncertain knowledge and Reasoning
     ✔ Probability and Bayesian approach
     Bayesian Networks
     Hidden Markov Chains
     Kalman Filters
   Learning
     Supervised Learning: Bayesian Networks, Neural Networks
     Unsupervised Learning: EM Algorithm
     Reinforcement Learning
   Games and Adversarial Search
     Minimax search and Alpha-beta pruning
     Multiagent search
   Knowledge representation and Reasoning
     Propositional logic
     First order logic
     Inference
   Planning

3. Bayesian networks, resumé
   Encode local conditional independences: Pr(X_i | X_-i) = Pr(X_i | Parents(X_i))
   Thus the global semantics simplifies to the joint probability factorization:
     Pr(X_1, ..., X_n) = ∏_{i=1..n} Pr(X_i | X_1, ..., X_{i-1})     (chain rule)
                       = ∏_{i=1..n} Pr(X_i | Parents(X_i))          (by construction)
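
To make the factorization concrete, here is a minimal Python sketch that evaluates one full joint entry of the burglary network used later in these slides as a product of CPT entries. The CPT rows for Burglary = false (0.29 and 0.001) do not appear in this transcript and are filled in from the standard textbook network, so treat them as assumptions.

    # Joint probability of one full assignment in the burglary network,
    # computed as the product of CPT entries Pr(x_i | parents(x_i)).
    P_B = 0.001                                   # Pr(Burglary = true)
    P_E = 0.002                                   # Pr(Earthquake = true)
    P_A = {(True, True): 0.95, (True, False): 0.94,      # Pr(Alarm = true | B, E);
           (False, True): 0.29, (False, False): 0.001}   # the B = false rows are assumed textbook values
    P_J = {True: 0.90, False: 0.05}               # Pr(JohnCalls = true | Alarm)
    P_M = {True: 0.70, False: 0.01}               # Pr(MaryCalls = true | Alarm)

    def joint(b, e, a, j, m):
        """Pr(b, e, a, j, m) = Pr(b) Pr(e) Pr(a|b,e) Pr(j|a) Pr(m|a)."""
        p = P_B if b else 1 - P_B
        p *= P_E if e else 1 - P_E
        p *= P_A[(b, e)] if a else 1 - P_A[(b, e)]
        p *= P_J[a] if j else 1 - P_J[a]
        p *= P_M[a] if m else 1 - P_M[a]
        return p

    # Probability that the alarm sounds and both neighbours call,
    # with neither a burglary nor an earthquake:
    print(joint(False, False, True, True, True))   # ≈ 0.00063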

4. Outline
   1. Inference in BN

5. Inference tasks
   Simple queries: compute the posterior marginal Pr(X_i | E = e),
     e.g., P(NoGas | Gauge = empty, Lights = on, Starts = false)
   Conjunctive queries: Pr(X_i, X_j | E = e) = Pr(X_i | E = e) Pr(X_j | X_i, E = e)
   Optimal decisions: decision networks include utility information;
     probabilistic inference is required for P(outcome | action, evidence)
   Value of information: which evidence to seek next?
   Sensitivity analysis: which probability values are most critical?
   Explanation: why do I need a new starter motor?

6. Inference by enumeration
   Sum out variables from the joint without actually constructing its explicit representation.
   Simple query on the burglary network (B and E are the parents of A; A is the parent of J and M):
     Pr(B | j, m) = Pr(B, j, m) / P(j, m)
                  = α Pr(B, j, m)
                  = α ∑_e ∑_a Pr(B, e, a, j, m)
   Rewrite full joint entries using products of CPT entries:
     Pr(B | j, m) = α ∑_e ∑_a Pr(B) P(e) Pr(a | B, e) P(j | a) P(m | a)
                  = α Pr(B) ∑_e P(e) ∑_a Pr(a | B, e) P(j | a) P(m | a)
   Recursive depth-first enumeration: O(n) space, O(d^n) time

7. Enumeration algorithm

   function Enumeration-Ask(X, e, bn) returns a distribution over X
     inputs: X, the query variable
             e, observed values for variables E
             bn, a Bayesian network with variables {X} ∪ E ∪ Y
     Q(X) ← a distribution over X, initially empty
     for each value x_i of X do
       Q(x_i) ← Enumerate-All(bn.Vars, e ∪ {X = x_i})
     return Normalize(Q(X))

   function Enumerate-All(vars, e) returns a real number
     if Empty?(vars) then return 1.0
     Y ← First(vars)
     if Y has value y in e
       then return P(y | parents(Y)) × Enumerate-All(Rest(vars), e)
       else return ∑_y P(y | parents(Y)) × Enumerate-All(Rest(vars), e ∪ {Y = y})
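
The following Python sketch mirrors Enumeration-Ask / Enumerate-All, specialised to the burglary network. The dictionary encoding and the helper names are illustrations, not code from the slides; the CPT rows for Burglary = false are assumed textbook values.

    # Burglary network in topological order: variable -> (parents, P(var = true | parent values)).
    NETWORK = {
        "B": ((), {(): 0.001}),
        "E": ((), {(): 0.002}),
        "A": (("B", "E"), {(True, True): 0.95, (True, False): 0.94,
                           (False, True): 0.29, (False, False): 0.001}),
        "J": (("A",), {(True,): 0.90, (False,): 0.05}),
        "M": (("A",), {(True,): 0.70, (False,): 0.01}),
    }

    def prob(var, value, e):
        """P(var = value | parents(var)), read off the CPT under assignment e."""
        parents, cpt = NETWORK[var]
        p_true = cpt[tuple(e[p] for p in parents)]
        return p_true if value else 1.0 - p_true

    def enumerate_all(variables, e):
        if not variables:
            return 1.0
        Y, rest = variables[0], variables[1:]
        if Y in e:                               # evidence variable: use its fixed value
            return prob(Y, e[Y], e) * enumerate_all(rest, e)
        return sum(prob(Y, y, e) * enumerate_all(rest, {**e, Y: y})   # hidden variable: sum it out
                   for y in (True, False))

    def enumeration_ask(X, e):
        """Posterior distribution over X given evidence e, by full enumeration."""
        q = {x: enumerate_all(list(NETWORK), {**e, X: x}) for x in (True, False)}
        norm = sum(q.values())
        return {x: v / norm for x, v in q.items()}

    print(enumeration_ask("B", {"J": True, "M": True}))   # ≈ {True: 0.284, False: 0.716}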

8. Evaluation tree
   [Figure: evaluation tree for Pr(B | j, m) with b fixed, branching on e and then a, with edge
    probabilities P(b) = .001, P(e) = .002, P(¬e) = .998, P(a|b,e) = .95, P(¬a|b,e) = .05,
    P(a|b,¬e) = .94, P(¬a|b,¬e) = .06, P(j|a) = .90, P(j|¬a) = .05, P(m|a) = .70, P(m|¬a) = .01]
   Enumeration is inefficient because of repeated computation:
   e.g., it computes P(j|a) P(m|a) for each value of e.

9. Inference by variable elimination
   Variable elimination: carry out summations right-to-left, storing intermediate results (factors)
   to avoid recomputation.
     Pr(B | j, m) = α Pr(B) ∑_e P(e) ∑_a Pr(a | B, e) P(j | a) P(m | a)
                    (one factor per variable: B, E, A, J, M)
                  = α Pr(B) ∑_e P(e) ∑_a Pr(a | B, e) P(j | a) f_M(a)
                  = α Pr(B) ∑_e P(e) ∑_a Pr(a | B, e) f_J(a) f_M(a)
                  = α Pr(B) ∑_e P(e) ∑_a f_A(a, b, e) f_J(a) f_M(a)
                  = α Pr(B) ∑_e P(e) f_ĀJM(b, e)      (sum out A)
                  = α Pr(B) f_ĒĀJM(b)                 (sum out E)
                  = α f_B(b) × f_ĒĀJM(b)

10. Variable elimination: basic operations
    Summing out a variable from a product of factors:
    1. Move any constant factors outside the summation:
         ∑_x f_1 × ... × f_k = f_1 × ... × f_i × ∑_x f_{i+1} × ... × f_k = f_1 × ... × f_i × f_X̄
       assuming f_1, ..., f_i do not depend on X.
    2. Add up submatrices in the pointwise product of the remaining factors.
    Pointwise product of f_1 and f_2:
      f_1(x_1, ..., x_j, y_1, ..., y_k) × f_2(y_1, ..., y_k, z_1, ..., z_l) = f(x_1, ..., x_j, y_1, ..., y_k, z_1, ..., z_l)
    E.g., f_1(a, b) × f_2(b, c) = f(a, b, c)
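
A minimal Python sketch of these two factor operations, assuming a factor is represented as a tuple of variable names plus a table keyed by Boolean assignments (this representation is an illustration, not the slides' notation):

    from itertools import product

    def pointwise_product(f1, f2):
        """Combine two factors: f(x, y, z) = f1(x, y) * f2(y, z)."""
        vars1, t1 = f1
        vars2, t2 = f2
        out_vars = vars1 + tuple(v for v in vars2 if v not in vars1)
        table = {}
        for assignment in product([True, False], repeat=len(out_vars)):
            val = dict(zip(out_vars, assignment))
            k1 = tuple(val[v] for v in vars1)
            k2 = tuple(val[v] for v in vars2)
            table[assignment] = t1[k1] * t2[k2]
        return out_vars, table

    def sum_out(var, factor):
        """Eliminate var by adding the entries that agree on all other variables."""
        variables, table = factor
        keep = tuple(v for v in variables if v != var)
        out = {}
        for assignment, value in table.items():
            key = tuple(a for v, a in zip(variables, assignment) if v != var)
            out[key] = out.get(key, 0.0) + value
        return keep, out

    # f1(a, b) * f2(b, c) = f(a, b, c), then sum out b to get a factor over (a, c).
    f1 = (("a", "b"), {(True, True): 0.3, (True, False): 0.7,
                       (False, True): 0.9, (False, False): 0.1})
    f2 = (("b", "c"), {(True, True): 0.2, (True, False): 0.8,
                       (False, True): 0.6, (False, False): 0.4})
    print(sum_out("b", pointwise_product(f1, f2)))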

11. Irrelevant variables
    Consider the query P(JohnCalls | Burglary = true) on the burglary network:
      P(J | b) = α P(b) ∑_e P(e) ∑_a P(a | b, e) P(J | a) ∑_m P(m | a)
    The sum over m is identically 1, so M is irrelevant to the query.
    Theorem: Y is irrelevant unless Y ∈ Ancestors({X} ∪ E).
    Here X = JohnCalls, E = {Burglary}, and Ancestors({X} ∪ E) = {Alarm, Earthquake},
    so MaryCalls is irrelevant.
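
As an illustration of the ancestor criterion, a small Python sketch that keeps only the ancestors of {X} ∪ E in the burglary network; the parent lists and helper names are assumptions made for this example.

    PARENTS = {"B": [], "E": [], "A": ["B", "E"], "J": ["A"], "M": ["A"]}

    def ancestors(nodes):
        """All ancestors of the given nodes (including the nodes themselves)."""
        seen, stack = set(), list(nodes)
        while stack:
            n = stack.pop()
            if n not in seen:
                seen.add(n)
                stack.extend(PARENTS[n])
        return seen

    def relevant_variables(X, evidence):
        return ancestors({X} | set(evidence))

    print(relevant_variables("J", {"B"}))   # {'J', 'A', 'B', 'E'} — MaryCalls is pruned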

12. Irrelevant variables contd.
    Defn: the moral graph of a DAG (Bayes net) is obtained by marrying all parents and dropping the arrows.
    Defn: A is m-separated from B by C iff A is separated from B by C in the moral graph.
    Theorem: Y is irrelevant if it is m-separated from X by E.
    For P(JohnCalls | Alarm = true), both Burglary and Earthquake are irrelevant.
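
A plain-Python sketch of the two definitions, again on the burglary network: build the moral graph, then check separation with a breadth-first search that is not allowed to pass through the conditioning set. The encoding is an assumption made for illustration.

    from itertools import combinations
    from collections import deque

    PARENTS = {"B": [], "E": [], "A": ["B", "E"], "J": ["A"], "M": ["A"]}

    def moral_graph(parents):
        """Undirected graph: original edges plus edges between co-parents."""
        adj = {v: set() for v in parents}
        for child, ps in parents.items():
            for p in ps:                          # drop the arrows
                adj[child].add(p); adj[p].add(child)
            for p, q in combinations(ps, 2):      # marry the parents
                adj[p].add(q); adj[q].add(p)
        return adj

    def m_separated(a, b, blockers, parents):
        """True iff every path from a to b in the moral graph passes through blockers."""
        adj = moral_graph(parents)
        seen, queue = {a}, deque([a])
        while queue:
            n = queue.popleft()
            if n == b:
                return False
            for nxt in adj[n]:
                if nxt not in seen and nxt not in blockers:
                    seen.add(nxt); queue.append(nxt)
        return True

    # For P(JohnCalls | Alarm = true), Burglary and Earthquake are m-separated from J by {A}:
    print(m_separated("B", "J", {"A"}, PARENTS))  # True
    print(m_separated("E", "J", {"A"}, PARENTS))  # True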

13. Complexity of exact inference
    Singly connected networks (or polytrees):
    – any two nodes are connected by at most one (undirected) path
    – time and space cost (with variable elimination) are O(d^k n)
    – hence time and space cost are linear in n when k is bounded by a constant
    Multiply connected networks:
    – can reduce 3SAT to exact inference ⟹ NP-hard
    – equivalent to counting 3SAT models ⟹ #P-complete
    Proof of this in one of the exercises for Thursday.

14. Inference by stochastic simulation
    Basic idea:
    – Draw N samples from a sampling distribution S
    – Compute an approximate posterior probability P̂
    – Show this converges to the true probability P
    Outline:
    – Sampling from an empty network
    – Rejection sampling: reject samples disagreeing with evidence
    – Likelihood weighting: use evidence to weight samples
    – Markov chain Monte Carlo (MCMC): sample from a stochastic process whose
      stationary distribution is the true posterior

15. Sampling from an empty network

    function Prior-Sample(bn) returns an event sampled from bn
      inputs: bn, a belief network specifying joint distribution Pr(X_1, ..., X_n)
      x ← an event with n elements
      for i = 1 to n do
        x_i ← a random sample from Pr(X_i | parents(X_i)),
              given the values of Parents(X_i) in x
      return x

    (Ancestor sampling.)

16. Example
    [Network: Cloudy is the parent of Sprinkler and Rain; Sprinkler and Rain are the parents of WetGrass]

    P(C) = .50

    C | P(S|C)        C | P(R|C)
    T | .10           T | .80
    F | .50           F | .20

    S  R | P(W|S,R)
    T  T | .99
    T  F | .90
    F  T | .90
    F  F | .01
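
A Python sketch of Prior-Sample from the previous slide, specialised to this network; the list-of-tuples encoding is an assumption. The final lines empirically check the prior probability 0.324 derived on the next slide.

    import random

    # Each entry: (variable, parents, CPT giving P(var = true | parent values)).
    SPRINKLER_NET = [
        ("Cloudy",    (),                    {(): 0.50}),
        ("Sprinkler", ("Cloudy",),           {(True,): 0.10, (False,): 0.50}),
        ("Rain",      ("Cloudy",),           {(True,): 0.80, (False,): 0.20}),
        ("WetGrass",  ("Sprinkler", "Rain"), {(True, True): 0.99, (True, False): 0.90,
                                              (False, True): 0.90, (False, False): 0.01}),
    ]

    def prior_sample(net):
        """Sample every variable in topological order from P(X_i | parents(X_i))."""
        event = {}
        for var, parents, cpt in net:
            p_true = cpt[tuple(event[p] for p in parents)]
            event[var] = random.random() < p_true
        return event

    # Estimate the prior probability of the event (t, f, t, t), i.e.
    # Cloudy, ¬Sprinkler, Rain, WetGrass, computed exactly on the next slide as 0.324.
    N = 100_000
    target = {"Cloudy": True, "Sprinkler": False, "Rain": True, "WetGrass": True}
    hits = sum(prior_sample(SPRINKLER_NET) == target for _ in range(N))
    print(hits / N)   # ≈ 0.324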

17. Sampling from an empty network contd.
    The probability that Prior-Sample generates a particular event is
      S_PS(x_1, ..., x_n) = P(x_1, ..., x_n)
    i.e., the true prior probability.
    E.g., S_PS(t, f, t, t) = 0.5 × 0.9 × 0.8 × 0.9 = 0.324 = P(t, f, t, t)
    Proof: let N_PS(x_1, ..., x_n) be the number of samples generated for the event x_1, ..., x_n. Then
      lim_{N→∞} P̂(x_1, ..., x_n) = lim_{N→∞} N_PS(x_1, ..., x_n) / N
                                   = S_PS(x_1, ..., x_n)
                                   = ∏_{i=1..n} P(x_i | parents(X_i))
                                   = P(x_1, ..., x_n)
    That is, estimates derived from Prior-Sample are consistent.
    Shorthand: P̂(x_1, ..., x_n) ≈ P(x_1, ..., x_n)

18. Rejection sampling
    P̂r(X | e) is estimated from the samples agreeing with e.

    function Rejection-Sampling(X, e, bn, N) returns an estimate of P(X | e)
      local variables: N, a vector of counts over X, initially zero
      for j = 1 to N do
        x ← Prior-Sample(bn)
        if x is consistent with e then
          N[x] ← N[x] + 1, where x is the value of X in x
      return Normalize(N[X])

    E.g., estimate Pr(Rain | Sprinkler = true) using 100 samples:
    27 samples have Sprinkler = true; of these, 8 have Rain = true and 19 have Rain = false.
    P̂r(Rain | Sprinkler = true) = Normalize(⟨8, 19⟩) = ⟨0.296, 0.704⟩
    Similar to a basic real-world empirical estimation procedure.
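
A Python sketch of rejection sampling for Pr(Rain | Sprinkler = true) on the sprinkler network; the network encoding repeats the prior-sampling sketch above so the block runs on its own, and none of it is code from the slides.

    import random
    from collections import Counter

    SPRINKLER_NET = [
        ("Cloudy",    (),                    {(): 0.50}),
        ("Sprinkler", ("Cloudy",),           {(True,): 0.10, (False,): 0.50}),
        ("Rain",      ("Cloudy",),           {(True,): 0.80, (False,): 0.20}),
        ("WetGrass",  ("Sprinkler", "Rain"), {(True, True): 0.99, (True, False): 0.90,
                                              (False, True): 0.90, (False, False): 0.01}),
    ]

    def prior_sample(net):
        event = {}
        for var, parents, cpt in net:
            p_true = cpt[tuple(event[p] for p in parents)]
            event[var] = random.random() < p_true
        return event

    def rejection_sampling(X, evidence, net, n_samples):
        """Count X only in samples consistent with the evidence, then normalize."""
        counts = Counter()
        for _ in range(n_samples):
            sample = prior_sample(net)
            if all(sample[var] == val for var, val in evidence.items()):
                counts[sample[X]] += 1
        total = sum(counts.values())
        return {value: count / total for value, count in counts.items()}

    print(rejection_sampling("Rain", {"Sprinkler": True}, SPRINKLER_NET, 10_000))
    # ≈ {True: 0.3, False: 0.7}, matching the ⟨0.296, 0.704⟩ estimate on the slide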

19. Analysis of rejection sampling
    Rejection sampling returns consistent posterior estimates.
    Proof:
      P̂r(X | e) = α N_PS(X, e)               (algorithm definition)
                 = N_PS(X, e) / N_PS(e)       (normalized by N_PS(e))
                 ≈ Pr(X, e) / P(e)            (property of Prior-Sample)
                 = Pr(X | e)                  (definition of conditional probability)
    Problem: hopelessly expensive if P(e) is small.
    P(e) drops off exponentially with the number of evidence variables!
