Inference in Bayesian Networks


  1. Inference in Bayesian networks (Chapter 14.4–5)

  2. Outline
     ♦ Exact inference by enumeration
     ♦ Approximate inference by stochastic simulation

  3. Inference tasks
     Simple queries: compute posterior marginal P(X_i | E = e),
       e.g., P(NoGas | Gauge = empty, Lights = on, Starts = false)
     Conjunctive queries: P(X_i, X_j | E = e) = P(X_i | E = e) P(X_j | X_i, E = e)
     Optimal decisions: decision networks include utility information;
       probabilistic inference required for P(outcome | action, evidence)
     Value of information: which evidence to seek next?
     Sensitivity analysis: which probability values are most critical?
     Explanation: why do I need a new starter motor?

  4. Inference by enumeration
     Slightly intelligent way to sum out variables from the joint without actually
     constructing its explicit representation.
     Simple query on the burglary network (B, E → A → J, M):
       P(B | j, m) = P(B, j, m) / P(j, m)
                   = α P(B, j, m)
                   = α Σ_e Σ_a P(B, e, a, j, m)
     Rewrite full joint entries using product of CPT entries:
       P(B | j, m) = α Σ_e Σ_a P(B) P(e) P(a | B, e) P(j | a) P(m | a)
                   = α P(B) Σ_e P(e) Σ_a P(a | B, e) P(j | a) P(m | a)
     Recursive depth-first enumeration: O(n) space, O(d^n) time

  5. Enumeration algorithm

     function Enumeration-Ask(X, e, bn) returns a distribution over X
       inputs: X, the query variable
               e, observed values for variables E
               bn, a Bayesian network with variables {X} ∪ E ∪ Y
       Q(X) ← a distribution over X, initially empty
       for each value x_i of X do
         extend e with value x_i for X
         Q(x_i) ← Enumerate-All(Vars[bn], e)
       return Normalize(Q(X))

     function Enumerate-All(vars, e) returns a real number
       if Empty?(vars) then return 1.0
       Y ← First(vars)
       if Y has value y in e
         then return P(y | Pa(Y)) × Enumerate-All(Rest(vars), e)
         else return Σ_y P(y | Pa(Y)) × Enumerate-All(Rest(vars), e_y)
              where e_y is e extended with Y = y
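As a concrete sketch, the pseudocode above can be written in Python. The dict-based network encoding is my own choice, and the CPT numbers are the standard burglary-network values from Russell & Norvig (they do not appear on these slides):

```python
# A sketch of Enumeration-Ask / Enumerate-All. CPT entries give
# P(var = true | parent values); the burglary-network numbers below
# are the standard textbook values, not shown on these slides.
PARENTS = {"B": [], "E": [], "A": ["B", "E"], "J": ["A"], "M": ["A"]}
CPTS = {
    "B": {(): 0.001},
    "E": {(): 0.002},
    "A": {(True, True): 0.95, (True, False): 0.94,
          (False, True): 0.29, (False, False): 0.001},
    "J": {(True,): 0.90, (False,): 0.05},
    "M": {(True,): 0.70, (False,): 0.01},
}
ORDER = ["B", "E", "A", "J", "M"]  # topological: parents before children

def prob(var, value, ev):
    """P(var = value | parents(var)), reading parent values from ev."""
    p_true = CPTS[var][tuple(ev[pa] for pa in PARENTS[var])]
    return p_true if value else 1.0 - p_true

def enumerate_all(variables, ev):
    if not variables:
        return 1.0
    Y, rest = variables[0], variables[1:]
    if Y in ev:                                   # evidence: single term
        return prob(Y, ev[Y], ev) * enumerate_all(rest, ev)
    return sum(prob(Y, y, ev) * enumerate_all(rest, {**ev, Y: y})
               for y in (True, False))            # hidden: sum out Y

def enumeration_ask(X, ev):
    q = {x: enumerate_all(ORDER, {**ev, X: x}) for x in (True, False)}
    z = sum(q.values())
    return {x: p / z for x, p in q.items()}       # Normalize

posterior = enumeration_ask("B", {"J": True, "M": True})
# posterior[True] is about 0.284, the well-known value of P(b | j, m)
```

Note the depth-first structure: only one path of variable assignments is in memory at a time (O(n) space), while the nested sums over hidden variables give the O(d^n) time bound.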

  6. Complexity of exact inference
     Multiply connected networks:
     – can reduce 3SAT to exact inference ⇒ NP-hard
     – equivalent to counting 3SAT models ⇒ #P-complete
     (Figure: root variables A, B, C, D, each with prior 0.5, feed clause
     nodes 1: A ∨ B ∨ C, 2: C ∨ D ∨ A, 3: B ∨ C ∨ D, combined by an AND node)

  7. Inference by stochastic simulation
     Basic idea:
     1) Draw N samples from a sampling distribution S
     2) Compute an approximate posterior probability P̂
     3) Show this converges to the true probability P
     (Figure: a coin with P = 0.5)
     Outline:
     – Sampling from an empty network
     – Rejection sampling: reject samples disagreeing with evidence
     – Likelihood weighting: use evidence to weight samples

  8. Sampling from an empty network

     function Prior-Sample(bn) returns an event sampled from bn
       inputs: bn, a belief network specifying joint distribution P(X_1, …, X_n)
       x ← an event with n elements
       for i = 1 to n do
         x_i ← a random sample from P(X_i | parents(X_i))
               given the values of Parents(X_i) in x
       return x
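A minimal sketch of Prior-Sample in Python on the cloudy/sprinkler/rain/wet-grass network from the example slides, using the CPT values shown there (the dict encoding of the network is my own choice):

```python
# Prior-Sample: walk the variables in topological order, sampling each
# from its CPT given the already-sampled parent values.
import random

PARENTS = {"Cloudy": [], "Sprinkler": ["Cloudy"], "Rain": ["Cloudy"],
           "WetGrass": ["Sprinkler", "Rain"]}
CPTS = {  # P(var = true | parent values), from the example slides
    "Cloudy": {(): 0.50},
    "Sprinkler": {(True,): 0.10, (False,): 0.50},
    "Rain": {(True,): 0.80, (False,): 0.20},
    "WetGrass": {(True, True): 0.99, (True, False): 0.90,
                 (False, True): 0.90, (False, False): 0.01},
}
ORDER = ["Cloudy", "Sprinkler", "Rain", "WetGrass"]  # topological order

def prior_sample():
    event = {}
    for var in ORDER:  # parents are always sampled before children
        p_true = CPTS[var][tuple(event[pa] for pa in PARENTS[var])]
        event[var] = random.random() < p_true
    return event

random.seed(0)
samples = [prior_sample() for _ in range(20000)]
rain_freq = sum(s["Rain"] for s in samples) / len(samples)
# rain_freq should be close to P(Rain = true) = 0.5*0.8 + 0.5*0.2 = 0.5
```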

  9. Example
     Network: Cloudy → {Sprinkler, Rain} → WetGrass

     P(C) = .50

     C   P(S|C)        C   P(R|C)
     T   .10           T   .80
     F   .50           F   .20

     S  R   P(W|S,R)
     T  T   .99
     T  F   .90
     F  T   .90
     F  F   .01

  10–15. Example, contd.
     The same network and CPTs, sampling one variable per slide to generate
     the event [Cloudy = true, Sprinkler = false, Rain = true, WetGrass = true]

  16. Sampling from an empty network contd.
     Probability that Prior-Sample generates a particular event:
       S_PS(x_1 … x_n) = Π_{i=1..n} P(x_i | parents(X_i)) = P(x_1 … x_n)
     i.e., the true prior probability
     E.g., S_PS(t, f, t, t) = 0.5 × 0.9 × 0.8 × 0.9 = 0.324 = P(t, f, t, t)
     Let N_PS(x_1 … x_n) be the number of samples generated for event x_1, …, x_n
     Then lim_{N→∞} P̂(x_1, …, x_n) = lim_{N→∞} N_PS(x_1, …, x_n) / N
                                    = S_PS(x_1, …, x_n)
                                    = P(x_1 … x_n)
     That is, estimates derived from Prior-Sample are consistent
     Shorthand: P̂(x_1, …, x_n) ≈ P(x_1 … x_n)
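The arithmetic in the example can be checked directly: the probability Prior-Sample assigns to the event (Cloudy = t, Sprinkler = f, Rain = t, WetGrass = t) is just the product of the CPT entries along the event:

```python
# Checking S_PS(t, f, t, t) = 0.324 from the example slide's CPTs.
p_c = 0.50        # P(Cloudy = true)
p_s = 1.0 - 0.10  # P(Sprinkler = false | Cloudy = true)
p_r = 0.80        # P(Rain = true | Cloudy = true)
p_w = 0.90        # P(WetGrass = true | Sprinkler = false, Rain = true)
s_ps = p_c * p_s * p_r * p_w
print(round(s_ps, 3))  # 0.324
```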

  17. Rejection sampling
     P̂(X | e) estimated from samples agreeing with e

     function Rejection-Sampling(X, e, bn, N) returns an estimate of P(X | e)
       local variables: N, a vector of counts over X, initially zero
       for j = 1 to N do
         x ← Prior-Sample(bn)
         if x is consistent with e then
           N[x] ← N[x] + 1 where x is the value of X in x
       return Normalize(N[X])

     E.g., estimate P(Rain | Sprinkler = true) using 100 samples
     27 samples have Sprinkler = true; of these, 8 have Rain = true
     and 19 have Rain = false
     P̂(Rain | Sprinkler = true) = Normalize(⟨8, 19⟩) = ⟨0.296, 0.704⟩
     Similar to a basic real-world empirical estimation procedure
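A sketch of Rejection-Sampling in Python, reproducing the slide's query P(Rain | Sprinkler = true) on the sprinkler network. Prior sampling is inlined so the block is self-contained; CPT values are those on the example slides:

```python
# Rejection sampling: draw prior samples, keep only those consistent
# with the evidence, and count the query variable among the survivors.
import random

PARENTS = {"Cloudy": [], "Sprinkler": ["Cloudy"], "Rain": ["Cloudy"],
           "WetGrass": ["Sprinkler", "Rain"]}
CPTS = {"Cloudy": {(): 0.50},
        "Sprinkler": {(True,): 0.10, (False,): 0.50},
        "Rain": {(True,): 0.80, (False,): 0.20},
        "WetGrass": {(True, True): 0.99, (True, False): 0.90,
                     (False, True): 0.90, (False, False): 0.01}}
ORDER = ["Cloudy", "Sprinkler", "Rain", "WetGrass"]

def prior_sample():
    event = {}
    for var in ORDER:
        p_true = CPTS[var][tuple(event[pa] for pa in PARENTS[var])]
        event[var] = random.random() < p_true
    return event

def rejection_sampling(X, evidence, n):
    counts = {True: 0, False: 0}
    for _ in range(n):
        s = prior_sample()
        if all(s[var] == val for var, val in evidence.items()):
            counts[s[X]] += 1  # keep only samples consistent with e
    total = sum(counts.values())
    return {x: c / total for x, c in counts.items()}  # Normalize

random.seed(0)
est = rejection_sampling("Rain", {"Sprinkler": True}, 30000)
# est[True] should be near the exact posterior P(r | s) = 0.3
```

With this network, only about 30% of samples survive the consistency check, which previews the weakness analyzed on the next slide.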

  18. Analysis of rejection sampling
     P̂(X | e) = α N_PS(X, e)            (algorithm defn.)
              = N_PS(X, e) / N_PS(e)    (normalized by N_PS(e))
              ≈ P(X, e) / P(e)          (property of Prior-Sample)
              = P(X | e)                (defn. of conditional probability)
     Hence rejection sampling returns consistent posterior estimates
     Problem: hopelessly expensive if P(e) is small
     P(e) drops off exponentially with the number of evidence variables!
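To see how fast the cost grows, here is a back-of-the-envelope sketch; the per-variable match probability of 0.1 is an illustrative assumption, not a value from the slides:

```python
# If each of k evidence variables matches a prior sample with
# probability about 0.1 (illustrative assumption), the acceptance
# rate 0.1**k shrinks exponentially in k, so the expected number of
# prior samples needed per accepted sample grows as 10**k.
per_var_match = 0.1
for k in (1, 5, 10):
    p_e = per_var_match ** k
    print(f"k={k}: accept rate ~{p_e:.1e}, samples per accept ~{1/p_e:.0e}")
```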

  19. Likelihood weighting
     Idea: fix evidence variables, sample only nonevidence variables,
     and weight each sample by the likelihood it accords the evidence

     function Likelihood-Weighting(X, e, bn, N) returns an estimate of P(X | e)
       local variables: W, a vector of weighted counts over X, initially zero
       for j = 1 to N do
         x, w ← Weighted-Sample(bn, e)
         W[x] ← W[x] + w where x is the value of X in x
       return Normalize(W[X])

     function Weighted-Sample(bn, e) returns an event and a weight
       x ← an event with n elements; w ← 1
       for i = 1 to n do
         if X_i has a value x_i in e
           then w ← w × P(X_i = x_i | parents(X_i))
           else x_i ← a random sample from P(X_i | parents(X_i))
       return x, w
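A sketch of Likelihood-Weighting / Weighted-Sample in Python on the sprinkler network, estimating P(Rain | Sprinkler = true, WetGrass = true). CPT values are those on the example slides; the dict encoding and the particular query are my own choices:

```python
# Likelihood weighting: evidence variables are clamped and contribute
# their likelihood to the sample weight; nonevidence variables are
# sampled from their CPTs as in Prior-Sample.
import random

PARENTS = {"Cloudy": [], "Sprinkler": ["Cloudy"], "Rain": ["Cloudy"],
           "WetGrass": ["Sprinkler", "Rain"]}
CPTS = {"Cloudy": {(): 0.50},
        "Sprinkler": {(True,): 0.10, (False,): 0.50},
        "Rain": {(True,): 0.80, (False,): 0.20},
        "WetGrass": {(True, True): 0.99, (True, False): 0.90,
                     (False, True): 0.90, (False, False): 0.01}}
ORDER = ["Cloudy", "Sprinkler", "Rain", "WetGrass"]

def weighted_sample(evidence):
    event, w = dict(evidence), 1.0  # evidence variables are fixed
    for var in ORDER:
        p_true = CPTS[var][tuple(event[pa] for pa in PARENTS[var])]
        if var in evidence:
            w *= p_true if evidence[var] else 1.0 - p_true  # weight by likelihood
        else:
            event[var] = random.random() < p_true           # sample nonevidence
    return event, w

def likelihood_weighting(X, evidence, n):
    W = {True: 0.0, False: 0.0}  # weighted counts over X
    for _ in range(n):
        event, w = weighted_sample(evidence)
        W[event[X]] += w
    total = sum(W.values())
    return {x: v / total for x, v in W.items()}  # Normalize

random.seed(0)
est = likelihood_weighting("Rain", {"Sprinkler": True, "WetGrass": True}, 20000)
# est[True] should be near the exact posterior P(r | s, w) ≈ 0.320
```

Unlike rejection sampling, every sample is used; none are thrown away, at the cost of unequal weights.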

  20. Likelihood weighting example
     Same network and CPTs as before; evidence fixed to
     Sprinkler = true and WetGrass = true
     w = 1.0
  21–26. Likelihood weighting example, contd.
     The same network, stepped one variable per slide: Cloudy is sampled
     (true), the evidence variable Sprinkler = true multiplies the weight
     by P(s | c) = 0.1, Rain is sampled (true), and the evidence variable
     WetGrass = true multiplies it by P(w | s, r) = 0.99:
     w = 1.0 × 0.1 × 0.99 = 0.099

  27. Likelihood weighting analysis
     Sampling probability for Weighted-Sample is
       S_WS(z, e) = Π_{i=1..l} P(z_i | parents(Z_i))
     Note: pays attention to evidence in ancestors only
     ⇒ somewhere “in between” prior and posterior distribution
     (Figure: the cloudy/sprinkler/rain/wet-grass network)
     Weight for a given sample z, e is
       w(z, e) = Π_{i=1..m} P(e_i | parents(E_i))
     Weighted sampling probability is
       S_WS(z, e) w(z, e) = Π_{i=1..l} P(z_i | parents(Z_i)) × Π_{i=1..m} P(e_i | parents(E_i))
                          = P(z, e)   (by standard global semantics of network)
     Hence likelihood weighting returns consistent estimates
     but performance still degrades with many evidence variables
     because a few samples have nearly all the total weight
