

  1. CSE 473: Artificial Intelligence
     Bayesian Networks: Inference
     Hanna Hajishirzi
     Many slides over the course adapted from either Luke Zettlemoyer, Pieter Abbeel, Dan Klein, Stuart Russell or Andrew Moore

  2. Outline
     § Bayesian Networks Inference
     § Exact Inference: Variable Elimination
     § Approximate Inference: Sampling

  3. Approximate Inference
     § Simulation has a name: sampling
     § Sampling is a hot topic in machine learning, and it's really simple
     § Basic idea:
        § Draw N samples from a sampling distribution S
        § Compute an approximate posterior probability
        § Show this converges to the true probability P
     § Why sample?
        § Learning: get samples from a distribution you don't know
        § Inference: getting a sample is faster than computing the right answer (e.g. with variable elimination)

  4. Sampling Example
     § Sampling from a given distribution
     § Step 1: Get a sample u from the uniform distribution over [0, 1)
        § E.g. random() in Python
     § Step 2: Convert this sample u into an outcome for the given distribution by associating each outcome with a sub-interval of [0, 1) whose size equals the probability of that outcome

           C       P(C)
           red     0.6
           green   0.1
           blue    0.3

     § If random() returns u = 0.83, then our sample is C = blue
     § E.g., after sampling 8 times: …
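A minimal Python sketch of the two steps above, using the red/green/blue table from the slide; the `sample` helper name is just for illustration.

```python
import random

# Sketch of Steps 1-2: map a uniform draw u in [0, 1) onto sub-intervals
# whose widths equal the outcome probabilities (red 0.6, green 0.1, blue 0.3).
distribution = [("red", 0.6), ("green", 0.1), ("blue", 0.3)]

def sample(dist):
    u = random.random()              # Step 1: uniform sample from [0, 1)
    cumulative = 0.0
    for outcome, p in dist:          # Step 2: find the sub-interval holding u
        cumulative += p
        if u < cumulative:
            return outcome
    return dist[-1][0]               # guard against floating-point round-off

# u = 0.83 would fall in the blue sub-interval [0.7, 1.0), so C = blue.
print([sample(distribution) for _ in range(8)])   # e.g., 8 draws as on the slide
```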

  5. Sampling in BN
     § Prior Sampling
     § Rejection Sampling
     § Likelihood Weighting
     § Gibbs Sampling

  6. Prior Sampling
     § Network: Cloudy is the parent of Sprinkler and Rain; Sprinkler and Rain are the parents of WetGrass
     § CPTs:

           P(C):         +c 0.5    -c 0.5
           P(S | C):     +c: +s 0.1, -s 0.9      -c: +s 0.5, -s 0.5
           P(R | C):     +c: +r 0.8, -r 0.2      -c: +r 0.2, -r 0.8
           P(W | S, R):  +s, +r: +w 0.99, -w 0.01
                         +s, -r: +w 0.90, -w 0.10
                         -s, +r: +w 0.90, -w 0.10
                         -s, -r: +w 0.01, -w 0.99

     § Samples:
           +c, -s, +r, +w
           -c, +s, -r, +w
           …

  7. Prior Sampling
     § For i = 1, 2, …, n
        § Sample x_i from P(X_i | Parents(X_i))
     § Return (x_1, x_2, …, x_n)
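A hedged sketch of this procedure for the Cloudy/Sprinkler/Rain/WetGrass network, using the CPT numbers from slide 6; the helper names are not from the slides.

```python
import random

def bernoulli(p):
    """Return True (the '+' outcome) with probability p."""
    return random.random() < p

def prior_sample():
    # Sample each variable in topological order from P(X_i | Parents(X_i)).
    c = bernoulli(0.5)                        # P(+c) = 0.5
    s = bernoulli(0.1 if c else 0.5)          # P(+s | C)
    r = bernoulli(0.8 if c else 0.2)          # P(+r | C)
    if s and r:
        p_w = 0.99                            # P(+w | +s, +r)
    elif s or r:
        p_w = 0.90                            # P(+w | +s, -r) = P(+w | -s, +r)
    else:
        p_w = 0.01                            # P(+w | -s, -r)
    w = bernoulli(p_w)
    return (c, s, r, w)                       # one sample, e.g. (+c, -s, +r, +w)

samples = [prior_sample() for _ in range(1000)]
```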

  8. Prior Sampling
     § This process generates samples with probability
           S_PS(x_1, …, x_n) = ∏_i P(x_i | Parents(X_i)) = P(x_1, …, x_n)
       i.e. the BN's joint probability
     § Let the number of samples of an event be N_PS(x_1, …, x_n)
     § Then
           lim_{N → ∞} N_PS(x_1, …, x_n) / N = S_PS(x_1, …, x_n) = P(x_1, …, x_n)
     § I.e., the sampling procedure is consistent

  9. Example
     § We'll get a bunch of samples from the BN (C = Cloudy, S = Sprinkler, R = Rain, W = WetGrass):
           +c, -s, +r, +w
           +c, +s, +r, +w
           -c, +s, +r, -w
           +c, -s, +r, +w
           -c, -s, -r, +w
     § If we want to know P(W)
        § We have counts <+w: 4, -w: 1>
        § Normalize to get P(W) = <+w: 0.8, -w: 0.2>
        § This will get closer to the true distribution with more samples
        § Can estimate anything else, too
        § What about P(C | +w)? P(C | +r, +w)? P(C | -r, -w)?
        § Fast: can use fewer samples if less time (what's the drawback?)
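Continuing the sketch above (assuming the `samples` list produced by `prior_sample()`), estimating P(W) or a conditional such as P(C | +w) is just counting and normalizing:

```python
# Estimate P(+w) by normalizing counts over all samples.
p_w = sum(1 for (c, s, r, w) in samples if w) / len(samples)

# Estimate P(+c | +w): keep only samples consistent with the evidence +w.
consistent = [(c, s, r, w) for (c, s, r, w) in samples if w]
p_c_given_w = sum(1 for (c, s, r, w) in consistent if c) / len(consistent)
```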

  10. Rejection Sampling
     § Let's say we want P(C)
        § No point keeping all samples around
        § Just tally counts of C as we go
     § Let's say we want P(C | +s)
        § Same thing: tally C outcomes, but ignore (reject) samples which don't have S = +s
        § This is called rejection sampling
        § It is also consistent for conditional probabilities (i.e., correct in the limit)
     § Example samples:
           +c, -s, +r, +w
           +c, +s, +r, +w
           -c, +s, +r, -w
           +c, -s, +r, +w
           -c, -s, -r, +w

  11. Sampling Example
     § There are 2 cups
        § The first contains 1 penny and 1 quarter
        § The second contains 2 quarters
     § Say I pick a cup uniformly at random, then pick a coin randomly from that cup. It's a quarter (yes!). What is the probability that the other coin in that cup is also a quarter?
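A small sketch that answers the puzzle by sampling, previewing rejection sampling: trials where the drawn coin is the penny are simply thrown away. The `trial` helper is an illustration, not part of the slide.

```python
import random

def trial():
    # Pick a cup uniformly at random, then a coin uniformly from that cup.
    cup = random.choice([["penny", "quarter"], ["quarter", "quarter"]])
    i = random.randrange(2)
    return cup[i], cup[1 - i]        # (drawn coin, other coin in the same cup)

# Reject trials whose drawn coin is not a quarter, then tally the other coin.
others = [other for drawn, other in (trial() for _ in range(100_000))
          if drawn == "quarter"]
print(sum(1 for c in others if c == "quarter") / len(others))   # close to 2/3
```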

  12. Rejection Sampling
     § IN: evidence instantiation
     § For i = 1, 2, …, n
        § Sample x_i from P(X_i | Parents(X_i))
        § If x_i is not consistent with the evidence
           § Reject: return, and no sample is generated in this cycle
     § Return (x_1, x_2, …, x_n)
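A minimal sketch of rejection sampling built on the `prior_sample()` helper from earlier; for brevity it checks the evidence after the whole sample is drawn, whereas the pseudocode above rejects as soon as an inconsistent variable appears.

```python
def rejection_sample(n, evidence_s=True):
    """Collect n samples consistent with the evidence S = +s."""
    accepted = []
    while len(accepted) < n:
        c, s, r, w = prior_sample()
        if s == evidence_s:              # reject samples that contradict +s
            accepted.append((c, s, r, w))
    return accepted

# Estimate P(+c | +s) by tallying C over the accepted samples.
kept = rejection_sample(1000)
print(sum(1 for (c, s, r, w) in kept if c) / len(kept))
```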

  13. Likelihood Weighting
     § Problem with rejection sampling:
        § If evidence is unlikely, you reject a lot of samples
        § You don't exploit your evidence as you sample
        § Consider P(B | +a) in a Burglary/Alarm network: most samples look like -b, -a and are rejected; only the occasional +b, +a sample survives
     § Idea: fix evidence variables and sample the rest
        § E.g., fix A = +a and sample only B, so every sample looks like -b, +a or +b, +a
     § Problem: the sample distribution is not consistent!
     § Solution: weight by probability of evidence given parents

  14. Likelihood Weighting
     § Same network and CPTs as in the Prior Sampling example (Cloudy, Sprinkler, Rain, WetGrass)
     § Samples, with the evidence variables held fixed:
           +c, +s, +r, +w
           …

  15. Likelihood Weighting
     § Sampling distribution if z is sampled and e is fixed evidence:
           S_WS(z, e) = ∏_i P(z_i | Parents(Z_i))
     § Now, samples have weights:
           w(z, e) = ∏_i P(e_i | Parents(E_i))
     § Together, the weighted sampling distribution is consistent:
           S_WS(z, e) · w(z, e) = ∏_i P(z_i | Parents(Z_i)) · ∏_i P(e_i | Parents(E_i)) = P(z, e)

  16. Likelihood Weighting
     § IN: evidence instantiation
     § w = 1.0
     § For i = 1, 2, …, n
        § If X_i is an evidence variable
           § x_i = observation of X_i
           § Set w = w * P(x_i | Parents(X_i))
        § Else
           § Sample x_i from P(X_i | Parents(X_i))
     § Return (x_1, x_2, …, x_n), w
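A hedged sketch of this routine for the same network, reusing the `bernoulli` helper from the prior-sampling sketch and assuming, for illustration, that the evidence is S = +s and W = +w: evidence variables are set rather than sampled, and the weight accumulates P(evidence | parents).

```python
def likelihood_weighted_sample():
    weight = 1.0
    c = bernoulli(0.5)                     # non-evidence: sample as usual
    s = True                               # evidence S = +s: fix it ...
    weight *= 0.1 if c else 0.5            # ... and multiply in P(+s | C)
    r = bernoulli(0.8 if c else 0.2)       # non-evidence: sample as usual
    if s and r:
        p_w = 0.99
    elif s or r:
        p_w = 0.90
    else:
        p_w = 0.01
    w = True                               # evidence W = +w: fix it ...
    weight *= p_w                          # ... and multiply in P(+w | S, R)
    return (c, s, r, w), weight

# Estimate P(+c | +s, +w) as a weighted count.
pairs = [likelihood_weighted_sample() for _ in range(1000)]
num = sum(wt for (c, s, r, w), wt in pairs if c)
den = sum(wt for _sample, wt in pairs)
print(num / den)
```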

  17. Likelihood Weighting
     § Likelihood weighting is good
        § We have taken evidence into account as we generate the sample
        § E.g. here, W's value will get picked based on the evidence values of S, R
        § More of our samples will reflect the state of the world suggested by the evidence
     § Likelihood weighting doesn't solve all our problems
        § Evidence influences the choice of downstream variables, but not upstream ones (C isn't more likely to get a value matching the evidence)
     § We would like to consider evidence when we sample every variable

  18. Markov Chain Monte Carlo*
     § Idea: instead of sampling from scratch, create samples that are each like the last one
     § Gibbs Sampling: resample one variable at a time, conditioned on the rest, but keep evidence fixed
        § E.g., with +c as evidence, successive samples might look like -b, +a, +c → -b, -a, +c → +b, +a, +c
     § Properties: now samples are not independent (in fact they're nearly identical), but sample averages are still consistent estimators!
     § What's the point: both upstream and downstream variables condition on evidence

  19. Gibbs Sampling Example: P(S | +r)
     § Step 1: Fix evidence
        § R = +r
     § Step 2: Initialize other variables
        § Randomly
     § Step 3: Repeat
        § Choose a non-evidence variable X
        § Resample X from P(X | all other variables)
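A rough sketch of this loop for the query P(S | +r); the `resample_from_conditional` argument stands in for the per-variable step shown on the next slide and is not defined on the slides.

```python
import random

def gibbs_estimate(num_steps, resample_from_conditional):
    # Step 1: fix evidence; Step 2: initialize the other variables randomly.
    state = {"C": random.random() < 0.5,
             "S": random.random() < 0.5,
             "W": random.random() < 0.5,
             "R": True}                              # evidence R = +r stays fixed
    plus_s = 0
    # Step 3: repeatedly resample one non-evidence variable given the rest.
    for _ in range(num_steps):
        var = random.choice(["C", "S", "W"])
        state[var] = resample_from_conditional(var, state)
        plus_s += state["S"]
    return plus_s / num_steps                        # estimate of P(+s | +r)
```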

  20. Sampling One Variable
     § Sample from P(S | +c, +r, -w)
     § Many things cancel out: only CPTs with S remain!
     § More generally: only the CPTs that contain the resampled variable need to be considered, and joined together
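A concrete sketch of this resampling step: multiply only the CPTs that mention S and normalize. The numbers come from the CPTs on slide 6.

```python
import random

# P(S | +c, +r, -w) is proportional to P(S | +c) * P(-w | S, +r):
unnorm_plus_s  = 0.1 * 0.01   # P(+s | +c) * P(-w | +s, +r)
unnorm_minus_s = 0.9 * 0.10   # P(-s | +c) * P(-w | -s, +r)
z = unnorm_plus_s + unnorm_minus_s

p_plus_s = unnorm_plus_s / z          # roughly 0.011 after normalizing
s = random.random() < p_plus_s        # True means the new value is S = +s
```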

  21. How About Particle Filtering?
     § The weighting step is likelihood weighting; one elapse/weight/resample cycle for 10 particles:

           Particles:    Elapse:    Weight:           Resample (new particles):
           (3,3)         (3,2)      (3,2)  w = .9     (3,2)
           (2,3)         (2,3)      (2,3)  w = .2     (2,2)
           (3,3)         (3,2)      (3,2)  w = .9     (3,2)
           (3,2)         (3,1)      (3,1)  w = .4     (2,3)
           (3,3)         (3,3)      (3,3)  w = .4     (3,3)
           (3,2)         (3,2)      (3,2)  w = .9     (3,2)
           (1,2)         (1,3)      (1,3)  w = .1     (1,3)
           (3,3)         (2,3)      (2,3)  w = .2     (2,3)
           (3,3)         (3,2)      (3,2)  w = .9     (3,2)
           (2,3)         (2,2)      (2,2)  w = .4     (3,2)

  22. Dynamic Bayes Nets (DBNs)
     § We want to track multiple variables over time, using multiple sources of evidence
     § Idea: Repeat a fixed Bayes net structure at each time step
     § Variables from time t can condition on those from t-1
     § (Figure: the network unrolled for t = 1, 2, 3, with hidden variables G_t^a, G_t^b and evidence variables E_t^a, E_t^b at each step)
     § Discrete valued dynamic Bayes nets (with evidence on the bottom) are HMMs

  23. Exact Inference in DBNs
     § Variable elimination applies to dynamic Bayes nets
     § Procedure: "unroll" the network for T time steps, then eliminate variables until P(X_T | e_1:T) is computed
     § Online belief updates: eliminate all variables from the previous time step; store factors for the current time only

  24. Particle Filtering in DBNs
     § A particle is a complete sample for a time step
     § Initialize: generate prior samples for the t = 1 Bayes net
        § Example particle: G_1^a = (3,3), G_1^b = (5,3)
     § Elapse time: sample a successor for each particle
        § Example successor: G_2^a = (2,3), G_2^b = (6,3)
     § Observe: weight each entire sample by the likelihood of the evidence conditioned on the sample
        § Likelihood: P(E_1^a | G_1^a) * P(E_1^b | G_1^b)
     § Resample: select prior samples (tuples of values) in proportion to their likelihood
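A generic sketch of one Elapse/Observe/Resample cycle as described above; `transition_sample` and `evidence_likelihood` are placeholders for the DBN's dynamics and sensor models, not functions from the slides.

```python
import random

def particle_filter_step(particles, evidence, transition_sample, evidence_likelihood):
    # Elapse time: sample a successor for each particle.
    moved = [transition_sample(p) for p in particles]
    # Observe: weight each entire sample by the evidence likelihood.
    weights = [evidence_likelihood(evidence, p) for p in moved]
    # Resample: select particles in proportion to their weights.
    return random.choices(moved, weights=weights, k=len(particles))
```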
