
Slides Set 11 (part a): Sampling Techniques for Probabilistic and Deterministic Graphical Models



  1. Algorithms for Reasoning with Graphical Models. Slides Set 11 (part a): Sampling Techniques for Probabilistic and Deterministic Graphical Models. Rina Dechter. (Reading: Darwiche, chapter 15, and related papers.)

  2. Sampling Techniques for Probabilistic and Deterministic Graphical Models. ICS 276, Spring 2018. Bozhena Bidyuk. (Reading: Darwiche, chapter 15, and related papers.)

  3. Overview: 1. Basics of sampling. 2. Importance sampling. 3. Markov Chain Monte Carlo: Gibbs sampling. 4. Sampling in the presence of determinism. 5. Rao-Blackwellisation and cutset sampling.


  5. Types of queries: Max-Inference, Sum-Inference, Mixed-Inference (listed in increasing order of hardness). • NP-hard: exponentially many terms. • We will focus on approximation algorithms. Anytime: very fast and very approximate, versus slower and more accurate.

  6. Monte Carlo estimators. • Most basic form: the empirical estimate of a probability, $\hat{u} = \frac{1}{m}\sum_{i=1}^{m} u(x^{(i)})$ with $x^{(i)} \sim p(x)$. • Relevant considerations: Are we able to sample from the target distribution p(x)? Are we able to evaluate p(x) explicitly, or only up to a constant? • "Any-time" properties: the estimator is unbiased (or asymptotically unbiased), and its variance decreases with m.

  7. Monte Carlo estimators. • Central limit theorem: the estimate of p(U) is asymptotically Gaussian (the slide illustrates the sampling distribution for m = 1, 5, 15). • Finite-sample confidence intervals: if u(x) or its variance is bounded (e.g., Hoeffding- or Chebyshev-style bounds), the probability concentrates rapidly around the expectation.


  9. A Sample. Given a set of variables $X = \{X_1, \ldots, X_n\}$, a sample, denoted $S^t$, is an instantiation of all the variables: $S^t = (x_1^t, x_2^t, \ldots, x_n^t)$.

  10. How to Draw a Sample? Univariate Distribution. • Example: given a random variable X with domain {0, 1} and distribution P(X) = (0.3, 0.7). • Task: generate samples of X from P. • How? Draw a random number $r \in [0, 1]$; if r < 0.3, set X = 0; else set X = 1.
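A minimal Python sketch of this procedure (the function name and the use of `random.random` are illustrative, not from the slides):

```python
import random

def sample_x(p0=0.3):
    """Sample X in {0, 1} from P(X) = (p0, 1 - p0) by thresholding a uniform draw."""
    r = random.random()          # r is uniform on [0, 1)
    return 0 if r < p0 else 1

samples = [sample_x() for _ in range(10)]
```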

  11. How to Draw a Sample? Multivariate Distribution. • Let $X = \{X_1, \ldots, X_n\}$ be a set of variables. • Express the distribution in product form: $P(X) = P(X_1)\, P(X_2 \mid X_1) \cdots P(X_n \mid X_1, \ldots, X_{n-1})$. • Sample the variables one by one, from left to right, along the ordering dictated by the product form (a sketch follows below). • In the Bayesian network literature this is called logic sampling or forward sampling.
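A minimal sketch of sampling along the product form, assuming each conditional $P(X_i \mid X_1, \ldots, X_{i-1})$ is supplied as a function of the already-sampled prefix (all names and numbers here are illustrative):

```python
import random

def sample_categorical(dist):
    """Draw one value from a {value: probability} dict by inverting the CDF."""
    r, cum = random.random(), 0.0
    for value, prob in dist.items():
        cum += prob
        if r < cum:
            return value
    return value  # guard against floating-point rounding

def sample_product_form(conditionals):
    """conditionals[i](prefix) must return P(X_{i+1} | prefix) as a dict."""
    prefix = []
    for cond in conditionals:
        prefix.append(sample_categorical(cond(prefix)))
    return tuple(prefix)

# Two-variable chain P(X1) P(X2 | X1); the numbers are made up for illustration.
conds = [
    lambda pre: {0: 0.3, 1: 0.7},
    lambda pre: {0: 0.9, 1: 0.1} if pre[0] == 0 else {0: 0.2, 1: 0.8},
]
print(sample_product_form(conds))
```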

  12. Sampling in Bayes nets (Forward Sampling) [e.g., Henrion 1988]. • No evidence: the "causal" form makes sampling easy. Follow the variable ordering defined by the parent relation: starting from the root(s), sample downward, and when sampling each variable, condition on the sampled values of its parents. (The slide illustrates one sample on a small network over A, B, C, D.)

  13. Forward Sampling: No Evidence (Henrion 1988). Input: Bayesian network over $X = \{X_1, \ldots, X_N\}$, N = #nodes, T = #samples. Output: T samples. Process the nodes in topological order: first a node's ancestors, then the node itself. For t = 1 to T: for i = 1 to N, sample $x_i^t$ from $P(x_i \mid pa_i)$.
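A runnable sketch of this loop, assuming the CPTs are stored as nested dicts keyed by parent values (this data-structure choice is mine, not the slides'):

```python
import random

def forward_sample(order, parents, cpt):
    """One sample from a Bayes net by forward sampling.

    order   -- the variables listed in topological order
    parents -- parents[v]: tuple of v's parents
    cpt     -- cpt[v][parent_values]: {value: prob}, i.e. P(v | pa(v))
    """
    x = {}
    for v in order:                      # ancestors of v are already in x
        dist = cpt[v][tuple(x[p] for p in parents[v])]
        values, probs = zip(*dist.items())
        x[v] = random.choices(values, weights=probs)[0]
    return x

def draw_samples(order, parents, cpt, T):
    """T independent forward samples."""
    return [forward_sample(order, parents, cpt) for _ in range(T)]
```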

  14. Forward Sampling (example). $P(X_1, X_2, X_3, X_4) = P(X_1)\, P(X_2 \mid X_1)\, P(X_3 \mid X_1)\, P(X_4 \mid X_2, X_3)$. No evidence. To generate sample k: 1. Sample $x_1$ from $P(X_1)$. 2. Sample $x_2$ from $P(X_2 \mid X_1 = x_1)$. 3. Sample $x_3$ from $P(X_3 \mid X_1 = x_1)$. 4. Sample $x_4$ from $P(X_4 \mid X_2 = x_2, X_3 = x_3)$.
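Instantiating the sketch after slide 13 on this four-variable network (the CPT numbers below are illustrative placeholders, not taken from the slides):

```python
order = ["X1", "X2", "X3", "X4"]
parents = {"X1": (), "X2": ("X1",), "X3": ("X1",), "X4": ("X2", "X3")}
cpt = {
    "X1": {(): {0: 0.3, 1: 0.7}},
    "X2": {(0,): {0: 0.8, 1: 0.2}, (1,): {0: 0.4, 1: 0.6}},
    "X3": {(0,): {0: 0.5, 1: 0.5}, (1,): {0: 0.1, 1: 0.9}},
    "X4": {(a, b): {0: 0.25, 1: 0.75} for a in (0, 1) for b in (0, 1)},
}
samples = draw_samples(order, parents, cpt, T=1000)
```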

  15. Forward Sampling w/ Evidence. Input: Bayesian network over $X = \{X_1, \ldots, X_N\}$, N = #nodes, E = evidence, T = #samples. Output: T samples consistent with E. For t = 1 to T: for i = 1 to N, sample $x_i^t$ from $P(x_i \mid pa_i)$; if $X_i \in E$ and $x_i^t$ differs from the evidence value, reject the sample and restart from step 1.

  16. Forward Sampling (example). Evidence: $X_3 = 0$. To generate sample k: 1. Sample $x_1$ from $P(X_1)$. 2. Sample $x_2$ from $P(X_2 \mid x_1)$. 3. Sample $x_3$ from $P(X_3 \mid x_1)$. 4. If $x_3 \neq 0$, reject the sample and start again from step 1; otherwise: 5. Sample $x_4$ from $P(X_4 \mid x_2, x_3)$.
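A sketch of this rejection loop, reusing the order/parents/cpt definitions from the example above (the early return mirrors the slide's "reject and restart"; the loop can run long when the evidence is unlikely):

```python
import random

def rejection_sample_one(order, parents, cpt, evidence):
    """One forward sample; returns None as soon as the evidence is violated."""
    x = {}
    for v in order:
        dist = cpt[v][tuple(x[p] for p in parents[v])]
        values, probs = zip(*dist.items())
        x[v] = random.choices(values, weights=probs)[0]
        if v in evidence and x[v] != evidence[v]:
            return None            # reject and restart, as in step 4 of the slide
    return x

def rejection_sample(order, parents, cpt, evidence, T):
    """Keep restarting until T samples consistent with the evidence are collected."""
    samples = []
    while len(samples) < T:        # may loop for a long time if P(evidence) is tiny
        x = rejection_sample_one(order, parents, cpt, evidence)
        if x is not None:
            samples.append(x)
    return samples

consistent = rejection_sample(order, parents, cpt, evidence={"X3": 0}, T=100)
```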

  17. How to answer queries with sampling? Expected value and variance. Many queries can be phrased as computing the expectation of some function. Expected value: given a probability distribution P(X) and a function g(X) defined over the set of variables $X = \{X_1, X_2, \ldots, X_n\}$, the expected value of g w.r.t. P is $E_P[g(x)] = \sum_x g(x)\, P(x)$. Variance: the variance of g w.r.t. P is $Var_P[g(x)] = \sum_x \big(g(x) - E_P[g(x)]\big)^2\, P(x)$.

  18. Monte Carlo Estimate. • Estimator: an estimator is a function of the samples; it produces an estimate of the unknown parameter of the sampling distribution. • Given i.i.d. samples $S^1, S^2, \ldots, S^T$ drawn from P, the Monte Carlo estimate of $E_P[g(x)]$ is $\hat{g} = \frac{1}{T} \sum_{t=1}^{T} g(S^t)$.

  19. Example: Monte Carlo estimate. • Given: a distribution P(X) = (0.3, 0.7), and g(X) = 40 if X = 0, 50 if X = 1. • Exact value: $E_P[g(x)] = 40 \times 0.3 + 50 \times 0.7 = 47$. • Generate k = 10 samples from P: 0, 1, 1, 1, 0, 1, 1, 0, 1, 0. Then $\hat{g} = \frac{40 \cdot \#\{X=0\} + 50 \cdot \#\{X=1\}}{\#\text{samples}} = \frac{40 \cdot 4 + 50 \cdot 6}{10} = 46$.
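A short Python check of this computation (with 10 samples the estimate is noisy, as in the slide's 46; with many samples it concentrates near 47):

```python
import random

def g(x):
    return 40 if x == 0 else 50

def mc_estimate(T):
    """Average g over T samples drawn from P(X) = (0.3, 0.7)."""
    return sum(g(0 if random.random() < 0.3 else 1) for _ in range(T)) / T

print(mc_estimate(10))       # noisy, e.g. 46 as on the slide
print(mc_estimate(100_000))  # concentrates near the exact value 47
```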

  20. Bayes Nets with Evidence. • Estimating posterior probabilities P[A = a | E = e]? • Rejection sampling: draw x ~ p(x), but discard the sample if E ≠ e. The surviving samples are from p(x | E = e); use them as before. Problem: this keeps only a P[E = e] fraction of the samples, so it performs poorly when the evidence probability is small. • Estimate the ratio P[A = a, E = e] / P[E = e]: two estimates (numerator and denominator), and good finite-sample bounds require low relative error. Again, this performs poorly when the evidence probability is small. What bounds can we get? (A sketch of the ratio estimate follows.)
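A sketch of the ratio estimate on the running four-variable network, reusing draw_samples and the order/parents/cpt definitions from the earlier sketches (the choice of query A = (X1 = 1) and evidence E = (X3 = 0) is illustrative):

```python
samples = draw_samples(order, parents, cpt, T=100_000)

n_ae = sum(s["X1"] == 1 and s["X3"] == 0 for s in samples)  # counts A=a and E=e
n_e = sum(s["X3"] == 0 for s in samples)                    # counts E=e

if n_e > 0:                   # fails entirely when no sample hits the evidence
    posterior = n_ae / n_e    # estimates P(X1 = 1 | X3 = 0)
```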


  24. [Figure slide; only the label "absolute" is recoverable.]

  25. Bayes Nets with Evidence. • Estimating the probability of evidence, P[E = e]. • Finite-sample bounds with $u(x) \in [0, 1]$ [e.g., Hoeffding]: what if the evidence is unlikely? With P[E = e] = 1e-6 we could easily estimate $\hat{U} = 0$! • Relative error bounds [Dagum & Luby 1997]: if U, the probability of evidence, is very small, we need very many samples that are not rejected.
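To make the first bullet concrete, here is the standard Hoeffding calculation (the numbers are illustrative, not from the slides). For i.i.d. samples with $u(x) \in [0, 1]$,

$$\Pr\big(|\hat{U} - U| \ge \epsilon\big) \le 2e^{-2T\epsilon^2}, \qquad \text{so} \qquad T \ge \frac{\ln(2/\delta)}{2\epsilon^2}$$

suffices for absolute error $\epsilon$ with probability at least $1 - \delta$; e.g., $\epsilon = 0.01$ and $\delta = 0.05$ give $T \ge 18{,}445$. But if $U = P[E=e] = 10^{-6}$, an absolute error of 0.01 says nothing useful about U, which is why relative error bounds (requiring roughly a factor $1/U$ more samples) are needed.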

  26. Overview: 1. Basics of sampling. 2. Importance sampling. 3. Markov Chain Monte Carlo: Gibbs sampling. 4. Sampling in the presence of determinism. 5. Rao-Blackwellisation and cutset sampling.

  27. Importance Sampling: Main Idea. • Express the query as the expected value of a random variable w.r.t. a distribution Q. • Generate random samples from Q. • Estimate the expected value from the generated samples using a Monte Carlo estimator (an average).


  29. Importance Sampling. • Basic empirical estimate of a probability: average over samples drawn from the target distribution. • Importance sampling: draw the samples from a proposal distribution instead and correct each sample by its "importance weight", the ratio of target to proposal probability (defined on slide 31).

  30. Estimating P(E) and P(X|e)

  31. Importance Sampling for P(e). Let $Z = X \setminus E$, and let Q(Z) be a (proposal) distribution satisfying $P(z, e) > 0 \Rightarrow Q(z) > 0$. Then we can rewrite P(e) as: $P(e) = \sum_z P(z, e) = \sum_z \frac{P(z, e)}{Q(z)}\, Q(z) = E_Q\!\left[\frac{P(z, e)}{Q(z)}\right] = E_Q[w(z)]$. Monte Carlo estimate: $\hat{P}(e) = \frac{1}{T} \sum_{t=1}^{T} w(z^t)$, where $z^t \sim Q(Z)$.
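A minimal sketch of this estimator, with the proposal Q, its probability function, and the joint P(z, e) passed in as functions (all names are illustrative; the toy check at the end treats the evidence as an event on z for simplicity):

```python
import random

def importance_estimate(sample_q, q_prob, p_joint_e, T):
    """Estimate P(e) = E_Q[ P(z, e) / Q(z) ] by averaging importance weights.

    sample_q  -- function drawing z ~ Q(Z)
    q_prob    -- function returning Q(z)
    p_joint_e -- function returning P(z, e)
    """
    return sum(p_joint_e(z) / q_prob(z)
               for z in (sample_q() for _ in range(T))) / T

# Toy check: one variable with P(x) = (0.3, 0.7), evidence "x = 1",
# proposal Q uniform on {0, 1}; the estimate converges to P(e) = 0.7.
est = importance_estimate(
    sample_q=lambda: random.randint(0, 1),
    q_prob=lambda z: 0.5,
    p_joint_e=lambda z: 0.7 if z == 1 else 0.0,
    T=100_000,
)
```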

  32. Properties of the IS Estimate of P(e). • Convergence: by the law of large numbers, $\hat{P}(e) = \frac{1}{T} \sum_{i=1}^{T} w(z^i) \xrightarrow{a.s.} P(e)$ as $T \to \infty$. • Unbiased: $E_Q[\hat{P}(e)] = P(e)$. • Variance: $Var_Q[\hat{P}(e)] = \frac{1}{T^2}\, Var_Q\!\left[\sum_{i=1}^{T} w(z^i)\right] = \frac{Var_Q[w(z)]}{T}$.

  33. Properties of the IS Estimate of P(e). • Mean squared error of the estimator: $MSE_Q[\hat{P}(e)] = E_Q\!\left[\big(\hat{P}(e) - P(e)\big)^2\right] = \big(P(e) - E_Q[\hat{P}(e)]\big)^2 + Var_Q[\hat{P}(e)] = Var_Q[\hat{P}(e)] = \frac{Var_Q[w(x)]}{T}$. The squared-bias term vanishes because the estimator is unbiased: its expected value equals P(e).
