
Sampling Techniques for Probabilistic and Deterministic Graphical Models - PowerPoint PPT Presentation



  1. Sampling Techniques for Probabilistic and Deterministic Graphical Models. ICS 276, Spring 2017. Bozhena Bidyuk, Rina Dechter. Reading: Darwiche chapter 15, related papers.

  2. Overview
     1. Probabilistic Reasoning/Graphical models
     2. Importance Sampling
     3. Markov Chain Monte Carlo: Gibbs Sampling
     4. Sampling in presence of Determinism
     5. Rao-Blackwellisation
     6. AND/OR importance sampling

  3. Overview
     1. Probabilistic Reasoning/Graphical models
     2. Importance Sampling
     3. Markov Chain Monte Carlo: Gibbs Sampling
     4. Sampling in presence of Determinism
     5. Cutset-based Variance Reduction
     6. AND/OR importance sampling

  4. Probabilistic Reasoning: Graphical models
     • Graphical models: Bayesian networks, constraint networks, mixed networks
     • Queries
     • Exact algorithms: using inference, search, and hybrids
     • Graph parameters: tree-width, cycle-cutset, w-cutset

  5. Queries
     • Probability of evidence (or partition function):
       $$P(e) = Z = \sum_{X \setminus \mathrm{var}(e)} \prod_{i=1}^{n} P(x_i \mid pa_i)\Big|_{e}$$
     • Posterior marginals (beliefs):
       $$P(x_i \mid e) = \frac{P(x_i, e)}{P(e)} = \frac{\sum_{X \setminus \mathrm{var}(e),\, X_i = x_i} \prod_{j=1}^{n} P(x_j \mid pa_j)\Big|_{e}}{\sum_{X \setminus \mathrm{var}(e)} \prod_{j=1}^{n} P(x_j \mid pa_j)\Big|_{e}}$$
     • Most Probable Explanation:
       $$x^* = \arg\max_x P(x, e)$$

  6. Approximation
     • Inference, search, and their hybrids are too expensive when the graph is dense (high treewidth), so we approximate:
     • Bounding inference (week 8): mini-bucket and mini-clustering, belief propagation
     • Bounding search (week 7)
     • Sampling
     • Goal: an anytime scheme

  7. Overview
     1. Probabilistic Reasoning/Graphical models
     2. Importance Sampling
     3. Markov Chain Monte Carlo: Gibbs Sampling
     4. Sampling in presence of Determinism
     5. Rao-Blackwellisation
     6. AND/OR importance sampling

  8. Outline
     • Definitions and Background on Statistics
     • Theory of importance sampling
     • Likelihood weighting
     • State-of-the-art importance sampling techniques

  9. A sample
     • Given a set of variables X = {X_1, ..., X_n}, a sample, denoted by S^t, is an instantiation of all the variables:
       $$S^t = (x_1^t, x_2^t, \ldots, x_n^t)$$

  10. How to draw a sample? Univariate distribution
     • Example: given a random variable X with domain {0, 1} and distribution P(X) = (0.3, 0.7).
     • Task: generate samples of X from P.
     • How? Draw a random number r ∈ [0, 1]; if r < 0.3, set X = 0, else set X = 1 (see the sketch below).
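  A minimal Python sketch of this draw (the helper name and the number of samples are illustrative, not from the slides):

      import random

      def sample_binary(p0):
          # Draw one value of X from the distribution (p0, 1 - p0):
          # pick r uniformly from [0, 1); return 0 if r < p0, else 1.
          r = random.random()
          return 0 if r < p0 else 1

      # P(X) = (0.3, 0.7): about 30% of the draws should come out 0.
      samples = [sample_binary(0.3) for _ in range(10000)]
      print(sum(s == 0 for s in samples) / len(samples))  # roughly 0.3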

  11. How to draw a sample? Multivariate distribution
     • Let X = {X_1, ..., X_n} be a set of variables.
     • Express the distribution in product form:
       $$P(X) = P(X_1)\, P(X_2 \mid X_1) \cdots P(X_n \mid X_1, \ldots, X_{n-1})$$
     • Sample the variables one by one, from left to right, along the ordering dictated by the product form.
     • In the Bayesian network literature this is known as logic sampling.

  12. Sampling for Probabilistic Inference: Outline
     • Logic Sampling
     • Importance Sampling
       – Likelihood Sampling
       – Choosing a Proposal Distribution
     • Markov Chain Monte Carlo (MCMC)
       – Metropolis-Hastings
       – Gibbs sampling
     • Variance Reduction

  13. Logic Sampling: No Evidence (Henrion 1988)
     Input: Bayesian network over X = {X_1, ..., X_N}, N = #nodes, T = #samples
     Output: T samples
     Process nodes in topological order: first the ancestors of a node, then the node itself.
     1. For t = 1 to T
     2.   For i = 1 to N
     3.     x_i^t ← sample from P(x_i | pa_i)
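  A minimal Python sketch of this procedure, assuming the network is given as a topological node ordering, a parent map, and a dictionary of CPTs keyed by (node, parent values); these data structures are our own encoding, not prescribed by the slides:

      import random

      def sample_from(dist):
          # Sample an index from a discrete distribution given as a list of probabilities.
          r, cum = random.random(), 0.0
          for value, p in enumerate(dist):
              cum += p
              if r < cum:
                  return value
          return len(dist) - 1  # guard against floating-point round-off

      def logic_sampling(nodes, parents, cpt, T):
          # nodes   : variables listed in topological order
          # parents : dict mapping each node to a tuple of its parent nodes
          # cpt     : dict mapping (node, parent-values tuple) to a distribution list
          samples = []
          for _ in range(T):
              x = {}
              for v in nodes:  # topological order: ancestors are sampled first
                  pa_vals = tuple(x[p] for p in parents[v])
                  x[v] = sample_from(cpt[(v, pa_vals)])
              samples.append(x)
          return samples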

  14. Logic sampling (example)
     Network: X_1 → X_2, X_1 → X_3, (X_2, X_3) → X_4, so that
     $$P(X_1, X_2, X_3, X_4) = P(X_1)\, P(X_2 \mid X_1)\, P(X_3 \mid X_1)\, P(X_4 \mid X_2, X_3)$$
     No evidence. To generate sample t:
     1. Sample x_1 from P(X_1)
     2. Sample x_2 from P(X_2 | X_1 = x_1)
     3. Sample x_3 from P(X_3 | X_1 = x_1)
     4. Sample x_4 from P(X_4 | X_2 = x_2, X_3 = x_3)

  15. Logic Sampling w/ Evidence
     Input: Bayesian network over X = {X_1, ..., X_N}, N = #nodes, evidence E, T = #samples
     Output: T samples consistent with E
     1. For t = 1 to T
     2.   For i = 1 to N
     3.     x_i^t ← sample from P(x_i | pa_i)
     4.     If X_i ∈ E and x_i^t differs from the evidence value, reject the sample and go to step 1.
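  The evidence-handling variant is the same ancestral pass plus a rejection test; this sketch reuses sample_from() from the previous block and represents the evidence as a {node: value} dict (an assumed encoding):

      def logic_sampling_with_evidence(nodes, parents, cpt, evidence, T):
          # Reject and regenerate any sample that contradicts the evidence.
          samples = []
          while len(samples) < T:
              x, rejected = {}, False
              for v in nodes:
                  pa_vals = tuple(x[p] for p in parents[v])
                  x[v] = sample_from(cpt[(v, pa_vals)])
                  if v in evidence and x[v] != evidence[v]:
                      rejected = True  # steps 3-4 of the slide: reject, go to step 1
                      break
              if not rejected:
                  samples.append(x)
          return samples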

  16. Logic Sampling (example)
     Same network; evidence: X_3 = 0. To generate sample t:
     1. Sample x_1 from P(X_1)
     2. Sample x_2 from P(X_2 | X_1 = x_1)
     3. Sample x_3 from P(X_3 | X_1 = x_1)
     4. If x_3 ≠ 0, reject the sample and restart from step 1; otherwise
     5. Sample x_4 from P(X_4 | X_2 = x_2, X_3 = x_3)

  17. Expected value and Variance
     • Expected value: given a probability distribution P(X) and a function g(X) defined over a set of variables X = {X_1, X_2, ..., X_n}, the expected value of g w.r.t. P is
       $$E_P[g(x)] = \sum_x g(x)\, P(x)$$
     • Variance: the variance of g w.r.t. P is
       $$\mathrm{Var}_P[g(x)] = \sum_x \big(g(x) - E_P[g(x)]\big)^2\, P(x)$$

  18. Monte Carlo Estimate
     • Estimator: an estimator is a function of the samples; it produces an estimate of the unknown parameter of the sampling distribution.
     • Given i.i.d. samples S^1, S^2, ..., S^T drawn from P, the Monte Carlo estimate of E_P[g(x)] is:
       $$\hat{g} = \frac{1}{T} \sum_{t=1}^{T} g(S^t)$$

  19. Example: Monte Carlo estimate
     • Given: a distribution P(X) = (0.3, 0.7), and g(X) = 40 if X equals 0, 50 if X equals 1.
     • Exact value: E_P[g(x)] = 40 · 0.3 + 50 · 0.7 = 47.
     • Generate 10 samples from P: 0, 1, 1, 1, 0, 1, 1, 0, 1, 0. Then
       $$\hat{g} = \frac{40 \cdot \#\{X = 0\} + 50 \cdot \#\{X = 1\}}{\#\text{samples}} = \frac{40 \cdot 4 + 50 \cdot 6}{10} = 46$$
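  The same computation in Python, reusing the slide's ten draws:

      def g(x):
          # g(X) from the slide: 40 if X = 0, 50 if X = 1
          return 40 if x == 0 else 50

      samples = [0, 1, 1, 1, 0, 1, 1, 0, 1, 0]
      g_hat = sum(g(s) for s in samples) / len(samples)
      print(g_hat)  # (40*4 + 50*6) / 10 = 46.0, vs. the exact value 47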

  20. Outline
     • Definitions and Background on Statistics
     • Theory of importance sampling
     • Likelihood weighting
     • State-of-the-art importance sampling techniques

  21. Importance sampling: Main idea
     • Express the query as the expected value of a random variable w.r.t. a distribution Q.
     • Generate random samples from Q.
     • Estimate the expected value from the generated samples using a Monte Carlo estimator (an average).

  22. Importance sampling for P(e)
     • Let Z = X \ E, and let Q(Z) be a (proposal) distribution satisfying
       $$P(z, e) > 0 \Rightarrow Q(z) > 0$$
     • Then we can rewrite P(e) as:
       $$P(e) = \sum_z P(z, e) = \sum_z \frac{P(z, e)}{Q(z)}\, Q(z) = E_Q\left[\frac{P(z, e)}{Q(z)}\right] = E_Q[w(z)]$$
     • Monte Carlo estimate:
       $$\hat{P}(e) = \frac{1}{T} \sum_{t=1}^{T} w(z^t), \quad \text{where } z^t \sim Q(Z)$$
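  A sketch of this estimator, assuming hypothetical callables sample_q (draws z from Q), q_prob (evaluates Q(z)), and p_joint (evaluates P(z, e)):

      def is_estimate_of_evidence(sample_q, q_prob, p_joint, T):
          # Requires Q(z) > 0 wherever P(z, e) > 0.
          total = 0.0
          for _ in range(T):
              z = sample_q()
              total += p_joint(z) / q_prob(z)  # importance weight w(z)
          return total / T  # Monte Carlo average: an estimate of P(e)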

  23. Properties of the IS estimate of P(e)
     • Convergence: by the law of large numbers,
       $$\hat{P}(e) = \frac{1}{T} \sum_{t=1}^{T} w(z^t) \xrightarrow{\ a.s.\ } P(e) \quad \text{as } T \to \infty$$
     • Unbiased: $E_Q[\hat{P}(e)] = P(e)$
     • Variance:
       $$\mathrm{Var}_Q[\hat{P}(e)] = \mathrm{Var}_Q\left[\frac{1}{T} \sum_{t=1}^{T} w(z^t)\right] = \frac{\mathrm{Var}_Q[w(z)]}{T}$$

  24. Properties of the IS estimate of P(e)
     • Mean squared error of the estimator:
       $$\mathrm{MSE}_Q[\hat{P}(e)] = E_Q\big[(\hat{P}(e) - P(e))^2\big] = \big(P(e) - E_Q[\hat{P}(e)]\big)^2 + \mathrm{Var}_Q[\hat{P}(e)]$$
     • The squared-bias term is zero because the expected value of the estimator equals P(e), so
       $$\mathrm{MSE}_Q[\hat{P}(e)] = \mathrm{Var}_Q[\hat{P}(e)] = \frac{\mathrm{Var}_Q[w(x)]}{T}$$

  25. Estimating P(X_i | e)
     • Let δ_{x_i}(z) be a Dirac-delta (indicator) function, which is 1 if z contains x_i and 0 otherwise. Then:
       $$P(x_i \mid e) = \frac{P(x_i, e)}{P(e)} = \frac{\sum_z \delta_{x_i}(z)\, P(z, e)}{\sum_z P(z, e)} = \frac{E_Q\left[\delta_{x_i}(z)\, \frac{P(z, e)}{Q(z)}\right]}{E_Q\left[\frac{P(z, e)}{Q(z)}\right]}$$
     • Idea: estimate the numerator and the denominator by importance sampling.
     • Ratio estimate:
       $$\hat{P}(x_i \mid e) = \frac{\sum_{k=1}^{T} \delta_{x_i}(z^k)\, w(z^k)}{\sum_{k=1}^{T} w(z^k)}$$
     • The estimate is biased: $E[\hat{P}(x_i \mid e)] \neq P(x_i \mid e)$
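  A sketch of the ratio estimate, assuming the samples are stored as {variable: value} dicts and the weights w(z^k) were computed as on slide 22:

      def ratio_estimate(samples, weights, var, value):
          # Numerator keeps only the weight of samples where X_i takes the
          # queried value; the same weights normalize the denominator.
          num = sum(w for z, w in zip(samples, weights) if z[var] == value)
          den = sum(weights)
          return num / den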

  26. Properties of the IS estimator for P(X_i | e)
     • Convergence: by the weak law of large numbers, $\hat{P}(x_i \mid e) \to P(x_i \mid e)$ as $T \to \infty$
     • Asymptotically unbiased:
       $$\lim_{T \to \infty} E[\hat{P}(x_i \mid e)] = P(x_i \mid e)$$
     • Variance: harder to analyze; Liu suggests a measure called the "effective sample size".

  27. Generating samples from Q
     • No restrictions on "how to".
     • Typically, express Q in product form:
       $$Q(Z) = Q(Z_1)\, Q(Z_2 \mid Z_1) \cdots Q(Z_n \mid Z_1, \ldots, Z_{n-1})$$
     • Sample along the order Z_1, ..., Z_n.
     • Example (sketched in code below):
       – Z_1 ∼ Q(Z_1) = (0.2, 0.8)
       – Z_2 ∼ Q(Z_2 | Z_1) = (0.1, 0.9, 0.2, 0.8), read row by row: (0.1, 0.9) when Z_1 = 0 and (0.2, 0.8) when Z_1 = 1
       – Z_3 ∼ Q(Z_3 | Z_1, Z_2) = Q(Z_3) = (0.5, 0.5)
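  A sketch of this example, reusing sample_from() from the logic-sampling block; the row-by-row reading of Q(Z_2 | Z_1) above is an assumption, spelled out in the code:

      def sample_q_product_form():
          # Sample along the order Z1, Z2, Z3; Z2's distribution depends on Z1,
          # while Z3 is independent of both.
          z1 = sample_from([0.2, 0.8])
          z2 = sample_from([0.1, 0.9] if z1 == 0 else [0.2, 0.8])
          z3 = sample_from([0.5, 0.5])
          return (z1, z2, z3)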

  28. Outline
     • Definitions and Background on Statistics
     • Theory of importance sampling
     • Likelihood weighting
     • State-of-the-art importance sampling techniques
