Sampling Techniques for Probabilistic and Deterministic Graphical Models
ICS 276, Spring 2017
Bozhena Bidyuk, Rina Dechter
Reading: Darwiche chapter 15, related papers
Overview
1. Probabilistic Reasoning/Graphical models
2. Importance Sampling
3. Markov Chain Monte Carlo: Gibbs Sampling
4. Sampling in presence of Determinism
5. Rao-Blackwellisation (cutset-based variance reduction)
6. AND/OR importance sampling
Probabilistic Reasoning; Graphical models
• Graphical models:
  – Bayesian network, constraint networks, mixed network
• Queries
• Exact algorithms
  – using inference,
  – search and hybrids
• Graph parameters:
  – tree-width, cycle-cutset, w-cutset
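Queries
• Probability of evidence (or partition function):
  $P(e) = \sum_{X \setminus \mathrm{var}(e)} \prod_{i=1}^{n} P(x_i \mid pa_i)\big|_e$, $\quad Z = \sum_{X} \prod_{i} \psi_i(C_i)$
• Posterior marginal (beliefs):
  $P(x_i \mid e) = \frac{P(x_i, e)}{P(e)} = \frac{\sum_{X \setminus (\mathrm{var}(e) \cup X_i)} \prod_{j=1}^{n} P(x_j \mid pa_j)\big|_e}{\sum_{X \setminus \mathrm{var}(e)} \prod_{j=1}^{n} P(x_j \mid pa_j)\big|_e}$
• Most Probable Explanation:
  $x^* = \arg\max_{x} P(x, e)$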
Approximation
• Inference, search, and hybrid algorithms are too expensive when the graph is dense (high treewidth), so we turn to approximation:
• Bounding inference (week 8):
• mini-bucket and mini-clustering
• Belief propagation
• Bounding search (week 7):
• Sampling
• Goal: an anytime scheme
Outline
• Definitions and Background on Statistics
• Theory of importance sampling
• Likelihood weighting
• State-of-the-art importance sampling techniques
A sample
• Given a set of variables $X = \{X_1, \ldots, X_n\}$, a sample, denoted by $S^t$, is an instantiation of all variables:
  $S^t = (x_1^t, x_2^t, \ldots, x_n^t)$
How to draw a sample? Univariate distribution
• Example: Given a random variable X with domain {0, 1} and distribution P(X) = (0.3, 0.7).
• Task: Generate samples of X from P.
• How?
  – Draw a random number r uniformly from [0, 1]
  – If r < 0.3, set X = 0
  – Else set X = 1
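
The thresholding rule on this slide can be sketched in a few lines of Python. This is a minimal illustration only: the helper name `sample_binary`, the seed, and the sample count are assumptions made for the example, not part of the course material.

```python
import random

def sample_binary(p0, rng):
    """Draw X from P(X) = (p0, 1 - p0) by thresholding a uniform number."""
    r = rng.random()              # r is uniform on [0, 1)
    return 0 if r < p0 else 1

rng = random.Random(0)            # arbitrary fixed seed, for reproducibility
samples = [sample_binary(0.3, rng) for _ in range(10_000)]
frac_zero = samples.count(0) / len(samples)
print(frac_zero)                  # should be close to 0.3
```

With more samples, the fraction of zeros should concentrate around 0.3.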
How to draw a sample? Multi-variate distribution
• Let $X = \{X_1, \ldots, X_n\}$ be a set of variables
• Express the distribution in product form:
  $P(X) = P(X_1) \times P(X_2 \mid X_1) \times \cdots \times P(X_n \mid X_1, \ldots, X_{n-1})$
• Sample variables one by one from left to right, along the ordering dictated by the product form.
• Bayesian network literature: Logic sampling
Sampling for Prob. Inference: Outline
• Logic Sampling
• Importance Sampling
  – Likelihood Sampling
  – Choosing a Proposal Distribution
• Markov Chain Monte Carlo (MCMC)
  – Metropolis-Hastings
  – Gibbs sampling
• Variance Reduction
Logic Sampling: No Evidence (Henrion 1988)
Input: Bayesian network $X = \{X_1, \ldots, X_N\}$, N = number of nodes, T = number of samples
Output: T samples
Process nodes in topological order: first process the ancestors of a node, then the node itself.
1. For t = 1 to T
2.   For i = 1 to N
3.     $X_i \leftarrow$ sample $x_i^t$ from $P(x_i \mid pa_i)$
Logic sampling (example), no evidence
$P(X_1, X_2, X_3, X_4) = P(X_1) \times P(X_2 \mid X_1) \times P(X_3 \mid X_1) \times P(X_4 \mid X_2, X_3)$
[Figure: network with edges X1 → X2, X1 → X3, X2 → X4, X3 → X4]
// generate sample k
1. Sample $x_1$ from $P(x_1)$
2. Sample $x_2$ from $P(x_2 \mid X_1 = x_1)$
3. Sample $x_3$ from $P(x_3 \mid X_1 = x_1)$
4. Sample $x_4$ from $P(x_4 \mid X_2 = x_2, X_3 = x_3)$
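
The four sampling steps for this network could look like the following in Python. This is a sketch under stated assumptions: the slides give no CPT values, so every number below is hypothetical, and the names `bernoulli` and `logic_sample` are made up for the illustration.

```python
import random

# Hypothetical CPTs (not from the slides): each entry is P(variable = 1 | parents).
P_X1 = 0.6
P_X2 = {0: 0.2, 1: 0.7}                      # P(X2 = 1 | X1)
P_X3 = {0: 0.5, 1: 0.1}                      # P(X3 = 1 | X1)
P_X4 = {(0, 0): 0.1, (0, 1): 0.4,
        (1, 0): 0.6, (1, 1): 0.9}            # P(X4 = 1 | X2, X3)

def bernoulli(p, rng):
    """Return 1 with probability p, else 0."""
    return 1 if rng.random() < p else 0

def logic_sample(rng):
    """Sample (x1, x2, x3, x4) in topological order: parents before children."""
    x1 = bernoulli(P_X1, rng)
    x2 = bernoulli(P_X2[x1], rng)
    x3 = bernoulli(P_X3[x1], rng)
    x4 = bernoulli(P_X4[(x2, x3)], rng)
    return (x1, x2, x3, x4)

rng = random.Random(1)                       # arbitrary seed
samples = [logic_sample(rng) for _ in range(20_000)]
est_x1 = sum(s[0] for s in samples) / len(samples)   # estimates P(X1 = 1)
est_x2 = sum(s[1] for s in samples) / len(samples)   # estimates P(X2 = 1)
print(est_x1, est_x2)
```

Under these assumed CPTs, the exact marginals are P(X1 = 1) = 0.6 and P(X2 = 1) = 0.4 × 0.2 + 0.6 × 0.7 = 0.5, so the two estimates should land near those values.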
Logic Sampling w/ Evidence
Input: Bayesian network $X = \{X_1, \ldots, X_N\}$, N = number of nodes, E = evidence, T = number of samples
Output: T samples consistent with E
1. For t = 1 to T
2.   For i = 1 to N
3.     $X_i \leftarrow$ sample $x_i^t$ from $P(x_i \mid pa_i)$
4.     If $X_i \in E$ and $x_i^t$ differs from the observed value of $X_i$, reject the sample:
5.       discard it and regenerate sample t from Step 2.
Logic Sampling (example)
Evidence: $X_3 = 0$
[Figure: network with edges X1 → X2, X1 → X3, X2 → X4, X3 → X4, annotated with $P(x_1)$, $P(x_2 \mid x_1)$, $P(x_3 \mid x_1)$, $P(x_4 \mid x_2, x_3)$]
// generate sample k
1. Sample $x_1$ from $P(x_1)$
2. Sample $x_2$ from $P(x_2 \mid x_1)$
3. Sample $x_3$ from $P(x_3 \mid x_1)$
4. If $x_3 \neq 0$, reject the sample and start from 1, otherwise
5. Sample $x_4$ from $P(x_4 \mid x_2, x_3)$
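
A minimal Python sketch of this rejection step, for evidence X3 = 0, is given below. The CPT values are hypothetical (the slides give none), and `sample_given_evidence` is an illustrative name. The sketch also counts rejected attempts, since the acceptance rate is itself an estimate of P(e).

```python
import random

# Hypothetical CPTs (not from the slides): each entry is P(variable = 1 | parents).
P_X1 = 0.6
P_X2 = {0: 0.2, 1: 0.7}                      # P(X2 = 1 | X1)
P_X3 = {0: 0.5, 1: 0.1}                      # P(X3 = 1 | X1)
P_X4 = {(0, 0): 0.1, (0, 1): 0.4,
        (1, 0): 0.6, (1, 1): 0.9}            # P(X4 = 1 | X2, X3)

def bernoulli(p, rng):
    return 1 if rng.random() < p else 0

def sample_given_evidence(rng):
    """Return (sample, attempts); retries until the sample has X3 = 0."""
    attempts = 0
    while True:
        attempts += 1
        x1 = bernoulli(P_X1, rng)
        x2 = bernoulli(P_X2[x1], rng)
        x3 = bernoulli(P_X3[x1], rng)
        if x3 != 0:
            continue                         # reject and start over from x1
        x4 = bernoulli(P_X4[(x2, x3)], rng)
        return (x1, x2, x3, x4), attempts

rng = random.Random(2)                       # arbitrary seed
T = 10_000
accepted, total_attempts = [], 0
for _ in range(T):
    s, n = sample_given_evidence(rng)
    accepted.append(s)
    total_attempts += n

acceptance_rate = T / total_attempts                     # estimates P(X3 = 0)
post_x1 = sum(s[0] for s in accepted) / T                # estimates P(X1 = 1 | X3 = 0)
print(acceptance_rate, post_x1)
```

Under the assumed CPTs, P(X3 = 0) = 0.4 × 0.5 + 0.6 × 0.9 = 0.74 and P(X1 = 1 | X3 = 0) = 0.54 / 0.74 ≈ 0.73. When the evidence is unlikely, most samples are rejected, which is the motivation for importance sampling.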
Expected value and Variance
Expected value: Given a probability distribution P(X) and a function g(X) defined over a set of variables $X = \{X_1, X_2, \ldots, X_n\}$, the expected value of g w.r.t. P is
  $E_P[g(x)] = \sum_{x} g(x) P(x)$
Variance: The variance of g w.r.t. P is
  $Var_P[g(x)] = \sum_{x} \left( g(x) - E_P[g(x)] \right)^2 P(x)$
Monte Carlo Estimate
• Estimator:
  – An estimator is a function of the samples.
  – It produces an estimate of the unknown parameter of the sampling distribution.
Given i.i.d. samples $S^1, S^2, \ldots, S^T$ drawn from P, the Monte Carlo estimate of $E_P[g(x)]$ is given by:
  $\hat{g} = \frac{1}{T} \sum_{t=1}^{T} g(S^t)$
Example: Monte Carlo estimate
• Given:
  – A distribution P(X) = (0.3, 0.7).
  – g(X) = 40 if X equals 0, and g(X) = 50 if X equals 1.
• Estimate $E_P[g(x)]$, whose exact value is $40 \times 0.3 + 50 \times 0.7 = 47$.
• Generate k = 10 samples from P: 0,1,1,1,0,1,1,0,1,0
• $\hat{g} = \frac{40 \times \#\text{samples}(X=0) + 50 \times \#\text{samples}(X=1)}{\#\text{samples}} = \frac{40 \times 4 + 50 \times 6}{10} = 46$
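
The arithmetic of this example can be reproduced directly. The sketch below uses the ten samples listed on the slide; the function name `monte_carlo_estimate` is an illustrative choice.

```python
def g(x):
    """g(X) from the slide: 40 if X = 0, 50 if X = 1."""
    return 40 if x == 0 else 50

def monte_carlo_estimate(samples, g):
    """Average of g over the samples: (1/T) * sum_t g(S^t)."""
    return sum(g(s) for s in samples) / len(samples)

samples = [0, 1, 1, 1, 0, 1, 1, 0, 1, 0]     # the ten samples from the slide
g_hat = monte_carlo_estimate(samples, g)
exact = 40 * 0.3 + 50 * 0.7                  # E_P[g(x)]
print(g_hat, exact)
```

The estimate (46) differs from the exact expectation (47) because only ten samples were used; the gap shrinks on average as the number of samples grows.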
Importance sampling: Main idea
• Express the query as the expected value of a random variable w.r.t. a distribution Q.
• Generate random samples from Q.
• Estimate the expected value from the generated samples using a Monte Carlo estimator (average).
Importance sampling for P(e)
Let $Z = X \setminus E$.
Let $Q(Z)$ be a (proposal) distribution satisfying $P(z, e) > 0 \Rightarrow Q(z) > 0$.
Then we can rewrite $P(e)$ as:
  $P(e) = \sum_{z} P(z, e) = \sum_{z} P(z, e) \frac{Q(z)}{Q(z)} = E_Q\left[ \frac{P(z, e)}{Q(z)} \right] = E_Q[w(z)]$
Monte Carlo estimate:
  $\hat{P}(e) = \frac{1}{T} \sum_{t=1}^{T} w(z^t)$, where $z^t \leftarrow Q(Z)$
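
As a rough illustration of this estimator, the sketch below estimates P(X3 = 0) on the four-variable example network. The CPT values are hypothetical, and the proposal Q is taken to be uniform over Z = {X1, X2, X4} purely for simplicity (it satisfies the support condition but is not a recommended choice). The exact value is computed by enumeration for comparison.

```python
import itertools
import random

# Hypothetical CPTs (not from the slides): each entry is P(variable = 1 | parents).
P_X1 = 0.6
P_X2 = {0: 0.2, 1: 0.7}
P_X3 = {0: 0.5, 1: 0.1}
P_X4 = {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.6, (1, 1): 0.9}

def bern(p, v):
    """Probability that a Bernoulli(p) variable takes value v."""
    return p if v == 1 else 1.0 - p

def joint(x1, x2, x3, x4):
    return (bern(P_X1, x1) * bern(P_X2[x1], x2)
            * bern(P_X3[x1], x3) * bern(P_X4[(x2, x3)], x4))

E_X3 = 0                                     # evidence: X3 = 0

def weight(z):
    """w(z) = P(z, e) / Q(z) with Q uniform over the 8 assignments of Z."""
    x1, x2, x4 = z
    return joint(x1, x2, E_X3, x4) / 0.125

def estimate_p_e(T, rng):
    total = 0.0
    for _ in range(T):
        z = tuple(rng.randint(0, 1) for _ in range(3))   # z^t drawn from Q
        total += weight(z)
    return total / T

exact_p_e = sum(joint(x1, x2, E_X3, x4)
                for x1, x2, x4 in itertools.product((0, 1), repeat=3))
p_e_hat = estimate_p_e(20_000, random.Random(3))
print(p_e_hat, exact_p_e)
```

Unlike logic sampling with rejection, every sample contributes to the estimate; the cost is that a poorly matched Q produces highly variable weights.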
Properties of IS estimate of P(e)
• Convergence: by the law of large numbers
  $\hat{P}(e) = \frac{1}{T} \sum_{i=1}^{T} w(z^i) \xrightarrow{a.s.} P(e)$ for $T \to \infty$
• Unbiased:
  $E_Q[\hat{P}(e)] = P(e)$
• Variance:
  $Var_Q[\hat{P}(e)] = Var_Q\left[ \frac{1}{T} \sum_{i=1}^{T} w(z^i) \right] = \frac{Var_Q[w(z)]}{T}$
Properties of IS estimate of P(e)
• Mean Squared Error of the estimator:
  $MSE_Q[\hat{P}(e)] = E_Q\left[ \left( \hat{P}(e) - P(e) \right)^2 \right]$
  $= \left( P(e) - E_Q[\hat{P}(e)] \right)^2 + Var_Q[\hat{P}(e)]$
  $= Var_Q[\hat{P}(e)] = \frac{Var_Q[w(z)]}{T}$
The squared term in parentheses (the squared bias) is zero because the estimator is unbiased: its expected value equals $P(e)$.
Estimating $P(X_i \mid e)$
Let $\delta_{x_i}(z)$ be a Dirac delta function, which is 1 if z contains $x_i$ and 0 otherwise.
  $P(x_i \mid e) = \frac{P(x_i, e)}{P(e)} = \frac{\sum_{z} \delta_{x_i}(z) P(z, e)}{\sum_{z} P(z, e)} = \frac{E_Q\left[ \frac{\delta_{x_i}(z) P(z, e)}{Q(z)} \right]}{E_Q\left[ \frac{P(z, e)}{Q(z)} \right]}$
Idea: Estimate numerator and denominator by IS.
Ratio estimate:
  $\overline{P}(x_i \mid e) = \frac{\hat{P}(x_i, e)}{\hat{P}(e)} = \frac{\sum_{k=1}^{T} \delta_{x_i}(z^k) \, w(z^k, e)}{\sum_{k=1}^{T} w(z^k, e)}$
Estimate is biased: $E[\overline{P}(x_i \mid e)] \neq P(x_i \mid e)$
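
The ratio estimate can be sketched as follows for the query P(X1 = 1 | X3 = 0) on the four-variable example network. As before, the CPT values are hypothetical and the uniform proposal over Z = {X1, X2, X4} is an assumption chosen only to keep the example short.

```python
import random

# Hypothetical CPTs (not from the slides): each entry is P(variable = 1 | parents).
P_X1 = 0.6
P_X2 = {0: 0.2, 1: 0.7}
P_X3 = {0: 0.5, 1: 0.1}
P_X4 = {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.6, (1, 1): 0.9}

def bern(p, v):
    return p if v == 1 else 1.0 - p

def joint(x1, x2, x3, x4):
    return (bern(P_X1, x1) * bern(P_X2[x1], x2)
            * bern(P_X3[x1], x3) * bern(P_X4[(x2, x3)], x4))

def ratio_estimate(T, rng, e_x3=0):
    """Estimate P(X1 = 1 | X3 = e_x3) as sum(delta * w) / sum(w)."""
    num = 0.0                                # accumulates delta_{x1=1}(z) * w(z, e)
    den = 0.0                                # accumulates w(z, e)
    for _ in range(T):
        x1, x2, x4 = (rng.randint(0, 1) for _ in range(3))   # z^k from uniform Q
        w = joint(x1, x2, e_x3, x4) / 0.125                   # Q(z) = 1/8
        den += w
        if x1 == 1:
            num += w
    return num / den

posterior_hat = ratio_estimate(20_000, random.Random(4))
exact_posterior = (0.6 * 0.9) / (0.4 * 0.5 + 0.6 * 0.9)       # = 0.54 / 0.74
print(posterior_hat, exact_posterior)
```

Note that the normalizing constant of Q cancels in the ratio, so the weights only need to be known up to a constant factor.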
Properties of the IS estimator for $P(X_i \mid e)$
• Convergence: by the weak law of large numbers
  $\overline{P}(x_i \mid e) \to P(x_i \mid e)$ as $T \to \infty$
• Asymptotically unbiased:
  $\lim_{T \to \infty} E_Q[\overline{P}(x_i \mid e)] = P(x_i \mid e)$
• Variance
  – Harder to analyze
  – Liu suggests a measure called "Effective sample size"
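
The slide does not define the effective sample size. One commonly used approximation is ESS ≈ (Σ w)² / Σ w², which equals T when all weights are equal and approaches 1 when a single weight dominates. The sketch below implements that formula; it is an assumption about which variant is meant and may differ from the exact form Liu gives.

```python
def effective_sample_size(weights):
    """Common approximation: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    if total_sq == 0.0:
        return 0.0                           # all weights zero: no usable samples
    return (total * total) / total_sq

ess_equal = effective_sample_size([1.0] * 5)          # equal weights
ess_single = effective_sample_size([0.0, 0.0, 3.0])   # one dominant weight
print(ess_equal, ess_single)
```

A small ESS relative to T signals that the proposal Q is a poor match for the target and that the ratio estimate rests on only a few samples.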
Generating samples from Q
• No restrictions on "how to"
• Typically, express Q in product form:
  – $Q(Z) = Q(Z_1) \times Q(Z_2 \mid Z_1) \times \cdots \times Q(Z_n \mid Z_1, \ldots, Z_{n-1})$
• Sample along the order $Z_1, \ldots, Z_n$
• Example:
  – $Z_1 \leftarrow Q(Z_1) = (0.2, 0.8)$
  – $Z_2 \leftarrow Q(Z_2 \mid Z_1) = (0.1, 0.9, 0.2, 0.8)$
  – $Z_3 \leftarrow Q(Z_3 \mid Z_1, Z_2) = Q(Z_3) = (0.5, 0.5)$
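
The example proposal can be sampled variable by variable, as sketched below. One assumption is needed: the slide lists Q(Z2 | Z1) as the flat vector (0.1, 0.9, 0.2, 0.8), which is read here as the row (0.1, 0.9) for Z1 = 0 followed by the row (0.2, 0.8) for Z1 = 1. The function also returns Q(z), since the importance weight needs it.

```python
import random

Q_Z1 = (0.2, 0.8)                            # (Q(Z1=0), Q(Z1=1))
# Assumed reading of (0.1, 0.9, 0.2, 0.8): one row per value of Z1.
Q_Z2_GIVEN_Z1 = {0: (0.1, 0.9), 1: (0.2, 0.8)}
Q_Z3 = (0.5, 0.5)                            # Z3 is independent of Z1, Z2 under Q

def draw(dist, rng):
    """Draw 0 or 1 from a two-entry distribution (p0, p1)."""
    return 0 if rng.random() < dist[0] else 1

def sample_q(rng):
    """Sample z = (z1, z2, z3) along the order Z1, Z2, Z3 and return (z, Q(z))."""
    z1 = draw(Q_Z1, rng)
    z2 = draw(Q_Z2_GIVEN_Z1[z1], rng)
    z3 = draw(Q_Z3, rng)
    q = Q_Z1[z1] * Q_Z2_GIVEN_Z1[z1][z2] * Q_Z3[z3]
    return (z1, z2, z3), q

rng = random.Random(5)                       # arbitrary seed
draws = [sample_q(rng) for _ in range(20_000)]
frac_z1 = sum(z[0] for z, _ in draws) / len(draws)    # estimates Q(Z1 = 1)
frac_z2 = sum(z[1] for z, _ in draws) / len(draws)    # estimates Q(Z2 = 1)
print(frac_z1, frac_z2)
```

Under this reading, Q(Z1 = 1) = 0.8 and Q(Z2 = 1) = 0.2 × 0.9 + 0.8 × 0.8 = 0.82.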