Approximate inference: Sampling methods
Probabilistic Graphical Models
Sharif University of Technology
Spring 2018
Soleymani
Approximate inference

Approximate inference techniques:
- Deterministic approximation: variational algorithms
- Stochastic simulation / sampling methods
Sampling-based estimation

Assume that $\mathcal{D} = \{x^{(1)}, \dots, x^{(M)}\}$ is a set of i.i.d. samples drawn from the desired distribution $p$.

For any distribution $p$ and function $f$, we can estimate $\mathbb{E}_p[f]$ by the empirical expectation:
$$\mathbb{E}_p[f] \approx \frac{1}{M} \sum_{m=1}^{M} f(x^{(m)})$$

Expectations reveal interesting properties of the distribution $p$:
- Mean and variance of $p$
- Probability of events: e.g., we can find $p(x = a)$ by estimating $\mathbb{E}_p[f]$ where $f(x) = \mathbb{1}[x = a]$

Thus a set of samples gives us a stochastic representation of a complex distribution.
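As a concrete illustration (not from the slides), the sketch below estimates an event probability as the empirical expectation of an indicator function; the choices of $p = \mathcal{N}(0,1)$ and the event $x > 1$ are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=100_000)  # i.i.d. samples x^(m) ~ p, here p = N(0, 1)

# Estimate P(x > 1) = E_p[f] with the indicator f(x) = 1[x > 1]
estimate = (x > 1.0).mean()
print(estimate)  # close to the true value 1 - Phi(1) ≈ 0.1587
```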
Bounds on error

Let $\hat{\mu} = \frac{1}{M} \sum_{m=1}^{M} f(\boldsymbol{x}^{(m)})$ with $\boldsymbol{x}^{(m)} \sim p(\boldsymbol{x})$, and let $\mu = \mathbb{E}_p[f(\boldsymbol{x})]$.

Hoeffding bound (additive bound on error):
$$P\left(\hat{\mu} \notin [\mu - \epsilon, \mu + \epsilon]\right) \le 2 e^{-2M\epsilon^2}$$

Chernoff bound (multiplicative bound on error):
$$P\left(\hat{\mu} \notin [\mu(1 - \epsilon), \mu(1 + \epsilon)]\right) \le 2 e^{-M\mu\epsilon^2/3}$$
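As a worked consequence of the Hoeffding bound (a rearrangement, not stated on the slide): to guarantee an additive error of at most $\epsilon$ with probability at least $1 - \delta$, it suffices to take

$$M \ge \frac{\ln(2/\delta)}{2\epsilon^2},$$

e.g., $\epsilon = 0.01$ and $\delta = 0.05$ give $M \ge \ln(40)/(2 \times 10^{-4}) \approx 18{,}445$ samples.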
The mean and variance of the estimator

For samples drawn independently from the distribution $p$:
$$\hat{f} = \frac{1}{M} \sum_{m=1}^{M} f(x^{(m)})$$

$$\mathbb{E}[\hat{f}] = \mathbb{E}[f] \quad \text{(the estimator is unbiased)}$$
$$\mathrm{Var}[\hat{f}] = \frac{1}{M} \, \mathbb{E}\left[\left(f - \mathbb{E}[f]\right)^2\right]$$
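A quick numerical check of the $1/M$ variance decay; a minimal sketch assuming $p = \mathcal{N}(0,1)$ and $f(x) = x^2$, for which $\mathbb{E}[f] = 1$ and $\mathrm{Var}[f] = 2$:

```python
import numpy as np

rng = np.random.default_rng(0)
for M in [10, 100, 1000]:
    # Empirical variance of f_hat across many independent repetitions
    estimates = [np.mean(rng.normal(size=M) ** 2) for _ in range(2000)]
    print(M, np.var(estimates))  # roughly Var[f] / M = 2 / M
```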
Monte Carlo methods

Use a set of samples to answer an inference query:
- Expectations can be approximated using sample-based averages
- Asymptotically exact and easy to apply to arbitrary problems

Challenges:
- Drawing samples from many distributions is not trivial
- Are the gathered samples enough?
- Are all samples useful, or equally useful?
Generating samples from a distribution

Assume that we have an algorithm that generates (pseudo-)random numbers distributed uniformly over (0,1).

How do we generate samples from other distributions using these? First, we consider simple cases:
- Bernoulli
- Multinomial
- Other standard distributions
Transformation technique

We intend to generate samples from standard distributions by mapping the values produced by the uniform random number generator such that the mapped samples have the desired distribution.

Choose a function $f(\cdot)$ such that the resulting values $z = f(y)$ have some specific desired distribution $p(z)$:
$$p(z) = p(y) \left| \frac{dy}{dz} \right|$$

Since $p(y) = 1$ over $(0,1)$, we have:
$$y = \int_{-\infty}^{z} p(z') \, dz'$$

If we define $h(z) \equiv \int_{-\infty}^{z} p(z') \, dz'$, then $z = h^{-1}(y)$.
Transformation technique

Sampling via the cumulative distribution function (CDF): if $y \sim U(0,1)$ and $h(\cdot)$ is the CDF of $p$, then $h^{-1}(y) \sim p$.

Since we need to calculate and then invert the indefinite integral of $p$, this is feasible only for a limited number of simple distributions.

Thus, we will first see rejection sampling and importance sampling (in the next slides), which serve as important components in more general sampling techniques.
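A minimal sketch of this inverse-CDF technique for a distribution whose CDF is invertible in closed form; the exponential distribution is an illustrative choice, not one from the slides:

```python
import numpy as np

def sample_exponential(rate, n, rng):
    """Inverse-CDF sampling: h(z) = 1 - exp(-rate*z), so h^{-1}(y) = -ln(1 - y)/rate."""
    y = rng.uniform(0.0, 1.0, size=n)  # y ~ U(0, 1)
    return -np.log(1.0 - y) / rate     # z = h^{-1}(y) ~ Exponential(rate)

rng = np.random.default_rng(0)
z = sample_exponential(2.0, 100_000, rng)
print(z.mean())  # ≈ 1 / rate = 0.5
```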
Rejection sampling

Suppose we wish to sample from $p(\boldsymbol{x}) = \tilde{p}(\boldsymbol{x})/Z$:
- $p(\boldsymbol{x})$ is difficult to sample from, but $\tilde{p}(\boldsymbol{x})$ is easy to evaluate.

We choose a simpler (proposal) distribution $q(\boldsymbol{x})$ that we can sample from more easily, where $\exists k$ such that $k q(\boldsymbol{x}) \ge \tilde{p}(\boldsymbol{x})$ for all $\boldsymbol{x}$:
- Sample from $q(\boldsymbol{x})$: $\boldsymbol{x}^* \sim q(\boldsymbol{x})$
- Accept $\boldsymbol{x}^*$ with probability $\dfrac{\tilde{p}(\boldsymbol{x}^*)}{k q(\boldsymbol{x}^*)}$

[Figure: the envelope $kq(x)$ lying above $\tilde{p}(x)$, with a proposed sample $x^*$.]
Rejection sampling

Correctness: the density of the accepted samples is
$$\frac{\dfrac{\tilde{p}(\boldsymbol{x})}{k q(\boldsymbol{x})} \, q(\boldsymbol{x})}{\displaystyle\int \frac{\tilde{p}(\boldsymbol{x})}{k q(\boldsymbol{x})} \, q(\boldsymbol{x}) \, d\boldsymbol{x}} = \frac{\tilde{p}(\boldsymbol{x})}{\displaystyle\int \tilde{p}(\boldsymbol{x}) \, d\boldsymbol{x}} = p(\boldsymbol{x})$$

Probability of acceptance:
$$P(\text{accept}) = \int \frac{\tilde{p}(\boldsymbol{x})}{k q(\boldsymbol{x})} \, q(\boldsymbol{x}) \, d\boldsymbol{x} = \frac{Z}{k}$$
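A minimal sketch of the accept/reject loop, assuming an unnormalized standard normal target $\tilde{p}(x) = e^{-x^2/2}$ and a Laplace(0, 1) proposal; $k = 2e^{1/2}$ is the maximum of $\tilde{p}/q$, attained at $|x| = 1$:

```python
import numpy as np

def rejection_sample(p_tilde, q_sample, q_pdf, k, n, rng):
    """Draw n samples from p = p_tilde / Z, given an envelope k*q(x) >= p_tilde(x)."""
    out = []
    while len(out) < n:
        x = q_sample(rng)                   # x* ~ q(x)
        u = rng.uniform(0.0, k * q_pdf(x))  # uniform height under the envelope at x*
        if u <= p_tilde(x):                 # accept with probability p_tilde / (k*q)
            out.append(x)
    return np.array(out)

rng = np.random.default_rng(0)
p_tilde = lambda x: np.exp(-0.5 * x**2)      # unnormalized N(0, 1)
q_pdf = lambda x: 0.5 * np.exp(-np.abs(x))   # Laplace(0, 1) density
q_sample = lambda rng: rng.laplace(0.0, 1.0)
k = 2.0 * np.exp(0.5)                        # max_x p_tilde(x) / q_pdf(x)
samples = rejection_sample(p_tilde, q_sample, q_pdf, k, 10_000, rng)
print(samples.mean(), samples.var())         # ≈ 0 and ≈ 1
```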
Adaptive rejection sampling

It can be difficult to determine a suitable analytic form for $q$. When $p(x)$ is log-concave, we can instead use envelope functions to define $q$:
- Intersections of tangent lines to $\ln p(x)$ are used to construct $q$.
- Initially, gradients are evaluated at an initial set of grid points, and the corresponding tangent lines are found.
- In each iteration, a sample is drawn from the envelope distribution; since the envelope is a piecewise exponential distribution, drawing a sample from it is straightforward.
- If the sample is rejected, it is incorporated into the set of grid points, a new tangent line is computed, and $q$ is thereby refined.

[Figure: tangent lines to $\ln p(x)$ at grid points $x_1, x_2, x_3$ forming the envelope.]
High-dimensional rejection sampling

Problem: low acceptance rate
- Rejection sampling suffers in high-dimensional spaces: the acceptance rate decreases exponentially with dimensionality.

Example: using $q = \mathcal{N}(\boldsymbol{\mu}, \sigma_q^2 \boldsymbol{I})$ to sample from $p = \mathcal{N}(\boldsymbol{\mu}, \sigma_p^2 \boldsymbol{I})$:
- The optimal constant is $k = (\sigma_q/\sigma_p)^d$.
- If $\sigma_q$ exceeds $\sigma_p$ by just 1% and $d = 1000$, then $k \approx 20{,}000$, so the optimal acceptance rate is $1/20{,}000$, which is far too small.
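The numbers on this slide can be reproduced directly (a sketch of the arithmetic, not of rejection sampling itself):

```python
sigma_ratio, d = 1.01, 1000
k = sigma_ratio ** d  # optimal envelope constant (sigma_q / sigma_p)^d
print(f"k = {k:,.0f}, optimal acceptance rate = {1 / k:.1e}")
# k ≈ 20,959, acceptance rate ≈ 4.8e-05
```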
Importance sampling

Suppose sampling from $p$ is hard, so a simpler proposal distribution $q$ is used instead. If $q$ dominates $p$ (i.e., $q(\boldsymbol{x}) > 0$ whenever $p(\boldsymbol{x}) > 0$), we can sample from $q$ and reweight the obtained samples:
$$\mathbb{E}_p[f(\boldsymbol{x})] = \int f(\boldsymbol{x}) \, p(\boldsymbol{x}) \, d\boldsymbol{x} = \int f(\boldsymbol{x}) \, \frac{p(\boldsymbol{x})}{q(\boldsymbol{x})} \, q(\boldsymbol{x}) \, d\boldsymbol{x}$$
$$\mathbb{E}_p[f(\boldsymbol{x})] \approx \frac{1}{M} \sum_{m=1}^{M} f(\boldsymbol{x}^{(m)}) \, w^{(m)}, \qquad \boldsymbol{x}^{(m)} \sim q(\boldsymbol{x}), \quad w^{(m)} = \frac{p(\boldsymbol{x}^{(m)})}{q(\boldsymbol{x}^{(m)})}$$
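A minimal sketch of this reweighting with fully known densities; the target $\mathcal{N}(0,1)$, proposal $\mathcal{N}(0,4)$, and $f(x) = x^2$ are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 2.0, size=100_000)    # x^(m) ~ q = N(0, 4)

# w^(m) = p(x)/q(x) for p = N(0,1), q = N(0,4); the normalizers contribute the factor 2
w = 2.0 * np.exp(-0.5 * x**2 + 0.125 * x**2)
print(np.mean(x**2 * w))                  # ≈ E_p[x^2] = 1
```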
Normalized importance sampling

Suppose that we can only evaluate $\tilde{p}(\boldsymbol{x})$, where $p(\boldsymbol{x}) = \tilde{p}(\boldsymbol{x})/Z_p$, and likewise $q(\boldsymbol{x}) = \tilde{q}(\boldsymbol{x})/Z_q$:
$$\mathbb{E}_p[f(\boldsymbol{x})] = \int f(\boldsymbol{x}) \, p(\boldsymbol{x}) \, d\boldsymbol{x} = \frac{Z_q}{Z_p} \int f(\boldsymbol{x}) \, r(\boldsymbol{x}) \, q(\boldsymbol{x}) \, d\boldsymbol{x}, \qquad r(\boldsymbol{x}) = \frac{\tilde{p}(\boldsymbol{x})}{\tilde{q}(\boldsymbol{x})}$$

The unknown ratio of normalizers can be estimated from the same samples:
$$\frac{Z_p}{Z_q} = \frac{1}{Z_q} \int \tilde{p}(\boldsymbol{x}) \, d\boldsymbol{x} = \int r(\boldsymbol{x}) \, q(\boldsymbol{x}) \, d\boldsymbol{x}$$

Therefore:
$$\mathbb{E}_p[f(\boldsymbol{x})] = \frac{\int f(\boldsymbol{x}) \, r(\boldsymbol{x}) \, q(\boldsymbol{x}) \, d\boldsymbol{x}}{\int r(\boldsymbol{x}) \, q(\boldsymbol{x}) \, d\boldsymbol{x}} \approx \sum_{m=1}^{M} f(\boldsymbol{x}^{(m)}) \, w^{(m)}, \qquad \boldsymbol{x}^{(m)} \sim q, \quad w^{(m)} = \frac{r^{(m)}}{\sum_{m'=1}^{M} r^{(m')}}$$
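The same estimate when only $\tilde{p}$ is available; a sketch under the same illustrative target and proposal as before, where self-normalizing the weights cancels the unknown $Z_p$:

```python
import numpy as np

rng = np.random.default_rng(0)
p_tilde = lambda x: np.exp(-0.5 * x**2)  # unnormalized target; Z_p is never needed
q_pdf = lambda x: np.exp(-x**2 / 8.0) / (2.0 * np.sqrt(2.0 * np.pi))  # q = N(0, 4)

x = rng.normal(0.0, 2.0, size=100_000)   # x^(m) ~ q
r = p_tilde(x) / q_pdf(x)                # unnormalized ratios r^(m)
w = r / r.sum()                          # self-normalized weights w^(m)
print(np.sum(w * x**2))                  # ≈ E_p[x^2] = 1 despite unknown Z_p
```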
Importance sampling: problem

Importance sampling depends on how well $q$ matches $p$:
- For mismatched distributions, the weight sum may be dominated by a few samples with large weights, the remaining weights being relatively insignificant.
- It is common that $p(\boldsymbol{x}) f(\boldsymbol{x})$ is strongly varying and has a significant proportion of its mass concentrated in a small region.
- The problem is severe if none of the samples falls in the regions where $p(\boldsymbol{x}) f(\boldsymbol{x})$ is large: the estimate of the expectation may be severely wrong, while the variance of the $r^{(m)}$ can still be small.

A key requirement for $q(\boldsymbol{x})$ is that it should not be small or zero in regions where $p(\boldsymbol{x})$ may be significant.

[Figure: $p(x)$, $q(x)$, and $f(x)$, illustrating a proposal that misses the region where $pf$ is large. From the Bishop book.]
Sampling methods for graphical models

- DGMs: forward (or ancestral) sampling; likelihood-weighted sampling.
- For UGMs, there is no one-pass sampling strategy that can sample even from the prior distribution with no observed variables. Instead, computationally more expensive techniques such as Gibbs sampling are needed; these will be introduced in the next slides.
Sampling the joint distribution represented by a BN

Sample the joint distribution by ancestral sampling. Example:
- Sample from $P(D)$ ⇒ $D = d^1$
- Sample from $P(I)$ ⇒ $I = i^0$
- Sample from $P(G \mid i^0, d^1)$ ⇒ $G = g^3$
- Sample from $P(S \mid i^0)$ ⇒ $S = s^0$
- Sample from $P(L \mid g^3)$ ⇒ $L = l^0$

One sample $(d^1, i^0, g^3, s^0, l^0)$ was generated.
Forward sampling in a BN

Given a BN and the number of samples $M$ (see the sketch below):
- Choose a topological ordering of the variables, e.g., $X_1, \dots, X_n$
- For $m = 1$ to $M$:
  - For $i = 1$ to $n$: sample $x_i^{(m)}$ from the distribution $P(X_i \mid \boldsymbol{x}_{Pa(X_i)}^{(m)})$
  - Add $\{x_1^{(m)}, \dots, x_n^{(m)}\}$ to the sample set
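A sketch of this procedure on the example network from the previous slide; the CPD numbers below are hypothetical placeholders, not values from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_sample(rng):
    """One ancestral sample: parents (D, I) before G, then S and L."""
    d = int(rng.random() < 0.4)                       # P(d1), hypothetical
    i = int(rng.random() < 0.3)                       # P(i1), hypothetical
    g_cpd = {(0, 0): [0.3, 0.4, 0.3], (0, 1): [0.05, 0.25, 0.7],
             (1, 0): [0.9, 0.08, 0.02], (1, 1): [0.5, 0.3, 0.2]}
    g = rng.choice([1, 2, 3], p=g_cpd[(i, d)])        # P(G | I, D)
    s = int(rng.random() < (0.8 if i else 0.05))      # P(s1 | I)
    l = int(rng.random() < {1: 0.9, 2: 0.6, 3: 0.01}[g])  # P(l1 | G)
    return {"D": d, "I": i, "G": int(g), "S": s, "L": l}

samples = [forward_sample(rng) for _ in range(10_000)]
```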
Sampling for conditional probability query

$P(i^1 \mid l^0, s^0) = ?$

Looking at the samples, we can count:
- $M$: total number of samples
- $M_e$: number of samples in which the evidence holds ($L = l^0$, $S = s^0$)
- $M_I$: number of samples in which the joint event is true ($L = l^0$, $S = s^0$, $I = i^1$)

For a large enough $M$:
- $M_e / M \approx P(l^0, s^0)$
- $M_I / M \approx P(i^1, l^0, s^0)$

And so we can set:
$$P(i^1 \mid l^0, s^0) = \frac{P(i^1, l^0, s^0)}{P(l^0, s^0)} \approx \frac{M_I}{M_e}$$
Using rejection sampling to compute $P(\boldsymbol{X} \mid \boldsymbol{e})$

Given a BN, a query $P(\boldsymbol{X} \mid \boldsymbol{e})$, and the number of samples $M$:
- Choose a topological ordering of the variables, e.g., $X_1, \dots, X_n$
- $m = 1$
- While $m \le M$:
  - For $i = 1$ to $n$: sample $x_i^{(m)}$ from the distribution $P(X_i \mid \boldsymbol{x}_{Pa(X_i)}^{(m)})$
  - If $\{x_1^{(m)}, \dots, x_n^{(m)}\}$ is consistent with the evidence $\boldsymbol{e}$, add it to the sample set and set $m = m + 1$
- Use the samples to compute $P(\boldsymbol{X} \mid \boldsymbol{e})$ as on the previous slide (a code sketch follows below)
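A sketch of this rejection loop for the query $P(i^1 \mid l^0, s^0)$, reusing the hypothetical forward_sample from the earlier sketch:

```python
def estimate_conditional(n_accepted, rng):
    """Estimate P(i1 | L = l0, S = s0): keep only evidence-consistent samples."""
    kept = []
    while len(kept) < n_accepted:        # collect M_e accepted samples
        s = forward_sample(rng)
        if s["L"] == 0 and s["S"] == 0:  # evidence: L = l0, S = s0
            kept.append(s["I"])
    return sum(kept) / n_accepted        # M_I / M_e

print(estimate_conditional(5_000, rng))
```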