Approximate inference: Sampling methods (Probabilistic Graphical Models)



  1. Approximate inference: Sampling methods. Probabilistic Graphical Models, Sharif University of Technology, Spring 2018, Soleymani.

  2. Approximate inference
      Approximate inference techniques:
         Deterministic approximation: variational algorithms
         Stochastic simulation / sampling methods

  3. Sampling-based estimation
      Assume $\mathcal{D} = \{x^{(1)}, \dots, x^{(N)}\}$ is a set of i.i.d. samples drawn from the desired distribution $p$.
      For any distribution $p$ and function $f$, we can estimate the expectation $E_p[f]$ by the empirical expectation
        $E_p[f] \approx \frac{1}{N} \sum_{n=1}^{N} f(x^{(n)})$
      Expectations reveal interesting properties of the distribution $p$:
         Mean and variance of $p$
         Probability of events: e.g., we can find $p(x = k)$ by estimating $E_p[f]$ with $f(x) = \mathbb{1}[x = k]$
      In this way, a set of samples gives a stochastic representation of a complex distribution.
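A minimal sketch of the empirical expectation (not from the slides; the standard normal target and the test function are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# i.i.d. samples from the desired distribution p (here p = N(0, 1), chosen arbitrarily)
samples = rng.normal(size=10_000)

# Empirical expectation: E_p[f] ~= (1/N) * sum_n f(x^(n))
f = lambda x: x ** 2
print(f(samples).mean())        # estimates E_p[x^2] = 1 (the variance of p)

# Probability of an event via an indicator: p(x > 1) = E_p[1[x > 1]]
print((samples > 1.0).mean())   # true value is about 0.1587
```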

  4. Bounds on error
      Let $\hat{\theta} = \frac{1}{N} \sum_{n=1}^{N} f(\mathbf{x}^{(n)})$ with $\mathbf{x}^{(n)} \sim p(\mathbf{x})$, and $\theta = E_p[f(\mathbf{x})]$.
      Hoeffding bound (additive bound on the error):
        $P\left(\hat{\theta} \notin [\theta - \epsilon, \theta + \epsilon]\right) \le 2 e^{-2N\epsilon^2}$
      Chernoff bound (multiplicative bound on the error):
        $P\left(\hat{\theta} \notin [\theta(1 - \epsilon), \theta(1 + \epsilon)]\right) \le 2 e^{-N\theta\epsilon^2 / 3}$
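As a worked consequence of the Hoeffding bound (my arithmetic, not on the slide): bounding the failure probability by $\delta$ and solving for $N$ gives $2 e^{-2N\epsilon^2} \le \delta \Rightarrow N \ge \frac{\ln(2/\delta)}{2\epsilon^2}$; e.g., for $\epsilon = 0.01$ and $\delta = 0.05$, we need $N \ge \frac{\ln 40}{2 \times 10^{-4}} \approx 18{,}445$ samples.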

  5. The mean and variance of the estimator
      For samples drawn independently from the distribution $p$:
        $\hat{f} = \frac{1}{N} \sum_{n=1}^{N} f(x^{(n)})$
        $E[\hat{f}] = E[f]$   (the estimator is unbiased)
        $\mathrm{var}[\hat{f}] = \frac{1}{N} E\left[(f - E[f])^2\right]$   (the variance shrinks as $1/N$)
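A small numerical check of both facts (a sketch, with the arbitrary choices $p = N(0,1)$ and $f(x) = x^2$, for which $\mathrm{var}[f] = E[x^4] - E[x^2]^2 = 2$):

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: x ** 2   # var[f] = 2 under N(0, 1)

# Re-run the N-sample estimator many times: its mean stays at E[f] = 1
# (unbiasedness) and its variance shrinks roughly as var[f] / N = 2 / N.
for N in (10, 100, 1000):
    estimates = np.array([f(rng.normal(size=N)).mean() for _ in range(2000)])
    print(N, round(estimates.mean(), 3), round(estimates.var(), 4))
```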

  6. Monte Carlo methods
      Use a set of samples to answer an inference query:
         Expectations can be approximated by sample-based averages.
      Asymptotically exact and easy to apply to arbitrary problems.
      Challenges:
         Drawing samples from many distributions is not trivial.
         Are the gathered samples enough?
         Are all samples useful, or equally useful?

  7. Generating samples from a distribution
      Assume we have an algorithm that generates (pseudo-)random numbers distributed uniformly over $(0,1)$.
      How do we generate samples from other distributions using these uniform numbers? First, we consider simple cases:
         Bernoulli
         Multinomial
         Other standard distributions

  8. Transformation technique
      We intend to generate samples from standard distributions by mapping the values produced by the uniform random number generator so that the mapped samples have the desired distribution.
      Choose a function $f(\cdot)$ such that $y = f(x)$, with $x \sim U(0,1)$, has the desired distribution $p(y)$:
        $p(y) = p(x) \left| \frac{dx}{dy} \right|$
      Since $p(x) = 1$ on $(0,1)$, we have
        $x = \int_{-\infty}^{y} p(y') \, dy'$
      If we define $h(y) \equiv \int_{-\infty}^{y} p(y') \, dy'$, then $y = h^{-1}(x)$.

  9. Transformation technique
      CDF sampling: if $x \sim U(0,1)$ and $h(\cdot)$ is the CDF of $p$, then $h^{-1}(x) \sim p$.
      Since we need to compute and then invert the indefinite integral of $p$, this is feasible only for a limited number of simple distributions.
      Thus, we next look at rejection sampling and importance sampling, which also serve as important components of more general sampling techniques.
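For instance, the exponential distribution has a closed-form inverse CDF, so this technique applies directly; a minimal sketch (the rate $\lambda = 2$ is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)

# Exponential target p(y) = lam * exp(-lam * y) has CDF h(y) = 1 - exp(-lam * y),
# so h^{-1}(x) = -ln(1 - x) / lam maps uniform samples to exponential ones.
lam = 2.0
x = rng.uniform(size=100_000)   # x ~ U(0, 1)
y = -np.log(1.0 - x) / lam      # y ~ Exponential(lam)

print(y.mean(), y.var())        # close to 1/lam = 0.5 and 1/lam^2 = 0.25
```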

  10. Rejection sampling
      Suppose we wish to sample from $p(\mathbf{x}) = \tilde{p}(\mathbf{x}) / Z$:
         $p(\mathbf{x})$ is difficult to sample from, but $\tilde{p}(\mathbf{x})$ is easy to evaluate.
      We choose a simpler (proposal) distribution $q(\mathbf{x})$ that we can sample from more easily, together with a constant $k$ such that $k q(\mathbf{x}) \ge \tilde{p}(\mathbf{x})$ for all $\mathbf{x}$.
      Sample $\mathbf{x}^* \sim q(\mathbf{x})$, and accept $\mathbf{x}^*$ with probability $\frac{\tilde{p}(\mathbf{x}^*)}{k q(\mathbf{x}^*)}$.
      [Figure: the envelope $k q(x)$ lying above $\tilde{p}(x)$, with a proposed point $x^*$.]
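A minimal rejection-sampling sketch (not from the slides): the bimodal unnormalized target, the Gaussian proposal, and the constant $k = 12$ are all illustrative choices, with $k$ picked conservatively so that $k q(x) \ge \tilde{p}(x)$ everywhere:

```python
import numpy as np

rng = np.random.default_rng(0)

# Unnormalized target p~(x): easy to evaluate, awkward to sample directly.
p_tilde = lambda x: np.exp(-0.5 * x**2) + 0.5 * np.exp(-0.5 * (x - 3.0)**2)

# Proposal q(x) = N(0, 3^2): easy to sample, with heavier tails than p~.
sigma_q = 3.0
q = lambda x: np.exp(-0.5 * (x / sigma_q)**2) / (sigma_q * np.sqrt(2.0 * np.pi))
k = 12.0  # chosen so that k * q(x) >= p~(x) for all x

xs = rng.normal(scale=sigma_q, size=200_000)   # x* ~ q
u = rng.uniform(size=xs.size)
accepted = xs[u < p_tilde(xs) / (k * q(xs))]   # accept w.p. p~(x*) / (k q(x*))

print(accepted.size / xs.size)   # acceptance rate, approximately Z / k
```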

  11. Rejection sampling
      Correctness: the density of the accepted samples is
        $\frac{\left[\tilde{p}(\mathbf{x}) / (k q(\mathbf{x}))\right] q(\mathbf{x})}{\int \left[\tilde{p}(\mathbf{x}) / (k q(\mathbf{x}))\right] q(\mathbf{x}) \, d\mathbf{x}} = \frac{\tilde{p}(\mathbf{x})}{\int \tilde{p}(\mathbf{x}) \, d\mathbf{x}} = p(\mathbf{x})$
      Probability of acceptance:
        $p(\text{accept}) = \int \frac{\tilde{p}(\mathbf{x})}{k q(\mathbf{x})} \, q(\mathbf{x}) \, d\mathbf{x} = \frac{\int \tilde{p}(\mathbf{x}) \, d\mathbf{x}}{k} = \frac{Z}{k}$

  12. Adaptive rejection sampling
      It can be difficult to determine a suitable analytic form for $q$.
      When $p(x)$ is log concave, we can instead use envelope functions to define $q$:
         Intersections of tangent lines to $\ln p(x)$ are used to construct the envelope.
         Initially, gradients are evaluated at some initial set of grid points, and the corresponding tangent lines are found.
         In each iteration, a sample is drawn from the envelope distribution; the envelope comprises a piecewise exponential distribution, so drawing a sample from it is straightforward.
         If the sample is rejected, it is incorporated into the set of grid points, a new tangent line is computed, and $q$ is thereby refined.
      [Figure: tangent lines to $\ln p(x)$ at grid points $x_1, x_2, x_3$.]

  13. High-dimensional rejection sampling
      Problem: rejection sampling has a low acceptance rate in high-dimensional spaces; the acceptance rate decreases exponentially with dimensionality.
      Example: using $q = \mathcal{N}(\boldsymbol{\mu}, \sigma_q^2 I)$ to sample from $p = \mathcal{N}(\boldsymbol{\mu}, \sigma_p^2 I)$ in $d$ dimensions.
         The smallest valid constant is $k = (\sigma_q / \sigma_p)^d$, so the optimal acceptance rate is $1/k$.
         If $\sigma_q$ exceeds $\sigma_p$ by just 1% and $d = 1000$, then $k = 1.01^{1000} \approx 20{,}000$, so the optimal acceptance rate is about $1/20{,}000$, which is far too small.

  14. Importance sampling
      Suppose sampling from $p$ is hard, so a simpler proposal distribution $q$ is used instead.
      If $q$ dominates $p$ (i.e., $q(\mathbf{x}) > 0$ whenever $p(\mathbf{x}) > 0$), we can sample from $q$ and reweight the obtained samples:
        $E_p[f(\mathbf{x})] = \int f(\mathbf{x}) \, p(\mathbf{x}) \, d\mathbf{x} = \int f(\mathbf{x}) \, \frac{p(\mathbf{x})}{q(\mathbf{x})} \, q(\mathbf{x}) \, d\mathbf{x}$
        $E_p[f(\mathbf{x})] \approx \frac{1}{N} \sum_{n=1}^{N} f(\mathbf{x}^{(n)}) \, \frac{p(\mathbf{x}^{(n)})}{q(\mathbf{x}^{(n)})} = \frac{1}{N} \sum_{n=1}^{N} w^{(n)} f(\mathbf{x}^{(n)})$
        where $\mathbf{x}^{(n)} \sim q(\mathbf{x})$ and $w^{(n)} = \frac{p(\mathbf{x}^{(n)})}{q(\mathbf{x}^{(n)})}$ are the importance weights.
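A minimal sketch of the reweighting (the target $p = N(0,1)$ and the proposal $q = N(0, 2^2)$ are assumed illustrative choices, both fully normalized here):

```python
import numpy as np

rng = np.random.default_rng(0)

# Target p = N(0, 1) and proposal q = N(0, 2^2), both evaluable densities.
p = lambda x: np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)
q = lambda x: np.exp(-0.5 * (x / 2.0)**2) / (2.0 * np.sqrt(2.0 * np.pi))
f = lambda x: x ** 2

xs = rng.normal(scale=2.0, size=100_000)   # x^(n) ~ q
w = p(xs) / q(xs)                          # importance weights w^(n)
print((w * f(xs)).mean())                  # estimates E_p[x^2] = 1
```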

  15. Normalized importance sampling
      Suppose we can only evaluate $\tilde{p}(\mathbf{x})$, where $p(\mathbf{x}) = \tilde{p}(\mathbf{x}) / Z_p$ (and similarly $q(\mathbf{x}) = \tilde{q}(\mathbf{x}) / Z_q$):
        $E_p[f(\mathbf{x})] = \int f(\mathbf{x}) \, p(\mathbf{x}) \, d\mathbf{x} = \frac{Z_q}{Z_p} \int f(\mathbf{x}) \, \frac{\tilde{p}(\mathbf{x})}{\tilde{q}(\mathbf{x})} \, q(\mathbf{x}) \, d\mathbf{x}$
      The ratio of normalizers can itself be estimated using the ratios $r(\mathbf{x}) = \tilde{p}(\mathbf{x}) / \tilde{q}(\mathbf{x})$:
        $\frac{Z_p}{Z_q} = \frac{1}{Z_q} \int \tilde{p}(\mathbf{x}) \, d\mathbf{x} = \int \frac{\tilde{p}(\mathbf{x})}{\tilde{q}(\mathbf{x})} \, q(\mathbf{x}) \, d\mathbf{x} = \int r(\mathbf{x}) \, q(\mathbf{x}) \, d\mathbf{x}$
      Combining the two:
        $E_p[f(\mathbf{x})] = \frac{\int f(\mathbf{x}) \, r(\mathbf{x}) \, q(\mathbf{x}) \, d\mathbf{x}}{\int r(\mathbf{x}) \, q(\mathbf{x}) \, d\mathbf{x}} \approx \frac{\frac{1}{N} \sum_{n=1}^{N} r^{(n)} f(\mathbf{x}^{(n)})}{\frac{1}{N} \sum_{m=1}^{N} r^{(m)}} = \sum_{n=1}^{N} w^{(n)} f(\mathbf{x}^{(n)})$
        where $\mathbf{x}^{(n)} \sim q(\mathbf{x})$, $r^{(n)} = \tilde{p}(\mathbf{x}^{(n)}) / \tilde{q}(\mathbf{x}^{(n)})$, and $w^{(n)} = \frac{r^{(n)}}{\sum_{m=1}^{N} r^{(m)}}$.
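A sketch of the self-normalized estimator, assuming only an unnormalized target is available: here $\tilde{p}(x) = \exp(-(x-1)^2/2)$, whose normalizer $Z_p = \sqrt{2\pi}$ we pretend not to know, and a normalized proposal $q = N(0, 3^2)$, so $Z_q = 1$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Only the unnormalized target p~ is evaluable; its (unknown) Z_p = sqrt(2*pi).
p_tilde = lambda x: np.exp(-0.5 * (x - 1.0)**2)
# Normalized proposal q = N(0, 3^2), so Z_q = 1.
q = lambda x: np.exp(-0.5 * (x / 3.0)**2) / (3.0 * np.sqrt(2.0 * np.pi))

xs = rng.normal(scale=3.0, size=100_000)   # x^(n) ~ q
r = p_tilde(xs) / q(xs)                    # ratios r^(n) = p~ / q
w = r / r.sum()                            # normalized weights w^(n)

print((w * xs).sum())   # estimates E_p[x] = 1 without ever knowing Z_p
print(r.mean())         # (1/N) * sum r^(n) estimates Z_p / Z_q = sqrt(2*pi) ~ 2.5066
```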

  16. Importance sampling: problem
      Importance sampling depends on how well $q$ matches $p$.
      When the distributions are mismatched, the weights may be dominated by a few samples with large weights, the remaining weights being relatively insignificant.
      It is common that $p(\mathbf{x}) f(\mathbf{x})$ is strongly varying and has a significant proportion of its mass concentrated in a small region.
      The problem is severe if none of the samples falls in a region where $p(\mathbf{x}) f(\mathbf{x})$ is large: the estimate of the expectation may be severely wrong, while the variance of the observed ratios $r^{(n)}$ can still be small.
      A key requirement for $q(\mathbf{x})$ is that it should not be small or zero in regions where $p(\mathbf{x})$ may be significant.
      [Figure: $f(x)$, $p(x)$, and a mismatched $q(x)$; from the Bishop book.]

  17. Sampling methods for graphical models
      Directed graphical models (DGMs):
         Forward (or ancestral) sampling
         Likelihood-weighted sampling
      For undirected graphical models (UGMs), there is no one-pass sampling strategy that can sample even from the prior distribution with no observed variables.
      Instead, computationally more expensive techniques such as Gibbs sampling are used; these will be introduced in later slides.

  18. Sampling the joint distribution represented by a BN
      Sample the joint distribution by ancestral sampling. Example:
         Sample from $P(D) \Rightarrow D = d^1$
         Sample from $P(I) \Rightarrow I = i^0$
         Sample from $P(G \mid i^0, d^1) \Rightarrow G = g^3$
         Sample from $P(S \mid i^0) \Rightarrow S = s^0$
         Sample from $P(L \mid g^3) \Rightarrow L = l^0$
      One sample $(d^1, i^0, g^3, s^0, l^0)$ has been generated.

  19. Forward sampling in a BN
      Given a BN and a number of samples $M$ (see the sketch below):
         Choose a topological ordering of the variables, e.g., $X_1, \dots, X_N$.
         For $i = 1$ to $M$:
            For $j = 1$ to $N$: sample $x_j^{(i)}$ from the distribution $P(X_j \mid \mathbf{x}_{\mathrm{Pa}(X_j)}^{(i)})$, whose parents are already sampled.
            Add $\{x_1^{(i)}, \dots, x_N^{(i)}\}$ to the sample set.
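A minimal forward-sampling sketch for a toy three-variable network $D \rightarrow G \leftarrow I$; the CPD numbers below are invented for illustration and are not the slides' values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy BN with structure D -> G <- I; all CPD entries below are made up.
P_D = [0.6, 0.4]                          # P(D)
P_I = [0.7, 0.3]                          # P(I)
P_G = {(0, 0): [0.30, 0.40, 0.30],        # P(G | I = i, D = d); each row sums to 1
       (0, 1): [0.70, 0.25, 0.05],
       (1, 0): [0.02, 0.08, 0.90],
       (1, 1): [0.20, 0.30, 0.50]}

def forward_sample():
    """Sample the variables in topological order, conditioning on sampled parents."""
    d = rng.choice(2, p=P_D)
    i = rng.choice(2, p=P_I)
    g = rng.choice(3, p=P_G[(i, d)])
    return d, i, g

samples = [forward_sample() for _ in range(50_000)]
```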

  20. Sampling for a conditional probability query
      Example query: $P(i^1 \mid l^0, s^0) = ?$
      Looking at the samples, we can count:
         $M$: total number of samples
         $M_e$: number of samples in which the evidence holds ($L = l^0$, $S = s^0$)
         $M_J$: number of samples in which the joint event is true ($L = l^0$, $S = s^0$, $I = i^1$)
      For a large enough $M$:
         $M_e / M \approx P(l^0, s^0)$
         $M_J / M \approx P(i^1, l^0, s^0)$
      And so we can set
        $P(i^1 \mid l^0, s^0) = \frac{P(i^1, l^0, s^0)}{P(l^0, s^0)} \approx M_J / M_e$
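Continuing the toy sketch above, the counting estimate for an analogous query such as $P(i = 1 \mid g = 2)$ in the invented network looks like this:

```python
# Reuses `samples` from the forward-sampling sketch after slide 19.
M_e = sum(1 for d, i, g in samples if g == 2)              # evidence count
M_J = sum(1 for d, i, g in samples if g == 2 and i == 1)   # joint count
print(M_J / M_e)   # estimates P(i = 1 | g = 2)
```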

  21. Using rejection sampling to compute $P(\mathbf{Y} \mid \mathbf{e})$
      Given a BN, a query $P(\mathbf{Y} \mid \mathbf{e})$, and a number of samples $M$ (see the sketch below):
         Choose a topological ordering of the variables, e.g., $X_1, \dots, X_N$.
         $i = 1$
         While $i \le M$:
            For $j = 1$ to $N$: sample $x_j$ from the distribution $P(X_j \mid \mathbf{x}_{\mathrm{Pa}(X_j)})$.
            If $\{x_1, \dots, x_N\}$ is consistent with the evidence $\mathbf{e}$, add it to the sample set and set $i = i + 1$.
         Use the samples to compute $P(\mathbf{Y} \mid \mathbf{e})$.
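The same toy query as a rejection-sampling loop that keeps drawing until $M$ evidence-consistent samples are collected (again reusing the invented network above):

```python
# Reuses `forward_sample` from the sketch after slide 19.
M, kept = 5_000, []
while len(kept) < M:
    d, i, g = forward_sample()
    if g == 2:                   # consistent with the evidence e: G = 2
        kept.append((d, i, g))
print(sum(i == 1 for d, i, g in kept) / M)   # estimates P(i = 1 | g = 2)
```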
