Monte Carlo approximation methods
CS 3750 Machine Learning (Advanced Machine Learning), Lecture 5
Milos Hauskrecht, milos@cs.pitt.edu, 5329 Sennott Square


  1. Monte Carlo inference
  • Let us assume we have a probability distribution P(X), represented e.g. using a BBN or an MRF, and we want to calculate P(x) or P(x | e).
  • We can use exact probabilistic inference, but it may be hard to calculate.
  • Monte Carlo approximation:
  – Idea: the probability P(x) is approximated using sample frequencies.
  • Idea (first method):
  – Generate a random sample D of size M from P(X).
  – Estimate P(x) as $\hat{P}_D(x) = \frac{M_{X=x}}{M}$, where $M_{X=x}$ is the number of samples in D with X = x. A minimal sketch of this estimator follows below.
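A minimal Python sketch of this first method, using a toy three-valued distribution as a stand-in for P(X) (the names `estimate_prob` and `sample_x` are illustrative, not from the lecture):

```python
import random

def estimate_prob(sample_fn, event, M=10_000):
    """Estimate P(event) as the fraction of M samples where event(x) holds."""
    hits = sum(1 for _ in range(M) if event(sample_fn()))
    return hits / M

def sample_x():
    # Toy target: X in {0, 1, 2} with P = (0.2, 0.5, 0.3)
    return random.choices([0, 1, 2], weights=[0.2, 0.5, 0.3])[0]

p_hat = estimate_prob(sample_x, event=lambda x: x == 1)
print(p_hat)  # close to 0.5 for large M
```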

  2. Absolute error bound
  • Hoeffding's bound lets us bound the probability with which the estimate $\hat{P}_D(x)$ differs from $P(x)$ by more than $\epsilon$:
  $P\big(\hat{P}_D(x) \notin [P(x) - \epsilon,\; P(x) + \epsilon]\big) \le 2 e^{-2 M \epsilon^2}$
  • The bound can be used to decide how many samples are required to achieve a desired accuracy $\epsilon$ with confidence $1 - \delta$:
  $M \ge \frac{\ln(2/\delta)}{2 \epsilon^2}$
  Relative error bound
  • Chernoff's bound lets us bound the probability of the estimate $\hat{P}_D(x)$ exceeding a relative error $\epsilon$ of the true value $P(x)$:
  $P\big(\hat{P}_D(x) \notin P(x)(1 \pm \epsilon)\big) \le 2 e^{-M P(x) \epsilon^2 / 3}$
  • This leads to the following sample complexity bound:
  $M \ge \frac{3 \ln(2/\delta)}{P(x)\, \epsilon^2}$
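Both bounds translate directly into sample-size calculations; a small sketch (the function names are my own):

```python
import math

def hoeffding_samples(eps, delta):
    """M such that P(|P_hat - P(x)| > eps) <= delta (absolute error)."""
    return math.ceil(math.log(2 / delta) / (2 * eps ** 2))

def chernoff_samples(eps, delta, p_x):
    """M for relative error eps with confidence 1 - delta;
    needs (a lower bound on) the true probability P(x)."""
    return math.ceil(3 * math.log(2 / delta) / (p_x * eps ** 2))

print(hoeffding_samples(eps=0.01, delta=0.05))           # 18445
print(chernoff_samples(eps=0.1, delta=0.05, p_x=0.001))  # rare events cost far more samples
```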

  3. Monte Carlo inference challenges
  Challenge 1: How to generate M (unbiased) samples from the target distribution P(X) or P(X | e)?
  – Generating (unbiased) samples from P(X) or P(X | e) may be hard, or very inefficient.
  Example:
  • Assume we have a distribution over 100 binary variables.
  – There are $2^{100}$ possible configurations of variable values.
  • Trivial sampling solution:
  – Calculate and store the probability of each configuration.
  – Pick a configuration at random according to its probability.
  • Problem: terribly inefficient in both time and memory.
  Monte Carlo inference challenges
  Challenge 2: How to estimate the expected value of f(x) under P(x):
  $E_P[f] = \int p(x) f(x)\, dx$ (continuous case) or $E_P[f] = \sum_x P(x) f(x)$ (discrete case)
  • Generally, we can estimate this expectation by generating samples x[1], …, x[M] from P, and then estimating it as:
  $\hat{E}_P[f] = \hat{f} = \frac{1}{M} \sum_{m=1}^{M} f(x[m])$
  • By the central limit theorem, the estimation error is approximately distributed as $\hat{f} - E_P[f] \sim N\big(0,\; \sigma^2 / M\big)$,
  – where the variance of f(x) is $\sigma^2 = \int p(x)\,[f(x) - E_P[f(x)]]^2\, dx$.
  • Problem: we may be unable to sample P(x) efficiently. What to do? (A sketch of the estimator follows below.)
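A sketch of this sample-average estimator, using a toy continuous target where the true expectation is known (`mc_expectation` is an illustrative name):

```python
import random
import statistics

def mc_expectation(sample_fn, f, M=10_000):
    """Estimate E_P[f(X)] as the average of f over M samples from P."""
    return statistics.fmean(f(sample_fn()) for _ in range(M))

# Toy check: E[X^2] = 1 for X ~ N(0, 1)
est = mc_expectation(lambda: random.gauss(0.0, 1.0), lambda x: x * x)
print(est)  # near 1.0; the error shrinks as O(1/sqrt(M)) by the CLT
```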

  4. Central limit theorem
  • Central limit theorem: let random variables $X_1, X_2, \ldots, X_m$ form a random sample from a distribution with mean $\mu$ and variance $\sigma^2$. Then, if the sample size m is large, approximately
  $\sum_{i=1}^{m} X_i \sim N(m\mu,\; m\sigma^2)$ or $\bar{X} = \frac{1}{m} \sum_{i=1}^{m} X_i \sim N(\mu,\; \sigma^2/m)$
  [Figure: effect of increasing the sample size m on the distribution of the sample mean, apparently for $\mu = 0$, $\sigma^2 = 4$; as m grows from 30 to 50 to 100, the density of $\bar{X}$ concentrates more tightly around the mean.]
  Monte Carlo inference: BBNs
  Challenge 1: How to generate M (unbiased) samples from the target distribution P(X) defined by a BBN?
  • Good news: sample generation for the full joint defined by the BBN is easy.
  – One top-down sweep through the network lets us generate one sample according to P(X).
  – Example (burglary network, variables B, E, A, J, M): samples are generated in a top-down manner, following the links.
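A quick numerical check of the shrinking variance, using μ = 0 and σ² = 4, which I take to be the parameters of the slide's figure (an assumption on my part):

```python
import random
import statistics

def sample_mean(m, mu=0.0, sigma=2.0):
    """One draw of the mean of m iid N(mu, sigma^2) variables (sigma^2 = 4)."""
    return statistics.fmean(random.gauss(mu, sigma) for _ in range(m))

# The empirical spread of the sample mean shrinks like sigma / sqrt(m)
for m in (30, 50, 100):
    means = [sample_mean(m) for _ in range(2_000)]
    print(m, round(statistics.stdev(means), 3), round(2.0 / m ** 0.5, 3))
```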

  5.–8. BBN sampling example
  Conditional probability tables of the burglary network:
  P(B):  B=T 0.001, B=F 0.999
  P(E):  E=T 0.002, E=F 0.998
  P(A | B, E):
  B E | A=T   A=F
  T T | 0.95  0.05
  T F | 0.94  0.06
  F T | 0.29  0.71
  F F | 0.001 0.999
  P(J | A):
  A | J=T  J=F
  T | 0.90 0.10
  F | 0.05 0.95
  P(M | A):
  A | M=T  M=F
  T | 0.70 0.30
  F | 0.01 0.99
  One top-down sweep (the slides step through this draw by draw; here every draw happens to come out F):
  – Sample B from P(B): B = F
  – Sample E from P(E): E = F
  – Sample A from P(A | B=F, E=F): A = F
  – Sample J from P(J | A=F): J = F
  – Sample M from P(M | A=F): M = F
  Resulting sample: (B=F, E=F, A=F, J=F, M=F)
  – Repeat many times to get enough samples. (A runnable sketch of this sweep follows below.)
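A runnable sketch of the top-down sweep over the burglary network, using the CPTs above (the variable and helper names are my own):

```python
import random

# CPTs of the burglary network; each entry is the probability of True
P_B = 0.001
P_E = 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}

def bernoulli(p):
    """Draw True with probability p."""
    return random.random() < p

def sample_joint():
    """One top-down sweep: each variable is drawn after its parents."""
    b = bernoulli(P_B)
    e = bernoulli(P_E)
    a = bernoulli(P_A[(b, e)])
    j = bernoulli(P_J[a])
    m = bernoulli(P_M[a])
    return {"B": b, "E": e, "A": a, "J": j, "M": m}

print(sample_joint())  # usually all False, as in the walkthrough above
```

Because every variable is drawn only after its parents, each call returns an unbiased sample from the full joint P(B, E, A, J, M).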

  9. Monte Carlo inference: BBNs
  Knowing how to generate samples from the full joint efficiently lets us efficiently estimate:
  – joint probabilities over a subset of the variables
  – marginals on individual variables
  • Example (burglary network): the probability is approximated using the sample frequency
  $\tilde{P}(B{=}T, J{=}T) = \frac{N_{B=T, J=T}}{N}$, where $N_{B=T, J=T}$ = # samples with B = T, J = T and N = total # samples.
  • MC approximation of conditional probabilities:
  – The probability can be approximated using sample frequencies.
  – Example:
  $\tilde{P}(B{=}T \mid J{=}T) = \frac{N_{B=T, J=T}}{N_{J=T}}$, where $N_{J=T}$ = # samples with J = T.
  • Solution 1 (rejection sampling):
  – Generate samples from P(X), which we know how to do efficiently.
  • Use only the samples that agree with the condition (J = T); the remaining samples are rejected.
  • Problem: many samples are rejected. What if P(J=T) is very small? (See the sketch below.)
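A sketch of Solution 1, reusing `sample_joint()` from the forward-sampling sketch above; note how few samples survive the conditioning:

```python
# Assumes sample_joint() and the CPTs from the sketch above.
def rejection_estimate(n_samples=200_000):
    """Estimate P(B=T | J=T): keep only samples with J = T, reject the rest."""
    kept = hits = 0
    for _ in range(n_samples):
        s = sample_joint()
        if s["J"]:            # condition J = T; all other samples are rejected
            kept += 1
            hits += s["B"]
    return hits / kept if kept else float("nan"), kept

est, kept = rejection_estimate()
print(est, kept)  # only ~5% of samples survive, since P(J=T) is roughly 0.05
```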

  10. Monte Carlo inference: BBNs
  • MC approximation of conditional probabilities
  • Solution 2 (likelihood weighting):
  – Avoids the inefficiency of rejection sampling.
  – Idea: generate only samples consistent with the evidence (the conditioning event); if a variable's value is set by the evidence, it is not sampled.
  • Problem: using simple counts is no longer enough, since evidence-consistent samples occur with different probabilities.
  • Likelihood weighting:
  – With every sample, keep a weight w with which it should count towards the estimate:
  $\tilde{P}(B{=}T \mid J{=}T) = \frac{\sum_{\text{samples with } B=T \text{ and } J=T} w}{\sum_{\text{samples with any value of } B \text{ and } J=T} w}$
  BBN likelihood weighting example (same CPTs as above): the evidence clamps E = F (set !!!) and J = T (set !!!); only the non-evidence variables are sampled top-down, and each sample is weighted by the probability of the clamped evidence values given their sampled parents. A sketch follows below.
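A sketch of likelihood weighting for the example above (E = F and J = T clamped), reusing the CPTs and the `bernoulli()` helper from the forward-sampling sketch; the weight formula is standard likelihood weighting, not spelled out on the slide:

```python
# Assumes the CPTs and bernoulli() from the forward-sampling sketch above.
def likelihood_weighting(n_samples=100_000):
    """Estimate P(B=T | E=F, J=T): evidence variables are clamped, not sampled."""
    num = den = 0.0
    for _ in range(n_samples):
        e = False                 # evidence E = F: set, not sampled
        b = bernoulli(P_B)        # non-evidence variables are sampled top-down
        a = bernoulli(P_A[(b, e)])
        # evidence J = T: set, not sampled; M does not affect this query
        w = (1 - P_E) * P_J[a]    # weight = P(E=F) * P(J=T | A=a)
        den += w
        if b:
            num += w
    return num / den

print(likelihood_weighting())
```

Every generated sample is consistent with the evidence, so nothing is rejected; the weights correct for the fact that the clamped evidence values would have occurred with different probabilities under different draws of the sampled variables.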
