Stat 451 Lecture Notes 06: Monte Carlo Integration[1][2] Ryan Martin, UIC, www.math.uic.edu/~rgmartin [1] Based on Chapter 6 in Givens & Hoeting, Chapter 23 in Lange, and Chapters 3–4 in Robert & Casella [2] Updated: March 18, 2016 1 / 38
Outline 1 Introduction 2 Basic Monte Carlo 3 Importance sampling 4 Rao–Blackwellization 5 Root-finding and optimization via Monte Carlo 2 / 38
Motivation Sampling and integration are important techniques for a number of statistical inference problems. But often the target distributions are too complicated and/or the integrands are too complex or high-dimensional for these problems to be solved using the basic methods. In this section we will discuss a couple of clever but fairly simple techniques for handling such difficulties, both based on the concept of "importance sampling." Some more powerful techniques, namely Markov chain Monte Carlo (MCMC), will be discussed in the next section. 3 / 38
Notation Let f(x) be a pdf defined on a sample space X. f(x) may be a Bayesian posterior density that's known only up to a proportionality constant. Let ϕ(x) be a function mapping X to R. The goal is to estimate E{ϕ(X)} when X ∼ f(x); i.e., µ_f(ϕ) := E{ϕ(X)} = ∫_X ϕ(x) f(x) dx. The function ϕ(x) can be almost anything; in general, we will only assume that µ_f(|ϕ|) < ∞. g(x) will denote a generic pdf on X, different from f(x). 4 / 38
LLN and CLT The general Monte Carlo method is based on the two most important results of probability theory: the law of large numbers (LLN) and the central limit theorem (CLT). LLN. If µ_f(|ϕ|) < ∞ and X_1, X_2, ... are iid f, then ϕ̄_n := (1/n) ∑_{i=1}^n ϕ(X_i) → µ_f(ϕ), with probability 1. CLT. If µ_f(ϕ²) < ∞ and X_1, X_2, ... are iid f, then √n {ϕ̄_n − µ_f(ϕ)} → N(0, σ²_f(ϕ)), in distribution. Note that CLT requires a finite variance while LLN does not. 5 / 38
Outline 1 Introduction 2 Basic Monte Carlo 3 Importance sampling 4 Rao–Blackwellization 5 Root-finding and optimization via Monte Carlo 6 / 38
Details Assume that we know how to sample from f(x). Let X_1, ..., X_n be an independent sample from f(x). Then LLN states that ϕ̄_n = (1/n) ∑_{i=1}^n ϕ(X_i) should be a good estimate of µ_f(ϕ) provided that n is large enough. It's an unbiased estimate for all n. If µ_f(ϕ²) < ∞, then CLT allows us to construct a confidence interval for µ_f(ϕ) based on our sample. In particular, a 100(1 − α)% CI for µ_f(ϕ) is mean{Y_1, ..., Y_n} ± z⋆_{1−α/2} · sd{Y_1, ..., Y_n} / √n, where Y_i = ϕ(X_i), i = 1, ..., n. 7 / 38
Example Suppose X ∼ Unif(−π, π). The goal is to estimate E{ϕ(X)}, where ϕ(x) = sin(x), which we know to be 0. Take an iid sample of size n = 1000 from the uniform distribution and evaluate Y_i = sin(X_i). Summary statistics for the Y-sample include: mean = −0.0167 and sd = 0.699. Then a 99% confidence interval for µ_f(ϕ) is −0.0167 ± 2.5758 × 0.699/√1000 = [−0.074, 0.040], where 2.5758 = qnorm(0.995). 8 / 38
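A minimal R sketch of this calculation (the seed is arbitrary, so the numbers will not match the slide exactly; a fresh random sample is drawn each run):

set.seed(451)                        # arbitrary seed, for reproducibility only
n <- 1000
x <- runif(n, -pi, pi)               # sample X_1, ..., X_n iid Unif(-pi, pi)
y <- sin(x)                          # Y_i = phi(X_i)
est <- mean(y)                       # basic Monte Carlo estimate of mu_f(phi)
se <- sd(y) / sqrt(n)                # estimated standard error
est + c(-1, 1) * qnorm(0.995) * se   # 99% confidence interval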
Example: p-value of the likelihood ratio test Let X_1, ..., X_n be iid Pois(θ). Goal is to test H_0: θ = θ_0 versus H_1: θ ≠ θ_0. One idea is the likelihood ratio test, but the formula is messy: Λ = L(θ_0)/L(θ̂) = (n X̄ / (n θ_0))^{−n X̄} e^{n X̄ − n θ_0}. Need the null distribution of the likelihood ratio statistic to compute, say, a p-value, but this is not available.[3] Straightforward to get a Monte Carlo p-value. Note that Λ depends only on the sufficient statistic n X̄, which is distributed as Pois(n θ_0) under H_0. [3] Wilks's theorem gives us a large-sample approximation... 9 / 38
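As an illustration, here is a hedged R sketch of the Monte Carlo p-value; the data vector x below is hypothetical, made up purely for this example, and the simulation size is arbitrary:

theta0 <- 2
x <- c(3, 1, 4, 2, 2, 5, 1, 0, 3, 2)   # hypothetical observed sample, n = 10
n <- length(x)
# log of Lambda as a function of s = n * xbar, using the convention 0 * log(0) = 0
log_lambda <- function(s) ifelse(s == 0, 0, -s * log(s / (n * theta0))) + s - n * theta0
obs <- log_lambda(sum(x))              # observed log likelihood ratio
s_sim <- rpois(10000, n * theta0)      # null distribution of n * xbar is Pois(n * theta0)
mean(log_lambda(s_sim) <= obs)         # Monte Carlo p-value: proportion of Lambda's as small as observed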
Remarks Advantages of Monte Carlo: Does not depend on the dimension of the random variables. Basically works for all functions ϕ ( x ). A number of different things can be estimated with the same simulated X i ’s. Disadvantages of Monte Carlo: Can be slow. Need to be able to sample from f ( x ). Error bounds are not as tight as for numerical integration. 10 / 38
Outline 1 Introduction 2 Basic Monte Carlo 3 Importance sampling 4 Rao–Blackwellization 5 Root-finding and optimization via Monte Carlo 11 / 38
Motivation Importance sampling techniques are useful in a number of situations; in particular, when the target distribution f(x) is difficult to sample from, or to reduce the variance of basic Monte Carlo estimates. The next slide gives a simple example of the latter. Importance sampling is the general idea of sampling from a different distribution but weighting the observations to make them look more like a sample from the target. Similar in spirit to SIR... 12 / 38
Motivating example Goal is to estimate the probability that a fair die lands on 1. A basic Monte Carlo estimate X̄ based on n iid Ber(1/6) samples has mean 1/6 and variance 5/(36n). If we change the die to have three 1's and three non-1's, then the event of observing a 1 is more likely. To account for this change, we should weight each 1 observed with this new die by 1/3. So, if Y_i = (1/3) × Ber(1/2), then E(Y_i) = (1/3) · (1/2) = 1/6 and V(Y_i) = (1/3)² · (1/4) = 1/36. So, Ȳ has the same mean as X̄, but much smaller variance: 1/(36n) compared to 5/(36n). 13 / 38
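A quick simulation sketch of this comparison (sample sizes and seed chosen arbitrarily); both estimators should center near 1/6, with the weighted version showing the smaller variance:

set.seed(1)
n <- 100; reps <- 10000
basic <- replicate(reps, mean(rbinom(n, 1, 1/6)))             # fair die: X_i ~ Ber(1/6)
weighted <- replicate(reps, mean((1/3) * rbinom(n, 1, 1/2)))  # loaded die, each 1 weighted by 1/3
c(mean(basic), mean(weighted))   # both approximately 1/6
c(var(basic), var(weighted))     # approximately 5/(36n) versus 1/(36n)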
Details As before, f(x) is the target and g(x) is a generic envelope. Define importance ratios w⋆(x) = f(x)/g(x). Then the key observation is that µ_f(ϕ) = µ_g(ϕ w⋆). This motivates the (modified) Monte Carlo approach: 1 Sample X_1, ..., X_n iid from g(x). 2 Estimate µ_f(ϕ) with n^{−1} ∑_{i=1}^n ϕ(X_i) w⋆(X_i). If f(x) is known only up to a proportionality constant, then use µ_f(ϕ) ≈ ∑_{i=1}^n ϕ(X_i) w(X_i) = ∑_{i=1}^n ϕ(X_i) w⋆(X_i) / ∑_{i=1}^n w⋆(X_i). 14 / 38
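As a rough illustration, a generic importance sampling routine might look like the sketch below; the function name and arguments are made up for this example (phi, f, g are vectorized functions, rg draws from the envelope):

imp_sample <- function(n, phi, f, g, rg, f_normalized = TRUE) {
  x <- rg(n)                  # X_1, ..., X_n iid from the envelope g
  w <- f(x) / g(x)            # importance ratios w*(X_i) = f(X_i) / g(X_i)
  if (f_normalized) {
    mean(phi(x) * w)          # estimate when f is a proper (normalized) pdf
  } else {
    sum(phi(x) * w) / sum(w)  # self-normalized estimate when f is known only up to a constant
  }
}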
Another motivating example Consider estimating ∫_0^1 cos(πx/2) dx, which equals 2/π. Treat as an expectation with respect to X ∼ Unif(0, 1). Can show that V{cos(πX/2)} ≈ 0.095. Consider importance sampling with g(y) = (3/2)(1 − y²). Can show that V{2 cos(πY/2) / (3(1 − Y²))} ≈ 0.00099. So, the importance sampling estimator will have much smaller variance than the basic Monte Carlo estimator! This is the same idea as in the simple dice example. 15 / 38
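A hedged R sketch of this comparison; drawing from g(y) = (3/2)(1 − y²) is done here by simple accept-reject with a Unif(0, 1) proposal (g is bounded by 3/2), which is one convenient choice, not the only one:

set.seed(2)
n <- 10000
basic <- cos(pi * runif(n) / 2)   # phi(X), X ~ Unif(0, 1)
y <- numeric(0)                   # draws from g via accept-reject
while (length(y) < n) {
  cand <- runif(n)
  y <- c(y, cand[runif(n) < (1 - cand^2)])  # accept with probability g(cand) / (3/2)
}
y <- y[1:n]
is_terms <- 2 * cos(pi * y / 2) / (3 * (1 - y^2))  # phi(Y) w*(Y)
c(mean(basic), mean(is_terms), 2 / pi)   # both estimates near 2/pi
c(var(basic), var(is_terms))             # roughly 0.095 versus 0.001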
Example: size and power estimation Suppose we wish to test H_0: λ = 2 vs. H_1: λ > 2 based on a sample of size n = 10 from a Pois(λ) distribution. One would be tempted to use the usual z-test: Z = (X̄ − 2) / √(2/10). Under normal distribution theory, the size α = 0.05 test rejects H_0 if Z ≥ 1.645. But the distribution is not normal, so the actual Type I error probability for this test likely isn't 0.05. Used two approaches: basic Monte Carlo and importance sampling, with g = Pois(2.4653). Respective 95% confidence intervals: (0.0448, 0.0532) and (0.0520, 0.0611). 16 / 38
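A sketch of the two approaches (simulation size and seed are arbitrary; the importance-sampling mean 2.4653 is taken from the slide):

set.seed(3)
n <- 10; reps <- 10000; lam0 <- 2; lam_g <- 2.4653
# basic Monte Carlo: simulate data sets under H0 and count rejections
rej <- replicate(reps, (mean(rpois(n, lam0)) - 2) / sqrt(2 / n) >= 1.645)
mean(rej)
# importance sampling: simulate under g = Pois(2.4653), weight each data set by
# the product of Poisson(2) / Poisson(2.4653) density ratios
is_terms <- replicate(reps, {
  x <- rpois(n, lam_g)
  prod(dpois(x, lam0) / dpois(x, lam_g)) * ((mean(x) - 2) / sqrt(2 / n) >= 1.645)
})
mean(is_terms)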
Remarks The choice of envelope g(x) is important. In particular, the importance ratio w⋆(x) = f(x)/g(x) must be well-behaved, otherwise the variance of the estimate could be too large. A general strategy is to take g(x) to be a heavy-tailed distribution, like Student-t or a mixture thereof. To get an idea of what makes a good proposal, let's consider a practically useless result:[4] "optimal" proposal ∝ |ϕ(x)| f(x). Take-away message: we want the proposal to look like f, but we are less concerned about places where ϕ is near/equal to zero. [4] See Theorem 3.12 in Robert & Casella. 17 / 38
Example: small tail probabilities Goal: estimate P(Z > 4.5), where Z ∼ N(0, 1). Naive strategy: sample Z_1, ..., Z_N iid N(0, 1) and compute N^{−1} ∑_{j=1}^N I{Z_j > 4.5}. However, we won't get many Z_j's that exceed 4.5, so the naive Monte Carlo estimate is likely to be zero. Here, ϕ(z) = I{z > 4.5}, so we don't need to worry about how f and g compare outside the interval [4.5, ∞). Idea: do importance sampling with proposal g as a shifted exponential distribution, i.e., g(z) ∝ e^{−(z − 4.5)}. Comparison: for N = 10000 we have MC = 0, IS = 3.316521e-06, truth = 3.397673e-06. 18 / 38
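A minimal R sketch of this comparison (N and the seed are arbitrary, so the IS value will differ slightly from the slide):

set.seed(4)
N <- 10000
mc <- mean(rnorm(N) > 4.5)       # naive Monte Carlo: almost certainly 0
y <- 4.5 + rexp(N)               # draws from the shifted exponential proposal g
w <- dnorm(y) / dexp(y - 4.5)    # importance ratios f(y) / g(y)
is_est <- mean(w)                # phi(y) = 1 for every draw, since y > 4.5 always
c(MC = mc, IS = is_est, truth = pnorm(4.5, lower.tail = FALSE))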
Is the chosen proposal good? It is possible to use the weights to judge the proposal. For f known exactly, not just up to proportionality, define the effective sample size N̂(f, g) = n / [1 + var{w⋆(X_1), ..., w⋆(X_n)}]. N̂(f, g) is bounded by n and measures approximately how many iid samples the weighted importance samples are worth. N̂(f, g) close to n indicates that g(x) is a good proposal; close to 0 means g(x) is a poor proposal. 19 / 38
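As an illustration of the diagnostic, here is a small sketch with a standard normal target f and a Student-t(3) proposal g, both fully normalized (this target/proposal pair is chosen purely for demonstration):

set.seed(5)
n <- 10000
x <- rt(n, df = 3)               # sample from the proposal g
w <- dnorm(x) / dt(x, df = 3)    # importance ratios w*(X_i)
n / (1 + var(w))                 # effective sample size N-hat(f, g); close to n is good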
Outline 1 Introduction 2 Basic Monte Carlo 3 Importance sampling 4 Rao–Blackwellization 5 Root-finding and optimization via Monte Carlo 20 / 38