Monte Carlo

— Monte Carlo basics and motivation
— Rejection sampling
— Importance sampling
— Next time: Markov chain Monte Carlo

Iain Murray
http://iainmurray.net/


Monte Carlo and Insomnia

Enrico Fermi (1901–1954) took great delight in astonishing his colleagues
with his remarkably accurate predictions of experimental results … he
revealed that his “guesses” were really derived from the statistical
sampling techniques that he used to calculate with whenever insomnia
struck in the wee morning hours!

— “The beginning of the Monte Carlo method”, N. Metropolis


Linear Regression: Prior

[Figure: functions drawn from the prior P(θ)]

Input → output mappings considered plausible before seeing data.


Linear Regression: Posterior

[Figure: functions drawn from the posterior, P(θ | Data) ∝ P(Data | θ) P(θ)]

Posterior much more compact than prior.
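For concreteness, here is a minimal Octave sketch of the kind of model behind plots like these: Bayesian linear regression with a Gaussian prior on the weights and Gaussian observation noise. All the numbers (prior scale sigma_w, noise scale sigma_n, the toy data) are assumed for illustration; they are not taken from the slides.

    % Minimal sketch: draws from the prior and posterior of Bayesian
    % linear regression, y = Phi*theta + Gaussian noise. All constants
    % and data below are assumed for illustration.
    sigma_w = 2;  sigma_n = 0.1;                 % assumed prior/noise std devs
    x = [-1; 0.5; 1.2];  y = [-0.8; 0.3; 1.1];   % made-up observations
    Phi = [ones(size(x)) x];                     % basis: bias + slope
    xg = linspace(-2, 2, 100)';
    Phig = [ones(size(xg)) xg];

    theta_prior = sigma_w * randn(2, 12);        % 12 draws: theta ~ N(0, sigma_w^2 I)
    plot(xg, Phig*theta_prior, 'b'); hold on;

    V = inv(eye(2)/sigma_w^2 + Phi'*Phi/sigma_n^2);  % posterior covariance
    m = V*Phi'*y/sigma_n^2;                          % posterior mean
    theta_post = m + chol(V)'*randn(2, 12);      % 12 draws from N(m, V)
    plot(xg, Phig*theta_post, 'r');
    plot(x, y, 'k+', 'markersize', 10);

The prior draws fan out across the whole plot; the posterior draws collapse onto the few lines consistent with the data.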
Model mismatch

What will Bayesian linear regression do?

[Figure: curved data with draws from the posterior over linear fits —
a non-linear error envelope, but every possible explanation is linear]


Quiz: ‘Underfitting’

Given a (wrong) linear assumption, which explanations are typical of the
posterior distribution?

[Figure: panels A, B and C, three sets of candidate lines through the
same curved data]

D  All of the above
E  None of the above
Z  Not sure

Posterior very certain despite blatant misfit. Prior ruled out truth.
Microsoft Kinect (Shotton et al., 2011)

— Eyeball modelling assumptions
— Generate training data
— Random forest applied to fantasies

[Figure: depth image x* mapped to body-part predictions p(y* | x*, D)]


The need for integrals

p(y^* \mid x^*, \mathcal{D}) = \int \mathrm{d}\theta \; p(y^*, \theta \mid x^*, \mathcal{D})
                             = \int \mathrm{d}\theta \; p(y^* \mid x^*, \theta) \, p(\theta \mid \mathcal{D})

(Crossed out on the slide: y* is independent of D given θ, and θ does not
depend on the test input x*.)


A statistical problem

What is the average height of the people in this room?
Method: measure our heights, add them up and divide by N.

What is the average height f of people p in Edinburgh E?

\mathbb{E}_{p \in \mathcal{E}}[f(p)] \equiv \frac{1}{|\mathcal{E}|} \sum_{p \in \mathcal{E}} f(p) \qquad \text{“intractable”?}

\approx \frac{1}{S} \sum_{s=1}^{S} f(p^{(s)}), \qquad \text{for a random survey of } S \text{ people } \{p^{(s)}\} \in \mathcal{E}

Surveying works for large and notionally infinite populations.


Simple Monte Carlo

\int f(x) \, P(x) \, \mathrm{d}x \approx \frac{1}{S} \sum_{s=1}^{S} f(x^{(s)}), \qquad x^{(s)} \sim P(x)

Example: making predictions

P(y^* \mid x^*, \mathcal{D}) = \int P(y^* \mid x^*, \theta) \, p(\theta \mid \mathcal{D}) \, \mathrm{d}\theta
                             \approx \frac{1}{S} \sum_{s=1}^{S} P(y^* \mid x^*, \theta^{(s)}), \qquad \theta^{(s)} \sim p(\theta \mid \mathcal{D})

Many other integrals appear throughout statistical machine learning.
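A minimal Octave sketch of the simple Monte Carlo estimator, using an assumed toy integrand f(x) = x² under P(x) = N(0, 1), chosen because the exact expectation is 1 (the example is mine, not the slide's):

    % Simple Monte Carlo: approximate an integral by a sample average.
    % Toy choice (assumed): f(x) = x.^2, P(x) = N(0,1), so the truth is 1.
    S = 1e5;
    x = randn(S, 1);        % x^(s) ~ P(x)
    f_hat = mean(x.^2)      % approximates \int f(x) P(x) dx = E[x^2] = 1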
Properties of Monte Carlo

Estimator:

\int f(x) \, P(x) \, \mathrm{d}x \approx \hat{f} \equiv \frac{1}{S} \sum_{s=1}^{S} f(x^{(s)}), \qquad x^{(s)} \sim P(x)

Estimator is unbiased:

\mathbb{E}_{P(\{x^{(s)}\})}\big[\hat{f}\big] = \frac{1}{S} \sum_{s=1}^{S} \mathbb{E}_{P(x)}[f(x)] = \mathbb{E}_{P(x)}[f(x)]

Variance shrinks ∝ 1/S:

\mathrm{var}_{P(\{x^{(s)}\})}\big[\hat{f}\big] = \frac{1}{S^2} \sum_{s=1}^{S} \mathrm{var}_{P(x)}[f(x)] = \mathrm{var}_{P(x)}[f(x)] / S

“Error bars” shrink like 1/\sqrt{S}.


Aside: don’t always sample!

“Monte Carlo is an extremely bad method; it should be used only when all
alternative methods are worse.”
— Alan Sokal, 1996


A dumb approximation of π

P(x, y) = \begin{cases} 1 & 0 < x < 1 \text{ and } 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}

\pi = 4 \iint \mathbb{I}\big((x^2 + y^2) < 1\big) \, P(x, y) \, \mathrm{d}x \, \mathrm{d}y

octave:1> S=12; a=rand(S,2); 4*mean(sum(a.*a,2)<1)
ans = 3.3333
octave:2> S=1e7; a=rand(S,2); 4*mean(sum(a.*a,2)<1)
ans = 3.1418


Alternatives to Monte Carlo

There are other methods of numerical integration!
Example: (nice) 1D integrals are easy:

octave:1> 4 * quadl(@(x) sqrt(1-x.^2), 0, 1, tolerance)

Gives π to 6 dp’s in 108 evaluations, machine precision in 2598.
(NB Matlab’s quadl fails at tolerance=0, but Octave works.)

In higher dimensions sometimes deterministic approximations work:
Variational Bayes, Laplace, … (covered later)
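Returning to the Monte Carlo error bars: a quick empirical check of the 1/√S behaviour, rerunning the π estimator above at several sample sizes (the repeat count of 100 is an arbitrary choice):

    % Rerun the pi estimator many times at each S and look at the spread:
    % the standard deviation should drop ~10x for every 100x increase in S.
    for S = [1e2 1e4 1e6]
      est = zeros(100, 1);
      for r = 1:100
        a = rand(S, 2);
        est(r) = 4*mean(sum(a.*a, 2) < 1);
      end
      printf('S = %7d   std of estimates = %.4f\n', S, std(est));
    end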
Reminder

Want to sample to approximate expectations:

\int f(x) \, P(x) \, \mathrm{d}x \approx \frac{1}{S} \sum_{s=1}^{S} f(x^{(s)}), \qquad x^{(s)} \sim P(x)

How do we get the samples?


Sampling simple distributions

Use library routines for univariate distributions
(and some other special cases).

This book (free online) explains how some of them work:
http://luc.devroye.org/rnbookindex.html


Sampling discrete values

[Figure: the unit interval split into segments with lengths equal to the
outcome probabilities; u = 0.4 falls in the segment for x = b]

u \sim \text{Uniform}[0, 1], \qquad u = 0.4 \;\Rightarrow\; x = b

There are more efficient ways for large numbers of values and samples —
see the Devroye book.


Sampling from densities

How to convert samples from a Uniform[0,1] generator:

h(y) = \int_{-\infty}^{y} p(y') \, \mathrm{d}y'

u \sim \text{Uniform}[0, 1], \qquad y(u) = h^{-1}(u)

[Figure from PRML, Bishop (2006): the cumulative distribution h(y) drawn
above the density p(y)]

Although we can’t always compute and invert h(y).
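When h(y) is available in closed form, the recipe is two lines. An Octave sketch of both tricks, assuming an Exponential(1) target (whose CDF h(y) = 1 − e^{−y} inverts analytically) and made-up discrete probabilities:

    % Inverse-CDF sampling, assuming target p(y) = exp(-y) on y > 0:
    % h(y) = 1 - exp(-y), so y = h^{-1}(u) = -log(1 - u).
    S = 1e5;
    u = rand(S, 1);                   % u ~ Uniform[0,1]
    y = -log(1 - u);                  % y ~ Exponential(1)
    mean(y)                           % close to the true mean, 1

    % Discrete case: find which segment of the cumulative sum u lands in.
    p = [0.2 0.5 0.3];                % assumed probabilities of outcomes 1,2,3
    [counts, xs] = histc(rand(S, 1), [0 cumsum(p)]);  % xs(s) in {1,2,3}
    mean(xs == 2)                     % close to p(2) = 0.5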
Sampling from densities

Draw points uniformly under the curve:

[Figure credit: Ryan P. Adams — points x^(1), …, x^(4) scattered
uniformly under the curve P(x)]

Probability mass to the left of a point ∼ Uniform[0,1]


Rejection sampling

Sampling from π(x) using a tractable q(x):

[Figure: a scaled proposal k q(x) enveloping π(x); uniform points under
k q(x) are kept only if they also fall under π(x)]


Importance sampling

Rewrite the integral as an expectation under a simple distribution Q:

\int f(x) \, P(x) \, \mathrm{d}x = \int f(x) \frac{P(x)}{Q(x)} Q(x) \, \mathrm{d}x
\approx \frac{1}{S} \sum_{s=1}^{S} f(x^{(s)}) \frac{P(x^{(s)})}{Q(x^{(s)})}, \qquad x^{(s)} \sim Q(x)

Simple Monte Carlo applied to any integral.
Unbiased and independent of dimension?


Importance sampling (2)

If we only know P(x) = P^*(x) / Z_P up to a constant:

\int f(x) \, P(x) \, \mathrm{d}x \approx \frac{Z_Q}{Z_P} \frac{1}{S} \sum_{s=1}^{S} f(x^{(s)}) \underbrace{\frac{P^*(x^{(s)})}{Q^*(x^{(s)})}}_{w^{*(s)}}, \qquad x^{(s)} \sim Q(x)

\approx \sum_{s=1}^{S} f(x^{(s)}) \frac{w^{*(s)}}{\sum_{s'} w^{*(s')}} \qquad \text{(substituting the estimate of } Z_P/Z_Q \text{ from the exercise below)}

This estimator is consistent but biased.

Exercise: prove that Z_P / Z_Q \approx \frac{1}{S} \sum_{s} w^{*(s)}
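A runnable Octave sketch of both estimators, assuming a standard normal target and a Cauchy proposal — a pair chosen so the envelope constant and the truth E[x²] = 1 are known; none of these choices are specified on the slides:

    % Rejection sampling: target pi(x) = N(0,1), proposal q(x) = Cauchy.
    % k = 1.6 satisfies k*q(x) >= pi(x) everywhere (the max ratio is ~1.52).
    S = 1e5;
    x = tan(pi*(rand(S,1) - 0.5));            % exact draws from Cauchy q(x)
    q = 1 ./ (pi*(1 + x.^2));                 % proposal density at x
    target = exp(-x.^2/2) / sqrt(2*pi);       % pi(x), fully normalized here
    k = 1.6;
    accept = rand(S,1) < target ./ (k*q);     % keep with prob pi(x)/(k q(x))
    samples = x(accept);
    [mean(samples), var(samples), mean(accept)]   % ~[0, 1, 1/k]

    % Self-normalized importance sampling with the same proposal, using
    % only the unnormalized target P*(x) = exp(-x.^2/2):
    w = exp(-x.^2/2) ./ q;                    % unnormalized weights w*(s)
    f_hat = sum(x.^2 .* w) / sum(w)           % estimate of E[x^2] = 1
    mean(w)                                   % ~sqrt(2*pi) = Z_P/Z_Q, as in the exercise

The heavy-tailed Cauchy proposal keeps the importance weights bounded; the reverse pairing (light-tailed proposal, heavy-tailed target) can give infinite-variance weights.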
Application to large problems

Approximations scale badly with dimensionality.

Example: P(x) = \mathcal{N}(0, \mathbb{I}), \quad Q(x) = \mathcal{N}(0, \sigma^2 \mathbb{I})

Rejection sampling: requires σ ≥ 1. Fraction of proposals accepted = σ^{−D}.

Importance sampling:

\mathrm{Var}[P(x)/Q(x)] = \left( \frac{\sigma^2}{2 - 1/\sigma^2} \right)^{D/2} - 1

Infinite / undefined variance if \sigma \le 1/\sqrt{2}.


Summary so far

• Monte Carlo: approximate expectations with a sample average
• Rejection sampling: draw samples from complex distributions
• Importance sampling: apply Monte Carlo to ‘any’ sum/integral

Next: high-dimensional problems — MCMC
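As an appendix, an empirical check of the σ^{−D} acceptance fraction from the "Application to large problems" slide, for the same Gaussian pair (σ = 1.5 and the sample size are arbitrary choices of mine):

    % Rejection sampling with P = N(0,I), Q = N(0, sigma^2 I), sigma >= 1.
    % The tightest envelope is k = sigma^D, giving per-proposal acceptance
    % probability P(x)/(k Q(x)) = exp(-0.5*(1 - 1/sigma^2)*||x||^2).
    sigma = 1.5;  S = 1e5;
    for D = [1 5 10 20]
      x = sigma * randn(S, D);                      % proposals from Q
      log_r = -0.5*(1 - 1/sigma^2) * sum(x.^2, 2);  % log[P(x)/(k Q(x))]
      accept = rand(S, 1) < exp(log_r);
      printf('D = %2d   accepted: %.5f   sigma^-D: %.5f\n', ...
             D, mean(accept), sigma^-D);
    end

Even for this mild mismatch, the acceptance rate falls from ~2/3 at D = 1 to ~0.0003 at D = 20, which is why high-dimensional problems call for MCMC.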