Introduction to Bayesian Computation


  1. Introduction to Bayesian Computation. Dr. Jarad Niemi, STAT 544, Iowa State University. March 26, 2019.

  2. Bayesian computation

     Goals:
     $E_{\theta|y}[h(\theta) \mid y] = \int h(\theta) \, p(\theta \mid y) \, d\theta$
     $p(y) = \int p(y \mid \theta) \, p(\theta) \, d\theta = E_\theta[p(y \mid \theta)]$

     Approaches: deterministic approximation and Monte Carlo approximation, covering theoretical justification, gridding, inverse CDF, and accept-reject.

  3. Numerical integration

     Deterministic methods, where
     $E[h(\theta) \mid y] = \int h(\theta) \, p(\theta \mid y) \, d\theta \approx \sum_{s=1}^S w_s \, h(\theta^{(s)}) \, p(\theta^{(s)} \mid y)$
     and the $\theta^{(s)}$ are selected points, $w_s$ is the weight given to the point $\theta^{(s)}$, and the error can be bounded.

     Monte Carlo (simulation) methods, where
     $E[h(\theta) \mid y] = \int h(\theta) \, p(\theta \mid y) \, d\theta \approx \frac{1}{S} \sum_{s=1}^S w_s \, h(\theta^{(s)})$
     and $\theta^{(s)} \overset{ind}{\sim} g(\theta)$ for some proposal distribution $g$, $w_s = p(\theta^{(s)} \mid y) / g(\theta^{(s)})$, and we have the SLLN and CLT.
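
     To make the contrast concrete, here is a minimal R sketch (my illustration, not from the slides) approximating $E[\theta^2] = 1$ for a $N(0,1)$ target both ways: a weighted grid sum with equal weights, and a Monte Carlo average using the target itself as the proposal so that all $w_s = 1$. The grid range, spacing, and sample size are arbitrary choices.

         # Deterministic grid approximation of E[h(theta)] for theta ~ N(0,1)
         h    = function(theta) theta^2
         w    = 0.01                     # equal grid weights (spacing)
         grid = seq(-6, 6, by = w)       # selected points theta^(s)
         sum(w * h(grid) * dnorm(grid))  # approx 1, error bounded by the grid

         # Monte Carlo approximation with proposal g = target (all w_s = 1)
         S     = 1e5
         draws = rnorm(S)
         mean(h(draws))                  # approx 1, with CLT-rate error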

  4. Deterministic methods: Normal-Cauchy model

     Let $Y \sim N(\theta, 1)$ with $\theta \sim Ca(0, 1)$. The posterior is
     $p(\theta \mid y) \propto p(y \mid \theta) \, p(\theta) \propto \frac{\exp(-(y - \theta)^2 / 2)}{1 + \theta^2} = q(\theta \mid y)$,
     which is not a known distribution. We might be interested in
     1. normalizing this posterior, i.e. calculating
        $c(y) = \int q(\theta \mid y) \, d\theta$,
     2. or in calculating the posterior mean, i.e.
        $E[\theta \mid y] = \int \theta \, p(\theta \mid y) \, d\theta = \int \frac{\theta \, q(\theta \mid y)}{c(y)} \, d\theta$.

  5. Deterministic methods: Normal-Cauchy marginal likelihood

     y = 1 # Data
     q = function(theta, y, log = FALSE) {
       out = -(y - theta)^2 / 2 - log(1 + theta^2)
       if (log) return(out)
       return(exp(out))
     }

     # Find normalizing constant for q(theta|y)
     w = 0.1
     theta = seq(-5, 5, by = w) + y
     (cy = sum(q(theta, y) * w)) # gridding-based approach
     [1] 1.305608

     integrate(function(x) q(x, y), -Inf, Inf) # numerical integration
     1.305609 with absolute error < 0.00013

  6. Deterministic methods: Normal-Cauchy distribution

     curve(q(x, y) / cy, -5, 5, n = 1001)
     points(theta, rep(0, length(theta)), cex = 0.5, pch = 19)
     segments(theta, 0, theta, q(theta, y) / cy)

     [Figure: the normalized posterior density q(x, y)/cy plotted over x in (-5, 5), with the grid points marked along the x-axis.]

  7. Deterministic methods: Posterior expectation

     $E[h(\theta) \mid y] \approx \sum_{s=1}^S w_s \, h(\theta^{(s)}) \, p(\theta^{(s)} \mid y) = \sum_{s=1}^S w_s \, h(\theta^{(s)}) \, \frac{q(\theta^{(s)} \mid y)}{c(y)}$

     h = function(theta) theta
     sum(w * h(theta) * q(theta, y) / cy)
     [1] 0.5542021
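
     The same grid weights approximate any posterior expectation by swapping in a different $h$. For instance, a sketch (my addition, continuing the objects y, w, theta, q, and cy defined on slide 5) of the posterior variance via $E[\theta^2 \mid y] - E[\theta \mid y]^2$:

         # Posterior variance from the same deterministic grid
         m1 = sum(w * theta   * q(theta, y) / cy) # E[theta   | y]
         m2 = sum(w * theta^2 * q(theta, y) / cy) # E[theta^2 | y]
         m2 - m1^2                                # Var[theta | y]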

  8. Deterministic methods: Posterior expectation as a function of observed data

     [Figure: the posterior expectation plotted against the observed data y, both axes running from -5 to 5, with one curve per prior: Cauchy, normal, and improper uniform.]
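
     A sketch (my construction, not code from the slides) of how the Cauchy-prior curve in this figure could be traced, reusing q from slide 5 and a grid-based expectation at each value of y:

         # Grid-based E[theta | y] under the Cauchy prior, over a range of y
         post_mean = function(y, w = 0.01) {
           theta = seq(-10, 10, by = w) + y # grid centered at y
           qy    = q(theta, y)
           sum(theta * qy) / sum(qy)        # equal weights cancel
         }
         ys = seq(-5, 5, by = 0.1)
         plot(ys, sapply(ys, post_mean), type = "l",
              xlab = "y", ylab = "Posterior expectation")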

  9. Monte Carlo methods: Convergence review

     Three main notions of convergence of a sequence of random variables $X_1, X_2, \ldots$ to a random variable $X$:

     - Convergence in distribution ($X_n \overset{d}{\to} X$): $\lim_{n \to \infty} F_n(x) = F(x)$ at all continuity points $x$ of $F$.
     - Convergence in probability (WLLN, $X_n \overset{p}{\to} X$): $\lim_{n \to \infty} P(|X_n - X| \ge \epsilon) = 0$ for all $\epsilon > 0$.
     - Almost sure convergence (SLLN, $X_n \overset{a.s.}{\to} X$): $P\left( \lim_{n \to \infty} X_n = X \right) = 1$.

     Implications: almost sure convergence implies convergence in probability, and convergence in probability implies convergence in distribution.

     Here, $X_n$ will be our approximation to an integral and $X$ the true (constant) value of that integral, or $X_n$ will be a standardized approximation and $X$ will be $N(0, 1)$.
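
     A small R demonstration (my illustration) of the first case: the running mean of iid Unif(0,1) draws plays the role of $X_n$ and converges to the constant $X = 1/2$.

         # Running mean of Unif(0,1) draws converging to 1/2
         set.seed(1)
         u = runif(1e3)
         running_mean = cumsum(u) / seq_along(u)
         plot(running_mean, type = "l", xlab = "n", ylab = "X_n")
         abline(h = 0.5, lty = 2) # the limit X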

  10. Monte Carlo methods: Monte Carlo integration

      Consider evaluating the integral
      $E[h(\theta)] = \int_\Theta h(\theta) \, p(\theta) \, d\theta$
      using the Monte Carlo estimate
      $\hat{h}_S = \frac{1}{S} \sum_{s=1}^S h(\theta^{(s)})$
      where $\theta^{(s)} \overset{ind}{\sim} p(\theta)$. We know

      - SLLN: $\hat{h}_S \overset{a.s.}{\to} E[h(\theta)]$.
      - CLT: if $h^2$ has finite expectation, then
        $\frac{\hat{h}_S - E[h(\theta)]}{\sqrt{v_S / S}} \overset{d}{\to} N(0, 1)$
        where
        $v_S = \widehat{Var}[h(\theta)] \approx \frac{1}{S} \sum_{s=1}^S \left( h(\theta^{(s)}) - \hat{h}_S \right)^2$
        or any other consistent estimator.
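
      A generic R sketch (my construction, not slide code) that packages this: it returns $\hat{h}_S$ together with a CLT-based 95% interval, using the sample variance as $v_S$.

          # Monte Carlo estimate of E[h(theta)] with a CLT-based 95% interval
          mc_estimate = function(h, sampler, S = 1e4) {
            hs  = h(sampler(S))     # h evaluated at theta^(s) ~ p(theta)
            est = mean(hs)          # hat{h}_S
            se  = sqrt(var(hs) / S) # sqrt(v_S / S)
            c(estimate = est, lower = est - 1.96 * se, upper = est + 1.96 * se)
          }
          mc_estimate(function(theta) theta^2, rnorm) # E[theta^2] = 1 under N(0,1)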

  11. Monte Carlo methods: Definite integral

      Suppose you are interested in evaluating
      $I = \int_0^1 e^{-\theta^2 / 2} \, d\theta$.
      Then set $h(\theta) = e^{-\theta^2 / 2}$ and $p(\theta) = 1$, i.e. $\theta \sim Unif(0, 1)$, and approximate $I$ by a Monte Carlo estimate via
      1. For $s = 1, \ldots, S$,
         a. sample $\theta^{(s)} \overset{ind}{\sim} Unif(0, 1)$ and
         b. calculate $h(\theta^{(s)})$.
      2. Calculate
         $I \approx \frac{1}{S} \sum_{s=1}^S h(\theta^{(s)})$.
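
      A minimal R version of this recipe (my sketch); the true value is available through the normal cdf as $I = \sqrt{2\pi} \, (\Phi(1) - \Phi(0)) \approx 0.856$.

          # Monte Carlo estimate of I = integral of exp(-theta^2/2) over (0,1)
          set.seed(544)
          S     = 1e5
          theta = runif(S)                     # theta^(s) ~ Unif(0,1)
          mean(exp(-theta^2 / 2))              # MC estimate of I
          sqrt(2 * pi) * (pnorm(1) - pnorm(0)) # truth, approx 0.8556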

  12. Monte Carlo methods: Definite integral

      [Figure: "Monte Carlo sampling randomly infills"; two panels plot f(x) against x on (0, 1) for 20 samples and for 200 samples.]

  13. Monte Carlo methods: Definite integral, strong law of large numbers

      [Figure: the running Monte Carlo estimate plotted against the number of samples, 0 to 1000.]

  14. Monte Carlo methods: Definite integral, central limit theorem

      [Figure: the running Monte Carlo estimate against the number of samples, 0 to 1000, with the truth and a 95% CI overlaid.]

  15. Monte Carlo methods: Infinite bounds

      Suppose $\theta \sim N(0, 1)$ and you are interested in evaluating
      $E[\theta] = \int_{-\infty}^{\infty} \theta \, \frac{1}{\sqrt{2\pi}} e^{-\theta^2 / 2} \, d\theta$.
      Then set $h(\theta) = \theta$ and $g(\theta) = \phi(\theta)$, i.e. $\theta \sim N(0, 1)$, and approximate by a Monte Carlo estimate via
      1. For $s = 1, \ldots, S$,
         a. sample $\theta^{(s)} \overset{ind}{\sim} N(0, 1)$ and
         b. calculate $h(\theta^{(s)})$.
      2. Calculate
         $E[\theta] \approx \frac{1}{S} \sum_{s=1}^S h(\theta^{(s)})$.
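
      Since $h(\theta) = \theta$, the recipe reduces to averaging normal draws. A short R sketch (my construction, mirroring the figures on the next slides) of the running estimate with a rough CLT band:

          # Running MC estimate of E[theta] for theta ~ N(0,1); truth is 0
          set.seed(544)
          S     = 1e3
          theta = rnorm(S)                     # theta^(s) ~ N(0,1)
          est   = cumsum(theta) / seq_len(S)   # running estimate
          se    = sd(theta) / sqrt(seq_len(S)) # full-sample sd, for simplicity
          plot(est, type = "l", xlab = "S", ylab = "Estimate")
          lines(est + 1.96 * se, lty = 2)
          lines(est - 1.96 * se, lty = 2)
          abline(h = 0, col = "gray")          # truth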

  16. Monte Carlo methods: Infinite bounds, non-uniform sampling

      [Figure: two panels (n = 20 and n = 100) plotting h(theta) = theta against the sampled locations theta.]

  17. Monte Carlo methods: Infinite bounds, Monte Carlo estimate

      [Figure: the running Monte Carlo estimate against S, 0 to 1000, with the truth and a 95% CI overlaid.]

  18. Monte Carlo methods: Gridding, Monte Carlo approximation via gridding

      Rather than determining $c(y)$ and then $E[\theta \mid y]$ via deterministic gridding (all $w_i$ equal), we can use the grid as a discrete approximation to the posterior, i.e.
      $p(\theta \mid y) \approx \sum_{i=1}^N p_i \, \delta_{\theta_i}(\theta)$ with $p_i = \frac{q(\theta_i \mid y)}{\sum_{j=1}^N q(\theta_j \mid y)}$,
      where $\delta_{\theta_i}(\theta)$ is the Dirac delta function, i.e.
      $\delta_{\theta_i}(\theta) = 0 \; \forall \, \theta \ne \theta_i$ and $\int \delta_{\theta_i}(\theta) \, d\theta = 1$.
      This discrete approximation to $p(\theta \mid y)$ can be used to approximate the expectation $E[h(\theta) \mid y]$ deterministically or via simulation, i.e.
      $E[h(\theta) \mid y] \approx \sum_{i=1}^N p_i \, h(\theta_i)$ or $E[h(\theta) \mid y] \approx \frac{1}{S} \sum_{s=1}^S h(\theta^{(s)})$,
      where $\theta^{(s)} \overset{ind}{\sim} \sum_{i=1}^N p_i \, \delta_{\theta_i}(\theta)$, i.e. the grid points are sampled with replacement.

  19. Monte Carlo methods: Gridding, example: Normal-Cauchy model

      y = 1 # Data

      # Small number of grid locations
      theta = seq(-5, 5, length = 1e2 + 1) + y
      p = q(theta, y) / sum(q(theta, y))
      sum(p * theta)
      [1] 0.5542021
      mean(sample(theta, prob = p, replace = TRUE))
      [1] 0.6118812

      # Large number of grid locations
      theta = seq(-5, 5, length = 1e6 + 1) + y
      p = q(theta, y) / sum(q(theta, y))
      sum(p * theta)
      [1] 0.5542021
      mean(sample(theta, 1e2, prob = p, replace = TRUE)) # But small MC sample
      [1] 0.598394

      # Truth
      post_expectation(1)
      [1] 0.5542021
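
      The helper post_expectation is not defined on these slides. A plausible definition consistent with its output, computing the "truth" by adaptive quadrature as on slide 5, might look like the following (my assumption, not the course's code):

          # Hypothetical post_expectation: E[theta | y] by quadrature
          post_expectation = function(y) {
            num = integrate(function(x) x * q(x, y), -Inf, Inf)$value # int theta q
            den = integrate(function(x)     q(x, y), -Inf, Inf)$value # c(y)
            num / den
          }
          post_expectation(1) # approx 0.5542021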

  20. Monte Carlo methods: Inverse CDF method

      Definition. The cumulative distribution function (cdf) of a random variable $X$ is defined by $F_X(x) = P_X(X \le x)$ for all $x$.

      Lemma. Let $X$ be a random variable whose cdf is $F(x)$, and suppose you have access to the inverse cdf of $X$, i.e.
      $u = F(x) \implies x = F^{-1}(u)$.
      If $U \sim Unif(0, 1)$, then $X = F^{-1}(U)$ is a simulation from the distribution for $X$.
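
      For example, a short R sketch (my illustration) that simulates standard normals by pushing uniforms through the inverse cdf qnorm, matching the cdf shown on the next slide:

          # Inverse CDF sampling: X = F^{-1}(U) with U ~ Unif(0,1)
          set.seed(544)
          u = runif(1e4)
          x = qnorm(u)          # inverse cdf of the standard normal
          c(mean(x), sd(x))     # approx 0 and 1
          # Same idea for Exp(rate = 2): F^{-1}(u) = -log(1 - u) / 2
          mean(-log(1 - u) / 2) # approx 0.5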

  21. Monte Carlo methods: Inverse CDF

      [Figure: the standard normal cdf, pnorm(x), plotted for x in (-3, 3).]
