Introduction to Bayesian computation (cont.)


  1. Introduction to Bayesian computation (cont.)
     Dr. Jarad Niemi
     STAT 544 - Iowa State University
     March 23, 2017

  2. Outline
     - Bayesian computation
     - Adaptive rejection sampling
     - Importance sampling

  3. Adaptive rejection sampling: concavity
     Definition: A function f is concave if f((1 − t)x + ty) ≥ (1 − t)f(x) + tf(y) for any 0 ≤ t ≤ 1.
     [Figure: a concave curve with the chord joining (x, f(x)) and (y, f(y)) lying below it.]

  4. Adaptive rejection sampling: log-concavity
     Definition: A function f(x) is log-concave if log f(x) is concave. A (twice-differentiable) function is log-concave if and only if (log f(x))'' ≤ 0. For example, X ∼ N(0, 1) has a log-concave density since
     $\frac{d^2}{dx^2} \log e^{-x^2/2} = \frac{d^2}{dx^2}\left(-\frac{x^2}{2}\right) = \frac{d}{dx}(-x) = -1 \le 0.$
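     As a quick numerical sanity check (an illustrative sketch, not part of the slides), we can approximate the second derivative of the log standard normal density on a grid and confirm it is never positive:
     # numerical check of log-concavity for the standard normal density
     log_f  <- function(x) dnorm(x, log = TRUE)
     x      <- seq(-5, 5, by = 0.01)
     h      <- 0.01
     second <- (log_f(x + h) - 2 * log_f(x) + log_f(x - h)) / h^2  # central difference
     all(second <= 1e-8)  # TRUE: (log f(x))'' equals -1 everywhere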

  5. Adaptive rejection sampling
     Adaptive rejection sampling can be used for distributions with log-concave densities. It builds a piecewise linear envelope to the log density by evaluating the log density and its derivative at a set of locations and constructing tangent lines.
     [Figure: tangent-line envelope on the log-density scale and the corresponding envelope on the density scale.]

  6. Adaptive rejection sampling: pseudo-algorithm
     1. Choose starting locations θ; call the set Θ.
     2. Construct a piecewise linear envelope log g(θ) to the log density:
        a. calculate log q(θ|y) and (log q(θ|y))';
        b. find the intersections of the tangent lines.
     3. Sample a proposed value θ* from the envelope g(θ):
        a. sample an interval;
        b. sample a truncated (and possibly negated) exponential random variable.
     4. Perform rejection sampling:
        a. sample u ∼ Unif(0, 1);
        b. accept θ* if u ≤ q(θ*|y)/g(θ*).
     5. If rejected, add θ* to Θ and return to step 2.
     A sketch of a single iteration appears below.
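     The following is an illustrative sketch of one iteration for a standard normal target with fixed abscissae; it is not the ars package's internal implementation, and the envelope-update step (adding rejected points to Θ) is omitted.
     # one ARS iteration for a standard normal target (illustrative sketch)
     h  <- function(x) -x^2 / 2          # log target (up to an additive constant)
     hp <- function(x) -x                # derivative of the log target
     xs <- c(-2, -1, 0.5, 2)             # abscissae; bracketing the mode keeps the envelope integrable

     # intersections of consecutive tangent lines define where each tangent forms the upper hull
     j  <- seq_len(length(xs) - 1)
     z  <- (h(xs[j + 1]) - h(xs[j]) - xs[j + 1] * hp(xs[j + 1]) + xs[j] * hp(xs[j])) /
           (hp(xs[j]) - hp(xs[j + 1]))
     breaks <- c(-Inf, z, Inf)

     upper <- function(t) {              # piecewise-linear upper hull of the log target
       k <- findInterval(t, breaks)
       h(xs[k]) + hp(xs[k]) * (t - xs[k])
     }

     # area under exp(tangent_k) on each piece, used to pick a piece proportionally
     areas <- sapply(seq_along(xs), function(k) {
       a <- breaks[k]; b <- breaks[k + 1]; s <- hp(xs[k])
       (exp(h(xs[k]) + s * (b - xs[k])) - exp(h(xs[k]) + s * (a - xs[k]))) / s
     })

     set.seed(3)
     piece <- sample(seq_along(xs), 1, prob = areas)
     a <- breaks[piece]; b <- breaks[piece + 1]; s <- hp(xs[piece]); u <- runif(1)
     theta_star <- if (is.infinite(a)) b + log(u) / s else
       a + log(1 - u + u * exp(s * (b - a))) / s   # inverse-CDF draw from the chosen piece
     accept <- runif(1) <= exp(h(theta_star) - upper(theta_star))
     c(theta_star = theta_star, accept = accept)
     Because the target is log-concave, every tangent line lies above the log density, so exp(h(θ*) − upper(θ*)) is a valid acceptance probability.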

  7. Updating the envelope
     As values are proposed and rejected, the envelope gets updated.
     [Figure: the piecewise linear envelope before and after rejected proposals are added to Θ.]

  8. Adaptive rejection sampling in R
     library(ars)
     x = ars(n=1000, function(x) -x^2/2, function(x) -x)  # log density and its derivative
     hist(x, prob=T, 100)
     curve(dnorm, type='l', add=T)
     [Figure: histogram of the 1,000 draws with the standard normal density overlaid.]

  9. Adaptive rejection sampling summary
     - Can be used with log-concave densities.
     - Makes rejection sampling efficient by updating the envelope.
     There is a vast literature on adaptive rejection sampling. To improve upon the basic idea presented here you can:
     - include a lower bound;
     - avoid calculating derivatives;
     - incorporate a Metropolis step to deal with non-log-concave densities.

  10. Importance sampling
     Notice that
     $E[h(\theta)|y] = \int h(\theta)\, p(\theta|y)\, d\theta = \int \frac{h(\theta)\, p(\theta|y)}{g(\theta)}\, g(\theta)\, d\theta$
     where g(θ) is a proposal distribution, so that we approximate the expectation via
     $E[h(\theta)|y] \approx \frac{1}{S} \sum_{s=1}^{S} w\!\left(\theta^{(s)}\right) h\!\left(\theta^{(s)}\right)$
     where $\theta^{(s)} \overset{iid}{\sim} g(\theta)$ and
     $w\!\left(\theta^{(s)}\right) = \frac{p\!\left(\theta^{(s)}|y\right)}{g\!\left(\theta^{(s)}\right)}$
     is known as the importance weight.
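     As a small illustration of this estimator (a sketch with an assumed target and proposal, not taken from the slides), estimate E[θ] for a fully normalized standard normal target p using a heavier-tailed t3 proposal g:
     # importance sampling with a normalized target: p = N(0,1), proposal g = t_3
     set.seed(1)
     S     <- 1e4
     theta <- rt(S, df = 3)                      # theta^(s) ~ g
     w     <- dnorm(theta) / dt(theta, df = 3)   # importance weights p/g
     mean(w * theta)                             # estimate of E[theta]   (truth: 0)
     mean(w * theta^2)                           # estimate of E[theta^2] (truth: 1)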

  11. Importance sampling with an unnormalized target
     If the target distribution is known only up to a proportionality constant, then
     $E[h(\theta)|y] = \frac{\int h(\theta)\, q(\theta|y)\, d\theta}{\int q(\theta|y)\, d\theta} = \frac{\int \frac{h(\theta)\, q(\theta|y)}{g(\theta)}\, g(\theta)\, d\theta}{\int \frac{q(\theta|y)}{g(\theta)}\, g(\theta)\, d\theta}$
     where g(θ) is a proposal distribution, so that we approximate the expectation via
     $E[h(\theta)|y] \approx \frac{\frac{1}{S}\sum_{s=1}^{S} w\!\left(\theta^{(s)}\right) h\!\left(\theta^{(s)}\right)}{\frac{1}{S}\sum_{s=1}^{S} w\!\left(\theta^{(s)}\right)} = \sum_{s=1}^{S} \tilde w\!\left(\theta^{(s)}\right) h\!\left(\theta^{(s)}\right)$
     where $\theta^{(s)} \overset{iid}{\sim} g(\theta)$ and
     $\tilde w\!\left(\theta^{(s)}\right) = \frac{w\!\left(\theta^{(s)}\right)}{\sum_{j=1}^{S} w\!\left(\theta^{(j)}\right)}$
     is the normalized importance weight.

  12. Example: Normal-Cauchy model
     If Y ∼ N(θ, 1) and θ ∼ Ca(0, 1), then
     $p(\theta|y) \propto e^{-(y-\theta)^2/2}\, \frac{1}{1+\theta^2} \equiv q(\theta|y)$
     for all θ. If we choose a N(y, 1) proposal, we have
     $g(\theta) = \frac{1}{\sqrt{2\pi}}\, e^{-(\theta-y)^2/2}$
     with
     $w(\theta) = \frac{q(\theta|y)}{g(\theta)} = \frac{\sqrt{2\pi}}{1+\theta^2}.$
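     The code on the following slides uses objects theta and weight and functions q and py that are defined in the original source but not shown in this transcript. A plausible reconstruction (the observed value y = 1, the seed, and S = 1000 are assumptions):
     # assumed setup for the Normal-Cauchy example used on the next slides
     set.seed(1)
     y  <- 1
     q  <- function(x, y) exp(-(y - x)^2 / 2) / (1 + x^2)              # unnormalized posterior
     py <- function(y) integrate(function(x) q(x, y), -Inf, Inf)$value # normalizing constant
     S      <- 1000
     theta  <- rnorm(S, mean = y, sd = 1)                    # draws from the N(y, 1) proposal
     weight <- q(theta, y) / dnorm(theta, mean = y, sd = 1)  # importance weights q/g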

  13. Normalized importance weights
     [Figure: normalized importance weights plotted against θ for the N(y, 1) proposal.]

  14. Importance sampling in R
     library(weights)
     sum(weight*theta/sum(weight)) # Estimate mean
     [1] 0.5504221
     wtd.hist(theta, 100, prob=TRUE, weight=weight)
     curve(q(x,y)/py(y), add=TRUE, col="red", lwd=2)
     [Figure: weighted histogram of theta with the true posterior density overlaid in red.]

  15. Resampling
     If an unweighted sample is desired, sample the θ^(s) with replacement with probability equal to the normalized weights $\tilde w(\theta^{(s)})$.
     # resampling
     new_theta = sample(theta, replace=TRUE, prob=weight) # weights are normalized internally
     hist(new_theta, 100, prob=TRUE, main="Unweighted histogram of resampled draws")
     curve(q(x,y)/py(y), add=TRUE)
     [Figure: unweighted histogram of the resampled draws with the true posterior density overlaid.]

  16. Heavy-tailed proposals
     Although any proposal can be used for importance sampling, only proposals with heavy tails relative to the target will be efficient. For example, suppose our target is a standard Cauchy and our proposal is a standard normal; the weights are
     $w\!\left(\theta^{(s)}\right) = \frac{p\!\left(\theta^{(s)}\right)}{g\!\left(\theta^{(s)}\right)} = \frac{\frac{1}{\pi\left(1+(\theta^{(s)})^2\right)}}{\frac{1}{\sqrt{2\pi}}\, e^{-(\theta^{(s)})^2/2}}.$
     For $\theta^{(s)} \overset{iid}{\sim} N(0, 1)$, the weights for the largest $|\theta^{(s)}|$ will dominate the others.
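     A quick check of this behavior (an illustrative sketch, not from the slides): draw from the normal proposal, form the Cauchy/normal weights, and see how much of the total weight the most extreme draws carry.
     # thin-tailed proposal for a heavy-tailed target: a few draws dominate the weights
     set.seed(2)
     theta <- rnorm(1e4)
     w     <- dcauchy(theta) / dnorm(theta)   # importance weights p/g
     w     <- w / sum(w)                      # normalized weights
     max(w)                                   # weight carried by the single most extreme draw
     sum(sort(w, decreasing = TRUE)[1:10])    # share of total weight in the top 10 draws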

  17. Importance weights for a proposal with thin tails
     [Figure: normalized importance weights plotted against θ for the standard normal proposal.]

  18. Effective sample size
     We can get a measure of how efficient the sample is by computing the effective sample size, i.e. how many independent unweighted draws we effectively have:
     $S_{\mathrm{eff}} = \frac{1}{\sum_{s=1}^{S} \tilde w\!\left(\theta^{(s)}\right)^2}$
     # here weight must hold the normalized importance weights (summing to 1)
     (n <- length(weight)) # Number of samples
     [1] 1000
     (ess <- 1/sum(weight^2)) # Effective sample size
     [1] 371.432
     ess/n # Effective sample proportion
     [1] 0.371432

  19. Effective sample size
     library(ggplot2)
     set.seed(5)
     theta = rnorm(1e4)
     lweight = dcauchy(theta, log=TRUE) - dnorm(theta, log=TRUE)  # log importance weights
     cumulative_ess = numeric(length(lweight))
     for (i in 1:length(lweight)) {
       lw = lweight[1:i]
       w = exp(lw - max(lw))  # exponentiate after shifting for numerical stability
       w = w/sum(w)           # normalize
       cumulative_ess[i] = 1/sum(w^2)
     }
     qplot(x=1:length(cumulative_ess), y=cumulative_ess, geom="line") +
       labs(x="Number of samples", y="Effective sample size") +
       theme_bw()
     [Figure: effective sample size plotted against the number of samples.]
