Scalable MCMC for Bayes Shrinkage Priors

Paulo Orenstein, Stanford University
Joint work with James Johndrow and Anirban Bhattacharya
July 2, 2018
Introduction

◮ Consider the high-dimensional setting: predict a vector y ∈ R^n from a set of features X ∈ R^{n×p}, with p ≫ n.
◮ Assume a sparse Gaussian linear model

    y = Xβ + ε,    ε ∼ N(0, σ² I_n),

  with β_j = 0 for many j.
◮ How can we perform prediction and inference?
  - Lasso, but: it relies on a convex relaxation, and a single parameter controls both sparsity and shrinkage.
  - Point mass mixture prior, but: computation is prohibitive.
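To fix ideas, here is a minimal simulation of this sparse setup. This is a sketch: the sizes n and p and the sparsity level s are my own illustrative choices, not values from the talk.

```python
import numpy as np

# Simulate the sparse Gaussian linear model y = X beta + eps,
# with p >> n and only s nonzero coefficients in beta.
rng = np.random.default_rng(0)
n, p, s = 100, 1000, 10            # illustrative sizes, p >> n

X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:s] = 3.0 * rng.standard_normal(s)   # a few large signals
sigma = 1.0
y = X @ beta + sigma * rng.standard_normal(n)
```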
◮ Can we find a continuous prior that behaves like the point mass mixture prior?
◮ Desiderata:
  - adaptive to sparsity
  - easy to compute
  - good predictive performance
  - good frequentist properties
  - a decent compromise between statistical and computational goals
◮ Global-local priors can achieve this (with some qualifications).
◮ But... they are still slow. Lasso handles n ≈ 1,000, p ≈ 1,000,000; global-local priors only n ≈ 1,000, p ≈ 1,000.
Model

◮ The Horseshoe model [Carvalho et al., 2010]:

    y_i | β, λ, τ, σ²   ~ind   N(x_i β, σ²)
    β_j | λ_j, τ        ~ind   N(0, τ² λ_j²)
    λ_j                 ~ind   Cauchy⁺(0, 1)
    τ                   ~      Cauchy⁺(0, 1)
    σ²                  ~      InvGamma(a₀/2, b₀/2)
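A quick way to see why this continuous prior can mimic a point mass mixture is to draw coefficients from it: the half-Cauchy local scales produce both a spike of mass near zero and heavy tails. A minimal sketch, with τ fixed at 1 as an illustrative choice:

```python
import numpy as np

# Draws from the horseshoe prior on a single coefficient (tau = 1).
rng = np.random.default_rng(1)
tau = 1.0
lam = np.abs(rng.standard_cauchy(100_000))        # lambda_j ~ Cauchy+(0, 1)
beta = tau * lam * rng.standard_normal(100_000)   # beta_j | lambda_j ~ N(0, tau^2 lambda_j^2)

print(np.mean(np.abs(beta) < 0.1))   # large mass near zero (the "spike")
print(np.mean(np.abs(beta) > 10))    # non-negligible mass far out (heavy tails)
```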
◮ The Horseshoe has other good frequentist properties.
◮ It achieves the minimax-adaptive risk for squared error loss up to a constant.
◮ Suppose X = I and ‖β‖₀ = s_n. Then [van der Pas et al., 2014]

    sup_{β : ‖β‖₀ ≤ s_n} E_β ‖β̂_HS − β‖₂² ≤ 4σ² s_n log(n/s_n) · (1 + o(1)),

  while, for any estimator β̂, [Donoho et al., 1992] shows

    sup_{β : ‖β‖₀ ≤ s_n} E_β ‖β̂ − β‖₂² ≥ 2σ² s_n log(n/s_n) · (1 + o(1)).
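So the horseshoe's worst-case risk is within a factor of 2 of what any estimator can achieve. As a small numeric illustration (the values of σ, n, and s_n below are my own choices, not from the talk):

```python
import numpy as np

# The upper and lower bounds above, at sigma = 1, n = 1000, s_n = 10.
sigma, n, s_n = 1.0, 1000, 10
rate = s_n * np.log(n / s_n)
print(4 * sigma**2 * rate)   # horseshoe upper bound, ~184.2
print(2 * sigma**2 * rate)   # minimax lower bound,  ~92.1
```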
Computation

◮ State-of-the-art sampler: (i) draw τ | β, σ², λ; (ii) draw (β, σ²) | τ, λ as a block; (iii) slice sampling for λ. But...
◮ We scale the model with two ideas.
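To make step (ii) concrete: when p ≫ n, the β-draw in samplers of this kind is commonly done with the exact O(n²p) Gaussian sampling algorithm of Bhattacharya, Chakraborty, and Mallick (2016), which avoids forming the p × p posterior precision matrix. The sketch below is my own illustration of that routine; whether the talk's sampler uses exactly this implementation is an assumption.

```python
import numpy as np

# Hedged sketch (not necessarily the talk's code): draw
#   beta | y, lambda, tau, sigma^2 ~ N(A^{-1} Phi' alpha, A^{-1}),
# where A = Phi'Phi + D^{-1}, Phi = X / sigma, alpha = y / sigma,
# and D = tau^2 * diag(lambda^2), using only n x n linear algebra.
def sample_beta(X, y, lam, tau, sigma, rng):
    n, p = X.shape
    Phi = X / sigma
    alpha = y / sigma
    d = (tau * lam) ** 2                       # diagonal of D
    u = np.sqrt(d) * rng.standard_normal(p)    # u ~ N(0, D)
    delta = rng.standard_normal(n)             # delta ~ N(0, I_n)
    v = Phi @ u + delta
    # Solve the n x n system (Phi D Phi' + I_n) w = alpha - v.
    M = (Phi * d) @ Phi.T + np.eye(n)
    w = np.linalg.solve(M, alpha - v)
    return u + d * (Phi.T @ w)                 # beta = u + D Phi' w
```

Given such a β-draw, σ² can then be updated from its inverse-gamma conditional, completing the block.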