  1. Bayesian Methods for Variable Selection with Applications to High-Dimensional Data. Part 1: Mixture Priors for Linear Settings. Marina Vannucci, Rice University, USA. ABS13-Italy, 06/17-21/2013.

  2. Part 1: Mixture Priors for Linear Settings

     - Linear regression models (univariate and multivariate responses)
     - Extensions to categorical responses and survival outcomes
     - Matlab code
     - Examples from genomics/proteomics
     - Bayesian models for integrative genomics (next part)

  3. Regression Model

     $Y_{n \times 1} = \mathbf{1}\alpha + X_{n \times p}\,\beta_{p \times 1} + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2 I)$

     Introduce a latent vector $\gamma = (\gamma_1, \ldots, \gamma_p)'$ to select variables: $\gamma_j = 1$ if variable $j$ is included in the model, $\gamma_j = 0$ otherwise.

     Specify priors for the model parameters:

     $\beta_j \mid \sigma^2, \gamma_j \sim (1 - \gamma_j)\,\delta_0(\beta_j) + \gamma_j\, N(0, \sigma^2 h_j)$
     $\alpha \mid \sigma^2 \sim N(\alpha_0, h_0 \sigma^2)$
     $\sigma^2 \sim IG(\nu/2, \lambda/2)$
     $p(\gamma) = \prod_{j=1}^{p} w^{\gamma_j} (1 - w)^{1 - \gamma_j},$

     where $\delta_0(\cdot)$ is the Dirac delta function at zero.
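
     To make the mixture prior concrete, here is a minimal Matlab sketch that draws $\beta$ from the spike-and-slab prior above; the dimensions, inclusion probability $w$, and scales $h_j$ are illustrative choices, not values from the slides:

        % Minimal sketch: one draw of beta from the spike-and-slab prior,
        % with illustrative values for p, w, h_j and sigma^2.
        p      = 10;            % number of candidate predictors
        w      = 0.2;           % prior inclusion probability p(gamma_j = 1)
        h      = ones(p, 1);    % prior variance scales h_j
        sigma2 = 1;             % error variance

        gamma = rand(p, 1) < w;            % latent inclusion indicators
        beta  = zeros(p, 1);               % spike: exact zeros (Dirac mass)
        beta(gamma) = sqrt(sigma2 * h(gamma)) .* randn(sum(gamma), 1);  % slab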

  4. Posterior Distribution

     Combine data and prior information into a posterior distribution. Interest centers on

     $p(\gamma \mid Y, X) \propto p(\gamma) \int f(Y \mid X, \alpha, \beta_\gamma, \sigma)\, p(\alpha \mid \sigma)\, p(\beta_\gamma \mid \sigma, \gamma)\, p(\sigma)\, d\alpha\, d\beta_\gamma\, d\sigma$

     which integrates to

     $p(\gamma \mid Y, X) \propto g(\gamma) = p(\gamma)\, |\tilde{X}_\gamma' \tilde{X}_\gamma|^{-1/2}\, (\nu\lambda + S_\gamma^2)^{-(n+\nu)/2}$

     $\tilde{X}_\gamma = \begin{pmatrix} X_\gamma H_\gamma^{1/2} \\ I_{p_\gamma} \end{pmatrix}, \qquad \tilde{Y} = \begin{pmatrix} Y \\ 0 \end{pmatrix}$

     $S_\gamma^2 = \tilde{Y}'\tilde{Y} - \tilde{Y}'\tilde{X}_\gamma (\tilde{X}_\gamma' \tilde{X}_\gamma)^{-1} \tilde{X}_\gamma' \tilde{Y},$

     the residual sum of squares from the least squares regression of $\tilde{Y}$ on $\tilde{X}_\gamma$. Fast updating schemes use Cholesky or QR decompositions with efficient algorithms to remove or add columns.
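
     A minimal Matlab sketch of the log marginal score follows, using the augmented-regression form above with the simplifying assumptions $H_\gamma = cI$ and a Bernoulli($w$) prior on $\gamma$ (the function name, arguments, and defaults are our choices; Y is taken as centered):

        function lg = log_g(Y, X, gamma, c, nu, lambda, w)
        % Log of g(gamma) for the univariate model, assuming H_gamma = c*I
        % and independent Bernoulli(w) priors on the gamma_j.
        % gamma is a logical p-vector of inclusion indicators.
          n   = size(X, 1);
          p   = numel(gamma);
          Xg  = X(:, gamma);                 % selected columns
          pg  = sum(gamma);
          Xt  = [sqrt(c) * Xg; eye(pg)];     % tilde X_gamma = [X_g H^{1/2}; I]
          Yt  = [Y; zeros(pg, 1)];           % tilde Y = [Y; 0]
          G   = Xt' * Xt;
          b   = Xt' * Yt;
          S2  = Yt' * Yt - b' * (G \ b);     % residual SS of tilde Y on tilde X
          lg  = pg*log(w) + (p - pg)*log(1 - w) ...   % log p(gamma)
                - sum(log(diag(chol(G)))) ...         % -(1/2) log |Xt'Xt|
                - 0.5*(n + nu)*log(nu*lambda + S2);
        end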

  5. Model Fitting via MCMC

     With $p$ variables there are $2^p$ different $\gamma$ values. We use Metropolis search as a stochastic search. At each MCMC iteration we generate a candidate $\gamma^{new}$ by randomly choosing one of these moves:

     (i) Add or Delete: randomly choose one of the indices in $\gamma^{old}$ and change its value.
     (ii) Swap: choose independently and at random a 0 and a 1 in $\gamma^{old}$ and switch their values.

     The proposed $\gamma^{new}$ is accepted with probability

     $\min\left\{ \dfrac{p(\gamma^{new} \mid X, Y)}{p(\gamma^{old} \mid X, Y)},\; 1 \right\}.$
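
     A minimal sketch of one Metropolis step in Matlab, reusing the log_g function sketched earlier (the 50/50 split between move types and the fallback to add/delete when a swap is impossible are our choices):

        function gamma = mh_step(gamma, Y, X, c, nu, lambda, w)
        % One Metropolis update of the logical inclusion vector gamma.
          gnew = gamma;
          if rand < 0.5 || all(gamma) || ~any(gamma)
              j = randi(numel(gamma));          % add/delete: flip one index
              gnew(j) = ~gnew(j);
          else
              ones_idx  = find(gamma);          % swap: exchange a 1 and a 0
              zeros_idx = find(~gamma);
              gnew(ones_idx(randi(numel(ones_idx))))   = false;
              gnew(zeros_idx(randi(numel(zeros_idx)))) = true;
          end
          logr = log_g(Y, X, gnew,  c, nu, lambda, w) - ...
                 log_g(Y, X, gamma, c, nu, lambda, w);
          if log(rand) < logr
              gamma = gnew;                     % accept the proposed model
          end
        end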

  6. Posterior Inference

     The stochastic search results in a list of visited models $(\gamma^{(0)}, \gamma^{(1)}, \ldots)$ and their corresponding relative posterior probabilities $p(\gamma^{(0)} \mid X, Y), p(\gamma^{(1)} \mid X, Y), \ldots$

     Select variables in the "best" models, i.e. the $\gamma$'s with highest $p(\gamma \mid X, Y)$, or those with largest marginal posterior probabilities

     $p(\gamma_j = 1 \mid X, Y) = \int p(\gamma_j = 1, \gamma_{(-j)} \mid X, Y)\, d\gamma_{(-j)} \approx \sum_{\gamma^{(t)}:\, \gamma_j = 1} p\left(Y \mid X, \gamma^{(t)}\right) p(\gamma^{(t)})$

     or, more simply, by empirical frequencies in the MCMC output:

     $p(\gamma_j = 1 \mid X, Y) = E(\gamma_j \mid X, Y) \approx \#\{t : \gamma_j^{(t)} = 1\}/T,$

     with $T$ the number of MCMC iterations.
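
     In Matlab, the empirical-frequency estimate is a one-liner; here G is assumed to be a T-by-p logical matrix whose rows are the visited $\gamma$'s (a storage convention we adopt for illustration):

        % Marginal inclusion probabilities as empirical frequencies
        % over the MCMC sample.
        p_incl   = mean(G, 1);            % estimates p(gamma_j = 1 | X, Y)
        selected = find(p_incl > 0.5);    % "median model": keep j with prob > 1/2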

  7. Multivariate Response

     $Y_{n \times q} = \mathbf{1}\alpha' + X_{n \times p} B_{p \times q} + E, \qquad E_i \sim N(0, \Sigma)$

     Variable selection via $\gamma$ as

     $B_j \mid \Sigma \sim (1 - \gamma_j)\, I_0 + \gamma_j\, N(0, h_j \Sigma),$

     with $B_j$ the $j$-th row of $B$ and $I_0$ a vector of point masses at 0. Need to work with matrix-variate distributions (Dawid, 1981):

     $Y - \mathbf{1}\alpha' - XB \sim \mathcal{N}(I_n, \Sigma)$
     $\alpha' - \alpha_0' \sim \mathcal{N}(h_0, \Sigma)$
     $B_\gamma - B_{0\gamma} \sim \mathcal{N}(H_\gamma, \Sigma)$
     $\Sigma \sim IW(\delta, Q),$

     with $IW$ an inverse-Wishart with parameters $\delta$ and $Q$ to be specified.
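
     As a sketch of how one simulates from such a matrix-variate normal (row covariance $H_\gamma$, column covariance $\Sigma$, in the Dawid 1981 parameterization), with illustrative dimensions and covariance values:

        % Mean-zero draw of the selected rows of B from the matrix-variate
        % normal N(H_gamma, Sigma); pg, q, Hg and Sigma are illustrative.
        pg    = 3;  q = 2;
        Hg    = eye(pg);                   % row covariance H_gamma
        Sigma = [1 0.5; 0.5 1];            % column covariance
        Z     = randn(pg, q);              % standard normal matrix
        Bg    = chol(Hg, 'lower') * Z * chol(Sigma, 'upper');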

  8. Posterior Distribution

     Combine data and prior information into a posterior distribution. Interest centers on

     $p(\gamma \mid Y, X) \propto p(\gamma) \int f(Y \mid X, \alpha, B, \Sigma)\, p(\alpha \mid \Sigma)\, p(B \mid \Sigma, \gamma)\, p(\Sigma)\, d\alpha\, dB\, d\Sigma$

     which integrates to

     $p(\gamma \mid Y, X) \propto g(\gamma) = p(\gamma)\, |\tilde{X}_\gamma' \tilde{X}_\gamma|^{-q/2}\, |Q_\gamma|^{-(n+\delta+q-1)/2}$

     $\tilde{X}_\gamma = \begin{pmatrix} X_\gamma H_\gamma^{1/2} \\ I_{p_\gamma} \end{pmatrix}, \qquad \tilde{Y} = \begin{pmatrix} Y \\ 0 \end{pmatrix}$

     $Q_\gamma = Q + \tilde{Y}'\tilde{Y} - \tilde{Y}'\tilde{X}_\gamma (\tilde{X}_\gamma' \tilde{X}_\gamma)^{-1} \tilde{X}_\gamma' \tilde{Y}$

     It can be calculated via QR decomposition (Seber, ch. 10, 1984). Use qrdelete and qrinsert algorithms to remove or add a column.
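
     A small sketch of the fast QR updating used to move between neighboring models; qrdelete and qrinsert are built-in Matlab routines, and the random design below is only for illustration:

        A = randn(8, 3);
        [Q, R] = qr(A);                 % full QR of the current augmented design
        [Q, R] = qrdelete(Q, R, 2);     % "delete" move: remove column 2
        x = randn(8, 1);
        [Q, R] = qrinsert(Q, R, 2, x);  % "add" move: insert x as new column 2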

  9. Prediction

     Prediction of a future $Y_f$ given the corresponding $X_f$ can be done:

     - as a posterior weighted average of model predictions (BMA; see the sketch below)

       $p(Y_f \mid X, Y) = \sum_\gamma p(Y_f \mid X, Y, \gamma)\, p(\gamma \mid X, Y)$

       with $p(Y_f \mid X, Y, \gamma)$ a matrix-variate T distribution with mean $X_{f,\gamma} \hat{B}_\gamma$, so that

       $\hat{Y}_f = \sum_\gamma p(\gamma \mid X, Y)\, X_{f,\gamma} \hat{B}_\gamma, \qquad \hat{B}_\gamma = (X_\gamma' X_\gamma + H_\gamma^{-1})^{-1} X_\gamma' Y$

     - as LS or Bayes predictions on single best models
     - as LS or Bayes predictions with "threshold" models (e.g., the "median" model) obtained from estimated marginal probabilities of inclusion.
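
     A minimal sketch of the BMA point prediction, assuming G is a T-by-p matrix of visited $\gamma$'s, pw holds their normalized posterior weights, and $H_\gamma = cI$ as before (all assumptions for illustration, not values from the slides):

        Yhat = zeros(size(Xf, 1), size(Y, 2));
        for t = 1:size(G, 1)
            g    = logical(G(t, :));
            Xg   = X(:, g);
            Bg   = (Xg'*Xg + eye(sum(g))/c) \ (Xg'*Y);  % Bayes estimate B_gamma
            Yhat = Yhat + pw(t) * Xf(:, g) * Bg;        % weight by p(gamma|X,Y)
        end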

  10. Prior Specification

     Priors on $\alpha$ and $\Sigma$ are vague and largely uninformative:

     $\alpha' - \alpha_0' \sim \mathcal{N}(h_0, \Sigma)$, with $\alpha_0 \equiv 0$, $h_0 \to \infty$
     $\Sigma \sim IW(\delta, Q)$, with $\delta = 3$, $Q = kI$

     Choices for $H_\gamma$ (see the sketch below):

     $H_\gamma = c\, (X_\gamma' X_\gamma)^{-1}$ (Zellner g-prior)
     $H_\gamma = c\, \mathrm{diag}(X_\gamma' X_\gamma)^{-1}$
     $H_\gamma = c\, I$

     Choice of $w_j = p(\gamma_j = 1)$: $w_j = w$, $w \sim \mathrm{Beta}(a, b)$ (sparsity). Also, choices that reflect prior information (e.g., gene networks).
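
     The three $H_\gamma$ choices translate directly into Matlab; Xg denotes the selected design matrix and c a user-chosen scale (illustrative names):

        Hg_zellner = c * inv(Xg' * Xg);            % Zellner g-prior
        Hg_diag    = c * diag(1 ./ diag(Xg'*Xg));  % diagonal selection prior
        Hg_iso     = c * eye(size(Xg, 2));         % isotropic / identity prior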

  11. Advantages of the Bayesian Approach

     - Past and collateral information through priors
     - Handles $n \ll p$ settings
     - Rich modeling via Markov chain Monte Carlo (MCMC), even for large $p$
     - Optimal model-averaging prediction
     - Extends to multivariate responses

  12. Main References

     George, E.I. and McCulloch, R.E. (1993). Variable selection via Gibbs sampling. Journal of the American Statistical Association, 88, 881-889.
     George, E.I. and McCulloch, R.E. (1997). Approaches for Bayesian variable selection. Statistica Sinica, 7, 339-373.
     Madigan, D. and York, J. (1995). Bayesian graphical models for discrete data. International Statistical Review, 63, 215-232.
     Brown, P.J., Vannucci, M. and Fearn, T. (1998). Multivariate Bayesian variable selection and prediction. Journal of the Royal Statistical Society, Series B, 60, 627-641.
     Brown, P.J., Vannucci, M. and Fearn, T. (2002). Bayes model averaging with selection of regressors. Journal of the Royal Statistical Society, Series B, 64(3), 519-536.

  13. Additional References

     Use of g-priors: Liang, F., Paulo, R., Molina, G., Clyde, M. and Berger, J. (2008). Mixtures of g priors for Bayesian variable selection. Journal of the American Statistical Association, 103, 410-423.

     Improving MCMC mixing: Bottolo, L. and Richardson, S. (2010). Evolutionary stochastic search for Bayesian model exploration. Bayesian Analysis, 5(3), 583-618. The authors propose an evolutionary Monte Carlo scheme combined with a parallel tempering approach that prevents the chain from getting stuck in local modes.

     Multiplicity: Scott, J. and Berger, J. (2010). Bayes and empirical-Bayes multiplicity adjustment in the variable-selection problem. The Annals of Statistics, 38(5), 2587-2619. The marginal prior on $\gamma$ contains a non-linear penalty which is a function of $p$; therefore, as $p$ grows with the number of true variables remaining fixed, the posterior distribution of $w$ concentrates near 0.

  14. Code from my Website

     bvsme fast: Bayesian Variable Selection with fast form of QR updating

     - Metropolis search
     - g-prior or diagonal and non-diagonal selection priors
     - Bernoulli priors or Beta-Binomial prior
     - Predictions by LS, BMA and BMA with selection

     http://stat.rice.edu/~marina
