  1. Bayesian Methods for Variable Selection with Applications to High-Dimensional Data. Part 1: Mixture Priors for Linear Settings. Marina Vannucci, Rice University, USA. ABS13-Italy, 06/17-21/2013.

  2. Part 1: Mixture Priors for Linear Settings

     - Linear regression models (univariate and multivariate responses)
     - Extensions to categorical responses and survival outcomes
     - Matlab code
     - Examples from genomics/proteomics
     - Bayesian models for integrative genomics (next part)

  3. Regression Model

     $Y_{n \times 1} = \mathbf{1}\alpha + X_{n \times p}\,\beta_{p \times 1} + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2 I)$

     Introduce a latent vector $\gamma = (\gamma_1, \ldots, \gamma_p)'$ to select variables: $\gamma_j = 1$ if variable $j$ is included in the model, $\gamma_j = 0$ otherwise.

     Specify priors for the model parameters:

     $\beta_j \mid \sigma^2, \gamma_j \sim (1 - \gamma_j)\,\delta_0(\beta_j) + \gamma_j\, N(0, \sigma^2 h_j)$
     $\alpha \mid \sigma^2 \sim N(\alpha_0, h_0 \sigma^2)$
     $\sigma^2 \sim IG(\nu/2, \lambda/2)$
     $p(\gamma) = \prod_{j=1}^{p} w^{\gamma_j} (1 - w)^{1 - \gamma_j},$

     where $\delta_0(\cdot)$ is the Dirac delta function at zero.
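
     To make the mixture prior concrete, here is a minimal Matlab sketch that draws $\beta$ from the spike-and-slab prior above; the dimensions, inclusion probability $w$, and scales $h_j$ are illustrative choices, not values from the slides:

        % Minimal sketch: one draw of beta from the spike-and-slab prior,
        % with illustrative values for p, w, h_j and sigma^2.
        p      = 10;            % number of candidate predictors
        w      = 0.2;           % prior inclusion probability p(gamma_j = 1)
        h      = ones(p, 1);    % prior variance scales h_j
        sigma2 = 1;             % error variance

        gamma = rand(p, 1) < w;            % latent inclusion indicators
        beta  = zeros(p, 1);               % spike: exact zeros (Dirac mass)
        beta(gamma) = sqrt(sigma2 * h(gamma)) .* randn(sum(gamma), 1);  % slab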

  4. Posterior Distribution

     Combine data and prior information into a posterior distribution. Interest centers on

     $p(\gamma \mid Y, X) \propto p(\gamma) \int f(Y \mid X, \alpha, \beta_\gamma, \sigma)\, p(\alpha \mid \sigma)\, p(\beta_\gamma \mid \sigma, \gamma)\, p(\sigma)\, d\alpha\, d\beta_\gamma\, d\sigma$

     which integrates to

     $p(\gamma \mid Y, X) \propto g(\gamma) = p(\gamma)\, |\tilde{X}_\gamma' \tilde{X}_\gamma|^{-1/2}\, (\nu\lambda + S_\gamma^2)^{-(n+\nu)/2}$

     $\tilde{X}_\gamma = \begin{pmatrix} X_\gamma H_\gamma^{1/2} \\ I_{p_\gamma} \end{pmatrix}, \qquad \tilde{Y} = \begin{pmatrix} Y \\ 0 \end{pmatrix}$

     $S_\gamma^2 = \tilde{Y}'\tilde{Y} - \tilde{Y}'\tilde{X}_\gamma (\tilde{X}_\gamma' \tilde{X}_\gamma)^{-1} \tilde{X}_\gamma' \tilde{Y},$

     the residual sum of squares from the least squares regression of $\tilde{Y}$ on $\tilde{X}_\gamma$. Fast updating schemes use Cholesky or QR decompositions with efficient algorithms to remove or add columns.
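
     A minimal Matlab sketch of the log marginal score follows, using the augmented-regression form above with the simplifying assumptions $H_\gamma = cI$ and a Bernoulli($w$) prior on $\gamma$ (the function name, arguments, and defaults are our choices; Y is taken as centered):

        function lg = log_g(Y, X, gamma, c, nu, lambda, w)
        % Log of g(gamma) for the univariate model, assuming H_gamma = c*I
        % and independent Bernoulli(w) priors on the gamma_j.
        % gamma is a logical p-vector of inclusion indicators.
          n   = size(X, 1);
          p   = numel(gamma);
          Xg  = X(:, gamma);                 % selected columns
          pg  = sum(gamma);
          Xt  = [sqrt(c) * Xg; eye(pg)];     % tilde X_gamma = [X_g H^{1/2}; I]
          Yt  = [Y; zeros(pg, 1)];           % tilde Y = [Y; 0]
          G   = Xt' * Xt;
          b   = Xt' * Yt;
          S2  = Yt' * Yt - b' * (G \ b);     % residual SS of tilde Y on tilde X
          lg  = pg*log(w) + (p - pg)*log(1 - w) ...   % log p(gamma)
                - sum(log(diag(chol(G)))) ...         % -(1/2) log |Xt'Xt|
                - 0.5*(n + nu)*log(nu*lambda + S2);
        end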

  5. Model Fitting via MCMC

     With $p$ variables there are $2^p$ different $\gamma$ values. We use Metropolis search as a stochastic search. At each MCMC iteration we generate a candidate $\gamma^{new}$ by randomly choosing one of these moves:

     (i) Add or Delete: randomly choose one of the indices in $\gamma^{old}$ and change its value.
     (ii) Swap: choose independently and at random a 0 and a 1 in $\gamma^{old}$ and switch their values.

     The proposed $\gamma^{new}$ is accepted with probability

     $\min\left\{ \dfrac{p(\gamma^{new} \mid X, Y)}{p(\gamma^{old} \mid X, Y)},\; 1 \right\}.$
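
     A minimal sketch of one Metropolis step in Matlab, reusing the log_g function sketched earlier (the 50/50 split between move types and the fallback to add/delete when a swap is impossible are our choices):

        function gamma = mh_step(gamma, Y, X, c, nu, lambda, w)
        % One Metropolis update of the logical inclusion vector gamma.
          gnew = gamma;
          if rand < 0.5 || all(gamma) || ~any(gamma)
              j = randi(numel(gamma));          % add/delete: flip one index
              gnew(j) = ~gnew(j);
          else
              ones_idx  = find(gamma);          % swap: exchange a 1 and a 0
              zeros_idx = find(~gamma);
              gnew(ones_idx(randi(numel(ones_idx))))   = false;
              gnew(zeros_idx(randi(numel(zeros_idx)))) = true;
          end
          logr = log_g(Y, X, gnew,  c, nu, lambda, w) - ...
                 log_g(Y, X, gamma, c, nu, lambda, w);
          if log(rand) < logr
              gamma = gnew;                     % accept the proposed model
          end
        end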

  6. Posterior Inference

     The stochastic search results in a list of visited models $(\gamma^{(0)}, \gamma^{(1)}, \ldots)$ and their corresponding relative posterior probabilities $p(\gamma^{(0)} \mid X, Y), p(\gamma^{(1)} \mid X, Y), \ldots$

     Select variables in the "best" models, i.e. the $\gamma$'s with highest $p(\gamma \mid X, Y)$, or those with largest marginal posterior probabilities

     $p(\gamma_j = 1 \mid X, Y) = \int p(\gamma_j = 1, \gamma_{(-j)} \mid X, Y)\, d\gamma_{(-j)} \approx \sum_{\gamma^{(t)}:\, \gamma_j = 1} p\left(Y \mid X, \gamma^{(t)}\right) p(\gamma^{(t)})$

     or, more simply, by empirical frequencies in the MCMC output:

     $p(\gamma_j = 1 \mid X, Y) = E(\gamma_j \mid X, Y) \approx \#\{t : \gamma_j^{(t)} = 1\}/T,$

     with $T$ the number of MCMC iterations.
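
     In Matlab, the empirical-frequency estimate is a one-liner; here G is assumed to be a T-by-p logical matrix whose rows are the visited $\gamma$'s (a storage convention we adopt for illustration):

        % Marginal inclusion probabilities as empirical frequencies
        % over the MCMC sample.
        p_incl   = mean(G, 1);            % estimates p(gamma_j = 1 | X, Y)
        selected = find(p_incl > 0.5);    % "median model": keep j with prob > 1/2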

  7. Multivariate Response

     $Y_{n \times q} = \mathbf{1}\alpha' + X_{n \times p} B_{p \times q} + E, \qquad E_i \sim N(0, \Sigma)$

     Variable selection via $\gamma$ as

     $B_j \mid \Sigma \sim (1 - \gamma_j)\, I_0 + \gamma_j\, N(0, h_j \Sigma),$

     with $B_j$ the $j$-th row of $B$ and $I_0$ a vector of point masses at 0. Need to work with matrix-variate distributions (Dawid, 1981):

     $Y - \mathbf{1}\alpha' - XB \sim \mathcal{N}(I_n, \Sigma)$
     $\alpha' - \alpha_0' \sim \mathcal{N}(h_0, \Sigma)$
     $B_\gamma - B_{0\gamma} \sim \mathcal{N}(H_\gamma, \Sigma)$
     $\Sigma \sim IW(\delta, Q),$

     with $IW$ an inverse-Wishart with parameters $\delta$ and $Q$ to be specified.
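
     As a sketch of how one simulates from such a matrix-variate normal (row covariance $H_\gamma$, column covariance $\Sigma$, in the Dawid 1981 parameterization), with illustrative dimensions and covariance values:

        % Mean-zero draw of the selected rows of B from the matrix-variate
        % normal N(H_gamma, Sigma); pg, q, Hg and Sigma are illustrative.
        pg    = 3;  q = 2;
        Hg    = eye(pg);                   % row covariance H_gamma
        Sigma = [1 0.5; 0.5 1];            % column covariance
        Z     = randn(pg, q);              % standard normal matrix
        Bg    = chol(Hg, 'lower') * Z * chol(Sigma, 'upper');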

  8. Posterior Distribution

     Combine data and prior information into a posterior distribution. Interest centers on

     $p(\gamma \mid Y, X) \propto p(\gamma) \int f(Y \mid X, \alpha, B, \Sigma)\, p(\alpha \mid \Sigma)\, p(B \mid \Sigma, \gamma)\, p(\Sigma)\, d\alpha\, dB\, d\Sigma$

     which integrates to

     $p(\gamma \mid Y, X) \propto g(\gamma) = p(\gamma)\, |\tilde{X}_\gamma' \tilde{X}_\gamma|^{-q/2}\, |Q_\gamma|^{-(n+\delta+q-1)/2}$

     $\tilde{X}_\gamma = \begin{pmatrix} X_\gamma H_\gamma^{1/2} \\ I_{p_\gamma} \end{pmatrix}, \qquad \tilde{Y} = \begin{pmatrix} Y \\ 0 \end{pmatrix}$

     $Q_\gamma = Q + \tilde{Y}'\tilde{Y} - \tilde{Y}'\tilde{X}_\gamma (\tilde{X}_\gamma' \tilde{X}_\gamma)^{-1} \tilde{X}_\gamma' \tilde{Y}$

     It can be calculated via QR decomposition (Seber, ch. 10, 1984). Use qrdelete and qrinsert algorithms to remove or add a column.
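
     A small sketch of the fast QR updating used to move between neighboring models; qrdelete and qrinsert are built-in Matlab routines, and the random design below is only for illustration:

        A = randn(8, 3);
        [Q, R] = qr(A);                 % full QR of the current augmented design
        [Q, R] = qrdelete(Q, R, 2);     % "delete" move: remove column 2
        x = randn(8, 1);
        [Q, R] = qrinsert(Q, R, 2, x);  % "add" move: insert x as new column 2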

  9. Prediction

     Prediction of a future $Y_f$ given the corresponding $X_f$ can be done:

     - as a posterior weighted average of model predictions (BMA; see the sketch below)

       $p(Y_f \mid X, Y) = \sum_\gamma p(Y_f \mid X, Y, \gamma)\, p(\gamma \mid X, Y)$

       with $p(Y_f \mid X, Y, \gamma)$ a matrix-variate T distribution with mean $X_{f,\gamma} \hat{B}_\gamma$, so that

       $\hat{Y}_f = \sum_\gamma p(\gamma \mid X, Y)\, X_{f,\gamma} \hat{B}_\gamma, \qquad \hat{B}_\gamma = (X_\gamma' X_\gamma + H_\gamma^{-1})^{-1} X_\gamma' Y$

     - as LS or Bayes predictions on single best models
     - as LS or Bayes predictions with "threshold" models (e.g., the "median" model) obtained from estimated marginal probabilities of inclusion.
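
     A minimal sketch of the BMA point prediction, assuming G is a T-by-p matrix of visited $\gamma$'s, pw holds their normalized posterior weights, and $H_\gamma = cI$ as before (all assumptions for illustration, not values from the slides):

        Yhat = zeros(size(Xf, 1), size(Y, 2));
        for t = 1:size(G, 1)
            g    = logical(G(t, :));
            Xg   = X(:, g);
            Bg   = (Xg'*Xg + eye(sum(g))/c) \ (Xg'*Y);  % Bayes estimate B_gamma
            Yhat = Yhat + pw(t) * Xf(:, g) * Bg;        % weight by p(gamma|X,Y)
        end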

  10. Prior Specification

     Priors on $\alpha$ and $\Sigma$ are vague and largely uninformative:

     $\alpha' - \alpha_0' \sim \mathcal{N}(h_0, \Sigma)$, with $\alpha_0 \equiv 0$, $h_0 \to \infty$
     $\Sigma \sim IW(\delta, Q)$, with $\delta = 3$, $Q = kI$

     Choices for $H_\gamma$ (see the sketch below):

     $H_\gamma = c\, (X_\gamma' X_\gamma)^{-1}$ (Zellner g-prior)
     $H_\gamma = c\, \mathrm{diag}(X_\gamma' X_\gamma)^{-1}$
     $H_\gamma = c\, I$

     Choice of $w_j = p(\gamma_j = 1)$: $w_j = w$, $w \sim \mathrm{Beta}(a, b)$ (sparsity). Also, choices that reflect prior information (e.g., gene networks).
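
     The three $H_\gamma$ choices translate directly into Matlab; Xg denotes the selected design matrix and c a user-chosen scale (illustrative names):

        Hg_zellner = c * inv(Xg' * Xg);            % Zellner g-prior
        Hg_diag    = c * diag(1 ./ diag(Xg'*Xg));  % diagonal selection prior
        Hg_iso     = c * eye(size(Xg, 2));         % isotropic / identity prior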

  11. Advantages of the Bayesian Approach

     - Past and collateral information through priors
     - Handles $n \ll p$ settings
     - Rich modeling via Markov chain Monte Carlo (MCMC), even for large $p$
     - Optimal model-averaging prediction
     - Extends to multivariate responses

  12. Main References

     George, E.I. and McCulloch, R.E. (1993). Variable selection via Gibbs sampling. Journal of the American Statistical Association, 88, 881-889.
     George, E.I. and McCulloch, R.E. (1997). Approaches for Bayesian variable selection. Statistica Sinica, 7, 339-373.
     Madigan, D. and York, J. (1995). Bayesian graphical models for discrete data. International Statistical Review, 63, 215-232.
     Brown, P.J., Vannucci, M. and Fearn, T. (1998). Multivariate Bayesian variable selection and prediction. Journal of the Royal Statistical Society, Series B, 60, 627-641.
     Brown, P.J., Vannucci, M. and Fearn, T. (2002). Bayes model averaging with selection of regressors. Journal of the Royal Statistical Society, Series B, 64(3), 519-536.

  13. Additional References

     Use of g-priors: Liang, F., Paulo, R., Molina, G., Clyde, M. and Berger, J. (2008). Mixtures of g priors for Bayesian variable selection. Journal of the American Statistical Association, 103, 410-423.

     Improving MCMC mixing: Bottolo, L. and Richardson, S. (2010). Evolutionary stochastic search for Bayesian model exploration. Bayesian Analysis, 5(3), 583-618. The authors propose an evolutionary Monte Carlo scheme combined with a parallel tempering approach that prevents the chain from getting stuck in local modes.

     Multiplicity: Scott, J. and Berger, J. (2010). Bayes and empirical-Bayes multiplicity adjustment in the variable-selection problem. The Annals of Statistics, 38(5), 2587-2619. The marginal prior on $\gamma$ contains a non-linear penalty which is a function of $p$; therefore, as $p$ grows with the number of true variables remaining fixed, the posterior distribution of $w$ concentrates near 0.

  14. Code from my Website

     bvsme fast: Bayesian Variable Selection with fast form of QR updating

     - Metropolis search
     - g-prior or diagonal and non-diagonal selection priors
     - Bernoulli priors or Beta-Binomial prior
     - Predictions by LS, BMA and BMA with selection

     http://stat.rice.edu/~marina
