Bayesian Methods for Variable Selection with Applications to High-Dimensional Data
Part 4: Non-linear Models via Gaussian Processes
Marina Vannucci, Rice University, USA
ABS13-Italy, 06/17-21/2013
Part 4: Non-linear Models via Gaussian Processes
1. Gaussian processes for nonlinear models
2. Methods for variable selection and computational strategies
3. Simulated and real data examples
Nonlinear Models via Gaussian Processes
Gaussian processes describe nonparametric relationships between a response and a set of predictors.
- In regression, replace $X\beta$ with $z(X)$:
  $y = z(X) + \epsilon, \quad \epsilon \sim N(0, \sigma^2 I_n)$
- and wrap $X$ in a GP:
  $z(X) \sim N(0, C), \quad C = \mathrm{Cov}(z(X))$
- Marginalize over $z$, writing $r = 1/\sigma^2$ for the error precision:
  $y \mid C, r \sim N_n\left(0, \tfrac{1}{r} I_n + C\right)$
  to obtain a nonparametric regression model where the covariance matrix varies with the predictors.
Diggle et al. (JRSSC, 1998); Neal (1999); Linkletter et al. (Technometrics, 2006)
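To make the marginalized model concrete, the following is a minimal sketch of evaluating $\log N_n(y; 0, \tfrac{1}{r} I_n + C)$ through a Cholesky factorization. The function names `cholesky` and `log_marginal_likelihood` are illustrative and not from the slides, and the pure-Python loops are meant for small $n$ only.

```python
import math


def cholesky(K):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def log_marginal_likelihood(y, C, r):
    """log N_n(y; 0, (1/r) I_n + C), with r the error precision (r = 1/sigma^2)."""
    n = len(y)
    # Marginal covariance: GP covariance plus (1/r) on the diagonal.
    K = [[C[i][j] + (1.0 / r if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = cholesky(K)
    # Forward substitution: solve L w = y, so that y' K^{-1} y = w' w.
    w = []
    for i in range(n):
        w.append((y[i] - sum(L[i][k] * w[k] for k in range(i))) / L[i][i])
    quad = sum(v * v for v in w)
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * (quad + log_det + n * math.log(2.0 * math.pi))
```

Because $z$ has been integrated out, this quantity depends on the predictors only through $C$, which is what lets variable selection act on the covariance parameters.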
Choice of the Covariance Matrix
Exponential form:
$C = \mathrm{Cov}(z(X)) = \frac{1}{\lambda_a} J_n + \frac{1}{\lambda_z} \exp(-G)$, with $J_n$ the $n \times n$ matrix of ones and $\exp(\cdot)$ applied elementwise,
$g_{ij} = (x_i - x_j)' P (x_i - x_j), \quad P = \mathrm{diag}(-\log(\rho_1, \ldots, \rho_p)), \quad \rho_k \in (0, 1]$
[Figure: four panels of sample curves, Y plotted against x.]
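A minimal sketch of building the exponential covariance matrix from the definitions above. The function name `exponential_cov` and the list-of-lists layout are assumptions made for illustration.

```python
import math


def exponential_cov(X, rho, lambda_a, lambda_z):
    """C = (1/lambda_a) J_n + (1/lambda_z) exp(-G), g_ij = (x_i - x_j)' P (x_i - x_j).

    X is a list of n rows with p entries each; rho holds p values in (0, 1].
    """
    n = len(X)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # P = diag(-log rho_k), so each predictor adds -log(rho_k) * diff^2.
            g = sum(-math.log(r) * (a - b) ** 2
                    for r, a, b in zip(rho, X[i], X[j]))
            C[i][j] = 1.0 / lambda_a + math.exp(-g) / lambda_z
    return C
```

Setting some `rho[k] = 1.0` makes $-\log \rho_k = 0$, so predictor $k$ drops out of every entry of the matrix.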
General Covariance Formulation: Matérn
Employs an explicit smoothing parameter, $\nu \in (0, \infty)$:
$C(z(x_i), z(x_j)) = \frac{1}{2^{\nu - 1} \Gamma(\nu)} \left[ 2 \sqrt{\nu \, d(x_i, x_j)} \right]^{\nu} K_{\nu}\left[ 2 \sqrt{\nu \, d(x_i, x_j)} \right]$,
where $K_\nu$ is the modified Bessel function of the second kind.
- Parameterize $d(x_i, x_j) = (x_i - x_j)' P (x_i - x_j)$
- Recall $P = \mathrm{diag}(-\log(\rho_1, \ldots, \rho_p))$
- Matérn ≈ exponential form for $\nu > 7/2$ (the two coincide in the limit $\nu \to \infty$)
[Figure: three panels of sample curves, Y against x. (a) Matérn covariance, ν = 0.5, ρ = 0.05; (b) Matérn covariance, ν = 0.5, ρ = 0.95; (c) Matérn covariance, ν = 4.0, ρ = 0.05.]
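For half-integer $\nu$ the Bessel function has a closed form, so the Matérn formula can be checked without special-function libraries. The sketch below covers only $\nu \in \{0.5, 1.5, 2.5\}$, using the standard identities $x^{\nu} K_{\nu}(x) / (2^{\nu-1}\Gamma(\nu)) = e^{-x}$, $(1 + x) e^{-x}$ and $(1 + x + x^2/3) e^{-x}$ with $x = 2\sqrt{\nu d}$. The function name `matern_cov` is illustrative.

```python
import math


def matern_cov(d, nu):
    """Matern covariance at quadratic-form distance d >= 0, for nu in {0.5, 1.5, 2.5}."""
    x = 2.0 * math.sqrt(nu * d)  # argument 2 * sqrt(nu * d) from the formula
    if nu == 0.5:
        poly = 1.0
    elif nu == 1.5:
        poly = 1.0 + x
    elif nu == 2.5:
        poly = 1.0 + x + x * x / 3.0
    else:
        raise ValueError("closed form implemented only for nu in {0.5, 1.5, 2.5}")
    return poly * math.exp(-x)


if __name__ == "__main__":
    d = 0.5
    for nu in (0.5, 1.5, 2.5):
        print(nu, matern_cov(d, nu))
    print("exponential form:", math.exp(-d))
```

At a fixed $d$ the values move toward $\exp(-d)$ as $\nu$ grows, which illustrates why large $\nu$ behaves like the exponential form.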
Nonlinear Models
$y = f(x) + \epsilon$
- GP models are contained in the class of nonparametric kernel regression with exponential family observations, Rasmussen & Williams (2006).
- Kernel models include spline models and models that use regularized methods.
- Compared with nonparametric spline regression models, GP models are less interpretable but better suited for prediction.
- The predictive performance of GP models is competitive with ensemble learning models, such as bagging, boosting and random forest models, Hastie et al. (2001).
- Variable selection can easily be achieved within GP models.
Mixture Priors for Variable Selection
Extract a cell from $C$:
$C_{ij} = \frac{1}{\lambda_a} + \frac{1}{\lambda_z} \prod_{k=1}^{p} \rho_k^{(x_{ik} - x_{jk})^2}$
- $\rho_k \in (0, 1]$; $\rho_k = 1$ → $x_k$ does not influence $y$ (via $C$)
- Selection parameters, $\gamma = \{\gamma_1, \ldots, \gamma_p\}$
- Select $\{\rho_k\}$ with $\{\gamma_k\}$:
  $\pi(\rho_k \mid \gamma_k) = \gamma_k \, U(0, 1) + (1 - \gamma_k) \, \delta_1(\rho_k)$
- $\gamma_k \sim \mathrm{Bernoulli}(\alpha)$, $\lambda_a \sim G(1, 1)$, $\lambda_z \sim G(1, 1)$
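A minimal sketch of one joint draw from these priors, assuming independent $\gamma_k$ and reading $G(1, 1)$ as a Gamma distribution with shape 1 and rate 1. The function name `draw_prior` is illustrative.

```python
import random


def draw_prior(p, alpha, rng):
    """One draw of (gamma, rho, lambda_a, lambda_z) from the mixture prior."""
    # gamma_k ~ Bernoulli(alpha): 1 means predictor k is included.
    gamma = [1 if rng.random() < alpha else 0 for _ in range(p)]
    # rho_k | gamma_k = 1 ~ U(0, 1); rho_k | gamma_k = 0 is a point mass at 1.
    # 1 - random() lies in (0, 1], which keeps -log(rho_k) finite.
    rho = [1.0 - rng.random() if g == 1 else 1.0 for g in gamma]
    lambda_a = rng.gammavariate(1.0, 1.0)  # shape 1, scale 1 (assumed reading of G(1, 1))
    lambda_z = rng.gammavariate(1.0, 1.0)
    return gamma, rho, lambda_a, lambda_z


if __name__ == "__main__":
    gamma, rho, lambda_a, lambda_z = draw_prior(10, 0.2, random.Random(1))
    print(gamma)
    print([round(r, 3) for r in rho])
```

Excluded predictors carry $\rho_k = 1$ exactly, so they contribute a factor of 1 to every $C_{ij}$.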
MCMC for posterior inference
Similar to the MC³ scheme, but here we traverse both the model and the parameter spaces.
- Randomly choose among 3 between-models moves:
  - Add: randomly choose $k$ with $\gamma_k = 0$, set $\gamma'_k = 1$ and propose $\rho'_k$ from $q(\rho'_k \mid \rho_k) = q(\rho'_k) = U(0, 1)$
  - Delete: randomly choose $k$ with $\gamma_k = 1$, set $(\gamma'_k = 0, \rho'_k = 1)$
  - Swap: jointly propose (Add, Delete) moves
- Accept the proposed value $(\gamma', \rho'_{\gamma'})$ jointly
- Add a within-model move to speed convergence: for all $k$ with $\gamma'_k = 1$, propose $\rho''_k$ from $q(\rho''_k) = U(0, 1)$
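The between-models proposal can be sketched as follows. This covers only the proposal step. The Metropolis-Hastings acceptance needs the posterior, which is not reproduced here, and choosing uniformly among the currently feasible moves is an assumption rather than something stated on the slide. The function name `propose_between` is illustrative.

```python
import random


def propose_between(gamma, rho, rng):
    """Propose (gamma', rho') through an Add, Delete or Swap move (proposal only)."""
    gamma_new, rho_new = list(gamma), list(rho)
    excluded = [k for k, g in enumerate(gamma) if g == 0]
    included = [k for k, g in enumerate(gamma) if g == 1]
    moves = []
    if excluded:
        moves.append("add")
    if included:
        moves.append("delete")
    if excluded and included:
        moves.append("swap")
    move = rng.choice(moves)  # assumes at least one predictor (p >= 1)
    if move in ("add", "swap"):
        k = rng.choice(excluded)
        gamma_new[k] = 1
        rho_new[k] = 1.0 - rng.random()  # draw from U(0, 1), kept away from 0
    if move in ("delete", "swap"):
        k = rng.choice(included)
        gamma_new[k] = 0
        rho_new[k] = 1.0  # excluded predictors sit at the point mass rho = 1
    return move, gamma_new, rho_new
```

A full sampler would follow each proposal with a joint accept/reject decision and then with the within-model update of the $\rho_k$ for the included predictors.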
Generalized formulation
GLM with link function
$g(\eta_i) = z(x_i), \quad z(X) \sim N(0, C)$
- Regression, logit and probit models.
- Poisson canonical link function for count data:
  $\pi(s_i \mid \lambda_i) = \frac{\lambda_i^{s_i} \exp(-\lambda_i)}{s_i!} \propto \exp(s_i \log(\lambda_i) - \lambda_i)$
  and define the Poisson GP regression model
  $g(\eta) = \log(\lambda) = z(X)$
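As a small illustration of the Poisson GP model, the sketch below evaluates the log-likelihood of counts $s$ given latent values $z$ with $\log \lambda_i = z_i$. The function name `poisson_gp_loglik` is illustrative.

```python
import math


def poisson_gp_loglik(s, z):
    """sum_i [ s_i * z_i - exp(z_i) - log(s_i!) ], with log(lambda_i) = z_i."""
    return sum(s_i * z_i - math.exp(z_i) - math.lgamma(s_i + 1)
               for s_i, z_i in zip(s, z))
```

Unlike the Gaussian case, $z$ cannot be integrated out in closed form here, so the latent values have to be handled explicitly in posterior computation.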
Cox formulation for survival data
Define the hazard rate function as
$h(t_i \mid z(x_i)) = h_0(t_i) \exp(z(x_i)), \quad i = 1, 2, \ldots, n$
- Fits the spirit of the semi-parametric construction of Cox (1972)
- Partial likelihood avoids baseline hazard estimation
- Use the likelihood formulation of Kalbfleisch (1978), with a Gamma process prior on the cumulative baseline hazard
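To show how the partial likelihood removes the baseline hazard, here is a minimal sketch of the standard Cox partial log-likelihood with the GP value $z(x_i)$ as the log relative risk. It assumes no tied event times and is the classical Cox (1972) form, not the Kalbfleisch (1978) formulation. The function name `cox_partial_loglik` is illustrative.

```python
import math


def cox_partial_loglik(times, events, z):
    """sum over observed events of z_i - log(sum_{j: t_j >= t_i} exp(z_j)).

    events[i] is 1 for an observed event and 0 for a censored time.
    """
    total = 0.0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if e_i:
            # Risk set: subjects still under observation at time t_i.
            risk = sum(math.exp(z_j) for t_j, z_j in zip(times, z) if t_j >= t_i)
            total += z[i] - math.log(risk)
    return total
```

Adding a constant to every $z_i$ leaves the value unchanged, which reflects that the baseline hazard $h_0$ cancels out of each ratio.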
Simulation: Count Data
$(n = 100, p = 1000)$
$y_i = 1.6 (x_{i,1} + x_{i,2} + x_{i,3} + x_{i,4}) + \sin(3 x_{i,5}) + \sin(5 x_{i,6}) + \epsilon, \quad s_i \sim \mathrm{Pois}(\exp(y_i))$
[Figure, two panels for predictors 1 to 20 (variable selection parameters $\gamma_1, \ldots, \gamma_{20}$). Left: selected predictors, $\gamma_k$, based on EFDR = 0; vertical axis $P(\gamma_k = 1 \mid D)$. Right: posterior samples of $\rho_k$ by predictor.]
- Low-order polynomial-like association: $\rho_1, \ldots, \rho_4$ close to 1
- High-order/non-linear association: $\rho_5, \rho_6$ closer to 0
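A minimal sketch of generating data from this design. The slide does not state the distribution of the predictors or of $\epsilon$, so uniform predictors on (0, 1) and Gaussian noise with standard deviation 0.1 are assumptions, as are the function names `poisson_draw` and `simulate_counts`.

```python
import math
import random


def poisson_draw(lam, rng):
    """Pois(lam) draw: count the unit-rate exponential arrivals that fall in [0, lam]."""
    count, total = 0, rng.expovariate(1.0)
    while total <= lam:
        count += 1
        total += rng.expovariate(1.0)
    return count


def simulate_counts(n=100, p=1000, noise_sd=0.1, seed=0):
    """Simulate (X, y, s); only the first six predictors enter y (requires p >= 6)."""
    rng = random.Random(seed)
    # Assumption: predictors are i.i.d. U(0, 1).
    X = [[rng.random() for _ in range(p)] for _ in range(n)]
    y, s = [], []
    for x in X:
        y_i = (1.6 * (x[0] + x[1] + x[2] + x[3])
               + math.sin(3 * x[4]) + math.sin(5 * x[5])
               + rng.gauss(0.0, noise_sd))  # assumption: Gaussian noise
        y.append(y_i)
        s.append(poisson_draw(math.exp(y_i), rng))
    return X, y, s
```

The remaining $p - 6$ columns are pure noise predictors, which is what makes the design a variable selection test.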
Simulation: Cox GP model
$(n = 100, p = 1000)$
$y_i = (3 x_{i,1} - 2.5 x_{i,2} + 3.5 x_{i,3} - 3 x_{i,4}) + \sin(3 x_{i,5}) - \sin(5 x_{i,6}) + \epsilon$
Event time observations from a Cox model with survivor function:
$S(t \mid y) = \exp[-H_0(t) \exp(y)], \quad H_0(t) = \lambda t, \quad \lambda = 0.2$
$t = M / (\lambda \exp(y)), \quad M \sim \mathrm{Exp}(1)$, with 5% randomly censored
[Figure, three panels. Left: selected predictors, $\gamma_k$, based on EFDR = 0.01; $P(\gamma_k = 1 \mid D)$ for predictors 1 to 20 (variable selection parameters $\gamma_1, \ldots, \gamma_{20}$). Middle: posterior samples of $\rho_k$ by predictor. Right: survival probability against log survival time.]
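The event-time recipe follows from inverting the survivor function: $-\log S(t \mid y) = \lambda t \exp(y)$ is Exp(1) distributed, so $t = M / (\lambda \exp(y))$. A minimal sketch is given below. The slide does not describe the censoring mechanism, so censoring a random 5% of subjects at a uniformly chosen fraction of their event time is an assumption, as is the function name `simulate_survival`.

```python
import math
import random


def simulate_survival(y, lam=0.2, censor_frac=0.05, seed=0):
    """Return (times, events) with t = M / (lam * exp(y)), M ~ Exp(1)."""
    rng = random.Random(seed)
    n = len(y)
    times = [rng.expovariate(1.0) / (lam * math.exp(y_i)) for y_i in y]
    events = [1] * n
    # Assumption: pick censored subjects uniformly at random and censor each
    # at a uniform fraction of its event time.
    for i in rng.sample(range(n), round(censor_frac * n)):
        times[i] *= 1.0 - rng.random()
        events[i] = 0
    return times, events


if __name__ == "__main__":
    times, events = simulate_survival([0.0] * 100)
    print(sum(events), "observed events out of", len(events))
```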
Application: Ozone Count Data
- Integer particle counts per one million particles of air near Los Angeles for n = 330 days, with an associated set of 8 meteorological predictors.
- We held out a randomly chosen set of 165 observations for validation.
[Figure, two panels for predictors 1 to 8 (variable selection parameters $\gamma_1, \ldots, \gamma_8$). Left: selected predictors, $\gamma_k$, based on EFDR = 0.09; vertical axis $P(\gamma_k = 1 \mid D)$. Right: posterior samples of $\rho_k$ by predictor.]
Analyzed by Liang et al. (2007) with a linear regression model including all linear and quadratic terms (p = 44).

Prior on g             M_γ                                   p_γ   RMSE(M_γ)
Local Empirical Bayes  X5, X6, X7, X6^2, X7^2, X3X5          6     4.5
Hyper-g (a = 4)        X5, X6, X7, X6^2, X7^2, X3X5          6     4.5
Fixed (BIC)            X5, X6, X7, X6^2, X7^2, X3X5          6     4.5
Brown et al. (2002)    X1X6, X1X7, X6X7, X1^2, X3^2, X7^2    6     4.5
GP model               X3, X6, X7                            3     3.7
Application: Wisconsin Breast Cancer
- Time-to-recurrence in n = 194 subjects, 76% censored
- p = 32: characteristics of cell nuclei present in the breast mass, e.g. shape, size, texture
- Obtained from a Fine Needle Aspiration (FNA) digitized image
[Figure, three panels. Left: selected predictors, $\gamma_k$, based on EFDR = 0.03; $P(\gamma_k = 1 \mid D)$ for predictors 1 to 32 (variable selection parameters $\gamma_1, \ldots, \gamma_{32}$). Middle: posterior samples of $\rho_k$ for predictors 2, 4, 5, 6, 7, 17, 18, 20, 25, 28, 32. Right: survival probability against log survival time.]
Note the mix of lower- and higher-order covariate associations in the boxplots.