SLIDE 1

Bayesian Methods for Variable Selection with Applications to High-Dimensional Data

Part 3: Variable Selection for Mixture Models

Marina Vannucci

Rice University, USA

ABS13-Italy 06/17-21/2013

SLIDE 2

Part 3: Variable Selection for Mixture Models

  • Finite mixture models for sample clustering
  • Variable selection
  • Simulated data
  • Supervised case (discriminant analysis)
  • Applications to genomic data
  • Case study in imaging genetics

SLIDE 3

So far we have focused our attention on linear settings. Now: mixture models, which characterize data arising from a mixture of subpopulations. Mixture models are widely used in classification, clustering, and density estimation. Simple example:

xi ∼ w1 N(µ1, Σ1) + w2 N(µ2, Σ2),

so that any sample comes from one of two distributions:

xi ∼ N(µ1, Σ1) with probability w1
xi ∼ N(µ2, Σ2) with probability w2

We address the case of many variables.
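A minimal sampling sketch of this two-component example (the parameter values below are illustrative, not from the slides):

```python
# Draw from a two-component Gaussian mixture: pick a component with
# probability (w1, w2), then draw from the corresponding normal.
import numpy as np

rng = np.random.default_rng(0)
w = np.array([0.4, 0.6])                          # mixture weights w1, w2
mus = [np.zeros(2), np.array([3.0, 3.0])]         # component means
Sigmas = [np.eye(2), 0.5 * np.eye(2)]             # component covariances

def draw_mixture(n):
    comps = rng.choice(2, size=n, p=w)            # latent component labels
    X = np.array([rng.multivariate_normal(mus[k], Sigmas[k]) for k in comps])
    return X, comps

X, labels = draw_mixture(100)                     # 100 samples in 2 dimensions
```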

SLIDE 4

Objective

Simultaneous variable selection and sample clustering. The cluster structure of the samples is confined to a small subset of variables; noisy variables mask the recovery of the clusters. Proposed methodology:

  • Use a multivariate normal mixture model with an unknown number of components to determine the cluster structure of the samples.
  • Use stochastic search techniques to examine the space of variable subsets and identify the most probable models.
  • Also, infinite mixture models via Dirichlet process priors.

Genomic data: identify disease subtypes and select the discriminating genes.

SLIDE 5

Finite Mixture Models

In the case of G components,

xi | w, θ ∼iid ∑_{k=1}^G wk f(xi | θk)
w ∼ Dir(α, G)
θk ∼ πk(θk)
G ∼ π(G)?

where the mixture weights follow a Dirichlet distribution with parameter α. We will consider f(xi | θk) multivariate normal with θk = (µk, Σk).

SLIDE 6

An alternative specification (from a missing-data perspective) uses latent variables y = (y1, . . . , yn)′, where yi = k if the ith observation comes from cluster k:

(xi | yi = k, w, θ) ∼ f(xi | θk), p(yi = k) = wk
w ∼ Dir(α, G), θk ∼ πk(θk), G ∼ π(G)?

This facilitates inference via the Gibbs sampler (McLachlan and Basford, 1988).

SLIDE 7

Posterior inference for fixed G

Gibbs sampling proceeds at each iteration, given G, by drawing from the full conditionals:

P(yi = k | ·) ∝ wk f(xi | θk)
w | · ∼ Dir(α1 + n1, . . . , αG + nG)
p(θk | ·) ∝ πk(θk) ∏_i f(xi | θk)^{yik}

where each right-hand side is evaluated at the current values of the other parameters, nk = #{i : yi = k}, and yik = 1 if yi = k and 0 otherwise.
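As an illustration of these conditionals, here is a sketch of one Gibbs sweep for a deliberately simplified case: spherical components with Σk = I fixed and a flat prior on the means, so the θ-update reduces to a normal draw centered at the cluster mean. The actual conjugate updates used in practice are richer than this.

```python
# One Gibbs sweep under the simplifying assumptions stated above.
import numpy as np
from scipy.stats import multivariate_normal, dirichlet

def gibbs_sweep(X, y, w, mus, alpha, rng):
    """X: n x p data, y: int labels, w: weights, mus: list of means,
    alpha: length-G array of Dirichlet parameters."""
    n, p = X.shape
    G = len(mus)
    # (1) y_i | .  proportional to  w_k f(x_i | theta_k)
    for i in range(n):
        logp = np.array([np.log(w[k]) +
                         multivariate_normal.logpdf(X[i], mus[k], np.eye(p))
                         for k in range(G)])
        probs = np.exp(logp - logp.max())
        y[i] = rng.choice(G, p=probs / probs.sum())
    # (2) w | .  ~  Dir(alpha_1 + n_1, ..., alpha_G + n_G)
    counts = np.bincount(y, minlength=G)
    w[:] = dirichlet.rvs(alpha + counts, random_state=rng)[0]
    # (3) mu_k | .  ~  N(xbar_k, I / n_k) under a flat prior; empty clusters kept
    for k in range(G):
        if counts[k] > 0:
            xbar = X[y == k].mean(axis=0)
            mus[k] = rng.multivariate_normal(xbar, np.eye(p) / counts[k])
    return y, w, mus
```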

SLIDE 8

Reversible Jump MCMC

What if G is unknown? Treat G as an unknown parameter; the model dimension changes. Use RJMCMC (Green, 1995, Biometrika), which allows moves between parameter spaces of different dimensions. Additional random variables are introduced to ensure dimension matching.

SLIDE 9

The general idea for reversible jump, given unknown G and θk, is: from a starting state (G, θG), propose a new model with probability JG,G∗ and generate an augmenting random variable u from a proposal J(u | G, G∗, θG). Determine the proposed model parameters as θG∗ = gG,G∗(θG, u), where g is a deterministic function that relates the parameters of model G to those of G∗. Accept the new model with probability min(r, 1), where

r = [p(x | θG∗) π(θG∗) πG∗ JG∗,G J(u∗ | G∗, G, θG∗)] / [p(x | θG) π(θG) πG JG,G∗ J(u | G, G∗, θG)] × |Jacobian|,

with Jacobian = ∂gG,G∗(θG, u) / ∂(θG, u).

SLIDE 10

Reversible jump can be thought of as a generalization of the MH sampler:

MH sampler: r = {likelihood × prior × proposal ratios}
RJ sampler: r = {likelihood × prior × proposal ratios × Jacobian}

Usually the implementation has three kinds of moves:

BIRTH: move to dimension k + 1
DEATH: move to dimension k − 1
MOVE: move within dimension k

The models need not be nested.
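A schematic of the RJ acceptance computation, keeping the four ingredients of r separate. All arguments are hypothetical placeholders that a concrete sampler would supply from the model at hand:

```python
def rj_log_accept(loglik_new, loglik_old,      # log p(x | theta)
                  logprior_new, logprior_old,  # log pi(theta) + log pi(G)
                  log_move_rev, log_move_fwd,  # log J_{G*,G}, log J_{G,G*}
                  log_u_rev, log_u_fwd,        # log proposal densities for u
                  log_abs_jacobian):           # log |d g(theta,u) / d(theta,u)|
    """log r; accept the jump when log(Uniform(0,1)) < min(0, log r)."""
    return ((loglik_new - loglik_old) + (logprior_new - logprior_old)
            + (log_move_rev - log_move_fwd) + (log_u_rev - log_u_fwd)
            + log_abs_jacobian)
```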

SLIDE 11

Variable Selection

Discriminating variables define a mixture of G distributions. Introduce a latent p-vector γ with binary entries: γj = 1 if variable j defines a mixture distribution, γj = 0 otherwise.

The likelihood function is given by

L(G, γ, w, µ, Σ, η, Ω | X, y) = ∏_{k=1}^G (2π)^{−pγ nk/2} |Σ(γ)k|^{−nk/2} wk^{nk} × exp{ −(1/2) ∑_{xi ∈ Ck} (x(γ)i − µ(γ)k)′ Σ(γ)k^{−1} (x(γ)i − µ(γ)k) } × φ(X(γc) | η(γc), Ω(γc)),

where Ck = {xi | yi = k} with cardinality nk, pγ is the number of selected variables, and φ(·) is a multivariate normal density (for the non-selected variables).

SLIDE 12

Prior Model

Assume the γj’s are independent Bernoulli variables. The number of components, G, can be assumed to follow a truncated Poisson or a discrete Uniform on [2, . . . , Gmax].

w | G ∼ Dirichlet(α, . . . , α)
µk(γ) | Σk(γ), G ∼ N(µ0(γ), hΣk(γ))
Σk(γ) | G ∼ IW(δ, Qγ)

where (γ) indicates the covariates with γj = 1. Conjugate priors on the parameters for the case γj = 0. We work with a marginalized likelihood.

SLIDE 13

Model Fitting

(1) Update γ by a Metropolis algorithm (add/delete and swap moves).
(2) Update w from its full conditional (Dirichlet draw).
(3) Update y from its full conditional (multinomial draw).
(4) Split one cluster into two, or merge two into one.
(5) Birth or death of an empty component.

Steps (4) and (5) are carried out via reversible jump MCMC extended to the multivariate setting.

SLIDE 14

Updating γ

Metropolis move to update γold to γnew:

(a) Add/delete: randomly choose a γj and change its value.
(b) Swap: randomly choose a 0 and a 1 in γold and switch their values.

The new candidate γnew is accepted with probability

min{1, f(γnew | X, G, w, y) / f(γold | X, G, w, y)}.
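A sketch of this update, assuming a callable log_post(γ) that returns log f(γ | X, G, w, y) up to a constant (in this model it comes from the marginalized likelihood):

```python
import numpy as np

def update_gamma(gamma, log_post, rng):
    prop = gamma.copy()
    ones, zeros = np.flatnonzero(gamma == 1), np.flatnonzero(gamma == 0)
    if rng.uniform() < 0.5 or len(ones) == 0 or len(zeros) == 0:
        j = rng.integers(len(gamma))       # (a) add/delete: flip one entry
        prop[j] = 1 - prop[j]
    else:                                  # (b) swap: exchange a 0 and a 1
        prop[rng.choice(ones)], prop[rng.choice(zeros)] = 0, 1
    # both moves are symmetric, so the MH ratio is just the posterior ratio
    if np.log(rng.uniform()) < log_post(prop) - log_post(gamma):
        return prop
    return gamma
```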

SLIDE 15

Updating w and y

w | G, γ, y, X ∼ Dirichlet(α + n1, . . . , α + nG).
y is updated one element at a time from f(yi = k | X, y(−i), γ, w, G).
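A sketch of both draws; log_f_yi is a hypothetical callable returning log f(yi = k | X, y(−i), γ, w, G) up to a constant:

```python
import numpy as np

def update_w(y, G, alpha, rng):
    """w | G, gamma, y, X  ~  Dirichlet(alpha + n_1, ..., alpha + n_G)."""
    return rng.dirichlet(alpha + np.bincount(y, minlength=G))

def update_y(y, G, log_f_yi, rng):
    """Draw each y_i in turn from its full conditional (a multinomial draw)."""
    for i in range(len(y)):
        logp = np.array([log_f_yi(i, k, y) for k in range(G)])
        probs = np.exp(logp - logp.max())
        y[i] = rng.choice(G, p=probs / probs.sum())
    return y
```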

SLIDE 16

Split/merge and birth/death moves

The proposal for these moves in the multivariate setting is intricate. It is necessary to integrate out µ and Σ. Deriving f(yi|X, G, w, γ) is computationally prohibitive. Defining adjacency in the multivariate setting is not straightforward.

SLIDE 17

Posterior Inference for y

The number of clusters, G, is estimated by the value most frequently visited by the MCMC sampler. Estimate the marginal posterior probabilities p(yi = k | X, G). The posterior allocation of sample i is then estimated as

ŷi = argmax_{1≤k≤G} p(yi = k | X, G).
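A sketch of these summaries from MCMC output, with y_samples an M × n array of sampled allocation vectors (name assumed for illustration):

```python
import numpy as np

def allocate(y_samples, G):
    """Marginal allocation probabilities and modal assignments from MCMC draws."""
    probs = np.stack([(y_samples == k).mean(axis=0) for k in range(G)], axis=1)
    return probs, probs.argmax(axis=1)   # n x G probability matrix and y_hat
```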

SLIDE 18

Posterior Inference for γ

Select variables with the largest marginal posterior probability p(γj = 1 | X, G), or select the variables in the “best” visited model

γ∗ = argmax_{1≤t≤M} p(γ(t) | X, G, w̄, ŷ),

with ŷ the estimated sample allocations and w̄ = (1/M) ∑_{t=1}^M w(t).

Tadesse, Sha and Vannucci (JASA, 2005)
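A sketch of both selection rules from MCMC output; gamma_samples is an M × p binary array of visited models and log_posts the corresponding length-M vector of log posterior values (names assumed for illustration):

```python
import numpy as np

def select_variables(gamma_samples, log_posts, threshold=0.5):
    marginal = gamma_samples.mean(axis=0)              # p(gamma_j = 1 | X, G)
    by_marginal = np.flatnonzero(marginal > threshold) # rule 1: high marginals
    best_model = gamma_samples[np.argmax(log_posts)]   # rule 2: "best" model gamma*
    return marginal, by_marginal, best_model
```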

SLIDE 19

Infinite Mixture Models via Dirichlet Process Priors

Integrating over w and taking G → ∞ we get

p(yi = k and yl = k for some l ≠ i | y−i) = n−i,k / (n − 1 + α)
p(yi ≠ yl for all l ≠ i | y−i) = α / (n − 1 + α)

MCMC updates γ via Metropolis and yi from the full conditionals

p(yi = k and yl = k for some l ≠ i | y−i, X, γ)
p(yi ≠ yl for all l ≠ i | y−i, X, γ)

Inference on y by MAP or by estimating p(yi = yj | X); inference on γ as before. This is a natural approach to clustering (samples from a DP have a positive probability of ties).

Kim, Tadesse and Vannucci (Biometrika, 2006)
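A sketch of the resulting Pólya-urn update for a single yi; log_marg is a hypothetical callable returning the log marginal likelihood of sample i under an existing cluster k, or under a new cluster when k is None:

```python
import numpy as np

def crp_update_yi(i, y, alpha, log_marg, rng):
    """Resample y[i] from the DP full conditional given all other labels."""
    y_minus = np.delete(y, i)
    clusters, counts = np.unique(y_minus, return_counts=True)
    # weights: n_{-i,k} for existing clusters, alpha for a new one
    # (the common denominator n - 1 + alpha cancels after normalization)
    logw = [np.log(c) + log_marg(i, k, y) for k, c in zip(clusters, counts)]
    logw.append(np.log(alpha) + log_marg(i, None, y))
    logw = np.array(logw)
    probs = np.exp(logw - logw.max())
    choice = rng.choice(len(probs), p=probs / probs.sum())
    y[i] = clusters[choice] if choice < len(clusters) else y.max() + 1
    return y
```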

SLIDE 20

Application to Simulated Data

15 samples, 20 variables, 4 multivariate normal densities:

xij ∼ I{1≤i≤4} N(µ1, σ²1) + I{5≤i≤7} N(µ2, σ²2) + I{8≤i≤13} N(µ3, σ²3) + I{14≤i≤15} N(µ4, σ²4),

i = 1, . . . , 15, j = 1, . . . , 20, with µk ∈ [−5, 5] and σ²k ∈ [0.1, 2].

Cluster sizes: 4-3-6-2. An additional set of 980 noisy variables is drawn from a standard normal density.
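A sketch of this design under one reading of it, with a separate mean and variance drawn per cluster and per variable from the stated ranges:

```python
import numpy as np

rng = np.random.default_rng(1)
labels = np.repeat([0, 1, 2, 3], [4, 3, 6, 2])            # cluster sizes 4-3-6-2
mu = rng.uniform(-5, 5, size=(4, 20))                     # mu_k in [-5, 5]
sigma2 = rng.uniform(0.1, 2.0, size=(4, 20))              # sigma2_k in [0.1, 2]

X_disc = rng.normal(mu[labels], np.sqrt(sigma2[labels]))  # 15 x 20 discriminating
X_noise = rng.normal(size=(15, 980))                      # standard-normal noise
X = np.hstack([X_disc, X_noise])                          # final 15 x 1000 matrix
```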

SLIDE 21

Weakly informative priors for model parameters (δ = 3, α = 1, h = 100, Q = kI). Truncated Poisson prior for G with Gmax = 10. MCMC with 100,000 iterations, starting from a model with one randomly selected γj set to 1.

SLIDE 22

Trace plot of number of clusters, G

[Figure: trace of G over the 100,000 MCMC iterations]

SLIDE 23

Trace plot for number of included variables, pγ

[Figure: trace of pγ over the 100,000 MCMC iterations]

SLIDE 24

Marginal posterior probabilities, p(γj = 1|X, G = 4)

[Figure: p(γj = 1 | X, G = 4) plotted against variable index j = 1, . . . , 1000]

SLIDE 25

Marginal posterior probabilities of sample allocations, p(yi = k|X, G = 4), i = 1, . . . , 15, k = 1, . . . , 4

[Figure: four panels showing p(yi = k | X, G = 4), k = 1, . . . , 4, across the 15 samples]

SLIDE 26

Results

G = 4 had the strongest support. All sample allocations corresponded to the true cluster structure. There were 16 variables with marginal probability > 0.7 (15 were correct). Very little sensitivity to model parameters, with the exception of the covariance hyperparameters.

SLIDE 27

Application to microarray data

Endometrial cancer: the most common gynecologic malignancy in the US. 10 tumor and 4 normal tissues were collected from hysterectomy specimens and examined with Affymetrix Hu6800 arrays. Probe sets with unreliable readings (< 20 or > 16,000) were removed ⇒ p = 762. Gene expressions were log-transformed and scaled by their range. Weakly informative priors were specified for the model parameters; truncated Poisson prior for G with Gmax = n; γj ∼ Bernoulli(ϕ = 10/p). Four MCMC chains were run with widely different starting points: (a) 1; (b) 10; (c) 25; (d) 50 randomly selected γj’s set to 1.

SLIDE 28

Posterior distribution of G (union of 4 chains) and p(γj = 1 | X, G = 3)

[Figure: left panel, p(G = k | X) for k = 1, . . . , 8; right panel, p(γj = 1 | X, G = 3) across the 762 genes]

SLIDE 29

We have identified 3 classes and a set of 31 genes that can distinguish subtypes of the disease.

[Figure: heatmap of the selected genes across samples, and three panels showing p(yi = k | X, G = 3), k = 1, 2, 3, across the 14 samples]

SLIDE 30

Supervised case (discriminant analysis)

Model-based approach to classification. Objective: assign objects to (known) groups based on a set of measurements on a training set. Data from group k are modeled as

Xk − 1nk µk′ ∼ N(I, Σk), k = 1, . . . , G,

where the vector µk and the matrix Σk are the mean and the covariance matrix of the k-th group, respectively. Group assignments: y = (y1, . . . , yn)′, where yi = k if the ith observation comes from group k.

Unsupervised setting (clustering): G, w, y unknown.
Supervised setting (discriminant analysis): G, y known (ŵk = nk/n). The aim is to classify new samples via a “classifier” (the predictive distribution).

SLIDE 31

Conjugate priors:

µk ∼ N(mk, hkΣk)
Σk ∼ IW(δk, Ωk),

where Ωk is a scale matrix and δk a shape parameter. The predictive distribution (a Student-t) is used to classify new samples:

πk(yf | X) = pk(xf) ŵk / ∑_{l=1}^G pl(xf) ŵl,

where pk(xf) denotes the predictive distribution of group k. A new observation is then assigned to the group with the highest posterior probability.
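A sketch of this classification rule; pred_logpdf is a hypothetical callable returning log pk(xf), the log predictive (Student-t) density of group k, and w_hat holds the estimated proportions nk/n:

```python
import numpy as np

def classify(x_f, w_hat, pred_logpdf):
    """Posterior group probabilities pi_k(y_f | X) for a new observation x_f."""
    G = len(w_hat)
    logp = np.array([np.log(w_hat[k]) + pred_logpdf(x_f, k) for k in range(G)])
    probs = np.exp(logp - logp.max())
    probs /= probs.sum()
    return probs.argmax(), probs   # assigned group and full probability vector
```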

SLIDE 32

Variable selection

Want to select discriminating variables. Introduce a latent p-vector γ with binary entries: γj = 1 if variable j defines a mixture distribution, γj = 0 otherwise.

The likelihood function is given by

L(X, y; ·) = ∏_{k=1}^G [ wk^{nk} ∏_{i ∈ Ck} pk(xi(γ)) ] × ∏_{i=1}^n p(xi(γc) | xi(γ)),

with conjugate priors for selected and non-selected variables, and Bernoulli priors or a Markov random field prior on γ.

SLIDE 33

Model Fitting - MCMC

Can marginalize over the model parameters. Update γ by a Metropolis algorithm. Classify new samples based on the selected variables.

SLIDE 34

A Benchmark Example - Leukemia data, Golub (1999, Science)

Supervised setting. Likelihood function defined as in Raftery & Dean (2006). 38 + 34 patients; 3,571 genes; KEGG graph used for the prior on γ. 29 genes selected; 33/34 samples correctly classified.

[Figure: posterior inclusion probabilities across the 3,571 genes, and posterior classification probabilities of the units for the ALL and AML groups]

Stingo and Vannucci (Bioinformatics, 2011)

SLIDE 35

Main References

Tadesse, M.G., Sha, N. and Vannucci, M. (2005). Bayesian variable selection in clustering high-dimensional data. Journal of the American Statistical Association, 100, 602-617.

Kim, S., Tadesse, M.G. and Vannucci, M. (2006). Variable selection in clustering via Dirichlet process mixture models. Biometrika, 93(4), 877-893.

Stingo, F.C. and Vannucci, M. (2011). Variable selection for discriminant analysis with Markov random field priors for the analysis of microarray data. Bioinformatics, 27(4), 495-501.

SLIDE 36

Case Study in Imaging Genetics

Slides for this case study are drawn from: Stingo, F.C., Guindani, M., Vannucci, M. and Calhoun, V. (2013). An integrative Bayesian modeling approach to imaging genetics. Journal of the American Statistical Association, accepted.
