Bayesian Methods for Variable Selection with Applications to High-Dimensional Data



  1. Bayesian Methods for Variable Selection with Applications to High-Dimensional Data Part 3: Variable Selection for Mixture Models Marina Vannucci Rice University, USA ABS13-Italy 06/17-21/2013 Marina Vannucci (Rice University, USA) Bayesian Variable Selection (Part 3) ABS13-Italy 06/17-21/2013 1 / 36

  2. Part 3: Variable Selection for Mixture Models
     - Finite mixture models for sample clustering.
     - Variable selection
     - Simulated data
     - Supervised case (discriminant analysis).
     - Applications to genomic data.
     - Case study in imaging genetics.

  3. So far we have focused on linear settings. We now turn to mixture models, which characterize the behavior of data that arise from a mixture of subpopulations. Mixture models are widely used in classification, clustering, and density estimation. Simple example: if
     $$x_i \sim w_1 N(\mu_1, \Sigma_1) + w_2 N(\mu_2, \Sigma_2)$$
     then any sample can come from one of two distributions:
     - $x_i \sim N(\mu_1, \Sigma_1)$ with probability $w_1$
     - $x_i \sim N(\mu_2, \Sigma_2)$ with probability $w_2$

     We address the case of many variables.
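The two-component example can be simulated by first choosing a component and then drawing from it. The sketch below is a minimal illustration that assumes a univariate setting (scalar means and standard deviations instead of $\mu_k, \Sigma_k$); the function name and its arguments are hypothetical.

```python
import random

def sample_two_component_mixture(n, w1, mu1, sd1, mu2, sd2, seed=0):
    """Draw n values from w1 * N(mu1, sd1^2) + (1 - w1) * N(mu2, sd2^2)."""
    rng = random.Random(seed)
    draws, labels = [], []
    for _ in range(n):
        # Choose the component with probabilities (w1, 1 - w1) ...
        k = 1 if rng.random() < w1 else 2
        mu, sd = (mu1, sd1) if k == 1 else (mu2, sd2)
        # ... then draw the observation from that component.
        draws.append(rng.gauss(mu, sd))
        labels.append(k)
    return draws, labels

draws, labels = sample_two_component_mixture(2000, 0.3, -10.0, 1.0, 10.0, 1.0)
```

The returned labels play the role of the latent allocations that are unobserved in a clustering problem.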

  4. Objective: simultaneous variable selection and sample clustering.
     - The cluster structure of the samples is confined to a small subset of variables.
     - Noisy variables mask the recovery of the clusters.

     Proposed methodology:
     - Use a multivariate normal mixture model with an unknown number of components to determine the cluster structure of the samples.
     - Use stochastic search techniques to examine the space of variable subsets and identify the most probable models.
     - Also, infinite mixture models via Dirichlet process priors.

     Genomic data: identify disease subtypes and select the discriminating genes.

  5. Finite Mixture Models. In the case of $G$ components,
     $$x_i \mid w, \theta \overset{iid}{\sim} \sum_{k=1}^{G} w_k f(x_i \mid \theta_k)$$
     $$w \sim \mathrm{Dir}(\alpha, G), \qquad \theta_k \sim \pi_k(\theta_k), \qquad G \sim \pi(G)?$$
     where the mixture weights follow a Dirichlet distribution with parameter $\alpha$. We will consider $f(x_i \mid \theta_k)$ multivariate normal with $\theta_k = (\mu_k, \Sigma_k)$.

  6. An alternative specification (from a missing data perspective) uses latent variables $y = (y_1, \ldots, y_n)'$, where $y_i = k$ if the $i$th observation comes from cluster $k$:
     $$(x_i \mid y_i = k, w, \theta) \sim f(x_i \mid \theta_k), \qquad p(y_i = k) = w_k$$
     $$w \sim \mathrm{Dir}(\alpha, G), \qquad \theta_k \sim \pi_k(\theta_k), \qquad G \sim \pi(G)?$$
     This facilitates inference via the Gibbs sampler (McLachlan and Basford (1988)).

  7. Posterior inference for fixed $G$. Given $G$, Gibbs sampling proceeds at each iteration $t$ with the posterior conditionals
     $$P(y_i^{(t)} = k \mid \cdot) \propto w_k^{(t-1)} f(x_i \mid \theta_k^{(t-1)})$$
     $$p(w^{(t)} \mid \cdot) = \mathrm{Dir}(\alpha_1 + n_1, \ldots, \alpha_G + n_G)$$
     $$p(\theta_k^{(t)} \mid \cdot) \propto \pi_k(\theta_k^{(t)}) \prod_{i} f_k(x_i \mid \theta_k^{(t)})^{y_{ik}^{(t)}}$$
     where $n_k$ is the number of observations currently allocated to component $k$ and $y_{ik}^{(t)} = 1$ if $y_i^{(t)} = k$ and $0$ otherwise.
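As a rough illustration of these three conditional draws, here is a minimal sketch for a deliberately simplified case: univariate data, unit component variance, and a $N(0, \tau^2)$ prior on each component mean. These simplifications, the function name `gibbs_mixture`, and the parameters `alpha` and `tau2` are assumptions made for the example, not the setup used in the slides.

```python
import math
import random

def gibbs_mixture(x, G, n_iter=200, alpha=1.0, tau2=100.0, seed=0):
    """Gibbs sampler for a G-component mixture of N(mu_k, 1) densities."""
    rng = random.Random(seed)
    lo, hi = min(x), max(x)
    mu = [lo + (k + 0.5) * (hi - lo) / G for k in range(G)]  # spread initial means
    w = [1.0 / G] * G
    y = [0] * len(x)
    for _ in range(n_iter):
        # Allocations: P(y_i = k | .) proportional to w_k * f(x_i | mu_k).
        for i, xi in enumerate(x):
            logp = [math.log(w[k]) - 0.5 * (xi - mu[k]) ** 2 for k in range(G)]
            m = max(logp)  # shift before exponentiating to avoid underflow
            probs = [math.exp(v - m) for v in logp]
            y[i] = rng.choices(range(G), weights=probs)[0]
        counts = [y.count(k) for k in range(G)]
        # Weights: Dirichlet(alpha + n_1, ..., alpha + n_G) via normalized gammas.
        g = [rng.gammavariate(alpha + counts[k], 1.0) for k in range(G)]
        w = [v / sum(g) for v in g]
        # Means: conjugate normal update under the N(0, tau2) prior.
        for k in range(G):
            s = sum(xi for xi, yi in zip(x, y) if yi == k)
            var = 1.0 / (counts[k] + 1.0 / tau2)
            mu[k] = rng.gauss(var * s, math.sqrt(var))
    return mu, w, y

data_rng = random.Random(1)
x = [data_rng.gauss(-5.0, 1.0) for _ in range(100)]
x += [data_rng.gauss(5.0, 1.0) for _ in range(100)]
mu, w, y = gibbs_mixture(x, G=2)
```

The function returns only the final state of the chain; a practical sampler would store every iteration and discard a burn-in period.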

  8. Reversible Jump MCMC. What if $G$ is unknown?
     - Treat $G$ as an unknown parameter, so that the model dimension changes.
     - Use RJMCMC by Green (1995, Biometrika).
     - It allows moves between parameter spaces with different dimensions.
     - Additional random variables are introduced to ensure dimension matching.

  9. The general idea for reversible jump, given unknown $G$ and $\theta_k$, is:
     - From a starting state $(G, \theta_G)$, propose a new model $G^*$ with probability $J_{G,G^*}$ and generate an augmenting random variable $u$ from a proposal $J(u \mid G, G^*, \theta_G)$.
     - Determine the proposed model parameters as $\theta_{G^*} = g_{G,G^*}(\theta_G, u)$, where $g$ is a deterministic function that relates the parameters of model $G$ to those of $G^*$.
     - Accept the new model with probability $\min(r, 1)$, where
     $$r = \frac{p(x \mid \theta_{G^*}) \, \pi(\theta_{G^*}) \, \pi_{G^*} \, J_{G^*,G} \, J(u \mid G^*, G, \theta_{G^*})}{p(x \mid \theta_G) \, \pi(\theta_G) \, \pi_G \, J_{G,G^*} \, J(u \mid G, G^*, \theta_G)} \times |\text{Jacobian}|$$
     with
     $$\text{Jacobian} = \frac{\nabla g_{G,G^*}(\theta_G, u)}{\nabla(\theta_G, u)}$$
     The numerator carries the probability $J_{G^*,G}$ of the reverse move, and the denominator the probability $J_{G,G^*}$ of the proposed move.

  10. Reversible jump can be thought of as a generalization of the Metropolis-Hastings (MH) sampler:
      - MH sampler: $r = \{\text{likelihood} \times \text{prior} \times \text{proposal ratios}\}$
      - RJ sampler: $r = \{\text{likelihood} \times \text{prior} \times \text{proposal ratios} \times \text{Jacobian}\}$

      An implementation usually has three kinds of moves:
      - BIRTH: move to dimension $k + 1$
      - DEATH: move to dimension $k - 1$
      - MOVE: move within dimension $k$

      The models are not necessarily nested.
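On the log scale, the acceptance probability is a sum of the factors listed above, with the MH case recovered when the log Jacobian term is zero. The helper below is a small sketch; its name and argument names are hypothetical, and it assumes the caller has already computed each log ratio as (proposed state) minus (current state).

```python
import math

def acceptance_probability(log_lik_ratio, log_prior_ratio,
                           log_proposal_ratio, log_abs_jacobian=0.0):
    """Return min(1, r) where log r is the sum of the four log factors.

    With log_abs_jacobian = 0.0 this is the ordinary MH acceptance
    probability; a dimension-changing move supplies the log absolute
    Jacobian determinant of the mapping g.
    """
    log_r = (log_lik_ratio + log_prior_ratio
             + log_proposal_ratio + log_abs_jacobian)
    # Avoid overflow in exp() when the move is clearly accepted.
    return 1.0 if log_r >= 0 else math.exp(log_r)

p_mh = acceptance_probability(0.0, 0.0, 0.0)
p_rj = acceptance_probability(-1.0, 0.0, 0.0, math.log(2.0))
```

Working with logs avoids numerical underflow when likelihood values are very small, which is typical with many variables.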

  11. Variable Selection. Discriminating variables define a mixture of $G$ distributions. Introduce a latent $p$-vector $\gamma$ with binary entries:
      $$\gamma_j = \begin{cases} 1 & \text{if variable } j \text{ defines a mixture distribution} \\ 0 & \text{otherwise.} \end{cases}$$
      The likelihood function is given by
      $$L(G, \gamma, w, \mu, \Sigma, \eta, \Omega \mid X, y) = \prod_{k=1}^{G} (2\pi)^{-\frac{p_\gamma n_k}{2}} \, |\Sigma_{k(\gamma)}|^{-\frac{n_k}{2}} \, w_k^{n_k} \exp\left\{ -\frac{1}{2} \sum_{x_i \in C_k} (x_{i(\gamma)} - \mu_{k(\gamma)})^T \Sigma_{k(\gamma)}^{-1} (x_{i(\gamma)} - \mu_{k(\gamma)}) \right\} \times \phi(X_{(\gamma^c)} \mid \eta_{(\gamma^c)}, \Omega_{(\gamma^c)}),$$
      where $C_k = \{x_i \mid y_i = k\}$ with cardinality $n_k$, $p_\gamma = \sum_j \gamma_j$ is the number of selected variables, and $\phi(\cdot)$ is the multivariate normal density.

  12. Prior Model.
      - Assume the $\gamma_j$'s are independent Bernoulli variables.
      - The number of components, $G$, can be assumed to follow a truncated Poisson or a discrete Uniform on $[2, \ldots, G_{\max}]$.
      - $w \mid G \sim \mathrm{Dirichlet}(\alpha, \ldots, \alpha)$.
      - For the selected variables,
      $$\mu_{k(\gamma)} \mid \Sigma_{k(\gamma)}, G \sim N(\mu_{0(\gamma)}, h \Sigma_{k(\gamma)}), \qquad \Sigma_{k(\gamma)} \mid G \sim IW(\delta; Q_{\gamma}),$$
      where $(\gamma)$ indicates the covariates with $\gamma_j = 1$.
      - Conjugate priors on the parameters for the case $\gamma_j = 0$.
      - We work with a marginalized likelihood.

  13. Model Fitting.
      - (1) Update $\gamma$ by a Metropolis algorithm (add/delete and swap moves).
      - (2) Update $w$ from its full conditional (Dirichlet draw).
      - (3) Update $y$ from its full conditional (multinomial draw).
      - (4) Split one cluster into two, or merge two into one.
      - (5) Birth or death of an empty component.

      Steps (4) and (5) are carried out via reversible jump MCMC extended to the multivariate setting.

  14. Updating $\gamma$. Metropolis move to update $\gamma^{old}$ to $\gamma^{new}$:
      - (a) Add/delete: randomly choose a $\gamma_j$ and change its value.
      - (b) Swap: randomly choose a 0 and a 1 in $\gamma^{old}$ and switch their values.

      The new candidate $\gamma^{new}$ is accepted with probability
      $$\min\left\{ 1, \frac{f(\gamma^{new} \mid X, G, w, y)}{f(\gamma^{old} \mid X, G, w, y)} \right\}.$$
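The add/delete and swap moves can be sketched as follows. This is an illustrative sketch only: `log_post` stands in for $\log f(\gamma \mid X, G, w, y)$ and is supplied by the caller, and the names `propose_gamma`, `metropolis_step`, and `p_add_delete` are hypothetical. When a swap is drawn but impossible (all zeros or all ones), the sketch proposes no change, which keeps the proposal symmetric so that the plain Metropolis ratio applies.

```python
import math
import random

def propose_gamma(gamma, rng, p_add_delete=0.5):
    """Propose a new 0/1 vector by an add/delete move or a swap move."""
    new = list(gamma)
    if rng.random() < p_add_delete:
        # Add/delete: flip one randomly chosen entry.
        j = rng.randrange(len(new))
        new[j] = 1 - new[j]
    else:
        ones = [j for j, g in enumerate(new) if g == 1]
        zeros = [j for j, g in enumerate(new) if g == 0]
        if ones and zeros:
            # Swap: exchange a randomly chosen 1 with a randomly chosen 0.
            new[rng.choice(ones)] = 0
            new[rng.choice(zeros)] = 1
    return new

def metropolis_step(gamma, log_post, rng):
    """Accept the candidate with probability min(1, f(new) / f(old))."""
    cand = propose_gamma(gamma, rng)
    log_ratio = log_post(cand) - log_post(gamma)
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        return cand
    return gamma

# Toy target that strongly favors one particular subset of variables.
target = [1, 0, 1, 0, 0]
toy_log_post = lambda g: -50.0 * sum(abs(a - b) for a, b in zip(g, target))
rng = random.Random(0)
state = [0, 0, 0, 0, 0]
for _ in range(2000):
    state = metropolis_step(state, toy_log_post, rng)
```

The toy log posterior is used only to exercise the moves; in the model above it would be replaced by the marginalized likelihood times the prior on $\gamma$.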

  15. Updating $w$ and $y$.
      - $w \mid G, \gamma, y, X \sim \mathrm{Dirichlet}(\alpha + n_1, \ldots, \alpha + n_G)$.
      - $y$ is updated one element at a time from $f(y_i = k \mid X, y_{(-i)}, \gamma, w, G)$.

  16. Split/merge and birth/death moves. The proposal for these moves in the multivariate setting is intricate:
      - It is necessary to integrate out $\mu$ and $\Sigma$.
      - Deriving $f(y_i \mid X, G, w, \gamma)$ is computationally prohibitive.
      - Defining adjacency in the multivariate setting is not straightforward.

  17. Posterior Inference for $y$.
      - The number of clusters, $G$, is estimated by the value most frequently visited by the MCMC sampler.
      - Estimate the marginal posterior probabilities $p(y_i = k \mid X, G)$.
      - The posterior allocation of sample $i$ is estimated as the most probable cluster,
      $$\hat{y}_i = \arg\max_{1 \le k \le G} \{ p(y_i = k \mid X, G) \}.$$
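Given stored MCMC output, both estimates reduce to counting. The sketch below assumes the draws are kept as plain Python lists, that cluster labels are coded `0, ..., G-1`, and that labels are comparable across iterations (label switching is not handled here); the function names are hypothetical.

```python
from collections import Counter

def estimate_G(G_draws):
    """Return the number of components visited most often by the sampler."""
    return Counter(G_draws).most_common(1)[0][0]

def estimate_allocations(y_draws, G):
    """Allocate each sample to the cluster with the largest estimated
    marginal posterior probability p(y_i = k | X, G).

    y_draws holds one allocation vector per retained iteration, all from
    iterations with exactly G components.
    """
    n_draws = len(y_draws)
    n = len(y_draws[0])
    allocations = []
    for i in range(n):
        counts = Counter(draw[i] for draw in y_draws)
        probs = [counts.get(k, 0) / n_draws for k in range(G)]
        allocations.append(max(range(G), key=lambda k: probs[k]))
    return allocations

G_hat = estimate_G([2, 3, 3, 3, 4])
y_hat = estimate_allocations([[0, 1, 1], [0, 1, 0], [0, 1, 1]], G=2)
```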

  18. Posterior Inference for $\gamma$.
      - Select the variables with the largest marginal posterior probability $p(\gamma_j = 1 \mid X, G)$.
      - Select the variables that are in the “best” models,
      $$\hat{\gamma}^* = \underset{1 \le t \le M}{\arg\max} \left\{ p(\gamma^{(t)} \mid X, G, \hat{w}, \hat{y}) \right\},$$
      with $\hat{y}$ the estimated sample allocations and $\hat{w} = \frac{1}{M} \sum_{t=1}^{M} w^{(t)}$.

      Tadesse, Sha and Vannucci (JASA, 2005)
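The first selection rule can be approximated from the sampled $\gamma^{(t)}$ vectors by averaging each entry over iterations. In the sketch below, the function names and the 0.5 cutoff are assumptions made for illustration; the slides do not specify a threshold.

```python
def inclusion_probabilities(gamma_draws):
    """Estimate p(gamma_j = 1 | X, G) as the fraction of draws with gamma_j = 1."""
    M = len(gamma_draws)
    p = len(gamma_draws[0])
    return [sum(g[j] for g in gamma_draws) / M for j in range(p)]

def select_variables(gamma_draws, threshold=0.5):
    """Return indices whose estimated inclusion probability exceeds threshold."""
    probs = inclusion_probabilities(gamma_draws)
    return [j for j, pr in enumerate(probs) if pr > threshold]

gamma_draws = [[1, 0, 1], [1, 0, 0], [1, 1, 0], [1, 0, 1]]
probs = inclusion_probabilities(gamma_draws)
selected = select_variables(gamma_draws)
```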

  19. Infinite Mixture Models via Dirichlet Process Priors. Integrating over $w$ and taking $G \to \infty$ we get
      $$p(y_i = k \text{ and } y_l = k \text{ for some } l \ne i \mid y_{-i}) = \frac{n_{-i,k}}{n - 1 + \alpha}$$
      $$p(y_i \ne y_l \text{ for all } l \ne i \mid y_{-i}) = \frac{\alpha}{n - 1 + \alpha},$$
      where $n_{-i,k}$ is the number of samples other than $i$ allocated to cluster $k$.
      - (1) MCMC updates $\gamma$ via Metropolis and $y_i$ from the full conditionals $p(y_i = k \text{ and } y_l = k \text{ for some } l \ne i \mid y_{-i}, X, \gamma)$ and $p(y_i \ne y_l \text{ for all } l \ne i \mid y_{-i}, X, \gamma)$.
      - (2) Inference on $y$ by MAP or by estimating $p(y_i = y_j \mid X)$. Same as before for $\gamma$.

      This is a natural approach to clustering (samples from a DP can have a number of ties).

      Kim, Tadesse and Vannucci (Biometrika, 2006)
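The two prior probabilities above depend only on the current cluster sizes and on $\alpha$. A minimal sketch, with a hypothetical function name and with allocations stored as a plain list of labels:

```python
from collections import Counter

def dp_prior_allocation_probs(y_minus_i, alpha):
    """Prior probabilities for y_i given the other allocations y_{-i}.

    Returns a dict mapping each existing cluster k to
    n_{-i,k} / (n - 1 + alpha), and the probability
    alpha / (n - 1 + alpha) of opening a new cluster.
    """
    denom = len(y_minus_i) + alpha  # len(y_minus_i) equals n - 1
    counts = Counter(y_minus_i)
    existing = {k: c / denom for k, c in counts.items()}
    new_cluster = alpha / denom
    return existing, new_cluster

existing, new_cluster = dp_prior_allocation_probs([0, 0, 1], alpha=1.0)
```

In the sampler these prior weights would be combined with the likelihood of $x_i$ under each cluster to form the full conditionals in step (1).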
