
Variational Greedy Algorithm for Clustering of Grouped Data



  1. Variational Greedy Algorithm for Clustering of Grouped Data
     Linda S. L. Tan (joint work with A/Prof. David J. Nott), National University of Singapore
     ICSA 2013, 20–23 Dec

  2. Presentation Outline
     1 Motivation
     2 Mixtures of Linear Mixed Models
     3 Variational Approximation
     4 Hierarchical Centering
     5 Variational Greedy Algorithm
     6 Examples
     7 Conclusion and Future Work

  3. Motivation
     Problem: clustering correlated or replicated grouped data.
     Example: gene expression profiles. Clustering is used to find co-regulated and functionally related groups of genes (e.g. Celeux et al., 2005).
     [Figure: time course data (Spellman et al., 1998); x-axis: time points, y-axis: gene expression levels.]

  4. Approach
     Consider mixtures of linear mixed models (MLMMs):
     - Provide a mathematical framework for clustering grouped data.
     - Allow covariate information to be incorporated.
     - Typically estimated using the EM algorithm (likelihood maximization), with model selection via penalized log-likelihood criteria, e.g. BIC (Celeux et al. 2005; Ng et al. 2006).
     We develop a variational greedy algorithm (VGA) for fitting MLMMs:
     - Automatic: performs parameter estimation and model selection simultaneously.
     - Reparametrizes the MLMM using hierarchical centering when certain parameters are weakly identifiable; we report a gain in efficiency in variational algorithms due to hierarchical centering (similar to MCMC), with some theoretical support.

  5. Mixture of linear mixed models (MLMMs)
     Observe y_i = [y_i1, ..., y_in_i]^T for i = 1, ..., n. Number of mixture components: k.
     δ_i: latent mixture component indicators. Conditional on δ_i = j,
         y_i = X_i β_j + W_i a_i + V_i b_j + ε_i,
     where X_i, W_i and V_i are design matrices, β_j are fixed effects, a_i ~ N(0, σ²_aj I) and b_j ~ N(0, σ²_bj I) are random effects, and ε_i ~ N(0, Σ_ij) is the error vector.
     Mixture weights vary with covariates through a multinomial logit model:
         P(δ_i = j | γ) = exp(u_i^T γ_j) / Σ_{l=1}^k exp(u_i^T γ_l),
     where u_i is a vector of covariates, γ_1 ≡ 0, and γ_2, ..., γ_k are unknown parameters.
     Priors (Bayesian approach): γ ~ N(0, Σ_γ), β_j ~ N(0, Σ_βj), σ²_aj ~ IG(α_aj, λ_aj), σ²_bj ~ IG(α_bj, λ_bj) and σ²_jl ~ IG(α_jl, λ_jl).
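A minimal simulation sketch of this model may help fix the notation. It is not the authors' code; the dimensions, parameter values and the choice X_i = W_i = V_i are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
n, n_i, p, k = 200, 8, 3, 2           # groups, observations per group, covariates, components

# Component-specific parameters (illustrative values)
beta = rng.normal(size=(k, p))         # fixed effects beta_j
sigma2_a = np.array([0.3, 0.5])        # random-effect variances sigma^2_aj
sigma2_b = np.array([0.2, 0.4])        # sigma^2_bj
sigma2_e = np.array([0.1, 0.1])        # error variances (here Sigma_ij = sigma^2_j I)
gamma = np.vstack([np.zeros(2), rng.normal(size=(k - 1, 2))])  # gamma_1 = 0

b = np.sqrt(sigma2_b)[:, None] * rng.normal(size=(k, p))       # b_j ~ N(0, sigma^2_bj I)
y, delta = [], []
for i in range(n):
    X = rng.normal(size=(n_i, p))                   # X_i = W_i = V_i (assumption)
    u = np.array([1.0, rng.normal()])               # covariates u_i for the mixture weights
    w = np.exp(gamma @ u); w /= w.sum()             # multinomial logit weights
    j = rng.choice(k, p=w)                          # delta_i = j
    a = np.sqrt(sigma2_a[j]) * rng.normal(size=p)   # a_i ~ N(0, sigma^2_aj I)
    eps = np.sqrt(sigma2_e[j]) * rng.normal(size=n_i)
    y.append(X @ (beta[j] + a + b[j]) + eps)        # y_i = X_i beta_j + W_i a_i + V_i b_j + eps_i
    delta.append(j)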

  6. Introduction to variational approximation
     Fast, deterministic and flexible technique.
     Bayesian inference: approximate the intractable true posterior p(θ | y) by a more tractable q(θ), e.g. assume
     1 q(θ) belongs to some parametric family, or
     2 q(θ) = ∏_{i=1}^m q_i(θ_i) for θ = {θ_1, ..., θ_m} (Variational Bayes).
     Minimize the Kullback-Leibler divergence between q(θ) and p(θ | y). This is equivalent to maximizing the lower bound
         L = ∫ q(θ) {log p(y, θ) − log q(θ)} dθ
     on the log marginal likelihood log p(y). L is sometimes used for Bayesian model selection.
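The equivalence between minimizing the KL divergence and maximizing L follows from a standard identity (general variational inference background, not specific to this talk):

\log p(y) = \int q(\theta)\,\log\frac{p(y,\theta)}{q(\theta)}\,d\theta
          + \int q(\theta)\,\log\frac{q(\theta)}{p(\theta\mid y)}\,d\theta
          = \mathcal{L} + \mathrm{KL}\big(q(\theta)\,\|\,p(\theta\mid y)\big).

Since the KL term is nonnegative and log p(y) does not depend on q, maximizing L is equivalent to minimizing the KL divergence, and L ≤ log p(y), which is why L can stand in for the log marginal likelihood in model selection.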

  7. Variational approximation for MLMMs
     Assume
         q(θ) = q(γ) ∏_{i=1}^n {q(a_i) q(δ_i)} ∏_{j=1}^k {q(β_j) q(b_j) q(σ²_aj) q(σ²_bj) ∏_{l=1}^g q(σ²_jl)},
     with
         q(a_i) = N(μ^q_ai, Σ^q_ai), q(β_j) = N(μ^q_βj, Σ^q_βj), q(b_j) = N(μ^q_bj, Σ^q_bj),
         q(σ²_aj) = IG(α^q_aj, λ^q_aj), q(σ²_bj) = IG(α^q_bj, λ^q_bj), q(σ²_jl) = IG(α^q_jl, λ^q_jl),
         q(δ_i = j) = q_ij with Σ_{j=1}^k q_ij = 1 for all i, and q(γ) = 1{γ = μ^q_γ} (a point mass, for a tractable L).
     Optimize L w.r.t. the variational parameters in a gradient ascent algorithm:
     - Conditional mode of μ^q_γ: iteratively weighted least squares.
     - Closed-form updates for all other variational parameters.
     Relax q(γ) to a normal distribution at convergence (Waterhouse et al., 1996).
     Obtain an approximation L* to log p(y) (used for model selection in VGA).
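The closed-form updates follow from the standard mean-field result (again general background rather than a result of this talk): holding the other factors fixed, the optimal factor satisfies

q_i^*(\theta_i) \propto \exp\big\{ \mathbb{E}_{q(\theta_{-i})}\left[ \log p(y, \theta) \right] \big\},

so with the conjugate normal and inverse gamma priors above, each factor stays in its assumed family. The multinomial logit parameters γ are not conditionally conjugate, which is why q(γ) is treated differently here (a point mass for a tractable bound, relaxed to a normal at convergence).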

  8. Hierarchical centering
     Recall: y_i = X_i β_j + W_i a_i + V_i b_j + ε_i conditional on δ_i = j.
     1 Partial centering (X_i = W_i): introduce η_i = β_j + a_i ~ N(β_j, σ²_aj I), so y_i = X_i η_i + V_i b_j + ε_i.
     2 Full centering (X_i = W_i = V_i): introduce ν_j = β_j + b_j ~ N(β_j, σ²_bj I) and ρ_i = ν_j + a_i ~ N(ν_j, σ²_aj I), so y_i = X_i ρ_i + ε_i.
     We derive lower bounds and algorithms for these two cases and observe a gain in efficiency through centering similar to MCMC.
     Theoretical support: we prove that the rate of convergence of variational Bayes algorithms by Gaussian approximation is equal to that of the corresponding Gibbs sampler. The result is not directly applicable to MLMMs, but it suggests hierarchical centering may lead to improved convergence in variational algorithms just as in MCMC.
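Both centerings are exact reparametrizations of the same model. A small numerical check (illustrative dimensions and values, not from the paper) confirms that the uncentered and fully centered forms give identical responses when X_i = W_i = V_i:

import numpy as np

rng = np.random.default_rng(2)
p, n_i = 3, 6
X = rng.normal(size=(n_i, p))                    # X_i = W_i = V_i (full centering case)
beta = rng.normal(size=p)                        # beta_j
b = rng.normal(size=p)                           # b_j
a = rng.normal(size=p)                           # a_i
eps = rng.normal(size=n_i)

y_uncentered = X @ beta + X @ a + X @ b + eps    # y_i = X_i beta_j + W_i a_i + V_i b_j + eps_i
nu = beta + b                                    # nu_j  = beta_j + b_j
rho = nu + a                                     # rho_i = nu_j + a_i
y_centered = X @ rho + eps                       # y_i = X_i rho_i + eps_i
print(np.allclose(y_uncentered, y_centered))     # True: same model, different parametrization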

  9. Variational greedy algorithm (VGA)
     Automatic: returns a plausible number of mixture components plus the fitted model. Bottom-up approach (VA: variational algorithm).
     1 Start by fitting a one-component mixture f_1.
     2 Search for the optimal way to split components in the current mixture f_k: randomly partition each component into two and apply a partial VA to the resulting mixture, updating only the variational parameters of the two split components. The trial with the highest L out of M yields the optimal way.
     3 Split components in f_k in descending order of L, applying a partial VA each time while keeping fixed the variational parameters of components awaiting to be split. A split is "successful" if L* increases. Stop once a split is unsuccessful.
     4 Apply the VA on the resulting mixture, updating all variational parameters.
     5 Repeat 2–4 until all splits of the current mixture are unsuccessful.
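A heavily simplified sketch of the greedy control flow, for intuition only: it replaces the MLMM, the component-wise splits and the partial updates described above with full refits of scikit-learn's BayesianGaussianMixture, and uses its lower_bound_ attribute as a rough stand-in for L* (sklearn's bound omits some constants, so the comparison is only illustrative). It captures just the "grow the mixture while the bound improves" logic.

import numpy as np
from sklearn.mixture import BayesianGaussianMixture

def fit_vb(X, k, seed):
    # Variational fit with k components; returns the model and its lower bound.
    m = BayesianGaussianMixture(n_components=k, max_iter=500, random_state=seed).fit(X)
    return m, m.lower_bound_

def greedy_mixture(X, max_components=20, n_trials=5):
    best, best_bound = fit_vb(X, 1, seed=0)          # step 1: one-component mixture
    k = 1
    while k < max_components:
        # step 2: several random trials with one extra component
        trials = [fit_vb(X, k + 1, seed=s) for s in range(n_trials)]
        cand, cand_bound = max(trials, key=lambda t: t[1])
        if cand_bound <= best_bound:                 # step 3: stop once no split helps
            break
        best, best_bound, k = cand, cand_bound, k + 1
    return best, k

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-3, 1, (100, 2)), rng.normal(3, 1, (100, 2))])
model, k = greedy_mixture(X)
print("components used:", k)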

  10. Variational greedy algorithm (VGA)
      Increasing efficiency:
      - Partial variational algorithms: only update the variational parameters of certain components instead of the entire mixture.
      - Component elimination property of variational Bayes: sieve out components that resist splitting.
      Optional merge moves may be carried out after VGA has converged.
      The greedy approach can be adapted to fit other mixture models.
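The component elimination property is easy to see in any variational Bayes mixture with a sparsity-inducing prior on the weights. The toy check below uses scikit-learn's BayesianGaussianMixture on made-up one-dimensional data (not the MLMM of this talk): superfluous components end up with weights near zero.

import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# Two well-separated clusters, but the mixture is given 8 components.
X = np.concatenate([rng.normal(-4, 1, 300), rng.normal(4, 1, 300)]).reshape(-1, 1)
m = BayesianGaussianMixture(n_components=8, weight_concentration_prior=0.01,
                            max_iter=1000, random_state=0).fit(X)
print(np.round(m.weights_, 3))   # most of the 8 weights are driven close to zero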

  11. Example: Time course data (Spellman et al. 1998)
      α-factor synchronization: yeast cells sampled at 7 min intervals for 119 mins (18 time points), 612 genes.
      [Figure: time course data (Spellman et al., 1998); x-axis: time points, y-axis: gene expression levels.]
      Apply VGA (without hierarchical centering) ten times: three 15-component mixtures, five 17-component mixtures and two 18-component mixtures.
      Apply merge moves: three 17-component mixtures reduce to 16 components and both 18-component mixtures reduce to 17 components.
      It is possible for VGA to overestimate the number of mixture components, but the variation in the number of components returned is relatively small.

  12. Example: Time course data
      Clustering from a 16-component mixture, obtained after applying one merge move to a 17-component mixture produced by VGA.
      [Figure: 16 panels, clusters 1–16 with 37, 105, 41, 20, 8, 64, 65, 79, 25, 17, 15, 49, 13, 37, 31 and 6 genes respectively; x-axis: time points, y-axis: gene expression levels; the black line is the posterior mean of the fixed effects.]

  13. Example: Synthetic data set (Yeung et al., 2003)
      400 gene expressions (4 repeated measurements). Model: X_i = W_i.
      Apply VGA (with partial centering and without centering) five times each.
      Adjusted Rand index: measures the degree of agreement between true and fitted clusters.

      Centering                      No            Partial
      Average adjusted Rand index    < 0.01        0.99
      No. of components returned     2 comp × 5    6 comp × 3, 7 comp × 2

      Hierarchical centering produced much better clustering results.
      The number of mixture components returned by VGA is very close to the true number of components.
      [Figure: six panels, clusters 1–6 with 67, 67, 67, 67, 66 and 66 genes; x-axis: experiments, y-axis: gene expression levels.]
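The adjusted Rand index itself is standard; a tiny illustration with scikit-learn (the label vectors below are made up, not from this data set):

from sklearn.metrics import adjusted_rand_score

true_labels   = [0, 0, 0, 1, 1, 1, 2, 2, 2]
fitted_labels = [0, 0, 1, 1, 1, 1, 2, 2, 0]
# 1 means perfect agreement; values near 0 mean chance-level agreement.
print(adjusted_rand_score(true_labels, fitted_labels))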
