Convex Biclustering Eric Chi Rice University joint work with Genevera Allen and Rich Baraniuk E. Chi Convex Biclustering 1
The Biclustering Problem
  Task: Given a data matrix $X \in \mathbb{R}^{p \times n}$, find subgroups of rows and columns that go together.
  Text mining: similar documents share a small set of highly correlated words.
  Collaborative filtering: like-minded customers share similar preferences for a subset of products.
  Cancer genomics: subtypes of cancerous tumors share similar molecular profiles over a subset of genes.
Cancer Genomics
  Collect expression data.
  Which genes are driving "lung cancer"? These genes are potential drug targets.
  "Lung cancer" is heterogeneous at the molecular level.
  [Figure: expression data with axes Genes and Tissue Sample; sample labels include Colon, Carcinoid, SmallCell, and a label truncated to "mal"]
Simple Solution: Cluster Dendrogram
  Hierarchical clustering
  [Figure: axes Genes and Tissue Sample]
Hierarchical Clustering
  [Figure: five points A-E in the (x, y) plane beside the dendrogram, with leaves ordered A, E, B, C, D; successive builds add the internal nodes F, G, H, and I]
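The greedy merging that the dendrogram builds illustrate can be sketched in a few lines of Python. This is a minimal sketch, not the speaker's code: it assumes single linkage (the slides do not say which linkage is used), the function name `single_linkage` is hypothetical, and the coordinates are made up for illustration because the plotted values are not recoverable from the slide.

```python
import math

def single_linkage(points):
    """Greedy agglomerative clustering with single linkage (an assumed choice).

    points: dict mapping a label to an (x, y) tuple.
    Returns the merges in order as (cluster_a, cluster_b, distance).
    """
    clusters = {name: [name] for name in points}
    merges = []
    while len(clusters) > 1:
        best = None
        names = sorted(clusters)
        # Find the closest pair of current clusters.
        for idx, a in enumerate(names):
            for b in names[idx + 1:]:
                d = min(math.dist(points[p], points[q])
                        for p in clusters[a] for q in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        # Merge the pair and record the step.
        clusters["(" + a + "+" + b + ")"] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d))
    return merges

# Hypothetical coordinates, chosen only so that C and D merge first.
points = {"A": (-0.4, 0.0), "E": (-0.4, 0.6), "B": (0.6, 2.3),
          "C": (0.5, 1.5), "D": (0.7, 1.5)}
merges = single_linkage(points)
for a, b, d in merges:
    print(f"merge {a} and {b} at distance {d:.3f}")
```

Each pass commits to the locally best merge and never revisits it, which is the greedy behavior the deck contrasts with a convex formulation.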
Simple Solution: Cluster Dendrogram
  The Good: easy to interpret; fast computation (greedy algorithm).
  The Bad: non-convex optimization problem; local minimizers; instability (with respect to initialization, tuning parameters, or data).
  The Ugly: how to choose the number of biclusters?
More Sophisticated Approaches
  SVD-like methods: Plaid - Lazzeroni & Owen (2000); Iterative signature algorithm - Bergmann et al. (2003); Sparse SVD - Lee et al. (2010)
  Graph cut - Dhillon (2001), Kluger (2003)
  LAS - Shabalin et al. (2009)
  Sparse transposable biclustering - Tan & Witten (2013)
  Harmonic Analysis of Digital Databases - Coifman & Gavish (2010)
  Goal: simple and interpretable like the clustered dendrogram, with good algorithmic behavior: a global minimizer, and stability with respect to data and other inputs.
Solution: Convex Relaxation
  Solve a combinatorially hard problem with a convex surrogate.
  All local minima are global minima.
  Algorithms converge to a global minimizer regardless of initialization.
  Solve a convex optimization problem to go from A to B.
  [Figure: panels A and B; sample labels include Colon, Carcinoid, SmallCell, and a label truncated to "mal"]
Convex Biclustering
  Contributions:
  Characterization of the solution to the convex program
  Stability of the solution in tuning parameters and data
  Simple, intuitive meta-algorithm to get the unique global minimizer: alternate convex clustering of rows and columns
  Essentially one tuning parameter controls the number of biclusters
  Data-adaptive way of selecting the number of biclusters
Convex Clustering
  Not much existing work, most is recent: Pelckmans et al. 2005, Lindsten et al. 2011, Hocking et al. 2011, Chi & Lange 2013.
  $$\underset{u}{\text{minimize}} \;\; \frac{1}{2} \sum_{i=1}^{n} \| x_i - u_i \|_2^2 + \gamma \sum_{i<j} w_{ij} \| u_i - u_j \|_2$$
  Assign a centroid $u_i$ to each data point $x_i$. p ≥ 1 okay.
  Convex fusion penalty shrinks cluster centroids together: sparsity in pairwise differences of centroids.
  Too many degrees of freedom!
  $u_i - u_j = 0 \iff x_i$ and $x_j$ belong to the same cluster.
  $\gamma$: tunes the overall amount of regularization
  $w_{ij}$: fine-tunes pairwise shrinkage
  Generalizes fused lasso / edge lasso (Sharpnack et al. 2012)
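To make the objective concrete, here is a minimal pure-Python sketch that minimizes it by projected gradient ascent on the dual, in the spirit of the AMA scheme from Chi & Lange 2013. It is an illustration under stated assumptions, not the speaker's implementation: the names `convex_cluster` and `centroids_from_duals`, the fixed iteration count, the step size 1/n, and the toy data are all choices made here.

```python
import math

def centroids_from_duals(X, lam):
    """Primal update: u_i = x_i + (duals on edges (i, .)) - (duals on edges (., i))."""
    U = [list(x) for x in X]
    for (i, j), l in lam.items():
        for k, v in enumerate(l):
            U[i][k] += v
            U[j][k] -= v
    return U

def convex_cluster(X, weights, gamma, n_iter=2000):
    """Approximately minimize
        1/2 * sum_i ||x_i - u_i||^2 + gamma * sum_{i<j} w_ij * ||u_i - u_j||_2.

    X: list of points (lists of floats).
    weights: dict {(i, j): w_ij} with i < j; missing pairs have weight zero.
    """
    n, p = len(X), len(X[0])
    nu = 1.0 / n  # assumed step size; the graph Laplacian's top eigenvalue is at most n
    lam = {edge: [0.0] * p for edge in weights}
    for _ in range(n_iter):
        U = centroids_from_duals(X, lam)
        for (i, j), w in weights.items():
            # Gradient step on the dual variable of edge (i, j) ...
            g = [lam[(i, j)][k] - nu * (U[i][k] - U[j][k]) for k in range(p)]
            # ... followed by projection onto the ball of radius gamma * w_ij.
            norm = math.sqrt(sum(v * v for v in g))
            radius = gamma * w
            scale = 1.0 if norm <= radius else radius / norm
            lam[(i, j)] = [scale * v for v in g]
    return centroids_from_duals(X, lam)

# Toy data (hypothetical): two tight pairs far apart, uniform weights.
X = [[0.0, 0.0], [0.2, 0.0], [3.0, 3.0], [3.2, 3.0]]
weights = {(i, j): 1.0 for i in range(4) for j in range(i + 1, 4)}
for gamma in (0.0, 0.2, 10.0):
    U = convex_cluster(X, weights, gamma)
    print(gamma, [[round(v, 3) for v in u] for u in U])
```

On this toy data the centroids stay at the data points when gamma is zero, the two tight pairs each fuse at a moderate gamma, and everything collapses to the overall mean once gamma is large.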
Choosing weights
  Rules of thumb:
  $w_{ij} \propto \| x_i - x_j \|^{-1}$
  Most $w_{ij} = 0$
  Why?
  Encourage similar points to fuse early → better clusterings
  Computation and storage scale with the number of non-zero $w_{ij}$
  Fiddle free; set and forget
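One way to follow both rules of thumb is sketched below: inverse-distance weights kept only between nearest neighbors. The slide does not say how the zero pattern is chosen, so the k-nearest-neighbor rule is an assumption here, as are the function name `knn_inverse_distance_weights` and the toy data; the sketch also assumes all points are distinct.

```python
import math

def knn_inverse_distance_weights(X, k):
    """Sparse weights w_ij = 1 / ||x_i - x_j|| on k-nearest-neighbor pairs.

    X: list of distinct points (lists of floats).
    Returns a dict {(i, j): w_ij} with i < j; every pair not listed has weight zero.
    """
    n = len(X)
    dist = [[math.dist(X[i], X[j]) for j in range(n)] for i in range(n)]
    weights = {}
    for i in range(n):
        # The k points closest to x_i, excluding x_i itself.
        nearest = sorted((j for j in range(n) if j != i),
                         key=lambda j: dist[i][j])[:k]
        for j in nearest:
            # Keep the pair if either point lists the other as a neighbor.
            weights[(min(i, j), max(i, j))] = 1.0 / dist[i][j]
    return weights

# Toy data (hypothetical): two tight pairs far apart.
X = [[0.0, 0.0], [0.2, 0.0], [3.0, 3.0], [3.2, 3.0]]
w = knn_inverse_distance_weights(X, k=1)
print(w)
```

With k = 1 only the two within-pair weights are non-zero, so the number of stored weights grows with n times k rather than with the number of all pairs.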
The Solution Path
  [Figure: scatter plot of the data in the (x, y) plane, annotated with increasing $\gamma$; the two builds show the centroid paths]
  $$\text{minimize} \;\; \frac{1}{2} \sum_{i=1}^{n} \| x_i - u_i \|_2^2 + \gamma \sum_{i<j} w_{ij} \| u_i - u_j \|_2$$
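For intuition about what a path looks like, the smallest case can be worked by hand. The following is a sketch for two points with a single weight $w = w_{12}$, derived here for illustration and not taken from the slides; the symbols $D$, $d$, and $t$ are introduced only for this example.

```latex
% Two distinct points x_1, x_2; D = \|x_1 - x_2\|_2 > 0; unit direction d = (x_2 - x_1)/D.
\begin{align*}
&\underset{u_1, u_2}{\text{minimize}} \;\; \tfrac{1}{2}\|x_1 - u_1\|_2^2 + \tfrac{1}{2}\|x_2 - u_2\|_2^2 + \gamma w \|u_1 - u_2\|_2 \\
&\text{By symmetry, write } u_1 = x_1 + t\,d, \quad u_2 = x_2 - t\,d, \quad 0 \le t \le D/2. \\
&\text{The objective becomes } t^2 + \gamma w\,(D - 2t), \text{ so } 2t - 2\gamma w = 0 \;\Rightarrow\; t = \gamma w. \\
&u_1(\gamma) = x_1 + \min(\gamma w,\, D/2)\, d, \qquad u_2(\gamma) = x_2 - \min(\gamma w,\, D/2)\, d.
\end{align*}
```

Under these assumptions the two centroids move toward each other linearly in $\gamma$ and fuse at the midpoint once $\gamma \ge D/(2w)$, after which the path is constant.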