
Co-manifold learning with missing data
Gal Mishne, Eric C. Chi and Ronald R. Coifman
Department of Mathematics, Yale University; Department of Statistics, North Carolina State University
June 12, 2019


  1. Co-manifold learning with missing data

  2. The Biclustering Problem
     Task: Given a data matrix $X \in \mathbb{R}^{n \times p}$, find subgroups of rows and columns that go together.
     - Text mining: similar documents share a small set of highly correlated words.
     - Collaborative filtering: like-minded customers share similar preferences for a subset of products.
     - Cancer genomics: subtypes of cancerous tumors share similar molecular profiles over a subset of genes.

  3. Cancer Genomics
     - Lung cancer is heterogeneous at the molecular level.
     - Which genes are driving lung cancer? These genes are potential drug targets.
     - Collect expression data.

  4. Simple Solution: Cluster Dendrogram
     Each dendrogram (one for the rows, one for the columns) is constructed independently, ignoring the multiscale structure in the other dimension.
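To make the independence concrete, the sketch below builds a row tree from the rows of X and a column tree from the columns of X, each without looking at the other. It assumes average linkage with Euclidean distances, which the slide does not specify; the helper `average_linkage` is hypothetical and deliberately naive (cubic time), meant only for illustration.

```python
import numpy as np


def average_linkage(data):
    """Naive average-linkage agglomerative clustering of the rows of `data`.

    Returns a list of (members_a, members_b, height) merges in merge order.
    """
    n = data.shape[0]
    # Pairwise Euclidean distances between the original points.
    dist = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=-1)
    clusters = [[i] for i in range(n)]
    merges = []
    while len(clusters) > 1:
        best = (np.inf, -1, -1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster pairs.
                h = dist[np.ix_(clusters[a], clusters[b])].mean()
                if h < best[0]:
                    best = (h, a, b)
        h, a, b = best
        merges.append((clusters[a], clusters[b], h))
        clusters[a] = clusters[a] + clusters[b]  # new list; merge record keeps the old ones
        del clusters[b]  # b > a, so index a is unaffected
    return merges


rng = np.random.default_rng(0)
X = rng.normal(size=(8, 5))
row_tree = average_linkage(X)    # built from the rows only
col_tree = average_linkage(X.T)  # built from the columns only
```

Neither call sees the other tree, which is exactly the limitation the slide points out: structure shared between row groups and column groups never enters either dendrogram.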

  5. From Co-clustering to Co-Manifold Learning
     "I would add that in many real-world applications there is no 'true' fixed number of biclusters, i.e. the truth is a bit more continuous..." (Anonymous Referee 2)
     [Figure: a clustered dendrogram compared with a new row coordinate system and a new column coordinate system; the coordinate-system panels plot Intrinsic Coordinate 1 against Intrinsic Coordinate 2.]

  6. What if data matrices are not completely observed?
     Missing data scenario: the complete data is $X \in \mathbb{R}^{n \times p}$, but we only observe the entries indexed by $\Theta \subset \{1, \dots, n\} \times \{1, \dots, p\}$. This is possibly by design: it can be too expensive to collect or measure all $np$ possible entries.
     Goal: recover the row and column coordinate systems, not necessarily complete the missing data.
     $$P_\Theta(X)[i,j] = \begin{cases} X[i,j] & (i,j) \in \Theta \\ 0 & \text{otherwise} \end{cases}$$
     Example: $X[i,j] = \|y_i - z_j\|_2$, where the row points $y_i$ lie on a helix and the column points $z_j$ lie on a 2D plane.
     [Figure: 3D plot of the helix (y) and the 2D plane (z).]
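A minimal sketch of the helix/plane example and the projection $P_\Theta$. The sizes, the helix parametrization, the plane $z = 0$, and the uniform 50% missingness are all assumptions chosen for illustration, not the settings used in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 80

# Hypothetical row points y_i on a 3D helix.
t = np.linspace(0, 4 * np.pi, n)
Y = np.stack([np.cos(t), np.sin(t), t / (2 * np.pi)], axis=1)         # n x 3

# Hypothetical column points z_j on the 2D plane {third coordinate = 0} in R^3.
Z = np.column_stack([rng.uniform(-2, 2, size=(p, 2)), np.zeros(p)])   # p x 3

# X[i, j] = ||y_i - z_j||_2
X = np.linalg.norm(Y[:, None, :] - Z[None, :, :], axis=-1)

# Assumed missingness model: each entry is observed independently with prob. 0.5.
mask = rng.random((n, p)) < 0.5        # True exactly on Theta
P_theta_X = np.where(mask, X, 0.0)     # P_Theta(X): observed entries kept, rest set to 0
```

Representing $\Theta$ as a boolean mask keeps $P_\Theta$ a single `np.where`, and the same mask can be reused for the complementary projection $P_{\Theta^c}$.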

  7. Co-Manifold Learning
     1) Solve the co-clustering problem with missing data at multiple row and column scales.
     2) Build multiscale row and column metrics.
     3) Calculate nonlinear embeddings.

  8. Step 1: Co-clustering an Incomplete Data Matrix
     $$\min_U F(U) = \frac{1}{2}\|P_\Theta(X - U)\|_F^2 + \gamma_c \sum_{i<j} \Omega(\|U_{\cdot i} - U_{\cdot j}\|_2) + \gamma_r \sum_{k<l} \Omega(\|U_{k\cdot} - U_{l\cdot}\|_2)$$
     Folded concave penalty $\Omega$ $\Rightarrow$ less bias towards 0: large differences between rows or columns are shrunk less than under a convex norm penalty.
     [Figure: plot of the folded concave penalty $\Omega$ on $[-2, 2]$.]
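The objective can be evaluated directly with NumPy. The slide does not give a formula for $\Omega$, so this sketch assumes the hypothetical choice $\Omega(z) = 1 - e^{-z/\varepsilon}$, which is concave, nondecreasing on $[0, \infty)$, and satisfies $\Omega(0) = 0$. Unobserved entries of X may hold NaN; the mask keeps them out of the fit term.

```python
import numpy as np


def omega(z, eps=0.5):
    # Hypothetical folded concave penalty (assumption, not from the slides):
    # concave and nondecreasing on [0, inf), omega(0) = 0, saturating at 1.
    return 1.0 - np.exp(-z / eps)


def pairwise_norms(M):
    # ||M[k] - M[l]||_2 for every row pair k < l.
    k, l = np.triu_indices(M.shape[0], k=1)
    return np.linalg.norm(M[k] - M[l], axis=1)


def objective(U, X, mask, gamma_r, gamma_c, eps=0.5):
    # Fit term on observed entries only: 1/2 ||P_Theta(X - U)||_F^2.
    fit = 0.5 * np.sum(np.where(mask, X - U, 0.0) ** 2)
    col_pen = omega(pairwise_norms(U.T), eps).sum()  # column pairs i < j
    row_pen = omega(pairwise_norms(U), eps).sum()    # row pairs k < l
    return fit + gamma_c * col_pen + gamma_r * row_pen


rng = np.random.default_rng(1)
X = rng.normal(size=(6, 4))
mask = rng.random(X.shape) < 0.7
X_obs = np.where(mask, X, np.nan)  # unobserved entries are unknown
F_val = objective(np.zeros_like(X), X_obs, mask, gamma_r=1.0, gamma_c=1.0)
```

With $U = 0$ every pairwise difference is zero, so both penalties vanish and `F_val` reduces to half the squared norm of the observed entries.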

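  9. Step 1: Majorization-Minimization (MM)
     $$G(U \mid V) = \frac{1}{2}\|\tilde X - U\|_F^2 + \gamma_c \sum_{i<j} \tilde w_{c,ij} \|U_{\cdot i} - U_{\cdot j}\|_2 + \gamma_r \sum_{k<l} \tilde w_{r,kl} \|U_{k\cdot} - U_{l\cdot}\|_2 + \tilde c$$
     where $\tilde X = P_\Theta(X) + P_{\Theta^c}(V)$, $\tilde w_{c,ij} = \Omega'(\|V_{\cdot i} - V_{\cdot j}\|_2)$, $\tilde w_{r,kl} = \Omega'(\|V_{k\cdot} - V_{l\cdot}\|_2)$, and $\tilde c$ is a constant that does not depend on $U$.
     Minimizing $G(\cdot \mid V)$ is a weighted convex biclustering problem, which can be solved with Convex Biclustering [Chi et al. 2017].
     [Figure: the penalty $\Omega$ on $[-2, 2]$ with marked points.]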
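The inputs to the convex subproblem, $\tilde X$ and the weights, depend only on the current iterate $V$. A minimal sketch, again assuming the hypothetical penalty $\Omega(z) = 1 - e^{-z/\varepsilon}$, so that $\Omega'(z) = e^{-z/\varepsilon}/\varepsilon$. The helper `surrogate_inputs` is illustrative and not part of any library.

```python
import numpy as np


def omega_prime(z, eps=0.5):
    # Derivative of the hypothetical penalty omega(z) = 1 - exp(-z / eps).
    return np.exp(-z / eps) / eps


def pairwise_norms(M):
    # ||M[k] - M[l]||_2 for every row pair k < l.
    k, l = np.triu_indices(M.shape[0], k=1)
    return np.linalg.norm(M[k] - M[l], axis=1)


def surrogate_inputs(X, mask, V, eps=0.5):
    X_tilde = np.where(mask, X, V)               # P_Theta(X) + P_Theta^c(V)
    w_c = omega_prime(pairwise_norms(V.T), eps)  # one weight per column pair i < j
    w_r = omega_prime(pairwise_norms(V), eps)    # one weight per row pair k < l
    return X_tilde, w_c, w_r


rng = np.random.default_rng(2)
X = rng.normal(size=(5, 4))
mask = rng.random(X.shape) < 0.6
V = rng.normal(size=X.shape)
X_tilde, w_c, w_r = surrogate_inputs(np.where(mask, X, np.nan), mask, V)
```

Because $\Omega'$ is decreasing, pairs that are already close in $V$ get weights near $1/\varepsilon$ and keep being pulled together, while pairs that are far apart get weights near 0 and are barely shrunk. This is how the concave penalty reduces the bias towards 0.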

  10. Step 1: Majorization-Minimization (MM)
     Majorization: $G(U \mid V) = \frac{1}{2}\|\tilde X - U\|_F^2 + \gamma_c \sum_{i<j} \tilde w_{c,ij} \|U_{\cdot i} - U_{\cdot j}\|_2 + \gamma_r \sum_{k<l} \tilde w_{r,kl} \|U_{k\cdot} - U_{l\cdot}\|_2 + \tilde c$ satisfies
     $F(U) = G(U \mid U)$ and $F(U) \le G(U \mid V)$ for all $U$.
     MM: solve a sequence of convex biclustering problems, $U^{t+1} = \arg\min_U G(U \mid U^t)$.
     Proposition: Under suitable regularity conditions, the sequence $U^t$ generated by Algorithm 1 has at least one limit point, and all limit points are d-stationary points of the problem of minimizing $F(U)$.
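The two majorization properties give monotone descent: $F(U^{t+1}) \le G(U^{t+1} \mid U^t) \le G(U^t \mid U^t) = F(U^t)$. The sketch below checks both properties numerically rather than solving the inner convex biclustering problem. It assumes the hypothetical penalty $\Omega(z) = 1 - e^{-z/\varepsilon}$ and builds $G$ from the tangent line of $\Omega$ at the current iterate, which folds the constant $\tilde c$ into the sum.

```python
import numpy as np

EPS = 0.5  # hypothetical penalty scale


def omega(z):
    # Hypothetical concave penalty (assumption): omega(z) = 1 - exp(-z / EPS).
    return 1.0 - np.exp(-z / EPS)


def omega_prime(z):
    return np.exp(-z / EPS) / EPS


def pairwise_norms(M):
    k, l = np.triu_indices(M.shape[0], k=1)
    return np.linalg.norm(M[k] - M[l], axis=1)


def F(U, X, mask, g_r, g_c):
    fit = 0.5 * np.sum(np.where(mask, X - U, 0.0) ** 2)
    return fit + g_c * omega(pairwise_norms(U.T)).sum() + g_r * omega(pairwise_norms(U)).sum()


def G(U, V, X, mask, g_r, g_c):
    X_tilde = np.where(mask, X, V)  # P_Theta(X) + P_Theta^c(V)
    val = 0.5 * np.sum((X_tilde - U) ** 2)
    for g, Mu, Mv in ((g_c, U.T, V.T), (g_r, U, V)):
        z, z0 = pairwise_norms(Mu), pairwise_norms(Mv)
        # Tangent line of the concave penalty at z0: the weight omega'(z0)
        # multiplies z; the remaining terms do not depend on U (they form c~).
        val += g * np.sum(omega(z0) + omega_prime(z0) * (z - z0))
    return val


rng = np.random.default_rng(3)
X = rng.normal(size=(6, 5))
mask = rng.random(X.shape) < 0.7
V = rng.normal(size=X.shape)
# G(U | V) - F(U) should be >= 0 for every U.
gaps = [G(U, V, X, mask, 1.0, 1.0) - F(U, X, mask, 1.0, 1.0)
        for U in rng.normal(size=(200, 6, 5))]
```

The fit term majorizes because $\frac{1}{2}\|\tilde X - U\|_F^2 = \frac{1}{2}\|P_\Theta(X - U)\|_F^2 + \frac{1}{2}\|P_{\Theta^c}(V - U)\|_F^2$. The penalty terms majorize because a concave function lies below its tangent lines. Both bounds are tight at $U = V$.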

  11. Step 1: Smoothing Rows and Columns at Different Scales

  13. Step 2: Multiscale Metric
     Intuition: if a pair of rows is close over multiple scales, their distance should be small; if a pair of rows is far apart over multiple scales, their distance should be large.
     Step 1: Fill in X over multiple $(\gamma_r, \gamma_c)$ scales: $\tilde X^{(r,c)} = P_\Theta(X) + P_{\Theta^c}(U(\gamma_r, \gamma_c))$.
     Step 2: Take a weighted combination of the pairwise distances over all scales:
     $$d(X_{i\cdot}, X_{j\cdot}) = \sum_{r,c} (\gamma_r \gamma_c)^\alpha \|\tilde X^{(r,c)}_{i\cdot} - \tilde X^{(r,c)}_{j\cdot}\|_2$$
     The parameter $\alpha$ is tunable to emphasize local versus global structure.
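A direct NumPy transcription of the two steps. In the method the matrices $U(\gamma_r, \gamma_c)$ come from solving Step 1 at each scale; here they are replaced by random stand-ins, and the grid of $\gamma$ values and the choice $\alpha = 0.5$ are arbitrary assumptions. The helpers `fill_in` and `multiscale_row_distance` are hypothetical names.

```python
import numpy as np


def fill_in(X, mask, U):
    # X~ = P_Theta(X) + P_Theta^c(U): keep observed entries, take the rest from U.
    return np.where(mask, X, U)


def multiscale_row_distance(scales, alpha):
    """scales: list of (gamma_r, gamma_c, X_tilde) triples, one per (r, c) scale."""
    n = scales[0][2].shape[0]
    D = np.zeros((n, n))
    for gamma_r, gamma_c, Xt in scales:
        # Euclidean distances between all rows of this filled-in matrix.
        diff = np.linalg.norm(Xt[:, None, :] - Xt[None, :, :], axis=-1)
        D += (gamma_r * gamma_c) ** alpha * diff
    return D


rng = np.random.default_rng(4)
X = rng.normal(size=(7, 5))
mask = rng.random(X.shape) < 0.6
gammas = [0.5, 1.0, 2.0]  # hypothetical scale grid
# Random stand-ins for the Step 1 solutions U(gamma_r, gamma_c).
scales = [(g_r, g_c, fill_in(X, mask, rng.normal(size=X.shape)))
          for g_r in gammas for g_c in gammas]
D_rows = multiscale_row_distance(scales, alpha=0.5)
# Column distances: the same computation on the transposed filled-in matrices.
D_cols = multiscale_row_distance([(g_r, g_c, Xt.T) for g_r, g_c, Xt in scales], alpha=0.5)
```

Reusing one function for rows and columns (via transposes) mirrors the symmetric treatment of the two dimensions on the slide.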

  15. Step 3: Spectral Embedding
     Example: Diffusion Map (Coifman & Lafon, 2006)
     - Construct an affinity matrix $A[i,j] = \exp\{-d^2(X_{i\cdot}, X_{j\cdot})/\sigma^2\}$.
     - Compute the row-stochastic matrix $P = D^{-1}A$, where $D[i,i] = \sum_j A[i,j]$.
     - Eigendecomposition of $P$: keep the first $d$ eigenvalues and eigenvectors.
     - The mapping $\Psi$ embeds the rows into the Euclidean space $\mathbb{R}^d$:
     $$\Psi: X_{i\cdot} \mapsto \left(\lambda_1 \psi_1(i), \lambda_2 \psi_2(i), \dots, \lambda_d \psi_d(i)\right)^T$$
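A compact sketch of the diffusion map steps above. It assumes, as is standard, that $\psi_1, \dots, \psi_d$ are counted after the trivial constant eigenvector $\psi_0$ with $\lambda_0 = 1$. Since $P = D^{-1}A$ is similar to the symmetric matrix $S = D^{-1/2} A D^{-1/2}$, the code diagonalizes $S$ with `np.linalg.eigh` and maps its eigenvectors back to right eigenvectors of $P$. The input distances here are plain Euclidean distances on random points, standing in for the multiscale metric.

```python
import numpy as np


def diffusion_map(dist, sigma, dim):
    A = np.exp(-dist ** 2 / sigma ** 2)  # affinity A[i, j]
    deg = A.sum(axis=1)                  # D[i, i] = sum_j A[i, j]
    s = 1.0 / np.sqrt(deg)
    # S = D^{-1/2} A D^{-1/2} is symmetric and has the same eigenvalues as P.
    S = s[:, None] * A * s[None, :]
    evals, evecs = np.linalg.eigh(S)
    order = np.argsort(evals)[::-1]      # sort eigenvalues in descending order
    evals, evecs = evals[order], evecs[:, order]
    psi = s[:, None] * evecs             # right eigenvectors of P = D^{-1} A
    # Skip the trivial pair (lambda_0 = 1, constant psi_0); keep the next `dim`.
    embedding = evals[1:dim + 1] * psi[:, 1:dim + 1]
    return embedding, evals, psi


rng = np.random.default_rng(5)
pts = rng.normal(size=(30, 3))
dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
emb, evals, psi = diffusion_map(dist, sigma=1.0, dim=2)
```

Using `eigh` on the symmetric conjugate avoids the complex-valued round-off that a general eigensolver can produce on the non-symmetric $P$.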

  16. Some Examples
     [Figure: example datasets; panel labels: Nonlinear, Linear, Nonlinear, Nonlinear; Uncoupled, Coupled, Uncoupled, Coupled.]

  17. Some Examples: Quantitative Evaluation via Clustering
     [Figure: adjusted Rand index (ARI) versus percentage of missing values (0 to 90%) on the Lung500 and Linkage datasets, comparing Co-manifold, DM-missing, NLPCA, FRPCAG =1 and FRPCAG =100.]
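The evaluation scores a clustering of the embedding against ground-truth labels with the adjusted Rand index. How the embedding is clustered is not stated on the slide, so this sketch covers only the scoring step: the Hubert and Arabie ARI computed from the contingency table. `adjusted_rand_index` is a hypothetical helper; the formula is undefined (0/0) when both labelings put every point in one cluster.

```python
import numpy as np


def _pairs(x):
    # Elementwise "x choose 2": number of unordered pairs.
    return x * (x - 1) / 2.0


def adjusted_rand_index(labels_true, labels_pred):
    # Map arbitrary labels to 0..K-1.
    _, a = np.unique(np.asarray(labels_true), return_inverse=True)
    _, b = np.unique(np.asarray(labels_pred), return_inverse=True)
    a, b = a.ravel(), b.ravel()
    # Contingency table: table[u, v] = number of points in true cluster u and predicted cluster v.
    table = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(table, (a, b), 1)
    sum_ij = _pairs(table).sum()
    sum_a = _pairs(table.sum(axis=1)).sum()
    sum_b = _pairs(table.sum(axis=0)).sum()
    expected = sum_a * sum_b / _pairs(len(a))
    max_index = 0.5 * (sum_a + sum_b)
    return (sum_ij - expected) / (max_index - expected)


ari = adjusted_rand_index(["a", "a", "b", "b", "c"], [1, 1, 2, 2, 2])
```

ARI is invariant to relabeling, so a perfect clustering scores 1 even if cluster IDs are permuted, and its chance-adjusted baseline is 0, which makes curves across missingness levels comparable.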
