
SNeCT: Integrative cancer data analysis via large scale network constrained Tucker decomposition
Dongjin Choi and Lee Sael

Motivation
- Q: How can we characterize cancer patients?
- A: The Cancer Genome Atlas (TCGA) Pan-Cancer data provide rich data across 12 tumor types.
Sources: Mary Goldman, UCSC Cancer Browser Workshop (2015); John N. Weinstein et al., Nat Genet 45(10), 1113-1120 (2013), doi:10.1038/ng.2764

Motivation
- How can we provide an integrated analysis of multi-dimensional data?
- Pan-Cancer12 data consist of multi-platform data: gene expression, DNA methylation, copy number variation, and mutation.
Source: Mary Goldman, UCSC Cancer Browser Workshop (2015)

Motivation
- How can we build a combined model that exploits gene networks?
- Gene association networks provide gene similarity information (e.g., common pathways).
Source: John N. Weinstein et al., Nat Genet 45(10), 1113-1120 (2013), doi:10.1038/ng.2764

Overview
- Introduction
- Problem definition
- Proposed method
- Experiments
- Conclusion

Tensor
- A tensor is a multi-dimensional array.
- Pan-Cancer12 data are represented as a 3-D tensor whose modes are patients, genes, and observations (figure: a patients × genes × observations cube with example entries such as 0.12, −0.3, and 0.82).
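The factorization below works only with observed entries. A natural in-memory form for such a tensor is therefore a sparse coordinate (COO) map from observed index triples to values. This is an illustrative sketch: the indices and values are placeholders (three of them echo the figure's example entries), and `mode_counts` is a hypothetical helper that counts observed entries per index along one mode.

```python
from collections import Counter

# Placeholder data: (patient i, gene j, observation k) -> value.
# Only observed entries are stored; a missing entry is simply absent.
observed = {
    (0, 0, 0): 0.12,
    (0, 1, 2): -0.3,
    (1, 0, 1): 0.82,
    (2, 1, 0): 0.5,
}


def mode_counts(entries, mode):
    """Number of observed entries for each index along `mode` (0, 1 or 2)."""
    return Counter(idx[mode] for idx in entries)


# Shape inferred from the largest index seen in each mode (toy data only).
shape = tuple(1 + max(idx[m] for idx in observed) for m in range(3))
print(shape)                           # (3, 2, 3)
print(dict(mode_counts(observed, 0)))  # {0: 2, 1: 1, 2: 1}
```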

Tensor Factorization
- Given a tensor, decompose it into a core tensor and factor matrices whose product approximates the original tensor.
Figure: CP decomposition and Tucker decomposition (HOSVD), each approximating $\mathcal{X}$ by a core tensor $\mathcal{G}$ multiplied by factor matrices $\mathbf{A}$, $\mathbf{B}$, $\mathbf{C}$.
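The two models in the figure are closely related. A rank-R CP decomposition is a Tucker decomposition whose R × R × R core is superdiagonal: $g_{rrr} = w_r$, and every other core entry is zero. The pure-Python sketch below checks this numerically on random factors. The helper names and the weights are hypothetical.

```python
import random


def tucker_entry(G, A, B, C, i, j, k):
    """Tucker: x~_ijk = sum_{p,q,r} g_pqr * a_ip * b_jq * c_kr."""
    return sum(G[p][q][r] * A[i][p] * B[j][q] * C[k][r]
               for p in range(len(G)) for q in range(len(G[0])) for r in range(len(G[0][0])))


def cp_entry(weights, A, B, C, i, j, k):
    """CP: x~_ijk = sum_r w_r * a_ir * b_jr * c_kr (a sum of rank-one terms)."""
    return sum(w * A[i][r] * B[j][r] * C[k][r] for r, w in enumerate(weights))


random.seed(0)
rank = 2
A = [[random.random() for _ in range(rank)] for _ in range(3)]
B = [[random.random() for _ in range(rank)] for _ in range(4)]
C = [[random.random() for _ in range(rank)] for _ in range(2)]
weights = [2.0, 0.5]  # arbitrary CP weights

# Superdiagonal core: g_rrr = w_r and every other entry is zero.
G = [[[weights[p] if p == q == r else 0.0 for r in range(rank)]
      for q in range(rank)] for p in range(rank)]

max_diff = max(abs(tucker_entry(G, A, B, C, i, j, k) - cp_entry(weights, A, B, C, i, j, k))
               for i in range(3) for j in range(4) for k in range(2))
print(max_diff < 1e-12)  # True
```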

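Tucker Decomposition
- Tucker decomposition (Tucker, 1966) is a widely used tensor factorization method.
- Given a tensor, Tucker decomposition factorizes it into the product of a core tensor and orthogonal factor matrices:

$$\mathcal{X} \approx \tilde{\mathcal{X}} = \mathcal{G} \times_1 \mathbf{A} \times_2 \mathbf{B} \times_3 \mathbf{C} \quad \text{s.t.} \quad \mathbf{A}^{\top}\mathbf{A} = \mathbf{B}^{\top}\mathbf{B} = \mathbf{C}^{\top}\mathbf{C} = \mathbf{I}$$

- Elementwise,

$$x_{ijk} \approx \mathcal{G} \times_1 \mathbf{a}_i \times_2 \mathbf{b}_j \times_3 \mathbf{c}_k$$

where $\mathbf{a}_i$ is the $i$-th row of $\mathbf{A}$, $\mathbf{b}_j$ is the $j$-th row of $\mathbf{B}$, and $\mathbf{c}_k$ is the $k$-th row of $\mathbf{C}$.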
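The elementwise formula can be evaluated as successive contractions. Contracting mode 3 with $\mathbf{c}_k$ and then mode 2 with $\mathbf{b}_j$ leaves the vector $\mathcal{G} \times_2 \mathbf{b}_j \times_3 \mathbf{c}_k$, and a dot product with $\mathbf{a}_i$ finishes the job. The result must equal the explicit triple sum $\sum_{p,q,r} g_{pqr} a_{ip} b_{jq} c_{kr}$. This is a pure-Python sketch with hypothetical helper names and random placeholder values.

```python
import random


def contract_mode3(G, c):
    """G x_3 c: contract the third mode of a P x Q x R core with c (length R) -> P x Q."""
    return [[sum(G[p][q][r] * c[r] for r in range(len(c))) for q in range(len(G[0]))]
            for p in range(len(G))]


def contract_mode2(M, b):
    """(P x Q matrix) x_2 b -> vector of length P."""
    return [sum(row[q] * b[q] for q in range(len(b))) for row in M]


def tucker_entry(G, a, b, c):
    """x~_ijk = G x_1 a_i x_2 b_j x_3 c_k for row vectors a = a_i, b = b_j, c = c_k."""
    v = contract_mode2(contract_mode3(G, c), b)  # G x_2 b_j x_3 c_k, length P
    return sum(a[p] * v[p] for p in range(len(a)))


def tucker_entry_naive(G, a, b, c):
    """The same value as an explicit triple sum over the core entries."""
    return sum(G[p][q][r] * a[p] * b[q] * c[r]
               for p in range(len(a)) for q in range(len(b)) for r in range(len(c)))


rng = random.Random(1)
P, Q, R = 2, 3, 2
G = [[[rng.uniform(-1, 1) for _ in range(R)] for _ in range(Q)] for _ in range(P)]
a = [rng.uniform(-1, 1) for _ in range(P)]
b = [rng.uniform(-1, 1) for _ in range(Q)]
c = [rng.uniform(-1, 1) for _ in range(R)]
print(abs(tucker_entry(G, a, b, c) - tucker_entry_naive(G, a, b, c)) < 1e-12)  # True
```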

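Tucker Decomposition (cont.)
- Formal problem definition: given a 3-D tensor $\mathcal{X} \in \mathbb{R}^{I \times J \times K}$ with observed entries $\{x_{ijk} \mid (i, j, k) \in \Omega_{\mathcal{X}}\}$, the rank-$[P, Q, R]$ factorization of $\mathcal{X}$ finds the core tensor $\mathcal{G}$ and factor matrices $\{\mathbf{A}, \mathbf{B}, \mathbf{C}\}$ that minimize the loss function

$$f(\mathcal{G}, \mathbf{A}, \mathbf{B}, \mathbf{C}) = \frac{1}{2}\left\|\mathcal{X} - \tilde{\mathcal{X}}\right\|_F^2 + \frac{\lambda}{2} R(\mathcal{G}, \mathbf{A}, \mathbf{B}, \mathbf{C}) = \frac{1}{2}\sum_{(i,j,k)\in\Omega_{\mathcal{X}}}\left(x_{ijk} - \mathcal{G} \times_1 \mathbf{a}_i \times_2 \mathbf{b}_j \times_3 \mathbf{c}_k\right)^2 + \frac{\lambda}{2} R(\mathcal{G}, \mathbf{A}, \mathbf{B}, \mathbf{C})$$

where the norm is taken over the observed entries only and $R$ is a regularization term weighted by $\lambda$.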
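Below is a minimal pure-Python sketch of this loss on a toy example. The definition above does not spell out $R$, so for illustration it is assumed to be the sum of squared Frobenius norms of the core and the three factor matrices (an L2 penalty). Only observed entries enter the fit term, so missing measurements are skipped rather than imputed. The function names are hypothetical.

```python
def predict(G, A, B, C, i, j, k):
    """x~_ijk = sum_{p,q,r} g_pqr * a_ip * b_jq * c_kr."""
    return sum(G[p][q][r] * A[i][p] * B[j][q] * C[k][r]
               for p in range(len(G)) for q in range(len(G[0])) for r in range(len(G[0][0])))


def sq_norm(x):
    """Squared Frobenius norm of a number or a nested list of any depth."""
    if isinstance(x, (int, float)):
        return x * x
    return sum(sq_norm(v) for v in x)


def loss(observed, G, A, B, C, lam):
    """1/2 * sum over observed (x - x~)^2 + lam/2 * R, with R ASSUMED to be
    ||G||^2 + ||A||^2 + ||B||^2 + ||C||^2."""
    fit = sum((x - predict(G, A, B, C, i, j, k)) ** 2 for (i, j, k), x in observed.items())
    return 0.5 * fit + 0.5 * lam * (sq_norm(G) + sq_norm(A) + sq_norm(B) + sq_norm(C))


# Toy rank-[1, 1, 1] model of a 2 x 1 x 1 tensor with both entries observed.
G = [[[2.0]]]
A = [[1.0], [0.5]]
B = [[1.0]]
C = [[1.0]]
observed = {(0, 0, 0): 3.0, (1, 0, 0): 1.0}
print(round(loss(observed, G, A, B, C, lam=0.1), 4))  # 0.8625
```

Here the fit term is $(3 - 2)^2 + (1 - 1)^2 = 1$, and the assumed penalty is $4 + 1.25 + 1 + 1 = 7.25$.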

Scheme of SNeCT
Figure: the SNeCT pipeline. Input: the data tensor and a gene bionetwork. Lock-free parallel SGD factorizes the tensor into a core tensor $\mathcal{G}$ and factor matrices $\mathbf{A}$, $\mathbf{B}$, $\mathbf{C}$, using the bionetwork to make the factors of related genes similar. Patient profiles are then extracted for stratification (patient clustering), prediction (top-k search for query patient data), and personalized subtype analysis.

Proposed Methods
- SNeCT (Scalable Network Constrained Tucker decomposition) enables integrative tensor factorization and analysis of tensor data with a network constraint.
- Method 1
  - Formulate an SGD-amenable objective function
  - Update iteratively by SGD with a lock-free parallel scheme
- Method 2
  - Personalized subtype analysis

Proposed Methods
- Formulate an SGD-amenable objective function
- Given the gene similarity matrix $\mathbf{Y} \in \mathbb{R}^{J \times J}$ with observed entries $\{y_{mn} \mid (m, n) \in \Omega_{\mathbf{Y}}\}$, the network constraint is formulated so that similar genes have similar factors:

$$f_g(\mathbf{B}, \mathbf{Y}) = \frac{1}{2}\sum_{l=1}^{Q}\sum_{(m,n)\in\Omega_{\mathbf{Y}}} y_{mn}\left(b_{ml} - b_{nl}\right)^2 = \frac{1}{2}\sum_{(m,n)\in\Omega_{\mathbf{Y}}} y_{mn}\left\|\mathbf{b}_m - \mathbf{b}_n\right\|_F^2$$
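The two forms of $f_g$ are the same sum grouped differently: per latent column $l$, or per gene pair as a squared row distance. The sketch below computes both on a toy example. A pair with large similarity $y_{mn}$ pays a proportionally larger price when its factor rows differ, which is what pulls related genes together. The gene factors, similarity values, and function names are hypothetical.

```python
def network_penalty_by_column(B, Y_obs):
    """1/2 * sum_l sum_(m,n) y_mn * (b_ml - b_nl)^2."""
    rank = len(B[0])
    return 0.5 * sum(y * (B[m][col] - B[n][col]) ** 2
                     for col in range(rank) for (m, n), y in Y_obs.items())


def network_penalty_by_row(B, Y_obs):
    """1/2 * sum_(m,n) y_mn * ||b_m - b_n||^2."""
    return 0.5 * sum(y * sum((u - v) ** 2 for u, v in zip(B[m], B[n]))
                     for (m, n), y in Y_obs.items())


# Hypothetical gene factors (3 genes, rank 2) and observed similarity entries.
B = [[1.0, 0.0], [0.8, 0.1], [-1.0, 2.0]]
Y_obs = {(0, 1): 0.9, (0, 2): 0.1}
print(round(network_penalty_by_column(B, Y_obs), 4))  # 0.4225
print(round(network_penalty_by_row(B, Y_obs), 4))     # 0.4225
```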

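Proposed Methods
- Formulate an SGD-amenable objective function by splitting the regularization across the observed entries:

$$f(\mathcal{G}, \mathbf{A}, \mathbf{B}, \mathbf{C}) = \frac{1}{2}\sum_{(i,j,k)\in\Omega_{\mathcal{X}}}\left(x_{ijk} - \tilde{x}_{ijk}\right)^2 + \frac{\lambda}{2} R(\mathcal{G}, \mathbf{A}, \mathbf{B}, \mathbf{C}) = \frac{1}{2}\sum_{(i,j,k)\in\Omega_{\mathcal{X}}}\left[\left(x_{ijk} - \tilde{x}_{ijk}\right)^2 + \lambda\left(\frac{\|\mathcal{G}\|_F^2}{|\Omega_{\mathcal{X}}|} + \frac{\|\mathbf{a}_i\|_F^2}{|\Omega_{\mathcal{X}}^{(i)}|} + \frac{\|\mathbf{b}_j\|_F^2}{|\Omega_{\mathcal{X}}^{(j)}|} + \frac{\|\mathbf{c}_k\|_F^2}{|\Omega_{\mathcal{X}}^{(k)}|}\right)\right]$$

where $|\Omega_{\mathcal{X}}^{(i)}|$ is the number of observed entries whose first index is $i$ (and likewise for $j$ and $k$), and

$$f_g(\mathbf{B}, \mathbf{Y}) = \frac{1}{2}\sum_{(m,n)\in\Omega_{\mathbf{Y}}} y_{mn}\left\|\mathbf{b}_m - \mathbf{b}_n\right\|_F^2$$

- Integrate both into a single objective function: $f_{opt} = f + \lambda_g f_g$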
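Splitting the regularizer works because summing the per-entry terms recovers the original objective, so each SGD step needs to touch only one observed entry. The sketch below checks that identity numerically. It assumes $R = \|\mathcal{G}\|_F^2 + \|\mathbf{A}\|_F^2 + \|\mathbf{B}\|_F^2 + \|\mathbf{C}\|_F^2$, which is what the split form implies. It also assumes every row of $\mathbf{A}$, $\mathbf{B}$, $\mathbf{C}$ appears in at least one observed entry; a row that never appears would drop out of the per-entry form. The toy numbers and function names are made up.

```python
from collections import Counter


def predict(G, A, B, C, i, j, k):
    return sum(G[p][q][r] * A[i][p] * B[j][q] * C[k][r]
               for p in range(len(G)) for q in range(len(G[0])) for r in range(len(G[0][0])))


def sq(v):
    return sum(x * x for x in v)


def core_sq(G):
    return sum(g * g for plane in G for row in plane for g in row)


def per_entry_form(observed, G, A, B, C, lam):
    """1/2 * sum over observed entries of [(x - x~)^2 + lam * (split regularization)]."""
    n = len(observed)
    cnt = [Counter(idx[m] for idx in observed) for m in range(3)]  # |Omega^(i)|, |Omega^(j)|, |Omega^(k)|
    total = 0.0
    for (i, j, k), x in observed.items():
        e = x - predict(G, A, B, C, i, j, k)
        total += e * e + lam * (core_sq(G) / n + sq(A[i]) / cnt[0][i]
                                + sq(B[j]) / cnt[1][j] + sq(C[k]) / cnt[2][k])
    return 0.5 * total


def global_form(observed, G, A, B, C, lam):
    """1/2 * sum (x - x~)^2 + lam/2 * (||G||^2 + ||A||^2 + ||B||^2 + ||C||^2)."""
    fit = sum((x - predict(G, A, B, C, i, j, k)) ** 2 for (i, j, k), x in observed.items())
    reg = core_sq(G) + sum(map(sq, A)) + sum(map(sq, B)) + sum(map(sq, C))
    return 0.5 * fit + 0.5 * lam * reg


# Toy data in which every row of A, B and C appears in at least one observed entry.
G = [[[1.0, 0.5], [0.0, -0.5]], [[0.2, 0.0], [1.0, 0.3]]]  # P = Q = R = 2
A = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
B = [[1.0, 0.2], [0.3, 1.0]]
C = [[0.7, 0.1], [0.0, 1.0]]
observed = {(0, 0, 0): 1.0, (1, 1, 0): -0.5, (2, 0, 1): 0.3, (0, 1, 1): 0.8, (1, 0, 1): 0.0}
print(abs(per_entry_form(observed, G, A, B, C, 0.1) - global_form(observed, G, A, B, C, 0.1)) < 1e-12)  # True
```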

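Proposed Methods
- Calculate the gradients of $f_{opt}$ with respect to the core tensor and the factor matrices for a given data point $x_{\alpha=(ijk)}$ or $y_{\beta=(mn)}$:

$$\left.\frac{\partial f_{opt}}{\partial \mathbf{a}_i}\right|_{\alpha} = -\left(x_{\alpha} - \tilde{x}_{\alpha}\right)\,\mathcal{G} \times_2 \mathbf{b}_j \times_3 \mathbf{c}_k + \frac{\lambda}{|\Omega_{\mathcal{X}}^{(i)}|}\,\mathbf{a}_i$$

$$\left.\frac{\partial f_{opt}}{\partial \mathcal{G}}\right|_{\alpha} = -\left(x_{\alpha} - \tilde{x}_{\alpha}\right) \times_1 \mathbf{a}_i^{\top} \times_2 \mathbf{b}_j^{\top} \times_3 \mathbf{c}_k^{\top} + \frac{\lambda}{|\Omega_{\mathcal{X}}|}\,\mathcal{G}$$

$$\left.\frac{\partial f_{opt}}{\partial \mathbf{b}_m}\right|_{\beta} = \lambda_g\, y_{\beta}\left(\mathbf{b}_m - \mathbf{b}_n\right)$$

- $\left.\partial f_{opt}/\partial \mathbf{b}_j\right|_{\alpha}$, $\left.\partial f_{opt}/\partial \mathbf{c}_k\right|_{\alpha}$, and $\left.\partial f_{opt}/\partial \mathbf{b}_n\right|_{\beta}$ are calculated symmetrically.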
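These gradient formulas can be checked against central finite differences of the per-entry objective. The pure-Python sketch below computes three gradients: with respect to $\mathbf{a}_i$, with respect to one core entry, and with respect to $\mathbf{b}_m$ for a network entry. It compares each with a numerical derivative. All numbers are arbitrary. The per-entry counts are fixed placeholder values, and the helper names are hypothetical.

```python
import random


def core_times_bc(G, b, c):
    """G x_2 b_j x_3 c_k: contracts modes 2 and 3, leaving a vector of length P."""
    return [sum(G[p][q][r] * b[q] * c[r] for q in range(len(b)) for r in range(len(c)))
            for p in range(len(G))]


def entry_term(G, a, b, c, x, lam, n_all, n_i, n_j, n_k):
    """Per-entry term of f for one observed x_ijk (the counts are held fixed)."""
    pred = sum(ap * v for ap, v in zip(a, core_times_bc(G, b, c)))
    reg = (sum(g * g for plane in G for row in plane for g in row) / n_all
           + sum(v * v for v in a) / n_i + sum(v * v for v in b) / n_j
           + sum(v * v for v in c) / n_k)
    return 0.5 * (x - pred) ** 2 + 0.5 * lam * reg


def grad_a(G, a, b, c, x, lam, n_i):
    """-(x - x~) * (G x_2 b_j x_3 c_k) + lam / |Omega^(i)| * a_i."""
    v = core_times_bc(G, b, c)
    e = x - sum(ap * vp for ap, vp in zip(a, v))
    return [-e * vp + lam * ap / n_i for ap, vp in zip(a, v)]


def grad_core(G, a, b, c, x, lam, n_all):
    """-(x - x~) * (outer product of a_i, b_j, c_k) + lam / |Omega| * G."""
    e = x - sum(ap * vp for ap, vp in zip(a, core_times_bc(G, b, c)))
    return [[[-e * a[p] * b[q] * c[r] + lam * G[p][q][r] / n_all
              for r in range(len(c))] for q in range(len(b))] for p in range(len(a))]


def grad_b_network(b_m, b_n, y, lam_g):
    """lam_g * y_mn * (b_m - b_n)."""
    return [lam_g * y * (u - v) for u, v in zip(b_m, b_n)]


rng = random.Random(3)
P, Q, R = 2, 2, 3
G = [[[rng.uniform(-1, 1) for _ in range(R)] for _ in range(Q)] for _ in range(P)]
a, b, c = ([rng.uniform(-1, 1) for _ in range(d)] for d in (P, Q, R))
x, lam = 0.7, 0.1
counts = (10, 3, 4, 2)  # placeholder |Omega|, |Omega^(i)|, |Omega^(j)|, |Omega^(k)|
h = 1e-6


def numeric_grad_a(p):
    plus, minus = a.copy(), a.copy()
    plus[p] += h
    minus[p] -= h
    return (entry_term(G, plus, b, c, x, lam, *counts)
            - entry_term(G, minus, b, c, x, lam, *counts)) / (2 * h)


def core_term_with(delta):
    bumped = [[row.copy() for row in plane] for plane in G]
    bumped[0][1][2] += delta
    return entry_term(bumped, a, b, c, x, lam, *counts)


def network_term(b_m, b_n, y, lam_g):
    return 0.5 * lam_g * y * sum((u - v) ** 2 for u, v in zip(b_m, b_n))


err_a = max(abs(g - numeric_grad_a(p)) for p, g in enumerate(grad_a(G, a, b, c, x, lam, counts[1])))
err_core = abs(grad_core(G, a, b, c, x, lam, counts[0])[0][1][2]
               - (core_term_with(h) - core_term_with(-h)) / (2 * h))
b_n = [0.1, 0.4]
err_net = abs(grad_b_network([0.3, -0.2], b_n, 0.8, 0.5)[0]
              - (network_term([0.3 + h, -0.2], b_n, 0.8, 0.5)
                 - network_term([0.3 - h, -0.2], b_n, 0.8, 0.5)) / (2 * h))
print(err_a < 1e-6, err_core < 1e-6, err_net < 1e-6)  # True True True
```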

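Proposed Methods
- Parallel update with the calculated gradients

SNeCT($\mathcal{X}$, $\mathbf{Y}$, $\lambda$, $\lambda_g$, $\eta$)    ($\eta$: learning rate)
  Initialize $\mathcal{G}$, $\mathbf{A}$, $\mathbf{B}$, $\mathbf{C}$ randomly
  repeat
    for all $x_{\alpha=(ijk)} \in \mathcal{X}$ and all $y_{\beta=(mn)} \in \mathbf{Y}$, in random order and in parallel do
      if $x_{ijk} \in \mathcal{X}$ is picked then
        $\mathbf{a}_i \leftarrow \mathbf{a}_i - \eta\,\partial f_{opt}/\partial \mathbf{a}_i|_{\alpha}$, $\mathbf{b}_j \leftarrow \mathbf{b}_j - \eta\,\partial f_{opt}/\partial \mathbf{b}_j|_{\alpha}$, $\mathbf{c}_k \leftarrow \mathbf{c}_k - \eta\,\partial f_{opt}/\partial \mathbf{c}_k|_{\alpha}$
        $\mathcal{G} \leftarrow \mathcal{G} - \eta\,\partial f_{opt}/\partial \mathcal{G}|_{\alpha}$
      else if $y_{mn} \in \mathbf{Y}$ is picked then
        $\mathbf{b}_m \leftarrow \mathbf{b}_m - \eta\,\partial f_{opt}/\partial \mathbf{b}_m|_{\beta}$, $\mathbf{b}_n \leftarrow \mathbf{b}_n - \eta\,\partial f_{opt}/\partial \mathbf{b}_n|_{\beta}$
      end if
    end for
  until the convergence condition is satisfied
  Orthogonalize $\mathbf{A}$, $\mathbf{B}$, $\mathbf{C}$ by QR decomposition
  return $\mathcal{G}$, $\mathbf{A}$, $\mathbf{B}$, $\mathbf{C}$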
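A minimal sequential sketch of this loop in pure Python follows. It makes several assumptions:

- The lock-free parallel scheme (workers updating shared factors without locks) is not reproduced. Entries are simply visited in a shuffled order.
- A fixed number of epochs stands in for the convergence test.
- The QR step also folds each triangular factor into the core ($\mathcal{G} \leftarrow \mathcal{G} \times_n \mathbf{R}_n$). The algorithm above does not state this, but it is a common way to keep the reconstruction unchanged.

`sgd_fit`, `orthogonalize`, and the toy data are hypothetical.

```python
import math
import random
from collections import Counter


def contract(G, a, b, c, skip):
    """Contract G with a, b, c on every mode except `skip`; skip=0 gives G x_2 b x_3 c."""
    dims = (len(G), len(G[0]), len(G[0][0]))
    out = [0.0] * dims[skip]
    for p in range(dims[0]):
        for q in range(dims[1]):
            for r in range(dims[2]):
                w = G[p][q][r]
                w *= a[p] if skip != 0 else 1.0
                w *= b[q] if skip != 1 else 1.0
                w *= c[r] if skip != 2 else 1.0
                out[(p, q, r)[skip]] += w
    return out


def predict(G, a, b, c):
    """x~_ijk = G x_1 a_i x_2 b_j x_3 c_k."""
    return sum(ap * v for ap, v in zip(a, contract(G, a, b, c, 0)))


def sgd_fit(observed, Y_obs, shape, ranks, lam, lam_g, eta, epochs, seed=0):
    """Sequential SGD on f + lam_g * f_g, visiting data and network entries in random order."""
    rng = random.Random(seed)
    def rand(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    (I, J, K), (P, Q, R) = shape, ranks
    A, B, C = rand(I, P), rand(J, Q), rand(K, R)
    G = [rand(Q, R) for _ in range(P)]
    n_obs = len(observed)
    cnt = [Counter(idx[m] for idx in observed) for m in range(3)]  # per-index entry counts
    tasks = [("x", idx, v) for idx, v in observed.items()] + [("y", mn, v) for mn, v in Y_obs.items()]
    for _ in range(epochs):
        rng.shuffle(tasks)
        for kind, idx, val in tasks:
            if kind == "x":
                i, j, k = idx
                a, b, c = A[i], B[j], C[k]
                e = val - predict(G, a, b, c)  # all gradients below use pre-update values
                A[i] = [u - eta * (-e * v + lam * u / cnt[0][i]) for u, v in zip(a, contract(G, a, b, c, 0))]
                B[j] = [u - eta * (-e * v + lam * u / cnt[1][j]) for u, v in zip(b, contract(G, a, b, c, 1))]
                C[k] = [u - eta * (-e * v + lam * u / cnt[2][k]) for u, v in zip(c, contract(G, a, b, c, 2))]
                for p in range(P):
                    for q in range(Q):
                        for r in range(R):
                            G[p][q][r] -= eta * (-e * a[p] * b[q] * c[r] + lam * G[p][q][r] / n_obs)
            else:
                m, n = idx  # network entry y_mn pulls b_m and b_n toward each other
                b_m, b_n = B[m], B[n]
                B[m] = [u - eta * lam_g * val * (u - v) for u, v in zip(b_m, b_n)]
                B[n] = [v - eta * lam_g * val * (v - u) for u, v in zip(b_m, b_n)]
    return G, A, B, C


def qr_mgs(X):
    """Thin QR of a tall, full-column-rank matrix by modified Gram-Schmidt: X = Qm @ Rm."""
    rows, cols = len(X), len(X[0])
    qs, Rm = [], [[0.0] * cols for _ in range(cols)]
    for j in range(cols):
        v = [X[i][j] for i in range(rows)]
        for k, qk in enumerate(qs):
            Rm[k][j] = sum(x * y for x, y in zip(qk, v))
            v = [x - Rm[k][j] * y for x, y in zip(v, qk)]
        Rm[j][j] = math.sqrt(sum(x * x for x in v))
        qs.append([x / Rm[j][j] for x in v])
    return [[qs[j][i] for j in range(cols)] for i in range(rows)], Rm


def orthogonalize(G, A, B, C):
    """Replace each factor by its Q and fold R into the core, keeping G x_1 A x_2 B x_3 C."""
    factors = [A, B, C]
    for mode in range(3):
        factors[mode], Rm = qr_mgs(factors[mode])
        dims = (len(G), len(G[0]), len(G[0][0]))
        new_G = [[[0.0] * dims[2] for _ in range(dims[1])] for _ in range(dims[0])]
        for p in range(dims[0]):
            for q in range(dims[1]):
                for r in range(dims[2]):
                    idx = [p, q, r]
                    for t in range(dims[mode]):
                        src = idx.copy()
                        src[mode] = t
                        new_G[p][q][r] += Rm[idx[mode]][t] * G[src[0]][src[1]][src[2]]
        G = new_G
    return G, factors[0], factors[1], factors[2]


# Toy run on random placeholder data (roughly 70% of a 4 x 5 x 2 tensor observed).
rng = random.Random(42)
observed = {(i, j, k): rng.uniform(-1, 1)
            for i in range(4) for j in range(5) for k in range(2) if rng.random() < 0.7}
Y_obs = {(0, 1): 1.0, (2, 3): 0.5}  # hypothetical gene-gene similarities
G, A, B, C = sgd_fit(observed, Y_obs, (4, 5, 2), (2, 2, 2), lam=0.01, lam_g=0.1, eta=0.05, epochs=30)
G2, A2, B2, C2 = orthogonalize(G, A, B, C)
```

After `orthogonalize`, every reconstructed entry matches the pre-QR model, while the columns of `A2`, `B2`, and `C2` are orthonormal.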

Experimental Settings
- Factorize the data tensor with rank [78, 48, 5]
- Stratification
  - Cluster analysis
  - Survival analysis
- Prediction
  - Top-k similarity search on clinical features
- Personalized subtype analysis
- Performance
  - Compare speed and convergence rate with a competitor
  - Competitor: Narita et al. 2012
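Top-k similarity search can be sketched as ranking patients by how similar their latent profiles are to a query patient's. The sketch makes two assumptions the settings above do not state: each patient is represented by a row of the patient factor matrix, and similarity is cosine similarity. The evaluation against clinical features is not reproduced. The profiles and the query are made-up numbers, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def top_k(query, profiles, k):
    """Indices of the k profiles most similar to `query`, most similar first."""
    ranked = sorted(range(len(profiles)), key=lambda idx: -cosine(query, profiles[idx]))
    return ranked[:k]


# Hypothetical patient profiles (rows of a patient factor matrix) and a query patient.
profiles = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
print(top_k([1.0, 0.05], profiles, k=2))  # [0, 1]
```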

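Stratification: Cluster Analysis
Patients per tumor type (rows) and cluster (columns C1 to C13):

| Tumor type | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | C9 | C10 | C11 | C12 | C13 | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BLCA | 16 | 32 | 2 | 19 | 0 | 22 | 3 | 0 | 0 | 0 | 32 | 0 | 0 | 126 |
| BRCA | 17 | 3 | 600 | 172 | 1 | 70 | 0 | 0 | 0 | 0 | 26 | 0 | 0 | 889 |
| COAD | 4 | 0 | 2 | 2 | 0 | 91 | 317 | 0 | 0 | 0 | 1 | 2 | 0 | 419 |
| GBM | 4 | 1 | 1 | 2 | 3 | 7 | 0 | 0 | 248 | 0 | 1 | 0 | 0 | 267 |
| HNSC | 0 | 242 | 1 | 6 | 0 | 1 | 0 | 0 | 0 | 0 | 60 | 0 | 0 | 310 |
| KIRC | 14 | 1 | 1 | 0 | 471 | 4 | 0 | 0 | 1 | 0 | 6 | 0 | 0 | 498 |
| LAML | 0 | 0 | 0 | 0 | 0 | 9 | 0 | 0 | 0 | 188 | 0 | 0 | 0 | 197 |
| LUAD | 302 | 2 | 2 | 7 | 1 | 12 | 0 | 0 | 0 | 0 | 29 | 0 | 0 | 457 |
| LUSC | 26 | 32 | 0 | 29 | 0 | 7 | 0 | 0 | 0 | 0 | 246 | 0 | 0 | 340 |
| OV | 0 | 0 | 1 | 3 | 0 | 1 | 1 | 348 | 0 | 0 | 0 | 0 | 131 | 485 |
| READ | 1 | 1 | 0 | 5 | 0 | 9 | 145 | 0 | 0 | 0 | 1 | 1 | 0 | 163 |
| UCEC | 3 | 1 | 3 | 117 | 1 | 348 | 1 | 0 | 0 | 0 | 10 | 13 | 2 | 499 |
| Total | 387 | 315 | 613 | 362 | 477 | 581 | 467 | 348 | 249 | 188 | 412 | 17 | 134 | 4550 |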
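The slides do not say which clustering algorithm produced C1 to C13. As an illustration only, the sketch below clusters patient profiles with plain Lloyd's k-means, a standard technique swapped in here and initialized from the first k points for determinism. It then cross-tabulates tumor types against cluster labels, which gives a table of the same shape as the one above. All profiles and tumor-type labels are hypothetical.

```python
from collections import Counter


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; centroids start at the first k points (a simplification)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
                  for p in points]
        # Update step: each non-empty cluster's centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def crosstab(tumor_types, labels):
    """Count patients for every (tumor type, cluster) pair, as in a contingency table."""
    return Counter(zip(tumor_types, labels))


# Hypothetical 2-D patient profiles and tumor-type labels.
profiles = [[0.0, 0.0], [5.0, 5.0], [0.1, 0.0], [5.1, 5.0], [0.0, 0.2], [4.9, 5.2]]
types = ["BRCA", "LAML", "BRCA", "LAML", "BRCA", "LAML"]
labels = kmeans(profiles, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
print(crosstab(types, labels))
```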

Stratification: Survival Analysis
- Survival curves for the clustered patients (figure). Log-rank statistics: 1151, 1185, 409.
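Survival differences between patient clusters are summarized here by log-rank statistics. For reference, the sketch below computes the two-group log-rank chi-square statistic (1 degree of freedom). The statistics above compare clustered patient groups and may come from a multi-group version, so this sketch is not meant to reproduce them. Inputs are follow-up times and event indicators (1 = event observed, 0 = censored). All numbers are hypothetical, and the function assumes at least one event time with more than one patient at risk, so the variance is non-zero.

```python
def logrank_two_groups(times1, events1, times2, events2):
    """Two-group log-rank chi-square statistic (1 degree of freedom).

    times*: follow-up times; events*: 1 if the event was observed, 0 if censored.
    """
    data = [(t, e, 0) for t, e in zip(times1, events1)]
    data += [(t, e, 1) for t, e in zip(times2, events2)]
    observed_minus_expected, variance = 0.0, 0.0
    for t in sorted({s for s, e, _ in data if e}):            # distinct event times
        n = sum(1 for s, _, _ in data if s >= t)               # at risk, both groups
        n1 = sum(1 for s, _, g in data if s >= t and g == 0)   # at risk, group 1
        d = sum(1 for s, e, _ in data if s == t and e)         # events at time t
        d1 = sum(1 for s, e, g in data if s == t and e and g == 0)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return observed_minus_expected ** 2 / variance


# Hypothetical data: group 1 has early events, group 2 late ones.
stat = logrank_two_groups([1, 2, 3], [1, 1, 1], [10, 11, 12], [1, 1, 1])
print(round(stat, 2))  # 5.05
```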
