ORTHOGONAL NMF-BASED TOP-K PATIENT MUTATION PROFILE SEARCHING



  1. Workshop on Recent Trends in Artificial Intelligence (인공지능 최근 동향 워크샵, KCC2016), 2016-06-29
     ORTHOGONAL NMF-BASED TOP-K PATIENT MUTATION PROFILE SEARCHING
     Presenter: Lee Sael
     Collaborative work with POSTECH DM Lab. (Hwanjo Yu & Sungchul Kim)
     Ref. Publication: Kim, S., Sael, L., & Yu, H. (2015). A mutation profile for top-k patient search exploiting gene-ontology and orthogonal non-negative matrix factorization. Bioinformatics, btv409.

  2. FAST SOMATIC MUTATION PROFILE SEARCH: THE MOTIVATION
     • Sequencing will become a common practice in medicine [1-3].
     • Characterizing cancer patients with somatic mutations is a natural process for cancer studies because cancer is the result of accumulation of genetic alterations.
     • Similarity search on mutation profiles can solve various translational bioinformatics tasks, including prognostics and treatment efficacy predictions for better clinical decisions [4].
     Figure credits: National Human Genome Research Institute (NHGRI); ED Pleasance et al. Nature 000, 1-6 (2009), doi:10.1038/nature08658

  3. CHALLENGE: SPARSITY AND HETEROGENEITY OF MUTATION DATA
     • Somatic mutation data are sparse in character, and for complex diseases, including cancer, mutations are genetically heterogeneous [5-6].

  4. GO AND ONMF-BASED SOMATIC MUTATION PROFILE
     • Goal
       • To provide a simple but effective representation of somatic mutation for cancer patients
     • Method
       • Exploit Gene-Ontology (GO) and orthogonal non-negative matrix factorization (ONMF)
     • Target data
       • Somatic mutation data (from TCGA)
       • 5 different cancer types
     • Characteristics of proposed profile
       • Compact mutation profile
       • Enable real-time search
       • Tolerant to heterogeneity
       • Directness in function interpretation
       • High predictive power for clinical features

  5. OVERVIEW OF THE PROFILE GENERATION AND VALIDATION METHODS

  6. SOMATIC MUTATION PROFILE, S
     • For each patient, somatic mutations are represented as a profile of binary mutated states on genes.
     • Types of mutation considered:
       • a single-nucleotide base change,
       • the insertion of bases,
       • the deletion of bases.

     Mutation records (excerpt):
     Patient  Gene   Variant type  Variant class  Chrom.  Start/End pos.  Ref_Allele
     2352     NEK11  INS           Shift_Ins      19      58862932        -
     2002     EGFR   DEL           Shift_Del      10      52575855        G
     2002     TP53   SNP           Missense       10      52575855        A
     2352     EGFR   SNP           Missense       3       9229467         T
     A062     A2M    SNP           Silent         5

     Binary mutation profile (excerpt):
     Patient  TP53  NEK11  EGFR  …
     2352     0     1      0     …
     …        …     …      …
     2002     1     0      1     ...
              1     1      0
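
The step from per-mutation records to the binary profile S can be sketched as below. This is a minimal illustration rather than the authors' code: the function name `build_mutation_profile`, the patient IDs `P1` and `P2`, and the choice to treat every listed record as a mutated state are assumptions made for the example.

```python
import numpy as np

def build_mutation_profile(records):
    """Turn (patient, gene) mutation records into a binary patient-by-gene matrix."""
    patients = sorted({p for p, _ in records})
    genes = sorted({g for _, g in records})
    p_idx = {p: i for i, p in enumerate(patients)}
    g_idx = {g: j for j, g in enumerate(genes)}
    S = np.zeros((len(patients), len(genes)), dtype=int)
    for p, g in records:
        # Assumption: any recorded mutation sets the gene's state to 1.
        S[p_idx[p], g_idx[g]] = 1
    return patients, genes, S

# Hypothetical records; only the gene names are taken from the slide.
records = [("P1", "NEK11"), ("P2", "EGFR"), ("P2", "TP53"), ("P1", "EGFR")]
patients, genes, S = build_mutation_profile(records)
print(genes)  # column order of S
print(S)      # one row per patient
```

Rows and columns are sorted alphabetically here only to make the output deterministic; any fixed ordering works as long as it is shared with the gene-function matrix used later.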

  7. GENE ONTOLOGY (GO)
     • Terms in the Gene Ontology (GO) form a hierarchical, controlled vocabulary describing genes and gene products [7-8].
     • Biological terms at the same level of the GO hierarchy may have different granularity [9].
     • We use only Biological Process (BP) terms.
     Figure: adapted from a figure by the Gene Ontology Consortium (geneontology.org)

  8. GENE-FUNCTION PROFILE, G_geneXGO
     • Each gene is a binary vector of GO terms:
       • 1 if annotated with the term,
       • 0 otherwise.
     • Reducing correlation between GO terms by using only the most specific terms
       • Scores of non-leaf nodes are propagated to their descendant nodes until G_t converges:
         $G_{t+1} = G_t \times M_{GO}$
         where $G_t$ is the gene-function profile at the t-th iteration and $M_{GO}$ is an adjacency matrix.
     Figure: gene-function profile, G_geneXGO
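
The propagation rule above can be sketched as a fixed-point iteration. The slide does not say how the adjacency matrix is weighted, so this example makes two assumptions that should be read as illustrative only: a non-leaf term splits its score equally among its children, and each leaf term has a self-loop so that its score is retained. The function name `propagate_to_leaves` and the toy three-term hierarchy are hypothetical.

```python
import numpy as np

def propagate_to_leaves(G, M, max_iter=100, tol=1e-12):
    """Iterate G_{t+1} = G_t x M until the profile stops changing."""
    G_t = np.asarray(G, dtype=float)
    for _ in range(max_iter):
        G_next = G_t @ M
        if np.abs(G_next - G_t).max() <= tol:
            return G_next
        G_t = G_next
    return G_t

# Toy hierarchy: term 0 is the parent of leaf terms 1 and 2.
# Assumption: the parent row splits equally; leaves keep their score.
M_GO = np.array([
    [0.0, 0.5, 0.5],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
G0 = np.array([
    [1.0, 0.0, 0.0],  # gene annotated only with the parent term
    [0.0, 1.0, 0.0],  # gene annotated with leaf term 1
])
G_final = propagate_to_leaves(G0, M_GO)
print(G_final)
```

With this weighting the iteration settles after as many steps as the hierarchy is deep, and all score mass ends up on the most specific terms.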

  9. GO-BASED MUTATION PROFILE, GO-MP
     • For each patient, the GO-based somatic mutation profile is represented by a weighted sum of gene scores on each GO term.
     • Multiply the mutation profile matrix S with the gene-GO profile matrix: S x G_geneXGO
     Figure: mutation profile, S x gene-function profile, G_geneXGO = GO-based mutation profile, GO-MP. Example GO terms shown: lipoxin A4 biosynthetic process; glycerophospholipid biosynthetic process; phosphatidylglycerol biosynthetic process; icosanoid metabolic process.
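
As a small worked example of the product S x G_geneXGO, the sketch below multiplies a toy binary mutation matrix by a toy gene-function matrix. All values are made up for illustration, and the variable name `GO_MP` is an assumption.

```python
import numpy as np

# Toy binary mutation profile: 2 patients x 3 genes.
S = np.array([
    [0, 1, 1],
    [1, 0, 1],
])
# Toy gene-function profile: 3 genes x 2 GO terms.
G = np.array([
    [1, 0],
    [0, 1],
    [1, 1],
])
# Each entry sums the GO-term scores of the patient's mutated genes.
GO_MP = S @ G
print(GO_MP)
```

Two patients with no mutated gene in common can still receive similar rows in `GO_MP` when their mutated genes share GO terms, which is how the profile tolerates heterogeneity.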

  10. ONMF MUTATION PROFILE, ONMF-MP
     • Orthogonal Non-negative Matrix Factorization (ONMF)
       $X \cong W \times H \quad \text{s.t.} \quad H H^{T} = I$
       where X is the GO-MP matrix, W is the patient-to-subtype matrix, and H is the subtype-to-GO matrix.
     • Generally, orthogonal constraints on NMF enhance the clustering quality.
       • Similar basis vectors are avoided.
     • ONMF mutation profile
       • The GO-MPs are further made compact by taking the encoding matrix W of ONMF on X as profile vectors.
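
A factorization of this form can be approximated with multiplicative updates. The slides do not state which ONMF solver was used, so the sketch below is one common scheme chosen for illustration, not the authors' implementation; the function name `onmf`, the iteration count, and the toy matrix are assumptions.

```python
import numpy as np

def onmf(X, k, n_iter=200, eps=1e-9, seed=0):
    """Approximate X ~ W @ H with W, H >= 0 and rows of H pushed toward H @ H.T = I."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        # Standard multiplicative NMF update for W with H fixed.
        W *= (X @ H.T) / (W @ (H @ H.T) + eps)
        # Multiplicative update for H that encourages orthogonal rows.
        WtX = W.T @ X
        H *= np.sqrt(WtX / (WtX @ H.T @ H + eps))
    return W, H

# Toy GO-MP matrix with two obvious patient groups (4 patients x 4 GO terms).
X = np.array([
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
])
W, H = onmf(X, k=2)
print(W.shape, H.shape)  # rows of W serve as the compact per-patient profiles
```

The rows of `W` have length k regardless of how many GO terms there are, which is what makes the resulting profile compact enough for fast search.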

  11. PERFORMANCE VALIDATION
     • Cancer stratification
       • Associations between the cancer subtypes and clinical features.
       • Figure: patient-to-subtype matrix ~ clinical data (survival time, histological features, and so on)
     • Top-k search
       • Similarity of clinical profiles to determine whether the search results are correct.
       • Figure: patient-to-subtype matrix ~ clinical data (survival time, histological features, and so on)
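
A top-k search over profile vectors can be sketched as a nearest-neighbour lookup. The slides do not name the similarity measure, so cosine similarity is used here purely as an assumption; the function name `top_k` and the toy profiles are also hypothetical.

```python
import numpy as np

def top_k(query, profiles, k):
    """Return indices and similarities of the k profiles most similar to the query."""
    P = np.asarray(profiles, dtype=float)
    q = np.asarray(query, dtype=float)
    # Cosine similarity (assumed measure); the small constant avoids division by zero.
    sims = (P @ q) / (np.linalg.norm(P, axis=1) * np.linalg.norm(q) + 1e-12)
    order = np.argsort(-sims, kind="stable")[:k]
    return order, sims[order]

# Toy profile vectors, e.g. rows of an encoding matrix W.
profiles = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [0.1, 0.9],
]
idx, sims = top_k([1.0, 0.0], profiles, k=2)
print(idx, sims)
```

Validation in the sense of this slide would then compare the clinical features of the returned patients with those of the query patient.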

  12. EXPERIMENTAL RESULT
     • Data set
       • Somatic mutation data of five tumor types downloaded from the TCGA portal: UCEC, BRCA, OV, LUAD, GBM

       Data        UCEC   BRCA   OV     LUAD   GBM
       # patients  247    772    441    516    291
       # genes     9341   13078  12431  18067  9341

     • Competitors
       • Cancer stratification: Network-Based Stratification (NBS), GOS (NMF on GO-MP), ORGOS (ONMF on GO-MP)
       • Top-k search: somatic mutation profile, GO-MP, ONMF-MP

  13. COMPARED METHOD: NETWORK-BASED STRATIFICATION (NBS)
     • A method to integrate somatic tumor genomes with gene networks
     Matan Hofree, John P Shen, Hannah Carter, Andrew Gross & Trey Ideker, Network-based stratification of tumor mutations. (Nature 2013).

  14. ASSOCIATION WITH PATIENT SURVIVAL
     [Figure: survival curves; x-axis: survival time (months)]

  15. ASSOCIATION WITH PATIENT SURVIVAL
     • In OV, the three survival curves show a similar pattern for all three approaches.
     • In LUAD, NBS produced inaccurate survival curves in which the min subtype shows a longer survival pattern than the max subtype.
     • In GBM, NBS was successful at grouping the min-survival subtype, while ORGOS was better at grouping the max-survival subtype.

  16. CHI-SQUARE STATISTICS OF SUBTYPES WITH HISTOLOGICAL BASIS FEATURE ON UCEC DATA
     [Figure: chi-square statistics (y-axis) versus number of subtypes, 2 to 10 (x-axis), for NBS, GOS, and ORGOS. Inset: number of patients per subtype (subtypes 1 to 3; legend C1 to C4), annotated "Low-grade", "Endometrioid type", and "Serous adenocarcinoma, High-grade".]
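
The chi-square statistic plotted on this slide measures how strongly subtype membership is associated with a categorical clinical feature. A minimal sketch of the computation from a subtype-by-feature contingency table follows; the function name `chi_square_statistic` and the counts are hypothetical and are not taken from the UCEC data.

```python
import numpy as np

def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table of observed counts."""
    obs = np.asarray(table, dtype=float)
    row = obs.sum(axis=1, keepdims=True)   # totals per subtype
    col = obs.sum(axis=0, keepdims=True)   # totals per feature value
    expected = row @ col / obs.sum()       # counts expected under independence
    return float(((obs - expected) ** 2 / expected).sum())

# Hypothetical counts: rows are subtypes, columns are histological types.
counts = [
    [10, 20],
    [20, 10],
]
chi2 = chi_square_statistic(counts)
print(chi2)
```

A larger value means the subtypes separate the clinical feature more sharply, which is why higher curves in the figure indicate better stratification.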

  17. CHI-SQUARE STATISTICS OF SUBTYPES WITH ESTROGEN RECEPTOR STATUS ON BRCA DATA
     [Figure: chi-square statistics (y-axis) versus number of subtypes, 2 to 10 (x-axis), for NBS, GOS, and ORGOS.]

  18. TOP-K SEARCH ON SINGLE FEATURE
     [Figure: two bar charts (y-axis 0 to 100) comparing somatic mutation, GO-MP, and ONMF-MP profiles at Top-1 and Top-10. Panels: UCEC data, histological type; BRCA data, estrogen receptor status.]

  19. TOP-10 SEARCH ON MULTIPLE FEATURES
     [Figure: two bar charts (y-axis 0 to 100) comparing somatic mutation, GO-MP, and ONMF-MP profiles at 50% and 75% thresholds. Panels: UCEC data; BRCA data.]

  20. AVERAGE TOP-K SEARCH SPEED
     [Figure: bar chart of search speed in milliseconds (x-axis 0 to 14000) for somatic mutation, GO-MP, and ONMF-MP profiles on the BRCA, GBM, UCEC, LUAD, and OV data.]

  21. PROPAGATION OF GO TERM SCORES

  22. ANALYSIS OF SUBTYPES ON GO TERMS
     • The PI3K cascade is an important pathway involved in proliferation, invasion, and migration in cancer [10-12].
     • The PI3K pathway influences the survival of GBM patients [13].
     • Glioblastoma and pancreatic cancer share network patterns that contain most of the candidate causative mutations [14].
     • Pancreatic stellate cells are responsible for creating a tumor facilitatory environment that stimulates local tumor growth and distant metastasis [15].
