Workshop on Recent Trends in Artificial Intelligence (KCC2016), 2016-06-29
ORTHOGONAL NMF-BASED TOP-K PATIENT MUTATION PROFILE SEARCHING
Presenter: Lee Sael
Collaborative work with POSTECH DM Lab (Hwanjo Yu & Sungchul Kim)
Ref. publication: Kim, S., Sael, L., & Yu, H. (2015). A mutation profile for top-k patient search exploiting gene-ontology and orthogonal non-negative matrix factorization. Bioinformatics, btv409.

FAST SOMATIC MUTATION PROFILE SEARCH: THE MOTIVATION
• Sequencing will become a common practice in medicine [1-3].
• Characterizing cancer patients by their somatic mutations is natural for cancer studies, because cancer results from the accumulation of genetic alterations.
• Similarity search on mutation profiles can address various translational bioinformatics tasks, including prognostics and treatment efficacy prediction, in support of better clinical decisions [4].
Figure credits: National Human Genome Research Institute (NHGRI); E. D. Pleasance et al., Nature (2009), doi:10.1038/nature08658

CHALLENGE: SPARSITY AND HETEROGENEITY OF MUTATION DATA
Somatic mutation data are sparse, and for complex diseases, including cancer, mutations are genetically heterogeneous [5-6].

GO AND ONMF-BASED SOMATIC MUTATION PROFILE
Goal: To provide a simple but effective somatic mutation profile for cancer patients.
Method: Exploit Gene-Ontology (GO) and orthogonal non-negative matrix factorization (ONMF).
Target data:
• Somatic mutation data (from TCGA)
• 5 different cancer types
Characteristics of proposed profile:
• Compact representation of mutation
• Enable real-time search
• Tolerant to heterogeneity
• Directness in function interpretation
• High predictive power for clinical features

OVERVIEW OF THE PROFILE GENERATION AND VALIDATION METHODS

SOMATIC MUTATION PROFILE, S
For each patient, somatic mutations are represented as a profile of binary mutated states on genes.
Types of mutation considered: a single-nucleotide base change, and the insertion or deletion of bases.

Mutation records:
Patient  Gene   Variant type  Variant class  Chrom.  Start/End pos.  Ref_Allele
2352     NEK11  INS           Shift_Ins      19      58862932        -
2002     EGFR   DEL           Shift_Del      10      52575855        G
2002     TP53   SNP           Missense       10      52575855        A
2352     EGFR   SNP           Missense       3       9229467         T
A062     A2M    SNP           Silent         5       …               …

Binary mutation profile S:
Patient  TP53  NEK11  EGFR  …
2352     0     1      0     …
2002     1     0      1     …
…        1     1      0     …

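The step from mutation records to the binary profile S can be sketched as below. This is a minimal illustration and not the authors' code: the function name `build_mutation_profile` and the patient IDs `P1` and `P2` are hypothetical, and only the gene names come from the slide.

```python
def build_mutation_profile(records, patients, genes):
    """Return a patients x genes matrix of 0/1 mutated states."""
    row = {p: i for i, p in enumerate(patients)}
    col = {g: j for j, g in enumerate(genes)}
    S = [[0] * len(genes) for _ in patients]
    for patient, gene in records:
        # Any listed mutation marks the gene as mutated; repeats stay at 1.
        S[row[patient]][col[gene]] = 1
    return S


# Hypothetical patient IDs; gene names taken from the slide.
records = [("P1", "NEK11"), ("P2", "EGFR"), ("P2", "TP53"), ("P2", "EGFR")]
S = build_mutation_profile(records, ["P1", "P2"], ["TP53", "NEK11", "EGFR"])
print(S)  # [[0, 1, 0], [1, 0, 1]]
```

The variant type and class are ignored here because the profile only records whether a gene is mutated at all.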
GENE ONTOLOGY (GO)
• Terms in the Gene Ontology (GO) form a hierarchical representation of a controlled vocabulary of genes and gene products [7-8].
• Biological terms at the same level may have different granularity in the GO hierarchy [9].
• We use only Biological Process (BP) terms.
Figure adapted from the Gene Ontology Consortium (geneontology.org).

GENE-FUNCTION PROFILE, $G_{gene \times GO}$
• Each gene is a binary vector of GO terms: 1 if annotated with the term, 0 otherwise.
• Correlation between GO terms is reduced by using only the most specific terms.
• Scores of non-leaf nodes are propagated to their descendant nodes until $G^{t}$ converges:
$G^{t+1} = G^{t} \times M_{GO}$
where $G^{t}$ is the gene-function profile at the t-th iteration and $M_{GO}$ is an adjacency matrix.

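The propagation rule can be illustrated with a small sketch. The slide does not say how $M_{GO}$ is weighted, so the code assumes that each parent splits its score evenly among its children and that leaf terms keep their own score through a self-loop. The four-term hierarchy and the helper names `matmul` and `propagate` are hypothetical.

```python
def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def propagate(G, M, tol=1e-12, max_iter=1000):
    """Iterate G <- G x M until the largest entry change is below tol."""
    for _ in range(max_iter):
        G_next = matmul(G, M)
        change = max(abs(x - y) for r, s in zip(G_next, G) for x, y in zip(r, s))
        G = G_next
        if change < tol:
            break
    return G


# Hypothetical hierarchy: root -> {A, B}, A -> C; B and C are leaves.
# Assumed weighting: even split among children, self-loop on leaves.
M_GO = [
    [0.0, 0.5, 0.5, 0.0],  # root
    [0.0, 0.0, 0.0, 1.0],  # A
    [0.0, 0.0, 1.0, 0.0],  # B (leaf)
    [0.0, 0.0, 0.0, 1.0],  # C (leaf)
]
G0 = [[1.0, 1.0, 0.0, 0.0]]  # one gene annotated with root and A
G = propagate(G0, M_GO)
print(G)  # [[0.0, 0.0, 0.5, 1.5]]
```

Under these assumptions the total score of a gene is preserved and ends up on the most specific (leaf) terms, which is one way to read the slide's "using only the most specific terms".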
GO-BASED MUTATION PROFILE, GO-MP
• For each patient, the GO-based somatic mutation profile is a weighted sum of gene scores on each GO term.
• It is obtained by multiplying the mutation profile matrix S by the gene-function profile matrix: GO-MP = $S \times G_{gene \times GO}$.
[Figure: mutation profile S, gene-function profile $G_{gene \times GO}$, and the resulting GO-based mutation profile GO-MP. GO terms shown: lipoxin A4 biosynthetic process, glycerophospholipid biosynthetic process, phosphatidylglycerol biosynthetic process, icosanoid metabolic process.]

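A tiny worked example of the product may help. The matrices are made up for illustration (2 patients, 3 genes, 2 GO terms), and `go_mutation_profile` is a hypothetical helper name.

```python
def go_mutation_profile(S, G):
    """Return S x G: one row per patient, one column per GO term."""
    return [[sum(s * g for s, g in zip(row, col)) for col in zip(*G)] for row in S]


# 2 patients x 3 genes (binary mutated states).
S = [[0, 1, 0],
     [1, 0, 1]]
# 3 genes x 2 GO terms (hypothetical propagated gene scores).
G = [[1.0, 0.0],
     [0.5, 0.5],
     [0.0, 1.0]]
go_mp = go_mutation_profile(S, G)
print(go_mp)  # [[0.5, 0.5], [1.0, 1.0]]
```

Each entry sums the GO-term scores of the genes mutated in that patient, so two patients with different mutated genes can still obtain similar profiles when those genes share functions.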
ONMF MUTATION PROFILE, ONMF-MP
Orthogonal Non-negative Matrix Factorization (ONMF):
$X \cong W \times H \quad \text{s.t.} \quad H H^{T} = I$
where X is the GO-MP matrix, W is the patient-to-subtype matrix, and H is the subtype-to-GO matrix.
• Generally, orthogonality constraints on NMF enhance clustering quality, because similar basis vectors are avoided.
• ONMF mutation profile: the GO-MPs are made more compact by taking the encoding matrix W of ONMF on X as profile vectors.

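The factorization can be sketched with multiplicative updates of the kind commonly used for orthogonal NMF. The slides do not state which solver was used, so the update rules, the iteration count, and the hand-picked starting matrices (chosen only to keep the run deterministic; random non-negative starts are more usual) are all assumptions. The helper names are hypothetical.

```python
import math


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def sq_error(X, W, H):
    """Squared Frobenius error between X and W x H."""
    WH = matmul(W, H)
    return sum((x - y) ** 2 for rx, ry in zip(X, WH) for x, y in zip(rx, ry))


def onmf(X, W, H, n_iter=200, eps=1e-12):
    """Multiplicative updates for X ~ W H with H H^T ~ I, all entries >= 0."""
    for _ in range(n_iter):
        # W <- W * (X H^T) / (W H H^T)
        Ht = transpose(H)
        num = matmul(X, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[w * n / (d + eps) for w, n, d in zip(rw, rn, rd)]
             for rw, rn, rd in zip(W, num, den)]
        # H <- H * sqrt((W^T X) / (W^T X H^T H))
        WtX = matmul(transpose(W), X)
        den = matmul(matmul(WtX, transpose(H)), H)
        H = [[h * math.sqrt(n / (d + eps)) for h, n, d in zip(rh, rn, rd)]
             for rh, rn, rd in zip(H, WtX, den)]
    return W, H


# Toy GO-MP matrix: patients 0-1 and patients 2-3 hit disjoint GO terms.
X = [[1.0, 1.0, 0.0, 0.0],
     [1.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 1.0],
     [0.0, 0.0, 1.0, 1.0]]
W0 = [[1.0, 0.5], [0.9, 0.6], [0.5, 1.0], [0.6, 0.9]]
H0 = [[1.0, 0.8, 0.4, 0.5], [0.5, 0.4, 0.9, 1.0]]
err_before = sq_error(X, W0, H0)
W, H = onmf(X, W0, H0)
err_after = sq_error(X, W, H)
# Assign each patient to the subtype with the largest weight in W.
subtype = [row.index(max(row)) for row in W]
print(subtype, err_after < err_before)
```

Because H is non-negative with (approximately) orthonormal rows, each GO term is pushed towards a single subtype, which is why the rows of W can serve as compact, cluster-like patient profiles.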
PERFORMANCE VALIDATION
• Cancer stratification: associations between the cancer subtypes and clinical features.
• Top-k search: similarity of clinical profiles is used to determine whether the search results are correct.
[Figure: in both settings the patient-to-subtype matrix is compared against clinical data (survival time, histological features, and so on).]

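The top-k validation can be sketched as follows. The slide does not name the similarity measure, so cosine similarity is an assumption, as are the helper names, the toy profiles, and the clinical labels.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(query_idx, profiles, k):
    """Indices of the k patients most similar to the query (query excluded)."""
    q = profiles[query_idx]
    scored = [(cosine(q, p), i) for i, p in enumerate(profiles) if i != query_idx]
    scored.sort(key=lambda t: (-t[0], t[1]))  # best score first, ties by index
    return [i for _, i in scored[:k]]


def hit_rate(query_idx, hits, labels):
    """Fraction of retrieved patients sharing the query's clinical label."""
    return sum(labels[i] == labels[query_idx] for i in hits) / len(hits)


# Hypothetical 2-dimensional profiles and histological labels.
profiles = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8], [0.7, 0.3]]
labels = ["serous", "serous", "endometrioid", "endometrioid", "serous"]
hits = top_k(0, profiles, k=2)
print(hits, hit_rate(0, hits, labels))  # [1, 4] 1.0
```

A search result counts as correct here when the retrieved patient shares the query's clinical label; averaging the hit rate over all queries gives a single accuracy figure per profile type.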
EXPERIMENTAL RESULT
Data set: somatic mutation data of five tumor types downloaded from the TCGA portal (UCEC, BRCA, OV, LUAD, GBM).

Data        UCEC   BRCA    OV      LUAD    GBM
# patients  247    772     441     516     291
# genes     9341   13078   12431   18067   9341

Competitors:
• Cancer stratification: Network-Based Stratification (NBS), GOS (NMF on GO-MP), ORGOS (ONMF on GO-MP)
• Top-k search: somatic mutation profile, GO-MP, ONMF-MP

COMPARED METHOD: NETWORK-BASED STRATIFICATION (NBS)
• A method to integrate somatic tumor genomes with gene networks.
Reference: Hofree, M., Shen, J. P., Carter, H., Gross, A., & Ideker, T. (2013). Network-based stratification of tumor mutations. Nature Methods.

ASSOCIATION WITH PATIENT SURVIVAL
[Figure: survival curves; x-axis: survival time (months).]

ASSOCIATION WITH PATIENT SURVIVAL
• In OV, the survival curves show a similar pattern for all three approaches.
• In LUAD, NBS produced inaccurate survival curves in which the min subtype shows a longer survival pattern than the max subtype.
• In GBM, NBS was successful at grouping the min-survival patients, while ORGOS was better at grouping the max-survival patients.

CHI-SQUARE STATISTICS OF SUBTYPES WITH HISTOLOGICAL BASIS FEATURE ON UCEC DATA
[Figure: chi-square statistics (y-axis) against number of subtypes, 2 to 10 (x-axis), for NBS, GOS, and ORGOS. Second panel: number of patients (y-axis) per subtype, 1 to 3 (x-axis), with series C1 to C4 and the annotations "Low-grade", "Endometrioid type", and "Serous adenocarcinoma, High-grade".]

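For reference, the chi-square statistic between a subtype assignment and a categorical clinical feature can be computed from their contingency table. The sketch below uses the standard Pearson formula; the counts are hypothetical, and the code assumes every subtype and every category contains at least one patient.

```python
def chi_square(table):
    """Pearson chi-square statistic for a subtypes x categories count table."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if subtype and category were independent.
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical counts: rows are 2 subtypes, columns are 2 histological types.
table = [[30, 10],
         [10, 30]]
print(chi_square(table))  # 20.0
```

A larger statistic means the subtypes separate the clinical categories more strongly than independence would predict.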
CHI-SQUARE STATISTICS OF SUBTYPES WITH ESTROGEN RECEPTOR STATUS ON BRCA DATA
[Figure: chi-square statistics (y-axis) against number of subtypes, 2 to 10 (x-axis), for NBS, GOS, and ORGOS.]

TOP-K SEARCH ON SINGLE FEATURE
[Figure: two panels with a 0 to 100 scale comparing Somatic mutation, GO-MP, and ONMF-MP at Top-1 and Top-10. Panels: UCEC data, histological type; BRCA data, estrogen receptor status.]

TOP-10 SEARCH ON MULTIPLE FEATURES
[Figure: two panels with a 0 to 100 scale comparing Somatic mutation, GO-MP, and ONMF-MP at the 50% and 75% thresholds. Panels: UCEC data; BRCA data.]

AVERAGE TOP-K SEARCH SPEED
[Figure: search speed in milliseconds (axis from 0 to 14000) for Somatic mutation, GO-MP, and ONMF-MP on the BRCA, GBM, UCEC, LUAD, and OV data.]

PROPAGATION OF GO TERM SCORES

ANALYSIS OF SUBTYPES ON GO TERMS
• The PI3K cascade is an important pathway involved in proliferation, invasion, and migration in cancer [10-12].
• The PI3K pathway influences GBM patient survival [13].
• Glioblastoma and pancreatic cancer share network patterns that contain most of the candidate causative mutations [14].
• Pancreatic stellate cells are responsible for creating a tumor-facilitating environment that stimulates local tumor growth and distant metastasis [15].