DeepGene: An Advanced Cancer Type Classifier Based on Deep Learning and Somatic Point Mutations 石毅 (Shi, Yi) 2016.10.03 Center for Systems Biomedicine Shanghai Jiao Tong University
Outline Motivation Methods Results & Discussion
Motivation
Motivation Traditional cancer diagnosis • Morphological appearance, imaging techniques Image from radiology.med.nyu.edu • Gene expression Image from well.ox.ac.uk • Protein profiling Image from sigmaaldrich.com
Motivation Underlying drivers • Somatic point mutations • Insertions and deletions (INDELs) • Chromosomal translocations • Copy number abnormalities
Motivation Neural network (1940s) Support vector machine (1960s) Deep neural network (1980s) Supervised learning combined with unsupervised learning
Motivation Applications of deep neural network (DNN) learning
Methods
Methods Three steps of DeepGene • Step1. Clustered gene filtering (CGF) • Step2. Indexed sparsity reduction (ISR) • Step3. Deep neural network (DNN) classifier
Methods Step1. Clustered gene filtering (CGF) • Intuitive idea: Team A: Vs. Team B:
Methods Step1. Clustered gene filtering (CGF)
Methods Step2. Indexed sparsity reduction (ISR) • Raw gene data (N × 1) → indexed gene data: indexes of the non-zero elements (n_NZ × 1) • If n_NZ ≥ n_ISR: truncate to the n_ISR elements with the highest occurrence frequency • If n_NZ < n_ISR: add zeros to the tail • Gene data after ISR: n_ISR × 1
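The ISR step on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The function name `indexed_sparsity_reduction`, the use of 1-based gene indexes (so that 0 can serve as padding), the tie-breaking rule, and the ascending order of the kept indexes are all assumptions made for the example. The per-gene occurrence frequency is assumed to be supplied by the caller.

```python
def indexed_sparsity_reduction(sample, gene_frequency, n_isr):
    """Return a fixed-length list of gene indexes for one sample.

    sample         -- list of N values, non-zero where the gene is mutated
    gene_frequency -- list of N occurrence frequencies (assumed precomputed)
    n_isr          -- target length of the output vector
    """
    # Indexes of the non-zero elements (0-based internally).
    nonzero = [i for i, value in enumerate(sample) if value != 0]

    if len(nonzero) >= n_isr:
        # Keep the n_isr genes with the highest occurrence frequency.
        # Ties are broken by the lower index (an assumption).
        kept = sorted(nonzero, key=lambda i: (-gene_frequency[i], i))[:n_isr]
        kept.sort()  # ascending gene order (an assumption)
        return [i + 1 for i in kept]

    # Fewer non-zero genes than n_isr: add zeros to the tail.
    # Indexes are 1-based so that 0 is unambiguous padding (an assumption).
    return [i + 1 for i in nonzero] + [0] * (n_isr - len(nonzero))


if __name__ == "__main__":
    sample = [0, 2, 0, 1, 1, 0, 3]
    frequency = [5, 9, 1, 2, 7, 3, 4]
    print(indexed_sparsity_reduction(sample, frequency, 3))
    print(indexed_sparsity_reduction(sample, frequency, 6))
```

Both branches return exactly n_isr elements, so every sample becomes a vector of the same length regardless of how many genes are mutated.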
Methods Step3. Deep neural network classifier
Methods Overall flowchart of DeepGene • Raw gene data (N × 1) → Clustered gene filtering (CGF) → clustered discriminatory gene data (n_1 × 1) • Raw gene data (N × 1) → Indexed sparsity reduction (ISR) → indexes of non-zero elements (n_2 × 1) • Concatenation → input to the DNN classifier ((n_1 + n_2) × 1) • DNN classifier → classification result (1 × 1): classification label (cancer type), e.g. 6 (KIRP)
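The concatenation and labelling stages of the flowchart can be sketched as below. This only illustrates the vector shapes. The CGF and ISR outputs are taken as given inputs, and the helper names `build_dnn_input` and `label_to_cancer_type` are hypothetical. The 1-based label mapping assumes the labels follow the alphabetical order of the 12 TCGA abbreviations used in this talk, which matches the "6 → KIRP" example in the flowchart but is not otherwise confirmed by the slides.

```python
# The 12 cancer types from the talk, in alphabetical order (assumed label order).
CANCER_TYPES = [
    "ACC", "BLCA", "BRCA", "CESC", "HNSC", "KIRP",
    "LGG", "LUAD", "PAAD", "PRAD", "STAD", "UCS",
]


def build_dnn_input(cgf_features, isr_indexes):
    """Concatenate the CGF output (n_1 values) and the ISR output (n_2 values)
    into a single (n_1 + n_2)-element input vector for the classifier."""
    return list(cgf_features) + list(isr_indexes)


def label_to_cancer_type(label):
    """Map a 1-based classification label to its cancer type abbreviation."""
    if not 1 <= label <= len(CANCER_TYPES):
        raise ValueError(f"label must be in 1..{len(CANCER_TYPES)}, got {label}")
    return CANCER_TYPES[label - 1]


if __name__ == "__main__":
    cgf_features = [2, 1, 1, 1]       # toy CGF output, n_1 = 4
    isr_indexes = [2, 4, 21, 39, 0]   # toy ISR output, n_2 = 5
    dnn_input = build_dnn_input(cgf_features, isr_indexes)
    print(len(dnn_input), label_to_cancer_type(6))
```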
Results & Discussion
Results & Discussion Dataset • 12 tumor somatic point mutation datasets from TCGA (ACC, BLCA, BRCA, CESC, HNSC, KIRP, LGG, LUAD, PAAD, PRAD, STAD, UCS). • 22,834 genes from 3,122 samples in total. Note: ACC, adrenocortical carcinoma; BLCA, bladder urothelial carcinoma; BRCA, breast invasive carcinoma; CESC, cervical squamous cell carcinoma and endocervical adenocarcinoma; HNSC, head and neck squamous cell carcinoma; KIRP, kidney renal papillary cell carcinoma; LGG, brain lower grade glioma; LUAD, lung adenocarcinoma; PAAD, pancreatic adenocarcinoma; PRAD, prostate adenocarcinoma; STAD, stomach adenocarcinoma; UCS, uterine carcinosarcoma.
Results & Discussion Parameters (a) Parameter estimation corresponding to Table 4; (b) parameter estimation for the number of layers and the number of parameters per layer of the DNN classifier, corresponding to Table 5; (c) parameter estimation for cost and gamma for the SVM, corresponding to Table 6; (d) parameter estimation corresponding to Table 7.
Results & Discussion Do CGF and/or ISR help? 10-fold cross-validation accuracy of DeepGene with different design options
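For readers unfamiliar with the evaluation protocol, 10-fold cross-validation can be sketched as follows. This is a generic illustration, not the evaluation code behind the reported results. The function names, the fixed shuffling seed, and the majority-class stand-in classifier are assumptions for the example, and accuracy is pooled over all held-out samples rather than averaged per fold.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indexes and split them into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_accuracy(features, labels, train_fn, predict_fn, k=10, seed=0):
    """Train on k-1 folds, test on the held-out fold, and pool the results."""
    correct = 0
    for held_out in k_fold_indices(len(labels), k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(labels)) if i not in held]
        model = train_fn([features[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct += sum(predict_fn(model, features[i]) == labels[i]
                       for i in held_out)
    return correct / len(labels)


# Stand-in classifier (hypothetical): always predicts the most common
# training label. A real run would plug in the classifier under test.
def train_majority(train_features, train_labels):
    return Counter(train_labels).most_common(1)[0][0]


def predict_majority(model, sample):
    return model


if __name__ == "__main__":
    features = [[0]] * 40
    labels = ["A"] * 30 + ["B"] * 10
    print(cross_validated_accuracy(features, labels,
                                   train_majority, predict_majority))
```

Each sample is held out exactly once, so the pooled accuracy is computed over the full dataset.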
Results & Discussion Comparison with other widely used classifiers Testing accuracy of DeepGene against three widely adopted classifiers
Results & Discussion Further investigation • Integrating other heterogeneous mutation data, e.g. INDEL, CNV, translocation • Which feature (gene) combinations contribute to better prediction accuracy? Why? How can this help real diagnosis? • Applying to circulating tumor cells (CTC) or circulating tumor DNA (ctDNA) for early diagnosis, subtyping, and tumor localization.
Questions & Comments?