Predicting Cancer Phenotypes based on Somatic Genomic Alterations via Genomic Impact Transformer
Yifeng Tao 1, Chunhui Cai 2, William W. Cohen 1*, Xinghua Lu 2*
1 Carnegie Mellon University
2 University of Pittsburgh
Yifeng Tao, Carnegie Mellon University
Background
o Cancers are mainly caused by somatic genomic alterations (SGAs)
o Driver SGAs → causal to tumor development
o Passenger SGAs → neutral mutations
[Figure: normal cells → tumor cells; driver SGAs; biological/cellular processes perturbed]
Challenges
o Driver SGA detection
  o Solution 1: frequency
  o Solution 2: conserved domain of protein
  o Problem: downstream effect of SGAs
o SGA/tumor representation
  o Solution: a higher dimensional one-hot/sparse vector
  o Problem: little information/knowledge
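The one-hot/sparse representation named on this slide can be illustrated with a minimal sketch. The gene vocabulary, the tumors, and the function name `encode_tumor` are hypothetical and chosen only for illustration; they are not taken from the talk.

```python
# Hypothetical toy vocabulary; a real one would cover every gene in the cohort.
GENE_VOCAB = ["TP53", "PIK3CA", "GATA3", "KRAS", "PTEN"]
GENE_INDEX = {gene: i for i, gene in enumerate(GENE_VOCAB)}


def encode_tumor(sgas):
    """Return a sparse 0/1 vector with a 1 at the position of each altered gene."""
    vector = [0] * len(GENE_VOCAB)
    for gene in sgas:
        vector[GENE_INDEX[gene]] = 1
    return vector


# Two hypothetical tumors with disjoint sets of altered genes.
tumor_a = encode_tumor(["TP53", "PIK3CA"])
tumor_b = encode_tumor(["GATA3"])

# The dot product only counts genes altered in both tumors.
overlap = sum(a * b for a, b in zip(tumor_a, tumor_b))
print(tumor_a, tumor_b, overlap)
```

In this encoding every gene is an independent axis, so two tumors look unrelated unless they share the exact same altered gene. This is one way to read the "little information/knowledge" problem on the slide.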
Genomic Impact Transformer (GIT)
o GIT: encoder-decoder architecture
  o Mimic cellular signaling process
o Driver SGA detection
  o Problem: downstream effect of SGAs
  o Solution: supervised by gene expressions
o SGA/tumor representation
  o Problem: little information/knowledge
  o Solution: gene/tumor embedding
[Figure: SGA → gene embeddings → encoder → tumor embedding → decoder → gene expression]
Genomic Impact Transformer (GIT)
[Figure, three panels]
(a) Differentially expressed genes (DEGs): over-expressed genes and under-expressed genes
(b) Tumor embedding from the cancer type embedding, gene embeddings, and attention weights: $e_t = e_s + \alpha_1 e_1 + \alpha_2 e_2 + \dots + \alpha_m e_m$
(c) Multi-head self-attention with h heads. Labels: gene embeddings $e_1, \dots, e_m$; $W_0$; tanh; $\theta_1, \dots, \theta_h$; softmax; per-head values $\beta_{i,j}$ and $\alpha_{i,j}$; attention weights $\alpha_1, \dots, \alpha_m$
Example input: cancer patient TCGA-D8-A1JJ; cancer type BRCA; somatic genomic alterations (SGAs) MRPS28, PIK3CA, ZBTB10, CNBD1, MATN2, GATA3, PURG, TP53
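The tumor embedding formula in panel (b) and the attention labels in panel (c) can be sketched in plain Python. This is one plausible reading of the figure labels, not a confirmed implementation. It assumes each head j scores gene i as theta_j · tanh(W_0 e_i), applies a softmax over genes, and sums the per-head weights into alpha_i. All dimensions and random values are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def attention_weights(gene_embs, W0, thetas):
    """Assumed form: alpha_i = sum_j softmax_i(theta_j . tanh(W0 e_i))."""
    hidden = [[math.tanh(v) for v in matvec(W0, e)] for e in gene_embs]
    alphas = [0.0] * len(gene_embs)
    for theta in thetas:  # one pass per attention head
        betas = [sum(t * h for t, h in zip(theta, hid)) for hid in hidden]
        for i, weight in enumerate(softmax(betas)):
            alphas[i] += weight
    return alphas


def tumor_embedding(cancer_emb, gene_embs, alphas):
    """e_t = e_s + sum_i alpha_i * e_i, as in panel (b)."""
    e_t = list(cancer_emb)
    for alpha, emb in zip(alphas, gene_embs):
        for k, value in enumerate(emb):
            e_t[k] += alpha * value
    return e_t


def rand_vec(rng, n):
    return [rng.uniform(-1, 1) for _ in range(n)]


# Hypothetical sizes: 5 altered genes, 4-dim embeddings, 3 hidden units, 2 heads.
rng = random.Random(0)
dim, hidden_dim, heads, m = 4, 3, 2, 5
gene_embs = [rand_vec(rng, dim) for _ in range(m)]
e_s = rand_vec(rng, dim)
W0 = [rand_vec(rng, dim) for _ in range(hidden_dim)]
thetas = [rand_vec(rng, hidden_dim) for _ in range(heads)]

alphas = attention_weights(gene_embs, W0, thetas)
e_t = tumor_embedding(e_s, gene_embs, alphas)
print(alphas)
print(e_t)
```

Under this reading each head's weights sum to 1, so the combined weights sum to the number of heads. Genes with large alpha contribute most to the tumor embedding.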
Encoder: Attention
Encoder: Attention (continued)
Decoder: MLP
Pre-training Gene Embedding: Gene2Vec
o Co-occurrence pattern
[Figure: genes g and c; Pathway 1, Pathway 2, Pathway 3]
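The co-occurrence pattern can be made concrete with a small sketch that counts (gene, context) pairs altered in the same tumor. Such counts are the usual raw material for word2vec-style pre-training. The toy cohort and the function name `cooccurrence_pairs` are hypothetical, and the slide does not state whether Gene2Vec is trained on exactly this kind of pair.

```python
from collections import Counter
from itertools import permutations

# Hypothetical toy cohort: each tumor is the list of genes altered in it.
tumors = [
    ["TP53", "PIK3CA", "GATA3"],
    ["TP53", "PIK3CA"],
    ["KRAS", "TP53"],
]


def cooccurrence_pairs(tumors):
    """Count ordered (gene, context) pairs that are altered in the same tumor."""
    counts = Counter()
    for sgas in tumors:
        # set() guards against a gene being listed twice for one tumor.
        for gene, context in permutations(sorted(set(sgas)), 2):
            counts[(gene, context)] += 1
    return counts


pair_counts = cooccurrence_pairs(tumors)
print(pair_counts[("TP53", "PIK3CA")])
```

Pairs that co-occur often would be pushed closer together in the embedding space during pre-training, while pairs that never co-occur contribute nothing.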
Performance
o Predicting gene expression using SGAs
[Figure: accuracy and F1 score]
Gene Embedding
Gene Embedding
o NN accuracy: functionally similar genes → closer in embedding space
[Figure: NN accuracy (%) for random pairs, Gene2Vec, and Gene2Vec+GIT]
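A nearest-neighbor (NN) accuracy of this kind could be computed roughly as follows. This sketch assumes that a gene counts as a hit when its nearest neighbor by cosine similarity shares at least one functional annotation with it. That definition, the gene names, the embeddings, and the annotation labels are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nn_accuracy(embeddings, annotations):
    """Fraction of genes whose nearest neighbor shares an annotation with them."""
    genes = list(embeddings)
    hits = 0
    for gene in genes:
        nearest = max(
            (other for other in genes if other != gene),
            key=lambda other: cosine(embeddings[gene], embeddings[other]),
        )
        if annotations[gene] & annotations[nearest]:
            hits += 1
    return hits / len(genes)


# Hypothetical 2-d embeddings and functional labels for four made-up genes.
embeddings = {
    "geneA": [1.0, 0.0],
    "geneB": [0.9, 0.1],
    "geneC": [0.0, 1.0],
    "geneD": [0.1, 0.9],
}
annotations = {
    "geneA": {"f1"},
    "geneB": {"f1"},
    "geneC": {"f2"},
    "geneD": {"f1", "f3"},
}

score = nn_accuracy(embeddings, annotations)
print(score)
```

Here geneA and geneB are mutual nearest neighbors and share a label, while geneC and geneD are mutual nearest neighbors without a shared label, so half of the genes count as hits.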
Candidate Drivers via Attention
Tumor Embedding
o Common cellular signaling process across cancer types
Application - Survival Analysis
Application - Drug Response
Summary
o Biologically inspired neural network to mimic cellular signaling
o Distinguishes drivers from passengers with supervision of expression
o Gene embedding: informative of gene functions
o Tumor embedding: transferable to other phenotype prediction tasks
o Gene2Vec: speeds up training and alleviates overfitting
Acknowledgements
o Lu Lab
  o Xinghua Lu
  o Chunhui Cai
  o Yifan Xue
  o Michael Q. Ding
o Cohen Lab
  o William W. Cohen
o Funding
  o NIH
  o Pennsylvania Department of Health