Machine Reading for Cancer Panomics Hoifung Poon 1
Overview: High-Throughput Data (… ATTCGGATATTTAAGGC … / … ATTCGGGTATTTAAGCC …), KB, Cancer Systems Modeling, Disease Genes, Drug Targets 2
Overview: High-Throughput Data (… ATTCGGATATTTAAGGC … / … ATTCGGGTATTTAAGCC …), KB, Disease Genes, Drug Targets; Extract Pathways from PubMed; Grounded Semantic Parsing 3
Precision Medicine
Vemurafenib on BRAF-V600 Melanoma Before Treatment 15 Weeks 5
Vemurafenib on BRAF-V600 Melanoma Before Treatment 15 Weeks 23 Weeks 6
Traditional Biology Discovery Targeted Experiments One hypothesis 8
Genomics High-Throughput Experiments Discovery Many hypotheses (sequence panels: … ATTCGGATATTTAAGGC … / … ATTCGGGTATTTAAGCC …) 9
Genome-Wide Association Studies (GWAS) Disease (e.g., Alzheimer's, Cancer): … ATTCGGATATTTAAGGC … Healthy: … ATTCGGGTATTTAAGCC … 2000: “Genetic diagnosis of diseases would be accomplished in 10 years and that treatments would start to roll out perhaps five years after that.” 2010: “A Decade Later, Genetic Maps Yield Few New Cures”, New York Times, June 2010. 10
Key Challenges Human genome: 3 billion base pairs Potential variations: > 10 million variants Combinations: > 10^1,000,000 (a 1 followed by 1 million zeros) Machine learning problem Atomic features: > 10 million Feature combinations: Too many to enumerate 11
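The size of the combination space can be checked with a few lines of arithmetic. This is a minimal sketch that assumes each variant is simply present or absent (two states) and that variants combine freely; the function name is hypothetical and the two-state assumption is not stated on the slide.

```python
import math

def log10_combinations(num_variants: int, states_per_variant: int = 2) -> float:
    """Return log10 of the number of joint configurations.

    Assumption (not from the slide): every variant independently takes one of
    `states_per_variant` states, so the count is states ** num_variants.
    Working in log space avoids building an astronomically large integer.
    """
    return num_variants * math.log10(states_per_variant)

exponent = log10_combinations(10_000_000)
print(f"2^10,000,000 is about 10^{exponent:,.0f}")
```

Under these assumptions the exponent comes out near three million, comfortably above the slide's lower bound of one million, which is why enumerating feature combinations is hopeless.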
Genomics High-Throughput Experiments Discovery How to Scale Discovery? (sequence panels: … ATTCGGATATTTAAGGC … / … ATTCGGGTATTTAAGCC …) 12
Cancer Tumor cells: … ATTCGGATATTTAAGGC … Normal cells: … ATTCGGGTATTTAAGCC … Hundreds of mutations Most are “passengers”, not drivers Can we identify likely drivers? 13
Panomics … ATTCGGATATTTAAGGC … Genome Transcriptome Epigenome …… 14
Pathway Knowledge Genes work synergistically in pathways 15
Why Hard to Identify Drivers? Complex diseases Perturb multiple pathways Hanahan & Weinberg [Cell 2011] 16
Why Cancer Comes Back? Subtypes with alternative pathway profile Compensatory pathways can be activated EphA2 EphB2 Ovarian Cancer 17
Why Cancer Comes Back? Subtypes with alternative pathway profile Compensatory pathways can be activated EphA2 EphB2 X Ovarian Cancer 18
Cancer Systems Modeling Gene A: DNA → (Transcription) → mRNA → (Translation) → Protein → (Activation) → Protein Active; Functional activity; … ATTCGGATATTTAAGGC …; Mutation effect; Drug Target …… 19
Knowledge Model Gene A DNA mRNA Protein Protein Active Transcription Factor Gene B DNA mRNA Protein Protein Active Protein Kinase Gene C DNA mRNA Protein Protein Active 20
Knowledge Model ? Gene A DNA mRNA Protein Protein Active Transcription Factor Gene B DNA mRNA Protein Protein Active Protein Kinase Gene C DNA mRNA Protein Protein Active 21
Knowledge Model ! Gene A DNA mRNA Protein Protein Active Transcription Factor Gene B DNA mRNA Protein Protein Active Protein Kinase Gene C DNA mRNA Protein Protein Active 23
Approach: Graph HMM Gene A DNA mRNA Protein Protein Active Transcription Factor Gene B DNA mRNA Protein Protein Active Protein Kinase Gene C DNA mRNA Protein Protein Active 24
Extract Pathways from PubMed: High-Throughput Data (… ATTCGGATATTTAAGGC … / … ATTCGGGTATTTAAGCC …), KB, Disease Genes, Drug Targets …… 25
PubMed 24 million abstracts Two new abstracts every minute Over one million added every year 26
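The per-minute and per-year figures are consistent with each other, as a quick calculation shows. This is only a sanity check on the slide's round numbers, assuming a constant rate and a 365-day year; the helper name is hypothetical.

```python
MINUTES_PER_YEAR = 60 * 24 * 365  # assumes a 365-day year

def abstracts_per_year(per_minute: float) -> int:
    """Convert a constant per-minute arrival rate into a yearly total."""
    return int(per_minute * MINUTES_PER_YEAR)

yearly = abstracts_per_year(2)
print(f"{yearly:,} abstracts per year at 2 per minute")
```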
Machine Reading PMID: 123 … VDR+ binds to SMAD3 to form … PMID: 456 … JUN expression is induced by SMAD3/4 … …… 27
Machine Reading Involvement of p70(S6)-kinase activation in IL-10 up-regulation in human monocytes by gp41 envelope protein of human immunodeficiency virus type 1 ... 28
Machine Reading Involvement of p70(S6)-kinase activation in IL-10 up-regulation in human monocytes by gp41 envelope protein of human immunodeficiency virus type 1 ... Entities: p70(S6)-kinase (PROTEIN), gp41 (PROTEIN), IL-10 (PROTEIN), human monocyte (CELL) 29
Machine Reading Involvement of p70(S6)-kinase activation in IL-10 up-regulation in human monocytes by gp41 envelope protein of human immunodeficiency virus type 1 ... Events: Involvement (REGULATION): Theme = up-regulation, Cause = activation; up-regulation (REGULATION): Theme = IL-10, Cause = gp41, Site = human monocyte; activation (REGULATION): Theme = p70(S6)-kinase. Entities: p70(S6)-kinase (PROTEIN), gp41 (PROTEIN), IL-10 (PROTEIN), human monocyte (CELL) 30
Machine Reading Involvement of p70(S6)-kinase activation in IL-10 up-regulation in human monocytes by gp41 envelope protein of human immunodeficiency virus type 1 ... Semantic Parsing: sentence to nested events. Events: Involvement (REGULATION): Theme = up-regulation, Cause = activation; up-regulation (REGULATION): Theme = IL-10, Cause = gp41, Site = human monocyte; activation (REGULATION): Theme = p70(S6)-kinase. Entities: p70(S6)-kinase (PROTEIN), gp41 (PROTEIN), IL-10 (PROTEIN), human monocyte (CELL) 31
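The target of semantic parsing here is a nested event structure, which can be written down as a small recursive data type. The sketch below is one reading of the slide's diagram for the p70(S6)-kinase sentence; the class and function names are hypothetical, and the role assignments (which argument is Theme, Cause, or Site) are an interpretation of the flattened figure rather than a verified annotation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union

@dataclass
class Entity:
    text: str
    label: str  # e.g. PROTEIN or CELL

@dataclass
class Event:
    trigger: str
    label: str  # e.g. REGULATION
    # Role name (Theme, Cause, Site) -> argument, which may itself be an event.
    args: Dict[str, Union["Event", Entity]] = field(default_factory=dict)

def entity_texts(node: Union[Event, Entity]) -> List[str]:
    """Collect the entity mentions reachable from a node."""
    if isinstance(node, Entity):
        return [node.text]
    texts: List[str] = []
    for child in node.args.values():
        texts.extend(entity_texts(child))
    return texts

def nesting_depth(node: Union[Event, Entity]) -> int:
    """Entities have depth 0; an event is one deeper than its deepest argument."""
    if isinstance(node, Entity):
        return 0
    return 1 + max((nesting_depth(c) for c in node.args.values()), default=0)

# Assumed reading of the example sentence.
up_regulation = Event("up-regulation", "REGULATION", {
    "Theme": Entity("IL-10", "PROTEIN"),
    "Cause": Entity("gp41", "PROTEIN"),
    "Site": Entity("human monocyte", "CELL"),
})
activation = Event("activation", "REGULATION", {
    "Theme": Entity("p70(S6)-kinase", "PROTEIN"),
})
involvement = Event("Involvement", "REGULATION", {
    "Theme": up_regulation,
    "Cause": activation,
})

print(nesting_depth(involvement), sorted(entity_texts(involvement)))
```

Because arguments can be events, a flat relation tuple is not enough; an extractor has to recover the whole tree.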
Long Tail of Variations TP53 inhibits BCL2. Tumor suppressor P53 down-regulates the activity of BCL-2 proteins. BCL2 transcription is suppressed by P53 expression. The inhibition of B-cell CLL/Lymphoma 2 expression by TP53 … …… 32
Bottleneck: Annotated Examples GENIA (BioNLP Shared Task 2009-2013) 1,999 abstracts MeSH: human, blood cell, transcription factor Challenge for “supervised” machine learning Can we breach this bottleneck? 33
Free Lunch #1: Distributional Similarity Similar context → probably similar meaning Annotation as latent variables Textual expression → recursive clusters Unsupervised semantic parsing Poon & Domingos, “Unsupervised Semantic Parsing”. EMNLP 2009. Best Paper Award. 34
Recursive Clustering TP53 inhibits BCL2. Tumor suppressor P53 down-regulates the activity of BCL-2 proteins. BCL2 transcription is suppressed by P53 expression. The inhibition of B-cell CLL/Lymphoma 2 expression by TP53 … …… 35
Recursive Clustering TP53 inhibits BCL2. Tumor suppressor P53 down-regulates the activity of BCL-2 proteins. BCL2 transcription is suppressed by P53 expression. The inhibition of B-cell CLL/Lymphoma 2 expression by TP53 … …… Relation cluster: inhibits, down-regulates, suppresses, inhibition, … Theme cluster: BCL2, BCL-2 proteins, B-cell CLL/Lymphoma 2, …… Cause cluster: TP53, Tumor suppressor P53, …… 38
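A toy version of the distributional idea: predicates that appear with the same arguments are grouped together. This is a minimal sketch and not the unsupervised semantic parsing algorithm itself; the (cause, predicate, theme) triples are hand-simplified from the four example sentences, and the function name is hypothetical.

```python
from collections import defaultdict

# Hand-simplified triples (cause, predicate, theme); an assumption, not parser output.
triples = [
    ("TP53", "inhibits", "BCL2"),
    ("Tumor suppressor P53", "down-regulates", "BCL-2 proteins"),
    ("P53", "suppresses", "BCL2"),
    ("TP53", "inhibition", "B-cell CLL/Lymphoma 2"),
]

def cluster_predicates(triples):
    """Group predicates that share at least one (role, argument) context."""
    contexts = defaultdict(set)
    for cause, pred, theme in triples:
        contexts[pred].add(("Cause", cause))
        contexts[pred].add(("Theme", theme))

    clusters = []  # list of (predicate set, context set); context sets stay disjoint
    for pred, ctx in contexts.items():
        merged_preds, merged_ctx = {pred}, set(ctx)
        remaining = []
        for preds, cluster_ctx in clusters:
            if cluster_ctx & merged_ctx:
                # Shared context: fold the existing cluster into the new one.
                merged_preds |= preds
                merged_ctx |= cluster_ctx
            else:
                remaining.append((preds, cluster_ctx))
        clusters = remaining + [(merged_preds, merged_ctx)]
    return [preds for preds, _ in clusters]

clusters = cluster_predicates(triples)
print(clusters)
```

In this toy run "down-regulates" stays in its own cluster because its arguments use surface forms seen nowhere else. Clustering the argument strings as well, so that "Tumor suppressor P53" and "TP53" fall together, is what the recursive step is meant to address.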
Free Lunch #2: Existing KBs Many KBs available Gene/Protein: GenBank, UniProt, … Pathways: NCI, Reactome, KEGG, BioCarta, … Annotation as latent variables Textual expression → table, column, join, … Grounded semantic parsing 39
Entity Extraction HGNC table (ID | Symbol | Alias): 990 | BCL2 | B-cell CLL/Lymphoma 2, …; 11998 | TP53 | Tumor suppressor P53, …; … | … | … 40
Entity Extraction HGNC table (ID | Symbol | Alias): 990 | BCL2 | B-cell CLL/Lymphoma 2, …; 11998 | TP53 | Tumor suppressor P53, …; … | … | … Text: TP53 inhibits BCL2. Tumor suppressor P53 down-regulates the activity of BCL-2 proteins. BCL2 transcription is suppressed by P53 expression. The inhibition of B-cell CLL/Lymphoma 2 expression by TP53 … …… 41
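Grounding mentions in the HGNC table can be illustrated with a plain alias lookup. This is a minimal sketch assuming exact, case-insensitive matching against the two rows shown on the slide; the function names are hypothetical, and the talk does not say that matching is done this way.

```python
import re

# The two rows visible on the slide; alias lists are truncated there as well.
HGNC_ROWS = [
    {"id": 990, "symbol": "BCL2", "aliases": ["B-cell CLL/Lymphoma 2"]},
    {"id": 11998, "symbol": "TP53", "aliases": ["Tumor suppressor P53"]},
]

def build_lexicon(rows):
    """Map every lower-cased symbol or alias to its ID."""
    lexicon = {}
    for row in rows:
        for name in [row["symbol"], *row["aliases"]]:
            lexicon[name.lower()] = row["id"]
    return lexicon

def find_entities(sentence, lexicon):
    """Return (mention, id) pairs in text order, preferring longer names."""
    lowered = sentence.lower()
    taken = [False] * len(sentence)
    hits = []
    for name in sorted(lexicon, key=len, reverse=True):
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        for match in re.finditer(pattern, lowered):
            start, end = match.span()
            if not any(taken[start:end]):
                hits.append((start, sentence[start:end], lexicon[name]))
                taken[start:end] = [True] * (end - start)
    return [(text, gene_id) for _, text, gene_id in sorted(hits)]

lexicon = build_lexicon(HGNC_ROWS)
print(find_entities("The inhibition of B-cell CLL/Lymphoma 2 expression by TP53", lexicon))
```

Exact lookup misses variants such as "P53" or "BCL-2 proteins" unless they are listed as aliases, which is the long tail of variations the earlier slides point to.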
Relation Extraction NCI-PID Pathway KB (Regulation | Theme | Cause): Positive | A2M | FOXO1; Positive | ABCB1 | TP53; Negative | BCL2 | TP53; … | … | … Text: TP53 inhibits BCL2. Tumor suppressor P53 down-regulates the activity of BCL-2 proteins. BCL2 transcription is suppressed by P53 expression. The inhibition of B-cell CLL/Lymphoma 2 expression by TP53 … …… 42
Relation Extraction NCI-PID Pathway KB (Regulation | Theme | Cause): Positive | A2M | FOXO1; Positive | ABCB1 | TP53; Negative | BCL2 | TP53; … | … | … Text: TP53 inhibits BCL2. Tumor suppressor P53 down-regulates the activity of BCL-2 proteins. BCL2 transcription is suppressed by P53 expression. The inhibition of B-cell CLL/Lymphoma 2 expression by TP53 … …… Grounded Learning: KB rows supervise the text 43
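One simple way to let KB rows supervise text is standard distant supervision: any sentence mentioning both genes of a KB row is treated as a noisy example of that row's relation. The sketch below assumes gene mentions have already been grounded to symbols; the function name and the sentence format are hypothetical, and this is the plain baseline rather than the grounded learning method of the talk.

```python
# (Theme, Cause) -> Regulation, copied from the three KB rows on the slide.
PATHWAY_KB = {
    ("A2M", "FOXO1"): "Positive",
    ("ABCB1", "TP53"): "Positive",
    ("BCL2", "TP53"): "Negative",
}

def distant_labels(sentences, kb):
    """Pair each sentence with every KB row whose two genes it mentions.

    `sentences` is a list of (text, set of grounded gene symbols); the
    grounding step is assumed to have been done beforehand.
    """
    examples = []
    for text, genes in sentences:
        for (theme, cause), regulation in kb.items():
            if theme in genes and cause in genes:
                examples.append((text, theme, cause, regulation))
    return examples

sentences = [
    ("TP53 inhibits BCL2.", {"TP53", "BCL2"}),
    ("BCL2 transcription is suppressed by P53 expression.", {"BCL2", "TP53"}),
    ("TP53 is a tumor suppressor.", {"TP53"}),  # hypothetical sentence with one gene
]
examples = distant_labels(sentences, PATHWAY_KB)
for example in examples:
    print(example)
```

The labels are noisy by construction, since co-occurrence does not guarantee that the sentence states the relation.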
Question Answering w.r.t. KB Accuracy by system: ZC07 84.6 (supervised), FUBL 82.8 (supervised), GUSP 83.5 (unsupervised) Poon, “Grounded Unsupervised Semantic Parsing”. ACL 2013. 44
Pathway Extraction Generalize distant supervision: nested events in the KB likely occur in the semantic parse of some sentence Prior: favor semantic parses grounded in the KB Outperformed the majority of participants in the original GENIA Event Shared Task Parikh, Poon, Toutanova. In progress. 45
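The prior can be pictured as a bonus added to a parse's score for every event that matches the KB. This is a minimal sketch under assumptions: the additive bonus, the candidate format, and the scores are all hypothetical, and the slide does not specify how the prior is actually implemented.

```python
# KB events as (Regulation, Theme, Cause), taken from the pathway table shown earlier.
KB_EVENTS = {("Negative", "BCL2", "TP53")}

def grounded_score(model_score, events, kb, bonus=1.0):
    """Model score plus a fixed bonus per event found in the KB (assumed form)."""
    return model_score + bonus * sum(1 for event in events if event in kb)

def best_parse(candidates, kb, bonus=1.0):
    """Pick the candidate parse with the highest KB-adjusted score."""
    return max(
        candidates,
        key=lambda c: grounded_score(c["score"], c["events"], kb, bonus),
    )

# Two hypothetical parses of "TP53 inhibits BCL2." with made-up model scores.
candidates = [
    {"name": "TP53 as cause", "score": -1.0, "events": [("Negative", "BCL2", "TP53")]},
    {"name": "BCL2 as cause", "score": -0.8, "events": [("Negative", "TP53", "BCL2")]},
]
print(best_parse(candidates, KB_EVENTS)["name"])
```

With the bonus switched off the second parse wins on model score alone; the KB prior is what tips the choice toward the grounded reading.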