cancer panomics


  1. Machine Reading for Cancer Panomics
      Hoifung Poon

  2. Overview [diagram labels: High-Throughput Data (DNA sequences), KB, Cancer Systems Modeling, Disease Genes, Drug Targets]

  3. Overview [diagram labels: High-Throughput Data (DNA sequences), KB, Disease Genes, Drug Targets, Extract Pathways from PubMed, Grounded Semantic Parsing]

  4. Precision Medicine

  5. Vemurafenib on BRAF-V600 Melanoma [images: Before Treatment, 15 Weeks]

  6. Vemurafenib on BRAF-V600 Melanoma [images: Before Treatment, 15 Weeks, 23 Weeks]

  8. Traditional Biology [diagram labels: Targeted Experiments, Discovery, One hypothesis]

  9. Genomics [diagram labels: DNA sequences, High-Throughput Experiments, Discovery, Many hypotheses]

  10. Genome-Wide Association Studies (GWAS) [diagram labels: Disease (e.g., Alzheimer, Cancer) sequences, Healthy sequences]
      - 2000: “Genetic diagnosis of diseases would be accomplished in 10 years and that treatments would start to roll out perhaps five years after that.”
      - 2010: “A Decade Later, Genetic Maps Yield Few New Cures” (New York Times, June 2010)

  11. Key Challenges
      - Human genome: 3 billion base pairs
      - Potential variations: > 10 million variants
      - Combinations: > 10^1000000 (a 1 followed by 1 million zeros)
      - Machine learning problem
        - Atomic features: > 10 million
        - Feature combinations: too many to enumerate
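As a rough sanity check on the combination count above, the sketch below assumes each of the 10 million variants is simply present or absent (an illustrative simplification, not something the slide states) and counts the decimal digits of the resulting number of combinations.

```python
import math

NUM_VARIANTS = 10_000_000  # "> 10 million variants" from the slide

# Assumption: every variant is binary (present/absent), so the number of
# possible combinations is 2 ** NUM_VARIANTS. We work in log space because
# the number itself is far too large to be worth materializing.
log10_combinations = NUM_VARIANTS * math.log10(2)
num_digits = math.floor(log10_combinations) + 1

print(f"2^{NUM_VARIANTS} is roughly 10^{log10_combinations:.0f}")
print("More than 10^1000000 combinations:", log10_combinations > 1_000_000)
```

Under this assumption the count comfortably exceeds the slide's lower bound of 10^1000000, which is why enumerating feature combinations is not an option.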

  12. Genomics: How to Scale Discovery? [diagram labels: DNA sequences, High-Throughput Experiments, Discovery]

  13. Cancer [diagram labels: Tumor cells, Normal cells]
      - Hundreds of mutations
      - Most are “passenger”, not driver
      - Can we identify likely drivers?

  14. Panomics [diagram labels: Genome, Transcriptome, Epigenome, ……]

  15. Pathway Knowledge: Genes work synergistically in pathways

  16. Why Hard to Identify Drivers?
      - Complex diseases → Perturb multiple pathways
      Hanahan & Weinberg [Cell 2011]

  17. Why Cancer Comes Back?
      - Subtypes with alternative pathway profile
      - Compensatory pathways can be activated
      [diagram labels: EphA2, EphB2, Ovarian Cancer]

  18. Why Cancer Comes Back?
      - Subtypes with alternative pathway profile
      - Compensatory pathways can be activated
      [diagram labels: EphA2, EphB2, X, Ovarian Cancer]

  19. Cancer Systems Modeling [diagram: Gene A: DNA → (Transcription) → mRNA → (Translation) → Protein → (Activation) → Protein Active → Functional activity; other labels: Mutation effect, Drug Target]

  20. Knowledge → Model [diagram: Genes A, B, C, each with DNA, mRNA, Protein, Protein Active; links labeled Transcription Factor and Protein Kinase]

  21. Knowledge → Model ? [diagram: Genes A, B, C, each with DNA, mRNA, Protein, Protein Active; links labeled Transcription Factor and Protein Kinase]

  22. Knowledge → Model ? [diagram: Genes A, B, C, each with DNA, mRNA, Protein, Protein Active; links labeled Transcription Factor and Protein Kinase]

  23. Knowledge → Model ! [diagram: Genes A, B, C, each with DNA, mRNA, Protein, Protein Active; links labeled Transcription Factor and Protein Kinase]

  24. Approach: Graph HMM [diagram: Genes A, B, C, each with DNA, mRNA, Protein, Protein Active; links labeled Transcription Factor and Protein Kinase]

  25. Extract Pathways from PubMed [diagram labels: High-Throughput Data (DNA sequences), KB, Disease Genes, Drug Targets]

  26. PubMed
      - 24 million abstracts
      - Two new abstracts every minute
      - Adds over one million every year
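The two growth figures above are consistent with each other, as this small arithmetic sketch shows. It assumes a constant rate and a 365-day year; both are simplifications.

```python
ABSTRACTS_PER_MINUTE = 2            # rate quoted on the slide
MINUTES_PER_YEAR = 60 * 24 * 365    # assumption: non-leap year, constant rate

abstracts_per_year = ABSTRACTS_PER_MINUTE * MINUTES_PER_YEAR
print(abstracts_per_year)  # 1051200, i.e. "over one million every year"
```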

  27. Machine Reading
      - PMID: 123 … VDR+ binds to SMAD3 to form …
      - PMID: 456 … JUN expression is induced by SMAD3/4 …
      - ……

  28. Machine Reading
      - Involvement of p70(S6)-kinase activation in IL-10 up-regulation in human monocytes by gp41 envelope protein of human immunodeficiency virus type 1 ...

  29. Machine Reading
      - Involvement of p70(S6)-kinase activation in IL-10 up-regulation in human monocytes by gp41 envelope protein of human immunodeficiency virus type 1 ...
      - Entities: p70(S6)-kinase (PROTEIN), gp41 (PROTEIN), IL-10 (PROTEIN), human monocyte (CELL)

  30. Machine Reading
      - Involvement of p70(S6)-kinase activation in IL-10 up-regulation in human monocytes by gp41 envelope protein of human immunodeficiency virus type 1 ...
      - Entities: p70(S6)-kinase (PROTEIN), gp41 (PROTEIN), IL-10 (PROTEIN), human monocyte (CELL)
      - Involvement (REGULATION): Theme = up-regulation, Cause = activation
      - up-regulation (REGULATION): Theme = IL-10, Cause = gp41, Site = human monocyte
      - activation (REGULATION): Theme = p70(S6)-kinase

  31. Machine Reading: Semantic Parsing
      - Involvement of p70(S6)-kinase activation in IL-10 up-regulation in human monocytes by gp41 envelope protein of human immunodeficiency virus type 1 ...
      - Entities: p70(S6)-kinase (PROTEIN), gp41 (PROTEIN), IL-10 (PROTEIN), human monocyte (CELL)
      - Involvement (REGULATION): Theme = up-regulation, Cause = activation
      - up-regulation (REGULATION): Theme = IL-10, Cause = gp41, Site = human monocyte
      - activation (REGULATION): Theme = p70(S6)-kinase
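The nested event annotation for the gp41 sentence can be held in a small recursive data structure. The sketch below is only one possible encoding: the class names `Entity` and `Event` and the helper `nesting_depth` are hypothetical, and the argument assignments follow one reading of the slide's diagram labels.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Entity:
    text: str
    type: str  # e.g. "PROTEIN" or "CELL"


@dataclass
class Event:
    trigger: str
    type: str  # all three events on the slide are typed REGULATION
    # Role name ("Theme", "Cause", "Site") -> argument, which may itself be an event.
    args: dict[str, Entity | Event] = field(default_factory=dict)


def nesting_depth(node: Entity | Event) -> int:
    """Number of event layers above the deepest entity (an entity has depth 0)."""
    if isinstance(node, Entity):
        return 0
    return 1 + max((nesting_depth(a) for a in node.args.values()), default=0)


# Entities from the sentence, with the types shown on the slide.
kinase = Entity("p70(S6)-kinase", "PROTEIN")
gp41 = Entity("gp41", "PROTEIN")
il10 = Entity("IL-10", "PROTEIN")
monocyte = Entity("human monocyte", "CELL")

# Inner events take entities as arguments.
activation = Event("activation", "REGULATION", {"Theme": kinase})
up_regulation = Event(
    "up-regulation", "REGULATION",
    {"Theme": il10, "Cause": gp41, "Site": monocyte},
)

# The outer event takes the two inner events as its arguments.
involvement = Event(
    "Involvement", "REGULATION",
    {"Theme": up_regulation, "Cause": activation},
)

print(nesting_depth(involvement))
```

The point of the nesting is that the top-level event has no entity arguments at all; its Theme and Cause are themselves events, which is what makes flat relation extraction insufficient for this sentence.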

  32. Long Tail of Variations
      - TP53 inhibits BCL2.
      - Tumor suppressor P53 down-regulates the activity of BCL-2 proteins.
      - BCL2 transcription is suppressed by P53 expression.
      - The inhibition of B-cell CLL/Lymphoma 2 expression by TP53 …
      - ……

  33. Bottleneck: Annotated Examples
      - GENIA (BioNLP Shared Task 2009-2013)
        - 1999 abstracts
        - MeSH: human, blood cell, transcription factor
      - Challenge for “supervised” machine learning
      - Can we breach this bottleneck?

  34. Free Lunch #1: Distributional Similarity
      - Similar context → Probably similar meaning
      - Annotation as latent variables
        - Textual expression → Recursive clusters
      - Unsupervised semantic parsing
      Poon & Domingos, “Unsupervised Semantic Parsing”. EMNLP 2009. Best Paper Award.

  35. Recursive Clustering
      - TP53 inhibits BCL2.
      - Tumor suppressor P53 down-regulates the activity of BCL-2 proteins.
      - BCL2 transcription is suppressed by P53 expression.
      - The inhibition of B-cell CLL/Lymphoma 2 expression by TP53 …
      - ……

  36. Recursive Clustering
      - TP53 inhibits BCL2.
      - Tumor suppressor P53 down-regulates the activity of BCL-2 proteins.
      - BCL2 transcription is suppressed by P53 expression.
      - The inhibition of B-cell CLL/Lymphoma 2 expression by TP53 …
      - ……

  37. Recursive Clustering
      - TP53 inhibits BCL2.
      - Tumor suppressor P53 down-regulates the activity of BCL-2 proteins.
      - BCL2 transcription is suppressed by P53 expression.
      - The inhibition of B-cell CLL/Lymphoma 2 expression by TP53 …
      - ……

  38. Recursive Clustering
      - TP53 inhibits BCL2.
      - Tumor suppressor P53 down-regulates the activity of BCL-2 proteins.
      - BCL2 transcription is suppressed by P53 expression.
      - The inhibition of B-cell CLL/Lymphoma 2 expression by TP53 …
      - ……
      Resulting clusters:
      - Relation: inhibits, down-regulates, suppresses, inhibition, …
      - Theme: BCL2, BCL-2 proteins, B-cell CLL/Lymphoma 2, ……
      - Cause: TP53, Tumor suppressor P53, ……
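The idea that expressions sharing a context probably share a meaning can be illustrated with a toy agglomerative procedure. This is not the probabilistic model from the Unsupervised Semantic Parsing paper; it is a deliberately crude sketch that assumes (cause, relation, theme) triples have already been extracted, and some of the triples below are hypothetical simplifications of the slide's sentences. Relation phrases are merged when they link the same argument clusters, and argument strings are merged when they fill the same slot of the same relation cluster, repeating until nothing changes.

```python
from itertools import combinations


class UnionFind:
    """Minimal disjoint-set structure used to track cluster membership."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[rb] = ra
        return True


def cluster_triples(triples):
    """Merge relation phrases and argument strings that share contexts."""
    uf = UnionFind()
    changed = True
    while changed:  # later merges can depend on earlier ones
        changed = False
        for (c1, r1, t1), (c2, r2, t2) in combinations(triples, 2):
            same_c = uf.find(("arg", c1)) == uf.find(("arg", c2))
            same_r = uf.find(("rel", r1)) == uf.find(("rel", r2))
            same_t = uf.find(("arg", t1)) == uf.find(("arg", t2))
            if same_c and same_t:  # same arguments -> similar relation
                changed |= uf.union(("rel", r1), ("rel", r2))
            if same_r and same_t:  # same relation and theme -> similar cause
                changed |= uf.union(("arg", c1), ("arg", c2))
            if same_r and same_c:  # same relation and cause -> similar theme
                changed |= uf.union(("arg", t1), ("arg", t2))
    return uf


def clusters(uf, kind):
    """Return the clusters of the given kind ("rel" or "arg") as sorted lists."""
    groups = {}
    for key in list(uf.parent):
        if key[0] == kind:
            groups.setdefault(uf.find(key), []).append(key[1])
    return sorted(sorted(group) for group in groups.values())


# Toy triples (cause, relation, theme); hypothetical, loosely based on the slide.
TRIPLES = [
    ("TP53", "inhibits", "BCL2"),
    ("TP53", "down-regulates", "BCL2"),
    ("Tumor suppressor P53", "down-regulates", "BCL2"),
    ("TP53", "inhibits", "BCL-2 proteins"),
    ("Tumor suppressor P53", "suppresses", "BCL-2 proteins"),
]

uf = cluster_triples(TRIPLES)
print(clusters(uf, "rel"))
print(clusters(uf, "arg"))
```

Hard merging like this would over-merge on real text (for example, a gene that both activates and inhibits the same target in different contexts), which is one reason to treat the clusters as latent variables in a probabilistic model instead.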

  39. Free Lunch #2: Existing KBs
      - Many KBs available
        - Gene/Protein: GenBank, UniProt, …
        - Pathways: NCI, Reactome, KEGG, BioCarta, …
      - Annotation as latent variables
        - Textual expression → Table, column, join, …
      - Grounded semantic parsing

  40. Entity Extraction
      HGNC
      ID     Symbol  Alias
      990    BCL2    B-cell CLL/Lymphoma 2, …
      11998  TP53    Tumor suppressor P53, …
      …      …       …

  41. Entity Extraction
      HGNC
      ID     Symbol  Alias
      990    BCL2    B-cell CLL/Lymphoma 2, …
      11998  TP53    Tumor suppressor P53, …
      …      …       …
      - TP53 inhibits BCL2.
      - Tumor suppressor P53 down-regulates the activity of BCL-2 proteins.
      - BCL2 transcription is suppressed by P53 expression.
      - The inhibition of B-cell CLL/Lymphoma 2 expression by TP53 …
      - ……
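Grounding mentions in the HGNC table can be approximated with a dictionary lookup. The sketch below is a minimal illustration, not the method used in the talk: the two IDs, symbols, and first aliases come from the slide, while the extra aliases "BCL-2" and "P53" and the function name `extract_entities` are assumptions added for the example.

```python
import re

# Toy slice of an HGNC-style table. IDs, symbols, and the first alias of each
# row are from the slide; "BCL-2" and "P53" are assumed extra aliases.
HGNC_TOY = {
    990: {"symbol": "BCL2", "aliases": ["B-cell CLL/Lymphoma 2", "BCL-2"]},
    11998: {"symbol": "TP53", "aliases": ["Tumor suppressor P53", "P53"]},
}

# Map every known name (lowercased) to its HGNC ID.
NAME_TO_ID = {}
for hgnc_id, row in HGNC_TOY.items():
    for name in [row["symbol"], *row["aliases"]]:
        NAME_TO_ID[name.lower()] = hgnc_id

# Try longer names first so "Tumor suppressor P53" wins over "P53", and refuse
# matches embedded in a longer alphanumeric token (e.g. "P53" inside "TP53").
_names = sorted(NAME_TO_ID, key=len, reverse=True)
PATTERN = re.compile(
    r"(?<![A-Za-z0-9])(?:" + "|".join(re.escape(n) for n in _names) + r")(?![A-Za-z0-9])",
    re.IGNORECASE,
)


def extract_entities(sentence):
    """Return (mention, HGNC ID) pairs in order of appearance."""
    return [(m.group(0), NAME_TO_ID[m.group(0).lower()]) for m in PATTERN.finditer(sentence)]


for sentence in [
    "TP53 inhibits BCL2.",
    "Tumor suppressor P53 down-regulates the activity of BCL-2 proteins.",
    "The inhibition of B-cell CLL/Lymphoma 2 expression by TP53",
]:
    print(extract_entities(sentence))
```

Exact alias matching like this only covers names already listed in the table; unseen variants are what motivate treating the annotation as a latent variable.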

  42. Relation Extraction
      NCI-PID Pathway KB
      Regulation  Theme  Cause
      Positive    A2M    FOXO1
      Positive    ABCB1  TP53
      Negative    BCL2   TP53
      …           …      …
      - TP53 inhibits BCL2.
      - Tumor suppressor P53 down-regulates the activity of BCL-2 proteins.
      - BCL2 transcription is suppressed by P53 expression.
      - The inhibition of B-cell CLL/Lymphoma 2 expression by TP53 …
      - ……

  43. Relation Extraction: Grounded Learning
      NCI-PID Pathway KB
      Regulation  Theme  Cause
      Positive    A2M    FOXO1
      Positive    ABCB1  TP53
      Negative    BCL2   TP53
      …           …      …
      - TP53 inhibits BCL2.
      - Tumor suppressor P53 down-regulates the activity of BCL-2 proteins.
      - BCL2 transcription is suppressed by P53 expression.
      - The inhibition of B-cell CLL/Lymphoma 2 expression by TP53 …
      - ……
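One common way to learn from a pathway KB without annotated sentences is distant supervision: any sentence mentioning both arguments of a KB tuple becomes a (noisy) training candidate for that tuple's relation. The sketch below illustrates only that labeling heuristic under simplifying assumptions; the alias table, the substring matching, and the function names are hypothetical, and the KB rows are the three shown on the slide.

```python
# KB rows from the slide, keyed by (cause, theme).
PATHWAY_KB = {
    ("FOXO1", "A2M"): "Positive",
    ("TP53", "ABCB1"): "Positive",
    ("TP53", "BCL2"): "Negative",
}

# Hypothetical alias table mapping lowercased surface forms to gene symbols.
ALIASES = {
    "tp53": "TP53",
    "p53": "TP53",
    "bcl2": "BCL2",
    "bcl-2": "BCL2",
    "b-cell cll/lymphoma 2": "BCL2",
    "abcb1": "ABCB1",
    "a2m": "A2M",
    "foxo1": "FOXO1",
}


def symbols_in(sentence):
    """Gene symbols whose aliases occur in the sentence (crude substring match)."""
    text = sentence.lower()
    return {symbol for alias, symbol in ALIASES.items() if alias in text}


def distant_supervision(sentences, kb):
    """Pair each sentence with every KB tuple whose cause and theme it mentions."""
    examples = []
    for sentence in sentences:
        found = symbols_in(sentence)
        for (cause, theme), regulation in kb.items():
            if cause in found and theme in found:
                examples.append((sentence, cause, theme, regulation))
    return examples


SENTENCES = [
    "TP53 inhibits BCL2.",
    "Tumor suppressor P53 down-regulates the activity of BCL-2 proteins.",
    "BCL2 transcription is suppressed by P53 expression.",
    "The inhibition of B-cell CLL/Lymphoma 2 expression by TP53",
    "FOXO1 was not detected.",  # hypothetical sentence with no KB pair
]

examples = distant_supervision(SENTENCES, PATHWAY_KB)
for example in examples:
    print(example)
```

The heuristic also labels sentences that merely co-mention two genes without asserting the relation, so the resulting examples are best treated as weak evidence rather than gold annotations.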

  44. Question Answering w.r.t. KB
      System  Accuracy  Supervision
      ZC07    84.6      Supervised
      FUBL    82.8      Supervised
      GUSP    83.5      Unsupervised
      Poon, “Grounded Unsupervised Semantic Parsing”. ACL 2013.

  45. Pathway Extraction
      - Generalize distant supervision: Nested events in KB likely occur in semantic parse of some sentence
      - Prior: Favor semantic parse grounded in KB
      - Outperformed the majority of participants in original GENIA Event Shared Task
      Parikh, Poon, Toutanova. In Progress.
