
Semantic Parsing for Cancer Panomics (Hoifung Poon)



  1. Semantic Parsing for Cancer Panomics. Hoifung Poon

  2. Overview. [Figure: DNA sequence variants] High-Throughput Data; KB; Disease Genes; Drug Targets

  3. Overview (continued). Infer cancer driver mutations

  4. Overview (continued). Extract Pathways from PubMed; Grounded Unsupervised Semantic Parsing

  5. Collaborators: David Heckerman, Kristina Toutanova, Chris Quirk, Lucy Vanderwende, Tony Gitter, Ankur Parikh

  6. Precision Medicine

  7. Vemurafenib on BRAF-V600 Melanoma. [Images: before treatment; 15 weeks]

  8. Vemurafenib on BRAF-V600 Melanoma (continued). [Images: before treatment; 15 weeks; 23 weeks]

  9. [Image-only slide]

  10. Traditional Biology. Targeted Experiments; Discovery; One hypothesis

  11. Genomics. [Figure: many DNA sequence variants] High-Throughput Experiments; Discovery; Many hypotheses

  12. Genome-Wide Association Studies (GWAS)
      - Disease (e.g., Alzheimer's, cancer): … ATTCGG A TATTTAAG G C …
      - Healthy: … ATTCGGGTATTTAAGCC …
      - 2000: "Genetic diagnosis of diseases would be accomplished in 10 years and that treatments would start to roll out perhaps five years after that."
      - 2010: "A Decade Later, Genetic Maps Yield Few New Cures" (New York Times, June 2010)

  13. Key Challenges
      - Human genome: 3 billion base pairs
      - Potential variations: > 10 million mutations
      - Combinations: > 10^1000000 (a 1 followed by 1 million zeros)
      - Machine learning problem
        - Atomic features: > 10 million
        - Feature combinations: too many to enumerate
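The combination count on this slide can be sanity-checked with a short calculation. This is a minimal sketch under an assumption the slide does not state: each of the 10 million candidate mutations is treated as simply present or absent, so there are 2^n possible combinations. The function name is illustrative.

```python
import math

def decimal_digits_of_power_of_two(n: int) -> int:
    """Number of decimal digits of 2**n, computed without building the number."""
    return math.floor(n * math.log10(2)) + 1

n_mutations = 10_000_000  # "> 10 million mutations" from the slide
digits = decimal_digits_of_power_of_two(n_mutations)

# 10^1000000 has 1,000,001 digits, so any number with more digits exceeds it.
exceeds_slide_bound = digits > 1_000_001
print(f"2^{n_mutations} has {digits} decimal digits; exceeds 10^1000000: {exceeds_slide_bound}")
```

Under that assumption the slide's bound is conservative: the count has roughly three million digits, which is why enumerating feature combinations is not an option.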

  14. Genomics. High-Throughput Experiments; Discovery. How to scale discovery?

  15. Cancer
      - Tumor cells: … ATTCGG A TATTTAAG G C …
      - Normal cells: … ATTCGGGTATTTAAGCC …
      - Hundreds of mutations
      - Most are "passengers", not drivers
      - Can we identify likely drivers?
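To make the tumor-versus-normal comparison concrete, here is a minimal sketch that lists the positions where the two sequence fragments from the slide differ. It assumes the fragments are already aligned and of equal length (the slide's spacing is removed), and it only finds differences; deciding which of them are drivers is the hard problem the talk addresses. The function name is illustrative.

```python
def point_differences(tumor: str, normal: str) -> list[tuple[int, str, str]]:
    """Return (position, normal_base, tumor_base) for every mismatched position."""
    if len(tumor) != len(normal):
        raise ValueError("this sketch only handles equal-length, pre-aligned sequences")
    return [(i, n, t) for i, (t, n) in enumerate(zip(tumor, normal)) if t != n]

tumor = "ATTCGGATATTTAAGGC"   # tumor fragment from the slide, spaces removed
normal = "ATTCGGGTATTTAAGCC"  # normal fragment from the slide

diffs = point_differences(tumor, normal)
for pos, ref, alt in diffs:
    print(f"position {pos}: {ref} -> {alt}")
```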

  16. Panomics: Genome, Transcriptome, Epigenome, ……

  17. Pathway Knowledge: genes work synergistically in pathways

  18. Why Is It Hard to Identify Drivers?
      - Complex diseases → synergistic perturbation of multiple pathways
      - Cancer: 6 → 8 "hallmarks"
        - Promote growth
        - Avoid suicide
        - Evade immune attack
        - Induce blood vessels
        - Invade neighboring tissues
        - …

  19. Hanahan & Weinberg [Cell 2011]

  20. Why Does Cancer Come Back?
      - Subtypes with alternative pathway profiles
      - Compensatory pathways can be activated
      - [Figure: ovarian cancer; EphA2, EphB2]

  21. Why Does Cancer Come Back? (continued) [Figure: ovarian cancer; EphA2, EphB2, with an "X" added]

  22. A Grammar of Cancer?
      - Cancer → Anti-Apoptosis & ProGrowth & …
      - Anti-Apoptosis → Deactivate TP53
      - Anti-Apoptosis → Activate BCL-2
      - …
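The rules on this slide read like grammar productions, so one way to picture them is as a small AND/OR rule table. The sketch below is only an illustration: the two Anti-Apoptosis rules come from the slide, the elided "& …" terms are left out, and the `ProGrowth` rule is a hypothetical placeholder added so the example can run.

```python
# Each symbol maps to a list of alternatives (OR);
# each alternative is a list of symbols that must all hold (the "&" on the slide).
RULES = {
    "Cancer": [["Anti-Apoptosis", "ProGrowth"]],           # slide rule, "& ..." omitted
    "Anti-Apoptosis": [["Deactivate TP53"], ["Activate BCL-2"]],  # from the slide
    "ProGrowth": [["Activate BRAF"]],                      # hypothetical, not on the slide
}

def derives(symbol: str, observed: set[str]) -> bool:
    """True if `symbol` is observed directly or derivable from observed events."""
    if symbol in observed:
        return True
    return any(
        all(derives(part, observed) for part in alternative)
        for alternative in RULES.get(symbol, [])
    )

print(derives("Cancer", {"Deactivate TP53", "Activate BRAF"}))
print(derives("Cancer", {"Activate BCL-2"}))
```

The point of the representation is that different patients can satisfy the same top-level rule through different alternatives, which matches the slide's picture of subtypes with alternative pathway profiles.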

  23. Infer Cancer Driver Mutations. Gene A: DNA → (transcription) → mRNA → (translation) → protein → (activation) → active protein. What is the level of activity? Is the change caused by a mutation?

  24. Pathway Knowledge. [Figure: Genes A, B, and C, each with the chain DNA → mRNA → protein → active protein; labels: Transcription Factor, Protein Kinase]

  25. Pathway Knowledge (continued; same figure, annotated "?")

  26. Pathway Knowledge (continued; same figure, annotated "?")

  27. Pathway Knowledge (continued; same figure, annotated "!")

  28. Approach: Graph HMM (over the same three-gene figure)

  29. Extract Pathways from PubMed. [Overview figure repeated: High-Throughput Data; KB; Disease Genes; Drug Targets]

  30. PubMed
      - 22 million abstracts
      - Two new abstracts every minute
      - Adds 2000-4000 every day

  31. Extract Pathways from PubMed
      - PMID: 123 … VDR+ binds to SMAD3 to form …
      - PMID: 456 … JUN expression is induced by SMAD3/4 …
      - ……
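To show what "extracting a pathway fact" from the two snippets on this slide would produce, here is a deliberately naive sketch using hand-written patterns. It is not the method of the talk (which argues for learned semantic parsing); the two patterns, the relation names, and the triple format are assumptions made for illustration.

```python
import re

# Each entry: (compiled pattern, function turning a match into (agent, relation, target)).
PATTERNS = [
    # "<A> binds to <B>"
    (re.compile(r"(\S+) binds to (\S+)"),
     lambda m: (m.group(1), "binds", m.group(2))),
    # "<B> expression is induced by <A>"
    (re.compile(r"(\S+) expression is induced by (\S+)"),
     lambda m: (m.group(2), "induces_expression", m.group(1))),
]

def extract(sentence: str) -> list[tuple[str, str, str]]:
    """Return all (agent, relation, target) triples matched by the toy patterns."""
    triples = []
    for pattern, build in PATTERNS:
        for match in pattern.finditer(sentence):
            triples.append(build(match))
    return triples

snippets = {
    "123": "VDR+ binds to SMAD3 to form",
    "456": "JUN expression is induced by SMAD3/4",
}
for pmid, text in snippets.items():
    print(pmid, extract(text))
```

Patterns like these break as soon as the wording changes, which is the motivation for the nested-event example and the annotation bottleneck discussed on the following slides.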

  32. Extract Complex Knowledge. "Involvement of p70(S6)-kinase activation in IL-10 up-regulation in human monocytes by gp41 envelope protein of human immunodeficiency virus type 1 ..." [Figure: tree over the words Involvement; up-regulation; activation; human monocyte; p70(S6)-kinase; gp41; IL-10]

  33. Extract Complex Knowledge (continued). Node types added. REGULATION: Involvement, up-regulation, activation. PROTEIN: p70(S6)-kinase, gp41, IL-10. CELL: human monocyte

  34. Extract Complex Knowledge (continued). Edge roles added: Theme, Cause, Site

  35. Extract Complex Knowledge (continued). Label added: Semantic Parsing
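The nested structure built up over these slides can be written down as a small recursive data type. The sketch below is one reading of the flattened figure: the node types (REGULATION, PROTEIN, CELL) and role names (Theme, Cause, Site) are from the slides, while the exact attachment of each role to each trigger is an assumption, as are the class and function names.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    text: str
    type: str

@dataclass
class Event:
    trigger: str
    type: str
    args: dict = field(default_factory=dict)  # role name -> Entity or nested Event

def depth(node) -> int:
    """Nesting depth: entities are 0, an event is 1 + its deepest argument."""
    if isinstance(node, Entity):
        return 0
    return 1 + max((depth(arg) for arg in node.args.values()), default=0)

def render(node) -> str:
    """Compact one-line rendering of the nested structure."""
    if isinstance(node, Entity):
        return f"{node.text}:{node.type}"
    inner = ", ".join(f"{role}={render(arg)}" for role, arg in node.args.items())
    return f"{node.trigger}:{node.type}({inner})"

activation = Event("activation", "REGULATION",
                   {"Theme": Entity("p70(S6)-kinase", "PROTEIN")})
up_regulation = Event("up-regulation", "REGULATION", {
    "Theme": Entity("IL-10", "PROTEIN"),
    "Cause": Entity("gp41", "PROTEIN"),
    "Site": Entity("human monocyte", "CELL"),
})
involvement = Event("Involvement", "REGULATION",
                    {"Theme": up_regulation, "Cause": activation})

print(render(involvement))
print("nesting depth:", depth(involvement))
```

The key property is that an event's arguments can themselves be events, which is what makes this "complex knowledge" rather than flat binary relations.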

  36. Bottleneck: Annotated Examples
      - GENIA (BioNLP Shared Task 2009-2013)
        - 1,999 abstracts
        - MeSH: human, blood cell, transcription factor
      - Can we breach the annotation bottleneck?

  37. Free Lunch #1: Distributional Similarity
      - Similar context ⇒ probably similar meaning
      - Annotation as latent variables: textual expression → recursive clusters
      - ⇒ Unsupervised semantic parsing
      - Poon & Domingos, "Unsupervised Semantic Parsing". EMNLP-2009 (Best Paper Award).

  38. Problem Formulation. [Equations not recovered; surviving labels: dependency tree, semantic parse, probability, parsing, learning.] Prior: favor fewer parameters
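The equations on this slide did not survive extraction. One formulation consistent with the surviving labels is sketched below; it is an assumption, not the slide's own notation. Here $T$ is the observed dependency tree, $L$ a latent semantic parse, $\theta$ the model parameters, and $P(\theta)$ a prior that favors fewer parameters.

```latex
\begin{align*}
\text{Probability:} \quad & P_\theta(T, L) \\
\text{Parsing:}     \quad & L^{*} = \arg\max_{L} \; P_\theta(T, L) \\
\text{Learning:}    \quad & \theta^{*} = \arg\max_{\theta} \;
      \Big[ \log \sum_{L} P_\theta(T, L) + \log P(\theta) \Big]
\end{align*}
```

Parsing picks the most probable semantic parse for a given tree, while learning sums over all latent parses because no annotated parses are available.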

  39. Free Lunch #2: Existing KBs
      - Many KBs available
        - Gene/Protein: GenBank, UniProt, …
        - Pathways: NCI, Reactome, KEGG, BioCarta, …
      - Annotation as latent variables: textual expression → table, column, join, …
      - ⇒ Grounded unsupervised semantic parsing
      - Poon, "Grounded Unsupervised Semantic Parsing". ACL-2013.

  40. Natural-Language Interface to Database. Question: "Get flight from Toronto to San Diego stopping at DTW". Query: SELECT flight.flight_id FROM flight, city, city city2, flight_stop, airport_service, airport_service as2 WHERE flight.from_airport = airport_service.airport_code AND flight.to_airport = as2.airport_code AND airport_service.city_code = city.city_code AND as2.city_code = city2.city_code AND city.city_name = 'toronto' AND city2.city_name = 'san diego' AND flight_stop.flight_id = flight.flight_id AND flight_stop.stop_airport = 'dtw'. Executing the query returns the answers.
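To see the query on this slide run end to end, here is a minimal sketch using Python's built-in `sqlite3` module. The table layouts are reduced to the columns the query touches, and every row (city codes, airport codes, flight IDs) is made-up toy data. The second `city` table is aliased as `city2` so that the alias matches the name used in the WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE city (city_code TEXT, city_name TEXT);
CREATE TABLE airport_service (city_code TEXT, airport_code TEXT);
CREATE TABLE flight (flight_id INTEGER, from_airport TEXT, to_airport TEXT);
CREATE TABLE flight_stop (flight_id INTEGER, stop_airport TEXT);

-- Toy rows, invented for this example only.
INSERT INTO city VALUES ('TOR', 'toronto'), ('SAN', 'san diego');
INSERT INTO airport_service VALUES ('TOR', 'yyz'), ('SAN', 'san');
INSERT INTO flight VALUES (1, 'yyz', 'san'), (2, 'yyz', 'san'), (3, 'san', 'yyz');
INSERT INTO flight_stop VALUES (1, 'dtw'), (2, 'ord'), (3, 'dtw');
""")

QUERY = """
SELECT flight.flight_id
FROM flight, city, city city2, flight_stop, airport_service, airport_service as2
WHERE flight.from_airport = airport_service.airport_code
  AND flight.to_airport = as2.airport_code
  AND airport_service.city_code = city.city_code
  AND as2.city_code = city2.city_code
  AND city.city_name = 'toronto'
  AND city2.city_name = 'san diego'
  AND flight_stop.flight_id = flight.flight_id
  AND flight_stop.stop_airport = 'dtw'
"""

flight_ids = sorted(row[0] for row in conn.execute(QUERY))
print(flight_ids)
```

Even for a ten-word question the query needs six table references and eight conditions, which is the gap a natural-language interface has to bridge.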

  41. Clusters → KB Elements
      - Entity: table, column, cell
      - Relation: relational join
      - Priors:
        - Favor lexical similarity
        - Favor short relational joins

  42. GUSP: Key Ideas
      - Leverage target database
      - Bootstrap learning with lexical prior
      - Prior: favor Unix → System
      - Table JOB:
        Job ID | Company   | System
        001    | IBM       | Unix
        002    | Roche     | IBM
        003    | Microsoft | Windows
        ……
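The lexical prior on this slide can be pictured as a lookup from a word to the database columns whose cells contain it. The sketch below uses the three JOB rows from the slide and a hard exact-match lookup; GUSP itself is described as using a soft prior inside a learned model, so this is only an illustration, and the function name is hypothetical.

```python
JOB = [
    {"Job ID": "001", "Company": "IBM", "System": "Unix"},
    {"Job ID": "002", "Company": "Roche", "System": "IBM"},
    {"Job ID": "003", "Company": "Microsoft", "System": "Windows"},
]

def candidate_columns(word: str, table: list[dict]) -> list[str]:
    """Columns that contain `word` as a cell value (case-insensitive exact match)."""
    needle = word.lower()
    return sorted({
        column
        for row in table
        for column, cell in row.items()
        if cell.lower() == needle
    })

print(candidate_columns("Unix", JOB))
print(candidate_columns("IBM", JOB))
```

"Unix" maps to a single column, which is the slide's "Unix → System" preference. "IBM" appears under both Company and System, so a lookup alone is ambiguous and the surrounding context has to decide.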

  43. GUSP: Key Ideas. Leverage target database. [Figure: table Flight (Flight ID, From Airport, ……) and table Airport (Airport Code, Airport Name, ……), linked by a foreign key]

  44. GUSP: Key Ideas (continued). [Figure: schema graph with tables Flight and Airport]

  45. GUSP: Key Ideas (continued). [Figure: schema graph extended with Airline, Days, Fare]

  46. GUSP: Key Ideas (continued). [Figure: same schema graph, with the words "flight" and "BWI" and a "?"]

  47. GUSP: Key Ideas (continued). Leverage schema to guide learning. Prior: favor shorter join. [Figure: "flight" and "BWI" attached to the schema graph]
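The "favor shorter join" prior can be illustrated with a breadth-first search over the schema graph, where tables are nodes and joinable pairs are edges. In the sketch below only the Flight-Airport link is taken from the slides (the foreign key); the other edges are hypothetical, added so that longer alternative paths exist. A hard shortest-path choice stands in for what the talk describes as a prior.

```python
from collections import deque

EDGES = [
    ("Flight", "Airport"),  # foreign key shown on the Flight/Airport slide
    ("Flight", "Airline"),  # hypothetical edge
    ("Flight", "Days"),     # hypothetical edge
    ("Flight", "Fare"),     # hypothetical edge
    ("Fare", "Airport"),    # hypothetical edge, creates a longer Flight-Airport route
]

def shortest_join_path(start: str, goal: str, edges=EDGES):
    """Breadth-first search for the join path with the fewest joins, or None."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)

    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in adjacency.get(path[-1], []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None

# "flight BWI": connect the Flight table to the Airport table holding BWI.
print(shortest_join_path("Flight", "Airport"))
print(shortest_join_path("Airline", "Airport"))
```

With these edges the direct Flight-Airport join wins over the detour through Fare, which is the behavior the prior is meant to encourage.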

  48. Free Lunch #3: Dependency Parses
      - Start from syntactic parse → rich resources and available parsers
      - Intractable structure learning → Tree HMM
        - Exact inference is linear-time
      - Need to handle syntax-semantics mismatch
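The claim that exact inference in a tree HMM is linear-time can be illustrated with max-product (Viterbi) dynamic programming on a rooted tree: each edge is processed once, so the cost is O(N * K^2) for N nodes and K states. The sketch below is a generic tree-structured model with made-up log-scores, not GUSP's actual state space or parameters; all names are illustrative.

```python
def tree_viterbi(children, root, states, emit, trans):
    """Best joint state assignment on a rooted tree.

    children: node -> list of child nodes
    emit[node][state]: log-score of the node being in that state
    trans[parent_state][child_state]: log-score of that parent-child pair
    Returns (labels, best_total_score).
    """
    # Pre-order traversal: every parent appears before its children.
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(children.get(node, []))

    best = {}    # best[node][s]: best score of node's subtree given state s
    choice = {}  # choice[(child, parent_state)]: best child state

    # Upward pass: children are handled before their parents.
    for node in reversed(order):
        best[node] = {}
        for s in states:
            score = emit[node][s]
            for child in children.get(node, []):
                cs = max(states, key=lambda t: trans[s][t] + best[child][t])
                choice[(child, s)] = cs
                score += trans[s][cs] + best[child][cs]
            best[node][s] = score

    # Downward pass: read off the maximizing states.
    labels = {root: max(states, key=lambda s: best[root][s])}
    for node in order:
        for child in children.get(node, []):
            labels[child] = choice[(child, labels[node])]
    return labels, best[root][labels[root]]

# Toy tree and scores, invented for this example.
children = {"r": ["x", "y"], "x": ["z"]}
states = ["A", "B"]
emit = {
    "r": {"A": 0.0, "B": -1.0},
    "x": {"A": -2.0, "B": 0.0},
    "y": {"A": 0.0, "B": -3.0},
    "z": {"A": -1.0, "B": 0.0},
}
trans = {"A": {"A": 0.0, "B": -0.5}, "B": {"A": -0.5, "B": 0.0}}

labels, score = tree_viterbi(children, "r", states, emit, trans)
print(labels, score)
```

Because a dependency parse is already a tree, fixing the structure to the parse avoids the intractable search over structures and leaves only this cheap labeling step.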

  49. Syntax-Semantics Mismatch. [Figure: dependency tree over the words get, flight, from, toronto, to, san, diego, stopping, at, dtw]

  50. Syntax-Semantics Mismatch (continued; same tree)

  51. Syntax-Semantics Mismatch (continued; same tree)
