Clinical NLP, PubGene Clinical trials in Coremine Oncology Text - PowerPoint PPT Presentation


  1. Clinical NLP, PubGene. Clinical trials in Coremine Oncology; text processing and information extraction for surgery planning form. November 2017. Dag Are Steenhoff Hov, PubGene AS

  2. PubGene, founded 2001. ArrayIt H25K microarray; Scientific Literature; Coremine Networks • Integration of structured and unstructured information • Interpretation of biomedical analysis data • General information • Specialized information analysis. Products: COREMINE Oncology, COREMINE Medical, COREMINE Platform

  3. Clinical NLP in PubGene - examples • Clinical trials in Coremine Oncology • PubGene in Ahus Optique Courtesy of DNV-GL (Tore Hartvigsen)

  4. Coremine Oncology. AIM: To enable oncologists to make better treatment decisions. HOW: Combine data from relevant sources to aid interpretation of oncogenomics data from NGS and other platforms • Input: Somatic mutations, copy number changes, gene expression, or similar quantity • Output: Gene/biomarker annotations, related drugs and drug sensitivity, pathways, clinical trials, etc.

  5. Coremine Oncology – Our Scope. We focus on: • Analysis of “called events”; it is assumed that normalization and data quality considerations have already been taken care of • Collecting and integrating information for interpretation • Linking to potentially relevant treatments • Linking to clinical trials related to the input data

  6. Coremine Oncology • Currently three types of input data: – (Somatic) mutations – Copy number changes – Gene expression • Analysis/Interpretation module to display information (annotations) about: – Mutation – Gene/Protein – Protein domains • Summary module to show patient-level information with respect to: – Statistics on mutations – Related drugs for targets with change (in progress: also biomarker and sensitivity info) – Pathways for targets with change – Relevant clinical trials for aberrations

  7. Example: somatic mutations input data • Input for Coremine Oncology, case from lung cancer: – Chromosome number – Position – Reference nucleotide – Alternate nucleotide
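The four input fields listed on the slide above can be read into a small record type. The following is a minimal sketch, assuming a whitespace- or tab-separated text file with one mutation per line; the actual Coremine Oncology file format is not shown in the slides, and the names `SomaticMutation` and `parse_mutation_line` as well as the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SomaticMutation:
    """One called somatic mutation (field names are illustrative)."""
    chromosome: str
    position: int
    ref: str
    alt: str


def parse_mutation_line(line: str) -> SomaticMutation:
    """Parse 'chromosome position ref alt' separated by tabs or spaces."""
    fields = line.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    chrom, pos, ref, alt = fields
    # Assumption: an optional 'chr' prefix is stripped for consistency.
    if chrom.lower().startswith("chr"):
        chrom = chrom[3:]
    return SomaticMutation(chrom, int(pos), ref.upper(), alt.upper())


if __name__ == "__main__":
    # Placeholder record, not taken from the presentation.
    print(parse_mutation_line("chr7\t12345\tC\tT"))
```

Normalizing the chromosome name and nucleotide case at parse time keeps later lookups against annotation sources simple.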

  8. View of imported data file

  9. Mutation annotation – 1 patient - 1 missense mutation

  10. 10

  11. Clinical Trials for Cetuximab

  12. Clinical Trials for biomarkers. AIM: • To map biomarkers from patient data to relevant clinical trials. METHOD: • Identify how biomarkers are mentioned (referred to) in clinical trials • Download and index data from clinicaltrials.gov • Develop dictionaries of biomarkers and methods for detecting these in trial descriptions • Focus on eligibility. CHALLENGES: • Text mining is difficult! • Biomarkers are described, or referred to, in many ways • Ultimately, we want to identify biomarkers related to eligibility, but this is not straightforward • Complicated logic in inclusion/exclusion criteria, e.g., negation • Also need to check title, description, and condition for biomarkers
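One reason the inclusion/exclusion logic matters is that the same biomarker mention means opposite things depending on which part of the eligibility text it appears in. A minimal sketch of separating the two parts follows, assuming the free-text eligibility field uses "Inclusion Criteria" and "Exclusion Criteria" headings; the function name `split_criteria` and the sample text are hypothetical, and this is not the method used in Coremine Oncology.

```python
import re


def split_criteria(eligibility_text):
    """Split free-text eligibility into inclusion and exclusion parts."""
    parts = {"inclusion": "", "exclusion": ""}
    # Assumption: sections start with 'Inclusion Criteria' / 'Exclusion Criteria'.
    pieces = re.split(r"(?i)\b(inclusion|exclusion) criteria:?", eligibility_text)
    # With one capture group, re.split yields [prefix, key1, body1, key2, body2, ...].
    for key, body in zip(pieces[1::2], pieces[2::2]):
        parts[key.lower()] += body.strip() + "\n"
    return parts


if __name__ == "__main__":
    sample = (
        "Inclusion Criteria:\n- Documented EGFR mutation\n"
        "Exclusion Criteria:\n- Known T790M mutation\n"
    )
    for section, text in split_criteria(sample).items():
        print(section, "->", text.strip())
```

A biomarker found only in the exclusion part would then count against a match rather than for it. Negation inside a single criterion (e.g., "no known ...") is not handled by this sketch.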

  13. Clinical Trials text data mining • Compiled several lists of biomarkers of different types: – Single-nucleotide mutations (Cosmic) – Polymorphisms – Fusion genes – Gene regulation (Exp-up/down) – Copy number changes • Several strategies for finding these in text: – Detect explicit mentions – Detect patterns based on gene name and ‘marker’ type, e.g., “GENE amplification”, “GENE activating mutation” • Curated list of cancer types matched with conditions. Statistics for patterns: • Expression: 135 • CNV: 32 • Other (positive/negative): 20/10 • Mutation: 37 • Fusion/rearrangement/translocation: 10. Indexing statistics: • 5350 trials with at least one biomarker • 855 different biomarkers with hits • Top markers: BCR/ABL1 (907), ERBB2 positive (725), ERBB2 negative (603), ESR1 positive (467), ERBB2 exp-up (403)
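The pattern strategy on the slide above (gene name followed by a 'marker' type, as in “GENE amplification”) can be illustrated with regular expressions. This is a minimal sketch, not PubGene's implementation: the gene list, the marker labels, and the pattern wording are hypothetical stand-ins for the much larger curated dictionaries the slide describes.

```python
import re

# Hypothetical mini-dictionaries; the real lists are far larger.
GENES = ["ERBB2", "EGFR", "BRAF"]
MARKER_PATTERNS = {
    "CNV gain": r"amplification|amplified",
    "Mutation": r"(?:activating )?mutation",
    "Exp up": r"overexpress(?:ion|ed|ing)",
}


def find_biomarkers(text):
    """Return the set of (gene, marker type) pairs mentioned in the text."""
    hits = set()
    for gene in GENES:
        for marker, pattern in MARKER_PATTERNS.items():
            # Gene symbol, then a space or hyphen, then the marker wording.
            regex = re.compile(
                rf"\b{re.escape(gene)}\b[ -](?:{pattern})\b", re.IGNORECASE
            )
            if regex.search(text):
                hits.add((gene, marker))
    return hits


if __name__ == "__main__":
    criteria = "Patients with ERBB2 amplification or EGFR activating mutation"
    print(sorted(find_biomarkers(criteria)))
```

Matching on the gene symbol plus marker wording keeps a bare gene mention from being indexed as a specific alteration.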

  14. Clinical Trials for example case – NSCLC and Erlotinib

  15. Clinical Trials for copy number data (CNV)

  16. Trials matching patient biomarkers and disease. Diagram labels: cancer type, e.g., NSCLC; filter; manual curation; domain knowledge; clinical trials; GUI or command line. Biomarker types: CNA, SNP, EXP, FUSION, INDEL, SNA

  17. Clinical Trials for combined data – NSCLC: BRAF G469A, BRAF D594G, BRAF V600E, EGFR T790M, KIF5B/RET, CD74/ROS1, KIF5B/ALK, BCR/ABL1

  18. Details from Clinical Trial information – NCT01922583

  19. Clinical Trials matching to patient data • Various levels of stringency for matching trial to patient: – Perfect match – Other alteration (incl. same effect) – Same gene (other biomarker) – Related gene • S = weighted sum of scores • Biomarker-specific scoring models due to different prioritization of relevance of other alterations • AIM: To better map/identify other alterations with same/similar effect, e.g., amplification/up-regulation with activating mutation. Example: Patient: ERBB2 Exp up. Trial: 1. Perfect match: ERBB2 Exp up 2. Same effect: ERBB2 CNV gain 3. Similar effect: ERBB2 Positive 4. Other alteration: ERBB2 mutation 5. Likely opposite effect: ERBB2 Neg. 6. Opposite effect: ERBB2 Exp down or ERBB2 CNV loss 7. Gene only: ERBB2 8. Related gene: EGFR
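The slide states only that S is a weighted sum of scores over match levels; the actual scores and weights are not given. The sketch below therefore uses invented numbers purely to show the shape of such a calculation. The level names follow the eight example levels on the slide, while `MATCH_LEVEL_SCORE`, `trial_score`, and the per-biomarker weights are hypothetical.

```python
# Hypothetical per-level scores; the presentation does not give real values.
MATCH_LEVEL_SCORE = {
    "perfect_match": 1.0,
    "same_effect": 0.8,
    "similar_effect": 0.6,
    "other_alteration": 0.4,
    "gene_only": 0.3,
    "related_gene": 0.2,
    "likely_opposite_effect": -0.5,
    "opposite_effect": -1.0,
}


def trial_score(matches, biomarker_weights=None):
    """Compute S as a weighted sum over (patient biomarker, match level) pairs.

    biomarker_weights maps a patient biomarker to its weight (default 1.0),
    standing in for the biomarker-specific scoring models on the slide.
    """
    weights = biomarker_weights or {}
    return sum(
        weights.get(biomarker, 1.0) * MATCH_LEVEL_SCORE[level]
        for biomarker, level in matches
    )


if __name__ == "__main__":
    matches = [("ERBB2 Exp up", "same_effect"), ("ERBB2 Exp up", "related_gene")]
    print(trial_score(matches, {"ERBB2 Exp up": 2.0}))
```

Negative scores for opposite-effect matches are one possible design choice: they let a trial that requires, say, ERBB2 negative status rank below trials with no ERBB2 mention at all.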

  20. Clinical NLP in PubGene - examples • PubGene in Ahus Optique Courtesy of DNV-GL (Tore Hartvigsen)

  21. Akershus University Hospital (Ahus) Optique project. Increase patient safety by providing easier access to existing information. “Human touch and empathy – with professional skill”. Courtesy of DNV-GL (Tore Hartvigsen)

  22. The Surgery Planning Form is completed in 3 stages. Stage 1: Examination (“The Green Form”), system: DIPS Ahus (structured data, text). Stage 2: Preparations, systems: Metavision Ahus (Metavision O, Metavision I, Metavision DKS) and additional systems. Stage 3: Check/QA. To complete the form, data must be collected from a number of systems! Today this is done manually. Courtesy of DNV-GL (Tore Hartvigsen)

  23. Leave the data in the source systems! Data warehousing is an option. Users: expert users, «ordinary» users, researchers/analysts. A semantic IT solution and ontology for clinical use in health care. Ahus research databases. Ahus production databases: DIPS (EPJ), Metavision I, Metavision O, Metavision DKS. Courtesy of DNV-GL (Tore Hartvigsen)

  24. We want to «lift» the data out of the silos! Users: «ordinary» users, expert users. A semantic IT solution and ontology for clinical use in health care. Structured data: solutions provided by the Optique project. Unstructured data (text): text mining. Courtesy of DNV-GL (Tore Hartvigsen)

  25. PubGene in Ahus Optique, information extraction. Unstructured information: • Height 1,83 m. Structured information: • name=height, type=int, unit=cm, value=183. Fields: • ASA • BMI • Height • Weight • Pulse • Blood pressure • Temperature • Diagnosis codes • Treatment codes
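The height example above (free text "Height 1,83 m" becoming name=height, type=int, unit=cm, value=183) can be sketched with a single regular expression. This is an illustration only, assuming the value follows the field name directly and uses either a decimal comma or a decimal point; the pattern, the Norwegian keyword "høyde", and the function name `extract_height` are assumptions, not the project's actual rules.

```python
import re

# Assumption: the field name is followed by a number and a unit (m or cm).
HEIGHT_RE = re.compile(
    r"(?i)\b(?:height|høyde)\s*:?\s*(\d+(?:[.,]\d+)?)\s*(cm|m)\b"
)


def extract_height(text):
    """Return a structured height record in centimetres, or None if absent."""
    match = HEIGHT_RE.search(text)
    if match is None:
        return None
    # Accept the decimal comma used in Norwegian clinical notes.
    number = float(match.group(1).replace(",", "."))
    centimetres = number * 100 if match.group(2).lower() == "m" else number
    return {"name": "height", "type": "int", "unit": "cm", "value": round(centimetres)}


if __name__ == "__main__":
    print(extract_height("Height 1,83 m"))
```

Converting to one unit at extraction time means the structured field can be compared directly with values already stored in the source systems.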

  26. PubGene in Ahus Optique, allergy information

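  27. PubGene in Ahus Optique, status on smoking. Example sentences (Norwegian, with English gloss) and assigned status: • “Røyker.” (Smokes.): Yes • “Røyker 15-20 om dagen.” (Smokes 15-20 a day.): Yes • “Ifølge datter er han også storrøyker, 40/ dag siste 50 år.” (According to his daughter he is also a heavy smoker, 40/day for the last 50 years.): Yes • “Røykeplaster?” (Nicotine patch?): Uncertain • “Tidligere storrøyker.” (Former heavy smoker.): Stopped • “Ikke røyker og drikker ikke alkohol, tidligere, måteholdent alkoholbruk.” (Non-smoker and does not drink alcohol; previously moderate alcohol use.): No • “Eks-røyker, lite alkohol.” (Ex-smoker, little alcohol.): Stopped. Text analysis: • Separate text into sentences and detect sentences containing “røyke…”, “røyki…”, “røykt…” • Classify sentences based on recognition of keywords and word or sentence patterns • NB: Based on a small database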
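The two text-analysis steps on the slide above (sentence splitting with stem detection, then keyword- and pattern-based classification) can be sketched as follows. The rules here are hypothetical and were written only to reproduce the seven example sentences on the slide; the project's actual keyword lists and patterns are not given.

```python
import re

SMOKE_STEMS = ("røyke", "røyki", "røykt")  # stems listed on the slide
# Hypothetical rules, tuned only to the slide's examples.
NEGATED = re.compile(r"\bikke røyker\b|\brøyker ikke\b")
STOPPED = re.compile(r"(?:tidligere|eks-?)\s*(?:stor)?røyker")


def split_sentences(text):
    """Naive splitter: break after '.', '?' or '!' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.?!])\s+", text) if s.strip()]


def smoking_status(sentence):
    """Return Yes, No, Stopped, Uncertain, or None if smoking is not mentioned."""
    s = sentence.lower()
    if not any(stem in s for stem in SMOKE_STEMS):
        return None
    if s.endswith("?"):
        return "Uncertain"
    if NEGATED.search(s):
        return "No"
    if STOPPED.search(s):
        return "Stopped"
    return "Yes"


if __name__ == "__main__":
    note = "Røyker 15-20 om dagen. Røykeplaster? Eks-røyker, lite alkohol."
    for sentence in split_sentences(note):
        print(sentence, "->", smoking_status(sentence))
```

The rule order matters: the question-mark and negation checks run before the default "Yes", so a sentence that merely contains a smoking stem is not automatically counted as current smoking. As the slide notes for the original work, rules fitted to so few examples should be treated as a starting point.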

  28. Ahus Optique • Screenshots Courtesy of DNV-GL (Tore Hartvigsen)

  29. Courtesy of DNV-GL (Tore Hartvigsen)

  30. Page for surgery planning form Courtesy of DNV-GL (Tore Hartvigsen)

  31. Courtesy of DNV-GL (Tore Hartvigsen)

  32. BMI Courtesy of DNV-GL (Tore Hartvigsen)

  33. Courtesy of DNV-GL (Tore Hartvigsen)

  34. Courtesy of DNV-GL (Tore Hartvigsen)

  35. Courtesy of DNV-GL (Tore Hartvigsen)

  36. Courtesy of DNV-GL (Tore Hartvigsen)

  37. Allergy Courtesy of DNV-GL (Tore Hartvigsen)

  38. Smoking Courtesy of DNV-GL (Tore Hartvigsen)

  39. Courtesy of DNV-GL (Tore Hartvigsen)

  40. Surgery planning form Courtesy of DNV-GL (Tore Hartvigsen)

  41. Further development, text processing/analysis • A large set of options and potential: – Far more effective collection of more relevant information, e.g., by filling in surgery forms (“The green form”) – Improved quality through automatic detection of errors in documents and checks of consistency with structured data • Further steps for Ahus Optique: – Simple: Extraction of more “static” fields, like lab results – Information about medication – Information on heart function, lung function – Exploit document structure and information on document types
