using ontologies to mine unstructured data in medicine

Using ontologies to mine unstructured data in medicine, Nigam Shah (PowerPoint presentation)



  1. Using ontologies to mine unstructured data in medicine Nigam Shah, MBBS, PhD nigam@stanford.edu

  2. Profiling a patient set
     [Diagram labels: Patients with some diagnosis; All patients; Disease Ontology; Appropriate control]

  3. Profiling patient sets
     ICD9 789.00 (Abdominal pain, unspecified site)
                X (+)   X (-)
        Y (+)   a       b
        Y (-)   c       d
     86k patient reports. Patient records processed from the U. Pittsburgh NLP Repository with IRB approval.
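
The a/b/c/d cells on this slide can be filled directly from per-patient term sets. The following is a minimal sketch, assuming each patient has already been reduced to a set of recognized terms or codes, and assuming the cell layout used on the Vioxx slide (rows Y(+)/Y(-), columns X(+)/X(-), so a counts patients with both X and Y). The function name `contingency_counts` and the toy patient data are hypothetical, not taken from the slides.

```python
from typing import Dict, Set, Tuple


def contingency_counts(
    patients: Dict[str, Set[str]], x: str, y: str
) -> Tuple[int, int, int, int]:
    """Count (a, b, c, d) for terms x and y across a patient set.

    Assumed layout: rows Y(+)/Y(-), columns X(+)/X(-), so
    a = Y+ X+, b = Y+ X-, c = Y- X+, d = Y- X-.
    """
    a = b = c = d = 0
    for terms in patients.values():
        has_x, has_y = x in terms, y in terms
        if has_y and has_x:
            a += 1
        elif has_y:
            b += 1
        elif has_x:
            c += 1
        else:
            d += 1
    return a, b, c, d


# Hypothetical toy data: patient ID -> set of recognized terms/codes.
patients = {
    "P1": {"X", "Y"},
    "P2": {"Y"},
    "P3": {"X"},
    "P4": set(),
    "P5": {"X", "Y", "Z"},
}
print(contingency_counts(patients, "X", "Y"))
```

Every patient lands in exactly one cell, so a + b + c + d always equals the size of the patient set.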

  4. Associations and outcomes
     [Matrix: Gene, Disease, Drug, Device, Procedure, and Environment crossed with the same types, asking "What associations can we find?"; labeled cells include Enrichment, Indications, Off-label, and Side effects]

  5. Generation of annotated data at scale
     [Workflow diagram]
     • Input: the text of clinical notes.
     • Lexicons: terms (Term 1 … Term n) for diseases, procedures, and drugs from the BioPortal knowledge graph, cleaned using term frequency and syntactic types.
     • Annotation workflow: term recognition with the NCBO Annotator, plus negation detection with NegEx rules/patterns (e.g., "no T3").
     • Output: for each patient in the cohort of interest (P1, P2, P3, … Pn), the recognized terms (T1, T2, …) and ICD9 codes; the terms form a temporal series of tags used for further analysis.
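
The workflow above can be illustrated with a small, self-contained stand-in. This sketch is not the NCBO Annotator (a BioPortal web service) and not the NegEx algorithm: it swaps in plain dictionary matching over a hypothetical toy lexicon and a simplified negation check that looks for a fixed list of trigger words in a short window before each term, within the same sentence. The lexicon, trigger list, window size, and function names are all assumptions made for illustration. Non-negated mentions are then collected per patient into a date-ordered series of tags.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: surface string -> term ID. On the slide, lexicons are
# built from BioPortal and terms are recognized by the NCBO Annotator; neither is used here.
LEXICON: Dict[str, str] = {
    "rheumatoid arthritis": "T_RA",
    "vioxx": "T_VIOXX",
    "myocardial infarction": "T_MI",
    "chest pain": "T_CHEST_PAIN",
}

# Simplified negation triggers in the spirit of NegEx; not the NegEx rule set.
NEGATION_TRIGGERS = ("no", "denies", "without", "negative for")
WINDOW = 5  # how many preceding words to scan for a trigger (assumed value)


def annotate(note: str) -> List[Tuple[str, bool]]:
    """Return (term_id, negated) for each lexicon match in the note."""
    text = note.lower()
    hits: List[Tuple[str, bool]] = []
    for surface, term_id in LEXICON.items():
        for m in re.finditer(r"\b" + re.escape(surface) + r"\b", text):
            # Restrict the negation scope to the current sentence (split on ".").
            sentence_start = text.rfind(".", 0, m.start()) + 1
            context = " ".join(text[sentence_start:m.start()].split()[-WINDOW:])
            negated = any(
                re.search(r"\b" + re.escape(t) + r"\b", context) for t in NEGATION_TRIGGERS
            )
            hits.append((term_id, negated))
    return hits


def patient_series(notes: List[Tuple[str, str, str]]) -> Dict[str, List[Tuple[str, str]]]:
    """Map patient -> date-ordered (date, term_id) tags, skipping negated mentions."""
    series: Dict[str, List[Tuple[str, str]]] = {}
    for patient, date, note in notes:
        for term_id, negated in annotate(note):
            if not negated:
                series.setdefault(patient, []).append((date, term_id))
    for tags in series.values():
        tags.sort()  # ISO-8601 date strings sort chronologically
    return series


# Hypothetical toy notes: (patient, date, text).
notes = [
    ("P1", "2003-05-02", "Rheumatoid arthritis. Started Vioxx."),
    ("P1", "2004-01-15", "Admitted with myocardial infarction."),
    ("P2", "2003-07-09", "Rheumatoid arthritis. Patient denies chest pain."),
]
print(patient_series(notes))
```

Splitting sentences on "." is a deliberate simplification; real clinical text (abbreviations, lists, section headers) needs a proper sentence splitter and a fuller trigger set.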

  6. Detecting the Vioxx risk signal
     [Diagram: Vioxx patients (1,560); MI patients (1,827); Vioxx and MI (339); RA patients (14,079); p-value < 1.3×10⁻²⁴]
     • ROR of 2.058, CI of [1.804, 2.349]; the χ² statistic has p-value < 10⁻⁷
     • ROR = 1.524, CI = [0.872, 2.666]; χ² p-value = 0.06816
                  MI          No MI
        Vioxx     a = 339     b = 1221
        No Vioxx  c = 1488    d = 11031
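
The counts on this slide are internally consistent: a + b = 1,560 Vioxx patients, a + c = 1,827 MI patients, and a + b + c + d = 14,079 RA patients. The sketch below assumes the standard reporting odds ratio ROR = (a/b)/(c/d) = ad/(bc) with a Woolf (log-scale) 95% confidence interval, and a Pearson χ² test with 1 degree of freedom and no continuity correction; the slides do not say which χ² variant was used, and the function names are hypothetical.

```python
import math
from typing import Tuple


def ror_with_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Reporting odds ratio ad/(bc) with a Woolf (log-scale) confidence interval."""
    ror = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(ROR)
    low = math.exp(math.log(ror) - z * se)
    high = math.exp(math.log(ror) + z * se)
    return ror, low, high


def chi_square_p(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square statistic (1 df, no continuity correction) and its p-value."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Counts from the Vioxx/MI table on the slide.
a, b, c, d = 339, 1221, 1488, 11031
ror, low, high = ror_with_ci(a, b, c, d)
chi2, p = chi_square_p(a, b, c, d)
print(f"ROR={ror:.3f}, CI=[{low:.3f}, {high:.3f}], chi2={chi2:.1f}, p={p:.2e}")
```

With these counts the sketch reproduces the slide's ROR of 2.058 and CI of [1.804, 2.349], and gives a χ² p-value well below 10⁻⁷.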

  7. We should stop acting as if our goal is to author extremely elegant theories, […] and make use of the best ally we have: the unreasonable effectiveness of data.

  8. Big Data in biomedicine?
     [Chart: data size (small to big) against number of samples (small to large), with next-gen sequencing and EMR/clinical notes placed on it]

  9. The problem
                          On-label                                  Off-label
        Indication        What pharma companies get approval for    Whatever else the doctor prescribes for
        Side effect /     Found during the pre-marketing phase      Goal of drug-safety surveillance
        adverse effect
     Off-label use:
     • 21% of prescriptions
     • 73% with very little evidence
     Adverse drug events:
     • Ambulatory: 100,000 deaths and $177 billion annually
     • Inpatient: roughly 30% of hospital stays are estimated to have an adverse drug event

  10. Detecting Off-label use

  11. Detecting Adverse Events

  12. Patterns worth testing (off-label usage, which is risky)
     • Identify off-label use
       – Find drug–indication pairs that “look like” indications
     • Identify which use “may be risky”
       – Use existing, known side-effect databases
       – Learn drug–disease associations that look like side effects
     • Assemble I-D-A triplets
       – Indication – Drug – Adverse effect, e.g., RA – Vioxx – MI
     • Test on unstructured data
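
The triplet-assembly step can be sketched as two set operations. This is a minimal illustration under stated assumptions: the drug–indication pairs that "look like" indications are taken as given input (how they are mined is not shown), the approved-label lookup and side-effect database are plain dictionaries standing in for real resources, and only the "known side effect database" route is covered, not learned associations. All function names and the `drug_A`/`indication_*` data are hypothetical placeholders.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (drug, indication)
Triplet = Tuple[str, str, str]  # (indication, drug, adverse effect)


def off_label_pairs(observed: Set[Pair], approved: Dict[str, Set[str]]) -> Set[Pair]:
    """Keep drug-indication pairs whose indication is not on the drug's approved list."""
    return {(drug, ind) for drug, ind in observed if ind not in approved.get(drug, set())}


def assemble_triplets(pairs: Set[Pair], side_effects: Dict[str, Set[str]]) -> List[Triplet]:
    """Join each drug-indication pair with the drug's known adverse effects."""
    triplets: List[Triplet] = []
    for drug, ind in sorted(pairs):
        for effect in sorted(side_effects.get(drug, set())):
            triplets.append((ind, drug, effect))
    return triplets


# Hypothetical toy inputs (placeholder names, not real drugs or labels).
observed = {("drug_A", "indication_1"), ("drug_A", "indication_2")}
approved = {"drug_A": {"indication_1"}}
side_effects = {"drug_A": {"adverse_effect_1"}}

off_label = off_label_pairs(observed, approved)
print(assemble_triplets(off_label, side_effects))
```

The slide's example triplet has the same shape: `assemble_triplets({("Vioxx", "RA")}, {"Vioxx": {"MI"}})` returns `[("RA", "Vioxx", "MI")]`. Each triplet is then a hypothesis to test against the unstructured data, for example with the contingency-table statistics used for the Vioxx signal.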

  13. Testing ‘interesting patterns’

  14. The team: www.bioontology.org/project-team
      NIH Roadmap grant U54 HG004028
