Using ontologies to mine unstructured data in medicine Nigam Shah, MBBS, PhD nigam@stanford.edu
Profiling a patient set [Figure labels: Patients with some diagnosis; All patients; Disease Ontology; Appropriate control]
Profiling patient sets ICD9 789.00 (Abdominal pain, unspecified site). 2x2 contingency table with cells a, b, c, d over X (+) / X (-) and Y (+) / Y (-). 86k patient reports. Patient records processed from the U. Pittsburgh NLP Repository with IRB approval.
Associations and outcomes [Figure: matrix with rows and columns Gene, Disease, Drug, Device, Procedure, Environment; labeled cells: Enrichment, Off-label, Indications, Side effects] What associations can we find?
Generation of annotated data at scale [Figure: annotation workflow]
Input: text of a clinical note.
BioPortal knowledge graph: creating clean lexicons (frequency, syntactic types) for diseases, procedures, and drugs (Term 1 ... Term n).
Annotation workflow: term recognition (tool: NCBO Annotator), then negation detection (tool: NegEx; NegEx rules and patterns, e.g. "no T3", "no T4").
Output: for each patient P1 ... Pn, ICD9 codes and the terms recognized (T1, T2, T3, ...). Terms form a temporal series of tags.
Cohort of interest: passed on to further analysis.
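The two workflow steps on this slide (term recognition against a lexicon, then negation detection) can be illustrated with a toy sketch. It is not the NCBO Annotator or NegEx: the lexicon, the tag assignments, the trigger words, and the function name `annotate` are all hypothetical, and the negation rule here is only "a trigger word appears earlier in the same sentence".

```python
import re

# Hypothetical toy lexicon mapping terms to tag ids (not taken from BioPortal).
LEXICON = {
    "myocardial infarction": "T1",
    "rheumatoid arthritis": "T2",
    "chest pain": "T3",
}

# Small illustrative subset of negation triggers (an assumption, not the NegEx rule set).
NEGATION_TRIGGERS = {"no", "denies", "without"}


def annotate(note):
    """Return (tag, term, negated) for each lexicon term found in the note."""
    results = []
    # Split into rough sentences so a trigger does not leak across sentences.
    for sentence in re.split(r"[.;\n]", note):
        tokens = re.findall(r"[a-z0-9]+", sentence.lower())
        for term, tag in LEXICON.items():
            term_tokens = term.split()
            n = len(term_tokens)
            for i in range(len(tokens) - n + 1):
                if tokens[i:i + n] == term_tokens:
                    # Negated if any trigger precedes the term in this sentence.
                    negated = any(t in NEGATION_TRIGGERS for t in tokens[:i])
                    results.append((tag, term, negated))
    return results


tags = annotate("Patient denies chest pain. History of rheumatoid arthritis.")
print(tags)
```

Applied to every note of every patient, with the note date attached, output of this shape would give the per-patient series of tags that the slide describes.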
Detecting the Vioxx Risk Signal
RA Patients (14,079); Vioxx Patients (1,560); MI Patients (1,827); Vioxx and MI (339)
            MI          No MI
Vioxx       a = 339     b = 1221
No Vioxx    c = 1488    d = 11031
ROR of 2.058, CI of [1.804, 2.349]. The χ² statistic has p-value < 10^-7.
ROR = 1.524, CI = [0.872, 2.666]. χ² p-value = 0.06816.
p-value < 1.3x10^-24
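As a worked check of the arithmetic, the reporting odds ratio (ROR) can be computed from the 2x2 counts a, b, c, d on this slide. The sketch below assumes the usual definition ROR = (a·d)/(b·c) and a 95% Wald interval on the log scale with z = 1.96; the slide does not say how its interval was computed, so that choice is an assumption, and the function name is hypothetical.

```python
import math


def reporting_odds_ratio(a, b, c, d, z=1.96):
    """Return (ROR, CI lower, CI upper) for a 2x2 table.

    a: exposed with event, b: exposed without event,
    c: unexposed with event, d: unexposed without event.
    The interval is a Wald interval on the log scale (assumed method).
    """
    ror = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # std. error of log(ROR)
    log_ror = math.log(ror)
    return ror, math.exp(log_ror - z * se), math.exp(log_ror + z * se)


# Counts from the slide: Vioxx/MI, Vioxx/no MI, no Vioxx/MI, no Vioxx/no MI.
ror, ci_low, ci_high = reporting_odds_ratio(339, 1221, 1488, 11031)
print(f"ROR = {ror:.3f}, CI = [{ci_low:.3f}, {ci_high:.3f}]")
```

Under these assumptions the result agrees with the slide's ROR of 2.058 and CI of [1.804, 2.349] to three decimals. The four counts also sum to the 14,079 RA patients.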
We should stop acting as if our goal is to author extremely elegant theories, […] and make use of the best ally we have: the unreasonable effectiveness of data.
Big Data in biomedicine? [Figure: data size (small to big) plotted against number of samples (small to large); labeled regions: Next gen-seq; EMR, clinical notes]
The problem
Indication. On-label: what pharma companies get approval for. Off-label: whatever else the doctor prescribes for.
Side effect / adverse effect. On-label: found during the pre-marketing phase. Off-label: goal of drug-safety surveillance.
• Ambulatory: 100,000 deaths and $177 billion annually
• In patient: estimated that roughly 30% of hospital stays have an adverse drug event
• 21% of prescriptions
• 73% with very little evidence
Detecting Off-label use
Detecting Adverse Events
Patterns worth testing (off-label usage, which is risky)
• Identify off-label use: find drug-indication pairs that “look like” indications
• Identify which use “may be risky”: use existing, known side effect databases; learn drug-disease associations that look like side effects
• Assemble I-D-A triplets: Indication - Drug - Adverse effect, e.g. RA - Vioxx - MI
• Test on unstructured data
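The triplet-assembly step can be pictured as a join on the drug between two lists of pairs. This is only a minimal sketch: the function name, the pair ordering, and the input lists are hypothetical, and how the pairs themselves are learned is not covered.

```python
from collections import defaultdict


def assemble_triplets(drug_indications, drug_adverse_effects):
    """Join (drug, indication) and (drug, adverse effect) pairs on the drug.

    Returns (indication, drug, adverse effect) triplets.
    """
    effects_by_drug = defaultdict(list)
    for drug, effect in drug_adverse_effects:
        effects_by_drug[drug].append(effect)
    return [
        (indication, drug, effect)
        for drug, indication in drug_indications
        for effect in effects_by_drug.get(drug, [])
    ]


# The slide's own example: RA - Vioxx - MI.
triplets = assemble_triplets([("Vioxx", "RA")], [("Vioxx", "MI")])
print(triplets)
```

Each resulting triplet is a candidate pattern to test against the annotated clinical notes.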
Testing ‘interesting patterns’
The team @ www.bioontology.org/project-team NIH Roadmap grant U54 HG004028