Obtaining phenotype and outcome data from EHRs Josh Denny, MD MS

Obtaining phenotype and outcome data from EHRs Josh Denny, MD MS Vanderbilt University Medical Center 3/26/2018

EHR data are dense and efficient for discovery: Vanderbilt’s experience (BioVU)
[Figure: EHR Data from Vanderbilt Biobank; marked events: BioVU start, Vanderbilt biobank enrollment]

eMERGE Goals:
• To perform genomic studies using the EHR
• To implement genomic medicine

Making text documents useful for research
Inputs: billing codes, clinical notes, test reports, etc.
Example note: CC: SOB HPI: Mr. Smith is a 65yo w/ h/o CHF, … no dm2… on atenolol 50mg daily… Mother had RA.
• Deidentify: remove HIPAA identifiers + …. Result, stored in the Research EHR: CC: SOB HPI: Mr. **jones** is a 65yo w/ h/o CHF, … no dm2… on atenolol 50mg daily… Mother had RA.
• Customized classifiers (smoking status, etc)
• Medication extraction. Structured Output:
– DrugName: atenolol
– Strength: 50 mg
– Frequency: daily
• Find biomedical concepts and qualifiers; create structured data. Structured Output:
– chief_complaint: Shortness of Breath
– history_present_illness: Congestive Heart Failure; Type 2 diabetes, negated
– mother_medical_history: rheumatoid arthritis
• Qualifiers: certainty (positive, negated); Who experienced it? (patient or family member?)

Finding a “simple” disease in the EHR: Who has hypertension? Definition: SBP > 140 or DBP > 90 Patient 1 Doesn’t have hypertension Patient 2 Has hypertension
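As a minimal sketch of why this definition is only "simple" on paper, the threshold rule above can be written directly in Python. The `BPReading` class, the function names, the "any reading" policy, and the example readings are all assumptions made for illustration; they are not taken from the talk.

```python
from dataclasses import dataclass


@dataclass
class BPReading:
    systolic: int   # SBP in mmHg
    diastolic: int  # DBP in mmHg


def reading_is_hypertensive(reading: BPReading) -> bool:
    """Slide definition: SBP > 140 or DBP > 90."""
    return reading.systolic > 140 or reading.diastolic > 90


def naive_has_hypertension(readings: list[BPReading]) -> bool:
    """Assumed policy: flag the patient if any single reading meets the definition."""
    return any(reading_is_hypertensive(r) for r in readings)


# Hypothetical patients, invented for this example.
untreated = [BPReading(152, 94), BPReading(148, 88)]
treated = [BPReading(128, 82), BPReading(132, 84)]  # readings controlled by medication

print(naive_has_hypertension(untreated))
print(naive_has_hypertension(treated))
```

A patient whose hypertension is controlled by medication would not be flagged by this rule, which is one reason a blood-pressure threshold alone can misclassify patients.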

Our “simple” example: Hypertension Multiple components are better (and blood pressure is the worst) Teixeira, JAMIA 2016

What we learned - Finding phenotypes in the EHR
Data sources used to find true cases:
• Billing codes (ICD9 & CPT)
• Clinical Notes (NLP - natural language processing)
• Medications (ePrescribing & NLP)
• Labs & test results (NLP)
Algorithm Development and Implementation:
• Identify phenotype of interest
• Case & control algorithm development and refinement
• Manual review; assess precision
• <95%: return to algorithm development and refinement
• ≥95%: Deploy in BioVU
• Genetic association tests
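The review step in this workflow can be sketched as a small precision check. This is an illustration only: the function names and the review counts are hypothetical, and the only value taken from the slide is the 95% threshold.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of algorithm-flagged cases confirmed as true cases on manual review."""
    reviewed = true_positives + false_positives
    if reviewed == 0:
        raise ValueError("no reviewed cases")
    return true_positives / reviewed


def next_step(true_positives: int, false_positives: int, threshold: float = 0.95) -> str:
    """Return 'deploy' when precision meets the threshold, otherwise 'refine'."""
    if precision(true_positives, false_positives) >= threshold:
        return "deploy"
    return "refine"


# Hypothetical manual-review results for two versions of a case algorithm.
print(next_step(90, 10))  # precision 0.90: keep refining
print(next_step(96, 4))   # precision 0.96: ready to deploy
```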

Early discovery science in eMERGE – Hypothyroidism Algorithms can be deployed across multiple EHRs Analyses can be performed using extant data Am J Hum Genet. 2011;89:529-42

GWAS of QRS Duration in eMERGE n=5,272 SCN5A/SCN10A Ritchie et al., Circulation 2013

What happens in the “heart healthy” population?
• Examined the n=5272 “heart healthy” population
• Followed for development of atrial fibrillation based on genotype
[Figure: Atrial fibrillation-free survival by genotype (AA, AG, GG) against years since normal ECG (and no heart disease); HR=1.49 per G allele, p=0.001]
Ritchie et al., Circulation 2013

EHRs for drug response: Clopidogrel adverse events associated with CYP2C19
[Figure panels: From clinical trials; From the EHR. Groups: Normal metabolizers; Carriers. N=807, P=0.005]
Mega et al., NEJM 2009; Delaney et al. Clin Pharm Ther. 2012

Deep learning for Diabetic Retinopathy Train a machine learning algorithm over >128k images Gulshan et al. JAMA 2016

Phenome scanning (PheWAS) in the EHR
• A phenotype: find associated genotypes (dense genomic information)
• A genetic variant: find associated phenotypes (the curated EHR-based phenome)
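The second direction, one genetic variant tested against many phenotypes, can be sketched as a loop over 2x2 tables. This is a simplified illustration, not the method used in the cited studies: it assumes a Pearson chi-square test without covariates and a Bonferroni threshold, and every count below is invented.

```python
import math


def chi2_p_value(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square test (1 df, no continuity correction) on a 2x2 table.

    a = carriers with the phenotype,     b = carriers without it,
    c = non-carriers with the phenotype, d = non-carriers without it.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # a margin is empty, so the table carries no information
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom the chi-square upper tail equals erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def phewas(tables: dict, alpha: float = 0.05) -> list:
    """Test one variant against every phenotype; flag hits at a Bonferroni threshold."""
    threshold = alpha / len(tables)
    results = [(name, chi2_p_value(*counts)) for name, counts in tables.items()]
    results.sort(key=lambda item: item[1])
    return [(name, p, p < threshold) for name, p in results]


# Hypothetical counts (a, b, c, d) for a single variant.
tables = {
    "atrial fibrillation": (60, 940, 120, 3880),
    "type 2 diabetes": (110, 890, 400, 3600),
    "asthma": (50, 950, 200, 3800),
}
results = phewas(tables)
for name, p, significant in results:
    print(f"{name}: p={p:.3g} significant={significant}")
```

A real analysis would typically adjust for covariates such as age, sex, and ancestry, for example with logistic regression.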

Replications of GWAS associations via PheWAS
[Figure panels: Binary traits; Continuous traits]
P-value for replication:
• All - 210/751: 2×10^-98
• Powered - 51/77: 3×10^-47
Nat Biotech 2013; 31:1102-1111

PheWAS across all HLA types (n= 37,270) Karnes et al, Sci Trans Med 2017

The potential for “call back” deeper phenotyping: Long QT genes (SCN5A and KCNH2) in 2,200 sequenced patients in eMERGE
• 83 rare (MAF < 1%) in SCN5A, 45 in KCNH2
• 121/128 MAF < 0.5%, 92 singletons
• Three labs assessed known/likely pathogenicity
[Venn diagram: Lab 1 16/121; Lab 2 24/121; Lab 3 17/121; overlap 4]
Van Driest et al, JAMA 2016

Calculating a Phenotype Risk Score (PheRS)
For each record i, generate PheRS:
PheRS_i = \sum_{j=1}^{k} w_j x_{ij}
• PheRS_i: score for subject i
• w_j: weight for phenotype j, derived from the entire EHR
• x_{ij}: 0 = phenotype j absent, 1 = phenotype j present
• The sum adds up the terms for the k phenotypes
Mapping of disease features to phenotypes: OMIM (feature 1, feature 2, ...), Human Phenotype Ontology, EHR phenotypes
Repeat this for all Mendelian diseases
Bastarache et al, Science 2018
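A minimal sketch of this weighted sum follows. The slide states only that each weight is derived from the entire EHR; taking the weight as log(1 / prevalence) is an assumption made here, and the phenotype names, prevalences, and records are invented for illustration.

```python
import math


def phers_weights(prevalence: dict) -> dict:
    """Assumed weighting: rarer phenotypes in the EHR population count for more."""
    return {phenotype: math.log(1.0 / p) for phenotype, p in prevalence.items()}


def phers(record_phenotypes: set, disease_phenotypes: list, weights: dict) -> float:
    """Add the weights of the disease's phenotypes that are present in one record."""
    return sum(weights[ph] for ph in disease_phenotypes if ph in record_phenotypes)


# Hypothetical EHR-wide prevalences for three phenotypes mapped to one Mendelian disease.
prevalence = {"bronchiectasis": 0.01, "pneumonia": 0.10, "asthma": 0.08}
weights = phers_weights(prevalence)
disease = list(prevalence)

record_a = {"bronchiectasis", "pneumonia", "hypertension"}  # two matching phenotypes
record_b = {"hypertension"}                                 # no matching phenotypes

print(round(phers(record_a, disease, weights), 2))
print(phers(record_b, disease, weights))
```

Phenotypes outside the disease's feature list, such as the hypothetical "hypertension" entry, contribute nothing to the score.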

Example: a phenotype risk score in Cystic Fibrosis
Group: CF cases | CF controls
Age/Sex: 18F, 26M, 29F, 29M | 18F, 26M, 29F, 29M
Phenotype Risk Score: 9.8, 4.4, 6.3, 7.8 | 2.5, 0.7, 0.0, 0.7
Phenotype rows:
• Chronic airway obstruction
• Pneumonia
• Diseases of pancreas
• Hypovolemia
• Acute upper respiratory infections
• Asthma
• Bronchiectasis
• Intestinal malabsorption
• Hepatomegaly
• Acute pulmonary heart disease
Bastarache et al, Science 2018

PheRS identified potentially pathogenic SNVs N=21k on exome chip 6k SNVs Bastarache et al, Science 2018

The All of Us Research Program – Breaking Down Data Silos Precision Medicine Initiative, PMI, All of Us, the All of Us logo, and The Future of Health Begins With You are service marks of the U.S. Department of Health and Human Services.

Overview of the All of Us approach and protocol
Enrollment routes: Direct Volunteers; Health Care Provider Organizations
Data types:
• EHR data
• Health Surveys
• Baseline measurements
• Bio-specimens
• Smartphones & Wearables
Multiple data types linked together by semantic standards

All of Us will aggregate data from many sources
Sources: From Direct Volunteers; From Healthcare Provider Orgs; Data added centrally by DRC
Timeline labels: Version 1 (2018); Version 2; Much longer term
Data items shown:
• Sync for Science
• Meds, Visits, Labs, Billing codes
• Death Index; Claims & Rx Data; …
• Health data aggregators (PicnicHealth)
• Clinical Notes & Reports; Clinical Messaging
• Participant provided data (Health surveys, activity monitors, etc)
• Participant exams and biospecimens
• Geospatial data; Local Registries; Images
Flow: Raw Data Repository, then Curated Data Repository, then APIs, Analysis tools, etc

Sync 4 Science (S4S) – a technology to share health data
S4S:
- FHIR-based
- Starting with MU Common Clinical Data set
[Figure: S4S Pilot Sites]
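Because S4S is FHIR-based, shared records arrive as FHIR resources such as `Observation`. The sketch below is not the S4S API; it only shows, under stated assumptions, how a blood-pressure `Observation` in FHIR JSON might be read. The sample resource is invented, and `extract_bp` is a hypothetical helper. The LOINC codes 8480-6 (systolic) and 8462-4 (diastolic) are the standard codes for these components.

```python
import json

SYSTOLIC = "8480-6"   # LOINC: systolic blood pressure
DIASTOLIC = "8462-4"  # LOINC: diastolic blood pressure

# Invented example of a FHIR Observation carrying a blood-pressure panel.
SAMPLE = """
{
  "resourceType": "Observation",
  "status": "final",
  "code": {"coding": [{"system": "http://loinc.org", "code": "85354-9"}]},
  "component": [
    {"code": {"coding": [{"system": "http://loinc.org", "code": "8480-6"}]},
     "valueQuantity": {"value": 152, "unit": "mmHg"}},
    {"code": {"coding": [{"system": "http://loinc.org", "code": "8462-4"}]},
     "valueQuantity": {"value": 94, "unit": "mmHg"}}
  ]
}
"""


def extract_bp(observation: dict) -> dict:
    """Pull systolic and diastolic values out of an Observation's components."""
    values = {}
    for component in observation.get("component", []):
        codings = component.get("code", {}).get("coding", [])
        codes = {coding.get("code") for coding in codings}
        quantity = component.get("valueQuantity", {})
        if SYSTOLIC in codes:
            values["systolic"] = quantity.get("value")
        elif DIASTOLIC in codes:
            values["diastolic"] = quantity.get("value")
    return values


observation = json.loads(SAMPLE)
print(extract_bp(observation))
```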

Data Access is centralized in All of Us
Traditional Approach: Bring data to researchers (Data Download from public repository)
Problems
• Data sharing = data copying
• Security (data handoffs)
• Huge infrastructure needed
• Siloed compute
AoU Approach: Bring researchers to the data (Public Cloud)
Advantages
• Cost
• Threat detection and auditing
• Increased Accessibility
• Shared compute

The power of a data biosphere of common semantics and APIs

Obtaining phenotype and outcome data from e-health records and digital platforms: the experience of UK Biobank Cathie Sudlow Professor of Neurology and Clinical Epidemiology Director, Centre for Medical Informatics, Usher Institute, University of Edinburgh Director of Health Data Research UK Scottish substantive site Chief Scientist, UK Biobank International Cohorts Summit, Durham, North Carolina March 2018

UK Biobank in a nutshell
• 500,000 UK men and women aged 40-69 years when recruited during 2006-2010
• Consent for all types of health research by both academic and commercial researchers
• Extensive baseline questions and physical measures, with biological samples stored for future assays
• Subsequent enhancements in all or large subsets of participants:
– Data from portable wearable devices (100,000 accelerometry; 20,000+ continuous ECG)
– Sample assays in all or large subsets:
  Complete: genome-wide genotyping; biochemistry panel
  Underway/planned: exome and whole genome sequencing; proteomics; infectious disease assays; stool microbiome
– Multimodal imaging of 100,000 (>22,000 so far)
– Web questionnaires
• Comprehensive, long term follow-up for a wide range of health-related outcomes
• Open access for approved research: see www.ukbiobank.ac.uk

Follow-up of participants in very large prospective cohorts
Aim: identify a wide range of incident diseases and other health related outcomes
Active methods requiring participant re-engagement
• face to face reassessment
• postal or web-based surveys
• expensive
• prone to incomplete coverage & selective loss to follow-up
• miss cases emerging between assessments
Passive methods via linkages to national health records
• can follow all participants without need for re-engagement
• efficient and cost effective
• need adequate consent at recruitment
• rely on universal healthcare system & availability of relevant datasets
• can only detect cases of disease diagnosed in a healthcare setting
• data need to be accurate and sufficiently detailed for research studies

Web questionnaires
• Using email and web questionnaires
– for more detailed assessment of exposures
– and to obtain information on outcomes that cannot be obtained through linking to health records
• Of 350,000 with email, >150,000 complete each questionnaire
– Details of dietary intake
– Cognitive function
– Mental health (thoughts and feelings)
– Gastrointestinal symptoms
Useful for following change over time…but beware selective attrition
