Effective Feature Representation for Clinical Text Concept Extraction
Yifeng Tao 1,2, Bruno Godefroy 1, Guillaume Genthial 1, Christopher Potts 1,3,*
1 Roam Analytics, 2 Carnegie Mellon University, 3 Stanford University
Yifeng Tao et al. NAACL Clinical NLP 2019

Background: Healthcare Text Datasets
o Crucial healthcare information is recorded only in free-form text
[Figure: the five datasets grouped by text source. Clinical: Diagnosis Detection, Prescription Reasons. Social: Penn Adverse Drug Reactions. Scientific: Chemical-Disease Relations. Commercial: Drug-Disease Relations. An inset flowchart shows the Drug-Disease Relations Dataset being built from FDA Drug Labels via Crowdsourcing and Expert annotation.]
[Figures from: 1. Lamjed Ben Jabeur et al. Uprising microblogs: A Bayesian network retrieval model for tweet search. 2012, 2. https://www.sjm.com.br/utilidades/pubmed-busca, 3. http://anakin.uta.cloud/uncategorized/the-need-for-drug-donations, 4. https://www.autismawareness.com.au/news-events/aupdate/is-there-an-over-diagnosis-of-autism]

Background: Healthcare Text Datasets
o Clinical text datasets are scarce and expensive
  o Privacy considerations
  o Domain specialists
[Figure: bar chart of the number of texts (axis 0 to 10000) in each dataset: Diagnosis Detection, Prescription Reasons, Penn Adverse Drug Reactions, Chemical-Disease Relations, Drug-Disease Relations]

Task: Clinical Text Annotation
o Diagnosis Detection (labels shown: POSITIVE, CONCERN)
  Asymptomatic bacteriuria, could be neurogenic bladder disorder.
o Prescription Reasons (labels shown: PRESCRIBED, REASON)
  I will go ahead and place him on Clarinex for his seasonal allergic rhinitis.
o Penn Adverse Drug Reactions (ADR) (labels shown: ADR, ADR)
  #TwoThingsThatDontMixWell venlafaxine and alcohol - you’ll cry and throw chairs at your mom’s BBQ.
o Chemical–Disease Relations (CDR) (labels shown: DISEASE, DRUG)
  Ocular and auditory toxicity in hemodialyzed patients receiving desferrioxamine.
o Drug–Disease Relations (labels shown: TREATS, CONTRA)
  Indicated for the management of active rheumatoid arthritis and should not be used for rheumatoid arthritis in pregnant women.

Previous Models
o LSTM-CRF
  o General text
  o Distributed word embeddings
o HB-CRF
  o Clinical text
  o Sparse hand-built features
[Figure: two tagger diagrams over the example tokens (as extracted: Soma, Stop, cost, for) with output labels OTHER, DISCONTINUED, REASON, REASON. LSTM-CRF: word embedding, then LSTM, then CRF. HB-CRF: sparse features, then CRF.]

Model: ELMo-LSTM-CRF-HB
o Dense ELMo word embeddings + Sparse hand-built features
[Figure: tagger diagram over the example tokens (as extracted: Soma, Stop, cost, for) with output labels OTHER, DISCONTINUED, REASON, REASON. ELMo embeddings feed an LSTM that yields dense features; these and the sparse features both feed the CRF.]
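As a rough illustration of how dense and sparse features can meet at the CRF input, the sketch below concatenates a per-token dense vector with a per-token sparse feature vector. It is a minimal sketch, not the authors' implementation: the slide does not say how the two feature types are merged, so plain concatenation is an assumption, the helper names `build_index` and `combine_features` are hypothetical, and the dense vectors passed in are stand-ins for LSTM outputs.

```python
from typing import Dict, List, Sequence


def build_index(feature_names: Sequence[str]) -> Dict[str, int]:
    """Assign each hand-built feature name a fixed column position."""
    return {name: i for i, name in enumerate(feature_names)}


def combine_features(
    dense: Sequence[Sequence[float]],
    sparse: Sequence[Dict[str, float]],
    index: Dict[str, int],
) -> List[List[float]]:
    """Concatenate dense and sparse features token by token.

    dense  : one dense vector per token (stand-in for LSTM output).
    sparse : one {feature_name: value} dict per token (hand-built features).
    index  : column position of each known hand-built feature.
    Concatenation is an assumption made for this sketch.
    """
    if len(dense) != len(sparse):
        raise ValueError("dense and sparse must cover the same tokens")
    combined = []
    for dense_vec, active in zip(dense, sparse):
        # Expand the sparse dict into a fixed-width vector of zeros.
        sparse_vec = [0.0] * len(index)
        for name, value in active.items():
            if name in index:  # unknown feature names are ignored
                sparse_vec[index[name]] = value
        combined.append(list(dense_vec) + sparse_vec)
    return combined


# Hypothetical feature names and made-up dense values for two tokens.
index = build_index(["in_drug_lexicon", "is_capitalized"])
token_features = combine_features(
    dense=[[0.1, -0.2], [0.3, 0.4]],
    sparse=[{"in_drug_lexicon": 1.0}, {}],
    index=index,
)
```

Each row of `token_features` would then be scored by the CRF layer; that step is omitted here.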

Performance: Per-token Macro-F1 Scores
o Hyperparameters tuned through cross-validation
o Each experiment repeated five times
[Figure: bar chart of F1 score per dataset (Diagnosis Detection, Prescription Reasons, Penn Adverse Drug Reactions, Chemical-Disease Relations, Drug-Disease Relations) for four models: rand-LSTM-CRF, HB-CRF, ELMo-LSTM-CRF, ELMo-LSTM-CRF-HB. Significance markers: *: p < 0.05, **: p < 0.01, ***: p < 0.001]
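For readers unfamiliar with the metric, per-token macro-F1 can be sketched as follows: compute F1 for each label over all tokens, then take the unweighted mean across labels. This is a minimal sketch under stated assumptions. The function name `per_token_macro_f1` is hypothetical, the slide does not say which labels enter the average (for example, whether OTHER is included), and scoring a label with no true positives, false positives, or false negatives as 0.0 is a choice made here.

```python
from typing import Iterable, Sequence


def per_token_macro_f1(
    gold: Sequence[Sequence[str]],
    pred: Sequence[Sequence[str]],
    labels: Iterable[str],
) -> float:
    """Unweighted mean of per-label F1, counted over individual tokens."""
    labels = list(labels)
    if not labels:
        raise ValueError("at least one label is required")
    tp = {label: 0 for label in labels}
    fp = {label: 0 for label in labels}
    fn = {label: 0 for label in labels}

    for gold_seq, pred_seq in zip(gold, pred):
        if len(gold_seq) != len(pred_seq):
            raise ValueError("gold and predicted sequences differ in length")
        for g, p in zip(gold_seq, pred_seq):
            if g == p:
                if g in tp:
                    tp[g] += 1
            else:
                if p in fp:
                    fp[p] += 1  # predicted p, but the token was not p
                if g in fn:
                    fn[g] += 1  # token was g, but g was not predicted

    f1_scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 here when undefined.
        f1_scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Made-up example: one four-token text with one tagging error.
gold = [["OTHER", "REASON", "REASON", "OTHER"]]
pred = [["OTHER", "REASON", "OTHER", "OTHER"]]
score = per_token_macro_f1(gold, pred, labels=["OTHER", "REASON"])
```

Because every label counts equally in the mean, rare labels influence the score as much as frequent ones.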

The Role of Text Length
o LSTM: handles short texts well
o HB-CRF: robust on long texts

CRF Potential Scores
o LSTM features always more important
o HB features make substantial contribution
[Figure: the ELMo-LSTM-CRF-HB diagram, with dense features (ELMo, then LSTM) and sparse features both feeding the CRF]

Major Improvements in Minor Categories
[Figure: four panels, each plotting F1 score (%) and Improvement (%) against Label/Category (Support)]
o Diagnosis Detection: OTHER (74888), POSITIVE (24489), RULED-OUT (2797), CONCERN (2780)
o Prescription Reasons: OTHER (83618), REASON (9114), PRESCRIBED (5967), DISCONTINUED (2754)
o Chemical-Disease Relations: OTHER (104530), DISEASE (6887), CHEMICAL (6270)
o Drug-Disease Relations: OTHER (10634), TREATS (3671), UNRELATED (1145), PREVENTS (320)

Conclusion
o A unified feature representation for clinical text sequence labeling
  o Sparse, ontology-driven features
  o Dense LSTM features
o Best performance on five distinct healthcare datasets
  o Takes advantage of both feature types
  o Makes maximal use of small, expensive, domain-specific healthcare texts
o A new labeled clinical dataset
  o Identifies the treatment relations between drugs and diseases
o Extensive analysis to identify what information our model makes use of, and why its performance is consistently improved

Acknowledgement
o Roam Analytics
o Christopher Potts
o Bruno Godefroy
o Guillaume Genthial
o Kevin Reschke
o NLP Group

Penn Adverse Drug Reactions (ADR) Results
o The Role of Text Length
o Major Improvements in Minor Categories
[Figure: Penn Adverse Drug Reactions panel plotting F1 score (%) and Improvement (%) per label (support): OTHER (5023), ADR (283), INDICATION (29)]

Example of Hand-built Features

Procedure for Building Drug-Disease Relations Dataset
[Figure: flowchart with nodes FDA Drug Labels, Crowdsourcing, Expert annotation, Init, Expectation, Maximization (repeated until convergence), End, and Drug-Disease Relations Dataset]

Statistics of Datasets

Hyperparameters of Experiments
