Toward Comprehensive Syntactic and Semantic Annotations of the Clinical Narrative
Guergana K. Savova, PhD
Boston Children's Hospital and Harvard Medical School
Albright, Daniel; Lanfranchi, Arrick; Fredriksen, Anwen; Styler, William; Warner, Collin; Hwang, Jena; Choi, Jinho; Dligach, Dmitriy; Nielsen, Rodney; Martin, James; Ward, Wayne; Palmer, Martha; Savova, Guergana. 2013. Towards comprehensive syntactic and semantic annotations of the clinical narrative. Journal of the American Medical Informatics Association. 2013;0:1-9. doi:10.1136/amiajnl-2012-001317
http://jamia.bmj.com/cgi/rapidpdf/amiajnl-2012-001317?ijkey=z3pXhpyBzC7S1wC&keytype=ref
Acknowledgments
NIH:
- Multi-source Integrated Platform for Answering Clinical Questions (MiPACQ) (NLM RC1LM010608)
- Temporal Histories of Your Medical Event (THYME) (NLM 10090)
Office of the National Coordinator for Health Information Technology (ONC):
- Strategic Healthcare Advanced Research Projects: Area 4, Secondary Use of EMR Data (SHARPn) (ONC 90TR0002)
Institutions contributing data:
- Mayo Clinic
- Seattle Group Health Cooperative
Overview
- Motivation
- Layers of annotations: Treebank, PropBank, UMLS
- Component development
- Discussion and future directions
Computable Annotations: Why
- Developing algorithms
- System evaluation
- Community-wide training and test sets: compare results and establish the state of the art
- Establishing standards (ISO TC37)
- Long tradition in the general NLP domain: Linguistic Data Consortium and the PTB
- Layers of annotations on the same text
Goals
- Combine annotation types developed for general-domain syntactic and semantic parsing with medical domain-specific annotations
- Create accessible annotations for a variety of methods of analysis, including algorithm and component development
- Evaluate the quality of the annotations by training components to perform the same annotations automatically
- Distribute resources (corpus, guidelines, methods - Apache cTAKES; ctakes.apache.org)
Background
- MiPACQ project
- Previous work:
  - Ogren et al., 2008
  - Roberts et al., 2009 (CLEF corpus)
  - i2b2/VA challenges
  - BioScope corpus (Vincze et al., 2008)
  - ODIE
- Contributions:
  - Layers of annotations
  - Adherence to community standards and conventions (PTB, PropBank, UMLS)
Corpus
Description
- MiPACQ: ~130K words of clinical narrative (cf. 901,673 tokens of the Wall Street Journal (WSJ) corpus)
- Annotation guidelines:
  - Syntactic tree (Treebank): http://clear.colorado.edu/compsem/documents/treebank_guidelines.pdf
  - Semantic role (PropBank): http://clear.colorado.edu/compsem/documents/propbank_guidelines.pdf
  - UMLS: http://clear.colorado.edu/compsem/documents/umls_guidelines.pdf
  - Clinical coreference: http://clear.colorado.edu/compsem
Treebank Annotations
Treebank Annotations
- Consist of part-of-speech tags, phrasal and function tags, and empty categories organized in a tree-like structure
- Adapted Penn's POS tagging guidelines, bracketing guidelines, and associated addenda
- Extended the guidelines to account for domain-specific characteristics
- Guidelines: http://clear.colorado.edu/compsem/documents/treebank_guidelines.pdf
Treebank Review
Tokenization, sentence segmentation, and part-of-speech labels (in brown) are all done in an initial pass.
Example: The patient underwent a radical tonsillectomy (with additional right neck dissection) for metastatic squamous cell carcinoma.
Treebank Review
Phrase labels (in green) and grammatical function tags (in blue) are added by a parser and then manually corrected.
Example: The patient underwent a radical tonsillectomy (with additional right neck dissection) for metastatic squamous cell carcinoma.
Treebank Review
In that second pass, new tokens are added for implicit and empty arguments (in red), and grammatically linked elements are indexed (in yellow).
Example: Patient was seen 2/18/2001
Clinical Additions – S-RED
Clinical language is highly reduced and often elides the copula ('to be'). The -RED tag was introduced to mark clauses with elided copulas.
Example: Patient (was) seen 2/18/2001
Clinical Additions – S-RED
-RED tags are used for all elisions of the copula, including the passive voice, the progressive (top example), and equational clauses (bottom example).
Examples:
Patient (is) having hot flashes
Elderly patient (is) in care center with cough
Clinical Additions – Null Arguments
Dropped subjects are very common in this data, and *PRO* tags are added to represent them.
Examples:
(*PRO*) (was) Seen 2/18/2001
(*PRO*) (is) Obese
(*PRO*) Complains of nausea
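To make these conventions concrete, here is a hypothetical Penn-style bracketing (not taken from the released corpus) combining a *PRO* subject with the S-RED tag for an elided copula; the specific label choices (NP-SBJ, NP-TMP, VBN) are our assumptions, read here with NLTK:

```python
# A hypothetical bracketing (not from the released corpus) illustrating
# the MiPACQ clinical additions: a *PRO* token for the dropped subject
# and the -RED function tag on S for the elided copula.
import nltk  # pip install nltk

bracketing = """
(S-RED
  (NP-SBJ (-NONE- *PRO*))
  (VP (VBN Seen)
      (NP-TMP (CD 2/18/2001))))
"""

tree = nltk.Tree.fromstring(bracketing)
tree.pretty_print()   # draws the constituent structure as ASCII art
print(tree.leaves())  # ['*PRO*', 'Seen', '2/18/2001']
```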
Clinical Additions – FRAG
Use of the FRAG label for fragmentary text was increased to accommodate the various kinds of non-clausal structures in the data.
Example: Discussion and recommendations: We discussed the registry objectives and procedures.
Inter-annotator Agreement
F-score (EvalB):
- Constituent match: constituents match if they share the same node label and span (punctuation placement, function tags, trace and gap indices, and empty categories are ignored)
- Agreement: 0.926
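A minimal sketch of how this constituent-match F-score can be computed (this is an illustration, not the EvalB implementation); constituents are (label, start, end) triples, with function tags, indices, and empty categories assumed to be stripped already:

```python
# Constituent-match F-score between two annotators' parses.
def constituent_f1(annotator_a, annotator_b):
    a, b = set(annotator_a), set(annotator_b)
    matched = len(a & b)                # same node label and same span
    precision = matched / len(b)
    recall = matched / len(a)
    return 2 * precision * recall / (precision + recall)

# Hypothetical constituents for a short clinical sentence
gold  = {("S", 0, 3), ("NP", 0, 1), ("VP", 1, 3), ("NP", 2, 3)}
other = {("S", 0, 3), ("NP", 0, 1), ("VP", 1, 3)}
print(round(constituent_f1(gold, other), 3))  # 0.857
```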
PropBank Annotations
What is PropBank?
- A database of syntactically parsed trees annotated with semantic role labels
- All arguments are annotated with semantic roles in relation to their predicate
- This provides training data for systems that identify predicate-argument structures for individual verbs
PropBank Labels
- Labels do not change with the predicate
- Meanings of core arguments 2-5 change with the predicate
- Arg0: proto-agent for transitive verbs
- Arg1: proto-patient for transitive verbs
- Meanings of adjunct arguments do not change
PropBank Labels
- Arg0 = agent
- Arg1 = theme / patient
- Arg2 = benefactive / instrument / attribute / end state
- Arg3 = start point / benefactive / attribute
- Arg4 = end point
- ArgM = modifier
PropBank Labels
- Numbered arguments: ARG0 (agent), ARG1 (patient), ARG2, ARG3, ARG4
- ArgM modifier subtypes: Adverbial, Cause, Direction, Discourse, Extent, Location, Manner, Modal, Negation, Purpose, Temporal, Predication
Why PropBank?
Identifying commonalities in predicate-argument structures (agent diagnosing, person diagnosed, disease):
- [Dr. Z] diagnosed [Jack's bronchitis]
- [Jack] was diagnosed [with bronchitis] [by Dr. Z]
- [Dr. Z's] diagnosis [of Jack's bronchitis] allowed her to treat him with the proper antibiotics.
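An illustrative sketch of the normalization this enables: the three surface realizations above map to one predicate-argument structure. The role inventory follows the slide; this is not the official PropBank frame file for diagnose.01.

```python
# Three surface forms of "diagnose" normalized to one structure.
FRAME = {"Arg0": "agent diagnosing", "Arg1": "person diagnosed", "Arg2": "disease"}

annotations = [
    # "[Dr. Z] diagnosed [Jack's bronchitis]"
    {"Arg0": "Dr. Z", "Arg2": "Jack's bronchitis"},
    # "[Jack] was diagnosed [with bronchitis] [by Dr. Z]"
    {"Arg0": "Dr. Z", "Arg1": "Jack", "Arg2": "bronchitis"},
    # "[Dr. Z's] diagnosis [of Jack's bronchitis]" (nominal predicate)
    {"Arg0": "Dr. Z", "Arg2": "Jack's bronchitis"},
]

# Whatever the syntax (active, passive, nominalization), the same
# semantic role always picks out the same participant:
for ann in annotations:
    print({FRAME[role]: filler for role, filler in ann.items()})
```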
Stages of the PropBank Process
- Frame creation
Stages of PropBank
Annotation:
- Data is double annotated
- Annotators (1) determine and select the sense of the predicate and (2) annotate the arguments for the selected predicate sense
Adjudication:
- After the data is annotated, it is passed to an adjudicator who resolves differences between the two annotators
- This creates the gold standard: corrected, finished training data
Annotation Example
Results
The PropBank layer included 1772 distinct predicate lemmas:
- 1006 had existing frames
- 74 new verb frames were created
- 692 noun frames were created
Of the numbered arguments, Arg0 was the most common, at 48.47%, followed by Arg1 at 14.58%
Inter-annotator Agreement
Agreement was calculated three ways:
- Exact: annotations had to match on constituent boundaries and roles
- Core-argument: constituent boundaries matched and numbered arguments were the same; ArgMs were counted on exact boundaries alone
- Constituent: annotators marked the same constituent
Results:
- PropBank, exact: 0.891
- PropBank, core-argument: 0.917
- PropBank, constituent: 0.931
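A minimal sketch of the three matching criteria, under the assumption (ours, not stated on the slide) that all ArgM subtypes are treated alike for core-argument matching; each argument is a (start, end, role) triple:

```python
# The three agreement criteria for PropBank annotations.
def matches(a, b, criterion):
    same_span = (a[0], a[1]) == (b[0], b[1])
    if criterion == "constituent":
        return same_span                       # same constituent only
    if criterion == "core":
        # numbered args must agree on role; ArgMs only on span
        if a[2].startswith("ArgM") and b[2].startswith("ArgM"):
            return same_span
        return same_span and a[2] == b[2]
    return same_span and a[2] == b[2]          # exact

ann1 = (3, 5, "ArgM-TMP")
ann2 = (3, 5, "ArgM-LOC")
print(matches(ann1, ann2, "exact"))        # False
print(matches(ann1, ann2, "core"))         # True
print(matches(ann1, ann2, "constituent"))  # True
```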
UMLS Annotations
UMLS Semantic Types, Groups, and Relations Annotation
- The UMLS (Unified Medical Language System) was developed to help with cross-linguistic translation of medical concepts
- We mark semantic groups (similar to named entity types) using the UMLS, with attributes:
  - Negation (true/false)
  - Status (none (= confirmed), possible, historyOf, and familyHistoryOf)
- Added a Person category
UMLS Example
- The patient underwent a radical tonsillectomy (with additional right neck dissection) for metastatic squamous cell carcinoma.
- He returns with a recent history of active bleeding from his oropharynx.
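A hypothetical sketch of how entities from the first example sentence might be recorded; the field names and the semantic-group labels ("Procedures", "Disorders") are illustrative, and the corpus's actual schema may differ:

```python
# Illustrative record type for one UMLS entity annotation.
from dataclasses import dataclass

@dataclass
class UmlsAnnotation:
    begin: int
    end: int
    text: str
    semantic_group: str   # UMLS semantic group, similar to a NE type
    negated: bool         # negation attribute: true/false
    status: str           # none (=confirmed), possible, historyOf, familyHistoryOf

sentence = ("The patient underwent a radical tonsillectomy (with additional "
            "right neck dissection) for metastatic squamous cell carcinoma.")

def annotate(phrase, group, negated=False, status="none"):
    begin = sentence.index(phrase)
    return UmlsAnnotation(begin, begin + len(phrase), phrase, group, negated, status)

annotations = [
    annotate("radical tonsillectomy", "Procedures"),
    annotate("right neck dissection", "Procedures"),
    annotate("metastatic squamous cell carcinoma", "Disorders"),
]
for a in annotations:
    print(a)
```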
Inter-annotator Agreement
F1 measure:
- Exact-match boundaries: 0.697
- Partial-match boundaries: 0.75
Development and Evaluation of NLP Components
Development of NLP Components
[Pipeline figure] The Treebank is converted to dependencies (dependency conversion); the converted Treebank and the PropBank layer then train a pipeline of part-of-speech tagging, dependency parsing, and semantic role labeling, which produces the automatic output.
Development of NLP Components
ClearNLP dependency converter:
- Generates the Stanford dependency labels (and more)
- Unlike the Stanford dependency converter, our approach generates non-projective dependencies
- Adapts to the MiPACQ Treebank guidelines
- http://clearnlp.googlecode.com
OpenNLP part-of-speech tagger:
- One-pass, left-to-right part-of-speech tagging approach
- Uses maximum entropy for machine learning
- http://opennlp.apache.org
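As a rough illustration of how these stages feed one another, the toy pipeline below chains POS tagging, dependency parsing, and semantic role labeling; every stage is a deliberately naive stand-in (a lookup table, one attachment rule, one role rule), not the OpenNLP or ClearNLP implementation.

```python
# Toy end-to-end sketch: POS tagging -> dependency parsing -> SRL.
POS_LEXICON = {"patient": "NN", "complains": "VBZ", "of": "IN", "nausea": "NN"}

def pos_tag(tokens):
    # stand-in for a trained maximum-entropy tagger
    return [(t, POS_LEXICON.get(t.lower(), "NN")) for t in tokens]

def parse_dependencies(tagged):
    # naive rule: attach every token to the first verb as its head
    head = next(i for i, (_, p) in enumerate(tagged) if p.startswith("VB"))
    return [(i, head if i != head else -1, tok, pos)
            for i, (tok, pos) in enumerate(tagged)]

def label_roles(deps):
    # naive SRL: nominal dependent before the verb -> Arg0, after -> Arg1
    head = next(i for i, h, _, _ in deps if h == -1)
    return {tok: ("Arg0" if i < head else "Arg1")
            for i, h, tok, pos in deps if h == head and pos.startswith("NN")}

tokens = "Patient complains of nausea".split()
print(label_roles(parse_dependencies(pos_tag(tokens))))
# {'Patient': 'Arg0', 'nausea': 'Arg1'}
```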