The interplay of form and meaning in complex medical terms: Evidence from clinical Dutch
Leonie Grön, Ann Bertels & Kris Heylen
LAW-MWE-CxG-2018, 26 August 2018, Santa Fe
Specialized terminologies are dominated by Multi-Word Expressions (cf. e.g. Daille, 1994; De Hertog & Heylen, 2012)
Concepts and their attributes (property_of):
finding: site (abdomen, head); severity (mild, moderate)
procedure: site (abdomen, head); device used (catheter)
obesity + abdomen:
abdominale obesitas ‘abdominal obesity’
obesitas ter hoogte van abdomen ‘obesitas at the abdomen’
abdomen obees ‘abdomen obese’
obesitas abdominaal ‘obesitas abdominal’
obesitas abdomen ‘obesitas abdomen’
BUT: specialized information is entrenched in linguistic structures. There is mutual attraction between syntactic & semantic structures; grammatical features can indicate conceptual relations (cf. Schulze & Römer, 2008; Faber & León-Araúz, 2016; ten Hacken, 2015)
Is there a patterning of the surface form and conceptual features of complex medical terms?
Annotation of medical terms with SNOMED codes
ID           term
249533007    abdomen obees
249533007    abdominale obesitas
249533007    abd obesitas
corpus of EHRs: 14,999 consultations, 500 patients
annotation of: 4,426 consultations, 171 patients
274,082 entities; 15,025 unique terms; 7,687 concepts
validation of term-concept associations
Retrieval of MWEs
                  findings                procedures
SNOMED term       obesity                 diagnostic ultrasonography
Dutch variants    obesitas, adipositas    echografie, sonografie
lexical stems     obes, adipo             echo, sono
Σ                 59,731                  63,559
Included types of MWEs (window: -3 -2 -1 head noun +1 +2 +3)
compounds: ochtend hypo ‘morning hypoglycemia’
pre-modified noun phrases: matinale hypo ‘matinal hypoglycemia’
post-modified noun phrases: hypo met forse convulsie ‘hypoglycemia with strong seizure’
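The window around the head noun can be illustrated with a minimal Python sketch. It assumes whitespace-tokenized text and substring matching on a lexical stem; the function name `extract_windows`, the matching strategy, and the sample sentence are illustrative assumptions, not the retrieval procedure used in the study.

```python
from typing import List, Sequence


def extract_windows(tokens: Sequence[str], stems: Sequence[str], span: int = 3) -> List[List[str]]:
    """Return the head token plus up to `span` tokens on either side.

    A token counts as a head noun candidate if it contains one of the
    lexical stems (substring matching is an assumption of this sketch).
    """
    windows = []
    for i, token in enumerate(tokens):
        if any(stem in token.lower() for stem in stems):
            start = max(0, i - span)  # do not run past the start of the text
            windows.append(list(tokens[start:i + span + 1]))
    return windows


# Made-up sentence built around one of the slide examples.
tokens = "patiënt had hypo met forse convulsie in de ochtend".split()
windows = extract_windows(tokens, stems=["hypo"])
print(windows)
```

A window of three positions on each side is enough to cover all three included phrase types: compounds and pre-modified noun phrases fall to the left of the head noun, post-modified noun phrases to the right.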
Annotation of MWEs at 2 levels:
formal: Penn Tagset for biomedical text
conceptual: SNOMED semantic classes & attributes
(de Castilho et al., 2016; Warner et al., 2012; SNOMED International, 2018)
formal:       JJ             NN
              diabetische    retinopathie    ‘diabetic retinopathy’
conceptual:   CAUSE          FINDING

formal:       NN             NN
              injectie       insuline        ‘injection [of] insulin’
conceptual:   PROCEDURE      SUBSTANCE
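One way to hold both annotation levels side by side is a small record type. The following is a sketch only: the class `AnnotatedMWE` and its field names are hypothetical and do not reflect the annotation tool's actual data model.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AnnotatedMWE:
    tokens: Tuple[str, ...]
    pos_tags: Tuple[str, ...]   # formal level (Penn-style tags)
    concepts: Tuple[str, ...]   # conceptual level (SNOMED-style classes/attributes)

    def __post_init__(self) -> None:
        # Every token needs exactly one tag on each level.
        if not (len(self.tokens) == len(self.pos_tags) == len(self.concepts)):
            raise ValueError("tokens, pos_tags and concepts must have equal length")

    def pattern(self) -> Tuple[Tuple[str, ...], Tuple[str, ...]]:
        """Grammatico-semantic pattern: (PoS sequence, concept sequence)."""
        return (self.pos_tags, self.concepts)


# The two examples from the slides.
examples = [
    AnnotatedMWE(("diabetische", "retinopathie"), ("JJ", "NN"), ("CAUSE", "FINDING")),
    AnnotatedMWE(("injectie", "insuline"), ("NN", "NN"), ("PROCEDURE", "SUBSTANCE")),
]
for mwe in examples:
    print(" ".join(mwe.tokens), mwe.pattern())
```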
Analysis at phrase level: Influence of the semantic type of the headword?
~ proportion of phrase types
~ degree of lexicalization
[Figure: Distribution of phrase types (compounds, pre-modified NPs, post-modified NPs) for procedures and findings, shown as percentages from 0% to 100%]
Average number of unique expressions per concept across different phrase types
              compounds    pre-modified NPs    post-modified NPs
procedures    2.57         3.69                3.38
findings      1.33         3.63                2.83
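A figure of this kind can be computed along the following lines. This is a minimal sketch with toy placeholder rows; the row layout, the concept IDs, and the choice to average only over concepts that have at least one expression of a given phrase type are assumptions, not the authors' documented procedure.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Toy rows: (concept_id, phrase_type, expression). All values are placeholders.
rows = [
    ("C1", "compound", "variant_a"),
    ("C1", "compound", "variant_a"),           # repeated token, counted once
    ("C1", "post-modified NP", "variant_b"),
    ("C1", "post-modified NP", "variant_c"),
    ("C2", "compound", "variant_d"),
    ("C2", "post-modified NP", "variant_e"),
]


def average_unique_expressions(rows: Iterable[Tuple[str, str, str]]) -> Dict[str, float]:
    # Collect the set of distinct expressions per (phrase type, concept).
    unique = defaultdict(set)
    for concept, phrase_type, expression in rows:
        unique[(phrase_type, concept)].add(expression)
    # Average the set sizes over the concepts attested for each phrase type.
    counts = defaultdict(list)
    for (phrase_type, _concept), expressions in unique.items():
        counts[phrase_type].append(len(expressions))
    return {ptype: sum(c) / len(c) for ptype, c in counts.items()}


averages = average_unique_expressions(rows)
print(averages)
```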
Analysis at token level: Patterning of concept combinations ~ grammatical structures?
Associate expressions with overlapping tag sequences with grammatico-semantic patterns:
rx thorax ‘x-ray [of the] chest’, CT schedel ‘CT [of the] skull’: (NN, NN) (PROCEDURE, SITE)
abdominale injectie ‘abdominal injection’: (JJ, NN) (SITE, PROCEDURE)
relative frequency = frequency of the grammatico-semantic pattern / absolute frequency of the concept combination
How dominant is a construction for expressing a combination of concepts?
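The ratio above can be sketched in Python as follows. The sketch assumes that a pattern is an ordered pair (PoS sequence, concept sequence) and that a concept combination is the unordered set of concepts involved; the function name, this reading of "concept combination", and the toy counts are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Pattern = Tuple[Tuple[str, ...], Tuple[str, ...]]  # (PoS sequence, concept sequence)


def relative_pattern_frequency(observations: Iterable[Pattern]) -> Dict[Pattern, float]:
    """Share of each pattern among all occurrences of its concept combination."""
    observations = list(observations)
    pattern_counts = Counter(observations)
    # Concept combination = unordered set of the concepts in the pattern (assumption).
    combo_counts = Counter(frozenset(concepts) for _pos, concepts in observations)
    return {
        pattern: count / combo_counts[frozenset(pattern[1])]
        for pattern, count in pattern_counts.items()
    }


# Toy counts: three noun + noun tokens, one adjective + noun token.
observations = (
    [(("NN", "NN"), ("PROCEDURE", "SITE"))] * 3
    + [(("JJ", "NN"), ("SITE", "PROCEDURE"))]
)
rel = relative_pattern_frequency(observations)
print(rel)
```

With these toy counts, the (NN, NN) pattern accounts for three of the four occurrences of the PROCEDURE + SITE combination, so it would be the dominant construction for that combination.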
Top patterns for findings
combined with    PoS sequence    example                                      relative frequency
CAUSE            JJ, NN          alimentaire obesitas ‘alimentary obesity’    0.90
COURSE           RB, NN          vaak hypo ‘frequently hypoglycemia’          0.35
SEVERITY         JJ, NN          morbiede obesitas ‘morbid obesity’           0.83
Top patterns for procedures
combined with          PoS sequence    example                                          relative frequency
COMPONENT              NNS, NN         lipidenmeting ‘measurement of lipids’            0.44
COMPONENT, PROPERTY    JJ, NNS, NN     gunstig lipidenprofiel ‘good lipid profile’      0.71
COMPONENT, TIME        NN, NN, NNS     glycemiedagprofielen ‘glycemic day profiles’     0.72
conceptual composition ~ formal structure of medical MWEs:
findings ~ pre-modified NPs
procedures ~ nominal compounds
one reason: lexical gaps
             combined with    adjective             noun                        resulting MWE
finding      SEVERITY         extreem ‘extreme’     *extremiteit ‘extremity’    adj + noun: extreme obesitas ‘extreme obesity’
procedure    SUBSTANCE        –                     insuline ‘insulin’          noun + noun: insulineinjectie ‘insulin injection’
BUT: tendency is robust across concept combinations!
combined with SITE: adjective renaal ‘renal’, noun nier ‘kidney’
finding: adj + noun: renale insufficiëntie ‘renal insufficiency’
procedure: noun + noun: nierecho ‘kidney echography’, echo nier ‘echography [of the] kidney’
structural reductions ~ fixed concept combinations
             combined with    full prepositional phrase                                                 reduced prepositional phrase
procedure    COMPONENT        meting van de lipiden ‘measurement of lipids’                             meting lipiden ‘measurement lipids’
procedure    SITE             rx van de thorax ‘x-ray of the thorax’                                    rx thorax ‘x-ray thorax’
finding      SITE             lipodistrofie ter hoogte van het abdomen ‘lipodystrophy at the abdomen’   lipodistrofie abdomen ‘lipodystrophy abdomen’
finding      CAUSE            nefropathie ten gevolge van diabetes ‘nephropathy due to diabetes’        *nephropathie diabetes ‘nephropathy diabetes’
complex medical terms: fixed concept constellations, habitual formal constructions
constructions take on communicative value in themselves
benefit for clinical NLP:
identification and segmentation of MWEs
semantic classification and relation extraction
Thank you for your attention! Questions? Suggestions? leonie.gron@kuleuven.be
References
de Castilho, R. E., Mujdricza-Maydt, E., Yimam, S. M., Hartmann, S., Gurevych, I., Frank, A., & Biemann, C. (2016). A Web-based Tool for the Integrated Annotation of Semantic and Syntactic Structures. In Proceedings of the LT4DH workshop at COLING 2016 (pp. 76–84). Osaka.
Daille, B. (1994). Study and Implementation of Combined Techniques for Automatic Extraction of Terminology. In The Balancing Act: Combining Symbolic and Statistical Approaches to Language. Workshop at the 32nd Annual Meeting of the Association for Computational Linguistics (pp. 29–36). Stroudsburg: Association for Computational Linguistics.
De Hertog, D., & Heylen, K. (2012). The Prevalence of Multiword Term Candidates in a Legal Corpus. In G. Aguado de Cea (Ed.), Proceedings of the 10th Terminology and Knowledge Engineering Conference (TKE2012): New Frontiers in the Constructive Symbiosis of Terminology and Knowledge Engineering (pp. 283–290). Madrid: Universidad Politécnica de Madrid.
Faber, P., & León-Araúz, P. (2016). Specialized Knowledge Representation and the Parameterization of Context. Frontiers in Psychology, 7 (February). http://doi.org/10.3389/fpsyg.2016.00196
Schulze, R., & Römer, U. (2008). Introduction. Patterns, Meaningful Units and Specialized Discourses. International Journal of Corpus Linguistics, 13(3), 265–270. http://doi.org/10.1075/ijcl.13.3.01sch
SNOMED CT Editorial Guide. (2018). Retrieved May 14, 2018, from https://confluence.ihtsdotools.org/display/DOCEG/SNOMED+CT+Editorial+Guide
Warner, C., Lanfranchi, A., O’Gorman, T., Howard, A., Gould, K., & Regan, M. (2012). Bracketing Biomedical Text: An Addendum to Penn Treebank II Guidelines. Retrieved May 14, 2018, from https://clear.colorado.edu/compsem/documents/treebank_guidelines.pdf
Icons from the Noun Project (https://thenounproject.com/), created by Ben Davis, Cengis SARI, Drishya, Ken Murray, Melvin