Computer Based Extraction of Phenotypic Features of Human Congenital Anomalies from the Digital Literature. Gökhan Karakülah, PhD. Dokuz Eylül University, Health Sciences Institute, Department of Medical Informatics. MIE2014, September 2014, İstanbul
Outline • Congenital Malformations? • Materials & Methods • Online data sources • Processing extracted data with basic NLP techniques • Development of information extraction rules • Results • An example output: De Lange Syndrome • Success of the approach • Conclusions
Congenital Malformations (CMs) Human CMs are abnormal formations of single or multiple organs and/or body parts during intrauterine life.* CMs affect almost 2-3% of newborns in general; however, their frequencies vary among races.** *MeSH database definition ** Chung CS & Myrianthopoulos NC. Racial and prenatal factors in major congenital malformations. Am J Hum Genet 1968;20(1):44–60. ** Rosano A et al. Infant mortality and congenital anomalies from 1950 to 1994: an international perspective. J Epidemiol Community Health 2000;54(9):660–6.
CMs The diagnosis of CMs is often challenging and requires special expertise: • laboratory tests are lacking • most congenital malformations do not consist of a fixed combination of phenotypic features • most congenital malformations are not sharply distinct from other congenital malformations in terms of related features. The physical examination of CM cases plays a key role in appropriate diagnosis and treatment. Cases are mostly evaluated in the light of knowledge from the literature. Clinical diagnostic decision support tools are essential for accurate diagnosis of CMs.
Research questions? Developing a computational strategy for extracting the phenotypic features which characterize CMs from the case reports in the literature via text processing and NLP methods. Designing an initial framework of an information base for a potential clinical decision support (CDS) tool in the diagnosis of CMs by using the extracted information.
Concept normalization
Text processing steps:
Original: "Abnormal jaw" ≠ "Abnormality of the jaws"
After normalization (lowercasing): "abnormal jaw" ≠ "abnormality of the jaws"
After stop word removal: "abnormal jaw" ≠ "abnormal jaws"
After stemming: "abnorm jaw" = "abnorm jaw"
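The three text processing steps can be illustrated with a minimal Python sketch. It is not the implementation used in the study: the stop word list, the suffix list, and the names `toy_stem` and `normalize` are assumptions made for illustration, and the toy suffix stripper stands in for whichever stemmer was actually used (a real pipeline would more likely use an established stemmer such as Porter's).

```python
# Illustrative stop word list and suffix list (assumptions, not from the study).
STOP_WORDS = {"of", "the", "a", "an", "in", "and"}
SUFFIXES = ("ity", "al", "s")


def toy_stem(word):
    """Repeatedly strip a known suffix while at least 3 characters remain."""
    stripped = True
    while stripped:
        stripped = False
        for suffix in SUFFIXES:
            if word.endswith(suffix) and len(word) - len(suffix) >= 3:
                word = word[: -len(suffix)]
                stripped = True
                break
    return word


def normalize(phrase):
    """Lowercase, drop stop words, then stem each remaining token."""
    tokens = phrase.lower().split()
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return " ".join(toy_stem(t) for t in tokens)


if __name__ == "__main__":
    print(normalize("Abnormal jaw"))
    print(normalize("Abnormality of the jaws"))
```

With these assumptions both phrases reduce to the same normalized string, so a plain string comparison is enough to match the two surface forms.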
Information extraction rules
Phrase pattern | Polarity | Opposite meaning | Phenotypic feature is observed before/after the phrase
affected with | Positive | not affected with | after
biopsy showed | Positive | biopsy showed no | after
examination showed | Positive | examination showed no | after
found to have | Positive | not found to have | after
not been reported | Negative | previously not been reported, not been reported previously, not been reported before | after/before
was seen | Positive | was not seen | before
were observed in | Positive | were not observed in | after/before
who has | Positive | who has not, who has no | after
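One way such rules could be applied to a sentence is sketched below in Python. This is a hedged illustration, not the authors' rule engine: only three rules from the table are encoded, the names `Rule`, `RULES`, `find_hits` and `extract` are hypothetical, and attaching each feature to the nearest phrase on its permitted side is an assumption of this sketch.

```python
import re
from typing import NamedTuple


class Rule(NamedTuple):
    pattern: str
    polarity: str
    opposites: tuple
    position: str  # where the feature is expected: "after", "before" or "after/before"


# A subset of the rules from the table, for illustration only.
RULES = [
    Rule("affected with", "Positive", ("not affected with",), "after"),
    Rule("was seen", "Positive", ("was not seen",), "before"),
    Rule("who has", "Positive", ("who has not", "who has no"), "after"),
]

FLIP = {"Positive": "Negative", "Negative": "Positive"}


def find_hits(text, rules):
    """Return (start, end, polarity, position) for every rule phrase in text."""
    hits = []
    for rule in rules:
        # Try the opposite-meaning phrases first (longest first), then the pattern.
        phrases = [(p, FLIP[rule.polarity])
                   for p in sorted(rule.opposites, key=len, reverse=True)]
        phrases.append((rule.pattern, rule.polarity))
        taken = []
        for phrase, polarity in phrases:
            for m in re.finditer(r"\b" + re.escape(phrase) + r"\b", text):
                # Skip a match that overlaps a phrase already recorded for this rule.
                if any(m.start() < e and s < m.end() for s, e in taken):
                    continue
                taken.append(m.span())
                hits.append((m.start(), m.end(), polarity, rule.position))
    return hits


def extract(sentence, features, rules=RULES):
    """Map each feature found in the sentence to the polarity of its nearest phrase."""
    text = sentence.lower()
    hits = find_hits(text, rules)
    results = {}
    for feature in features:
        m = re.search(r"\b" + re.escape(feature.lower()) + r"\b", text)
        if m is None:
            continue
        best = None
        for start, end, polarity, position in hits:
            if "after" in position and m.start() >= end:
                dist = m.start() - end
            elif "before" in position and m.end() <= start:
                dist = start - m.end()
            else:
                continue
            if best is None or dist < best[0]:
                best = (dist, polarity)
        if best is not None:
            results[feature] = best[1]
    return results


if __name__ == "__main__":
    sentence = "The patient, who has no cleft palate, was affected with microcephaly."
    print(extract(sentence, ["cleft palate", "microcephaly"]))
```

Checking the opposite-meaning phrases before the base pattern matters because "not affected with" contains "affected with"; without that ordering a negated mention would be recorded as positive.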
Results Over 60,000 abstracts related to 486 congenital malformations were extracted from the PubMed database. More than 10,000 phenotypic features defined in the Human Phenotype Ontology (HPO) were screened against the case report corpus. We devised 190 information extraction rules.
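Screening a term list against a corpus can be sketched as follows in Python. The function name `screen_corpus`, the pairing of each abstract with a malformation name, and the two toy abstracts are assumptions made up for illustration; they are not data from the study, and the sketch matches exact lowercased terms only, without the normalization and polarity rules described earlier.

```python
import re
from collections import defaultdict


def screen_corpus(abstracts, hpo_terms):
    """Map each malformation name to the set of terms found in its abstracts.

    abstracts: iterable of (malformation_name, abstract_text) pairs.
    hpo_terms: iterable of term labels to look for.
    """
    patterns = {
        term: re.compile(r"\b" + re.escape(term.lower()) + r"\b")
        for term in hpo_terms
    }
    found = defaultdict(set)
    for malformation, text in abstracts:
        lowered = text.lower()
        for term, pattern in patterns.items():
            if pattern.search(lowered):
                found[malformation].add(term)
    return dict(found)


if __name__ == "__main__":
    # Invented toy abstracts, not taken from PubMed.
    toy_abstracts = [
        ("De Lange syndrome", "We report a child with synophrys and microcephaly."),
        ("De Lange syndrome", "Hirsutism was noted in a second case."),
    ]
    terms = ["Synophrys", "Microcephaly", "Hirsutism", "Cleft palate"]
    print(screen_corpus(toy_abstracts, terms))
```

Compiling one pattern per term up front avoids rebuilding the regular expressions for every abstract, which matters when a term list of this size is run over tens of thousands of texts.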
Example output: De Lange Syndrome
Results • We identified 33 phenotypic features per CM on average. • The evaluation of the developed method on 100 randomly selected abstracts showed that: 97.82% of the matches between real HPO terms and the ones extracted from abstracts were correct; 89.20% of the features extracted from the abstracts matched a defined HPO term.
Conclusions • The information extracted with natural language processing methods could be a reliable information source for an application. • The tool is unique in applying natural language processing to the design of a clinical decision support system specific to CMs. • The overall performance of the algorithm is highly convincing. • The approach developed here can be applied in other medical domains. • The clinical decision support tool developed here might be used both in clinical applications and as a facilitating tool for specialty training in the medical sciences.
Thank you for your attention...