
Abbreviation detection for biomedical articles, by Sonja Kenari (PowerPoint presentation)

  1. Abbreviation detection for biomedical articles by Sonja Kenari

  2. Agenda: Introduction, Background, Implementation, Results, Further Improvements

  3. Introduction: full project description
     - COVID-19 Open Research Dataset Challenge (CORD-19): "What do we know about vaccines and therapeutics?"
     - Project components: abbreviation detection, NER, dictionary tagger, relationship extraction

  4. Introduction: abbreviation detection
     - spaCy: Python library for NLP
     - Abbreviation detection makes it easier to:
       - find articles of interest faster
       - keep up with the large number of new abbreviations

  5. Background: abbreviation detection
     - Pre-trained models by spaCy
     - scispaCy: AbbreviationDetector
     - Detects abbreviations (short forms) and their definitions (long forms)
     - Accuracy?

  6. Implementation: generating the PubAnnotation data subset
     - Inputs: metadata file [csv] and PubAnnotation files [json]
     - Output: data subset [json] of 100 out of 60,000 articles

  7. Implementation: generating files of abbreviations
     - Web scraping: metadata file [csv] → article URL → HTML parser (BeautifulSoup) → abbreviation csv files
     - scispaCy: data subset [json] → full texts → AbbreviationDetector → abbreviation csv files
     - Output file format: csv files of abbreviations, one set per method

  8. Implementation: evaluation
     - Compare the two lists: abbreviations detected with spaCy [csv] and abbreviations detected with web scraping [csv]
     - Short-form hit rate (%) = 100 × (number of unique short forms detected by spaCy) / (number of short forms detected by web scraping)
     - Long-form hit rate (%) = 100 × (number of unique long forms detected by spaCy) / (number of long forms detected by web scraping)

  9. Results: hit rates of the abbreviation lists
     - Short forms: highest 87.5%, lowest 25%
     - Long forms: highest 52.6%, lowest 0%
     - Notable faults (20 out of 100):
       - spaCy is weak on long forms
       - the text in the json files had not been updated to match the articles at the URLs
       - faults in denotation extraction

  10. Further improvements
      - spaCy: improve the results
      - Optimize programs: make them more time-efficient
      - PubAnnotations: update the data; extract from the web scraper instead of full-text extraction

  11. Thank you for listening! Questions?
      Sonja Kenari, nat14sta@student.lu.se, 2020-05-29
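
Slide 5 names scispaCy's AbbreviationDetector. The scispaCy documentation describes that component as an implementation of the Schwartz & Hearst (2003) algorithm for "long form (short form)" patterns. What follows is a simplified, dependency-free Python sketch of that idea. It is not scispaCy's code, and the function names are hypothetical. The sketch treats a parenthesised string of 2 to 10 characters (at most two words, starting with a letter or digit and containing a letter) as a candidate short form. It searches at most min(|A| + 5, 2|A|) preceding words, where |A| is the short form's character count. It then matches the short form's characters right to left, requiring the first one to start a word.

```python
import re

# A parenthesised chunk with no nested parentheses, e.g. "(COVID-19)".
PAREN = re.compile(r"\(([^()]+)\)")


def is_short_form_candidate(s):
    """Heuristic filter for candidate short forms."""
    return (
        2 <= len(s) <= 10
        and len(s.split()) <= 2
        and any(c.isalpha() for c in s)
        and s[0].isalnum()
    )


def find_best_long_form(short_form, window):
    """Greedy right-to-left character match; returns a long form or None."""
    s_idx = len(short_form) - 1
    l_idx = len(window) - 1
    while s_idx >= 0:
        ch = short_form[s_idx].lower()
        if not ch.isalnum():
            s_idx -= 1  # skip hyphens and other punctuation in the short form
            continue
        # Move left until the character matches; the first character of the
        # short form must also be at the start of a word.
        while l_idx >= 0 and (
            window[l_idx].lower() != ch
            or (s_idx == 0 and l_idx > 0 and window[l_idx - 1].isalnum())
        ):
            l_idx -= 1
        if l_idx < 0:
            return None
        l_idx -= 1
        s_idx -= 1
    # Extend back to the start of the word containing the first match.
    start = window.rfind(" ", 0, l_idx + 1) + 1
    return window[start:]


def extract_abbreviations(text):
    """Return (short form, long form) pairs found in `text`."""
    pairs = []
    for match in PAREN.finditer(text):
        short = match.group(1).strip()
        if not is_short_form_candidate(short):
            continue
        # Look at most min(|A| + 5, 2|A|) words back, |A| = characters in short form.
        max_words = min(len(short) + 5, 2 * len(short))
        words = text[: match.start()].split()
        window = " ".join(words[-max_words:])
        long_form = find_best_long_form(short, window)
        if long_form and len(long_form) > len(short) and short not in long_form.split():
            pairs.append((short, long_form))
    return pairs


sentence = (
    "Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) "
    "causes coronavirus disease 2019 (COVID-19)."
)
print(extract_abbreviations(sentence))
# → [('SARS-CoV-2', 'Severe acute respiratory syndrome coronavirus 2'), ('COVID-19', 'coronavirus disease 2019')]
```

In scispaCy itself, the detector runs inside a spaCy pipeline. Results appear on `doc._.abbreviations`, and each span carries its definition in `._.long_form`. The way the component is added to the pipeline differs between spaCy v2 and v3, so check the scispaCy README for the version in use.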
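
Slide 6 reduces roughly 60,000 articles to a 100-article subset using the metadata file and the PubAnnotation JSON. The slides do not say how the 100 articles were chosen, so random sampling is an assumption here. The sketch below shows one way to draw the subset. It assumes the metadata CSV has an ID column named `cord_uid` and that each PubAnnotation document stores the matching ID under `sourceid`; both are assumptions about the join key. It draws a reproducible random sample of IDs with a fixed seed, then keeps the matching documents.

```python
import csv
import io
import json
import random


def sample_ids(metadata_file, n, id_column="cord_uid", seed=0):
    """Return a reproducible random sample of n article IDs from a metadata CSV."""
    reader = csv.DictReader(metadata_file)
    # Deduplicate and sort so the sample depends only on the seed, not on row order.
    ids = sorted({row[id_column] for row in reader if row.get(id_column)})
    return set(random.Random(seed).sample(ids, n))


def build_subset(documents, wanted_ids, id_key="sourceid"):
    """Keep the PubAnnotation-style documents whose ID was sampled."""
    return [doc for doc in documents if doc.get(id_key) in wanted_ids]


# Toy stand-ins for the metadata file and the PubAnnotation JSON documents.
metadata = io.StringIO(
    "cord_uid,title\n"
    "a1,Paper A\n"
    "b2,Paper B\n"
    "c3,Paper C\n"
    "d4,Paper D\n"
)
documents = [
    {"sourceid": uid, "text": f"Full text of {uid}", "denotations": []}
    for uid in ["a1", "b2", "c3", "d4"]
]

wanted = sample_ids(metadata, n=2)  # the slides use a subset of 100
subset = build_subset(documents, wanted)
print(json.dumps(subset, indent=2))
```

Fixing the seed regenerates the same subset on every run. That matters when two extraction methods are compared on the same articles.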
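
Slide 7 parses the scraped article pages with BeautifulSoup. To stay dependency-free, the sketch below substitutes the standard library's `html.parser`. It assumes, hypothetically, that an article lists its abbreviations as an HTML definition list (`<dt>` for the short form, `<dd>` for the long form). That will not hold for every publisher's pages. The CSV column names `short_form` and `long_form` are also illustrative, not the project's actual output format.

```python
import csv
import io
from html.parser import HTMLParser


class DefinitionListParser(HTMLParser):
    """Collect (term, definition) pairs from <dt>/<dd> elements."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._tag = None     # "dt" or "dd" while inside one, else None
        self._buffer = []    # text collected inside the current element
        self._term = None    # last <dt> text waiting for its <dd>

    def handle_starttag(self, tag, attrs):
        if tag in ("dt", "dd"):
            self._tag = tag
            self._buffer = []

    def handle_data(self, data):
        if self._tag is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._tag:
            return  # ignore closing tags of nested markup such as <i>
        text = " ".join("".join(self._buffer).split())  # collapse whitespace
        if tag == "dt":
            self._term = text
        elif self._term is not None:
            self.pairs.append((self._term, text))
            self._term = None
        self._tag = None


def write_abbreviation_csv(pairs, out):
    """Write (short form, long form) pairs to `out` with a header row."""
    writer = csv.writer(out)
    writer.writerow(["short_form", "long_form"])
    writer.writerows(pairs)


# Hypothetical fragment of a scraped article page.
page = """
<h2>Abbreviations</h2>
<dl>
  <dt>ACE2</dt><dd>angiotensin-converting enzyme 2</dd>
  <dt>RBD</dt><dd>receptor-binding <i>domain</i></dd>
</dl>
"""

parser = DefinitionListParser()
parser.feed(page)
parser.close()

out = io.StringIO()
write_abbreviation_csv(parser.pairs, out)
print(out.getvalue())
```

When writing to a real file instead of `io.StringIO`, open it with `newline=""`, as the `csv` module documentation recommends.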
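
The hit rates on slide 8 can be computed with a short sketch, which makes two assumptions the slide does not state. First, a spaCy detection counts only if the same form also appears in the web-scraped list, so the rate cannot exceed 100%. Second, both lists are deduplicated before counting. The optional `normalize` flag is a hypothetical addition: it lower-cases and collapses whitespace, mainly for long forms, where small formatting differences would otherwise count as misses. The example lists are toy data, not the project's results.

```python
def _normalize(form):
    # Lower-case and collapse internal whitespace.
    return " ".join(form.lower().split())


def hit_rate(detected, reference, normalize=False):
    """Percentage of unique reference forms that also appear among the detections."""
    prep = _normalize if normalize else (lambda s: s)
    ref = {prep(f) for f in reference}
    if not ref:
        raise ValueError("reference list is empty")
    found = {prep(f) for f in detected}
    return 100.0 * len(found & ref) / len(ref)


# Toy example: detections from spaCy vs. forms scraped from the article page.
short_rate = hit_rate(["ACE2", "RBD", "ACE2", "ARDS"], ["ACE2", "RBD", "ICU", "PCR"])
long_rate = hit_rate(
    ["Angiotensin-converting enzyme 2", "receptor binding domain"],
    ["angiotensin-converting enzyme 2", "receptor-binding domain", "intensive care unit"],
    normalize=True,
)
print(f"short forms: {short_rate:.1f}%  long forms: {long_rate:.1f}%")
# → short forms: 50.0%  long forms: 33.3%
```

Even with normalization, "receptor binding domain" and "receptor-binding domain" still count as a miss. Long-form comparison by exact string match is therefore strict, and a fuzzier comparison would be a separate design decision.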
