Assisted Curation: Does Text Mining Really Help? (Alex et al. 2008)

Assisted Curation: Does Text Mining Really Help? (Alex et al. 2008)
by Benedict Fehringer
Seminar: "Unlocking the Secrets of the Past: Text Mining for Historical Documents"
Supervisor: Dr. Caroline Sporleder (and Martin Schreiber)
Thursday, 23 February 2012

Outline
• Introduction
• Related Work
• Assisted Curation
• Text Mining Pipeline
• Curation Experiments
• Discussion and Conclusion
• References

Basic study elements - Content -
• Curation of biomedical literature
• For example, protein-protein interaction recognition:
  1. Which proteins are there?
  2. If two proteins are named, do they interact?

Example of protein-protein interaction recognition
"[...] An example is YHR105W, which interacts with one protein involved in vesicular transport, Akr2, and with YGL161C, an uncharacterized protein that interacts with two transport proteins, Yip1 and Pep12. YHR105W also interacts with YPL246C, another uncharacterized protein that interacts with Ypt1 and Vam7, proteins implicated in vesicular transport and membrane fusion, respectively. [...]"
Source: Schwikowski, Uetz, & Fields (2000, p. 1259)
1. Which proteins are there?
2. If two proteins are named, do they interact?

Basic study elements - Research Question -
• Curation of biomedical literature
• For example, protein-protein interaction recognition:
  1. Which proteins are there?
  2. If two proteins are named, do they interact?
• The task should be supported by text mining

Related Work
• Increasing development of information extraction systems (spurred on by the BioCreAtIvE II competition; Krallinger, Leitner, & Valencia, 2007)
• Studies suggest a reduction in curation time
• But: lack of user studies for extrinsic evaluation
• No validation through curator feedback on how the systems affect their work and how useful they are

Basic study elements - Evaluation -
• Curation of biomedical literature
• For example, protein-protein interaction recognition:
  1. Which proteins are there?
  2. If two proteins are named, do they interact?
• The task should be supported by text mining
• Evaluation by:
  • objective performance metrics (e.g. speed improvement, number of records)
  • user feedback as an additional focus

Curation Scenario - General -
• Goal: Curators should identify protein-protein interactions (PPIs)
• Initial step: Providing a set of matching papers
• Middle step: Filtering papers into candidates
• Question: How can NLP help the curators' work?
• Basic assumption: Information Extraction (IE) techniques are likely to be effective in identifying entities and relations
  → More specifically: NLP can propose candidate PPIs

Curation Scenario - Concrete -
[Figure: Information Flow in the Curation Process. Source: Alex et al. (2008, p. 558)]

NLP Engine - Main Components -
Concrete subtask → NLP component
1. Does a protein name occur in the sentence? → Named Entity Recognition
2. Which protein does it name? → Term Identification
3. If two proteins are named, do they interact? → Relation Extraction
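The three subtasks can be illustrated with a toy sketch. This is not the NLP engine of Alex et al. (2008): the lexicon, the normalized identifiers, the trigger pattern, and all function names are hypothetical and chosen only to make the division of labour between the components visible.

```python
import re
from itertools import combinations

# Hypothetical lexicon: surface form -> normalized identifier (illustrative only).
LEXICON = {
    "YHR105W": "YHR105W",
    "Akr2": "AKR2",
    "YGL161C": "YGL161C",
    "Yip1": "YIP1",
    "Pep12": "PEP12",
}

# Hypothetical trigger pattern signalling an interaction statement.
TRIGGER = re.compile(r"\binteracts?\b", re.IGNORECASE)


def recognize_entities(sentence):
    """Subtask 1 (Named Entity Recognition): find protein mentions, in text order."""
    mentions = []
    for name in LEXICON:
        for match in re.finditer(r"\b" + re.escape(name) + r"\b", sentence):
            mentions.append((match.start(), name))
    return [name for _, name in sorted(mentions)]


def identify_terms(mentions):
    """Subtask 2 (Term Identification): map each mention to its identifier."""
    return [LEXICON[mention] for mention in mentions]


def extract_relations(sentence, identifiers):
    """Subtask 3 (Relation Extraction): propose all pairs if a trigger word occurs."""
    if not TRIGGER.search(sentence):
        return []
    unique = list(dict.fromkeys(identifiers))  # drop repeats, keep order
    return list(combinations(unique, 2))


sentence = "YHR105W interacts with Akr2."
mentions = recognize_entities(sentence)
candidates = extract_relations(sentence, identify_terms(mentions))
print(candidates)  # [('YHR105W', 'AKR2')]
```

A co-occurrence rule like this one over-generates on longer sentences, so its output should be read as candidate PPIs for a curator to confirm or reject, not as extracted facts.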

NLP Engine - Creation details -
• What should the interface design look like?
• How should the labour be divided between the human and the software?
  For example: Deciding which species is associated with which protein should be quite simple for an expert but not necessarily for the software.
• Which functional characteristics of the NLP engine would be optimal?
  For example: Should recall or precision be improved?
The focus will be on the third question.

Pipeline-Components
[Figure: pipeline overview with the components Corpus, Pre-processing, Named Entity Recognition, Term Identification, Relation Extraction, and Component Performance]

Pipeline-Components: Corpus
• 217 papers
• Corpus consists of 2 million tokens:
  - TRAIN (66%)
  - DEVTEST (17%)
  - TEST (17%)
• Annotated with: 9 entities, PPI relations, FRAG relations*
• Enriched with: properties, normalized attributes
• Inter-annotator agreement scores shown on the slide (in slide order): 84.9, 64.8, 88.4, 87.1, 59.6
*FRAG relations link fragments and mutants to their parents
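The split percentages translate into approximate token counts as follows. This is a back-of-the-envelope sketch: it assumes the corpus holds exactly 2,000,000 tokens, whereas the slide gives only the rounded figure of 2 million, and the variable names are illustrative.

```python
# Assumption: exactly 2,000,000 tokens (the slide states "2 million").
TOTAL_TOKENS = 2_000_000

# Split percentages as given on the slide.
SPLIT_PERCENT = {"TRAIN": 66, "DEVTEST": 17, "TEST": 17}

# Integer arithmetic avoids floating-point rounding in the counts.
sizes = {name: TOTAL_TOKENS * pct // 100 for name, pct in SPLIT_PERCENT.items()}

for name, size in sizes.items():
    print(f"{name}: about {size:,} tokens")
```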

Pipeline-Components: Pre-processing
• Sentence boundary detection
• Tokenization
• Adding useful linguistic markup
• Attaches NCBI* taxonomy identifiers
*National Center for Biotechnology Information
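The first two pre-processing steps can be sketched with regular expressions. This is a deliberately naive illustration, not the pre-processing used by Alex et al. (2008): both rules and both function names are assumptions, and real biomedical text (abbreviations, gene names with punctuation) needs far more careful handling.

```python
import re


def split_sentences(text):
    """Naive sentence boundary detection.

    Splits at whitespace that follows '.', '!' or '?' and precedes an
    upper-case letter.
    """
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", text)
    return [part.strip() for part in parts if part.strip()]


def tokenize(sentence):
    """Naive tokenization: runs of word characters or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


text = "YHR105W interacts with Akr2. It also interacts with YGL161C."
sentences = split_sentences(text)
tokens = [tokenize(sentence) for sentence in sentences]
print(sentences)
print(tokens[0])
```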

Pipeline-Components: Named Entity Recognition
[Figure: pipeline diagram with the labels "entity" and "no entity"]

Pipeline-Components: Named Entity Recognition, example evaluation

                  entity pred   no entity pred   Sum
entity real            9              3           12
no entity real         1             11           12
Sum                   10             14           24

Recall: 9/12 = 0.75
Precision: 9/10 = 0.9
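The recall and precision values follow directly from the counts in the table. The sketch below recomputes them. The F1 score is not on the slide and is added here only as the standard harmonic mean of the two; the function name is illustrative.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)  # correct predictions among all predicted entities
    recall = tp / (tp + fn)     # correct predictions among all real entities
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts from the table: 9 true positives, 1 false positive, 3 false negatives.
p, r, f1 = precision_recall_f1(tp=9, fp=1, fn=3)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
# prints "precision=0.90 recall=0.75 f1=0.82"
```

The 11 true negatives do not enter either measure, which is why precision and recall are preferred over accuracy when most tokens are not entities.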
