Measuring inter-annotator agreement in GO annotations




  1. Measuring inter-annotator agreement in GO annotations. Camon EB, Barrell DG, Dimmer EC, Lee V, Magrane M, Maslen J, Binns D, Apweiler R. An evaluation of GO annotation retrieval for BioCreAtIvE and GOA. BMC Bioinformatics 2005; 6 Suppl 1:S17. PMID: 15960829. SILS Biomedical Informatics Journal Club, http://ils.unc.edu/bioinfo/, 2005-10-18.

  2. Gene Ontology Annotation (GOA) project
     • Goal: annotate proteins in UniProt with GO terms
     [Diagram: GO terms and annotations from the model organism databases (MODs) and from GOA link protein and organism knowledge in UniProt.]

  3. Problems / Questions
     • Protein information continues to grow faster than curators can manually annotate it with knowledge extracted from the literature.
     • Automated annotation methods still don't understand natural language well.
     • "What do GO curators really need?" [2] A system to find 'relevant' papers and extract "the distinct features of a given protein and species", and then "to locate within the text the experimental evidence to support a GO term assignment." [2]
     • RQ: Does "automatically derived classification using information retrieval and extraction" "assist biologists in the annotation of the GO terminology to proteins in UniProt?" [2]

  4. BioCreAtIvE
     • Critical Assessment of Information Extraction systems in Biology
     • Addresses the problems of comparability and evaluation (multiple text-mining systems using different data and tasks)
     • Defines a common task, common data sets, and a clearly defined evaluation
     • "BioCreAtIvE task 2 was an experiment to test if automatically derived classification using information retrieval and extraction could assist expert biologists in the annotation of the GO vocabulary to the proteins in the UniProt Knowledgebase." [1]

  5. Standard IR evaluation process
     • Used by TREC, MUC, CASP, KDD, et al.
     [Diagram: training data and test data are created and evaluated by human judges; the system is trained on the training data, training results drive system improvements, and results on the test data feed performance measurement.]
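As a concrete illustration of the "performance measurement" box in the diagram, here is a minimal Python sketch (not the BioCreAtIvE scoring code; the GO IDs and the predicted/gold sets are placeholders) that compares a system's predicted GO terms for one protein against the curator-judged gold annotations:

```python
# Minimal precision/recall sketch for one protein's GO term predictions.
# This is an illustration only, not the official BioCreAtIvE evaluation.
def precision_recall(predicted, gold):
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)                       # correctly predicted terms
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical predictions vs. gold annotations for a single protein
p, r = precision_recall(
    predicted=["GO:0005515", "GO:0006915", "GO:0005634"],
    gold=["GO:0006915", "GO:0005634", "GO:0016020"],
)
print(f"precision={p:.2f} recall={r:.2f}")           # precision=0.67 recall=0.67
```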

  6. Manual annotation process
     • Protein prioritization: 1. un-annotated, 2. disease relevance, 3. microarray importance
     • Find relevant papers: do existing papers in the UniProt entry have GO relevance? Supplementary PubMed searches using gene & protein names; the underlying species is important.
     • Term extraction: the paper is preferred; scan specific sections [Table 1]
     • Term assignment: browse GO for appropriate terms
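A toy sketch of the prioritization rule in the first bullet above: un-annotated proteins first, then disease-relevant ones, then those important in microarray studies. The boolean fields and the example records are invented for illustration.

```python
# Hypothetical prioritization of curation targets (field names are assumptions).
def prioritize(proteins):
    """Sort proteins so the highest-priority curation targets come first."""
    return sorted(
        proteins,
        key=lambda p: (p["annotated"],                # un-annotated first
                       not p["disease_relevant"],     # then disease-relevant
                       not p["microarray_important"]) # then microarray-important
    )

proteins = [
    {"id": "P2", "annotated": True,  "disease_relevant": True,  "microarray_important": False},
    {"id": "P1", "annotated": False, "disease_relevant": False, "microarray_important": True},
]
print([p["id"] for p in prioritize(proteins)])        # P1 (un-annotated) comes first
```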

  7. Automated annotation process
     • Identify proteins in the narrative text of papers
     • Check for presence of functional annotation
     • Select the GO term and the text that provided the evidence [4]
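For orientation only, here is a deliberately naive sketch of those three steps, assuming hand-made lookup tables of protein names and GO trigger phrases. The real BioCreAtIvE systems are far more sophisticated; the dictionaries and the example sentence are invented.

```python
# Toy pipeline: spot a protein name, spot a phrase tied to a GO term, and
# return the containing sentence as the supporting evidence text.
import re

PROTEIN_NAMES = {"p53": "P04637"}          # name -> UniProt accession (illustrative)
GO_TRIGGERS = {"apoptosis": "GO:0006915"}  # phrase -> GO term (illustrative)

def annotate(text):
    """Yield (uniprot_accession, go_id, evidence_sentence) triples."""
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        lowered = sentence.lower()
        for name, accession in PROTEIN_NAMES.items():
            if name in lowered:
                for phrase, go_id in GO_TRIGGERS.items():
                    if phrase in lowered:
                        yield accession, go_id, sentence.strip()

text = "Overexpression of p53 induced apoptosis in the treated cells."
print(list(annotate(text)))
```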

  8. Data
     Training set
     • ~9,000 existing manually curated GO annotations in UniProt with PubMed IDs & GO evidence codes
     • GO evidence codes ISS, IC, ND ignored
     • Some coding problems on older annotations limit the number of usable records [5]
     Test set
     • 200 papers from JBC 1998-2002
     • Already associated with 286 UniProt entries, but lacking manual GO annotation
     • 923 GO terms were manually extracted; avg. of 9 terms/protein
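A minimal sketch of the training-set filter described above: drop annotations with evidence codes ISS, IC, or ND and keep only records that carry a PubMed ID. The record layout, accession, and PMID are assumptions for illustration, not the actual GOA/UniProt file format.

```python
# Hypothetical filter for usable training annotations (record layout assumed).
EXCLUDED_CODES = {"ISS", "IC", "ND"}

def usable_training_records(records):
    """Yield (uniprot_accession, go_id, pmid) for records worth training on."""
    for rec in records:
        if rec["evidence"] not in EXCLUDED_CODES and rec.get("pmid"):
            yield rec["uniprot"], rec["go_id"], rec["pmid"]

records = [
    {"uniprot": "P00001", "go_id": "GO:0005515", "evidence": "IDA", "pmid": "1"},
    {"uniprot": "P00001", "go_id": "GO:0005634", "evidence": "ISS", "pmid": "1"},
]
print(list(usable_training_records(records)))  # only the IDA record survives
```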

  9. Mouse/Human annotation consistency
     [Diagram: orthologous human genes (H1 ... Hn) and mouse genes (M1 ... Mn) are each annotated with GO terms (G1 ... Gn); the question is whether the annotations of orthologous pairs are consistent.]
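One way the consistency question in the diagram could be operationalized is sketched below: count shared GO terms for each human/mouse ortholog pair. This is an assumed illustration, not the paper's method; the gene labels follow the diagram (H1, M1) and the term assignments are invented.

```python
# Compare GO annotation sets across orthologous gene pairs (illustrative data).
def ortholog_consistency(go_by_gene, ortholog_pairs):
    """Yield (human_gene, mouse_gene, shared, total) for each ortholog pair."""
    for human_gene, mouse_gene in ortholog_pairs:
        h = go_by_gene.get(human_gene, set())
        m = go_by_gene.get(mouse_gene, set())
        yield human_gene, mouse_gene, len(h & m), len(h | m)

go_by_gene = {
    "H1": {"GO:0006915", "GO:0003677"},                # apoptotic process, DNA binding
    "M1": {"GO:0006915", "GO:0003677", "GO:0005634"},  # ...plus nucleus
}
for h, m, shared, total in ortholog_consistency(go_by_gene, [("H1", "M1")]):
    print(f"{h}/{m}: {shared} of {total} GO terms shared")
```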

  10. Inter-annotator agreement
      • Sources of variation:
        • Curator's biological knowledge / experience
        • Curator's standard work practices [should be normalized for the study]
        • Manually curated annotations could be wrong
        • Curators acting as relevance judges creates bias
      [Diagram: annotators A1 and A2 each curate the same test-set papers; their annotations are compared in three categories: 1. exact term match, 2. same lineage, 3. different lineage.]
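A minimal sketch of the three comparison categories, assuming GO term IDs and a precomputed ancestor map (term to set of all is_a ancestors). Building that map from the real ontology would need an OBO parser; the hierarchy fragment below is only illustrative.

```python
# Classify a pair of GO term assignments into the three comparison categories.
def compare_terms(term_a, term_b, ancestors):
    if term_a == term_b:
        return "exact match"
    if term_a in ancestors.get(term_b, set()) or term_b in ancestors.get(term_a, set()):
        return "same lineage"          # one term is an ancestor of the other
    return "different lineage"

# Hypothetical fragment of the GO is_a hierarchy (ancestor sets precomputed)
ancestors = {
    "GO:0006915": {"GO:0008219", "GO:0008150"},   # apoptotic process
    "GO:0008219": {"GO:0008150"},                 # cell death
}
print(compare_terms("GO:0006915", "GO:0008219", ancestors))  # same lineage
```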

  11. Evaluation criteria [slide figure/table not captured in this transcript]

  12. Evaluation criteria [slide figure/table not captured in this transcript]

  13. Inter-annotator agreement [slide figure/table not captured in this transcript]

  14. Inter-annotator agreement
      Camon's 3 measures of agreement don't allow for:
      • measurement of the magnitude of difference, apart from 1 node up or down (parent_of, child_of);
      • cases where similar terms appear in different parts of the tree (polyhierarchy);
      • cases when new terms must be created for concepts that don't currently exist in GO;
      • measures of annotation quality other than inter-annotator consistency;
      and they don't adjust for chance or for >2 annotators, as statistics such as Cohen's kappa do.

      Table 4. Annotation quality facets, questions, and evaluation methods.
      • Consistency. Research question: What is the nature and degree of variance in annotations made by different curators for the same unit of evidence? Evaluation method: Compare variation in similar annotations made by different annotators (inter-annotator consistency).
      • Specificity. Research question: Do the annotations of the same unit of evidence made by different annotators vary in terms of breadth, depth, specificity, etc.? Evaluation method: Compare quantitative and qualitative variation in annotations apart from consistency facets.
      • Reliability. Research question: Does the same curator make the same annotations for the same article at different time points? What factors might contribute to differences in annotation over time? Evaluation method: Compare variation in individual curators' annotations over time (intra-annotator consistency).
      • Accuracy. Research question: How is the accuracy of an annotation evaluated? What are the decision points in the annotation process that influence accuracy? Evaluation method: Define annotation accuracy and how to measure variance. Evaluate the accuracy of selected extant annotations.

      From MacMullen, W.J., Identification of strategies for information integration using annotation evidence. NLM F37 proposal (PAR-03-070), 2005-08-04.
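Since the bullets above mention Cohen's kappa, here is a minimal sketch of how chance-corrected agreement between two curators could be computed over the three outcome categories from slide 10. The example labels are invented; this is an illustration, not an analysis from the paper.

```python
# Cohen's kappa for two annotators labeling the same items (illustrative data).
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators over the same items."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)   # undefined if expected == 1

# Hypothetical judgments by two curators on six annotation pairs
a = ["exact", "exact", "same_lineage", "different_lineage", "exact", "same_lineage"]
b = ["exact", "same_lineage", "same_lineage", "different_lineage", "exact", "exact"]
print(round(cohens_kappa(a, b), 3))                 # 0.455
```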

  15. Questions
      • "Variation is acceptable between curators but inaccuracy is not." [6]

  16. GO annotation (http://geneontology.org/GO.nodes.shtml)

  17. GO multi-organism annotation (http://geneontology.org/GO.annotation.example.shtml)
