Coling 2008 workshop on human judgements in Computational Linguistics

Workshop Topic Research Issues Statistics Coling 2008 workshop on human judgements in Computational Linguistics Ron Artstein Gemma Boleda Frank Keller Sabine Schulte im Walde August 23, 2008 Artstein et al. Coling 2008 workshop on human judgements 1

1 Workshop Topic
2 Research Issues
    Theoretical Issues
    Reliability
    Learning from Disagreements
    Efficient Data Collection
3 Statistics

Topic
Human judgments play a key role in computational linguistics:
    category inventories and annotation schemes are defined on the basis of judgments;
    lexicon creation or corpus annotation is conducted by experts via a sequence of linguistic judgments;
    system evaluation often involves judging the quality of system output or system performance.

Topic
Questions concerning the design of judgment experiments:
    types of judgment experiments, design guidelines;
    lab-based vs. web-based experiments;
    methodologies for controversial tasks;
    role of ambiguity and polysemy in these tasks;
    appropriate level of granularity for judgment categories;
    type of participants (e.g., expert vs. naive);
    instructions and guidelines for participants.

Topic
Questions concerning the analysis and interpretation of judgment data:
    importance of inter-annotator agreement;
    most suitable measures of agreement;
    other quantitative and qualitative methods for analyzing judgments;
    similarity/difference with practice in psycholinguistics;
    interaction of analysis procedures and annotation instructions.

Theoretical Issues
Making a linguistic judgment is a categorization task:
    in the psychological literature, two main theories of categorization exist:
        the property view holds that the members of a category are defined by a unique set of features;
        the prototype view assumes that category membership is defined in terms of similarity to a prototypical exemplar of that category;
    traditionally, generative linguists have espoused a property view, and cognitive linguists a prototype view, of categories;
    it seems possible that some linguistic categories work through properties, while others work through prototypes.

Theoretical Issues
A recent theoretical issue is gradience in linguistic data:
    the literature on gradience has mainly focused on gradient grammaticality judgments;
    judgment techniques such as magnitude estimation have been developed to reliably elicit gradient judgments (Bard et al., 1996);
    however, there has been some work on gradient linguistic categories as well, e.g., Aarts' (2008) distinction between intersective and subsective gradience;
    these developments are yet to be reflected in computational linguistics.
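As a rough illustration of how magnitude estimation data is often prepared for analysis, the sketch below divides each raw rating by the rating the same participant gave to a reference item and then takes the base-10 logarithm. This procedure is an assumption about common practice, not something stated on the slides, and the function name `normalize_magnitude_estimates` and the sample numbers are hypothetical.

```python
import math


def normalize_magnitude_estimates(raw_scores, reference_score):
    """Return log10(score / reference) for one participant's ratings.

    Assumption: ratings are positive numbers given relative to a
    reference item, so ratios (not differences) carry the information.
    """
    if reference_score <= 0 or any(score <= 0 for score in raw_scores):
        raise ValueError("magnitude estimates must be positive")
    return [math.log10(score / reference_score) for score in raw_scores]


# Hypothetical participant: reference sentence rated 100,
# three test sentences rated 50, 100 and 200.
normalized = normalize_magnitude_estimates([50, 100, 200], 100)
```

After this transformation, an item judged half as acceptable as the reference and one judged twice as acceptable lie at equal distances on either side of zero, which makes ratings from participants who chose different numeric scales easier to compare.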

Reliability
Reliability of judgments is an ongoing research issue:
    Cohen's κ has been widely used to measure agreement for linguistic annotation since Carletta (1996);
    but this has recently been criticized (e.g., Di Eugenio and Glass, 2004; Poesio and Artstein, 2008);
    alternative measures exist in the form of Krippendorff's α, Scott's π, Fleiss' κ, etc.
Bhowmick et al. (this workshop) propose an extension of κ to multi-category annotation.
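To make the difference between two of the coefficients named on the Reliability slide concrete, here is a minimal sketch of Cohen's κ and Scott's π for two annotators. Both use the form (observed − expected) / (1 − expected); they differ only in how chance agreement is estimated. The function names and the toy part-of-speech labels are illustrative assumptions, not taken from the workshop materials.

```python
from collections import Counter


def _observed_agreement(labels_a, labels_b):
    """Share of items on which the two annotators chose the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(a == b for a, b in zip(labels_a, labels_b)) / len(labels_a)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: chance agreement from each annotator's own distribution."""
    observed = _observed_agreement(labels_a, labels_b)
    n = len(labels_a)
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # degenerate case: both annotators used one identical label
    return (observed - expected) / (1 - expected)


def scott_pi(labels_a, labels_b):
    """Scott's pi: chance agreement from the pooled label distribution."""
    observed = _observed_agreement(labels_a, labels_b)
    n = len(labels_a)
    pooled = Counter(labels_a) + Counter(labels_b)
    expected = sum((count / (2 * n)) ** 2 for count in pooled.values())
    if expected == 1.0:
        return 1.0  # degenerate case: only one label was ever used
    return (observed - expected) / (1 - expected)


# Toy data: two annotators tagging ten tokens as noun (N) or verb (V).
annotator_1 = ["N", "N", "V", "V", "N", "V", "N", "N", "V", "N"]
annotator_2 = ["N", "V", "V", "V", "N", "V", "N", "N", "V", "V"]
kappa = cohen_kappa(annotator_1, annotator_2)
pi = scott_pi(annotator_1, annotator_2)
```

On this toy data the annotators agree on 8 of 10 items but have different label preferences, so the two coefficients come out slightly different: κ estimates chance agreement at 0.48 and π at 0.50.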

Learning from Disagreements
Another emerging issue is disagreements in judgments:
    disagreements can arise trivially due to errors, or due to genuine subjectivity in the judgment task;
    a key issue is the identification of the source of disagreements (Beigman Klebanov et al., this workshop);
    and how disagreements can be exploited for automatic classification (Reidsma et al., this workshop).

Efficient Data Collection
In psychology, there has been a lot of interest in data collection over the internet (e.g., Birnbaum 2000):
    internet experimentation is very suitable for collecting linguistic judgments;
    offers access to a vast pool of participants and a wide range of languages and demographics;
    cost efficient, fast, experiments easy to set up and analyze;
    but there are a number of open issues:
        data integrity and participant authentication;
        reliable presentation of instructions;
        recruitment of expert participants;
    software (e.g., WebExp) and infrastructure for recruiting subjects (e.g., Mechanical Turk) readily available.

Statistics
Workshop organization:
    22 submissions received;
    reviewed by program committee of 30 reviewers;
    8 papers accepted as talks;
    33 registered participants.
Sponsors:
    Spanish Education and Science Ministry via the KNOW project;
    Sonderforschungsbereich 732, Universität Stuttgart.
