Thoughts on Learner Data and Dependency Parsing

Niels Ott, Ramon Ziai, Julia Krivanek, Detmar Meurers
Universität Tübingen
SFB 833, Project A4

Second Tübingen-Berlin Meeting on Analyzing Learner Language
Tübingen, 5/6 December 2011
Overview

◮ Introduction and Motivation
◮ Learner Language and Dependency Annotation
◮ Approximated Target Hypotheses
◮ Rule-Based vs. Data-Driven
◮ Hypothesis
◮ Corpora used
◮ Parsers used
◮ Overall Results
◮ Drop between UAS & LAS
◮ Results by dependency type
◮ A subjectless example
◮ Conclusion
◮ References
General Motivation

Why dependency parsing?
◮ Focus on lexical dependency structure as an interface to interpretation.
  → The CoMiC project compares the meaning of answers to reading comprehension questions.
◮ At the same time, to characterize the nature of learner language, capturing morphosyntactic dependencies can also be an important goal (→ SLA research).
Dependency Parsing in the CoMiC Project (I)
Comparing Meaning in Context

◮ The CoMiC project investigates how the meaning of student answers can be compared to the meaning of target answers in reading comprehension exercises.
◮ Data: corpus from German classes in the US, Ohio State University (Prof. Kathryn Corl) and University of Kansas (Prof. Nina Vyatkina).
◮ Target answers and student answers are compared with respect to meaning, not form.
◮ Trying to detect automatically: did the student answer the question correctly or not?
◮ We want to parse German learner language automatically with dependency parsers.
◮ These data are not annotated with errors or target hypotheses.
Dependency Parsing in the CoMiC Project (II)
Comparing Meaning in Context

◮ Our experimental system CoMiC-DE performs meaning comparison on many levels, beginning with simple token overlap.
◮ So far, our most sophisticated level of linguistic representation is based on Lexical Resource Semantics (LRS, Richter & Sailer 2003).
◮ Hahn & Meurers (2011) present an approach to constructing LRS representations from dependency structures.
◮ Naturally, we need well-behaved dependency structures for our learner data in order to construct good LRS representations.
◮ Furthermore, we use dependency triples directly in the system (see the sketch below).
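Since the slides only mention that dependency triples are used directly in the system, here is a minimal Python sketch of the general idea: token overlap as the simplest comparison level and overlap of (dependent, label, head) triples as a more structured one. The function names, the scoring, and the toy sentences are invented for illustration; this is not the actual CoMiC-DE implementation.

```python
# Illustrative sketch only: CoMiC-DE's actual alignment-based comparison is more
# elaborate; all names and the scoring here are invented for this example.

def triples(parse):
    """Turn a parse given as (dependent, label, head) tuples into a comparable set."""
    return {(dep.lower(), label, head.lower()) for dep, label, head in parse}

def overlap(student_items, target_items):
    """Proportion of target-side items also found on the student side."""
    return len(student_items & target_items) / len(target_items) if target_items else 0.0

def compare(student_tokens, target_tokens, student_parse, target_parse):
    token_score = overlap({t.lower() for t in student_tokens},
                          {t.lower() for t in target_tokens})
    triple_score = overlap(triples(student_parse), triples(target_parse))
    return token_score, triple_score

# Toy pair: target "Der Junge liest das Buch" vs. student answer "Junge liest Buch"
target_parse  = [("Der", "DET", "Junge"), ("Junge", "SUBJ", "liest"),
                 ("das", "DET", "Buch"), ("Buch", "OBJA", "liest")]
student_parse = [("Junge", "SUBJ", "liest"), ("Buch", "OBJA", "liest")]

print(compare(["Junge", "liest", "Buch"],
              ["Der", "Junge", "liest", "das", "Buch"],
              student_parse, target_parse))
```

On the toy pair, the triple overlap rewards the student answer for realizing the subject and accusative-object relations of the target even though the articles are missing.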
Learner Language and Dependency Annotation

◮ Ott & Ziai (2010) trained a statistical dependency parser on a dependency-converted version of the TüBa-D/Z treebank and used it to parse learner language.
◮ CREG-109: a data set of 109 manually annotated student answers.
◮ We are currently working on an extended data set containing more data, along with questions and target answers.
◮ Annotation scheme used: the one described by Foth (2006); an example parse in this scheme is sketched below.
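For concreteness, here is a minimal Python sketch of what dependency-parsed output in CoNLL-X format might look like and how it can be read into the triples used above. The sentence, its analysis, the POS tags, and the helper name are made up for this example; only the CoNLL-X column layout and dependency labels such as SUBJ, OBJA, and DET follow the conventions commonly associated with the Foth (2006) scheme (the root label in particular may differ in the actual scheme).

```python
# Minimal sketch: reading a CoNLL-X formatted parse into (dependent, label, head) triples.
# The sentence and its analysis are invented for illustration; ROOT is used for the
# root edge purely for readability.

CONLL_EXAMPLE = """
1  Der    der    ART  ART    _  2  DET   _  _
2  Junge  Junge  N    NN     _  3  SUBJ  _  _
3  liest  lesen  V    VVFIN  _  0  ROOT  _  _
4  das    das    ART  ART    _  5  DET   _  _
5  Buch   Buch   N    NN     _  3  OBJA  _  _
"""

def conll_to_triples(conll):
    """Yield (dependent, label, head) triples from one CoNLL-X sentence."""
    rows = [line.split() for line in conll.strip().splitlines()]
    forms = {row[0]: row[1] for row in rows}      # token id -> word form
    for row in rows:
        token, head_id, label = row[1], row[6], row[7]
        head = forms.get(head_id, "<root>")       # head id 0 = artificial root
        yield (token, label, head)

print(list(conll_to_triples(CONLL_EXAMPLE)))
```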
Abusing Annotation Schemes

◮ The dependency annotation scheme by Foth (2006) was not designed for learner language.
◮ Hence, we are using an annotation scheme that simply cannot handle many constructions in the learner data.
◮ What are possible solutions to this issue?
  1. Annotate interlanguage as a system in its own right, using a special annotation scheme (Dickinson & Ragheb 2009).
  2. Construct target hypotheses that map the learner data to well-formed language, and annotate (or parse) these (Hirschmann et al. 2010).
Annotating/Parsing Interlanguage

◮ So, what if we stuck to interlanguage as a system in its own right?
◮ Interlanguage is influenced by many learner-dependent factors (stage of acquisition, L1, background, etc.).
➥ Difficult to capture in a general parsing model.
Aside: Isn’t it only a robustness issue?

◮ Arguably, robust tools should be able to deal with learner language to some extent.
◮ Foster (2007) automatically ‘damaged’ the Penn Treebank with simulated learner errors and trained a parser on it to achieve more error tolerance (see the toy sketch below).
◮ Still, this does not solve the problem of abusing an annotation scheme.
◮ Possibly, there is a difference between learner levels:
◮ Very advanced learners will be close to native speakers, so using native-language categories might still be OK.
◮ In our data, we have many beginners and intermediate learners who produce language that often is impossible to treat with native-language categories.
◮ Robustness is good for us, but robustness alone does not help us.
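To make the error-simulation idea concrete, here is a toy Python sketch: well-formed sentences are corrupted with random deletions and adjacent swaps before parser training. The error types and probabilities are invented for illustration and are not Foster’s (2007) actual error model, which is derived from real learner errors; in the real setting the treebank trees also have to be adjusted to match the corrupted strings.

```python
import random

# Toy sketch of the error-simulation idea: introduce artificial "learner-like" noise
# into otherwise well-formed sentences. The error types and probabilities are made up
# for illustration and do NOT reproduce Foster's (2007) error model.

def simulate_errors(tokens, p_delete=0.05, p_swap=0.05, rng=random):
    """Return a noisy copy of a token list: random deletions and adjacent swaps."""
    noisy = [tok for tok in tokens if rng.random() >= p_delete]   # e.g. missing article
    i = 0
    while i < len(noisy) - 1:
        if rng.random() < p_swap:                                 # word-order error
            noisy[i], noisy[i + 1] = noisy[i + 1], noisy[i]
            i += 2
        else:
            i += 1
    return noisy

print(simulate_errors("Der Junge liest das Buch".split(), p_delete=0.3, p_swap=0.3))
```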
Annotating/Parsing Target Hypotheses

◮ Using target hypotheses seems appealing in our situation.
◮ Standard NLP tools could be used in the tool chain, since we would be dealing with well-formed language again.
◮ Still, we would have an explicit mapping back to the original learner data.
◮ However, we do not want to annotate target hypotheses manually:
◮ Our corpus is large; it would be a lot of work.
◮ It would not be applicable to tutoring systems that aim at giving feedback on unseen learner data.
◮ Target hypotheses in the sense of Falko’s ZH1 (Reznicek et al. 2010) would be great, but they would also provide more than we need.