An Annotated Corpus of Picture Stories Retold by Language Learners
Christine Köhn and Arne Köhn
{ckoehn,koehn}@informatik.uni-hamburg.de
Faculty of Mathematics, Informatics and Natural Sciences
Learner Corpora Today
- many available, but with small coverage
- mainly essays
  - only marginally constrained → low agreement between error annotations (e.g. Fitzpatrick and Seegmiller, 2004)
- languages other than English are underrepresented
August 26th, 2018 An Annotated Corpus of Picture Stories Retold by Language Learners, C. Köhn, A. Köhn 2
Annotation Reliability
Assumption: reliable interpretation supports reliable annotation.
Foster reliable interpretation by collecting a learner corpus with an explicit task context (Ott et al., 2012):
- knowing the context of an utterance facilitates interpreting it
Tasks with Explicit Task Contexts
Reading comprehension exercise
- reasonable inter-annotator agreement for meaning assessment (Ott et al., 2012)
- strongly influences the learner’s choice of words / structures
Picture description
- no textual influence
- pictures with a single activity constrain the answers to a sensible degree for extracting verb(subj,obj) triples (King and Dickinson, 2013)
- sentences are conceptually simple
Exploring the Middle Ground
Task design criteria:
- capture real language use, no textual influence
- free-form answers
- strong visual context
- elicit a variety of sentence structures
→ Picture story retelling (with an appropriate choice of story)
Moral mit Wespen (“moral with wasps”) by Erich Ohser
Erich Ohser (1903-1944)
- prohibited from working under the Nazi regime
- “father and son” comics published under a pseudonym (e.o.plauen)
- arrested in 1944, together with Erich Knauf, for making political jokes
- committed suicide the day before his trial
- Knauf was executed
(https://en.wikipedia.org/wiki/E._O._Plauen)
The ComiGS Corpus
Comic Strips Retold by Learners of German
Task and User Interface
The ComiGS Corpus
- ~90 min for 2–3 stories
- CEFR levels from A2 (upper beginner) to B2/C1 (upper intermediate/lower advanced)
- 70 texts from 30 learners of German
  - 30 texts each for stories 1 and 2
  - 10 texts for story 3
- 18k tokens, nearly 1.5k sentences
- tokens/sentence: 12.2 (mean), 11 (median)
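As a quick consistency check of the figures above, here is a minimal Python sketch that computes the mean and median number of tokens per sentence from tokenized sentences. The function name and the whitespace tokenization are assumptions for illustration; the toy input is two whitespace-tokenized sentences taken from the example slides, not ComiGS corpus data.

```python
from statistics import mean, median

# Text counts per story as reported on the slide (30 + 30 + 10 = 70 texts).
TEXTS_PER_STORY = {1: 30, 2: 30, 3: 10}
assert sum(TEXTS_PER_STORY.values()) == 70  # sanity check of the reported total

def sentence_length_stats(sentences: list[list[str]]) -> tuple[float, float]:
    """Return the (mean, median) number of tokens per sentence.

    Assumption: punctuation tokens count like word tokens, as in the
    whitespace-tokenized examples on these slides.
    """
    if not sentences:
        raise ValueError("need at least one sentence")
    lengths = [len(tokens) for tokens in sentences]
    return mean(lengths), median(lengths)

# Toy input (not corpus data): a learner sentence and a TH2 fragment from the examples.
toy_corpus = [
    "Der Sohn schlisst sein Mund mit der Hand , er sieht ängstlich und überraschend gleichzeitig .".split(),
    "er sieht gleichzeitig ängstlich und überrascht aus .".split(),
]
print(sentence_length_stats(toy_corpus))
```

Applied to the corpus totals, 18,000 tokens over roughly 1,500 sentences gives about 12 tokens per sentence, in line with the reported mean of 12.2 (the sentence count is "nearly" 1.5k).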
Example
Der Sohn schlisst sein Mund mit der Hand ,
The son closes his mouth with the hand ,
er sieht ängstlich und überraschend gleichzeitig .
he sees anxious and surprising simultaneously .
(Moral mit Wespen “moral with wasps” by Erich Ohser)
Multi-purpose Annotations
- target hypothesis (TH): reconstruction of the original utterance
- THs annotated with part-of-speech (PoS) tags using the STTS tag set (Schiller et al., 1999)
- syntactic annotation: labeled dependencies using the scheme by Foth (2006)
- lemmas (base forms of words)
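One way to picture these layers together is one record per TH token. The Python sketch below is a hypothetical representation, not the ComiGS file format: field names are invented, the STTS tags are standard, and the dependency labels (DET, SUBJ, S, OBJA, PP, PN) are our reading of the Foth (2006) scheme and should be checked against it. The input is the TH1 fragment "Der Sohn schließt seinen Mund mit der Hand" from the running example, without its comma.

```python
from dataclasses import dataclass

@dataclass
class Token:
    """One TH token with its annotation layers (field names are hypothetical)."""
    index: int   # 1-based position in the TH sentence
    form: str    # TH word form
    lemma: str   # base form
    pos: str     # STTS part-of-speech tag
    head: int    # index of the governing token, 0 = artificial root
    label: str   # dependency label (Foth-style, assumed here)

def to_tabular(tokens: list[Token]) -> str:
    """Render tokens as tab-separated lines, one token per line (CoNLL-like)."""
    return "\n".join(
        f"{t.index}\t{t.form}\t{t.lemma}\t{t.pos}\t{t.head}\t{t.label}" for t in tokens
    )

def root_of(tokens: list[Token]) -> Token:
    """Return the single token attached to the artificial root (head 0)."""
    roots = [t for t in tokens if t.head == 0]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {len(roots)}")
    return roots[0]

# Illustrative annotation of the TH1 fragment; not taken from the corpus files.
th1 = [
    Token(1, "Der", "der", "ART", 2, "DET"),
    Token(2, "Sohn", "Sohn", "NN", 3, "SUBJ"),
    Token(3, "schließt", "schließen", "VVFIN", 0, "S"),
    Token(4, "seinen", "sein", "PPOSAT", 5, "DET"),
    Token(5, "Mund", "Mund", "NN", 3, "OBJA"),
    Token(6, "mit", "mit", "APPR", 3, "PP"),
    Token(7, "der", "der", "ART", 8, "DET"),
    Token(8, "Hand", "Hand", "NN", 6, "PN"),
]
print(to_tabular(th1))
```

Keeping all layers on the TH tokens, rather than on the learner tokens, is what lets standard tools (taggers, parsers trained on native German) be applied to learner data.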
Target Hypotheses
There are many ways to correct a sentence. Reznicek et al. (2012) distinguish two target hypotheses:
Minimal Target Hypothesis (TH1)
- minimal changes
- normalization for automatic processing
- adheres to morphological, syntactic, and orthographic rules
  - e.g. if verb and arguments don’t match: change the arguments
- structurally similar to the learner utterance
Extended Target Hypothesis (TH2)
- minimal changes
- as similar as possible to a native speaker’s utterance
- also: semantics, pragmatics, and information structure rules
- only rough guidelines
- similar to the learner’s intention
Example (cont’d)
orig  Der Sohn schlisst sein Mund mit der Hand ,
      The son closes his mouth with the hand ,
TH1   Der Sohn schließt seinen Mund mit der Hand ,
TH2   Der Sohn hält seinen Mund mit der Hand zu ,
      “The son covers his mouth with his hand”
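To make "minimal changes" concrete, the sketch below lists the token spans where a TH differs from the learner sentence, using Python's generic `difflib.SequenceMatcher` on whitespace tokens. This is only an illustration with a swapped-in technique; it is not the alignment procedure used for ComiGS or Falko, and `token_edits` is a hypothetical helper name.

```python
from difflib import SequenceMatcher

def token_edits(orig: str, th: str) -> list[tuple[str, list[str], list[str]]]:
    """List the non-equal token spans between a learner sentence and a TH.

    Each entry is (tag, original tokens, TH tokens), where tag is one of
    difflib's opcodes: "replace", "delete" or "insert".
    """
    a, b = orig.split(), th.split()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]

orig = "Der Sohn schlisst sein Mund mit der Hand ,"
th1 = "Der Sohn schließt seinen Mund mit der Hand ,"
th2 = "Der Sohn hält seinen Mund mit der Hand zu ,"

print(token_edits(orig, th1))  # [('replace', ['schlisst', 'sein'], ['schließt', 'seinen'])]
print(token_edits(orig, th2))  # [('replace', ['schlisst', 'sein'], ['hält', 'seinen']), ('insert', [], ['zu'])]
```

In the TH2 diff the separable particle "zu" shows up as an unrelated insertion, disconnected from "hält". A purely positional diff cannot express that both tokens replace "schlisst"; this kind of lost link is the problem the tmid extension described later is meant to address.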
Example (cont’d)
orig  er sieht ängstlich und überraschend gleichzeitig .
      he sees anxious and surprising simultaneously .
TH1   er sieht ängstlich und überrascht gleichzeitig aus .
      he looks-1 anxious and surprised simultaneously looks-2 .
TH2   er sieht gleichzeitig ängstlich und überrascht aus .
      “he looks anxious and surprised at the same time.”
Adaptations for Annotating THs
- mostly adhere to the Falko annotation manual (Reznicek et al., 2012)
- minor changes, mainly due to differences between tasks and language levels
  - e.g. colloquial language is not discouraged in general in TH2
- most changes are extensions or clarifications
→ annotations are mainly compatible with Falko
Extension for Movements, Splits, Merges
- if a token is moved & changed, this information cannot be fully recovered later
  → introduce a unique identifier to indicate token movement (tmid)
- tokens can be merged or split
  orig  Die Kind ist liegend […]
  TH2   Das Kind liegt […]
        The child is lying/lies […]
- if tokens aren’t next to each other: use tmid as well
  orig  Die  Kind  ist    […]  liegend
  TH2   Das  Kind  liegt  […]
  tmid             1      […]  1
- use tmid if tokens aren’t changed in isolation
  orig  Der  Mann  geht   weiter  […]
  TH2   Der  Mann  fährt  fort    […]
        The man walks/goes on […]
  tmid             1      1       […]
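The tmid layer can be pictured as a shared id that links learner tokens to TH tokens regardless of position. The Python sketch below groups tokens by tmid, assuming (hypothetically) that both the original and the TH2 tokens carry the id; the slides do not specify the storage format, and the names `TmidToken`, `tmid_groups` and `unpaired_tmids` are illustrative. The slide examples are used without their trailing "[…]".

```python
from collections import defaultdict
from typing import Optional

# Hypothetical representation: each token is (form, tmid), where tmid is
# None unless the token takes part in a movement, split or merge.
TmidToken = tuple[str, Optional[int]]

def tmid_groups(
    orig: list[TmidToken], th: list[TmidToken]
) -> dict[int, tuple[list[str], list[str]]]:
    """Map each tmid to (original tokens, TH tokens) carrying that id.

    Grouping by id instead of by position keeps the link even when the
    linked tokens are not adjacent or differ in number (splits, merges).
    """
    groups: defaultdict[int, tuple[list[str], list[str]]] = defaultdict(lambda: ([], []))
    for form, tmid in orig:
        if tmid is not None:
            groups[tmid][0].append(form)
    for form, tmid in th:
        if tmid is not None:
            groups[tmid][1].append(form)
    return dict(groups)

def unpaired_tmids(groups: dict[int, tuple[list[str], list[str]]]) -> set[int]:
    """Return ids that occur on only one side (a hypothetical consistency check)."""
    return {tmid for tmid, (o, t) in groups.items() if not o or not t}

# Merge: "ist ... liegend" -> "liegt"; as on the slide, Die -> Das gets no tmid.
merge_orig = [("Die", None), ("Kind", None), ("ist", 1), ("liegend", 1)]
merge_th2 = [("Das", None), ("Kind", None), ("liegt", 1)]

# Not changed in isolation: "geht weiter" -> "fährt fort".
phrasal_orig = [("Der", None), ("Mann", None), ("geht", 1), ("weiter", 1)]
phrasal_th2 = [("Der", None), ("Mann", None), ("fährt", 1), ("fort", 1)]

print(tmid_groups(merge_orig, merge_th2))
print(tmid_groups(phrasal_orig, phrasal_th2))
```

Because the groups are keyed by id rather than by position, the same code covers adjacent merges, non-adjacent merges and multi-token replacements; `unpaired_tmids` flags ids that appear on only one side, which under this representation would point to an incomplete annotation.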