
Identifying Grammar Rules for Language Education with Dependency Parsing in German



  1. Identifying Grammar Rules for Language Education with Dependency Parsing in German Eleni Metheniti, Pomi Park, Kristina Kolesova, Günter Neumann August 27, 2019 Depling – SyntaxFest

  2. Why identify grammar rules?
     • Our larger mission: find appropriate texts for first language learners in primary school education (in German and other languages...)
     • Text difficulty relies on many aspects: sentence length, word frequency, syntax...
     • How can we determine the difficulty of a sentence given the existing grammar rules (syntax + morphosyntax)?
     • How to even find grammar rules in text?
     Metheniti et al. (2019)

  3. Our approach
     • Create a new query language for translating grammar rules to syntactic/morphosyntactic patterns
     • Build patterns for German syntactic phenomena (age appropriate) + assign each a difficulty level
     • Build a matching algorithm where input: parses + patterns, output: grammar rules
     • Evaluate the matches with gold parses and parses from 4 parsers (Munderline, UDPipe, jPTDP, Turku)
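
The "input: parses" side of the matching algorithm can be sketched as follows. The slides do not say which file format the parses use; this sketch assumes CoNLL-U, the tab-separated format of Universal Dependencies treebanks. The `Token` record, the `read_conllu_sentence` helper and the sample sentence are hypothetical names and data introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    token_id: int         # CoNLL-U column 1 (ID)
    wordform: str         # column 2 (FORM)
    lemma: str            # column 3 (LEMMA)
    pos: str              # column 4 (UPOS)
    features: frozenset   # column 6 (FEATS), split on "|"
    head_id: int          # column 7 (HEAD), 0 for the root
    label: str            # column 8 (DEPREL)


def read_conllu_sentence(text: str) -> list:
    """Turn one CoNLL-U sentence block into a list of Token records."""
    tokens = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword ranges ("1-2") and empty nodes ("1.1")
        feats = frozenset() if cols[5] == "_" else frozenset(cols[5].split("|"))
        tokens.append(Token(int(cols[0]), cols[1], cols[2], cols[3],
                            feats, int(cols[6]), cols[7]))
    return tokens


# Hand-written sample parse (illustrative, not taken from the authors' data).
SAMPLE = "\n".join([
    "# text = die Katze",
    "1\tdie\tder\tDET\t_\tDefinite=Def|PronType=Art\t2\tdet\t_\t_",
    "2\tKatze\tKatze\tNOUN\t_\tNumber=Sing\t0\troot\t_\t_",
])

tokens = read_conllu_sentence(SAMPLE)
for tok in tokens:
    print(tok.token_id, tok.wordform, tok.pos, tok.label, tok.head_id)
```

Each record carries exactly the fields the patterns on the following slides refer to: wordform, lemma, POS, label, features, and the token/head IDs.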

  4. Our main goals
     • Create very restrictive patterns that would not be found erroneously/overzealously
     • Descriptive and human-readable patterns
     • Search on dependency parses (not just string methods or regex!)
     • Fast and successful with our dependency parser

  5. Why create a new query language?
     • Most existing text query languages (ANNIS, Poliqarp, COSMAS II) do not support dependency parsing, use regex (too difficult), match meaningless strings...
     • PML-TQ (Pajas and Štěpánek, 2009): very robust, too complex
     • TüNDRA (Martens, 2012): almost perfect... but not quite (uses TIGER annotation, allows surface structure)

  6. Query language template
       comp_word: POS={A,B} & label={c} & feature={d,e} & lemma={‘e’} & wordform={‘f’,‘g’} & wordform={h-,i-} & wordform={-j-},
       head_word: POS={A,B} & label={c} & feature={d,e} & lemma={‘e’} & wordform={‘f’,‘g’} & wordform={h-,i-} & wordform={-j-},
       tokenID(head_word) = headID(comp_word)
     Figure 1: General template for a pattern with a head-dependent relation.
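
One way to read the template is as a set of per-token constraints joined by `&`. The sketch below is a hypothetical Python rendering, not the authors' implementation. It assumes that POS, label, lemma and wordform sets list alternatives (any one may match), that feature sets must all be present (this agrees with Example 1, where `Definite=Ind` blocks the match), and that `h-` and `-j-` denote a prefix and a substring respectively. The `Constraint` and `matches_token` names are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Token:
    wordform: str
    lemma: str
    pos: str
    label: str
    features: frozenset = frozenset()


@dataclass
class Constraint:
    pos: set = field(default_factory=set)        # POS={A,B}: any of
    label: set = field(default_factory=set)      # label={c}: any of
    features: set = field(default_factory=set)   # feature={d,e}: all of
    lemma: set = field(default_factory=set)      # lemma={'e'}: any of
    wordform: set = field(default_factory=set)   # wordform={'f','g'}: exact, any of
    prefix: set = field(default_factory=set)     # wordform={h-,i-}: assumed prefixes
    infix: set = field(default_factory=set)      # wordform={-j-}: assumed substrings


def matches_token(c: Constraint, tok: Token) -> bool:
    """An empty field places no restriction; all non-empty fields must hold (&)."""
    return (
        (not c.pos or tok.pos in c.pos)
        and (not c.label or tok.label in c.label)
        and c.features <= tok.features
        and (not c.lemma or tok.lemma in c.lemma)
        and (not c.wordform or tok.wordform in c.wordform)
        and (not c.prefix or any(tok.wordform.startswith(p) for p in c.prefix))
        and (not c.infix or any(s in tok.wordform for s in c.infix))
    )


# POS={DET} & label={det} & feature={Definite=Def,PronType=Art}
definite_article = Constraint(pos={"DET"}, label={"det"},
                              features={"Definite=Def", "PronType=Art"})

die = Token("die", "der", "DET", "det", frozenset({"Definite=Def", "PronType=Art"}))
eine = Token("eine", "ein", "DET", "det", frozenset({"Definite=Ind", "PronType=Art"}))

print(matches_token(definite_article, die))    # True
print(matches_token(definite_article, eine))   # False
```

The head-dependent condition `tokenID(head_word) = headID(comp_word)` is a relation between two tokens and is therefore not part of this per-token check.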

  7. Example 1: NP with definite determiner
     Parse of “die Katze”:
       die “the”: POS=DET, label=det, Definite=Def | PronType=Art | ...
       Katze “cat”: POS=NOUN
     Pattern:
       comp_word: POS={DET} & label={det} & feature={Definite=Def,PronType=Art},
       head_word: POS={NOUN},
       tokenID(head_word) = headID(comp_word)
     Figure 2: Pattern to identify a noun phrase with a definite article.
     Match!

  8. Example 1: NP with definite determiner (complex pattern)
     Parse of “eine Katze”:
       eine “a”: POS=DET, label=det, Definite=Ind | PronType=Art | ...
       Katze “cat”: POS=NOUN
     Pattern:
       comp_word: POS={DET} & label={det} & feature={Definite=Def,PronType=Art},
       head_word: POS={NOUN},
       tokenID(head_word) = headID(comp_word)
     Figure 3: Pattern to identify a noun phrase with a definite article.
     No match.
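
The two outcomes of Example 1 can be reproduced with a small sketch of the head-dependent check. The function names are hypothetical, and the token IDs and the `root` label on "Katze" are assumptions added so that the two-word phrases form complete parses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    token_id: int
    wordform: str
    pos: str
    label: str
    head_id: int
    features: frozenset = frozenset()


def is_definite_article(tok):
    # comp_word: POS={DET} & label={det} & feature={Definite=Def,PronType=Art}
    return (tok.pos == "DET" and tok.label == "det"
            and {"Definite=Def", "PronType=Art"} <= tok.features)


def is_noun(tok):
    # head_word: POS={NOUN}
    return tok.pos == "NOUN"


def find_pairs(tokens, comp_ok, head_ok):
    """Return (head, comp) wordform pairs with tokenID(head_word) = headID(comp_word)."""
    by_id = {t.token_id: t for t in tokens}
    pairs = []
    for comp in tokens:
        head = by_id.get(comp.head_id)  # None when comp is the root (head_id 0)
        if head is not None and comp_ok(comp) and head_ok(head):
            pairs.append((head.wordform, comp.wordform))
    return pairs


die_katze = [
    Token(1, "die", "DET", "det", 2, frozenset({"Definite=Def", "PronType=Art"})),
    Token(2, "Katze", "NOUN", "root", 0),
]
eine_katze = [
    Token(1, "eine", "DET", "det", 2, frozenset({"Definite=Ind", "PronType=Art"})),
    Token(2, "Katze", "NOUN", "root", 0),
]

print(find_pairs(die_katze, is_definite_article, is_noun))    # [('Katze', 'die')]
print(find_pairs(eine_katze, is_definite_article, is_noun))   # []
```

Because the check follows the `head_id` link rather than word order, it would still apply if other words stood between the determiner and the noun.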

  9. Example 2: Definite pronoun (simple patterns)
     Parse:
       die “the”: POS=DET, label=det, Definite=Def | PronType=Art | ...
     Pattern:
       head_word: POS={DET} & label={det} & feature={Definite=Def,PronType=Art}
     Figure 4: Pattern to identify a definite determiner.

  10. Example 3a: Transitive sentence (compound patterns)
      Parse of “er liebt Maria.”:
        er “he”: POS=PRON, label=nsubj, feature=...
        liebt “loves”: POS=VERB, label=root
        Maria “Mary”: POS=PROPN, label=obj
        . : POS=PUNCT, label=punct
      Pattern:
        comp_word: label={nsubj}, head_word: POS={VERB} & label={root}, tokenID(head_word) = headID(comp_word)
        AND
        comp_word: label={obj}, head_word: POS={VERB} & label={root}, tokenID(head_word) = headID(comp_word)
      Match!

  11. Example 3b: Reflexive sentence
      Parse of “ich wasche mich.”:
        ich “I”: POS=PRON, label=nsubj
        wasche “wash”: POS=VERB, label=root
        mich “myself”: POS=PRON, label=obj, PronType=Prs | ... Reflex=Yes | ...
        . : POS=PUNCT, label=punct
      Pattern:
        comp_word: label={nsubj}, head_word: POS={VERB} & label={root}, tokenID(head_word) = headID(comp_word)
        AND
        comp_word: label={obj}, head_word: POS={VERB} & label={root}, tokenID(head_word) = headID(comp_word)
      Also a match...
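
The compound pattern of Examples 3a and 3b can be sketched as a conjunction of two head-dependent checks. This is a hypothetical rendering that assumes `AND` only requires each sub-pattern to match somewhere in the sentence; token IDs and head links are filled in by hand to match the parses shown on the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    token_id: int
    wordform: str
    pos: str
    label: str
    head_id: int


def relation_matches(tokens, comp_label):
    """comp_word: label={comp_label}, head_word: POS={VERB} & label={root},
    tokenID(head_word) = headID(comp_word)"""
    by_id = {t.token_id: t for t in tokens}
    for comp in tokens:
        head = by_id.get(comp.head_id)
        if (comp.label == comp_label and head is not None
                and head.pos == "VERB" and head.label == "root"):
            return True
    return False


def is_transitive(tokens):
    # Both sub-patterns must match (AND).
    return relation_matches(tokens, "nsubj") and relation_matches(tokens, "obj")


er_liebt_maria = [
    Token(1, "er", "PRON", "nsubj", 2),
    Token(2, "liebt", "VERB", "root", 0),
    Token(3, "Maria", "PROPN", "obj", 2),
    Token(4, ".", "PUNCT", "punct", 2),
]
ich_wasche_mich = [
    Token(1, "ich", "PRON", "nsubj", 2),
    Token(2, "wasche", "VERB", "root", 0),
    Token(3, "mich", "PRON", "obj", 2),
    Token(4, ".", "PUNCT", "punct", 2),
]

print(is_transitive(er_liebt_maria))    # True
print(is_transitive(ich_wasche_mich))   # True
```

The reflexive sentence matches as well because nothing in the pattern inspects the features of the object, which is the over-matching that Example 3b points out.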

  12. Exclude operator
      • Distinguish similar patterns by excluding the parts of the pattern that should not match
      • Example: Pattern to only match simple mono-transitive sentences (not bitransitive sentences, not reflexive sentences)
      Pattern:
        comp_word: label={nsubj}, head_word: POS={VERB} & label={root}, tokenID(head_word) = headID(comp_word)
        AND
        comp_word: label={obj}, head_word: POS={VERB} & label={root}, tokenID(head_word) = headID(comp_word)
        AND
        ∼(comp_word: label={iobj}, head_word: POS={VERB} & label={root}, tokenID(head_word) = headID(comp_word))
        AND
        ∼(comp_word: label={obj,iobj} & feature={PronType=Prs,Reflex=Yes}, head_word: POS={VERB} & label={root}, tokenID(head_word) = headID(comp_word))

  13. Example 3a, revised
      Parse of “Rolf hat Glück gehabt.”:
        Rolf “Rolf”: POS=PROPN, label=nsubj
        hat “has”: POS=AUX, label=aux
        Glück “luck”: POS=NOUN, label=obj
        gehabt “had”: POS=VERB, label=root
        . : POS=PUNCT, label=punct
      Pattern:
        comp_word: label={nsubj}, head_word: POS={VERB} & label={root}, tokenID(head_word) = headID(comp_word)
        AND
        comp_word: label={obj}, head_word: POS={VERB} & label={root}, tokenID(head_word) = headID(comp_word)
        AND
        ∼(comp_word: label={iobj}, head_word: POS={VERB} & label={root}, tokenID(head_word) = headID(comp_word))
        AND
        ∼(comp_word: label={obj,iobj} & feature={PronType=Prs,Reflex=Yes}, head_word: POS={VERB} & label={root}, tokenID(head_word) = headID(comp_word))
      Part 1 (comp_word: label={nsubj}): Match!

  14. Example 3a, revised (same parse and pattern)
      Part 2 (comp_word: label={obj}): Match!

  15. Example 3a, revised (same parse and pattern)
      Part 3 (excluded comp_word: label={iobj}): comp_word not found → No match.

  16. Example 3a, revised (same parse and pattern)
      Part 4 (excluded comp_word: label={obj,iobj} & feature={PronType=Prs,Reflex=Yes}): Partial match (the label matches, the features do not) → No match.
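
The four-part walk-through can be sketched in code as follows. The sketch is hypothetical and assumes that the exclude operator `∼(...)` succeeds exactly when its sub-pattern fails to match anywhere in the sentence, so that the whole pattern holds when parts 1 and 2 match and parts 3 and 4 do not. Token IDs and head links are assumptions added to complete the parses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    token_id: int
    wordform: str
    pos: str
    label: str
    head_id: int
    features: frozenset = frozenset()


def relation(tokens, labels, features=frozenset()):
    """True if some comp_word with one of `labels` and all of `features`
    depends on a head_word with POS={VERB} & label={root}."""
    by_id = {t.token_id: t for t in tokens}
    for comp in tokens:
        head = by_id.get(comp.head_id)
        if (head is not None and head.pos == "VERB" and head.label == "root"
                and comp.label in labels and features <= comp.features):
            return True
    return False


def is_simple_monotransitive(tokens):
    reflexive = frozenset({"PronType=Prs", "Reflex=Yes"})
    return (relation(tokens, {"nsubj"})                            # part 1
            and relation(tokens, {"obj"})                          # part 2
            and not relation(tokens, {"iobj"})                     # part 3: excluded
            and not relation(tokens, {"obj", "iobj"}, reflexive))  # part 4: excluded


rolf = [
    Token(1, "Rolf", "PROPN", "nsubj", 4),
    Token(2, "hat", "AUX", "aux", 4),
    Token(3, "Glück", "NOUN", "obj", 4),
    Token(4, "gehabt", "VERB", "root", 0),
    Token(5, ".", "PUNCT", "punct", 4),
]
ich_wasche_mich = [
    Token(1, "ich", "PRON", "nsubj", 2),
    Token(2, "wasche", "VERB", "root", 0),
    Token(3, "mich", "PRON", "obj", 2, frozenset({"PronType=Prs", "Reflex=Yes"})),
    Token(4, ".", "PUNCT", "punct", 2),
]

print(is_simple_monotransitive(rolf))              # True
print(is_simple_monotransitive(ich_wasche_mich))   # False
```

Under this reading, the reflexive sentence from Example 3b is rejected by part 4, while "Rolf hat Glück gehabt." passes all four parts.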

  17. Matching process
      135 patterns for syntactic/morphosyntactic German grammar rules
      Columns: ID, Description, Dif., Pattern
      ID 222. Description: Auxiliary verb “haben”, present indicative. Dif.: 1. Pattern:
        head_word: POS={AUX} & wordform={“hab”,“habe”,“hast”,“hat”,“haben”} & feature={Mood=Ind,VerbForm=Fin}
      ID 240. Description: Composed forms: Perfect indicative. Dif.: 1. Pattern:
        comp_word: {<222>,<218>}, head_word: POS={VERB} & feature={VerbForm=Part}, tokenID(head_word)=headID(comp_word)
      ID 289. Description: Simple clause with intransitive verb, with auxiliary verb. Dif.: 1. Pattern:
        (comp_word: label={nsubj}, head_word: POS={VERB} & label={root}, tokenID(head_word)=headID(comp_word))
        AND (comp_word: POS={AUX} & label={aux}, head_word: POS={VERB} & label={root} & feature={VerbForm=Part}, tokenID(head_word)=headID(comp_word))
        AND ∼(head_word: label={obj})
        AND ∼(head_word: label={iobj})
        AND ∼(head_word: POS={PUNCT} & wordform={“?”})
        AND ∼(head_word: feature={Mood=Imp} & label={root})
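
A minimal sketch of the matching step (input: one parsed sentence and a pattern table; output: the grammar rules found) might look like this. It is not the authors' algorithm. It hard-codes patterns 222 and 240 as Python functions, assumes that `{<222>,<218>}` means "a token matching pattern 222 or pattern 218", and leaves pattern 218 out because its definition is not shown on the slide. The feature values on the sample tokens are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    token_id: int
    wordform: str
    pos: str
    label: str
    head_id: int
    features: frozenset = frozenset()


def rule_222(tokens):
    """Tokens matching pattern 222: auxiliary 'haben', present indicative."""
    forms = {"hab", "habe", "hast", "hat", "haben"}
    return [t for t in tokens
            if t.pos == "AUX" and t.wordform in forms
            and {"Mood=Ind", "VerbForm=Fin"} <= t.features]


def rule_240(tokens):
    """Heads matching pattern 240: participle verb governing a referenced auxiliary."""
    by_id = {t.token_id: t for t in tokens}
    hits = []
    # {<222>,<218>}: only pattern 222 is looked up; 218 is not defined in this sketch.
    for comp in rule_222(tokens):
        head = by_id.get(comp.head_id)
        if head is not None and head.pos == "VERB" and "VerbForm=Part" in head.features:
            hits.append(head)
    return hits


# (ID, description, difficulty, matcher), following the table columns on the slide.
RULES = [
    (222, "Auxiliary verb 'haben', present indicative", 1, rule_222),
    (240, "Composed forms: Perfect indicative", 1, rule_240),
]


def identify_rules(tokens):
    """Return (ID, description, difficulty) for every rule found in the sentence."""
    return [(rid, desc, dif) for rid, desc, dif, matcher in RULES if matcher(tokens)]


rolf = [
    Token(1, "Rolf", "PROPN", "nsubj", 4),
    Token(2, "hat", "AUX", "aux", 4, frozenset({"Mood=Ind", "VerbForm=Fin"})),
    Token(3, "Glück", "NOUN", "obj", 4),
    Token(4, "gehabt", "VERB", "root", 0, frozenset({"VerbForm=Part"})),
    Token(5, ".", "PUNCT", "punct", 4),
]

for rule in identify_rules(rolf):
    print(rule)
```

Keeping the ID, description and difficulty next to each matcher mirrors the table layout, so the output of a match is directly a list of grammar rules with their difficulty.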

  18. Dictionaries
      For string matches, lemma matches, affix matches:
      • Dictionary of 15K German words from 117K corpus of children’s texts
      • Every word has orthographic, phonological, morphological information
      NB: Word matches should be used sparingly; dependencies are favoured.
