Identifying Grammar Rules for Language Education with Dependency Parsing in German
Eleni Metheniti, Pomi Park, Kristina Kolesova, Günter Neumann
August 27, 2019
Depling – SyntaxFest
Why identify grammar rules?
• Our larger mission: find appropriate texts for first-language learners in primary school education (in German and other languages...)
• Text difficulty depends on many aspects: sentence length, word frequency, syntax...
• How can we determine the difficulty of a sentence from the grammar rules it contains (syntax + morphosyntax)?
• And how do we find grammar rules in text in the first place?
Metheniti et al. (2019)
Our approach
• Create a new query language that translates grammar rules into syntactic/morphosyntactic patterns
• Build patterns for age-appropriate German syntactic phenomena and assign each one a difficulty level
• Build a matching algorithm (input: parses + patterns; output: grammar rules)
• Evaluate the matches on gold parses and on parses from 4 parsers (Munderline, UDPipe, jPTDP, Turku)
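The input/output contract of the matching step can be sketched as below. This is a minimal Python sketch, not the authors' implementation: it assumes the parses arrive as CoNLL-U (the format UDPipe writes), stores each token as a plain dict, and treats a pattern as a predicate over one sentence. The rule ID `demo-has-obj` and its difficulty are made up for the demo.

```python
from collections.abc import Callable

Token = dict  # keys: id, form, lemma, pos, feats, head, label
Pattern = Callable[[list], bool]  # hypothetical: a pattern is a predicate over one sentence


def read_conllu(text: str) -> list[list[Token]]:
    """Split CoNLL-U text into sentences of token dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():  # a blank line ends a sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):  # sentence-level comments
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:  # skip multiword ranges and empty nodes
            continue
        feats = {} if cols[5] == "_" else dict(f.split("=", 1) for f in cols[5].split("|"))
        current.append({"id": int(cols[0]), "form": cols[1], "lemma": cols[2],
                        "pos": cols[3], "feats": feats,
                        "head": int(cols[6]), "label": cols[7]})
    if current:
        sentences.append(current)
    return sentences


def find_rules(sentence: list[Token], patterns: dict) -> list[tuple[str, int]]:
    """Input: one parse + all patterns. Output: (rule ID, difficulty) of every match."""
    return [(rid, dif) for rid, (dif, pattern) in patterns.items() if pattern(sentence)]


# Hand-written parse of "Er liebt Maria ." (features trimmed for brevity).
SAMPLE = "\n".join("\t".join(row) for row in [
    ["1", "Er", "er", "PRON", "_", "PronType=Prs", "2", "nsubj", "_", "_"],
    ["2", "liebt", "lieben", "VERB", "_", "Mood=Ind|VerbForm=Fin", "0", "root", "_", "_"],
    ["3", "Maria", "Maria", "PROPN", "_", "_", "2", "obj", "_", "_"],
    ["4", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"],
])

# Toy pattern set; real patterns would be compiled from the query language.
PATTERNS = {"demo-has-obj": (1, lambda s: any(t["label"] == "obj" for t in s))}

for sentence in read_conllu(SAMPLE):
    print(find_rules(sentence, PATTERNS))  # [('demo-has-obj', 1)]
```

Keeping the pattern as an opaque predicate means the reader and the rule loop stay the same however the pattern language itself is evaluated.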
Our main goals
• Very restrictive patterns that do not match erroneously or overzealously
• Descriptive, human-readable patterns
• Search on dependency parses (not just string methods or regex!)
• Fast, and successful on the output of our dependency parser
Why create a new query language?
• Most existing text query languages (ANNIS, Poliqarp, COSMAS II) do not support dependency parses, rely on regex (too difficult) or match meaningless strings...
• PML-TQ (Pajas and Štěpánek, 2009): very robust, but too complex
• TüNDRA (Martens, 2012): almost perfect... but not quite (uses TIGER annotation, allows surface structure)
Query language template
head_word: POS={A,B} & label={c} & feature={d,e} & lemma={‘e’} & wordform={‘f’,‘g’} & wordform={h-,i-} & wordform={-j-},
comp_word: POS={A,B} & label={c} & feature={d,e} & lemma={‘e’} & wordform={‘f’,‘g’} & wordform={h-,i-} & wordform={-j-},
tokenID(head_word) = headID(comp_word)
Figure 1: General template for a pattern with a head-dependent relation.
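A pattern element written in this syntax could be read into data roughly as follows. This sketch assumes values are comma-separated inside `{...}`, constraints are joined with `&`, and the relation line `tokenID(head_word) = headID(comp_word)` carries no constraint of its own; `parse_constraints` and `parse_element` are hypothetical names.

```python
import re

# One constraint: key={value1,value2,...}, e.g. POS={A,B} or feature={d,e}.
CONSTRAINT = re.compile(r"(\w+)\s*=\s*\{([^}]*)\}")
QUOTES = "'\"‘’“”"  # straight and typographic quotes around word values


def parse_constraints(text: str) -> list[tuple[str, list[str]]]:
    """'POS={A,B} & label={c}' -> [('POS', ['A', 'B']), ('label', ['c'])].

    A list of pairs (not a dict) keeps repeated keys such as the three
    wordform constraints in the template.
    """
    return [(key, [v.strip().strip(QUOTES) for v in body.split(",") if v.strip()])
            for key, body in CONSTRAINT.findall(text)]


def parse_element(text: str) -> dict[str, list[tuple[str, list[str]]]]:
    """Split an element into its head_word and comp_word constraints.

    The relation tokenID(head_word) = headID(comp_word) contains no '{',
    so CONSTRAINT never captures it.
    """
    parts = re.split(r"(head_word|comp_word)\s*:", text)
    # parts == [prefix, role, body, role, body, ...]
    return {role: parse_constraints(body) for role, body in zip(parts[1::2], parts[2::2])}


figure2 = ("head_word: POS={NOUN}, "
           "comp_word: POS={DET} & label={det} & feature={Definite=Def,PronType=Art}, "
           "tokenID(head_word) = headID(comp_word)")
print(parse_element(figure2))
```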
Example 1: NP with definite determiner
Sentence: die Katze (“the cat”)
• die (“the”): POS=DET, label=det, Definite=Def | PronType=Art | ...; dependent of Katze
• Katze (“cat”): POS=NOUN
head_word: POS={NOUN},
comp_word: POS={DET} & label={det} & feature={Definite=Def,PronType=Art},
tokenID(head_word) = headID(comp_word)
Figure 2: Pattern to identify a noun phrase with a definite article.
Match!
Example 1: NP with definite determiner (complex pattern)
Sentence: eine Katze (“a cat”)
• eine (“a”): POS=DET, label=det, Definite=Ind | PronType=Art | ...; dependent of Katze
• Katze (“cat”): POS=NOUN
head_word: POS={NOUN},
comp_word: POS={DET} & label={det} & feature={Definite=Def,PronType=Art},
tokenID(head_word) = headID(comp_word)
Figure 3: Pattern to identify a noun phrase with a definite article.
No match.
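Matching the Figure 2 pattern against both noun phrases can be sketched like this. Assumptions not confirmed by the slides: alternatives inside `{...}` are ORed for POS, label, lemma and wordform, while all listed features must hold (which is what makes eine fail on Definite=Ind); `x-`, `-x` and `-x-` are read as prefix, suffix and infix.

```python
Token = dict  # keys: id, form, lemma, pos, feats, head, label
Constraints = list[tuple[str, list[str]]]


def form_matches(form: str, value: str) -> bool:
    """Assumed notation: 'x-' prefix, '-x' suffix, '-x-' infix, otherwise the exact word."""
    form = form.lower()
    if len(value) > 2 and value.startswith("-") and value.endswith("-"):
        return value[1:-1] in form
    if value.endswith("-"):
        return form.startswith(value[:-1])
    if value.startswith("-"):
        return form.endswith(value[1:])
    return form == value.lower()


def token_satisfies(t: Token, constraints: Constraints) -> bool:
    """Alternatives inside {...} are ORed, except features: all listed features must hold."""
    for key, values in constraints:
        if key == "feature":
            pairs = (f.split("=", 1) for f in values)
            if not all(t["feats"].get(k) == v for k, v in pairs):
                return False
        elif key == "wordform":
            if not any(form_matches(t["form"], v) for v in values):
                return False
        elif t[{"POS": "pos"}.get(key, key)] not in values:  # POS, label, lemma
            return False
    return True


def match_relation(sentence, head_c: Constraints, comp_c: Constraints):
    """All (head, dependent) ID pairs with tokenID(head_word) = headID(comp_word)."""
    return [(h["id"], c["id"]) for h in sentence for c in sentence
            if c["head"] == h["id"] and token_satisfies(h, head_c) and token_satisfies(c, comp_c)]


def tok(i, form, pos, label, head, feats=""):
    parsed = dict(f.split("=", 1) for f in feats.split("|")) if feats else {}
    return {"id": i, "form": form, "lemma": form, "pos": pos,
            "label": label, "head": head, "feats": parsed}


# The Figure 2 / Figure 3 pattern as (head_word, comp_word) constraints.
DEF_NP = ([("POS", ["NOUN"])],
          [("POS", ["DET"]), ("label", ["det"]), ("feature", ["Definite=Def", "PronType=Art"])])

die_katze = [tok(1, "die", "DET", "det", 2, "Definite=Def|PronType=Art"),
             tok(2, "Katze", "NOUN", "root", 0)]
eine_katze = [tok(1, "eine", "DET", "det", 2, "Definite=Ind|PronType=Art"),
              tok(2, "Katze", "NOUN", "root", 0)]
print(match_relation(die_katze, *DEF_NP))   # [(2, 1)]
print(match_relation(eine_katze, *DEF_NP))  # []
```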
Example 2: Definite determiner (simple patterns)
head_word: POS={DET} & label={det} & feature={Definite=Def,PronType=Art}
Figure 4: Pattern to identify a definite determiner.
• die (“the”): POS=DET, label=det, Definite=Def | PronType=Art | ...
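A simple pattern has only a head_word element, so matching reduces to filtering the tokens of a sentence. A compact sketch with a hypothetical `match_simple` helper; the test sentence is hand-parsed and its features are trimmed to the ones the pattern uses.

```python
def sat(t, constraints):
    """Alternatives inside {...} are ORed; for 'feature', all listed features must hold."""
    for key, values in constraints:
        if key == "feature":
            if any(t["feats"].get(k) != v for k, v in (f.split("=", 1) for f in values)):
                return False
        elif t[{"POS": "pos", "wordform": "form"}.get(key, key)] not in values:
            return False
    return True


def match_simple(sentence, head_c):
    """A simple pattern has no comp_word: return the IDs of all matching tokens."""
    return [t["id"] for t in sentence if sat(t, head_c)]


DEF_DET = [("POS", ["DET"]), ("label", ["det"]), ("feature", ["Definite=Def", "PronType=Art"])]

# die Katze schläft . ("the cat sleeps")
sentence = [
    {"id": 1, "form": "die", "pos": "DET", "label": "det", "head": 2,
     "feats": {"Definite": "Def", "PronType": "Art"}},
    {"id": 2, "form": "Katze", "pos": "NOUN", "label": "nsubj", "head": 3, "feats": {}},
    {"id": 3, "form": "schläft", "pos": "VERB", "label": "root", "head": 0, "feats": {}},
    {"id": 4, "form": ".", "pos": "PUNCT", "label": "punct", "head": 3, "feats": {}},
]
print(match_simple(sentence, DEF_DET))  # [1]
```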
Example 3a: Transitive sentence (compound patterns)
Sentence: er liebt Maria . (“he loves Mary”)
• er (“he”): POS=PRON, label=nsubj, feature=...
• liebt (“loves”): POS=VERB, label=root
• Maria (“Mary”): POS=PROPN, label=obj
• . : POS=PUNCT, label=punct
head_word: POS={VERB} & label={root},
comp_word: label={nsubj},
tokenID(head_word) = headID(comp_word)
AND
head_word: POS={VERB} & label={root},
comp_word: label={obj},
tokenID(head_word) = headID(comp_word)
Match!
Example 3b: Reflexive sentence
Sentence: ich wasche mich . (“I wash myself”)
• ich (“I”): POS=PRON, label=nsubj
• wasche (“wash”): POS=VERB, label=root
• mich (“myself”): POS=PRON, label=obj, PronType=Prs | ... | Reflex=Yes | ...
• . : POS=PUNCT, label=punct
Same pattern as in Example 3a: also a match... although this is not a simple transitive sentence.
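A compound pattern joins elements with AND. In this sketch each element only has to be found somewhere in the sentence; whether both head_word elements must bind the same token is left open as an assumption (with label={root} there is only one candidate anyway). It reproduces the problem above: the reflexive sentence matches too.

```python
def sat(t, constraints):
    """Alternatives inside {...} are ORed; all listed features must hold."""
    for key, values in constraints:
        if key == "feature":
            if any(t["feats"].get(k) != v for k, v in (f.split("=", 1) for f in values)):
                return False
        elif t[{"POS": "pos"}.get(key, key)] not in values:
            return False
    return True


def element_matches(sentence, element):
    """True if some pair satisfies the element with tokenID(head_word) = headID(comp_word)."""
    head_c, comp_c = element
    return any(c["head"] == h["id"] and sat(h, head_c) and sat(c, comp_c)
               for h in sentence for c in sentence)


def compound_matches(sentence, elements):
    """AND: every element must be found (head tokens are not forced to be identical)."""
    return all(element_matches(sentence, el) for el in elements)


def tok(i, form, pos, label, head, feats=None):
    return {"id": i, "form": form, "pos": pos, "label": label, "head": head, "feats": feats or {}}


ROOT_VERB = [("POS", ["VERB"]), ("label", ["root"])]
TRANSITIVE = [(ROOT_VERB, [("label", ["nsubj"])]),
              (ROOT_VERB, [("label", ["obj"])])]

er_liebt_maria = [tok(1, "er", "PRON", "nsubj", 2), tok(2, "liebt", "VERB", "root", 0),
                  tok(3, "Maria", "PROPN", "obj", 2), tok(4, ".", "PUNCT", "punct", 2)]
ich_wasche_mich = [tok(1, "ich", "PRON", "nsubj", 2), tok(2, "wasche", "VERB", "root", 0),
                   tok(3, "mich", "PRON", "obj", 2, {"PronType": "Prs", "Reflex": "Yes"}),
                   tok(4, ".", "PUNCT", "punct", 2)]
print(compound_matches(er_liebt_maria, TRANSITIVE))   # True
print(compound_matches(ich_wasche_mich, TRANSITIVE))  # True (the unwanted match)
```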
Exclude operator
• Distinguish similar patterns by excluding the parts of the sentences that should not match (e.g. no ditransitive sentences, no reflexive sentences)
• Example: a pattern that only matches simple monotransitive sentences
head_word: POS={VERB} & label={root},
comp_word: label={nsubj},
tokenID(head_word) = headID(comp_word)
AND
head_word: POS={VERB} & label={root},
comp_word: label={obj},
tokenID(head_word) = headID(comp_word)
AND
head_word: POS={VERB} & label={root},
∼(comp_word: label={iobj},
tokenID(head_word) = headID(comp_word))
AND
head_word: POS={VERB} & label={root},
∼(comp_word: label={obj,iobj} & feature={PronType=Prs,Reflex=Yes},
tokenID(head_word) = headID(comp_word))
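The exclude operator can be sketched as a flag on each part: a ∼ part must not be found. This encoding is an assumption, not the paper's data structure; er gibt Maria Blumen (“he gives Mary flowers”) is a hand-parsed ditransitive test sentence added for the demo.

```python
def sat(t, constraints):
    """Alternatives inside {...} are ORed; all listed features must hold."""
    for key, values in constraints:
        if key == "feature":
            if any(t["feats"].get(k) != v for k, v in (f.split("=", 1) for f in values)):
                return False
        elif t[{"POS": "pos"}.get(key, key)] not in values:
            return False
    return True


def element_matches(sentence, element):
    head_c, comp_c = element
    return any(c["head"] == h["id"] and sat(h, head_c) and sat(c, comp_c)
               for h in sentence for c in sentence)


def pattern_matches(sentence, pattern):
    """Each part is (excluded, element); an excluded (∼) part must NOT be found."""
    return all(element_matches(sentence, el) != excluded for excluded, el in pattern)


def tok(i, form, pos, label, head, feats=None):
    return {"id": i, "form": form, "pos": pos, "label": label, "head": head, "feats": feats or {}}


ROOT_VERB = [("POS", ["VERB"]), ("label", ["root"])]
MONO_TRANSITIVE = [
    (False, (ROOT_VERB, [("label", ["nsubj"])])),
    (False, (ROOT_VERB, [("label", ["obj"])])),
    (True, (ROOT_VERB, [("label", ["iobj"])])),
    (True, (ROOT_VERB, [("label", ["obj", "iobj"]),
                        ("feature", ["PronType=Prs", "Reflex=Yes"])])),
]

er_liebt_maria = [tok(1, "er", "PRON", "nsubj", 2), tok(2, "liebt", "VERB", "root", 0),
                  tok(3, "Maria", "PROPN", "obj", 2), tok(4, ".", "PUNCT", "punct", 2)]
ich_wasche_mich = [tok(1, "ich", "PRON", "nsubj", 2), tok(2, "wasche", "VERB", "root", 0),
                   tok(3, "mich", "PRON", "obj", 2, {"PronType": "Prs", "Reflex": "Yes"}),
                   tok(4, ".", "PUNCT", "punct", 2)]
er_gibt_maria_blumen = [tok(1, "er", "PRON", "nsubj", 2), tok(2, "gibt", "VERB", "root", 0),
                        tok(3, "Maria", "PROPN", "iobj", 2), tok(4, "Blumen", "NOUN", "obj", 2),
                        tok(5, ".", "PUNCT", "punct", 2)]
for s in (er_liebt_maria, ich_wasche_mich, er_gibt_maria_blumen):
    print(pattern_matches(s, MONO_TRANSITIVE))  # True, then False, then False
```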
Example 3a, revised
Sentence: Rolf hat Glück gehabt . (“Rolf has had luck”)
• Rolf (“Rolf”): POS=PROPN, label=nsubj
• hat (“has”): POS=AUX, label=aux
• Glück (“luck”): POS=NOUN, label=obj, feature=...
• gehabt (“had”): POS=VERB, label=root
• . : POS=PUNCT, label=punct
The monotransitive pattern from the exclude-operator example, matched part by part:
• Part 1: head_word: POS={VERB} & label={root}, comp_word: label={nsubj}, tokenID(head_word) = headID(comp_word) → Match!
• Part 2: head_word: POS={VERB} & label={root}, comp_word: label={obj}, tokenID(head_word) = headID(comp_word) → Match!
• Part 3: head_word: POS={VERB} & label={root}, ∼(comp_word: label={iobj}, tokenID(head_word) = headID(comp_word)) → comp_word not found → No match.
• Part 4: head_word: POS={VERB} & label={root}, ∼(comp_word: label={obj,iobj} & feature={PronType=Prs,Reflex=Yes}, tokenID(head_word) = headID(comp_word)) → Partial match (Glück is an obj, but without these features) → No match.
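The four-part walkthrough can be reproduced with a trace that reports a status per part. How a "partial match" is defined here (a dependent of the head satisfies at least one, but not all, comp_word constraints) is an assumption chosen to mirror Part 4.

```python
def sat(t, constraints):
    """Alternatives inside {...} are ORed; all listed features must hold."""
    for key, values in constraints:
        if key == "feature":
            if any(t["feats"].get(k) != v for k, v in (f.split("=", 1) for f in values)):
                return False
        elif t[{"POS": "pos"}.get(key, key)] not in values:
            return False
    return True


def trace_part(sentence, element):
    """Status of one part: 'match', 'partial match' or 'comp_word not found'."""
    head_c, comp_c = element
    # dependents of every token that satisfies head_word
    deps = [c for h in sentence if sat(h, head_c) for c in sentence if c["head"] == h["id"]]
    if any(sat(c, comp_c) for c in deps):
        return "match"
    if any(sat(c, [constraint]) for c in deps for constraint in comp_c):
        return "partial match"  # some, but not all, comp_word constraints hold
    return "comp_word not found"


def trace(sentence, pattern):
    """Per part: (status, satisfied?). A ∼ part is satisfied unless it fully matches."""
    rows = []
    for excluded, element in pattern:
        status = trace_part(sentence, element)
        rows.append((status, (status == "match") != excluded))
    return rows


def tok(i, form, pos, label, head, feats=None):
    return {"id": i, "form": form, "pos": pos, "label": label, "head": head, "feats": feats or {}}


ROOT_VERB = [("POS", ["VERB"]), ("label", ["root"])]
MONO_TRANSITIVE = [
    (False, (ROOT_VERB, [("label", ["nsubj"])])),
    (False, (ROOT_VERB, [("label", ["obj"])])),
    (True, (ROOT_VERB, [("label", ["iobj"])])),
    (True, (ROOT_VERB, [("label", ["obj", "iobj"]),
                        ("feature", ["PronType=Prs", "Reflex=Yes"])])),
]

rolf = [tok(1, "Rolf", "PROPN", "nsubj", 4), tok(2, "hat", "AUX", "aux", 4),
        tok(3, "Glück", "NOUN", "obj", 4),
        tok(4, "gehabt", "VERB", "root", 0, {"VerbForm": "Part"}),
        tok(5, ".", "PUNCT", "punct", 4)]
for part, row in enumerate(trace(rolf, MONO_TRANSITIVE), start=1):
    print(part, row)
```

Every part reports True in the second position, so under these assumptions the sentence counts as a simple monotransitive clause.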
Matching process
135 patterns for syntactic/morphosyntactic German grammar rules (excerpt):
ID | Description | Dif. | Pattern
222 | Auxiliary verb “haben”, present indicative | 1 | head_word: POS={AUX} & wordform={“hab”,“habe”,“hast”,“hat”,“haben”} & feature={Mood=Ind,VerbForm=Fin}
240 | Composed forms: Perfect indicative | 1 | comp_word: {<222>,<218>}, head_word: POS={VERB} & feature={VerbForm=Part}, tokenID(head_word)=headID(comp_word)
289 | Simple clause with intransitive verb, with auxiliary verb | 1 | (comp_word: label={nsubj}, head_word: POS={VERB} & label={root}, tokenID(head_word)=headID(comp_word)) AND head_word: POS={VERB} & label={root} & feature={VerbForm=Part}, (comp_word: POS={AUX} & label={aux}, tokenID(head_word)=headID(comp_word)) AND ∼(head_word: label={obj}) AND ∼(head_word: label={iobj}) AND ∼(head_word: POS={PUNCT} & wordform={“?”}) AND ∼(head_word: feature={Mood=Imp} & label={root})
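Pattern 240 reuses other patterns through {<222>,<218>}. A sketch of that reference mechanism, assuming a hypothetical `ref` key that points into a registry of single-token patterns. Pattern 218 is referenced but not shown, so it is deliberately left out, and only forms of “haben” are recognized here.

```python
def sat(t, constraints, registry):
    """ORed alternatives, all features required, plus a hypothetical 'ref' key."""
    for key, values in constraints:
        if key == "ref":  # the token must itself satisfy one of the listed pattern IDs
            if not any(rid in registry and sat(t, registry[rid], registry) for rid in values):
                return False
        elif key == "feature":
            if any(t["feats"].get(k) != v for k, v in (f.split("=", 1) for f in values)):
                return False
        elif t[{"POS": "pos", "wordform": "form"}.get(key, key)] not in values:
            return False
    return True


REGISTRY = {
    # 222: auxiliary "haben", present indicative (single-token pattern from the table)
    "222": [("POS", ["AUX"]), ("wordform", ["hab", "habe", "hast", "hat", "haben"]),
            ("feature", ["Mood=Ind", "VerbForm=Fin"])],
    # 218 is referenced by 240 but not shown, so it is absent here
}

# 240: comp_word: {<222>,<218>}, head_word: POS={VERB} & feature={VerbForm=Part}
PERFECT = ([("POS", ["VERB"]), ("feature", ["VerbForm=Part"])], [("ref", ["222", "218"])])


def matches_240(sentence):
    head_c, comp_c = PERFECT
    return any(c["head"] == h["id"] and sat(h, head_c, REGISTRY) and sat(c, comp_c, REGISTRY)
               for h in sentence for c in sentence)


def tok(i, form, pos, label, head, feats=None):
    return {"id": i, "form": form, "pos": pos, "label": label, "head": head, "feats": feats or {}}


er_hat_gelacht = [tok(1, "er", "PRON", "nsubj", 3),
                  tok(2, "hat", "AUX", "aux", 3, {"Mood": "Ind", "VerbForm": "Fin"}),
                  tok(3, "gelacht", "VERB", "root", 0, {"VerbForm": "Part"}),
                  tok(4, ".", "PUNCT", "punct", 3)]
er_hatte_gelacht = [tok(1, "er", "PRON", "nsubj", 3),
                    tok(2, "hatte", "AUX", "aux", 3, {"Mood": "Ind", "Tense": "Past", "VerbForm": "Fin"}),
                    tok(3, "gelacht", "VERB", "root", 0, {"VerbForm": "Part"}),
                    tok(4, ".", "PUNCT", "punct", 3)]
print(matches_240(er_hat_gelacht))    # True: perfect indicative
print(matches_240(er_hatte_gelacht))  # False: "hatte" is not in pattern 222's word list
```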
Dictionaries
For string matches, lemma matches and affix matches:
• Dictionary of 15K German words from a 117K corpus of children’s texts
• Every word has orthographic, phonological and morphological information
NB: Word matches should be used sparingly; dependencies are favoured.
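One reason to route affix matches through a dictionary rather than raw strings is that a morphological segmentation separates a real prefix from a look-alike. The entries below are hypothetical stand-ins; the fields of the real 15K-word dictionary are not shown on the slide.

```python
from typing import Optional

# Hypothetical dictionary entries: lemma plus morpheme segmentation.
LEXICON = {
    "gehabt": {"lemma": "haben", "morphemes": ["ge", "hab", "t"]},
    "ungesund": {"lemma": "ungesund", "morphemes": ["un", "gesund"]},
    "unter": {"lemma": "unter", "morphemes": ["unter"]},
}


def lemma_of(word: str) -> Optional[str]:
    entry = LEXICON.get(word.lower())
    return entry["lemma"] if entry else None


def has_prefix(word: str, prefix: str) -> bool:
    """Affix match via the segmentation; plain string test only for unknown words."""
    entry = LEXICON.get(word.lower())
    if entry is None:
        return word.lower().startswith(prefix)
    morphemes = entry["morphemes"]
    return len(morphemes) > 1 and morphemes[0] == prefix


print(has_prefix("ungesund", "un"))  # True
print(has_prefix("unter", "un"))     # False, although "unter".startswith("un") is True
print(lemma_of("gehabt"))            # haben
```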