Multiword Expression Identification with Tree Substitution Grammars
Spence Green, Marie-Catherine de Marneffe, John Bauer, and Christopher D. Manning
Stanford University
EMNLP 2011
Main Idea
Use syntactic context to find multiword expressions
◮ Syntactic context → constituency parses
◮ Multiword expressions → idiomatic constructions
Which languages?
Results and analysis for French
◮ Lexicographic tradition of compiling MWE lists
◮ Annotated data!
English examples in the talk
Motivating Example: Humans get this
1. He kicked the pail.
2. He kicked the bucket.
◮ “He died.” (Katz and Postal 1963)
Stanford parser can’t tell the difference
[Two identical parse structures: (S (NP He) (VP kicked (NP the pail))) and (S (NP He) (VP kicked (NP the bucket)))]
What does the lexicon contain?
[Parse: (S (NP He) (VP kicked (NP the bucket)))]
Single-word entries?
◮ kick : <agent, theme>
◮ die : <theme>
Multi-word entries?
◮ kick the bucket : <theme>
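To make the contrast concrete, here is a minimal sketch (mine, not the talk’s) of a lexicon that stores argument frames for both single-word and multiword entries; the entries and frames mirror the examples above.

# Illustrative toy lexicon: single- and multi-word keys, each mapped to an argument frame.
LEXICON = {
    "kick": ["agent", "theme"],         # literal transitive verb
    "die": ["theme"],                    # intransitive
    "kick the bucket": ["theme"],       # idiomatic MWE entry, same frame as "die"
}

def lookup(tokens):
    """Prefer the longest entry matching a prefix of the (lemmatized) token span."""
    for end in range(len(tokens), 0, -1):
        candidate = " ".join(tokens[:end])
        if candidate in LEXICON:
            return candidate, LEXICON[candidate]
    return None

print(lookup(["kick", "the", "bucket"]))   # ('kick the bucket', ['theme'])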
Lexicon-Grammar: He kicked the bucket
[Two parses: (S (NP He) (VP died)) and (S (NP He) (VP (MWV kicked the bucket)))]
(Gross 1986)
MWEs in Lexicon-Grammar
◮ Classified by global POS
◮ Described by internal POS sequence
[Flat structure: (MWV (VBD kicked) (DT the) (NN bucket))]
Flat structures!
Of theoretical interest but...
Why do we care (in NLP)?
MWE knowledge improves:
◮ Dependency parsing (Nivre and Nilsson 2004)
◮ Constituency parsing (Arun and Keller 2005)
◮ Sentence generation (Hogan et al. 2007)
◮ Machine translation (Carpuat and Diab 2010)
◮ Shallow parsing (Korkontzelos and Manandhar 2010)
Most experiments assume high-accuracy identification!
French and the French Treebank
MWEs common in French
◮ ∼5,000 multiword adverbs
Paris 7 French Treebank
◮ ∼16,000 trees
◮ 13% of tokens are MWEs
[Example: (MWC (P sous) (N prétexte) (C que)) ‘on the grounds that’]
French Treebank: MWE types
[Bar chart: % of total MWEs (0–50) by global POS category (N, P, ADV, D, V, C, CL, PRO, ET, ...)]
Lots of nominal compounds, e.g. N–N numéro deux
MWE Identification Evaluation
Identification is a by-product of parsing
◮ Corpus: Paris 7 French Treebank (FTB)
◮ Split: same as (Crabbé and Candito 2008)
◮ Metrics: Precision and Recall (scoring sketch below)
◮ Sentence lengths ≤ 40 words
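As a concrete reference for the scores that follow, a minimal scoring sketch (mine, not the talk’s evaluation script), assuming gold and predicted MWEs are given as sets of (sentence id, start, end, label) tuples.

def prf1(gold, pred):
    """Precision/recall/F1 over sets of MWE spans, e.g. (sent_id, start, end, label)."""
    tp = len(gold & pred)                       # spans with the right extent and label
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

gold = {(0, 2, 5, "MWADV"), (1, 0, 3, "MWN")}
pred = {(0, 2, 5, "MWADV"), (1, 1, 3, "MWN")}
print(prf1(gold, pred))   # (0.5, 0.5, 0.5)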
MWE Identification: Parent-Annotated PCFG
[Bar chart: PA-PCFG F1 = 32.6]
MWE Identification: n-gram methods
[Bar chart: PA-PCFG F1 = 32.6, mwetoolkit F1 = 34.7]
Standard approach in the 2008 MWE Shared Task, MWE Workshops, etc.
n-gram methods: mwetoolkit
Based on surface statistics
Step 1: Lemmatize and POS tag the corpus
Step 2: Compute n-gram association statistics:
◮ Maximum likelihood estimator
◮ Dice’s coefficient
◮ Pointwise mutual information
◮ Student’s t-score
(Ramisch, Villavicencio, and Boitet 2010)
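For illustration, a minimal sketch (not mwetoolkit’s actual code) of these bigram association measures computed from raw counts; c_xy is the bigram count, c_x and c_y the unigram counts, and N the corpus size, all made up for the example.

import math

def association_scores(c_xy, c_x, c_y, N):
    """Bigram association measures over raw counts (illustrative formulas)."""
    p_xy, p_x, p_y = c_xy / N, c_x / N, c_y / N
    mle = p_xy                                      # maximum likelihood estimate of the bigram
    dice = 2 * c_xy / (c_x + c_y)                   # Dice's coefficient
    pmi = math.log2(p_xy / (p_x * p_y))             # pointwise mutual information
    expected = c_x * c_y / N                        # expected count under independence
    t_score = (c_xy - expected) / math.sqrt(c_xy)   # Student's t-score approximation
    return {"mle": mle, "dice": dice, "pmi": pmi, "t": t_score}

# e.g. a bigram seen 40 times in a 1M-token corpus (made-up counts)
print(association_scores(c_xy=40, c_x=900, c_y=60, N=1_000_000))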
n-gram methods: mwetoolkit
Step 3: Create n-gram feature vectors
Step 4: Train a binary classifier
Exploits the statistical idiomaticity of MWEs
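Continuing the sketch above: the association scores for each candidate n-gram become a feature vector, and a binary classifier separates MWEs from free combinations. This uses scikit-learn and made-up numbers for brevity; mwetoolkit’s own pipeline differs in detail.

import numpy as np
from sklearn.linear_model import LogisticRegression

# One row per candidate n-gram: [mle, dice, pmi, t-score] features from the
# previous sketch; labels: 1 = annotated MWE, 0 = free combination.
X_train = np.array([
    [4e-5, 0.08, 9.2, 6.3],    # idiomatic bigram        -> MWE
    [1e-5, 0.01, 2.1, 1.4],    # compositional bigram    -> free combination
    [3e-5, 0.06, 8.7, 5.9],
    [2e-5, 0.02, 1.5, 1.1],
])
y_train = np.array([1, 0, 1, 0])

clf = LogisticRegression().fit(X_train, y_train)
print(clf.predict([[3.5e-5, 0.07, 9.0, 6.0]]))   # -> [1], classified as an MWE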
Is statistical idiomaticity sufficient?
French multiword verbs can be discontinuous
[Tree: a VN dominating the multiword verb va ... bon train (MWV), interrupted by the multiword adverb d’ailleurs (MWADV); gloss: ‘is also well underway’]
The tree maintains the relationship between the MWV parts
Recap: French MWE Identification Baselines
[Bar chart: PA-PCFG F1 = 32.6, mwetoolkit F1 = 34.7]
Let’s build a better grammar
Better PCFGs: Manual grammar splits
Symbol refinement à la (Klein and Manning 2003)
◮ Has a verbal nucleus (VN)
[Example: (COORD (C Ou) (ADV bien) (VN doit -il) ...) ‘Otherwise he must’, relabeled as COORD-hasVN]
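A minimal sketch of this kind of symbol refinement (my own illustration using nltk’s Tree; the actual splits live inside the parser’s grammar): relabel any constituent with a VN child. The talk’s exact definition of the feature may differ.

from nltk import Tree

def mark_has_vn(tree):
    """Append '-hasVN' to the label of any constituent that has a VN child."""
    for subtree in tree.subtrees():
        if any(isinstance(child, Tree) and child.label() == "VN"
               for child in subtree):
            subtree.set_label(subtree.label() + "-hasVN")
    return tree

t = Tree.fromstring("(COORD (C Ou) (ADV bien) (VN (V doit) (CL -il)))")
print(mark_has_vn(t))
# (COORD-hasVN (C Ou) (ADV bien) (VN (V doit) (CL -il)))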
French MWE Identification: Manual Splits
[Bar chart: PA-PCFG F1 = 32.6, mwetoolkit F1 = 34.7, Splits F1 = 63.1]
MWE features: high-frequency POS sequences
Capture more syntactic context?
PCFGs work well!
Larger “rules”: Tree Substitution Grammars (TSG)
Relationship with Data-Oriented Parsing (DOP):
◮ Same grammar formalism (TSG)
◮ We include unlexicalized fragments
◮ Different parameter estimation
Which tree fragments do we select?
[The tree (S (NP (N He)) (VP (MWV (V kicked) (D the) (N bucket)))) is segmented into fragments, e.g. (NP (N He)), (V kicked), and (MWV (V ) (D the) (N bucket)) with V left as an open substitution site]
TSG Grammar Extraction as Tree Selection
[Selected fragment: (MWV (V ) (D the) (N bucket)), with V as an open substitution site]
◮ Describes the MWE context
◮ Allows for inflection: kick, kicked, kicking
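To make the fragment idea concrete, a small sketch (my own representation, not the paper’s code): a fragment is a nested tuple whose frontier nonterminals are open substitution sites, and it matches a subtree if all of its internal structure is present. The nonterminal inventory here is just for the example.

# A tree is (label, child, child, ...); a leaf is a plain string.
# In a fragment, a bare nonterminal string at the frontier is a substitution site.
NONTERMINALS = {"S", "NP", "VP", "MWV", "V", "D", "N"}

def matches(fragment, tree):
    """True if `fragment` matches the top of `tree`, leaving substitution sites open."""
    if isinstance(fragment, str):
        if fragment in NONTERMINALS:                       # open substitution site
            return isinstance(tree, tuple) and tree[0] == fragment
        return fragment == tree                            # terminal must match exactly
    if not isinstance(tree, tuple) or fragment[0] != tree[0]:
        return False
    if len(fragment) != len(tree):
        return False
    return all(matches(f, t) for f, t in zip(fragment[1:], tree[1:]))

tree = ("MWV", ("V", "kicked"), ("D", "the"), ("N", "bucket"))
fragment = ("MWV", "V", ("D", "the"), ("N", "bucket"))     # V left open: kick/kicked/kicking
print(matches(fragment, tree))   # True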
Dirichlet process TSG (DP-TSG)
Tree selection as non-parametric clustering¹
Labeled Chinese Restaurant process
◮ Dirichlet process (DP) prior for each non-terminal type c
Supervised case: segment the treebank
¹ Cohn, Goldwater, and Blunsom 2009; Post and Gildea 2009; O’Donnell, Tenenbaum, and Goodman 2009.
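For reference, the standard DP posterior predictive underlying this kind of Chinese Restaurant process (notation mine, not from the slides): for fragments rooted in nonterminal c, with concentration parameter α_c and base distribution P_0,

p(e \mid \mathbf{e}^{-}, c) \;=\; \frac{n^{-}(e) + \alpha_c \, P_0(e \mid c)}{n^{-}(\cdot) + \alpha_c}

where n⁻(e) is how often fragment e has already been selected elsewhere in the treebank and n⁻(·) is the total count of fragments rooted in c. The base distribution comes from the manually-split CFG, as the next slide notes.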
DP-TSG: Learning and Inference
DP base distribution from the manually-split CFG
Type-based Gibbs sampler (Liang, Jordan, and Klein 2010)
◮ Fast convergence: 400 iterations
Derivations of a TSG are a CFG forest
◮ SCFG decoder: cdec (Dyer et al. 2010)
French MWE Identification: DP-TSG
[Bar chart: PA-PCFG F1 = 32.6, mwetoolkit F1 = 34.7, Splits F1 = 63.1, DP-TSG F1 = 71.1]
The DP-TSG result is a lower bound
Human-interpretable DP-TSG rules
MWN → coup de N
◮ coup de pied ‘kick’
◮ coup de coeur ‘favorite’
◮ coup de foudre ‘love at first sight’
◮ coup de main ‘help’
◮ coup de grâce ‘death blow’