Statistical Parsing


  1. Statistical Parsing
     Paper presentation: Michael Collins (2003). “Head-driven statistical models for natural language parsing”. In: Computational Linguistics 29.4, pp. 589–637. doi: 10.1162/089120103322753356
     Çağrı Çöltekin, University of Tübingen, Seminar für Sprachwissenschaft, December 2016

  2. What is the paper about?
     • A head-driven, lexicalized PCFG
     • PCFGs cannot capture many linguistic phenomena
     • Lexicalizing PCFGs allows capturing lexical dependencies, but parameter estimation becomes difficult (many rules, sparse data)
     • The main idea is factoring the rule probabilities into parts that are easy to estimate
     • The paper does that in a linguistically-motivated way
     • The resulting parser works better than PCFGs, and some others in the literature

  3. Three models
     Model 1:
     • Lexicalize the PCFG
     • Condition the probability of a rule based on parts of its LHS
     • Condition probabilities of non-heads on distance to their head
     Model 2: Add complement-adjunct distinction (use subcategorization frames)
     Model 3: Add conditions for wh-movement

  4. An overview of the paper
     2. Background: PCFGs, lexicalization, estimation (MLE)
     3. Model definitions
     4. Special cases: mainly related to treebank format
     5. Practical issues: parameter estimation, unknown words, parsing algorithm
     6. Results
     7. Discussion
     8. Related work
     9. Conclusions

  5. Probabilistic context-free grammars
     • A CFG augmented with probabilities for each rule
     • Assigns a proper probability distribution to parse trees
       – if all rule probabilities with the same LHS sum to 1
       – all derivations terminate in a finite number of steps
     • The main problem is estimating probabilities associated with each rule X → β
     • Maximum-likelihood estimate: P(X → β) = count(X → β) / count(X) (see the sketch below)
     • With rule probabilities, parsing is finding the best tree:
       T_best = argmax_T P(T | S) = argmax_T P(T, S) / P(S) = argmax_T P(T, S)
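
A minimal sketch of the maximum-likelihood estimate above, assuming toy rule counts that are not from the paper: rule probabilities are relative frequencies of rules given their left-hand side.

```python
# Toy MLE for PCFG rule probabilities: P(X -> beta) = count(X -> beta) / count(X).
from collections import Counter

# Hypothetical counts, as if read off a small treebank (not the paper's data).
rule_counts = Counter({
    ("S", ("NP", "VP")): 10,
    ("VP", ("VBD", "NP")): 6,
    ("VP", ("VBD", "NP", "NP")): 4,
    ("NP", ("NNP",)): 12,
    ("NP", ("JJ", "NN")): 8,
})

lhs_counts = Counter()
for (lhs, _rhs), c in rule_counts.items():
    lhs_counts[lhs] += c

def rule_prob(lhs, rhs):
    """Maximum-likelihood estimate of P(lhs -> rhs)."""
    return rule_counts[(lhs, rhs)] / lhs_counts[lhs]

# Probabilities of rules sharing a LHS sum to 1, e.g. for VP: 6/10 + 4/10 = 1.
```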

  6. Probabilistic context-free grammars (2)
     • In PCFGs derivations are assumed to be independent
     • The probability of a tree is the product of the probabilities of the rules used in the derivation (see the sketch below)
     • PCFGs cannot capture lexical or structural dependencies
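
Continuing the toy sketch above (an assumption-laden illustration, not the paper's code): under the independence assumption a tree's probability is the product of the probabilities of its rules; lexical (word-given-tag) probabilities are left out for brevity.

```python
def tree_prob(tree):
    """tree = (label, child, child, ...); a leaf is (POS, 'word').
    Uses rule_prob() from the previous sketch."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return 1.0                      # terminal: P(word | POS) not modelled here
    rhs = tuple(child[0] for child in children)
    p = rule_prob(label, rhs)
    for child in children:
        p *= tree_prob(child)
    return p

toy_tree = ("S",
            ("NP", ("NNP", "IBM")),
            ("VP", ("VBD", "bought"), ("NP", ("NNP", "Lotus"))))
# tree_prob(toy_tree) == P(S -> NP VP) * P(NP -> NNP) * P(VP -> VBD NP) * P(NP -> NNP)
```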

  7. Lexicalizing PCFGs
     • Replace non-terminal X with X(h), where h is a tuple with the lexical word and its POS tag
     • Now the grammar can capture (head-driven) lexical dependencies
     • But the number of nonterminals grows by |V| × |T|
     • Estimation becomes difficult (many rules, data sparsity)
     • Note: Penn Treebank (PTB) does not annotate heads, they are automatically annotated (based on heuristics; see the sketch below)
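
A minimal sketch of lexicalization with an invented toy head heuristic; Collins uses hand-written, per-category head-finding rules, which are not reproduced here.

```python
def lexicalize(tree):
    """Percolate a (head word, head POS) pair up the tree.
    tree = (label, children...); a leaf is (POS, 'word').
    Returns (label, (head_word, head_pos), lexicalized children...)."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return (label, (children[0], label))          # leaf: the word heads itself
    lex_children = [lexicalize(c) for c in children]
    # Toy head choice: first child sharing the parent's initial letter, else the
    # last child.  (A stand-in for Collins's hand-written head rules.)
    head_child = next((c for c in lex_children if c[0][0] == label[0]),
                      lex_children[-1])
    return (label, head_child[1], *lex_children)

t = ("S", ("NP", ("NNP", "IBM")), ("VP", ("VBD", "bought"), ("NP", ("NNP", "Lotus"))))
# lexicalize(t)[1] == ("bought", "VBD"): S inherits its head from the VP child
```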

  8. Example lexicalized derivation
     “Last week IBM bought Lotus”
     TOP
       S(bought,VBD)
         NP(week,NN)
           JJ(last,JJ)      Last
           NN(week,NN)      week
         NP(IBM,NNP)
           NNP(IBM,NNP)     IBM
         VP(bought,VBD)
           VBD(bought,VBD)  bought
           NP(Lotus,NNP)
             NNP(Lotus,NNP) Lotus
     Example rules:
       TOP → S(bought,VBD)
       S(bought,VBD) → NP(week,NN) NP(IBM,NNP) VP(bought,VBD)
       VP(bought,VBD) → VBD(bought,VBD) NP(Lotus,NNP)
       NP(Lotus,NNP) → NNP(Lotus,NNP)

  9. Model 1: the generative story
     Each lexicalized CF rule X(h) → ⟨left-dependents⟩ H(h) ⟨right-dependents⟩ is formed as:
     1. Generate the head with probability P_h(H | X, h)
     2. Generate the left modifier(s) independently, each with probability P_l(L_i(l_i) | X, h, H)
     3. Generate the right modifier(s) independently, each with probability P_r(R_i(r_i) | X, h, H)
     • A special left/right dependent label ‘STOP’ terminates the generation (see the sketch below)
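
A minimal sketch of the factorization above; the distributions P_h, P_l, P_r are passed in as stand-in callables, since their actual estimates are not reproduced here.

```python
STOP = ("STOP", None)

def rule_prob_model1(X, h, H, left_mods, right_mods, P_h, P_l, P_r):
    """Model 1 score of X(h) -> <left mods> H(h) <right mods>.
    left_mods / right_mods: (label, headword) pairs listed head-outwards.
    P_h, P_l, P_r: callables standing in for the estimated distributions."""
    p = P_h(H, X, h)
    for mod in left_mods + [STOP]:
        p *= P_l(mod, X, h, H)          # each left modifier generated independently
    for mod in right_mods + [STOP]:
        p *= P_r(mod, X, h, H)          # each right modifier generated independently
    return p

# S(bought,VBD) -> NP(week,NN) NP(IBM,NNP) VP(bought,VBD) would be scored as
#   P_h(VP | S, bought) * P_l(NP(IBM) | ...) * P_l(NP(week) | ...)
#   * P_l(STOP | ...) * P_r(STOP | ...)
```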

  10. Model 1: distance
      • Model 1 also conditions the left and right dependents on their distance from the head. For example, P_l is estimated using P_l(L_i(l_i) | X, h, H, distance(i − 1))
      • Two distance measures (sketched below):
        – Is the intervening string length 0? (adjacency)
        – Does the intervening string contain a verb? (clausal modifiers)
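
A minimal sketch of the two binary distance features; the verb tag set is an assumption here (Collins defines "contains a verb" over the treebank's verbal POS tags).

```python
VERB_TAGS = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}   # assumed PTB verb tags

def distance_features(intervening_tags):
    """intervening_tags: POS tags between the head and the modifier being added."""
    return {
        "adjacent": len(intervening_tags) == 0,          # intervening string empty?
        "contains_verb": any(t in VERB_TAGS for t in intervening_tags),
    }

# distance_features([])             -> {'adjacent': True,  'contains_verb': False}
# distance_features(["VBD", "NNP"]) -> {'adjacent': False, 'contains_verb': True}
```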

  11. Model 2: the generative story
      Main idea: condition the right/left modifiers on subcategorization frames (LC and RC), which are the left and right complements of the head.
      1. Generate the head with probability P_h(H | X, h)
      2. Choose left and right subcategorization frames, with probabilities P_lc(LC | X, H, h) and P_rc(RC | X, H, h)
      3. Generate the left/right modifier(s) independently, each with probability P_l(L_i(l_i) | X, h, H, LC) and P_r(R_i(r_i) | X, h, H, RC) (see the sketch below)
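
A minimal sketch extending the Model 1 sketch above: the frames LC and RC are chosen first and then added to the modifier conditioning context. As before, the distributions are stand-in callables, not the paper's estimates.

```python
def rule_prob_model2(X, h, H, LC, RC, left_mods, right_mods,
                     P_h, P_lc, P_rc, P_l, P_r):
    """Model 2 score: like Model 1, but modifiers are conditioned on the
    left/right subcategorization frames chosen for the head."""
    STOP = ("STOP", None)
    p = P_h(H, X, h) * P_lc(LC, X, H, h) * P_rc(RC, X, H, h)
    for mod in left_mods + [STOP]:
        p *= P_l(mod, X, h, H, LC)       # left modifiers conditioned on LC
    for mod in right_mods + [STOP]:
        p *= P_r(mod, X, h, H, RC)       # right modifiers conditioned on RC
    return p
```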

  12. Model 3: traces and wh-movement
      The idea: mark and propagate ‘gaps’.
      Example: “The store that IBM bought last week”
        NP(store)
          NP(store)            The store
          SBAR(that)(+gap)
            WHNP(that)
              WDT              that
            S(bought)(+gap)
              NP-C(IBM)        IBM
              VP(bought)(+gap)
                VBD            bought
                TRACE
                NP(week)       last week

  13. Special cases
      • Non-recursive (base) NPs are marked as NPB
      • Coordination: allow only a single phrase after a CC
      • Punctuation: remove all except non-initial/non-final comma and colon, treat the rest as coordination
      • Empty subjects: introduce a dummy empty subject during preprocessing

  14. Parameter estimation
      Parameters are estimated with three levels of backoff (see Table 1 in the paper for details), using a version of Witten-Bell smoothing (sketched below):
        e = λ₁ e₁ + (1 − λ₁)(λ₂ e₂ + (1 − λ₂) e₃)
      where
        λ₁ = f₁ / (f₁ + 5 u₁)
      f₁ is the relevant number of tokens (the count in the denominator), u₁ is the relevant number of types. Other λs are calculated similarly.
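
A minimal sketch of the three-level backoff interpolation above, with each λ set Witten-Bell style from token and type counts; the example counts are invented.

```python
def wb_lambda(tokens, types, c=5):
    """lambda = f / (f + 5u), following the formula above."""
    return tokens / (tokens + c * types) if tokens + types > 0 else 0.0

def smoothed_estimate(levels):
    """levels: [(estimate, tokens, types), ...] ordered most to least specific.
    Returns e = l1*e1 + (1 - l1) * (l2*e2 + (1 - l2)*e3)."""
    (e1, f1, u1), (e2, f2, u2), (e3, _f3, _u3) = levels
    l1 = wb_lambda(f1, u1)
    l2 = wb_lambda(f2, u2)
    return l1 * e1 + (1 - l1) * (l2 * e2 + (1 - l2) * e3)

# A rarely seen specific estimate backs off towards the coarser levels, e.g.:
# smoothed_estimate([(0.9, 2, 2), (0.5, 50, 10), (0.1, 1000, 40)])
```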

  15. Unknown words and parsing algorithm
      • During training, all words with frequencies less than 6 were replaced with UNKNOWN (sketched below)
      • During testing, the POS tags for unknown words were assigned using the tagger by Ratnaparkhi (1996)
      • The parsing algorithm is a version of the CKY parser with O(n⁵) complexity
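
A minimal sketch of the unknown-word handling described above: words seen fewer than 6 times in training are mapped to a single UNKNOWN token. The threshold and token name follow the slide; the rest is an assumed illustration.

```python
from collections import Counter

def replace_rare_words(sentences, min_count=6, unk="UNKNOWN"):
    """Map every word seen fewer than min_count times to the UNKNOWN token."""
    counts = Counter(w for sent in sentences for w in sent)
    return [[w if counts[w] >= min_count else unk for w in sent]
            for sent in sentences]
```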

  16. Results
      • Model 2 performs better than Model 1
      • Model 2 also performs better than, or similarly to, earlier/state-of-the-art models
      • Details: Table 2 on page 608 of the paper

  17. More on results
      • Phrase-label precision/recall results do not show attachment problems
      • Extracted dependencies are more useful (Figure 12 on page 610)
      • The parser recovers ‘core’ dependencies successfully
      • Main problems are with adjuncts and coordination

  18. More on distance measure
      • The distance measure seems to help finding subcategorization for Model 1
      • As the distance from the head increases,
        – the probability of attaching ‘STOP’ increases
        – the probability of attaching a new modifier decreases
      • The distance measure is also useful for preferring right-branching structures. For example:
        Flip said that Squeaky will do the work yesterday
        John was believed to have been shot by Bill
      • Structural (e.g., close attachment) vs. lexical/semantic preferences: structural preferences seem to be necessary

  19. Choice of representation
      • The parser prefers PTB-style (flat) trees
      • For binary representations, do pre-/post-processing
      • This would have an effect on capturing structural (but not lexical) preferences
      • Preprocessing steps, e.g., NPB labeling, seem to be important
      • In general, the parser works best with
        – flat trees
        – different constituent labels at different levels
