

  1. SFU NatLangLab CMPT 825: Natural Language Processing Constituency Parsing Spring 2020 2020-03-24 Adapted from slides from Danqi Chen and Karthik Narasimhan (with some content from David Bamman, Chris Manning, Mike Collins, and Graham Neubig)

  2. Project Milestone • Project Milestone due Tuesday 3/31 • PDF (2-4 pages) in the style of a conference (e.g. ACL/EMNLP) submission • https://2020.emnlp.org/files/emnlp2020-templates.zip • Milestone should include: • Title and Abstract - motivate the problem, describe your goals, and highlight your findings • Approach - details on your main approach and baselines. Be specific: make clear what part is original, what code you are writing yourself, and what code you are using • Experiments - describe the dataset, evaluation metrics, what experiments you plan to run, and any results you have so far. Also provide training details, training times, etc. • Future Work - what is your plan for the rest of the project • References - provide references using BibTeX • Milestone will be graded based on progress and writing quality

  3. Overview • Constituency structure vs dependency structure • Context-free grammar (CFG) • Probabilistic context-free grammar (PCFG) • The CKY algorithm • Evaluation • Lexicalized PCFGs • Neural methods for constituency parsing

  4. Syntactic structure: constituency and dependency Two views of linguistic structure • Constituency • = phrase structure grammar • = context-free grammars (CFGs) • Dependency

  5. Constituency structure • Phrase structure organizes words into nested constituents • Starting units: words are given a category (part-of-speech tag): the/DT, cuddly/JJ, cat/NN, by/IN, the/DT, door/NN • Words combine into phrases with categories: "the cuddly cat" (NP → DT JJ NN), "the door" (NP → DT NN) • Phrases can combine into bigger phrases recursively: "by the door" (PP → IN NP), "the cuddly cat by the door" (NP → NP PP)
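  To make the nesting concrete, here is a minimal sketch (plain Python, not from the slides) that writes the slide's example constituents as nested (label, children...) tuples and recovers the word yield; the helper name yield_of is invented for illustration.

    # The nested constituents for "the cuddly cat by the door" as plain tuples.
    # Leaves are (POS-tag, word) pairs.
    tree = ("NP",
            ("NP", ("DT", "the"), ("JJ", "cuddly"), ("NN", "cat")),
            ("PP", ("IN", "by"),
                   ("NP", ("DT", "the"), ("NN", "door"))))

    def yield_of(node):
        """Collect the words at the leaves, left to right."""
        label, *children = node
        if len(children) == 1 and isinstance(children[0], str):
            return [children[0]]          # pre-terminal over a single word
        return [w for child in children for w in yield_of(child)]

    print(yield_of(tree))  # ['the', 'cuddly', 'cat', 'by', 'the', 'door']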

  6. Dependency structure (covered this Thursday) • Dependency structure shows which words depend on (modify or are arguments of) which other words • Example from the slide figure: "Satellites spot whales from space", with dependency arcs nsubj, dobj, nmod, and case; the figure contrasts a correct attachment with an incorrect one (marked ❌)

  7. Why do we need sentence structure? • We need to understand sentence structure in order to interpret language correctly • Humans communicate complex ideas by composing words together into bigger units • We need to know what is connected to what

  8. Syntactic parsing • Syntactic parsing is the task of recognizing a sentence and assigning a structure to it • Example from the slide: input "Boeing is located in Seattle."; output: its parse tree

  9. Syntactic parsing • Used as an intermediate representation for downstream applications • Syntax-based machine translation • English word order: subject, verb, object; Japanese word order: subject, object, verb • Image credit: http://vas3k.com/blog/machine_translation/

  10. Syntactic parsing • Used as intermediate representation for downstream applications Relation Extraction Image credit: (Zhang et al, 2018)

  11. Beyond syntactic parsing • Nested sentiment analysis: "This film doesn’t care about cleverness, wit or any other kind of intelligent humor." is labeled Negative • Recursive deep models for semantic compositionality over a sentiment treebank, Socher et al., EMNLP 2013

  12. Context-free grammars (CFG) • Widely used formal system for modeling constituency structure in English and other natural languages • A context-free grammar is a tuple G = (N, Σ, R, S) where: • N is a set of non-terminal symbols • Σ is a set of terminal symbols • R is a set of rules of the form X → Y_1 Y_2 … Y_n for n ≥ 1, X ∈ N, Y_i ∈ (N ∪ Σ) • S ∈ N is a distinguished start symbol

  13. A Context-Free Grammar for English • The slide shows a small grammar and lexicon • Category abbreviations: S: sentence, VP: verb phrase, NP: noun phrase, PP: prepositional phrase, DT: determiner, Vi: intransitive verb, Vt: transitive verb, NN: noun, IN: preposition
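  As a concrete illustration, here is a plausible toy grammar and lexicon over those categories written as Python dictionaries; the slide's actual rules are in its figure and are not reproduced in this transcript, so these particular productions are assumptions for illustration only.

    # Illustrative rules only (the slide's exact grammar is in its figure).
    GRAMMAR = {                       # non-terminal -> list of right-hand sides
        "S":  [["NP", "VP"]],
        "VP": [["Vi"], ["Vt", "NP"], ["VP", "PP"]],
        "NP": [["DT", "NN"], ["NP", "PP"]],
        "PP": [["IN", "NP"]],
    }
    LEXICON = {                       # pre-terminal -> words
        "DT": ["the"],
        "NN": ["man", "woman", "telescope", "door"],
        "Vi": ["sleeps"],
        "Vt": ["saw"],
        "IN": ["with", "by"],
    }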

  14. (Left-most) Derivations • Given a CFG G, a left-most derivation is a sequence of strings s_1, s_2, …, s_n, where • s_1 = S • s_n ∈ Σ*, the set of all possible strings made up of words from Σ • Each s_i for i = 2, …, n is derived from s_{i-1} by picking the left-most non-terminal X in s_{i-1} and replacing it by some β where X → β ∈ R • s_n is called the yield of the derivation

  15. (Left-most) Derivations • s_1 = S • s_2 = NP VP • s_3 = DT NN VP • s_4 = the NN VP • s_5 = the man VP • s_6 = the man Vi • s_7 = the man sleeps • A derivation can be represented as a parse tree! • A string s ∈ Σ* is in the language defined by the CFG if there is at least one derivation whose yield is s • The set of possible derivations may be finite or infinite
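  The derivation above can be reproduced mechanically: repeatedly rewrite the left-most non-terminal using a chosen rule. A minimal Python sketch (not from the slides; the rule sequence is fixed by hand to match s_1 through s_7):

    NONTERMINALS = {"S", "NP", "VP", "DT", "NN", "Vi"}
    steps = [("S", ["NP", "VP"]), ("NP", ["DT", "NN"]), ("DT", ["the"]),
             ("NN", ["man"]), ("VP", ["Vi"]), ("Vi", ["sleeps"])]

    s = ["S"]
    print(" ".join(s))                    # s_1 = S
    for lhs, rhs in steps:
        i = next(k for k, sym in enumerate(s) if sym in NONTERMINALS)
        assert s[i] == lhs                # the rule must rewrite the left-most non-terminal
        s = s[:i] + rhs + s[i + 1:]
        print(" ".join(s))                # prints s_2 ... s_7, ending in "the man sleeps"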

  16. Ambiguity • Some strings may have more than one derivation (i.e. more than one parse tree!)

  17. “Classical” NLP Parsing • In fact, sentences can have a very large number of possible parses: The board approved [its acquisition] [by Royal Trustco Ltd.] [of Toronto] [for $27 a share] [at its monthly meeting]. • The number of binary bracketings of n+1 items is the Catalan number C_n = (1/(n+1)) (2n choose n); for example, C_3 = 5 gives the bracketings ((ab)c)d, (a(bc))d, (ab)(cd), a((bc)d), a(b(cd)) • It is also difficult to construct a grammar with enough coverage • A less constrained grammar can parse more sentences but results in more parses for even simple sentences • There is no way to choose the right parse!
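  A quick sanity check of that growth, as a small Python sketch (not from the slides):

    from math import comb

    def catalan(n: int) -> int:
        # C_n = (1 / (n + 1)) * (2n choose n), computed with exact integers
        return comb(2 * n, n) // (n + 1)

    print([catalan(n) for n in range(1, 9)])
    # [1, 2, 5, 14, 42, 132, 429, 1430] -- C_3 = 5 matches the five bracketings of four items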

  18. Statistical parsing • Learning from data: treebanks • Adding probabilities to the rules: probabilistic CFGs (PCFGs) • Treebanks: a collection of sentences paired with their parse trees • The Penn Treebank Project (Marcus et al., 1993)

  19. Treebanks • Standard setup (WSJ portion of the Penn Treebank): • 40,000 sentences for training • 1,700 for development • 2,400 for testing • Why build a treebank instead of a grammar? • Broad coverage • Frequencies and distributional information • A way to evaluate systems

  20. Probabilistic context-free grammars (PCFGs) • A probabilistic context-free grammar (PCFG) consists of: • A context-free grammar G = (N, Σ, R, S) • For each rule α → β ∈ R, a parameter q(α → β) ≥ 0, such that for any X ∈ N: ∑_{α → β : α = X} q(α → β) = 1

  21. Probabilistic context-free grammars (PCFGs) • For any derivation (parse tree) t containing rules α_1 → β_1, α_2 → β_2, …, α_l → β_l, the probability of the parse is P(t) = ∏_{i=1}^{l} q(α_i → β_i) • Example: P(t) = q(S → NP VP) × q(NP → DT NN) × q(DT → the) × q(NN → man) × q(VP → Vi) × q(Vi → sleeps) = 1.0 × 0.3 × 1.0 × 0.7 × 0.4 × 1.0 = 0.084 • Why do we want ∑_{α → β : α = X} q(α → β) = 1?
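  The same computation as a minimal Python sketch (not from the slides), using the rule probabilities given in the example:

    # Rule probabilities from the slide's example, keyed by (LHS, RHS) pairs.
    q = {
        ("S",  ("NP", "VP")): 1.0,
        ("NP", ("DT", "NN")): 0.3,
        ("DT", ("the",)):     1.0,
        ("NN", ("man",)):     0.7,
        ("VP", ("Vi",)):      0.4,
        ("Vi", ("sleeps",)):  1.0,
    }

    def tree_prob(rules_used):
        # P(t) is the product of the probabilities of the rules used in t.
        p = 1.0
        for rule in rules_used:
            p *= q[rule]
        return p

    print(tree_prob(list(q)))   # 0.084 (up to floating-point rounding)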

  22. Deriving a PCFG from a treebank • Training data: a set of parse trees t_1, t_2, …, t_m • A PCFG (N, Σ, S, R, q): • N is the set of all non-terminals seen in the trees • Σ is the set of all words seen in the trees • S is taken to be the start symbol S • R is taken to be the set of all rules α → β seen in the trees • The maximum-likelihood parameter estimates are q_ML(α → β) = Count(α → β) / Count(α) (can add smoothing) • If we have seen the rule VP → Vt NP 105 times, and the non-terminal VP 1000 times, then q(VP → Vt NP) = 0.105
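  A minimal sketch of this estimator (not the course's code), counting rules in trees represented as the nested tuples used earlier; the helper names are invented for illustration:

    from collections import Counter

    def extract_rules(node, rules):
        # Trees are (label, children...) tuples; leaves are (POS-tag, word).
        label, *children = node
        if len(children) == 1 and isinstance(children[0], str):
            rules.append((label, (children[0],)))        # lexical rule, e.g. NN -> man
            return
        rules.append((label, tuple(child[0] for child in children)))
        for child in children:
            extract_rules(child, rules)

    def estimate_pcfg(treebank):
        # q_ML(alpha -> beta) = Count(alpha -> beta) / Count(alpha)
        rule_counts, lhs_counts = Counter(), Counter()
        for tree in treebank:
            rules = []
            extract_rules(tree, rules)
            for lhs, rhs in rules:
                rule_counts[(lhs, rhs)] += 1
                lhs_counts[lhs] += 1
        return {rule: c / lhs_counts[rule[0]] for rule, c in rule_counts.items()}

    toy_tree = ("S", ("NP", ("DT", "the"), ("NN", "man")), ("VP", ("Vi", "sleeps")))
    print(estimate_pcfg([toy_tree]))   # every rule has probability 1.0 in this one-tree treebank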

  23. CFG vs PCFG • A CFG tells us whether a sentence is in the language it defines • A PCFG gives us a mechanism for assigning scores (here, probabilities) to different parses for the same sentence

  24. Parsing with PCFGs • Given a sentence s and a PCFG, how do we find the highest scoring parse tree for s? argmax_{t ∈ 𝒰(s)} P(t), where 𝒰(s) is the set of possible parse trees for s • The CKY algorithm: applies to a PCFG in Chomsky normal form (CNF) • Chomsky Normal Form (CNF): all the rules take one of the two following forms: • Binary: X → Y_1 Y_2 where X ∈ N, Y_1 ∈ N, Y_2 ∈ N • Unary: X → Y where X ∈ N, Y ∈ Σ • Can convert any PCFG into an equivalent grammar in CNF! • However, the trees will look different • Possible to do a “reverse transformation”

  25. Converting PCFGs into a CNF grammar • n-ary rules (n > 2): NP → DT NNP VBG NN • Unary rules: VP → Vi, Vi → sleeps • Eliminate all the unary rules recursively, e.g. by adding VP → sleeps • We will come back to this later!
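  One common way to binarize an n-ary rule is to introduce a chain of new intermediate symbols. A minimal Python sketch (an illustration, not necessarily the exact scheme used in the course; the "@NP->DT_NNP"-style symbol names are invented):

    def binarize(lhs, rhs):
        # Split lhs -> Y1 ... Yn (n > 2) into a chain of binary rules via new symbols.
        rules, current = [], lhs
        for i in range(len(rhs) - 2):
            new_sym = f"@{lhs}->" + "_".join(rhs[: i + 1])
            rules.append((current, (rhs[i], new_sym)))
            current = new_sym
        rules.append((current, (rhs[-2], rhs[-1])))
        return rules

    print(binarize("NP", ("DT", "NNP", "VBG", "NN")))
    # [('NP', ('DT', '@NP->DT')), ('@NP->DT', ('NNP', '@NP->DT_NNP')),
    #  ('@NP->DT_NNP', ('VBG', 'NN'))]

  In a PCFG, the original rule's probability is typically attached to the first binary rule and the newly introduced rules are given probability 1, so parse probabilities are preserved.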

  26. The CKY algorithm • Dynamic programming • Given a sentence x_1, x_2, …, x_n, denote π(i, j, X) as the highest score for any parse tree that dominates words x_i, …, x_j and has non-terminal X ∈ N as its root • Output: π(1, n, S) • Initially, for i = 1, 2, …, n: π(i, i, X) = q(X → x_i) if X → x_i ∈ R, and 0 otherwise • Example sentence from the slide (with span positions 0–5): Book the flight through Houston

  27. The CKY algorithm • For all (i, j) such that 1 ≤ i < j ≤ n and for all X ∈ N: π(i, j, X) = max_{X → Y Z ∈ R, i ≤ k < j} q(X → Y Z) × π(i, k, Y) × π(k+1, j, Z) • Consider all ways span (i, j) can be split into two (k is the split point) • Also store backpointers, which allow us to recover the parse tree • Cells contain: the best score for a parse of span (i, j) for each non-terminal X, plus backpointers
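  Putting the initialization and the recursion together, here is a minimal CKY sketch in Python (an illustration under the CNF assumptions above, not the course's reference implementation); the grammar encoding and the collapsed rule VP → sleeps with probability 0.4 = q(VP → Vi) × q(Vi → sleeps) are choices made for this example:

    from collections import defaultdict

    def cky(words, unary_rules, binary_rules, start="S"):
        # unary_rules: {(X, word): prob}; binary_rules: {(X, Y, Z): prob}.
        # pi[(i, j, X)] is the best score for a parse of x_i ... x_j rooted in X
        # (1-based indices, as on the slides); bp stores the backpointers.
        n = len(words)
        pi = defaultdict(float)                  # missing entries behave as 0
        bp = {}

        # Initialization: pi(i, i, X) = q(X -> x_i)
        for i, word in enumerate(words, start=1):
            for (X, w), prob in unary_rules.items():
                if w == word:
                    pi[(i, i, X)] = prob
                    bp[(i, i, X)] = ("word", word)

        # Recursion over spans of increasing length
        for length in range(1, n):
            for i in range(1, n - length + 1):
                j = i + length
                for (X, Y, Z), q in binary_rules.items():
                    for k in range(i, j):        # k is the split point
                        score = q * pi[(i, k, Y)] * pi[(k + 1, j, Z)]
                        if score > pi[(i, j, X)]:
                            pi[(i, j, X)] = score
                            bp[(i, j, X)] = ("split", k, Y, Z)

        def build(i, j, X):                      # follow backpointers to recover the tree
            entry = bp[(i, j, X)]
            if entry[0] == "word":
                return (X, entry[1])
            _, k, Y, Z = entry
            return (X, build(i, k, Y), build(k + 1, j, Z))

        best = pi[(1, n, start)]
        return best, (build(1, n, start) if best > 0 else None)

    score, tree = cky(
        "the man sleeps".split(),
        unary_rules={("DT", "the"): 1.0, ("NN", "man"): 0.7, ("VP", "sleeps"): 0.4},
        binary_rules={("S", "NP", "VP"): 1.0, ("NP", "DT", "NN"): 0.3},
    )
    print(score)   # 0.084 (up to rounding), matching the hand computation on slide 21
    print(tree)    # ('S', ('NP', ('DT', 'the'), ('NN', 'man')), ('VP', 'sleeps'))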

  28. The CKY algorithm • Running time? O(n³ · |R|)
