  1. Natural Language Processing (CSEP 517): Dependency Syntax and Parsing
     Noah Smith, © 2017 University of Washington
     nasmith@cs.washington.edu
     May 1, 2017

  2. To-Do List
     ◮ Online quiz: due Sunday
     ◮ Read: Kübler et al. (2009, ch. 1, 2, 6)
     ◮ A3 due May 7 (Sunday)
     ◮ A4 due May 14 (Sunday)

  3. Dependencies
     Informally, you can think of dependency structures as a transformation of phrase structures that
     ◮ maintains the word-to-word relationships induced by lexicalization,
     ◮ adds labels to them, and
     ◮ eliminates the phrase categories.
     There are also linguistic theories built on dependencies (Tesnière, 1959; Mel'čuk, 1987), as well as treebanks corresponding to those.
     ◮ Free(r)-word-order languages (e.g., Czech)

  4. Dependency Tree: Definition
     Let x = ⟨x_1, ..., x_n⟩ be a sentence. Add a special root symbol as "x_0."
     A dependency tree consists of a set of tuples ⟨p, c, ℓ⟩, where
     ◮ p ∈ {0, ..., n} is the index of a parent,
     ◮ c ∈ {1, ..., n} is the index of a child, and
     ◮ ℓ ∈ L is a label.
     Different annotation schemes define different label sets L, and different constraints on the set of tuples. Most commonly:
     ◮ The tuple is represented as a directed edge from x_p to x_c with label ℓ.
     ◮ The directed edges form an arborescence (directed tree) with x_0 as the root (sometimes denoted root).
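
To make the definition concrete, here is a minimal sketch (mine, not from the slides; the function name and the label strings are illustrative) that stores a dependency tree as a set of ⟨p, c, ℓ⟩ tuples and checks the arborescence constraints: every word has exactly one parent and every word reaches x_0 without a cycle.

```python
# A minimal sketch (illustrative, not from the slides): store a dependency tree
# over an n-word sentence as a set of (p, c, label) tuples and check that the
# directed edges form an arborescence rooted at the special node 0.

def is_arborescence(n, edges):
    """n: sentence length; edges: set of (p, c, label) with 0 <= p <= n, 1 <= c <= n."""
    parents = {}
    for p, c, _ in edges:
        if c in parents:                 # each word must have exactly one parent
            return False
        parents[c] = p
    if set(parents) != set(range(1, n + 1)):
        return False                     # every word 1..n needs a parent
    for c in range(1, n + 1):            # every word must reach the root, with no cycle
        seen, node = set(), c
        while node != 0:
            if node in seen:
                return False
            seen.add(node)
            node = parents[node]
    return True

# "we wash our cats": wash heads the sentence; we and cats attach to wash; our to cats.
# (The label strings here are just illustrative.)
edges = {(0, 2, "root"), (2, 1, "sbj"), (2, 4, "dobj"), (4, 3, "det")}
print(is_arborescence(4, edges))         # True
```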

  5. Example
     [Figure: phrase-structure tree for "we wash our cats": S → NP VP; NP → Pronoun (we); VP → Verb (wash) NP; NP → Determiner (our) Noun (cats).]

  6. Example
     [Figure: the same phrase-structure tree for "we wash our cats", with the head child of each node marked.]

  7. Example
     [Figure: phrase-structure tree with heads, lexicalized: each nonterminal is annotated with its head word (S_wash, NP_we, VP_wash, NP_cats, etc.).]

  8. Example
     [Figure: "bare bones" dependency tree for "we wash our cats".]

  9. Example
     [Figure: dependency tree for "we wash our cats who stink".]

  10. Example
     [Figure: dependency tree for "we vigorously wash our cats who stink".]

  11. Content Heads vs. Function Heads
     Credit: Nathan Schneider
     [Figure: two dependency analyses of "little kids were always watching birds with fish", one choosing content words as heads and one choosing function words as heads.]

  12. Labels
     [Figure: labeled dependency tree for "kids saw birds with fish", with arc labels root, sbj, dobj, prep, and pobj.]
     Key dependency relations captured in the labels include: subject, direct object, preposition object, adjectival modifier, adverbial modifier.
     In this lecture, I will mostly not discuss labels, to keep the algorithms simpler.

  13. Coordination Structures
     [Figure: dependency structure for "we vigorously wash our cats and dogs who stink".]
     The bugbear of dependency syntax.

  14. Example
     [Figure: dependency analysis of "we vigorously wash our cats and dogs who stink" with the first conjunct as the head of the coordination.]
     Make the first conjunct the head?

  15. Example
     [Figure: the same sentence with the coordinating conjunction as the head of the coordination.]
     Make the coordinating conjunction the head?

  16. Example
     [Figure: the same sentence with the second conjunct as the head of the coordination.]
     Make the second conjunct the head?

  17. Dependency Schemes
     ◮ Transform the treebank: define "head rules" that can select the head child of any node in a phrase-structure tree and label the dependencies.
     ◮ More powerful, less local rule sets, possibly collapsing some words into arc labels; Stanford dependencies are a popular example (de Marneffe et al., 2006).
     ◮ Direct annotation.
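
As a rough illustration of the first approach (treebank transformation via head rules), here is a sketch under simplifying assumptions: the head-rule table and function names are made up for this example, and real rule sets (e.g., those behind Stanford dependencies) are far more detailed. It percolates head words up a phrase-structure tree and reads off unlabeled dependencies.

```python
# A rough sketch of head-rule-based conversion (illustrative head rules, not a real rule set).
# A phrase-structure node is (category, children) for internal nodes or (tag, word) for leaves.

HEAD_RULES = {            # which child category supplies the head (hypothetical, simplified)
    "S": ["VP"],
    "VP": ["Verb"],
    "NP": ["Noun", "Pronoun"],
}

def find_head(node, dependencies):
    """Return the head word of `node`; record (head, dependent) arcs in `dependencies`."""
    category, rest = node
    if isinstance(rest, str):                  # leaf: (tag, word)
        return rest
    child_heads = [find_head(child, dependencies) for child in rest]
    head = child_heads[0]                      # default if no rule matches
    for wanted in HEAD_RULES.get(category, []):
        for child, child_head in zip(rest, child_heads):
            if child[0] == wanted:
                head = child_head
                break
        else:
            continue
        break
    for child_head in child_heads:             # every non-head child depends on the head
        if child_head != head:
            dependencies.append((head, child_head))
    return head

tree = ("S", [("NP", [("Pronoun", "we")]),
              ("VP", [("Verb", "wash"),
                      ("NP", [("Determiner", "our"), ("Noun", "cats")])])])
deps = []
root = find_head(tree, deps)
print(root, deps)   # wash [('cats', 'our'), ('wash', 'cats'), ('wash', 'we')]
```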

  18. Three Approaches to Dependency Parsing
     1. Dynamic programming with the Eisner algorithm.
     2. Transition-based parsing with a stack.
     3. Chu-Liu-Edmonds algorithm for arborescences.

  19. Dependencies and Grammar
     Context-free grammars can be used to encode dependency structures. For every head word and constellation of dependent children:
     N_head → N_leftmost-sibling ... N_head ... N_rightmost-sibling
     And for every v ∈ V: N_v → v and S → N_v.

  20. Dependencies and Grammar
     Context-free grammars can be used to encode dependency structures. For every head word and constellation of dependent children:
     N_head → N_leftmost-sibling ... N_head ... N_rightmost-sibling
     And for every v ∈ V: N_v → v and S → N_v.
     A bilexical dependency grammar binarizes the dependents, generating only one per rule.

  21. Dependencies and Grammar
     Context-free grammars can be used to encode dependency structures. For every head word and constellation of dependent children:
     N_head → N_leftmost-sibling ... N_head ... N_rightmost-sibling
     And for every v ∈ V: N_v → v and S → N_v.
     A bilexical dependency grammar binarizes the dependents, generating only one per rule.
     Such a grammar can produce only projective trees, which are (informally) trees in which the arcs don't cross.
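
To make "arcs don't cross" precise: an arc ⟨p, c⟩ is projective if every word strictly between p and c is a descendant of p. A small sketch of that check (illustrative; representing the tree as a parent array is an assumption of this example, not notation from the slides):

```python
# Sketch: check whether a dependency tree (parent index for each word, 0 = root)
# is projective, i.e., every arc (p, c) covers only descendants of p.

def is_projective(parent):
    """parent[c] = head of word c, for c = 1..n; parent[0] is unused."""
    n = len(parent) - 1

    def dominates(a, d):                # is a an ancestor of d?
        while d != 0:
            d = parent[d]
            if d == a:
                return True
        return False

    for c in range(1, n + 1):
        p = parent[c]
        lo, hi = min(p, c), max(p, c)
        for k in range(lo + 1, hi):     # every word strictly between p and c
            if not dominates(p, k):
                return False
    return True

# "we wash our cats": root -> wash, wash -> we, wash -> cats, cats -> our  (projective)
print(is_projective([0, 2, 0, 4, 2]))   # True
# Crossing arcs 3 -> 1 and 4 -> 2 make this tree non-projective.
print(is_projective([0, 3, 4, 0, 3]))   # False
```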

  22. Bilexical Dependency Grammar: Derivation
     [Figure: derivation of "we wash our cats": S → N_wash; N_wash → N_we N_wash; N_wash → N_wash N_cats; N_cats → N_our N_cats; each N_v rewrites to its word.]
     Naïvely, the CKY algorithm will require O(n^5) runtime. Why?

  23. CKY for Bilexical Context-Free Grammars
     [Figure: the two combination steps. N_{x_h} over span [i, j] combines with N_{x_c} over [j+1, k] to form N_{x_h} over [i, k], scored by p(N_{x_h} N_{x_c} | N_{x_h}); symmetrically, N_{x_c} over [i, j] combines with N_{x_h} over [j+1, k] to form N_{x_h} over [i, k], scored by p(N_{x_c} N_{x_h} | N_{x_h}).]
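
A sketch of the naive chart recurrence behind the O(n^5) question on the previous slide: chart cells are indexed by a span and its head position, and the five nested indices (span endpoints, split point, and the two child heads) give the runtime. The attach(h, c) score is an assumed stand-in for the rule probabilities p(N_{x_h} N_{x_c} | N_{x_h}); this is illustrative, not the lecture's code.

```python
import math
from collections import defaultdict

def naive_bilexical_cky(n, attach):
    """attach(h, c): assumed log-score of taking word c as a dependent of word h."""
    chart = defaultdict(lambda: -math.inf)   # chart[(i, k, h)]: best score of span i..k headed at h
    for i in range(1, n + 1):
        chart[(i, i, i)] = 0.0               # single words
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            k = i + length - 1
            for j in range(i, k):                    # split point: left part i..j, right part j+1..k
                for h in range(i, j + 1):            # head of the left part
                    for c in range(j + 1, k + 1):    # head of the right part
                        left, right = chart[(i, j, h)], chart[(j + 1, k, c)]
                        if left == -math.inf or right == -math.inf:
                            continue
                        # whichever side's head takes the other as a dependent heads the new span
                        chart[(i, k, h)] = max(chart[(i, k, h)], left + right + attach(h, c))
                        chart[(i, k, c)] = max(chart[(i, k, c)], left + right + attach(c, h))
    return max((chart[(1, n, h)], h) for h in range(1, n + 1))   # (best score, root word position)

# Toy log-scores for "we wash our cats" (positions 1..4): the desired arcs cost 0, all others -5.
toy = {(2, 1): 0.0, (2, 4): 0.0, (4, 3): 0.0}
score, head = naive_bilexical_cky(4, lambda h, c: toy.get((h, c), -5.0))
print(score, head)    # 0.0 2
```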

  24. CKY Example
     [Figure: CKY chart/derivation for "we wash our cats" reaching the goal item, with nonterminals S, N_wash, N_we, N_our, N_cats.]

  25. Dependency Parsing with the Eisner Algorithm (Eisner, 1996)
     [Figure: the four item shapes: left- and right-pointing triangles spanning h to d, and left- and right-pointing trapezoids spanning h to c.]
     Items:
     ◮ Both triangles indicate that x_d is a descendant of x_h.
     ◮ Both trapezoids indicate that x_c can be attached as the child of x_h.
     ◮ In all cases, the words "in between" are descendants of x_h.

  26. Dependency Parsing with the Eisner Algorithm (Eisner, 1996)
     Initialization: for each i, build a left- and a right-pointing triangle over the single position [i, i]; one carries the score p(x_i | N_{x_i}) and the other carries 1, so the word's probability is counted once.
     Goal: combine a left-pointing triangle over [1, i] and a right-pointing triangle over [i, n] (both headed at x_i), with score p(N_{x_i} | S), to form the goal item.

  27. Dependency Parsing with the Eisner Algorithm (Eisner, 1996)
     Attaching a left dependent: a right-pointing triangle over [i, j] (headed at x_i) combines with a left-pointing triangle over [j+1, k] (headed at x_k) to form a left-pointing trapezoid over [i, k], with score p(N_{x_i} N_{x_k} | N_{x_k}); x_i becomes a left child of x_k.
     Completing a left child: a left-pointing triangle over [i, j] combines with a left-pointing trapezoid over [j, k] to form a left-pointing triangle over [i, k].

  28. Dependency Parsing with the Eisner Algorithm (Eisner, 1996)
     Attaching a right dependent: a right-pointing triangle over [i, j] (headed at x_i) combines with a left-pointing triangle over [j+1, k] (headed at x_k) to form a right-pointing trapezoid over [i, k], with score p(N_{x_i} N_{x_k} | N_{x_i}); x_k becomes a right child of x_i.
     Completing a right child: a right-pointing trapezoid over [i, j] combines with a right-pointing triangle over [j, k] to form a right-pointing triangle over [i, k].
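
Putting slides 25-28 together, here is a compact sketch of the resulting O(n^3) dynamic program (illustrative, not the lecture's code; attach(h, c) again stands in for the rule probabilities, and position 0 plays the role of the root symbol):

```python
import math

def eisner(n, attach):
    """Best total score of a projective tree over words 1..n; attach(h, c) is an assumed log-score."""
    NEG = -math.inf
    # complete[i][j][d] and incomplete[i][j][d] over spans i..j (i < j):
    # d = 0 means headed at j (left-pointing), d = 1 means headed at i (right-pointing).
    complete = [[[NEG, NEG] for _ in range(n + 1)] for _ in range(n + 1)]
    incomplete = [[[NEG, NEG] for _ in range(n + 1)] for _ in range(n + 1)]
    for i in range(n + 1):
        complete[i][i][0] = complete[i][i][1] = 0.0     # initialization: single positions

    for length in range(1, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            # attach a dependent: right triangle [i, r] + left triangle [r+1, j]
            best = max(complete[i][r][1] + complete[r + 1][j][0] for r in range(i, j))
            incomplete[i][j][0] = best + attach(j, i)   # arc j -> i (left dependent)
            incomplete[i][j][1] = best + attach(i, j)   # arc i -> j (right dependent)
            # complete a left child: left triangle [i, r] + left trapezoid [r, j]
            complete[i][j][0] = max(complete[i][r][0] + incomplete[r][j][0] for r in range(i, j))
            # complete a right child: right trapezoid [i, r] + right triangle [r, j]
            complete[i][j][1] = max(incomplete[i][r][1] + complete[r][j][1] for r in range(i + 1, j + 1))
    return complete[0][n][1]                            # goal: everything attached under the root

# Toy log-scores for "we wash our cats": the desired arcs cost 0, all others -5.
toy = {(0, 2): 0.0, (2, 1): 0.0, (2, 4): 0.0, (4, 3): 0.0}
print(eisner(4, lambda h, c: toy.get((h, c), -5.0)))    # 0.0
```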

  29. Eisner Algorithm Example
     [Figure: Eisner-algorithm derivation for "we wash our cats", building up to the goal item.]

  30. Three Approaches to Dependency Parsing
     1. Dynamic programming with the Eisner algorithm.
     2. Transition-based parsing with a stack.
     3. Chu-Liu-Edmonds algorithm for arborescences.

  31. Transition-Based Parsing
     ◮ Process x once, from left to right, making a sequence of greedy parsing decisions.

  32. Transition-Based Parsing
     ◮ Process x once, from left to right, making a sequence of greedy parsing decisions.
     ◮ Formally, the parser is a state machine (not a finite-state machine) whose state is represented by a stack S and a buffer B.

  33. Transition-Based Parsing
     ◮ Process x once, from left to right, making a sequence of greedy parsing decisions.
     ◮ Formally, the parser is a state machine (not a finite-state machine) whose state is represented by a stack S and a buffer B.
     ◮ Initialize the buffer to contain x and the stack to contain the root symbol.

  34. Transition-Based Parsing
     ◮ Process x once, from left to right, making a sequence of greedy parsing decisions.
     ◮ Formally, the parser is a state machine (not a finite-state machine) whose state is represented by a stack S and a buffer B.
     ◮ Initialize the buffer to contain x and the stack to contain the root symbol.
     ◮ The "arc standard" transition set (Nivre, 2004):
       ◮ shift: move the word at the front of the buffer B onto the stack S.
       ◮ right-arc: u = pop(S); v = pop(S); push(S, v → u).
       ◮ left-arc: u = pop(S); v = pop(S); push(S, v ← u).
       (For labeled parsing, add labels to the right-arc and left-arc transitions.)

  35. Transition-Based Parsing
     ◮ Process x once, from left to right, making a sequence of greedy parsing decisions.
     ◮ Formally, the parser is a state machine (not a finite-state machine) whose state is represented by a stack S and a buffer B.
     ◮ Initialize the buffer to contain x and the stack to contain the root symbol.
     ◮ The "arc standard" transition set (Nivre, 2004):
       ◮ shift: move the word at the front of the buffer B onto the stack S.
       ◮ right-arc: u = pop(S); v = pop(S); push(S, v → u).
       ◮ left-arc: u = pop(S); v = pop(S); push(S, v ← u).
       (For labeled parsing, add labels to the right-arc and left-arc transitions.)
     ◮ During parsing, apply a classifier to decide which transition to take next, greedily. No backtracking.
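
A sketch of the greedy arc-standard loop (illustrative, not the lecture's code). The classifier is stubbed out with a function that simply replays a fixed action sequence, which is enough to trace a parse of "we wash our cats"; a real parser would score transitions from features of S and B.

```python
# A sketch of greedy arc-standard parsing (illustrative, not the lecture's code).

def arc_standard(words, decide):
    """words: the sentence; decide(stack, buffer) returns 'shift', 'left-arc', or 'right-arc'."""
    stack = ["root"]
    buffer = list(words)
    arcs = []                                   # (head, dependent) pairs
    while buffer or len(stack) > 1:
        action = decide(stack, buffer)
        if action == "shift":
            stack.append(buffer.pop(0))
        elif action == "right-arc":             # u = pop; v = pop; arc v -> u; push v
            u, v = stack.pop(), stack.pop()
            arcs.append((v, u))
            stack.append(v)
        elif action == "left-arc":              # u = pop; v = pop; arc u -> v; push u
            u, v = stack.pop(), stack.pop()
            arcs.append((u, v))
            stack.append(u)
    return arcs

# Replay a plausible gold action sequence for "we wash our cats" in place of a classifier.
script = iter(["shift", "shift", "left-arc", "shift", "shift", "left-arc",
               "right-arc", "right-arc"])
print(arc_standard("we wash our cats".split(), lambda s, b: next(script)))
# [('wash', 'we'), ('cats', 'our'), ('wash', 'cats'), ('root', 'wash')]
```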

  36. Transition-Based Parsing: Example
     Stack S: root. Buffer B: we vigorously wash our cats who stink.
     Actions: (none yet)

  37. Transition-Based Parsing: Example
     Stack S: root we. Buffer B: vigorously wash our cats who stink.
     Actions: shift

  38. Transition-Based Parsing: Example
     Stack S: root we vigorously. Buffer B: wash our cats who stink.
     Actions: shift shift

  39. Transition-Based Parsing: Example
     Stack S: root we vigorously wash. Buffer B: our cats who stink.
     Actions: shift shift shift

  40. Transition-Based Parsing: Example
     Stack S: root we wash (vigorously attached as a left dependent of wash). Buffer B: our cats who stink.
     Actions: shift shift shift left-arc

  41. Transition-Based Parsing: Example
     Stack S: root wash (we and vigorously attached as left dependents of wash). Buffer B: our cats who stink.
     Actions: shift shift shift left-arc left-arc
