Dependency Parsing Dr. Besnik Fetahu
Parsing so far …
• Use context-free grammars (CFGs) to determine the constituents in a clause or sentence
• Use CFGs to parse entire sentences into constituency-based parse trees, i.e. syntactic parse trees
• In constituency-based parsing the dependencies between words in a sentence are "latent"
• In languages where word order is more relaxed, we would need separate rules for each of the different positions in which a phrase can appear
• Head words have to be found through hand-written rules (e.g. the head-finding rules proposed by Collins)
Dependency Parsing
• Relations among words are illustrated through directed and labelled arcs (typed dependencies)
• Relations are drawn from a fixed set of relations that are linguistically motivated (e.g. nsubj denotes the nominal subject of a sentence)
• Each word has exactly one incoming arc (except the root node, which has none)
• Dependency parse trees are useful for coreference resolution, question answering, etc.
Dependency vs. Constituent-based parse trees
Dependency Parsing
• Dependency parsing does not require the numerous rules needed by the CFGs used in constituency-based parsing.
• Dependency parses form acyclic trees with typed arcs between parent and child nodes.
• Can handle morphologically rich languages with relatively free word order (e.g. Czech).
• Head words or root nodes in a dependency parse tree can be used directly in other NLP applications, since we can directly extract a verb and its arguments (e.g. the case of prefer).
Dependency Relations
Selected dependency relations from the Universal Dependencies set (de Marneffe et al., 2014)
Dependency Relations
• Dependency relations capture grammatical functions.
• In English, the notions of subject, object, or indirect object often correlate with the positions in which they appear; however, this is not the case for languages with free word order (e.g. Czech).
• Relations fall into two groups: (i) clausal relations, which describe syntactic roles w.r.t. the predicate, and (ii) modifier relations, which categorize the ways words modify their heads (e.g. nmod or amod).
• Clausal relations: NSUBJ and DOBJ identify the arguments of the verb canceled.
• Modifier relations: NMOD, DET, and CASE denote modifiers of the nouns flights and Houston (illustrated in the sketch below).
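To see typed dependencies in practice, here is a minimal sketch using the spaCy library (the sentence and the en_core_web_sm model are assumptions; any pretrained English pipeline would do):

```python
import spacy

# Assumes: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
doc = nlp("United canceled the morning flights to Houston")
for token in doc:
    # token.dep_ is the typed relation; token.head is the word it depends on
    print(f"{token.text:10s} <--{token.dep_:8s}-- {token.head.text}")
```

Note that spaCy's English label set is close to, but not identical with, Universal Dependencies (e.g. it uses dobj where UD uses obj).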
Dependency Relations
Dependency Parsers
Dependency Formalisms
• Dependency structures form a directed graph G = (V, E), where V are the words (in some cases the stems of words) and E are the arcs representing the relations between words.
• More specifically, G is a tree which fulfills the following criteria:
• It has a single root node with no incoming arcs.
• Every other vertex has exactly one incoming arc.
• There is a unique path from the root to every vertex in the tree.
• An arc from a head word is projective if there is a path from the head to every word that lies between the head and the dependent in the sentence.
• A dependency tree is projective if all of its arcs are projective (a basic check is whether any arcs cross when drawn above the sentence; see the sketch below).
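A minimal sketch of the crossing-arcs check for projectivity (the list-of-heads encoding is an assumption; index 0 stands for ROOT):

```python
def is_projective(heads):
    """heads[d] = index of the head of word d (words are 1..n, 0 = ROOT).
    A tree is projective iff no two arcs cross when drawn above the sentence."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads) if d > 0]
    return not any(i < k < j < l          # (k, l) starts inside (i, j), ends outside
                   for (i, j) in arcs
                   for (k, l) in arcs)

# "Book me the morning flight": heads of [ROOT, Book, me, the, morning, flight]
print(is_projective([0, 0, 1, 5, 5, 1]))  # True: this parse has no crossing arcs
```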
Transition-Based Dependency Parsing
Shift-Reduce Parsing
• It is the most basic dependency parsing approach. In its classic form it uses a CFG, a stack, and a list of tokens that need to be parsed.
• It successively shifts tokens from the list onto the stack; the top two elements of the stack are matched against the rules of the CFG, and when they match, the two words are replaced by the corresponding non-terminal.
• In shift-reduce dependency parsing we define the notion of a configuration, which consists of: (i) a stack, (ii) an input buffer of words, and (iii) a set of relations representing a dependency tree.
• Goal: Find a final configuration in which all the words have been accounted for and an appropriate dependency tree has been synthesized.
Shift-Reduce Parsing
Shift-Reduce Parsing
• Create an initial configuration in which the stack contains only the ROOT node, the word buffer is initialized with the words of the sentence, and an empty set of relations is created to represent the parse.
• Shift-reduce parsing then consists of the following three transition operators:
• LEFTARC: Assert a head-dependent relation between the word at the top of the stack and the word directly beneath it; remove the lower word from the stack.
• RIGHTARC: Assert a head-dependent relation between the second word on the stack and the word at the top; remove the word at the top of the stack.
• SHIFT: Remove the word at the front of the input buffer and push it onto the stack.
Shift-Reduce Parsing
• These operators implement what is known as the arc-standard approach to transition-based parsing.
• The transition operators (LEFTARC and RIGHTARC) assert relations between the top two words on the stack. Once an element has been assigned its head word, it is removed from the stack.
• By definition, ROOT is not allowed to have an incoming arc; thus LEFTARC cannot be applied when ROOT is the second element on the stack.
• The transition operators rely on an "oracle" which provides the correct transition at each step.
• The algorithm has linear complexity, as we make only one pass over the word buffer.
• Transition-based parsers are greedy algorithms: for each configuration a single choice is made, with no backtracking. (A minimal sketch of the parsing loop is given below.)
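A minimal sketch of the arc-standard loop, assuming a hypothetical oracle callable that maps a configuration to one of the three operators (the representation and names are illustrative, not from the slides):

```python
def arc_standard_parse(words, oracle):
    """words: the sentence tokens (without ROOT).
    oracle(stack, buffer): hypothetical; returns 'LEFTARC', 'RIGHTARC', or 'SHIFT'."""
    stack = ["ROOT"]
    buffer = list(words)
    relations = []                        # (head, dependent) arcs found so far
    while buffer or len(stack) > 1:       # final configuration: stack == ["ROOT"]
        action = oracle(stack, buffer)
        if action == "LEFTARC":           # top of stack heads the word beneath it
            dependent = stack.pop(-2)
            relations.append((stack[-1], dependent))
        elif action == "RIGHTARC":        # second word heads the top of the stack
            dependent = stack.pop()
            relations.append((stack[-1], dependent))
        else:                             # SHIFT
            stack.append(buffer.pop(0))
    return relations
```

Note this assumes a well-behaved oracle (e.g. one that never returns SHIFT on an empty buffer).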
Shift-Reduce Parsing: "Book me the morning flight"
Shift-Reduce Parsing
• There are two main issues with the assumption that there is a single parse for any two words of an input sentence:
1. Due to ambiguity, there may be different transition sequences that lead to valid parses.
2. We assume that our oracle provides us with the correct parse for each word pair. This assumption is unlikely to hold in reality.
Shift-Reduce Parsing Oracle Training
• Use supervised machine learning to train dependency parsing oracles.
• Use treebank data to learn a model that maps specific configurations to specific transition operators.
• However, treebanks give us reference parses, not the transition sequences that produce them; the correct transitions must be derived from the reference parse.
• Given a reference parse and a current configuration, train the oracle as follows (a sketch is given below):
• Choose LEFTARC if it produces a correct head-dependent relation given the reference parse and the current configuration.
• Choose RIGHTARC if (i) it produces a correct head-dependent relation given the reference parse, and (ii) all of the dependents of the word at the top of the stack have already been assigned.
• Otherwise choose SHIFT.
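The rules above as a sketch in code; the encoding (gold heads as a dict over word indices, stack items as indices) is an assumption for illustration:

```python
def training_oracle(stack, gold_heads, attached):
    """stack: word indices, top at the end; gold_heads[d] = reference head of d;
    attached: set of words that have already received their head."""
    if len(stack) >= 2:
        top, below = stack[-1], stack[-2]
        if gold_heads.get(below) == top:
            return "LEFTARC"
        top_deps = [d for d, h in gold_heads.items() if h == top]
        if gold_heads.get(top) == below and all(d in attached for d in top_deps):
            return "RIGHTARC"   # safe only once all of top's dependents are attached
    return "SHIFT"
```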
Shift-Reduce Parsing Oracle Training
Training data for the dependency parsing oracle.
Oracle Training Features
Extract features from the configurations based on feature templates such as:
⟨s1.w, op⟩, ⟨s2.w, op⟩, ⟨s1.t, op⟩, ⟨s2.t, op⟩, ⟨b1.w, op⟩, ⟨s1.wt, op⟩
where s1 and s2 are the top two words on the stack, b1 is the word at the front of the buffer, .w is the word form, .t its part-of-speech tag, .wt the word/tag pair, and op the transition operator. (A sketch of template instantiation follows.)
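A minimal sketch of instantiating these templates for one configuration (the feature-string format is an assumption; during training each feature is paired with the oracle's operator op as the classification label):

```python
def extract_features(stack, buffer, tag):
    """stack/buffer hold word forms; tag maps a word form to its POS tag."""
    feats = []
    if stack:
        s1 = stack[-1]
        feats += [f"s1.w={s1}", f"s1.t={tag[s1]}", f"s1.wt={s1}/{tag[s1]}"]
    if len(stack) >= 2:
        s2 = stack[-2]
        feats += [f"s2.w={s2}", f"s2.t={tag[s2]}"]
    if buffer:
        feats += [f"b1.w={buffer[0]}"]
    return feats
```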
Advanced Transition-Based Parsing
Arc-eager Transition Parsing
• Shift-reduce dependency parsing with the arc-standard algorithm delays attaching a word to its head until all of that word's own dependents have been found.
• The longer a word has to wait before being assigned its head, the more opportunities there are for something to go wrong and produce an inaccurate parse.
• The arc-eager approach allows words to have their head assigned as early as possible, before all of their dependents have been encountered.
Arc-eager Transition Parsing
• Arc-eager makes minor changes to the arc-standard algorithm (see the sketch below):
• LEFTARC: Assert a head-dependent relation between the word at the front of the input buffer and the word at the top of the stack; pop the stack.
• RIGHTARC: Assert a head-dependent relation between the word on top of the stack and the word at the front of the input buffer; shift the word at the front of the input buffer onto the stack.
• SHIFT: Remove the word at the front of the input buffer and push it onto the stack.
• REDUCE: Pop the stack (the popped word must already have been assigned its head).
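The four operators as a single transition function; the configuration encoding mirrors the arc-standard sketch above and is likewise an assumption:

```python
def arc_eager_step(action, stack, buffer, relations):
    """Apply one arc-eager transition to the configuration, in place."""
    if action == "LEFTARC":       # front of buffer heads the top of the stack
        dependent = stack.pop()
        relations.append((buffer[0], dependent))
    elif action == "RIGHTARC":    # top of stack heads the front of the buffer,
        dependent = buffer.pop(0) # which then moves onto the stack so it can
        relations.append((stack[-1], dependent))
        stack.append(dependent)   # still collect dependents of its own
    elif action == "SHIFT":
        stack.append(buffer.pop(0))
    elif action == "REDUCE":      # discard a word that already has its head
        stack.pop()
```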
Arc-eager Transition Parsing
Graph-based Dependency Parsing
Graph-based Dependency Parsing
• Graph-based approaches consider all possible parse trees and pick the tree that maximizes some score (similar to constituent parsing):
$\hat{T}(S) = \arg\max_{t \in \mathcal{T}(S)} \text{score}(t, S)$, where $\text{score}(t, S) = \sum_{e \in t} \text{score}(e)$ and $\mathcal{T}(S)$ is the set of possible trees over sentence $S$.
• Graph-based approaches are better suited to sentences with long-range dependencies.
• For an input sentence, construct a fully connected, weighted, directed graph whose vertices are the words and whose directed edges are all possible head-dependent relations.
• Typical graph-based approaches use a maximum spanning tree (MST) algorithm to find the best parse tree.
Graph-based Dependency Parsing
Graph-based Dependency Parsing
[Figure: fully connected weighted graph over ROOT, "Book", "that", "flight"; the maximum spanning tree is shown in blue.]
Graph-based Dependency Parsing — Step: v = 'Book'
[Figure: the incoming arcs of 'Book' are considered.]
Graph-based Dependency Parsing — Step: v = 'Book'
[Figure: the incoming arcs of 'Book' are rescored by subtracting its best incoming score: 12 → 0, 6 → −6, 5 → −7.]
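The rescoring above is the cycle-handling step of the Chu-Liu-Edmonds maximum-spanning-tree algorithm commonly used for this search. Below is a compact, illustrative sketch (the score-matrix encoding and all names are assumptions, not a production implementation):

```python
import numpy as np

def find_cycle(head):
    """Return a list of nodes forming a cycle under `head`, or None."""
    n, state = len(head), [0] * len(head)     # 0 = new, 1 = on path, 2 = done
    for start in range(1, n):
        path, v = [], start
        while v > 0 and state[v] == 0:
            state[v] = 1
            path.append(v)
            v = head[v]
        if v > 0 and state[v] == 1:           # ran into the current path: a cycle
            return path[path.index(v):]
        for u in path:
            state[u] = 2
    return None

def chu_liu_edmonds(score):
    """score[h, d] = weight of arc h -> d; node 0 is ROOT.
    Returns head[d] for every node, with head[0] = -1."""
    n = score.shape[0]
    s = score.astype(float).copy()
    np.fill_diagonal(s, -np.inf)              # no self-loops
    s[:, 0] = -np.inf                         # ROOT takes no incoming arc
    head = s.argmax(axis=0)                   # greedy: best incoming arc per node
    head[0] = -1
    cycle = find_cycle(head)
    if cycle is None:
        return head
    # Contract the cycle into one node; arcs entering it are rescored by the *gain*
    # of swapping out a cycle arc (this is the 12 -> 0, 6 -> -6, 5 -> -7 step).
    rest = [v for v in range(n) if v not in set(cycle)]   # rest[0] == 0 (ROOT)
    c = len(rest)                                         # index of contracted node
    s2 = np.full((c + 1, c + 1), -np.inf)
    enter, leave = {}, {}
    for i, u in enumerate(rest):
        for j, v in enumerate(rest):
            s2[i, j] = s[u, v]
        gains = [s[u, d] - s[head[d], d] for d in cycle]  # arcs entering the cycle
        k = int(np.argmax(gains))
        s2[i, c], enter[u] = gains[k], cycle[k]
    for j, v in enumerate(rest):
        outs = [s[d, v] for d in cycle]                   # arcs leaving the cycle
        k = int(np.argmax(outs))
        s2[c, j], leave[v] = outs[k], cycle[k]
    sub = chu_liu_edmonds(s2)                             # recurse, then expand
    full = np.full(n, -1)
    for j, v in enumerate(rest):
        if v != 0:
            full[v] = leave[v] if sub[j] == c else rest[sub[j]]
    u = rest[sub[c]]                          # external head of the contracted node
    for d in cycle:                           # keep cycle arcs, break at entry point
        full[d] = u if d == enter[u] else head[d]
    return full
```

The recursion bottoms out when the greedy best-incoming-arc choice is already cycle-free; the gain-based rescoring of arcs entering the contracted node is exactly what the worked example visualizes.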