Dependency Parsing Dr. Besnik Fetahu
Parsing so far …
• Use context-free grammars (CFGs) to determine the constituents in a clause or sentence
• Use CFGs to parse entire sentences into constituency-based parse trees, i.e. syntactic parse trees
• In constituency-based parsing the dependencies between words in a sentence are "latent"
• In languages where word order is more relaxed, we would need separate rules for each of the different positions in which a phrase can appear
• Head words have to be found through hand-written rules (e.g. the head-finding rules proposed by Collins)
Dependency Parsing
• Relations among words are illustrated through directed and labelled arcs (typed dependencies)
• Relations are drawn from a fixed set of relations that are linguistically motivated (e.g. nsubj denotes the nominal subject of a sentence)
• Each word has exactly one incoming arc (except the root node, which has none)
• Dependency parse trees are useful for coreference resolution, question answering, etc.
Dependency vs. Constituent-based parse trees
Dependency Parsing
• Dependency parsing does not require the numerous rules needed by the CFGs used in constituency-based parsing.
• Dependency parses form acyclic trees with typed arcs between parent and child nodes.
• Can handle morphologically rich languages with relatively free word order (e.g. Czech).
• Head words or root nodes in a dependency parse tree can be used directly in other NLP applications, since we can directly extract a verb and its arguments (e.g. the case of prefer).
Dependency Relations
Selected dependency relations from the Universal Dependencies set (de Marneffe et al., 2014)
Dependency Relations
• Dependency relations capture grammatical functions.
• In English, the notions of subject, object, or indirect object often correlate with the positions in which they appear; however, this is not the case for languages with free word order (e.g. Czech).
• Relations fall into two groups: (i) clausal relations, which describe syntactic roles w.r.t. the predicate, and (ii) modifier relations, which categorize the ways words modify their heads (e.g. nmod or amod).
• Clausal relations: NSUBJ and DOBJ identify the arguments of the verb canceled.
• Modifier relations: NMOD, DET, and CASE denote modifiers of the nouns flights and Houston (illustrated in the sketch below).
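To see typed dependencies in practice, here is a minimal sketch using the spaCy library (the sentence and the en_core_web_sm model are assumptions; any pretrained English pipeline would do):

```python
import spacy

# Assumes: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
doc = nlp("United canceled the morning flights to Houston")
for token in doc:
    # token.dep_ is the typed relation; token.head is the word it depends on
    print(f"{token.text:10s} <--{token.dep_:8s}-- {token.head.text}")
```

Note that spaCy's English label set is close to, but not identical with, Universal Dependencies (e.g. it uses dobj where UD uses obj).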
Dependency Relations
Dependency Parsers
Dependency Formalisms
• Dependency structures form a directed graph G = (V, E), where V are the words (in some cases the stems of words) and E are the arcs representing the relations between words.
• More specifically, G is a tree which fulfills the following criteria:
• It has a single root node with no incoming arcs.
• Every other vertex has exactly one incoming arc.
• There is a unique path from the root to every vertex in the tree.
• An arc from a head word is projective if there is a path from the head to every word that lies between the head and the dependent in the sentence.
• A dependency tree is projective if all of its arcs are projective (a basic check is whether any arcs cross when drawn above the sentence; see the sketch below).
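A minimal sketch of the crossing-arcs check for projectivity (the list-of-heads encoding is an assumption; index 0 stands for ROOT):

```python
def is_projective(heads):
    """heads[d] = index of the head of word d (words are 1..n, 0 = ROOT).
    A tree is projective iff no two arcs cross when drawn above the sentence."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads) if d > 0]
    return not any(i < k < j < l          # (k, l) starts inside (i, j), ends outside
                   for (i, j) in arcs
                   for (k, l) in arcs)

# "Book me the morning flight": heads of [ROOT, Book, me, the, morning, flight]
print(is_projective([0, 0, 1, 5, 5, 1]))  # True: this parse has no crossing arcs
```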
Transition-Based Dependency Parsing
Shift-Reduce Parsing
• It is the most basic dependency parsing approach. In its classic form it uses a CFG, a stack, and a list of tokens that need to be parsed.
• It successively shifts tokens from the list onto the stack; the top two elements of the stack are matched against the rules of the CFG, and when they match, the two words are replaced by the corresponding non-terminal.
• In shift-reduce dependency parsing we define the notion of a configuration, which consists of: (i) a stack, (ii) an input buffer of words, and (iii) a set of relations representing a dependency tree.
• Goal: Find a final configuration in which all the words have been accounted for and an appropriate dependency tree has been synthesized.
Shift-Reduce Parsing
Shift-Reduce Parsing
• Create an initial configuration in which the stack contains only the ROOT node, the word buffer is initialized with the words of the sentence, and an empty set of relations is created to represent the parse.
• Shift-reduce parsing then consists of the following three transition operators:
• LEFTARC: Assert a head-dependent relation between the word at the top of the stack and the word directly beneath it; remove the lower word from the stack.
• RIGHTARC: Assert a head-dependent relation between the second word on the stack and the word at the top; remove the word at the top of the stack.
• SHIFT: Remove the word at the front of the input buffer and push it onto the stack.
Shift-Reduce Parsing
• These operators implement what is known as the arc-standard approach to transition-based parsing.
• The transition operators (LEFTARC and RIGHTARC) assert relations between the top two words on the stack. Once an element has been assigned its head word, it is removed from the stack.
• By definition, ROOT is not allowed to have an incoming arc; thus LEFTARC cannot be applied when ROOT is the second element on the stack.
• The transition operators rely on an "oracle" which provides the correct transition at each step.
• The algorithm has linear complexity, as we make only one pass over the word buffer.
• Transition-based parsers are greedy algorithms: for each configuration a single choice is made, with no backtracking. (A minimal sketch of the parsing loop is given below.)
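A minimal sketch of the arc-standard loop, assuming a hypothetical oracle callable that maps a configuration to one of the three operators (the representation and names are illustrative, not from the slides):

```python
def arc_standard_parse(words, oracle):
    """words: the sentence tokens (without ROOT).
    oracle(stack, buffer): hypothetical; returns 'LEFTARC', 'RIGHTARC', or 'SHIFT'."""
    stack = ["ROOT"]
    buffer = list(words)
    relations = []                        # (head, dependent) arcs found so far
    while buffer or len(stack) > 1:       # final configuration: stack == ["ROOT"]
        action = oracle(stack, buffer)
        if action == "LEFTARC":           # top of stack heads the word beneath it
            dependent = stack.pop(-2)
            relations.append((stack[-1], dependent))
        elif action == "RIGHTARC":        # second word heads the top of the stack
            dependent = stack.pop()
            relations.append((stack[-1], dependent))
        else:                             # SHIFT
            stack.append(buffer.pop(0))
    return relations
```

Note this assumes a well-behaved oracle (e.g. one that never returns SHIFT on an empty buffer).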
Shift-Reduce Parsing: "Book me the morning flight"
Shift-Reduce Parsing
• There are two main issues with the assumption that there is a single parse for any two words of an input sentence:
1. Due to ambiguity, there may be different transition sequences that lead to valid parses.
2. We assume that our oracle provides us with the correct parse for each word pair. This assumption is unlikely to hold in reality.
Shift-Reduce Parsing Oracle Training
• Use supervised machine learning to train dependency parsing oracles.
• Use treebank data to learn a model that maps specific configurations to specific transition operators.
• However, treebanks give us reference parses, not the transition sequences that produce them; the correct transitions must be derived from the reference parse.
• Given a reference parse and a current configuration, train the oracle as follows (a sketch is given below):
• Choose LEFTARC if it produces a correct head-dependent relation given the reference parse and the current configuration.
• Choose RIGHTARC if (i) it produces a correct head-dependent relation given the reference parse, and (ii) all of the dependents of the word at the top of the stack have already been assigned.
• Otherwise choose SHIFT.
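The rules above as a sketch in code; the encoding (gold heads as a dict over word indices, stack items as indices) is an assumption for illustration:

```python
def training_oracle(stack, gold_heads, attached):
    """stack: word indices, top at the end; gold_heads[d] = reference head of d;
    attached: set of words that have already received their head."""
    if len(stack) >= 2:
        top, below = stack[-1], stack[-2]
        if gold_heads.get(below) == top:
            return "LEFTARC"
        top_deps = [d for d, h in gold_heads.items() if h == top]
        if gold_heads.get(top) == below and all(d in attached for d in top_deps):
            return "RIGHTARC"   # safe only once all of top's dependents are attached
    return "SHIFT"
```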
Shift-Reduce Parsing Oracle Training
Training data for the dependency parsing oracle.
Oracle Training Features
Extract features from the configurations based on feature templates such as:
⟨s1.w, op⟩, ⟨s2.w, op⟩, ⟨s1.t, op⟩, ⟨s2.t, op⟩, ⟨b1.w, op⟩, ⟨s1.wt, op⟩
where s1 and s2 are the top two words on the stack, b1 is the word at the front of the buffer, .w is the word form, .t its part-of-speech tag, .wt the word/tag pair, and op the transition operator. (A sketch of template instantiation follows.)
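A minimal sketch of instantiating these templates for one configuration (the feature-string format is an assumption; during training each feature is paired with the oracle's operator op as the classification label):

```python
def extract_features(stack, buffer, tag):
    """stack/buffer hold word forms; tag maps a word form to its POS tag."""
    feats = []
    if stack:
        s1 = stack[-1]
        feats += [f"s1.w={s1}", f"s1.t={tag[s1]}", f"s1.wt={s1}/{tag[s1]}"]
    if len(stack) >= 2:
        s2 = stack[-2]
        feats += [f"s2.w={s2}", f"s2.t={tag[s2]}"]
    if buffer:
        feats += [f"b1.w={buffer[0]}"]
    return feats
```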
Advanced Transition-Based Parsing
Arc-eager Transition Parsing
• Shift-reduce dependency parsing with the arc-standard algorithm delays attaching a word to its head until all of that word's own dependents have been found.
• The longer a word has to wait before being assigned its head, the more opportunities there are for something to go wrong and produce an inaccurate parse.
• The arc-eager approach allows words to have their head assigned as early as possible, before all of their dependents have been encountered.
Arc-eager Transition Parsing
• Arc-eager makes minor changes to the arc-standard algorithm (see the sketch below):
• LEFTARC: Assert a head-dependent relation between the word at the front of the input buffer and the word at the top of the stack; pop the stack.
• RIGHTARC: Assert a head-dependent relation between the word on top of the stack and the word at the front of the input buffer; shift the word at the front of the input buffer onto the stack.
• SHIFT: Remove the word at the front of the input buffer and push it onto the stack.
• REDUCE: Pop the stack (the popped word must already have been assigned its head).
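The four operators as a single transition function; the configuration encoding mirrors the arc-standard sketch above and is likewise an assumption:

```python
def arc_eager_step(action, stack, buffer, relations):
    """Apply one arc-eager transition to the configuration, in place."""
    if action == "LEFTARC":       # front of buffer heads the top of the stack
        dependent = stack.pop()
        relations.append((buffer[0], dependent))
    elif action == "RIGHTARC":    # top of stack heads the front of the buffer,
        dependent = buffer.pop(0) # which then moves onto the stack so it can
        relations.append((stack[-1], dependent))
        stack.append(dependent)   # still collect dependents of its own
    elif action == "SHIFT":
        stack.append(buffer.pop(0))
    elif action == "REDUCE":      # discard a word that already has its head
        stack.pop()
```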
Arc-eager Transition Parsing
Graph-based Dependency Parsing
Graph-based Dependency Parsing
• Graph-based approaches consider all possible parse trees and pick the tree that maximizes some score (similar to constituent parsing):
$\hat{T}(S) = \arg\max_{t \in \mathcal{T}(S)} \text{score}(t, S)$, where $\text{score}(t, S) = \sum_{e \in t} \text{score}(e)$ and $\mathcal{T}(S)$ is the set of possible trees over sentence $S$.
• Graph-based approaches are better suited to sentences with long-range dependencies.
• For an input sentence, construct a fully connected, weighted, directed graph whose vertices are the words and whose directed edges are all possible head-dependent relations.
• Typical graph-based approaches use a maximum spanning tree (MST) algorithm to find the best parse tree.
Graph-based Dependency Parsing
Graph-based Dependency Parsing
[Figure: fully connected weighted graph over ROOT, "Book", "that", "flight"; the maximum spanning tree is shown in blue.]
Graph-based Dependency Parsing — Step: v = 'Book'
[Figure: the incoming arcs of 'Book' are considered.]
Graph-based Dependency Parsing — Step: v = 'Book'
[Figure: the incoming arcs of 'Book' are rescored by subtracting its best incoming score: 12 → 0, 6 → −6, 5 → −7.]
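The rescoring above is the cycle-handling step of the Chu-Liu-Edmonds maximum-spanning-tree algorithm commonly used for this search. Below is a compact, illustrative sketch (the score-matrix encoding and all names are assumptions, not a production implementation):

```python
import numpy as np

def find_cycle(head):
    """Return a list of nodes forming a cycle under `head`, or None."""
    n, state = len(head), [0] * len(head)     # 0 = new, 1 = on path, 2 = done
    for start in range(1, n):
        path, v = [], start
        while v > 0 and state[v] == 0:
            state[v] = 1
            path.append(v)
            v = head[v]
        if v > 0 and state[v] == 1:           # ran into the current path: a cycle
            return path[path.index(v):]
        for u in path:
            state[u] = 2
    return None

def chu_liu_edmonds(score):
    """score[h, d] = weight of arc h -> d; node 0 is ROOT.
    Returns head[d] for every node, with head[0] = -1."""
    n = score.shape[0]
    s = score.astype(float).copy()
    np.fill_diagonal(s, -np.inf)              # no self-loops
    s[:, 0] = -np.inf                         # ROOT takes no incoming arc
    head = s.argmax(axis=0)                   # greedy: best incoming arc per node
    head[0] = -1
    cycle = find_cycle(head)
    if cycle is None:
        return head
    # Contract the cycle into one node; arcs entering it are rescored by the *gain*
    # of swapping out a cycle arc (this is the 12 -> 0, 6 -> -6, 5 -> -7 step).
    rest = [v for v in range(n) if v not in set(cycle)]   # rest[0] == 0 (ROOT)
    c = len(rest)                                         # index of contracted node
    s2 = np.full((c + 1, c + 1), -np.inf)
    enter, leave = {}, {}
    for i, u in enumerate(rest):
        for j, v in enumerate(rest):
            s2[i, j] = s[u, v]
        gains = [s[u, d] - s[head[d], d] for d in cycle]  # arcs entering the cycle
        k = int(np.argmax(gains))
        s2[i, c], enter[u] = gains[k], cycle[k]
    for j, v in enumerate(rest):
        outs = [s[d, v] for d in cycle]                   # arcs leaving the cycle
        k = int(np.argmax(outs))
        s2[c, j], leave[v] = outs[k], cycle[k]
    sub = chu_liu_edmonds(s2)                             # recurse, then expand
    full = np.full(n, -1)
    for j, v in enumerate(rest):
        if v != 0:
            full[v] = leave[v] if sub[j] == c else rest[sub[j]]
    u = rest[sub[c]]                          # external head of the contracted node
    for d in cycle:                           # keep cycle arcs, break at entry point
        full[d] = u if d == enter[u] else head[d]
    return full
```

The recursion bottoms out when the greedy best-incoming-arc choice is already cycle-free; the gain-based rescoring of arcs entering the contracted node is exactly what the worked example visualizes.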