Dependency Parsing

Joakim Nivre
Uppsala University
Department of Linguistics and Philology
joakim.nivre@lingfil.uu.se
Overview

1. Dependency Trees
2. Arc-Factored Models
3. Online Learning
4. Eisner's Algorithm
5. Spanning Tree Parsing
Dependency Trees

◮ Input sentence: x = x_1, ..., x_n
◮ Dependency graph: G = (V_x, A)
  ◮ V_x = {0, 1, ..., n} is a set of nodes, one for each word x_i, plus node 0 for an artificial root
  ◮ A ⊆ V_x × L × V_x is a set of labeled arcs (i, l, j)
◮ Dependency tree = dependency graph satisfying:
  1. Root: no arcs into node 0.
  2. Single-Head: at most one incoming arc per node.
  3. Connected: the graph is weakly connected.
(A sketch of this well-formedness check follows.)
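As a concrete illustration (not from the slides), here is a minimal Python sketch of the three constraints, assuming arcs are represented as (head, label, dependent) triples over nodes 0..n:

```python
from collections import defaultdict, deque

def is_dependency_tree(n, arcs):
    """Check the Root, Single-Head, and Connected constraints.

    n: number of words; nodes are 0 (root) .. n
    arcs: iterable of (head, label, dependent) triples
    """
    arcs = list(arcs)
    # Root: no arcs into node 0
    if any(d == 0 for (_, _, d) in arcs):
        return False
    # Single-Head: at most one incoming arc per node
    deps = [d for (_, _, d) in arcs]
    if len(deps) != len(set(deps)):
        return False
    # Connected: weak connectivity, checked by BFS over undirected edges
    adj = defaultdict(set)
    for (h, _, d) in arcs:
        adj[h].add(d)
        adj[d].add(h)
    seen = {0}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in adj[u] - seen:
            seen.add(v)
            queue.append(v)
    return len(seen) == n + 1

# "John saw Mary": root -> saw, saw -> John, saw -> Mary
arcs = {(0, "root", 2), (2, "subj", 1), (2, "obj", 3)}
assert is_dependency_tree(3, arcs)
```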
Dependency Trees

Projectivity: for every arc (i, l, j), there is a directed path from i to every word k such that min(i, j) < k < max(i, j); that is, the head i dominates every word between i and j. (A sketch of this check follows.)
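A direct transcription of this definition into Python, assuming heads[j] gives the head of word j in a well-formed tree (labels play no role here; the function name is illustrative):

```python
def is_projective(heads):
    """heads[j] = head of word j; heads[0] is a dummy entry for the root.

    An arc (i, j) is projective iff i dominates every k with
    min(i, j) < k < max(i, j).
    """
    def dominated_by(i, k):
        # follow head pointers upward from k; True if we reach i
        # (assumes a well-formed tree, so the walk terminates at 0)
        while k != 0:
            k = heads[k]
            if k == i:
                return True
        return False

    for j in range(1, len(heads)):
        i = heads[j]
        for k in range(min(i, j) + 1, max(i, j)):
            if not dominated_by(i, k):
                return False
    return True

# "John saw Mary" (root -> saw -> {John, Mary}) is projective
assert is_projective([0, 2, 0, 2])
# crossing arcs 1 -> 3 and 2 -> 4 make a tree non-projective
assert not is_projective([0, 0, 3, 1, 2])
```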
Dependency Trees

Non-projectivity in treebank data (percentage of non-projective trees and arcs):

Language   Trees    Arcs
Arabic     11.2%    0.4%
Basque     26.2%    2.9%
Czech      23.2%    1.9%
Danish     15.6%    1.0%
Greek      20.3%    1.1%
Russian    10.6%    0.9%
Slovene    22.2%    1.9%
Turkish    11.6%    1.5%
Dependency Trees

Parsing problem:
◮ Input: x = x_1, ..., x_n
◮ Output: dependency tree y for x

Equivalent formulations:
◮ Assign a head i and a label l to every node j (1 ≤ j ≤ n) under the tree constraint
◮ Find a directed spanning tree in the complete graph G_x = (V_x, V_x × L × V_x)
Arc-Factored Models

The score of a tree factors over its arcs:

  Score(x, y) = Σ_{(i,l,j) ∈ A_y} Score(i, l, j, x)

  GEN(x) = { y | y is a spanning tree in G_x = (V_x, V_x × L × V_x) }

  EVAL(x, y) = Score(x, y) = Σ_{(i,l,j) ∈ A_y} Score(i, l, j, x)
Arc-Factored Models

Each arc is scored by a linear model over K features:

  Score(i, l, j, x) = Σ_{k=1}^{K} f_k(i, l, j, x) · w_k

  y* = argmax_{y ∈ GEN(x)} Σ_{(i,l,j) ∈ A_y} Σ_{k=1}^{K} f_k(i, l, j, x) · w_k
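A minimal sketch of this linear scoring, assuming binary features represented as strings and the weight vector as a sparse dict (the names score_arc and score_tree are illustrative, not from the slides):

```python
def score_arc(i, lab, j, x, feats, w):
    """Score(i, l, j, x) = sum_k f_k(i, l, j, x) * w_k, with binary f_k."""
    return sum(w.get(f, 0.0) for f in feats(i, lab, j, x))

def score_tree(arcs, x, feats, w):
    """Score(x, y): the tree score decomposes over the arcs of y."""
    return sum(score_arc(i, lab, j, x, feats, w) for (i, lab, j) in arcs)

# toy feature function: a single feature per (head word, label, dep word)
feats = lambda i, lab, j, x: [f"hw={x[i]}|l={lab}|dw={x[j]}"]
w = {"hw=saw|l=obj|dw=Mary": 1.5, "hw=<root>|l=root|dw=saw": 1.0}
x = ["<root>", "John", "saw", "Mary"]
print(score_tree({(0, "root", 2), (2, "subj", 1), (2, "obj", 3)}, x, feats, w))  # 2.5
```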
Arc-Factored Models

Feature templates (w = word form, p = PoS tag; b ranges over the words between x_i and x_j):

Unigram:
  x_i-w, x_i-p
  x_i-w
  x_i-p
  x_j-w, x_j-p
  x_j-w
  x_j-p

Bigram:
  x_i-w, x_i-p, x_j-w, x_j-p
  x_i-p, x_j-w, x_j-p
  x_i-w, x_j-w, x_j-p
  x_i-w, x_i-p, x_j-p
  x_i-w, x_i-p, x_j-w
  x_i-w, x_j-w
  x_i-p, x_j-p

In-Between PoS:
  x_i-p, b-p, x_j-p

Surrounding PoS:
  x_i-p, x_{i+1}-p, x_{j-1}-p, x_j-p
  x_{i-1}-p, x_i-p, x_{j-1}-p, x_j-p
  x_i-p, x_{i+1}-p, x_j-p, x_{j+1}-p
  x_{i-1}-p, x_i-p, x_j-p, x_{j+1}-p
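Here is a sketch implementing a subset of these templates (all unigram and bigram templates, the in-between template, and one of the four surrounding templates), assuming x is a list of (word, PoS) pairs with a dummy root entry at index 0. Conjoining every template with the arc label is my assumption, not stated in the table:

```python
def arc_features(i, lab, j, x):
    """Fire string-valued features for the arc i -> j with label lab."""
    wi, pi = x[i]
    wj, pj = x[j]
    feats = [
        # unigram templates
        f"wi={wi}|pi={pi}", f"wi={wi}", f"pi={pi}",
        f"wj={wj}|pj={pj}", f"wj={wj}", f"pj={pj}",
        # bigram templates
        f"wi={wi}|pi={pi}|wj={wj}|pj={pj}",
        f"pi={pi}|wj={wj}|pj={pj}",
        f"wi={wi}|wj={wj}|pj={pj}",
        f"wi={wi}|pi={pi}|pj={pj}",
        f"wi={wi}|pi={pi}|wj={wj}",
        f"wi={wi}|wj={wj}",
        f"pi={pi}|pj={pj}",
    ]
    # in-between PoS: one feature per word strictly between i and j
    lo, hi = min(i, j), max(i, j)
    feats += [f"pi={pi}|b={x[k][1]}|pj={pj}" for k in range(lo + 1, hi)]
    # one surrounding-PoS template, guarding the sentence boundaries
    def pos(k):
        return x[k][1] if 0 <= k < len(x) else "<None>"
    feats.append(f"pi={pi}|pi+1={pos(i+1)}|pj-1={pos(j-1)}|pj={pj}")
    # conjoin everything with the arc label (assumed, see above)
    return [f + f"|l={lab}" for f in feats]

x = [("<root>", "ROOT"), ("John", "NNP"), ("saw", "VBD"), ("Mary", "NNP")]
print(arc_features(2, "obj", 3, x)[:3])
```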
Online Learning

Training data: T = {(x_i, y_i)}_{i=1}^{|T|}

  w ← 0
  for n : 1..N
    for i : 1..|T|
      y* ← Parse(x_i, w)
      if y* ≠ y_i
        w ← Update(w, x_i, y*, y_i)
  return w
Online Learning

  Parse(x, w)
    return argmax_{y ∈ GEN(x)} Σ_{(i,l,j) ∈ A_y} Σ_{k=1}^{K} f_k(i, l, j, x) · w_k

  Update(w, x, y*, y)
    for k : 1..K
      for (i, l, j) ∈ A_{y*}
        w_k ← w_k − f_k(i, l, j, x)
      for (i, l, j) ∈ A_y
        w_k ← w_k + f_k(i, l, j, x)
    return w
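Putting the loop and the update together, here is a sketch of this structured perceptron in Python, assuming a parse(x, w) decoder (e.g. the Eisner or spanning-tree decoders below) that returns the highest-scoring arc set under the current weights, and a feats(i, l, j, x) feature function like the one sketched above. Weights are a sparse dict rather than a dense K-dimensional vector:

```python
from collections import defaultdict

def train(data, parse, feats, epochs=10):
    """data: list of (x, gold_arcs) pairs; returns learned weights."""
    w = defaultdict(float)
    for _ in range(epochs):
        for x, gold_arcs in data:
            pred_arcs = parse(x, w)
            if pred_arcs != gold_arcs:
                # penalize the predicted arcs, reward the gold arcs
                for (i, lab, j) in pred_arcs:
                    for f in feats(i, lab, j, x):
                        w[f] -= 1.0
                for (i, lab, j) in gold_arcs:
                    for f in feats(i, lab, j, x):
                        w[f] += 1.0
    return w
```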
Eisner’s Algorithm

[Figure: chart items in CKY parsing vs. Eisner’s algorithm]
Eisner’s Algorithm

  for i : 0..n and all d, c
    C[i][i][d][c] ← 0.0
  for m : 1..n
    for i : 0..n−m
      j ← i + m
      C[i][j][←][0] ← max_{i ≤ k < j} C[i][k][→][1] + C[k+1][j][←][1] + Score(j, i)
      C[i][j][→][0] ← max_{i ≤ k < j} C[i][k][→][1] + C[k+1][j][←][1] + Score(i, j)
      C[i][j][←][1] ← max_{i ≤ k < j} C[i][k][←][1] + C[k][j][←][0]
      C[i][j][→][1] ← max_{i < k ≤ j} C[i][k][→][0] + C[k][j][→][1]
  return C[0][n][→][1]

(d ∈ {←, →} is the direction of the arc over the span; c = 0 marks incomplete items with an open arc, c = 1 complete subtrees.)
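A runnable version of this recurrence, assuming unlabeled arcs and a precomputed score matrix (the labeled case just maximizes Score over labels per arc). The chart follows the pseudocode, with d = 0/1 for ←/→ and c = 0/1 for incomplete/complete; a sketch, not Nivre's code:

```python
import numpy as np

NEG = float("-inf")

def eisner(score):
    """Maximum projective tree score, O(n^3) time, O(n^2) space.

    score[h, m]: score of the arc h -> m over nodes 0..n (0 = root).
    """
    n = score.shape[0] - 1
    C = np.full((n + 1, n + 1, 2, 2), NEG)
    for i in range(n + 1):
        C[i, i, :, :] = 0.0
    for m in range(1, n + 1):
        for i in range(n - m + 1):
            j = i + m
            # incomplete items: build an arc between i and j on top of
            # two adjacent complete halves
            best = max(C[i, k, 1, 1] + C[k + 1, j, 0, 1] for k in range(i, j))
            C[i, j, 0, 0] = best + score[j, i]   # arc j -> i
            C[i, j, 1, 0] = best + score[i, j]   # arc i -> j
            # complete items: extend an open arc with a complete subtree
            C[i, j, 0, 1] = max(C[i, k, 0, 1] + C[k, j, 0, 0] for k in range(i, j))
            C[i, j, 1, 1] = max(C[i, k, 1, 0] + C[k, j, 1, 1] for k in range(i + 1, j + 1))
    return C[0, n, 1, 1]

# scores from the "John saw Mary" example on the next slide
# (missing arcs = -inf); nodes: 0=<root> 1=John 2=saw 3=Mary
S = np.full((4, 4), NEG)
S[0, 1], S[0, 2] = 9, 10
S[1, 2], S[1, 3] = 20, 3
S[2, 1], S[2, 3] = 30, 30
S[3, 1], S[3, 2] = 11, 0
print(eisner(S))  # 70.0: <root> -> saw -> {John, Mary}
```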
Spanning Tree Parsing

[Figure: weighted dependency graph for "John saw Mary" over ROOT, saw, John, Mary (arc scores 9, 10, 30, 30, 20, 0, 11, 3), shown next to its maximum spanning tree]
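The figure illustrates non-projective parsing as finding a maximum spanning tree in the dense arc-score graph; the standard decoder for this is the Chu-Liu-Edmonds algorithm. Below is a sketch of its greedy-plus-contract formulation (all function names are illustrative), assuming a dense score matrix with -inf on the diagonal, on missing arcs, and on arcs into the root:

```python
import numpy as np

NEG = float("-inf")

def find_cycle(heads):
    """Return a list of nodes forming a cycle in the head graph, or None."""
    n = len(heads)
    color = [0] * n  # 0 = unvisited, 1 = on current path, 2 = done
    color[0] = 2     # node 0 is the root
    for start in range(1, n):
        if color[start]:
            continue
        path, v = [], start
        while color[v] == 0:
            color[v] = 1
            path.append(v)
            v = heads[v]
        if color[v] == 1:                     # ran into the current path
            return path[path.index(v):]
        for u in path:
            color[u] = 2
    return None

def chu_liu_edmonds(score):
    """Maximum spanning arborescence rooted at node 0.

    score[h, m] = score of arc h -> m. Returns the heads array,
    with heads[0] = 0 as a dummy entry.
    """
    n = score.shape[0]
    heads = score.argmax(axis=0)              # greedy best head per node
    heads[0] = 0
    cycle = find_cycle(heads)
    if cycle is None:
        return heads

    cyc = set(cycle)
    rest = [v for v in range(n) if v not in cyc]   # includes the root
    con = len(rest)                                # index of contracted node
    sub = np.full((con + 1, con + 1), NEG)
    # arcs among non-cycle nodes keep their scores
    for a, u in enumerate(rest):
        for b, v in enumerate(rest):
            sub[a, b] = score[u, v]
    # arcs into the contracted node: entering the cycle at c pays
    # score[u, c] minus the broken cycle arc into c (cycle weight
    # itself is a constant and can be dropped)
    enter = {}
    for a, u in enumerate(rest):
        best, c_best = max((score[u, c] - score[heads[c], c], c) for c in cycle)
        sub[a, con] = best
        enter[u] = c_best
    # arcs out of the contracted node: best cycle-internal source
    leave = {}
    for b, v in enumerate(rest):
        best, c_best = max((score[c, v], c) for c in cycle)
        sub[con, b] = best
        leave[v] = c_best

    sub_heads = chu_liu_edmonds(sub)          # recurse on the smaller graph

    heads_out = heads.copy()
    for b, v in enumerate(rest):
        if v == 0:
            continue
        h = sub_heads[b]
        heads_out[v] = rest[h] if h < con else leave[v]
    u = rest[sub_heads[con]]                  # the arc chosen into the cycle
    heads_out[enter[u]] = u                   # breaks the cycle arc
    return heads_out

# same "John saw Mary" scores as above: 0=<root> 1=John 2=saw 3=Mary
S = np.full((4, 4), NEG)
S[0, 1], S[0, 2] = 9, 10
S[1, 2], S[1, 3] = 20, 3
S[2, 1], S[2, 3] = 30, 30
S[3, 1], S[3, 2] = 11, 0
print(chu_liu_edmonds(S))  # [0 2 0 2]: <root> -> saw -> {John, Mary}
```

The greedy step here first picks John and saw as each other's heads (scores 30 and 20), forming a cycle; contracting it and recursing selects ROOT → saw as the entry arc, which breaks the cycle and yields the tree of score 70.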