“Training Deterministic Parsers with Non-Deterministic Oracles” by Yoav Goldberg and Joakim Nivre, 2013
Seminar talk, Pius Meinert, July 13, 2018
Training Deterministic Parsers with Non-Deterministic Oracles
[Figure: dependency tree for "He_1 wrote_2 her_3 a_4 letter_5" with arcs labeled root, PRD, SBJ, IOBJ, DOBJ, DET]
Transition System
Definition (Transition System). A transition system for dependency parsing is a quadruple S = (C, T, c_s, C_t), where
1. C is a set of configurations,
2. T is a set of transitions, each of which is a (partial) function t : C → C,
3. c_s is an initialization function, mapping a sentence w = w_1 w_2 ... w_n to a configuration c ∈ C,
4. C_t ⊆ C is a set of terminal configurations.
[Figure: dependency tree for "He_1 wrote_2 her_3 a_4 letter_5" with arcs labeled root, PRD, SBJ, IOBJ, DOBJ, DET]
c_s(w) for w = He_1 wrote_2 her_3 a_4 letter_5:
([root], [He_1, wrote_2, her_3, a_4, letter_5], {})
Shift: ([root], [He_1, wrote_2, her_3, a_4, letter_5]) ⊢ ([root, He_1], [wrote_2, her_3, a_4, letter_5])
Left_SBJ: ([root, He_1], [wrote_2, her_3, a_4, letter_5]) ⊢ ([root], [wrote_2, her_3, a_4, letter_5])
Right_PRD: ([root], [wrote_2, her_3, a_4, letter_5]) ⊢ ([root, wrote_2], [her_3, a_4, letter_5])
Right_IOBJ, Shift, Left_DET, Reduce, Right_DOBJ: ([root, wrote_2], [her_3, a_4, letter_5]) ⊢ ... ⊢ ([root, wrote_2, letter_5], []) ∈ C_t
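The derivation above can be replayed in a few lines of Python. This is a minimal sketch, assuming the arc-eager transitions used on the slides; the representation (words as integer positions with 0 for root, arcs as `(head, label, dependent)` triples) and all function names are illustrative choices, and preconditions of the transitions are not checked.

```python
from typing import List, Set, Tuple

Arc = Tuple[int, str, int]                       # (head, label, dependent)
Config = Tuple[List[int], List[int], Set[Arc]]   # (stack, buffer, arcs)

def initial(n: int) -> Config:
    """c_s(w): root (index 0) on the stack, words 1..n in the buffer, no arcs."""
    return [0], list(range(1, n + 1)), set()

def shift(c: Config) -> Config:
    stack, buf, arcs = c
    return stack + [buf[0]], buf[1:], arcs

def left(c: Config, label: str) -> Config:
    # Buffer front becomes the head of the stack top; the stack top is popped.
    stack, buf, arcs = c
    return stack[:-1], buf, arcs | {(buf[0], label, stack[-1])}

def right(c: Config, label: str) -> Config:
    # Stack top becomes the head of the buffer front; the buffer front is pushed.
    stack, buf, arcs = c
    return stack + [buf[0]], buf[1:], arcs | {(stack[-1], label, buf[0])}

def reduce_(c: Config) -> Config:
    stack, buf, arcs = c
    return stack[:-1], buf, arcs

def is_terminal(c: Config) -> bool:
    return not c[1]                              # empty buffer

words = ["root", "He", "wrote", "her", "a", "letter"]
c = initial(5)
for step in [shift,
             lambda c: left(c, "SBJ"),
             lambda c: right(c, "PRD"),
             lambda c: right(c, "IOBJ"),
             shift,
             lambda c: left(c, "DET"),
             reduce_,
             lambda c: right(c, "DOBJ")]:
    c = step(c)

stack, buf, arcs = c
print([words[i] for i in stack], buf)            # ['root', 'wrote', 'letter'] []
```

The final configuration has an empty buffer, which matches the terminal configuration ([root, wrote_2, letter_5], []) shown on the slide.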
Static oracle (returns the transition t for configuration c and gold tree T):
if c = (σ|i, j|β, A) and (j, i) ∈ T then
    t ← Left
else if c = (σ|i, j|β, A) and (i, j) ∈ T then
    t ← Right
else if c = (σ|i, j|β, A) and ∃k [k < i ∧ [(k, j) ∈ T ∨ (j, k) ∈ T]] then
    t ← Reduce
else
    t ← Shift
return t
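As a rough illustration, the oracle rule above translates almost line by line into Python. The sketch assumes unlabeled arcs given as `(head, dependent)` pairs over integer positions (0 = root); `static_oracle` and `apply` are hypothetical helper names, not part of the talk.

```python
def static_oracle(stack, buf, gold):
    """Pick one transition for (stack, buf) given the gold arcs, following the rule above."""
    i, j = stack[-1], buf[0]
    if (j, i) in gold:
        return "Left"
    if (i, j) in gold:
        return "Right"
    # j has a gold head or dependent k somewhere to the left of i
    if any((k, j) in gold or (j, k) in gold for k in range(i)):
        return "Reduce"
    return "Shift"

def apply(t, stack, buf, arcs):
    """Arc-eager transitions on (stack, buffer, arcs); preconditions are not checked."""
    if t == "Shift":
        return stack + [buf[0]], buf[1:], arcs
    if t == "Left":
        return stack[:-1], buf, arcs | {(buf[0], stack[-1])}
    if t == "Right":
        return stack + [buf[0]], buf[1:], arcs | {(stack[-1], buf[0])}
    if t == "Reduce":
        return stack[:-1], buf, arcs
    raise ValueError(t)

# Gold tree for "He_1 wrote_2 her_3 a_4 letter_5" (0 = root), unlabeled.
gold = {(2, 1), (0, 2), (2, 3), (5, 4), (2, 5)}

stack, buf, arcs = [0], [1, 2, 3, 4, 5], set()
seq = []
while buf:
    t = static_oracle(stack, buf, gold)
    seq.append(t)
    stack, buf, arcs = apply(t, stack, buf, arcs)

print(seq)
```

On this sentence the oracle always returns exactly one transition per configuration, which is what makes it static: other sequences that reach the same tree are never proposed.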
Greedy Classifier-based Parsing
c ← c_s(w)
while c ∉ C_t do
    t_p ← arg max_{t ∈ Legal(c)} w · φ(c, t)
    c ← t_p(c)
return A_c
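The loop above can be sketched in Python as follows. Everything beyond the loop itself is an assumption made for illustration: the single-feature template `phi`, the hand-set weight dictionary `w`, and the `legal` preconditions (a common formulation of arc-eager legality, unlabeled arcs, 0 = root).

```python
def legal(stack, buf, arcs):
    """Legal(c) for a non-terminal arc-eager configuration (buffer not empty)."""
    has_head = {d for _, d in arcs}
    ts = ["Shift", "Right"]
    if stack[-1] != 0 and stack[-1] not in has_head:
        ts.append("Left")      # stack top may still receive a head
    if stack[-1] in has_head:
        ts.append("Reduce")    # stack top already has a head
    return ts

def apply(t, stack, buf, arcs):
    if t == "Shift":
        return stack + [buf[0]], buf[1:], arcs
    if t == "Left":
        return stack[:-1], buf, arcs | {(buf[0], stack[-1])}
    if t == "Right":
        return stack + [buf[0]], buf[1:], arcs | {(stack[-1], buf[0])}
    return stack[:-1], buf, arcs   # Reduce

def phi(stack, buf, t, words):
    """Hypothetical feature template: transition, stack-top word, buffer-front word."""
    return [(t, words[stack[-1]], words[buf[0]])]

def parse(words, w):
    stack, buf, arcs = [0], list(range(1, len(words))), set()
    while buf:                                     # c not in C_t
        t_p = max(legal(stack, buf, arcs),
                  key=lambda t: sum(w.get(f, 0.0) for f in phi(stack, buf, t, words)))
        stack, buf, arcs = apply(t_p, stack, buf, arcs)
    return arcs                                    # A_c

# Hand-set weights (an assumption, not learned) that prefer the gold transitions.
w = {("Shift", "root", "He"): 1.0, ("Left", "He", "wrote"): 1.0,
     ("Right", "root", "wrote"): 1.0, ("Right", "wrote", "her"): 1.0,
     ("Shift", "her", "a"): 1.0, ("Left", "a", "letter"): 1.0,
     ("Reduce", "her", "letter"): 1.0, ("Right", "wrote", "letter"): 1.0}

words = ["root", "He", "wrote", "her", "a", "letter"]
result = parse(words, w)
print(sorted(result))
```

The parser never backtracks: one arg max per configuration, so the number of classifier calls is linear in the sentence length.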
Training
for (w, T) ∈ d do
    c ← c_s(w)
    while c ∉ C_t do
        t_p ← arg max_{t ∈ Legal(c)} w · φ(c, t)
        Correct(c) ← {t | o(t; c, T) = true}
        t_o ← arg max_{t ∈ Correct(c)} w · φ(c, t)
        if t_p ∉ Correct(c) then
            Update(w, φ(c, t_o), φ(c, t_p))
            c ← t_o(c)
        else
            c ← t_p(c)
return w
Non-deterministic oracle instead of static oracle
[Figure: dependency tree for "He_1 wrote_2 her_3 a_4 letter_5" with arcs labeled root, PRD, SBJ, IOBJ, DOBJ, DET]
SH, LA_SBJ, RA_PRD, RA_IOBJ, SH, LA_DET, RE, RA_DOBJ
SH, LA_SBJ, RA_PRD, RA_IOBJ, RE, SH, LA_DET, RA_DOBJ
→ spurious ambiguity requires non-deterministic oracle
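To make the spurious ambiguity concrete, the two transition sequences can be executed and their arc sets compared. This is a small sketch under the same arc-eager assumptions as the slides; the string encoding of transitions (`"LA_SBJ"` and so on) and the helper `run` are illustrative only.

```python
def run(seq, n):
    """Apply a sequence of arc-eager transitions to the initial configuration of n words."""
    stack, buf, arcs = [0], list(range(1, n + 1)), set()
    for t in seq:
        name, _, label = t.partition("_")
        if name == "SH":
            stack, buf = stack + [buf[0]], buf[1:]
        elif name == "LA":
            arcs.add((buf[0], label, stack[-1]))   # buffer front heads stack top
            stack = stack[:-1]
        elif name == "RA":
            arcs.add((stack[-1], label, buf[0]))   # stack top heads buffer front
            stack, buf = stack + [buf[0]], buf[1:]
        elif name == "RE":
            stack = stack[:-1]
        else:
            raise ValueError(t)
    return stack, buf, arcs

seq_a = ["SH", "LA_SBJ", "RA_PRD", "RA_IOBJ", "SH", "LA_DET", "RE", "RA_DOBJ"]
seq_b = ["SH", "LA_SBJ", "RA_PRD", "RA_IOBJ", "RE", "SH", "LA_DET", "RA_DOBJ"]

final_a = run(seq_a, 5)
final_b = run(seq_b, 5)
print(final_a == final_b)   # True
```

Both sequences end in the same terminal configuration, so an oracle that accepts only one of them penalizes the parser for a choice that does not affect the tree.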
... with Non-Deterministic and Complete Oracles
[Figure: dependency tree for "He_1 wrote_2 her_3 a_4 letter_5" with arcs labeled root, PRD, SBJ, DOBJ, DET]
([root], [He_1, wrote_2, her_3, a_4, letter_5])
SH, LA_SBJ, RA_PRD, SH ⊢ ([root, wrote_2, her_3], [a_4, letter_5])
SH, LA_DET, SH ⊢ ([root, wrote_2, her_3, letter_5], []) ∈ C_t
→ error propagation can be mitigated by complete oracle
→ dynamic oracle: non-deterministic + complete
Training (Standard)
for (w, T) ∈ d do
    c ← c_s(w)
    while c ∉ C_t do
        t_p ← arg max_{t ∈ Legal(c)} w · φ(c, t)
        Correct(c) ← {t | o(t; c, T) = true}
        t_o ← arg max_{t ∈ Correct(c)} w · φ(c, t)
        if t_p ∉ Correct(c) then
            Update(w, φ(c, t_o), φ(c, t_p))
            c ← t_o(c)
        else
            c ← t_p(c)
return w
Training with Exploration
for (w, T) ∈ d do
    c ← c_s(w)
    while c ∉ C_t do
        t_p ← arg max_{t ∈ Legal(c)} w · φ(c, t)
        Optimal(c) ← {t | o(t; c, T) = true}
        t_o ← arg max_{t ∈ Optimal(c)} w · φ(c, t)
        if t_p ∉ Optimal(c) then
            Update(w, φ(c, t_o), φ(c, t_p))
            c ← Explore(c, t_o, t_p)
        else
            c ← t_p(c)
return w
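The training loop with exploration might look as follows in Python. Several parts are assumptions, since the slide leaves them open: `Update` is taken to be a perceptron-style ±1 update, `Explore` follows the wrong prediction with probability `p_explore` (otherwise the oracle transition), the feature template `phi` is hypothetical, and the oracle is a brute-force search over all continuations rather than an efficient cost computation. Sentences are represented only by their length.

```python
import random
from functools import lru_cache

def legal(c):
    stack, buf, arcs = c
    has_head = {d for _, d in arcs}
    ts = ["Shift", "Right"]
    if stack[-1] != 0 and stack[-1] not in has_head:
        ts.append("Left")
    if stack[-1] in has_head:
        ts.append("Reduce")
    return ts

def apply(t, c):
    stack, buf, arcs = c
    s, b = stack[-1], buf[0]
    if t == "Shift":
        return stack + (b,), buf[1:], arcs
    if t == "Right":
        return stack + (b,), buf[1:], arcs | {(s, b)}
    if t == "Left":
        return stack[:-1], buf, arcs | {(b, s)}
    return stack[:-1], buf, arcs                      # Reduce

def make_oracle(gold):
    """Brute-force oracle: transitions whose best reachable loss equals that of c."""
    @lru_cache(maxsize=None)
    def min_loss(c):
        if not c[1]:                                  # terminal: empty buffer
            return len(gold - c[2])                   # gold arcs missing from A
        return min(min_loss(apply(t, c)) for t in legal(c))
    def optimal(c):
        best = min_loss(c)
        return [t for t in legal(c) if min_loss(apply(t, c)) == best]
    return optimal

def phi(c, t):
    """Hypothetical single feature: transition, stack top, buffer front, top has head?"""
    stack, buf, arcs = c
    return (t, stack[-1], buf[0], any(d == stack[-1] for _, d in arcs))

def train(data, iterations=5, p_explore=1.0, seed=0):
    rng = random.Random(seed)
    w = {}
    for _ in range(iterations):
        for n, gold in data:
            optimal = make_oracle(frozenset(gold))
            c = ((0,), tuple(range(1, n + 1)), frozenset())
            while c[1]:
                score = lambda t: w.get(phi(c, t), 0.0)
                t_p = max(legal(c), key=score)
                opt = optimal(c)
                t_o = max(opt, key=score)
                if t_p not in opt:
                    w[phi(c, t_o)] = w.get(phi(c, t_o), 0.0) + 1.0
                    w[phi(c, t_p)] = w.get(phi(c, t_p), 0.0) - 1.0
                    # Explore: keep the wrong prediction with probability p_explore
                    c = apply(t_p if rng.random() < p_explore else t_o, c)
                else:
                    c = apply(t_p, c)
    return w

gold = {(2, 1), (0, 2), (2, 3), (5, 4), (2, 5)}       # "He wrote her a letter", 0 = root
weights = train([(5, gold)])
print(weights[("Left", 1, 2, False)], weights[("Shift", 1, 2, False)])   # 1.0 -1.0
```

With `p_explore=1.0` the parser always continues from its own mistake, so later updates are made in configurations off the gold path; this only works because the oracle is defined for every configuration.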
Optimality / Transition Costs
[Figure: dependency trees over "He_1 wrote_2 her_3 a_4 letter_5" with arcs labeled root, PRD, SBJ, IOBJ, DOBJ, DET]
C(A, T) = 2
([root, wrote_2, her_3], [a_4, letter_5]), SH, ...
min_{A : c ⇝ A} C(A, T) = 0
C(t; c, T) = min_{A : t(c) ⇝ A} C(A, T) − min_{A : c ⇝ A} C(A, T); for t = Shift: C(Shift; c, T) = 1
o_d(c, T) = {t | C(t; c, T) = 0}
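The cost definition can be checked by exhaustive search on the example sentence. The sketch below makes several assumptions: C(A, T) counts gold arcs missing from A (unlabeled), c is taken to be ([root, wrote_2], [her_3, a_4, letter_5]) with the arcs for SBJ and PRD already built (one reading that is consistent with the values on the slide), and reachability is computed by brute force instead of a closed-form cost.

```python
from functools import lru_cache

GOLD = frozenset({(2, 1), (0, 2), (2, 3), (5, 4), (2, 5)})   # 0 = root

def legal(c):
    stack, buf, arcs = c
    has_head = {d for _, d in arcs}
    ts = ["Shift", "Right"]
    if stack[-1] != 0 and stack[-1] not in has_head:
        ts.append("Left")
    if stack[-1] in has_head:
        ts.append("Reduce")
    return ts

def apply(t, c):
    stack, buf, arcs = c
    s, b = stack[-1], buf[0]
    if t == "Shift":
        return stack + (b,), buf[1:], arcs
    if t == "Right":
        return stack + (b,), buf[1:], arcs | {(s, b)}
    if t == "Left":
        return stack[:-1], buf, arcs | {(b, s)}
    return stack[:-1], buf, arcs                 # Reduce

@lru_cache(maxsize=None)
def min_loss(c):
    """min over arc sets A reachable from c of C(A, T), by exhaustive search."""
    if not c[1]:                                 # terminal configuration
        return len(GOLD - c[2])
    return min(min_loss(apply(t, c)) for t in legal(c))

def cost(t, c):
    """C(t; c, T) = min loss reachable after t, minus min loss reachable from c."""
    return min_loss(apply(t, c)) - min_loss(c)

c = ((0, 2), (3, 4, 5), frozenset({(2, 1), (0, 2)}))
costs = {t: cost(t, c) for t in legal(c)}
zero_cost = [t for t in legal(c) if cost(t, c) == 0]   # o_d(c, T)
print(costs, zero_cost)
```

Under these assumptions Shift costs 1 because her_3 can no longer be attached to wrote_2, while everything else in the tree remains reachable.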
Arc Decomposition - Definition
Definition (Tree Consistency). A set of arcs A is said to be tree consistent if there exists a projective dependency tree T such that A ⊆ T.
Definition (Arc Decomposition). A transition system is said to be arc decomposable if, for every tree consistent arc set A and configuration c, c ⇝ A is entailed by c ⇝ (h, d) for every arc (h, d) ∈ A.
Arc Decomposition - Arc-Standard Counterexample
[Figure: nodes a, b, c]
Arc-Standard Transitions
Left[(σ|s_1|s_0, β, A)] = (σ|s_0, β, A ∪ {(s_0, s_1)})
Right[(σ|s_1|s_0, β, A)] = (σ|s_1, β, A ∪ {(s_1, s_0)})
Shift[(σ, b|β, A)] = (σ|b, β, A)
c = ([a, b, c], β)
Left: c ⊢ ([a, c], β)
Right, Left: c ⊢ ([a, b], β) ⊢ ([b], β)
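A brute-force search illustrates why arc-standard is not arc decomposable. The sketch uses the three transitions listed above; the concrete choices are assumptions made for the example, since the arcs drawn on the slide are not recoverable: an empty buffer and the tree consistent arc set {(a, b), (a, c)}.

```python
def successors(stack, buf, arcs):
    """All arc-standard successors of the configuration (stack, buf, arcs)."""
    out = []
    if len(stack) >= 2:
        s1, s0 = stack[-2], stack[-1]
        out.append((stack[:-2] + (s0,), buf, arcs | {(s0, s1)}))   # Left
        out.append((stack[:-1], buf, arcs | {(s1, s0)}))           # Right
    if buf:
        out.append((stack + (buf[0],), buf[1:], arcs))             # Shift
    return out

def reachable(target, stack, buf, arcs=frozenset()):
    """True if some transition sequence from the configuration builds every arc in target."""
    if target <= arcs:
        return True
    return any(reachable(target, *nxt) for nxt in successors(stack, buf, arcs))

start = (("a", "b", "c"), ())                    # stack [a, b, c], empty buffer

only_ab = reachable({("a", "b")}, *start)
only_ac = reachable({("a", "c")}, *start)
both = reachable({("a", "b"), ("a", "c")}, *start)
print(only_ab, only_ac, both)                    # True True False
```

Each arc is reachable on its own, but building (a, b) requires removing c first and building (a, c) requires removing b first, so the set as a whole is not reachable.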
Arc Decomposition - Arc-Eager Proof Sketch
Given: an arbitrary configuration c = (σ, β, A) and a tree consistent arc set A′ such that all arcs are reachable from c.
To show: c ⇝ A′
[Figure: nodes 0 to 8 divided between stack σ and buffer β]
B̄ = {(h, d) | h, d ∉ β}
B = {(h, d) | h, d ∈ β}
B_h = {(h, d) | h ∈ β, d ∈ σ}
B_d = {(h, d) | d ∈ β, h ∈ σ}
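The four sets used in the proof sketch can be computed directly from a configuration. The sketch below is illustrative only: it uses the running example sentence instead of the nodes in the figure, names the first set (both endpoints outside β) `B_bar`, and the function name `partition` is hypothetical.

```python
def partition(arcs, stack, buf):
    """Split an arc set into the four sets of the proof sketch, relative to (stack, buf)."""
    sigma, beta = set(stack), set(buf)
    parts = {"B_bar": set(), "B": set(), "B_h": set(), "B_d": set()}
    for h, d in arcs:
        if h not in beta and d not in beta:
            parts["B_bar"].add((h, d))       # both endpoints outside the buffer
        elif h in beta and d in beta:
            parts["B"].add((h, d))           # both endpoints in the buffer
        elif h in beta and d in sigma:
            parts["B_h"].add((h, d))         # head in the buffer, dependent on the stack
        elif d in beta and h in sigma:
            parts["B_d"].add((h, d))         # dependent in the buffer, head on the stack
        else:
            # One endpoint was already popped: such an arc cannot be reachable.
            raise ValueError(f"arc {(h, d)} is not reachable from this configuration")
    return parts

# "He_1 wrote_2 her_3 a_4 letter_5" after SH, LA, RA, RA: stack [0, 2, 3], buffer [4, 5]
target = {(2, 1), (0, 2), (2, 3), (5, 4), (2, 5)}
parts = partition(target, stack=[0, 2, 3], buf=[4, 5])
print(parts)
```

In this configuration the arcs in `B_bar` are exactly the ones already built, and the remaining arcs fall into `B` and `B_d`.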