Faculty of Computer Science, Theoretical Computer Science, Chair of Foundations of Programming
LEARNING PRUNING POLICIES FOR LINEAR CONTEXT-FREE REWRITING SYSTEMS
INF-PM-FPG
Andy Püschel
Dresden, July 20, 2018
Motivation
Example:
• Weighted Deductive Parsing for LCFRS
• Sentence w = Nun werden sie umworben .
• The parser computes the highest-scoring derivation d̂
Linear Context-free Rewriting System
Definition. A linear context-free rewriting system is a tuple G = (N, Σ, Ξ, P, S) where
• N is a finite nonempty ℕ-sorted set (nonterminal symbols),
• Σ is a finite set (terminal symbols) with Σ ∩ N_l = ∅ for every l ∈ ℕ,
• Ξ is a finite nonempty set (variable symbols) with Ξ ∩ Σ = ∅ and Ξ ∩ N_l = ∅ for every l ∈ ℕ,
• P is a set of production rules of the form ρ = φ → ψ where
  – φ = A(α_1, …, α_l) (called the left-hand side of ρ) with l ∈ ℕ, A ∈ N_l, and α_1, …, α_l ∈ (Σ ∪ Ξ)*,
  – ψ = B_1(X^(1)_1, …, X^(1)_{l_1}) ⋯ B_m(X^(m)_1, …, X^(m)_{l_m}) (called the right-hand side of ρ) with m ∈ ℕ, B_1 ∈ N_{l_1}, …, B_m ∈ N_{l_m}, and X^(i)_j ∈ Ξ for 1 ≤ i ≤ m and 1 ≤ j ≤ l_i,
  – every X ∈ Ξ occurring in ρ occurs exactly once in the left-hand side and exactly once in the right-hand side of ρ, and
• S ∈ N_1 (initial nonterminal symbol).
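To make the shape of a production concrete, here is a minimal Python sketch that encodes a production as data; the names Var and Production are illustrative choices for this sketch, not identifiers from the thesis.

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class Var:
    """Variable X^(i)_j: component j contributed by the i-th RHS nonterminal."""
    i: int  # index of the right-hand-side nonterminal (1-based)
    j: int  # component index (1-based)

Symbol = Union[str, Var]  # a terminal from Sigma or a variable from Xi

@dataclass(frozen=True)
class Production:
    lhs: str                                    # A, a nonterminal of sort (fanout) l
    components: Tuple[Tuple[Symbol, ...], ...]  # alpha_1, ..., alpha_l
    rhs: Tuple[str, ...]                        # B_1, ..., B_m
    weight: float = 1.0                         # rule weight, used for a PLCFRS

# S(X^(1)_1 X^(2)_1, X^(1)_2) -> VP(X^(1)_1, X^(1)_2) VAINF(X^(2)_1):
# linearity holds, since every variable occurs exactly once on each side.
rule = Production(
    lhs="S",
    components=((Var(1, 1), Var(2, 1)), (Var(1, 2),)),
    rhs=("VP", "VAINF"),
    weight=0.25,
)
```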
Example PLCFRS
PLCFRS (G, p) with G = (N, Σ, Ξ, P, S) where
• N = {VROOT, S, VP, ADV, VAFIN, VAINF, VVINF, PPER, VVPP, $, …},
• Σ = {Nun, werden, sie, umworben, ., …}, and
• P contains, among others, the lexical rules
  – ADV(Nun) → ε # 1
  – VAFIN(werden) → ε # 0.5
  – VAINF(werden) → ε # 0.25
  – VVINF(werden) → ε # 0.25
  – PPER(sie) → ε # 1
  – VVPP(umworben) → ε # 1
  – $(.) → ε # 1
  and the structural rules
  – VP(X^(1)_1, X^(2)_1) → ADV(X^(1)_1) VVPP(X^(2)_1) # 0.5
  – S(X^(1)_1 X^(2)_1 X^(3)_1) → VAFIN(X^(1)_1) PPER(X^(2)_1) VVPP(X^(3)_1) # 0.25
  – S(X^(1)_1 X^(2)_1, X^(1)_2) → VP(X^(1)_1, X^(1)_2) VAINF(X^(2)_1) # 0.25
  – S(X^(1)_1 X^(2)_1 X^(3)_1 X^(1)_2) → VP(X^(1)_1, X^(1)_2) VAFIN(X^(2)_1) PPER(X^(3)_1) # 0.5
  – S(X^(1)_1 X^(1)_2 X^(2)_1 X^(1)_3) → S(X^(1)_1 X^(1)_2, X^(1)_3) PPER(X^(2)_1) # 0.25
  – VROOT(X^(1)_1 X^(1)_2 X^(1)_3 X^(1)_4 X^(2)_1) → S(X^(1)_1 X^(1)_2 X^(1)_3 X^(1)_4) $(X^(2)_1) # 1
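To see how such a rule assembles the yields of its right-hand side, here is a small sketch (reusing the hypothetical Var and Production encoding from above) that instantiates the left-hand-side components; applied to the VP rule it produces the discontinuous constituent covering "Nun … umworben".

```python
def compose(production, args):
    """Instantiate the LHS components of `production`: each variable
    X^(i)_j is replaced by args[i-1][j-1]; terminals are copied as-is."""
    phrases = []
    for component in production.components:
        words = []
        for symbol in component:
            if isinstance(symbol, Var):
                words.append(args[symbol.i - 1][symbol.j - 1])
            else:
                words.append(symbol)  # terminal from Sigma
        phrases.append(" ".join(words))
    return tuple(phrases)

# VP(X^(1)_1, X^(2)_1) -> ADV(X^(1)_1) VVPP(X^(2)_1) # 0.5
vp_rule = Production("VP", ((Var(1, 1),), (Var(2, 1),)), ("ADV", "VVPP"), 0.5)
print(compose(vp_rule, [("Nun",), ("umworben",)]))  # -> ('Nun', 'umworben')
```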
PARSE - Weighted Deductive Parsing: Nun werden sie umworben .
[figure: the derivation graph for the sentence, built up step by step]
1. Initialize vertices.
2. Hyperedges for the lexical rules ADV(Nun) → ε # 1, …
3. Hyperedge for VP(X^(1)_1, X^(2)_1) → ADV(X^(1)_1) VVPP(X^(2)_1) # 0.5
4. Hyperedge for S(X^(1)_1 X^(2)_1 X^(3)_1) → VAFIN(X^(1)_1) PPER(X^(2)_1) VVPP(X^(3)_1) # 0.25
5. Hyperedge for S(X^(1)_1 X^(2)_1, X^(1)_2) → VP(X^(1)_1, X^(1)_2) VAINF(X^(2)_1) # 0.25
6. Hyperedge for S(X^(1)_1 X^(2)_1 X^(3)_1 X^(1)_2) → VP(X^(1)_1, X^(1)_2) VAFIN(X^(2)_1) PPER(X^(3)_1) # 0.5
7. Hyperedge for S(X^(1)_1 X^(1)_2 X^(2)_1 X^(1)_3) → S(X^(1)_1 X^(1)_2, X^(1)_3) PPER(X^(2)_1) # 0.25
8. Hyperedge for VROOT(X^(1)_1 X^(1)_2 X^(1)_3 X^(1)_4 X^(2)_1) → S(X^(1)_1 X^(1)_2 X^(1)_3 X^(1)_4) $(X^(2)_1) # 1
9. Undesired hyperedges are marked.
10. Prune.
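The loop behind these steps can be sketched as best-first weighted deduction in the style of Knuth's algorithm, with the pruning policy consulted whenever an item is finished. The grammar interface used here (grammar.lexical, grammar.applicable, rule.instantiate) is an assumption made for the sketch, not the parser's actual API.

```python
import heapq

def weighted_deductive_parse(grammar, w, policy):
    """Best-first weighted deduction (sketch). An item is a pair
    (nonterminal, ranges), where ranges holds one (i, j) span per
    component; policy(item, w) returns "keep" or "prune"."""
    agenda = []  # max-heap, realized by pushing negated weights
    best = {}    # finished item -> weight of its best derivation

    # Axioms: one item per instantiated lexical rule A(w_i) -> eps # p.
    for i, word in enumerate(w):
        for lhs, p in grammar.lexical(word):
            heapq.heappush(agenda, (-p, (lhs, ((i, i + 1),))))

    while agenda:
        neg_weight, item = heapq.heappop(agenda)
        if item in best:
            continue                # already finished with a better weight
        best[item] = -neg_weight
        if policy(item, w) == "prune":
            continue                # a pruned item feeds no new hyperedges
        # Combine the freshly finished item with finished partner items.
        for rule, partners in grammar.applicable(item, best):
            weight = rule.weight * best[item]
            for partner in partners:
                weight *= best[partner]
            head = rule.instantiate(item, partners)  # concatenates the ranges
            if head is not None and head not in best:
                heapq.heappush(agenda, (-weight, head))

    return best.get(("VROOT", ((0, len(w)),)))  # weight of the goal item, if any
```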
Motivation
• How to reduce the parse time for a sentence?
• What is a good pruning method?
• How to train such a pruning method?
Overview
• Motivation
• Preliminaries
• LOLS
• Change Propagation
• Dynamic Programming
• Results
Preliminaries
• H = (V, E) ∈ H_(G,p)(w): derivation graph from PARSE
• c ⊂ Σ* × T_N(Σ): X × Y-corpus
• s: state of the derivation graph
• a ∈ {keep, prune}: action
• τ = s_0 a_0 s_1 a_1 … s_T: trajectory
Preliminaries
pruning policy π: takes a hyperedge and a subsentence w′ as input and outputs a pruning decision a ∈ {keep, prune}

How to evaluate π?

reward function r: H_(G,p)(w) × T_N(Σ) → ℝ, schematically
  r = accuracy − λ · runtime
where accuracy: T_N(Σ) × T_N(Σ) → ℝ, runtime: H_(G,p)(w) → ℝ, and λ ∈ ℝ is a trade-off factor.

empirical value of π:
  R(π) = (1/|c|) · Σ_{(w,ξ) ∈ c} r(PARSE(G, w, π), ξ) · c(w, ξ)
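A direct transcription of these two definitions, under stated assumptions: accuracy is taken to be an F1 comparison of trees, runtime is proxied by the number of hyperedges, λ = 0.1 is an arbitrary choice, and parse_to_graph, best_tree, and f1_score are hypothetical helpers.

```python
LAMBDA = 0.1  # trade-off factor lambda; the concrete value is an assumption

def reward(derivation_graph, gold_tree):
    """r = accuracy - lambda * runtime, with runtime proxied by the
    number of hyperedges built during parsing."""
    accuracy = f1_score(best_tree(derivation_graph), gold_tree)
    runtime = len(derivation_graph.edges)
    return accuracy - LAMBDA * runtime

def empirical_value(policy, corpus, grammar):
    """R(pi) = (1/|c|) * sum_{(w, xi) in c} r(PARSE(G, w, pi), xi) * c(w, xi).
    Here `corpus` maps each pair (w, xi) to its count c(w, xi)."""
    total = 0.0
    for (w, xi), count in corpus.items():
        graph = parse_to_graph(grammar, w, policy)  # assumed: returns H
        total += reward(graph, xi) * count
    return total / sum(corpus.values())
```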
Preliminaries
trajectory: s_0 a_0 s_1 a_1 … s_T
[figure: roll-in trajectory s_1 → s_2 → … → s_T under actions a_1, a_2, …, a_{T−1}, scored by the roll-out reward r̃_1[a_1]]
Intervention at state s_1: fixing the alternative action a′_1 instead of a_1 yields a second trajectory s_1 → s′_2 → … → s′_T under actions a′_1, a′_2, …, a′_{T−1}, scored by r̃_1[a′_1].
LOLS - Locally Optimal Learning to Search

Algorithm 1: Locally Optimal Learning to Search algorithm by [VE17] and [Cha+15]
Input: PLCFRS (G, p) with G = (N, Σ, Ξ, P, S), X × Y-corpus c such that X ⊂ Σ* and Y ⊂ T_N(Σ)
Output: pruning policy π

function LOLS((G, p), c)
    π_1 := InitializePolicy(…)
    for i := 1 to n do                              ⊲ n: number of iterations
        Q_i := ∅                                    ⊲ Q_i: set of state-reward tuples
        for (w, ξ) ∈ c do                           ⊲ w: sentence
            τ := Roll-In((G, p), w, π_i, ξ)         ⊲ τ = s_0 a_0 s_1 a_1 … s_T: trajectory
            for t := 0 to |τ| − 1 do
                for a′_t ∈ {keep, prune} do         ⊲ intervention
                    r̃_t[a′_t] := Roll-Out(π_i, s_t, a′_t, ξ)
                end for
                Q_i := Q_i ∪ {(s_t, r̃_t)}
            end for
        end for
        π_{i+1} := Train(⋃_{k=1}^{i} Q_k)           ⊲ dataset aggregation
    end for
    return argmax_{π_j : 1 ≤ j ≤ n} R(π_j)
end function
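A compact Python rendering of Algorithm 1. The subroutines roll_in, roll_out, and train_classifier stand in for Roll-In, Roll-Out, and Train and are assumptions of this sketch, as is reusing empirical_value from the earlier sketch for the final argmax.

```python
def lols(grammar, corpus, n, initial_policy):
    """Locally Optimal Learning to Search (sketch of Algorithm 1)."""
    policies = [initial_policy]    # pi_1
    aggregated = []                # Q_1 ∪ ... ∪ Q_i, grown across iterations
    for _ in range(n):
        for w, xi in corpus:
            # Roll-in: parse w with the current policy and record the
            # visited trajectory tau = s_0 a_0 s_1 a_1 ... s_T.
            states, _actions = roll_in(grammar, w, policies[-1], xi)
            for s_t in states:
                # Intervention: score both actions at s_t by letting the
                # current policy finish the parse (roll-out).
                r_t = {a: roll_out(policies[-1], s_t, a, xi)
                       for a in ("keep", "prune")}
                aggregated.append((s_t, r_t))
        # Dataset aggregation: train pi_{i+1} on everything seen so far.
        policies.append(train_classifier(aggregated))
    # Return the policy with the highest empirical value R(pi_j).
    return max(policies, key=lambda p: empirical_value(p, corpus, grammar))
```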
Overview
• Motivation
• Preliminaries
• LOLS
• Change Propagation
• Results
Change Propagation
Change pruning bit.
Change Propagation
Delete witness for {1, 2, 3, 4} and S.
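Since a roll-out changes only one pruning bit, the derivation graph can be repaired locally instead of reparsing from scratch. The sketch below assumes a graph representation with edge fields head, tails, rule_weight, pruned and vertex fields incoming, outgoing, weight, witness; the "delete witness" step above corresponds to the branch in which a vertex is left with no unpruned incoming hyperedge.

```python
from collections import deque

def edge_weight(e):
    """Weight of a hyperedge: its rule weight times its tail vertices' weights."""
    w = e.rule_weight
    for tail in e.tails:
        w *= tail.weight
    return w

def propagate(edge, action):
    """Flip the pruning bit of one hyperedge and push the resulting
    weight changes through the derivation graph (a sketch)."""
    edge.pruned = (action == "prune")
    queue = deque([edge.head])      # vertices whose witness may change
    while queue:
        vertex = queue.popleft()
        old_weight = vertex.weight
        # Recompute the witness: the best unpruned incoming hyperedge.
        candidates = [e for e in vertex.incoming if not e.pruned]
        if candidates:
            vertex.witness = max(candidates, key=edge_weight)
            vertex.weight = edge_weight(vertex.witness)
        else:
            vertex.witness = None   # e.g. delete witness for {1, 2, 3, 4} and S
            vertex.weight = 0.0
        # Only a changed weight can affect vertices further downstream.
        if vertex.weight != old_weight:
            queue.extend(e.head for e in vertex.outgoing)
```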