Linear Time Constituency Parsing with RNNs and Dynamic Programming


  1. Linear Time Constituency Parsing with RNNs and Dynamic Programming. Juneki Hong¹ and Liang Huang¹,² (¹Oregon State University; ²Baidu Research Silicon Valley AI Lab)

  2. Span Parsing is SOTA in Constituency Parsing
  • Cross + Huang 2016 introduced span parsing, but with greedy decoding.
  • Stern et al. 2017 had span parsing with exact search and global training, but it was too slow: O(n³).
  • Can we get the best of both worlds: something that is both fast and accurate?
  (Speed/accuracy chart: Cross + Huang 2016 is fast; Stern et al. 2017 and Joshi et al. 2018 are accurate; our work aims for both. New at ACL 2018, also span parsing: Kitaev + Klein 2018.)

  3. Both Fast and Accurate!
  (F1 comparison chart:) Baseline Chart Parser (Stern et al. 2017a): 91.79. Our Linear Time Parser ("our work"): 91.97.

  4. In this talk, we will discuss:
  • Linear-time constituency parsing using dynamic programming.
  • Going slower in order to go faster: O(2ⁿ) → O(n³) → O(n⁴) → O(n).
  • Cube pruning to speed up incremental parsing with dynamic programming: from O(nb²) to O(nb log b).
  • An improved loss function for loss-augmented decoding.
  • 2nd-highest accuracy among single systems trained on PTB only.

  5. Span Parsing
  • Span differences are taken from an encoder (in our case: a bi-LSTM): span (i, j) is represented by (f_j − f_i, b_i − b_j), where f and b are the forward and backward outputs at fence-posts 0 … n (e.g. ⟨s⟩ You should eat ice cream ⟨/s⟩, positions 0–5).
  • A span is scored and labeled, s(i, j, X), by a feed-forward network.
  • The score of a tree is the sum of all the labeled span scores: s_tree(t) = Σ_{(i,j,X) ∈ t} s(i, j, X).
  (Cross + Huang 2016; Stern et al. 2017; Wang + Chang 2016)
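The span features and tree score above can be made concrete with a small sketch. This is an illustration only: the scalar feature lists `f` and `b` and the linear `score` function are toy stand-ins for the bi-LSTM and the feed-forward labeling network, which are not reproduced here.

```python
# Toy span-scoring sketch. Assumptions: scalar "bi-LSTM" outputs f and b,
# and a linear scorer standing in for the feed-forward labeling network.

def span_features(f, b, i, j):
    """A span (i, j) is represented by (f_j - f_i, b_i - b_j)."""
    return (f[j] - f[i], b[i] - b[j])

def tree_score(spans, score):
    """Score of a tree = sum of the scores of all its labeled spans."""
    return sum(score(i, j, X) for (i, j, X) in spans)

# Fence-post positions 0..5 for a 5-word sentence.
f = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]   # forward encoder outputs (toy)
b = [5.0, 4.0, 3.0, 2.0, 1.0, 0.0]   # backward encoder outputs (toy)

def score(i, j, X):
    # Hypothetical scorer: just sums the two feature differences.
    df, db = span_features(f, b, i, j)
    return df + db

t = [(0, 1, "NP"), (1, 3, "VP"), (0, 3, "S")]
print(tree_score(t, score))  # 2 + 4 + 6 = 12.0
```

In the real parser the features are vectors and the scorer is learned; only the additive structure of s_tree matters for the decoding algorithms that follow.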

  6. Incremental Span Parsing Example
  Sentence (fence-posts 0–5): Eat/VB ice/NN cream/NN after/IN lunch/NN
  The Action / Label / Stack table starts empty. (Cross + Huang 2016)

  7. Incremental Span Parsing Example
  (Gold tree: (S (VP (VB Eat) (NP (NN ice) (NN cream)) (PP (IN after) (NP (NN lunch))))); ø marks unlabeled spans from binarization.)
  Step  Action   Label  Stack
  1     Shift    ø      (0, 1)
  (Cross + Huang 2016)

  8. Incremental Span Parsing Example
  Step  Action   Label  Stack
  1     Shift    ø      (0, 1)
  2     Shift    ø      (0, 1) (1, 2)
  (Cross + Huang 2016)

  9. Incremental Span Parsing Example
  Step  Action   Label  Stack
  1     Shift    ø      (0, 1)
  2     Shift    ø      (0, 1) (1, 2)
  3     Shift    ø      (0, 1) (1, 2) (2, 3)
  (Cross + Huang 2016)

  10. Incremental Span Parsing Example
  Step  Action   Label  Stack
  1     Shift    ø      (0, 1)
  2     Shift    ø      (0, 1) (1, 2)
  3     Shift    ø      (0, 1) (1, 2) (2, 3)
  4     Reduce   NP     (0, 1) (1, 3)
  (Cross + Huang 2016)

  11. Incremental Span Parsing Example
  Step  Action   Label  Stack
  1     Shift    ø      (0, 1)
  2     Shift    ø      (0, 1) (1, 2)
  3     Shift    ø      (0, 1) (1, 2) (2, 3)
  4     Reduce   NP     (0, 1) (1, 3)
  5     Reduce   ø      (0, 3)
  (Cross + Huang 2016)

  12. Incremental Span Parsing Example
  Step  Action   Label  Stack
  1     Shift    ø      (0, 1)
  2     Shift    ø      (0, 1) (1, 2)
  3     Shift    ø      (0, 1) (1, 2) (2, 3)
  4     Reduce   NP     (0, 1) (1, 3)
  5     Reduce   ø      (0, 3)
  6     Shift    ø      (0, 3) (3, 4)
  (Cross + Huang 2016)

  13. Incremental Span Parsing Example
  Step  Action   Label  Stack
  1     Shift    ø      (0, 1)
  2     Shift    ø      (0, 1) (1, 2)
  3     Shift    ø      (0, 1) (1, 2) (2, 3)
  4     Reduce   NP     (0, 1) (1, 3)
  5     Reduce   ø      (0, 3)
  6     Shift    ø      (0, 3) (3, 4)
  7     Shift    NP     (0, 3) (3, 4) (4, 5)
  (Cross + Huang 2016)

  14. Incremental Span Parsing Example
  Step  Action   Label  Stack
  1     Shift    ø      (0, 1)
  2     Shift    ø      (0, 1) (1, 2)
  3     Shift    ø      (0, 1) (1, 2) (2, 3)
  4     Reduce   NP     (0, 1) (1, 3)
  5     Reduce   ø      (0, 3)
  6     Shift    ø      (0, 3) (3, 4)
  7     Shift    NP     (0, 3) (3, 4) (4, 5)
  8     Reduce   PP     (0, 3) (3, 5)
  (Cross + Huang 2016)

  15. Incremental Span Parsing Example
  Step  Action   Label  Stack
  1     Shift    ø      (0, 1)
  2     Shift    ø      (0, 1) (1, 2)
  3     Shift    ø      (0, 1) (1, 2) (2, 3)
  4     Reduce   NP     (0, 1) (1, 3)
  5     Reduce   ø      (0, 3)
  6     Shift    ø      (0, 3) (3, 4)
  7     Shift    NP     (0, 3) (3, 4) (4, 5)
  8     Reduce   PP     (0, 3) (3, 5)
  9     Reduce   S-VP   (0, 5)
  (Cross + Huang 2016)

  16. How Many Possible Parsing Paths?
  • 2 actions per state, so O(2ⁿ) possible paths.
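The exponential blow-up can be checked directly by counting the valid shift-reduce action sequences for small n. This is a sketch written for the talk's claim, not code from the paper; the counts it produces are the Catalan numbers, which grow exponentially, consistent with the O(2ⁿ) bound.

```python
# Sketch: count the distinct shift-reduce action sequences for n words.
# A shift pushes one single-word span; a reduce merges the top two spans
# (so it needs a stack of size >= 2).

def count_paths(n):
    def go(remaining, stack_size):
        if remaining == 0 and stack_size == 1:
            return 1  # all words consumed, one tree on the stack
        total = 0
        if remaining > 0:
            total += go(remaining - 1, stack_size + 1)  # shift
        if stack_size >= 2:
            total += go(remaining, stack_size - 1)      # reduce
        return total
    return go(n, 0)

print([count_paths(n) for n in range(1, 8)])  # [1, 1, 2, 5, 14, 42, 132]
```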

  17. Equivalent Stacks?
  • Observe that all stacks that end with (i, j) will be treated the same, until (i, j) is popped off.
  • For example, [(0, 2), (2, 7), (7, 9)] and [(0, 3), (3, 7), (7, 9)] both look like […, (7, 9)].
  • So we can treat these as "temporarily equivalent", and merge them.
  Graph-Structured Stack (Tomita 1988; Huang + Sagae 2010)

  18. Equivalent Stacks?
  • All stacks that end with (i, j) are treated the same until (i, j) is popped off.
  • So each state keeps left pointers to its possible predecessors: […, (7, 9)] points back to […, (2, 7)] and […, (3, 7)], which in turn point back to […, (0, 2)] and […, (0, 3)].
  • This is our new stack representation.
  Graph-Structured Stack (Tomita 1988; Huang + Sagae 2010)

  19. Equivalent Stacks?
  • A reduce follows a left pointer: combining […, (k, i)] with the top span (i, j) yields […, (k, j)].
  • For example, […, (2, 7), (7, 9)] reduces to […, (2, 9)], and […, (3, 7), (7, 9)] reduces to […, (3, 9)].
  • Reduce actions: O(n³), over the choices of k, i, and j.
  Graph-Structured Stack (Tomita 1988; Huang + Sagae 2010)
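Once stacks are merged, the reduce step is essentially a CKY-style deduction: a state topped by (k, i) combines with one topped by (i, j) to yield (k, j). The chart view of this recurrence can be sketched as follows; `score` is a hypothetical per-span scorer, and labels and the actual left-pointer bookkeeping are omitted.

```python
# CKY-style sketch of the merged reduce deduction: combining a state
# topped by (k, i) with one topped by (i, j) yields (k, j).

def best_tree_score(n, score):
    best = {}
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            if length == 1:
                best[i, j] = score(i, j)
            else:
                # O(n^3) overall: O(n^2) spans, O(n) split points k each.
                best[i, j] = score(i, j) + max(
                    best[i, k] + best[k, j] for k in range(i + 1, j))
    return best[0, n]

# With unit span scores, the best binary tree over 4 words has 2*4-1 = 7 spans.
print(best_tree_score(4, lambda i, j: 1))  # 7
```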

  20. Dynamic Programming: Merging Stacks
  • Temporarily merging stacks will make our state space polynomial: O(2ⁿ) → O(n³).
  • And our parsing state is represented by the top span (i, j).
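The collapse of the state space can be seen by exhaustively exploring all shift-reduce derivations for a tiny sentence and comparing the number of distinct full stacks against the number of merged DP states (top spans). The `reachable` helper below is hypothetical, written only for this illustration.

```python
# Sketch: enumerate all shift-reduce derivations for n words and compare
# distinct full stacks vs. merged DP states (top spans, O(n^2) of them).

def reachable(n):
    stacks, tops = set(), set()
    def go(stack, w):
        stacks.add(stack)
        if stack:
            tops.add(stack[-1])
        if w < n:                        # shift word w as span (w, w+1)
            go(stack + ((w, w + 1),), w + 1)
        if len(stack) >= 2:              # reduce (k, i), (i, j) -> (k, j)
            (k, _), (_, j) = stack[-2], stack[-1]
            go(stack[:-2] + ((k, j),), w)
    go((), 0)
    return len(stacks), len(tops)

num_stacks, num_states = reachable(3)
print(num_stacks, num_states)  # many stacks, only n(n+1)/2 = 6 top spans
```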

  21. Becoming Action-Synchronous
  • Shift-reduce parsers are traditionally action-synchronous, which makes beam search straightforward.
  • We will also do the same, but will show that this slows down our DP (before applying beam search): O(2ⁿ) → O(n⁴).
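Action-synchronous beam search over unmerged stacks can be sketched as below. This is an illustration of the search regime only, not the paper's merged, cube-pruned implementation; `score` is again a hypothetical per-span scorer.

```python
import heapq

# Action-synchronous beam search (sketch): after each of the 2n-1
# shift/reduce steps, keep only the top `beam_size` stacks.

def beam_parse(n, score, beam_size):
    beam = [((), 0.0, 0)]  # (stack of spans, score so far, next word index)
    for _ in range(2 * n - 1):
        nxt = []
        for stack, sc, w in beam:
            if w < n:                    # shift span (w, w+1)
                nxt.append((stack + ((w, w + 1),),
                            sc + score(w, w + 1), w + 1))
            if len(stack) >= 2:          # reduce (k, i), (i, j) -> (k, j)
                (k, _), (_, j) = stack[-2], stack[-1]
                nxt.append((stack[:-2] + ((k, j),), sc + score(k, j), w))
        beam = heapq.nlargest(beam_size, nxt, key=lambda s: s[1])
    # A finished state has consumed all words and holds the single span (0, n).
    finals = [s for s in beam if s[2] == n and s[0] == ((0, n),)]
    return max(finals, key=lambda s: s[1])[1] if finals else None

print(beam_parse(3, lambda i, j: 1.0, beam_size=8))  # 5.0 (5 spans, unit scores)
```

With unit scores, any complete parse of 3 words touches 5 spans, so the best final score is 5.0; the beam keeps the search linear in sentence length at the cost of possible search errors.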
