Linear Time Constituency Parsing with RNNs and Dynamic Programming
Juneki Hong¹ and Liang Huang¹,²
¹Oregon State University   ²Baidu Research Silicon Valley AI Lab
Span Parsing is SOTA in Constituency Parsing
• Cross + Huang 2016 introduced span parsing, but with greedy decoding.
• Stern et al. 2017 had span parsing with exact search and global training, but it was too slow: O(n³).
• Can we get the best of both worlds: something that is both fast and accurate?
• New at ACL 2018, also span parsing: Kitaev + Klein 2018; Joshi et al. 2018.
[Figure: speed vs. accuracy, placing Cross + Huang 2016 (fast, less accurate), Stern et al. 2017 (accurate, slow), and our work (fast and accurate)]
Both Fast and Accurate!
[Figure: parsing speed comparison, chart parsing vs. our work]
• Baseline Chart Parser (Stern et al. 2017a): 91.79 F1
• Our Linear Time Parser: 91.97 F1
<latexit sha1_base64="J5yt+Ykz1sg9to8zcqinVqW85aE=">ACPnicbZBNTwIxEIa7+IX4hXr0khM4EJ2kUSPRC/ewEQ+ElhIt9tdGrtpu1qCOGXefE3ePoxYPGePVogT0oOEmTd56ZyXReL2ZUadt+sTJr6xubW9nt3M7u3v5B/vCopUQiMWliwYTseEgRjlpaqoZ6cSoMhjpO2Nrmf19j2Rigp+p8cxcSMUchpQjLRBg3yzXqz0eQn2JA2HGkpHmC9yPvnq6hqECPIV1rMcuj1K8vEpCKEXmQL9hlex5wVTipKIA0GoP8c8XOIkI15ghpbqOHWt3gqSmJFprpcoEiM8QiHpGslRJQ7mZ8/hWeG+DAQ0jyu4Zz+npigSKlx5JnOCOmhWq7N4H+1bqKDS3dCeZxowvFiUZAwaM6deQl9KgnWbGwEwpKav0I8RBJhbRzPGROc5ZNXRatSduyc1st1K5SO7LgBJyCInDABaiBG9ATYDBI3gF7+DerLerE/ra9GasdKZY/AnrO8fb6arpA=</latexit> <latexit sha1_base64="J5yt+Ykz1sg9to8zcqinVqW85aE=">ACPnicbZBNTwIxEIa7+IX4hXr0khM4EJ2kUSPRC/ewEQ+ElhIt9tdGrtpu1qCOGXefE3ePoxYPGePVogT0oOEmTd56ZyXReL2ZUadt+sTJr6xubW9nt3M7u3v5B/vCopUQiMWliwYTseEgRjlpaqoZ6cSoMhjpO2Nrmf19j2Rigp+p8cxcSMUchpQjLRBg3yzXqz0eQn2JA2HGkpHmC9yPvnq6hqECPIV1rMcuj1K8vEpCKEXmQL9hlex5wVTipKIA0GoP8c8XOIkI15ghpbqOHWt3gqSmJFprpcoEiM8QiHpGslRJQ7mZ8/hWeG+DAQ0jyu4Zz+npigSKlx5JnOCOmhWq7N4H+1bqKDS3dCeZxowvFiUZAwaM6deQl9KgnWbGwEwpKav0I8RBJhbRzPGROc5ZNXRatSduyc1st1K5SO7LgBJyCInDABaiBG9ATYDBI3gF7+DerLerE/ra9GasdKZY/AnrO8fb6arpA=</latexit> <latexit sha1_base64="J5yt+Ykz1sg9to8zcqinVqW85aE=">ACPnicbZBNTwIxEIa7+IX4hXr0khM4EJ2kUSPRC/ewEQ+ElhIt9tdGrtpu1qCOGXefE3ePoxYPGePVogT0oOEmTd56ZyXReL2ZUadt+sTJr6xubW9nt3M7u3v5B/vCopUQiMWliwYTseEgRjlpaqoZ6cSoMhjpO2Nrmf19j2Rigp+p8cxcSMUchpQjLRBg3yzXqz0eQn2JA2HGkpHmC9yPvnq6hqECPIV1rMcuj1K8vEpCKEXmQL9hlex5wVTipKIA0GoP8c8XOIkI15ghpbqOHWt3gqSmJFprpcoEiM8QiHpGslRJQ7mZ8/hWeG+DAQ0jyu4Zz+npigSKlx5JnOCOmhWq7N4H+1bqKDS3dCeZxowvFiUZAwaM6deQl9KgnWbGwEwpKav0I8RBJhbRzPGROc5ZNXRatSduyc1st1K5SO7LgBJyCInDABaiBG9ATYDBI3gF7+DerLerE/ra9GasdKZY/AnrO8fb6arpA=</latexit> <latexit sha1_base64="J5yt+Ykz1sg9to8zcqinVqW85aE=">ACPnicbZBNTwIxEIa7+IX4hXr0khM4EJ2kUSPRC/ewEQ+ElhIt9tdGrtpu1qCOGXefE3ePoxYPGePVogT0oOEmTd56ZyXReL2ZUadt+sTJr6xubW9nt3M7u3v5B/vCopUQiMWliwYTseEgRjlpaqoZ6cSoMhjpO2Nrmf19j2Rigp+p8cxcSMUchpQjLRBg3yzXqz0eQn2JA2HGkpHmC9yPvnq6hqECPIV1rMcuj1K8vEpCKEXmQL9hlex5wVTipKIA0GoP8c8XOIkI15ghpbqOHWt3gqSmJFprpcoEiM8QiHpGslRJQ7mZ8/hWeG+DAQ0jyu4Zz+npigSKlx5JnOCOmhWq7N4H+1bqKDS3dCeZxowvFiUZAwaM6deQl9KgnWbGwEwpKav0I8RBJhbRzPGROc5ZNXRatSduyc1st1K5SO7LgBJyCInDABaiBG9ATYDBI3gF7+DerLerE/ra9GasdKZY/AnrO8fb6arpA=</latexit> In this talk, we will discuss: • Linear Time Constituency Parsing using dynamic programming • Going slower in order to go faster: O ( n 3 ) → O ( n 4 ) → O ( n ) • Cube Pruning to speed up Incremental Parsing with Dynamic Programming • From O ( n b 2 ) to O ( n b log b ) • An improved loss function for Loss-Augmented Decoding • 2nd highest accuracy among single systems trained on PTB only O (2 n ) → O ( n 3 ) → O ( n 4 ) O ( nb 2 ) O ( nb log b ) 4
Span Parsing
• Span differences are taken from an encoder (in our case: a bi-LSTM): span (i, j) is represented by (f_j − f_i, b_i − b_j).
• A span is scored and labeled by a feed-forward network: s(i, j, X).
• The score of a tree is the sum of all the labeled span scores:
  s_tree(t) = Σ_{(i,j,X) ∈ t} s(i, j, X)
[Figure: bi-LSTM over "⟨s⟩ You should eat ice cream ⟨/s⟩" with forward outputs f₀…f₅ and backward outputs b₀…b₅ at fenceposts 0–5]
(Cross + Huang 2016; Stern et al. 2017; Wang + Chang 2016)
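As a concrete illustration, here is a minimal sketch of span scoring with difference features, assuming the bi-LSTM outputs are precomputed; the network shapes and names are illustrative, not the authors' code:

```python
import numpy as np

def span_score(f, b, i, j, W1, w2):
    """Score span (i, j) with a one-hidden-layer feed-forward network.

    f[i], b[i] are the forward/backward bi-LSTM outputs at fencepost i;
    the span feature is the concatenation of (f[j] - f[i]) and (b[i] - b[j]).
    Returns one score per label X, so s(i, j, X) = span_score(...)[X].
    """
    feat = np.concatenate([f[j] - f[i], b[i] - b[j]])
    hidden = np.maximum(0.0, W1 @ feat)    # ReLU hidden layer
    return w2 @ hidden                     # vector of label scores

def tree_score(f, b, tree, W1, w2):
    """s_tree(t) = sum of s(i, j, X) over labeled spans (i, j, X) in t."""
    return sum(span_score(f, b, i, j, W1, w2)[X] for (i, j, X) in tree)
```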
Incremental Span Parsing Example (Cross + Huang 2016)

Sentence: "Eat ice cream after lunch" (VB NN NN IN NN), with fenceposts 0–5.
Gold tree: (S-VP (VB Eat) (NP (NN ice) (NN cream)) (PP (IN after) (NP (NN lunch))))

 Step  Action   Label  Stack
 1     Shift    ∅      (0, 1)
 2     Shift    ∅      (0, 1) (1, 2)
 3     Shift    ∅      (0, 1) (1, 2) (2, 3)
 4     Reduce   NP     (0, 1) (1, 3)
 5     Reduce   ∅      (0, 3)
 6     Shift    ∅      (0, 3) (3, 4)
 7     Shift    NP     (0, 3) (3, 4) (4, 5)
 8     Reduce   PP     (0, 3) (3, 5)
 9     Reduce   S-VP   (0, 5)
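To make the transition system concrete, here is a minimal sketch of the two actions over spans, replaying the action sequence above (an illustration of the mechanics, not the authors' code; label decisions are omitted):

```python
def shift(stack, j):
    """Shift: push the single-word span (j, j+1)."""
    return stack + [(j, j + 1)]

def reduce_(stack):
    """Reduce: pop the top two spans (k, i) and (i, j), push (k, j)."""
    (k, i), (i2, j) = stack[-2], stack[-1]
    assert i == i2, "adjacent spans must share a fencepost"
    return stack[:-2] + [(k, j)]

# Replay the example's actions on "Eat ice cream after lunch":
stack, j = [], 0
for action in ["shift", "shift", "shift", "reduce", "reduce",
               "shift", "shift", "reduce", "reduce"]:
    if action == "shift":
        stack, j = shift(stack, j), j + 1
    else:
        stack = reduce_(stack)
print(stack)  # [(0, 5)] -- a single span covering the whole sentence
```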
How Many Possible Parsing Paths?
• 2 actions per state ⇒ exponentially many parsing paths: O(2ⁿ).
Equivalent Stacks?
• Observe that all stacks that end with (i, j) will be treated the same…
• …until (i, j) is popped off.
• Example: [(0, 2), (2, 7), (7, 9)] and [(0, 3), (3, 7), (7, 9)] both look like […, (7, 9)] to the parser.
• So we can treat these as "temporarily equivalent", and merge them.
(Graph-Structured Stack: Tomita 1988; Huang + Sagae 2010)
Equivalent Stacks?
• Observe that all stacks that end with (i, j) will be treated the same…
• …until (i, j) is popped off.
• New stack representation: each merged state […, (i, j)] keeps left pointers to its possible predecessors, e.g. […, (7, 9)] points back to both […, (2, 7)] and […, (3, 7)], which in turn point back to […, (0, 2)] and […, (0, 3)].
(Graph-Structured Stack: Tomita 1988; Huang + Sagae 2010)
Equivalent Stacks?
• A Reduce pops the top span off along a left pointer: reducing […, (i, j)] with a predecessor […, (k, i)] yields […, (k, j)].
• E.g., […, (7, 9)] reduces with […, (2, 7)] to give […, (2, 9)], and with […, (3, 7)] to give […, (3, 9)].
• Reduce actions: O(n³) in total, one per (k, i, j) triple.
(Graph-Structured Stack: Tomita 1988; Huang + Sagae 2010)
Dynamic Programming: Merging Stacks
• Temporarily merging stacks makes our state space polynomial: O(2ⁿ) → O(n³).
• Our parsing state is represented by its top span (i, j); a sketch follows below.
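Here is a minimal sketch of the merged-state bookkeeping, assuming states are keyed only by their top span and left pointers record possible predecessors; the scoring and best-first ordering from the paper are omitted, and all names are illustrative:

```python
from collections import defaultdict

# Each DP state is identified only by its top span (i, j).  Instead of a full
# stack, a state keeps "left pointers": the set of predecessor states (k, i)
# on which it can sit.
left = defaultdict(set)          # left[(i, j)] = {(k, i), ...}

def do_shift(state, j):
    """Shift word j: the new top span is (j, j+1); record who is below it."""
    left[(j, j + 1)].add(state)
    return (j, j + 1)

def do_reduce(state):
    """Reduce a merged state: for each predecessor (k, i) of (i, j), yield (k, j).

    One Reduce on a merged state fans out into many (k, i) + (i, j) -> (k, j)
    combinations; over a whole sentence there are O(n^3) such triples.
    """
    i, j = state
    for (k, _i) in set(left[(i, j)]):
        new_state = (k, j)
        # Whatever could sit below (k, i) can also sit below (k, j).
        left[new_state] |= left[(k, i)]
        yield new_state
```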
Becoming Action Synchronous
• Shift-Reduce parsers are traditionally action synchronous.
• This makes beam search straightforward, as sketched below.
• We will also do the same: O(n³) → O(n⁴).
• But we will show that this slows down our DP (before applying beam search).
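For reference, a minimal sketch of action-synchronous beam search: all hypotheses on the beam have taken the same number of actions, so we expand them in lockstep and prune to the top b after every step. The `expand` callback, the state type, and the action count are assumptions for illustration, not the paper's interface:

```python
import heapq

def beam_search(init_state, num_actions, expand, b):
    """Keep the top-b hypotheses after every action step.

    expand(state, score) yields (next_state, next_score) pairs, one per
    legal Shift/Reduce action available from `state`.
    """
    beam = [(0.0, init_state)]                    # (score, state) pairs
    for _ in range(num_actions):                  # e.g. 2n - 1 actions for n words
        candidates = [(s2, st2)
                      for score, state in beam
                      for st2, s2 in expand(state, score)]
        beam = heapq.nlargest(b, candidates, key=lambda c: c[0])
    return max(beam, key=lambda c: c[0])          # best-scoring final hypothesis
```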