Neural Joint Model for Transition-based Chinese Syntactic Analysis

Shuhei Kurita  Daisuke Kawahara  Sadao Kurohashi
Graduate School of Informatics, Kyoto University
{kurita, dk, kuro}@nlp.ist.i.kyoto-u.ac.jp

Abstract

We present neural network-based joint models for Chinese word segmentation, POS tagging and dependency parsing. Our models are the first neural approaches to fully joint Chinese analysis, which is known to prevent the error propagation problem of pipeline models. Although word embeddings play a key role in dependency parsing, they cannot be applied directly to the joint task in previous work. To address this problem, we propose embeddings of character strings, in addition to words. Experiments show that our models outperform existing systems in Chinese word segmentation and POS tagging, and achieve preferable accuracies in dependency parsing. We also explore bi-LSTM models with fewer features.

1 Introduction

Dependency parsers have been enhanced by the use of neural networks and embedding vectors (Chen and Manning, 2014; Weiss et al., 2015; Zhou et al., 2015; Alberti et al., 2015; Andor et al., 2016; Dyer et al., 2015). When these dependency parsers process sentences in English and other languages that use symbols for word separation, they can be very accurate. However, for languages that do not contain word separation symbols, dependency parsers are used in pipelines with word segmentation and POS tagging models, and encounter serious problems because of error propagation. In particular, Chinese word segmentation is notoriously difficult because sentences are written without word dividers and Chinese words are not clearly defined. Hence, the pipeline of word segmentation, POS tagging and dependency parsing always suffers from word segmentation errors. Once words have been wrongly segmented, word embeddings and traditional one-hot word features, as used in dependency parsers, will mistake the precise meanings of the original sentences. As a result, pipeline models achieve dependency scores of only around 80% for Chinese.

A traditional solution to this error propagation problem is to use joint models. Many Chinese words play multiple grammatical roles with only one grammatical form. Therefore, determining word boundaries and the subsequent tagging and dependency parsing are closely correlated. Transition-based joint models for Chinese word segmentation, POS tagging and dependency parsing were proposed by Hatori et al. (2012) and Zhang et al. (2014). Hatori et al. (2012) state that dependency information improves the performance of word segmentation and POS tagging, and develop the first transition-based joint word segmentation, POS tagging and dependency parsing model. Zhang et al. (2014) expand this and find that both inter-word dependencies and intra-word dependencies are helpful in word segmentation and POS tagging.

Although the models of Hatori et al. (2012) and Zhang et al. (2014) perform better than pipeline models, they rely on one-hot representations of characters and words, and do not model the similarities among characters and words. In addition, not only words and characters but also many incomplete tokens appear in the transition-based joint parsing process. Such incomplete or unknown words (UNK) could become important cues for parsing, but they are not listed in dictionaries or pre-trained word embeddings. Some recent studies show that character-based embeddings are effective in neural parsing (Ballesteros et al., 2015; Zheng et al., 2015), but their models cannot be directly applied to joint models because they use given word segmentations.
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, pages 1204–1214, Vancouver, Canada, July 30 - August 4, 2017. © 2017 Association for Computational Linguistics. https://doi.org/10.18653/v1/P17-1111

To solve
these problems, we propose neural network-based joint models for word segmentation, POS tagging and dependency parsing. We use both character and word embeddings for known tokens, and apply character string embeddings to unknown tokens.

Another problem in the models of Hatori et al. (2012) and Zhang et al. (2014) is that they rely on detailed feature engineering. Recently, bidirectional LSTM (bi-LSTM) based neural network models with very little feature extraction have been proposed (Kiperwasser and Goldberg, 2016; Cross and Huang, 2016). In these models, the bi-LSTM is used to represent the tokens together with their context. Indeed, such neural networks can observe the whole sentence through the bi-LSTM. This bi-LSTM is similar to that of the neural machine translation model of Bahdanau et al. (2014). As a result, Kiperwasser and Goldberg (2016) achieve scores competitive with the previous state-of-the-art models. We also develop joint models with n-gram character string bi-LSTMs.

In the experiments, we obtain state-of-the-art Chinese word segmentation and POS tagging scores, and the pipeline of the dependency model achieves better dependency scores than the previous joint models. To the best of our knowledge, this is the first model to use embeddings and neural networks for Chinese full joint parsing.

Our contributions are summarized as follows: (1) we propose the first embedding-based fully joint parsing model; (2) we use character string embeddings for UNK and incomplete tokens; (3) we explore bi-LSTM models to avoid the detailed feature engineering of previous approaches; (4) in experiments on a Chinese corpus, we achieve state-of-the-art scores in word segmentation, POS tagging and dependency parsing.

2 Model

All full joint parsing models we present in this paper use the transition-based algorithm in Section 2.1 and the embeddings of character strings in Section 2.2. We present two neural networks: the feed-forward neural network models in Section 2.3 and the bi-LSTM models in Section 2.4.

2.1 Transition-based Algorithm for Joint Segmentation, POS Tagging, and Dependency Parsing

Based on Hatori et al. (2012), we use a modified arc-standard algorithm for character transitions (Figure 1). The model consists of one buffer and one stack. The buffer contains the characters of the input sentence, and the stack contains words shifted from the buffer. The stack words may have child nodes. The words in the stack are formed by the following transition operations:

[Figure 1: Transition-based Chinese joint model for word segmentation, POS tagging and dependency parsing. The state comprises a word-based stack, a character-based buffer, the transition history, and word-based left/right children; the example input is 技术有了新的进展。 ("Technology has made new progress.")]

• SH(t) (shift): Shift the first character of the buffer to the top of the stack as a new word, attaching the POS tag t.

• AP (append): Append the first character of the buffer to the end of the top word of the stack.

• RR (reduce-right): Reduce the right word of the top two words of the stack, and make it the right child node of the left word.

• RL (reduce-left): Reduce the left word of the top two words of the stack, and make it the left child node of the right word.

The RR and RL operations are the same as those of the arc-standard algorithm (Nivre, 2004a). SH makes a new word, whereas AP makes the current word longer by adding one character. POS tags are attached by the SH(t) transition.

In this paper, we explore both greedy models and beam decoding models; the parsing algorithm works with both. We also develop a joint model of word segmentation and POS tagging, along with a dependency parsing model. The joint model of word segmentation and POS tagging does not have the RR and RL transitions.

2.2 Embeddings of Character Strings

First, we explain the embeddings used in the neural networks. Later, we explain the details of the neural networks in Sections 2.3 and 2.4.