Supervised Word Sense Disambiguation on Polysemy with Bidirectional LSTM: A Case Study of BUN in Taiwan Hakka
Huei-Ling Lai, Hsiao-Ling Hsu, Jyi-Shane Liu, Chia-Hung Lin and Yanhong Chen
National Chengchi University, Taiwan
Presented at the 21st Chinese Lexical Semantics Workshop (CLSW2020), City University of Hong Kong, Hong Kong, May 28-30, 2020
TABLE OF CONTENTS
01 Introduction
02 Related Work: Polysemous BUN; Bidirectional LSTM for the WSD task in Taiwan Hakka
03 Methods: Overall Architecture; Input layers; Output layers
04 Experiments: Dataset and Evaluation Metrics; Results and Analysis
05 Conclusion and Future Work
06 References
01 INTRODUCTION
Introduction
Polysemy is ubiquitous in languages and may trigger problems in NLP tasks performed by machines, for instance, part-of-speech (POS) tagging.
Word sense disambiguation (WSD): Navigli (2009:3) defines WSD as 'the ability to computationally determine which sense of a word is activated by its use in a particular context'.
Introduction
In the extant literature, WSD focuses on a few dominant languages, such as English and Chinese. Findings based on a few dominant languages may lead to narrow applications.
Low-resource languages still receive little attention, as most WSD systems are trained with supervised learning, which requires a large amount of labeled data that is expensive and time-consuming to obtain.
A language-specific WSD system needs to be implemented for low-resource languages, for instance, Taiwan Hakka.
Introduction
Polysemous phenomena in Taiwan Hakka have been investigated in quite a few studies.
A part-of-speech (POS) tagging system needs to be established for the classification and categorization of the corpus data.
Introduction
Our aim: to schematize a workable coding framework by integrating and modifying the findings of previous studies on polysemous phenomena in Taiwan Hakka, and to develop a model for automatic word sense disambiguation of polysemous words in a Taiwan Hakka corpus.
02 Related work
Related work: Polysemous BUN in Taiwan Hakka
Table 1. The various usages of BUN.
Related work: Bidirectional LSTM for the WSD task in Taiwan Hakka
● A number of neural network models and algorithms have been proposed and designed to simulate human understanding with statistical procedures that capture patterns of co-occurrences of words in context.
● Numerous studies have proposed different neural language models to solve the task of word sense disambiguation based on the contextual hypotheses for words and senses (cf. Li and Jurafsky, 2015; Peters et al., 2018).
● Yuan et al. (2016) achieved state-of-the-art WSD results by employing a supervised and a semi-supervised LSTM neural network model.
Related work: Bidirectional LSTM for the WSD task in Taiwan Hakka
To better capture the surrounding information of polysemous BUN:
○ We employ a bidirectional LSTM (Graves & Schmidhuber, 2005; Graves et al., 2013) and train the model on human-annotated labeled data to disambiguate and predict the sense of BUN.
○ In disambiguating the sense of polysemous BUN, the contextual and syntactic information around BUN is crucial and should be taken into account.
○ The basic idea of the Bi-LSTM is to capture past and future information in two hidden states, which are then concatenated to form the final output.
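A minimal sketch of the Bi-LSTM sense classifier idea described above, written in Keras. The vocabulary size, embedding dimension, window length, and hidden size are illustrative assumptions, not the configuration reported in the slides; only the four-way sense inventory (VA1, VA2, P1, P2) comes from the study.

```python
# Minimal Bi-LSTM sense classifier sketch (illustrative hyperparameters).
import numpy as np
from tensorflow.keras import layers, models

VOCAB_SIZE = 3000      # assumed word-vocabulary size
EMB_DIM = 100          # assumed embedding dimension
MAX_LEN = 11           # e.g., an L5R5 window around BUN plus BUN itself
NUM_SENSES = 4         # VA1, VA2, P1, P2

inputs = layers.Input(shape=(MAX_LEN,), dtype="int32")
x = layers.Embedding(VOCAB_SIZE, EMB_DIM, mask_zero=True)(inputs)
# The forward and backward hidden states are concatenated (merge_mode="concat"),
# so the output encodes both past and future context of BUN.
x = layers.Bidirectional(layers.LSTM(64), merge_mode="concat")(x)
outputs = layers.Dense(NUM_SENSES, activation="softmax")(x)

model = models.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Toy usage: random token-id windows and sense labels just to show the shapes.
X_toy = np.random.randint(1, VOCAB_SIZE, size=(32, MAX_LEN))
y_toy = np.random.randint(0, NUM_SENSES, size=(32,))
model.fit(X_toy, y_toy, epochs=1, verbose=0)
```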
03 Methods
Methods Overall Architecture
Methods: Overall Architecture
Table 2. The number of tokens and types in each dataset
Dataset 1 (manually annotated instances containing BUN):
  Word embedding: 64,278 tokens, 5,695 types
  Character embedding: 89,126 tokens, 2,164 types
  POS: 64,103 tokens, 24 types
Dataset 2 (MOE Read Out Loud Tests):
  Word embedding: 68,012 tokens, 7,322 types
Dataset 3 (Hans Christian Andersen's fairy tales, translated into Hakka):
  Character embedding: 835,534 tokens, 3,910 types
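A minimal sketch of how the three kinds of input features listed in Table 2 (word embeddings, character embeddings, and POS tags) could be combined into one input representation for the Bi-LSTM. The dimensions, the per-word pooling of character embeddings, and all layer sizes are assumptions for illustration; the slides do not specify the exact architecture.

```python
# Combining word, character, and POS features for the Bi-LSTM (illustrative sketch).
from tensorflow.keras import layers, models

MAX_LEN, MAX_CHARS = 11, 6            # assumed window length and max characters per word
WORD_VOCAB, CHAR_VOCAB, POS_TAGS = 3000, 1300, 24
NUM_SENSES = 4                        # VA1, VA2, P1, P2

word_in = layers.Input(shape=(MAX_LEN,), name="words")
char_in = layers.Input(shape=(MAX_LEN, MAX_CHARS), name="chars")
pos_in = layers.Input(shape=(MAX_LEN,), name="pos")

word_emb = layers.Embedding(WORD_VOCAB, 100)(word_in)
pos_emb = layers.Embedding(POS_TAGS + 1, 16)(pos_in)
# Embed characters, then pool them into one vector per word (an assumed design choice).
char_emb = layers.Embedding(CHAR_VOCAB, 50)(char_in)                  # (batch, MAX_LEN, MAX_CHARS, 50)
char_emb = layers.TimeDistributed(layers.GlobalAveragePooling1D())(char_emb)  # (batch, MAX_LEN, 50)

# Concatenate the three feature streams at each token position.
x = layers.Concatenate()([word_emb, char_emb, pos_emb])
x = layers.Bidirectional(layers.LSTM(64))(x)
out = layers.Dense(NUM_SENSES, activation="softmax")(x)

model = models.Model([word_in, char_in, pos_in], out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```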
04 Experiments
Experiments: Dataset and Evaluation Metrics
Table 3. The occurrences of VA1, VA2, P1, and P2 in dataset 1
Training Set (Training + Dev):
  VA1: 66 (4%); VA2: 753 (46%); P1: 238 (15%); P2: 576 (35%); Subtotal: 1,633 (100%)
Test Set:
  VA1: 7 (4%); VA2: 75 (46%); P1: 24 (15%); P2: 57 (35%); Subtotal: 163 (100%)
Table 4. The number of samples in the training set, dev set and test set
Training (around 80%):
  Word embedding: 14,410 tokens, 2,463 types
  Character embedding: 14,410 tokens, 1,226 types
  POS: 14,410 tokens, 24 types
Dev (around 10%):
  Word embedding: 1,610 tokens, 672 types
  Character embedding: 1,610 tokens, 513 types
  POS: 1,610 tokens, 23 types
Test (around 10%; fixed):
  Word embedding: 1,630 tokens, 652 types
  Character embedding: 1,630 tokens, 508 types
  POS: 1,630 tokens, 22 types
Total (100%):
  Word embedding: 17,650 tokens, 2,708 types
  Character embedding: 17,650 tokens, 1,283 types
  POS: 17,650 tokens, 24 types
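As Table 3 shows, the training and test sets keep the four labels in roughly the same proportions. A minimal sketch of one way to obtain such a distribution-preserving split with scikit-learn's stratified splitting is shown below; the toy data, the 90/10 ratio, and the variable names are illustrative assumptions rather than the exact procedure used in the study.

```python
# Stratified split that preserves the label proportions from Table 3 (illustrative).
from collections import Counter
from sklearn.model_selection import train_test_split

# Toy stand-in data: 1,796 instances with the rough label proportions from Table 3.
labels = ["VA1"] * 73 + ["VA2"] * 828 + ["P1"] * 262 + ["P2"] * 633
instances = [f"context_{i}" for i in range(len(labels))]

# stratify=labels keeps each label's share nearly identical in train and test.
train_x, test_x, train_y, test_y = train_test_split(
    instances, labels, test_size=0.1, stratify=labels, random_state=42)

print(Counter(train_y))
print(Counter(test_y))
```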
Experiments: Results and Analysis
Larger corpus data is needed.
Sentence embeddings could be explored (cf. Wang and Chang, 2016).
05 Conclusion and Future Work
Conclusion and Future Work
In this study, we propose a WSD model for classifying senses of a polysemous word in Taiwan Hakka, a low-resource language, especially in the field of NLP.
The proposed model is a supervised bidirectional LSTM trained and tested on a small amount of labeled data.
Four kinds of input features are utilized:
POS only
POS + word embeddings
POS + character embeddings
POS + word embeddings + character embeddings (the best performance is achieved with this combination)
Conclusion and Future Work
To enhance the robustness and stability of the model, we will design and include other possible parameters to compare and contrast the performance across experiments.
To test the model with different window spans (from L1R1 to L10R10) and/or with whole sentences as inputs (a small windowing sketch follows below).
To improve the research design, we will try random selection of the test set without considering the overall distribution of the four labels in future experiments.
To test the model on other polysemous words in Taiwan Hakka.
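Relatedly, a minimal sketch of extracting a symmetric LnRn window (e.g., L1R1 up to L10R10) around each occurrence of BUN from a tokenized sentence; the padding symbol, the example sentence, and the function name extract_window are illustrative assumptions.

```python
# Extracting an LnRn context window around a target token (illustrative sketch).
PAD = "<pad>"  # assumed padding symbol for positions beyond the sentence boundary

def extract_window(tokens, target_index, n):
    """Return the LnRn window: n tokens to the left and right of the target."""
    left = tokens[max(0, target_index - n):target_index]
    right = tokens[target_index + 1:target_index + 1 + n]
    # Pad so every window has the same length 2n + 1 regardless of position.
    left = [PAD] * (n - len(left)) + left
    right = right + [PAD] * (n - len(right))
    return left + [tokens[target_index]] + right

# Toy usage with a placeholder tokenized sentence containing BUN (L2R2 window).
sentence = ["佢", "BUN", "我", "一本", "書"]
print(extract_window(sentence, sentence.index("BUN"), 2))
# ['<pad>', '佢', 'BUN', '我', '一本']
```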
06 References
References
1. Chiang, M. H. (2016). The Functions and Origin of Locative TU5 in Hailu Hakka, with a Note on the Origin of the Durative Marker TEN3. Bulletin of Chinese Linguistics, 9(1), 95-120.
2. Graves, A., & Schmidhuber, J. (2005). Framewise phoneme classification with bidirectional LSTM networks. In Proceedings of the 2005 International Joint Conference on Neural Networks. Montreal, Canada.
3. Graves, A., Mohamed, A., & Hinton, G. (2013). Speech recognition with deep recurrent neural networks. In Proceedings of ICASSP-2013, 6645-6649. IEEE.
4. Huang, H. C. (2014). Semantic Extensions and the Convergence of the Beneficiary Role: A Case Study of BUN and LAU in Hakka. Concentric: Studies in Linguistics, 40(1), 65-94.
5. Iacobacci, I., Pilehvar, M. T., & Navigli, R. (2016). Embeddings for word sense disambiguation: An evaluation study. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 897-907).
6. Lai, H. L. (2001). On Hakka BUN: A case of polygrammaticalization. Language and Linguistics, 2(2), 137-153.
References
7. Lai, H. L. (2015). Profiling Hakka BUN1 Causative Constructions. Language and Linguistics, 16(3), 369-395.
8. Li, J., & Jurafsky, D. (2015). Do multi-sense embeddings improve natural language understanding? In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2015).
9. Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013a). Efficient Estimation of Word Representations in Vector Space. In Proceedings of Workshop at ICLR.
10. Mikolov, T., Yih, W., & Zweig, G. (2013b). Linguistic regularities in continuous space word representations. In Proceedings of HLT-NAACL, pp. 746-751.
11. Navigli, R. (2009). Word sense disambiguation: A survey. ACM Computing Surveys (CSUR), 41(2), 1-69.
12. Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep contextualized word representations. In Proceedings of NAACL.
References
13. Tseng, Y. C. (2012). An Optimality Theoretic Analysis of the Distribution of Hakka Prepositions DI, DO, BUN, LAU, TUNG, ZIONG. Concentric: Studies in Linguistics, 38(2), 171-209.
14. Yuan, D., Richardson, J., Doherty, R., Evans, C., & Altendorf, E. (2016). Semi-supervised word sense disambiguation with neural models. In Proceedings of COLING, 1374-1385.
15. Wang, W., & Chang, B. (2016). Graph-based dependency parsing with bidirectional LSTM. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 2306-2315).
THANK YOU