  1. Supervised word sense disambiguation on polysemy with bidirectional LSTM: A case study of BUN in Taiwan Hakka Huei-Ling Lai, Hsiao-Ling Hsu, Jyi-Shane Liu, Chia-Hung Lin and Yanhong Chen National Chengchi University, Taiwan Presented at The 21st Chinese Lexical Semantics Workshop (CLSW2020), City University of Hong Kong, Hong Kong. May 28-30, 2020

  2. TABLE OF CONTENTS
     01 Introduction
     02 Related work: Polysemous BUN in Taiwan Hakka; Bidirectional LSTM for the WSD task
     03 Methods: Overall Architecture; Input layers; Output layers
     04 Experiments: Dataset and Evaluation Metrics; Results and Analysis
     05 Conclusion and Future Work
     06 References

  3. 01 INTRODUCTION

  4. Introduction
      Polysemy is ubiquitous in languages and may trigger problems in NLP tasks performed by machines, for instance, part-of-speech (POS) tagging.
      Word sense disambiguation (WSD): Navigli (2009:3) defines WSD as 'the ability to computationally determine which sense of a word is activated by its use in a particular context'.

  5. Introduction
      In the extant literature, WSD focuses on a few dominant languages, such as English and Chinese. Findings based on a few dominant languages may lead to narrow applications.
      Low-resource languages still receive little attention, for most WSD tasks are trained by supervised learning, which requires a large amount of labeled data: expensive and time-consuming.
      A language-specific WSD system needs to be implemented for low-resource languages, for instance, Taiwan Hakka.

  6. Introduction
      Polysemous phenomena in Taiwan Hakka have been investigated by quite a few studies.
      A part-of-speech (POS) tagging system needs to be established for the classification and categorization of the corpus data.

  7. Introduction
      Our aim:
      to build a workable coding framework by integrating and modifying the findings of previous studies on Taiwan Hakka polysemous phenomena
      to develop a model for automatic sense disambiguation of polysemous words in the Taiwan Hakka corpus

  8. 02 Related work

  9. Related work
      Polysemous BUN in Taiwan Hakka
     Table 1. The various usages of BUN.

  10. Related work
      Bidirectional LSTM for the WSD task in Taiwan Hakka
     ● A number of neural network models and algorithms have been proposed and designed to simulate human understanding, using statistical procedures to capture patterns of co-occurrences of words in context.
     ● Numerous studies have proposed different neural language models to solve the task of word sense disambiguation, based on the contextual hypotheses for words and senses (cf. Li and Jurafsky, 2015; Peters et al., 2018).
     ● Yuan et al. (2016) achieved state-of-the-art WSD results by employing a supervised and a semi-supervised LSTM neural network model.

  11. Related work
      Bidirectional LSTM for the WSD task in Taiwan Hakka
     ● To better capture the surrounding information of polysemous BUN:
     ○ We employ a bidirectional LSTM (Graves & Schmidhuber, 2005; Graves et al., 2013) and train the model on human-annotated labeled data to disambiguate and predict the sense of BUN.
     ○ In disambiguating the sense of polysemous BUN, the contextual and syntactic information around BUN is crucial and should be taken into account.
     ○ The basic idea of the Bi-LSTM is to capture past and future information in two hidden states, which are then concatenated to form the final output (a minimal sketch follows).
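
To make the concatenation idea concrete, here is a minimal sketch of a Bi-LSTM sense classifier in tf.keras; the vocabulary size, hidden size, and embedding dimension are illustrative assumptions, not the hyperparameters reported in this study.

    # A minimal Bi-LSTM sense classifier sketch (tf.keras). Vocabulary
    # size, embedding dimension, and hidden size are assumptions.
    import tensorflow as tf

    VOCAB_SIZE = 3000   # assumed word-vocabulary size
    NUM_SENSES = 4      # VA1, VA2, P1, P2

    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(VOCAB_SIZE, 128),
        # Forward and backward hidden states are concatenated by default
        # (merge_mode="concat"), so the output sees past and future context.
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
        tf.keras.layers.Dense(NUM_SENSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])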

  12. 03 Methods

  13. Methods
      Overall Architecture

  14. Methods
      Overall Architecture
     Table 2. The number of tokens and types in each dataset

     Dataset | Feature | Token | Type
     Dataset 1 (manually annotated instances containing BUN) | Word embedding | 64,278 | 5,695
     Dataset 1 | Character embedding | 89,126 | 2,164
     Dataset 1 | POS | 64,103 | 24
     Dataset 2 (MOE Read Out Loud Tests) | Word embedding | 68,012 | 7,322
     Dataset 3 (Hans Christian Andersen's fairy tales, translated into Hakka) | Character embedding | 835,534 | 3,910
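
Table 2 suggests that the word and character embeddings are pre-trained on these corpora (the deck cites Mikolov et al., 2013a among its references). Below is a sketch of how such pre-training could look with gensim's word2vec; the corpus file name and all hyperparameters are assumptions for illustration, not the study's actual setup.

    # Sketch: pre-training word embeddings on a raw Hakka corpus such as
    # Dataset 2 (word2vec, cf. Mikolov et al. 2013a). The file name and
    # hyperparameters are illustrative assumptions.
    from gensim.models import Word2Vec

    # Assume each line of the (hypothetical) corpus file is one
    # tokenized sentence, tokens separated by spaces.
    with open("dataset2_hakka.txt", encoding="utf-8") as f:
        sentences = [line.split() for line in f]

    w2v = Word2Vec(sentences, vector_size=128, window=5, min_count=1, sg=1)
    w2v.save("hakka_word2vec.model")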

  15. 04 Experiments

  16. Experiments
      Dataset and Evaluation Metrics

     Table 3. The occurrences of VA1, VA2, P1, and P2 in dataset 1.

     Dataset 1 | Label | Occurrence
     Training Set (Training + Dev) | VA1 | 66 (4%)
     Training Set (Training + Dev) | VA2 | 753 (46%)
     Training Set (Training + Dev) | P1 | 238 (15%)
     Training Set (Training + Dev) | P2 | 576 (35%)
     Training Set (Training + Dev) | Subtotal | 1,633 (100%)
     Test Set | VA1 | 7 (4%)
     Test Set | VA2 | 75 (46%)
     Test Set | P1 | 24 (15%)
     Test Set | P2 | 57 (35%)
     Test Set | Subtotal | 163 (100%)

     Table 4. The number of samples in the training set, dev set and test set

     Dataset 1 | Feature | Token | Type
     Training (around 80%) | Word embedding | 14,410 | 2,463
     Training (around 80%) | Character embedding | 14,410 | 1,226
     Training (around 80%) | POS | 14,410 | 24
     Dev (around 10%) | Word embedding | 1,610 | 672
     Dev (around 10%) | Character embedding | 1,610 | 513
     Dev (around 10%) | POS | 1,610 | 23
     Test (around 10%; fixed) | Word embedding | 1,630 | 652
     Test (around 10%; fixed) | Character embedding | 1,630 | 508
     Test (around 10%; fixed) | POS | 1,630 | 22
     Total (100%) | Word embedding | 17,650 | 2,708
     Total (100%) | Character embedding | 17,650 | 1,283
     Total (100%) | POS | 17,650 | 24
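
Since the test-set label proportions in Table 3 mirror the training set, the split appears stratified by label. A small sketch of producing such an 80/10/10 stratified split with scikit-learn; the instances are placeholders, and only the label counts follow Table 3 (66+7 VA1, 753+75 VA2, 238+24 P1, 576+57 P2).

    # Sketch of a label-stratified 80/10/10 split like Tables 3-4
    # (scikit-learn). The instances are stand-ins; only the label
    # counts follow Table 3.
    from sklearn.model_selection import train_test_split

    labels = ["VA1"] * 73 + ["VA2"] * 828 + ["P1"] * 262 + ["P2"] * 633
    sentences = [f"instance_{i}" for i in range(len(labels))]  # placeholders

    # Hold out ~10% as a fixed test set, keeping label proportions.
    x_rest, x_test, y_rest, y_test = train_test_split(
        sentences, labels, test_size=0.10, stratify=labels, random_state=0)

    # Split the remainder into ~80% train / ~10% dev of the full data.
    x_train, x_dev, y_train, y_dev = train_test_split(
        x_rest, y_rest, test_size=1/9, stratify=y_rest, random_state=0)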

  17. Experiments  Results and Analysis

  18. Experiments  Results and Analysis

  19. Experiments  Results and Analysis

  20. Experiments
      Results and Analysis
      A larger amount of corpus data is needed.
      Sentence embeddings could be explored (cf. Wang and Chang, 2016).

  21. 05 Conclusion and Future Work

  22. Conclusion and Future Work
      In this study, we propose a WSD model for classifying the senses of a polysemous word in Taiwan Hakka, a low-resource language, especially in the field of NLP.
      The proposed model is a supervised bidirectional LSTM trained and tested on a small amount of labeled data.
      Four kinds of input features are utilized (see the sketch below):
      POS only
      POS + word embeddings
      POS + character embeddings
      POS + word embeddings + character embeddings, with which the best performance is achieved
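
A sketch of how the four feature configurations might be assembled into Bi-LSTM variants via the tf.keras functional API. The vocabulary sizes follow the type counts in Table 2, while the embedding dimensions, window length, and the build_model helper are assumptions for illustration.

    # Sketch: one Bi-LSTM model per input-feature configuration
    # (tf.keras functional API). build_model is a hypothetical helper;
    # embedding dimensions and window length are assumptions.
    import tensorflow as tf
    from tensorflow.keras import layers

    SEQ_LEN = 11  # e.g. an assumed L5R5 window around BUN, plus BUN itself
    FEATURES = {  # name -> (vocab size, embedding dim); vocab from Table 2
        "pos": (24, 16), "words": (5695, 128), "chars": (2164, 64),
    }

    def build_model(feature_names):
        inputs, embedded = [], []
        for name in feature_names:
            vocab, dim = FEATURES[name]
            inp = tf.keras.Input(shape=(SEQ_LEN,), name=name)
            inputs.append(inp)
            embedded.append(layers.Embedding(vocab, dim)(inp))
        # Concatenate per-time-step feature embeddings when > 1 feature.
        x = embedded[0] if len(embedded) == 1 else layers.Concatenate()(embedded)
        x = layers.Bidirectional(layers.LSTM(64))(x)
        out = layers.Dense(4, activation="softmax")(x)  # VA1, VA2, P1, P2
        return tf.keras.Model(inputs, out)

    # The four configurations compared in the study:
    configs = [["pos"], ["pos", "words"], ["pos", "chars"],
               ["pos", "words", "chars"]]  # the last performed best
    models = {"+".join(c): build_model(c) for c in configs}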

  23. Conclusion and Future Work
      To enhance the robustness and stability of the model, we will design and include other possible parameters and compare the performance across experiments.
      To test the model with different window spans (from L1R1 to L10R10) and/or with whole sentences as inputs (see the window-extraction sketch after this list).
      To improve the research design, we will try random selection without considering the overall distribution of the four labels in the test set in future experiments.
      To test the model on other polysemous words in Taiwan Hakka.
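
To illustrate the planned window spans, here is a small helper that extracts a symmetric LnRn context window around the target word; the padding token and the toy sentence are assumptions, not the study's actual data.

    # Sketch: extracting symmetric context windows (L1R1 ... L10R10)
    # around the target word. The "<pad>" token and the example
    # sentence are placeholders.
    def window(tokens, target_index, span):
        """Return span tokens on each side of the target, padding at
        sentence boundaries so every window has length 2*span + 1."""
        left = tokens[max(0, target_index - span):target_index]
        right = tokens[target_index + 1:target_index + 1 + span]
        left = ["<pad>"] * (span - len(left)) + left
        right = right + ["<pad>"] * (span - len(right))
        return left + [tokens[target_index]] + right

    sent = ["w1", "BUN", "w3", "w4", "w5"]  # placeholder tokens
    for span in range(1, 11):               # L1R1 up to L10R10
        print(span, window(sent, sent.index("BUN"), span))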

  24. 06 References

  25. References
     1. Chiang, M. H. (2016). The Functions and Origin of Locative TU5 in Hailu Hakka, with a Note on the Origin of the Durative Marker TEN3. Bulletin of Chinese Linguistics, 9(1), 95-120.
     2. Graves, A., & Schmidhuber, J. (2005). Framewise phoneme classification with bidirectional LSTM networks. In Proceedings of the 2005 International Joint Conference on Neural Networks. Montreal, Canada.
     3. Graves, A., Mohamed, A., & Hinton, G. (2013). Speech recognition with deep recurrent neural networks. In Proceedings of ICASSP-2013, 6645-6649. IEEE.
     4. Huang, H. C. (2014). Semantic Extensions and the Convergence of the Beneficiary Role: A Case Study of BUN and Lau in Hakka. Concentric: Studies in Linguistics, 40(1), 65-94.
     5. Iacobacci, I., Pilehvar, M. T., & Navigli, R. (2016). Embeddings for word sense disambiguation: An evaluation study. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 897-907.
     6. Lai, H. L. (2001). On Hakka BUN: A case of polygrammaticalization. Language and Linguistics, 2(2), 137-153.

  26. References
     7. Lai, H. L. (2015). Profiling Hakka BUN1 Causative Constructions. Language and Linguistics, 16(3), 369-395.
     8. Li, J., & Jurafsky, D. (2015). Do multi-sense embeddings improve natural language understanding? In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2015).
     9. Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013a). Efficient Estimation of Word Representations in Vector Space. In Proceedings of Workshop at ICLR.
     10. Mikolov, T., Yih, W., & Zweig, G. (2013b). Linguistic regularities in continuous space word representations. In Proceedings of HLT-NAACL, pp. 746-751.
     11. Navigli, R. (2009). Word sense disambiguation: A survey. ACM Computing Surveys (CSUR), 41(2), 1-69.
     12. Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep contextualized word representations. In Proceedings of NAACL.

  27. References
     13. Tseng, Y. C. (2012). An Optimality Theoretic Analysis of the Distribution of Hakka Prepositions DI, DO, BUN, LAU, TUNG, ZIONG. Concentric: Studies in Linguistics, 38(2), 171-209.
     14. Yuan, D., Richardson, J., Doherty, R., Evans, C., & Altendorf, E. (2016). Semi-supervised word sense disambiguation with neural models. In Proceedings of COLING, 1374-1385.
     15. Wang, W., & Chang, B. (2016). Graph-based dependency parsing with bidirectional LSTM. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 2306-2315.

  28. THANK YOU
