The 21st Chinese Lexical Semantics Workshop (CLSW 2020)
“IS HOME WHERE THE WORD VECTORS LEAD?”: A Corpus-based Diachronic Study of Jia (家)
Pei-Yi Chen and Shu-Kai Hsieh
Graduate Institute of Linguistics, National Taiwan University
THE CONCEPT OF HOME
• Corpus-based computational linguistics
• Semantically related words to jia
• How the concept of home takes shape through the lens of time
THE CONCEPT OF HOME IN LITERATURE
• Has been extensively studied in (environmental) psychology, sociology, anthropology, architecture, and other fields. [1, 2, 5, 6]
• Discussed under specialized topics such as homelessness, journeying, migration, aging, and gender.
• Intertwined with the words home, house, dwelling, and family. [1, 2]
• Carries a sense “not only of belonging but also of potential alienation when attempts to make home fail or are subverted.” [5]
SEMANTIC CHANGE
• Semantic change ranges from changes in the “core meanings of words” to “subtle shifts of cultural associations.” [10]
• Applying computational methods to larger sets of words across longer periods of time makes it possible to generalize regularities of semantic change. [9, 10]
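One example of such a regularity can be written compactly. In [9], a word's change between two consecutive periods t and t+1 is scored as the cosine distance between its aligned vectors, and the reported laws relate that score to word frequency f(w) and polysemy d(w). The exponents are fitted empirically, so only their signs are stated in this summary.

```latex
\begin{aligned}
\Delta^{(t)}(w) &= 1 - \cos\big(\mathbf{w}^{(t)}, \mathbf{w}^{(t+1)}\big) \\
\Delta(w) &\propto f(w)^{\beta_f}, \quad \beta_f < 0 \quad \text{(law of conformity)} \\
\Delta(w) &\propto d(w)^{\beta_d}, \quad \beta_d > 0 \quad \text{(law of innovation)}
\end{aligned}
```

In words: frequent words tend to change more slowly, and more polysemous words tend to change faster.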
TYPES OF SEMANTIC CHANGE (BLOOMFIELD, 1933)
A. Narrowing: e.g., ‘skyline’, ‘有事’ (lit. ‘have a matter’)
B. Widening: e.g., ‘Kleenex’, ‘舒跑’ (a sports drink brand name)
C. Metaphor: e.g., ‘broadcast’
D. Metonymy: e.g., jaw ‘cheek’ → ‘mandible’
E. Synecdoche (whole-part relation): e.g., capital cities → countries or their governments
F. Hyperbole (weaker to stronger): e.g., kill ‘torment’ → ‘slaughter’
G. Meiosis (stronger to weaker): e.g., astound ‘strike with thunder’ → ‘surprise strongly’
H. Degeneration: e.g., knave ‘boy’ → ‘servant’ → ‘deceitful or despicable man’
I. Elevation: e.g., knight ‘boy’ → ‘nobleman’
METHODOLOGY – DISTRIBUTIONAL SEMANTIC APPROACH
• Corpus-based / usage-based approach
• Focus on a word’s collocational patterns
• Using word embeddings to trace semantic change relies on the idea that such changes go hand in hand with changes in word co-occurrences. [11]
• Data-driven methods:
  • Skip-gram with negative sampling (SGNS) [13, 14]
  • Singular value decomposition (SVD) [15]
  • t-distributed stochastic neighbor embedding (t-SNE) [15]
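The count-based route listed above (co-occurrence counts, then SVD) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes NumPy, a hypothetical pre-segmented toy corpus, a ±2-word window, and positive PMI weighting before the SVD (the SVD variant compared in [9] is also built on PPMI). All function names are hypothetical. "Semantically related words" are then read off as cosine nearest neighbours. SGNS [13, 14] would replace the counting and SVD steps with a trained model.

```python
import numpy as np

def cooccurrence_matrix(sentences, window=2):
    """Symmetric co-occurrence counts within +/- `window` tokens."""
    vocab = sorted({w for sent in sentences for w in sent})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for sent in sentences:
        for i, word in enumerate(sent):
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if j != i:
                    counts[index[word], index[sent[j]]] += 1.0
    return vocab, counts

def ppmi(counts):
    """Positive PMI: max(0, log p(w, c) / (p(w) p(c))); undefined cells become 0."""
    p_wc = counts / counts.sum()
    p_w = p_wc.sum(axis=1, keepdims=True)
    p_c = p_wc.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_wc / (p_w * p_c))
    pmi[~np.isfinite(pmi)] = 0.0
    return np.maximum(pmi, 0.0)

def svd_embeddings(matrix, dim=2):
    """Truncated SVD: top-`dim` left singular vectors scaled by singular values."""
    u, s, _ = np.linalg.svd(matrix)
    return u[:, :dim] * s[:dim]

def nearest_neighbors(word, vocab, vectors, k=3):
    """Rank the other words by cosine similarity to `word`."""
    norms = np.linalg.norm(vectors, axis=1)
    unit = vectors / np.where(norms == 0, 1.0, norms)[:, None]
    idx = vocab.index(word)
    sims = unit @ unit[idx]
    ranked = [i for i in np.argsort(-sims) if i != idx]
    return [vocab[i] for i in ranked[:k]]

# Hypothetical toy corpus, already segmented into words.
corpus = [
    ["回", "家", "吃飯"],
    ["回", "家", "休息"],
    ["離開", "家", "工作"],
    ["回", "房子", "休息"],
    ["買", "房子", "投資"],
    ["買", "股票", "投資"],
]
vocab, counts = cooccurrence_matrix(corpus)
vectors = svd_embeddings(ppmi(counts), dim=2)
print(nearest_neighbors("家", vocab, vectors))
```

With so little data the neighbours themselves are not meaningful; the sketch only shows the shape of the pipeline. Real runs use hundreds of dimensions, and SVD signs are arbitrary, which is one reason vectors from different periods must be aligned before comparison.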
DATA COLLECTION
NOTES ON METHODOLOGIES
• Character and word embeddings: Character-based methods can sometimes produce more desirable results than word-based ones, especially when the input data are “vulnerable to the presence of out-of-vocabulary (OOV) words.” [18] This does not mean that word segmentation is unnecessary, only that alternatives exist.
• Vector alignment: Vectors from different periods are aligned with Procrustes analysis, based on the code by Hamilton and Heuser on GitHub³. [9]
• Dimensionality reduction: A two-dimensional visualization is plotted using the t-SNE technique. [19, 20]
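The alignment step can be sketched as follows. This is a minimal NumPy re-implementation of orthogonal Procrustes written for illustration, not the Hamilton and Heuser code itself. The function names are hypothetical, and the sketch assumes both matrices already share a vocabulary in the same row order. The per-word cosine distance after alignment is then one way to score how far a word such as jia has moved between periods.

```python
import numpy as np

def procrustes_align(base, other):
    """Rotate `other` onto `base`; rows must be the same shared words, same order.

    Solves min_R ||other @ R - base||_F over orthogonal R.
    If other.T @ base = U S V^T, the optimum is R = U V^T.
    """
    u, _, vt = np.linalg.svd(other.T @ base)
    return other @ (u @ vt)

def cosine_distances(a, b):
    """Row-wise cosine distance between two aligned embedding matrices."""
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return 1.0 - num / den

# Synthetic check: "period 2" is period 1 under a random rotation,
# except that word 0 has "changed" (its vector is flipped).
rng = np.random.default_rng(0)
period1 = rng.normal(size=(50, 4))
rotation, _ = np.linalg.qr(rng.normal(size=(4, 4)))
period2 = period1 @ rotation
period2[0] = -period2[0]

aligned = procrustes_align(period1, period2)
change = cosine_distances(period1, aligned)
print(int(np.argmax(change)), round(float(change[0]), 3))
```

Because the rotation is constrained to be orthogonal, alignment removes the arbitrary orientation of each period's vector space without distorting distances within it, so only genuine shifts (here, word 0) remain large. Preprocessing such as length normalization is omitted in this sketch.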
DISCUSSIONS – JIA IN PRE-MODERN TIME
Adapted from [2]
DISCUSSIONS – JIA IN MODERN TIME
WORD VECTOR VISUALIZATION
Fig. 2. Visualization of word vectors from Qing dynasty texts and the Sinica Corpus
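A two-dimensional plot like Fig. 2 comes from running t-SNE [20] on the word vectors. The sketch below is a bare-bones exact t-SNE in NumPy, written only to show the mechanics: perplexity-calibrated Gaussian affinities in the original space, a Student-t kernel in 2-D, and gradient descent with early exaggeration. It is not the tool used for the figure. Its function names and hyperparameters (perplexity 5, step size 1, 500 iterations) are assumptions chosen for a 20-point synthetic demo, and real libraries add adaptive gains and faster approximations.

```python
import numpy as np

def pairwise_sq_dists(x):
    """Squared Euclidean distances between all rows of x."""
    sq = np.sum(x * x, axis=1)
    return np.maximum(sq[:, None] + sq[None, :] - 2.0 * x @ x.T, 0.0)

def joint_probabilities(x, perplexity=5.0, tol=1e-5, max_iter=50):
    """Affinities P; each point's Gaussian precision is found by bisection."""
    n = x.shape[0]
    d = pairwise_sq_dists(x)
    target = np.log(perplexity)  # target entropy in nats
    p = np.zeros((n, n))
    for i in range(n):
        di = np.delete(d[i], i)
        lo, hi, beta = 0.0, np.inf, 1.0  # beta = 1 / (2 sigma_i^2)
        for _ in range(max_iter):
            pi = np.exp(-(di - di.min()) * beta)
            pi /= pi.sum()
            entropy = -np.sum(pi * np.log(np.maximum(pi, 1e-12)))
            if abs(entropy - target) < tol:
                break
            if entropy > target:  # too flat: narrow the Gaussian
                lo = beta
                beta = beta * 2.0 if hi == np.inf else (beta + hi) / 2.0
            else:  # too peaked: widen the Gaussian
                hi = beta
                beta = (beta + lo) / 2.0
        p[i, np.arange(n) != i] = pi
    p = (p + p.T) / (2.0 * n)  # symmetrize; entries sum to 1
    return np.maximum(p, 1e-12)

def tsne(x, dim=2, perplexity=5.0, n_iter=500, learning_rate=1.0, seed=0):
    """Exact t-SNE by gradient descent with momentum and early exaggeration."""
    n = x.shape[0]
    p = joint_probabilities(x, perplexity)
    rng = np.random.default_rng(seed)
    y = rng.normal(scale=1e-4, size=(n, dim))
    velocity = np.zeros_like(y)
    for it in range(n_iter):
        exaggeration = 4.0 if it < 100 else 1.0
        momentum = 0.5 if it < 100 else 0.8
        num = 1.0 / (1.0 + pairwise_sq_dists(y))  # Student-t (1 dof) kernel
        np.fill_diagonal(num, 0.0)
        q = np.maximum(num / num.sum(), 1e-12)
        pq = (exaggeration * p - q) * num
        # dC/dy_i = 4 * sum_j (p_ij - q_ij) * num_ij * (y_i - y_j)
        grad = 4.0 * (np.diag(pq.sum(axis=1)) - pq) @ y
        velocity = momentum * velocity - learning_rate * grad
        y = y + velocity
    return y

# Synthetic stand-in for word vectors: two well-separated groups in 5-D.
rng = np.random.default_rng(1)
points = np.vstack([rng.normal(0.0, 1.0, size=(10, 5)),
                    rng.normal(10.0, 1.0, size=(10, 5))])
layout = tsne(points)
print(layout.shape)
```

t-SNE preserves local neighbourhoods rather than global distances, so in a plot like Fig. 2 the clustering of words near jia is informative, while the absolute distances between far-apart clusters are not.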
CONCLUSION
A compressed history of Chinese society and the Chinese language
• Jia combines the properties of a physical space and a structured social unit.
• It is less associated with individuated roles such as a wife, and more closely focused on the self, depicting personal memories of leaving and returning home.
• The meaning conflation of home, house, and family can be explored as different components.
• In modern times, aspects of these meanings are encoded in different two-character words.
• In corpus and computational linguistics, changes in word choice and the inclusion of more senses allow a closer look at texts in snapshots of specific time frames, while resonating with studies in other disciplines.
REFERENCES
1. Mallett, S.: Understanding home: A critical review of the literature. Sociol. Rev. 52, 62–89 (2004). https://doi.org/10.1111/j.1467-954x.2004.00442.x.
2. Sixsmith, J.: The meaning of home: An exploratory study of environmental experience. J. Environ. Psychol. 6, 281–298 (1986). https://doi.org/10.1016/S0272-4944(86)80002-0.
3. Home, https://www.oed.com/view/Entry/87869?rskey=OqFwzy&result=1#contentWrapper, (2020).
4. Jia, http://dict.revised.moe.edu.tw/cgi-bin/cbdic/gsweb.cgi?o=dcbdic&searchid=W00000005502, (2015).
5. Samanani, F., Lenhard, J.: House and Home, http://www.anthroencyclopedia.com/entry/house-and-home, (2019).
6. Moore, J.: Placing home in context. J. Environ. Psychol. 20, 207–217 (2000). https://doi.org/10.1006/jevp.2000.0178.
7. Shen, M.-Y., Fu, C.-C.: Transformation of modern residential design in Taiwan: A case study on public housing projects from 1920s to 1960s. J. Des. 20, 43–62 (2015).
8. NMTH: Abode architecture in Taiwan, (2020).
9. Hamilton, W.L., Leskovec, J., Jurafsky, D.: Diachronic word embeddings reveal statistical laws of semantic change. In: 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016 - Long Papers. pp. 1489–1501. Association for Computational Linguistics (ACL) (2016). https://doi.org/10.18653/v1/p16-1141.
10. Kutuzov, A., Øvrelid, L., Szymanski, T., Velldal, E.: Diachronic word embeddings and semantic shifts: a survey. In: Proceedings of COLING 2018 (2018).
11. Hamilton, W.L., Leskovec, J., Jurafsky, D.: Cultural shift or linguistic drift? Comparing two computational measures of semantic change. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing. pp. 2116–2121. NIH Public Access (2016).
12. Tahmasebi, N., Borin, L., Jatowt, A.: Survey of computational approaches to lexical semantic change. Comput. Linguist. 1, (2018).
13. Mikolov, T., Sutskever, I., Chen, K., Corrado, G., Dean, J.: Distributed Representations of Words and Phrases and their Compositionality. In: Advances in Neural Information Processing Systems. pp. 3111–3119 (2013).
14. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. In: Proceedings of the International Conference on Learning Representations (2013).
15. Smilkov, D., Thorat, N., Nicholson, C., Reif, E., Viégas, F.B., Wattenberg, M.: Embedding Projector: Interactive visualization and interpretation of embeddings. In: 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain (2016).
16. Sturgeon, D.: Chinese Text Project: a dynamic digital library of premodern Chinese. Digit. Scholarsh. Humanit. (2018).
17. Huang, C.-R., Chen, K.-J.C., Chang, L.-P.C., Hsu, H.-L.: An introduction to the Academia Sinica Balanced Corpus of Chinese. In: Proceedings of ROCLING. pp. 81–99 (1995).
18. Li, X., Meng, Y., Sun, X., Han, Q., Yuan, A., Li, J.: Is word segmentation necessary for deep learning of Chinese representations? In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019). pp. 3242–3252. Association for Computational Linguistics (ACL) (2019). https://doi.org/10.18653/v1/p19-1314.
19. Smetanin, S.: Google News and Leo Tolstoy: Visualizing Word2Vec word embeddings using t-SNE, https://towardsdatascience.com/google-news-and-leo-tolstoy-visualizing-word2vec-word-embeddings-with-t-sne-11558d8bd4d.
20. Van Der Maaten, L., Hinton, G.: Visualizing Data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).
21. Camacho-Collados, J., Pilehvar, M.T.: From word to sense embeddings: A survey on vector representations of meaning. J. Artif. Intell. Res. 63, 743–788 (2018). https://doi.org/10.1613/jair.1.11259.
22. Pelevina, M., Arefiev, N., Biemann, C., Panchenko, A.: Making Sense of Word Embeddings. In: Proceedings of the 1st Workshop on Representation Learning for NLP. pp. 174–183 (2016). https://doi.org/10.18653/V1/W16-1620.
23. Jatowt, A., Campos, R., Bhowmick, S.S., Tahmasebi, N., Doucet, A.: Every word has its history: Interactive exploration and visualization of word sense evolution. In: Proceedings of International Conference on Information and Knowledge Management. pp. 1899–1902. Association for Computing Machinery (2018). https://doi.org/10.1145/3269206.3269218.
THANKS FOR LISTENING