ACL 2018
Incorporating Latent Meanings of Morphological Compositions to Enhance Word Embeddings
Yang Xu, Jiawei Liu, Wei Yang, and Liusheng Huang
School of Computer Science and Technology, University of Science and Technology of China, Hefei, 230027, China
July 17th, 2018
OUTLINE
01 Introduction
02 Latent Meaning Models
03 Experimental Setup
04 Experimental Results
05 Conclusion
01 Introduction
Word-level Word Embedding
01 Neural Network-Based, e.g., CBOW and Skip-gram (Mikolov et al.)
[Diagram: CBOW and Skip-gram architectures, with INPUT, PROJECTION (SUM), and OUTPUT layers over the context window w(t-2), w(t-1), w(t+1), w(t+2) and the target word w(t)]
02 Matrix Factorization-Based (spectral methods), e.g., GloVe (Pennington et al.), built on a word-word co-occurrence matrix
Morphology-based Word Embedding
01 Training Model: word embeddings are trained jointly with morpheme (prefix, root, suffix) embeddings, e.g., incredible = in + cred + ible
02 Generative Model: word vectors are generated from the morpheme (prefix, root, suffix) embeddings
Our Original Intention
Word-level models: Input: words; Output: word embeddings
Morphology-based models: Input: words + morphemes; Output: word embeddings + morpheme embeddings
Our Latent Meaning Models: Input: words + latent meanings of morphemes; Output: word embeddings (no by-product such as morpheme embeddings)
PURPOSE: not only to encode morphological properties into word embeddings, but also to enhance the semantic similarities among them
Explicit Models & Our Models
Explicit models directly use the morphemes themselves, e.g., sentence i: "it is an incredible thing" uses in + cred + ible; sentence j: "it is unbelievable that ..." uses un + believ + able
Our models instead employ the latent meanings of morphemes, looked up from a table:
Prefix: in -> in, not; un -> not
Root: cred -> believe; believ -> believe
Suffix: ible -> able, capable; able -> able, capable
e.g., incredible -> in, not / believe / able, capable
*Note: The lookup table can be derived from morphological lexicons.
02 Latent Meaning Models
CBOW with Negative Sampling
Input: a sequence of tokens; the context words t_{i-2}, t_{i-1}, t_{i+1}, t_{i+2} are summed in the projection layer to predict the target word t_i.
Objective function:
$L = \frac{1}{n} \sum_{i=1}^{n} \log p(t_i \mid Context(t_i))$
Negative sampling is used to approximate the full softmax.
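For concreteness, below is a minimal numpy sketch of one CBOW training step with negative sampling. The vocabulary size, dimension, sample count, and learning rate are illustrative placeholders, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
V, D = 1000, 50                       # toy vocabulary size and dimension
W_in = rng.normal(0, 0.1, (V, D))     # input (context) embeddings
W_out = np.zeros((V, D))              # output (target) embeddings

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cbow_neg_step(context_ids, target_id, k=20, lr=0.05):
    """One CBOW step: the projection layer combines the context vectors
    (averaged here), then the target is scored against k negative samples."""
    h = W_in[context_ids].mean(axis=0)
    grad_h = np.zeros(D)
    samples = [(target_id, 1.0)] + [(int(s), 0.0) for s in rng.integers(0, V, k)]
    for wid, label in samples:
        g = sigmoid(W_out[wid] @ h) - label   # d(-log-likelihood)/d(score)
        grad_h += g * W_out[wid]
        W_out[wid] -= lr * g * h
    W_in[context_ids] -= lr * grad_h / len(context_ids)

cbow_neg_step(context_ids=[3, 7, 11, 42], target_id=5)  # Context(t_i) -> t_i
```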
Three Specific Models
01 LMM-A (Latent Meaning Model-Average)
02 LMM-S (Latent Meaning Model-Similarity)
03 LMM-M (Latent Meaning Model-Max)
Word Map
Combining morpheme segmentation (incredible = in + cred + ible, unbelievable = un + believ + able) with the lookup table (prefix in -> in, not; un -> not; root cred/believ -> believe; suffix ible/able -> able, capable) yields the Word Map (#rows = |vocabulary|):
Word          Prefix     Root       Suffix
incredible    in, not    believe    able, capable
unbelievable  not        believe    able, capable
*Note: We mainly consider derivational morphemes, not inflectional morphemes.
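A sketch of how such a Word Map could be assembled in Python; the tiny lookup tables below are hypothetical slices standing in for the full lexicon-derived table.

```python
# Hypothetical slices of the morpheme -> latent-meaning lookup table
prefix_table = {"in": ["in", "not"], "un": ["not"]}
root_table   = {"cred": ["believe"], "believ": ["believe"]}
suffix_table = {"ible": ["able", "capable"], "able": ["able", "capable"]}

def word_map_entry(prefix, root, suffix):
    """One row of the Word Map: the latent meanings of a word's morphemes."""
    return {"prefix": prefix_table.get(prefix, []),
            "root":   root_table.get(root, []),
            "suffix": suffix_table.get(suffix, [])}

# Segmentation (e.g., from Morfessor) gives incredible -> in + cred + ible
word_map = {
    "incredible":   word_map_entry("in", "cred", "ible"),
    "unbelievable": word_map_entry("un", "believ", "able"),
}
print(word_map["incredible"])
# {'prefix': ['in', 'not'], 'root': ['believe'], 'suffix': ['able', 'capable']}
```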
Latent Meaning Model-Average (LMM-A)
The latent meanings of t_j's morphemes have equal contributions to Context(t_i).
The modified embedding of t_j:
$\hat{v}_{t_j} = \frac{1}{N_j} \sum_{w \in M_j} v_w$
M_j: the set of latent meanings of t_j's morphemes; N_j: the size of M_j
[Diagram: for t_j = incredible in "it is an incredible thing", the latent meanings in, not, believe, capable, able are summed with weight 1/5 each]
An item of the Word Map is utilized for training:
Word: incredible | Prefix: in, not | Root: believe | Suffix: able, capable
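A minimal sketch of the LMM-A projection, assuming vec holds the current embeddings and the word map lists M_j (toy random vectors here).

```python
import numpy as np

rng = np.random.default_rng(0)
vec = {w: rng.normal(size=50) for w in
       ["in", "not", "believe", "able", "capable", "incredible"]}
word_map = {"incredible": ["in", "not", "believe", "capable", "able"]}  # M_j

def lmm_a_embedding(word):
    """LMM-A: v_hat = (1/N_j) * sum of the latent meanings' vectors."""
    M_j = word_map[word]
    return np.mean([vec[m] for m in M_j], axis=0)

v_hat = lmm_a_embedding("incredible")   # used in place of v_incredible
```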
Latent Meaning Model-Similarity (LMM-S)
The latent meanings of t_j's morphemes are assigned different weights:
$\omega_{<t_j, w>} = \frac{\cos(v_{t_j}, v_w)}{\sum_{x \in M_j} \cos(v_{t_j}, v_x)}, \quad w \in M_j$
The modified embedding of t_j:
$\hat{v}_{t_j} = \sum_{w \in M_j} \omega_{<t_j, w>} \, v_w$
M_j: the set of latent meanings of t_j's morphemes
[Diagram: for t_j = incredible, the latent meanings in, not, believe, capable, able are summed with their similarity-based weights]
An item of the Word Map is utilized for training:
Word: incredible | Prefix: in, not | Root: believe | Suffix: able, capable
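A sketch of the LMM-S weighting under the same toy setup; the weights are the normalized cosine similarities from the formula above.

```python
import numpy as np

rng = np.random.default_rng(0)
vec = {w: rng.normal(size=50) for w in
       ["in", "not", "believe", "able", "capable", "incredible"]}
word_map = {"incredible": ["in", "not", "believe", "capable", "able"]}  # M_j

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def lmm_s_embedding(word):
    """LMM-S: weight each latent meaning by its normalized cosine
    similarity to the word's own embedding (omega_<t_j, w>)."""
    M_j = word_map[word]
    sims = np.array([cos(vec[word], vec[m]) for m in M_j])
    weights = sims / sims.sum()
    return np.sum([w * vec[m] for w, m in zip(weights, M_j)], axis=0)

v_hat = lmm_s_embedding("incredible")
```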
Latent Meaning Model-Max (LMM-M)
Keep only the latent meanings that have maximum similarity to t_j:
$P_j^{max} = \arg\max_w \cos(v_{t_j}, v_w), \; w \in P_j$
$R_j^{max} = \arg\max_w \cos(v_{t_j}, v_w), \; w \in R_j$
$S_j^{max} = \arg\max_w \cos(v_{t_j}, v_w), \; w \in S_j$
$M_j^{max} = \{P_j^{max}, R_j^{max}, S_j^{max}\}$
P_j, R_j, S_j: the latent meanings of t_j's prefix, root, and suffix; the modified embedding of t_j is then computed over M_j^{max}
[Diagram: for t_j = incredible, only not, believe, and able are kept and summed]
An item of the Word Map is utilized for training:
Word: incredible | Prefix: in, not | Root: believe | Suffix: able, capable
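A sketch of the LMM-M selection; the slide leaves the final combination blank, so averaging the three selected meanings below is an assumption, not the paper's stated rule.

```python
import numpy as np

rng = np.random.default_rng(0)
vec = {w: rng.normal(size=50) for w in
       ["in", "not", "believe", "able", "capable", "incredible"]}
# P_j, R_j, S_j: latent meanings of the prefix, root, and suffix
word_map = {"incredible": {"prefix": ["in", "not"],
                           "root":   ["believe"],
                           "suffix": ["able", "capable"]}}

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def lmm_m_embedding(word):
    """LMM-M: per morpheme slot, keep only the latent meaning with maximum
    cosine similarity to the word; combining by average is an assumption."""
    v_w = vec[word]
    M_max = [max(ms, key=lambda m: cos(v_w, vec[m]))   # arg max over the slot
             for ms in word_map[word].values() if ms]
    return np.mean([vec[m] for m in M_max], axis=0)

v_hat = lmm_m_embedding("incredible")
```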
Update Rules for LMMs
New objective function (after modifying the input layer of CBOW):
$L = \frac{1}{n} \sum_{i=1}^{n} \sum_{t_j \in Context(t_i)} \log p(v_{t_i} \mid \hat{v}_{t_j})$
All parameters introduced by our models can be derived directly from the Word Map and the word embeddings.
In back-propagation, we update not just v_{t_j} but also the embeddings of the latent meanings, with the same weights as they were assigned in the forward propagation.
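A sketch of that backward rule: since v_hat is a weighted sum of latent-meaning vectors, the gradient reaching v_hat passes to each latent meaning scaled by its forward-pass weight. Names and example weights are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
vec = {w: rng.normal(size=50) for w in ["in", "not", "believe"]}

def update_latent_meanings(grad_v_hat, meanings, weights, lr=0.05):
    """v_hat = sum_w weight_w * v_w, so d(loss)/d(v_w) = weight_w * grad_v_hat:
    each latent meaning is updated with the same weight it had in the
    forward propagation."""
    for m, w in zip(meanings, weights):
        vec[m] -= lr * w * grad_v_hat

# e.g., after computing grad_v_hat in the CBOW step for some t_j:
update_latent_meanings(np.ones(50), ["in", "not", "believe"], [0.2, 0.2, 0.6])
```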
03 Experimental Setup
Corpus & Word Map
Corpus:
• News corpus of 2009 (ACL 2013 Eighth Workshop)
• Size: 1.7 GB; ~500 million tokens; ~600,000 words
• Digits & punctuation marks are filtered out
Word Map:
• Morpheme segmentation using Morfessor (Creutz & Lagus, 2007)
• Latent meanings assigned via a lookup table derived from the resources provided by Michigan State University*
• 90 prefixes, 382 roots, 67 suffixes
*Resources web link: https://msu.edu/~defores1/gre/roots/gre_rts_afx1.htm
Baselines & Parameter Settings
Baselines:
• Word-level models: CBOW, Skip-gram, GloVe
• Explicitly Morpheme-related Model (EMM)
[Diagram: EMM sums the embeddings of the morphemes themselves (in, cred, ible) with the context words]
Hyper-parameter settings (equal for all models):
• Vector dimension: 200
• Context window size: 5
• #Negative samples: 20
Evaluation Benchmarks (1/2)
Word Similarity (gold-standard, widely-used datasets):
Dataset        #Pairs    Dataset     #Pairs    Dataset          #Pairs
RG-65          65        Rare-Word   2034      Men-3k           3000
Wordsim-353    353       SCWS        2003      WS-353-Related   252
Syntactic Analogy:
• "a is to b as c is to ? (d)", e.g., queen is to king as woman is to ? (man)
• Microsoft Research Syntactic Analogies dataset (8000 items)
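A sketch of both evaluation protocols on toy vectors: Spearman's rank correlation against human ratings for word similarity, and the standard vector-offset method for analogy questions.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
vec = {w: rng.normal(size=50) for w in ["king", "queen", "man", "woman", "car"]}

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def word_similarity_eval(triples):
    """Spearman correlation between model cosine scores and human ratings.
    triples: (word1, word2, gold_rating)."""
    model = [cos(vec[a], vec[b]) for a, b, _ in triples]
    gold = [g for _, _, g in triples]
    return spearmanr(model, gold).correlation

def analogy(a, b, c):
    """'a is to b as c is to ?': nearest word to v_b - v_a + v_c."""
    q = vec[b] - vec[a] + vec[c]
    return max((w for w in vec if w not in (a, b, c)),
               key=lambda w: cos(q, vec[w]))

print(analogy("queen", "king", "woman"))   # ideally 'man'
print(word_similarity_eval([("king", "queen", 8.5), ("man", "car", 1.0),
                            ("man", "woman", 8.3)]))
```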
Evaluation Benchmarks (2/2)
Text Classification:
• 20 Newsgroups dataset (19,000 documents across 20 topics)
• 4 text classification tasks, each involving 10 topics
• Training/Validation/Test split (6:2:2)
• Feature vector: average word embedding of the words in each document
• L2-regularized logistic regression classifier
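A sketch of this classification pipeline on synthetic data: averaged word embeddings as document features, fed to scikit-learn's L2-regularized logistic regression. The documents and labels are random, purely to exercise the code path.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
V, D, n_docs = 5000, 200, 400
emb = rng.normal(size=(V, D))          # stand-in for trained word embeddings

def doc_feature(token_ids):
    """Feature vector = average embedding of the words in the document."""
    return emb[token_ids].mean(axis=0)

# Synthetic documents and labels (10 topics per task)
docs = [rng.integers(0, V, size=50) for _ in range(n_docs)]
labels = rng.integers(0, 10, size=n_docs)

X = np.stack([doc_feature(d) for d in docs])
clf = LogisticRegression(penalty="l2", max_iter=1000).fit(X, labels)
print("accuracy on the (synthetic) training set:", clf.score(X, labels))
```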
04 Experimental Results
The Results on Word Similarity
Spearman's rank correlation (%) on different datasets:
Dataset          CBOW    Skip-gram   GloVe   EMM     LMM-A   LMM-S   LMM-M
Wordsim-353      58.77   61.94       49.40   60.01   62.05   63.13   61.54
Rare-Word        40.58   36.42       33.40   40.83   43.12   42.14   40.51
RG-65            56.50   62.81       59.92   60.85   62.51   62.49   63.07
SCWS             63.13   60.20       47.98   60.28   61.86   61.71   63.02
Men-3k           68.07   66.30       60.56   66.76   66.26   68.36   64.65
WS-353-Related   49.72   57.05       47.46   54.48   56.14   58.47   55.19
The Results on Syntactic Analogy
Question: "a is to b as c is to ? (d)"; the answer is the word whose vector is closest to v_b - v_a + v_c.
Syntactic analogy performance (%):
                    CBOW    Skip-gram   GloVe   EMM     LMM-A   LMM-S   LMM-M
Syntactic Analogy   13.46   13.14       13.94   17.34   20.38   17.59   18.30
The Results on Text Classification
Average text classification accuracy across the 4 tasks (%):
                      CBOW    Skip-gram   GloVe   EMM     LMM-A   LMM-S   LMM-M
Text Classification   78.26   79.40       77.01   80.00   80.67   80.59   81.28
The Impact of Corpus Size
[Figure: Results on the Wordsim-353 task with different corpus sizes]
The Impact of Context Window Size
[Figure: Results on the Wordsim-353 task with different context window sizes]
Word Embedding Visualization
[Figure: PCA-based visualization of word embeddings; ☒ marks the latent meanings of morphemes]
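A sketch of the PCA projection behind such a plot, using scikit-learn and matplotlib on toy vectors standing in for the trained embeddings.

```python
import numpy as np
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
words = ["incredible", "unbelievable", "believe", "able", "capable", "not"]
vecs = rng.normal(size=(len(words), 200))   # stand-in for trained embeddings

xy = PCA(n_components=2).fit_transform(vecs)  # project to 2-D for plotting
plt.scatter(xy[:, 0], xy[:, 1])
for (x, y), w in zip(xy, words):
    plt.annotate(w, (x, y))
plt.title("PCA visualization of word embeddings")
plt.show()
```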
05 Conclusions
Conclusions
• We employ the latent meanings of morphemes, rather than the morphological compositions themselves, to train word embeddings
• By modifying the input layer and update rules of CBOW, we propose three latent meaning models: LMM-A, LMM-S, and LMM-M
• Incorporating latent meanings of morphemes enhances the overall quality of word embeddings
• In the future, we intend to evaluate our models on morpheme-rich languages such as Russian and German
Thank you! Questions?