Labeling Text in Several Languages with Multilingual Hierarchical Attention Networks
Nikolaos Pappas, Andrei Popescu-Belis
Idiap Research Institute
June 9, 2017, SwissText, Winterthur
Topic Recognition
Spam Filtering — Mailbox Optimization — Customer Support
Question Answering
Reading/Navigation Assistant — Interactive Search
Question: Which of Gaudí's creations is his masterpiece?
Answer: Sagrada Família
Machine Translation
Document Translation — Dialogue Translation
Fundamental Function: Representing Word Sequences
• Goal: learn representations (distributed vectors) of word sequences that effectively encode the meaning and knowledge needed to perform:
  ✓ Topic Recognition
  • Question Answering
  • Machine Translation
  • Summarization
  • …
Can we benefit from multiple languages?
Dealing with Multiple Languages: Monolingually
• Solution? Separate models per language
  • language-dependent learning
  • linear growth of the parameters
  • lack of cross-language knowledge transfer
  • hierarchical modeling at the document level
[Figure: per-language models f: X → Y mapping documents X = {x_i | i = 1…n} to labels Y = {y_i | i = 1…n} (Kim, 2014; Tang et al., 2015; Lin et al., 2015; Yang et al., 2016)]
Dealing with Multiple Languages: Multilingually
• Solution? Single model with an aligned input space
  • language-independent learning
  • constant number of parameters
  • common label sets across languages
  • modeling at the word level
(Klementiev et al., 2012; Hermann and Blunsom, 2014; Gouws et al., 2015; Ammar et al., 2016)
Dealing with Multiple Languages: Our Contribution
• Solution: Single model trained over arbitrary label sets with an aligned input space
  • language-independent learning
  • sub-linear growth of parameters
  • arbitrary label sets across languages
  • hierarchical modeling at the document level
[Figure: Model_1, Model_2, …, Model_M sharing components across languages]
Background: Hierarchical Attention Networks (HANs) (Yang et al., 2016)
• Input: sequence of word vectors
• Output: document vector u
• Hierarchical structure
  - word-level and sentence-level abstraction layers
  - encoders (H_w, H_s)
  - attention mechanisms (α_w, α_s)
  - classification layer (W_c) + cross-entropy loss
  - Words: h_it = H_w(x_it), α_it ∝ exp(v_w⊤ tanh(W_w h_it)), s_i = Σ_t α_it h_it
  - Sentences: h_i = H_s(s_i), α_i ∝ exp(v_s⊤ tanh(W_s h_i)), u = Σ_i α_i h_i
  - Document: p = softmax(W_c u)
• Training: SGD with ADAM
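To make the hierarchy concrete, here is a minimal PyTorch sketch of a HAN in the spirit of Yang et al. (2016). All names, dimensions, and the bidirectional-GRU encoder choice are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class Attention(nn.Module):
    """Soft attention over a sequence of hidden states (one level of the hierarchy)."""
    def __init__(self, hidden_dim, att_dim):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, att_dim)        # projection into attention space
        self.context = nn.Linear(att_dim, 1, bias=False)  # learned context vector

    def forward(self, h):                                 # h: (batch, seq, hidden_dim)
        scores = self.context(torch.tanh(self.proj(h)))   # (batch, seq, 1)
        alpha = torch.softmax(scores, dim=1)              # attention weights over the sequence
        return (alpha * h).sum(dim=1)                     # weighted sum: (batch, hidden_dim)

class HAN(nn.Module):
    """Word-level encoder + attention -> sentence vectors; sentence-level
    encoder + attention -> document vector u; linear layer -> label scores."""
    def __init__(self, emb_dim, hid, att_dim, n_labels):
        super().__init__()
        self.word_enc = nn.GRU(emb_dim, hid, batch_first=True, bidirectional=True)  # H_w
        self.word_att = Attention(2 * hid, att_dim)                                 # word attention
        self.sent_enc = nn.GRU(2 * hid, hid, batch_first=True, bidirectional=True)  # H_s
        self.sent_att = Attention(2 * hid, att_dim)                                 # sentence attention
        self.classify = nn.Linear(2 * hid, n_labels)                                # W_c

    def forward(self, docs):               # docs: (batch, n_sents, n_words, emb_dim)
        b, s, w, e = docs.shape
        h_words, _ = self.word_enc(docs.reshape(b * s, w, e))
        sents = self.word_att(h_words).reshape(b, s, -1)  # one vector per sentence
        h_sents, _ = self.sent_enc(sents)
        u = self.sent_att(h_sents)                        # document vector u
        return self.classify(u)                           # logits; cross-entropy applied on top
```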
MHANs: Multilingual Hierarchical Attention Networks
Multilingual Attention Networks: Computational Cost
• Fewer parameters are needed when components are shared across languages:
  θ_enc  = { H_w, W_w^(l), H_s, W_s^(l), W_c^(l) }  (shared encoders)
  θ_att  = { H_w^(l), W_w, H_s^(l), W_s, W_c^(l) }  (shared attention)
  θ_both = { H_w, W_w, H_s, W_s, W_c^(l) }  (encoders and attention shared)
  θ_mono = { H_w^(l), W_w^(l), H_s^(l), W_s^(l), W_c^(l) }  (separate models)
• Since each shared matrix is stored once instead of once per language:
  |θ_both| ≤ |θ_enc| ≤ |θ_mono| and |θ_both| ≤ |θ_att| ≤ |θ_mono|
• Example with shared attention mechanisms (sketch below)
Naive DL multilingual adaptation fails!
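A minimal sketch of how the four sharing configurations can be wired up, assuming PyTorch and collapsing the two-level hierarchy to a single encoder/attention pair per language for brevity; `build_mhan` and its interface are my own illustration, not the released code. Because a shared module's weights are registered once, `model.parameters()` counts them a single time, which is exactly where the inequalities above come from.

```python
import torch.nn as nn

def make_encoder(emb=40, hid=100):     # stand-in for an encoder H (one level shown)
    return nn.GRU(emb, hid, batch_first=True)

def make_attention(hid=100, att=100):  # stand-in for an attention block W
    return nn.Sequential(nn.Linear(hid, att), nn.Tanh(), nn.Linear(att, 1, bias=False))

def build_mhan(langs, n_labels, share="att"):
    """share in {"enc", "att", "both", "none"}: which blocks all languages reuse."""
    shared_enc = make_encoder() if share in ("enc", "both") else None
    shared_att = make_attention() if share in ("att", "both") else None
    model = nn.ModuleDict()
    for l in langs:
        model[l] = nn.ModuleDict({
            "enc": shared_enc if shared_enc is not None else make_encoder(),
            "att": shared_att if shared_att is not None else make_attention(),
            "clf": nn.Linear(100, n_labels[l]),  # W_c^(l) is always per-language
        })
    return model

# Parameters of shared modules are deduplicated by PyTorch, so the totals verify
# |theta_both| <= |theta_enc| <= |theta_mono| and |theta_both| <= |theta_att| <= |theta_mono|:
for mode in ("both", "enc", "att", "none"):
    m = build_mhan(["en", "de"], {"en": 10, "de": 10}, share=mode)
    print(mode, sum(p.numel() for p in m.parameters()))
```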
Multilingual Attention Networks: Training Strategy
• Objective: minimize the sum of the per-language cross-entropy errors
• Issue: naive consecutive training (all of one language, then the next) biases the model toward the most recently seen language
• Instead, sample document-label pairs for each language in a cyclic fashion: (L_1, …, L_M)^(1) → … → (L_1, …, L_M)^(M)
• Optimizer: SGD with ADAM (same as before); see the sketch below
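A sketch of the cyclic strategy under assumed interfaces (one callable model and one DataLoader per language); the multi-label loss choice and the rule for ending an epoch are my assumptions, not necessarily the paper's:

```python
import itertools
import torch

def train_cyclic(models, loaders, epochs=10, lr=1e-3):
    """models: dict lang -> callable returning label logits (shared + private parts).
    loaders: dict lang -> DataLoader of (document, label-vector) batches."""
    params = {p for m in models.values() for p in m.parameters()}  # set dedups shared weights
    optimizer = torch.optim.Adam(list(params), lr=lr)              # "SGD with ADAM"
    loss_fn = torch.nn.BCEWithLogitsLoss()                         # multi-label cross-entropy
    for _ in range(epochs):
        iters = {l: iter(dl) for l, dl in loaders.items()}
        for lang in itertools.cycle(loaders):                      # L1, ..., LM, L1, ...
            try:
                x, y = next(iters[lang])                 # one batch per language in turn
            except StopIteration:
                break          # end the epoch once a language is exhausted (assumption)
            optimizer.zero_grad()
            loss = loss_fn(models[lang](x), y.float())
            loss.backward()    # gradients reach both shared and private parameters
            optimizer.step()
```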
Dataset: Deutsche Welle Corpus (600k documents, 8 languages), tagged by journalists
Full-resource Scenario: Bilingual Training
Setup: input 40-d; encoders: Dense, 100-d; attention: Dense, 100-d; activation: ReLU
• Multilingual models consistently outperform monolingual ones
• Sharing attention is the best configuration (on average)
• Traditional (bag-of-words) vs neural (en+ar, biGRU encoders):
  en: 75.8% vs 77.8% — ar: 81.8% vs 84.0%
Low-resource Scenario: Bilingual Training
[Figure: improvement of bilingual over monolingual models vs training percentage (0.5%, 5%, 50%): the improvement is high at 0.5% and low at 50%, i.e. the scarcer the training data, the larger the gain]
Qualitative Analysis: English - German
• The cumulative true-positive difference (multilingual vs monolingual) increases over the entire spectrum of labels sorted by frequency
• German labels with the largest gains: russland (21), berlin (19), irak (14), wahlen (13), nato (13)
• English labels with the largest gains: germany (259), german (97), soccer (73), football (47), merkel (25)
Qualitative Analysis: Interpretable Output
Conclusion and Perspectives
• New multilingual models that learn shared document structures for text classification
  • benefit full-resource and low-resource languages
  • achieve better accuracy with fewer parameters
  • are capable of cross-language transfer
• Future work
  • remove the constraint of closed label sets
  • incorporate label information
  • apply to other NLU tasks
Thank you!
User group meeting: July 3, 2017, Caversham, UK
Demos, technical talks, posters & discussions. Contact us if interested!
References
• Nikolaos Pappas and Andrei Popescu-Belis. 2017. Multilingual hierarchical attention networks for text classification. (submitted)
• Waleed Ammar, George Mulcaire, Yulia Tsvetkov, Guillaume Lample, Chris Dyer, and Noah A. Smith. 2016. Massively multilingual word embeddings. CoRR abs/1602.01925.
• Stephan Gouws, Yoshua Bengio, and Gregory S. Corrado. 2015. BilBOWA: Fast bilingual distributed representations without word alignments. In 32nd International Conference on Machine Learning.
• Karl Moritz Hermann and Phil Blunsom. 2014. Multilingual models for compositional distributed semantics. In 52nd Annual Meeting of the Association for Computational Linguistics.
• Alexandre Klementiev, Ivan Titov, and Binod Bhattarai. 2012. Inducing crosslingual distributed representations of words. In International Conference on Computational Linguistics.
• Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, and Eduard Hovy. 2016. Hierarchical attention networks for document classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
• Duyu Tang, Bing Qin, and Ting Liu. 2015. Document modeling with gated recurrent neural network for sentiment classification. In Conference on Empirical Methods in Natural Language Processing.
• Rui Lin, Shujie Liu, Muyun Yang, Mu Li, Ming Zhou, and Sheng Li. 2015. Hierarchical recurrent neural network for document modeling. In Conference on Empirical Methods in Natural Language Processing.
• Yoon Kim. 2014. Convolutional neural networks for sentence classification. In Conference on Empirical Methods in Natural Language Processing.