Meta-Learning for Low-Resource NMT
Introduction ● Historically, machine translation was statistical ● Neural Machine Translation has recently outperformed statistical systems ● However, statistical models still outperform NMT on low-resource language pairs
NMT: Previous Work ● Monolingual corpora ● Single task ● Mixed datasets ● Direct transfer learning
Meta-Learning in NMT ● Idea: improve on direct transfer learning through better fine-tuning
MAML for NMT ● 17 high-resource languages (meta-train on these, e.g. Spanish→English): Spanish, Italian, French, Portuguese, Danish, Greek, Polish, ... ● 4 low-resource languages (meta-test on these, e.g. Turkish→English): Turkish, Finnish, Romanian, Latvian ● Note: they simulate low-resource settings by sub-sampling
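The next two slides give the update rules; as a toy sketch of the loop they plug into, here is first-order MAML on simulated tasks (a linear model stands in for the NMT model; every name and constant below is an assumption for illustration, not the paper's setup):

import numpy as np

rng = np.random.default_rng(0)

def make_task():
    """Simulate one 'language pair': a random linear mapping with a small, sub-sampled dataset."""
    w_true = rng.normal(size=5)
    x = rng.normal(size=(32, 5))
    y = x @ w_true
    return x[:16], y[:16], x[16:], y[16:]   # support (inner update) / query (meta update) split

def grad(theta, x, y):
    """Gradient of mean squared error for the linear stand-in model."""
    return 2 * x.T @ (x @ theta - y) / len(x)

theta = np.zeros(5)          # meta-learned initialization
alpha, beta = 0.05, 0.01     # learner and meta learning rates (assumed values)

for step in range(1000):
    x_s, y_s, x_q, y_q = make_task()                 # sample a simulated low-resource task
    theta_k = theta - alpha * grad(theta, x_s, y_s)  # inner "gradient update"
    # First-order approximation: treat the query gradient at theta_k as the meta-gradient.
    theta = theta - beta * grad(theta_k, x_q, y_q)   # outer "meta-gradient update"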
MAML updates ● Gradient update (inner loop) ● Meta-gradient update (outer loop) ● First-order approximate meta-gradient update
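The equations themselves did not survive extraction; as a sketch, the standard MAML updates (with $\alpha$ and $\beta$ as the learner and meta learning rates, $D_{T_k}$ and $D'_{T_k}$ as the support and query sets of task $T_k$, and $\mathcal{L}$ the translation loss) are:

Gradient update: $\theta'_k = \theta - \alpha \nabla_\theta \mathcal{L}^{D_{T_k}}(\theta)$

Meta-gradient update: $\theta \leftarrow \theta - \beta \nabla_\theta \sum_k \mathcal{L}^{D'_{T_k}}(\theta'_k)$

First-order approximation: $\theta \leftarrow \theta - \beta \sum_k \nabla_{\theta'_k} \mathcal{L}^{D'_{T_k}}(\theta'_k)$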
Issue: meta-train and meta-test input spaces should match! ● Meta-train (Spanish→English): "En un lugar de la mancha, de cuyo nombre no puedo..." → "In some place in the Mancha, whose name..."; uses Spanish word embeddings (e.g. the embedding for "nombre") ● Meta-test (Turkish→English): "Benim adım kırmızı..." → "My name is Red..."; uses Turkish word embeddings (e.g. the embedding for "adım"), trained independently of the Spanish ones
Universal Lexical Representation ● Word embeddings trained independently on monolingual corpora: Spanish, English, French, and Turkish word embeddings ● Alongside these, the ULR defines universal embedding keys and universal embedding values
Universal Lexical Representation ● Key idea: we represent "nombre" as a linear combination of tokens in the ULR! ● The Spanish word embedding for "nombre" is passed through the universal transformation matrix and matched against the (transposed) universal embedding keys; the resulting weights are the weights of the linear combination over the universal embedding values
Universal Lexical Representation ● The Turkish word embedding for "adım" goes through the same pipeline (universal transformation matrix, universal embedding keys, linear combination of universal embedding values) ● Result: same embedding space as Spanish!
Training ● The universal transformation matrix and the universal embedding values are trainable ● The pre-trained language-specific word embeddings (e.g. the Spanish embedding for "nombre") and the universal embedding keys are kept fixed
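As a rough sketch of the linear-combination idea on these slides (a minimal NumPy version; the names, shapes, softmax temperature, and the trainable/fixed split noted in the comments are assumptions, not the authors' exact formulation):

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def ulr_embed(word_vec, A, U_key, U_val, tau=0.05):
    """Represent a language-specific word vector as a mixture of universal tokens.

    word_vec : (d_q,)     pre-trained monolingual embedding, e.g. Spanish "nombre" (assumed fixed)
    A        : (d_q, d_k) per-language transformation matrix (assumed trainable)
    U_key    : (M, d_k)   universal embedding keys (assumed fixed)
    U_val    : (M, d_v)   universal embedding values (assumed trainable)
    tau      : softmax temperature (assumed)
    """
    query = word_vec @ A            # project the word into the shared key space
    scores = U_key @ query / tau    # match the query against every universal key
    weights = softmax(scores)       # mixture weights over universal tokens
    return weights @ U_val          # linear combination of universal embedding values

Because the Spanish "nombre" and the Turkish "adım" are both expressed as mixtures over the same universal embedding values, they land in one shared input space, which is what the meta-learner needs.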
Experiments
Experiments ● Comment: best to leave the decoder be! Why?
Comment: Gap narrows as more training examples are included
Critique: they don't evaluate on any real low-resource languages! ● Critique: we don't know how many training examples there are per task; it's k-shot, but what is k?