Exploiting Cross-Sentence Context for Neural Machine Translation
Longyue Wang♥, Zhaopeng Tu♠, Andy Way♥, Qun Liu♥
♥ ADAPT Centre, Dublin City University   ♠ Tencent AI Lab
Motivation
• The majority of NMT models are sentence-level.
[Figure: attention-based encoder-decoder translating a single source sentence 这是一个生态网络。 ("This is an ecological network."), with attention weights over the source words, context vectors c_t, and decoder states s_0 … s_T.]
Motivation
• The continuous vector representation of a symbol encodes multiple dimensions of similarity.

Word      | Axis | Nearest neighbours
notebook  | 1    | diary, notebooks, sketchbook, jottings
notebook  | 2    | palmtop, notebooks, ipaq, laptop
power     | 1    | powers, authority, powerbase, sovereignity
power     | 2    | powers, electrohydraulic, microwatts, hydel

(Choi et al., 2016)
Motivation
• The continuous vector representation of a symbol encodes multiple dimensions of similarity.
• Consistency is another critical issue in document-level translation.

Past:    那么 在 这个 问题 上 , 伊朗 的 …  →  well, on this issue, iran has a relatively …
Past:    在 任内 解决 伊朗 核 问题 , 不管是 用 和平 …  →  to resolve the iranian nuclear issue in his term, …
Current: 那 刚刚 提到 这个 … 谈判 的 问题 。  →  that just mentioned the issue of the talks …
Motivation • The cross-sentence context has proven helpful for the aforementioned two problems in multiple sequential tasks (Sordoni et al., 2015; Vinyals and Le, 2015; Serban et al., 2016). • However, it has received relatively little attention from the NMT research community.
Data and Settings
• Chinese-English translation task
• Training data: 1M sentence pairs from LDC corpora that contain document information
• Tuning: NIST MT05; Test: NIST MT06 and MT08
• Models are built on top of Nematus (https://github.com/EdinburghNLP/nematus)
• Vocabulary size: 35K for both languages
• Word embedding size: 600; hidden size: 1000
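For quick reference, the setting above collected as a plain Python dictionary (a sketch; the key names are descriptive only and are not actual Nematus options):

```python
# Hyperparameters from the slide, gathered for reference.
# Key names are illustrative, not Nematus command-line options.
CONFIG = {
    "task": "Chinese-English",
    "train_pairs": 1_000_000,              # LDC corpora with document information
    "dev_set": "NIST MT05",
    "test_sets": ["NIST MT06", "NIST MT08"],
    "vocab_size": 35_000,                   # both languages
    "embedding_dim": 600,
    "hidden_dim": 1000,
}
```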
Approach
• Use a hierarchical RNN to summarize the previous M source sentences.
[Figure: a word-level RNN encodes each of the previous M source sentences into a vector; a sentence-level RNN runs over these sentence vectors to produce the cross-sentence context.]
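A minimal PyTorch sketch of this hierarchical summary (illustrative, not the actual Nematus/Theano implementation): a word-level GRU encodes each of the previous M sentences, a sentence-level GRU runs over the resulting sentence vectors, and its last hidden state serves as the cross-sentence context D.

```python
import torch
import torch.nn as nn

class HierarchicalContextEncoder(nn.Module):
    """Summarize the previous M source sentences into one context vector D."""
    def __init__(self, vocab_size, emb_size=600, hidden_size=1000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_size)
        self.word_rnn = nn.GRU(emb_size, hidden_size, batch_first=True)
        self.sent_rnn = nn.GRU(hidden_size, hidden_size, batch_first=True)

    def forward(self, history):
        # history: list of M LongTensors, each of shape (1, sentence_length)
        sent_vecs = []
        for sent in history:
            _, h_n = self.word_rnn(self.embed(sent))  # h_n: (1, 1, hidden)
            sent_vecs.append(h_n[-1])                 # sentence summary: (1, hidden)
        sents = torch.stack(sent_vecs, dim=1)         # (1, M, hidden)
        _, d = self.sent_rnn(sents)                   # d: (1, 1, hidden)
        return d[-1]                                  # cross-sentence context D: (1, hidden)
```

The single vector D returned here is what the two integration strategies on the following slides consume.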
Approach
• Strategy 1: Initialization — Encoder
[Figure: the cross-sentence context vector initializes the encoder of the current source sentence 这是一个生态网络。]
Approach
• Strategy 1: Initialization — Decoder
[Figure: the cross-sentence context vector initializes the decoder state s_0 (states s_0 … s_T).]
Approach
• Strategy 1: Initialization — Both
[Figure: the cross-sentence context vector initializes both the encoder and the decoder.]
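One plausible way to write out the three initialization variants (a sketch, assuming the baseline follows the usual Nematus-style initialization from the mean source annotation h̄; the exact parameterization in the paper may differ):

```latex
% D = cross-sentence context from the hierarchical RNN
\begin{aligned}
\text{(Init\_Enc)}  \quad & \overrightarrow{h}_0 = \tanh(W_e D + b_e) \\
\text{(Init\_Dec)}  \quad & s_0 = \tanh(W_s \bar{h} + W_d D + b_s) \\
\text{(Init\_Both)} \quad & \text{apply both of the above}
\end{aligned}
```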
Results
• Impact of components (BLEU)
[Bar chart: Baseline 30.57; +Init_Enc 31.55; +Init_Dec 31.9; +Init_Both 32.0]
Approach
• Strategy 2: Auxiliary Context
[Figure: at each decoding step, the decoder combines the intra-sentence context c_t (attention over the current source sentence 这是一个生态网络。) with the cross-sentence context as an auxiliary input.]
Approach
• Strategy 2: Auxiliary Context
[Figure: (a) standard decoder; (b) decoder with auxiliary context, where the cross-sentence context D is fed into each decoder state; (c) decoder with gating auxiliary context, where a gate z_t controls how much of D flows into each decoder state.]
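The three decoder variants in the figure, written out as update equations; the gate parameterization shown is one reasonable instantiation and may not match the paper's exact form:

```latex
\begin{aligned}
\text{(a) standard:}          \quad & s_t = f(s_{t-1},\, y_{t-1},\, c_t) \\
\text{(b) auxiliary context:} \quad & s_t = f(s_{t-1},\, y_{t-1},\, c_t,\, D) \\
\text{(c) gating aux. context:} \quad
  & z_t = \sigma\!\left(U_z s_{t-1} + W_z e(y_{t-1}) + C_z c_t\right) \\
  & s_t = f(s_{t-1},\, y_{t-1},\, c_t,\, z_t \odot D)
\end{aligned}
```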
Results
• Impact of components (BLEU)
[Bar chart: Baseline 30.57; +Aux. Ctx. 31.3; +Gating Aux. Ctx. 32.24]
Approach
• Initialization + Gating Auxiliary Context
[Figure: the cross-sentence context both initializes the encoder/decoder and is fed, through a gate, into each decoding step alongside the intra-sentence context c_t.]
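A compact PyTorch sketch of the combined strategy (again illustrative rather than the released implementation): the decoder state is initialized from D, and at every step a gated copy of D enters the GRU cell alongside the intra-sentence context c_t.

```python
import torch
import torch.nn as nn

class GatedContextDecoderCell(nn.Module):
    """One decoder step using both initialization and gated auxiliary context."""
    def __init__(self, emb_size=600, hidden_size=1000, ctx_size=1000, cross_size=1000):
        super().__init__()
        self.init_proj = nn.Linear(cross_size, hidden_size)                   # s_0 from D
        self.gate = nn.Linear(hidden_size + emb_size + ctx_size, cross_size)  # z_t
        self.cell = nn.GRUCell(emb_size + ctx_size + cross_size, hidden_size)

    def init_state(self, D):
        # Initialization strategy: start decoding from the cross-sentence context.
        return torch.tanh(self.init_proj(D))

    def forward(self, y_prev_emb, s_prev, c_t, D):
        # Gate decides how much cross-sentence context to use at this step.
        z_t = torch.sigmoid(self.gate(torch.cat([s_prev, y_prev_emb, c_t], dim=-1)))
        # Gated auxiliary context enters the recurrent update together with c_t.
        s_t = self.cell(torch.cat([y_prev_emb, c_t, z_t * D], dim=-1), s_prev)
        return s_t
```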
Results
• Impact of components (BLEU)
[Bar chart: Baseline 30.57; +Init_Both 32.00; +Gating Aux. Ctx. 32.24; +Both 32.67]
Analysis
• Translation error statistics

Errors | Ambiguity | Inconsistency | All
Total  | 38        | 32            | 70
Fixed  | 29        | 24            | 53
New    | 7         | 8             | 15
Analysis
• Case study

Hist.:  [Chinese context sentence; garbled in extraction]
Input:  [Chinese input sentence; garbled in extraction]
Ref.:   Can it inhibit and deter corrupt officials?
NMT:    Can we contain and deter the enemy?
Ours:   Can it contain and deter the corrupt officials?
Summary
• We propose to use a hierarchical RNN to summarize previous source sentences, providing cross-sentence context for NMT.
• Limitations:
  • Computationally expensive
  • Only exploits source-side sentences, to avoid error propagation from target-side translations
  • The context is encoded into a single fixed-length vector, which is not flexible
Publicly Available
• The source code is publicly available at https://github.com/tuzhaopeng/LC-NMT
• The trained models and translation results will be released
References
1. Heeyoul Choi, Kyunghyun Cho, and Yoshua Bengio. Context-dependent word representation for neural machine translation. arXiv, 2016.
2. Alessandro Sordoni, Yoshua Bengio, Hossein Vahabi, Christina Lioma, Jakob Grue Simonsen, and Jian-Yun Nie. A hierarchical recurrent encoder-decoder for generative context-aware query suggestion. CIKM 2015.
3. Iulian V. Serban, Alessandro Sordoni, Yoshua Bengio, Aaron Courville, and Joelle Pineau. Building end-to-end dialogue systems using generative hierarchical neural network models. AAAI 2016.
4. Oriol Vinyals and Quoc Le. A neural conversational model. ICML Deep Learning Workshop, 2015.
Question & Answer