Improved Neural Machine Translation with a Syntax-Aware Encoder and Decoder Huadong Chen¹, Shujian Huang¹, David Chiang², Jiajun Chen¹ {chenhd,huangsj,chenjj}@nlp.nju.edu.cn dchiang@nd.edu 1. State Key Laboratory of Novel Software Technology (Nanjing University) 2. University of Notre Dame 1
Outline • Motivation • Approach • Experiments • Conclusion 2
Part 1 Motivation 3
Neural Machine Translation • Encoder-decoder framework Cho et al., (2014) 4
Neural Machine Translation • Attentional NMT Bahdanau et al., (2015) 5
Neural Machine Translation • Their success depends on the representation they use to bridge the source and target language sentences. 6
Neural Machine Translation • However, this representation, a sequence of fixed-dimensional vectors, differs considerably from – most theories about mental representations of sentences; – and traditional natural language processing pipelines, in which semantics is built up compositionally using a syntactic structure. • It neglects potentially useful structural information – perhaps as evidence of this, current NMT models still suffer from syntactic errors such as attachment errors (Shi et al., 2016). 7
Neural Machine Translation • Encoder: building up representations at higher levels, such as phrases, may need structure [Figure: phrase-level representations built over the words y1 … y5 following the source tree] 8
Neural Machine Translation • Decoder: structures could act as guidance or control for generation when the target word order does not match the source structure – Example: 驻 马尼拉 大使馆 (zhu manila dashiguan) should be translated as "embassy in Manila" rather than "in embassy of Manila" 9
Our Work • We propose an encoder-decoder framework that takes syntactic structure into consideration; it includes a bidirectional tree-structured encoder and a tree-coverage decoder. 10
Part 2 Syntax-aware Neural Machine Translation 11
Bottom-up Tree Encoder (1/3) • Bottom-up tree encoder (Tai et al., 2015; Eriguchi et al., 2016): – builds tree-structured representations from the bottom up, forming the representation of each constituent from its children 12
Bottom-up Tree Encoder (2/3) • We assume model consistency is important. • Our sequential model is based on bidirectional GRUs. • We therefore design a Tree-GRU instead of using a Tree-LSTM (a sketch follows below). 13
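To make the design concrete, here is a minimal numpy sketch of a binary Tree-GRU composition, assuming separate weights for the left and right child and a GRU-style reset/update gating; the exact parameterization used in the paper may differ.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class BinaryTreeGRUCell:
    """Toy binary Tree-GRU composition: a parent's hidden state is built
    from its two children's states, with separate weights per child.
    The gating layout here is an assumption, not the paper's exact one."""

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        mat = lambda: rng.normal(scale=0.1, size=(dim, dim))
        # one (left-weight, right-weight) pair per gate
        self.gates = {name: (mat(), mat())
                      for name in ("reset_l", "reset_r", "update", "cand")}

    def __call__(self, h_left, h_right):
        Wl, Wr = self.gates["reset_l"]; r_l = sigmoid(Wl @ h_left + Wr @ h_right)
        Wl, Wr = self.gates["reset_r"]; r_r = sigmoid(Wl @ h_left + Wr @ h_right)
        Wl, Wr = self.gates["update"];  z   = sigmoid(Wl @ h_left + Wr @ h_right)
        Wl, Wr = self.gates["cand"]
        h_cand = np.tanh(Wl @ (r_l * h_left) + Wr @ (r_r * h_right))
        # interpolate between the candidate and the children's average state
        return z * h_cand + (1.0 - z) * 0.5 * (h_left + h_right)

# usage: compose two leaf annotations (e.g. from the sequential encoder)
cell = BinaryTreeGRUCell(dim=4)
parent_state = cell(np.ones(4), np.zeros(4))
```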
Bottom-up Tree Encoder (3/3) • Drawbacks of the bottom-up tree encoder: – the learned representation of a node is based on its subtree only; it contains no information from higher up in the tree – the representation of leaf nodes is still the sequential one, so no syntactic information is fed into the words 14
Bidirectional Tree Encoder (1/2) • Bidirectional tree encoder – also propagates information from the top down, which includes information from outside the current constituent – bidirectional Tree-LSTM for classification (Teng and Zhang, 2016) – bidirectional Tree-GRU for sentiment analysis (Kokkinos and Potamianos, 2017) 15
Bidirectional Tree Encoder (2/2) • The top-down encoder by itself would have no lexical information as input – we feed the hidden states of the bottom-up encoder to the top-down encoder as input • Propagating identical information from a parent node to its left and right children would be redundant – we use different weights for the left and right children (see the sketch below) 16
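A rough sketch of such a top-down pass, assuming a plain GRU cell per direction: each node's top-down state is computed by a GRU whose input is the node's bottom-up state and whose previous state is the parent's top-down state, with separate cells standing in for the separate left/right weights. The class and variable names are ours, not the paper's.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Plain GRU step: new_state = gru(input=x, prev_state=h)."""
    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        m = lambda: rng.normal(scale=0.1, size=(dim, dim))
        self.Wz, self.Uz, self.Wr, self.Ur, self.Wh, self.Uh = (m() for _ in range(6))

    def __call__(self, x, h):
        z = sigmoid(self.Wz @ x + self.Uz @ h)
        r = sigmoid(self.Wr @ x + self.Ur @ h)
        h_cand = np.tanh(self.Wh @ x + self.Uh @ (r * h))
        return (1.0 - z) * h + z * h_cand

class Node:
    """Binary tree node carrying a bottom-up hidden state."""
    def __init__(self, bottom_up, left=None, right=None):
        self.bottom_up, self.left, self.right = bottom_up, left, right
        self.top_down = self.annotation = None

def top_down_encode(node, parent_state, cell_left, cell_right, is_left=True):
    # bottom-up state is the GRU input; the parent's top-down state is the
    # previous hidden state; separate cells model separate left/right weights
    cell = cell_left if is_left else cell_right
    node.top_down = cell(node.bottom_up, parent_state)
    # the final annotation concatenates both directions
    node.annotation = np.concatenate([node.bottom_up, node.top_down])
    if node.left is not None:
        top_down_encode(node.left, node.top_down, cell_left, cell_right, True)
        top_down_encode(node.right, node.top_down, cell_left, cell_right, False)

# usage: the root's "parent state" is initialized to zeros (an assumption here)
d = 4
root = Node(np.ones(d), Node(np.ones(d)), Node(np.ones(d)))
top_down_encode(root, np.zeros(d), GRUCell(d, 1), GRUCell(d, 2))
```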
Tree Attention • Treating the representations of tree nodes the same as word representations and performing attention over both (Eriguchi et al., 2016) – Pros: enables attention at a higher level, i.e. the words in the same subtree can receive attention as a whole unit – Cons: still missing structural control, i.e. the attention for words and tree nodes may interfere with each other (sketch below) 17
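As an illustration of this idea as described on the slide, a minimal additive-attention sketch in which word annotations and tree-node annotations are stacked into one matrix and attended over uniformly; the scoring function and names are standard assumptions, not the paper's exact equations.

```python
import numpy as np

def tree_attention(decoder_state, word_annotations, node_annotations, Wa, Ua, va):
    """Additive attention over word AND tree-node annotations together.

    word_annotations : (n_words, d_ann) sequential encoder states
    node_annotations : (n_nodes, d_ann) bidirectional tree-encoder states
    Wa: (d_att, d_dec), Ua: (d_att, d_ann), va: (d_att,)
    """
    # treat tree nodes exactly like extra source "words"
    annotations = np.vstack([word_annotations, node_annotations])
    scores = np.tanh(annotations @ Ua.T + decoder_state @ Wa.T) @ va
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    context = weights @ annotations      # context vector fed to the decoder
    return context, weights
```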
Tree-Coverage Model (1/5) • Two observations about the translations: – a syntactic phrase in the source sentence is often incorrectly translated into discontinuous words in the output – the attention model prefers to attend to the non-leaf nodes, which may aggravate the over-translation problem 18
Tree-Coverage Model (2/5) [Figure: attention heat map with the tree encoder] 19
Tree-Coverage Model (3/5) • Coverage model (Tu et al., 2016) – it can be interpreted as a control mechanism for the attention model • Drawbacks – the coverage model sees the source-sentence annotations as a bag of vectors – it knows nothing about word order, still less about syntactic structure 20
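For reference, one common instantiation of such a word-level coverage model: each source annotation keeps a coverage value that accumulates past attention and is fed back into the attention score. This is a simplified sketch; Tu et al. (2016) also describe learned, GRU-based variants.

```python
import numpy as np

def coverage_attention(decoder_state, annotations, coverage, Wa, Ua, va, vc):
    """Additive attention with a per-annotation coverage term.

    coverage : (n,) accumulated attention mass per source annotation
    vc       : (d_att,) weight that injects coverage into the score
    """
    scores = np.tanh(annotations @ Ua.T + decoder_state @ Wa.T
                     + np.outer(coverage, vc)) @ va
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    context = weights @ annotations
    coverage = coverage + weights        # simplest update: accumulate attention
    return context, weights, coverage
```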
Tree-Coverage Model (4/5) • We propose to use prior knowledge to control the attention mechanism – in our case, the prior knowledge is the source syntactic information – in particular, we build our model on top of the word coverage model proposed by Tu et al. (2016) 21
Tree-Coverage Model(5/5) h W 𝑦 ] W 𝑒 YZ! 𝑦 ] W 𝑦 ^ W 𝐷 Y,W 𝐷 YZ!,W GRU 𝛽 Y,W 𝐷 𝐷 YZ!,](W) YZ!,^(W) 𝛽 Y,^(W) 𝛽 Y,](W) 22
Part 3 Experiments 23
Data and Settings • 1.6 million sentence pairs from LDC for training • Using MT02 for held-out dev, MT03, 04, 05, 06 for test • Implementation based on the dl4mt package 24
Tree-GRU vs. Tree-LSTM [Chart: BLEU scores of the sequential baseline, a Tree-LSTM encoder, and the Tree-GRU encoder (Seq-LSTM vs. SeqTree-LSTM settings); the tree encoders improve over the sequential baseline by between +0.75 and +1.62 BLEU] 25
Tree-Coverage Model (1/2) Our tree-coverage model consistently improves performance further (rows 9–11). 26
Tree-Coverage Model (2/2) [Figure: attention heat map with the tree encoder + tree-coverage model] 27
Analysis by Sentence Length [Chart: BLEU by source sentence length, annotated with 5% ↑ and 10% ↑ gains] 1. The proposed bidirectional tree encoder outperforms the sequential NMT system and the Tree-GRU encoder across all lengths. 2. The improvements become larger for sentences longer than 20 words, and the biggest improvement is for sentences longer than 50 words. 28
Conclusion • We have investigated the potential of using explicit source-side syntactic trees in NMT. • The improvement could come from: – the enrichment of the representation during encoding; – the structural control of attention during decoding. • In this paper, we only use the binarized structure of the source-side tree. For future work, it will be interesting to make use of target-side structure information or the syntactic labels as well. 29
Thanks! 30