A Quick Introduction to Neural MT
Christian Hardmeier
2016-05-16

Why this lecture?
For about 15 years, the MT world was relatively static. State of the art defined by phrase-based SMT and syntax-based SMT. Well-known strengths and weaknesses. Neural MT is a new, quite different approach to MT that seems to outperform the previous methods.

Outline: Deep Learning · Continuous-space NLP · Neural Networks

Deep Learning
Machine learning paradigm that gained popularity very recently. First breakthroughs in computer vision. Multiple layers of prediction: "automated feature engineering".
Deep Learning
[Figure illustrating layered feature learning. Image source: http://deeplearning.stanford.edu/wiki/index.php/Exercise%3AVectorization]

Continuous-Space Methods
NLP traditionally treated words as discrete, incomparable units. Continuous-space methods map them into a vector space where you can compute similarities (a small sketch of such a similarity computation follows below). Methods: word co-occurrence or deep learning. With deep learning, we can train word embeddings for specific objectives.

Discrete Words
[Figure: a scatter of unrelated word types ("Lockheed", "ICCAT", "aspirin", "Dutch", …) illustrating that discrete word symbols carry no notion of similarity.]
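The similarity sketch referred to above: a toy example of comparing word vectors with cosine similarity. The words, vectors and dimensionality are invented for illustration; trained embeddings typically have hundreds of dimensions.

import numpy as np

# Invented 4-dimensional "embeddings", just to show the mechanics.
embeddings = {
    "king":  np.array([0.8, 0.1, 0.7, 0.2]),
    "queen": np.array([0.7, 0.2, 0.8, 0.1]),
    "car":   np.array([0.1, 0.9, 0.0, 0.6]),
}

def cosine(u, v):
    # 1.0 for vectors pointing the same way, near 0 for unrelated directions.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(embeddings["king"], embeddings["queen"]))  # relatively high
print(cosine(embeddings["king"], embeddings["car"]))    # relatively low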
Word Embeddings (Projected)
[Figure: a 2-D projection of trained word embeddings; function words ("the", "a", "is", "was", …) and content words ("president", "people", "world", …) cluster by usage. Courtesy of Ali Basirat.]

Neural networks
Neural networks are the machine learning paradigm in which most of this happens. Biologically inspired, but that doesn't matter very much. Very popular in the early 1980s, but the time wasn't ripe. The elementary "neuron" is just a nonlinear function with some trainable parameters. Neurons are combined into a network by function composition.

Logistic Regression
A single neuron computes a weighted sum of its inputs and squashes it with a nonlinearity such as the logistic function:
$f(x) = \frac{1}{1 + e^{-x}}, \qquad y = f\Big(\sum_i \lambda_i x_i\Big)$
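The formula above can be written directly in code. This is only an illustrative sketch (NumPy, invented input values), not material from the lecture:

import numpy as np

def logistic(x):
    # f(x) = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + np.exp(-x))

def neuron(inputs, weights):
    # y = f(sum_i lambda_i * x_i): a weighted sum passed through the nonlinearity
    return logistic(np.dot(weights, inputs))

x = np.array([1.0, 0.5, -0.3, 2.0, 0.0])     # x_1 ... x_5 (invented values)
lam = np.array([0.2, -0.1, 0.4, 0.05, 0.3])  # lambda_1 ... lambda_5 (invented)
print(neuron(x, lam))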
Logistic Regression
[Figure: the same computation drawn as a network: inputs x_1 … x_5 connected to the output y through weights \lambda_1 … \lambda_5.]

Multiple Decision Steps
Inputs x, latent features h, outputs y:
$h = f(W_1 x), \qquad y = f(W_2 h)$
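A minimal sketch of these two decision steps, again with invented sizes and random weights (the slides do not fix any dimensions):

import numpy as np

def f(x):
    # Elementwise logistic nonlinearity, as before.
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 5))   # maps 5 inputs to 3 latent features
W2 = rng.normal(size=(2, 3))   # maps 3 latent features to 2 outputs

x = rng.normal(size=5)
h = f(W1 @ x)    # h = f(W_1 x)
y = f(W2 @ h)    # y = f(W_2 h)
print(y)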
Training the Network
Neural networks are trained by numerically minimising the error of the output on a training set. The algorithms used are variants of gradient descent. The gradients with respect to all weights can be computed efficiently with a dynamic programming algorithm called back-propagation. (A toy sketch of such a training loop follows below, after the table.)

Word Embeddings in Neural Networks
[Figure: sparse inputs x are mapped to dense embeddings e, which feed into the remaining network.]

Sequence Length Limits
A given network takes a fixed number of inputs. In MT, we need to process input sentences of arbitrary length and produce output of arbitrary length. Input and output length are not necessarily the same.

Input length | Output length      | Compression     | Network type
fixed        | fixed              | –               | feed-forward
variable     | = input (or fixed) | –               | recurrent
variable     | unconstrained      | to fixed size   | encoder-decoder
variable     | unconstrained      | no compression  | attention-based
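Tying back to "Training the Network" above: a toy sketch of gradient descent on the single logistic neuron from earlier, with an invented dataset. Real NMT systems train far larger networks via back-propagation, but the update rule has the same shape:

import numpy as np

def f(x):
    return 1.0 / (1.0 + np.exp(-x))

# Invented toy data: 4 examples with 3 features each, binary targets.
X = np.array([[0., 1., 1.],
              [1., 0., 1.],
              [1., 1., 0.],
              [0., 0., 1.]])
t = np.array([1., 1., 0., 0.])

lam = np.zeros(3)   # trainable weights lambda_i
eta = 0.5           # learning rate

for step in range(200):
    y = f(X @ lam)                                 # forward pass
    grad = X.T @ ((y - t) * y * (1 - y)) / len(t)  # gradient of mean squared error
    lam -= eta * grad                              # gradient-descent update

print(lam, f(X @ lam))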
Adding a Time Dimension: Recurrent Nets
[Figure: a recurrent unit that combines the current input x_t with the output of the previous time step y_{t−1}.]

Processing Sequences
[Figure: an unrolled recurrent network: inputs x_1 … x_8 feed hidden states h_1 … h_8, which produce outputs y_1 … y_8; forward connections link the layers, recurrent connections link consecutive time steps.]

Unequal Sequence Length
In this architecture (sketched in code below), there are as many inputs x_i as outputs y_i. Useful for sequence labelling tasks such as POS tagging. In machine translation, however, the lengths of the input and output sequences differ.
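A minimal sketch of the unrolled recurrence pictured above, with invented sizes and random weights (real recurrent units, e.g. LSTMs, are more elaborate):

import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(4, 3))   # input x_t        -> hidden h_t
U = rng.normal(size=(4, 4))   # previous h_{t-1} -> hidden h_t (recurrent connection)
V = rng.normal(size=(2, 4))   # hidden h_t       -> output y_t

xs = [rng.normal(size=3) for _ in range(8)]   # x_1 ... x_8
h = np.zeros(4)                               # initial hidden state
ys = []
for x_t in xs:
    h = np.tanh(W @ x_t + U @ h)   # hidden state depends on the previous one
    ys.append(np.tanh(V @ h))      # one output per input position

print(len(ys))   # 8 outputs for 8 inputs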
Encoder-Decoder Architecture
One set of layers of fixed size must hold the contents of the whole input sentence.
[Figure: an encoder reads the input x_1 x_2 x_3 EOS into hidden states e_1 … e_8; decoder states d_1 … d_8 then emit the output y_1 y_2 y_3 y_4 EOS.]

Attention Mechanism
[Figure: each decoder state d_1 … d_5 computes a weighted sum (+) over the encoder states e_1 … e_4 of the input x_1 x_2 x_3 EOS while producing y_1 … y_4 EOS. A code sketch of this weighted-sum computation follows after the summary.]

Neural MT: Summary
Very new area: first large-scale systems in 2014. Promising results in public evaluations. We know little about its strengths and weaknesses yet, but they seem to be very different from those of earlier approaches. I'll tell you more in a few years…
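The sketch promised above: one attention step in isolation, using a plain dot-product score for brevity. The sizes, weights and scoring function are all invented for illustration; published attention models typically score with a small trained network.

import numpy as np

def softmax(a):
    a = a - a.max()         # subtract max for numerical stability
    e = np.exp(a)
    return e / e.sum()

rng = np.random.default_rng(2)
encoder_states = rng.normal(size=(4, 6))   # e_1 ... e_4, one per input position
d_prev = rng.normal(size=6)                # current decoder state

scores = encoder_states @ d_prev           # relevance of each e_i to this decoding step
alpha = softmax(scores)                    # attention weights: non-negative, sum to 1
context = alpha @ encoder_states           # weighted sum of encoder states

# 'context' is combined with the decoder state to predict the next target word,
# so the decoder is no longer limited to one fixed-size summary of the input.
print(alpha, context)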