Sequence to Sequence Models for Machine Translation CMSC 723 / LING 723 / INST 725 Marine Carpuat Slides & figure credits: Graham Neubig
Machine Translation • Translation system • Input: source sentence F • Output: target sentence E • 3 problems for statistical machine translation systems • Modeling • Translation can be viewed as a function; how do we define P(E|F)? • Training/Learning • How do we estimate parameters from parallel corpora? • Search • How do we solve the argmax efficiently?
Introduction to Neural Machine Translation • Neural language models review • Sequence to sequence models for MT • Encoder-Decoder • Sampling and search (greedy vs beam search) • Practical tricks • Sequence to sequence models for other NLP tasks
A feedforward neural 3-gram model
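As a rough illustration, a feedforward 3-gram language model predicts the next word from a fixed window of the two previous words. The sketch below assumes PyTorch; the class name and dimensions are illustrative, not from the slides.

```python
# Sketch of a feedforward neural 3-gram language model (assumed PyTorch;
# vocabulary size and dimensions are illustrative).
import torch
import torch.nn as nn

class FeedforwardTrigramLM(nn.Module):
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.hidden = nn.Linear(2 * embed_dim, hidden_dim)  # two previous words as context
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, prev2, prev1):
        # prev2, prev1: LongTensors of word ids, shape (batch,)
        x = torch.cat([self.embed(prev2), self.embed(prev1)], dim=-1)
        h = torch.tanh(self.hidden(x))
        return self.out(h)  # scores; softmax gives P(next word | prev2, prev1)
```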
A recurrent language model
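A recurrent language model replaces the fixed context window with a hidden state that summarizes the entire history. A minimal sketch, again assuming PyTorch with illustrative names:

```python
# Minimal recurrent language model sketch (assumed PyTorch; names are illustrative).
import torch.nn as nn

class RNNLM(nn.Module):
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, word_ids, hidden=None):
        # word_ids: (batch, time); the hidden state carries the whole history,
        # unlike the fixed two-word context of the feedforward 3-gram model.
        h, hidden = self.rnn(self.embed(word_ids), hidden)
        return self.out(h), hidden
```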
Examples of RNN variants • LSTMs • Aim to address vanishing/exploding gradient issue • Stacked RNNs • …
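For example, in PyTorch (an assumption, not prescribed by the slides), swapping in an LSTM and stacking layers is a one-line change:

```python
# Sketch: LSTM gates help with vanishing/exploding gradients compared to a
# vanilla RNN; num_layers=2 stacks two recurrent layers (dimensions illustrative).
import torch.nn as nn

stacked_lstm = nn.LSTM(input_size=64, hidden_size=128, num_layers=2, batch_first=True)
```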
Training in practice: online
Training in practice: batch
Training in practice: minibatch • Compromise between online and batch • Computational advantage: can leverage vector processing instructions in modern hardware by processing multiple examples simultaneously
Problem with minibatches: in language modeling, examples don’t have the same length • 3 tricks • Padding: add </s> symbols to make all sentences the same length • Masking: multiply the loss computed at padded positions by zero • Sort sentences by length, so sentences in a minibatch have similar lengths (see the sketch below)
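A minimal sketch of padding and masking, assuming PyTorch and a dedicated padding id (the slides pad with </s>; the PAD constant and function names below are illustrative):

```python
# Sketch of minibatch padding and loss masking (assumed PyTorch).
import torch
import torch.nn.functional as F

PAD = 0  # illustrative padding id; the slides reuse </s> for this purpose

def padded_batch(sentences):
    # sentences: list of lists of word ids, ideally pre-sorted by length
    max_len = max(len(s) for s in sentences)
    batch = torch.full((len(sentences), max_len), PAD, dtype=torch.long)
    for i, s in enumerate(sentences):
        batch[i, :len(s)] = torch.tensor(s)
    return batch

def masked_loss(logits, targets):
    # logits: (batch, time, vocab), targets: (batch, time)
    loss = F.cross_entropy(logits.transpose(1, 2), targets, reduction="none")
    mask = (targets != PAD).float()  # zero out the loss at padded positions
    return (loss * mask).sum() / mask.sum()
```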
Introduction to Neural Machine Translation • Neural language models review • Sequence to sequence models for MT • Encoder-Decoder • Sampling and search (greedy vs beam search) • Practical tricks • Sequence to sequence models for other NLP tasks
Encoder-decoder model
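A minimal encoder-decoder sketch for P(E|F), assuming PyTorch, a GRU on each side, and no attention; all names and dimensions are illustrative:

```python
# Sketch of an encoder-decoder (sequence-to-sequence) model for P(E|F).
import torch.nn as nn

class EncoderDecoder(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, embed_dim)
        self.tgt_embed = nn.Embedding(tgt_vocab, embed_dim)
        self.encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.decoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # Encode the source sentence F into a final hidden state ...
        _, h = self.encoder(self.src_embed(src_ids))
        # ... which initializes the decoder that predicts the target sentence E.
        dec_out, _ = self.decoder(self.tgt_embed(tgt_ids), h)
        return self.out(dec_out)  # scores over the target vocabulary at each step
```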
Generating Output • We have a model P(E|F); how can we generate translations? • 2 methods • Sampling: generate a random sentence according to the probability distribution • Argmax: generate the sentence with the highest probability
Ancestral Sampling • Randomly generate words one by one • Until the end-of-sentence symbol is generated • Done!
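A sketch of ancestral sampling under an assumed step interface: model_step(prev_word, state) returns a next-word distribution and an updated decoder state. The interface and names are hypothetical, not from the slides.

```python
# Ancestral sampling sketch: draw each word from P(e_t | e_<t, F) until </s>.
import torch

def sample(model_step, init_state, sos_id, eos_id, max_len=100):
    word, state, output = sos_id, init_state, []
    for _ in range(max_len):
        probs, state = model_step(word, state)               # distribution over next word
        word = torch.multinomial(probs, num_samples=1).item()  # random draw
        if word == eos_id:
            break
        output.append(word)
    return output
```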
Greedy search • One by one, pick single highest probability word • Problems • Often generates easy words first • Often prefers multiple common words to rare words
Greedy Search Example
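Greedy search differs from sampling only in how the next word is chosen; under the same assumed step interface as the sampling sketch above:

```python
# Greedy search sketch: pick the single highest-probability word at each step.
import torch

def greedy_decode(model_step, init_state, sos_id, eos_id, max_len=100):
    word, state, output = sos_id, init_state, []
    for _ in range(max_len):
        probs, state = model_step(word, state)
        word = torch.argmax(probs).item()  # argmax instead of a random draw
        if word == eos_id:
            break
        output.append(word)
    return output
```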
Beam Search • Example with beam size b = 2 • We consider the top b hypotheses at each time step
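A beam-search sketch with beam size b, under an assumed step interface that returns log-probabilities; names are illustrative. With b = 1 this reduces to greedy search.

```python
# Beam search sketch: keep the b best partial hypotheses at each time step.
import torch

def beam_search(model_step, init_state, sos_id, eos_id, beam_size=2, max_len=100):
    # Each hypothesis: (log-probability, word sequence, decoder state, finished?)
    beams = [(0.0, [sos_id], init_state, False)]
    for _ in range(max_len):
        candidates = []
        for score, seq, state, done in beams:
            if done:
                candidates.append((score, seq, state, True))
                continue
            log_probs, new_state = model_step(seq[-1], state)
            top_lp, top_ids = torch.topk(log_probs, beam_size)
            for lp, wid in zip(top_lp.tolist(), top_ids.tolist()):
                candidates.append((score + lp, seq + [wid], new_state, wid == eos_id))
        # Keep only the b highest-scoring hypotheses
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_size]
        if all(done for _, _, _, done in beams):
            break
    return beams[0][1]  # best hypothesis (includes the <s> and </s> markers)
```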
Introduction to Neural Machine Translation • Neural language models review • Sequence to sequence models for MT • Encoder-Decoder • Sampling and search (greedy vs beam search) • Practical tricks • Sequence to sequence models for other NLP tasks