CS11-731 MT and Seq2Seq Models
Encoder-Decoder Models
Antonios Anastasopoulos
Site: https://phontron.com/class/mtandseq2seq2019/
(Slides by: Antonis Anastasopoulos and Graham Neubig)
Language Models
• Language models are generative models of text: x ~ P(x)
• Sample of model-generated text: “The Malfoys!” said Hermione. Harry was watching him. He looked like Madame Maxime. When she strode up the wrong staircase to visit himself. “I’m afraid I’ve definitely been suspended from power, no chance — indeed?” said Snape. He put his head back behind them and read groups as they crossed a corner and fluttered down onto their ink lamp, and picked up his spoon. The doorbell rang. It was a lot cleaner down in London.
Text Credit: Max Deutsch (https://medium.com/deep-writing/)
Conditioned Language Models
• Not just generate text, generate text according to some specification:

  Input X          | Output Y (Text)    | Task
  -----------------|--------------------|---------------------
  Structured Data  | NL Description     | NL Generation
  English          | Japanese           | Translation
  Document         | Short Description  | Summarization
  Utterance        | Response           | Response Generation
  Image            | Text               | Image Captioning
  Speech           | Transcript         | Speech Recognition
Formulation and Modeling
Calculating the Probability of a Sentence

P(X) = \prod_{i=1}^{I} P(x_i \mid x_1, \ldots, x_{i-1})

(x_i: next word; x_1, \ldots, x_{i-1}: context)
Conditional Language Models

P(Y \mid X) = \prod_{j=1}^{J} P(y_j \mid X, y_1, \ldots, y_{j-1})

Added Context!
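To make the factorization concrete, here is a minimal Python sketch that scores a target sentence by summing log-probabilities word by word. The `next_word_probs` function is a hypothetical stand-in for a trained conditional language model (it just returns a uniform distribution over a toy vocabulary), not part of any real library.

```python
import math

# Hypothetical stand-in for a trained conditional LM: given the source X and
# the target prefix, return P(y_j | X, y_1, ..., y_{j-1}) as a dict.
def next_word_probs(source_tokens, target_prefix):
    vocab = ["I", "hate", "this", "movie", "</s>"]
    return {w: 1.0 / len(vocab) for w in vocab}  # uniform, for illustration only

def sentence_log_prob(source_tokens, target_tokens):
    """Sum of log P(y_j | X, y_<j) over the target sentence (the chain rule above)."""
    total = 0.0
    for j, y in enumerate(target_tokens):
        probs = next_word_probs(source_tokens, target_tokens[:j])
        total += math.log(probs[y])
    return total

print(sentence_log_prob(["kono", "eiga", "ga", "kirai"],
                        ["I", "hate", "this", "movie", "</s>"]))
```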
(One Type of) Language Model (Mikolov et al. 2011)
[Figure: an LSTM language model reads “<s> I hate this movie” one token per step; at each step it predicts the next word, producing “I hate this movie </s>”.]
(One Type of) Conditional Language Model (Sutskever et al. 2014)
[Figure: an encoder LSTM reads the source “kono eiga ga kirai </s>”; its final state initializes a decoder LSTM, which emits the translation “I hate this movie </s>” one word at a time via argmax.]
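For illustration only, a minimal PyTorch sketch of this kind of encoder-decoder; the class name, dimensions, and single-layer LSTMs are assumptions for the sketch, not the exact configuration of Sutskever et al. (2014).

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal encoder-decoder: encode the source, then decode conditioned on it."""
    def __init__(self, src_vocab, tgt_vocab, emb_dim=256, hid_dim=512):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, tgt_vocab)

    def forward(self, src, tgt_in):
        # Encode the source; the final (h, c) summarizes the whole sentence.
        _, (h, c) = self.encoder(self.src_emb(src))
        # Initialize the decoder with the encoder's final state (Sutskever-style).
        dec_out, _ = self.decoder(self.tgt_emb(tgt_in), (h, c))
        return self.out(dec_out)  # logits over the target vocabulary

model = Seq2Seq(src_vocab=8000, tgt_vocab=8000)
src = torch.randint(0, 8000, (4, 6))     # batch of 4 source sentences, length 6
tgt_in = torch.randint(0, 8000, (4, 7))  # shifted target input for teacher forcing
logits = model(src, tgt_in)              # shape (4, 7, 8000)
```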
How to Pass the Hidden State?
• Initialize the decoder with the encoder’s final state (Sutskever et al. 2014)
• Transform the encoder state before passing it (encoder and decoder can have different dimensions; see the sketch below)
• Feed the encoder state as input at every decoder time step (Kalchbrenner & Blunsom 2013)
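A sketch of the “transform” variant from the list above, assuming the encoder and decoder use different hidden sizes bridged by learned linear maps; the names and sizes here are illustrative assumptions.

```python
import torch
import torch.nn as nn

enc_hid, dec_hid = 512, 256             # mismatched encoder/decoder sizes (assumed)
bridge_h = nn.Linear(enc_hid, dec_hid)  # learned transforms between the two spaces
bridge_c = nn.Linear(enc_hid, dec_hid)

h_enc = torch.zeros(1, 4, enc_hid)      # final encoder hidden state (1 layer, batch of 4)
c_enc = torch.zeros(1, 4, enc_hid)      # final encoder cell state
h0 = torch.tanh(bridge_h(h_enc))        # transformed initial decoder hidden state
c0 = torch.tanh(bridge_c(c_enc))        # transformed initial decoder cell state
```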
Methods of Generation
The Generation Problem
• We have a model of P(Y|X); how do we use it to generate a sentence?
• Two methods:
  • Sampling: try to generate a random sentence according to the probability distribution.
  • Argmax: try to generate the sentence with the highest probability.
Ancestral Sampling
• Randomly generate words one by one:
  while y_{j-1} != “</s>”:
      y_j ~ P(y_j | X, y_1, …, y_{j-1})
• An exact method for sampling from P(Y|X), no further work needed (a minimal sketch follows below).
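A minimal Python sketch of ancestral sampling; `next_word_probs` is the same hypothetical uniform stand-in as before, and any real conditional LM could be dropped in instead.

```python
import random

def next_word_probs(source_tokens, target_prefix):
    # Hypothetical stand-in for P(y_j | X, y_1, ..., y_{j-1}).
    vocab = ["I", "hate", "this", "movie", "</s>"]
    return {w: 1.0 / len(vocab) for w in vocab}

def ancestral_sample(source_tokens, max_len=50):
    y = []
    while (not y or y[-1] != "</s>") and len(y) < max_len:
        probs = next_word_probs(source_tokens, y)
        words, weights = zip(*probs.items())
        y.append(random.choices(words, weights=weights, k=1)[0])  # draw y_j ~ P
    return y

print(ancestral_sample(["kono", "eiga", "ga", "kirai"]))
```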
Greedy Search
• One by one, pick the single highest-probability word:
  while y_{j-1} != “</s>”:
      y_j = argmax P(y_j | X, y_1, …, y_{j-1})
• Not exact, and it has real problems:
  • Will often generate the “easy” words first
  • Will prefer multiple common words to one rare word
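The corresponding greedy-search sketch, under the same hypothetical `next_word_probs` interface: at every step, commit to the single most probable word.

```python
def next_word_probs(source_tokens, target_prefix):
    # Hypothetical stand-in for P(y_j | X, y_1, ..., y_{j-1}).
    vocab = ["I", "hate", "this", "movie", "</s>"]
    return {w: 1.0 / len(vocab) for w in vocab}

def greedy_search(source_tokens, max_len=50):
    y = []
    while (not y or y[-1] != "</s>") and len(y) < max_len:
        probs = next_word_probs(source_tokens, y)
        y.append(max(probs, key=probs.get))  # argmax over the next-word distribution
    return y
```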
Beam Search
• Instead of picking one high-probability word, maintain several high-scoring paths (hypotheses) at each step (a sketch follows below).
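A sketch of beam search, again under the hypothetical `next_word_probs` stand-in: keep the `beam_size` highest-scoring partial translations, expand each of them, and re-prune at every step.

```python
import math

def next_word_probs(source_tokens, target_prefix):
    # Hypothetical stand-in for P(y_j | X, y_1, ..., y_{j-1}).
    vocab = ["I", "hate", "this", "movie", "</s>"]
    return {w: 1.0 / len(vocab) for w in vocab}

def beam_search(source_tokens, beam_size=4, max_len=50):
    beams = [(0.0, [])]   # each hypothesis is (log probability, token list)
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, prefix in beams:
            for word, p in next_word_probs(source_tokens, prefix).items():
                candidates.append((score + math.log(p), prefix + [word]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, prefix in candidates[:beam_size]:
            if prefix[-1] == "</s>":
                finished.append((score, prefix))  # completed hypothesis
            else:
                beams.append((score, prefix))     # keep expanding
        if not beams:
            break
    return max(finished + beams, key=lambda c: c[0])[1]

print(beam_search(["kono", "eiga", "ga", "kirai"]))
```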
Sentence Embedding Methods
Sentence Embeddings from Larger Context: Skip-thought Vectors (Kiros et al. 2015)
• Unsupervised training: predict the surrounding sentences on large-scale data (using an encoder-decoder)
• Use the resulting representation as the sentence representation
Sentence Embeddings from Autoencoder (Dai and Le 2015) • Unsupervised training: predict the same sentence
Sentence Embeddings from Language Model (Dai and Le 2015) • Unsupervised training: predict the next word
Sentence Embeddings from Larger LMs: ELMo (Peters et al. 2018)
• Bi-directional language models
• Use a linear combination of the three layers as the final representation
• Fine-tune the weights of the linear combination on the downstream task (sketched below)
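A sketch of the ELMo-style combination: a softmax-normalized, learned weighting of the layer representations plus a task-specific scale, which are the parameters tuned on the downstream task. The layer tensors and dimensions here are placeholders rather than real model outputs.

```python
import torch
import torch.nn as nn

num_layers, seq_len, dim = 3, 7, 1024
layers = torch.randn(num_layers, seq_len, dim)  # [embedding, biLSTM layer 1, biLSTM layer 2]

s = nn.Parameter(torch.zeros(num_layers))  # per-layer mixing weights (learned on the task)
gamma = nn.Parameter(torch.ones(1))        # task-specific scale (learned on the task)

weights = torch.softmax(s, dim=0)
elmo_repr = gamma * (weights[:, None, None] * layers).sum(dim=0)
print(elmo_repr.shape)  # torch.Size([7, 1024])
```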
Sentence Embeddings from Larger LMs Using Both Sides: BERT (Devlin et al. 2018)