Plug and Play Autoencoders for Conditional Text Generation
Florian Mai†,♠, Nikolaos Pappas♣, Ivan Montero♣, Noah A. Smith♣,♦, James Henderson†
† Idiap Research Institute, ♠ EPFL, Switzerland, ♣ University of Washington, Seattle, USA, ♦ Allen Institute for Artificial Intelligence, Seattle, USA
florian.mai@idiap.ch
The Problem with Conditional Text Generation

- Text lives in a messy, discrete space.
- Conditional text generation requires mapping from a discrete input x to a discrete output y.
- Usual way: learn a complex, task-specific function Φ through task-specific training, which is difficult to do in discrete space.
- Our way: obtain a continuous space by pretraining an autoencoder, and reduce task-specific learning to a simple mapping Φ in that continuous space.
Framework Overview

Our framework (Emb2Emb) consists of three stages:
Pretraining: train a model of the form A(x) = Dec(Enc(x)) on a corpus of sentences (a minimal sketch follows below).
- Assume a fixed-size continuous embedding z_x := Enc(x) ∈ R^d.
- Enc and Dec can be any functions, trained with any objective, so long as A(x) ≈ x.
- The training corpus can be any unlabeled corpus ⇒ large-scale pretraining?
- Plug and play: our framework is plug and play because any autoencoder can be used with it.
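To make the pretraining stage concrete, here is a minimal sketch of a fixed-size bottleneck autoencoder in PyTorch. The architecture (single-layer LSTM encoder and decoder) and all names are illustrative assumptions, not the specific autoencoder used in the paper.

```python
# Hypothetical fixed-size bottleneck autoencoder A(x) = Dec(Enc(x)); names are illustrative.
import torch
import torch.nn as nn

class BottleneckAutoencoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, z_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, z_dim, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, z_dim, batch_first=True)
        self.out = nn.Linear(z_dim, vocab_size)

    def encode(self, x):                       # x: (batch, seq_len) token ids
        _, (h, _) = self.encoder(self.embed(x))
        return h[-1]                           # fixed-size embedding z_x in R^d

    def decode(self, z, x_shifted):            # teacher forcing on shifted targets
        h0 = z.unsqueeze(0)
        c0 = torch.zeros_like(h0)
        out, _ = self.decoder(self.embed(x_shifted), (h0, c0))
        return self.out(out)                   # per-token logits for reconstructing x

    def forward(self, x, x_shifted):
        return self.decode(self.encode(x), x_shifted)
```

Reconstruction would be trained with a token-level cross-entropy between the logits and x; once A(x) ≈ x, Enc and Dec stay frozen for the remaining stages.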
Task Training: learn a mapping Φ from input embeddings to output embeddings, ẑ_y = Φ(z_x).
- Supervised case: L_task(ẑ_y, z_y) = d(ẑ_y, z_y), where d is a distance function (cosine distance in our experiments).
- Training objective: L = L_task + λ_adv · L_adv
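A minimal sketch of one supervised Emb2Emb training step, assuming a frozen pretrained encoder, a trainable mapping Φ (`mapping`), and a discriminator `disc` (introduced later); all names and the 1e-8 stabilizer are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def task_training_step(encoder, mapping, disc, x, y, lambda_adv, optimizer):
    """One Emb2Emb-style step: cosine distance to the target embedding
    plus the adversarial term that pulls predictions onto the manifold."""
    with torch.no_grad():                 # pretrained autoencoder stays frozen
        z_x = encoder(x)
        z_y = encoder(y)
    z_y_hat = mapping(z_x)                # prediction in embedding space

    l_task = (1 - F.cosine_similarity(z_y_hat, z_y, dim=-1)).mean()
    l_adv = -torch.log(disc(z_y_hat) + 1e-8).mean()   # try to fool the discriminator

    loss = l_task + lambda_adv * l_adv
    optimizer.zero_grad()
    loss.backward()                       # only the mapping's parameters are updated
    optimizer.step()
    return loss.item()
```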
Inference: compose the inference model as x ↦ Dec(Φ(Enc(x))).
- But Dec is not involved in task training. Can it handle the outputs of Φ? ⇒ Yes, if using L_adv.
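At inference time the pieces are simply chained; a sketch under the same assumed names as above:

```python
def transfer(x, encoder, mapping, decoder):
    # Dec(Phi(Enc(x))): encode, map in embedding space, then decode back to text.
    z_x = encoder(x)          # discrete input -> continuous embedding
    z_y_hat = mapping(z_x)    # task-specific mapping Phi, the only part trained on the task
    return decoder(z_y_hat)   # continuous embedding -> discrete output text
```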
What can happen when learning in the embedding space?

[Figure: 2D illustration of the embedding space with the origin (0,0), the data manifold, and the true and predicted output embeddings.]

- A prediction may end up off the manifold, and by definition, the decoder cannot handle off-manifold data well...
- ...but the predicted embedding may still have the same angle as the true output embedding, resulting in zero cosine distance loss despite being off the manifold.
- Similar problems arise for L2 distance. How do we keep the embeddings on the manifold?
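A tiny numeric illustration (with made-up vectors) of why a pure cosine objective cannot detect this failure mode:

```python
import torch
import torch.nn.functional as F

z_true = torch.tensor([1.0, 2.0, 3.0])   # embedding on the manifold
z_pred = 10.0 * z_true                   # same direction, but far off the manifold

cos_dist = 1 - F.cosine_similarity(z_pred, z_true, dim=0)
print(cos_dist.item())   # ~0.0: the loss is blind to the magnitude mismatch
```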
Adversarial Loss Term

- Train a discriminator disc to distinguish between embeddings produced by the encoder and embeddings resulting from the mapping:

  max_disc Σ_{i=1}^{N} [ log(disc(z_{y_i})) + log(1 − disc(Φ(z_{x_i}))) ]

- Using the adversarial learning framework, the mapping acts as the adversary and tries to fool the discriminator:

  L_adv(Φ(z_{x_i}); θ) = −log(disc(Φ(z_{x_i}; θ))), where θ are the parameters of the mapping Φ.

- At convergence, the mapping should only produce embeddings that lie on the manifold.
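A minimal sketch of the discriminator and its update under this objective (architecture and names are illustrative assumptions); in practice it is trained alternately with the mapping:

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self, z_dim=256, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, z):
        return self.net(z).squeeze(-1)    # probability that z comes from the encoder

def discriminator_step(disc, z_real, z_fake, optimizer, eps=1e-8):
    """Maximize log disc(z_real) + log(1 - disc(z_fake)) by minimizing its negative."""
    loss = -(torch.log(disc(z_real) + eps) +
             torch.log(1 - disc(z_fake.detach()) + eps)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```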
Supervised Style Transfer Experiments

WikiLarge dataset: transform "normal" English into "simple" English; parallel sentences (input and output) are available.

Model               | BLEU (relative imp.) | SARI (relative imp.)
Emb2Emb (no L_adv)  | 15.7 (-)             | 21.1 (-)
Emb2Emb             | 34.7 (+121%)         | 25.4 (+20.4%)

The adversarial loss term L_adv is crucial for embedding-to-embedding training!
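For reference, the relative improvements are computed against the first row, e.g. for BLEU: (34.7 − 15.7) / 15.7 ≈ +121%.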
Supervised Style Transfer Experiments

We conducted controlled experiments with models that have a fixed-size bottleneck. "Best Seq2Seq model" denotes the best-performing variant among fixed-size bottleneck models trained end-to-end with a token-level cross-entropy loss (like Seq2Seq).

Model               | BLEU (relative imp.) | SARI (relative imp.) | Speedup
Best Seq2Seq model  | 23.3 (±0%)           | 22.4 (±0%)           | -
Emb2Emb             | 34.7 (+48.9%)        | 25.4 (+13.4%)        | 2.2×

Training models with a fixed-size bottleneck may be easier, faster, and more effective when training embedding-to-embedding!
Unsupervised Task Training

- Fixed-size bottleneck autoencoders are commonly used for unsupervised style transfer.
- The goal is to change the style of a text but retain the content, e.g., in machine translation, sentence simplification, sentiment transfer.
- Training objective: L = L_task + λ_adv · L_adv, with
  L_task(ẑ_y, z_x) = λ_sty · L_sty(ẑ_y) + (1 − λ_sty) · L_cont(ẑ_y, z_x)
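A minimal sketch of this unsupervised task loss, assuming L_sty is the cross-entropy of an embedding-level style classifier with respect to the target style and L_cont is the cosine distance to the input embedding; both choices and all names are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def unsupervised_task_loss(z_y_hat, z_x, style_clf, target_style, lambda_sty):
    """L_task = lambda_sty * L_sty + (1 - lambda_sty) * L_cont.

    L_sty pushes the predicted embedding toward the target style class;
    L_cont keeps it close to the input embedding (content preservation).
    """
    targets = torch.full((z_y_hat.size(0),), target_style,
                         dtype=torch.long, device=z_y_hat.device)
    l_sty = F.cross_entropy(style_clf(z_y_hat), targets)
    l_cont = (1 - F.cosine_similarity(z_y_hat, z_x, dim=-1)).mean()
    return lambda_sty * l_sty + (1 - lambda_sty) * l_cont
```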