SLIDE 1

Adversarially Regularized Autoencoders

Junbo (Jake) Zhao, Yoon Kim, Kelly Zhang, Alexander M. Rush, Yann LeCun


Wei Zhen Teoh and Mathieu Ravaut

SLIDE 2

Refresher: Adversarial Autoencoders


[From Adversarial Autoencoders by Makhzani et al 2015]

SLIDE 3

Some Changes - Learned Generator

SLIDE 4

Some Changes - Wasserstein GAN

  • The distance measure between two distributions is defined by the Earth-mover distance, or Wasserstein-1:
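The formula referred to is the standard Wasserstein-1 definition from the WGAN paper:

    W(P_r, P_g) = \inf_{\gamma \in \Pi(P_r, P_g)} \mathbb{E}_{(x, y) \sim \gamma} \left[ \lVert x - y \rVert \right]

where \Pi(P_r, P_g) is the set of joint distributions whose marginals are P_r and P_g.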


[From Wasserstein GAN by Arjovsky et al 2017]

SLIDE 5

Some Changes - Wasserstein GAN

  • This is equivalent to the following supremum over Lipschitz-1 functions (reproduced below):
  • In practice, f is approximated by a neural network f_w whose weights are all clipped to lie in a compact space, such as a hypercube of size epsilon.
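The supremum referred to is the Kantorovich-Rubinstein dual form used in the WGAN paper:

    W(P_r, P_g) = \sup_{\lVert f \rVert_L \le 1} \; \mathbb{E}_{x \sim P_r}[f(x)] - \mathbb{E}_{x \sim P_g}[f(x)]

Clipping the weights of f_w keeps it inside some family of K-Lipschitz functions, so the maximized objective approximates the Wasserstein distance up to a constant factor.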

SLIDE 6

Some Changes - Discrete Data

Instead of a continuous vector, X is now discrete data:

  • Binarized MNIST
  • Text (sequences of one-hot vocabulary vectors; see the sketch below)

[From https://ayearofai.com/lenny-2-autoencoders-and-word-embeddings-oh-my-576403b0113a]
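A concrete illustration of "sequences of one-hot vocabulary vectors" (not from the slides; the toy vocabulary below is made up):

    import torch
    import torch.nn.functional as F

    # Hypothetical toy vocabulary; the indices are arbitrary.
    vocab = {"<pad>": 0, "the": 1, "movie": 2, "was": 3, "great": 4}

    def one_hot_sequence(tokens, vocab):
        """Encode a token list as a (seq_len, vocab_size) matrix of one-hot rows."""
        ids = torch.tensor([vocab[t] for t in tokens])
        return F.one_hot(ids, num_classes=len(vocab)).float()

    x = one_hot_sequence(["the", "movie", "was", "great"], vocab)
    print(x.shape)  # torch.Size([4, 5]) -- one one-hot vector per token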
SLIDE 7

Some Changes - Encoder (for sequential data)

The final hidden state h_n becomes the latent code c.

[From https://mlalgorithm.wordpress.com/2016/08/04/deep-learning-part-2-recurrent-neural-networks-rnn/]
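A minimal PyTorch-style sketch of such an encoder (an illustration of the idea, not the authors' exact architecture): a recurrent network reads the embedded tokens and its final hidden state h_n is returned as the code c.

    import torch
    import torch.nn as nn

    class RNNEncoder(nn.Module):
        # Hypothetical sizes; the paper's hyperparameters may differ.
        def __init__(self, vocab_size, emb_dim=300, code_dim=300):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)
            self.rnn = nn.LSTM(emb_dim, code_dim, batch_first=True)

        def forward(self, tokens):          # tokens: (batch, seq_len) word ids
            emb = self.embed(tokens)        # (batch, seq_len, emb_dim)
            _, (h_n, _) = self.rnn(emb)     # h_n: (1, batch, code_dim)
            return h_n.squeeze(0)           # the final hidden state is the code c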

SLIDE 8

Model

SLIDE 9

Training Objective

The objective combines a reconstruction loss with the Wasserstein distance between two distributions: the codes produced by the encoder and the codes produced by the learned generator.
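In symbols (a reconstruction from the paper's formulation; the notation below is mine): with encoder enc_\phi, decoder p_\psi, and generator g_\theta mapping noise z \sim \mathcal{N}(0, I) to codes \tilde{c} = g_\theta(z), the objective is roughly

    \min_{\phi, \psi} \; \mathcal{L}_{rec}(\phi, \psi) + \lambda \, W(\mathbb{P}_c, \mathbb{P}_{\tilde{c}})

where \mathbb{P}_c is the distribution of encoded codes c = enc_\phi(x) and \mathbb{P}_{\tilde{c}} is the distribution of generated codes.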

SLIDE 10

Training Objective Components

  • Reconstruction from decoder:
  • Reconstruction loss:
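A standard form of these two components, consistent with the model above (notation is mine):

    Reconstruction from decoder:  \hat{x} \sim p_\psi(x \mid c), \quad c = enc_\phi(x)
    Reconstruction loss:          \mathcal{L}_{rec}(\phi, \psi) = -\,\mathbb{E}_x \left[ \log p_\psi(x \mid enc_\phi(x)) \right]

i.e. the cross-entropy between the decoder's token distribution and the one-hot targets.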

SLIDE 11

Training Objective Components

  • Discriminator maximizing objective:
  • Generator minimizing objective:

The maximum of this function approximates the Wasserstein distance.
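Written out (a reconstruction in my notation, following the WGAN-over-codes setup): with clipped critic f_w, encoded codes c = enc_\phi(x), and generated codes \tilde{c} = g_\theta(z),

    Discriminator (critic):  \max_w \; \mathbb{E}_x[f_w(enc_\phi(x))] - \mathbb{E}_z[f_w(g_\theta(z))]
    Generator:               \min_\theta \; \mathbb{E}_x[f_w(enc_\phi(x))] - \mathbb{E}_z[f_w(g_\theta(z))]

so the inner maximum plays the role of the supremum in the duality above.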

SLIDE 12

Training

SLIDE 13

Training

SLIDE 14

Training

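A minimal PyTorch-style sketch of the alternating training procedure described in the ARAE paper (module names, optimizers, and hyperparameter values are my own placeholders): (1) an autoencoder step on the reconstruction loss, (2) one or more critic steps with weight clipping, (3) an adversarial step on the generator (the paper also updates the encoder in this step).

    import torch

    # Assumed to exist: encoder, decoder, generator, critic (torch.nn Modules);
    # opt_ae (encoder+decoder), opt_critic, opt_gan (generator+encoder) optimizers;
    # reconstruction_loss(logits, targets); `batch` is a tensor of token ids.
    def arae_step(batch, noise_dim=100, clip_eps=0.01, n_critic=5):
        # (1) Autoencoder step: minimize the reconstruction loss.
        opt_ae.zero_grad()
        rec_loss = reconstruction_loss(decoder(encoder(batch)), batch)
        rec_loss.backward()
        opt_ae.step()

        # (2) Critic steps: maximize f_w(c) - f_w(c~), then clip the weights.
        for _ in range(n_critic):
            opt_critic.zero_grad()
            real_code = encoder(batch).detach()
            fake_code = generator(torch.randn(batch.size(0), noise_dim)).detach()
            critic_loss = -(critic(real_code).mean() - critic(fake_code).mean())
            critic_loss.backward()
            opt_critic.step()
            for p in critic.parameters():           # WGAN weight clipping
                p.data.clamp_(-clip_eps, clip_eps)

        # (3) Adversarial step: generator (and, per the paper, the encoder)
        #     minimizes the critic's estimate of the Wasserstein distance.
        opt_gan.zero_grad()
        wgan_loss = (critic(encoder(batch)).mean()
                     - critic(generator(torch.randn(batch.size(0), noise_dim))).mean())
        wgan_loss.backward()
        opt_gan.step()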
SLIDE 15

Extension: Code Space Transfer

Unaligned transfer for text: can we change an attribute of the text (e.g. sentiment) without changing its content, using this autoencoder? Example:

SLIDE 16

Extension: Code Space Transfer

  • Extend the decoder to condition on a transfer variable to learn the sentiment attribute (a sketch follows below).
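What "condition on a transfer variable" could look like in code (my own illustration; the paper's conditioning details may differ): the attribute label y is embedded and concatenated to the code c before decoding.

    import torch
    import torch.nn as nn

    class ConditionalDecoder(nn.Module):
        # Hypothetical sizes; n_attrs=2 for positive/negative sentiment.
        def __init__(self, vocab_size, code_dim=300, attr_dim=30, n_attrs=2):
            super().__init__()
            self.attr_embed = nn.Embedding(n_attrs, attr_dim)  # transfer variable y
            self.rnn = nn.LSTM(code_dim + attr_dim, code_dim, batch_first=True)
            self.out = nn.Linear(code_dim, vocab_size)

        def forward(self, c, y, seq_len):
            # Feed [code ; attribute embedding] to the RNN at every time step.
            inp = torch.cat([c, self.attr_embed(y)], dim=-1)   # (batch, code+attr)
            inp = inp.unsqueeze(1).repeat(1, seq_len, 1)       # repeat over time
            h, _ = self.rnn(inp)
            return self.out(h)                                 # token logits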

SLIDE 17

Extension: Code Space Transfer

  • Train the encoder adversarially against a classifier so that the code space is invariant to the attribute.

Classifier:
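A sketch of this adversarial pair in code (my own notation; it follows the standard setup of a classifier plus a reversed objective for the encoder):

    import torch.nn as nn
    import torch.nn.functional as F

    attr_clf = nn.Linear(300, 2)   # hypothetical linear classifier on 300-d codes

    def classifier_loss(code, y):
        # Classifier step: learn to predict the attribute from the (detached) code.
        return F.cross_entropy(attr_clf(code.detach()), y)

    def encoder_adversarial_loss(code, y):
        # Encoder step: maximize the classifier's loss (i.e. minimize its negative)
        # so that the code space becomes invariant to the attribute.
        return -F.cross_entropy(attr_clf(code), y)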

SLIDE 18

Additional Training

SLIDE 19

Image model

[Architecture diagram: the AE and the WGAN (generator + critic), with the EM distance computed in the code space]

Input images are binarized MNIST, but normal MNIST would work as well.

[From Adversarially Regularized Autoencoders by Zhao et al, 2017]

SLIDE 20

Text model

[Architecture diagram: AE and WGAN with the same generator architecture; EM distance computed in the code space]

[Partly from https://blog.statsbot.co/time-series-prediction-using-recurrent-neural-networks-lstms-807fa6ca7f]

SLIDE 21

Text transfer model

[Architecture diagram: AE and WGAN with the same generator architecture; EM distance computed in the code space; one decoder per class]

SLIDE 22

Experiment #1: effects of regularizing with WGAN

Checkpoint 1: How does the norm of c’ behave over training?

[Plot: the L2 norm of c’ matches the L2 norm of c over training]

[From Adversarially Regularized Autoencoders by Zhao et al, 2017]

SLIDE 23

Experiment #1: effects of regularizing with WGAN

Checkpoint 2: How does the encoding space behave? Is it noisy?

[Plot: the sums of dimension-wise variances of c’ and c match over time]

[From Adversarially Regularized Autoencoders by Zhao et al, 2017]

SLIDE 24

Experiment #1: effects of regularizing with WGAN

Checkpoint 3: Choose one sentence, then 100 other sentences within an edit distance of at most 5, and compute the average cosine similarity in latent space. The encoder maps similar inputs to nearby codes.

[From Adversarially Regularized Autoencoders by Zhao et al, 2017]

SLIDE 25


Experiment #1: effects of regularizing with WGAN

Checkpoint 4: Swap k words from an original sentence. Left: reconstruction error (NLL). Right: reconstruction examples.

[From Adversarially Regularized Autoencoders by Zhao et al, 2017]

SLIDE 26

Experiment #2: unaligned text transfer

[Diagram: encode all sentences; decode positive sentences with one decoder and negative sentences with the other]

Remove sentiment information from the latent space:

  • At training time: adversarial training.
  • At test time: pass sentences of one class and decode with the decoder from the other class (see the sketch below).

[Partly from https://blog.statsbot.co/time-series-prediction-using-recurrent-neural-networks-lstms-807fa6ca7f]
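At test time the transfer itself is just encode-then-decode with the attribute flipped; schematically, reusing the hypothetical RNNEncoder and ConditionalDecoder sketched earlier:

    import torch

    # `positive_tokens` is a (1, seq_len) tensor of word ids; label index 1
    # standing for "negative" is an arbitrary assumption of this sketch.
    def transfer_to_negative(positive_tokens, encoder, decoder, max_len=20):
        c = encoder(positive_tokens)                 # attribute-invariant code
        y_neg = torch.tensor([1])                    # flipped sentiment label
        logits = decoder(c, y_neg, seq_len=max_len)  # decode with the other attribute
        return logits.argmax(dim=-1)                 # transferred token ids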

SLIDE 27

Experiment #2: unaligned text transfer

Results:

  • Better transfer
  • Better perplexity
  • Transferred text less similar to original text

[From Adversarially Regularized Autoencoders by Zhao et al, 2017]

SLIDE 28

Experiment #3: semi-supervised classification

SNLI dataset:

  • 570k human-written English sentence pairs
  • 3 classes: entailment, contradiction, neutral

Labeled subsets: Medium: 22.2% of labels; Small: 10.8% of labels; Tiny: 5.25% of labels

[From Adversarially Regularized Autoencoders by Zhao et al, 2017]

SLIDE 29

Playground: latent space interpolation


[From Adversarially Regularized Autoencoders by Zhao et al, 2017]
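A minimal sketch of what such an interpolation typically does (my own illustration): move along the straight line between two codes and decode each intermediate point.

    import torch

    def interpolate_codes(c1, c2, steps=8):
        # Return `steps` codes on the line from c1 to c2, both of shape (code_dim,).
        alphas = torch.linspace(0.0, 1.0, steps).view(-1, 1)
        return (1.0 - alphas) * c1 + alphas * c2     # (steps, code_dim)

    # Usage sketch: decode each row with the (hypothetical) decoder from earlier,
    # e.g. decoder(interpolate_codes(encoder(x1), encoder(x2)), y, seq_len=20).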

SLIDE 30

Conclusion about Adversarially Regularized AEs

Pros:

✓ Better discrete autoencoder
  • Semi-supervision
  • Text transfer
✓ Different approach to text generation
✓ Robust latent space

Cons:

❖ Sensitive to hyperparameters (GANs…)
❖ Unclear why WGAN
❖ Not much novelty compared to Adversarial Autoencoders (AAE)
❖ Discrete data, but no discrete latent structure :/