Neural Discrete Representation Learning (VQ-VAE) Aäron van den Oord, Oriol Vinyals, Koray Kavukcuoglu Google DeepMind NIPS 2017
Neural Discrete Representation Learning 1. What is the task? 2. Comparison & Contribution 3. VQ-VAE Model 4. Results 1. Density estimation & Reconstruction 2. Sampling 3. Speech 5. Discussion & Conclusion
What is the task?
• Task 1: Density estimation: learn p(x)
• Task 2: Extract a meaningful latent variable (unsupervised)
• Task 3: Reconstruct the input
(Figure: input x → latent z → output x′)
Comparison & Contribution
1. Bounds p(x), but does not require a variational approximation
2. Trained using maximum likelihood (stable training)
3. First to use discrete latent variables successfully
4. Uses the whole latent space (avoids 'posterior collapse')
(Figure caption: "A little girl sitting on a bed with a teddy bear")
After discussion: Why is discrete nice? It is a more natural representation for humans; it avoids posterior collapse (the latent space is easier to manage via the dictionary); it is compressible; and it is easier to learn a prior over a discrete latent space (more tractable than a continuous one).
Autoencoder
• For the example, we take the latent variable to be a 4 × 4 image with 2 channels.
(Figure: input → latent variable → output/reconstruction)
• We can train this system end-to-end using MSE (reconstruction loss).
• But how do we discretize?
How to Discretize?
• The latent is a 4 × 4 image with 2 channels; each dictionary element e has 2 dimensions.
• We plot all 16 latent pixel values in 2D (since we have 2 channels).
(Figure: scatter plot with axes channel 1 and channel 2)
How to Discretize?
• Make a dictionary of vectors e_1, …, e_L.
• Since the latent is a 4 × 4 image with 2 channels, each e_j has 2 dimensions.
How to Discretize?
• Make a dictionary of vectors e_1, …, e_L; each e_j has 2 dimensions.
• For each latent pixel, look up the nearest dictionary element.
(Figure: dictionary vectors e_1, e_2, e_3 plotted among the latent pixel values)
How to Discretize?
• 4 × 4 latent image with 2 channels; each e_j has 2 dimensions.
(Figure: each latent pixel value replaced by its nearest dictionary vector, e.g. e_3)
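The nearest-dictionary-element lookup above can be sketched in a few lines of NumPy. The sizes (4 × 4 × 2 latent, a dictionary of L = 3 vectors) follow the running example; the random values are placeholders, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
z_e = rng.normal(size=(4, 4, 2))      # encoder output: one 2-D vector per latent pixel
dictionary = rng.normal(size=(3, 2))  # codebook e_1..e_3, each 2-D

# For every latent pixel, find the index of the nearest dictionary vector
# (squared Euclidean distance), then replace the pixel with that vector.
flat = z_e.reshape(-1, 2)                                            # (16, 2)
dists = ((flat[:, None, :] - dictionary[None, :, :]) ** 2).sum(-1)   # (16, 3)
ids = dists.argmin(axis=1)           # discrete latent: 16 dictionary ids
z_q = dictionary[ids].reshape(4, 4, 2)  # quantized latent map fed to the decoder
```

The `ids` array is exactly the 1-channel discrete latent the later "Proposed Model" slide describes.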
Proposed Model
(Figure: input → latent variable → output/reconstruction)
• The latent is a 1-channel image containing, for each pixel, the id of its nearest dictionary element e (discrete).
How to train?
• No time to discuss … see slides 18–19.
• Let's talk about results.
R1: Density Estimation & Reconstructions
• Comparable with a VAE on CIFAR-10 in terms of density estimation
• Reconstructions on ImageNet are very good
• ImageNet: 128 × 128 × 3 × 8 = 393216 bits = 48 KB
• Reconstruction: 32 × 32 × 9 = 9216 bits ≈ 1 KB
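The bit counts on this slide can be checked with simple arithmetic; the 9 bits per latent id assumes a 512-entry dictionary (2⁹ = 512):

```python
# ImageNet input: 128x128 RGB pixels, 8 bits per channel.
original_bits = 128 * 128 * 3 * 8   # 393216 bits = 48 KB
# Latent: a 32x32 grid of dictionary ids, 9 bits each (512-entry dictionary).
latent_bits = 32 * 32 * 9           # 9216 bits, about 1 KB
ratio = original_bits / latent_bits # roughly 43x fewer bits
```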
R2: Sampling / Generation
• Samples from the PixelCNN prior (class: pickup) lack global structure and are unsharp.
• A single PixelCNN is not powerful enough; a hierarchical representation is necessary.
R3: Stacking VQ-VAE
• No time to discuss … see slides 20–22.
• Let's go to R4: Speech.
R4: Speech
• Decoder: WaveNet (state-of-the-art speech generation)
• Excellent speech reconstruction
• Sampling results
• Unsupervised learning
• Voice style transfer
• Learns phonemes (classifying from latents: 49.3% accuracy vs. 7.2% for a random baseline)
• https://avdnoord.github.io/homepage/vqvae/
Discussion and Conclusion
• Impressive results & a good idea
• Paper
  • Glosses over many details; supplement & implementation are missing
  • Are the learned latents useful? This should be addressed quantitatively
• Image generation can be greatly improved
  • Using a hierarchical model as in Lampert (previous coffee talk) should greatly improve speed and quality
Thanks!
• Slides by the author: https://drive.google.com/file/d/1t8W2L1H2RtUge-IQYqGXa9ihKNVQpqNI/view
• Talk by the author: https://www.youtube.com/watch?v=HqaIkq3qH40
How to train? (1/2)
• How do we backpropagate through the discretization?
• Say a gradient arrives at a dictionary vector, e.g. e_3.
• We do not update the dictionary vector with it (it is treated as fixed here).
• Instead, we copy the gradient arriving at e_3 straight through to the non-discretized encoder output.
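This "copy the gradient past the quantizer" step (the straight-through estimator) can be sketched for a single hypothetical latent pixel; the dictionary, encoder output, and incoming gradient below are illustrative values, not from the paper:

```python
import numpy as np

# Hypothetical 2-entry dictionary and one encoder output vector.
dictionary = np.array([[0.0, 0.0], [1.0, 1.0]])
z_e = np.array([0.9, 0.8])                       # non-discretized encoder output

# Forward pass: quantize to the nearest dictionary vector.
j = ((dictionary - z_e) ** 2).sum(-1).argmin()   # nearest code index
z_q = dictionary[j]                              # value the decoder actually sees

# Backward pass: the quantization step is treated as the identity, so the
# gradient arriving at z_q is copied unchanged to z_e (straight-through).
grad_z_q = np.array([0.3, -0.2])  # gradient coming back from the decoder
grad_z_e = grad_z_q               # copied, not differentiated through argmin
```

In an autograd framework the same trick is usually written as `z_q = z_e + stop_gradient(z_q - z_e)`, which gives the identical forward value and gradient flow.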
How to train? (2/2)
• Loss part 1: reconstruction error (dictionary held fixed)
• Loss part 2: a term to update the dictionary
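For reference, the full objective from the paper combines these parts (sg is the stop-gradient operator, z_e and z_q are the encoder output and its quantization, and β weights the commitment term):

```latex
L = \log p\bigl(x \mid z_q(x)\bigr)
  + \bigl\| \mathrm{sg}[z_e(x)] - e \bigr\|_2^2
  + \beta \, \bigl\| z_e(x) - \mathrm{sg}[e] \bigr\|_2^2
```

The second term moves the dictionary vectors toward the encoder outputs; the third keeps the encoder committed to its chosen dictionary vector.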
R3: Stacking VQ-VAE (1/2)
• VQ-VAEs are stacked to obtain higher-level latents
• Uses DeepMind Lab (synthetic images)
• Errors: lack of sharpness and global mismatch
• The latents seem 'useful': coherent video can be generated from the latent space (input: first 6 frames; output: video)
• No quantitative experiment
(Figure: original, 169344 bits ≈ 21 KB; reconstruction, 27 bits)
R3: Stacking VQ-VAE (2/2) Generated Video
Multistage VQ-VAE
• Input: 84 × 84 × 3 image, values in [0, 256)
• First latent: 21 × 21 × 1, ids in [0, 512)
• After a further VQ stage: 3 latents, ids in [0, 512)
• Before: 84 × 84 × 3 × 8 = 169344 bits = 21168 bytes ≈ 21 KB
• After: 3 × 9 = 27 bits
• The reconstruction is not very accurate, but it is a powerful representation.
Comparison
• Models compared: GAN, Variational Autoencoder, PixelCNN, VQ-VAE (this talk)
• Criteria: compute exact likelihood p(x); has latent variable z; compute latent variable z (inference); discrete latent variable; stable training? sharp images?
(Table: per-model check marks; one cell is marked '?')