Pixel Recurrent Neural Networks


  1. Pixel Recurrent Neural Networks. Aaron van den Oord, Nal Kalchbrenner, Koray Kavukcuoglu (Google DeepMind), ICML'16, 188 citations

  2. Pixel Recurrent Neural Networks
     1. What is the task?
     2. Other models: GAN, VAE, …
     3. PixelCNN model
     4. Results
     5. Discussion & Conclusion
     6. Extensions (preview of next coffeetalk)

  3. Goal: learning the distribution of natural images
     • Task:
       • Input: training set of images
       • Output: model that estimates p(x) for any image x
       • Evaluation: measure p(x) on the test set. Higher p(x) is better.
       • Note: p(x) should be normalized.
     • Why learn p(x)?
       • Image reconstruction / inpainting / denoising: input a corrupted image, output the fixed image
       • Image colorization: input a greyscale image, output a color image
       • Semi-supervised learning (low-density separation)
       • Representation learning (find the manifold of natural images)
       • Dimensionality reduction / finding variations in the data
       • Clustering
       • …

  4. Other approaches
                                                GAN   VAE   PixelCNN (this talk)   Invertible models (Real NVP)
     Compute exact likelihood p(x)              no    no    yes                    yes
     Has latent variable z                      yes   yes   no                     yes
     Compute latent variable z (inference)      no    yes   no                     yes
     Stable training? (no mode collapse)        no    yes   yes                    ?
     Sharp images?                              yes   no    yes                    ?

  5. Pixel CNN (1/2)
     • Why is computing q(y) so difficult?
       • This is the reason why GANs avoid it and VAEs approximate it.
       • Answer: normalization of q(y). We would need to integrate the model output over all possible images, which is intractable.
     • Pixel CNN computes q(y) using the chain rule of probability:
       q(y) = q(y_4 | y_3, y_2, y_1) · q(y_3 | y_2, y_1) · q(y_2 | y_1) · q(y_1)
     • The function q(y_j | y_{j-1}, …, y_1) is modeled using a CNN.
     • This 1-D function is easy to keep normalized.
     • If every conditional density is normalized, q(y) is properly normalized as well (see the small check sketched below).
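A minimal numerical check of the normalization argument, not from the paper: a toy autoregressive model over four binary pixels, with arbitrary weights standing in for the CNN. Because each conditional is a normalized (sigmoid) distribution, the chain-rule product q(y) sums to 1 over all 2^4 possible images.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
# Toy "network": for pixel j, a linear function of the previous pixels squashed
# through a sigmoid gives q(y_j = 1 | y_{j-1}, ..., y_1).
# The weights are arbitrary -- normalization does not depend on them.
weights = rng.normal(size=(4, 4))
biases = rng.normal(size=4)

def conditional(j, prev):
    """q(y_j = 1 | y_1..y_{j-1}); prev holds the earlier pixel values (0/1)."""
    logit = biases[j] + sum(weights[j, k] * prev[k] for k in range(j))
    return 1.0 / (1.0 + np.exp(-logit))

def joint(y):
    """q(y) via the chain rule: product of the per-pixel conditionals."""
    p = 1.0
    for j in range(4):
        p1 = conditional(j, y[:j])
        p *= p1 if y[j] == 1 else (1.0 - p1)
    return p

# Summing over all 2^4 binary images gives 1: the joint is properly normalized
# because every 1-D conditional is normalized.
total = sum(joint(y) for y in itertools.product([0, 1], repeat=4))
print(total)  # ~1.0 up to floating-point error
```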

  6. Pixel CNN (2/2)
     1. Order the pixels.
     2. Imagine pixels 1-6 have already been generated and we want to predict pixel 7.
     3. Mask pixels 7-16 (set them to 0).
     4. The CNN outputs a normalized histogram for pixel 7 given pixel values 1-6 (the masked input).
     • Maximize the log-likelihood with respect to the CNN parameters (a minimal sketch of the masking idea follows below).
     [Figure: a 4x4 image from the training set and its masked version; the masked image is the CNN input, the output is the distribution for pixel 7]
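The masking can be folded into the convolution itself by zeroing the kernel weights that would look at the current or future pixels. Below is a rough PyTorch sketch, not the paper's full architecture: a single masked convolution producing a 256-way softmax per pixel, trained by minimizing the negative log-likelihood (cross-entropy). The layer sizes and the 7x7 kernel are illustrative choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedConv2d(nn.Conv2d):
    """Convolution that only sees pixels above and to the left of the centre,
    so pixel j is predicted from pixels 1..j-1 only."""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        kh, kw = self.kernel_size
        mask = torch.ones(kh, kw)
        mask[kh // 2, kw // 2:] = 0   # centre pixel and everything to its right
        mask[kh // 2 + 1:, :] = 0     # all rows below the centre
        self.register_buffer("mask", mask)

    def forward(self, x):
        return F.conv2d(x, self.weight * self.mask, self.bias,
                        self.stride, self.padding, self.dilation, self.groups)

# One masked layer mapping a greyscale image to 256 logits per pixel.
model = MaskedConv2d(1, 256, kernel_size=7, padding=3)

def nll_loss(images):
    """Negative log-likelihood of 8-bit greyscale images (values 0..255)."""
    x = images.float().unsqueeze(1) / 255.0        # (B, 1, H, W) in [0, 1]
    logits = model(x)                              # (B, 256, H, W)
    return F.cross_entropy(logits, images)         # targets: (B, H, W) integers

batch = torch.randint(0, 256, (8, 28, 28))         # fake training images
loss = nll_loss(batch)
loss.backward()
```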

  7. Results (1/2)

  8. Results of generating ‘new’ images
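The 'new' images are drawn pixel by pixel in the same raster-scan order used during training: sample pixel 1 from its histogram, write it into the canvas, re-run the network, sample pixel 2, and so on. A hypothetical sampling loop, assuming a model like the masked-convolution sketch above that returns 256 logits per pixel:

```python
import torch

@torch.no_grad()
def sample(model, height=28, width=28):
    """Generate one greyscale image pixel by pixel in raster-scan order."""
    img = torch.zeros(1, 1, height, width)              # start from an all-zero canvas
    for i in range(height):
        for j in range(width):
            logits = model(img)                          # (1, 256, H, W)
            probs = torch.softmax(logits[0, :, i, j], dim=0)
            value = torch.multinomial(probs, 1).item()   # draw a value in 0..255
            img[0, 0, i, j] = value / 255.0              # feed it back in for the next pixel
    return (img * 255).round().to(torch.uint8)           # height*width forward passes: slow
```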

  9. Results & Discussion
     • Sampled images
       • Good local coherence
       • Incoherent global structure
       • Sharp images!
     • SOTA likelihood on CIFAR-10 (NLL = negative log-likelihood in bits per dimension, lower is better)
     • Discussion
       • Slow generation (sequential)
       • No latent representation
       • (Teacher forcing)
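For reference, bits per dimension is just the negative log-likelihood converted from nats to bits and divided by the number of dimensions (32 x 32 x 3 for CIFAR-10). A small helper, with a made-up example number rather than a result from the paper:

```python
import math

def bits_per_dim(nll_nats_total, num_images, dims_per_image=32 * 32 * 3):
    """Convert a summed negative log-likelihood (in nats) over a test set
    into bits per dimension (lower is better)."""
    return nll_nats_total / (num_images * dims_per_image * math.log(2))

# Example: an average NLL of 7000 nats per CIFAR-10 image (a made-up number)
# corresponds to about 3.29 bits/dim.
print(bits_per_dim(7000.0 * 100, num_images=100))
```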

  10. Preview of next coffeetalk
      • PixelCNN++ (faster), Conditional PixelCNN, PixelVAE, …
      • Use a pyramid of PixelCNN models
        • Go from low resolution to high resolution
        • Improves global coherence of generated images
        • Model becomes much faster
        • Decomposition of likelihood (high-level details, low-level details)
      • Next coffeetalk: "PixelCNN with Auxiliary Variables for Natural Image Modeling", C.H. Lampert
      • Want to know more? https://www.cs.toronto.edu/~duvenaud/courses/csc2541/index.html
        A good course on Deep Generative Models (GAN, VAE, PixelCNN, Real NVP, …)
