PixelCNN Models with Auxiliary Variables for Natural Image Modeling
Alexander Kolesnikov*, Christoph H. Lampert* (*IST Austria)
ICML 2017
PixelCNN Models with Auxiliary Variables
1. What is the task?
2. PixelCNN model (recap of last coffee talk)
3. Proposed models
   a) Grayscale PixelCNN
   b) Pyramid PixelCNN
4. Conclusion
What is the task? Density estimation
• Task:
  • Input: training set of images
  • Output: model estimating p(x)
  • Evaluation: measure p(x) on a test set; higher p(x) is better
  • Note: p(x) should be normalized
• Why learn p(x)?
  • Representation learning
  • Image reconstruction
  • Deblurring
  • Super-resolution
  • Image compression
  • …
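Density-estimation papers on CIFAR-10 usually report the (normalized) likelihood as bits per dimension rather than raw p(x). A minimal sketch of that conversion, with the example numbers chosen purely for illustration:

```python
import numpy as np

def bits_per_dim(log_likelihood_nats, num_pixels, num_channels=3):
    """Convert a model's total log-likelihood for one image (in nats)
    into the bits-per-dimension score used on CIFAR-10.
    Lower bits/dim corresponds to higher p(x)."""
    num_dims = num_pixels * num_channels
    return -log_likelihood_nats / (num_dims * np.log(2.0))

# Hypothetical example: a 32x32 RGB image with total log p(x) = -6500 nats
print(bits_per_dim(-6500.0, 32 * 32))
```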
Recap of PixelCNN
• PixelCNN is an autoregressive generative model (convolutional, not recurrent; PixelRNN is the recurrent variant)
• Input: previously generated pixels
• Output: pdf (prediction) for the next pixel
• Pros:
  • Can compute p(x) (unlike GANs)
  • Train by maximum likelihood
  • Stable training (unlike GANs)
  • Generates sharp images (unlike VAE)
• Cons:
  • No latent variables
  • Generation of images is very slow because pixels are produced one at a time
  • Incoherent global image structure
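The autoregressive structure above is enforced with masked convolutions: each pixel's prediction may only depend on pixels above it and to its left. A minimal sketch of the mask (the rest of the architecture is omitted):

```python
import numpy as np

def causal_mask(kernel_size, mask_type="A"):
    """Mask applied to a PixelCNN convolution kernel: keep positions
    strictly above the center, and to the left of it in the center row.
    Type 'A' (first layer) also hides the center pixel itself;
    type 'B' (later layers) keeps it."""
    k = kernel_size
    mask = np.zeros((k, k), dtype=np.float32)
    mask[: k // 2, :] = 1.0           # all rows above the center
    mask[k // 2, : k // 2] = 1.0      # left of center in the center row
    if mask_type == "B":
        mask[k // 2, k // 2] = 1.0    # later layers may see the center
    return mask

print(causal_mask(3, "A"))
```

Multiplying each convolution kernel by this mask before applying it guarantees that p(x) factorizes over pixels in raster-scan order.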
First proposed model: Grayscale PixelCNN
• PixelCNN models low-level features well, but not global structure: the likelihood is dominated by low-level details.
• Idea: split the model into two PixelCNNs.
  • First PixelCNN models global structure. Output: a grayscale version of the image with 4 bits per pixel (the auxiliary variable).
  • Second PixelCNN models low-level details. Input: the output of the first model, fed through a deep CNN feature extractor. Output: the 24-bit color image.
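At training time the auxiliary variable is computed deterministically from each training image. A minimal sketch of that quantization step (the exact grayscale conversion used in the paper may differ; plain channel averaging is an assumption here):

```python
import numpy as np

def grayscale_4bit(rgb):
    """Auxiliary variable for a Grayscale-PixelCNN-style model (sketch):
    convert an RGB image (uint8, HxWx3) to grayscale and quantize it
    to 4 bits per pixel, i.e. 16 intensity levels."""
    gray = rgb.astype(np.float32).mean(axis=-1)          # simple luminance proxy
    return np.floor(gray / 256.0 * 16).astype(np.uint8)  # values in 0..15
```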
Grayscale PixelCNN: Results
• State-of-the-art performance on the CIFAR-10 test set
• Samples are highly diverse and have coherent global structure
• No overfitting (train loss ≈ test loss)
• Decomposing the likelihood into its two parts confirms that low-level details indeed dominate the likelihood objective.
• Because Grayscale PixelCNN uses two separate models, the two objectives do not interfere.
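The two-part decomposition mentioned above can be written out explicitly. Since the auxiliary variable is a deterministic function of the image, the joint likelihood splits into the two factors that the two PixelCNNs model:

```latex
% \hat{x} is the 4-bit grayscale auxiliary variable derived from x.
\log p(x, \hat{x})
  = \underbrace{\log p(\hat{x})}_{\text{global structure (model 1)}}
  + \underbrace{\log p(x \mid \hat{x})}_{\text{low-level detail (model 2)}}
```

Comparing the magnitudes of the two terms is what shows that low-level detail carries most of the objective.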
Pyramid PixelCNN
• Motivations: (1) asymmetry: in raster-scan order, the lower-right pixel has access to far more information than the upper-left one; (2) speed up the model.
• Idea: a multiscale pyramid. Each scale is a PixelCNN conditioned, through a very deep CNN, on the previous, coarser scale (P1 → P2 → P3 → P4 → P5 in the figure).
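The pyramid idea can be sketched as a coarse-to-fine sampling loop. Everything here is schematic: `models` is a hypothetical mapping from resolution to a sampler, standing in for the per-scale PixelCNNs and conditioning CNNs:

```python
import numpy as np

def sample_pyramid(models, base_size=4, final_size=32):
    """Multiscale sampling sketch for a Pyramid-PixelCNN-style model.
    `models[s]` is a hypothetical sampler sample(size, conditioning) -> image.
    The output of each scale conditions the next, so every pixel sees
    information from the whole (coarser) image, and most autoregressive
    computation runs at low resolution -- the source of the speed-up."""
    image = models[base_size](base_size, conditioning=None)  # coarsest scale
    size = base_size
    while size < final_size:
        size *= 2
        image = models[size](size, conditioning=image)  # refine at 2x resolution
    return image
```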
Pyramid PixelCNN: Results (1/2)
• Close to state of the art on CIFAR-10
• Speed-up factor of at least 10x
• Evaluation on CelebA (figure: MAP reconstructions)
Pyramid PixelCNN: Results (2/2)
Conclusions
• Low-level details distract models from learning high-level structure
• Use two models (a low-level model and a high-level model)
• A multiscale architecture can model high-resolution faces
• Next coffee talk: “Neural Discrete Representation Learning”
Grayscale results
Grayscale samples, colored