Improving PixelCNN
• There is a problem with this form of masked convolution: stacking layers of masked convolutions creates a blind spot.
• Solution: use two stacks of convolutions, a vertical stack and a horizontal stack.
(Figure: receptive field with the blind spot highlighted; vertical and horizontal stacks.)
Improving PixelCNN I
• There is a problem with this form of masked convolution.
• Stacking layers of masked convolutions creates a blind spot.
(Figure: 5x5 convolution mask, with 1s for the pixels above and to the left of the current pixel and 0s elsewhere; the blind-spot region is highlighted.)
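As a concrete illustration (my own sketch, not code from the slides), a PixelCNN-style masked convolution can be built by zeroing the kernel weights that would see the current pixel or any pixel after it in raster order; stacking such layers is exactly what produces the blind spot described above.

```python
# Hedged sketch of a PixelCNN-style masked convolution (illustrative only).
# Mask type 'A' hides the centre pixel (first layer); type 'B' allows it.
import torch
import torch.nn as nn

class MaskedConv2d(nn.Conv2d):
    def __init__(self, mask_type, *args, **kwargs):
        super().__init__(*args, **kwargs)
        assert mask_type in ('A', 'B')
        kH, kW = self.kernel_size
        mask = torch.ones(kH, kW)
        # zero the centre pixel (type A only) and everything to its right
        mask[kH // 2, kW // 2 + (mask_type == 'B'):] = 0
        mask[kH // 2 + 1:, :] = 0          # zero all rows below the centre
        self.register_buffer('mask', mask[None, None])

    def forward(self, x):
        self.weight.data *= self.mask      # hide "future" pixels
        return super().forward(x)

# Example: first layer of a PixelCNN over single-channel images.
layer = MaskedConv2d('A', in_channels=1, out_channels=16,
                     kernel_size=5, padding=2)
out = layer(torch.zeros(1, 1, 28, 28))     # shape (1, 16, 28, 28)
```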
Improving PixelCNN II
• Use a more expressive, gated nonlinearity:
  h_{k+1} = tanh(W_{k,f} * h_k) ⊙ σ(W_{k,g} * h_k)
• This information flow (between the vertical and horizontal stacks) preserves the correct pixel dependencies.
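A minimal sketch of this gated activation unit (my own illustration, following the formula above: one set of feature maps goes through tanh, the other through a sigmoid gate, combined elementwise):

```python
# Gated activation: h_{k+1} = tanh(W_f * h_k) ⊙ sigmoid(W_g * h_k)
import torch
import torch.nn as nn

class GatedActivation(nn.Module):
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        self.conv_f = nn.Conv2d(channels, channels, kernel_size, padding=pad)
        self.conv_g = nn.Conv2d(channels, channels, kernel_size, padding=pad)

    def forward(self, h):
        return torch.tanh(self.conv_f(h)) * torch.sigmoid(self.conv_g(h))
```

In the actual gated PixelCNN the two convolutions are themselves masked and split between the vertical and horizontal stacks; this sketch shows only the nonlinearity.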
Samples from PixelCNN: CIFAR-10
• Samples from a class-conditioned PixelCNN (class: Coral Reef)

Samples from PixelCNN: CIFAR-10
• Samples from a class-conditioned PixelCNN

Samples from PixelCNN: CIFAR-10
• Samples from a class-conditioned PixelCNN (class: Sandbar)
Neural Image Model: PixelRNN
• Convolutional Long Short-Term Memory (Row LSTM)
(Figure: pixel grid x_1 ... x_{n^2}; P(x_i) is produced by an LSTM scanning the image.)
Stollenga et al., 2015; van den Oord, Kalchbrenner, Kavukcuoglu, 2016
Neural Image Model: PixelRNN
• PixelRNN: multiple layers of convolutional LSTM (see the sketch below)
(Figure: P(x_i) over the pixel grid x_1 ... x_{n^2}, computed by the LSTM.)
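To make the "convolutional LSTM" idea concrete, here is a rough sketch (my own, not code from the papers) of a Row-LSTM-style step: the gates for a whole row are computed with 1-D convolutions of the current input row and of the previous row's hidden state.

```python
# Hedged sketch of a 1-D convolutional LSTM step applied row by row.
import torch
import torch.nn as nn

class RowConvLSTMCell(nn.Module):
    def __init__(self, in_ch, hidden_ch, k=3):
        super().__init__()
        pad = k // 2
        # input-to-state and state-to-state 1-D convolutions along the row,
        # producing all four gates (i, f, o, g) at once
        self.x2s = nn.Conv1d(in_ch, 4 * hidden_ch, k, padding=pad)
        self.h2s = nn.Conv1d(hidden_ch, 4 * hidden_ch, k, padding=pad)

    def forward(self, x_row, h_prev, c_prev):
        gates = self.x2s(x_row) + self.h2s(h_prev)
        i, f, o, g = gates.chunk(4, dim=1)
        c = torch.sigmoid(f) * c_prev + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c   # hidden and cell state for this row

# Scanning an image top to bottom means calling the cell once per row with
# the previous row's (h, c). In the real PixelRNN the input convolution is
# additionally masked so each pixel only sees pixels above and to its left.
```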
Samples from PixelRNN

Samples from PixelRNN

Samples from PixelRNN
Architecture for 1D sequences (ByteNet / WaveNet)
- Stack of dilated, masked 1-D convolutions in the decoder (see the sketch below)
- The architecture is parallelizable along the time dimension (during training or scoring)
- Easy access to many states from the past
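A rough sketch of such a decoder stack (illustrative, not the exact ByteNet/WaveNet architecture): each layer is a 1-D convolution made causal by left-padding, and the dilation doubles per layer so the receptive field grows exponentially.

```python
# Hedged sketch of a stack of dilated, causal (masked) 1-D convolutions.
import torch
import torch.nn as nn

class CausalConv1d(nn.Module):
    def __init__(self, channels, kernel_size=2, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation           # left padding only
        self.conv = nn.Conv1d(channels, channels, kernel_size,
                              dilation=dilation)

    def forward(self, x):                                 # x: (B, C, T)
        x = nn.functional.pad(x, (self.pad, 0))           # no peeking ahead
        return self.conv(x)

# Dilations 1, 2, 4, 8, ... give an exponentially growing receptive field,
# and all time steps are computed in parallel during training/scoring.
stack = nn.Sequential(*[nn.Sequential(CausalConv1d(64, 2, 2 ** i), nn.ReLU())
                        for i in range(6)])
y = stack(torch.zeros(1, 64, 100))                        # shape (1, 64, 100)
```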
Video Pixel Network
• Masked convolution

Video Pixel Network
VPN Samples for Moving MNIST
(Comparison: a baseline with no frame dependencies vs. the VPN. Videos at nal.ai/vpn.)
VPN Samples for Robotic Pushing
(Comparison: a baseline with no frame dependencies vs. the VPN. Videos at nal.ai/vpn.)

VPN Samples for Robotic Pushing
Variational Autoencoders
Variational Auto-Encoders in General
• Variational auto-encoder (VAE): amortised variational inference for latent variable models.
  F(q) = E_{q_φ(z|x)}[log p_θ(x|z)] - KL[q_φ(z|x) ‖ p(z)]
• Design choices:
  - Prior on the latent variable p(z): continuous, discrete, Gaussian, Bernoulli, mixture
  - Likelihood function p(x|z): iid (static), sequential, temporal, spatial
  - Approximating posterior q(z|x): distribution, sequential, spatial
• For scalability and ease of implementation:
  - Stochastic gradient descent (and variants)
  - Stochastic gradient estimation
(Figure: inference network q(z|x) maps data x to z ~ q(z|x); model network p(x|z) maps z back to x ~ p(x|z).)
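As a reminder of where the bound F(q) comes from (a standard derivation, not spelled out on this slide), apply Jensen's inequality to the marginal likelihood:

\[
\log p_\theta(x) \;=\; \log \mathbb{E}_{q_\phi(z|x)}\!\left[\frac{p_\theta(x|z)\,p(z)}{q_\phi(z|x)}\right]
\;\ge\; \mathbb{E}_{q_\phi(z|x)}\big[\log p_\theta(x|z)\big] \;-\; \mathrm{KL}\big[q_\phi(z|x)\,\|\,p(z)\big] \;=\; F(q).
\]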
Variational Autoencoders (VAEs)
• The VAE defines a generative process in terms of ancestral sampling through a cascade of hidden stochastic layers:
  h^3 → h^2 → h^1 → v (input data), with weights W_3, W_2, W_1.
• Each term may denote a complicated nonlinear relationship.
• θ denotes the parameters of the VAE.
• L is the number of stochastic layers.
• Sampling and probability evaluation is tractable for each conditional.
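The joint distribution implied by this cascade (reconstructed here from the diagram; the slide's own equation did not survive extraction) factorizes as

\[
p_\theta(v, h^1, h^2, h^3) \;=\; p(h^3)\, p_\theta(h^2 \mid h^3)\, p_\theta(h^1 \mid h^2)\, p_\theta(v \mid h^1),
\qquad
p_\theta(v) \;=\; \sum_{h^1, h^2, h^3} p_\theta(v, h^1, h^2, h^3),
\]

with the sum replaced by an integral when the latent layers are continuous.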
Variational Autoencoders (VAEs)
• The VAE defines a generative process in terms of ancestral sampling through a cascade of hidden stochastic layers.
• Each conditional term denotes a one-layer neural net: a stochastic layer feeding a deterministic layer feeding the next stochastic layer.
• θ denotes the parameters of the VAE.
• L is the number of stochastic layers.
• Sampling and probability evaluation is tractable for each conditional.
Variational Bound
• The VAE is trained to maximize the variational lower bound.
• The bound trades off the data log-likelihood against the KL divergence from the true posterior.
• It is hard to optimize the variational bound with respect to the recognition network: naive gradient estimators have high variance.
• The key idea of Kingma and Welling is to use the reparameterization trick.
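For completeness (this identity is implicit rather than written out on the slide), the exact relationship behind the trade-off is

\[
\log p_\theta(x) \;=\; F(q) \;+\; \mathrm{KL}\big[q_\phi(z \mid x)\,\|\,p_\theta(z \mid x)\big],
\]

so maximizing the bound F(q) both raises the log-likelihood and pulls the recognition distribution toward the true posterior.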
Reparameterization Trick
• Assume that the recognition distribution is Gaussian, with mean and covariance computed from the state of the hidden units at the previous layer.
• Alternatively, we can express this in terms of an auxiliary variable ε ~ N(0, I).
• The recognition distribution can then be expressed in terms of a deterministic mapping (a deterministic encoder); the distribution of ε does not depend on φ.
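A minimal sketch of this mapping for a diagonal Gaussian (my own illustration, not code from the lecture): the sample is a deterministic function of the encoder outputs and of noise drawn from a fixed distribution.

```python
# Reparameterization trick for a diagonal Gaussian q(z|x): a hedged sketch.
import torch

def reparameterize(mu, log_var):
    """Return z ~ N(mu, diag(exp(log_var))) as a deterministic function of
    (mu, log_var) and auxiliary noise eps ~ N(0, I)."""
    eps = torch.randn_like(mu)                 # eps does not depend on phi
    return mu + torch.exp(0.5 * log_var) * eps
```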
Reparameterization Trick
(Figure: two computation graphs, without and with the reparameterization trick. Without it, z is sampled directly from q(z|x) inside the network, so gradients cannot flow through the sampling node; with it, z = μ(x) + σ(x) ⊙ ε with ε sampled externally, so gradients flow through the encoder. Image: Carl Doersch.)
Computing the Gradients
• The gradient w.r.t. the parameters, both recognition and generative, can be computed by backprop: for a fixed sample of ε, the mapping to h is a deterministic neural net.
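A hedged sketch of how this looks in practice: a single-sample Monte Carlo estimate of the negative bound, with autograd backpropagating through the reparameterized sample to both the recognition and generative parameters (the `encoder` and `decoder` modules here are placeholders, not the lecture's model).

```python
# Hedged sketch: single-sample Monte Carlo estimate of the negative
# variational bound for a Bernoulli decoder; autograd differentiates
# through the reparameterized sample.
import torch
import torch.nn.functional as F

def neg_elbo(x, encoder, decoder):
    mu, log_var = encoder(x)                      # recognition network q(z|x)
    z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)  # reparameterize
    logits = decoder(z)                           # generative network p(x|z)
    recon = F.binary_cross_entropy_with_logits(logits, x, reduction='sum')
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
    return recon + kl                             # minimizing maximizes the bound

# neg_elbo(x, encoder, decoder).backward() yields gradients w.r.t. both the
# recognition (phi) and generative (theta) parameters.
```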
Implementing a Variational Algorithm
• Variational inference turns integration into optimization.
• Automated tools:
  - Differentiation: Theano, Torch7, TensorFlow, Stan
  - Message passing: Infer.NET
  - Stochastic gradient descent and other preconditioned optimization
• The same code can run on GPUs or on distributed clusters.
• Probabilistic models are modular and can easily be combined.
• Ideally we want probabilistic programming using variational inference.
(Figure: forward pass through the prior p(z), inference network q(z|x), model p(x|z) and data x; the backward pass accumulates log p(x|z), log p(z) and the entropy H[q(z)] into the gradients ∇_θ and ∇_φ.)
Latent Gaussian VAE (Deep Latent Gaussian Model)
• Prior: p(z) = N(0, I)
• Likelihood: p_θ(x|z) = N(μ_θ(z), Σ_θ(z)), i.e. p(x | f_θ(z))
• Approximate posterior: q_φ(z|x) = N(μ_φ(x), Σ_φ(x))
• Bound: F(x, q) = E_{q(z)}[log p(x|z)] - KL[q(z) ‖ p(z)]
• All functions are deep networks.
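One detail left implicit here: for this choice of Gaussians the KL term has a closed form. With q_φ(z|x) = N(μ, diag(σ²)) and p(z) = N(0, I),

\[
\mathrm{KL}\big[q_\phi(z|x)\,\|\,p(z)\big] \;=\; \tfrac{1}{2}\sum_{d}\big(\mu_d^2 + \sigma_d^2 - \log \sigma_d^2 - 1\big).
\]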
Latent Gaussian VAE
• The latent space disentangles the input data.
• The latent space and the likelihood bound give a visualisation of importance.
(Figures: a 3-dimensional latent variable fitted to MNIST; a latent factor embedding plotted against Factor 1 and Factor 2, with points coloured by their contribution to p(Y|θ); annotations include "Oxygen/Swimmers moving left".)
VAE Representations
• Representations are useful for strategies such as episodic control.
(Figure: a learned representation used to select actions a_1, a_2, a_3.)
Blundell, Charles, Benigno Uria, Alexander Pritzel, Yazhe Li, Avraham Ruderman, Joel Z. Leibo, Jack Rae, Daan Wierstra, and Demis Hassabis. "Model-Free Episodic Control." 2016.
Latent Gaussian VAE
• We require flexible approximations for the types of posteriors we are likely to see.
Latent Binary VAE: Deep Auto-Regressive Networks (DARN)
• Prior: p(z) = ∏_i p(z_i | z_<i), with p(z_i | z_<i) = Bern(z_i | f(z_<i))
• Likelihood: p(x|z) = ∏_i p(x_i | x_<i, z) = ∏_i Bern(x_i | f_θ(x_<i, z))
• Approximate posterior: q_φ(z) = ∏_i q_φ(z_i | z_<i) = ∏_i Bern(z_i | f_φ(z_<i))
Gregor, Karol, Ivo Danihelka, Andriy Mnih, Charles Blundell, and Daan Wierstra. "Deep AutoRegressive Networks." 2013.
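As a small illustration of what such an autoregressive Bernoulli prior looks like operationally (my own sketch, not the DARN implementation: each conditional here is just a masked linear function of the prefix, whereas DARN uses a deeper network), sampling proceeds one latent bit at a time, each conditioned on the bits already drawn.

```python
# Hedged sketch of p(z) = prod_i Bern(z_i | f(z_<i)) with a masked linear f.
import torch
import torch.nn as nn

class AutoregressiveBernoulliPrior(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.dim = dim
        self.W = nn.Parameter(torch.zeros(dim, dim))
        self.b = nn.Parameter(torch.zeros(dim))

    def logits(self, z):
        # strictly lower-triangular mask: z_i depends only on z_<i
        W = torch.tril(self.W, diagonal=-1)
        return z @ W.t() + self.b

    def sample(self, batch=1):
        z = torch.zeros(batch, self.dim)
        for i in range(self.dim):              # ancestral sampling, bit by bit
            p_i = torch.sigmoid(self.logits(z)[:, i])
            z[:, i] = torch.bernoulli(p_i)
        return z

    def log_prob(self, z):
        logp = -nn.functional.binary_cross_entropy_with_logits(
            self.logits(z), z, reduction='none')
        return logp.sum(dim=-1)                # sum of log Bernoulli terms
```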