Variants and Combinations of Basic Models
Stefano Ermon, Aditya Grover
Stanford University
Lecture 12
Summary

Story so far:
- Representation: latent variable vs. fully observed
- Objective function and optimization algorithm: many divergences and distances, optimized via likelihood-free (two-sample test) or likelihood-based methods
- Each has its pros and cons

Plan for today: combining models
Variational Autoencoder

A mixture of an infinite number of Gaussians:
1. $z \sim \mathcal{N}(0, I)$
2. $p(x \mid z) = \mathcal{N}(\mu_\theta(z), \Sigma_\theta(z))$, where $\mu_\theta, \Sigma_\theta$ are neural networks
3. $p(x \mid z)$ and $p(z)$ are usually simple, e.g., Gaussians or conditionally independent Bernoulli variables (i.e., pixel values chosen independently given $z$)
4. Idea: increase complexity using an autoregressive model
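To make the "infinite mixture of Gaussians" view concrete, here is a minimal sketch (PyTorch assumed; the single-hidden-layer MLP and all sizes are illustrative choices, not from the lecture) of a decoder $p(x \mid z) = \mathcal{N}(\mu_\theta(z), \Sigma_\theta(z))$ with diagonal covariance, together with ancestral sampling:

```python
# Minimal sketch (assumption: PyTorch; network sizes are illustrative).
# Decoder p(x|z) = N(mu_theta(z), diag(sigma_theta(z)^2)), with mu_theta, sigma_theta as one MLP.
import torch
import torch.nn as nn

class GaussianDecoder(nn.Module):
    def __init__(self, z_dim=16, x_dim=784, hidden=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(z_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, x_dim)
        self.log_sigma = nn.Linear(hidden, x_dim)

    def forward(self, z):
        h = self.net(z)
        return torch.distributions.Normal(self.mu(h), self.log_sigma(h).exp())

# Sampling from the "infinite mixture": draw z ~ N(0, I), then x ~ p(x|z).
decoder = GaussianDecoder()
z = torch.randn(8, 16)       # z ~ N(0, I)
x = decoder(z).sample()      # x | z ~ N(mu_theta(z), Sigma_theta(z))
```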
PixelVAE (Gulrajani et al., 2017)

- $z$ is a feature map with the same resolution as the image $x$
- Autoregressive structure: $p(x \mid z) = \prod_i p(x_i \mid x_1, \dots, x_{i-1}, z)$; $p(x \mid z)$ is a PixelCNN
- Prior $p(z)$ can also be autoregressive
- Can be hierarchical: $p(x \mid z_1)\, p(z_1 \mid z_2)$
- State-of-the-art log-likelihood on some datasets; learns features (unlike PixelCNN); computationally cheaper than PixelCNN (shallower)
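The sketch below (PyTorch assumed; a toy conditional masked convolution, not the actual Gulrajani et al. architecture) shows the structural ingredient of a PixelVAE decoder: a causally masked convolution so that pixel $i$ only sees $x_{<i}$, plus a $1\times 1$ projection injecting the feature-map latent $z$ at every spatial location:

```python
# Illustrative sketch (assumption: PyTorch; a toy conditional PixelCNN layer).
# The decoder factorizes p(x|z) = prod_i p(x_i | x_<i, z): a masked convolution reads only
# previous pixels, and z enters through a 1x1 convolution.
import torch
import torch.nn as nn

class MaskedConv2d(nn.Conv2d):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        kH, kW = self.kernel_size
        mask = torch.ones(kH, kW)
        mask[kH // 2, kW // 2:] = 0   # block the current pixel and everything after it in its row
        mask[kH // 2 + 1:, :] = 0     # block all rows below
        self.register_buffer("mask", mask[None, None])

    def forward(self, x):
        return nn.functional.conv2d(x, self.weight * self.mask, self.bias,
                                    self.stride, self.padding)

class PixelVAEDecoder(nn.Module):
    def __init__(self, z_channels=4):
        super().__init__()
        self.causal = MaskedConv2d(1, 64, kernel_size=5, padding=2)    # reads x_<i
        self.cond = nn.Conv2d(z_channels, 64, kernel_size=1)           # injects the feature-map z
        self.out = nn.Conv2d(64, 256, kernel_size=1)                   # 256-way logits per pixel

    def forward(self, x, z):
        # z has the same spatial resolution as x, as in PixelVAE.
        return self.out(torch.relu(self.causal(x) + self.cond(z)))     # logits for p(x_i | x_<i, z)

decoder = PixelVAEDecoder()
x = torch.rand(2, 1, 28, 28)
z = torch.randn(2, 4, 28, 28)
print(decoder(x, z).shape)   # torch.Size([2, 256, 28, 28])
```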
Autoregressive flow

Flow model $x = f_\theta(z)$: the marginal likelihood $p(x)$ is given by
$$
p_X(x; \theta) = p_Z\!\left(f_\theta^{-1}(x)\right) \left| \det \frac{\partial f_\theta^{-1}(x)}{\partial x} \right|
$$
where $p_Z(z)$ is typically simple (e.g., a Gaussian). More complex prior?
- The prior $p_Z(z)$ can be autoregressive: $p_Z(z) = \prod_i p(z_i \mid z_1, \dots, z_{i-1})$.
- Autoregressive models are flows. Just another MAF layer.
- See also neural autoregressive flows (Huang et al., ICML 2018)
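A small numerical check (PyTorch assumed; a 3-dimensional toy model with linear, strictly lower-triangular conditioners) of the claim that an autoregressive Gaussian model is "just another MAF layer": evaluating $\prod_i \mathcal{N}(z_i; \mu_i(z_{<i}), \sigma_i(z_{<i})^2)$ directly and evaluating the same point through the flow's change-of-variables formula give identical log-densities.

```python
# Two views of the same density (assumption: PyTorch; toy linear conditioners).
import torch

torch.manual_seed(0)
D = 3
W_mu = torch.tril(torch.randn(D, D), diagonal=-1)        # mu_i depends only on z_<i
W_ls = torch.tril(torch.randn(D, D), diagonal=-1) * 0.1  # log sigma_i depends only on z_<i

z = torch.randn(D)
mu, log_sigma = W_mu @ z, W_ls @ z

# 1) Autoregressive-model view: sum of conditional Gaussian log-densities.
log_p_ar = torch.distributions.Normal(mu, log_sigma.exp()).log_prob(z).sum()

# 2) Flow view: eps = f^{-1}(z) = (z - mu)/sigma, with log|det d eps / d z| = -sum_i log sigma_i.
eps = (z - mu) / log_sigma.exp()
log_p_flow = torch.distributions.Normal(0.0, 1.0).log_prob(eps).sum() - log_sigma.sum()

print(log_p_ar.item(), log_p_flow.item())   # identical up to floating-point error
```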
VAE + Flow Model

$$
\log p(x; \theta) \;\ge\; \sum_z q(z \mid x; \phi) \log p(z, x; \theta) + H(q(z \mid x; \phi)) \;=\; \underbrace{\mathcal{L}(x; \theta, \phi)}_{\text{ELBO}}
$$
$$
\log p(x; \theta) = \mathcal{L}(x; \theta, \phi) + \underbrace{D_{\mathrm{KL}}(q(z \mid x; \phi) \,\|\, p(z \mid x; \theta))}_{\text{gap between true log-likelihood and ELBO}}
$$
- $q(z \mid x; \phi)$ is often too simple (Gaussian) compared to the true posterior $p(z \mid x; \theta)$, hence the ELBO bound is loose
- Idea: make the posterior more flexible: $z' \sim q(z' \mid x; \phi)$, $z = f_{\phi'}(z')$ for an invertible $f_{\phi'}$ (Rezende and Mohamed, 2015; Kingma et al., 2016)
- Still easy to sample from, and we can still evaluate the density.
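The following sketch (PyTorch assumed; a single planar flow step in the spirit of Rezende and Mohamed, 2015, with toy stand-ins for the encoder outputs) shows the mechanics: sample $z'$ from the simple posterior by reparameterization, push it through an invertible map, and correct the log-density with the log-determinant so it can be plugged into the ELBO.

```python
# Flow-based posterior sketch (assumption: PyTorch; one planar flow step, toy encoder outputs).
import torch
import torch.nn as nn

class PlanarFlow(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.u = nn.Parameter(torch.randn(dim) * 0.1)
        self.w = nn.Parameter(torch.randn(dim) * 0.1)
        self.b = nn.Parameter(torch.zeros(1))

    def forward(self, z):
        # f(z) = z + u * tanh(w^T z + b)
        # (for guaranteed invertibility, u must be constrained so that w^T u >= -1; omitted here)
        lin = z @ self.w + self.b                          # (batch,)
        f_z = z + self.u * torch.tanh(lin)[:, None]
        # log |det df/dz| = log |1 + u^T psi(z)|, with psi(z) = (1 - tanh^2(w^T z + b)) w
        psi = (1 - torch.tanh(lin) ** 2)[:, None] * self.w
        log_det = torch.log(torch.abs(1 + psi @ self.u) + 1e-8)
        return f_z, log_det

# Reparameterized base sample z0 ~ q(z0|x) = N(mu(x), sigma(x)^2), then one flow step.
mu, log_sigma = torch.zeros(4, 8), torch.zeros(4, 8)       # stand-ins for encoder outputs
z0 = mu + log_sigma.exp() * torch.randn_like(mu)
log_q0 = torch.distributions.Normal(mu, log_sigma.exp()).log_prob(z0).sum(-1)

flow = PlanarFlow(8)
z, log_det = flow(z0)
log_q = log_q0 - log_det   # density of the transformed posterior, used inside the ELBO
```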
VAE + Flow Model

The posterior approximation is more flexible, hence we can get a tighter ELBO (closer to the true log-likelihood).
Multimodal variants

- Goal: learn a joint distribution over two domains, $p(x_1, x_2)$, e.g., color and gray-scale images
- Can use a VAE-style model with a shared latent variable $z$ generating both $x_1$ and $x_2$
- Learn $p_\theta(x_1, x_2)$; use inference nets $q_\phi(z \mid x_1)$, $q_\phi(z \mid x_2)$, $q_\phi(z \mid x_1, x_2)$.
- Conceptually similar to the semi-supervised VAE in HW2.
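A structural sketch of such a model (PyTorch assumed; all layer sizes and the single-linear-layer encoders/decoders are placeholders) with one shared latent $z$, two decoders, and the three inference networks listed above:

```python
# Multimodal VAE skeleton (assumption: PyTorch; sizes illustrative).
import torch
import torch.nn as nn

def gaussian_head(in_dim, z_dim):
    # Outputs (mu, log_sigma) for a diagonal Gaussian over z.
    return nn.Linear(in_dim, 2 * z_dim)

class MultimodalVAE(nn.Module):
    def __init__(self, x1_dim=784, x2_dim=784, z_dim=32):
        super().__init__()
        self.enc_x1 = gaussian_head(x1_dim, z_dim)              # q(z | x1)
        self.enc_x2 = gaussian_head(x2_dim, z_dim)              # q(z | x2)
        self.enc_joint = gaussian_head(x1_dim + x2_dim, z_dim)  # q(z | x1, x2)
        self.dec_x1 = nn.Linear(z_dim, x1_dim)                  # mean of p(x1 | z)
        self.dec_x2 = nn.Linear(z_dim, x2_dim)                  # mean of p(x2 | z)

    def encode(self, x1=None, x2=None):
        if x1 is not None and x2 is not None:
            params = self.enc_joint(torch.cat([x1, x2], dim=-1))
        elif x1 is not None:
            params = self.enc_x1(x1)
        else:
            params = self.enc_x2(x2)
        mu, log_sigma = params.chunk(2, dim=-1)
        return mu + log_sigma.exp() * torch.randn_like(mu)      # reparameterized z ~ q(z | ...)

    def decode(self, z):
        return self.dec_x1(z), self.dec_x2(z)                   # means of p(x1 | z) and p(x2 | z)
```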
Variational RNN

- Goal: learn a joint distribution over a sequence, $p(x_1, \dots, x_T)$
- A VAE for sequential data, using latent variables $z_1, \dots, z_T$. Instead of training separate VAEs $z_i \to x_i$, train a joint model:
$$
p(x_{\le T}, z_{\le T}) = \prod_{t=1}^{T} p(x_t \mid z_{\le t}, x_{< t})\, p(z_t \mid z_{< t}, x_{< t})
$$
[Figure: VRNN computation graphs for (a) prior, (b) generation, (c) recurrence, (d) inference; Chung et al., 2016]
- Use RNNs to model the conditionals (similar to PixelRNN)
- Use RNNs for inference: $q(z_{\le T} \mid x_{\le T}) = \prod_{t=1}^{T} q(z_t \mid z_{< t}, x_{\le t})$
- Train like a VAE to maximize the ELBO. Conceptually similar to PixelVAE.
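A one-step generation sketch (PyTorch assumed; the GRU cell and linear heads are illustrative stand-ins for the networks in the paper): the recurrent state $h_{t-1}$ summarizes $(x_{<t}, z_{<t})$, and the prior over $z_t$, the decoder over $x_t$, and the state update all condition on it.

```python
# VRNN-style generation step (assumption: PyTorch; toy network choices).
import torch
import torch.nn as nn

class VRNNCell(nn.Module):
    def __init__(self, x_dim=28, z_dim=16, h_dim=64):
        super().__init__()
        self.prior = nn.Linear(h_dim, 2 * z_dim)           # parameters of p(z_t | h_{t-1})
        self.decoder = nn.Linear(h_dim + z_dim, x_dim)     # mean of p(x_t | z_t, h_{t-1})
        self.rnn = nn.GRUCell(x_dim + z_dim, h_dim)        # h_t = f(h_{t-1}, x_t, z_t)

    def generate_step(self, h):
        mu, log_sigma = self.prior(h).chunk(2, dim=-1)
        z = mu + log_sigma.exp() * torch.randn_like(mu)    # z_t ~ p(z_t | z_<t, x_<t)
        x = self.decoder(torch.cat([h, z], dim=-1))        # x_t | z_<=t, x_<t (mean shown)
        h_next = self.rnn(torch.cat([x, z], dim=-1), h)    # recurrence
        return x, z, h_next

cell = VRNNCell()
h = torch.zeros(1, 64)
xs = []
for _ in range(10):                                        # unroll 10 generation steps
    x, z, h = cell.generate_step(h)
    xs.append(x)
```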
Combining losses

Flow model $x = f_\theta(z)$: the marginal likelihood $p(x)$ is given by
$$
p_X(x; \theta) = p_Z\!\left(f_\theta^{-1}(x)\right) \left| \det \frac{\partial f_\theta^{-1}(x)}{\partial x} \right|
$$
- The flow can also be thought of as the generator of a GAN
- Should we train by $\min_\theta D_{\mathrm{KL}}(p_{\mathrm{data}}, p_\theta)$ or $\min_\theta \mathrm{JSD}(p_{\mathrm{data}}, p_\theta)$?
FlowGAN

- Although $D_{\mathrm{KL}}(p_{\mathrm{data}}, p_\theta) = 0$ if and only if $\mathrm{JSD}(p_{\mathrm{data}}, p_\theta) = 0$, optimizing one does not necessarily optimize the other.
- If $z$ and $x$ have the same dimension, we can optimize
$$
\min_\theta \; D_{\mathrm{KL}}(p_{\mathrm{data}}, p_\theta) + \lambda\, \mathrm{JSD}(p_{\mathrm{data}}, p_\theta)
$$
- Interpolates between a GAN and a flow model
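A sketch of the combined objective (PyTorch assumed; `flow` exposing `log_prob`/`sample` and the discriminator `disc` are hypothetical stand-ins): the flow's exact likelihood supplies the KL term, the discriminator supplies the adversarial term, and $\lambda$ trades them off.

```python
# Hybrid MLE + adversarial generator loss (assumption: PyTorch; `flow` and `disc` are
# hypothetical stand-ins with the interfaces noted below).
import torch
import torch.nn.functional as F

def flowgan_generator_loss(flow, disc, x_real, lambda_adv=1.0):
    nll = -flow.log_prob(x_real).mean()                  # exact NLL: minimizing KL(p_data || p_theta)
    x_fake = flow.sample(x_real.shape[0])                # z ~ p_Z, x = f_theta(z)
    adv = F.binary_cross_entropy_with_logits(            # non-saturating GAN loss on generated samples
        disc(x_fake), torch.ones(x_real.shape[0], 1))
    return nll + lambda_adv * adv                        # lambda_adv interpolates flow <-> GAN training
```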
Adversarial Autoencoder (VAE + GAN)

$$
\log p(x; \theta) = \underbrace{\mathcal{L}(x; \theta, \phi)}_{\text{ELBO}} + D_{\mathrm{KL}}(q(z \mid x; \phi) \,\|\, p(z \mid x; \theta))
$$
$$
\underbrace{\mathbb{E}_{x \sim p_{\mathrm{data}}}[\mathcal{L}(x; \theta, \phi)]}_{\approx\ \text{training obj. up to const.}} = \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log p(x; \theta) - D_{\mathrm{KL}}(q(z \mid x; \phi) \,\|\, p(z \mid x; \theta))\big]
$$
$$
\equiv \underbrace{-D_{\mathrm{KL}}(p_{\mathrm{data}}(x) \,\|\, p(x; \theta))}_{\text{equiv. to MLE}} - \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[D_{\mathrm{KL}}(q(z \mid x; \phi) \,\|\, p(z \mid x; \theta))\big]
$$
- Note: this is regularized maximum likelihood estimation (Shu et al., Amortized Inference Regularization)
- Can add a GAN objective $-\mathrm{JSD}(p_{\mathrm{data}}, p(x; \theta))$ to get sharper samples, i.e., a discriminator attempting to distinguish VAE samples from real ones.
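A sketch of such a hybrid loss (PyTorch assumed; `vae` with `encode`/`decode` methods and an image discriminator `disc` are hypothetical stand-ins): the usual negative ELBO plus an adversarial term that asks the discriminator to accept decoded samples as real.

```python
# VAE loss + sample-space GAN term (assumption: PyTorch; Gaussian decoder so the
# reconstruction term reduces to a squared error up to constants).
import torch
import torch.nn.functional as F

def vae_gan_loss(vae, disc, x_real, lambda_adv=0.1):
    mu, log_sigma = vae.encode(x_real)
    z = mu + log_sigma.exp() * torch.randn_like(mu)            # reparameterized posterior sample
    x_recon = vae.decode(z)

    recon = F.mse_loss(x_recon, x_real)                        # -E_q[log p(x|z)] up to constants
    kl = (-0.5 * (1 + 2 * log_sigma - mu.pow(2)
                  - (2 * log_sigma).exp())).sum(-1).mean()     # KL(q(z|x) || N(0, I))
    neg_elbo = recon + kl

    adv = F.binary_cross_entropy_with_logits(                  # fool the sample discriminator
        disc(x_recon), torch.ones(x_real.shape[0], 1))
    return neg_elbo + lambda_adv * adv
```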
An alternative interpretation

$$
\underbrace{\mathbb{E}_{x \sim p_{\mathrm{data}}}[\mathcal{L}(x; \theta, \phi)]}_{\approx\ \text{training obj. up to const.}} = \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log p(x; \theta) - D_{\mathrm{KL}}(q(z \mid x; \phi) \,\|\, p(z \mid x; \theta))\big]
$$
$$
\begin{aligned}
&\equiv -D_{\mathrm{KL}}(p_{\mathrm{data}}(x) \,\|\, p(x; \theta)) - \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[D_{\mathrm{KL}}(q(z \mid x; \phi) \,\|\, p(z \mid x; \theta))\big] \\
&= -\sum_x p_{\mathrm{data}}(x) \left[ \log \frac{p_{\mathrm{data}}(x)}{p(x; \theta)} + \sum_z q(z \mid x; \phi) \log \frac{q(z \mid x; \phi)}{p(z \mid x; \theta)} \right] \\
&= -\sum_x p_{\mathrm{data}}(x) \sum_z q(z \mid x; \phi) \log \frac{q(z \mid x; \phi)\, p_{\mathrm{data}}(x)}{p(z \mid x; \theta)\, p(x; \theta)} \\
&= -\sum_{x, z} p_{\mathrm{data}}(x)\, q(z \mid x; \phi) \log \frac{p_{\mathrm{data}}(x)\, q(z \mid x; \phi)}{p(x; \theta)\, p(z \mid x; \theta)} \\
&= -D_{\mathrm{KL}}\big(\underbrace{p_{\mathrm{data}}(x)\, q(z \mid x; \phi)}_{q(z, x; \phi)} \,\big\|\, \underbrace{p(x; \theta)\, p(z \mid x; \theta)}_{p(z, x; \theta)}\big)
\end{aligned}
$$
An alternative interpretation

$$
\underbrace{\mathbb{E}_{x \sim p_{\mathrm{data}}}[\mathcal{L}(x; \theta, \phi)]}_{\text{ELBO}} \equiv -D_{\mathrm{KL}}\big(\underbrace{p_{\mathrm{data}}(x)\, q(z \mid x; \phi)}_{q(z, x; \phi)} \,\big\|\, \underbrace{p(x; \theta)\, p(z \mid x; \theta)}_{p(z, x; \theta)}\big)
$$
- Optimizing the ELBO is the same as matching the inference distribution $q(z, x; \phi)$ to the generative distribution $p(z, x; \theta) = p(z)\, p(x \mid z; \theta)$.
- Intuition: $p(x; \theta)\, p(z \mid x; \theta) = p_{\mathrm{data}}(x)\, q(z \mid x; \phi)$ if
  1. $p_{\mathrm{data}}(x) = p(x; \theta)$
  2. $q(z \mid x; \phi) = p(z \mid x; \theta)$ for all $x$
- Hence we get the VAE objective:
$$
-D_{\mathrm{KL}}(p_{\mathrm{data}}(x) \,\|\, p(x; \theta)) - \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[D_{\mathrm{KL}}(q(z \mid x; \phi) \,\|\, p(z \mid x; \theta))\big]
$$
- Many other variants are possible! VAE + GAN:
$$
-\mathrm{JSD}(p_{\mathrm{data}}(x) \,\|\, p(x; \theta)) - D_{\mathrm{KL}}(p_{\mathrm{data}}(x) \,\|\, p(x; \theta)) - \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[D_{\mathrm{KL}}(q(z \mid x; \phi) \,\|\, p(z \mid x; \theta))\big]
$$
Adversarial Autoencoder (VAE + GAN)

$$
\underbrace{\mathbb{E}_{x \sim p_{\mathrm{data}}}[\mathcal{L}(x; \theta, \phi)]}_{\text{ELBO}} \equiv -D_{\mathrm{KL}}\big(\underbrace{p_{\mathrm{data}}(x)\, q(z \mid x; \phi)}_{q(z, x; \phi)} \,\big\|\, \underbrace{p(x; \theta)\, p(z \mid x; \theta)}_{p(z, x; \theta)}\big)
$$
- Optimizing the ELBO is the same as matching the inference distribution $q(z, x; \phi)$ to the generative distribution $p(z, x; \theta)$.
- Symmetry: using the alternative factorization, $p(z)\, p(x \mid z; \theta) = q(z; \phi)\, q(x \mid z; \phi)$ if
  1. $q(z; \phi) = p(z)$
  2. $q(x \mid z; \phi) = p(x \mid z; \theta)$ for all $z$
- We get an equivalent form of the VAE objective:
$$
-D_{\mathrm{KL}}(q(z; \phi) \,\|\, p(z)) - \mathbb{E}_{z \sim q(z; \phi)}\big[D_{\mathrm{KL}}(q(x \mid z; \phi) \,\|\, p(x \mid z; \theta))\big]
$$
- Other variants are possible. For example, we can add $-\mathrm{JSD}(q(z; \phi) \,\|\, p(z))$ to match features in latent space (Zhao et al., 2017; Makhzani et al., 2018)
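A sketch of the latent-space adversarial term (PyTorch assumed; `encoder` and `latent_disc` are hypothetical stand-ins), in the spirit of adversarial autoencoders: a discriminator separates encodings $z \sim q(z; \phi)$ from prior samples $z \sim p(z)$, and the encoder is trained to fool it, approximately matching the aggregate posterior to the prior.

```python
# Latent-space adversarial regularization (assumption: PyTorch; networks are stand-ins).
import torch
import torch.nn.functional as F

def latent_adversarial_losses(encoder, latent_disc, x_real, z_dim=32):
    z_q = encoder(x_real)                           # z ~ q(z; phi): encodings of real data
    z_p = torch.randn(x_real.shape[0], z_dim)       # z ~ p(z) = N(0, I)

    ones = torch.ones(x_real.shape[0], 1)
    zeros = torch.zeros(x_real.shape[0], 1)

    # Discriminator: prior samples are "real", encodings are "fake".
    d_loss = (F.binary_cross_entropy_with_logits(latent_disc(z_p), ones) +
              F.binary_cross_entropy_with_logits(latent_disc(z_q.detach()), zeros))

    # Encoder: make its encodings indistinguishable from prior samples.
    e_loss = F.binary_cross_entropy_with_logits(latent_disc(z_q), ones)
    return d_loss, e_loss
```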
Information Preference

$$
\underbrace{\mathbb{E}_{x \sim p_{\mathrm{data}}}[\mathcal{L}(x; \theta, \phi)]}_{\text{ELBO}} \equiv -D_{\mathrm{KL}}\big(\underbrace{p_{\mathrm{data}}(x)\, q(z \mid x; \phi)}_{q(z, x; \phi)} \,\big\|\, \underbrace{p(x; \theta)\, p(z \mid x; \theta)}_{p(z, x; \theta)}\big)
$$
- The ELBO is optimized as long as $q(z, x; \phi) = p(z, x; \theta)$. Many solutions are possible! For example:
  1. $p(z, x; \theta) = p(z)\, p(x \mid z; \theta) = p(z)\, p_{\mathrm{data}}(x)$
  2. $q(z, x; \phi) = p_{\mathrm{data}}(x)\, q(z \mid x; \phi) = p_{\mathrm{data}}(x)\, p(z)$
  3. Note that $x$ and $z$ are independent: $z$ carries no information about $x$. This happens in practice when $p(x \mid z; \theta)$ is too flexible, like a PixelCNN.
- Issue: many more variables than constraints
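To spell out why this independent solution is a global optimum of the ELBO (making the constant hidden in the "$\equiv$" explicit; the identity follows from the derivation two slides back):
$$
\mathbb{E}_{p_{\mathrm{data}}}[\mathcal{L}(x; \theta, \phi)]
= \mathbb{E}_{p_{\mathrm{data}}}[\log p_{\mathrm{data}}(x)]
- D_{\mathrm{KL}}\big(p_{\mathrm{data}}(x)\, q(z \mid x; \phi) \,\|\, p(x; \theta)\, p(z \mid x; \theta)\big),
$$
and at the solution above the KL term becomes $D_{\mathrm{KL}}(p_{\mathrm{data}}(x)\, p(z) \,\|\, p_{\mathrm{data}}(x)\, p(z)) = 0$, so the expected ELBO attains its largest possible value $\mathbb{E}_{p_{\mathrm{data}}}[\log p_{\mathrm{data}}(x)]$ even though $z$ is completely uninformative about $x$.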