CSC2541: Differentiable Inference and Generative Models
Density estimation using Real NVP. Dinh et al., 2016
Nguyen A, Dosovitskiy A, Yosinski J, Brox T, Clune J (2016). Synthesizing the preferred inputs for neurons in neural networks via deep generator networks. Advances in Neural Information Processing Systems 29
Density estimation using Real NVP. Dinh et al., 2016
A group of people are watching a dog ride (Jamie Kiros)
Pixel Recurrent Neural Networks. Aaron van den Oord, Nal Kalchbrenner, Koray Kavukcuoglu, 2016
Types of Generative Models
• Conditional probabilistic models
• Latent-variable probabilistic models
• GANs
• Invertible models
Advantages of latent-variable models
• Model checking by sampling
• Natural way to specify models
• Compact representations
• Semi-supervised learning
• Understanding factors of variation in data
Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. Alec Radford, Luke Metz, Soumith Chintala, 2015
Advantages of probabilistic latent-variable models
• Data-efficient learning - automatic regularization, can take advantage of more information
• Compose models - e.g. incorporate a data-corruption model. Different from composing feedforward computations
• Handle missing data (without the standard hack of just guessing the missing values using averages)
• Predictive uncertainty - necessary for decision-making
• Conditional predictions (e.g. if Brexit happens, the value of the pound will fall)
• Active learning - what data would be expected to increase our confidence about a prediction
• Cons:
  • Intractable integral over latent variables (see the marginal-likelihood integral below)
• Examples: medical diagnosis, image modeling
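To make the last con concrete: for a latent-variable model with prior p(z) and a likelihood p_θ(x | z) parameterized by a deep net, evaluating the marginal likelihood of the data requires integrating over the latent variables, and this integral generally has no closed form (this is the standard textbook identity, not a formula from the slides):

p_θ(x) = ∫ p_θ(x | z) p(z) dz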
Example latent-variable graphical models (courtesy of Matthew Johnson):
• Gaussian mixture model [1]
• Linear dynamical system [2]
• Hidden Markov model [3]
• Switching LDS [4]
• Mixture of Experts [5]
• Driven LDS [2]
• IO-HMM [6]
• Factorial HMM [7]
• Canonical correlation analysis [8,9]
• Admixture / LDA / NMF [10]

[1] Palmer, Wipf, Kreutz-Delgado, and Rao. Variational EM algorithms for non-Gaussian latent variable models. NIPS 2005.
[2] Ghahramani and Beal. Propagation algorithms for variational Bayesian learning. NIPS 2001.
[3] Beal. Variational algorithms for approximate Bayesian inference, Ch. 3. U of London Ph.D. Thesis 2003.
[4] Ghahramani and Hinton. Variational learning for switching state-space models. Neural Computation 2000.
[5] Jordan and Jacobs. Hierarchical Mixtures of Experts and the EM algorithm. Neural Computation 1994.
[6] Bengio and Frasconi. An Input Output HMM Architecture. NIPS 1995.
[7] Ghahramani and Jordan. Factorial Hidden Markov Models. Machine Learning 1997.
[8] Bach and Jordan. A probabilistic interpretation of Canonical Correlation Analysis. Tech. Report 2005.
[9] Archambeau and Bach. Sparse probabilistic projections. NIPS 2008.
[10] Hoffman, Bach, Blei. Online learning for Latent Dirichlet Allocation. NIPS 2010.
Differentiable models
• Model distributions implicitly by a variable pushed through a deep net: y = f_θ(x)
• Approximate an intractable distribution by a tractable distribution parameterized by a deep net: p(y | x) = N(y | μ = f_θ(x), Σ = g_θ(x))
• Optimize all parameters using stochastic gradient descent (see the sketch below)
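A minimal sketch of the second and third bullets, assuming a toy regression dataset and illustrative layer sizes (this is not course code, just one way to realize p(y | x) = N(y | μ = f_θ(x), Σ = g_θ(x)) and fit it by stochastic gradient descent in PyTorch):

```python
# Sketch: a deep net outputs the mean and scale of a conditional Gaussian,
# trained by SGD on the negative log-likelihood. Sizes and data are toy choices.
import torch
import torch.nn as nn

class GaussianNet(nn.Module):
    def __init__(self, x_dim=2, y_dim=1, hidden=32):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(x_dim, hidden), nn.Tanh())
        self.mu_head = nn.Linear(hidden, y_dim)       # f_theta(x): mean
        self.log_std_head = nn.Linear(hidden, y_dim)  # g_theta(x): (log) scale

    def forward(self, x):
        h = self.body(x)
        return self.mu_head(h), self.log_std_head(h).exp()

net = GaussianNet()
opt = torch.optim.SGD(net.parameters(), lr=1e-2)

x = torch.randn(256, 2)                                       # toy inputs
y = x.sum(dim=1, keepdim=True) + 0.1 * torch.randn(256, 1)    # toy targets

for step in range(1000):
    mu, std = net(x)
    nll = -torch.distributions.Normal(mu, std).log_prob(y).mean()
    opt.zero_grad()
    nll.backward()
    opt.step()
```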
Probabilistic graphical models
+ structured representations
+ priors and uncertainty
+ data and computational efficiency
– rigid assumptions may not fit
– feature engineering
– top-down inference

Deep learning
– neural net "goo"
– difficult parameterization
– can require lots of data
+ flexible
+ feature learning
+ recognition networks
Machine-learning-centric History of Generative Models
• 1940s - 1960s: Motivating probability and Bayesian inference
• 1980s - 2000s: Bayesian machine learning with MCMC
• 1990s - 2000s: Graphical models with exact inference
• 1990s - present: Bayesian nonparametrics with MCMC (Indian buffet process, Chinese restaurant process)
• 1990s - 2000s: Bayesian ML with mean-field variational inference
• 1995: Helmholtz machine (almost invented variational autoencoders)
• 2000s - present: Probabilistic programming
• 2000s - 2013: Deep undirected graphical models (RBMs, pretraining)
• 2010s - present: Stan - Bayesian data analysis with HMC
• 2000s - 2013: Autoencoders, denoising autoencoders
• 2000s - present: Invertible density estimation
• 2013 - present: Variational autoencoders
• 2014 - present: Generative adversarial nets
Frontiers
• Generate images given captions
• Generating large structures
  • images with consistent internal structure that are not blurry
  • videos
  • long texts
• Discrete latent random variables
• Generate complex discrete structures
• Time-series models for reinforcement learning
Nguyen A, Dosovitskiy A, Yosinski J, Brox T, Clune J (2016). Synthesizing the preferred inputs for neurons in neural networks via deep generator networks. Advances in Neural Information Processing Systems 29
Density estimation using Real NVP. Dinh et al., 2016
Modeling idea: graphical models on latent variables, neural network models for observations.
Composing graphical models with neural networks for structured representations and fast inference. Johnson, Duvenaud, Wiltschko, Datta, Adams, NIPS 2016
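A hypothetical sketch of this composition (not the authors' code): sample discrete latent states from a simple Markov-chain prior, then push each state through a small neural network to get the mean of a Gaussian observation. The transition matrix, network sizes, and noise scale are all illustrative assumptions.

```python
# Sketch: "graphical model on latents, neural net on observations".
# A Markov chain over K discrete states z_t; a neural decoder maps each state
# to the mean of a Gaussian observation y_t. All values are illustrative.
import torch
import torch.nn as nn

K, T, obs_dim = 3, 50, 2
trans = torch.tensor([[0.90, 0.05, 0.05],
                      [0.05, 0.90, 0.05],
                      [0.05, 0.05, 0.90]])          # latent transition matrix
decoder = nn.Sequential(nn.Linear(K, 16), nn.Tanh(), nn.Linear(16, obs_dim))

z = torch.randint(K, (1,)).item()                    # initial latent state
ys = []
for t in range(T):
    one_hot = torch.eye(K)[z]
    mean = decoder(one_hot)                          # neural-net observation model
    ys.append(mean + 0.1 * torch.randn(obs_dim))     # Gaussian observation noise
    z = torch.multinomial(trans[z], 1).item()        # graphical-model transition
samples = torch.stack(ys)                            # (T, obs_dim) sampled sequence
```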
[Figure: unsupervised learning vs. supervised learning. Courtesy of Matthew Johnson]
[Figure: data space vs. latent space]
Application: learn syllable representation of behavior from video
[Figure: depth-video frames; axes in mm]
[Figure: graphical model with global parameters θ, latent states z_1…z_7, latent variables x_1…x_7, and observations y_1…y_7, alongside depth-video frames (axes in mm). Courtesy of Matthew Johnson]
[Example inferred behavior syllables: start rear, fall from rear, grooming]
[Figure from Carl Rasmussen]
Seminars
• 7 weeks of seminars, about 8 people each
• Each day will have one or two major themes, 3-6 papers covered
• Divided into 2-3 presentations of about 30 mins each
• Explain the main idea, relate to previous work and future directions
Class Projects
• Develop a generative model for a new medium.
  • Generate sound given video (hard to generate raw sound)
  • Automatic onomatopoeia: generate text 'ka-bloom-kshhhh' given the sound of an explosion
  • Generating text of a specific style. For instance, generating SMILES strings representing organic molecules
Class Projects
• Extend existing models, inference, or training. For instance:
  • Extending variational autoencoders to have infinite capacity in some sense (combining nonparametric Bayesian methods with variational autoencoders)
  • Train a VAE or GAN for matrix decomposition
  • Explore the use of mixture distributions for approximating distributions
Class Projects
• Apply an existing approach in a new way.
  • Missing data (not missing at random)
  • Automatic data cleaning (flagging suspect entries)
  • Simultaneous localization and mapping (SLAM) from scratch
Class Projects
• Review / comparison / tutorials:
  • Approaches to generating images
  • Approaches to generating video
  • Approaches to handling discrete latent variables
  • Approaches to building invertible yet general transformations
  • Variants of the GAN training objective
  • Different types of recognition networks
• Clearly articulate the differences between approaches, and their strengths and weaknesses.
• Ideally, include experiments highlighting the different properties of each method on realistic problems.
Class Project Dates
• Project proposal due Oct 14th
  • about 2 pages, including a preliminary literature search
• Presentations: Nov 18th and 25th
• Projects due: Dec 10th
Grades
• Class presentations - 20%
• Project proposal - 20%
• Project presentation - 20%
• Project report and code - 40%
Quiz