 
RNN: infinite (in theory) temporal extent (neurons that are a function of all video frames in the past)
3D CONVNET: finite temporal extent (neurons that are a function of only finitely many video frames in the past)
Input: video
Fei-Fei Li & Andrej Karpathy & Justin Johnson, Lecture 14 - 29 Feb 2016
i.e. we obtain: a combined RNN / CONVNET over video with infinite (in theory) temporal extent (neurons that are a function of all video frames in the past)
Summary
- You think you need a Spatio-Temporal Fancy Video ConvNet
- STOP. Do you really?
- Okay fine: do you want to model:
  - local motion? (use 3D CONV), or
  - global motion? (use LSTM).
- Try out using Optical Flow in a second stream (can work better sometimes)
- Try out GRU-RCN! (imo best model)
Unsupervised Learning
Unsupervised Learning Overview
● Definitions
● Autoencoders
  ○ Vanilla
  ○ Variational
● Adversarial Networks
Supervised vs Unsupervised
Supervised Learning
● Data: (x, y); x is data, y is label
● Goal: Learn a function to map x -> y
● Examples: Classification, regression, object detection, semantic segmentation, image captioning, etc.
Unsupervised Learning
● Data: x; just data, no labels!
● Goal: Learn some structure of the data
● Examples: Clustering, dimensionality reduction, feature learning, generative models, etc.
Unsupervised Learning
● Autoencoders
  ○ Traditional: feature learning
  ○ Variational: generate samples
● Generative Adversarial Networks: generate samples
Autoencoders
Encoder: input data x -> features z
● Originally: Linear + nonlinearity (sigmoid)
● Later: Deep, fully-connected
● Later: ReLU CNN
z is usually smaller than x (dimensionality reduction)
Autoencoders
Decoder: features z -> reconstructed input data $\hat{x}$
● Originally: Linear + nonlinearity (sigmoid)
● Later: Deep, fully-connected
● Later: ReLU CNN (upconv)
Example: Encoder: 4-layer conv; Decoder: 4-layer upconv
Train for reconstruction with no labels!
Loss function between x and $\hat{x}$ (often L2)
Encoder / decoder sometimes share weights. Example: dim(x) = D, dim(z) = H, $w_e$: H x D, $w_d$: D x H $= w_e^T$
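As a rough illustration of the shared-weight example (dim(x) = D, dim(z) = H, decoder weights equal to the transposed encoder weights) trained with an L2 reconstruction loss, here is a minimal NumPy sketch. The function name `train_tied_autoencoder`, the sigmoid encoder with a linear decoder, the step size, and the plain gradient-descent loop are all assumptions made for this example, not something specified on the slides.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_tied_autoencoder(X, H, steps=500, lr=0.05, seed=0):
    """Hypothetical helper: z = sigmoid(W_e x), x_hat = W_e^T z, L2 loss.

    X is an N x D data matrix (no labels are used anywhere).
    """
    rng = np.random.default_rng(seed)
    N, D = X.shape
    W_e = 0.1 * rng.standard_normal((H, D))   # encoder weights, H x D
    losses = []
    for _ in range(steps):
        Z = sigmoid(X @ W_e.T)                # N x H features
        X_hat = Z @ W_e                       # N x D; decoder weights are W_e^T (D x H)
        R = X_hat - X                         # reconstruction residual
        losses.append(0.5 * np.sum(R ** 2) / N)

        # Backpropagation; W_e receives gradient from both encoder and decoder.
        dX_hat = R / N                        # N x D
        dZ = dX_hat @ W_e.T                   # N x H
        dA = dZ * Z * (1.0 - Z)               # through the sigmoid
        dW_e = dA.T @ X + Z.T @ dX_hat        # H x D
        W_e -= lr * dW_e
    return W_e, losses
```

After training, `sigmoid(X @ W_e.T)` gives the H-dimensional features; the decoder is only needed to compute the reconstruction loss.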
Autoencoders
After training, throw away decoder!
Use encoder to initialize a supervised model: input data x -> Encoder -> features z -> Classifier -> predicted $\hat{y}$ (e.g. bird, plane, dog, deer, truck)
● Loss function (Softmax, etc) compares predicted $\hat{y}$ with label y
● Fine-tune encoder jointly with classifier
● Train for final task (sometimes with small data)
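The fine-tuning step can be sketched as follows: start from encoder weights obtained elsewhere (for instance by reconstruction training), put a softmax classifier on top of the features z, and update both jointly. This is only an illustrative sketch; `finetune`, `softmax_loss`, the sigmoid encoder, and the hyperparameters are assumptions for the example.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def softmax_loss(scores, y):
    """Mean cross-entropy loss and its gradient with respect to the scores."""
    shifted = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    N = scores.shape[0]
    loss = -log_probs[np.arange(N), y].mean()
    dscores = np.exp(log_probs)
    dscores[np.arange(N), y] -= 1.0
    return loss, dscores / N

def finetune(W_e_pretrained, X, y, num_classes, steps=200, lr=0.2, seed=0):
    """Hypothetical helper: classifier on top of a pretrained encoder, trained jointly."""
    rng = np.random.default_rng(seed)
    W_e = W_e_pretrained.copy()                              # H x D, initialized from the encoder
    W_c = 0.01 * rng.standard_normal((num_classes, W_e.shape[0]))  # new classifier, C x H
    losses = []
    for _ in range(steps):
        Z = sigmoid(X @ W_e.T)                # features from the encoder
        scores = Z @ W_c.T                    # class scores
        loss, dscores = softmax_loss(scores, y)
        losses.append(loss)
        dW_c = dscores.T @ Z                  # C x H
        dZ = dscores @ W_c                    # N x H
        dW_e = (dZ * Z * (1.0 - Z)).T @ X     # H x D, gradient reaches the encoder too
        W_c -= lr * dW_c
        W_e -= lr * dW_e
    return W_e, W_c, losses
```

Freezing `W_e` (skipping its update) would instead use the encoder as a fixed feature extractor; the slide describes the joint variant.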
Autoencoders: Greedy Training
In the mid 2000s, layer-wise pretraining with Restricted Boltzmann Machines (RBM) was common. Training deep nets was hard in 2006!
Not common anymore: with ReLU, proper initialization, batchnorm, Adam, etc., deep nets easily train from scratch.
Hinton and Salakhutdinov, “Reducing the Dimensionality of Data with Neural Networks”, Science 2006
Autoencoders
Autoencoders can reconstruct data, and can learn features to initialize a supervised model.
Can we generate images from an autoencoder?
Variational Autoencoder
A Bayesian spin on an autoencoder - lets us generate data!
Assume our data is generated like this: sample z from the true prior $p_{\theta^*}(z)$, then sample x from the true conditional $p_{\theta^*}(x \mid z)$.
Intuition: x is an image, z gives class, orientation, attributes, etc.
Problem: Estimate $\theta$ without access to latent states z!
Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014
Variational Autoencoder
Prior: Assume $p_\theta(z)$ is a unit Gaussian
Conditional: Assume $p_\theta(x \mid z)$ is a diagonal Gaussian, predict mean and variance with neural net
Decoder network with parameters $\theta$ (fully-connected or upconvolutional): latent state z -> mean $\mu^x$ and (diagonal) covariance $\Sigma^x$ of $p_\theta(x \mid z)$
Kingma and Welling, ICLR 2014
Variational Autoencoder: Encoder
By Bayes Rule the posterior is: $p_\theta(z \mid x) = \frac{p_\theta(x \mid z)\, p_\theta(z)}{p_\theta(x)}$
● $p_\theta(x \mid z)$: use decoder network =)
● $p_\theta(z)$: Gaussian =)
● $p_\theta(x)$: intractable integral =(
Approximate posterior with encoder network $q_\phi(z \mid x)$
Encoder network with parameters $\phi$ (fully-connected or convolutional): data point x -> mean $\mu^z$ and (diagonal) covariance $\Sigma^z$ of $q_\phi(z \mid x)$
Kingma and Welling, ICLR 2014
Variational Autoencoder
● Data point x
● Encoder network: x -> mean $\mu^z$ and (diagonal) covariance $\Sigma^z$ of the approximate posterior $q_\phi(z \mid x)$ (should be close to prior $p_\theta(z)$)
● Sample z from $q_\phi(z \mid x)$
● Decoder network: z -> mean $\mu^x$ and (diagonal) covariance $\Sigma^x$ of $p_\theta(x \mid z)$ (should be close to data x)
● Sample reconstructed $\hat{x}$ from $p_\theta(x \mid z)$
Training like a normal autoencoder: reconstruction loss at the end, regularization toward prior in middle
Kingma and Welling, ICLR 2014
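The forward pass just described (encode, sample z, decode, then score the reconstruction and the distance to the prior) might look like the NumPy sketch below. To keep it short, each "network" is a single linear layer that outputs a mean and a log-variance; that simplification, the names `linear_gaussian_net` and `vae_forward`, and the log-variance parameterization are assumptions for illustration, not the lecture's architecture.

```python
import numpy as np

def linear_gaussian_net(inp, W_mu, W_logvar):
    """Toy one-layer 'network' returning mean and log-variance of a diagonal Gaussian."""
    return inp @ W_mu.T, inp @ W_logvar.T

def vae_forward(x, enc, dec, rng):
    """One forward pass for a batch x (N x D); enc and dec are (W_mu, W_logvar) pairs."""
    # Encoder: x -> mean and diagonal covariance of q(z | x).
    mu_z, logvar_z = linear_gaussian_net(x, *enc)

    # Sample z from q(z | x) as mean + std * noise, with noise ~ N(0, I).
    eps = rng.standard_normal(mu_z.shape)
    z = mu_z + np.exp(0.5 * logvar_z) * eps

    # Decoder: z -> mean and diagonal covariance of p(x | z).
    mu_x, logvar_x = linear_gaussian_net(z, *dec)

    # Reconstruction loss at the end: negative log-likelihood of x under
    # the diagonal Gaussian N(mu_x, diag(exp(logvar_x))).
    recon = 0.5 * np.sum(
        logvar_x + (x - mu_x) ** 2 / np.exp(logvar_x) + np.log(2.0 * np.pi), axis=1
    )

    # Regularization in the middle: KL(q(z | x) || N(0, I)) in closed form.
    kl = 0.5 * np.sum(np.exp(logvar_z) + mu_z ** 2 - 1.0 - logvar_z, axis=1)
    return z, mu_x, recon, kl
```

A training loop would minimize the batch mean of `recon + kl` with respect to both sets of weights; in practice an autodiff framework would handle the gradients.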
Variational Autoencoder: Generate Data!
After network is trained:
● Sample z from prior $p_\theta(z)$
● Decoder network: z -> $\mu^x$, $\Sigma^x$
● Sample generated $\hat{x}$ from $p_\theta(x \mid z)$
Diagonal prior on z => independent latent variables
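Generation only needs the decoder. A minimal sketch, again assuming a toy linear decoder that outputs a mean and a log-variance (the function `generate` and its weight arguments are hypothetical names for this example):

```python
import numpy as np

def generate(dec_W_mu, dec_W_logvar, num_samples, rng):
    """Sample z from the unit Gaussian prior, decode, then sample x from p(x | z)."""
    H = dec_W_mu.shape[1]                          # latent dimension
    z = rng.standard_normal((num_samples, H))      # z ~ N(0, I); coordinates are independent
    mu_x = z @ dec_W_mu.T                          # decoder mean
    logvar_x = z @ dec_W_logvar.T                  # decoder log-variance (diagonal covariance)
    noise = rng.standard_normal(mu_x.shape)
    x = mu_x + np.exp(0.5 * logvar_x) * noise      # x ~ N(mu_x, diag(exp(logvar_x)))
    return z, x
```

Because the prior is diagonal, varying one coordinate of `z` while holding the others fixed is a simple way to inspect what each latent variable controls.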
Variational Autoencoder: Math
Maximum Likelihood?
● Maximize likelihood of dataset $\{x^{(i)}\}_{i=1}^N$: $\theta^* = \arg\max_\theta \prod_{i=1}^N p_\theta(x^{(i)})$
● Maximize log-likelihood instead because sums are nicer: $\theta^* = \arg\max_\theta \sum_{i=1}^N \log p_\theta(x^{(i)})$
● Marginalize joint distribution: $p_\theta(x^{(i)}) = \int p_\theta(x^{(i)}, z)\, dz = \int p_\theta(x^{(i)} \mid z)\, p_\theta(z)\, dz$
● Intractable integral =(
Kingma and Welling, ICLR 2014
Variational Autoencoder: Math
Variational lower bound (“elbow”, i.e. ELBO): $\log p_\theta(x^{(i)}) \ge \mathcal{L}(x^{(i)}, \theta, \phi) = \mathbb{E}_{z \sim q_\phi(z \mid x^{(i)})}\left[\log p_\theta(x^{(i)} \mid z)\right] - D_{KL}\left(q_\phi(z \mid x^{(i)}) \,\|\, p_\theta(z)\right)$
● First term: reconstruct the input data. Sampling with reparam. trick (see paper).
● Second term: latent states should follow the prior. Everything is Gaussian, closed form solution!
Training: Maximize lower bound
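One standard route to the variational lower bound, in the setting of Kingma and Welling, is sketched below. The notation is assumed for this note: $q_\phi(z \mid x)$ is the encoder's approximate posterior, $p_\theta$ is the decoder model, and $\mathcal{L}$ is the bound.

```latex
\begin{align*}
\log p_\theta(x^{(i)})
  &= \mathbb{E}_{z \sim q_\phi(z \mid x^{(i)})}\left[\log p_\theta(x^{(i)})\right]
     && \text{($p_\theta(x^{(i)})$ does not depend on $z$)} \\
  &= \mathbb{E}_{z}\left[\log \frac{p_\theta(x^{(i)} \mid z)\, p_\theta(z)}{p_\theta(z \mid x^{(i)})}\right]
     && \text{(Bayes rule)} \\
  &= \mathbb{E}_{z}\left[\log \frac{p_\theta(x^{(i)} \mid z)\, p_\theta(z)}{p_\theta(z \mid x^{(i)})}
       \cdot \frac{q_\phi(z \mid x^{(i)})}{q_\phi(z \mid x^{(i)})}\right]
     && \text{(multiply by 1)} \\
  &= \mathbb{E}_{z}\left[\log p_\theta(x^{(i)} \mid z)\right]
     - \mathbb{E}_{z}\left[\log \frac{q_\phi(z \mid x^{(i)})}{p_\theta(z)}\right]
     + \mathbb{E}_{z}\left[\log \frac{q_\phi(z \mid x^{(i)})}{p_\theta(z \mid x^{(i)})}\right] \\
  &= \underbrace{\mathbb{E}_{z}\left[\log p_\theta(x^{(i)} \mid z)\right]
     - D_{KL}\left(q_\phi(z \mid x^{(i)}) \,\|\, p_\theta(z)\right)}_{\mathcal{L}(x^{(i)}, \theta, \phi)}
     + \underbrace{D_{KL}\left(q_\phi(z \mid x^{(i)}) \,\|\, p_\theta(z \mid x^{(i)})\right)}_{\ge 0} \\
  &\ge \mathcal{L}(x^{(i)}, \theta, \phi)
\end{align*}
```

The last KL term involves the intractable true posterior, but since it is non-negative it can be dropped to leave a bound that only needs the encoder, the decoder, and the prior. For a unit Gaussian prior and a diagonal Gaussian $q_\phi$ with mean $\mu^z$ and variances $\sigma_j^2$, the remaining KL term has the closed form $\frac{1}{2}\sum_j \left(\sigma_j^2 + (\mu_j^z)^2 - 1 - \log \sigma_j^2\right)$.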
Autoencoder Overview
● Traditional Autoencoders
  ○ Try to reconstruct input
  ○ Used to learn features, initialize supervised model
  ○ Not used much anymore
● Variational Autoencoders
  ○ Bayesian meets deep learning
  ○ Sample from model to generate images
Generative Adversarial Nets
Can we generate images with less math?
Random noise z -> Generator -> fake image x
Goodfellow et al, “Generative Adversarial Nets”, NIPS 2014