Unsupervised Learning of Visual Structure Using Predictive Generative Networks
William Lotter, Gabriel Kreiman & David Cox (Harvard University, Cambridge, USA)
Article overview by Ilya Kuzovkin
Computational Neuroscience Seminar, University of Tartu, 2015
The idea of predictive coding in neuroscience
“state-of-the-art deep learning models rely on millions of labeled training examples to learn”
“in contrast to biological systems, where learning is largely unsupervised”
“we explore the idea that prediction is not only a useful end-goal, but may also serve as a powerful unsupervised learning signal”
PART I: THE IDEA OF PREDICTIVE ENCODER
"prediction may also serve as a powerful unsupervised learning signal"
PREDICTIVE GENERATIVE NETWORK (a.k.a. “Predictive Encoder”, Palm 2012) vs. AUTOENCODER

AUTOENCODER: input → “bottleneck” → output
Trained to reconstruct its input.
Can we do prediction?
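To make the “bottleneck” idea concrete, here is a minimal autoencoder sketch in Keras (the library used later in the deck). The 784-dimensional input and 64-unit bottleneck are illustrative assumptions, not values from the paper.

```python
# Minimal autoencoder: input -> "bottleneck" -> reconstructed output.
# Sizes (784 inputs, 64 bottleneck units) are illustrative assumptions.
from tensorflow.keras import layers, models

autoencoder = models.Sequential([
    layers.Input(shape=(784,)),           # e.g. a flattened 28x28 frame
    layers.Dense(64, activation="relu"),  # the "bottleneck"
    layers.Dense(784, activation="sigmoid"),
])
autoencoder.compile(optimizer="rmsprop", loss="mse")

# The reconstruction objective uses the input as its own target:
# autoencoder.fit(x, x, epochs=10)
```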
PREDICTIVE GENERATIVE NETWORK (a.k.a. “Predictive Encoder”, Palm 2012) vs. RECURRENT NEURAL NETWORK
PREDICTIVE GENERATIVE NETWORK (a.k.a. “Predictive Encoder”, Palm 2012)
• Input: 5-15 steps (frames)
• Encoder: 2x {Convolution → ReLU → Max-pooling}
• Long Short-Term Memory (LSTM), 1024 units
• Decoder: 2 layers of {NN upsampling → Convolution → ReLU}
• MSE loss, RMSProp optimizer, learning rate 0.001
• Implemented in Keras (http://keras.io); a sketch follows below
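A rough sketch of this Encoder-LSTM-Decoder pipeline in Keras is below. The slide fixes the LSTM size (1024 units), loss (MSE), optimizer (RMSProp) and learning rate (0.001); the frame size, filter counts and kernel sizes are my assumptions.

```python
# Sketch of the PGN: per-frame conv encoder -> LSTM(1024) -> upsampling
# decoder predicting the next frame. Frame size (32x32x1), filter counts
# (16, 32) and kernel sizes are assumptions; LSTM width, loss, optimizer
# and learning rate follow the slide.
from tensorflow.keras import layers, models, optimizers

seq_len, h, w, c = 10, 32, 32, 1  # 10 steps, within the 5-15 range

# Encoder: 2x {Convolution -> ReLU -> Max-pooling}, shared across frames.
encoder = models.Sequential([
    layers.Input(shape=(h, w, c)),
    layers.Conv2D(16, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
])

frames = layers.Input(shape=(seq_len, h, w, c))
x = layers.TimeDistributed(encoder)(frames)  # encode every frame
x = layers.LSTM(1024)(x)                     # 1024 LSTM units

# Decoder: 2 layers of {NN upsampling -> Convolution -> ReLU}.
x = layers.Dense((h // 4) * (w // 4) * 32, activation="relu")(x)
x = layers.Reshape((h // 4, w // 4, 32))(x)
x = layers.UpSampling2D()(x)  # nearest-neighbour by default
x = layers.Conv2D(16, 3, padding="same", activation="relu")(x)
x = layers.UpSampling2D()(x)
next_frame = layers.Conv2D(c, 3, padding="same")(x)

pgn = models.Model(frames, next_frame)
pgn.compile(optimizer=optimizers.RMSprop(learning_rate=0.001), loss="mse")
```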
PART II: ADVERSARIAL LOSS
"the generator is trained to maximally confuse the adversarial discriminator"
The generator (as before, with a wider LSTM and a fully connected layer):
• Input: 5-15 steps
• Encoder: 2x {Convolution → ReLU → Max-pooling}
• Long Short-Term Memory (LSTM), 1568 units
• Fully connected layer
• Decoder: 2 layers of {NN upsampling → Convolution → ReLU}
• MSE loss, RMSProp optimizer, learning rate 0.001
The adversarial discriminator: 3 FC layers (relu, relu, softmax),
"trained to maximize the probability that a proposed frame came from the ground truth data and minimize it when it is produced by the generator"
The PGN itself is trained with the MSE loss combined with the adversarial (AL) loss.
• MSE loss: the model is fairly faithful to the identities of the faces, but produces blurred versions
• Adversarial loss alone: “with adversarial loss alone the generator easily found solutions that fooled the discriminator, but did not look anything like the correct samples”
• Combined AL/MSE: the model tends to underfit the identity towards a more average face
PART III: INTERNAL REPRESENTATIONS AND LATENT VARIABLES
"we are interested in understanding the representations learned by the models"
PGN model → LSTM activities → L2 regression → value of a latent variable
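As an illustration of this decoding step, L2-regularised (ridge) regression from hidden activities to a latent variable can be done with scikit-learn; the data below are random placeholders for real PGN activations and latent values (e.g. a rotation angle):

```python
# Decode a latent variable from LSTM activities with ridge (L2) regression.
# Random placeholders stand in for real activations and latent values.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.standard_normal((600, 1024))  # LSTM activities (samples x units)
y = rng.standard_normal(600)          # latent variable, e.g. pan angle

decoder = Ridge(alpha=1.0).fit(X[:500], y[:500])
print(decoder.score(X[500:], y[500:]))  # R^2 on held-out samples
```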
MULTIDIMENSIONAL SCALING
“An MDS algorithm aims to place each object in N-dimensional space such that the between-object distances are preserved as well as possible.”
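For instance, with scikit-learn the LSTM activity vectors can be projected down to 2D for visualisation (random stand-ins below):

```python
# Project high-dimensional LSTM activity vectors to 2D with MDS,
# preserving pairwise distances as well as possible.
import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
activities = rng.standard_normal((50, 1024))  # 50 stimuli x 1024 units

embedding = MDS(n_components=2, random_state=0).fit_transform(activities)
print(embedding.shape)  # (50, 2)
```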
"representations trained with a predictive loss outperform other models of comparable complexity in a supervised classification problem " P ART IV U SEFULNESS OF P REDICTIVE L EARNING
THE TASK: 50 randomly generated faces (12 angles per each).
Pipeline: generative model → internal representation → SVM → identify class.
Generative models compared (a sketch of the SVM step follows the list):
• Encoder-LSTM-Decoder to predict next frame (PGN)
• Encoder-LSTM-Decoder to predict last frame (AE LSTM dynamic)
• Encoder-LSTM-Decoder on frames made into static movies (AE LSTM static)
• Encoder-FC-Decoder with #weights as in LSTM (AE FC #weights)
• Encoder-FC-Decoder with #units as in LSTM (AE FC #units)
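A sketch of the classification step, assuming a linear SVM over whatever feature vector each model exposes (random placeholder features below; the feature dimensionality is an assumption):

```python
# Identify face class (50 identities, 12 angles each) from a model's
# internal representation with a linear SVM. Placeholder features.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

n_faces, n_angles, dim = 50, 12, 1024
rng = np.random.default_rng(0)
X = rng.standard_normal((n_faces * n_angles, dim))  # internal representations
y = np.repeat(np.arange(n_faces), n_angles)         # identity labels

print(cross_val_score(LinearSVC(), X, y, cv=3).mean())
```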