Unsupervised Learning of Visual Structure Using Predictive Generative Networks




  1. Unsupervised Learning of Visual Structure Using Predictive Generative Networks. William Lotter, Gabriel Kreiman & David Cox, Harvard University, Cambridge, USA. Article overview by Ilya Kuzovkin, Computational Neuroscience Seminar, University of Tartu, 2015

  2. The idea of predictive coding in neuroscience

  3. “state-of-the-art deep learning models rely on millions of labeled training examples to learn”

  4. “state-of-the-art deep learning models rely on millions of labeled training examples to learn” “in contrast to biological systems, where learning is largely unsupervised”

  5. “state-of-the-art deep learning models rely on millions of labeled training examples to learn” “in contrast to biological systems, where learning is largely unsupervised” “we explore the idea that prediction is not only a useful end-goal, but may also serve as a powerful unsupervised learning signal”

  6. PART I: THE IDEA OF PREDICTIVE ENCODER. "prediction may also serve as a powerful unsupervised learning signal"

  7. PREDICTIVE GENERATIVE NETWORK (a.k.a. “Predictive Encoder”, Palm 2012) vs.

  8. AUTOENCODER: input → “bottleneck” → output

  9. AUTOENCODER: input → “bottleneck” → output

  10. AUTOENCODER: input → “bottleneck” → output

  11. AUTOENCODER: input → “bottleneck” → output (Reconstruction)

  12. AUTOENCODER: input → “bottleneck” → output (Reconstruction)

  13. AUTOENCODER: input → “bottleneck” → output (Reconstruction). Can we do prediction?
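To make the autoencoder idea concrete, here is a minimal Keras-style sketch (the layer sizes are illustrative assumptions, not taken from the paper): the input is squeezed through a low-dimensional “bottleneck” and the network is trained to reconstruct its own input.

```python
from tensorflow.keras import layers, models

inputs = layers.Input(shape=(784,))                        # e.g. a flattened 28x28 image (assumed size)
bottleneck = layers.Dense(32, activation="relu")(inputs)   # the low-dimensional "bottleneck"
reconstruction = layers.Dense(784, activation="sigmoid")(bottleneck)

autoencoder = models.Model(inputs, reconstruction)
autoencoder.compile(optimizer="adam", loss="mse")
# the training target is the input itself: autoencoder.fit(x, x, epochs=...)
```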

  14. PREDICTIVE GENERATIVE NETWORK (a.k.a. “Predictive Encoder”, Palm 2012) vs.

  15. RECURRENT NEURAL NETWORK

  16. RECURRENT NEURAL NETWORK

  17. RECURRENT NEURAL NETWORK

  18. RECURRENT NEURAL NETWORK

  19. PREDICTIVE GENERATIVE NETWORK (a.k.a. “Predictive Encoder”, Palm 2012) vs.

  20. PREDICTIVE GENERATIVE NETWORK (a.k.a. “Predictive Encoder”, Palm 2012): 2x { Convolution, ReLU, Max-pooling } vs.

  21. PREDICTIVE GENERATIVE NETWORK (a.k.a. “Predictive Encoder”, Palm 2012): 2x { Convolution, ReLU, Max-pooling }, Long Short-Term Memory (LSTM) with 1024 units, 5-15 steps vs.

  22. PREDICTIVE GENERATIVE NETWORK (a.k.a. “Predictive Encoder”, Palm 2012): 2x { Convolution, ReLU, Max-pooling }, Long Short-Term Memory (LSTM) with 1024 units, 5-15 steps, 2 layers of { NN upsampling, Convolution, ReLU } vs.

  23. PREDICTIVE GENERATIVE NETWORK (a.k.a. “Predictive Encoder”, Palm 2012): 2x { Convolution, ReLU, Max-pooling }, Long Short-Term Memory (LSTM) with 1024 units, 5-15 steps, 2 layers of { NN upsampling, Convolution, ReLU }, MSE loss, RMSProp optimizer, LR 0.001 vs.

  24. PREDICTIVE GENERATIVE NETWORK (a.k.a. “Predictive Encoder”, Palm 2012), implemented in Keras (http://keras.io): 2x { Convolution, ReLU, Max-pooling }, Long Short-Term Memory (LSTM) with 1024 units, 5-15 steps, 2 layers of { NN upsampling, Convolution, ReLU }, MSE loss, RMSProp optimizer, LR 0.001 vs.
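The slides above specify the Part I architecture: a two-layer convolutional encoder (convolution, ReLU, max-pooling), a 1024-unit LSTM run over 5-15 input frames, a two-layer nearest-neighbour upsampling decoder, MSE loss, and RMSProp with learning rate 0.001, implemented in Keras. Below is a minimal sketch under those constraints; the frame size, filter counts, and kernel sizes are assumptions for illustration only.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

T, H, W, C = 10, 64, 64, 1          # assumed sequence length and frame shape

# Encoder applied to every frame in the input sequence
encoder = models.Sequential([
    layers.Conv2D(16, 5, padding="same", activation="relu"),
    layers.MaxPooling2D(2),
    layers.Conv2D(32, 5, padding="same", activation="relu"),
    layers.MaxPooling2D(2),
    layers.Flatten(),
])

# Decoder maps the final LSTM state to the predicted next frame
decoder = models.Sequential([
    layers.Dense((H // 4) * (W // 4) * 32, activation="relu"),
    layers.Reshape((H // 4, W // 4, 32)),
    layers.UpSampling2D(2),                      # nearest-neighbour upsampling
    layers.Conv2D(16, 5, padding="same", activation="relu"),
    layers.UpSampling2D(2),
    layers.Conv2D(C, 5, padding="same"),         # predicted frame
])

frames = layers.Input(shape=(T, H, W, C))
features = layers.TimeDistributed(encoder)(frames)
state = layers.LSTM(1024)(features)              # 1024 LSTM units (slide 21)
prediction = decoder(state)

pgn = models.Model(frames, prediction)
pgn.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.001),
            loss="mse")                          # MSE loss, LR 0.001 (slide 23)
```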

  25. PART II: ADVERSARIAL LOSS. "the generator is trained to maximally confuse the adversarial discriminator"

  26. 2x { Convolution, ReLU, Max-pooling }, Long Short-Term Memory (LSTM) with 1568 units, 5-15 steps, fully connected layer, 2 layers of { NN upsampling, Convolution, ReLU }, MSE loss, RMSProp optimizer, LR 0.001 vs.

  27. 2x { Convolution, ReLU, Max-pooling }, Long Short-Term Memory (LSTM) with 1568 units, 5-15 steps, fully connected layer, 2 layers of { NN upsampling, Convolution, ReLU }, MSE loss, RMSProp optimizer, LR 0.001 vs.

  28. MSE loss

  29. MSE loss

  30. MSE loss

  31. 3 FC layers (relu, relu, softmax) MSE loss

  32. "trained to maximize the probability that a proposed frame came from the ground truth data and minimize it when it is produced by the generator" 3 FC layers (relu, relu, softmax) MSE loss

  33. "trained to maximize the probability that a proposed frame came from the ground truth data and minimize it when it is produced by the generator" N G P n i a r t o t 3 FC layers s s o l L (relu, relu, softmax) A AL loss MSE loss

  34. "trained to maximize the probability that a proposed frame came from the ground truth data and minimize it when it is produced by the generator" N G P n i a r t o t 3 FC layers s s o l L (relu, relu, softmax) A AL loss MSE loss

  35. The MSE model is fairly faithful to the identities of the faces, but produces blurred versions; the combined AL/MSE model tends to underfit the identity towards a more average face; “with adversarial loss alone the generator easily found solutions that fooled the discriminator, but did not look anything like the correct samples”

  36. PART III: INTERNAL REPRESENTATIONS AND LATENT VARIABLES. "we are interested in understanding the representations learned by the models"

  37. PGN model → LSTM activities → L2 regression → value of a latent variable

  38. PGN model → LSTM activities → L2 regression → value of a latent variable
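A minimal sketch of this decoding analysis, assuming the LSTM activations and latent values are available as arrays (the file names below are hypothetical): an L2-regularized (ridge) regression maps LSTM activities to the value of a latent variable such as the face's rotation angle.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# lstm_activities: (n_samples, 1024) LSTM activations from the trained PGN
# latent_values:   (n_samples,) ground-truth value of one latent variable
lstm_activities = np.load("lstm_activities.npy")   # hypothetical file names
latent_values = np.load("latent_values.npy")

X_train, X_test, y_train, y_test = train_test_split(
    lstm_activities, latent_values, test_size=0.2, random_state=0)

ridge = Ridge(alpha=1.0)            # L2-regularized linear regression
ridge.fit(X_train, y_train)
print("decoding R^2:", ridge.score(X_test, y_test))
```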

  39. MULTIDIMENSIONAL SCALING: “An MDS algorithm aims to place each object in N-dimensional space such that the between-object distances are preserved as well as possible.”
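For illustration, a 2D MDS embedding of the same hypothetical activation matrix from the previous sketch could be computed as follows.

```python
import numpy as np
from sklearn.manifold import MDS

lstm_activities = np.load("lstm_activities.npy")     # hypothetical file name
mds = MDS(n_components=2, random_state=0)
embedding_2d = mds.fit_transform(lstm_activities)    # (n_samples, 2) layout
```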

  40. "representations trained with a predictive loss outperform other models of comparable complexity in a supervised classification problem " P ART IV U SEFULNESS OF P REDICTIVE L EARNING

  41. THE TASK: 50 randomly generated faces (12 angles each). Generative models → internal representation → SVM → identify class

  42. THE TASK: 50 randomly generated faces (12 angles each). Generative models → internal representation → SVM → identify class
  • Encoder-LSTM-Decoder to predict next frame (PGN)
  • Encoder-LSTM-Decoder to predict last frame (AE LSTM dynamic)
  • Encoder-LSTM-Decoder on frames made into static movies (AE LSTM static)
  • Encoder-FC-Decoder with #weights as in LSTM (AE FC #weights)
  • Encoder-FC-Decoder with #units as in LSTM (AE FC #units)
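A minimal sketch of this evaluation, with hypothetical file names and assumed SVM settings: fit an SVM on a model's internal representations and score how well it identifies which of the 50 faces each frame shows.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

# representations: (n_frames, n_features) internal representation of each frame
# identities:      (n_frames,) face identity label 0..49 (12 angles per face)
representations = np.load("pgn_representations.npy")   # hypothetical file names
identities = np.load("face_identities.npy")

svm = LinearSVC(C=1.0)
scores = cross_val_score(svm, representations, identities, cv=5)
print("identity classification accuracy: %.3f" % scores.mean())
```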
