
Lecture 13: Introduction to Deep Learning
Aykut Erdem, March 2016, Hacettepe University



  1. Lecture 13: Introduction to Deep Learning. Aykut Erdem, March 2016, Hacettepe University

  2. Last time… Computational Graph: inputs x and W feed a multiply node producing the scores s; the hinge loss on s plus the regularization term R gives the total loss L. (slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson)

  3. Last time… Training Neural Networks: Mini-batch SGD. Loop: 1. Sample a batch of data. 2. Forward-prop it through the graph, get the loss. 3. Backprop to calculate the gradients. 4. Update the parameters using the gradient. (slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson) A sketch of this loop follows below.
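A minimal sketch of the mini-batch SGD loop above, in NumPy. The model (a linear regressor with squared error), the synthetic data, and the batch size and learning rate are all illustrative assumptions, not part of the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.normal(size=(1000, 10)), rng.normal(size=(1000, 1))
W = np.zeros((10, 1))        # parameters to learn
lr, batch_size = 1e-2, 32    # assumed hyperparameters

for step in range(100):
    # 1. Sample a batch of data
    idx = rng.choice(len(X), size=batch_size, replace=False)
    xb, yb = X[idx], y[idx]
    # 2. Forward prop it through the graph, get the loss
    pred = xb @ W
    loss = np.mean((pred - yb) ** 2)
    # 3. Backprop to calculate the gradient of the loss w.r.t. W
    grad = 2 * xb.T @ (pred - yb) / batch_size
    # 4. Update the parameters using the gradient
    W -= lr * grad
    if step % 20 == 0:
        print(step, loss)
```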

  4. This week: • Introduction to Deep Learning • Deep Convolutional Networks • Brief Overview of Other Deep Networks

  5. Deep Learning

  6. Synonyms: • Representation Learning • Deep (Machine) Learning • Deep Neural Networks • Deep Unsupervised Learning • Simply: Deep Learning (slide by Dhruv Batra)

  7. Recap: 1-Layer Neural Network. • One neuron: takes input x, outputs y. y = τ(f(x)), where f(x; w, b) = w^T x - b = w_1 x_1 + w_2 x_2 + w_3 x_3 - b, and τ is a nonlinearity such as the sigmoid, tanh, or rectified linear function. • This is roughly logistic regression, trained with gradient descent. (slide by Yisong Yue) A sketch follows below.
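A minimal sketch of this single neuron in NumPy; the input, weight, and bias values are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b, tau=sigmoid):
    # f(x; w, b) = w^T x - b, then squash with the nonlinearity tau
    return tau(w @ x - b)

x = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, -0.25, 0.1])
y = neuron(x, w, b=0.2)    # tau could also be np.tanh or a ReLU
```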

  8. Recap: 2-Layer Neural Network. • Two layers of neurons: the 1st (hidden) layer takes the input x and applies a nonlinearity; the 2nd layer takes the output of the 1st. • Such a network can approximate arbitrary functions, provided the hidden layer is large enough: a “fat” 2-layer network. (slide by Yisong Yue) A sketch follows below.
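A minimal sketch of the 2-layer forward pass in NumPy; the layer sizes and random weights are illustrative assumptions.

```python
import numpy as np

def two_layer_net(x, W1, b1, W2, b2):
    h = np.tanh(W1 @ x + b1)    # 1st layer: nonlinear hidden units
    return W2 @ h + b2          # 2nd layer: reads out the hidden activations

rng = np.random.default_rng(0)
x = rng.normal(size=3)
W1, b1 = rng.normal(size=(8, 3)), np.zeros(8)    # 8 hidden neurons ("fat" if large)
W2, b2 = rng.normal(size=(1, 8)), np.zeros(1)
y = two_layer_net(x, W1, b1, W2, b2)
```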

  9. Deep Neural Networks. • Why prefer a deep model over a “fat” 2-layer one? A deep model can be compact, while an equivalent “fat” 2-layer model may need to be exponentially large. (slide by Yisong Yue; image source: http://blog.peltarion.com/2014/06/22/deep-learning-and-deep-neural-networks-in-synapse/)

  10. Original Biological Inspiration. • David Hubel & Torsten Wiesel discovered “simple cells” and “complex cells” in 1959. • Some cells activate for simple patterns, e.g., lines at certain angles. • Other cells activate for more complex patterns and appear to take the activations of simple cells as input. (slide by Yisong Yue; image sources: https://cms.www.countway.harvard.edu/wp/wp-content/uploads/2013/09/0002595_ref.jpg, https://cognitiveconsonance.files.wordpress.com/2013/05/c_fig5.jpg)

  11. (figure-only slide)

  12. Early Hierarchical Feature Models for Vision. • Hubel & Wiesel [1960s]: simple & complex cells architecture. • Fukushima’s Neocognitron [1970s]. (slide by Joan Bruna; figures from Yann LeCun’s CVPR plenary)

  13. Early Hierarchical Feature Models for Vision. • Yann LeCun’s early ConvNets [1980s]: used for character recognition, trained with backpropagation. (slide by Joan Bruna; figures from Yann LeCun’s CVPR plenary)

  14. Deep Learning pre-2012. • Despite its very competitive performance, deep learning architectures were not widespread before 2012. • State of the art in handwritten pattern recognition [LeCun et al. ’89; Ciresan et al. ’07; etc.]. (slide by Joan Bruna; figures from Yann LeCun’s CVPR plenary)

  15. Deep Learning pre-2012. • Despite its very competitive performance, deep learning architectures were not widespread before 2012. • Face detection [Vaillant et al. ’93, ’94; Osadchy et al. ’03, ’04, ’07]. (slide by Joan Bruna; figures from Yann LeCun’s CVPR plenary)

  16. Deep Learning pre-2012. • Despite its very competitive performance, deep learning architectures were not widespread before 2012. • Scene parsing [Farabet et al. ’12, ’13]. (slide by Joan Bruna; figures from Yann LeCun’s CVPR plenary)

  17. Deep Learning pre-2012 (cont’d). • Scene parsing [Farabet et al. ’12, ’13]. (slide by Joan Bruna; figures from Yann LeCun’s CVPR plenary)

  18. ImageNet. • Object recognition competition (2012): 1.5 million labeled training examples, ≈1000 classes (e.g., leopard, mushroom, mite). (slide by Yisong Yue) http://www.image-net.org/

  19. Deep Learning’s Golden Age in Vision. • 2012-2014 ImageNet results (see figure). • 2015 results: MSRA under 3.5% error, using a CNN with 150 layers! (slide by Joan Bruna; figures from Yann LeCun’s CVPR plenary)

  20. Traditional Machine Learning. • VISION: image → hand-crafted features (SIFT/HOG, fixed) → your favorite classifier (learned) → “car”. • SPEECH: audio → hand-crafted features (MFCC, fixed) → your favorite classifier (learned) → \ˈdēp\. • NLP: “This burrito place is yummy and fun!” → hand-crafted features (bag-of-words, fixed) → your favorite classifier (learned) → “+”. (slide by Marc’Aurelio Ranzato, Yann LeCun)

  21. It’s an old paradigm. • The first learning machine, the Perceptron, was built at Cornell in 1960. • The Perceptron was a linear classifier on top of a simple feature extractor: y = sign(∑_{i=1}^{N} W_i F_i(X) + b). • The vast majority of practical applications of ML today use glorified linear classifiers or glorified template matching. • Designing a feature extractor requires considerable effort by experts. (slide by Marc’Aurelio Ranzato, Yann LeCun) A sketch follows below.
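A minimal sketch of this decision rule in NumPy: a hand-designed feature extractor F(X) with a learned linear classifier on top. The particular features and weights are illustrative assumptions.

```python
import numpy as np

def features(x):
    # Hand-crafted feature extractor F(X): designed by an expert, not learned
    return np.array([x.mean(), x.std(), x.max() - x.min()])

def perceptron_predict(x, W, b):
    # y = sign(sum_i W_i * F_i(X) + b)
    return np.sign(W @ features(x) + b)

W, b = np.array([0.4, -1.2, 0.7]), 0.1    # only these are learned
x = np.random.default_rng(0).normal(size=16)
y = perceptron_predict(x, W, b)           # +1 or -1
```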

  22. Hierarchical Compositionality. • VISION: pixels → edge → texton → motif → part → object. • SPEECH: sample → spectral band → formant → motif → phone → word. • NLP: character → word → NP/VP/… → clause → sentence → story. (slide by Marc’Aurelio Ranzato, Yann LeCun)

  23. Building a Complicated Function. • Given a library of simple functions, compose them into a complicated function. (slide by Marc’Aurelio Ranzato, Yann LeCun)

  24. Building a Complicated Function. • Given a library of simple functions, compose them into a complicated function. • Idea 1: Linear combinations, as in boosting, kernels, … (slide by Marc’Aurelio Ranzato, Yann LeCun) A sketch of this idea follows below.
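A minimal sketch of Idea 1 in NumPy: approximate a target by a weighted sum of simple basis functions, the flavor shared by boosting and kernel methods. The basis functions and weights are illustrative assumptions.

```python
import numpy as np

# Library of simple functions
basis = [np.sin, np.cos, np.tanh, lambda x: x ** 2]

def linear_combination(x, weights):
    # f(x) = sum_i w_i * phi_i(x): one flat layer of simple functions
    return sum(w * phi(x) for w, phi in zip(weights, basis))

x = np.linspace(-1.0, 1.0, 5)
y = linear_combination(x, weights=[0.5, -0.3, 1.2, 0.8])
```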

  25. Building a Complicated Function. • Given a library of simple functions, compose them into a complicated function. • Idea 2: Compositions, as in deep learning, grammar models, scattering transforms, … (slide by Marc’Aurelio Ranzato, Yann LeCun) A sketch of this idea follows below.
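A minimal sketch of Idea 2 in NumPy: build the complicated function by composing simple functions, f = f3 ∘ f2 ∘ f1, which is exactly the shape of a deep network. The layer sizes and random weights are illustrative assumptions.

```python
import numpy as np

def layer(W, b):
    # Each simple function: an affine map followed by a nonlinearity
    return lambda x: np.tanh(W @ x + b)

rng = np.random.default_rng(0)
f1 = layer(rng.normal(size=(8, 3)), np.zeros(8))
f2 = layer(rng.normal(size=(8, 8)), np.zeros(8))
f3 = layer(rng.normal(size=(1, 8)), np.zeros(1))

x = rng.normal(size=3)
y = f3(f2(f1(x)))    # depth comes from composition, not from a wider sum
```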

  26. Building a Complicated Function. • Given a library of simple functions, compose them into a complicated function. • Idea 2: Compositions, as in deep learning, grammar models, scattering transforms, … (slide by Marc’Aurelio Ranzato, Yann LeCun)

  27. Deep Learning = Hierarchical Compositionality. • (Figure: an image mapped through a learned feature hierarchy to the label “car”.) (slide by Marc’Aurelio Ranzato, Yann LeCun)

  28. Deep Learning = Hierarchical Compositionality. • Image → Low-Level Feature → Mid-Level Feature → High-Level Feature → Trainable Classifier → “car”. • Feature visualization of a convolutional net trained on ImageNet, from [Zeiler & Fergus 2013]. (slide by Marc’Aurelio Ranzato, Yann LeCun)

  29. The Mammalian Visual Cortex is Hierarchical. • The ventral (recognition) pathway in the visual cortex. (slide by Marc’Aurelio Ranzato, Yann LeCun; picture from Simon Thorpe)

  30. Traditional Machine Learning. • VISION: image → hand-crafted features (SIFT/HOG, fixed) → your favorite classifier (learned) → “car”. • SPEECH: audio → hand-crafted features (MFCC, fixed) → your favorite classifier (learned) → \ˈdēp\. • NLP: “This burrito place is yummy and fun!” → hand-crafted features (bag-of-words, fixed) → your favorite classifier (learned) → “+”. (slide by Marc’Aurelio Ranzato, Yann LeCun)

  31. Traditional Machine Learning (more accurately): parts were “learned”. • VISION: image → SIFT/HOG (fixed) → K-Means/pooling (unsupervised) → classifier (supervised) → “car”. • SPEECH: audio → MFCC (fixed) → Mixture of Gaussians (unsupervised) → classifier (supervised) → \ˈdēp\. • NLP: “This burrito place is yummy and fun!” → n-grams (fixed) → parse tree / syntactic features (unsupervised) → classifier (supervised) → “+”. (slide by Marc’Aurelio Ranzato, Yann LeCun) A sketch of such a pipeline follows below.
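A minimal sketch of this three-stage pipeline using scikit-learn: a fixed feature stage (stood in for by synthetic vectors), an unsupervised stage (K-Means codes), and a supervised classifier that alone sees the labels. The data and stage sizes are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 32))    # stand-in for fixed SIFT/HOG features
y = (X[:, 0] > 0).astype(int)     # toy labels

# Unsupervised stage: cluster the fixed features and re-encode each sample
# by its distances to the cluster centers (a crude pooling code)
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X)
codes = kmeans.transform(X)

# Supervised stage: only the final classifier is trained on labels
clf = LogisticRegression(max_iter=1000).fit(codes, y)
print(clf.score(codes, y))
```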

  32. Deep Learning = End-to-End Learning. • (Same VISION / SPEECH / NLP pipelines as the previous slide, now to be learned end to end rather than stage by stage.) (slide by Marc’Aurelio Ranzato, Yann LeCun)

  33. Deep Learning = End-to-End Learning. • A hierarchy of trainable feature transforms: each module transforms its input representation into a higher-level one. • High-level features are more global and more invariant; low-level features are shared among categories. • Trainable Feature-Transform/Classifier → Trainable Feature-Transform/Classifier → Trainable Feature-Transform/Classifier, with learned internal representations. (slide by Marc’Aurelio Ranzato, Yann LeCun) A sketch follows below.
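A minimal sketch of end-to-end training in NumPy: two stacked trainable transforms where the error gradient flows back through every stage, so all stages are learned jointly. The data, shapes, and learning rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 4))
y = (X.sum(axis=1, keepdims=True) > 0).astype(float)   # toy binary labels

W1 = 0.1 * rng.normal(size=(4, 16))    # low-level transform (trainable)
W2 = 0.1 * rng.normal(size=(16, 1))    # high-level transform / classifier (trainable)

for step in range(500):
    h = np.tanh(X @ W1)                          # forward through the whole stack
    p = 1.0 / (1.0 + np.exp(-(h @ W2)))          # probability of class 1
    dlogits = (p - y) / len(X)                   # gradient of cross-entropy loss
    dW2 = h.T @ dlogits                          # ...reaches the top stage
    dW1 = X.T @ (dlogits @ W2.T * (1 - h ** 2))  # ...and the bottom stage too
    W1 -= 0.5 * dW1                              # both stages updated jointly
    W2 -= 0.5 * dW2
```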

  34. “Shallow” vs Deep Learning. • “Shallow” models: hand-crafted feature extractor (fixed) → “simple” trainable classifier (learned). • Deep models: Trainable Feature-Transform/Classifier → Trainable Feature-Transform/Classifier → Trainable Feature-Transform/Classifier, with learned internal representations. (slide by Marc’Aurelio Ranzato, Yann LeCun)

  35. Next lecture: Deep Convolutional Nets
