Lecture 13: Introduction to Deep Learning. Aykut Erdem, March 2016, Hacettepe University
Last time... Computational Graph: inputs x and W feed a multiplication node producing the scores s; a hinge-loss node and a regularization term R sum into the loss L. (slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson)
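As a quick refresher, here is a minimal numpy sketch of that graph's forward pass (the function name `svm_loss` and the `reg` strength are our own choices; the node structure follows the slide):

```python
import numpy as np

def svm_loss(W, x, y, reg=1e-3):
    """Forward pass of the graph: scores s = W x (the * node),
    multiclass hinge loss on s, plus a regularization term R(W)."""
    s = W @ x                              # scores
    margins = np.maximum(0, s - s[y] + 1)  # hinge loss per class
    margins[y] = 0                         # correct class contributes 0
    R = reg * np.sum(W * W)                # regularization R(W)
    return margins.sum() + R               # the + node: L = data loss + R

# Example: 10 classes, 3073-dim input (e.g., CIFAR-10 pixels + bias)
W = np.random.randn(10, 3073) * 0.001
x, y = np.random.randn(3073), 4
L = svm_loss(W, x, y)
```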
Last time... Training Neural Networks. Mini-batch SGD Loop: 1. Sample a batch of data 2. Forward prop it through the graph, get loss 3. Backprop to calculate the gradients 4. Update the parameters using the gradient (slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson)
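A minimal numpy sketch of that loop, assuming a user-supplied `loss_and_grads` routine that does the forward pass and backprop (all names here are our own):

```python
import numpy as np

def sgd(params, X, y, loss_and_grads, lr=1e-2, batch_size=64, steps=1000):
    for _ in range(steps):
        # 1. Sample a batch of data
        idx = np.random.choice(X.shape[0], batch_size, replace=False)
        # 2./3. Forward prop to get the loss, backprop to get gradients
        loss, grads = loss_and_grads(params, X[idx], y[idx])
        # 4. Update the parameters using the gradient
        for k in params:
            params[k] -= lr * grads[k]
    return params
```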
This week • Introduction to Deep Learning • Deep Convolutional Networks • Brief Overview of other Deep Networks
Deep Learning
Synonyms • Representation Learning • Deep (Machine) Learning • Deep Neural Networks • Deep Unsupervised Learning • Simply: Deep Learning (slide by Dhruv Batra)
Recap: 1-Layer Neural Network • 1 neuron: takes input x, outputs y • "Neuron": y = τ(f(x)), where f(x | w, b) = w^T x − b = w_1 x_1 + w_2 x_2 + w_3 x_3 − b • τ is a non-linearity: sigmoid, tanh, or rectified linear • ≈ Logistic Regression! Trained with gradient descent. (slide by Yisong Yue)
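A minimal numpy sketch of that single neuron (the helper names are ours; note the slide's convention y = τ(w^T x − b)):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b, tau=sigmoid):
    """One neuron: y = tau(f(x)), with f(x | w, b) = w^T x - b."""
    return tau(np.dot(w, x) - b)

# Example with 3 inputs, matching the expanded sum on the slide:
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.1, 0.4, -0.2])
y = neuron(x, w, b=0.3)          # swap tau for np.tanh or a ReLU
```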
Recap: 2-Layer Neural Network • 2 layers of neurons: the 1st (hidden) layer takes input x; the 2nd layer takes the output of the 1st layer. Non-linear! • Can approximate arbitrary functions, provided the hidden layer is large enough: a "fat" 2-layer network. (slide by Yisong Yue)
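A minimal sketch of such a 2-layer network's forward pass, continuing the numpy convention above (the sizes and initialization scheme are illustrative assumptions):

```python
import numpy as np

def two_layer_net(x, W1, b1, W2, b2):
    h = np.tanh(W1 @ x + b1)   # 1st (hidden) layer: non-linear!
    return W2 @ h + b2         # 2nd layer reads the hidden activations

# A "fat" network just means a large hidden size H.
D, H = 3, 100
rng = np.random.default_rng(0)
W1, b1 = 0.01 * rng.normal(size=(H, D)), np.zeros(H)
W2, b2 = 0.01 * rng.normal(size=(1, H)), np.zeros(1)
y = two_layer_net(rng.normal(size=D), W1, b1, W2, b2)
```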
Deep Neural Networks • Why prefer deep over a "fat" 2-layer network? A deep model can be compact, whereas an equivalent "fat" model may need to be exponentially large. (slide by Yisong Yue; image source: http://blog.peltarion.com/2014/06/22/deep-learning-and-deep-neural-networks-in-synapse/)
Original Biological Inspiration • David Hubel & Torsten Wiesel discovered "simple cells" and "complex cells" in 1959: - Some cells activate for simple patterns, e.g., lines at certain angles - Some cells activate for more complex patterns, and appear to take the activations of simple cells as input (slide by Yisong Yue; image sources: https://cms.www.countway.harvard.edu/wp/wp-content/uploads/2013/09/0002595_ref.jpg, https://cognitiveconsonance.files.wordpress.com/2013/05/c_fig5.jpg)
Early Hierarchical Feature Models for Vision • Hubel & Wiesel [60s]: simple & complex cells architecture • Fukushima's Neocognitron [70s] (slide by Joan Bruna; figures from Yann LeCun's CVPR plenary)
Early Hierarchical Feature Models for Vision • Yann LeCun's early ConvNets [80s]: - Used for character recognition - Trained with backpropagation (slide by Joan Bruna; figures from Yann LeCun's CVPR plenary)
Deep Learning pre-2012 • Despite its very competitive performance, deep learning architectures were not widespread before 2012. - State of the art in handwritten pattern recognition [LeCun et al. '89, Ciresan et al. '07, etc.] (slide by Joan Bruna; figures from Yann LeCun's CVPR plenary)
Deep Learning pre-2012 • Despite its very competitive performance, deep learning architectures were not widespread before 2012. - Face detection [Vaillant et al. '93, '94; Osadchy et al. '03, '04, '07] (slide by Joan Bruna; figures from Yann LeCun's CVPR plenary)
Deep Learning pre-2012 • Despite its very competitive performance, deep learning architectures were not widespread before 2012. - Scene parsing [Farabet et al. '12, '13] (slide by Joan Bruna; figures from Yann LeCun's CVPR plenary)
ImageNet • Object recognition competition (2012) - 1.5 million labeled training examples - ≈ 1000 classes (e.g., leopard, mushroom, mite) (slide by Yisong Yue; http://www.image-net.org/)
Deep Learning Golden Age in Vision • 2012-2014 ImageNet results • 2015 results: MSRA under 3.5% error (using a CNN with 150 layers!) (slide by Joan Bruna; figures from Yann LeCun's CVPR plenary)
Traditional Machine Learning • VISION: hand-crafted features (SIFT/HOG; fixed) → your favorite classifier (learned) → "car" • SPEECH: hand-crafted features (MFCC; fixed) → your favorite classifier (learned) → \ˈdēp\ • NLP ("This burrito place is yummy and fun!"): hand-crafted features (Bag-of-words; fixed) → your favorite classifier (learned) → "+" (slide by Marc'Aurelio Ranzato, Yann LeCun)
It's an old paradigm • The first learning machine: the Perceptron - Built at Cornell in 1960 • The Perceptron was a linear classifier on top of a simple feature extractor: y = sign(∑_{i=1}^{N} W_i F_i(X) + b) • The vast majority of practical applications of ML today use glorified linear classifiers or glorified template matching. • Designing a feature extractor requires considerable effort by experts. (slide by Marc'Aurelio Ranzato, Yann LeCun)
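A minimal numpy sketch of the Perceptron's decision rule and its classic mistake-driven update (here `feats` stands for the hand-designed features F(X), and labels are assumed to be in {−1, +1}):

```python
import numpy as np

def predict(feats, w, b):
    """y = sign(sum_i W_i F_i(X) + b)."""
    return np.sign(feats @ w + b)

def perceptron_train(feats, y, epochs=10):
    w, b = np.zeros(feats.shape[1]), 0.0
    for _ in range(epochs):
        for f_i, y_i in zip(feats, y):
            if y_i * (f_i @ w + b) <= 0:  # mistake: update toward y_i
                w += y_i * f_i
                b += y_i
    return w, b
```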
Hierarchical Compositionality • VISION: pixels → edge → texton → motif → part → object • SPEECH: sample → spectral band → formant → motif → phone → word • NLP: character → word → NP/VP/... → clause → sentence → story (slide by Marc'Aurelio Ranzato, Yann LeCun)
Building A Complicated Function • Given a library of simple functions • Compose them into a complicated function (slide by Marc'Aurelio Ranzato, Yann LeCun)
Building A Complicated Function • Given a library of simple functions, compose them into a complicated function • Idea 1: Linear combinations - Boosting - Kernels - ... (slide by Marc'Aurelio Ranzato, Yann LeCun)
Building A Complicated Function • Given a library of simple functions, compose them into a complicated function • Idea 2: Compositions - Deep Learning - Grammar models - Scattering transforms - ... (slide by Marc'Aurelio Ranzato, Yann LeCun)
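A toy sketch contrasting the two ideas (the particular library of simple functions here is an arbitrary illustration):

```python
import numpy as np

library = [np.tanh, np.abs, np.sin]   # a library of simple functions
x = np.linspace(-2, 2, 5)

# Idea 1: a weighted linear combination (boosting/kernel style)
shallow = sum(a * f(x) for a, f in zip([0.5, 0.3, 0.2], library))

# Idea 2: composition -- each function feeds the next (deep style)
deep = x
for f in library:
    deep = f(deep)
```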
Deep Learning = Hierarchical Compositionality • [figure: an input image mapped to the prediction "car"] (slide by Marc'Aurelio Ranzato, Yann LeCun)
Deep Learning = Hierarchical Compositionality • Low-Level Feature → Mid-Level Feature → High-Level Feature → Trainable Classifier → "car" • Feature visualization of a convolutional net trained on ImageNet, from [Zeiler & Fergus 2013] (slide by Marc'Aurelio Ranzato, Yann LeCun)
The Mammalian Visual Cortex is Hierarchical • The ventral (recognition) pathway in the visual cortex (picture from Simon Thorpe; slide by Marc'Aurelio Ranzato, Yann LeCun)
Traditional Machine Learning (more accurately) • VISION: SIFT/HOG (fixed) → K-Means/pooling (unsupervised, "learned") → classifier (supervised) → "car" • SPEECH: MFCC (fixed) → Mixture of Gaussians (unsupervised) → classifier (supervised) → \ˈdēp\ • NLP ("This burrito place is yummy and fun!"): n-grams (fixed) → syntactic parse tree (unsupervised) → classifier (supervised) → "+" (slide by Marc'Aurelio Ranzato, Yann LeCun)
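As a toy sketch of such a fixed → unsupervised → supervised pipeline (scikit-learn assumed available; flattened pixels stand in for SIFT/HOG descriptors, and the data here is random):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
images = rng.normal(size=(200, 8, 8))        # toy "images"
labels = rng.integers(0, 2, size=200)        # toy binary labels

# Fixed stage: hand-crafted features (flattened pixels as a stand-in)
feats = images.reshape(len(images), -1)

# Unsupervised ("learned") stage: K-Means codebook; encode each
# example by its distances to the learned centroids
codes = KMeans(n_clusters=16, n_init=10, random_state=0).fit_transform(feats)

# Supervised stage: a linear classifier on top
clf = LinearSVC().fit(codes, labels)
```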
Deep Learning = End-to-End Learning • [same pipeline figure as above: fixed → unsupervised → supervised stages] (slide by Marc'Aurelio Ranzato, Yann LeCun)
Deep Learning = End-to-End Learning • A hierarchy of trainable feature transforms - Each module transforms its input representation into a higher-level one - High-level features are more global and more invariant - Low-level features are shared among categories • Trainable Feature-Transform/Classifier → Trainable Feature-Transform/Classifier → Trainable Feature-Transform/Classifier (Learned Internal Representations) (slide by Marc'Aurelio Ranzato, Yann LeCun)
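A minimal sketch of such a hierarchy as code: a stack of trainable transforms applied end to end (the module name and sizes are our own illustration; training the stack would use the SGD loop from earlier):

```python
import numpy as np

class AffineTanh:
    """One trainable feature transform: affine map + tanh."""
    def __init__(self, d_in, d_out, rng):
        self.W = 0.01 * rng.normal(size=(d_out, d_in))
        self.b = np.zeros(d_out)
    def forward(self, x):
        return np.tanh(self.W @ x + self.b)

rng = np.random.default_rng(0)
# Each module maps its input representation to a higher-level one.
stack = [AffineTanh(64, 32, rng), AffineTanh(32, 16, rng), AffineTanh(16, 4, rng)]

x = rng.normal(size=64)
for layer in stack:        # the internal representations are learned,
    x = layer.forward(x)   # not hand-designed at any stage
```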
"Shallow" vs Deep Learning • "Shallow" models: hand-crafted feature extractor (fixed) → "simple" trainable classifier (learned) • Deep models: Trainable Feature-Transform/Classifier → Trainable Feature-Transform/Classifier → Trainable Feature-Transform/Classifier (Learned Internal Representations) (slide by Marc'Aurelio Ranzato, Yann LeCun)
Next lecture: Deep Convolutional Nets