CS7015 (Deep Learning): Lecture 1
(Partial/Brief) History of Deep Learning
Mitesh M. Khapra
Department of Computer Science and Engineering, Indian Institute of Technology Madras
Acknowledgements: Most of this material is based on the article “Deep Learning in Neural Networks: An Overview” by J. Schmidhuber [1]. The errors, if any, are due to me and I apologize for them. Feel free to contact me if you think certain portions need to be corrected (please provide appropriate references).
Chapter 1: Biological Neurons
Reticular Theory (1871-1873): Joseph von Gerlach proposed that the nervous system is a single continuous network, as opposed to a network of many discrete cells!
Staining Technique: Camillo Golgi discovered a chemical reaction that allowed him to examine nervous tissue in much greater detail than ever before. He was a proponent of Reticular theory.
Neuron Doctrine (1888-1891): Santiago Ramón y Cajal used Golgi’s technique to study the nervous system and proposed that it is actually made up of discrete individual cells forming a network (as opposed to a single continuous network).
The Term Neuron: The term neuron was coined by Heinrich Wilhelm Gottfried von Waldeyer-Hartz around 1891. He further consolidated the Neuron Doctrine.
Nobel Prize (1906): Golgi (reticular theory) and Cajal (neuron doctrine) were jointly awarded the 1906 Nobel Prize in Physiology or Medicine, an award that led to lasting conflict and controversy between the two scientists.
The Final Word (1950s): In the 1950s, electron microscopy finally confirmed the neuron doctrine by unambiguously demonstrating that nerve cells are individual cells interconnected through synapses (a network of many individual neurons).
Chapter 2: From Spring to Winter of AI
McCulloch-Pitts Neuron (1943): McCulloch (a neuroscientist) and Pitts (a logician) proposed a highly simplified model of the neuron [2].
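As a side note (not on the original slide), the McCulloch-Pitts unit can be written compactly. In my notation, with binary inputs x_1, ..., x_n and a threshold θ:

\[
y = \begin{cases} 1 & \text{if } \sum_{i=1}^{n} x_i \geq \theta \text{ and no inhibitory input is active} \\ 0 & \text{otherwise} \end{cases}, \qquad x_i \in \{0, 1\}
\]

Different presentations of the model vary in how they treat inhibitory inputs; the above is one common textbook form.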
Perceptron (1957-1958): “the perceptron may eventually be able to learn, make decisions, and translate languages” - Frank Rosenblatt
Perceptron (1957-1958): “the embryo of an electronic computer that the Navy expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence.” - New York Times
First Generation Multilayer Perceptrons (1965-1968): Ivakhnenko et al. [3]
Perceptron Limitations (1969): In their now famous book “Perceptrons”, Minsky and Papert outlined the limits of what perceptrons could do [4] (most famously, that a single-layer perceptron cannot represent the XOR function).
AI Winter of Connectionism (1969-1986): These limitations almost led to the abandonment of connectionist AI.
Backpropagation (1986): Discovered and rediscovered several times throughout the 1960s and 1970s. Werbos (1982) [5] first used it in the context of artificial neural networks. It was eventually popularized by the work of Rumelhart et al. in 1986 [6].
Gradient Descent (1847): Cauchy discovered gradient descent, motivated by the need to compute the orbits of heavenly bodies.
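For concreteness (this formula is my addition, in standard modern notation), the update Cauchy’s method corresponds to, for minimizing a differentiable function f with step size η, is:

\[
\theta_{t+1} = \theta_t - \eta \, \nabla_{\theta} f(\theta_t)
\]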
Universal Approximation Theorem (1989): A multilayered network of neurons with a single hidden layer can be used to approximate any continuous function to any desired precision [7].
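Stated informally (my paraphrase, in the style of the Cybenko/Hornik results): for any continuous function f on a compact set K, a suitable non-linear activation σ, and any ε > 0, there exist a width N and parameters v_i, w_i, b_i such that

\[
\sup_{x \in K} \left| f(x) - \sum_{i=1}^{N} v_i \, \sigma(w_i^{\top} x + b_i) \right| < \varepsilon .
\]

Note that the theorem guarantees the existence of such a network, not that gradient-based training will actually find it.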
Chapter 3: The Deep Revival
Unsupervised Pre-Training (2006): Hinton and Salakhutdinov described an effective way of initializing the weights that allows deep autoencoder networks to learn a low-dimensional representation of data [8].
Unsupervised Pre-Training (1991-1993): The idea of unsupervised pre-training actually dates back to 1991-1993 (J. Schmidhuber), when it was used to train a “Very Deep Learner”.
More Insights (2007-2009): Further investigations into the effectiveness of unsupervised pre-training.
Success in Handwriting Recognition (2009): Graves et al. outperformed all entries in an international Arabic handwriting recognition competition [9].
Success in Speech Recognition (2010): Dahl et al. showed relative error reductions of 16.0% and 23.2% over a state-of-the-art system [10].
New Record on MNIST (2010): Ciresan et al. set a new record on the MNIST dataset using good old backpropagation on GPUs (GPUs enter the scene) [11].
First Superhuman Visual Pattern Recognition (2011): D. C. Ciresan et al. achieved a 0.56% error rate in the IJCNN Traffic Sign Recognition Competition [12].
Winning More Visual Recognition Challenges: Success on ImageNet (2012-2016)

Network          Error    Layers
AlexNet [13]     16.0%    8
ZFNet [14]       11.2%    8
VGGNet [15]      7.3%     19
GoogLeNet [16]   6.7%     22
MS ResNet [17]   3.6%     152!!
Chapter 4: From Cats to Convolutional Neural Networks
Hubel and Wiesel Experiment (1959): Experimentally showed that each neuron has a fixed receptive field, i.e., a neuron will fire only in response to a visual stimulus in a specific region of the visual space [18].
Neocognitron (1980): Used for handwritten character recognition and pattern recognition (Fukushima et al.) [19].
Convolutional Neural Network (1989): Handwritten digit recognition using backpropagation over a Convolutional Neural Network (LeCun et al.) [20].
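As a reminder of the core operation (my notation, not from the slide), a convolutional layer computes feature maps by sliding a small learned kernel w over the input x; in the cross-correlation form used by most deep learning implementations:

\[
y[i, j] = \sum_{m} \sum_{n} w[m, n] \; x[i + m, \; j + n] + b
\]

The same weights w are shared across all spatial positions, which is what ties the architecture to Hubel and Wiesel’s local receptive fields.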
LeNet-5 (1998): Introduced the (now famous) MNIST dataset (LeCun et al.) [21].
An algorithm inspired by an experiment on cats is today used to detect cats in videos :-)
Chapter 5: Faster, higher, stronger
Better Optimization Methods (1983-2018): Faster convergence, better accuracies. Timeline: Nesterov (1983), Adagrad (2011), RMSProp (2012), Adam / BatchNorm (2015), Eve (2016), Beyond Adam (2018).
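As one representative example from this timeline (written in my notation, following the standard presentation of Adam), the optimizer keeps exponential moving averages of the gradient and its square, with bias correction:

\begin{align}
m_t &= \beta_1 m_{t-1} + (1 - \beta_1) g_t, \qquad
v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2 \\
\hat{m}_t &= m_t / (1 - \beta_1^t), \qquad
\hat{v}_t = v_t / (1 - \beta_2^t) \\
\theta_t &= \theta_{t-1} - \alpha \, \hat{m}_t / \big(\sqrt{\hat{v}_t} + \epsilon\big)
\end{align}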
Chapter 6: The Curious Case of Sequences
Sequences: They are everywhere: time series, speech, music, text, video. Each unit in the sequence interacts with other units. We need models to capture this interaction.
Hopfield Network (1982): Content-addressable memory systems for storing and retrieving patterns [22].
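A minimal sketch of the mechanism (my notation): patterns are stored as local minima of an energy function over binary states s_i ∈ {-1, +1} with symmetric weights w_{ij}, and retrieval proceeds by repeated local updates:

\[
E(s) = -\frac{1}{2} \sum_{i \neq j} w_{ij} s_i s_j - \sum_i b_i s_i,
\qquad
s_i \leftarrow \operatorname{sign}\Big( \sum_j w_{ij} s_j + b_i \Big)
\]

Starting from a corrupted pattern, the updates drive the state downhill in E towards the nearest stored pattern, which is what makes the memory content-addressable.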
Jordan Network (1986): The output state of each time step is fed to the next time step, thereby allowing interactions between time steps in the sequence.
Elman Network (1990): The hidden state of each time step is fed to the next time step, thereby allowing interactions between time steps in the sequence.
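To contrast the two architectures in equations (my notation; the exact parameterizations in the original papers differ slightly): a Jordan network feeds the previous output back into the hidden layer, while an Elman network feeds the previous hidden state back:

\begin{align}
\text{Jordan:}\quad h_t &= \sigma(W x_t + U y_{t-1} + b), & y_t &= g(V h_t + c) \\
\text{Elman:}\quad h_t &= \sigma(W x_t + U h_{t-1} + b), & y_t &= g(V h_t + c)
\end{align}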
Drawbacks of RNNs (1991-1994): Hochreiter et al. and Bengio et al. showed the difficulty in training RNNs (the problem of exploding and vanishing gradients).
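The core of the difficulty can be seen by unrolling the recurrence (my notation, following the standard analysis, with h_t = σ(a_t) and pre-activation a_t = W x_t + U h_{t-1} + b): the gradient of the loss at step T with respect to an early hidden state h_k contains a long product of Jacobians,

\[
\frac{\partial \mathcal{L}_T}{\partial h_k}
= \frac{\partial \mathcal{L}_T}{\partial h_T}
\prod_{t=k+1}^{T} \frac{\partial h_t}{\partial h_{t-1}},
\qquad
\frac{\partial h_t}{\partial h_{t-1}} = \operatorname{diag}\big(\sigma'(a_t)\big)\, U .
\]

Depending on whether the norms of these Jacobians are typically below or above 1, the product shrinks towards zero (vanishing gradients) or blows up (exploding gradients) as T - k grows.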
Long Short Term Memory (1997): Hochreiter and Schmidhuber showed that LSTMs can solve complex long time lag tasks that could never be solved before.
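The key mechanism (shown here in the now-standard formulation with a forget gate, which was added after the original 1997 paper; notation is mine) is an additive cell state update controlled by gates:

\begin{align}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i), \quad
f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f), \quad
o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c), \quad
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \quad
h_t = o_t \odot \tanh(c_t)
\end{align}

Because c_t is updated additively rather than through repeated multiplication by a recurrent weight matrix, gradients can flow across many time steps, which is why long time lag tasks become learnable.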