CS7015 (Deep Learning): Lecture 1
(Partial/Brief) History of Deep Learning
Mitesh M. Khapra
Department of Computer Science and Engineering, Indian Institute of Technology Madras
Acknowledgements: Most of this material is based on the article “Deep Learning in Neural Networks: An Overview” by J. Schmidhuber [1]. The errors, if any, are due to me and I apologize for them. Feel free to contact me if you think certain portions need to be corrected (please provide appropriate references).
Chapter 1: Biological Neurons
Reticular Theory (1871-1873): Joseph von Gerlach proposed that the nervous system is a single continuous network, as opposed to a network of many discrete cells!
Staining Technique: Camillo Golgi discovered a chemical reaction that allowed him to examine nervous tissue in much greater detail than ever before. He was a proponent of Reticular theory.
Neuron Doctrine (1888-1891): Santiago Ramón y Cajal used Golgi’s technique to study the nervous system and proposed that it is actually made up of discrete individual cells forming a network (as opposed to a single continuous network).
The Term Neuron: The term neuron was coined by Heinrich Wilhelm Gottfried von Waldeyer-Hartz around 1891. He further consolidated the Neuron Doctrine.
Nobel Prize (1906): Golgi (reticular theory) and Cajal (neuron doctrine) were jointly awarded the 1906 Nobel Prize in Physiology or Medicine, an award that led to lasting conflict and controversy between the two scientists.
The Final Word (1950s): In the 1950s, electron microscopy finally confirmed the neuron doctrine by unambiguously demonstrating that nerve cells are individual cells interconnected through synapses (a network of many individual neurons).
Chapter 2: From Spring to Winter of AI
McCulloch-Pitts Neuron (1943): McCulloch (a neuroscientist) and Pitts (a logician) proposed a highly simplified model of the neuron [2].
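As a side note (not on the original slide), the McCulloch-Pitts unit can be written compactly. In my notation, with binary inputs x_1, ..., x_n and a threshold θ:

\[
y = \begin{cases} 1 & \text{if } \sum_{i=1}^{n} x_i \geq \theta \text{ and no inhibitory input is active} \\ 0 & \text{otherwise} \end{cases}, \qquad x_i \in \{0, 1\}
\]

Different presentations of the model vary in how they treat inhibitory inputs; the above is one common textbook form.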
Perceptron (1957-1958): “the perceptron may eventually be able to learn, make decisions, and translate languages” - Frank Rosenblatt
Perceptron (1957-1958): “the embryo of an electronic computer that the Navy expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence.” - New York Times
First Generation Multilayer Perceptrons (1965-1968): Ivakhnenko et al. [3]
Perceptron Limitations (1969): In their now famous book “Perceptrons”, Minsky and Papert outlined the limits of what perceptrons could do [4] (most famously, that a single-layer perceptron cannot represent the XOR function).
AI Winter of Connectionism (1969-1986): These limitations almost led to the abandonment of connectionist AI.
Backpropagation (1986): Discovered and rediscovered several times throughout the 1960s and 1970s. Werbos (1982) [5] first used it in the context of artificial neural networks. It was eventually popularized by the work of Rumelhart et al. in 1986 [6].
Gradient Descent (1847): Cauchy discovered gradient descent, motivated by the need to compute the orbits of heavenly bodies.
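For concreteness (this formula is my addition, in standard modern notation), the update Cauchy’s method corresponds to, for minimizing a differentiable function f with step size η, is:

\[
\theta_{t+1} = \theta_t - \eta \, \nabla_{\theta} f(\theta_t)
\]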
Universal Approximation Theorem (1989): A multilayered network of neurons with a single hidden layer can be used to approximate any continuous function to any desired precision [7].
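Stated informally (my paraphrase, in the style of the Cybenko/Hornik results): for any continuous function f on a compact set K, a suitable non-linear activation σ, and any ε > 0, there exist a width N and parameters v_i, w_i, b_i such that

\[
\sup_{x \in K} \left| f(x) - \sum_{i=1}^{N} v_i \, \sigma(w_i^{\top} x + b_i) \right| < \varepsilon .
\]

Note that the theorem guarantees the existence of such a network, not that gradient-based training will actually find it.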
Chapter 3: The Deep Revival
Unsupervised Pre-Training (2006): Hinton and Salakhutdinov described an effective way of initializing the weights that allows deep autoencoder networks to learn a low-dimensional representation of data [8].
Unsupervised Pre-Training (1991-1993): The idea of unsupervised pre-training actually dates back to 1991-1993 (J. Schmidhuber), when it was used to train a “Very Deep Learner”.
More Insights (2007-2009): Further investigations into the effectiveness of unsupervised pre-training.
Success in Handwriting Recognition (2009): Graves et al. outperformed all entries in an international Arabic handwriting recognition competition [9].
Success in Speech Recognition (2010): Dahl et al. showed relative error reductions of 16.0% and 23.2% over a state-of-the-art system [10].
New Record on MNIST (2010): Ciresan et al. set a new record on the MNIST dataset using good old backpropagation on GPUs (GPUs enter the scene) [11].
First Superhuman Visual Pattern Recognition (2011): D. C. Ciresan et al. achieved a 0.56% error rate in the IJCNN Traffic Sign Recognition Competition [12].
Winning More Visual Recognition Challenges: Success on ImageNet (2012-2016)

Network          Error    Layers
AlexNet [13]     16.0%    8
ZFNet [14]       11.2%    8
VGGNet [15]      7.3%     19
GoogLeNet [16]   6.7%     22
MS ResNet [17]   3.6%     152!!
Chapter 4: From Cats to Convolutional Neural Networks
Hubel and Wiesel Experiment (1959): Experimentally showed that each neuron has a fixed receptive field, i.e., a neuron will fire only in response to a visual stimulus in a specific region of the visual space [18].
Neocognitron (1980): Used for handwritten character recognition and pattern recognition (Fukushima et al.) [19].
Convolutional Neural Network (1989): Handwritten digit recognition using backpropagation over a Convolutional Neural Network (LeCun et al.) [20].
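As a reminder of the core operation (my notation, not from the slide), a convolutional layer computes feature maps by sliding a small learned kernel w over the input x; in the cross-correlation form used by most deep learning implementations:

\[
y[i, j] = \sum_{m} \sum_{n} w[m, n] \; x[i + m, \; j + n] + b
\]

The same weights w are shared across all spatial positions, which is what ties the architecture to Hubel and Wiesel’s local receptive fields.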
LeNet-5 (1998): Introduced the (now famous) MNIST dataset (LeCun et al.) [21].
An algorithm inspired by an experiment on cats is today used to detect cats in videos :-)
Chapter 5: Faster, higher, stronger
Better Optimization Methods (1983-2018): Faster convergence, better accuracies. Timeline: Nesterov (1983), Adagrad (2011), RMSProp (2012), Adam / BatchNorm (2015), Eve (2016), Beyond Adam (2018).
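As one representative example from this timeline (written in my notation, following the standard presentation of Adam), the optimizer keeps exponential moving averages of the gradient and its square, with bias correction:

\begin{align}
m_t &= \beta_1 m_{t-1} + (1 - \beta_1) g_t, \qquad
v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2 \\
\hat{m}_t &= m_t / (1 - \beta_1^t), \qquad
\hat{v}_t = v_t / (1 - \beta_2^t) \\
\theta_t &= \theta_{t-1} - \alpha \, \hat{m}_t / \big(\sqrt{\hat{v}_t} + \epsilon\big)
\end{align}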
Chapter 6: The Curious Case of Sequences
Sequences: They are everywhere: time series, speech, music, text, video. Each unit in the sequence interacts with other units. We need models to capture this interaction.
Hopfield Network (1982): Content-addressable memory systems for storing and retrieving patterns [22].
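A minimal sketch of the mechanism (my notation): patterns are stored as local minima of an energy function over binary states s_i ∈ {-1, +1} with symmetric weights w_{ij}, and retrieval proceeds by repeated local updates:

\[
E(s) = -\frac{1}{2} \sum_{i \neq j} w_{ij} s_i s_j - \sum_i b_i s_i,
\qquad
s_i \leftarrow \operatorname{sign}\Big( \sum_j w_{ij} s_j + b_i \Big)
\]

Starting from a corrupted pattern, the updates drive the state downhill in E towards the nearest stored pattern, which is what makes the memory content-addressable.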
Jordan Network (1986): The output state of each time step is fed to the next time step, thereby allowing interactions between time steps in the sequence.
Elman Network (1990): The hidden state of each time step is fed to the next time step, thereby allowing interactions between time steps in the sequence.
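To contrast the two architectures in equations (my notation; the exact parameterizations in the original papers differ slightly): a Jordan network feeds the previous output back into the hidden layer, while an Elman network feeds the previous hidden state back:

\begin{align}
\text{Jordan:}\quad h_t &= \sigma(W x_t + U y_{t-1} + b), & y_t &= g(V h_t + c) \\
\text{Elman:}\quad h_t &= \sigma(W x_t + U h_{t-1} + b), & y_t &= g(V h_t + c)
\end{align}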
Drawbacks of RNNs (1991-1994): Hochreiter et al. and Bengio et al. showed the difficulty in training RNNs (the problem of exploding and vanishing gradients).
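The core of the difficulty can be seen by unrolling the recurrence (my notation, following the standard analysis, with h_t = σ(a_t) and pre-activation a_t = W x_t + U h_{t-1} + b): the gradient of the loss at step T with respect to an early hidden state h_k contains a long product of Jacobians,

\[
\frac{\partial \mathcal{L}_T}{\partial h_k}
= \frac{\partial \mathcal{L}_T}{\partial h_T}
\prod_{t=k+1}^{T} \frac{\partial h_t}{\partial h_{t-1}},
\qquad
\frac{\partial h_t}{\partial h_{t-1}} = \operatorname{diag}\big(\sigma'(a_t)\big)\, U .
\]

Depending on whether the norms of these Jacobians are typically below or above 1, the product shrinks towards zero (vanishing gradients) or blows up (exploding gradients) as T - k grows.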
Long Short Term Memory (1997): Hochreiter and Schmidhuber showed that LSTMs can solve complex long time lag tasks that could never be solved before.
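The key mechanism (shown here in the now-standard formulation with a forget gate, which was added after the original 1997 paper; notation is mine) is an additive cell state update controlled by gates:

\begin{align}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i), \quad
f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f), \quad
o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c), \quad
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \quad
h_t = o_t \odot \tanh(c_t)
\end{align}

Because c_t is updated additively rather than through repeated multiplication by a recurrent weight matrix, gradients can flow across many time steps, which is why long time lag tasks become learnable.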