
CS7015 (Deep Learning) : Lecture 1 (Partial/Brief) History of Deep Learning - PowerPoint PPT Presentation



  1. CS7015 (Deep Learning) : Lecture 1 - (Partial/Brief) History of Deep Learning. Mitesh M. Khapra, Department of Computer Science and Engineering, Indian Institute of Technology Madras.

  2. Acknowledgements: Most of this material is based on the article “Deep Learning in Neural Networks: An Overview” by J. Schmidhuber [1]. The errors, if any, are due to me and I apologize for them. Feel free to contact me if you think certain portions need to be corrected (please provide appropriate references).

  3. Chapter 1: Biological Neurons

  4. Reticular Theory (1871-1873): Joseph von Gerlach proposed that the nervous system is a single continuous network, as opposed to a network of many discrete cells!

  5. Staining Technique (1871-1873): Camillo Golgi discovered a chemical reaction that allowed him to examine nervous tissue in much greater detail than ever before. He was a proponent of Reticular theory.

  6. Neuron Doctrine (1888-1891): Santiago Ramón y Cajal used Golgi’s technique to study the nervous system and proposed that it is actually made up of discrete individual cells forming a network (as opposed to a single continuous network).

  7. The Term Neuron: The term neuron was coined by Heinrich Wilhelm Gottfried von Waldeyer-Hartz around 1891. He further consolidated the Neuron Doctrine.

  8. Nobel Prize (1906): Golgi (reticular theory) and Cajal (neuron doctrine) were jointly awarded the 1906 Nobel Prize for Physiology or Medicine, which resulted in lasting conflicting ideas and controversies between the two scientists.

  9. The Final Word (1950s): In the 1950s, electron microscopy finally confirmed the neuron doctrine by unambiguously demonstrating that nerve cells are individual cells interconnected through synapses (a network of many individual neurons).

  10. Chapter 2: From Spring to Winter of AI

  11. McCulloch Pitts Neuron (1943): McCulloch (a neuroscientist) and Pitts (a logician) proposed a highly simplified model of the neuron [2].
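  For concreteness, a minimal sketch of such a unit (illustrative only; the function name and threshold are our assumptions, and inhibitory inputs are ignored for brevity): binary inputs are summed, and the unit fires if the sum reaches a threshold.

      def mp_neuron(inputs, threshold):
          # McCulloch-Pitts unit: binary inputs, no weights, no learning;
          # the unit "fires" (outputs 1) when enough inputs are active.
          return 1 if sum(inputs) >= threshold else 0

      # With threshold = 2 on two inputs the unit acts like an AND gate;
      # with threshold = 1 it acts like an OR gate.
      print(mp_neuron([1, 1], threshold=2))  # 1
      print(mp_neuron([1, 0], threshold=2))  # 0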

  12. Perceptron (1957-1958): “the perceptron may eventually be able to learn, make decisions, and translate languages” - Frank Rosenblatt

  13. Perceptron: “the embryo of an electronic computer that the Navy expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence.” - New York Times
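  The slides quote the contemporary hype; for reference, a minimal sketch of the Rosenblatt-style perceptron learning rule in its standard textbook form (the toy data, learning rate, and names are our illustrative assumptions, not from the lecture):

      import numpy as np

      def train_perceptron(X, y, epochs=10, lr=1.0):
          # X: (n_samples, n_features) inputs; y: labels in {0, 1}.
          w = np.zeros(X.shape[1])
          b = 0.0
          for _ in range(epochs):
              for xi, yi in zip(X, y):
                  pred = 1 if xi @ w + b > 0 else 0
                  # Update only on mistakes: nudge the boundary toward the example.
                  w += lr * (yi - pred) * xi
                  b += lr * (yi - pred)
          return w, b

      # Toy usage: learn the linearly separable OR function.
      X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
      y = np.array([0, 1, 1, 1])
      w, b = train_perceptron(X, y)
      print([1 if x @ w + b > 0 else 0 for x in X])  # [0, 1, 1, 1]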

  14. First-generation Multilayer Perceptrons (1965-1968): Ivakhnenko et al. [3]

  15. Perceptron Limitations (1969): In their now-famous book “Perceptrons”, Minsky and Papert outlined the limits of what perceptrons could do [4] (most famously, that a single perceptron cannot compute functions that are not linearly separable, such as XOR).

  16. AI Winter of Connectionism (1969-1986): This almost led to the abandonment of connectionist AI.

  17. Backpropagation (1986): Discovered and rediscovered several times throughout the 1960s and 1970s. Werbos (1982) [5] first used it in the context of artificial neural networks. It was eventually popularized by the work of Rumelhart et al. in 1986 [6].
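  For reference, a minimal sketch of backpropagation on a one-hidden-layer network with sigmoid hidden units and squared error (the architecture, toy data, and variable names are our illustrative assumptions, not the historical implementations):

      import numpy as np

      rng = np.random.default_rng(0)
      X = rng.normal(size=(8, 3))          # toy inputs
      y = rng.normal(size=(8, 1))          # toy targets
      W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)   # input -> hidden
      W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # hidden -> output

      def sigmoid(z):
          return 1.0 / (1.0 + np.exp(-z))

      lr = 0.5
      for step in range(100):
          # Forward pass
          h = sigmoid(X @ W1 + b1)
          out = h @ W2 + b2
          loss = 0.5 * np.mean((out - y) ** 2)

          # Backward pass: apply the chain rule layer by layer
          d_out = (out - y) / len(X)                 # dLoss / d_out
          dW2, db2 = h.T @ d_out, d_out.sum(axis=0)
          d_h = (d_out @ W2.T) * h * (1 - h)         # back through the sigmoid
          dW1, db1 = X.T @ d_h, d_h.sum(axis=0)

          # Gradient descent update
          W1 -= lr * dW1; b1 -= lr * db1
          W2 -= lr * dW2; b2 -= lr * db2

      print("final loss:", loss)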

  18. Gradient Descent (1847): Cauchy discovered Gradient Descent, motivated by the need to compute the orbits of heavenly bodies.
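  In modern notation, the iteration Cauchy described corresponds to the familiar update rule (our paraphrase of the standard form, not the slide's notation):

      \[
      \theta_{t+1} = \theta_t - \eta \, \nabla_{\theta} f(\theta_t), \qquad \eta > 0,
      \]

  where f is the objective being minimized and \eta is the step size (learning rate).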

  19. Universal Approximation Theorem (1989): A multilayered network of neurons with a single hidden layer can be used to approximate any continuous function to any desired precision [7].
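  A common formal statement of this result (in the Cybenko/Hornik style; the phrasing below is our paraphrase, not the slide's): for any continuous f on a compact set K in R^n and any ε > 0, there exist a width N and parameters v_i, w_i, b_i such that

      \[
      F(x) = \sum_{i=1}^{N} v_i \,\sigma\!\left(w_i^{\top} x + b_i\right)
      \quad\text{satisfies}\quad
      \sup_{x \in K} \bigl| F(x) - f(x) \bigr| < \varepsilon,
      \]

  where σ is a fixed non-constant sigmoidal activation function.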

  20. Chapter 3: The Deep Revival

  21. Unsupervised Pre-Training (2006): Hinton and Salakhutdinov described an effective way of initializing the weights that allows deep autoencoder networks to learn a low-dimensional representation of data [8].

  22. Unsupervised Pre-Training (1991-1993): The idea of unsupervised pre-training actually dates back to 1991-1993 (J. Schmidhuber), when it was used to train a “Very Deep Learner”.

  23. More Insights (2007-2009): Further investigations into the effectiveness of unsupervised pre-training.

  24. Success in Handwriting Recognition (2009): Graves et al. outperformed all entries in an international Arabic handwriting recognition competition [9].

  25. Success in Speech Recognition (2010): Dahl et al. showed relative error reductions of 16.0% and 23.2% over a state-of-the-art system [10].

  26. New Record on MNIST (2010): Ciresan et al. set a new record on the MNIST dataset using good old backpropagation on GPUs (GPUs enter the scene) [11].

  27. First Superhuman Visual Pattern Recognition (2011): D. C. Ciresan et al. achieved a 0.56% error rate in the IJCNN Traffic Sign Recognition Competition [12].

  28-32. Winning more visual recognition challenges - success on ImageNet (2012-2016):

         Network          Error   Layers
         AlexNet [13]     16.0%   8
         ZFNet [14]       11.2%   8
         VGGNet [15]      7.3%    19
         GoogLeNet [16]   6.7%    22
         MS ResNet [17]   3.6%    152!!

  33. Chapter 4: From Cats to Convolutional Neural Networks

  34. Hubel and Wiesel Experiment (1959): Experimentally showed that each neuron has a fixed receptive field, i.e. a neuron will fire only in response to visual stimuli in a specific region of the visual space [18].

  35. Neocognitron (1980): Used for handwritten character recognition and pattern recognition (Fukushima et al.) [19].

  36. Convolutional Neural Network (1989): Handwritten digit recognition using backpropagation over a Convolutional Neural Network (LeCun et al.) [20].
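  For intuition, a minimal sketch of the convolution operation at the heart of such networks (a plain valid-mode 2-D cross-correlation; array sizes, the kernel, and names are our illustrative assumptions):

      import numpy as np

      def conv2d_valid(image, kernel):
          # Slide the kernel over the image and sum elementwise products;
          # this local, weight-sharing operation is what a CNN layer applies.
          H, W = image.shape
          kH, kW = kernel.shape
          out = np.zeros((H - kH + 1, W - kW + 1))
          for i in range(out.shape[0]):
              for j in range(out.shape[1]):
                  out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
          return out

      # Toy usage: a vertical-edge detector on a tiny two-level image.
      img = np.array([[0, 0, 1, 1]] * 4, dtype=float)
      edge = np.array([[-1.0, 1.0]])
      print(conv2d_valid(img, edge))   # responds at the 0 -> 1 boundary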

  37. LeNet-5 (1998): Introduced the (now famous) MNIST dataset (LeCun et al.) [21].

  38. An algorithm inspired by an experiment on cats is today used to detect cats in videos :-)

  39. Chapter 5: Faster, higher, stronger

  40-41. Better Optimization Methods: Faster convergence, better accuracies. 1983: Nesterov; 2011: Adagrad.
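  As a reference point, minimal sketches of the two update rules the timeline names, in their standard textbook forms (hyperparameter values, gradient functions, and names are our illustrative assumptions, not from the lecture):

      import numpy as np

      def nesterov_step(theta, velocity, grad_fn, lr=0.01, momentum=0.9):
          # Nesterov accelerated gradient: evaluate the gradient at the
          # "look-ahead" point theta + momentum * velocity.
          lookahead_grad = grad_fn(theta + momentum * velocity)
          velocity = momentum * velocity - lr * lookahead_grad
          return theta + velocity, velocity

      def adagrad_step(theta, grad_sq_sum, grad_fn, lr=0.01, eps=1e-8):
          # Adagrad: per-parameter learning rates scaled by the running
          # sum of squared gradients.
          g = grad_fn(theta)
          grad_sq_sum = grad_sq_sum + g ** 2
          theta = theta - lr * g / (np.sqrt(grad_sq_sum) + eps)
          return theta, grad_sq_sum

      # Toy usage: minimize f(theta) = ||theta||^2 / 2, whose gradient is theta.
      theta, v = np.ones(3), np.zeros(3)
      for _ in range(200):
          theta, v = nesterov_step(theta, v, grad_fn=lambda t: t, lr=0.1)
      print(theta)  # close to zero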
