CS 486/686 Lecture 21: A brief history of deep learning

This summary is based on A 'Brief' History of Neural Nets and Deep Learning by Andrew Kurenkov.

There has been a deep learning tsunami over the past several years:
• exponential growth in research publications (and ML graduate students)
• massive investments from industry giants such as Google in AI
• drastic improvements over reigning approaches towards the hardest problems

1 The birth of machine learning

A perceptron: In 1957, the psychologist Frank Rosenblatt developed the perceptron, a mathematical model of neurons in our brain.
• A neuron either fires or not. The output depends only on the inputs.
• It takes binary inputs, which are either data or the output of another perceptron (a nearby neuron).
• Links between neurons are called synapses, and the synapses have different strengths. Model this by multiplying each input by a continuous-valued weight.
• A special bias input has the value of 1. It allows us to compute more functions using a perceptron.
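
A minimal sketch of the model just described (my own code, not from the notes), assuming NumPy, a step activation with threshold 0, and the bias folded in as an extra term; the weights are chosen by hand so that the perceptron behaves like AND:

import numpy as np

def perceptron_output(x, w, b):
    # Fire (output 1) exactly when the weighted sum of the inputs plus the bias is positive.
    weighted_sum = np.dot(w, x) + b  # the bias b acts like a weight on a constant input of 1
    return 1 if weighted_sum > 0 else 0

# Hand-chosen weights that make this perceptron compute AND of two binary inputs.
w = np.array([1.0, 1.0])
b = -1.5
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron_output(np.array(x), w, b))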

The output is 1 if the weighted sum is big enough and the output is 0 otherwise.

Activation function: a non-linear function of the weighted sum, e.g. the step function or the sigmoid function. If the weighted sum is above the threshold, the output is 1. (Both are sketched in code below.)

The perceptron:
• Based on earlier work by McCulloch and Pitts (1943): can represent AND, OR, and NOT.
• Big deal: it was believed that AI would be solved if computers could perform formal logical reasoning.

Learning a perceptron:
• Open question: how do we learn a perceptron model? Rosenblatt answered this question.
• An idea by Donald Hebb: the brain learns by forming synapses and changing the strengths of the synapses between the neurons.
• If neuron A repeatedly and persistently takes part in firing neuron B, the brain grows to strengthen the synapse between A and B.
• For each example, increase the weights if the output is too low and decrease the weights if the output is too high.
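
The two activation functions mentioned above, sketched in code (the threshold parameter is my own choice for illustration; the notes do not fix a specific value):

import numpy as np

def step(weighted_sum, threshold=0.0):
    # Output 1 when the weighted sum is above the threshold, 0 otherwise.
    return np.where(weighted_sum > threshold, 1.0, 0.0)

def sigmoid(weighted_sum):
    # Smooth alternative to the step function; squashes the weighted sum into (0, 1).
    return 1.0 / (1.0 + np.exp(-weighted_sum))

z = np.linspace(-4.0, 4.0, 9)
print(step(z))
print(np.round(sigmoid(z), 3))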

A simple algorithm to learn a perceptron (a code sketch follows below):
1. Start with random weights in a perceptron.
2. For a training example, compute the output of the perceptron.
3. If the output does not match the correct output:
4. If the correct output was 0 but the actual output was 1, decrease the weights that had an input of 1.
5. If the correct output was 1 but the actual output was 0, increase the weights that had an input of 1.
6. Repeat steps 2-5 for all the training examples until the perceptron makes no more mistakes.

The hype around perceptrons

Rosenblatt implemented perceptrons and showed that they can learn to classify simple shapes correctly with 20x20 pixel-like inputs.
- Machine learning was born.

How can we use a perceptron for classification tasks with multiple categories? For example, to classify handwritten digits:
• Arrange multiple perceptrons in a layer.
• The perceptrons receive the same inputs.
• Each perceptron learns one output of the function. (Does the input belong to a particular class?)

This can be used to classify handwritten digits. Each of the 10 output values represents a digit. The highest weighted sum produces an output of 1 and the others produce an output of 0.

Artificial Neural Networks are simply layers of perceptrons/neurons/units. So far, our network only has one layer: the output layer.

• Perceptrons are so simple (basically linear regression) that they cannot solve vision or speech recognition problems yet.
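
A minimal sketch of steps 1-6 above (my own code, not from the notes), assuming binary inputs, a step activation, and the bias treated as a weight on a constant input of 1; it is demonstrated on OR, which is linearly separable, so the loop terminates:

import numpy as np

def train_perceptron(examples, n_inputs, max_epochs=100, seed=0):
    # Step 1: start with random weights (the last entry is the bias weight).
    rng = np.random.default_rng(seed)
    w = rng.normal(size=n_inputs + 1)
    for _ in range(max_epochs):
        mistakes = 0
        for x, target in examples:
            x = np.append(x, 1.0)                  # constant bias input of 1
            output = 1 if np.dot(w, x) > 0 else 0  # step 2: compute the output
            if output != target:                   # steps 3-5: adjust weights on inputs that were 1
                w += (target - output) * x
                mistakes += 1
        if mistakes == 0:                          # step 6: stop once no example is misclassified
            return w
    return w

# Learn the OR function from its four training examples.
data = [(np.array([0.0, 0.0]), 0), (np.array([0.0, 1.0]), 1),
        (np.array([1.0, 0.0]), 1), (np.array([1.0, 1.0]), 1)]
print(train_perceptron(data, n_inputs=2))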

• However, a network of such simple units can be powerful and solve complex problems.

In 1958, Rosenblatt said that perceptrons might be fired to the planets as mechanical space explorers.

AI winter

The hype irritated other researchers who were skeptical about perceptrons.
• Big problem in AI at the time: formal logical reasoning, i.e. teaching computers to manipulate logical symbols using rules.
• In 1969, Marvin Minsky (founder of the MIT AI lab) and Seymour Papert (director of the lab) published a book named Perceptrons, a rigorous analysis of the limitations of perceptrons.

Perceptrons: An Introduction to Computational Geometry. Marvin Minsky and Seymour Papert. M.I.T. Press, Cambridge, Mass., 1969.

What did Minsky's book show?
• A notable result: it is impossible to learn an XOR function using a single perceptron. (The XOR function is not linearly separable.)
• They basically said: this approach was a dead end. This publication was believed to lead to the first AI winter, a freeze on funding and publications.

We need a multi-layer network to do useful things
• To learn an XOR function or other complex functions (a hand-wired example is sketched below).
• Show what a multi-layer neural network looks like.
• Hidden layers are good for feature extraction.

Facial recognition: The first hidden layer converts raw pixels to the locations of lines, circles, and ovals in the images. The next hidden layer takes the locations of the shapes and determines whether there are faces in the images.
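
To make the XOR point concrete, here is a hand-wired two-layer network (my own illustration, not from the book or the notes): the hidden layer computes OR and NAND of the inputs, and the output unit ANDs them together, which no single perceptron can do because XOR is not linearly separable.

import numpy as np

def perceptron(x, w, b):
    return 1 if np.dot(w, x) + b > 0 else 0

def xor_two_layer(x1, x2):
    # Hidden layer: one unit computes OR, the other computes NAND.
    h_or   = perceptron(np.array([x1, x2]), np.array([ 1.0,  1.0]), -0.5)
    h_nand = perceptron(np.array([x1, x2]), np.array([-1.0, -1.0]),  1.5)
    # Output layer: AND of the two hidden units gives XOR of the original inputs.
    return perceptron(np.array([h_or, h_nand]), np.array([1.0, 1.0]), -1.5)

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((a, b), xor_two_layer(a, b))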

The atmosphere at the time, during the AI winter:
• In his 1969 book, Minsky showed that we need to use multi-layer perceptrons even to represent simple nonlinear functions such as the XOR mapping.
• Many researchers had tried and failed to find good ways to train multi-layer perceptrons.

Problem:
• Rosenblatt's learning algorithm does not work for a multi-layer neural network.
• How do we adjust the weights in the middle layers? Use backpropagation.

Backpropagation for neural nets
• Calculate the errors in the output layer.
• Propagate these errors backwards to the previous hidden layer. Blame the previous layer for some of the errors made.
• Propagate these errors backwards again to the previous hidden layer.
• When we change the weights in any layer, the errors in the output layer change.
• Use an optimization algorithm to find weights that minimize the errors.
(A compressed code sketch follows below.)

History of backpropagation:
• Derived by multiple researchers in the 60s.
• Implemented by Seppo Linnainmaa in 1970.
• Paul Werbos first proposed that backpropagation can be used for neural nets (analyzed in depth in his 1974 PhD thesis).
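
A compressed sketch of the procedure described above (my own code; the details are assumptions not fixed in the notes: one hidden layer of 4 sigmoid units, squared error, and plain gradient descent as the optimization algorithm), trained on the XOR examples:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)   # XOR targets

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)     # input -> hidden weights
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)     # hidden -> output weights
lr = 1.0

for epoch in range(5000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Errors at the output layer.
    delta_out = (out - y) * out * (1 - out)
    # Blame the previous (hidden) layer for part of those errors.
    delta_h = (delta_out @ W2.T) * h * (1 - h)
    # Gradient-descent step: change the weights to reduce the output errors.
    W2 -= lr * h.T @ delta_out
    b2 -= lr * delta_out.sum(axis=0)
    W1 -= lr * X.T @ delta_h
    b1 -= lr * delta_h.sum(axis=0)

print(np.round(out, 2))   # typically close to [[0], [1], [1], [0]] after training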

• Neural networks were seen as a dead end; there was a lack of academic interest in the topic.
• Paul Werbos did not publish his idea until 1982. The community had lost faith.

The rise of backpropagation

More than a decade after Werbos' thesis, the idea of backpropagation was finally popularized:

Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323, 533-536.

• The idea was rediscovered multiple times before this paper.
• The paper stands out for concisely and clearly stating the idea.
• It finally succeeded in making the idea well-known.
• The paper is identical to how the concept is explained in textbooks and AI classes.
• The authors wrote another in-depth paper to specifically address the problem pointed out by Minsky.

Neural networks are back

Neural networks are back: we know how to train multi-layer neural networks to solve complex problems. They became super popular, and the ambitions of Rosenblatt seemed to be within reach.

In 1989, a key finding: a mathematical proof that multi-layer neural networks can implement any function.

Kurt Hornik, Maxwell Stinchcombe, Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, Volume 2, Issue 5, 1989, Pages 359-366, ISSN 0893-6080, http://dx.doi.org/10.1016/0893-6080(89)90020-8.
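
The Hornik-Stinchcombe-White result is usually stated roughly as follows (my paraphrase of the standard statement, not a quotation from the paper): for any continuous function f on a compact set K in R^n, any squashing (sigmoid-like) activation sigma, and any tolerance epsilon > 0, a single hidden layer with finitely many units suffices,

\sup_{x \in K} \Big| f(x) - \sum_{i=1}^{N} a_i \, \sigma\!\big(w_i^{\top} x + b_i\big) \Big| < \varepsilon
\quad \text{for some } N \in \mathbb{N},\ a_i, b_i \in \mathbb{R},\ w_i \in \mathbb{R}^n.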

In 1989, a significant real-world application of backpropagation: handwritten zip code recognition.
• The US postal service was desperate to be able to sort mail automatically. Recognizing messy handwriting was a major challenge.
• Yann LeCun and others at AT&T Bell Labs:

LeCun, Y.; Boser, B.; Denker, J.; Henderson, D.; Howard, R.; Hubbard, W.; Jackel, L. "Backpropagation Applied to Handwritten Zip Code Recognition," Neural Computation, vol. 1, no. 4, pp. 541-551, Dec. 1989.

• The method was later used for a nationally deployed cheque reading system in the mid 1990s. (Show video.) (At some point in the late 1990s, one of these systems was reading 10 to 20% of all the checks in the US.)
• It highlighted a key modification of neural networks towards modern machine learning: extracting local features and combining them to form higher order features.

Convolutional neural networks:
• Each neuron extracts a local feature anywhere inside an image. Force a hidden unit to only combine local sources of information.
• It used to be that each neuron was passed the entire image. Now each neuron is only passed a portion of the image, and it looks for a local feature in that portion.
• Passing a filter through an image only picks up whether the image has this feature (a horizontal line, a 45 degree line, etc.): a magnifying glass sliding across the image, capable of recognizing only one thing, recording the location of each match.
• The following layers combine local features to form higher order features.
(A small sliding-filter sketch follows below.)
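
A small sketch of the sliding-filter idea (my own illustration, not LeCun's actual architecture), assuming NumPy and a hand-made 3x3 filter that responds to horizontal lines:

import numpy as np

def convolve2d(image, kernel):
    # Slide the filter over every valid position and record, at each location,
    # how strongly the local patch matches the filter.
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy 5x5 image with a horizontal line in the middle row.
image = np.zeros((5, 5))
image[2, :] = 1.0
horizontal_line_filter = np.array([[-1.0, -1.0, -1.0],
                                   [ 2.0,  2.0,  2.0],
                                   [-1.0, -1.0, -1.0]])
print(convolve2d(image, horizontal_line_filter))   # strongest response where the line aligns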
