Neural Networks: Introduction


  1. Neural Networks: Introduction (Machine Learning). Based on slides and material from Geoffrey Hinton, Richard Socher, Dan Roth, Yoav Goldberg, Shai Shalev-Shwartz and Shai Ben-David, and others

  2. Where are we?
     Learning algorithms:
     • Decision Trees
     • Perceptron
     • AdaBoost
     • Support Vector Machines
     • Naïve Bayes
     • Logistic Regression
     • Bayesian Learning
     (Several of these, e.g. Perceptron, SVM, and Logistic Regression, produce linear classifiers)
     General learning principles:
     • Overfitting
     • Mistake-bound learning
     • PAC learning, sample complexity
     • Hypothesis choice & VC dimensions
     • Training and generalization errors
     • Regularized Empirical Loss Minimization

  3. Neural Networks
     • What is a neural network?
     • Predicting with a neural network
     • Training neural networks
     • Practical concerns

  4. This lecture
     • What is a neural network?
       – The hypothesis class
       – Structure, expressiveness
     • Predicting with a neural network
     • Training neural networks
     • Practical concerns

  5. We have seen linear threshold units
     Prediction: sgn(w · x + b) = sgn(∑ᵢ wᵢ xᵢ + b), a dot product of the weights with the input features followed by a threshold
     Learning: various algorithms (perceptron, SVM, logistic regression, …); in general, minimize a loss
     But where do these input features come from? What if the features were outputs of another classifier?
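To make the prediction rule concrete, here is a minimal sketch (Python/NumPy, with made-up weights and features) of a linear threshold unit: a dot product with the weights, plus a bias, followed by a sign threshold.

```python
import numpy as np

def predict_ltu(w, b, x):
    """Linear threshold unit: sign of the dot product plus bias."""
    return np.sign(np.dot(w, x) + b)

# Hypothetical weights, bias, and feature vector
w = np.array([0.5, -1.0, 2.0])
b = 0.1
x = np.array([1.0, 0.0, 1.0])

print(predict_ltu(w, b, x))  # 1.0, i.e. the positive class
```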

  6. Features from classifiers

  7. Features from classifiers

  8. Features from classifiers. Each of these connections has its own weight as well

  9. Features from classifiers

  10. Features from classifiers. This is a two-layer feed-forward neural network

  11. Features from classifiers
      This is a two-layer feed-forward neural network, with an input layer, a hidden layer, and an output layer
      Think of the hidden layer as learning a good representation of the inputs

  12. Features from classifiers
      This is a two-layer feed-forward neural network. The dot product followed by the threshold constitutes a neuron
      There are five neurons in this picture: four in the hidden layer and one output
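A minimal sketch of this picture, assuming sign (threshold) activations and made-up weights: four hidden neurons each threshold a dot product of the inputs, and a single output neuron thresholds a weighted combination of the four hidden outputs.

```python
import numpy as np

def forward(x, W_hidden, b_hidden, w_out, b_out):
    """Two-layer feed-forward network with sign activations.

    W_hidden: (4, d) weights of the four hidden neurons
    b_hidden: (4,)   biases of the hidden neurons
    w_out:    (4,)   weights of the single output neuron
    b_out:    scalar bias of the output neuron
    """
    h = np.sign(W_hidden @ x + b_hidden)   # hidden layer: four thresholded dot products
    return np.sign(w_out @ h + b_out)      # output layer: one more thresholded dot product

# Hypothetical parameters for a 2-dimensional input
rng = np.random.default_rng(0)
W_hidden = rng.normal(size=(4, 2))
b_hidden = rng.normal(size=4)
w_out = rng.normal(size=4)

print(forward(np.array([1.0, -2.0]), W_hidden, b_hidden, w_out, 0.0))
```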

  13. But where do the inputs to the input layer come from? What if the inputs were themselves the outputs of a classifier? Then we can make a three-layer network, and so on.

  14. Let us try to formalize this

  15. Neural networks
      • A robust approach for approximating real-valued, discrete-valued, or vector-valued functions
      • Among the most effective general-purpose supervised learning methods currently known, especially for complex and hard-to-interpret data such as real-world sensory data
      • The backpropagation algorithm for neural networks has been shown to be successful in many practical problems, across various application domains

  16. Artificial neurons
      Functions that very loosely mimic a biological neuron. A neuron accepts a collection of inputs (a vector x) and produces an output by:
      1. Applying a dot product with weights w and adding a bias b
      2. Applying a (possibly non-linear) transformation called an activation
      output = activation(w · x + b)

  17. Artificial neurons
      Functions that very loosely mimic a biological neuron. A neuron accepts a collection of inputs (a vector x) and produces an output by:
      1. Applying a dot product with weights w and adding a bias b
      2. Applying a (possibly non-linear) transformation called an activation
      output = activation(w · x + b)
      In the picture, the dot product is followed by a threshold activation; other activations are possible
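A minimal sketch of such a neuron (NumPy, with made-up parameters and a sigmoid as one example of a non-threshold activation):

```python
import numpy as np

def neuron(w, b, x, activation):
    """One artificial neuron: dot product with weights, add bias, apply activation."""
    return activation(np.dot(w, x) + b)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters and input
w = np.array([0.5, -0.3])
b = 0.2
x = np.array([1.0, 2.0])

print(neuron(w, b, x, np.sign))   # threshold activation: 1.0
print(neuron(w, b, x, sigmoid))   # sigmoid activation: roughly 0.52
```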

  18. Activation functions
      Also called transfer functions: output = activation(z), where z = w · x + b
      Name of the neuron             Activation function activation(z)
      Linear unit                    z
      Threshold/sign unit            sgn(z)
      Sigmoid unit                   1 / (1 + exp(−z))
      Rectified linear unit (ReLU)   max(0, z)
      Tanh unit                      tanh(z)
      Many more activation functions exist (sinusoid, sinc, Gaussian, polynomial, …)
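A sketch of these activation functions in NumPy (the dictionary keys below are my own shorthand, not names from the slides):

```python
import numpy as np

activations = {
    "linear":  lambda z: z,
    "sign":    np.sign,                            # threshold/sign unit
    "sigmoid": lambda z: 1.0 / (1.0 + np.exp(-z)),
    "relu":    lambda z: np.maximum(0.0, z),       # rectified linear unit
    "tanh":    np.tanh,
}

z = np.array([-2.0, 0.0, 2.0])
for name, f in activations.items():
    print(name, f(z))
```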

  19. A neural network
      A function that converts inputs to outputs, defined by a directed acyclic graph
      (Figure: input, hidden, and output layers, with a weight w¹ᵢⱼ or w²ᵢⱼ on each edge between consecutive layers)
      – Nodes are organized in layers and correspond to neurons
      – Edges carry the output of one neuron to another, and are associated with weights
      • To define a neural network, we need to specify:
      – The structure of the graph: how many nodes, the connectivity
      – The activation function on each node
      – The edge weights

  20. A neural network
      A function that converts inputs to outputs, defined by a directed acyclic graph
      (Figure: input, hidden, and output layers, with a weight w¹ᵢⱼ or w²ᵢⱼ on each edge between consecutive layers)
      – Nodes are organized in layers and correspond to neurons
      – Edges carry the output of one neuron to another, and are associated with weights
      • To define a neural network, we need to specify:
      – The structure of the graph: how many nodes, the connectivity
      – The activation function on each node
      – The edge weights

  21. A neural network
      A function that converts inputs to outputs, defined by a directed acyclic graph
      (Figure: input, hidden, and output layers, with a weight w¹ᵢⱼ or w²ᵢⱼ on each edge between consecutive layers)
      – Nodes are organized in layers and correspond to neurons
      – Edges carry the output of one neuron to another, and are associated with weights
      • To define a neural network, we need to specify:
      – The structure of the graph: how many nodes, the connectivity. This is called the architecture of the network; it is typically predefined, part of the design of the classifier
      – The activation function on each node
      – The edge weights

  22. A neural network
      A function that converts inputs to outputs, defined by a directed acyclic graph
      (Figure: input, hidden, and output layers, with a weight w¹ᵢⱼ or w²ᵢⱼ on each edge between consecutive layers)
      – Nodes are organized in layers and correspond to neurons
      – Edges carry the output of one neuron to another, and are associated with weights
      • To define a neural network, we need to specify:
      – The structure of the graph: how many nodes, the connectivity. This is called the architecture of the network; it is typically predefined, part of the design of the classifier
      – The activation function on each node
      – The edge weights: these are learned from data
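As a sketch of this specification, a fully connected feed-forward network can be written as a list of (weight matrix, bias, activation) triples, one per layer; the shapes and names below are illustrative assumptions, not from the slides.

```python
import numpy as np

def forward(x, layers):
    """Feed-forward pass through the network: each layer is (W, b, activation)."""
    h = x
    for W, b, activation in layers:
        h = activation(W @ h + b)   # weighted sum along the edges, then the node's activation
    return h

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Architecture: 3 inputs -> 4 sigmoid hidden units -> 1 sign output unit
rng = np.random.default_rng(1)
layers = [
    (rng.normal(size=(4, 3)), np.zeros(4), sigmoid),
    (rng.normal(size=(1, 4)), np.zeros(1), np.sign),
]

print(forward(np.array([0.2, -0.5, 1.0]), layers))
```

Here the architecture (layer sizes and connectivity) and the activations are fixed design choices, while the entries of the weight matrices are what training would learn.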

  23. A very brief history of neural networks
      • 1943: McCulloch and Pitts showed how linear threshold units can compute logical functions
      • 1949: Hebb suggested a learning rule that has some physiological plausibility
      • 1950s: Rosenblatt, the Perceptron algorithm for a single threshold neuron
      • 1969: Minsky and Papert studied the neuron from a geometrical perspective
      • 1980s: Convolutional neural networks (Fukushima, LeCun), the backpropagation algorithm (various)
      • Early 2000s to today: more compute, more data, deeper networks
      See also: http://people.idsia.ch/~juergen/deep-learning-overview.html

  24. What functions do neural networks express?

  25. A single neuron with threshold activation
      Prediction = sgn(b + w₁x₁ + w₂x₂)
      (Figure: positively and negatively labeled points in the plane, separated by the line b + w₁x₁ + w₂x₂ = 0)
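A small sketch, with made-up weights, showing that points on opposite sides of the line b + w1·x1 + w2·x2 = 0 get opposite labels:

```python
import numpy as np

w1, w2, b = 1.0, 1.0, -1.0          # hypothetical weights: the boundary is x1 + x2 = 1

def predict(x1, x2):
    return np.sign(b + w1 * x1 + w2 * x2)

print(predict(2.0, 2.0))   #  1.0: above the line x1 + x2 = 1
print(predict(0.0, 0.0))   # -1.0: below the line
```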

  26. Two layers, with threshold activations: in general, such networks carve out convex polygons (each hidden unit defines a halfspace, and the output unit can intersect them). Figure from Shai Shalev-Shwartz and Shai Ben-David, 2014

  27. Three layers with threshold activations: in general, unions of convex polygons (the extra layer can take unions of the convex regions from the previous construction). Figure from Shai Shalev-Shwartz and Shai Ben-David, 2014
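To illustrate the construction, here is a sketch (my own hand-picked weights, not from the slides) in which first-layer threshold units indicate halfspaces and a second-layer unit ANDs them into a convex region, a triangle in this case:

```python
import numpy as np

step = lambda z: (z > 0).astype(float)   # threshold unit with outputs in {0, 1}

# Layer 1: three halfspaces whose intersection is a triangle
#   x1 > 0,  x2 > 0,  x1 + x2 < 1
W1 = np.array([[ 1.0,  0.0],
               [ 0.0,  1.0],
               [-1.0, -1.0]])
b1 = np.array([0.0, 0.0, 1.0])

# Layer 2: AND of the three indicators (fires only if all three are 1)
w2 = np.array([1.0, 1.0, 1.0])
b2 = -2.5

def in_triangle(x):
    h = step(W1 @ x + b1)          # which halfspaces contain x
    return step(w2 @ h + b2)       # 1 only when all three do

print(in_triangle(np.array([0.2, 0.2])))   # 1.0: inside the triangle
print(in_triangle(np.array([0.8, 0.8])))   # 0.0: outside
```

A third-layer unit acting as an OR over several such AND units (for example, weights 1 and bias −0.5) would mark the union of the corresponding polygons, which is the picture on this slide.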

  28. Neural networks are universal function approximators
      • Any continuous function can be approximated to arbitrary accuracy using one hidden layer of sigmoid units [Cybenko 1989]
      • The approximation error is insensitive to the choice of activation functions [DasGupta et al 1993]
      • Two-layer threshold networks can express any Boolean function
      – Exercise: Prove this
      • VC dimension of threshold networks with edges E: VC = O(|E| log |E|)
      • VC dimension of sigmoid networks with nodes V and edges E:
      – Upper bound: O(|V|² |E|²)
      – Lower bound: Ω(|E|²)
      • Exercise: Show that if we have only linear units, then multiple layers do not change the expressiveness

  29. Neural networks are universal function approximators
      • Any continuous function can be approximated to arbitrary accuracy using one hidden layer of sigmoid units [Cybenko 1989]
      • The approximation error is insensitive to the choice of activation functions [DasGupta et al 1993]
      • Two-layer threshold networks can express any Boolean function
      – Exercise: Prove this
      • VC dimension of threshold networks with edges E: VC = O(|E| log |E|)
      • VC dimension of sigmoid networks with nodes V and edges E:
      – Upper bound: O(|V|² |E|²)
      – Lower bound: Ω(|E|²)

  30. Neural networks are universal function approximators
      • Any continuous function can be approximated to arbitrary accuracy using one hidden layer of sigmoid units [Cybenko 1989]
      • The approximation error is insensitive to the choice of activation functions [DasGupta et al 1993]
      • Two-layer threshold networks can express any Boolean function
      – Exercise: Prove this
      • VC dimension of threshold networks with edges E: VC = O(|E| log |E|)
      • VC dimension of sigmoid networks with nodes V and edges E:
      – Upper bound: O(|V|² |E|²)
      – Lower bound: Ω(|E|²)
      • Exercise: Show that if we have only linear units, then multiple layers do not change the expressiveness
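As a hint for the last exercise, here is a small numerical sketch (my own, with random matrices) showing that stacking linear units collapses to a single linear map: two linear layers computing W2(W1 x) give the same function as the single matrix W2 W1.

```python
import numpy as np

rng = np.random.default_rng(42)
W1 = rng.normal(size=(4, 3))   # first linear layer
W2 = rng.normal(size=(2, 4))   # second linear layer
x = rng.normal(size=3)

two_layers = W2 @ (W1 @ x)     # network with two linear layers, no non-linearity
one_layer = (W2 @ W1) @ x      # equivalent single linear layer

print(np.allclose(two_layers, one_layer))  # True: extra linear layers add no expressiveness
```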
