Machine Learning 2007: Lecture 8, Instructor: Tim van Erven

  1. Machine Learning 2007: Lecture 8
     Instructor: Tim van Erven (Tim.van.Erven@cwi.nl)
     Website: www.cwi.nl/~erven/teaching/0708/ml/
     October 31, 2007

  2. Overview
     ● Organisational Matters
     ● Linear Functions as Inner Products
     ● Neural Networks
       ✦ The Perceptron
       ✦ General Neural Networks
     ● Gradient Descent
       ✦ Convex Functions
       ✦ Gradient Descent in One Variable
       ✦ Gradient Descent in More Variables
       ✦ Optimizing Perceptron Weights

  3. Course Organisation
     Final Exam:
     ● You have to enroll for the final exam on tisvu (when possible).
     ● The final exam will be more difficult than the intermediate exam.
     Mitchell:
     ● Read: Chapter 4, sections 4.1 – 4.4.
     This Lecture:
     ● The explanation of linear functions as inner products is needed to understand Mitchell.
     ● Neural networks are in Mitchell. I have some extra pictures.
     ● Convex functions are not discussed in Mitchell.
     ● I will give more background on gradient descent.

  4. Linear Functions as Inner Products
     Linear Function:
        $h_w(x) = w_0 + w_1 x_1 + \dots + w_d x_d$
     ● $x = (x_1, \dots, x_d)^\top$ is a $d$-dimensional feature vector.
     ● $w = (w_0, w_1, \dots, w_d)^\top$ is a $(d+1)$-dimensional weight vector.
     As an Inner Product (a standard trick):
     We may change $x$ into a $(d+1)$-dimensional vector $x'$ by adding an imaginary extra feature $x_0$, which always has value 1:
        $x = (x_1, \dots, x_d)^\top \;\Rightarrow\; x' = (1, x_1, \dots, x_d)^\top$
        $h_w(x) = \sum_{i=0}^{d} w_i x'_i = \langle w, x' \rangle$
     ● Mitchell writes $w \cdot x'$ for $\langle w, x' \rangle$.
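A minimal numpy sketch of this bias trick (my own illustration; the names `add_bias` and `h` are not from the lecture):

    import numpy as np

    def add_bias(x):
        # Prepend the imaginary feature x0 = 1 to turn x into x'.
        return np.concatenate(([1.0], x))

    def h(w, x):
        # The linear function as an inner product: h_w(x) = <w, x'>.
        return np.dot(w, add_bias(x))

    w = np.array([-0.8, 0.5, 0.5])  # (w0, w1, w2) for d = 2 features
    x = np.array([1.0, 1.0])
    print(h(w, x))                  # -0.8 + 0.5 + 0.5 = 0.2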

  5. Artificial Neurons
     An Artificial Neuron:
     An (artificial) neuron is some function $h$ that gets a feature vector $x$ as input and outputs a (single) label $y$.
     The Perceptron:
     The most famous type of (artificial) neuron is the perceptron:
        $h_w(x) = \begin{cases} 1 & \text{if } w_0 + w_1 x_1 + \dots + w_d x_d > 0, \\ -1 & \text{otherwise.} \end{cases}$
     ● Applies a threshold to a linear function of $x$.
     ● Has parameters $w$.
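Continuing the sketch above (again an illustration, not code from the lecture), the perceptron just thresholds this inner product:

    def perceptron(w, x):
        # Output +1 if the linear function is positive, -1 otherwise.
        return 1 if h(w, x) > 0 else -1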

  6. Different Views of The Perceptron
     [Figure: the same perceptron drawn in two ways: as a simple neural network with inputs x1–x4 feeding a single output neuron y1, and as Mitchell's drawing of the same unit.]
     Equation:
        $h_w(x) = \begin{cases} 1 & \text{if } w_0 + w_1 x_1 + \dots + w_d x_d > 0, \\ -1 & \text{otherwise.} \end{cases}$
     ● One of the simplest neural networks consists of just one perceptron neuron.
     ● A perceptron does classification.
     ● The network has no hidden units, and just one output.
     ● It may have any number of inputs.

  7. Decision Boundary of the Perceptron
     Decision boundary: $w_0 + w_1 x_1 + \dots + w_d x_d = 0$
     ● This is where the perceptron changes its output $y$ from $-1$ (-) to $+1$ (+) if we change $x$ a little bit.
     ● For $d = 2$ this decision boundary is always a line.
     Representing Boolean Functions ($-1$ = false, $1$ = true):
     [Figure: two plots in the $(x_1, x_2)$ plane showing a separating line for AND and for OR.]
     ● AND: $w_0 = -0.8$, $w_1 = 0.5$, $w_2 = 0.5$
     ● OR: $w_0 = 0.3$, $w_1 = 0.5$, $w_2 = 0.5$
     Wrong in Mitchell!
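A small check, reusing the `perceptron` sketch above, that the slide's weights really do represent AND and OR on all four ±1 inputs (my illustration; `represents` is a hypothetical helper):

    import itertools

    def represents(w, target):
        # True iff the perceptron with weights w agrees with target
        # on every input in {-1, +1}^2.
        return all(perceptron(w, np.array([float(a), float(b)])) == target(a, b)
                   for a, b in itertools.product([-1, 1], repeat=2))

    AND = lambda a, b: 1 if (a == 1 and b == 1) else -1
    OR  = lambda a, b: 1 if (a == 1 or  b == 1) else -1

    print(represents(np.array([-0.8, 0.5, 0.5]), AND))  # True
    print(represents(np.array([ 0.3, 0.5, 0.5]), OR))   # True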

  8. Perceptron Cannot Represent Exclusive Or
     Exclusive Or:
     [Figure: the four XOR inputs in the $(x_1, x_2)$ plane, with '+' at $(-1, 1)$ and $(1, -1)$, and '-' at $(-1, -1)$ and $(1, 1)$.]
     ● There exists no line that separates the inputs with label '-' from the inputs with label '+'. They are not linearly separable.
     ● The decision boundary for the perceptron is always a line.
     ● Hence a perceptron can never implement the 'exclusive or' function, whichever weights we choose!
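One way to see this algebraically (my addition, not on the slide): suppose some weights did implement XOR. The four inputs give the constraints

    \begin{align*}
    \text{'+' at } (-1, 1)  &: \quad w_0 - w_1 + w_2 > 0 \\
    \text{'+' at } (1, -1)  &: \quad w_0 + w_1 - w_2 > 0 \\
    \text{'-' at } (1, 1)   &: \quad w_0 + w_1 + w_2 \le 0 \\
    \text{'-' at } (-1, -1) &: \quad w_0 - w_1 - w_2 \le 0
    \end{align*}

Adding the first two gives $2 w_0 > 0$, while adding the last two gives $2 w_0 \le 0$: a contradiction, so no such weights exist.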

  9. Artificial Neural Networks
     [Figure: a network with inputs x1–x6 feeding a layer of hidden neurons, whose outputs feed a layer of output neurons producing y1–y4.]
     ● We can create an (artificial) neural network (NN) by connecting neurons together.
     ● We hook up our feature vector $x$ to the input neurons in the network. We get a label vector $y$ from the output neurons.
     ● The parameters $w$ of the neural network consist of all the parameters of the neurons in the network, taken together in one big vector.
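A minimal sketch of such a network built from the `perceptron` above (illustrative only; the networks in Mitchell's chapter 4 use differentiable sigmoid units rather than hard thresholds, so that gradient descent applies):

    def layer(W, x):
        # One layer of neurons: row i of W holds the weight vector
        # (w0, w1, ..., wd) of neuron i.
        return np.array([perceptron(w, x) for w in W])

    def network(W_hidden, W_output, x):
        # Feed the feature vector x through the hidden layer, then feed
        # the hidden outputs through the output layer to get y.
        return layer(W_output, layer(W_hidden, x))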

  10. NN Example: ALVINN
      [Figure: the ALVINN network: a 30x32 sensor input retina feeds 4 hidden units, which feed 30 output units ranging from 'sharp left' through 'straight ahead' to 'sharp right'.]
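ALVINN's architecture as array shapes, assuming sigmoid units and reusing `add_bias` from above (an illustrative reconstruction with random weights, not ALVINN's actual code):

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    W_hidden = rng.normal(size=(4, 30 * 32 + 1))  # 4 hidden units, 30x32 retina + bias
    W_output = rng.normal(size=(30, 4 + 1))       # 30 steering outputs, 4 hidden + bias

    retina = rng.random(30 * 32)                     # flattened camera image
    hidden = sigmoid(W_hidden @ add_bias(retina))    # shape (4,)
    steering = sigmoid(W_output @ add_bias(hidden))  # shape (30,)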
