
Neural Networks
Janos Borst, July 23, 2019
University of Leipzig - NLP Group

Machine Learning as a Way of Modeling Data
Perceptron - Pt. 1

Data Modelling Setting
• Data set with two features and two classes
  1. No: blue
  2. Yes: red


Layers - The Building Blocks of Neural Networks
• An arrangement of neurons (trainable parameters)
• A mathematical transformation of the input
• Determine how the information flows through the network
• Contain a method to update the parameters
• The abstraction allows us to stack layers on top of each other
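A minimal sketch (not from the slides) of what a single layer computes: a trainable linear map followed by an activation. The sizes and the ReLU activation below are illustrative assumptions.

```python
# Hypothetical illustration: one dense layer as y = f(W x + b),
# where W and b are the layer's trainable parameters.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)        # an input with 4 features
W = rng.normal(size=(3, 4))   # weights: map 4 inputs to 3 neurons
b = np.zeros(3)               # biases

def relu(z):
    return np.maximum(0.0, z)  # a common non-linear activation

y = relu(W @ x + b)            # the layer's output, fed to the next layer
print(y.shape)                 # (3,)
```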


The Picture So Far
[Diagram: Input → Layer → Layer → Activation → Output]

Layers
• We can transform the input by using a layer
• We can stack layers (see the sketch below)
• Layers other than input/output are called Hidden Layers
• The arrangement of the layers is called an Architecture
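As a hedged sketch of stacking layers into an architecture with Keras (the layer sizes and the two-class output below are made-up assumptions):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Illustrative architecture: input -> two hidden layers -> output.
model = keras.Sequential([
    keras.Input(shape=(2,)),                # input layer: two features
    layers.Dense(8, activation="relu"),     # hidden layer
    layers.Dense(8, activation="relu"),     # another hidden layer
    layers.Dense(2, activation="softmax"),  # output layer: two classes
])
model.summary()  # prints the architecture and its trainable parameters
```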

The Picture So Far
[Diagram: Input → Layer → Layer → Activation → Output]

Activation Functions
• Non-linear functions
• Applied to the output of a layer
• They make neural networks powerful
• Correspond to the "firing" of neurons

Activation Functions
What makes neural networks so powerful?
• Non-linearity
• Scaling the network
• Various activations exist; we will learn and use mainly the softmax activation (see the sketch below)
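A small numpy sketch of the softmax activation (the scores are illustrative); it turns a vector of raw scores into a probability distribution:

```python
import numpy as np

def softmax(z):
    # Subtracting the max is a standard trick for numerical stability.
    e = np.exp(z - np.max(z))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])
print(softmax(scores))  # approx. [0.66 0.24 0.10]; the entries sum to 1
```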


The Big Picture
[Diagram: Input → Layer → Layer → Activation → Output → Metric]

Metrics
• Measure the quality of the prediction on a data sample
• Describe the desired performance

Metrics
Accuracy - a very common and easy metric
• The ratio of correct predictions to the number of test examples
• For a set S of examples:

  accuracy = |{ s ∈ S : prediction(s) is correct }| / | S |

• This is what we want to be high!
• Unfortunately: this is not differentiable (remember: the training will rely on the derivative.)
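The definition above, spelled out as a short sketch (the labels are invented for illustration):

```python
import numpy as np

predictions = np.array([1, 0, 1, 1, 0, 1])  # predicted class per example
truth       = np.array([1, 0, 0, 1, 0, 0])  # true class per example

# accuracy = |{s in S : prediction(s) is correct}| / |S|
accuracy = np.mean(predictions == truth)
print(accuracy)  # 4 correct out of 6 -> 0.666...
```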

The Picture So Far
[Diagram: Input → Layer → Layer → Activation → Output → Metric / Loss]

Loss Functions
• Loss functions measure the quality of the prediction
• Differentiable!
• A proxy for the metric
• Also called cost function or error function
• Depend on the task we want to train for
• We will learn the corresponding loss functions by example (see the Keras sketch below)
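A hedged Keras sketch of this division of labour: the differentiable loss drives training, while the metric is only reported. The small architecture and the choice of cross-entropy loss are illustrative assumptions, not prescribed by the slides.

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(2,)),
    layers.Dense(8, activation="relu"),
    layers.Dense(2, activation="softmax"),
])

model.compile(
    optimizer="sgd",
    loss="categorical_crossentropy",  # differentiable proxy used for training
    metrics=["accuracy"],             # the non-differentiable metric we care about
)
```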

Loss - Intuition
Measures the "deviation from the ideal prediction".
Suppose an image classifier predicts:

  human: 0.48   cat: 0.01   dog: 0.51

• This results in the correct decision "dog"
• The ideal prediction would be p(dog) = 1, p(cat) = 0, p(human) = 0

Loss - Intuition
For example, the Euclidean distance:

  prediction p = (0.48, 0.01, 0.51)
  truth t = (0, 0, 1)

  l = ∥ t − p ∥ = √( (0 − 0.48)² + (0 − 0.01)² + (1 − 0.51)² ) ≈ 0.69

A high loss, because the prediction is uncertain.
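Reproducing the worked example numerically (a sketch, not part of the original slides):

```python
import numpy as np

p = np.array([0.48, 0.01, 0.51])  # predicted probabilities (human, cat, dog)
t = np.array([0.0, 0.0, 1.0])     # ideal prediction: the true class is dog

loss = np.linalg.norm(t - p)      # Euclidean distance ||t - p||
print(round(loss, 2))             # ~0.69: high, because the prediction is uncertain
```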

The Picture So Far
[Diagram: Data → Input → Layer → Layer → Activation → Output → Metric / Loss → Performance]

Performance
• Metrics and losses are used to train the net
• How do we measure the real performance of the model?
• Train on a set of examples (the training set)
• Evaluate on unseen data (the test set)
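A hedged sketch of this train/test procedure in Keras; the random data and the small architecture are purely illustrative assumptions:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

rng = np.random.default_rng(0)
x_train, y_train = rng.normal(size=(800, 2)), rng.integers(0, 2, size=800)  # training set
x_test,  y_test  = rng.normal(size=(200, 2)), rng.integers(0, 2, size=200)  # unseen test set

model = keras.Sequential([
    keras.Input(shape=(2,)),
    layers.Dense(8, activation="relu"),
    layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="sgd", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.fit(x_train, y_train, epochs=5, batch_size=32, verbose=0)  # train on the training set
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)  # evaluate on unseen data
print(test_acc)
```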

The Picture So Far
[Diagram: Data → Input → Layer → Layer → Activation → Output → Metric / Loss → Performance]

Data
• Before there is input, there is data
• How do we represent language data for input and output?
• Next chapter


Outlook
• A very coarse-grained view of the structure of neural networks
• Learning by examples
• With every example we will:
  • learn new layers
  • learn new activations
  • learn new loss functions
  • and directly use them in Keras


Training a Neural Network

The Backpropagation Algorithm ²

Training: the process of finding the best parameters by looking at the data.
How do we update the weights?

² Learning representations by back-propagating errors. David E. Rumelhart, Geoffrey E. Hinton, Ronald J. Williams (1988)

Terminology
• batch: a small subset drawn from the data
• batch size: the number of examples in one batch
• example: one element of the data
• epoch: one iteration over all available examples (in batches)
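Tying the terms together with a small arithmetic sketch (the numbers are made up):

```python
import math

num_examples = 1000  # illustrative dataset size
batch_size = 32      # examples per batch
batches_per_epoch = math.ceil(num_examples / batch_size)
print(batches_per_epoch)  # 32 weight updates make up one epoch
```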

Epoch
The idea of an epoch (similar to the Perceptron algorithm):
1. Pick a few examples of data at random (a batch)
2. Calculate the output of the net
3. Loss: calculate the loss/error of the output
4. Determine the gradients (derivatives)
5. Update the weights accordingly
6. Repeat until every example has been seen once
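A hedged TensorFlow sketch of one epoch following the six steps above; the data, model, and learning rate are illustrative assumptions, and Keras' model.fit would normally do this loop for you.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

rng = np.random.default_rng(0)
x = rng.normal(size=(256, 2)).astype("float32")
y = rng.integers(0, 2, size=256)

model = keras.Sequential([
    keras.Input(shape=(2,)),
    layers.Dense(8, activation="relu"),
    layers.Dense(2, activation="softmax"),
])
loss_fn = keras.losses.SparseCategoricalCrossentropy()
optimizer = keras.optimizers.SGD(learning_rate=0.1)

batches = tf.data.Dataset.from_tensor_slices((x, y)).shuffle(256).batch(32)
for batch_x, batch_y in batches:              # 1. pick a batch at random
    with tf.GradientTape() as tape:
        pred = model(batch_x)                 # 2. calculate the output of the net
        loss = loss_fn(batch_y, pred)         # 3. calculate the loss of the output
    grads = tape.gradient(loss, model.trainable_variables)             # 4. gradients
    optimizer.apply_gradients(zip(grads, model.trainable_variables))   # 5. update weights
# 6. the loop ends once every example has been seen once: one epoch
```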

Backpropagation - A Visual Intuition
How do we "update the weights accordingly"?

[Figure: an example image is fed through the network; the output is 0.9 cat, 0.1 dog; the true class is dog]

1. Take an input example
2. Transform it to the network input
3. Calculate a dense transformation
4. Calculate the output of the network
5. Calculate the loss function
6. Update the weights, s.t. the probability for cat decreases and the probability for dog increases

Backpropagation - Reading
A neural network is trained by a combination of gradient descent and backpropagation.
• A good video for intuition
• A very mathematical treatment

Backpropagation - A Mathematical Intuition
The neural network is a parametrized function, e.g.

  pred(i) = α(W · i + b),   with parameters W and b,

together with a loss function

  loss(pred(i), truth).

How does a specific weight w_ij influence the error made on the example? This is the partial derivative

  ∂ loss / ∂ w_ij
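A numerical sketch of that question (all numbers invented): nudge one specific weight w_ij and observe how the loss changes; the ratio approximates the partial derivative ∂loss/∂w_ij that backpropagation computes exactly.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def loss(W, b, i, true_class):
    pred = softmax(W @ i + b)         # pred(i) = alpha(W * i + b)
    return -np.log(pred[true_class])  # cross-entropy for the true class (illustrative choice)

rng = np.random.default_rng(0)
W, b = rng.normal(size=(2, 3)), np.zeros(2)
i, true_class = np.array([0.5, -1.0, 2.0]), 1
eps = 1e-6

W_perturbed = W.copy()
W_perturbed[0, 2] += eps              # change a single weight w_02 a tiny bit
dloss_dw = (loss(W_perturbed, b, i, true_class) - loss(W, b, i, true_class)) / eps
print(dloss_dw)                       # finite-difference estimate of d(loss)/d(w_02)
```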
