Layers - The Building Blocks of Neural Networks
• An arrangement of neurons (trainable parameters)
• A mathematical transformation of the input
• Determine how the information flows through the network
• Contain a method to update the parameters
• The abstraction allows layers to be stacked on top of each other
The Picture So Far
[Diagram: Input → Layer → Layer → Output, with an activation after each layer]
Layers
• We can transform the input by using a layer
• We can stack layers
• Layers other than input/output are called hidden layers
• The arrangement of the layers is called an architecture
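To make the idea of stacking layers concrete, here is a minimal Keras sketch of such an architecture; the input size, layer widths, and the two output classes are illustrative assumptions, not values from the slides:

```python
from tensorflow import keras

# A minimal sketch, assuming a 4-dimensional input and two output classes.
model = keras.Sequential([
    keras.layers.Input(shape=(4,)),               # input layer: fixes the shape of the input
    keras.layers.Dense(8, activation="relu"),     # hidden layer: trainable parameters + activation
    keras.layers.Dense(8, activation="relu"),     # another hidden layer, simply stacked on top
    keras.layers.Dense(2, activation="softmax"),  # output layer: one probability per class
])
model.summary()  # prints the architecture and the number of trainable parameters
```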
The Picture So Far
Activation Functions
• Non-linear functions
• Applied to the output of a layer
• They make neural networks powerful
• Correspond to the "firing" of neurons
Activation Functions
What makes neural networks so powerful?
• Non-linearity
• Scaling the network
• Further reading: a short guide and an overview of various activations
• We will learn and use mainly the softmax activation
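A small NumPy sketch of a few common activations, including the softmax mentioned above; the input vector is just an illustrative example:

```python
import numpy as np

def relu(x):
    # rectified linear unit: keeps positive values, zeroes out negative ones
    return np.maximum(0.0, x)

def sigmoid(x):
    # squashes any real number into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    # turns a vector of scores into a probability distribution (non-negative, sums to 1)
    e = np.exp(x - np.max(x))  # subtract the maximum for numerical stability
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])  # illustrative output of a layer
print(softmax(scores))              # roughly [0.66, 0.24, 0.10]
```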
The Big Picture
[Diagram: Input → Layer → Layer → Output, with activations and a metric on the output]
Metrics
• Measure the quality of the prediction on a data sample
• Describe the desired performance
Metrics
Accuracy - a very common and easy metric
• The ratio of correct predictions to the number of test examples
• For a set S of examples:
accuracy = |{ s ∈ S : prediction(s) is correct }| / |S|
• This is what we want to be high!
Unfortunately: this is not differentiable (remember: the training will rely on the derivative).
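The accuracy formula above as a short Python sketch; the labels are made up for illustration. The hard comparison (correct or not) is exactly what makes accuracy non-differentiable:

```python
def accuracy(predictions, truths):
    # |{ s in S : prediction(s) is correct }| / |S|
    correct = sum(1 for p, t in zip(predictions, truths) if p == t)
    return correct / len(truths)

predicted = ["dog", "cat", "dog", "human"]   # hypothetical model decisions
true_labels = ["dog", "cat", "cat", "human"]
print(accuracy(predicted, true_labels))      # 0.75
```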
The Picture So Far
[Diagram: Input → Layer → Layer → Output, with activations; a metric and a loss on the output]
Loss Functions
• Loss functions measure the quality of the prediction
• Differentiable!
• A proxy for the metric
• Also called: cost function or error function
Loss Functions
• Dependent on the task we want to train on
• We will learn the corresponding loss functions by example
Loss - Intuition
Measures the "deviation from the ideal prediction".
Suppose an image classifier predicts:
human: 0.48   cat: 0.01   dog: 0.51
• Results in the correct decision: dog
• The ideal prediction would be p(dog) = 1, p(cat) = 0, p(human) = 0
Loss - Intuition
For example, the (Euclidean) distance:
• prediction p = (0.48, 0.01, 0.51)
• truth t = (0, 0, 1)
l = ∥t − p∥ = √((0 − 0.48)² + (0 − 0.01)² + (1 − 0.51)²) ≈ 0.69
High loss, because the prediction is uncertain.
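The same computation as a small NumPy sketch, using only the prediction and truth vectors from the slide:

```python
import numpy as np

p = np.array([0.48, 0.01, 0.51])  # predicted probabilities for human, cat, dog
t = np.array([0.0, 0.0, 1.0])     # ideal prediction: the example is a dog

loss = np.linalg.norm(t - p)      # Euclidean distance between truth and prediction
print(loss)                       # about 0.69: correct decision, but uncertain, hence high loss
```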
The Picture So Far
[Diagram: Data → Input → Layer → Layer → Output, with activations; metric, loss, and performance on the output]
Performance
• Metrics and losses are used to train the net
• How do we measure the real performance of the model?
• Train on a set of examples (the training set)
• Evaluate on unseen data (the test set)
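A minimal, self-contained sketch of this train/test idea in Keras; the synthetic data, the network, and all hyperparameters are assumptions for illustration only:

```python
import numpy as np
from tensorflow import keras

# Synthetic stand-in data: 1000 examples with 4 features and 2 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4)).astype("float32")
y = (X.sum(axis=1) > 0).astype("int32")

# Hold out unseen data: first 800 examples for training, last 200 for testing.
X_train, y_train = X[:800], y[:800]
X_test, y_test = X[800:], y[800:]

model = keras.Sequential([
    keras.layers.Input(shape=(4,)),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",  # the loss used for training
              metrics=["accuracy"])                     # the metric we actually care about

model.fit(X_train, y_train, epochs=5, verbose=0)        # train on the training set
loss, acc = model.evaluate(X_test, y_test, verbose=0)   # measure performance on unseen data
print(acc)
```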
The Picture So Far
Data
• Before there is input, there is data
• How do we represent language data for input and output?
• Next chapter
Outlook
• A very coarse-grained view of the structure of neural networks
• Learning by examples
• With every example we will:
  • learn new layers
  • learn new activations
  • learn new loss functions
• And directly use them in Keras
Training a neural network
The Backpropagation Algorithm²
Training: the process of finding the best parameters by looking at the data.
How do we update the weights?
² Learning representations by back-propagating errors. David E. Rumelhart, Geoffrey E. Hinton, Ronald J. Williams (1988).
Terminology
• batch: a small subset drawn from the data; its size is the batch size
• example: one element of the data
• epoch: one iteration over all available examples (in batches)
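In Keras these terms appear directly as arguments of `fit`; a quick sketch of how they relate (the numbers are illustrative):

```python
import math

num_examples = 1000       # size of the training data
batch_size = 32           # examples processed per weight update
batches_per_epoch = math.ceil(num_examples / batch_size)
print(batches_per_epoch)  # 32 batches, i.e. 32 weight updates per epoch

# A call like model.fit(X, y, batch_size=32, epochs=10) runs 10 epochs,
# each iterating over all examples once, in batches of 32.
```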
Epoch
The idea of an epoch (similar to the Perceptron algorithm):
1. Pick a few examples of the data at random (a batch)
2. Calculate the output of the net
3. Calculate the loss/error of the output
4. Determine the gradients (derivatives)
5. Update the weights accordingly
6. Repeat until every example has been seen once
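The same six steps as a runnable NumPy sketch for a tiny linear model with a squared-error loss; the data, model, and learning rate are illustrative assumptions, and the gradient here is written by hand rather than obtained via backpropagation:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))        # 100 examples with 3 features
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w                       # synthetic targets
w = np.zeros(3)                      # the trainable parameters
batch_size, lr = 10, 0.1

order = rng.permutation(len(X))                     # 1. pick examples at random (batches)
for start in range(0, len(X), batch_size):
    idx = order[start:start + batch_size]
    Xb, yb = X[idx], y[idx]
    pred = Xb @ w                                   # 2. calculate the output of the net
    loss = np.mean((pred - yb) ** 2)                # 3. calculate the loss/error of the output
    grad = 2.0 / len(idx) * Xb.T @ (pred - yb)      # 4. determine the gradients (derivatives)
    w -= lr * grad                                  # 5. update the weights accordingly
# 6. once the loop has seen every example once, one epoch is complete
print(w)  # repeating this for a few epochs drives w towards true_w
```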
Backpropagation - A Visual Intuition
Update the weights accordingly?
[Animated figure, built up step by step; the network outputs 0.9 cat, 0.1 dog and a loss is computed:]
1. Input example
2. Transform to network input
3. Calculate a dense transformation
4. Calculate the output of the network (0.9 cat, 0.1 dog)
5. Calculate the loss function
6. Update the weights, s.t. the probability for cat decreases
7. Update the weights, s.t. the probability for dog increases
Backpropagation - Reading
A neural network is trained by a combination of gradient descent and backpropagation.
• A good video for intuition
• A very mathematical treatment
Backpropagation - A Mathematical Intuition
The neural network is a parametrized function, e.g.
pred(i) = α(W · i + b),
with parameters W and b, and a loss function
loss(pred(i), truth).
How does a specific weight w_ij influence the error made on the example?
∂ loss / ∂ w_ij
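A small sketch of what ∂ loss / ∂ w_ij means in practice: nudge one specific weight and observe how the loss changes (a finite-difference approximation of the derivative). The sigmoid activation, squared-error loss, and all concrete numbers are illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def loss(W, b, i, truth):
    pred = sigmoid(W @ i + b)           # pred(i) = alpha(W · i + b)
    return np.sum((pred - truth) ** 2)  # squared-error loss(pred(i), truth)

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3))             # parameters: 2 outputs, 3 inputs
b = np.zeros(2)
i = np.array([0.5, -1.0, 2.0])          # one input example
truth = np.array([0.0, 1.0])

# How does the specific weight w_01 influence the error on this example?
eps = 1e-6
W_plus = W.copy()
W_plus[0, 1] += eps
dloss_dw01 = (loss(W_plus, b, i, truth) - loss(W, b, i, truth)) / eps
print(dloss_dw01)  # numerical estimate of d(loss)/d(w_01); backpropagation computes this exactly
```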