Neural Networks
Oskar Taubert (SCC)
KIT – The Research University in the Helmholtz Association
www.kit.edu
15.01.2020
Neural Network Concept
- (very) remotely brain-inspired computational system
- a directed graph encoding an ordered system of simple mathematical transformations
- successor of the perceptron concept (i.e. logistic regression)
- a more complicated "fit", i.e. a universal function approximator
- usually supervised machine learning
Motivation
- automation of tasks machines are traditionally bad at:
  - image recognition
  - natural language processing
  - ...
- image-processing challenges, e.g. character recognition (MNIST)
Perceptron
Decide whether an image x depicts a 0:

o = Θ(w · x + b)

where (Θ: Heaviside step function)
- Output: o ∈ {0, 1}
- Input: x ∈ R^{|Pixels|} (the flattened pixel values)
- Parameter: w ∈ R^{|Pixels|}
- Parameter: b ∈ R

Problem: a single perceptron can only realize linear decision boundaries.
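A minimal numpy sketch of this decision rule; the pixel count and the parameter values are made up for illustration (in practice w and b would be learned):

```python
import numpy as np

def heaviside(z):
    # step activation: 1 if z >= 0, else 0
    return (z >= 0).astype(int)

rng = np.random.default_rng(0)
n_pixels = 28 * 28             # MNIST-sized input (assumption)
x = rng.random(n_pixels)       # one flattened input image
w = rng.normal(size=n_pixels)  # weights (would be learned)
b = 0.0                        # bias (would be learned)

o = heaviside(w @ x + b)       # o = Theta(w . x + b), o in {0, 1}
print(o)
```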
XOR
XOR requires a non-linear decision boundary, but it can be composed from three linearly separable gates (using the convention Θ(0) = 1):

- OR-gate: h_1 = Θ(x_1 + x_2 − 1)
- NAND-gate: h_2 = Θ(−x_1 − x_2 + 1.5)
- AND-gate: ŷ = Θ(h_1 + h_2 − 1.5)

[Figure: decision boundaries of the OR-, NAND-, and AND-gates in the (x_1, x_2) and (h_1, h_2) planes]
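A quick check that the three-gate construction really computes XOR on all four inputs (a sketch; the Θ(0) = 1 convention from the slide is baked into `theta`):

```python
def theta(z):
    # Heaviside step with the convention Theta(0) = 1
    return 1 if z >= 0 else 0

for x1 in (0, 1):
    for x2 in (0, 1):
        h1 = theta(x1 + x2 - 1)       # OR-gate
        h2 = theta(-x1 - x2 + 1.5)    # NAND-gate
        y = theta(h1 + h2 - 1.5)      # AND-gate -> XOR overall
        print(x1, x2, "->", y)
```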
Multilayer Perceptron
Stack two perceptron-like layers with activation functions g and f:

h = g(V · x + c)
o = f(W · h + b)

- Input: x ∈ R^{|Pixels|}
- Hidden: h ∈ R^3, with V ∈ R^{3×|Pixels|}, c ∈ R^3
- Output: o ∈ R^{10}, with W ∈ R^{10×3}, b ∈ R^{10}

Open questions: What are f and g? What values should W and V take?
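A sketch of this forward pass in numpy. The activation choices (sigmoid hidden layer, softmax output) and the random parameter values are assumptions for illustration; training them is the topic of the next slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pixels, n_hidden, n_out = 28 * 28, 3, 10  # sizes from the slide (pixel count assumed)

# parameters (random here; in practice learned, see the training slides)
V, c = rng.normal(size=(n_hidden, n_pixels)), np.zeros(n_hidden)
W, b = rng.normal(size=(n_out, n_hidden)), np.zeros(n_out)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())  # shift by the max for numerical stability
    return e / e.sum()

x = rng.random(n_pixels)     # one flattened input image
h = sigmoid(V @ x + c)       # hidden layer: h = g(V . x + c)
o = softmax(W @ h + b)       # output layer: o = f(W . h + b)
print(o.sum())               # softmax outputs sum to 1
```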
Training
Parameters: θ = {W, V, b, c}
Error measure: E(o, t) = (o − t)²

Gradient descent: update each parameter against its error gradient,

θ ← θ − λ ∂E/∂θ

With pre-activations a_2 = W · h + b and a_1 = V · x + c, the chain rule gives

∂E/∂W = (∂E/∂o ⊙ f′(a_2)) · hᵀ
∂E/∂b = ∂E/∂o ⊙ f′(a_2)
∂E/∂V = ((Wᵀ (∂E/∂o ⊙ f′(a_2))) ⊙ g′(a_1)) · xᵀ

(⊙: element-wise product; ∂E/∂c is analogous to ∂E/∂b.)
Training
More generally, for a network of n layers,

E(t, f_n(W_n · f_{n−1}(... f_1(W_1 · x))))

the gradients factor into layer-local terms:

∂E/∂W_i = δ_i · h_{i−1}ᵀ
δ_{i−1} = (W_iᵀ δ_i) ⊙ f′_{i−1}(W_{i−1} · h_{i−2})

This recursion from the output layer back to the input is backpropagation.
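A sketch of one full gradient-descent step for the two-layer network above, assuming sigmoid activations and the squared error from the slide; sizes and data are toy values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 4, 3, 2  # toy layer sizes (assumption)
V, c = rng.normal(size=(n_hid, n_in)), np.zeros(n_hid)
W, b = rng.normal(size=(n_out, n_hid)), np.zeros(n_out)
lam = 0.1                     # learning rate lambda

x, t = rng.random(n_in), rng.random(n_out)  # one training example

# forward pass, keeping pre-activations for the backward pass
a1 = V @ x + c
h = sigmoid(a1)
a2 = W @ h + b
o = sigmoid(a2)
E = np.sum((o - t) ** 2)      # E(o, t) = (o - t)^2

# backward pass: delta_i propagates the error layer by layer
delta2 = 2 * (o - t) * o * (1 - o)     # dE/da2, using sigmoid' = o(1 - o)
delta1 = (W.T @ delta2) * h * (1 - h)  # dE/da1

# gradient-descent update: params <- params - lambda * dE/dparams
W -= lam * np.outer(delta2, h)
b -= lam * delta2
V -= lam * np.outer(delta1, x)
c -= lam * delta1
```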
Error Functions
- Regression: MSE, KL divergence
- Classification: cross-entropy, NLL loss
- Segmentation: hinge losses, overlap/dissimilarity losses
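Minimal sketches of two of the listed losses (MSE and cross-entropy); these are illustrative definitions, not a particular library's API:

```python
import numpy as np

def mse(o, t):
    # mean squared error, for regression
    return np.mean((o - t) ** 2)

def cross_entropy(p, t):
    # cross-entropy, for classification: p are predicted class
    # probabilities, t is a one-hot target vector
    eps = 1e-12  # avoid log(0)
    return -np.sum(t * np.log(p + eps))

p = np.array([0.7, 0.2, 0.1])
t = np.array([1.0, 0.0, 0.0])
print(mse(p, t), cross_entropy(p, t))
```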
Convolutions
[Figure: step-by-step convolution of an image with a sliding kernel. © Machine Learning Guru]
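Since the figure did not survive extraction, here is a minimal sketch of the operation it illustrated: slide a small kernel over the image and take a dot product at each position (technically cross-correlation, as implemented in most deep-learning frameworks; "valid" padding assumed):

```python
import numpy as np

def conv2d(image, kernel):
    # slide the kernel over the image and take dot products;
    # "valid" padding: the output shrinks by kernel size - 1
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25.0).reshape(5, 5)
edge_kernel = np.array([[1.0, 0.0, -1.0]] * 3)  # simple vertical-edge detector
print(conv2d(image, edge_kernel))
```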
Activation Functions
Activation functions f(x) introduce non-linearity, e.g. the sigmoid σ(x) = 1 / (1 + e^{−x}).
Other non-linear choices: tanh(x), relu(x) = max(0, x), softmax_i(x) = exp(x_i) / Σ_j exp(x_j), etc.
Some choices have better numerical properties, e.g. ReLU avoids the vanishing gradient.

[Figure: sigmoid, tanh, ReLU, and SeLU plotted over x ∈ [−2, 2]]
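The listed activations as one-liners in numpy (a sketch; the max-shift in softmax is the usual numerical-stability trick):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

def softmax(x):
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x), np.tanh(x), relu(x), softmax(x))
```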
Regularization
High-degree fits match the training points exactly but generalize poorly; regularization counteracts this overfitting.

[Figure: polynomial fits of degree 0, 1, 3, and 9 to the same data, illustrating underfitting and overfitting]
Regularization
- early stopping (halt training where the test loss J reaches its optimum, see figure)
- weight decay (sketch below)
- weight sharing
- dropout (sketch below)
- batch normalization
- data augmentation
- more data

[Figure: training and test loss J over epochs; the test-loss minimum marks the early-stopping optimum]
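Sketches of two techniques from the list, weight decay and (inverted) dropout, as they might look inside a training step; the variable names and constants are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def decayed_update(W, grad, lam=0.1, alpha=1e-4):
    # weight decay: penalize alpha * ||W||^2 in the loss,
    # i.e. shrink the weights a little on every gradient step
    return W - lam * (grad + 2 * alpha * W)

def dropout(h, p=0.5, training=True):
    # dropout: during training, randomly zero activations with
    # probability p and rescale the rest; disabled at test time
    if not training:
        return h
    mask = rng.random(h.shape) > p
    return h * mask / (1.0 - p)

h = rng.random(8)
print(dropout(h, p=0.5))
```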
Hyperparameters
How to choose them:
- guessing
- experience
- non-gradient-based optimization:
  - grid search
  - random search
  - particle swarm
  - genetic algorithms
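A random-search sketch over two common hyperparameters; `train_and_evaluate` is a hypothetical stand-in for a full training run returning a validation loss:

```python
import random

random.seed(0)

def train_and_evaluate(lr, n_hidden):
    # hypothetical stand-in: train a network with these
    # hyperparameters and return its validation loss
    return (lr - 0.01) ** 2 + (n_hidden - 64) ** 2 / 1e4

best = None
for _ in range(20):                    # random search: sample trial points
    lr = 10 ** random.uniform(-4, -1)  # log-uniform learning rate
    n_hidden = random.choice([16, 32, 64, 128])
    loss = train_and_evaluate(lr, n_hidden)
    if best is None or loss < best[0]:
        best = (loss, lr, n_hidden)
print("best (loss, lr, n_hidden):", best)
```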
Out of Scope
- residual models
- generative models
- recurrent models
- attention models
- lots more
- reinforcement learning (next week)
Sources
- http://nyu-cds.sparksites.io/wp-content/uploads/2015/10/header_4@2x.png
- https://github.com/Markus-Goetz/gks-2019/blob/solutions/slides/slides.pdf