
Deep Learning: Theory, History, State of the Art & Practical Tools by Ilya Kuzovkin ilya.kuzovkin@gmail.com Machine Learning Estonia http://neuro.cs.ut.ee 2016 • Where it has started • How it learns • How it evolved • What is the state of the art


  1.–3. How it learns Backpropagation. Given inputs 0.05 and 0.10, we want the neural network to output 0.01 and 0.99. The backwards pass: updating the weights with the gradient descent update rule and a learning rate. http://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example
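For reference, the update rule shown on these slides can be written as follows (a standard formulation consistent with the linked walkthrough; η is the learning rate and E_total the squared-error loss over both outputs):

```latex
E_{\text{total}} = \sum_{k} \tfrac{1}{2}\,(\text{target}_k - \text{out}_k)^2,
\qquad
w_i \leftarrow w_i - \eta \,\frac{\partial E_{\text{total}}}{\partial w_i}
```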

  4.–8. How it learns Backpropagation. Given inputs 0.05 and 0.10, we want the neural network to output 0.01 and 0.99. • Repeat for w6, w7, w8 • In an analogous way for w1, w2, w3, w4 • Calculate the total error again: 0.291027924 (it was 0.298371109) • Repeat 10,000 times: 0.000035085 http://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example
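The whole walkthrough fits in a few lines of NumPy. A minimal sketch, assuming the setup of the linked post (a 2-2-2 sigmoid network with biases 0.35 and 0.60, initial weights w1…w8 = 0.15…0.55, learning rate 0.5, and biases left fixed, as in the post):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([0.05, 0.10])            # inputs
t = np.array([0.01, 0.99])            # desired outputs
W1 = np.array([[0.15, 0.20],          # hidden-layer weights (w1..w4)
               [0.25, 0.30]])
W2 = np.array([[0.40, 0.45],          # output-layer weights (w5..w8)
               [0.50, 0.55]])
b1, b2, eta = 0.35, 0.60, 0.5

for step in range(10000):
    # forward pass
    h = sigmoid(W1 @ x + b1)
    o = sigmoid(W2 @ h + b2)
    E = 0.5 * np.sum((t - o) ** 2)    # total error: ~0.2984 on the first pass
    # backwards pass: error signals for output and hidden units
    delta_o = (o - t) * o * (1 - o)
    delta_h = (W2.T @ delta_o) * h * (1 - h)
    # gradient descent updates, w := w - eta * dE/dw (biases kept fixed)
    W2 -= eta * np.outer(delta_o, h)
    W1 -= eta * np.outer(delta_h, x)

print(E)   # after 10,000 iterations the error drops to ~0.000035
```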

  9.–10. How it learns Optimization methods Alec Radford “Introduction to Deep Learning with Python”
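The animations compare optimizers that all follow the gradient but shape the step differently. A rough sketch of three of the update rules (standard formulations; the hyperparameter values are illustrative, and grad stands for any function returning dE/dw):

```python
import numpy as np

def sgd_step(w, grad, lr=0.01):
    # plain gradient descent: step straight down the gradient
    return w - lr * grad(w)

def momentum_step(w, v, grad, lr=0.01, mu=0.9):
    # keep a velocity that accumulates past gradients
    v = mu * v - lr * grad(w)
    return w + v, v

def rmsprop_step(w, cache, grad, lr=0.001, decay=0.9, eps=1e-8):
    # scale the step by a running average of squared gradients
    g = grad(w)
    cache = decay * cache + (1 - decay) * g ** 2
    return w - lr * g / (np.sqrt(cache) + eps), cache
```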

  11. How it evolved

  12. How it evolved 1-layer NN (diagram: INPUT → OUTPUT) Alec Radford “Introduction to Deep Learning with Python”

  13.–14. How it evolved 1-layer NN: 92.5% on the MNIST test set Alec Radford “Introduction to Deep Learning with Python”
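A "1-layer NN" in this sense is just multiclass logistic (softmax) regression from the 784 MNIST pixels straight to 10 class scores. A minimal sketch (shapes assumed from MNIST; the training loop is omitted):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)      # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

W = 0.01 * np.random.randn(784, 10)           # the only learnable weights

def predict(X):                               # X: (batch, 784) pixel values
    return softmax(X @ W)                     # (batch, 10) class probabilities
```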

  15. How it evolved One hidden layer Alec Radford “Introduction to Deep Learning with Python”

  16. How it evolved One hidden layer 98.2% on the MNIST test set Alec Radford “Introduction to Deep Learning with Python”

  17. How it evolved One hidden layer: 98.2% on the MNIST test set. Activity of 100 hidden neurons (out of 625). Alec Radford “Introduction to Deep Learning with Python”
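A sketch of the same model with one hidden layer of 625 sigmoid units inserted between input and output, matching the size quoted on the slide (training loop again omitted):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

W_h = 0.01 * np.random.randn(784, 625)   # input -> hidden
W_o = 0.01 * np.random.randn(625, 10)    # hidden -> output

def predict(X):
    h = sigmoid(X @ W_h)                 # hidden activities (what the slide visualizes)
    return softmax(h @ W_o)
```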

  18. How it evolved Overfitting

  19.–22. How it evolved Dropout Srivastava, Hinton, Krizhevsky, Sutskever, Salakhutdinov, “Dropout: A Simple Way to Prevent Neural Networks from Overfitting”, 2014
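Dropout randomly silences units during training so the network cannot rely on any single co-adapted feature. A minimal sketch of the commonly used "inverted" variant, which rescales activations at training time and is equivalent to the paper's test-time weight scaling:

```python
import numpy as np

def dropout(h, p_keep=0.5, train=True):
    if not train:
        return h                                   # test time: use the layer unchanged
    mask = np.random.rand(*h.shape) < p_keep       # keep each unit with probability p_keep
    return h * mask / p_keep                       # rescale so expected activity is preserved
```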

  23.–24. How it evolved ReLU X. Glorot, A. Bordes, Y. Bengio, “Deep Sparse Rectifier Neural Networks”, 2011
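The rectified linear unit is simply max(0, x); its gradient is 1 on the positive side and 0 elsewhere, so deep stacks of ReLUs avoid the saturation of sigmoid units. A one-line sketch:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    return (x > 0).astype(x.dtype)   # 1 for positive inputs, 0 otherwise
```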

  25.–26. How it evolved “Modern” ANN • Several hidden layers • ReLU activation units • Dropout

  27. How it evolved “Modern” ANN • Several hidden layers • ReLU activation units • Dropout 99.0% on the MNIST test set
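Putting the three ingredients together, a forward pass of such a "modern" network might look like the sketch below (layer sizes and keep-probabilities are illustrative; relu, dropout and softmax are the helpers sketched above):

```python
def forward(X, W1, W2, W_out, train=True):
    # two ReLU hidden layers with dropout, then a 10-way softmax output
    h1 = dropout(relu(X @ W1), p_keep=0.8, train=train)
    h2 = dropout(relu(h1 @ W2), p_keep=0.5, train=train)
    return softmax(h2 @ W_out)
```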

  28. How it evolved Convolution

  29.–31. How it evolved Convolution Prewitt edge detector: the 3×3 kernel [+1 +1 +1; 0 0 0; -1 -1 -1]

  32.–33. How it evolved Convolution The Prewitt kernel is slid over a 6×3 image patch: three rows of 40s on top of three rows of 10s.

  34.–35. How it evolved Convolution Over the first 3×3 window (all 40s) the elementwise products are [+40 +40 +40; 0 0 0; -40 -40 -40], which sum to 0.

  36.–38. How it evolved Convolution Sliding the window down gives the output column [0, 90, 90, 0]: the response is 0 over uniform regions and 90 where the window straddles the 40/10 boundary.

  39. How it evolved Convolution The edge detector is a handcrafted feature detector.
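The worked example can be reproduced directly: slide the 3×3 Prewitt kernel down the 6×3 patch of 40s over 10s and sum the elementwise products at each position (plain cross-correlation, as on the slides):

```python
import numpy as np

kernel = np.array([[ 1,  1,  1],
                   [ 0,  0,  0],
                   [-1, -1, -1]])

patch = np.array([[40, 40, 40],
                  [40, 40, 40],
                  [40, 40, 40],
                  [10, 10, 10],
                  [10, 10, 10],
                  [10, 10, 10]])

out = []
for r in range(patch.shape[0] - 2):              # slide a 3x3 window down the patch
    window = patch[r:r + 3, :]
    out.append(int(np.sum(window * kernel)))     # sum of elementwise products

print(out)   # [0, 90, 90, 0] -- the response peaks at the 40/10 edge
```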

  40. How it evolved Convolution The idea of a convolutional layer is to learn feature detectors instead of using handcrafted ones

  41. How it evolved Convolution The idea of a convolutional layer is to learn feature detectors instead of using handcrafted ones http://yann.lecun.com/exdb/mnist/

  42. How it evolved Convolution The idea of a convolutional layer is to learn feature detectors instead of using handcrafted ones. 99.50% on the MNIST test set; current best: 99.77% by a committee of 35 conv. nets http://yann.lecun.com/exdb/mnist/
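As an illustration of "learned feature detectors", here is a small convolutional network for MNIST sketched with Keras. This is an assumption for illustration only (it is not the talk's tooling, nor the model behind the 99.50% or 99.77% figures); the 3×3 filters play the same role as the Prewitt kernel above, except their values are learned by backpropagation:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Two stacks of learned 3x3 filters with pooling, then a 10-way softmax.
model = keras.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dropout(0.5),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```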

  43. How it evolved More layers

  44. How it evolved More layers C. Szegedy, et al., “Going Deeper with Convolutions”, 2014

  45. How it evolved More layers C. Szegedy, et al., “Going Deeper with Convolutions”, 2014 ILSVRC 2015 winner — 152 (!) layers K. He et al., “Deep Residual Learning for Image Recognition”, 2015
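What makes 152 layers trainable in the cited ResNet paper is the residual (skip) connection: each block learns a correction F(x) and outputs F(x) + x, so gradients can flow through the identity path. A minimal fully-connected sketch of the idea (the paper's actual blocks are convolutional and use batch normalization):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, W1, W2):
    # W1, W2 are square so that F(x) has the same shape as x
    f = relu(x @ W1) @ W2          # the residual mapping F(x)
    return relu(f + x)             # identity skip connection
```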

  46. How it evolved Hyperparameters • Network: architecture, number of layers, number of units (in each layer), type of the activation function, weight initialization, … • Convolutional layers: size, stride, number of filters, … • Optimization method: learning rate, other method-specific constants, …

  47.–50. How it evolved Hyperparameter search: Grid search :( • Random search :/ • Bayesian optimization :) • Informal parameter search :) Snoek, Larochelle, Adams, “Practical Bayesian Optimization of Machine Learning Algorithms”
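As a concrete example of the middle option, random search simply samples configurations from the hyperparameter space instead of enumerating a grid. A sketch, where train_and_evaluate is a hypothetical helper (not from any particular library) that trains a network with the given configuration and returns validation accuracy:

```python
import random

# Each entry samples one hyperparameter; ranges are illustrative.
space = {
    "learning_rate": lambda: 10 ** random.uniform(-5, -1),
    "num_layers":    lambda: random.choice([2, 3, 4, 5]),
    "units":         lambda: random.choice([128, 256, 512, 1024]),
    "dropout":       lambda: random.uniform(0.2, 0.6),
}

best = None
for _ in range(50):
    config = {name: sample() for name, sample in space.items()}
    score = train_and_evaluate(config)   # hypothetical training run
    if best is None or score > best[0]:
        best = (score, config)
```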

  51. How it evolved Major Types of ANNs feedforward convolutional

  52. How it evolved Major Types of ANNs feedforward convolutional recurrent
