Deep Learning for Graphics: Neural Network Basics
Niloy Mitra (UCL), Iasonas Kokkinos (UCL/Facebook), Paul Guerrero (UCL), Vladimir Kim (Adobe Research), Kostas Rematas (U Washington), Tobias Ritschel (UCL)
EG Course: Deep Learning for Graphics


  1. Stochastic Gradient Descent (SGD)
  Full-batch gradient over all N samples [1..N]: $\nabla L(w) = \frac{1}{N} \sum_{i=1}^{N} \nabla \ell_i(w)$
  Noisy ('stochastic') gradient over a minibatch of B elements b(1), b(2), …, b(B) sampled from [1, N]: $\nabla \tilde{L}(w) = \frac{1}{B} \sum_{j=1}^{B} \nabla \ell_{b(j)}(w)$
  Epoch: N samples, i.e. N/B minibatches.

  2. Code example: Gradient Descent vs. Stochastic Gradient Descent
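The deck's actual code is not reproduced in this transcript; the sketch below is a stand-in that contrasts full-batch gradient descent with minibatch SGD on a synthetic least-squares problem. All names and hyperparameters (N, D, B, eta, the 0.1 noise level) are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic least-squares problem: y = X @ w_true + noise.
N, D = 1000, 5
X = rng.normal(size=(N, D))
w_true = rng.normal(size=D)
y = X @ w_true + 0.1 * rng.normal(size=N)

def grad(w, idx):
    """Gradient of the mean squared error over the samples in idx."""
    Xb, yb = X[idx], y[idx]
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(idx)

eta, B = 0.1, 32   # learning rate, minibatch size

# Gradient descent: each step uses the full batch [1..N].
w_gd = np.zeros(D)
for step in range(100):
    w_gd -= eta * grad(w_gd, np.arange(N))

# Stochastic gradient descent: one epoch = N samples = N//B minibatches.
w_sgd = np.zeros(D)
for epoch in range(100):
    perm = rng.permutation(N)
    for b in range(N // B):
        idx = perm[b * B:(b + 1) * B]   # b(1), ..., b(B) drawn from [1, N]
        w_sgd -= eta * grad(w_sgd, idx)

print("GD  error:", np.linalg.norm(w_gd - w_true))
print("SGD error:", np.linalg.norm(w_sgd - w_true))
```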

  3. Regularization in SGD: weight decay
  With a minibatch b(1), …, b(B) sampled from [1, N] as before, ''weight decay'' adds an $\ell_2$ penalty to the loss, so back-prop on a minibatch shrinks the weights at every step: $w \leftarrow w - \eta \,(\nabla \tilde{L}(w) + \lambda w)$
  Epoch: N samples, N/B minibatches.
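A sketch of how the decay term enters the update; the value of λ and the function interface are assumptions, not the course's code:

```python
import numpy as np

def sgd_weight_decay_step(w, grad_minibatch, eta=0.1, lam=1e-4):
    """One SGD step with weight decay: the l2 penalty lam * ||w||^2
    contributes 2 * lam * w to the gradient, so every minibatch update
    also shrinks the weights slightly toward zero."""
    return w - eta * (grad_minibatch + 2.0 * lam * w)

# With a zero data gradient the update reduces to pure decay:
# w <- (1 - 2 * eta * lam) * w.
w = np.ones(5)
print(sgd_weight_decay_step(w, np.zeros(5)))  # slightly below 1.0
```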

  4. Learning rate

  5. Gradient Descent

  6. (S)GD with an adaptable step size (e.g. a learning rate that decays over training; see the sketch below)
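The slide's concrete example formula did not survive extraction; purely as an assumed illustration, one common adaptable step size is a schedule that decays with the iteration count:

```python
def step_size(t, eta0=0.1, tau=100.0):
    """Step size that decays with iteration t: large exploratory steps
    early on, small careful steps later. eta0 and tau are illustrative
    hyperparameters, not values from the slides."""
    return eta0 / (1.0 + t / tau)

for t in (0, 100, 1000):
    print(t, step_size(t))   # 0.1, 0.05, ~0.009
```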

  7. (S)GD with momentum
  Main idea: retain the long-term trend of the updates, drop the oscillations.
  [Figure: optimization paths of (S)GD vs. (S)GD + momentum]
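A minimal sketch of the momentum update; the coefficient 0.9 and the toy quadratic are illustrative choices. The velocity v accumulates update directions, so consistent components reinforce while oscillating ones largely cancel:

```python
import numpy as np

def sgd_momentum(grad_fn, w0, eta=0.01, mu=0.9, steps=100):
    """(S)GD with momentum: the velocity v keeps a running trend of
    the updates; consistent directions build up speed, oscillating
    components cancel out."""
    w = np.asarray(w0, dtype=float).copy()
    v = np.zeros_like(w)
    for _ in range(steps):
        v = mu * v - eta * grad_fn(w)
        w = w + v
    return w

# Ill-conditioned quadratic 0.5*(w1^2 + 50*w2^2). Momentum reaches ~0
# on both axes in 100 steps; plain GD at eta=0.01 would still be far
# from 0 along the shallow w1 axis.
grad_fn = lambda w: np.array([1.0, 50.0]) * w
print(sgd_momentum(grad_fn, np.array([1.0, 1.0])))
```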

  8. Code example: Multi-layer perceptron classification
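Again, the deck's own code is not in this transcript; the stand-in below trains a one-hidden-layer perceptron with ReLU units and a sigmoid output on a toy two-class problem. The layer width, learning rate, and full-batch updates are assumptions made for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-class data: two Gaussian blobs.
N = 200
X = np.vstack([rng.normal(-1.0, 1.0, (N // 2, 2)),
               rng.normal(+1.0, 1.0, (N // 2, 2))])
t = np.hstack([np.zeros(N // 2), np.ones(N // 2)])

# One hidden layer (ReLU) + sigmoid output, cross-entropy loss.
H = 16
W1 = rng.normal(0.0, 0.5, (2, H)); b1 = np.zeros(H)
W2 = rng.normal(0.0, 0.5, H);      b2 = 0.0
eta = 0.1

for epoch in range(500):
    # Forward pass.
    h = np.maximum(0.0, X @ W1 + b1)            # ReLU hidden layer
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))    # sigmoid output

    # Backward pass: gradients of the mean cross-entropy loss.
    dz = (p - t) / N                            # dL/d(pre-sigmoid)
    gW2 = h.T @ dz; gb2 = dz.sum()
    dh = np.outer(dz, W2) * (h > 0)             # back through the ReLU
    gW1 = X.T @ dh; gb1 = dh.sum(axis=0)

    # Gradient descent update (full batch, for brevity).
    W1 -= eta * gW1; b1 -= eta * gb1
    W2 -= eta * gW2; b2 -= eta * gb2

print("training accuracy:", np.mean((p > 0.5) == t))
```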

  9. Step-size selection & optimizers: a research problem (one of these, Adam, is sketched below)
  • Nesterov's Accelerated Gradient (NAG)
  • R-prop
  • AdaGrad
  • RMSProp
  • AdaDelta
  • Adam
  • …
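As a sketch of one entry from the list: the Adam update, following the rule and default hyperparameters from Kingma & Ba's paper; the test function and step count are illustrative.

```python
import numpy as np

def adam(grad_fn, w0, eta=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, steps=5000):
    """Adam: per-parameter step sizes from running gradient moments."""
    w = np.asarray(w0, dtype=float).copy()
    m = np.zeros_like(w)   # 1st moment: running mean of gradients
    v = np.zeros_like(w)   # 2nd moment: running mean of squared gradients
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        m_hat = m / (1.0 - beta1 ** t)   # bias correction for the zero init
        v_hat = v / (1.0 - beta2 ** t)
        w -= eta * m_hat / (np.sqrt(v_hat) + eps)
    return w

# Minimizing ||w||^2 from w0 = (3, -2): converges to ~(0, 0).
print(adam(lambda w: 2.0 * w, np.array([3.0, -2.0])))
```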

  10. Neural Network Training: Old & New Tricks
  Old (1980s): Stochastic Gradient Descent, momentum, "weight decay"
  New (last 5–6 years): Dropout, ReLUs, Batch Normalization

  11. Linearization: may need higher dimensions (see http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/)

  12. Reminder: overfitting, in images
  [Figure: classification and regression fits, contrasting a 'just right' model with overfitting]

  13. Previously: $\ell_2$ regularization
  $E(w) = \sum_{i=1}^{N} \ell\big(y_i, f(x_i; w)\big) + \lambda \sum_{l} \|w^{(l)}\|_2^2$, combining a per-sample loss with a per-layer regularization term.

  14. Dropout
  Each sample is processed by a 'decimated' neural net.
  Decimated nets are distinct classifiers, but they should all do the same job.

  15. Dropout block: 'feature noising'

  16. Test time: deterministic approximation (see the sketch below)
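Tying slides 14–16 together, a minimal sketch of 'inverted' dropout; the keep probability of 0.5 is an illustrative assumption. At training time each sample sees a distinct decimated net; at test time the layer is deterministic.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(h, keep_prob=0.5, train=True):
    """'Inverted' dropout on a layer of activations h.

    Training: each unit is kept with probability keep_prob, giving a
    distinct 'decimated' net per sample; kept units are scaled by
    1/keep_prob so the expected activation is unchanged.
    Test: no mask and no scaling -- the deterministic approximation.
    """
    if not train:
        return h
    mask = rng.random(h.shape) < keep_prob
    return h * mask / keep_prob

h = np.ones((4, 8))               # a batch of 4 samples, 8 units each
print(dropout(h, train=True))     # different decimated net per sample
print(dropout(h, train=False))    # deterministic at test time
```

The original formulation instead scales the weights by the keep probability at test time; the inverted variant above rescales during training, which is equivalent in expectation.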

  17. Dropout performance

  18. Neural Network Training: Old & New Tricks
  Old (1980s): Stochastic Gradient Descent, momentum, "weight decay"
  New (last 5–6 years): Dropout, ReLUs, Batch Normalization

  19. 'Neuron': cascade of a linear and a nonlinear function
  Sigmoidal ("logistic") unit: $\sigma(x) = 1 / (1 + e^{-x})$
  Rectified Linear Unit (ReLU): $\max(0, x)$

  20. Reminder: a network in backward mode
  Going from the outputs downward, each sigmoid layer scales the gradient signal from above by a factor < 1 (actually ≤ 0.25, since $\sigma'(x) = \sigma(x)(1 - \sigma(x)) \le 1/4$).

  21. Vanishing gradients problem
  The gradient signal from above is scaled by < 1 (actually ≤ 0.25) at every sigmoid layer. Do this 10 times and the updates in the first layers become minimal ($0.25^{10} \approx 10^{-6}$): the top layer knows what to do, but the lower layers "don't get it". With sigmoidal units, the signal is not getting through!

  22. Vanishing gradients problem: ReLU solves it
  The ReLU scales the gradient signal from above by a factor in {0, 1}: units that were active pass the gradient through unattenuated.
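A numeric sketch of slides 20–22, under assumed random weights and pre-activations: push a gradient back through 10 layers and compare the attenuation contributed by sigmoid derivatives (≤ 0.25) against ReLU derivatives (0 or 1). This is a crude model of backward mode, not the course's code; the weight matrices are scaled to roughly preserve norm so the nonlinearities' effect is isolated.

```python
import numpy as np

rng = np.random.default_rng(0)

def backprop_gain(nonlin_deriv, depth=10, width=100):
    """Norm of a gradient signal after passing backward through `depth`
    layers: each layer multiplies by a weight matrix (scaled to roughly
    preserve norm) and by the local nonlinearity derivative."""
    g = np.ones(width)
    for _ in range(depth):
        x = rng.normal(size=width)                            # stand-in pre-activations
        W = rng.normal(size=(width, width)) / np.sqrt(width)  # ~unit-gain weights
        g = (W.T @ g) * nonlin_deriv(x)
    return np.linalg.norm(g)

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
sigmoid_deriv = lambda x: sigmoid(x) * (1.0 - sigmoid(x))  # always <= 0.25
relu_deriv = lambda x: (x > 0).astype(float)               # either 0 or 1

print("sigmoid, 10 layers:", backprop_gain(sigmoid_deriv))  # vanishingly small
print("ReLU, 10 layers   :", backprop_gain(relu_deriv))     # orders of magnitude larger
```

The remaining mild attenuation in the ReLU case comes from roughly half the units being inactive; in practice, initialization schemes such as He scaling compensate for it.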

  23. Neural Network Training: Old & New Tricks
  Old (1980s): Stochastic Gradient Descent, momentum, "weight decay"
  New (last 5–6 years): Dropout, ReLUs, Batch Normalization

  24. External covariate shift: your input changes
  [Figure: the same input at 10 am, 2 pm, and 7 pm]

  25. "Whitening": set mean = 0, variance = 1
  Robustness to a photometric transformation $I \rightarrow aI + b$:
  • Make each patch have zero mean: $I \leftarrow I - \mathrm{mean}(I)$
  • Then make it have unit variance: $I \leftarrow I / \mathrm{std}(I)$
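As a sketch, the two steps in code: subtracting the mean removes the offset b and dividing by the standard deviation removes the gain a, so whitened patches are invariant to the photometric transformation. The eps guard against division by zero is a conventional addition.

```python
import numpy as np

def whiten(patch, eps=1e-8):
    """Zero-mean, unit-variance normalization of an image patch."""
    patch = patch - patch.mean()         # mean = 0: removes the offset b
    return patch / (patch.std() + eps)   # variance = 1: removes the gain a

rng = np.random.default_rng(0)
p = rng.random((10, 10))
q = 3.0 * p + 7.0                        # photometric transform a*I + b
print(np.allclose(whiten(p), whiten(q)))  # True: the transform is undone
```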

  26. Internal covariate shift
  Neural network activations during training: a moving target.

  27. Batch Normalization
  Whiten as you go: normalize each activation over the current minibatch, $\hat{x} = (x - \mu_B) / \sqrt{\sigma_B^2 + \epsilon}$, then rescale and shift with learned parameters, $y = \gamma \hat{x} + \beta$.
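A sketch of the training-time forward pass using the standard formulas from Ioffe & Szegedy; γ and β would be learned, and are fixed to 1 and 0 here for the demo.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Whiten-as-you-go: normalize each feature over the current
    minibatch, then rescale and shift with learned gamma and beta."""
    mu = x.mean(axis=0)                      # per-feature minibatch mean
    var = x.var(axis=0)                      # per-feature minibatch variance
    x_hat = (x - mu) / np.sqrt(var + eps)    # zero mean, unit variance
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = 5.0 * rng.normal(size=(64, 10)) + 3.0    # shifted, scaled activations
y = batch_norm(x, np.ones(10), np.zeros(10))
print(y.mean(axis=0).round(6))               # ~0 per feature
print(y.std(axis=0).round(3))                # ~1 per feature
```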

  28. Batch Normalization: used in all current systems

  29. Convolutional Neural Networks

  30. Fully-connected layer
  Example: a 200×200 image with 40K hidden units needs ~2B parameters (200 · 200 · 40,000 = 1.6 × 10⁹)!
  Spatial correlation is local, so this is a waste of resources; we do not have enough training samples anyway.

  31. Locally-connected layer
  Example: a 200×200 image, 40K hidden units, filter size 10×10: 4M parameters (40,000 · 10 · 10 = 4 × 10⁶).
  Note: this parameterization is good when the input image is registered (e.g., face recognition).
  (Both counts are checked in the sketch below.)
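The counts on slides 30–31 can be verified with one-line arithmetic (biases ignored); the last line previews the parameter sharing that slide 33 introduces.

```python
# Slide 30 -- fully connected: every one of the 40K hidden units is
# wired to all 200*200 = 40K inputs.
inputs, hidden = 200 * 200, 40_000
print("fully connected  :", inputs * hidden)   # 1,600,000,000 (~2B)

# Slide 31 -- locally connected: each hidden unit sees only a 10x10
# window, but every unit still owns its own filter.
print("locally connected:", hidden * 10 * 10)  # 4,000,000 (4M)

# Slide 33 (next) -- convolutional: one 10x10 kernel shared by all
# locations.
print("convolutional    :", 10 * 10)           # 100
```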

  33. Convolutional layer
  Share the same parameters across different locations (assuming the input is stationary): convolutions with learned kernels.
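A minimal single-channel sketch of the weight sharing: one learned kernel is applied at every spatial location. Stride, padding, multiple channels, and biases are omitted for clarity; real layers add all of these.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D correlation: the same kernel weights are applied at
    every spatial location -- this is the parameter sharing."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
image = rng.random((200, 200))
kernel = rng.normal(size=(10, 10))   # the only 100 learned parameters
print(conv2d(image, kernel).shape)   # (191, 191) feature map
```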

  34.–40. Convolutional layer (figure-only build slides)
