

  1. Practical Methodology Lecture slides for Chapter 11 of Deep Learning www.deeplearningbook.org Ian Goodfellow 2016-09-26

  2. What drives success in ML? Arcane knowledge of dozens of obscure algorithms? Mountains of data? Knowing how to apply 3-4 standard techniques? [Figure: small network with visible units v1-v3 and two hidden layers h(1), h(2)] (Goodfellow 2016)

  3. Example: Street View Address Number Transcription (Goodfellow et al., 2014) (Goodfellow 2016)

  4. Three Step Process • Use your needs to define metric-based goals • Build an end-to-end system • Data-driven refinement (Goodfellow 2016)

  5. Identify Needs • High accuracy or low accuracy? • Surgery robot: high accuracy • Celebrity look-alike app: low accuracy (Goodfellow 2016)

  6. Choose Metrics • Accuracy? (% of examples correct) • Coverage? (% of examples processed) • Precision? (% of detections that are right) • Recall? (% of objects detected) • Amount of error? (For regression problems) (Goodfellow 2016)
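
A minimal sketch, in Python, of how the metrics above can be computed for a binary detector that is allowed to abstain; the arrays here are hypothetical illustrations, not data from the lecture.

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 1])               # ground-truth labels
y_pred = np.array([1, 0, 0, 1, 1, 1])               # model predictions
processed = np.array([1, 1, 1, 1, 0, 1], dtype=bool)  # False = model abstained

accuracy = (y_pred[processed] == y_true[processed]).mean()  # % of examples correct
coverage = processed.mean()                                 # % of examples processed
tp = ((y_pred == 1) & (y_true == 1) & processed).sum()
precision = tp / ((y_pred == 1) & processed).sum()  # % of detections that are right
recall = tp / (y_true == 1).sum()                   # % of objects detected
```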

  7. End-to-end System • Get up and running ASAP • Build the simplest viable system first • What baseline to start with, though? • Copy the state of the art from a related publication (Goodfellow 2016)

  8. Deep or Not? • Lots of noise, little structure -> not deep • Little noise, complex structure -> deep • Good shallow baselines: • Use what you know • Logistic regression, SVM, and boosted trees are all good (see the sketch below) (Goodfellow 2016)
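
A hedged sketch of the shallow baselines named above, using scikit-learn on a synthetic dataset; the dataset and the default settings are illustrative assumptions, not part of the slides.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Toy stand-in for a real dataset.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

for model in (LogisticRegression(max_iter=1000),  # logistic regression
              SVC(),                              # support vector machine
              GradientBoostingClassifier()):      # boosted trees
    score = cross_val_score(model, X, y, cv=5).mean()
    print(type(model).__name__, round(score, 3))
```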

  9. Choosing Architecture Family • No structure -> fully connected • Spatial structure -> convolutional • Sequential structure -> recurrent (Goodfellow 2016)

  10. Fully Connected Baseline • 2-3 hidden layer feed-forward neural network • AKA “multilayer perceptron” • Rectified linear units • Batch normalization • Adam • Maybe dropout [Figure: feed-forward network with weight matrices V and W] (Goodfellow 2016)
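
One possible realization of this baseline in PyTorch; the layer sizes (784 inputs, 256 hidden units, 10 classes) and the dropout rate are illustrative assumptions.

```python
import torch
import torch.nn as nn

# 2-hidden-layer MLP: ReLU units, batch normalization, optional dropout.
model = nn.Sequential(
    nn.Linear(784, 256), nn.BatchNorm1d(256), nn.ReLU(), nn.Dropout(0.5),
    nn.Linear(256, 256), nn.BatchNorm1d(256), nn.ReLU(), nn.Dropout(0.5),
    nn.Linear(256, 10),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # Adam optimizer
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(32, 784)              # dummy batch of inputs
y = torch.randint(0, 10, (32,))       # dummy labels
loss = loss_fn(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```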

  11. Convolutional Network Baseline • Download a pretrained network • Or copy-paste an architecture from a related task • Or start from a standard convolutional baseline: • Deep residual network • Batch normalization • Adam (Goodfellow 2016)
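
A sketch of the pretrained-network option using torchvision's residual networks; the 10-class head and learning rate are assumptions, and the weights argument varies by torchvision version (older releases use pretrained=True instead).

```python
import torch
import torchvision.models as models

# Download a pretrained deep residual network (includes batch norm).
net = models.resnet18(weights="IMAGENET1K_V1")  # older API: pretrained=True

# Swap the classifier head for a hypothetical 10-class target task.
net.fc = torch.nn.Linear(net.fc.in_features, 10)

optimizer = torch.optim.Adam(net.parameters(), lr=1e-4)  # fine-tune with Adam
```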

  12. Recurrent Network Baseline • LSTM • SGD • Gradient clipping • High forget gate bias [Figure: LSTM cell with input, input gate, forget gate controlling the state self-loop, and output gate] (Goodfellow 2016)
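
A sketch of this recurrent baseline in PyTorch; the sizes, learning rate, and placeholder loss are illustrative. The forget-gate-bias trick relies on PyTorch ordering each LSTM bias vector as [input, forget, cell, output] gates.

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=32, hidden_size=128, batch_first=True)

# High forget gate bias: set the forget-gate slice of each bias to 1.
for name, param in lstm.named_parameters():
    if "bias" in name:
        n = param.size(0) // 4          # each gate's bias has length hidden_size
        param.data[n:2 * n].fill_(1.0)  # [n:2n] is the forget gate slice

optimizer = torch.optim.SGD(lstm.parameters(), lr=0.1)  # plain SGD
x = torch.randn(8, 20, 32)             # dummy batch: (batch, time, features)
out, _ = lstm(x)
loss = out.pow(2).mean()               # placeholder loss for illustration
loss.backward()
torch.nn.utils.clip_grad_norm_(lstm.parameters(), max_norm=1.0)  # clip gradients
optimizer.step()
```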

  13. Data-driven Adaptation • Choose what to do based on data • Don’t believe hype • Measure train and test error • “Overfitting” versus “underfitting” (Goodfellow 2016)
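
A toy sketch of this measurement-driven loop; the thresholds are arbitrary assumptions, and diagnose is a hypothetical helper, not a library function.

```python
# Compare train and test error and let the gap, not hype, pick the next step.
def diagnose(train_err, test_err, target_err=0.05):
    if train_err > target_err:
        return "underfitting: tune optimizer, enlarge model, check data/bugs"
    if test_err - train_err > target_err:
        return "overfitting: regularize, augment, or collect more data"
    return "close enough: refine the metric or the goal"

print(diagnose(train_err=0.20, test_err=0.22))  # high train error
print(diagnose(train_err=0.01, test_err=0.15))  # large train-test gap
```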

  14. High Train Error • Inspect data for defects • Inspect software for bugs • Don’t roll your own unless you know what you’re doing • Tune learning rate (and other optimization settings) • Make model bigger (Goodfellow 2016)

  15. Checking Data for Defects • Can a human process it? [Figure: hard-to-read Street View image, transcribed as “26624”] (Goodfellow 2016)

  16. Effect of Depth [Figure: test accuracy (%) versus number of hidden layers (3 to 11); accuracy rises from about 92% to about 96.5% with increasing depth] (Goodfellow 2016)

  17. High Test Error • Add dataset augmentation • Add dropout • Collect more data (Goodfellow 2016)
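
A sketch of simple image augmentation with standard torchvision transforms; the particular transforms and parameters are illustrative choices.

```python
import torchvision.transforms as T

# Randomized transforms so each epoch sees slightly different examples.
augment = T.Compose([
    T.RandomHorizontalFlip(),                        # random mirroring
    T.RandomCrop(28, padding=4),                     # random shifts via padded crops
    T.ColorJitter(brightness=0.2, contrast=0.2),     # mild photometric noise
    T.ToTensor(),
])
# Pass `augment` as the `transform` of the training Dataset only;
# the test set should stay unaugmented.
```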

  18. Increasing Training Set Size [Figure: error (MSE) and optimal capacity (polynomial degree) versus number of training examples; test error for a fixed quadratic model levels off above the Bayes error, test error at optimal capacity approaches the Bayes error, and the optimal capacity grows with training set size] (Goodfellow 2016)

  19. Tuning the Learning Rate [Figure 11.1: training error versus learning rate on a logarithmic scale (10^-2 to 10^0); error is lowest at an intermediate rate and rises sharply when the rate is too large] (Goodfellow 2016)
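
A sketch of the kind of logarithmic learning-rate sweep the figure suggests; train_for_a_few_epochs is a hypothetical stand-in for a short training run and returns a fake error here so the snippet runs on its own.

```python
import numpy as np

def train_for_a_few_epochs(lr):
    # Stand-in for a real short training run; fake U-shaped error curve.
    return (np.log10(lr) + 1.0) ** 2

learning_rates = np.logspace(-4, 0, num=9)  # 1e-4 ... 1e0, log-spaced
errors = [train_for_a_few_epochs(lr) for lr in learning_rates]
best = learning_rates[int(np.argmin(errors))]
print("best learning rate:", best)
```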

  20. Reasoning about Hyperparameters (Table 11.1)
      Hyperparameter: Number of hidden units
      Increases capacity when: increased
      Reason: Increasing the number of hidden units increases the representational capacity of the model.
      Caveats: Increasing the number of hidden units increases both the time and memory cost of essentially every operation on the model.
      (Goodfellow 2016)

  21. Hyperparameter Search • Grid search versus random search [Figure 11.2] (Goodfellow 2016)
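
A sketch contrasting with grid search: random search samples each hyperparameter independently, on a log scale where appropriate; evaluate is a hypothetical stand-in for training plus validation.

```python
import numpy as np

rng = np.random.default_rng(0)

def evaluate(lr, n_hidden):
    # Stand-in for training + validation; fake loss for illustration.
    return abs(np.log10(lr) + 3) + abs(n_hidden - 300) / 300

trials = []
for _ in range(20):
    lr = 10 ** rng.uniform(-5, -1)        # log-uniform learning rate
    n_hidden = int(rng.uniform(50, 500))  # uniform hidden-unit count
    trials.append((evaluate(lr, n_hidden), lr, n_hidden))

print("best trial (loss, lr, n_hidden):", min(trials))
```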
