
Practical Methodology for Deploying Machine Learning Ian Goodfellow - PowerPoint PPT Presentation



  1. Practical Methodology for Deploying Machine Learning Ian Goodfellow (An homage to “Advice for Applying Machine Learning” by Andrew Ng)

  2. What drives success in ML?
  • Arcane knowledge of dozens of obscure algorithms?
  • Mountains of data?
  • Knowing how to apply 3-4 standard techniques?
  [Figure: small neural network diagram with visible units v1-v3 and two hidden layers h(1), h(2)]

  3. Street View Transcription (Goodfellow et al., 2014)

  4. 3-Step Process
  • Use application needs to define metric-based goals
  • Build an end-to-end system
  • Data-driven refinement

  5. Identify needs
  • High accuracy or low accuracy?
  • Surgery robot: high accuracy
  • Celebrity look-alike app: low accuracy

  6. Choose Metrics (computation sketched below)
  • Accuracy? (% of examples correct)
  • Coverage? (% of examples processed)
  • Precision? (% of detections that are right)
  • Recall? (% of objects detected)
  • Amount of error? (For regression problems)
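These metrics are easy to compute directly. A minimal sketch (my illustration, not code from the talk) for a binary detector that may abstain, with `None` marking an unprocessed example:

```python
# Minimal metric sketch, assuming binary predictions where None = abstained.
def compute_metrics(predictions, labels):
    processed = [(p, y) for p, y in zip(predictions, labels) if p is not None]
    coverage = len(processed) / max(len(labels), 1)      # % of examples processed
    accuracy = sum(p == y for p, y in processed) / max(len(processed), 1)
    true_pos = sum(p == 1 and y == 1 for p, y in processed)
    detections = sum(p == 1 for p, _ in processed)
    positives = sum(y == 1 for _, y in processed)
    precision = true_pos / max(detections, 1)            # % of detections that are right
    recall = true_pos / max(positives, 1)                # % of objects detected
    return {"accuracy": accuracy, "coverage": coverage,
            "precision": precision, "recall": recall}
```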

  7. End-to-end system
  • Get up and running ASAP
  • Build the simplest viable system first
  • What baseline to start with though?
  • Copy the state-of-the-art from a related publication

  8. Deep or not?
  • Lots of noise, little structure -> not deep
  • Little noise, complex structure -> deep
  • Good shallow baseline (sketched below):
  • Use what you know
  • Logistic regression, SVM, and boosted trees are all good
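To show how little code a shallow baseline takes, here is a logistic regression baseline in scikit-learn (an illustrative sketch of mine; the dataset is a stand-in and the slides do not prescribe any particular library):

```python
# Hypothetical shallow baseline: logistic regression on tabular features.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)   # stand-in for your data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = make_pipeline(StandardScaler(), LogisticRegression()).fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```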

  9. What kind of deep?
  • No structure -> fully connected
  • Spatial structure -> convolutional
  • Sequential structure -> recurrent

  10. Fully connected baseline (sketch below)
  • 2-3 hidden layer feedforward network
  • AKA “multilayer perceptron”
  • Rectified linear units
  • Dropout
  • SGD + momentum
  [Figure: two-layer network diagram with weight matrices V and W]
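A minimal sketch of this baseline in PyTorch (my construction, not code from the talk; the MNIST-like sizes and hyperparameters are placeholder choices):

```python
import torch
import torch.nn as nn

# Hypothetical 2-hidden-layer MLP baseline: ReLU units, dropout, SGD + momentum.
model = nn.Sequential(
    nn.Linear(784, 512), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(512, 512), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(512, 10),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

def train_step(x, y):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```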

  11. Convolutional baseline
  • Inception
  • Batch normalization
  • Fallback option (sketched below):
  • Rectified linear convolutional net
  • Dropout
  • SGD + momentum
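The fallback option might look like this in PyTorch (an illustrative sketch assuming 32x32 RGB inputs and 10 classes; none of the sizes are prescribed by the slides):

```python
import torch.nn as nn

# Hypothetical fallback baseline: ReLU conv net with batch norm and dropout.
model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
    nn.MaxPool2d(2),                                   # 32x32 -> 16x16
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
    nn.MaxPool2d(2),                                   # 16x16 -> 8x8
    nn.Flatten(),
    nn.Dropout(p=0.5),
    nn.Linear(64 * 8 * 8, 10),
)
# Train with SGD + momentum exactly as in the fully connected sketch above.
```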

  12. Recurrent baseline (sketch below)
  • LSTM
  • SGD
  • Gradient clipping
  • High forget gate bias
  [Figure: LSTM cell diagram - input, input gate, forget gate on the state self-loop, output gate, output]
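Those choices sketched in PyTorch (sizes are illustrative; "high forget gate bias" is implemented here by filling the forget-gate slice of each LSTM bias vector with a positive value):

```python
import torch
import torch.nn as nn

# Hypothetical recurrent baseline: single-layer LSTM sequence classifier.
lstm = nn.LSTM(input_size=64, hidden_size=256, batch_first=True)
head = nn.Linear(256, 10)

# High forget gate bias: PyTorch orders gate biases [input, forget, cell, output],
# so the forget gate occupies the second quarter of each bias vector.
h = lstm.hidden_size
for name, param in lstm.named_parameters():
    if name.startswith("bias"):
        param.data[h:2 * h].fill_(1.0)

params = list(lstm.parameters()) + list(head.parameters())
optimizer = torch.optim.SGD(params, lr=0.1)
loss_fn = nn.CrossEntropyLoss()

def train_step(x, y):                      # x: (batch, time, 64), y: (batch,)
    optimizer.zero_grad()
    out, _ = lstm(x)
    loss = loss_fn(head(out[:, -1]), y)    # classify from the last time step
    loss.backward()
    torch.nn.utils.clip_grad_norm_(params, max_norm=1.0)   # gradient clipping
    optimizer.step()
    return loss.item()
```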

  13. Data-driven adaptation
  • Choose what to do based on data
  • Don’t believe hype
  • Measure train and test error (decision sketch below)
  • “Overfitting” versus “underfitting”
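The decision rule that slides 14 and 17 expand on can be summarized in a few lines (a paraphrase of the methodology, with a placeholder `target_error`):

```python
# Sketch of the refinement loop: compare errors against the goal, pick a remedy.
def next_step(train_error, test_error, target_error):
    if train_error > target_error:
        # Underfitting: see slide 14 (High train error).
        return "inspect data/software, tune optimizer, make model bigger"
    if test_error > target_error:
        # Overfitting: see slide 17 (High test error).
        return "add augmentation, add dropout, collect more data"
    return "goal met"
```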

  14. High train error
  • Inspect data for defects
  • Inspect software for bugs
  • Don’t roll your own unless you know what you’re doing
  • Tune learning rate (and other optimization settings)
  • Make model bigger

  15. Checking data for defects
  • Can a human process it?
  [Figure: Street View image crop of the house number “26624”]

  16. Effect of Depth
  [Figure: test accuracy (%) vs. number of hidden layers (3 to 11); accuracy climbs from roughly 92% to 96.5% with increasing depth]

  17. High test error
  • Add dataset augmentation (sketch below)
  • Add dropout
  • Collect more data
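For image tasks, dataset augmentation can be a couple of torchvision transforms (an illustrative sketch; the specific transforms are my assumption, not from the slides):

```python
from torchvision import transforms

# Hypothetical augmentation pipeline for 32x32 training images.
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),   # random translations
    transforms.RandomHorizontalFlip(),      # mirror images
    transforms.ToTensor(),
])
# Apply only at training time; evaluate on unaugmented test data.
```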

  18. Increasing training set size
  [Figure: error (MSE) and optimal capacity (polynomial degree) plotted against # train examples (10^0 to 10^5); train and test error for a quadratic model and for the optimal-capacity model approach Bayes error as the training set grows, while the optimal capacity itself increases]

  19. Deep Learning textbook
  Yoshua Bengio, Ian Goodfellow, Aaron Courville
  goodfeli.github.io/dlbook
