Deep Learning Hyperparameter Optimization with Competing Objectives


  1. Deep Learning Hyperparameter Optimization with Competing Objectives GTC 2018 - S8136 Scott Clark scott@sigopt.com

  2. OUTLINE 1. Why is Tuning Models Hard? 2. Common Tuning Methods 3. Deep Learning Example 4. Tuning Multiple Metrics 5. Multi-metric Optimization Examples

  3. Deep Learning / AI is extremely powerful. Tuning these systems is extremely non-intuitive.

  4. Photo: Joe Ross

  5. TUNABLE PARAMETERS IN DEEP LEARNING

  6. TUNABLE PARAMETERS IN DEEP LEARNING

  7. TUNABLE PARAMETERS IN DEEP LEARNING

  8. TUNABLE PARAMETERS IN DEEP LEARNING

  9. TUNABLE PARAMETERS IN DEEP LEARNING

  10. Photo: Tammy Strobel

  11. STANDARD METHODS FOR HYPERPARAMETER SEARCH

  12. STANDARD TUNING METHODS Manual search, grid search, and random search over parameter configurations (weights, thresholds, window sizes, transformations); training data feeds the ML / AI model, which is evaluated with cross validation against testing data.
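A minimal sketch of how grid and random search enumerate configurations; the parameter names and ranges here are illustrative, not from the talk:

```python
import itertools
import random

# Illustrative search space for two hyperparameters
grid_space = {
    "learning_rate": [0.001, 0.01, 0.1],
    "dropout": [0.1, 0.3, 0.5],
}

# Grid search: evaluate every combination of the listed values
grid_configs = [
    dict(zip(grid_space, values))
    for values in itertools.product(*grid_space.values())
]

# Random search: sample each hyperparameter independently from its range
random_configs = [
    {
        "learning_rate": 10 ** random.uniform(-3, -1),
        "dropout": random.uniform(0.1, 0.5),
    }
    for _ in range(len(grid_configs))
]

for config in grid_configs + random_configs:
    pass  # train the model with `config` and record the validation metric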

  13. OPTIMIZATION FEEDBACK LOOP New configurations arrive via the REST API, the ML / AI model is trained on training data, an objective metric is measured with cross validation against testing data, and better results feed back into the next round of configurations.
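A sketch of that loop as code. The function names here are hypothetical stand-ins for calls to an optimization service's REST API (such as the SigOpt client) and for the model pipeline; they are not from the talk:

```python
def suggest_configuration():
    """Hypothetical call to the optimizer's REST API for a new configuration."""
    ...

def report_observation(configuration, metric):
    """Hypothetical call reporting the measured objective back to the optimizer."""
    ...

def train_and_validate(configuration):
    """Train the ML / AI model on training data and cross-validate on held-out data."""
    ...

# The loop: new configurations in, objective metric out, better results over time
for _ in range(60):  # the observation budget is illustrative
    configuration = suggest_configuration()
    objective_metric = train_and_validate(configuration)
    report_observation(configuration, objective_metric)
```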

  14. DEEP LEARNING EXAMPLE

  15. SIGOPT + MXNET ● Classify movie reviews using a CNN in MXNet https://aws.amazon.com/blogs/machine-learning/fast-cnn-tuning-with-aws-gpu-instances-and-sigopt/

  16. TEXT CLASSIFICATION PIPELINE Hyperparameter configurations and feature transformations feed training text into the ML / AI model (MXNet); accuracy is measured on testing / validation text, and better results are fed back through the REST API.

  17. STOCHASTIC GRADIENT DESCENT ● Comparison of several RMSProp SGD parametrizations
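For reference, a NumPy sketch of a single RMSProp update; the learning rate, decay, and epsilon are the kinds of knobs whose parametrizations are being compared, and the values below are illustrative:

```python
import numpy as np

def rmsprop_step(w, grad, cache, lr=0.01, decay=0.9, eps=1e-8):
    """One RMSProp update: scale the gradient by a running RMS of past gradients."""
    cache = decay * cache + (1 - decay) * grad ** 2
    w = w - lr * grad / (np.sqrt(cache) + eps)
    return w, cache

# Illustrative usage on a single weight vector
w = np.zeros(4)
cache = np.zeros_like(w)
grad = np.array([0.5, -0.2, 0.1, 0.0])
w, cache = rmsprop_step(w, grad, cache)
```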

  18. ARCHITECTURE PARAMETERS

  19. MULTIPLICATIVE TUNING SPEED UP

  20. SPEED UP #1: CPU -> GPU

  21. SPEED UP #2: RANDOM/GRID -> SIGOPT

  22. CONSISTENTLY BETTER AND FASTER

  23. TUNING MULTIPLE METRICS What if we want to optimize multiple competing metrics? ● Complexity tradeoffs: accuracy vs training time, accuracy vs inference time ● Business metrics: fraud accuracy vs money lost, conversion rate vs LTV, engagement vs profit, profit vs drawdown

  24. PARETO OPTIMAL What does it mean to optimize two metrics simultaneously? Pareto efficiency or Pareto optimality is a state of allocation of resources from which it is impossible to reallocate so as to make any one individual or preference criterion better off without making at least one individual or preference criterion worse off.

  25. PARETO OPTIMAL What does it mean to optimize two metrics simultaneously? The red points lie on the Pareto efficient frontier; they strictly dominate all of the grey points. You can do no better in one metric without sacrificing performance in the other. Point N is Pareto optimal compared to point K.

  26. PARETO EFFICIENT FRONTIER The goal is to have the best set of feasible solutions to select from. After optimization, the expert picks one or more of the red points from the Pareto efficient frontier to study further or put into production.
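A small sketch of how the frontier in these plots can be computed, assuming both metrics are to be maximized; the sample observations are illustrative:

```python
def pareto_frontier(points):
    """Return the points not strictly dominated by any other point.

    `points` is a list of (metric_1, metric_2) pairs, both to be maximized.
    A point is dominated if another point is at least as good in both
    metrics and strictly better in at least one.
    """
    frontier = []
    for p in points:
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and q != p
            for q in points
        )
        if not dominated:
            frontier.append(p)
    return frontier

# Example: (accuracy, negated training time) pairs; the frontier is the "red points"
observations = [(0.90, -120), (0.88, -60), (0.85, -60), (0.92, -300)]
print(pareto_frontier(observations))  # [(0.90, -120), (0.88, -60), (0.92, -300)]
```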

  27. TOY EXAMPLE

  28. MULTI-METRIC OPTIMIZATION

  29. DEEP LEARNING EXAMPLES

  30. MULTI-METRIC OPT IN DEEP LEARNING https://devblogs.nvidia.com/sigopt-deep-learning-hyperparameter-optimization/

  31. DEEP LEARNING TRADEOFFS ● Deep learning pipelines are time-consuming and expensive to run ● Application and deployment conditions may make certain configurations less desirable ● Tuning for both accuracy and complexity metrics, like training or inference time, allows the expert to make the best decision for production

  32. STOCHASTIC GRADIENT DESCENT ● Comparison of several RMSProp SGD parametrizations ● Different configurations converge differently

  33. TEXT CLASSIFICATION PIPELINE Hyperparameter configurations and feature transformations feed training text into the ML / AI model (MXNet); accuracy and training time are measured on testing / validation text, and better results are fed back through the REST API.
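A sketch of reporting both competing metrics from a single run; `train_model` and `evaluate_accuracy` are placeholders for the MXNet pipeline above, not real API calls:

```python
import time

def train_model(config):
    """Placeholder for the MXNet training step in the pipeline above."""
    ...

def evaluate_accuracy(model):
    """Placeholder for measuring accuracy on validation text."""
    ...

def evaluate_configuration(config):
    """Train once and return both competing metrics for this configuration."""
    start = time.time()
    model = train_model(config)
    training_time = time.time() - start
    accuracy = evaluate_accuracy(model)
    return {"accuracy": accuracy, "training_time": training_time}
```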

  34. FINDING THE FRONTIER

  35. SEQUENCE CLASSIFICATION PIPELINE Hyperparameter configurations and feature transformations feed training sequences into the ML / AI model (TensorFlow); accuracy and inference time are measured on testing / validation sequences, and better results are fed back through the REST API.

  36. TEXT CLASSIFICATION PIPELINE

  37. FINDING THE FRONTIER

  38. FINDING THE FRONTIER

  39. LOAN CLASSIFICATION PIPELINE Hyperparameter configurations and feature transformations feed training data into the ML / AI model (LightGBM); AUCPR and average $ lost are measured on testing / validation data, and better results are fed back through the REST API.
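A sketch of evaluating one configuration of this pipeline on both metrics, using LightGBM's scikit-learn interface; the dollar-loss calculation is an illustrative stand-in for the real business metric, and the approval threshold is assumed:

```python
from lightgbm import LGBMClassifier
from sklearn.metrics import average_precision_score  # AUCPR

def evaluate_loan_model(params, X_train, y_train, X_val, y_val, loan_amounts):
    """Train a LightGBM classifier and return AUCPR plus an illustrative avg $ lost.

    y_val and loan_amounts are assumed to be NumPy arrays; y_val is 1 for default.
    """
    model = LGBMClassifier(**params)
    model.fit(X_train, y_train)
    scores = model.predict_proba(X_val)[:, 1]  # predicted default probability

    aucpr = average_precision_score(y_val, scores)

    # Illustrative loss metric: money lost on defaulted loans the model would approve
    approved = scores < 0.5
    lost = loan_amounts[approved & (y_val == 1)].sum()
    avg_lost = lost / max(approved.sum(), 1)

    return {"aucpr": aucpr, "avg_dollars_lost": avg_lost}
```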

  40. GRID SEARCH CAN MISLEAD ● The best grid search point (w.r.t. accuracy) loses >$35 / transaction ● The best grid search point (w.r.t. loss) has 70% accuracy ● Points on the Pareto frontier give the user more information about what is possible and more control over trade-offs

  41. DISTRIBUTED TRAINING / SCHEDULING ● SigOpt serves as a distributed scheduler for training models across workers ● Workers access the SigOpt API for the latest parameters to try for each model ● Enables easy distributed training of non-distributed algorithms across any number of models
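A sketch of the worker loop: each worker independently pulls the latest suggested parameters and reports results back, so many single-machine training jobs can run in parallel without coordinating with each other. `get_suggestion` and `report_result` are hypothetical wrappers around the REST API, not literal client calls:

```python
def get_suggestion(experiment_id):
    """Hypothetical wrapper: ask the optimization service for the next parameters."""
    ...

def report_result(experiment_id, suggestion, metrics):
    """Hypothetical wrapper: report measured metrics back to the scheduler."""
    ...

def train_and_evaluate(params):
    """Placeholder for the single-machine (non-distributed) training job."""
    ...

def worker(experiment_id, budget_per_worker=10):
    """Run this on each machine; workers only talk to the API, never to each other."""
    for _ in range(budget_per_worker):
        suggestion = get_suggestion(experiment_id)
        metrics = train_and_evaluate(suggestion)
        report_result(experiment_id, suggestion, metrics)
```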

  42. TAKEAWAYS One metric may not paint the whole picture - Think about metric trade-offs in your model pipelines - Optimizing for the wrong thing can be very expensive Not all optimization strategies are equal - Pick an optimization strategy that gives the most flexibility - Different tools enable you to tackle new problems

  43. Questions? contact@sigopt.com https://sigopt.com @SigOpt
