Deep Learning Hyperparameter Optimization with Competing Objectives
GTC 2018 - S8136
Scott Clark, scott@sigopt.com
OUTLINE
1. Why is Tuning Models Hard?
2. Common Tuning Methods
3. Deep Learning Example
4. Tuning Multiple Metrics
5. Multi-metric Optimization Examples
Deep Learning / AI is extremely powerful.
Tuning these systems is extremely non-intuitive.
Photo: Joe Ross
TUNABLE PARAMETERS IN DEEP LEARNING
Photo: Tammy Strobel
STANDARD METHODS FOR HYPERPARAMETER SEARCH
STANDARD TUNING METHODS
Manual Search / Grid Search / Random Search
Parameter Configuration:
- Weights
- Thresholds
- Window sizes
- Transformations
Training Data -> ML / AI Model -> Cross Validation / Testing Data -> ?
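For reference, a minimal sketch of what grid search and random search look like in code; the parameter names, ranges, and the toy train_and_score objective below are illustrative placeholders, not the exact setup from this talk.

# Minimal sketch of grid search vs. random search over two hyperparameters.
import itertools
import random

def train_and_score(config):
    # Toy stand-in for "train the model and return validation accuracy";
    # in practice this runs the full training / cross-validation pipeline.
    return 1.0 - abs(config["learning_rate"] - 3e-3) - abs(config["dropout"] - 0.25)

# Grid search: exhaustively evaluate a fixed lattice of values.
grid = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "dropout": [0.1, 0.3, 0.5],
}
grid_configs = [dict(zip(grid, combo)) for combo in itertools.product(*grid.values())]

# Random search: sample the same ranges for a fixed budget of trials.
random_configs = [
    {"learning_rate": 10 ** random.uniform(-4, -2), "dropout": random.uniform(0.1, 0.5)}
    for _ in range(len(grid_configs))
]

best_config = max(grid_configs + random_configs, key=train_and_score)
print(best_config, train_and_score(best_config))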
OPTIMIZATION FEEDBACK LOOP
Training Data -> ML / AI Model -> Cross Validation / Testing Data -> Objective Metric -> REST API -> New Configurations -> ML / AI Model -> Better Results
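A minimal sketch of this feedback loop using SigOpt's Python client (create an experiment, request suggestions, report observations over the REST API). The parameter space, budget, API token, and the evaluate_model helper are placeholders, and exact fields may differ across client versions.

# Sketch of the suggest / observe loop with SigOpt's Python client (v1-style API).
from sigopt import Connection

def evaluate_model(assignments):
    # Placeholder: build the model from `assignments`, train it, and return
    # the cross-validated objective metric (e.g. accuracy).
    ...

conn = Connection(client_token="SIGOPT_API_TOKEN")

experiment = conn.experiments().create(
    name="CNN text classifier",
    parameters=[
        dict(name="log_learning_rate", type="double", bounds=dict(min=-5.0, max=-1.0)),
        dict(name="dropout", type="double", bounds=dict(min=0.1, max=0.6)),
    ],
    observation_budget=60,
)

for _ in range(experiment.observation_budget):
    suggestion = conn.experiments(experiment.id).suggestions().create()
    accuracy = evaluate_model(suggestion.assignments)        # train + validate
    conn.experiments(experiment.id).observations().create(   # report the metric back
        suggestion=suggestion.id,
        value=accuracy,
    )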
DEEP LEARNING EXAMPLE
SIGOPT + MXNET
● Classify movie reviews using a CNN in MXNet
https://aws.amazon.com/blogs/machine-learning/fast-cnn-tuning-with-aws-gpu-instances-and-sigopt/
TEXT CLASSIFICATION PIPELINE
Training Text -> ML / AI Model (MXNet) -> Validation / Testing Text -> Accuracy -> REST API -> Hyperparameter Configurations and Feature Transformations -> Better Results
STOCHASTIC GRADIENT DESCENT
● Comparison of several RMSProp SGD parametrizations
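For context, these are the optimizer knobs typically being varied in such a comparison; the names and ranges below are illustrative placeholders, not the exact search space behind these plots.

# Illustrative RMSProp search space (placeholder ranges, log scale where noted).
rmsprop_space = [
    dict(name="log_learning_rate", type="double", bounds=dict(min=-6.0, max=-1.0)),
    dict(name="decay_rate",        type="double", bounds=dict(min=0.5,  max=0.999)),
    dict(name="log_epsilon",       type="double", bounds=dict(min=-10.0, max=-4.0)),
]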
ARCHITECTURE PARAMETERS
MULTIPLICATIVE TUNING SPEED UP
SPEED UP #1: CPU -> GPU
SPEED UP #2: RANDOM/GRID -> SIGOPT
CONSISTENTLY BETTER AND FASTER
TUNING MULTIPLE METRICS
What if we want to optimize multiple competing metrics?
● Complexity Tradeoffs
  ○ Accuracy vs Training Time
  ○ Accuracy vs Inference Time
● Business Metrics
  ○ Fraud Accuracy vs Money Lost
  ○ Conversion Rate vs LTV
  ○ Engagement vs Profit
  ○ Profit vs Drawdown
PARETO OPTIMAL
What does it mean to optimize two metrics simultaneously?
Pareto efficiency or Pareto optimality is a state of allocation of resources from which it is impossible to reallocate so as to make any one individual or preference criterion better off without making at least one individual or preference criterion worse off.
PARETO OPTIMAL
What does it mean to optimize two metrics simultaneously?
The red points are on the Pareto Efficient Frontier; they strictly dominate all of the grey points. You can do no better in one metric without sacrificing performance in the other. Point N is Pareto Optimal compared to Point K.
PARETO EFFICIENT FRONTIER
The goal is to have the best set of feasible solutions to select from. After optimization, the expert picks one or more of the red points from the Pareto Efficient Frontier to further study or put into production.
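A minimal sketch of picking out the Pareto Efficient Frontier from a set of observed results, assuming both metrics are oriented so that larger is better (e.g. accuracy and negated training time); the example values are made up.

def pareto_frontier(points):
    # A point is on the frontier if no other point is at least as good in
    # both metrics (and not identical to it), i.e. nothing dominates it.
    frontier = []
    for candidate in points:
        dominated = any(
            other != candidate
            and other[0] >= candidate[0]
            and other[1] >= candidate[1]
            for other in points
        )
        if not dominated:
            frontier.append(candidate)
    return frontier

# (accuracy, negative training time) pairs from tuned configurations.
observations = [(0.81, -120), (0.85, -300), (0.85, -200), (0.90, -900), (0.78, -90)]
print(pareto_frontier(observations))  # the "red points"; the rest are dominated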
TOY EXAMPLE
MULTI-METRIC OPTIMIZATION
DEEP LEARNING EXAMPLES
MULTI-METRIC OPT IN DEEP LEARNING
https://devblogs.nvidia.com/sigopt-deep-learning-hyperparameter-optimization/
DEEP LEARNING TRADEOFFS
● Deep Learning pipelines are time consuming and expensive to run
● Application and deployment conditions may make certain configurations less desirable
● Tuning for both accuracy and complexity metrics like training or inference time allows the expert to make the best decision for production
STOCHASTIC GRADIENT DESCENT
● Comparison of several RMSProp SGD parametrizations
● Different configurations converge differently
TEXT CLASSIFICATION PIPELINE
Training Text -> ML / AI Model (MXNet) -> Validation / Testing Text -> Accuracy + Training Time -> REST API -> Hyperparameter Configurations and Feature Transformations -> Better Results
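A sketch of how such a two-metric experiment could be declared with SigOpt's multimetric support, maximizing accuracy together with negated training time so the optimizer maps out the trade-off. Names, ranges, budget, and the train_and_time helper are placeholders, and field details may vary by client version.

# Sketch of a two-metric (accuracy vs. training time) SigOpt experiment.
from sigopt import Connection

def train_and_time(assignments):
    # Placeholder: train with these parameters, return (accuracy, seconds).
    ...

conn = Connection(client_token="SIGOPT_API_TOKEN")

experiment = conn.experiments().create(
    name="Text classifier: accuracy vs training time",
    parameters=[
        dict(name="log_learning_rate", type="double", bounds=dict(min=-5.0, max=-1.0)),
        dict(name="num_filters", type="int", bounds=dict(min=50, max=250)),
    ],
    metrics=[
        dict(name="accuracy"),
        dict(name="neg_training_time"),
    ],
    observation_budget=120,
)

suggestion = conn.experiments(experiment.id).suggestions().create()
accuracy, seconds = train_and_time(suggestion.assignments)
conn.experiments(experiment.id).observations().create(
    suggestion=suggestion.id,
    values=[
        dict(name="accuracy", value=accuracy),
        dict(name="neg_training_time", value=-seconds),  # negate so both are maximized
    ],
)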
FINDING THE FRONTIER
SEQUENCE CLASSIFICATION PIPELINE
Training Sequences -> ML / AI Model (TensorFlow) -> Validation / Testing Sequences -> Accuracy + Inference Time -> REST API -> Hyperparameter Configurations and Feature Transformations -> Better Results
TEXT CLASSIFICATION PIPELINE
FINDING THE FRONTIER
FINDING THE FRONTIER
LOAN CLASSIFICATION PIPELINE
Training Data -> ML / AI Model (LightGBM) -> Validation / Testing Data -> AUCPR + Avg $ Lost -> REST API -> Hyperparameter Configurations and Feature Transformations -> Better Results
GRID SEARCH CAN MISLEAD
● Best grid search point (wrt accuracy) loses >$35 / transaction
● Best grid search point (wrt loss) has 70% accuracy
● Points on the Pareto Frontier give the user more information about what is possible and more control of trade-offs
DISTRIBUTED TRAINING/SCHEDULING
● SigOpt serves as a distributed scheduler for training models across workers
● Workers access the SigOpt API for the latest parameters to try for each model
● Enables easy distributed training of non-distributed algorithms across any number of models
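A sketch of one worker in this scheme: each worker independently asks the API for a suggestion, trains one model, and reports the result, so scaling out is just running more copies of this loop. The API token, experiment id, and the train_and_evaluate helper are placeholders.

# Sketch of a single tuning worker; run many of these in parallel.
from sigopt import Connection

def train_and_evaluate(assignments):
    # Placeholder: train a (non-distributed) model with these parameters
    # and return its objective metric.
    ...

def worker_loop(experiment_id, trials_per_worker=10):
    conn = Connection(client_token="SIGOPT_API_TOKEN")
    for _ in range(trials_per_worker):
        suggestion = conn.experiments(experiment_id).suggestions().create()
        metric = train_and_evaluate(suggestion.assignments)
        conn.experiments(experiment_id).observations().create(
            suggestion=suggestion.id,
            value=metric,
        )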
TAKEAWAYS
One metric may not paint the whole picture
- Think about metric trade-offs in your model pipelines
- Optimizing for the wrong thing can be very expensive
Not all optimization strategies are equal
- Pick an optimization strategy that gives the most flexibility
- Different tools enable you to tackle new problems
Questions? contact@sigopt.com https://sigopt.com @SigOpt