Fast and Easy Hyper-Parameter Grid Search for Deep Learning
GTC 2016
Mark Whitney, Rescale
Overview
• Hyper-parameter optimization intro
• Intro to training on Rescale
• Random sampling demo
• Advanced optimization workflows
Image Classification
[Diagram: labeled training images ("CAT") are fed as input to a GPU-accelerated cluster, which trains a network (input → conv → conv → pool → fully conn → softmax) defined with a neural network library, producing the trained network.]
Model definition:
    model.add(Convolution2D(128, 3, 3))
    model.add(Dropout(0.4))
    ...
NN Hyper-Parameter Optimization
[Diagram: several candidate network architectures, each a different stack of conv, pool, fully connected, and softmax layers.]
Which one is best?
Hyper-Parameter Examples
• Learning rates
• Convolution kernel size
• Convolution kernel filters
• Pooling sizes
• Dropout fraction
• Number of convolutional and dense layers
• Training epochs
• Image preprocessing parameters
• Thorough list in [Bengio 2012]
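As a concrete illustration (a minimal sketch, not from the talk), a search space over parameters like these can be written as a mapping from each hyper-parameter to its candidate values; the names mirror the template placeholders used later in this deck, and the specific values are assumptions:

    # Hypothetical search space; names mirror the template placeholders shown
    # later in the deck, values are illustrative only.
    param_space = {
        'learning_rate':      [0.001, 0.01, 0.1],
        'conv_kernel_size1':  [3, 5, 7],
        'conv_filter_count1': [32, 64, 128],
        'pool_size':          [2, 3],
        'dropout':            [0.25, 0.4, 0.5],
    }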
NN Hyper-Parameter Optimization
[Diagram: the same set of candidate architectures, fanned out across GPU-accelerated clusters.]
● Large set of candidate architectures
● Search the space with many GPUs and find the most accurate model
GPU and HPC on Rescale
• Founded by aerospace engineers for cloud simulation
• On-demand hardware
  – GPU (K40s, K80s soon)
  – InfiniBand
  – Integrated with 30 datacenters globally
• Optimized software
  – Automotive
  – Aerospace
  – Life science
  – Machine learning
• 120 packages available
Basic Model Training
[Diagram: the model (input → conv → conv → pool → fully conn → softmax) is trained from Rescale staging storage via an optional preprocessing cluster and a GPU training cluster.]
    model:add(nn.SpatialConvolution(128, 3, 3))
    model:add(nn.ReLU(true))
    model:add(nn.Dropout(0.4))
    ...
● Upload dataset to cloud staging storage
● Optionally start a cluster to preprocess data, then transfer it back to staging
● Start GPU cluster, train model using definition and dataset
● On completion of training, retrieve model
Parallel Hyper-Parameter Search
[Diagram: the same set of candidate architectures, to be trained in parallel.]
Parallel Hyper-Parameter Search
[Diagram: model def template + parameter ranges → search algorithm (grid, Monte Carlo, black-box optimization) → model defs with parameters injected → parallelized training on preprocessed data → training results → best model and accuracy.]
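To make the two simpler search strategies in the diagram concrete, here is a minimal sketch (not from the talk) of grid and Monte Carlo sampling over a search space like the param_space dict shown earlier:

    import itertools
    import random

    # Grid search: enumerate the full Cartesian product of all candidate values.
    def grid_samples(space):
        keys = sorted(space)
        for values in itertools.product(*(space[k] for k in keys)):
            yield dict(zip(keys, values))

    # Monte Carlo search: draw each parameter independently at random.
    def random_samples(space, n):
        for _ in range(n):
            yield {k: random.choice(v) for k, v in space.items()}

Grid search covers the space exhaustively but its cost grows multiplicatively with each parameter; random sampling lets you fix the budget (n runs) regardless of how many parameters you search over.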
Monte Carlo/Grid Search: Templated Model Definition
    model.add(Convolution2D(${conv_filter_count1}, ${conv_kernel_size1}, ${conv_kernel_size1},
                            input_shape=(1, img_rows, img_cols)))
    model.add(Activation('relu'))
    model.add(Convolution2D(${conv_filter_count2}, ${conv_kernel_size2}, ${conv_kernel_size2}))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(${pool_size}, ${pool_size})))
    model.add(Dropout(${dropout}))
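The ${...} placeholders happen to match Python's string.Template syntax, so one plausible injection step looks like the sketch below. This is an assumption for illustration: the talk does not show Rescale's actual injection mechanism, and the file name is hypothetical.

    from string import Template

    # Substitute one sampled parameter set into the templated model definition.
    def render_model(template_text, params):
        return Template(template_text).substitute(params)

    with open('model_template.py') as f:   # hypothetical file name
        script = render_model(f.read(), {
            'conv_filter_count1': 128, 'conv_kernel_size1': 3,
            'conv_filter_count2': 64,  'conv_kernel_size2': 3,
            'pool_size': 2, 'dropout': 0.4,
        })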
Demo: Monte Carlo Keras MNIST Training
[Diagram: the templated model definition (model.add(Convolution2D(${conv_filter_count1}, ${conv_kernel_size1}, ${conv_kernel_size1}, ...)) feeds a template-and-sampling engine that dispatches training jobs to GPU nodes.]
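Under the assumptions above, the demo's outer loop can be sketched as: draw random parameter sets, render the template, and hand each resulting script to a GPU node. This reuses the hypothetical param_space, random_samples, and render_model from the earlier sketches; the file name is again illustrative.

    with open('mnist_template.py') as f:   # hypothetical file name
        template_text = f.read()

    # Render one training script per random parameter draw.
    scripts = []
    for params in random_samples(param_space, n=20):
        scripts.append(render_model(template_text, params))
        # each rendered script would then be dispatched to a GPU node,
        # e.g. via rescale.submit(...) as shown on the SDK slide below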
Parameter search on Rescale
User provides...
• Templated model
• Model training and evaluation
• Parameter ranges/choices
• Training dataset
Rescale does...
• Sample and inject parameters
• Provision GPU training nodes
• Configure training libraries
• Load balance for training
• Summarize results
• Transfer tools for big datasets
Custom Optimizations
[Diagram: a templated/parameterized model (model.add(Convolution2D(${conv_filter_count1}, ${conv_kernel_size1}, ${conv_kernel_size1}, ...)) and black-box optimization packages (SMAC, Spearmint, SciPy.optimize) plug into the Optimization SDK, which drives the Optimization Workflow Engine and, in turn, GPU clusters.]
Using Optimization SDK
    import optimization_sdk as rescale
    from scipy.optimize import minimize

    def update_model_template(X):
        ...

    def objective(X):
        script = update_model_template(X)  # inject parameter values into template
        run = rescale.submit(training_cmd,
                             input_files=[script],
                             output_files=[output_file],
                             var_values=X)  # submit training cmd to run
        run.wait()  # wait for training to complete
        with open(output_file) as f:
            validation_error, test_error = extract_results(f)
        run.report({'valerr': validation_error, 'testerr': test_error})
        return validation_error  # scalar for the optimizer to minimize

    minimize(objective, method='Nelder-Mead', ...)  # optimizer calls objective
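For completeness, a hedged sketch of the elided SciPy call: minimize needs an initial point, and X must encode the tuned parameters in some fixed order. The encoding and values below are assumptions, not from the talk.

    # Hypothetical completion: assume X = [learning_rate, dropout].
    x0 = [0.01, 0.4]  # illustrative initial guess
    result = minimize(objective, x0, method='Nelder-Mead',
                      options={'maxiter': 50})
    print(result.x, result.fun)  # best parameters found and their validation error

Because the optimizer only ever sees a scalar returned by objective, any black-box package (SMAC, Spearmint, SciPy) can drive the same training loop.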
Example: Torch7 CIFAR10
● SMAC optimizer [Frank Hutter, Holger Hoos, and Kevin Leyton-Brown]
● Network-in-Network model [Min Lin, Qiang Chen, Shuicheng Yan]
● Implementation: https://github.com/szagoruyko/cifar.torch
  ○ NIN + BatchNormalization + Dropout
Candidate Parameter Variations
• Learning (6 params)
  – Learning rate
  – Decays
  – Momentum
  – Batch size
• Regularization (6 params)
  – Dropouts
  – Batch normalization
  – Pool sizes
• Structural (3 params)
  – # of NiN blocks
  – # of mlpconv layers per block
  – # of conv filters per layer
[Diagram: a NiN block stacks mlpconv layers (convolutional filters with inner conv layers) followed by pooling + dropout; the network repeats several such blocks.]