Fast and Easy Hyper-Parameter Grid Search for Deep Learning
GTC 2016
Mark Whitney, Rescale
Overview
• Hyper-parameter optimization intro
• Intro to training on Rescale
• Random sampling demo
• Advanced optimization workflows
Image Classification
[Diagram: labeled training images ("CAT") are fed as input to a GPU-accelerated cluster, which trains a network (input → conv → conv → pool → fully conn → softmax) defined with a neural network library, producing the trained network.]
Model definition:
    model.add(Convolution2D(128, 3, 3))
    model.add(Dropout(0.4))
    ...
NN Hyper-Parameter Optimization
[Diagram: several candidate network architectures, each a different stack of conv, pool, fully connected, and softmax layers.]
Which one is best?
Hyper-Parameter Examples
• Learning rates
• Convolution kernel size
• Convolution kernel filters
• Pooling sizes
• Dropout fraction
• Number of convolutional and dense layers
• Training epochs
• Image preprocessing parameters
• Thorough list in [Bengio 2012]
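As a concrete illustration (a minimal sketch, not from the talk), a search space over parameters like these can be written as a mapping from each hyper-parameter to its candidate values; the names mirror the template placeholders used later in this deck, and the specific values are assumptions:

    # Hypothetical search space; names mirror the template placeholders shown
    # later in the deck, values are illustrative only.
    param_space = {
        'learning_rate':      [0.001, 0.01, 0.1],
        'conv_kernel_size1':  [3, 5, 7],
        'conv_filter_count1': [32, 64, 128],
        'pool_size':          [2, 3],
        'dropout':            [0.25, 0.4, 0.5],
    }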
NN Hyper-Parameter Optimization
[Diagram: the same set of candidate architectures, fanned out across GPU-accelerated clusters.]
● Large set of candidate architectures
● Search the space with many GPUs and find the most accurate model
GPU and HPC on Rescale
• Founded by aerospace engineers for cloud simulation
• On-demand hardware
  – GPU (K40s, K80s soon)
  – InfiniBand
  – Integrated with 30 datacenters globally
• Optimized software
  – Automotive
  – Aerospace
  – Life science
  – Machine learning
• 120 packages available
Basic Model Training
[Diagram: the model (input → conv → conv → pool → fully conn → softmax) is trained from Rescale staging storage via an optional preprocessing cluster and a GPU training cluster.]
    model:add(nn.SpatialConvolution(128, 3, 3))
    model:add(nn.ReLU(true))
    model:add(nn.Dropout(0.4))
    ...
● Upload dataset to cloud staging storage
● Optionally start a cluster to preprocess data, then transfer it back to staging
● Start GPU cluster, train model using definition and dataset
● On completion of training, retrieve model
Parallel Hyper-Parameter Search
[Diagram: the same set of candidate architectures, to be trained in parallel.]
Parallel Hyper-Parameter Search
[Diagram: model def template + parameter ranges → search algorithm (grid, Monte Carlo, black-box optimization) → model defs with parameters injected → parallelized training on preprocessed data → training results → best model and accuracy.]
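To make the two simpler search strategies in the diagram concrete, here is a minimal sketch (not from the talk) of grid and Monte Carlo sampling over a search space like the param_space dict shown earlier:

    import itertools
    import random

    # Grid search: enumerate the full Cartesian product of all candidate values.
    def grid_samples(space):
        keys = sorted(space)
        for values in itertools.product(*(space[k] for k in keys)):
            yield dict(zip(keys, values))

    # Monte Carlo search: draw each parameter independently at random.
    def random_samples(space, n):
        for _ in range(n):
            yield {k: random.choice(v) for k, v in space.items()}

Grid search covers the space exhaustively but its cost grows multiplicatively with each parameter; random sampling lets you fix the budget (n runs) regardless of how many parameters you search over.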
Monte Carlo/Grid Search: Templated Model Definition
    model.add(Convolution2D(${conv_filter_count1}, ${conv_kernel_size1}, ${conv_kernel_size1},
                            input_shape=(1, img_rows, img_cols)))
    model.add(Activation('relu'))
    model.add(Convolution2D(${conv_filter_count2}, ${conv_kernel_size2}, ${conv_kernel_size2}))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(${pool_size}, ${pool_size})))
    model.add(Dropout(${dropout}))
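The ${...} placeholders happen to match Python's string.Template syntax, so one plausible injection step looks like the sketch below. This is an assumption for illustration: the talk does not show Rescale's actual injection mechanism, and the file name is hypothetical.

    from string import Template

    # Substitute one sampled parameter set into the templated model definition.
    def render_model(template_text, params):
        return Template(template_text).substitute(params)

    with open('model_template.py') as f:   # hypothetical file name
        script = render_model(f.read(), {
            'conv_filter_count1': 128, 'conv_kernel_size1': 3,
            'conv_filter_count2': 64,  'conv_kernel_size2': 3,
            'pool_size': 2, 'dropout': 0.4,
        })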
Demo: Monte Carlo Keras MNIST Training
[Diagram: the templated model definition (model.add(Convolution2D(${conv_filter_count1}, ${conv_kernel_size1}, ${conv_kernel_size1}, ...)) feeds a template-and-sampling engine that dispatches training jobs to GPU nodes.]
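Under the assumptions above, the demo's outer loop can be sketched as: draw random parameter sets, render the template, and hand each resulting script to a GPU node. This reuses the hypothetical param_space, random_samples, and render_model from the earlier sketches; the file name is again illustrative.

    with open('mnist_template.py') as f:   # hypothetical file name
        template_text = f.read()

    # Render one training script per random parameter draw.
    scripts = []
    for params in random_samples(param_space, n=20):
        scripts.append(render_model(template_text, params))
        # each rendered script would then be dispatched to a GPU node,
        # e.g. via rescale.submit(...) as shown on the SDK slide below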
Parameter search on Rescale
User provides...
• Templated model
• Model training and evaluation
• Parameter ranges/choices
• Training dataset
Rescale does...
• Sample and inject parameters
• Provision GPU training nodes
• Configure training libraries
• Load balance for training
• Summarize results
• Transfer tools for big datasets
Custom Optimizations
[Diagram: a templated/parameterized model (model.add(Convolution2D(${conv_filter_count1}, ${conv_kernel_size1}, ${conv_kernel_size1}, ...)) and black-box optimization packages (SMAC, Spearmint, SciPy.optimize) plug into the Optimization SDK, which drives the Optimization Workflow Engine and, in turn, GPU clusters.]
Using Optimization SDK
    import optimization_sdk as rescale
    from scipy.optimize import minimize

    def update_model_template(X):
        ...

    def objective(X):
        script = update_model_template(X)  # inject parameter values into template
        run = rescale.submit(training_cmd,
                             input_files=[script],
                             output_files=[output_file],
                             var_values=X)  # submit training cmd to run
        run.wait()  # wait for training to complete
        with open(output_file) as f:
            validation_error, test_error = extract_results(f)
        run.report({'valerr': validation_error, 'testerr': test_error})
        return validation_error  # scalar for the optimizer to minimize

    minimize(objective, method='Nelder-Mead', ...)  # optimizer calls objective
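For completeness, a hedged sketch of the elided SciPy call: minimize needs an initial point, and X must encode the tuned parameters in some fixed order. The encoding and values below are assumptions, not from the talk.

    # Hypothetical completion: assume X = [learning_rate, dropout].
    x0 = [0.01, 0.4]  # illustrative initial guess
    result = minimize(objective, x0, method='Nelder-Mead',
                      options={'maxiter': 50})
    print(result.x, result.fun)  # best parameters found and their validation error

Because the optimizer only ever sees a scalar returned by objective, any black-box package (SMAC, Spearmint, SciPy) can drive the same training loop.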
Example: Torch7 CIFAR10
● SMAC optimizer [Frank Hutter, Holger Hoos, and Kevin Leyton-Brown]
● Network-in-Network model [Min Lin, Qiang Chen, Shuicheng Yan]
● Implementation: https://github.com/szagoruyko/cifar.torch
  ○ NIN + BatchNormalization + Dropout
Candidate Parameter Variations
• Learning (6 params)
  – Learning rate
  – Decays
  – Momentum
  – Batch size
• Regularization (6 params)
  – Dropouts
  – Batch normalization
  – Pool sizes
• Structural (3 params)
  – # of NiN blocks
  – # of mlpconv layers per block
  – # of conv filters per layer
[Diagram: a NiN block stacks mlpconv layers (convolutional filters with inner conv layers) followed by pooling + dropout; the network repeats several such blocks.]