Fast and Easy Hyper-Parameter Grid Search for Deep Learning

  1. Fast and Easy Hyper-Parameter Grid Search for Deep Learning | GTC 2016 | Mark Whitney, Rescale

  2. Overview • Hyper-parameter optimization intro • Intro to training on Rescale • Random sampling demo • Advanced optimization workflows

  3. Image Classification: labeled training images are used to train a model on a GPU-accelerated cluster, producing a trained network. [Diagram: a neural network library builds the model (input → conv → conv → pool → fully conn → softmax) from a model definition, e.g. model.add(Convolution2D(128, 3, 3)); model.add(Dropout(0.4)); ...; the trained network labels an example image CAT]

  5. NN Hyper-Parameter Optimization [Diagram: several candidate network architectures, each a different stacking of input, conv, pool, fully conn, and softmax layers]

  6. NN Hyper-Parameter Optimization [Diagram: the same candidate architectures] Which one is best?

  7. Hyper-Parameter Examples • Learning rates • Convolution kernel sizes • Number of convolution filters • Pooling sizes • Dropout fraction • Number of convolutional and dense layers • Training epochs • Image preprocessing parameters • Thorough list in [Bengio 2012]
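Even a few of these choices multiply quickly. As a minimal sketch (parameter names and ranges are illustrative, not from the talk), enumerating a full grid over four of the hyper-parameters above already yields dozens of candidate models:

    import itertools

    # Illustrative ranges for four of the hyper-parameters listed above.
    grid = {
        'learning_rate': [0.1, 0.01, 0.001],
        'conv_kernel_size': [3, 5, 7],
        'pool_size': [2, 3],
        'dropout': [0.25, 0.4, 0.5],
    }

    # Full grid search trains one model per combination.
    combos = [dict(zip(grid, values))
              for values in itertools.product(*grid.values())]
    print(len(combos))  # 3 * 3 * 2 * 3 = 54 candidate models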

  8. NN Hyper-Parameter Optimization [Diagram: the candidate architectures mapped onto GPU-accelerated clusters] ● Large set of candidate architectures ● Search the space with many GPUs and find the most accurate model

  9. GPU and HPC on Rescale • Founded by aerospace engineers for cloud simulation • On-demand hardware – GPU (K40s, K80s soon) – InfiniBand – Integrated with 30 datacenters globally • Optimized software – Automotive – Aerospace – Life science – Machine learning • 120 packages available

  10. Basic Model Training [Diagram: network to train (input → conv → conv → pool → fully conn → softmax)]

  11. Basic Model Training [Diagram: Rescale staging storage] ● Upload dataset to cloud staging storage

  12. Basic Model Training [Diagram: preprocessing cluster and Rescale staging storage] ● Upload dataset to cloud staging storage ● Optionally start cluster to preprocess data, transfer data back to staging

  13. Basic Model Training [Diagram: preprocessing cluster, Rescale staging storage, and training cluster]

       model:add(nn.SpatialConvolution(128, 3, 3))
       model:add(nn.ReLU(true))
       model:add(nn.Dropout(0.4))
       ...

  ● Upload dataset to cloud staging storage ● Optionally start cluster to preprocess data, transfer data back to staging ● Start GPU cluster, train model using definition and dataset

  14. Basic Model Training [Diagram: full workflow: staging storage, preprocessing cluster, and training cluster running the Torch model definition from slide 13; the trained network (input → conv → conv → pool → fully conn → softmax) is returned] ● Upload dataset to cloud staging storage ● Optionally start cluster to preprocess data, transfer data back to staging ● Start GPU cluster, train model using definition and dataset ● On completion of training, retrieve model (see the sketch below)
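
As a minimal local sketch of the train-then-retrieve step (not Rescale's actual job machinery; the dataset here is random stand-in data, and modern Keras spells the deck's Convolution2D as Conv2D):

    import numpy as np
    from tensorflow.keras import layers, models

    X = np.random.rand(256, 28, 28, 1).astype('float32')  # stand-in dataset
    y = np.random.randint(0, 10, size=256)                 # stand-in labels

    # Small network in the shape sketched on the slide.
    model = models.Sequential([
        layers.Conv2D(128, 3, activation='relu', input_shape=(28, 28, 1)),
        layers.MaxPooling2D(2),
        layers.Dropout(0.4),
        layers.Flatten(),
        layers.Dense(10, activation='softmax'),
    ])
    model.compile(optimizer='sgd', loss='sparse_categorical_crossentropy')
    model.fit(X, y, epochs=1, batch_size=32)

    # The saved model file is the artifact retrieved when the run completes.
    model.save('trained_model.h5')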

  15. Parallel Hyper-Parameter Search [Diagram: several candidate network architectures, as on slide 5]

  16. Parallel Hyper-Parameter Search [Workflow: model definition template + parameter ranges → search algorithm (grid, Monte Carlo, black-box optimization) → model definitions with concrete parameters → parallelized training on preprocessed data → training results → best model and accuracy]

  17. Monte Carlo/Grid Search: Templated Model Definition

       model.add(Convolution2D(${conv_filter_count1}, ${conv_kernel_size1}, ${conv_kernel_size1},
                               input_shape=(1, img_rows, img_cols)))
       model.add(Activation('relu'))
       model.add(Convolution2D(${conv_filter_count2}, ${conv_kernel_size2}, ${conv_kernel_size2}))
       model.add(Activation('relu'))
       model.add(MaxPooling2D(pool_size=(${pool_size}, ${pool_size})))
       model.add(Dropout(${dropout}))
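
The ${...} placeholders match Python's built-in string.Template syntax, so parameter injection needs no special tooling. A minimal sketch (the sampled values are illustrative):

    from string import Template

    template = Template(
        "model.add(Convolution2D(${conv_filter_count1}, "
        "${conv_kernel_size1}, ${conv_kernel_size1}))\n"
        "model.add(MaxPooling2D(pool_size=(${pool_size}, ${pool_size})))\n"
        "model.add(Dropout(${dropout}))\n"
    )

    # One sampled parameter set; substitute() raises if a placeholder is missing.
    params = {'conv_filter_count1': 128, 'conv_kernel_size1': 3,
              'pool_size': 2, 'dropout': 0.4}
    print(template.substitute(params))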

  18. Demo: Monte Carlo Keras MNIST training [Diagram: templated model definition (model.add(Convolution2D(${conv_filter_count1}, ${conv_kernel_size1}, ${conv_kernel_size1}, ...) → template and sampling engine → GPU nodes]
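
A random-sampling engine for such a template can be only a few lines. A minimal sketch (ranges and draw count are illustrative, not taken from the demo); each sampled dict is what gets substituted into the template above:

    import random

    # Illustrative ranges for the template's parameters.
    ranges = {
        'conv_filter_count1': [32, 64, 128, 256],
        'conv_kernel_size1': [3, 5, 7],
        'pool_size': [2, 3],
        'dropout': (0.2, 0.6),   # continuous range
    }

    def sample():
        # Draw each parameter independently: choice for discrete lists,
        # uniform for continuous (low, high) ranges.
        return {k: random.choice(v) if isinstance(v, list) else random.uniform(*v)
                for k, v in ranges.items()}

    # Generate N parameter sets; each becomes one training job on the GPU nodes.
    samples = [sample() for _ in range(20)]
    print(samples[0])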

  19. Parameter search on Rescale. User provides: • Templated model • Model training and evaluation • Parameter ranges/choices • Training dataset

  20. Parameter search on Rescale. User provides: • Templated model • Model training and evaluation • Parameter ranges/choices • Training dataset. Rescale does: • Sample and inject parameters • Provision GPU training nodes • Configure training libraries • Load balance for training • Summarize results • Transfer tools for big datasets

  21. Custom Optimizations [Diagram: a templated/parameterized model (model.add(Convolution2D(${conv_filter_count1}, ${conv_kernel_size1}, ${conv_kernel_size1}, ...) and black-box optimization packages (SMAC, Spearmint, SciPy.optimize) feed an Optimization SDK, which drives the Optimization Workflow Engine on GPU clusters]

  22. Using Optimization SDK

       import optimization_sdk as rescale
       from scipy.optimize import minimize

       def update_model_template(X):
           ...

       def objective(X):
           script = update_model_template(X)   # inject parameter values into template
           run = rescale.submit(training_cmd,
                                input_files=[script],
                                output_files=[output_file],
                                var_values=X)  # submit training cmd to run
           run.wait()                          # wait for training to complete
           with open(output_file) as f:
               validation_error, test_error = extract_results(f)
           run.report({'valerr': validation_error, 'testerr': test_error})
           return validation_error             # minimize validation error

       minimize(objective, x0, method='Nelder-Mead', ...)  # optimizer calls objective
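The SDK calls above are specific to Rescale and will not run locally. As a self-contained stand-in showing the same pattern (a hypothetical black-box objective: the validation error of a ridge regression on synthetic data, tuned by Nelder-Mead):

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(200, 20))
    w_true = rng.normal(size=20)
    y_train = X_train @ w_true + rng.normal(scale=0.5, size=200)
    X_val = rng.normal(size=(100, 20))
    y_val = X_val @ w_true + rng.normal(scale=0.5, size=100)

    def objective(theta):
        lam = 10.0 ** theta[0]                       # hyper-parameter: ridge strength
        d = X_train.shape[1]
        w = np.linalg.solve(X_train.T @ X_train + lam * np.eye(d),
                            X_train.T @ y_train)     # "training" step
        return np.mean((X_val @ w - y_val) ** 2)     # validation error to minimize

    res = minimize(objective, x0=[0.0], method='Nelder-Mead')
    print('best log10(lambda):', res.x[0], 'val error:', res.fun)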

  27. Example: Torch7 CIFAR10 ● SMAC optimizer: [Frank Hutter, Holger Hoos, and Kevin Leyton-Brown] ● Network-in-Network model: [Min Lin, Qiang Chen, Shuicheng Yan] ● Implementation: [https://github.com/szagoruyko/cifar.torch] ○ NIN + BatchNormalization + Dropout

  28. Candidate Parameter Variations • Learning (6 params) – Learning rate – Decays – Momentum – Batch size

  29. Candidate Parameter Variations • Learning (6 params) – Learning rate – Decays – Momentum – Batch size • Regularization (6 params) – Dropouts – Batch normalization – Pool sizes

  30. Candidate Parameter Variations • Learning (6 params) – Learning rate – Decays – Momentum – Batch size • Regularization (6 params) – Dropouts – Batch normalization – Pool sizes • Structural (3 params) – # of NiN blocks – # of mlpconv layers per block – # of conv filters per layer [Diagram: NiN blocks, each a stack of mlpconv layers (conv plus inner mlpconv layers) followed by pooling + dropout; convolutional filters annotated]
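
Collecting the three groups gives a 15-dimensional search space. A sketch of how it might be written down (individual parameter names are hypothetical; the talk only gives the group counts):

    # Hypothetical spelling of the 15-parameter CIFAR10 search space above.
    search_space = {
        # Learning (6 params)
        'learning_rate':     (1e-4, 1e-1),
        'lr_decay':          (0.9, 1.0),
        'weight_decay':      (0.0, 1e-3),
        'momentum':          (0.5, 0.99),
        'batch_size':        [32, 64, 128, 256],
        'epochs_per_decay':  [10, 20, 40],
        # Regularization (6 params)
        'dropout_block1':    (0.0, 0.6),
        'dropout_block2':    (0.0, 0.6),
        'dropout_block3':    (0.0, 0.6),
        'batch_norm':        [True, False],
        'pool_size1':        [2, 3],
        'pool_size2':        [2, 3],
        # Structural (3 params)
        'nin_blocks':        [2, 3, 4],
        'mlpconv_per_block': [1, 2, 3],
        'conv_filters':      [96, 128, 192],
    }
    assert len(search_space) == 15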
