Asynchronous Hyperparameter Tuning and Ablation Studies with Apache Spark
Sina Sheikholeslami (@cutlash, sinash@kth.se)
Distributed Computing Group, KTH Royal Institute of Technology
CASTOR Software Days 2019, October 16 2019
The Machine Learning System
[Diagram labels: Problem Definition, Data Preparation, Model Selection, Model Training, Evaluate, Repeat if needed; Machine Learning System: Dataset, Model, Optimizer]
Artificial Neural Networks
[Diagram labels: Input Layer, Hidden Layer, Output Layer]
How We Study the Brain
• Early 19th century: ablative brain surgeries by Jean Pierre Flourens (1794-1867)
Ablation for Machine Learning?
[Diagram labels: Problem Definition, Data Preparation, Model Selection, Model Training, Evaluate, Repeat if needed; Machine Learning System: Dataset, Model, Optimizer; example dataset columns: floors, area, rooms, price]
Talk of the Town
“Too frequently, authors propose many tweaks absent proper ablation studies … Sometimes just one of the changes is actually responsible for the improved results … this practice misleads readers to believe that all of the proposed changes are necessary.” (Lipton & Steinhardt, “Troubling Trends in Machine Learning Scholarship”)
Example: Layer Ablation (1/6): The Base Model, Accuracy: 78%
Example: Layer Ablation (2/6): Accuracy: 73%
Example: Layer Ablation (3/6): The Base Model
Example: Layer Ablation (4/6): Accuracy: 67%
Example: Layer Ablation (5/6): The Base Model
Example: Layer Ablation (6/6): Accuracy: 63%
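The accuracies on the layer ablation slides can be compared against the base model with a few lines of Python. This is only a sketch: the slides do not say which layer each trial removed, so the trial names below are hypothetical labels, and each ablated model is assumed to be compared independently against the 78% base model.

```python
# Accuracies (in percent) quoted on the layer-ablation example slides.
# The trial names are hypothetical; the slides do not name the removed layers.
base_accuracy = 78
ablated_accuracy = {
    "trial_1": 73,
    "trial_2": 67,
    "trial_3": 63,
}


def accuracy_drops(base, ablated):
    """Return the accuracy drop in percentage points for each ablated model."""
    return {name: base - acc for name, acc in ablated.items()}


drops = accuracy_drops(base_accuracy, ablated_accuracy)

# List the trials from the largest drop to the smallest: a larger drop
# suggests the removed component contributed more to the base model.
for name, drop in sorted(drops.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: -{drop} percentage points")
```

Ranking the trials by drop is one simple way to read an ablation study; it says nothing about interactions between components.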
Ablation Study
[Diagram: loop between Machine Learning System and Ablation, with arrows labeled Evaluate and New Dataset / Model Configuration]
Hyperparameter Tuning
[Diagram: loop between Machine Learning System and Hyperparameter Tuner, with arrows labeled Evaluate and New Hyperparameter Values]
System Experimentation (Search)
[Diagram: loop between Machine Learning System and Global Experiment Controller, with arrows labeled Evaluate and New Trial]
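The loop on this slide (a controller proposes a new trial, the system evaluates it, and the result goes back to the controller) can be sketched in plain Python. Everything named here is an assumption made for illustration: `RandomSearchController`, its `next_trial`/`report` methods, and the toy `evaluate` function are hypothetical and are not Maggy's API. Random search is used only because it is the simplest possible controller.

```python
import random


class RandomSearchController:
    """Hypothetical experiment controller: proposes trials, records results."""

    def __init__(self, search_space, num_trials, seed=0):
        self.search_space = search_space
        self.num_trials = num_trials
        self.rng = random.Random(seed)
        self.results = []

    def next_trial(self):
        """Return a new trial (dict of hyperparameters), or None when done."""
        if len(self.results) >= self.num_trials:
            return None
        return {name: self.rng.choice(values)
                for name, values in self.search_space.items()}

    def report(self, trial, metric):
        """Record the metric that the evaluation produced for a trial."""
        self.results.append((trial, metric))

    def best(self):
        """Return the (trial, metric) pair with the highest metric."""
        return max(self.results, key=lambda result: result[1])


def evaluate(trial):
    """Stand-in for training and evaluating a model; not a real workload."""
    return 1.0 - abs(trial["learning_rate"] - 0.01) - 0.001 * trial["layers"]


controller = RandomSearchController(
    {"learning_rate": [0.1, 0.01, 0.001], "layers": [1, 2, 3]},
    num_trials=5,
)

# The experiment loop: new trial -> evaluate -> report back to the controller.
while True:
    trial = controller.next_trial()
    if trial is None:
        break
    controller.report(trial, evaluate(trial))

best_trial, best_metric = controller.best()
```

An ablation study fits the same loop if `next_trial` returns a dataset or model configuration instead of hyperparameter values.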
Better Parallel
• Ability to train better models, faster
• Ability to modify and inspect, easier
(“Parallel Training” by Maxim Melnikov)
Parallelization in Practice
[Diagram labels: Machine Learning, Deep Learning, Parallel Processing]
(TensorFlow, the TensorFlow logo and any related marks are trademarks of Google Inc.)
Hopsworks: Open-source Platform for Data-intensive AI
What is Hopsworks? https://tinyurl.com/y4ze79d4
ML/DL in Hopsworks
[Diagram labels: Data Pipelines (Ingest & Prep); Feature Store; Machine Learning Experiments (Data Parallel Training, Hyperparameter Tuning, Ablation Studies); Model Serving]
Bottleneck, due to:
• iterative nature
• human interaction
Spark and Bulk Synchronous Parallel Model
[Diagram: three stages of N tasks each (Task 11 … Task 1N, Task 21 … Task 2N, Task 31 … Task 3N), each stage ending at a Barrier; labels: Metrics 1, Metrics 2, Metrics 3, HDFS, Driver]
Example: Synchronous Hyperparameter Search
[Diagram: the same three stages of N tasks with a Barrier after each stage, and each stage marked Wasted Compute; labels: Metrics 1, Metrics 2, Metrics 3, HDFS, Driver]
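A small calculation illustrates the wasted compute on this slide: when every stage ends at a barrier, each executor that finishes early sits idle until the slowest task in its stage is done. The sketch below assumes one trial per task and uses made-up trial durations; `synchronous_waste` is a hypothetical helper, not part of Spark or Maggy.

```python
def synchronous_waste(stages):
    """Compute wall-clock time and idle executor time under stage barriers.

    stages: list of lists of trial durations, one inner list per stage.
    Each stage lasts as long as its slowest trial; every faster executor
    idles for the difference.
    """
    total_time = 0
    wasted = 0
    for durations in stages:
        stage_time = max(durations)
        total_time += stage_time
        wasted += sum(stage_time - d for d in durations)
    return total_time, wasted


# Hypothetical trial durations (minutes): 3 stages on 4 executors.
stages = [
    [10, 4, 7, 3],
    [6, 6, 2, 9],
    [5, 8, 8, 1],
]

total_time, wasted = synchronous_waste(stages)
print(f"wall-clock time: {total_time} min, idle executor time: {wasted} min")
```

With these invented numbers the executors idle for longer in total (39 minutes) than the experiment runs (27 minutes), and trials that stop early only widen that gap.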
Critical Requirements
• Parallel execution of trials
• Support for early stopping of trials
• Support for global control of the experiment
• Resilience to stragglers
• Simple, “Unified” User & Developer API
Maggy: An Open-source Framework for Asynchronous Computation on top of Apache Spark
Key Idea: Long Running Tasks
[Diagram: a single stage of N tasks (Task 11 … Task 1N) with one Barrier; Metrics and New Trial flow between the tasks and the Driver]
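The idea of long-running tasks can be sketched with threads standing in for Spark executors: each worker keeps pulling the next trial as soon as it finishes the previous one, so no worker waits at a barrier between trials. This is a simplified illustration, not Maggy's implementation. The trial list is fixed up front here, whereas the slide shows the driver producing new trials in response to metrics, and `run_async` and `train_fn` are hypothetical names.

```python
import queue
import threading


def run_async(trials, train_fn, num_workers):
    """Run trials on long-running workers that pull work until none is left."""
    pending = queue.Queue()
    for trial in trials:
        pending.put(trial)

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                trial = pending.get_nowait()
            except queue.Empty:
                return  # no trials left: the long-running task ends
            metric = train_fn(trial)
            with lock:  # report the metric back (the "driver" side)
                results[trial] = metric

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # the only synchronization point is the end
    return results


def train_fn(units):
    """Stand-in for a training run; returns a made-up metric."""
    return units / 128


# Hypothetical trials: the number of hidden units to try.
results = run_async([8, 16, 32, 64, 128], train_fn, num_workers=2)
```

Which worker runs which trial depends on scheduling, but every trial runs exactly once and a slow trial delays only the worker that picked it up.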
Maggy Core Architecture
Back to Ablation
LOCO: Leave One Component Out
• A simple, “natural” ablation policy: an implementation of an ablator
• Currently supports Feature Ablation + Layer Ablation
Feature Ablation
• Uses the Feature Store to access the dataset metadata
• Generates Python callables that, once called, return modified datasets
• Removes one feature at a time
[Diagram: an example dataset with columns floors, area, rooms, price, and variants with a feature column removed]
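The bullets above describe callables that return modified datasets with one feature removed. A minimal sketch of that pattern follows, using an in-memory dict of columns instead of the Feature Store. The function `make_feature_ablators` and the data values are hypothetical; only the column names come from the slide.

```python
def make_feature_ablators(dataset, label):
    """Return {feature_name: callable}.

    Each callable, once called, returns a copy of `dataset` with that one
    feature removed. The label column is never ablated.
    """
    def make(feature):
        def ablated_dataset():
            return {name: list(column)
                    for name, column in dataset.items()
                    if name != feature}
        return ablated_dataset

    return {feature: make(feature) for feature in dataset if feature != label}


# Toy housing data with the column names from the slide; values are invented.
housing = {
    "floors": [1, 2],
    "area": [50, 120],
    "rooms": [2, 5],
    "price": [100, 300],
}

ablators = make_feature_ablators(housing, label="price")

# Nothing is copied until a callable is invoked, e.g. inside a trial.
without_area = ablators["area"]()
print(sorted(without_area))
```

Returning callables rather than datasets defers the work until a trial actually runs, so the modified dataset is built where it is needed.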
Layer Ablation
• Uses a base model function
• Generates Python callables that, once called, return modified models
• Uses the model configuration to find and remove layer(s)
• Removes one layer at a time (or one layer group at a time)
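The same callable pattern applies to layers. The sketch below represents a model configuration as a plain list of named layers so that it runs without a deep learning framework. `base_model_config`, `make_layer_ablators`, and the layer names are hypothetical and stand in for a real base model function and its configuration.

```python
import copy


def base_model_config():
    """Hypothetical base model: an ordered list of named layers."""
    return [
        {"name": "dense_1", "units": 64},
        {"name": "dense_2", "units": 32},
        {"name": "dense_3", "units": 16},
        {"name": "output", "units": 1},
    ]


def make_layer_ablators(base_fn, layer_groups):
    """Return one callable per layer group.

    Each callable, once called, builds the base configuration and returns
    it without the layers named in its group. A group with a single name
    ablates one layer; a larger group ablates several layers together.
    """
    def make(group):
        def ablated_model():
            return [copy.deepcopy(layer)
                    for layer in base_fn()
                    if layer["name"] not in group]
        return ablated_model

    return {tuple(sorted(group)): make(frozenset(group))
            for group in layer_groups}


layer_ablators = make_layer_ablators(
    base_model_config,
    [{"dense_2"}, {"dense_1", "dense_3"}],
)

single = layer_ablators[("dense_2",)]()
grouped = layer_ablators[("dense_1", "dense_3")]()
print([layer["name"] for layer in single])
```

With a real framework, removing a layer may also require reconnecting its neighbours or checking shapes, which this configuration-only sketch does not model.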
Ablation User & Developer API (Example Notebook Available!)
User API: Initialize the Study and Add Features
User API: Define Base Model
User API: Setup Model Ablation
User API: Wrap the Training Function
User API: Lagom!
Developer API: Policy Implementation (1/2)
Developer API: Policy Implementation (2/2)
Hyperparameter Tuning: User API
Hyperparameter Tuning: Developer API
Maggy is Open-source
• Code Repository: https://github.com/logicalclocks/maggy
• API Documentation: https://maggy.readthedocs.io/en/latest/
Next Steps
• More Ablators
• More Tuners
• Support for More Frameworks
Thank you!
Thanks to the entire Logical Clocks Team, especially:
• Moritz Meister @morimeister
• Jim Dowling @jim_dowling
• Robin Andersson @robzor92
• Kim Hammar @KimHammar1
• Alex Ormenisan @alex_ormenisan
@logicalclocks @hopsworks
GitHub (Example Notebook Available!): https://github.com/hopshadoop/maggy
https://maggy.readthedocs.io/en/latest/
https://logicalclocks.com/whitepapers/
@cutlash sinash@kth.se