Maggy - Open-Source Asynchronous Distributed Hyperparameter Optimization Based on Apache Spark
Moritz Meister, Logical Clocks (moritz@logicalclocks.com, @morimeister)
FOSDEM’20, 02.02.2020
The Bitter Lesson (of AI)*
“Methods that scale with computation are the future of AI.”**
“The two (general purpose) methods that seem to scale … are search and learning.”*
Rich Sutton (Father of Reinforcement Learning)
* http://www.incompleteideas.net/IncIdeas/BitterLesson.html
** https://www.youtube.com/watch?v=EeMCEQa85tw
The Answer: Spark!
Spark scales with available compute!
Distribution and Deep Learning
[Diagram: how distribution reduces generalization error: larger training datasets (better regularization), better optimization algorithms, and better models through hyperparameter optimization and design.]
Inner and Outer Loop of Deep Learning
[Diagram: the outer loop (SEARCH) picks hyperparameters, architecture, etc. and feeds them as trials to the inner loop (LEARNING), which trains on the training data across workers 1..N, synchronizing gradients ∆1..∆N and reporting a metric back to the search. http://tiny.cc/51yjdz]
In Reality This Means Rewriting Training Code
[Diagram: an end-to-end ML pipeline: Data Ingest & Prep, Explore/Design, Data Pipelines, Feature Store, Machine Learning Experiments (Model Training, Parallel Hyperparameter Optimization, Ablation Studies), Model Serving.]
Towards Distribution Transparency (Inner Loop)
The distribution-oblivious training function (pseudo-code):
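The pseudo-code on the slide is not preserved in this transcript. As a stand-in, here is a minimal sketch of what a distribution-oblivious training function can look like, written in plain Keras (dataset, layer sizes, and epoch count are illustrative): it accepts hyperparameters, trains, and returns a single metric, with no Spark- or cluster-specific code anywhere.

    # A training function that only knows about hyperparameters and a metric.
    # It contains no distribution logic, so the same code can run on a laptop
    # or be shipped to many workers by a framework such as Maggy.
    import tensorflow as tf

    def train_fn(kernel, pool, dropout):
        (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
        x_train = x_train[..., None] / 255.0
        x_test = x_test[..., None] / 255.0

        model = tf.keras.Sequential([
            tf.keras.layers.Conv2D(32, kernel, activation="relu", input_shape=(28, 28, 1)),
            tf.keras.layers.MaxPooling2D(pool),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dropout(dropout),
            tf.keras.layers.Dense(10, activation="softmax"),
        ])
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        model.fit(x_train, y_train, epochs=1, batch_size=128, verbose=0)

        # The only contract with the outer loop: return the metric to optimize.
        _, accuracy = model.evaluate(x_test, y_test, verbose=0)
        return accuracy

Because the function knows nothing about where it runs, a framework can ship it unchanged to one worker or to hundreds.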
Towards Distribution Transparency
[Diagram: the manual tuning cycle: Set Hyperparameters, Train Model, Evaluate Performance, repeat.]
• Trial and error is slow
• The iterative approach is greedy
• Search spaces are usually large
• Hyperparameters are sensitive and interact with each other
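For contrast with what Maggy automates, the manual cycle above boils down to a loop like the following sketch (it reuses the hypothetical train_fn from the previous sketch; the trial count and value ranges are made up). Every trial blocks the next one, which is exactly why this approach is slow and why hand-driven, greedy iteration covers large search spaces poorly.

    import random

    # Naive sequential hyperparameter search: set -> train -> evaluate, one at a time.
    best_acc, best_params = 0.0, None
    for _ in range(20):                      # 20 trials, strictly one after another
        params = {
            "kernel": random.choice([3, 5]),
            "pool": random.choice([2, 4]),
            "dropout": random.uniform(0.0, 0.5),
        }
        acc = train_fn(**params)             # each call occupies the machine fully
        if acc > best_acc:
            best_acc, best_params = acc, params
    print(best_acc, best_params)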
Sequential Black Box Optimization (Outer Loop)
[Diagram: meta-level learning and optimization: configurations are drawn from the search space, evaluated by the black box (learning), and the returned metric drives the next suggestion.]
Sequential Search (Outer Loop)
[Diagram: a global controller sends one configuration at a time to the black box (learning) and receives its metric.]
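The picture implies a narrow contract between controller and black box: suggest a configuration, observe a metric. A hypothetical suggest/observe interface, again reusing train_fn from above, makes that contract explicit:

    import random

    class RandomSearchController:
        """Hypothetical global controller: suggests a configuration, observes its metric."""

        def __init__(self, searchspace):
            self.searchspace = searchspace        # {name: list of candidate values}
            self.history = []                     # [(config, metric), ...]

        def suggest(self):
            # The black box is never inspected; we only sample the search space.
            return {k: random.choice(v) for k, v in self.searchspace.items()}

        def observe(self, config, metric):
            self.history.append((config, metric))

    # Sequential outer loop: one trial at a time through the black box.
    controller = RandomSearchController({"kernel": [3, 5],
                                         "pool": [2, 4],
                                         "dropout": [0.1, 0.3]})
    for _ in range(10):
        config = controller.suggest()
        metric = train_fn(**config)               # the black box
        controller.observe(config, metric)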
Parallel Search (Outer Loop)
[Diagram: a global controller draws from a search space or ablation study, keeps a queue of trials, and dispatches them to parallel workers, which evaluate the black box and return metrics.]
Open questions: Which algorithm to use for search? How to monitor progress? How to aggregate results? Fault tolerance?
This should be managed with platform support!
Maggy
A flexible framework for asynchronous parallel execution of trials for ML experiments on Hopsworks: ASHA, Random Search, Grid Search, LOCO-Ablation, Bayesian Optimization, and more to come…
Synchronous Search
[Diagram: a Spark job with three synchronous stages; in each stage the driver launches tasks 1..N, every task writes its metrics to HDFS, and a barrier forces all tasks to finish before the next stage starts.]
Add Early Stopping and Asynchronous Algorithms
[Diagram: the same synchronous-stage picture; because every stage waits at its barrier for the slowest task, early-stopped and fast trials leave workers idle, wasting compute in every stage.]
Performance Enhancement
Early stopping:
● Median stopping rule
● Performance curve prediction
Multi-fidelity methods:
● Successive Halving Algorithm
● Hyperband
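As an illustration of the first bullet, here is a simplified sketch of a median stopping rule (the full rule, e.g. as used in Google Vizier, compares running averages; the data structures here are illustrative and a higher metric is assumed to be better):

    from statistics import median

    def should_early_stop(trial_history, completed_histories, step):
        """Median stopping rule (simplified sketch): stop a running trial at `step`
        if its best metric so far is worse than the median metric that completed
        trials had reached at the same step."""
        if len(completed_histories) < 3:          # not enough evidence yet
            return False
        peers = [h[step] for h in completed_histories if len(h) > step]
        if not peers:
            return False
        return max(trial_history[:step + 1]) < median(peers)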
Asynchronous Successive Halving Algorithm (ASHA)
Animation: https://blog.ml.cmu.edu/2018/12/12/massively-parallel-hyperparameter-optimization/
Liam Li et al. “Massively Parallel Hyperparameter Tuning”. In: CoRR abs/1810.05934 (2018).
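The heart of ASHA, as described in the linked paper, is a promotion rule that never waits for a rung of trials to fill up: whenever a worker is free, promote a configuration from the top 1/η of the highest rung that allows it, otherwise start a fresh configuration at the bottom rung. A simplified sketch of that decision, with illustrative bookkeeping structures (this is not Maggy's implementation):

    def asha_get_job(rungs, promoted, eta=3):
        """Simplified ASHA promotion rule.
        rungs[k]    holds (config_id, metric) results at rung k (higher metric is better).
        promoted[k] is the set of config_ids already promoted out of rung k.
        Returns ("promote", config_id, k + 1) or ("new", None, 0)."""
        # The top rung is final, so only consider promotions out of rungs 0..K-2.
        for k in reversed(range(len(rungs) - 1)):
            ranked = sorted(rungs[k], key=lambda r: r[1], reverse=True)
            top = ranked[: len(ranked) // eta]            # top 1/eta of rung k
            candidates = [cid for cid, _ in top if cid not in promoted[k]]
            if candidates:
                promoted[k].add(candidates[0])
                return "promote", candidates[0], k + 1
        # Nothing is promotable right now: grow the bottom rung instead of waiting.
        return "new", None, 0

Each free worker calls asha_get_job, trains the returned configuration with the resource budget of its target rung, and appends the result to that rung; no worker ever blocks on a synchronization barrier.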
Ablation Studies
Replacing the Maggy optimizer with an ablator:
• Feature ablation using the Feature Store
• Leave-one-layer-out ablation
• Leave-one-component-out (LOCO)
[Illustration: Titanic-style training datasets with and without the PClass column: (name, sex, survive) vs. (name, PClass, sex, survive).]
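Conceptually, leave-one-component-out ablation just retrains once per excluded component and compares against the base model. A sketch, assuming a hypothetical train_model(features) function that trains on a given feature subset and returns a validation metric:

    # Leave-one-feature-out ablation (sketch). Each trial is independent of the
    # others, which is what makes ablation embarrassingly parallel.
    def loco_feature_ablation(train_model, features):
        base_metric = train_model(features)
        report = {}
        for feature in features:
            reduced = [f for f in features if f != feature]
            metric = train_model(reduced)
            report[feature] = base_metric - metric    # how much we lose without it
        return base_metric, report

    # Example with Titanic-style columns from the slide:
    # base, report = loco_feature_ablation(train_model, ["name", "pclass", "sex"])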
Challenge
How can we fit this into the bulk-synchronous execution model of Spark?
Mismatch: Spark tasks and stages vs. trials.
Databricks’ approach: Project Hydrogen (barrier execution mode) and SparkTrials in Hyperopt.
The Solution
Long-running tasks and communication:
[Diagram: a single Spark stage with tasks 1..N kept alive behind the barrier; the tasks stream metrics to the driver, which sends back new trials and early-stop signals.]
Hyperopt, by contrast, runs one Spark job per trial, requiring many threads on the driver.
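Conceptually, each long-running task is a small client loop talking to the optimizer on the driver. The sketch below shows the pattern only; the client object and its methods are hypothetical stand-ins, not Maggy's actual RPC layer:

    def worker_loop(client, train_fn):
        """Sketch of one long-running Spark task: it repeatedly asks the driver
        for a trial, streams metrics back, and honors early-stop signals."""
        while True:
            trial = client.get_next_trial()              # blocks until a new trial is suggested
            if trial is None:                            # experiment finished
                return
            for epoch_metric in train_fn(**trial.params):  # train_fn yields one metric per epoch
                client.report_metric(trial.id, epoch_metric)
                if client.should_stop(trial.id):         # early stopping decided on the driver
                    break
            client.finalize_trial(trial.id)

Because the tasks stay alive for the whole experiment, no new Spark job is launched per trial and no barrier is hit between trials.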
Enter Maggy
User API
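The API example on this slide is an image and is not preserved in the transcript. The snippet below is reconstructed from Maggy's documentation of that period and should be read as approximate: argument names such as optimizer, direction, and num_trials, and the extra reporter parameter of the training function, may differ in current releases.

    from maggy import experiment, Searchspace

    # Define the search space (types and ranges are illustrative).
    sp = Searchspace(kernel=('INTEGER', [2, 8]),
                     pool=('INTEGER', [2, 8]),
                     dropout=('DOUBLE', [0.01, 0.5]))

    # The training function takes the hyperparameters plus a reporter used to
    # heartbeat metrics back to the driver; it returns the final metric.
    def training_function(kernel, pool, dropout, reporter):
        accuracy = train_fn(kernel, pool, dropout)   # the distribution-oblivious function from before
        return accuracy

    # Launch the experiment; Maggy distributes trials over the Spark executors.
    result = experiment.lagom(training_function,
                              searchspace=sp,
                              optimizer='randomsearch',
                              direction='max',
                              num_trials=15,
                              name='MNIST-demo')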
Developer API
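This slide is likewise an image. As a sketch, the Maggy developer documentation of the time described adding a new search algorithm by subclassing an abstract optimizer with roughly the hooks below; class names, import paths, and helpers such as get_random_parameter_values are reproduced from memory and may be inaccurate.

    from maggy.optimizer import AbstractOptimizer
    from maggy.trial import Trial

    class MyRandomSearch(AbstractOptimizer):

        def initialize(self):
            # Called once before the experiment starts: pre-sample all trials.
            configs = self.searchspace.get_random_parameter_values(self.num_trials)
            self.trial_buffer = [Trial(c) for c in configs]

        def get_suggestion(self, trial=None):
            # Called whenever a worker needs work; `trial` is the one just finished.
            if self.trial_buffer:
                return self.trial_buffer.pop()
            return None   # no suggestion left: the experiment ends

        def finalize_experiment(self, trials):
            # Called once with all completed trials, e.g. to log a summary.
            pass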
Ablation API
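The ablation API slides are images as well. The snippet below is reconstructed from the Maggy ablation examples of that period; the dataset, feature, and layer names are placeholders, base_model_generator and training_function are assumed to be defined elsewhere, and exact arguments may have changed.

    from maggy import experiment
    from maggy.ablation import AblationStudy

    # Base the study on a Feature Store training dataset and its label column.
    ablation_study = AblationStudy('titanic_train_dataset',
                                   training_dataset_version=1,
                                   label_name='survived')

    # Features to leave out, one trial each.
    ablation_study.features.include('pclass', 'fare')

    # Layers of the base model to leave out, one trial each.
    ablation_study.model.layers.include('my_dense_two', 'my_dense_three')

    # The base model is supplied as a generator function.
    ablation_study.model.set_base_model_generator(base_model_generator)

    # Run LOCO ablation with the same lagom call used for hyperparameter search.
    result = experiment.lagom(training_function,
                              experiment_type='ablation',
                              ablation_study=ablation_study,
                              ablator='loco',
                              name='Titanic-LOCO')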
Results
[Plots: “Hyperparameter Optimization Task” and “Validation Task”, comparing ASHA, RS-ES, and RS-NS.]
Conclusions
● Avoid iterative hyperparameter optimization
● Black-box optimization is hard
● State-of-the-art algorithms can be deployed asynchronously
● Maggy: platform support for automated hyperparameter optimization and ablation studies
● Save resources with asynchrony
● Early stopping for sensible models
What’s Next?
● More algorithms
● Distribution transparency
● Comparability/reproducibility of experiments
● Implicit provenance
● Support for PyTorch
Acknowledgements
Thanks to the entire Logical Clocks team ☺ @hopsworks
Contributions from colleagues:
Robin Andersson @robzor92
Sina Sheikholeslami @cutlash
Kim Hammar @KimHammar1
Alex Ormenisan @alex_ormenisan
• Maggy
  https://github.com/logicalclocks/maggy
  https://maggy.readthedocs.io/en/latest/
• Hopsworks
  https://github.com/logicalclocks/hopsworks
  https://www.logicalclocks.com/whitepapers/hopsworks
• Feature Store: the missing data layer in ML pipelines?
  https://www.logicalclocks.com/feature-store/