

  1. Maggy - Open-Source Asynchronous Distributed Hyperparameter Optimization Based on Apache Spark. Moritz Meister, FOSDEM’20, 02.02.2020. moritz@logicalclocks.com, @morimeister

  2. The Bitter Lesson (of AI)*: “Methods that scale with computation are the future of AI”**; “The two (general purpose) methods that seem to scale ... are search and learning.”* (Rich Sutton, father of Reinforcement Learning) * http://www.incompleteideas.net/IncIdeas/BitterLesson.html ** https://www.youtube.com/watch?v=EeMCEQa85tw

  3. The Answer: Spark is the answer! Spark scales with available compute!

  4. Distribution and Deep Learning: [Diagram: reducing the generalization error via better regularization, better optimization algorithms, larger training datasets, hyperparameter optimization, and better model design, all of which are enabled by distribution.]

  5. Inner and Outer Loop of Deep Learning: [Diagram: the outer loop searches over trials (hyperparameters, architecture, method, etc.); each trial runs the inner loop, in which workers 1..N train on the training data and synchronize their gradient updates (∆); the resulting metric feeds back into the search.] http://tiny.cc/51yjdz

  6. Inner and Outer Loop of Deep Learning: [Same diagram, with the outer loop labeled SEARCH and the inner loop labeled LEARNING.] http://tiny.cc/51yjdz

  7. In Reality This Means Rewriting Training Code: [Diagram of an end-to-end ML pipeline: data ingest & prep, data pipelines, feature store, machine learning experiments (explore/design, model training, parallel hyperparameter optimization, ablation studies), and model serving.]

  8. Towards Distribution Transparency (Inner Loop): the distribution-oblivious training function (pseudo-code):
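
  The pseudo-code itself is not reproduced in this transcript; as a stand-in, here is a minimal sketch of what such a distribution-oblivious training function could look like, assuming a Keras MNIST model with kernel, pool and dropout as the tuned hyperparameters (model and dataset details are illustrative, not taken from the slide):

    import tensorflow as tf

    def train_fn(kernel, pool, dropout):
        # Plain, single-machine training code: nothing here knows whether it
        # runs as one local process or as one of many parallel trials.
        (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
        x_train, x_test = x_train / 255.0, x_test / 255.0

        model = tf.keras.Sequential([
            tf.keras.layers.Reshape((28, 28, 1), input_shape=(28, 28)),
            tf.keras.layers.Conv2D(32, kernel, activation='relu'),
            tf.keras.layers.MaxPooling2D(pool),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dropout(dropout),
            tf.keras.layers.Dense(10, activation='softmax'),
        ])
        model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
                      metrics=['accuracy'])
        model.fit(x_train, y_train, epochs=1, batch_size=128, verbose=0)

        # The returned metric is what the outer search loop optimizes.
        _, accuracy = model.evaluate(x_test, y_test, verbose=0)
        return accuracy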

  9. Towards Distribution Transparency: [Diagram: the tuning cycle of set hyperparameters → train model → evaluate performance.] ● Trial and error is slow ● The iterative approach is greedy ● Search spaces are usually large ● Sensitivity and interaction of hyperparameters

  10. Sequential Black Box Optimization (Outer Loop): [Diagram: a meta-level learning & optimization process draws points from the search space, evaluates them against the black box, and learns from the returned metric.]

  11. Sequential Search (Outer Loop): [Diagram: a global controller proposes one trial at a time to the black box and learns from the returned metric.]

  12. Parallel Search (Outer Loop): [Diagram: a global controller draws from the search space (or an ablation study), maintains a trial queue, and evaluates trials on parallel workers against the black box, learning from the returned metrics.] Open questions: Which algorithm to use for search? How to monitor progress? How to aggregate results? Fault tolerance?

  13. Parallel Search: [Same diagram.] Which algorithm to use for search, how to monitor progress, how to aggregate results, and fault tolerance should be managed with platform support!

  14. Maggy: a flexible framework for asynchronous parallel execution of trials for ML experiments on Hopsworks: ASHA, Random Search, Grid Search, LOCO-Ablation, Bayesian Optimization and more to come…

  15. Synchronous Search: [Diagram: the Spark driver runs stages of tasks (Task 11..1N, 21..2N, 31..3N) separated by barriers; each stage writes its metrics (Metrics 1, 2, 3) to HDFS before the next stage starts.]

  16. Add Early Stopping and Asynchronous Algorithms: [Same diagram: with barriers between stages, early-stopped or fast-finishing trials leave workers idle, i.e. wasted compute in every stage.]

  17. Performance Enhancement. Early Stopping: ● Median Stopping Rule ● Performance curve prediction. Multi-fidelity Methods: ● Successive Halving Algorithm ● Hyperband
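
  For illustration, a minimal sketch of the median stopping rule listed above (the function and bookkeeping are my own, not Maggy's implementation): a trial is stopped once its best metric so far falls below the median of the running averages of the other trials at the same step, assuming higher is better.

    from statistics import median

    def should_stop(trial_metrics, other_trials, step):
        # trial_metrics: per-step metrics of the running trial.
        # other_trials: list of per-step metric histories of the other trials.
        best_so_far = max(trial_metrics[:step + 1])
        running_avgs = [sum(h[:step + 1]) / (step + 1)
                        for h in other_trials if len(h) > step]
        if not running_avgs:
            return False  # nothing to compare against yet
        return best_so_far < median(running_avgs)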

  18. Asynchronous Successive Halving Algorithm (ASHA). Animation: https://blog.ml.cmu.edu/2018/12/12/massively-parallel-hyperparameter-optimization/ Reference: Liam Li et al. “Massively Parallel Hyperparameter Tuning”. In: CoRR abs/1810.05934 (2018).
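
  To make the idea concrete, here is a rough sketch of the core promotion rule of asynchronous successive halving as described in the linked post and paper (data structures and names are illustrative, not Maggy's internals): whenever a worker is free, promote a not-yet-promoted top-1/eta trial from the highest possible rung to the next budget; otherwise start a fresh random configuration at the lowest rung.

    def next_job(rungs, eta, sample_config):
        # rungs[k] is a list of [config, metric, promoted] entries for trials
        # that finished rung k (higher metric = better, larger k = larger budget).
        for k in range(len(rungs) - 2, -1, -1):
            ranked = sorted(rungs[k], key=lambda e: e[1], reverse=True)
            for entry in ranked[: len(ranked) // eta]:   # top 1/eta of the rung
                if not entry[2]:
                    entry[2] = True                      # mark as promoted
                    return entry[0], k + 1               # rerun with the next budget
        return sample_config(), 0                        # nothing promotable: new trial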

  19. Ablation Studies. Replacing the Maggy optimizer with an ablator: ● Feature ablation using the Feature Store (illustrated with Titanic-style tables: name, PClass, sex → survive vs. name, sex → survive with PClass left out) ● Leave-One-Layer-Out ablation ● Leave-One-Component-Out (LOCO)

  20. Challenge: How can we fit this into the bulk-synchronous execution model of Spark? Mismatch: Spark tasks and stages vs. trials. Databricks’ approach: Project Hydrogen (barrier execution mode) & SparkTrials in Hyperopt.

  21. The Solution: long-running tasks and communication. [Diagram: a single stage of long-running tasks (Task 11..1N) exchanges metrics, new trials, and early-stop signals with the driver, with only one final barrier.] By contrast, Hyperopt uses one job per trial, requiring many threads on the driver.
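
  Conceptually, each executor therefore runs one long-lived task that keeps talking to the experiment driver instead of one Spark job per trial. A highly simplified sketch of such a worker loop follows; the rpc client and its methods are hypothetical placeholders, not Maggy's actual protocol.

    class EarlyStop(Exception):
        pass

    def worker_loop(rpc, train_fn):
        # One long-running Spark task per executor: fetch trials, train while
        # heartbeating metrics, and honor early-stop decisions from the driver.
        while True:
            trial = rpc.get_next_trial()               # hypothetical: blocks for work
            if trial is None:
                break                                  # experiment finished
            def report(step, metric):
                rpc.heartbeat(trial.id, step, metric)  # hypothetical metric heartbeat
                if rpc.should_stop(trial.id):          # driver applied e.g. median rule
                    raise EarlyStop()
            try:
                final = train_fn(report=report, **trial.params)
                rpc.finish(trial.id, final)
            except EarlyStop:
                rpc.finish(trial.id, None)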

  22. Enter Maggy

  23. User API
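
  The User API slide is shown as code; below is a minimal sketch of launching a Maggy hyperparameter experiment, following the 2020-era documented API (a Searchspace plus experiment.lagom). Exact argument names such as optimizer and direction may differ between Maggy versions, run_training is a hypothetical helper standing in for the earlier training code, and reporter.broadcast follows the examples of that era.

    from maggy import experiment, Searchspace

    # Ranges for the hyperparameters of the training function.
    sp = Searchspace(kernel=('INTEGER', [2, 8]),
                     pool=('INTEGER', [2, 8]),
                     dropout=('DOUBLE', [0.01, 0.99]))

    def train_fn(kernel, pool, dropout, reporter):
        # Build, train and evaluate the model as in the earlier sketch; the
        # extra `reporter` lets the trial heartbeat intermediate metrics.
        accuracy = run_training(kernel, pool, dropout)   # hypothetical helper
        reporter.broadcast(metric=accuracy)
        return accuracy

    result = experiment.lagom(train_fn,
                              searchspace=sp,
                              optimizer='randomsearch',
                              direction='max',
                              num_trials=15,
                              name='MNIST-demo')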

  24. Developer API
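
  The Developer API slide is also code; the extension point is an optimizer subclass. A sketch assuming the AbstractOptimizer interface from the Maggy docs of that time (initialize / get_suggestion / finalize_experiment); the module paths, Trial constructor, sampling logic and budget bookkeeping below are assumptions, not part of the framework.

    import random

    from maggy.optimizer import AbstractOptimizer
    from maggy.trial import Trial

    class MyRandomSearch(AbstractOptimizer):

        def initialize(self):
            # Called once before the experiment starts.
            self.remaining = 15                      # illustrative fixed budget

        def get_suggestion(self, trial=None):
            # Called whenever a worker is free; `trial` is the last finished trial.
            if self.remaining == 0:
                return None                          # None signals: experiment done
            self.remaining -= 1
            params = {'kernel': random.randint(2, 8),
                      'pool': random.randint(2, 8),
                      'dropout': random.uniform(0.01, 0.99)}
            return Trial(params)                     # constructor signature assumed

        def finalize_experiment(self, trials):
            # Called once after the last trial has finished.
            pass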

  25. Ablation API

  26. Ablation API
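
  A sketch of the Ablation API along the lines of the Titanic example in the Maggy documentation (dataset, feature and layer names follow that example and are assumptions here, as is the exact lagom signature): features of a Feature Store training dataset and named model layers are registered for ablation, and the LOCO ablator then generates one trial per left-out component.

    from maggy import experiment
    from maggy.ablation import AblationStudy

    # Declare what to ablate: features of a feature-store training dataset
    # and named layers of the Keras model built inside the training function.
    ablation_study = AblationStudy('titanic_train_dataset',
                                   training_dataset_version=1,
                                   label_name='survived')
    ablation_study.features.include('pclass', 'sex')
    ablation_study.model.layers.include('my_dense_two', 'my_dense_three')

    # Each trial retrains the model with one feature or layer left out and
    # reports its metric, so the contribution of every component can be compared.
    result = experiment.lagom(train_fn,
                              ablation_study=ablation_study,
                              ablator='loco',
                              name='Titanic-LOCO')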

  27. Results: [Plot: validation metric over time for the hyperparameter optimization task, comparing ASHA, random search with early stopping (RS-ES), and random search without early stopping (RS-NS).]

  28. Conclusions ● Avoid iterative hyperparameter optimization ● Black-box optimization is hard ● State-of-the-art algorithms can be deployed asynchronously ● Maggy: platform support for automated hyperparameter optimization and ablation studies ● Save resources with asynchronism ● Early stopping for sensible models

  29. What’s next? ● More algorithms ● Distribution transparency ● Comparability/reproducibility of experiments ● Implicit provenance ● Support for PyTorch

  30. Acknowledgements. Thanks to the entire Logical Clocks team ☺ @hopsworks. Contributions from colleagues: Robin Andersson @robzor92, Sina Sheikholeslami @cutlash, Kim Hammar @KimHammar1, Alex Ormenisan @alex_ormenisan. ● Maggy: https://github.com/logicalclocks/maggy https://maggy.readthedocs.io/en/latest/ ● Hopsworks: https://github.com/logicalclocks/hopsworks https://www.logicalclocks.com/whitepapers/hopsworks ● Feature Store: the missing data layer in ML pipelines? https://www.logicalclocks.com/feature-store/
