

  1. Maggy - Open-Source Asynchronous Distributed Hyperparameter Optimization Based on Apache Spark. Moritz Meister, FOSDEM’20, 02.02.2020. moritz@logicalclocks.com, @morimeister

  2. The Bitter Lesson (of AI)*: “Methods that scale with computation are the future of AI”**; “The two (general purpose) methods that seem to scale ... are search and learning.”* (Rich Sutton, father of Reinforcement Learning) * http://www.incompleteideas.net/IncIdeas/BitterLesson.html ** https://www.youtube.com/watch?v=EeMCEQa85tw

  3. The Answer: Spark is the answer! Spark scales with available compute!

  4. Distribution and Deep Learning: [Diagram: reducing the generalization error via better regularization, better optimization algorithms, larger training datasets, hyperparameter optimization, and better model design, all of which are enabled by distribution.]

  5. Inner and Outer Loop of Deep Learning: [Diagram: the outer loop searches over trials (hyperparameters, architecture, method, etc.); each trial runs the inner loop, in which workers 1..N train on the training data and synchronize their gradient updates (∆); the resulting metric feeds back into the search.] http://tiny.cc/51yjdz

  6. Inner and Outer Loop of Deep Learning: [Same diagram, with the outer loop labeled SEARCH and the inner loop labeled LEARNING.] http://tiny.cc/51yjdz

  7. In Reality This Means Rewriting Training Code: [Diagram of an end-to-end ML pipeline: data ingest & prep, data pipelines, feature store, machine learning experiments (explore/design, model training, parallel hyperparameter optimization, ablation studies), and model serving.]

  8. Towards Distribution Transparency (Inner Loop): the distribution-oblivious training function (pseudo-code):
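
  The pseudo-code itself is not reproduced in this transcript; as a stand-in, here is a minimal sketch of what such a distribution-oblivious training function could look like, assuming a Keras MNIST model with kernel, pool and dropout as the tuned hyperparameters (model and dataset details are illustrative, not taken from the slide):

    import tensorflow as tf

    def train_fn(kernel, pool, dropout):
        # Plain, single-machine training code: nothing here knows whether it
        # runs as one local process or as one of many parallel trials.
        (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
        x_train, x_test = x_train / 255.0, x_test / 255.0

        model = tf.keras.Sequential([
            tf.keras.layers.Reshape((28, 28, 1), input_shape=(28, 28)),
            tf.keras.layers.Conv2D(32, kernel, activation='relu'),
            tf.keras.layers.MaxPooling2D(pool),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dropout(dropout),
            tf.keras.layers.Dense(10, activation='softmax'),
        ])
        model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
                      metrics=['accuracy'])
        model.fit(x_train, y_train, epochs=1, batch_size=128, verbose=0)

        # The returned metric is what the outer search loop optimizes.
        _, accuracy = model.evaluate(x_test, y_test, verbose=0)
        return accuracy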

  9. Towards Distribution Transparency: [Diagram: the tuning cycle of set hyperparameters → train model → evaluate performance.] ● Trial and error is slow ● The iterative approach is greedy ● Search spaces are usually large ● Sensitivity and interaction of hyperparameters

  10. Sequential Black Box Optimization (Outer Loop): [Diagram: a meta-level learning & optimization process draws points from the search space, evaluates them against the black box, and learns from the returned metric.]

  11. Sequential Search (Outer Loop): [Diagram: a global controller proposes one trial at a time to the black box and learns from the returned metric.]

  12. Parallel Search (Outer Loop): [Diagram: a global controller draws from the search space (or an ablation study), maintains a trial queue, and evaluates trials on parallel workers against the black box, learning from the returned metrics.] Open questions: Which algorithm to use for search? How to monitor progress? How to aggregate results? Fault tolerance?

  13. Parallel Search: [Same diagram.] Which algorithm to use for search, how to monitor progress, how to aggregate results, and fault tolerance should be managed with platform support!

  14. Maggy: a flexible framework for asynchronous parallel execution of trials for ML experiments on Hopsworks: ASHA, Random Search, Grid Search, LOCO-Ablation, Bayesian Optimization and more to come…

  15. Synchronous Search: [Diagram: the Spark driver runs stages of tasks (Task 11..1N, 21..2N, 31..3N) separated by barriers; each stage writes its metrics (Metrics 1, 2, 3) to HDFS before the next stage starts.]

  16. Add Early Stopping and Asynchronous Algorithms: [Same diagram: with barriers between stages, early-stopped or fast-finishing trials leave workers idle, i.e. wasted compute in every stage.]

  17. Performance Enhancement. Early Stopping: ● Median Stopping Rule ● Performance curve prediction. Multi-fidelity Methods: ● Successive Halving Algorithm ● Hyperband
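
  For illustration, a minimal sketch of the median stopping rule listed above (the function and bookkeeping are my own, not Maggy's implementation): a trial is stopped once its best metric so far falls below the median of the running averages of the other trials at the same step, assuming higher is better.

    from statistics import median

    def should_stop(trial_metrics, other_trials, step):
        # trial_metrics: per-step metrics of the running trial.
        # other_trials: list of per-step metric histories of the other trials.
        best_so_far = max(trial_metrics[:step + 1])
        running_avgs = [sum(h[:step + 1]) / (step + 1)
                        for h in other_trials if len(h) > step]
        if not running_avgs:
            return False  # nothing to compare against yet
        return best_so_far < median(running_avgs)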

  18. Asynchronous Successive Halving Algorithm (ASHA). Animation: https://blog.ml.cmu.edu/2018/12/12/massively-parallel-hyperparameter-optimization/ Reference: Liam Li et al. “Massively Parallel Hyperparameter Tuning”. In: CoRR abs/1810.05934 (2018).
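
  To make the idea concrete, here is a rough sketch of the core promotion rule of asynchronous successive halving as described in the linked post and paper (data structures and names are illustrative, not Maggy's internals): whenever a worker is free, promote a not-yet-promoted top-1/eta trial from the highest possible rung to the next budget; otherwise start a fresh random configuration at the lowest rung.

    def next_job(rungs, eta, sample_config):
        # rungs[k] is a list of [config, metric, promoted] entries for trials
        # that finished rung k (higher metric = better, larger k = larger budget).
        for k in range(len(rungs) - 2, -1, -1):
            ranked = sorted(rungs[k], key=lambda e: e[1], reverse=True)
            for entry in ranked[: len(ranked) // eta]:   # top 1/eta of the rung
                if not entry[2]:
                    entry[2] = True                      # mark as promoted
                    return entry[0], k + 1               # rerun with the next budget
        return sample_config(), 0                        # nothing promotable: new trial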

  19. Ablation Studies. Replacing the Maggy optimizer with an ablator: ● Feature ablation using the Feature Store (illustrated with Titanic-style tables: name, PClass, sex → survive vs. name, sex → survive with PClass left out) ● Leave-One-Layer-Out ablation ● Leave-One-Component-Out (LOCO)

  20. Challenge: How can we fit this into the bulk-synchronous execution model of Spark? Mismatch: Spark tasks and stages vs. trials. Databricks’ approach: Project Hydrogen (barrier execution mode) & SparkTrials in Hyperopt.

  21. The Solution: long-running tasks and communication. [Diagram: a single stage of long-running tasks (Task 11..1N) exchanges metrics, new trials, and early-stop signals with the driver, with only one final barrier.] By contrast, Hyperopt uses one job per trial, requiring many threads on the driver.
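
  Conceptually, each executor therefore runs one long-lived task that keeps talking to the experiment driver instead of one Spark job per trial. A highly simplified sketch of such a worker loop follows; the rpc client and its methods are hypothetical placeholders, not Maggy's actual protocol.

    class EarlyStop(Exception):
        pass

    def worker_loop(rpc, train_fn):
        # One long-running Spark task per executor: fetch trials, train while
        # heartbeating metrics, and honor early-stop decisions from the driver.
        while True:
            trial = rpc.get_next_trial()               # hypothetical: blocks for work
            if trial is None:
                break                                  # experiment finished
            def report(step, metric):
                rpc.heartbeat(trial.id, step, metric)  # hypothetical metric heartbeat
                if rpc.should_stop(trial.id):          # driver applied e.g. median rule
                    raise EarlyStop()
            try:
                final = train_fn(report=report, **trial.params)
                rpc.finish(trial.id, final)
            except EarlyStop:
                rpc.finish(trial.id, None)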

  22. Enter Maggy

  23. User API
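
  The User API slide is shown as code; below is a minimal sketch of launching a Maggy hyperparameter experiment, following the 2020-era documented API (a Searchspace plus experiment.lagom). Exact argument names such as optimizer and direction may differ between Maggy versions, run_training is a hypothetical helper standing in for the earlier training code, and reporter.broadcast follows the examples of that era.

    from maggy import experiment, Searchspace

    # Ranges for the hyperparameters of the training function.
    sp = Searchspace(kernel=('INTEGER', [2, 8]),
                     pool=('INTEGER', [2, 8]),
                     dropout=('DOUBLE', [0.01, 0.99]))

    def train_fn(kernel, pool, dropout, reporter):
        # Build, train and evaluate the model as in the earlier sketch; the
        # extra `reporter` lets the trial heartbeat intermediate metrics.
        accuracy = run_training(kernel, pool, dropout)   # hypothetical helper
        reporter.broadcast(metric=accuracy)
        return accuracy

    result = experiment.lagom(train_fn,
                              searchspace=sp,
                              optimizer='randomsearch',
                              direction='max',
                              num_trials=15,
                              name='MNIST-demo')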

  24. Developer API
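
  The Developer API slide is also code; the extension point is an optimizer subclass. A sketch assuming the AbstractOptimizer interface from the Maggy docs of that time (initialize / get_suggestion / finalize_experiment); the module paths, Trial constructor, sampling logic and budget bookkeeping below are assumptions, not part of the framework.

    import random

    from maggy.optimizer import AbstractOptimizer
    from maggy.trial import Trial

    class MyRandomSearch(AbstractOptimizer):

        def initialize(self):
            # Called once before the experiment starts.
            self.remaining = 15                      # illustrative fixed budget

        def get_suggestion(self, trial=None):
            # Called whenever a worker is free; `trial` is the last finished trial.
            if self.remaining == 0:
                return None                          # None signals: experiment done
            self.remaining -= 1
            params = {'kernel': random.randint(2, 8),
                      'pool': random.randint(2, 8),
                      'dropout': random.uniform(0.01, 0.99)}
            return Trial(params)                     # constructor signature assumed

        def finalize_experiment(self, trials):
            # Called once after the last trial has finished.
            pass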

  25. Ablation API

  26. Ablation API
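
  A sketch of the Ablation API along the lines of the Titanic example in the Maggy documentation (dataset, feature and layer names follow that example and are assumptions here, as is the exact lagom signature): features of a Feature Store training dataset and named model layers are registered for ablation, and the LOCO ablator then generates one trial per left-out component.

    from maggy import experiment
    from maggy.ablation import AblationStudy

    # Declare what to ablate: features of a feature-store training dataset
    # and named layers of the Keras model built inside the training function.
    ablation_study = AblationStudy('titanic_train_dataset',
                                   training_dataset_version=1,
                                   label_name='survived')
    ablation_study.features.include('pclass', 'sex')
    ablation_study.model.layers.include('my_dense_two', 'my_dense_three')

    # Each trial retrains the model with one feature or layer left out and
    # reports its metric, so the contribution of every component can be compared.
    result = experiment.lagom(train_fn,
                              ablation_study=ablation_study,
                              ablator='loco',
                              name='Titanic-LOCO')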

  27. Results: [Plot: validation metric over time for the hyperparameter optimization task, comparing ASHA, random search with early stopping (RS-ES), and random search without early stopping (RS-NS).]

  28. Conclusions ● Avoid iterative hyperparameter optimization ● Black-box optimization is hard ● State-of-the-art algorithms can be deployed asynchronously ● Maggy: platform support for automated hyperparameter optimization and ablation studies ● Save resources with asynchronism ● Early stopping for sensible models

  29. What’s next? ● More algorithms ● Distribution transparency ● Comparability/reproducibility of experiments ● Implicit provenance ● Support for PyTorch

  30. Acknowledgements. Thanks to the entire Logical Clocks team ☺ @hopsworks. Contributions from colleagues: Robin Andersson @robzor92, Sina Sheikholeslami @cutlash, Kim Hammar @KimHammar1, Alex Ormenisan @alex_ormenisan. ● Maggy: https://github.com/logicalclocks/maggy https://maggy.readthedocs.io/en/latest/ ● Hopsworks: https://github.com/logicalclocks/hopsworks https://www.logicalclocks.com/whitepapers/hopsworks ● Feature Store: the missing data layer in ML pipelines? https://www.logicalclocks.com/feature-store/
