Maggy - Open-Source Asynchronous Distributed Hyperparameter Optimization Based on Apache Spark



SLIDE 1

Maggy
Open-Source Asynchronous Distributed Hyperparameter Optimization Based on Apache Spark

Moritz Meister
moritz@logicalclocks.com
@morimeister

FOSDEM’20, 02.02.2020

SLIDE 2

The Bitter Lesson (of AI)*

“Methods that scale with computation are the future of AI”**
“The two (general purpose) methods that seem to scale … are search and learning.”*

Rich Sutton (Father of Reinforcement Learning)

* http://www.incompleteideas.net/IncIdeas/BitterLesson.html
** https://www.youtube.com/watch?v=EeMCEQa85tw

SLIDE 3

The Answer

Spark scales with available compute. Spark is the answer!

SLIDE 4

Distribution and Deep Learning

[Diagram: factors that reduce the Generalization Error (Better Regularization Methods, Design Better Models, Hyperparameter Optimization, Larger Training Datasets, Better Optimization Algorithms) and where Distribution comes in.]

SLIDE 5

Inner and Outer Loop of Deep Learning

[Diagram: the Inner Loop trains a model on the Training Data across worker1, worker2, ..., workerN with Synchronization between workers; the Outer Loop has a Search Method that issues Trials (hyperparameters, architecture, etc.) and receives a Metric back. Image: http://tiny.cc/51yjdz]

SLIDE 6

Inner and Outer Loop of Deep Learning

[Same diagram as the previous slide, now labelled with Rich Sutton's two scalable methods: the Outer Loop corresponds to SEARCH, the Inner Loop to LEARNING.]

SLIDE 7

In Reality This Means Rewriting Training Code

[Pipeline diagram: Data Pipelines (Ingest & Prep) → Feature Store → Machine Learning Experiments (Explore/Design Model, Hyperparameter Optimization, Ablation Studies) → Data Parallel Training → Model Serving]

SLIDE 8

Towards Distribution Transparency

The distribution-oblivious training function (pseudo-code):

Inner Loop
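
The pseudo-code shown on this slide is not preserved in the transcript. As a minimal sketch of the idea (the MNIST model below is purely illustrative, not the slide's original code): the training function takes hyperparameters as plain arguments, returns a single metric, and knows nothing about Spark or how it is distributed.

```python
# Minimal sketch of a distribution-oblivious training function (illustrative,
# not the slide's original pseudo-code): hyperparameters come in as arguments,
# one scalar metric goes out, and nothing in the body refers to Spark.
def train(kernel_size, pool_size, dropout):
    import tensorflow as tf

    (x_train, y_train), (x_val, y_val) = tf.keras.datasets.mnist.load_data()
    x_train, x_val = x_train / 255.0, x_val / 255.0

    model = tf.keras.Sequential([
        tf.keras.layers.Reshape((28, 28, 1)),
        tf.keras.layers.Conv2D(32, kernel_size, activation="relu"),
        tf.keras.layers.MaxPooling2D(pool_size),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dropout(dropout),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(x_train, y_train, epochs=1, batch_size=128, verbose=0)

    # The surrounding framework only ever sees this return value, so the same
    # inner loop can run on a laptop or on a cluster executor unchanged.
    _, accuracy = model.evaluate(x_val, y_val, verbose=0)
    return accuracy
```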

SLIDE 9

Towards Distribution Transparency

  • Trial and Error is slow
  • Iterative approach is greedy
  • Search spaces are usually large
  • Sensitivity and interaction of hyperparameters

[Cycle diagram: Set Hyperparameters → Train Model → Evaluate Performance]

SLIDE 10

Sequential Black Box Optimization

[Diagram, Outer Loop: a Meta-level learning & optimization component draws from the Search Space, evaluates the Learning Black Box, and receives a Metric back.]

SLIDE 11

Sequential Search

[Diagram, Outer Loop: a Global Controller evaluates the Learning Black Box one trial at a time and receives a Metric back.]

SLIDE 12

Parallel Search

[Diagram, Outer Loop: a Global Controller draws Trials from the Search Space (or an Ablation Study) into a Queue; Parallel Workers evaluate the Learning Black Box and report a Metric back.]

Which algorithm to use for search? How to monitor progress? Fault tolerance? How to aggregate results?

SLIDE 13

Parallel Search

[Same diagram, with a Meta-level learning & optimization component in place of the Global Controller.]

Which algorithm to use for search? How to monitor progress? Fault tolerance? How to aggregate results?

This should be managed with platform support!

SLIDE 14

Maggy

A flexible framework for asynchronous parallel execution of trials for ML experiments on Hopsworks: ASHA, Random Search, Grid Search, LOCO-Ablation, Bayesian Optimization and more to come…

SLIDE 15

Synchronous Search

[Diagram: the Spark Driver launches each round of trials as a stage of tasks (Task11 ... Task1N, Task21 ... Task2N, Task31 ... Task3N); every stage ends at a Barrier, and metrics (Metrics1, Metrics2, Metrics3) are exchanged via HDFS.]

SLIDE 16

Add Early Stopping and Asynchronous Algorithms

[Same diagram with early stopping: trials that finish or are stopped early still have to wait at the stage Barrier, leaving Wasted Compute in every stage.]

SLIDE 17

Performance Enhancement

Early Stopping:

  • Median Stopping Rule
  • Performance curve prediction

Multi-fidelity Methods:

  • Successive Halving Algorithm
  • Hyperband
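
For intuition, the Median Stopping Rule from the list above can be sketched in a few lines (a simplified sketch, not Maggy's implementation): a running trial is stopped when its metric at a given step is worse than the median of the other trials' metrics at the same step.

```python
from statistics import median

def median_stopping_rule(current_metric, step, other_trials, direction="max"):
    """Decide whether to stop a running trial (sketch).

    other_trials maps trial_id -> list of metrics reported per step.
    """
    peers = [history[step] for history in other_trials.values()
             if len(history) > step]
    if not peers:
        return False  # nothing to compare against yet
    med = median(peers)
    # Worse than the median of its peers at the same step -> stop early.
    return current_metric < med if direction == "max" else current_metric > med
```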

SLIDE 18

Asynchronous Successive Halving Algorithm

Animation: https://blog.ml.cmu.edu/2018/12/12/massively-parallel-hyperparameter-optimization/
Liam Li et al. “Massively Parallel Hyperparameter Tuning”. In: CoRR abs/1810.05934 (2018).
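
The core promotion rule of ASHA can be sketched briefly (a simplified illustration assuming the metric is maximized, not Maggy's or the paper's reference implementation): whenever a worker becomes free, promote a configuration that ranks in the top 1/reduction_factor of its rung if one exists, otherwise start a new random configuration at the bottom rung. No worker ever waits for a rung to fill up, which is what makes the algorithm asynchronous.

```python
import random

def asha_next_job(rungs, promoted, reduction_factor=4):
    """Pick the next job for a free worker (simplified ASHA sketch).

    rungs[k]    : list of (config, metric) results completed at rung k
    promoted[k] : list of configs already promoted out of rung k
    Returns (config, rung) to run next.
    """
    # Scan rungs top-down for a promotable configuration.
    for k in reversed(range(len(rungs) - 1)):
        ranked = sorted(rungs[k], key=lambda cm: cm[1], reverse=True)
        top = ranked[: len(ranked) // reduction_factor]
        for config, _ in top:
            if config not in promoted[k]:
                promoted[k].append(config)
                return config, k + 1       # rerun with a larger budget
    # Nothing to promote: start a fresh random configuration at rung 0
    # (the hyperparameter here is just an illustrative learning rate).
    return {"lr": random.uniform(1e-4, 1e-1)}, 0
```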

SLIDE 19

Ablation Studies

[Diagram: feature ablation example on the Titanic dataset (features such as PClass, name, sex; label: survive).]

Replacing the Maggy Optimizer with an Ablator:

  • Feature Ablation using the Feature Store
  • Leave-One-Layer-Out Ablation
  • Leave-One-Component-Out (LOCO)

SLIDE 20

Challenge

How can we fit this into the bulk synchronous execution model of Spark? Mismatch: Spark Tasks and Stages vs. Trials.

Databricks’ approach: Project Hydrogen (barrier execution mode) & SparkTrials in Hyperopt.

SLIDE 21

The Solution

[Diagram: long-running tasks (Task11 ... Task1N) communicate with the Driver while training, sending Metrics and receiving New Trial or Early Stop messages, with a single Barrier at the end.]

Long-running tasks and communication.

HyperOpt: one Job per Trial, requiring many Threads on the Driver.

SLIDE 22

Enter Maggy

SLIDE 23

User API
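
The code on this slide is not preserved in the transcript. Based on Maggy's documentation from around this time, the user API looked roughly like the sketch below (argument names may differ between versions, and the body of the training function is a placeholder):

```python
from maggy import experiment, Searchspace

# Hyperparameter search space (types and ranges are illustrative).
sp = Searchspace(kernel=('INTEGER', [2, 8]), pool=('INTEGER', [2, 8]))

# Hyperparameters arrive as keyword arguments; the reporter can be used to
# send intermediate metrics/heartbeats to the driver for early stopping.
def train(kernel, pool, reporter):
    # ... build and train a model with these hyperparameters ...
    accuracy = 0.0  # placeholder for the real validation metric
    return accuracy

# "lagom" launches the experiment; trials run asynchronously on the executors.
result = experiment.lagom(train,
                          searchspace=sp,
                          optimizer='randomsearch',
                          direction='max',
                          num_trials=15,
                          name='hp_tuning_demo')
```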

SLIDE 24

Developer API
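
This slide's code is also missing from the transcript. Conceptually, the developer API lets you plug your own search algorithm into Maggy by implementing an optimizer interface; the class and method names in the sketch below are assumptions for illustration only, so check the Maggy documentation for the real base class and signatures.

```python
import random

# Hypothetical shape of a custom search algorithm for Maggy's developer API.
# In the real library this would subclass Maggy's abstract optimizer; the
# method names and signatures here are assumptions, not a verified interface.
class MyRandomSearch:

    def initialize(self):
        # Called once before the experiment starts (set up internal state).
        self.trial_count = 0

    def get_suggestion(self, trial=None):
        # Called whenever a worker is free; `trial` is the trial that just
        # finished (None on the first call). Return the next hyperparameter
        # combination, or None to end the experiment.
        self.trial_count += 1
        if self.trial_count > 15:
            return None
        return {'kernel': random.randint(2, 8), 'pool': random.randint(2, 8)}

    def finalize_experiment(self, trials):
        # Called once after all trials have finished, e.g. to log a summary.
        pass
```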

SLIDE 25

Ablation API

SLIDE 26

Ablation API
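
The code on the two Ablation API slides is not preserved in the transcript. A rough sketch based on Maggy's documentation (dataset, feature, and layer names are illustrative, and exact argument names may differ by version):

```python
from maggy import experiment
from maggy.ablation import AblationStudy

# Declare the study: which training dataset (from the Feature Store) and label.
ablation_study = AblationStudy('titanic_train_dataset',
                               training_dataset_version=1,
                               label_name='survived')

# Feature ablation using the Feature Store: one trial per left-out feature.
ablation_study.features.include('pclass', 'sex', 'name')

# Leave-one-layer-out ablation: one trial per removed model layer.
ablation_study.model.layers.include('dense_layer_1', 'dense_layer_2')

# The ablator injects a dataset builder and a model builder with the ablated
# feature/layer removed; the training function just uses them.
def train(dataset_function, model_function):
    # ... build the dataset and model, train, and return the metric ...
    return 0.0  # placeholder

# Run with the LOCO ablator instead of a hyperparameter optimizer.
result = experiment.lagom(train,
                          experiment_type='ablation',
                          ablation_study=ablation_study,
                          ablator='loco',
                          name='titanic_ablation')
```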

SLIDE 27

Results

[Plots: Hyperparameter Optimization Task and Validation Task, comparing ASHA, RS-ES, and RS-NS.]

SLIDE 28

Conclusions

  • Avoid iterative hyperparameter optimization
  • Black box optimization is hard
  • State-of-the-art algorithms can be deployed asynchronously
  • Maggy: platform support for automated hyperparameter optimization and ablation studies
  • Save resources with asynchronism
  • Early stopping for sensible models

SLIDE 29

What’s next?

  • More algorithms
  • Distribution transparency
  • Comparability/reproducibility of experiments
  • Implicit provenance
  • Support for PyTorch

SLIDE 30

Acknowledgements

Thanks to the entire Logical Clocks Team ☺

Contributions from colleagues:
Robin Andersson @robzor92
Sina Sheikholeslami @cutlash
Kim Hammar @KimHammar1
Alex Ormenisan @alex_ormenisan

  • Maggy
    https://github.com/logicalclocks/maggy
    https://maggy.readthedocs.io/en/latest/

  • Hopsworks
    https://github.com/logicalclocks/hopsworks
    https://www.logicalclocks.com/whitepapers/hopsworks

  • Feature Store: the missing data layer in ML pipelines?
    https://www.logicalclocks.com/feature-store/

@hopsworks