HyperSched: Deadline-aware Scheduler for Model Development
Richard Liaw, Romil Bhardwaj, Lisa Dunlap, Yitian Zou, Joseph E. Gonzalez, Ion Stoica, Alexey Tumanov
Data Science @ Boogle Inc.
Learning Rate? Momentum?? Network Size? Preprocessing Parameters??? Featurization?????
How to optimize? Try Random Search
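The idea above can be sketched in a few lines of Python. This is an illustrative sketch only, not HyperSched's code: the hyperparameter names and the `train_and_eval` callback are placeholders standing in for a real training job.

```python
import random

def sample_config():
    """Draw one random hyperparameter configuration (a 'trial')."""
    return {
        "learning_rate": 10 ** random.uniform(-4, -1),  # log-uniform sample
        "momentum": random.uniform(0.8, 0.99),
        "network_size": random.choice([64, 128, 256]),
    }

def random_search(train_and_eval, num_trials=20):
    """Evaluate `num_trials` random configs; keep the best-scoring one.
    `train_and_eval` is a placeholder for training a model and returning
    its validation score."""
    best_config, best_score = None, float("-inf")
    for _ in range(num_trials):
        config = sample_config()
        score = train_and_eval(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score
```

Note that every trial here gets the same training budget, which is exactly the inefficiency the rest of the talk addresses.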
Trials (sets of hyperparameters to evaluate)
Terri is faced with the decision of choosing the right level of parallelism.
[Figure: accuracy of each trial over time; # GPUs allocated over time]
Scheduling Problem? DEADLINES EXIST
[Figure: accuracy over time; # GPUs over time, cut off at a deadline]
Scheduling Problem
Instead of improving DL cluster efficiency [OSDI 2018] or job completion time [NSDI 2019, EuroSys 2018], the goal is:
- Exploration problem: given finite time and compute resources, evaluate many random trials (configurations)…
- Exploitation problem: …to obtain the best trained model.
HyperSched is an application-level scheduler for model development.
- Balances explore and exploit by adaptively allocating resources based on:
  - Awareness of resource constraints (# GPUs, time)
  - Awareness of training objectives (accuracy over time)
Properties/Assumptions of model development workloads
Model development consists of evaluating many trials.
- Each trial is iterative and returns intermediate results.
- Trials can be checkpointed during training.
- All trials share the same objective; we care only about 1 model.
- Model training can be accelerated by parallelizing/distributing its workload (data parallelism).
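The first two assumptions can be made concrete with a minimal trial abstraction. This is a sketch of the interface the slides assume, not HyperSched's API; the accuracy-update rule inside `run_one_epoch` is a synthetic stand-in for real training.

```python
import copy

class Trial:
    """A trial: iterative training that reports intermediate results
    and can be checkpointed/restored at any epoch boundary."""

    def __init__(self, config):
        self.config = config
        self.iter = 0
        self.accuracy = 0.0

    def run_one_epoch(self):
        """One training iteration; returns an intermediate result.
        The update rule is synthetic: accuracy improves with
        diminishing returns, controlled by a placeholder 'lr' knob."""
        self.iter += 1
        self.accuracy += (1.0 - self.accuracy) * self.config["lr"]
        return self.accuracy

    def checkpoint(self):
        """Snapshot trial state (in practice: model weights + optimizer)."""
        return copy.deepcopy(self.__dict__)

    def restore(self, state):
        """Roll the trial back to a previous snapshot."""
        self.__dict__.update(copy.deepcopy(state))
```

Checkpointing is what lets a scheduler pause a middling trial, give its GPUs to a promising one, and resume the paused trial later without losing work.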
How to use allocation for exploration and exploitation?
Naive Approach: Static Space/Time Allocation
[Figure: # GPUs over time, split into an exploration phase followed by an exploitation phase]
Naive Approach: Static Space/Time Allocation
Problem: Initial performance is a weak proxy of final behavior.
(4-layer CNN on CIFAR-10 – Mukkamala, ICML 2017)
Naive Solution: Static Space/Time Allocation
Underallocate exploration…
… or underallocate exploitation.
Main problem: cannot rely on initial performance.
Better Solution: Asynchronous Successive Halving Algorithm (ASHA) [Li 2018]
- Distributed hyperparameter tuning algorithm based on optimal resource allocation.
- State-of-the-art results over other existing algorithms.
- Deployed in many AutoML offerings today.
Better Solution: Asynchronous Successive Halving Algorithm (ASHA) [Li 2018]
[Figure: simplified representation — accuracy and # GPUs over time, with promotions at epochs r, η·r, η²·r]
- r: min. epochs per trial
- R: max epochs per trial
- η (eta): balances explore/exploit
- Intuition: progressively allocate more resources to promising trials

Per-trial pseudocode:

    LIMIT = r
    while trial.iter < R:
        trial.run_one_epoch()
        if trial.iter == LIMIT:
            if is_top(trial, LIMIT, 1/η):
                LIMIT *= η
            else:  # allow new trials to start
                trial.pause(); break
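The slide's pseudocode can be fleshed out into a small runnable simulation of the promotion rule. This is a synchronous sketch for illustration only: real ASHA runs trials in parallel workers, and here the hidden `quality` field and the saturating score curve are synthetic stand-ins for actual training.

```python
import math

ETA = 3               # halving factor: keep the top 1/ETA at each rung
R_MIN, R_MAX = 1, 9   # min/max epochs per trial (r and R on the slide)

class Trial:
    def __init__(self, quality):
        self.quality = quality   # hidden "true" quality of this config
        self.iter = 0
        self.score = 0.0
        self.paused = False

    def run_one_epoch(self):
        # Synthetic learning curve: score saturates toward `quality`.
        self.iter += 1
        self.score = self.quality * (1 - math.exp(-self.iter))

def is_top(score, rung_scores, eta):
    """True if `score` is within the top 1/eta of scores seen at this rung."""
    k = max(1, len(rung_scores) // eta)
    return score >= sorted(rung_scores, reverse=True)[k - 1]

def run_asha_trial(trial, rung_history):
    """Run one trial following the slide's pseudocode. `rung_history`
    maps each rung (epoch limit) to scores recorded there so far."""
    limit = R_MIN
    while trial.iter < R_MAX:
        trial.run_one_epoch()
        if trial.iter == limit:
            scores = rung_history.setdefault(limit, [])
            scores.append(trial.score)
            if is_top(trial.score, scores, ETA):
                limit *= ETA            # promote: keep training to next rung
            else:                       # allow new trials to start
                trial.paused = True
                break
    return trial
```

Note the asynchronous flavor: the first trial is promoted unconditionally (it is trivially in the top 1/η of a one-element rung), so work starts immediately instead of waiting for a full bracket as in synchronous successive halving.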
Benefit: mitigates noisy initial performance through adaptive allocation.
[Figure: accuracy over time]