Pigeon: An Effective Distributed, Hierarchical Datacenter Job Scheduler. Zhijun Wang, Huiyang Li, Zhongwei Li, Xiaocui Sun, Jia Rao, Hao Che and Hong Jiang, University of Texas at Arlington. 1

Datacenter job scheduling challenges I: large scale. Clusters are large, with tens of thousands of nodes (workers). Jobs can also be large, with tens of thousands of tasks in a single job; the Cloudera trace contains jobs with more than 50K tasks. 2

Datacenter job scheduling challenges II: heterogeneous workload. Short jobs (e.g., user-facing applications) call for short response times. Long jobs (e.g., data backup) call for a guarantee on mean response time. 3

Centralized job scheduling: scalability problem. A single scheduler manages the resources of all workers in the cluster. (Figure: one scheduler places the tasks of short and long jobs into the workers' task queues.) 4

Distributed scheduling (Sparrow): low efficiency due to unbalanced probing. (Figure: a scheduler probes the task queues of workers on behalf of a job.) A scheduler needs to maintain all of its probes. 5

Hybrid scheduling (Eagle, Hawk): all short jobs are placed on reserved workers. Scalability problem: long jobs still go through a centralized scheduler. (Figure: a centralized scheduler places long jobs into the workers' task queues, while distributed schedulers place short jobs; reserved workers serve only short-job tasks.) 6

Pigeon contributions: 1. Introduces a master level for task distribution, giving a new architecture: a hierarchical job scheduler. 2. Fully solves the scalability problem. 3. Achieves high efficiency. 7

Overview of Pigeon: distributed schedulers receive jobs and distribute their tasks to masters. Each master centrally manages a group of workers, some of which are reserved workers, and dispatches tasks to the workers in its group. The master is job agnostic. 8
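
As an illustration of the task-distribution step, the sketch below spreads one job's tasks over the masters. The round-robin policy, the `start` offset, and the function name `distribute_tasks` are hypothetical assumptions, not Pigeon's documented interface; the point is only that when a job's fanout is smaller than the number of masters, each task lands on a different master, and the master needs no knowledge of the job.

```python
from typing import Dict, List

def distribute_tasks(tasks: List[str], masters: List[str], start: int = 0) -> Dict[str, List[str]]:
    """Hypothetical policy: assign a job's tasks to masters round-robin, beginning at `start`."""
    if not masters:
        raise ValueError("need at least one master")
    assignment: Dict[str, List[str]] = {m: [] for m in masters}
    for i, task in enumerate(tasks):
        # Task i goes to master (start + i) mod M; with fanout < M no master receives two tasks.
        assignment[masters[(start + i) % len(masters)]].append(task)
    return assignment

print(distribute_tasks(["t0", "t1"], ["m0", "m1", "m2"], start=1))
# {'m0': [], 'm1': ['t0'], 'm2': ['t1']}
```

A scheduler could pick `start` at random per job so that tasks from different jobs spread evenly across groups; that choice is also an assumption.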

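Job scheduling in Pigeon. (Figure: a distributed scheduler sends a job's tasks to a master, which holds waiting tasks in a weighted fair queue with weight W and keeps an idle worker list for its group of workers.) 9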
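
A master's dispatch loop can be sketched as follows, under stated assumptions: waiting tasks are split into a short and a long queue served by weighted round robin with weight `w` (up to `w` short tasks per long task while both queues are non-empty), reserved workers accept only short tasks, and the idle worker list is a deque. The class `Master` and all of its methods are hypothetical; this is not the authors' implementation. The master sees only task IDs and a short/long flag, which is what keeps it job agnostic.

```python
from collections import deque
from typing import Deque, Iterable, List, Optional, Tuple

class Master:
    """Hypothetical Pigeon-style master for one group of workers."""

    def __init__(self, workers: Iterable[str], reserved: Iterable[str], w: int = 3) -> None:
        self.idle: Deque[str] = deque(workers)  # idle worker list
        self.reserved = set(reserved)           # workers that serve short tasks only
        self.short_q: Deque[str] = deque()
        self.long_q: Deque[str] = deque()
        self.w = w                              # assumed weight of the short queue
        self.short_served = 0                   # short tasks served since the last long task

    def submit(self, task: str, is_short: bool) -> List[Tuple[str, str]]:
        """Queue a task and return any (task, worker) assignments made now."""
        (self.short_q if is_short else self.long_q).append(task)
        return self._dispatch()

    def worker_done(self, worker: str) -> List[Tuple[str, str]]:
        """Return a worker to the idle list and dispatch waiting tasks."""
        self.idle.append(worker)
        return self._dispatch()

    def _pop_worker(self, allow_reserved: bool) -> Optional[str]:
        # Take the first idle worker allowed to run this class of task.
        for i, worker in enumerate(self.idle):
            if allow_reserved or worker not in self.reserved:
                del self.idle[i]
                return worker
        return None

    def _dispatch(self) -> List[Tuple[str, str]]:
        assigned: List[Tuple[str, str]] = []
        while self.idle:
            want_short = bool(self.short_q) and (self.short_served < self.w or not self.long_q)
            if not want_short and self.long_q:
                worker = self._pop_worker(allow_reserved=False)
                if worker is not None:
                    assigned.append((self.long_q.popleft(), worker))
                    self.short_served = 0
                    continue
            # Serve a short task: either it is its turn, or only reserved workers are idle.
            if not self.short_q:
                break
            worker = self._pop_worker(allow_reserved=True)
            assigned.append((self.short_q.popleft(), worker))
            self.short_served += 1
        return assigned
```

For example, with `w=1` and one reserved worker, a long task keeps waiting while only the reserved worker is idle, but a short task that arrives at that point is dispatched to the reserved worker immediately.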

Why is Pigeon better? It solves the key challenges of existing schedulers. Scalable: it greatly reduces the status-maintenance cost in job schedulers. With a group size of 100, the number of masters is 1% of the number of workers, which cuts status-maintenance cost by 99% compared with tracking every worker. Efficient: it removes head-of-line blocking and obtains a statistical multiplexing gain within a group. With a group size of 100 running at 90% load, and treating each worker as independently busy with probability 0.9, the probability that a task finds an idle worker in its group is $1 - 0.9^{100} \approx 99.99734\%$. 10
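
Both figures on this slide are simple arithmetic, reproduced below. The function names are hypothetical, and the idle-worker probability assumes, as the slide's formula implies, that each worker is busy independently with probability equal to the load.

```python
def status_cost_reduction(group_size: int) -> float:
    """Fraction of status entries saved when a scheduler tracks one master per
    group of `group_size` workers instead of tracking every worker."""
    return 1 - 1 / group_size

def p_idle_worker_in_group(load: float, group_size: int) -> float:
    """P(at least one idle worker in the group), assuming each worker is
    independently busy with probability `load`."""
    return 1 - load ** group_size

print(f"{status_cost_reduction(100):.0%}")        # 99%
print(f"{p_idle_worker_in_group(0.9, 100):.5%}")  # 99.99734%
```

For comparison, a single worker at the same load is idle only about 10% of the time (`p_idle_worker_in_group(0.9, 1)`), which illustrates the multiplexing gain of pooling workers into a group.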

Modeling and analysis: consider a single type of job whose fanout degree (number of tasks) is less than the number of masters, and assume all tasks in a job have the same execution time. Task queuing at a master is modeled as an M/M/K queue, where K is the group size. Result: Pigeon can run at 30% higher utilization while keeping zero queuing time, i.e., jobs experience no queuing delay. 11
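
For reference, the standard M/M/K results (the Erlang C formula) give the probability that an arriving task must queue and its mean queuing time. This is the textbook formula for the model the slide names, not the authors' derivation; the function names, the per-worker service rate `mu`, and the offered load `a` = arrival rate / `mu` are our own notation.

```python
def erlang_c(k: int, a: float) -> float:
    """Probability that an arriving task waits in an M/M/k queue.

    k: number of servers (here, workers in a group); a: offered load
    lambda/mu, which must satisfy 0 <= a < k for a stable queue.
    """
    if k < 1 or not 0 <= a < k:
        raise ValueError("need k >= 1 and 0 <= a < k")
    b = 1.0  # Erlang B blocking probability with 0 servers
    for n in range(1, k + 1):
        b = a * b / (n + a * b)  # Erlang B recursion: B(n) from B(n-1)
    return k * b / (k - a * (1 - b))  # convert Erlang B to Erlang C

def mean_queuing_time(k: int, a: float, mu: float = 1.0) -> float:
    """Mean wait before service: C(k, a) / (k*mu - lambda), with lambda = a*mu."""
    return erlang_c(k, a) / (k * mu - a * mu)

# One worker vs. a group of 100 workers, both at 90% utilization (mu = 1).
print(mean_queuing_time(1, 0.9))     # about 9.0
print(mean_queuing_time(100, 90.0))  # far smaller
```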

Evaluation (implementation): Pigeon is implemented as a Spark plug-in and evaluated on the Amazon EC2 cloud with a 120-worker cluster (3 groups in Pigeon). Metrics: 50th, 90th, and 99th percentile completion times of short and long jobs. Baselines: the state-of-the-art schedulers Eagle and Sparrow. Source code: https://github.com/ruby-/pigeon/ 12

Pigeon vs. Eagle (implementation): with Eagle's results normalized to Pigeon's, Pigeon achieves 20x to 30x performance gains for short jobs. 13
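
The reported metrics can be computed as below. Two assumptions: percentiles use the nearest-rank method, and "normalized to Pigeon" means the baseline's percentile completion time divided by Pigeon's at the same percentile, so values above 1 favor Pigeon. The function names and the sample numbers are hypothetical and not taken from the paper.

```python
import math
from typing import Dict, Sequence, Tuple

def percentile(values: Sequence[float], p: int) -> float:
    """Nearest-rank percentile for integer p in 1..100."""
    if not values or not 1 <= p <= 100:
        raise ValueError("need non-empty values and 1 <= p <= 100")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]

def normalized_to_pigeon(baseline: Sequence[float], pigeon: Sequence[float],
                         ps: Tuple[int, ...] = (50, 90, 99)) -> Dict[int, float]:
    """Baseline percentile completion time divided by Pigeon's, per percentile."""
    return {p: percentile(baseline, p) / percentile(pigeon, p) for p in ps}

# Made-up short-job completion times in seconds, for illustration only.
pigeon_times = [1.0, 1.1, 1.2, 1.3, 1.5, 1.6, 1.8, 2.0, 2.5, 3.0]
baseline_times = [2.0, 2.2, 2.4, 2.6, 3.0, 3.2, 3.6, 4.0, 5.0, 6.0]
print(normalized_to_pigeon(baseline_times, pigeon_times))
# {50: 2.0, 90: 2.0, 99: 2.0}
```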

Pigeon vs. Sparrow (implementation): Sparrow's results are normalized to Pigeon's. Pigeon works in a real cluster. 14

Evaluation (large-scale simulation): an event-driven simulator using the Google, Yahoo, and Cloudera traces, with cluster sizes of 3,000 to 19,000 workers. Metrics: 50th, 90th, and 99th percentile completion times of short and long jobs. Baseline: the state-of-the-art hybrid scheduler Eagle. 15
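
As a toy cross-check of the M/M/K view (not the authors' event-driven simulator, and not trace-driven), the sketch below simulates one group of `k` workers fed by a single FIFO queue with Poisson arrivals and exponential service times. It processes arrivals in time order and keeps a min-heap of the times at which each worker next becomes free. All names and parameters are hypothetical.

```python
import heapq
import random

def simulate_mean_wait(k: int, load: float, n_tasks: int = 100_000, seed: int = 1) -> float:
    """Mean queuing time of tasks in a FIFO group of k workers (M/M/k).

    Each worker serves at rate 1; tasks arrive at rate load * k.
    """
    if k < 1 or not 0 < load < 1:
        raise ValueError("need k >= 1 and 0 < load < 1")
    rng = random.Random(seed)
    free_at = [0.0] * k  # min-heap: time at which each worker becomes free
    now = 0.0
    total_wait = 0.0
    for _ in range(n_tasks):
        now += rng.expovariate(load * k)   # next Poisson arrival
        earliest = heapq.heappop(free_at)  # worker that frees up first
        start = max(now, earliest)         # FIFO: wait for that worker if busy
        total_wait += start - now
        heapq.heappush(free_at, start + rng.expovariate(1.0))  # exponential service
    return total_wait / n_tasks

print(simulate_mean_wait(1, 0.9))    # single worker at 90% load
print(simulate_mean_wait(100, 0.9))  # group of 100 workers at 90% load
```

The much smaller wait for the 100-worker group is the statistical multiplexing gain of pooling workers behind one queue, consistent with standard M/M/k theory.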

Pigeon is scalable and efficient. (Figure: slowdown of Pigeon and Eagle on the Google trace.) Slowdown = job completion time / job execution time. Pigeon delivers large performance gains for short jobs at high loads and slightly better performance for long jobs. 16

Conclusion: Pigeon is a new distributed, hierarchical job scheduler built on a new scheduling architecture. 1. Better scalability than existing schedulers. 2. High efficiency through statistical multiplexing. 17

Thank you! Questions? 18
