Scheduling Hadoop Jobs to Meet Deadlines
Kamal Kc, Kemafor Anyanwu
Department of Computer Science, North Carolina State University
{kkc,kogan}@ncsu.edu
Introduction
MapReduce: a cluster-based parallel programming abstraction
− Programmers focus on designing the application, not on issues such as parallelization, scheduling, input partitioning, failover, and replication
Hadoop: an open-source implementation of the MapReduce framework
A Hadoop job is a workflow of map-reduce cycles
Introduction
Using Hadoop:
− cluster infrastructure required; costly to maintain
− sharing cluster resources among users is a viable approach
A demand-based pay-as-you-go model can be attractive for meeting a user's computation requirements
One such user requirement is a time specification: a deadline
But current Hadoop does not support deadline-based job execution
How to make Hadoop support deadlines?
− Develop an interface to input the deadline
− Modify the Hadoop scheduler to account for deadlines
Problem definition
A user submits a job with a specified deadline D
The Hadoop cluster has a fixed number of machines with fixed map and reduce slots
A Hadoop job is broken down into a fixed set of map and reduce tasks
Problem: Can the job meet its deadline? If yes, how should we schedule the tasks into the available slots of the machines?
The Constraint Scheduler for Hadoop is our effort to tackle these problems
Constraint Scheduler
Extends the real-time cluster scheduling approach to the two-phase (map and reduce) computation style
Can the deadline be met?
− Let n_m^min and n_r^min be the minimum numbers of map and reduce tasks that need to be scheduled to meet the deadline
− Map tasks can be started as soon as the job is submitted, but when should the reduce tasks be started? Answer: the reduce phase must start by S_r^max, the latest start time that still allows the deadline to be met
− The job can then meet its deadline if:
  − at least n_m^min map slots are available before S_r^max
  − at least n_r^min reduce slots are available after S_r^max
But how do we know the values of n_m^min, n_r^min, and S_r^max?
Constraint Scheduler
Assume we can know or estimate (for data-processing tasks):
− map cost per unit data c_m
− reduce cost per unit data c_r
− communication cost per unit data c_d
− filter ratio f
Also assume:
− the cluster is homogeneous
− the key distribution is uniform
Then, for a job of size σ with arrival time A and deadline D, s_m and s_r are the actual start times for the map and reduce phases respectively
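The slides name the parameters but do not show the derivation. A rough back-of-the-envelope sketch of how n_m^min, n_r^min, and S_r^max could be computed from them, under a simplified phase model of my own (map phase length ≈ σ·c_m/n_m with n_m parallel tasks; reduce phase length ≈ f·σ·(c_d + c_r)/n_r), not the paper's exact formulas:

```python
import math

def minimum_tasks(sigma, A, D, c_m, c_r, c_d, f, map_slots, reduce_slots):
    """Return (n_m_min, n_r_min, s_r_max), or None if the deadline cannot be met.

    Simplified model (an assumption, not the paper's exact derivation):
    - map phase length  ~ sigma * c_m / n_m        with n_m map tasks in parallel
    - reduce phase length ~ f * sigma * (c_d + c_r) / n_r  with n_r reduce tasks
    """
    for n_r in range(1, reduce_slots + 1):
        # Latest reduce start that still finishes the reduce phase by D.
        s_r_max = D - f * sigma * (c_d + c_r) / n_r
        if s_r_max <= A:
            continue  # reduce phase alone overruns; try more reduce tasks
        # Maps must process all sigma units before the reduce phase starts.
        n_m = math.ceil(sigma * c_m / (s_r_max - A))
        if n_m <= map_slots:
            return n_m, n_r, s_r_max
    return None  # no slot counts satisfy the deadline
```

With σ = 100, A = 0, D = 100, c_m = c_r = 1, c_d = 0.5, f = 0.5, a single reduce task already suffices: S_r^max = 25 and four parallel map tasks finish by then, so the function returns (4, 1, 25.0).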
Constraint Scheduler - 2
How to schedule tasks on the cluster machines? Possible techniques:
− assign all map and reduce tasks if enough slots are available
− assign the minimum number of tasks
− assign some fixed number of tasks greater than the minimum
Constraint Scheduler's approach:
− assign the minimum number of tasks
− intuitive appeal: some empty slots remain available for other jobs
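The minimum-assignment policy can be illustrated with a toy admission loop (names and structure are illustrative, not the scheduler's actual code): each job gets exactly its minimum task count, and whatever is left stays free for jobs that arrive later.

```python
def assign_minimum(jobs, free_slots):
    """Greedily admit jobs with only their minimum task counts.

    jobs: list of (job_name, n_min) pairs in arrival order.
    Returns (assignment dict, remaining free slots). A job whose minimum
    cannot be satisfied is rejected, and admission stops there.
    """
    assignment = {}
    for name, n_min in jobs:
        if n_min > free_slots:
            break  # cannot guarantee this job's deadline; do not admit it
        assignment[name] = n_min  # assign only the minimum, never more
        free_slots -= n_min
    return assignment, free_slots
```

For example, two jobs needing 6 and 5 minimum map tasks on a cluster with 20 free map slots leave 9 slots free for other jobs, which is the intuitive appeal noted above.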
Design and Implementation
Developed as a contrib module using Hadoop version 0.20.2
Web interface:
− to specify the deadline
− to provide the map/reduce cost per unit data
− to start the job
Experimental Evaluation
Setup
− Physical cluster: 10 tasktrackers, 1 jobtracker
− Virtualized cluster: single physical node; 3 guest VMs as tasktrackers, host system as jobtracker
− Both systems: 2 map/reduce slots per tasktracker, 64 MB HDFS block size
Hadoop job
− Job equivalent to the query: SELECT userid, count(actionid) AS num_actions FROM useraction GROUP BY userid
− The useraction table contains (userid, actionid) tuples
− The job translates into an aggregation operation, one of the most common forms of Hadoop workload
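The GROUP BY query above maps naturally onto one map-reduce cycle. A toy in-memory sketch of the two phases (a Python stand-in for the actual Hadoop Java job; function names are illustrative):

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit (userid, 1) for every (userid, actionid) tuple."""
    for userid, actionid in records:
        yield userid, 1

def reduce_phase(pairs):
    """Reduce: sum the 1s per userid, i.e. count(actionid) GROUP BY userid."""
    counts = defaultdict(int)
    for userid, one in pairs:
        counts[userid] += one
    return dict(counts)
```

Running it on three useraction tuples, two for user 1 and one for user 2, yields {1: 2, 2: 1}, matching the query's num_actions per userid.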
Results
Virtualized cluster
− Input size = 975 MB, 16 map tasks
− Two deadlines:
  − 600 s deadline: minimum map tasks = 6
  − 700 s deadline: minimum map tasks = 5; finished early because fewer tasks produced less CPU load
Results
Physical cluster
− Input size = 2.9 GB, 48 map tasks
− Two deadlines:
  − 680 s deadline: minimum map tasks = 20, minimum reduce tasks = 5
  − 1000 s deadline: minimum map tasks = 8, minimum reduce tasks = 4
Future work
Take into account:
− node failures
− speculative execution
− map/reduce computation cost estimation
− the impact of map tasks with non-local data
Conclusion
− Extended the real-time cluster scheduling approach to MapReduce-style computation
− The Constraint Scheduler identifies whether a Hadoop job can meet its deadline and, if so, schedules it accordingly
− The Constraint Scheduler is based on a model general enough to be extended to relax the assumed conditions
Thank you