Coupling Task Progress for MapReduce Resource-Aware Scheduling

Jian Tan, Xiaoqiao Meng, Li Zhang
IBM T. J. Watson Research Center, Yorktown Heights, New York, 10598
Email: {tanji, xmeng, zhangli}@us.ibm.com

Abstract: Schedulers are critical in enhancing the performance of MapReduce/Hadoop in the presence of multiple jobs with different characteristics and performance goals. Though current schedulers for Hadoop are quite successful, they still have room for improvement: map tasks (MapTasks) and reduce tasks (ReduceTasks) are not jointly optimized, although there is a strong dependence between them. This can cause job starvation and unfavorable data locality. In this paper, we design and implement a resource-aware scheduler for Hadoop. It couples the progress of MapTasks and ReduceTasks, utilizing Wait Scheduling for ReduceTasks and Random Peeking Scheduling for MapTasks to jointly optimize the task placement. This mitigates the starvation problem and improves the overall data locality. Our extensive experiments demonstrate significant improvements in job response times.

I. INTRODUCTION

MapReduce [1] has emerged as a popular paradigm for processing large datasets in parallel over a cluster. As an open source implementation, Hadoop [2] has been successfully used in a variety of applications, such as social network mining, log processing, video and image analysis, search indexing, and recommendation systems. In many scenarios, long batch jobs and short interactive queries are submitted to the same MapReduce cluster, sharing limited common computing resources with different performance goals. To meet these challenges, an efficient scheduler is critical to providing the desired quality of service for the MapReduce cluster. In this domain of MapReduce scheduling, Fair Scheduler [3] is the most widely used one in practice. Other commonly used schedulers include the default FIFO Scheduler, Capacity Scheduler [4] and variations [5]–[7]. To improve performance of large-scale MapReduce clusters, more complicated resource management schemes have also been proposed [8]–[11]. We focus on Fair Scheduler because it is the de facto standard in the Hadoop community, and observe that it, as well as many other schedulers, still exhibits room for improvement.

1) Map and reduce tasks are scheduled separately [5] without joint optimization. First, Fair Scheduler only guarantees the fairness of MapTasks, and is not really fair for ReduceTasks. We observe that allocating excess resources to ReduceTasks without coordinating with the map progress will lead to cluster-wide resource under-utilization, which is evidenced by the starvation problem [12]. Second, most MapReduce schedulers only consider data locality for MapTasks and ignore that it is also an issue for ReduceTasks. Though it has recently been addressed in [13], [14], the adopted approaches are sensitive to future run-time information (e.g., map output distribution and competition among new jobs) that is difficult to predict.

2) Fair Scheduler uses Delay Scheduling [12], which allows MapTasks to wait for a period of time to find local data. It usually improves the data locality for MapTasks. However, we observe that the introduced delays may lead to under-utilization and instability, i.e., the number of MapTasks running simultaneously is far below a desired level and changes with large variations over time.

In view of these observations, we propose a resource-aware scheduler, termed Coupling Scheduler. It couples the progress of map and reduce tasks to mitigate starvation, and jointly optimizes the placements for both of them to improve the overall data locality. Specifically, we utilize Wait Scheduling for ReduceTasks and Random Peeking Scheduling for MapTasks, taking into consideration the interactions between them, to holistically optimize the data locality. Our extensive experiments demonstrate substantial improvements in job processing times.

A. Scheduling ReduceTasks

While MapTasks are small tasks that can run independently in parallel, ReduceTasks are long-running tasks that contain copy/shuffle and reduce phases. In most existing schedulers, ReduceTasks are not preemptive, i.e., a ReduceTask will not release the occupied slot until its reduce phase completes. It is feasible to introduce ReduceTask preemption in an engineering realization [15]. However, this work adheres to the current non-preemption assumption. For Fair Scheduler, once a certain percentage of the MapTasks of a job finish, the ReduceTasks are launched greedily to a maximum. This method overlaps the copy/shuffle and map phases of a job and can greatly reduce the job processing times. However, this approach can starve newly arrived jobs [12], and this problem is even more pronounced when many small jobs arrive after large ones [16].

The experiment in Fig. 1 further illustrates the problem. Fig. 1 shows the number of map and reduce tasks running simultaneously at every time point for two Grep jobs. Job 1 grabs all the reduce slots at time 0.9 minutes, just before job 2 is submitted at time 1.0 minute. Thus, when job 2 finishes its MapTasks at time 3.8 minutes, it cannot launch