Understanding Task Scheduling Algorithms
Kenjiro Taura
Contents
1 Introduction
2 Work stealing scheduler
3 Analyzing execution time
    Introduction
    DAG model and greedy schedulers
    Work stealing schedulers
4 Analyzing cache misses of work stealing
5 Summary
Introduction
in this part, we study how tasks in task parallel programs are scheduled and what we can expect about their performance

[figure: the task DAG of a parallel merge sort execution (nodes T0 ... T199)]

    void ms(elem * a, elem * a_end,
            elem * t, int dest) {
      long n = a_end - a;
      if (n == 1) {
        ...
      } else {
        ...
        create_task(ms(a, c, t, 1 - dest));
        ms(c, a_end, t + nh, 1 - dest);
        wait_tasks;
      }
    }
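for concreteness, here is one way the elided parts might be filled in; this is a sketch under assumptions, not the lecture's exact code: merge(lo, mid, hi, out) is a hypothetical sequential routine merging [lo, mid) and [mid, hi) into out, and dest is assumed to select whether the sorted result must end up in a (dest = 0) or in the temporary buffer t (dest = 1)

    /* illustrative sketch only: merge() is a hypothetical helper and the
       dest convention is an assumption */
    void ms(elem * a, elem * a_end, elem * t, int dest) {
      long n = a_end - a;
      if (n == 1) {
        if (dest) t[0] = a[0];               /* result must live in t */
      } else {
        long nh = n / 2;
        elem * c = a + nh;                   /* split point */
        /* sort the two halves into the "other" buffer; left half as a child task */
        create_task(ms(a, c, t, 1 - dest));
        ms(c, a_end, t + nh, 1 - dest);
        wait_tasks;                          /* join before merging */
        if (dest) merge(a, a + nh, a + n, t);   /* halves are in a, result goes to t */
        else      merge(t, t + nh, t + n, a);   /* halves are in t, result goes to a */
      }
    }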
Goals
understand a state-of-the-art scheduling algorithm (work stealing scheduler)
execution time (without modeling communication): how much time does a scheduler take to finish a computation? in particular, how close is it to greedy schedulers?
data access (communication) cost: when a computation is executed in parallel by a scheduler, how much data are transferred (cache ↔ memory, cache ↔ cache)? in particular, how much worse (or better) is it than in the serial execution?
Model of computation
assume a program performs the following operations
    create_task(S): create a task that performs S
    wait_tasks: wait for completion of the tasks it has created (but has not yet waited for)
e.g.,
    int fib(int n) {
      if (n < 2) return 1;
      else {
        int x, y;
        create_task({ x = fib(n - 1); });  // share x
        y = fib(n - 2);
        wait_tasks;
        return x + y;
      }
    }
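the same fib can be written with any task-parallel system offering these two primitives; as one concrete realization (an assumed example, not the lecture's own system), here it is with OpenMP tasks, where shared(x) plays the role of the "share x" comment above and the program compiles with -fopenmp

    #include <stdio.h>

    int fib(int n) {
      if (n < 2) return 1;
      int x, y;
    #pragma omp task shared(x)       /* create_task: the child computes x */
      x = fib(n - 1);
      y = fib(n - 2);                /* the parent keeps working */
    #pragma omp taskwait             /* wait_tasks */
      return x + y;
    }

    int main(void) {
      int r = 0;
    #pragma omp parallel
    #pragma omp single               /* one thread spawns the root of the task tree */
      r = fib(30);
      printf("fib(30) = %d\n", r);
      return 0;
    }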
Model of computation
model an execution as a DAG (directed acyclic graph)
    node: a sequence of instructions
    edge: a dependency
assume there are no dependencies besides those induced by create_task(S) and wait_tasks
e.g., (note C1 and C2 may be subgraphs, not single nodes)

    P1;
    create_task(C1);
    P2;
    create_task(C2);
    P3;
    wait_tasks;
    P4;

[figure: the corresponding DAG, with edges P1 → {C1, P2}, P2 → {C2, P3}, and C1, C2, P3 → P4]
Terminologies and remarks
a single node in the DAG represents a sequence of instructions performing no task-related operations
note that a task ≠ a single node; a task = a sequence of nodes
we say a node is ready when all its predecessors have finished; we say a task is ready to mean a node of it becomes ready

[figure: the DAG of the previous slide (P1, P2, P3, P4, C1, C2)]
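a sketch (an assumption about how a runtime could be organized, not something the slides prescribe) of how readiness can be tracked: each node counts its unfinished predecessors, and a finishing node decrements the counters of its successors

    /* readiness via dependency counters; in a real parallel runtime the
       decrement would have to be atomic */
    typedef struct node {
      int n_unfinished_preds;   /* predecessors that have not finished yet */
      struct node **succs;      /* successor nodes in the DAG */
      int n_succs;
    } node;

    /* called when node u finishes; make_ready() is a hypothetical hook
       that hands a newly ready node to the scheduler */
    void node_finished(node *u, void (*make_ready)(node *)) {
      for (int i = 0; i < u->n_succs; i++) {
        node *v = u->succs[i];
        if (--v->n_unfinished_preds == 0)
          make_ready(v);        /* all predecessors done: v is now ready */
      }
    }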
Work stealing scheduler
a state-of-the-art scheduler of task parallel systems
the main idea was invented in 1990: Mohr, Kranz, and Halstead. Lazy task creation: a technique for increasing the granularity of parallel programs. ACM Conference on LISP and Functional Programming, 1990.
originally termed "Lazy Task Creation," but essentially the same strategy is nowadays called "work stealing"
Work stealing scheduler: data structure

[figure: per-worker ready deques of workers W0, W1, ..., Wn-1, each with a top and a bottom end; the executing task sits at the top, other ready tasks below it]

each worker maintains its "ready deque" that contains ready tasks
the top entry of each ready deque is an executing task
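a sketch of what such a deque might look like (an assumed layout; all synchronization is omitted, whereas a real implementation, e.g. a Chase–Lev deque, needs careful use of atomics): the owner pushes and pops at the top, thieves take from the bottom

    enum { DEQUE_CAP = 1024 };
    typedef struct task task;            /* opaque task descriptor */

    typedef struct ready_deque {
      task *slot[DEQUE_CAP];
      int bottom;                        /* oldest ready task: the steal side */
      int top;                           /* one past the executing task: the owner side */
    } ready_deque;

    /* owner side (the worker that owns the deque) */
    static void  push_top(ready_deque *d, task *t) { d->slot[d->top++] = t; }
    static task *pop_top(ready_deque *d)    { return d->top > d->bottom ? d->slot[--d->top]    : NULL; }
    /* thief side (a worker that ran out of work) */
    static task *pop_bottom(ready_deque *d) { return d->top > d->bottom ? d->slot[d->bottom++] : NULL; }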
Work stealing scheduler: in a nutshell
work-first: when creating a task, the created task gets executed first (before the parent)
LIFO execution order within a worker: without work stealing, the order of execution is as if it were a serial program (see the sketch below), with
    create_task(S) ≡ S
    wait_tasks ≡ noop
FIFO stealing: it partitions tasks at points close to the root of the task tree
it is a practical approximation of a greedy scheduler, in the sense that any ready task can (eventually) be stolen by any idle worker
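the LIFO point can be made concrete with the "serial elision": replacing the two primitives as below turns the task-parallel program into the ordinary sequential one (a sketch; real systems achieve the equivalent inside the runtime rather than with macros)

    /* serial elision of the two task primitives */
    #define create_task(S)  S            /* run the child inline; the parent resumes after it */
    #define wait_tasks      ((void)0)    /* nothing left to wait for in the serial order */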
Work stealing scheduler in action
describing a scheduler boils down to defining its actions on each of the following events:
    (1) create_task
    (2) a worker becoming idle
    (3) wait_tasks
    (4) a task termination
we will see them in detail below (a skeleton of the resulting worker loop is sketched first)
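to tie the four events together, here is a skeleton of what one worker's loop might look like; every helper name (run_until_event, the handle_* functions, deque_empty, steal_once) is hypothetical, and the handlers themselves are sketched after the corresponding slides below

    typedef struct worker worker;
    typedef struct task task;
    typedef struct {
      enum { CREATE_TASK, WAIT_TASKS, TASK_END } kind;
      task *t;                              /* the task the event refers to */
    } event;

    void worker_loop(worker *W) {
      for (;;) {
        if (deque_empty(W)) {
          steal_once(W);                    /* event (2): the worker is idle */
        } else {
          event e = run_until_event(W);     /* run the task at the top of the deque */
          switch (e.kind) {
          case CREATE_TASK: handle_create(W, e.t); break;   /* event (1) */
          case WAIT_TASKS:  handle_wait(W, e.t);   break;   /* event (3) */
          case TASK_END:    handle_end(W, e.t);    break;   /* event (4) */
          }
        }
      }
    }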
Work stealing scheduler in action
(1) worker W encounters create_task(S):
    1. W pushes S to its deque
    2. W immediately starts executing S

[figure: S is pushed on top of its parent P in W's ready deque]
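a sketch of this handler under the assumptions of the skeleton above (names hypothetical)

    /* event (1): W executed create_task(S) inside the current task */
    void handle_create(worker *W, task *S) {
      push_top(&W->dq, S);     /* the child S now sits above its parent in the deque */
      execute(W, S);           /* work-first: W switches to the child immediately */
    }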
Work stealing scheduler in action
(2) a worker with an empty deque repeats work stealing:
    1. it picks a random worker V as the victim
    2. it steals the task at the bottom of V's deque

[figure: the task at the bottom of V's deque moves to the thief's deque]
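a sketch of one stealing attempt, again with hypothetical helpers; a failed attempt (empty victim deque) simply makes the worker try again on the next iteration of its loop

    #include <stdlib.h>        /* rand() */

    /* event (2): W's deque is empty, so it tries to steal once */
    void steal_once(worker *W) {
      worker *V = &workers[rand() % n_workers];   /* pick a random victim */
      if (V == W) return;                         /* retry on the next iteration */
      task *T = pop_bottom(&V->dq);               /* FIFO stealing: take the bottom task */
      if (T) {
        push_top(&W->dq, T);                      /* T becomes W's executing task */
        execute(W, T);
      }
    }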
Work stealing scheduler in action
(3) a worker W encounters wait_tasks: there are two cases
    1. the tasks to wait for have finished ⇒ W just continues the task
    2. otherwise ⇒ W pops the task from its deque (the task is now blocked, and W will start work stealing)
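a sketch of this handler; the per-task counter of outstanding (created but not yet finished) children is assumed bookkeeping, not something the slides specify

    /* event (3): the task T running on W reached wait_tasks */
    void handle_wait(worker *W, task *T) {
      if (T->n_live_children == 0) {
        /* case 1: every task T created has finished; T just keeps running */
      } else {
        T->blocked = 1;          /* case 2: T becomes blocked ...                 */
        pop_top(&W->dq);         /* ... and leaves the deque; W turns to stealing */
      }
    }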
Work stealing scheduler in action
(4) when W encounters the termination of a task T, W pops T from its deque. there are two cases about T's parent P:
    1. P has been blocked and now becomes ready again ⇒ W pushes P back onto its deque and continues executing P

[figure: T is removed from W's deque and the previously blocked parent P resumes on W]
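a sketch of this last handler; only the first case appears on the slide above, so the fall-through behaviour for the other case (W's deque may now be empty and it goes back to stealing) is an assumption, as is the all_children_done() helper

    /* event (4): the task T running on W terminates */
    void handle_end(worker *W, task *T) {
      pop_top(&W->dq);                      /* remove the finished task T */
      task *P = T->parent;
      if (P && P->blocked && all_children_done(P)) {
        P->blocked = 0;                     /* case 1: the parent becomes ready again */
        push_top(&W->dq, P);                /* W takes it ...                          */
        execute(W, P);                      /* ... and continues executing it          */
      }
      /* otherwise W falls back to its main loop, possibly to start work stealing */
    }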