Analysis and Approximation of Optimal Co-Scheduling on Chip Multiprocessors. Yunlian Jiang, Xipeng Shen (The College of William & Mary, USA); Jie Chen (DoE Jefferson Lab, USA); Rahul Tripathi (University of South Florida, USA).
Cache Sharing on CMP. Benefits: shortens inter-thread communication; allows flexible usage of cache. Drawbacks: can degrade performance, impair fairness, and hurt performance isolation. [Figure: two CPUs sharing one cache]
Degradation is affected by co-runner. [Figure: performance degradation range (min, median, max); y-axis: degradation, 0 to 200]
Job Co-Scheduling: assign jobs to chips so as to minimize contention. [Figure: jobs P1 and P2 on Shared cache 1, P3 and P4 on Shared cache 2]
Job Co-Scheduling: assign jobs to chips so as to minimize contention. [Figure: jobs P1 to P4 placed on Shared cache 1 and Shared cache 2, annotated "Resource Waste" and "Resource Contention"]
Job Co-Scheduling: assign jobs to chips so as to minimize contention. [Figure: alternative schedule with P1 and P3 on Shared cache 1, P2 and P4 on Shared cache 2]
The Goal of this Work. Related work: Snavely et al. [ASPLOS '00]. Goal of this work: find the optimal schedule on a CMP system. Benefits: evaluate the quality of current schedules; can be applied in real systems.
Contributions: polynomial-time optimal solution on dual-core systems; NP-completeness proof on K-core (K>2) systems; polynomial-time approximation algorithms on K-core (K>2) systems.
Problem Formulation: M jobs; N-core processors.
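Problem Formulation. Assignment: jobs grouped onto chips. Degradation of job i: $\mathrm{Deg}_i = \frac{\mathrm{cCPI}_i - \mathrm{sCPI}_i}{\mathrm{sCPI}_i}$, reading cCPI as the co-run CPI and sCPI as the single-run CPI. Goal: minimize $\sum_i \mathrm{Deg}_i$.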
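The degradation formula and the objective can be checked with a few lines of Python. This is a minimal sketch: the function names and the CPI values are hypothetical, chosen only to illustrate the arithmetic.

```python
def degradation(co_run_cpi: float, single_run_cpi: float) -> float:
    """Relative slowdown of one job: (cCPI - sCPI) / sCPI."""
    if single_run_cpi <= 0:
        raise ValueError("single-run CPI must be positive")
    return (co_run_cpi - single_run_cpi) / single_run_cpi


def total_degradation(cpi_pairs):
    """Objective to minimize: the sum of Deg_i over all jobs.

    cpi_pairs is an iterable of (cCPI_i, sCPI_i) tuples, one per job.
    """
    return sum(degradation(c, s) for c, s in cpi_pairs)


# Hypothetical CPI values for three jobs (illustrative, not measured data).
example = [(1.5, 1.0), (2.2, 2.0), (0.9, 0.9)]
print(total_degradation(example))
```

A job whose co-run CPI equals its single-run CPI contributes zero to the objective.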
Dual-Core System. Polynomial-time solution: minimum-weight perfect matching [Edmonds, 1965]. Matching: a matching M in a graph G is a set of edges with no common vertex. Perfect matching: a matching that covers all vertices of the graph.
Dual-Core System. Minimum-weight perfect matching: in an edge-weighted graph, a perfect matching whose edge weights have the minimum sum.
Dual-Core System. Mapping: job ↔ node; co-run degradation ↔ edge weight; optimal schedule ↔ minimum-weight perfect matching.
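The mapping from jobs to a matching problem can be tried on a small instance. The sketch below is not Edmonds' polynomial algorithm: it swaps in an exhaustive search with memoization, which is exponential and only suitable for a handful of jobs. The edge weights are hypothetical, and the weight of a pair is assumed to be the combined degradation of both jobs when they share a chip.

```python
from functools import lru_cache


def min_weight_perfect_matching(weight):
    """Return (total_weight, pairs) of a minimum-weight perfect matching.

    weight[i][j] is the edge weight between jobs i and j (symmetric).
    Exhaustive search with memoization, exponential in the number of jobs;
    a stand-in for a polynomial matching algorithm on small examples.
    """
    n = len(weight)
    if n % 2 != 0:
        raise ValueError("a perfect matching needs an even number of jobs")

    @lru_cache(maxsize=None)
    def solve(unmatched):
        # unmatched is a bitmask of jobs that still need a partner.
        if unmatched == 0:
            return 0.0, ()
        # Always pair the lowest-numbered unmatched job to avoid duplicates.
        i = (unmatched & -unmatched).bit_length() - 1
        rest = unmatched & ~(1 << i)
        best = None
        for j in range(i + 1, n):
            if rest & (1 << j):
                sub_cost, sub_pairs = solve(rest & ~(1 << j))
                cost = weight[i][j] + sub_cost
                if best is None or cost < best[0]:
                    best = (cost, ((i, j),) + sub_pairs)
        return best

    cost, pairs = solve((1 << n) - 1)
    return cost, list(pairs)


# Hypothetical co-run degradations for four jobs (illustrative values).
w = [
    [0.0, 0.9, 0.2, 0.5],
    [0.9, 0.0, 0.6, 0.3],
    [0.2, 0.6, 0.0, 0.8],
    [0.5, 0.3, 0.8, 0.0],
]
cost, pairs = min_weight_perfect_matching(w)
print(cost, pairs)
```

Each returned pair is the set of two jobs to place on one dual-core chip.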
NP-Completeness Proof. Membership in NP: given a schedule, its total degradation can be computed in polynomial time. Reduction: from a known NP-complete problem, the Multidimensional Assignment Problem (MAP), to job co-scheduling.
NP-Completeness Proof. MAP: each assignment has a weight; the goal is to minimize the total weight. [Figure: MAP instance]
NP-Completeness Proof. Job co-scheduling on CMP: the weight of an assignment is the sum of degradations in that assignment; the goal is to minimize the total weight.
NP-Completeness Proof. From MAP to job co-scheduling: the weight of a co-scheduling assignment equals the weight of the corresponding MAP assignment; other assignments are given weight ∞. [Figure: MAP instance mapped to a job co-scheduling instance]
Approximation Algorithms: hierarchical perfect matching; greedy.
Hierarchical Perfect Matching: reuses the dual-core optimal solution level by level (N core, N/2 core, down to dual core).
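One way to read the hierarchy is to match single jobs into pairs, then match pairs into groups of four, and so on until each group fills a chip. The sketch below follows that reading under stated assumptions: `group_degradation` is a hypothetical callback, `best_pairing` is an exhaustive stand-in for a polynomial matching solver, and the toy pressure values are made up.

```python
from itertools import combinations


def best_pairing(items, cost):
    """Exhaustively find the pairing of items with the least summed cost.

    Stand-in for a minimum-weight perfect matching solver; exponential,
    so only for small inputs. Expects an even number of items.
    """
    if not items:
        return 0.0, []
    first, rest = items[0], items[1:]
    best = None
    for k, partner in enumerate(rest):
        remaining = rest[:k] + rest[k + 1:]
        sub_cost, sub_pairs = best_pairing(remaining, cost)
        total = cost(first, partner) + sub_cost
        if best is None or total < best[0]:
            best = (total, [(first, partner)] + sub_pairs)
    return best


def hierarchical_matching(jobs, cores_per_chip, group_degradation):
    """Merge groups pairwise, level by level, until each fills one chip.

    group_degradation(group) returns the total degradation of the jobs in
    the tuple `group` when they share a chip (hypothetical callback).
    """
    if cores_per_chip < 1 or cores_per_chip & (cores_per_chip - 1):
        raise ValueError("cores_per_chip must be a power of two")
    if len(jobs) % cores_per_chip != 0:
        raise ValueError("job count must be a multiple of cores_per_chip")
    groups = [(job,) for job in jobs]
    size = 1
    while size < cores_per_chip:
        # One matching level: nodes are current groups, edge weight is the
        # degradation of the merged group.
        _, pairs = best_pairing(groups, lambda a, b: group_degradation(a + b))
        groups = [a + b for a, b in pairs]
        size *= 2
    return groups


# Toy model (assumption): each job has a cache "pressure", and a group's
# degradation is the sum of pairwise pressure products.
pressure = {"A": 4, "B": 1, "C": 3, "D": 2}


def toy_degradation(group):
    return sum(pressure[a] * pressure[b] for a, b in combinations(group, 2))


print(hierarchical_matching(list(pressure), 2, toy_degradation))
```

Each level is solved optimally on its own, but the final grouping is a heuristic result and need not be the optimal K-core schedule.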
Greedy Algorithm. Basic idea: schedule the least "polite" job first. Politeness of a job: measured by the sum of degradations of all assignments that contain this job, reflecting the impact of the job on others (a larger sum means a less polite job).
Greedy Algorithm. I. Sort unassigned jobs by politeness. II. Pick the least polite job J to schedule. III. Add the assignment containing J with the least degradation to the schedule. IV. Update the unassigned job list.
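Steps I to IV can be sketched as follows. Several details are assumptions rather than the authors' implementation: the impact score is computed once over all candidate groups and not refreshed, `group_degradation` is a hypothetical callback, and the toy pressure model is invented for illustration.

```python
from itertools import combinations


def greedy_schedule(jobs, cores_per_chip, group_degradation):
    """Greedy co-scheduling sketch following steps I to IV.

    group_degradation(group) is a hypothetical callback returning the total
    degradation of the tuple `group` when its jobs share one chip.
    """
    if len(jobs) % cores_per_chip != 0:
        raise ValueError("job count must be a multiple of cores_per_chip")

    # Impact score: sum of degradations over every candidate assignment
    # containing the job. A larger score means a less polite job.
    impact = {job: 0.0 for job in jobs}
    for group in combinations(jobs, cores_per_chip):
        d = group_degradation(group)
        for job in group:
            impact[job] += d

    # Step I: sort so that the least polite job comes first.
    unassigned = sorted(jobs, key=lambda job: impact[job], reverse=True)
    schedule = []
    while unassigned:
        j = unassigned[0]  # Step II: least polite remaining job.
        others = unassigned[1:]
        # Step III: cheapest assignment among those that contain j.
        best = min(
            ((j,) + rest for rest in combinations(others, cores_per_chip - 1)),
            key=group_degradation,
        )
        schedule.append(best)
        # Step IV: drop the newly scheduled jobs from the unassigned list.
        unassigned = [job for job in unassigned if job not in best]
    return schedule


# Toy model (assumption): a group's degradation is the sum of pairwise
# products of made-up per-job cache pressures.
pressure = {"A": 4, "B": 1, "C": 3, "D": 2}


def toy_degradation(group):
    return sum(pressure[a] * pressure[b] for a, b in combinations(group, 2))


print(greedy_schedule(list(pressure), 2, toy_degradation))
```

In this toy instance the heaviest job A is placed first and paired with the lightest job B, which leaves C and D to share the second chip.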
Local Optimization. Main scheme (K: number of assignments):
  for i ← 1 to K-1
    for j ← i+1 to K
      Local-Optimization(i, j)
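The main scheme visits every pair of assignments once. The slide does not say what Local-Optimization(i, j) does internally, so the sketch below assumes one plausible behavior: pool the jobs of the two assignments and keep the cheapest way of splitting them back into two groups. The callback `group_degradation` and the toy pressure values are hypothetical.

```python
from itertools import combinations


def local_optimization(schedule, i, j, group_degradation):
    """Re-split the jobs of assignments i and j in the cheapest way.

    Assumed behavior, not taken from the slides. Job names must be unique.
    """
    pooled = schedule[i] + schedule[j]
    size_i = len(schedule[i])
    best_cost = group_degradation(schedule[i]) + group_degradation(schedule[j])
    for left in combinations(pooled, size_i):
        right = tuple(job for job in pooled if job not in left)
        cost = group_degradation(left) + group_degradation(right)
        if cost < best_cost:  # keep only strict improvements
            best_cost = cost
            schedule[i], schedule[j] = left, right


def optimize_schedule(schedule, group_degradation):
    """Main scheme: apply local optimization to every pair of assignments."""
    schedule = [tuple(group) for group in schedule]
    k = len(schedule)  # K: number of assignments
    for i in range(k - 1):         # for i <- 1 to K-1 (0-based here)
        for j in range(i + 1, k):  # for j <- i+1 to K
            local_optimization(schedule, i, j, group_degradation)
    return schedule


# Toy model (assumption): a group's degradation is the sum of pairwise
# products of made-up per-job cache pressures.
pressure = {"A": 4, "B": 1, "C": 3, "D": 2}


def toy_degradation(group):
    return sum(pressure[a] * pressure[b] for a, b in combinations(group, 2))


start = [("A", "C"), ("B", "D")]
print(optimize_schedule(start, toy_degradation))
```

Because only strict improvements are accepted, the total degradation of the schedule never increases during the pass.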
Performance Evaluation. Machine: AMD Opteron 4-core processors. Benchmarks: 15 from SPEC CPU2000, plus 1 Stream. Metrics: performance degradation, scheduling time, fairness.
Performance Degradation. [Figure: performance degradation (%) per benchmark, y-axis 0 to 70; series: OPT, Greedy, Hierarchical, Random]
Performance Degradation. [Figure: performance degradation (%) per benchmark, y-axis 0 to 70; series: OPT, Greedy-Opt, Hierarchical-Opt, Random; benchmarks: ammp, applu, art, bzip, crafty, equake, facerec, gap, mcf, parser, stream, swim, twolf, vpr, and Average]
Scheduling Time. [Figure: running time (s), y-axis 0 to 20, against number of jobs, x-axis 16 to 144; series: Greedy, Greedy-opt, Hierarchical, Hierarchical-opt]
Fairness. Unfairness factor: coefficient of variation of normalized degradation. [Figure: unfairness factor, y-axis 0 to 0.25; series: OPT, Greedy opt, Greedy, Hierarchical opt, Hierarchical, Random]
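The coefficient of variation is the standard deviation divided by the mean. A minimal sketch follows; the use of the population standard deviation is an assumption, since the slide does not say which variant was used, and the input is taken to be already normalized.

```python
from statistics import mean, pstdev


def unfairness_factor(normalized_degradations):
    """Coefficient of variation: standard deviation divided by the mean.

    Uses the population standard deviation (an assumption).
    """
    mu = mean(normalized_degradations)
    if mu == 0:
        raise ValueError("mean is zero; coefficient of variation is undefined")
    return pstdev(normalized_degradations) / mu


# Hypothetical values: equal degradations are perfectly fair.
print(unfairness_factor([0.2, 0.2, 0.2]))
print(unfairness_factor([1.0, 3.0]))
```

A value of zero means every job suffers the same normalized degradation; larger values mean the slowdown is spread less evenly.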
Conclusion. Job co-scheduling on CMP is crucial: performance varies across schedules. Dual-core systems: solvable in polynomial time. K-core (K>2) systems: NP-complete; heuristics: hierarchical, greedy.
Acknowledgement: Weizhen Mao (William and Mary); Cliff Stein (Columbia University); William Cook (Georgia Tech); National Science Foundation; IBM CAS Fellowship.
Thanks! Questions?