Exploration of Influence of Program Inputs on CMP Co-Scheduling


  1. Exploration of Influence of Program Inputs on CMP Co-Scheduling
     Yunlian Jiang, Xipeng Shen
     Computer Science, The College of William and Mary, USA

  2. Cache sharing in CMP
      Commercial CMPs
        Intel Core 2 Duo E6750
        AMD Athlon X2 6400+
     (Diagram: two CPU cores on one chip sharing an on-chip cache)

  3. Cache sharing
      Pros
        Shortens inter-thread communication
        Flexible usage of cache
      Cons: causes cache contention, which
        Degrades performance
        Impairs fairness
        Hurts performance isolation

  4. Job co-scheduling
      Assign jobs to chips so as to minimize contention
      Example: four programs P1-P4 to be placed on two dual-core CMP chips, Chip1 and Chip2
     (Diagram: P1-P4 and the two chips before any assignment)

  5. Job co-scheduling
      Assign jobs to chips so as to minimize contention
      Example
     (Diagram: one candidate assignment of P1-P4 to Chip1 and Chip2)

  6. Job co-scheduling
      Assign jobs to chips so as to minimize contention
      Example (a brute-force pairing sketch follows below)
     (Diagram: a different assignment of P1-P4 to Chip1 and Chip2)
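
The assignment illustrated on these slides is a pairing problem: split the jobs into pairs, one pair per dual-core chip, so that the total co-run penalty is minimized. Below is a minimal brute-force sketch in Python; the table degradation[(a, b)] of pairwise penalties and the numbers in it are hypothetical stand-ins, not the talk's actual cost model.

    def best_pairing(programs, degradation):
        """Try every way to split `programs` into pairs (one pair per dual-core
        chip) and return the pairing with the lowest total co-run penalty.
        Assumes an even number of jobs; exponential, so only for small job sets."""
        if not programs:
            return 0.0, []
        first, rest = programs[0], programs[1:]
        best_cost, best_pairs = float("inf"), None
        for partner in rest:
            remaining = [p for p in rest if p != partner]
            cost, pairs = best_pairing(remaining, degradation)
            # the pair's penalty is the degradation each member suffers
            cost += degradation[(first, partner)] + degradation[(partner, first)]
            if cost < best_cost:
                best_cost, best_pairs = cost, [(first, partner)] + pairs
        return best_cost, best_pairs

    # Made-up pairwise degradations for the P1..P4 example above.
    deg = {("P1", "P2"): 0.30, ("P2", "P1"): 0.25,
           ("P1", "P3"): 0.05, ("P3", "P1"): 0.10,
           ("P1", "P4"): 0.20, ("P4", "P1"): 0.15,
           ("P2", "P3"): 0.40, ("P3", "P2"): 0.35,
           ("P2", "P4"): 0.10, ("P4", "P2"): 0.05,
           ("P3", "P4"): 0.50, ("P4", "P3"): 0.45}
    print(best_pairing(["P1", "P2", "P3", "P4"], deg))
    # pairs P1 with P3 and P2 with P4 (total penalty ~0.30)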

  7. Previous co-scheduling work
      Runtime-sampling based
        Sample the performance of different schedules online and pick the best
        E.g., [Tullsen+: ASPLOS'00, ...]
      Profiling directed
        Offline profiling to learn program cache behavior
        E.g., [Nussbaum+: USENIX'05, ...]

  8. Our focus
      Two factors determining cache contention
        The programs running together
        The inputs to the programs

  9. Contributions of this work
      Exposing input impact on cache contention
      Construction of cross-input predictive models
      Evaluation on a proactive co-scheduler

  10. Contributions of this work
      Exposing input impact on cache contention
      Construction of cross-input predictive models
      Evaluation on a proactive co-scheduler

  11. Measurement of input impact
      Machine: Intel Xeon dual-core processors
      Compiler: gcc 4.1
      Hardware performance API: PAPI 3.5
      Experiments
        Measure the co-run performance degradation for every pair of the 12 SPEC CPU2000 programs
        3 different input sets (test, train, and ref)

  12. Metric
      sCPI: Cycles per Instruction (CPI) when a program runs alone
      cCPI: CPI when co-running with other programs
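
The slide defines the two CPIs but does not spell out how they combine; a natural degradation measure, consistent with the "normalized co-run degradation" results shown later (the exact formula is an assumption, not quoted from the slides), is

    degradation = (cCPI - sCPI) / sCPI

i.e., the relative slowdown a program suffers from sharing the cache.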

  13. Co-run degradation on different inputs
     (Chart: measured co-run degradations under the different input sets)

  14. Contributions of this work
      Exposing input impact on cache contention
      Construction of cross-input predictive models
      Evaluation on a proactive co-scheduler

  15. Objective
     (Diagram: an arbitrary input -> predictive model -> predicted cache behavior -> CAPS scheduler -> co-run schedule; a code-level sketch of this flow follows below)
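
In code form, the flow in this diagram might look like the sketch below. The callables predict_behavior and estimate_degradation are hypothetical stand-ins for the components named on these slides, and best_pairing is the earlier brute-force sketch; none of this is the actual CAPS implementation.

    def caps_schedule(jobs, input_features, predict_behavior, estimate_degradation):
        """Proactive flow: predict each job's single-run cache behavior from its
        input, derive pairwise contention estimates from those predictions, then
        search for the chip assignment with the least total degradation
        (best_pairing from the earlier sketch)."""
        behavior = {j: predict_behavior(j, input_features[j]) for j in jobs}
        penalty = {(a, b): estimate_degradation(behavior[a], behavior[b])
                   for a in jobs for b in jobs if a != b}
        return best_pairing(list(jobs), penalty)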

  16. Proactive Co-Scheduler: CAPS

  17. Single-run behaviors to predict
      Access per Instruction
        Density of memory references in an execution
      Distinct Memory Blocks per Cycle (DPC)
        Aggressiveness of cache contention
        DPC = Distinct Blocks per Instruction (DPI) x Instructions per Cycle (IPC)
      Reuse Signature
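
A worked example with made-up numbers: if a program touches 0.02 distinct memory blocks per instruction (DPI) and retires 1.5 instructions per cycle (IPC), its DPC is 0.02 x 1.5 = 0.03 distinct blocks per cycle; the higher this rate, the more aggressively the program competes for the shared cache.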

  18. Reuse signature
      Reuse distance: the number of distinct data elements accessed between two uses of the same datum
        E.g., in the trace "b a a c b", the reuse distance of the second b is 2
      Reuse signature: histogram of the reuse distances in an execution (a small computation sketch follows below)
        Predictable with over 94% accuracy [Zhong+: TC'07]
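
A minimal Python sketch of how a reuse signature could be computed from an address trace; this is a straightforward quadratic-time version for illustration, not the talk's actual profiling or prediction machinery, and the bin edges are arbitrary.

    from collections import Counter

    def reuse_signature(trace, bin_edges=(0, 1, 2, 4, 8, 16, 32)):
        """Histogram of reuse distances: for each access, count the distinct
        addresses touched since the previous access to the same address."""
        last_seen = {}          # address -> index of its most recent access
        histogram = Counter()
        for i, addr in enumerate(trace):
            if addr in last_seen:
                distance = len(set(trace[last_seen[addr] + 1 : i]))
                # drop the distance into a coarse (roughly logarithmic) bin
                bin_id = sum(1 for edge in bin_edges if distance >= edge)
                histogram[bin_id] += 1
            last_seen[addr] = i
        return histogram

    # The slide's example "b a a c b": the reuse of b sees 2 distinct
    # addresses (a and c) in between; the reuse of a sees 0.
    print(reuse_signature(["b", "a", "a", "c", "b"]))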

  19. Construction of predictive models
     (Diagram: training pairs <I1, B1> ... <In, Bn> of program inputs and measured memory behaviors feed a regression model; the resulting predictive model maps a new input to its predicted memory behavior)

  20. Regression models (sketched below)
      Linear model
        Least Mean Squares (LMS) method
        Linear function between inputs and outputs
      Non-linear model
        K-Nearest-Neighbor (KNN)
        Use the k most similar instances to estimate the new output value
      Hybrid method
        Pick the model with the minimum training error for each program
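
A compact Python/numpy sketch of the three options above: a least-squares linear fit, a k-nearest-neighbor estimator, and a hybrid that keeps whichever has the smaller training error. The feature and target arrays and the value of k are illustrative assumptions, not the paper's data.

    import numpy as np

    def fit_lms(X, y):
        """Linear model fitted by least squares: y ~ X @ w + b."""
        A = np.hstack([X, np.ones((len(X), 1))])        # add intercept column
        w, *_ = np.linalg.lstsq(A, y, rcond=None)
        return lambda x: float(np.append(x, 1.0) @ w)

    def fit_knn(X, y, k=3):
        """Non-linear model: average the targets of the k nearest training inputs."""
        def predict(x):
            nearest = np.argsort(np.linalg.norm(X - x, axis=1))[:k]
            return float(y[nearest].mean())
        return predict

    def fit_hybrid(X, y):
        """Per program, keep whichever model has the smaller training error."""
        candidates = [fit_lms(X, y), fit_knn(X, y)]
        errors = [np.mean([(m(x) - t) ** 2 for x, t in zip(X, y)]) for m in candidates]
        return candidates[int(np.argmin(errors))]

    # Illustrative data: input size -> accesses per instruction (made-up numbers).
    X = np.array([[1.0], [2.0], [4.0], [8.0]])
    y = np.array([0.31, 0.33, 0.36, 0.42])
    print(fit_hybrid(X, y)(np.array([6.0])))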

  21. Contributions of this work
      Exposing input impact on cache contention
      Construction of cross-input predictive models
      Evaluation on a proactive co-scheduler

  22. Prediction accuracy results (%)

      Program   | Access per instruction   | DPI
                | LMS     NN      Hybrid   | LMS     NN      Hybrid
      ----------+--------------------------+------------------------
      ammp      | 89.58   98.76   98.76    | 39.83   86.72   86.72
      art       | 98.86   94.25   98.86    | 98.96   94.25   98.96
      bzip      | 75.79   78.62   78.62    | 67.69   64.05   67.69
      crafty    | 99.54   99.24   99.54    | 76.31   72.50   76.31
      equake    | 54.58   54.42   54.58    | 82.27   82.13   82.27
      gap       | 74.75   79.35   79.35    | 79.87   78.08   79.87
      gzip      | 82.76   86.98   86.98    | 77.85   66.47   77.85
      mcf       | 90.25   92.45   92.45    | 89.73   88.11   89.73
      mesa      | 96.39   96.98   96.98    | 89.43   93.33   93.33
      parser    | 96.02   98.61   98.61    | 89.49   70.42   89.49
      twolf     | 97.11   98.10   98.10    | 52.12   86.75   86.75
      vpr       | 81.50   81.50   81.50    | 96.30   95.28   96.30
      ----------+--------------------------+------------------------
      Average   | 86.43   88.27   88.69    | 78.32   81.51   85.44
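
The slides do not state how accuracy is computed; a common convention for tables like this one (an assumption here) is accuracy = (1 - |predicted - measured| / measured) x 100%, averaged over the test inputs of each program.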

  23. Effects on co-scheduling
     (Chart: normalized co-run degradation of the optimal, CAPS-real, CAPS-pred, and random schedules; y-axis from 0 to 2.5)

  24. Conclusion
      Inputs influence job co-scheduling
        Co-schedulers should adapt to program inputs
      Cross-input predictive models
        Reasonable accuracy through LMS and KNN
        Effective in proactive co-scheduling

  25. Thanks! Questions?
