
Exploiting Hardware Heterogeneity for Interactive Services

Presented by Yuxiong He (Microsoft Research). Joint work with Shaolei Ren (Florida International University), Sameh Elnikety (Microsoft Research), and Kathryn S. McKinley (Microsoft Research).


  1. Exploiting Hardware Heterogeneity for Interactive Services • Yuxiong He (Microsoft Research) • Joint work with Shaolei Ren (Florida International University), Sameh Elnikety (Microsoft Research), and Kathryn S. McKinley (Microsoft Research)

  2. Interactive Services • Applications – Web search, web server, finance server • Requirements – High quality, fast response – High throughput, low cost

  3. Hardware for Interactive Services in Today’s Data Center • Homogeneous servers – Few fast high-performance cores, or – Many slow energy-efficient cores

  4. Variance of Job Service Demand • Homogeneous server with slow cores: cannot satisfy QoS of long requests • Homogeneous server with fast cores: meets QoS but is energy-consuming and has lower throughput [Figure: measured Bing search service demand distribution; probability (0 to 0.4) vs. service demand (5 to 95 ms)]

  5. Opportunity of Heterogeneity • Heterogeneous server: combine fast and slow cores • Challenges: 1. Service demand is unknown. 2. Jobs compete for cores. [Figure: measured Bing search service demand distribution; probability (0 to 0.4) vs. service demand (5 to 95 ms)]

  6. Contributions • FOF scheduler for heterogeneous servers • Bing search server simulation – Double throughput while meeting QoS • FOF for servers with SMT (Simultaneous Multithreading) • Finance server implementation – 16% higher throughput than default OS scheduler

  7. Scheduling Model • Inputs – Queue of jobs – Job service demand unknown – Job deadline – Partial results [Figure: measured Bing search quality profile]

  8. Scheduling Model • Inputs – Queue of jobs – Job service demand unknown – Job deadline – Partial results • Outputs – Assign jobs to fast/slow cores – Decide processing time of jobs • Objective – Maximize total quality of all jobs

  9. Challenge I. Unknown Service Demand • How can we assign long jobs to fast cores and short jobs to slow cores? • Key insight: Slow to Fast – Migrate a job from slower to faster cores – Short jobs complete on slow cores – Leave fast cores for long jobs

  10. Challenge II. Jobs Compete for Cores • Which jobs should be processed by fast cores? • Key insight: Fast Old – Assign fast cores to old jobs.

  11. “Fast Old” insight • Older job has closer deadline. • Older job has more work left. • “Fast old” improves response quality [Figure: service demand distribution; probability (0 to 0.4) vs. service demand (5 to 95), annotated with 27.2 ms, 20 ms, and 31.6 ms]
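
The claim that an older job has more work left can be checked numerically for a skewed demand distribution. The sketch below is only an illustration: the three-point distribution is hypothetical (it is not the measured Bing data), and it assumes a job's age equals the work it has already received on a unit-speed core.

```python
def expected_remaining(dist, age_ms):
    """Mean remaining demand of a job that is still running after age_ms.

    dist maps service demand (ms) -> probability. Assumes at least one
    demand value exceeds age_ms.
    """
    # Keep only demands consistent with the job not having finished yet.
    alive = {d: p for d, p in dist.items() if d > age_ms}
    total = sum(alive.values())
    return sum((d - age_ms) * p for d, p in alive.items()) / total

# Hypothetical distribution: mostly short jobs, with a long tail.
HYPOTHETICAL_DEMAND = {10: 0.7, 60: 0.2, 100: 0.1}

fresh = expected_remaining(HYPOTHETICAL_DEMAND, 0)   # job just arrived
aged = expected_remaining(HYPOTHETICAL_DEMAND, 20)   # job survived 20 ms
```

With these made-up numbers a new job expects about 29 ms of work, while a job that has already run for 20 ms expects about 53 ms more, because surviving that long rules out the short jobs.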

  12. FOF Scheduler: Fast Old & First 1. Fast first: always use the fastest available core 2. Fast old: promote old jobs slow to fast [Diagram: Slow, Medium, and Fast cores]
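
The two rules on this slide can be sketched as a single assignment step. This is a minimal sketch, not the authors' implementation: the `Job` and `Core` types, the relative speeds, and the idea of recomputing the assignment on every arrival or completion are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    arrival_ms: float  # earlier arrival means an older job

@dataclass
class Core:
    name: str
    speed: float  # hypothetical relative speed; larger is faster

def fof_assign(jobs, cores):
    """Return {job name: core name}: the oldest job gets the fastest core, and so on."""
    oldest_first = sorted(jobs, key=lambda j: j.arrival_ms)
    fastest_first = sorted(cores, key=lambda c: c.speed, reverse=True)
    # zip stops at the shorter list: with fewer jobs than cores the slowest
    # cores stay idle (fast first); with more jobs, the youngest jobs wait.
    return {j.name: c.name for j, c in zip(oldest_first, fastest_first)}

cores = [Core("slow", 1.0), Core("medium", 2.0), Core("fast", 4.0)]

# Two running jobs: j1 is older, so it holds the fast core.
assignment = fof_assign([Job("j1", 0.0), Job("j2", 5.0)], cores)

# j1 completes and j3 arrives: recomputing promotes j2 from medium to fast.
after_completion = fof_assign([Job("j2", 5.0), Job("j3", 9.0)], cores)
```

Recomputing the whole mapping keeps the sketch short. A real scheduler would more likely migrate only the jobs whose core changes, since each migration has a cost.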

  13. Evaluation • Simulation modeling Bing search workload • Hardware: 4 server configurations with the same design-time power budget – A: 2 Big cores (Sandy Bridge) – B: 10 Medium cores (Nehalem) – C: 24 Small cores (AtomD) – D: 1 B + 4 M + 2 S

  14. Homogeneous Fast vs Slow Cores [Figure: quality (0.975 to 1, with 0.998 marked) vs. arrival rate in queries per second (10 to 100) for homogeneous configurations A. 2 Fast, B. 10 Medium, C. 24 Slow]

  15. Homogeneous Fast vs Slow Cores [Figure: quality (0.975 to 1, with 0.998 marked) vs. arrival rate in queries per second (10 to 100) for homogeneous configurations A. 2 Fast, B. 10 Medium, C. 24 Slow; configuration A labeled on the plot]

  16. Homogeneous Fast vs Slow Cores [Figure: quality (0.975 to 1, with 0.998 marked) vs. arrival rate in queries per second (10 to 100) for homogeneous configurations A. 2 Fast, B. 10 Medium, C. 24 Slow; configurations A and B labeled on the plot]

  17. Heterogeneous vs. Homogeneous • FOF on heterogeneous D (1 Fast + 4 Medium + 2 Slow): double throughput, or buy 50% fewer servers [Figure: two panels of quality (0.975 to 1, with 0.998 marked) vs. QPS (10 to 100), comparing homogeneous A. 2 Big, B. 10 Medium, C. 24 Small with heterogeneous D]

  18. Opportunities on Existing Data Center Hardware • SMT (Simultaneous Multithreading) or Hyperthreading • SMT creates asymmetry among cores – Fast core: a physical core only runs one job – Slow core: two logical cores belonging to the same physical core both run jobs

  19. Insight: SMT = dynamic heterogeneous core [Diagram: Simultaneous Multithreading (SMT) configurations of a 4-core server: SMT off (4 fast); SMT on 1 core (3 fast + 2 slow); 2 fast + 4 slow; ...; SMT on all cores (8 slow)]
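
The range of configurations on this slide, from 4 fast cores to 8 slow ones, follows from simple counting. The sketch below assumes 2-way SMT, so each physical core that shares contributes two slow logical cores; the function name is made up for illustration.

```python
def smt_configurations(physical_cores):
    """List (fast, slow) core counts as 2-way SMT is used on 0..physical_cores cores."""
    configs = []
    for sharing in range(physical_cores + 1):
        fast = physical_cores - sharing  # physical cores running a single job
        slow = 2 * sharing               # two logical cores per sharing physical core
        configs.append((fast, slow))
    return configs

configs = smt_configurations(4)
```

For 4 physical cores this yields (4, 0), (3, 2), (2, 4), (1, 6), and (0, 8), so the scheduler can move along this spectrum at run time simply by choosing how many physical cores run two jobs.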

  20. FOF Scheduler for SMT 1. Fast first: fastest = unshared core 2. Fast old: free core? Find shared pair (oldest, X), move X to free core
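
One way to read these two rules is sketched below. It is an interpretation, not the authors' code: each physical core is modeled as a list of at most two job names, "oldest" is taken to mean the oldest job currently running on a shared core, and the function names are hypothetical.

```python
def place_job(cores, job):
    """Fast first: prefer an idle physical core; otherwise share the least-loaded one."""
    target = min(cores, key=len)  # an empty core sorts before a core with one job
    if len(target) >= 2:
        return False  # every logical core is busy; the caller keeps the job queued
    target.append(job)
    return True

def promote_oldest(cores, arrival):
    """Fast old: if a physical core is free, unshare the pair holding the oldest job.

    Returns the name of the job that was moved, or None if nothing moved.
    """
    free = next((c for c in cores if not c), None)
    shared = [c for c in cores if len(c) == 2]
    if free is None or not shared:
        return None
    # Shared pair (oldest, X): the pair containing the oldest shared job.
    pair = min(shared, key=lambda c: min(arrival[j] for j in c))
    oldest = min(pair, key=lambda j: arrival[j])
    partner = next(j for j in pair if j != oldest)
    # Move X to the free core; the oldest job now has its physical core to itself.
    pair.remove(partner)
    free.append(partner)
    return partner

arrival = {"a": 0, "b": 1, "c": 2}  # hypothetical arrival times in ms
cores = [[], []]                    # two physical cores, both idle
for job in ("a", "b", "c"):
    place_job(cores, job)           # c has to share a core with a
cores[1].remove("b")                # job b completes, freeing a physical core
moved = promote_oldest(cores, arrival)
```

Moving the partner X rather than the oldest job means the oldest job speeds up without paying a migration cost itself, which may be why the slide phrases the rule this way.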

  21. Evaluation • Implementation on Finance application: Monte-Carlo computation for option price • Hardware: 6-core, 2-way SMT, 3.33 GHz Intel Xeon X5680 – Shared (slow) SMT-core speed = 0.63 × unshared (fast) core speed • FOF achieves – 16% higher throughput than default OS scheduler while meeting QoS
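
The 0.63 speed ratio makes the trade-off easy to quantify. The sketch below is back-of-the-envelope arithmetic only: it assumes both logical cores of a sharing physical core run at 0.63 of unshared speed and that speeds add linearly. The function names and the 10 ms job are hypothetical.

```python
SHARED_SPEED = 0.63  # shared SMT-core speed relative to an unshared core (from the slide)

def aggregate_speed(physical_cores, sharing_cores, shared_speed=SHARED_SPEED):
    """Total processing speed, in units of one unshared core."""
    unshared = physical_cores - sharing_cores
    return unshared * 1.0 + sharing_cores * 2 * shared_speed

def time_on_shared_core(unshared_ms, shared_speed=SHARED_SPEED):
    """Run time on a shared logical core for a job that takes unshared_ms alone."""
    return unshared_ms / shared_speed

all_fast = aggregate_speed(6, 0)      # SMT unused: 6 fast cores
all_slow = aggregate_speed(6, 6)      # SMT on every core: 12 slow logical cores
slow_job_ms = time_on_shared_core(10.0)
```

Under these assumptions, sharing every core raises aggregate speed from 6 to about 7.56 core-equivalents (26% more), while a job that needs 10 ms on an unshared core takes roughly 15.9 ms on a shared one. That is the tension FOF manages: sharing buys throughput, and unsharing buys speed for the jobs closest to their deadlines.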

  22. Conclusions • FOF scheduler for interactive services – Exploit hardware heterogeneity – Achieve both high quality and high throughput • Heterogeneous servers: Bing search simulation – Double throughput while meeting QoS • SMT: Finance server implementation – 16% higher throughput than default OS scheduler

  23. Thank you & Questions
