Energy Proportionality and Workload Consolidation for Latency-critical Applications



  1. Energy Proportionality and Workload Consolidation for Latency-critical Applications. George Prekas, Mia Primorac, Adam Belay, Christos Kozyrakis, Edouard Bugnion

  2. Latency-critical applications [OSDI14]. Figure: 99th % latency (µs) vs. throughput (RPS x 10^6) for Linux and IX. • Memcached, Facebook USR workload, 2752 connections • Server: Xeon E5-2665 @ 2.4 GHz, 8 cores and 16 HTs, Intel x520

  3. Latency-critical applications [OSDI14]. Figure: 99th % latency (µs) vs. throughput (RPS x 10^6) for Linux and IX, with the SLO marked. • Memcached, Facebook USR workload, 2752 connections • Server: Xeon E5-2665 @ 2.4 GHz, 8 cores and 16 HTs, Intel x520

  4. What about energy efficiency? Figure: 99th % latency (µs) and power (W) vs. throughput (RPS x 10^6) for Linux and IX. Over-configurations drain a lot of power. • Memcached, Facebook USR workload, 2752 connections • Server: Xeon E5-2665 @ 2.4 GHz, 8 cores and 16 HTs, Intel x520

  5. Static configurations trade-off. Figure: power (W) vs. Linux - Memcached MRPS at SLO. Labeled configuration: nominal (2.4 GHz, 8 cores)

  6. Static configurations trade-off. Figure: power (W) vs. Linux - Memcached MRPS at SLO. Labeled configurations: nominal (2.4 GHz, 8 cores); min (1.2 GHz, 1 core)

  7. Static configurations trade-off. Figure: power (W) vs. Linux - Memcached MRPS at SLO. Labeled configurations: max (TurboBoost, 8 cores); nominal (2.4 GHz, 8 cores); min (1.2 GHz, 2 cores)

  8. Static configurations trade-off. Figure: power (W) for 224 static configurations. Labeled configurations: max (TurboBoost, 8 cores); nominal (2.4 GHz, 8 cores); min (1.2 GHz, 2 cores)

  9. Pareto Frontier. Figure: power (W) for Linux - 224 static configurations, with the Pareto frontier drawn. Labeled configurations: max (TurboBoost, 8 cores); nominal (2.4 GHz, 8 cores); min (1.2 GHz, 2 cores). Theoretical optimum: pick the best static configuration for any given load level
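The Pareto frontier and the "best static configuration for a given load" rule can be illustrated with a minimal Python sketch. It assumes a simplified model in which each static configuration is summarized by one capacity (the highest MRPS it serves within the SLO) and one power figure; the `Config` type, the function names, and all numbers are hypothetical and are not taken from the talk.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Config:
    """Hypothetical summary of one static configuration."""
    name: str
    max_mrps: float  # highest load served while meeting the SLO (assumed model)
    power_w: float   # power drawn by this configuration (assumed model)


def pareto_frontier(configs: List[Config]) -> List[Config]:
    """Keep configurations that no other one beats on both capacity and power."""
    frontier = []
    lowest_power = float("inf")
    # Walk from highest capacity down; a configuration survives only if it
    # draws less power than every configuration with at least its capacity.
    for c in sorted(configs, key=lambda c: (-c.max_mrps, c.power_w)):
        if c.power_w < lowest_power:
            frontier.append(c)
            lowest_power = c.power_w
    return frontier[::-1]  # ascending capacity


def best_static(configs: List[Config], load_mrps: float) -> Optional[Config]:
    """Cheapest configuration that can still serve the load, or None."""
    feasible = [c for c in configs if c.max_mrps >= load_mrps]
    return min(feasible, key=lambda c: c.power_w, default=None)


# Made-up example data, for illustration only.
EXAMPLE = [
    Config("min", 0.2, 20.0),
    Config("mid", 0.6, 45.0),
    Config("wasteful", 0.5, 60.0),
    Config("nominal", 0.9, 70.0),
    Config("max", 1.1, 90.0),
]
```

In this toy data set, "wasteful" is dominated by "mid" (less capacity, more power), so it drops off the frontier. The real analysis measures power as a function of load for each of the 224 configurations, which this sketch does not model.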

  10. Potential energy savings. Figure: power (W) vs. Memcached MRPS at SLO. Curves: Linux max, IX max, Linux Pareto, IX Pareto

  11. Contributions • Dynamic resource controls – for low-latency, high-throughput dataplanes – Supports energy proportionality and workload consolidation policies • Evaluation of dynamic resource controls vs. – Maximum configuration (static) – Pareto-optimal behavior (theoretical bound)

  12. Dynamic resource control. Diagram (step: add a core). Labels: latency-critical workload on the IX dataplane; CP (policy); background task; userspace; host kernel; 2 RX/TX queue pairs; 5 cores (C)

  13. Dynamic resource control. Diagram. Labels: latency-critical workload on the IX dataplane; CP (policy); background task; userspace; host kernel; 3 RX/TX queue pairs; 5 cores (C)

  14. Dynamic resource control. Diagram. Labels: latency-critical workload on the IX dataplane; CP (policy); userspace; host kernel; 4 RX/TX queue pairs; 5 cores (C)

  15. Dynamic resource control. Diagram. Labels: latency-critical workload on the IX dataplane; CP (policy); userspace; host kernel; 4 RX/TX queue pairs; 5 cores (C)

  16. Key challenges 1. Which resources to add/remove? – inferred by Pareto analysis – different for energy proportionality and workload consolidation 2. When to add/remove resources? – Need to design a stable control loop – Different triggers to add/remove resources 3. How to add/remove cores – Fast, TCP-friendly rebalancing mechanism

  17. #1: Resource Adjustment Policies. Figure: power (W). Labels: A; +Turbo; +core; +ht; +dvfs

  18. #2: Detection • Centralized queues provide a single point of load detection in sub-second timescales. Diagram labels: RX; TCP/IP; event-driven app; FIFO

  19. #2: Detection (add) • Centralized queues provide a single point of load detection in sub-second timescales. Diagram labels: RX; TCP/IP; event-driven app; FIFO; Q; queuing delay < 300 µs
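A stable control loop with different triggers for adding and removing cores can be sketched as below. This is only an illustration, not the talk's policy: the 300 µs figure is taken from the slide, but its exact role is assumed here (exceeding it triggers "add a core"), and the class name, the idle threshold, and the number of quiet rounds required before removing a core are all hypothetical.

```python
QUEUE_DELAY_TARGET_US = 300.0  # figure from the slide; its role as the "add" trigger is assumed


class CoreController:
    """Hypothetical hysteresis controller driven by one queuing-delay signal."""

    def __init__(self, min_cores=1, max_cores=8,
                 idle_delay_us=50.0, idle_rounds_to_remove=3):
        self.cores = min_cores
        self.min_cores = min_cores
        self.max_cores = max_cores
        self.idle_delay_us = idle_delay_us                  # assumed "quiet" threshold
        self.idle_rounds_to_remove = idle_rounds_to_remove  # assumed damping factor
        self.idle_rounds = 0

    def step(self, queuing_delay_us: float) -> int:
        """Consume one measurement and return the new core count."""
        if queuing_delay_us > QUEUE_DELAY_TARGET_US:
            # Add trigger: react immediately to protect the SLO.
            self.idle_rounds = 0
            if self.cores < self.max_cores:
                self.cores += 1
        elif queuing_delay_us < self.idle_delay_us:
            # Remove trigger: require several quiet rounds in a row so the
            # loop does not oscillate between adding and removing.
            self.idle_rounds += 1
            if (self.idle_rounds >= self.idle_rounds_to_remove
                    and self.cores > self.min_cores):
                self.cores -= 1
                self.idle_rounds = 0
        else:
            # In between the two thresholds: hold the current allocation.
            self.idle_rounds = 0
        return self.cores
```

The asymmetry (add on one bad sample, remove only after several quiet ones) is one common way to keep such a loop stable; the actual triggers used in the work may differ.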

  20. #3: TCP-friendly add/remove core • Challenge: maintain coherence-free IX design – Queues dedicated to cores – Lock-free TCP stack • Challenge: inherent race condition in NIC HW update. Diagram: the NIC's RSS hash assigns flow groups to per-core TCP/IP and App instances
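The idea of flow groups can be illustrated with a small Python sketch: each flow hashes to a fixed flow group, and an indirection table maps flow groups to cores, so adding a core only rewrites table entries. Everything here is an assumption made for illustration: `zlib.crc32` stands in for the NIC's RSS hash, 128 flow groups is an assumed table size, and the rebalancing rule (take groups from the busiest core) is hypothetical. The sketch also ignores the hardware race condition and the packet-ordering guarantees the slide refers to.

```python
import zlib
from collections import Counter
from typing import List

NUM_FLOW_GROUPS = 128  # assumed size of the indirection table


def flow_group(src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> int:
    """Map a flow to a flow group; crc32 is a stand-in for the RSS hash."""
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key) % NUM_FLOW_GROUPS


def build_table(num_cores: int) -> List[int]:
    """Initial flow-group -> core table, spread round-robin."""
    return [g % num_cores for g in range(NUM_FLOW_GROUPS)]


def add_core(table: List[int], new_core: int) -> List[int]:
    """Reassign an even share of flow groups to new_core, in place.

    Returns the ids of the flow groups that moved; all other groups,
    and therefore all flows in them, stay on their current core.
    """
    cores_after = set(table) | {new_core}
    share = len(table) // len(cores_after)
    moved = []
    while len(moved) < share:
        # Pick the core that currently owns the most flow groups.
        counts = Counter(c for c in table if c != new_core)
        donor = counts.most_common(1)[0][0]
        group = table.index(donor)
        table[group] = new_core
        moved.append(group)
    return moved
```

Because the flow-to-group hash never changes, a connection's packets always land in the same flow group; only whole groups move between cores, which is what lets each core keep its own queues and TCP state without sharing.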

  21. #3: Flow-group Migration Algorithm • TCP-friendly • Without dropping or reordering packets • Completes in less than 2 ms 95% of the time

  22. Experimental setup • Latency-sensitive workload – memcached – Energy prop.; workload consolidation • 3 demanding synthetic load patterns: – slope, step, sine+noise – 4 min cycle time • 10 load-generating clients + 1 latency-measuring client @ 1-second intervals • 2752 connections; Poisson distribution

  23. Evaluation – step pattern. Figure: achieved MRPS and 99th pct latency (µs) vs. time (s); annotation: add a core. Adequate compliance: violations ~ 1 second

  24. Evaluation – step pattern. Figure: power (W) vs. time (s) for max, dynamic, and Pareto. Average: max = 91 W, dynamic = 48 W, Pareto = 41 W

  25. Evaluation

                                                      Slope   Step   Sine+noise
      Energy Proportionality (W)
        Max. power                                    91      92     94
        Measured                                      42      48     53
        Pareto optimal                                39      41     45
      Server Consolidation Opportunity (% of peak)
        Measured                                      46%     39%    32%
        Pareto optimal                                50%     47%    39%

      Energy proportionality: saving 44%-54% of processor energy; 85%-93% of Pareto-optimal bound
      Server consolidation: running background job at 32%-46% of their standalone throughput; 82%-92% of the Pareto-optimal bound
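The summary percentages on this slide follow from the per-pattern numbers. The short Python check below recomputes them; the variable and function names are illustrative, and the formulas (saving = 1 - measured / max, Pareto fraction = Pareto / measured for power, measured / Pareto for consolidation) are the interpretation assumed here because they reproduce the slide's ranges.

```python
# Per-pattern values copied from the slide.
PATTERNS = ["slope", "step", "sine+noise"]
MAX_POWER_W = {"slope": 91, "step": 92, "sine+noise": 94}
MEASURED_W = {"slope": 42, "step": 48, "sine+noise": 53}
PARETO_W = {"slope": 39, "step": 41, "sine+noise": 45}
MEASURED_CONSOLIDATION = {"slope": 46, "step": 39, "sine+noise": 32}  # % of peak
PARETO_CONSOLIDATION = {"slope": 50, "step": 47, "sine+noise": 39}    # % of peak


def energy_saving(pattern: str) -> float:
    """Fraction of the max configuration's power saved by dynamic control."""
    return 1 - MEASURED_W[pattern] / MAX_POWER_W[pattern]


def power_vs_pareto(pattern: str) -> float:
    """How close measured power is to the Pareto bound (1.0 = optimal)."""
    return PARETO_W[pattern] / MEASURED_W[pattern]


def consolidation_vs_pareto(pattern: str) -> float:
    """Measured background throughput relative to the Pareto bound."""
    return MEASURED_CONSOLIDATION[pattern] / PARETO_CONSOLIDATION[pattern]


def percent_range(fn) -> tuple:
    """(min, max) of fn over all patterns, rounded to whole percent."""
    values = [fn(p) * 100 for p in PATTERNS]
    return round(min(values)), round(max(values))
```

For example, the slope pattern saves 1 - 42/91, about 54% of processor energy, and 39/42 is about 93% of the Pareto-optimal bound.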

  26. Conclusion • Real challenges to latency-sensitive applications – Maintain service-level objectives while – Minimize energy consumption or – Maximize workload consolidation • Design using Pareto methodology to determine theoretical bound and derive control policies • Implement dynamic resource controls to IX dataplane operating system

  27. Thank you
