Leveraging Approximation to Improve Resource Efficiency in the Cloud

  1. Leveraging Approximation to Improve Resource Efficiency in the Cloud
     Neeraj Kulkarni, Feng Qi, Glyfina Fernando, and Christina Delimitrou
     SAIL (Systems, Architecture and Infrastructure Lab), Cornell University
     WAX, April 9th, 2017

  2. Datacenter Underutilization
     [Figure: CPU utilization (%), axis from 0 to 100, for Twitter (Mesos) [1] and Google (Borg) [2], annotated "4-5x" and "3-5x"]
     [1] C. Delimitrou and C. Kozyrakis. Quasar: Resource-Efficient and QoS-Aware Cluster Management, ASPLOS 2014.
     [2] L. A. Barroso and U. Hölzle. The Datacenter as a Computer, 2013.

  3. A Common Approach
     [Diagram: App1 and App2]
     - Co-schedule multiple cloud services on the same physical platform
     - Often leads to resource interference, especially when sharing cores

  4. A Common Cure
     [Diagram: App1 and App2]
     - Co-schedule one high-priority app and one or more best-effort apps
     - Performance is non-critical for best-effort jobs
     - Disadvantage: assumes best-effort apps are always low priority

  5. Approximate Computing Apps to the Rescue
     [Diagram: App1 and App2]
     - Approximate computing apps can absorb a loss of resources as a loss of output quality instead of a loss in performance
     - Advantage: the performance of all co-scheduled applications is treated as high priority

  6. Pliant
     [Diagram: Pliant runtime managing App1 and App2]
     - Enables latency-critical and approximate apps to share resources (including cores) without penalizing their performance
     - Tunes the degree and type of approximation based on measured interference
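The tuning described on this slide can be pictured as a feedback loop. The sketch below is illustrative only: the slides do not give Pliant's actual policy, so the class name `ApproximationController`, the QoS target, the `slack` fraction, and the one-step-per-interval rule are all assumptions.

```python
class ApproximationController:
    """Hypothetical step controller; not Pliant's actual policy.

    Level 0 means the precise version; higher levels mean more approximation.
    """

    def __init__(self, qos_target_ms, max_level, slack=0.8):
        self.qos_target_ms = qos_target_ms  # assumed tail-latency target
        self.max_level = max_level          # assumed number of approximate versions
        self.slack = slack                  # assumed fraction of the target that counts as headroom
        self.level = 0

    def update(self, tail_latency_ms):
        """Move one level up on a QoS violation, one level down when there is headroom."""
        if tail_latency_ms > self.qos_target_ms and self.level < self.max_level:
            self.level += 1
        elif tail_latency_ms < self.slack * self.qos_target_ms and self.level > 0:
            self.level -= 1
        return self.level


# Made-up latency trace (ms) for the latency-critical service, one sample per interval.
trace = [0.8, 1.4, 1.3, 0.9, 0.5, 0.4]
controller = ApproximationController(qos_target_ms=1.0, max_level=3)
levels = [controller.update(sample) for sample in trace]
print(levels)
```

The gap between the violation threshold and the headroom threshold keeps the controller from oscillating between two levels when latency hovers near the target.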

  7. Challenges
     [Diagram: Pliant runtime managing App1 and App2]
     (1) Identify opportunities for approximation
         - ACCEPT (precision, loop perforation, sync elision), algorithmic exploration
     (2) Lightweight profiling to determine when to employ approximation
         - End-to-end latency/throughput and perf counters
     (3) Determine which resource(s) to constrain
         - Based on measured interference
     (4) Determine what type of approximation to use, and to what extent
         - Based on interference and performance impact
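Loop perforation, one of the techniques named above, skips a fraction of loop iterations and accepts a less exact result in return. A minimal sketch, assuming a simple mean over a list; the function names and the data are hypothetical and are not taken from ACCEPT or Pliant.

```python
def mean_precise(values):
    """Exact mean: visits every element."""
    return sum(values) / len(values)


def mean_perforated(values, stride):
    """Perforated mean: visits only every `stride`-th element."""
    sampled = values[::stride]
    return sum(sampled) / len(sampled)


def relative_error(approx, exact):
    return abs(approx - exact) / abs(exact)


# Made-up input data.
data = [float(i % 100) for i in range(10_000)]
exact = mean_precise(data)

for stride in (1, 2, 4, 8):
    approx = mean_perforated(data, stride)
    # Work done shrinks with the stride; the error is the quality loss paid for it.
    print(stride, len(data[::stride]), relative_error(approx, exact))
```

The stride acts as the knob for the degree of approximation: a larger stride does less work per call and generally loses more output quality.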

  8. Pliant
     [Diagram: a server running App1 and App2 under the Pliant runtime with an interference monitor; a client with a workload generator; a performance monitor]
     - DynamoRIO for switching between precise and approximate versions
     - Initial implementation; overheads are high but not prohibitive
     - Looking into PetaBricks and LLVM

  9. Adaptive Approximation
     - Incremental approximation:
         - Employ the minimum amount of approximation (quality loss) needed to restore the performance of the interactive service
         - Several versions for each type of approximation; choose among them online
     - Interference-aware approximation:
         - Choose the type of approximation that minimizes pressure on the bottlenecked resource
         - Example:
             - High memory interference → prioritize algorithmic tuning
             - High CPU interference → prioritize sync elision, loop perforation
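One way to combine the two ideas is to rank the available versions by approximation type first and by quality loss second, then take the first one that restores QoS. The sketch below assumes exactly that ordering. The `Version` record, its quality-loss numbers, and the `restores_qos` callback (a stand-in for an online measurement) are hypothetical; only the type preferences come from the slide's example.

```python
from dataclasses import dataclass

# Preferred approximation types per bottlenecked resource, from the slide's example.
PREFERRED_TYPES = {
    "memory": ["algorithmic tuning"],
    "cpu": ["sync elision", "loop perforation"],
}


@dataclass(frozen=True)
class Version:
    kind: str            # type of approximation
    quality_loss: float  # hypothetical quality loss as a fraction (0.05 = 5%)


def candidate_order(versions, bottleneck):
    """Preferred types for the bottleneck first, then increasing quality loss."""
    preferred = PREFERRED_TYPES.get(bottleneck, [])
    return sorted(versions, key=lambda v: (v.kind not in preferred, v.quality_loss))


def choose_version(versions, bottleneck, restores_qos):
    """Return the first candidate that restores QoS, or None if none does."""
    for version in candidate_order(versions, bottleneck):
        if restores_qos(version):
            return version
    return None


# Made-up versions and a made-up QoS check.
versions = [
    Version("loop perforation", 0.02),
    Version("loop perforation", 0.08),
    Version("sync elision", 0.05),
    Version("algorithmic tuning", 0.03),
    Version("algorithmic tuning", 0.10),
]


def frees_enough(version):
    # Assumption for the demo: only versions with at least 5% loss free enough resources.
    return version.quality_loss >= 0.05


print(choose_version(versions, "cpu", frees_enough))
print(choose_version(versions, "memory", frees_enough))
```

Ranking by type before quality loss is a design choice of this sketch: under memory interference it picks a 10% algorithmic-tuning version over a 5% sync-elision one. A runtime that weighs quality loss more heavily would order the candidates differently.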

  10. Methodology
      - Latency-critical interactive services: memcached and nginx
      - Open-loop workload generator and performance monitor
      - Facebook traffic pattern
      - Approximate computing apps: PARSEC, SPLASH, Spark MLlib
      - System: two 2-socket, 40-core servers, 128 GB RAM each
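An open-loop generator issues requests on a schedule that does not depend on when earlier responses arrive, so queueing delay under load shows up in the measured latency. The sketch below illustrates the idea with a simulation rather than real traffic. The Poisson arrival process, the fixed service time, and the single FIFO server are assumptions, and it does not model the Facebook traffic pattern used in the talk.

```python
import math
import random


def poisson_arrivals(rate_per_s, n, seed=0):
    """Send times (s) for n requests with exponential inter-arrival gaps."""
    rng = random.Random(seed)
    now = 0.0
    times = []
    for _ in range(n):
        now += rng.expovariate(rate_per_s)
        times.append(now)
    return times


def simulate_fifo_server(arrivals, service_time_s):
    """Per-request latency at one FIFO server, measured from the scheduled send time."""
    latencies = []
    free_at = 0.0
    for sent in arrivals:
        start = max(sent, free_at)          # wait if the server is still busy
        free_at = start + service_time_s
        latencies.append(free_at - sent)    # includes queueing delay
    return latencies


def percentile(samples, p):
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up load: 800 requests/s against a server needing 1 ms per request.
arrivals = poisson_arrivals(rate_per_s=800, n=10_000)
latencies = simulate_fifo_server(arrivals, service_time_s=0.001)
print(percentile(latencies, 50), percentile(latencies, 99))
```

Measuring from the scheduled send time, rather than from when the server starts work, is what keeps a slow server from hiding its own queueing delay.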

  11. Evaluation
      - memcached sharing physical cores with PARSEC
      - Latency
      - Degree of approximation

  12. Conclusions
      - Approximate computing: an opportunity to improve cloud efficiency without loss in performance
      - Pliant: a cloud runtime to co-schedule interactive services with approximate computing apps
          - Incremental and interference-aware approximation
          - Preserves QoS for the interactive service with minimal loss in quality for the approximate computing application
      - Current work:
          - DynamoRIO → PetaBricks/LLVM
          - Add cloud approximate computing applications
          - Improve interference awareness
          - Leverage hardware isolation techniques

  13. Questions?
