Improving Resource Efficiency in Cloud Computing
Christina Delimitrou
Stanford University
Defense – May 26th 2015
Resource efficiency is a first-order system constraint
How efficiently do we utilize resources?
How efficiently do we design systems?
Why Care about Resource Efficiency?
[Figure: two plots of Performance/Cost over Time]
~10K commodity servers
Sophisticated cluster managers
~10s MWatts
$100,000,000s
Private clouds:
• Google, Microsoft, Twitter, eBay
Public clouds:
• Amazon EC2, Windows Azure, GCE
The Promise of Cloud Computing
• Flexibility
  ◦ Provision and launch new services in seconds
• High performance
  ◦ High throughput & low tail latency
• Cost effectiveness
  ◦ Low capital & operational expenses
Cloud computing scalability: high performance AND low cost
The Reality of Cloud Computing 6
Scaling Datacenters
• Switch to commodity servers: one-time trick
• Improve cooling/power distribution: < 10%
• Build more datacenters: >$300M per datacenter
• Add more servers: power limit
• Rely on processor technology: end of voltage scaling
Use existing systems more efficiently
Datacenter Underutilization
[Figure: CPU utilization (%), axis from 0 to 100, for Twitter (Mesos) [1] and Google (Borg) [2]; annotations: 4-5x, 3-5x]
[1] C. Delimitrou and C. Kozyrakis. Quasar: Resource-Efficient and QoS-Aware Cluster Management, ASPLOS 2014.
[2] L. A. Barroso, U. Holzle. The Datacenter as a Computer, 2013.
Datacenter Underutilization… Is the cluster manager’s fault Is the user’s fault! 9
Reserved vs. Used Resources
[Figure: reserved vs. used resources; annotations: 1.5-2x, 3-5x]
• Twitter: up to 5x CPU & up to 2x memory overprovisioning
Reserved vs. Used Resources
~25,000 jobs, 936 distinct users [ASPLOS’14]
[Figure: reserved vs. used resources, with the Reservation=Usage line marked]
• 20% of jobs under-sized, ~70% of jobs over-sized
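The under-sized/over-sized split can be made concrete with a small sketch. It assumes each job is described by a reserved and a used amount of one resource; the `sizing` function, the 10% `tolerance` band around Reservation=Usage, and the three sample jobs are all hypothetical and are not taken from the Twitter trace.

```python
def sizing(reserved: float, used: float, tolerance: float = 0.1) -> str:
    """Classify one job by comparing its reservation with its measured usage.

    `tolerance` is a hypothetical band around Reservation=Usage.
    """
    if used <= 0:
        raise ValueError("used must be positive")
    ratio = reserved / used  # > 1 means more was reserved than used
    if ratio > 1 + tolerance:
        return "over-sized"
    if ratio < 1 - tolerance:
        return "under-sized"
    return "right-sized"


# Hypothetical jobs: name -> (reserved cores, used cores).
jobs = {"job-a": (16.0, 4.0), "job-b": (2.0, 4.0), "job-c": (8.0, 8.0)}
labels = {name: sizing(reserved, used) for name, (reserved, used) in jobs.items()}

for name, label in labels.items():
    print(name, label)
```

A reserved/used ratio of 4, as for the hypothetical `job-a`, falls in the 3-5x CPU overprovisioning range reported for the Twitter cluster.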
Datacenter Underutilization… Is the user’s fault! (not really…) 12
Resource Management is Hard 13
Performance Depends on
Scale-up [Plot: Performance vs. Cores]

Performance Depends on
Heterogeneity [Plot: Performance vs. Cores]

Performance Depends on
Heterogeneity [Plot: Performance vs. Cores]
Scale-out [Plot: Performance vs. Servers]

Performance Depends on
Heterogeneity [Plot: Performance vs. Cores]
Scale-out [Plot: Performance vs. Servers]
Input load [Plot: Performance vs. Input size]

Performance Depends on
Heterogeneity [Plot: Performance vs. Cores]
Scale-out [Plot: Performance vs. Servers]
Input load [Plot: Performance vs. Input size]
Interference [Plot: Performance vs. Interference]
When sw changes, when platforms change, etc.
Overprovision Reservations!
Can we improve resource efficiency while preserving application QoS guarantees? Potential: 3-5x efficiency; $10Ms in cost savings 19
Requirements
• Automate resource management
  ◦ Large, multi-dimensional space → Leverage big data
• General solution
  ◦ Different application types (batch, latency-critical)
  ◦ Different types of hardware
• Cross-layer design
  ◦ Architecture → OS → Scheduler → Application design
Contributions 21
Contributions
Paragon [ASPLOS’13, TopPicks’14] [IISWC’13]
[Diagram: Users, resource reservations, Scheduler, Cluster]
1. Practical data mining

Contributions
Quasar [ASPLOS’14]
[Diagram: Users, resource reservations, Scheduler, Cluster]
1. Practical data mining
2. High level interface
Contributions
Systems:
• Application assignment: Paragon [ASPLOS’13, TopPicks’14, CAL’13, IISWC’13]
• Cluster management: Quasar [ASPLOS’14]

Contributions
Systems:
• Application assignment: Paragon [ASPLOS’13, TopPicks’14], iBench [IISWC’13]
• Cluster management: Quasar [ASPLOS’14]
• Scalable scheduling: Tarcil [SOCC’15]

Contributions
Systems:
• Application assignment: Paragon [ASPLOS’13, TopPicks’14], iBench [IISWC’13]
• Cluster management: Quasar [ASPLOS’14]
• Scalable scheduling: Tarcil [SOCC’15]
• Cloud provisioning: Hybrid Cloud [in submission]

Contributions
Systems:
• Application assignment: Paragon [ASPLOS’13, TopPicks’14], iBench [IISWC’13]
• Cluster management: Quasar [ASPLOS’14]
• Scalable scheduling: Tarcil [SOCC’15]
• Cloud provisioning: Hybrid Cloud [in submission]
• Admission control: ARQ [ICAC’13]
Contributions
Systems:
• Application assignment: Paragon [ASPLOS’13, TopPicks’14], iBench [IISWC’13]
• Cluster management: Quasar [ASPLOS’14]
• Scalable scheduling: Tarcil [SOCC’15]
• Cloud provisioning: Hybrid Cloud [in submission]
• Admission control: ARQ [ICAC’13]
Datacenter application modeling: ECHO [IISWC’12], Storage application modeling [CAL’12, IISWC’11, ISPASS’11]
Paragon [ASPLOS’13, TopPicks’14]
[Diagram: Users, resource reservations, Scheduler, Cluster]
Practical data mining techniques
Heterogeneity & Interference Matter
• Heterogeneity
  ◦ DCs provisioned over 15 years
  ◦ Multiple server generations & configurations
• Interference
  ◦ Apps contend on shared resources
    ▪ CPU & cache hierarchy
    ▪ Memory system
    ▪ Storage & network I/O
[Figure panels: Ignore Heterogeneity, Ignore Both]
Extracting Resource Preferences
• Naïve: exhaustive characterization
  ◦ ~10-20 platforms x 1,000 apps
[Diagram: Users, resource reservations, Scheduler, Cluster of apps; mine big data]
• Looks like a recommendation problem
Recommendation Systems
• Content-based systems:
  ◦ Description of items (keywords, feature vector, etc.)
  ◦ Profile of user preferences (history, model, user-system interaction, etc.)
• Collaborative filtering:
  ◦ Uncover similarities between users and items
  ◦ No need to know item features or explicit user preferences in advance
Something familiar…
• Collaborative filtering – similar to Netflix Challenge system
  ◦ Singular Value Decomposition (SVD) + PQ reconstruction (SGD)
[Figure: sparse utility matrix (users × movies) → SVD → PQ reconstruction → dense utility matrix → recommendations]
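The PQ reconstruction step can be sketched as plain SGD matrix factorization: two small factor matrices are fit to the observed entries only, and their product fills in the missing ones. This is a minimal sketch under stated assumptions, not the Paragon implementation: the names `factorize` and `predict`, the rank, learning rate, regularization, epoch count, and the toy ratings are all hypothetical, and the factors start from random values rather than from an SVD.

```python
import random


def factorize(ratings, n_rows, n_cols, rank=2, lr=0.01, reg=0.02,
              epochs=2000, seed=0):
    """Fit P (n_rows x rank) and Q (n_cols x rank) to the observed entries.

    `ratings` maps (row, col) -> observed value; all other entries are unknown.
    """
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_rows)]
    Q = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_cols)]
    observed = sorted(ratings.items())
    for _ in range(epochs):
        rng.shuffle(observed)
        for (i, j), value in observed:
            err = value - sum(P[i][k] * Q[j][k] for k in range(rank))
            for k in range(rank):
                p, q = P[i][k], Q[j][k]
                # Gradient step on squared error with L2 regularization.
                P[i][k] += lr * (err * q - reg * p)
                Q[j][k] += lr * (err * p - reg * q)
    return P, Q


def predict(P, Q, i, j):
    """Reconstructed value for row i, column j."""
    return sum(p * q for p, q in zip(P[i], Q[j]))


# Hypothetical sparse utility matrix: 4 users x 4 movies, 12 of 16 ratings known.
ratings = {
    (0, 0): 5, (0, 1): 4, (0, 2): 1,
    (1, 0): 5, (1, 2): 1, (1, 3): 2,
    (2, 0): 1, (2, 2): 5, (2, 3): 4,
    (3, 1): 2, (3, 2): 5, (3, 3): 4,
}
P, Q = factorize(ratings, n_rows=4, n_cols=4)
dense = [[predict(P, Q, i, j) for j in range(4)] for i in range(4)]
for row in dense:
    print(" ".join(f"{value:5.2f}" for value in row))
```

Only the observed entries drive the updates, which is why a sparse matrix is enough; the regularization term keeps the factors small so the fit does not chase individual ratings.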
SVD
Utility matrix: rows are users u_1 … u_m, columns are movies m_1 … m_n, and entry a_ij is the rating of movie j by user i.

$$
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
=
\begin{bmatrix}
u_{11} & \cdots & u_{1r} \\
\vdots & \ddots & \vdots \\
u_{m1} & \cdots & u_{mr}
\end{bmatrix}
\times
\begin{bmatrix}
\sigma_{1} & \cdots & 0 \\
\vdots & \ddots & \vdots \\
0 & \cdots & \sigma_{r}
\end{bmatrix}
\times
\begin{bmatrix}
v_{11} & \cdots & v_{1r} \\
\vdots & \ddots & \vdots \\
v_{n1} & \cdots & v_{nr}
\end{bmatrix}^{T}
$$

• First factor (u): correlation of user to similarity concept
• Second factor (σ): similarity concept
• Third factor (v): correlation of movie to similarity concept
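Read entry by entry, the factorization says that each rating is a sum over the r similarity concepts. The expression below assumes the usual SVD convention in which the third factor is applied transposed, which the slide layout does not spell out:

```latex
a_{ij} = \sum_{k=1}^{r} u_{ik}\,\sigma_{k}\,v_{jk},
\qquad 1 \le i \le m,\quad 1 \le j \le n
```

Keeping only the terms with the largest $\sigma_k$ gives a low-rank approximation of the utility matrix.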
Heterogeneity Classification
[Utility matrix: rows User A, User B, …, User N; columns Movie 1, Movie 2, Movie 3, Movie 4, Movie 5, …, Movie M]

Heterogeneity Classification
[Utility matrix: rows User A, User B, …, User N; columns Platform 1, Platform 2, Platform 3, Platform 4, Platform 5, …, Platform M]

Heterogeneity Classification
[Utility matrix: rows App A, App B, …, App N; columns Platform 1, Platform 2, Platform 3, Platform 4, Platform 5, …, Platform M]
Heterogeneity Classification
[Utility matrix: rows App A, App B, …, App N; columns Platform 1 through Platform 5, …, Platform M; entries are app performance]
App A: 1,500 QPS, 843 QPS
App B: 458 QPS, 946 QPS
App N: 1,016 QPS, 186 QPS

Heterogeneity Classification
[Utility matrix: rows App A, App B, …, App N; columns Platform 1 through Platform 5, …, Platform M; legend: profiled performance, inferred performance]
App A: 1,500 QPS, 843 QPS

Heterogeneity Classification
[Utility matrix: rows App A, App B, …, App N; columns Platform 1 through Platform 5, …, Platform M; legend: profiled performance, inferred performance]
App A: 1,500 QPS, 843 QPS, 675 QPS, 843 QPS, 1,786 QPS, …, 8,675 QPS
Heterogeneity Classification
        Platform 1   Platform 2   Platform 3   Platform 4   Platform 5   …   Platform M
App A   1,500 QPS    843 QPS      675 QPS      843 QPS      1,786 QPS    …   8,675 QPS
App B   987 QPS      458 QPS      773 QPS      1,073 QPS    986 QPS      …   1,836 QPS
…
App N
Legend: profiled performance, inferred performance
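Once a row of the matrix is filled in, platforms can be ranked for each application. The sketch below uses the App A and App B values from the slide and assumes the six values map, in order, to Platform 1 through Platform 5 and Platform M; that column mapping and the `rank_platforms` helper are assumptions for illustration, not the scheduler's actual assignment policy.

```python
# Column order is an assumption: the slide text does not preserve it.
PLATFORMS = ["Platform 1", "Platform 2", "Platform 3",
             "Platform 4", "Platform 5", "Platform M"]

# Profiled plus inferred performance in QPS, as listed on the slide.
completed_qps = {
    "App A": [1500, 843, 675, 843, 1786, 8675],
    "App B": [987, 458, 773, 1073, 986, 1836],
}


def rank_platforms(row):
    """Return (platform, qps) pairs sorted from highest to lowest QPS."""
    return sorted(zip(PLATFORMS, row), key=lambda pair: pair[1], reverse=True)


ranking = {app: rank_platforms(row) for app, row in completed_qps.items()}
for app, ranked in ranking.items():
    best_platform, best_qps = ranked[0]
    print(f"{app}: best is {best_platform} at {best_qps} QPS")
```

A real scheduler would also have to weigh interference and current load, so the top-ranked platform is a preference rather than a final placement.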