Spock: Exploiting Serverless Functions for SLO and Cost Aware Resource Procurement in Public Cloud Jashwant Gunasekaran, Prashanth Thinakaran, Mahmut Kandemir, Bhuvan Urgaonkar, George Kesidis, Chita Das Computer Science and Engineering The Pennsylvania State University Hardware architectures for Modern Scientific Computing
The Last Supper of cloud clients [Cartoon speech bubbles: "Probably our private key is compromised" / "But.. I" / "I turned off my VM instances" / "But have the money to pay for it this time"]
That Last AWS bill
Spock: Cost Aware Resource Procurement in Public Clouds using Serverless
Whose problem are we solving?
Outline • Elastic Web Services • VM-based Resource Procurement • Serverless Functions • Cost of VMs vs Cloud Functions • Spock Hybrid Elastic Scaling • Implementation and Evaluation • Results
Elastic Web Services • Short-lived queries • Strict SLOs • Varying resource demands • Typical example: stateless ML-based web services • Resources are acquired/released on demand • Peak-to-average ratio is high
ML Inference Engine
VM-based Procurement EC2 instances
VM-based Procurement • Initial pool of VMs • Procure more VMs on demand • Autoscaling during request surge [Figure: resource demand vs. time (sec), showing arrival rate, SLA, active VMs, scale-up and scale-down, and SLA violations]
Disadvantages • Very long VM startup times (5s-50s) • Over-provisioning to meet strict SLOs • Under-provisioning during sudden surges • Possible alternatives?
Serverless Functions
Serverless Functions • Pay per second • Cost efficient • Scale instantaneously • Intermittent SLA violations • But is serverless a panacea? [Figure: resource demand vs. time (sec), showing arrival rate, SLA, Lambda capacity, scale-up and scale-down, and SLA violations]
Constant arrival rate • Constant arrival rate • Cost compared under iso-performance • All requests have similar SLA compliance • VMs are 100% utilized [Figure: cost ($) vs. requests per sec (0 to 200) for VM and Lambda]
Varying arrival rate • Trace-based arrival rate • Each request is an ML inference for the caffenet model • Cost compared under iso-performance • All requests have similar SLA compliance • VMs are provisioned for the peak request rate • Cost-effective solution? [Figure: request rate vs. time (s), 0 to 7200, with Avg-1 and Avg-2 marked] [Figure: normalized cost for Average-1 and Average-2]
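The iso-performance cost comparison on the two slides above can be sketched as a small cost model. This is only an illustration: the prices, the per-VM capacity, and the two synthetic traces are hypothetical placeholders, not figures from the talk.

```python
import math
from typing import Sequence

# Hypothetical placeholders, not measured prices or throughputs.
VM_PRICE_PER_SEC = 0.10 / 3600       # assumed $/sec for one VM
VM_CAPACITY_RPS = 10.0               # assumed requests/sec one VM serves within the SLO
LAMBDA_PRICE_PER_REQUEST = 0.000005  # assumed $ per function invocation


def vm_cost_peak_provisioned(rates_rps: Sequence[float]) -> float:
    """Cost of a VM pool sized for the peak rate and held for the whole trace.

    rates_rps holds one request-rate sample per second.
    """
    vms = math.ceil(max(rates_rps) / VM_CAPACITY_RPS)
    return vms * VM_PRICE_PER_SEC * len(rates_rps)


def lambda_cost(rates_rps: Sequence[float]) -> float:
    """Cost of serving every request in the trace with a serverless function."""
    return sum(rates_rps) * LAMBDA_PRICE_PER_REQUEST


constant = [100.0] * 3600              # steady load: VMs stay fully utilized
bursty = [10.0] * 3540 + [100.0] * 60  # same peak, much lower average

for name, trace in (("constant", constant), ("bursty", bursty)):
    print(name, round(vm_cost_peak_provisioned(trace), 4), round(lambda_cost(trace), 4))
```

Because the VM pool is sized for the peak, its cost is the same for both traces, while the function cost follows the total number of requests. Which option comes out cheaper depends entirely on the assumed constants.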
SPOCK • Use serverless functions along with VMs • Reduce SLO violations during request surges • Reduce intermittent over-provisioning of VMs [Figure: resource demand vs. time (sec), showing arrival rate, SLA, VM and Lambda capacity, scale-up and scale-down, and SLA violations]
Key Motivation • It is non-trivial to predict the peak request rate for any given time period. • Provisioning VMs for peak demand always leads to a higher deployment cost, while under-provisioning VMs leads to severe SLO violations for queries. • Using serverless functions would overcome the SLO violation problem, but it is not cost-effective.
Spock Scheme • Schedule queries on VMs if available • If VMs are fully utilized, redirect queries to Lambda functions • Spawn a new VM in the meantime • After spin-up, incoming requests are sent to the new VMs • Scale down VMs after three minutes of inactivity
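The steps on this slide can be sketched as a small dispatcher. This is a minimal sketch, not the Spock implementation: the `VM` and `Dispatcher` classes, the 30-second startup delay, and the four slots per VM are hypothetical names and values chosen for illustration. Only the three-minute idle timeout comes from the slide.

```python
from dataclasses import dataclass, field
from typing import List, Optional

IDLE_TIMEOUT_SEC = 180.0  # "three minutes of inactivity" from the slide
VM_STARTUP_SEC = 30.0     # assumed spin-up delay (hypothetical value)
VM_SLOTS = 4              # assumed concurrent queries per VM (hypothetical value)


@dataclass
class VM:
    ready_at: float          # time at which the VM finishes booting
    busy: int = 0            # queries currently running on this VM
    last_used: float = 0.0   # last time a query started or finished here


@dataclass
class Dispatcher:
    vms: List[VM] = field(default_factory=list)

    def dispatch(self, now: float) -> Optional[VM]:
        """Return the VM that takes the query, or None if it goes to a function."""
        for vm in self.vms:
            if vm.ready_at <= now and vm.busy < VM_SLOTS:
                vm.busy += 1
                vm.last_used = now
                return vm
        # Every running VM is full: fall back to a serverless function and,
        # unless a VM is already booting, start one in the meantime.
        if not any(vm.ready_at > now for vm in self.vms):
            boot_done = now + VM_STARTUP_SEC
            self.vms.append(VM(ready_at=boot_done, last_used=boot_done))
        return None

    def release(self, vm: VM, now: float) -> None:
        """Mark one query on this VM as finished."""
        vm.busy -= 1
        vm.last_used = now

    def scale_down(self, now: float) -> None:
        """Drop VMs that have been idle for at least IDLE_TIMEOUT_SEC."""
        self.vms = [vm for vm in self.vms
                    if vm.busy > 0 or now - vm.last_used < IDLE_TIMEOUT_SEC]
```

A caller would invoke `dispatch` per query, treat a `None` result as "send to a function", and call `scale_down` periodically. The sketch starts at most one VM at a time and keeps no minimum pool; both are simplifications.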
Two Scaling Policies • Reactive: spin up new VMs as and when a request surge occurs; no prediction of the request rates • Predictive: predict the request rate every minute using moving-window linear regression; spin up new VMs based on the prediction • Let's see an example
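The predictive policy can be illustrated with an ordinary least-squares fit over the most recent samples. This is a sketch under stated assumptions: the window size of five samples, the per-VM capacity, and the names `MovingWindowPredictor` and `vms_needed` are hypothetical; the slide specifies only moving-window linear regression with a one-minute prediction interval.

```python
import math
from collections import deque
from typing import Deque

VM_CAPACITY_RPS = 10.0  # assumed requests/sec per VM (hypothetical value)


class MovingWindowPredictor:
    """Fit a line to the last `window` per-minute rates and extrapolate one step."""

    def __init__(self, window: int = 5) -> None:
        self.samples: Deque[float] = deque(maxlen=window)

    def observe(self, rate_rps: float) -> None:
        self.samples.append(rate_rps)

    def predict_next(self) -> float:
        n = len(self.samples)
        if n == 0:
            return 0.0
        if n == 1:
            return self.samples[0]
        # Least-squares slope and intercept with x = 0, 1, ..., n-1.
        mean_x = (n - 1) / 2
        mean_y = sum(self.samples) / n
        sxx = sum((x - mean_x) ** 2 for x in range(n))
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(self.samples))
        slope = sxy / sxx
        intercept = mean_y - slope * mean_x
        # Extrapolate to x = n and clamp, since a rate cannot be negative.
        return max(0.0, intercept + slope * n)


def vms_needed(predicted_rps: float) -> int:
    """VMs required to serve the predicted rate at the assumed capacity."""
    return math.ceil(predicted_rps / VM_CAPACITY_RPS)
```

Calling `observe` once per minute and then `vms_needed(predictor.predict_next())` gives a target pool size for the next minute; the reactive policy would skip the predictor and size the pool from the current rate instead.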
Spock resource procurement [Figure: request rate per sec (0 to 500) vs. time (hundreds of sec), with scale-out and scale-in phases served by VMs and Lambdas]
Overall Design of Spock [Figure: architecture diagram. Components: User Applications, Scaling Policy (Reactive, Predictive), Resource Manager, Load Balancer, Load Monitor, and a pool of VMs and Lambda functions (λ) serving Models 1 to 4. Labeled flows: Queries, Predicted Load, Resource Required, Query Assigned, Query Complete, Instance Created, Resource Status]
Evaluation • Two traces used to generate the ML inference workload: WITS and Berkeley
Evaluation • MXNet framework • AWS resources • Pretrained ML models on the ImageNet dataset
Evaluation • Two scaling policies: Predictive, Reactive • Three resource procurement schemes: Autoscale, X-autoscale, Spock
Berkeley Trace Results [Figure: two panels, each plotting normalized cost for Mix-1 and Mix-2 and SLO violations (%) for autoscale, X-autoscale, and Spock]
WITS Trace Results [Figure: two panels, each plotting normalized cost for Mix-1 and Mix-2 and SLO violations (%) for autoscale, X-autoscale, and Spock]
Spock Prediction Accuracy
Spock Resource Procurement [Figure: request rate per sec (0 to 500) vs. time (hundreds of sec), with scale-in and scale-out phases served by VMs]
Questions?