On the Impact of Isolation Costs on Locality-aware Cloud Scheduling
Ankit Bhardwaj, Meghana G Gupta, Ryan Stutsman
University of Utah
Scalable Computer Systems Lab
www.utah.systems
Code Isolation-cost Aware Scheduling
Three recent shifts in the cloud:
• Cloud networking performance → 100 Gbps, microsecond round trips
• Rethink of code isolation schemes → Meltdown, Spectre, VT-x, eBPF, WASM
• Granular, serverless applications → visibility and placement at a fine grain
Diversity and flexibility in placement, workloads, and isolation costs
Isolation- and data-movement-cost aware scheduling for cloud compute
It is time for a holistic, cost-aware approach to scheduling in the cloud
Past: State + Application on One VM
• Compute/storage together on one machine; VMs access state locally
• Problem: Resource stranding
  • Idle compute when storage capacity is the limiting factor
  • Idle storage when compute capacity is the limiting factor
  • Costly to reorganize
Today: Disaggregation
• Solution: Separate compute from storage
• New problem: High data movement costs (multiple gets/puts)
  • RPC, serialization/deserialization
  • TCP/transport
  • memcpys
• Substantial costs at gigabits/second
Move compute to storage at finer grain?
• Solution: storage-side computation over stored data
  • But, high tenant density at storage to homogenize/balance load
  • Need granular decomposition of application logic
• Problem: Many tenants sharing storage; code isolation is hard
  • Process creation and context switches add up
Key Idea: Isolation-cost Aware Scheduler
• Placement of computation in the cloud can improve efficiency
  • by eliminating data movement,
  • but it also must reason about code isolation costs to do so.
• Profile
  • inter-function interaction in applications,
  • data access and locality patterns,
  • networking, dispatch, and isolation domain context switch costs
• Global fine-grained, core-level choices at microsecond timescales
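The trade-off on this slide can be illustrated with a minimal sketch, assuming a simple additive per-invocation cost model. All class names, field names, and microsecond values below are hypothetical placeholders chosen for illustration; they are not measurements from this work.

```python
from dataclasses import dataclass


@dataclass
class FunctionProfile:
    """Per-invocation profile of one function (hypothetical fields)."""
    accesses_per_invocation: int  # records touched per invocation
    compute_us: float             # pure compute time in microseconds


@dataclass
class CostModel:
    """Illustrative per-operation costs in microseconds (made-up values)."""
    remote_access_us: float = 10.0    # RPC + serialization + transport per record
    local_access_us: float = 0.5      # access to a record held at the storage node
    isolation_switch_us: float = 3.0  # isolation-domain switch at the storage node
    dispatch_us: float = 1.0          # dispatch overhead on either side


def cost_at_compute(p: FunctionProfile, m: CostModel) -> float:
    """Run on a compute node: every record access crosses the network."""
    return m.dispatch_us + p.compute_us + p.accesses_per_invocation * m.remote_access_us


def cost_at_storage(p: FunctionProfile, m: CostModel) -> float:
    """Run at storage: accesses are local, but an isolation switch is paid."""
    return (m.dispatch_us + m.isolation_switch_us + p.compute_us
            + p.accesses_per_invocation * m.local_access_us)


def place(p: FunctionProfile, m: CostModel) -> str:
    """Pick the cheaper side under this simplified model."""
    return "storage" if cost_at_storage(p, m) < cost_at_compute(p, m) else "compute"
```

Under these placeholder costs, a function that touches several records is cheaper at storage, while one that touches none is cheaper on a compute node because it avoids the isolation switch. A real scheduler would replace the constants with profiled values.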
Challenges for Isolation-cost Aware Scheduling
• Need for Fine-grained Applications
• Workload Characterization
• Profiling and Understanding Context Switch Costs
• Provisioning, Re-provisioning, and Placement
• Dealing with Intermediate State
Challenge #0: Need Finer-grained Apps
• Scheduler must be able to "see" into applications to optimize
• Solution: serverless
  • Functions can be individually placed
  • Creates visibility into applications
  • Supports alternative isolation schemes
  • Malleable interface
• Today's implementations do not tap into these potential benefits
Challenge #1: Workload Characterization
• Problem: No insight into a function's network and data access costs
• Solution: Profile functions to capture
  • data access patterns and locality
  • runtime distribution
• Place functions that access many records or much data at storage
• Dynamically shift to idle compute when server is overloaded
• Even simple schemes can work: counting accesses & runtime
[Figure: function throughput (millions of invocations/second) vs. data record accesses (1, 2, 4 accesses/invocation), comparing "Client-side Function + Disaggregated Access" with "Server-side Function + Colocated Access"]
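The "counting accesses & runtime" scheme could look roughly like the sketch below. The `FunctionProfiler` class, the `choose_placement` helper, and the `access_threshold` default are all hypothetical names and values introduced for illustration; the slides do not specify an implementation.

```python
from collections import defaultdict


class FunctionProfiler:
    """Keeps running totals of accesses and runtime per function name."""

    def __init__(self):
        self._invocations = defaultdict(int)
        self._accesses = defaultdict(int)
        self._runtime_us = defaultdict(float)

    def record(self, name: str, accesses: int, runtime_us: float) -> None:
        """Account for one completed invocation."""
        self._invocations[name] += 1
        self._accesses[name] += accesses
        self._runtime_us[name] += runtime_us

    def mean_accesses(self, name: str) -> float:
        n = self._invocations.get(name, 0)
        return self._accesses[name] / n if n else 0.0

    def mean_runtime_us(self, name: str) -> float:
        n = self._invocations.get(name, 0)
        return self._runtime_us[name] / n if n else 0.0


def choose_placement(profiler: FunctionProfiler, name: str,
                     server_overloaded: bool,
                     access_threshold: float = 2.0) -> str:
    """Access-heavy functions go to storage unless the server is overloaded.

    access_threshold is an assumed cutoff, not a value from the talk.
    """
    if server_overloaded:
        return "compute"  # shift work to idle compute nodes
    if profiler.mean_accesses(name) >= access_threshold:
        return "storage"
    return "compute"
```

A function with no recorded history defaults to a compute node here, which is one possible policy among several; the mean runtime is collected but not yet used in the decision.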
Challenge #2: Code Isolation Costs
• Problem: isolation costs vary depending on workload
• VMs: hw protection & dispatch
  • VT-x VMs for isolation, SR-IOV+IOMMU for dispatch
  • Too expensive to context switch
  • Good if high per-tenant throughput
• Containers: sw dispatch
  • Processes for isolation, software demultiplexing for dispatch
  • Need ms-scale length requests
  • Good for timesharing CPU
• Language Runtimes: pure sw
  • Good for short-running functions with constrained logic
[Figure: App 1 and App 2 in separate VT-x VMs; per-app address spaces; page table switching]
Comparing Three Hw Isolation Schemes
• Paging/conventional process context switch is always costly
• Low tenant counts → MPK page table entry coloring fastest
• Higher tenant counts → extended page table switching fastest
Best scheme depends on tenant count and request rates
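The tenant-count rule on this slide might be encoded as a small selector like the one below. This is only a sketch: the crossover point `mpk_max_tenants` is a hypothetical parameter that would have to come from measurement, and request rates, which the slide also names as a factor, are not modeled.

```python
from enum import Enum


class Scheme(Enum):
    MPK = "MPK page table entry coloring"
    EPT_SWITCH = "extended page table switching"


def pick_hw_scheme(tenant_count: int, mpk_max_tenants: int) -> Scheme:
    """Choose between the two faster schemes by tenant count alone.

    mpk_max_tenants is an assumed, measured crossover point; the slides
    give no specific number.
    """
    if tenant_count < 1:
        raise ValueError("tenant_count must be at least 1")
    if tenant_count <= mpk_max_tenants:
        return Scheme.MPK
    return Scheme.EPT_SWITCH
```

Conventional process context switching is left out of the enum because the slide reports it as always costly.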
Challenge #3: Provisioning & Placement
• Problem: Function properties change over time
  • in data access patterns
  • in computational costs
  • in distribution of functions invoked
• Churn and instability force new placement decisions
• VMs, containers, etc. have different start, stop, and migration costs
• Solution: scheduling must model stability and variance of workload
  • in compute costs, invocation frequency, and data access
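One simple way to model workload stability is the coefficient of variation over a recent window of samples, as sketched below. This is an assumed technique chosen for illustration, not the method from the talk; `cv_threshold` and `min_samples` are hypothetical tuning knobs.

```python
from statistics import mean, pstdev
from typing import Sequence


def coefficient_of_variation(samples: Sequence[float]) -> float:
    """Standard deviation divided by the mean; 0.0 for empty or all-zero input."""
    if not samples:
        return 0.0
    m = mean(samples)
    if m == 0:
        return 0.0
    return pstdev(samples) / m


def is_stable(samples: Sequence[float],
              cv_threshold: float = 0.25,
              min_samples: int = 5) -> bool:
    """Treat a metric as stable only with enough samples and low variance.

    Both thresholds are assumptions; they would need tuning per workload.
    """
    if len(samples) < min_samples:
        return False  # too little history to justify a costly re-placement
    return coefficient_of_variation(samples) <= cv_threshold
```

The same check can be applied independently to compute cost, invocation frequency, and data accesses per invocation, so that a function is only moved to a scheme with high start or migration cost once all three look stable.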
Preliminary Design Ideas
• Task Dispatching
  • Two-level scheduling avoids idle CPUs and limits queue imbalance
  • History at global level, route invocations to avoid context switching
  • Global knowledge of data placement
• Statistics, Load, & Prediction
  • Core and task level stats collection
  • Push via RDMA writes
  • Low-cost with frequent updates
  • 100s to 1000s of machines pushing updates each second
  • Use in assessing workload stability
  • Used by scheduler to promote/demote functions between isolation schemes
[Figure: incoming function invocations → load balancer → global scheduler → storage nodes (stored data) and compute nodes, each with a local task scheduler]
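The dispatching idea can be sketched as a global router that prefers a node that last ran the same tenant (so no isolation-domain switch is needed) but falls back to the shortest queue when that would create too much imbalance. The class names and the `max_imbalance` parameter are hypothetical, and the RDMA-based statistics path is not modeled here.

```python
from collections import deque


class LocalScheduler:
    """Stands in for one node's local task scheduler and its run queue."""

    def __init__(self, name: str):
        self.name = name
        self.queue = deque()
        self.last_tenant = None  # tenant most recently routed to this node


class GlobalScheduler:
    """Routes invocations using per-node history and queue lengths."""

    def __init__(self, nodes, max_imbalance: int = 4):
        self.nodes = list(nodes)
        self.max_imbalance = max_imbalance  # assumed tolerance, in queued tasks

    def route(self, tenant: str, invocation) -> LocalScheduler:
        shortest = min(self.nodes, key=lambda n: len(n.queue))
        target = shortest
        # Nodes that last ran this tenant avoid an isolation-domain switch.
        affine = [n for n in self.nodes if n.last_tenant == tenant]
        if affine:
            best = min(affine, key=lambda n: len(n.queue))
            # Keep affinity only while the queue gap stays within tolerance.
            if len(best.queue) - len(shortest.queue) <= self.max_imbalance:
                target = best
        target.queue.append((tenant, invocation))
        target.last_tenant = tenant
        return target
```

With two nodes and `max_imbalance=2`, repeated invocations from one tenant stay on the same node until its queue is more than two tasks longer than the other node's, at which point the router spills to the idle node.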
Discussion Questions
• Cloud process model
  • Are cloud function interfaces (that differ from POSIX) likely to take hold?
• Security risks
  • Larger attack surface, but works around vulnerabilities with less reengineering
  • Which isolation schemes and runtimes are likely to be sufficiently trustworthy?
• Workloads
  • What will future, more granular serverless workloads look like?
  • What ways might there be to approximate these workloads using public data?
• Pricing
  • How might improved but hard-to-predict efficiency gains be reflected in pricing?
Conclusion
• Kernel-bypass → low-latency, high-throughput storage services
  • These gains are now showing up in the cloud
• Fast networks → more data movement
  • Small functions over data, but code isolation cuts into gains
• Key idea: different code isolation schemes have different costs
  • Dynamically understand data movement and code isolation costs
  • Run different functions with different schemes based on runtime profiling
• For more details, check out our project website or reach out to me at meghana@cs.utah.edu.