On the Impact of Isolation Costs on Locality-aware Cloud Scheduling

On the Impact of Isolation Costs on Locality-aware Cloud Scheduling • Ankit Bhardwaj, Meghana G Gupta, Ryan Stutsman • University of Utah, Scalable Computer Systems Lab • www.utah.systems

Code Isolation-cost Aware Scheduling • Three recent shifts in the cloud • Cloud Networking Performance → 100 Gbps, microsecond round-trips • Rethink of code isolation schemes → Meltdown, Spectre, VT-x, eBPF, WASM • Granular, Serverless Applications → Visibility and Placement at a fine grain • Diversity and Flexibility in Placement, Workloads, and Isolation Costs • Isolation- and data-movement-cost Aware Scheduling for Cloud Compute • It is time for a holistic, cost-aware approach to scheduling in the cloud

Past: State + Application on One VM • Compute/storage together on one machine; VMs access state locally • Problem: Resource stranding • Idle compute when storage capacity is the limiting factor • Idle storage when compute capacity is the limiting factor • Costly to reorganize

Today: Disaggregation • Solution: Separate compute from storage • New Problem: High data movement costs (multiple gets/puts) • RPC, serialization/deserialization • TCP/transport • memcpys • Substantial costs at gigabits/second

Move compute to storage at finer grain? • Solution: storage-side computation over stored data • But, high tenant density at storage to homogenize/balance load • Need granular decomposition of application logic • Problem: Many tenants sharing storage; code isolation is hard • Process creation and context switch add up

Key Idea: Isolation-cost Aware Scheduler • Placement of computation in the cloud can improve efficiency • by eliminating data movement, • but it also must reason about code isolation costs to do so. • Profile • inter-function interaction in applications, • data access and locality patterns, • networking, dispatch, and isolation domain context switch costs • Global fine-grained, core-level choices at microsecond-timescales
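The trade-off on this slide can be sketched as a per-invocation cost comparison. The following is a minimal illustration, not the authors' scheduler: the class names, fields, and the assumption that a storage-side run pays two isolation-domain switches (enter and leave) while a compute-side run pays none are all hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass
class FunctionProfile:
    """Profiled per-invocation behavior of one function (hypothetical fields)."""
    records_accessed: float  # mean data records touched per invocation
    compute_us: float        # mean pure compute time, in microseconds


@dataclass
class PlatformCosts:
    """Measured platform costs in microseconds (hypothetical fields)."""
    remote_access_us: float     # one get/put over the network
    local_access_us: float      # one record access when colocated with the data
    isolation_switch_us: float  # one isolation-domain context switch at storage
    dispatch_us: float          # dispatching one invocation, on either side


def storage_side_cost(f: FunctionProfile, c: PlatformCosts) -> float:
    # Colocated run: cheap data access, but (by assumption) the shared
    # storage node is entered and left through an isolation-domain switch.
    return (c.dispatch_us + 2 * c.isolation_switch_us
            + f.records_accessed * c.local_access_us + f.compute_us)


def compute_side_cost(f: FunctionProfile, c: PlatformCosts) -> float:
    # Disaggregated run: every record access crosses the network.
    return c.dispatch_us + f.records_accessed * c.remote_access_us + f.compute_us


def choose_placement(f: FunctionProfile, c: PlatformCosts) -> str:
    """Return "storage" only when colocation is strictly cheaper."""
    if storage_side_cost(f, c) < compute_side_cost(f, c):
        return "storage"
    return "compute"
```

With an expensive switch, a function touching a single record stays at compute; the same function moves to storage once it touches several records or once a cheaper isolation scheme is available.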

Challenges for Isolation-cost Aware Scheduling • Need for Fine-grained Applications • Workload Characterization • Profiling and Understanding Context Switch Costs • Provisioning, Re-provisioning, and Placement • Dealing with Intermediate State

Challenge #0: Need Finer-grained Apps • Scheduler must be able to "see" into applications to optimize • Solution: serverless • Functions can be individually placed • Creates visibility into applications • Supports alternative isolation schemes • Malleable interface • Today's implementations do not tap into these potential benefits

Challenge #1: Workload Characterization • Problem: No insight into function's network and data access costs • Solution: Profile functions to capture • data access patterns and locality • runtime distribution • Place functions that access many records or much data at storage • Dynamically shift to idle compute when server is overloaded • Even simple schemes can work: counting accesses & runtime • [Figure: Function Throughput (millions of invocations/second) vs. Data Record Accesses (1, 2, 4 accesses/invocation); series: Client-side Function + Disaggregated Access, Server-side Function + Colocated Access]
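The "counting accesses & runtime" scheme could look roughly like the sketch below. The class, the function names, and both thresholds are assumptions made for illustration; the slides do not specify them.

```python
from collections import defaultdict


class AccessProfiler:
    """Counts invocations, record accesses, and runtime per function."""

    def __init__(self):
        self._invocations = defaultdict(int)
        self._accesses = defaultdict(int)
        self._runtime_us = defaultdict(float)

    def record(self, function_name, accesses, runtime_us):
        """Log one completed invocation."""
        self._invocations[function_name] += 1
        self._accesses[function_name] += accesses
        self._runtime_us[function_name] += runtime_us

    def mean_accesses(self, function_name):
        n = self._invocations.get(function_name, 0)
        return self._accesses[function_name] / n if n else 0.0

    def mean_runtime_us(self, function_name):
        n = self._invocations.get(function_name, 0)
        return self._runtime_us[function_name] / n if n else 0.0


def place(profiler, function_name, storage_load,
          access_threshold=2.0, overload_threshold=0.9):
    """Pick "storage" or "compute" for the next invocation.

    storage_load is the storage server's utilization in [0, 1].
    Both thresholds are placeholder values, not measured ones.
    """
    if storage_load >= overload_threshold:
        # Shift work to idle compute when the storage server is overloaded.
        return "compute"
    if profiler.mean_accesses(function_name) >= access_threshold:
        # Data-heavy functions benefit most from running next to the data.
        return "storage"
    return "compute"
```

A function with no recorded history reports zero mean accesses, so it starts at compute until the profile says otherwise.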

Challenge #2: Code Isolation Costs • Problem: isolation costs vary depending on workload • VMs: hw protection & dispatch • VT-x VMs for isolation, SR-IOV+IOMMU for dispatch • Too expensive to context switch • Good if high per-tenant throughput • Containers: sw dispatch • Processes for isolation, software demultiplexing for dispatch • Need ms-scale length requests • Good for timesharing CPU • Language Runtimes: pure sw • Good for short-running functions with constrained logic • [Figure: App 1 and App 2 in VT-x VMs; App 1 address spaces with page table switching]

Comparing Three Hw Isolation Schemes • Paging/conventional process context switch is always costly • Low tenant counts → MPK Page Table Entry Coloring Fastest • Higher tenant counts → Extended Page Table Switching Fastest • Best scheme depends on tenant count and request rates
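The comparison above suggests a simple selection rule keyed on tenant count. The sketch below is one hedged reading of it: the crossover value `mpk_tenant_limit` is a placeholder assumption rather than a figure from the slides, request rates are not modeled, and all names are hypothetical.

```python
from enum import Enum


class Scheme(Enum):
    MPK = "MPK page table entry coloring"
    EPT = "extended page table switching"
    PAGING = "conventional process context switch"


def pick_hw_scheme(tenant_count, mpk_tenant_limit=15,
                   available=(Scheme.MPK, Scheme.EPT, Scheme.PAGING)):
    """Choose a hardware isolation scheme for a core's tenant set.

    mpk_tenant_limit is an assumed crossover point; in practice it
    would be measured on the target hardware.
    """
    if tenant_count < 1:
        raise ValueError("tenant_count must be at least 1")
    if tenant_count <= mpk_tenant_limit and Scheme.MPK in available:
        return Scheme.MPK      # low tenant counts: MPK is fastest
    if Scheme.EPT in available:
        return Scheme.EPT      # higher tenant counts: EPT switching is fastest
    return Scheme.PAGING       # always works, but always costly
```

Conventional paging is kept only as the fallback, matching the slide's observation that it is costly in every configuration.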

Challenge #3: Provisioning & Placement • Problem: Function properties change over time • in data access patterns • in computational costs • in distribution of functions invoked • Churn and instability force new placement decisions • VMs, containers, etc. have different start, stop, migration costs • Solution: scheduling must model stability and variance of workload • In compute costs, invocation frequency, and data access
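One way to model stability and variance is an exponentially weighted mean and variance per metric (compute cost, invocation frequency, or data accesses). This is a sketch of that swapped-in technique, not the authors' model; `alpha` and `max_cv` are assumed tuning knobs.

```python
import math


class StabilityTracker:
    """Exponentially weighted mean/variance of one workload metric."""

    def __init__(self, alpha=0.2):
        if not 0 < alpha <= 1:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.mean = None  # no samples seen yet
        self.var = 0.0

    def update(self, x):
        """Fold one new sample into the running estimates."""
        if self.mean is None:
            self.mean = float(x)
            return
        delta = x - self.mean
        self.mean += self.alpha * delta
        # Standard incremental update for an exponentially weighted variance.
        self.var = (1 - self.alpha) * (self.var + self.alpha * delta * delta)

    def coefficient_of_variation(self):
        """Relative spread; infinite when there is no usable mean."""
        if self.mean is None or self.mean == 0:
            return math.inf
        return math.sqrt(self.var) / abs(self.mean)

    def is_stable(self, max_cv=0.25):
        """True when recent samples vary little relative to their mean."""
        return self.coefficient_of_variation() <= max_cv
```

A scheduler might, for example, defer expensive start, stop, or migration operations for a function until its tracker reports a stable workload.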

Preliminary Design Ideas • Task Dispatching • Two-level scheduling avoids idle CPUs but limits queue imbalance • History at global level, route invocations to avoid context switching • Global knowledge of data placement • Statistics, Load, & Prediction • Core and task level stats collection • Push via RDMA writes • Low-cost with frequent updates • 100s to 1000s of machines pushing updates each second • Use in assessing workload stability • Used by scheduler to promote/demote functions between isolation schemes • [Figure: Incoming Function Invocations → Load Balancer → Global Scheduler → Storage Nodes (Stored Data) and Compute Nodes, each with a Local Task Scheduler]
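The two-level design can be sketched as a global router that uses data placement plus pushed load statistics, and a local step that picks a core. Everything here is hypothetical: the class, the `queue_limit` cutoff, and the node names are illustrative, and the RDMA push is modeled as a plain method call.

```python
class GlobalScheduler:
    """Routes invocations using data placement and reported queue depths."""

    def __init__(self, data_location, queue_limit=8):
        self.data_location = dict(data_location)  # record key -> storage node
        self.queue_limit = queue_limit            # assumed overload cutoff
        self.queue_depth = {}                     # node -> last reported depth

    def push_stats(self, node, queue_depth):
        """Stand-in for a node pushing its load statistics."""
        self.queue_depth[node] = queue_depth

    def route(self, key, compute_nodes):
        """Prefer the node holding the data; spill to compute when it is busy."""
        home = self.data_location[key]
        if self.queue_depth.get(home, 0) < self.queue_limit:
            return home
        # Nodes that have not reported yet are treated as idle.
        return min(compute_nodes, key=lambda n: self.queue_depth.get(n, 0))


def pick_core(core_queue_depths):
    """Local task scheduler: index of the core with the shortest queue."""
    return min(range(len(core_queue_depths)),
               key=core_queue_depths.__getitem__)
```

The global level limits imbalance across machines, while the local level keeps individual cores from sitting idle behind a long queue.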

Discussion Questions • Cloud process model • Which cloud function interfaces (that differ from POSIX) are likely to take hold? • Security risks • Larger attack surface, but works around vulnerabilities with less reengineering • Which isolation schemes and runtimes are likely to be sufficiently trustworthy? • Workloads • What will future, more granular serverless workloads look like? • What ways might there be to approximate these workloads using public data? • Pricing • How might improved but hard-to-predict efficiency gains be reflected in pricing?

Conclusion • Kernel-bypass → low-latency, high-throughput storage services • These gains are now showing up in the cloud • Fast networks → more data movement • Small functions over data, but code isolation cuts into gains • Key idea: different code isolation schemes have different costs • Dynamically understand data movement and code isolation costs • Run different functions with different schemes based on runtime profiling • For more details, check out our project website or reach out to me at meghana@cs.utah.edu.
