Work Stealing for Interactive Services to Meet Target Latency
Jing Li*, Kunal Agrawal*, Sameh Elnikety†, Yuxiong He†, I-Ting Angelina Lee*, Chenyang Lu*, Kathryn S. McKinley†
*Washington University in St. Louis  †Microsoft Research
* This work was initiated and partly done during Jing Li's internship at Microsoft Research in summer 2014.
Interactive services must meet a target latency
- Interactive services: search, ads, games, finance. Users demand responsiveness.
- Problem setting: multiple requests arrive over time; each request is parallelizable; latency = completion time − arrival time; each request's latency should be below a target latency T.
- Goal: maximize the number of requests that meet the target latency T.
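In symbols (notation introduced here, not on the slides): if request i arrives at time a_i and completes at time c_i, the scheduler's objective is

```latex
% Scheduling objective; a_i, c_i, T as defined above.
\mathrm{latency}_i = c_i - a_i,
\qquad
\text{maximize}\quad \bigl|\{\, i \;:\; c_i - a_i \le T \,\}\bigr|
```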
Latency in Internet search
- In industrial interactive services, thousands of servers together serve a single user query.
- End-to-end latency ≥ latency of the slowest server; end-to-end response time must stay around 100 ms for the user to perceive the service as responsive.
[Pipeline diagram: parsing a search query → many parallel doc lookup & ranking servers → result aggregation & snippet generation; the target latency applies to each stage.]
Goal: meet the target latency in a single server
- Design a scheduler that maximizes the number of requests completed within the target latency on a single server.
[Same pipeline diagram, highlighting a single doc lookup & ranking server.]
Sequential execution is insufficient
- A large request must execute in parallel to meet the target latency.
[Figure: a request whose sequential execution time (work, in ms) exceeds the target latency.]
Full parallelism does not always work well
Target latency: 90 ms. A large request has 270 ms of work (arriving at time 0, it must finish by time 90); each small request has 60 ms of work (arriving at time 20, it must finish by time 110).
Case 1: 1 large request + 3 small requests.
[Gantt charts on 3 cores: running everything fully in parallel, the large request finishes by 90, but the small requests wait behind it and two finish after their deadline of 110 (at about 130 and 150): 2 misses. Serializing the large request on one core, the small requests all finish by 110 and only the large request, finishing at 270, misses: 1 miss.]
Some large requests require parallelism
Same target latency (90 ms) and request sizes.
Case 2: 1 large request + 1 small request.
[Gantt charts: with enough idle cores, running the large request in parallel lets both requests meet their deadlines: 0 misses. Serializing the large request makes it finish at time 270 and miss: 1 miss.]
Strategy: adapt scheduling to load
- High load (Case 1): we cannot afford to run all large requests in parallel → run large requests sequentially (1 miss instead of 2).
- Low load (Case 2): we do need to run some large requests in parallel → run all requests in parallel (0 misses).
Why does the adaptive strategy work?
Latency = processing time + waiting time.
- At low load, processing time dominates latency: parallel execution reduces a request's processing time, so all requests run in parallel.
- At high load, waiting time dominates latency: executing a large request in parallel increases the waiting time of many later-arriving requests, so each large request that is sacrificed (serialized) reduces the waiting time of many more later-arriving requests.
Challenge: which request to sacrifice?
Strategy: when load is low, run all requests in parallel; when load is high, run large requests sequentially.
- Challenge 1 (non-clairvoyant): we do not know the work of a request when it arrives.
- Challenge 2 (no fixed definition of a large request): "large" is relative to the instantaneous load. At load 10, a large request is one with >180 ms of work; at load 20, >80 ms; at load 30, >20 ms.
Contributions: the tail-control scheduler
- Input: target latency T; request work distribution (available in highly engineered interactive services); request arrival rate (requests per second, RPS).
- Tail-control offline threshold calculation: computes a large-request threshold for each load value, producing a large-request threshold table (see the sketch below).
- Tail-control online runtime: uses the threshold table to decide which requests to serialize.
- We modify work stealing in Intel Threading Building Blocks (TBB) to implement tail-control scheduling; implementation details are in the paper.
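To make the threshold table concrete, here is a hypothetical C++ representation (the names and the fallback convention are mine; the table values come from the load examples earlier, and the paper's offline algorithm that actually computes the table is not shown in these slides):

```cpp
// Hypothetical large-request threshold table: maps instantaneous load
// (number of active requests) to the work threshold, in ms, beyond
// which a request is treated as "large" and serialized.
#include <map>

using Millis = double;

std::map<int, Millis> threshold_table = {
    {10, 180.0},  // light load: only very large requests are serialized
    {20,  80.0},
    {30,  20.0},  // heavy load: even modestly sized requests are serialized
};

// Look up the threshold for the current load, falling back to the
// nearest smaller entry (a simple convention assumed here).
Millis threshold_for(int load) {
    auto it = threshold_table.upper_bound(load);  // first key > load
    if (it == threshold_table.begin()) return it->second;
    return std::prev(it)->second;                 // largest key <= load
}
```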
Tail-control online runtime
Input: the threshold table produced by the offline threshold calculation.
Runtime functionality (see the sketch below):
- Execute all requests in parallel to begin with.
- Record the total computation time spent on each request so far.
- Detect large requests by comparing each request's processing time against the threshold for the current load.
- Serialize large requests to limit their impact on other waiting requests.
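A minimal sketch of the detection-and-serialization step just described, assuming the threshold_for() lookup from the previous sketch (all names are mine, not the paper's):

```cpp
// Sketch of the tail-control runtime check; not the paper's actual code.
#include <atomic>

struct Request {
    std::atomic<double> processed_ms{0.0};  // computation time spent so far
    std::atomic<bool>   serialized{false};  // once set, no more stealing
};

// Called periodically (e.g., at steal attempts or scheduling quanta).
// 'active_load' is the current number of admitted, unfinished requests.
void check_large(Request& r, int active_load) {
    // A request is "large" once its processing time so far exceeds
    // the threshold for the current load.
    if (!r.serialized.load(std::memory_order_relaxed) &&
        r.processed_ms.load(std::memory_order_relaxed) >
            threshold_for(active_load)) {
        // Serialize it: workers stop stealing its tasks, so it keeps
        // only one core and stops delaying waiting requests.
        r.serialized.store(true, std::memory_order_relaxed);
    }
}
```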
Work stealing for a single request
- Each worker has a local queue: execute work from the local queue if there is any; otherwise steal, to parallelize the request.
[Diagram: workers 1-3; worker 1 executes request A; workers 2 and 3 steal to parallelize A.]
Generalize work stealing to multiple requests
- Workers' local queues plus a global queue: execute work from the local queue if there is any; steal, to further parallelize a running request; admit, to start executing a new request from the global queue.
[Diagram: parallelizable requests B and C arrive at the global queue while worker 1 executes A and the others steal.]
Implement tail-control in TBB
We extend this work-stealing runtime with three policies, sketched below:
- Steal-first: try to parallelize running requests (reduce processing time).
- Admit-first: try to start waiting requests (reduce waiting time).
- Tail-control: steal-first plus large-request detection and serialization.
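A sketch of the three policies as a worker loop. The structure and the helper names (pop_local, try_steal, try_admit, and so on) are assumptions for illustration, stubbed out so the sketch compiles; TBB's real internals differ:

```cpp
// Worker-loop sketch for the three scheduling policies; illustrative only.
struct Task {};

// Stubs standing in for the real scheduler hooks:
Task* pop_local()                     { return nullptr; } // worker's local queue
void  execute(Task*)                  {}
bool  try_steal()                     { return false; }   // parallelize a running request
bool  try_steal_from_non_serialized() { return false; }   // like try_steal, but skip serialized requests
bool  try_admit()                     { return false; }   // admit a request from the global queue
bool  shutting_down()                 { return true;  }

enum class Policy { StealFirst, AdmitFirst, TailControl };

void worker_loop(Policy policy) {
    while (!shutting_down()) {
        if (Task* t = pop_local()) { execute(t); continue; }  // local work first
        if (policy == Policy::StealFirst) {
            // Further parallelize running requests (reduce processing time),
            // falling back to admitting a new request.
            if (!try_steal()) try_admit();
        } else if (policy == Policy::AdmitFirst) {
            // Start waiting requests first (reduce waiting time).
            if (!try_admit()) try_steal();
        } else {  // Policy::TailControl
            // Steal-first, but never steal tasks of a request the runtime
            // has marked serialized (see the detection sketch above).
            if (!try_steal_from_non_serialized()) try_admit();
        }
    }
}
```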
Evaluation
- Various request work distributions: Bing search, finance server, log-normal.
- Different request arrival processes: Poisson, log-normal.
- Each setting: 100,000 requests; we plot the target-latency miss ratio.
- Two baselines (generalized from work stealing for a single job): steal-first, which tries to parallelize requests and reduce processing time; admit-first, which tries to admit requests and reduce waiting time.
Improvement in target-latency miss ratio
[Plots: miss ratio across target latencies, from hard to easy to meet (i.e., relative load from high to low). Admit-first wins when the target is hard to meet; steal-first wins when it is easy; tail-control improves on both.]
The inner workings of tail-control
Tail-control sacrifices a few large requests and reduces the latency of many more small requests to meet the target latency.
[Latency distribution plots with the target latency marked.]
Tail-control performs well with inaccurate input
A slightly inaccurate input work distribution is still useful.
[Plot: miss ratio as the input work distribution goes from less to more inaccurate.]