  1. Ubik: Efficient Cache Sharing with Strict QoS for Latency-Critical Workloads. Harshad Kasture, Daniel Sanchez. ASPLOS 2014

  2. Motivation
     - Low server utilization in datacenters is a major source of inefficiency (L. Barroso and U. Hölzle, The Case for Energy-Proportional Computing)

  3. Common Industry Practice
     [Diagram: a latency-critical application running alone on a six-core machine with a shared last-level cache]
     - Dedicated machines for latency-critical applications guarantee QoS

  4. Common Industry Practice
     [Diagram: the same six-core machine; most cores sit unused]
     - Dedicated machines for latency-critical applications guarantee QoS
     - Under-utilization of machine resources

  5. Colocation to Improve Utilization
     [Diagram: latency-critical applications colocated with batch apps on the six cores and the shared last-level cache]
     - Can utilize spare resources by colocating batch apps

  6. Sharing Causes Interference!
     [Diagram: colocated applications contending for the shared last-level cache]
     - Can utilize spare resources by colocating batch apps
     - Contention in shared resources degrades QoS

  7. Outline
     - Introduction
     - Analysis of latency-critical apps
     - Inertia-oblivious cache management schemes
     - Ubik: Inertia-aware cache management
     - Evaluation

  8. Understanding Latency-Critical Applications
     [Diagram: clients connect to front-end servers, which fan each request out to many back-end servers in the datacenter]
     - A large number of backend servers participate in handling every user request
     - Total service time is determined by the tail latency behavior of the backend

  9. Understanding Latency-Critical Applications
     - Service latency is highly sensitive to changes in load
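The load sensitivity above can be illustrated with a textbook M/M/1 queue. This is an assumption for illustration only; the talk does not tie service latency to a specific queueing model, and the 1000 requests/s service rate below is hypothetical.

```python
def mm1_response_time(arrival_rate: float, service_rate: float) -> float:
    """Mean response time of an M/M/1 queue: W = 1 / (mu - lambda)."""
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable at or above 100% utilization")
    return 1.0 / (service_rate - arrival_rate)

# Hypothetical server handling 1000 requests/s: latency grows non-linearly
# with load, so a small load increase near saturation blows up service time.
for util in (0.5, 0.9, 0.99):
    w = mm1_response_time(util * 1000.0, 1000.0)
    print(f"utilization {util:.0%}: mean response time {w * 1e3:.2f} ms")
```

In this toy model the mean response time is 2 ms at 50% load but 100 ms at 99% load, which is why latency-critical services cannot simply be run at high utilization.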

  10. Understanding Latency-Critical Applications
     [Timeline: short active bursts separated by idle periods]
     - Short bursts of activity interspersed with idle periods
     - Need guaranteed high performance during active periods

  11. Inertia and Transient Behavior
     [Diagram: per-core IPC over time on a six-core machine with a shared last-level cache]

  12. Inertia and Transient Behavior
     [Diagram: IPC over time, with the transient begin and end marked after a reconfiguration]
     - Transient lengths can dominate tail latency!
     - Any dynamic reconfiguration scheme has to be inertia-aware
     - Many hardware resources exhibit inertia: branch predictors, prefetchers, memory bandwidth…
     - LLCs are one of the biggest sources of inertia
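LLC inertia can be made concrete with a toy LRU simulation (all parameters here are hypothetical, not from the talk): after a cache partition is enlarged, the hit rate only recovers once the new space has been refilled, one miss at a time.

```python
from collections import OrderedDict
import random

def run(trace, capacity, cache):
    """Simulate an LRU cache with `capacity` lines; return the hit count."""
    hits = 0
    for addr in trace:
        if addr in cache:
            cache.move_to_end(addr)   # refresh recency on a hit
            hits += 1
        else:
            cache[addr] = None        # a miss installs a new line
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the LRU line
    return hits

random.seed(0)
mk = lambda n: [random.randrange(2000) for _ in range(n)]  # uniform toy trace
cache = OrderedDict()
run(mk(20000), 500, cache)                  # reach steady state at a small size
cold = run(mk(2000), 1500, cache) / 2000    # right after growing: still cold
run(mk(20000), 1500, cache)                 # let the transient complete
warm = run(mk(2000), 1500, cache) / 2000
print(f"hit rate just after growing: {cold:.2f}, after warm-up: {warm:.2f}")
```

The window measured right after the resize shows a noticeably lower hit rate than steady state at the same size: the capacity is there, but the working set is not yet in it.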

  13. Outline
     - Introduction
     - Analysis of latency-critical apps
     - Inertia-oblivious cache management schemes
     - Ubik: Inertia-aware cache management
     - Evaluation

  14. Inertia-Oblivious Cache Management
     [Diagram: two latency-critical apps (LC1, LC2) alternating active and idle phases, colocated with two batch apps (Batch1, Batch2) sharing the LLC]

  15. Unmanaged LLC (LRU Replacement)
     [Diagram: LLC space over time under unmanaged LRU sharing]
     - ✖ Unconstrained interference results in poor tail-latency behavior

  16. Utility-Based Cache Partitioning (UCP)
     [Diagram: LLC space over time; partitions are reconfigured as apps become active or idle]
     - ✔ High batch throughput
     - ✖ Poor tail latency (low allocation)
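UCP's core idea (Qureshi and Patt's utility-based cache partitioning) is to give each additional cache way to whichever application's miss curve predicts the largest drop in misses. The sketch below is the simple greedy variant, not the paper's exact lookahead algorithm, and the miss curves are made up for illustration.

```python
def ucp_greedy(miss_curves, total_ways):
    """Greedily assign cache ways to maximize total utility (misses avoided).

    miss_curves[i][w] = misses of app i when given w ways (w = 0..total_ways).
    Returns the number of ways allocated to each app.
    """
    alloc = [0] * len(miss_curves)
    for _ in range(total_ways):
        # Marginal utility of one more way, per app, at its current allocation.
        gains = [curve[a] - curve[a + 1] for curve, a in zip(miss_curves, alloc)]
        winner = max(range(len(gains)), key=gains.__getitem__)
        alloc[winner] += 1
    return alloc

# Hypothetical miss curves: app0 saturates after one way, app1 keeps benefiting.
app0 = [100, 40, 35, 34, 34, 34, 34, 34, 34]
app1 = [100, 90, 80, 70, 60, 50, 40, 30, 20]
print(ucp_greedy([app0, app1], 8))  # → [1, 7]
```

Because the allocator chases marginal utility only, an idle latency-critical app with a flat miss curve loses its space immediately, which is exactly the tail-latency problem the slide points out.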

  17. OnOff: Efficient but Unsafe
     [Diagram: LLC space over time; a latency-critical app's partition is handed to batch apps while it idles and returned when it becomes active]
     - ✔ High batch throughput

  18. Cross-Request LLC Inertia
     [Chart: LLC access breakdown (%) for Shore-MT with a 2 MB LLC, split into misses, cross-request hits, and same-request hits]
     - Other applications are qualitatively similar (see paper for details)

  19. StaticLC: Safe but Inefficient
     [Diagram: LLC space over time; each latency-critical app keeps a fixed partition even while idle]
     - ✔ Low tail latency (preserves LLC state)
     - ✖ Low batch throughput (poor space utilization)

  20. Outline
     - Introduction
     - Analysis of latency-critical apps
     - Inertia-oblivious cache management schemes
     - Ubik: Inertia-aware cache management
     - Evaluation

  21. Ubik: Performance Guarantee
     [Graph: instructions completed over time for a request; progress with Ubik vs. progress with a constant-size partition, with the deadline marked]
     - Performance and overall progress under Ubik after the deadline are identical to static partitioning

  22. Ubik: Overview
     [Diagram: activity, target partition size, and actual partition size over time, with the nominal static size and a smaller idle size marked]

  23. Ubik: Overview
     [Diagram: as before, adding a boosted size above the nominal static size]

  24. Ubik: Overview
     [Diagram: target and actual partition sizes over time, with idle, nominal static, and boosted sizes marked]

  25. Ubik: Overview
     [Diagram: target and actual partition sizes over time, with idle, nominal static, and boosted sizes marked]
     - Constraint: cycles lost during the transient must be compensated for by cycles gained at the boosted size, before the deadline

  26. Analyzing Transients
     [Diagram: partition size grows from s1 to s2 between transient begin and transient end; progress with Ubik lags progress at constant size (s2), costing T_transient cycles and some lost performance]
     - Need accurate predictions for:
       - the length of the transient from s1 to s2
       - cycles lost during the transient from s1 to s2

  27. Hardware Support
     - Utility monitors to measure per-application miss curves
       [Plot: miss probability vs. size, with p_s1 and p_s2 marked at sizes s1 and s2]
     - Fine-grained cache partitioning
     - Memory-level parallelism (MLP) profiler
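Utility monitors build a miss curve by recording, for each hit, its LRU stack distance: an access with stack distance d would hit in any cache of more than d lines. The sketch below is a simplified, fully associative, unsampled version of that bookkeeping (real UMONs sample sets and track ways in hardware).

```python
def miss_curve(trace, max_size):
    """Return misses[s] = misses the trace would incur with s LRU lines,
    for s = 0..max_size, computed from LRU stack distances."""
    stack = []                     # most recently used address at the front
    hits_at_depth = [0] * max_size
    for addr in trace:
        if addr in stack:
            d = stack.index(addr)  # stack distance of this reuse
            if d < max_size:
                hits_at_depth[d] += 1
            stack.remove(addr)
        stack.insert(0, addr)
    total = len(trace)
    # With s lines, every access whose stack distance is < s is a hit.
    return [total - sum(hits_at_depth[:s]) for s in range(max_size + 1)]

trace = [0, 1, 2, 0, 1, 2, 3, 0]
print(miss_curve(trace, 4))  # → [8, 8, 8, 5, 4]
```

One pass over the trace yields the whole curve, which is what makes miss curves cheap enough to drive partitioning decisions like UCP's and Ubik's.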

  28. Bounds on Transient Behavior
     [Equations: closed-form bounds on the transient length T_transient and the cycles lost L when growing from s1 to s2, expressed in terms of the miss probabilities p_s over sizes s1 ≤ s < s2, the lines gained (s2 − s1), and the miss penalty; see the paper for the derivation]
     [Diagram: partition size and progress over time between transient begin and transient end, comparing progress with Ubik against constant-size (s2) performance]

  29. Ubik: Partition Sizing
     - Use transient analysis to identify feasible (idle size, boosted size) pairs
     [Diagram: candidate pair 1 on a size-vs-time plot, with the deadline marked]

  30. Ubik: Partition Sizing
     - Use transient analysis to identify feasible (idle size, boosted size) pairs
     [Diagram: candidate pairs 1 and 2, with the deadline marked]

  31. Ubik: Partition Sizing
     - Use transient analysis to identify feasible (idle size, boosted size) pairs
     [Diagram: candidate pairs 1, 2, and 3, with the deadline marked]

  32. Ubik: Partition Sizing
     - Use transient analysis to identify feasible (idle size, boosted size) pairs
     [Diagram: candidate pairs 1 through 4, with the deadline marked; pair 4 is infeasible]

  33. Ubik: Partition Sizing
     - Use transient analysis to identify feasible (idle size, boosted size) pairs
     - Choose the pair that yields the maximum batch throughput
     [Diagram: the feasible candidate pairs, with the deadline marked]
     - See paper for details
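The sizing step above can be sketched as a brute-force search (a simplification; the paper derives the feasible set analytically from the transient bounds). The feasibility predicate and throughput model below are toy stand-ins, not Ubik's actual formulas.

```python
def choose_sizes(candidates, meets_deadline, batch_throughput):
    """Pick the feasible (idle_size, boosted_size) pair maximizing batch throughput.

    candidates:       iterable of (idle_size, boosted_size) pairs
    meets_deadline:   predicate from the transient analysis (assumed given)
    batch_throughput: estimated batch throughput for a pair (assumed given)
    """
    feasible = [p for p in candidates if meets_deadline(*p)]
    if not feasible:
        raise ValueError("no feasible pair: fall back to the static partition")
    return max(feasible, key=lambda p: batch_throughput(*p))

# Toy stand-ins: a smaller idle size frees more space for batch apps, but the
# boost must be large enough (and the jump small enough) to meet the deadline.
pairs = [(i, b) for i in range(0, 9) for b in range(i, 9)]
ok = lambda i, b: b - i <= 4 and b >= 6   # hypothetical feasibility rule
tput = lambda i, b: 8 - i                 # batch gains from a small idle size
print(choose_sizes(pairs, ok, tput))      # → (2, 6)
```

The structure matches the slides: feasibility comes from the transient analysis, and among feasible pairs the one that leaves the most cache to batch apps wins.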

  34. Outline
     - Introduction
     - Analysis of latency-critical apps
     - Inertia-oblivious cache management schemes
     - Ubik: Inertia-aware cache management
     - Evaluation

  35. Workloads
     - Five diverse latency-critical apps:
       - xapian (search engine)
       - masstree (in-memory key-value store)
       - moses (statistical machine translation)
       - shore-mt (multi-threaded DBMS)
       - specjbb (Java middleware)
     - Batch applications: random mixes of SPEC CPU2006 benchmarks

  36. Target System
     [Diagram: six cores, each attached to one bank of the shared L3; three latency-critical apps (LC1–LC3) and three batch apps (Batch1–Batch3) pinned to cores]
     - 6 OOO cores
     - Private L1I, L1D, and L2 caches
     - 12 MB shared LLC
     - 400 6-app mixes: 3 latency-critical + 3 batch apps
     - Apps pinned to cores

  37. Metrics
     [Diagram: baseline system with a private LLC slice per core]
     - Baseline system has private LLCs
     - We report:
       - normalized tail latency
       - throughput improvement for batch applications
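The tail-latency metric can be computed as in this sketch. The percentile choice (95th here) and the nearest-rank method are assumptions for illustration; the slide only says tail latency is reported relative to the private-LLC baseline.

```python
import math

def tail_latency(latencies, pct=95):
    """Return the pct-th percentile latency (nearest-rank method)."""
    xs = sorted(latencies)
    rank = max(1, math.ceil(pct / 100 * len(xs)))
    return xs[rank - 1]

def normalized_tail(scheme_lat, baseline_lat, pct=95):
    """Tail latency under a scheme, normalized to the private-LLC baseline."""
    return tail_latency(scheme_lat, pct) / tail_latency(baseline_lat, pct)

baseline = list(range(1, 101))      # toy latencies 1..100 -> p95 = 95
scheme = [2 * x for x in baseline]  # uniformly 2x slower -> ratio 2.0
print(normalized_tail(scheme, baseline))  # → 2.0
```

A normalized tail of 1.0 means the scheme matches the dedicated-machine baseline, which is the bar Ubik aims to meet while still sharing the cache.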

  38. Results: Unmanaged LLC (LRU)
     [Chart: higher is better]

  39. Results: UCP
     [Chart: higher is better]

  40. Results: OnOff
     [Chart: higher is better]

  41. Results: StaticLC
     [Chart: higher is better]

  42. Results: Ubik
     [Chart: higher is better]

  43. Results: Summary
     [Chart comparing LRU, UCP, OnOff, StaticLC, and Ubik against the private-LLC baseline; higher is better]

  44. Conclusions
     - To guarantee tail latency, dynamic resource management schemes must be inertia-aware
     - Ubik: inertia-aware cache capacity management
       - Preserves the tail latency of latency-critical apps
       - Achieves high cache space utilization for batch apps
       - Requires minimal additional hardware

  45. Thanks for your attention! Questions?
