ACM Symposium on Cloud Computing 2019

Tenants: rent Virtual Machines (VMs)
Cloud providers: operate cloud infrastructures
Great budget expenditure for:
• Data center equipment
• Power provisioning
Ø Virtual resources might be provisioned (by tenants) for peak load
Ø Placement of tenants' VMs (by providers) is challenging

[Figure: CDF of average CPU and memory usage, Alibaba cluster trace (2018). x-axis: average usage; y-axis: cumulative probability, F(x). fg = foreground/online workload]
[Figure: VM-level CPU usage for the Azure trace (2017). x-axis: time (days); y-axis: CPU utilization (%)]
Great opportunity to use cloud idle resources

[Figure: CDF of average CPU and memory usage, Alibaba cluster trace (2018). x-axis: average usage; y-axis: cumulative probability, F(x). bg = background/batch workload]
Problem statement: How to schedule background batch jobs to improve utilization without hurting black-box foreground performance?

fg: facebook; bg: FB-Hadoop

[Diagram labels: Physical server; network; Virtual Machine (VM); Container [1] … Container [n_socket], each with a Worker process; Scavenger Daemon; Data sources]

[Diagram labels: VM; Container; Web serving; DCopy; CPU cores 0 to 3; Last Level Cache (LLC)]
Setup: Ubuntu 16.04, KVM, Docker, using Linux's cpuset cgroups
[Plots: 95%ile RT degradation (%) and Instructions Per Cycle (IPC) degradation (%), each against background CPU usage (%)]
IPC is used as performance proxy
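The cpuset pinning named on this slide can be sketched as follows. This is a minimal illustration, not the authors' tooling: the helper names and the container name `dcopy` are hypothetical, and the command is only built, not executed. It relies on Docker's `--cpuset-cpus` option, which accepts core lists such as `0-1,3`.

```python
def cpuset_string(cores):
    """Format core IDs as a cpuset list such as '0-1,3'."""
    ordered = sorted(set(cores))
    ranges = []
    for core in ordered:
        if ranges and core == ranges[-1][1] + 1:
            ranges[-1][1] = core          # extend the current contiguous run
        else:
            ranges.append([core, core])   # start a new run
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


def docker_pin_command(container, cores):
    """Build (but do not run) a command that restricts a container to `cores`."""
    return ["docker", "update", "--cpuset-cpus", cpuset_string(cores), container]


if __name__ == "__main__":
    # Hypothetical split: background container on cores 2 and 3.
    print(" ".join(docker_pin_command("dcopy", [2, 3])))
```

Shrinking or growing the core list passed to the container is one way a controller could vary background CPU usage while the foreground VM stays untouched.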

Ø Our generic online algorithm
• Monitor VMs' perf metric (e.g., memory usage) for window-size
• Calculate mean, ν, and standard deviation, τ
• React based on the VMs' perf metric and ν ± d·τ
Simplified illustration (normalized metric value [memory usage, network usage] over time, one window-size at a time):
• Metric above ν + d·τ: bg--
• Metric between ν − d·τ and ν + d·τ: do nothing
• Metric below ν − d·τ: bg++
• Headroom: bg = 0 to bg = 1 − (ν + d·τ)
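The reaction rule above can be sketched in a few lines. This is an illustrative reading of the slide, not the authors' implementation: the class name, the fixed `step`, the use of the population standard deviation, and the choice to cap bg at the headroom 1 − (ν + d·τ) are all assumptions.

```python
from collections import deque
from statistics import mean, pstdev


class WindowController:
    """Hypothetical sketch of the nu +/- d * tau reaction rule."""

    def __init__(self, window_size=10, d=1.0, step=0.05):
        self.samples = deque(maxlen=window_size)  # last window-size samples
        self.d = d
        self.step = step   # assumed fixed increment for bg++ / bg--
        self.bg = 0.0      # normalized background allocation

    def observe(self, metric):
        """Feed one normalized foreground sample in [0, 1]; return the action."""
        action = "none"
        if len(self.samples) == self.samples.maxlen:
            nu = mean(self.samples)
            tau = pstdev(self.samples)
            upper = nu + self.d * tau
            lower = nu - self.d * tau
            headroom = max(0.0, 1.0 - upper)
            if metric > upper:
                self.bg = max(0.0, self.bg - self.step)
                action = "bg--"
            elif metric < lower:
                self.bg = min(headroom, self.bg + self.step)
                action = "bg++"
        self.samples.append(metric)
        return action


if __name__ == "__main__":
    ctl = WindowController(window_size=4, d=1.0, step=0.1)
    for sample in [0.5, 0.5, 0.5, 0.5, 0.9, 0.2, 0.6]:
        print(sample, ctl.observe(sample), round(ctl.bg, 3))
```

A larger d widens the do-nothing band and makes the controller less reactive; a smaller d makes it back off sooner when the foreground metric rises.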

Foreground
• Training: CloudSuite (widely used benchmark suite)
• Testing: TailBench (designed for latency-critical applications)
Background (SparkBench)
• KMeans (a popular clustering algorithm)
• SparkPi (computes Pi with very high precision)
Sensitivity analysis; experimental evaluation

The load generators employed in TailBench are open-loop.

Workload    Domain                        Tail latency scale
Xapian      Online search                 Milliseconds
Moses       Real-time translation         Milliseconds
Silo        In-memory database (OLTP)     Microseconds
Specjbb     Java middleware               Microseconds
Masstree    Key-value store               Microseconds
Shore       On-disk database (OLTP)       Milliseconds
Sphinx      Speech recognition            Seconds
Img-dnn     Image recognition             Milliseconds

http://people.csail.mit.edu/sanchez/papers/2016.tailbench.iiswc.pdf
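In an open-loop generator, request send times are fixed by a schedule and do not wait for earlier responses, so queueing delay shows up in the measured tail latency. The sketch below only illustrates that idea; it is not TailBench code, and the Poisson arrival process, function name, and parameters are assumptions.

```python
import random


def open_loop_arrivals(rate_per_s, duration_s, seed=0):
    """Return send timestamps drawn independently of any response times."""
    rng = random.Random(seed)
    t, arrivals = 0.0, []
    while True:
        t += rng.expovariate(rate_per_s)  # exponential gap => Poisson arrivals
        if t >= duration_s:
            return arrivals
        arrivals.append(t)


if __name__ == "__main__":
    times = open_loop_arrivals(rate_per_s=100, duration_s=1.0)
    print(len(times), "requests scheduled in 1 s")
```

A closed-loop generator would instead issue the next request only after the previous one completes, which hides latency spikes by slowing the offered load.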

[Diagram: testbed. PM 1: processor socket 0 and processor socket 1, each with cores 0 to 9 and an LLC of size 25MB; Background VM 1 and Background VM 2 are overlaid on the cores. PM 2: Resource Manager, Name Node, Data Node. Other labels: 250GB DRAM; 10 Gb/s network; KVM, Docker; Ubuntu 16.04]

[Plots: 95%ile latency degradation (%) for each "VM 1 workload || VM 2 workload" pair, one panel per background job]
bg: SparkPi: CPU 43%↑, Memory 201%↑
bg: KMeans: CPU 34%↑, Memory 321%↑
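The metric on these plots, 95%ile latency degradation (%), can be computed as below. This is a sketch under stated assumptions: the nearest-rank percentile definition and the function names are choices made here, and the numbers in the usage lines are synthetic, not results from the talk.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: smallest sample with at least p% at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def tail_degradation_pct(baseline, colocated, p=95):
    """Percent increase of the p-th percentile latency relative to the baseline."""
    base = percentile(baseline, p)
    return 100.0 * (percentile(colocated, p) - base) / base


if __name__ == "__main__":
    alone = list(range(1, 101))            # synthetic latencies, fg running alone
    with_bg = [v + 19 for v in alone]      # synthetic latencies, fg with bg job
    print(tail_degradation_pct(alone, with_bg))
```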

Lab testbed: 2-vCPU foreground VM, 2-core background container.
[Plot: increase in transfer time (%) for Sorting and FFT; series: Baseline, Heracles, Static 160Mbps, Static 80Mbps, Scavenger]
CPU 37%↑, Network 180Mbps ↑
Scavenger outperforms static approaches while affording higher background usage.

Cloud testbed: 4-vCPU foreground VM, 6-core background DCopy container.
[Plot: normalized 95%ile latency (axis 0 to 4) for xapian, moses, silo, specjbb, masstree, shore, sphinx, img-dnn; series: No background, Baseline, Scavenger; off-scale bar labels: 35346, 369968, 13, 177343, 62, 50612]
CPU 3-5% ↑
Scavenger can successfully and aggressively regulate bg workload to mitigate its impact on fg performance.
