To Relay or Not to Relay for Inter-Cloud Transfers?
Fan Lai, Mosharaf Chowdhury, Harsha Madhyastha

Background
• Over 40 data centers (DCs) on EC2, Azure, and Google Cloud
• A geographically denser set of DCs across clouds
• Cloud apps are hosted on multiple DCs
• Web search, interactive multimedia
• Low-latency access, privacy regulations
• Massive data across geo-distributed DCs

WAN is Crucial for Geo-distributed Services
• Bandwidth-intensive transfers
  • Geo-distributed replication: web search, cloud storage
  • Inter-DC routing: SWAN [SIGCOMM’13], Pretium [SIGCOMM’16], etc.
  • Big data analytics: Iridium [SIGCOMM’15], Clarinet [OSDI’16] …
  • …
• Latency-sensitive traffic
  • Interactive services: Skype, Hangout
  • Transaction processing: SPANStore [SOSP’13], Carousel [SIGMOD’18], etc.
  • …

Prior Efforts: WAN b/w varies spatially
• WAN bandwidth (b/w) varies significantly between different regions
• Close regions have more than 12× the b/w of distant regions [1]
[Figure: direct path Sao Paulo to Singapore (VM-WAN-VM) vs. relay path via Virginia (VM-WAN-VM-WAN-VM), annotated ≈ 3x. Bandwidth measurement across 11 EC2 regions [1]]
[1] “Gaia: Geo-Distributed Machine Learning Approaching LAN Speeds.” NSDI’17

WAN Bandwidth Varies Spatially
• Reproduce prior measurements
• 11 EC2 regions, 110 inter-DC pairs
• Tool: iperf (TCP)
• Heterogeneous link capacity
• Varies between the same type of VMs
• Lower b/w between distant regions
• Relay should work pretty well

About 40% of data transfers between EC2 regions can gain more than a 1.5x bandwidth increase via relay
[Figure: Bandwidth improvement via best relay on EC2]
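The "best relay" comparison behind this figure can be sketched as follows, assuming a relayed transfer is limited by the slower of its two hops. The function name, the region names, and every bandwidth value here are hypothetical placeholders, not measurements from the talk.

```python
from itertools import permutations


def best_relay_gain(bw, src, dst):
    """Return (gain, relay): best achievable b/w divided by direct b/w.

    `relay` is None when no relay beats the direct path.
    """
    direct = bw[src][dst]
    best_bw, best_relay = direct, None
    for relay in bw:
        if relay in (src, dst):
            continue
        # Assumption: a relayed path is bottlenecked by its slower hop.
        relayed = min(bw[src][relay], bw[relay][dst])
        if relayed > best_bw:
            best_bw, best_relay = relayed, relay
    return best_bw / direct, best_relay


# Hypothetical single-connection bandwidths in Mbps (illustrative only).
bw = {
    "sao-paulo": {"virginia": 300.0, "singapore": 60.0},
    "virginia": {"sao-paulo": 300.0, "singapore": 180.0},
    "singapore": {"sao-paulo": 60.0, "virginia": 180.0},
}

# Fraction of ordered region pairs whose best relay gives more than 1.5x.
gains = {(s, d): best_relay_gain(bw, s, d)[0] for s, d in permutations(bw, 2)}
share = sum(g > 1.5 for g in gains.values()) / len(gains)
```

With a real measurement matrix in place of `bw`, `share` would be the quantity the slide summarizes as "about 40%".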

How to identify and tackle this complicated WAN?
- Heterogeneous across regions
- Dynamic runtime environment
- Great complexity in sys design

How to identify and tackle this complicated WAN?
- Heterogeneous across regions
- Dynamic runtime environment
- Great complexity in sys design
Assumptions in prior measurements:
- Default TCP setting works well
- Single TCP is representative enough for the available b/w

What if we break down these assumptions?
- Default TCP setting works well
- Single TCP is representative enough for the available b/w
#1: Does the b/w still vary spatially?
#2: Does the b/w still vary temporally?
#3: How much room is there for WAN improvement via relay?

Default TCP Setting may be Sub-optimal
• B/w varies across regions
• Lower b/w between distant regions
• RTT varies across regions
• Max TCP window is bounded
• TCP throughput is RTT-based
[Figure: Google, bandwidth to Iowa]

Default TCP Setting is Sub-optimal
• B/w varies across regions
• Lower b/w between distant regions
• RTT varies across regions
• Max TCP window is bounded
• TCP throughput is RTT-based
• Per-TCP rate limit on the WAN
[Figure: Google, bandwidth to Iowa]
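The RTT dependence follows from the standard window bound: a single TCP connection can keep at most one window of data in flight per round trip, so its throughput is at most window / RTT. A minimal sketch of that arithmetic, where the 4 MiB maximum window and the RTT values are hypothetical and not settings reported in the talk:

```python
def tcp_window_bound_mbps(window_bytes, rtt_ms):
    """Upper bound on single-connection throughput in Mbps: window / RTT."""
    bits_per_window = window_bytes * 8
    rtt_seconds = rtt_ms / 1000.0
    return bits_per_window / rtt_seconds / 1e6


MAX_WINDOW_BYTES = 4 * 1024 * 1024  # hypothetical maximum TCP window

# The same window yields a lower ceiling as the RTT grows.
for rtt_ms in (20, 100, 200):
    bound = tcp_window_bound_mbps(MAX_WINDOW_BYTES, rtt_ms)
    print(f"RTT {rtt_ms} ms -> at most {bound:.0f} Mbps")
```

Under this bound, a fixed maximum window alone makes distant (high-RTT) region pairs look slower, independent of the capacity actually available on the WAN.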

Single TCP is not Representative
• A single TCP connection underutilizes the b/w
• Use multiple TCP connections
• Per-VM cap for outbound rate
• Per-TCP rate limit < Per-VM cap
• Aggregate b/w is homogeneous
• VM-cap applies to all connections
[Figure: Google, bandwidth to Iowa]
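The interaction between the per-TCP limit and the per-VM cap can be sketched with a simple model: each connection contributes up to the per-TCP limit, and the sum is clipped at the VM cap. The function names and the 250 Mbps and 2000 Mbps figures are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import math


def aggregate_bw(n_conns, per_tcp_limit, vm_cap):
    """Aggregate outbound b/w of n parallel TCP connections (same units)."""
    return min(n_conns * per_tcp_limit, vm_cap)


def conns_to_saturate(per_tcp_limit, vm_cap):
    """Smallest connection count whose combined limit reaches the VM cap."""
    return math.ceil(vm_cap / per_tcp_limit)


PER_TCP_LIMIT_MBPS = 250   # hypothetical per-connection limit
VM_CAP_MBPS = 2000         # hypothetical per-VM outbound cap

for n in (1, 4, 8, 16):
    print(n, aggregate_bw(n, PER_TCP_LIMIT_MBPS, VM_CAP_MBPS))
```

In this model a single connection reports only the per-TCP limit, while enough parallel connections report the VM cap regardless of destination, which is one way to read "aggregate b/w is homogeneous".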

What if we break down these assumptions?
- Default TCP setting works well
- Single TCP is representative enough for the available b/w
#1: Does the b/w still vary spatially? Often homogeneous
#2: Does the b/w still vary temporally?
#3: How much room is there for WAN improvement via relay?

Available B/w is often Stable
• Measurement setup
• Create/terminate connections
• Inter-DC connections share the VM-cap
[Figure: Google, throughput from Iowa; annotation: create new connections]

Available B/w is often Stable
• Measurement setup
• Create/terminate connections
• Inter-DC connections share the VM-cap
[Figure: Google, throughput from Iowa; annotation: terminate connections]

Available B/w is often Stable
• Measurement setup
• Create/terminate connections
• Inter-DC connections share the VM-cap
• Max b/w (VM cap) is stable
[Figure: Google, throughput from Iowa; annotation: aggregate b/w is stable]
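One way to quantify "stable" is the coefficient of variation (standard deviation divided by mean) of the aggregate throughput over time. The sketch below uses that metric as an assumption, not as the method used in the talk, and the two per-connection traces are hypothetical: connection B is created at the second sample and terminated before the fourth.

```python
import statistics


def coefficient_of_variation(samples):
    """Population standard deviation divided by the mean."""
    return statistics.pstdev(samples) / statistics.fmean(samples)


def aggregate_series(per_conn_series):
    """Sum per-connection throughput at each time step."""
    return [sum(step) for step in zip(*per_conn_series)]


# Hypothetical throughput traces in Mbps, one list per connection.
conn_a = [1000, 500, 500, 1000]
conn_b = [0, 500, 500, 0]

total = aggregate_series([conn_a, conn_b])
print("per-connection CoV:", coefficient_of_variation(conn_a))
print("aggregate CoV:", coefficient_of_variation(total))
```

In this toy trace each connection's share changes when a peer appears, while the aggregate stays pinned at the cap, which mirrors the pattern the slide describes.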

Homogeneous bandwidth
Maximum available bandwidth
- Homogeneous across regions
- Stable over time
- Varies with VM instances
- Performance can be predictable w/o great sys complexity
What will happen if the b/w is homogeneous?

Little Scope for Optimization via Inter-DC Relay
Homogeneous bandwidth: what will happen if the b/w is homogeneous?
[Figure: Latency measurement across 40 DCs]

Takeaway
• Intra-DC relay from poor-performance VMs to high-performance VMs
• Gain more inter-DC bandwidth without extra costs for transfers
• Routing through a third DC takes your money away
[Figure: Intra-DC relay (VM to VM inside DC 1, then DC 1 to DC 2) costs 0 + $ + 0 = $; inter-DC routing (DC 1 to DC 3 to DC 2) costs $ + $ = 2$]
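The cost comparison in the figure can be sketched as a per-hop sum, assuming (as the slide's "0 + $ + 0" does) that hops inside a DC are free and every hop between DCs is billed once at the same egress price. The function name, the path encoding, and the unit price are hypothetical.

```python
def transfer_cost(path, gb, egress_price_per_gb):
    """Cost of moving `gb` gigabytes along `path`, a list of (dc, vm) hops."""
    cost = 0.0
    for (dc_a, _), (dc_b, _) in zip(path, path[1:]):
        if dc_a != dc_b:
            # Inter-DC hop: billed as egress.
            cost += gb * egress_price_per_gb
        # Intra-DC hop: assumed free, following the slide.
    return cost


PRICE = 1.0  # hypothetical price per GB, i.e. one "$" in the slide

intra_dc_relay = [("DC1", "vm-slow"), ("DC1", "vm-fast"),
                  ("DC2", "vm-fast"), ("DC2", "vm-dst")]
inter_dc_routing = [("DC1", "vm"), ("DC3", "vm"), ("DC2", "vm")]

print(transfer_cost(intra_dc_relay, 1, PRICE))    # one billed hop
print(transfer_cost(inter_dc_routing, 1, PRICE))  # two billed hops
```

Real cloud pricing differs per provider and region pair, so the flat price here only reproduces the slide's $ versus 2$ comparison.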

Takeaway
• Turn to the optimization of bandwidth contentions inside VMs
• VM-cap vs. link-level optimizations used in existing GDA work
• VM-aware vs. WAN-aware
• Bandwidth measurements are far from complete
• More than 40 VM instance types
[Figure: one VM sending to n VMs at rates $b_1, b_2, \dots, b_n$, subject to $\sum_i b_i \le \text{VM-cap}$]
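As one illustration of VM-aware allocation, the sketch below splits a VM's outbound cap across destinations so that the rates satisfy the constraint sum(b_i) ≤ VM-cap. It uses max-min fairness, which is a policy chosen here for illustration and not one proposed in the talk; the function name and the demand values are hypothetical.

```python
def max_min_share(demands, vm_cap):
    """Max-min fair rates b_i with sum(b_i) <= vm_cap and b_i <= demands[i]."""
    rates = [0.0] * len(demands)
    remaining = float(vm_cap)
    # Serve the smallest demands first; they free up capacity for the rest.
    pending = sorted(range(len(demands)), key=lambda i: demands[i])
    while pending:
        fair = remaining / len(pending)
        i = pending[0]
        if demands[i] <= fair:
            # Demand fits within the fair share: satisfy it fully.
            rates[i] = float(demands[i])
            remaining -= rates[i]
            pending.pop(0)
        else:
            # Every remaining demand exceeds the fair share: split evenly.
            for j in pending:
                rates[j] = fair
            break
    return rates


# Hypothetical per-destination demands in Mbps under a 1000 Mbps VM cap.
rates = max_min_share([100, 400, 900], 1000)
print(rates, sum(rates))
```

A link-level optimizer would instead reason about each WAN path separately; this sketch treats the sender's cap as the only shared bottleneck, matching the VM-aware view in the slide.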

#1: Does the b/w still vary spatially? Often homogeneous
#2: Does the b/w still vary temporally? Often stable
#3: How much room is there for WAN improvement via relay? Case by case
Thank you! Questions?
fanlai@umich.edu
