Surviving congestion in geo-distributed storage systems
Brian Cho, University of Illinois at Urbana-Champaign
Marcos K. Aguilera, Microsoft Research Silicon Valley
Geo-distributed data centers
• Web applications increasingly deployed across geo-distributed data centers
  – e.g., social networks, online stores, messaging
• App data replicated across data centers
  – Disaster tolerance
  – Access locality
Congestion between geo-distributed data centers
• Limited bandwidth between data centers
  – e.g., leased lines, MPLS VPN
  – Bandwidth is expensive: ~$1K/Mbps [SprintMPLS]
  – Provisioned for typical (not peak) usage
• Many machines in each data center
Congestion → delay between geo-distributed data centers
• Congestion can cause significant delays
  – TCP message latency grows to order of seconds (see figure)
  – Observed across Amazon EC2 data centers [Kraska et al.]
• Users do not tolerate delays beyond ~1s [Nielsen]
[Figure: RPC round-trip delay under congestion reaches 10-30s]
Replication techniques applied to geo-distributed data centers
• Weak consistency
  – e.g., Amazon Dynamo, Yahoo PNUTS, COPS
  – Good performance: updates can be propagated asynchronously
  – Semantics undesirable in some cases (e.g., writes get re-ordered across replicas)
• Strong consistency
  – e.g., ABD, Paxos; available in Google Megastore, Amazon SimpleDB
  – Avoids the many problems of weak consistency
  – Must wait for updates to propagate across data centers
  – App delay requirements difficult to meet under congestion
Contributions
• Vivace: a strongly consistent key-value store that is resilient to congestion across geo-distributed data centers
• Approach
  – New algorithms send a small amount of critical information across data centers in separate prioritized messages
• Challenges
  – Still provide strong consistency
  – Keep prioritized messages small
  – Avoid delay overhead in the absence of congestion
Vivace algorithms
• Enhance previous strongly consistent algorithms
• Prioritize a small amount of critical information across sites
Two algorithms:
1. Read/write algorithm
   – Very simple
   – Based on the traditional quorum algorithm [ABD]
   – Linearizable read() and write()
   – read() contains a write-back phase
2. State machine replication algorithm
   – More complex; details in paper
Traditional quorum algorithm: write
1. Client sends <WRITE, key, val, ts> to Replicas 1-3; val is large (compared with key & ts)
2. Replicas reply <ACK-WRITE>; after a quorum of ACKs, the write is done
Traditional quorum algorithm: read
1. Client sends <READ, key> to Replicas 1-3; each replies <ACK-READ, val, ts>, carrying the large val
2. Write-back: client re-sends <WRITE, key, val, ts> (the large val, again!) and waits for <ACK-WRITE> from a quorum
   – The write-back ensures strong consistency (linearizability)
• The read is then done
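To make the message flow above concrete, here is a minimal in-memory Python sketch of the traditional quorum (ABD-style) read/write algorithm. The Replica class, the direct method calls standing in for network RPCs, and the fixed timestamps are illustrative assumptions, not the paper's implementation; in a real deployment these messages cross wide-area links to replicas in other data centers.

# Illustrative sketch only: direct calls stand in for RPCs, and contacting
# exactly a quorum simulates "send to all, wait for a majority of replies".
class Replica:
    def __init__(self):
        self.store = {}                          # key -> (ts, val)

    def write(self, key, ts, val):
        cur_ts, _ = self.store.get(key, (-1, None))
        if ts > cur_ts:                          # keep only the freshest (ts, val)
            self.store[key] = (ts, val)
        return "ACK-WRITE"

    def read(self, key):
        return self.store.get(key, (-1, None))   # reply <ACK-READ, val, ts>

REPLICAS = [Replica() for _ in range(3)]
MAJORITY = len(REPLICAS) // 2 + 1                # quorum = 2 of 3

def quorum_write(key, val, ts):
    # <WRITE, key, val, ts> to a quorum; note the large val crosses the wide area.
    acks = [r.write(key, ts, val) for r in REPLICAS[:MAJORITY]]
    return all(a == "ACK-WRITE" for a in acks)

def quorum_read(key):
    # Phase 1: <READ, key>; collect (ts, val) from a quorum, pick the largest ts.
    replies = [r.read(key) for r in REPLICAS[:MAJORITY]]
    ts, val = max(replies, key=lambda r: r[0])
    # Phase 2 (write-back): re-write the chosen value so no later read can
    # return an older one -- this is what gives linearizability.
    quorum_write(key, val, ts)
    return val

quorum_write("user:42", "hello", ts=1)
print(quorum_read("user:42"))                    # -> hello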
Vivace: write
• A new quorum of local replicas (Local Replica 1-3) is used alongside the remote Replicas 1-3
1. Client sends <W-LOCAL, key, val, ts> to the local replicas (val is sent only locally); they reply <ACK-W-LOCAL>
2. Client sends <W-TS, key, ts> to Replicas 1-3; with no val, the small message can be prioritized; they reply <ACK-W-TS>
   – The write is now done: Replicas 1-3 have a consistent view of key & ts, but no val (yet)
*  Client then sends <W-REMOTE, key, val, ts> to Replicas 1-3, adding val to their consistent view of key & ts; val is still large, but no longer on the critical path
Write comparison
• Traditional algorithm: 1 remote RTT
• Vivace algorithm: 1 prioritized remote RTT + 1 local RTT
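Below is a minimal in-memory Python sketch of the Vivace write path shown above. The message names (W-LOCAL, W-TS, W-REMOTE) follow the slides, but the VivaceReplica class, the LOCAL/REMOTE groups, and the direct calls standing in for (prioritized) RPCs are illustrative assumptions, not the paper's implementation.

# Illustrative sketch only: direct calls stand in for RPCs; in the real system
# W-TS travels in a small, prioritized wide-area message.
class VivaceReplica:
    def __init__(self):
        self.ts  = {}                            # key -> latest timestamp seen
        self.val = {}                            # key -> value (may lag behind ts)

    def w_local(self, key, val, ts):             # full value, local site only
        if ts >= self.ts.get(key, -1):
            self.ts[key], self.val[key] = ts, val
        return "ACK-W-LOCAL"

    def w_ts(self, key, ts):                     # tiny message, sent prioritized
        if ts > self.ts.get(key, -1):
            self.ts[key] = ts                    # val arrives later via W-REMOTE
        return "ACK-W-TS"

    def w_remote(self, key, val, ts):            # large val, off the critical path
        if ts >= self.ts.get(key, -1):
            self.ts[key], self.val[key] = ts, val

LOCAL  = [VivaceReplica() for _ in range(3)]     # new quorum of local replicas
REMOTE = [VivaceReplica() for _ in range(3)]     # replicas in other data centers
quorum = lambda group: group[: len(group) // 2 + 1]

def vivace_write(key, val, ts):
    # 1. <W-LOCAL, key, val, ts>: store the large val at a local quorum (1 local RTT).
    for r in quorum(LOCAL):
        r.w_local(key, val, ts)
    # 2. <W-TS, key, ts>: only key & ts cross the wide area, in a small
    #    prioritized message (1 prioritized remote RTT). The write is done here.
    for r in quorum(REMOTE):
        r.w_ts(key, ts)
    # *  <W-REMOTE, key, val, ts>: ship the large val in the background.
    for r in REMOTE:
        r.w_remote(key, val, ts)

vivace_write("user:42", "hello", ts=1)           # sample call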
Vivace: read
1. Client sends <R-TS, key> to Replicas 1-3, asking only for ts; the small message is prioritized; replicas reply <ACK-R-TS, ts>
2. Client sends <R-DATA, key, ts>, asking for the data with the largest ts; a replica replies <ACK-R-DATA, val>; val is large, but the client waits for only one reply (common case: a local replica)
3. Write-back: client sends <W-TS, key, ts>, writing back only the small ts, so the message is prioritized; replicas reply <ACK-W-TS>; the read is then done
Read comparison
• Traditional algorithm: 2 remote RTTs
• Vivace algorithm: 2 prioritized remote RTTs + 1 local RTT
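And a matching in-memory Python sketch of the Vivace read path, again using the slides' message names (R-TS, R-DATA, W-TS); the class and helper names are illustrative assumptions, not the paper's implementation. R-TS and the W-TS write-back are small prioritized messages, while the single R-DATA reply carries the large val and in the common case comes from a nearby replica.

# Illustrative sketch only: direct calls stand in for (prioritized) RPCs.
class VivaceReplica:
    def __init__(self):
        self.store = {}                          # key -> (ts, val)

    def r_ts(self, key):                         # tiny prioritized request: ts only
        return self.store.get(key, (-1, None))[0]

    def r_data(self, key, ts):                   # return val only if fresh enough
        cur_ts, val = self.store.get(key, (-1, None))
        return val if cur_ts >= ts else None

    def w_ts(self, key, ts):                     # tiny prioritized write-back
        cur_ts, val = self.store.get(key, (-1, None))
        if ts > cur_ts:
            self.store[key] = (ts, val)
        return "ACK-W-TS"

REPLICAS = [VivaceReplica() for _ in range(3)]
MAJORITY = len(REPLICAS) // 2 + 1

def vivace_read(key):
    # 1. <R-TS, key> (prioritized): get timestamps from a quorum, keep the largest.
    ts = max(r.r_ts(key) for r in REPLICAS[:MAJORITY])
    # 2. <R-DATA, key, ts>: ask for the matching value, but wait for only one
    #    reply -- in the common case a local replica already has it.
    val = None
    for r in REPLICAS:
        v = r.r_data(key, ts)
        if v is not None:
            val = v
            break
    # 3. <W-TS, key, ts> (prioritized): write back only the small ts, which is
    #    enough to ensure linearizability for later reads.
    for r in REPLICAS[:MAJORITY]:
        r.w_ts(key, ts)
    return val

REPLICAS[0].store["user:42"] = (1, "hello")      # pretend one replica has the value
print(vivace_read("user:42"))                    # -> hello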
Evaluation topics
• Practical prioritization setup
• Delay with congestion
  – KV-store operations
  – Twitter-clone web app operations
• Delay without congestion
  – Overhead of Vivace algorithms compared to traditional algorithms
Evaluation setup
• Local cluster (Illinois) ↔ Amazon EC2 (Ireland)
• DSCP-bit prioritization on the local router's egress port (prioritization applied here only)
• Congestion generated with iperf
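As a concrete illustration of this setup, the sketch below shows how a sender on Linux could mark its small critical messages with a DSCP codepoint so that the local router's egress queueing can prioritize them. The DSCP value, port, and address are assumptions for illustration; the router-side queue configuration is not shown and is not part of this snippet.

import socket

DSCP_EF = 46                        # "Expedited Forwarding" codepoint (example choice)
TOS_BYTE = DSCP_EF << 2             # DSCP occupies the upper 6 bits of the TOS byte

def prioritized_udp_socket():
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, TOS_BYTE)   # mark outgoing packets
    return s

# Example: send a small <W-TS, key, ts> message with the priority marking.
# 127.0.0.1:9000 is a placeholder for a remote replica's address.
sock = prioritized_udp_socket()
sock.sendto(b"W-TS|user:42|17", ("127.0.0.1", 9000))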
Evaluation: does prioritization work in practice?
• Simple ping experiment
• Prioritized messages bypass congestion
• Local router-based prioritization is effective
Evaluation: how well does Vivace perform under congestion?
• KV-store operations: (a) read algorithms, (b) write algorithms, (c) state machine algorithms
• Twitter-clone operations: (a) post tweet, (b) read user timeline, (c) read friends timeline
Evaluation: how well does Vivace perform under congestion?
[Figure (a), read algorithms: under congestion, the traditional read's 2 remote RTTs suffer buffering delay and TCP resends on packet loss, while Vivace's 2 prioritized remote RTTs + 1 local RTT avoid the congestion delays]
Evaluation: what is the overhead of Vivace without congestion?
• (Results in paper)
• No measurable overhead compared to traditional algorithms
• The extra message phases are not harmful
Conclusion
• Proposed two new algorithms
  – Read/write (simple, in talk)
  – State machine (more complex, in paper)
• Both algorithms avoid delay due to congestion by prioritizing a small amount of critical information, while
  – Still providing strong consistency
  – Keeping prioritized messages small
  – Avoiding delay overhead in the absence of congestion
  – Using a practical prioritization infrastructure
• Careful use of prioritized messages can be an effective strategy in geo-distributed data centers