Understanding Tradeoffs for Scalability
Steve Vinoski
Architect, Basho Technologies
Cambridge, MA USA
@stevevinoski
Back In the Old Days
• Big centralized servers controlled all storage
• To scale, you scaled vertically (up) by getting a bigger server
• Single host guaranteed data consistency
Drawbacks
• Scaling up is limited
• Servers can only get so big
• And the bigger they get, the more they cost
Hitting the Wall
• Websites started outgrowing the scale-up approach
• Started applying workarounds to try to scale
• Resulted in fragile systems with difficult operational challenges
A Distributed Approach
• Multiple commodity servers
• Scale horizontally (out instead of up)
• Read and write on any server
• Replicated data
• Losing a server doesn’t lose data
No Magic Bullet
• A distributed approach can scale much larger
• But distribution brings its own set of issues
• Requires tradeoffs
CAP Theorem
• A conjecture put forth in 2000 by Dr. Eric Brewer
• Formally proven in 2002
• In any distributed system, pick two:
  • Consistency
  • Availability
  • Partition tolerance
Partition Tolerance
• Guarantees continued system operation even when the network breaks and messages are lost
• Systems generally tend to support P
• Leaves choice of either C or A
Consistency
• Distributed nodes see the same updates at the same logical time
• Hard to guarantee across a distributed system
Availability
• Guarantees the system will service every read and write sent to it
• Even when nodes are failing or the network is breaking
Choose Two: CA
• Traditional single-node RDBMS
• A single node means partition tolerance (P) is irrelevant
Choose Two: CP
• Typically involves sharding, where data is spread across nodes in an app-specific manner
• Sharding can be brittle
  • data unavailable from a given shard if its node dies
  • can be hard to add nodes and change the sharding logic
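As a concrete illustration of that brittleness (my own sketch, not from the talk), the Python snippet below shows naive modulo sharding: adding a single node remaps most keys, so nearly all data has to move.

    import hashlib

    def shard_for(key, num_nodes):
        # Hash the key and pick a shard by modulo -- a common but fragile scheme.
        digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return digest % num_nodes

    keys = ["user:%d" % i for i in range(1000)]
    before = {k: shard_for(k, 4) for k in keys}
    after = {k: shard_for(k, 5) for k in keys}   # one node added
    moved = sum(1 for k in keys if before[k] != after[k])
    print("keys that moved:", moved, "of", len(keys))   # typically around 80%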
Choose Two: AP
• Provides read/write availability even when the network breaks or nodes die
• Provides eventual consistency
• Example: the Domain Name System (DNS) is an AP system
Example AP Systems
• Amazon Dynamo
• Cassandra
• CouchDB
• Voldemort
• Basho Riak
Handling Tradeoffs for AP Systems
• Problem: how to make the system available even if nodes die or the network breaks?
• Solution:
  • allow reading and writing from multiple nodes in the system
  • avoid master nodes, instead make all nodes peers
• Problem: if multiple nodes are involved, how do you reliably know where to read or write?
• Solution:
  • assign virtual nodes (vnodes) to physical nodes
  • use consistent hashing to find vnodes for reads/writes
Consistent Hashing
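A minimal sketch of the idea, assuming nothing about Riak's actual implementation: each physical node claims several vnodes on a hash ring, and a key's preference list is found by walking clockwise from the key's hash.

    import bisect, hashlib

    def ring_hash(value):
        return int(hashlib.sha1(value.encode()).hexdigest(), 16)

    class Ring:
        def __init__(self, nodes, vnodes_per_node=8):
            # Each physical node claims several positions (vnodes) on the ring.
            self.ring = sorted((ring_hash("%s-%d" % (node, i)), node)
                               for node in nodes for i in range(vnodes_per_node))
            self.positions = [pos for pos, _ in self.ring]

        def preference_list(self, key, n=3):
            # Walk clockwise from the key's hash, collecting N distinct nodes.
            start = bisect.bisect(self.positions, ring_hash(key))
            owners, i = [], start
            while len(owners) < n:
                node = self.ring[i % len(self.ring)][1]
                if node not in owners:
                    owners.append(node)
                i += 1
            return owners

    ring = Ring(["node1", "node2", "node3", "node4"])
    print(ring.preference_list("artist/REM"))   # three distinct nodes for this key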
Consistent Hashing and Multi-Vnode Benefits
• Data is stored in multiple locations
• Loss of a node means only a single replica is lost
• No master to lose
• Adding nodes is trivial; data gets rebalanced automatically
• Problem: what about availability? What if the node you write to dies or becomes inaccessible?
• Solution: sloppy quorums
  • write to multiple vnodes
  • attempt reads from multiple vnodes
N/R/W Values
• N = number of replicas to store (on distinct nodes)
• R = number of replica responses needed for a successful read (specified per-request)
• W = number of replica responses needed for a successful write (specified per-request)
N/R/W Values
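A hedged sketch of how a coordinator might apply R and W per request; the replica.put and replica.get calls are assumed interfaces for illustration, not the real Riak client API.

    def quorum_write(replicas, key, value, w):
        # replicas: the N vnodes chosen for this key (e.g. from the ring sketch above)
        acks = 0
        for replica in replicas:
            if replica.put(key, value):       # assumed replica interface
                acks += 1
            if acks >= w:
                return True                   # W replicas confirmed: write succeeds
        return False                          # fewer than W responded: write fails

    def quorum_read(replicas, key, r):
        responses = []
        for replica in replicas:
            value = replica.get(key)          # assumed replica interface
            if value is not None:
                responses.append(value)
            if len(responses) >= r:
                return responses              # caller resolves any conflicting values
        return None                           # fewer than R responded: read fails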
• Problem: what happens if a key hashes to vnodes that aren’t available?
• Solution:
  • read from or write to the next available vnode
  • eventually repair via hinted handoff
N/R/W Values
Hinted Handoff
• Surrogate vnode holds data for unavailable actual vnode
• Surrogate vnode keeps checking for availability of actual vnode
• Once the actual vnode is again available, surrogate hands off data to it
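A minimal sketch of the handoff bookkeeping, with assumed names (not Riak's internals): the surrogate stores each hinted write under its intended owner, and a periodic check hands the data back once that owner is reachable again.

    class SurrogateVnode:
        def __init__(self):
            self.hinted = {}                      # intended owner -> {key: value}

        def store_hinted(self, intended_owner, key, value):
            # Accept a write on behalf of an unreachable vnode, remembering the hint.
            self.hinted.setdefault(intended_owner, {})[key] = value

        def try_handoff(self, intended_owner, is_alive, deliver):
            # Called periodically; is_alive and deliver are assumed callbacks.
            if intended_owner in self.hinted and is_alive(intended_owner):
                for key, value in self.hinted.pop(intended_owner).items():
                    deliver(intended_owner, key, value)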
Quorum Benefits
• Allows applications to tune consistency, availability, and reliability per read or write
• Problem: how do the nodes in the ring keep track of ring state?
• Solution: gossip protocol
Gossip Protocol
• Nodes “gossip” their view of the state of the ring to other nodes
• If a node changes its claim on the ring, it lets others know
• The overall state of the ring is thus kept consistent among all nodes in the ring
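A simplified sketch of that exchange (assumed structure, not Riak's actual gossip implementation): each node tags its ring claim with a version and periodically pushes its view to a random peer, which keeps the newest claim it has seen for each node.

    import random

    class GossipingNode:
        def __init__(self, name):
            self.name = name
            self.ring_view = {}               # node name -> (version, claimed vnodes)

        def claim(self, version, vnodes):
            # Record this node's own claim on the ring.
            self.ring_view[self.name] = (version, vnodes)

        def gossip_to(self, peer):
            # Push our view; the peer merges it, keeping the newest claim per node.
            for node, (version, vnodes) in self.ring_view.items():
                current = peer.ring_view.get(node)
                if current is None or version > current[0]:
                    peer.ring_view[node] = (version, vnodes)

        def gossip_round(self, peers):
            # Periodically gossip to one randomly chosen peer.
            self.gossip_to(random.choice(peers))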
• Problem: what happens if vnodes get out of sync?
• Solution:
  • vector clocks
  • read repair
Vector Clocks
• Reasoning about time and causality in distributed systems is hard
• Integer timestamps don’t necessarily capture causality
• Vector clocks provide a happens-before relationship between two events
Vector Clocks
• Simple data structure: [(ActorID, Counter)]
• All data has an associated vector clock; actors update their entry when making changes
• ClockA happened-before ClockB if every actor counter in A is less than or equal to its counterpart in B (and at least one is strictly less)
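A minimal sketch of that comparison, using a dict in place of the [(ActorID, Counter)] list for convenience; this is my own illustration, not Riak's vclock module.

    def increment(clock, actor):
        # Return a copy of the clock with this actor's counter bumped.
        clock = dict(clock)
        clock[actor] = clock.get(actor, 0) + 1
        return clock

    def descends(clock_b, clock_a):
        # True if clock_b is equal to or happened-after clock_a:
        # every counter in A is <= the corresponding counter in B.
        return all(clock_b.get(actor, 0) >= count for actor, count in clock_a.items())

    a = increment({}, "actor1")                # {'actor1': 1}
    b = increment(a, "actor2")                 # {'actor1': 1, 'actor2': 1}
    print(descends(b, a))                      # True: a happened-before b
    c = increment(a, "actor1")                 # {'actor1': 2}, concurrent with b
    print(descends(b, c), descends(c, b))      # False False: conflicting siblings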
Read Repair
• If a read detects that a vnode has stale data, it is repaired via asynchronous update
• Helps implement eventual consistency
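A hedged sketch of the repair step, reusing the descends helper sketched above and an assumed fire-and-forget put_async replica call (both names are illustrative, not real APIs).

    def read_repair(key, responses, winner_clock, winner_value):
        # responses: (replica, clock) pairs gathered during a quorum read.
        for replica, clock in responses:
            if clock != winner_clock and descends(winner_clock, clock):
                # This replica is stale: push the newer value in the background.
                replica.put_async(key, winner_clock, winner_value)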
This is Riak Core
• consistent hashing
• virtual nodes (vnodes)
• gossip protocols
• vector clocks
• sloppy quorums
• hinted handoff
Conclusion
• Scaling up is limited
• But scaling out requires different tradeoffs
• CAP Theorem: pick two
• AP systems use a variety of techniques to ensure availability and eventual consistency
Thanks