Understanding Tradeoffs for Scalability
Steve Vinoski
Architect, Basho Technologies
Cambridge, MA USA
@stevevinoski
Back In the Old Days
• Big centralized servers controlled all storage
• To scale, you scaled vertically (up) by getting a bigger server
• Single host guaranteed data consistency
Drawbacks
• Scaling up is limited
• Servers can only get so big
• And the bigger they get, the more they cost
Hitting the Wall
• Websites started outgrowing the scale-up approach
• Started applying workarounds to try to scale
• Resulted in fragile systems with difficult operational challenges
A Distributed Approach
• Multiple commodity servers
• Scale horizontally (out instead of up)
• Read and write on any server
• Replicated data
• Losing a server doesn’t lose data
No Magic Bullet
• A distributed approach can scale much larger
• But distribution brings its own set of issues
• Requires tradeoffs
CAP Theorem
• A conjecture put forth in 2000 by Dr. Eric Brewer
• Formally proven in 2002
• In any distributed system, pick two:
  • Consistency
  • Availability
  • Partition tolerance
Partition Tolerance
• Guarantees continued system operation even when the network breaks and messages are lost
• Systems generally tend to support P
• Leaves choice of either C or A
Consistency
• Distributed nodes see the same updates at the same logical time
• Hard to guarantee across a distributed system
Availability
• Guarantees the system will service every read and write sent to it
• Even when nodes are failing or the network is breaking
Choose Two: CA
• Traditional single-node RDBMS
• A single node means partition tolerance (P) is irrelevant
Choose Two: CP
• Typically involves sharding, where data is spread across nodes in an app-specific manner
• Sharding can be brittle
  • data unavailable from a given shard if its node dies
  • can be hard to add nodes and change the sharding logic
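As a concrete illustration of that brittleness (my own sketch, not from the talk), the Python snippet below shows naive modulo sharding: adding a single node remaps most keys, so nearly all data has to move.

    import hashlib

    def shard_for(key, num_nodes):
        # Hash the key and pick a shard by modulo -- a common but fragile scheme.
        digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return digest % num_nodes

    keys = ["user:%d" % i for i in range(1000)]
    before = {k: shard_for(k, 4) for k in keys}
    after = {k: shard_for(k, 5) for k in keys}   # one node added
    moved = sum(1 for k in keys if before[k] != after[k])
    print("keys that moved:", moved, "of", len(keys))   # typically around 80%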
Choose Two: AP
• Provides read/write availability even when the network breaks or nodes die
• Provides eventual consistency
• Example: the Domain Name System (DNS) is an AP system
Example AP Systems
• Amazon Dynamo
• Cassandra
• CouchDB
• Voldemort
• Basho Riak
Handling Tradeoffs for AP Systems
• Problem: how to make the system available even if nodes die or the network breaks?
• Solution:
  • allow reading and writing from multiple nodes in the system
  • avoid master nodes, instead make all nodes peers
• Problem: if multiple nodes are involved, how do you reliably know where to read or write?
• Solution:
  • assign virtual nodes (vnodes) to physical nodes
  • use consistent hashing to find vnodes for reads/writes
Consistent Hashing
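A minimal sketch of the idea, assuming nothing about Riak's actual implementation: each physical node claims several vnodes on a hash ring, and a key's preference list is found by walking clockwise from the key's hash.

    import bisect, hashlib

    def ring_hash(value):
        return int(hashlib.sha1(value.encode()).hexdigest(), 16)

    class Ring:
        def __init__(self, nodes, vnodes_per_node=8):
            # Each physical node claims several positions (vnodes) on the ring.
            self.ring = sorted((ring_hash("%s-%d" % (node, i)), node)
                               for node in nodes for i in range(vnodes_per_node))
            self.positions = [pos for pos, _ in self.ring]

        def preference_list(self, key, n=3):
            # Walk clockwise from the key's hash, collecting N distinct nodes.
            start = bisect.bisect(self.positions, ring_hash(key))
            owners, i = [], start
            while len(owners) < n:
                node = self.ring[i % len(self.ring)][1]
                if node not in owners:
                    owners.append(node)
                i += 1
            return owners

    ring = Ring(["node1", "node2", "node3", "node4"])
    print(ring.preference_list("artist/REM"))   # three distinct nodes for this key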
Consistent Hashing and Multi-Vnode Benefits
• Data is stored in multiple locations
• Loss of a node means only a single replica is lost
• No master to lose
• Adding nodes is trivial; data gets rebalanced automatically
• Problem: what about availability? What if the node you write to dies or becomes inaccessible?
• Solution: sloppy quorums
  • write to multiple vnodes
  • attempt reads from multiple vnodes
N/R/W Values
• N = number of replicas to store (on distinct nodes)
• R = number of replica responses needed for a successful read (specified per-request)
• W = number of replica responses needed for a successful write (specified per-request)
N/R/W Values
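A hedged sketch of how a coordinator might apply R and W per request; the replica.put and replica.get calls are assumed interfaces for illustration, not the real Riak client API.

    def quorum_write(replicas, key, value, w):
        # replicas: the N vnodes chosen for this key (e.g. from the ring sketch above)
        acks = 0
        for replica in replicas:
            if replica.put(key, value):       # assumed replica interface
                acks += 1
            if acks >= w:
                return True                   # W replicas confirmed: write succeeds
        return False                          # fewer than W responded: write fails

    def quorum_read(replicas, key, r):
        responses = []
        for replica in replicas:
            value = replica.get(key)          # assumed replica interface
            if value is not None:
                responses.append(value)
            if len(responses) >= r:
                return responses              # caller resolves any conflicting values
        return None                           # fewer than R responded: read fails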
• Problem: what happens if a key hashes to vnodes that aren’t available?
• Solution:
  • read from or write to the next available vnode
  • eventually repair via hinted handoff
N/R/W Values
Hinted Handoff
• Surrogate vnode holds data for unavailable actual vnode
• Surrogate vnode keeps checking for availability of actual vnode
• Once the actual vnode is again available, surrogate hands off data to it
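A minimal sketch of the handoff bookkeeping, with assumed names (not Riak's internals): the surrogate stores each hinted write under its intended owner, and a periodic check hands the data back once that owner is reachable again.

    class SurrogateVnode:
        def __init__(self):
            self.hinted = {}                      # intended owner -> {key: value}

        def store_hinted(self, intended_owner, key, value):
            # Accept a write on behalf of an unreachable vnode, remembering the hint.
            self.hinted.setdefault(intended_owner, {})[key] = value

        def try_handoff(self, intended_owner, is_alive, deliver):
            # Called periodically; is_alive and deliver are assumed callbacks.
            if intended_owner in self.hinted and is_alive(intended_owner):
                for key, value in self.hinted.pop(intended_owner).items():
                    deliver(intended_owner, key, value)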
Quorum Benefits
• Allows applications to tune consistency, availability, and reliability per read or write
• Problem: how do the nodes in the ring keep track of ring state?
• Solution: gossip protocol
Gossip Protocol
• Nodes “gossip” their view of the state of the ring to other nodes
• If a node changes its claim on the ring, it lets others know
• The overall state of the ring is thus kept consistent among all nodes in the ring
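A simplified sketch of that exchange (assumed structure, not Riak's actual gossip implementation): each node tags its ring claim with a version and periodically pushes its view to a random peer, which keeps the newest claim it has seen for each node.

    import random

    class GossipingNode:
        def __init__(self, name):
            self.name = name
            self.ring_view = {}               # node name -> (version, claimed vnodes)

        def claim(self, version, vnodes):
            # Record this node's own claim on the ring.
            self.ring_view[self.name] = (version, vnodes)

        def gossip_to(self, peer):
            # Push our view; the peer merges it, keeping the newest claim per node.
            for node, (version, vnodes) in self.ring_view.items():
                current = peer.ring_view.get(node)
                if current is None or version > current[0]:
                    peer.ring_view[node] = (version, vnodes)

        def gossip_round(self, peers):
            # Periodically gossip to one randomly chosen peer.
            self.gossip_to(random.choice(peers))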
• Problem: what happens if vnodes get out of sync?
• Solution:
  • vector clocks
  • read repair
Vector Clocks
• Reasoning about time and causality in distributed systems is hard
• Integer timestamps don’t necessarily capture causality
• Vector clocks provide a happens-before relationship between two events
Vector Clocks
• Simple data structure: [(ActorID, Counter)]
• All data has an associated vector clock; actors update their entry when making changes
• ClockA happened-before ClockB if every actor counter in A is less than or equal to its counterpart in B (and at least one is strictly less)
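A minimal sketch of that comparison, using a dict in place of the [(ActorID, Counter)] list for convenience; this is my own illustration, not Riak's vclock module.

    def increment(clock, actor):
        # Return a copy of the clock with this actor's counter bumped.
        clock = dict(clock)
        clock[actor] = clock.get(actor, 0) + 1
        return clock

    def descends(clock_b, clock_a):
        # True if clock_b is equal to or happened-after clock_a:
        # every counter in A is <= the corresponding counter in B.
        return all(clock_b.get(actor, 0) >= count for actor, count in clock_a.items())

    a = increment({}, "actor1")                # {'actor1': 1}
    b = increment(a, "actor2")                 # {'actor1': 1, 'actor2': 1}
    print(descends(b, a))                      # True: a happened-before b
    c = increment(a, "actor1")                 # {'actor1': 2}, concurrent with b
    print(descends(b, c), descends(c, b))      # False False: conflicting siblings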
Read Repair
• If a read detects that a vnode has stale data, it is repaired via asynchronous update
• Helps implement eventual consistency
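A hedged sketch of the repair step, reusing the descends helper sketched above and an assumed fire-and-forget put_async replica call (both names are illustrative, not real APIs).

    def read_repair(key, responses, winner_clock, winner_value):
        # responses: (replica, clock) pairs gathered during a quorum read.
        for replica, clock in responses:
            if clock != winner_clock and descends(winner_clock, clock):
                # This replica is stale: push the newer value in the background.
                replica.put_async(key, winner_clock, winner_value)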
This is Riak Core
• consistent hashing
• virtual nodes (vnodes)
• gossip protocols
• vector clocks
• sloppy quorums
• hinted handoff
Conclusion
• Scaling up is limited
• But scaling out requires different tradeoffs
• CAP Theorem: pick two
• AP systems use a variety of techniques to ensure availability and eventual consistency
Thanks