  1. Understanding Tradeoffs for Scalability • Steve Vinoski • Architect, Basho Technologies • Cambridge, MA USA • @stevevinoski

  2. Back In the Old Days • Big centralized servers controlled all storage • To scale, you scaled vertically (up) by getting a bigger server • Single host guaranteed data consistency

  3. Drawbacks • Scaling up is limited • Servers can only get so big • And the bigger they get, the more they cost

  4. Hitting the Wall • Websites started outgrowing the scale-up approach • Started applying workarounds to try to scale • Resulted in fragile systems with difficult operational challenges

  5. A Distributed Approach • Multiple commodity servers • Scale horizontally (out instead of up) • Read and write on any server • Replicated data • Losing a server doesn’t lose data

  6. No Magic Bullet • A distributed approach can scale much larger • But distribution brings its own set of issues • Requires tradeoffs

  7. CAP Theorem • A conjecture put forth in 2000 by Dr. Eric Brewer • Formally proven in 2002 • In any distributed system, pick two: • Consistency • Availability • Partition tolerance

  8. Partition Tolerance • Guarantees continued system operation even when the network breaks and messages are lost • Systems generally tend to support P • Leaves choice of either C or A

  9. Consistency • Distributed nodes see the same updates at the same logical time • Hard to guarantee across a distributed system

  10. Availability • Guarantees the system will service every read and write sent to it • Even when things are breaking

  11. Choose Two: CA • Traditional single-node RDBMS • Single node means P is irrelevant

  12. Choose Two: CP • Typically involves sharding, where data is spread across nodes in an app-specific manner • Sharding can be brittle • data unavailable from a given shard if its node dies • can be hard to add nodes and change the sharding logic

  13. Choose Two: AP • Provides read/write availability even when the network breaks or nodes die • Provides eventual consistency • Example: the Domain Name System (DNS) is an AP system

  14. Example AP Systems • Amazon Dynamo • Cassandra • CouchDB • Voldemort • Basho Riak

  15. Handling Tradeoffs for AP Systems

  16. • Problem: how to make the system available even if nodes die or the network breaks? • Solution: • allow reading and writing from multiple nodes in the system • avoid master nodes, instead make all nodes peers

  17. • Problem: if multiple nodes are involved, how do you reliably know where to read or write? • Solution: • assign virtual nodes (vnodes) to physical nodes • use consistent hashing to find vnodes for reads/writes

  18. Consistent Hashing
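
The original slide here was a ring diagram. As a rough illustration of the idea from the two slides above, the sketch below (Python rather than Riak's Erlang; `RING_SIZE`, `Ring`, `partition_for`, and `preference_list` are invented names, not Riak's actual API) hashes each key onto a fixed ring of vnodes and maps every vnode to a physical node.

```python
import hashlib

# A minimal consistent-hashing sketch: a fixed ring of vnodes (partitions),
# each claimed by a physical node. All names here are illustrative.

RING_SIZE = 64  # number of vnodes (partitions) on the ring

class Ring:
    def __init__(self, physical_nodes):
        # Hand out vnodes to physical nodes round-robin, so every node
        # claims several partitions spread around the ring.
        self.owners = {vnode: physical_nodes[vnode % len(physical_nodes)]
                       for vnode in range(RING_SIZE)}

    def partition_for(self, key):
        # Hash the key into the ring's keyspace and pick the vnode
        # responsible for that point on the ring.
        digest = int(hashlib.sha1(key.encode()).hexdigest(), 16)
        return digest % RING_SIZE

    def preference_list(self, key, n=3):
        # The N vnodes (and their owners) that should hold replicas: the
        # key's partition plus the next n-1 partitions clockwise.
        start = self.partition_for(key)
        return [(p % RING_SIZE, self.owners[p % RING_SIZE])
                for p in range(start, start + n)]

ring = Ring(["node_a", "node_b", "node_c", "node_d"])
print(ring.preference_list("artist/REM"))
# e.g. [(37, 'node_b'), (38, 'node_c'), (39, 'node_d')]
```

Because the partition count is fixed, adding a physical node only changes which node owns which vnodes; keys keep hashing to the same partitions, which is what makes the automatic rebalancing mentioned on the next slide straightforward.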

  19. Consistent Hashing and Multi-Vnode Benefits • Data is stored in multiple locations • Loss of a node means only a single replica is lost • No master to lose • Adding nodes is trivial, data gets rebalanced automatically

  20. • Problem: what about availability? What if the node you write to dies or becomes inaccessible? • Solution: sloppy quorums • write to multiple vnodes • attempt reads from multiple vnodes

  21. N/R/W Values • N = number of replicas to store (on distinct nodes) • R = number of replica responses needed for a successful read (specified per-request) • W = number of replica responses needed for a successful write (specified per-request)

  22. N/R/W Values
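
This slide was a diagram; as a stand-in, here is a minimal, self-contained sketch (Python, invented names) of how the N, R, and W values from the previous slides play out: a write succeeds once W of the N replicas acknowledge it, a read once R of them answer, so requests can still succeed while some vnodes are down.

```python
# A minimal N/R/W sketch: replicas are plain dicts, and None stands for an
# unreachable vnode. put/get and the counts here are illustrative only.

N = 3
replicas = [dict() for _ in range(N)]   # one dict per replica vnode

def put(key, value, w=2):
    # Write to every reachable replica; succeed once at least W acknowledge.
    acks = 0
    for rep in replicas:
        if rep is not None:
            rep[key] = value
            acks += 1
    return acks >= w

def get(key, r=2):
    # Read from every reachable replica; succeed once at least R answer.
    answers = [rep[key] for rep in replicas if rep is not None and key in rep]
    return answers if len(answers) >= r else None

put("user/1234", "alice", w=2)
replicas[2] = None                    # one vnode becomes unreachable...
print(put("user/5678", "bob", w=2))   # True: 2 of 3 acks is still a quorum
print(get("user/5678", r=2))          # ['bob', 'bob']
print(get("user/5678", r=3))          # None: not enough replicas responded
```

Raising R and W per request trades availability for stronger consistency, which is the tuning knob the Quorum Benefits slide below refers to.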

  23. • Problem: what happens if a key hashes to vnodes that aren’t available? • Solution: • read from or write to the next available vnode • eventually repair via hinted handoff

  24. N/R/W Values

  25. Hinted Handoff • Surrogate vnode holds data for unavailable actual vnode • Surrogate vnode keeps checking for availability of actual vnode • Once the actual vnode is again available, surrogate hands off data to it
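
A minimal sketch of the handoff idea from the two slides above, assuming a toy in-memory model of vnodes (`Vnode`, `hints`, and `hand_off` are made-up names, not Riak's API): a surrogate accepts writes meant for an unreachable vnode, remembers who they were for, and pushes them back once the target is reachable again.

```python
# Toy hinted-handoff sketch: a surrogate vnode stores writes on behalf of an
# unreachable vnode and hands them back when it recovers. Illustrative only.

class Vnode:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}    # keys this vnode actually owns
        self.hints = {}   # intended_vnode -> {key: value} held on its behalf

    def write(self, key, value, intended=None):
        if intended is None or intended is self:
            self.data[key] = value
        else:
            # Surrogate path: accept the write, remembering who it was for.
            self.hints.setdefault(intended, {})[key] = value

    def hand_off(self):
        # Called periodically: when an intended vnode is reachable again,
        # push its hinted data back and drop the local copy.
        for target in list(self.hints):
            if target.up:
                target.data.update(self.hints.pop(target))

a, b = Vnode("A"), Vnode("B")
b.up = False                 # B is unreachable, so A takes the write for it
a.write("k", "v", intended=b)
b.up = True                  # B comes back...
a.hand_off()                 # ...and A hands the hinted data over
print(b.data)                # {'k': 'v'}
```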

  26. Quorum Benefits • Allows applications to tune consistency, availability, and reliability per read or write

  27. • Problem: how do the nodes in the ring keep track of ring state? • Solution: gossip protocol

  28. Gossip Protocol • Nodes “gossip” their view of the state of the ring to other nodes • If a node changes its claim on the ring, it lets others know • The overall state of the ring is thus kept consistent among all nodes in the ring
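
As a rough sketch of the gossip idea (invented names, not Riak's actual gossip implementation), each node keeps a versioned claim table for the ring; on each gossip exchange both peers keep the newer entry for every vnode, so their views converge.

```python
# Toy gossip sketch: each node holds a versioned view of who claims which
# vnode; one gossip round merges two views so both nodes agree. Illustrative.

class Node:
    def __init__(self, name):
        self.name = name
        self.ring_view = {}   # vnode -> (version, owner); newer version wins

    def claim(self, vnode, version):
        # This node announces (or updates) its claim on a vnode.
        self.ring_view[vnode] = (version, self.name)

    def gossip_with(self, peer):
        # Merge the two views, keeping the newest entry per vnode, and give
        # both nodes the merged result.
        merged = {}
        for view in (self.ring_view, peer.ring_view):
            for vnode, (version, owner) in view.items():
                if vnode not in merged or version > merged[vnode][0]:
                    merged[vnode] = (version, owner)
        self.ring_view = dict(merged)
        peer.ring_view = dict(merged)

n1, n2 = Node("node1"), Node("node2")
n1.claim(0, version=1)
n2.claim(1, version=1)
n1.gossip_with(n2)                      # one gossip round
print(n1.ring_view == n2.ring_view)     # True: both see the same ring
```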

  29. • Problem: what happens if vnodes get out of sync? • Solution: • vector clocks • read repair

  30. Vector Clocks • Reasoning about time and causality in distributed systems is hard • Integer timestamps don’t necessarily capture causality • Vector clocks provide a happens-before relationship between two events

  31. Vector Clocks • Simple data structure: [(ActorID, Counter)] • All data has an associated vector clock, actors update their entry when making changes • ClockA happened-before ClockB if all actor counters in A are less than or equal to those in B
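
A minimal sketch of the [(ActorID, Counter)] structure described above, stored here as a Python dict; `increment` and `happened_before` are illustrative names.

```python
# Vector clocks as {actor_id: counter}. An actor bumps its own counter when
# it writes; comparing clocks tells us whether one update descends from another.

def increment(clock, actor):
    clock = dict(clock)                       # copy, so old clocks stay intact
    clock[actor] = clock.get(actor, 0) + 1
    return clock

def happened_before(a, b):
    # ClockA happened-before ClockB if every actor counter in A is <= the
    # matching counter in B, and the clocks differ.
    return a != b and all(n <= b.get(actor, 0) for actor, n in a.items())

v1 = increment({}, "client_x")        # {'client_x': 1}
v2 = increment(v1, "client_y")        # {'client_x': 1, 'client_y': 1}
print(happened_before(v1, v2))        # True: v2 descends from v1
sibling = increment(v1, "client_z")   # a concurrent update to the same value
print(happened_before(v2, sibling),   # False False: neither descends from
      happened_before(sibling, v2))   # the other, so they conflict
```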

  32. Read Repair • If a read detects that a vnode has stale data, it is repaired via asynchronous update • Helps implement eventual consistency
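
A minimal, self-contained sketch of read repair; for simplicity it uses a plain version counter where a real AP store would compare the vector clocks above, and all names are illustrative.

```python
# Toy read-repair sketch: each replica stores (version, value); a read
# returns the freshest value and writes it back to any stale replica.

replicas = [
    {"k": (2, "new")},   # up to date
    {"k": (2, "new")},
    {"k": (1, "old")},   # stale
]

def read_with_repair(key):
    # Collect every replica's answer, pick the freshest, then push that
    # value back to the replicas that turned out to be stale.
    answers = [(rep[key], rep) for rep in replicas if key in rep]
    freshest = max(versioned for versioned, _ in answers)
    for versioned, rep in answers:
        if versioned < freshest:
            rep[key] = freshest        # the (asynchronous, in reality) repair
    return freshest[1]

print(read_with_repair("k"))   # 'new'
print(replicas[2]["k"])        # (2, 'new'): the stale replica was repaired
```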

  33. This is Riak Core • consistent hashing • gossip protocol • vector clocks • virtual nodes (vnodes) • sloppy quorums • hinted handoff

  34. Conclusion • Scaling up is limited • But scaling out requires different tradeoffs • CAP Theorem: pick two • AP systems use a variety of techniques to ensure availability and eventual consistency

  35. Thanks
