Amazon Dynamo

Amazon Dynamo: A Highly Available Key-value Store, presented by Jian Fang



  1. Amazon Dynamo: A Highly Available Key-value Store. Presented by Jian Fang (jianf@cmu.edu)

  2. What is Dynamo
     - Eventually consistent key-value store
     - Supports scalable, highly available data access
     - Optimized for availability to maximize customer satisfaction

  3. Why not RDBMS?
     - Only primary-key access is needed
     - RDBMSs have limited scalability
     - RDBMSs require expensive hardware and skilled administrators

  4. Amazon’s Requirements
     - Objects are smaller than 1 MB
     - No operation spans multiple data items
     - <300 ms response time for 99.9% of requests
     - Heterogeneous commodity hardware infrastructure
     - Decentralized, loosely coupled services
     - Highly available (always writable)

  5. Techniques used in Dynamo
     - Consistent hashing
     - Vector clocks
     - Sloppy quorum and hinted handoff
     - Merkle trees
     - Gossip-based membership protocol

  6. Interfaces
     - Key-value storage system with two operations:
       - get(key): returns a single object, or a list of objects with conflicting versions
       - put(key, context, object): the context contains the version information
     - MD5 hashing is applied to the key to generate a 128-bit identifier
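
The key-to-identifier step on this slide can be sketched in a few lines of Python. The function name `key_to_id` and the sample key are hypothetical, chosen only for illustration; the slides only state that MD5 yields a 128-bit identifier.

```python
import hashlib


def key_to_id(key: bytes) -> int:
    """Map a key to a 128-bit integer identifier using MD5."""
    digest = hashlib.md5(key).digest()  # MD5 digests are 16 bytes = 128 bits
    return int.from_bytes(digest, "big")


# Hypothetical key; the resulting integer can be read as a position on the hash ring.
ring_position = key_to_id(b"cart:12345")
```

The identifier is deterministic, so every node that hashes the same key arrives at the same ring position.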

  7. Partitioning
     - Scale incrementally
     - Consistent hashing
     - Variant of consistent hashing

  8. Consistent Hashing
     - Simple non-consistent hashing: hash(key) mod N
       - Example: 12 keys, N = 3 (S1, S2, S3)
       - What if N = N + 1? With 12 keys and N = 4 (S1, S2, S3, S4), 6 keys (half) are remapped
     - Consistent hashing
       - Only K/N keys need to be remapped

  9. Consistent Hashing
     [Figure: hash ring with nodes A, B, C, D and keys X, Y, Z]
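
A minimal sketch of the ring lookup described on the consistent hashing slides, assuming each node is placed at the MD5 hash of its name and a key belongs to the first node found clockwise from the key's position. The class `HashRing` and the helper `ring_hash` are hypothetical names, not part of Dynamo's interface.

```python
import bisect
import hashlib


def ring_hash(name: str) -> int:
    """Position on the ring: the 128-bit MD5 hash read as an integer."""
    return int.from_bytes(hashlib.md5(name.encode()).digest(), "big")


class HashRing:
    def __init__(self, nodes):
        # Sorted (position, node) pairs represent the ring in clockwise order.
        self._points = sorted((ring_hash(n), n) for n in nodes)

    def owner(self, key: str) -> str:
        positions = [p for p, _ in self._points]
        # First node at or after the key's position, wrapping past the end.
        i = bisect.bisect_left(positions, ring_hash(key))
        return self._points[i % len(self._points)][1]


keys = [f"key{i}" for i in range(1000)]
before = HashRing(["A", "B", "C"])
after = HashRing(["A", "B", "C", "D"])
moved = [k for k in keys if before.owner(k) != after.owner(k)]
```

Every key in `moved` ends up on the new node `D`; keys owned by the other nodes stay where they were, which is the property that keeps remapping small when the ring grows.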

  10. Consistent Hashing
      - Not good enough on its own
        - Non-uniform load distribution
        - Ignores heterogeneity in node performance
      - Variant of consistent hashing
        - Virtual nodes

  11. Variant of Consistent Hashing
      [Figure: ring of virtual nodes assigned to physical nodes S1, S2, S3]
      - Q = 12 (virtual nodes)
      - S = 3 (physical nodes)
      - T = Q/S = 4 (tokens per physical node)

  12. Variant of Consistent Hashing
      [Figure: ring of virtual nodes assigned to physical nodes S1, S2, S3, S4]
      - Q = 12 (virtual nodes)
      - S = 4 (physical nodes)
      - T = Q/S = 3 (tokens per physical node)
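
The token arithmetic on the two virtual-node slides can be checked with a short sketch. The round-robin layout and the rule of taking tokens from the most loaded node are assumptions made for illustration; the slides' figures use their own layout, and the function names are hypothetical.

```python
from collections import Counter

Q = 12  # number of virtual nodes (fixed partitions of the ring)


def assign_tokens(nodes):
    """Round-robin assignment of the Q tokens (an illustrative layout)."""
    return {token: nodes[token % len(nodes)] for token in range(Q)}


def add_node(assignment, new_node):
    """Move tokens to new_node, one at a time, until it holds Q/S of them."""
    assignment = dict(assignment)
    nodes = set(assignment.values()) | {new_node}
    target = Q // len(nodes)  # T = Q/S
    while Counter(assignment.values())[new_node] < target:
        load = Counter(assignment.values())
        # Take the next token from the currently most loaded existing node.
        donor = max((n for n in load if n != new_node), key=lambda n: load[n])
        token = next(t for t, n in assignment.items() if n == donor)
        assignment[token] = new_node
    return assignment


three = assign_tokens(["S1", "S2", "S3"])  # T = 12/3 = 4 tokens each
four = add_node(three, "S4")               # T = 12/4 = 3 tokens each
```

Only the three tokens handed to `S4` change owner; the other nine stay with their original physical node.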

  13. Replication
      - Each key has a coordinator, Node(i)
      - The (N-1) clockwise successors of Node(i) serve as replicas
      - Node(i) updates the other (N-1) replicas
      - Each key has a preference list of nodes
        - List size > N
      - [Figure: ring with nodes A, B, C, D; Key Z is coordinated by Node(i) = A; preference list = [A, B, C, D]]
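
One way to build such a preference list is to walk the ring clockwise from the key and keep the first distinct physical nodes encountered. The ring positions below are made up for illustration, and `preference_list` is a hypothetical helper, not Dynamo's API.

```python
import bisect

# Hypothetical ring of (position, physical node) pairs; A and B each appear
# twice because a physical node may own several virtual nodes.
RING = sorted([(10, "A"), (40, "B"), (70, "A"), (100, "C"), (130, "D"), (160, "B")])


def preference_list(key_pos: int, size: int):
    """First `size` distinct physical nodes clockwise from key_pos."""
    positions = [p for p, _ in RING]
    start = bisect.bisect_left(positions, key_pos)
    result = []
    for step in range(len(RING)):
        node = RING[(start + step) % len(RING)][1]
        if node not in result:  # skip extra virtual nodes of the same machine
            result.append(node)
        if len(result) == size:
            break
    return result


replicas = preference_list(5, 3)   # coordinator plus N-1 successors, N = 3
extended = preference_list(5, 4)   # a list longer than N leaves room for failures
```

With these positions, `replicas` is `["A", "B", "C"]` and `extended` is `["A", "B", "C", "D"]`, matching the preference list on the slide.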

  14. Data Versioning
      - Eventual consistency
        - put() returns before all replicas are updated
        - get() can return multiple versions of the same key
      - Each data mutation is stored as a new version
      - Vector clocks

  15. Vector Clock (Example): Supplier A writes 500$
      - Sx: 500$(1,0,0)
      - Sy: 500$(1,0,0)
      - Sz: 500$(1,0,0)

  16. Vector Clock (Example): Supplier A writes 550$
      - Sx: 500$(1,0,0), 550$(2,0,0)
      - Sy: 500$(1,0,0), 550$(2,0,0)
      - Sz: 500$(1,0,0), 550$(2,0,0)

  17. Vector Clock (Example): Supplier B writes 600$
      - Sx: 500$(1,0,0), 550$(2,0,0)
      - Sy: 500$(1,0,0), 550$(2,0,0), 600$(2,1,0)
      - Sz: 500$(1,0,0), 550$(2,0,0)

  18. Vector Clock (Example): Supplier C writes 650$
      - Sx: 500$(1,0,0), 550$(2,0,0), 650$(2,0,1)
      - Sy: 500$(1,0,0), 550$(2,0,0), 600$(2,1,0), 650$(2,0,1)
      - Sz: 500$(1,0,0), 550$(2,0,0), 650$(2,0,1)
      - Conflict! 600$(2,1,0) and 650$(2,0,1) are concurrent versions

  19. Vector Clock (Example): Supplier B resolves the conflict
      - Supplier B reads 600$(2,1,0)/650$(2,0,1) and chooses 650$
      - Sx: 500$(1,0,0), 550$(2,0,0), 650$(2,0,1)
      - Sy: 500$(1,0,0), 550$(2,0,0), 600$(2,1,0), 650$(2,0,1)
      - Sz: 500$(1,0,0), 550$(2,0,0), 650$(2,0,1)

  20. Vector Clock (Example): Supplier B writes 650$(2,1,1)
      - Sx: 500$(1,0,0), 550$(2,0,0), 650$(2,0,1), 650$(2,1,1)
      - Sy: 500$(1,0,0), 550$(2,0,0), 600$(2,1,0)/650$(2,0,1), 650$(2,1,1)
      - Sz: 500$(1,0,0), 550$(2,0,0), 650$(2,0,1), 650$(2,1,1)
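
The comparisons in this example can be written down directly. The sketch below assumes clocks are plain tuples ordered (Sx, Sy, Sz) as on the slides; the function names are hypothetical.

```python
def descends(a, b):
    """True if clock a is at least as new as clock b in every position."""
    return all(x >= y for x, y in zip(a, b))


def in_conflict(a, b):
    """Two versions conflict when neither clock descends from the other."""
    return not descends(a, b) and not descends(b, a)


def merge(a, b):
    """Element-wise maximum: the clock of a version that supersedes both."""
    return tuple(max(x, y) for x, y in zip(a, b))


v_600 = (2, 1, 0)  # 600$ from Supplier B
v_650 = (2, 0, 1)  # 650$ from Supplier C

conflict = in_conflict(v_600, v_650)
resolved = merge(v_600, v_650)
```

Here `conflict` is `True` and `resolved` is `(2, 1, 1)`, the clock attached to the reconciled 650$ on the last slide of the example. By contrast, `(2, 0, 0)` descends from `(1, 0, 0)`, so the 550$ write simply replaces the 500$ one.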

  21. Processing get() and put()
      - How to select a coordinator node
        - Load balancer (server-driven)
        - Partition-aware client library (client-driven)
      - Quorum-like system for consistency
        - W + R > N
        - Typical values: N = 3, W = 2, R = 2
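
Why W + R > N matters can be checked by brute force: when the condition holds, every possible set of W written replicas shares at least one node with every possible set of R read replicas. The helper names below are hypothetical and the check is only an illustration for small N.

```python
from itertools import combinations


def is_overlapping_quorum(n: int, w: int, r: int) -> bool:
    """The condition from the slide: W + R > N."""
    return w + r > n


def always_intersect(n: int, w: int, r: int) -> bool:
    """Enumerate every write set and read set and check that they share a replica."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


typical = always_intersect(3, 2, 2)  # N = 3, W = 2, R = 2
weak = always_intersect(3, 1, 2)     # W + R = N, so a read can miss the write
```

With the typical values every read quorum contains at least one replica that saw the latest write; with W = 1 and R = 2 a read of the other two replicas can miss it.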

  22. Hinted Handoff
      [Figure: put() request on a ring of nodes A, B, C, D]

  23. Hinted Handoff
      [Figure: ring of nodes A, B, C, D]
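
A rough sketch of hinted handoff, under the assumption that a write intended for an unreachable node is parked on the next healthy node in the preference list together with a hint naming the intended owner, and is delivered once that owner is back. The class `Node` and the functions `put` and `deliver_hints` are hypothetical simplifications, not Dynamo's implementation.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.store = {}  # key -> value held as a regular replica
        self.hints = []  # (intended_node_name, key, value) held for others


def put(preference_list, n, key, value):
    """Write to the first n nodes; park writes for down nodes on a fallback."""
    intended = preference_list[:n]
    fallbacks = [node for node in preference_list[n:] if node.up]
    for node in intended:
        if node.up:
            node.store[key] = value
        elif fallbacks:
            stand_in = fallbacks.pop(0)
            stand_in.hints.append((node.name, key, value))


def deliver_hints(holder, nodes_by_name):
    """Hand hinted writes back to their intended owners once they are up."""
    remaining = []
    for name, key, value in holder.hints:
        target = nodes_by_name[name]
        if target.up:
            target.store[key] = value
        else:
            remaining.append((name, key, value))
    holder.hints = remaining


a, b, c, d = (Node(x) for x in "ABCD")
by_name = {node.name: node for node in (a, b, c, d)}

a.up = False
put([a, b, c, d], 3, "k", "v")  # A is down, so D keeps the write with a hint for A
hinted = list(d.hints)

a.up = True
deliver_hints(d, by_name)       # A recovers and receives the write from D
```

The write succeeds while A is down, which is how the system stays always writable, and A converges once the hint is delivered.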

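  24. Replica Synchronization (Merkle Tree)
      - Range: (0,256]; depth: 3; leaves: 8 ranges of 32 tokens each
      - Each parent hash is the XOR of its children's hashes
      - Rows:
        - Row key1: token 5, hash 0x1001
        - Row key2: token 135, hash 0x1100
        - Row key3: token 170, hash 0x0101
        - Row key4: token 185, hash 0x0010
      - Leaves (depth 3): (0,32] 0x1001; (32,64] 0; (64,96] 0; (96,128] 0; (128,160] 0x1100; (160,192] 0x0111; (192,224] 0; (224,256] 0
      - Depth 2 (split at 32, 96, 160, 224): (0,64] 0x1001; (64,128] 0; (128,192] 0x1011; (192,256] 0
      - Depth 1 (split at 64, 192): (0,128] 0x1001; (128,256] 0x1011
      - Root (split at 128): (0,256] 0x0010
      - Example from: http://bit.ly/1fUa0CS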
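
The tree on this slide can be rebuilt in a few lines, using XOR as the combining function just as the example does. The function names and the dictionary layout are hypothetical; the row tokens and hashes are the ones from the slide.

```python
def build_tree(rows, lo=0, hi=256, depth=3):
    """Return {(lo, hi): hash} for every node; a parent is the XOR of its children."""
    tree = {}

    def build(lo, hi, level):
        if level == depth:
            # Leaf: XOR the hashes of all rows whose token falls in (lo, hi].
            h = 0
            for token, row_hash in rows:
                if lo < token <= hi:
                    h ^= row_hash
        else:
            mid = (lo + hi) // 2
            h = build(lo, mid, level + 1) ^ build(mid, hi, level + 1)
        tree[(lo, hi)] = h
        return h

    build(lo, hi, 0)
    return tree


def differing_leaves(a, b, lo=0, hi=256, depth=3, level=0):
    """Descend only into subtrees whose hashes differ; return mismatched leaf ranges."""
    if a[(lo, hi)] == b[(lo, hi)]:
        return []
    if level == depth:
        return [(lo, hi)]
    mid = (lo + hi) // 2
    return (differing_leaves(a, b, lo, mid, depth, level + 1)
            + differing_leaves(a, b, mid, hi, depth, level + 1))


rows = [(5, 0x1001), (135, 0x1100), (170, 0x0101), (185, 0x0010)]
full = build_tree(rows)
stale = build_tree(rows[:3])  # a replica that is missing Row key4
out_of_sync = differing_leaves(full, stale)
```

The root of `full` is `0x0010`, as on the slide, and comparing against the stale replica narrows the mismatch to the single range `(160, 192]`, so only that range needs to be transferred. XOR keeps the example easy to follow by hand, but differences can cancel each other out, so treat it as a teaching device rather than a robust hash combiner.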

  25. Performance

  26. Q&A Thank you!
