NoSQL like There is No Tomorrow


  1. NoSQL like There is No Tomorrow. Khawaja, Head of Engineering, NoSQL; Swaminathan Sivasubramanian (Swami), GM, NoSQL. @swami_79 @ksshams

  2. how can you build your own DynamoDB-scale service?

  3. let’s start with a story about a little company called amazon.com

  4. episode 1: once upon a time... (in 2000)

  5. a few thousand miles away... (Seattle)

  6. amazon.com, a rapidly growing Internet-based retail business, relied on relational databases

  7. we had 1000s of independent services

  8. each service managed its state in RDBMSs

  9. RDBMSs are actually kind of cool

  10. first of all... SQL!!

  11. so it is easier to query...

  12. easier to learn

  13. as versatile as a Swiss Army knife: key-value access, complex queries, analytics, transactions

  14. RDBMSs are too much like Swiss Army knives

  15. but sometimes... Swiss Army knives... can be more than you bargained for

  16. partitioning: easy. re-partitioning: hard...

  17. so we bought bigger boxes...

  18. Q4 was hard work at Amazon: benchmark new hardware, migrate to new hardware, repartition databases, pray...

  19. RDBMS availability challenges...

  20. episode 2: then... (in 2005)

  21. amazon dynamo: predecessor to DynamoDB • replicated DHT with consistent hashing • optimistic replication • “sloppy quorum” • anti-entropy mechanism • object versioning. specialist tool: • limited querying capabilities • simpler consistency
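Object versioning in the original Dynamo design used vector clocks to decide whether one version of an item supersedes another or whether the two were written concurrently and must be reconciled. A minimal Python sketch of that comparison, assuming clocks are plain dicts from node id to counter; `increment`, `descends`, `compare` and the node names `Sx`, `Sy`, `Sz` are hypothetical illustrations, not Dynamo or DynamoDB APIs:

```python
from typing import Dict

# A vector clock maps a node id to how many writes that node has coordinated.
VectorClock = Dict[str, int]


def increment(clock: VectorClock, node: str) -> VectorClock:
    """Return a new clock that records one more write coordinated by `node`."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a: VectorClock, b: VectorClock) -> bool:
    """True if version `a` has seen every write that version `b` has seen."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a: VectorClock, b: VectorClock) -> str:
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a supersedes b"
    if descends(b, a):
        return "b supersedes a"
    return "concurrent"  # siblings: the application must reconcile them


v1 = increment({}, "Sx")   # first write, coordinated by Sx
v2 = increment(v1, "Sx")   # overwrite, also via Sx
v3 = increment(v2, "Sy")   # an update handled by Sy ...
v4 = increment(v2, "Sz")   # ... while another update based on v2 hit Sz
print(compare(v2, v1))  # a supersedes b
print(compare(v3, v4))  # concurrent
```

Only the concurrent case needs reconciliation; every other case lets the system silently discard the older version.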

  22. dynamo had many benefits: • higher availability: we traded it off for eventual consistency
 • incremental scalability: no more repartitioning, no need to architect apps for peak, just add boxes
 • simpler querying model ==>> predictable performance
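Consistent hashing is what makes “just add boxes” cheap: when a node joins the ring it takes over only the key ranges next to its own points, so roughly 1/n of the keys move, instead of most of them as with `hash(key) % n`. A sketch under stated assumptions: MD5 as the ring hash, 100 virtual nodes per physical node, and the class name `ConsistentHashRing` are illustrative choices, not Dynamo's actual parameters:

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    # Map any string onto a 32-bit ring with a stable hash (MD5 is an assumption).
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % (2**32)


class ConsistentHashRing:
    def __init__(self, nodes, vnodes: int = 100):
        self.vnodes = vnodes
        self._points = []  # sorted list of (position, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each physical node owns several virtual points to smooth out the load.
        for i in range(self.vnodes):
            bisect.insort(self._points, (ring_position(f"{node}#{i}"), node))

    def owner(self, key: str) -> str:
        # The first point at or clockwise after the key's position owns the key.
        idx = bisect.bisect_right(self._points, (ring_position(key), ""))
        return self._points[idx % len(self._points)][1]


keys = [f"customer-{i}" for i in range(10_000)]
ring = ConsistentHashRing(["A", "B", "C", "D"])
before = {k: ring.owner(k) for k in keys}
ring.add_node("E")  # "just add boxes"
after = {k: ring.owner(k) for k in keys}
moved = sum(before[k] != after[k] for k in keys)
print(f"{moved / len(keys):.0%} of keys moved")  # expected near 20%, not ~80%
```

Every key that moves lands on the new node; the existing nodes never shuffle data among themselves, which is why no global repartitioning is needed.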

  23. but dynamo was not perfect... lacked strong consistency

  24. but dynamo was not perfect... scaling was easier, but...

  25. but dynamo was not perfect... steep learning curve

  26. but dynamo was not perfect... dynamo was a product... ==>> not a service...

  27. episode 3: then... (in 2012)

  28. DynamoDB • NoSQL database • fast & predictable performance • seamless scalability • easy administration. “Even though we have years of experience with large, complex NoSQL architectures, we are happy to be finally out of the business of managing it ourselves.” - Don MacAskill, CEO

  29. build services, not software!!

  30. amazon.com’s experience with services

  31. how do you create a successful service?

  32. with great services comes great responsibility

  33. DynamoDB goals and philosophies: • never compromise on durability • scale is our problem • easy to use • consistent and low latencies • scale in rps

  34. Architect • Develop • Goals • Customer • Test • Monitor • Deploy

  35. Architect • Develop • Goals • Test • Customer • Monitor • Deploy

  36. Sacred tenets in services: • don’t compromise durability for performance • plan for success: plan for scalability • plan for failures: fault tolerance is key • consistent performance is important • design: think of blast radius • insist on correctness

  37. fault tolerance is a lesson best learned offline

  38. a simple 2-way replication system of a traditional database... [diagram: Writes → Primary → Standby]

  39. “P is dead, need to promote myself” / “S is dead, need to trigger new replica” [diagram: P, P’, S]

  40. improved replication: quorum [diagram: writes go to three replicas]. Quorum: a write is successful once it succeeds on a majority of replicas
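The majority rule fits in a few lines of Python. This is a toy, single-process model under obvious assumptions (synchronous calls, no network, hypothetical `Replica` and `quorum_write` names); it shows why three replicas tolerate one failure but not two:

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    up: bool = True
    data: dict = field(default_factory=dict)

    def write(self, key: str, value: str) -> bool:
        """Apply the write locally and acknowledge it, unless this replica is down."""
        if not self.up:
            return False
        self.data[key] = value
        return True


def quorum_write(replicas: list, key: str, value: str) -> bool:
    """Report success only if a strict majority of replicas acknowledged."""
    acks = sum(r.write(key, value) for r in replicas)
    return acks >= len(replicas) // 2 + 1


replicas = [Replica("r1"), Replica("r2"), Replica("r3")]
print(quorum_write(replicas, "k", "v1"))  # True: 3 of 3 acks
replicas[0].up = False
print(quorum_write(replicas, "k", "v2"))  # True: 2 of 3 is still a majority
replicas[1].up = False
print(quorum_write(replicas, "k", "v3"))  # False: 1 of 3 is not a majority
# Note: r3 has still applied "v3" locally. Cleaning up such partial writes is
# exactly the kind of problem a real replication protocol has to solve.
```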

  41. Not so easy... [diagram: replicas A, B, C, D, E, F; Replica D is a new member in the group; writes from client A; reads and writes from client B] “Should I continue to serve reads?” “Should I start a new quorum?” Classic split-brain issue in replicated systems, leading to lost writes!
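Why does split brain lose writes? Majority quorums are safe only because any two majorities of the same membership overlap in at least one replica. The sketch below, with hypothetical helper names and a membership change modelled loosely on the slide's replicas A to F, checks that overlap for one agreed view and then shows two disjoint “majorities” forming once two sides disagree about who is in the group:

```python
from itertools import combinations


def is_majority(acks: set, members: set) -> bool:
    """True if `acks` includes a strict majority of `members`."""
    return len(acks & members) >= len(members) // 2 + 1


def all_majorities(members: set) -> list:
    need = len(members) // 2 + 1
    return [set(c) for k in range(need, len(members) + 1)
            for c in combinations(sorted(members), k)]


# With one agreed membership view, any two majorities share a replica,
# so a second, parallel quorum cannot form.
view = {"A", "B", "C"}
print(all(q1 & q2 for q1 in all_majorities(view) for q2 in all_majorities(view)))  # True

# Hypothetical split brain: after a partition, one side keeps the old view while
# the other side reconfigures the group without agreement from the first.
view_client_a = {"A", "B", "C"}
view_client_b = {"C", "D", "E", "F"}
acks_a = {"A", "B"}        # client A's write, acknowledged on one side
acks_b = {"D", "E", "F"}   # client B's write, acknowledged on the other side
print(is_majority(acks_a, view_client_a), is_majority(acks_b, view_client_b))  # True True
print(acks_a & acks_b)  # set(): no common replica, so one of the writes can be lost
```

This is why membership changes themselves have to go through the same agreement protocol as ordinary writes.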

  42. Building correct distributed systems is not straightforward... • How do you handle replica failures? • How do you ensure there is not a parallel quorum? • How do you handle partial failures of replicas? • How do you handle concurrent failures?

  43. correctness is hard, but necessary

  44. Formal Methods

  45. Formal Methods: to minimize bugs, we must have a precise description of the design

  46. Formal Methods: code is too detailed; design documents and diagrams are vague & imprecise; how would you express partial failures or concurrency?

  47. Formal Methods: the law of large numbers is your friend, until you hit large numbers, so design for scale
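The arithmetic behind this slide: if a bug is triggered with independent probability p per request, the chance of hitting it at least once in n requests is 1 - (1 - p)^n, which is roughly 1 - e^(-np). A one-in-a-billion race is practically invisible in a test run of a thousand requests and almost certain at ten billion. The value of p below is hypothetical:

```python
def p_at_least_once(p_per_event: float, events: int) -> float:
    """Probability that an event with per-trial probability p occurs at least once
    in `events` independent trials."""
    return 1 - (1 - p_per_event) ** events


p = 1e-9  # hypothetical: a race that bites one request in a billion
for requests in (1_000, 1_000_000, 1_000_000_000, 10_000_000_000):
    print(f"{requests:>14,} requests -> {p_at_least_once(p, requests):.4f}")
```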

  48. TLA+ to the rescue?

  49. PlusCal
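TLA+ specifications (or algorithms written in PlusCal and translated to TLA+) are checked by the TLC model checker, which explores every reachable state of a small model of the design. The Python sketch below is not TLA+; it only illustrates the same idea on a toy problem, exhaustively enumerating every interleaving of two processes that each perform a non-atomic read-then-write increment and reporting the schedules that violate the invariant “the counter ends at 2”. Unlike TLC, it enumerates paths without deduplicating states, which is only workable for tiny models:

```python
from typing import List, Tuple

# State: (pc0, pc1, reg0, reg1, counter). Each process's pc is 0 = about to read
# the shared counter into its register, 1 = about to write register + 1, 2 = done.
State = Tuple[int, int, int, int, int]


def step(state: State, proc: int) -> State:
    """Execute the next atomic step of process `proc`."""
    pcs, regs, counter = list(state[:2]), list(state[2:4]), state[4]
    if pcs[proc] == 0:
        regs[proc] = counter        # read
    else:
        counter = regs[proc] + 1    # write back the incremented copy
    pcs[proc] += 1
    return (pcs[0], pcs[1], regs[0], regs[1], counter)


def find_violations() -> List[List[int]]:
    """Depth-first search over every interleaving; return the schedules
    (lists of process ids) that end with a lost update."""
    violations = []
    stack = [((0, 0, 0, 0, 0), [])]
    while stack:
        state, schedule = stack.pop()
        enabled = [p for p in (0, 1) if state[p] < 2]
        if not enabled:
            if state[4] != 2:       # invariant: both increments are visible
                violations.append(schedule)
            continue
        for p in enabled:
            stack.append((step(state, p), schedule + [p]))
    return violations


bad = find_violations()
print(f"{len(bad)} of 6 interleavings lose an update, e.g. {bad[0]}")
```

Random testing might hit one of these schedules rarely; exhaustive exploration finds all of them every time, which is the point of checking a precise model.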

  50. formal methods are necessary but not sufficient...

  51. Test • Develop • Deploy • Customer • Architect • Goals • Monitor

  52. don’t forget to test. no, seriously...

  53. • simulate failures at unit test level • fault injection testing • scale testing • datacenter testing • network brown-out testing • embrace failure and don’t be surprised
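Simulating failures at the unit-test level can be as simple as swapping a real dependency for a test double that fails on purpose. A minimal sketch, assuming a hypothetical `FlakyNetwork` double and a retrying sender as the code under test; the seeded RNG keeps injected failures reproducible:

```python
import random


class FlakyNetwork:
    """Hypothetical test double that drops a configurable fraction of sends."""

    def __init__(self, drop_rate: float, seed: int = 0):
        self.drop_rate = drop_rate
        self.rng = random.Random(seed)  # seeded so injected failures are reproducible

    def send(self, message: str) -> bool:
        return self.rng.random() >= self.drop_rate  # False means the message was lost


def send_with_retries(net: FlakyNetwork, message: str, attempts: int = 5) -> bool:
    """Code under test: retry a lost send up to `attempts` times."""
    for _ in range(attempts):
        if net.send(message):
            return True
    return False


# Unit tests that inject faults instead of waiting for production to do it.
assert send_with_retries(FlakyNetwork(drop_rate=0.0), "hello")
assert not send_with_retries(FlakyNetwork(drop_rate=1.0), "hello")

net = FlakyNetwork(drop_rate=0.5, seed=42)
delivered = sum(send_with_retries(net, f"m{i}") for i in range(1000))
print(delivered)  # expected value is about 969, since each message fails with probability 0.5**5
```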

  54. testing is a lifelong journey

  55. testing is necessary but not sufficient...

  56. Deploy • Monitor • Test • Customer • Develop • Goals • Architect

  57. release cycle: gamma: simulate real world • one box: does it work? • phased deployment: treading lightly • monitor: does it still work?

  58. Canaries

  59. Alarms
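A canary exercises the service the way a customer would and reports what it saw; an alarm turns those observations into a page. One common way to avoid paging on a single blip is to require several consecutive bad periods. The sketch below assumes that policy and hypothetical names (`run_canary`, `Alarm`), with stand-in functions in place of a real service call:

```python
from collections import deque
from typing import Callable


def run_canary(call: Callable[[], bool], probes: int) -> float:
    """Exercise the service like a customer would; return the observed error rate."""
    failures = sum(1 for _ in range(probes) if not call())
    return failures / probes


class Alarm:
    """Fire only after `periods` consecutive evaluation periods breach `threshold`."""

    def __init__(self, threshold: float, periods: int):
        self.threshold = threshold
        self.recent = deque(maxlen=periods)

    def evaluate(self, error_rate: float) -> bool:
        self.recent.append(error_rate > self.threshold)
        return len(self.recent) == self.recent.maxlen and all(self.recent)


def healthy_call() -> bool:  # stand-in for a successful request
    return True


def broken_call() -> bool:   # stand-in for a failing request
    return False


alarm = Alarm(threshold=0.01, periods=3)
fired = [alarm.evaluate(run_canary(call, probes=100))
         for call in (healthy_call, broken_call, broken_call, broken_call)]
print(fired)  # [False, False, False, True]
```

The trade-off is detection delay: requiring three bad periods means a real outage pages three periods late, so the period length and count are tuned per service.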

  60. Monitor customer behavior: Monitor • Deploy • Goals • Customer • Architect • Test • Develop

  61. measuring customer experience is key: don’t be satisfied by the average, look at the 99th percentile
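A small example of why averages mislead. Using hypothetical latencies where 98 requests take 5 ms and 2 take close to a second, the mean looks acceptable while the 99th percentile exposes the slow tail. The percentile uses the nearest-rank definition, one of several common conventions:

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical latencies (ms): 98 fast requests and 2 very slow ones.
latencies = [5] * 98 + [800, 900]
print(sum(latencies) / len(latencies))  # 21.9: the average looks fine
print(percentile(latencies, 50))        # 5: so does the median
print(percentile(latencies, 99))        # 800: the tail that the average hides
```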

  62. understand the scaling dimensions

  63. understand how your service will be abused

  64. let’s see these rules in action through a true story

  65. we were building distributed systems all over amazon.com

  66. we needed a uniform and correct way to do consensus...

  67. so we built a Paxos lock library service

  68. such a service is so much more useful than just leader election... it became a distributed state store

  69. such a service is so much more useful than just leader election... or a distributed state store... wait, wait... you’re telling me if I poll, I can detect node failure?
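Detecting failure by polling amounts to a timeout-based failure detector: a node that has not heartbeated (or answered a poll) within some timeout is suspected dead. A sketch under assumptions: the `HeartbeatDetector` class is hypothetical, and the clock is injectable so the example is deterministic:

```python
import time
from typing import Callable, Dict, List


class HeartbeatDetector:
    """Suspect any node that has not been heard from within `timeout` seconds."""

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.last_seen: Dict[str, float] = {}

    def heartbeat(self, node: str) -> None:
        """Record a heartbeat (or a successful poll) from `node`."""
        self.last_seen[node] = self.clock()

    def suspected(self) -> List[str]:
        now = self.clock()
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout)


fake_now = [0.0]  # deterministic clock for the example
detector = HeartbeatDetector(timeout=3.0, clock=lambda: fake_now[0])
detector.heartbeat("node-1")
detector.heartbeat("node-2")
fake_now[0] = 2.0
detector.heartbeat("node-1")   # node-2 stays silent
fake_now[0] = 4.0
print(detector.suspected())    # ['node-2']
```

A timeout cannot distinguish a dead node from a slow or partitioned one, so a suspicion alone should not be treated as proof that the node is gone.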

  70. we acted quickly and scaled up our entire fleet with more nodes... doh!!!! we slowed consensus...
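One way to see why more nodes slowed consensus, assuming the added nodes all joined the consensus group: in basic Paxos a leader that broadcasts to every acceptor exchanges four messages per peer per decision (prepare, promise, accept, accepted), and each decision must wait for a larger majority. The count below ignores failures, retries, and learner notifications; `paxos_round_cost` is a hypothetical helper:

```python
def paxos_round_cost(acceptors: int) -> dict:
    """Messages and quorum size for one basic-Paxos decision, assuming the leader is
    itself an acceptor, broadcasts to every peer, and nothing fails or retries."""
    peers = acceptors - 1
    return {
        "majority": acceptors // 2 + 1,
        # phase 1: prepare + promise; phase 2: accept + accepted (one each way per peer)
        "messages": 4 * peers,
    }


for n in (5, 50, 500):
    print(n, paxos_round_cost(n))
```

Per-decision work grows linearly with group size while the useful output (one agreed value) stays the same, which is the argument for keeping the consensus group small and scaling the other components separately.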

  71. understand the scaling dimensions & scale them independently...

  72. a lock service has 3 components: • Failure Notification • Leader Election • State Store

  73. they must be scaled independently... • Failure Notification • Leader Election • State Store

  74. they must be scaled independently... • Failure Notification • Leader Election • State Store

  75. they must be scaled independently... • Failure Notification • Leader Election • State Store

  76. • understand scaling dimensions • observe how service is used • monitor like a hawk • test relentlessly • strive for correctness • scalability over features

  77. Thank you! @swami_79 @ksshams
