A Cloud-native Architecture for Replicated Data Services


  1. A Cloud-native Architecture for Replicated Data Services
     Hemant Saxena, Jeffery Pound
     University of Waterloo, SAP Labs Waterloo

  2. Outline
     ● Problem overview
     ● Solution overview
        ○ Kafka
        ○ Cassandra
     ● Evaluation

  3. Problem overview
     ➢ Cloud has become the de facto standard for deploying applications
     ➢ However, applications designed for on-premise infrastructure find it challenging to leverage Cloud storage efficiently, because:
        ○ On-premise, data replication provides fault tolerance (FT) and high availability (HA)
        ○ Cloud storage already uses replication to provide FT and HA
        ○ This makes the application's replication redundant, resulting in additional storage cost

  4. Typical replicated application on-premise
     [Figure: a client connected to a replicated application (replica-set)]

  5. Typical replicated application on Cloud
     [Figure: a client connected to a replicated application (replica-set) running on top of a storage service]
     - Application-level replication
     - Storage-level replication
     - Resulting in redundant replicas
     - Introducing additional storage cost

  6. Problem overview
     We ask the following research question:
     How can we easily allow applications designed for on-premise infrastructure to efficiently leverage Cloud storage?

  7. Outline
     ● Problem overview
     ● Solution overview
        ○ Kafka
        ○ Cassandra
     ● Evaluation

  8. Naïve solution
     [Figure: replica-set]
     - Have one replica (i.e., no application-level replication)
     - Solves the problem of redundant replication
     - But it is prone to node failure, and hence not highly available

  9. Contributions of this work
     ➢ We show how the well-known main-delta architecture can be used to leverage cloud storage efficiently, i.e.
        ○ ensure no redundant replication
        ○ while maintaining the fault-tolerance and availability guarantees of the applications
     ➢ We show that incorporating the main-delta architecture into existing on-premise applications is easy
        ○ by controlling how buffers are managed and flushed to storage
        ○ and that it is compatible with the whole spectrum of replication strategies

  10. Quick recap of main-delta architecture
      ➢ Originally designed for efficiently handling mixed read/update workloads
      ➢ Two parts
         ○ Static, read-only, read-optimized main
         ○ Small, write-optimized delta
         ○ Deltas are merged with the main at regular intervals
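The recap above can be illustrated with a toy key-value store. This is only a minimal sketch and not the authors' implementation: the class name `MainDeltaStore`, the sorted-list main, and the size-based merge trigger are all assumptions made for illustration (the slide says merges happen at regular intervals; a size threshold is used here to keep the example deterministic).

```python
import bisect


class MainDeltaStore:
    """Toy store: a sorted, read-optimized main plus a small write-optimized delta."""

    def __init__(self, merge_threshold=4):
        self.main_keys = []      # sorted keys of the main (only replaced during merge)
        self.main_values = []    # values aligned with main_keys
        self.delta = {}          # recent writes, cheap to update
        self.merge_threshold = merge_threshold  # hypothetical merge trigger

    def put(self, key, value):
        # Writes only ever touch the delta.
        self.delta[key] = value
        if len(self.delta) >= self.merge_threshold:
            self.merge()

    def get(self, key, default=None):
        # The delta holds the newest version, so check it first.
        if key in self.delta:
            return self.delta[key]
        i = bisect.bisect_left(self.main_keys, key)
        if i < len(self.main_keys) and self.main_keys[i] == key:
            return self.main_values[i]
        return default

    def merge(self):
        # Rebuild the main with the delta applied on top, then empty the delta.
        merged = dict(zip(self.main_keys, self.main_values))
        merged.update(self.delta)
        self.main_keys = sorted(merged)
        self.main_values = [merged[k] for k in self.main_keys]
        self.delta = {}
```

Reads consult the delta before the main, so an update is visible immediately even though the main is rewritten only at merge time.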

  11. Solution overview
      [Figure: replica-set diagram with three "M" labels]
      - Replicated local deltas, maintained by the application
      - But a single shared main on Cloud storage (which is fault-tolerant)

  12. Solution overview
      [Figure: replica-set diagram with three "M" labels]
      - Replicated local deltas, maintained by the application
      - But a single shared main on Cloud storage (which is fault-tolerant)
      How to merge the deltas?

  13. Merging Deltas to Main
      ➢ The details lie in how the delta is merged into the main, such that
         ○ No data is lost from any delta
         ○ And applications have the same guarantees as in an on-premise deployment
      ➢ The delta-merge strategy depends on the replication strategy
         ○ A single primary node means a single delta to merge
         ○ Multiple primary nodes mean multiple deltas to merge

  14. Classification of replication strategies
      ▪ Write to any, read from any (e.g. quorum)
      ▪ Write to primary, read from primary
      ▪ Write to primary, read from any
      [Figure: for each strategy, a request-handler in front of a replica-set]

  15. Case-study 1: Delta merge for single primary
      [Figure: replica-set diagram with three "M" labels]
      ● Idea: in-memory buffers as deltas, on-disk data as main.
      ● Only the primary will merge its delta to main. Other replicas will discard their deltas when they are full.
      ● In case of primary node failure, the new primary node takes over the responsibility of merging deltas.
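The single-primary merge rule can be sketched as follows. This is a simplified illustration under stated assumptions, not the md-kafka code: the classes `SharedMain`, `Replica`, and `ReplicaSet` are hypothetical, replication is assumed to be synchronous with equal buffer sizes on every replica, and records carry log offsets so that a newly elected primary never merges a record twice.

```python
class SharedMain:
    """Stand-in for the single main kept on fault-tolerant cloud storage."""

    def __init__(self):
        self.records = []

    def merge(self, delta):
        for offset, record in delta:
            if offset < len(self.records):
                continue  # already merged by an earlier primary
            if offset > len(self.records):
                raise ValueError(f"gap before offset {offset}")
            self.records.append(record)


class Replica:
    def __init__(self, main, capacity):
        self.main = main
        self.capacity = capacity  # hypothetical buffer size
        self.delta = []           # in-memory buffer of (offset, record)
        self.is_primary = False

    def append(self, offset, record):
        self.delta.append((offset, record))
        if len(self.delta) >= self.capacity:
            self.flush()

    def flush(self):
        # Only the primary writes to the shared main; followers just discard.
        if self.is_primary:
            self.main.merge(self.delta)
        self.delta = []


class ReplicaSet:
    def __init__(self, size, capacity):
        self.main = SharedMain()
        self.replicas = [Replica(self.main, capacity) for _ in range(size)]
        self.replicas[0].is_primary = True  # primary is kept at index 0
        self.next_offset = 0

    def write(self, record):
        offset = self.next_offset
        self.next_offset += 1
        for replica in self.replicas:  # synchronous replication (assumption)
            replica.append(offset, record)

    def fail_primary(self):
        # Drop the failed primary and promote the next replica.
        self.replicas.pop(0)
        self.replicas[0].is_primary = True
```

For example, with three replicas and a buffer capacity of three, writing two records, failing the primary, and then writing a third record leaves all three records in the shared main: the promoted replica still holds the unmerged records in its own delta and merges them when its buffer fills.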

  16. Case-study 2: Delta merge for quorum system
      [Figure: replica-set diagram with three "M" labels]
      ● The memtable and sstables can be easily leveraged as delta and main.
      ● Deciding which node merges the delta is tricky:
         ○ Each node can have a different set of updates

  17. Case-study 2: Delta merge for quorum system
      ● Nodes flush their deltas to cloud storage
      ● A background compaction job combines the deltas and merges them into the main
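The flush-then-compact flow can be sketched as below. This is a hedged illustration rather than the md-cassandra code: the function names `flush_delta` and `compact` are hypothetical, shared storage is modeled as a plain list, and conflicts are resolved by keeping the value with the newest timestamp, which is an assumption here since the slides do not say how the compaction job reconciles differing deltas.

```python
def flush_delta(cloud_deltas, memtable):
    """Node side: copy the memtable (delta) to shared storage, then clear it."""
    cloud_deltas.append(dict(memtable))
    memtable.clear()


def compact(main, cloud_deltas):
    """Background job: fold every flushed delta into the single shared main.

    Each entry maps key -> (timestamp, value); the newest timestamp wins,
    so an update present in several deltas is stored only once.
    """
    for delta in cloud_deltas:
        for key, (timestamp, value) in delta.items():
            if key not in main or timestamp > main[key][0]:
                main[key] = (timestamp, value)
    cloud_deltas.clear()
    return main
```

Because every node flushes independently and the compaction job does the reconciliation, no node has to be elected as the one that merges, which sidesteps the problem of each node holding a different set of updates.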

  18. Outline
      ● Problem overview
      ● Solution overview
         ○ Kafka
         ○ Cassandra
      ● Evaluation

  19. Evaluation
      ● We want to show that our cloud-native design can save storage cost while keeping performance the same
      ● Tested the performance of our prototype on Kafka and Cassandra
         ○ Used real Cloud infrastructure - Amazon Web Services (AWS)
         ○ Tested different storage types - EBS and EFS

  20. Evaluation
      ● Implementations:
         ○ md-kafka: main-delta architecture based Kafka implementation
         ○ kafka: vanilla Kafka
      ● 3x storage cost savings
         ○ Replication factor 3x
         ○ Savings by design
      ● Similar write throughput for block-based storage (EBS)
      ● Almost 2x throughput improvement for EFS storage, due to batching

  21. Evaluation
      ● Implementations:
         ○ md-cassandra-efs: main-delta based Cassandra using EFS storage
         ○ cassandra-ebs: vanilla Cassandra using EBS
         ○ cassandra-efs: vanilla Cassandra using EFS
      ● Close to 2.8x storage cost saving
         ○ With replication factor of 3x
      ● Almost similar throughput for all 3 types of workloads

  22. Conclusion
      ➢ Existing on-premise applications (with replication), when deployed on cloud, end up with redundant replication
      ➢ We proposed a main-delta based cloud-native architecture to solve this problem
         ○ Allowing for storage cost savings of up to a factor of k (the application's replication factor)
      ➢ We show our approach is general enough to work with the complete spectrum of replication strategies
         ○ Simplest strategy: single primary (Kafka case study)
         ○ Complex strategy: quorum-based systems (Cassandra case study)
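The "factor of k" bound follows from counting stored copies. A minimal sketch, assuming the storage service keeps a fixed number of copies of whatever it is given (the parameter `storage_replication` is hypothetical; the slides do not state the cloud's internal replication factor) and ignoring any space temporarily held by deltas that have not yet been merged:

```python
def stored_copies(app_replication, storage_replication, shared_main):
    """Number of physical copies of the data kept by the storage service."""
    if shared_main:
        # One shared main: only the storage service replicates it.
        return storage_replication
    # Each application replica stores its own copy, and each is replicated again.
    return app_replication * storage_replication


def savings_factor(app_replication, storage_replication):
    before = stored_copies(app_replication, storage_replication, shared_main=False)
    after = stored_copies(app_replication, storage_replication, shared_main=True)
    return before / after
```

The storage-level factor cancels out, so `savings_factor(k, r)` equals k for any r. This is the upper bound quoted in the conclusion; the Cassandra evaluation reports a measured saving close to 2.8x at a replication factor of 3.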

  23. Thank you!
      Contact for any follow-up questions: Hemant Saxena
      email: hemant.saxena@uwaterloo.ca
