Cassandra on RocksDB (Dikang Gu, Software Engineer @ Facebook)


  1. Cassandra on RocksDB. Dikang Gu, Software Engineer @ Facebook

  2. Agenda
     1. Motivation
     2. Approaches
     3. Design
     4. Performance metrics

  3. (no text on this slide)

  4. Stories, Direct, Live, Explore

  5. (no text on this slide)

  6. Apache Cassandra
     • Highly scalable partitioned data store
     • High performance
     • High availability
     • Tunable consistency

  7. Cassandra at Instagram
     • Thousands of Cassandra servers
     • 5 DCs
     • 100+ product use cases
     • Millions of requests per second

  8. Top Line Metrics
     • Reliability
       • Five nines (99.999%): request failure rate < 0.001%
     • Performance
       • Write throughput
       • Read latency
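The link between "five nines" and the 0.001% failure rate on this slide is plain arithmetic: 99.999% success leaves 10^-5 of requests, which is 0.001%. A minimal sketch of that conversion follows; the function names are hypothetical and not taken from the talk.

```python
def failure_rate_percent(nines: int) -> float:
    """Maximum failure rate, in percent, allowed by an N-nines reliability target."""
    # N nines means a success ratio of 1 - 10^-N, so failures are 10^-N of requests.
    return 100.0 * 10.0 ** (-nines)


def allowed_failures(requests: int, nines: int) -> float:
    """Number of failed requests permitted out of `requests` at an N-nines target."""
    return requests * 10.0 ** (-nines)


if __name__ == "__main__":
    print(f"5 nines allows {failure_rate_percent(5):.3f}% failed requests")
    print(f"out of 1,000,000 requests: about {allowed_failures(1_000_000, 5):.0f} failures")
```

At the "millions of requests per second" scale mentioned earlier in the deck, even this budget corresponds to a steady stream of failed requests, which is why the target is tracked as a top-line metric.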

  9. Read Latency

  10. Read Latency: 60ms, 25ms, 5ms

  11. GC Stalls: 2.5%, 1%

  12. Where do we play

  13. Approach 1: GC Tuning

  14. Approach 1: GC Tuning
      Pros:
      • No code changes
      Cons:
      • Hard to tune for both latency and throughput
      • Highly dependent on the workload
      • Max 20% P99 latency drop

  15. Approach 2: Off-heap Data Structures
      • Memtable
      • Caches
      • Indexes
      • Read/write path
      • Compaction
      • …

  16. Approach 2: Off-heap Data Structures
      Pros:
      • Incremental improvements
      • Easier to be accepted by the community
      Cons:
      • Play with Java Unsafe
      • Highly dependent on the workload
      • Max 20% P99 latency drop

  17. Approach 3: C++ Storage Engines
      • Most memory is consumed by the storage engine
      • Memtable, compaction, read/write path, etc.
      • Switch the existing Java storage engine to a C++ implementation
      • Pluggable storage engine

  18. Approach 3: C++ Storage Engines
      Pros:
      • Greatly reduces JVM overhead
      • CPU efficiency
      • Long-term benefit from a pluggable storage engine
      Cons:
      • Non-trivial effort to make the storage engine pluggable
      • JNI efficiency

  19. C++ Storage Engine

  20. (no text on this slide)

  21. RocksDB
      • Embedded C++ key-value database
      • Optimized for flash, with extremely low latencies
      • Popular storage engine for MySQL, MongoDB, etc.
      • Open source, Apache 2.0 license

  22. Prototype
      • Supports the single key-value case
      • Bypasses C*'s own storage engine
      • No streaming support
      • Shadows one production use case

  23. Prototype Latency: 35ms, 15ms, 2ms-5ms

  24. Prototype GC Stalls: 1%, 0.5%, 0.1%

  25. RocksDB + Cassandra = Rocksandra

  26. Challenges
      1. Cassandra data model
      2. Streaming

  27. Design: Data Model

  28. Key Encoding

  29. Key Encoding
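The transcript does not preserve the encoding diagrams from these two slides, so the following is only an illustrative sketch of the general problem they address: mapping a Cassandra-style (partition key, clustering values) pair onto a single RocksDB byte key whose byte-wise sort order matches the logical row order. The layout, the restriction to 64-bit integer clustering values, and all function names are assumptions made for this example and should not be read as the Rocksandra format.

```python
import struct
from typing import Sequence


def encode_int64(value: int) -> bytes:
    """Encode a signed 64-bit integer so that byte order equals numeric order."""
    # Adding 2^63 flips the sign bit; big-endian unsigned bytes then sort numerically.
    return struct.pack(">Q", value + (1 << 63))


def partition_prefix(partition_key: bytes) -> bytes:
    """Length-prefixed partition key; all rows of a partition share this prefix."""
    return struct.pack(">H", len(partition_key)) + partition_key


def encode_key(partition_key: bytes, clustering: Sequence[int]) -> bytes:
    """Hypothetical storage key: partition prefix followed by clustering values."""
    return partition_prefix(partition_key) + b"".join(
        encode_int64(c) for c in clustering
    )


if __name__ == "__main__":
    keys = [encode_key(b"user1", [c]) for c in (7, -5, 3)]
    # Sorting the raw bytes yields clustering order -5, 3, 7.
    for key in sorted(keys):
        print(key.hex())
```

With a layout like this, a range query inside one partition becomes a prefix scan over `partition_prefix(...)`, which is the kind of operation an ordered key-value store handles efficiently.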

  30. Value Encoding

  31. Merge operator/Compaction filter
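RocksDB exposes these two hooks as the C++ interfaces `rocksdb::MergeOperator` and `rocksdb::CompactionFilter`. The slide's details are not in the transcript, so the toy model below only illustrates the two ideas in plain Python, without calling RocksDB: a merge step that reconciles two versions of a row by per-column write timestamp, and a filter step that drops cells whose TTL has expired. The cell layout and function names are assumptions for this sketch.

```python
from typing import Dict, Optional, Tuple

# Hypothetical cell layout: (write timestamp, value, expiry time or None for no TTL).
Cell = Tuple[int, bytes, Optional[int]]
Row = Dict[str, Cell]


def merge_rows(existing: Row, update: Row) -> Row:
    """Merge-operator idea: keep, per column, the cell with the newest timestamp."""
    merged = dict(existing)
    for column, cell in update.items():
        # Simplification: on equal timestamps the incoming update wins.
        if column not in merged or cell[0] >= merged[column][0]:
            merged[column] = cell
    return merged


def compaction_filter(row: Row, now: int) -> Row:
    """Compaction-filter idea: drop cells whose expiry time has passed."""
    return {
        column: cell
        for column, cell in row.items()
        if cell[2] is None or cell[2] > now
    }


if __name__ == "__main__":
    stored = {"name": (1, b"old", None), "session": (5, b"token", 100)}
    write = {"name": (2, b"new", None), "session": (3, b"stale", None)}
    row = merge_rows(stored, write)
    print(row)                          # newest cell per column
    print(compaction_filter(row, 100))  # the "session" cell has expired
```

The appeal of this split is that writes can be appended without a read-modify-write cycle; reconciliation and cleanup happen later, when the engine reads or compacts the data.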

  32. Streaming

  33. Feature Milestone
      Current Features:
      • Most non-nested data types
      • Table data model
      • Point query
      • Range query
      • Mutations
      • Timestamp
      • TTL
      • Deletions/Cell tombstones
      Future Features:
      • Multi-partition query
      • Nested data types
      • Counters
      • Range tombstone
      • Materialized views
      • Secondary indexes
      • Repair

  34. Performance metrics

  35. Cluster A
      • Similar P99 read/write latency
      • Footprint reduced to 1/3

  36. Cluster B
      • P99 read latency reduced 3X (60ms to 20ms)
      • Footprint reduced to 60%

  37. Cluster C
      • High write volume and large fan-out reads
      • P99 read latency reduced from 1s to 10ms
      • Same footprint

  38. Benchmark on AWS
      • C* cluster in one AZ (us-west-2a), replication factor 1
      • 3 i3.8xlarge EC2 instances: 256GB memory, 32-core CPU, RAID 0 with 4 NVMe flash disks
      • NDBench cluster (https://github.com/Netflix/ndbench), run from the same AZ

  39. Benchmark Metrics

  40. Benchmark Metrics

  41. Benchmark Metrics

  42. Benchmark Metrics

  43. Recap
      Switching to Rocksandra helped us:
      • Cut down tail latency
      • Improve throughput

  44. Try it!
      Don’t just take our word for it; download from github.com/instagram:
      • Rocksandra code
      • Benchmark CloudFormation template and scripts

  45. Future work
      • Support more Cassandra features
      • Cassandra pluggable storage engine

  46. Acknowledgement
      Thanks for all the support from the Cassandra and RocksDB communities.

  47. Thank You!
