Cassandra on RocksDB
Dikang Gu, Software Engineer @ Facebook
Agenda
1. Motivation
2. Approaches
3. Design
4. Performance metrics
Stories • Direct • Live • Explore
Apache Cassandra • Highly scalable partitioned data store • High performance • High availability • Tunable consistency
Cassandra at Instagram • Thousands of Cassandra servers • 5 data centers • 100+ product use cases • Millions of requests per second
Top Line Metrics • Reliability: five 9s (request failure rate < 0.001%) • Performance: write throughput and read latency
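Five 9s and a failure rate below 0.001% are the same target: 1 - 0.99999 = 0.00001 = 0.001%. At millions of requests per second, the budget is easier to grasp in absolute terms. Below is a minimal sketch of that arithmetic. The helper names are hypothetical, and 1,000,000 requests/s is an illustrative rate, not a measured one.

```python
def failure_percent(nines: int) -> float:
    """Allowed failure rate, in percent, for an availability of `nines` nines."""
    # 99.999% availability leaves 10**-5 of requests to fail, i.e. 100 / 10**5 percent.
    return 100 / 10 ** nines


def failure_budget(requests_per_second: int, nines: int = 5) -> float:
    """Maximum failed requests per second that still meets the target."""
    return requests_per_second / 10 ** nines


print(failure_percent(5))         # 0.001
print(failure_budget(1_000_000))  # 10.0
```

At that illustrative rate, fewer than 10 failed requests per second would keep the cluster within five 9s.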
Read Latency (chart labels: 60ms, 25ms, 5ms)
GC Stalls (chart labels: 2.5%, 1%)
Where do we play?
Approach 1: GC Tuning
Approach 1: GC Tuning
Pros:
• No code changes
Cons:
• Hard to tune for both latency and throughput
• Highly dependent on the workload
• At most a 20% P99 latency reduction
Approach 2: Off-heap Data Structures • Memtable • Caches • Indexes • Read/write path • Compaction • …
Approach 2: Off-heap Data Structures
Pros:
• Incremental improvements
• Easier for the community to accept
Cons:
• Requires working with Java Unsafe
• Highly dependent on the workload
• At most a 20% P99 latency reduction
Approach 3: C++ Storage Engines • Most memory is consumed by the storage engine • Memtable, compaction, read/write path, etc. • Switch the existing Java storage engine to a C++ implementation • Pluggable storage engine
Approach 3: C++ Storage Engines
Pros:
• Greatly reduces JVM overhead
• CPU efficiency
• Long-term benefit from a pluggable storage engine
Cons:
• Non-trivial effort to make the storage engine pluggable
• JNI efficiency
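A pluggable storage engine means the rest of the database talks to storage only through a narrow interface. That lets a Java engine and a C++ one be swapped without touching the layers above. Below is a minimal sketch of the idea. `StorageEngine`, `DictEngine`, and `make_engine` are hypothetical names, not Cassandra's actual interface. The toy in-memory engine stands in for a RocksDB-backed implementation, which in this design would be reached over JNI.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, Optional, Tuple


class StorageEngine(ABC):
    """Hypothetical contract between the database layer and a storage engine."""

    @abstractmethod
    def put(self, key: bytes, value: bytes) -> None: ...

    @abstractmethod
    def get(self, key: bytes) -> Optional[bytes]: ...

    @abstractmethod
    def delete(self, key: bytes) -> None: ...

    @abstractmethod
    def scan(self, start: bytes, end: bytes) -> Iterator[Tuple[bytes, bytes]]:
        """Yield (key, value) pairs with start <= key < end, in key order."""


class DictEngine(StorageEngine):
    """Toy in-memory engine; a RocksDB-backed engine would implement the same methods."""

    def __init__(self) -> None:
        self._data: Dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: bytes) -> Optional[bytes]:
        return self._data.get(key)

    def delete(self, key: bytes) -> None:
        self._data.pop(key, None)

    def scan(self, start: bytes, end: bytes) -> Iterator[Tuple[bytes, bytes]]:
        # Sorting on every scan is fine for a toy; real engines keep keys ordered.
        for key in sorted(self._data):
            if start <= key < end:
                yield key, self._data[key]


def make_engine() -> StorageEngine:
    # The only place that knows which implementation is plugged in.
    return DictEngine()
```

Callers hold a `StorageEngine` and never import a concrete class. Switching engines is then a change to `make_engine` alone, which is the long-term benefit the slide lists. The cost is the non-trivial effort of first pushing every storage access behind such an interface.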
C++ Storage Engine
RocksDB • Embedded C++ key-value database • Optimized for flash, with extremely low latencies • Popular storage engine for MySQL, MongoDB, etc. • Open source, Apache 2.0 license
Prototype • Supports the single key-value case • Bypasses C*'s own storage engine • No streaming support • Shadows one production use case
Prototype Latency (chart labels: 35ms, 15ms, 2ms-5ms)
Prototype GC Stalls (chart labels: 1%, 0.5%, 0.1%)
RocksDB + Cassandra = Rocksandra
Challenges
1. Cassandra data model
2. Streaming
Design: Data Model
Key Encoding
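The key layout on these slides was a diagram that did not survive extraction, so the exact Rocksandra format is not recoverable here. As an illustration of the general technique only, the sketch below builds an order-preserving composite key. It assumes a layout of token, then partition key components, then clustering components, so that RocksDB's default bytewise ordering sorts rows by token and then in clustering order within a partition. Cassandra's Murmur3Partitioner produces signed 64-bit tokens. `encode_key` and its rules are assumptions of this sketch:

- integers are big-endian with the sign bit flipped;
- strings and blobs have 0x00 escaped as 0x00 0xFF and end with a 0x00 0x00 terminator.

```python
import struct
from typing import Sequence, Union

Component = Union[int, str, bytes]


def _encode_int(v: int) -> bytes:
    # Big-endian 64-bit with the sign bit flipped: adding 2**63 maps
    # [-2**63, 2**63) onto [0, 2**64), so negatives sort before positives.
    return struct.pack(">Q", v + (1 << 63))


def _encode_bytes(v: bytes) -> bytes:
    # Escape 0x00 as 0x00 0xFF and end with 0x00 0x00: the terminator sorts
    # below any data byte, so a shorter component sorts before its extensions.
    return v.replace(b"\x00", b"\x00\xff") + b"\x00\x00"


def _encode_component(v: Component) -> bytes:
    if isinstance(v, int):
        return _encode_int(v)
    if isinstance(v, str):
        return _encode_bytes(v.encode("utf-8"))
    if isinstance(v, bytes):
        return _encode_bytes(v)
    raise TypeError(f"unsupported key component: {type(v).__name__}")


def encode_key(token: int, partition_key: Sequence[Component],
               clustering: Sequence[Component]) -> bytes:
    """Hypothetical layout: token | partition key components | clustering components."""
    parts = [_encode_int(token)]
    parts.extend(_encode_component(c) for c in partition_key)
    parts.extend(_encode_component(c) for c in clustering)
    return b"".join(parts)
```

Every row of a partition shares the prefix `encode_key(token, pk, [])`. Reading a partition therefore becomes a seek to that prefix followed by a forward scan. Descending clustering order, which Cassandra also supports, would need those components' bytes inverted; the sketch ignores it.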
Value Encoding
Merge operator/Compaction filter
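RocksDB's merge operator and compaction filter are C++ extension points. A merge operator combines the pending writes for a key when that key is read or compacted; a compaction filter can drop entries while compaction rewrites them. One plausible use under Cassandra semantics pairs them as follows:

- the merge operator performs last-write-wins reconciliation by write timestamp;
- the compaction filter purges expired TTL cells and old tombstones.

The Python sketch below models only that decision logic, not the RocksDB API. The tie-break rule (deletion first, then the larger value) and the `gc_grace_s` parameter, modeled on Cassandra's `gc_grace_seconds`, are assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Cell:
    timestamp_us: int            # write timestamp, microseconds
    value: Optional[bytes]       # None marks a cell tombstone
    local_deletion_s: int = 0    # TTL expiry or deletion time in seconds; 0 = none


def _newer(a: Cell, b: Cell) -> bool:
    """True if cell `a` should replace cell `b` under last-write-wins."""
    if a.timestamp_us != b.timestamp_us:
        return a.timestamp_us > b.timestamp_us
    if (a.value is None) != (b.value is None):
        return a.value is None                   # on a timestamp tie, the deletion wins
    return (a.value or b"") > (b.value or b"")   # then the larger value wins


def full_merge(existing: Optional[Cell], operands: Iterable[Cell]) -> Optional[Cell]:
    """Merge-operator logic: fold all pending writes into one winning cell."""
    winner = existing
    for cell in operands:
        if winner is None or _newer(cell, winner):
            winner = cell
    return winner


def should_drop(cell: Cell, now_s: int, gc_grace_s: int) -> bool:
    """Compaction-filter logic: purge expired cells and tombstones after gc_grace_s."""
    # Keeping them for the grace period gives repair a chance to propagate the
    # deletion before the data could be resurrected from another replica.
    return cell.local_deletion_s != 0 and cell.local_deletion_s + gc_grace_s <= now_s
```

The merge is order-insensitive: feeding the same cells in any order yields the same winner. That property matters because RocksDB may apply merge operands during reads or at different compaction stages.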
Streaming
Feature Milestone
Current features:
• Most non-nested data types
• Table data model
• Point query
• Range query
• Mutations
• Timestamp
• TTL
• Deletions/Cell tombstones
Future features:
• Multi-partition query
• Nested data types
• Counters
• Range tombstone
• Materialized views
• Secondary indexes
• Repair
Performance metrics
Cluster A • Similar P99 read/write latency • Footprint reduced to 1/3
Cluster B • P99 read latency reduced 3X (60ms to 20ms) • Footprint reduced to 60%
Cluster C • High write volume and large-fanout reads • P99 read latency reduced from 1s to 10ms • Same footprint
Benchmark on AWS • C* cluster in a single AZ (us-west-2a), replication factor 1 • 3 i3.8xlarge EC2 instances: 256GB memory, 32-core CPU, RAID 0 across 4 NVMe flash disks • NDBench cluster (https://github.com/Netflix/ndbench), run from the same AZ
Benchmark Metrics (charts)
Recap: switching to Rocksandra helped us • Cut tail latency • Improve throughput
Try it! Don’t just take our word for it: download from github.com/instagram • Rocksandra code • Benchmark CloudFormation template and scripts
Future work • Support more Cassandra features • Cassandra pluggable storage engine
Acknowledgements: thanks to the Cassandra and RocksDB communities for all their support
Thank You!