CockroachDB’s Survivability Model
Scalable, Survivable, Consistent, SQL
Presented by Marc Berhault / Engineer
@cockroachdb
CockroachDB: Make Data Easy
■ Scalable
■ Survivable
■ Strongly Consistent
■ SQL
■ And... Open Source
Agenda
Architecture:
■ SQL layer
■ Transactions
■ Sharding
■ Replication
Survivability:
■ Rebalancing
■ Repairs
Architecture
Architecture (high-level)
Abstraction stack (top to bottom): SQL, Transactional KV, Distribution, Replication, Storage
In the network: several nodes, each running SQL on top of its own storage.
[Diagram: Node 1 and Node 2, each with several stores; every store holds many ranges.]
SQL
CREATE TABLE inventory (
  id INTEGER PRIMARY KEY,
  name VARCHAR,
  quantity INTEGER,
  INDEX name_index (name));
INSERT INTO inventory VALUES (1, 'Apple', 3);
SQL: Data model
■ Tables
■ Rows
■ Columns
■ Indexes

inventory
  id   name     quantity
  1    Apple    3
  2    Orange   12
  3    Cherry   5
  4    Banana   7

name_index
  name     id
  Apple    1
  Banana   4
  Cherry   3
  Orange   2
SQL: Key anatomy
INSERT INTO inventory VALUES (1, 'Apple', 3);
inventory (primary index), key format /<table>/<index>/<key>/<column>:
  Key                              Value
  /inventory/primary/1/name        Apple
  /inventory/primary/1/quantity    3
name_index (secondary index), key format /<table>/<index>/<key>:
  Key                              Value
  /inventory/name_index/Apple      1
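A minimal Python sketch of the key layout above, assuming plain string keys: it turns one inventory row into the KV pairs listed on the slide. The function name encode_row and its parameters are hypothetical; CockroachDB's real encoding is binary and order-preserving, with numeric table, index and column IDs rather than names.

```python
def encode_row(table, pk, row, indexes):
    """Return the KV pairs written for one SQL row (hypothetical helper).

    table   -- table name, e.g. "inventory"
    pk      -- primary key value, e.g. 1
    row     -- non-key columns as {column name: value}
    indexes -- secondary indexes as {index name: indexed column name}
    """
    kvs = {}
    # Primary index: one KV pair per non-key column,
    # keyed /<table>/primary/<key>/<column>.
    for column, value in row.items():
        kvs[f"/{table}/primary/{pk}/{column}"] = value
    # Secondary index: /<table>/<index>/<indexed value> -> primary key.
    for index_name, column in indexes.items():
        kvs[f"/{table}/{index_name}/{row[column]}"] = pk
    return kvs


if __name__ == "__main__":
    pairs = encode_row("inventory", 1, {"name": "Apple", "quantity": 3},
                       {"name_index": "name"})
    for key, value in pairs.items():
        print(key, value)
```

One row therefore becomes several KV pairs, and the SQL INSERT must write all of them together, which is why the next layer down is transactional.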
Transactional KV: consistency
■ Update all keys atomically
■ Track a transaction across multiple SQL statements
■ Retry when necessary
Optimistic Concurrency
■ CockroachDB uses optimistic concurrency control for lock-free transactions
■ In case of conflict, the losing transaction restarts
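The slide does not spell out CockroachDB's conflict rules, so the sketch below swaps in textbook optimistic concurrency control (read, validate, write) to illustrate the idea: transactions run without taking locks, and at commit time the loser of a conflict restarts. CockroachDB's real protocol is built on MVCC timestamps and differs in detail; Store, Txn, Conflict and run_txn are hypothetical names.

```python
class Conflict(Exception):
    """Raised when a transaction loses a conflict and must restart."""


class Store:
    """Toy KV store that keeps a version counter per key."""

    def __init__(self):
        self.data = {}
        self.version = {}


class Txn:
    """Optimistic transaction: no locks are taken while it runs."""

    def __init__(self, store):
        self.store = store
        self.reads = {}   # key -> version seen at first read
        self.writes = {}  # buffered writes, applied only at commit

    def get(self, key):
        if key in self.writes:
            return self.writes[key]
        self.reads.setdefault(key, self.store.version.get(key, 0))
        return self.store.data.get(key)

    def put(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Validate: if any key we read has changed since, we lost the conflict.
        # (Validate and apply are assumed to run atomically; this toy is
        # single-threaded.)
        for key, seen in self.reads.items():
            if self.store.version.get(key, 0) != seen:
                raise Conflict(key)
        for key, value in self.writes.items():
            self.store.data[key] = value
            self.store.version[key] = self.store.version.get(key, 0) + 1


def run_txn(store, body, max_attempts=10):
    """Run body(txn) and commit it, restarting from scratch on conflict."""
    for _ in range(max_attempts):
        txn = Txn(store)
        body(txn)
        try:
            txn.commit()
            return
        except Conflict:
            continue  # the losing transaction restarts
    raise RuntimeError("transaction kept losing conflicts")
```

With two interleaved increments of the same key, the first to commit wins and the second raises Conflict; wrapping the work in run_txn simply replays it against the new value.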
Distribution: scalability
■ Route KV commands to the appropriate shards
■ Split a batch into per-shard batches when it spans several shards
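A Python sketch of batch splitting, assuming ranges are described only by their sorted split keys (here the fruit boundaries used on the sharding slides). range_for and split_batch are hypothetical helpers; the real distribution layer must also find which nodes hold each range.

```python
import bisect

# Split keys between the three example ranges (hypothetical values taken from
# the sharding slides): range 0 = [Ø, lem), 1 = [lem, pea), 2 = [pea, ∞).
SPLIT_KEYS = ["lem", "pea"]


def range_for(key, split_keys=SPLIT_KEYS):
    """Index of the range owning key; a split key belongs to the range it starts."""
    return bisect.bisect_right(split_keys, key)


def split_batch(keys, split_keys=SPLIT_KEYS):
    """Group a batch of keys into one sub-batch per range, keeping request order."""
    sub_batches = {}
    for key in keys:
        sub_batches.setdefault(range_for(key, split_keys), []).append(key)
    return sub_batches


if __name__ == "__main__":
    print(split_batch(["apricot", "pear", "lime", "banana"]))
```

Each sub-batch can then be sent to its own range independently; a batch that falls entirely inside one range is left whole.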
Sharding: Index
Each shard holds a contiguous span of the keyspace:
  Ø-lem:   apricot, banana, blueberry, cherry, grape
  lem-pea: lemon, lime, mango, melon, orange
  pea-∞:   peach, pear, pineapple, raspberry, strawberry
Sharding: Index
An index maps from key to range ID.
  Index:  Ø-lem | lem-pea | pea-∞ (one entry per shard)
  Shards: Ø-lem, lem-pea, pea-∞, with the same keys as before
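A minimal Python sketch of such an index, assuming it is kept in memory and keyed by each range's exclusive end key, so a lookup is a binary search for the first end key greater than the lookup key. RangeIndex, INFINITY and the range IDs 1 to 3 are hypothetical.

```python
import bisect

# Sentinel standing in for ∞; assumes every real key sorts below it.
INFINITY = "\uffff"


class RangeIndex:
    """Hypothetical index from key to range ID, keyed by each range's end key."""

    def __init__(self):
        self.end_keys = []   # sorted, exclusive end keys
        self.range_ids = []  # range_ids[i] owns [end_keys[i-1], end_keys[i])

    def add(self, end_key, range_id):
        i = bisect.bisect_left(self.end_keys, end_key)
        self.end_keys.insert(i, end_key)
        self.range_ids.insert(i, range_id)

    def lookup(self, key):
        """Return the ID of the range whose span contains key."""
        # The owning range is the first one whose end key is strictly greater.
        i = bisect.bisect_right(self.end_keys, key)
        return self.range_ids[i]


if __name__ == "__main__":
    index = RangeIndex()
    index.add("lem", 1)       # Ø-lem
    index.add("pea", 2)       # lem-pea
    index.add(INFINITY, 3)    # pea-∞
    for fruit in ["apricot", "lemon", "peach"]:
        print(fruit, "-> range", index.lookup(fruit))
```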
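Sharding: Split
Split when a shard is too large. Here tamarillo and tamarind have been added to pea-∞, which is split at "str" into pea-str and str-∞; the index now has one entry per new shard.
  Index:   Ø-lem | lem-pea | pea-str | str-∞
  Ø-lem:   apricot, banana, blueberry, cherry, grape
  lem-pea: lemon, lime, mango, melon, orange
  pea-str: peach, pear, pineapple, raspberry
  str-∞:   strawberry, tamarillo, tamarind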
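A sketch of split-on-insert in Python, assuming shards are split when they exceed a hypothetical key-count limit (MAX_KEYS); real systems split on data size. Shard and insert_key are hypothetical names, and the split point is simply the key at the middle of the sorted shard.

```python
import bisect

MAX_KEYS = 6  # hypothetical threshold; real systems split on data size


class Shard:
    def __init__(self, start, end, keys=()):
        self.start = start   # inclusive start key ("" plays the role of Ø)
        self.end = end       # exclusive end key (None means ∞)
        self.keys = sorted(keys)


def insert_key(shards, key):
    """Add key to the shard owning it, splitting that shard if it grows too large.

    shards is a list of Shard objects sorted by start key that covers the
    whole keyspace, so the first shard starts at "".
    """
    starts = [s.start for s in shards]
    i = bisect.bisect_right(starts, key) - 1
    shard = shards[i]
    bisect.insort(shard.keys, key)
    if len(shard.keys) > MAX_KEYS:
        mid = (len(shard.keys) + 1) // 2
        split_key = shard.keys[mid]
        # Right half becomes a new shard [split_key, old end); the index
        # then needs one entry per half.
        shards.insert(i + 1, Shard(split_key, shard.end, shard.keys[mid:]))
        shard.keys, shard.end = shard.keys[:mid], split_key


if __name__ == "__main__":
    shards = [
        Shard("", "lem", ["apricot", "banana", "blueberry", "cherry", "grape"]),
        Shard("lem", "pea", ["lemon", "lime", "mango", "melon", "orange"]),
        Shard("pea", None, ["peach", "pear", "pineapple", "raspberry", "strawberry"]),
    ]
    for fruit in ["tamarillo", "tamarind"]:
        insert_key(shards, fruit)
    for s in shards:
        print(s.start or "Ø", "-", s.end or "∞", s.keys)
```

On the slide's data this splits at "strawberry" where the slide uses the shorter boundary "str"; both leave exactly the same keys on each side.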
Replication: survivability
■ Each range is replicated to three or more nodes
■ One replica of each range is the leader
Replication
■ Each set of replicas is a Raft group
■ Consistency provided by quorum
[Diagram: Range 1, Range 2 and Range 3 replicated across Nodes 1 to 4.]
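Quorum arithmetic for a Raft group, as a small Python sketch (the function names are hypothetical): a write commits once a majority of replicas has acknowledged it, so a group of n replicas keeps working as long as no more than (n − 1) // 2 of them are lost.

```python
def quorum(replicas):
    """Smallest majority of a Raft group with this many replicas."""
    return replicas // 2 + 1


def tolerated_failures(replicas):
    """How many replicas can be lost while a majority is still reachable."""
    return replicas - quorum(replicas)


def committed(acks, replicas):
    """A write is committed once a majority of the group has acknowledged it."""
    return acks >= quorum(replicas)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(f"{n} replicas: quorum {quorum(n)}, survives {tolerated_failures(n)} failure(s)")
```

Because any two majorities of the same group share at least one replica, the next quorum always includes a replica that saw the last committed write. Note that a fourth replica adds no fault tolerance over three.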
Replication: Node storage
■ Data is stored locally in RocksDB
  ■ Embedded KV database
  ■ Provides atomic writes to multiple keys
  ■ Supports ordered scans
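To illustrate the two storage properties the slide relies on, here is a toy in-memory stand-in in Python. WriteBatch and SortedKV are hypothetical and do not mirror RocksDB's actual API; the sketch only shows a multi-key write applied all-or-nothing and an ordered range scan over keys shaped like those on the key-anatomy slide.

```python
import bisect


class WriteBatch:
    """Hypothetical batch: ops are collected, then applied together."""

    def __init__(self):
        self.ops = []

    def put(self, key, value):
        self.ops.append(("put", key, value))

    def delete(self, key):
        self.ops.append(("delete", key, None))


class SortedKV:
    """Toy embedded store: a sorted key list plus a dict of values."""

    def __init__(self):
        self._state = ([], {})  # (sorted keys, key -> value)

    def write(self, batch):
        # Apply the batch to copies and publish them with one assignment, so
        # the store holds either none or all of the batch's writes.
        keys, values = list(self._state[0]), dict(self._state[1])
        for op, key, value in batch.ops:
            if op == "put":
                if key not in values:
                    bisect.insort(keys, key)
                values[key] = value
            elif key in values:  # a delete of an existing key
                del values[key]
                keys.remove(key)
        self._state = (keys, values)

    def scan(self, start, end):
        """Yield (key, value) pairs with start <= key < end, in key order."""
        keys, values = self._state
        i = bisect.bisect_left(keys, start)
        while i < len(keys) and keys[i] < end:
            yield keys[i], values[keys[i]]
            i += 1


if __name__ == "__main__":
    db = SortedKV()
    batch = WriteBatch()
    batch.put("/inventory/primary/1/quantity", 3)
    batch.put("/inventory/primary/1/name", "Apple")
    batch.put("/inventory/name_index/Apple", 1)
    db.write(batch)
    prefix = "/inventory/primary/1/"
    for key, value in db.scan(prefix, prefix + "\uffff"):
        print(key, value)
```

Ordered scans are what make the key layout useful: all columns of row 1 sit next to each other, so reading the row is one scan over a key prefix.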
Reliability
Reliability
■ Symmetric nodes
■ Auto-balancing
■ Self-healing
Reliability: Rebalancing
[Diagram: Nodes 1, 2 and 3, each holding a replica of Range 1, Range 2 and Range 3.]
Reliability: Rebalancing
Adding a new (empty) node.
[Diagram: Node 4 joins Nodes 1 to 3 with no replicas.]
Reliability: Rebalancing
A new replica is allocated, and data is copied.
[Diagram: a new Range 3 replica is created on Node 4.]
Reliability: Rebalancing
The new replica is made live, replacing another.
[Diagram: Node 4's Range 3 replica is now live.]
Reliability: Rebalancing
The old (inactive) replica is deleted.
[Diagram: Range 3 is back to three replicas, one of them on Node 4.]
Reliability: Rebalancing
The process continues until nodes are balanced.
[Diagram: Node 4 now holds replicas of Range 2 and Range 3.]
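The rebalancing steps can be sketched as a greedy loop in Python, assuming the only goal is equal replica counts per node (the slides do not describe CockroachDB's actual allocator heuristics). rebalance is a hypothetical helper; each move follows the slides: allocate and copy, make live, delete the old replica.

```python
def rebalance(nodes):
    """Move replicas until per-node replica counts differ by at most one.

    nodes maps node name -> set of range IDs with a replica on that node.
    Returns the moves made, as (range_id, from_node, to_node).
    """
    moves = []
    while True:
        fullest = max(nodes, key=lambda n: len(nodes[n]))
        emptiest = min(nodes, key=lambda n: len(nodes[n]))
        if len(nodes[fullest]) - len(nodes[emptiest]) <= 1:
            return moves  # balanced
        # A node holds at most one replica of a range; since fullest has at
        # least two more replicas than emptiest, this set is never empty.
        movable = sorted(nodes[fullest] - nodes[emptiest])
        range_id = movable[-1]
        # 1. Allocate a new replica on the emptiest node and copy the data.
        nodes[emptiest].add(range_id)
        # 2. The new replica is made live in the range's Raft group (not modeled).
        # 3. The old, now inactive, replica is deleted.
        nodes[fullest].remove(range_id)
        moves.append((range_id, fullest, emptiest))


if __name__ == "__main__":
    cluster = {"node1": {1, 2, 3}, "node2": {1, 2, 3}, "node3": {1, 2, 3},
               "node4": set()}  # node4 was just added, empty
    print(rebalance(cluster))
    print(cluster)
```

With this starting layout the first move sends a Range 3 replica to node4 and the second a Range 2 replica, leaving node4 with Ranges 2 and 3, as in the last rebalancing slide.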
Reliability: Recovery
[Diagram: the rebalanced cluster, Nodes 1 to 4, with Node 4 holding Range 2 and Range 3.]
Reliability: Recovery
Losing a node causes recovery of its replicas.
[Diagram: Node 2 fails, marked X.]
Reliability: Recovery
A new replica gets created on an existing node.
[Diagram: Node 4 gains a Range 1 replica and now holds Ranges 1, 2 and 3.]
Reliability: Recovery
Once at full replication, the old replicas are forgotten.
[Diagram: Nodes 1, 3 and 4 each hold Range 1, Range 2 and Range 3; Node 2 is gone.]
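A Python sketch of the repair step, assuming a dead node has already been detected and that the least-loaded live nodes receive the new replicas. recover is a hypothetical helper, and the starting layout in the example is illustrative rather than taken from the slides.

```python
def recover(nodes, dead, replication_factor=3):
    """Re-replicate the ranges that had a replica on the dead node.

    nodes maps node name -> set of range IDs with a replica on that node.
    Returns the replicas created, as (range_id, node).
    """
    live = {n: ranges for n, ranges in nodes.items() if n != dead}
    created = []
    for range_id in sorted(nodes[dead]):
        holders = sum(range_id in ranges for ranges in live.values())
        missing = replication_factor - holders
        # Prefer the least-loaded live nodes that do not already hold this range.
        candidates = sorted((n for n in live if range_id not in live[n]),
                            key=lambda n: (len(live[n]), n))
        for n in candidates[:max(missing, 0)]:
            live[n].add(range_id)  # new replica on an existing node
            created.append((range_id, n))
    # Forget the dead node's replicas. (If too few live nodes remain, some
    # ranges stay under-replicated; this sketch does not retry later.)
    del nodes[dead]
    return created


if __name__ == "__main__":
    cluster = {"node1": {1, 2}, "node2": {1, 3}, "node3": {1, 2, 3},
               "node4": {2, 3}}
    print(recover(cluster, "node2"))
    print(cluster)
```

Replicas are only forgotten after the new ones are in place, matching the order on the slides: the cluster never drops below the number of replicas that survived the failure.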
Zone configuration
■ Replication factor (default 3)
■ Geographical location (e.g. 2 replicas in Europe, 1 in the US)
■ Machine attributes (e.g. SSD vs. spinning disk)
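A hypothetical Python model of the constraints listed above; it is not CockroachDB's actual zone-config format. A ZoneConfig lists the required attributes for each replica, and place_replicas greedily picks a distinct matching store for each one. StoreInfo, ZoneConfig, place_replicas and the attribute names are all illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StoreInfo:
    name: str
    attrs: frozenset  # e.g. frozenset({"europe", "ssd"})


@dataclass
class ZoneConfig:
    # Hypothetical format: one set of required attributes per replica.
    # The default is three unconstrained replicas (replication factor 3).
    replica_attrs: list = field(default_factory=lambda: [set(), set(), set()])


def place_replicas(zone, stores):
    """Greedily pick a distinct store for each replica's required attributes.

    Returns the chosen store names, or None if the greedy pass cannot satisfy
    the config (it does not backtrack, so it may miss a valid placement).
    """
    chosen = []
    for required in zone.replica_attrs:
        match = next((s for s in stores
                      if required <= s.attrs and s.name not in chosen), None)
        if match is None:
            return None
        chosen.append(match.name)
    return chosen


if __name__ == "__main__":
    stores = [StoreInfo("s1", frozenset({"europe", "ssd"})),
              StoreInfo("s2", frozenset({"us", "hdd"})),
              StoreInfo("s3", frozenset({"europe", "hdd"})),
              StoreInfo("s4", frozenset({"us", "ssd"}))]
    # "2 in Europe, 1 in US", as on the slide.
    print(place_replicas(ZoneConfig([{"europe"}, {"europe"}, {"us"}]), stores))
```

A greedy pass keeps the sketch short; a real placement would need to search more carefully when constraints overlap, since an early choice can use up a store a stricter replica needs later.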
Status: BETA
Status: Beta
Ready for development testing
Roadmap:
■ Stability
■ Performance
■ Distributed SQL
■ Optimized JOINs
Thank You
github.com/cockroachdb/cockroach
CockroachLabs.com
Gitter: cockroachdb
@cockroachdb