Rocksteady: Fast Migration for Low-Latency In-memory Storage
Chinmay Kulkarni, Aniraj Kesavan, Tian Zhang, Robert Ricci, Ryan Stutsman

Introduction
• Distributed low-latency in-memory key-value stores are emerging
  • Predictable response times: ~10 µs median, ~60 µs 99.9th percentile
• Problem: Must migrate data between servers
  • Minimize performance impact of migration → go slow?
  • Quickly respond to hot spots, skew shifts, load spikes → go fast?
• Solution: Fast data migration with low impact
  • Early ownership transfer of data, leverage workload skew
  • Low priority, parallel and adaptive migration
• Result: Migration protocol for RAMCloud in-memory key-value store
  • Migrates 256 GB in 6 minutes, 99.9th-percentile latency less than 250 µs
  • Median latency recovers from 40 µs to 20 µs in 14 s

Why Migrate Data?
[Figure: Client 1 and Client 2 each issue a multiGet(). Keys A and C are on Server 1, keys B and D on Server 2. Labels: No Locality, Fanout=7, 6 Million.]
• Poor spatial locality → High multiGet() fan-out → More RPCs

Migrate To Improve Spatial Locality
[Figure: same clients and servers. B migrates to Server 1 and C migrates to Server 2. Labels: No Locality, Fanout=7, 6 Million.]

Spatial Locality Improves Throughput
[Figure: after migration, A and B are on Server 1, C and D on Server 2. Labels: Full Locality, Fanout=1, 25 Million; No Locality, Fanout=7, 6 Million.]
• Better spatial locality → Fewer RPCs → Higher throughput
• Benefits multiGet(), range scans

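To make the fan-out argument concrete, here is a minimal sketch (not RAMCloud code) that counts how many servers one multiGet() has to contact under two placements. The `fanout` helper, the key names and both placement maps are hypothetical and chosen only to mirror the two-server picture on the slides.

```python
def fanout(keys, placement):
    """Number of distinct servers a multiGet over `keys` must contact (one RPC each)."""
    return len({placement[k] for k in keys})


# Hypothetical placements of four keys on two servers.
no_locality = {"A": 1, "C": 1, "B": 2, "D": 2}    # keys read together live apart
full_locality = {"A": 1, "B": 1, "C": 2, "D": 2}  # after moving B and C

client1_keys = ["A", "B"]  # assumed access pattern: Client 1 reads A and B together

fanout_before = fanout(client1_keys, no_locality)    # contacts both servers
fanout_after = fanout(client1_keys, full_locality)   # contacts one server
```

With fewer servers per multiGet(), each request costs fewer RPCs, which is the effect the throughput labels on the slide point to.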
The RAMCloud Key-Value Store
[Figure: four clients connect over the data center fabric to a coordinator, four masters and four backups.]
• All data in RAM
• Kernel bypass/DPDK
• 10 µs reads
• Write RPC: 1x in DRAM, 3x on disk

Fault-tolerance & Recovery In RAMCloud
[Figure: four clients, data center fabric, coordinator, four masters and four backups.]
• 2 seconds to recover

Performance Goals For Migration
• Maintain low access latency
  • 10 µs median latency → System extremely sensitive
  • Tail latency matters at scale → Even more sensitive
• Migrate data fast
  • Workloads dynamic → Respond quickly
  • Growing DRAM storage: 512 GB per server
  • Slow data migration → Entire day to scale cluster

Rocksteady Overview: Early Ownership Transfer
• Problem: Loaded source can bottleneck migration
• Solution: Instantly shift ownership and all load to target
  • Clients instantly redirected; all future operations serviced at Target
  • Creates “headroom” to speed migration
[Figure: Clients 1 to 4 send reads and writes to the Source Server before the transfer and to the Target Server after it.]

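The ownership flip can be sketched as a change to a tablet map that happens before any record moves. This is an illustrative model only: the `Coordinator` and `Client` classes, the cached map and the `on_wrong_server` refresh step are assumptions made for the example, not the RAMCloud API.

```python
class Coordinator:
    def __init__(self):
        self.owner = {}  # tablet id -> name of the owning server

    def transfer_ownership(self, tablet, target):
        # Ownership flips immediately; no records have moved yet.
        self.owner[tablet] = target


class Client:
    def __init__(self, coordinator):
        self.coordinator = coordinator
        self.cached_owner = {}  # possibly stale copy of the tablet map

    def route(self, tablet):
        """Server this client would send its next request to."""
        if tablet not in self.cached_owner:
            self.cached_owner[tablet] = self.coordinator.owner[tablet]
        return self.cached_owner[tablet]

    def on_wrong_server(self, tablet):
        """Assumed redirect path: refresh the map after the old owner rejects a request."""
        self.cached_owner[tablet] = self.coordinator.owner[tablet]
        return self.cached_owner[tablet]


coordinator = Coordinator()
coordinator.owner["tablet-1"] = "source"
client = Client(coordinator)

before = client.route("tablet-1")            # routed to the source
coordinator.transfer_ownership("tablet-1", "target")
stale = client.route("tablet-1")             # cache still points at the source
after = client.on_wrong_server("tablet-1")   # refreshed: now the target
```

Once every client has refreshed, the source no longer serves this tablet at all, which is where the headroom for migration work comes from.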
Rocksteady Overview: Leverage Skew
• Problem: Data has not arrived at target yet
• Solution: On-demand migration of unavailable data
  • Hot keys move early
  • Median latency recovers to 20 µs in 14 s
[Figure: a client read reaches the Target Server, which issues an on-demand pull to the Source Server.]

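A minimal sketch of the on-demand path, assuming a synchronous, single-threaded model: the target already owns the data, so a read for a record that has not arrived triggers an immediate pull of just that record. The `Source` and `Target` classes and the `pull_key` method are hypothetical names; how the real system schedules such pulls and retries client requests is not modeled here.

```python
class Source:
    def __init__(self, records):
        self.records = dict(records)

    def pull_key(self, key):
        # Hand a single record to the target on demand.
        return self.records.get(key)


class Target:
    def __init__(self, source):
        self.source = source
        self.records = {}         # records that have already arrived
        self.on_demand_pulls = 0

    def read(self, key):
        if key not in self.records:
            # Not migrated yet: fetch this one record from the source now.
            self.on_demand_pulls += 1
            value = self.source.pull_key(key)
            if value is None:
                return None       # key does not exist anywhere
            self.records[key] = value
        return self.records[key]


source = Source({"hot": "v1", "cold": "v2"})
target = Target(source)

values = [target.read("hot") for _ in range(3)]  # a hot key read repeatedly
pulls_for_hot_key = target.on_demand_pulls       # only the first read pulled
```

Under a skewed workload most reads hit a small set of hot keys, so a few on-demand pulls are enough for the target to serve most requests locally.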
Rocksteady Overview: Adaptive and Parallel
• Problem: Old single-threaded protocol limited to 130 MB/s
• Solution: Pipelined and parallel at source and target
  • Target driven
  • Parallel pulls yield to on-demand pulls
  • Moves 758 MB/s
[Figure: the Target Server issues an on-demand pull and several parallel pulls to the Source Server.]

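As a rough cross-check of these rates against the 256 GB figure from the introduction, the arithmetic below assumes a steady transfer rate and decimal units (1 GB = 1000 MB). Both are simplifying assumptions, and `migration_seconds` is a helper written for this example.

```python
def migration_seconds(size_gb, rate_mb_per_s):
    """Time to move size_gb at a steady rate, using decimal units (1 GB = 1000 MB)."""
    return size_gb * 1000 / rate_mb_per_s


old_protocol_s = migration_seconds(256, 130)  # about 1969 s, roughly 33 minutes
rocksteady_s = migration_seconds(256, 758)    # about 338 s, just under 6 minutes
speedup = 758 / 130                           # about 5.8x
```

The second figure lines up with the "256 GB in 6 minutes" result quoted earlier in the deck.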
Rocksteady Overview: Eliminate Sync Replication
• Problem: Synchronous replication bottleneck at target
• Solution: Safely defer replication until after migration
[Figure: first build shows replication at the Target Server running alongside on-demand and parallel pulls from the Source Server; second build shows replication alone.]

Rocksteady: Putting it all together
• Instantaneous ownership transfer
  • Immediate load reduction at overloaded source
  • Creates “headroom” for migration work
• Leverage skew to rapidly migrate hot data
  • Target comes up to speed with little data movement
• Adaptive parallel, pipelined at source and target
  • All cores avoid stalls, but yield to client-facing operations
• Safely defer replication at target
  • Eliminates replication bottleneck and contention

Rocksteady
• Instantaneous ownership transfer
• Leverage skew to rapidly migrate hot data
• Adaptive parallel, pipelined at source and target
• Safely defer synchronous replication at target

Evaluation Setup
• Four clients, each running YCSB-B (95/5) with Skew=0.99
• Before migration: 300 Million Records, 45 GB on the Source Server
• Migration moves 150 Million Records (22.5 GB) to the Target Server, leaving 150 Million Records (22.5 GB) on the source

Instantaneous Ownership Transfer
[Figure: Source CPU utilization. Before ownership transfer: 80%. Immediately after transfer: 25%. Created source CPU headroom: 55%.]
• Before migration: Source over-loaded, Target under-loaded
• Ownership transfer creates Source headroom for migration

Leverage Skew To Move Hot Data
• Before migration: Median = 10 µs, 99.9th = 60 µs
[Figure: median and 99.9th-percentile latency by workload skew. Uniform (low skew): median 75 µs, 99.9th 245 µs. Skew=0.99: median 28 µs, 99.9th 240 µs. Skew=1.5 (high skew): median 17 µs, 99.9th 155 µs.]
• After ownership transfer, hot keys pulled on-demand
• More skew → Median restored faster (migrate fewer hot keys)

Parallel, Pipelined, & Adaptive Pulls
[Figure: Target Server with hash table partitions (0, 8, 16, 24), per-core buffers, worker cores (replay, read(B)), dispatch core (NIC polling), migration manager (pulling) and pull buffers.]
• Target driven, migration manager
• Co-partitioned hash tables, pull from partitions in parallel
• Replay pulled data into per-core buffers

Parallel, Pipelined, & Adaptive Pulls
[Figure: Source Server with hash table partitions (0, 8, 16, 24), dispatch core (NIC polling), and worker cores serving read(A), pull(11) and pull(17) by copying record addresses into gather lists.]
• Stateless passive Source
• Granular 20 KB pulls

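One way to read "stateless passive source" is that the target carries the cursor: every pull names the bucket range it wants, and the source returns a bounded batch plus the bucket to resume from. The sketch below assumes exactly that. The `pull` and `migrate_partition` functions, the list-of-buckets hash table and the whole-bucket granularity are illustrative choices, a plain function call stands in for the RPC, and records are copied rather than referenced through a gather list.

```python
PULL_BYTES = 20 * 1024  # the slides mention granular 20 KB pulls


def pull(hash_table, start_bucket, end_bucket, max_bytes=PULL_BYTES):
    """Source side: gather records from buckets [start_bucket, end_bucket).

    Returns (records, next_bucket). The source keeps no per-migration state;
    the caller passes next_bucket back as start_bucket on its next pull.
    """
    records, used = [], 0
    bucket = start_bucket
    while bucket < end_bucket:
        size = sum(len(k) + len(v) for k, v in hash_table[bucket])
        if records and used + size > max_bytes:
            break  # byte budget reached; resume from this bucket next time
        records.extend(hash_table[bucket])
        used += size
        bucket += 1
    return records, bucket


def migrate_partition(hash_table, start_bucket, end_bucket, max_bytes=PULL_BYTES):
    """Target side: drive pulls over one partition until it is exhausted."""
    received, cursor, pulls = [], start_bucket, 0
    while cursor < end_bucket:
        batch, cursor = pull(hash_table, cursor, end_bucket, max_bytes)
        received.extend(batch)
        pulls += 1
    return received, pulls


# Hypothetical source table: 32 buckets, one record of about 1 KB per bucket.
source_table = [[(f"k{i}".encode(), b"x" * 1000)] for i in range(32)]

# One partition covers buckets [0, 8); a small budget forces several pulls.
received, pulls = migrate_partition(source_table, 0, 8, max_bytes=4096)
```

Because each partition has its own cursor, the target could run one such loop per partition (0, 8, 16, 24) in parallel without the source coordinating between them.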
Parallel, Pipelined, & Adaptive Pulls
• Redirect any idle CPU for migration
• Migration yields to regular requests, on-demand pulls

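The yielding behavior can be sketched as a priority queue from which idle workers take the most urgent pending task. The `Dispatcher` class and the three priority levels are assumptions made for this example. In particular, ranking regular client requests ahead of on-demand pulls is a guess; the slide only says that background migration yields to both.

```python
import heapq
import itertools

# Lower value runs first. The relative order of the first two is an assumption.
CLIENT_REQUEST, ON_DEMAND_PULL, BACKGROUND_PULL = 0, 1, 2


class Dispatcher:
    def __init__(self):
        self._queue = []
        self._seq = itertools.count()  # tie-breaker: FIFO within a priority

    def submit(self, priority, name):
        heapq.heappush(self._queue, (priority, next(self._seq), name))

    def next_task(self):
        """Hand an idle worker the most urgent pending task, or None."""
        if not self._queue:
            return None
        return heapq.heappop(self._queue)[2]


dispatcher = Dispatcher()
dispatcher.submit(BACKGROUND_PULL, "pull(11)")
dispatcher.submit(BACKGROUND_PULL, "pull(17)")
dispatcher.submit(CLIENT_REQUEST, "read(B)")
dispatcher.submit(ON_DEMAND_PULL, "on-demand pull")

order = [dispatcher.next_task() for _ in range(4)]
```

Background pulls only run when nothing more urgent is queued, so migration soaks up idle CPU without delaying client-facing work.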
Naïve Fault Tolerance During Migration
• Each server has a recovery log distributed across the cluster
• Migrated data needs to be triplicated to target’s recovery log
[Figure: the source holds A, B and C; the source recovery log keeps three copies of each across five backups. When A migrates to the target, three copies of A are written to the target recovery log.]

Synchronous Replication Bottlenecks Migration
• Synchronous replication cuts migration speed by 34%
[Figure: as A and B migrate to the target, three copies of each are written to the target recovery log.]