CRaft: Building High-Performance Consensus Protocols with Accurate Clocks
Feiran Wang*, Balaji Prabhakar*, Mendel Rosenblum*, Gene Zhang†
*Stanford University, †eBay Inc.
Overview
• CRaft: a multi-leader extension to Raft, enabled by accurate clocks
[Figure: existing protocol + synchronized clocks → better performance]
State Machines
• Maintain internal states
• Respond to external requests
• Examples: databases, storage systems
[Figure: a state machine goes from {x: 2, y: 3} to {x: 1, y: 3} after x ← 1, then to {x: 1, y: 1} after y ← x]
• How do we make them reliable?
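A state machine can be sketched as a function from (state, command) to a new state. The sketch below is a minimal illustration, assuming a hypothetical command syntax ("key ← constant" or "key ← other_key") and a state held in a Python dict; it replays the two example commands on this slide.

```python
def apply(state: dict[str, int], command: str) -> dict[str, int]:
    """Apply one command and return the new state (the input is not mutated).

    Hypothetical syntax: "x ← 1" assigns a constant, "y ← x" copies a key.
    """
    target, source = (part.strip() for part in command.split("←"))
    new_state = dict(state)
    if source.lstrip("-").isdigit():
        new_state[target] = int(source)    # constant assignment
    else:
        new_state[target] = state[source]  # copy another key's value
    return new_state


state = {"x": 2, "y": 3}
state = apply(state, "x ← 1")  # state is now {'x': 1, 'y': 3}
state = apply(state, "y ← x")  # state is now {'x': 1, 'y': 1}
print(state)
```

The result depends on command order: applying y ← x before x ← 1 would leave y = 2 instead of 1, which is why replicas must agree on a single order.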
Replicated State Machines
[Figure: clients send commands (e.g., x ← 1) to three servers; each server runs a consensus module, a log, and a state machine, and all three hold the same log (x ← 1, y ← 1, …) and the same state (x: 2, y: 3)]
• Consensus: ensures all servers agree on the same log
• Continues to operate if at least a majority of servers are up
Diego Ongaro and John Ousterhout. The Raft consensus algorithm. https://raft.github.io
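The replicated-state-machine idea fits in a few lines: if every replica applies the same log, in the same order, to the same deterministic state machine, all replicas end in the same state. This sketch assumes a hypothetical (key, value) encoding for commands and an illustrative two-entry log; it also checks the majority condition from the slide.

```python
from functools import reduce

# Hypothetical command encoding: ("x", 1) means "x ← 1".
Command = tuple[str, int]


def apply(state: dict[str, int], cmd: Command) -> dict[str, int]:
    key, value = cmd
    return {**state, key: value}  # deterministic: same input, same output


def replay(log: list[Command]) -> dict[str, int]:
    """Apply every log entry in order, starting from an empty state."""
    return reduce(apply, log, {})


def can_operate(servers_up: int, cluster_size: int) -> bool:
    """True if at least a majority of the cluster is up."""
    return servers_up >= cluster_size // 2 + 1


log = [("x", 1), ("y", 1)]                  # the agreed-upon log
replicas = [replay(log) for _ in range(3)]  # three servers replay it
print(replicas)
```

With 3 servers the cluster tolerates one failure; with 5 it tolerates two.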
The Raft Consensus Protocol
• A widely used consensus protocol
• Leader-based
• Benefits: simple and efficient
• Limitation: leader is the bottleneck for throughput and scalability
[Figure: all clients send requests to a single leader, which replicates its log (x ← 1, y ← 1, …) to two followers]
Diego Ongaro and John Ousterhout. 2014. In search of an understandable consensus algorithm. In USENIX Annual Technical Conference. 305–319.
Limitations of a Single Leader
• A single leader limits throughput and scalability
[Figure, left: performance degrades as load increases. Right: throughput decreases with larger cluster sizes]
Challenge in a Multi-Leader Protocol
[Figure, left (single leader): the leader says "Replicate my log" and the followers reply "ok". Right (multiple leaders): every leader says "I have a log"]
• Challenge: how do we coordinate multiple leaders?
• Solution: agreement on time => agreement on order
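Why agreement on time gives agreement on order can be illustrated with a small sketch: two servers receive the same entries from three leaders in different arrival orders, yet sorting by timestamp gives both the same sequence. The timestamps and commands are illustrative, and breaking timestamp ties by leader ID is an assumption made here for determinism, not something the slide specifies.

```python
import random

# Illustrative (timestamp, leader_id, command) triples from three leaders.
entries = [
    (1, 0, "x ← 1"),
    (2, 1, "y ← 2"),
    (4, 0, "z ← 3"),
    (4, 2, "x ← 4"),  # same timestamp as leader 0's entry
    (5, 1, "y ← 5"),
]


def agreed_order(received):
    """Order entries by timestamp, breaking ties by leader ID (assumption)."""
    return sorted(received, key=lambda e: (e[0], e[1]))


server_a = entries[:]
server_b = entries[:]
random.shuffle(server_a)  # each server sees a different arrival order
random.shuffle(server_b)
print(agreed_order(server_a) == agreed_order(server_b))  # True
```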
Clock Synchronization
• Achieving agreement on time is not trivial in a distributed system
• Huygens: a software clock synchronization system
Distribution of clock offsets between servers (20 machines on CloudLab):
Percentile   | 90th | 99th  | 99.9th | max
Clock offset | 7 µs | 11 µs | 15 µs  | 26 µs
• NTP precision: ~20 ms
• Huygens precision: ~20 µs
Yilong Geng, Shiyu Liu, Zi Yin, Ashish Naik, Balaji Prabhakar, Mendel Rosenblum, and Amin Vahdat. Exploiting a natural network effect for scalable, fine-grained clock synchronization. In NSDI 2018. 81–94.
Our Approach: CRaft (Clocks + Raft), compared with Raft
• Output: both produce a replicated log
• Scalability: CRaft uses multiple leaders instead of Raft's single leader
• Safety & consistency: same guarantees as Raft
• Practicability: a simple add-on to Raft; easy to implement
The CRaft Consensus Protocol
CRaft Overview
[Figure: three clients and three servers; each server hosts a member of Groups 1, 2, and 3 plus a merged log; each server is the leader of one group and a follower in the other two]
Life of a Request
[Figure: a client request passes through four stages: replicate log → commit log → merge log → execute on the state machine, across three servers that each lead one group and follow the other two]
• Commit: the entry is replicated on a majority of servers, so it is safe and durable
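The commit step follows Raft's rule: a leader may commit an entry once it is stored on a majority of servers. Assuming each CRaft group commits exactly as in standard Raft, the leader's commit index is the highest index replicated on a majority. This sketch omits Raft's extra requirement that the committed entry belong to the leader's current term; the parameter name match_index mirrors the Raft paper's matchIndex.

```python
def commit_index(match_index: list[int]) -> int:
    """Highest log index replicated on a majority of servers.

    match_index[i] is the highest log index known to be stored on server i;
    the leader's own value is its last log index.
    (Raft's current-term check is omitted in this sketch.)
    """
    majority = len(match_index) // 2 + 1
    # After sorting in descending order, the value at position majority - 1
    # is stored on at least `majority` servers.
    return sorted(match_index, reverse=True)[majority - 1]


print(commit_index([5, 3, 4]))        # 4: stored on the servers at 5 and 4
print(commit_index([7, 7, 2, 2, 5]))  # 5: stored on three of five servers
```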
Timestamp Management
[Figure: on each server, every group's log (leader or follower) feeds the merged log. Example log with safe time = 20:
index:      1      2      3      4      5
timestamp:  1      4      6      17     18
command:    x ← 1  y ← 1  y ← x  x ← 2  x ← 5]
• CRaft guarantees monotonically increasing timestamps in each log
• Safe time: indicates how up-to-date a log is
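One way a leader could keep its log's timestamps strictly increasing is to stamp each new entry with its clock reading, bumped past the previous timestamp if the clock has not advanced. The slide states the guarantee but not the mechanism, so the max(now, last + 1) rule below is an assumption, as are the class and method names.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    index: int
    timestamp: int
    command: str


@dataclass
class GroupLog:
    """Hypothetical per-group log whose timestamps strictly increase."""
    entries: list[Entry] = field(default_factory=list)

    def append(self, command: str, now: int) -> Entry:
        last = self.entries[-1].timestamp if self.entries else 0
        # Assumed rule: never reuse or go below the previous timestamp,
        # even if the local clock reading is stale or went backwards.
        ts = max(now, last + 1)
        entry = Entry(index=len(self.entries) + 1, timestamp=ts, command=command)
        self.entries.append(entry)
        return entry


log = GroupLog()
for command, now in [("x ← 1", 1), ("y ← 1", 4), ("y ← x", 4), ("x ← 2", 3)]:
    log.append(command, now)
print([e.timestamp for e in log.entries])  # [1, 4, 5, 6]
```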
Safe Times
• How up-to-date is this log?
[Figure: a log with timestamps …, 1, 4, 6, 17, 18, 23, 25 up to now, with safe time = 20]
• Current entries: timestamps <= safe time
• No entries come in with a timestamp smaller than the safe time
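Because timestamps in a log increase monotonically, the entries at or below a safe time form a prefix of the log and can be located with a binary search. This sketch, with a hypothetical function name, uses the slide's example (timestamps 1, 4, 6, 17, 18, 23, 25 with safe time 20).

```python
from bisect import bisect_right


def covered_by_safe_time(timestamps: list[int], safe_time: int) -> list[int]:
    """Return the prefix of a strictly increasing timestamp list that is
    <= safe_time (the entries the slide calls current)."""
    return timestamps[: bisect_right(timestamps, safe_time)]


log = [1, 4, 6, 17, 18, 23, 25]
print(covered_by_safe_time(log, 20))  # [1, 4, 6, 17, 18]
```

Entries 23 and 25 are already in the log but lie past the safe time, so they fall outside this prefix.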
Merging
[Figure: three logs (entries shown by timestamp) are merged into one log:
Log 1 (ts = 18): 1, 4, 6, 17, 18
Log 2 (ts = 12): 2, 5, 12
Log 3 (ts = 19): 3, 8, 10, 15
Merged log: …, 5, 6, 8, 10, 12]
• Merge up to the smallest safe time (12 in this example)
• CRaft ensures the merged log is in monotonically increasing timestamp order
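The merge rule can be sketched as a k-way merge bounded by the smallest safe time. The sketch assumes each log is a list of strictly increasing timestamps that already contains every entry at or below its safe time, and it breaks timestamp ties by log ID, an assumption not stated on the slide. Run on the slide's example, it ends with the merged tail 5, 6, 8, 10, 12.

```python
import heapq


def merge_logs(logs, safe_times, merged_upto=0):
    """Merge entries with merged_upto < timestamp <= min(safe_times).

    logs[i] is log i's list of strictly increasing timestamps.
    Returns (merged (timestamp, log_id) pairs, new merged_upto).
    """
    # Assumed: every log already holds all its entries at or below this bound.
    bound = min(safe_times)
    streams = [
        [(ts, log_id) for ts in log if merged_upto < ts <= bound]
        for log_id, log in enumerate(logs)
    ]
    # Each stream is already sorted, so heapq.merge yields one sorted sequence.
    return list(heapq.merge(*streams)), bound


logs = [[1, 4, 6, 17, 18], [2, 5, 12], [3, 8, 10, 15]]
merged, upto = merge_logs(logs, safe_times=[18, 12, 19])
print([ts for ts, _ in merged], upto)  # [1, 2, 3, 4, 5, 6, 8, 10, 12] 12
```

Entries 15, 17, and 18 stay pending until the smallest safe time advances; a later call would pass merged_upto=12.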
Optimization: Fast Path
[Figure: request pipeline replicate → commit → merge → execute. Normal path: respond after execution. Fast path: respond before execution]
• Fast path: respond to clients early for certain write operations
Evaluation
Experiment Setup
• Implementation
  • Based on HashiCorp Raft, a popular and well-optimized implementation
• Environment
  • CloudLab, single data center
• Workload
  • In-memory key-value store
  • Multiple clients send get or set requests concurrently
Throughput vs. Cluster Size
• CRaft achieves up to ~2x read and ~2.5x write throughput compared to Raft
Latency vs. Throughput
[Figure, left: average latency vs. throughput (3 servers). Right: 99th-percentile latency vs. throughput (3 servers). Both panels mark the performance gain as load increases]
• CRaft improves throughput and latency under high load
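Performance vs. Number of Clients
[Figure: throughput and average latency vs. number of clients, annotated with 2x differences]
• Latency is bounded by the clock difference between servers, so clock precision matters
• NTP precision: ~20 ms; Huygens: ~20 µs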
Conclusion
[Figure: existing systems + synchronized clocks → better performance and/or stronger consistency]
• Accurate clocks enable better performance and/or consistency
Thank you!