Implementing Raft protocol by coroutines and Ktor Framework Andrii Rodionov @AndriiRodionov
About me ● Devoxx Ukraine organizer ● KNight Kyiv co-organizer ● JUG UA Leader ● Kyiv Kotlin User Group Co-leader
Devoxx Ukraine devoxx.org.ua 30% discount Code: DevoxxUAKotlin
Agenda ● Consensus algorithms ● Replicated state machine ● Raft basics ● Raft algorithm building blocks ● Implementation details ● Demo
Consensus algorithms overview
● Allow a collection of machines to work as a coherent group
● Replicated state machines
  ○ Servers compute identical copies of the same state
  ○ Can survive the failures of some of its members
● Play a key role in building reliable distributed systems (ZooKeeper, HDFS, …)
Replicated State Machine
Replicated log ⇒ replicated state machine
● All servers execute the same commands in the same order
  ○ Consensus module ensures proper log replication
● System makes progress as long as any majority of servers are up
● Failure model: fail-stop (not Byzantine), delayed/lost messages
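The core idea above can be sketched in a few lines: if two replicas apply the same command log in the same order, they end up in the same state. This is an illustrative sketch, not code from the talk; the `KeyValueStateMachine` name and the "set key value" command format are assumptions.

```kotlin
// Sketch: two state machines fed the same replicated log converge.
class KeyValueStateMachine {
    private val state = mutableMapOf<String, String>()

    // A command here is a simple "set <key> <value>" string.
    fun apply(command: String) {
        val (op, key, value) = command.split(" ")
        if (op == "set") state[key] = value
    }

    fun snapshot(): Map<String, String> = state.toMap()
}

fun main() {
    val log = listOf("set x 1", "set y 2", "set x 3")
    val a = KeyValueStateMachine()
    val b = KeyValueStateMachine()
    // Same commands, same order => identical replicas.
    log.forEach { a.apply(it); b.apply(it) }
    println(a.snapshot() == b.snapshot()) // prints "true"
}
```

The consensus module's whole job is to make the "same commands, same order" premise hold across machines despite crashes and lost messages.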
Why Raft?
● Paxos has dominated discussion for 25 years
  ○ Hard to understand
  ○ Not complete enough for real implementations
● New consensus algorithm: Raft
  ○ Raft is a consensus algorithm for managing a replicated log
  ○ Diego Ongaro and John Ousterhout, Stanford University
● Raft adoption
  ○ Docker Swarm, Consul, Kudu, RavenDB, etc.
Raft consensus algorithm
● Leader election
  ○ Select one of the servers to act as cluster leader
  ○ Detect crashes, choose new leader
● Log replication
  ○ Leader takes commands from clients, appends them to its log
  ○ Leader replicates its log to other servers (overwriting inconsistencies)
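Underlying both mechanisms is Raft's term rule: every server tracks the current term, and a server that sees a higher term in any RPC steps down to follower. A minimal sketch of that rule (the `RaftState` class and `observeTerm` name are mine, not from the talk):

```kotlin
// The three Raft roles; every server is in exactly one at a time.
enum class Role { FOLLOWER, CANDIDATE, LEADER }

class RaftState(var currentTerm: Long = 0, var role: Role = Role.FOLLOWER) {
    // Called whenever any RPC (vote or append) carries the sender's term.
    // A newer term always wins: even a leader steps down to FOLLOWER.
    fun observeTerm(remoteTerm: Long) {
        if (remoteTerm > currentTerm) {
            currentTerm = remoteTerm
            role = Role.FOLLOWER
        }
    }
}
```

This single rule is what lets the cluster converge on one leader per term after crashes and partitions.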
Server States and RPCs
Log replication
● Client sends command to leader
● Leader appends command to its log
● Leader sends AppendEntries RPCs to all followers
● Once new entry committed:
  ○ Leader executes command in its state machine, returns result to client
  ○ Leader notifies followers of committed entries in subsequent AppendEntries RPCs
  ○ Followers execute committed commands in their state machines
● Crashed/slow followers?
  ○ Leader retries AppendEntries RPCs until they succeed
● Optimal performance in common case:
  ○ One successful RPC to any majority of servers
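The "any majority" step has a neat concrete form: if the leader tracks, per server, the highest log index known to be replicated there (the Raft paper's `matchIndex`), the commit index is the largest index stored on a majority, i.e. the median of those values. A sketch under that assumption (the `commitIndex` helper is illustrative, not the talk's code):

```kotlin
// matchIndex[i] = highest log index replicated on server i (leader included).
// The entry at the median position is replicated on at least a majority,
// so it is safe to commit everything up to that index.
fun commitIndex(matchIndex: List<Long>): Long {
    val sorted = matchIndex.sorted()
    // With n servers, position (n - 1) / 2 from the bottom of the sorted
    // list is covered by at least ceil((n + 1) / 2) servers -- a majority.
    return sorted[(sorted.size - 1) / 2]
}
```

For example, with five servers at indices 5, 3, 7, 3, 9, three of them (a majority) hold index 5, so entries up to 5 are committed.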
Raft API to implement

service Raft {
  rpc Vote (RequestVoteRPC) returns (ResponseVoteRPC);
  rpc Append (RequestAppendEntriesRPC) returns (ResponseAppendEntriesRPC);
}
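On the Kotlin side, the two RPC message pairs might look like the data classes below. These are hypothetical shapes with field names taken from the Raft paper, not the talk's generated gRPC classes:

```kotlin
// RequestVote: a candidate asks for a vote; the voter grants it only if the
// candidate's log is at least as up-to-date as its own.
data class RequestVoteRPC(val term: Long, val candidateId: Int,
                          val lastLogIndex: Long, val lastLogTerm: Long)
data class ResponseVoteRPC(val term: Long, val voteGranted: Boolean)

// AppendEntries: carries new log entries; with an empty entries list it
// doubles as the leader's heartbeat.
data class LogEntry(val term: Long, val command: String)
data class RequestAppendEntriesRPC(val term: Long, val leaderId: Int,
                                   val prevLogIndex: Long, val prevLogTerm: Long,
                                   val entries: List<LogEntry>, val leaderCommit: Long)
data class ResponseAppendEntriesRPC(val term: Long, val success: Boolean)
```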
Raft architecture
[As a Raft-node]
  [As a FOLLOWER]
    - waiting for heartbeats
    - process vote requests
    - process append requests
  [As a CANDIDATE]
    - ask for votes
  [As a LEADER]
    - generate heartbeats
    - replicate log
Raft algorithm building blocks ● RPC ● State transition ● Resettable countdown timer ● Retry operation ● Leader election ● Heartbeat
gRPC Kotlin — coroutine-based gRPC for Kotlin

class RaftServer( ... ) : RaftGrpcKt.RaftImplBase(), CoroutineScope {

    fun vote(request: RequestVoteRPC): Deferred<ResponseVoteRPC> =
        async { … }

    fun append(request: RequestAppendEntriesRPC): Deferred<ResponseAppendEntriesRPC> =
        async { … }
}

https://github.com/rouzwawi/grpc-kotlin
Channel<State> for transition between states

val channel = Channel<State>()

init {
    val waitingForHeartbeat = waitingForHeartbeatFromLeaderTimer()
    launch {
        channel.consumeEach {
            when (it) {
                FOLLOWER -> waitingForHeartbeat.reset()
                CANDIDATE -> leaderElection()
                LEADER -> appendRequestAndLeaderHeartbeat()
            }
        }
    }
}
Resettable Countdown Timer

class ResettableCountdownTimer(private val action: suspend () -> Unit) {

    private var timer = startTimer()

    fun reset() {
        timer.cancel()
        timer = startTimer()
    }

    private fun startTimer(): Timer {
        val newTimer = Timer()
        newTimer.schedule(randomDelay()) {
            runBlocking { action() }
        }
        return newTimer
    }
}
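The `randomDelay()` call above is essential: Raft randomizes election timeouts so that followers rarely time out at the same instant and split the vote. A minimal sketch of such a delay, assuming the Raft paper's 150–300 ms range (the `randomElectionTimeoutMs` name and defaults are mine, not the talk's):

```kotlin
import kotlin.random.Random

// Pick a fresh random timeout each time the timer is (re)started.
// Randomization staggers the followers, so usually exactly one of them
// times out first, becomes a candidate, and wins the election cleanly.
fun randomElectionTimeoutMs(base: Long = 150, spread: Long = 150): Long =
    base + Random.nextLong(spread + 1)
```

Each `reset()` should draw a new random value; reusing one fixed delay would reintroduce the split-vote problem.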
retry

suspend fun <T> retry(delay: Long = 5000, block: suspend () -> T): T {
    while (true) {
        try {
            return block()
        } catch (e: Exception) { }
        delay(delay)
    }
}
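The slide's `retry` is a suspend function built on `kotlinx.coroutines.delay`. To illustrate its semantics outside a coroutine context, here is a blocking analogue (the `retryBlocking` name is mine; the structure mirrors the slide): keep calling the block until it returns without throwing, pausing between attempts.

```kotlin
// Blocking sketch of the slide's suspend retry: loop until block() succeeds,
// swallowing exceptions and sleeping between attempts.
fun <T> retryBlocking(delayMs: Long = 10, block: () -> T): T {
    while (true) {
        try {
            return block()
        } catch (e: Exception) {
            // Deliberately ignored: any failure just means "try again".
        }
        Thread.sleep(delayMs)
    }
}
```

This is exactly the behavior Raft needs for `AppendEntries`: the leader retries indefinitely until a crashed or slow follower finally responds.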
Leader election
(diagram: a five-node cluster, servers 1–5)
Leader election

val countDownLatch = CountDownLatch(majority)
Leader election

val countDownLatch = CountDownLatch(majority)
val job = Job()

servers.forEach { srv ->
    launch(parent = job) {
        val responseVote = retry { srv.vote( … ) }
        countDownLatch.countDown()
        ...
    }
}
countDownLatch.await(electionTimeout, TimeUnit.SECONDS)
Leader election (with old coroutines)

val countDownLatch = CountDownLatch(majority)
val job = Job()

servers.forEach { srv ->
    launch(parent = job) {
        val responseVote = retry { srv.vote( … ) }
        countDownLatch.countDown()
        ...
    }
}
countDownLatch.await(electionTimeout, TimeUnit.SECONDS)
job.cancelAndJoin()
Leader election (with structured concurrency)

val countDownLatch = CountDownLatch(majority)
coroutineScope {
    servers.forEach {
        launch {
            val responseVote = retry { it.vote( … ) }
            countDownLatch.countDown()
            ...
        }
    }
    countDownLatch.await(electionTimeout, TimeUnit.SECONDS)
    coroutineContext.cancelChildren()
}
Heartbeat via fixedRateTimer

fixedRateTimer(period = 2000) {
    runBlocking {
        servers.forEach {
            launch {
                try {
                    val response = it.append( … )
                    ...
                } catch (e: Exception) { }
            }
        }
    }
}
Ktor

private fun ktorServer() {
    val server = embeddedServer(Netty, port = 7000) {
        routing {
            get("/") {
                call.respondText("Server $id log ${entries()}", Text.Plain)
            }
            get("/cmd/{command}") {
                appendCommand(call.parameters["command"])
                call.respondText("Server $id log ${entries()}", Text.Plain)
            }
        }
    }
    server.start(wait = false)
}
Demo
Thank you! Questions? @AndriiRodionov
Literature https://raft.github.io