Tolerating Latency in Replicated State Machines through Client Speculation
April 22, 2009
Benjamin Wester¹, James Cowling², Edmund B. Nightingale³, Peter M. Chen¹, Jason Flinn¹, Barbara Liskov²
¹University of Michigan, ²MIT CSAIL, ³Microsoft Research

Simple Service Configuration
[Diagram: a client sends ++x to a single server; labels: 1, x=1, ++x]
NSDI'09 Benjamin Wester 2 University of Michigan CSE

Replicated State Machines (RSM)
[Diagram: the client's ++x request reaches every replica; each replica goes from x=1 to x=2 and replies 2]
• Agree on request
• All non-faulty replies are identical

RSMs have high latency
[Diagram: the client collects replies of 2 from several replicas]
1. Need many replies
2. Agreement
3. Geographic Distribution

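The first cause listed above, needing many replies, can be sketched as a client-side vote. This is a minimal illustration that assumes the usual Byzantine setting with at most f faulty replicas, where a client accepts a value once f + 1 replicas report it. The function name `accept_reply` and the replica ids are hypothetical and do not come from the talk.

```python
from collections import Counter


def accept_reply(replies, f):
    """Return the value reported by at least f + 1 replicas, else None.

    `replies` maps a replica id to the value it returned. With at most
    f faulty replicas, any f + 1 matching replies include at least one
    reply from a non-faulty replica.
    """
    counts = Counter(replies.values())
    if not counts:
        return None
    value, votes = counts.most_common(1)[0]
    return value if votes >= f + 1 else None


# With f = 1, a single reply is not enough to accept a value.
print(accept_reply({"r0": 2}, f=1))                    # None
# Two matching replies outvote one divergent reply.
print(accept_reply({"r0": 2, "r1": 2, "r2": 7}, f=1))  # 2
```

The client cannot return until enough replies arrive, so the slowest reply it needs sets its latency.
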
Hide the Latency
• Use speculative execution inside RSM
• Speculate before consensus is reached
  – Without faults, any reply predicts consensus value
  – Let client continue after receiving one reply

Overview
• Introduction
• Improving RSMs with speculation
• Application to PBFT
• Performance
• Conclusion

Speculative Execution in RSM
[Diagram: client timeline; labels: Take Checkpoint, Predict: 1, Blocked, Speculate!, Commit, x=1]
• Continue processing while waiting

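The checkpoint, predict, speculate, and commit steps named on this slide can be sketched as a toy client. This is only an illustration under stated assumptions: the class `SpeculativeClient`, its methods, and the dictionary-based state are hypothetical, and a real system would checkpoint process state rather than copy a dictionary.

```python
import copy


class SpeculativeClient:
    """Toy client that continues on the first reply and confirms it later."""

    def __init__(self):
        self.state = {"x": None, "work": []}
        self.checkpoint = None
        self.predicted = None

    def on_first_reply(self, value):
        # Take a checkpoint, then predict that `value` is the consensus result.
        self.checkpoint = copy.deepcopy(self.state)
        self.predicted = value
        self.state["x"] = value

    def do_work(self, item):
        # Work done while speculative; it is lost if the prediction fails.
        self.state["work"].append(item)

    def on_consensus(self, value):
        ok = value == self.predicted
        if not ok:
            # Prediction failed: roll back to the checkpoint, apply the real value.
            self.state = self.checkpoint
            self.state["x"] = value
        # Either way the speculation is resolved and the checkpoint is dropped.
        self.checkpoint = None
        self.predicted = None
        return ok


client = SpeculativeClient()
client.on_first_reply(1)       # speculate on the first reply
client.do_work("uses x=1")     # continue instead of blocking
print(client.on_consensus(1), client.state)
```

When the consensus value matches the prediction, the speculative work is kept. When it does not, the state returns to the checkpoint and only the confirmed value is applied.
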
Critical path: first reply
• Completion latency less relevant
• First reply latency sets critical path
  – Speed
  – Accuracy
• Other desirable properties
  – Throughput
  – Stability under contention
  – Smaller number of replicas

Requests while speculative
[Diagram: the client predicts win? = yes and, while still speculative, sends the buy request]
    while not check_lottery():
        submit_tps()
    buy_corvette()
What do we do with this speculative request?
1. Hold request
   – Bad performance
2. Distributed commit/rollback
   – State tracking complex

Resolve speculations on the replicas
[Diagram: the client predicts win? = yes and sends the predicated request "if win? = yes: buy"]
    while not check_lottery():
        submit_tps()
    buy_corvette()
• Explicitly encode dependencies as predicates
• No special request handling needed
• Replicas need to log past replies
• Local decision at replicas matches client

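A predicated request of the form "if win? = yes: buy" can be sketched as follows. This is a hedged illustration, not the protocol's actual message handling: the `Replica` class, the `(request id, assumed reply)` predicate tuple, and the "aborted" reply are assumptions made for the example. Only `check_lottery` and `buy_corvette` echo names from the slide.

```python
class Replica:
    """Toy replica that logs its replies and checks request predicates."""

    def __init__(self, lottery_result):
        self.lottery_result = lottery_result
        self.reply_log = {}   # request id -> reply this replica sent
        self.purchases = []

    def execute(self, req_id, op, predicate=None):
        # predicate = (earlier request id, reply the client speculated on)
        if predicate is not None:
            dep_id, assumed = predicate
            if self.reply_log.get(dep_id) != assumed:
                # The speculation was wrong: do not run the operation.
                reply = "aborted"
                self.reply_log[req_id] = reply
                return reply
        reply = op(self)
        self.reply_log[req_id] = reply
        return reply


def check_lottery(replica):
    return replica.lottery_result


def buy_corvette(replica):
    replica.purchases.append("corvette")
    return "bought"


for result in ("yes", "no"):
    replica = Replica(result)
    replica.execute("win?", check_lottery)
    # The client speculated on win? = yes before consensus was reached.
    reply = replica.execute("buy", buy_corvette, predicate=("win?", "yes"))
    print(result, reply, replica.purchases)
```

Because each replica evaluates the predicate against its own logged reply, no distributed commit or rollback is needed. The request either runs or is skipped by a local decision.
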
Overview
• Introduction
• Improving RSMs with speculation
• Application to PBFT
• Performance
• Conclusion

Practical BFT-CS [Castro and Liskov 1999]
[Diagram: protocol message flow; labels: client, primary, f=1]

Additional Details
• Tentative execution
  – PBFT/PBFT-CS complete in 4 phases
• Read-only optimization
  – Accurate answer from backup replica
• Failure threshold
  – Bound worst-case failure
• Correctness

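The slide does not say how the failure threshold works. One plausible reading, sketched here purely as an assumption, is that a client tracks how often its speculations fail and stops speculating once that fraction exceeds a configured bound. The class `SpeculationGate`, its method names, and the default threshold value are all hypothetical.

```python
class SpeculationGate:
    """Toy gate: stop speculating when too many predictions have failed."""

    def __init__(self, threshold=0.01):
        self.threshold = threshold   # assumed bound on the failure fraction
        self.attempts = 0
        self.failures = 0

    def record(self, succeeded):
        # Call once per resolved speculation.
        self.attempts += 1
        if not succeeded:
            self.failures += 1

    def should_speculate(self):
        if self.attempts == 0:
            return True
        return self.failures / self.attempts <= self.threshold


gate = SpeculationGate(threshold=0.5)
gate.record(True)
gate.record(False)
print(gate.should_speculate())   # failure fraction is 0.5, still allowed
gate.record(False)
print(gate.should_speculate())   # failure fraction is 2/3, now disabled
```

Under this reading, the cost of wasted speculative work stays bounded even if a faulty replica keeps sending wrong first replies.
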
Overview
• Introduction
• Improving RSMs with speculation
• Application to PBFT
• Performance
• Conclusion

Benchmarks
• Shared counter
  – Simple checkpoint
  – No computation
• NFS: Apache httpd build
  – Complex checkpoint
  – Significant computation

Topology
1. Primary-local
2. Primary-remote
3. Uniform
[Diagram: network layout; labels: Primary, 2.5 or 15 ms]

Base case: no replication
1. Primary-local
2. Primary-remote
3. Uniform
[Diagram: network layout; label: 2.5 or 15 ms]

Shared Counter: Primary-local topology
[Plot: run time (sec, 0 to 120) vs. network delay (ms, 0 to 15); series: PBFT, PBFT-CS, No replication]

Shared Counter: Primary-local topology
[Plot: run time (sec, 0 to 120) vs. network delay (ms, 0 to 15); series: PBFT, PBFT-CS, No replication, Zyzzyva [Kotla et al. 07]]

Shared Counter: Uniform & Primary-remote topology
[Plot: run time (sec, 0 to 120) vs. network delay (ms, 0 to 15); series: PBFT, PBFT-CS, No replication]

Shared Counter: Uniform & Primary-remote topology
[Plot: run time (sec, 0 to 120) vs. network delay (ms, 0 to 15); series: PBFT, PBFT-CS, No replication, Zyzzyva]

NFS: Apache build, Primary-local topology
[Plot: run time (min, 0 to 35) vs. network delay (ms, 0 to 15); series: PBFT, PBFT-CS, No replication]

NFS: Apache build, Uniform topology
[Plot: run time (min, 0 to 35) vs. network delay (ms, 0 to 15); series: PBFT, PBFT-CS, No replication]

NFS: Apache build, Primary-remote topology
[Plot: run time (min, 0 to 35) vs. network delay (ms, 0 to 15); series: PBFT, PBFT-CS, No replication]

NFS: With Failure, Primary-local topology
[Plot: run time (min, 0 to 35) vs. network delay (ms, 0 to 15); series: PBFT, PBFT-CS, No replication, PBFT-CS (1% fail)]

Throughput (Shared Counter): LAN topology
[Plot: throughput (KOps/sec, 0 to 70) vs. number of clients (1 to 100); series: PBFT, PBFT-CS, Zyzzyva]

Conclusion
• Integrate client speculation within RSMs
• Predicated requests: performance without complexity
• Clients less sensitive to latency between replicas
• 5x speedup over non-speculative protocol
Makes WAN deployments more practical