Low-Latency Network-Scalable Byzantine Fault-Tolerant Replication
12th EuroSys Doctoral Workshop (EuroDW 2018)
Ines Messadi, TU Braunschweig, Germany, 2018-04-23

New PhD student (second month) in the distributed systems group
Research area: Resiliency of distributed systems, Byzantine Fault Tolerance
Advisor: Rüdiger Kapitza

Overview
[Figure: clients and four replicas, one of them affected by a Byzantine fault]
3f + 1 nodes to tolerate f faults
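
The 3f + 1 bound can be made concrete with a small calculation. The sketch below is illustrative only: the function names are hypothetical, and the 2f + 1 quorum size is the usual choice in PBFT-style protocols, an assumption rather than something stated on the slide.

```python
def min_replicas(f: int) -> int:
    """Smallest group size that tolerates f Byzantine faults (3f + 1)."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 3 * f + 1


def max_faults(n: int) -> int:
    """Largest f such that n >= 3f + 1."""
    if n < 1:
        raise ValueError("n must be positive")
    return (n - 1) // 3


def quorum_size(f: int) -> int:
    """Assumed PBFT-style quorum of 2f + 1 replicas.

    Two quorums of 2f + 1 out of 3f + 1 replicas overlap in at least
    f + 1 replicas, so the overlap contains at least one correct replica.
    """
    if f < 0:
        raise ValueError("f must be non-negative")
    return 2 * f + 1


if __name__ == "__main__":
    for f in range(1, 4):
        print(f"f={f}: replicas={min_replicas(f)}, quorum={quorum_size(f)}")
```

With four replicas, as in the figure, one Byzantine replica can be tolerated; adding a fifth or sixth replica does not raise that limit, since f = 2 needs seven.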

Overview: Byzantine Agreement
[Figure: Byzantine agreement among clients, a leader, and replicas. Labels: Client, Leader, Replica, Voting (Pre-prepare, Prepare, Commit), Execution]
Problem: Agreement latency overhead and message complexity in BFT
Reason: Multiple communication rounds and slow TCP networking
New trend: Availability of modern hardware technology such as Remote Direct Memory Access (RDMA)
Consequence: A need to redesign current BFT systems
→ How can we build a secure, fast, and scalable RDMA-based BFT system?
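
To illustrate the message-complexity problem, the following sketch counts the replica-to-replica messages of one agreement instance. It assumes the classic PBFT communication pattern (leader multicast in Pre-prepare, all-to-all exchange in Prepare and Commit) and ignores client requests and replies; the function name is hypothetical, and other protocols such as Hybster may use a different pattern.

```python
def pbft_message_counts(n: int) -> dict:
    """Messages per agreement instance for n replicas, assuming PBFT's pattern."""
    if n < 4:
        raise ValueError("PBFT needs at least 4 replicas (3f + 1 with f >= 1)")
    pre_prepare = n - 1            # leader sends to every other replica
    prepare = (n - 1) * (n - 1)    # each non-leader sends to all other replicas
    commit = n * (n - 1)           # every replica sends to all other replicas
    return {
        "pre_prepare": pre_prepare,
        "prepare": prepare,
        "commit": commit,
        "total": pre_prepare + prepare + commit,  # equals 2 * n * (n - 1)
    }


if __name__ == "__main__":
    for n in (4, 7, 10):
        print(n, pbft_message_counts(n))
```

Under these assumptions the total grows quadratically with the number of replicas, so the per-message network cost is paid many times for a single request.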

Remote Direct Memory Access (RDMA)
[Figure: latency (µs, axis ticks 200 to 800) versus payload (KB, axis ticks 1 to 100) for TCP, RDMA Send/Recv, and RDMA Read/Write]
Why RDMA?
  Zero-copy data transfer
  Reduced CPU usage for communication
  → Low latency and CPU efficiency
Challenges
  Different communication mechanisms
  Inappropriate design ⇒ unexpectedly bad performance
  Security issues
  → Requires an explicit design of applications
Observation: It is necessary to redesign existing BFT protocols for RDMA

Towards building RDMA-based BFT
Basis BFT protocol: Hybster [Behl et al., EuroSys’17]
  Building an RDMA-tailored BFT protocol
  Investigating RDMA communication tradeoffs
  Countermeasures for the resilient use of RDMA
Preliminary approach
  Build interfaces similar to TCP programming on top of RDMA
  ⇒ Aiming to take full advantage of RDMA
[Figure: two replicas, each with an RDMA-based selector, connected through RDMA channels]
Example applications: Blockchain and coordination services
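
One way to picture the selector and channel pairing from the figure is the in-memory sketch below. It is not the design from the talk and uses no real RDMA library: `LoopbackChannel`, `Selector`, and `Completion` are hypothetical names, and a send simply appends to the peer's completion queue where a real channel would post work requests to a queue pair.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Completion:
    """Event reported by the selector: which channel received which payload."""
    channel_id: int
    payload: bytes


class LoopbackChannel:
    """Hypothetical stand-in for an RDMA channel between two replicas."""

    def __init__(self, channel_id: int):
        self.channel_id = channel_id
        self.peer = None
        self.completions = deque()  # plays the role of a completion queue

    def connect(self, peer: "LoopbackChannel") -> None:
        """Link two channel endpoints in both directions."""
        self.peer = peer
        peer.peer = self

    def send(self, payload: bytes) -> None:
        """Deliver payload to the peer's completion queue (in memory only)."""
        if self.peer is None:
            raise RuntimeError("channel is not connected")
        self.peer.completions.append(Completion(self.peer.channel_id, payload))


class Selector:
    """Selector-style multiplexer that polls registered channels."""

    def __init__(self):
        self._channels = []

    def register(self, channel: LoopbackChannel) -> None:
        self._channels.append(channel)

    def select(self) -> list:
        """Drain and return all pending completions of registered channels."""
        ready = []
        for channel in self._channels:
            while channel.completions:
                ready.append(channel.completions.popleft())
        return ready


if __name__ == "__main__":
    a, b = LoopbackChannel(1), LoopbackChannel(2)
    a.connect(b)
    selector = Selector()
    selector.register(b)
    a.send(b"PREPARE")
    for event in selector.select():
        print(event.channel_id, event.payload)
```

The point of such an interface is that protocol code written against a TCP-style selector loop could stay largely unchanged, while the channel implementation decides which RDMA operations to use underneath.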