Byzantine Fault Tolerant Systems
Stefan Heinz
Advanced Topics in Distributed Computing, Ph. D. Petr Kuznetsov, WS 07/08
Stefan Heinz, WS 07/08
Farsite: Federated, Available, and Reliable Storage for an Incompletely Trusted Environment
Adya et al., 2002
Outline: Motivation / Introduction; System Overview; System Architecture; File System Features; Summary / Conclusion
Farsite – Motivation / Introduction
Starting point: a server-based file system. What are the disadvantages of this architecture?
Farsite's key techniques: BFT, replication, cryptography.
Farsite – Motivation / Introduction
Farsite aims to combine the benefits of a central file server (a shared namespace, location-transparent access, reliable data storage) with the benefits of local desktop file systems (low cost, privacy from nosy sysadmins, resistance to geographically localized faults).
Farsite – Motivation / Introduction
Key design objectives: emulation of a local NTFS file system, scalability, providing the benefits of BFT, and minimal administrative effort.
Farsite System Overview – Design Assumptions
A high-bandwidth, low-latency network. The majority of machines are up for the majority of the time. Machine downtimes are uncorrelated. Permanent machine failures are independent. Each machine performs correctly for its immediate user.
Farsite System Overview – Namespace Roots
Farsite uses a hierarchical directory namespace and supports multiple roots (analogous to the names of file servers). Each root is managed by a set of machines that form a BFT group.
[Figure: example namespace with roots rootA and rootB and subdirectories SubDirA and SubDirB.]
Farsite System Overview – Trust and Certification
A certification authority issues namespace certificates, user certificates, and machine certificates. Each user's private key is encrypted with a symmetric key derived from the user's password and then stored in a globally readable directory. Certificate revocation lists are used to revoke certificates.
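Storing a password-protected private key in a globally readable directory can be sketched as below. This is a minimal illustration, not Farsite's implementation: PBKDF2-HMAC-SHA256, the iteration count, and the SHA-256 counter-mode keystream are assumptions, and the toy cipher has no integrity protection, so it is not suitable for real keys.

```python
import hashlib
import os
from typing import Tuple

ITERATIONS = 100_000  # assumed work factor, not taken from the slides

def derive_key(password: str, salt: bytes) -> bytes:
    """Derive a 32-byte symmetric key from the user's password."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)

def _keystream(key: bytes, length: int) -> bytes:
    # Toy stream cipher (SHA-256 in counter mode); illustration only.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def wrap_private_key(private_key: bytes, password: str) -> Tuple[bytes, bytes]:
    """Return (salt, ciphertext); both may be stored in a world-readable place."""
    salt = os.urandom(16)  # fresh salt gives a fresh key and keystream per wrap
    key = derive_key(password, salt)
    stream = _keystream(key, len(private_key))
    return salt, bytes(a ^ b for a, b in zip(private_key, stream))

def unwrap_private_key(salt: bytes, ciphertext: bytes, password: str) -> bytes:
    """Re-derive the key from the password and undo the XOR."""
    key = derive_key(password, salt)
    stream = _keystream(key, len(ciphertext))
    return bytes(a ^ b for a, b in zip(ciphertext, stream))
```

Only someone who knows the password can re-derive the key, so the wrapped key can be replicated freely like any other file.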
Farsite System Architecture – Basic System
Every machine may perform three roles: client, member of a directory group, and file host. A directory group collectively manages file information using a BFT protocol, which guarantees data consistency as long as fewer than a third of the group's machines misbehave.
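The "fewer than a third" condition translates into a simple sizing rule: a group of n machines tolerates f misbehaving members only if 3f < n, i.e. n ≥ 3f + 1. A minimal sketch (the function names are hypothetical):

```python
def max_faulty(n: int) -> int:
    """Largest f with 3f < n, i.e. fewer than a third of n members misbehave."""
    if n < 1:
        raise ValueError("a group needs at least one member")
    return (n - 1) // 3

def min_group_size(f: int) -> int:
    """Smallest group size that tolerates f Byzantine members (n = 3f + 1)."""
    return 3 * f + 1

print(max_faulty(4), max_faulty(7), min_group_size(2))  # 1 2 7
```

Adding members between these thresholds (e.g. going from 4 to 6) does not raise the number of tolerated faults.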
Farsite System Architecture – Basic System
[Figure: basic system with clients and a directory group; labels: metadata, file data.]
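Farsite System Architecture – Enhancements: Scalability
[Figure: the directory group keeps metadata and file hashes under BFT replication, while file hosts store the file data under raw replication; the same machines can act as file hosts and directory-group members; clients access both.]
How many machines may die until data is lost?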
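Raw replication of file data can work because the BFT directory group holds each file's hash: a client can accept any replica whose hash matches and reject tampered copies. Under that model a file's contents survive as long as one intact replica remains. The sketch below assumes SHA-256 as the hash and models each file host as a hypothetical fetch callback that returns `None` when the host is unreachable.

```python
import hashlib
from typing import Callable, Iterable, Optional

Fetch = Callable[[], Optional[bytes]]  # hypothetical: read the file from one host

def read_verified(expected_hash: bytes, replicas: Iterable[Fetch]) -> bytes:
    """Try raw replicas in turn and return the first copy whose SHA-256
    digest matches the hash held by the directory group."""
    for fetch in replicas:
        data = fetch()  # None models an unavailable file host
        if data is not None and hashlib.sha256(data).digest() == expected_hash:
            return data
    raise IOError("no replica returned data matching the directory group's hash")
```

Usage: with one unavailable host and one corrupted copy, the third replica is still returned.

```python
content = b"quarterly report"
h = hashlib.sha256(content).digest()
print(read_verified(h, [lambda: None, lambda: b"tampered", lambda: content]))  # b'quarterly report'
```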
Farsite System Architecture – Enhancements: Performance
Clients use local caching and file leases. Updates are not pushed to the directory group immediately, because most file writes are deleted or overwritten shortly after they occur.
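Deferring updates pays off because overwritten or deleted data never has to leave the client. A minimal sketch of such a write-back buffer, assuming the client holds a lease while buffering and flushes when the lease is recalled or released; the class name and the push callback (where `None` stands for "deleted") are hypothetical, and lease expiry handling is omitted.

```python
from typing import Callable, Dict, Optional

# Hypothetical push callback: (path, contents); contents=None means "deleted".
PushFn = Callable[[str, Optional[bytes]], None]

class WriteBackCache:
    """Buffers writes locally; only each file's final state is pushed on flush."""

    def __init__(self, push: PushFn) -> None:
        self._push = push
        self._dirty: Dict[str, Optional[bytes]] = {}

    def write(self, path: str, data: bytes) -> None:
        # A later write replaces any earlier buffered contents for the same file.
        self._dirty[path] = data

    def delete(self, path: str) -> None:
        # Record a tombstone; buffered contents are discarded without being sent.
        self._dirty[path] = None

    def flush(self) -> int:
        """Push buffered updates (e.g. on lease recall) and return how many were sent."""
        sent = 0
        for path, data in self._dirty.items():
            self._push(path, data)
            sent += 1
        self._dirty.clear()
        return sent
```

Usage: five local operations collapse into two pushed updates.

```python
log = []
cache = WriteBackCache(lambda p, d: log.append((p, d)))
for i in range(3):
    cache.write("a.tmp", bytes([i]))
cache.delete("a.tmp")
cache.write("b.txt", b"final")
print(cache.flush())  # 2
```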
Farsite System Architecture – Enhancements: Security and Reliability
Security: clients encrypt written file data with the public keys of all authorized readers, and the directory group cryptographically validates users' requests before accepting updates.
Reliability: when a machine is unavailable for an extended period of time, its functions migrate to one or more other machines. Data is lost permanently only if too many machines fail within too short a time window to permit regeneration.
Farsite File System Features – Security
Convergent encryption: generate one-way hashes of the file's blocks, encrypt each block using its hash as the key, encrypt the hashes with a randomly generated file key, and encrypt this file key with the public keys of the authorized readers.
Benefits: encryptions are comparable, e.g. to identify duplicated files; block encryption allows writing individual blocks and reading individual blocks without the need to load the entire file.
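The core of convergent encryption, where each block's key is the hash of its own plaintext, can be sketched as below. The SHA-256 hash, the 4096-byte block size, and the XOR keystream cipher are assumptions standing in for Farsite's actual primitives, and wrapping the per-block keys with the file key and the readers' public keys is not shown.

```python
import hashlib
from typing import List, Tuple

def _keystream(key: bytes, length: int) -> bytes:
    # Toy stream cipher (SHA-256 in counter mode), standing in for a real cipher.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def _xor(data: bytes, key: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))

def convergent_encrypt(block: bytes) -> Tuple[bytes, bytes]:
    """Return (key, ciphertext); the key is the one-way hash of the plaintext block."""
    key = hashlib.sha256(block).digest()
    return key, _xor(block, key)

def convergent_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    return _xor(ciphertext, key)  # XOR with the same keystream inverts encryption

def encrypt_file(data: bytes, block_size: int = 4096) -> List[Tuple[bytes, bytes]]:
    """Encrypt each block independently so single blocks can be read or rewritten."""
    return [convergent_encrypt(data[i:i + block_size])
            for i in range(0, len(data), block_size)]
```

Because the key depends only on the content, two users who store the same block produce the same ciphertext, which is what makes duplicates detectable without decrypting them:

```python
print(convergent_encrypt(b"same")[1] == convergent_encrypt(b"same")[1])  # True
```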
Farsite File System Features – Scalability
Delegation of parts of the namespace.
Hint-based pathname translation: clients cache pathnames and their mappings to directory groups.
Delayed directory-change notification: clients register for a notification when a user lists a directory; the directory group packages the information, signs it, and sends it to the registered clients.
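Hint-based pathname translation can be sketched as a longest-prefix lookup in a client-side cache. The class, the group identifiers, and the invalidation policy are hypothetical: since a hint may be stale, the sketch assumes the client falls back to a full lookup and calls `invalidate` when the group it contacts no longer manages the path.

```python
from typing import Dict, Optional

class PathHintCache:
    """Client-side hints mapping path prefixes to (hypothetical) directory-group IDs."""

    def __init__(self) -> None:
        self._hints: Dict[str, str] = {}

    @staticmethod
    def _norm(prefix: str) -> str:
        return prefix.rstrip("/") or "/"

    def learn(self, prefix: str, group: str) -> None:
        self._hints[self._norm(prefix)] = group

    def invalidate(self, prefix: str) -> None:
        self._hints.pop(self._norm(prefix), None)

    def lookup(self, path: str) -> Optional[str]:
        """Return the group for the longest cached prefix of path, or None."""
        parts = path.strip("/").split("/")
        for i in range(len(parts), -1, -1):
            prefix = "/" + "/".join(parts[:i])
            if prefix in self._hints:
                return self._hints[prefix]
        return None
```

Usage: a delegated subtree gets its own hint, and everything else under the root falls back to the root's group.

```python
hints = PathHintCache()
hints.learn("/rootA", "groupA")
hints.learn("/rootA/SubDirA", "groupA1")
print(hints.lookup("/rootA/SubDirA/f.txt"))  # groupA1
print(hints.lookup("/rootA/SubDirB/g.txt"))  # groupA
```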
Farsite – Summary / Conclusion
Farsite is a scalable, decentralized network file system that uses insecure and unreliable machines as the basis for a virtual file server that is secure and reliable. To achieve this, it builds on many known techniques: replication, BFT, cryptography, certificates, leases, and caching. It also introduces new techniques: convergent encryption and timed Byzantine operations.
[Figure: architecture overview; the directory group holds metadata and hashes under BFT replication, and file hosts hold file data under raw replication.]
Farsite – Summary / Conclusion
Performance measurements: "For our performance evaluation, we configured a five-machine Farsite system [...]. Four machines served as file hosts and as members of a directory group, and one machine served as a client."
Performance conclusion: Farsite performs significantly better than remote file access via CIFS.
Zyzzyva: Speculative Byzantine Fault Tolerance
Kotla et al., 2007
Outline: Motivation / Introduction; Protocol (Agreement Protocol, View Changes); Correctness; Summary / Conclusion
Zyzzyva: Speculative BFT – Motivation / Introduction
Goal of BFT protocols: transform a high-performance service into a high-performance and reliable service.
Zyzzyva: Speculative BFT – Motivation / Introduction
Why another BFT protocol? Many BFT protocols exist, and they perform differently in different situations, e.g. under different workloads. This complexity is a barrier to adopting BFT techniques, because it requires choosing the right protocol for a workload, which then must not deviate from expectations. Zyzzyva's aim is to outperform the other BFT protocols.
[Figure: decision chart "BFT? Yes: Zyzzyva".]
Zyzzyva: Speculative BFT – Motivation / Introduction
One replica is selected as the primary. The primary proposes an order on client requests to the other replicas. Unlike in other protocols, the replicas speculatively execute requests without running an expensive agreement protocol. Replica states may therefore diverge, but clients help to detect and correct inconsistencies: the replicas' replies carry enough history information for clients to determine whether the replies and history are stable and guaranteed to eventually commit.
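How a client decides whether speculative replies are stable can be sketched from the client rule in the Zyzzyva paper for n = 3f + 1 replicas: 3f + 1 matching replies complete the request; 2f + 1 to 3f matching replies lead the client to assemble a commit certificate and send it to the replicas; fewer lead it to retransmit the request. The hypothetical `classify` function below only counts identical reply summaries and ignores signatures/MACs, timers, and the follow-up message exchanges.

```python
from collections import Counter
from enum import Enum
from typing import Hashable, Iterable

class Outcome(Enum):
    COMPLETE = "complete"        # 3f+1 matching speculative replies
    SEND_COMMIT = "send_commit"  # 2f+1..3f matching: assemble a commit certificate
    RETRY = "retry"              # fewer than 2f+1: retransmit to all replicas

def classify(replies: Iterable[Hashable], f: int) -> Outcome:
    """replies: one summary per replica, e.g. (view, seq, history digest, result)."""
    counts = Counter(replies)
    best = max(counts.values(), default=0)  # size of the largest matching set
    if best >= 3 * f + 1:
        return Outcome.COMPLETE
    if best >= 2 * f + 1:
        return Outcome.SEND_COMMIT
    return Outcome.RETRY
```

Usage with f = 1 (four replicas): one divergent reply is enough to push the client off the fast path.

```python
print(classify(["r", "r", "r", "x"], f=1))  # Outcome.SEND_COMMIT
```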
Zyzzyva: Speculative BFT – Motivation / Introduction
[Figure: traditional BFT state machine replication. The client sends a request to the primary; the primary and replicas 1 to 3 run an agreement phase followed by an execution phase, then reply to the client.]
Cost: agreement protocol overhead.
Zyzzyva: Speculative BFT – Motivation / Introduction
[Figure: Zyzzyva speculative BFT replication. The client sends a request to the primary, which forwards it to replicas 1 to 3; the replicas execute speculatively and reply to the client.]
Cost: no explicit replica agreement.
Zyzzyva: Speculative BFT – Protocol: Agreement Protocol
Clients should act only upon replies that correspond to stable requests, i.e. requests executed in a total order that is guaranteed to eventually commit at all correct servers. Such a request has the same sequence number at all correct replicas and is preceded by the same history of requests as observed by the client.
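Zyzzyva: Speculative BFT – Protocol: Agreement Protocol
The client $c$ sends a request to the primary: $m = \langle \mathrm{REQUEST}, o, t, c \rangle_{\sigma_c}$, where $o$ is the requested operation, $t$ the client's timestamp, and $\sigma_c$ the client's signature.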
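Zyzzyva: Speculative BFT – Protocol: Agreement Protocol
The primary receives the request, assigns it a sequence number, and forwards the ordered request $\langle OR, m \rangle$ to the replicas, where
$OR = \langle \mathrm{ORDER\text{-}REQ}, v, n, h_n, d, ND \rangle_{\sigma_p}$, $d = H(m)$, $h_n = H(h_{n-1}, d)$.
Here $v$ is the current view, $n$ the sequence number, $h_n$ the history digest, $ND$ a set of values for nondeterministic application variables, and $\sigma_p$ the primary's signature.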
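The history digest chains each ordered request to every request before it, so two replicas that report the same $h_n$ have (barring hash collisions) executed the same sequence of requests. The sketch below computes $d = H(m)$ and $h_n = H(h_{n-1}, d)$; SHA-256 for $H$, plain concatenation of the fixed-length inputs, and the all-zero initial digest $h_0$ are assumptions, not details from the slides.

```python
import hashlib
from typing import Tuple

def H(*parts: bytes) -> bytes:
    """Stand-in for the protocol's hash: SHA-256 over the concatenated parts."""
    h = hashlib.sha256()
    for p in parts:
        h.update(p)
    return h.digest()

def extend_history(prev_history: bytes, request: bytes) -> Tuple[bytes, bytes]:
    """Return (d, h_n) with d = H(m) and h_n = H(h_{n-1}, d)."""
    d = H(request)
    return d, H(prev_history, d)

h = b"\x00" * 32  # hypothetical initial digest h_0
for m in [b"req-1", b"req-2", b"req-3"]:
    d, h = extend_history(h, m)  # h now summarizes all requests up to m
```

Because both inputs to the second hash have a fixed length of 32 bytes, plain concatenation is unambiguous here.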