Experiences in Building and Operating ePOST, a Reliable Peer-to-Peer Application
Alan Mislove †‡, Ansley Post †‡, Andreas Haeberlen †‡, Peter Druschel †
† Max Planck Institute for Software Systems    ‡ Rice University
Reliable P2P Systems: Myth or Reality?
• For the past few years, much research interest in p2p
  • Highly scalable in nodes and data
  • Utilization of underused resources
  • Robust to a large range of workloads and failures
• Most deployed systems are not reliable [Kazaa, Skype, etc.]
  • None attempt to store data reliably, durably, or securely
  • Led some to conclude p2p can't support reliable applications
• Question: Can peer-to-peer systems provide reliable service?
Demonstration Application: ePOST
• ePOST is an email service built from decentralized components
  • Completely decentralized, no 'email servers'
• Email is one of the most important Internet applications
  • Privacy
  • Integrity
  • Durability
  • Availability
• Wanted to develop the system to a point where people rely on it
ePOST: Deployment
• Built and deployed ePOST within our group
  • Running for over 2 years
  • Processed well over 500,000 email messages
• Built ePOST to be more reliable than existing email systems
  • 16 users used ePOST as their primary email
  • Even my advisor!
• Many challenges found by building the system
  • After these challenges were solved, ePOST provides reliable service
  • Robust: numerous times ePOST was the only mail service working
Rest of Talk
• ePOST in detail
• Challenges faced in building and deploying ePOST
• Conclusion
ePOST: Architecture
[Diagram: participating nodes form an overlay; each node runs local IMAP, POP3, and SMTP servers for its user]
• Each participating node runs mail servers for the local user
  • Email service looks the same to users
• Data stored cooperatively on participating machines
  • Machines form an overlay
  • Replicated for redundancy
• All data encrypted and signed (see the sketch below)
  • Prevents others from reading your email
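To make the "encrypted and signed" bullet concrete, here is a minimal Java sketch of how a node might prepare an email block before handing it to the overlay's replicated storage. The Overlay interface, its put() method, and the algorithm choices are illustrative assumptions for this sketch only, not ePOST's actual PAST/FreePastry API.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.PrivateKey;
import java.security.Signature;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical stand-in for the overlay's replicated storage layer.
interface Overlay {
    void put(byte[] key, byte[] value);  // stored block is replicated by the overlay
}

class SecureStore {
    private final Overlay overlay;
    private final PrivateKey signingKey;  // RSA key held only by the local node (assumption)
    private final SecretKeySpec dataKey;  // symmetric key for message bodies (assumption)

    SecureStore(Overlay overlay, PrivateKey signingKey, SecretKeySpec dataKey) {
        this.overlay = overlay;
        this.signingKey = signingKey;
        this.dataKey = dataKey;
    }

    void storeEmail(byte[] email) throws GeneralSecurityException {
        // 1. Encrypt so that the nodes storing the replicas cannot read the message.
        Cipher cipher = Cipher.getInstance("AES");
        cipher.init(Cipher.ENCRYPT_MODE, dataKey);
        byte[] ciphertext = cipher.doFinal(email);

        // 2. Sign so that tampering by storage nodes is detectable.
        Signature sig = Signature.getInstance("SHA256withRSA");
        sig.initSign(signingKey);
        sig.update(ciphertext);
        byte[] signature = sig.sign();

        // 3. Store the block under the content hash of the ciphertext.
        byte[] key = MessageDigest.getInstance("SHA-256").digest(ciphertext);
        byte[] block = new byte[ciphertext.length + signature.length];
        System.arraycopy(ciphertext, 0, block, 0, ciphertext.length);
        System.arraycopy(signature, 0, block, ciphertext.length, signature.length);
        overlay.put(key, block);
    }
}
```

Encrypting before storage keeps message contents private from the untrusted machines that hold the replicas, while the signature lets any reader detect tampering.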
ePOST: Metadata Storage
[Diagram: a signed log head pointing to a chain of log entries, e.g. "Add Email #4" → "Add Email #3" → "Delete Email #2" → "Mark #2 Read" → "Add Email #2"]
• Folders represented using logs (see the sketch below)
  • Entries represent changes
  • All entries self-authenticating
• Log head points to most recent entry
  • Signed by owner due to mutability
  • Only local node has key material
• All writes performed by owner
  • Maps the multi-access problem to single-writer
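A minimal Java sketch of the log structure just described, using illustrative class names rather than ePOST's actual types: immutable, hash-linked entries that are stored under their content hash (and are therefore self-authenticating), plus a mutable log head that only the folder's owner can re-sign, which is what makes every folder single-writer.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.PrivateKey;
import java.security.Signature;

// One immutable change record, e.g. "Add Email #3" or "Mark #2 Read".
class LogEntry {
    final byte[] previousEntryHash;  // link to the prior entry (null for the first entry)
    final String operation;

    LogEntry(byte[] previousEntryHash, String operation) {
        this.previousEntryHash = previousEntryHash;
        this.operation = operation;
    }

    // Entries are stored under their content hash, so anyone can verify that a
    // fetched block matches the hash it was requested under (self-authentication).
    byte[] contentHash() throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        if (previousEntryHash != null) md.update(previousEntryHash);
        md.update(operation.getBytes(StandardCharsets.UTF_8));
        return md.digest();
    }
}

// The only mutable object: a pointer to the newest entry, signed by the owner.
class LogHead {
    byte[] latestEntryHash;
    byte[] ownerSignature;

    // Only the folder's owner holds the signing key, so all writes go through
    // a single writer even though many nodes store the data.
    void advance(LogEntry newEntry, PrivateKey ownerKey) throws GeneralSecurityException {
        latestEntryHash = newEntry.contentHash();
        Signature sig = Signature.getInstance("SHA256withRSA");
        sig.initSign(ownerKey);
        sig.update(latestEntryHash);
        ownerSignature = sig.sign();
    }
}
```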
Challenges Faced
• Network partitions
• Complex failure modes
• NATs and firewalls
• Very unsynchronized clocks
• Routing anomalies
• Lost key material
• Node churn
• Disconnected nodes
• Correlated failures
• Power failures
• Resource consumption
• Resource exhaustion
• Data storage
• Spam attacks on relays
• Slow nodes
• Java eccentricities
• Hidden single points of failure
• Congested links
• Data corruption
• PlanetLab slice deletion
• Comatose nodes
• ...
Challenge: Network Partitions
• Overlay originally had no special provisions for network partitions
  • Did not envision partitions as a significant problem
• When a network failure occurs, nodes detect others to be dead
  • Multiple overlays re-form
• Network usually fails at access links
  • Generally one large overlay and one small overlay
How frequent are partitions?
[Figure: number of partitions observed over a 90-day period]
• Partitions occur often in PlanetLab
• Usually a single subnet (a PlanetLab site) becomes partitioned
Impact of Network Partitions
• Tradeoff between consistency and availability under partitions
  • Well-known tradeoff
  • ePOST resolves this in favor of availability
• Partitions cause consistency problems
  • Small partitions have data inaccessibility
  • Mutable data can diverge
• Partitions persist unless action is taken
Partitions: Overlay Reintegration
• To reintegrate the overlay:
  • Nodes remember recently deceased nodes
  • Periodically query these nodes, and integrate missing nodes into the overlay
• Protocol is periodic, and therefore stable (see the sketch below)
• Tested on simulated failures as well as on PlanetLab
  • Overlay heals as expected
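A minimal Java sketch of the periodic reintegration protocol, assuming a simple Node interface with a ping() and a rejoin operation (both hypothetical, for illustration only): each node keeps a set of recently deceased peers and probes them on a timer, rejoining through any peer that comes back.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical node interface; the actual overlay operations differ.
interface Node {
    boolean ping();                 // true if the node is reachable again
    void rejoinThrough(Node peer);  // re-run the overlay join protocol via this peer
}

class PartitionHealer {
    private final Set<Node> recentlyDeceased = ConcurrentHashMap.newKeySet();
    private final Node localNode;

    PartitionHealer(Node localNode) {
        this.localNode = localNode;
    }

    // Called whenever the failure detector declares a peer dead.
    void rememberFailure(Node deadNode) {
        recentlyDeceased.add(deadNode);
    }

    // Probe remembered peers on a timer; if one answers, it was probably on the
    // other side of a partition, so reintegrate the two overlays through it.
    void start(ScheduledExecutorService scheduler, long periodSeconds) {
        scheduler.scheduleAtFixedRate(() -> {
            for (Node candidate : recentlyDeceased) {
                if (candidate.ping()) {
                    localNode.rejoinThrough(candidate);
                    recentlyDeceased.remove(candidate);
                }
            }
        }, periodSeconds, periodSeconds, TimeUnit.SECONDS);
    }
}
```

Because the probe is driven by a timer rather than a one-shot event at failure time, a missed or failed probe is simply retried in the next period, which is what makes the protocol stable.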
Partitions: Data Divergence
[Diagram: a folder log that has forked into two divergent branches (entries such as "Add Email #4", "Mark #4 Read", "Delete Folder") above a common prefix ("Add Email #3", "Delete Email #2", "Mark #2 Read", "Add Email #2")]
• ePOST uses a log-based data structure
  • Forked logs must be merged
  • Data divergence unlikely due to single-writer behavior
• To repair logs, merge entries and cancel destructive operations (see the sketch below)
  • Ensures no data loss
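A minimal Java sketch of one way to realize the merge policy above, using plain strings for log entries. This is an illustrative, conservative interpretation (take the union of both forks and cancel destructive operations that appear on only one side), not necessarily the exact rule ePOST applies.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

class LogMerger {

    // Merge two forks of the same folder log into a single entry sequence.
    static List<String> merge(List<String> forkA, List<String> forkB) {
        // 1. Take the union of both forks, preserving the order of forkA's
        //    entries and appending forkB's entries that were not already seen.
        LinkedHashSet<String> merged = new LinkedHashSet<>(forkA);
        merged.addAll(forkB);

        // 2. Conservatively cancel destructive entries that are not present in
        //    both forks: redoing a cancelled delete is cheap, a lost email is not.
        List<String> result = new ArrayList<>();
        for (String entry : merged) {
            boolean destructive = entry.startsWith("Delete");
            boolean inBoth = forkA.contains(entry) && forkB.contains(entry);
            if (!destructive || inBoth) {
                result.add(entry);
            }
        }
        return result;
    }
}
```

Under this policy a "Delete" that happened before the fork (and therefore appears on both sides) survives the merge, while a one-sided "Delete Folder" issued during the partition is cancelled, so no email added on the other side can be lost.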