  1. Experiences in Building and Operating ePOST, a Reliable Peer-to-Peer Application
     Alan Mislove†‡, Ansley Post†‡, Andreas Haeberlen†‡, Peter Druschel†
     † Max Planck Institute for Software Systems   ‡ Rice University
     EuroSys’06 Conference, Leuven, Belgium, 19.04.2006

  2. Reliable P2P Systems: Myth or Reality?
     • For the past few years, much research interest in p2p
     • Highly scalable in nodes and data
     • Utilization of underused resources
     • Robust to a large range of workloads and failures
     • Most deployed systems are not reliable [Kazaa, Skype, etc.]
     • None attempt to store data reliably, durably, or securely
     • Led some to conclude p2p can’t support reliable applications
     • Question: Can peer-to-peer systems provide reliable service?

  3. Demonstration Application: ePOST
     • ePOST is an email service built using decentralized components
     • Completely decentralized, no ‘email servers’
     • Email one of the most important Internet applications
     • Goals: privacy, integrity, durability, availability
     • Wanted to develop the system to a point where people rely on it

  4. ePOST: Deployment
     • Built and deployed ePOST within our group
     • Running for over 2 years
     • Processed well over 500,000 email messages
     • Built ePOST to be more reliable than existing email systems
     • 16 users used ePOST as their primary email (even my advisor!)
     • Many challenges found by building the system
     • After challenges were solved, provides reliable service
     • Robust: numerous times ePOST was the only mail service working

  5. Rest of Talk
     • ePOST in detail
     • Challenges faced in building and deploying ePOST
     • Conclusion

  6. ePOST: Architecture
     • Each participating node runs mail servers (SMTP, IMAP, POP3) for the local user
     • Email service looks the same to users
     • Data stored cooperatively on participating machines (sketched below)
     • Machines form an overlay
     • Replicated for redundancy
     • All data encrypted and signed
     • Prevents others from reading your email
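
A minimal sketch of the data path these bullets imply, using only the standard Java crypto APIs: an email block is encrypted so other participants cannot read it, signed by its owner, and addressed in the overlay by a hash of its ciphertext before being replicated. The class name and the final "store" step are illustrative assumptions, not ePOST's actual API.

    import java.security.KeyPair;
    import java.security.KeyPairGenerator;
    import java.security.MessageDigest;
    import java.security.Signature;
    import javax.crypto.Cipher;
    import javax.crypto.KeyGenerator;
    import javax.crypto.SecretKey;

    // Hypothetical sketch: encrypt an email, sign the ciphertext, and derive
    // the overlay key from a hash of the ciphertext so the block can be
    // replicated on the responsible overlay nodes. Not ePOST's real code.
    public class StoreEmailSketch {

        public static void main(String[] args) throws Exception {
            byte[] email = "From: alice\nTo: bob\n\nHello!".getBytes("UTF-8");

            // Encrypt so that other participants cannot read the message
            SecretKey contentKey = KeyGenerator.getInstance("AES").generateKey();
            Cipher cipher = Cipher.getInstance("AES");
            cipher.init(Cipher.ENCRYPT_MODE, contentKey);
            byte[] ciphertext = cipher.doFinal(email);

            // Sign the ciphertext so readers can verify integrity and origin
            KeyPair owner = KeyPairGenerator.getInstance("RSA").generateKeyPair();
            Signature sig = Signature.getInstance("SHA1withRSA");
            sig.initSign(owner.getPrivate());
            sig.update(ciphertext);
            byte[] signature = sig.sign();

            // The storage key is a hash of the immutable ciphertext; the block
            // would then be inserted and replicated in the overlay under this key.
            byte[] key = MessageDigest.getInstance("SHA-1").digest(ciphertext);
            System.out.printf("store block %s (%d bytes, signature %d bytes)%n",
                    toHex(key), ciphertext.length, signature.length);
        }

        private static String toHex(byte[] bytes) {
            StringBuilder sb = new StringBuilder();
            for (byte b : bytes) sb.append(String.format("%02x", b));
            return sb.toString();
        }
    }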

  8. ePOST: Metadata Storage
     • Folders represented using logs; entries represent changes
       (e.g. Add Email #2, Mark #2 Read, Delete Email #2, Add Email #3)
     • All entries self-authenticating
     • Log head points to the most recent entry
     • Log head signed by owner due to mutability
     • Only the local node has the key material
     • All writes performed by the owner
     • Maps the multi-access problem to a single writer (a rough sketch follows below)
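
A minimal sketch of this log structure, assuming entries are chained by the content hash of their predecessor (which is what makes them self-authenticating) and only the mutable log head carries the owner's signature. The classes below are illustrative, not ePOST's actual classes.

    import java.security.*;
    import java.util.Arrays;

    // Sketch of the log-based folder representation described above.
    public class FolderLogSketch {

        // An immutable entry: an operation plus the hash of its predecessor.
        static final class LogEntry {
            final String operation;       // e.g. "Add Email #3"
            final byte[] previousHash;    // hash of the preceding entry, or null

            LogEntry(String operation, byte[] previousHash) {
                this.operation = operation;
                this.previousHash = previousHash;
            }

            // The entry's storage key is the hash of its own contents, so any
            // node fetching it can verify it was not tampered with.
            byte[] hash() throws Exception {
                MessageDigest md = MessageDigest.getInstance("SHA-1");
                md.update(operation.getBytes("UTF-8"));
                if (previousHash != null) md.update(previousHash);
                return md.digest();
            }
        }

        // The mutable log head: points at the newest entry and carries the
        // owner's signature, since a content hash alone cannot protect mutable data.
        static final class LogHead {
            byte[] newestEntryHash;
            byte[] signature;

            void update(byte[] newest, PrivateKey ownerKey) throws Exception {
                this.newestEntryHash = newest;
                Signature s = Signature.getInstance("SHA1withRSA");
                s.initSign(ownerKey);            // only the local node holds this key
                s.update(newest);
                this.signature = s.sign();
            }
        }

        public static void main(String[] args) throws Exception {
            KeyPair owner = KeyPairGenerator.getInstance("RSA").generateKeyPair();
            LogHead head = new LogHead();

            LogEntry add2 = new LogEntry("Add Email #2", null);
            LogEntry read2 = new LogEntry("Mark #2 Read", add2.hash());
            head.update(read2.hash(), owner.getPrivate());   // the single writer appends

            System.out.println("head -> " + Arrays.toString(head.newestEntryHash));
        }
    }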

  12. Challenges Faced
      • Network partitions
      • Complex failure modes
      • NATs and firewalls
      • Very unsynchronized clocks
      • Routing anomalies
      • Lost key material
      • Node churn
      • Disconnected nodes
      • Correlated failures
      • Power failures
      • Resource consumption
      • Resource exhaustion
      • Data storage
      • Spam attacks on relays
      • Slow nodes
      • Java eccentricities
      • Hidden single points of failure
      • Congested links
      • Data corruption
      • PlanetLab slice deletion
      • Comatose nodes
      • ...

  14. Challenge: Network Partitions
      • The overlay originally had no special provisions for network partitions
      • Did not envision partitions as a significant problem
      • When a network failure occurs, nodes detect others to be dead
      • Multiple overlays re-form
      • The network usually fails at access links
      • Generally one large overlay and one small overlay

  17. How frequent are partitions?
      [Plot: number of partitions over time (days), across roughly 90 days of deployment]
      • Partitions occur often in PlanetLab
      • Usually a single subnet (PlanetLab site) becomes partitioned

  18. Impact of Network Partitions
      • Tradeoff between consistency and availability under partitions
      • Well-known tradeoff; ePOST resolves it in favor of availability (see the sketch below)
      • Partitions cause consistency problems
      • Data in small partitions becomes inaccessible
      • Mutable data can diverge
      • Partitions persist unless action is taken
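
To make the "availability over consistency" choice concrete, the sketch below (hypothetical types, not ePOST's API) returns the newest still-reachable replica of a mutable log head instead of failing when the other replicas sit across the partition; the price is exactly the divergence mentioned above, which the later data-divergence slide repairs.

    import java.util.*;

    // Illustrative sketch only: under a partition, a read of the mutable log
    // head uses whatever replicas are reachable, favoring availability.
    public class AvailableReadSketch {

        static final class Replica {
            final long version;         // version of the signed log head held by this replica
            final boolean reachable;    // whether the replica is on our side of the partition
            Replica(long version, boolean reachable) {
                this.version = version;
                this.reachable = reachable;
            }
        }

        // Return the newest reachable replica, or null if the local partition
        // holds no replica of this log head at all (data inaccessibility).
        static Replica readLogHead(List<Replica> replicas) {
            Replica best = null;
            for (Replica r : replicas) {
                if (r.reachable && (best == null || r.version > best.version)) best = r;
            }
            return best;
        }

        public static void main(String[] args) {
            // Two of the three replicas are on the far side of a partition.
            List<Replica> replicas = Arrays.asList(
                    new Replica(41, false),
                    new Replica(42, false),
                    new Replica(40, true));
            System.out.println("read head version " + readLogHead(replicas).version); // prints 40
        }
    }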

  20. Partitions: Overlay Reintegration
      • To reintegrate the overlay (sketched below):
        • Nodes remember recently deceased nodes
        • Periodically query these nodes, and integrate missing nodes into the overlay
      • Protocol is periodic, and therefore stable
      • Tested on simulated failures as well as PlanetLab
      • Overlay heals as expected
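
A minimal sketch of what such a periodic reintegration loop could look like, assuming each node keeps a set of neighbors it recently declared dead and re-probes them on a timer; probe() and rejoin() are placeholders for the overlay's real liveness check and routing-table repair, not FreePastry calls.

    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    // Hypothetical sketch of the periodic reintegration protocol.
    public class ReintegrationSketch {

        private final Set<String> recentlyDeceased = ConcurrentHashMap.newKeySet();
        private final ScheduledExecutorService timer =
                Executors.newSingleThreadScheduledExecutor();

        void noteDeath(String nodeId) {
            recentlyDeceased.add(nodeId);      // remember nodes we declared dead
        }

        void start() {
            // Periodic and stateless per round, so the protocol stays stable even
            // if individual probes are lost or mis-time a partition.
            timer.scheduleAtFixedRate(this::probeRound, 1, 1, TimeUnit.MINUTES);
        }

        private void probeRound() {
            for (String nodeId : recentlyDeceased) {
                if (probe(nodeId)) {           // a "dead" node answered again
                    rejoin(nodeId);            // pull it back into the routing state
                    recentlyDeceased.remove(nodeId);
                }
            }
        }

        // Placeholders for the overlay's real liveness check and repair calls.
        private boolean probe(String nodeId) { return false; }
        private void rejoin(String nodeId) { }

        public static void main(String[] args) {
            ReintegrationSketch node = new ReintegrationSketch();
            node.noteDeath("node-17");
            node.start();                      // re-probes node-17 once a minute
        }
    }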

  22. Partitions: Data Divergence
      • In ePOST, metadata is a log-based data structure, so forked logs must be merged
      • Data divergence is unlikely due to the single-writer behavior
      • To repair logs, merge entries and cancel destructive operations (sketched below)
      • Ensures no data loss
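
The repair rule in the last two bullets could look roughly like the sketch below, under the assumption that each fork is a list of operation entries built on a common prefix: the diverged suffixes are concatenated, and destructive operations (deletes) in them are cancelled so that nothing referenced by either fork is lost. This is a sketch of the idea under those assumptions, not ePOST's implementation.

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    // Hypothetical sketch of merging two forks of the same folder log.
    public class LogMergeSketch {

        static boolean isDestructive(String op) {
            return op.startsWith("Delete");    // e.g. "Delete Email #2", "Delete Folder"
        }

        // Merge the diverged suffixes of two forks into one linear history,
        // cancelling destructive operations so no data is lost.
        static List<String> merge(List<String> forkA, List<String> forkB) {
            List<String> merged = new ArrayList<>();
            for (String op : forkA) {
                if (!isDestructive(op)) merged.add(op);
            }
            for (String op : forkB) {
                if (!isDestructive(op) && !merged.contains(op)) merged.add(op);
            }
            return merged;
        }

        public static void main(String[] args) {
            List<String> forkA = Arrays.asList("Add Email #3", "Delete Email #2");
            List<String> forkB = Arrays.asList("Add Email #4", "Mark #4 Read", "Delete Folder");
            System.out.println(merge(forkA, forkB));
            // prints [Add Email #3, Add Email #4, Mark #4 Read]
        }
    }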
