Flease - Lease Coordination Without a Lock Server


  1. Flease - Lease Coordination Without a Lock Server
     Björn Kolbeck, Mikael Högqvist, Jan Stender, Felix Hupfeld
     Zuse Institute Berlin, Google Switzerland GmbH

  2. Problem: Data Replication
     – Data replication with strong consistency
     – Apply updates in the same order ~ total order broadcast
         – Fixed Sequencer: Primary/Backup
         – Destination Agreement: (Multi)Paxos

  3. Data Replication: Primary/Backup
     – "Easy" to implement: a single process takes all decisions
     – Widely used: Google GFS, many RDBMS (Oracle, DB2, MySQL)
     – Primary is a SPOF: the primary role must be revoked when the process has failed or is disconnected
     ➔ Leases for primary election
         – Lease: exclusive access for a limited period of time
         – Exclusive access = primary role
         – Timeout = revocation

  4. Outline
     1. Distributed Lease Coordination
     2. The Flease Algorithm
     3. Decentralized Lease Coordination
     4. Evaluation

  5. Distributed Lease Coordination
     – Lease = exclusive access
     – Lease Invariant: At most one valid lease at any point in time.
     – Distributed system
         – Many processes concurrently trying to get a lease
         – All processes must agree on the same lease
     – Distributed Consensus (?)
         – (Multi)Paxos

  6. Distributed Lease Coordination: Agreement
     – Agreement (Consensus): If process p decides v, then all processes will decide v.
     – Agreement (Leases): If process p decides l, then all processes will decide l until l has timed out.
     – Leases have a timeout. We don't care about leases that have timed out.

  7. Deconstructing Paxos: Round Based Register
     – Round-based register: atomic read-modify-write
         – read(version)
         – write(version, new value)
     – Register on each process
     – Majority-based (Quorum Intersection Property)
     [Figure: read and write(X) on the registers of processes 1, 2 and 3]
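
The register on this slide can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the Flease or XtreemFS implementation: the class names `LocalRegister` and `RoundBasedRegister`, the field names, and the exact rejection rules are assumptions chosen for the example, and message passing is replaced by direct method calls.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass
class LocalRegister:
    """Register state held by one process (hypothetical names)."""
    read_version: int = 0    # highest version seen by a read
    write_version: int = 0   # version of the last accepted write
    value: Any = None        # None models the empty register

    def on_read(self, version: int) -> Optional[Tuple[int, Any]]:
        # Reject the read if an equal or newer round was already seen.
        if version <= self.read_version or version <= self.write_version:
            return None
        self.read_version = version
        return (self.write_version, self.value)

    def on_write(self, version: int, value: Any) -> bool:
        # Reject the write if a newer round was already seen.
        if version < self.read_version or version < self.write_version:
            return False
        self.write_version = version
        self.value = value
        return True


class RoundBasedRegister:
    """Majority-based register built on top of the local registers."""

    def __init__(self, replicas: List[LocalRegister]) -> None:
        self.replicas = replicas
        self.majority = len(replicas) // 2 + 1

    def read(self, version: int) -> Tuple[bool, Any]:
        replies = [r for r in (rep.on_read(version) for rep in self.replicas)
                   if r is not None]
        if len(replies) < self.majority:
            return (False, None)  # abort: no majority answered
        # Pick the value with the highest write version among the replies.
        _, value = max(replies, key=lambda reply: reply[0])
        return (True, value)

    def write(self, version: int, value: Any) -> bool:
        acks = sum(1 for rep in self.replicas if rep.on_write(version, value))
        return acks >= self.majority
```

Because any two majorities intersect, a successful read always sees the most recent successful write, which is the quorum intersection property named on the slide.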

  8. Paxos vs. Flease
     – Consensus with RBR
         value = read(version)
         IF value = empty THEN
           value := proposed value
         END IF
         IF write(version, value) THEN
           "decide" value
         END IF
     – Lease Agreement with RBR
         lease = read(version)
         IF lease = empty OR timed_out(lease) THEN
           lease := (me, t_now + t_max)
         END IF
         IF write(version, lease) THEN
           "decide" lease
         END IF
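
The lease agreement pseudocode translates almost line by line into Python. The sketch below is illustrative only: `Lease`, `InMemoryRegister` and `get_lease` are hypothetical names, and the register is a single-copy stand-in for the majority-based register so that the example stays self-contained.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Lease:
    owner: str
    expires: float  # point in time at which the lease times out


class InMemoryRegister:
    """Single-copy stand-in for the majority-based register (assumption)."""

    def __init__(self) -> None:
        self.version = 0
        self.value: Optional[Lease] = None

    def read(self, version: int) -> Tuple[bool, Optional[Lease]]:
        if version <= self.version:
            return (False, None)  # an equal or newer round was already seen
        self.version = version
        return (True, self.value)

    def write(self, version: int, value: Lease) -> bool:
        if version < self.version:
            return False
        self.version = version
        self.value = value
        return True


def get_lease(reg: InMemoryRegister, me: str, version: int,
              now: float, t_max: float) -> Optional[Lease]:
    """One attempt to acquire or learn the lease; None means 'retry'."""
    ok, lease = reg.read(version)
    if not ok:
        return None
    # Empty register or timed-out lease: propose ourselves as owner.
    if lease is None or lease.expires < now:
        lease = Lease(me, now + t_max)
    # Otherwise write back the existing lease unchanged.
    if reg.write(version, lease):
        return lease  # "decide" lease
    return None
```

A process that finds a valid lease held by someone else still completes the write step, so it learns and confirms the current owner instead of displacing it.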

  9. Flease: No persistent state
     – Process crashes: register contents are lost
     – Lease has timed out = empty register
         IF lease = empty OR timed_out(lease) THEN
     – Flease: wait for t_max before recovering
         – Any lease that was in the register has timed out by then
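
The recovery rule can be illustrated with a small guard object. This is a sketch under the slide's assumption of synchronized clocks; `RestartGuard` and its fields are hypothetical names, and how a real process would refuse requests while waiting is left out.

```python
from dataclasses import dataclass


@dataclass
class RestartGuard:
    """Tracks when a restarted process may rejoin (hypothetical helper)."""
    restarted_at: float  # local clock reading at restart
    t_max: float         # maximum lease duration

    def may_participate(self, now: float) -> bool:
        # Any lease granted before the crash has expired by
        # restarted_at + t_max, so the empty register is then equivalent
        # to a register holding a timed-out lease.
        return now - self.restarted_at >= self.t_max
```

For example, a process restarted at time 100 with `t_max = 10` would stay silent until time 110.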

  10. Advantages of Flease
     – Smaller state
         – Multipaxos: one Paxos instance per lease
         – Flease: only a single register
         ▪ easier to implement
     – No disk access
         – (Multi)Paxos: two writes per lease (on all nodes)
         – Flease: no disk writes
         ▪ lower latency
         ▪ throughput limited only by bandwidth of RAM
         ▪ can share a server with I/O-intensive applications

  11. Throughput under heavy IO load
     [Figure: throughput (leases/second) vs. batch size (leases per node) for zookeeper (IOZone), flease (IOZone), zookeeper (alone) and flease (alone)]

  12. Decentralized Lease Coordination
     – No separate lock service
     – Central Lock Service vs. Decentralized Leases
         – No extra service (saves hardware, maintenance)
         – Availability of replicas depends only on replica machines
         – Automatically scales with the system size

  13. Evaluation: Scalability
     – Zookeeper: 3 servers
     – Flease: 3 nodes (2 randomly selected)

  14. Evaluation: Max. number of open files/server
     [Figure: maximum number of open files per server vs. lease timeout (s) for Flease and Zookeeper; 30 nodes, LAN]

  15. Thank You
     – Conclusion: If you need a primary/exclusive access, you can do better without a central lock service
     – Open Source implementation
         – www.xtreemfs.org
         – www.contrail-project.eu
     – The Contrail project is supported by funding under the Seventh Framework Programme of the European Commission: ICT, Internet of Services, Software and Virtualization. GA nr.: FP7-ICT-257438.

  17. Flease: Renewing Leases
     – Modified Lease Invariant: If process p decides l=(p',t), then all processes will decide l'=(p',t') with t' >= t until l has timed out.
         lease = read(version)
         IF lease = empty OR timed_out(lease) OR owner(lease) = me THEN
           lease := (me, t_now + t_max)
         END IF
         IF write(version, lease) THEN
           "decide" lease
         END IF
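
The only change for renewal is the extra `owner(lease) = me` branch, which can be isolated as a pure decision function. This is a sketch with hypothetical names (`Lease`, `choose_lease`); the read and write steps against the register are deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Lease:
    owner: str
    expires: float  # point in time at which the lease times out


def choose_lease(current: Optional[Lease], me: str,
                 now: float, t_max: float) -> Lease:
    """Decide which lease to write back after reading `current`."""
    if current is None or current.expires < now or current.owner == me:
        # Empty, timed out, or already ours: (re)issue the lease to ourselves.
        return Lease(me, now + t_max)
    # A valid lease held by someone else is written back unchanged.
    return current
```

When the owner renews, the new timeout `now + t_max` is never earlier than the old one, since the old lease was issued at an earlier time with the same `t_max`; this matches the `t' >= t` condition of the modified invariant.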

  18. Flease: The other half of the truth.
     – So far: assumed perfectly synchronized clocks
     – Instead: loosely synchronized clocks
         – c(t) < c(t') if t < t'
         – At any time t, for any two processes p, q: | c_p(t) – c_q(t) | < ε
         – ε is a system-wide constant, e.g. 1 sec
         lease = read(version)
         IF lease.t < t_now AND lease.t + ε > t_now THEN
           wait ε
           retry
         END IF
         ...
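
The clock-skew guard can be read as a check for a "grey zone" just after a lease expires. The sketch below assumes that reading: a lease that looks timed out on the local clock, but by less than ε, may still be valid on its owner's clock, so the caller waits ε and retries. The function name `must_wait` and its parameters are hypothetical.

```python
def must_wait(lease_expiry: float, now: float, epsilon: float) -> bool:
    """True if the lease expired locally less than epsilon ago.

    Inside this window another process whose clock runs up to epsilon
    behind may still consider the lease valid, so taking it over is unsafe.
    """
    return lease_expiry < now < lease_expiry + epsilon
```

With `epsilon = 1.0`, a lease that expired at 10.0 blocks takeover at local time 10.5 but not at 12.0; a lease that has not yet expired is handled by the ordinary `timed_out` check instead.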

  19. Throughput vs. Messages

  20. XtreemFS: Flease for file replication
     – One lease per file = one primary per file
         – better load balancing
         – arbitrary replica placement
     – When a file is opened
         – Elect a primary with Flease
         – Execute Replica Reset
         – Read locally, write quorum
