
Minuet – Rethinking Concurrency Control in Storage Area Networks

Minuet Rethinking Concurrency Control in Storage Area Networks FAST 09 Andrey Ermolinskiy (U. C. Berkeley) Daekyeong Moon (U. C. Berkeley) Byung-Gon Chun (Intel Research, Berkeley) Scott Shenker (U. C. Berkeley


  1. Minuet – Rethinking Concurrency Control in Storage Area Networks
     FAST '09
     Andrey Ermolinskiy (U. C. Berkeley), Daekyeong Moon (U. C. Berkeley), Byung-Gon Chun (Intel Research, Berkeley), Scott Shenker (U. C. Berkeley and ICSI)

  2. Storage Area Networks – an Overview
     - Storage Area Networks (SANs) are gaining widespread adoption in data centers.
     - They are an attractive architecture for clustered services and data-intensive clustered applications that require a scalable and highly available storage backend.
     - Examples:
       - Online transaction processing
       - Data mining and business intelligence
       - Digital media production and streaming media delivery

  3. Clustered SAN applications and services
     - One of the main design challenges: ensuring safe and efficient coordination of concurrent access to shared state on disk.
     - Need mechanisms for distributed concurrency control.
     - Traditional techniques for shared-disk applications: distributed locking, leases.

  4. Limitations of distributed locking
     - Distributed locking semantics do not suffice to guarantee correct serialization of disk requests and hence do not ensure application-level data safety.

  5. Data integrity violation: an example
     [Figure: Client 1 (updating resource R) and Client 2 (reading resource R) are connected to the DLM and, through the SAN, to shared resource R, which initially holds X X X X X X X X X X.]

  6. Data integrity violation: an example
     [Figure: requests from Client 1 and Client 2 to the DLM and, through the SAN, to shared resource R.]
     - Client 1 (updating resource R): Lock(R) - OK; Write(R, offset=3, data=Y Y Y Y); CRASH!
     - DLM: Client 1 owns the lock on R while Client 2 is waiting for it; after Client 1 crashes, Client 2 owns the lock on R.
     - Client 2 (reading resource R): Lock(R) - OK; Read(R, offset=0) returns X X X X X; Read(R, offset=5) returns Y Y X X X.
     - Client 1's write reaches shared resource R between Client 2's two reads.

  7. Data integrity violation: an example
     - Both clients obey the locking protocol, but Client 2 observes only partial effects of Client 1's update.
     - Update atomicity is violated.
     [Figure: shared resource R now holds X X X Y Y Y Y X X X, while Client 2 (reading resource R) has read X X X X X Y Y X X X.]
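
The interleaving in this example can be replayed with a small toy model. This is only a sketch: the in-memory `disk` list, the `lock`/`revoke` helpers, and the `in_flight` write are hypothetical stand-ins, and the 4-block write at offset 3 is read off the final disk contents shown on the slide. The point it illustrates is that the disk applies every write it receives, regardless of who currently holds the lock.

```python
# Toy model of the example: a 10-block resource R on a shared disk.
# All names are illustrative assumptions, not part of the original slides.

disk = ["X"] * 10      # shared resource R, initially all "X"
lock_owner = None      # lock state held by the DLM


def lock(client):
    """Grant the lock if it is free (this toy model has no waiting)."""
    global lock_owner
    assert lock_owner is None
    lock_owner = client


def revoke():
    """The DLM reclaims the lock, e.g. after deciding the owner crashed."""
    global lock_owner
    lock_owner = None


def write(offset, data):
    # The disk knows nothing about locks: it applies whatever arrives.
    disk[offset:offset + len(data)] = data


def read(offset, length):
    return "".join(disk[offset:offset + length])


# Client 1 takes the lock and issues a write that is delayed in the SAN.
lock("C1")
in_flight = (3, ["Y"] * 4)

# Client 1 crashes; the DLM revokes the lock and grants it to Client 2.
revoke()
lock("C2")

# Client 2 reads the first half, the delayed write lands, then it reads the rest.
first_half = read(0, 5)
write(*in_flight)
second_half = read(5, 5)

snapshot = first_half + second_half
print(snapshot)        # neither the old nor the new contents of R
```

Client 2's snapshot matches neither the state before the update nor the state after it, even though no two clients ever held the lock at the same time.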

  8. Availability limitations of distributed locking
     - The lock service represents an additional point of failure.
     - DLM failure → loss of lock management state → application downtime.

  9. Availability limitations of distributed locking
     - Standard fault tolerance techniques can be applied to mitigate the effects of DLM failures:
       - State machine replication
       - Dynamic election
     - These techniques necessitate some form of global agreement.
     - Agreement requires an active majority.
       - This makes it difficult to tolerate network-level failures and large-scale node failures.

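  10. Example: a partitioned network
      [Figure: an application cluster of clients C1 to C4 and DLM replicas DLM1 to DLM3, all attached to the SAN. A network partition separates C1, C2, DLM1, and DLM2 from C3, C4, and DLM3.]
      - C3 and C4 stop making progress.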
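
The majority requirement behind this example can be written down in a few lines. The sketch below assumes the partition splits the cluster into {C1, C2, DLM1, DLM2} and {C3, C4, DLM3}; that grouping is inferred from the figure layout, and `has_majority` and `partitions` are hypothetical names.

```python
def has_majority(reachable: int, total: int) -> bool:
    """True if a partition sees a strict majority of the DLM replicas."""
    return reachable > total // 2


# Assumed split of the cluster; three DLM replicas exist in total.
TOTAL_REPLICAS = 3
partitions = {
    "side A": {"clients": ["C1", "C2"], "replicas": ["DLM1", "DLM2"]},
    "side B": {"clients": ["C3", "C4"], "replicas": ["DLM3"]},
}

# Clients on a side without a majority cannot obtain or renew locks.
stalled = [
    client
    for side in partitions.values()
    if not has_majority(len(side["replicas"]), TOTAL_REPLICAS)
    for client in side["clients"]
]
print(stalled)
```

Under these assumptions only the side holding two of the three replicas can run agreement, so the clients on the other side stall even though they can still reach the disks through the SAN.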

  11. Minuet overview
      - Minuet is a new synchronization primitive for shared-disk applications and middleware that seeks to address these limitations.
      - Guarantees safe access to shared state in the face of arbitrary asynchrony:
        - Unbounded network transfer delays
        - Unbounded clock drift rates
      - Improves application availability:
        - Resilience to network partitions and large-scale node failures.

  12. Our approach
      - A “traditional” cluster lock service provides the guarantees of mutual exclusion and focuses on preventing conflicting lock assignments.
      - We focus on ensuring safe ordering of disk requests at target storage devices.
      [Figure: Client 2 (reading resource R) issues Lock(R), Read(R, offset=0), Read(R, offset=5), Unlock(R).]

  13. Session isolation
      [Figure: request timelines of two clients arriving at the owner of R, grouped into shared and exclusive sessions.]
      - C1: Lock(R, Shared); Read1.1(R); Read1.2(R); UpgradeLock(R, Excl); Write1.1(R); Write1.2(R); DowngradeLock(R, Shared); Read1.3(R); Unlock(R)
      - C2: Lock(R, Shared); Read2.1(R); UpgradeLock(R, Excl); Write2.1(R); Write2.2(R); Unlock(R)
      - Session isolation: R.owner must observe the prefixes of all sessions to R in strictly serial order, such that:
        - No two requests in a shared session are interleaved by an exclusive-session request from another client.

  14. Session isolation
      [Figure: same two-client timeline as on the previous slide.]
      - Session isolation: R.owner must observe the prefixes of all sessions to R in strictly serial order, such that:
        - No two requests in an exclusive session are interleaved by a shared- or exclusive-session request from another client.
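
The two interleaving conditions can be turned into a simple trace checker. This is a sketch of one possible reading of the property, not the formal definition: the trace format `(client, session_id, kind)` and the function name `violates_isolation` are assumptions, and it only checks interleaving, not the "prefix" aspect.

```python
def violates_isolation(trace):
    """Check a request trace in the order that R's owner observes it.

    trace: list of (client, session_id, kind) with kind "Shared" or "Excl".
    Returns True if a request from another client lands between two requests
    of a session and either that session or the intruder is exclusive.
    """
    first, last = {}, {}
    for i, (_, sid, _) in enumerate(trace):
        first.setdefault(sid, i)
        last[sid] = i

    for i, (client, sid, kind) in enumerate(trace):
        for other in first:
            if other == sid or not (first[other] < i < last[other]):
                continue
            other_client, _, other_kind = trace[first[other]]
            if other_client == client:
                continue
            # Shared/shared interleaving is allowed; anything involving
            # an exclusive session is not.
            if kind == "Excl" or other_kind == "Excl":
                return True
    return False


ok_trace = [
    ("C1", "s1", "Shared"), ("C2", "s2", "Shared"),
    ("C1", "s1", "Shared"), ("C2", "s2", "Shared"),
]
bad_trace = [
    ("C1", "s1", "Shared"), ("C2", "s2", "Excl"), ("C1", "s1", "Shared"),
]
print(violates_isolation(ok_trace), violates_isolation(bad_trace))
```

Two shared sessions may interleave freely, while an exclusive-session request that lands inside another client's session is flagged.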

  15. Enforcing session isolation
      - Each session to a shared resource is assigned a globally unique session identifier (SID) at the time of lock acquisition.
      - The client annotates its outbound disk commands with its current SID for the respective resource.
      - SAN-attached storage devices are extended with a small application-independent logical component (“guard”), which:
        - Examines the client-supplied session annotations
        - Rejects commands that violate session isolation.

  16. Enforcing session isolation
      [Figure: a client node connected through the SAN to storage devices holding R, each fronted by a guard module.]

  17. Enforcing session isolation
      [Figure: a client node connected through the SAN to the guard module in front of R.]
      Client-side state for R:
          R.clientSID = <T_S, T_X>
          R.curSType = {Excl / Shared / None}

  18. Enforcing session isolation
      Client-side state for R:
          R.clientSID = <T_S, T_X>
          R.curSType = {Excl / Shared / None}
      Establishing a session to resource R:
          Lock(R, Shared / Excl) {
              R.curSType ← Shared / Excl
              R.clientSID ← unique session ID
          }

  19. Enforcing session isolation
      Client-side state for R:
          R.clientSID = <T_S, T_X>
          R.curSType = {Excl / Shared / None}
      Submitting a remote disk command:
          command: READ / WRITE (LUN, Offset, Length, …)
          session annotation: verifySID = <T_S, T_X>, updateSID = <T_S, T_X>
      Initialize the session annotation:
          IF (R.curSType = Excl) {
              updateSID ← R.clientSID
              verifySID ← R.clientSID
          }

  20. Enforcing session isolation
      Client-side state for R:
          R.clientSID = <T_S, T_X>
          R.curSType = {Excl / Shared / None}
      Submitting a remote disk command:
          command: READ / WRITE (LUN, Offset, Length, …)
          session annotation: verifySID = <T_S, T_X>, updateSID = <T_S, T_X>
      Initialize the session annotation:
          IF (R.curSType = Shared) {
              updateSID ← R.clientSID
              verifySID.T_X ← R.clientSID.T_X
              verifySID.T_S ← EMPTY
          }

  21. Enforcing session isolation
      [Figure: the client node sends the disk command together with its session annotation across the SAN to the guard module in front of R.]
      The annotation is initialized as on the previous slide (shared session):
          IF (R.curSType = Shared) {
              updateSID ← R.clientSID
              verifySID.T_X ← R.clientSID.T_X
              verifySID.T_S ← EMPTY
          }
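
The two annotation rules (exclusive and shared sessions) can be sketched as a single function. The `SID` and `Annotation` types and the `annotate` name are hypothetical; `None` stands in for EMPTY, and integer timestamps are an assumption about how T_S and T_X are represented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SID:
    ts: Optional[int]   # T_S component; None models EMPTY
    tx: int             # T_X component


@dataclass(frozen=True)
class Annotation:
    verify_sid: SID
    update_sid: SID


def annotate(cur_stype: str, client_sid: SID) -> Annotation:
    """Build the session annotation for a READ / WRITE command on R."""
    if cur_stype == "Excl":
        # Exclusive session: both components are verified at the guard.
        return Annotation(verify_sid=client_sid, update_sid=client_sid)
    if cur_stype == "Shared":
        # Shared session: only T_X is verified; T_S is left EMPTY.
        return Annotation(
            verify_sid=SID(ts=None, tx=client_sid.tx),
            update_sid=client_sid,
        )
    raise ValueError("no session established for this resource")


excl = annotate("Excl", SID(ts=4, tx=7))
shared = annotate("Shared", SID(ts=4, tx=7))
print(excl)
print(shared)
```

In both cases `updateSID` carries the full client SID; the session type only changes what the guard is asked to verify.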

  22. Enforcing session isolation
      [Figure: client nodes send disk commands with session annotations across the SAN to the guard modules in front of R.]
      Client-side state for R:
          R.clientSID = <T_S, T_X>
          R.curSType = {Excl / Shared / None}

  23. Enforcing session isolation
      [Figure: the annotated disk command arrives at the guard module in front of R.]
      Guard state for R: R.ownerSID = <T_S, T_X>
      Guard logic at the storage controller:
          IF (verifySID.T_X < R.ownerSID.T_X)
              decision ← REJECT
          ELSE IF ((verifySID.T_S ≠ EMPTY) AND (verifySID.T_S < R.ownerSID.T_S))
              decision ← REJECT
          ELSE
              decision ← ACCEPT
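
The guard's accept/reject test translates almost line for line into code. In this sketch the `SID` type and the `decide` function are hypothetical names, `None` models EMPTY, and timestamps are assumed to be integers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SID:
    ts: Optional[int]   # T_S; None models EMPTY
    tx: int             # T_X


def decide(verify_sid: SID, owner_sid: SID) -> str:
    """Return "ACCEPT" or "REJECT" for a command annotated with verify_sid."""
    if verify_sid.tx < owner_sid.tx:
        return "REJECT"
    if verify_sid.ts is not None and verify_sid.ts < owner_sid.ts:
        return "REJECT"
    return "ACCEPT"


owner = SID(ts=9, tx=7)
print(decide(SID(ts=None, tx=7), owner))  # shared session: T_S is not checked
print(decide(SID(ts=4, tx=7), owner))     # exclusive session with a stale T_S
```

A command with an EMPTY `verifySID.T_S` skips the second comparison, so it is rejected only if its T_X is older than the one recorded in `R.ownerSID`.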

  24. Enforcing session isolation
      [Figure: the annotated disk command arrives at the guard module in front of R.]
      Guard state for R: R.ownerSID = <T_S, T_X>
      Guard logic at the storage controller:
          IF (decision = ACCEPT) {
              R.ownerSID.T_S ← MAX(R.ownerSID.T_S, updateSID.T_S)
              R.ownerSID.T_X ← MAX(R.ownerSID.T_X, updateSID.T_X)
              Enqueue and process the command
          } ELSE {
              Status = BADSESSION
              Respond to client with R.ownerSID
              Drop the command
          }

  25. Enforcing session isolation
      [Figure: the guard module accepts the annotated disk command (ACCEPT).]
      Guard state for R: R.ownerSID = <T_S, T_X>
      Guard logic at the storage controller, ACCEPT branch:
          R.ownerSID.T_S ← MAX(R.ownerSID.T_S, updateSID.T_S)
          R.ownerSID.T_X ← MAX(R.ownerSID.T_X, updateSID.T_X)
          Enqueue and process the command

  26. Enforcing session isolation
      [Figure: the guard module rejects the command (REJECT) and returns Status = BADSESSION with R.ownerSID to the client node.]
      Guard state for R: R.ownerSID = <T_S, T_X>
      Guard logic at the storage controller, REJECT branch:
          Status = BADSESSION
          Respond to client with R.ownerSID
          Drop the command
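
Putting the decision and the two branches together gives a small guard model. This is a sketch under assumptions: the `Guard` class, the `"OK"` success status, the zero initial value of `R.ownerSID`, and the example SIDs are all hypothetical; only BADSESSION and the MAX updates come from the slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SID:
    ts: Optional[int]   # T_S; None models EMPTY (only used in verifySID)
    tx: int             # T_X


class Guard:
    """Per-resource guard state and command handling for one resource R."""

    def __init__(self):
        self.owner = SID(ts=0, tx=0)   # R.ownerSID; initial value assumed
        self.queue = []                # accepted commands, in arrival order

    def handle(self, command, verify: SID, update: SID):
        stale_tx = verify.tx < self.owner.tx
        stale_ts = verify.ts is not None and verify.ts < self.owner.ts
        if stale_tx or stale_ts:
            # REJECT: drop the command and report the current ownerSID.
            return "BADSESSION", SID(self.owner.ts, self.owner.tx)
        # ACCEPT: advance ownerSID component-wise, then enqueue.
        self.owner.ts = max(self.owner.ts, update.ts)
        self.owner.tx = max(self.owner.tx, update.tx)
        self.queue.append(command)
        return "OK", None


guard = Guard()
# Client 2 reads in a newer shared session (hypothetical SID <2, 1>).
r1 = guard.handle("read@0", verify=SID(None, 1), update=SID(2, 1))
# Client 1's delayed write from an older exclusive session (SID <1, 1>).
r2 = guard.handle("write@3", verify=SID(1, 1), update=SID(1, 1))
print(r1, r2, guard.queue)
```

With these example SIDs, the delayed write from the older exclusive session is dropped once a command from the newer session has advanced `R.ownerSID`, which is the ordering guarantee that locking alone could not provide in the earlier example.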

  27. Enforcing session isolation
      [Figure: the guard module rejects a command (REJECT) and returns Status = BADSESSION with R.ownerSID to the client node.]
      - Upon command rejection:
        - The storage device responds to the client with a special status code (BADSESSION) and the most recent value of R.ownerSID.
        - The application at the client node:
          - Observes a failed disk request and forced lock revocation.
          - Re-establishes its session to R under a new SID and retries.
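
The client-side reaction to BADSESSION can be sketched as a bounded retry loop. Everything here is illustrative: `submit_with_retry`, the `send` and `reacquire` callables, the retry limit, and the rule `fake_reacquire` uses to pick a new SID are assumptions, since the slides do not say how the new SID is chosen.

```python
def submit_with_retry(send, reacquire, sid, max_attempts=3):
    """Send a command, re-establishing the session after each BADSESSION.

    send(sid) -> (status, owner_sid); reacquire(owner_sid) -> new sid.
    Returns the final status and the SID under which the command succeeded.
    """
    for _ in range(max_attempts):
        status, owner_sid = send(sid)
        if status != "BADSESSION":
            return status, sid
        # The lock was forcibly revoked: open a new session and retry.
        sid = reacquire(owner_sid)
    raise RuntimeError("gave up after repeated BADSESSION responses")


# Hypothetical stand-ins for the storage device and the lock service.
OWNER_SID = (5, 5)   # (T_S, T_X) currently recorded by the guard


def fake_send(sid):
    if sid[0] < OWNER_SID[0] or sid[1] < OWNER_SID[1]:
        return "BADSESSION", OWNER_SID
    return "OK", None


def fake_reacquire(owner_sid):
    # Assumption: the new session is ordered after the reported ownerSID.
    return (owner_sid[0] + 1, owner_sid[1] + 1)


result = submit_with_retry(fake_send, fake_reacquire, sid=(3, 3))
print(result)
```

Returning `R.ownerSID` with the rejection gives the client enough information to tell that its session is stale without first contacting the lock service.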
