

  1. PADS: Policy Architecture for Distributed Storage Systems. Nalini Belaramani, Jiandan Zheng, Amol Nayate, Robert Soulé, Mike Dahlin and Robert Grimm. University of Texas at Austin, Amazon Inc., IBM T.J. Watson, New York University

  2. Lots of data storage systems (1985 to 2005): NFS, AFS, Coda, Deceit, Ficus, Zebra, XFS, Bayou, OceanStore, Pangaea, Farsite, Ivy, Segank, BlueFS, WinFS, Chain Replication, Ceph, Dynamo, Cimbiosys, WheelFS, TierStore, OmniStore, Google File System

  3. Is there a better way to build distributed storage systems?

  4. Microkernel approach: a general mechanism layer (PRACTI mechanisms [*]), with system development reduced to defining policy (System 1 Policy, System 2 Policy, System 3 Policy on top of the common mechanisms). [*] “PRACTI Replication”, Nalini Belaramani, Mike Dahlin, Lei Gao, Amol Nayate, Arun Venkataramani, Praveen Yalagandula, and Jiandan Zheng. NSDI 2006.

  5. Is it really a better way? Challenge: 10 systems, 1K lines each, before you graduate. *Gulp* How about 3?

  6. Yes it is! With PADS: 2 grad students + 4 months = 12 diverse systems, each in 10-100 lines of policy code: Bayou, Bayou*, Coda, Coda*, TierStore, TierStore*, TRIP, TRIP*, Chain Replication, Pangaea, SCS, FCS, spanning partial replication, topology independence, and a range of consistency guarantees.

  7. Outline • PADS approach • Policy – Routing – Blocking • Evaluation

  8. PADS splits policy into two concerns: Routing (Where is data stored? How is information propagated?) and Blocking (What are the consistency requirements? What are the durability requirements?)

  9. Outline • PADS approach • Policy – Routing – Blocking • Evaluation

  10. Routing: data flows among nodes. When and where to send an update? Who to contact on a local read miss? (Coda, Bayou, TierStore, and Chain Replication each answer these questions differently.)

  11. Subscription: the primitive for update flow from a source node to a destination node. Options: • Data set of interest (e.g. /vol1/*) • Notifications (invalidations) in causal order, or updates (bodies) • Logical start time
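These options map onto the arguments of the subscription-establishing actions that appear in the routing rules later in the deck. A minimal sketch, assuming the argument order (evaluation node, source, destination, subscription set, catch-up option) used on slides 15 and 16; the /vol1 paths are only illustrative, and reading "LOG" as log-based catch-up versus "CP" as checkpoint catch-up is an assumption:

      addInvalSub(@Dst, Src, Dst, "/vol1/*", "LOG")
      addBodySub(@Dst, Src, Dst, "/vol1/*")

The first action subscribes the destination node to causally ordered invalidations for everything under /vol1, starting from the chosen logical time; the second subscribes it to the corresponding bodies.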

  12. Event-driven API to set up routing. PADS raises events to the routing policy: Operation block, Write, Delete, Inval arrived, Send body succeeded, Send body failed, Subscription start, Subscription caught-up, Subscription end. The routing policy responds with actions: Add inval sub, Add body sub, Remove inval sub, Remove body sub, Send body, Assign seq, B_action.

  13. Domain-specific language to specify routing. • R/Overlog – a routing language based on Overlog [*] – declarative rules fired by events • Policy written as rules that invoke actions when events are received. [*] “Implementing Declarative Overlays”. Boon Thau Loo, Tyson Condie, Joseph M. Hellerstein, Petros Maniatis, Timothy Roscoe, Ion Stoica. SOSP 2005.

  14. Simple example: on a read operation block, establish a subscription to the server. A read of “/foo” blocks, PADS raises an operation-block event to the routing policy, and the routing policy responds with the action “Add subscription: server to me, /foo”.

  15. Simple example (continued): on a read operation block, establish a subscription to the server. The resulting action, written as an R/Overlog rule:

      addInvalSubscription(@C, S, C, Obj, Catchup) :-    (resulting action)
          operationBlock(@C, Obj, Off, Len, BPoint, _),  (triggering event)
          serverId(@C, S),                               (table lookup)
          BPoint == "ReadNowBlock",                      (condition)
          Catchup := "CP".                               (assignment)
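A natural companion rule would fetch the data itself, not just the invalidation, on the same read miss. This is a sketch rather than part of the original policy; it is modeled on the four-argument addBodySub rules in the P-TierStore listing on the next slide:

      addBodySub(@C, S, C, Obj) :-
          operationBlock(@C, Obj, Off, Len, BPoint, _),
          serverId(@C, S),
          BPoint == "ReadNowBlock".

Together, the two rules subscribe the blocked client to both the invalidation stream and the body of the object it is waiting on, from the server.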

  16. P-TierStore routing policy in R/Overlog, rules grouped by purpose:

      Parent config:
      in0   TRIG readEvent(@X, ObjId) :- EVT initialize(@X), ObjId := "/.parent".
      pp0   TBL parent(@X, P) :- RCV parent(@X, P).

      Publications config:
      pp1   TRIG readAndWatchEvent(@X, ObjId) :- EVT initialize(@X), ObjId := "/.subList".
      pSb0  TBL subscription(@X, SS) :- RCV subscription(@X, SS).

      Subscriptions from parent:
      pSb1  ACT addInvalSub(@X, P, X, SS, CTP) :- RCV subscription(@X, SS), TBL parent(@X, P), CTP := "LOG".
      pSb2  ACT addBodySub(@X, P, X, SS) :- RCV subscription(@X, SS), TBL parent(@X, P).
      f1    ACT addInvalSub(@X, P, X, SS, CTP) :- TRIG subEnd(@X, P, X, SS, _, Type), TBL parent(@X, P), Type == "Inval", CTP := "LOG".
      f2    ACT addBodySub(@X, P, X, SS) :- TRIG subEnd(@X, P, X, SS, _, Type), TBL parent(@X, P), Type == "Body".

      Subscriptions from children:
      cSb1  ACT addInvalSub(@X, C, X, SS, CTP) :- TRIG subStart(@X, X, C, _, Type), TBL parent(@X, P), C != P, Type == "Inval", SS := "/*", CTP := "LOG".
      cSb2  ACT addBodySub(@X, C, X, SS) :- TRIG subStart(@X, X, C, _, Type), TBL parent(@X, P), C != P, Type == "Body", SS := "/*".

      DTN support:
      dtn1  ACT addInvalSub(@X, R, X, SS, CTP) :- EVT relayNodeArrives(@X, R), TBL subscription(@X, SS), CTP := "LOG".
      dtn2  ACT addBodySub(@X, R, X, SS) :- EVT relayNodeArrives(@X, R), TBL subscription(@X, SS).
      dtn3  ACT addInvalSub(@X, X, R, SS, CTP) :- EVT relayNodeArrives(@X, R), SS := "/*", CTP := "LOG".
      dtn4  ACT addBodySub(@X, X, R, SS) :- EVT relayNodeArrives(@X, R), SS := "/*".

      [*] “TierStore: A Distributed Storage System for Challenged Networks”. M. Demmer, B. Du, and E. Brewer. FAST 2008.

  17. Outline • PADS approach • Policy – Routing – Blocking • Evaluation

  18. Blocking policy: is it safe to access local data? Consistency: what version of data can be accessed? Durability: have updates propagated to safe locations? Block until the required semantics are guaranteed.

  19. How to specify blocking policy? Where to block: at data access points (read, write, update). What to specify: a list of conditions at each point. PADS provides 4 built-in conditions over local bookkeeping (Is valid, Is causal, Is sequenced, Max staleness) and 1 extensible condition (R_Msg).

  20. Blocking policy examples. Consistency: read only causal data. Read at block: Is_causal. Durability: block a write until the update reaches the server. Write after block: R_Msg(ackFromServer).
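Together, the two examples above would form a small blocking-policy specification with one condition list per blocking point. A sketch only: the blocking-point names follow the "ReadNowBlock" point used in the routing rule on slide 15 and the "write after block" wording above, and the concrete syntax of such a specification is an assumption here, not taken from the slides:

      ReadNowBlock    : Is_causal
      WriteAfterBlock : R_Msg(ackFromServer)

Reads block until the locally visible data is causally consistent; writes complete only after an acknowledgement message from the server arrives through the R_Msg condition.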

  21. Outline • PADS approach • Policy – Routing – Blocking • Evaluation

  22. Is PADS a better way to build distributed storage systems? • General enough? • Easy to use? • Easy to adapt? • Overheads?

  23. General enough? The case-study systems cover a wide design space. Topology: tree (TierStore), chains (Chain Replication), client/server (SCS, FCS, Coda, TRIP), ad-hoc (Bayou, Pangaea). Replication: partial in some systems, full in others. Consistency: from monotonic reads and causal up to open/close, sequential, and linearizable. The systems also differ in demand caching, cooperative caching, prefetching, callbacks, leases, disconnected operation, and whether they propagate invalidations or updates.

  24. Easy to use? Routing rules and blocking conditions per system:

      System         Routing rules   Blocking conditions
      P-Bayou        9               3
      P-Bayou*       9               3
      P-Chain Repl   75              5
      P-Coda         31              5
      P-Coda*        44              5
      P-FCS          43              6
      P-Pangaea      75              1
      P-TierStore    14              1
      P-TierStore*   29              1
      P-TRIP         6               3
      P-TRIP*        6               3

  25. Easy to adapt? Coda restricts communication to client-server only and cannot take advantage of nearby peers. Cooperative caching was added in 13 rules. [Chart: read latency (ms).]
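The 13 rules themselves are not on the slide. As a hedged sketch of the kind of rule such an adaptation might add (not the actual P-Coda* policy), one rule could redirect a read miss to a nearby peer, reusing the operationBlock event and the rule shape from slide 15; the nearbyPeer table is hypothetical, standing in for whatever peer-tracking state the added rules maintain:

      addBodySub(@C, P, C, Obj) :-
          operationBlock(@C, Obj, Off, Len, BPoint, _),
          nearbyPeer(@C, P),
          BPoint == "ReadNowBlock".

On a read miss the client pulls the body from a nearby peer rather than going to the server, which is where the read-latency improvement would come from.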

  26. Overheads? [Charts: kilobytes transferred and number of updates.]

  27. Read/write performance. [Chart: time (ms).]

  28. Take away lesson

  29. Distributed data storage systems = Routing (update propagation) + Blocking (consistency, durability).

  30. Thank you

  31. Easy to adapt? Bayou mechanisms only support full replication. Adding small-device support took a change to 4 rules. [Chart: kilobytes transferred.]
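The four changed rules are not shown. A plausible sketch of the core change, assuming a hypothetical interestSet table that records the subtree a small device cares about and a hypothetical antiEntropyPartner event naming the node it synchronizes with, would be to replace the full-replication subscription set "/*" with a per-device lookup:

      addInvalSub(@X, P, X, SS, CTP) :-
          antiEntropyPartner(@X, P),
          interestSet(@X, SS),
          CTP := "LOG".

A small device then receives invalidations (and, with a matching addBodySub rule, bodies) only for its subtree instead of the whole data set, which is what reduces the kilobytes transferred in the chart above.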

  32. Real enough?

  33. TierStore • Data storage for developing regions • Publish-subscribe system – every node subscribes to publications • Hierarchical topology – updates flood down the tree – child updates go up the tree to the root [*] “TierStore: A Distributed Storage System for Challenged Networks”. M. Demmer, B. Du, and E. Brewer. FAST 2008.
