Haryadi S. Gunawi, Pallavi Joshi, Peter Alvaro, Joseph M. Hellerstein, and Koushik Sen
Thanh Do, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau
Dhruba Borthakur

- Cloud
  - Thousands of commodity machines
  - “Rare (HW) failures become frequent” [Hamilton]
- Failure recovery
  - “… has to come from the software” [Dean]
  - “… must be a first-class op” [Ramakrishnan et al.]
  - But ... hard to get right

Cloudy with a chance of failure
More in literature:
  - Data loss, whole-system down in Google Chubby [Burrows06]
  - 91 recovery issues found in HDFS over 4 years
  - ...

- Testing is not advanced enough
  - Cloud systems face complex, multiple, diverse failures
- Recovery is under-specified
  - Lots of custom recovery
  - Implementation is complex
- Need two advancements:
  - Exercise complex failure modes
  - Write recovery specifications and test the implementation

[Diagram: FATE (Failure Testing Service) injects failures X1 and X2 into the cloud software; DESTINI (Declarative Testing Specifications) asks: violate specs?]

- FATE
  - Exercise multiple, diverse failures
    - Over 40,000 unique combinations (80 hours)
    - Challenge: combinatorial explosion of multiple failures
  - Pruning strategies for failure exploration
    - An order of magnitude speedup
    - Found the same number of bugs
- DESTINI
  - Facilitate recovery specifications
    - Reliability and availability related
  - Clear and concise (use Datalog, 5 lines/check)
  - Design patterns

- Target 3 cloud systems
  - HDFS (primary target), Cassandra, and ZooKeeper
- HDFS recovery bugs
  - Found 16 new bugs (+6 in newest)
- Problems found
  - Data loss
    - Buggy recovery wipes out all replicas
  - Unavailability
    - Broken rack-aware policy
    - Can’t restart after failures

Outline
- Introduction
- FATE
  - Failure IDs: abstraction for failure exploration
  - Pruning strategies
- DESTINI
- Evaluation
- Conclusion

HadoopFS (HDFS) write protocol
[Diagram: master M, client C, and a pipeline of datanodes 1, 2, 3]
- No failures: Alloc Req, then Setup Stage, then Data Transfer
- Setup Recovery (X1): recreate fresh pipeline (1, 2, 4)
- Data Transfer Recovery (X2): continue on surviving nodes (1, 2)

- Failures
  - Anytime: different stages -> different recovery
  - Anywhere: N2 crashes, and then N3
  - Any type: bad disks, partitioned nodes/racks
- FATE
  - Systematically exercise multiple, diverse failures
  - How? Need to “remember” failures, via failure IDs
[Diagram: M, C, and pipelines of nodes 1, 2, 3 and 1, 2, 3, 4]

Failure IDs (FIDs)
- Abstraction of I/O failures
- Building failure IDs
  - Intercept every I/O
  - Inject possible failures
    - Ex: crash, network partition, disk failure (LSE/corruption)
- Example: Failure ID 2573
  - I/O information:
    - OutputStream.read() in BlockReceiver.java
    - <stack trace>
    - Net I/O from N3 to N2 (Node3 to Node2)
    - Note: “Data Ack”
  - Injected failure: Crash After
- FIDs are written as A, B, C, ...

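A minimal sketch of how a failure ID could be built from the fields on this slide, assuming the ID is simply a hash over the I/O information plus the injected failure type. The class names, field names, and the choice of SHA-1 are all hypothetical; the slide only shows which kinds of information go into an ID, not how it is computed.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class IOPoint:
    """Information about one intercepted I/O (hypothetical field names)."""
    call: str          # e.g. "OutputStream.read()"
    source_file: str   # e.g. "BlockReceiver.java"
    stack: tuple       # stack trace frames, innermost first
    direction: str     # e.g. "Net I/O from N3 to N2"
    note: str          # domain-specific note, e.g. "Data Ack"


@dataclass(frozen=True)
class FailureID:
    """An I/O point paired with the failure injected there."""
    io: IOPoint
    failure: str       # e.g. "Crash After"

    def digest(self) -> str:
        # Assumption: a short hash of all fields serves as the ID.
        key = "|".join([
            self.io.call, self.io.source_file, ">".join(self.io.stack),
            self.io.direction, self.io.note, self.failure,
        ])
        return hashlib.sha1(key.encode("utf-8")).hexdigest()[:8]


io = IOPoint("OutputStream.read()", "BlockReceiver.java",
             ("frame1", "frame2"), "Net I/O from N3 to N2", "Data Ack")
fid_crash = FailureID(io, "Crash After")
fid_disk = FailureID(io, "Disk Failure")
```

The point of the sketch is that the same I/O point with the same failure type always maps to the same ID, so an exploration loop can remember which failures it has already exercised.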
[Diagram: M, C, and nodes 1, 2, 3 with failure IDs A, B, C]
- 1 failure / run
  - Exp #1: A
  - Exp #2: B
  - Exp #3: C
- 2 failures / run
  - AB
  - AC
  - BC

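The enumeration on this slide can be sketched as plain combinations of failure IDs. This is an illustration only: the function name is hypothetical, and it assumes every failure ID can be combined with every other, which the later pruning slides refine.

```python
from itertools import combinations


def experiments(failure_ids, failures_per_run):
    """List every experiment that injects `failures_per_run` distinct failure IDs."""
    return ["".join(combo) for combo in combinations(failure_ids, failures_per_run)]


one_failure = experiments(["A", "B", "C"], 1)   # A, B, C
two_failures = experiments(["A", "B", "C"], 2)  # AB, AC, BC
```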
Outline
- Introduction
- FATE
  - Failure IDs: abstraction of failures
  - Pruning strategies for failure exploration
- DESTINI
- Evaluation
- Conclusion

- Exercised over 40,000 unique combinations of 1, 2, and 3 failures per run
  - 80 hours of testing time!
- New challenge: combinatorial explosion of multiple failures
[Diagram: failure points A and B on each of nodes 1, 2, 3; with 2 failures / run the combinations are A1 A2, A1 B2, B1 A2, B1 B2, ...]

- Properties of multiple failures
  - Pairwise dependent failure IDs
  - Pairwise independent failure IDs
- Goal: exercise distinct recovery behaviors
  - Key: some failures result in similar recovery
  - Result: > 10x faster, and found the same bugs

Pruning pairwise dependent failure IDs
- Failure dependency graph
  - Inject single failures first
  - Record subsequent dependent IDs
    - Ex: X depends on A
  - Brute-force: AX, BX, CX, DX, CY, DY

  FID -> Subseq FIDs
  A   -> X
  B   -> X
  C   -> X, Y
  D   -> X, Y

- Recovery clustering
  - Two clusters: {X} and {X, Y}
- Only exercise distinct clusters
  - Pick a failure ID that triggers a recovery cluster
  - Results: AX, CX, CY

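The dependency table and the clustering step above can be sketched as follows. The function names are hypothetical, and the sketch assumes the recorded set of subsequent failure IDs is what identifies a recovery cluster, as the slide's example suggests.

```python
def brute_force(deps):
    """Every (first failure, subsequent failure) pair in the dependency table."""
    return [first + second for first, seconds in deps.items() for second in seconds]


def prune_by_recovery_cluster(deps):
    """Keep one first-failure representative per distinct set of subsequent IDs."""
    representatives = {}
    for first, seconds in deps.items():
        # First failure seen for a cluster becomes its representative.
        representatives.setdefault(frozenset(seconds), first)
    return [first + second
            for cluster, first in representatives.items()
            for second in sorted(cluster)]


# Dependency table from the slide: FID -> subsequent FIDs.
deps = {"A": ["X"], "B": ["X"], "C": ["X", "Y"], "D": ["X", "Y"]}

all_pairs = brute_force(deps)                   # six experiments
pruned_pairs = prune_by_recovery_cluster(deps)  # AX, CX, CY
```

Here A and B fall into cluster {X} and C and D into cluster {X, Y}, so only A and C are used as the first failure.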
Pruning pairwise independent failure IDs
[Diagram: failure points A and B on each of nodes 1, 2, 3 (A1, A2, A3, B1, B2, B3)]
- Independent combinations
  - Ex: FP = 2, N = 3
  - FP^2 x N(N - 1)
- Symmetric code
  - Just pick two nodes
  - N(N - 1) -> 2
  - FP^2 x 2

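The two counts on this slide can be checked by enumeration. This is a sketch under the assumption that a two-failure combination is an ordered pair of (failure point, node) on two distinct nodes; the function names are hypothetical.

```python
from itertools import permutations, product


def independent_pairs(failure_points, nodes):
    """All ordered two-failure combinations on two distinct nodes."""
    return [
        ((fp1, n1), (fp2, n2))
        for n1, n2 in permutations(nodes, 2)                  # N(N - 1) node orderings
        for fp1, fp2 in product(failure_points, repeat=2)     # FP^2 failure-point pairs
    ]


def symmetric_pairs(failure_points, nodes):
    """If all nodes run the same code, any two nodes stand in for the rest."""
    return independent_pairs(failure_points, nodes[:2])


brute = independent_pairs(["A", "B"], [1, 2, 3])   # FP = 2, N = 3
pruned = symmetric_pairs(["A", "B"], [1, 2, 3])
```

With FP = 2 and N = 3 the brute-force count is 2^2 x 3 x 2 = 24, and the symmetric count is 2^2 x 2 = 8.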
- FP^2 bottleneck
  - Ex: FP = 4 (failure points A, B, C, D on nodes 1 and 2)
  - Real example: FP = 15
- Recovery clustering
  - Cluster A and B if: fail(A) == fail(B)
  - Reduce FP^2 to (FP clustered)^2
  - E.g. 15 FPs to 8 FPs clustered

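A minimal sketch of the clustering rule, assuming fail(A) can be summarized as a recovery signature that is comparable for equality. The signatures below are invented for illustration and do not come from the slides.

```python
def cluster_failure_points(recovery_of):
    """Group failure points whose injected failure triggers the same recovery."""
    clusters = {}
    for fp, recovery in recovery_of.items():
        clusters.setdefault(recovery, []).append(fp)
    return list(clusters.values())


def pair_count(num_failure_points):
    """Number of failure-point pairs to exercise on two fixed nodes (FP^2)."""
    return num_failure_points ** 2


# Hypothetical recovery signatures for the FP = 4 example.
recovery_of = {
    "A": "recreate-pipeline",
    "B": "recreate-pipeline",
    "C": "continue-on-survivors",
    "D": "abort-write",
}

clusters = cluster_failure_points(recovery_of)   # [["A", "B"], ["C"], ["D"]]
before = pair_count(len(recovery_of))            # 4^2
after = pair_count(len(clusters))                # 3^2
```

Applied to the real example on the slide, going from 15 failure points to 8 clusters would shrink the pair count from 15^2 = 225 to 8^2 = 64.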
FATE summary
- Contributions
  - Exercise multiple, diverse failures (via failure IDs)
  - Pruning strategies (> 10x improvement)
- Limitations
  - I/O reordering
  - Inclusion of states in failure IDs
  - More failure modes
    - Transient, slow-down, and data-center partitioning

Outline
- Introduction
- FATE
- DESTINI: Declarative Testing Specifications
- Evaluation
- Conclusion

- Is the system correct under failures?
  - Need to write specifications
  - FATE needs DESTINI
- “[It is] great to document (in a spec) the HDFS write protocol ..., but we shouldn't spend too much time on it, ... a formal spec may be overkill for a protocol we plan to deprecate imminently.”
[Diagram: Specs test the Implementation under failures X1 and X2]

- How to write specifications?
  - Developer friendly (clear, concise, easy)
- Datalog: a declarative relational logic language
  - Easy to express logical relations
  - (just for writing specifications)

- How to write specs?
  - Violations
  - Expectations
  - Facts
- How to write recovery specs?
  - “... recovery is under specified” [Hamilton]
  - Precise failure events
  - Precise check timings
- How to test the implementation?
  - Interpose I/O calls (lightweight)
  - Deduce expectations and facts from I/O events
[Diagram: Specs and Implementation]

“Throw a violation if an expectation is different from the actual behavior”

    violationTable(…) :- expectationTable(…), NOT-IN actualTable(…)

Datalog syntax:
    head() :- predicates(), …
    :-   derivation
    ,    AND

Spec: “Block replicas should exist in surviving nodes”

    incorrectNodes(B, N) :- expectedNodes(B, N), NOT-IN actualNodes(B, N);

[Diagram: M, C, and nodes 1, 2, 3; a failure (X) during Data Transfer; block B stored on the pipeline nodes]

Case 1: actual replicas match the expectation, no violation
  expectedNodes (Block, Node):   (B, Node1), (B, Node2)
  actualNodes (Block, Node):     (B, Node1), (B, Node2)
  incorrectNodes (Block, Node):  (empty)

Case 2: a replica is missing, violation thrown
  expectedNodes (Block, Node):   (B, Node1), (B, Node2)
  actualNodes (Block, Node):     (B, Node1)
  incorrectNodes (Block, Node):  (B, Node2)

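The incorrectNodes rule is a relational difference, which can be sketched in Python with sets of (block, node) tuples. This is an illustration of the rule's meaning, not how DESTINI evaluates Datalog; the function name is hypothetical.

```python
def incorrect_nodes(expected, actual):
    """Rows that are in expectedNodes but NOT-IN actualNodes."""
    return expected - actual


expected = {("B", "Node1"), ("B", "Node2")}

# Replicas exist on both surviving nodes: no violation.
no_violation = incorrect_nodes(expected, {("B", "Node1"), ("B", "Node2")})

# The replica on Node2 is missing: one violation row.
violation = incorrect_nodes(expected, {("B", "Node1")})
```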
- Ex: which nodes should have the blocks?
  - Deduce expectations from I/O events (shown in italics on the slide)
[Diagram: client C calls getBlockPipe(…) on master M, “Give me 3 nodes for B”; M answers [Node1, Node2, Node3]]

  expectedNodes (Block, Node):  (B, Node1), (B, Node2), (B, Node3)

    expectedNodes(B, N) :- getBlockPipe(B, N);

Rules so far:
    #1: incorrectNodes(B, N) :- expectedNodes(B, N), NOT-IN actualNodes(B, N);

DESTINI needs FATE: update expectations when FATE crashes a node

    DEL expectedNodes(B, N) :- expectedNodes(B, N), fateCrashNode(N)

  expectedNodes (Block, Node):  (B, Node1), (B, Node2), (B, Node3)
[Diagram: M, C, and nodes 1, 2, 3; a crash (X) injected by FATE; block B on the surviving nodes]

Rules so far:
    #1: incorrectNodes(B, N) :- expectedNodes(B, N), NOT-IN actualNodes(B, N);
    #2: expectedNodes(B, N) :- getBlockPipe(B, N);

Precise failure events: only delete the expectation if the crash happens in the Data Transfer stage

    DEL expectedNodes(B, N) :- expectedNodes(B, N), fateCrashNode(N),
                               writeStage(B, Stage), Stage == "Data Transfer";

Full rule set (writeStg and "DataTr" abbreviate writeStage and "Data Transfer"):
    #1: incorrectNodes(B, N) :- expectedNodes(B, N), NOT-IN actualNodes(B, N)
    #2: expectedNodes(B, N) :- getBlockPipe(B, N);
    #3: DEL expectedNodes(B, N) :- expectedNodes(B, N), fateCrashNode(N),
                                   writeStg(B, Stage), Stage == "DataTr"
    #4: writeStg(B, "DataTr") :- writeStg(B, "Setup"), nodesCnt(B, Nc), acksCnt(B, Ac), Nc == Ac
    #5: nodesCnt(B, CNT<N>) :- pipeNodes(B, N);
    #6: pipeNodes(B, N) :- getBlockPipe(B, N);
    #7: acksCnt(B, CNT<A>) :- setupAcks(B, P, "OK");
    #8: setupAcks(B, P, A) :- setupAck(B, P, A);

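How the stage-qualified delete rule changes the expectation can be sketched by replaying a list of events. This is a simplified illustration: the event tuples and the function name are hypothetical, and the write stage is given directly as an event instead of being derived from ack counts as in rules #4 to #8.

```python
def expected_nodes(events):
    """Replay events and return the expectedNodes table as (block, node) tuples."""
    expected = set()
    stage = {}  # block -> current write stage
    for event in events:
        kind = event[0]
        if kind == "getBlockPipe":
            # Rule #2: every node in the allocated pipeline is expected to hold the block.
            _, block, nodes = event
            expected |= {(block, node) for node in nodes}
            stage[block] = "Setup"
        elif kind == "writeStage":
            # Simplification: the stage change is supplied as an event.
            _, block, new_stage = event
            stage[block] = new_stage
        elif kind == "fateCrashNode":
            # Rule #3: drop the expectation only for blocks in the Data Transfer stage.
            _, crashed = event
            expected = {
                (block, node) for (block, node) in expected
                if not (node == crashed and stage.get(block) == "Data Transfer")
            }
    return expected


pipeline = ("getBlockPipe", "B", ["Node1", "Node2", "Node3"])
crash_in_transfer = expected_nodes(
    [pipeline, ("writeStage", "B", "Data Transfer"), ("fateCrashNode", "Node3")])
crash_in_setup = expected_nodes([pipeline, ("fateCrashNode", "Node3")])
```

A crash during Data Transfer leaves Node1 and Node2 as the expected holders, while a crash during Setup leaves the expectation at three nodes, matching the slide's point that setup recovery recreates a fresh pipeline.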
- Recovery ≠ invariant
  - If recovery is ongoing, invariants are violated
  - Don’t want false alarms
- Need precise check timings
  - Ex: upon block completion

    #1: incorrectNodes(B, N) :- expectedNodes(B, N), NOT-IN actualNodes(B, N), completeBlock(B);

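The effect of adding completeBlock(B) to rule #1 can be sketched as an extra filter on the set difference. The function name and the sample block names are hypothetical.

```python
def incorrect_nodes_at_completion(expected, actual, complete_blocks):
    """Report missing replicas only for blocks that have completed."""
    return {
        (block, node) for (block, node) in expected - actual
        if block in complete_blocks
    }


expected = {("B1", "Node1"), ("B1", "Node2"), ("B2", "Node1"), ("B2", "Node2")}
actual = {("B1", "Node1"), ("B2", "Node1")}

# B2 is still recovering, so its missing replica is not reported yet.
violations = incorrect_nodes_at_completion(expected, actual, complete_blocks={"B1"})
```

Without the filter, the missing replica of B2 would be flagged while recovery is still in progress, which is the false alarm the slide warns about.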
DESTINI summary
- Support recovery specs
  - Reliability and availability related
  - Clear and concise (use Datalog)
- Design patterns
  - Add detailed specs
  - Write specs from different views (global, client, ...)
  - Incorporate diverse failures (crashes, rack partitions)
  - ... more in the paper

Outline
- Introduction
- FATE
- DESTINI
- Evaluation and conclusion