Department of Computer Science, Institute for Systems Architecture, Systems Engineering Group
Minimizing Latency in Fault-Tolerant Distributed Stream Processing Systems
Andrey Brito (1), Christof Fetzer (1), Pascal Felber (2)
(1) Technische Universität Dresden, Germany
(2) Université de Neuchâtel, Switzerland
ICDCS'09, June 23rd, 2009
Goal
Minimize the cost of logging/checkpointing in event stream processing systems
Contribution: use of a speculation framework based on transactional memory to overlap logging and processing
ICDCS'09, 23.06.09 Minimizing latency in fault-tolerant DSMS Slide 2 of 55
Motivation (1)
• Event stream applications
  – Directed acyclic graph of operators
  – Some operators don't keep state
    • Trivially parallelizable
  – Some do keep state
    • Not trivially parallelizable
  – Sometimes they are order sensitive
    • Need to process events sequentially, maybe even waiting for the order to be restored
Application example
[Figure: Publisher A and Publisher B feed filters (up to Filter n); the filtered events (A0 to A6, B0 to B8) flow through the stateful operators Processor1 and Processor2 (each labeled STATE) to the Output Adapter.]
The animation steps, in order:
• Events based on a non-deterministic decision are emitted ("Events are out!")
• Restore checkpoint.
• Ask upstream node to replay missing ones.
• Processing some events again.
• Events reflect different decisions ("What are you talking about?")
→ Incomplete log of non-deterministic decisions: no repeatability
Motivation (2)
• Fault-tolerant event stream applications
  – Precise recovery
  – Even if order does not matter, repeatability does
  – Non-determinism
    • Input order from different streams
    • Non-determinism in processing (multi-threading, time, random numbers)
  – Log or checkpoint before each output
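A minimal sketch of the "log before each output" rule, assuming a random choice stands in for the non-deterministic decision. `DecisionLog` and `process` are hypothetical names used only for illustration and are not taken from the talk. The point is that a replay which reads decisions back from a complete log reproduces the original outputs.

```python
import random


class DecisionLog:
    """Hypothetical in-memory stand-in for a stable log of decisions."""

    def __init__(self):
        self.entries = []

    def append(self, decision):
        self.entries.append(decision)


def process(events, log, rng=None, replay=False):
    """Produce one output per event; each output depends on a decision.

    Normal mode: draw a random decision and log it before emitting.
    Replay mode: take the decision from the log instead of drawing it.
    """
    outputs = []
    logged = iter(log.entries) if replay else None
    for event in events:
        if replay:
            decision = next(logged)
        else:
            decision = rng.random() < 0.5
            log.append(decision)  # logged before the output is emitted
        outputs.append((event, "left" if decision else "right"))
    return outputs


events = ["A0", "B0", "A1", "B1"]
log = DecisionLog()
first_run = process(events, log, rng=random.Random())
recovered = process(events, log, replay=True)
```

Because every decision was logged before its output left the operator, `recovered` equals `first_run`. If the log were missing the last entries, the replay would have to draw fresh decisions and could contradict outputs that downstream nodes have already seen.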
Logging is expensive

My solution
• Speculate...
• … to parallelize stateful components
• … to not have to wait for events
• … to not have to wait for logging

Outline
• How the speculation works
• Logging algorithm
• Experiments
• Final remarks
How the speculation works
• Base: TinySTM
  – Some extra features added
  – But same basic rule: "it appears to be atomic"
• Goal: track accesses to shared memory
  – Instrumentation
    • Reads and writes are intercepted
• Hold back writes, validate reads until all dependencies satisfied
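The idea of holding back writes and validating reads can be sketched as follows. This is a single-threaded toy model and not TinySTM's API: `VersionedStore` and `Transaction` are hypothetical names, and versioned keys are an assumption chosen to make validation easy to see.

```python
class VersionedStore:
    """Hypothetical shared memory: each key maps to (value, version)."""

    def __init__(self):
        self.data = {}

    def read(self, key):
        return self.data.get(key, (None, 0))


class Transaction:
    def __init__(self, store):
        self.store = store
        self.read_set = {}   # key -> version observed at first read
        self.write_set = {}  # key -> new value, held back until commit

    def read(self, key):
        if key in self.write_set:        # read our own pending write
            return self.write_set[key]
        value, version = self.store.read(key)
        self.read_set.setdefault(key, version)
        return value

    def write(self, key, value):
        self.write_set[key] = value      # intercepted, not yet visible

    def commit(self):
        # Validate: every value we read must still be at the same version.
        for key, seen in self.read_set.items():
            if self.store.read(key)[1] != seen:
                return False             # stale read, caller must retry
        # Publish the held-back writes.
        for key, value in self.write_set.items():
            _, version = self.store.read(key)
            self.store.data[key] = (value, version + 1)
        return True


store = VersionedStore()
t1, t2 = Transaction(store), Transaction(store)
t1.write("count", (t1.read("count") or 0) + 1)
t2.write("count", (t2.read("count") or 0) + 1)
first = t1.commit()    # succeeds, "count" becomes 1
second = t2.commit()   # fails validation: t2 read "count" before t1 committed
```

A real STM also needs locking or atomic operations around commit; that part is omitted here on purpose.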
Speculative execution: parallelization
[Figure: animation over six frames. Processor 1 and Processor 2 take timestamped events (6 to 14) from their queues while a shared counter NEXT is shown, first as NEXT = 9 and then as NEXT = 10.]
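One way to read the NEXT counter in the animation is as the timestamp of the next event that is allowed to commit; that reading is an assumption, not something the slides state. Under that assumption, and assuming consecutive timestamps, the sketch below lets workers process events in parallel and only orders the commit step. `OrderedCommitter` and `worker` are hypothetical names.

```python
import threading


class OrderedCommitter:
    """Commits results strictly in timestamp order (assumed semantics of NEXT)."""

    def __init__(self, first):
        self.next = first
        self.cond = threading.Condition()
        self.committed = []

    def commit(self, timestamp, result):
        with self.cond:
            # Work done so far is speculative; wait until it is our turn.
            while self.next != timestamp:
                self.cond.wait()
            self.committed.append((timestamp, result))
            self.next += 1
            self.cond.notify_all()


def worker(committer, timestamp):
    result = timestamp * timestamp       # stand-in for real event processing
    committer.commit(timestamp, result)


committer = OrderedCommitter(first=9)
threads = [threading.Thread(target=worker, args=(committer, ts))
           for ts in (11, 9, 10)]        # started out of order on purpose
for t in threads:
    t.start()
for t in threads:
    t.join()
commit_order = [ts for ts, _ in committer.committed]
```

The processing of event 11 can overlap with events 9 and 10, but its result becomes visible only after both have committed. Conflict detection and rollback, which a speculation framework would also need, are left out.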
Logging algorithm
• Operator enqueues all events & decisions
• N+1 threads for N disks
  – One groups requests into buffers
  – The others write their buffers to disk
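The N+1 thread layout can be sketched as below. The fixed batch size, the in-memory lists standing in for disks, and the names `grouper`, `writer` and `run_logger` are all assumptions made for illustration; the slides do not say how requests are grouped or how acknowledgements are delivered.

```python
import queue
import threading

STOP = object()  # sentinel used to shut the threads down


def grouper(requests, buffers, batch_size, n_writers):
    """The one extra thread: groups log requests into buffers."""
    batch = []
    while True:
        item = requests.get()
        if item is STOP:
            break
        batch.append(item)
        if len(batch) == batch_size:
            buffers.put(batch)
            batch = []
    if batch:
        buffers.put(batch)               # flush the partial last buffer
    for _ in range(n_writers):
        buffers.put(STOP)


def writer(buffers, disk, acked, lock):
    """One thread per disk: writes whole buffers, then acknowledges them."""
    while True:
        batch = buffers.get()
        if batch is STOP:
            break
        disk.append(list(batch))         # stand-in for one write plus sync
        with lock:
            acked.extend(batch)


def run_logger(items, n_disks=2, batch_size=3):
    requests, buffers = queue.Queue(), queue.Queue()
    disks = [[] for _ in range(n_disks)]
    acked, lock = [], threading.Lock()
    threads = [threading.Thread(target=grouper,
                                args=(requests, buffers, batch_size, n_disks))]
    threads += [threading.Thread(target=writer, args=(buffers, d, acked, lock))
                for d in disks]
    for t in threads:
        t.start()
    for item in items:                   # the operator enqueues its requests
        requests.put(item)
    requests.put(STOP)
    for t in threads:
        t.join()
    return disks, acked


disks, acked = run_logger(list(range(10)))
```

Grouping amortizes the cost of a disk write over several requests, and one writer per disk keeps all disks busy at the same time.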
Logging algorithm
[Figure: animation of an event E passing through an Operator. Labels shown across the frames, in order: "E", "NDDs" (non-deterministic decisions), "E is here waiting.", "update(E)", "E".]
[Figure: the application example again, with the labels "Events based on non-deterministic decision" and "Events are out!", followed by a close-up of Filter 1 to Filter n, Processor1 with its STATE, and a Checkpoint/Logging component, with the interactions numbered 1 to 7.]
Speculative processing + Logging
• From the original node's viewpoint
  – Emit outputs as speculative
  – When logging requests are acknowledged, emit final
• The next downstream node
  – If speculative event modifies some state, keep track
    • Outputs that consider that part of the state are speculative
    • Speculative status is contagious
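A small sketch of the contagious speculative status on a downstream node. It simplifies in two ways that are assumptions rather than claims about the talk: the whole state is tracked as one unit instead of per part, and rollback of a failed speculation is not modeled. `DownstreamOperator` is a hypothetical name.

```python
class DownstreamOperator:
    """Running sum whose outputs inherit the speculative status of its state."""

    def __init__(self):
        self.total = 0
        self.pending = set()  # ids of speculative inputs that touched the state

    def process(self, event_id, value, speculative):
        self.total += value
        if speculative:
            self.pending.add(event_id)
        # Any output computed from tainted state is itself speculative.
        return {"value": self.total, "speculative": bool(self.pending)}

    def finalize(self, event_id):
        """Called when upstream reports that event_id's log write was acknowledged."""
        self.pending.discard(event_id)
        return not self.pending  # True once the state is final again


op = DownstreamOperator()
out1 = op.process("e1", 5, speculative=True)
out2 = op.process("e2", 3, speculative=False)  # final input, tainted state
state_final = op.finalize("e1")
out3 = op.process("e3", 1, speculative=False)
```

Here `out2` is speculative even though its own input was final, because the sum already includes the speculative `e1`. After `e1` is finalized, `out3` is emitted as final.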