
Naiad: A Timely Dataflow System Derek G. Murray Frank McSherry - PowerPoint PPT Presentation



  1. Naiad: A Timely Dataflow System Derek G. Murray Frank McSherry Rebecca Isaacs Michael Isard Paul Barham Martín Abadi MSR Silicon Valley Presented by Jesse Mu (jlm95)

  2. Background: dataflow programming

  3. Batch processing

  4. Batch processing

  5. Batch processing Count most popular hashtags at a given time

  6. Batch processing Count most popular ... hashtags at a given time

  7. Batch processing

  8. Batch processing Must wait for all inputs to be completed (= latency)

  9. Stream processing (asynchronous)

  10. Stream processing (asynchronous) Pick out key words/mentions/relevant topics

  11. Stream processing (asynchronous) Pick out key words/mentions/relevant topics Real-time access

  12. Background: types of data processing systems
      ● Batch processing (e.g. Pregel, CIEL)
        ○ High throughput, aggregate summaries of data
        ○ Waiting for batches introduces latency
      ● Stream processing (e.g. Storm, MillWheel)
        ○ Low-latency, near-realtime access to results
        ○ No synchronization/aggregate computation
      ● Iterative (graph-centric) computation
        ○ e.g. network data, ML

  13. Background: types of data processing systems, annotated: Timely Dataflow aims to be one-size-fits-all across all three.
      ● Batch processing (e.g. Pregel, CIEL)
        ○ High throughput, aggregate summaries of data
        ○ Waiting for batches introduces latency
      ● Stream processing (e.g. Storm, MillWheel)
        ○ Low-latency, near-realtime access to results
        ○ No synchronization/aggregate computation
      ● Iterative (graph-centric) computation
        ○ e.g. network data, ML

  14. Background: types of data processing systems. Timely Dataflow: one-size-fits-all.

  15. Contributions
      1. Timely dataflow, a dataflow computing model which supports batch, stream, and graph-centric iterative processing
         a. Supports common high-level programming interfaces (e.g. LINQ)
      2. Naiad, a high-performance distributed implementation of the model
         a. Faster than state-of-the-art batch/streaming frameworks

  16. Timely Dataflow supports Batch and Stream: an async event-based model. [Diagram: nodes A → B → C.] Nodes are always active. They send and receive messages via SendBy(edge, message, time) and OnRecv(edge, message, time), and request and operate on notifications for batches via NotifyAt(time) and OnNotify(time).

  17. Timely Dataflow supports Batch and Stream, annotated: SendBy(edge, message, time) and OnRecv(edge, message, time) support stream processing; NotifyAt(time) and OnNotify(time) support batch processing.
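The four callbacks above can be sketched as a vertex interface. Below is a minimal Python sketch (Naiad itself is written in C#, so the class and method names here are illustrative, not Naiad's actual API), using node A's even-number filter from the later slides as the example vertex:

```python
# Sketch of the timely-dataflow vertex callbacks from the slides.
# SendBy/OnRecv move messages; NotifyAt/OnNotify handle batch completion.
# All names are illustrative; Naiad's real API is C#.

class Vertex:
    def __init__(self):
        self.requested = set()  # times we asked to be notified about

    def send_by(self, edge, msg, time):
        # A. push (msg, time) along `edge` (modeled here as a callable)
        edge(msg, time)

    def on_recv(self, edge, msg, time):
        # B. react to an incoming message (override per vertex)
        raise NotImplementedError

    def notify_at(self, time):
        # C. request a callback once no more messages at `time` can arrive
        self.requested.add(time)

    def on_notify(self, time):
        # D. called by the system when `time` is complete (override)
        raise NotImplementedError


class EvenFilter(Vertex):
    """Node A from the later slides: forward only even numbers, same time."""

    def __init__(self, a_out):
        super().__init__()
        self.a_out = a_out  # outgoing edge, a callable receiving (msg, time)

    def on_recv(self, edge, msg, time):
        if msg % 2 == 0:
            self.send_by(self.a_out, msg, time)
```

For instance, feeding the time-1 input 9, 3, 2, 5 through an EvenFilter forwards only (2, 1) downstream.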

  18. [Diagram: Input → A → (edge a_out) → B. Node B has two outputs: rt_out (realtime output) and b_out (batched output).]

  19. [Diagram as in slide 18, plus an input table: time 1 → 9, 3, 2, 5, ...; time 2 → 3, 2, 7, 12, ...]

  20. Node A: pass through even numbers only. [Diagram and input as in slide 19.]

  21. Node A: pass through even numbers only. Node B: pass through all numbers; compute the min of each time. [Diagram and input as in slide 19.]

  22. Node A (pass through even numbers only):
        function OnRecv(input_edge, msg, time) {
          if (msg % 2 == 0)
            this.SendBy(a_out, msg, time)
        }

  23. Node B (pass through all numbers; compute the min of each time) keeps per-time state:
        state = {} // times -> running mins

  24. Node B starts its message handler (body filled in on the following slides):
        function OnRecv(input_edge, msg, time) {

  25. Node B first forwards every message immediately:
        function OnRecv(input_edge, msg, time) {
          this.SendBy(rt_out, msg, time)

  26. On the first message for a time, node B records it and requests a notification:
        function OnRecv(input_edge, msg, time) {
          this.SendBy(rt_out, msg, time) // Streaming
          if (time not in state) { // New time
            state[time] = msg
            this.NotifyAt(time)
          }

  27. Node B also tracks the running min:
        function OnRecv(input_edge, msg, time) {
          this.SendBy(rt_out, msg, time) // Streaming
          if (time not in state) { // New time
            state[time] = msg
            this.NotifyAt(time)
          }
          if (msg < state[time]) // New min
            state[time] = msg
        }

  28. When a time completes, node B emits the batched min:
        function OnNotify(time) {
          this.SendBy(batch_out, state[time], time)
        }

  29. The system delivers the notification, telling node B: “you’ve seen all messages for time 1”.
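Slides 22–29 build node B incrementally. Consolidated, and translated from the slides' pseudocode into a runnable Python sketch (the state dict and edge names follow the slides; the driver at the end stands in for the system delivering the time-1 messages and then the completion notification):

```python
# Node B from the slides: stream every message on rt_out immediately,
# and emit the per-time minimum on batch_out once the time completes.

class NodeB:
    def __init__(self):
        self.state = {}          # time -> running min (slide 23)
        self.rt_out = []         # realtime (streaming) output
        self.batch_out = []      # batched output
        self.requested = set()   # times with a pending notification request

    def on_recv(self, edge, msg, time):
        self.rt_out.append((msg, time))   # Streaming: forward immediately
        if time not in self.state:        # New time: init state and ask to
            self.state[time] = msg        # be told when this time completes
            self.notify_at(time)
        if msg < self.state[time]:        # New running min
            self.state[time] = msg

    def notify_at(self, time):
        self.requested.add(time)

    def on_notify(self, time):            # all messages for `time` delivered
        self.batch_out.append((self.state[time], time))


# Driver simulating the system: deliver the slide's time-1 input, then
# signal that time 1 is complete.
b = NodeB()
for msg in [9, 3, 2, 5]:
    b.on_recv("a_in", msg, 1)
b.on_notify(1)
```

After the notification, b.batch_out holds the time-1 minimum (2, 1), while b.rt_out already streamed all four messages as they arrived.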

  30. [Diagram and input as in slide 19.]

  31. All messages for time 1 delivered. [Diagram and input as in slide 19.]

  32. But how does the system know that all messages for time 1 have been delivered? [Diagram and input as in slide 19.]

  33. Progress tracking

  34. Progress tracking SendBy(_, _, 1)

  35. Progress tracking NotifyAt(1) SendBy(_, _, 1)

  36. Progress tracking SendBy(_, _, (1, 1)) NotifyAt(1) SendBy(_, _, 1)

  37. Progress tracking SendBy(_, _, (1, 1)) NotifyAt(1) SendBy(_, _, 1) SendBy(_, _, (1, 2)) NotifyAt((1, 2))

  38. Progress tracking SendBy(_, _, (1, 1)) NotifyAt(1) SendBy(_, _, 1) SendBy(_, _, (1, 2)) NotifyAt((1, 2))

  39. Sort by could-result-in order SendBy(_, _, (1, 1)) NotifyAt(1) SendBy(_, _, 1) SendBy(_, _, (1, 2)) NotifyAt((1, 2))

  40. Sort by could-result-in order SendBy(_, _, (1, 1)) NotifyAt(1) SendBy(_, _, 1) SendBy(_, _, (1, 2)) NotifyAt((1, 2))

  41. Sort by could-result-in order SendBy(_, _, (1, 1)) NotifyAt(1) SendBy(_, _, (1, 2)) NotifyAt((1, 2))

  42. Sort by could-result-in order SendBy(_, _, (1, 1)) NotifyAt(1) SendBy(_, _, (1, 2)) NotifyAt((1, 2))

  43. Sort by could-result-in order NotifyAt(1) SendBy(_, _, (1, 2)) NotifyAt((1, 2))

  44. Sort by could-result-in order NotifyAt(1) SendBy(_, _, (1, 2)) NotifyAt((1, 2))

  45. Sort by could-result-in order NotifyAt(1) NotifyAt((1, 2))

  46. Sort by could-result-in order NotifyAt(1) NotifyAt((1, 2))

  47. Sort by could-result-in order: NotifyAt((1, 2)) has no possible predecessors left. Send notification! Remaining: NotifyAt(1)

  48. Sort by could-result-in order NotifyAt(1)

  49. Sort by could-result-in order: NotifyAt(1) now has no possible predecessors. Send notification!

  50. Sort by could-result-in order

  51. Sort by could-result-in order ...a notification can be delivered only when no possible predecessors of a timestamp exist

  52. Sort by could-result-in order ...a notification can be delivered only when no possible predecessors of a timestamp exist (based on timestamps + graph structure)
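One way to make the delivery rule on slides 51–52 concrete is a toy could-result-in predicate over (stage, timestamp) pointstamps. The three-stage pipeline (in → loop → out) and the truncate-and-compare rule on nested loop timestamps are illustrative assumptions for this example; Naiad derives the real relation from the dataflow graph together with the timestamps, as the slide says:

```python
# Toy could-result-in check for the slides' example. Pointstamps are
# (stage, time) pairs; nested loop times are tuples like (1, 2), outer
# times are (1,). The stages and comparison rule are assumptions for
# illustration, not Naiad's actual algorithm.

STAGE_ORDER = {"in": 0, "loop": 1, "out": 2}  # in -> loop -> out pipeline

def could_result_in(a, b):
    """True if an outstanding event at pointstamp a could still cause
    work at pointstamp b (based on timestamps + graph structure)."""
    (stage_a, time_a), (stage_b, time_b) = a, b
    if STAGE_ORDER[stage_a] > STAGE_ORDER[stage_b]:
        return False  # no forward path from a's stage to b's stage
    # compare timestamps at b's nesting depth; an inner-loop event like
    # (1, 2) can still feed outer time (1,) after leaving the loop
    return time_a[:len(time_b)] <= time_b

def can_deliver(notification, outstanding):
    """A notification is deliverable only when no outstanding event is a
    possible predecessor of its pointstamp."""
    return not any(could_result_in(e, notification) for e in outstanding)
```

With an outstanding message at ('loop', (1, 2)), a notification at ('out', (1,)) must wait; a notification at ('loop', (1, 2)) is not blocked by anything at ('out', (1,)). That matches the delivery order in slides 39–49, where NotifyAt((1, 2)) fires before NotifyAt(1).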

  53. Low vs High Level Interfaces
