naiad a timely dataflow system
play

Naiad: A Timely Dataflow System Derek G. Murray Frank McSherry - PowerPoint PPT Presentation

Naiad: A Timely Dataflow System Derek G. Murray Frank McSherry Rebecca Isaacs Michael Isard Paul Barham Martin Abadi Microsoft Research Silicon Valley Presented by Braden Ehrat Batch Stream Graph processing processing processing


  1. Naiad: A Timely Dataflow System Derek G. Murray Frank McSherry Rebecca Isaacs Michael Isard Paul Barham Martin Abadi Microsoft Research Silicon Valley Presented by Braden Ehrat

  2. Batch Stream Graph processing processing processing

  3. Batch Stream Graph processing processing processing ● Hadoop ● Storm ● GraphLab ● Dryad ● MillWheel ● PowerGraph

  4. Batch Stream Graph processing processing processing Timely dataflow with Naiad

  5. Timely dataflow A new computational model for stream processing ● Supports feedback loops ● Stateful vertices with arbitrary data ● Notifications for end of epoch

  6. Low-latency, incremental stream processing < 100ms interactive queries < 1s < 1ms iterations batch updates

  7. Dataflow Stage Connector

  8. Dataflow: parallelism Vertex B C Edge

  9. Messages B.S END B Y (edge, message, time) B C D ✉ C.O N R ECV (edge, message, time) Messages are delivered asynchronously

  10. Notifications C.S END B Y (_, _, time) D.N OTIFY A T (time) B C D ✉ No more messages at time or earlier D.O N R ECV (_, _, time) D.O N N OTIFY (time) Notifications support batching

  11. Progress tracking Epoch t is complete E.N OTIFY A T (t) A B C D E C.O N R ECV (_, _, t) C.S END B Y (_, _, t ʹ ) t ʹ ≥ t

  12. Dataflow: iteration

  13. Progress tracking C.N OTIFY A T (t) B C A E D Problem: C depends on its own output

  14. B.S END B Y (_, _, (1, 7)) C.N OTIFY A T ((1, 6)) C.N OTIFY A T (t) A.S END B Y (_, _, 1) E.N OTIFY A T (?) E.N OTIFY A T (1) B C A E D F Advances timestamp and loop counter D.S END B Y (1, 6) Solution: structured timestamps in loops

  15. Simple API class DistinctCount<S,T> : Vertex<T> { Dictionary<T, Dictionary<S,int>> counts; void OnRecv(Edge e, S msg, T time) { if (!counts.ContainsKey(time)) { counts[time] = new Dictionary<S,int>(); this.NotifyAt(time); } if (!counts[time].ContainsKey(msg)) { counts[time][msg] = 0; this.SendBy(output1, msg, time); } counts[time][msg]++; } void OnNotify(T time) { foreach (var pair in counts[time]) this.SendBy(output2, pair, time); counts.Remove(time); } }

  16. Evaluation All-to-all exchange throughput Naiad exchanges 8-byte records between all processes Shows low, linear overhead

  17. Global barrier (Iteration) latency Evaluates time to achieve global coordination No data was exchanged Effect of micro-straglers seen at 50-60 nodes

  18. Real world calculations Twitter follower graph • 42M nodes • 1.5B Edges • 6GB on disk PageRank on Twitter followers

  19. Real world calculations Vowpal Wabbit: Open- source distributed machine learning Naiad is on-par with specialized implementations

  20. Query Latency Compute connected components and top tweets • 32,000 tweets/s • 10 queries/s Fresh: queries delayed behind updates 1s delay: querying stale but consistent data

  21. Conclusions Timely Dataflow in Naiad achieves: • The performance of specialized frameworks • Generic flexibility Open source: http://github.com/MicrosoftResearchSVC/naiad/

Recommend


More recommend