
Differential Dataflow



  1. Differential Dataflow ● McSherry, Frank D., Murray, Derek G., Isaacs, Rebecca, Isard, Michael ● Chathura Kankanamge ● 08th November 2016

  2. Outline ● Motivation for Differential Dataflow ● Key Concepts ● Differential Dataflow in practice ● Discussion

  3. Motivation

  4. Traditional data parallel processing ● Take input data in batches. ● Process and output. ● Highly evolved - Hadoop, Spark. ● Mostly stateless.

  5. Interactive - Twitter Mention Graph ● Used to find trending #hashtags. ● Billions of vertices and edges. ● Millions of updates per second (Storm). ● Needs the low latency of streaming and the throughput of Spark. ● Similar issue with interactive analytics.

  6. Loop Processing ● Some algorithms require iterations ○ PageRank ○ Connected components ● Usually requires transferring the entire state between iterations ● In Spark, Hadoop, etc., execution times are roughly those of stateless processing

  7. Incremental Dataflow ● Stateful. ● Get the differences of collections. ● Only calculate changes. ● Example ○ Wordcount in Hadoop Online. ● Can deal with changes due to ○ Loops ○ New data ● But NOT both!
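
The "only calculate changes" idea can be sketched as a word count that keeps its state and consumes differences instead of whole batches. This is a minimal illustration in plain Python, not the Hadoop Online API; the helper name `apply_delta` and the `(word, +1 / -1)` encoding of changes are assumptions made for this example.

```python
from collections import Counter

def apply_delta(counts, delta):
    """Update word counts in place from a list of (word, diff) pairs.

    Returns only the entries that changed, as {word: (old, new)}.
    """
    changed = {}
    for word, diff in delta:
        old = counts[word]          # Counter returns 0 for unseen words
        new = old + diff
        if new:
            counts[word] = new
        else:
            counts.pop(word, None)  # drop words whose count reaches zero
        # Remember the count from before this batch, not the intermediate one.
        first_old = changed[word][0] if word in changed else old
        changed[word] = (first_old, new)
    return {w: c for w, c in changed.items() if c[0] != c[1]}

counts = Counter()                  # state kept between batches
first = apply_delta(counts, [("a", 1), ("b", 1), ("a", 1)])
second = apply_delta(counts, [("b", -1), ("c", 1)])
```

The second call touches only `b` and `c`; the count for `a` is never recomputed.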

  8. Concepts

  11. Total vs Partial Ordering ● Traditional dataflow systems expect a total ordering, e.g. 1, 2, 3, 4, 5 ○ Multiple variables are a problem ● A partial ordering uses a time vector for ordering ○ Deals well with multiple variables ● Partial because ordering by variable x gives only a partial ordering for x ● [Figure: 3 × 3 grid of time vectors from (0, 0) to (2, 2)]
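
One common way to order time vectors is the product order, where one vector precedes another only if every coordinate does. The sketch below assumes that this is the ordering meant on the slide; the function names are illustrative only.

```python
def less_equal(a, b):
    """Product order on time vectors: a <= b iff every coordinate of a is <= b's."""
    return len(a) == len(b) and all(x <= y for x, y in zip(a, b))

def comparable(a, b):
    """True if the two time vectors are ordered one way or the other."""
    return less_equal(a, b) or less_equal(b, a)

# The 3 x 3 grid of time vectors from (0, 0) to (2, 2).
grid = [(x, y) for x in range(3) for y in range(3)]

# Unordered pairs that the product order cannot compare, e.g. (0, 1) and (1, 0).
incomparable = [(a, b) for a in grid for b in grid
                if a < b and not comparable(a, b)]
```

Pairs such as (0, 1) and (1, 0) have no order between them, which is what makes the ordering partial rather than total.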

  12. Differential Dataflow ● Computational Model ○ Defines how to process partially ordered data. ○ Defines state between iterations ● Goals ○ Do less calculation per change ○ Converge quicker per iteration

  13. Timely Dataflow ● Performs Iterative Calculations ● Computational model with directed graph ● Vertices exchange messages ● Logical Timestamps for messages

  14. Timely Dataflow ● Loops denoted by, ○ Ingress - adds a counter ○ Feedback - increments a counter ○ Egress - removes a counter ● Pointstamps - events at location and time
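
The three loop vertices can be read as small functions on timestamps. The following is a sketch with timestamps modelled as Python tuples; it is not the Timely Dataflow API, and the function names simply mirror the slide.

```python
def ingress(t):
    """Entering a loop adds a new loop counter, starting at 0."""
    return t + (0,)

def feedback(t):
    """Going around the loop increments the innermost counter."""
    return t[:-1] + (t[-1] + 1,)

def egress(t):
    """Leaving the loop removes the innermost counter."""
    return t[:-1]

t = ingress((0,))            # epoch 0 enters the loop
t = feedback(feedback(t))    # two trips around the loop
out = egress(t)              # back to the outer timestamp
```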

  15. Differential Dataflow in practice

  16. The Connected Graph Problem ● [Figure: graph with vertices 1 to 8]

  17. The Connected Graph Problem ● [Figure: the same graph with every vertex labelled 1 or 6]

  18. Connected Graph with Relational Algebra ● Labels (node, label): (1, 1), (2, 2), (3, 3), (4, 4), (5, 5) ● Edges: (1, 3), (3, 1), (4, 3), (3, 4), (4, 2), (2, 4), (2, 5), (5, 2) ● [Figure: Labels and Edges feed a union (U) followed by Min, which produces the output (O)]

  20. Connected Graph with Relational Algebra ● Neighbour labels: derived from Labels and Edges ● Self labels: the Labels themselves ● Union (U) of both, then Min, gives the output (O)

  21. Connected Graph with Relational Algebra ● Result after 1st iteration: (1, 1), (3, 1), (4, 2), (2, 2), (5, 2)
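
One round of the relational formulation can be sketched with Python sets standing in for the Labels and Edges relations. The function name `step` is hypothetical, and the join is written as a nested comprehension for clarity rather than speed.

```python
from collections import defaultdict

labels = {(1, 1), (2, 2), (3, 3), (4, 4), (5, 5)}   # (node, label)
edges = {(1, 3), (3, 1), (4, 3), (3, 4),
         (4, 2), (2, 4), (2, 5), (5, 2)}            # (src, dst)

def step(labels, edges):
    """One iteration: join, union with self labels, minimum per node."""
    # Join on node == src, then map to (dst, label): each neighbour's label.
    neighbour_labels = {(dst, label)
                        for node, label in labels
                        for src, dst in edges if src == node}
    candidates = neighbour_labels | labels           # union (U)
    best = defaultdict(lambda: float("inf"))
    for node, label in candidates:                   # group by node, take Min
        best[node] = min(best[node], label)
    return set(best.items())

after_first = step(labels, edges)

# Repeat until nothing changes to reach the final component labels.
current = labels
while step(current, edges) != current:
    current = step(current, edges)
```

On this graph `after_first` matches the result listed for the first iteration, and the fixed point labels every node 1.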

  22. Connected Graph in Timely ● Edges are available constantly ● Add counter at Ingress ● Remove counter at Egress ● Increment counter at Feedback ● Map converts joined tuples into node/label tuples ● Concat performs the union ● [Figure: dataflow graph with lettered vertices: Labels, Edges, Ingress, Concat, Join, Map, Concat, GroupBy + Min, Feedback, Egress]

  23. Maintaining State in Differential Dataflow ● Change in state at node b at t = cumulative state at b up to t, minus the sum of all state changes at b before t ● δb_t = b_t - Σ_{s<t} δb_s
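
The relation between cumulative state and per-time changes can be sketched as follows, assuming the usual differential dataflow definition: the state at time t is the sum of all differences at times s ≤ t, with times compared coordinate by coordinate. The data are the label changes of the five-node example; the function names and the dictionary layout are assumptions of this sketch.

```python
from collections import Counter

def less_equal(a, b):
    """Product order on time vectors."""
    return all(x <= y for x, y in zip(a, b))

def accumulate(diffs, t):
    """Cumulative state at t: sum of all differences at times s <= t."""
    total = Counter()
    for s, delta in diffs.items():
        if less_equal(s, t):
            for record, weight in delta.items():
                total[record] += weight
    return {r: w for r, w in total.items() if w != 0}

def difference_at(state_at_t, diffs, t):
    """Change at t: state at t minus the sum of differences strictly before t."""
    delta = Counter(state_at_t)
    for s, d in diffs.items():
        if s != t and less_equal(s, t):
            for record, weight in d.items():
                delta[record] -= weight
    return {r: w for r, w in delta.items() if w != 0}

# Differences keyed by time; records are (node, label), weights are +1 / -1.
diffs = {
    (0, 0): {(1, 1): 1, (2, 2): 1, (3, 3): 1, (4, 4): 1, (5, 5): 1},
    (0, 1): {(3, 3): -1, (3, 1): 1, (4, 4): -1, (4, 2): 1, (5, 5): -1, (5, 2): 1},
    (0, 2): {(4, 2): -1, (4, 1): 1},
}
state = accumulate(diffs, (0, 2))
next_state = {(1, 1): 1, (2, 1): 1, (3, 1): 1, (4, 1): 1, (5, 2): 1}
change = difference_at(next_state, diffs, (0, 3))
```

Only the two records for node 2 appear in `change`, even though `next_state` holds five records.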

  24. Connected Graph ● [Figure: example graph with vertices 1 to 5]

  25. Connected Graph in Differential ● [Figure, repeated on slides 25 to 50: dataflow with Edges, Labels, Ingress, Concat, Join, Map, Concat, GroupBy + Min, Feedback and Egress] ● t = (0): Edges (1, 3), (3, 1), (4, 3), (3, 4), (4, 2), (2, 4), (2, 5), (5, 2); Labels (1, 1), (2, 2), (3, 3), (4, 4), (5, 5)

  26. Connected Graph in Differential (slides 26 to 31) ● t = (0, 0): the inputs pass Ingress ● Join and Map produce the neighbour labels; Concat adds the labels themselves ● GroupBy + Min outputs (1, 1), (3, 1), (4, 2), (2, 2), (5, 2)

  32. Connected Graph in Differential (slides 32 to 39) ● t = (0, 1): the output goes around the loop through Feedback ● Changed labels: (3, 3) becomes (3, 1), (4, 4) becomes (4, 2), (5, 5) becomes (5, 2) ● GroupBy + Min works on the cumulative input from Concat ● Output change: (4, 2) becomes (4, 1)

  40. Connected Graph in Differential (slides 40 to 42) ● t = (0, 2): the change (4, 2), (4, 1) goes around the loop ● Output change: (2, 2) becomes (2, 1)

  43. Connected Graph in Differential (slides 43 to 45) ● t = (0, 3): the change (2, 2), (2, 1) goes around the loop ● Output change: (5, 2) becomes (5, 1)

  46. Connected Graph in Differential (slides 46 to 49) ● t = (0, 4): the change (5, 2), (5, 1) goes around the loop ● The output of GroupBy + Min is marked "?"

  50. Connected Graph in Differential ● t = (0, 4) inside the loop, marked "?" ● Annotation: "Does not increment" ● t = (0) at Egress
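
The sequence of changes in this walk-through can be reproduced with a small script. It recomputes every label in each round and then compares rounds, so it only illustrates which differences appear per iteration; it does not implement differential dataflow's incremental evaluation. The names `step` and `iteration_diffs` are hypothetical.

```python
def step(labels, edges):
    """One loop iteration: each node takes the minimum of its own and its neighbours' labels."""
    out = dict(labels)
    for src, dst in edges:
        out[dst] = min(out[dst], labels[src])
    return out

def iteration_diffs(labels, edges):
    """Yield (iteration, retracted, added) until the labels stop changing."""
    i = 0
    while True:
        new = step(labels, edges)
        retracted = sorted((n, l) for n, l in labels.items() if new[n] != l)
        added = sorted((n, new[n]) for n, _ in retracted)
        if not retracted:
            return                      # fixed point: nothing left to send around
        yield i, retracted, added
        labels, i = new, i + 1

edges = [(1, 3), (3, 1), (4, 3), (3, 4), (4, 2), (2, 4), (2, 5), (5, 2)]
labels = {n: n for n in range(1, 6)}    # every node starts with its own id
trace = list(iteration_diffs(labels, edges))
```

After the first round, each later iteration changes a single label (node 4, then 2, then 5), and the fifth round produces no change.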

  51. Changes to Connected Graph - I ● Remove an undirected edge ● [Figure: example graph with vertices 1 to 5]

  52. Changes to Connected Graph - I ● t = (1): edge tuples (4, 2) and (2, 4) arrive at Edges ● [Figure: the dataflow from the previous slides]

  53. Changes to Connected Graph - I ● t = (1): edge tuples (4, 2) and (2, 4) at Edges ● t = (1, 0) after Ingress, marked "?"
