Differential Dataflow McSherry, Frank D., Murray, Derek G., Isaacs, Rebecca, Isard, Michael Chathura Kankanamge 08th November 2016
Outline ● Motivation for Differential Dataflow ● Key Concepts ● Differential Dataflow in practice ● Discussion
Motivation
Traditional data parallel processing ● Take input data in batches. ● Process and output. ● Highly evolved - Hadoop, Spark. ● Mostly stateless.
Interactive - Twitter Mention Graph ● Used to find trending #hashtags. ● Billions of vertices and edges. ● Millions of updates per second (Storm). ● Needs the low latency of streaming and the throughput of Spark. ● Similar issue with interactive analytics.
Loop Processing ● Some algorithms require iterations ○ PageRank ○ Connected components ● Usually requires transferring the entire state between iterations ● Spark, Hadoop, etc.: execution time per iteration ~ that of a stateless run
Incremental Dataflow ● Stateful. ● Get the differences of collections. ● Only calculate changes. ● Example ○ Wordcount in Hadoop Online. ● Can deal with changes due to, ○ Loops ○ New Data ● But NOT both!!
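To illustrate "only calculate changes", here is a minimal Python sketch of an incremental word count. It is not how Hadoop Online implements it; the function name `apply_changes` and the `(word, delta)` change format are assumptions made for this example.

```python
from collections import Counter


def apply_changes(counts, changes):
    """Apply (word, delta) changes in place and return only the counts that were touched."""
    touched = {}
    for word, delta in changes:
        counts[word] += delta
        touched[word] = counts[word]
        if counts[word] == 0:
            # Drop words whose count fell back to zero.
            del counts[word]
    return touched


counts = Counter()
# First batch: every word is an insertion (+1).
first = apply_changes(counts, [(w, +1) for w in "a b a c".split()])
# Second batch: one retraction and one insertion; "b" and "c" are never revisited.
second = apply_changes(counts, [("a", -1), ("d", +1)])
```

The second call touches only the two words that changed, which is the saving the slide refers to.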
Concepts
Total vs Partial Ordering ● Traditional dataflow systems expect a total ordering of times (1, 2, 3, 4, 5) ○ Multiple variables are a problem ● A partial ordering uses a time vector for ordering ○ Deals well with multiple variables ● Partial because ordering by variable x gives only a partial ordering for x: times such as (1, 0) and (0, 1) are not comparable ● [Figure: 3 x 3 grid of time vectors from (0, 0) to (2, 2)]
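The comparison of time vectors can be sketched in a few lines of Python. This assumes the usual component-wise (product) order on time vectors; the helper names `less_equal` and `comparable` are made up for this example.

```python
def less_equal(a, b):
    """a <= b in the product order: every coordinate of a is <= the matching one of b."""
    return len(a) == len(b) and all(x <= y for x, y in zip(a, b))


def comparable(a, b):
    """Two times are comparable if one of them is <= the other."""
    return less_equal(a, b) or less_equal(b, a)


# (0, 0) precedes (1, 2), but (1, 0) and (0, 1) cannot be ordered either way.
print(less_equal((0, 0), (1, 2)))
print(comparable((1, 0), (0, 1)))
```

With a single coordinate this collapses to the ordinary total order on integers, which is the case traditional dataflow systems expect.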
Differential Dataflow ● Computational Model ○ Defines how to process partially ordered data. ○ Defines state between iterations ● Goals ○ Do less calculation per change ○ Converge quicker per iteration
Timely Dataflow ● Performs Iterative Calculations ● Computational model with directed graph ● Vertices exchange messages ● Logical Timestamps for messages
Timely Dataflow ● Loops denoted by, ○ Ingress - adds a counter ○ Feedback - increments a counter ○ Egress - removes a counter ● Pointstamps - events at location and time
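The three loop vertices can be sketched as functions on timestamps. This is a toy model, assuming a timestamp is a Python tuple whose last element is the counter of the innermost loop; it is not the API of any timely dataflow implementation.

```python
def ingress(t):
    """Entering a loop: append a new loop counter starting at 0."""
    return t + (0,)


def feedback(t):
    """Going around the loop: increment the innermost loop counter."""
    return t[:-1] + (t[-1] + 1,)


def egress(t):
    """Leaving the loop: drop the innermost loop counter."""
    return t[:-1]


t = ingress((0,))   # (0, 0)
t = feedback(t)     # (0, 1)
t = egress(t)       # (0,)
```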
Differential Dataflow in practice
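The Connected Graph Problem ● Input: a graph with vertices 1 to 8 ● Output: every vertex labelled with the smallest vertex ID in its component: vertices 1 to 5 get label 1, vertices 6 to 8 get label 6 ● [Figure: the example graph before and after labelling]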
Connected Graph with Relational Algebra ● Labels (node, label): (1, 1) (2, 2) (3, 3) (4, 4) (5, 5) ● Edges (each undirected edge in both directions): (1, 3) (3, 1) (4, 3) (3, 4) (4, 2) (2, 4) (2, 5) (5, 2) ● Join Labels with Edges to get the Neighbour Labels ● Union (U) the Neighbour Labels with the Self Labels ● Take the Min label per node ● Result after 1st iteration: (1, 1) (3, 1) (4, 2) (2, 2) (5, 2)
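One iteration of the join, union, and min steps can be sketched with plain Python sets. The `step` function is an illustrative name, and the data is the five-vertex example used in the slides.

```python
labels = {(1, 1), (2, 2), (3, 3), (4, 4), (5, 5)}
edges = {(1, 3), (3, 1), (4, 3), (3, 4), (4, 2), (2, 4), (2, 5), (5, 2)}


def step(labels, edges):
    """One round of label propagation: join, union with self labels, min per node."""
    # Join: an edge (src, dst) and a label (src, l) yield the neighbour label (dst, l).
    neighbour_labels = {(dst, l) for (src, dst) in edges
                        for (node, l) in labels if node == src}
    # Union: keep each node's own label as a candidate too.
    candidates = neighbour_labels | labels
    # Min: keep the smallest candidate label per node.
    best = {}
    for node, l in candidates:
        best[node] = min(l, best.get(node, l))
    return set(best.items())


first_iteration = step(labels, edges)
```

Applying `step` repeatedly until nothing changes gives every vertex of a component the smallest vertex ID in that component.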
Connected Graph in Timely ● Edges are available constantly ● Add counter at Ingress ● Remove counter at Egress ● Increment counter at Feedback ● Map converts joined tuples into node/label tuples ● Concat performs the union ● [Figure: dataflow graph with vertices A to J: inputs Edges and Labels, operators Ingress, Concat, Join, Map, Concat, GroupBy + Min, Feedback, Egress]
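Maintaining State in Differential Dataflow ● $\delta B_t = B_t - \sum_{s < t} \delta B_s$ ○ $\delta B_t$: change in state at node b at t ○ $B_t$: cumulative state at b up to t ○ $\sum_{s < t} \delta B_s$: sum of all changes at b before t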
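The relationship between cumulative state and stored differences can be sketched as follows. This assumes differences are kept as a dictionary from time vectors to `{record: weight}` maps and that times are compared component-wise; `accumulate`, `difference`, and `less_equal` are names chosen for this example only.

```python
from collections import Counter


def less_equal(a, b):
    """Component-wise partial order on time vectors."""
    return all(x <= y for x, y in zip(a, b))


def accumulate(diffs, t):
    """Cumulative state at t: the sum of all differences at times s <= t."""
    state = Counter()
    for s, delta in diffs.items():
        if less_equal(s, t):
            for record, weight in delta.items():
                state[record] += weight
    return {r: w for r, w in state.items() if w != 0}


def difference(diffs, t, target):
    """Difference at t: the target state at t minus all differences at times s < t."""
    earlier = Counter()
    for s, delta in diffs.items():
        if s != t and less_equal(s, t):
            for record, weight in delta.items():
                earlier[record] += weight
    out = {}
    for record in set(target) | set(earlier):
        weight = target.get(record, 0) - earlier[record]
        if weight != 0:
            out[record] = weight
    return out


# Node 4 has label 2 at time (0, 0) and should have label 1 at time (0, 1).
diffs = {(0, 0): {(4, 2): 1}}
diffs[(0, 1)] = difference(diffs, (0, 1), {(4, 1): 1})
```

Only the retraction of `(4, 2)` and the insertion of `(4, 1)` are stored at `(0, 1)`; the full state is rebuilt on demand with `accumulate`.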
Connected Graph ● [Figure: example graph with vertices 1 to 5 and undirected edges 1-3, 3-4, 4-2, 2-5]
Connected Graph in Differential - t = (0, 0) ● Inputs arrive at t = (0) ○ Edges: (1, 3) (3, 1) (4, 3) (3, 4) (4, 2) (2, 4) (2, 5) (5, 2) ○ Labels: (1, 1) (2, 2) (3, 3) (4, 4) (5, 5) ● Ingress adds the loop counter: t = (0, 0) ● Join and Map turn Edges and Labels into neighbour labels ● Concat adds the self labels (1, 1) (2, 2) (3, 3) (4, 4) (5, 5) ● GroupBy + Min output at t = (0, 0): (1, 1) (3, 1) (4, 2) (2, 2) (5, 2)
Connected Graph in Differential - t = (0, 1) ● Feedback increments the loop counter: (1, 1) (3, 1) (4, 2) (2, 2) (5, 2) re-enter the loop at t = (0, 1) ● Concat compares them with the original Labels; only the changes move on ○ (3, 3) becomes (3, 1) ○ (4, 4) becomes (4, 2) ○ (5, 5) becomes (5, 2) ● Join and Map process the changes against Edges ● GroupBy + Min works on the cumulative input from Concat ● Output change at t = (0, 1): (4, 2) becomes (4, 1)
Connected Graph in Differential - t = (0, 2) ● Only the change from the previous iteration circulates: (4, 2) becomes (4, 1) ● GroupBy + Min output change at t = (0, 2): (2, 2) becomes (2, 1)
Connected Graph in Differential - t = (0, 3) ● Only the change from the previous iteration circulates: (2, 2) becomes (2, 1) ● GroupBy + Min output change at t = (0, 3): (5, 2) becomes (5, 1)
Connected Graph in Differential - t = (0, 4) ● Only the change from the previous iteration circulates: (5, 2) becomes (5, 1) ● GroupBy + Min produces no further change at t = (0, 4) ● Feedback does not increment: nothing is left to circulate ● Egress removes the loop counter: the result leaves the loop at t = (0)
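The per-iteration changes in this walkthrough can be reproduced with a small simulation. This sketch recomputes the full label set each round and then diffs consecutive rounds, so it shows which changes exist at each loop counter, not how a differential dataflow system derives them incrementally. The names `step` and `run` are assumptions for this example.

```python
labels = {(1, 1), (2, 2), (3, 3), (4, 4), (5, 5)}
edges = {(1, 3), (3, 1), (4, 3), (3, 4), (4, 2), (2, 4), (2, 5), (5, 2)}


def step(labels, edges):
    """Join labels with edges, union with the self labels, keep the min label per node."""
    candidates = {(dst, l) for (src, dst) in edges
                  for (node, l) in labels if node == src} | labels
    best = {}
    for node, l in candidates:
        best[node] = min(l, best.get(node, l))
    return set(best.items())


def run(labels, edges):
    """Iterate to a fixed point, recording (removed, added) tuples for every round."""
    history = []
    current = set(labels)
    while True:
        following = step(current, edges)
        removed, added = current - following, following - current
        if not removed and not added:
            break
        history.append((sorted(removed), sorted(added)))
        current = following
    return current, history


final, history = run(labels, edges)
# history[0] is the change from the input labels to the output at loop counter 0;
# history[k] for k >= 1 is the output change at loop counter k.
for k, (removed, added) in enumerate(history):
    print(k, removed, added)
```

After the first round, each later round changes exactly one label, which is why so little data circulates in the later iterations.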
Changes to Connected Graph - I ● Remove the undirected edge between 4 and 2 ● The change to Edges arrives at t = (1): (4, 2) and (2, 4) are removed ● Ingress adds the loop counter: t = (1, 0) ● ?
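The effect of this edge removal on the final labels can be checked with a short sketch. It recomputes the components from scratch for both versions of the graph and diffs the results, so it shows the output change a differential computation would have to produce at t = (1), not how that change is derived. The function `components` is a hypothetical helper.

```python
def components(labels, edges):
    """Propagate the minimum label along edges until nothing changes."""
    current = dict(labels)
    changed = True
    while changed:
        changed = False
        for src, dst in edges:
            if current[src] < current[dst]:
                current[dst] = current[src]
                changed = True
    return current


labels = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5}
edges_t0 = {(1, 3), (3, 1), (4, 3), (3, 4), (4, 2), (2, 4), (2, 5), (5, 2)}
# At t = (1) both directions of the undirected edge between 4 and 2 are removed.
edges_t1 = edges_t0 - {(4, 2), (2, 4)}

before = components(labels, edges_t0)
after = components(labels, edges_t1)
# Output change: (old label, new label) for every node whose label differs.
output_change = {n: (before[n], after[n]) for n in labels if before[n] != after[n]}
```

Only vertices 2 and 5 change label; vertices 1, 3, and 4 keep label 1, so their results need not be touched.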