Differential Dataflow
McSherry, Frank D., Murray, Derek G., Isaacs, Rebecca, Isard, Michael
Chathura Kankanamge
08th November 2016
Outline
● Motivation for Differential Dataflow
● Key Concepts
● Differential Dataflow in practice
● Discussion
Motivation
Traditional data parallel processing
● Take input data in batches.
● Process and output.
● Highly evolved - Hadoop, Spark.
● Mostly stateless.
Interactive - Twitter Mention Graph
● Used to find trending #hashtags.
● Billions of vertices and edges.
● Millions of updates per second (Storm).
● Needs the low latency of streaming and the throughput of Spark.
● Similar issue with interactive analytics.
Loop Processing
● Some algorithms require iterations
  ○ PageRank
  ○ Connected components
● Usually requires transferring the entire state between iterations
● In Spark, Hadoop, etc., execution times are roughly those of a stateless computation, because every iteration reprocesses the full state
Incremental Dataflow
● Stateful.
● Get the differences of collections.
● Only calculate changes.
● Example
  ○ Wordcount in Hadoop Online.
● Can deal with changes due to:
  ○ Loops
  ○ New data
● But NOT both!
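The "only calculate changes" idea can be sketched for the word count example. This is a minimal illustration, not Hadoop Online's implementation: the function names `word_deltas` and `apply_deltas` are hypothetical, and input is assumed to arrive as batches of added and removed lines.

```python
from collections import Counter


def word_deltas(added_lines, removed_lines=()):
    """Turn a batch of added/removed lines into signed per-word changes."""
    delta = Counter()
    for line in added_lines:
        for word in line.split():
            delta[word] += 1
    for line in removed_lines:
        for word in line.split():
            delta[word] -= 1
    # Keep only words whose count actually changes.
    return {word: d for word, d in delta.items() if d != 0}


def apply_deltas(counts, delta):
    """Update the stored counts in place, touching only the changed words."""
    for word, d in delta.items():
        new = counts.get(word, 0) + d
        if new:
            counts[word] = new
        else:
            counts.pop(word, None)
    return counts


# State is kept between batches instead of recounting everything.
counts = {}
apply_deltas(counts, word_deltas(["a b a", "b c"]))
apply_deltas(counts, word_deltas(["c c"], removed_lines=["a b a"]))
```

The second batch touches only the words `a`, `b` and `c` that appear in the changed lines; the cost is proportional to the size of the change, not to the size of the input seen so far.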
Concepts
Total vs Partial Ordering
● Traditional dataflow systems expect a total ordering, e.g. 1 2 3 4 5
  ○ Multiple variables are a problem
● A partial ordering uses a time vector for ordering
  ○ Deals well with multiple variables
● Partial because ordering by one variable alone does not order every pair of times: (1, 0) and (0, 1) are incomparable
[Figure: 3 x 3 grid of time vectors from (0, 0) to (2, 2)]
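A small sketch of the ordering on time vectors, assuming the usual coordinate-wise (product) order. The helper names `leq` and `comparable` are illustrative and do not come from the slides.

```python
def leq(a, b):
    """Product order: a <= b only if every coordinate of a is <= that of b."""
    return len(a) == len(b) and all(x <= y for x, y in zip(a, b))


def comparable(a, b):
    """Two times are comparable if one of them precedes the other."""
    return leq(a, b) or leq(b, a)


# The 3 x 3 grid of time vectors from (0, 0) to (2, 2).
grid = [(x, y) for x in range(3) for y in range(3)]

# Pairs that the order leaves undecided; `a < b` (tuple comparison)
# is used only to list each unordered pair once.
incomparable = [(a, b) for a in grid for b in grid
                if a < b and not comparable(a, b)]
```

With a single integer timestamp every pair is comparable; with two coordinates, pairs such as (0, 1) and (1, 0) are not, which is what makes the order partial.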
Differential Dataflow
● Computational Model
  ○ Defines how to process partially ordered data.
  ○ Defines state between iterations
● Goals
  ○ Do less calculation per change
  ○ Converge quicker per iteration
Timely Dataflow
● Performs iterative calculations
● Computational model with a directed graph
● Vertices exchange messages
● Logical timestamps for messages
Timely Dataflow
● Loops are denoted by:
  ○ Ingress - adds a counter
  ○ Feedback - increments a counter
  ○ Egress - removes a counter
● Pointstamps - events at a location and time
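The effect of the three loop vertices on a timestamp can be sketched with plain tuples. This is an assumed representation for illustration only, with the loop counter stored as the last tuple element; it is not the API of any timely dataflow library.

```python
def ingress(t):
    """Entering a loop adds a loop counter, starting at 0."""
    return t + (0,)


def feedback(t):
    """Going around the loop increments the innermost counter."""
    return t[:-1] + (t[-1] + 1,)


def egress(t):
    """Leaving the loop removes the loop counter."""
    return t[:-1]


t = ingress((0,))        # input epoch 0 enters the loop
for _ in range(4):       # four trips around the loop
    t = feedback(t)
out = egress(t)          # the result leaves with the original epoch
```

After four feedback steps the timestamp is `(0, 4)`, and egress restores `(0,)`, so downstream vertices never see the loop counter.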
Differential Dataflow in practice
The Connected Graph Problem
[Figure: graph with nodes 4 6 2 3 7 8 5 1]
[Figure: the same graph with the node labels after solving: 1 6 1 1 6 6 1 1]
Connected Graph with Relational Algebra
[Figure: dataflow from Labels and Edges through U (union) and Min to the output O]
● Labels (node, label): (1, 1), (2, 2), (3, 3), (4, 4), (5, 5)
● Edges: (1, 3), (3, 1), (4, 3), (3, 4), (4, 2), (2, 4), (2, 5), (5, 2)
● Neighbour Labels are unioned with the Self Labels (1, 1), (2, 2), (3, 3), (4, 4), (5, 5)
● Min keeps the smallest label per node
● Result after 1st Iteration: (1, 1), (3, 1), (4, 2), (2, 2), (5, 2)
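One iteration of the relational plan can be sketched as a join, a union and a per-node minimum. The edge list below is read from the later "Connected Graph in Differential" slides and is an assumption about the example graph; the function name `step` is hypothetical.

```python
# Labels are (node, label) pairs; every node starts with its own id.
labels = {(1, 1), (2, 2), (3, 3), (4, 4), (5, 5)}

# Each undirected edge is stored in both directions (assumed example graph).
edges = {(1, 3), (3, 1), (4, 3), (3, 4), (4, 2), (2, 4), (2, 5), (5, 2)}


def step(labels, edges):
    """One round: join, map to (node, label), union with self labels, min."""
    # Join labels(node, label) with edges(src, dst) on node == src and
    # keep (dst, label): the label each node hears from a neighbour.
    neighbour_labels = {(dst, label)
                        for (node, label) in labels
                        for (src, dst) in edges
                        if node == src}
    # Union with the node's own current label.
    candidates = neighbour_labels | labels
    # Group by node and keep the minimum label.
    best = {}
    for node, label in candidates:
        best[node] = min(label, best.get(node, label))
    return set(best.items())


first_iteration = step(labels, edges)
```

On this graph the first round yields (1, 1), (2, 2), (3, 1), (4, 2), (5, 2), which agrees with the "Result after 1st Iteration" values on the slide.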
Connected Graph in Timely
● Edges are available constantly
● Add counter at Ingress
● Remove counter at Egress
● Increment counter at Feedback
● Map converts joined tuples into node/label tuples
● Concat performs the union
[Figure: dataflow graph with lettered stages: Labels, Edges, Ingress, Concat, Join, Map, Concat, GroupBy + Min, Feedback, Egress]
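Maintaining State in Differential Dataflow
$$\delta b_t = b_t - \sum_{s < t} \delta b_s$$
● $\delta b_t$: change in state at node b at t
● $b_t$: cumulative state at b up to t
● $\sum_{s < t} \delta b_s$: sum of all changes at b before t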
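The bookkeeping on this slide can be sketched as follows: the change stored at time t is the cumulative state at t minus the sum of the changes stored at all earlier times, where "earlier" is taken in the coordinate-wise partial order. The representation (dictionaries from record to signed count) and the names `differences` and `accumulate` are assumptions made for illustration.

```python
def leq(a, b):
    """Coordinate-wise partial order on time vectors."""
    return all(x <= y for x, y in zip(a, b))


def accumulate(deltas, t):
    """Cumulative state at t: the sum of all stored changes at times s <= t."""
    total = {}
    for s, delta in deltas.items():
        if leq(s, t):
            for record, count in delta.items():
                total[record] = total.get(record, 0) + count
    return {record: count for record, count in total.items() if count}


def differences(states):
    """Derive the change at each time from the cumulative states."""
    deltas = {}
    # Sorted tuple order visits every time after all times that precede it
    # in the partial order, so `deltas` holds only strictly earlier times.
    for t in sorted(states):
        earlier = accumulate(deltas, t)
        delta = {}
        for record in set(states[t]) | set(earlier):
            count = states[t].get(record, 0) - earlier.get(record, 0)
            if count:
                delta[record] = count
        deltas[t] = delta
    return deltas


# Labels of the example graph at loop times (0, 0) and (0, 1).
states = {
    (0, 0): {(1, 1): 1, (2, 2): 1, (3, 3): 1, (4, 4): 1, (5, 5): 1},
    (0, 1): {(1, 1): 1, (2, 2): 1, (3, 1): 1, (4, 2): 1, (5, 2): 1},
}
deltas = differences(states)
```

Only the three labels that changed are stored at `(0, 1)`, each as a removal of the old pair and an addition of the new one; summing the stored changes up to a time reproduces the full state at that time.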
Connected Graph
[Figure: example graph with nodes 4 2 3 5 1]
Connected Graph in Differential
[Figure: dataflow with Edges, Labels, Ingress, Concat, Join, Map, Feedback, Concat, GroupBy + Min, Egress]
● Edges at t = (0): (1, 3), (3, 1), (4, 3), (3, 4), (4, 2), (2, 4), (2, 5), (5, 2)
● Labels at t = (0): (1, 1), (2, 2), (3, 3), (4, 4), (5, 5)
● After Ingress the Labels carry t = (0, 0)
Connected Graph in Differential
● t = (0, 0): Join matches Labels with Edges, and Map turns each match into a node/label tuple
● Concat unions these neighbour labels with the labels (1, 1), (2, 2), (3, 3), (4, 4), (5, 5)
● GroupBy + Min output at t = (0, 0): (1, 1), (3, 1), (4, 2), (2, 2), (5, 2)
Connected Graph in Differential
● Feedback increments the loop counter: (1, 1), (3, 1), (4, 2), (2, 2), (5, 2) re-enter at t = (0, 1)
● Difference at t = (0, 1): (3, 3) becomes (3, 1), (4, 4) becomes (4, 2), (5, 5) becomes (5, 2)
● Only these changes are joined with Edges and mapped
● GroupBy + Min uses the cumulative input from Concat
● GroupBy + Min output difference at t = (0, 1): (4, 2) becomes (4, 1)
Connected Graph in Differential
● t = (0, 2): the only input difference is (4, 2) becoming (4, 1)
● GroupBy + Min output difference at t = (0, 2): (2, 2) becomes (2, 1)
Connected Graph in Differential
● t = (0, 3): the only input difference is (2, 2) becoming (2, 1)
● GroupBy + Min output difference at t = (0, 3): (5, 2) becomes (5, 1)
Connected Graph in Differential
● t = (0, 4): the only input difference is (5, 2) becoming (5, 1)
● GroupBy + Min produces no further difference at t = (0, 4)
● Does not increment: with nothing left to feed back, the loop counter stops
● Egress removes the loop counter and the result leaves at t = (0)
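The sequence of per-iteration changes in this walkthrough can be reproduced with a short sketch. It recomputes every label in every round and then records what changed, so it is not a differential implementation; it only shows which differences each loop time carries. The function name `trace_changes` and the edge list (read from the slides) are assumptions.

```python
def trace_changes(nodes, edges):
    """Run min-label propagation and record the label changes per round."""
    labels = {n: n for n in nodes}          # every node starts as its own label
    changes = []                            # changes[i]: node -> (old, new)
    while True:
        new = {}
        for n in nodes:
            # Smallest of the node's own label and its neighbours' labels.
            heard = [labels[src] for (src, dst) in edges if dst == n]
            new[n] = min([labels[n]] + heard)
        diff = {n: (labels[n], new[n]) for n in nodes if new[n] != labels[n]}
        if not diff:                        # fixed point: nothing to feed back
            return labels, changes
        changes.append(diff)
        labels = new


nodes = [1, 2, 3, 4, 5]
edges = [(1, 3), (3, 1), (4, 3), (3, 4), (4, 2), (2, 4), (2, 5), (5, 2)]
final_labels, changes = trace_changes(nodes, edges)
```

The first round changes three labels, and each later round changes exactly one (node 4, then node 2, then node 5), after which no difference is left and the loop stops. A differential system would do work only for those changed records instead of revisiting all five nodes each round.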
Changes to Connected Graph - I
● Remove Undirected Edge
[Figure: example graph with nodes 4 2 3 5 1]
● Edges difference at t = (1): (4, 2) and (2, 4)
● After Ingress the difference carries t = (1, 0)
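The edge removal can be written as a signed difference on the Edges collection. The sketch below applies that difference and then compares the labels before and after; it recomputes the components from scratch purely to show what the resulting output difference looks like, which is the work a differential system aims to avoid. All names and the signed-count representation are assumptions.

```python
def components(nodes, edges):
    """Label every node with the smallest node id reachable from it."""
    labels = {n: n for n in nodes}
    changed = True
    while changed:
        changed = False
        for src, dst in edges:
            if labels[src] < labels[dst]:
                labels[dst] = labels[src]
                changed = True
    return labels


def apply_delta(counts, delta):
    """Add signed changes to a multiset, dropping records that reach zero."""
    result = dict(counts)
    for record, d in delta.items():
        new = result.get(record, 0) + d
        if new:
            result[record] = new
        else:
            result.pop(record, None)
    return result


nodes = [1, 2, 3, 4, 5]
edges_t0 = {e: 1 for e in [(1, 3), (3, 1), (4, 3), (3, 4),
                           (4, 2), (2, 4), (2, 5), (5, 2)]}

# Input difference at t = (1): both directions of the edge are removed.
edge_delta = {(4, 2): -1, (2, 4): -1}
edges_t1 = apply_delta(edges_t0, edge_delta)

before = components(nodes, edges_t0)
after = components(nodes, edges_t1)

# Output difference: old (node, label) pairs removed, new pairs added.
label_delta = {}
for n in nodes:
    if before[n] != after[n]:
        label_delta[(n, before[n])] = -1
        label_delta[(n, after[n])] = 1
```

Removing the edge splits the graph, so only nodes 2 and 5 change label; the labels of nodes 1, 3 and 4 produce no output difference.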