
Time-Evolving Graph Processing at Scale



  1. Time-Evolving Graph Processing at Scale Anand Iyer #, Li Erran Li +, Tathagata Das *, Ion Stoica #* (# UC Berkeley, + Uber Technologies, * Databricks)

  2. Motivation Dynamically evolving graphs prevalent in many domains – Social networks (e.g., Twitter, Facebook) – Communication networks (e.g. cellular networks) – Internet-of-Things

  3. Motivation Many applications need to leverage the evolution characteristics – Product recommendations – Network troubleshooting – Real-time ad placement

  4. Motivation Lots of interest in distributed graph processing… – GraphX, Giraph, PowerGraph, GraphLab, GraphChi, Chaos, … …but existing graph processing engines offer little support for dynamic graphs – Some specialized systems exist (e.g., Kineograph, Chronos), but they are not generic enough

  5. Challenges • Consistent & fault-tolerant snapshot generation • Co-ordinate snapshot generation and computation • Window operations on snapshots • Mix data and graph parallel computations Existing solutions do not satisfy all the requirements

  6. GraphTau Computational Model, Abstraction [Figure: example graph snapshots over vertices a to f]

  7. GraphTau GraphTau represents time-evolving graphs as a series of consistent graph snapshots [Figure: three snapshots of a graph over vertices a to f, at times t1, t2, and t3]

  8. New Computational Models Two new models for processing time-evolving graphs: Pause-Shift-Resume and Online Rectification

  9. Pause-Shift-Resume Many graph algorithms are robust to changes in the graph before convergence. E.g., PageRank: pause iterating, update snapshot, continue iterating

  10. Pause-Shift-Resume [Figure: PageRank values on a graph with vertices A to F, before and after a transition to a second graph] (X, Y): X is the 10-iteration PageRank; after 11 iterations on graph 2, Y is the 23-iteration PageRank. Both converge to 3-digit precision
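
The pause, shift, resume steps can be sketched in plain Scala. This is a minimal single-machine illustration, not GraphTau's implementation: the `Graph` alias, the helper names, the damping factor 0.85, and the choice to start unseen vertices at rank 1.0 are all assumptions made for the example.

```scala
object PauseShiftResume {
  // Hypothetical snapshot type: a directed graph as an adjacency map.
  type Graph = Map[String, Set[String]]

  def vertices(g: Graph): Set[String] = g.keySet ++ g.values.flatten

  // One synchronous PageRank step. Vertices missing from `ranks`
  // (e.g. vertices that are new in this snapshot) start at 1.0.
  def step(g: Graph, ranks: Map[String, Double], d: Double = 0.85): Map[String, Double] = {
    val contribs: Map[String, Double] = g.toSeq
      .flatMap { case (src, dsts) =>
        dsts.toSeq.map(dst => dst -> (ranks.getOrElse(src, 1.0) / dsts.size))
      }
      .groupBy(_._1)
      .map { case (v, cs) => v -> cs.map(_._2).sum }
    vertices(g).map(v => v -> ((1 - d) + d * contribs.getOrElse(v, 0.0))).toMap
  }

  def iterate(g: Graph, ranks: Map[String, Double], n: Int): Map[String, Double] =
    (1 to n).foldLeft(ranks)((r, _) => step(g, r))

  // Pause after `before` iterations on g1, shift to snapshot g2 while
  // keeping the ranks computed so far, then resume for `after` iterations.
  def pauseShiftResume(g1: Graph, g2: Graph, before: Int, after: Int): Map[String, Double] = {
    val paused = iterate(g1, Map.empty, before)
    val keep = vertices(g2)
    val shifted = paused.filter { case (v, _) => keep(v) } // drop removed vertices
    iterate(g2, shifted, after)
  }
}
```

Because the resumed run starts from the ranks of the previous snapshot rather than from scratch, it reaches the same fixed point as a fresh run on the new snapshot.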

  11. Online Rectification Model Many graph algorithms are not resilient to changes. Need to keep per-vertex state to handle changes. Connected components on an evolving graph can be done if each vertex stores its component
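
As an illustration of per-vertex state, the sketch below keeps a component id for every vertex and rectifies it when an edge is added. It is an assumption-laden toy, not GraphTau code: the `State` alias and function names are hypothetical, vertex ids are integers, each component is labelled with its smallest vertex id, and only edge additions are handled (deletions would need extra work).

```scala
object OnlineComponents {
  // Per-vertex state: the id of the component each vertex belongs to.
  type State = Map[Int, Int]

  // Rectify the state for one added edge (u, v): the two components
  // merge, and every vertex of the merged component gets the smaller id.
  def addEdge(state: State, u: Int, v: Int): State = {
    val cu = state.getOrElse(u, u) // an unseen vertex is its own component
    val cv = state.getOrElse(v, v)
    val keep = math.min(cu, cv)
    val drop = math.max(cu, cv)
    val withEnds = state + (u -> cu) + (v -> cv)
    withEnds.map { case (x, c) => x -> (if (c == drop) keep else c) }
  }

  // Apply a batch of edge additions, e.g. the delta between two snapshots.
  def addEdges(state: State, edges: Seq[(Int, Int)]): State =
    edges.foldLeft(state) { case (s, (u, v)) => addEdge(s, u, v) }
}
```

For example, starting from an empty state, adding edges (1, 2) and (3, 4) yields two components, and a later edge (2, 3) relabels all four vertices to component 1 without recomputing from scratch.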

  12. Abstraction GraphStream[V,E]: Represents a series of Graph[V,E] snapshots, where V = vertices and E = edges [Figure: a GraphStream[V,E] drawn as a sequence of Graph[V,E] snapshots at T = 1, T = 2, T = 3, and T = 4]

  13. Operations: transform class GraphStream { def transform(func: Graph => Graph): GraphStream } func: User-provided function that performs bulk operations on vertices and edges to create a new graph; allows aggregations over vertices and edges. transform: Applies func to each snapshot Graph in a GraphStream

  14. Operations: transform class GraphStream { def transform(func: Graph => Graph): GraphStream } [Figure: func is applied to each snapshot (T = 1 to T = 4) of the original GraphStream to produce the transformed GraphStream]
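
A toy in-memory model may help make the semantics of transform concrete. The `Graph` and `GraphStream` case classes below are hypothetical stand-ins (a vector of snapshots), not GraphTau's distributed types; only the `transform` signature is taken from the slide.

```scala
object TransformSketch {
  // Hypothetical stand-in for a graph snapshot.
  final case class Graph(vertices: Set[Int], edges: Set[(Int, Int)])

  // Hypothetical stand-in for a GraphStream: an ordered series of snapshots.
  final case class GraphStream(snapshots: Vector[Graph]) {
    // Apply func independently to every snapshot.
    def transform(func: Graph => Graph): GraphStream =
      GraphStream(snapshots.map(func))
  }

  // Example func: a bulk edge operation that removes self-loops.
  val dropSelfLoops: Graph => Graph =
    g => g.copy(edges = g.edges.filter { case (src, dst) => src != dst })
}
```

Calling `stream.transform(dropSelfLoops)` on this model returns a new stream with the same number of snapshots, each cleaned independently.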

  15. Operations: sliding windows class GraphStream { def mergeWindows(aggregationFuncs, windowLength, slidingInterval): GraphStream } [Figure: snapshots T = 1 to T = 4 of the original GraphStream are merged by aggregationFuncs over windows of length windowLength, advancing by slidingInterval, to produce the windowed GraphStream]
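
The windowing semantics can be sketched on a toy in-memory stream. Everything here apart from the parameter names windowLength and slidingInterval is an assumption: the case classes are hypothetical, a single binary `merge` function stands in for aggregationFuncs, window sizes are counted in snapshots rather than time, and incomplete trailing windows are dropped.

```scala
object WindowSketch {
  // Hypothetical stand-ins for a snapshot and a stream of snapshots.
  final case class Graph(vertices: Set[Int], edges: Set[(Int, Int)])

  final case class GraphStream(snapshots: Vector[Graph]) {
    // Merge every full window of `windowLength` consecutive snapshots,
    // moving the window forward by `slidingInterval` snapshots each time.
    def mergeWindows(merge: (Graph, Graph) => Graph,
                     windowLength: Int,
                     slidingInterval: Int): GraphStream =
      GraphStream(
        snapshots
          .sliding(windowLength, slidingInterval)
          .filter(_.size == windowLength) // drop incomplete trailing windows
          .map(_.reduce(merge))
          .toVector)
  }

  // Example aggregation: union of vertices and edges across the window.
  val union: (Graph, Graph) => Graph =
    (a, b) => Graph(a.vertices ++ b.vertices, a.edges ++ b.edges)
}
```

With four snapshots, a window length of 2 and a sliding interval of 1 produce three merged graphs, while a sliding interval of 2 produces two non-overlapping ones.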

  16. Differential Computation: Pause-shift-resume and Online Rectification incorporated into an efficient Pregel-style computation implementation Effectively an extension of the Pregel iterative processing model for time-evolving graphs

  17. Operations: StreamingBSP class GraphStream { def StreamingBSP(..., iterationFunc, ...): GraphStream } [Figure: GraphStream snapshots at T = 1, T = 2, and T = 3] Apply Pregel iterationFunc until next snapshot is available. Combine previous results with new snapshot, continue iterating. Continue until convergence

  18. PageRank using StreamingBSP PageRank computation on streaming graphs easily achieved by a simple call Faster convergence than running PageRank from scratch on every snapshot

  19. Operations: updateLocalState Keep updating non-graph "state" as graph evolves class GraphStream { def updateLocalState(stateUpdateFunc, initialState): LocalStateStream } [Figure: stateUpdateFunc applied to initialState and the GraphStream snapshots at T = 1, T = 2, and T = 3]
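
One way to read updateLocalState is as a running fold over the snapshots. The sketch below is a toy model under stated assumptions: the case classes are hypothetical, the argument order of `stateUpdateFunc` (previous state, then snapshot) is a guess, and one state is emitted per snapshot.

```scala
object LocalStateSketch {
  // Hypothetical stand-ins for a snapshot and the resulting state stream.
  final case class Graph(vertices: Set[Int], edges: Set[(Int, Int)])
  final case class LocalStateStream[S](states: Vector[S])

  final case class GraphStream(snapshots: Vector[Graph]) {
    // Thread a non-graph state through the snapshots, emitting the
    // updated state after each one (the initial state itself is not emitted).
    def updateLocalState[S](stateUpdateFunc: (S, Graph) => S,
                            initialState: S): LocalStateStream[S] =
      LocalStateStream(snapshots.scanLeft(initialState)(stateUpdateFunc).tail)
  }
}
```

For instance, `stream.updateLocalState[Int]((m, g) => math.max(m, g.edges.size), 0)` tracks the largest edge count seen so far as the graph evolves.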

  20. Implementation Implemented on Apache Spark platform - Spark Streaming: stream processing engine - GraphX: graph processing engine GraphTau implemented by combining Spark Streaming and GraphX - Novel optimizations to implement the GraphStream abstraction

  21. Other Benefits Spark Streaming and GraphX are built on Spark's RDDs. RDDs guarantee fault-tolerance and consistency of datasets. In addition, this allows mixing data and graph parallel computations in GraphStream

  22. Preliminary Results • Algorithms: – PageRank – Connected Components • Setup: 16 Amazon EC2 instances • Datasets: – Twitter follow graph: 41M vertices, ~1.5B edges – Live LTE network: 2M vertices, variable edges

  23. Preliminary Results: PageRank Dataset: Twitter Graph broken into parts: - 1 part = full graph - 5 parts = 20% of graph in each part Comparison: - Time to complete PageRank in GraphX on full graph - Time to complete streaming PageRank in GraphTau when the graph is streamed in parts

  24. Preliminary Results: PageRank GraphX on whole graph could not converge! GraphTau converged fast when 20% of the graph is streamed at a time. Smaller batches lead to faster convergence

  25. Preliminary Results: CellIQ CellIQ (NSDI 2015): Prior work - Detection of persistent hotspots using incremental connected components - Built specialized system to do temporal analysis Re-implemented on general system GraphTau - Uses mergeByWindow for sliding window analysis - Strawman (baseline) runs non-incremental connected components on whole window of snapshots

  26. Preliminary Results: CellIQ [Figure: analysis time (s) versus window size (m) for Strawman, GraphTau, and CellIQ] GraphTau managed to get performance comparable to the specialized system, without domain-specific optimizations

  27. Takeaways GraphTau: General-purpose processing engine for time-evolving graphs. GraphStream abstraction that provides: – Consistent & fault-tolerant snapshot generation – Co-ordinated snapshotting and computation – Sliding window operations – Mixing of data and graph parallel computations
