Time-Evolving Graph Processing at Scale
Anand Iyer (UC Berkeley), Li Erran Li (Uber Technologies), Tathagata Das (Databricks), Ion Stoica (UC Berkeley, Databricks)

Motivation
Dynamically evolving graphs are prevalent in many domains
– Social networks (e.g., Twitter, Facebook)
– Communication networks (e.g., cellular networks)
– Internet of Things

Motivation
Many applications need to leverage the evolution characteristics
– Product recommendations
– Network troubleshooting
– Real-time ad placement

Motivation
Lots of interest in distributed graph processing …
– GraphX, Giraph, PowerGraph, GraphLab, GraphChi, Chaos, …
… but existing graph processing engines offer little support for dynamic graphs
– Some specialized systems exist (e.g., Kineograph, Chronos), but they are not generic enough

Challenges
• Consistent & fault-tolerant snapshot generation
• Co-ordinate snapshot generation and computation
• Window operations on snapshots
• Mix data and graph parallel computations
Existing solutions do not satisfy all the requirements

GraphTau
– Computational Model
– Abstraction
[Figure: example graph snapshots over vertices a to f]

GraphTau
GraphTau represents time-evolving graphs as a series of consistent graph snapshots
[Figure: three snapshots of an example graph over vertices a to f, at times t1, t2, and t3]

New Computational Models
Two new models for processing time-evolving graphs
– Pause-Shift-Resume
– Online Rectification

Pause-Shift-Resume
Many graph algorithms are robust to changes in the graph before convergence
E.g., PageRank: pause iterating, update the snapshot, continue iterating
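As a rough illustration of pause-shift-resume, the sketch below runs a simplified, unnormalized PageRank on one snapshot, pauses after a few iterations, carries the ranks over to the next snapshot, and resumes. The `Snapshot` type, the helper names (`step`, `shift`, `iterate`), the damping factor 0.85, and the initial rank 1.0 for new vertices are assumptions made for this example and are not GraphTau's API.

```scala
object PauseShiftResume {
  // Hypothetical snapshot representation: vertex -> out-neighbours.
  // Every vertex, including edge destinations, must appear as a key.
  // Dangling vertices (no out-edges) are not handled specially.
  type Snapshot = Map[String, Seq[String]]

  // One synchronous iteration of unnormalized PageRank with damping factor d.
  def step(g: Snapshot, ranks: Map[String, Double], d: Double = 0.85): Map[String, Double] = {
    val contribs = g.toSeq.flatMap { case (v, outs) =>
      outs.map(dst => dst -> (ranks(v) / outs.size))
    }
    val incoming = contribs.groupBy(_._1).map { case (v, cs) => v -> cs.map(_._2).sum }
    g.keys.map(v => v -> ((1 - d) + d * incoming.getOrElse(v, 0.0))).toMap
  }

  // "Shift": keep the ranks of surviving vertices, give new vertices rank 1.0,
  // and drop vertices that are no longer in the snapshot.
  def shift(next: Snapshot, ranks: Map[String, Double]): Map[String, Double] =
    next.keys.map(v => v -> ranks.getOrElse(v, 1.0)).toMap

  // Run n iterations starting from the given ranks.
  def iterate(g: Snapshot, ranks: Map[String, Double], n: Int): Map[String, Double] =
    (1 to n).foldLeft(ranks)((r, _) => step(g, r))

  // Snapshot 1, and snapshot 2 which adds vertex D and two edges.
  val g1: Snapshot = Map("A" -> Seq("B"), "B" -> Seq("C"), "C" -> Seq("A"))
  val g2: Snapshot = g1 + ("C" -> Seq("A", "D")) + ("D" -> Seq("A"))

  val paused  = iterate(g1, shift(g1, Map.empty), 10)    // iterate, then pause
  val resumed = iterate(g2, shift(g2, paused), 100)      // shift and resume
  val scratch = iterate(g2, shift(g2, Map.empty), 300)   // baseline: restart on g2
}
```

In this toy setting `resumed` and `scratch` end up at the same fixed point; the example only shows that resuming from earlier ranks is valid and does not measure how many iterations are saved.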

Pause-Shift-Resume
[Figure: PageRank values on an example graph with vertices A to F, before and after a transition to a second graph]
(X, Y): X is the 10-iteration PageRank, Y is the 23-iteration PageRank
After 11 iterations on graph 2, both converge to 3-digit precision

Online Rectification Model
Many graph algorithms are not resilient to changes
Need to keep per-vertex state to handle changes
Connected components on an evolving graph can be computed if each vertex stores its component
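One way to picture per-vertex state for connected components is sketched below: each vertex stores a component id, and a newly arrived edge that joins two components relabels one of them. This is a minimal sketch that handles edge additions only (deletions would need more state); the `State` type and the `addEdge` and `addEdges` names are hypothetical and not taken from GraphTau.

```scala
object OnlineComponents {
  // Per-vertex state: the id of the component the vertex currently belongs to.
  type State = Map[Int, Int]

  // Rectify the stored state for one newly arrived edge (u, v).
  def addEdge(state: State, u: Int, v: Int): State = {
    // Unseen vertices start in their own singleton component.
    val s = state ++ Seq(u, v).filterNot(state.contains).map(x => x -> x)
    val (cu, cv) = (s(u), s(v))
    if (cu == cv) s // same component already: nothing to rectify
    else {
      val (keep, drop) = (math.min(cu, cv), math.max(cu, cv))
      // Only vertices of the dropped component change their state.
      s.map { case (x, c) => x -> (if (c == drop) keep else c) }
    }
  }

  // Apply a batch of new edges, e.g. the difference between two snapshots.
  def addEdges(state: State, edges: Seq[(Int, Int)]): State =
    edges.foldLeft(state) { case (s, (u, v)) => addEdge(s, u, v) }
}
```

For example, `OnlineComponents.addEdges(Map.empty, Seq((1, 2), (3, 4)))` leaves two components, and a later `addEdge(_, 2, 3)` merges them without recomputing from scratch.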

Abstraction
GraphStream[V,E]: represents a series of Graph[V,E] snapshots, where V = vertices and E = edges
[Figure: a GraphStream[V,E] made up of Graph[V,E] snapshots at T = 1, T = 2, T = 3, and T = 4]

Operations: transform
class GraphStream {
  def transform(func: Graph => Graph): GraphStream
}
– func: user-provided function that performs bulk operations on vertices and edges to create a new graph; allows aggregations over vertices and edges
– transform: applies func to each snapshot Graph in a GraphStream

Operations: transform
class GraphStream {
  def transform(func: Graph => Graph): GraphStream
}
[Figure: func is applied to each snapshot (T = 1 to T = 4) of the original GraphStream to produce the transformed GraphStream]
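The per-snapshot behaviour of transform can be sketched with a small in-memory stand-in. The `Graph` and `GraphStream` case classes below are simplified assumptions (a finite vector of snapshots, vertex properties only), not the Spark-based implementation, and `outDegrees` is a hypothetical example of a user-provided func.

```scala
object TransformSketch {
  // Hypothetical minimal snapshot: vertex properties plus directed edges.
  final case class Graph[V](vertices: Map[Long, V], edges: Set[(Long, Long)])

  // A finite, in-memory stand-in for a GraphStream: snapshots ordered by time.
  final case class GraphStream[V](snapshots: Vector[Graph[V]]) {
    // Apply func independently to every snapshot.
    def transform[W](func: Graph[V] => Graph[W]): GraphStream[W] =
      GraphStream(snapshots.map(func))
  }

  // Example func: a bulk aggregation that replaces each vertex property
  // with the vertex's out-degree.
  def outDegrees[V](g: Graph[V]): Graph[Int] = {
    val deg = g.edges.toSeq.groupBy(_._1).map { case (v, es) => v -> es.size }
    Graph(g.vertices.map { case (v, _) => v -> deg.getOrElse(v, 0) }, g.edges)
  }

  // Two snapshots: the second one adds vertex 3 and edge 1 -> 3.
  val stream = GraphStream(Vector(
    Graph(Map(1L -> "a", 2L -> "b"), Set((1L, 2L))),
    Graph(Map(1L -> "a", 2L -> "b", 3L -> "c"), Set((1L, 2L), (1L, 3L)))
  ))

  val degrees: GraphStream[Int] = stream.transform(g => outDegrees(g))
}
```

Here `degrees` holds one transformed graph per input snapshot, which mirrors the one-to-one mapping described on the slide.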

Operations: sliding windows
class GraphStream {
  def mergeWindows(aggregationFuncs, windowLength, slidingInterval): GraphStream
}
[Figure: aggregationFuncs merge each window of windowLen snapshots of the original GraphStream (T = 1 to T = 4), advancing by slidingInterval, to produce the windowed GraphStream]
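A sliding-window merge over snapshots might look like the following sketch. It makes several simplifying assumptions: window length and sliding interval are counted in snapshots rather than in time, a single `merge` function stands in for the slide's aggregationFuncs, and the `Graph` type and `unionSum` aggregation are hypothetical.

```scala
object WindowSketch {
  // Hypothetical snapshot: an Int payload per vertex plus an edge set.
  final case class Graph(vertices: Map[Long, Int], edges: Set[(Long, Long)])

  // Merge every window of `windowLength` consecutive snapshots,
  // advancing the window start by `slidingInterval` snapshots.
  def mergeWindows(snapshots: Vector[Graph],
                   merge: (Graph, Graph) => Graph,
                   windowLength: Int,
                   slidingInterval: Int): Vector[Graph] =
    snapshots.sliding(windowLength, slidingInterval)
      .filter(_.size == windowLength) // drop a trailing partial window
      .map(_.reduce(merge))
      .toVector

  // Example aggregation: union of the edges, sum of the vertex payloads.
  def unionSum(a: Graph, b: Graph): Graph = {
    val keys = a.vertices.keySet ++ b.vertices.keySet
    Graph(
      keys.map(k => k -> (a.vertices.getOrElse(k, 0) + b.vertices.getOrElse(k, 0))).toMap,
      a.edges ++ b.edges
    )
  }
}
```

With four snapshots, `windowLength = 2`, and `slidingInterval = 1`, this yields three merged graphs, one per overlapping pair of snapshots.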

Differential Computation
Pause-Shift-Resume and Online Rectification are incorporated into an efficient Pregel-style computation implementation
Effectively an extension of the Pregel iterative processing model to time-evolving graphs

Operations: StreamingBSP
class GraphStream {
  def StreamingBSP(..., iterationFunc, ...): GraphStream
}
[Figure: GraphStream snapshots at T = 1, T = 2, and T = 3]
– Apply Pregel iterationFunc until the next snapshot is available
– Combine previous results with the new snapshot, continue iterating
– Continue until convergence
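The three steps on this slide can be approximated by the sketch below. Since the slide elides most of the real signature, everything here is an assumption: the parameter list of `streamingBSP`, the `stepsPerSnapshot` budget that stands in for "until the next snapshot is available", the `Graph` and `State` types, and the `minLabel` iteration function (minimum-label propagation over undirected edges).

```scala
object StreamingBspSketch {
  final case class Graph(vertices: Set[Int], edges: Set[(Int, Int)])
  type State = Map[Int, Int] // per-vertex value carried across snapshots

  // Apply f repeatedly until the state stops changing or the budget runs out.
  @scala.annotation.tailrec
  def run(g: Graph, s: State, f: (Graph, State) => State, budget: Int): State = {
    val next = f(g, s)
    if (next == s || budget <= 1) next else run(g, next, f, budget - 1)
  }

  // For each snapshot: seed from the previous results (new vertices use init),
  // then iterate. Only the last snapshot is run until convergence.
  def streamingBSP(snapshots: Seq[Graph],
                   init: Int => Int,
                   iterationFunc: (Graph, State) => State,
                   stepsPerSnapshot: Int): Seq[State] = {
    val last = snapshots.size - 1
    snapshots.zipWithIndex.scanLeft(Map.empty[Int, Int]) { case (prev, (g, i)) =>
      val seeded = g.vertices.map(v => v -> prev.getOrElse(v, init(v))).toMap
      run(g, seeded, iterationFunc, if (i == last) Int.MaxValue else stepsPerSnapshot)
    }.tail // drop the empty initial state
  }

  // Example iterationFunc: each vertex adopts the smallest label among
  // itself and its neighbours (edges are treated as undirected).
  def minLabel(g: Graph, s: State): State =
    g.edges.foldLeft(s) { case (acc, (u, v)) =>
      val m = math.min(s(u), s(v))
      acc + (u -> math.min(acc(u), m)) + (v -> math.min(acc(v), m))
    }
}
```

Carrying labels forward is safe in this example because edges are only added between snapshots; an algorithm that must also react to deletions would need the extra per-vertex state of the Online Rectification model.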

PageRank using StreamingBSP
PageRank computation on streaming graphs is easily achieved with a simple call
Faster convergence than running PageRank from scratch on every snapshot

Operations: updateLocalState
Keep updating non-graph "state" as the graph evolves
class GraphStream {
  def updateLocalState(stateUpdateFunc, initialState): LocalStateStream
}
[Figure: stateUpdateFunc applied across GraphStream snapshots T = 1 to T = 3, starting from initialState]
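Conceptually, updateLocalState threads a non-graph value through the snapshots. The sketch below models that as a scan over an in-memory sequence; the `Graph` type, the argument order of `stateUpdateFunc`, the use of a plain `Seq` in place of LocalStateStream, and the `maxEdges` example are assumptions for illustration.

```scala
object LocalStateSketch {
  final case class Graph(vertices: Set[Int], edges: Set[(Int, Int)])

  // Thread a non-graph state value through the snapshots,
  // emitting one state per snapshot.
  def updateLocalState[S](snapshots: Seq[Graph],
                          stateUpdateFunc: (S, Graph) => S,
                          initialState: S): Seq[S] =
    snapshots.scanLeft(initialState)(stateUpdateFunc).tail

  // Example state: the largest edge count seen so far.
  def maxEdges(best: Int, g: Graph): Int = math.max(best, g.edges.size)

  val snaps = Seq(
    Graph(Set(1, 2), Set((1, 2))),
    Graph(Set(1, 2, 3), Set((1, 2), (2, 3), (1, 3))),
    Graph(Set(1, 2, 3), Set((1, 2), (2, 3)))
  )

  // One running maximum per snapshot.
  val states: Seq[Int] = updateLocalState[Int](snaps, maxEdges, 0)
}
```

The state here is a single integer, but the same shape works for any value that summarizes the stream so far, such as a counter or a small lookup table.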

Implementation
Implemented on the Apache Spark platform
- Spark Streaming: stream processing engine
- GraphX: graph processing engine
GraphTau is implemented by combining Spark Streaming and GraphX
- Novel optimizations to implement the GraphStream abstraction

Other Benefits
Spark Streaming and GraphX are built on Spark's RDDs
RDDs guarantee fault tolerance and consistency of datasets
In addition, this allows mixing data and graph parallel computations in GraphStream

Preliminary Results
• Algorithms:
  – PageRank
  – Connected Components
• Setup: 16 Amazon EC2 instances
• Datasets:
  – Twitter follow graph: 41M vertices, ~1.5B edges
  – Live LTE network: 2M vertices, variable edges

Preliminary Results: PageRank
Dataset: Twitter graph broken into parts
- 1 part = full graph
- 5 parts = 20% of graph in each part
Comparison:
- Time to complete PageRank in GraphX on the full graph
- Time to complete streaming PageRank in GraphTau when the graph is streamed in parts

Preliminary Results: PageRank
GraphX on whole graph could not converge!
GraphTau converged fast when 20% of the graph is streamed at a time
Smaller batches lead to faster convergence

Preliminary Results: CellIQ
CellIQ (NSDI 2015): prior work
- Detection of persistent hotspots using incremental connected components
- Built a specialized system to do temporal analysis
Re-implemented on the general-purpose system GraphTau
- Uses mergeByWindow for sliding window analysis
- Strawman (baseline) runs non-incremental connected components on the whole window of snapshots

Preliminary Results: CellIQ
[Figure: Analysis Time (s) versus Window Size (m) for Strawman, GraphTau, and CellIQ]
GraphTau achieved performance comparable to the specialized system, without domain-specific optimizations

Takeaways
GraphTau: general-purpose processing engine for time-evolving graphs
GraphStream abstraction that provides:
– Consistent & fault-tolerant snapshot generation
– Co-ordinated snapshotting and computation
– Sliding window operations
– Mixing of data and graph parallel computations
