Processing Massive Graphs
Amir H. Payberah
amir.payberah@cs.ox.ac.uk
University of Oxford
Amir H. Payberah (Oxford) Processing Massive Graphs April 17, 2017 1 / 78
What’s the Problem?
Large Graph
◮ A large graph either cannot fit into the memory of a single computer or
Big Data
Scale Up vs. Scale Out
◮ Scale up, or scale vertically: add more resources (CPU, memory) to a single machine.
◮ Scale out, or scale horizontally: add more machines.
A Scale Out Example (1/3)
◮ Count the number of times each distinct word appears in the file.
◮ If the file fits in memory: words(doc.txt) | sort | uniq -c
◮ If not?
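For the in-memory case, a rough Python equivalent of the pipeline might look like the sketch below. It assumes that `words(doc.txt)` means "split the file on whitespace"; the function name `word_count_in_memory` is hypothetical. Sorting makes identical words adjacent, and grouping adjacent runs plays the role of `uniq -c`.

```python
from itertools import groupby


def word_count_in_memory(text: str) -> list[tuple[int, str]]:
    # Hypothetical helper mimicking `words(doc.txt) | sort | uniq -c`.
    words = text.split()   # words(doc.txt): assumed to split on whitespace
    words.sort()           # sort: identical words become adjacent
    # uniq -c: collapse each run of identical adjacent words into (count, word)
    return [(len(list(run)), word) for word, run in groupby(words)]


if __name__ == "__main__":
    for count, word in word_count_in_memory("the cat saw the dog"):
        print(count, word)
```

This only works while the whole word list fits in one machine's memory, which is exactly the assumption the next question challenges.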
A Scale Out Example (2/3)
◮ Partition the data and process the partitions in parallel.
◮ Data-parallel processing.
A Scale Out Example (3/3)
◮ MapReduce
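The following single-process sketch shows the data-parallel word count in MapReduce style. Each input chunk stands in for a split that would be mapped on a different machine; the shuffle groups intermediate pairs by key, and reduce sums per word. All function names are hypothetical and this is not Hadoop's or Spark's API.

```python
from collections import defaultdict


def map_phase(chunk: str):
    # Hypothetical mapper: emit (word, 1) for every word in one input split.
    for word in chunk.split():
        yield word, 1


def shuffle(pairs):
    # Hypothetical shuffle: group all intermediate values by key across mappers.
    groups = defaultdict(list)
    for word, one in pairs:
        groups[word].append(one)
    return groups


def reduce_phase(word: str, counts: list[int]) -> tuple[str, int]:
    # Hypothetical reducer: sum the partial counts for one word.
    return word, sum(counts)


def mapreduce_word_count(chunks: list[str]) -> dict[str, int]:
    # Each chunk is independent, so the map calls could run on separate machines.
    mapped = (pair for chunk in chunks for pair in map_phase(chunk))
    return dict(reduce_phase(w, c) for w, c in shuffle(mapped).items())


if __name__ == "__main__":
    print(mapreduce_word_count(["the cat", "saw the dog"]))
```

The key property is that every record is processed independently of every other record, which is what makes the data easy to partition.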
Can we use platforms like MapReduce or Spark, which are based on the data-parallel model, for large-scale graph processing?
Large Graph Processing Challenges
◮ Difficult to extract parallelism based on partitioning of the data.
◮ Difficult to express parallelism based on partitioning of computation.
◮ No locality between computations and data access patterns.
Graph-Parallel Processing
◮ Computation typically depends on a vertex's neighbors.
Graph-Parallel Processing
◮ Restricts the types of computation.
◮ New techniques to partition and distribute graphs.
◮ Exploit graph structure.
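The sketch below illustrates why partitioning matters and why the locality problem arises. It counts the fraction of edges whose endpoints land on different workers. A structure-blind assignment such as hash(ID) mod N can cut every edge of a ring, whereas an assignment that keeps neighbors together cuts far fewer. The helper names (`hash_assign`, `contiguous_assign`, `cut_fraction`) are hypothetical, and the ring is a toy example.

```python
from typing import Callable


def hash_assign(vertex: int, num_workers: int = 2) -> int:
    # Hypothetical structure-blind placement: hash(ID) mod N.
    return hash(vertex) % num_workers


def contiguous_assign(vertex: int) -> int:
    # Hypothetical structure-aware placement for the 6-vertex ring below:
    # vertices 0-2 on worker 0, vertices 3-5 on worker 1.
    return vertex // 3


def cut_fraction(edges: list[tuple[int, int]], assign: Callable[[int], int]) -> float:
    # Each cut edge means a neighbor's value must cross the network.
    cut = sum(1 for u, v in edges if assign(u) != assign(v))
    return cut / len(edges)


if __name__ == "__main__":
    ring = [(i, (i + 1) % 6) for i in range(6)]  # 0-1-2-3-4-5-0
    print(cut_fraction(ring, hash_assign))        # every edge crosses workers
    print(cut_fraction(ring, contiguous_assign))  # only 2 of 6 edges cross
```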
Data-Parallel vs. Graph-Parallel Computation
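As a minimal contrast, the two hypothetical step functions below differ in one respect. In the data-parallel step, each output depends only on its own record, so any split of the input gives the same result. In the graph-parallel step, a vertex's new value (here, as an arbitrary illustrative choice, the average of its neighbors' values) depends on other vertices, which may live on other machines.

```python
def data_parallel_step(records: list[int]) -> list[int]:
    # Hypothetical per-record computation: no record looks at any other record.
    return [r * 2 for r in records]


def graph_parallel_step(values: dict[int, float],
                        neighbors: dict[int, list[int]]) -> dict[int, float]:
    # Hypothetical neighbor-dependent computation: each vertex reads its
    # neighbors' values, so a worker needs data owned by other vertices.
    new_values = {}
    for v, value in values.items():
        nbrs = neighbors.get(v, [])
        new_values[v] = sum(values[u] for u in nbrs) / len(nbrs) if nbrs else value
    return new_values


if __name__ == "__main__":
    # Splitting the records arbitrarily does not change the data-parallel result.
    print(data_parallel_step([1, 2]) + data_parallel_step([3]))
    # On the path 0-1-2, vertex 1 needs the values of both 0 and 2.
    print(graph_parallel_step({0: 0.0, 1: 3.0, 2: 6.0}, {0: [1], 1: [0, 2], 2: [1]}))
```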
Graph-Parallel Processing Models
◮ Vertex-centric processing model
  • Pregel, Giraph, GraphLab, PowerGraph, ...
◮ Edge-centric processing model
  • X-Stream, Chaos, ...
Vertex-Centric Programming Model
◮ Vertex-centric programming model
  • Write a vertex program.
  • State is stored in vertices.
◮ Vertex operations:
  • Gather updates from incoming edges.
  • Scatter updates along outgoing edges.
A Vertex-Centric Program
◮ Iterates over vertices

    // the scatter phase
    for all vertices v
        for all outgoing edges e from v:
            e.update = f(v.value)

    // the gather phase
    for all vertices v
        for all incoming edges e to v:
            v.value = g(v.value, e.update)
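The following is a runnable sketch of one scatter/gather pass of the vertex program. Edges are identified by (src, dst) pairs, and each edge carries its own update, so the gather phase reads the update that was written on that particular incoming edge. The name `scatter_gather_pass` is hypothetical, and f and g are passed in as plain functions.

```python
from typing import Callable


def scatter_gather_pass(values: dict[int, int],
                        out_edges: dict[int, list[int]],
                        f: Callable[[int], int],
                        g: Callable[[int, int], int]) -> dict[int, int]:
    # Hypothetical helper: build the incoming-edge view once.
    in_edges: dict[int, list[int]] = {v: [] for v in values}
    for src, dsts in out_edges.items():
        for dst in dsts:
            in_edges[dst].append(src)

    # Scatter phase: for every vertex v and outgoing edge e, e.update = f(v.value).
    edge_update: dict[tuple[int, int], int] = {}
    for v in values:
        for dst in out_edges.get(v, []):
            edge_update[(v, dst)] = f(values[v])

    # Gather phase: for every vertex v and incoming edge e, v.value = g(v.value, e.update).
    new_values = dict(values)
    for v in values:
        for src in in_edges[v]:
            new_values[v] = g(new_values[v], edge_update[(src, v)])
    return new_values


if __name__ == "__main__":
    # With f = identity and g = max, the value 5 moves one hop along 0 -> 1 -> 2.
    print(scatter_gather_pass({0: 5, 1: 1, 2: 3}, {0: [1], 1: [2]}, lambda x: x, max))
```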
Vertex-Centric Scatter-Gather

    Until convergence {
        // the gather phase
        for all vertices v
            for all incoming edges e to v:
                v.value = g(v.value, e.update)
        // the scatter phase
        for all vertices v
            for all outgoing edges e from v:
                e.update = f(v.value)
    }
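To make "until convergence" concrete, the sketch below runs vertex-centric iterations until no vertex value changes. It uses an illustrative choice of f and g: f(x) = x + 1 ("one more hop") and g = min. With these, the loop computes hop distances from a source vertex. Each iteration here scatters and then gathers; `hop_distances` is a hypothetical name, and unreachable vertices keep the value infinity.

```python
import math


def hop_distances(out_edges: dict[int, list[int]], source: int) -> dict[int, float]:
    # Hypothetical example program built on vertex-centric scatter-gather.
    vertices = set(out_edges) | {d for dsts in out_edges.values() for d in dsts}
    in_edges: dict[int, list[int]] = {v: [] for v in vertices}
    for src, dsts in out_edges.items():
        for dst in dsts:
            in_edges[dst].append(src)

    def f(value: float) -> float:  # update sent along an edge: one more hop
        return value + 1

    def g(current: float, update: float) -> float:  # keep the shorter distance
        return min(current, update)

    value = {v: (0 if v == source else math.inf) for v in vertices}
    while True:  # "Until convergence"
        # Scatter phase: e.update = f(src.value) for every edge.
        edge_update = {(src, dst): f(value[src])
                       for src, dsts in out_edges.items() for dst in dsts}
        # Gather phase: fold the updates on each vertex's incoming edges.
        new_value = {}
        for v in vertices:
            acc = value[v]
            for src in in_edges[v]:
                acc = g(acc, edge_update[(src, v)])
            new_value[v] = acc
        if new_value == value:  # converged: no vertex changed in this iteration
            return value
        value = new_value


if __name__ == "__main__":
    print(hop_distances({0: [1, 2], 1: [3], 2: [3]}, source=0))
```

Convergence is guaranteed in this example because values only decrease and are bounded below by 0. Other choices of f and g need their own argument.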
Vertex-Centric vs. Edge-Centric (1/2)
[Figure: Vertex-centric | Edge-centric]
Vertex-Centric vs. Edge-Centric (2/2)

Vertex-centric:

    Until convergence {
        // the scatter phase
        for all vertices v
            for all outgoing edges e from v:
                e.update = f(v.value)
        // the gather phase
        for all vertices v
            for all incoming edges e to v:
                v.value = g(v.value, e.update)
    }

Edge-centric:

    Until convergence {
        // the scatter phase
        for all edges e
            u = new update
            u.dst = e.dst
            u.value = f(e.src.value)
        // the gather phase
        for all updates u
            u.dst.value = g(u.dst.value, u.value)
    }
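The two models can compute the same pass with very different access patterns, as the sketch below illustrates. The vertex-centric version walks adjacency lists vertex by vertex, which requires an index from each vertex to its edges. The edge-centric version streams an unordered edge list sequentially and emits a list of updates, which it then streams again. Both function names are hypothetical. The results match only under an assumption: g must be commutative and associative (max here), because the two versions apply updates in different orders.

```python
from collections import defaultdict


def vertex_centric_pass(values, out_adj, f, g):
    # Hypothetical: iterate vertices; needs adjacency lists indexed by vertex.
    in_adj = defaultdict(list)
    edge_update = {}
    for v, dsts in out_adj.items():      # scatter: per vertex, per outgoing edge
        for dst in dsts:
            edge_update[(v, dst)] = f(values[v])
            in_adj[dst].append(v)
    new_values = dict(values)
    for v in values:                     # gather: per vertex, per incoming edge
        for src in in_adj[v]:
            new_values[v] = g(new_values[v], edge_update[(src, v)])
    return new_values


def edge_centric_pass(values, edge_list, f, g):
    # Hypothetical: stream an unordered edge list; no per-vertex index needed.
    updates = [(dst, f(values[src])) for src, dst in edge_list]   # scatter
    new_values = dict(values)
    for dst, update in updates:                                   # gather
        new_values[dst] = g(new_values[dst], update)
    return new_values


if __name__ == "__main__":
    values = {0: 5, 1: 1, 2: 3, 3: 0}
    out_adj = {0: [1, 2], 1: [3], 2: [3]}
    edge_list = [(2, 3), (0, 1), (1, 3), (0, 2)]  # same edges, arbitrary order
    print(vertex_centric_pass(values, out_adj, lambda x: x, max))
    print(edge_centric_pass(values, edge_list, lambda x: x, max))
```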
Edge-Centric Scatter-Gather

    Until convergence {
        // the gather phase
        for all updates u
            u.dst.value = g(u.dst.value, u.value)
        // the scatter phase
        for all edges e
            u = new update
            u.dst = e.dst
            u.value = f(e.src.value)
    }
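As an illustration of the full edge-centric loop, the sketch below computes connected components by label propagation. Each vertex starts with its own ID as its label. The scatter phase streams the edge list and emits (dst, label[src]) updates (f is the identity). The gather phase streams the updates and keeps the smaller label (g = min). The loop stops when a gather phase changes nothing. The function name is hypothetical. Treating the graph as undirected by streaming both edge directions is an assumption, and the updates are kept in an in-memory list rather than written to storage.

```python
def connected_components(num_vertices: int, edges: list[tuple[int, int]]) -> list[int]:
    # Hypothetical edge-centric program: min-label propagation.
    label = list(range(num_vertices))              # each vertex starts with its own ID
    stream = edges + [(v, u) for u, v in edges]    # undirected: stream both directions
    while True:  # "Until convergence"
        # Scatter phase: stream edges, emit update (dst, f(src.value)) with f = identity.
        updates = [(dst, label[src]) for src, dst in stream]
        # Gather phase: stream updates, apply g = min to the destination's value.
        changed = False
        for dst, value in updates:
            if value < label[dst]:
                label[dst] = value
                changed = True
        if not changed:
            return label


if __name__ == "__main__":
    print(connected_components(5, [(0, 1), (1, 2), (3, 4)]))
```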
Vertex-Centric Processing Platforms
Pregel and GraphLab
Pregel
Pregel
◮ Large-scale graph-parallel processing platform developed at Google.
◮ Inspired by the bulk synchronous parallel (BSP) model.
Programming Model
◮ Vertex-centric programming: think like a vertex.
◮ Each vertex computes its own value individually, in parallel with the others.
◮ Each vertex can see its local context and update its value.
Execution Model (1/2)
◮ Applications run in a sequence of iterations, called supersteps.
◮ A vertex in superstep S can:
  • read messages sent to it in superstep S-1.
  • send messages to other vertices, which receive them in superstep S+1.
  • modify its state.
Execution Model (2/2)
◮ Superstep 0: all vertices are in the active state.
◮ A vertex deactivates itself by voting to halt when it has no further work to do.
◮ A halted vertex becomes active again if it receives a message.
◮ The whole algorithm terminates when:
  • all vertices are simultaneously inactive, and
  • there are no messages in transit.
Example: Max Value (1/4)

    i_val := val
    for each message m
        if m > val then val := m
    if superstep > 0 and i_val == val then
        vote_to_halt
    else
        for each neighbor v
            send_message(v, val)
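The following is a small single-process simulation of the superstep loop, running the max-value vertex program. Messages sent in superstep S are delivered at S+1, and a vertex that votes to halt is reactivated only by an incoming message. The run ends when no vertex is active and no messages are in transit. In this sketch every vertex sends its value in superstep 0; otherwise no vertex would ever send a message. The function name `pregel_max_value` is hypothetical. Vertices run sequentially here, whereas Pregel runs them in parallel, with a barrier between supersteps.

```python
from collections import defaultdict


def pregel_max_value(initial: dict[int, int],
                     neighbors: dict[int, list[int]]) -> tuple[dict[int, int], int]:
    # Hypothetical BSP simulator; returns final values and number of supersteps run.
    value = dict(initial)
    active = set(value)              # superstep 0: every vertex is active
    inbox = defaultdict(list)        # messages readable in the current superstep
    superstep = 0
    while active or any(inbox.values()):
        # A halted vertex that received a message becomes active again.
        active |= {v for v, msgs in inbox.items() if msgs}
        outbox = defaultdict(list)   # messages delivered at superstep + 1
        for v in sorted(active):     # sorted() copies, so discarding below is safe
            i_val = value[v]
            for m in inbox[v]:
                if m > value[v]:
                    value[v] = m
            if superstep > 0 and i_val == value[v]:
                active.discard(v)    # vote_to_halt
            else:
                for u in neighbors[v]:
                    outbox[u].append(value[v])   # send_message(u, val)
        inbox = outbox               # barrier: messages become visible next superstep
        superstep += 1
    return value, superstep


if __name__ == "__main__":
    # Path graph with values 3 - 6 - 2 - 1.
    print(pregel_max_value({0: 3, 1: 6, 2: 2, 3: 1},
                           {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}))
```

For this path, the simulation takes four supersteps (0 through 3) before every vertex holds 6 and has halted.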