

1. PREGEL: A SYSTEM FOR LARGE-SCALE GRAPH PROCESSING
Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski (2010)
Presented by K. M. D. Muthumali Karunarathna, 27th October 2015
University of Cambridge, Computer Laboratory, R212

2. Outline
■ Problem
■ Current solutions
■ Limitations of current solutions
■ Pregel
■ Future work

3. Problem
Large graphs, with billions of vertices and trillions of edges, pose challenges to efficient processing.
e.g. web graphs, social networks (Facebook, Twitter)

4. Frequently applied algorithms
■ Shortest path computation
■ Different flavors of clustering (e.g. k-means, k-median)
■ Variations on the PageRank theme (PageRank is a "vote", by all the other pages on the web, about how important a page is)

5. Problems with graph algorithms
■ Poor locality of memory access
■ Very little work per vertex
■ A changing degree of parallelism over the course of execution

6. Implementing an algorithm for a large graph
■ Crafting a custom distributed infrastructure
■ Relying on an existing distributed computing platform such as MapReduce
■ Using a single-computer graph algorithm library such as BGL or LEDA
■ Using an existing parallel graph system such as Parallel BGL or CGMgraph
None of these alternatives fits the authors' purpose.

7. The solution, simply
■ A computational model
− expressed as a sequence of iterations,
− in which a vertex can receive messages sent in the previous iteration,
− send messages to other vertices,
− and modify its own state and that of its outgoing edges
■ An efficient, scalable, fault-tolerant implementation on clusters
■ Its implied synchronicity makes reasoning about programs easier

8. Model of computation
■ Consists of a sequence of iterations (supersteps), in which the same user-defined function is executed for each vertex.
■ This function specifies the behavior at a single vertex V in superstep S. It can read messages sent to the vertex in superstep S-1, send messages to other vertices that will be read in superstep S+1, and modify the state of V and its outgoing edges.

9. Vertex state machine
[Diagram: two states, Active and Inactive. 'Vote to halt' moves a vertex from Active to Inactive; 'message received' moves it back to Active.]
■ Initially, each vertex is in the active state. Each vertex can 'vote to halt', after which it runs no further computation in any subsequent superstep unless it receives a message from another vertex. It is then reactivated and must explicitly vote to halt again to deactivate itself. The algorithm terminates when all vertices have halted and no messages are in transit.
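To make the model of computation and this state machine concrete, here is a minimal single-machine Python sketch. The names (`Vertex`, `compute`, `run_supersteps`) and the driver loop are illustrative assumptions of this sketch, not the actual Pregel API, which the paper presents as a C++ library.

```python
# A minimal, single-machine sketch of the superstep model and the
# vote-to-halt state machine described on slides 8 and 9.

class Vertex:
    def __init__(self, vertex_id, value, out_edges):
        self.id = vertex_id
        self.value = value
        self.out_edges = out_edges  # targets of this vertex's out-edges
        self.active = True          # the "active" state in the diagram
        self.outbox = []            # (target_id, message) pairs

    def send_message(self, target, message):
        # Queued now, delivered at the start of superstep S+1.
        self.outbox.append((target, message))

    def vote_to_halt(self):
        self.active = False

    def compute(self, messages):
        # The user-defined function executed for each vertex, once per
        # superstep; subclasses override this.
        raise NotImplementedError


def run_supersteps(vertices):
    """Run synchronous supersteps until every vertex has voted to halt
    and no messages are in transit. vertices: {vertex_id: Vertex}."""
    inbox = {vid: [] for vid in vertices}
    while True:
        # An inactive vertex is reactivated by an incoming message.
        for vid, v in vertices.items():
            if inbox[vid]:
                v.active = True
        if not any(v.active for v in vertices.values()):
            break  # all halted, nothing in transit: terminate
        for vid, v in vertices.items():
            if v.active:
                v.compute(inbox[vid])
        # Barrier: gather outgoing messages for the next superstep.
        inbox = {vid: [] for vid in vertices}
        for v in vertices.values():
            for target, msg in v.outbox:
                inbox[target].append(msg)
            v.outbox = []
```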

10. Pregel example: find the maximum value
Superstep 0: 3 6 2 1
Superstep 1: 6 6 2 6
Superstep 2: 6 6 6 6
Superstep 3: 6 6 6 6
(Values propagate until every vertex holds the maximum, 6, and all vertices have voted to halt.)
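Written against the sketch framework above, the per-vertex function for this example might look as follows. The directed-cycle topology in the usage lines is an assumption for illustration (the slide does not fix the edges); on any connected graph the values converge to the maximum in the same way.

```python
# Maximum-value propagation: each vertex adopts the largest value it
# hears and keeps broadcasting only while its value is still changing.
class MaxValueVertex(Vertex):
    def compute(self, messages):
        old_value = self.value
        for m in messages:
            self.value = max(self.value, m)
        if not messages or self.value > old_value:
            # Superstep 0, or we learned a larger value: tell neighbors.
            for target in self.out_edges:
                self.send_message(target, self.value)
        else:
            self.vote_to_halt()

# Example run on a directed cycle 0 -> 1 -> 2 -> 3 -> 0 with the
# slide's initial values:
graph = {0: (3, [1]), 1: (6, [2]), 2: (2, [3]), 3: (1, [0])}
vertices = {i: MaxValueVertex(i, v, e) for i, (v, e) in graph.items()}
run_supersteps(vertices)
assert all(v.value == 6 for v in vertices.values())
```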

11. The Pregel solution
■ Allows efficient processing of large, distributively stored graphs.
■ Abstracts away distributed-computing issues such as fault tolerance.
■ A 'vertex-centric' system: all the programmer needs to do is write a single per-vertex function.

12. Pregel in detail
■ Master node
1. Coordinates the computation and maintains a list of all workers.
2. Maintains the aggregators.
■ Aggregators
1. Nodes send the master a value at each iteration for aggregation.
2. Provide a global statistic to each node at each superstep, which is important for some algorithms, such as Dijkstra's algorithm.
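A sketch of the aggregator idea, continuing the toy framework above (the class and method names here are hypothetical): values contributed during superstep S are reduced at the barrier, and the result becomes visible to every vertex in superstep S+1.

```python
# A minimum aggregator: vertices contribute values during superstep S;
# the reduced result is readable by all vertices in superstep S+1.
class MinAggregator:
    def __init__(self):
        self.partial = float("inf")       # contributions this superstep
        self.global_value = float("inf")  # result of the last superstep

    def aggregate(self, value):
        # Called from a vertex's compute() during the current superstep.
        self.partial = min(self.partial, value)

    def end_of_superstep(self):
        # Master-side barrier: publish the reduction and reset.
        self.global_value = self.partial
        self.partial = float("inf")
```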

13.
■ Combiners: combine messages addressed to the same vertex, to reduce message traffic.
■ Input and output: can be generated from any arbitrary format and stored in a form most suitable for a given application.
■ Fault tolerance: achieved through checkpointing. The master instructs workers to save their state (vertex values, edge values, incoming messages) to persistent storage at the beginning of each superstep. If the master detects workers as down, it reassigns their partitions to the available workers and recomputes the superstep.
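A combiner applies when the algorithm only needs a reduction of the incoming messages rather than each message individually. A minimal sketch, assuming a per-worker outgoing queue keyed by target vertex (the function name and queue layout are illustrative):

```python
# Minimum combiner, e.g. for shortest paths: merge all messages queued
# on a worker for the same target into one before sending them over the
# network, since the receiving vertex only needs the smallest distance.
def combine_min(pending):
    """pending maps target vertex id -> list of queued messages."""
    return {target: [min(msgs)] for target, msgs in pending.items() if msgs}
```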

14. Pregel example: SSSP (single-source shortest paths)
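The paper gives this example in C++; the following is a Python paraphrase against the toy framework above, where the `is_source` flag and the `(target, weight)` edge layout are assumptions of this sketch.

```python
INF = float("inf")

# Single-source shortest paths: self.value holds the tentative distance,
# and out_edges is a list of (target_id, edge_weight) pairs.
class SSSPVertex(Vertex):
    def __init__(self, vertex_id, is_source, out_edges):
        super().__init__(vertex_id, INF, out_edges)
        self.is_source = is_source

    def compute(self, messages):
        # Best distance offered so far: 0 at the source, otherwise the
        # smallest path length proposed by an in-neighbor.
        min_dist = 0 if self.is_source else INF
        for m in messages:
            min_dist = min(min_dist, m)
        if min_dist < self.value:
            # Found a shorter path: record it and relax outgoing edges.
            self.value = min_dist
            for target, weight in self.out_edges:
                self.send_message(target, min_dist + weight)
        # Halt until a neighbor offers a shorter path.
        self.vote_to_halt()
```

Every vertex votes to halt at the end of each superstep and is woken only when a neighbor offers a shorter path; combined with the minimum combiner sketched earlier, message traffic drops because the receiving vertex only ever acts on the smallest incoming distance.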

15. Experiments: SSSP with varying graph size and number of workers
[Figure 7: SSSP on a 1-billion-vertex binary tree, varying the number of worker tasks scheduled on 300 multicore machines; x-axis: number of worker tasks, y-axis: runtime (seconds).]
[Figure 9: SSSP on log-normal random graphs with mean out-degree 127.1 (thus over 127 billion edges in the largest case), varying graph size on 800 worker tasks scheduled on 300 multicore machines; x-axis: number of vertices, y-axis: runtime (seconds).]

16. Critical analysis
■ When a failure occurs, it is not clear whether only the work for the reassigned graph partitions is recomputed, or the work of the entire superstep.
■ The paper does not address when infinite loops might occur or how to account for them.

17. Future work
■ Partitioning based on the topology of the graph
■ Handling complex parallelizable functions over the whole graph
■ Avoiding waiting for slow workers
■ Confined recovery, to improve the cost and latency of recovery

18. THANK YOU
