Pregel: A System for Large-Scale Graph Processing




  1. Pregel: A System for Large-Scale Graph Processing. Written by G. Malewicz et al. at SIGMOD 2010. Presented by Chris Bunch, Tuesday, October 12, 2010.

  2. Graphs are hard
     • Poor locality of memory access
     • Very little work per vertex
     • Changing degree of parallelism
     • Running over many machines makes the problem worse

  3. State of the Art Today
     • Write your own infrastructure: substantial engineering effort
     • Use MapReduce: inefficient, since graph state must be stored in each stage and there is too much communication between stages

  4. State of the Art Today
     • Use a single-computer graph library: not scalable ☹
     • Use existing parallel graph systems: no fault tolerance ☹

  5. Bulk Synchronous Parallel
     • Series of iterations (supersteps)
     • Each vertex invokes a function in parallel
     • Can read messages sent in the previous superstep
     • Can send messages, to be read at the next superstep
     • Can modify the state of outgoing edges

  6. Compute Model
     • You give Pregel a directed graph
     • It runs your computation at each vertex
     • This repeats until every vertex votes to halt
     • Pregel gives you a directed graph back
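     A minimal sketch of the halting rule above, using a hypothetical single-machine toy driver (ToyVertex, RunSupersteps, and the message plumbing are illustrative stand-ins, not Pregel's real internals): the loop keeps executing supersteps until every vertex has voted to halt, and a delivered message reactivates a halted vertex.

        #include <string>
        #include <unordered_map>
        #include <utility>
        #include <vector>

        struct ToyVertex {
          bool active = true;
          std::vector<std::string> inbox;                   // messages from superstep S-1
          std::vector<std::pair<int, std::string>> outbox;  // (target id, message) for S+1

          void VoteToHalt() { active = false; }

          void Compute() {
            // Real user logic would read inbox, update state, and append
            // (target id, message) pairs to outbox; this placeholder just halts.
            VoteToHalt();
          }
        };

        bool AnyActive(const std::unordered_map<int, ToyVertex>& graph) {
          for (const auto& entry : graph)
            if (entry.second.active) return true;
          return false;
        }

        void RunSupersteps(std::unordered_map<int, ToyVertex>& graph) {
          while (AnyActive(graph)) {
            // Superstep S: run Compute() on every active vertex
            // (in parallel in Pregel; sequentially here for simplicity).
            for (auto& entry : graph)
              if (entry.second.active) entry.second.Compute();

            // Barrier: deliver messages so they are readable in superstep S+1.
            for (auto& entry : graph) entry.second.inbox.clear();
            for (auto& entry : graph) {
              for (const auto& m : entry.second.outbox) {
                auto it = graph.find(m.first);
                if (it == graph.end()) continue;  // real Pregel would invoke a user handler here
                it->second.inbox.push_back(m.second);
                it->second.active = true;         // a message wakes the recipient
              }
              entry.second.outbox.clear();
            }
          }
        }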

  7. Primitives
     • Vertices are first-class
     • Edges are not
     • Both can be dynamically created and destroyed

  8. Vertex State Machine

  9. C++ API
     • Your code subclasses Vertex and writes a Compute method
     • Can get/set the vertex value
     • Can get/set outgoing edges' values
     • Can send/receive messages
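     The slide's description maps onto a templated Vertex base class. The sketch below is a reconstruction of that interface in the spirit of the paper, not the verbatim header, so the names and signatures should be treated as approximate.

        #include <cstdint>
        #include <string>

        // Reconstructed sketch of the user-facing Vertex interface this slide
        // describes (value access, edge access, messaging, halting).
        class MessageIterator;   // iterates over messages from the previous superstep
        class OutEdgeIterator;   // iterates over this vertex's outgoing edges

        template <typename VertexValue, typename EdgeValue, typename MessageValue>
        class Vertex {
         public:
          virtual ~Vertex() = default;

          // User code overrides this; called once per superstep with the
          // messages sent to this vertex in the previous superstep.
          virtual void Compute(MessageIterator* msgs) = 0;

          const std::string& vertex_id() const;
          int64_t superstep() const;

          const VertexValue& GetValue();          // read the vertex value
          VertexValue* MutableValue();            // modify the vertex value
          OutEdgeIterator GetOutEdgeIterator();   // read/modify outgoing edge values

          void SendMessageTo(const std::string& dest_vertex,
                             const MessageValue& message);
          void VoteToHalt();
        };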

  10. C++ API
     • Message passing:
       • No guaranteed message delivery order
       • Messages are delivered exactly once
       • Can send messages to any node
       • If the destination doesn't exist, the user's function is called

  11. C++ API
     • Combiners (off by default):
       • User specifies a way to reduce many messages into one value (like Reduce in MapReduce)
       • Must be commutative and associative
       • Exceedingly useful in certain contexts (e.g., a 4x speedup on the shortest-path computation)
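     For the shortest-path case mentioned above, the natural combiner keeps only the minimum of the messages headed to the same destination. In real Pregel this would be a user-provided Combiner subclass; the self-contained free function below is just an illustration of the reduction it performs.

        #include <algorithm>
        #include <limits>
        #include <vector>

        // Collapse all pending distance messages bound for one destination
        // vertex into a single message carrying their minimum, so only one
        // value crosses the network (or hits disk).
        int CombineShortestPathMessages(const std::vector<int>& pending_distances) {
          int min_dist = std::numeric_limits<int>::max();
          for (int d : pending_distances)
            min_dist = std::min(min_dist, d);
          return min_dist;   // min is commutative and associative, so combining order doesn't matter
        }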

  12. C++ API
     • Aggregators:
       • User specifies a function
       • Each vertex sends it a value
       • Each vertex receives aggregate(vals)
       • Can be used for statistics or coordination
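     A hypothetical sketch of a "max" aggregator (illustrative names, not the real Pregel API): every vertex contributes one value during superstep S, the contributions are reduced with a commutative, associative operator, and the single result is visible to all vertices in superstep S+1.

        #include <algorithm>
        #include <climits>

        struct MaxAggregator {
          long long value = LLONG_MIN;   // running aggregate for the current superstep

          void Accumulate(long long contribution) {   // one call per contributing vertex
            value = std::max(value, contribution);
          }

          // Partial aggregates from different Workers can be merged in any order,
          // which is what permits the tree-shaped reduction described on slide 20.
          void Merge(const MaxAggregator& other) {
            value = std::max(value, other.value);
          }
        };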

  13. C++ API
     • Topology mutations:
       • Vertices can create / destroy vertices at will
     • Resolving conflicting requests:
       • Partial ordering: edge removal, vertex removal, vertex addition, edge addition
       • User-defined handlers: you fix the conflicts on your own

  14. C++ API
     • Input and output:
       • Text file
       • Vertices in a relational DB
       • Rows in BigTable
       • Custom: subclass the Reader/Writer classes

  15. Implementation
     • The executable is copied to many machines
     • One machine becomes the Master
       • Coordinates activities
     • Other machines become Workers
       • Perform the computation

  16. Implementation
     • The Master partitions the graph
     • The Master partitions the input
     • If a Worker receives input that is not for its vertices, it passes it along
     • Supersteps begin
     • The Master can tell Workers to save graphs

  17. Fault Tolerance
     • At each superstep S:
       • Workers checkpoint V, E, and messages
       • The Master checkpoints aggregators
     • If a node fails, everyone starts over at S
     • Confined recovery is under development
     • What happens if the Master fails?

  18. The Worker
     • Keeps the graph in memory
     • Message queues for supersteps S and S+1
     • Remote messages are buffered
     • The combiner is applied when messages are sent or received (saves network and disk)
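     A hypothetical sketch (not the real Pregel data structures) of the double-buffering described above: per local vertex, one queue of messages is consumed in the current superstep S while another fills for S+1, and the two swap roles at the superstep barrier.

        #include <string>
        #include <unordered_map>
        #include <utility>
        #include <vector>

        struct VertexQueues {
          std::vector<std::string> current;   // readable by Compute() in superstep S
          std::vector<std::string> next;      // arriving sends, readable in S+1
        };

        using WorkerInbox = std::unordered_map<std::string, VertexQueues>;  // keyed by vertex id

        void AdvanceSuperstep(WorkerInbox& inbox) {
          for (auto& entry : inbox) {
            entry.second.current = std::move(entry.second.next);  // S+1 becomes the new S
            entry.second.next.clear();                            // start collecting for S+2
          }
        }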

  19. The Master
     • Keeps track of which Workers own each partition
       • Not who owns each vertex
     • Coordinates all operations (via barriers)
     • Maintains statistics and runs an HTTP server where users can view job info

  20. Aggregators
     • A Worker passes values to its aggregator
     • The aggregator uses a tree structure to reduce values with other aggregators
       • Better parallelism than chain pipelining
     • The final value is sent to the Master

  21. PageRank in Pregel
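     The slide itself showed code. Below is a reconstruction of the paper's PageRank vertex program, written against the approximate Vertex sketch under slide 9; NumVertices() and SendMessageToAllNeighbors() are additional assumed helpers, so this is a sketch rather than standalone-compilable code, and details may differ from the real example.

        // Vertex values hold the current rank; messages carry rank
        // contributions from in-neighbors.
        class PageRankVertex : public Vertex<double, void, double> {
         public:
          void Compute(MessageIterator* msgs) override {
            if (superstep() >= 1) {
              // Sum the contributions sent by in-neighbors in the last superstep.
              double sum = 0;
              for (; !msgs->Done(); msgs->Next())
                sum += msgs->Value();
              *MutableValue() = 0.15 / NumVertices() + 0.85 * sum;
            }
            if (superstep() < 30) {
              // Spread this vertex's rank evenly over its out-edges (assumes
              // every vertex has at least one out-edge, as in the paper's
              // simplified example).
              int64_t n = GetOutEdgeIterator().size();
              SendMessageToAllNeighbors(GetValue() / n);
            } else {
              VoteToHalt();   // run a fixed number of supersteps, then stop
            }
          }
        };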

  22. Shortest Path in Pregel
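     Likewise, a reconstruction of the paper's single-source shortest-paths vertex program against the same sketched interface; IsSource() and the INF sentinel are assumed helpers, vertex values are best-known distances (initialized to INF), and edge values are integer weights.

        #include <algorithm>
        #include <climits>

        const int INF = INT_MAX;   // assumed "infinite distance" sentinel

        class ShortestPathVertex : public Vertex<int, int, int> {
         public:
          void Compute(MessageIterator* msgs) override {
            int mindist = IsSource(vertex_id()) ? 0 : INF;
            for (; !msgs->Done(); msgs->Next())
              mindist = std::min(mindist, msgs->Value());   // best distance offered so far
            if (mindist < GetValue()) {
              // Found a shorter path: record it and relax every outgoing edge.
              *MutableValue() = mindist;
              OutEdgeIterator iter = GetOutEdgeIterator();
              for (; !iter.Done(); iter.Next())
                SendMessageTo(iter.Target(), mindist + iter.GetValue());
            }
            VoteToHalt();   // reactivated only if an even shorter distance arrives
          }
        };

     This is the computation that the min-combiner sketched under slide 11 accelerates, since only the smallest candidate distance per destination vertex needs to be delivered.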

  23. Evaluation
     • 300 multicore commodity PCs used
     • Only running time is counted; checkpointing was disabled
     • Measures scalability of Worker tasks
     • Measures scalability w.r.t. the number of vertices
       • On binary trees and log-normal random graphs

  24.-26. Evaluation results (figure-only slides; charts, no text)

  27. Current / Future Work
     • The graph must fit in RAM; working on spilling to/from disk
     • Assigning vertices to machines to optimize traffic is an open problem
     • Want to investigate dynamic re-partitioning

  28. Conclusions
     • Pregel is production-ready and in use
     • Usable after a short learning curve
     • Vertex-centric thinking is not always easy
     • Pregel works best on sparse graphs with communication over edges
     • Can't change the API: too many people are using it!

  29. Related Work
     • Hama, from the Apache Hadoop team
       • BSP model, but not vertex-centric like Pregel
       • Appears not to be ready for real use

  30. Related Work
     • Phoebus, released last week on GitHub
       • Runs on Mac OS X
     • Cons (as of this writing):
       • Doesn't work on Linux
       • Must write code in Erlang (since Phoebus is written in it)

  31. Thanks!
     • To my advisor, Chandra Krintz
     • To Google for this paper
     • To all of you for coming!
