Families of distributed graph algorithms
Divide and conquer
Márton Balassi (mbalassi@ilab.sztaki.hu)
Hungarian Academy of Sciences – Institute for Computer Science and Control, Data Mining & Search Group
June 24, 2014
Table of contents
Distributing data-intensive algorithms
  Motivation
  MapReduce & Pregel
  Counting the number of triangles in a graph
Families of distributed graph algorithms
  Local algorithms
  Graph traversal based algorithms
  Matrix multiplication based algorithms
Experiments
  Representative algorithms
  Results
Distributing data-intensive algorithms
A bit about myself
My background
◮ BSc, MSc in Computer Science, Eötvös University Budapest
◮ BA in Economics, TU Budapest
◮ Distributed algorithms
◮ Big data architecture
Motivation
Let’s do a PageRank on this graph...
◮ A large Portuguese web crawl ¹
◮ 3.1 · 10^9 nodes
◮ 1.1 · 10^11 edges
◮ 80 GB of compressed data
◮ Divide and conquer is almost mandatory
¹ A large crawl of the Portuguese Web Archive, obtained from Daniel Gomes
MapReduce [DG04]
Pregel [MAB+10]
Traits
◮ Bulk Synchronous Parallel [Val90]
◮ “Think like a vertex”
◮ Graph kept in memory
[Figure: Scheme of the BSP system. Source: Wikipedia, public domain]
Pregel [MAB+10]
[Figure: Pregel schema as perceived from a vertex, with incoming edges In 1 ... In n, outgoing edges Out 1 ... Out m, and supersteps t − 1 and t]
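The vertex-centric scheme can be illustrated with a small single-machine simulation. This is only a sketch under stated assumptions, not Pregel's actual API: `run_supersteps`, `propagate_max` and the `(target, payload)` message format are hypothetical names chosen for this example. Each vertex reads the messages sent to it in the previous superstep, updates its value, and emits messages that are delivered only after the barrier.

```python
def run_supersteps(adjacency, values, compute, max_supersteps=50):
    """Simulate BSP supersteps on one machine (illustrative, not Pregel's API)."""
    values = dict(values)
    inbox = {v: [] for v in adjacency}
    for superstep in range(max_supersteps):
        outbox = {v: [] for v in adjacency}
        for v, out_neighbours in adjacency.items():
            # After the first superstep a vertex only wakes up on incoming messages.
            if superstep > 0 and not inbox[v]:
                continue
            values[v], messages = compute(superstep, values[v], inbox[v], out_neighbours)
            for target, payload in messages:
                outbox[target].append(payload)
        if not any(outbox.values()):
            break  # no messages in flight: every vertex has halted
        inbox = outbox  # barrier: messages become visible in the next superstep
    return values


def propagate_max(superstep, value, messages, out_neighbours):
    """Vertex program: spread the largest value seen so far to all neighbours."""
    new_value = max([value] + messages)
    if superstep == 0 or new_value > value:
        return new_value, [(w, new_value) for w in out_neighbours]
    return new_value, []  # nothing changed, so stay silent
```

On the path 0 - 1 - 2 with initial values 3, 6 and 1, calling `run_supersteps({0: [1], 1: [0, 2], 2: [1]}, {0: 3, 1: 6, 2: 1}, propagate_max)` leaves every vertex with the value 6.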
Triangle Counter – Sequential algorithm
Every vertex runs a search for itself, bounded at depth three. Thus every triangle is counted three times.
You can do better by making use of an ordering on the vertices.
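Both variants can be sketched in a few lines. The code assumes an undirected graph stored as a dictionary that maps every vertex to the set of its neighbours; the function names are hypothetical. The first function inspects the neighbourhood of every vertex, so it sees each triangle three times and divides at the end. The second one only looks "upwards" in the vertex ordering, so each triangle is found once, from its smallest vertex.

```python
from itertools import combinations


def count_triangles_naive(adjacency):
    """Check every pair of neighbours of every vertex; each triangle is seen 3 times."""
    seen = 0
    for v, neighbours in adjacency.items():
        for a, b in combinations(sorted(neighbours), 2):
            if b in adjacency[a]:
                seen += 1
    return seen // 3


def count_triangles_ordered(adjacency):
    """Use the vertex ordering: a triangle u < v < w is only counted from u."""
    higher = {v: {w for w in nbrs if w > v} for v, nbrs in adjacency.items()}
    count = 0
    for u, higher_u in higher.items():
        for v in higher_u:
            # Common higher neighbours of u and v close a triangle u < v < w.
            count += len(higher_u & higher[v])
    return count
```

For the complete graph on four vertices both functions return 4.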
Triangle Counter – distributed algorithm
Representation
[Figure: the example graph on vertices 0, 1 and 2 and its representation]
Triangle Counter – distributed algorithm
First Map
Let’s send our ID to all of our neighbours possessing a higher ID than ours. Let’s send our neighbours to ourselves.
First Reduce
Let’s write out the information received.
[Figure: the example graph on vertices 0, 1 and 2]
Triangle Counter – distributed algorithm
Second Map
If the ID received is smaller than ours, let’s pass it on to our neighbours. Let’s send our neighbours to ourselves.
Second Reduce
If the ID received is our neighbour, then let’s increment a global counter.
[Figure: the example graph after the first round; vertex 0 holds [], vertex 1 holds [0], vertex 2 holds [1]]
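The two rounds can be simulated on a single machine as follows. This is a sketch, not the author's implementation: it assumes an undirected graph given as a dictionary from every vertex to its neighbour list, the function name is hypothetical, and in the second map it forwards a received ID only to neighbours with a higher ID than the forwarding vertex. That restriction is an assumption added here so that each triangle u < v < w is counted exactly once, at w.

```python
from collections import defaultdict


def count_triangles_two_rounds(adjacency):
    """Single-machine simulation of the two map/reduce rounds (illustrative only)."""
    # First map: send our ID to every neighbour with a higher ID.
    inbox = defaultdict(list)
    for v, neighbours in adjacency.items():
        for w in neighbours:
            if w > v:
                inbox[w].append(v)

    # First reduce: write out the received IDs together with our own neighbours.
    records = {v: (inbox[v], set(nbrs)) for v, nbrs in adjacency.items()}

    # Second map: pass each smaller ID on to our neighbours.
    # Assumption of this sketch: only to neighbours with a higher ID than ours.
    forwarded = defaultdict(list)
    for v, (received, neighbours) in records.items():
        for u in received:
            if u < v:
                for w in neighbours:
                    if w > v:
                        forwarded[w].append(u)

    # Second reduce: a forwarded ID that is also our neighbour closes a triangle.
    counter = 0
    for w, ids in forwarded.items():
        neighbours = records[w][1]
        counter += sum(1 for u in ids if u in neighbours)
    return counter
```

In a real MapReduce job the neighbour lists would travel as messages ("send our neighbours to ourselves") instead of being read from a shared `records` dictionary, and the counter would be a global job counter.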
Families of distributed graph algorithms
Local algorithms
Traits
◮ Depend only on a small neighbourhood of the given vertex or edge.
◮ “Trivial” candidates for parallel computing.
◮ Examples are fingerprint computation, the local clustering coefficient and the number of triangles.
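As an illustration of locality, the local clustering coefficient of a vertex can be computed from its neighbours and their adjacency sets alone. The sketch below assumes an undirected graph stored as a dictionary from vertex to neighbour set, uses the usual definition (links among the neighbours divided by the number of possible links), and returns 0.0 for vertices with fewer than two neighbours by convention; the function name is hypothetical.

```python
from itertools import combinations


def local_clustering_coefficient(adjacency, v):
    """Fraction of neighbour pairs of v that are themselves connected."""
    neighbours = adjacency[v]
    k = len(neighbours)
    if k < 2:
        return 0.0  # convention chosen for this sketch
    links = sum(1 for a, b in combinations(neighbours, 2) if b in adjacency[a])
    return 2.0 * links / (k * (k - 1))
```

Because each call touches only one neighbourhood, the vertices can be processed independently of each other, which is what makes this family easy to parallelise.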
Graph traversal based algorithms
Traits
◮ Depend on following long paths in the graph.
◮ Difficult to implement in a distributed environment.
◮ The distributed algorithm can be less efficient than the sequential one, as the representation is less powerful.
◮ Examples include reachability, betweenness centrality and strongly connected components.
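A minimal sketch of reachability as round-by-round frontier expansion shows where the cost comes from. It assumes a graph stored as a dictionary from vertex to a list of out-neighbours; the function name is hypothetical. Each loop iteration stands for one synchronised round (a superstep or a MapReduce job), and the count includes the final round that discovers nothing new.

```python
def reachable_with_rounds(adjacency, source):
    """Return the vertices reachable from source and the number of rounds used."""
    visited = {source}
    frontier = {source}
    rounds = 0
    while frontier:
        next_frontier = set()
        for v in frontier:
            for w in adjacency[v]:
                if w not in visited:
                    visited.add(w)
                    next_frontier.add(w)
        frontier = next_frontier  # barrier between two rounds
        rounds += 1
    return visited, rounds
```

On the directed path 0 → 1 → 2 → 3 the sketch needs four rounds, one per vertex on the path, so the number of synchronised rounds grows with the length of the longest shortest path from the source.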