Efficient Diameter Approximation for Large Graphs in MapReduce
Geppino Pucci, Università di Padova, Italy
Based on joint works ([SPAA15], [IPDPS16]) with: Matteo Ceccarello, Andrea Pietracaprina (U. Padova), Eli Upfal (Brown U.)
Outline
1. Context
2. Computational model
3. Previous work
4. Diameter approximation algorithm
5. Experiments
6. Conclusions
Context

Scenario
◮ Large graph analytics: major discovery tool for diverse application domains (e.g., social/road/biological network analysis, cybersecurity, NLP, cognitive computing)
◮ (Commodity) computer clusters: cheap, widespread platforms with relatively high communication/synchronization costs

Focus
◮ Approximation of the graph diameter
◮ Very large, undirected, weighted (sparse) graphs
◮ Linear space, few parallel rounds, practical efficiency
Computational Model

MR model [PPRSU12]
◮ Abstraction of popular programming frameworks (MapReduce/Hadoop, Spark)
◮ Builds upon and simplifies [Karloff+'10], [Goodrich'11]
◮ Underlying platform: unspecified number of interconnected commodity machines
◮ Algorithm: sequence of rounds
◮ Two parameters: max local space M_L, max aggregate space M_A

MR(M_L, M_A) round: transforms a multiset X of key-value pairs into a new multiset Y of key-value pairs by applying a given reduce function to all input pairs with the same key.
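For concreteness, a minimal sequential emulation of one MR(M_L, M_A) round (a sketch of my own, not code from the paper or from any framework; it only illustrates the group-by-key-then-reduce semantics and a simplified local-space check):

```python
from collections import defaultdict

def mr_round(pairs, reduce_fn, max_local_space=None):
    """Emulate one MR(M_L, M_A) round: group the key-value pairs by key,
    then apply the reduce function to each group independently.
    In a real deployment each group must fit in M_L and the whole multiset
    in M_A; here we only check a simplified local-space constraint."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    out = []
    for key, values in groups.items():
        if max_local_space is not None and len(values) > max_local_space:
            raise MemoryError(f"group {key!r} exceeds local space M_L")
        out.extend(reduce_fn(key, values))   # the reducer emits new key-value pairs
    return out

# Example: one round summing the values of each key
pairs = [("a", 1), ("b", 2), ("a", 3)]
print(mr_round(pairs, lambda k, vs: [(k, sum(vs))]))   # [('a', 4), ('b', 2)]
```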
Previous Work

Sequential setting
◮ APSP (Johnson's algorithm): O(n·m + n² log n) time
◮ Roditty et al. (STOC'13, SODA'14): 3/2-approximation in O(min{m^{3/2}, m·n^{2/3}}) = o(n·m) time
◮ Empirically: very few SSSPs guarantee accurate estimates ([MLH09, CGHLM13, C+12, C+13, C+15])

Parallel setting
◮ Exact diameter through matrix multiplication: O(log n) rounds but Ω(n²) space
◮ Cohen (JACM'00): (1 + ε)-approximation in O(poly(log n)) time and superlinear space; not easy to implement
Previous Work (cont'd)

2-approximation achievable through SSSP PRAM algorithms

∆-stepping (Meyer and Sanders, JoA'03)
◮ Parallel time-work tradeoff by staggering edge relaxations (d_j ← min{d_j, d_i + w_ij})
◮ At iteration i, compute distances in [(i−1)∆, i∆]
◮ Small ∆: ≃ Dijkstra. Large ∆: ≃ Bellman-Ford
◮ Round complexity: Ω(ℓ_{Φ(G)}), where ℓ_{Φ(G)} edges are required to connect any two nodes at distance Φ(G)

Our aim: diameter approximation in linear space and o(ℓ_{Φ(G)}) rounds
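As a reference point, a compact sequential sketch of ∆-stepping (hedged: a simplified single-threaded variant that does not separate light and heavy edges; it only illustrates the bucketed relaxations that the parallel algorithm staggers across rounds):

```python
import math
from collections import defaultdict

def delta_stepping(adj, source, delta):
    """Sequential sketch of Delta-stepping SSSP.
    adj: {u: [(v, w), ...]}; bucket i holds tentative distances in [i*delta, (i+1)*delta)."""
    dist = defaultdict(lambda: math.inf)
    dist[source] = 0.0
    buckets = defaultdict(set)
    buckets[0].add(source)
    while buckets:
        i = min(buckets)                # process the lowest non-empty bucket
        frontier = buckets.pop(i)
        while frontier:                 # relax until bucket i stops changing
            next_frontier = set()
            for u in frontier:
                for v, w in adj[u]:
                    nd = dist[u] + w    # relaxation: d_v <- min(d_v, d_u + w_uv)
                    if nd < dist[v]:
                        dist[v] = nd
                        b = int(nd // delta)
                        (next_frontier if b == i else buckets[b]).add(v)
            frontier = next_frontier
    return dict(dist)

# usage: dist = delta_stepping({0: [(1, 2.0)], 1: [(0, 2.0)]}, source=0, delta=4.0)
```

Small ∆ approaches Dijkstra (one node per bucket pass), large ∆ approaches Bellman-Ford (everything in one bucket), which is the time-work tradeoff mentioned above.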
Diameter approximation: high-level strategy

Based on shallow-depth clustering:
1. Compute a decomposition C of G into clusters of small radius
2. Estimate the diameter Φ(G) from the diameter Φ(G_C), where G_C is a suitable quotient graph derived from C (a minimal sketch of this step follows below)

Remarks
◮ Previous decompositions ([MPX13, Mey08]) do not guarantee small (unweighted + weighted) radius
◮ Cluster granularity is chosen so that G_C fits into local memory
◮ Small radius → low round complexity, better approximation
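To make step 2 concrete, here is an in-memory sketch of my own (a simplified formulation, not necessarily the paper's exact construction): each quotient edge is weighted by the cheapest center-to-center route through a single inter-cluster edge, Φ(G_C) is computed locally since G_C fits in memory, and the two largest cluster radii are added, which gives a valid but slightly looser bound than the worked example later in the talk.

```python
import heapq
from collections import defaultdict

def quotient_diameter_bound(adj, cluster_of, dist_to_center, radius):
    """adj: {u: [(v, w), ...]} (undirected, connected, >= 2 clusters);
    cluster_of[u]: cluster id of u; dist_to_center[u]: weighted distance of u
    from its cluster center; radius[c]: weighted radius of cluster c.
    Builds a quotient graph G_C and returns an upper bound on Phi(G)."""
    # Quotient edge (c1, c2): cheapest center-to-center route that uses
    # exactly one inter-cluster edge of G.
    qedges = {}
    for u, nbrs in adj.items():
        for v, w in nbrs:
            c1, c2 = cluster_of[u], cluster_of[v]
            if c1 == c2:
                continue
            key = (min(c1, c2), max(c1, c2))
            cand = dist_to_center[u] + w + dist_to_center[v]
            qedges[key] = min(qedges.get(key, float("inf")), cand)

    qadj = defaultdict(list)
    for (c1, c2), w in qedges.items():
        qadj[c1].append((c2, w))
        qadj[c2].append((c1, w))

    def dijkstra(src):                 # G_C fits in local memory: plain Dijkstra
        dist = {src: 0.0}
        pq = [(0.0, src)]
        while pq:
            d, x = heapq.heappop(pq)
            if d > dist[x]:
                continue
            for y, w in qadj[x]:
                if d + w < dist.get(y, float("inf")):
                    dist[y] = d + w
                    heapq.heappush(pq, (d + w, y))
        return dist

    clusters = set(cluster_of.values())
    phi_qc = max(max(dijkstra(c).values()) for c in clusters)
    # For any u, v: dist(u, v) <= r(C(u)) + dist_{G_C}(C(u), C(v)) + r(C(v)),
    # so Phi(G_C) plus the two largest radii upper-bounds Phi(G).
    largest = sorted(radius.values())[-2:]
    return phi_qc + sum(largest)
```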
Decomposition C: algorithm cluster(τ)

Challenges
Cluster centers are sampled at random. In order to attain small (unweighted + weighted) cluster radius we must:
1. Ensure higher sampling density in remote regions of the graph
2. Avoid heavy edges during cluster growth

Key ingredients
1. Progressive clustering strategy
2. ∆-stepping approach to cluster growing
Decomposition C: a pathological example [figure]
Decomposition C: algorithm cluster(τ)

Progressive clustering [CPPU15]
1. Select a random batch of τ centers from the uncovered nodes
2. Grow both old and new clusters until half of the uncovered nodes are covered
3. Repeat steps 1-2 until complete coverage

∆-stepping-like cluster growth [CPPU16]
◮ ∆ ← guess on the clusters' minimum weighted radius
◮ In each iteration of progressive clustering (steps 1-2):
  ◮ use only light edges (weight < ∆) and stop at radius ∆
  ◮ if the desired coverage cannot be obtained, then ∆ ← 2∆

(A sequential sketch of the combined strategy follows below.)
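A sequential, in-memory sketch of the combined strategy (hedged: the real algorithm stages the growth across MR rounds with ∆-stepping; here grow_clusters is a plain Dijkstra-style growth over light edges truncated at radius ∆, regrown from scratch whenever ∆ doubles, and the graph is assumed connected):

```python
import heapq
import math
import random

def cluster(adj, tau, delta0=1.0):
    """Sketch of cluster(tau) on a connected weighted graph adj: {u: [(v, w), ...]}.
    Returns (cluster_of, dist_to_center) mapping every node to a center."""
    cluster_of, dist_to_center = {}, {}
    uncovered, centers, delta = set(adj), [], delta0
    while uncovered:
        # Step 1: select a random batch of tau new centers among the uncovered nodes.
        centers.extend(random.sample(list(uncovered), min(tau, len(uncovered))))
        target = max(1, len(uncovered) // 2)    # Step 2: cover half of the uncovered nodes
        while True:
            covered = grow_clusters(adj, centers, delta, cluster_of, dist_to_center)
            still_uncovered = set(adj) - covered
            if len(uncovered) - len(still_uncovered) >= target or not still_uncovered:
                uncovered = still_uncovered
                break
            delta *= 2                          # coverage too small: double the radius guess
    return cluster_of, dist_to_center

def grow_clusters(adj, centers, delta, cluster_of, dist_to_center):
    """Grow all current clusters up to weighted radius delta, using only
    light edges (weight < delta): Dijkstra from all centers at once."""
    dist = {c: 0.0 for c in centers}
    owner = {c: c for c in centers}
    pq = [(0.0, c, c) for c in centers]
    heapq.heapify(pq)
    while pq:
        d, u, c = heapq.heappop(pq)
        if d > dist.get(u, math.inf):           # stale queue entry
            continue
        for v, w in adj[u]:
            if w >= delta:                      # skip heavy edges
                continue
            nd = d + w
            if nd <= delta and nd < dist.get(v, math.inf):
                dist[v], owner[v] = nd, c
                heapq.heappush(pq, (nd, v, c))
    cluster_of.update(owner)
    dist_to_center.update(dist)
    return set(owner)
```

The (cluster_of, dist_to_center) output matches the inputs assumed by the quotient-graph sketch shown earlier; the per-cluster radii it needs are simply the maxima of dist_to_center within each cluster.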
Algorithm cluster(τ): example (τ = 1, ∆ = 4)
[Figure: a weighted graph G on nodes labeled A-S; successive frames show the clusters grown from the 1st, 2nd, and 3rd batches of τ centers]
Decomposition C: algorithm cluster(τ)

Theorem. W.h.p., cluster(τ) computes a decomposition C of G into O(τ log² n) clusters, with
◮ max cluster radius O(R(G,τ) log n)
◮ round complexity O(min{n/τ, ℓ_{R(G,τ)}} · log n) on MR(n^ε, m), for any constant ε ∈ (0,1),
where
◮ R(G,τ) is the minimum max radius over all τ-clusterings of G
◮ ℓ_X is the max number of edges in a min-weight path of weight X
Diameter approximation: example
[Figure: a weighted graph G with weighted diameter Φ(G) = 16, decomposed into three clusters of radius 3, 4, and 2; the resulting quotient graph G_C has Φ(G_C) = 12, giving the estimate Φ(G) ≤ 12 + 4 + 2 = 18 (vs. the true 16)]
Diameter approximation: main result

Theorem. For a given weighted graph G, w.h.p. we can compute an upper bound on Φ(G) with
◮ approximation ratio O(log³ n)
◮ round complexity O(min{n/τ, ℓ_{R(G,τ)} · log n} · log n) on MR(n^ε, m), for any constant ε ∈ (0,1)

Remarks
◮ The round complexity becomes o(ℓ_{Φ(G)}/n^δ) on graphs of bounded doubling dimension
◮ Practical implementation: on real-world graphs, approximation ratio < 1.3
◮ Byproduct: linear-space, low-round k-center clustering in MR
Proof idea
◮ Two-phase decomposition strategy:
  ◮ Phase 1: compute an estimate R of R(G,τ) through progressive sampling
  ◮ Phase 2: perform log n iterations of cluster-growing steps of fixed radius R, from batches of centers selected with geometrically increasing probability
◮ O(log³ n) approximation: w.h.p. the nodes of each shortest-path segment of length R belong to O(log² n) clusters of radius O(R log n)
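A hedged back-of-the-envelope version of the O(log³ n) bound (the precise charging argument is in the paper): a shortest path of weight Φ(G) splits into ⌈Φ(G)/R⌉ segments of weight ≤ R; w.h.p. each segment meets O(log² n) clusters of weighted radius O(R log n), so the corresponding path in the quotient graph G_C has weight at most ⌈Φ(G)/R⌉ · O(log² n) · O(R log n) = O(Φ(G) log³ n). The returned estimate, Φ(G_C) plus the extremal cluster radii, is therefore at most O(log³ n) · Φ(G), while by construction it never underestimates Φ(G).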
Diameter approximation: experiments

Experimental setup
◮ In-house cluster with 16 machines
◮ 18 GB RAM / Intel i7 Nehalem 4-core processor each
◮ Spark MapReduce platform

Datasets
Graph         n               m                Φ(G)
roads-USA     23,947,347      29,166,673       55,859,820
roads-CAL     1,890,815       2,328,872        16,425,258
livejournal   3,997,962       32,681,189       9.41
twitter       41,652,230      1,468,365,182    9.07
mesh(S)       S²              2S(S−1)          †
R-MAT(S)      2^S             16·2^S           †
roads(S)      ≈ S·2.3·10^7    ≈ S·5.3·10^7     †
† the diameter depends on the size of the graph, controlled by S > 1.

[Scalability plot: running time (s) vs. number of machines (2¹-2⁴) for R-MAT(26) and roads(3)]
Diameter approximation: experiments

We compare our algorithm (CLUSTER) with ∆-stepping.

[Bar plots, log scale: number of rounds and running time (s) of CLUSTER vs. ∆-stepping on the benchmark graphs (roads-USA, roads-CAL, mesh, livejournal, twitter, R-MAT)]
Diameter approximation: experiments

[Bar plots: approximation ratio and total work of CLUSTER vs. ∆-stepping on the benchmark graphs]

The approximation quality does not depend on the granularity of the clustering.
Conclusions

Summary: an MR algorithm for an O(log³ n) approximation of the diameter of a large, undirected, weighted graph G
◮ o(ℓ_{Φ(G)}) rounds, linear global space, sublinear local space
◮ Good performance/approximation on real-world graphs

Ongoing and future work
◮ Tighter analysis of the approximation factor
◮ Clustering + constant doubling dimension yields a (1 + ε) (unweighted) diameter approximation in O((m + n)/ε) sequential time
◮ Clustering for approximate centrality computations

Software: GRADIAS, crono.dei.unipd.it/gradias