
Efficient Diameter Approximation for Large Graphs in MapReduce

Efficient Diameter Approximation for Large Graphs in MapReduce. Geppino Pucci, Università di Padova, Italy. Based on joint works ([SPAA15], [IPDPS16]) with Matteo Ceccarello, Andrea Pietracaprina (U. Padova), and Eli Upfal (Brown U.).


  1. Efficient Diameter Approximation for Large Graphs in MapReduce
     Geppino Pucci, Università di Padova, Italy
     Based on joint works ([SPAA15], [IPDPS16]) with Matteo Ceccarello, Andrea Pietracaprina (U. Padova), and Eli Upfal (Brown U.)

  2. Outline
     1. Context
     2. Computational model
     3. Previous work
     4. Diameter approximation algorithm
     5. Experiments
     6. Conclusions

  3. Context
     Scenario
     ◮ Large graph analytics: major discovery tool for diverse application domains (e.g., social/road/biological network analysis, cybersecurity, NLP, cognitive computing)
     ◮ (Commodity) computer clusters: cheap, widespread platforms with relatively high communication/synchronization costs
     Focus
     ◮ Approximation of graph diameter
     ◮ Very large, undirected, weighted (sparse) graphs
     ◮ Linear space, few parallel rounds, practical efficiency

  4. Computational Model
     MR model [PPRSU12]
     ◮ Abstraction of popular programming frameworks (MapReduce/Hadoop, Spark)
     ◮ Builds upon and simplifies [Karloff+’10] [Goodrich’11]
     ◮ Underlying platform: unspecified number of interconnected commodity machines
     ◮ Algorithm: sequence of rounds
     ◮ 2 parameters: max local space $M_L$, max aggregate space $M_A$
     MR($M_L$, $M_A$) round
     Transforms a multiset $X$ of key-value pairs into a new multiset $Y$ of key-value pairs by applying a given reduce function to all input pairs with the same key.
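The round abstraction on this slide can be simulated on a single machine. The following is a minimal sketch, not the authors' implementation: the function name `mr_round`, the parameter `max_local` (standing in for $M_L$), and the example reducer `min_distance` are all hypothetical. The aggregate-space bound $M_A$ is not modeled.

```python
from collections import defaultdict
from typing import Callable, Hashable, Iterable, List, Tuple

Pair = Tuple[Hashable, object]


def mr_round(pairs: Iterable[Pair],
             reduce_fn: Callable[[Hashable, List[object]], Iterable[Pair]],
             max_local: int) -> List[Pair]:
    """Simulate one round: group pairs by key, then apply reduce_fn per key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    output = []
    for key, values in groups.items():
        # Assumed reading of the local-space bound: one reducer may hold
        # at most max_local values at a time.
        if len(values) > max_local:
            raise MemoryError(f"key {key!r} exceeds local space {max_local}")
        output.extend(reduce_fn(key, values))
    return output


def min_distance(key, values):
    # Example reducer: keep the smallest tentative distance seen for a node.
    return [(key, min(values))]


result = sorted(mr_round([("a", 5), ("b", 2), ("a", 3)], min_distance, max_local=4))
print(result)  # [('a', 3), ('b', 2)]
```

An algorithm in this model is a sequence of such calls, each consuming the output multiset of the previous one.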

  5. Previous work
     Sequential setting
     ◮ APSP (Johnson’s alg.): $O(n \cdot m + n^2 \log n)$ time
     ◮ Roditty et al. (STOC’13, SODA’14): 3/2-approximation in $O(\min\{m^{3/2}, m n^{2/3}\}) = o(n \cdot m)$ time.
     ◮ Empirically: very few SSSPs guarantee accurate estimates ([MLH09, CGHLM13, C+12, C+13, C+15]).
     Parallel setting
     ◮ Exact diameter through matrix multiplication: $O(\log n)$ rounds but $\Omega(n^2)$ space.
     ◮ Cohen (JACM’00): $(1 + \epsilon)$-approximation in $O(\mathrm{poly}(\log n))$ time and superlinear space. Not easy to implement.

  6. Previous work (cont’d)
     2-approximation achievable through SSSP PRAM algorithms
     ∆-stepping (Meyer and Sanders, JoA’03)
     ◮ Parallel time-work tradeoff by staggering edge relaxations ($d_j \leftarrow \min\{d_j, d_i + w_{ij}\}$)
     ◮ At iteration $i$, compute distances in $[(i-1)\Delta, i\Delta]$.
     ◮ Small ∆’s: ≃ Dijkstra. Large ∆’s: ≃ Bellman-Ford
     ◮ Round complexity: $\Omega(\ell_{\Phi(G)})$, where $\ell_{\Phi(G)}$ edges are required to connect any two nodes at distance $\Phi(G)$.
     Our aim
     Diameter approximation in linear space and $o(\ell_{\Phi(G)})$ rounds
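To make the bucket structure of ∆-stepping concrete, here is a minimal sequential sketch in Python. It simulates the bucket-by-bucket relaxation order only and does not reproduce the PRAM or MR parallelization. The function name `delta_stepping` and the adjacency-list format are assumptions made for illustration. Light edges are taken to be those of weight at most ∆.

```python
import math
from collections import defaultdict


def delta_stepping(adj, source, delta):
    """Sequential simulation of Delta-stepping SSSP.

    adj maps each node to a list of (neighbor, weight) pairs, weights > 0.
    """
    dist = {v: math.inf for v in adj}
    buckets = defaultdict(set)  # bucket i holds nodes with dist in [i*delta, (i+1)*delta)

    def relax(v, d):
        # Edge relaxation: d_v <- min(d_v, d); move v to the matching bucket.
        if d < dist[v]:
            if dist[v] < math.inf:
                buckets[int(dist[v] // delta)].discard(v)
            dist[v] = d
            buckets[int(d // delta)].add(v)

    relax(source, 0)
    while any(buckets.values()):
        i = min(b for b, nodes in buckets.items() if nodes)
        settled = set()
        while buckets[i]:
            frontier, buckets[i] = buckets[i], set()
            settled |= frontier
            for u in frontier:
                for v, w in adj[u]:
                    if w <= delta:  # light edges may re-fill bucket i
                        relax(v, dist[u] + w)
        for u in settled:
            for v, w in adj[u]:
                if w > delta:  # heavy edges always land in a later bucket
                    relax(v, dist[u] + w)
    return dist


# Toy undirected triangle (each edge listed in both directions).
adj = {
    "a": [("b", 1), ("c", 5)],
    "b": [("a", 1), ("c", 2)],
    "c": [("a", 5), ("b", 2)],
}
dist = delta_stepping(adj, "a", delta=2)
ecc = max(dist.values())
print(dist, ecc)  # {'a': 0, 'b': 1, 'c': 3} 3
```

On a connected undirected graph, the eccentricity `ecc` of any single source satisfies `ecc <= Φ(G) <= 2 * ecc` by the triangle inequality, which is the 2-approximation mentioned on the slide.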

  7. Diameter approximation: high-level strategy
     Based on shallow-depth clustering:
     1. Compute a decomposition $C$ of $G$ into clusters of small radius
     2. Estimate diameter $\Phi(G)$ from diameter $\Phi(G_C)$, with $G_C$ a suitable quotient graph derived from $C$
     Remarks
     ◮ Previous decompositions ([MPX13, Mey08]) do not guarantee small (unweighted+weighted) radius
     ◮ Cluster granularity chosen so that $G_C$ fits into local memory
     ◮ Small radius → low round complexity, better approximation

  8. Decomposition C: algorithm cluster(τ)
     Challenges
     Cluster centers are sampled at random. In order to attain small (unweighted+weighted) cluster radius we must
     1. Ensure higher sampling density in remote regions of the graph
     2. Avoid heavy edges for cluster growth
     Key ingredients
     1. Progressive clustering strategy
     2. ∆-stepping approach to cluster growing

  9. Decomposition C: a pathological example
     [Figure not recovered]

  10. Decomposition C: algorithm cluster(τ)
      Progressive clustering [CPPU15]
      1. Select random batch of τ centers from uncovered nodes
      2. Grow both old and new clusters until covering half of the uncovered nodes
      3. Repeat steps 1-2 until complete coverage
      ∆-stepping-like cluster growth [CPPU16]
      ◮ ∆ ← guess on cluster’s minimum weighted radius
      ◮ In each iteration of progressive clustering (Steps 1-2):
        ◮ Use only light edges (weight < ∆) and stop at radius ∆
        ◮ If desired coverage cannot be obtained then ∆ ← 2∆
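The two ingredients above can be combined in a simplified, sequential Python sketch. This is an illustration under stated assumptions, not the algorithm of [CPPU15] or [CPPU16]: the names `cluster` and `grow` are hypothetical, growth is done by a bounded multi-source Dijkstra rather than in parallel rounds, partial growth is kept when ∆ is doubled, and node identifiers are assumed to be sortable.

```python
import heapq
import random


def grow(adj, center, dist, uncovered, delta, budget):
    """Grow all clusters by at most delta along light edges (weight < delta).

    Covers at most `budget` uncovered nodes and returns how many were covered.
    """
    heap = []
    for u in center:  # every covered node is a growth source
        for v, w in adj[u]:
            if v in uncovered and w < delta:
                heapq.heappush(heap, (w, v, u, w))
    grown = 0
    while heap and grown < budget:
        g, v, u, w = heapq.heappop(heap)
        if v not in uncovered:
            continue
        uncovered.discard(v)
        center[v] = center[u]
        dist[v] = dist[u] + w  # length of an actual path from v to its center
        grown += 1
        for x, wx in adj[v]:
            if x in uncovered and wx < delta and g + wx <= delta:
                heapq.heappush(heap, (g + wx, x, v, wx))
    return grown


def cluster(adj, tau, delta, seed=0):
    """Progressive clustering sketch; delta > 0 is the initial radius guess."""
    rng = random.Random(seed)
    center, dist = {}, {}
    uncovered = set(adj)
    max_w = max((w for nbrs in adj.values() for _, w in nbrs), default=0)
    while uncovered:
        need = (len(uncovered) + 1) // 2  # cover half of the uncovered nodes
        batch = rng.sample(sorted(uncovered), min(tau, len(uncovered)))
        for c in batch:
            center[c], dist[c] = c, 0
        uncovered -= set(batch)
        need -= len(batch)
        while need > 0 and uncovered:
            grown = grow(adj, center, dist, uncovered, delta, need)
            need -= grown
            if need > 0:
                if grown == 0 and delta > max_w:
                    break  # remaining nodes unreachable: sample a new batch
                delta *= 2  # desired coverage not obtained: double the guess
    return center, dist, delta


# Toy undirected path a-b-c-d-e-f with one heavy edge (c, d).
path = {
    "a": [("b", 1)],
    "b": [("a", 1), ("c", 1)],
    "c": [("b", 1), ("d", 5)],
    "d": [("c", 5), ("e", 1)],
    "e": [("d", 1), ("f", 1)],
    "f": [("e", 1)],
}
center, dist, final_delta = cluster(path, tau=1, delta=1, seed=1)
print(center, dist, final_delta)
```

The returned `dist[v]` is the length of the path along which `v` was reached from its center, so it upper-bounds the true distance to the center. The radius and round-complexity guarantees stated for cluster(τ) are not claimed for this sketch.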

  11. Algorithm cluster(τ): example (τ = 1, ∆ = 4)
      Graph G
      [Figure: weighted example graph G]

  12. Algorithm cluster(τ): example (τ = 1, ∆ = 4)
      1st batch of τ centers
      [Figure: weighted example graph G]

  13. Algorithm cluster(τ): example (τ = 1, ∆ = 4)
      1st batch of τ centers
      [Figure: weighted example graph G]

  14. Algorithm cluster(τ): example (τ = 1, ∆ = 4)
      2nd batch of τ centers
      [Figure: weighted example graph G]

  15. Algorithm cluster(τ): example (τ = 1, ∆ = 4)
      3rd batch of τ centers
      [Figure: weighted example graph G]

  16. Decomposition C: algorithm cluster(τ)
      Theorem
      W.h.p. cluster(τ) computes a decomposition $C$ of $G$ into $O(\tau \log^2 n)$ clusters
      ◮ Max cluster radius: $O(R(G, \tau) \log n)$
      ◮ Round complexity: $O(\min\{n/\tau, \ell_{R(G,\tau)}\} \log n)$ on MR($n^{\epsilon}$, $m$), for any constant $\epsilon \in (0, 1)$.
      where:
      ◮ $R(G, \tau)$: minimum max radius in any τ-clustering of $G$
      ◮ $\ell_X$: max number of edges in a min-weight path of weight $X$

  17. Diameter approximation: example
      Graph G, weighted diameter $\Phi(G) = 16$
      [Figure: weighted example graph G]

  18. Diameter approximation: example
      [Figure: graph G partitioned into three clusters with centers M, A, R; cluster radii 3, 4 and 2 as labeled in the figure; quotient graph $G_C$ with edge weights 7 and 5]
      Quotient graph $G_C$: $\Phi(G_C) = 12$
      $\Phi(G) \le 12 + 4 + 2 = 18$ (vs 16)
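The estimate in this example can be sketched in Python for an arbitrary clustering. Two assumptions are made here that the slides do not spell out. First, the weight of a quotient edge between two clusters is taken to be the minimum, over edges (u, v) crossing them, of dist(center, u) + w(u, v) + dist(v, center). Second, the sketch uses the coarser bound $\Phi(G_C) + 2 \cdot$ (max radius) instead of adding the radii of the two endpoint clusters as the slide does. The function names and the toy graph are hypothetical.

```python
import math


def quotient_graph(adj, center, dist):
    """Return {(c1, c2): weight} for every pair of adjacent clusters."""
    q = {}
    for u, nbrs in adj.items():
        for v, w in nbrs:
            cu, cv = center[u], center[v]
            if cu != cv:
                cand = dist[u] + w + dist[v]
                if cand < q.get((cu, cv), math.inf):
                    q[(cu, cv)] = cand
    return q


def diameter_upper_bound(adj, center, dist):
    """Upper bound on the weighted diameter of a connected graph."""
    centers = sorted(set(center.values()))
    d = {(a, b): (0 if a == b else math.inf) for a in centers for b in centers}
    d.update(quotient_graph(adj, center, dist))
    # Floyd-Warshall on the quotient graph, assumed small enough to fit
    # in the local memory of a single machine.
    for k in centers:
        for a in centers:
            for b in centers:
                if d[a, k] + d[k, b] < d[a, b]:
                    d[a, b] = d[a, k] + d[k, b]
    phi_c = max(d.values())
    radius = max(dist.values())
    return phi_c + 2 * radius


# Toy undirected path a-b-c-d with weights 1, 2, 1 (true diameter 4),
# split into clusters {a, b} centered at a and {c, d} centered at d.
adj = {
    "a": [("b", 1)],
    "b": [("a", 1), ("c", 2)],
    "c": [("b", 2), ("d", 1)],
    "d": [("c", 1)],
}
center = {"a": "a", "b": "a", "c": "d", "d": "d"}
dist = {"a": 0, "b": 1, "c": 1, "d": 0}
bound = diameter_upper_bound(adj, center, dist)
print(bound)  # 6
```

The bound holds because any two nodes can be joined by walking to their centers and then along quotient-graph edges, each of which corresponds to a real path in G of the same length.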

  19. Diameter approximation: main result
      Theorem
      For a given weighted graph $G$, w.h.p. we can compute an upper bound to $\Phi(G)$
      ◮ Approximation ratio: $O(\log^3 n)$
      ◮ Round complexity: $O(\min\{n/\tau, \ell_{R(G,\tau)} \log n\} \log n)$ on MR($n^{\epsilon}$, $m$), for any constant $\epsilon \in (0, 1)$.
      Remarks
      ◮ Round complexity becomes $o(\ell_{\Phi(G)} / n^{\delta})$ on graphs of bounded doubling dimension
      ◮ Practical implementation. On real-world graphs, approximation ratio < 1.3
      ◮ Byproduct: linear-space, low-round k-center clustering in MR

  20. Proof Idea
      ◮ 2-phase decomposition strategy:
        ◮ Phase 1. Compute an estimate $R$ of $R(G, \tau)$ through progressive sampling.
        ◮ Phase 2. Perform $\log n$ iterations of cluster-growing steps of fixed radius $R$ from batches of centers selected with geometrically increasing probability
      ◮ $O(\log^3 n)$ approximation: w.h.p. the nodes of each shortest-path segment of length $R$ belong to $O(\log^2 n)$ clusters of radius $O(R \log n)$.

  21. Diameter approximation: experiments
      Experimental setup
      ◮ In-house cluster with 16 machines
      ◮ 18GB RAM / Intel i7 Nehalem 4-core processor
      ◮ Spark MapReduce platform
      Datasets
        Graph         n               m               Φ(G)
        roads-USA     23,947,347      29,166,673      55,859,820
        roads-CAL     1,890,815       2,328,872       16,425,258
        livejournal   3,997,962       32,681,189      9.41
        twitter       41,652,230      1,468,365,182   9.07
        mesh(S)       S^2             2S(S − 1)       †
        R-MAT(S)      2^S             16 · 2^S        †
        roads(S)      ≈ S · 2.3·10^7  ≈ S · 5.3·10^7  †
      † the diameter depends on the size of the graph, controlled by S > 1.
      Scalability
      [Plot: time (s), 0 to 4500, versus number of machines (2^1 to 2^4), for R-MAT(26) and roads(3)]

  22. Diameter approximation: experiments
      ◮ We compare our algorithm (CLUSTER) with ∆-stepping
      [Plots: Rounds and Time (s), log scale, for CLUSTER and ∆-stepping on roads-USA, roads-CAL, mesh, livejournal, twitter, R-MAT(24)]

  23. Diameter approximation: experiments
      [Plots: Approximation (1.0 to 1.5) and Work (log scale), for CLUSTER and ∆-stepping on roads-USA, roads-CAL, mesh, livejournal, twitter, R-MAT(24)]
      The approximation quality does not depend on the granularity of the clustering.

  24. Conclusions
      Summary
      MR algorithm for $O(\log^3 n)$ approximation of the diameter of a large, undirected, weighted graph $G$
      ◮ $o(\ell_{\Phi(G)})$ rounds, linear global space, sublinear local space
      ◮ Good performance/approximation on real-world graphs
      Ongoing and future work
      ◮ Tighter analysis of approximation factor
      ◮ Clustering + constant doubling dimension yields a $(1 + \epsilon)$ (unweighted) diameter approximation in $O((m + n)/\epsilon)$ sequential time.
      ◮ Clustering for approximate centrality computations
      Software
      GRADIAS: crono.dei.unipd.it/gradias
