Implementing a Parallel Graph Clustering Algorithm with Sparse Matrix Computation
Jun Chen, Peigang Zou
High Performance Computing Center, Institute of Applied Physics and Computational Mathematics
chenjun@iapcm.ac.cn
OUTLINE
- Graph clustering
  - Peer pressure clustering (PPCL)
- Challenges
- A solution: based on large graph platforms/libraries
  - Combinatorial BLAS
- Related works
- Parallel PPCL algorithm with matrix computation
- Numerical results
- Discussions
- Conclusion
1. Graph Clustering
- A wide class of algorithms that classify the vertices of a graph into clusters, such that vertices in the same cluster have higher connectivity among themselves than with vertices in other clusters.
Graph clustering vs. vector clustering
Clustering: find natural groups.
- Vector clustering: points are classified by the distances between them.
- Graph clustering: points are classified by the relationships between them.
Graph clustering (cont.)
- Methods: peer pressure clustering (PPCL), random walks, minimum cuts, multi-way graph partition, genetic algorithms, ...
- Application areas: machine learning, pattern recognition, image analysis, bioinformatics, ...
2. Challenges
2.1 The size of graphs is growing
E.g., the number of Facebook users exceeds 1 billion: a big graph!
2.2 Large graph computation
- Parallel graph clustering is difficult to implement. It requires:
  - a description well suited to the natural sparse locality
  - effective storage
  - high performance computing
- High performance challenges:
  - Scalability: running time should be at most O(n), and memory consumption below O(n×n).
  - The parallel patterns used for solving PDEs in typical scientific computing are based on dense computations; they do not suit the sparse characteristics of large graph computation.
  - The MapReduce pattern for big-data problems has low efficiency.
- A solution: build on a large graph platform/library.
3. Large graph platforms/libraries
- ScaleGraph (TITECH, Japan): implements the Pregel model proposed by Google and optimizes its collective communication and memory management. Goal: analyze graphs containing 10 billion nodes and edges.
- DistBelief (Google): a parallel framework for deep learning.
- PBGL, the Parallel Boost Graph Library (Indiana University, USA): a C++ graph library.
- GAPDT, the Graph Algorithm and Pattern Discovery Toolbox (UCSB, USA): provides interactive graph operations and runs in parallel on Star-P, a parallel version of MATLAB; uses distributed sparse arrays to describe the parallel operations.
GraphBLAS
- Defines a core set of matrix-based graph operations that can be used to implement a wide class of graph algorithms in a wide range of programming environments.
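To make "graph algorithms as matrix operations" concrete, below is a minimal self-contained C++ sketch (our own illustration, not the GraphBLAS API; all names are ours): one breadth-first-search step written as a sparse matrix-vector product over the Boolean (OR, AND) semiring, the canonical example of a matrix-based graph operation.

    #include <cstdio>
    #include <vector>

    // One BFS step as frontier' = A^T * frontier over the (OR, AND)
    // semiring: a vertex enters the next frontier if any in-neighbor
    // is in the current frontier. adj[u] holds the out-neighbors of u.
    std::vector<bool> bfs_step(const std::vector<std::vector<int>>& adj,
                               const std::vector<bool>& frontier) {
        std::vector<bool> next(adj.size(), false);
        for (size_t u = 0; u < adj.size(); ++u)
            if (frontier[u])            // row u contributes only if u is active
                for (int v : adj[u])
                    next[v] = true;     // OR-accumulate: v reachable in one step
        return next;
    }

    int main() {
        // Tiny directed graph: 0->1, 0->2, 1->3, 2->3
        std::vector<std::vector<int>> adj = {{1, 2}, {3}, {3}, {}};
        std::vector<bool> frontier = {true, false, false, false}; // start at 0
        frontier = bfs_step(adj, frontier);
        for (size_t v = 0; v < frontier.size(); ++v)
            if (frontier[v]) std::printf("vertex %zu in next frontier\n", v);
        return 0;
    }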
Combinatorial BLAS (CombBLAS):
- A representative implementation of GraphBLAS
- Multi-core parallelism based on MPI
- A collective set of basic linear algebra operations
Table 1. Basic linear algebra operations in CombBLAS
[Buluç A, Gilbert J R. The Combinatorial BLAS: design, implementation, and applications]
4. Related works
- Introduction of the random walks method: the cluster assignment of a vertex will be the same as that of most of its neighbors.
- Current representative parallel PPCL methods:
  - Parallel PPCL in SPARQL
  - Parallel PPCL in STAR-P
Related works on parallel PPCL
Paper 1: PPCL in SPARQL. Results:
- maximum processor number: 64
- RDF graph: 10,000 vertices, 232,000 edges
- about 200 seconds; low performance
Paper 2: PPCL in STAR-P. Results:
- maximum processor number: 128
- R-MAT graph: 2,097,152 vertices, 18,305,177 edges (scale 21)
- about 200 seconds
Notes:
- SPARQL is an SQL-like query tool for RDF graphs.
- STAR-P is a parallel implementation of MATLAB.
- An RDF graph stores metadata for web resources: a vertex identifies a resource, and an arc describes a resource attribute.
5. Parallel PPCL Algorithm with Matrix Computation
- Standard PPCL algorithm
- Alternative PPCL algorithm based on linear algebra
- Parallel PPCL algorithm based on linear algebra
- Parallel PPCL implementation on CombBLAS
5.1 Standard PPCL algorithm

Algorithm 1. PeerPressure(G = (V, E), C_i)
1  for (u, v, w) ∈ E
2      do T(v)(C_i(u)) ← T(v)(C_i(u)) + w
3  for n ∈ V
4      do C_f(n) ← i : ∀j ∈ V, T(n)(j) ≤ T(n)(i)
5  if C_i == C_f
6      then return C_f
7  else return PeerPressure(G = (V, E), C_f)

[Flowchart: 1. given an initial approximation G'; 2. initialization; 3. vote; 4. tally; 5. form a new approximation G''. If G' != G'', set G' = G'' and repeat; if G' == G'', output the result.]

The algorithm starts with an initial cluster assignment, e.g., each vertex in its own cluster. Each iteration performs an election at each vertex to select its cluster number; the votes are the cluster assignments of its neighbors. Ties are settled by selecting the lowest cluster ID, to maintain determinism. The algorithm converges when two consecutive iterations have a tiny difference between them.
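As a complement to Algorithm 1, here is a minimal serial C++ sketch of peer pressure clustering (our own illustration; the graph and all names are hypothetical). Each vertex votes for itself as well as its neighbors, mirroring the self-loop entries on the diagonal of A in Fig. 1, and ties go to the lowest cluster ID.

    #include <cstdio>
    #include <map>
    #include <vector>

    // Serial peer pressure clustering on an undirected graph given as
    // adjacency lists. Each vertex starts in its own cluster; in every
    // round each vertex adopts the cluster most common among itself
    // and its neighbors, breaking ties by the lowest cluster ID.
    std::vector<int> peer_pressure(const std::vector<std::vector<int>>& adj) {
        std::vector<int> cluster(adj.size());
        for (size_t v = 0; v < adj.size(); ++v) cluster[v] = (int)v;

        for (bool changed = true; changed; ) {
            std::vector<int> next(cluster);
            for (size_t v = 0; v < adj.size(); ++v) {
                std::map<int, int> votes;          // cluster ID -> vote count
                ++votes[cluster[v]];               // self vote (cf. diagonal of A)
                for (int u : adj[v]) ++votes[cluster[u]];
                int best = -1, bestVotes = 0;
                for (const auto& [id, n] : votes)  // std::map iterates IDs in
                    if (n > bestVotes) {           // ascending order, so strict '>'
                        best = id;                 // resolves ties to the lowest ID
                        bestVotes = n;
                    }
                next[v] = best;
            }
            changed = (next != cluster);
            cluster = next;
        }
        return cluster;
    }

    int main() {
        // Two triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
        std::vector<std::vector<int>> adj = {
            {1, 2}, {0, 2}, {0, 1, 3}, {2, 4, 5}, {3, 5}, {3, 4}};
        std::vector<int> c = peer_pressure(adj);
        for (size_t v = 0; v < c.size(); ++v)
            std::printf("vertex %zu -> cluster %d\n", v, c[v]);
        return 0;
    }

On this example the iteration converges to two clusters, one per triangle.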
5.2 PPCL algorithm based on linear algebra

Algorithm 2. PeerPressure(A : R^{N×N}, C_i : B^{N×N})
1  T : R^{N×N}, C_f : B^{N×N}, m : R^N
2  T = C_i A
3  m = T.max (column-wise maximum)
4  C_f = (m .== T)
5  if C_i == C_f
6      then return C_f
7  else return PeerPressure(A, C_f)

Steps:
1. Starting approximation G': each vertex is a cluster.
2. Initialization: ensure vertices have equal votes.
3. Vote: each node votes for its neighbors.
4. Tally: (1) normalize; (2) settle ties, i.e., decide what to do if two clusters tie for the maximum number of votes for a vertex.
5. Form a new approximation G''; iterate until G' == G''.
(a) The object graph G (drawing omitted)
(b) The adjacency matrix A of G after initialization:
    0.25  0     0.25  0.25  0     0     0.25  0
    0     0.25  0.25  0.25  0     0     0     0.25
    0.25  0     0.25  0     0.25  0     0.25  0
    0.25  0.25  0     0.25  0     0.25  0     0
    0.25  0     0.25  0     0.25  0     0.25  0
    0     0.2   0     0.2   0     0.2   0.2   0.2
    0.33  0     0     0     0.33  0     0.33  0
    0     0.25  0     0.25  0     0.25  0     0.25
(c) Temporary results of matrix C (all maximum-vote candidates marked; ties present):
    0 0 1 1 0 0 0 0
    0 1 1 1 0 0 0 1
    0 0 1 0 0 0 0 0
    0 1 0 1 0 1 0 0
    0 0 1 0 0 0 0 0
    0 0 0 0 0 0 0 0
    1 0 0 0 1 0 1 0
    0 1 0 1 0 1 0 1
(d) Temporary matrix C after the 1st ties-settling (lowest cluster ID wins; one entry per column):
    0 0 1 1 0 0 0 0
    0 1 0 0 0 0 0 1
    0 0 0 0 0 0 0 0
    0 0 0 0 0 1 0 0
    0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0
    1 0 0 0 1 0 1 0
    0 0 0 0 0 0 0 0
Fig. 1. The procedure when applying Algorithm 2 to the object graph G.
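Following the procedure of Fig. 1, here is a dense, single-iteration C++ sketch of the linear-algebra formulation (illustrative only; the real implementation uses distributed sparse matrices). Assumption: C(k, v) = 1 iff vertex v is currently in cluster k, so T = C·A tallies per-cluster votes and a column-wise maximum elects each vertex's cluster, with ties going to the lowest row index.

    #include <cstdio>
    #include <vector>

    using Mat = std::vector<std::vector<double>>;

    // One PPCL iteration in matrix form (dense sketch of Algorithm 2).
    // C(k,v) = 1 iff vertex v is in cluster k; A is the row-normalized
    // adjacency with self-loops. T = C*A tallies per-cluster votes, then
    // each column of T is collapsed to its maximum; lowest row wins ties.
    Mat ppcl_iteration(const Mat& C, const Mat& A) {
        size_t N = A.size();
        Mat T(N, std::vector<double>(N, 0.0));
        for (size_t k = 0; k < N; ++k)          // vote: T = C * A
            for (size_t u = 0; u < N; ++u)
                if (C[k][u] != 0.0)
                    for (size_t v = 0; v < N; ++v)
                        T[k][v] += C[k][u] * A[u][v];

        Mat Cf(N, std::vector<double>(N, 0.0));
        for (size_t v = 0; v < N; ++v) {        // tally: column-wise argmax;
            size_t best = 0;                    // strict '>' keeps the lowest
            for (size_t k = 1; k < N; ++k)      // row index on ties
                if (T[k][v] > T[best][v]) best = k;
            Cf[best][v] = 1.0;
        }
        return Cf;                              // new approximation C_f
    }

    int main() {
        // Path 0-1-2 with self-loops, rows normalized by degree.
        Mat A = {{0.5,     0.5,     0.0},
                 {1.0 / 3, 1.0 / 3, 1.0 / 3},
                 {0.0,     0.5,     0.5}};
        Mat C = {{1, 0, 0}, {0, 1, 0}, {0, 0, 1}};  // each vertex its own cluster
        Mat Cf = ppcl_iteration(C, A);
        for (size_t v = 0; v < 3; ++v)
            for (size_t k = 0; k < 3; ++k)
                if (Cf[k][v] != 0.0)
                    std::printf("vertex %zu -> cluster %zu\n", v, k);
        return 0;
    }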
5.3 Parallel PPCL algorithm based on linear algebra

Algorithm 3.
Input: a matrix-based target graph A : R^{N×N} and a matrix-based initial approximation graph C_i : B^{N×N}.
Output: a matrix-based clustering result.

procedure PeerPressure(A : R^{N×N}, C_i : B^{N×N})
// Initialization
1  SpParMat<unsigned, double, SpDCCols<unsigned, double>> A, C;
   /* reduce to Row: the columns of each row are collapsed to a single entry */
2  DenseParVec<unsigned, double> rowsums = A.Reduce(Row, plus<double>());
   /* multinv<double> is a user-defined multiplicative-inverse functor */
3  rowsums.Apply(multinv<double>());
   /* normalize A: scale each row with the given vector */
4  A.DimScale(Row, rowsums);
5  while C != T do
// Vote
6      SpParMat<unsigned, double, SpDCCols<unsigned, double>> T = SpGEMM(C, A);
// Normalization
7      Renormalize(T);   /* renormalize T */
// Settling ties
8      settling_ties(T); /* settle ties */
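The slide only names the Renormalize and settling_ties helpers; the following plain C++ sketch shows one plausible per-column tie-settling rule (an assumption on our part, chosen to match the lowest-cluster-ID rule of Algorithm 1; the struct and function names are ours, not CombBLAS code).

    #include <cstdio>
    #include <vector>

    // Sketch of per-column tie-settling (cf. line 8 of Algorithm 3).
    // A column of T is a list of (cluster row index, vote weight) pairs;
    // exactly one winner survives: the maximum weight, and on a tie the
    // lowest cluster index, matching the determinism rule of Algorithm 1.
    struct Entry { unsigned row; double val; };

    std::vector<Entry> settle_ties(const std::vector<Entry>& column) {
        if (column.empty()) return {};
        Entry winner = column.front();
        for (const Entry& e : column)
            if (e.val > winner.val || (e.val == winner.val && e.row < winner.row))
                winner = e;
        winner.val = 1.0;  // the surviving vote becomes the new indicator entry
        return {winner};
    }

    int main() {
        std::vector<Entry> col = {{2, 0.25}, {5, 0.5}, {7, 0.5}};
        Entry w = settle_ties(col).front();
        std::printf("winner: cluster %u\n", w.row);  // prints cluster 5
        return 0;
    }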
5.4 Parallel PPCL implementation on CombBLAS
- Data distribution and storage
  - DCSC storage structure
- Algorithm expansion & MPI implementation
  - Parallel voting
  - Renormalization
  - Parallel ties-settling
5.4.1 Data distribution and storage
- Distribute the sparse matrices on a 2D Pr × Pc processor grid: processor P(i, j) stores the sub-matrix A_ij of dimensions (m/Pr) × (n/Pc) in its local memory.
- HyperSparseGEMM operates on the DCSC data structure, which requires O(nnz) storage.
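A small sketch of the 2D block-distribution arithmetic described above (our own helper, not CombBLAS code): mapping a global nonzero (i, j) of an m×n matrix to its owner P(r, c) on a Pr×Pc grid and to its local indices. For simplicity it assumes Pr divides m and Pc divides n, unlike a production implementation.

    #include <cstdio>

    // Processor P(r, c) owns the (m/Pr) x (n/Pc) sub-matrix A_rc stored
    // in contiguous blocks over the Pr x Pc grid.
    struct Owner { int r, c; long li, lj; };

    Owner locate(long i, long j, long m, long n, int Pr, int Pc) {
        long br = m / Pr, bc = n / Pc;               // local block dimensions
        Owner o;
        o.r  = (int)(i / br);  o.c  = (int)(j / bc); // grid coordinates
        o.li = i % br;         o.lj = j % bc;        // local indices
        return o;
    }

    int main() {
        // 8x8 matrix on a 2x2 grid: element (5, 2) lives on P(1, 0) at (1, 2).
        Owner o = locate(5, 2, 8, 8, 2, 2);
        std::printf("P(%d,%d) local (%ld,%ld)\n", o.r, o.c, o.li, o.lj);
        return 0;
    }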