Partitioning Spatially Located Load with Rectangles
  Erik Saule (1), Erdeniz Ö. Baş (1,2), Ümit V. Çatalyürek (1,3)
  {esaule,erdeniz,umit}@bmi.osu.edu
  (1) Department of Biomedical Informatics
  (2) Department of Computer Science and Engineering
  (3) Department of Electric and Computer Engineering
  The Ohio State University
  IPDPS 2011
  Ohio State University, Biomedical Informatics, HPC Lab: http://bmi.osu.edu/hpc
A load distribution problem
  Load matrix
    In parallel computing, the load can be spatially located. The computation should be distributed accordingly.
  Applications
    Particles in Cell (stencil)
    Sparse Matrices
    Direct Volume Rendering
  Metrics
    Load balance
    Communication
    Stability
Different kinds of partition
  Uniform
  Rectilinear
  P×Q-way jagged (th)
  m-way jagged (def, heur, th, opt)
  hierarchical (heur, opt)
  spiral (heur, opt)
Different load balance on 2304 processors
  Particles (2050x2050)
    Uniform (17.5%)
    Rectilinear (15.1%)
    P×Q-way jagged (2.3%)
    m-way jagged (2.0%)
    hierarchical (2.7%)
This talk is about how to generate such partitions, either optimally or heuristically, and the type of guarantee we can obtain.
Outline
  1 Introduction
  2 Preliminaries
      Notation
      In One Dimension
      Simulation Setting
  3 Rectilinear Partitioning
      Nicol's Algorithm
  4 Jagged Partitioning
      P×Q-way Jagged
      m-way Jagged
  5 Hierarchical Bisection
      Recursive Bisection
      Dynamic Programming
  6 Final thoughts
      Summing up
      Conclusion and Perspective
The Rectangular Partitioning Problem
  Definition
    Let A be an $n_1 \times n_2$ matrix of non-negative values. The problem is to partition the $[1,1] \times [n_1,n_2]$ rectangle into a set S of m rectangles.
    The load of rectangle $r = [x,y] \times [x',y']$ is $L(r) = \sum_{x \le i \le x',\ y \le j \le y'} A[i][j]$.
    The problem is to minimize $L_{max} = \max_{r \in S} L(r)$.
  Prefix Sum
    Algorithms are rarely interested in the value of a particular element but rather in the load of a rectangle.
    The matrix is given as a 2D prefix sum array Pr such that $Pr[i][j] = \sum_{i' \le i,\ j' \le j} A[i'][j']$. By convention, $Pr[0][j] = Pr[i][0] = 0$.
    We can now compute the load of rectangle $r = [x,y] \times [x',y']$ as
    $L(r) = Pr[x'][y'] - Pr[x-1][y'] - Pr[x'][y-1] + Pr[x-1][y-1]$.
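The prefix-sum convention above can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the function names `prefix_sum_2d` and `load` are hypothetical, and the matrix is assumed to be a list of equal-length rows. Indices follow the slide's 1-based convention, with row 0 and column 0 of `pr` holding zeros.

```python
def prefix_sum_2d(A):
    """Build Pr with Pr[i][j] = sum of A[i'][j'] for i' <= i, j' <= j (1-based).

    Row 0 and column 0 of the result are zero, as in the slide's convention.
    """
    n1, n2 = len(A), len(A[0])
    pr = [[0] * (n2 + 1) for _ in range(n1 + 1)]
    for i in range(1, n1 + 1):
        row_sum = 0  # running sum of the current row of A
        for j in range(1, n2 + 1):
            row_sum += A[i - 1][j - 1]
            pr[i][j] = pr[i - 1][j] + row_sum
    return pr


def load(pr, x, y, xp, yp):
    """Load of rectangle [x, y] x [x', y']: rows x..x' and columns y..y', inclusive."""
    return pr[xp][yp] - pr[x - 1][yp] - pr[xp][y - 1] + pr[x - 1][y - 1]


A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
pr = prefix_sum_2d(A)
print(load(pr, 2, 2, 3, 3))  # rows 2..3, columns 2..3: 5 + 6 + 8 + 9
```

Once `pr` is built in O(n1 n2) time, every rectangle load costs four lookups, which is what makes the partitioning algorithms on the following slides practical.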
In One Dimension
  Optimal: Nicol's algorithm [Nic94] (improved by [PA04])
    Based on parametric search.
    Complexity: $O((m \log \frac{n}{m})^2)$.
  Heuristic: Direct Cut [MP97]
    Greedy algorithm.
    Complexity: $O(m \log \frac{n}{m})$.
    Guarantee: $L_{max}(DC) \le \frac{\sum_{i'} A[i']}{m} + \max_i A[i]$.
  (More details in Section 2.2)
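A greedy 1D heuristic in the spirit of Direct Cut can be sketched as follows. This is an assumption-laden illustration rather than the exact procedure of [MP97]: the function name `direct_cut` is hypothetical, the k-th cut is placed at the first index whose prefix sum reaches k/m of the total load, and tie-breaking may differ from the original. With this rule each part carries at most the average load plus one element, which is the guarantee stated on the slide.

```python
from bisect import bisect_left
from itertools import accumulate


def direct_cut(loads, m):
    """Split `loads` into m consecutive parts; return m + 1 cut indices.

    Part k covers loads[cuts[k]:cuts[k + 1]] (0-based, half-open).
    """
    pr = [0] + list(accumulate(loads))  # pr[i] = sum of the first i elements
    total = pr[-1]
    cuts = [0]
    for k in range(1, m):
        # First index whose prefix sum reaches k/m of the total load,
        # found by binary search on the non-decreasing prefix array.
        cuts.append(bisect_left(pr, k * total / m, lo=cuts[-1]))
    cuts.append(len(loads))
    return cuts


loads = [1, 2, 3, 4, 5, 6]
cuts = direct_cut(loads, 3)
parts = [sum(loads[a:b]) for a, b in zip(cuts, cuts[1:])]
print(cuts, parts)
```

On this input the parts carry 10, 5 and 6 units of load, within the bound 21/3 + 6 = 13.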
Simulation Setting
  Classes
    (Some inspired by [MS96])
  Processors
    Simulations are performed with different numbers of processors: most square numbers up to 10,000.
  Metric
    Load imbalance is the presented metric: $\frac{L_{max}}{\sum_{i,j} A[i][j] / m} - 1$.
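Reading the metric as the maximum load divided by the average load per processor, minus 1, it can be computed from the per-rectangle loads as below. The function name `load_imbalance` is hypothetical, and the sketch assumes a strictly positive total load.

```python
def load_imbalance(loads):
    """Return L_max / (total / m) - 1 for a list of m per-rectangle loads.

    0.0 means a perfectly balanced partition; assumes sum(loads) > 0.
    """
    m = len(loads)
    average = sum(loads) / m
    return max(loads) / average - 1


print(load_imbalance([4, 4, 4, 4]))  # perfectly balanced
print(load_imbalance([10, 5, 6]))    # heaviest part is above the average of 7
```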
Rectilinear Partitioning
Nicol's Algorithm [Nic94]: RECT-NICOL
  The algorithm RECT-NICOL is an iterative heuristic. At each iteration, the partition in one dimension is refined using a 1D algorithm.
  Complexity:
    $O(n_1 n_2)$ iterations (around 10 in practice).
    One iteration: $O(Q (P \log \frac{n_1}{P})^2 + P (Q \log \frac{n_2}{Q})^2)$.
  Other algorithms
    The problem of finding the optimal Rectilinear Partitioning is NP-Complete. Other algorithms therefore exist, but they mainly focus on theoretical properties.
    The guarantees are unsuitable.
    The algorithms are computationally expensive ($n_1^{10}$) and difficult to implement (they rely on linear programming or present numerical instability).
  (See Section 3.1 for more details)
P×Q-way Jagged Partitioning
A P×Q-way Jagged Heuristic: JAG-PQ-HEUR
  P×Q Jagged Partitioning
    Sum on columns to generate a 1D problem.
    Partition it in P parts.
    For the first stripe, sum on rows.
    Partition it in Q parts.
    Treat all stripes.
  Complexity: $O((P \log \frac{n_1}{P})^2 + P \times (Q \log \frac{n_2}{Q})^2)$.
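The steps above can be sketched in Python as follows. Several choices here are assumptions, not the authors' implementation: the names `direct_cut` and `jag_pq_heur` are hypothetical, the first projection is taken along the first dimension (one total per row index), and the 1D subproblems are solved with a greedy Direct-Cut-style rule instead of an optimal 1D algorithm. For brevity the projections are recomputed directly from the matrix rather than from a prefix-sum array.

```python
from bisect import bisect_left
from itertools import accumulate


def direct_cut(loads, m):
    """Greedy 1D split into m consecutive parts; returns m + 1 cut indices."""
    pr = [0] + list(accumulate(loads))
    total = pr[-1]
    cuts = [0]
    for k in range(1, m):
        cuts.append(bisect_left(pr, k * total / m, lo=cuts[-1]))
    cuts.append(len(loads))
    return cuts


def jag_pq_heur(A, P, Q):
    """Return P * Q rectangles (r0, r1, c0, c1): rows r0..r1-1, columns c0..c1-1."""
    n2 = len(A[0])
    # Step 1: project onto the first dimension and cut into P stripes.
    row_cuts = direct_cut([sum(row) for row in A], P)
    rects = []
    for r0, r1 in zip(row_cuts, row_cuts[1:]):
        # Step 2: project the stripe onto the second dimension, cut into Q parts.
        col_loads = [sum(A[i][j] for i in range(r0, r1)) for j in range(n2)]
        col_cuts = direct_cut(col_loads, Q)
        rects.extend((r0, r1, c0, c1) for c0, c1 in zip(col_cuts, col_cuts[1:]))
    return rects


A = [[1] * 4 for _ in range(4)]
rects = jag_pq_heur(A, 2, 2)
rect_loads = [sum(A[i][j] for i in range(r0, r1) for j in range(c0, c1))
              for r0, r1, c0, c1 in rects]
print(rects, rect_loads)
```

Each stripe gets its own column cuts, which is what distinguishes a jagged partition from a rectilinear one, where all stripes share the same cuts.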
How good is that?
  Theorem (Theorem 1 in Section 3.2.1)
    If there are no zeros in the array, JAG-PQ-HEUR is a $(1 + \Delta \frac{P}{n_1})(1 + \Delta \frac{Q}{n_2})$-approximation algorithm, where $\Delta = \frac{\max A}{\min A}$, $P < n_1$, $Q < n_2$.
  Proof.
    Based on the guarantee of 1D heuristics.
  Theorem (Theorem 2 in Section 3.2.1)
    The approximation ratio is minimized by $P = \sqrt{m \frac{n_1}{n_2}}$.
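One way to see where both statements come from is sketched below. This is a reconstruction under stated assumptions, not the proof from the paper: it assumes the 1D algorithm satisfies the Direct Cut guarantee, writes T for the total load, uses $PQ = m$, and treats P as a continuous variable when minimizing.

```latex
% 1D step: n strictly positive cells with total load T, so
% \max A \le \Delta \min A \le \Delta T / n.
\begin{align*}
L_{max}(DC) &\le \frac{T}{m} + \max_i A[i]
             \le \frac{T}{m}\left(1 + \Delta \frac{m}{n}\right).
\end{align*}
% Apply it once with (m, n) = (P, n_1) and once per stripe with (Q, n_2);
% projecting positive cells does not increase the max/min ratio beyond \Delta.
\begin{align*}
L_{max} &\le \frac{T}{PQ}
           \left(1 + \Delta \frac{P}{n_1}\right)
           \left(1 + \Delta \frac{Q}{n_2}\right),
\qquad \frac{T}{PQ} = \frac{T}{m} \le L_{max}^{*}.
\end{align*}
% Theorem 2: substitute Q = m / P and minimize over P.
\begin{align*}
f(P) &= 1 + \Delta \frac{P}{n_1} + \Delta \frac{m}{P\, n_2}
          + \Delta^2 \frac{m}{n_1 n_2}, \\
f'(P) &= \frac{\Delta}{n_1} - \frac{\Delta\, m}{P^2 n_2} = 0
\quad\Longrightarrow\quad P = \sqrt{m \frac{n_1}{n_2}}.
\end{align*}
```

For a square matrix ($n_1 = n_2$) this gives $P = Q = \sqrt{m}$, consistent with the simulations using mostly square numbers of processors.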
An optimal P×Q-way jagged partitioning: JAG-PQ-OPT
  A Dynamic Programming Formulation
    $L_{max}(n_1, P) = \min_{1 \le k < n_1} \max(L_{max}(k-1, P-1),\ 1D(k, n_1, Q))$
    $L_{max}(0, P) = 0$
    $L_{max}(n_1, 0) = +\infty,\ \forall n_1 \ge 1$
  $O(n_1 P)$ $L_{max}$ functions to evaluate. (Each is $O(k)$.)
  $O(n_1^2)$ 1D functions to evaluate. (Each is $O((Q \log \frac{n_2}{Q})^2)$.)
  (Some significant implementation optimizations apply)
  For a 512x512 matrix and 1000 processors, that's 512,000 + 262,144 values. On 64-bit values, that's 6MB.
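The recurrence can be turned into a small memoized Python sketch. It is meant only to illustrate the structure of the dynamic program, under several assumptions: the name `jag_pq_opt` is hypothetical, indices are 0-based with half-open stripes, empty parts are allowed, and the 1D subproblem is solved by a plain quadratic dynamic program rather than the faster 1D algorithm whose cost appears on the slide. None of the implementation optimizations mentioned above are applied.

```python
from functools import lru_cache
from itertools import accumulate
from math import inf


def jag_pq_opt(A, P, Q):
    """Optimal bottleneck load of a jagged partition: P stripes of Q parts each."""
    n1, n2 = len(A), len(A[0])

    @lru_cache(maxsize=None)
    def one_d(r0, r1):
        """Optimal bottleneck when rows r0..r1-1 are cut into at most Q column parts."""
        col_loads = [sum(A[i][j] for i in range(r0, r1)) for j in range(n2)]
        pr = [0] + list(accumulate(col_loads))
        best = [0] + [inf] * n2  # best[j]: first j columns using 0 parts
        for _ in range(Q):       # allow one more part at each round
            best = [min(max(best[c], pr[j] - pr[c]) for c in range(j + 1))
                    for j in range(n2 + 1)]
        return best[n2]

    @lru_cache(maxsize=None)
    def lmax(i, p):
        """Optimal bottleneck for the first i rows using at most p stripes."""
        if i == 0:
            return 0
        if p == 0:
            return inf
        # The last stripe covers rows k..i-1; the first k rows use p - 1 stripes.
        return min(max(lmax(k, p - 1), one_d(k, i)) for k in range(i))

    return lmax(n1, P)


print(jag_pq_opt([[4, 0], [0, 4], [1, 1]], 2, 2))
```

The two caches correspond to the two tables counted on the slide: one entry per (rows, stripes) pair and one entry per candidate stripe.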
Performance of P×Q-way jagged (PIC-MAG it=30000)
  [Figure: load imbalance (log scale, 0.001 to 1) versus number of processors (log scale, 10 to 10000) for RECT-NICOL, JAG-PQ-HEUR and JAG-PQ-OPT.]
m-way Jagged Partitioning