Load-Balancing Spatially Located Computations using Rectangular Partitions. Erdeniz Ö. Baş 1,2, Erik Saule 1, Ümit V. Çatalyürek 1,3. {erdeniz,esaule,umit}@bmi.osu.edu. 1 Department of Biomedical Informatics, 2 Department of Computer Science and Engineering, 3 Department of Electrical and Computer Engineering, The Ohio State University. SIAM Conference on Parallel Processing for Scientific Computing 2012. Ohio State University, Biomedical Informatics 2D partitioning Ümit V. Çatalyürek :: 1 / 31 HPC Lab http://bmi.osu.edu/hpc
A load distribution problem. Load matrix: in parallel computing, the load can be spatially located, and the computation should be distributed accordingly. Applications: particle-in-cell simulations, sparse matrices, direct volume rendering. Metrics: load balance, communication, stability.
Different kinds of partition: Uniform Rectilinear P × Q-way jagged (th) m-way jagged hierarchical spiral (def, heur, th, opt) (heur, opt) (heur, opt)
Different load balance on 2304 processors. Particles (2050x2050): Uniform (17.5%), Rectilinear (15.1%), P × Q-way jagged (2.3%), m-way jagged (2.0%), hierarchical (2.7%).
This talk is about how to generate such partitions, either optimally or heuristically, and the type of guarantee we can obtain.
Outline
1 Introduction
2 Preliminaries: Notation; In One Dimension; Simulation Setting
3 Rectilinear Partitioning: Nicol's Algorithm
4 Jagged Partitioning: P × Q-way Jagged; m-way Jagged
5 Hierarchical Bisection: Recursive Bisection; Dynamic Programming
6 Final thoughts: Summing up
The Rectangular Partitioning Problem
Definition: Let $A$ be an $n_1 \times n_2$ matrix of non-negative values. The problem is to partition the $[1, 1] \times [n_1, n_2]$ rectangle into a set $S$ of $m$ rectangles. The load of rectangle $r = [x, y] \times [x', y']$ is $L(r) = \sum_{x \le i \le x',\, y \le j \le y'} A[i][j]$. The objective is to minimize $L_{max} = \max_{r \in S} L(r)$.
Prefix Sum: Algorithms are rarely interested in the value of a particular element, but rather in the load of a rectangle. The matrix is therefore given as a 2D prefix sum array $Pr$ such that $Pr[i][j] = \sum_{i' \le i,\, j' \le j} A[i'][j']$. By convention, $Pr[0][j] = Pr[i][0] = 0$. The load of rectangle $r = [x, y] \times [x', y']$ can then be computed as $L(r) = Pr[x'][y'] - Pr[x-1][y'] - Pr[x'][y-1] + Pr[x-1][y-1]$.
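The prefix-sum identity can be checked on a small matrix. The following is a minimal sketch, not the authors' code; the function names `prefix_sum_2d` and `rect_load` are hypothetical, and the arrays are padded with a zero row and column so that the 1-based indices of the slide carry over directly.

```python
def prefix_sum_2d(A):
    """Build Pr with Pr[i][j] = sum of A[i'][j'] for i' <= i and j' <= j (1-based).

    Row 0 and column 0 of Pr are zero, matching the convention Pr[0][j] = Pr[i][0] = 0.
    """
    n1 = len(A)
    n2 = len(A[0]) if A else 0
    Pr = [[0] * (n2 + 1) for _ in range(n1 + 1)]
    for i in range(1, n1 + 1):
        for j in range(1, n2 + 1):
            # A is a 0-based Python list, hence the index shift A[i - 1][j - 1].
            Pr[i][j] = A[i - 1][j - 1] + Pr[i - 1][j] + Pr[i][j - 1] - Pr[i - 1][j - 1]
    return Pr


def rect_load(Pr, x, y, x2, y2):
    """Load of r = [x, y] x [x2, y2]: indices x <= i <= x2 and y <= j <= y2, 1-based."""
    return Pr[x2][y2] - Pr[x - 1][y2] - Pr[x2][y - 1] + Pr[x - 1][y - 1]


A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
Pr = prefix_sum_2d(A)
print(rect_load(Pr, 2, 2, 3, 3))  # 5 + 6 + 8 + 9 = 28
```

Once `Pr` is built in $O(n_1 n_2)$ time, every rectangle load costs four array lookups, which is what makes repeated load queries cheap in the algorithms that follow.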
In One Dimension. Optimal: Nicol's algorithm [Nic94] (improved by [PA04]), based on parametric search. Complexity: $O((m \log \frac{n}{m})^2)$.
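To make the 1D problem concrete, here is a minimal sketch of an exact 1D partitioner. It is not Nicol's parametric search: it swaps in a simpler technique, bisection on the bottleneck value combined with a greedy probe, and it assumes non-negative integer loads and at least one part. The name `partition_1d` and the returned cut format are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def partition_1d(loads, m):
    """Split `loads` into at most m consecutive intervals, minimizing the largest load.

    Returns (bottleneck, cuts) where cuts[k] is the exclusive end index of interval k.
    Assumes non-negative integer loads and m >= 1.
    """
    prefix = [0] + list(accumulate(loads))
    n = len(loads)

    def greedy_cuts(bound):
        # Probe: fill each interval as far as possible without exceeding `bound`.
        cuts, start = [], 0
        while start < n and len(cuts) < m:
            end = bisect_right(prefix, prefix[start] + bound) - 1
            if end == start:  # the next element alone exceeds the bound
                return None
            cuts.append(end)
            start = end
        return cuts if start == n else None

    lo, hi = max(loads, default=0), prefix[-1]  # hi (one interval) is always feasible
    while lo < hi:  # bisection on the bottleneck value
        mid = (lo + hi) // 2
        if greedy_cuts(mid) is None:
            lo = mid + 1
        else:
            hi = mid
    return lo, greedy_cuts(lo)


bottleneck, cuts = partition_1d([2, 3, 7, 1, 4, 5, 2], 3)
print(bottleneck, cuts)  # 11 [2, 4, 7], i.e. interval loads 5, 8 and 11
```

Each probe costs $O(m \log n)$ thanks to the binary search on the prefix sums, and the bisection adds a factor logarithmic in the total load, so this sketch does not have the complexity quoted for Nicol's algorithm.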
Simulation Setting. Classes: some inspired by [MS96]. Processors: simulations are performed with different numbers of processors, namely most square numbers up to 10,000. Metric: load imbalance is the presented metric: $\frac{L_{max}}{\sum_{i,j} A[i][j] / m} - 1$.
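As a small worked example of the metric, the sketch below computes the load imbalance from a list of per-processor loads. The function name `load_imbalance` is hypothetical, and the list is assumed to hold exactly one entry per processor (idle processors would appear as zeros).

```python
def load_imbalance(part_loads):
    """L_max divided by the average load per processor, minus one."""
    average = sum(part_loads) / len(part_loads)
    return max(part_loads) / average - 1


print(load_imbalance([5, 8, 11]))  # 11 / 8 - 1 = 0.375
```

A perfectly balanced partition gives 0; a value of 0.375 means the most loaded processor carries 37.5% more than the average.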
Rectilinear Partitioning. Generalities: the problem is NP-hard. Approximation algorithms exist but are very slow. RECT-NICOL [Nic94]: an iterative heuristic. At each iteration, the partition in one dimension is refined. Complexity: $O(n_1 n_2)$ iterations ($\le 10$ in practice); one iteration costs $O(Q (P \log \frac{n_1}{P})^2 + P (Q \log \frac{n_2}{Q})^2)$.
A P × Q-way Jagged Heuristic: JAG-PQ-HEUR. Sum on each column to generate a 1D problem. Partition it into P parts. For the first stripe, sum on each row. Partition it into Q parts. Treat all the other stripes the same way. Complexity: $O((P \log \frac{n_1}{P})^2 + P \times (Q \log \frac{n_2}{Q})^2)$.
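The steps of JAG-PQ-HEUR can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the main dimension is taken to be the rows of a list-of-lists matrix, loads are non-negative integers, and the optimal 1D step uses a bisection-plus-probe routine instead of Nicol's algorithm. The names `partition_1d` and `jag_pq_heur` are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def partition_1d(loads, m):
    """Optimal split of non-negative integer loads into at most m intervals (m >= 1).

    Returns (bottleneck, cuts); cuts[k] is the exclusive end index of interval k.
    """
    prefix = [0] + list(accumulate(loads))
    n = len(loads)

    def greedy_cuts(bound):
        # Greedy probe: extend each interval as far as the bound allows.
        cuts, start = [], 0
        while start < n and len(cuts) < m:
            end = bisect_right(prefix, prefix[start] + bound) - 1
            if end == start:
                return None
            cuts.append(end)
            start = end
        return cuts if start == n else None

    lo, hi = max(loads, default=0), prefix[-1]
    while lo < hi:
        mid = (lo + hi) // 2
        if greedy_cuts(mid) is None:
            lo = mid + 1
        else:
            hi = mid
    return lo, greedy_cuts(lo)


def jag_pq_heur(A, P, Q):
    """Return (max_load, stripes); each stripe is (row_start, row_end, col_cuts)."""
    # Step 1: collapse the matrix to one load per row and cut it into P stripes.
    row_loads = [sum(row) for row in A]
    _, row_cuts = partition_1d(row_loads, P)
    stripes, max_load, start = [], 0, 0
    for end in row_cuts:
        # Step 2: inside the stripe, collapse to one load per column and cut into Q parts.
        col_loads = [sum(col) for col in zip(*A[start:end])]
        bottleneck, col_cuts = partition_1d(col_loads, Q)
        max_load = max(max_load, bottleneck)
        stripes.append((start, end, col_cuts))
        start = end
    return max_load, stripes


A = [[5, 5],
     [10, 0],
     [0, 10]]
max_load, stripes = jag_pq_heur(A, 2, 2)
print(max_load, stripes)  # 15 [(0, 2, [1, 2]), (2, 3, [2])]
```

On this matrix the stripes are chosen from the row totals alone (10, 10, 10), so the first stripe groups the rows `[5, 5]` and `[10, 0]`, whose column loads 15 and 5 cannot be balanced. The heuristic is fast precisely because the stripe boundaries ignore how the load is spread inside each stripe.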
An optimal P × Q-way jagged partitioning: JAG-PQ-OPT
A Dynamic Programming Formulation:
$L_{max}(n_1, P) = \min_{1 \le k < n_1} \max(L_{max}(k-1, P-1), \mathrm{1D}(k, n_1, Q))$
$L_{max}(0, P) = 0$
$L_{max}(n_1, 0) = +\infty, \forall n_1 \ge 1$
There are $O(n_1 P)$ $L_{max}$ functions to evaluate (each is $O(k)$) and $O(n_1^2)$ 1D functions to evaluate (each is $O((Q \log \frac{n_2}{Q})^2)$). Some significant implementation optimizations apply. For a 512x512 matrix and 1000 processors, that is 512,000 + 262,144 values; with 64-bit values, that is 6 MB.
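The recurrence can be turned into a short memoized program. The sketch below is an illustration, not the authors' implementation, and it omits the implementation optimizations mentioned on the slide. Assumptions: stripes are made of rows, loads are non-negative integers, the 1D step is a bisection-plus-probe routine returning only the bottleneck, and $k$ ranges up to $n_1$ inclusive so that the last stripe may be a single row (that range is this sketch's choice). The names `bottleneck_1d`, `one_d`, `l_max` and `jag_pq_opt` are hypothetical.

```python
import math
from bisect import bisect_right
from functools import lru_cache
from itertools import accumulate


def bottleneck_1d(loads, m):
    """Smallest possible maximum interval load when cutting loads into at most m intervals."""
    prefix = [0] + list(accumulate(loads))
    n = len(loads)

    def feasible(bound):
        start = 0
        for _ in range(m):
            end = bisect_right(prefix, prefix[start] + bound) - 1
            if end == n:
                return True
            if end == start:  # the next element alone exceeds the bound
                return False
            start = end
        return False

    lo, hi = max(loads, default=0), prefix[-1]
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


def jag_pq_opt(A, P, Q):
    """Optimal maximum load of a P x Q-way jagged partition whose stripes are rows."""
    n1, n2 = len(A), len(A[0])
    # col_prefix[i][j] = sum of column j over the first i rows.
    col_prefix = [[0] * n2]
    for row in A:
        col_prefix.append([c + v for c, v in zip(col_prefix[-1], row)])

    @lru_cache(maxsize=None)
    def one_d(k, i):
        # 1D(k, i, Q): best Q-way cut of the stripe made of rows k..i (1-based, inclusive).
        loads = [hi - lo for lo, hi in zip(col_prefix[k - 1], col_prefix[i])]
        return bottleneck_1d(loads, Q)

    @lru_cache(maxsize=None)
    def l_max(i, p):
        if i == 0:
            return 0
        if p == 0:
            return math.inf
        return min(max(l_max(k - 1, p - 1), one_d(k, i)) for k in range(1, i + 1))

    return l_max(n1, P)


A = [[5, 5],
     [10, 0],
     [0, 10]]
print(jag_pq_opt(A, 2, 2))  # 10: stripe {row 1} and stripe {rows 2, 3}
```

The two caches correspond to the two tables counted on the slide: one entry per `(i, p)` pair for `l_max` and one per `(k, i)` pair for `one_d`.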
Performance of P × Q-way jagged (PIC-MAG it=30000). [Figure: load imbalance (0.001 to 1) versus number of processors (10 to 10000) for RECT-NICOL, JAG-PQ-HEUR and JAG-PQ-OPT.]
m-way jagged partitioning heuristics. JAG-M-HEUR: similar to JAG-PQ-HEUR. Cut into P stripes using an optimal 1D algorithm. Distribute processors proportionally to each stripe's load. Compute a 1D partitioning of each stripe independently. JAG-M-HEUR-PROBE: partition all the stripes at once using a multiple 1D arrays partitioning algorithm [Fre92].
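The JAG-M-HEUR steps can be sketched as follows; JAG-M-HEUR-PROBE is not covered. This is an illustration, not the authors' implementation. The slide does not say how the proportional shares are rounded, so the sketch assumes one processor per stripe plus a largest-remainder split of the rest. It also assumes stripes made of rows, non-negative integer loads with a positive total, and P no larger than m. All function names are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def partition_1d(loads, m):
    """Optimal split of non-negative integer loads into at most m intervals (m >= 1)."""
    prefix = [0] + list(accumulate(loads))
    n = len(loads)

    def greedy_cuts(bound):
        cuts, start = [], 0
        while start < n and len(cuts) < m:
            end = bisect_right(prefix, prefix[start] + bound) - 1
            if end == start:
                return None
            cuts.append(end)
            start = end
        return cuts if start == n else None

    lo, hi = max(loads, default=0), prefix[-1]
    while lo < hi:
        mid = (lo + hi) // 2
        if greedy_cuts(mid) is None:
            lo = mid + 1
        else:
            hi = mid
    return lo, greedy_cuts(lo)


def allocate(stripe_loads, m):
    """Assumed rounding rule: one processor per stripe, rest by largest remainder."""
    total = sum(stripe_loads)
    spare = m - len(stripe_loads)
    parts = [divmod(spare * load, total) for load in stripe_loads]
    alloc = [1 + q for q, _ in parts]
    leftover = m - sum(alloc)
    by_remainder = sorted(range(len(parts)), key=lambda i: parts[i][1], reverse=True)
    for i in by_remainder[:leftover]:
        alloc[i] += 1
    return alloc


def jag_m_heur(A, P, m):
    """Return (max_load, alloc) where alloc[s] is the processor count of stripe s."""
    row_loads = [sum(row) for row in A]
    _, row_cuts = partition_1d(row_loads, P)
    bounds = list(zip([0] + row_cuts[:-1], row_cuts))
    alloc = allocate([sum(row_loads[s:e]) for s, e in bounds], m)
    max_load = 0
    for (s, e), q in zip(bounds, alloc):
        col_loads = [sum(col) for col in zip(*A[s:e])]
        max_load = max(max_load, partition_1d(col_loads, q)[0])
    return max_load, alloc


A = [[6, 6, 6, 6],
     [1, 1, 1, 1],
     [1, 1, 1, 1],
     [1, 1, 1, 1]]
print(jag_m_heur(A, 2, 6))  # (6, [4, 2])
```

Here the first stripe (load 24) receives 4 of the 6 processors and the second stripe (load 12) receives 2, so both stripes end with a maximum part load of 6.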