Concept: coarsest balanced octree
The coarsest balanced neighborhood can be extended to the coarsest balanced octree $T^k(o)$. It does not look the same for all octants.
[Figure: two panels, 1-balance and 2-balance]
Isaac, Burstedde, Ghattas (UT Austin, Uni Bonn), Algorithms for 2:1 Octree Balance, IEEE IPDPS12, Shanghai

Formal definition of a Balance algorithm
Definition ($T^k(S)$): Given an arbitrary set of octants $S$, $T^k(S)$ is equal to the leaves (i.e. non-ancestor octants) in $\bigcup_{o \in S} T^k(o)$.
The purpose of a Balance algorithm is to convert a linear octree $T$ into $T^k(T)$.

Ripple effect
Example (2D, 1-balance)
This octree would be balanced if the blue octants were not present. The presence of the blue octants causes changes across the whole diameter of the octree. This has big implications for parallel algorithms.

Serial 2:1 Balance algorithm
Start: unbalanced linear octree $T$
for $o \in T$:
    $T \leftarrow T \cup N^k(o)$    ($T$ is no longer a linear octree: not ordered, overlaps)
order $T$ and remove overlaps
Recognizing and eliminating redundant octants from this process greatly improves its performance. For more information, see Section 3 of our paper.
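The loop above can be sketched for a 2D quadtree as follows. This is a minimal illustration, not the optimized method from the paper: it assumes that $N^k(o)$ consists of the siblings of $o$ plus the same-size neighbors of its parent (faces only for k=1, faces and corners for k=2), it keeps processing newly added octants until nothing changes, and it does none of the redundancy elimination mentioned on the slide. The octant encoding `(level, x, y)`, the constant `MAXL`, and all function names are hypothetical.

```python
MAXL = 6  # assumed maximum refinement level for this sketch

def size(level):
    """Side length of an octant at the given level, in finest-level units."""
    return 1 << (MAXL - level)

def parent(o):
    level, x, y = o
    s = size(level - 1)
    return (level - 1, x - x % s, y - y % s)

def children(o):
    level, x, y = o
    h = size(level + 1)
    return [(level + 1, x + dx, y + dy) for dy in (0, h) for dx in (0, h)]

def neighborhood(o, k=1):
    """Assumed N^k(o): siblings of o plus same-size neighbors of its parent."""
    if o[0] == 0:
        return []
    p = parent(o)
    s, root = size(p[0]), size(0)
    out = [c for c in children(p) if c != o]
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if 0 < abs(dx) + abs(dy) <= k:  # k=1: faces, k=2: faces and corners
                nx, ny = p[1] + dx * s, p[2] + dy * s
                if 0 <= nx < root and 0 <= ny < root:  # stay inside the root
                    out.append((p[0], nx, ny))
    return out

def morton(o):
    """Interleave the bits of x and y to get the Morton (z-order) key."""
    _, x, y = o
    key = 0
    for b in range(MAXL):
        key |= ((x >> b) & 1) << (2 * b) | ((y >> b) & 1) << (2 * b + 1)
    return key

def balance(octants, k=1):
    seen = set(octants)
    work = list(seen)
    while work:  # T <- T U N^k(o), including octants added along the way
        o = work.pop()
        for n in neighborhood(o, k):
            if n not in seen:
                seen.add(n)
                work.append(n)
    ancestors = set()
    for o in seen:  # remove overlaps: drop every ancestor of another octant
        while o[0] > 0:
            o = parent(o)
            ancestors.add(o)
    return sorted(seen - ancestors, key=morton)  # order along the z-curve

# Example input: refine repeatedly toward the centre of the root octant.
leaves = [(1, 32, 0), (1, 0, 32), (1, 32, 32)]
o = (1, 0, 0)
for _ in range(4):
    cs = children(o)
    leaves += cs[:3]
    o = cs[3]
leaves.append(o)
balanced_leaves = balance(leaves, k=1)
```

The example input places a level-5 octant directly next to three level-1 octants, so the coarse octants have to be split several times before the result is 2:1 balanced.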

Parallel 2:1 Balance

Parallel Balance: the Ripple algorithm
[Figure: neighboring processes p ↔ q ↔ r ↔ s]
- local balance using a serial algorithm
- exchange neighboring information
- local rebalance using neighboring information
- repeat
$O(P)$ rounds of communication may be necessary. This algorithm is appropriate in low latency settings.
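To see why the number of rounds can grow with the number of processes, here is a toy 1D model of the Ripple loop. It is not the octree algorithm: each process simply owns a run of cells, each cell carries a refinement level, and "balanced" is taken to mean that adjacent levels differ by at most one. The functions `balance_1d` and `ripple` are hypothetical names for this sketch, and the exchange step is simulated in a single Python process.

```python
def balance_1d(levels):
    """Raise levels minimally so that adjacent cells differ by at most one."""
    out = list(levels)
    for i in range(1, len(out)):            # propagate left to right
        out[i] = max(out[i], out[i - 1] - 1)
    for i in range(len(out) - 2, -1, -1):   # propagate right to left
        out[i] = max(out[i], out[i + 1] - 1)
    return out

def ripple(chunks):
    """chunks[i] holds the levels owned by process i.
    Returns the balanced chunks and the number of exchange rounds used."""
    chunks = [balance_1d(c) for c in chunks]  # local balance
    rounds = 0
    while True:
        # exchange: every process receives its neighbors' boundary levels
        left = [None] + [c[-1] for c in chunks[:-1]]
        right = [c[0] for c in chunks[1:]] + [None]
        rounds += 1
        new = []
        for c, l, r in zip(chunks, left, right):
            padded = ([] if l is None else [l]) + c + ([] if r is None else [r])
            b = balance_1d(padded)            # local rebalance with ghost cells
            lo = 0 if l is None else 1
            new.append(b[lo:lo + len(c)])
        if new == chunks:                     # nothing changed anywhere: done
            return chunks, rounds
        chunks = new

# One fine cell on the first process forces refinement on every other process.
result, rounds = ripple([[8, 0], [0, 0], [0, 0], [0, 0]])
```

In this example the change started by the single fine cell moves one process further per round, which is the behavior the $O(P)$ bound describes.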

Parallel Balance: the One-Pass algorithm
- local balance using a serial algorithm
- exchange neighboring and remote information
- local rebalance using all pertinent information
Naïvely requires all-to-all communication.

The insulation layer $I(r)$
When enforcing 2:1 balance, an octant is only affected by octants within its insulation layer [3].
- greatly reduces the number of processes that must communicate
- the relationship is not symmetric
- $p$ does not know a priori that it affects $q$
An efficient scheme for determining communicating pairs is required. See Section 5 of our paper.
[Figure: octant $r$ on process $q$ with its insulation layer $I(r)$ reaching into process $p$; $p$ affects $q$, $q$ does not affect $p$]
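The asymmetry can be checked with a few lines of code. The sketch below assumes the usual definition of the insulation layer as the ring of same-size octants surrounding $r$, so that $I(r)$ together with $r$ is a box three octants wide; the slides do not spell this out, so treat it as an assumption. The 2D encoding `(level, x, y)`, the constant `MAXL`, and the function names are hypothetical.

```python
MAXL = 6  # assumed maximum refinement level for this sketch

def size(level):
    return 1 << (MAXL - level)

def box(o):
    """Axis-aligned box (x0, y0, x1, y1) covered by octant o."""
    level, x, y = o
    s = size(level)
    return (x, y, x + s, y + s)

def insulation_box(r):
    """Box three octants wide centred on r (assumed extent of I(r) plus r)."""
    level, x, y = r
    s = size(level)
    return (x - s, y - s, x + 2 * s, y + 2 * s)

def overlaps(a, b):
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def may_affect(o, r):
    """True if octant o lies at least partly inside the insulation layer of r."""
    return o != r and overlaps(box(o), insulation_box(r))

r = (1, 32, 0)    # large octant, thought of as owned by process q
o = (5, 20, 10)   # small octant, thought of as owned by process p
o_affects_r = may_affect(o, r)
r_affects_o = may_affect(r, o)
```

Here the small octant falls inside the insulation layer of the large one, but the large octant is outside the much smaller insulation layer of the small one, which matches the "p affects q, q does not affect p" situation on the slide.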

One-pass communication
[Figure sequence: octant r on process q, octant o on process p]
- process q: r ⇒ send r to p
- process p: o ⇒ send o to q
Once remote octants are received, how do we determine their effect?

Rebalancing with remote octants, old algorithm
[Figure sequence: process q, octants r and o]
This method is not $O(1)$ and represents redundant work.

Determining remote balance
[Figure: process q, octant o at displacement $(\delta_x, \delta_y)$, unknown size $\ell$]
We want an $O(1)$ method to determine the size $\ell$ from the displacement $\delta$.

Illustration: 2D, 2-balance
[Figure sequence]
$2^\ell \sim \max\{\delta_x, \delta_y\}$

Illustration: 2D, 1-balance
[Figure sequence]
$2^\ell \sim \delta_x + \delta_y$
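The two 2D relations can be turned into a constant-time estimate. The sketch below reads "$\sim$" as "agrees in the most significant bit", so it rounds the right-hand side down to a power of two; that reading, the integer displacement units, and the function names are assumptions, and the exact offsets used in practice are in the paper rather than on the slides.

```python
def msb(v):
    """Largest power of two not exceeding v (requires v > 0)."""
    return 1 << (v.bit_length() - 1)

def size_2d(dx, dy, k):
    """Rough octant size 2**l for a displacement (dx, dy) under k-balance."""
    if k == 2:
        v = max(dx, dy)   # 2-balance: faces and corners
    elif k == 1:
        v = dx + dy       # 1-balance: faces only
    else:
        raise ValueError("k must be 1 or 2 in 2D")
    return msb(v)

s2 = size_2d(5, 3, k=2)
s1 = size_2d(5, 3, k=1)
```

For the same displacement the 1-balance estimate is at least as large as the 2-balance one, since $\delta_x + \delta_y \ge \max\{\delta_x, \delta_y\}$: the weaker condition lets octants grow faster with distance.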

Illustration: 3D, 3-balance
[Figure sequence]
$2^\ell \sim \max\{\delta_x, \delta_y, \delta_z\}$

Illustration: 3D, 2-balance
[Figure sequence]

Computing the Sierpinski profile
The size $2^\ell$ can be computed using ternary addition:
1. Reinterpret the binary displacement $\delta$ as a base-3 number.
2. Set $\lambda = \delta_x + \delta_y + \delta_z$, base-3.
3. Reinterpret $\lambda$ as a binary number.
4. Return $\lambda$.
We only need the most significant bit of $\lambda$, so we can approximate it with the expression
$\mathrm{Carry3}(\delta_x, \delta_y, \delta_z) := \max\{\delta_x, \delta_y, \delta_z, \delta_x + \delta_y + \delta_z - (\delta_x \mid \delta_y \mid \delta_z)\}$, where $\mid$ denotes bitwise OR.
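The Carry3 expression translates directly into integer arithmetic. The following is a small sketch assuming non-negative integer displacements; the function names `carry3` and `carry3_msb` are hypothetical.

```python
def carry3(a, b, c):
    """max{a, b, c, a + b + c - (a | b | c)} for non-negative integers.

    The last term is never negative, and it is zero when no two arguments
    share a set bit, because then a + b + c equals a | b | c.
    """
    return max(a, b, c, a + b + c - (a | b | c))

def carry3_msb(a, b, c):
    """Most significant bit of carry3, as a power of two (0 if all are zero)."""
    v = carry3(a, b, c)
    return 1 << (v.bit_length() - 1) if v > 0 else 0

no_shared_bits = carry3(1, 2, 4)   # sum equals OR, so the max term decides
all_shared_bits = carry3(1, 1, 1)  # overlapping bits push the result higher
```

Everything here is a handful of additions, a bitwise OR, and a comparison, so the cost does not depend on how far apart the octants are.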

Illustration: 3D, 1-balance
[Figure sequence]
$2^\ell \sim \mathrm{Carry3}(\delta_y + \delta_z, \delta_z + \delta_x, \delta_x + \delta_y)$

Results

Weak scaling
- Compared old and new algorithms on the Jaguar XT5 supercomputer at Oak Ridge National Laboratory.
- Fractal refinement pattern, increasing refinement proportional to the number of CPUs.
- About 1.3 million octants per CPU.

Weak scaling: full one-pass algorithm
[Figure: seconds per (million elements / core) against number of CPU cores, 12 to 112128, for the Old and New algorithms]

Weak scaling: components
[Figure, two panels: Local balance (serial algorithm) and Local rebalance (remote balancing); each plots seconds per (million elements / core) against number of CPU cores, 12 to 112128, for the Old and New algorithms]

Strong scaling
- Compared old and new algorithms on the Jaguar XT5 supercomputer at Oak Ridge National Laboratory.
- Mesh of the Antarctic ice sheet, with localized refinement to resolve the transition from grounded to floating ice, with about 90 million octants.
- Doubling processor counts from 12 to 6,144.

Strong scaling: full one-pass algorithm
[Figure: seconds (logarithmic axis) against number of CPU cores, 12 to 6144, for Perfect Scaling and the Old and New algorithms]

Strong scaling: components
[Figure, two panels: Local balance (serial algorithm) and Local rebalance (remote balancing); each plots seconds (logarithmic axis) against number of CPU cores, 12 to 6144, for Perfect Scaling and the Old and New algorithms]

Thank you