PASCAL: A Parallel Algorithmic SCALable Framework for N-body Problems
Laleh Aghababaie Beni, Aparna Chandramowlishwaran
Euro-Par 2017
HPC Factory
Outline • Introduction • PASCAL Framework • Space Partitioning Trees • Tree Traversal • Prune/Approximate Generators • Optimizations & Parallelization • Experiments & Results • Conclusions & Future Work
N-body calculations: what do these have in common?
Force computation: $\forall q \in Q:\ F(q) = C \sum_{r \in (Q - \{q\})} \frac{r - q}{\|r - q\|^3}$
Nearest neighbors: $\forall q \in Q:\ \mathrm{AllNN}(q) = \arg\min_{r \in R} d(q, r)$
Kernel density estimation: $\forall q \in Q:\ \mathrm{KDE}(q) = \frac{1}{|R|} \sum_{r \in R} K(q, r)$
Range count: $\forall q \in Q:\ \mathrm{Range}(q) = \sum_{r \in R} I(\mathrm{dist}(q, r) \le h)$
Consider pairs of points: naïvely $O(N^2)$
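To make the naïve quadratic cost concrete, here is a minimal sketch of the direct force computation from the slide. The function name `naive_forces` and the default `C=1.0` are illustrative assumptions, and the points are assumed to be distinct so that no distance is zero.

```python
import math

def naive_forces(points, C=1.0):
    """Direct evaluation of F(q) = C * sum_{r in Q - {q}} (r - q) / ||r - q||^3.

    Hypothetical helper for illustration; assumes all points are distinct.
    """
    forces = []
    for i, q in enumerate(points):
        f = [0.0] * len(q)
        for j, r in enumerate(points):
            if i == j:
                continue  # the sum runs over Q - {q}
            diff = [rc - qc for rc, qc in zip(r, q)]
            dist = math.sqrt(sum(d * d for d in diff))
            for k, d in enumerate(diff):
                f[k] += C * d / dist ** 3
        forces.append(f)
    return forces
```

The two nested loops over `points` are what the tree-based methods in the rest of the talk aim to avoid.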
Commonality: Optimal approximation algorithms
Force computation: $\forall q \in Q:\ F(q) = C \sum_{r \in (Q - \{q\})} \frac{r - q}{\|r - q\|^3}$
• Hierarchical tree-based approximation algorithms for force computations, e.g., Barnes-Hut or FMM
• Evaluate interactions → tree traversals
• Store aggregate data at nodes, e.g., bounding box, mass
N-body problems in other domains
Problem | Operators | Kernel Function
All Nearest Neighbors | $\forall, \arg\min$ | $\|x_q - x_r\|$
All Range Search | $\forall, \cup\arg$ | $I(h_{min} < \|x_q - x_r\| < h_{max})$
All Range Count | $\forall, \Sigma$ | $I(h_{min} < \|x_q - x_r\| < h_{max})$
Naive Bayes Classifier | $\forall, \arg\max$ | $P(C_k) \frac{1}{\sqrt{2\pi|\Sigma_k|}} e^{-\frac{1}{2}(x_i - \mu_k)^T \Sigma_k^{-1} (x_i - \mu_k)}$
Mixture Model E-step | $\forall, \forall$ | $\frac{1}{\sqrt{2\pi|\Sigma_k|}} e^{-\frac{1}{2}(x_i - \mu_k)^T \Sigma_k^{-1} (x_i - \mu_k)}$
K-means E-step | $\forall, \arg\min$ | $\|x_q - x_r\|$
Mixture Model Log-likelihood | $\Sigma, \log\Sigma$ | $\frac{1}{\sqrt{2\pi|\Sigma_k|}} e^{-\frac{1}{2}(x_i - \mu_k)^T \Sigma_k^{-1} (x_i - \mu_k)}$
Kernel Density Estimation | $\forall, \Sigma$ | $\phi(\|x_q - x_r\| / h)$
Kernel Density Bayes Classifier | $\forall, \arg\max\Sigma$ | $P(C_k)\, \phi(\|x_q - x_r\| / h)$
2-point (cross-)correlation | $\Sigma, \Sigma$ | $I(\|x_q - x_r\| < h)$
Nadaraya-Watson Regression | $\forall, \Sigma$ | $y_r\, \phi(\|x_q - x_r\| / h)$
Thermodynamic Average | $\Sigma, \Sigma$ | $\phi(\|x_q - x_r\|)$
Largest-span set | $\max, ..., \max$ | $\Sigma(\|x_q - x_r\|)$
Closest Pair | $\arg\min, \arg\min$ | $\|x_q - x_r\|$
Minimum Spanning Tree | $\forall, \arg\min$ | $\|x_q - x_r\|$
Coulombic Interaction | $\forall, \Sigma$ | $\alpha_q \alpha_r / \|x_q - x_r\|$
Average Density | $\Sigma, \Sigma$ | $I(\|x_q - x_r\| < h)$
Wave function | $\forall, \Pi$ | $\phi(\|x_q - x_r\|)$
Hausdorff Distance | $\max, \min$ | $\|x_q - x_r\|$
Intrinsic (fractal) Dimension | $\Sigma, \Sigma$ | $I(\|x_q - x_r\| < h)$
Each problem has a set of operators and a kernel function.
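The "operators plus kernel" view of the table can be sketched as a single brute-force evaluator that takes the two operators and the kernel as arguments. This is only an illustration of the specification idea, not the PASCAL interface; `n_body`, `euclidean`, and the use of `list` to stand in for the ∀ operator are assumptions.

```python
import math

def euclidean(q, r):
    """Kernel ||x_q - x_r||."""
    return math.dist(q, r)

def n_body(queries, references, outer_op, inner_op, kernel):
    """Brute-force evaluation: outer_op over queries of inner_op over references.

    Hypothetical helper: inner_op reduces the kernel values for one query,
    outer_op combines the per-query results (use `list` for the 'for all' operator).
    """
    per_query = [inner_op(kernel(q, r) for r in references) for q in queries]
    return outer_op(per_query)

Q = [(0.0, 0.0), (1.0, 0.0)]
R = [(0.0, 0.0), (3.0, 0.0)]

# Hausdorff distance row: operators (max, min), kernel ||x_q - x_r||.
hausdorff = n_body(Q, R, max, min, euclidean)

# Range count in the style of the table: operators (for all, sum), indicator kernel.
h = 1.5
range_count = n_body(Q, R, list, sum, lambda q, r: int(euclidean(q, r) < h))
```

Changing only the operator pair and the kernel switches between rows of the table, which is the property the framework builds on.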
Why N-body methods? • One of the original seven dwarfs, or motifs • FMM is listed among the top 10 algorithms with the greatest influence in the 20th century • EM is one of the top 10 algorithms with the highest impact in data mining • Applications • Machine learning • Computer vision • Computational geometry • Scientific computing …
Key Ideas and Findings • An algorithmic framework for N-body problems • Automatically generates prune and approximation conditions • Results in O(N log N) and O(N) algorithms • Domain-specific optimizations and parallelization • 10-230x speedup on 6 different algorithms compared to state-of-the-art libraries and software • New optimal algorithms out of the box • O(N log N) EM algorithm for GMMs • O(N) algorithm for Hausdorff distance
PASCAL Framework [diagram: Datasets; N-body spec. (operators & kernel function); Space-partitioning trees (kd-tree); Tree traversal schemes (multi-tree traversals; BaseCase, Prune/Approximate, ComputeApproximate); Prune/Approximate condition generator; Domain-specific optimizations; Parallelization; Optimized code]
Tree Construction Recursively divide space until each box has at most q points. http://www.cs.cmu.edu/~dpelleg/kmeans.html
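The recursive splitting rule can be sketched as a small kd-tree builder. The slide only states the stopping rule (at most q points per box); splitting at the median of the widest dimension, the class name `KDNode`, and storing the points at every node are assumptions made to keep the example short.

```python
class KDNode:
    """Kd-tree node holding its points and their bounding box [lo, hi]."""

    def __init__(self, points, leaf_size):
        dims = range(len(points[0]))
        self.points = points
        self.lo = [min(p[d] for p in points) for d in dims]
        self.hi = [max(p[d] for p in points) for d in dims]
        self.left = None
        self.right = None
        if len(points) > leaf_size:
            # Assumed split rule: median along the widest dimension of the box.
            axis = max(dims, key=lambda d: self.hi[d] - self.lo[d])
            ordered = sorted(points, key=lambda p: p[axis])
            mid = len(ordered) // 2
            self.left = KDNode(ordered[:mid], leaf_size)
            self.right = KDNode(ordered[mid:], leaf_size)

def leaves(node):
    """Yield the leaf nodes of the tree, left to right."""
    if node.left is None:
        yield node
    else:
        yield from leaves(node.left)
        yield from leaves(node.right)
```

Here `leaf_size` plays the role of q on the slide and is assumed to be at least 1; the bounding box stored at each node is the kind of aggregate data that a traversal can later use to prune.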
Tree Traversal [animation: query tree Q and reference tree R]
• At each pair of nodes (Q, R), test Prune/Approx()?
• NO: continue the traversal into the child nodes
• At the leaves, BaseCase(): direct evaluation $Q_L \otimes R_L \to O(q^2)$
• YES, pruning problems: if Prune/Approx() is true, discard the entire subtree
• YES, approximation problems: if Prune/Approx() is true, replace the subtree with the centroid and call ApproxCompute()
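The traversal steps above can be sketched for one concrete problem: counting the pairs with $\|x_q - x_r\| < h$ (the 2-point correlation row). This is a minimal sketch, not the PASCAL implementation; the `Node` class, the box-distance bounds, and the "all pairs inside" shortcut are assumptions chosen for the example.

```python
import math

class Node:
    """Kd-tree node with bounding box [lo, hi]; leaves hold at most leaf_size points."""
    def __init__(self, pts, leaf_size=4):
        dims = range(len(pts[0]))
        self.pts = pts
        self.lo = [min(p[d] for p in pts) for d in dims]
        self.hi = [max(p[d] for p in pts) for d in dims]
        self.children = []
        if len(pts) > leaf_size:
            axis = max(dims, key=lambda d: self.hi[d] - self.lo[d])
            ordered = sorted(pts, key=lambda p: p[axis])
            mid = len(ordered) // 2
            self.children = [Node(ordered[:mid], leaf_size), Node(ordered[mid:], leaf_size)]

def min_dist(a, b):
    """Lower bound on the distance between any point in box a and any point in box b."""
    gaps = [max(al - bh, bl - ah, 0.0) for al, ah, bl, bh in zip(a.lo, a.hi, b.lo, b.hi)]
    return math.sqrt(sum(g * g for g in gaps))

def max_dist(a, b):
    """Upper bound on the distance between any point in box a and any point in box b."""
    spans = [max(ah - bl, bh - al) for al, ah, bl, bh in zip(a.lo, a.hi, b.lo, b.hi)]
    return math.sqrt(sum(s * s for s in spans))

def pair_count(q, r, h):
    """Count pairs (x_q, x_r) with ||x_q - x_r|| < h by a dual-tree traversal."""
    if min_dist(q, r) >= h:
        return 0  # prune: discard the node pair, no point pair can be within h
    if max_dist(q, r) < h:
        return len(q.pts) * len(r.pts)  # shortcut: every point pair is within h
    if not q.children and not r.children:
        # BaseCase: direct evaluation between two leaves
        return sum(1 for x in q.pts for y in r.pts if math.dist(x, y) < h)
    qs = q.children or [q]  # a leaf is paired with the other node's children
    rs = r.children or [r]
    return sum(pair_count(qc, rc, h) for qc in qs for rc in rs)
```

A call such as `pair_count(Node(Q), Node(R), h)` returns the same count as the double loop over all pairs, but whole node pairs are resolved by a single box test whenever the bounds allow it.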
Prune/Approximate Condition Generator • Prune, e.g., Hausdorff Distance
Hausdorff Distance [figure: point sets Q and R]
Prune/Approximate Condition Generator • Prune, e.g., Hausdorff Distance • Approximation, e.g., Expectation Maximization (EM): E-step, M-step, log-likelihood
Approximate Condition for EM [figure: nodes Q and R with their centers; kernel values K_min, K_max, and K_center marked]
$K_{max} - K_{min} < X\, K_{center} < \beta$ (user-controlled accuracy)
E-step: log-likelihood: $(r_{i,max} - r_{i,min}) < \beta\, r_{i,mean}$ $(i = 1, ..., K)$
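One way to read the condition is as a relative-error test: the spread of kernel values over a node pair must be small compared with the kernel value at the node centers. The sketch below works under that reading only. The Gaussian kernel, the bandwidth `h`, the parameter name `tol` for the user-controlled accuracy, and the use of bounding boxes to bound distances are all assumptions, not details taken from the slides.

```python
import math

def gaussian(d, h):
    """Assumed kernel: Gaussian in the distance d with bandwidth h (decreasing in d)."""
    return math.exp(-0.5 * (d / h) ** 2)

def can_approximate(q_lo, q_hi, r_lo, r_hi, h, tol):
    """Return True if K_max - K_min < tol * K_center for two bounding boxes.

    Hypothetical helper: boxes are given by their lower and upper corners.
    """
    boxes = list(zip(q_lo, q_hi, r_lo, r_hi))
    d_min = math.sqrt(sum(max(ql - rh, rl - qh, 0.0) ** 2 for ql, qh, rl, rh in boxes))
    d_max = math.sqrt(sum(max(qh - rl, rh - ql) ** 2 for ql, qh, rl, rh in boxes))
    d_center = math.sqrt(sum(((ql + qh) / 2 - (rl + rh) / 2) ** 2 for ql, qh, rl, rh in boxes))
    k_max = gaussian(d_min, h)  # largest kernel value occurs at the smallest distance
    k_min = gaussian(d_max, h)  # smallest kernel value occurs at the largest distance
    k_center = gaussian(d_center, h)
    return k_max - k_min < tol * k_center
```

Small, well-separated boxes pass the test and can be replaced by their centers, while large or overlapping boxes fail and must be refined further; a smaller `tol` forces more refinement.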
Prune Condition for Hausdorff distance [figure: nodes Q and R with their border points]
$\forall x_q \in N_q^{border}, \forall x_r \in N_r^{border}$ s.t. $\mathrm{op}_{\oplus_1}(\tau_1, K(x_q, x_r) \mid \mathrm{op}_{\oplus_2}(\tau_2, K(x_q, x_r)))$
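To show what pruning buys for the (max, min) operator pair, here is a minimal point-level sketch. It is not the border-point condition from the slide: it swaps in a simpler early-break rule, where the inner minimum is abandoned as soon as it can no longer raise the outer maximum. The function name `directed_hausdorff` is an assumption.

```python
import math

def directed_hausdorff(Q, R):
    """Directed Hausdorff distance max_{q in Q} min_{r in R} ||q - r|| with early break."""
    best = 0.0  # current value of the outer max
    for q in Q:
        nearest = math.inf  # current value of the inner min for this q
        for r in R:
            nearest = min(nearest, math.dist(q, r))
            if nearest <= best:
                break  # prune: this q cannot increase the outer max
        if nearest > best:
            best = nearest
    return best
```

The tree-based condition applies the same kind of reasoning to whole nodes at once, so that an entire subtree is discarded instead of the remainder of one inner loop.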