Institute for CLEAN AND SECURE ENERGY, THE UNIVERSITY OF UTAH
Efficient Abstractions for Exascale Software Design
James C. Sutherland
Associate Professor - Chemical Engineering, The University of Utah
DOE Awards DE-NA0002375, DE-NA-000740
PetaApps award 0904631
XPS award 1337145

Acknowledgments
• Matt Might: Associate Professor, School of Computing
• Chris Earl: Post-Doctoral Research Associate (now at LLNL)
• Tony Saad: Senior Computational Scientist
• Devin Robison: M.S. student (now at Fusion IO)
• Abhishek Bagusetty: M.S. student (now at U. Pittsburgh)
• Amir Biglari, Babak Goshayeshi: Ph.D. students; now a post-doc
• Nathan Yonkee: Ph.D. student
Funding: US DOE/NNSA; NSF PetaApps award 0904631

One-Dimensional Turbulence (ODT)
[Figure: ODT domain]
Cost:
• ODT: ~600 CPU hours (~1.5 hours/realization, 400 realizations); scales as $Re^{3/2}$.
• DNS: ~2 million hours; scales as $Re^{3}$.
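
The cost figures above can be checked with a few lines of arithmetic. The sketch below uses only the numbers on the slide. The helper name `cost_growth` is hypothetical, and because the slide gives no prefactors, the exponents are used only to compare relative growth when Re changes.

```python
# Numbers taken from the slide.
hours_per_realization = 1.5
realizations = 400
odt_hours = hours_per_realization * realizations  # total ODT cost in CPU hours
dns_hours = 2.0e6                                  # ~2 million hours for DNS

# DNS-to-ODT cost ratio for this particular case.
cost_ratio = dns_hours / odt_hours


def cost_growth(re_factor, exponent):
    """Relative cost increase when Re grows by re_factor (hypothetical helper)."""
    return re_factor ** exponent


# Doubling Re: ODT cost grows as 2^(3/2), DNS cost as 2^3.
odt_growth = cost_growth(2.0, 1.5)
dns_growth = cost_growth(2.0, 3.0)

print(f"ODT total: {odt_hours:.0f} CPU hours")
print(f"DNS/ODT cost ratio: {cost_ratio:.0f}")
print(f"Doubling Re multiplies ODT cost by {odt_growth:.2f} and DNS cost by {dns_growth:.0f}")
```

The gap between the two approaches widens with Reynolds number, since the DNS exponent is twice the ODT exponent.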

ODT of Multiphase Reacting Flows
[Figures: legend entries CPD, Kob, Exp; axis labels Probability Density, x/d_j, Length (cm), Standoff Distance (m)]

Parameterizing Manifolds in Turbulent Combustion
[Figure panels: Common parameterization; PCA parameterization]
Enabling technologies:
• Principal component analysis
• Multivariate adaptive regression

Reduced cost while maintaining accuracy
• PCA to identify model on 11-dimensional original system.
• Truncation to two dimensions.
[Figure: panels for $\langle Y_{OH} \rangle \times 10^{-3}$ and $\langle T \rangle$ [K]; axis labels $\tau$ and Y/H]
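
The truncation from 11 dimensions to two can be illustrated with a minimal PCA sketch. This is not the authors' procedure or data: it assumes synthetic 11-variable samples that lie near a two-dimensional subspace, and it computes the principal components with an SVD in NumPy. All variable names are hypothetical.

```python
import numpy as np

# Synthetic stand-in for an 11-variable state (an assumption, not combustion data):
# samples lie close to a 2-D linear subspace, plus a small amount of noise.
rng = np.random.default_rng(0)
n_samples, n_vars, n_latent = 500, 11, 2
latent = rng.normal(size=(n_samples, n_latent))
mixing = rng.normal(size=(n_latent, n_vars))
X = latent @ mixing + 1e-3 * rng.normal(size=(n_samples, n_vars))

# PCA via SVD of the mean-centered data; rows of Vt are the principal directions.
mean = X.mean(axis=0)
Xc = X - mean
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = s**2 / np.sum(s**2)  # fraction of variance per component

# Truncate to two principal components and reconstruct the 11 variables.
n_keep = 2
scores = Xc @ Vt[:n_keep].T             # reduced (2-D) parameterization
X_approx = scores @ Vt[:n_keep] + mean  # back to the original 11 variables
rel_error = np.linalg.norm(X - X_approx) / np.linalg.norm(X)

print("variance captured by 2 components:", explained[:n_keep].sum())
print("relative reconstruction error:", rel_error)
```

For real thermochemical data the manifold is generally nonlinear, so the reconstruction step is where a regression method would replace the linear map used here.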

Time Integration Methods for Stiff, Nonlinear Systems
Dual time-stepping
• Robust nonlinear solver for explosive chemistry.
• As fast as Newton's method at small Δt.
• Stable for any Δt.
• Gives choice of resolution despite ignition/extinction.
• Significant performance gains with our adaptive Δσ scheme.
Example: Heptane/Air (654 species, 4846 reactions)
• Δt_max (Newton) = 0.1 μs
• Δt_process = 1 ms
• Estimated 2000x speedup
[Figure: T (K) vs. t (ms) for Reactor (DT-BDF-1) and Inlet]
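
The idea behind dual time-stepping can be sketched on a toy problem. The example below assumes a BDF-1 (implicit Euler) discretization in physical time and plain explicit Euler marching in pseudo-time σ with a fixed Δσ. The linear test equation, the function name `dual_time_bdf1_step`, and the choice of Δσ are assumptions for illustration only; the adaptive Δσ scheme mentioned above is not reproduced.

```python
def dual_time_bdf1_step(f, y_old, dt, dsigma, tol=1e-10, max_iter=10_000):
    """Advance one BDF-1 step by marching the residual to zero in pseudo-time.

    Solves (y - y_old)/dt = f(y) for y by iterating
        y <- y + dsigma * (f(y) - (y - y_old)/dt).
    """
    y = y_old  # initial guess for the new time level
    for _ in range(max_iter):
        residual = f(y) - (y - y_old) / dt  # zero at the implicit solution
        if abs(residual) < tol:
            break
        y = y + dsigma * residual  # explicit Euler step in pseudo-time
    return y


# Stiff linear test problem dy/dt = -k*y (an assumed stand-in for chemistry).
k = 1000.0
f = lambda y: -k * y

dt = 0.1                        # 50x larger than the explicit Euler limit 2/k
dsigma = 0.5 / (k + 1.0 / dt)   # fixed pseudo-time step, chosen for stability here

y_new = dual_time_bdf1_step(f, 1.0, dt, dsigma)
exact_bdf1 = 1.0 / (1.0 + k * dt)  # closed-form implicit Euler result

print("dual-time result:", y_new)
print("implicit Euler  :", exact_bdf1)
```

The physical step Δt is set by the resolution wanted for the process, while stability is handled by the inner pseudo-time iteration.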

HPC is becoming more challenging
• Physics problems that we are tackling are increasingly complex.
• Hardware architectures are increasingly complex, uncertain, and even divergent!
• Cost of software rewrites measured in tens of millions of dollars!

Year  Machine     Peak speed  Cores                                    Programming model             Cost ($)  Footprint
1997  ASCI Red    1 TFLOPS    9,298                                    MPI (distributed memory)      55 M      1,600 ft²
2012  Sequoia     16 PFLOPS   1,572,864 (98,304x16)                    MPI (+threads) (mixed)        ~250 M    3,000 ft²
2012  Titan       17 PFLOPS   299,000 CPU (18,688x16); 50,233,344 GPU  MPI + CUDA + threads (mixed)  97 M      4,350 ft²
2014  Xeon Phi    1 TFLOPS    ~50                                      “traditional”                 ~3 K      your foot
2014  NVidia GPU  ~3 TFLOPS   4,992                                    CUDA                          ~3 K      your foot

Taming the complexity beast…
Goals:
• Efficiently use complex modern architectures.
• Enhance programmer productivity by insulating the programmer from details.
Enabling technologies:
Task-graph:
• MPI communication
• Threaded task scheduling
• Allows overlap of computation with communication
• Automatic memory management (fields are where you need them, when you need them)
Domain-Specific Language:
• Array & stencil operations
• GPU & multithread execution

Hierarchical Parallelization
Utilize resources at “high” level & “push down” as parallelism runs out.

Distributed task-graph: Uintah framework [1]
• Domain decomposition data parallelism
• MPI communication
• Task parallelism & coarse-grained DAG
• Coarse-grained, threaded task scheduling
• Scales to largest capability machines.

On-node task-graph: “Expression Library” [2]
• Memory management & fine-grained task scheduling.
• “Fine-grained” task graph for PDE assembly.
• Tasks consist of stencil & field operations
• Thread-pools
• GPU management

EDSL: Nebo EDSL [3]
• Array & stencil operations
• Matlab-style syntax
• GPU & multithreaded execution

References
1. Berzins, M., Meng, Q., Schmidt, J., & Sutherland, J. C., DAG-based software frameworks for PDEs. In Euro-Par 2011: Parallel Processing Workshops (pp. 324–333). Springer.
2. Notz, P. K., Pawlowski, R. P., & Sutherland, J. C., Graph-Based Software Design for Managing Complexity and Enabling Concurrency in Multiphysics PDE Software. ACM TOMS (2012).
3. Earl, C., Might, M., Bagusetty, A., & Sutherland, J. C., Nebo: An efficient, parallel, and portable domain-specific language for numerically solving partial differential equations. Journal of Systems and Software, to appear.

A Simple Example of DAGs
Register all expressions
• Each “expression” calculates one or more field quantities.
• Each expression advertises its direct dependencies.
Set a “root” expression; construct a graph
• All dependencies are discovered/resolved automatically.
• Highly localized influence of changes in models.
• Not all expressions in the registry may be relevant/used.
From the graph:
• Deduce storage requirements & allocate memory (externally to each expression).
• Automatically schedule evaluation, ensuring proper ordering.
• Asynchronous execution is critical! (overlap communication & computation)
• Robust scheduling algorithms are key.
[Figure: expression registry containing u, τ, s_φ, φ, ρ, p, y_i, T, Γ. With root $\Gamma = \Gamma(T, p, y_i)$, the direct (expressed) dependencies are p, y_i, T; the indirect (discovered) dependency is ρ.]
*Notz, Pawlowski, & Sutherland (2012). ACM Transactions on Mathematical Software, 39(1).
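
The register / set-root / schedule sequence can be sketched in a few lines. This is a toy illustration using Python's standard `graphlib` module, not the Expression Library's API. The registry contents, the assumption that T depends on ρ, and the helper name `build_graph` are all hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical registry: each expression advertises only its direct dependencies.
registry = {
    "Gamma": ["T", "p", "y_i"],  # Gamma = Gamma(T, p, y_i), as on the slide
    "T": ["rho"],                # assumed here so that rho is discovered indirectly
    "p": [],
    "y_i": [],
    "rho": [],
    "u": [],                     # registered, but not needed by the root below
    "tau": ["u"],
    "phi": [],
}


def build_graph(root, registry):
    """Walk outward from the root, keeping only expressions it actually needs."""
    graph, stack = {}, [root]
    while stack:
        name = stack.pop()
        if name in graph:
            continue
        graph[name] = set(registry[name])  # node -> direct dependencies
        stack.extend(registry[name])
    return graph


graph = build_graph("Gamma", registry)

# A topological order guarantees dependencies are evaluated before their users.
order = list(TopologicalSorter(graph).static_order())

print("expressions in graph:", sorted(graph))
print("evaluation order:", order)
```

Only the expressions reachable from the root end up in the graph, which is also the information needed to size storage before any expression runs.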

Changes in model form are naturally handled
Pure substance heat flux: $q = -\lambda \nabla T$
[Graph: q depends on T and λ]

Changes in model form are naturally handled
Multi-species mixture heat flux: $q = -\lambda \nabla T + \sum_{i=1}^{n} h_i J_i$
[Graph: q depends on T, λ, h_1 ... h_n, and J_1 ... J_n; y_1 ... y_n also appear in the graph]
No complex logic changes in code when models are added/changed.
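
A minimal sketch of why the model change stays local: only the dependency list advertised by q changes, and the graph picks up the rest. The registry below is hypothetical, uses n = 2 species, and assumes for illustration that each h_i depends on T and each J_i on y_i. The helper `reachable` is not part of any library.

```python
def reachable(root, registry):
    """Collect every expression needed to evaluate `root`."""
    needed, stack = set(), [root]
    while stack:
        name = stack.pop()
        if name not in needed:
            needed.add(name)
            stack.extend(registry[name])
    return needed


# Hypothetical registry for two species (n = 2).
registry = {
    "q": ["lambda", "T"],   # pure substance: q = -lambda * grad(T)
    "lambda": [],
    "T": [],
    "h_1": ["T"],           # assumed dependency, for illustration
    "h_2": ["T"],
    "J_1": ["y_1"],         # assumed dependency, for illustration
    "J_2": ["y_2"],
    "y_1": [],
    "y_2": [],
}

pure = reachable("q", registry)

# Switching to the mixture model changes only the entry for q.
registry["q"] = ["lambda", "T", "h_1", "h_2", "J_1", "J_2"]
mixture = reachable("q", registry)

print("pure substance graph:", sorted(pure))
print("mixture graph:", sorted(mixture))
```

No other expression is edited, and no branching logic is added to decide which terms to compute.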