for the uintah framework
play

for the Uintah Framework Qingyu Meng, Justin Luitjens, and Martin - PowerPoint PPT Presentation

Dynamic Task Scheduling for the Uintah Framework Qingyu Meng, Justin Luitjens, and Martin Berzins Thanks to DOE for funding since 1997, NSF since 2008 Uintah Applications Plume Fires Angiogenesis Industrial Flares Micropin Flow Explosions


  1. Dynamic Task Scheduling for the Uintah Framework Qingyu Meng, Justin Luitjens, and Martin Berzins Thanks to DOE for funding since 1997, NSF since 2008

  2. Uintah Applications Plume Fires Angiogenesis Industrial Flares Micropin Flow Explosions Sandstone Compaction Virtual Foam Soldier Shaped Charges Compaction

  3. Uintah Development • Uintah is developed over a decade based on a far-sighted design by Steve Parker - Complete separation of user code and parallelism Tuning Expert (CS) Domain Expert (Engineering) Goal Performance, Salability Problem, Methods Responsibility Infrastructure Components Simulation Components Major Load balancing, AMR Arches, ICE, MPM, MPM- Contributions Task-graph based scheduling ICE, etc.. Asynchronous communication View of Parallel Infrastructure Serial Code Written for a Program MPI, Threads Patch

  4. How Does Uintah Work Task-Graph Specification Patch-Based Domain • Computes & Requires Decomposition • No Processor or Domain Information

  5. Patch-Based Domain Decomposition Adaptive Mesh Particles Cells Patch regrid load balance Mesh of cells Patch Processors

  6. Uintah Scalability • Currently scale up to 98K cores on Kraken • Prepare for future machines • Petascale • Exascale 2018 ~2020 e.g. Aggressive Strawmam 742 cores per socket, 166,113,024 cores (DARPA hardware report, 2009)

  7. Software Model for Exascale • Silver model for Exascale software which must: • Directed dynamic graph execution • Latency hiding • Minimize synchronization and overheads • Adaptive resource scheduling • Heterogeneous processing • Graph-based asynchronous-task work queue model (DARPA software report, 2009)

  8. Graph Based Applications 1: 1 1: 1: 1: 2 3 4 2: 2: 2: 2 3 4 2: Charm++: Object-based Virtualization 2 2: 2: 3 4 3: 3: 3 4 Intel CnC: 3: new language for 3 graph based parallelism Plasma (Dongarra): DAG based Parallel linear algebra software

  9. Uintah Distributed Task Graph • Up to 2 million tasks per timestep globally • Tasks on local and neighboring patches • Callback by each patch • Variables in data warehouse(DW) • Read - get() from OldDW and NewDW • Write- put() to NewDW • Communication on cutting edges 4 patches single level ICE task graph

  10. Example Uintah Task from the ICE Algorithm Compute face-centered Velocities: Input variables Output variables (include boundary conditions)

  11. Task Graph Compiling

  12. Uintah Static Task Scheduler • Task List • Static analysis • In order execution, same order for each patch • Task status • Running->Finished->Next Task

  13. Static Task Graph Execution 1) Static: Predetermined order • Tasks are Synchronized • Higher waiting times Task Dependency Execution Order

  14. Dynamic Task Graph Execution 1) Static: Predetermined order • Tasks are Synchronized • Higher waiting times 2) Dynamic: Execute when ready • Tasks are Asynchronous • Lower waiting times Task Dependency Execution Order

  15. Dynamic Multi-threaded Task Graph Execution 1) Static: Predetermined order • Tasks are Synchronized • Higher waiting times 2) Dynamic: Execute when ready • Tasks are Asynchronous • Lower waiting times 3) Dynamic Multi-threaded( Future ): • Task-Level Parallelism • Decreases Communication Task • Decreases Load Imbalance Dependency Multicore friendly Execution Order Support GPU tasks

  16. Uintah Dynamic Task Scheduler Multi-thread • Task queues • Internal ready (MPI waiting tasks) • External ready (ready for concurrent execution) • Task status • Not scheduled -> Internal Ready -> External Ready -> Running -> Finished -> New task(s) satisfied

  17. Ondemand Datawarehouse Directory based hash map <name, type, patchid> Var <name, type, patchid> Var versions del_T Global n/a del_T Global n/a v v1 press CC 1 press CC 1 v v1 v2 v3 press CC 2 press CC 2 v2 v3 v u_vel FC 1 u_vel FC 1 v v1 v3 … … … … … C .. .. V V R M V For fixed order execution Variable versioning for out-of-order execution

  18. Schedule Global Sync Task • Synchronization task R1 R2 • Update global variable R2 R1 • e.g. MPI Allreduce • Call third party library Deadlock • e.g. PETSc • Out-of-order issues • Deadlock • Load imbalance • Task phases • One global task per phase Load imbalance • Global task runs last • In phase out-of-order

  19. Dynamic Scheduler Performance Improvements Strong Scaling Weak Scaling (Fixed problem size) (Fixed problem size/Core)

  20. Task prioritization algorithms not executed executed MPI sends Algorithm Random FCFS PatchOrder MostMsg. Queue Length 3.11 3.16 4.05 4.29 Wait Time 18.9 18.0 7.0 2.6 Overall Time 315.35 308.73 187.19 139.39

  21. Granularity Effect • Decrease patch size • (+) Increase queue length • (+) More overlap, lower task wait time • (+) More patches, better load balance • (-) More MPI messages • (-) More regrid overheads • Other Factors • Problem size • Implied task level parallelism • Interconnection bandwidth and legacy • CPU cache size • Solution- Self Tuning?

  22. Summary • Dynamic task scheduling • Support Out of order execution • Two task queues • Variable versioning • Task phases • Task prioritization algorithms • Ready queue length and task wait time • Granularity effect • Multi-thread and Self-tuning (Future)

  23. Questions

  24. BACKUP SLIDES

  25. Uintah Components Simulation Models Regridder (Arches, ICE, MPM, (EoS, Constitutive, …) MPMICE, MPMArches , …) Callbacks Tasks Simulation Data Tasks Controller Archiver Scheduler Callbacks XML Load Checkpoints Problem MPI Balancer Specification Data I/O Domain Tuning Expert Expert

  26. Uintah: Task Based Application • Automatic dependency analysis • Automatic message generation

  27. Priority: Most Messages • Priority external task queue • Give priority to tasks that satisfy external dependencies first 3 3 5 5 0 3 3 0 3 3 0 0 5 3 3 5 patches on a single core

  28. External Dependency Counter

  29. MPM-ICE Algorithm Uintah originally designed for simulation of fires and explosions, e.g. metal containers embedded in large hydrocarbon fires. • ICE is a cell-centered finite volume method for Navier Stokes equations • ICE now handles fast and slow flows (2009) • MPM is a novel method that uses particles and nodes • Cartesian grid used as a common frame of reference • MPM (solids) and ICE (fluids) exchange data several times per timestep, not just boundary condition exchange. Container with PBX explosive

Recommend


More recommend