Towards system-scale optimisation of HPC applications
TADaaM: Topology-Aware System-Scale Data Management for High-Performance Computing Applications
Emmanuel Jeannot, October 2016
INTRODUCTION Optimize application execution at system-scale
[Figure: Topology, Applications, Data]
Tadaam, october 2016 Emmanuel Jeannot - 2
Outline
1. Context and problematic
2. Scientific challenges
3. Software and use-cases
4. Conclusion
1 Context and Problematic
Computing is easy, accessing data is difficult
Computing power is plentiful. Bringing data to the right place at the right time is the challenge. Flops are free but bytes are expensive!
Stacking Optimized Libraries and Runtime Systems
[Figure: software stack, generic layer (example)]
• Multithreaded application (scientific app)
• Multithreaded computation library (parallel BLAS)
• Multithreaded communication library (MPI with progress threads)
• Multithreaded runtime system (OpenMP)
• Hardware (multicore + parallel)
Problem: each thread ignores the existence of the other threads! Mapping? Priority? Scheduling?
Platform partitioning
[Figure: First Year Accumulated Curie Utilization; cumulated node hours versus job size]
Curie median case (install time): 256 nodes
BW median case: 2048 nodes
Problem: message transfers are not aware of other applications! Contention, routing, message scheduling.
Cf.: Demonstrating Improved Application Performance Using Dynamic Monitoring and Task Mapping, A. Gentile, J. Brandt, K. Devine, K. Pedretti
What is missing?
A “thing” that manages data by performing:
• Cross-layer optimizations
• System-wide optimizations
How can an application make the best possible use of the available resources?
[Figure: Applications, Topology, Data]
Problem statement:
• Allocate data
• Partition data
• Reserve resources
• Control affinity
• Map computation
• Manage contention
• Optimize communication
• Access storage
• Perform visualization
Our approach: an intermediate service layer for optimizing execution
[Figure: layered architecture, top to bottom]
• Applications (Application a, Application b) with their programming model, expressing application needs
• Stateful system-wide service layer: memory hierarchy, cache size, network topology, allocated resources, other applications, storage
• Hardware
Application needs
An application can express its varying needs for:
• Memory usage
• Computation
• Network access
• Storage
• Affinity
• Model/data refinement
• etc.
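As an illustration only, the idea of an application declaring its needs to a stateful, system-wide service could be sketched as follows. Everything here is hypothetical: `ApplicationNeeds`, `ServiceLayer`, their fields, and the admission rule are assumptions made for this sketch and are not taken from the talk or from any TADaaM software.

```python
from dataclasses import dataclass, field


@dataclass
class ApplicationNeeds:
    """Hypothetical descriptor of what an application asks for."""
    name: str
    nodes: int
    memory_gib_per_node: float
    storage_gib: float = 0.0
    affinity: dict = field(default_factory=dict)  # free-form placement hints


class ServiceLayer:
    """Hypothetical stateful service: it remembers every grant it has made."""

    def __init__(self, total_nodes, memory_gib_per_node):
        self.total_nodes = total_nodes
        self.memory_gib_per_node = memory_gib_per_node
        self.granted = {}  # application name -> ApplicationNeeds

    def free_nodes(self, ignoring=None):
        """Nodes not held by any application other than `ignoring`."""
        used = sum(n.nodes for name, n in self.granted.items() if name != ignoring)
        return self.total_nodes - used

    def request(self, needs):
        """Grant or refuse a request; a new request from the same
        application replaces its previous grant (needs may vary)."""
        fits_nodes = needs.nodes <= self.free_nodes(ignoring=needs.name)
        fits_memory = needs.memory_gib_per_node <= self.memory_gib_per_node
        if fits_nodes and fits_memory:
            self.granted[needs.name] = needs
            return True
        return False  # any earlier grant is left unchanged


if __name__ == "__main__":
    layer = ServiceLayer(total_nodes=8, memory_gib_per_node=64.0)
    print(layer.request(ApplicationNeeds("a", nodes=6, memory_gib_per_node=32.0)))
    print(layer.request(ApplicationNeeds("b", nodes=4, memory_gib_per_node=32.0)))
```

Because the service keeps state, the decision for application b depends on what application a currently holds. A stateless, per-application optimizer cannot make that kind of decision.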
2 Scientific challenges
The application within its ecosystem
[Figure labels: applications; application need and model; environment model; optimization algorithm; optimized execution]
[Ecosystem labels: SW stack (programming models, compilers, libraries, runtime systems, operating systems); batch scheduler; network; storage; hardware]
Challenges
We need:
• A layer based on models and abstractions (application and environment)
• System-wide services that take into account the whole ecosystem at scale
• A stateful optimization engine
[Figure: Application a and Application b express their needs to the stateful system-wide service layer (memory hierarchy, cache size, network topology, allocated resources, other applications, storage), which sits above the hardware]
3 Software and use-case
Mesh-based high-performance computing applications
Most large-scale applications (at least 2/3 in the last PRACE call) use meshes:
• domain decomposition
• stencil
• unstructured
• hierarchical
• etc.
Ex: aerodynamics, climate, electromagnetism, seismology, plasma, etc.
Software suite: use-case example
• Mesh/graph partitioning (Scotch)
• Platform model (Hwloc)
• Topology-aware locality mechanisms (TreeMatch)
• Parallel mesh adaptation (Pampa)
• Communication optimization (NewMadeleine)
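To make the idea of topology-aware placement concrete, here is a minimal sketch assuming a flat platform where every node hosts the same number of processes. It is a simple greedy heuristic written for illustration. It is not the TreeMatch algorithm and does not call Scotch or hwloc; `greedy_group` and `inter_node_volume` are hypothetical names.

```python
from itertools import combinations


def greedy_group(comm, group_size):
    """Group process ranks so that heavy communicators share a node.

    comm is a symmetric matrix: comm[p][q] is the volume exchanged
    between ranks p and q. Returns one list of ranks per node.
    """
    n = len(comm)
    if group_size <= 0 or n % group_size != 0:
        raise ValueError("process count must be a multiple of group_size")
    unassigned = set(range(n))
    groups = []
    while unassigned:
        # Seed each node with the lowest unassigned rank (deterministic).
        seed = min(unassigned)
        unassigned.remove(seed)
        group = [seed]
        while len(group) < group_size:
            # Add the rank that exchanges the most data with the group so far.
            best = max(
                sorted(unassigned),
                key=lambda p: sum(comm[p][q] for q in group),
            )
            unassigned.remove(best)
            group.append(best)
        groups.append(group)
    return groups


def inter_node_volume(comm, groups):
    """Total volume exchanged between ranks placed on different nodes."""
    node_of = {p: i for i, g in enumerate(groups) for p in g}
    return sum(
        comm[p][q]
        for p, q in combinations(range(len(comm)), 2)
        if node_of[p] != node_of[q]
    )


if __name__ == "__main__":
    # Made-up traffic: ranks 0-2 and 1-3 talk a lot, 0-1 and 2-3 a little.
    comm = [
        [0, 1, 10, 0],
        [1, 0, 0, 10],
        [10, 0, 0, 1],
        [0, 10, 1, 0],
    ]
    block = [[0, 1], [2, 3]]        # placement by rank order
    aware = greedy_group(comm, 2)   # placement by communication affinity
    print(block, inter_node_volume(comm, block))
    print(aware, inter_node_volume(comm, aware))
```

With this made-up matrix, placing ranks in rank order sends 20 units across the network, while the affinity-based grouping `[[0, 2], [1, 3]]` sends only 2. A real tool would also account for the full hardware hierarchy (sockets, caches, switches) rather than a single level of nodes.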
4 Conclusion
System-wide topology-aware data management
Machines are becoming more complex and applications need to run at large scale.
Need for cross-layer and system-wide optimizations.
Target: mesh-based applications.
Design, implement, and deploy a stateful, system-wide service layer to:
• Optimize application execution
• According to its needs
The TADaaM Team
Emmanuel Jeannot, senior research scientist (DR2), Inria, team leader; Guillaume Aupy, research scientist (CR2), Inria; Alexandre Denis, experienced research scientist (CR1), Inria; Brice Goglin, experienced research scientist (CR1), Inria; Guillaume Mercier, assistant professor, Bordeaux Institute of Technology; François Pellegrini, professor, University of Bordeaux; Raphaël Blanchard, PhD student, CIFRE Onera; Cyril Bordage, postdoc, COLOC, Inria; Remi Barat, PhD student, CIFRE, CEA; Nicolas Denoyelle, research engineer, COLOC, Inria; Clément Foyer, engineer, ELCI, Inria; Cédric Lachat, postdoc, ELCI, Inria; Benjamin Lorendeau, PhD student, CIFRE, EDF; Farouk Mansouri, postdoc, Inria; Adèle Villiermet, PhD student, COLOC, Inria; Hugo Taboada, PhD student, CEA; Cécile Boutors, team assistant.
Thanks! Inria Bordeaux Sud-Ouest www.inria.fr