Damaris: Using Dedicated I/O Cores for Scalable Post-petascale HPC Simulations
Matthieu Dorier, ENS Cachan, Brittany extension, matthieu.dorier@eleves.bretagne.ens-cachan.fr
Advised by Gabriel Antoniu (SRC)
Context: HPC simulations on Blue Waters
• INRIA/UIUC Joint Lab for Petascale Computing
• Targeting large-scale simulations of unprecedented accuracy
• Our concern: I/O performance scalability
Motivation: data management in HPC
• Petabytes of data, ~10,000 to 100,000 processes, but only ~100 data servers
• Problem:
  - All processes enter their I/O phase at the same time
  - File system contention: lack of scalability
  - High I/O overhead, high performance variability
I/O variability: an example
• CM1 tornado simulation: 672 processes sorted by write time
The Damaris approach: dedicated I/O cores
• Use the SMP node's intra-node shared memory
• Leave a core, go faster! (see the sketch below)
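To make the "leave a core" idea concrete, here is a minimal, self-contained C/MPI sketch, not the actual Damaris implementation: on each SMP node one rank is set aside as an I/O core, the other ranks drop their data into a POSIX shared-memory segment and go straight back to computing, and the dedicated core alone drains the segment to storage. The segment name, chunk size, node width, and the global barrier used for signaling are illustrative assumptions (a real middleware would use asynchronous, per-node coordination instead).

/* Minimal sketch of the dedicated-core idea (not the Damaris implementation).
 * Compute ranks hand data to the node-local I/O core through shared memory
 * and never touch the parallel file system themselves.                      */
#include <mpi.h>
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

#define SHM_NAME   "/damaris_like_buffer"   /* hypothetical segment name */
#define CHUNK_SIZE (1 << 20)                /* 1 MiB handed off per rank */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int cores_per_node = 16;                  /* assumed node width      */
    const int local_id   = rank % cores_per_node;   /* assumes block placement */
    const int is_io_core = (local_id == 0);
    const size_t seg_size = (size_t)CHUNK_SIZE * cores_per_node;

    /* Every rank on the node maps the same shared-memory segment. */
    int fd = shm_open(SHM_NAME, O_CREAT | O_RDWR, 0600);
    ftruncate(fd, (off_t)seg_size);
    char *seg = mmap(NULL, seg_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    if (!is_io_core) {
        /* Compute rank: copy the variable into its slot; no file-system call. */
        memset(seg + (size_t)local_id * CHUNK_SIZE, 42, CHUNK_SIZE);
    }

    /* Crude stand-in for the per-node signaling a real middleware would use. */
    MPI_Barrier(MPI_COMM_WORLD);

    if (is_io_core) {
        /* Dedicated core: drain the other ranks' slots to storage, which can
         * overlap with the next compute phase in a real setup.               */
        char fname[64];
        snprintf(fname, sizeof fname, "output-node-%d.bin", rank);
        FILE *out = fopen(fname, "wb");
        fwrite(seg + CHUNK_SIZE, 1, seg_size - CHUNK_SIZE, out);
        fclose(out);
        shm_unlink(SHM_NAME);
    }

    munmap(seg, seg_size);
    MPI_Finalize();
    return 0;
}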
Integration with the CM1 tornado simulation
• Less than an hour to write an I/O backend with Damaris
• The I/O core spends 25% of its time writing → 75% spare time!
• How to use the spare time? A custom plugin system (sketched below):
  - Data post-processing, indexing, analysis
  - End-to-end scientific process
  - Connect visualization/analysis tools → inline visualization
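The spare-time plugin idea can be pictured as a callback registry that the dedicated I/O core runs once its write work for an iteration is done. The sketch below is a hypothetical illustration in C; the real Damaris plugin system is configured externally and differs in detail, and the names damaris_like_plugin_fn, register_plugin, and run_spare_time_plugins are invented for this example.

/* Hypothetical sketch of the spare-time plugin mechanism. */
#include <stdio.h>
#include <stddef.h>

typedef void (*damaris_like_plugin_fn)(const char *var, const void *data, size_t size);

#define MAX_PLUGINS 8
static damaris_like_plugin_fn plugins[MAX_PLUGINS];
static int nplugins = 0;

/* Register a post-processing step: indexing, compression, in-situ viz... */
static void register_plugin(damaris_like_plugin_fn fn)
{
    if (nplugins < MAX_PLUGINS)
        plugins[nplugins++] = fn;
}

/* Run by the dedicated I/O core after the write for an iteration is done,
 * i.e. during the ~75% of its time that would otherwise be idle.          */
static void run_spare_time_plugins(const char *var, const void *data, size_t size)
{
    for (int i = 0; i < nplugins; i++)
        plugins[i](var, data, size);
}

/* Example plugin: a trivial pass over the buffered field. */
static void print_stats(const char *var, const void *data, size_t size)
{
    (void)data;
    printf("plugin: %zu bytes of '%s' available for analysis\n", size, var);
}

int main(void)
{
    char field[64] = {0};                 /* stands in for simulation data */
    register_plugin(print_stats);
    run_spare_time_plugins("temperature", field, sizeof field);
    return 0;
}

Because such plugins run only on the I/O core, they add no overhead on the compute cores, which is what makes inline indexing, compression, and visualization attractive.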
Results with the CM1 tornado simulation
• On Grid'5000 (French national testbed; 24 cores/node, 672 cores), with PVFS, compared against collective I/O:
  - Collective I/O suffers from communication overhead → leaving a core to I/O is more efficient
  - No synchronization between writing processes
  - 6× higher write throughput
• On BluePrint (the Power5-based Blue Waters interim system at NCSA; 16 cores/node, 1024 cores), with GPFS, compared against the file-per-process approach:
  - On 64 nodes → 64 files instead of 1024
• Overall benefits:
  - Spare time usage
  - Data layout adaptation for subsequent analysis
  - Overhead-free compression (600%)
  - No more I/O jitter
Results with the CM1 tornado simulation [figure]
Conclusion
• Damaris: a dedicated I/O core in multicore SMP nodes
  1. Better I/O and overall performance
  2. No more variability in write phases
  3. Easy integration and configuration
• Targeting Blue Waters and future post-petascale machines
• Very promising prospects in many directions:
  - Integration with other simulations: Enzo (AMR), GTC, ...
  - Leveraging spare time for efficient inline visualization
  - Data-aware self-configuration, scheduled data movements, multi-simulation coupling
• http://damaris.gforge.inria.fr

Thank you, questions?