Efficient Scientific Data Management on Supercomputers – HDF5 and Proactive Data Containers (PDC)
Suren Byna
Staff Scientist, Scientific Data Management Group
Data Science and Technology Department
Lawrence Berkeley National Laboratory
Scientific Data – Where is it coming from?
▪ Simulations
▪ Experiments
▪ Observations
Life of scientific data
▪ Generation
▪ In situ analysis
▪ Processing
▪ Storage
▪ Analysis
▪ Preservation (archive)
▪ Sharing
▪ Refinement
Supercomputing systems
Typical supercomputer architecture
[Figure: Cori system. Compute nodes (CN) and burst buffer nodes (BB, 2x SSD each; one blade = 2 x burst buffer node) sit on the Aries high-speed network. I/O nodes (ION, 2x InfiniBand HCA) connect through the InfiniBand storage fabric to the storage servers (Lustre OSSs/OSTs).]
Scientific Data Management in supercomputers
▪ Data representation – Metadata, data structures, data models
▪ Data storage – Storing and retrieving data and metadata to file systems fast
▪ Data access – Improving performance of data access that scientists desire
▪ Facilitating analysis – Strategies for supporting finding the meaning in the data
▪ Data transfers – Transfer data within a supercomputing system and between different systems
Focus of this presentation
▪ Storing and retrieving data – Parallel I/O and HDF5
– Software stack
– Modes of parallel I/O
– Intro to HDF5 and some I/O tuning for exascale applications
▪ Autonomous data management system – Proactive Data Containers (PDC) system
– Metadata management service
– Data management service
Trends – Storage system transformation
[Figure: storage hierarchies of four system designs, each starting at memory and ending at archival storage (HPSS tape), with the IO gap marked. Columns: Conventional; Shared burst buffer (e.g., Cori @ NERSC); Node-local (e.g., Theta (ALCF), Summit (OLCF)); Upcoming. Intermediate tiers shown: node-local storage, NVM-based shared storage, shared burst buffer, parallel file system (Lustre, GPFS; on Theta), center-wide storage (on Summit), campaign / center-wide storage.]
• The IO performance gap in HPC storage is a significant bottleneck because of slow disk-based storage
• SSDs and new memory technologies are trying to fill the gap, but they increase the depth of the storage hierarchy
Parallel I/O software stack
[Figure: stack from top to bottom: Applications; High Level I/O Library (HDF5, NetCDF, ADIOS); I/O Middleware (MPI-IO); I/O Forwarding; Parallel File System (Lustre, GPFS, ..); I/O Hardware]
▪ I/O Libraries
– HDF5 (The HDF Group) [LBL, ANL]
– ADIOS (ORNL)
– PnetCDF (Northwestern, ANL)
– NetCDF-4 (UCAR)
▪ Middleware – POSIX-IO, MPI-IO (ANL)
▪ I/O Forwarding
▪ File systems: Lustre (Intel), GPFS (IBM), DataWarp (Cray), …
▪ I/O Hardware (disk-based, SSD-based, …)
Parallel I/O – Application view
▪ Types of parallel I/O
• 1 writer/reader, 1 file
• N writers/readers, N files (file-per-process)
• N writers/readers, 1 file
• M writers/readers, 1 file
– Aggregators
– Two-phase I/O
• M aggregators, M files (file-per-aggregator)
– Variations of this mode
[Figure: processes P0 … Pn writing to files in five panels: 1 Writer/Reader, 1 File; n Writers/Readers, n Files; n Writers/Readers, 1 File; M Writers/Readers, 1 File; M Writers/Readers, M Files]
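The aggregator modes above amount to a mapping from process ranks to the files that end up holding their data. The following is a minimal sketch in plain Python, assuming ranks are split into contiguous, near-equal groups per aggregator; the names `aggregator_of` and `file_name` are hypothetical, and real MPI-IO implementations may choose aggregators differently.

```python
def aggregator_of(rank, n_ranks, m_aggregators):
    """Return the aggregator index (0..m-1) that collects data from `rank`.

    Assumption: ranks are split into m contiguous groups of near-equal size.
    """
    if not 0 <= rank < n_ranks or not 1 <= m_aggregators <= n_ranks:
        raise ValueError("invalid rank or aggregator count")
    return rank * m_aggregators // n_ranks


def file_name(rank, n_ranks, mode, m_aggregators=1):
    """Name of the file that holds the data of `rank` in a given I/O mode."""
    if mode == "file-per-process":       # N writers/readers, N files
        return f"file.{rank}"
    if mode == "shared":                 # N (or M) writers/readers, 1 file
        return "file.0"
    if mode == "file-per-aggregator":    # M aggregators, M files
        return f"file.{aggregator_of(rank, n_ranks, m_aggregators)}"
    raise ValueError(f"unknown mode: {mode}")


# 8 ranks funnelled through 2 aggregators: ranks 0-3 and ranks 4-7.
groups = [aggregator_of(r, 8, 2) for r in range(8)]
print(groups)
```

With a single shared file, every rank maps to the same name; the aggregator variants only change how many processes actually touch the file system.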
Parallel I/O – System view
▪ Parallel file systems – Lustre and Spectrum Scale (GPFS)
▪ Typical building blocks of parallel file systems
– Storage hardware – HDD or SSD RAID
– Storage servers (in Lustre, Object Storage Servers [OSS] and object storage targets [OST])
– Metadata servers
– Client-side processes and interfaces
▪ Management
– Stripe files for parallelism
– Tolerate failures
[Figure: logical view of a file, and its physical view on a parallel file system: the file is striped over the communication network across OST 0, OST 1, OST 2, and OST 3]
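Striping a file for parallelism can be illustrated with a small offset calculation. This is a sketch only, assuming plain round-robin (RAID-0 style) striping that starts at OST 0; the function name `locate` and the 1 MiB stripe size are illustrative choices, not values taken from the slides.

```python
def locate(offset, stripe_size, stripe_count):
    """Map a file byte offset to (OST index, byte offset inside that OST's object).

    Assumption: round-robin striping over `stripe_count` OSTs, starting at OST 0.
    """
    if offset < 0 or stripe_size <= 0 or stripe_count <= 0:
        raise ValueError("offset must be >= 0; stripe size and count must be > 0")
    stripe_number = offset // stripe_size          # which stripe of the file
    ost_index = stripe_number % stripe_count       # stripes rotate over the OSTs
    full_rounds = stripe_number // stripe_count    # earlier stripes on this OST
    object_offset = full_rounds * stripe_size + offset % stripe_size
    return ost_index, object_offset


MIB = 1 << 20
# Byte 10 of the sixth stripe lands on OST 1, in that OST's second stripe.
print(locate(5 * MIB + 10, MIB, 4))
```

Consecutive stripes land on different OSTs, so a large sequential access is served by several storage targets at once.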
WHAT IS HDF5?
What is HDF5?
• HDF5 → Hierarchical Data Format, v5
• Open file format
– Designed for high volume and complex data
• Open source software
– Works with data in the format
• An extensible data model
– Structures for data organization and specification
HDF5 is like …
HDF5 is designed …
▪ for high volume and / or complex data
▪ for every size and type of system – from cell phones to supercomputers
▪ for flexible, efficient storage and I/O
▪ to enable applications to evolve in their use of HDF5 and to accommodate new models
▪ to support long-term data preservation
HDF5 Overview
▪ HDF5 is designed to organize, store, discover, access, analyze, share, and preserve diverse, complex data in continuously evolving heterogeneous computing and storage environments.
▪ First released in 1998, maintained by The HDF Group
▪ “De-facto standard for scientific computing” and integrated into every major scientific analytics + visualization tool
▪ Heavily used on DOE supercomputing systems
[Figure: Library usage on Cori and Edison in 2017. Top libraries used at NERSC by the number of linked instances and the number of unique users. Axes: number of linking incidences; number of unique users. Libraries: mpich, libsci, mkl, hdf5-parallel, fftw, hdf5, papi, netcdf-hdf5parallel, netcdf, impi, petsc, parallel-netcdf, tpsl, gsl, boost]
HDF5 in Exascale Computing Project
19 out of the 26 (22 ECP + 4 NNSA) apps currently use or plan to use HDF5
HDF5 Ecosystem
▪ Tools
▪ Supporters
▪ File Format
▪ Library
▪ Data Model
▪ Documentation
▪ …
HDF5 DATA MODEL
HDF5 File
An HDF5 file is a container that holds data objects.
[Figure: example file contents]
lat | lon | temp
----|-----|-----
12 | 23 | 3.1
15 | 24 | 4.2
17 | 21 | 3.6
Experiment Notes: Serial Number: 99378920; Date: 3/13/09; Configuration: Standard 3
HDF5 Data Model
HDF5 Objects: File, Group, Link, Dataset, Datatype, Dataspace, Attribute
HDF5 Dataset
▪ HDF5 Datatype: Integer, 32-bit, LE
▪ HDF5 Dataspace: Rank = 3; Dim[0] = 4, Dim[1] = 5, Dim[2] = 7
[Figure: specifications for a single data element and the array dimensions, next to a multi-dimensional array of identically typed data elements]
• HDF5 datasets organize and contain data elements.
• HDF5 datatype describes individual data elements.
• HDF5 dataspace describes the logical layout of the data elements.
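The split between datatype (one element) and dataspace (the array shape) can be made concrete with a little arithmetic on the example above. This sketch uses only the Python standard library rather than the HDF5 API; `linear_index` is a hypothetical helper, and row-major (C order) element ordering is assumed.

```python
import struct
from math import prod

dims = (4, 5, 7)         # dataspace from the slide: rank 3
element_format = "<i"    # datatype from the slide: 32-bit integer, little-endian

rank = len(dims)
n_elements = prod(dims)
element_size = struct.calcsize(element_format)
raw_bytes = n_elements * element_size


def linear_index(index, dims):
    """Position of a multi-dimensional index in row-major (C) order."""
    position = 0
    for i, d in zip(index, dims):
        if not 0 <= i < d:
            raise IndexError("index out of range for the dataspace")
        position = position * d + i
    return position


print(rank, n_elements, element_size, raw_bytes)  # 3 140 4 560
print(linear_index((3, 4, 6), dims))              # 139, the last element
```

The raw size counts data elements only; any file-level metadata would come on top of it.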
HDF5 Datatype
• Describes individual data elements in an HDF5 dataset
• Wide range of datatypes supported
• Integer
• Float
• Enum
• Array
• User-defined (e.g., 13-bit integer)
• Variable-length types (e.g., strings, vectors)
• Compound (similar to C structs)
• More …
HDF5 Dataspace
Two roles:
▪ Dataspace contains spatial information
• Rank and dimensions
• Permanent part of dataset definition
• Example: Rank = 2, Dimensions = 4x6
▪ Partial I/O: Dataspace describes the application's data buffer and the data elements participating in I/O
• Example: Rank = 1, Dimension = 10
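The partial I/O role can be sketched as selecting a rectangular block of the 4x6 dataspace and copying it into a rank-1 buffer of 10 elements. This is an illustration in plain Python rather than the HDF5 selection API; `select_block`, the start position, and the stand-in data are assumptions made for the example.

```python
def select_block(start, count, dims):
    """Row-major linear indices of a rectangular block in a 2D dataspace."""
    rows, cols = dims
    r0, c0 = start
    n_rows, n_cols = count
    if r0 < 0 or c0 < 0 or r0 + n_rows > rows or c0 + n_cols > cols:
        raise ValueError("selection does not fit inside the dataspace")
    return [(r0 + r) * cols + (c0 + c)
            for r in range(n_rows) for c in range(n_cols)]


file_dims = (4, 6)                  # dataspace of the dataset: rank 2, 4x6
dataset = list(range(4 * 6))        # stand-in for the stored elements

# Select a 2x5 block starting at row 1, column 1: 10 elements take part in I/O.
selection = select_block(start=(1, 1), count=(2, 5), dims=file_dims)
buffer = [dataset[i] for i in selection]   # application buffer: rank 1, dimension 10

print(len(buffer), buffer)
```

The dataset's own dataspace stays 4x6; only the selection and the buffer shape change from one I/O call to the next.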
HDF5 Dataset with a 2D array
[Figure: 5 x 3 array]
▪ Datatype: 32-bit Integer
▪ Dataspace: Rank = 2, Dimensions = 5 x 3
HDF5 Dataset with Compound Datatype
[Figure: 5 x 3 array in which every element V is a compound value]
▪ Compound Datatype: uint16, char, int32, 2x3x2 array of float32
▪ Dataspace: Rank = 2, Dimensions = 5 x 3
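Since compound datatypes are similar to C structs, one element of this dataset can be mimicked with Python's `struct` module. This is a sketch under stated assumptions: little-endian byte order and a packed layout with no padding between members, which an actual HDF5 compound type need not use; `pack_element` and `unpack_element` are hypothetical helpers.

```python
import struct

# Members from the slide: uint16, char, int32, 2x3x2 array of float32.
# Assumption: little-endian, packed layout ("<" disables alignment padding).
RECORD = struct.Struct("<Hci12f")   # 2 + 1 + 4 + 12 * 4 bytes


def pack_element(a, b, c, values):
    """Serialize one compound element; `values` is the flattened 2x3x2 array."""
    if len(values) != 2 * 3 * 2:
        raise ValueError("expected 12 float values")
    return RECORD.pack(a, b, c, *values)


def unpack_element(raw):
    """Inverse of pack_element."""
    a, b, c, *values = RECORD.unpack(raw)
    return a, b, c, values


element = pack_element(7, b"x", -3, [float(i) for i in range(12)])
n_elements = 5 * 3                  # dataspace: rank 2, 5 x 3

print(RECORD.size, RECORD.size * n_elements)  # 55 825
```

Every element in the dataset shares the same compound layout, which is what keeps the dataset an array of identically typed elements.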
How are data elements stored?
[Figure: buffer in memory compared with data in the file for each layout]
▪ Contiguous (default): data elements stored physically adjacent to each other
▪ Chunked: better access time for subsets; extendible
▪ Chunked & Compressed: improves storage efficiency, transmission speed
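The benefit of chunking for subset access can be estimated by counting how many chunks a selection overlaps. The sketch below is plain Python, not HDF5 code; the 100 x 100 dataset, the 10 x 10 chunk shape, and the helper names are assumptions, and `zlib` stands in for a deflate-style compression filter applied per chunk.

```python
import zlib


def chunk_of(index, chunk_dims):
    """Coordinates of the chunk that holds the element at `index`."""
    return tuple(i // c for i, c in zip(index, chunk_dims))


def chunks_touched(start, count, chunk_dims):
    """Number of chunks overlapped by a rectangular selection."""
    total = 1
    for s, n, c in zip(start, count, chunk_dims):
        first = s // c
        last = (s + n - 1) // c
        total *= last - first + 1
    return total


chunk_dims = (10, 10)

# Reading one column of a 100 x 100 dataset overlaps 10 chunks, whereas a
# contiguous row-major layout would need 100 separate one-element accesses.
column_chunks = chunks_touched(start=(0, 5), count=(100, 1), chunk_dims=chunk_dims)

# Compression is applied chunk by chunk; a chunk of zeros shrinks considerably.
chunk = bytes(4 * 10 * 10)          # one 10 x 10 chunk of 32-bit zeros
compressed = zlib.compress(chunk)

print(chunk_of((25, 7), chunk_dims), column_chunks, len(chunk), len(compressed))
```

The trade-off is that touching even one element of a compressed chunk means reading and decompressing the whole chunk, so chunk shape should follow the expected access pattern.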