  1. CSE6230 Fall 2012 Parallel I/O Fang Zheng

  2. Credits • Some materials are taken from Rob Latham’s “Parallel I/O in Practice” talk • http://www.spscicomp.org/ScicomP14/talks/Latham.pdf

  3. Outline • I/O Requirements of HPC Applications • Parallel I/O Stack – From storage hardware to I/O libraries • High-level I/O Middleware – Case study: ADIOS • In situ I/O Processing • Interesting Research Topics

  4. Parallel I/O • I/O Requirements for HPC Applications – Checkpoint/restart: for defending against failures – Analysis data: for later analysis and visualization – Other data: diagnostics, logs, snapshots, etc. – Applications view data with domain-specific semantics: • Variables, meshes, particles, attributes, etc. • Challenges: – High concurrency: e.g., 100,000 processes – Huge data volume: e.g., 200MB per process – Ease of use: scientists are not I/O experts

  5. Supporting Application I/O • Provide mapping of app. domain data abstractions – API that uses language meaningful to app. programmers • Coordinate access by many processes – Collective I/O, consistency semantics • Organize I/O devices into a single space – Convenient utilities and file model • And also – Insulate applications from I/O system changes – Maintain performance!!!

  6. What about Parallel I/O? • Focus of parallel I/O is on using parallelism to increase bandwidth • Use multiple data sources/sinks in concert – Both multiple storage devices and multiple/wide paths to them • But applications don't want to deal with block devices and network protocols, • So we add software layers

  7. Parallel I/O Stack • I/O subsystem in supercomputers – Oak Ridge National Lab’s “Jaguar” Cray XT4/5

  8. Parallel I/O Stack • Another Example: IBM BlueGene/P

  9. Parallel File Systems (PFSs) • Organize I/O devices into a single logical space – Striping files across devices for performance • Export a well-defined API, usually POSIX – Access data in contiguous regions of bytes – Very general
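
  The "contiguous regions of bytes" view that a PFS exports is essentially the POSIX file API. A minimal sketch (the file path and sizes here are made up for illustration):

    #include <fcntl.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        /* Open (or create) a file that lives in the parallel file system. */
        int fd = open("/lustre/scratch/output.dat", O_CREAT | O_WRONLY, 0644);
        if (fd < 0) return 1;

        /* Write 1 MB of raw bytes at a chosen offset -- the PFS sees only
           contiguous byte ranges, not application-level data structures. */
        size_t nbytes = 1 << 20;
        char *buf = calloc(nbytes, 1);
        ssize_t written = pwrite(fd, buf, nbytes, 0);

        free(buf);
        close(fd);
        return written == (ssize_t)nbytes ? 0 : 1;
    }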

  10. Parallel I/O Stack • Idea: Add some additional software components to address remaining issues – Coordination of access – Mapping from application model to I/O model • These components will be increasingly specialized as we add layers • Bridge this gap between existing I/O systems and application needs

  11. Parallel I/O for HPC • Break up support into multiple layers: – High level I/O library maps app. abstractions to a structured, portable file format (e.g. HDF5, Parallel netCDF, ADIOS) – Middleware layer deals with organizing access by many processes (e.g. MPI-IO, UPC-IO) – Parallel file system maintains logical space, provides efficient access to data (e.g. PVFS, GPFS, Lustre)

  12. High Level Libraries • Provide an appropriate abstraction for domain – Multidimensional datasets – Typed variables – Attributes • Self-describing, structured file format • Map to middleware interface – Encourage collective I/O • Provide optimizations that middleware cannot

  13. Parallel I/O Stack • High Level I/O Libraries: – Provide richer semantics than “file” abstraction • Match applications’ data models: variables, attributes, data types, domain decomposition, etc. – Optimize I/O performance on top of MPI-IO • Can leverage more application-level knowledge • File format and layout • Orchestrate/coordinate I/O requests – Examples: HDF5, NetCDF, ADIOS, SILO, etc.
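
  As a concrete sketch of what such a high-level library call sequence looks like, here is a collective write of one named, typed dataset with parallel HDF5 (the file name, dataset name, and sizes are illustrative, not from the slides):

    #include <hdf5.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank, nprocs;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* Open the file collectively through HDF5's MPI-IO driver. */
        hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
        H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
        hid_t file = H5Fcreate("field.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

        /* One named, typed, 1-D dataset: 100 doubles per process. */
        hsize_t local_n = 100, global_n = local_n * nprocs;
        hid_t filespace = H5Screate_simple(1, &global_n, NULL);
        hid_t dset = H5Dcreate2(file, "temperature", H5T_NATIVE_DOUBLE, filespace,
                                H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

        /* Each rank selects its own slab of the global dataset. */
        hsize_t offset = rank * local_n;
        H5Sselect_hyperslab(filespace, H5S_SELECT_SET, &offset, NULL, &local_n, NULL);
        hid_t memspace = H5Screate_simple(1, &local_n, NULL);

        double data[100];
        for (hsize_t i = 0; i < local_n; i++) data[i] = rank + 0.01 * i;

        /* Request collective I/O so the library can coordinate all processes. */
        hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
        H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);
        H5Dwrite(dset, H5T_NATIVE_DOUBLE, memspace, filespace, dxpl, data);

        H5Pclose(dxpl); H5Sclose(memspace); H5Sclose(filespace);
        H5Dclose(dset); H5Pclose(fapl); H5Fclose(file);
        MPI_Finalize();
        return 0;
    }

  The application names a variable ("temperature"), its type, and its decomposition; the library maps that onto MPI-IO operations and a self-describing file layout.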

  14. I/O Middleware • Facilitate concurrent access by groups of processes – Collective I/O – Atomicity rules • Expose a generic interface – Good building block for high-level libraries • Match the underlying programming model (e.g. MPI) • Efficiently map middleware operations into PFS ones – Leverage any rich PFS access constructs

  15. Parallel I/O • Parallel I/O supported by MPI-IO – Individual files: each MPI process writes/reads a separate file – Shared file, individual file pointers: each MPI process writes/reads a single shared file with individual file pointers – Shared file, collective I/O: each MPI process writes/reads a single shared file with collective semantics (a sketch of the collective case follows)
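
  A minimal sketch of the third mode, a collective write to a single shared file with MPI-IO (the file name and data sizes are illustrative):

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        enum { N = 1000 };              /* doubles per process (illustrative) */
        double buf[N];
        for (int i = 0; i < N; i++) buf[i] = rank;

        /* All processes open one shared file. */
        MPI_File fh;
        MPI_File_open(MPI_COMM_WORLD, "shared.out",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

        /* Collective write: each process writes its block at its own offset,
           and the MPI-IO layer is free to merge and coordinate the requests. */
        MPI_Offset offset = (MPI_Offset)rank * N * sizeof(double);
        MPI_File_write_at_all(fh, offset, buf, N, MPI_DOUBLE, MPI_STATUS_IGNORE);

        MPI_File_close(&fh);
        MPI_Finalize();
        return 0;
    }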

  16. Parallel File System • Manage storage hardware – Present single view – Focus on concurrent, independent access – Knowledge of collective I/O usually very limited • Publish an interface that middleware can use effectively – Rich I/O language – Relaxed but sufficient semantics

  17. Parallel I/O Stack • Parallel File System: – Example: Lustre file system Reference: http://wiki.lustre.org/images/3/38/Shipman_Feb_lustre_scalability.pdf
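
  With the round-robin striping mentioned on the earlier PFS slide (Lustre's default RAID-0-style layout, assuming stripe size S over C storage targets), the byte at file offset x lands on target

    \mathrm{target}(x) = \left\lfloor x / S \right\rfloor \bmod C

  so large contiguous accesses are spread over many storage servers in parallel.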

  18. High Level I/O Library Case Study: ADIOS • ADIOS (Adaptable I/O System): – Developed by Georgia Tech and Oak Ridge National Lab – Works on Lustre and IBM’s GPFS – In production use by several major DOE applications – Features: • Simple, high-level API for reading/writing data in parallel • Support for several popular file formats • High I/O performance at large scales • Extensible framework

  19. High Level I/O Library Case Study: ADIOS • ADIOS Architecture: – Layered design: scientific codes call the high-level ADIOS API (adios_open/read/write/close), driven by external metadata in an XML file – Multiple underlying file formats and I/O methods as plug-ins: POSIX IO, MPI-IO, LIVE/DataTap, DART, HDF-5, pnetCDF, visualization engines, and others – Built-in optimizations: buffering, scheduling, feedback, etc. (a write sketch follows)
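
  A sketch of the write path with the ADIOS 1.x XML-driven API (the group name, file name, and variables here are assumptions for illustration, and exact signatures vary slightly across ADIOS versions):

    #include <mpi.h>
    #include <stdint.h>
    #include "adios.h"

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* The XML file declares the I/O group, its variables, and which
           transport method (POSIX, MPI-IO, DataTap, ...) to use. */
        adios_init("config.xml", MPI_COMM_WORLD);

        int NX = 100;
        double t[100];
        for (int i = 0; i < NX; i++) t[i] = rank + 0.1 * i;

        int64_t handle;
        uint64_t group_size = sizeof(int) + NX * sizeof(double), total_size;

        adios_open(&handle, "restart", "ckpt.bp", "w", MPI_COMM_WORLD);
        adios_group_size(handle, group_size, &total_size);
        adios_write(handle, "NX", &NX);
        adios_write(handle, "temperature", t);
        adios_close(handle);      /* data may be buffered and flushed here */

        adios_finalize(rank);
        MPI_Finalize();
        return 0;
    }

  Because the transport method lives in the XML file, the same application code can switch between POSIX, MPI-IO, or staging methods without recompiling.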

  20. High Level I/O Library Case Study: ADIOS • Optimize Write Performance under Contention: – Write performance is critical for checkpointing – The parallel file system is shared by: • Processes within an MPI program • Different MPI programs running concurrently – How to attain high write performance on a busy, shared supercomputer? [figure: two applications sharing the same set of I/O servers]

  21. In Situ I/O Processing: An Alternative Approach to Parallel I/O • Negative interference due to contention on the shared file system – Internal contention between processes within the same MPI program – As the ratio of application processes to I/O servers grows past a certain point, write bandwidth starts to drop

  22. In Situ I/O Processing: An Alternative Approach to Parallel I/O • Negative interference due to contention on the shared file system – External contention between different MPI programs causes huge variations in I/O performance on supercomputers Reference: Jay Lofstead, Fang Zheng, Qing Liu, Scott Klasky, Ron Oldfield, Todd Kordenbrock, Karsten Schwan, Matthew Wolf. "Managing Variability in the IO Performance of Petascale Storage Systems". In Proceedings of SC 10, New Orleans, LA, November 2010.

  23. High Level I/O Library Case Study: ADIOS • How to obtain high write performance on a busy, shared supercomputer? • The basic trick is to find slow (overloaded) I/O servers and avoid writing data to them

  24. High Level I/O Library Case Study: ADIOS • ADIOS’ solution: Coordination – Divide the writing processes into groups – Each group has a sub-coordinator to monitor writing progress – Near the end of collective I/O, the coordinator has a global view of the storage targets’ performance and informs stragglers to write to fast targets (a simplified sketch follows)
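
  A simplified, hypothetical sketch of the grouping-and-monitoring idea (not the actual ADIOS implementation): writers are split into groups, each group's rank 0 acts as sub-coordinator and gathers per-writer timings, which is the information used to redirect stragglers toward faster storage targets.

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Assumed group size; rank 0 of each group is its sub-coordinator. */
        const int writers_per_group = 4;
        MPI_Comm group_comm;
        MPI_Comm_split(MPI_COMM_WORLD, rank / writers_per_group, rank, &group_comm);
        int grank, gsize;
        MPI_Comm_rank(group_comm, &grank);
        MPI_Comm_size(group_comm, &gsize);

        /* Time this writer's (stand-in) write to its assigned storage target. */
        double t0 = MPI_Wtime();
        /* ... the real write to the assigned storage target would go here ... */
        double my_time = MPI_Wtime() - t0 + 0.001 * (rank % 3); /* fake variation */

        /* Sub-coordinator gathers timings to spot overloaded targets/stragglers. */
        double *times = NULL;
        if (grank == 0) times = malloc(gsize * sizeof(double));
        MPI_Gather(&my_time, 1, MPI_DOUBLE, times, 1, MPI_DOUBLE, 0, group_comm);

        if (grank == 0) {
            int fastest = 0;
            for (int i = 1; i < gsize; i++)
                if (times[i] < times[fastest]) fastest = i;
            /* In the real scheme, stragglers would now be told to send their
               remaining data to the faster target instead of the slow one. */
            printf("group of rank %d: fastest target belongs to writer %d\n",
                   rank, fastest);
            free(times);
        }

        MPI_Comm_free(&group_comm);
        MPI_Finalize();
        return 0;
    }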

  25. High Level I/O Library Case Study: ADIOS • Results with a parallel application, Pixie3D: higher I/O bandwidth and less variation [figure: Pixie3D I/O bandwidth comparison]

  26. In Situ I/O Processing • An alternative approach to existing parallel I/O techniques • Motivation: – I/O is becoming the bottleneck for large scale simulation AND analysis

  27. I/O Is a Major Bottleneck Now! • Under-provisioned I/O and storage sub-system in supercomputers – Huge disparity between I/O and computation capacity – I/O resources are shared and contended by concurrent jobs

  Machine (as of Nov. 2011)   Peak Flops        Peak I/O bandwidth   Flop/byte
  Jaguar Cray XT5             2.3 Petaflops     120 GB/sec           19166
  Franklin Cray XT4           352 Teraflops     17 GB/sec            20705
  Hopper Cray XE6             1.28 Petaflops    35 GB/sec            36571
  Intrepid BG/P               557 Teraflops     78 GB/sec            7141
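
  For reference, the flop/byte column is simply the peak compute rate divided by the peak I/O bandwidth; e.g., for Hopper:

    \frac{1.28 \times 10^{15}\ \text{flop/s}}{35 \times 10^{9}\ \text{bytes/s}} \approx 3.66 \times 10^{4}\ \text{flop/byte}

  i.e., the machine can perform tens of thousands of floating-point operations in the time it takes to move a single byte to storage.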

  28. I/O Is a Major Bottleneck Now! • Huge output volume for scientific simulations – Example: GTS fusion simulation: 200MB per MPI process x 100,000 procs → 20TB per checkpoint – Increasing scale → increasing failure frequency → increasing I/O frequency → increasing I/O time – A prediction by Sandia National Lab shows that checkpoint I/O will account for more than 50% of total simulation runtime under current machines’ failure frequency Reference: Ron Oldfield, Sarala Arunagiri, Patricia J. Teller, Seetharami R. Seelam, Maria Ruiz Varela, Rolf Riesen, Philip C. Roth: Modeling the Impact of Checkpoints on Next-Generation Systems. MSST 2007: 30-46
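
  A rough back-of-the-envelope check (assuming the 20 TB checkpoint above is written at Jaguar's 120 GB/sec peak bandwidth from the previous slide, with no contention):

    \frac{20 \times 10^{12}\ \text{bytes}}{120 \times 10^{9}\ \text{bytes/s}} \approx 167\ \text{seconds per checkpoint}

  Delivered bandwidth on a busy machine is typically well below peak, so frequent checkpoints quickly consume a large fraction of the run time.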

  29. I/O Is a Major Bottleneck Now! • Analysis and visualization need to read data back to gain useful insights from the raw bits – File read time can account for 90% of the total runtime of visualization tasks Reference: Tom Peterka, Hongfeng Yu, Robert B. Ross, Kwan-Liu Ma, Robert Latham: End-to-End Study of Parallel Volume Rendering on the IBM Blue Gene/P. ICPP 2009: 566-573

  30. In Situ I/O Processing • In Situ I/O Processing: – Eliminate the I/O bottleneck by tightly coupling simulation and analysis [figure: Simulation → PFS → Analysis pipeline vs. directly coupled Simulation → Analysis, removing the PFS bottleneck]
