  1. CSE6230 Fall 2012 Parallel I/O Fang Zheng

  2. Credits • Some materials are taken from Rob Latham’s “Parallel I/O in Practice” talk • http://www.spscicomp.org/ScicomP14/talks/Latham.pdf

  3. Outline • I/O Requirements of HPC Applications • Parallel I/O Stack – From storage hardware to I/O libraries • High-level I/O Middleware – Case study: ADIOS • In situ I/O Processing • Interesting Research Topics

  4. Parallel I/O • I/O Requirements for HPC Applications – Checkpoint/restart: for defending against failures – Analysis data: for later analysis and visualization – Other data: diagnostics, logs, snapshots, etc. – Applications view data with domain-specific semantics: • Variables, meshes, particles, attributes, etc. • Challenges: – High concurrency: e.g., 100,000 processes – Huge data volume: e.g., 200MB per process – Ease of use: scientists are not I/O experts

  5. Supporting Application I/O • Provide mapping of app. domain data abstractions – API that uses language meaningful to app. programmers • Coordinate access by many processes – Collective I/O, consistency semantics • Organize I/O devices into a single space – Convenient utilities and file model • And also – Insulate applications from I/O system changes – Maintain performance!!!

  6. What about Parallel I/O? • Focus of parallel I/O is on using parallelism to increase bandwidth • Use multiple data sources/sinks in concert – Both multiple storage devices and multiple/wide paths to them • But applications don't want to deal with block devices and network protocols, • So we add software layers

  7. Parallel I/O Stack • I/O subsystem in supercomputers – Oak Ridge National Lab’s “Jaguar” Cray XT4/5

  8. Parallel I/O Stack • Another Example: IBM BlueGene/P

  9. Parallel File Systems (PFSs) • Organize I/O devices into a single logical space – Striping files across devices for performance • Export a well-defined API, usually POSIX – Access data in contiguous regions of bytes – Very general
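
  The "contiguous regions of bytes" view that a PFS exports is essentially the POSIX file API. A minimal sketch (the file path and sizes here are made up for illustration):

    #include <fcntl.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        /* Open (or create) a file that lives in the parallel file system. */
        int fd = open("/lustre/scratch/output.dat", O_CREAT | O_WRONLY, 0644);
        if (fd < 0) return 1;

        /* Write 1 MB of raw bytes at a chosen offset -- the PFS sees only
           contiguous byte ranges, not application-level data structures. */
        size_t nbytes = 1 << 20;
        char *buf = calloc(nbytes, 1);
        ssize_t written = pwrite(fd, buf, nbytes, 0);

        free(buf);
        close(fd);
        return written == (ssize_t)nbytes ? 0 : 1;
    }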

  10. Parallel I/O Stack • Idea: Add some additional software components to address remaining issues – Coordination of access – Mapping from application model to I/O model • These components will be increasingly specialized as we add layers • Bridge this gap between existing I/O systems and application needs

  11. Parallel I/O for HPC • Break up support into multiple layers: – High level I/O library maps app. abstractions to a structured, portable file format (e.g. HDF5, Parallel netCDF, ADIOS) – Middleware layer deals with organizing access by many processes (e.g. MPI-IO, UPC-IO) – Parallel file system maintains logical space, provides efficient access to data (e.g. PVFS, GPFS, Lustre)

  12. High Level Libraries • Provide an appropriate abstraction for domain – Multidimensional datasets – Typed variables – Attributes • Self-describing, structured file format • Map to middleware interface – Encourage collective I/O • Provide optimizations that middleware cannot

  13. Parallel I/O Stack • High Level I/O Libraries: – Provide richer semantics than “file” abstraction • Match applications’ data models: variables, attributes, data types, domain decomposition, etc. – Optimize I/O performance on top of MPI-IO • Can leverage more application-level knowledge • File format and layout • Orchestrate/coordinate I/O requests – Examples: HDF5, NetCDF, ADIOS, SILO, etc.
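
  As a concrete sketch of what such a high-level library call sequence looks like, here is a collective write of one named, typed dataset with parallel HDF5 (the file name, dataset name, and sizes are illustrative, not from the slides):

    #include <hdf5.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank, nprocs;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* Open the file collectively through HDF5's MPI-IO driver. */
        hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
        H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
        hid_t file = H5Fcreate("field.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

        /* One named, typed, 1-D dataset: 100 doubles per process. */
        hsize_t local_n = 100, global_n = local_n * nprocs;
        hid_t filespace = H5Screate_simple(1, &global_n, NULL);
        hid_t dset = H5Dcreate2(file, "temperature", H5T_NATIVE_DOUBLE, filespace,
                                H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

        /* Each rank selects its own slab of the global dataset. */
        hsize_t offset = rank * local_n;
        H5Sselect_hyperslab(filespace, H5S_SELECT_SET, &offset, NULL, &local_n, NULL);
        hid_t memspace = H5Screate_simple(1, &local_n, NULL);

        double data[100];
        for (hsize_t i = 0; i < local_n; i++) data[i] = rank + 0.01 * i;

        /* Request collective I/O so the library can coordinate all processes. */
        hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
        H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);
        H5Dwrite(dset, H5T_NATIVE_DOUBLE, memspace, filespace, dxpl, data);

        H5Pclose(dxpl); H5Sclose(memspace); H5Sclose(filespace);
        H5Dclose(dset); H5Pclose(fapl); H5Fclose(file);
        MPI_Finalize();
        return 0;
    }

  The application names a variable ("temperature"), its type, and its decomposition; the library maps that onto MPI-IO operations and a self-describing file layout.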

  14. I/O Middleware • Facilitate concurrent access by groups of processes – Collective I/O – Atomicity rules • Expose a generic interface – Good building block for high-level libraries • Match the underlying programming model (e.g. MPI) • Efficiently map middleware operations into PFS ones – Leverage any rich PFS access constructs

  15. Parallel I/O • Parallel I/O supported by MPI-IO – Individual files: each MPI process writes/reads a separate file – Shared file, individual file pointers: each MPI process writes/reads a single shared file with individual file pointers – Shared file, collective I/O: each MPI process writes/reads a single shared file with collective semantics (a sketch of the collective case follows)
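
  A minimal sketch of the third mode, a collective write to a single shared file with MPI-IO (the file name and data sizes are illustrative):

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        enum { N = 1000 };              /* doubles per process (illustrative) */
        double buf[N];
        for (int i = 0; i < N; i++) buf[i] = rank;

        /* All processes open one shared file. */
        MPI_File fh;
        MPI_File_open(MPI_COMM_WORLD, "shared.out",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

        /* Collective write: each process writes its block at its own offset,
           and the MPI-IO layer is free to merge and coordinate the requests. */
        MPI_Offset offset = (MPI_Offset)rank * N * sizeof(double);
        MPI_File_write_at_all(fh, offset, buf, N, MPI_DOUBLE, MPI_STATUS_IGNORE);

        MPI_File_close(&fh);
        MPI_Finalize();
        return 0;
    }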

  16. Parallel File System • Manage storage hardware – Present single view – Focus on concurrent, independent access – Knowledge of collective I/O usually very limited • Publish an interface that middleware can use effectively – Rich I/O language – Relaxed but sufficient semantics

  17. Parallel I/O Stack • Parallel File System: – Example: Lustre file system Reference: http://wiki.lustre.org/images/3/38/Shipman_Feb_lustre_scalability.pdf
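
  With the round-robin striping mentioned on the earlier PFS slide (Lustre's default RAID-0-style layout, assuming stripe size S over C storage targets), the byte at file offset x lands on target

    \mathrm{target}(x) = \left\lfloor x / S \right\rfloor \bmod C

  so large contiguous accesses are spread over many storage servers in parallel.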

  18. High Level I/O Library Case Study: ADIOS • ADIOS (Adaptable I/O System): – Developed by Georgia Tech and Oak Ridge National Lab – Works on Lustre and IBM’s GPFS – In production use by several major DOE applications – Features: • Simple, high-level API for reading/writing data in parallel • Support for several popular file formats • High I/O performance at large scales • Extensible framework

  19. High Level I/O Library Case Study: ADIOS • ADIOS Architecture: – Layered design: scientific codes call the high-level ADIOS API (adios_open/read/write/close), driven by external metadata in an XML file – Multiple underlying file formats and I/O methods as plug-ins: POSIX IO, MPI-IO, LIVE/DataTap, DART, HDF-5, pnetCDF, visualization engines, and others – Built-in optimizations: buffering, scheduling, feedback, etc. (a write sketch follows)
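
  A sketch of the write path with the ADIOS 1.x XML-driven API (the group name, file name, and variables here are assumptions for illustration, and exact signatures vary slightly across ADIOS versions):

    #include <mpi.h>
    #include <stdint.h>
    #include "adios.h"

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* The XML file declares the I/O group, its variables, and which
           transport method (POSIX, MPI-IO, DataTap, ...) to use. */
        adios_init("config.xml", MPI_COMM_WORLD);

        int NX = 100;
        double t[100];
        for (int i = 0; i < NX; i++) t[i] = rank + 0.1 * i;

        int64_t handle;
        uint64_t group_size = sizeof(int) + NX * sizeof(double), total_size;

        adios_open(&handle, "restart", "ckpt.bp", "w", MPI_COMM_WORLD);
        adios_group_size(handle, group_size, &total_size);
        adios_write(handle, "NX", &NX);
        adios_write(handle, "temperature", t);
        adios_close(handle);      /* data may be buffered and flushed here */

        adios_finalize(rank);
        MPI_Finalize();
        return 0;
    }

  Because the transport method lives in the XML file, the same application code can switch between POSIX, MPI-IO, or staging methods without recompiling.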

  20. High Level I/O Library Case Study: ADIOS • Optimize Write Performance under Contention: – Write performance is critical for checkpointing – The parallel file system is shared by: • Processes within an MPI program • Different MPI programs running concurrently – How to attain high write performance on a busy, shared supercomputer? [figure: two applications sharing the same set of I/O servers]

  21. In Situ I/O Processing: An Alternative Approach to Parallel I/O • Negative interference due to contention on the shared file system – Internal contention between processes within the same MPI program – As the ratio of application processes to I/O servers grows past a certain point, write bandwidth starts to drop

  22. In Situ I/O Processing: An Alternative Approach to Parallel I/O • Negative interference due to contention on the shared file system – External contention between different MPI programs causes huge variations in I/O performance on supercomputers Reference: Jay Lofstead, Fang Zheng, Qing Liu, Scott Klasky, Ron Oldfield, Todd Kordenbrock, Karsten Schwan, Matthew Wolf. "Managing Variability in the IO Performance of Petascale Storage Systems". In Proceedings of SC 10, New Orleans, LA, November 2010.

  23. High Level I/O Library Case Study: ADIOS • How to obtain high write performance on a busy, shared supercomputer? • The basic trick is to find slow (overloaded) I/O servers and avoid writing data to them

  24. High Level I/O Library Case Study: ADIOS • ADIOS’ solution: Coordination – Divide the writing processes into groups – Each group has a sub-coordinator to monitor writing progress – Near the end of collective I/O, the coordinator has a global view of the storage targets’ performance and informs stragglers to write to fast targets (a simplified sketch follows)
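
  A simplified, hypothetical sketch of the grouping-and-monitoring idea (not the actual ADIOS implementation): writers are split into groups, each group's rank 0 acts as sub-coordinator and gathers per-writer timings, which is the information used to redirect stragglers toward faster storage targets.

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Assumed group size; rank 0 of each group is its sub-coordinator. */
        const int writers_per_group = 4;
        MPI_Comm group_comm;
        MPI_Comm_split(MPI_COMM_WORLD, rank / writers_per_group, rank, &group_comm);
        int grank, gsize;
        MPI_Comm_rank(group_comm, &grank);
        MPI_Comm_size(group_comm, &gsize);

        /* Time this writer's (stand-in) write to its assigned storage target. */
        double t0 = MPI_Wtime();
        /* ... the real write to the assigned storage target would go here ... */
        double my_time = MPI_Wtime() - t0 + 0.001 * (rank % 3); /* fake variation */

        /* Sub-coordinator gathers timings to spot overloaded targets/stragglers. */
        double *times = NULL;
        if (grank == 0) times = malloc(gsize * sizeof(double));
        MPI_Gather(&my_time, 1, MPI_DOUBLE, times, 1, MPI_DOUBLE, 0, group_comm);

        if (grank == 0) {
            int fastest = 0;
            for (int i = 1; i < gsize; i++)
                if (times[i] < times[fastest]) fastest = i;
            /* In the real scheme, stragglers would now be told to send their
               remaining data to the faster target instead of the slow one. */
            printf("group of rank %d: fastest target belongs to writer %d\n",
                   rank, fastest);
            free(times);
        }

        MPI_Comm_free(&group_comm);
        MPI_Finalize();
        return 0;
    }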

  25. High Level I/O Library Case Study: ADIOS • Results with a parallel application, Pixie3D: higher I/O bandwidth and less variation [figure: Pixie3D I/O bandwidth comparison]

  26. In Situ I/O Processing • An alternative approach to existing parallel I/O techniques • Motivation: – I/O is becoming the bottleneck for large scale simulation AND analysis

  27. I/O Is a Major Bottleneck Now! • Under-provisioned I/O and storage sub-system in supercomputers – Huge disparity between I/O and computation capacity – I/O resources are shared and contended by concurrent jobs

  Machine (as of Nov. 2011)   Peak Flops        Peak I/O bandwidth   Flop/byte
  Jaguar Cray XT5             2.3 Petaflops     120 GB/sec           19166
  Franklin Cray XT4           352 Teraflops     17 GB/sec            20705
  Hopper Cray XE6             1.28 Petaflops    35 GB/sec            36571
  Intrepid BG/P               557 Teraflops     78 GB/sec            7141
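
  For reference, the flop/byte column is simply the peak compute rate divided by the peak I/O bandwidth; e.g., for Hopper:

    \frac{1.28 \times 10^{15}\ \text{flop/s}}{35 \times 10^{9}\ \text{bytes/s}} \approx 3.66 \times 10^{4}\ \text{flop/byte}

  i.e., the machine can perform tens of thousands of floating-point operations in the time it takes to move a single byte to storage.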

  28. I/O Is a Major Bottleneck Now! • Huge output volume for scientific simulations – Example: GTS fusion simulation: 200MB per MPI process x 100,000 procs → 20TB per checkpoint – Increasing scale → increasing failure frequency → increasing I/O frequency → increasing I/O time – A prediction by Sandia National Lab shows that checkpoint I/O will account for more than 50% of total simulation runtime under current machines’ failure frequency Reference: Ron Oldfield, Sarala Arunagiri, Patricia J. Teller, Seetharami R. Seelam, Maria Ruiz Varela, Rolf Riesen, Philip C. Roth: Modeling the Impact of Checkpoints on Next-Generation Systems. MSST 2007: 30-46
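
  A rough back-of-the-envelope check (assuming the 20 TB checkpoint above is written at Jaguar's 120 GB/sec peak bandwidth from the previous slide, with no contention):

    \frac{20 \times 10^{12}\ \text{bytes}}{120 \times 10^{9}\ \text{bytes/s}} \approx 167\ \text{seconds per checkpoint}

  Delivered bandwidth on a busy machine is typically well below peak, so frequent checkpoints quickly consume a large fraction of the run time.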

  29. I/O Is a Major Bottleneck Now! • Analysis and visualization need to read data back to gain useful insights from the raw bits – File read time can account for 90% of the total runtime of visualization tasks Reference: Tom Peterka, Hongfeng Yu, Robert B. Ross, Kwan-Liu Ma, Robert Latham: End-to-End Study of Parallel Volume Rendering on the IBM Blue Gene/P. ICPP 2009: 566-573

  30. In Situ I/O Processing • In Situ I/O Processing: – Eliminate the I/O bottleneck by tightly coupling simulation and analysis [figure: Simulation → PFS → Analysis pipeline vs. directly coupled Simulation → Analysis, removing the PFS bottleneck]
