Data Staging and Asynchronous I/O in ADIOS

Hasan Abbasi (ORNL), Jong Choi (ORNL), Greg Eisenhauer (Georgia Tech), Scott Klasky (ORNL), Manish Parashar (Rutgers), Norbert Podhorszki (ORNL), Nagiza Samatova (NCSU), Karsten Schwan (Georgia Tech), Matthew Wolf (Georgia Tech)

ORNL is managed by UT-Battelle for the US Department of Energy

Outline
• ADIOS Overview
• Introduction to Staging
• Data Management in I/O Pipelines
• Staging in ADIOS
• Network and System service discussion

ADIOS
http://www.nccs.gov/user-support/center-projects/adios/
• Abstracts Data-at-Rest to Data-in-Motion for HPC
  – Provides portable, fast, scalable, easy-to-use, metadata rich output
  – Dynamically allows users to change the method during an experiment/simulation
• Provides solutions for “90% of the applications”
• ADIOS has been cited almost 1,000 times

Application areas: Astrophysics, Climate, Combustion, CFD, Environmental Science, Fusion, Geoscience, Materials Science, Medical: Pathology, Neutron Science, Nuclear Science, Quantum Turbulence, Relativity, Seismology, Sub-surface modeling, Weather

[Figure: ADIOS architecture diagram. Labels:]
• Interface to apps for description of data (ADIOS, etc.)
• Data Management Services: Feedback, Buffering, Schedule; Multi-resolution methods, Data Compression methods, Data Indexing (FastBit) methods
• Plugins to the hybrid staging area: Provenance, Workflow Engine, Runtime engine, Data movement; Analysis Plugins, Visualization Plugins
• Adios-bp, IDX, HDF5, pnetcdf, “raw” data, Image data
• Parallel and Distributed File System; Viz. Client

Improving I/O Methods for High End simulations
• Reduce I/O overhead, reduce network data movement, improve writing and reading performance
• To achieve this goal, ADIOS provides many methods
  – Posix (1 file per process, independent set of files)
  – Posix (1 file per process + metadata; read as one dataset)
  – MPI-Lustre (MPI-IO writing to 1 global file)
  – Aggregate (1 file per OST) + 1 metadata file
  – BG (1 file per rack) + 1 metadata file
  – ….
• There’s no single right answer for all users.
  – ADIOS gives the user flexibility without rewriting code.
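The idea of switching I/O methods without rewriting code can be illustrated with a small sketch. This is not the ADIOS API: the registry, the method names `posix` and `aggregate`, and the `write_output` function are hypothetical, and files are modeled as in-memory dictionaries. It only shows that the application-facing call can stay fixed while a configuration value selects the method.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory "file system": file name -> list of values.
Files = Dict[str, List[float]]
Writer = Callable[[Files, int, List[float]], None]


def write_posix(files: Files, rank: int, data: List[float]) -> None:
    """One file per process."""
    files[f"out.{rank}.dat"] = list(data)


def write_aggregate(files: Files, rank: int, data: List[float]) -> None:
    """All processes append to one shared file (a stand-in for aggregation)."""
    files.setdefault("out.global.dat", []).extend(data)


# Registry of available methods; the names are illustrative only.
METHODS: Dict[str, Writer] = {"posix": write_posix, "aggregate": write_aggregate}


def write_output(method: str, files: Files, rank: int, data: List[float]) -> None:
    """Application-facing call: the method is chosen by name, not by code."""
    METHODS[method](files, rank, data)


def run(method: str, nranks: int = 3) -> Files:
    """Simulate nranks processes writing two values each."""
    files: Files = {}
    for rank in range(nranks):
        write_output(method, files, rank, [float(rank)] * 2)
    return files
```

Calling `run("posix")` and `run("aggregate")` exercises the same application code path; only the string passed in changes.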

Large Writes Per Many cores
• First effort shows performance goes from 50 GB/s to over 100 GB/s
• New feature for IBM BG/Q: the serial metadata creation step in ADIOS is now optional
  – Metadata creation is serial due to problems with threadsafe MPI on most systems
• Testing has begun to use staging to write data
  – Problem is the size of the staging area
    • Requires over 10K cores for staging…
    • GPU on staging is useless if we do NOT do other processing

Small Writes per many cores (Combustion)
• Requires high performance I/O due to large output (200 GB/10 minutes)
• Frequent reading of large datasets on a small number of processors for analytics
• Individual process output is small, leading to low utilization of network bandwidth with other I/O solutions
• Reading of large datasets with a different access pattern than they were written out leads to
  – frequent seeking for data
  – very low read bandwidth
• Analysis codes spend 90% of their time reading data
• Allowed ADIOS team to focus on small but frequent output data

Spatial Temporal Aggregation
• Temporal aggregation opens another avenue for consolidating data
• Data from multiple time steps is merged at each process
• Data is written out only at the last time step or when the buffer reaches the limit of memory capacity
• Achieved up to 70x speedup for read performance, and 11x speedup for write performance, in the mission critical climate simulation GEOS-5 (NASA) on Jaguar

GEOS-5 Results
• Common read patterns for GEOS-5 users are reduced from 10 to 0.1 seconds
• Allows interactive data exploration for mission critical visualizations

[Figure: Read performance of a 2D slice of a 3D variable + time, Original vs. New]
[Figure: A 2-D variable at 3 time steps (t1, t2, t3) is merged into a 3-D variable with time as the new dimension]
[Figure: Temperature tendency from moist physics; axes: Latitude (-60 to 90) vs. Date (10/29/79 to 10/31/79)]

Data layout
Original: T1: V1 V2 V3 … | T2: V1 V2 V3 … | T3: V1 V2 V3 …
New: T1,2,3: V1 V2 V3 …
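A minimal sketch of temporal aggregation at a single process follows. It assumes a hypothetical `TemporalAggregator` class and models the memory capacity limit as a maximum number of buffered steps; the actual ADIOS implementation is not shown in the slides. Each flushed block is a 3-D structure indexed as [time][row][col], matching the merge of 2-D time steps into one 3-D variable.

```python
from typing import List

Slice2D = List[List[float]]


class TemporalAggregator:
    """Buffers 2-D time steps and flushes them as one 3-D block."""

    def __init__(self, max_buffered_steps: int) -> None:
        # max_buffered_steps stands in for the memory capacity limit.
        self.max_buffered_steps = max_buffered_steps
        self.buffer: List[Slice2D] = []
        # Each written entry is one 3-D block indexed as [time][row][col].
        self.written: List[List[Slice2D]] = []

    def add_step(self, data: Slice2D, last: bool = False) -> None:
        """Buffer one time step; write only on the last step or a full buffer."""
        self.buffer.append(data)
        if last or len(self.buffer) >= self.max_buffered_steps:
            self.flush()

    def flush(self) -> None:
        """Emit the buffered steps as a single merged block."""
        if self.buffer:
            self.written.append(self.buffer)
            self.buffer = []
```

With a limit of three buffered steps, four time steps produce two writes instead of four: one block of three steps and one block holding the final step.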

I/O Variability
• Problem
• Techniques that achieved high performance I/O
  – Aggregation with write-behind strategy
  – Stripe alignment: to avoid contention
• Are these techniques sufficient to get the peak I/O performance?

[Figure: Single Storage Target Performance Variations; axes: Throughput (MB/sec) vs. Runs]
[Figure: Titan and Hopper panels; I/O Time (sec) vs. Run, comparing Static and I/O Re-routing (TF=0.1). Run labels: 21:20PM to 1:20AM, 3/1/2013, no noise injected; 21:20PM to 23:30PM, 2/27/2013, no noise injected]
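Stripe alignment can be sketched as simple offset arithmetic. The example below is an assumption about how alignment might be computed, not code from ADIOS: `align_up` and `aligned_offsets` are hypothetical helpers, and the stripe size is a free parameter. Each writer's region starts on a stripe boundary, so two writers never share a stripe.

```python
from typing import List


def align_up(offset: int, stripe_size: int) -> int:
    """Round offset up to the next multiple of stripe_size."""
    return (offset + stripe_size - 1) // stripe_size * stripe_size


def aligned_offsets(sizes: List[int], stripe_size: int) -> List[int]:
    """Assign each writer a start offset that sits on a stripe boundary."""
    offsets: List[int] = []
    next_offset = 0
    for size in sizes:
        offsets.append(next_offset)
        # Pad the end of this region so the next writer starts a new stripe.
        next_offset = align_up(next_offset + size, stripe_size)
    return offsets
```

The cost of this choice is padding inside the file; the benefit is that no stripe is written by more than one process.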

Introduction to Staging
• Initial development as a research effort to minimize I/O overhead
• Draws from past work on threaded I/O
• Exploits network hardware support for fast data transfer to remote memory

Hasan Abbasi, Matthew Wolf, Greg Eisenhauer, Scott Klasky, Karsten Schwan, Fang Zheng: DataStager: scalable data staging services for petascale applications. Cluster Computing 13(3): 277-290 (2010)
Ciprian Docan, Manish Parashar, Scott Klasky: DataSpaces: an interaction and coordination framework for coupled simulation workflows. Cluster Computing 15(2): 163-181 (2012)
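The threaded I/O idea that staging builds on can be sketched with a background thread. This is only an illustration under stated assumptions: `AsyncWriter` is a hypothetical class, the "transfer" is a list append, and real staging moves data to remote memory over the network rather than to a local thread. The point is that `write` returns immediately, so the simulation does not wait for the slow transfer.

```python
import queue
import threading
from typing import List, Optional, Tuple


class AsyncWriter:
    """Hands output buffers to a background thread so compute can continue."""

    def __init__(self) -> None:
        self._queue: "queue.Queue[Optional[Tuple[int, bytes]]]" = queue.Queue()
        self.stored: List[Tuple[int, int]] = []  # (step, bytes written)
        self._thread = threading.Thread(target=self._drain)
        self._thread.start()

    def write(self, step: int, payload: bytes) -> None:
        """Enqueue a copy of the data and return without waiting for I/O."""
        self._queue.put((step, bytes(payload)))

    def close(self) -> None:
        """Signal the end of output and wait for pending writes to finish."""
        self._queue.put(None)  # sentinel: no more data
        self._thread.join()

    def _drain(self) -> None:
        while True:
            item = self._queue.get()
            if item is None:
                break
            step, payload = item
            # Stand-in for the slow transfer to storage or a staging node.
            self.stored.append((step, len(payload)))


def simulate(steps: int) -> List[Tuple[int, int]]:
    """Run a toy simulation loop that emits one buffer per step."""
    writer = AsyncWriter()
    for step in range(steps):
        writer.write(step, b"x" * (step + 1))  # compute would continue here
    writer.close()
    return writer.stored
```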

Data Management in I/O Pipelines
• Perform computation in the right location
• Support dynamic placement
• Use data reduction techniques
• Aggregation and chunking to improve data access
• End-to-End approach to data management

[Figure: Optimized Chunking Model. Labels: Application, Data chunks, Optimized Chunk Size?, Intra-chunk level (Hierarchical Spatial Aggregation, Dynamic Subchunking), Chunk level (Space Filling Curve Reordering), Offset org, Offset new, Storage]
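Space filling curve reordering of chunks can be sketched as follows. The slide does not say which curve is used; this example assumes a Z-order (Morton) curve on 2-D chunk coordinates, and `morton2d` and `reorder_chunks` are hypothetical helper names. Sorting chunks by their Morton code places spatially neighboring chunks at nearby file offsets.

```python
from typing import List, Tuple


def morton2d(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of x and y (x in even positions, y in odd)."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def reorder_chunks(coords: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Sort chunk coordinates so spatial neighbors get nearby offsets."""
    return sorted(coords, key=lambda c: morton2d(c[0], c[1]))
```

A Hilbert curve would preserve locality somewhat better at the cost of a more involved index computation; Z-order is used here only because it is short to write down.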

Indexing and Compression
• Extreme scale data enhancement and reduction
• Utilize in transit and in situ mechanisms
• Scientific compression schemes (ISABELA and ISOBAR)
• In situ indexing to enable fast query and data access
• Deployed as services in the pipeline

Predata: I/O Pipelines
• Create a workflow on the staging nodes.
• Allows us to explore many research aspects.
• Improves total simulation time by 2.7%
• Generates online insights into the 260 GB of data being output from 16,384 compute cores in 40 seconds.

[Figure: staging pipeline. Labels: Particle array, Sort, sorted array, BP writer, BP file, Bitmap Indexing, Index file, Histogram, 2D Histogram, Plotter]
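The kind of workflow that runs on staging nodes (sort, histogram, bitmap indexing) can be sketched on a toy array. This is not the PreDatA code: the function names, the equal-width binning, and the representation of each bitmap as a Python integer are all assumptions made for illustration.

```python
from typing import Dict, List


def bin_of(v: float, lo: float, hi: float, nbins: int) -> int:
    """Equal-width bin for a value in [lo, hi), clamped to the last bin."""
    return min(int((v - lo) / (hi - lo) * nbins), nbins - 1)


def histogram(values: List[float], lo: float, hi: float, nbins: int) -> List[int]:
    """Count values per bin; values outside [lo, hi) are ignored."""
    counts = [0] * nbins
    for v in values:
        if lo <= v < hi:
            counts[bin_of(v, lo, hi, nbins)] += 1
    return counts


def bitmap_index(values: List[float], lo: float, hi: float, nbins: int) -> Dict[int, int]:
    """One bitmap per bin; bit i is set if values[i] falls in that bin."""
    bitmaps = {b: 0 for b in range(nbins)}
    for i, v in enumerate(values):
        if lo <= v < hi:
            bitmaps[bin_of(v, lo, hi, nbins)] |= 1 << i
    return bitmaps


def run_pipeline(particles: List[float], lo: float, hi: float, nbins: int) -> dict:
    """Sort the data, then build a histogram and an index over the sorted array."""
    ordered = sorted(particles)
    return {
        "sorted": ordered,
        "histogram": histogram(ordered, lo, hi, nbins),
        "index": bitmap_index(ordered, lo, hi, nbins),
    }
```

A query such as "which particles fall in bin 2" then becomes a lookup of one bitmap rather than a scan of the array.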

In-Memory Data Staging with DataSpaces
Staging-based (ADIOS DATASPACES transport method)
• Extract data from running simulations into the memory of staging servers
• Enables more loosely coupled data interactions
• Reduced resource contention, e.g., on-node memory
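Loosely coupled interaction through a staging area can be sketched as a shared store keyed by variable name and version. The `StagingSpace` class and its `put`, `get`, and `latest_version` methods are hypothetical and are not the DataSpaces API; the store here is a local dictionary, whereas real staging servers hold the data in their own memory on separate nodes.

```python
from typing import Dict, List, Optional, Tuple


class StagingSpace:
    """Toy in-memory staging area shared by a producer and a consumer."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, int], List[float]] = {}

    def put(self, name: str, version: int, data: List[float]) -> None:
        """Simulation side: copy the data out and return to computing."""
        self._store[(name, version)] = list(data)

    def get(self, name: str, version: int) -> Optional[List[float]]:
        """Analysis side: fetch a version if it has been staged, else None."""
        data = self._store.get((name, version))
        return None if data is None else list(data)

    def latest_version(self, name: str) -> Optional[int]:
        """Highest staged version of a variable, or None if nothing is staged."""
        versions = [v for (n, v) in self._store if n == name]
        return max(versions) if versions else None
```

Because the producer and consumer only agree on a name and a version, neither needs to know when or where the other runs.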
