Performance Advantages of Using a Burst Buffer for Scientific Workflows

Andrey Ovsyannikov, NERSC, Lawrence Berkeley National Laboratory
with David Trebotich and Brian Van Straalen (ANAG, LBNL)

BASCD-2016: Bay Area Scientific Computing Day, December 3, 2016, Stanford, CA
Data-intensive science

Examples: astronomy, climate, genomics, light sources

§ Applications analyzing data from experimental or observational facilities (telescopes, accelerators, etc.)
§ Applications combining modeling/simulation with experimental/observational data
§ Applications with complex workflows that require large amounts of data movement
Data-intensive simulation at scale

Example: reactive flow in a shale, using a sample of California's Monterey shale (10 µm scale)

• Required computational resources: 41K cores
• Space discretization: 2 billion cells
• Time discretization: ~1 µs per step; 3×10⁴ timesteps in total
• Size of one plotfile: 0.3 TB
• Total amount of data: 9 PB*
• I/O: 61% of total run time
• Time to transfer data:
  - to Globus Online storage: > 1000 days
  - to NERSC HPSS: 120 days
*Assuming the plotfile is written at every timestep

Complex workflow:
• On-the-fly visualization/quantitative analysis
• On-the-fly coupling of pore-scale simulation with a continuum-scale model
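As a rough check on the figures above (my back-of-the-envelope estimate, not from the slide), one plotfile per timestep gives the quoted 9 PB, and the quoted transfer times correspond to sustained rates of roughly

\[
0.3\,\mathrm{TB} \times 3\times10^{4} = 9\,\mathrm{PB},\qquad
\frac{9\,\mathrm{PB}}{1000\ \mathrm{days}} \approx 0.1\,\mathrm{GB/s}\ \text{(Globus Online)},\qquad
\frac{9\,\mathrm{PB}}{120\ \mathrm{days}} \approx 0.87\,\mathrm{GB/s}\ \text{(HPSS)}.
\]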
Bandwidth gap

• Growing gap between computation and I/O rates
• Insufficient bandwidth of persistent storage media
What is a burst buffer?

A layer of SSDs that resides between the compute nodes and the parallel file system (PFS).

[Figure: possible SSD placements between compute nodes, I/O nodes, and the parallel file system and storage arrays]
HPC memory hierarchy

Past:
• On chip: CPU
• Off chip: Memory (DRAM), Storage (HDD)

Future:
• On chip: CPU, Near Memory (HBM)
• Off chip: Far Memory (DRAM), Near Storage (SSD), Far Storage (HDD)
Why a burst buffer?

• HDD performance is not increasing sufficiently
  - More and more capacity is needed to get the required bandwidth
  - The bandwidth demand comes in "spikes"
• For bandwidth, an HDD-based PFS is more expensive than SSD
• Use NVRAM-based storage: the Burst Buffer
  - Lower latency and higher bandwidth of a flash-based Burst Buffer
  - Handles I/O bandwidth spikes without increasing the size of the PFS
  - On-demand file systems scale better than a large PFS
Burst buffers at HPC centers

§ NERSC: Cori (2016) - 288 BB nodes with 1.8 PB total capacity (Cray DataWarp Burst Buffer)
§ LANL/Sandia: Trinity (2016) - similar architecture to NERSC/Cori
§ ANL: Theta (2016) - 128 GiB SSD per compute node
§ ANL: Aurora (2018) - NVRAM per compute node and SSD burst buffers
§ ORNL: Summit (2018)

Commonalities:
§ Shorter path to compute nodes
§ Handle latency-bound access patterns more effectively
§ Solid state or NVRAM storage devices
§ Limited capacity
Computational physics and traditional post-processing

[Figure: over N timesteps the simulation code writes File 1 … File N to HDD; the data are then transferred to remote storage (e.g. Globus Online, a visualization cluster) for analysis/visualization]

Data transfer/storage and traditional post-processing are extremely expensive!
Data processing methods

Data processing execution methods (Prabhat & Koziol, 2015):

Analysis execution location
• Post-processing: separate application
• In-situ: within simulation
• In-transit: burst buffer

Data location
• Post-processing: on parallel file system
• In-situ: within simulation memory space
• In-transit: within burst buffer flash memory

Data reduction possible?
• Post-processing: NO - all data saved to disk for future use
• In-situ: YES - can limit output to only analysis products
• In-transit: YES - can limit data saved to disk to only analysis products

Interactivity
• Post-processing: YES - user has full control over what to load and when to load data from disk
• In-situ: NO - analysis actions must be prescribed to run within the simulation
• In-transit: LIMITED - data is not permanently resident in flash and can be removed to disk

Analysis routines expected
• Post-processing: all possible analysis and visualization routines
• In-situ: fast-running analysis operations, statistical routines, image rendering
• In-transit: longer-running analysis operations bounded by the time until drain to the file system; statistics over simulation time
NERSC/Cray Burst Buffer architecture

[Figure: compute nodes connect over the Aries high-speed network to Burst Buffer blades (each blade = 2 Burst Buffer nodes with 2 SSDs each) and to I/O nodes (2 InfiniBand HCAs each), which reach the Lustre OSSs/OSTs and storage servers over the InfiniBand storage fabric]

• Cori Phase 1 configuration: 920 TB on 144 BB nodes (288 x 3.2 TB SSDs); 288 BB nodes on Cori Phase 2
• DataWarp software (integrated with the SLURM workload manager) allocates portions of the available storage to users per job
• Users see a POSIX filesystem
• The filesystem can be striped across multiple BB nodes (depending on the allocation size requested)
Burst Buffer use cases @ NERSC

Use case → example early users:
• I/O bandwidth (reads/writes): Nyx/BoxLib; VPIC-IO
• Data-intensive experimental science - "challenging/complex" I/O patterns, e.g. high IOPS: ATLAS experiment; TomoPy for ALS and APS
• Workflow coupling and visualization (in-transit / in-situ analysis): Chombo-Crunch / VisIt carbon sequestration simulation
• Staging experimental data: ATLAS and ALS SPOT Suite

Many other projects not described here (~50 active users).
Benchmark performance

Details on the use cases and benchmark performance are given in Bhimji et al., CUG 2016.
Chombo-Crunch (ECP application)

• Simulates pore-scale reactive transport processes associated with carbon sequestration
• Applied to other subsurface science areas:
  - Hydrofracturing (aka "fracking")
  - Used fuel disposition (Hanford salt repository modeling)
• Extended to engineering applications:
  - Lithium-ion battery electrodes
  - Paper manufacturing (hpc4mfg)

The common feature is the ability to perform direct numerical simulation from image data of arbitrary heterogeneous, porous materials.

[Figures: transport in fractured dolomite; pH on crushed calcite in a capillary tube; flooding in fractured Marcellus shale; O2 diffusion in Kansas aggregate soil; paper re-wetting; electric potential in a Li-ion electrode]
I/O constraint: common practice

Common practice: increase the I/O (plotfile) interval by 10x, 100x, 1000x, ...

[Figure: I/O contribution to Chombo-Crunch wall time at different plotfile intervals]
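To illustrate what "increasing the plotfile interval" means inside a time-stepping loop, here is a minimal sketch; the names (plot_interval, write_plotfile) are hypothetical and this is not Chombo-Crunch code:

#include <stdio.h>

/* Hypothetical stand-in for the simulation's plotfile writer. */
static void write_plotfile(int step) { printf("plotfile at step %d\n", step); }

int main(void) {
    const int max_step = 30000;     /* ~3e4 timesteps, as in the shale example */
    const int plot_interval = 100;  /* write every 100th step instead of every step */

    for (int step = 0; step < max_step; ++step) {
        /* ... advance the solution by one timestep ... */
        if (step % plot_interval == 0)
            write_plotfile(step);   /* 100x fewer plotfiles: 9 PB -> ~90 TB of output */
    }
    return 0;
}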
Loss of temporal/statistical accuracy

Time evolution from 0 to T: $\dfrac{dU}{dt} = F(U(x,t))$

[Figure: effect of a 10x increase of the plotfile time interval on the sampled time series]

Pros: less data to move and store
Cons: degraded accuracy of statistics (stochastic simulations)
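To make the "degraded statistics" point concrete (my illustration, not from the slide): time-averaged statistics are typically estimated from the saved plotfiles,

\[
\bar{U} \approx \frac{1}{N}\sum_{n=1}^{N} U(x, t_n), \qquad t_n = n\,\Delta t_{\mathrm{plot}},
\]

so a 10x larger plotfile interval means 10x fewer samples N in the estimator; for weakly correlated samples the sampling error scales roughly like $1/\sqrt{N}$, i.e. it grows by about a factor of $\sqrt{10}\approx 3$.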
Proposed in-transit workflow

Workflow components (user configuration via a Python script):
• Chombo-Crunch (main simulation): writes HDF5 plotfiles (.plt, per time step) and checkpoints (.chk, O(100) GB) to the Burst Buffer
• Checkpoint manager: detects large .chk files and issues asynchronous stage-out to the PFS
• VisIt (visualization and analytics): renders 1+ .png 'frames' per .plt file for the movie (may be more than one movie)
• Encoder: waits for N .png files, encodes intermediate .ts movies in local DRAM, and at the end concatenates them into the final .mp4

Data placement: plotfiles, checkpoints, and .png frames live on the Burst Buffer; the final image and movie files are staged out by the DataWarp software to the Lustre PFS.
I/O: HDF5 for checkpoints and plotfiles.
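A minimal sketch of what the checkpoint-manager component might look like, assuming the libdatawarp stage-out call shown on the DataWarp API slide; the directory names, polling interval, and the already_staged() helper are hypothetical, and this is illustrative rather than the actual implementation:

#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#ifdef CH_DATAWARP
#include <datawarp.h>                 /* Cray libdatawarp: dw_stage_file_out() */
#endif

/* Hypothetical predicate: has this checkpoint already been staged out?
   (Stub: a real manager would track which files it has already handled.) */
static int already_staged(const char *name) { (void)name; return 0; }

int main(void) {
    const char *bb_dir  = getenv("DW_JOB_STRIPED");  /* Burst Buffer mount point */
    const char *pfs_dir = "/pfs/run_output";          /* hypothetical Lustre destination */
    if (!bb_dir) return 1;

    for (;;) {                                        /* poll the Burst Buffer directory */
        DIR *d = opendir(bb_dir);
        if (!d) break;
        struct dirent *e;
        while ((e = readdir(d)) != NULL) {
            if (strstr(e->d_name, ".chk") && !already_staged(e->d_name)) {
                char src[512], dst[512];
                snprintf(src, sizeof src, "%s/%s", bb_dir, e->d_name);
                snprintf(dst, sizeof dst, "%s/%s", pfs_dir, e->d_name);
#ifdef CH_DATAWARP
                dw_stage_file_out(src, dst, DW_STAGE_IMMEDIATE);  /* asynchronous stage-out */
#endif
            }
        }
        closedir(d);
        sleep(30);                                    /* polling interval is arbitrary here */
    }
    return 0;
}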
Straightforward batch script

#!/bin/bash
#SBATCH --nodes=1291
#SBATCH --job-name=shale

### Allocate BB capacity and copy the restart file to the BB
#DW jobdw capacity=200TB access_mode=striped type=scratch
#DW stage_in type=file source=/pfs/restart.hdf5 destination=$DW_JOB_STRIPED/restart.hdf5

### Load required modules
module load visit

ScratchDir="$SLURM_SUBMIT_DIR/_output.$SLURM_JOBID"
BurstBufferDir="${DW_JOB_STRIPED}"
mkdir $ScratchDir
stripe_large $ScratchDir

NumTimeSteps=2000
EncoderInt=200
RestartFileName="restart.hdf5"
ProgName="chombocrunch3d.Linux.64.CC.ftn.OPTHIGH.MPI.PETSC.ex"
ProgArgs=chombocrunch.inputs
ProgArgs="$ProgArgs check_file=${BurstBufferDir}check plot_file=${BurstBufferDir}plot pfs_path_to_checkpoint=${ScratchDir}/check restart_file=${BurstBufferDir}${RestartFileName} max_step=$NumTimeSteps"

### Run each component
### Launch Chombo-Crunch
srun -N 1275 -n 40791 $ProgName $ProgArgs > log 2>&1 &

### Launch VisIt
visit -l srun -nn 16 -np 512 -cli -nowin -s VisIt.py &

### Launch Encoder
./encoder.sh -pngpath $BurstBufferDir -endts $NumTimeSteps -i $EncoderInt &

wait

### Stage out the output product (movie file) from the Burst Buffer to persistent storage
#DW stage_out type=file source=$DW_JOB_STRIPED/movie.mp4 destination=/pfs/movie.mp4
DataWarp API

Asynchronous transfer of a plotfile/checkpoint from the Burst Buffer to the PFS:

#ifdef CH_DATAWARP
  // Use the DataWarp API stage_out call to move the plotfile from the BB to Lustre
  char lustre_file_path[200];
  char bb_file_path[200];
  if ((m_curStep % m_copyPlotFromBurstBufferInterval == 0) &&
      (m_copyPlotFromBurstBufferInterval > 0))
  {
    sprintf(lustre_file_path, "%s.nx%d.step%07d.%dd.hdf5",
            m_LustrePlotFile.c_str(), ncells, m_curStep, SpaceDim);
    sprintf(bb_file_path, "%s.nx%d.step%07d.%dd.hdf5",
            m_plotFile.c_str(), ncells, m_curStep, SpaceDim);
    dw_stage_file_out(bb_file_path, lustre_file_path, DW_STAGE_IMMEDIATE);
  }
#endif
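Because dw_stage_file_out() is asynchronous, a code may want to confirm that a file has drained to Lustre before deleting its Burst Buffer copy. A minimal sketch, assuming the libdatawarp query/wait calls (dw_query_file_stage, dw_wait_file_stage) are available in this DataWarp release; check the Cray documentation for the exact signatures:

#ifdef CH_DATAWARP
  // Sketch: block until the earlier stage-out of bb_file_path has completed,
  // after which it is safe to remove the Burst Buffer copy to free space.
  int complete = 0, pending = 0, deferred = 0, failed = 0;
  if (dw_query_file_stage(bb_file_path, &complete, &pending, &deferred, &failed) == 0
      && (pending > 0 || deferred > 0))
  {
    dw_wait_file_stage(bb_file_path);   // wait for the asynchronous transfer to drain
  }
#endif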