Replicating HPC I/O Workloads With Proxy Applications
James Dickson, Steven Wright, Stephen Jarvis - University of Warwick
Satheesh Maheswaran, Andy Herdman - UK Atomic Weapons Establishment
Marc C. Miller - Lawrence Livermore National Laboratory

Motivation
I/O investigation goals?
- Benchmarking systems
- Tuning application behaviour
- Tuning software stack
- Changing paradigm
- Changing hardware technology

Motivation
Working with a mini application or proxy is less cumbersome, more streamlined and more portable than working with the full application.
Developing and maintaining a representative proxy for every application is time consuming and probably redundant.
Ideally we would like to experiment while minimising time spent making code changes and writing new implementations.

Outline
- Background: Proxy app and I/O library
- Replication Components
- Case Study
- Conclusion

Background: MACSio
“Multi-purpose Application-Centric, Scalable I/O Proxy Application”
Two key characteristics:
- Level of Abstraction: POSIX, MPI-IO, SILO, HDF5 and beyond…
- Degree of Flexibility: dump type, dataset composition, user defined data objects
Multi-purpose is achieved through a plugin-based design: if you have a library or interface to work with, write a plugin!

Background: TyphonIO
[Diagram: I/O software stack, top to bottom]
- Application
- Scientific data model (TyphonIO)
- High level I/O library (HDF5)
- Parallel interface middleware
- Parallel file system
- Storage hardware

Background: TyphonIO
Overlays a hierarchical data model on the parallel I/O interface.
Designed to use HDF5 in a consistent way that can be optimised for the data model, e.g. efficient use of chunking in the mesh structure.
[Diagram: TyphonIO object hierarchy with the labels File, State 1, State 2..N, Mesh, Chunked Object, Material, Quants, Vargroup, Variable]

Replication: Profiling
Darshan I/O characterisation chosen for lightweight profiling.
Instrumentation overhead is indistinguishable from machine noise in our experiments.
Runtime (seconds)   1 Node    64 Nodes
Uninstrumented      309.25    352.33
Instrumented        307.43    352.29
Profiling produces counters for POSIX, MPI-IO and HDF5.
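
To make the "indistinguishable from machine noise" claim concrete, the short sketch below computes the relative runtime difference from the four runtimes in the table. The dictionary layout and variable names are illustrative only.

```python
# Runtimes in seconds, copied from the profiling table.
runtimes = {
    "1 node": {"uninstrumented": 309.25, "instrumented": 307.43},
    "64 nodes": {"uninstrumented": 352.33, "instrumented": 352.29},
}

# Relative difference of the instrumented run, in percent of the uninstrumented run.
overhead_percent = {
    label: 100.0 * (t["instrumented"] - t["uninstrumented"]) / t["uninstrumented"]
    for label, t in runtimes.items()
}

for label, pct in overhead_percent.items():
    print(f"{label}: {pct:+.3f}%")
```

Both differences come out slightly negative and well under 1% in magnitude, i.e. the instrumented runs were marginally faster, which is what run-to-run noise rather than a measurable overhead would look like.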

Replication: Parameter Generation
[Workflow diagram with the stages Application Run, Darshan Log, YAML Log, MACSio Parameters and Access Diagram]

Replication: Parameter Generation
Filesize = Processors × (PartSize × (α × Variables + β) + γ × Variables + δ) + ψ × Variables + η
MACSio currently weak scales, so increasing processor count increases the file size linearly.
Similarly, part size and dataset variable count give a linear increase in total bytes written.
Combining the linear equations gives the equation above to calculate a good estimate for the resultant file size based on dataset composition.
Constants α, β, γ, δ, ψ, η are derived experimentally from a dataset composition scaling study.
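
A minimal sketch of how the file size equation could be evaluated, and inverted to find the part size that produces a target file size. The function and class names are hypothetical, and the constants are placeholders chosen for illustration; the real values would come from the scaling study and are not given in the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FitConstants:
    """The six experimentally fitted constants of the file size equation."""
    alpha: float
    beta: float
    gamma: float
    delta: float
    psi: float
    eta: float


def estimate_file_size(processors, part_size, variables, c):
    """Evaluate Filesize for a given processor count, part size and variable count."""
    per_processor = (part_size * (c.alpha * variables + c.beta)
                     + c.gamma * variables + c.delta)
    return processors * per_processor + c.psi * variables + c.eta


def solve_part_size(target_size, processors, variables, c):
    """Invert the equation: the part size that would give target_size bytes."""
    per_processor = (target_size - c.psi * variables - c.eta) / processors
    return ((per_processor - c.gamma * variables - c.delta)
            / (c.alpha * variables + c.beta))


# Placeholder constants, NOT measured values.
EXAMPLE = FitConstants(alpha=1.0, beta=0.5, gamma=200.0,
                       delta=1000.0, psi=50.0, eta=4096.0)

size = estimate_file_size(processors=24, part_size=404_320, variables=10, c=EXAMPLE)
part = solve_part_size(size, processors=24, variables=10, c=EXAMPLE)
```

Because the model is linear in each input, the inversion is closed-form: a target byte count taken from a profile can be turned directly into a part size, with no iterative search.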

Replication: Parameter Generation
Extracting counters such as BYTES_WRITTEN, NUM_PROCS, COLL_WRITES and [OPEN/CLOSE]_TIMESTAMP is enough to generate an input to MACSio for a similar dataset composition and I/O pattern.
In particular, using timestamps to distribute load across the simulation runtime is important to accurately represent typical ‘bursty’ I/O hotspots spread out across the runtime.
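
The sketch below shows one way the extracted counters could be turned into replay parameters. It assumes the counters have already been parsed into a plain dictionary keyed by the abbreviated names used on the slide, and that each file access is available as an (open, close) timestamp pair. The output keys are hypothetical and are not MACSio option names.

```python
def derive_replay_parameters(counters, accesses):
    """Turn profile counters into proxy inputs.

    counters: dict with BYTES_WRITTEN, NUM_PROCS and COLL_WRITES entries.
    accesses: list of (open_timestamp, close_timestamp) pairs in seconds.
    """
    accesses = sorted(accesses)
    num_dumps = len(accesses)

    # Spread the total bytes evenly over ranks and dumps (a simplifying assumption).
    bytes_per_rank_per_dump = counters["BYTES_WRITTEN"] / (
        counters["NUM_PROCS"] * num_dumps
    )

    # Idle time before the first access, then between each close and the next open.
    wait_times = [accesses[0][0]]
    for (_, prev_close), (next_open, _) in zip(accesses, accesses[1:]):
        wait_times.append(next_open - prev_close)

    return {
        "ranks": counters["NUM_PROCS"],
        "num_dumps": num_dumps,
        "bytes_per_rank_per_dump": bytes_per_rank_per_dump,
        "collective": counters["COLL_WRITES"] > 0,
        "wait_times": wait_times,
    }


# Made-up counter values for illustration only.
params = derive_replay_parameters(
    {"BYTES_WRITTEN": 48_000_000, "NUM_PROCS": 24, "COLL_WRITES": 0},
    [(150.0, 152.5), (300.0, 303.0)],
)
```

The wait times are what place the dumps at the right points in the run, so the proxy reproduces the bursts rather than writing everything back to back.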

Case Study: Bookleaf
- 2D unstructured Lagrangian hydrodynamics application
- Fixed checkpoint scheme: two per simulation
- The input deck used solves the Noh verification problem for ideal gases
- I/O is handled by TyphonIO

Experimental Setup
ARCHER
- 4920 node Cray XC30
- Two 12-core Ivy Bridge processors per node (118,080 cores total)
- Three Lustre filesystems:
  - 12 OSSs
  - 4 OSTs/OSS
  - 10 4TB discs/OST (RAID6)
  - 1 MDS + 1 MDT with 14 600GB discs (RAID1+0)
- 10 LNet router nodes with overlapping routing paths

Experimental Setup: Input Parameters
Part size represents the volume of data written from each rank.
Wait time is a basic time buffer between consecutive file accesses.
Nodes   Part Size (Bytes)   Wait Time (s)
1       404,320             266
2       202,205             120
4       101,148             53
8       50,619              22
16      25,355              11
32      12,723              7
64      6,407               5
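
As a quick check on the table, the sketch below multiplies each part size by its node count. The products stay within roughly 1.5% of each other, which is what one would expect if a fixed amount of checkpoint data is divided across more ranks as the node count grows. That interpretation is an assumption drawn from the numbers, not something the slides state.

```python
# (nodes, part size in bytes) pairs copied from the input parameter table.
part_sizes = [
    (1, 404_320),
    (2, 202_205),
    (4, 101_148),
    (8, 50_619),
    (16, 25_355),
    (32, 12_723),
    (64, 6_407),
]

# Node count times part size for every row.
products = [nodes * size for nodes, size in part_sizes]

# Relative spread between the largest and smallest product.
spread = (max(products) - min(products)) / min(products)

for (nodes, _), product in zip(part_sizes, products):
    print(f"{nodes:>2} nodes: {product}")
print(f"spread: {spread:.2%}")
```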

File Access Pattern
File access times are offset by the initial setup in MACSio.
Accounting for this overhead is not necessary to accurately represent the I/O pattern so we don’t factor it in, but this could easily be introduced.
[Figure: file access timeline for the runs MACSio 2, MACSio 1, Bookleaf 2 and Bookleaf 1; x-axis: Time (s), 0 to 350]

Results: I/O Time
[Figure, two panels: "Cumulative I/O Time across all ranks" and "Absolute I/O Time"; y-axis: Time (s); x-axis: # Nodes (1 to 64); series: MACSio #1, MACSio #2, Bookleaf #1, Bookleaf #2]
Annotation: 17,000s, 1,536 ranks, ≈ 110s writing per rank

Results: I/O Time
Total, cumulative and slowest individual I/O time remain consistent for the original and replicated runs.
Looking at a wider range of Darshan counters, access sizes and frequencies are also consistent.
[Figure: "Slowest Individual MPIIO Operation"; y-axis: Time (s); x-axis: # Nodes (1 to 64); series: MACSio #1, MACSio #2, Bookleaf #1, Bookleaf #2]

Results: Testing Independent vs Collective I/O with MACSio
Using the MACSio replication, a parameter tweak can be used to manipulate I/O library behaviour.
The switch to use collective buffering has a very predictable effect, reducing the number of small write operations and lowering the overall I/O time.
[Figure: I/O time for independent and collective runs; y-axis: Time (s); x-axis: # Nodes (1 to 64); series: Collective #1, Collective #2, Independent #1, Independent #2]
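
To illustrate why collective buffering reduces the number of small writes, here is a toy model in plain Python. It is not how MACSio, HDF5 or any MPI-IO implementation is coded; real two-phase I/O is considerably more involved. The rank count and per-rank size are borrowed from the 64-node configuration purely as example numbers, and the aggregator count of 48 is an arbitrary assumption.

```python
def independent_writes(num_ranks, bytes_per_rank):
    """Every rank issues its own request: one (offset, size) pair per rank."""
    return [(rank * bytes_per_rank, bytes_per_rank) for rank in range(num_ranks)]


def collective_writes(num_ranks, bytes_per_rank, num_aggregators):
    """Ranks hand their data to aggregators, which each write one contiguous region."""
    total = num_ranks * bytes_per_rank
    base, extra = divmod(total, num_aggregators)
    requests = []
    offset = 0
    for aggregator in range(num_aggregators):
        # Spread any remainder over the first few aggregators.
        size = base + (1 if aggregator < extra else 0)
        requests.append((offset, size))
        offset += size
    return requests


ind = independent_writes(num_ranks=1536, bytes_per_rank=6407)
col = collective_writes(num_ranks=1536, bytes_per_rank=6407, num_aggregators=48)

print(len(ind), "requests of", ind[0][1], "bytes")
print(len(col), "requests of", col[0][1], "bytes")
```

The same bytes reach the file in both cases, but the collective variant issues far fewer and much larger requests, which is the effect described above.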

Conclusion
We use a proxy application and a high level library to mimic an I/O pattern, based on profiling that is as lightweight as possible.
I/O characterisation and a small amount of application familiarity is enough to produce a workable proxy.
Once a parameter set has been identified, we can switch strategy, library and platform with relative ease.

Next Steps
- More irregular I/O patterns from a range of applications
- Exercise different parallel interfaces
- Multiple concurrent workloads

Acknowledgements
- UK Atomic Weapons Establishment Technical Outreach Programme
- UK Engineering and Physical Sciences Research Council

Thank You
Any Questions?
J.Dickson@warwick.ac.uk
