Parallel I/O Characterisation Based on Server-Side Performance Counters

Parallel I/O Characterisation Based on Server-Side Performance Counters. SC16: PDSW-DISC, November 14, 2016. S. El Sayed (JSC), M. Bolten (Kas), D. Pleiter (JSC) and W. Frings (JSC). Member of the Helmholtz-Association. JSC: Jülich Supercomputing Centre.


  1. Parallel I/O Characterisation Based on Server-Side Performance Counters

     SC16: PDSW-DISC, November 14, 2016

     S. El Sayed (JSC), M. Bolten (Kas), D. Pleiter (JSC) and W. Frings (JSC)

     JSC: Jülich Supercomputing Centre, Forschungszentrum Jülich
     Kas: Institut für Mathematik, Universität Kassel

     Member of the Helmholtz-Association

  2. Contents

     1. Motivation
     2. Methodology
     3. Characterisation Criteria
     4. Selected Results
     5. Summary

  3. Part I: Motivation

  4. Motivation

     Why analyse I/O?

     - I/O to compute imbalance
       - Exascale I/O challenge: balance I/O bandwidth with instruction throughput
       - Applications' I/O requirements are increasing
     - Solution: emerging I/O architectures
       - Hierarchical storage
       - Active storage

     Key Point: Assessing the impact of emerging I/O architectures requires understanding the I/O load characteristics on current high-end HPC systems.

  5. Contribution

     1. Formulate an approach to monitor I/O workload using server-side performance counters
     2. Introduce characterisation metrics to evaluate performance data
     3. Use the approach to analyse collected data on a BlueGene/P system

  6. Part II: Methodology

  7. Methodology: Performance Counters

     Assume an I/O sub-system that periodically (every $\Delta t$), over an extended time, logs 6 values:

     - Data read [Bytes]
     - Data written [Bytes]
     - Number of read operations [IOP]
     - Number of write operations [IOP]
     - Number of file open operations
     - Number of file close operations

     Some notation:

     - $\Delta t$: logging time period
     - $t_0$: start time of logging
     - $v_i$: $i$-th logged value ($v$ represents any of the 6 logged values)

  8. Methodology (continued): Performance Counters

     Pre-processing of the data might be required, for example:

     - To cope with lost data or counter resets
     - To synchronise I/O servers using linear interpolation

     $$\tilde{v}_k = v_i + \frac{(\tilde{t}_0 + k\,\Delta\tilde{t}) - (t_0 + i\,\Delta t)}{\Delta t}\,(v_{i+1} - v_i)$$

     where $(t_0 + i\,\Delta t) \le (\tilde{t}_0 + k\,\Delta\tilde{t}) \le [t_0 + (i+1)\,\Delta t]$ and

     - $\Delta\tilde{t}$: interpolation period
     - $\tilde{t}_0$: global start of interpolation
     - $\tilde{v}_k$: $k$-th interpolated value
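
The interpolation step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `interpolate_counter` and its argument names are hypothetical, samples are assumed to be complete and evenly spaced, and lost data or counter resets are assumed to have been repaired beforehand.

```python
def interpolate_counter(values, t0, dt, t0_new, dt_new, n_new):
    """Linearly interpolate logged counter samples onto a common time grid.

    values[i] is the sample logged at t0 + i*dt (hypothetical layout).
    Returns n_new interpolated values at t0_new + k*dt_new, k = 0..n_new-1.
    """
    out = []
    for k in range(n_new):
        t = t0_new + k * dt_new
        # Index i of the logging interval [t0 + i*dt, t0 + (i+1)*dt] containing t;
        # the clamp lets t coincide with the very last logged sample.
        i = min(int((t - t0) // dt), len(values) - 2)
        frac = (t - (t0 + i * dt)) / dt
        if i < 0 or not 0.0 <= frac <= 1.0:
            raise ValueError(f"time {t} lies outside the logged range")
        out.append(values[i] + frac * (values[i + 1] - values[i]))
    return out


# Samples logged at t = 0, 120, 240 s, re-sampled at t = 60 and 180 s.
print(interpolate_counter([0, 120, 360], t0=0, dt=120, t0_new=60, dt_new=120, n_new=2))
```

Applying the same target grid ($\tilde{t}_0$, $\Delta\tilde{t}$) to every I/O server yields samples that can be compared interval by interval.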

  9. Methodology: Job Information

     - Collect job information (a job is an application run during I/O logging): start time $t_s$, end time $t_e$ and the $n$ I/O servers used
     - Pre-process the job list: filter it, for example to remove erroneous jobs
     - Link performance counters to jobs

     Validate the performance counters, the pre-processing and the linking of jobs to performance counters using jobs with known I/O behaviour (benchmarks).
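
One possible way to link counters to a job is to select, for each I/O server the job used, the samples whose timestamps fall inside $[t_s, t_e]$. The sketch below assumes the counters are already on a common grid starting at `t0` with period `dt`; the `Job` record, the server names and the function names are hypothetical and not taken from the slides.

```python
import math
from dataclasses import dataclass


@dataclass
class Job:
    t_start: float   # t_s
    t_end: float     # t_e
    servers: tuple   # identifiers of the I/O servers used by the job


def job_sample_range(job, t0, dt):
    """Return (first, last) sample indices k with t_s <= t0 + k*dt <= t_e."""
    first = max(0, math.ceil((job.t_start - t0) / dt))
    last = math.floor((job.t_end - t0) / dt)
    return first, last


def link_job(job, counters, t0, dt):
    """Slice each used server's samples down to the job's lifetime.

    counters maps a server identifier to its list of samples on the common grid.
    """
    first, last = job_sample_range(job, t0, dt)
    return {s: counters[s][first:last + 1] for s in job.servers}


counters = {"ion1": [0, 10, 20, 30, 40, 50, 60], "ion2": [0, 0, 5, 5, 5, 9, 9]}
job = Job(t_start=250, t_end=610, servers=("ion1",))
print(link_job(job, counters, t0=0, dt=120))
```

Running such a selection on benchmark jobs with known I/O volumes is one way to carry out the validation mentioned above.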

  10. Part III: Characterisation Criteria

  11. Characterisation Criteria: Basic Quantities

      Characterising I/O on a per-job basis:

      - $D_r(l, s, t)$: number of read operations of length $l$ Bytes arriving at server $s$ during $[t_s, t]$
      - $D_w(l, s, t)$: number of write operations of length $l$ Bytes arriving at server $s$ during $[t_s, t]$
      - $\delta(s, t, \Delta t)$: helper quantity with value 1 if more than $c$ Bytes are moved; for reads,

      $$\delta_r(s, t, \Delta t) = \begin{cases} 1 & \text{if } \sum_l l\,[D_r(l, s, t + \Delta t) - D_r(l, s, t)] > c, \\ 0 & \text{otherwise,} \end{cases}$$

      where $c \ge 0$ is a threshold parameter.

  12. Characterisation Criteria: Bandwidth

      a. Aggregate I/O volumes

      $$N_r = \sum_l \sum_{s \in S} l\,D_r(l, s, t_e)$$

      where $S$ is the set of I/O servers used by the job.

      b. Bandwidth

      $$B_r(s, t) = \frac{1}{\Delta t} \sum_l l\,[D_r(l, s, t + \Delta t) - D_r(l, s, t)]$$

      c. I/O operations per second (IOPS)

      $$\Gamma_r(s, t) = \frac{1}{\Delta t} \sum_l [D_r(l, s, t + \Delta t) - D_r(l, s, t)]$$
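
Because a cumulative byte counter already equals $\sum_l l\,D_r(l, s, t)$ and a cumulative request counter equals $\sum_l D_r(l, s, t)$, the three quantities above reduce to differences of consecutive samples. The following is a minimal sketch under that assumption; the function names and the toy numbers are illustrative only.

```python
def rates(cumulative, dt):
    """Per-interval rate (v[i+1] - v[i]) / dt from cumulative counter samples.

    Fed with a bytes-read counter this gives the bandwidth B_r(s, t);
    fed with a read-request counter it gives the IOPS Gamma_r(s, t).
    """
    return [(b - a) / dt for a, b in zip(cumulative, cumulative[1:])]


def aggregate_volume(bytes_by_server):
    """N_r: bytes moved between the first and last sample, summed over servers."""
    return sum(v[-1] - v[0] for v in bytes_by_server.values())


# Toy cumulative bytes-read samples for two servers, logged every 120 s.
bytes_read = {"s1": [0, 1200, 1200, 3600], "s2": [100, 100, 700, 700]}
print(rates(bytes_read["s1"], dt=120))   # bandwidth of s1 per interval, in Bytes/s
print(aggregate_volume(bytes_read))      # total bytes read by the job
```

Subtracting the first sample in `aggregate_volume` accounts for counters that do not start at zero when the job starts.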

  13. Characterisation Criteria: I/O intensity

      Considering:

      $$H(t, \Delta t) = \begin{cases} 1 & \text{if } \delta(s, t, \Delta t) > 0 \text{ for any server } s, \\ 0 & \text{otherwise} \end{cases}$$

      $H(t, \Delta t) = 1$ means that I/O exceeded the threshold $c$ during $[t, t + \Delta t]$.

      d. I/O intensity: ratio of the number of time intervals with I/O to the total number of time intervals.

      $$I = \frac{\Delta t \sum_{i=0}^{n} H(t_i, \Delta t)}{t_e - t_s}$$

      where $t_i = t_s + i\,\Delta t$ and $t_s \le t_i \le t_e$ for $i = 0, \ldots, n$.

      $0 \le I \le 1$, with $I = 1$ indicating that the application is performing continuous read or write.
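
As a rough illustration of how $\delta$, $H$ and $I$ fit together, the sketch below derives them from cumulative byte counters. It assumes one sample per interval boundary and a job that spans a whole number of intervals; all function names and numbers are hypothetical.

```python
def delta(cumulative_bytes, c=0):
    """delta per interval: 1 if more than c bytes moved between two samples."""
    return [1 if (b - a) > c else 0
            for a, b in zip(cumulative_bytes, cumulative_bytes[1:])]


def activity(bytes_by_server, c=0):
    """H per interval: 1 if delta is 1 for at least one server."""
    deltas = [delta(v, c) for v in bytes_by_server.values()]
    return [1 if any(column) else 0 for column in zip(*deltas)]


def io_intensity(h, dt, t_start, t_end):
    """I = dt * sum(H) / (t_e - t_s)."""
    return dt * sum(h) / (t_end - t_start)


# Two servers, five samples each, i.e. four intervals of 120 s.
bytes_read = {"s1": [0, 1200, 1200, 1200, 3600], "s2": [100, 100, 100, 100, 700]}
h = activity(bytes_read, c=0)
print(h, io_intensity(h, dt=120, t_start=0, t_end=480))
```

Raising `c` filters out small transfers, which is why the intensity can only stay the same or drop as the threshold grows.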

  14. Characterisation Criteria: Burstiness

      Considering:

      - $l_{IO}$: average number of consecutive intervals $\Delta t$ with $H = 1$
      - $l_{noIO}$: average number of consecutive intervals $\Delta t$ with $H = 0$

      e. Burstiness parameter

      $$\rho = \begin{cases} 1 - \tanh(l_{IO} / l_{noIO}) & \text{if } l_{noIO} > 0, \\ 0 & \text{otherwise} \end{cases}$$

      The $\tanh$ bounds the burstiness parameter to the interval $[0, 1]$.

      Key Point: If a short period of I/O (small $l_{IO}$) is followed by a long period without I/O (large $l_{noIO}$), then we expect $\rho$ to be close to 1.
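
The burstiness parameter can be computed from the sequence of $H$ values by averaging run lengths. The sketch below is one way to do this; treating a job without any I/O interval as $l_{IO} = 0$ is an assumption made here, since the slide does not cover that case, and the function name is hypothetical.

```python
import math
from itertools import groupby


def burstiness(h):
    """rho = 1 - tanh(l_IO / l_noIO) for a sequence h of 0/1 activity flags."""
    runs_io = [len(list(group)) for key, group in groupby(h) if key == 1]
    runs_noio = [len(list(group)) for key, group in groupby(h) if key == 0]
    if not runs_noio:
        return 0.0  # l_noIO = 0: I/O in every interval
    l_noio = sum(runs_noio) / len(runs_noio)
    # Assumption: with no I/O interval at all, l_IO is taken to be 0.
    l_io = sum(runs_io) / len(runs_io) if runs_io else 0.0
    return 1.0 - math.tanh(l_io / l_noio)


print(burstiness([1, 0, 0, 0, 0, 0, 0, 0, 0, 0]))  # one short burst, long pause
print(burstiness([1, 1, 0, 1, 1, 0]))              # mostly busy
print(burstiness([1, 1, 1]))                       # continuous I/O
```

The first call gives a value close to 1 and the second a value close to 0, matching the key point above.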

  15. Characterisation Criteria: Parallel I/O intensity

      Considering:

      $$\pi(t, \Delta t) = \frac{\sum_s \delta(s, t, \Delta t)}{|S|}$$

      where $|S|$ is the number of I/O servers used by the job. $\pi = 1$ indicates that in a given interval all servers read or write data beyond the threshold $c$.

      f. Parallel I/O intensity

      $$\Pi = \frac{\sum_i \pi(t_s + i\,\Delta t, \Delta t)}{\sum_i \delta(t_s + i\,\Delta t, \Delta t)}$$

      The $\delta$ in the denominator carries no server index; it appears to count the intervals in which at least one server exceeds $c$, which is consistent with the limits of $P$ below.

      Normalised:

      $$P = \frac{|S|\,\Pi - 1}{|S| - 1}$$

      - $P = 1$: whenever I/O exceeds $c$, all I/O servers are involved
      - $P = 0$: whenever I/O exceeds $c$, only one I/O server is involved
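
A small worked sketch may help to see how $\pi$, $\Pi$ and $P$ relate. It assumes that the denominator of $\Pi$ counts the intervals in which at least one server exceeds the threshold, and it returns `None` where $P$ is undefined (no active interval, or a single server); both choices, like the function name, are assumptions of this example.

```python
def parallel_io_intensity(delta_by_server):
    """Normalised parallel I/O intensity P from per-server 0/1 delta flags.

    delta_by_server maps a server identifier to its list of delta values,
    one entry per interval.
    """
    n_servers = len(delta_by_server)
    columns = list(zip(*delta_by_server.values()))      # one tuple per interval
    pi = [sum(column) / n_servers for column in columns]
    # Assumption: the denominator counts intervals with I/O on any server.
    active = sum(1 for column in columns if any(column))
    if active == 0 or n_servers < 2:
        return None
    big_pi = sum(pi) / active
    return (n_servers * big_pi - 1) / (n_servers - 1)


flags = {"s1": [1, 0, 1], "s2": [1, 0, 0], "s3": [1, 0, 0]}
print(parallel_io_intensity(flags))
```

In this toy case all three servers are busy in the first interval and only one in the third, so $\Pi = 2/3$ and $P = 0.5$.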

  16. Part IV: Selected Results

  17. Selected Results: I/O sub-system background

      - JUGENE (72 racks of BlueGene/P)
      - I/O sub-system uses GPFS
      - Performance counters logged on the 600 I/O nodes with $\Delta t = 120$ s for approximately 19 months
      - Analysed 0.17 million jobs that ran over 1 hour

      | Counter | Description | Observable |
      |---------|-------------|------------|
      | br | Bytes read | $\sum_l l\,D_r(l, s, t)$ |
      | bw | Bytes written | $\sum_l l\,D_w(l, s, t)$ |
      | rdc | Read requests | $\sum_l D_r(l, s, t)$ |
      | wc | Write requests | $\sum_l D_w(l, s, t)$ |

  18. Selected Results: Aggregate I/O & maximum average bandwidth

      [Figure: two multi-panel plots, one for reads and one for writes, each with panels (a), (b) and (c). Axes: Bytes Read / Bytes Written, Max Read Bandwidth [Bytes/s] / Max Written Bandwidth [Bytes/s], Number of Jobs, Cumulative.]

      - Max read 109.5 TiByte; max write 22.3 TiByte
      - 80% of jobs read 12.7 GiByte or less; 80% wrote 15.3 GiByte or less
      - 20% of jobs read 97.6% of the total volume; 20% wrote 97.7% of the total volume
      - 80% of jobs read below 84 MiByte/s; 80% wrote below 19 MiByte/s

  19. Selected Results: I/O intensity, burstiness & parallel I/O intensity

      80% of the analysed jobs are at or below these values.

      | Threshold $c$ (read) | 0 Byte | 128 KiByte | 1 MiByte |
      |----------------------|--------|------------|----------|
      | I/O intensity ($I$) | 0.28 | 0.15 | 0.05 |
      | Burstiness ($\rho$) | 0.99 | 0.99 | 1.0 |
      | Parallel I/O intensity ($P$) | 0.91 | 0.88 | 0.84 |

      | Threshold $c$ (write) | 0 Byte | 128 KiByte | 1 MiByte |
      |-----------------------|--------|------------|----------|
      | I/O intensity ($I$) | 1.0 | 0.34 | 0.12 |
      | Burstiness ($\rho$) | 0.0 | 1.0 | 1.0 |
      | Parallel I/O intensity ($P$) | 1.0 | 0.28 | 0.27 |

  20. Part V: Summary
