SFIO progress on Swiss-Tx
SCS meeting on Frangipani: a scalable distributed file system to Linux
December 1, 2000
Emin Gabrielyan
EPFL, Computer Science Dept., Peripheral Systems Lab.
Emin.Gabrielyan@epfl.ch

• SFIO library architecture
• SFIO on top of MPICH and on top of MPIFCI, performance on T1
• Performance of SFIO on top of MPIFCI on T1: very large files, no cache effect
• Swiss-T1's topology and its possible influence on SFIO performance
• Conclusion
• Future work
[Figure: SFIO library architecture]
• Nodes shown: Compute Node and I/O Node, connected through MPI
• Functions shown: mread, mreadc, mreadb, mwrite, mwritec, mwriteb, mrw, sfp_read, sfp_readc, sfp_readb, sfp_write, sfp_writec, sfp_writeb, sfp_rdwrc, sfp_rflush, sfp_wflush, sfp_waitall, sortcache, flushcache, mkbset, bkmerge
• Diagram labels: cyclic distribution, requests caching
• Commands shown: SFP_CMD_READ, SFP_CMD_WRITE, SFP_CMD_BREAD, SFP_CMD_BWRITE
[Figure: four compute nodes and four I/O nodes (hosts tonep0 to tonep3) connected through the network]
• SFIO all-to-all concurrent write access from all compute nodes to all I/O nodes
• Global file size is 2000 MByte for both MPICH and MPIFCI
• Stripe unit size is only 200 bytes
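To make the effect of such a small stripe unit concrete, here is a minimal sketch of cyclic (round-robin) striping. It assumes stripe units are dealt to the I/O nodes in strict rotation; the function names `locate` and `split_request` are hypothetical and are not part of the SFIO interface.

```python
def locate(offset, stripe_unit, n_io_nodes):
    """Map a global byte offset to (I/O node index, offset in that node's subfile).

    Assumes stripe units are assigned to I/O nodes in strict round-robin order.
    """
    unit_index = offset // stripe_unit
    node = unit_index % n_io_nodes
    local_offset = (unit_index // n_io_nodes) * stripe_unit + offset % stripe_unit
    return node, local_offset


def split_request(offset, length, stripe_unit, n_io_nodes):
    """Split one contiguous global request into (node, local_offset, size) pieces."""
    pieces = []
    end = offset + length
    while offset < end:
        node, local = locate(offset, stripe_unit, n_io_nodes)
        # A piece never crosses a stripe unit boundary.
        size = min(stripe_unit - offset % stripe_unit, end - offset)
        pieces.append((node, local, size))
        offset += size
    return pieces
```

With a 200-byte stripe unit and four I/O nodes, `split_request(150, 300, 200, 4)` touches three different I/O nodes for a 300-byte request, which is why even modest requests turn into all-to-all traffic.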
SFIO all-to-all I/O performance on Swiss-T1's Fast Ethernet and Tnet
[Figure, two panels. SFIO on top of MPICH: performance (0 to 70 MB/s) against the number of compute and I/O nodes (axes run from 1 to 31). SFIO on top of MPI-FCI: performance (0 to 800 MB/s) against the number of compute and I/O nodes (axes run from 1 to 31).]
• The superlinear speedup of SFIO/FCI comes from the growing cache effect as the number of I/O nodes increases.
SFIO all-to-all performance on T1 (1 GB to 31 GB file size, 200-byte chunk, 53 measurements)
[Figure: throughput (0 to 400 MB/s) against the number of I/O nodes (1 to 31); series: write maximum, write average, read maximum, read average.]
• To avoid the cache effect, the total size of the SFIO files increases as the number of I/O nodes grows.
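The scaling rule can be sketched as simple arithmetic. The sketch assumes a fixed 1 GB of data per I/O node, which is consistent with the 1 GB to 31 GB range above but is not stated on the slide, and it assumes decimal gigabytes; the helper names are hypothetical.

```python
GB = 10**9  # assumption: decimal gigabytes


def benchmark_file_size(n_io_nodes, bytes_per_node=GB):
    """Total file size when every I/O node holds the same share (assumed rule)."""
    return n_io_nodes * bytes_per_node


def stripe_units_per_node(n_io_nodes, stripe_unit=200, bytes_per_node=GB):
    """Number of stripe units each I/O node stores for the scaled file."""
    total = benchmark_file_size(n_io_nodes, bytes_per_node)
    return total // stripe_unit // n_io_nodes
```

Under these assumptions the per-node share stays constant while the node count grows, so adding I/O nodes does not shrink the working set that each node's cache has to absorb.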
[Figure: Swiss-T1 TNet interconnection and routing topology. Processors 00 to 63 attached to switches 1 to 8. Legend: processor, TNet 12-port full crossbar switch, TNet connection, logical routing.]
[Figure: Swiss-T1 SFIO over TNet topology. Processors 00 to 63 attached to switches 1 to 8, each processor marked as an I/O node or a compute node. Legend: I/O node, compute node, TNet 12-port full crossbar switch, TNet connection, logical routing.]
[Figure: connection loads. Four panels labelled 36 Pr., 38 Pr., 40 Pr. and 42 Pr., each showing I/O nodes and compute nodes attached to Tnet switches 0 to 7, with numeric load labels.]
Theoretical throughput
• Theoretical throughput of the Swiss-T1 network as a percentage of the ideal throughput of a full crossbar switch.
• 1,008,000 Monte Carlo events in a parallel simulation.
• The min curve represents the worst topology and the max curve the best.
[Figure: throughput in percent (0 to 70) against the number of nodes (1 to 29); series: max, aver, CODINE, min.]
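The idea behind such a simulation can be illustrated with a toy model. This is not the Swiss-T1 topology or the simulator used for the results above: it assumes two switches joined by a single link, unit capacity on every link, equal-rate all-to-all flows limited by the most loaded link, and random placement of the active nodes. All names are hypothetical.

```python
import random


def relative_throughput(a, b):
    """All-to-all throughput as a fraction of an ideal full crossbar.

    a, b: number of active nodes on each of the two switches (toy topology).
    """
    n = a + b
    if n < 2:
        return 1.0
    node_link_flows = n - 1     # flows leaving each node in all-to-all traffic
    inter_switch_flows = a * b  # flows crossing the single link, per direction
    return node_link_flows / max(node_link_flows, inter_switch_flows)


def simulate(n_active, ports_per_switch=8, events=10_000, seed=0):
    """Return (min, average, max) relative throughput over random placements."""
    rng = random.Random(seed)
    # One entry per processor slot; the value is the switch it hangs off.
    slots = [0] * ports_per_switch + [1] * ports_per_switch
    results = []
    for _ in range(events):
        chosen = rng.sample(slots, n_active)
        b = sum(chosen)
        results.append(relative_throughput(n_active - b, b))
    return min(results), sum(results) / len(results), max(results)
```

In this toy model the minimum corresponds to the placement that loads the shared link most heavily and the maximum to the placement that loads it least, mirroring the min and max curves.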
Conclusion
• SFIO is portable, highly scalable, and ready for distribution.
Future work
• SFIO performance benchmarking on the large supercomputer at Sandia National Laboratories.
• Performance measurements of MPI-I/O interfaced to SFIO through MPICH/ADIO.
• Possibly, creation of a portable MPI-I/O interface library for SFIO.