Implementation, evaluation and analysis of Block index for ADIOS
Tzuhsien Wu, Jerry Chou
National Tsing Hua University, Taiwan
Norbert Podhorszki, Yuan Tian
Oak Ridge National Laboratory, USA
Junmin Gu, Kesheng Wu
Lawrence Berkeley National Laboratory, USA
NTHU LSA Lab
Introduction
Scientific datasets are commonly stored and managed by parallel file systems and I/O libraries
◦ E.g. Lustre, HDF5, NetCDF, ADIOS
◦ Optimized for reading/writing large chunks of data
◦ Data layout and file organization impact query performance
The design of indexing methods should take the characteristics and behavior of the I/O system into account
The idea of “Block index”
Index blocks (runs of consecutive data records) instead of individual data records
◦ Reduces index size
◦ Reduces the number of I/O requests
◦ Reading an individual record has I/O latency similar to reading a whole data block
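The block index idea can be sketched in a few lines. This is a minimal illustration, assuming a one-dimensional list of records and a closed query range; the function names `build_block_index` and `candidate_blocks` are hypothetical and are not part of ADIOS.

```python
from typing import List, Tuple

def build_block_index(values: List[float], block_size: int) -> List[Tuple[float, float]]:
    """Return one (min, max) pair per block of `block_size` consecutive records."""
    index = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        index.append((min(block), max(block)))
    return index

def candidate_blocks(index: List[Tuple[float, float]], lo: float, hi: float) -> List[int]:
    """Return ids of blocks whose [min, max] range overlaps the query range [lo, hi]."""
    return [i for i, (bmin, bmax) in enumerate(index) if bmax >= lo and bmin <= hi]

# Toy data: 12 records indexed as 3 blocks of 4 records each.
values = [1.0, 2.0, 3.0, 4.0, 10.0, 11.0, 12.0, 13.0, 5.0, 6.0, 7.0, 8.0]
index = build_block_index(values, block_size=4)

# Only blocks that may contain values in [6.5, 10.5] need to be read.
hits = candidate_blocks(index, lo=6.5, hi=10.5)
```

The index holds one entry per block rather than per record, and a query issues at most one read per candidate block. Candidate blocks can still contain non-matching records, so the records read must be filtered afterwards.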
Implementing block index in ADIOS
Minmax method in ADIOS
◦ Records the min and max values of each writeblock
◦ The size of a writeblock is the size of the data written by one process (can be extremely large)
Block index method in ADIOS
◦ Logically divides a writeblock into smaller partitions
◦ Records the min and max values of each partition
◦ Because the partitions are only logical, the number of writeblocks stays the same
◦ I/O requests on the same writeblock can be merged by ADIOS to minimize I/O contention
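To illustrate why requests on the same writeblock are easy to merge, the sketch below coalesces adjacent logical partitions into contiguous reads. It assumes fixed-length partitions laid out back to back inside one writeblock and distinct partition ids; `merge_adjacent` is a hypothetical helper and does not reflect how ADIOS merges requests internally.

```python
from typing import List, Tuple

def merge_adjacent(partition_ids: List[int], partition_len: int) -> List[Tuple[int, int]]:
    """Merge runs of consecutive partition ids into (start_offset, length) reads.

    Offsets and lengths are counted in records within a single writeblock.
    """
    requests: List[Tuple[int, int]] = []
    for pid in sorted(partition_ids):
        start = pid * partition_len
        if requests and requests[-1][0] + requests[-1][1] == start:
            # This partition begins where the previous read ends: extend that read.
            prev_start, prev_len = requests[-1]
            requests[-1] = (prev_start, prev_len + partition_len)
        else:
            requests.append((start, partition_len))
    return requests

# Six candidate partitions collapse into three contiguous reads.
requests = merge_adjacent([0, 1, 2, 5, 7, 8], partition_len=1000)
```

Here partitions 0 to 2 and 7 to 8 each become a single larger read, which reduces the number of I/O requests without reading any partition the index ruled out.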
Experiment Setup
Edison Cray XC30 at NERSC
◦ 5576 compute nodes, with 12-core Intel Ivy Bridge 2.4 GHz CPU and 64 GB memory per node
◦ Lustre parallel file system with 72 GB/s peak performance
S3D dataset
◦ Each variable contains 1100*1080*1408 double-precision records
◦ Each variable is written to file using 64 writeblocks of size 275*270*352 (~200 MB)
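The writeblock count and size quoted for the S3D dataset can be checked with a short calculation. It assumes 8-byte double-precision values; the helper `num_records` is only for illustration.

```python
BYTES_PER_DOUBLE = 8  # assumption: IEEE 754 double precision

variable_shape = (1100, 1080, 1408)   # records per variable
writeblock_shape = (275, 270, 352)    # records per writeblock

def num_records(shape):
    """Multiply the extents of a shape to get its record count."""
    n = 1
    for extent in shape:
        n *= extent
    return n

# Each dimension is split into 4 pieces, giving 4 * 4 * 4 writeblocks.
blocks_per_dim = [v // w for v, w in zip(variable_shape, writeblock_shape)]
num_writeblocks = num_records(blocks_per_dim)

writeblock_bytes = num_records(writeblock_shape) * BYTES_PER_DOUBLE
variable_bytes = num_records(variable_shape) * BYTES_PER_DOUBLE
```

This gives 64 writeblocks of 209,088,000 bytes each (about 200 MB) and roughly 13.4 GB per variable, so a minmax index has only 64 entries per variable to prune with.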
Performance evaluation
Varied partition size
◦ Performance is a tradeoff between read size and I/O throughput
◦ Minmax reads more than twice as many bytes as the block index
Performance evaluation
Varied query selectivity
◦ Block index reads less data when query selectivity is smaller, so its speedup is higher
◦ Performance is similar at 100% query selectivity
Conclusion
Query performance of minmax is limited by the size of the writeblock
Query performance of block index, which logically partitions a writeblock, improves because less data is read and the read size is more flexible
Future work
◦ Performance analysis and modeling of I/O systems
◦ Design an algorithm to select the proper block size and request-merging condition
Thank you