HPC Filesystems Today – What’s Working and Opportunities to Improve
May 15, 2017
Ned Bass
Dagstuhl Seminar 17202
This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC
Current Parallel File System Summary (OCF)

OCF File Systems        Bandwidth   Capacity   OSTs   MDTs
lscratchrzb (Stout)     18 GB/s     1.2 PB     16     1
lscratchf (Cider)       36 GB/s     2.4 PB     32     1
lscratchd (Pilsner)     90 GB/s     5.7 PB     80     1
lscratche (Porter)      90 GB/s     5.7 PB     80     1
lscratchv (Vesta)       106 GB/s    6.7 PB     96     1
lscratchh (Zinc)        60 GB/s     18 PB      36     16
lscratchrza (Brass)     30 GB/s     9 PB       18     4

Lawrence Livermore National Laboratory
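The OCF figures above can be turned into per-OST numbers, which makes the file systems easier to compare. The following is a minimal sketch, not part of the original slides: the `FileSystem` class and the helper names are hypothetical, the values are copied from the table, and it assumes decimal units (1 PB = 1000 TB).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileSystem:
    name: str
    bandwidth_gb_s: float  # aggregate bandwidth from the table, GB/s
    capacity_pb: float     # capacity from the table, PB
    osts: int              # number of object storage targets
    mdts: int              # number of metadata targets


# Values transcribed from the OCF summary table.
OCF = [
    FileSystem("lscratchrzb", 18.0, 1.2, 16, 1),
    FileSystem("lscratchf", 36.0, 2.4, 32, 1),
    FileSystem("lscratchd", 90.0, 5.7, 80, 1),
    FileSystem("lscratche", 90.0, 5.7, 80, 1),
    FileSystem("lscratchv", 106.0, 6.7, 96, 1),
    FileSystem("lscratchh", 60.0, 18.0, 36, 16),
    FileSystem("lscratchrza", 30.0, 9.0, 18, 4),
]


def per_ost_bandwidth_gb_s(fs: FileSystem) -> float:
    """Aggregate bandwidth divided evenly across OSTs (an idealization)."""
    return fs.bandwidth_gb_s / fs.osts


def per_ost_capacity_tb(fs: FileSystem) -> float:
    """Capacity per OST in TB, assuming 1 PB = 1000 TB."""
    return fs.capacity_pb * 1000.0 / fs.osts


def summarize(systems: list[FileSystem]) -> list[str]:
    """Build one report line per file system, plus a totals line."""
    lines = [
        f"{fs.name:<12} {per_ost_bandwidth_gb_s(fs):6.3f} GB/s per OST "
        f"{per_ost_capacity_tb(fs):7.1f} TB per OST"
        for fs in systems
    ]
    total_bw = sum(fs.bandwidth_gb_s for fs in systems)
    total_cap = sum(fs.capacity_pb for fs in systems)
    lines.append(f"total: {total_bw:.0f} GB/s, {total_cap:.1f} PB")
    return lines


for line in summarize(OCF):
    print(line)
```

Dividing evenly across OSTs ignores striping and network limits, so the per-OST values are only a rough basis for comparison between file systems.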
Current Parallel File System Summary (SCF)

SCF File Systems        Bandwidth   Capacity   OSS    OSTs
lscratch1 (Grove)       850 GB/s    53 PB      768    768
lscratch7 (Lambic)      90 GB/s     5.7 PB     80     80
lscratch3 (Marzen)      90 GB/s     5.7 PB     80     80
lscratch6 (Bock)        90 GB/s     5.7 PB     80     80

* Multiple MDS nodes will be utilized in the future when LC stability requirements are met.
What’s Working Well
▪ Open Source Development
▪ Scalability for Current-Generation Systems
▪ Data Integrity
▪ Stability
▪ Well Understood Programming Model
▪ Well-formed I/O Performs Well
HPC Filesystem Challenges
▪ Storage Hierarchy not Transparent to Users
▪ Inflexible Semantics – System Decides Consistency Model
▪ Heavy Burden on Users to Manage Data
▪ Technical Debt
▪ High Total Cost of Ownership
▪ Visibility and Debugging for Devs and Admins
▪ Metadata Performance
▪ Disk/JBOD Management
Heavy Data Management Burden
▪ Users must know where their data lives
▪ Users must know where it should live
▪ Users must know its provenance
▪ Opportunity: efficient, intuitive interfaces that are integrated across the storage hierarchy