Parallel File Systems
John White, Lawrence Berkeley National Lab
Topics
• Defining a File System
• Our Specific Case for File Systems
• Parallel File Systems
• A Survey of Current Parallel File Systems
• Implementation
What is a File System?
• Simply, a method for ensuring
  ▪ A Unified Access Method to Data
  ▪ Organization (in a technical sense…)
  ▪ Data Integrity
  ▪ Efficient Use of Hardware
The HPC Application (our application)
• Large Node Count
• High-IO Code (many small file operations)
• High-Throughput Code (large files, fast)
• You Can Never Provide Too Much Capacity
What’s the Problem With Tradition?
• NFS/CIFS/AFP/NAS is slow
  ▪ Single point of contact for both data and metadata
  ▪ Protocol overhead
  ▪ File-based locking
  ▪ We want parallelism from the application all the way to disk
• We Need a Single Namespace
• We Need Truly Massive Aggregate Throughput (stop thinking in MB/s)
• Bottlenecks Are Inherent to the Architecture
• Most Importantly:
• Researchers Just Don’t Care
  ▪ They want their data available everywhere
  ▪ They hate transferring data (this bears repeating)
  ▪ Their code wants the data several cycles ago
  ▪ If they have to learn new IO APIs, they commonly won’t use them, period
  ▪ An increasing number aren’t aware their code is inefficient
Performance in Aggregate: A Specific Case
• File system capable of 5 GB/s
• Researcher running an analysis of past stock ticker data
  ▪ 10 independent processes per node, 10+ nodes, sometimes 1000+ processes
  ▪ Was running into “performance issues”
• In reality, the code was already hitting 90% of the file system’s peak performance
  ▪ 100s of processes choking each other
  ▪ Efficiency is key
Parallel File Systems
• A File System That Provides
  ▪ Access to Massive Amounts of Data at Large Client Counts
  ▪ Simultaneous Client Access at Sub-File Levels
  ▪ Striping at Sub-File Levels
  ▪ Massive Scalability
  ▪ A Method to Aggregate Large Numbers of Disks
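To make “simultaneous client access at sub-file levels” concrete, here is a minimal sketch, not tied to any particular file system: several writers update disjoint byte ranges of one shared file at the same time. On a parallel file system those writers would be processes on different nodes; here Python’s multiprocessing stands in for them, and the file name, region size, and writer count are illustrative choices.

```python
# Illustrative sketch: many writers update disjoint, fixed-size regions of one
# shared file concurrently. No locking is needed because the regions never overlap.
import multiprocessing as mp
import os

SHARED_FILE = "shared.dat"   # hypothetical path on the shared file system
REGION_SIZE = 1 << 20        # 1 MiB per writer

def write_region(rank: int) -> None:
    """Each writer touches only its own byte range of the shared file."""
    payload = bytes([rank % 256]) * REGION_SIZE
    fd = os.open(SHARED_FILE, os.O_WRONLY)
    try:
        os.pwrite(fd, payload, rank * REGION_SIZE)   # offset = rank * region
    finally:
        os.close(fd)

if __name__ == "__main__":
    n_writers = 8
    # Pre-size the file so every region already exists before the writers start.
    with open(SHARED_FILE, "wb") as f:
        f.truncate(n_writers * REGION_SIZE)
    with mp.Pool(n_writers) as pool:
        pool.map(write_region, range(n_writers))
```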
Popular Parallel File Systems
• Lustre
  ▪ Purchased by Intel
  ▪ Support offerings from Intel, Whamcloud, and numerous vendors
  ▪ Object based
  ▪ Growing feature list
    ∼ Information Lifecycle Management
    ∼ “Wide Area” mounting support
    ∼ Data replication and metadata clustering planned
  ▪ Open source
    ∼ Large and growing install base, vibrant community
    ∼ “Open” compatibility
Popular Parallel File Systems
• GPFS
  ▪ IBM, born around 1993 as the Tiger Shark multimedia file system
  ▪ Support direct from the vendor
  ▪ AIX, Linux, some Windows
  ▪ Ethernet and InfiniBand support
  ▪ Wide area support
  ▪ ILM
  ▪ Distributed metadata and locking
  ▪ Mature storage pool support
  ▪ Replication
Licensing Landscape
• GPFS (A Story of a Huge Feature Set at a Huge Cost)
  ▪ Binary
  ▪ IBM licensing
    ∼ Per Core
    ∼ Site-Wide
• Lustre
  ▪ Open
  ▪ Paid licensing available, tied to support offerings
Striping Files
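Striping spreads the blocks of a single file across many storage targets so that one file can be read and written by many servers and disks in parallel. As a hedged illustration of how an application can request a layout, the sketch below passes ROMIO’s Lustre striping hints through MPI-IO when the file is created (administrators often do the equivalent with lfs setstripe on a directory); whether the hints are honored depends on the MPI library and the file system, and the file name and values are assumptions.

```python
# Sketch: requesting a stripe layout at file-creation time through MPI-IO hints
# (ROMIO's "striping_factor" / "striping_unit" hints on Lustre). Treat this as
# an illustration rather than a guarantee that the layout will be applied.
from mpi4py import MPI

comm = MPI.COMM_WORLD

info = MPI.Info.Create()
info.Set("striping_factor", "8")        # ask for a stripe count of 8 targets
info.Set("striping_unit", "1048576")    # ask for a 1 MiB stripe size

fh = MPI.File.Open(comm,
                   "striped_output.dat",              # hypothetical path
                   MPI.MODE_CREATE | MPI.MODE_WRONLY,
                   info)
fh.Close()
info.Free()
```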
SAN – All nodes have access to the storage fabric and all LUNs
Direct Connect – A separate storage cluster hosts the file system and exports it over a common fabric
Berkeley Research Computing
• Current Savio Scratch File System
  ▪ Lustre 2.5
  ▪ 210 TB of DDN 9900
    ∼ ~10 GB/s ideal throughput
  ▪ Accessible on all nodes
• Future
  ▪ Lustre 2.5 or GPFS 4.1
  ▪ ~1 PB+ capacity
  ▪ ~20 GB/s throughput
  ▪ Vendor yet to be determined
Berkeley Research Computing
• Access Methods
  ▪ Available on every node
    ∼ POSIX
    ∼ MPI-IO
  ▪ Data Transfer
    ∼ Globus Online
      • Ideal for large transfers
      • Restartable
      • Tuned for large networks and long distances
      • Easy-to-use online graphical interface
    ∼ SCP/SFTP
      • Well known
      • Suitable for quick-and-dirty transfers
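As a small sketch of the MPI-IO access path listed above, each MPI rank below writes its own disjoint block of a single shared file with a collective call. The file name, block size, and launch command are illustrative (e.g. mpirun -n 4 python mpiio_write.py), and mpi4py plus NumPy are assumed to be available.

```python
# Minimal MPI-IO sketch: every rank writes one disjoint block of a shared file
# using a collective write, so the I/O library can coordinate the requests.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

BLOCK = 1 << 20                       # 1 MiB of data per rank
data = np.full(BLOCK, rank, dtype=np.uint8)

fh = MPI.File.Open(comm, "results.dat",
                   MPI.MODE_CREATE | MPI.MODE_WRONLY)
fh.Write_at_all(rank * BLOCK, data)   # collective write at a rank-specific offset
fh.Close()
```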
Current Technological Landscape
• Tiered Storage (Storage Pools)
  ▪ When you have multiple storage needs within a single namespace
    ∼ SSD/FC for jobs and metadata (Tier 0)
    ∼ SATA for capacity (Tier 1)
    ∼ Tape for long-term/archival storage (Tier 2)
• ILM
  ▪ Basically, perform actions on data according to a rule set
    ∼ Migration to tape
    ∼ Fast Tier 0 storage use case
    ∼ Purge policies
• Replication
  ▪ Dangers of metadata operations
  ▪ Long-term storage
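To show what “perform actions on data per a rule set” means in practice, here is a conceptual sketch of a single purge policy: report (or remove) scratch files that have not been accessed in 90 days. Production ILM engines, such as GPFS policy rules or the Robinhood policy engine commonly used with Lustre, evaluate rules like this against file system metadata at far larger scale; the scratch path and age threshold below are assumptions.

```python
# Conceptual purge-policy sketch: "remove scratch files not accessed in 90 days."
import os
import time

SCRATCH_ROOT = "/global/scratch"      # hypothetical scratch mount point
MAX_AGE_DAYS = 90
DRY_RUN = True                        # report only; flip to actually purge

cutoff = time.time() - MAX_AGE_DAYS * 86400

for dirpath, _dirnames, filenames in os.walk(SCRATCH_ROOT):
    for name in filenames:
        path = os.path.join(dirpath, name)
        try:
            if os.stat(path).st_atime < cutoff:   # last access before the cutoff
                if DRY_RUN:
                    print("would purge:", path)
                else:
                    os.remove(path)
        except OSError:
            pass  # file vanished or is unreadable; skip it
```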
Further Information
• Berkeley Research Computing: http://research-it.berkeley.edu/brc
• HPCS at LBNL: http://scs.lbl.gov/
• Email: jwhite@lbl.gov