Structuring PLFS for Extensibility
Chuck Cranor, Milo Polte, Garth Gibson
Parallel Data Laboratory, Carnegie Mellon University
What is PLFS?
• Parallel Log Structured File System
  – Interposed filesystem between applications & backing storage
  – Los Alamos National Laboratory, CMU, EMC, …
  – Target: HPC checkpoint files
• PLFS transparently transforms a highly concurrent write access pattern into a pattern more efficient for distributed filesystems
  – First paper: Bent et al., Supercomputing 2009 (SC09)
  – http://github.com/plfs, http://institute.lanl.gov/plfs/
Checkpoint Write Patterns
• The two main checkpoint write patterns:
  – N-1: all N processes write to one shared file
    • Concurrent I/O to a single file is often unscalable
    • Small, unaligned, clustered traffic is problematic
  – N-N: each process writes to its own file
    • Overhead of inserting many files in a single directory
    • Easier for the DFS (after the files are created)
    • Archival and management are more difficult
• Initial PLFS focus: improve the N-1 case
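To make the N-1 pattern concrete, here is a minimal sketch of a strided shared-file checkpoint written through MPI-IO. The file name, stride count, and the 47001-byte block size (borrowed from the benchmark slides later in this deck) are illustrative assumptions, not part of the original slides.

```cpp
// Minimal sketch of an N-1 strided checkpoint: all N processes write into
// one shared file through MPI-IO. File name, stride count, and block size
// are illustrative only.
#include <mpi.h>
#include <vector>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int blocksize = 47001;   // small, unaligned block (bytes)
    const int nstrides = 4;        // blocks written per process
    std::vector<char> block(blocksize, static_cast<char>(rank));

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "checkpoint.chk",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    // Rank r's s-th block lands at offset (s*nprocs + r)*blocksize, so every
    // process's data is interleaved within each stride -- the small,
    // unaligned, clustered traffic described above.
    for (int s = 0; s < nstrides; s++) {
        MPI_Offset off =
            (static_cast<MPI_Offset>(s) * nprocs + rank) * blocksize;
        MPI_File_write_at(fh, off, block.data(), blocksize, MPI_CHAR,
                          MPI_STATUS_IGNORE);
    }

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}
```

The N-N alternative is simply each rank opening its own file and writing sequentially, which is easy on the data path but creates the metadata and management problems listed above.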
PLFS Transforms Workloads
• PLFS improves N-1 performance by transforming it into an N-N workload
• FUSE/MPI: transparent solution, no application changes required
PLFS Converts N-1 to N-N
[Figure: processes on host1, host2, and host3 write to a single logical file /foo through the PLFS virtual layer; on the physical underlying parallel file system, /foo is a container with per-host subdirectories (hostdir.1/, hostdir.2/, hostdir.3/), each holding per-process data logs (data.131, data.132, data.148, data.152, data.279, data.281) and matching index logs (indx.131, …, indx.281).]
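The transformation in the figure can be summarized with a small sketch: each writing process appends everything it writes, regardless of logical offset, to its own data log under its host's hostdir, and records an index entry saying where those bytes belong in the logical file. The class and field names below are invented for exposition and are not PLFS's actual code.

```cpp
// Illustrative sketch of PLFS-style log-structured writes into a container.
// Class, field, and file-naming details are invented, not PLFS's real code.
#include <cstdint>
#include <cstdio>
#include <ctime>
#include <string>

struct IndexEntry {        // one record per application write
    uint64_t logical_off;  // offset in the logical shared file (e.g. /foo)
    uint64_t length;       // number of bytes written
    uint64_t physical_off; // offset inside this writer's data log
    uint64_t timestamp;    // one way to resolve overlapping writes on read
};

class ContainerWriter {
public:
    ContainerWriter(const std::string &hostdir, int pid)
        : physical_off_(0),
          data_(std::fopen((hostdir + "/data." + std::to_string(pid)).c_str(), "ab")),
          index_(std::fopen((hostdir + "/indx." + std::to_string(pid)).c_str(), "ab")) {}

    // Every logical write, regardless of its logical offset, is appended
    // sequentially to this process's data log; the index log records where
    // the bytes belong so reads can reassemble the logical file later.
    void write(const void *buf, uint64_t len, uint64_t logical_off) {
        IndexEntry e{logical_off, len, physical_off_,
                     static_cast<uint64_t>(std::time(nullptr))};
        std::fwrite(buf, 1, len, data_);
        std::fwrite(&e, sizeof(e), 1, index_);
        physical_off_ += len;
    }

    ~ContainerWriter() {
        if (data_)  std::fclose(data_);
        if (index_) std::fclose(index_);
    }

private:
    uint64_t physical_off_;
    std::FILE *data_;
    std::FILE *index_;
};
```

On read, PLFS merges all processes' index logs to reassemble the logical file, which is why every writer's index must be located and parsed before serving a read.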
PLFS N-1 Bandwidth Speedups
[Figure: N-1 write bandwidth speedups of roughly 10X to 100X when writing through PLFS versus writing the same pattern directly to the underlying file system.]
The Price of Success
• Original PLFS was limited to one workload:
  – N-1 checkpoint on a mounted POSIX filesystem
  – All data stored in PLFS container logs
• Ported first to MPI-IO/ROMIO
  – Needed to deploy feasibly on leadership-class machines
• Success with LANL apps: actual adoption?
  – Requires maintainability & roadmap evolution
  – Develop a team: LANL, EMC, CMU, …
• Revisit the code with maintainability in mind
PLFS Extensibility Architecture
[Diagram: the HPC application calls libplfs through the PLFS high-level API; beneath it are three plug-in interfaces:
  – Logical FS interface: flat file, small file, container
  – Index API: byte-range, pattern, distributed (MDHIM w/LevelDB)
  – I/O Store interface: posix, pvfs, iofsl, hdfs (libhdfs/jvm, hdfs.jar)]
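As a rough illustration of the topmost plug-in point, the sketch below shows what a Logical FS interface of this shape might look like; the class names and method signatures are assumptions made for exposition, not PLFS's real interfaces.

```cpp
// Illustrative-only sketch of a "Logical FS" plug-in point: each
// implementation decides how a logical PLFS file is represented on the
// backend. Names and signatures are invented, not PLFS's actual API.
#include <sys/types.h>
#include <cstddef>
#include <string>

class LogicalFS {
public:
    virtual ~LogicalFS() {}
    virtual int create(const std::string &logical_path, mode_t mode) = 0;
    virtual ssize_t write(const std::string &logical_path, const void *buf,
                          size_t len, off_t logical_off) = 0;
    virtual ssize_t read(const std::string &logical_path, void *buf,
                         size_t len, off_t logical_off) = 0;
};

// container: N-1 checkpoints stored as per-process data logs plus indices
class ContainerFS : public LogicalFS { /* ... */ };
// flat file: one backend file per logical file (simple passthrough)
class FlatFileFS  : public LogicalFS { /* ... */ };
// small file: pack many tiny logical files into shared logs
class SmallFileFS : public LogicalFS { /* ... */ };
```

The Index API and the I/O Store interface shown on the slide sit below this layer; the I/O Store is sketched a couple of slides further on.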
Case Study: HPC in the Cloud
• Emergence of Hadoop: converged storage
• HDFS: Hadoop Distributed Filesystem
  – Key attributes:
    • Single sequential writer (not POSIX, no pwrite)
    • Not VFS mounted, access through Java API
    • Local storage on nodes (converged)
    • Data replicated ~3 times (local + remote1 + remote2)
• HPC in the Cloud: N-1 checkpoint on HDFS?
  – Observation: PLFS log I/O fits HDFS semantics
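PLFS's write path already only appends to per-process logs, which is exactly what HDFS's single-sequential-writer model allows. Below is a rough sketch of what a data-log write looks like against the libhdfs C API (the JNI bindings that drive the Java client); the namenode host, port, and container path are placeholders.

```cpp
// Sketch of a PLFS-style data-log write through the libhdfs C API:
// the file is opened write-only and only ever appended to, matching
// HDFS's single-sequential-writer rule. Host, port, path are placeholders.
#include <hdfs.h>      // libhdfs C bindings (drive the Java client via JNI)
#include <fcntl.h>
#include <cstdio>
#include <cstring>

int main() {
    hdfsFS fs = hdfsConnect("namenode.example.org", 8020);
    if (fs == nullptr) { std::perror("hdfsConnect"); return 1; }

    // No pwrite(): each open file gets one sequential write stream,
    // which is exactly how a PLFS data log is written anyway.
    hdfsFile log = hdfsOpenFile(fs, "/plfs/foo/hostdir.1/data.132",
                                O_WRONLY, 0 /*bufsize*/, 0 /*default repl*/,
                                0 /*default block size*/);
    if (log == nullptr) { std::perror("hdfsOpenFile"); return 1; }

    const char chunk[] = "checkpoint bytes";
    hdfsWrite(fs, log, chunk, (tSize)std::strlen(chunk));   // append-only
    hdfsCloseFile(fs, log);
    hdfsDisconnect(fs);
    return 0;
}
```

Reads, by contrast, can be served at arbitrary offsets with hdfsPread(), which is enough for fetching index records and data ranges back out of the logs.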
PLFS Backend Limitations
• PLFS was hardwired to the POSIX API:
  – Needs a kernel-mounted filesystem
  – Uses integer file descriptors
  – Memory maps index files to read them
• HDFS does not fit these assumptions
• Solution: I/O Store
  – Insert a layer of indirection above the PLFS backend
  – Model it after the POSIX API to minimize code changes
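A minimal sketch of what such an I/O Store layer can look like, assuming invented class and method names (the real PLFS interface differs in detail): backend objects are reached through handle objects instead of integer file descriptors, and the operations mirror the POSIX calls the container code already issues.

```cpp
// Illustrative sketch of an I/O Store abstraction modeled on the POSIX
// calls PLFS already used; names and signatures are invented for
// exposition and are not PLFS's actual I/O Store interface.
#include <sys/types.h>
#include <cstddef>
#include <memory>
#include <string>

// Handle for one open backend object (replaces a raw integer fd).
class IOSHandle {
public:
    virtual ~IOSHandle() {}
    virtual ssize_t pread(void *buf, size_t len, off_t off) = 0;
    virtual ssize_t append(const void *buf, size_t len) = 0; // log-style write
    virtual int close() = 0;
};

// One backend store (POSIX directory tree, HDFS, PVFS, IOFSL, ...).
class IOStore {
public:
    virtual ~IOStore() {}
    virtual std::unique_ptr<IOSHandle> open(const std::string &path,
                                            int flags, mode_t mode) = 0;
    virtual int mkdir(const std::string &path, mode_t mode) = 0;
    virtual int unlink(const std::string &path) = 0;
    // readdir / stat / rename omitted; the real interface mirrors most of
    // the POSIX calls the container code already issued.
};
```

A POSIX store forwards these calls to libc against a mounted filesystem; an HDFS store maps them onto libhdfs, and the index-reading code switches from mmap() to ordinary reads into a heap buffer because HDFS files cannot be memory mapped.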
PLFS I/O Store Architecture
[Diagram: PLFS FUSE and PLFS MPI I/O both sit on libplfs's PLFS container code, which now calls through the I/O Store layer; a posix I/O store goes through the libc API to a mounted filesystem, while an HDFS I/O store goes through lib{hdfs,jvm} to hdfs.jar Java code.]
PLFS/HDFS Benchmark
• Testbed: PRObE (www.nmc-probe.org)
• Each node has dual 1.6GHz AMD cores, 16GB RAM, 1TB drive, gigabit ethernet
• Ubuntu Linux, HDFS 0.21.0, PLFS, OpenMPI
• Benchmark: LANL FS Test Suite (fs_test)
• Simulates a strided N-1 checkpoint
• Filesystems tested:
  – PVFS OrangeFS 2.8.4 w/64MB stripe size
  – PLFS/HDFS w/1 replica (local disk)
  – PLFS/HDFS w/3 replicas (local disk + remote1 + remote2)
• Block sizes: 47001 bytes, 48K, 1M
• Checkpoint size: 32GB written by 64 nodes
Benchmark Operation
[Diagram: in the write phase, nodes 0, 1, 2, 3 each write one block per stride in an interleaved N-1 pattern, continuing the pattern for the remaining strides; in the read phase the nodes are shifted (e.g. 2, 3, 0, 1) so each node reads back blocks that a different node wrote.]
• We unmount and flush the caches of the data filesystem between the write and read phases
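A small sketch of the read-phase mapping implied by the diagram: each node reads back blocks written by a different node, so reads cannot be satisfied from the reader's own cache. The shift amount and stride count here are assumptions for illustration.

```cpp
// Prints which node's data each reader touches in the shifted read phase.
// Shift amount and stride count are illustrative assumptions.
#include <cstdint>
#include <cstdio>

int main() {
    const int nodes = 4;            // 4 nodes shown in the diagram
    const int shift = 2;            // diagram shows 0,1,2,3 -> 2,3,0,1
    const int64_t blocksize = 47001;
    const int nstrides = 2;

    for (int reader = 0; reader < nodes; reader++) {
        int writer = (reader + shift) % nodes;  // whose blocks this node reads
        for (int s = 0; s < nstrides; s++) {
            int64_t off = (static_cast<int64_t>(s) * nodes + writer) * blocksize;
            std::printf("node %d reads stride %d at offset %lld (written by node %d)\n",
                        reader, s, static_cast<long long>(off), writer);
        }
    }
    return 0;
}
```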
PLFS Implementation Architecture
• PLFS is delivered as a FUSE filesystem and as a middleware library (MPI)
[Diagram: unmodified application processes issue I/O through the VFS/POSIX API; the kernel FUSE module upcalls into the user-space PLFS FUSE daemon. MPI application processes instead link the PLFS library directly into their MPI I/O stack. Both paths go through a backing-store I/O module to a local filesystem (to disk) or a distributed filesystem (over the interconnect to other nodes).]
PLFS/HDFS Write Bandwidth
• PLFS/HDFS performs well (note HDFS1 is local disk; HDFS3 keeps 3 copies)
[Figure: write bandwidth in MB/s (0 to ~2000) for PVFS, PLFS/HDFS with 1 replica, and PLFS/HDFS with 3 replicas, at access unit sizes of 47001 bytes, 48K, and 1M.]
PLFS/HDFS Read Bandwidth
• HDFS with small access sizes benefits from PLFS log grouping
• HDFS3 with large access sizes suffers from read imbalance
[Figure: read bandwidth in MB/s (0 to ~1000) for PVFS, PLFS/HDFS1, and PLFS/HDFS3 at access unit sizes of 47001 bytes, 48K, and 1M.]
HDFS 1 vs 3: I/O Scheduling
• Network counters show the HDFS3 read imbalance
[Figure: total size of data served (MB, 0 to ~1000) per node for nodes 1-64, PLFS/HDFS1 vs PLFS/HDFS3; the amount served per node is markedly uneven under HDFS3.]
I/O Store Status
• Rewrote the initial I/O Store prototype
  – Production-level code
  – Multiple concurrent instances of I/O Stores
• Re-plumbed the entire backend I/O path
• Prototyped POSIX, HDFS, PVFS stores
  – IOFSL store done by EMC
• Regression tested at LANL
• I/O Store is now part of the released PLFS code
  – https://github.com/PLFS
Conclusions
• PLFS extensions for workload transformation:
  – Logical FS interface
    • Not just container logs: packing small files, burst buffer
  – I/O Store layer
    • Non-POSIX backends (HDFS, IOFSL, PVFS)
    • Compression, write buffering, I/O forwarding
  – Container index extensions
• PLFS is open source, available on GitHub
  – http://github.com/plfs
  – Developer email: plfs-devel@lists.sourceforge.net