
DataMods: Programmable File System Services



  1. DataMods: Programmable File System Services • Noah Watkins*, Carlos Maltzahn, Scott Brandt (UC Santa Cruz; *Inktank) • Adam Manzanares (California State University, Chico)

  2. Talk Agenda • 1. Middleware and modern I/O stacks • 2. Services in middleware and parallel file systems • 3. Avoid duplicating work with DataMods • 4. Case study: Checkpoint/restart

  3. Why DataMods? • Applications struggle to scale on POSIX I/O • Parallel FS rarely provide other interfaces – POSIX I/O designed to prevent lock-in • Open-source PFS are now available – Ability to avoid lock-in • Can we generalize PFS services to provide new behavior to new users?

  4. Application Middleware • Complex data models and interfaces • Difficult to work directly with simple byte stream • Middleware maps the complex onto the simple

  5. Middleware Complexity Bloat • Hadoop and “Big Data” data models – Ordered key/value pairs stored in file – Dictionary for random key-oriented access – Common table abstractions

  6. Middleware Complexity Bloat • Scientific data – Multi-dimensional arrays – Imaging – Genomics

  7. Middleware Complexity Bloat • I/O middleware – Low-level data models and I/O optimization – Transformative I/O avoids POSIX limitations

  8. Middleware Scalability Challenges • Scalable storage system • Exposes one data model • Must find ‘magic’ alignment

  9. Data Model Modules • Plug in new “file” interfaces and behavior • Native support, atop existing scalable services • (Figure labels: New behavior; Generalized storage services; Pluggable customization (new programmer role).)

  10. What does middleware do? • Metadata Management • Data Placement • Intelligent Access • Asynchronous Services

  11. Middleware: Metadata Management • Byte stream layout • Data type information • Data model attributes • Example: Mesh Data Model – How is the mesh represented? – What does it represent? • (Figure labels: File; Header.)

  12. Middleware: Data Placement • Serialization • Placement index • Physical alignment – Including the metadata • Example: Mesh Data Model – Vertex lists – Mesh elements – Metadata • (Figure labels: Header; alternating Data and Meta regions.)

  13. Middleware: Intelligent Access • Data model specific interfaces • Rich access methods – Views, subsetting, filtering • Write-time optimizations • Locality and data movement • (Figure labels: Array-based Application; read(array-slice); HDF5 Library; file regions Header, Meta, Data.)

  14. Middleware: Asynchronous Services • Workflows – Regridding • Compression • Indexing • Layout optimization • Performed online • (Figure labels: Workflow Driver; HDF5 Library; file regions Header, Meta, Data.)

  15. Middleware Challenges • Inflexible byte stream abstraction • Scalability rules are simple – But middleware is complex • Applying ‘magic number’ – Unnatural and difficult to propagate • Loss of detail at lower levels – Difficult for in-transit / co-located compute

  16. Storage System Services • Scalable metadata – Clustered service – Scalability invariants • Distributed object store – Local compute resources – Define new behavior • File operations – POSIX • Fault-tolerance – Scrubbing and replication

  17. DataMods Abstraction • File Manifold (Metadata and Data Placement) • Typed and Active Storage • Asynchronous Services

  18. DataMods Architecture • Generalized file system services • Exposed through programming model

  19. File Manifold • Metadata management and data placement – Flexible, custom layouts • Extensible interfaces • Object namespace managed by manifold • Placement rules evaluated by system

  20. Typed and Active Storage • Active storage adoption has been slow – Code injection is scary – Security and QoS • Reading, writing, and checksums are not free • Exposing scalable services is tractable – Well-defined data models support optimization – Programming model supports data model creation – Indexing and filtering

  21. Asynchronous Services • Re-use of active / typed storage components • Temporal relationship to file manifold – Incremental processing – After file is closed – Object update trigger • Scheduling – Exploit idle time – Integrate with larger ecosystem – Preempted or aborted

  22. Case Study: PLFS Checkpoint/Restart • Long-running simulations need fault-tolerance – Checkpoint simulation state • Simulations run on expensive machines – Very expensive machines. Really, very expensive. • Decrease cost (time) of checkpoint/restart • Translation: increase bulk I/O bandwidth

  23. Overview of PLFS • Middleware layer – Transforms I/O pattern • I/O pattern: N-1 – Most common • I/O pattern: N-N – File system friendly • Convert N-1 into N-N • Applications see the same logical file
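
The N-1 to N-N conversion on this slide can be illustrated with a minimal in-memory sketch. It is not PLFS code: `WriterLog`, `IndexEntry`, and `read_logical` are hypothetical names, and the sketch assumes writes never overlap, so no timestamps are needed to resolve conflicts. Each writer appends to its own log and records where each logical extent landed; reads reassemble the single logical file from the per-writer indexes.

```python
from dataclasses import dataclass, field


@dataclass
class IndexEntry:
    logical_offset: int   # position in the shared logical file
    length: int           # number of bytes written
    physical_offset: int  # position inside this writer's own log


@dataclass
class WriterLog:
    """One per writer (hypothetical): an append-only log plus its index."""
    data: bytearray = field(default_factory=bytearray)
    index: list = field(default_factory=list)

    def write(self, logical_offset: int, payload: bytes) -> None:
        # Always append, whatever the logical offset is, so the
        # underlying storage only ever sees sequential writes.
        self.index.append(
            IndexEntry(logical_offset, len(payload), len(self.data)))
        self.data.extend(payload)


def read_logical(logs, offset: int, length: int) -> bytes:
    """Reassemble a byte range of the logical file from all logs."""
    out = bytearray(length)  # unwritten regions read back as zeros
    for log in logs:
        for e in log.index:
            lo = max(offset, e.logical_offset)
            hi = min(offset + length, e.logical_offset + e.length)
            if lo < hi:  # this entry overlaps the requested range
                src = e.physical_offset + (lo - e.logical_offset)
                out[lo - offset:hi - offset] = log.data[src:src + (hi - lo)]
    return bytes(out)


# Three writers issue strided 4-byte writes into one logical file (N-1).
logs = [WriterLog() for _ in range(3)]
for step in range(2):
    for rank, log in enumerate(logs):
        log.write((step * 3 + rank) * 4, bytes([65 + rank]) * 4)
```

Each writer's log ends up contiguous (`AAAAAAAA` for the first writer), while `read_logical(logs, 0, 24)` still presents the interleaved logical file. The cost is the index, which grows by one entry per write.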

  24. Simplified PLFS I/O Behavior • (Figure labels: Client 1, Client 2, Client 3; Parallel Log-structured File System; Index (×3); Log-structured (×3).)

  25. PLFS Scalability Challenges • Index maintenance and volume • Optimization above file system – Compression and reorganization • (Figure labels: Compute; Application; PLFS; File System; Optimization Process; Time.)

  26. Moving Overhead to Storage System • Checkpoints are not read immediately (if at all) – Index maintenance and optimization in storage • (Figure labels: Compute; Application; PLFS; File System; Return to compute sooner; Time; Optimization Process.)

  27. DataMods Module for PLFS • File Manifold – Logical file view – Per-process log-structured files – Index • Hierarchical Solution – Top-level manifold routes to logs – Inner manifold implements log-structured file – Automatic namespace management (metadata)

  28. PLFS Outer File Manifold • Logical top-half file is not materialized

  29. PLFS Outer File Manifold • Logical top-half file is not materialized • Routes to per-process log file

  30. PLFS Inner File Manifold • Logical top-half file is not materialized • Routes to per-process log file • Append striping within object namespace

  31. PLFS Inner File Manifold • Logical top-half file is not materialized • Routes to per-process log file • Append striping within object namespace • Index-enabled objects record logical-to-phy

  32. PLFS Inner File Manifold • Logical top-half file is not materialized • Routes to per-process log file • Interface to index maintenance routines • Append striping within object namespace • Index-enabled objects record logical-to-phy

  33. Active and Typed Objects • Append-only object • Automatic indexing • Managed layout • Built on existing services • Logical view at lowest level • Index maintenance interface

  34. Offline Index Optimization • Extreme index fragmentation (per-object) • Exploit opportunities for optimization – Storage system idle time – Re-use of analysis I/O – Piggy-backed on scrubbing / healing • Index Compression – Merging contiguous entries – Pattern discovery and replacement – Consolidation

  35. Offline Index Optimization • Three-stage pipeline – Incremental compression and consolidation • Incremental compression – 1. Merging physically contiguous entries (in PLFS); not subject to buffer size limits • Applied technique to 92 PLFS indexes published by LANL
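
The merging step can be sketched as a single linear pass over one log's index. This is an illustration, not the authors' implementation: the `(logical_offset, length, physical_offset)` tuple layout is an assumption, and the sketch also requires logical contiguity before merging so that the merged entry still maps every byte correctly.

```python
def merge_contiguous(entries):
    """Merge adjacent index entries of one log into larger extents.

    `entries` is a list of (logical_offset, length, physical_offset)
    tuples in write order (a hypothetical layout). Two neighbours are
    merged only when the second starts exactly where the first ends,
    both logically and physically.
    """
    merged = []
    for logical, length, physical in entries:
        if merged:
            p_logical, p_length, p_physical = merged[-1]
            if (p_logical + p_length == logical
                    and p_physical + p_length == physical):
                # Extend the previous extent instead of adding an entry.
                merged[-1] = (p_logical, p_length + length, p_physical)
                continue
        merged.append((logical, length, physical))
    return merged


# Three back-to-back 4-byte writes collapse into one 12-byte extent;
# the write at logical offset 100 is not contiguous and stays separate.
index = [(0, 4, 0), (4, 4, 4), (8, 4, 8), (100, 4, 12)]
compact = merge_contiguous(index)
```

Because the pass only compares each entry with the last merged one, it can run incrementally as new entries arrive.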

  36. Merging Reduces PLFS Index Size • (Figure: number of index entries, on a log scale from 1 to 10,000,000, per PLFS map file (1 to 91); series: Raw Trace (Baseline), Merge Compress; annotations: “Large, Strided” and “Contiguous Writes”.)

  37. Index Compression: Pattern • Compactly represent extents using patterns • Example pattern template – offset + stride * i, low < i < high • Fit data to this pattern to reduce index size • Linear algorithm; parallel across logs
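
A greedy linear scan is one way to fit index entries to a strided template like the one on this slide. The sketch below is an assumption-laden illustration rather than the algorithm from the talk: it uses the same hypothetical `(logical_offset, length, physical_offset)` tuples as input, and it indexes each run with `0 <= i < count` instead of the slide's `low < i < high` bounds.

```python
def compress_patterns(entries):
    """Greedily replace runs of strided entries with one pattern each.

    Input: list of (logical_offset, length, physical_offset) tuples.
    Output: list of (logical_start, logical_stride, physical_start,
    physical_stride, length, count) tuples, where entry i of a run is
    (logical_start + logical_stride * i, length,
     physical_start + physical_stride * i) for 0 <= i < count.
    """
    patterns = []
    i, n = 0, len(entries)
    while i < n:
        logical, length, physical = entries[i]
        count, l_stride, p_stride = 1, 0, 0
        if i + 1 < n and entries[i + 1][1] == length:
            # The next entry fixes the candidate strides for this run.
            l_stride = entries[i + 1][0] - logical
            p_stride = entries[i + 1][2] - physical
            count = 2
            # Extend the run while entries keep matching the template.
            while (i + count < n and entries[i + count] == (
                    logical + l_stride * count,
                    length,
                    physical + p_stride * count)):
                count += 1
        patterns.append(
            (logical, l_stride, physical, p_stride, length, count))
        i += count
    return patterns


def expand(patterns):
    """Inverse of compress_patterns: regenerate the original entries."""
    return [(lo + ls * k, length, ph + ps * k)
            for (lo, ls, ph, ps, length, count) in patterns
            for k in range(count)]


# Five strided 4-byte writes followed by one unrelated 8-byte write.
entries = [(k * 12, 4, k * 4) for k in range(5)] + [(1000, 8, 20)]
patterns = compress_patterns(entries)
```

Here six entries reduce to two patterns, and `expand(patterns)` reproduces the input exactly, so the representation is lossless. Each log is scanned once and independently of the others, which matches the slide's "linear algorithm; parallel across logs".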

  38. Pattern Compression Improves Over Merging • (Figure: number of index entries, on a log scale from 1 to 10,000,000, per PLFS map file (1 to 91); series: Raw Trace (Baseline), Merge Compress, Pattern Compress; annotation: “Strided pattern identified”.)

  39. Index Consolidation • Combines all logs together (in PLFS) • Increases index read efficiency • (Figure labels: Index Consolidation; Index Pack …)
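
To show why a consolidated index reads more efficiently, here is a minimal sketch under stated assumptions: per-log indexes are lists of `(logical_offset, length, physical_offset)` tuples keyed by a hypothetical integer log id, extents do not overlap, and the combined index fits in memory. The function names are illustrative and not part of PLFS or DataMods.

```python
from bisect import bisect_right


def consolidate(per_log_indexes):
    """Combine per-log indexes into one list sorted by logical offset.

    `per_log_indexes` maps log_id -> list of
    (logical_offset, length, physical_offset) tuples. Each combined
    entry is (logical_offset, length, log_id, physical_offset).
    """
    combined = [(logical, length, log_id, physical)
                for log_id, entries in per_log_indexes.items()
                for logical, length, physical in entries]
    combined.sort()
    return combined


def lookup(index, offset):
    """Return (log_id, physical_offset) holding `offset`, or None."""
    # Position of the last entry whose logical offset is <= offset.
    pos = bisect_right(index, (offset, float("inf"))) - 1
    if pos < 0:
        return None
    logical, length, log_id, physical = index[pos]
    if offset < logical + length:
        return log_id, physical + (offset - logical)
    return None  # offset falls in a hole between extents


per_log = {
    0: [(0, 4, 0), (12, 4, 4)],
    1: [(4, 4, 0), (16, 4, 4)],
    2: [(8, 4, 0), (20, 4, 4)],
}
index = consolidate(per_log)
```

With the combined index, a read of logical offset 17 resolves with one binary search (`lookup(index, 17)` gives log 1, physical offset 5) instead of scanning every per-log index.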

  40. Wrapping Up • Implementing new data model plugins – Hadoop and Visualization – Refining API with more use cases – Constructing specification language • Thank you to supporters – DOE funding (DE-SC0005428), Gary Grider, John Bent, James Nunez • Questions? --- jayhawk@cs.ucsc.edu • Poster session

  41. Extra Slides
