
Okeanos: Wasteless Journaling for Fast and Reliable Multistream Storage

Andromachi Hatzieleftheriou, Stergios V. Anastasiadis
Department of Computer Science, University of Ioannina, Greece


  1. Okeanos: Wasteless Journaling for Fast and Reliable Multistream Storage
     Andromachi Hatzieleftheriou, Stergios V. Anastasiadis
     Department of Computer Science, University of Ioannina, Greece

  2. Outline
     • Motivation
     • Design
     • Implementation
     • Evaluation
     • Conclusions

  3. Motivation
     • Synchronous small writes
       → critical for system and application reliability
     • Multistream concurrency
       → effectively random I/O
     • In page-sized disk accesses
       → async writes have good performance due to batching in memory
       → sync writes result in wasteful traffic due to excessive full-page I/Os
     [Figure: Write Traffic (Linux ext3, page size = 4KB). Total Journal Volume (MB) vs. Request Size (KB) for Data Journaling (data & metadata) and Ordered (metadata only).]
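The waste from full-page I/O on small synchronous writes can be illustrated with simple arithmetic. The sketch below is not from the slides: the function names are hypothetical, and it assumes that every page touched by a request is journaled in full and that metadata and descriptor overhead are ignored.

```python
PAGE_SIZE = 4096  # 4KB page size, as stated on the slide


def full_page_journal_bytes(request_size: int, offset: int = 0) -> int:
    """Bytes journaled if every page touched by the request is written in full.

    Assumption (not from the slides): metadata and descriptor blocks are ignored.
    """
    if request_size <= 0:
        return 0
    first_page = offset // PAGE_SIZE
    last_page = (offset + request_size - 1) // PAGE_SIZE
    return (last_page - first_page + 1) * PAGE_SIZE


def waste_ratio(request_size: int, offset: int = 0) -> float:
    """Journaled bytes divided by the bytes the application actually modified."""
    return full_page_journal_bytes(request_size, offset) / request_size
```

Under these assumptions a 1KB write journals a full 4KB page, and a 100-byte write that straddles a page boundary journals two pages.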

  4. Design Goals
     1. Reliable storage
        → keep data on disk
     2. Inexpensive synchronous small writes
        → sequential disk throughput
     3. Reduce disk bandwidth waste due to:
        → writes with high positioning overhead
        → unnecessary writes of unmodified data
     • Proposed approach:
       → batch random small writes in memory
       → journal data updates at subpage granularity

  5. Wasteless Journaling
     • Idea:
       1. Synchronously transfer data deltas from memory to journal
       2. Occasionally move data blocks from memory to final location
     • Still wasteful!
       → large writes
       → disk traffic duplication
     [Diagram: MEMORY (pages) and DISK (journal, filesystem), with data deltas transferred from memory pages to the journal.]
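The two steps of the idea can be sketched as a toy in-memory model. This is an illustration only, not the kernel implementation: the class and method names are hypothetical, dictionaries stand in for the page cache and the disk, and the journal is simply cleared once pages reach their final location.

```python
PAGE_SIZE = 4096


class WastelessStore:
    """Toy model: journal the delta synchronously, checkpoint whole pages later."""

    def __init__(self):
        self.memory = {}    # block number -> bytearray (stand-in for page cache)
        self.journal = []   # appended (block, offset, delta) records
        self.disk = {}      # block number -> bytes at the final location
        self.dirty = set()  # blocks modified since the last checkpoint

    def sync_write(self, block: int, offset: int, data: bytes) -> int:
        if offset < 0 or offset + len(data) > PAGE_SIZE:
            raise ValueError("delta must fit inside one page")
        page = self.memory.setdefault(
            block, bytearray(self.disk.get(block, bytes(PAGE_SIZE)))
        )
        page[offset:offset + len(data)] = data
        self.dirty.add(block)
        # Step 1: only the modified bytes go to the journal, not the full page.
        self.journal.append((block, offset, bytes(data)))
        return len(data)  # bytes of data journaled for this request

    def checkpoint(self) -> None:
        # Step 2: occasionally move whole data blocks to their final location.
        for block in self.dirty:
            self.disk[block] = bytes(self.memory[block])
        self.dirty.clear()
        self.journal.clear()  # simplification: journal space reclaimed at once
```

In this model a 2-byte update adds 2 bytes of data to the journal, while the 4KB block is written to its final location only at the next checkpoint.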

  6. Selective Journaling
     • Definition:
       → write threshold differentiates requests by size
     • Idea:
       1. Transfer large requests to final location without journaling of data
       2. Treat small requests according to wasteless journaling
     [Diagram: MEMORY (pages) and DISK (journal, filesystem); label: data deltas.]
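The size-based routing can be sketched in a few lines. The threshold value and the names below are assumptions for illustration; the slides do not give a concrete threshold, and the strict "below the threshold" comparison follows the wording of the conclusions slide.

```python
WRITE_THRESHOLD = 2048  # hypothetical value in bytes; not given in the slides


def choose_path(request_size: int, threshold: int = WRITE_THRESHOLD) -> str:
    """Route a synchronous write by its size.

    Requests below the threshold are journaled as data deltas (wasteless
    journaling); larger requests go to the final location with no data
    journaling.
    """
    return "journal_delta" if request_size < threshold else "final_location"


def split_by_path(request_sizes, threshold: int = WRITE_THRESHOLD) -> dict:
    """Total bytes sent down each path for a sequence of request sizes."""
    totals = {"journal_delta": 0, "final_location": 0}
    for size in request_sizes:
        totals[choose_path(size, threshold)] += size
    return totals
```

For example, with the assumed 2KB threshold, a mix of 512-byte and 64KB requests sends only the 512-byte requests through the journal.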

  7. Consistency
     • Wasteless Journaling:
       → atomic updates of both data and metadata
     • Selective Journaling:
       → data updates either journaled or not depending on request size
       → consistency at least as strict as default ext3 journaling mode (ordered)

  8. Prototype Implementation
     • Multiwrite journal block
       → accumulates multiple subpage data updates
     • During recovery
       → apply data deltas to corresponding final disk blocks
     [Diagram: journal descriptor block (header, then one tag per data delta: block num of final location, offset in page, length in bytes) and multiwrite journal block (data deltas). Other labels: data copies; page cache, modified data; block buffer, original data.]
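The tag fields named in the diagram suggest how deltas could be packed and later replayed. The sketch below is an assumption-laden illustration, not the prototype's on-disk format: the field widths, byte order, and function names are hypothetical, and the descriptor header is omitted.

```python
import struct

PAGE_SIZE = 4096
# Hypothetical tag layout: block number of final location (8 bytes),
# offset in page (2 bytes), length in bytes (2 bytes), little-endian.
TAG = struct.Struct("<QHH")


def pack_deltas(deltas):
    """Pack (block_num, offset, data) records into tag bytes and delta bytes."""
    tags, payload = bytearray(), bytearray()
    for block_num, offset, data in deltas:
        if offset + len(data) > PAGE_SIZE:
            raise ValueError("delta crosses a page boundary")
        tags += TAG.pack(block_num, offset, len(data))
        payload += data  # deltas accumulate back to back in one journal block
    return bytes(tags), bytes(payload)


def replay(tags: bytes, payload: bytes, disk: dict) -> dict:
    """Recovery: apply each data delta to its final disk block, in log order."""
    pos = 0
    for i in range(0, len(tags), TAG.size):
        block_num, offset, length = TAG.unpack_from(tags, i)
        block = disk.setdefault(block_num, bytearray(PAGE_SIZE))
        block[offset:offset + length] = payload[pos:pos + length]
        pos += length
    return disk
```

Replaying in log order means a later delta overwrites an earlier one where they overlap, which matches the order in which the writes were issued.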

  9. Experiments
     • Implemented in Linux kernel 2.6.18 ext3
     • Experimentation environment:
       → x86-based servers
       → quad-core 2.66GHz processor
       → 3GB RAM
       → Seagate Cheetah SAS 300GB 15KRPM disks
     • Workloads:
       → Microbenchmarks
       → Postmark
       → MPI-IO over PVFS2

  10. Latency
     • Data & wasteless achieve substantially lower write latency
       → similar to NILFS (stable Linux port of LFS)
     • NILFS read latency significantly higher due to poor storage locality!
     [Figure: two panels at 1 Mbps/stream. Write Latency (ms) and Read Latency (us) vs. Number of Streams for Ordered, Data, Wasteless, Selective, and NILFS.]

  11. Disk Traffic
     • Data journaling expensive in terms of journal traffic
     • Ordered journaling incurs increased filesystem traffic
     • Wasteless & selective substantially reduce journal and filesystem traffic
     [Figure: two panels at 1Kbps/stream, lower is better. Journal Throughput (MB/s) and File System Throughput (MB/s) vs. Number of Streams for Ordered, Data, Wasteless, and Selective.]

  12. Application-Level Workloads
     • Small files workload
       → wasteless increases transaction throughput
     • Parallel I/O workload: 13 clients, 1 PVFS2 data server, 1 PVFS2 metadata server (15 machines)
       → wasteless doubles the throughput of parallel application checkpointing
     [Figure: two panels. Postmark: Transactions/s vs. Request Size (KB). MPI-IO over PVFS2 (Write Size 1KB): Throughput (MB/s) vs. Threads per Client. Series: Ordered, Data, Wasteless, Selective.]

  13. Conclusions & Future Work
     • Key concept:
       → apply subpage journaling of data updates to ensure reliability
     • Wasteless Journaling
       → merges subpage writes into page-sized journal blocks
     • Selective Journaling
       → journals only updates below a write threshold
     • Performance benefits demonstrated over ext3:
       → reduced write latency
       → improved transaction throughput
       → avoided bandwidth waste
     • Future Work
       → extend to virtualization environments and flash memory systems
