Okeanos: Wasteless Journaling for Fast and Reliable Multistream Storage
Andromachi Hatzieleftheriou, Stergios V. Anastasiadis
Department of Computer Science
University of Ioannina, Greece
University of Ioannina A. Hatzieleftheriou 1

Outline
• Motivation
• Design
• Implementation
• Evaluation
• Conclusions

Motivation
• Synchronous small writes
  - data & metadata critical for system and application reliability
• Multistream concurrency
  - effectively random I/O
• In page-sized disk accesses
  - async writes have good performance due to batching in memory
  - sync writes result in wasteful traffic due to excessive full-page I/Os
[Figure: Write Traffic (Linux ext3), Page Size = 4KB. Total Journal Volume (MB) vs. Request Size (KB); series: Data Journaling, Ordered; annotation: "metadata only".]

Design Goals
1. Reliable storage
  - keep data on disk
2. Inexpensive synchronous small writes
  - sequential disk throughput
3. Reduce disk bandwidth waste due to:
  - writes with high positioning overhead
  - unnecessary writes of unmodified data
• Proposed approach:
  - batch random small writes in memory
  - journal data updates at subpage granularity

Wasteless Journaling
[Diagram: MEMORY (Pages), data deltas, DISK (Journal, Filesystem)]
• Idea:
  1. Synchronously transfer data deltas from memory to journal
  2. Occasionally move data blocks from memory to final location
• Still wasteful! Large writes: disk traffic duplication
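
As a rough illustration of why subpage deltas reduce journal traffic, the sketch below compares the data bytes logged for one synchronous write when whole pages are journaled against logging only the modified byte range. The function names are hypothetical, the 4KB page size is taken from the Motivation slide, and tag or descriptor overhead is deliberately ignored.

```python
PAGE_SIZE = 4096  # 4KB page size, as stated on the Motivation slide


def pages_touched(offset, length, page_size=PAGE_SIZE):
    """Number of pages overlapped by a write of `length` bytes at `offset`."""
    if length <= 0:
        return 0
    first = offset // page_size
    last = (offset + length - 1) // page_size
    return last - first + 1


def full_page_journal_bytes(offset, length, page_size=PAGE_SIZE):
    """Data bytes journaled when every modified page is logged whole."""
    return pages_touched(offset, length, page_size) * page_size


def delta_journal_bytes(offset, length):
    """Data bytes journaled when only the modified byte range is logged.

    Assumption: per-delta tag overhead is not counted here.
    """
    return max(length, 0)
```

For a 100-byte write at offset 0, `full_page_journal_bytes` gives 4096 and `delta_journal_bytes` gives 100; a 200-byte write starting at offset 4000 straddles two pages, so full-page logging costs 8192 bytes.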

Selective Journaling
[Diagram: MEMORY (Pages), data deltas, DISK (Journal, Filesystem)]
• Definition: write threshold differentiates requests by size
• Idea:
  1. Transfer large requests to final location without journaling of data
  2. Treat small requests according to wasteless journaling
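
The size-based split can be sketched as a single comparison. This is a minimal illustration, not the prototype's code: `route_write`, the returned labels, and the threshold value used in the usage note are all hypothetical, and the strict "below the threshold" comparison follows the wording of the Conclusions slide.

```python
def route_write(request_size, write_threshold):
    """Pick the path for a write under selective journaling (sketch).

    Requests strictly below the threshold have their data journaled as
    deltas; all others go to the final location without data journaling.
    """
    if request_size < write_threshold:
        return "journal_delta"
    return "final_location"


def split_requests(request_sizes, write_threshold):
    """Partition request sizes into (journaled, direct) lists."""
    journaled = [s for s in request_sizes if s < write_threshold]
    direct = [s for s in request_sizes if s >= write_threshold]
    return journaled, direct
```

With an assumed threshold of 4096 bytes, `split_requests([512, 4096, 65536, 100], 4096)` journals the 512-byte and 100-byte requests and sends the other two directly to their final location.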

Consistency
• Wasteless Journaling:
  - atomic updates of both data and metadata
• Selective Journaling:
  - data updates either journaled or not depending on request size
  - consistency at least as strict as default ext3 journaling mode (ordered)

Prototype Implementation
[Diagram: Journal Descriptor Block (Header, Tag, Tag, Tag, …); Multiwrite Journal Block (Data Delta, Data Delta, Data Delta, …); Data Copies; Page Cache (Modified Data, Block Buffer, Original Data)]
• Each tag records:
  - block num of final location
  - offset in page
  - length in bytes
• Multiwrite journal block accumulates multiple subpage data updates
• During recovery apply data deltas to corresponding final disk blocks
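
The tag fields and the recovery step can be sketched in a few lines. This is only an illustration under stated assumptions: the binary tag layout (`<QHH`), the names `pack` and `recover`, the single multiwrite block, and the in-memory `disk` dictionary are all hypothetical and do not reflect the actual ext3 journal format.

```python
import struct

PAGE_SIZE = 4096
# Hypothetical tag layout: final block number (u64),
# offset in page (u16), length in bytes (u16).
TAG = struct.Struct("<QHH")


def pack(updates):
    """Pack (block_num, offset, data) updates into tags plus one multiwrite block."""
    tags, deltas = bytearray(), bytearray()
    for block_num, offset, data in updates:
        if offset + len(data) > PAGE_SIZE:
            raise ValueError("delta crosses a page boundary")
        if len(deltas) + len(data) > PAGE_SIZE:
            raise ValueError("multiwrite block is full")
        tags += TAG.pack(block_num, offset, len(data))
        deltas += data  # deltas are stored back to back, in tag order
    return bytes(tags), bytes(deltas)


def recover(tags, deltas, disk):
    """Replay deltas onto `disk`, a dict mapping block number to a bytearray page."""
    pos = 0
    for block_num, offset, length in TAG.iter_unpack(tags):
        page = disk.setdefault(block_num, bytearray(PAGE_SIZE))
        page[offset:offset + length] = deltas[pos:pos + length]
        pos += length
    return disk
```

Packing three small updates, two of them to the same block, yields three 12-byte tags and a single run of concatenated deltas; replaying them with `recover(tags, deltas, {})` writes each delta at its recorded offset in the target page.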

Experiments
• Implemented in Linux kernel 2.6.18 ext3
• Experimentation Environment:
  - x86-based servers
  - quad-core 2.66GHz processor
  - 3GB RAM
  - Seagate Cheetah SAS 300GB 15KRPM disks
• Workloads:
  - Microbenchmarks
  - Postmark
  - MPI-IO over PVFS2

Latency
[Figure, two panels at 1 Mbps/stream: Write Latency (ms) and Read Latency (µs) vs. Number of Streams; series: Ordered, Data, Wasteless, Selective, NILFS.]
• Data & wasteless achieve substantially lower write latency, similar to NILFS (stable Linux port of LFS)
• NILFS read latency significantly higher due to poor storage locality!

Disk Traffic
[Figure, two panels at 1Kbps/stream, lower is better: Journal Throughput (MB/s) and File System Throughput (MB/s) vs. Number of Streams; series: Ordered, Data, Wasteless, Selective.]
• Data journaling expensive in terms of journal traffic
• Ordered journaling incurs increased filesystem traffic
• Wasteless & selective substantially reduce journal and filesystem traffic

Application-Level Workloads
[Figure, two panels: Postmark, Transactions/s vs. Request Size (KB); MPI-IO over PVFS2 (Write Size 1KB), Throughput (MB/s) vs. Threads per Client; series: Ordered, Data, Wasteless, Selective.]
• Small files workload
  - wasteless increases transaction throughput
• Parallel I/O workload
  - 13 clients, 1 PVFS2 data server, 1 PVFS2 metadata server (15 machines)
  - wasteless doubles the throughput of parallel application checkpointing

Conclusions & Future Work
• Key concept: apply subpage journaling of data updates to ensure reliability
• Wasteless Journaling merges subpage writes into page-sized journal blocks
• Selective Journaling journals only updates below a write threshold
• Performance benefits demonstrated over ext3:
  - reduced write latency
  - improved transaction throughput
  - avoided bandwidth waste
• Future Work: extend to virtualization environments and flash memory systems