HPIS3: Towards a High-Performance Simulator for Hybrid Parallel I/O and Storage Systems
Bo Feng, Ning Liu, Shuibing He, Xian-He Sun
Department of Computer Science, Illinois Institute of Technology, Chicago, IL
Email: {bfeng5,
Outline
- Introduction
- Related Work
- Design and Implementation
- Experiments
- Conclusions and Future Work
To Meet the High I/O Demands
- 1. Parallel file systems (PFS)
- 2. Solid-state drives (SSDs)
[Figure: bandwidth (MB/sec) vs. request size (KB) for SSD-seq, SSD-ran, HDD-seq, and HDD-ran]
HPIS3: Hybrid Parallel I/O and Storage System Simulator
- Parallel discrete event simulator
- A variety of hardware and software configurations
- Hybrid settings
- Buffered-SSD
- Tiered-SSD
- …
- Models HDD and SSD latency and bandwidth under parallel file systems
- Efficient and high-performance
[Figure: a timeline of discrete events (Event 1 through Event 5) processed by the simulator]
Related Work
- S4D-Cache: Smart Selective SSD Cache for Parallel I/O Systems [1]
- A Cost-Aware Region-Level Data Placement Scheme for Hybrid Parallel I/O Systems [2]
- On the Role of Burst Buffers in Leadership-Class Storage Systems [3]
- iBridge: Improving Unaligned Parallel File Access with Solid-State Drives [4]
- More…
[1] S. He, X.-H. Sun, and B. Feng, "S4D-Cache: Smart Selective SSD Cache for Parallel I/O Systems," in Proceedings of the IEEE International Conference on Distributed Computing Systems (ICDCS), 2014.
[2] S. He, X.-H. Sun, B. Feng, X. Huang, and K. Feng, "A Cost-Aware Region-Level Data Placement Scheme for Hybrid Parallel I/O Systems," in Proceedings of the 2013 IEEE International Conference on Cluster Computing (CLUSTER), 2013.
[3] N. Liu, J. Cope, P. Carns, C. Carothers, R. Ross, G. Grider, A. Crume, and C. Maltzahn, "On the Role of Burst Buffers in Leadership-Class Storage Systems," in Proceedings of the 2012 IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST), 2012.
[4] X. Zhang, K. Liu, K. Davis, and S. Jiang, "iBridge: Improving Unaligned Parallel File Access with Solid-State Drives," in Proceedings of the 2013 IEEE 27th International Parallel and Distributed Processing Symposium (IPDPS), 2013.
- HPIS3 is positioned as a co-design tool for hybrid parallel I/O and storage systems
Design Overview
- Platform: ROSS
- Target: PVFS
- Architecture Overview
- Client LPs
- Server LPs
- Drive LPs
- Note: LP is short for logical process. LPs act like real processes in the system and are synchronized by the Time Warp protocol.
[Figure: architecture overview — application I/O workloads drive PVFS client LPs, which send requests to PVFS server LPs backed by HDD and SSD storage devices (drive LPs)]
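To make the event-driven design concrete, below is a minimal, self-contained sketch of a sequential discrete-event loop with client, server, and drive stages. It is only a conceptual analogue: HPIS3 builds on ROSS, which runs many such LPs in parallel under Time Warp, and every event name and delay constant here is an illustrative assumption rather than HPIS3 code.

    /* Minimal sequential discrete-event loop: a conceptual analogue of the
       client/server/drive LP pipeline. All names and delays are assumptions. */
    #include <stdio.h>
    #include <stdlib.h>

    typedef enum { CLIENT_WRITE, SERVER_RECV, DRIVE_WRITE, DRIVE_DONE } ev_kind;

    typedef struct event {
        double ts;             /* simulated timestamp */
        ev_kind kind;
        struct event *next;
    } event;

    static event *queue = NULL;

    /* Insert an event into the time-ordered pending-event list. */
    static void schedule(double ts, ev_kind kind) {
        event *e = malloc(sizeof *e);
        event **p = &queue;
        e->ts = ts; e->kind = kind;
        while (*p && (*p)->ts <= ts) p = &(*p)->next;
        e->next = *p; *p = e;
    }

    int main(void) {
        schedule(0.0, CLIENT_WRITE);
        while (queue) {                     /* pop the earliest event */
            event *e = queue; queue = e->next;
            switch (e->kind) {              /* each stage schedules the next */
            case CLIENT_WRITE: schedule(e->ts + 0.1, SERVER_RECV); break;
            case SERVER_RECV:  schedule(e->ts + 0.2, DRIVE_WRITE); break;
            case DRIVE_WRITE:  schedule(e->ts + 1.5, DRIVE_DONE);  break;
            case DRIVE_DONE:   printf("write done at t=%.2f\n", e->ts); break;
            }
            free(e);
        }
        return 0;
    }

In the real simulator the same idea is distributed: each LP keeps its own state, and Time Warp rolls an LP back when an event arrives in its simulated past.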
File, Queue and PVFS Client Modeling
- File request and request queue modeling
- <file_id, length, file_offset>
- State variables define queues
- PVFS client modeling
- Striping mechanism (see the sketch below)
[Figure: (a)–(c) a file request being striped into strips across file servers S1…Sn]
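Since striping is central to the client model, here is a hedged sketch of round-robin striping: one <file_id, length, file_offset> request is cut into per-server sub-requests on strip boundaries. The strip size and server count are illustrative assumptions; PVFS's actual distribution logic is more configurable.

    /* Round-robin striping sketch: split one file request into per-server
       sub-requests. STRIP_SIZE and NUM_SERVERS are assumed values. */
    #include <stdio.h>

    #define STRIP_SIZE (64L * 1024)
    #define NUM_SERVERS 4

    typedef struct { int file_id; long length; long file_offset; } request;

    static void stripe(const request *r) {
        long off = r->file_offset, remaining = r->length;
        while (remaining > 0) {
            long strip     = off / STRIP_SIZE;            /* global strip index */
            int  server    = (int)(strip % NUM_SERVERS);  /* round-robin target */
            long strip_end = (strip + 1) * STRIP_SIZE;
            long len       = strip_end - off;             /* stay in this strip */
            if (len > remaining) len = remaining;
            printf("file %d: server %d <- offset %ld, length %ld\n",
                   r->file_id, server, off, len);
            off += len; remaining -= len;
        }
    }

    int main(void) {
        request r = { 7, 200L * 1024, 32L * 1024 };  /* 200 KB at offset 32 KB */
        stripe(&r);
        return 0;
    }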
PVFS Server Modeling
- Connected with both clients and drives
[Figure: a server node linking multiple clients to its storage device (e.g., an SSD)]
Event flow in HPIS3: a write example
- Write event flow for HDD
- Single-queue effect
- Write event flow for SSD
- Multi-queue effect
[Figure: write event flow spanning client events, client requests, server events, and drive events; the chain includes write, FWRITE_init, prelude, FWRITE_positive_ack, FWRITE_io_getattr, FWRITE_inspect_attr, FWRITE_io_datafile_setup, FWRITE_datafile_post_msgpairs, FWRITE_start_flow, HW_START, HW_READY, WRITE_ACK, WRITE_END, FWRITE_completion_ack, FWRITE_datafile_complete_operations, FWRITE_cleanup, FWRITE_terminate, end, and release]
Storage Device Modeling: HDD vs. SSD (1)
[Figure: for HDD, a single FWRITE_start_flow → HW_START → HW_READY → WRITE_ACK → WRITE_END chain runs to completion before FWRITE_completion_ack (single queue); for SSD, multiple HW_READY → WRITE_ACK → WRITE_END chains proceed in parallel before FWRITE_completion_ack (multiple queues)]
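A back-of-envelope sketch of that queueing difference: with one queue, sub-writes serialize; with several, they drain in parallel. The per-write service time and channel count below are assumptions chosen only to show the shape of the effect.

    /* Single-queue (HDD-like) vs. multi-queue (SSD-like) completion time. */
    #include <stdio.h>

    int main(void) {
        const int    n_writes   = 8;
        const double svc_time   = 1.0;  /* assumed per-write service time */
        const int    ssd_queues = 4;    /* assumed parallel channels */

        /* HDD: one queue, so writes complete one after another. */
        double hdd_done = n_writes * svc_time;

        /* SSD: writes fan out; each queue drains independently. */
        int per_queue = (n_writes + ssd_queues - 1) / ssd_queues;  /* ceil */
        double ssd_done = per_queue * svc_time;

        printf("HDD (1 queue):  done at t=%.1f\n", hdd_done);
        printf("SSD (%d queues): done at t=%.1f\n", ssd_queues, ssd_done);
        return 0;
    }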
Storage Device Modeling: HDD vs. SSD (2)
- Four access patterns are modeled per device:

                Read  Write
    Sequential  SR    SW
    Random      RR    RW

- HDD cost components: start-up time, seek time, data transfer time
- SSD cost components: start-up time, FTL mapping time, GC time
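Read as formulas, the two cost lists give additive service-time models. The sketch below codes them up; every constant and the GC trigger are invented for illustration and are not the simulator's calibrated parameters.

    /* Additive device cost models: HDD = start-up + seek (random only) +
       transfer; SSD = start-up + FTL mapping + transfer + occasional GC.
       All constants are illustrative assumptions. */
    #include <stdio.h>

    typedef enum { SEQ, RAN } pattern;

    static double hdd_time_ms(long bytes, pattern p) {
        double t = 0.5;                        /* start-up */
        if (p == RAN) t += 8.0;                /* average seek for random I/O */
        return t + bytes / 100000.0;           /* transfer at ~100 MB/s */
    }

    static double ssd_time_ms(long bytes, int gc_triggered) {
        double t = 0.05 + 0.02;                /* start-up + FTL mapping */
        if (gc_triggered) t += 2.0;            /* garbage-collection penalty */
        return t + bytes / 500000.0;           /* transfer at ~500 MB/s */
    }

    int main(void) {
        long req = 64 * 1024;                  /* 64 KB request */
        printf("HDD random 64KB: %.3f ms\n", hdd_time_ms(req, RAN));
        printf("HDD seq    64KB: %.3f ms\n", hdd_time_ms(req, SEQ));
        printf("SSD        64KB: %.3f ms\n", ssd_time_ms(req, 0));
        return 0;
    }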
Hybrid PVFS I/O and Storage Modeling
[Figure: two hybrid configurations — in the buffered-SSD setting, SSD-servers sit in front of HDD storage as a buffer; in the tiered-SSD setting, SSD-servers form a fast tier alongside the HDD tier]
- S is short for SSD-server, a server node equipped with an SSD.
- H is short for HDD-server, a server node equipped with an HDD.
- The two panels show the buffered-SSD setting and the tiered-SSD setting.
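To show why the tiered setting pays off, here is a hedged sketch of a placement policy in the spirit of the cited placement work [1][2]: random or small requests go to SSD-servers, large sequential ones to HDD-servers. The policy and its threshold are assumptions, not the simulator's built-in logic.

    /* Illustrative tiered-SSD placement: route requests by size and pattern.
       The 64 KB cutoff is an assumed threshold. */
    #include <stdio.h>

    typedef enum { SEQ, RAN } pattern;
    typedef enum { SSD_TIER, HDD_TIER } tier;

    static tier place(long bytes, pattern p) {
        const long small_cutoff = 64L * 1024;
        if (p == RAN || bytes <= small_cutoff)
            return SSD_TIER;   /* SSDs absorb random/small I/O well */
        return HDD_TIER;       /* HDDs stream large sequential I/O cheaply */
    }

    int main(void) {
        printf("64 KB random    -> %s\n",
               place(64L * 1024, RAN) == SSD_TIER ? "SSD tier" : "HDD tier");
        printf("4 MB sequential -> %s\n",
               place(4L * 1024 * 1024, SEQ) == SSD_TIER ? "SSD tier" : "HDD tier");
        return 0;
    }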
Experimental setup
- 32 nodes are used throughout the experiments in this study
65-node SUN Fire Linux cluster:

    CPU          2 x Quad-Core AMD Opteron 2376
    Memory       4 x 2 GB DDR2, 333 MHz
    Network      1 Gbps Ethernet
    Storage      HDD: Seagate SATA II, 250 GB, 7200 RPM
                 SSD: OCZ PCI-E X4, 100 GB
    OS           Linux kernel 2.6.28.10
    File system  OrangeFS 2.8.6
Benchmark and Trace tool
- IOR
- Sequential read/write
- Random read/write
- IOSIG
- Traces collected with IOSIG are replayed to trigger simulation events
Simulation Validity
- 8 clients
- 4 HDD-servers
- 4 SSD-servers
- Lowest error rate is 2%
- Average error rate is 11.98%
[Figure: measured vs. simulated throughput (MB/sec) across transfer sizes from 4 KB to 16384 KB, for H-ran, S-ran, H-sim-ran, and S-sim-ran]
Simulation Performance Study
- 32 physical nodes
- 2048 clients
- 1024 servers
- # of processes from 2 to 256
[Figure: event rate (events/sec) and running time (sec) as the number of processes grows from 2 to 256]
Case study: Tiered-SSD Performance Tuning
- 16 clients
- 64K random requests
- 4 HDD-servers + 4 SSD-servers
- Throughput improves by about 15% with the original setting
- Throughput improves by about 140% with the tuned setting
[Figure: throughput (MB/sec) — 4HDD: 86.05, 4SSD: 275.67, Original: 99.92, Tuned: 240.96]
Conclusions and Future Work
- HPIS3 simulator: a hybrid parallel I/O and storage simulation system
- Models of PVFS clients, servers, HDDs and SSDs
- Validated against benchmarks
- The minimum error rate is 2% and the average is about 11.98% in IOR tests
- Scalable: # of processes from 2 to 256
- Showcase of tiered-SSD settings under PVFS
- Useful for finding optimal settings
- Useful for self-tuning at runtime
- Future work
- More evaluation for tiered-SSD vs. buffered-SSD
- Improve accuracy with more detailed models
- Client-side settings and more
Thank you! Questions?
Bo Feng bfeng5@hawk.iit.edu