HPIS3: Towards a High-Performance Simulator for Hybrid Parallel I/O and Storage Systems
Bo Feng, Ning Liu, Shuibing He, Xian-He Sun
Department of Computer Science, Illinois Institute of Technology, Chicago, IL
Email: {bfeng5, nliu8}@hawk.iit.edu, {she11, sun}@iit.edu
Outline
• Introduction
• Related Work
• Design and Implementation
• Experiments
• Conclusions and Future Work
To Meet the High I/O Demands
1. Parallel file systems (PFS)
2. Solid-state drives (SSD)
[Chart: bandwidth (MB/sec) vs. request size (KB) for SSD-sequential, SSD-random, HDD-sequential, and HDD-random access]
HPIS3: Hybrid Parallel I/O and Storage System Simulator
• Parallel discrete-event simulator
• A variety of hardware and software configurations
• Hybrid settings: buffered-SSD, tiered-SSD, …
• HDD and SSD latency and bandwidth under parallel file systems
• Efficient and high-performance
[Figure: events exchanged among entities in a parallel discrete-event simulation]
Related Work
Co-design tool for hybrid parallel I/O and storage systems:
• S4D-Cache: Smart Selective SSD Cache for Parallel I/O Systems [1]
• A Cost-Aware Region-Level Data Placement Scheme for Hybrid Parallel I/O Systems [2]
• On the Role of Burst Buffers in Leadership-Class Storage Systems [3]
• iBridge: Improving Unaligned Parallel File Access with Solid-State Drives [4]
• More…

[1] S. He, X.-H. Sun, and B. Feng, "S4D-Cache: Smart Selective SSD Cache for Parallel I/O Systems," in Proceedings of the International Conference on Distributed Computing Systems (ICDCS), 2014.
[2] S. He, X.-H. Sun, B. Feng, X. Huang, and K. Feng, "A Cost-Aware Region-Level Data Placement Scheme for Hybrid Parallel I/O Systems," in Proceedings of the 2013 IEEE International Conference on Cluster Computing (CLUSTER), 2013.
[3] N. Liu, J. Cope, P. Carns, C. Carothers, R. Ross, G. Grider, A. Crume, and C. Maltzahn, "On the Role of Burst Buffers in Leadership-Class Storage Systems," in Proceedings of the 2012 IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST), 2012.
[4] X. Zhang, K. Liu, K. Davis, and S. Jiang, "iBridge: Improving Unaligned Parallel File Access with Solid-State Drives," in Proceedings of the 2013 IEEE 27th International Parallel and Distributed Processing Symposium (IPDPS), 2013.
Design Overview
• Platform: ROSS
• Target: PVFS
• Architecture overview:
  • Client LPs
  • Server LPs
  • Drive LPs
• Note: LP is short for logical process. LPs act like real processes in the system and are synchronized by the Time Warp protocol.
[Figure: application I/O workloads feed PVFS clients, which talk to PVFS server nodes backed by SSD and HDD storage devices]
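As a rough illustration of this LP decomposition, the sketch below shows how client, server, and drive LPs might be represented in C. The struct layouts and handler signature are simplified stand-ins chosen for this slide, not the actual ROSS API or the HPIS3 source.

```c
#include <stdint.h>

/* Illustrative LP kinds mirroring the three layers in the figure. */
typedef enum { LP_CLIENT, LP_SERVER, LP_DRIVE } lp_kind_t;

/* Simplified message exchanged between LPs (not the real HPIS3 format). */
typedef struct {
    int      event_type;   /* e.g., a write request or an acknowledgement */
    uint64_t file_id;
    uint64_t offset;
    uint64_t length;
} io_msg_t;

/* Per-LP state; each LP owns its state and is advanced only by
 * timestamped events, with Time Warp rolling back mis-speculation. */
typedef struct {
    lp_kind_t kind;
    uint64_t  queued_bytes;   /* simple queue-occupancy statistic */
} lp_state_t;

/* Event-handler skeleton: each LP reacts to one event at a time. */
void handle_event(lp_state_t *s, const io_msg_t *m, double now)
{
    switch (s->kind) {
    case LP_CLIENT: /* issue striped requests to server LPs */   break;
    case LP_SERVER: /* forward data-file I/O to its drive LP */  break;
    case LP_DRIVE:  /* model HDD/SSD service time, send ack */   break;
    }
    (void)m; (void)now;
}
```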
File, Queue and PVFS Client Modeling
• File requests and request queue modeling
  • A request is a tuple <file_id, length, file_offset>
  • State variables define the queues
• PVFS client modeling
  • Striping mechanism (see the sketch below)
[Figure (a)–(c): a file request split into stripes and distributed across file servers]
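A minimal sketch of the <file_id, length, file_offset> request tuple and a round-robin striping calculation of the kind a PVFS client performs; the 64 KB stripe size and 4-server count are made-up example values, not the deployment parameters used in the paper.

```c
#include <stdio.h>
#include <stdint.h>

typedef struct {
    uint64_t file_id;
    uint64_t length;       /* bytes requested             */
    uint64_t file_offset;  /* byte offset within the file */
} file_request_t;

/* Round-robin striping: which server holds the stripe containing
 * a given file offset.  Stripe size and server count are examples. */
#define STRIPE_SIZE (64 * 1024)
#define NUM_SERVERS 4

static int server_for_offset(uint64_t file_offset)
{
    return (int)((file_offset / STRIPE_SIZE) % NUM_SERVERS);
}

int main(void)
{
    file_request_t req = { .file_id = 7, .length = 256 * 1024, .file_offset = 1 << 20 };

    /* A 256 KB request starting at 1 MB spans four 64 KB stripes. */
    for (uint64_t off = req.file_offset; off < req.file_offset + req.length; off += STRIPE_SIZE)
        printf("offset %llu -> server %d\n", (unsigned long long)off, server_for_offset(off));
    return 0;
}
```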
PVFS Server Modeling
• Server LPs are connected with client LPs and drive LPs
[Figure: client nodes connected to a server node backed by an SSD]
Event Flow in HPIS3: A Write Example
• Write event flow for HDD
  • Single-queue effect
• Write event flow for SSD
  • Multi-queue effect
[Figure: a client write request flows through client, server, and drive events: FWRITE_init, FWRITE_io_getattr, FWRITE_inspect_attr, FWRITE_io_datafile_setup, FWRITE_positive_ack, FWRITE_datafile_post_msgpairs, FWRITE_start_flow, HW_START (prelude), HW_READY, WRITE_ACK, WRITE_END (release, end), FWRITE_datafile_complete_operations, FWRITE_completion_ack, FWRITE_cleanup, FWRITE_terminate]
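The write path can be read as a chain of named events exchanged between the client, server, and drive LPs. The enum below is an illustrative reconstruction of that chain from the event names on this slide; it is not the simulator's actual source, and the strictly linear ordering is an approximation of the figure.

```c
/* Event names follow the write flow on this slide (client -> server -> drive). */
typedef enum {
    FWRITE_INIT,                     /* client issues the write            */
    FWRITE_IO_GETATTR,
    FWRITE_INSPECT_ATTR,
    FWRITE_IO_DATAFILE_SETUP,
    FWRITE_POSITIVE_ACK,
    FWRITE_DATAFILE_POST_MSGPAIRS,
    FWRITE_START_FLOW,               /* server starts the data flow        */
    HW_START, HW_READY,              /* drive accepts the request          */
    WRITE_ACK, WRITE_END,            /* drive acknowledges and completes   */
    FWRITE_DATAFILE_COMPLETE_OPERATIONS,
    FWRITE_COMPLETION_ACK,
    FWRITE_CLEANUP,
    FWRITE_TERMINATE                 /* client-side cleanup and termination */
} fwrite_event_t;

/* Sketch only: each handled event schedules the next one in the chain;
 * in the simulator the client, server, and drive LPs each own their slice. */
static fwrite_event_t next_event(fwrite_event_t e)
{
    return (e < FWRITE_TERMINATE) ? (fwrite_event_t)(e + 1) : FWRITE_TERMINATE;
}
```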
Storage Device Modeling: HDD vs. SSD (1)
[Figure: timelines of HW_START, HW_READY, WRITE_ACK, and WRITE_END events between FWRITE_start_flow and FWRITE_completion_ack; the HDD services writes through a single queue, while the SSD services multiple queues concurrently]
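One way to read the diagram: an HDD drains its queue strictly one write at a time, while an SSD can keep several writes in flight. The toy model below captures that difference, assuming a fixed per-request service time and a hypothetical channel count; real device behavior is of course more nuanced.

```c
/* Toy queue model: completion time of the i-th queued write (0-based). */
double hdd_completion(int i, double svc_time)               /* single queue  */
{
    return (i + 1) * svc_time;                              /* fully serialized */
}

double ssd_completion(int i, double svc_time, int channels) /* multiple queues */
{
    return (i / channels + 1) * svc_time;                   /* 'channels' writes overlap */
}
```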
Storage Device Modeling: HDD vs. SSD (2)
• HDD: start-up time, seek time, data transfer time
• SSD: start-up time, FTL mapping time, GC time
• Four access patterns are modeled: sequential read (SR), sequential write (SW), random read (RR), random write (RW)
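A hedged sketch of how the per-request cost components listed above combine into a service time; the parameter names are placeholders, and the actual HPIS3 models are calibrated from measured device profiles rather than constants like these.

```c
/* Illustrative service-time models (seconds); parameters are placeholders. */
typedef struct { double startup, avg_seek, bandwidth; } hdd_params_t;
typedef struct { double startup, ftl_map, gc_penalty, bandwidth; } ssd_params_t;

double hdd_service_time(const hdd_params_t *p, double bytes, int is_random)
{
    double seek = is_random ? p->avg_seek : 0.0;   /* sequential I/O skips most seeks */
    return p->startup + seek + bytes / p->bandwidth;
}

double ssd_service_time(const ssd_params_t *p, double bytes, int triggers_gc)
{
    double gc = triggers_gc ? p->gc_penalty : 0.0; /* garbage collection, when it fires */
    return p->startup + p->ftl_map + gc + bytes / p->bandwidth;
}
```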
Hybrid PVFS I/O and Storage Modeling
• Buffered-SSD setting: the application writes to an SSD buffer tier (S servers), which stages data to an HDD storage tier (H servers)
• Tiered-SSD setting: the application writes directly to SSD-servers and HDD-servers, which serve I/O side by side
• S is short for SSD-server, a server node with an SSD
• H is short for HDD-server, a server node with an HDD
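In a tiered-SSD configuration, the interesting policy question is which server type a given request should go to. The sketch below shows one plausible rule (small or non-sequential requests to SSD-servers, large sequential ones to HDD-servers); it is only an illustration of the kind of decision such a setting enables, not the policy evaluated in this work, and the 64 KB threshold is an invented example value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical routing rule for a tiered-SSD deployment. */
typedef enum { SSD_SERVER, HDD_SERVER } server_tier_t;

server_tier_t choose_tier(uint64_t length, bool is_sequential)
{
    /* SSDs handle random and small I/O well; HDDs stream large sequential I/O cheaply. */
    if (!is_sequential || length <= 64 * 1024)
        return SSD_SERVER;
    return HDD_SERVER;
}
```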
Experimental Setup
65-node SUN Fire Linux cluster:
• CPU: 2 × Quad-Core AMD Opteron 2376
• Memory: 4 × 2 GB DDR2 333 MHz
• Network: 1 Gbps Ethernet
• Storage: HDD: Seagate SATA II 250 GB, 7200 RPM; SSD: OCZ PCI-E X4 100 GB
• OS: Linux kernel 2.6.28.10
• File system: OrangeFS 2.8.6
• 32 nodes are used throughout the experiments in this study
Benchmark and Trace Tool
• IOR benchmark
  • Sequential read/write
  • Random read/write
• IOSIG trace tool
  • Collected traces are replayed to trigger simulation events
Simulation Validity
• 8 clients
• 4 HDD-servers
• 4 SSD-servers
• Lowest error rate is 2%
• Average error rate is 11.98%
[Figure: throughput (MB/sec) vs. transfer size (4 KB to 16384 KB) for measured (H-ran, S-ran) and simulated (H-sim-ran, S-sim-ran) random workloads]
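For reference, error rates like these can be computed from the measured and simulated throughput curves with a simple relative-error average. The sketch below assumes error = |simulated − measured| / measured, which is one common definition; the paper may define it differently.

```c
#include <math.h>
#include <stddef.h>

/* Mean relative error between simulated and measured throughput samples,
 * assuming error_i = |sim_i - real_i| / real_i. */
double mean_relative_error(const double *real, const double *sim, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += fabs(sim[i] - real[i]) / real[i];
    return n ? sum / n : 0.0;
}
```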
Simulation Performance Study
• 32 physical nodes
• 2048 clients
• 1024 servers
• Number of processes varies from 2 to 256
[Figure: event rate (events/sec) and running time (sec) vs. number of processes, from 2 to 256]
Case Study: Tiered-SSD Performance Tuning
• 16 clients
• 64 KB random requests
• 4 HDD-servers + 4 SSD-servers
• Performance improves by about 15% with the original setting
• Performance improves by about 140% with the tuned setting
[Figure: throughput (MB/sec) of the 4HDD, 4SSD, original, and tuned settings; values shown include 86.05, 99.92, 240.96, and 275.67 MB/sec]
Conclusions and Future Work
• HPIS3 simulator: a hybrid parallel I/O and storage simulation system
  • Models of PVFS clients, servers, HDDs and SSDs
• Validated against benchmarks
  • Minimum error rate is 2% and the average is about 11.98% in the IOR tests
• Scalable: number of processes from 2 to 256
• Showcase of tiered-SSD settings under PVFS
  • Useful for finding optimal settings
  • Useful for self-tuning at runtime
• Future work
  • More evaluation of tiered-SSD vs. buffered-SSD
  • Improve accuracy with more detailed models
  • Client-side settings and more
Thank you
Questions?
Bo Feng, bfeng5@hawk.iit.edu