Dr. Mais Nijim, 20.07.2010
Outline: Motivation, Introduction, Related Work
The global online satellite image distribution system operated at the Earth Resources Observation and Science (EROS) Center of the U.S. Geological Survey. The EROS system motivates the need for prefetching to improve the performance of hybrid storage systems.
Studies show that new data grows annually at a rate of 30%. Supercomputing centers and rich-media organizations such as Lawrence National Laboratory, Oak Ridge National Laboratory, NASA, Google, and CNN rely on large-scale storage systems to meet demanding requirements for large data capacity, high performance, and reliability.
Large-scale storage systems must be developed to fulfill rapidly increasing demands for both large storage capacity and high I/O performance. Both capacity and I/O performance are scaled by increasing the number of storage components, i.e., employing more disks.
Hybrid storage system: Solid State Drives, Hard Disks, Tapes
Solid State Disks: Highly accessed storage objects in a hybrid storage system can be prefetched and cached in high-speed storage components. Solid-state disks can be readily connected to any other type of storage device.
Tape Storage: Hybrid storage systems are cost-effective because of inexpensive tapes. Tape storage offers high reliability, long archive lifetime, and low cost. Tapes are an ideal storage platform for a wide variety of data-intensive applications.
Prefetching is a promising solution for reducing the latency of data transfers among SSDs, HDDs, and tapes. Prefetching aims at reducing the number of requests issued to HDDs or tapes by caching popular data in SSDs. Aggressive prefetching is needed to efficiently reduce I/O latency, but an overaggressive scheme may waste I/O bandwidth by transferring useless data.
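The tension between aggressive and over-aggressive prefetching can be illustrated with a minimal popularity-threshold prefetcher. This is a sketch only; the threshold rule, function names, and object IDs are assumptions, not the paper's policy:

```python
# Sketch: popularity-threshold prefetch selection (illustrative assumptions).
# Objects requested at least `threshold` times become prefetch candidates.
# A low threshold is aggressive (more candidates, more bandwidth used); a
# high threshold is conservative (fewer candidates, less wasted transfer).
from collections import Counter

def choose_prefetch_candidates(request_log, threshold):
    """Return the set of objects requested at least `threshold` times."""
    counts = Counter(request_log)
    return {obj for obj, n in counts.items() if n >= threshold}

log = ["O1", "O2", "O1", "O3", "O1", "O2"]
print(sorted(choose_prefetch_candidates(log, 2)))  # aggressive: ['O1', 'O2']
print(sorted(choose_prefetch_candidates(log, 3)))  # conservative: ['O1']
```

Raising the threshold trades hit ratio for saved bandwidth, which is exactly the aggressiveness trade-off noted above.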
Fig. 2: The hybrid storage system architecture with prefetching. Web users reach an FTP server equipped with solid-state disks over a LAN; the server connects to a SAN. On a data miss, upper-level prefetching moves data from the SAN to the FTP server, and lower-level prefetching moves data from the tapes to the SAN.
Move data from parallel tape storage to hard disks. Parallel tapes can increase the aggregate bandwidth between disk storage and tape storage via parallel load/unload operations.
To support parallelism, a data striping technique is used. To obtain the optimal striping, data size and workload are considered.
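Round-robin data striping can be sketched as follows. The stripe-unit size and function names are illustrative assumptions, not the paper's exact scheme:

```python
# Sketch: round-robin data striping across devices (tape libraries or disks).
# The object is cut into fixed-size stripe units; unit k lands on device
# k % n_devices, so a large read can proceed from all devices in parallel.
def stripe(data, n_devices, unit):
    """Split `data` into `unit`-byte chunks and assign them round-robin."""
    chunks = [data[i:i + unit] for i in range(0, len(data), unit)]
    placement = [[] for _ in range(n_devices)]
    for k, chunk in enumerate(chunks):
        placement[k % n_devices].append(chunk)
    return placement

# 4 chunks over 3 devices: device 0 gets chunks 0 and 3.
print(stripe(b"ABCDEFGH", 3, 2))
```

The unit size controls the trade-off noted on the next slide: small units raise parallelism but generate many small requests and more switch time.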
Striping can generate many small data requests and increase switch time. Data Placement Algorithm: we propose a data clustering algorithm that clusters objects with a high probability of being requested together, since related data are likely to be requested together.
Example (data placement and prefetching):

Index:     1   2   3
Reference: O1  O2  O3
Priority:  4   3   2

Objects are striped across tape libraries (library k holds stripe k of each object):
Tape Library 1: O11 O21 O31 O41
Tape Library 2: O12 O22 O32 O42
Tape Library 3: O13 O23 O33 O43

Step  Disk 1  Disk 2  Updated priority
1     O11     O12     3
2     O13     O21     2
3     O31     O32     1
4     O33     -       -

LRU eviction: either O11, O12, or O13.
Tape-level prefetching flow:
1. start: R = {r_i, r_{i+1}, ..., r_j}; compute P(r) and Tape-library(r).
2. If the data is in the disk system, no prefetching is needed.
3. Otherwise, fetch the data from as many tapes as possible and place it on the disks with round-robin data placement.
4. If the disks are full, apply the LRU eviction policy.
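The tape-level flow above can be sketched under assumed data structures (an OrderedDict standing in for the disk cache, and tape stripes given per object). This is an illustration of the flow, not the paper's implementation:

```python
# Sketch of the tape-level prefetch flow: on a disk miss, read the object's
# stripes from as many tapes as possible, place them round-robin on disks,
# and evict least-recently-used objects when the disks are full.
from collections import OrderedDict

def prefetch_from_tapes(obj, disk_cache, capacity, tape_stripes, n_disks):
    if obj in disk_cache:                # data already on disks: no prefetching
        disk_cache.move_to_end(obj)      # refresh its LRU position
        return "hit"
    while len(disk_cache) >= capacity:   # disks full: LRU eviction
        disk_cache.popitem(last=False)
    stripes = tape_stripes[obj]          # one stripe per tape library
    placement = {d: [] for d in range(n_disks)}
    for k, s in enumerate(stripes):      # round-robin placement on disks
        placement[k % n_disks].append(s)
    disk_cache[obj] = placement
    return "prefetched"

tapes = {"O1": ["O11", "O12", "O13"]}
cache = OrderedDict()
print(prefetch_from_tapes("O1", cache, 2, tapes, 2))  # "prefetched"
print(prefetch_from_tapes("O1", cache, 2, tapes, 2))  # "hit"
```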
The first component is solid-state partitioning (PaSSD): dynamically partition the array of solid-state disks among the HDDs in such a way as to maximize I/O performance. Partitions are allocated dynamically depending on popularity, content size, and access pattern.
Two approaches:
I. Content-popularity-based weight assignment
II. Collaborative-popularity-based weight assignment
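A minimal sketch of approach I, assuming the SSD share allocated to each HDD is simply proportional to the popularity of its contents. The proportional rule and names are illustrative assumptions, not the paper's exact weight formula:

```python
# Sketch: content-popularity-based weight assignment for SSD partitioning.
# Each HDD receives a share of the SSD blocks proportional to its access
# count (rounding may leave a few blocks unassigned in the general case).
def partition_ssd(ssd_blocks, popularity):
    """popularity: per-HDD access counts -> per-HDD SSD block allocation."""
    total = sum(popularity.values())
    return {disk: round(ssd_blocks * p / total) for disk, p in popularity.items()}

print(partition_ssd(100, {"hdd0": 60, "hdd1": 30, "hdd2": 10}))
# -> {'hdd0': 60, 'hdd1': 30, 'hdd2': 10}
```

Re-running this as popularity counts change gives the dynamic repartitioning described above.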
Disk-level prefetching flow:
1. start: R = {r_i, r_{i+1}, ..., r_j}; compute P(r), block(r), and disk(r).
2. If the data is in the solid-state disks, no prefetching is needed.
3. Otherwise, apply PaSSD and fetch the data from as many disks as possible.
4. If the solid-state disks are full, apply the LRU eviction policy.
Server Access Model: access time with no prefetching; access time with prefetching.
The ultimate goal of our analytical model is to provide criteria that mathematically evaluate the performance of our algorithm. The average access time improvement S is defined as
S = (access time when prefetching is not carried out) / (access time when prefetching is carried out).
In the server access model, we consider multiple users accessing the network through the FTP server, modeled as an M/G/1 round-robin queuing system. In this system, the average time to finish a job that requires service time x is T(x) = x / (1 - ρ), where ρ is the system utilization. A job is defined as the retrieval of an object; therefore, this equation gives the average retrieval time of an item.
s̄ = s̄′ + s̄″, where s̄′ is the size of the object portion located on disks and s̄″ is the size of the portion located on tapes.
The average service time x̄ is calculated as follows.
Prefetching causes a proportion h_s of user requests to hit in the solid-state disks, which means this portion is served by the solid-state disks. The miss ratio f_s = 1 - h_s means those requests are located in the disk system and/or the tapes. The portion h_d hits in the disk system, and f_d = 1 - h_s - h_d means the request is served by the tape storage.
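These ratios determine the average access time across the tiers; a small sketch, where r_d and r_t are illustrative per-tier retrieval costs and an SSD hit is assumed to cost 0:

```python
# Sketch: average access time from the tiered hit ratios defined above.
# h_s hits the SSDs (cost ~0), h_d hits the disks (cost r_d), and the
# remainder f_d = 1 - h_s - h_d goes to tape (cost r_t). Values illustrative.
def avg_access_time(h_s, h_d, r_d, r_t):
    f_d = 1 - h_s - h_d            # fraction served by tape storage
    return h_s * 0 + h_d * r_d + f_d * r_t

print(round(avg_access_time(0.5, 0.3, 10.0, 100.0), 2))  # 23.0
```

The tape term dominates even at a small f_d, which is why raising the upper-tier hit ratios via prefetching pays off.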
n(F) denotes the average number of items to be prefetched: from the disks to the solid-state disks, and from the tapes to the disks.
p_1 is the probability of an item being prefetched from the disk system to the solid-state disks; p_2 is the probability of an item being prefetched from the tapes.
The hit ratio in the disk system will be increased by the number of prefetched items. When data objects are prefetched from the disk system to the solid-state disks, the hit ratio in the solid-state disks is expected to rise.
Access time with prefetching (a hit in the solid-state disks costs ≈ 0):
t̄ = h_s′ · 0 + h_d′ · r̄_d + (1 − h_s′ − h_d′) · r̄_t
where the prefetch-adjusted ratios follow from n(F), p_1, and p_2:
f_s′ = 1 − h_s − n(F) · p_1  (solid-state miss ratio)
f_d′ = f_d − n(F) · p_2  (ratio served by tapes)
so that h_s′ = h_s + n(F) · p_1 and h_d′ = 1 − h_s′ − f_d′. The post-prefetch utilization is obtained by substituting these ratios, together with the portion sizes s̄′ and s̄″ and the arrival rate λ, into the utilization formula.
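Putting the model together, the improvement ratio S can be computed numerically. This sketch assumes prefetching shifts hit mass by n(F)·p_1 (disk to SSD) and n(F)·p_2 (tape to disk), one reading of the slides' model; all input values are illustrative:

```python
# Sketch: improvement ratio S = t_no_prefetch / t_prefetch. Prefetching is
# assumed to raise the SSD hit ratio by n_F*p1 (items moved off the disks)
# and the disk hit ratio by n_F*(p2 - p1) (items arriving from tape minus
# items promoted to SSD). Values are illustrative, not measured.
def access_time(h_s, h_d, r_d, r_t):
    return h_d * r_d + (1 - h_s - h_d) * r_t   # SSD hits cost ~0

def improvement(h_s, h_d, n_F, p1, p2, r_d, r_t):
    t_no = access_time(h_s, h_d, r_d, r_t)
    t_pre = access_time(h_s + n_F * p1, h_d + n_F * (p2 - p1), r_d, r_t)
    return t_no / t_pre

print(round(improvement(0.5, 0.3, 0.1, 0.6, 0.4, 10.0, 100.0), 2))  # 1.22
```

An S above 1 means prefetching reduced the average access time, matching the definition of S given earlier.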
The use of large-scale parallel disk systems continues to rise as the demands of data-intensive applications with large capacities grow. Traditional storage systems scale up storage capacity by employing more hard disk drives, which tends to be an expensive solution due to the ever-increasing cost of adding HDDs. In hybrid storage systems, judiciously transferring data back and forth among SSDs, HDDs, and tapes is critical for I/O performance. We proposed a multi-layer prefetching algorithm (PreHySys) that reduces the miss rate of the high-end storage components, thereby reducing the average response time for data requests in hybrid storage systems.