Differentiated Storage Services
M. Mesnier, J.B. Akers, F. Chen, T. Luo
Presentation by Szymon Bachnij
Introduction
● DSS is a proposed I/O classification architecture
● we want to define separate classes of I/O
● our goal is to assign a storage system policy to each of those classes in order to manage data and I/O requests efficiently
Challenges:
● computer system performance depends on the storage system
● storage systems are becoming more and more complex
● the storage system needs some information in order to optimize anything
● ... but passing it too much information is not a good idea
Requirements
Operating system:
● a classifier is associated with every I/O request
● a new field must be added to every OS structure that describes an I/O request, and it must always be copied into the actual I/O command (SCSI, ATA)
● the OS scheduler needs to be changed
Filesystem:
● must have its own classification scheme
● each class has its own policy
● the class of an I/O can change (e.g. when a file changes its size)
Storage system:
● must extract the classifier, find the appropriate policy and enforce it
● does not need to remember the class of each data block
● has to inform about changes in a block's location
Application:
● the O_CLASSIFIED flag must be passed when opening a file in order to use DSS
● the POSIX scatter/gather operations are overloaded
● changes in the VFS are essential in order to handle DSS features
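To make the overloaded scatter/gather idea concrete, here is a minimal user-space sketch. It assumes the class ID travels as one extra, leading iovec element of a single byte; that layout, the names dss_writev, dss_demo and dss_last_class, and the shim itself are hypothetical and are not the presentation's kernel code. A DSS-aware kernel would do the equivalent inside the VFS for files opened with O_CLASSIFIED.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical shim: records the class seen on the most recent call. */
static int dss_last_class = -1;

/* Assumption: iov[0] points at a 1-byte class ID, iov[1..] hold the payload. */
static ssize_t dss_writev(int fd, const struct iovec *iov, int iovcnt)
{
    assert(iovcnt >= 2 && iov[0].iov_len == 1);
    dss_last_class = *(const unsigned char *)iov[0].iov_base;
    /* Strip the classifier and forward only the payload to writev(). */
    return writev(fd, iov + 1, iovcnt - 1);
}

/* Writes a classified payload into a pipe; returns bytes written or -1. */
static ssize_t dss_demo(unsigned char class_id, char *payload, size_t len)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    struct iovec iov[2] = {
        { .iov_base = &class_id, .iov_len = 1 },
        { .iov_base = payload,   .iov_len = len },
    };
    ssize_t n = dss_writev(fds[1], iov, 2);
    close(fds[0]);
    close(fds[1]);
    return n;
}
```

Calling dss_demo(7, msg, 5) should write the 5 payload bytes and leave dss_last_class set to 7; the class byte itself never reaches the file descriptor.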
Implementation
Operating system
● interface for classifying I/O requests
Operating system
● the class is then copied from the BIO into the 5-bit, vendor-specific Group Number field in byte 6 of the SCSI CDB:
  SCpnt->cmnd[6] = SCpnt->request->bio->bi_class;
● adding I/O classification is a matter of tracking an I/O request from the filesystem through the block layer to the device drivers
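The copy into the CDB can be tried outside the kernel with a small sketch. It assumes a 10-byte CDB in which the Group Number occupies the low 5 bits of byte 6; struct mock_bio, dss_set_class and dss_get_class are hypothetical stand-ins, not kernel APIs. Unlike the one-line assignment on the slide, this version masks the value so the upper 3 bits of the byte are preserved.

```c
#include <assert.h>
#include <stdint.h>

#define DSS_CLASS_MASK 0x1Fu  /* low 5 bits of CDB byte 6 (Group Number) */

/* Hypothetical stand-in for the kernel BIO with the added class field. */
struct mock_bio {
    uint8_t bi_class;
};

/* Host side: store the class in byte 6, keeping the upper 3 bits intact. */
static void dss_set_class(uint8_t cdb[10], const struct mock_bio *bio)
{
    assert(bio->bi_class <= DSS_CLASS_MASK);
    cdb[6] = (uint8_t)((cdb[6] & ~DSS_CLASS_MASK) |
                       (bio->bi_class & DSS_CLASS_MASK));
}

/* Storage side: extract the classifier from a received CDB. */
static uint8_t dss_get_class(const uint8_t cdb[10])
{
    return (uint8_t)(cdb[6] & DSS_CLASS_MASK);
}
```

A 5-bit field gives 32 possible class IDs, which matches the number of available IDs mentioned on the file system slides.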
File system
● goal: tell the storage system which blocks should be cached and in what order cached blocks should be evicted
File system
● class ID and priority may change
● we use 19 of the 32 available IDs
● the lower the number, the higher the priority
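One possible way to hold such a scheme is a small table indexed by class ID. The sketch below is only an illustration: the table layout, the DSS_UNUSED marker and the function names are assumptions, and the IDs and priorities in the usage note are made up rather than taken from the presentation.

```c
#include <assert.h>
#include <stdint.h>

#define DSS_NUM_CLASSES 32    /* a 5-bit classifier allows 32 IDs */
#define DSS_UNUSED      0xFF  /* hypothetical marker: class has no policy */

/* priority per class ID; a lower number means a higher priority */
static uint8_t dss_priority[DSS_NUM_CLASSES];

static void dss_policy_init(void)
{
    for (int i = 0; i < DSS_NUM_CLASSES; i++)
        dss_priority[i] = DSS_UNUSED;
}

/* Priorities may change at run time, so this can be called repeatedly. */
static void dss_policy_set(uint8_t class_id, uint8_t priority)
{
    assert(class_id < DSS_NUM_CLASSES);
    dss_priority[class_id] = priority;
}

/* Nonzero if class a outranks class b (the lower number wins). */
static int dss_outranks(uint8_t a, uint8_t b)
{
    assert(a < DSS_NUM_CLASSES && b < DSS_NUM_CLASSES);
    return dss_priority[a] < dss_priority[b];
}
```

For example, after dss_policy_set(1, 0) and dss_policy_set(3, 9), class 1 outranks class 3; a later dss_policy_set(3, 0) changes that without touching any other class.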
File system
● provided POSIX interface for user-level I/O
File system
● example for PostgreSQL
Storage system
Baseline algorithm:
● at the beginning we have a ‘free list’ of allocations
● when a data block is cached, its allocation is moved to the ‘dirty list’
● when the ‘free list’ drops below some level, the ‘syncer daemon’ begins to clean the ‘dirty list’
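The baseline behaviour can be sketched with counters instead of real lists. The pool size and both watermarks are hypothetical values chosen for the example, and the point at which the syncer stops (HIGH_WATERMARK) is an assumption; the slide only says when it starts.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE      8  /* total cache allocations (hypothetical) */
#define LOW_WATERMARK  2  /* syncer starts when free_count drops below this */
#define HIGH_WATERMARK 4  /* assumption: syncer stops at this many free */

struct cache {
    size_t free_count;   /* allocations on the free list */
    size_t dirty_count;  /* allocations on the dirty list */
    size_t cleaned;      /* total allocations cleaned by the syncer */
};

static void cache_init(struct cache *c)
{
    c->free_count = POOL_SIZE;
    c->dirty_count = 0;
    c->cleaned = 0;
}

/* Syncer daemon: clean dirty allocations and return them to the free list. */
static void syncer_run(struct cache *c)
{
    while (c->dirty_count > 0 && c->free_count < HIGH_WATERMARK) {
        c->dirty_count--;
        c->free_count++;
        c->cleaned++;
    }
}

/* Cache one block: move an allocation from the free list to the dirty list. */
static int cache_block(struct cache *c)
{
    if (c->free_count == 0)
        return 0;  /* nothing free, block is not cached */
    c->free_count--;
    c->dirty_count++;
    if (c->free_count < LOW_WATERMARK)
        syncer_run(c);
    assert(c->free_count + c->dirty_count == POOL_SIZE);
    return 1;
}
```

With these values the first six calls to cache_block leave two free allocations and never wake the syncer; the seventh drops below the low watermark and the syncer cleans three allocations.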
Storage system
Selective allocation:
● the caching decision is not based on request size
● metadata and small files are always cached
● large files are cached conditionally (depending on the state of the ‘syncer daemon’)
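The allocation rule can be written as a single predicate. This sketch assumes that "depends on the syncer daemon state" means large files are cached only while the syncer is not running; the enum and the function name are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum dss_kind { DSS_METADATA, DSS_SMALL_FILE, DSS_LARGE_FILE };

/* Decide whether a request of the given kind gets a cache allocation. */
static bool dss_should_allocate(enum dss_kind kind, bool syncer_active)
{
    switch (kind) {
    case DSS_METADATA:
    case DSS_SMALL_FILE:
        return true;            /* always cached */
    case DSS_LARGE_FILE:
        return !syncer_active;  /* assumption: only while the cache is not under pressure */
    }
    assert(!"unknown class");
    return false;
}
```

Note that the request size never appears as a parameter: the decision uses only the class and the cache state.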
Storage system
Selective eviction:
● is not an LRU algorithm
● entries with the lowest priority are evicted first
● if that is not enough, entries with the next-lowest priority are evicted
● metadata and small files rarely leave the cache
● large files are usually evicted because of their priority, but also because of their size
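A minimal sketch of priority-ordered eviction follows. It assumes a flat array of cache entries and evicts within one priority level in array order, which is a simplification; struct entry and dss_evict are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct entry {
    unsigned prio;    /* lower number = higher priority */
    size_t   blocks;  /* cache blocks held by this entry */
    bool     cached;
};

/* Evict lowest-priority entries first until `needed` blocks are freed.
 * Returns the number of blocks actually freed. */
static size_t dss_evict(struct entry *e, size_t n, size_t needed)
{
    size_t freed = 0;
    while (freed < needed) {
        /* Find the numerically largest (lowest) priority still cached. */
        bool found = false;
        unsigned worst = 0;
        for (size_t i = 0; i < n; i++) {
            if (e[i].cached && (!found || e[i].prio > worst)) {
                worst = e[i].prio;
                found = true;
            }
        }
        if (!found)
            break;  /* cache is empty */
        /* Evict entries at that level, then move on to the next level. */
        for (size_t i = 0; i < n && freed < needed; i++) {
            if (e[i].cached && e[i].prio == worst) {
                e[i].cached = false;
                freed += e[i].blocks;
            }
        }
    }
    return freed;
}
```

With one metadata entry (priority 0), one small file (priority 1) and two large files (priority 5), a request for 12 blocks is satisfied entirely from the large files, which is why metadata and small files rarely leave the cache.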
Evaluation
Environment
● single Linux machine (Fedora 13)
● kernel version: 2.6.34
● 8-core system with 8GB of RAM
● file system: Ext3
● storage device: 5-disk LSI RAID-1E array
● cache: Intel 32GB X25-E SSD
Test methodology
● a workload generator that takes as input: file size distribution, request file size, read/write ratio, number of subdirectories
File server
● file server workload based on SPECsfs2008
● over 262,000 files and 8,500 directories created
● over 262,000 transactions performed
● read/write ratio is 2:1
● 184GB of memory used
● 18GB cache
E-mail server
● e-mail server workload based on a study of e-mail server file sizes
● 1 million files, 1,000 directories
● 1 million transactions performed
● read/write ratio is 2:1
● 204GB of memory used
● 20GB cache
Results
Database
● database used: PostgreSQL
● highest priority for: metadata, user tables, log files and temporary tables (all in one class)
● index files have lower priority
● 8GB cache
Database results
The end