FAST: Quick Application Launch on Solid-State Drives
Yongsoo Joo (1), Junhee Ryu (2), Sangsoo Park (1), and Kang G. Shin (1,3)
(1) Ewha Womans University, Korea
(2) Seoul National University, Korea
(3) University of Michigan, USA
Application Launch Delay
- Elapsed time between two events
  - A user clicks the icon
  - The application becomes responsive
- Important for interactive applications
  - Critically affects user satisfaction
Application Launch Performance
- Moore's law not applicable
  - Faster CPUs and larger main memory do not help
  - HDD seek and rotational latencies improve only slowly
[Figure: four trend plots over time: (a) CPU performance (MIPS), (b) peak bandwidth of DRAMs (Gbit/s), (c) peak bandwidth of HDDs (Mbit/s), (d) disk access latency (ms), showing average seek time and average rotational latency. CPU performance, DRAM throughput, and HDD throughput show exponential improvement; HDD access latency shows linear improvement.]
Application Launch Performance
- Application launch breakdown
[Figure: breakdown of application launch time; axis and legend labels lost in extraction.]
SW-Level Optimization
- Many SW-level schemes deployed in OSes
  - Application defragmentation, Superfetch, readahead, BootCache, etc.
- Sorted prefetch (e.g., Windows prefetch)
  - Obtain the set of accessed blocks for each application
    - Monitor I/O requests during an application launch
  - Pause the target application upon detection of its launch
  - Prefetch the predetermined set of blocks in their LBA order
    - Reduces the total seek distance of the disk head
  - Resume the launch after the prefetch completes
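The LBA-ordering step of sorted prefetch can be sketched as follows. This is a minimal illustration, not the Windows implementation: the `block_req` type and the `sort_by_lba` function are hypothetical names introduced here, and the profiled requests are assumed to be already collected in an array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One profiled block request (hypothetical type): start LBA and length. */
struct block_req {
    uint64_t lba;
    uint32_t nsect;
};

/* qsort comparator: ascending start LBA, written to avoid overflow. */
static int cmp_lba(const void *a, const void *b)
{
    const struct block_req *x = a;
    const struct block_req *y = b;
    return (x->lba > y->lba) - (x->lba < y->lba);
}

/* Reorder the profiled requests into LBA order, so that an HDD would
 * serve them with a shorter total seek distance. */
void sort_by_lba(struct block_req *reqs, size_t n)
{
    assert(reqs != NULL || n == 0);
    qsort(reqs, n, sizeof reqs[0], cmp_lba);
}
```

A prefetcher would then issue the requests in the resulting array order before resuming the paused application.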
SW-Level Optimization
- How sorted prefetch works
[Figure: HDD track position over time (x-axis not to scale). Without sorted prefetch: launch start, then launch completion. With sorted prefetch: launch detection, prefetcher execution, launch resumption, CPU computation, launch completion; typical improvement: 40%.]
Flash-based SSD
- The single most effective way to eliminate disk head positioning delay
  - Acrobat Reader: 4.0s -> 0.8s (84% reduction)
  - Matlab: 16.0s -> 5.1s (68% reduction)
- Characteristics
  - Consists of multiple NAND flash chips
  - No mechanical moving parts
  - Uniform access latency (a few hundred microseconds)
- Prices now affordable
  - 80 GB MLC SSD: less than $200
Motivation
- Question: Are we satisfied with application launch on SSDs?
  - Yes for lightweight applications (e.g., less than 1 sec)
  - No for heavy applications (e.g., more than 5 sec)
    - Far from ultimate user satisfaction
  - Faster application launch is always good (at least, not bad)
- Growing need for launch optimization on SSDs
  - Applications are getting HEAVIER
    - More blocks to be read
  - SSD random read performance improves slowly
    - Bounded by single-chip performance
HDD-Aware Optimizers on SSD
- Question: Will traditional HDD optimizers work for SSDs?
  - Consensus: they will not be effective on SSDs
  - Rationale: they mostly optimize disk head movement
    - No disk head in SSDs
  - Often recommended not to use them on SSDs
- Microsoft Windows 7
  - HDD-aware optimizers disabled upon detection of an SSD
  - Windows prefetch, application defragmentation, Superfetch, ReadyBoost, etc.
Sorted Prefetch on SSDs
- No benefit from LBA sorting
  - Uniform access latency of SSDs
- Launch performance still improves
  - Increased effective queue depth (0.3 -> 3.4, app: Eclipse)
  - Observed 7% launch time reduction: better than nothing!
[Figure: queue depth (0 to 32) over time (sec): (a) cold start (no prefetcher), average QD 0.3; (b) baseline prefetcher and (c) baseline prefetcher (zoomed in), average QD 3.4.]
FAST: Fast Application STarter
- Overlap CPU computation with SSD accesses
[Figure: launch timelines from 0 to t_launch, with SSD accesses s1..s4 and CPU computations c1..c4. (a) Cold start scenario: the application alternates s1, c1, s2, c2, s3, c3, s4, c4. (b) Warm start scenario: the application runs only c1, c2, c3, c4. (c) Proposed prefetching (t_cpu > t_ssd): the application runs c1..c4 while the prefetcher issues s1..s4 in parallel.]
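The three timelines can be summarized with a simplified model. This is a sketch under stated assumptions, not a formula from the slides: let $s_i$ and $c_i$ be the durations of the $i$-th SSD access and CPU computation, define $t_{\text{ssd}} = \sum_i s_i$ and $t_{\text{cpu}} = \sum_i c_i$, and ignore prefetcher overhead and contention.

```latex
\begin{align*}
t_{\text{cold}} &= \sum_{i=1}^{n} (s_i + c_i) = t_{\text{ssd}} + t_{\text{cpu}} \\
t_{\text{warm}} &= \sum_{i=1}^{n} c_i = t_{\text{cpu}} \\
t_{\text{prefetch}} &\ge \max\left(t_{\text{cpu}},\, t_{\text{ssd}}\right)
\end{align*}
```

Under these assumptions, when $t_{\text{cpu}} > t_{\text{ssd}}$ the best case for the overlapped launch approaches the warm-start time $t_{\text{cpu}}$.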
Application Launch Sequence
- Deterministic block requests over repeated launches
  - Raw block request traces
  - Application launch sequence
[Figure: raw block request traces from repeated launches contain the blocks b1 to b5 mixed with block requests unrelated to the application launch; the application launch sequence keeps only the launch-related block requests.]
What to Do
- Application launch sequence profiling
  - Using the blktrace tool
- Prefetcher generation
  - Replay block requests according to the application launch sequence
- Prefetcher execution
  - Simultaneously with the original application
  - By wrapping the system call exec()
    - LD_PRELOAD
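The "simultaneously with the original application" step can be sketched as below. This is a minimal illustration under assumptions, not the authors' wrapper: `spawn_prefetcher` and `wait_for_prefetcher` are hypothetical helper names, and the sketch shows only the part a wrapped exec() might call before forwarding to the real exec(); the LD_PRELOAD interposition itself is left out.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Start a prefetcher program as a separate process and return at once,
 * so that it runs concurrently with the caller (the launching application).
 * argv[0] is the path of the prefetcher binary (hypothetical).
 * Returns the child's PID, or -1 if fork() fails.
 */
pid_t spawn_prefetcher(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();
    if (pid == 0) {
        execv(argv[0], argv);   /* returns only on failure */
        _exit(127);             /* conventional "could not exec" status */
    }
    return pid;                 /* parent: child PID, or -1 on fork failure */
}

/*
 * Optionally reap the prefetcher later.
 * Returns its exit status, or -1 if it did not exit normally.
 */
int wait_for_prefetcher(pid_t pid)
{
    int status = 0;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Because the parent does not block after `fork()`, the application's own launch proceeds while the prefetcher issues its read hints.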
Prefetcher Generation
- Example application launch sequence
  - AB -> C -> D
- Block-level I/O: (start LBA, size)
  - (5, 2) -> (1, 1) -> (7, 1)   <- obtainable from blktrace
- File-level I/O: (filename, offset, size)
  - ("b.so", 2, 2) -> ("a.conf", 1, 1) -> ("c.lib", 0, 1)

File             "a.conf"   "b.so"     "c.lib"
File offset      0 1 2      0 1 2 3    0 1 2
Accessed block     C            A B    D
LBA ("/dev/sda") 0 1 2      3 4 5 6    7 8 9
Prefetcher Generation
- Block-level I/O replay

    #define _GNU_SOURCE   /* for O_LARGEFILE */
    #include <fcntl.h>

    int main(void) {
        int fd = open("/dev/sda", O_RDONLY | O_LARGEFILE);
        if (fd < 0) return 1;
        /* posix_fadvise(fd, LBA * 512, size * 512, advice) */
        posix_fadvise(fd, 5 * 512, 2 * 512, POSIX_FADV_WILLNEED);
        posix_fadvise(fd, 1 * 512, 1 * 512, POSIX_FADV_WILLNEED);
        posix_fadvise(fd, 7 * 512, 1 * 512, POSIX_FADV_WILLNEED);
        return 0;
    }

[Figure: file-to-LBA layout of "a.conf", "b.so", and "c.lib" on "/dev/sda", with accessed blocks A, B, C, D marked.]
Page Cache Structure
[Figure: page cache organized by inode (/dev/sda, a.conf, b.so, c.lib); the cached blocks A, B, C, D are held under /dev/sda.]
Page Cache Structure
[Figure: page cache organized by inode (/dev/sda, a.conf, b.so, c.lib); the cached blocks A, B, C, D are held under /dev/sda, so lookups through a.conf, b.so, and c.lib each miss.]
Page Cache Structure
[Figure: page cache organized by inode (/dev/sda, a.conf, b.so, c.lib); what we need to construct: the blocks A, B, C, D cached under the inodes of the files a.conf, b.so, and c.lib.]
Prefetcher Generation
- File-level I/O replay

    #include <fcntl.h>

    int main(void) {
        /* posix_fadvise(fd, file offset * 512, size * 512, advice) */
        int fd1 = open("b.so", O_RDONLY);
        posix_fadvise(fd1, 2 * 512, 2 * 512, POSIX_FADV_WILLNEED);
        int fd2 = open("a.conf", O_RDONLY);
        posix_fadvise(fd2, 1 * 512, 1 * 512, POSIX_FADV_WILLNEED);
        int fd3 = open("c.lib", O_RDONLY);
        posix_fadvise(fd3, 0 * 512, 1 * 512, POSIX_FADV_WILLNEED);
        return 0;
    }

[Figure: file-to-LBA layout of "a.conf", "b.so", and "c.lib" on "/dev/sda", with accessed blocks A, B, C, D marked.]
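The repeated open/posix_fadvise pairs of the file-level replay can be folded into one helper. This is a minimal sketch with error handling added; `prefetch_range` is a hypothetical name, not part of the generated prefetcher shown in the slides.

```c
#define _POSIX_C_SOURCE 200809L   /* declares posix_fadvise() on glibc */
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hint the kernel to load bytes [offset, offset + len) of `path` into the
 * page cache. Returns 0 on success, -1 if the file cannot be opened or the
 * hint is rejected.
 */
int prefetch_range(const char *path, off_t offset, off_t len)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* posix_fadvise() returns an error number directly; it does not set errno. */
    int err = posix_fadvise(fd, offset, len, POSIX_FADV_WILLNEED);

    /* Cached pages belong to the file, not to this descriptor, so it can be closed. */
    close(fd);
    return err == 0 ? 0 : -1;
}
```

With this helper, the first replay step of the example would read `prefetch_range("b.so", 2 * 512, 2 * 512);`.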
Block-to-File Level I/O Conversion
- LBA-to-inode mapping
  - Not supported by the EXT file system

Block-level (start LBA, size)    File-level (filename, offset, size)
(5, 2)                           ("b.so", 2, 2)
(1, 1)                           ("a.conf", 1, 1)
(7, 1)                           ("c.lib", 0, 1)

[Figure: file-to-LBA layout of "a.conf", "b.so", and "c.lib" on "/dev/sda", with accessed blocks A, B, C, D marked.]
Block-to-File Level I/O Conversion
- Inode-to-LBA map for a single file
  - Easy to build
- LBA-to-inode map for the entire file system
  - Millions of files in a file system
  - Frequently changed
- Only a few hundred files are used by a single application
- Our approach: build a partial map for each application
  - Determine the set of files used for the launch
  - Monitor system calls that take a filename as an argument
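The conversion through a per-application partial map can be sketched as a lookup over file extents. This is an illustration under assumptions: the `extent` and `file_req` types and the `lba_to_file` function are hypothetical, units are blocks as in the slide example, and a request that spans two extents is simply rejected rather than split.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One contiguous run of a file on disk (hypothetical type). */
struct extent {
    const char *filename;
    uint64_t file_off;   /* offset of the run within the file */
    uint64_t lba;        /* first LBA of the run */
    uint64_t len;        /* length of the run */
};

/* File-level request produced by the conversion. */
struct file_req {
    const char *filename;
    uint64_t offset;
    uint64_t size;
};

/*
 * Translate a block-level request (lba, size) into a file-level request
 * using the partial map of the files one application touches.
 * Returns 0 on success, -1 if no single extent contains the whole request.
 */
int lba_to_file(const struct extent *map, size_t n,
                uint64_t lba, uint64_t size, struct file_req *out)
{
    assert(out != NULL);

    for (size_t i = 0; i < n; i++) {
        const struct extent *e = &map[i];
        if (lba >= e->lba && lba + size <= e->lba + e->len) {
            out->filename = e->filename;
            out->offset = e->file_off + (lba - e->lba);
            out->size = size;
            return 0;
        }
    }
    return -1;
}
```

With the example layout ("a.conf" at LBA 0 to 2, "b.so" at LBA 3 to 6, "c.lib" at LBA 7 to 9), the block request (5, 2) maps to ("b.so", 2, 2). A linear scan is enough here because the map covers only the few hundred files of one application.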
Application Prefetcher
- Automatically generated application prefetcher for Gimp

    int main(void) {
        ...
        readlink("/etc/fonts/conf.d/90-ttf-arphic-uming-embolden.conf", linkbuf, 256);
        int fd423;
        fd423 = open("/etc/fonts/conf.d/90-ttf-arphic-uming-embolden.conf", O_RDONLY);
        posix_fadvise(fd423, 0, 4096, POSIX_FADV_WILLNEED);
        posix_fadvise(fd351, 286720, 114688, POSIX_FADV_WILLNEED);
        int fd424;
        fd424 = open("/usr/share/fontconfig/conf.avail/90-ttf-arphic-uming-embolden.conf", O_RDONLY);
        posix_fadvise(fd424, 0, 4096, POSIX_FADV_WILLNEED);
        int fd425;
        fd425 = open("/root/.gnupg/trustdb.gpg", O_RDONLY);
        posix_fadvise(fd425, 0, 4096, POSIX_FADV_WILLNEED);
        dirp = opendir("/var/cache/");
        if (dirp) while (readdir(dirp));
        ...
        return 0;
    }