Transparent Tiering Today
[Figure: write, free() and read requests act on pages in RAM; pages are swapped between RAM and an SSD used as a log-structured page store. Indirection is in the OS table or in the FTL.]

Non-Transparent Tiering
• Redesign application to be flash aware
• Custom object store with custom pointers
• Reads, writes and garbage collection at an application object granularity
• Avoid in-place writes (objects could be small)
• Obtain the best performance and lifetime from flash memory device
• Intrusive modifications needed
• Expertise with flash memory needed

Non-Transparent Tiering

malloc + SSD-swap:
    MyObject* obj = malloc( sizeof( MyObject ) );
    obj->x = 0;
    obj->y = 1;
    obj->z = 2;
    free( obj );

Application rewrite:
    MyObjectID oid = createObject( sizeof( MyObject ) );
    MyObject* obj = malloc( sizeof( MyObject ) );
    readObject( oid, obj );
    obj->x = 0;
    obj->y = 1;
    obj->z = 2;
    writeObject( oid, obj );
    free( obj );

Our Goal
• Run mostly unmodified applications
  • Work via memory allocators in C-style programs
• Use the DRAM effectively
  • Use it as an object cache (not as a page cache)
• Use the SSD wisely
  • As a log-structured object store
• Reorganize virtual memory allocation to discern object information

SSDAlloc Overview
[Figure: the memory manager creates 64 objects of 1KB size. In the application's virtual memory each object occupies its own page (Object Per Page, OPP). Physical memory holds a page buffer and a RAM object cache; the SSD is a log-structured object store.]

SSDAlloc Options
                  Object Per Page (OPP)           Memory Page (MP)
Data Entity       Application-defined objects     4KB objects (like pages)
Memory Manager    Pool allocator                  Coalescing allocator
Virtual Memory    No. of objects * page_size      No. of pages * page_size
Physical Memory   Separate page buffer and        No such separation
                  RAM object cache
SSD Usage         Log-structured object store     Log-structured page store
Code Changes      Minimal changes, restricted     No changes needed
                  to memory allocation

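The Virtual Memory row can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical (they are not part of SSDAlloc), and the MP helper assumes allocations are packed densely into pages.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers, not part of SSDAlloc: virtual address space
 * reserved under each option, following the Virtual Memory row. */

/* OPP: every object gets its own virtual page, whatever its size. */
size_t opp_virtual_bytes(size_t n_objects, size_t page_size) {
    return n_objects * page_size;
}

/* MP: assumes objects are packed densely, so only whole pages are reserved. */
size_t mp_virtual_bytes(size_t object_size, size_t n_objects, size_t page_size) {
    assert(page_size > 0);
    size_t total = object_size * n_objects;
    size_t n_pages = (total + page_size - 1) / page_size;  /* round up */
    return n_pages * page_size;
}
```

For 64 objects of 1KB each and a 4KB page, opp_virtual_bytes(64, 4096) gives 256KB of virtual address space for 64KB of data, while mp_virtual_bytes(1024, 64, 4096) gives 64KB. This 4x gap is one way to see why OPP needs its page buffer kept small.
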
SSDAlloc Overview
[Figure: layers labeled Application, Virtual Memory, Page Buffer, RAM Object Cache and SSD, with arrows labeled Demand Fetching and Dirty Objects.]
• A small set of pages in core (the page buffer)
• Pages materialized on demand from the RAM object cache or the SSD
• Restricted in size to minimize RAM wastage (from OPP)
• Implemented using mprotect
• Page materialized in the seg-fault handler
• The RAM object cache continuously flushes dirty objects to the SSD in LRU order

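The mprotect and seg-fault-handler bullets can be sketched as follows. This is a minimal illustration of the general technique on Linux, not SSDAlloc's actual code: all names (backing, opp_init, opp_store, opp_ptr, opp_fault_count) are hypothetical, an in-memory array stands in for the RAM object cache and SSD, and dematerialization (evicting pages from the buffer) is left out.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define N_OBJECTS 4
#define OBJ_SIZE  64

/* Stand-in for the RAM object cache / SSD: one small object per slot. */
static char backing[N_OBJECTS][OBJ_SIZE];
static char *region;                  /* N_OBJECTS virtual pages, one per object */
static size_t page_size;
static volatile sig_atomic_t faults;  /* number of pages materialized so far */

/* Fault handler: find the object behind the faulting page, make the page
 * accessible, and copy the object in. mprotect is not on the POSIX
 * async-signal-safe list; calling it here is common practice on Linux. */
static void on_segv(int sig, siginfo_t *info, void *ctx) {
    (void)sig; (void)ctx;
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)region;
    if (addr < base || addr >= base + N_OBJECTS * page_size)
        _exit(1);                     /* a genuine crash, not one of our pages */
    size_t idx = (addr - base) / page_size;
    char *page = region + idx * page_size;
    if (mprotect(page, page_size, PROT_READ | PROT_WRITE) != 0)
        _exit(1);
    memcpy(page, backing[idx], OBJ_SIZE);
    faults++;
}

/* Reserve the virtual pages with no access rights and install the handler. */
int opp_init(void) {
    page_size = (size_t)sysconf(_SC_PAGESIZE);
    region = mmap(NULL, N_OBJECTS * page_size, PROT_NONE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED) return -1;
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}

/* Put an object into the backing store (the "SSD" of this sketch). */
void opp_store(size_t idx, const char *text) {
    assert(idx < N_OBJECTS);
    strncpy(backing[idx], text, OBJ_SIZE - 1);
    backing[idx][OBJ_SIZE - 1] = '\0';
}

/* Pointer the application would hold: the start of the object's own page. */
char *opp_ptr(size_t idx) {
    assert(idx < N_OBJECTS);
    return region + idx * page_size;
}

int opp_fault_count(void) { return (int)faults; }
```

After opp_init() and opp_store(2, "hello"), the first read through opp_ptr(2) faults once and the handler fills the page; later reads and writes to the same page run at normal memory speed until the page is dematerialized, which this sketch does not do.
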
SSD Maintenance
[Figure: Virtual Memory, Object Tables, RAM Object Cache and SSD; dirty objects flow from the RAM object cache to the SSD.]
• Copy-and-compact garbage collector/log writer
  • Seek optimizations not needed
• Read at the head and write live and dirty objects
  • Use object tables to determine liveness
• Garbage is discarded
  • Objects written elsewhere are garbage
  • An OPP object which is "free" is garbage

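A copy-and-compact pass over a log can be sketched in a few lines. The code below is a toy model under stated assumptions, not SSDAlloc's implementation: the Store type, the function names, the fixed-size in-memory log and the integer payloads are all hypothetical. It covers only live objects; a real log writer would also append dirty objects from the RAM object cache in the same pass.

```c
#include <assert.h>
#include <stddef.h>

#define LOG_SLOTS   16
#define MAX_OBJECTS 8
#define NO_LOCATION (-1)

typedef struct { int oid; int value; } LogEntry;

typedef struct {
    LogEntry slots[LOG_SLOTS];  /* circular log standing in for the SSD */
    size_t head;                /* oldest entry in the log */
    size_t used;                /* entries between head and tail */
    int table[MAX_OBJECTS];     /* object table: oid -> slot, or NO_LOCATION */
} Store;

void store_init(Store *s) {
    s->head = 0;
    s->used = 0;
    for (int i = 0; i < MAX_OBJECTS; i++) s->table[i] = NO_LOCATION;
}

/* Append at the tail. Any older copy becomes garbage because the
 * object table now points at the new slot. */
int store_write(Store *s, int oid, int value) {
    assert(oid >= 0 && oid < MAX_OBJECTS);
    if (s->used == LOG_SLOTS) return -1;
    size_t tail = (s->head + s->used) % LOG_SLOTS;
    s->slots[tail] = (LogEntry){ oid, value };
    s->table[oid] = (int)tail;
    s->used++;
    return 0;
}

/* Freeing only clears the table entry; the log copy becomes garbage. */
void store_free(Store *s, int oid) {
    assert(oid >= 0 && oid < MAX_OBJECTS);
    s->table[oid] = NO_LOCATION;
}

int store_read(const Store *s, int oid, int *out) {
    assert(oid >= 0 && oid < MAX_OBJECTS);
    if (s->table[oid] == NO_LOCATION) return -1;
    *out = s->slots[s->table[oid]].value;
    return 0;
}

/* Read up to n entries at the head; rewrite the live ones at the tail
 * and drop the rest. Returns the number of slots reclaimed. */
size_t store_clean(Store *s, size_t n) {
    size_t reclaimed = 0;
    if (n > s->used) n = s->used;
    for (size_t i = 0; i < n; i++) {
        LogEntry e = s->slots[s->head];
        int live = (s->table[e.oid] == (int)s->head);
        s->head = (s->head + 1) % LOG_SLOTS;
        s->used--;
        if (live) store_write(s, e.oid, e.value);  /* one slot was just freed */
        else reclaimed++;
    }
    return reclaimed;
}
```

Liveness is decided purely by the object table: an entry is live only if the table still points at its slot, so both overwritten objects and freed objects are dropped without any extra bookkeeping in the log itself.
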
Implementation
• 11,000 lines of C++ code (runtime library)
• Implemented using mprotect, mmap, and madvise
• SSDAlloc-OPP pool and array allocator
• SSDAlloc-MP coalescing allocator (array allocations)
• SSDFree frees the allocated data
• Can coexist with malloc pointers

SSD Usage Techniques
Technique                 Write    Access  Fine-grained  Avoid DRAM  High         Programming
                          Logging  < 4KB   GC            Pollution   Performance  Ease
SSD Swap                  -        -       -             -           -            ✔
SSD Swap (Write Logged)   ✔        -       -             -           -            ✔
Application Rewrite       ✔        ✔       ✔             ✔           ✔            -
SSDAlloc                  ✔        ✔       ✔             ✔           ✔            ✔

SSDAlloc Runtime Overhead
• Overhead for SSDAlloc runtime intervention

Overhead Source          Max Latency
TLB Miss (DRAM Read)     0.014 μs
Object Table Lookup      0.046 μs
Page Materialization     0.138 μs
Page Dematerialization   0.172 μs
Signal Handling          0.666 μs
Combined Overhead        0.833 μs

• NAND flash latency: ~30-50 μs
• Can reach 1 million IOPS

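One way to relate these numbers, assuming a single runtime intervention per I/O and no overlap between interventions (an assumption made here, not stated on the slide):

```latex
\[
\frac{1}{0.833\,\mu\text{s}} \approx 1.2 \times 10^{6}\ \text{interventions per second},
\qquad
\frac{0.833\,\mu\text{s}}{50\,\mu\text{s}} \approx 1.7\%,
\quad
\frac{0.833\,\mu\text{s}}{30\,\mu\text{s}} \approx 2.8\%
\]
```

Under that reading, the combined overhead is consistent with the 1 million IOPS figure and adds only a few percent on top of a 30-50 μs NAND flash access.
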
Experiments
• Comparing three allocation methods
  • malloc replaced with SSDAlloc-OPP
  • malloc replaced with SSDAlloc-MP
  • Swap
• 2.4 GHz quad-core CPU with 16GB RAM
• SSDs: RiData, Kingston, Intel X25-E, Intel X25-V and Intel X25-M

Results Overview
                Original   Modified   SSDAlloc-OPP's gain vs
Application     LOC        LOC        Swap           SSDAlloc-MP
Memcached       11,193     21         5.5 - 17.4x    1.4 - 3.5x
B+Tree Index    477        15         4.3 - 12.7x    1.4 - 3.2x
Packet Cache    1,540      9          4.8 - 10.1x    1.3 - 2.3x
HashCache       20,096     36         5.3 - 17.1x    1.3 - 3.3x

• SSDAlloc applications write up to 32 times less data to the SSD than traditional VM-style applications

Microbenchmarks