The Multi-streamed Solid-State Drive
Jeong-Uk Kang*, Jeeseok Hyun, Hyunjoo Maeng, and Sangyeun Cho
Memory Solutions Lab., Memory Division, Samsung Electronics Co., Ltd.
SSD as a Drop-in Replacement of HDD
SSDs share a common interface with HDDs
• The block device abstraction (logical block addresses over SATA) paved the way for wide adoption of SSDs
[Figure: identical host stacks (Application, OS, File System, Generic Block Layer) addressing either an HDD or an SSD by logical block address]
Great, BUT…
Rotating media and NAND flash memory are very different!
• The sector-based OS stack matches what a disk natively supports: Read_Sector() and Write_Sector()
• NAND flash instead offers Read_Page(), Write_Page(), Erase_Block(), and Copy_Page(), which the sector interface does not map onto directly
[Figure: the same sector-based host stack over a disk and over NAND flash memory, contrasting the two native operation sets]
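If it helps to see the mismatch concretely, the two operation sets can be written out as C prototypes. This is only an illustrative rendering of the figure, not a real driver API; every name here is made up for the sketch.

/* Rotating disk: sectors can be read and written in place. */
void read_sector(unsigned lba, void *buf);
void write_sector(unsigned lba, const void *buf);

/* NAND flash: pages are read and programmed, but never updated in
   place -- a whole block must be erased first, and any valid pages
   it still holds must be copied elsewhere beforehand. */
void read_page(unsigned block, unsigned page, void *buf);
void write_page(unsigned block, unsigned page, const void *buf);
void erase_block(unsigned block);
void copy_page(unsigned src_block, unsigned src_page,
               unsigned dst_block, unsigned dst_page);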
The Trick is FTL!
The flash translation layer (FTL) bridges the sector-based block interface and NAND flash
• Logical block mapping
• Bad block management
• Garbage collection (GC)
[Figure: host stack (Application, sector-based OS/File System, Generic Block Layer) over an SSD whose FTL turns block reads and writes into page reads, page writes, and block erases on NAND flash memory]
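A minimal sketch of the FTL write path may make the translation concrete. It assumes a simple page-level mapping table; the identifiers are illustrative, and real FTLs layer bad-block management, wear leveling, and crash consistency on top of this.

#define NUM_LOGICAL_PAGES 1024          /* toy capacity */
#define INVALID_PPN       0xFFFFFFFFu

static unsigned l2p[NUM_LOGICAL_PAGES]; /* logical -> physical page map */

/* Assumed helpers: allocate a free NAND page (triggering GC when
   free space runs low), program it, and mark a stale page invalid. */
extern unsigned alloc_free_page(void);
extern void program_page(unsigned ppn, const void *data);
extern void mark_invalid(unsigned ppn);

void ftl_write(unsigned lpn, const void *data)
{
    unsigned old_ppn = l2p[lpn];
    unsigned new_ppn = alloc_free_page();

    program_page(new_ppn, data);        /* out-of-place update */
    if (old_ppn != INVALID_PPN)
        mark_invalid(old_ppn);          /* old copy becomes garbage */
    l2p[lpn] = new_ppn;
}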
Garbage Collection (GC)
GC reclaims space to prepare new empty blocks
• NAND's "erase-before-update" requirement sends every update to a fresh page, leaving stale copies behind
• GC copies the valid pages out of a victim block into a free block, then erases the victim
• Has a large impact on SSD lifetime and performance
[Figure: Blocks A and B holding a mix of valid and invalid pages; the valid pages are copied into a free block so that both blocks can be erased]
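The copy-then-erase procedure on this slide fits in a few lines. copy_page() and erase_block() are the NAND operations from the interface sketch earlier; the victim-selection and bookkeeping helpers are assumptions made for illustration (a greedy policy that picks the block with the fewest valid pages is a common baseline, not necessarily what a given SSD does).

#define PAGES_PER_BLOCK 64              /* toy geometry */

extern unsigned block_with_fewest_valid_pages(void);
extern unsigned get_free_block(void);
extern int page_is_valid(unsigned block, unsigned page);

void garbage_collect(void)
{
    unsigned victim = block_with_fewest_valid_pages();
    unsigned target = get_free_block();
    unsigned dst = 0;

    /* Every valid page left in the victim costs one extra internal
       write (write amplification) before the block can be reclaimed,
       which is exactly why GC hurts performance and lifetime. */
    for (unsigned pg = 0; pg < PAGES_PER_BLOCK; pg++)
        if (page_is_valid(victim, pg))
            copy_page(victim, pg, target, dst++);

    erase_block(victim);                /* victim becomes a free block */
}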
GC is Expensive!
SSD performance gradually decreases as time goes on
• Example: Cassandra update throughput falls steadily over a 40-minute run
[Figure: normalized Cassandra update throughput (ops/sec) vs. time in minutes, degrading over the run]
GC is Expensive!
SSD performance gradually decreases as time goes on, and GC overhead is why
• Example: as Cassandra update throughput falls, the number of valid pages copied by GC rises
• GC highly affects SSD performance!
[Figure: two plots over the same 40 minutes, declining update throughput (ops/sec) alongside a rising valid-page copy count]
Our Idea: Multi-streamed SSD
A general and concrete interface for SSDs
• Co-exists with the existing generic block layer
• Adds a new multi-streaming interface for the SSD
• Host-provided stream information guides desirable data placement within the SSD!
[Figure: host stack (Application, OS, File System, Generic Block Layer) with the multi-streaming interface feeding stream hints to the SSD's FTL and NAND flash memory]
End Result
The multi-streamed SSD can sustain Cassandra update throughput
[Figure: normalized update throughput (ops/sec) over 40 minutes; the proposed multi-streamed SSD holds steady while the traditional SSD degrades]
Contents
• Background
  – Write optimization in SSD
• The Multi-streamed SSD
  – Our approach
  – Case study
• Evaluation
  – Experimental setup
  – Results
• Conclusion
Effects of Write Patterns
Previous write patterns (= the drive's current state) matter
• Sequential LBA updates into Block 2 leave Block 0 entirely invalid: GC can just erase Block 0
• Random LBA updates into Block 2 leave valid pages scattered: GC needs valid page copying from Block 0 and Block 1
[Figure: the same set of LBA updates landing in Block 2 under a sequential pattern and under a random pattern, with the resulting valid/invalid layout of Blocks 0 and 1]
Stream
A stream groups writes whose data is expected to share a lifetime
• Data written to stream 1, 2, or 3 fills that stream's own set of blocks in the SSD
• Incoming data of unknown lifetime ("Lifetime?") is directed to the stream whose lifetime it matches
[Figure: writes to streams 1, 2, and 3, each filling a separate chain of NAND blocks labeled Lifetime 1, 2, and 3]
The Multi-streamed SSD
Multi-streamed SSD
• Maps data with different lifetimes to different streams
• The application provides information about data lifetime as a stream ID passed through the multi-stream interface
• The SSD places data with similar lifetimes into the same erase unit
[Figure: writes Data1…Data13 tagged with stream IDs 1, 2, or 3 pass through the generic block layer and multi-stream interface; the FTL groups each stream's data into its own NAND blocks]
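One way to realize this inside the FTL is to keep one active (open) block per stream, so writes tagged with the same stream ID fill the same erase unit. The sketch below is a plausible layout under that assumption, not the paper's implementation; all names are illustrative.

#define NUM_STREAMS     8
#define PAGES_PER_BLOCK 64

struct open_block { unsigned blk; unsigned next_pg; };

/* One open (actively written) block per stream; assume an init
   routine has already allocated the first block for each. */
static struct open_block active[NUM_STREAMS];

extern unsigned get_free_block(void);
extern void program_page_at(unsigned blk, unsigned pg, const void *data);
extern void map_update(unsigned lpn, unsigned blk, unsigned pg);

void ftl_write_streamed(unsigned lpn, const void *data, unsigned stream_id)
{
    struct open_block *ob = &active[stream_id];

    if (ob->next_pg == PAGES_PER_BLOCK) {   /* current block is full */
        ob->blk = get_free_block();
        ob->next_pg = 0;
    }
    program_page_at(ob->blk, ob->next_pg, data);
    map_update(lpn, ob->blk, ob->next_pg++);
    /* Because a block only ever holds one stream's data, its pages
       tend to expire together, leaving GC little or nothing to copy. */
}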
Working Example
Multi-streamed SSD: higher GC efficiency (lower GC overhead) translates into performance
• Without streams, the same request sequence mixes short- and long-lived data in each block, so GC must copy many valid pages
• With multi-streaming, blocks tend to become wholly invalid, reducing the valid pages to copy
• For effective multi-streaming, proper mapping of data to streams is essential!
[Figure: the same request data written into Blocks 0-2 without streams and with multi-streaming, comparing the valid pages each layout leaves for GC]
Case Study: Cassandra
Cassandra employs a size-tiered compaction strategy
• A write request is appended to the commit log and absorbed by the in-memory Memtable
• Memtables are flushed to on-disk SSTables
• Similar-sized SSTables are compacted (merged) into a larger SSTable in the next tier
[Figure: write requests flow through the Commit Log and Memtable into SSTables 1-4, which compact into SSTables 5-7 and eventually SSTable 21; keys K1-K3 are merged along the way]
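For readers unfamiliar with size-tiered compaction, here is a rough sketch of the merge step the figure depicts. The threshold of four similar-sized SSTables matches Cassandra's default for this strategy; the types and helper functions are illustrative, not Cassandra code.

#define SIZE_TIER_THRESHOLD 4           /* Cassandra's default min_threshold */

struct sstable;
struct tier {
    struct sstable *tables[SIZE_TIER_THRESHOLD];
    int             num_sstables;
    struct tier    *next_tier;
};

extern struct sstable *merge_sstables(struct sstable **in, int n);
extern void add_sstable(struct tier *t, struct sstable *s);
extern void drop_sstables(struct sstable **in, int n);

void maybe_compact(struct tier *t)
{
    if (t->num_sstables >= SIZE_TIER_THRESHOLD) {
        /* One large sequential write produces the merged table... */
        struct sstable *merged = merge_sstables(t->tables, t->num_sstables);
        add_sstable(t->next_tier, merged);
        /* ...and deleting the inputs turns their blocks on the SSD
           into garbage all at once. */
        drop_sstables(t->tables, t->num_sstables);
        t->num_sstables = 0;
    }
}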
Summary of Cassandra’s Write Patterns
Four kinds of write operations occur when Cassandra runs
• Commit-log writes
• Flushing data (Memtable flushes into new SSTables)
• Compaction data writes (merged SSTables)
• System data writes (metadata, journal, …)
[Figure: the Cassandra write path annotated with where each of the four write types originates]
Mapping #1: “Conventional”
Just one stream ID (= conventional SSD)
• Stream 0: commit-log, flushing, compaction, and system writes alike
[Figure: the write-pattern diagram with every write type labeled stream 0]
Mapping #2: “Multi-App”
Add a new stream to separate application writes (stream ID 1) from system traffic (stream ID 0)
• Stream 0: system data (metadata, journal, …)
• Stream 1: commit-log, flushing, and compaction writes
[Figure: the write-pattern diagram with Cassandra’s writes labeled stream 1 and system writes stream 0]
Mapping #3: “Multi-Log”
Use three streams; further separate the commit log
• Stream 0: system data
• Stream 1: commit-log writes
• Stream 2: flushing and compaction writes
[Figure: the write-pattern diagram with system, commit-log, and data writes labeled streams 0, 1, and 2]
Mapping #4: “Multi-Data”
Give distinct streams to different tiers of SSTables
• Stream 0: system data
• Stream 1: commit-log writes
• Stream 2: flushing data
• Streams 3 and 4: compaction writes, split by SSTable tier
[Figure: the write-pattern diagram with compaction writes to different SSTable tiers labeled streams 3 and 4]
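Written out as code, the four mappings reduce to a small stream-ID table. This only restates the preceding slides; the enum, the function, and in particular which compaction tier gets stream 3 versus stream 4 are illustrative choices.

#include <string.h>

enum write_type {
    SYSTEM_WRITE,           /* metadata, journal, ... */
    COMMIT_LOG_WRITE,
    MEMTABLE_FLUSH,
    COMPACTION_LOW_TIER,    /* smaller SSTables */
    COMPACTION_HIGH_TIER    /* larger SSTables */
};

int stream_id(enum write_type t, const char *mapping)
{
    if (strcmp(mapping, "Conventional") == 0)
        return 0;                               /* everything in one stream */

    if (strcmp(mapping, "Multi-App") == 0)
        return (t == SYSTEM_WRITE) ? 0 : 1;     /* system vs. application */

    if (strcmp(mapping, "Multi-Log") == 0) {
        if (t == SYSTEM_WRITE)     return 0;
        if (t == COMMIT_LOG_WRITE) return 1;
        return 2;                               /* flush + compaction */
    }

    /* "Multi-Data": also split compaction writes by SSTable tier
       (which tier gets 3 vs. 4 is a guess from the slide). */
    switch (t) {
    case SYSTEM_WRITE:        return 0;
    case COMMIT_LOG_WRITE:    return 1;
    case MEMTABLE_FLUSH:      return 2;
    case COMPACTION_LOW_TIER: return 3;
    default:                  return 4;
    }
}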
Experimental Setup
Multi-stream SSD prototype
• Samsung 840 Pro SSD
  – 60 GB device capacity
• Linux kernel 3.13 (modified)
  – Passes the stream ID through the fadvise() system call
  – The VFS stores the stream ID in the inode; EXT4 carries it in the buffer head down to the device
  – Application fadvise(fd, Stream ID) → VFS (inode field = Stream ID) → EXT4 (Stream ID in buffer head) → SSD
Host
• Intel i7-3770 3.4 GHz processor
• 2 GB memory
  – Accelerates SSD aging by increasing Cassandra’s flush frequency
Workload: YCSB benchmark on Cassandra
• Write-intensive workload
  – 1 KB data x 1,000,000 record count
  – 100,000,000 operation count
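The fadvise() path above suggests that, from the application's point of view, tagging a file's writes is one extra system call. A minimal sketch of how that might look follows; the advice constant is hypothetical (the prototype's modified kernel defines its own values), and a stock kernel would simply reject it with EINVAL.

#include <fcntl.h>

#define POSIX_FADV_STREAM_ID_BASE 42    /* hypothetical advice code */

/* Tag future writes through fd with the given stream ID.  An offset
   and length of 0 apply the hint to the whole file; per the slide,
   the modified VFS records the ID in the inode and EXT4 carries it
   down to the SSD in the buffer head. */
int set_stream_id(int fd, int stream_id)
{
    return posix_fadvise(fd, 0, 0, POSIX_FADV_STREAM_ID_BASE + stream_id);
}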
Results
Cassandra’s normalized update throughput
• Conventional SSD, “TRIM off”: throughput steadily degrades over the run
[Figure: normalized update throughput (ops/sec) over 40 minutes for the conventional SSD with TRIM off]
Results
Cassandra’s normalized update throughput
• Conventional SSD, “TRIM on”: TRIM gives a non-trivial improvement over TRIM off
• But still far from ideal
[Figure: normalized update throughput (ops/sec) over 40 minutes, conventional SSD with TRIM on vs. TRIM off]