

  1. The Multi-streamed Solid-State Drive
     Jeong-Uk Kang*, Jeeseok Hyun, Hyunjoo Maeng, and Sangyeun Cho
     Memory Solutions Lab., Memory Division, Samsung Electronics Co., Ltd.

  2. SSD as a Drop-in Replacement of HDD
     SSD shares a common interface with HDD
     • The block device abstraction paved the way for the wide adoption of SSDs
     [Diagram: identical host stacks (Application, OS, File System, Generic Block Layer) address an HDD and an SSD through the same logical block address interface over SATA]

  3. Great, BUT…
     Rotating media and NAND flash memory are very different!
     • The sector-based interface (Read_Sector(), Write_Sector()) maps directly onto a disk, but NAND flash memory operates with Read_Page(), Write_Page(), Copy_Page(), and Erase_Block()
     [Diagram: the same host stack over a disk vs. over NAND flash memory, with a question mark at the mismatched interface]

  4. The Trick is FTL!
     Flash translation layer (FTL)
     • Logical block mapping
     • Bad block management
     • Garbage collection (GC)
     [Diagram: the sector-based file system and generic block layer sit above the SSD; inside, the FTL translates sector reads and writes into page Read/Write and block Erase operations on NAND flash memory]
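To make the FTL's bookkeeping concrete, here is a minimal page-mapping sketch in C. The geometry, the append-only allocator, and all names are invented for illustration; the point is only the erase-before-update rule the next slide builds on: an overwrite programs a fresh page and invalidates the old one.

```c
/* Page-mapping FTL sketch: NAND pages cannot be updated in place, so an
 * overwrite programs the next free page and invalidates the old copy.
 * Geometry and the append-only allocator are invented for illustration. */
#include <stdio.h>

#define NUM_LPAGES 16
#define NUM_PPAGES 32

static int l2p[NUM_LPAGES];   /* logical page -> physical page (-1 = unmapped) */
static int valid[NUM_PPAGES]; /* 1 if the physical page holds live data        */
static int next_free = 0;     /* naive append-only allocator                   */

void ftl_write(int lpage)
{
    if (l2p[lpage] >= 0)
        valid[l2p[lpage]] = 0;  /* erase-before-update: invalidate, don't overwrite */
    l2p[lpage] = next_free;     /* program the next free physical page */
    valid[next_free++] = 1;
}

int main(void)
{
    for (int i = 0; i < NUM_LPAGES; i++) l2p[i] = -1;
    ftl_write(3);
    ftl_write(3);               /* the second write relocates LPN 3 */
    printf("LPN 3 -> PPN %d; old PPN 0 valid? %d\n", l2p[3], valid[0]);
    return 0;
}
```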

  5. Garbage Collection (GC)
     GC reclaims space to prepare new empty blocks
     • NAND's "erase-before-update" requirement → valid page copying followed by an erase operation
     • Has a large impact on SSD lifetime and performance
     [Diagram: valid pages from partially invalid Blocks A and B are copied into a free block, after which Blocks A and B are erased]
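The copy-then-erase cycle can be sketched in a few lines of C. The greedy victim selection below (pick the block with the fewest valid pages) and the geometry are illustrative assumptions, not a claim about any particular SSD's policy; the returned copy count is exactly the GC overhead the later slides measure.

```c
/* Greedy GC sketch: pick the block with the fewest valid pages as the
 * victim.  A real FTL would program each surviving valid page into a
 * free block before erasing; here we only count those copies. */
#include <stdio.h>

#define PAGES_PER_BLOCK 4
#define NUM_BLOCKS      8

static int valid[NUM_BLOCKS][PAGES_PER_BLOCK]; /* 1 = page holds live data */

int gc_collect(void)
{
    int victim = 0, min_valid = PAGES_PER_BLOCK + 1;
    for (int b = 0; b < NUM_BLOCKS; b++) {
        int v = 0;
        for (int p = 0; p < PAGES_PER_BLOCK; p++)
            v += valid[b][p];
        if (v < min_valid) { min_valid = v; victim = b; }
    }
    for (int p = 0; p < PAGES_PER_BLOCK; p++)
        valid[victim][p] = 0;    /* erase the victim block */
    return min_valid;            /* valid pages that had to be copied out */
}

int main(void)
{
    for (int b = 0; b < NUM_BLOCKS; b++)
        for (int p = 0; p < PAGES_PER_BLOCK; p++)
            valid[b][p] = 1;                     /* drive full of live data */
    valid[2][1] = valid[2][2] = valid[2][3] = 0; /* overwrites hit block 2  */
    printf("GC copied %d valid page(s)\n", gc_collect());  /* prints 1 */
    return 0;
}
```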

  6. GC is Expensive!
     Performance of SSD gradually decreases as time goes on
     • Example: Cassandra update throughput
     [Graph: Cassandra update throughput (ops/sec, normalized) declining over a 40-minute run]

  7. GC is Expensive!
     Performance of SSD gradually decreases as time goes on
     • Example: Cassandra update throughput vs. GC overhead
     • GC highly affects the SSD performance!
     [Graphs: Cassandra update throughput (ops/sec, normalized) falling over 40 minutes, side by side with the number of valid pages copied by GC rising over the same period]

  8. Our Idea: Multi-streamed SSD
     • Co-exists with the existing block layer, which stays as the general and concrete interface
     • Adds a new multi-streaming interface for the SSD
     • Host-provided stream information guides desirable data placement within the SSD!
     [Diagram: Application and File System continue through the Generic Block Layer, while the new multi-streaming interface carries stream hints down to the FTL and NAND flash memory]

  9. End Result
     The multi-streamed SSD can sustain Cassandra update throughput
     [Graphs: over a 40-minute run, the proposed multi-streamed SSD holds a steady update throughput (ops/sec) while the traditional SSD's throughput decays]

  10. Contents
     • Background: write optimization in SSD
     • The Multi-streamed SSD: our approach, case study
     • Evaluation: experimental setup, results
     • Conclusion

  11. Effects of Write Patterns
     Previous write patterns (= current state) matter
     • Example: Blocks 0 and 1 hold LBAs 0–7, and the next updates fill Block 2 (see the sketch after this slide)
     • Sequential LBA updates into Block 2 → Block 0 becomes fully invalid: just erase Block 0
     • Random LBA updates into Block 2 → valid pages stay scattered: need valid page copying from Block 0 & Block 1
     [Diagram: the same three blocks under sequential vs. random LBA updates]
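The slide's contrast can be reproduced in C. This sketch assumes an invented geometry of 4 pages per block: LBAs 0–7 first fill blocks 0 and 1, then four overwrites land in block 2. Sequential overwrites leave block 0 with no valid data (erase only), while the random order strands valid pages in both old blocks.

```c
/* Sequential vs. random overwrites and their effect on GC cost.
 * Geometry and workload are invented for illustration. */
#include <stdio.h>

#define PAGES_PER_BLOCK 4
#define NUM_LBAS        8
#define MAX_PAGES       (NUM_LBAS + PAGES_PER_BLOCK)

static int l2p[NUM_LBAS];     /* logical -> physical page           */
static int valid[MAX_PAGES];  /* 1 if a physical page is still live */
static int next_page = 0;

static void write_lba(int lba)
{
    if (l2p[lba] >= 0)
        valid[l2p[lba]] = 0;  /* overwrite invalidates the old page */
    l2p[lba] = next_page;
    valid[next_page++] = 1;
}

int main(void)
{
    for (int i = 0; i < NUM_LBAS; i++) l2p[i] = -1;
    for (int i = 0; i < NUM_LBAS; i++) write_lba(i);  /* fill blocks 0 and 1 */

    int seq[4] = {0, 1, 2, 3};    /* sequential updates; try {7, 0, 3, 1} */
    for (int i = 0; i < 4; i++) write_lba(seq[i]);    /* land in block 2  */

    int live0 = 0, live1 = 0;     /* valid pages left in the old blocks */
    for (int p = 0; p < NUM_LBAS; p++) {
        if (!valid[p]) continue;
        if (p / PAGES_PER_BLOCK == 0) live0++; else live1++;
    }
    printf("block 0: %d valid, block 1: %d valid\n", live0, live1);
    /* Sequential: 0 and 4 -> just erase block 0.
       Random {7,0,3,1}: 1 and 3 -> GC must copy from both blocks. */
    return 0;
}
```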

  12. Stream
     • Each stream is backed by its own set of blocks inside the SSD and holds data of one expected lifetime
     • Writes to stream 1, 2, or 3 are placed in that stream's blocks (lifetime 1, 2, or 3)
     • The question for each piece of incoming data: what is its lifetime, i.e., which stream should it go to?
     [Diagram: three write streams, each feeding its own chain of blocks; incoming data is labeled "Lifetime?"]

  13. The Multi-streamed SSD
     Multi-streamed SSD
     • Maps data with different lifetimes to different streams
     • The application provides information about data lifetime as a StreamID, passed through the multi-stream interface alongside the generic block layer
     • The FTL places data with similar lifetimes into the same erase unit (see the sketch after this slide)
     [Diagram: Data1–Data13 tagged with stream IDs 1–3 in the host; inside the SSD, each NAND block holds only pages from a single stream]
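A sketch of the placement rule this slide describes: each stream appends into its own active block, so data tagged with one stream ID shares an erase unit and tends to become invalid together. Everything here (geometry, the round-robin block allocator, all function names) is illustrative, not the prototype's implementation.

```c
/* Stream-aware placement sketch: one open block per stream. */
#include <stdio.h>

#define PAGES_PER_BLOCK 4
#define NUM_BLOCKS      8
#define NUM_STREAMS     3

static int next_block = 0;
static int active_block[NUM_STREAMS]; /* open block per stream         */
static int fill[NUM_STREAMS];         /* pages used in that open block */

static int allocate_free_block(void) { return next_block++ % NUM_BLOCKS; }

void write_page(int stream_id, int lpage)
{
    if (fill[stream_id] == 0)                    /* open a fresh block */
        active_block[stream_id] = allocate_free_block();
    printf("stream %d: LPN %d -> block %d, page %d\n",
           stream_id, lpage, active_block[stream_id], fill[stream_id]);
    fill[stream_id] = (fill[stream_id] + 1) % PAGES_PER_BLOCK;
}

int main(void)
{
    /* Interleaved writes from three streams still land in separate blocks,
     * so each block ends up holding data of one lifetime. */
    write_page(0, 10); write_page(1, 20); write_page(2, 30);
    write_page(0, 11); write_page(1, 21); write_page(2, 31);
    return 0;
}
```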

  14. Working Example
     Multi-streamed SSD
     • High GC efficiency (reduced GC overheads) → effects on performance!
     • Without streams, short-lived data (e.g., repeatedly updated LBAs 1, 20–22) and long-lived data (e.g., LBA 100) mix in the same blocks, so GC must copy many valid pages
     • With multi-streaming, the same request sequence separates by lifetime, reducing the valid pages to copy
     • For effective multi-streaming, proper mapping of data to streams is essential!
     [Diagram: the same request sequence placed without streams vs. with multi-stream, showing fewer valid-page copies in the multi-stream case]

  15. Case Study: Cassandra
     Cassandra employs a size-tiered compaction strategy
     [Diagram: a write request is appended to the Commit Log and buffered in the in-memory Memtable; Memtables flush to SSTables, and same-tier SSTables (e.g., SSTables 1–4, holding keys K1–K3) are compacted into larger ones (SSTables 5–7, then SSTable 21)]

  16. Summary of Cassandra's Write Patterns
     Write operations when Cassandra runs:
     • Commit-log write
     • Flushing data write (Memtable → SSTable)
     • Compaction data write (SSTable merges)
     • System data write (metadata, journal, …)
     [Diagram: the Cassandra write path from slide 15, annotated with these four write types]

  17. Mapping #1: "Conventional"
     Just one stream ID (= conventional SSD)
     • Commit-log, flushing, compaction, and system writes all share stream ID 0

  18. Mapping #2: "Multi-App"
     Add a new stream to separately handle application writes (stream ID 1) from system traffic (stream ID 0)
     • Commit-log, flushing, and compaction writes use stream ID 1; system writes stay on stream ID 0

  19. Mapping #3: "Multi-Log"
     Use three streams; further separate the commit log
     • Commit-log writes use stream ID 1; flushing and compaction writes use stream ID 2; system writes stay on stream ID 0

  20. Mapping #4: "Multi-Data"
     Give distinct streams to different tiers of SSTables
     • Commit-log writes use stream ID 1; flushing writes use stream ID 2; compaction writes use stream IDs 3 and 4 depending on the SSTable tier; system writes stay on stream ID 0

  21. Experimental Setup
     Multi-stream SSD prototype
     • Samsung 840 Pro SSD, 60 GB device capacity
     Modified Linux kernel 3.13
     • Passes the stream ID through the fadvise() system call and stores it in the VFS inode; EXT4 carries it in the buffer head down to the device
     YCSB benchmark on Cassandra
     • Write-intensive workload: 1 KB data × 1,000,000 record count, 100,000,000 operation count
     Host: Intel i7-3770 3.4 GHz processor, 2 GB memory
     • The small memory accelerates SSD aging by increasing Cassandra's flush frequency
     [Diagram: Application → fadvise(fd, Stream ID) → VFS inode field → EXT4 buffer head → SSD]
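A sketch of what the application side of this path might look like under the modified kernel. POSIX_FADV_STREAMID and the convention of carrying the stream ID in the offset argument are assumptions for illustration: the stock kernel defines no such advice value and would return EINVAL.

```c
/* Tagging a file's writes with a stream ID under the paper's modified
 * Linux 3.13.  The advice value and stream-ID encoding are hypothetical. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define POSIX_FADV_STREAMID 8   /* hypothetical: not in stock headers */

int main(void)
{
    int fd = open("CommitLog-1.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* Hypothetical convention: stream ID 1 (the commit-log stream of
     * Mapping #3) rides in the offset argument; the modified kernel stores
     * it in the file's VFS inode, and ext4 copies it into each buffer head
     * so the SSD sees it with every write. */
    if (posix_fadvise(fd, /* stream ID */ 1, 0, POSIX_FADV_STREAMID) != 0)
        perror("posix_fadvise");

    (void)write(fd, "entry\n", 6);  /* subsequent writes carry the stream ID */
    close(fd);
    return 0;
}
```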

  22. Results
     Cassandra's normalized update throughput
     • Conventional, "TRIM off"
     [Graph: normalized update throughput (ops/sec) of the conventional SSD with TRIM off, decaying over a 40-minute run]

  23. Results
     Cassandra's normalized update throughput
     • Conventional, "TRIM on"
     • TRIM gives a non-trivial improvement, but the result is still far from ideal
     [Graphs: conventional SSD with TRIM on vs. TRIM off over a 40-minute run; TRIM on holds a higher throughput but still degrades]
