
Using Flash SSDs as Primary Database Storage

Robert Gottstein, Ilia Petrov, Guillermo G. Almeida, Todor Ivanov, Alex Buchmann


  1. Using Flash SSDs as Primary Database Storage. Robert Gottstein, Ilia Petrov, Guillermo G. Almeida, Todor Ivanov, Alex Buchmann, {lastname}@dvs.tu-darmstadt.de | Fachgebiet DVS | Ilia Petrov | 11/6/2010

  2. Flash SSDs, X25-E, ioDriveDuo; FTL

  3. Specification
     - Intel X25-E 64GB, SLC
       - Seq. Read/Write: 250 / 170 MB/s
       - Read/Write IOPS (4K): 35 000 / 3 300
       - Latency Read/Write (4K): 0.075 / 0.085 ms
       - Price: € 650
     - Savvio 146GB, 15k
       - Seq. Read/Write: 160 MB/s
       - Read/Write IOPS: 350 / 300
       - Latency Read/Write: 3.2 / 3.5 ms
       - Price: € 180
     - Figure annotations: 10x, 20x

  4. Flash vs Magnetic Storage
     - Figure annotations: 10x … 20x, > 1000x
     - Fusion-io ioDrive Duo
       - Seq. Read/Write: 1.5 / 1.4 GB/s
       - Read/Write IOPS (4K): 130 000 / 80 000
       - Latency Read/Write (4K): 0.025 / 0.035 ms
       - Price: approx. € 6000

  5. Amdahl's Law – Speedup [1]
     - An OLTP database performs IO approx. 60% of the time [Patterson]
     - 10x faster CPUs or 10x faster IO subsystem?
     - $S(f,k) = \frac{1}{(1-f) + f/k}$
     - 10x faster storage: $f = 0.6$, $k = 10$, $S = 2.2$x
     - 10x faster CPUs: $f = 0.4$, $k = 10$, $S = 1.5$x

     [1] Amdahl, Gene. "Validity of the Single Processor Approach to Achieving Large-Scale Computing Capabilities". In Proc. AFIPS Conference, pp. 483–485, 1967.
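The two speedups on this slide can be checked directly from the formula. The following is a minimal sketch; the function name `amdahl_speedup` is chosen here for illustration and does not come from the slides. It evaluates to roughly 2.17 for faster storage and 1.56 for faster CPUs, which the slide reports as 2.2x and 1.5x.

```python
def amdahl_speedup(f: float, k: float) -> float:
    """Overall speedup when a fraction f of the runtime is accelerated k times."""
    return 1.0 / ((1.0 - f) + f / k)


io_fraction = 0.6  # share of time an OLTP database spends on IO, per the slide
factor = 10.0      # "10x faster" component

# Accelerating the IO part (60% of the time) versus the CPU part (40%).
print(f"10x faster storage: {amdahl_speedup(io_fraction, factor):.2f}x")
print(f"10x faster CPUs:    {amdahl_speedup(1.0 - io_fraction, factor):.2f}x")
```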

  6. Amdahl's Revised Balanced System Law [2]
     - A system needs 8 MIPS per MB/s of IO
     - The instruction rate and IO rate are workload dependent
     - OLTP, CPI = 2.1
     - Assume 75% random write, 25% random read, 8KB page size, 3.2 GHz CPU
     - CPU = 3.2 GHz, HDD SAS 15K RPM: 80 HDD/CPU, € 12 000
     - CPU = 3.2 GHz, Enterprise SSD: 3 SSD/CPU, € 2 000

     [2] Jim Gray, Prashant Shenoy, "Rules of Thumb in Data Engineering". In Proc. ICDE 2000.
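The drive counts can be approximated from the stated rule of thumb. The sketch below is an illustration under assumptions, not the authors' calculation: the per-drive IOPS are taken from the specification and random-throughput slides (350 / 300 for the HDD, 23 000 / 4 800 at 8K for the SSD), the read/write mix is combined as a weighted harmonic mean, and all function names are hypothetical. Under these assumptions the result lands in the same range as the slide's 80 HDDs and 3 SSDs per CPU, but not on the exact figures.

```python
import math


def required_iops(clock_hz, cpi, mips_per_mb_s=8.0, page_bytes=8 * 1024):
    """Page IOs per second needed to balance one CPU (8 MIPS per MB/s of IO)."""
    mips = clock_hz / cpi / 1e6          # instructions per second, in millions
    io_mb_s = mips / mips_per_mb_s       # IO bandwidth the CPU can consume
    return io_mb_s * 1e6 / page_bytes    # expressed as page-sized requests


def mixed_iops(read_iops, write_iops, write_share):
    """Sustained IOPS for a read/write mix (weighted harmonic mean)."""
    return 1.0 / (write_share / write_iops + (1.0 - write_share) / read_iops)


def drives_per_cpu(demand_iops, read_iops, write_iops, write_share=0.75):
    """Number of drives needed to serve the demand, rounded up."""
    return math.ceil(demand_iops / mixed_iops(read_iops, write_iops, write_share))


demand = required_iops(clock_hz=3.2e9, cpi=2.1)
# Assumed per-drive numbers; the slides do not say which values were used.
hdd = drives_per_cpu(demand, read_iops=350, write_iops=300)
ssd = drives_per_cpu(demand, read_iops=23_000, write_iops=4_800)
print(f"demand: {demand:.0f} IOPS, HDDs per CPU: {hdd}, SSDs per CPU: {ssd}")
```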

  7. In summary
     - HDDs have reached physical limits
     - Fighting low access density with thousands of HDDs is unreasonable
     - Outdated storage technology
     - Data-intensive systems are IO-bound
     - Data-intensive systems are built around HDD properties
     - Access gap / access density
     - Larger buffer sizes
     - Larger page sizes
     - Algorithms optimized for streaming access rather than random access
     - SSDs come at the right moment

  8. Flash SSD Characteristics

  9. Characteristics
     - Throughput asymmetry
     - Random throughput
     - Better for small block sizes
     - Random writes are an issue
     - Very good sequential throughput
     - Still asymmetric
     - Caching
     - Very low latency
     - Command queuing and internal parallelism

  10. Random Throughput
     - Random throughput: very high
     - Better for small block sizes
     - Asymmetric: read vs. write
     - Major weakness of HDD
     - Up to 10x difference
     - 4K: 35 500 read | 6 000 write (IOPS)
     - 8K: 23 000 read | 4 800 write (IOPS)

  11. Random Throughput – SSD and HDD

  12. Sequential Throughput
     - Caching
     - Sequential bandwidth in MB/s
     - Command queuing
     - Asymmetric
     - >= HDD: 189 MB/s
     - Write caching
     - Figure label: Write Cache OFF

  13. Average Access Time / Latency (AVG)

     | Access pattern | Avg. latency [ms], WC On | Avg. latency [ms], WC Off | Max. latency [ms], WC On | Max. latency [ms], WC Off |
     |---|---|---|---|---|
     | Seq. Read | 0.053 | -- | 12.29 | -- |
     | Seq. Write | 0.059 | 0.455 | 94.82 | 100.26 |
     | Rand. Read | 0.167 | -- | 12.41 | -- |
     | Rand. Write | 0.113 | 0.435 | 175.27 | 100.68 |

  14. FTL, Address Mapping
     - Block device interface
     - Logical blocks, LBA
     - Pages, (erase) blocks, log records
     - FTL: Flash Translation Layer
     - Background processes
       - Wear-leveling
       - Garbage collection
       - Metadata synch
       - Log-block merging
     - SATA2/SAS – TRIM
     - RAID
     - Diagram labels: File System; Block Device Interface (SAS/SATA2); NAND Flash SSD; Controller; FTL; Mapping Table (SRAM); NAND Flash Memory; Block; Log Block Area < 3%
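To make the address-mapping idea concrete, here is a deliberately simplified sketch of an FTL that maps logical block addresses to physical flash pages and never overwrites a page in place. It is an illustration only: the slide's mention of log blocks suggests a hybrid log-block scheme, whereas this sketch uses plain page-level mapping, and the class and attribute names are hypothetical.

```python
class PageMappedFTL:
    """Toy page-level FTL: out-of-place writes plus a logical-to-physical map."""

    def __init__(self, num_pages):
        self.mapping = {}                          # LBA -> physical page number
        self.free_pages = list(range(num_pages))   # erased pages ready for writing
        self.invalid_pages = set()                 # stale pages awaiting garbage collection
        self.flash = {}                            # physical page number -> stored data

    def write(self, lba, data):
        if not self.free_pages:
            # A real FTL would run garbage collection here; this sketch just stops.
            raise RuntimeError("no free pages: garbage collection needed")
        new_page = self.free_pages.pop(0)
        old_page = self.mapping.get(lba)
        if old_page is not None:
            self.invalid_pages.add(old_page)       # the old copy becomes garbage
        self.flash[new_page] = data
        self.mapping[lba] = new_page

    def read(self, lba):
        return self.flash[self.mapping[lba]]


ftl = PageMappedFTL(num_pages=4)
ftl.write(7, "v1")
ftl.write(7, "v2")   # the update goes to a fresh page; the old page is invalidated
print(ftl.read(7), ftl.mapping, ftl.invalid_pages)
```

Every update consumes a fresh page and leaves a stale one behind, which is why background garbage collection is needed and why heavy random writes on a nearly full drive become expensive.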

  15. Fragmentation and Background Processes

  16. Single Drive Fragmentation – max. 70% full
     - Fragment: 5h write (rand., seq.)
     - Most affected: seq. read, rnd. write
     - Random reads less affected (11%)
     - Sequential reads: 52% slower!
       - Read ahead not possible
     - Seq. writes: 18% slower
       - Better for larger block sizes
       - Reason: (+) write cache/write back for small block sizes, (-) garbage collection
     - Random writes: 50% slower!
       - Worse for larger block sizes
       - Reason: excessive garbage collection

  17. Single Drive Fragmentation – over 90% full
     - Reads less affected
       - Random reads not affected
       - Sequential reads approx. 30% slower
     - Writes affected significantly
       - Random writes 75% slower
       - Sequential writes 79% slower

     | SEQUENTIAL, 64K | Read, fragmented | Read, non-fragmented | Write, fragmented | Write, non-fragmented |
     |---|---|---|---|---|
     | Bandwidth [MB/s] | 177 | 255 | 38 | 185 |
     | Avg. latency [ms] | 9 | 8 | 52 | 11 |

     | RANDOM, 4K | Read, fragmented | Read, non-fragmented | Write, fragmented | Write, non-fragmented |
     |---|---|---|---|---|
     | IOPS | 38900 | 39810 | 828 | 3358 |
     | Avg. latency [ms] | 0.8 | 0.8 | 39 | 10 |
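The slowdown percentages quoted on this slide follow from the measured bandwidth and IOPS values. A small sketch of the arithmetic, with a helper name chosen here for illustration:

```python
def slowdown_percent(fragmented, non_fragmented):
    """Relative throughput loss of the fragmented drive, in percent."""
    return (1.0 - fragmented / non_fragmented) * 100.0


# (fragmented, non-fragmented) pairs taken from the slide's measurements.
measurements = {
    "seq. read [MB/s]": (177, 255),
    "seq. write [MB/s]": (38, 185),
    "rand. read [IOPS]": (38900, 39810),
    "rand. write [IOPS]": (828, 3358),
}

for name, (frag, clean) in measurements.items():
    print(f"{name}: {slowdown_percent(frag, clean):.1f}% slower")
```

This gives about 30.6% and 79.5% for sequential reads and writes, and about 2.3% and 75.3% for random reads and writes, in line with the bullet points.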

  18. Flash Trends [A. v. Bechtolsheim, HPTS 2009]
     - Density doubling each year: 1TB in 4 years
     - Costs falling by 50% per year
     - Access times falling by 50% per year: 5 μs in 4 years
     - Throughput doubling every year
     - Interface moving from SATA to PCI Express
     - Very large-scale I/O looks feasible
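The four-year projections are simple compound growth. The sketch below assumes the starting points are the 64GB capacity and the 0.075 ms (75 μs) read latency of the X25-E from the specification slide; the slide itself does not name its baseline, and the helper name is hypothetical.

```python
def project(value, yearly_factor, years):
    """Value after compounding a constant yearly factor for a number of years."""
    return value * yearly_factor ** years


capacity_gb = project(64, 2.0, 4)   # density doubling each year
latency_us = project(75, 0.5, 4)    # access time halving each year

print(f"capacity after 4 years: {capacity_gb:.0f} GB")
print(f"latency after 4 years:  {latency_us:.1f} us")
```

Under these assumptions the capacity reaches 1024 GB and the latency drops to about 4.7 μs, consistent with the slide's "1TB" and "5 μs".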

  19. SSD RAID Storage: How do we build large SSD storage?

  20. Single SSD vs. SSD RAID

     | Device | Seq. Read [MB/s] | Seq. Write [MB/s] | Rnd. Read [ms] | Rnd. Write [ms] | Rnd. Read IOPS | Rnd. Write IOPS | Price [€/GB] | Price Read IOPS/€ | Price Write IOPS/€ |
     |---|---|---|---|---|---|---|---|---|---|
     | E.SSD | 250 | 170 | 0.075 | 0.085 | 35 000 | 3 300 | 10 | 56 | 5.3 |
     | RAID0 2xSSD | 422 | 631 | 0.375 | 0.458 | 24 371 | 2 035 | 19 | 13 | 1.1 |

     What went wrong?
     - RAID benefits come at a high cost in SSD configurations
     - Random throughput (IOPS): approx. 30% lower
     - Sequential read throughput (MB/s): better than that of a single SSD
     - Sequential write throughput: good
       - Entirely due to write caching
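One way to read these numbers is to compare the two-drive RAID 0 against an ideal array that delivers exactly twice the single-drive throughput. The sketch below uses the single-SSD and RAID 0 figures from the slide; the notion of "scaling efficiency" and the names used are introduced here for illustration and are not part of the slides.

```python
def scaling_efficiency(raid_value, single_value, drives=2):
    """Measured RAID throughput as a fraction of ideal linear scaling."""
    return raid_value / (drives * single_value)


# (single SSD, RAID 0 with 2 SSDs) pairs as reported on the slide.
results = {
    "seq. read [MB/s]": (250, 422),
    "seq. write [MB/s]": (170, 631),
    "rnd. read [IOPS]": (35_000, 24_371),
    "rnd. write [IOPS]": (3_300, 2_035),
}

for name, (single, raid) in results.items():
    eff = scaling_efficiency(raid, single)
    print(f"{name}: {eff:.0%} of ideal 2x scaling")
```

Random reads and writes reach only about a third of ideal scaling, while sequential writes exceed it, which fits the slide's remark that the sequential write result is due to write caching rather than the drives themselves.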

  21. Scalability Tests – Random Load
     - RAID 0
     - Controller saturated with:
       - Small block sizes: 2 SSDs (even 1)
       - Larger block sizes: more SSDs, but fewer than 4
