Using Flash SSDs as Primary Database Storage
Robert Gottstein, Ilia Petrov, Guillermo G. Almeida, Todor Ivanov, Alex Buchmann
{lastname}@dvs.tu-darmstadt.de | Fachgebiet DVS | Ilia Petrov | 11/6/2010
Flash SSDs: X25-E, ioDrive Duo, FTL
Specification

                              Intel X25-E 64GB, SLC        Savvio 146GB, 15k
Seq. Read / Write             250 / 170 MB/s               160 MB/s
Read / Write IOPS             35 000 / 3 300 (4K)          350 / 300
Latency Read / Write          0.075 / 0.085 ms (4K)        3.2 / 3.5 ms
Price                         € 650                        € 180
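A quick way to read the specification table is to normalise the datasheet figures by price and capacity. The sketch below does only that arithmetic, using the numbers listed above; the `Drive` class and its method names are hypothetical helpers, and datasheet IOPS are best-case vendor values rather than measurements.

```python
from dataclasses import dataclass


@dataclass
class Drive:
    name: str
    capacity_gb: float
    price_eur: float
    read_iops: float   # random reads per second as listed on the slide
    write_iops: float  # random writes per second as listed on the slide

    def eur_per_gb(self) -> float:
        return self.price_eur / self.capacity_gb

    def read_iops_per_eur(self) -> float:
        return self.read_iops / self.price_eur

    def write_iops_per_eur(self) -> float:
        return self.write_iops / self.price_eur


ssd = Drive("Intel X25-E 64GB SLC", 64, 650, 35_000, 3_300)
hdd = Drive("Savvio 146GB 15k", 146, 180, 350, 300)

for d in (ssd, hdd):
    print(f"{d.name}: {d.eur_per_gb():.2f} EUR/GB, "
          f"{d.read_iops_per_eur():.1f} read IOPS/EUR, "
          f"{d.write_iops_per_eur():.1f} write IOPS/EUR")
```

With these figures the SSD costs roughly 8x more per GB (10.16 vs. 1.23 €/GB) but delivers roughly 28x more random read IOPS per euro (53.8 vs. 1.9).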
Flash vs. Magnetic Storage
Figure: flash vs. magnetic storage (annotated factors: 10x, 20x, > 1000x)
Fusion-io ioDrive Duo
- Seq. Read / Write: 1.5 / 1.4 GB/s
- Read / Write IOPS (4K): 130 000 / 80 000
- Latency Read / Write (4K): 0.025 / 0.035 ms
- Price: approx. € 6000
Amdahl’s Law – Speedup [1]
An OLTP database performs IO approx. 60% of the time [Patterson]. Which helps more: 10x faster CPUs or a 10x faster IO subsystem?
$S(f,k) = \frac{1}{(1-f) + f/k}$, where f is the fraction of run time that is accelerated and k is the acceleration factor.
- 10x faster storage: f = 0.6, k = 10, S = 1/(0.4 + 0.06) ≈ 2.2x
- 10x faster CPUs: f = 0.4, k = 10, S = 1/(0.6 + 0.04) ≈ 1.56x
[1] G. Amdahl, "Validity of the Single Processor Approach to Achieving Large-Scale Computing Capabilities," In Proc. AFIPS Conference, pp. 483–485, 1967.
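The two speedups on this slide follow directly from Amdahl's formula. A minimal sketch, assuming (as the slide does) that an OLTP system spends about 60% of its time in IO and the remaining 40% on the CPU; `amdahl_speedup` is a hypothetical helper name.

```python
def amdahl_speedup(f: float, k: float) -> float:
    """Overall speedup when a fraction f of run time is accelerated k-fold."""
    if not 0.0 <= f <= 1.0 or k <= 0:
        raise ValueError("need 0 <= f <= 1 and k > 0")
    return 1.0 / ((1.0 - f) + f / k)


# Assumption from the slide: ~60% of OLTP time is IO, ~40% is CPU.
io_share = 0.6
print(f"10x faster storage: {amdahl_speedup(io_share, 10):.2f}x")      # f = 0.6
print(f"10x faster CPUs:    {amdahl_speedup(1 - io_share, 10):.2f}x")  # f = 0.4
```

Because the IO share is the larger term, accelerating it dominates: even an infinitely fast CPU could not exceed 1/0.6 ≈ 1.67x here.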
Amdahl’s Revised Balanced System Law [2]
- A system needs 8 MIPS per MB/s of IO.
- The instruction rate and the IO rate are workload dependent: OLTP, CPI = 2.1.
- Assume 75% random writes, 25% random reads, 8 KB page size, 3.2 GHz CPU.
Balanced configurations for one 3.2 GHz CPU:
- HDD (SAS, 15K RPM): 80 HDDs per CPU, € 12 000
- Enterprise SSD: 3 SSDs per CPU, € 2 000
[2] J. Gray, P. Shenoy, "Rules of Thumb in Data Engineering," In Proc. ICDE 2000.
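The balanced-system arithmetic can be sketched as follows. This is a rough model under stated assumptions, not the authors' derivation: 1 MB = 10^6 bytes, every IO transfers one 8 KiB page, each IO occupies a device for 1/IOPS seconds (so a read/write mix is a weighted harmonic mean), the HDD uses the Savvio datasheet IOPS (whose block size is not stated), and the SSD uses the X25-E 8K random measurements. All function names are hypothetical.

```python
import math


def required_iops(clock_hz, cpi, mips_per_mb_s=8.0, page_bytes=8 * 1024):
    """Page-sized IOs per second a CPU needs under the 8 MIPS per MB/s rule."""
    mips = clock_hz / cpi / 1e6          # million instructions per second
    mb_per_s = mips / mips_per_mb_s      # IO bandwidth the CPU can consume
    return mb_per_s * 1e6 / page_bytes   # assumes 1 MB = 10**6 bytes


def mixed_iops(read_iops, write_iops, write_fraction):
    """Device IOPS for a read/write mix, assuming each IO takes 1/IOPS seconds."""
    per_io_s = write_fraction / write_iops + (1 - write_fraction) / read_iops
    return 1.0 / per_io_s


def devices_needed(clock_hz, cpi, read_iops, write_iops, write_fraction=0.75):
    demand = required_iops(clock_hz, cpi)
    return math.ceil(demand / mixed_iops(read_iops, write_iops, write_fraction))


# Slide workload: 3.2 GHz, CPI 2.1, 75% random writes, 8 KB pages.
hdd = devices_needed(3.2e9, 2.1, read_iops=350, write_iops=300)       # Savvio 15k datasheet
ssd = devices_needed(3.2e9, 2.1, read_iops=23_000, write_iops=4_800)  # X25-E, 8K measured
print(hdd, ssd)
```

This model needs about 23 000 IOPS per CPU and yields 75 HDDs versus 4 SSDs: the same order of magnitude as the slide's 80 and 3, whose exact per-device inputs are not given.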
In Summary
- HDDs have reached their physical limits: an outdated storage technology.
- Fighting low access density with thousands of HDDs is unreasonable.
- Data-intensive systems are IO-bound.
- Data-intensive systems are built around HDD properties (access gap / access density): larger buffer sizes, larger page sizes, and algorithms optimized for streaming rather than random access.
- SSDs come at the right moment.
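Access density here means random IOs per second available per GB stored. A minimal sketch using the random-read figures from the specification slide (`access_density` is a hypothetical helper name):

```python
def access_density(iops: float, capacity_gb: float) -> float:
    """Random IOs per second available per GB of capacity."""
    return iops / capacity_gb


hdd = access_density(350, 146)    # Savvio 146GB 15k, random reads
ssd = access_density(35_000, 64)  # Intel X25-E 64GB, 4K random reads
print(f"HDD: {hdd:.1f} IOPS/GB, SSD: {ssd:.0f} IOPS/GB, ratio {ssd / hdd:.0f}x")
```

At about 2.4 IOPS/GB, an HDD array sized for a random-IO target ends up with far more capacity than the data needs, which is why IO-bound systems accumulate thousands of spindles.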
Flash SSD Characteristics
Characteristics
- Throughput asymmetry
- Random throughput: better for small block sizes; random writes are an issue
- Very good sequential throughput, though still asymmetric; caching
- Very low latency
- Command queuing and internal parallelism
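One way to see internal parallelism in the datasheet figures is Little's law (requests in flight = throughput x latency). This is a swapped-in illustration, not a method from the slides, and it is only indicative: vendors usually quote latency at queue depth 1 and IOPS at high queue depth, so the result understates real concurrency. `implied_concurrency` is a hypothetical helper.

```python
def implied_concurrency(iops: float, latency_s: float) -> float:
    """Little's law: average requests in flight = throughput x time per request."""
    return iops * latency_s


# Read figures from the specification slides (4K where stated).
devices = {
    "Savvio 15k HDD": (350, 3.2e-3),
    "Intel X25-E": (35_000, 0.075e-3),
    "ioDrive Duo": (130_000, 0.025e-3),
}
for name, (iops, lat) in devices.items():
    print(f"{name}: ~{implied_concurrency(iops, lat):.2f} requests in flight")
```

The HDD comes out near 1 (one actuator, one request at a time), while both flash devices need several requests in flight to reach their rated IOPS, which is what command queuing and internal parallelism provide.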
Random Throughput – Very High
- Major weakness of HDDs
- Better for small block sizes: 4K: 35 500 read | 6 000 write IOPS; 8K: 23 000 read | 4 800 write IOPS
- Asymmetric, read vs. write: up to 10x difference
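Converting the measured IOPS to bandwidth shows what "better for small block sizes" means: IOPS fall as blocks grow, yet bytes per second still rise. A minimal sketch, assuming 4K and 8K mean 4096 and 8192 bytes and MB = 10^6 bytes; `iops_to_mb_s` is a hypothetical helper.

```python
def iops_to_mb_s(iops: float, block_bytes: int) -> float:
    """Bandwidth in MB/s (10**6 bytes) for `iops` requests of `block_bytes` each."""
    return iops * block_bytes / 1e6


measured = {4096: (35_500, 6_000), 8192: (23_000, 4_800)}  # block -> (read, write) IOPS
for block, (rd, wr) in measured.items():
    print(f"{block // 1024}K: read {iops_to_mb_s(rd, block):.0f} MB/s, "
          f"write {iops_to_mb_s(wr, block):.0f} MB/s, read/write ratio {rd / wr:.1f}")
```

In these measurements the read/write gap is about 5 to 6x; the roughly 10x figure corresponds to the datasheet values (35 000 vs. 3 300).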
Random Throughput – SSD and HDD
Figure: random throughput of SSD and HDD
Sequential Throughput
- Sequential bandwidth (MB/s) ≥ HDD
- Asymmetric
- Caching, command queuing
Figure: sequential bandwidth [MB/s], write caching vs. write cache OFF (annotated value: 189 MB/s)
Average Access Time / Latency
WC = write cache

                 Avg. latency [ms]       Max. latency [ms]
                 WC On      WC Off       WC On      WC Off
Seq. Read        0.053      --           12.29      --
Seq. Write       0.059      0.455        94.82      100.26
Rand. Read       0.167      --           12.41      --
Rand. Write      0.113      0.435        175.27     100.68
FTL, Address Mapping
Layers, top to bottom:
- File system: logical blocks, LBA
- Block device interface: SAS/SATA2 (TRIM), RAID
- NAND flash SSD controller with the FTL (Flash Translation Layer):
  - Mapping table (SRAM)
  - Background processes: wear-leveling, garbage collection, metadata synch, log-block merging
- NAND flash: pages, (erase) blocks, log records; log-block area < 3%
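The core job of the FTL in the diagram (mapping logical pages to physical flash pages, writing out of place, and reclaiming erase blocks through garbage collection) can be sketched as a toy simulator. This is a simplified page-level mapping with a greedy victim policy, swapped in for illustration; the slide's log-block area suggests the real device uses a hybrid log-block scheme, and the class name and sizes below are hypothetical.

```python
class PageMappedFTL:
    """Toy page-level FTL: logical->physical table, out-of-place writes, greedy GC.

    Hypothetical illustration only: real controller FTLs are proprietary and add
    wear-leveling, log-block merging, TRIM handling and metadata persistence.
    """

    def __init__(self, num_blocks=8, pages_per_block=4, logical_pages=20):
        # Keep at least two erase blocks of physical space beyond logical capacity.
        if logical_pages > (num_blocks - 2) * pages_per_block:
            raise ValueError("not enough over-provisioning")
        self.ppb = pages_per_block
        self.logical_pages = logical_pages
        self.owner = [[None] * pages_per_block for _ in range(num_blocks)]  # lpn per page
        self.l2p = {}                                  # mapping table: lpn -> (block, page)
        self.free_blocks = list(range(1, num_blocks))
        self.active, self.next_page = 0, 0
        self.erase_count = [0] * num_blocks
        self.host_writes = 0
        self.flash_writes = 0                          # includes pages copied by GC

    def read(self, lpn):
        return self.l2p.get(lpn)                       # physical location or None

    def write(self, lpn):
        if not 0 <= lpn < self.logical_pages:
            raise ValueError("logical page out of range")
        self.host_writes += 1
        if self.next_page == self.ppb:                 # active block is full
            if len(self.free_blocks) <= 1:
                self._garbage_collect()                # reclaim one, open the reserve block
            else:
                self.active, self.next_page = self.free_blocks.pop(0), 0
        old = self.l2p.get(lpn)
        if old is not None:                            # no in-place overwrite on flash:
            self.owner[old[0]][old[1]] = None          # only mark the old copy invalid
        self._append(lpn)

    def _append(self, lpn):
        self.owner[self.active][self.next_page] = lpn
        self.l2p[lpn] = (self.active, self.next_page)
        self.next_page += 1
        self.flash_writes += 1

    def _garbage_collect(self):
        # Greedy policy: reclaim the used block holding the fewest valid pages.
        used = [b for b in range(len(self.owner)) if b not in self.free_blocks]
        victim = min(used, key=lambda b: sum(p is not None for p in self.owner[b]))
        self.active, self.next_page = self.free_blocks.pop(0), 0
        for lpn in self.owner[victim]:
            if lpn is not None:
                self._append(lpn)                      # relocate still-valid data
        self.owner[victim] = [None] * self.ppb         # erase the whole block at once
        self.erase_count[victim] += 1
        self.free_blocks.append(victim)

    def write_amplification(self):
        return self.flash_writes / max(self.host_writes, 1)
```

Writing logical pages 0 to 19 once gives a write amplification of 1.0; sustained overwrites then force garbage collection, and `flash_writes` grows faster than `host_writes` whenever a victim block still holds valid pages.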
Fragmentation and Background Processes
Single Drive Fragmentation – max. 70% Full
Fragmentation: 5 h of writes (random, sequential). Most affected: sequential reads and random writes.
- Random reads less affected: 11% slower
- Sequential reads 52% slower! Read-ahead is not possible.
- Sequential writes 18% slower; better for larger block sizes. Reason: (+) write cache / write-back for small block sizes, (-) garbage collection.
- Random writes 50% slower! Worse for larger block sizes. Reason: excessive garbage collection.
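Why garbage collection hurts more as a drive fills can be illustrated with the standard write-amplification relation: if every block reclaimed by GC still holds a fraction u of valid pages, each host page write costs 1/(1-u) flash page programs. A minimal sketch; the u values below are hypothetical and only echo the 70% and 90% fill levels (taking the fill level as u is a pessimistic assumption, since greedy GC usually finds emptier blocks). The measured slowdowns are not derived from this formula.

```python
def gc_write_amplification(u: float) -> float:
    """Flash pages programmed per host page written, if every block reclaimed
    by garbage collection still holds a fraction u of valid pages.

    Reclaiming a block of P pages copies u*P valid pages and frees (1-u)*P pages
    for new host data, so each host page costs 1 + u/(1-u) = 1/(1-u) programs.
    """
    if not 0.0 <= u < 1.0:
        raise ValueError("u must be in [0, 1)")
    return 1.0 / (1.0 - u)


# Hypothetical victim-block utilisations.
for u in (0.0, 0.5, 0.7, 0.9):
    print(f"u = {u:.1f}: write amplification {gc_write_amplification(u):.1f}")
```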
Single Drive Fragmentation – over 90% Full
Reads are less affected; writes are affected significantly.
- Random reads not affected
- Random writes 75% slower
- Sequential reads approx. 30% slower
- Sequential writes 79% slower

SEQUENTIAL, 64K       Read                            Write
                      Fragmented   Non-fragmented     Fragmented   Non-fragmented
Bandwidth [MB/s]      177          255                38           185
Avg. latency [ms]     9            8                  52           11

RANDOM, 4K            Read                            Write
                      Fragmented   Non-fragmented     Fragmented   Non-fragmented
IOPS                  38 900       39 810             828          3 358
Avg. latency [ms]     0.8          0.8                39           10
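The percentages in the bullet list can be recomputed from the two tables as throughput loss relative to the non-fragmented drive. A minimal sketch (`slowdown_pct` is a hypothetical helper):

```python
def slowdown_pct(fragmented: float, baseline: float) -> float:
    """Throughput loss of the fragmented drive relative to the fresh drive, in percent."""
    return 100.0 * (1.0 - fragmented / baseline)


# (fragmented, non-fragmented) values from the >90%-full tables
results = {
    "seq. read 64K [MB/s]": (177, 255),
    "seq. write 64K [MB/s]": (38, 185),
    "rand. read 4K [IOPS]": (38_900, 39_810),
    "rand. write 4K [IOPS]": (828, 3_358),
}
for label, (frag, fresh) in results.items():
    print(f"{label}: {slowdown_pct(frag, fresh):.0f}% slower")
```

This reproduces the stated figures: sequential reads about 31% slower, sequential writes 79%, random reads about 2% (effectively unaffected) and random writes 75%.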
Flash Trends [A. v. Bechtolsheim, HPTS 2009]
- Density doubling each year: 1 TB in 4 years
- Costs falling by 50% per year
- Access times falling by 50% per year: 5 μs in 4 years
- Throughput doubling every year
- Interface moving from SATA to PCI Express
- Very large-scale I/O looks feasible
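These trends compound yearly. A minimal sketch, assuming the starting points are the X25-E figures from the specification slide (64 GB, 0.075 ms = 75 μs read latency); choosing that drive as the baseline is an assumption, not part of the trend list. `project` is a hypothetical helper.

```python
def project(value: float, yearly_factor: float, years: int) -> float:
    """Compound a per-year trend: value * yearly_factor ** years."""
    return value * yearly_factor ** years


capacity_gb = project(64, 2.0, 4)      # density doubles each year
read_latency_us = project(75, 0.5, 4)  # access time halves each year
print(capacity_gb, read_latency_us)
```

Four years at these rates take 64 GB to 1024 GB and 75 μs to about 4.7 μs, consistent with "1 TB in 4 years" and "5 μs in 4 years".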
SSD RAID Storage
How do we build large SSD storage?
Single SSD vs. SSD RAID

Device        Seq. Read  Seq. Write  Rnd. Read     Rnd. Write    Rnd. Read  Rnd. Write  Price    Read     Write
              [MB/s]     [MB/s]      latency [ms]  latency [ms]  IOPS       IOPS        [€/GB]   IOPS/€   IOPS/€
E.SSD         250        170         0.075         0.085         35 000     3 300       10       56       5.3
RAID0 2xSSD   422        631         0.375         0.458         24 371     2 035       19       13       1.1

What went wrong?
- RAID benefits come at a high cost in SSD configurations.
- Random throughput (IOPS) is approx. 30% lower.
- Sequential read throughput (MB/s) is better than that of a single SSD.
- Sequential write throughput is good, entirely due to write caching.
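Comparing the RAID0 row against ideal linear scaling (2x a single SSD) makes the loss explicit. A minimal sketch over the table values; `scaling_efficiency` is a hypothetical helper.

```python
def scaling_efficiency(measured: float, single: float, n_devices: int) -> float:
    """Measured RAID throughput as a fraction of ideal n x single-device scaling."""
    return measured / (n_devices * single)


# (RAID0 with 2 SSDs, single enterprise SSD) from the table
metrics = {
    "seq. read [MB/s]": (422, 250),
    "seq. write [MB/s]": (631, 170),   # above 100%: write caching, not the SSDs
    "rnd. read [IOPS]": (24_371, 35_000),
    "rnd. write [IOPS]": (2_035, 3_300),
}
for label, (raid, single) in metrics.items():
    print(f"{label}: {scaling_efficiency(raid, single, 2):.0%} of ideal 2x scaling")
```

Sequential reads reach about 84% of ideal scaling, while random reads and writes reach only about 35% and 31%, that is, less than a single SSD on its own.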
Scalability Tests – Random Load (RAID 0)
- Small block sizes: the controller is saturated with 2 SSDs (even with 1)!
- Larger block sizes: more SSDs, but fewer than 4
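A simple bottleneck model explains why so few SSDs saturate the controller: array throughput is min(n x per-SSD IOPS, controller limit). The sketch below assumes, purely for illustration, that the controller ceiling equals the RAID0 random-read figure from the table (24 371 IOPS, block size of that measurement not stated); `ssds_until_saturation` is a hypothetical helper.

```python
import math


def ssds_until_saturation(per_ssd_iops: float, controller_max_iops: float) -> int:
    """Smallest n for which n * per_ssd_iops reaches the controller limit."""
    return math.ceil(controller_max_iops / per_ssd_iops)


CONTROLLER_MAX_IOPS = 24_371  # assumption: ceiling taken from the RAID0 measurement
for block_kb, read_iops in ((4, 35_500), (8, 23_000)):  # single-SSD random reads
    n = ssds_until_saturation(read_iops, CONTROLLER_MAX_IOPS)
    print(f"{block_kb}K: controller saturated with {n} SSD(s)")
```

Under this assumption one SSD already saturates the controller at 4K and two do at 8K; larger blocks lower per-SSD IOPS, so more SSDs fit before the ceiling is hit.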
Recommendations
More Recommendations