How does a flash memory cell work? How to read a cell?
• Reading a cell consists in inferring the cell state from the drain current
• Apply an intermediate voltage V_1 < V_I < V_0 on CG
• Read the actual value I*_D of the current I_D
• If I*_D = 0 the bit value is 0
• If I*_D ≠ 0 the bit value is 1
[Figure: flash cell cross-section (CG, FG, source/drain N+ regions, P-well) and the I_D vs V_CG curves separating STATE 1 from STATE 0]
Multi-Level Cell
• If we partially charge FG, a lower threshold voltage is needed to create a channel
• We can store 2 bits by using 1 programmed state, 2 partially programmed states and 1 erased state
• A flash cell storing multiple bits is a Multi-Level Cell (MLC)
• A Triple-Level Cell (TLC) stores 3 bits
[Figure: FG charge levels for the erased, partially programmed and programmed states]
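To make the multi-level read concrete, here is a minimal sketch (not a real sensing circuit) that maps a cell's threshold voltage to a 2-bit value by comparing it against three hypothetical read voltages placed between the four charge states.

```python
# Minimal sketch of a 2-bit (MLC) read: the cell's threshold voltage is
# compared against three read voltages placed between the four charge states.
# Voltage values and bit assignment are illustrative, not from any datasheet.

READ_VOLTAGES = [1.0, 2.5, 4.0]       # hypothetical V_I levels between states
STATES = ["11", "10", "01", "00"]     # illustrative bits, erased -> fully programmed

def read_mlc_cell(threshold_voltage: float) -> str:
    """Return the 2-bit value stored in a cell with the given threshold voltage."""
    for i, v_read in enumerate(READ_VOLTAGES):
        if threshold_voltage < v_read:   # channel conducts at this read voltage
            return STATES[i]
    return STATES[-1]                    # does not conduct at any read voltage

if __name__ == "__main__":
    for vt in (0.5, 1.8, 3.0, 4.5):
        print(f"V_t = {vt:.1f} V -> bits {read_mlc_cell(vt)}")
```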
Why does a flash cell deteriorate in time?
• writing a flash cell involves an erase and a program ⇒ electrons move from/into FG
• electrons collide with and damage the insulating layer, creating traps ⇒ a Stress Induced Leakage Current (SILC) can flow through these traps
• many traps can build a path from the body to FG ⇒ electrons can flow through that path ⇒ the cell can no longer be programmed ⇒ the flash cell is unusable
[Figure: tunnel oxide (SiO_2, < 10 nm) between anode and cathode interfaces, showing electron traps, SILC through the damaged oxide, and a breakdown path]
Why does a flash cell deteriorate in time? PE-Cycles
• A flash cell can be programmed and erased a limited number of times before a breakdown ⇒ this number is called P/E-Cycles
• Vendors design firmware capable of recomputing the voltage thresholds for read/write operations ⇒ enterprise-MLC (eMLC)
• Typical P/E-cycle ratings: SLC 100000, eMLC 30000, MLC 10000, TLC 5000
MLC vs SLC
SLC:
• lower density
• higher cost
• faster write
• faster read
• higher endurance
MLC:
• higher density
• lower cost
• erase time is similar to SLC
• the level of charge in FG has to be set carefully ⇒ slower program ⇒ slower write
• state is not 0/1 ⇒ slower read
• compared to SLC: eMLC has 3x shorter endurance, MLC 10x shorter, TLC 20x shorter
How are flash chips organized?
• Flash Cell: 1 bit
• Page: 16384 + 512 bits (additional bits are used to store ECC and recover from runtime read errors)
• Block: 64 pages = 128KB + 4KB (some bits are used to mark the block as faulty)
• Plane: 2048 blocks = 256MB + 8MB
• Die: 2 planes = 512MB + 16MB
• Chip: 4 dies = 2048MB + 64MB
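A quick sketch, using the sizes above, that walks the hierarchy from page to chip and reproduces the listed capacities; the split between data area and spare (ECC/flag) area is kept explicit.

```python
# Sketch: data and spare capacity at each level of the flash hierarchy,
# using the sizes given on the slide.

PAGE_DATA_BITS, PAGE_SPARE_BITS = 16384, 512
PAGES_PER_BLOCK, BLOCKS_PER_PLANE = 64, 2048
PLANES_PER_DIE, DIES_PER_CHIP = 2, 4

def fmt(bits: int) -> str:
    kb = bits / 8 / 1024
    return f"{kb:.0f} KB" if kb < 1024 else f"{kb / 1024:.0f} MB"

levels = {
    "block": PAGES_PER_BLOCK,
    "plane": PAGES_PER_BLOCK * BLOCKS_PER_PLANE,
    "die":   PAGES_PER_BLOCK * BLOCKS_PER_PLANE * PLANES_PER_DIE,
    "chip":  PAGES_PER_BLOCK * BLOCKS_PER_PLANE * PLANES_PER_DIE * DIES_PER_CHIP,
}
for name, pages in levels.items():
    print(f"{name}: {fmt(pages * PAGE_DATA_BITS)} + {fmt(pages * PAGE_SPARE_BITS)}")
# chip: 2048 MB + 64 MB, matching the slide
```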
How are flash cells organized?
• Flash cells are connected forming an array called a string
• According to the strategies used to connect multiple cells, we can distinguish at least two kinds of configuration:
NOR: flash cells are connected in parallel, resembling a NOR gate
NAND: flash cells are connected in series, resembling a NAND gate
[Figure: NOR and NAND gate schematics with supply V_cc, output V_out and inputs A, B]
How are flash cells organized?
• Let F be the CG side length
NOR:
• occupies area 10 F^2 per cell
• read/write a single cell
NAND:
• occupies area 4 F^2 per cell
• read/write a single page
• erase a single block
[Figure: NOR vs NAND array architectures (bit lines, pages, select gates, source line for one block)]
NOR vs NAND
NOR:
• fast random-byte read
• slower page read
• slower write
• lower density
⇒ good for source code
NAND:
• no random-byte read
• slow partial page read, when supported
• faster page read
• faster page write
• higher density
⇒ good for storage
We focus on NAND flash technology
How is a page written?
• Write-in-place strategy:
  1. read the block
  2. erase the block
  3. program the block with the updated page
• 1 page write = N page reads + 1 block erase + N page writes (N = number of pages in a block) ⇒ very slow write
⇒ If we update the same page 40 times per second (every 25 ms), the block is completely broken in:
  ◦ SLC: PECycles / UpdateRate = 10^5 / (40 /s) ≈ 2500 s ≈ 40 min
  ◦ MLC: 10^4 / (40 /s) ≈ 4 min
  ◦ TLC: 5·10^3 / (40 /s) ≈ 2 min
ALERT! In our example the write rate is only 80 KBps
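A small sketch that reproduces the numbers above under the write-in-place assumption: every update of a single hot page costs its block one erase, so the block dies after PECycles updates.

```python
# Sketch: time until a block wears out under write-in-place, assuming every
# update of one hot page costs the block a full erase (numbers from the slide).

PE_CYCLES = {"SLC": 100_000, "MLC": 10_000, "TLC": 5_000}
UPDATE_RATE = 40          # page updates per second (one update every 25 ms)
PAGE_SIZE_KB = 2          # 16384 data bits per page

for cell_type, cycles in PE_CYCLES.items():
    seconds = cycles / UPDATE_RATE
    print(f"{cell_type}: block broken after ~{seconds:.0f} s (~{seconds / 60:.0f} min)")

print(f"write rate in this example: {UPDATE_RATE * PAGE_SIZE_KB} KBps")
```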
Write Amplification
• Write amplification occurs when 1 user page write leads to multiple flash writes
• Write amplification makes flash blocks deteriorate faster
• Let F be the number of additional flash writes generated by U user writes ⇒ the write amplification A is:

A = (F + U) / U = 1 + F/U = 1 + A_f

where A_f = F/U is the write amplification factor
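As a minimal check of the relation above, the sketch below computes A from made-up illustrative counts of user writes and extra flash writes.

```python
# Sketch: write amplification A = (F + U) / U = 1 + A_f, where U is the number
# of user page writes and F the extra flash page writes they trigger.
# The numbers are illustrative, not measurements.

def write_amplification(user_writes: int, extra_flash_writes: int) -> float:
    return (extra_flash_writes + user_writes) / user_writes

U = 1_000          # user page writes
F = 250            # additional flash writes caused by GC / relocation
A = write_amplification(U, F)
print(f"A = {A:.2f}  (A_f = {F / U:.2f})")   # A = 1.25
```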
How is a page written? Relocation-on-write
• Write-in-place is inadequate in terms of reliability and performance (A_f ≈ number of pages in a block)
• Updated pages are re-written to new locations
• The logical address of the updated page is mapped to a different physical page
• Previous pages are invalidated
⇒ 1 user page write = 1 page read (obtain an empty page) + 2 page writes (update data + invalidate page) ⇒ faster write
[Figure: logical page 3 in the operating system's view of the SSD is remapped to a new physical page; the old physical page is marked invalid]
Flash Translation Layer
• Assigns logical addresses to pages
• Stores the association between physical and logical addresses in a Translation Mapping Table
• Stores the number of erase operations performed on physical pages in an Erase Count Table
• Tables are:
  ◦ maintained in SRAM (highly efficient) at runtime
  ◦ stored on flash during shutdown to ensure durability
  ◦ loaded at boot-up
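A simplified sketch of an FTL doing relocation-on-write as described above: it keeps a logical-to-physical mapping table, remaps a logical page on every write and invalidates the old physical page. Real controllers are far more involved (block-level mapping, caching, power-loss handling); this only illustrates the bookkeeping, and the page-granularity free list is an assumption made for brevity.

```python
# Simplified FTL sketch: relocation-on-write with a translation mapping table.
# Assumes an unrealistically simple page-granularity free list.

class SimpleFTL:
    def __init__(self, num_physical_pages: int):
        self.mapping = {}                               # logical page -> physical page
        self.free_pages = list(range(num_physical_pages))
        self.invalid_pages = set()
        self.erase_count = {}                           # erase-count table (updated by GC, not shown)

    def write(self, logical_page: int, data: bytes) -> int:
        new_phys = self.free_pages.pop(0)               # obtain an empty page
        old_phys = self.mapping.get(logical_page)
        if old_phys is not None:
            self.invalid_pages.add(old_phys)            # invalidate the previous copy
        self.mapping[logical_page] = new_phys           # update the translation table
        # program `data` into new_phys here (omitted)
        return new_phys

ftl = SimpleFTL(num_physical_pages=10)
print(ftl.write(3, b"v1"))   # first write of logical page 3
print(ftl.write(3, b"v2"))   # update: remapped, old page invalidated
print(ftl.invalid_pages)
```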
Wear-Leveling
• Free pages for relocation can be retrieved from the whole SSD
• Wear-leveling guarantees that the number of P/E-Cycles is uniformly distributed among all blocks
⇒ Wear-leveling extends the time to live of each block and of the whole SSD
• Thanks to wear-leveling, all blocks break at the same time
[Figure: erase count vs block ID for a 32GB drive plus free blocks, showing a roughly uniform distribution]
Wear-Leveling
• In order to guarantee that enough free pages are available for write relocation, wear-leveling needs:
  ◦ Over-provisioning - keep free a percentage of the raw capacity
  ◦ Garbage collection - keep invalid pages in the same block
  ◦ DRAM buffers - keep valid pages in a buffer in order to write full blocks and reduce fragmentation
• We can distinguish at least two kinds of wear-leveling algorithms:
  ◦ Dynamic wear-leveling
  ◦ Static wear-leveling
Wear-Leveling Dynamic Algorithm
• It is called dynamic because it is executed every time the OS replaces a block of data
• A small percentage (e.g. 2%) of the raw capacity is reserved as a free-block pool
• When the buffer is flushed, it chooses from the free pool the block with the minimum erase count
• The replaced block is erased and added to the free pool
⇒ Only frequently-updated blocks are consumed
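A hedged sketch of the selection step: take the least-worn block from the free pool as the destination of the flushed buffer, then erase the replaced data block and return it to the pool. The data structures are illustrative, not a vendor implementation.

```python
# Sketch of dynamic wear-leveling: on each buffer flush, write into the free
# block with the minimum erase count and recycle the replaced block.
import heapq

class FreeBlockPool:
    def __init__(self, block_ids):
        # heap of (erase_count, block_id); all blocks start unworn
        self.heap = [(0, b) for b in block_ids]
        heapq.heapify(self.heap)

    def take_least_worn(self):
        erase_count, block_id = heapq.heappop(self.heap)
        return block_id, erase_count

    def recycle(self, block_id, erase_count):
        # the replaced block is erased (count + 1) and returned to the pool
        heapq.heappush(self.heap, (erase_count + 1, block_id))

pool = FreeBlockPool(block_ids=range(4))
new_block, _ = pool.take_least_worn()       # destination of the flushed buffer
replaced_block, replaced_ec = 99, 7         # hypothetical old data block
pool.recycle(replaced_block, replaced_ec)   # erased (count becomes 8) and pooled
```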
Wear-Leveling Static Algorithm
• Periodically scan the metadata of each block
• Identify inactive data blocks with a lower erase count than the free blocks
• Copy their content into free blocks and exchange them
⇒ this guarantees that static (rarely updated) blocks also participate in wear leveling
Wear-Leveling Impact on reliability
To a first approximation, wear-leveling eliminates the write amplification generated by the different sizes of the erase and write units
⇒ The block time to fault is (N = pages per block):

Block TTF ≈ (N_die · N_planes · N_blocks · N · PECycles) / PageWriteRate

Block TTF ≈ (N_die · N_planes · N_blocks · N · PageSize · PECycles) / (PageWriteRate · PageSize)

Block TTF ≈ (Capacity_SSD · PECycles) / WriteRate

• Blocks deteriorate uniformly, thus: Block TTF ≈ SSD TTF
Wear-Leveling Example
• Take an SSD with capacity C and a write rate W
• Depending on the flash cells used, we have different times to fault:
• C = 4 GB, W = 80 KBps
  ◦ SLC ⇒ SSD TTF = C · PECycles_SLC / W = 4 GB · 10^5 / 80 KBps ≈ 158 years
  ◦ MLC ⇒ SSD TTF = C · PECycles_MLC / W ≈ 15.8 years
  ◦ TLC ⇒ SSD TTF = C · PECycles_TLC / W ≈ 7.9 years
• C = 128 GB, W = 4 MBps
  ◦ SLC ⇒ SSD TTF = C · PECycles_SLC / W = 128 GB · 10^5 / 4 MBps ≈ 101 years
  ◦ MLC ⇒ SSD TTF = C · PECycles_MLC / W ≈ 10 years
  ◦ TLC ⇒ SSD TTF = C · PECycles_TLC / W ≈ 5 years
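A minimal sketch reproducing the figures above from SSD TTF ≈ Capacity · PECycles / WriteRate; decimal GB/KB/MB units are assumed, as on the slide.

```python
# Sketch: SSD time-to-fault under ideal wear-leveling,
# TTF ≈ Capacity * PECycles / WriteRate (numbers from the slide).

SECONDS_PER_YEAR = 365 * 24 * 3600
PE_CYCLES = {"SLC": 100_000, "MLC": 10_000, "TLC": 5_000}

def ssd_ttf_years(capacity_bytes: float, write_rate_bps: float, pe_cycles: int) -> float:
    return capacity_bytes * pe_cycles / write_rate_bps / SECONDS_PER_YEAR

cases = [
    ("4 GB @ 80 KBps", 4e9, 80e3),
    ("128 GB @ 4 MBps", 128e9, 4e6),
]
for label, capacity, write_rate in cases:
    for cell, cycles in PE_CYCLES.items():
        print(f"{label} {cell}: ~{ssd_ttf_years(capacity, write_rate, cycles):.1f} years")
# 4 GB @ 80 KBps SLC: ~158.5 years, matching the slide
```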
Wear-Leveling Impact on Reliability 2
As said before, wear leveling makes flash blocks deteriorate uniformly. However:
• Garbage collection increases the number of flash writes
• Static wear-leveling increases the number of flash writes
⇒ they re-introduce a write amplification factor:

SSD TTF ≈ (Capacity_SSD · PECycles) / ((1 + A_f) · WriteRate)
SSD Architecture
[Figure: SSD block diagram, built up incrementally over the original slides]
• Durable Storage: multiple NAND flash chips, reached through the flash bus
• Flash Controller: read & write logic, wear-leveling, FTL
• CPU: garbage collection, ECC error handling
• SRAM: TM tables and EC tables (connected via the control bus)
• DRAM Buffer: write cache (on the data bus)
• Host Interface: PATA, SATA, SCSI, etc.
Data Reduction
• Reducing the amount of user data effectively stored in the flash chips reduces the write rate and increases the lifetime of flash drives
• Data reduction techniques are:
  ◦ Compression
  ◦ Deduplication
Data Reduction Data Compression
• It consists in reducing the number of bits needed to store data
• Lossless compression allows data to be restored to its original state
• Lossy compression permanently eliminates bits of data that are redundant, unimportant or imperceptible
• CompressionRatio = UncompressedSize / CompressedSize
⇒ the data reduction is DR_c = 1 / CompressionRatio

SSD TTF ≈ (Capacity_SSD · PECycles) / (WriteRate · (1 + A_f) · DR_c)
Data Reduction Data Deduplication
• It looks for redundancy of sequences of bytes across very large comparison windows
• Sequences of data are compared to the history of other such sequences
• The first uniquely stored version of a sequence is referenced rather than stored again
• Let DD be the average fraction of deduplicable data

SSD TTF ≈ (Capacity_SSD · PECycles) / (WriteRate · (1 + A_f) · DR_c · (1 − DD))
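A short sketch combining the two formulas above; the compression ratio, dedup fraction and A_f values below are illustrative assumptions, not measured figures.

```python
# Sketch: SSD TTF with write amplification, compression and deduplication,
# SSD_TTF ≈ Capacity * PECycles / (WriteRate * (1 + A_f) * DR_c * (1 - DD)).
# All workload parameters below are illustrative assumptions.

SECONDS_PER_YEAR = 365 * 24 * 3600

def ssd_ttf_years(capacity, pe_cycles, write_rate, a_f, compression_ratio, dedup_fraction):
    dr_c = 1.0 / compression_ratio            # fraction of data left after compression
    effective_write_rate = write_rate * (1 + a_f) * dr_c * (1 - dedup_fraction)
    return capacity * pe_cycles / effective_write_rate / SECONDS_PER_YEAR

years = ssd_ttf_years(
    capacity=128e9, pe_cycles=10_000, write_rate=4e6,   # 128 GB MLC at 4 MBps
    a_f=0.25, compression_ratio=2.0, dedup_fraction=0.3,
)
print(f"~{years:.1f} years")   # data reduction more than offsets the GC overhead
```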
RAID Solutions on flash technology
• RAID uses redundancy (e.g. a parity code) to increase reliability
• Any RAID solution increases the amount of data physically written on the disks (RAID overhead)
⇒ when adopting a RAID solution with flash technology, we reduce the lifetime of the whole storage system by a factor at most equal to the RAID overhead
RAID Solutions on flash technology Example
• N flash disks of capacity C with cells supporting L P/E-cycles
• Write load rate equal to W
RAID0:
• stripes data
• no fault tolerance
• W is uniformly distributed over the disks (thanks to striping)
⇒ TTL_RAID0 = N · C · L / W
RAID10:
• stripes data
• replicates each disk
⇒ TTL_RAID10 = N · C · L / (2 W)
Alert! In order to increase reliability we halve the time to live of the flash cells
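A hedged sketch of the two TTL formulas above; the disk count and workload values are invented for illustration.

```python
# Sketch: array time-to-live for RAID0 vs RAID10 on flash,
# TTL_RAID0 = N*C*L / W and TTL_RAID10 = N*C*L / (2*W).
# The array below is an illustrative example, not a reference configuration.

SECONDS_PER_YEAR = 365 * 24 * 3600

def raid_ttl_years(n_disks, capacity, pe_cycles, write_rate, write_overhead=1):
    return n_disks * capacity * pe_cycles / (write_overhead * write_rate) / SECONDS_PER_YEAR

N, C, L, W = 8, 128e9, 10_000, 50e6       # 8 x 128 GB MLC disks, 50 MBps load
print(f"RAID0 : ~{raid_ttl_years(N, C, L, W, write_overhead=1):.1f} years")
print(f"RAID10: ~{raid_ttl_years(N, C, L, W, write_overhead=2):.1f} years")
```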
Modeling SSD endurance in a complex system
[Figure: a group of SSDs forming an SSD system, driven by the system workload]
Does it still make sense to use RAID?
• We have learned that any redundancy reduces the maximum time to live of all SSDs
• The answer is YES, but why?
Does it still make sense to use RAID? YES! Why?
[Figure: the SSD block diagram again: CPU, SRAM, DRAM buffer, flash controller, host interface, buses and NAND flash chips]
ALL THESE COMPONENTS MAY HAVE A FAULT
EMC XtremIO
• The system building block is called an XBrick:
  ◦ 25 800GB eMLC SSDs
  ◦ two 1U Storage Controllers (redundant storage processors)
• Scaling is achieved by adding more XBricks (up to six in a rack), connected through InfiniBand ports
• The system performs inline data reduction by:
  ◦ deduplication
  ◦ compression
EMC XtremIO Deduplication
• The system checks for duplicated data:
  1. subdivide the write stream into 4KB blocks
  2. for each block in the stream
    2.1 compute a digest
    2.2 check in a shared mapping table for the presence of the block
    2.3 if present, update a reference counter
    2.4 else use the digest to determine the location of the block and send the block to the respective controller node
• The addressing of blocks should uniformly distribute the data over all nodes
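A simplified sketch of content-addressed deduplication in the spirit of the steps above (digest, table lookup, reference counting, digest-based placement). It is not XtremIO's actual implementation; the hash function and node-selection rule are assumptions.

```python
# Simplified content-addressed deduplication sketch: 4 KB blocks are keyed by
# their digest; duplicates only bump a reference counter, new blocks are routed
# to a node chosen from the digest (illustrative, not XtremIO's real algorithm).
import hashlib

BLOCK_SIZE = 4096
NUM_NODES = 4

table = {}   # digest -> {"node": int, "refs": int}

def write_stream(data: bytes) -> None:
    for off in range(0, len(data), BLOCK_SIZE):
        block = data[off:off + BLOCK_SIZE]
        digest = hashlib.sha1(block).hexdigest()
        entry = table.get(digest)
        if entry is not None:
            entry["refs"] += 1                       # duplicate: just reference it
        else:
            node = int(digest, 16) % NUM_NODES       # digest decides placement
            table[digest] = {"node": node, "refs": 1}
            # send `block` to controller `node` for storage (omitted)

write_stream(b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE)
print({d[:8]: e for d, e in table.items()})          # two unique blocks
```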
EMC XtremIO XtremIO Data Protection
• The XtremIO system implements a proprietary data protection algorithm called XtremIO Data Protection (XDP)
• Disks in a node are arranged in 23+2 columns
• 1 row parity and 1 diagonal parity
• Each stripe is subdivided into 28 rows and 29 diagonals
EMC XtremIO XtremIO Data Protection
• In order to compute the diagonal parity efficiently and to spread writes over all disks, XDP waits to fill the emptiest stripe in memory
• When the stripe is full, it is committed to the disks
• The emptiest-stripe selection implies that free space is linearly distributed over the stripes
• XDP can:
  ◦ overcome 2 concurrent failures (2 parities)
  ◦ have a write overhead smaller than other RAID solutions
EMC XtremIO XtremIO Data Protection
Suppose the system is 80% full:
• Due to the emptiest-stripe selection, the emptiest stripe is 40% free
• A full stripe can handle 28 · 23 = 644 writes
• The emptiest stripe can therefore accept 644 · 40% ≈ 257 user writes
• Committing it also writes the parities: #parities = 28 (rows) + 29 (diagonals) = 57
• RAIDoverhead = (#userwrites + #parities) / #userwrites = (257 + 57) / 257 ≈ 1.22
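A sketch of the overhead arithmetic above for a 23+2 layout; the reading that the emptiest stripe has twice the average free fraction follows from the linear distribution mentioned on the previous slide.

```python
# Sketch: XDP write overhead for a 23+2 layout when the array is `fullness` full.
# Assumes free space is linearly distributed, so the emptiest stripe has twice
# the average free fraction (illustrative reading of the slide's example).

DATA_COLUMNS, ROWS, DIAGONALS = 23, 28, 29

def xdp_overhead(fullness: float) -> float:
    stripe_capacity = ROWS * DATA_COLUMNS                 # 644 user writes per stripe
    emptiest_free = min(1.0, 2 * (1 - fullness))          # emptiest stripe free fraction
    user_writes = stripe_capacity * emptiest_free
    parity_writes = ROWS + DIAGONALS                      # 28 row + 29 diagonal parities
    return (user_writes + parity_writes) / user_writes

print(f"80% full -> overhead ~{xdp_overhead(0.80):.2f}")   # ~1.22
```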