Performance Analysis of Commodity and Enterprise Class Flash Devices

Neal M. Master, Matthew Andrews, Jason Hick, Shane Canon & Nicholas J. Wright


  1. Performance Analysis of Commodity and Enterprise Class Flash Devices
     Neal M. Master, Matthew Andrews, Jason Hick, Shane Canon & Nicholas J. Wright

  2. Data Trends at NERSC

  3. Data Trends at NERSC (cont.)

  4. Memory Capacity Trends
     • Technology trends:
       – Memory density doubles every 3 years; processor logic doubles every 2
       – Storage cost ($/MB) drops more gradually than logic cost
     [Figure: Cost of Computation vs. Memory. Source: David Turek, IBM]
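
The two doubling rates on this slide imply a gap that widens over time. The sketch below works through that arithmetic; the 6- and 12-year horizons and the function name are chosen here for illustration only and do not come from the slides.

```python
def growth_factor(years: float, doubling_period_years: float) -> float:
    """Growth multiple after `years` for a quantity that doubles every period."""
    return 2.0 ** (years / doubling_period_years)


MEMORY_DOUBLING_YEARS = 3.0  # from the slide: memory density 2X every 3 years
LOGIC_DOUBLING_YEARS = 2.0   # from the slide: processor logic 2X every 2 years

# Horizons are illustrative assumptions, not values from the presentation.
for years in (6, 12):
    memory = growth_factor(years, MEMORY_DOUBLING_YEARS)
    logic = growth_factor(years, LOGIC_DOUBLING_YEARS)
    print(f"{years} yr: memory x{memory:.0f}, logic x{logic:.0f}, "
          f"logic/memory gap x{logic / memory:.0f}")
```

Under these rates the logic-to-memory ratio itself doubles every 6 years, since 2^(t/2) / 2^(t/3) = 2^(t/6).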

  5. I/O Performance Challenges
     Performance Crisis #1
     • Disks are outpaced by compute speed.
     • Achieving reasonable aggregate bandwidth requires many spindles
       – 10^3 spindles = 1 PB but only ~0.1 TB/s!
     Performance Crisis #2
     • Data motion on an exascale machine will be expensive, in terms of both energy and performance!
     [Figure: energy per operation in picojoules (log scale, 1 to 10000), now vs. 2018, for DP FLOP, register, 1mm on-chip, 5mm on-chip, off-chip/DRAM, local, and cross system]
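
The spindle figures on this slide can be unpacked with a little arithmetic. The sketch below assumes decimal units (1 PB = 10^6 GB) and that capacity and bandwidth are spread evenly across spindles; neither assumption is stated in the slides.

```python
SPINDLES = 10**3
TOTAL_CAPACITY_GB = 1_000_000   # 1 PB, assuming decimal units
AGGREGATE_BW_GB_S = 100         # ~0.1 TB/s from the slide

# Even split across spindles is an assumption made for illustration.
per_spindle_capacity_gb = TOTAL_CAPACITY_GB / SPINDLES
per_spindle_bw_mb_s = AGGREGATE_BW_GB_S * 1000 / SPINDLES
full_scan_seconds = TOTAL_CAPACITY_GB / AGGREGATE_BW_GB_S

print(f"per spindle: {per_spindle_capacity_gb:.0f} GB at {per_spindle_bw_mb_s:.0f} MB/s")
print(f"reading the full 1 PB once: {full_scan_seconds:.0f} s "
      f"(~{full_scan_seconds / 3600:.1f} h)")
```

Under these assumptions, a single pass over the full petabyte takes on the order of hours even with a thousand disks working in parallel.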

  6. Flash Memory - Ubiquitous

  7. Flash – What is it good for?
     • Fits nicely into the latency gap between spinning disk and memory
     • Lots of open questions:
       – PCI vs. SATA vs. ?
       – SLC vs. MLC
       – A write requires a block erase, so performance depends on the previous I/O pattern
       – Correct algorithm in software at all levels
       – …

  8. Devices Evaluated
     • 3 PCI-e SLC
       – Virident tachIOn, 400GB, 8x
       – FusionIO ioDrive Duo, 2x 160GB, 4x
       – Texas Memory Systems RamSan-20, 450GB, 4x
     • 2 SATA MLC
       – Intel X-25M, 160GB
       – OCZ Colossus, 250GB

  9. IOZone Experiments
     • Bandwidth
       – Vary block size: 2^n KB, n = 2 to 8
       – Vary concurrency: 2^n threads, n = 0 to 7 (1 to 128 threads)
       – Vary I/O patterns: Sequential Write/Re-write, Sequential Read/Re-read, Random Write, Mixed Random Write/Read, Random Read
     • IOPS
       – 4KB block size
       – Vary concurrency: 2^n threads, n = 0 to 7 (1 to 128 threads)
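
The parameter sweep described on this slide can be enumerated as a grid. The sketch below only lists the configurations; it does not invoke IOZone, and the variable names are illustrative rather than taken from the authors' scripts.

```python
from itertools import product

BLOCK_SIZES_KB = [2**n for n in range(2, 9)]   # 2^n KB, n = 2..8
THREAD_COUNTS = [2**n for n in range(0, 8)]    # 2^n threads, n = 0..7
IO_PATTERNS = [
    "sequential write/re-write",
    "sequential read/re-read",
    "random write",
    "mixed random write/read",
    "random read",
]

# Bandwidth sweep: every pattern at every block size and thread count.
bandwidth_runs = list(product(IO_PATTERNS, BLOCK_SIZES_KB, THREAD_COUNTS))

# IOPS sweep: block size fixed at 4 KB, only the thread count varies.
iops_runs = [(4, threads) for threads in THREAD_COUNTS]

print("block sizes (KB):", BLOCK_SIZES_KB)
print("thread counts:", THREAD_COUNTS)
print(len(bandwidth_runs), "bandwidth configurations per device")
print(len(iops_runs), "thread counts in the IOPS sweep")
```

Enumerating the grid up front makes the cost of the study visible: the bandwidth sweep alone is several hundred runs per device.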

  10. SATA Bandwidths
      [Figure: Intel X25-M read bandwidth (MB/s, 0 to 200) versus I/O block size (4 to 256 KB) and number of threads (1 to 128)]

  11. PCI-e Bandwidths
      [Figure: TMS RamSan read bandwidth (MB/s, 0 to 800) versus I/O block size (4 to 256 KB) and number of threads (1 to 128)]

  12. PCI-e Bandwidths (cont.)
      [Figure: Virident tachIOn read bandwidth (MB/s, 0 to 1200) versus I/O block size (4 to 256 KB) and number of threads (1 to 128)]

  13. Bandwidth Summary
      [Figure: measured and company-reported read and write bandwidth (MB/s, 0 to 1400) for TMS RamSan 20 (450GB), Virident tachIOn (400GB), Fusion IO ioDrive Duo (Single Slot, 160GB), Intel X-25M (160GB), and OCZ Colossus (250GB)]

  14. IOPS - Read
      [Figure: read IO/s (thousands, 0 to 180) versus number of threads (1 to 128) for Virident tachIOn (400GB), TMS RamSan 20 (450GB), Fusion IO ioDrive Duo (Single Slot, 160GB), Intel X-25M (160GB), and OCZ Colossus (250GB)]

  15. IOPS - Write
      [Figure: write IO/s (thousands, 0 to 180) versus number of threads (1 to 128) for Virident tachIOn (400GB), TMS RamSan 20 (450GB), Fusion IO ioDrive Duo (Single Slot, 160GB), Intel X-25M (160GB), and OCZ Colossus (250GB)]

  16. Flash Device Evaluation - IOPS
      [Figure: peak read and peak write IOPS (thousands, 0 to 180) for TMS RamSan 20 (450GB), Virident tachIOn (400GB), Fusion IO ioDrive Duo (Single Slot, 160GB), Intel X-25M (160GB), and OCZ Colossus (250GB)]

  17. Degradation Experiment
      • Create a file, using cat /dev/urandom | dd, that fills X% of the drive (X = 30, 50, 70, 90)
      • Using FIO, randomly write to the file
        – Using 4KB blocks to measure IOPS
        – Using 128KB blocks to measure bandwidth
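
The degradation procedure above can be sketched as a small planning script. This is a minimal sketch, not the authors' setup: the slides do not give the exact dd or fio options, so the file path, the one-hour runtime, the use of direct I/O, and the decimal-GB capacity are all assumptions. The script only builds the fio argument list and does not run anything.

```python
def fill_bytes(capacity_gb: int, fill_percent: int) -> int:
    """Bytes needed to fill `fill_percent` of a drive (decimal GB assumed)."""
    return capacity_gb * 10**9 * fill_percent // 100


def fio_randwrite_args(path: str, block_size: str, runtime_s: int) -> list[str]:
    """Build (but do not run) a fio random-write command for an existing file."""
    return [
        "fio",
        "--name=degradation",      # job name, chosen here for illustration
        f"--filename={path}",      # the pre-filled file from the dd step
        "--rw=randwrite",          # random writes, as on the slide
        f"--bs={block_size}",      # 4k for IOPS, 128k for bandwidth
        "--direct=1",              # assumption: bypass the page cache
        "--time_based",
        f"--runtime={runtime_s}",  # assumption: run length is not given in the slides
    ]


FILL_PERCENTAGES = (30, 50, 70, 90)   # from the slide
CAPACITY_GB = 400                     # e.g. the Virident tachIOn
TARGET = "/mnt/flash/fill.dat"        # hypothetical path

for pct in FILL_PERCENTAGES:
    print(f"{pct}% fill -> {fill_bytes(CAPACITY_GB, pct)} bytes")

print(" ".join(fio_randwrite_args(TARGET, "4k", 3600)))     # IOPS run
print(" ".join(fio_randwrite_args(TARGET, "128k", 3600)))   # bandwidth run
```

Keeping the fill size and the fio arguments as plain functions makes it easy to repeat the same plan for each device and fill level.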

  18. Degradation - IOPS
      [Figure: write IO/s (thousands, 0 to 35) versus time (0 to 60 minutes) for Virident tachIOn (400GB), TMS RamSan 20 (450GB), Fusion IO ioDrive Duo (Single Slot, 160GB), Intel X-25M (160GB), and OCZ Colossus (250GB)]

  19. Degradation - IOPS Summary
      [Figure: percentage of peak write IO/s (0% to 120%) at 30%, 50%, 70%, and 90% fill for Virident tachIOn (400GB), TMS RamSan 20 (450GB), Fusion IO ioDrive Duo (Single Slot, 160GB), Intel X-25M (160GB), and OCZ Colossus (250GB)]

  20. Degradation - Bandwidth
      [Figure: write bandwidth (MB/s, 0 to 1200) versus time (0 to 50 minutes) for Virident tachIOn (400GB), TMS RamSan 20 (450GB), Fusion IO ioDrive Duo (Single Slot, 160GB), Intel X-25M (160GB), and OCZ Colossus (250GB)]

  21. Degradation - Bandwidth Summary
      [Figure: percentage of peak write bandwidth (0% to 90%) at 30%, 50%, 70%, and 90% capacity for Virident tachIOn (400GB), TMS RamSan 20 (450GB), Fusion IO ioDrive Duo (Single Slot, 160GB), Intel X-25M (160GB), and OCZ Colossus (250GB)]

  22. Summary
      • PCI devices are much more capable than the SATA ones
      • For PCI devices, read and write performance are roughly equal for both sequential I/O and IOPS
      • It is important to test each device for your workload
      • The PCI devices especially can be difficult to use…

  23. Future Work
      • Testing flash with Hadoop
      • Evaluating various new storage technologies (PCM, etc.)
      • Exploring other uses for flash
        – Metadata storage

  24. Combining Flash with Hadoop
