
WorkOut: I/O Workload Outsourcing for Boosting RAID Reconstruction Performance - PowerPoint PPT Presentation



  1. WorkOut: I/O Workload Outsourcing for Boosting RAID Reconstruction Performance. Suzhen Wu 1, Hong Jiang 2, Dan Feng 1, Lei Tian 1,2, Bo Mao 1. 1 Huazhong University of Science & Technology, 2 University of Nebraska-Lincoln

  2. Outline
  - Background
  - Motivation
  - WorkOut
  - Performance Evaluations
  - Conclusion

  3. RAID Reconstruction
  - Recovers the data content on a failed disk
  - Two metrics: reconstruction time and user response time
  - Categories: off-line reconstruction and on-line reconstruction (commonly deployed)

  4. Challenges
  - Higher error rates than expected
    - Complete disk failures [Schroeder07, Pinheiro07, Jiang08]
    - Latent sector errors [Bairavasundaram07]
  - Correlation in drive failures
    - e.g., after one disk fails, another disk failure will likely occur soon
  - RAID reconstruction might become the common case in large-scale systems
    - Increasing number of drives

  5. Reconstruction and Its Performance Impact
  [Chart: performance impact of reconstruction; annotations read "70 times" and "3 times"]

  6. I/O Intensity Impact on Reconstruction
  [Chart: annotations read "21 times" and "~4 times"]
  - Both the reconstruction time and user response time increase with IOPS.

  7. Intuitive Idea
  - Observation
    - Performing the rebuild I/Os and user I/Os simultaneously leads to disk bandwidth contention and frequent long seeks to and from the multiple separate data areas.
  - Our intuitive idea
    - Redirect the user I/Os that are issued to the degraded RAID set somewhere else.
  - But: what to redirect? And where to redirect to?

  8. What To Redirect
  - Access locality
    - Existing studies on workload analysis revealed that strong spatial and temporal locality exists even underneath the storage cache.
  - Answer to "what to redirect?"
    - Popular read requests
    - All write requests

  9. Where To Redirect To
  - Availability of spare or free space in data centers
    - A spare pool including a number of disks
    - Free space on other RAID sets
  - Answer to "where to redirect to?"
    - Spare or free space
  - Comparison
    - Existing approaches: in the context of a single RAID set
    - Our approach: in the context of data centers with multiple RAID sets

  10. Main Idea of WorkOut
  - Workload Outsourcing (WorkOut)
    - Temporarily redirect all write requests and popular read requests originally targeted at the degraded RAID set to a surrogate RAID set, to significantly improve on-line reconstruction performance.
  - Goal
    - Approach the reconstruction-time performance of off-line reconstruction without affecting user-response-time performance.

  11. WorkOut Architecture
  [Diagram: Administrator Interface on top of the Surrogate Space Manager, Popular Data Identifier, Request Redirector, and Reclaimer, sitting between the degraded RAID set (including the failed disk and a spare disk) and the surrogate disks]

  12. Data Structure
  - D_Table: a log table that manages the redirected data
    - D_Flag=1: write data from the user application
    - D_Flag=0: popular read data redirected from D-RAID to S-RAID
  - R_LRU: an LRU-style list that identifies the most recent reads
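
  A minimal Python sketch of how these two structures might be represented; the field names (d_offset, s_offset, length, d_flag) and the list capacity are illustrative assumptions, not the prototype's actual layout:

```python
from collections import OrderedDict

class DTableEntry:
    """One log record for a block range redirected to the surrogate RAID set (S-RAID)."""
    def __init__(self, d_offset, s_offset, length, d_flag):
        self.d_offset = d_offset  # original offset on the degraded RAID set (D-RAID)
        self.s_offset = s_offset  # offset of the redirected copy on S-RAID
        self.length = length      # number of sectors covered by this entry
        self.d_flag = d_flag      # 1 = write data from the user application, 0 = popular read data

class DTable:
    """Log table that manages the redirected data, keyed by D-RAID offset."""
    def __init__(self):
        self.entries = {}         # d_offset -> DTableEntry

class RLru:
    """LRU-style list that identifies the most recent reads."""
    def __init__(self, capacity=4096):
        self.capacity = capacity
        self.lru = OrderedDict()  # d_offset -> True, ordered from oldest to most recent

    def access(self, d_offset):
        """Record a read; return True if the block was read recently (i.e., it is 'popular')."""
        hit = d_offset in self.lru
        self.lru[d_offset] = True
        self.lru.move_to_end(d_offset)
        if len(self.lru) > self.capacity:
            self.lru.popitem(last=False)  # evict the least recently used entry
        return hit
```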

  13. Algorithm During Reconstruction
  - Workflow (see the sketch below)
    - Each write is redirected either to its previous location or to a new location on the surrogate RAID set, depending on whether it is an overwrite.
    - Each read checks D_Table:
      - Does it hit D_Table or not?
      - If a hit, is it a full hit or a partial hit?
      - If a miss, does it hit R_LRU?
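
  A rough sketch of this workflow, reusing the DTable/RLru structures above; d_raid and s_raid are hypothetical objects assumed to expose read(offset, length), write(offset, data), and allocate(length), and partial hits are handled in a simplified way:

```python
def handle_write(req, d_table, s_raid):
    """During reconstruction, redirect every write to the surrogate RAID set."""
    entry = d_table.entries.get(req.offset)
    if entry is not None:
        # Overwrite of already-redirected data: reuse its previous surrogate location.
        s_raid.write(entry.s_offset, req.data)
        entry.d_flag = 1  # the surrogate copy is now newer than the one on D-RAID
    else:
        # First write to this range: allocate a new surrogate location and log it.
        s_offset = s_raid.allocate(req.length)
        s_raid.write(s_offset, req.data)
        d_table.entries[req.offset] = DTableEntry(req.offset, s_offset, req.length, d_flag=1)

def handle_read(req, d_table, r_lru, d_raid, s_raid):
    """Serve reads from the surrogate set when possible; redirect popular reads to it."""
    entry = d_table.entries.get(req.offset)
    if entry is not None and entry.length >= req.length:
        # Full hit: the whole request is already on the surrogate set.
        return s_raid.read(entry.s_offset, req.length)
    if entry is not None:
        # Partial hit (simplified): take the redirected prefix from S-RAID,
        # and the remainder from the degraded set.
        data = d_raid.read(req.offset, req.length)
        data[:entry.length] = s_raid.read(entry.s_offset, entry.length)
        return data
    # Miss in D_Table: serve from the degraded set; if R_LRU says the block is
    # popular, also copy it to the surrogate set so future reads avoid D-RAID.
    data = d_raid.read(req.offset, req.length)
    if r_lru.access(req.offset):
        s_offset = s_raid.allocate(req.length)
        s_raid.write(s_offset, data)
        d_table.entries[req.offset] = DTableEntry(req.offset, s_offset, req.length, d_flag=0)
    return data
```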

  14. Algorithm During Reclaim
  - The redirected write data must be reclaimed back to the newly recovered RAID set after the reconstruction process completes.
  - All requests are still checked against D_Table during reclaim (see the sketch below):
    - Each write request is served by the recovered RAID set, and the corresponding log entry in D_Table is deleted if it exists.
    - Read requests can also be handled correctly, but the details are too involved for a short talk; more details can be found in our paper.
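
  A minimal sketch of the reclaim phase under the same assumptions as above; the read-request path during reclaim is omitted here, as the slide defers it to the paper:

```python
def reclaim(d_table, d_raid, s_raid):
    """After reconstruction completes, copy redirected write data back to the recovered set."""
    for d_offset, entry in list(d_table.entries.items()):
        if entry.d_flag == 1:
            # Redirected write data: the surrogate copy is the newest version.
            data = s_raid.read(entry.s_offset, entry.length)
            d_raid.write(d_offset, data)
        # Redirected popular-read data (d_flag == 0) is still valid on the
        # recovered set, so its log entry can simply be dropped.
        del d_table.entries[d_offset]

def handle_write_during_reclaim(req, d_table, d_raid):
    """New writes go straight to the recovered RAID set; stale log entries are removed."""
    d_raid.write(req.offset, req.data)
    d_table.entries.pop(req.offset, None)  # invalidate the now-stale redirected copy, if any
```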

  15. Design Choices
  Optional surrogate device        | Performance | Reliability | Overhead    | Maintainability
  A dedicated surrogate RAID1 set  | medium      | medium      | high        | simple
  A dedicated surrogate RAID5 set  | high        | high        | high        | simple
  A live surrogate RAID5 set       | low         | low         | medium-high | complicated

  16. Data Consistency
  - Data protection
    - To avoid data loss caused by a disk failure in the surrogate RAID set, all redirected write data in the surrogate RAID set should be protected by a redundancy scheme such as RAID1 or RAID5.
  - "Metadata" protection
    - The contents of D_Table should be kept in NVRAM during the entire period when WorkOut is activated, to prevent data loss in the event of a power supply failure. (A small sketch of this ordering follows.)
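
  As a rough illustration of that requirement (not the authors' implementation), the sketch below appends each D_Table entry to a hypothetical NVRAM-backed log, here named /dev/nvram_dtable, and forces it to stable storage before the redirected write would be acknowledged:

```python
import json
import os

def persist_d_table_entry(entry, nvram_log_path="/dev/nvram_dtable"):
    """Append one D_Table entry to an NVRAM-backed log and make it durable
    before the corresponding redirected write is acknowledged to the host.
    The log path and record format are illustrative assumptions."""
    record = json.dumps({
        "d_offset": entry.d_offset,   # original location on the degraded RAID set
        "s_offset": entry.s_offset,   # redirected location on the surrogate RAID set
        "length": entry.length,
        "d_flag": entry.d_flag,       # 1 = redirected write, 0 = redirected popular read
    })
    fd = os.open(nvram_log_path, os.O_WRONLY | os.O_APPEND | os.O_CREAT)
    try:
        os.write(fd, (record + "\n").encode())
        os.fsync(fd)  # the mapping must be durable before the write is acknowledged
    finally:
        os.close(fd)
```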

  17. Performance Evaluation
  - Prototype implementation
    - A built-in module in MD
    - Incorporated into PR & PRO
  - Experimental setup
    - Intel Xeon 3.0GHz processor, 1GB DDR memory, 15 Seagate SATA disks (10GB), Linux 2.6.11
  - Methodology
    - Open-loop: trace replay
      - Traces: Financial1, Financial2, Websearch2
      - Tool: RAIDmeter
    - Closed-loop: TPC-C-like benchmark

  18. Experimental Results

  Reconstruction time (seconds):
  Trace | Off-line | PR      | WorkOut+PR | Speedup | PRO     | WorkOut+PRO | Speedup
  Fin1  |          | 1121.75 | 203.13     | 5.52    | 1109.62 | 188.26      | 5.89
  Fin2  | 136.4    | 745.19  | 453.32     | 1.64    | 705.79  | 431.24      | 1.64
  Web   |          | 9935.6  | 7623.22    | 1.30    | 9888.27 | 7851.36     | 1.26

  Average user response time during reconstruction (milliseconds):
  Trace | Normal | Degraded | PR    | WorkOut+PR | Speedup | PRO   | WorkOut+PRO | Speedup
  Fin1  | 7.92   | 9.52     | 12.71 | 4.43       | 2.87    | 9.83  | 4.58        | 2.15
  Fin2  | 8.13   | 13.36    | 25.8  | 9.69       | 2.66    | 22.97 | 10.19       | 2.25
  Web   | 18.46  | 26.95    | 38.57 | 28.35      | 1.36    | 35.58 | 29.12       | 1.22

  - Degraded RAID set: RAID5, 8 disks, 64KB stripe unit size
  - Surrogate RAID set: RAID5, 4 disks, 64KB stripe unit size
  - Minimum reconstruction bandwidth: 1MB/s

  19. Percentage of Redirected Requests
  [Chart: percentage of redirected requests; annotation reads "84%"]
  - Minimum reconstruction bandwidth of 1MB/s

  20. Sensitivity Study (1)
  [Charts: (a) reconstruction time (s), (b) average response time (ms)]
  - Different minimum reconstruction bandwidths: 1MB/s, 10MB/s, 100MB/s

  21. Sensitivity Study (2)
  [Charts: (a) reconstruction time (s), (b) average response time (ms); bars for PR, PRO, and WorkOut at 5, 8, and 11 disks]
  - Different numbers of disks: 5, 8, 11

  22. Sensitivity Study (3)
  [Charts: (a) and (b) reconstruction time (s); bars for PR and WorkOut on RAID10 and RAID6]
  - Different RAID levels: RAID10 (4 disks), RAID6 (8 disks)

  23. Different Surrogate Sets
  [Chart: results for Dedicated RAID1, Dedicated RAID5, Live RAID5, and PR on Fin1, Fin2, and Web]
  - The reconstruction time is the same for the three different surrogate sets.
  - Dedicated RAID1: 2 disks
  - Dedicated RAID5: 4 disks
  - Live RAID5: 4 disks (replaying the Fin1 workload on it)

  24. TPC-C-like Benchmark
  [Charts: (a) normalized transaction rate, (b) reconstruction time; annotation reads "15%"]
  - Minimum reconstruction bandwidth of 1MB/s

  25. Extendibility: Re-synchronization
  [Charts: (a) re-synchronization time (s), (b) average response time (ms)]
  - Re-synchronization: RAID5, 8 disks, 64KB stripe unit size
  - Surrogate RAID set: RAID5, 4 disks, 64KB stripe unit size
  - Minimum re-synchronization bandwidth: 1MB/s
