Practical Scrubbing: Getting to the bad sector at the right time
George Amvrosiadis, Bianca Schroeder (University of Toronto)
Alina Oprea (RSA Laboratories)
Hard disk errors & Scrubbing
What could go wrong?
[Figure: cutaway of a hard disk annotated with failure causes: surface abnormalities, head crash, firmware bug, spindle failure, dust build-up, electrical failure, PCB failure]
What could go wrong?
Latent Sector Error (LSE)
L. Bairavasundaram et al., "An analysis of latent sector errors in disk drives", ACM SIGMETRICS 2007.
What could go wrong?
No Redundancy + LSE = Data Loss
[Figure: expected errors during reconstruction vs. array capacity (TB) for a single-parity RAID (3+P), at an unrecoverable error rate of 1 in 10^14 bits:
- 2012 (4 x 4 TB): expected 0.96 LSEs
- ~2015 (4 x 12 TB): expected 2.88 LSEs]
Steven R. Hetzler, "System Impacts of Storage Trends: Hard Errors and Testability". USENIX ;login:, v.36/3.
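The expected-LSE figures above follow from the amount of data read while rebuilding a failed disk. A minimal sketch that reproduces them, assuming the 1-in-10^14-bits unrecoverable error rate shown on the plot:

```python
def expected_lses(data_disks, disk_tb, bits_per_error=1e14):
    """Expected LSEs while reading every surviving data disk in full
    to reconstruct the failed one."""
    bits_read = data_disks * disk_tb * 1e12 * 8  # TB -> bits
    return bits_read / bits_per_error

print(expected_lses(3, 4))   # 3+P array of 4 TB disks (2012)   -> 0.96
print(expected_lses(3, 12))  # 3+P array of 12 TB disks (~2015) -> 2.88
```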
Scrubbing
• Goal: Detect LSEs in a timely manner to enable recovery
• How: Background process verifying sector contents
• Detection speed: verify sectors at high frequency
• Verification cost: avoid delaying workload requests
• Previous work: focus on detecting LSEs fast; what about the cost of scrubbing?
Practical questions raised:
- How do I implement a scrubber?
- How do I configure it to find LSEs fast?
- When should I scrub, to minimize impact on the system?
Implementation & Configuration
Data scrubbing
• Option 1: Use READs to verify data integrity
  - Overhead: data transfer cost, cache pollution
  - Correctness: might not check the medium surface
• Option 2: Use the VERIFY firmware command
  - Caveat: VERIFYs are treated as scheduling barriers
  - Solution: disguise the scrubber's VERIFYs as READs
[Figure: I/O scheduler queue before and after: an undisguised VERIFY stalls the queued READs and WRITEs behind it; disguised as a READ, it is scheduled among them like any other request]
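A minimal sketch of Option 1, a READ-based scrub pass (the device path is illustrative; a real scrubber would issue VERIFYs instead, precisely to avoid the transfer and cache-pollution costs listed above):

```python
import os

def read_scrub_pass(device="/dev/sdb", chunk=1 << 20):
    """One sequential scrub pass: read every sector and let the drive's
    ECC surface any latent sector error as an I/O error.
    Note the costs named above: every byte crosses the bus, and the page
    cache fills with scrub data unless O_DIRECT is used."""
    fd = os.open(device, os.O_RDONLY)
    offset = 0
    try:
        while True:
            try:
                data = os.pread(fd, chunk, offset)
            except OSError as e:
                print(f"LSE suspected near byte offset {offset}: {e}")
                offset += chunk          # skip past the bad region
                continue
            if not data:                 # end of device
                break
            offset += len(data)
    finally:
        os.close(fd)
```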
System Overview
[Figure: architecture. The scrubbing framework sits alongside the generic block layer, beneath the filesystem layer, and feeds VERIFYs to the I/O scheduler, which dispatches to the on-disk cache and the hard disk]
Parameter 1: VERIFY request size
Larger requests or smaller requests? Finding the sweet spot between scrub throughput and service delay.
[Figure: service time (msec) vs. VERIFY command size, 16 KB to 16 MB, for three drives: Fujitsu SCSI 36 GB 10K RPM, Fujitsu SAS 73 GB 15K RPM, Hitachi SAS 300 GB 15K RPM]
- Seeking cost is dominant for requests ≤ 64 KB: lower throughput
- Marginal increase in throughput for requests ≥ 4 MB: higher service delay
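A toy model of this trade-off (the seek time and media bandwidth are illustrative round numbers, not measurements from the drives above): each scrub request pays one seek plus a transfer, so throughput saturates with request size while service delay keeps growing.

```python
def verify_cost(request_kb, seek_ms=4.0, bandwidth_mb_s=100.0):
    """Toy model of one scrub request: service delay = seek + transfer,
    throughput = request size / service delay."""
    transfer_ms = request_kb / 1024 / bandwidth_mb_s * 1000
    delay_ms = seek_ms + transfer_ms
    throughput_mb_s = (request_kb / 1024) / (delay_ms / 1000)
    return delay_ms, throughput_mb_s

# Small requests are seek-dominated (low throughput); past a few MB the
# throughput gain is marginal while the delay keeps climbing.
for kb in (16, 64, 256, 1024, 4096, 16384):
    delay, tput = verify_cost(kb)
    print(f"{kb:6d} KB: {delay:7.1f} ms delay, {tput:6.1f} MB/s")
```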
Parameter 2: Scrubbing Order
• Sequential scrubbing (used in the field today): verify sectors in logical block address order.
  Can we afford this? HDD capacities grow at ~100% CAGR; data transfer speeds at only ~40% CAGR.
• Staggered scrubbing (fast LSE detection) [Oprea, Juels FAST'10]: divide the disk into regions, and scrub one segment from each region before moving on to the next segment of every region (see the sketch below).
Staggered scrubbing guarantees fast LSE detection, but its seeking overhead was never evaluated in an implementation.
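A sketch of the two orderings over segment indices, assuming a disk split into `regions` regions of `segments` segments each (parameter names are illustrative):

```python
def sequential_order(regions, segments):
    """Scrub segments in logical block address order."""
    return [r * segments + s for r in range(regions) for s in range(segments)]

def staggered_order(regions, segments):
    """Scrub the first segment of every region, then the second segment of
    every region, and so on, so that one fast pass touches every region
    early [Oprea, Juels FAST'10]."""
    return [r * segments + s for s in range(segments) for r in range(regions)]

print(sequential_order(3, 4))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
print(staggered_order(3, 4))   # [0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11]
```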
Parameter 3: Number of Regions
More, smaller regions or fewer, larger regions? Limit seeking between regions, but retain frequent disk passes.
[Figure: scrubbing throughput (MB/sec) vs. number of regions, 1 to 512, for staggered and sequential scrubbers on a Hitachi SAS 300 GB 15K RPM and a Fujitsu SAS 73 GB 15K RPM drive. Annotations: fewer regions mean slower LSE detection (longer disk passes); the staggered scrubber trades throughput for more seeking; at the right region count the performance of the two approaches is equated]
System Overview
Configuration so far: VERIFY size in [64 KB, 4 MB]; number of regions ≥ 128.
[Figure: the architecture again, with the scrubbing framework issuing staggered VERIFYs to the I/O scheduler using these parameters]
Minimizing Impact
System Overview
[Figure: the same architecture; with request size and scrubbing order settled, the remaining question mark hangs over the I/O scheduler: when should VERIFYs be handed to it?]
Background Scheduling
• Fire VERIFY requests only when the disk is otherwise idle
  - Previous work focuses on unobtrusive READs/WRITEs [Lumb'02, Bachmat'02]
• Avoid collisions with workload requests
  - Start time: when should we start firing VERIFYs?
  - Stop time: when do we stop to avoid a collision?
• Statistical analysis of idleness
  - I/O traces: 2 systems, 77 disks, diverse workloads [SNIA IOTTA repository]
[Figure: timeline of busy and idle intervals; a VERIFY still in flight when a workload READ arrives causes a collision, delaying the READ]
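The analysis that follows operates on idle intervals extracted from such traces. A minimal sketch of that extraction, assuming a trace of (arrival time, service time) pairs sorted by arrival:

```python
def idle_intervals(requests):
    """Extract idle interval lengths from a per-disk trace of
    (arrival_time, service_time) tuples, sorted by arrival."""
    intervals, busy_until = [], 0.0
    for arrival, service in requests:
        if arrival > busy_until:
            intervals.append(arrival - busy_until)  # disk sat idle this long
        busy_until = max(busy_until, arrival) + service
    return intervals

trace = [(0.0, 2.0), (3.0, 1.0), (10.0, 0.5)]
print(idle_intervals(trace))  # [1.0, 6.0]
```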
Idleness & Long Tails
Property: Long tail. The majority of idle time lives in a few large idle intervals [Riska '09].
[Figure: cumulative fraction of total idle time vs. fraction of largest idle intervals; short gaps between requests being processed contribute little, while a handful of large intervals (lunch breaks, nights at home) contribute most of the idle time]
Idleness & Long Tails
Predictor: Waiting. Fire VERIFYs only once an idle interval has lasted past a waiting threshold (Tw), so the many small intervals are never touched.
[Figure: the same tail plot with the threshold marked, and a timeline showing VERIFYs issued only after Tw of continuous idleness]
Idleness & Long Tails
[Figure: expected idle time remaining (s) vs. amount of idle time already passed (s), on log-log axes: the longer a disk has been idle, the longer it can be expected to stay idle]
Predictor: Waiting. Fire past the threshold, and stop only on collision: an interval that survives the wait is expected to be large, so scrubbing continues until the workload actually returns.
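A sketch of how the waiting predictor can be evaluated against a trace, taking idle interval lengths (in seconds) as input: intervals shorter than Tw are skipped entirely, and for longer ones everything past the initial wait is scrubbed.

```python
def waiting_utilization(intervals, t_w):
    """Fraction of total idle time the waiting predictor would use:
    it idles through the first t_w seconds of every interval, then
    scrubs until the interval ends (i.e., until a collision)."""
    used = sum(iv - t_w for iv in intervals if iv > t_w)
    total = sum(intervals)
    return used / total if total else 0.0

# Heavy-tailed toy trace: many tiny gaps, one long lunch break.
intervals = [0.1] * 100 + [600.0]
print(round(waiting_utilization(intervals, t_w=1.0), 3))  # ~0.982
```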
Idleness & Periodicity
Property: Periodicity. Disk traffic shows repeating patterns.
[Figure: number of requests (millions, log scale) per trace hour over 168 hours, showing a clear daily cycle across the week]
Predictor: Autoregression. Predict the length of the upcoming idle interval from observations at the same point in previous hours; if the prediction exceeds a threshold (Tp), fire, and don't stop.
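A minimal autoregression sketch in the spirit of the predictor above (a least-squares AR fit over an hourly series; the paper's exact model, lag count, and features may differ):

```python
import numpy as np

def fit_ar(series, lags=24):
    """Least-squares fit of x_t ~ sum_i a_i * x_(t-i) over an hourly series."""
    series = np.asarray(series, dtype=float)
    X = np.column_stack([series[lags - i:-i] for i in range(1, lags + 1)])
    y = series[lags:]
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs

def should_fire(history, coeffs, t_p):
    """Fire VERIFYs if the predicted upcoming idle time exceeds T_p."""
    recent = np.asarray(history, dtype=float)[:-len(coeffs) - 1:-1]  # newest first
    return float(coeffs @ recent) > t_p

# Synthetic two-week series with a daily idleness pattern (idle time per hour).
hours = np.arange(24 * 14)
idle = 10 + 9 * np.sin(2 * np.pi * hours / 24)
coeffs = fit_ar(idle, lags=24)
print(should_fire(idle, coeffs, t_p=5.0))  # True: a long idle hour is predicted
```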
Predictor Evaluation
[Figure: fraction of idle time utilized by the predictor vs. fraction of idle intervals picked by it; the optimal predictor captures nearly all idle time from very few intervals]
Predictor Evaluation
Oracle: always picks the X% largest intervals.
[Figure: the same plot with three curves, Oracle, Autoregression, and Waiting, comparing how much idle time each utilizes for a given fraction of intervals picked]
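A sketch of the oracle baseline, which (unrealistically) knows every interval length in advance:

```python
def oracle_utilization(intervals, fraction):
    """Pick the `fraction` largest idle intervals and report the share
    of total idle time they contain."""
    k = max(1, int(len(intervals) * fraction))
    picked = sorted(intervals, reverse=True)[:k]
    return sum(picked) / sum(intervals)

intervals = [0.1] * 100 + [600.0]                     # heavy-tailed toy trace
print(round(oracle_utilization(intervals, 0.01), 3))  # 0.984 from 1% of intervals
```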