Flexible Cache Error Protection using an ECC FIFO


  1. Flexible Cache Error Protection using an ECC FIFO
     Doe Hyun Yoon and Mattan Erez
     Dept. of Electrical and Computer Engineering, The University of Texas at Austin
     SC’09

  2. ECC FIFO
     • Goal: to reduce on-chip ECC overhead
       – Two-tiered error protection
         • T1EC: light-weight on-chip error code
         • T2EC: strong error correcting code
       – Off-load T2EC overhead to FIFO in DRAM
         • Why FIFO? It’s simple to manage
     • 15-25% LLC area reduction
     • 10-17% LLC power saving
     • Just 1% performance penalty

  3. BACKGROUND

  4. Error Correcting Codes
     • 1-bit parity for error detection
     • SEC-DED (Hamming) codes
       – Single-bit Error Correction and Double-bit Error Detection
       – 8-bit ECC for 64-bit data
     • DEC-TED
       – Double-bit Error Correction and Triple-bit Error Detection
       – 15-bit ECC for 64-bit data
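
The check-bit counts on this slide can be reproduced from standard coding bounds. The sketch below is illustrative only: it assumes SEC-DED is a Hamming code extended with one overall parity bit, and that DEC-TED is built from a (shortened) binary BCH code with two-error correction plus one overall parity bit. The slide names only the Hamming code; the BCH construction and all function names are assumptions.

```python
def hamming_check_bits(data_bits: int) -> int:
    """Smallest r with 2**r >= data_bits + r + 1 (single-error correction)."""
    r = 1
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r


def sec_ded_check_bits(data_bits: int) -> int:
    """Hamming check bits plus one overall parity bit for double-error detection."""
    return hamming_check_bits(data_bits) + 1


def dec_ted_check_bits(data_bits: int) -> int:
    """Assumed construction: BCH code over GF(2**m) correcting 2 errors.

    Such a code uses at most 2*m check bits and has length at most 2**m - 1;
    one extra parity bit adds triple-error detection.
    """
    m = 1
    while 2 ** m - 1 < data_bits + 2 * m:
        m += 1
    return 2 * m + 1


print(sec_ded_check_bits(64))  # 8
print(dec_ted_check_bits(64))  # 15
```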

  5. Interleaving
     • To detect and correct burst errors
       – N-way interleaving converts an N-bit burst error to N single-bit errors
     [Figure: bits 0 … N-1, N … 2N-1, 2N … laid out in rows of N; the N columns are protected by Error code 0 through Error code N-1]

  7. Interleaving
     • To detect and correct burst errors
       – N-way interleaving converts an N-bit burst error to N single-bit errors
     • Baseline cache error protection
       – 8-way interleaved SEC-DED
         • Can correct up to 8-bit burst errors
         • 8B ECC per 64B cache line
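
To see why N-way interleaving turns a burst into single-bit errors, the sketch below assigns physical bit i to codeword i mod N and counts how many flipped bits land in each codeword. The mapping and the function names are illustrative assumptions, not taken from the slides. With 8-way interleaving over a 64B line, each of the 8 codewords covers 64 data bits, which is consistent with 8 × 8 check bits = 8B of ECC per line.

```python
def codeword_of(bit_index: int, n_way: int) -> int:
    """Assumed mapping: adjacent physical bits belong to different codewords."""
    return bit_index % n_way


def errors_per_codeword(burst_start: int, burst_len: int, n_way: int) -> list:
    """Count how many bits of a contiguous burst fall into each codeword."""
    counts = [0] * n_way
    for bit in range(burst_start, burst_start + burst_len):
        counts[codeword_of(bit, n_way)] += 1
    return counts


# An 8-bit burst under 8-way interleaving: one error per codeword,
# so a single-error-correcting code per codeword can repair it.
print(errors_per_codeword(burst_start=13, burst_len=8, n_way=8))
# [1, 1, 1, 1, 1, 1, 1, 1]

# A 9-bit burst puts two errors into one codeword (detectable by SEC-DED,
# but not correctable).
print(max(errors_per_codeword(burst_start=13, burst_len=9, n_way=8)))  # 2
```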

  8. Uniform Error Protection
     [Figure: cache array with 8 ways and 2^11 sets; each line holds Tag, Data (64B), and ECC (8B)]
     ECC increases area AND leakage/dynamic power
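
The storage cost of uniform protection follows directly from the figure's dimensions. The arithmetic below is a rough sketch: it reads the garbled "2 11 sets" label as 2^11 sets, counts only the data and ECC arrays, and ignores tags and other state.

```python
SETS = 2 ** 11          # assumed reading of the figure label
WAYS = 8
LINE_BYTES = 64
ECC_BYTES_PER_LINE = 8

lines = SETS * WAYS
data_kib = lines * LINE_BYTES // 1024
ecc_kib = lines * ECC_BYTES_PER_LINE // 1024
overhead = ECC_BYTES_PER_LINE / LINE_BYTES

print(lines)     # 16384 cache lines
print(data_kib)  # 1024 KiB of data
print(ecc_kib)   # 128 KiB of ECC
print(overhead)  # 0.125, i.e. 12.5% extra storage
```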

  9. RELATED WORK

  10. Soft Errors: Observations
      • Still, Soft Error Rate (SER) is low
        – Every cache access tries to detect errors, but finds no error in most cases
      • Error Detection
        – Common case
        – Need a low cost, low overhead error detection mechanism
      • Error Correction
        – Uncommon case
        – Correction can be slow
        – But, still need to maintain error correction info somewhere
      • Memory hierarchy provides redundancy inherently for clean data
        – Only dirty lines need error correcting codes

  11. PERC [Sorin’06] / Energy Efficient [Li’04]
      [Figure: cache array with 8 ways and 2^11 sets; each line holds Tag, Data (64B), EDC (1B), and ECC (7B)]
      • Read only Data and EDC: saves dynamic power
      • Power gate ECC of clean lines: saves static power

  12. Area Efficient [Kim’06]
      [Figure: cache array with 4 ways and 2^12 sets holding Tag, Data (64B), and EDC (1B); a separate ECC (8B) array with 2^12 sets]
      Allow only 1 dirty line per set

  13. MAXn scheme
      [Figure: cache array with 8 ways and 2^11 sets holding Tag, Data (64B), and EDC (1B); a separate ECC array with n ways (n<8) and 2^11 sets holding Tag and ECC (8B)]
      • Allow only n dirty lines per set
      • May cause detrimental cleaning traffic
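
The cleaning traffic mentioned on this slide arises whenever a write would create more than n dirty lines in a set. The toy model below illustrates this. It assumes the least recently written dirty line is the one cleaned (the slides do not say which line is chosen) and that the working set fits within the set's 8 ways. The class and its fields are hypothetical.

```python
from collections import OrderedDict


class MaxNSet:
    """One cache set that tolerates at most n dirty lines (hypothetical model)."""

    def __init__(self, n: int):
        self.n = n
        self.dirty = OrderedDict()      # tag -> True, least recently written first
        self.cleaning_writebacks = 0

    def write(self, tag: int) -> None:
        if tag in self.dirty:
            self.dirty.move_to_end(tag)     # already dirty, already has an ECC entry
            return
        if len(self.dirty) == self.n:
            # No free ECC entry: clean one dirty line by writing it back early.
            self.dirty.popitem(last=False)
            self.cleaning_writebacks += 1
        self.dirty[tag] = True


def count_cleanings(n: int, writes: list) -> int:
    cache_set = MaxNSet(n)
    for tag in writes:
        cache_set.write(tag)
    return cache_set.cleaning_writebacks


pattern = [1, 2, 3, 1, 2, 3]          # three lines written repeatedly
print(count_cleanings(2, pattern))    # 4 forced write-backs with n = 2
print(count_cleanings(4, pattern))    # 0 with n = 4
```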

  14. Two-tiered error protection
      • Tier-1 Error Code (T1EC)
        – On-chip light-weight error code
        – Uniform error protection
      • Tier-2 Error Code (T2EC)
        – Strong error codes only for dirty lines
        – Corrects Detected but Uncorrected Errors (DUE) of T1EC

  15. Memory Mapped ECC [Yoon’09]
      [Figure: on-chip cache array with 8 ways and 2^11 sets holding Tag, Data (64B), and T1EC (1B); T2EC (8B) stored in DRAM]
      T2EC is memory mapped AND cached

  16. ECC FIFO

  17. ECC FIFO
      • Use two-tiered error protection
      • T2EC is off-loaded to FIFO in DRAM
        – LLC caching behavior is unaffected
      • FIFO
        – Simple to manage
      • Coalesce buffer
        – To better utilize DRAM channel

  18. [Figure: block diagram showing the rest of the cache hierarchy, the Last Level Cache (Data, T1EC), the T2EC encoder, the Coalesce Buffer, and the ECC FIFO in DRAM]

  19. [Same figure] Dirty line eviction to LLC

  20. [Same figure] Encode T2EC and TAG

  21. [Same figure] Push to Coalesce Buffer

  22. [Same figure] Next dirty line comes; Tag/T2EC buffered in Coalesce Buffer

  23. [Same figure] Coalesce Buffer is now FULL

  24. [Same figure] Write the coalesced T2EC into ECC FIFO; T2EC write size matches DRAM burst size

  25. [Same figure] Coalesce Buffer becomes empty
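
The write path walked through in slides 18 to 25 can be sketched as a small model: each dirty line entering the LLC produces a (TAG, T2EC) pair, pairs accumulate in the coalesce buffer, and a full buffer is written to the DRAM FIFO in one burst. This is an illustrative sketch, not the authors' implementation: the class name, the entry counts, and the use of a bounded deque to stand in for a circular region of DRAM are all assumptions.

```python
from collections import deque


class EccFifoModel:
    """Toy model of the write path: coalesce buffer on chip, FIFO in DRAM."""

    def __init__(self, fifo_entries: int, entries_per_burst: int):
        self.entries_per_burst = entries_per_burst   # hypothetical burst capacity
        self.coalesce = []                           # pending (tag, t2ec) pairs
        self.fifo = deque(maxlen=fifo_entries)       # oldest entries are overwritten
        self.dram_writes = 0

    def on_dirty_line(self, tag: int, t2ec: int) -> None:
        """Called when a dirty line is written into the LLC."""
        self.coalesce.append((tag, t2ec))
        if len(self.coalesce) == self.entries_per_burst:
            self.fifo.extend(self.coalesce)          # one burst-sized DRAM write
            self.coalesce.clear()                    # buffer becomes empty
            self.dram_writes += 1


model = EccFifoModel(fifo_entries=16, entries_per_burst=4)
for tag in range(10):
    model.on_dirty_line(tag, t2ec=0)   # t2ec value is a placeholder, not a real code

print(model.dram_writes, len(model.coalesce), len(model.fifo))  # 2 2 8
```

Coalescing means ten dirty lines cost two DRAM writes here rather than ten, which is the point of sizing the buffer to the DRAM burst.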

  26. More on ECC FIFO
      • Write-back data, but write-through ECC
      • Potential performance degradation
        – Increased DRAM traffic due to T2EC writes
      • Error correction
        – Search the matching TAG in coalesce buffer AND ECC FIFO
          • May take a long time
            – Not a problem since SER is low
          • Sometimes, may not find the matching TAG
            – ECC FIFO is finite
            – Potentially unprotected dirty lines (discussed later)
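
The correction-time search can be sketched as a scan over the coalesce buffer and then the FIFO. The sketch assumes the scan runs newest-first so that the most recent T2EC for a line is the one returned; the slides say only that the matching TAG is searched for in both structures. The function name and the (tag, t2ec) tuple layout are hypothetical.

```python
from collections import deque
from typing import Optional


def find_t2ec(tag: int, coalesce: list, fifo: deque) -> Optional[int]:
    """Return the newest T2EC recorded for `tag`, or None if it was overwritten."""
    for entry_tag, t2ec in reversed(coalesce):   # newest entries live on chip
        if entry_tag == tag:
            return t2ec
    for entry_tag, t2ec in reversed(fifo):       # then scan DRAM, newest first
        if entry_tag == tag:
            return t2ec
    return None                                  # finite FIFO: entry is gone


fifo = deque([(1, 10), (2, 20), (1, 11)], maxlen=3)   # line 1 was written twice
coalesce = [(3, 30)]

print(find_t2ec(1, coalesce, fifo))   # 11, the most recent code for line 1
print(find_t2ec(3, coalesce, fifo))   # 30, still in the coalesce buffer

fifo.append((4, 40))                  # overwrites (1, 10)
fifo.append((5, 50))                  # overwrites (2, 20)
print(find_t2ec(2, coalesce, fifo))   # None: a potentially unprotected dirty line
```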

  27. EVALUATION

  28. Performance Evaluation
      • GEMS + DRAMsim
        – An out-of-order SPARC V9 core
        – Exclusive two-level cache hierarchy
        – DDR2 667MHz, 5.33GB/s
        – Eager write-back
          • Clean dirty lines periodically
      • Workloads
        – 16 data intensive applications
        – SPEC CPU 2006, PARSEC, and SPLASH2
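
The 5.33GB/s figure is consistent with the peak bandwidth of one DDR2-667 channel. The check below assumes a 64-bit (8-byte) data bus and reads "667MHz" as the nominal data rate of about 666.67 million transfers per second; neither detail is stated on the slide.

```python
TRANSFERS_PER_SECOND = 2e9 / 3   # ~666.67 MT/s, nominal DDR2-667 data rate (assumed)
BUS_BYTES = 8                    # 64-bit channel (assumed)

peak_gb_per_s = TRANSFERS_PER_SECOND * BUS_BYTES / 1e9
print(round(peak_gb_per_s, 2))   # 5.33
```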

  29. Performance Penalty (DRAM – 5.33 GB/s)
      [Figure: normalized execution time (axis 0.9 to 1.1) for SPLASH2 (CHOLESKY, FFT, OCEAN, RADIX), PARSEC (canneal, dedup, fluidanimate, freqmine), SPEC2006 (bzip2, mcf, hmmer, libquantum, omnetpp, milc, lbm, sphinx3), and the average. Labeled values: 6.0%; 1.2% average]

  30. Performance Penalty (DRAM – 2.67 GB/s and DRAM – 5.33 GB/s)
      [Figure: two panels of normalized execution time (axis 0.9 to 1.1) for SPLASH2 (CHOLESKY, FFT, OCEAN, RADIX), PARSEC (canneal, dedup, fluidanimate, freqmine), SPEC2006 (bzip2, mcf, hmmer, libquantum, omnetpp, milc, lbm, sphinx3), and the average. Labeled values: 6.3%, 6.0%, 6.8%; averages 1.2% and 2.3%]

  31. Comparison to MAXn and MME
      [Figure: normalized execution time (axis 0.9 to 1.4) of ECC FIFO, MME, Max4, Max2, and Max1 for SPLASH2 (CHOLESKY, FFT, OCEAN, RADIX), PARSEC (canneal, dedup, fluidanimate, freqmine), SPEC2006 (bzip2, mcf, hmmer, libquantum, omnetpp, milc, lbm, sphinx3), and the average. Labeled value: 1% average]

  34. Comparison to MAXn and MME
      [Figure: normalized execution time (axis 0.9 to 1.4) of ECC FIFO, MME, Max4, Max2, and Max1 for SPLASH2 (CHOLESKY, FFT, OCEAN, RADIX), PARSEC (canneal, dedup, fluidanimate, freqmine), SPEC2006 (bzip2, mcf, hmmer, libquantum, omnetpp, milc, lbm, sphinx3), and the average. Labeled average values: 4%, 1%]

  35. Comparison to MAXn and MME
      [Figure: normalized execution time (axis 0.9 to 1.4) of ECC FIFO, MME, Max4, Max2, and Max1 for SPLASH2 (CHOLESKY, FFT, OCEAN, RADIX), PARSEC (canneal, dedup, fluidanimate, freqmine), SPEC2006 (bzip2, mcf, hmmer, libquantum, omnetpp, milc, lbm, sphinx3), and the average. Labeled per-benchmark values: 11%, 23%, 36%, 11%; labeled average values: 4%, 8%, 1%]

  36. Comparison to MME
      [Figure: OCEAN 258x258, execution time [cycle] (axis 1.40E+09 to 2.60E+09) versus LLC size (256KB, 512KB, 1MB, 2MB) for Baseline, MME, and ECC FIFO. Labeled value: 10.4%]
