
Argus: Low-cost, Comprehensive Error-Detection for Simple Cores - PowerPoint PPT Presentation



  1. Argus: Low-cost, Comprehensive Error-Detection for Simple Cores Albert Meixner, Michael Bauer, Daniel Sorin Duke University

  2. Introduction
     • Hardware error rates are expected to rise as CMOS shrinks
     • Online error detection techniques can keep errors from propagating to the application
         - Dual Modular Redundancy
         - Redundant Multithreading
         - Checker cores (DIVA)
     • Existing techniques are overly expensive for small, simple cores
         - Simple cores dominate the embedded market
         - Throughput-oriented CMPs utilize simple cores (e.g., Sun UltraSPARC T1)

  3. Argus Goals and Approach
     • Goal: Detect both transient and permanent errors in simple cores at low cost
     • Approach: Decompose program execution into four high-level tasks and check them independently
         - Control Flow, Data Flow, Computation, Memory Access
     • Advantages of high-level decomposition
         - Checkers exploit task-specific properties to reduce cost
         - Unlike per-component checkers, tasks are abstract and implementation-independent

  4. Task Decomposition
     [Figure: the dynamic instruction stream (e.g., r1 ← r3+r4) decomposed into four tasks]
         - Control Flow: correct instruction selected for execution
         - Data Flow: correct inputs selected and result data passed correctly
         - Computation: operation result computed correctly
         - Memory: data transferred correctly from and to memory

  5. From Theory to Practice
     [Figure: tasks (Control Flow, Data Flow, Computation, Memory) → ideal checkers (formal definition) → Argus-1 checkers (hardware design)]
     • Completeness Proof: "Checkers ensure correct execution"
     • Equivalence Proof: "Argus-1 checkers are equivalent to ideal checkers"

  6. Limitations
     • Completeness Proof assumes no interrupts, exceptions, or I/O
         - Single fault assumption
     • Equivalence Proof holds under limiting assumptions
         - Equivalence only holds at block boundaries
         - Perfect checksums (no aliasing)
         - Known coverage hole in memory checker
     • When assumptions are violated, errors can go undetected

  7. Outline
     • Introduction
     • Basic Argus concept
     • Argus-1 checker designs
     • Argus-1 implementation and evaluation
     • Conclusions

  8. Control Flow Checker
     • Similar to prior control flow checkers
     • Assign each basic block an address-independent ID
         - ID computed from block contents
     • Embed IDs of legal successors in each block
         - Most blocks have one or two legal successors
         - Pick correct ID at runtime
     • Indirect branch addresses are more challenging
         - See paper
     [Figure: example code with basic blocks A (loop: … bnez r1, L1), B (… j L2), C (L1: …), D (L2: … bez r2, loop), and E (ret), with its control flow graph]
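
The control flow check can be illustrated with a minimal Python sketch of the slide's example. It assumes a CRC-32 of each block's text as the address-independent ID (Argus-1 itself uses data flow signatures as block IDs), and the block bodies are hypothetical placeholders; only the branches come from the slide. The "compiler" embeds successor IDs in each block, and at runtime the checker picks the embedded ID that matches the branch outcome and compares it with the ID of the block actually fetched. Indirect branches (the ret in block E) are left unchecked here.

```python
import zlib

def block_id(instrs):
    """Hypothetical address-independent ID: CRC-32 of the block's contents, not its address."""
    return zlib.crc32("\n".join(instrs).encode())

# Slide example; the bodies are hypothetical placeholders, only the branches are from the slide.
BLOCKS = {
    "A": ["<body of A>", "bnez r1, L1"],   # label loop
    "B": ["<body of B>", "j L2"],
    "C": ["<body of C>"],                  # label L1, falls through to D
    "D": ["<body of D>", "bez r2, loop"],  # label L2
    "E": ["ret"],
}
# Legal successors as (not_taken, taken); single-successor blocks list the same block twice.
SUCCESSORS = {"A": ("B", "C"), "B": ("D", "D"), "C": ("D", "D"), "D": ("E", "A")}

# "Compile time": embed the IDs (not the addresses) of the legal successors in each block.
EMBEDDED = {b: tuple(block_id(BLOCKS[s]) for s in succ) for b, succ in SUCCESSORS.items()}

def check_trace(trace, taken):
    """trace: names of the blocks actually fetched; taken[i]: branch outcome ending trace[i].
    Returns the index of the first block whose ID does not match the expected one, else None."""
    expected = None
    for i, name in enumerate(trace):
        if expected is not None and block_id(BLOCKS[name]) != expected:
            return i  # control flow error detected
        if name in EMBEDDED:
            # Pick the correct embedded successor ID at runtime from the branch outcome.
            expected = EMBEDDED[name][1 if taken[i] else 0]
        else:
            expected = None  # indirect branch (ret): not handled in this sketch
    return None

# A correct execution passes; jumping from A to C although A's branch was not taken is flagged.
print(check_trace(["A", "B", "D", "A", "C", "D", "E"], [False, False, True, True, False, False, False]))
print(check_trace(["A", "C"], [False, False]))
```

Because the IDs depend only on block contents, relocating code at link time does not change them, which is the point of making the IDs address-independent.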

  9. Data Flow Checker
     • Based on "Dynamic Dataflow Verification"
         - Presented at PACT 2007
     • Compiler computes reference data flow signatures for basic blocks
     • Data flow checker tracks actual data flow and compares it to the reference
     • Data flow signatures are used as block IDs for the control flow checker
     [Figure: basic block (sub r4, r2, r3; mul r5, r2, r6; add r3, r5, r4) and its dataflow graph from inputs r2, r3, r6 to outputs r4, r3, r5; values protected with EDC; the graph yields the dataflow signature]
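
The idea of a data flow signature can be sketched in Python for the slide's basic block. This is an illustration under assumptions, not the signature encoding from the PACT 2007 paper: each source operand is labeled with its producer (a block input register or an earlier instruction in the block), and a CRC-32 over that graph stands in for the signature. The compiler would compute the reference from the static code; the checker rebuilds the graph from what actually executes, so a corrupted register specifier changes the signature.

```python
import zlib

def dataflow_graph(block):
    """block: list of (opcode, dst, src1, src2) tuples.
    Returns (edges, outputs): for each instruction, the producer of each source operand,
    either ("in", reg) for a block input or ("i", k) for the k-th instruction, plus the
    final producer of every register the block writes."""
    last_writer = {}
    edges = []
    for k, (_op, dst, *srcs) in enumerate(block):
        edges.append(tuple(last_writer.get(s, ("in", s)) for s in srcs))
        last_writer[dst] = ("i", k)
    outputs = tuple(sorted((reg, k) for reg, (_, k) in last_writer.items()))
    return tuple(edges), outputs

def dataflow_signature(block):
    """Hypothetical signature: CRC-32 over the graph's textual form (opcodes are ignored here)."""
    return zlib.crc32(repr(dataflow_graph(block)).encode())

# The basic block from the slide: inputs r2, r3, r6; outputs r4, r5, r3.
BLOCK = [("sub", "r4", "r2", "r3"),
         ("mul", "r5", "r2", "r6"),
         ("add", "r3", "r5", "r4")]

reference = dataflow_signature(BLOCK)  # computed by the "compiler"
# Simulated fault: the add reads r2 instead of r4 at runtime.
executed = [BLOCK[0], BLOCK[1], ("add", "r3", "r5", "r2")]
print("mismatch detected:", dataflow_signature(executed) != reference)
```

Like the real checksums, this signature can alias (two different graphs with the same CRC), which corresponds to the "perfect checksums" assumption listed under Limitations.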

  10. Computation Checkers
     • Not a single monolithic checker, but multiple sub-checkers for different operations
         - Large amounts of prior work on computation checking
     • Operations are checked using redundant hardware
         - Exploits the fact that checking a computation is often easier than performing it
     • Multiply checker trades coverage for cost
         - Replay modulo 31
         - Non-zero probability of missing errors due to aliasing
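
"Replay modulo 31" is a residue check: the checker multiplies the operands' residues mod 31 and compares the result with the residue of the product. The Python sketch below assumes the checker sees the full-width unsigned product (the names residue31 and check_mul are hypothetical). Modulo 31 is cheap because 2^5 = 32 ≡ 1 (mod 31), so a residue is just a sum of 5-bit digits.

```python
def residue31(x):
    """x mod 31 for x >= 0 using only shifts and adds: since 2**5 = 32 is 1 (mod 31),
    x is congruent to the sum of its 5-bit digits."""
    while x > 31:
        x = (x & 31) + (x >> 5)
    return 0 if x == 31 else x

def check_mul(a, b, product):
    """Residue check for an unsigned multiply: replay the operation modulo 31 and compare
    with the residue of the full-width product (assumption: the checker sees all product bits)."""
    return residue31(residue31(a) * residue31(b)) == residue31(product)

a, b = 123456, 789
print(check_mul(a, b, a * b))       # True: correct result
print(check_mul(a, b, a * b ^ 4))   # False: single-bit error caught
print(check_mul(a, b, a * b + 31))  # True: error is a multiple of 31, so it aliases
```

A single flipped bit changes the product by 2^k, which is never a multiple of 31, so it is always caught; the aliasing on the slide comes from multi-bit errors whose net effect is a multiple of 31.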

  11. Memory Checker
     • Data corruption detected using parity
     • Addressing errors are transformed into data corruption
         - An error in cache logic transforms an access to address A into an access to address B
         - No storage overhead: addresses are embedded into data words
     • Address computation and alignment errors are detected by redundant computation checkers
     • Stores that don't update the cache are not detected
         - Unlikely error scenario; high-level fixes are expensive
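
One way to "embed" the address without extra storage is to compute the parity bits over the data XORed with the address. The Python model below is a hypothetical sketch of that idea (per-byte parity, a dictionary as the cache, and the class and method names are all assumptions, not the Argus-1 design). A wrong-address access then surfaces as a parity error on the loaded word, the same way a corrupted data bit does.

```python
def byte_parity(word):
    """Parity bit for each of the four bytes of a 32-bit word."""
    return tuple(bin((word >> (8 * i)) & 0xFF).count("1") & 1 for i in range(4))

class CheckedMemory:
    """Hypothetical model: parity is computed over (data XOR address), so the address is
    folded into the check bits and needs no extra storage."""

    def __init__(self):
        self.lines = {}  # address -> (data word, parity bits)

    def store(self, addr, data):
        self.lines[addr] = (data, byte_parity(data ^ addr))

    def load(self, addr, actual_addr=None):
        # actual_addr models faulty cache logic that returns the word stored at another address.
        data, parity = self.lines[addr if actual_addr is None else actual_addr]
        if byte_parity(data ^ addr) != parity:
            raise RuntimeError(f"memory checker: parity mismatch on load from {addr:#x}")
        return data

mem = CheckedMemory()
mem.store(0x1000, 0xDEADBEEF)
mem.store(0x1004, 0x12345678)
print(hex(mem.load(0x1000)))  # correct access passes
try:
    mem.load(0x1000, actual_addr=0x1004)  # cache logic fetched address B instead of A
except RuntimeError as err:
    print(err)
```

As with operand parity, addresses that differ in an even number of bits within the same byte would escape this particular sketch.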

  12. Outline
     • Introduction
     • Basic Argus concept
     • Argus-1 checker designs
     • Argus-1 implementation and evaluation
     • Conclusions

  13. Argus-1 Core Specs
     • Based on Verilog model of OpenRISC 1200 core
         - 4-stage, single-issue, 32-bit RISC CPU
         - Fully functional, open source core from opencores.org
     • Removed unnecessary features to obtain a minimal core
         - TLBs, advanced interrupt controller, debug unit
         - Worst case for Argus-1 area overhead (the smaller the base core, the larger the checkers' relative share)
     • GCC 3.4 used to compile benchmarks
         - Patch from opencores.org adds OpenRISC support

  14. Argus-1 Pipeline Overview
     [Figure: Argus-1 pipeline; legend distinguishes original and Argus components]

  15. Argus-1 Compilation Tool Chain
     [Figure: compile → pad → assemble → link → sign; legend distinguishes original and Argus steps]
     • Embed signatures used for data and control flow checking
     • To minimize code bloat, signatures are embedded in unused instruction bits
         - Blocks with insufficient unused bits are padded with NOPs
     • Signatures are embedded after linking
         - Compute data flow signatures for each block
         - Determine legal successor blocks
         - Embed signatures of legal successors into unused bits
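
The pad and sign steps can be sketched as bit manipulation on instruction words. In this Python sketch the free-bit masks, the number of free bits per NOP (bits_per_nop), and the function names are all hypothetical; the real Argus-1 tool chain depends on the OpenRISC instruction encodings. embed scatters a signature into the unused bits of a block, extract is what the fetch-side checker would read back, and nops_needed is the padding decision for blocks that lack enough free bits.

```python
def nops_needed(unused_bits, sig_bits, bits_per_nop):
    """Number of NOPs to append to a block so that its unused instruction bits can hold
    sig_bits of signature. bits_per_nop (free bits in one NOP) is an assumed parameter."""
    shortfall = sig_bits - sum(unused_bits)
    return 0 if shortfall <= 0 else -(-shortfall // bits_per_nop)  # ceiling division

def embed(words, free_masks, signature, sig_bits):
    """Scatter the low sig_bits of signature, LSB first, into the free bit positions
    (set bits of free_masks[i]) of each 32-bit instruction word."""
    out, k = [], 0
    for word, mask in zip(words, free_masks):
        for pos in range(32):
            if (mask >> pos) & 1 and k < sig_bits:
                word = (word & ~(1 << pos)) | (((signature >> k) & 1) << pos)
                k += 1
        out.append(word)
    if k < sig_bits:
        raise ValueError("not enough unused bits: pad the block with NOPs first")
    return out

def extract(words, free_masks, sig_bits):
    """Inverse of embed: the bits the hardware would read back while fetching the block."""
    sig, k = 0, 0
    for word, mask in zip(words, free_masks):
        for pos in range(32):
            if (mask >> pos) & 1 and k < sig_bits:
                sig |= ((word >> pos) & 1) << k
                k += 1
    return sig

# Hypothetical two-instruction block with 4 free bits in each word.
words, masks = [0xA0000000, 0xB0000000], [0x0000000F, 0x000000F0]
signed = embed(words, masks, 0xB6, 8)
print([hex(w) for w in signed], hex(extract(signed, masks, 8)))
print(nops_needed([3, 0, 2], 16, 10))  # 5 free bits, 11 short: two NOPs of 10 free bits each
```

Running the sign step after linking means the embedded bits never disturb address computation, which matches the slide's ordering of the tool chain.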

  16. Argus-1 Error Coverage
     • Coverage results based on error injection experiments
         - 5000 test runs, each with a single fault injected into a different randomly selected gate
         - Compare test program run to known correct execution
         - Test does not use configuration registers, interrupts, or exception logic
     • Argus detects 98.0% of transient and 98.8% of permanent errors that affected the test program
     • Most undetected errors are due to aliasing in operand parity
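
The coverage numbers are relative to errors "that affected the test program", so faults that were masked drop out of the denominator. A small Python sketch of that bookkeeping, with hypothetical outcome labels and toy data (not the experiment's results):

```python
from collections import Counter

def coverage(outcomes):
    """outcomes: one label per fault-injection run (hypothetical labels):
    'masked'     - fault did not affect the test program (excluded from coverage),
    'detected'   - fault affected the program and a checker flagged it,
    'undetected' - fault affected the program and no checker flagged it."""
    counts = Counter(outcomes)
    affected = counts["detected"] + counts["undetected"]
    return counts["detected"] / affected if affected else float("nan")

runs = ["masked", "detected", "detected", "undetected", "masked"]  # toy data, not results
print(coverage(runs))
```

Transient and permanent faults would be tallied separately in the same way to obtain the two percentages on the slide.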

  17. Argus-1 Area Overhead
     • Synthesized with Synopsys Design Compiler using the 250nm VTVT standard cell library
     • Laid out with Cadence Silicon Ensemble
     • Cache overhead estimated with CACTI

     Component                  Overhead
     Core                       16.6%
     8KB, 2-way D-Cache          5.1%
     8KB, 2-way I-Cache          0%
     Argus-1 (Core+Caches)      10.6%

  18. Argus-1 Performance Overhead
     • No direct impact from checkers
         - Checkers work in parallel with regular execution and never stall the pipeline
         - CAD tools showed no increase in cycle time
     • Only impact is from padding blocks to embed signatures
         - One-cycle penalty for each embedded NOP
         - Increased pressure on the instruction cache
     • Performance results obtained by running MediaBench on the OR1K simulator

  19. Performance Overhead Graph
     [Figure: performance overhead graph]

  20. Conclusions
     • A self-checking core can be built using a high-level "divide and conquer" approach
         - Correctness of this approach can be shown formally
     • Individual tasks can be checked using existing checkers with slight alterations
         - Result is a self-checking core with very low area and performance overhead
     • Not yet a complete solution for a self-checking chip
         - Missing error detection for exception and interrupt circuitry
         - Use a multiprocessor-aware memory checker to build a self-checking CMP
