
AUDIT: Stress Testing the Automatic Way Youngtaek Kim, Lizy Kurian - PowerPoint PPT Presentation

MICRO 2012 AUDIT: Stress Testing the Automatic Way Youngtaek Kim, Lizy Kurian John ECE, The University of Texas at Austin Sanjay Pant, Srilatha Manne, Michael Schulte, W. Lloyd Bircher, Madhu S. Sibi Govindan AMD Research and PPO Lab,


  1. MICRO 2012 AUDIT: Stress Testing the Automatic Way Youngtaek Kim, Lizy Kurian John ECE, The University of Texas at Austin Sanjay Pant, Srilatha Manne, Michael Schulte, W. Lloyd Bircher, Madhu S. Sibi Govindan AMD Research and PPO Lab, Advanced Micro Devices, Inc. Laboratory for Computer Architecture 12/04/2012

  2. Outline
     • AUDIT: AUtomated DI/dT Stressmark Generation
       – is an automation framework for stressmark generation
         • Target: multi-core processor
         • Finding max. voltage droop: genetic algorithm with hardware measurement
       – generates effective di/dt stressmarks in a short time
         • Larger voltage droop than other benchmarks/stressmarks
         • Higher voltage failure points than other benchmarks/stressmarks
       – works well with different configurations / architectures
         • Throttling off / on
         • Different processor

  3. Introduction: Reliability Issue (di/dt = inductive noise)
     [Figure: physical structure of the PDN (VRM, MB, package, die) and the corresponding circuit representation, with elements R_MB, L_MB, C_MB, R_pkg1, L_pkg1, R_pkg2, L_pkg2, C_pkg, R_die, L_die, C_die, the die current i_die(t), and the supply variation ΔVdd(t). Plots: current vs. CPU cycles and voltage vs. CPU cycles, with Vdd and the supply voltage margin marked.]
     V = Vdd - L*di/dt - R*I
     Inductive noise (di/dt noise) → insufficient voltage → increase in delay → timing violation!
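
The droop relation on this slide, V = Vdd - L*di/dt - R*I, can be tried out numerically. The following is a minimal sketch, assuming a single lumped inductance and resistance and a linear current ramp; the function name and all component values are hypothetical and are not taken from the slides.

```python
def supply_voltage(vdd, inductance, resistance, i_before, i_after, dt):
    """Evaluate V = Vdd - L*di/dt - R*I for a linear current ramp.

    Assumes one lumped L and R (a simplification of the multi-stage PDN)
    and uses the current at the end of the ramp for the R*I term.
    """
    di_dt = (i_after - i_before) / dt
    return vdd - inductance * di_dt - resistance * i_after


# Hypothetical values: 1.0 V supply, 10 pH, 1 mOhm, 10 A -> 50 A in 5 ns.
v = supply_voltage(vdd=1.0, inductance=10e-12, resistance=1e-3,
                   i_before=10.0, i_after=50.0, dt=5e-9)
print(f"supply voltage during the ramp: {v:.3f} V")
```

With these made-up numbers the inductive term (L*di/dt) is larger than the resistive term (R*I), which is the situation the slide labels as di/dt noise.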

  4. Introduction: Supply Voltage Failure Symptoms
     • Unexpected value / wrong result
       – bit-flip: 0 → 1 or 1 → 0
     • OS freezing / system hang
     • Blue screen
     • Sudden shutdown

  5. Background: Characteristics of di/dt Voltage Noise

  6. Related Work & Motivation
     • To characterize di/dt voltage noise in a microprocessor
       – Using standard benchmarks
         • SPEC benchmarks: ineffective for testing voltage margin
       – Generating and running di/dt stressmarks
         • Manual stressmark [Joseph, HPCA'03]
           – Inefficient: a new manual stressmark must be written for each configuration
         • Instruction scheduling using Integer Linear Programming (ILP) [Ketkar, MICRO'09]
           – Difficult to build a linear formulation for a complex system
         • Genetic algorithm [Joshi, HPCA'08] [Kim, ISLPED'11]
           – Single-core, simulation only
     • Automatic di/dt stressmark generation using a genetic algorithm with post-silicon hardware measurement

  7. AUDIT: AUtomated DI/dT Stressmark Generation
     [Figure: framework flow diagram. Block labels: Initial Seed Entries, Opcode List, Population, Genetic Algorithm, Opcode Seq (no regs), Code Gen., x86 Assembly, Control Params, Simulator, Current Trace, HSPICE PDN, HW Measure, Cost Function, Met Exit Cond? (Yes: End; No: continue)]

  8. AUDIT: AUtomated DI/dT Stressmark Generation
     [Figure: framework flow diagram, same block labels as slide 7]

  9. AUDIT – Genetic Algorithm: Operational Concept
     [Figure: voltage vs. time around 1.0 V, with the max. voltage droop marked for three cases: initial, after crossover, after mutation]
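
The crossover and mutation steps named on this slide can be illustrated with a small genetic-algorithm loop over opcode sequences. This is only a sketch under stated assumptions: the opcode list, the per-opcode power weights, and `toy_cost` are hypothetical stand-ins, whereas the framework described in the slides scores candidates by simulation or hardware measurement of the voltage droop.

```python
import random

# Hypothetical opcode list and relative power weights (not from the slides).
OPCODES = ["nop", "add", "load", "store", "imul", "mulpd"]
POWER = {"nop": 0, "add": 1, "load": 2, "store": 2, "imul": 3, "mulpd": 5}


def toy_cost(seq):
    """Stand-in cost: largest low-to-high power step between neighbors."""
    return max(POWER[b] - POWER[a] for a, b in zip(seq, seq[1:]))


def crossover(a, b, rng):
    """Single-point crossover of two equal-length sequences."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(seq, rng, rate=0.1):
    """Replace each opcode with a random one with probability `rate`."""
    return [rng.choice(OPCODES) if rng.random() < rate else op for op in seq]


def evolve(cost, length=8, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    population = [[rng.choice(OPCODES) for _ in range(length)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the better half unchanged, refill with mutated offspring.
        population.sort(key=cost, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children
    return max(population, key=cost)


best = evolve(toy_cost)
print(best, toy_cost(best))
```

Because the better half of each generation is carried over unchanged, the best cost never decreases from one generation to the next. Swapping `toy_cost` for a function that runs the candidate and returns a measured droop would bring the loop closer to the flow in the slides.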

  10. AUDIT: AUtomated DI/dT Stressmark Generation
      [Figure: framework flow diagram, same block labels as slide 7]

  11. AUDIT – Instruction Sequence Generation

  12. AUDIT: AUtomated DI/dT Stressmark Generation
      [Figure: framework flow diagram, same block labels as slide 7]

  13. AUDIT – Hardware Measurement
      • Hardware measurement for max. voltage droop and power
      [Figure: measurement setup. Labels: Target Monitor, Host Monitor, Oscilloscope, Target Board, DAQ, Differential Probe]

  14. Step 1: Frequency Sweep
      • To find the 1st droop resonant frequency, run a frequency sweep: increase the code length using simple instructions
        – HP length: 4, 8, 12, …, 4n
        – LP length: 4, 8, 12, …, 4n
        – Total length: (4+4), (8+8), (12+12), …, (4n+4n)
      [Figure: voltage droop (V) vs. loop length (8, 16, 24, 32, 40, 48, 56), with the direction of larger droop marked]
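
The sweep above can be sketched as a loop over candidate loop lengths that keeps the one with the largest droop. This is a minimal sketch, assuming HP and LP denote the high-power and low-power halves of the loop; `measure_droop` is a hypothetical callback standing in for a real measurement, and the peak at length 32 in the usage example is invented.

```python
def sweep_lengths(n):
    """Return (hp_len, lp_len, total_len) for lengths 4, 8, ..., 4n."""
    return [(4 * k, 4 * k, 8 * k) for k in range(1, n + 1)]


def find_resonant_length(measure_droop, n):
    """Return the total loop length whose measured droop is largest.

    `measure_droop(total_len)` is a hypothetical callback; in practice it
    would run the generated loop and report the observed droop.
    """
    best = max(sweep_lengths(n), key=lambda entry: measure_droop(entry[2]))
    return best[2]


# Invented droop model that peaks at a total loop length of 32.
def fake_droop(total_len):
    return 0.10 - 0.001 * abs(total_len - 32)


print(sweep_lengths(3))
print(find_resonant_length(fake_droop, 7))
```

With n = 7 the total lengths are 8 through 56, matching the loop-length axis in the slide's plot.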

  15. Step 2: Using Sub-blocks for GA
      • Scaling & replicating the base part
        1. Schedule “Base”
        2. Replicate “Base” according to resonant cycles

  16. Step 3: Code Generation for Multiple Threads
      1. Prepare a core part: one high-low power pattern
      2. Make multiple copies of <1> to increase the intensity of resonance
      3. Add a header part that contains initialization code
      4. Make multiple copies of <3> according to the number of threads

  17. Step 3: Code Generation for Multiple Threads
      • Natural dithering: thread alignment shifts due to the OS
      [Figure: droop amplitude changes due to OS alignment shifts, with the max droop and a 16 ms interval marked]

  18. Step 3: Code Generation for Multiple Threads
      1. Prepare a core part: one high-low power pattern
      2. Make multiple copies of <1> to increase the intensity of resonance
      3. Add a header part that contains initialization code
      4. Make multiple copies of <3> according to the number of threads
      5. Attach dithering parts to each thread for alignment
      [Figure: the five steps illustrated, ending with the threads aligned]
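
The five steps above can be sketched as list operations on instruction mnemonics. This is an illustration under assumptions, not the framework's code generator: instructions are plain strings, the header is a single placeholder instruction, and the dithering part is modeled as a per-thread run of `nop` padding, which is a guess at one simple way to shift thread alignment.

```python
def build_thread_code(hp_block, lp_block, copies, header, dither_nops):
    """Build the instruction list for one thread."""
    core = hp_block + lp_block          # step 1: one high-low power pattern
    body = core * copies                # step 2: repeat to reinforce resonance
    padding = ["nop"] * dither_nops     # step 5: hypothetical dithering part
    return header + padding + body      # step 3: header goes first


def build_stressmark(hp_block, lp_block, copies, header, dither_per_thread):
    """Step 4: one copy per thread, each with its own dithering amount."""
    return [build_thread_code(hp_block, lp_block, copies, header, d)
            for d in dither_per_thread]


# Hypothetical blocks: two high-power and two low-power instructions.
threads = build_stressmark(hp_block=["mulpd", "mulpd"],
                           lp_block=["nop", "nop"],
                           copies=3,
                           header=["xor"],
                           dither_per_thread=[0, 1])
for t in threads:
    print(len(t), t)
```

Each thread gets the same header and body; only the padding differs, so the relative alignment between threads is controlled by `dither_per_thread`.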

  19. Experimental Methodology - Benchmark
      • Benchmark
        – Standard benchmark
          • SPEC CPU2006 (12 INTs and 17 FPs): multi-programmed
          • PARSEC: multi-threaded
        – Stressmark
          • Manual: SM1 and SM2 (single+resonant) and SM-Res (resonant)
          • AUDIT: A-Ex (single) and A-Res (resonant)
      • Compiler: NASM, gcc 4.6.2
      • OS: Windows 7, Red Hat Enterprise Linux 6

  20. Experimental Methodology - Thread Configuration
      [Figure: thread placement for the 1T, 2T, 4T, and 8T configurations (1T per module and 2T per module) across cores 0-1, 2-3, 4-5, and 6-7]
      • AMD Bulldozer cores: 2 cores per Bulldozer module, with shared front-end, FPU, and L2 cache

  21. Experimental Results - Max. Voltage Droop
      [Figure: droop relative to manual (SM1), with the direction of larger droop marked. Groups: PARSEC, SPEC INT, SPEC FP]

  22. Experimental Results - Max. Voltage Droop
      [Figure: droop relative to manual (SM1), with the direction of larger droop marked]
      • Manual: SM1, SM2, SM-Res
      • AUDIT single droop: A-Ex
      • AUDIT resonant: A-Res

  23. Experimental Results - Max. Voltage Droop
      [Figure: droop relative to manual (SM1), with the direction of larger droop marked]
      • Manual: SM1, SM2, SM-Res
      • AUDIT single droop: A-Ex
      • AUDIT resonant: A-Res, A-Res-8T

  24. Experimental Results - Voltage at Failure
      • Lowering the operating voltage & finding the voltage at failure
      • Higher voltage at failure → more stressful benchmark
      Benchmark    V at Fail        (rows ordered from more to less stress)
      A-Res        V_F
      SM-Res       V_F - 12.5 mV
      SM1          V_F - 62.5 mV
      A-Ex         V_F - 75.0 mV
      SM2          V_F - 87.5 mV
      zeusmp       V_F - 125 mV
      swaptions    V_F - 125 mV
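
The procedure of lowering the operating voltage until a failure appears can be sketched as a simple downward search. This is a minimal sketch under assumptions: `passes_at` is a hypothetical callback that runs the benchmark at a given voltage and reports whether it completed correctly, and the 12.5 mV step is inferred from the spacing of the reported values rather than stated on the slide.

```python
def voltage_at_failure(passes_at, v_start_mv, step_mv=12.5, v_floor_mv=0.0):
    """Lower the voltage in fixed steps and return the first failing value.

    Returns None if the benchmark still passes at the floor voltage.
    `passes_at(v_mv)` is a hypothetical callback, not a real API.
    """
    v = v_start_mv
    while v >= v_floor_mv:
        if not passes_at(v):
            return v
        v -= step_mv
    return None


# Invented thresholds: the more stressful benchmark stops working sooner.
stressful = voltage_at_failure(lambda v: v > 1150.0, v_start_mv=1200.0)
mild = voltage_at_failure(lambda v: v > 1100.0, v_start_mv=1200.0)
print(stressful, mild)
```

A benchmark whose returned voltage is higher leaves less margin, which is the sense in which the slide ranks the benchmarks by stress.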

  25. Experimental Results - Droop Probability
      • Histogram of droop events
        – 8M samples are captured at max. voltage droop
        – More frequent, larger droops → higher probability of failure
      [Figure: three histograms of the frequency of droop events vs. processor supply voltage, each with Vmin, Vnom, undershoot, and overshoot marked]
      * Max. Droop = Vnom - Vmin
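
The definition Max. Droop = Vnom - Vmin and the droop histogram can be computed from a list of voltage samples. The following is a minimal sketch, assuming samples in volts and a hypothetical 10 mV bin width; the sample values are invented and far fewer than the 8M samples mentioned above.

```python
from collections import Counter


def max_droop(samples, v_nom):
    """Max. droop = Vnom - Vmin over the captured samples (in volts)."""
    return v_nom - min(samples)


def droop_histogram(samples, v_nom, bin_mv=10):
    """Count samples per bin of (V - Vnom), expressed in millivolts.

    Negative bins are undershoot, positive bins are overshoot.
    """
    counts = Counter()
    for v in samples:
        offset_mv = round((v - v_nom) * 1000)
        counts[(offset_mv // bin_mv) * bin_mv] += 1
    return dict(sorted(counts.items()))


# Invented samples around a 1.0 V nominal supply.
samples = [1.00, 0.95, 1.02, 0.90, 1.00]
print(f"max droop: {max_droop(samples, 1.0) * 1000:.0f} mV")
print(droop_histogram(samples, 1.0))
```

A histogram with more weight in the large negative bins corresponds to the slide's point that more frequent, larger droops raise the probability of failure.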
