
Post-Silicon Patchable Hardware - PowerPoint PPT Presentation

Post-Silicon Patchable Hardware. Masahiro Fujita, VLSI Design and Education Center (VDEC), The University of Tokyo. July 22nd, 2011. Respin Statistics (North


  1. Post-Silicon Patchable Hardware. Masahiro Fujita, VLSI Design and Education Center (VDEC), The University of Tokyo. July 22nd, 2011

  2. Respin Statistics (North America). Respin is becoming more frequent. [Figure: first-silicon success rate (axis 0% to 100%) for 1998, 2000, 2002, and 2004; data labels 48%, 44%, 44%, 39%, 33%.] [G. S. Spirakis, DATE 2006] Fujita Lab. – VLSI Design and Education Center - University of Tokyo

  3. Manufacturing Cost. Respin risk is increasing dramatically. [Figure: mask set cost (US$), axis $1M to $5M, by process node: 90nm, 65nm, 45nm, 32nm.] [Nikkei Electronics, 2008]

  4. Causes for Respins. Logic and functional errors are the leading cause. IC/ASIC designs having one or more re-spins, by type of flaw [Collett International Research 2005]:
     - Logic/Function: 91%
     - Clock: 36%
     - Fast Path: 32%
     - Slow Path: 26%
     - Delay/Glitch: 26%
     - Power: 21%
     - Yield: 19%
     - Analog: 19%
     - Firmware: 17%
     - Mixed Signal: 15%
     - IR Drop: 15%

  5. Conventional SoC Design Flow. [Figure: flow with boxes High-Level Description, High-Level Synthesis, Machine-Generated RTL, Logic Synthesis, Place & Route, and SoC Design; Pre-Silicon RTL Verification (Verification/Simulation) and Post-Silicon RTL Validation stages, each with Error Detection, Bug Localization, and Bug Fix. Annotations: "75% of the whole development time" [Source: Intel 2007]; "Need to Understand RTL".]

  6. Proposed Patchable SoC Design Flow. [Figure: flow with boxes High-Level Description, High-Level Synthesis of Patchable HW, Logic Synthesis, Place & Route, and Patchable SoC Design; Pre-Silicon High-Level Verification (Verification/Simulation) and Post-Silicon High-Level ECO stages, each with Error Detection, Bug Localization, and Bug Fix; the post-silicon stage ends in Patch Compilation. Annotation: "No Respin Needed!"]

  7. Proposed Patchable Hardware. Efficeum offers behavioral-level programmability using a patchable controller. Partially-Programmable Circuit (PPC) offers logic-level programmability using a mixed gate/LUT circuit. [Figure: custom datapath (ALU1, ALU2) and a patchable controller made of a hardwired FSM and a patch FSM.]

  8. Efficeum: An Energy-Efficient Patchable Accelerator for Post-Silicon Engineering Changes

  9. Energy Efficiency vs. Programmability. Energy efficiency of 90nm OFDM:
     - Fixed-function HW: 200 GOPS/W
     - Embedded processor: 4 GOPS/W (50X!)
     - Laptop processor: 0.05 GOPS/W (4,000X!)
     Requirements: high performance (>100 GOPS) under power/thermal constraints (~1 W).
     Energy efficiency (in [GOPS/W] or [J/op]):
     - How much computation can be done in a given energy
     - Slowing down the chip reduces power but not efficiency
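The ratios on this slide follow directly from the quoted GOPS/W figures. The sketch below reproduces that arithmetic; the dictionary keys and function names are illustrative and do not come from the slides.

```python
# Energy-efficiency figures quoted on the slide, in GOPS/W (90nm OFDM).
# The key names are hypothetical labels chosen for this sketch.
EFFICIENCY_GOPS_PER_W = {
    "fixed_function_hw": 200.0,
    "embedded_processor": 4.0,
    "laptop_processor": 0.05,
}


def efficiency_gap(reference: str, other: str) -> float:
    """Return how many times more energy-efficient `reference` is than `other`."""
    return EFFICIENCY_GOPS_PER_W[reference] / EFFICIENCY_GOPS_PER_W[other]


def joules_per_op(gops_per_watt: float) -> float:
    """Convert GOPS/W to J/op (1 GOPS/W equals 1e9 operations per joule)."""
    return 1.0 / (gops_per_watt * 1e9)


def power_for_throughput(gops: float, gops_per_watt: float) -> float:
    """Power in watts needed to sustain `gops` at the given efficiency."""
    return gops / gops_per_watt


gap_embedded = efficiency_gap("fixed_function_hw", "embedded_processor")
gap_laptop = efficiency_gap("fixed_function_hw", "laptop_processor")
power_fixed = power_for_throughput(100.0, EFFICIENCY_GOPS_PER_W["fixed_function_hw"])
power_embedded = power_for_throughput(100.0, EFFICIENCY_GOPS_PER_W["embedded_processor"])
```

Under these figures, sustaining 100 GOPS fits within a 1 W budget only at fixed-function efficiency, which is the tension the slide points at.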

  10. Fixed-Function Accelerator
     - Achieves high energy efficiency by customization:
       - Hardwired controller → No reprogrammability
       - Highly-customized datapath → Low flexibility
     [Figure: hardwired controller and local store; registers Reg 1, Reg 2, Reg 3, ...; control signals; sparse interconnect network; functional units ALU1, ALU2, comparator, multiplier, ...]

  11. Proposed Patchable Accelerator
     - Behavioral reprogrammability by control patching
     - Increased flexibility by adding register file via data bus
     [Figure: hardwired controller with patch logic and local store; registers Reg 1, Reg 2, Reg 3, ...; control bus; sparse interconnect network; data bus; functional units ALU1, ALU2, comparator, multiplier, ..., plus a register file.]

  12. Patch Logic. [Figure: the program counter is compared (=?) against the entries PC1, PC2, ... held in patch memory and checked against a patch threshold (> PC patch?); patch memory supplies patched control signals and next-PC values (PC1', PC2', ...) alongside the hardwired controller's control signal memory.]
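One way to read the patch-logic figure is as a lookup that overrides the hardwired controller whenever the program counter matches a patch-memory entry. The sketch below is a behavioral model under that assumption only; the class name, method names, control words, and the example program are hypothetical and are not taken from the slides.

```python
from typing import Dict, List, Optional, Tuple

# One controller entry: (control word, next PC). A next PC of None halts.
Entry = Tuple[str, Optional[int]]


class PatchableController:
    """Behavioral model: hardwired control table plus a writable patch memory."""

    def __init__(self, hardwired: Dict[int, Entry]) -> None:
        self.hardwired = dict(hardwired)   # fixed at fabrication time
        self.patch: Dict[int, Entry] = {}  # writable after fabrication

    def write_patch(self, pc: int, control: str, next_pc: Optional[int]) -> None:
        """Store a patch entry for the given program counter value."""
        self.patch[pc] = (control, next_pc)

    def fetch(self, pc: int) -> Entry:
        """A PC match in patch memory overrides the hardwired entry."""
        if pc in self.patch:
            return self.patch[pc]
        return self.hardwired[pc]

    def run(self, start_pc: int = 1, max_steps: int = 100) -> List[str]:
        """Return the sequence of control words issued from start_pc."""
        trace: List[str] = []
        pc: Optional[int] = start_pc
        for _ in range(max_steps):
            if pc is None:
                break
            control, pc = self.fetch(pc)
            trace.append(control)
        return trace


# Hypothetical three-step program: add, multiply, store.
ctrl = PatchableController({1: ("add", 2), 2: ("mul", 3), 3: ("store", None)})
trace_before = ctrl.run()

# Engineering change: insert a shift between the multiply and the store.
ctrl.write_patch(2, "mul", 4)    # redirect step 2 to the new step 4
ctrl.write_patch(4, "shift", 3)  # new step lives only in patch memory
trace_after = ctrl.run()
```

In this model a new operation costs two patch entries: one to redirect an existing step and one to hold the added step.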

  13. Patching Example (1/2). Scheduling result of initial design. [Figure: schedule table with columns PC, ALU1, ALU2, MUL1, and Next PC, with rows grouped into hardwired logic and patch logic; dataflow graph for the initial design.]

  14. Patching Example (2/2). Scheduling result after engineering change. [Figure: schedule table with columns PC, ALU1, ALU2, MUL1, and Next PC, with rows grouped into hardwired logic and patch logic; dataflow graph after the engineering change (EC).]

  15. Patching-Based Post-Silicon ECO Flow. [Figure: a C program goes through High-Level Synthesis to fixed-function HW, followed by Inserting RF & Patch Logic to obtain Efficeum. For a post-silicon ECO, the post-ECO program (spec. change & bug fix) is compared with the original (Computing the Difference Between Two Programs), then Patch Compilation produces a patch, followed by Writing into Patch Memory.]
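The "computing the difference between two programs" step can be illustrated on controller schedules. The sketch below assumes both programs are already represented as tables mapping a program counter to a (control word, next PC) pair; that representation and the function names are assumptions for illustration, not the patch compiler described in the slides.

```python
from typing import Dict, Optional, Tuple

Entry = Tuple[str, Optional[int]]
Schedule = Dict[int, Entry]


def compute_patch(original: Schedule, modified: Schedule) -> Schedule:
    """Return the entries of `modified` that are new or differ from `original`.

    Only these entries need to be written into patch memory; steps that are
    unchanged keep running from the hardwired controller.
    """
    return {pc: entry for pc, entry in modified.items()
            if original.get(pc) != entry}


def apply_patch(hardwired: Schedule, patch: Schedule) -> Schedule:
    """Effective schedule when patch entries override hardwired ones."""
    effective = dict(hardwired)
    effective.update(patch)
    return effective


# Hypothetical schedules before and after an engineering change.
original = {1: ("add", 2), 2: ("mul", 3), 3: ("store", None)}
post_eco = {1: ("add", 2), 2: ("mul", 4), 4: ("shift", 3), 3: ("store", None)}

patch = compute_patch(original, post_eco)
effective = apply_patch(original, patch)
```

The size of `patch` is what has to fit into the patch memory, so a smaller difference between the two schedules means a cheaper ECO in this model.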

  16. Experimental Setup
     - Example: 8x8 IDCT
     - Technology: FreePDK 45nm
     - Logic synthesis: Synopsys Design Compiler Ultra
       - High-effort options with gated clock optimization
     - P&R: Cadence SoC Encounter
     - Simulation: Synopsys VCS
     - Power/timing analysis: Synopsys PrimeTime PX
       - Simulation results are used for power calculation
     - Energy efficiencies (GOPS/W) are compared

  17. Energy Efficiency Comparison. Offers a tradeoff between efficiency and programmability. [Figure: energy efficiency for 8x8 IDCT (FreePDK 45nm technology), ranging from No Patching to Fully-Patched; data labels 6%, 48%, 89%.]

  18. Area & Performance Comparison. [Figure: data labels 5%, 20%, "5X Smaller", and "Up to 40% Increase".]

  19. Area Comparison. [Figure: area of a fully-programmable accelerator, a single-function hardwired accelerator, and Efficeum; labels "4x reduction" and "18% increase".] (Technology: FreePDK 45nm (NCSU/Nangate), Operating Frequency: 200MHz)

  20. Power Comparison. [Figure: power of a fully-programmable accelerator, a single-function hardwired accelerator, and Efficeum; labels "6x reduction" and "13% increase".] (Technology: FreePDK 45nm (NCSU/Nangate), Operating Frequency: 200MHz)

  21. Incremental High-Level Synthesis and Patch Compilation for High-Level ECO

  22. Conventional High-Level Synthesis
     - Several phases are applied separately
     - This prevents incremental synthesis
     [Figure: separate Scheduling, Allocation, and Binding phases; operations (+, x, <<) assigned to Steps 1 to 3 and to units SHFT1, ADD1, ADD2, MUL1; resulting registers, FSM, and datapath.]

  23. Incremental High-Level Synthesis
     - Each operation is scheduled and bound incrementally, and the hardware is enhanced accordingly
     [Figure: incremental scheduling & binding; operations (+, x, <<) assigned to Steps 1 to 3 and to units SHFT1, ADD1, ADD2, MUL1; resulting registers, FSM, and datapath.]
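The idea of scheduling and binding one operation at a time, growing the hardware as needed, can be sketched with a simple greedy policy. This is only an illustration under stated assumptions: each operation takes one control step, it is placed as soon as its predecessors finish, and a new functional unit is allocated whenever every existing unit of the required type is busy in that step. The function name, the unit naming scheme, and the example dataflow graph are hypothetical; the slides do not specify the actual algorithm.

```python
from typing import Dict, List, Set, Tuple

# An operation: (name, unit type, names of predecessor operations).
Op = Tuple[str, str, List[str]]


def incremental_schedule(ops: List[Op]) -> Tuple[Dict[str, int], Dict[str, str]]:
    """Schedule and bind operations one by one, in dependency order.

    Returns (control step per operation, functional unit per operation).
    """
    step_of: Dict[str, int] = {}
    unit_of: Dict[str, str] = {}
    busy: Dict[str, Set[int]] = {}             # unit name -> occupied steps
    units_by_type: Dict[str, List[str]] = {}   # unit type -> allocated units

    for name, unit_type, preds in ops:
        # Earliest step after all predecessors (single-cycle operations).
        step = 1 + max((step_of[p] for p in preds), default=0)
        pool = units_by_type.setdefault(unit_type, [])
        # Reuse an already-allocated unit that is free in this step, if any.
        unit = next((u for u in pool if step not in busy[u]), None)
        if unit is None:
            # "Enhance" the hardware: allocate one more unit of this type.
            unit = f"{unit_type}{len(pool) + 1}"
            pool.append(unit)
            busy[unit] = set()
        busy[unit].add(step)
        step_of[name] = step
        unit_of[name] = unit
    return step_of, unit_of


# Hypothetical dataflow graph: e = (a << k) * (b + c), with a and b being adds.
ops = [
    ("a", "ADD", []),
    ("b", "ADD", []),
    ("c", "SHFT", ["a"]),
    ("d", "ADD", ["b"]),
    ("e", "MUL", ["c", "d"]),
]
steps, units = incremental_schedule(ops)
```

Because each decision only looks at operations already placed, appending a new operation later leaves the earlier schedule and binding untouched, which is the property an incremental flow relies on.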
