Post-Silicon Patchable Hardware
Masahiro Fujita
VLSI Design and Education Center (VDEC), The University of Tokyo
July 22nd, 2011
Respin Statistics (North America)
Respin is becoming more frequent.
[Figure: first-silicon success rate, 1998 to 2004, on a 0% to 100% axis; bar values shown: 48%, 44%, 44%, 39%, 33%] [G. S. Spirakis, DATE 2006]
Fujita Lab. – VLSI Design and Education Center - University of Tokyo
Manufacturing Cost
Respin risk is increasing dramatically.
[Figure: mask set cost (US$), axis from $1M to $5M, for the 90nm, 65nm, 45nm, and 32nm nodes] [Nikkei Electronics, 2008]
Causes for Respins
Logic and functional errors are the leading cause.
IC/ASIC designs having one or more re-spins, by type of flaw [Collett International Research 2005]:
Logic/Function  91%
Clock           36%
Fast Path       32%
Slow Path       26%
Delay/Glitch    26%
Power           21%
Yield           19%
Analog          19%
Firmware        17%
Mixed Signal    15%
IR Drop         15%
Conventional SoC Design Flow
[Figure: flow from High-Level Description through High-Level Synthesis, Machine-Generated RTL, Logic Synthesis, and Place & Route to the SoC Design. Pre-Silicon RTL Verification: Verification/Simulation, Bug Localization, Bug Fix. Post-Silicon RTL Validation: Error Detection, Bug Localization, Bug Fix. Annotation: Need to Understand RTL.]
75% of the whole development time [Source: Intel 2007]
Proposed Patchable SoC Design Flow
[Figure: flow from High-Level Description through High-Level Synthesis of Patchable HW, Logic Synthesis, and Place & Route to the Patchable SoC Design. Pre-Silicon High-Level Verification: Verification/Simulation, Bug Localization, Bug Fix. Post-Silicon High-Level ECO: Error Detection, Bug Localization, Bug Fix, Patch Compilation, Patch.]
No Respin Needed!
Proposed Patchable Hardware
Efficeum offers behavioral-level programmability using a patchable controller.
[Figure: a Custom Datapath (ALU1, ALU2) driven by a Patchable Controller made of a Hardwired FSM and a Patch FSM]
Partially-Programmable Circuit (PPC) offers logic-level programmability using a mixed gate/LUT circuit.
Efficeum: An Energy-Efficient Patchable Accelerator For Post-Silicon Engineering Changes
Energy Efficiency vs. Programmability
Energy efficiency of 90nm OFDM:
Fixed-function HW: 200GOPS/W
Embedded Proc.: 4GOPS/W (50X!)
Laptop Proc.: 0.05GOPS/W (4,000X!)
High performance (>100GOPS) under 〜1W power/thermal constraints
Energy efficiency (in [GOPS/W] or [J/op]): how much computation can be done in a given energy
Slowing down the chip reduces power but not efficiency
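The ratios on this slide can be checked with a few lines of arithmetic. The sketch below uses only the three GOPS/W figures quoted above; the function names and the 100 GOPS target used for the power estimate are illustrative assumptions, not part of the original talk.

```python
# Energy-efficiency figures quoted on the slide (90nm OFDM), in GOPS/W.
EFFICIENCY_GOPS_PER_W = {
    "fixed-function HW": 200.0,
    "embedded processor": 4.0,
    "laptop processor": 0.05,
}


def joules_per_op(gops_per_watt: float) -> float:
    """Convert GOPS/W to J/op: 1 GOPS/W is 1e9 operations per joule."""
    return 1.0 / (gops_per_watt * 1e9)


def efficiency_gap(reference: float, other: float) -> float:
    """How many times more efficient `reference` is than `other`."""
    return reference / other


def power_watts(throughput_gops: float, gops_per_watt: float) -> float:
    """Power needed to sustain a throughput at a given efficiency."""
    return throughput_gops / gops_per_watt


if __name__ == "__main__":
    best = EFFICIENCY_GOPS_PER_W["fixed-function HW"]
    for name, eff in EFFICIENCY_GOPS_PER_W.items():
        # 100 GOPS is an assumed target taken from the ">100GOPS" note.
        print(
            f"{name}: {joules_per_op(eff) * 1e12:.0f} pJ/op, "
            f"{efficiency_gap(best, eff):.0f}x gap, "
            f"{power_watts(100.0, eff):.1f} W for 100 GOPS"
        )
```

Under these figures only the fixed-function design sustains 100 GOPS within a budget of roughly 1 W, which is the motivation for keeping most of the design hardwired.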
Fixed-Function Accelerator
Achieves high energy efficiency by customization:
Hardwired controller → No reprogrammability
Highly-customized datapath → Low flexibility
[Figure: Hardwired Controller, Local Store, and registers Reg 1, Reg 2, Reg 3, ... connected through Control lines and a Sparse Interconnect Network to ALU1, ALU2, a Comparator, a Multiplier, ...]
Proposed Patchable Accelerator
Behavioral reprogrammability by control patching
Increased flexibility by adding register file via data bus
[Figure: Hardwired Controller with Patch Logic, Local Store, and registers Reg 1, Reg 2, Reg 3, ... connected through a Control Bus, a Sparse Interconnect Network, and a Data Bus to ALU1, ALU2, a Comparator, a Multiplier, ..., and a Register File]
Patch Logic
[Figure: the Program Counter is compared (=?) against the patched addresses PC1, PC2, ... and tested with ">PC patch?". The Control Signal comes either from the Hardwired Controller or from the Control Signal Memory. The Patch Memory holds the Program Counter Patch (PC1', PC2', ...) and the Control Signal Patch.]
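One way to read the patch-logic diagram is as a lookup that overrides the hardwired controller whenever the program counter matches a patched address. The sketch below is a behavioral model under that assumption; the class name, the table layout, and the example control words are hypothetical and are not taken from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class PatchableController:
    # pc -> (control_signal, next_pc); fixed at fabrication time.
    hardwired: dict
    # pc -> (control_signal, next_pc); written after fabrication.
    patch_memory: dict = field(default_factory=dict)

    def step(self, pc):
        """Return (control_signal, next_pc) for the given program counter."""
        # Models the "=?" comparators: a patched PC overrides the hardwired
        # entry. PCs outside the hardwired table exist only in patch memory.
        if pc in self.patch_memory:
            return self.patch_memory[pc]
        return self.hardwired[pc]

    def run(self, start_pc, steps):
        """Execute `steps` control steps and return the (pc, control) trace."""
        trace = []
        pc = start_pc
        for _ in range(steps):
            control, next_pc = self.step(pc)
            trace.append((pc, control))
            pc = next_pc
        return trace


# Hypothetical three-step hardwired loop: 1 -> 2 -> 3 -> 1.
HARDWIRED = {1: ("op_a", 2), 2: ("op_b", 3), 3: ("op_c", 1)}

unpatched = PatchableController(HARDWIRED)
# Hypothetical patch: redirect step 2 to a new step 4, which rejoins at step 3.
patched = PatchableController(HARDWIRED, {2: ("op_b", 4), 4: ("op_new", 3)})

if __name__ == "__main__":
    print(unpatched.run(1, 5))
    print(patched.run(1, 5))
```

The hardwired table is never modified; only the small patch memory changes, which is what keeps the overhead relative to a fixed-function controller low.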
Patching Example (1/2)
Scheduling Result of Initial Design
[Figure: dataflow graph for the initial design, and a schedule table with columns PC, ALU1, ALU2, MUL1, Next PC. Hardwired logic: PC 1 → Next PC 2, PC 2 → Next PC 3, PC 3 → Next PC 1. Patch logic: PC 4 and PC 5.]
Patching Example (2/2)
Scheduling Result After Engineering Change
[Figure: dataflow graph after the EC, and a schedule table with columns PC, ALU1, ALU2, MUL1, Next PC. Hardwired logic: PC 1 (Next PC shown as 2 and 4), PC 2 → Next PC 3, PC 3 → Next PC 1. Patch logic: PC 4 → Next PC 3, and PC 5.]
Patching-Based Post-Silicon ECO Flow
[Figure: the C Program goes through High-Level Synthesis to Fixed-Function HW; Inserting RF & Patch Logic yields Efficeum. For a Post-Silicon ECO, the Post-ECO Program (Spec. Change & Bug Fix) is compared with the original by Computing the Difference Between Two Programs; Patch Compilation produces a Patch, followed by Writing into Patch Memory.]
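The "computing the difference between two programs" step can be illustrated on controller tables. The sketch below assumes, hypothetically, that both the original and the post-ECO program have already been scheduled into tables mapping each PC to a control word and a next PC; the function name and the example entries are illustrative and do not reproduce the actual Efficeum patch compiler.

```python
def compile_patch(original: dict, revised: dict) -> dict:
    """Return the table entries that must be written into patch memory.

    Both arguments map pc -> (control_signal, next_pc). An entry is emitted
    when it is new in `revised` or differs from the hardwired `original`.
    Entries that exist only in `original` are ignored: they simply become
    unreachable once the patched next-PC values route around them.
    """
    return {
        pc: entry
        for pc, entry in revised.items()
        if original.get(pc) != entry
    }


# Hypothetical schedules before and after an engineering change.
ORIGINAL = {1: ("op_a", 2), 2: ("op_b", 3), 3: ("op_c", 1)}
REVISED = {1: ("op_a", 2), 2: ("op_b", 4), 3: ("op_c", 1), 4: ("op_new", 3)}

PATCH = compile_patch(ORIGINAL, REVISED)

if __name__ == "__main__":
    print(PATCH)
```

Overlaying the patch on the original table reproduces the revised table, so the size of the patch, not the size of the program, determines how much patch memory an engineering change consumes.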
Experimental Setup
Example: 8x8 IDCT
Technology: FreePDK 45nm
Logic Synthesis: Synopsys Design Compiler Ultra (high effort options with gated clock optimization)
P&R: Cadence SoC Encounter
Simulation: Synopsys VCS
Power/timing analysis: Synopsys PrimeTime PX (simulation results are used for power calculation)
Energy efficiencies (GOPS/W) are compared
Energy Efficiency Comparison
[Figure: energy efficiency for 8x8 IDCT (FreePDK 45nm technology), from No Patching to Fully-Patched; annotations: 6%, 48%, 89%]
Offers a tradeoff between efficiency and programmability
Area & Performance Comparison
[Figure: area and performance comparison; annotations: 5%, 20%, 5X Smaller, Up to 40% Increase]
Area Comparison
[Figure: area of the Fully-programmable accelerator, the Single-function Hardwired accelerator, and Efficeum; annotations: 4x reduction, 18% increase]
(Technology: FreePDK 45nm (NCSU/Nangate), Operating Frequency: 200MHz)
Power Comparison
[Figure: power of the Fully-programmable accelerator, the Single-function Hardwired accelerator, and Efficeum; annotations: 6x reduction, 13% increase]
(Technology: FreePDK 45nm (NCSU/Nangate), Operating Frequency: 200MHz)
Incremental High-Level Synthesis and Patch Compilation For High-Level ECO
Conventional High-Level Synthesis
Several phases are applied separately
This prevents incremental synthesis
[Figure: three phases, Allocation, Scheduling, and Binding. Allocation selects the units SHFT1, ADD1, ADD2, MUL1; Scheduling assigns the +, x, and << operations to Step 1, Step 2, Step 3; Binding maps them onto the units, yielding Registers, an FSM, and a Datapath.]
Incremental High-Level Synthesis
Each operation is scheduled and bound incrementally, and the hardware is enhanced accordingly
[Figure: Incremental Scheduling & Binding of the +, x, and << operations onto SHFT1, ADD1, ADD2, MUL1 over Step 1, Step 2, Step 3, with Registers, FSM, and Datapath growing as operations are added]
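A minimal sketch of the idea, assuming a simple greedy policy: each operation is placed on an existing unit at the earliest free step within its deadline, and a new unit is allocated only when no existing one fits. The class, the greedy policy, the per-operation `latest` deadline, and the unit naming are illustrative assumptions, not the algorithm used in the talk.

```python
from dataclasses import dataclass, field


@dataclass
class IncrementalDesign:
    units: dict = field(default_factory=dict)     # unit name -> operation kind
    schedule: dict = field(default_factory=dict)  # op name -> (step, unit name)
    busy: set = field(default_factory=set)        # occupied (step, unit) slots

    def add_operation(self, name, kind, preds=(), latest=1):
        """Schedule and bind one operation, adding hardware only if needed."""
        # An operation may start one step after its latest predecessor.
        earliest = 1 + max((self.schedule[p][0] for p in preds), default=0)
        if earliest > latest:
            raise ValueError(f"{name}: cannot meet deadline step {latest}")
        # First try to reuse an existing unit of the right kind.
        for step in range(earliest, latest + 1):
            for unit, unit_kind in self.units.items():
                if unit_kind == kind and (step, unit) not in self.busy:
                    return self._bind(name, step, unit)
        # No free slot: enhance the hardware with a new unit.
        count = sum(1 for k in self.units.values() if k == kind)
        unit = f"{kind.upper()}{count + 1}"
        self.units[unit] = kind
        return self._bind(name, earliest, unit)

    def _bind(self, name, step, unit):
        self.schedule[name] = (step, unit)
        self.busy.add((step, unit))
        return step, unit


# Hypothetical dataflow: two additions feed a multiply, which feeds an addition.
design = IncrementalDesign()
design.add_operation("a", "add", latest=1)
design.add_operation("b", "add", latest=1)            # forces a second adder
design.add_operation("c", "mul", preds=("a", "b"), latest=2)
design.add_operation("d", "add", preds=("c",), latest=3)  # reuses ADD1

if __name__ == "__main__":
    print(design.units)
    print(design.schedule)
```

Because earlier decisions are never revisited, adding an operation after an engineering change leaves the existing schedule and binding intact, which is the property that separate allocation, scheduling, and binding phases do not provide.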