
Hardware/Software Instruction Set Configurability for System-on-Chip Processors


  1. Hardware/Software Instruction Set Configurability for System-on-Chip Processors
     38th DAC, Las Vegas, June 18-22, 2001
     Albert Wang, Chris Rowen, Dror Maydan, Earl Killia

  2. Landscape of reconfigurable computing
     [Figure: design points placed on two axes, optimality/integration (e.g. mW, $) and flexibility/modularity (e.g. time-to-market): ASIC, instruction-set configurable processor, FPGA + processor, FPGA, general processor. A gap of Δ ~10x is marked along each axis.]

  3. Computing using temporal connection
     [Figure: processor solution built from registers, memory (program), control, and datapath, with a Correct/Efficient scorecard for the processor carrying both ✓ and ✗ marks.]

  4. Computing using spatial connection
     [Figure: processor solution (registers, memory (program), control, datapath) beside ASIC solution (storage, FSM, datapath), with a Correct/Efficient scorecard for the ASIC carrying both ✓ and ✗ marks.]

  5. Configurable Processors: best of both
     Processor with application-specific instructions
     [Figure: processor solutions (registers, memory (program), control) combined with ASIC solutions (storage, FSM, datapath); the Correct/Efficient scorecard for processor and ASIC carries only ✓ marks.]

  6. Outline
     • Configurable processor solution
     • Xtensa™ processor architecture
     • Instruction extension automation
     • Software development tools
     • An example
     • Results
     • Summary

  7. Conventional Architecture
     • More registers
     • More FU's
     • Deeper pipeline
     • Bypass/forward
     [Figure: datapath with decoder, control, sources S0 and S1, register files RF0, RF1, RF2, a source path, four units labeled FU0, and a result path.]

  8. Conventional Architecture - cont.
     • More FU's
     [Figure: same datapath with functional units FU0, FU1, FU2, FU3 between source routing and result routing.]

  9. Conventional Architecture – cont.
     • More FU's
     • More registers
     [Figure: same datapath (decoder, control, S0, S1, RF0, RF1, RF2, source routing, FU0 to FU3, result routing).]

  10. Conventional Architecture – cont.
      • More registers
      • More FU's
      • Deeper pipeline
      [Figure: same datapath (decoder, control, S0, S1, RF0, RF1, RF2, source routing, FU0 to FU3, result routing).]

  11. Conventional Architecture – cont.
      • More registers
      • More FU's
      • Deeper pipeline
      • Bypass/forward
      [Figure: same datapath (decoder, control, S0, S1, RF0, RF1, RF2, source routing, FU0 to FU3, result routing).]

  12. Conventional Architecture – cont.
      • Problems with a fixed processor:
        - Wastes silicon: there is no universal set of extensions, or even one for each application class
        - Not fast enough compared with a hardware implementation
        - Wastes power
      • The Tensilica solution:
        - Small core processor
        - Allows easy and efficient application-specific instruction extensions

  13. Xtensa Architecture – Base
      • Good performance
        - Comparable to any embedded 32-bit RISC
      • Good code density
        - Much better than 32-bit RISC
        - Uses 16b/24b instructions
      • Small
        - 0.7 mm² in 0.18 µm
      • Low power
        - 0.37 mW/MHz
      • Easy extension
        - With the Tensilica Instruction Extension (TIE) language, at the ISA level
      • Efficient extension
        - TIE compiler generates an efficient pipelined implementation
        - TIE compiler extends all software development tools
      [Figure: base datapath (decoder, control, S0, S1, RF0, RF1, RF2, source routing, units labeled FU0, result routing).]

  14. TIE language - opcode
      • Opcode
      TIE: opcode MAC op2=5 CUST0
      [Figure: base datapath (decoder, control, S0, S1, RF0, RF1, RF2, source routing, units labeled FU0, result routing).]

  15. TIE Language – regfile / state
      • Opcode
      • Register file / state
      TIE: state ACC 40
      [Figure: datapath with decoder, control, RF0, and added state S0 "… as needed", source routing, units labeled FU0, result routing.]

  16. TIE Language – semantics
      • Opcode
      • Register file / state
      • Semantics
      TIE: semantic sem1 {MAC} { assign ACC = ACC + ars[15:0] * art[15:0]; }
      [Figure: datapath with decoder, control, RF0, state S0 "… as needed", source routing, FU0 plus an added MAC unit "… as needed", result routing.]

  17. TIE Language – iclass
      • Opcode
      • Register file / state
      • Semantics
      • Instruction class
      TIE: iclass c1 {MAC} {in ars, in art} {inout ACC}
      [Figure: datapath with decoder, control, RF0, state S0 "… as needed", source routing, FU0 plus MAC unit "… as needed", result routing.]

  18. TIE Language - schedule
      • Opcode
      • Register file / state
      • Semantics
      • Instruction class
      • Schedule
      TIE: schedule s1 {MAC} { use ars 1; use art 1; use ACC 2; def ACC 2; }
      [Figure: datapath with decoder, control, RF0, state S0 "… as needed", source routing, FU0 plus MAC unit "… as needed", result routing.]
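
One way to read the schedule statement on this slide: `use X n` says operand X is read in pipeline stage n, and `def X n` says result X is available at the end of stage n. Under that reading, which is an assumption because the slide does not spell out the timing rules, the stall between two back-to-back dependent instructions follows from the two stage numbers. A minimal Python sketch; `Schedule` and `stall_cycles` are hypothetical names, not part of TIE:

```python
from dataclasses import dataclass, field

@dataclass
class Schedule:
    """Stage numbers copied from a TIE-style schedule statement."""
    use: dict = field(default_factory=dict)     # operand -> stage where it is read
    define: dict = field(default_factory=dict)  # operand -> stage where it is produced

def stall_cycles(producer: Schedule, consumer: Schedule, operand: str) -> int:
    """Bubbles needed when the consumer issues one cycle after the producer.

    Assumed model: a value defined at the end of stage d can be read by any
    instruction whose use stage begins in a later cycle (full bypassing).
    """
    d = producer.define[operand]
    u = consumer.use[operand]
    return max(0, d - u)

# Stage numbers from the slide: use ars 1; use art 1; use ACC 2; def ACC 2
mac = Schedule(use={"ars": 1, "art": 1, "ACC": 2}, define={"ACC": 2})
print(stall_cycles(mac, mac, "ACC"))
```

Under this model, back-to-back MAC instructions need no bubble, because ACC is read and written in the same stage.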

  19. A Complete Example – parallel MAC
        opcode PMAC op2=0 CUST0
        state ACC1 40
        state ACC2 40
        iclass rr {PMAC} {in ars, in art} {inout ACC1, inout ACC2}
        semantic pmac_sem {PMAC} {
          assign ACC1 = ACC1 + ars[15:0] * art[15:0];
          assign ACC2 = ACC2 + ars[31:16] * art[31:16];
        }
        schedule pmac_schd {PMAC} {
          use ars 1; use art 1;
          use ACC1 2; use ACC2 2;
          def ACC1 2; def ACC2 2;
        }
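
The PMAC semantics on this slide can be mimicked in software to check expected results. The sketch below is a behavioral model only, assuming unsigned 16-bit operand halves and wrap-around at 40 bits (the slide states neither signedness nor overflow behavior); `pmac`, `ACC_BITS`, and `ACC_MASK` are illustrative names:

```python
ACC_BITS = 40                       # from `state ACC1 40` / `state ACC2 40`
ACC_MASK = (1 << ACC_BITS) - 1

def pmac(acc1: int, acc2: int, ars: int, art: int) -> tuple:
    """One PMAC step: two 16x16 multiplies, each accumulated separately."""
    lo = (ars & 0xFFFF) * (art & 0xFFFF)                  # ars[15:0]  * art[15:0]
    hi = ((ars >> 16) & 0xFFFF) * ((art >> 16) & 0xFFFF)  # ars[31:16] * art[31:16]
    # Assumed overflow behavior: wrap at the 40-bit state width.
    return (acc1 + lo) & ACC_MASK, (acc2 + hi) & ACC_MASK

acc1, acc2 = pmac(0, 0, 0x0003_0002, 0x0005_0004)
print(acc1, acc2)   # low halves 2*4, high halves 3*5
```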

  20. Productivity Gain – language + compiler
      Using the Xtensa processor generator, create... in minutes!
      • Inputs: select processor options; describe new instructions
      • Outputs: tailored, synthesizable HDL uP core; customized compiler, assembler, linker, debugger, simulator
      [Figure: core blocks shown are I/O, ALU, timer, pipe, cache, register file, MMU.]

  21. Productivity Gain – Software Tools
      Using the Xtensa processor generator, create...
      • Inputs: select processor options; describe new instructions
      • Outputs: tailored, synthesizable HDL uP core; customized compiler, assembler, linker, debugger, simulator
      [Figure: core blocks shown are I/O, ALU, timer, pipe, cache, register file, MMU.]

  22. Software Support – Assembler
      • Assembler
            loop    a2, .L1
            l16si   a10, a3, 0
            l16si   a11, a3, 2
            addi.n  a3, a3, 2
            PMAC    a10, a11
          .L1:
      • Custom data type
      • Register allocation
      • Code scheduling
      • RTOS
      • Simulator/debugger
      [Figure: datapath with decoder, control, RF0, states ACC1 and ACC2, and FU0 containing two multipliers feeding two adders.]

  23. Software Support – custom data type
      • Assembler
      • Custom data type
        C code:
            sat_int x, y, z;
            z = sat_add(x, y);
      • Register allocation
      • Code scheduling
      • RTOS
      • Simulator/debugger
      [Figure: datapath with decoder, control, RF0, states ACC1 and ACC2, and FU0 containing two multipliers feeding two adders.]
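
The slide does not define `sat_add`; the name suggests a saturating add on the custom `sat_int` type. A minimal Python sketch of that behavior, assuming a signed 32-bit `sat_int` (the width is an assumption, as are the constant names):

```python
SAT_BITS = 32                          # assumed width; the slide does not give one
SAT_MAX = (1 << (SAT_BITS - 1)) - 1
SAT_MIN = -(1 << (SAT_BITS - 1))

def sat_add(x: int, y: int) -> int:
    """Signed saturating add: clamp to the range instead of wrapping."""
    return max(SAT_MIN, min(SAT_MAX, x + y))

print(sat_add(-5, 3))                  # in range, ordinary sum
print(sat_add(SAT_MAX, 1) == SAT_MAX)  # overflow clamps at the maximum
print(sat_add(SAT_MIN, -1) == SAT_MIN) # underflow clamps at the minimum
```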

  24. Software Support – register allocation
      • Assembler
      • Custom data type
      • Register allocation
        Spilling around a call:
            sat_add    s3, s1, s2
            sat_store  s3, a1, 0
            call8      foo
            sat_load   s3, a1, 0
      • Code scheduling
      • RTOS
      • Simulator/debugger
      [Figure: datapath with decoder, control, RF0, states ACC1 and ACC2, and FU0 containing two multipliers feeding two adders.]
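
The spill sequence on this slide can be illustrated with a toy pass that wraps each call in stores and loads of the custom registers that stay live across it. This is a sketch only: it assumes the custom register file is caller-saved, takes the live set as an input rather than computing it, and uses 4-byte stack slots; `spill_around_calls` is a hypothetical helper, not a Tensilica tool.

```python
def spill_around_calls(code, live_across, stack_reg="a1"):
    """Wrap each call with stores/loads of the registers live across it.

    code:        list of assembly lines (strings)
    live_across: custom registers whose values are needed after the call
    """
    # Assumed layout: one 4-byte stack slot per spilled register.
    slots = {reg: 4 * i for i, reg in enumerate(live_across)}
    out = []
    for line in code:
        tokens = line.split()
        if tokens and tokens[0].startswith("call"):
            out += [f"sat_store {r}, {stack_reg}, {off}" for r, off in slots.items()]
            out.append(line)
            out += [f"sat_load {r}, {stack_reg}, {off}" for r, off in slots.items()]
        else:
            out.append(line)
    return out

code = ["sat_add s3, s1, s2", "call8 foo"]
for line in spill_around_calls(code, ["s3"]):
    print(line)
```

With `s3` live across the call, the output matches the four-instruction sequence shown on the slide.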

  25. Software Support – code scheduling
      • Assembler
      • Custom data type
      • Register allocation
      • Code scheduling
        C code:
            t = sat_mult(x, y);
            z = sat_add(z, t);
            t2 = sat_mult(x2, y2);
        Scheduled code:
            sat_mult  s3, s1, s2
            sat_mult  s6, s5, s4
            sat_add   s7, s7, s3
      • RTOS
      • Simulator/debugger
      [Figure: datapath with decoder, control, RF0, states ACC1 and ACC2, and FU0 containing two multipliers feeding two adders.]
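
The reordering on this slide, where the independent second multiply fills the gap before the dependent add, is what a latency-aware scheduler produces. Below is a minimal greedy list-scheduling sketch in Python for a single-issue pipeline. The 2-cycle `sat_mult` latency is an assumption chosen for illustration, and the function is not the Xtensa compiler's actual algorithm.

```python
def schedule(instrs, latency):
    """Greedy single-issue list scheduling.

    instrs:  list of (opcode, dest, sources) in program order
    latency: opcode -> cycles until its result can be read (assumed values)
    Returns the instructions in issue order.
    """
    n = len(instrs)
    # deps[i] holds (j, delay): i must issue at least `delay` cycles after j.
    deps = [[] for _ in range(n)]
    for i, (_, dst_i, src_i) in enumerate(instrs):
        for j in range(i):
            op_j, dst_j, src_j = instrs[j]
            if dst_j in src_i:                       # read after write
                deps[i].append((j, latency[op_j]))
            elif dst_i == dst_j or dst_i in src_j:   # write after write / read
                deps[i].append((j, 1))
    issued, order, cycle = {}, [], 0
    while len(order) < n:
        for i in range(n):
            ready = all(j in issued and issued[j] + d <= cycle for j, d in deps[i])
            if i not in issued and ready:
                issued[i] = cycle
                order.append(instrs[i])
                break                                # one instruction per cycle
        cycle += 1
    return order

prog = [("sat_mult", "s3", ("s1", "s2")),
        ("sat_add",  "s7", ("s7", "s3")),
        ("sat_mult", "s6", ("s5", "s4"))]
for op, dst, srcs in schedule(prog, {"sat_mult": 2, "sat_add": 1}):
    print(op, dst + ",", ", ".join(srcs))
```

With the assumed latencies, the issue order is the two multiplies followed by the add, as on the slide. With a 1-cycle multiply the scheduler would leave the program order unchanged.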

  26. Software Support - RTOS
      • Assembler
      • Custom data type
      • Register allocation
      • Code scheduling
      • RTOS
      • Simulator/debugger
      [Figure: context switch between Task0 and Task1; each task's custom registers S0, S1, …, s15 move to and from memory with sat_store and sat_load. Datapath as before: decoder, control, RF0, ACC1, ACC2, and FU0 with two multipliers and two adders.]
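
The context-switch picture on this slide implies that an RTOS must save and restore the custom registers along with the base registers. A small Python model of that bookkeeping, assuming 16 custom registers as the figure's numbering suggests; the `Task` class and its save-area layout are hypothetical:

```python
NUM_SAT_REGS = 16   # s0..s15, as suggested by the slide's figure

class Task:
    """Per-task save area for the custom register file (hypothetical layout)."""
    def __init__(self, name):
        self.name = name
        self.saved = [0] * NUM_SAT_REGS

def context_switch(regfile, old, new):
    """Save the custom registers for `old`, then restore those of `new`."""
    old.saved[:] = regfile      # stands in for one sat_store per register
    regfile[:] = new.saved      # stands in for one sat_load per register

regfile = [0] * NUM_SAT_REGS
task0, task1 = Task("Task0"), Task("Task1")

regfile[3] = 42                 # Task0 leaves a value in s3
context_switch(regfile, task0, task1)
regfile[3] = 7                  # Task1 reuses s3 for its own data
context_switch(regfile, task1, task0)
print(regfile[3])               # Task0 sees its own value again
```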

  27. Software Support – simulator/debugger
      • Assembler
      • Custom data type
      • Register allocation
      • Code scheduling
      • RTOS
      • Simulator/debugger
            gdb> break …
            gdb> cont
            gdb> step
            gdb> display …
      [Figure: datapath with decoder, control, RF0, ACC1, ACC2, and FU0 with two multipliers and two adders; question marks sit over the custom parts.]

  28. Outline
      • Configurable processors
      • Architecture
      • Instruction extension
      • Software support
      • An example
      • Results
      • Summary
