Architectural Synthesis and Exploration using Term Rewriting Systems


  1. Architectural Synthesis and Exploration using Term Rewriting Systems
     Arvind, James C. Hoe
     Laboratory for Computer Science, Massachusetts Institute of Technology
     http://www.csg.lcs.mit.edu

  2. Outline
     - Introduction
     - Term Rewriting Systems (TRS) as a Hardware Description Language
     - Hardware Synthesis from Term Rewriting Systems
     - Results
     Arvind, MIT Lab for Computer Science NTT, January 12, 2000, Slide 2

  3. Internet/Communication Space
     - Rapidly changing functionality and performance requirements necessitate rapid hardware development
       - ATM, frame-relay, Gigabit Ethernet, packet-over-SONET protocols
       - voice-over-IP, video, streaming data, QoS issues dominant
       - merger of LAN and WAN infrastructures
     - Currently addressed by
       - General-purpose or embedded processors + ASICs
       - Network processors (emerging)
     ASIC development time and cost is the limiting factor in product release

  4. Current ASIC Design Flow
     [Figure: design-flow diagram. Stages shown: Informal Architectural Spec, High-level C Simulators, RTL Implementation, Synthesis/Optimization, Fab, ASICs. Annotations: "Manual Steps" (labor intensive, time consuming, error prone) and "Verification nightmare".]
     Time pressure means: little architecture exploration & high technology risk

  5. Our New Design Technology
     - Reduces time to market
       - Faster design capture
       - Same specification for simulation, verification and synthesis
       - Rapid feedback ⇒ architectural exploration
     - Enables rapid development of a large variety of chips with related designs ⇒ complex systems-on-a-chip
     - Reduces manpower requirement
     Makes designing hardware as commonplace as writing software

  6. State-Centric Descriptions
     Hardware description languages:
         always @ (posedge Clk) begin
           if (a >= b) begin
             a <= a - b;
             b <= b;
           end else begin
             a <= b;
             b <= a;
           end
         end
     Schematics:
       [Figure: schematic with registers a and b; labels include π Flip, π Mod, δ Mod,a, δ Flip,a, δ Flip,b, ce, <, -, b = 0]
     What does it describe?
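
One way to answer the slide's question is to simulate the Verilog fragment. The sketch below is an illustration, not part of the original deck: the function names `clock_edge` and `settle` are hypothetical, and stopping when the registers no longer change is an assumption (the Verilog itself has no halt condition). Both non-blocking assignments read the values of a and b from before the clock edge, which is why the else branch swaps them.

```python
def clock_edge(a, b):
    """Model one posedge of Clk for the always block on this slide.

    Non-blocking assignments (<=) all read the old a and b, so the
    pair is updated simultaneously.
    """
    if a >= b:
        return a - b, b      # a <= a - b; b <= b;
    return b, a              # a <= b;     b <= a;


def settle(a, b, max_cycles=10_000):
    """Clock the circuit until (a, b) stops changing (assumed stop rule)."""
    for _ in range(max_cycles):
        nxt = clock_edge(a, b)
        if nxt == (a, b):
            return a, b
        a, b = nxt
    raise RuntimeError("registers did not settle within max_cycles")


# Starting from (2, 4) the registers settle with b == 0 and a holding
# the greatest common divisor of the two starting values.
final_a, final_b = settle(2, 4)
```

For non-negative starting values the registers settle at (g, 0), where g is the greatest common divisor, so the state-centric fragment computes the same function as Euclid's algorithm by repeated subtraction.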

  7. Operation-Centric Descriptions
     Euclid’s Algorithm
       Gcd(a, b) if b ≠ 0  ⇒  Gcd(b, Rem(a, b))   (Rule 1)
       Gcd(a, 0)           ⇒  a                   (Rule 2)
       Rem(a, b) if a < b  ⇒  a                   (Rule 3)
       Rem(a, b) if a ≥ b  ⇒  Rem(a-b, b)         (Rule 4)
     Execution:
       Gcd(2,4)
         ⇒ (R1)  Gcd(4,Rem(2,4))
         ⇒ (R3)  Gcd(4,2)
         ⇒ (R1)  Gcd(2,Rem(4,2))
         ⇒ (R4)  Gcd(2,Rem(2,2))
         ⇒ (R4)  Gcd(2,Rem(0,2))
         ⇒ (R3)  Gcd(2,0)
         ⇒ (R2)  2
     Hardware description?
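
The four rules on this slide can be run mechanically. The following is a minimal sketch, assuming terms are encoded as Python tuples such as `("Gcd", 2, 4)` and that arguments are rewritten before the enclosing term; the encoding and the names `step` and `normalize` are illustrative choices, not notation from the talk.

```python
def step(term):
    """Apply one rule somewhere in term.

    Returns (rule_name, new_term), or None if term is a plain integer
    and therefore already in normal form.
    """
    if isinstance(term, int):
        return None
    op, x, y = term
    # Rewrite inside the arguments first so the guards see integers.
    for index, arg in ((1, x), (2, y)):
        inner = step(arg)
        if inner is not None:
            name, new_arg = inner
            parts = list(term)
            parts[index] = new_arg
            return name, tuple(parts)
    if op == "Gcd":
        if y != 0:
            return "R1", ("Gcd", y, ("Rem", x, y))   # Rule 1
        return "R2", x                                # Rule 2
    if op == "Rem":
        if x < y:
            return "R3", x                            # Rule 3
        return "R4", ("Rem", x - y, y)                # Rule 4
    raise ValueError(f"unknown operator: {op!r}")


def normalize(term):
    """Rewrite until no rule applies; return (result, rules_fired)."""
    fired = []
    while True:
        outcome = step(term)
        if outcome is None:
            return term, fired
        name, term = outcome
        fired.append(name)


result, trace = normalize(("Gcd", 2, 4))
```

With this argument-first strategy the rules fire in the order R1, R3, R1, R4, R4, R3, R2 for Gcd(2,4), the same sequence as the execution listed on the slide.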

  8. Operation-Centric Description: MIPS
     MIPS Microprocessor Manual
       ADD rd, rs, rt
         GPR[rd] ← GPR[rs] + GPR[rt]
         PC ← PC + 4

  9. TRS as a Hardware Description Language

  10. Term Rewriting System
      TRS ≡ <A, R>
        A: a set of terms (hierarchically organized state elements)
        R: a set of rewriting rules (state transitions)
      System ≡ Structure + Behavior
      An operation-centric view of the world

  11. TRS Execution Semantics
      Given a set of rules and an initial term s
      While ( some rules are applicable to s ) {
        ♦ choose an applicable rule (non-deterministic)
        ♦ apply the rule atomically to s
      }
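
The loop on this slide translates almost line for line into code. The sketch below is an illustration under assumptions: each rule is modelled as a (guard, action) pair of Python functions, the non-deterministic choice is played by a random pick, and the example rule set (swap an adjacent out-of-order pair) is hypothetical and chosen only because several rules are often applicable at once.

```python
import random


def run_trs(rules, s, rng=random, max_steps=10_000):
    """Rewrite s until no rule is applicable, one atomic rule at a time."""
    for _ in range(max_steps):
        applicable = [action for guard, action in rules if guard(s)]
        if not applicable:
            return s                      # normal form reached
        s = rng.choice(applicable)(s)     # non-deterministic choice
    raise RuntimeError("no normal form reached within max_steps")


def swap_rules(n):
    """Hypothetical rule set: rule i swaps s[i] and s[i+1] if out of order."""
    return [
        (
            lambda s, i=i: s[i] > s[i + 1],
            lambda s, i=i: s[:i] + (s[i + 1], s[i]) + s[i + 2:],
        )
        for i in range(n - 1)
    ]


start = (3, 1, 2, 0)
final = run_trs(swap_rules(len(start)), start, rng=random.Random(0))
```

Whatever order the random choice takes, every swap removes one inversion, so this particular rule set always terminates in the sorted tuple. In general a TRS need not terminate or reach a unique result, which is why the sketch carries a step limit.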

  12. Architectural Description
      [Figure: processor block diagram with blocks +1, PC, PROG, RF, ALU, BF, Iport, Oport]

  13. AX Architectural Description
      Abstract Datatypes
        Type SYS   = Sys( PROC, IPORT, OPORT )
        Type PROC  = Proc( PC, RF, PROG, BF )
        Type PC    = Bit[16]
        Type RF    = Array[RNAME] VAL
        Type RNAME = Reg0 || Reg1 || Reg2 || . . .
        Type VAL   = Bit[16]
        Type PROG  = Array[PC] INST
        Type BF    = Fifo INST_D
        Type IPORT = Iport VAL
        Type OPORT = Oport VAL
      [Figure: processor block diagram with blocks +1, PC, PROG, RF, ALU, BF, Iport, Oport]

  14. AX Instruction Set
      Type INST = Loadi (RD, VAL)
               || Loadpc (RD)
               || Add (RD, R1, R2)
               || Sub (RD, R1, R2)
               || . . .
               || Bz (RA, RC)
               || MovToO (R1)
               || MovFromI (RD)
      Decoded instructions
      Type INST_D = Add_d (RD, V1, V2) || ...
      RD, RA, etc. are RNAMEs. V1, V2, etc. are values.

  15. AX Processor Model: Fetch Rules
      Fetch Add Rule
        Proc( pc, rf, prog, bf )
          if r1 ∉ target(bf) ∧ r2 ∉ target(bf)
          where Add(r, r1, r2) = prog[pc]
        ⇒ Proc( pc+1, rf, prog, enq(bf, Add_d(r, rf[r1], rf[r2])) )
      [Figure: processor block diagram with blocks +1, PC, PROG, RF, ALU, BF, Iport, Oport]

  16. AX Processor Model: Execute Rules
        Proc( pc, rf, prog, bf )
          if r1 ∉ target(bf) ∧ r2 ∉ target(bf)
          where Add(r, r1, r2) = prog[pc]
        ⇒ Proc( pc+1, rf, prog, enq(bf, Add_d(r, rf[r1], rf[r2])) )
      “Execute Add”
        Proc( pc, rf, prog, bf )
          where Add_d(r, v1, v2) = first(bf)
        ⇒ Proc( pc, rf[r:=v1+v2], prog, deq(bf) )
      [Figure: processor block diagram with blocks +1, PC, PROG, RF, ALU, BF, Iport, Oport]
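
The Fetch Add and Execute Add rules can be exercised as ordinary functions on a Proc value. This is a minimal sketch under several assumptions that are not in the slides: instructions are Python tuples tagged "Add" and "Add_d", the program holds only Add instructions, fetch is inapplicable once pc runs past the program, addition wraps at 16 bits (suggested by VAL = Bit[16]), and the driver simply tries fetch before execute.

```python
from collections import namedtuple

Proc = namedtuple("Proc", "pc rf prog bf")


def target(bf):
    """Destination registers of the decoded instructions waiting in bf."""
    return {inst[1] for inst in bf}


def fetch_add(p):
    """Fetch Add rule: returns the new Proc, or None if not applicable."""
    if p.pc >= len(p.prog):               # assumed bound check
        return None
    op, r, r1, r2 = p.prog[p.pc]
    if op != "Add" or r1 in target(p.bf) or r2 in target(p.bf):
        return None                       # stall on a pending write
    decoded = ("Add_d", r, p.rf[r1], p.rf[r2])
    return p._replace(pc=p.pc + 1, bf=p.bf + (decoded,))   # enq


def execute_add(p):
    """Execute Add rule: returns the new Proc, or None if not applicable."""
    if not p.bf or p.bf[0][0] != "Add_d":
        return None
    _, r, v1, v2 = p.bf[0]                # first(bf)
    rf = dict(p.rf)
    rf[r] = (v1 + v2) % 2**16             # assumed 16-bit wraparound
    return p._replace(rf=rf, bf=p.bf[1:])                  # deq


def run(p, rules=(fetch_add, execute_add)):
    """Fire the first applicable rule repeatedly until none applies."""
    while True:
        for rule in rules:
            nxt = rule(p)
            if nxt is not None:
                p = nxt
                break
        else:
            return p


program = (("Add", "r2", "r0", "r1"), ("Add", "r3", "r2", "r2"))
start = Proc(pc=0, rf={"r0": 1, "r1": 2, "r2": 0, "r3": 0}, prog=program, bf=())
done = run(start)
```

In this run the second Add cannot be fetched while the first is still in bf, because its source register r2 is in target(bf). The guard on the fetch rule is what models the pipeline stall.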

  17. TRS as an HDL
      - Clean, expressive, precise and concise
        - speculative & superscalar microarchitectures [IEEE Micro, June ’99]
        - memory models & cache coherence protocols [ISCA99, ICS99]
      - Supports parallel and non-deterministic specifications
      - The correctness of a TRS can be verified against a reference TRS specification
      - Some pipelining can be done automatically as a source-to-source transformation on TRS’s
      - Superscalar versions of TRS’s can be derived mechanically from pipelined TRS’s

  18. Synthesis from TRS’s

  19. From TRS to Synchronous FSM
      [Figure: synchronous FSM. Inputs I and current state S feed the Transition Logic, which produces outputs O and the “Next” S held in the States block]
      - Extract state elements (registers) from the type declaration
      - Extract state transition logic from the rules

  20. Rule: As a State Transformer
      Proc( pc, rf, prog, bf )
        where Bz_d(v_a, 0) = first(bf)
      ⇒ Proc( v_a, rf, prog, clear(bf) )
      [Figure: the current state values (PC, RF, PROG, BF) feed π, which produces the enable, and δ, which produces the next state values (PC’, RF’, PROG’, BF’)]

  21. Reference Implementation
      - Synchronous state elements
        [Figure: state-element symbols; port labels include D, Q, LE, WA, WD, WE, RA1/RD1, RA2/RD2, RA3/RD3, ED, EE, DE, CE, first, _full, _empty]
      - Single transition per clock cycle

  22. Scheduler
      [Figure: Scheduler block with inputs π1, π2, ..., πn and outputs φ1, φ2, ..., φn]
      1. φi ⇒ πi
      2. π1 ∨ π2 ∨ ... ∨ πn ⇒ φ1 ∨ φ2 ∨ ... ∨ φn
      3. One-rule-at-a-time ⇒ at most one φi is true
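
A fixed-priority encoder is one simple circuit that meets all three scheduler conditions on this slide. The slide does not say which scheduler is actually generated, so the priority order below is an assumption, and the function names are hypothetical. The check enumerates every input combination for a small n.

```python
from itertools import product


def priority_scheduler(pi):
    """Grant the lowest-numbered enabled rule: phi_i = pi_i and no earlier pi."""
    phi = []
    earlier_enabled = False
    for p in pi:
        phi.append(p and not earlier_enabled)
        earlier_enabled = earlier_enabled or p
    return phi


def satisfies_conditions(pi, phi):
    """Check the three conditions from the slide for one input vector."""
    cond1 = all(p for p, f in zip(pi, phi) if f)   # phi_i implies pi_i
    cond2 = (not any(pi)) or any(phi)              # some pi implies some phi
    cond3 = sum(phi) <= 1                          # at most one phi_i is true
    return cond1 and cond2 and cond3


all_ok = all(
    satisfies_conditions(pi, priority_scheduler(pi))
    for pi in product([False, True], repeat=4)
)
```

Any arbitration policy that passes the same three checks would serve equally well; fixed priority is just the shortest to write down.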

  23. Combining Logic from Multiple Rules
      [Figure: the latch enables φ0, φ1, ..., φn from different rules are ORed to form the latch enable; the same φ signals drive a selector (sel) that picks among the next state values δ0,PC, δ1,PC, ..., δn,PC from different rules to produce the next state value PC’]
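
The combining logic for a single register can be sketched as a function of the scheduler outputs φ and the per-rule next values δ. This is an illustration only: the name `combine` and the decision to raise an error when more than one φ is asserted are assumptions, relying on the one-rule-at-a-time scheduler described in the talk.

```python
def combine(current, phis, deltas):
    """Return (latch_enable, next_value) for one register.

    phis[i]   : True if the scheduler fires rule i this cycle
    deltas[i] : the value rule i would write into the register
    """
    if len(phis) != len(deltas):
        raise ValueError("need one delta per phi")
    latch_enable = any(phis)                       # OR of all the phi
    selected = [d for phi, d in zip(phis, deltas) if phi]
    if len(selected) > 1:
        raise ValueError("more than one phi asserted")
    # With no rule firing the register simply holds its value.
    next_value = selected[0] if selected else current
    return latch_enable, next_value


enable, next_pc = combine(7, [False, True, False], [10, 20, 30])
```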

  24. Performance Considerations
      - Concurrent Execution
        - Statically determine which transitions can be safely executed concurrently
        - Generate a scheduler and update logic that allows as many concurrent transitions as possible
      Caution: Concurrent firing of two rules can violate one-transition-at-a-time semantics if, for example, firing of one rule disables the other
      Conflict-free rules
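
The caution on this slide can be made concrete with a brute-force check. The slide gives no formal definition of conflict-free, so the test below is one plausible reading and should be treated as an assumption: two rules pass if, in every state where both are enabled, neither disables the other and firing them in either order gives the same state. It enumerates states instead of analysing the rules statically, and the example rules are hypothetical.

```python
def conflict_free(rule_a, rule_b, states):
    """Assumed check: rules are (guard, action) pairs over the given states."""
    (pi_a, delta_a), (pi_b, delta_b) = rule_a, rule_b
    for s in states:
        if pi_a(s) and pi_b(s):
            after_a, after_b = delta_a(s), delta_b(s)
            if not (pi_b(after_a) and pi_a(after_b)):
                return False              # one rule disables the other
            if delta_b(after_a) != delta_a(after_b):
                return False              # the two orders disagree
    return True


# Hypothetical rules over a state (x, y) with 0 <= x, y <= 3.
inc_x = (lambda s: s[0] < 3, lambda s: (s[0] + 1, s[1]))
inc_y = (lambda s: s[1] < 3, lambda s: (s[0], s[1] + 1))
max_x = (lambda s: s[0] < 3, lambda s: (3, s[1]))

states = [(x, y) for x in range(4) for y in range(4)]
independent = conflict_free(inc_x, inc_y, states)   # touch different fields
clashing = conflict_free(inc_x, max_x, states)      # max_x disables inc_x
```

Here `inc_x` and `inc_y` update different parts of the state and pass the check, while `max_x` pushes x to its limit and so disables `inc_x`, which is the situation the slide warns about.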

  25. Quality of Synthesis

  26. TRAC Synthesis Flow
      [Figure: flow diagram. Labels: Design SPEC, Transform, Compile, C, C Sim, RTL, RTL Sim, Synopsys, Gate Array, FPGA, Std Cell]

  27. Performance: TRS vs. Verilog
      32-bit MIPS Integer Core

                       CBA tc6a                   LSI 10K
                       Area (cells)  Clock        Area (gates)  Clock
      TRS              9521          10ns         30756         19.48ns
                                     (100MHz)                   (51MHz)
      Verilog RTL      8960          11.4ns       29483         23.79ns
                                     (88MHz)                    (42MHz)

      TRS: 1 day    Verilog: 1 month    (Dan Rosenband & James Hoe)
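
The MHz figures on this slide follow from the clock periods by f = 1000 / period_ns. The short sketch below recomputes them and the relative area of the TRS design; the dictionary layout and names are illustrative, and the numbers are copied from the slide.

```python
# (area, clock period in ns) per library, taken from the slide.
RESULTS = {
    "TRS":     {"CBA tc6a": (9521, 10.0), "LSI 10K": (30756, 19.48)},
    "Verilog": {"CBA tc6a": (8960, 11.4), "LSI 10K": (29483, 23.79)},
}


def clock_mhz(period_ns):
    """Convert a clock period in nanoseconds to a frequency in MHz."""
    return 1000.0 / period_ns


frequencies = {
    (design, lib): round(clock_mhz(period))
    for design, libs in RESULTS.items()
    for lib, (_, period) in libs.items()
}

# Area of the TRS design relative to hand-written Verilog, per library.
area_ratio = {
    lib: RESULTS["TRS"][lib][0] / RESULTS["Verilog"][lib][0]
    for lib in RESULTS["TRS"]
}
```

On these figures the TRS-derived core is between 4% and 7% larger in area than the Verilog RTL version and has a shorter clock period in both libraries.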

  28. Architectural Derivatives
      [Figure: datapath variants built from the blocks +1, PC, PROG, RF, ALU, BF (BF 0, BF 1), MIN, MOUT, labelled Non-pipelined, 2-stage, 3-stage]
      Other Dimensions: Superscalar, Custom Instructions, Number of Registers, Word Size ...

  29. Derivatives and Feedback
      - Derivatives of a 32-bit 4-GPR embedded RISC processor
      - Synopsys RTL Analyzer reports GTECH area and gate delays (no wiring or load model)

                     simple   2-stage        3-stage       3-stage,2-way
      Delay          30+X     max(18+X,25)   max(6+X,25)   max(8+X,31)
      Delay(X=20)    50       38             26            31
      Area           4334     5753           6378          9492

      unit area = 1 NAND, unit delay = 1 NAND
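
The Delay(X=20) row can be reproduced directly from the delay formulas. The sketch below evaluates each formula and also reports area relative to the simple design; the slide does not define X, so it is treated here as an opaque parameter, and the dictionary names are illustrative.

```python
# Delay formulas and areas copied from the slide (units: 1 NAND).
DELAY = {
    "simple":        lambda x: 30 + x,
    "2-stage":       lambda x: max(18 + x, 25),
    "3-stage":       lambda x: max(6 + x, 25),
    "3-stage,2-way": lambda x: max(8 + x, 31),
}
AREA = {"simple": 4334, "2-stage": 5753, "3-stage": 6378, "3-stage,2-way": 9492}

delay_at_20 = {name: formula(20) for name, formula in DELAY.items()}

# Area of each derivative relative to the simple (non-pipelined) design.
relative_area = {name: area / AREA["simple"] for name, area in AREA.items()}
```

At X = 20 the 3-stage design has roughly half the delay of the simple one (26 against 50) for about 1.47 times the area, which is the kind of trade-off the rapid feedback is meant to expose.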

  30. Application: ASPN Chips
      [Figure: Performance vs. Flexibility chart positioning ASIC, ASPN, NP and GP]
      Application-Specific Programmable Network (ASPN) chips are based on a core architecture and a set of domain-specific building blocks.
      TRAC allows rapid customization of ASPN designs with ASIC-like performance for evolving needs and for different vertical markets within the communication space.
