Physical Implementation of the DSPIN Network-on-Chip in the FAUST Architecture


  1. Physical Implementation of the DSPIN Network-on-Chip in the FAUST Architecture
     Ivan Miro-Panades (1,2,3), Fabien Clermidy (3), Pascal Vivet (3), Alain Greiner (1)
     (1) The University of Pierre et Marie Curie, Paris, France
     (2) STMicroelectronics, Crolles, France
     (3) CEA-Leti, MINATEC, Grenoble, France
     Ivan MIRO PANADES – NOCS 2008

  2. Outline
     - Motivation
     - FAUST architecture
     - Migration of DSPIN into FAUST
     - DSPIN implementation
     - Networks comparison

  3. Motivation
     - Physically implement the DSPIN NoC in the FAUST application platform
       - DSPIN is a NoC developed jointly by Lip6 and ST
       - FAUST is a stream-oriented application platform for 4G telecom applications, based on ANOC and developed by CEA-Leti
     - Compare the performance of ANOC and DSPIN on a real application with real traffic

  4. FAUST architecture
     - 23 computation units
     - Asynchronous NoC (ANOC)
     - 20 ANOC routers
     - GALS design
     - 24 independent clocks
     - Ethernet port
     - Internal/external RAM
     - CPU: ARM946ES
     - Cache: 4 KB instruction, 4 KB data
     - Hardware OFDM modulation/demodulation
     [Figure: FAUST block diagram. TX units: OFDM mod., Alam. mod., CDMA mod., mapp., bit inter., turbo coder, conv. coder. RX units: frame sync., OFDM dem., CDMA dem., demapp., deinter., rotor, equal., chan. est., conv. dec. AHB units: ARM946, RAM, ext. RAM IF, Ethernet IF. Also shown: NOC1 IF and NOC2 IF expansion ports (SPort, APort), RAC, DART, NoC perf., JTAG, clock and reset control, async/sync interfaces, and pad groups of 84, 58, 17 and 83 pads.]

  5. ANOC architecture
     - Asynchronous NoC
       - Asynchronous send/accept handshake protocol
       - QDI 4-phase/4-rail asynchronous logic
       - QoS with two virtual channels (best effort, guaranteed service)
     - NoC routers
       - 5-port router
       - Source routing
       - Wormhole packet switching
       - 32-bit payload
     - Asynchronous handshake protocol
     - FIFO-based GALS interfaces
     - Mapped onto libraries
       - CMOS ST 130nm
       - TIMA TAL130 library
     [Figure: ANOC routers (crossbars) connected to an IP through its NIC and a GALS interface.]

  6. FAUST floor-plan with ANOC
     - 4.5 M gates
     - 276 chip pins
     - Chip area: 80 mm²
     - ST CMOS 130nm
     - 166 MHz (worst case)
     - NoC implementation:
       - Uses the ANOC router hard macro
       - Performs buffering and routing of ANOC links
       - No spaghetti routing at top level!
     - ANOC router: 0.211 mm², with North, South, East, West and Unit ports
     [Figure: FAUST floor-plan with ANOC, showing the IP blocks (OFDM, CDMA, turbo, conv., ARM946, RAM, Ethernet, DART, NP1, NP2, RAC, clock/reset) around the ANOC router hard macros.]

  7. Outline
     - Motivation
     - FAUST architecture
     - Migration of DSPIN into FAUST
     - DSPIN implementation
     - Networks comparison

  8. DSPIN architecture
     - Packet-based
     - Distributed router architecture
     - Suited to the GALS approach
     - Mesochronous links between routers
     - Synthesizable with standard cells
     - Neither asynchronous nor custom cells
     - Metastability resolved by "bi-synchronous FIFO"
     - More details in: "A Low Cost Network-on-Chip with Guaranteed Service Well Suited to the GALS Approach", NanoNet'06
     [Figure: 3x3 mesh of clusters from (Y-1,X-1) to (Y+1,X+1) with clock labels CK0 to CK8; the router of cluster (Y,X) is drawn with its local port and its input ports.]

  9. NoC architectures comparison

     | | ANOC | DSPIN |
     |---|---|---|
     | Router arity | 5-port router | 5-port router |
     | Topology | Irregular 2D mesh | Regular 2D mesh |
     | Routing technique | Source routing (9 hops) | Address-based (8 bits), X-First algorithm |
     | Switching technique | Wormhole | Wormhole |
     | Flit size (payload) | 34 bits (32 bits) | Generic: 34 bits (32 bits) |
     | Flow control bits on the flit | Begin of packet (BOP), End of packet (EOP) | Begin of packet (BOP), End of packet (EOP) |
     | Virtual channels | Best effort, Guaranteed service | Best effort, Guaranteed service |
     | Programming model | Message passing | Shared memory (2 routers per cluster), Message passing (1 router per cluster) |
     | Clocking scheme | Fully asynchronous (QDI) with GALS interfaces | Multi-synchronous with mesochronous interfaces |
     | Flow control protocol | Send/accept asynchronous handshake | FIFO protocol (Write and WriteOk) |
     | Clock tree | None | One per router |
     | Physical implementation | Hard macro | Soft macro distributed on five modules |
     | Long wires | Inter-router wires | Intra-cluster wires |
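The address-based X-First routing listed for DSPIN can be sketched as follows. This is a minimal illustration, not the DSPIN router logic: the port names (E, W, N, S, L), the (y, x) tuple convention, and the assumption that X grows eastward and Y grows northward are all hypothetical choices made for the example.

```python
from typing import List, Tuple

Coord = Tuple[int, int]  # (y, x), the order used on the slides

# Assumed axis orientation: X grows to the east, Y grows to the north.
STEP = {"E": (0, 1), "W": (0, -1), "N": (1, 0), "S": (-1, 0)}


def next_port(cur: Coord, dst: Coord) -> str:
    """Pick the output port for one X-first step (hypothetical port names)."""
    cur_y, cur_x = cur
    dst_y, dst_x = dst
    # X-first: resolve the X offset completely before touching Y.
    if dst_x > cur_x:
        return "E"
    if dst_x < cur_x:
        return "W"
    if dst_y > cur_y:
        return "N"
    if dst_y < cur_y:
        return "S"
    return "L"  # arrived: deliver on the local port


def route(src: Coord, dst: Coord) -> List[str]:
    """List the ports a packet takes from src to dst, ending with 'L'."""
    ports: List[str] = []
    cur = src
    while True:
        port = next_port(cur, dst)
        ports.append(port)
        if port == "L":
            return ports
        dy, dx = STEP[port]
        cur = (cur[0] + dy, cur[1] + dx)


if __name__ == "__main__":
    print(route((0, 0), (2, 1)))
```

Because every router only compares its own coordinates with the destination address, the header needs just the 8-bit (Y, X) pair instead of a per-hop path.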

  10. Packet format
     - Flit width: ANOC 34 bits; DSPIN 34 bits (generic)
     - ANOC first flit: 18-bit routing field, H8 ... H2 H1 H0, 2 bits per hop
     - DSPIN first flit: 8-bit address, Y (4 bits) and X (4 bits)
     - Similar packet format and control bits
     - ANOC uses source routing (18 bits), allowing 9 hops
     - DSPIN uses address-based routing (8 bits)
     - Packet conversion module required:
       - Design of the Protocol_conversion module
     [Figure: ANOC packet and DSPIN packet, each drawn as a first flit plus following flits.]
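The two routing fields can be illustrated with a small packing sketch. The field widths (9 hops of 2 bits, Y and X of 4 bits each) come from the slide; the bit ordering (H0 in the least significant bits, Y above X) and the function names are assumptions made for this example, and the 2-bit hop codes are treated as opaque values.

```python
from typing import List, Tuple

HOP_BITS = 2    # from the slide: 2 bits per hop
MAX_HOPS = 9    # from the slide: H0 .. H8, 18 bits in total
COORD_BITS = 4  # from the slide: Y and X are 4 bits each


def pack_source_route(hops: List[int]) -> int:
    """Pack up to 9 hop codes into an 18-bit field (H0 in the low bits, assumed)."""
    if len(hops) > MAX_HOPS:
        raise ValueError("source route limited to 9 hops")
    field = 0
    for i, hop in enumerate(hops):
        if not 0 <= hop < (1 << HOP_BITS):
            raise ValueError("hop code must fit in 2 bits")
        field |= hop << (i * HOP_BITS)
    return field


def unpack_source_route(field: int) -> List[int]:
    """Return all 9 hop codes, H0 first."""
    mask = (1 << HOP_BITS) - 1
    return [(field >> (i * HOP_BITS)) & mask for i in range(MAX_HOPS)]


def pack_address(y: int, x: int) -> int:
    """Pack (Y, X) into an 8-bit address (Y in the high nibble, assumed)."""
    limit = 1 << COORD_BITS
    if not (0 <= y < limit and 0 <= x < limit):
        raise ValueError("coordinates must fit in 4 bits")
    return (y << COORD_BITS) | x


def unpack_address(field: int) -> Tuple[int, int]:
    """Return (y, x) from an 8-bit address."""
    mask = (1 << COORD_BITS) - 1
    return (field >> COORD_BITS) & mask, field & mask
```

The 18-bit field limits an ANOC path to 9 hops, while the 8-bit address can name any cluster of a mesh up to 16 x 16.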

  11. FAUST integration
     - Protocol_conversion module:
       - Translates the routing algorithm using a LUT
       - Adapts the flow control signals:
         - ANOC: Send/Accept
         - DSPIN: FIFO protocol
       - Implements two bi-synchronous FIFOs
     - ANOC GALS interface:
       - Implements 4 FIFOs, synchronous ↔ asynchronous
     - FAUST IPs and NICs are not modified
     [Figure: ANOC IP template: IP and NIC (CLK_IP), synchronous SEND/ACCEPT, GALS interface, asynchronous SEND/ACCEPT, ANOC router with asynchronous SEND/ACCEPT links. DSPIN IP template: IP and NIC (CLK_IP), synchronous SEND/ACCEPT, Protocol_conversion with LUT, synchronous READ/WRITE, DSPIN router (CLK_NoC) with mesochronous READ/WRITE links.]
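One plausible reading of the LUT-based translation is a table that maps each ANOC source-routing field emitted by the unmodified NIC to the DSPIN (Y, X) address of the same destination. The sketch below assumes exactly that; the table contents, the name `ROUTE_LUT`, and the bit layout of the 8-bit address are hypothetical and only serve the illustration.

```python
from typing import Dict, Tuple

# Hypothetical LUT: 18-bit ANOC routing field -> DSPIN (y, x) destination.
# Real contents would depend on the FAUST topology, which the slides do not list.
ROUTE_LUT: Dict[int, Tuple[int, int]] = {
    0b01: (0, 1),
    0b1001: (1, 1),
    0b111001: (2, 1),
}


def convert_route(path_field: int, lut: Dict[int, Tuple[int, int]] = ROUTE_LUT) -> int:
    """Translate an ANOC routing field into an 8-bit DSPIN address.

    Assumes Y sits in the high nibble and X in the low nibble.
    Raises KeyError when the path has no entry in the LUT.
    """
    y, x = lut[path_field]
    return (y << 4) | x


if __name__ == "__main__":
    print(hex(convert_route(0b1001)))
```

Keeping the translation in a lookup table is what lets the FAUST IPs and NICs stay untouched: they keep producing ANOC headers, and only the conversion module knows the DSPIN addressing.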

  12. Outline
     - Motivation
     - FAUST architecture
     - Migration of DSPIN into FAUST
     - DSPIN implementation
     - Networks comparison

  13. DSPIN implementation
     - Hierarchical synthesis
     - Low Power CMOS ST 130nm technology
     - Standard cells
     - Neither asynchronous nor custom cells
     - Timing constraints file:
       - Multi-cycle paths (mesochronous interfaces)
       - False paths (asynchronous interfaces)
     - GALS compatible
     - Clock gating
     - Implemented in 4 steps
     - Flow: Synthesis, Floorplanning, Place, Optimize placement, Clock-tree, Route, Optimize

  14. DSPIN clock-tree
     - Mesochronous links, GALS compatible
     - Bi-synchronous FIFO [NOCS 2007] between Clk and Clk': max skew is 50% of the clock period
     - Timing constraints: set_multicycle_path
     - Clock-tree implementation:
       1. Add buffers/inverters
       2. Build the bottom clock tree (5% skew)
       3. Characterize the bottom clock tree
       4. Build the top clock tree (30% skew)
     - 5% skew within the router (bottom tree)
     - 30% skew between routers (top tree)
     [Figure: Clk_NoC distributed to Router (0,0), Router (0,1) and Router (0,2). Annotations: 1st step; 2nd and 3rd steps, 5% skew (bottom tree); 4th step, 30% skew (top tree); 180° phase shift and 5% skew.]
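The skew percentages above become absolute time budgets once a clock frequency is chosen. The sketch below converts them, using as an example the 289 MHz worst-case frequency quoted later in the deck; the function and constant names are hypothetical.

```python
FIFO_MAX_SKEW = 0.50     # bi-synchronous FIFO tolerance, fraction of the period
TOP_TREE_SKEW = 0.30     # skew between routers (top tree)
BOTTOM_TREE_SKEW = 0.05  # skew within a router (bottom tree)


def clock_period_ns(freq_hz: float) -> float:
    """Clock period in nanoseconds."""
    return 1e9 / freq_hz


def skew_ns(freq_hz: float, fraction: float) -> float:
    """Absolute skew budget in nanoseconds for a fraction of the period."""
    return fraction * clock_period_ns(freq_hz)


def within_fifo_tolerance(skew_fraction: float) -> bool:
    """True when a skew fraction is tolerated by the bi-synchronous FIFO."""
    return skew_fraction <= FIFO_MAX_SKEW


if __name__ == "__main__":
    f = 289e6  # example frequency taken from the throughput slide
    for name, frac in [("FIFO max", FIFO_MAX_SKEW),
                       ("top tree", TOP_TREE_SKEW),
                       ("bottom tree", BOTTOM_TREE_SKEW)]:
        print(f"{name}: {skew_ns(f, frac):.3f} ns")
```

At 289 MHz the period is about 3.46 ns, so the 30% inter-router target is roughly 1.04 ns, which stays below the 50% limit the FIFO can absorb.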

  15. FAUST floor-plan with DSPIN
     - Distributed router implementation
     - Soft macro approach
     - Higher floor-plan flexibility
     - NoC adapts to the SoC
     - Long wires are routed in a tree manner
     - Different router configurations are possible
     - Placement density: 60-70%
     [Figure: FAUST floor-plan with DSPIN. The N, S, E, W and L modules of each DSPIN router are distributed among the IP blocks; DART is shown as a reserved area.]

  16. FAUST floor-plan
     [Figure: two floor-plans side by side, FAUST with ANOC and FAUST with DSPIN.]

  17. Outline
     - Motivation
     - FAUST architecture
     - Migration of DSPIN into FAUST
     - DSPIN implementation
     - Networks comparison

  18. NoC Area (CMOS 130nm)

     | | ANOC | DSPIN |
     |---|---|---|
     | Router | 0.211 mm² | 0.161 mm² |
     | GALS interface | 0.070 mm² | 0.024 mm² |
     | Clock tree | 0.000 mm² | 0.0016 mm² |
     | Total | 0.281 mm² | 0.187 mm² |

     - ANOC is implemented as a hard macro
     - DSPIN is implemented as a soft macro
     - DSPIN is 33% smaller than ANOC
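The totals and the area saving follow directly from the per-component figures. A short check, with hypothetical names and the values copied from the slide:

```python
# Per-router area in mm², as listed on the slide (CMOS 130nm).
AREA_MM2 = {
    "ANOC": {"router": 0.211, "gals_interface": 0.070, "clock_tree": 0.0},
    "DSPIN": {"router": 0.161, "gals_interface": 0.024, "clock_tree": 0.0016},
}


def total_area(noc: str) -> float:
    """Sum of router, GALS interface and clock-tree area for one NoC."""
    return sum(AREA_MM2[noc].values())


def relative_saving() -> float:
    """Fraction by which the DSPIN total is smaller than the ANOC total."""
    return 1.0 - total_area("DSPIN") / total_area("ANOC")


if __name__ == "__main__":
    print(f"ANOC  total: {total_area('ANOC'):.4f} mm2")
    print(f"DSPIN total: {total_area('DSPIN'):.4f} mm2")
    print(f"saving: {relative_saving():.1%}")
```

The DSPIN components sum to 0.1866 mm², which the slide rounds to 0.187 mm², and the saving comes out at about one third.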

  19. NoC Throughput

     | | ANOC | DSPIN |
     |---|---|---|
     | Throughput in worst-case conditions | ~ 160 Mflit/s | ≤ 289 Mflit/s |
     | Throughput in nominal conditions | ~ 220 Mflit/s | ≤ 408 Mflit/s |

     - DSPIN throughput is deterministic with respect to the clock frequency (one flit per clock cycle)
     - Long-wire latency penalty on throughput:
       - DSPIN: the critical path crosses the long wires once
       - ANOC: the critical path crosses the long wires 4 times (4-phase protocol)
       - ANOC link pipelining is feasible
     - In a commercial circuit, DSPIN will be clocked close to the worst-case frequency (289 MHz) to improve the fabrication yield
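Since DSPIN moves one flit per clock cycle, its flit rate equals its clock frequency, and the payload bandwidth follows from the 32-bit payload per flit. A minimal sketch with hypothetical function names, using the 289 MHz worst-case clock quoted on the slide:

```python
PAYLOAD_BITS = 32  # payload bits per 34-bit flit, from the comparison slide


def throughput_mflits(clock_mhz: float, flits_per_cycle: float = 1.0) -> float:
    """Flit rate in Mflit/s for a synchronous link."""
    return clock_mhz * flits_per_cycle


def payload_gbits(mflits: float) -> float:
    """Payload bandwidth in Gbit/s for a given flit rate in Mflit/s."""
    return mflits * PAYLOAD_BITS / 1000.0


if __name__ == "__main__":
    rate = throughput_mflits(289.0)
    print(f"{rate:.0f} Mflit/s, {payload_gbits(rate):.3f} Gbit/s of payload")
```

The same conversion applies to any of the flit rates in the table; only the one-flit-per-cycle relation is specific to the synchronous DSPIN links.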
