Exploring High-Dimensional Topologies for NoC Design



  1. Exploring High-Dimensional Topologies for NoC Design Through an Integrated Analysis and Synthesis Framework
     F. Gilabert†, S. Medardoni‡, D. Bertozzi‡, L. Benini††, M.E. Gomez†, P. Lopez† and J. Duato†
     † Universidad Politécnica de Valencia. ‡ University of Ferrara. †† University of Bologna.

  2. Multi-dimension topologies
     • 2D mesh frequently used for NoC design
       - perfectly matches the 2D silicon surface
       - high level of modularity
       - controllability of electrical parameters
     • But its average latency and resource consumption scale poorly with network size
     • Topologies with more than 2 dimensions are attractive:
       - higher bandwidth and lower average latency
       - on-chip wiring is more cost-effective than off-chip wiring
     • But layout (routing) issues might impact their effectiveness and even feasibility (use of more metal layers, links with different latencies)

  3. Objective
     Explore the effectiveness and feasibility of multi-dimensional topologies.
     Exploration methodology issues arise:
     Issue 1: Fast and accurate exploration tools required for system-level analysis and topology selection
     Our approach
     • Abstract the behaviour of all NoC architecture-level mechanisms (flow control, arbitration, switching, routing, buffering, injection and ejection) while retaining RTL clock-cycle accuracy

  4. Objective
     Explore the effectiveness and feasibility of multi-dimensional topologies.
     Exploration methodology issues arise:
     Issue 2: Realistically capture traffic behavior
     • Traffic pattern usually abstracted as an average link bandwidth utilization
     • May lead to highly inaccurate performance predictions (traffic peaks, different kinds of messaging, synchronization mismatches)
     Our approach
     • Project network traffic based on the latest advances in MPSoC communication middleware
     • Generate traffic patterns for the NoC "shaped" by the above communication middleware (e.g., synchronization, communication semantics)

  5. Objective
     Explore the effectiveness and feasibility of multi-dimensional topologies.
     Exploration methodology issues arise:
     Issue 3: Backend synthesis flow required for assessment of layout effects
     • A single technology library no longer exists for standard cell design
     • The spread increases as technology scales down
     [Figure labels: 65nm LP-LVT, 65nm LP-HVT]
     Our approach
     • Silicon-aware topology exploration
     • Derive physical constraints that, if met, preserve the better theoretical properties of multi-dimensional topologies

  6. Topology exploration framework
     • Reference NoC architecture
     • Transaction level models
     • Traffic pattern generation
     Exploration of multi-dimensional topologies
     • System-level performance analysis
     • Implementation space exploration
     • TLS-driven physical synthesis

  8. Reference NoC architecture: Xpipes-Lite switch architecture
     [Figure: switch block diagram with inputs IN 0 to IN 3 (latch, path shift), arbiters (ARB), flow control manager, muxes, and buffered outputs OUT 0 to OUT 3; 1 clock cycle]
     • Input and output sampling
     • Latency: 1 cycle in the switch, 1 cycle in the link
     • Wormhole switching
     • Round-robin arbitration on the output ports
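The round-robin arbitration named on this slide can be sketched as follows. This is a minimal illustration, not the Xpipes-Lite RTL: the class name `RoundRobinArbiter` and its interface are assumptions, and it grants one requester per call, whereas a wormhole switch would hold the grant until the tail flit has passed.

```python
from typing import List, Optional


class RoundRobinArbiter:
    """Hypothetical per-output arbiter: rotates priority after each grant."""

    def __init__(self, num_inputs: int) -> None:
        self.num_inputs = num_inputs
        # Start so that input 0 has the highest priority on the first call.
        self.last_granted = num_inputs - 1

    def grant(self, requests: List[bool]) -> Optional[int]:
        # Scan the inputs starting just after the most recently granted one.
        for offset in range(1, self.num_inputs + 1):
            candidate = (self.last_granted + offset) % self.num_inputs
            if requests[candidate]:
                self.last_granted = candidate
                return candidate
        return None  # no input is requesting this output


arb = RoundRobinArbiter(4)
# Inputs 0, 2 and 3 request the same output on four consecutive arbitrations.
grants = [arb.grant([True, False, True, True]) for _ in range(4)]
print(grants)
```

Each requesting input is served once before any input is served twice, which is the fairness property round-robin arbitration is chosen for.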

  9. Reference NoC architecture: the Network Interface
     [Figure: network interface initiator between the OCP core and the NoC fabric, with an OCP slave interface (OCP CLK) and a network master interface back-end (NoC CLK)]
     • Protocol conversion (from OCP to network)
     • Packetization
     • Clock domain crossing
       - The OCP clock frequency is the NoC clock frequency divided by an integer
     • Pre-computation of routing path (source routing)
     • A symmetric network interface target exists

  10. Topology exploration framework
      • Reference NoC architecture
      • Transaction level models
      • Traffic pattern generation
      Exploration of multi-dimensional topologies
      • System-level performance analysis
      • Implementation space exploration
      • TLS-driven physical synthesis

  11. Transaction Level Models
      Abstraction from NoC architecture to transaction level models:
      • Architectural components → data structures
      • Component behavior → logical functions
      • Sensitivity → events
      Our transaction level models thus achieve maximum accuracy.
      • Each cycle, only components affected by an event require simulation time
      • Speed-up is dependent on the system idleness
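The point that only components touched by an event cost simulation time can be illustrated with a generic event-queue kernel. This is a sketch under assumptions, not the authors' simulator: `EventQueue`, the `send`/`receive` handlers and the 2-cycle delay are all hypothetical.

```python
import heapq
from typing import Callable, List, Tuple


class EventQueue:
    """Hypothetical event-driven kernel: time jumps from event to event."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, Callable[[int], None]]] = []
        self._seq = 0      # tie-breaker keeps same-cycle events in FIFO order
        self.handled = 0   # number of events actually simulated

    def schedule(self, cycle: int, handler: Callable[[int], None]) -> None:
        heapq.heappush(self._heap, (cycle, self._seq, handler))
        self._seq += 1

    def run(self) -> int:
        now = 0
        while self._heap:
            now, _, handler = heapq.heappop(self._heap)
            handler(now)
            self.handled += 1
        return now


queue = EventQueue()
log = []


def send(cycle: int) -> None:
    log.append(("send", cycle))
    queue.schedule(cycle + 2, receive)  # assumed 2-cycle delivery delay


def receive(cycle: int) -> None:
    log.append(("receive", cycle))


# Three bursts separated by long idle periods.
for start in (0, 100, 200):
    queue.schedule(start, send)

last_cycle = queue.run()
print(last_cycle, queue.handled)
```

A cycle-by-cycle simulator would step through every one of the 203 cycles here, while the event queue processes six events. The idle gaps cost nothing, which is why the speed-up grows with system idleness.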

  12. Network Interface
      [Figure: NI master between the processor (OCP side) and the network (network side), with STALL flag]
      Data structure
      • Buffers modeled as counters
      • Flow control status modeled as a flag
      • NI Slave is modeled using a similar data structure

  13. Network Interface
      [Figure: NI master between the processor (OCP side) and the network (network side), with STALL flag]
      Events
      • Send
        - Notifies message availability in the processor
        - Indicates the size in bursts of data
        - Schedules an OCP Transfer Event according to the processor rate
      • OCP Transfer
        - Sends one burst of data from the processor to the NI
        - Schedules a Packetization Event
        - Schedules an additional OCP Transfer Event when OCP is free and the processor has pending bursts
      • Packetization
        - Packet header generation
        - Packetization process
        - Path calculation
        - Each time a flit is created, it schedules a Transmit Event if possible
      • Transmit
        - If possible, moves flits from the NI to the attached switch
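The event chain above (Send, OCP Transfer, Packetization, Transmit) can be sketched with the buffer modeled as a counter and flow control as a flag. This is an illustrative model only: the class `NIMasterModel`, the one-cycle delays between events, the value of `flits_per_burst`, and packetizing a whole burst at once are assumptions, not details from the slides.

```python
import heapq


class NIMasterModel:
    """Hypothetical NI master model driven by the four events on the slide."""

    def __init__(self, ocp_period=2, flits_per_burst=4):
        self.ocp_period = ocp_period            # NoC cycles per OCP cycle (assumed)
        self.flits_per_burst = flits_per_burst  # assumed packet size in flits
        self.pending_bursts = 0    # bursts still waiting in the processor
        self.buffered_flits = 0    # NI buffer modeled as a counter
        self.stall = False         # flow control status modeled as a flag
        self.transmit_scheduled = False
        self.sent_flits = 0
        self._events = []          # heap of (cycle, sequence, event name)
        self._seq = 0

    def _schedule(self, cycle, name):
        heapq.heappush(self._events, (cycle, self._seq, name))
        self._seq += 1

    def send(self, cycle, bursts):
        """Send event: a message of `bursts` bursts becomes available."""
        self.pending_bursts += bursts
        self._schedule(cycle, "ocp_transfer")

    def _ocp_transfer(self, cycle):
        if self.pending_bursts == 0:
            return
        self.pending_bursts -= 1               # one burst moves to the NI
        self._schedule(cycle + 1, "packetization")
        if self.pending_bursts > 0:            # more bursts: next OCP cycle
            self._schedule(cycle + self.ocp_period, "ocp_transfer")

    def _packetization(self, cycle):
        self.buffered_flits += self.flits_per_burst
        if not self.transmit_scheduled:
            self._schedule(cycle + 1, "transmit")
            self.transmit_scheduled = True

    def _transmit(self, cycle):
        self.transmit_scheduled = False
        if self.stall or self.buffered_flits == 0:
            return                             # nothing can move this cycle
        self.buffered_flits -= 1               # one flit leaves per cycle
        self.sent_flits += 1
        if self.buffered_flits > 0:
            self._schedule(cycle + 1, "transmit")
            self.transmit_scheduled = True

    def run(self):
        cycle = 0
        while self._events:
            cycle, _, name = heapq.heappop(self._events)
            getattr(self, "_" + name)(cycle)
        return cycle


ni = NIMasterModel(ocp_period=2, flits_per_burst=4)
ni.send(cycle=0, bursts=2)
last_cycle = ni.run()
print(last_cycle, ni.sent_flits)
```

With these assumed parameters, two bursts produce eight flits and the last one leaves the NI at cycle 9. No flit objects are ever created; the counters and the flag are enough to reproduce the timing.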

  14. Switch Fabric
      [Figure: switch with input ports and output ports, each with a STALL flag]
      Data structure
      • Input and output buffers modeled as counters
      • Flow control status at each port modeled as a flag
      • Internal variables used to represent switch status

  15. Switch Fabric
      [Figure: switch with input ports and output ports, each with a STALL flag]
      Events
      • Input Transmit
        - Scheduled by the previous NoC component
        - Moves a flit to an input port
        - If it is the first flit of a packet, it schedules a Route Event
        - If not, it schedules a Cross Event
      • Route
        - Chooses, for each output, one input whose packets are not yet routed and want to go through it
        - If it is possible to move flits from input to output, it schedules a Cross Event
      • Cross
        - If possible, moves flits from input to output
        - If it is moving the tail flit, it frees the output
        - If possible, it schedules an Output Transmit Event
      • Output Transmit
        - If possible, transmits a flit to the next NoC component

  16. TL Models Validation
      [Figure: validation flow. The system definition, topology description and traffic pattern feed both the RTL simulator and the TL simulator, and the two sets of results are compared for validation]

  17. TL Models Validation
      [Figure: test system with an OCP processor behind an NI master and an OCP shared memory behind an NI slave, connected through switches]
      • Network frequency: 1 GHz
      • OCP core frequency: 1 GHz, 500 MHz, 250 MHz, 125 MHz
      • Network/OCP clock ratio: 1, 2, 4, 8
      • Several OCP traffic patterns (parameters: burst length and inter-burst idle time)
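The listed OCP core frequencies follow from dividing the network frequency by the integer clock ratio. A small check, where the function name `ocp_frequency_mhz` is an assumption made for illustration:

```python
NETWORK_FREQ_MHZ = 1000  # 1 GHz network clock, as stated on the slide


def ocp_frequency_mhz(ratio: int) -> float:
    """OCP clock frequency for a given integer Network/OCP clock ratio."""
    if ratio < 1:
        raise ValueError("clock ratio must be a positive integer")
    return NETWORK_FREQ_MHZ / ratio


ocp_freqs = [ocp_frequency_mhz(r) for r in (1, 2, 4, 8)]
print(ocp_freqs)
```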

  18. TL Models Validation
      • Maximum error across all tests: 0.03%
      • Simulation speed-up varied from 20x to 100x with respect to the RTL simulator
        - Depends heavily on the number of idle cycles in the simulation
      • 4x4 mesh test:
        - Maximum error: 0.01%
        - Speed-up: ~100x

  19. Topology exploration framework
      • Reference NoC architecture
      • Transaction level models
      • Traffic pattern generation
      Exploration of multi-dimensional topologies
      • System-level performance analysis
      • Implementation space exploration
      • TLS-driven physical synthesis

  20. Tile Architecture
      [Figure: tile with a processor core attached to a network interface initiator and a memory core attached to a network interface target]
      • Processor core
        - Connected through a Network Interface Initiator
      • Local memory core
        - Connected through a Network Interface Target
      • The two network interfaces can be used in parallel

  21. Communication protocol
      [Figure: producer tile and consumer tile exchanging (1) local polling, (2) write and local polling, (3) read operation and message, (4) reset semaphore]
      • Step 1: Producer checks local semaphores for pending messages for the destination
        - If there are none, it writes data to the local tile memory and unblocks a semaphore at the consumer tile
        - The producer is free to carry out other tasks
      • Step 2: Consumer detects the unblocked semaphore
        - Requests the data from the producer
      • Step 3: Consumer reads data from the producer
      • Step 4: Consumer sends a notification upon completion
        - This allows the producer to send another message to this consumer
      • Message sent only when the consumer is ready to read it
      • Only one outstanding message for a producer-consumer pair
      • Low network bandwidth utilization
      • Tight latency constraints on the topology
      Dalla Torre, A. et al., "MP-Queue: an Efficient Communication Library for Embedded Streaming Multimedia Platform", IEEE Workshop on Embedded Systems for Real-Time Multimedia, 2007.
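The four protocol steps can be sketched for a single producer-consumer pair. This is not the MP-Queue library API: `ProducerTile`, `ConsumerTile`, `try_send` and `consumer_poll` are hypothetical names, and the remote semaphore write and remote read, which would be NoC transactions, are modeled as plain attribute accesses.

```python
class ConsumerTile:
    def __init__(self):
        self.semaphore_unblocked = False  # set remotely by the producer
        self.received = []


class ProducerTile:
    def __init__(self):
        self.local_memory = None  # message buffer in the local tile memory
        self.pending = False      # local semaphore: message still outstanding

    def try_send(self, consumer, data):
        # Step 1: check the local semaphore for a pending message.
        if self.pending:
            return False          # only one outstanding message per pair
        self.local_memory = data  # write data to the local tile memory
        self.pending = True
        consumer.semaphore_unblocked = True  # remote write (over the NoC)
        return True               # producer is now free for other tasks


def consumer_poll(consumer, producer):
    # Step 2: local polling detects the unblocked semaphore.
    if not consumer.semaphore_unblocked:
        return False
    # Step 3: read the data from the producer's tile memory (remote read).
    consumer.received.append(producer.local_memory)
    consumer.semaphore_unblocked = False
    # Step 4: notify completion so the producer may send again.
    producer.pending = False
    return True


producer, consumer = ProducerTile(), ConsumerTile()
first = producer.try_send(consumer, "frame-0")
blocked = producer.try_send(consumer, "frame-1")  # rejected: one outstanding
polled = consumer_poll(consumer, producer)
second = producer.try_send(consumer, "frame-1")   # accepted after notification
print(first, blocked, polled, second, consumer.received)
```

Polling happens on local state only. The network carries one semaphore write, one data read and one notification per message, which is consistent with the low bandwidth utilization and the latency sensitivity noted on the slide.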

  22. Workload distribution
      [Figure label: External I/O]
      • Producer, worker and consumer tasks
      • I/O devices dedicated to input OR output data
      • Modeling of layout constraints (I/O devices on one side of the chip)

  23. Topology exploration framework
      • Reference NoC architecture
      • Transaction level models
      • Traffic pattern generation
      Exploration of multi-dimensional topologies
      • System-level performance analysis
      • Implementation space exploration
      • TLS-driven physical synthesis

  24. System-level performance analysis
      • Tile-based architecture
      • 16-tile system
        - Up to 5 tiles used to access external I/O
      • Baseline topology: 4x4 mesh (4-ary 2-mesh)
        - Switch frequency: 1 GHz
        - Tile frequency: 500 MHz
        - External I/O frequency: 500 MHz
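The k-ary n-mesh notation can be made concrete by counting nodes, links and minimal-path hops by brute force. This is a sketch under assumptions: the helper `mesh_stats` is hypothetical, hop counts assume minimal routing with one switch per node, and the 2-ary 4-mesh is included only as an illustrative 16-node alternative, not as a result from the slides.

```python
from itertools import product


def mesh_stats(k: int, n: int) -> dict:
    """Topological figures for a k-ary n-mesh (k nodes along each of n dimensions)."""
    nodes = list(product(range(k), repeat=n))
    total_hops = 0
    diameter = 0
    for a in nodes:
        for b in nodes:
            # Minimal-path distance in a mesh is the Manhattan distance.
            hops = sum(abs(x - y) for x, y in zip(a, b))
            total_hops += hops
            diameter = max(diameter, hops)
    ordered_pairs = len(nodes) * (len(nodes) - 1)  # distinct source/destination pairs
    return {
        "nodes": len(nodes),
        "links": n * k ** (n - 1) * (k - 1),       # bidirectional links
        "avg_hops": total_hops / ordered_pairs,
        "diameter": diameter,
    }


baseline = mesh_stats(4, 2)   # the 4x4 mesh used as baseline
high_dim = mesh_stats(2, 4)   # same node count, four dimensions
print(baseline)
print(high_dim)
```

Under these assumptions the 4-ary 2-mesh averages about 2.67 hops with 24 links, while the 2-ary 4-mesh averages about 2.13 hops with 32 links. This shows the trade-off the exploration targets: shorter paths in exchange for more wiring that is harder to lay out.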
