A Network of Time-Division Multiplexed Wiring for FPGAs
Rosemary Francis
Motivation
• FPGAs are now home to complex Systems-on-Chip
• Such designs require a Network-on-Chip
• FPGA global wiring is simple compared with ASIC Networks-on-Chip
• Existing NoCs for FPGAs use lots of wires or lots of logic
• Hard blocks are limited by the soft IP blocks
Goals
• Improve wiring density through TDM
• Use TDM components for an effective soft NoC implementation
• Funnel data to high-speed hard blocks
  – Hard NoC
  – Multipliers
  – Block RAM
Hierarchy of interconnect
• Coarse-grain packet-switched network
• Time-division multiplexed wires in a fine-grain network
• Clusters of logic elements with local interconnect
Architecture: Stratix vs TDM
[Figure: side-by-side routing architectures. Stratix: global routing, switch box, local routing, LUTs in a cluster of logic elements. TDM: the same hierarchy, plus TDM SRAM configuration and clusters of logic elements with latched inputs]
Wire Sharing
• Many wires can be shared without a problem
• Other configurations require a more intelligent approach: two signals needing the same wire in the same time slot conflict
• Signals can be delayed to allow more efficient wire use without rerouting
[Figures: example signal-to-wire assignments, including a conflict case and its resolution by delaying a signal]
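The delay-instead-of-reroute idea on this slide can be sketched as a tiny greedy slot assigner. This is a hypothetical illustration, not the CAD tool used in this work; the `schedule` function and signal names are invented:

```python
# Greedy TDM slot assignment (illustrative sketch):
# each signal requests a wire and a preferred time slot; on a conflict
# the signal is delayed to the next free slot rather than rerouted.

def schedule(signals, num_slots):
    """Map each (name, wire, preferred_slot) request onto a free (wire, slot)."""
    occupied = set()          # (wire, slot) pairs already in use
    assignment = {}
    for name, wire, preferred in signals:
        delay = 0
        while (wire, (preferred + delay) % num_slots) in occupied:
            delay += 1        # delay the signal to the next time slot
            if delay >= num_slots:
                raise ValueError(f"wire {wire} is fully booked")
        slot = (preferred + delay) % num_slots
        occupied.add((wire, slot))
        assignment[name] = (wire, slot)
    return assignment

# Two signals contend for slot 0 on wire 0; the second is delayed, not rerouted.
print(schedule([("a", 0, 0), ("b", 0, 0)], num_slots=4))  # {'a': (0, 0), 'b': (0, 1)}
```

A real scheduler would also weigh the latency cost of each delay, which is exactly the trade-off explored in the slides that follow.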
Parameter selection
• Assume infinite time slots to reduce wiring
  – Determine the optimum number of TDM wires
Infinite resources
[Figure: total number of wires needed vs number of TDM wires, assuming unlimited time slots]
Parameter selection
• Assume infinite time slots to reduce wiring
  – Determine the optimum number of TDM wires
• Vary the number of time slots
  – Determine the optimum number of time slots
  – Investigate the effect this has on latency
Determine number of time slots
[Figure: wires per switch box vs number of time slots (= number of configuration bits per mux)]
Number of time slots vs latency
[Figure: normalised latency of the critical path vs number of time slots (= number of configuration bits per mux)]
Parameter selection
• Assume infinite time slots to reduce wiring
  – Determine the optimum number of TDM wires
• Vary the number of time slots
  – Determine the optimum number of time slots
  – Investigate the effect this has on latency
• Using the optimum number of time slots
  – Re-evaluate the optimum number of TDM wires
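The sweeps above each pick an operating point where adding more resources stops paying off. One plausible way to automate that choice is a simple diminishing-returns rule; the `knee` function and the toy curve below are invented for illustration, since the real data comes from routing benchmarks:

```python
# Knee-finding sketch for the parameter sweeps above.
# Each point is (parameter value, wires needed); the toy curve stands in
# for measurements taken by routing benchmarks on the architecture.

def knee(points):
    """Return the smallest parameter value after which the wire count
    stops improving by more than 5% -- a simple diminishing-returns rule."""
    best_x, best_y = points[0]
    for x, y in points[1:]:
        if y < best_y * 0.95:     # still a worthwhile improvement
            best_x, best_y = x, y
    return best_x

# Toy sweep: wires needed drop sharply, then flatten out.
sweep = [(6, 65), (8, 40), (10, 32), (12, 31), (14, 31)]
print(knee(sweep))  # 10
```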
Limited resources
[Figure: total number of wires needed vs number of TDM wires, with a limited number of time slots]
Architectural drawbacks
• Extra configuration SRAM
• High-speed interconnect clock
• Benchmarks run over three times slower
• New CAD tools needed
  – Routing in space as well as time
  – Optimise for TDM wiring at every stage
Conclusions
• Using TDM wiring we can reduce the number of wires whilst increasing the data rate within channels
  – 75% less wiring × 24 time slots ÷ 3× slower clock ≈ 2× channel data rate
• This will allow
  – the design of effective global interconnect
  – more efficient sharing of on-chip resources
  – simplification of multi-chip designs
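The arithmetic in the first conclusion can be checked directly. The numbers are taken from the slides; the 2× figure compares channels of the same physical width:

```python
# Checking the conclusion's data-rate arithmetic (numbers from the slides).
wires_fraction = 0.25   # 75% less wiring, so a quarter of the wires remain
slots = 24              # time slots carried per TDM wire
slowdown = 3            # benchmarks run about three times slower

data_rate = wires_fraction * slots / slowdown
print(data_rate)  # 2.0 -> twice the channel data rate
```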
Future Work
• The current scheduling algorithm gives
  – a large wire reduction
  – a large latency penalty
• Is there a better compromise?
  – Halve the wiring for small latency penalties
• How can we reduce latency in other ways?
  – Better scheduling algorithms
  – Circuit redesign
Thanks for listening... Rosemary.Francis@cl.cam.ac.uk