How to Design Fast Asynchronous Routers for Asynchronous On-chip Networks
Wei Song
Supervisor: Doug Edwards
Advanced Processor Technologies Group, The School of Computer Science
2009-10-28
Index
• What is an asynchronous circuit?
• Why use an on-chip network?
• Why is an asynchronous on-chip network slow?
• How can we improve it?
• So, what's next?
Synchronous Circuit
• Pipeline style
• Strict timing assumptions
• A global clock distributed by a balanced tree
Asynchronous Circuits – C-element
A  B  |  Q'
0  0  |  0
0  1  |  Q (hold)
1  0  |  Q (hold)
1  1  |  1
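The C-element behaviour can be sketched in a few lines of Python. This is only a behavioural model of a 2-input C-element, not a circuit description; the function name `c_element` and the input sequence are illustrative assumptions.

```python
def c_element(a: int, b: int, q: int) -> int:
    """Next output of a 2-input C-element, given inputs a, b and current output q."""
    if a == b:
        return a  # inputs agree: the output follows them
    return q      # inputs differ: the output holds its previous value


# Walk through an illustrative input sequence, starting from q = 0.
q = 0
trace = []
for a, b in [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]:
    q = c_element(a, b, q)
    trace.append(q)
print(trace)
```

The output only rises once both inputs are high and only falls once both are low, which is what lets a C-element wait for several signals to arrive.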
Asynchronous Pipeline
• Handshake
• Nearly delay-insensitive (no timing assumptions)
• Power efficient (no global clock)
• More complex (larger area)
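As a rough illustration of the handshake between two pipeline stages, the sketch below lists the signal events for each transferred word. It assumes a four-phase (return-to-zero) protocol, which the slide does not specify; the names `four_phase_handshake`, `req` and `ack` are hypothetical.

```python
def four_phase_handshake(words):
    """Yield (signal, level, data) events of an assumed four-phase handshake."""
    for w in words:
        yield ("req", 1, w)     # sender: data is valid, raise request
        yield ("ack", 1, w)     # receiver: data captured, raise acknowledge
        yield ("req", 0, None)  # sender: withdraw the request
        yield ("ack", 0, None)  # receiver: ready for the next word


events = list(four_phase_handshake(["A", "B"]))
for e in events:
    print(e)
```

Each word costs four signal transitions around the request/acknowledge loop, so any extra logic placed in that loop slows every transfer.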
Why use an on-chip network?
Bus-Based Multiprocessor System
• A shared communication fabric
• One master at a time
• Bandwidth constrained
• Fixed communication latency
A Mesh Network-on-Chip (NoC)
• Distributed communication resource
• Scalable bandwidth
• Multiple master and slave pairs at a time
• Variable communication latency
[Figure: 3×3 mesh of processors and routers]
The Router for NoC
• 5 ports
• Duplex channels
• Input buffer
• Arbiter
• Crossbar (muxes)
[Figure: router block diagram with North, South and Local ports and arbiter (Arb) blocks]
Data Path of a NoC
[Figure: the 3×3 mesh of processors and routers shown alongside the router block diagram (North, South and Local ports; Arb blocks)]
Why is an asynchronous on-chip network slow?
A 4-bit Synchronous Pipeline
• Data are synchronised by the global clock
• No significant speed difference from a 1-bit pipeline
A 4-bit Asynchronous Pipeline
[Figure: four data bits (inputs d0i to d3i, outputs d0o to d3o) with a single acknowledge pair (acki, acko)]
Reasons for the Low Speed
• Asynchronous pipelines must explicitly detect the arrival of data
• A big C-element tree sits in the handshake loop!
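To see why the tree matters, the sketch below counts how many levels of C-elements are needed to merge one completion signal per bit into a single signal. It assumes the tree is built from C-elements of a fixed fan-in (2 by default); the function name `c_tree_depth` and the chosen widths are illustrative, not taken from the slides.

```python
import math


def c_tree_depth(n_bits: int, fan_in: int = 2) -> int:
    """Levels of fan_in-input C-elements needed to merge n_bits signals into one."""
    depth = 0
    signals = n_bits
    while signals > 1:
        signals = math.ceil(signals / fan_in)  # one tree level merges groups of fan_in
        depth += 1
    return depth


depths = {n: c_tree_depth(n) for n in (1, 4, 32, 128)}
print(depths)
```

The depth grows with the logarithm of the channel width, and every level adds gate delay to each handshake, whereas a 1-bit channel needs no tree at all.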
How can we improve it?
Channel Slicing
[Figure: two 4-bit pipeline diagrams with data inputs d0i to d3i, data outputs d0o to d3o, and acknowledge signals acki and acko]
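One way to picture the benefit of channel slicing is a toy delay model that compares a handshake loop containing a cross-bit C-element tree with the loop of an independent sub-channel. The model and its delay values are hypothetical (arbitrary units) and are not derived from the measured results in these slides.

```python
def loop_delay_synchronised(n_bits: int, stage_delay: float, c_delay: float) -> float:
    """Loop delay when a tree of 2-input C-elements joins all n_bits (toy model)."""
    tree_levels = (n_bits - 1).bit_length()  # ceil(log2(n_bits)) for n_bits >= 1
    return stage_delay + tree_levels * c_delay


def loop_delay_sliced(stage_delay: float) -> float:
    """Loop delay of one independent sub-channel: no cross-bit tree in the loop."""
    return stage_delay


# Hypothetical numbers: a 32-bit channel, stage delay 1.0, C-element delay 0.25.
sync = loop_delay_synchronised(32, 1.0, 0.25)
sliced = loop_delay_sliced(1.0)
print(sync, sliced)
```

In this model the sliced loop delay does not depend on the channel width, while the synchronised one grows with every extra tree level.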
Re-Synchronisation (1)
Re-Synchronisation (2)
Re-Synchronisation (3)
Hardware Implementation
• Verilog HDL + STG (Petrify)
• Layout implementation
• Faraday 130 nm technology
• 12.6K gates (50,000 µm²)
• 0.3 × 0.3 mm²
• Channel sliced: 450 MHz
• Synchronised: 360 MHz
Performance
So, what's next?
Spatial Division Multiplex (SDM)
• Frequent re-synchronisation compromises speed
• Sub-channels should run independently
• Sub-channels could transmit different messages
• Multiple messages could be transmitted over the same channel, each on a different sub-channel
Spatial Division Multiplex (cont.)
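The idea of several messages sharing one channel on different sub-channels can be sketched as a simple allocator. This is an illustrative model only: the class `SdmChannel`, its methods and the first-free allocation policy are assumptions, not the router design proposed in these slides.

```python
class SdmChannel:
    """A channel split into independent sub-channels that messages can claim."""

    def __init__(self, n_sub: int):
        self.owner = [None] * n_sub  # message id currently holding each sub-channel

    def acquire(self, msg_id):
        """Give msg_id the first free sub-channel; return its index, or None if all are busy."""
        for i, current in enumerate(self.owner):
            if current is None:
                self.owner[i] = msg_id
                return i
        return None

    def release(self, msg_id):
        """Free every sub-channel held by msg_id."""
        self.owner = [None if o == msg_id else o for o in self.owner]


ch = SdmChannel(4)
a = ch.acquire("msg A")   # two messages share the channel at the same time
b = ch.acquire("msg B")
ch.release("msg A")
c = ch.acquire("msg C")   # reuses the sub-channel freed by msg A
print(a, b, c, ch.owner)
```

A new message is blocked only when every sub-channel is occupied, rather than whenever the channel carries any message at all.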
Conclusion
• Asynchronous circuits
  – Delay insensitive, low power
• On-chip network
  – Distributed communication fabric, scalable bandwidth
• Asynchronous on-chip network
  – The C-element tree in synchronisation compromises speed
• Channel slicing
  – Let sub-channels run independently, fast
• SDM
  – Let more messages share the fabric simultaneously
Thanks!