Priority-based Wormhole Networks-on-Chip: challenges and opportunities

Leandro Soares Indrusiak
Real-Time Systems Group, Department of Computer Science, University of York, United Kingdom
RTN 2017


  1. Wormhole Networks-on-Chip
     [Figure: 3x3 grid of PEs and routers (R); labels: Core, Router, Link]

  3. Wormhole Networks-on-Chip
     [Figure: router internals: arbitration, routing & transmission control, data in / data out ports towards each neighbour]

  4. Wormhole Networks-on-Chip
     [Figure only]

  5. NoC parallelism and scalability
     • Multiple connections simultaneously
     [Figure: NoC connecting CPUs, I/O and RAM]

  6. NoC performance
     • link contention leads to latency variability
     • task contention leads to latency variability
     [Figure: NoC connecting CPUs, I/O and RAM]

  7. Time predictability in embedded NoCs
     • Ability to guarantee an upper bound on the system’s temporal behaviour
       ◦ worst-case response time of each task
       ◦ worst-case latency of each NoC packet
       ◦ worst-case end-to-end latencies of communicating task chains
     • Ability to constrain the variability of the system’s temporal behaviour
       ◦ limited best/worst case difference
     [Figures: two plots of frequency against time, the first marked with an upper bound]

  8. Wormhole Networks-on-Chip
     [Figure sequence: Packet Header and Packet Data moving between two PEs through routers (R)]

  13. Wormhole Networks-on-Chip
     • packet is blocked
     [Figure: Packet Header and Packet Data between two PEs and routers (R)]

  16. Wormhole Networks-on-Chip
     • new packet released
     [Figure: Packet Header and Packet Data between two PEs and routers (R)]

  21. Performance guarantees in embedded NoCs
     • As core counts increase, NoC link contention tends to be the dominant source of latency variability
     • Current solutions
       ◦ Full traffic separation (i.e. no link contention)
         - deterministic routing, fully disjoint routes (e.g. Hermes)
         - multiple overlay networks (e.g. Tilera): contention over NIs and memory still possible
         - circuit switching (e.g. PNoC): unpredictable circuit setup time
         - very low utilisation
         - state of the art: mixed criticality, virtual traffic separation

  22. Performance guarantees in embedded NoCs
     • As core counts increase, NoC link contention tends to be the dominant source of latency variability
     • Current solutions
       ◦ Virtual traffic separation
         - time-division multiplexing (TDM): fixed traffic slotting (e.g. Aethereal, AElite)
         - round-robin (RR): rate controlling (e.g. Kalray, Nostrum, IDAMC)
         - fixed-priority (FP): priority-arbitrated virtual channels (e.g. QNoC)

  23. Priority preemptive virtual channels
     • Wormhole NoCs using virtual channels with priority preemptive arbitration can discriminate packets of different levels of urgency
     • Matches previous work on schedulability analysis in priority-preemptive wormhole networks

  24. Priority preemptive virtual channels
     [Figure: router with virtual channels; routing & transmission control forwards the priority ID of the highest priority with remaining credit; signals: data_in, data_out, credit_in, credit_out]
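
The selection rule labelled in the figure (highest priority with remaining credit) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the router's hardware design: the `VirtualChannel` class, its fields, and the convention that a lower number means a higher priority are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional


@dataclass
class VirtualChannel:
    """Hypothetical model of one virtual channel at a router output port."""
    priority: int   # assumption: lower value = higher priority
    credits: int    # free buffer slots advertised by the downstream router
    flits: Deque[str] = field(default_factory=deque)  # flits waiting to be sent


def arbitrate(vcs: List[VirtualChannel]) -> Optional[VirtualChannel]:
    """Pick the highest-priority VC that has a flit to send and remaining credit."""
    ready = [vc for vc in vcs if vc.flits and vc.credits > 0]
    return min(ready, key=lambda vc: vc.priority) if ready else None


def send_one_flit(vcs: List[VirtualChannel]) -> Optional[str]:
    """Forward at most one flit on the link in this cycle, consuming one credit."""
    winner = arbitrate(vcs)
    if winner is None:
        return None
    winner.credits -= 1
    return winner.flits.popleft()


# A low-priority packet is mid-transfer when a high-priority packet is released.
low = VirtualChannel(priority=3, credits=2, flits=deque(["L-data1", "L-data2"]))
high = VirtualChannel(priority=1, credits=2)
trace = [send_one_flit([low, high])]      # only the low-priority VC is ready
high.flits.extend(["H-head", "H-data"])   # high-priority packet released
trace.append(send_one_flit([low, high]))  # link is preempted at flit granularity
trace.append(send_one_flit([low, high]))
trace.append(send_one_flit([low, high]))  # low-priority packet resumes
```

Because arbitration is repeated for every flit, the low-priority packet keeps its buffered flits and simply resumes once the high-priority packet has passed or has run out of credit.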

  25. Priority preemptive virtual channels
     • wormhole NoC with priority preemptive virtual channels
     [Figure sequence: Packet Header and Packet Data moving between two PEs through routers (R)]

  27. Priority preemptive virtual channels
     • wormhole NoC with priority preemptive virtual channels
     • high priority packet released
     [Figure: Packet Header and Packet Data between two PEs and routers (R)]

  30. Priority preemptive virtual channels
     • wormhole NoC with priority preemptive virtual channels
     • first packet is preempted
     [Figure: Packet Header and Packet Data between two PEs and routers (R)]

  38. Priority-preemptive wormhole NoCs: Pros vs Cons

  39. Priority-preemptive wormhole NoCs
     • Cons
       ◦ not available as COTS

  40. Priority-preemptive wormhole NoCs
     • Cons
       ◦ hardware overhead related to virtual channel buffering and arbitration
     [Figure: Xilinx Artix FPGA]
     B. Sudev, L. S. Indrusiak: Low overhead predictability enhancement in non-preemptive network-on-chip routers using Priority Forwarded Packet Splitting. ReCoSoC 2014.

  41. Priority-preemptive wormhole NoCs
     • Cons
       ◦ hardware overhead related to virtual channel buffering and arbitration
     [Figure: Xilinx Artix FPGA; simple round-robin, no traffic shaping] [Sudev & Indrusiak, ReCoSoC 2014]

  42. Priority-preemptive wormhole NoCs
     • Cons
       ◦ hardware overhead related to virtual channel buffering and arbitration
     [Figure: Xilinx Artix FPGA; priority non-preemptive arbitration] [Sudev & Indrusiak, ReCoSoC 2014]
     [Open problem alert]

  43. Priority-preemptive wormhole NoCs
     • Cons
       ◦ hardware overhead related to virtual channel buffering and arbitration
     [Figure: Xilinx Artix FPGA; priority preemptive arbitration, 4 VCs with 2 position buffers each] [Sudev & Indrusiak, ReCoSoC 2014]

  44. Priority-preemptive wormhole NoCs
     • Pros
       ◦ notion of priorities is very intuitive and natural
       ◦ no waste of bandwidth through reservation mechanisms
       ◦ amenable to tight analysis methods (more on this later)
       ◦ virtual separation of traffic
       ◦ accommodates change in traffic properties (periods, packet sizes, jitter)

  45. Priority-preemptive wormhole NoCs
     • Pros
       ◦ simple protocols to handle mixed-criticality traffic
       ◦ after a mode change, routers arbitrate links in criticality order, and in priority order within the same criticality
     [Figure: grid of routers (R) and cores (C) propagating a mode change notification]
     L. S. Indrusiak, J. Harbin, A. Burns: Average and Worst-Case Latency Improvements in Mixed-Criticality Wormhole Networks-on-Chip. ECRTS 2015.
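
The arbitration order after a mode change can be expressed as a two-level sort key. The sketch below is a hedged illustration, not the protocol from the cited paper: the `Packet` tuple and its encodings (higher `criticality` value = more critical, lower `priority` value = higher priority) are hypothetical, and arbitrating purely by priority before the mode change is an assumption.

```python
from typing import List, NamedTuple


class Packet(NamedTuple):
    """Hypothetical packet descriptor used only for this example."""
    name: str
    priority: int      # assumption: lower value = higher priority
    criticality: int   # assumption: higher value = more critical


def arbitration_order(packets: List[Packet], mode_changed: bool) -> List[Packet]:
    """Order in which contending packets would be granted a link."""
    if mode_changed:
        # Criticality first, then priority within the same criticality.
        return sorted(packets, key=lambda p: (-p.criticality, p.priority))
    # Assumed normal mode: plain priority order.
    return sorted(packets, key=lambda p: p.priority)


contenders = [Packet("a", 1, 0), Packet("b", 2, 1), Packet("c", 3, 1)]
normal = [p.name for p in arbitration_order(contenders, mode_changed=False)]
after_change = [p.name for p in arbitration_order(contenders, mode_changed=True)]
```

In this example the low-criticality packet `a` wins the link in normal mode but is served last once the mode change has been signalled.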

  46. Priority-preemptive wormhole NoCs: Pros vs Cons

  47. Outline
     • Wormhole Networks
     • Networks-on-Chip
     • Real-Time Analysis
     • Resource Management

  48. Performance evaluation
     • How to estimate performance figures for a particular application mapped to a Network-on-Chip?
       ◦ full system prototyping
         - cores + NoC in FPGA, running OS + application
         - extremely costly setup time, can only explore few design alternatives
       ◦ accurate system simulation
         - cycle-accurate model of cores + NoC, running OS + application
         - extremely long simulation time, can only explore few design alternatives
       ◦ approximately-timed system simulation
         - approximately-timed model of cores + NoC, executing an abstract model of the OS + application
       ◦ analytical system performance models
         - average or worst-case latency estimation for restricted application styles (periodic independent tasks, synchronous dataflow, etc.)

  50. Real-Time Analysis
     • First approaches to analyse priority-preemptive wormhole networks came during the 90s
       ◦ Mutka (1994)
       ◦ Hary and Ozguner (1997)
     • Key idea is to consider the entire path of a packet as a single shared resource
       ◦ worst-case latency bound of a packet flow can be found by analysing the higher priority packet flows that share at least one link of its route
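
The set of flows that this key idea asks us to analyse (often called the direct interference set) can be computed directly from the routes. The following is a minimal sketch: the function name, the representation of a route as a set of directed links, the example routes, and the convention that a lower number means a higher priority are assumptions made for illustration.

```python
from typing import Dict, FrozenSet, Set, Tuple

Link = Tuple[str, str]  # directed link: (from router, to router)


def direct_interference_set(
    flow: str,
    routes: Dict[str, FrozenSet[Link]],
    priority: Dict[str, int],
) -> Set[str]:
    """Higher-priority flows whose route shares at least one link with `flow`."""
    return {
        other
        for other in routes
        if other != flow
        and priority[other] < priority[flow]  # assumption: lower value = higher priority
        and routes[other] & routes[flow]      # non-empty intersection = shared link
    }


# Hypothetical routes, not the ones drawn on the slides.
routes = {
    "t1": frozenset({("r1", "r2"), ("r2", "r3")}),
    "t2": frozenset({("r2", "r3"), ("r3", "r4")}),
    "t3": frozenset({("r3", "r4"), ("r4", "r5")}),
    "t4": frozenset({("r6", "r7")}),
}
priority = {"t1": 1, "t2": 2, "t3": 3, "t4": 4}
direct = {f: direct_interference_set(f, routes, priority) for f in routes}
```

Here `t3` shares a link only with `t2`, so `t1` does not appear in its direct interference set even though `t1` has a higher priority.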

  51. Real-Time Analysis
     [Figure: routes of flows t1, t2, t3 and t4 over a 3x3 grid of PEs and routers (R), with the corresponding interference graph]
     pri(t1) > pri(t2) > pri(t3) > pri(t4)

  52. Real-Time Analysis
     • Kim et al (1998) recognised that direct interferences are not enough to produce correct upper bounds
     • Indirect interference must be considered, in order to take into account back-to-back hits caused by upstream indirect interference
     [Figure: routes of flows t1, t2 and t3 over a 3x3 grid of PEs and routers (R)]
     pri(t1) > pri(t2) > pri(t3)
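
Given the direct interference sets, the flows that can interfere indirectly with a flow can be collected as below. This is a sketch under one common simplification, which is an assumption here: a flow counts as an indirect interferer of `flow` if it directly interferes with one of `flow`'s direct interferers without directly interfering with `flow` itself. The function name and the example sets are hypothetical.

```python
from typing import Dict, Set


def indirect_interference_set(flow: str, direct: Dict[str, Set[str]]) -> Set[str]:
    """Flows that hit a direct interferer of `flow` but never share a link with `flow`."""
    via_interferers: Set[str] = set()
    for j in direct[flow]:
        via_interferers |= direct[j]
    return via_interferers - direct[flow] - {flow}


# Hypothetical direct interference sets for pri(t1) > pri(t2) > pri(t3):
# t1 shares a link with t2, and t2 shares a link with t3.
direct = {"t1": set(), "t2": {"t1"}, "t3": {"t2"}}
indirect_t3 = indirect_interference_set("t3", direct)

# If t1 also shared a link with t3, it would be a direct interferer instead.
direct_alt = {"t1": set(), "t2": {"t1"}, "t3": {"t1", "t2"}}
indirect_t3_alt = indirect_interference_set("t3", direct_alt)
```

In the first case `t1` can delay `t2`, which may cause `t2` to hit `t3` back-to-back, so an analysis that only looks at `direct["t3"]` would miss part of the interference.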

  53. Real-Time Analysis
     • With the introduction of Networks-on-Chip in the 2000s, the approach of Kim et al was revisited by Lu et al (ASP DAC 2005)
       ◦ aiming to provide upper bounds to sporadic packets over NoCs with priority preemptive virtual channels
       ◦ flawed assumption of a critical instant where all packets start flowing simultaneously

  54. Real-Time Analysis
     • Shi and Burns (NOCS 2008) corrected the flaw in Lu et al and produced a response time formulation that uses a conservative approach to upstream indirect interference
       ◦ interference jitter: $J_j^I = R_j - L_j$
     [Open problem alert]
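
A response time formulation with an interference jitter term can be solved by fixed-point iteration. The sketch below follows the form in which the Shi and Burns recurrence is usually stated, R_i = L_i + Σ_j ⌈(R_i + J_j + J_j^I) / T_j⌉ · L_j over the direct interferers j of i. Several things are assumptions here: L is read as the no-contention latency of a flow, T as its period, J as its release jitter, and the interference jitter R_j - L_j is applied only to interferers that themselves suffer interference from flows not directly interfering with i. All names and numbers are hypothetical. Later slides report that this formulation can be optimistic when downstream indirect interference occurs, so the code illustrates the recurrence and is not a safe bound.

```python
from typing import Dict, Optional, Set


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for positive b."""
    return -(-a // b)


def response_time(
    i: str,
    L: Dict[str, int],            # assumed: no-contention latency of each flow
    T: Dict[str, int],            # assumed: period of each flow
    J: Dict[str, int],            # assumed: release jitter of each flow
    direct: Dict[str, Set[str]],  # direct interference sets
    R: Dict[str, int],            # response times already found for higher priorities
    deadline: int,
) -> Optional[int]:
    """Iterate the recurrence to a fixed point; None if it exceeds the deadline."""
    r = L[i]
    while True:
        total = L[i]
        for j in direct[i]:
            # Interference jitter only if j is itself delayed by flows unseen by i.
            jitter_i = R[j] - L[j] if direct[j] - direct[i] else 0
            total += ceil_div(r + J[j] + jitter_i, T[j]) * L[j]
        if total > deadline:
            return None
        if total == r:
            return r
        r = total


L = {"t1": 10, "t2": 20, "t3": 30}
T = {"t1": 100, "t2": 200, "t3": 400}
J = {"t1": 0, "t2": 0, "t3": 0}
direct = {"t1": set(), "t2": {"t1"}, "t3": {"t2"}}

R: Dict[str, int] = {}
for flow in ["t1", "t2", "t3"]:  # highest priority first
    result = response_time(flow, L, T, J, direct, R, deadline=T[flow])
    assert result is not None
    R[flow] = result
```

Flows are processed from highest to lowest priority so that R_j is already known when the interference jitter of j is needed.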

  55. Real-Time Analysis
     • Several lines of work were derived from Shi and Burns 2008
       ◦ highly cited: 145 (Google Scholar)
       ◦ many works on priority assignment and task mapping
       ◦ a few on analysis improvement, aiming to make it tighter
         - Nikolic et al (arxiv 2016) considered that the interference should not be calculated based on the full path, but on the contention domain
         - Kashif et al (Trans Comp 2015) attempted to analyse packet paths in a link-by-link manner, but assumed infinite buffering (i.e. did not consider backpressure)
         - Kashif and Patel (RTAS 2016) attempted to consider buffering and backpressure effects
         - all of them upper-bounded by Shi and Burns 2008

  56. Real-Time Analysis
     • Xiong et al (GLSVLSI 2016) made two key contributions
       ◦ new formulation of the upstream indirect interference problem, aiming to be tighter than Shi and Burns 2008
       ◦ new formulation of the downstream indirect interference problem, aiming to capture a previously unseen issue, and showing that Shi and Burns 2008 is optimistic and unsafe (and so are all the analyses upper-bounded by it)

  57. Real-Time Analysis
     • Indrusiak et al (arxiv 2016) showed that
       ◦ Xiong et al’s formulation of the upstream indirect interference problem was flawed
       ◦ Xiong et al’s formulation of the downstream indirect interference problem was correct, but unnecessarily pessimistic (i.e. it treated all indirect interference as if it were direct interference)
       ◦ a tighter upper bound that considers the downstream indirect interference problem is possible
     • Xiong et al published a corrected analysis in IEEE Trans Comp in 2017
     [Open problem alert]
