DUAL CONGESTION AWARENESS SCHEME IN ON-CHIP NETWORKS
DEPARTMENT OF INFORMATION TECHNOLOGY
Masoumeh Ebrahimi, Masoud Daneshtalab, Juha Plosila, Hannu Tenhunen
MANY-CORE EMBEDDED SYSTEMS
[Figure: a 4×4 mesh of cores, each attached through a network interface (NI) to a router (R). The labelled blocks are the NI (input packetizer, input depacketizer, PE controller), the processor and memory, and the router (input/output ports, routing unit, arbiter, crossbar).]
MICRO-SLICE SERVERS: Tilera S2Q
• 4U, each U with 2 × 64 cores: 512 cores in total
AXIM: AXI-BASED MANY-CORE EMBEDDED SYSTEM
ICCD 2012 & FPL 2012 & ReCoSoC 2012
• BIO-based applications: SAXPY/GAXPY, Cholesky, Jacobi, QR factorization
• Web-server accelerator
• Stream applications: H264, transcoder
[Figure: AXIM on FPGA, a prototype layout on a Virtex-6 FPGA with 25 MIPS-like PEs. An embedded controller (MicroBlaze + Linux), a master Plasma PE and slave Plasma PEs (each running Linux) connect through AXI interfaces and AXI buses to a Compact FLASH I/O controller, a DDR memory controller, an Ethernet PHY/controller and a PCI-Express PHY/controller. An example task graph has tasks t0–t5 with edge weights w0,1 = 5, w0,2 = 4, w1,4 = 6, w2,4 = 7, w2,3 = 3, w4,5 = 6, w3,5 = 4.]
AXIM: AXI-BASED MANY-CORE EMBEDDED SYSTEM
[Figure: a cluster of PEs (a master PE and slave PEs 1 to N) with a system process and a task-mapping algorithm, an OS, micro-kernels, and user processes running application tasks (e.g., App 1 – Task 1, App 2 – Task 5, App 3 – Task 4). The PEs communicate over the NoC communication infrastructure and draw applications 1 to K and data sets 1 to M from an application memory repository (DRAM & FLASH). The FPGA prototype layout and the t0–t5 task graph of the previous slide are repeated.]
AXIM: AXI-BASED MANY-CORE EMBEDDED SYSTEM: NETWORK-ON-CHIP
[Figure: prototype layout on a Virtex-6 FPGA with 25 MIPS-like PEs, together with the example task graph t0–t5.]
CONGESTION IN NETWORKS-ON-CHIP
• Congestion is one of the main factors limiting the performance of Networks-on-Chip.
• Routing algorithms play an important role in distributing the traffic load over the network by providing alternative routing paths.
• However:
  • Most existing works avoid congestion by considering the traffic condition of the forward paths and delivering packets through less congested paths; they do not consider the impact of router arbitration on traffic distribution.
  • Traditional methods usually consider traffic at the node level rather than at the region level.
INPUT SELECTION FUNCTIONS
The performance and efficiency of NoCs largely depend on the output-selection and input-selection functions.
• Input selection function
  • Chooses which of the input channels gets access to the output channel; this is done by an arbitration process.
  • The arbiter can follow either a non-priority or a priority scheme.
  • In the priority scheme, when multiple input ports request the same available output port, the arbiter grants access to the input port with the highest priority level.
  • A priority scheme can help flatten network congestion by giving a higher priority level to traffic coming from congested areas.
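To make the two arbitration schemes concrete, here is a minimal sketch of a single output port's arbiter. Round robin stands in for a non-priority scheme, and the priority arbiter grants the requesting input with the highest priority level. The function names and the lowest-index tie-break are hypothetical choices for illustration, not taken from the slides.

```python
from typing import Optional, Sequence


def round_robin_arbiter(requests: Sequence[bool], last_grant: int) -> Optional[int]:
    """Non-priority scheme: grant the first requester after the last winner."""
    n = len(requests)
    for offset in range(1, n + 1):
        port = (last_grant + offset) % n
        if requests[port]:
            return port
    return None  # no input port is requesting the output


def priority_arbiter(requests: Sequence[bool], priorities: Sequence[int]) -> Optional[int]:
    """Priority scheme: grant the requesting input port with the highest priority.

    Ties go to the lowest port index (a hypothetical tie-break rule).
    """
    best = None
    for port, (req, prio) in enumerate(zip(requests, priorities)):
        if req and (best is None or prio > priorities[best]):
            best = port
    return best
```

For example, with requests `[True, False, True, True]` and priorities `[2, 9, 5, 5]`, the priority arbiter grants port 2: port 1 has the highest level but is not requesting, and ports 2 and 3 tie.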
OUTPUT SELECTION FUNCTIONS
• Output selection function
  • Determines, using a routing algorithm, which output channel a packet arriving from an input channel should take.
  • Output selection functions can be classified as either congestion-oblivious or congestion-aware.
• Congestion-oblivious schemes (static/deterministic)
  • Decisions are independent of the network's congestion condition, as in dimension-order routing.
  • This policy cannot balance the load, since the network status is not considered.
• Congestion-aware schemes (dynamic/adaptive)
  • Usually consider the local traffic condition when choosing the output channel.
  • Routing decisions based only on local congestion information may lead to an unbalanced distribution of traffic load.
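The contrast between the two classes can be sketched on a 2D mesh. XY routing is a congestion-oblivious, dimension-order scheme. The adaptive variant picks, among the minimal output ports, the one whose downstream input buffer holds the fewest occupied cells, using local information only. Coordinates with y increasing northward and the port names are assumptions for this sketch. A real fully adaptive router also needs a deadlock-avoidance mechanism (e.g., virtual channels or a turn model), which is omitted here.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]  # (x, y); y grows toward "N" (assumed convention)


def xy_select(cur: Coord, dst: Coord) -> str:
    """Congestion-oblivious dimension-order routing: correct X first, then Y."""
    (cx, cy), (dx, dy) = cur, dst
    if dx != cx:
        return "E" if dx > cx else "W"
    if dy != cy:
        return "N" if dy > cy else "S"
    return "LOCAL"


def minimal_ports(cur: Coord, dst: Coord) -> List[str]:
    """All output ports that move the packet closer to its destination."""
    (cx, cy), (dx, dy) = cur, dst
    ports = []
    if dx != cx:
        ports.append("E" if dx > cx else "W")
    if dy != cy:
        ports.append("N" if dy > cy else "S")
    return ports


def local_adaptive_select(cur: Coord, dst: Coord, occupancy: Dict[str, int]) -> str:
    """Congestion-aware: choose the minimal port whose neighbour buffer is least full."""
    ports = minimal_ports(cur, dst)
    if not ports:
        return "LOCAL"
    return min(ports, key=lambda p: occupancy[p])
```

From (1, 1) to (3, 0), XY routing always returns "E", whereas the adaptive function returns "S" when the south neighbour's buffer is emptier. Because only the immediate neighbours are inspected, the adaptive choice can still steer packets into congestion a few hops away, which is the shortcoming noted above.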
TRADITIONAL METHODS: ANoC
• ANoC: Congestion-Aware Selection Method in Agent-based Network-on-Chip (VLSI-SoC 2011)
  • In the Agent-based Network-on-Chip (ANoC) structure, the network is divided into several clusters, each consisting of four routers and a cluster agent. The structure comprises a data network and a congestion network.
• Advantages and shortcomings
  • It provides a wider view of network congestion by considering groups of nodes (regions) rather than single nodes.
  • It uses only the congestion condition of forward paths in the routing decision.
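The region-level view that a cluster agent provides can be sketched as follows. Each agent condenses its four routers' state into one value, and the routing decision compares the values reported for the downstream clusters, so only forward paths are considered. The slide does not specify ANoC's congestion metric; summing occupied buffer cells is a hypothetical stand-in, and both function names are illustrative.

```python
from typing import Dict, List


def cluster_agent_report(router_occupancy: List[int]) -> int:
    """Region-level congestion value for one cluster of four routers.

    Hypothetical metric: total occupied buffer cells across the cluster.
    """
    if len(router_occupancy) != 4:
        raise ValueError("an ANoC cluster has four routers")
    return sum(router_occupancy)


def pick_less_congested(candidates: Dict[str, List[int]]) -> str:
    """Forward-path decision: choose the output port leading to the
    downstream cluster with the lowest reported congestion."""
    return min(candidates, key=lambda port: cluster_agent_report(candidates[port]))
```

Here `pick_less_congested({"E": [3, 1, 0, 2], "N": [1, 1, 1, 1]})` chooses "N" (total 4 versus 6), even though one router in the eastern cluster is idle. This is the regional, rather than per-node, view described above.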
TRADITIONAL METHODS: CATRA
• CATRA: Congestion Aware Trapezoid-based Routing Algorithm (DATE 2012)
  • CATRA collects and uses congestion information from just enough nodes in the network.
  • CATRA calculates the probability of packets passing through each node. Based on this measure, the nodes with high passing probabilities form trapezoid-shaped regions.
• Advantages and shortcomings
  • It efficiently chooses groups of nodes based on the passing probability of packets through them.
  • It involves only the congestion status of forward paths in the routing decision.
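The slide does not give CATRA's formula for the passing probability. One simple model assumes that every minimal path between source and destination is equally likely. The probability of visiting a node is then the number of minimal paths through it divided by the total number of minimal paths, both of which are binomial coefficients. The sketch below implements that assumed model, not CATRA's published computation.

```python
from math import comb
from typing import Tuple

Coord = Tuple[int, int]


def passing_probability(src: Coord, dst: Coord, node: Coord) -> float:
    """Probability that a packet from src to dst visits node, assuming all
    minimal paths are equally likely (hypothetical model)."""
    (sx, sy), (dx, dy), (nx, ny) = src, dst, node
    # Minimal paths never leave the bounding box spanned by src and dst.
    if not (min(sx, dx) <= nx <= max(sx, dx) and min(sy, dy) <= ny <= max(sy, dy)):
        return 0.0
    hx, hy = abs(dx - sx), abs(dy - sy)  # total hops in X and Y
    ax, ay = abs(nx - sx), abs(ny - sy)  # hops from src to node
    total = comb(hx + hy, hx)
    # Paths src -> node multiplied by paths node -> dst.
    through = comb(ax + ay, ax) * comb((hx - ax) + (hy - ay), hx - ax)
    return through / total
```

For a packet from (0, 0) to (2, 2), the centre node (1, 1) lies on 4 of the 6 minimal paths (probability 2/3), while the corner (2, 0) lies on only 1. Thresholding such values over a source's reachable area yields a wedge of high-probability nodes, which illustrates why the selected regions take a trapezoid shape.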
TRADITIONAL METHODS: GLB
• GLB: Global Load Balancing Method (ReCoSoC 2012)
  • Packets aggregate and carry congestion information along the path they traverse, giving a global view of the path from which a packet arrives.
  • This global congestion information is used as a metric in the arbitration unit to give higher priority to packets arriving from congested areas.
[Figure: 5×5 mesh (nodes 0–24) divided into congestion quadrants NW, NE, SW and SE, with example nodes A, B, C and D.]
• Advantages and shortcomings
  • It provides global congestion information.
  • It pays more attention to the congestion information of backward paths and less to forward paths.
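The GLB idea of packets carrying aggregated congestion information that the arbiter then uses can be sketched in a few lines. The slide does not define the aggregated metric, so this sketch assumes, hypothetically, that each router adds its local buffer occupancy to a counter in the packet header. `Packet`, `record_hop` and `glb_arbitrate` are illustrative names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Packet:
    dst: int
    path_congestion: int = 0  # hypothetical aggregated metric carried in the header


def record_hop(pkt: Packet, local_occupancy: int) -> None:
    """At each router on the path, fold the local congestion into the packet."""
    pkt.path_congestion += local_occupancy


def glb_arbitrate(contenders: List[Packet]) -> int:
    """Grant the contender that has travelled through the most congested
    (backward) path; ties go to the first contender."""
    return max(range(len(contenders)), key=lambda i: contenders[i].path_congestion)
```

A packet that crossed routers with occupancies 3 and 4 (total 7) wins arbitration against one that crossed a single router with occupancy 6. Only the path behind each packet influences the decision, which is why GLB emphasises backward paths.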
SUMMARY ON TRADITIONAL METHODS
• ANoC and CATRA are region-based methodologies that consider only forward paths.
• GLB considers backward paths and is based on the global congestion status.
• Characteristics of DuCA (Dual Congestion Awareness):
  • It is a region-based approach, providing a wider view of network traffic.
  • It is a scalable approach that does not require tables.
  • It considers the congestion information of both forward and backward paths, in the routing decision (output selection) and in the arbitration unit (input selection).
INPUT SELECTION FUNCTION OF DuCA
• The input selection function examines the priority values of competing packets and gives priority to packets coming from congested clusters.
• To prevent starvation, each time the highest value is found, the priorities of the defeated packets are incremented.
[Figure: 6×6 mesh (nodes 0–35) partitioned into nine 2×2 clusters numbered 0–8.]
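The selection-plus-aging rule on this slide can be sketched as one arbitration round. The input with the highest priority value wins, and every defeated input's priority is incremented so it cannot starve. The increment step of 1, the in-place list update and the lowest-index tie-break are assumptions; how DuCA derives the initial value from the source cluster's congestion is not modelled here.

```python
from typing import List


def duca_input_select(priorities: List[int]) -> int:
    """One arbitration round: pick the highest priority value, then age the losers.

    priorities[i] is the priority value of the packet waiting at input i
    (in DuCA it reflects the congestion of the packet's source cluster).
    The list is updated in place: each defeated packet gains +1 (assumed step).
    """
    winner = max(range(len(priorities)), key=lambda i: priorities[i])
    for i in range(len(priorities)):
        if i != winner:
            priorities[i] += 1
    return winner
```

In the following usage, input 0 keeps receiving packets from a congested cluster (hypothetical priority 5). The packet at input 2 starts lower but is aged until it wins in the third round.

```python
prios = [5, 2, 4]
order = []
for _ in range(3):
    w = duca_input_select(prios)
    order.append(w)
    prios[w] = 5  # hypothetical: the next packet at the winning input has priority 5
# order == [0, 0, 2]
```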
OUTPUT SELECTION FUNCTION OF DuCA
• The output selection function chooses either the X or the Y dimension based on the congestion condition of the neighboring clusters.
• If the destination is located in the same row or column as the neighboring cluster, only the number of occupied buffer cells in the corresponding input buffers of the neighboring nodes is considered.
• Otherwise, the congestion levels of the two neighboring clusters are compared with each other.
[Figure: 6×6 mesh (nodes 0–35) partitioned into nine 2×2 clusters numbered 0–8.]
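Below is a sketch of this output selection rule on the 6×6 mesh with 2×2 clusters shown in the figure, assuming row-major numbering from the bottom-left (nodes 0–5 in the bottom row, 30–35 in the top). The rule is interpreted as follows, which is our reading and not the authors' code. If the destination's cluster shares a cluster row or column with the current cluster, and hence with the neighboring cluster in that dimension, the next-hop buffer occupancies are compared. Otherwise the congestion levels of the X-neighbor and Y-neighbor clusters are compared. How cluster congestion levels are computed is left as an input.

```python
from typing import Dict, Tuple

Coord = Tuple[int, int]  # (x, y), node (0, 0) = node 0 at the bottom-left
MESH = 6                 # 6x6 mesh, as in the figure
CLUSTER = 2              # 2x2 routers per cluster -> 3x3 clusters (ids 0..8)


def node_id(c: Coord) -> int:
    return c[1] * MESH + c[0]


def cluster_xy(c: Coord) -> Coord:
    return (c[0] // CLUSTER, c[1] // CLUSTER)


def cluster_id(c: Coord) -> int:
    kx, ky = cluster_xy(c)
    return ky * (MESH // CLUSTER) + kx


def sign(v: int) -> int:
    return (v > 0) - (v < 0)


def duca_output_select(cur: Coord, dst: Coord,
                       buf_occ: Dict[str, int],
                       cluster_cong: Dict[int, int]) -> str:
    """Choose 'X' or 'Y' for a minimal route (hypothetical reading of DuCA).

    buf_occ['X'] / buf_occ['Y']: occupied cells in the input buffer of the
    next-hop node in that dimension; cluster_cong[id]: congestion of cluster id.
    """
    sx, sy = sign(dst[0] - cur[0]), sign(dst[1] - cur[1])
    if sx == 0 and sy == 0:
        return "LOCAL"
    if sy == 0:
        return "X"  # only one minimal dimension remains
    if sx == 0:
        return "Y"
    (kx, ky), (tx, ty) = cluster_xy(cur), cluster_xy(dst)
    if tx == kx or ty == ky:
        # Destination shares a cluster row/column: local buffer occupancy only.
        return "X" if buf_occ["X"] <= buf_occ["Y"] else "Y"
    # Otherwise compare the neighbouring clusters in the X and Y directions.
    n = MESH // CLUSTER
    cong_x = cluster_cong[ky * n + (kx + sx)]
    cong_y = cluster_cong[(ky + sy) * n + kx]
    return "X" if cong_x <= cong_y else "Y"
```

From node 0 toward node 35, the choice depends on clusters 1 and 3. With cluster 1 more congested than cluster 3, the packet takes the Y dimension even if the next-hop X buffer is nearly empty. Clusters never enter the decision when the destination is two clusters up in the same cluster column.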