Design of Bandwidth Aware and Congestion Avoiding Efficient Routing Algorithms for NoCs Platforms
M. Palesi (1), G. Longo (1), S. Signorino (1), R. Holsmark (2), S. Kumar (2), V. Catania (1)
(1) DIIT, University of Catania, Italy
(2) Jönköping University, Sweden
{mpalesi, vcatania}@diit.unict.it
{Rickard.Holsmark, Shashi.Kumar}@jth.hj.se
NoCS 2008, Newcastle University, UK – 7th-11th April 2008

Outline
- Motivation
- Application specific scenario
- Bandwidth aware routing algorithm
- Experiments and results
- Architectural implications
- Conclusions

Limitations of Current Routing Algorithms
- Efforts biased toward performance
- Side effects like congestion ignored
  - Estimation and control of congestion is difficult in general
  - Partially tackled by the selection function
- Designed only for specific network topologies
[Figure: buffer occupation; the network topology is the input to routing algorithm design.]

Application Specific Scenario
- Information available about
  - Tasks which communicate and tasks which never communicate
  - After task mapping: network nodes which communicate
  - Concurrent/non-concurrent communications
  - Communication bandwidth requirements for different pairs
- Many opportunities
  - Improve performance (e.g., maximize routing adaptivity)
  - Simplify the estimation/control of congestion
  - Design more effective selection policies
[Figure: communication graph of tasks T1, T2, T3, T4, ..., Tn; the network topology and the application specification are the inputs to application specific (AS) routing algorithm design.]

Channel Dependency Graph
[Figure: topology graph of a 2x3 mesh (nodes P1-P6; channels l12, l21, l23, l32, l14, l41, l25, l52, l36, l63, l45, l54, l56, l65) next to its channel dependency graph, whose vertices are those channels.]

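The construction in the figure can be sketched in a few lines. This is only an illustration under stated assumptions: channels are modeled as (source, destination) node pairs, every turn except a U-turn is allowed, and the helper names `channels`, `build_cdg` and `has_cycle` are hypothetical, not taken from the slides.

```python
from itertools import product

ROWS, COLS = 2, 3  # 2x3 mesh: P1 P2 P3 on the top row, P4 P5 P6 below


def node(r, c):
    """Node number (1-based) of the router at row r, column c."""
    return r * COLS + c + 1


def channels():
    """All unidirectional channels (u, v) between adjacent mesh nodes."""
    chans = []
    for r, c in product(range(ROWS), range(COLS)):
        for dr, dc in ((0, 1), (0, -1), (1, 0), (-1, 0)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < ROWS and 0 <= cc < COLS:
                chans.append((node(r, c), node(rr, cc)))
    return chans


def build_cdg(chans):
    """Dependency a -> b when b starts where a ends (U-turns excluded)."""
    return {a: [b for b in chans if b[0] == a[1] and b[1] != a[0]] for a in chans}


def has_cycle(graph):
    """Depth-first search with three colors; a grey successor closes a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {v: WHITE for v in graph}

    def visit(v):
        color[v] = GREY
        for w in graph[v]:
            if color[w] == GREY or (color[w] == WHITE and visit(w)):
                return True
        color[v] = BLACK
        return False

    return any(color[v] == WHITE and visit(v) for v in graph)


chans = channels()
cdg = build_cdg(chans)
print(len(chans), sum(len(v) for v in cdg.values()), has_cycle(cdg))
```

With all turns allowed the graph contains cycles (for example l12, l25, l54, l41), which is why a routing function must remove some dependencies to be deadlock free.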
Application Specific CDG
- Communications: P1 → P3, P4 → P3, P1 → P6
[Figure: communication graph (tasks T1-T6), topology graph (nodes P1-P6) and channel dependency graph; the dependencies relevant to the listed communications are marked on the channel dependency graph.]

Application Specific CDG
[Figure: communication graph (tasks T1-T6), topology graph (nodes P1-P6) and the resulting application specific channel dependency graph.]

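A rough sketch of how an application specific CDG could be derived for the 2x3 mesh. It assumes the three communications P1 → P3, P4 → P3 and P1 → P6 and considers minimal paths only; both the minimal-path restriction and the helper names are assumptions made for illustration, not the authors' implementation.

```python
COLS = 3  # 2x3 mesh, nodes numbered P1..P6 row by row


def coords(p):
    """(row, column) of node p."""
    return divmod(p - 1, COLS)


def minimal_paths(src, dst):
    """All minimal paths from src to dst, each a list of channels (u, v)."""
    if src == dst:
        return [[]]
    (r, c), (rd, cd) = coords(src), coords(dst)
    steps = []
    if c != cd:
        steps.append(src + (1 if cd > c else -1))
    if r != rd:
        steps.append(src + (COLS if rd > r else -COLS))
    return [[(src, n)] + tail for n in steps for tail in minimal_paths(n, dst)]


def ascdg(communications):
    """Dependencies (pairs of consecutive channels) used by some path."""
    deps = set()
    for src, dst in communications:
        for path in minimal_paths(src, dst):
            deps.update(zip(path, path[1:]))
    return deps


def is_acyclic(deps):
    """Repeatedly drop edges whose source channel has no incoming edge."""
    deps = set(deps)
    while deps:
        heads = {b for _, b in deps}
        removable = {(a, b) for a, b in deps if a not in heads}
        if not removable:
            return False
        deps -= removable
    return True


comms = [(1, 3), (4, 3), (1, 6)]  # assumed: P1->P3, P4->P3, P1->P6
deps = ascdg(comms)
print(len(deps), is_acyclic(deps))
```

Under these assumptions only a subset of the dependencies of the full mesh survives, and that subset happens to be acyclic, so no path has to be given up for deadlock freedom in this small case.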
APSRA Design Methodology
- Inputs: the application's communication graph (tasks T1, ..., Tn to be mapped), the network topology (nodes P1-P13) and the mapping function
- APSRA, with the goal of maximizing adaptivity, produces the routing tables [Palesi, et al., CODES+ISSS'06]
- Compression under a memory budget produces the compressed routing tables [Palesi, et al., SAMOS'06]

Bandwidth Variation: Multimedia Example
- Communication bandwidth ranges from 10 to 500 MB/s
- Source: Hu and Marculescu, TCAD 24(4), 2005

Contributions
- Design APSRAs which
  - are highly adaptive (this translates into high performance, in general)
  - uniformly distribute traffic over the network
  - keep the load of every link under a given bandwidth threshold

1st Phase
- Removing a dependency d
  - means removing all the paths which use d
- As soon as a path is removed
  - the fraction of bandwidth it transports must be redistributed between the remaining paths

1st Phase (example)
[Figure: dependency d is due to Path 3.]

1st Phase (example, continued)
- Remove d: Path 3 must be removed
- Its 25 MB/s must be redistributed between the remaining paths
[Figure: links annotated with loads of 33 MB/s and 66 MB/s.]

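The redistribution step can be illustrated with a small sketch. It assumes that a communication's bandwidth is split equally among its admissible paths and that the example is a 100 MB/s communication over four paths; both assumptions are chosen to match the 25 MB/s and 33 MB/s figures on the slide, and the function name `redistribute` is hypothetical.

```python
def redistribute(bandwidth, paths, removed):
    """Split `bandwidth` equally over the paths that survive the removal."""
    remaining = [p for p in paths if p not in removed]
    if not remaining:
        raise ValueError("no admissible path left for this communication")
    share = bandwidth / len(remaining)
    return {p: share for p in remaining}


# Assumed example: 100 MB/s over four paths (25 MB/s each), Path 3 removed.
paths = ["Path 1", "Path 2", "Path 3", "Path 4"]
shares = redistribute(100.0, paths, removed={"Path 3"})
for name, mbps in shares.items():
    print(f"{name}: {mbps:.1f} MB/s")
```

Each surviving path now carries about 33 MB/s instead of 25 MB/s, so removing a dependency raises the load on every link those paths cross.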
1st Phase
- Removing a dependency d
  - means removing all the paths which use d
- As soon as a path is removed
  - the fraction of bandwidth it transports must be redistributed between the remaining paths
- Strategy
  - Remove the dependency d which minimizes the overhead of bandwidth that must be allocated to the remaining paths that do not use d

\[ \mathrm{cost}(d) = \sum_{c \in C} \frac{B(c) \times PT(c,d)^{2}}{P(c)^{2} \times \left( P(c) - PT(c,d) \right)} \]

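A sketch of how such a cost could be evaluated. The slide does not define its symbols, so this assumes that B(c) is the bandwidth of communication c, P(c) the number of its admissible paths, and PT(c, d) the number of those paths that use dependency d; it also assumes the reading cost(d) = sum over c of B(c) * PT(c,d)^2 / (P(c)^2 * (P(c) - PT(c,d))). The data layout and channel names are hypothetical.

```python
def cost(d, communications):
    """Bandwidth overhead of removing dependency d, under the assumed formula.

    Each communication is a dict with "bw" (bandwidth) and "paths"
    (a list of paths, each path a list of channel names).
    A dependency d is a pair of consecutive channels.
    """
    total = 0.0
    for c in communications:
        n_paths = len(c["paths"])
        n_through = sum(1 for p in c["paths"] if d in zip(p, p[1:]))
        if n_through == n_paths:
            # Removing d would leave c without any path: never a valid choice.
            return float("inf")
        total += c["bw"] * n_through ** 2 / (n_paths ** 2 * (n_paths - n_through))
    return total


# Hypothetical communication: 100 MB/s, four paths, one of which uses d.
comm = {
    "bw": 100.0,
    "paths": [
        ["l12", "l23", "l36"],
        ["l12", "l25", "l56"],
        ["l14", "l45", "l56"],
        ["l14", "l47", "l78"],
    ],
}
d = ("l12", "l23")
print(cost(d, [comm]))
```

A dependency used by few paths of low-bandwidth communications gets a low cost and is a good candidate for removal, while one whose removal would disconnect a communication is excluded by the infinite cost.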
2nd Phase
- After Phase 1 we have a routing function which
  - is deadlock free
  - provides more adaptivity to communications characterized by higher communication bandwidth
- But...
  - The aggregate bandwidth (AB) on some links may exceed the capacity of those links
  - Some routing paths passing through such a link must be removed to bring its AB down to the link capacity

∀ link l: AB(l) ≤ Cap(l)

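The capacity constraint can be checked, and greedily repaired, with a sketch like the one below. The slide only says that "some" paths must be removed, so the greedy choice here (take the most overloaded link and drop one path through it from a communication that still has a path avoiding it) is an assumption and not the authors' strategy. Equal splitting of bandwidth over paths and a uniform link capacity are also assumed.

```python
from collections import defaultdict


def aggregate_bandwidth(comms):
    """AB(l): sum of the equal per-path shares of all paths crossing link l."""
    ab = defaultdict(float)
    for c in comms:
        share = c["bw"] / len(c["paths"])
        for path in c["paths"]:
            for link in path:
                ab[link] += share
    return ab


def enforce_capacity(comms, cap):
    """Greedily remove paths until AB(l) <= cap for every link, if possible."""
    while True:
        ab = aggregate_bandwidth(comms)
        over = [l for l in ab if ab[l] > cap + 1e-9]
        if not over:
            return True
        worst = max(over, key=ab.get)
        for c in comms:
            through = [p for p in c["paths"] if worst in p]
            # Only drop a path if the communication keeps one avoiding `worst`.
            if through and len(through) < len(c["paths"]):
                c["paths"].remove(through[0])
                break
        else:
            return False  # no removable path: the constraint cannot be met


# Hypothetical example: link "a" carries 50 + 60 = 110 MB/s, capacity is 100.
comms = [
    {"bw": 100.0, "paths": [["a", "b"], ["c", "d"]]},
    {"bw": 60.0, "paths": [["a", "e"]]},
]
ok = enforce_capacity(comms, cap=100.0)
print(ok, dict(aggregate_bandwidth(comms)))
```

Removing a path lowers the load on the overloaded link but shifts bandwidth onto the surviving paths, so the loop recomputes AB for all links after every removal.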
Routing and Selection
[Figure: a routing algorithm composed of an adaptive routing function followed by a selection function, between inputs and outputs. Labels: packet (Body, header H), locally stored information, neighbours information.]

Load Balancing Selection Function

Dst   Admissible Out   Selection Prob
...   ...              ...
n_d   E, S             0.75, 0.25
...   ...              ...

[Figure label: Overhead]
- The probability of selecting output channel l is proportional to the number of admissible paths that start from l and can be used to reach the destination

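A minimal sketch of the proportional rule above. The path counts (three admissible paths through E, one through S) are assumed values chosen to reproduce the 0.75 / 0.25 entry of the table, and the function names are hypothetical.

```python
import random


def selection_probabilities(paths_per_output):
    """Probability of each output, proportional to its admissible path count."""
    total = sum(paths_per_output.values())
    return {out: n / total for out, n in paths_per_output.items()}


def select_output(paths_per_output, rng=random):
    """Draw one admissible output according to those probabilities."""
    outputs = list(paths_per_output)
    weights = [paths_per_output[o] for o in outputs]
    return rng.choices(outputs, weights=weights)[0]


# Assumed counts for destination n_d: 3 paths start with E, 1 with S.
counts = {"E": 3, "S": 1}
probs = selection_probabilities(counts)
print(probs)
print(select_output(counts))
```

In hardware the probabilities would be precomputed and stored next to the admissible outputs, which is the extra table column shown on the slide.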
Experimental Setup
- 8x8 mesh-based NoC
- Buffer size: 4 flits
- Simulation time: 100,000 cycles
- Warmup time: 20,000 cycles
- Traffic injection distribution
  - Poisson (for synthetic traffic scenarios)
  - Self-similar (for MMS traffic)

Stdev Reduction
- Percentage reduction of the standard deviation of the aggregated bandwidth in network links
[Figure: bar chart of Stdev reduction (%), from 0% to 30%, for APSRA-BW and APSRA-BWL across the traffic scenarios.]

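The metric can be computed as below. The slide does not name the baseline, so this sketch assumes plain APSRA is the reference, uses the population standard deviation, and feeds in made-up per-link loads purely to show the arithmetic.

```python
from statistics import pstdev


def stdev_reduction(baseline_loads, new_loads):
    """Percentage reduction of the standard deviation of per-link load."""
    base = pstdev(baseline_loads)
    return 100.0 * (base - pstdev(new_loads)) / base


# Hypothetical per-link aggregated bandwidth (MB/s), same total in both cases.
apsra = [10.0, 30.0, 50.0, 70.0]
apsra_bw = [20.0, 35.0, 45.0, 60.0]
print(f"{stdev_reduction(apsra, apsra_bw):.1f}%")
```

A positive value means the load is spread more evenly over the links than with the baseline.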
Aggregate Bandwidth
- Aggregate bandwidth per link for a 9x9 mesh-based NoC under uniform traffic
[Figure: two panels, APSRA and APSRA-BWL.]

Delay Reduction
- Average delay reduction obtained when APSRA-BW and APSRA-BWL are used, taking APSRA as a baseline

Average Delay Variation
- Average delay variation under uniform traffic for different ranges of communication bandwidth

Link Utilization
- Link utilization under uniform traffic for APSRA and APSRA-BWL
[Figure: two panels, APSRA-BWL and APSRA.]

Router Architecture

Area Overhead
[Figure: bar chart of area (um^2), from 0 to 45000, with series RND and LB, for the router blocks Arbiter, XBar, FIFO, WHRT, Ctrl, Routing Function and Selection Function.]
- +5% overall area overhead

Conclusions
- Bandwidth aware routing algorithm
  - Highly adaptive
  - Reduces the variation of load in the network links
  - Ensures that the link bandwidth is not violated
- Evaluate the idea for irregular mesh topology