
Synchronized Progress in Interconnection Networks (SPIN): A new theory for deadlock freedom


  1. ISCA 2018, Session 8B: Interconnection Networks
     Synchronized Progress in Interconnection Networks (SPIN): A new theory for deadlock freedom
     - Aniruddh Ramrakhyani, Georgia Tech (aniruddh@gatech.edu)
     - Tushar Krishna, Georgia Tech (tushar@ece.gatech.edu)
     - Paul V. Gratz, Texas A&M University (pgratz@tamu.edu)

  2. Network Routing
     [Figure: a network with elements labeled A to K, in which routing ends in a deadlock]

  3. Routing Deadlocks
     - A routing deadlock is a cyclic buffer dependency chain that renders forward progress impossible.
     [Figure: a deadlocked loop labeled A to F]

  4. Routing Deadlocks
     - A routing deadlock is a cyclic buffer dependency chain that renders forward progress impossible.
     - Deadlocks are a fundamental problem in both off-chip and on-chip interconnection networks.
     - They cause system breakdown and kill chips.
     - Deadlocks are hard to detect during functional verification.
     - They manifest after long periods of use.
     - They depend on traffic pattern, injection rate, and congestion.
     - They show up due to system wear-out faults and power-gating of network elements, which are hard to simulate.
     - A solution is needed for functional correctness.

  5. Solution I: Dally’s Theory
     - Defines a strict order in the acquisition of links and/or buffers, which ensures that a cyclic dependency is never created.
     [Figure: a loop labeled A to F with links numbered 1 to 6; moving from a higher-numbered link to a lower-numbered one is not allowed]

  6. Solution I: Dally’s Theory
     - Defines a strict order in the acquisition of links and/or buffers, which ensures that a cyclic dependency is never created.
     - Implementations: Turn model [5], XY routing, Up-Down routing [20].
     - Limitations:
       - Routing restrictions: increased latency, throughput loss, energy overhead.
       - Requires a large number of VCs for fully adaptive routing.
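A minimal sketch of the ordering idea, assuming links carry integer labels as in the numbered loop on the previous slide. The function names `respects_link_order` and `xy_next_hop` are illustrative, not taken from the talk. The first checks that a route only ever moves from a lower-numbered link to a higher-numbered one; the second shows XY routing on a mesh, which fixes the order by finishing all X hops before any Y hop.

```python
def respects_link_order(route_links):
    """True if every link acquired has a strictly higher number than the one before it."""
    return all(a < b for a, b in zip(route_links, route_links[1:]))


def xy_next_hop(cur, dst):
    """Dimension-ordered (XY) routing: correct the X coordinate first, then Y."""
    (x, y), (dx, dy) = cur, dst
    if x != dx:
        return (x + (1 if dx > x else -1), y)
    if y != dy:
        return (x, y + (1 if dy > y else -1))
    return cur  # already at the destination


# Hypothetical routes over a loop whose links are numbered 1..6.
allowed = respects_link_order([1, 2, 3])    # low to high only
forbidden = respects_link_order([5, 6, 1])  # wraps from 6 back to 1
first_hop = xy_next_hop((0, 0), (2, 1))
```

Because the wrap-around route is rejected, no set of routes can close the loop, which is the sense in which the ordering rules out a cyclic dependency.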

  7. Solution II: Duato’s Theory
     - Adds buffers to create a deadlock-free escape path that can be used to avoid or recover from deadlocks.
     - Implementation: turn restrictions in the escape VC.
     [Figure: a loop labeled A to F in which buffers are split into VC0 and an escape VC]

  8. Solution II: Duato’s Theory
     - Adds buffers to create a deadlock-free escape path that can be used to avoid or recover from deadlocks.
     - Implementation: turn restrictions in the escape VC.
     - Limitations:
       - Energy and area overhead of escape VCs.
       - Additional routing tables/logic for routing within the escape VC.
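One way to picture the escape path is as a fallback in virtual-channel selection. The sketch below is an assumption-laden illustration, not a router design from the talk: `pick_virtual_channel`, the `(port, vc)` tuples, and the `has_free_buffer` callback are all hypothetical names.

```python
def pick_virtual_channel(adaptive_candidates, escape_candidate, has_free_buffer):
    """Prefer any adaptive (port, vc) with buffer space; otherwise fall back to the escape VC.

    The escape candidate is assumed to be produced by a restricted,
    deadlock-free routing function (for example, one with turn restrictions).
    """
    for port, vc in adaptive_candidates:
        if has_free_buffer(port, vc):
            return port, vc
    port, vc = escape_candidate
    if has_free_buffer(port, vc):
        return port, vc
    return None  # nothing available this cycle; the packet waits


# Hypothetical state: only the escape VC on the north port has space.
free = {("north", "escape")}
choice = pick_virtual_channel(
    [("east", "vc0"), ("north", "vc0")],
    ("north", "escape"),
    lambda port, vc: (port, vc) in free,
)
```

The extra escape buffers and the second routing function are where the area, energy, and logic overheads listed on the slide come from.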

  9. Other Solutions
     - Solution III: Flow Control
       - Restrict injection when the number of empty buffers falls below a threshold.
       - Implementation: Bubble Flow Control [9].
       - Limitations: implementation complexity, throughput loss.
     - Solution IV: Deflection Routing
       - Assign every flit to some output port, even if it gets misrouted.
       - Implementation: BLESS [10], CHIPPER [35].
       - Limitations: livelocks, non-minimal routing.
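Both ideas on this slide fit in a few lines. The sketch below is illustrative only: `may_inject` and `deflect` are hypothetical names, and serving flits in oldest-first order is an assumption made here for the example rather than something stated on the slide.

```python
def may_inject(empty_buffers, threshold):
    """Flow control: block injection once empty buffers fall below the threshold."""
    return empty_buffers >= threshold


def deflect(flits, ports):
    """Deflection routing: every flit leaves on some port, preferred or not.

    flits: list of (flit_id, preferred_port), assumed sorted oldest first.
    ports: output ports available this cycle (at least as many as flits).
    """
    free = list(ports)
    assignment = {}
    for flit_id, preferred in flits:
        port = preferred if preferred in free else free[0]  # misroute if taken
        free.remove(port)
        assignment[flit_id] = port
    return assignment


# Two flits want the east port; the younger one is deflected.
result = deflect([("f1", "E"), ("f2", "E"), ("f3", "N")], ["N", "E", "S", "W"])
```

In this run `f2` is pushed to the north port, which in turn displaces `f3`. Misroutes of this kind are the source of the non-minimal paths and livelock risk named as limitations.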

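  10. Comparison of Deadlock Freedom Theories
      [Table: Dally, Duato, Flow Control, Deflection Routing, and SPIN compared on these metrics: acyclic CDG not required, no packet injection restrictions, topology independent, livelock free, and VC cost for a mesh (minimal and adaptive routing)]
      - VC cost for a mesh (minimal, adaptive): Dally 1, 6; Duato 1, 2; Flow Control 2, 2; SPIN 1, 1. A single value, 1, is shown for Deflection Routing.
      - Can we do better?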

  11. Outline
      - Routing Deadlocks
      - State of the Art
        - Dally’s Theory
        - Duato’s Theory
        - Flow Control Routing
        - Deflection Routing
      - SPIN: Synchronized Progress in Interconnection Networks
      - Evaluations
      - Conclusion

  12. SPIN: Key Idea
      - The simultaneous, synchronized movement of all deadlocked packets in the loop is called a spin.
      - What if we coordinate the movement of every packet to the next hop at a given time?
      [Figure: a deadlocked loop labeled A to F; after the simultaneous synchronized movement, the spin is complete]

  13. SPIN: Key Idea
      - The simultaneous, synchronized movement of all deadlocked packets in the loop is called a spin.
      - Each spin moves every deadlocked packet forward by one hop.
      - One spin may not resolve the deadlock. If so, the spin can be repeated.
      - The deadlock is guaranteed to be resolved in a finite number of spins [proof in paper, Sec. III].

  14. SPIN: Key Idea
      [Figure: the loop labeled A to F after the first spin completes and after the second spin completes]

  15. SPIN: Key Idea
      - Deadlock resolved: packets E and B exit the loop.
      [Figure: the loop labeled A to F after the deadlock is resolved]
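The walkthrough above can be mimicked with a toy model. This is a sketch for intuition, not the proof from the paper: it assumes each deadlocked packet needs some finite number of further hops along the loop before its route turns out of it. The hop counts below are invented so that packets E and B leave after two spins, matching the slides.

```python
def spin(loop):
    """One spin: every packet advances one hop, so slot i's packet moves to slot i+1."""
    moved = [(name, hops_left - 1) for name, hops_left in loop]
    return [moved[-1]] + moved[:-1]


def resolve(loop, max_spins=100):
    """Repeat spins until at least one packet no longer needs a buffer in the loop.

    loop[i] = (packet_name, hops_left_in_loop) for the packet held at position i.
    Returns (number_of_spins, names_of_packets_that_can_exit).
    """
    spins = 0
    while all(hops_left > 0 for _, hops_left in loop):
        if spins == max_spins:
            raise RuntimeError("no packet left the loop within max_spins")
        loop = spin(loop)
        spins += 1
    return spins, [name for name, hops_left in loop if hops_left == 0]


# Hypothetical remaining hop counts for the six packets in the loop.
deadlocked = [("A", 3), ("B", 2), ("C", 4), ("D", 3), ("E", 2), ("F", 5)]
spins_needed, exiting = resolve(deadlocked)
```

In this model the number of spins equals the smallest remaining hop count in the loop, which is always finite. That is the intuition behind bounded resolution.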

  16. Outline
      - Routing Deadlocks
      - State of the Art
      - SPIN: Synchronized Progress in Interconnection Networks
        - Key Idea
        - Implementation Example
        - Micro-architecture
        - FAvORS
      - Evaluations
      - Conclusion

  17. SPIN: Implementation Example
      - SPIN is a generic deadlock freedom theory that can have multiple implementations.
      - We choose a recovery approach because deadlocks are rare (see Sec. II-F).
      - Our implementation:
        - Detect the deadlock.
        - Coordinate a time for the spin.
        - Execute the spin.

  18. Implementation Example: Detect Deadlocks
      - Use counters.
      - Counters are placed at every node at design time.
      - Optimize by exploiting topology symmetry (see Static Bubble [6]).
      - If a packet does not leave within a configurable threshold time, this indicates a potential deadlock.
      - When the counter expires, send a probe to verify the deadlock.
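A minimal sketch of the counter, assuming it is advanced once per cycle and reset whenever the watched packet leaves. The class name `DeadlockCounter` and its `tick` method are illustrative and do not come from the talk.

```python
class DeadlockCounter:
    """Counts consecutive cycles in which the watched packet has not left its buffer."""

    def __init__(self, threshold):
        self.threshold = threshold  # configurable timeout, in cycles
        self.count = 0

    def tick(self, packet_left):
        """Advance one cycle. Returns True when a probe should be sent."""
        if packet_left:
            self.count = 0  # forward progress observed, so start over
            return False
        self.count += 1
        return self.count >= self.threshold


counter = DeadlockCounter(threshold=3)
stalled = [counter.tick(packet_left=False) for _ in range(3)]
after_progress = counter.tick(packet_left=True)
```

Expiry only marks a potential deadlock, since heavy congestion can also stall a packet for that long. This is why the probe is sent to confirm it.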

  19. Implementation Example: Probe Msg.
      [Figure: a loop labeled A to F. Steps listed: 1. Deadlock Detection, 2. Coordinating the spin, 3. Executing the spin. Annotations: "Counter Expires at Node", "Send Probe", "Probe Returns: Deadlock Confirmed"]

  20. Implementation Example: Probe Msg.
      - A probe is a special message that tracks the buffer dependency.
      - If the probe returns to its sender, there is a cyclic buffer dependence, hence a deadlock.
      - Next, send a move message to convey the spin time.
      - Upon receiving the move message, a router sets its counter to count to the spin cycle.
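The probe's job can be sketched as a walk along the buffer dependency chain. This is a simplified model: it assumes each blocked buffer waits on exactly one next buffer, and the names `send_probe`, `waits_for`, and `max_hops` are hypothetical.

```python
def send_probe(origin, waits_for, max_hops=64):
    """Follow the dependency chain from origin.

    waits_for maps a buffer to the buffer its head packet is blocked on,
    or to None if that packet can move. Returns the loop as a list if the
    probe comes back to origin, or None if the probe is dropped.
    """
    path = [origin]
    node = waits_for.get(origin)
    while node is not None and len(path) <= max_hops:
        if node == origin:
            return path  # probe returned to sender: deadlock confirmed
        path.append(node)
        node = waits_for.get(node)
    return None  # chain ended or hop limit reached: probe dropped


# Hypothetical six-buffer loop and a chain that is merely congested.
ring = {"A": "B", "B": "C", "C": "D", "D": "E", "E": "F", "F": "A"}
chain = {"A": "B", "B": "C", "C": None}
loop_found = send_probe("E", ring)
no_loop = send_probe("A", chain)
```

The returned path also gives the loop length, which the sender needs when choosing a spin time for the move message.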

  21. Implementation Example: Move Msg.
      [Figure: a loop labeled A to F. Steps listed: 1. Deadlock Detection, 2. Coordinating the spin, 3. Executing the spin. Annotations: "Send Move", "Set counter to count to spin cycle", "Move returns"]
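To see how per-router counters can all expire in the same cycle, consider this sketch. The timing model is an assumption: the move message advances one router per `hop_latency` cycles, and the spin is scheduled `slack` cycles after the move returns to its sender. `schedule_spin` and both parameters are illustrative names.

```python
def schedule_spin(loop, send_cycle, hop_latency=1, slack=1):
    """Return (spin_cycle, counters), where counters[router] is the value each
    router loads when the move message reaches it.

    loop[0] is the router that sends the move message.
    """
    round_trip = len(loop) * hop_latency
    spin_cycle = send_cycle + round_trip + slack  # after the move has returned
    counters = {}
    for hops, router in enumerate(loop):
        arrival = send_cycle + hops * hop_latency
        counters[router] = spin_cycle - arrival  # cycles left when the message arrives
    return spin_cycle, counters


spin_cycle, counters = schedule_spin(["E", "F", "A", "B", "C", "D"], send_cycle=100)
```

Routers further along the loop receive the message later and load a smaller count, so every counter reaches zero in the same cycle.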

  22. Implementation Example: spin
      [Figure: a loop labeled A to F. Steps listed: 1. Deadlock Detection, 2. Coordinating the spin, 3. Executing the spin. Annotation: "Counters expire together in the spin cycle"]

  23. Implementation Example: spin
      [Figure: the loop labeled A to F after the spin has been executed. Steps listed: 1. Deadlock Detection, 2. Coordinating the spin, 3. Executing the spin]

  24. Multiple SPIN Optimization
      - Resolving a deadlock may require multiple spins.
      - After a spin, the router can resume normal operation.
      - If the counter expires again, the process is repeated.
      - Optimization: send a probe_move after the spin is complete.
      - The probe_move checks whether the deadlock still exists and, if so, sets the time for the next spin.
      - Details in paper (Sec. IV-B).

  25. Outline
      - Routing Deadlocks
      - State of the Art
      - SPIN: Synchronized Progress in Interconnection Networks
        - Key Idea
        - Implementation Example
        - Micro-architecture
        - FAvORS
      - Evaluations
      - Conclusion

  26. Implementation Micro-architecture
      - No additional links: special messages use the same links as regular flits.
      - Special messages have higher priority than regular flits for link usage.
      - Links are idle during deadlocks anyway.
      - Bufferless forwarding: special messages are not buffered anywhere (they are either forwarded or dropped).
      - Distributed design: any router can initiate the recovery.
      - 4% area overhead compared to a traditional mesh router in 15nm Nangate [42].

  27. Outline
      - Routing Deadlocks
      - State of the Art
      - SPIN: Synchronized Progress in Interconnection Networks
        - Key Idea
        - Walkthrough Example
        - Micro-architecture
        - FAvORS
      - Evaluations
      - Conclusion

  28. FAvORS Routing Algorithm
      - SPIN is the first scheme that enables true one-VC fully adaptive deadlock-free routing for any topology.
      - FAvORS: Fully Adaptive One-VC Routing with SPIN.
      - The algorithm has two flavors:
        - Minimal adaptive
        - Non-minimal adaptive
      - Route selection metrics:
        - Credit turn-around time
        - Hop count
      - More details in paper (Sec. V).
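The slide names the two selection metrics but not how they are combined, so the following is only a guess at one plausible shape and should not be read as the FAvORS algorithm. It assumes the minimal flavor first restricts itself to shortest-path ports, and that both flavors prefer the lowest credit turn-around time, breaking ties on hop count. `select_port` and the candidate tuple layout are hypothetical.

```python
def select_port(candidates, minimal=True):
    """Pick an output port from (port, hop_count, credit_turnaround) tuples.

    Assumed policy: lowest credit turn-around time wins, ties broken by hop
    count. In the minimal flavor only shortest-path ports are considered.
    """
    if minimal:
        fewest = min(hops for _, hops, _ in candidates)
        candidates = [c for c in candidates if c[1] == fewest]
    port, _, _ = min(candidates, key=lambda c: (c[2], c[1]))
    return port


# Hypothetical ports: west is a detour but currently the least congested.
options = [("east", 3, 12), ("north", 3, 5), ("west", 5, 2)]
minimal_choice = select_port(options, minimal=True)
non_minimal_choice = select_port(options, minimal=False)
```

With these made-up numbers the minimal flavor picks `north` and the non-minimal flavor picks `west`, trading two extra hops for a less congested port.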

  29. Outline
      - Routing Deadlocks
      - State of the Art
      - SPIN: Synchronized Progress in Interconnection Networks
      - Evaluations
      - Conclusion

  30. Evaluations
      - Network configuration:
        - Simulator: gem5 simulator + Garnet 2.0 network model
        - Topologies: 8x8 Mesh; 1024-node off-chip Dragon-fly
        - Link latency: 1 cycle (8x8 Mesh); inter-group 3 cycles, intra-group 1 cycle (Dragon-fly)
        - Traffic: synthetic + multi-threaded (PARSEC) (8x8 Mesh); synthetic (Dragon-fly)

  31. Evaluations: Baselines
      - 8x8 Mesh (design: routing adaptivity; minimal; theory; deadlock freedom type):
        - West-first Routing: Partial; Yes; Dally; Avoidance
        - Escape-VC: Full; Yes; Duato; Avoidance
        - Static-Bubble [6]: Full; Yes; Flow-Control; Recovery
      - 1024-node off-chip Dragon-fly (same columns):
        - UGAL [37]: Full; No; Dally; Avoidance

  32. Saturation Throughput
      - 1024-node off-chip Dragon-fly
      [Figure: two plots, "Neighbor" and "Bit-complement", of Latency (cycles) against Inj. Rate (flits/node/cycle). Curves: FAvORS_NMin_1VC (SPIN), Minimal_1VC (SPIN), UGAL_3VC (Dally). Callouts report 62%, 50%, and 25% higher throughput; the comparison points named are UGAL_Dally and Minimal Routing 1-VC]
