NoCAlert: An On-Line and Real-Time Fault Detection Mechanism for Network-on-Chip Architectures
Andreas Prodromou, Andreas Panteli, Chrysostomos Nicopoulos, and Yiannakis Sazeides
University of Cyprus
MICRO 2012
Presenters: Leul Belayneh, Shibo Chen
Motivation
• Moore's Law
• Chip Multi-Processor
• Network on Chips
• Transient Fault, Permanent Fault, Intermittent Fault
Pictures from internet
Previous Works
• Fault Prevention: BulletProof [HPCA '06]
• Fault Recovery: Vicis [DAC '09], ReliNoC [DATE '11]
• Fault Detection: ForEVeR [MICRO '11]
Goal
A lightweight, comprehensive, on-line and real-time fault detection mechanism for the NoC's control logic.
Background Flit
A. Prodromou et al., "NoCAlert: An On-Line and Real-Time Fault Detection Mechanism for Network-on-Chip Architectures," in Proc. of the International Symposium on Microarchitecture (MICRO), 2012.
Background Network invariance instances
Methodology
• Identify invariances/forbidden behaviors in the baseline router
• Design an individual checker for each invariance
• Keep the power and area overhead of the design low
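As a rough illustration of the "one checker per forbidden behavior" idea, the sketch below flags two buffer-related violations in a single cycle. The signal names, the `BufferSignals` type, and the two invariances chosen are hypothetical and only meant to show the shape of a checker; they are not taken from the slides.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BufferSignals:
    """Hypothetical one-cycle snapshot of an input buffer's control signals."""
    occupancy: int  # flits currently stored
    depth: int      # buffer capacity in flits
    read: bool      # read-enable asserted this cycle
    write: bool     # write-enable asserted this cycle


def buffer_violations(s: BufferSignals) -> List[str]:
    """Return the names of the forbidden behaviors observed in this cycle."""
    violations = []
    # A fault-free router never pops a flit out of an empty buffer.
    if s.read and s.occupancy == 0:
        violations.append("read_from_empty_buffer")
    # A fault-free router never pushes a flit into a full buffer.
    if s.write and s.occupancy == s.depth:
        violations.append("write_to_full_buffer")
    return violations


# Legal cycle: reading from a buffer that holds two flits.
print(buffer_violations(BufferSignals(occupancy=2, depth=4, read=True, write=False)))
# Illegal cycle: reading from an empty buffer.
print(buffer_violations(BufferSignals(occupancy=0, depth=4, read=True, write=False)))
```

Each check only needs signals that already exist in the router, which is what keeps a checker of this kind cheap in area and power.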
Identifying Invariances
32 invariances identified, by module:
• RC Unit: 3
• Arbiter Modules: 10
• Crossbar: 3
• Buffer State: 12
• Port-Level: 3
• Network-Level: 1
Invariances per correctness condition:
• No flit drop: 13
• No new flit generated: 6
• Bounded delivery: 13
• No data corruption/packet missing: 16
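A quick tally of the slide's numbers, as a sanity check: the per-module counts (RC Unit through Network-Level) add up to the 32 invariances, while the four remaining counts add up to more than 32. One reading, which is an interpretation rather than something the slide states, is that a single invariance can guard more than one correctness condition.

```python
# Counts copied from the slide, grouped by router module.
by_module = {
    "RC Unit": 3,
    "Arbiter Modules": 10,
    "Crossbar": 3,
    "Buffer State": 12,
    "Port-Level": 3,
    "Network-Level": 1,
}

# Counts copied from the slide, grouped by correctness condition.
by_condition = {
    "No flit drop": 13,
    "No new flit generated": 6,
    "Bounded delivery": 13,
    "No data corruption/packet missing": 16,
}

total_invariances = sum(by_module.values())
condition_hits = sum(by_condition.values())

print(total_invariances)  # 32
print(condition_hits)     # 48
```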
Checkers (figure labels: INPUTS, OUTPUTS, Checker result)
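The checker structure on this slide (a module's inputs and outputs feed a checker that raises a result) can be sketched in software for an arbiter. This is a minimal sketch under assumptions: the function name `arbiter_violations` and the two invariances shown are illustrative, and a real checker would be a small combinational circuit rather than Python.

```python
from typing import List


def arbiter_violations(requests: List[bool], grants: List[bool]) -> List[str]:
    """Compare an arbiter's inputs (requests) with its outputs (grants).

    Returns the names of the violated invariances; an empty list means
    the observed behavior is legal.
    """
    if len(requests) != len(grants):
        raise ValueError("requests and grants must have the same width")

    violations = []
    # An arbiter may grant at most one requester per cycle.
    if sum(grants) > 1:
        violations.append("multiple_grants")
    # A grant must always correspond to an asserted request.
    if any(g and not r for r, g in zip(requests, grants)):
        violations.append("grant_without_request")
    return violations


# Legal: two requesters, exactly one of them is granted.
print(arbiter_violations([True, True, False], [False, True, False]))
# Illegal: a grant is given to a port that never requested.
print(arbiter_violations([True, False, False], [False, False, True]))
```

The checker never recomputes the arbitration decision; it only tests whether the observed output is one the module could legally produce for the observed input.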
Experimental Evaluation
Cycle-accurate simulator
• Garnet NoC simulator for an 8×8 2D mesh network
• Garnet extended with checker modules
Hardware implementation
• Baseline router + 32 invariance checkers implemented in 65nm TSMC technology
Compared against the baseline, ForEVeR.
R. Parikh, V. Bertacco, "Formally enhanced runtime verification to ensure NoC functional correctness," in Proc. of the International Symposium on Microarchitecture (MICRO), 2011.
Fault Injection Model
• 205 fault locations per 5-port router
• 11,808 fault locations in the 8×8 mesh (fewer than 205 × 64 = 13,120, since edge and corner routers have fewer than five ports)
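The two numbers on this slide can be reconciled with a short calculation. The sketch assumes that fault locations are spread evenly over router ports (205 / 5 = 41 per port) and that edge and corner routers omit their unused ports; both are inferences that happen to match the slide's total, not statements from the source. The helper name `mesh_port_count` is hypothetical.

```python
def mesh_port_count(rows: int, cols: int) -> int:
    """Count router ports in a 2D mesh: one local port per router plus
    one port for every neighbor that actually exists."""
    total = 0
    for r in range(rows):
        for c in range(cols):
            neighbors = (r > 0) + (r < rows - 1) + (c > 0) + (c < cols - 1)
            total += 1 + neighbors  # local port + mesh links
    return total


FAULTS_PER_5_PORT_ROUTER = 205                   # from the slide
faults_per_port = FAULTS_PER_5_PORT_ROUTER // 5  # assumed even split: 41

ports = mesh_port_count(8, 8)
print(ports)                    # 288
print(ports * faults_per_port)  # 11808
```

In an 8×8 mesh, 36 interior routers have five ports, 24 edge routers have four, and 4 corner routers have three, which gives the 288 ports used above.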
Results
Conclusion
NoCAlert has achieved:
• 0% false negatives
• Low detection latency: a 100× improvement over the baseline
  • 97% of true-positive faults are captured at the instant of injection
• Minimal power (0.7%) and area (3%) overhead
Discussion Points
• Is it worth accepting the 36% false positives exhibited in the detection process? Delayed response vs. increased false positives
• Is it good to minimize the number of checkers to decrease area and power overhead?
• Do you think it is feasible on other NoCs and router designs?