A Case for Clumsy Packet Processors
Arindam Mallik and Gokhan Memik
Electrical and Computer Engineering Dept., Northwestern University
Overview
• Faults
• Correctness is overrated
• What if the higher levels take care of it?
• Processor can be even more aggressive/speculative
• Application-specific correctness
• Networking applications
• How do we measure?
• Tools for architects
• Relation between overclocking and faults
Treat correctness as an objective, not a requirement
12/15/2004 International Symposium on Microarchitecture - MICRO 37
Outline
• Introduction
• Application description and error metrics
• Error models for overclocking a cache
• Processor configuration
• Measurement definitions
• Simulations
Motivation
• Performance, energy requirements
• Reliability / Probabilistic Circuits
• Circuit designers have to be conservative
• Worst-case design
Introduction
• Inherent possibility of fault occurrence
• Adverse environmental conditions
• Aggressive scaling of supply voltage
• Smaller manufacturing technologies
• Need for analysis
• More transistors
• Higher fault probability
• Effect on system integrity
• Transient faults
• Permanent faults
Application Errors
• For a desktop processor or server
  • Capture and eliminate all faults
• Networking – Communication
  • A certain level of error is acceptable
• Nevertheless
  • The integrity of the system behavior must be maintained
• System impact
  • Excessive “resubmission”
  • Program output
Overview of Approach
[Diagram: Application Error Metrics; Overclocking vs. Fault Modeling; Simulator Configuration; Comparison Metric; outputs: Performance, Application Errors]
Error Classification
• Fault vs. Error
• Effect or duration
• Volatile Error
  • Occurs mostly while processing a packet
  • Affects a unit data element
  • Error in a single packet
• Non-volatile Error
  • Occurs in the static data structures
  • Effects seen in many elements
  • Error in routing table
Error Metrics for Applications
• Categorization of NetBench Applications
  • Low or micro-level: routines related to the lowest layers of the network stack
  • Routing-level: applications similar to traditional IP routing (Layers 3-4 of the network stack)
  • Application-level: traditional as well as emerging applications
• Common property of all applications
  • Control level tasks
  • Data level tasks
Error Measurement Procedure
• Mark data structures in NetBench apps
  • Important data structures: routing table entries, TTL value, …
  • Outputs of key function units: checksum value, NAT address
• Perform simulation
  • Introduce hardware faults
• Mark the change
  • Data values change
  • Application behavior changes
• Define the application error rate
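The procedure above can be sketched as a comparison between a fault-free ("golden") run and a run with injected faults. This is a minimal illustration, not the authors' tooling: the record layout, the key names (`ttl`, `checksum`), and the function `application_error_rate` are all hypothetical.

```python
from typing import Dict, List

# One record per packet, mapping a marked key to the value observed for it.
# The key names are illustrative assumptions, not NetBench identifiers.
Record = Dict[str, int]


def application_error_rate(golden: List[Record], faulty: List[Record]) -> Dict[str, float]:
    """Per-key fraction of packets whose marked value differs from the golden run."""
    if len(golden) != len(faulty):
        raise ValueError("both runs must cover the same packets")
    if not golden:
        return {}
    mismatches: Dict[str, int] = {key: 0 for key in golden[0]}
    for good, bad in zip(golden, faulty):
        for key in mismatches:
            # A missing key in the faulty run also counts as a changed value.
            if good[key] != bad.get(key):
                mismatches[key] += 1
    return {key: count / len(golden) for key, count in mismatches.items()}


# Hypothetical traces: the second packet's checksum was corrupted by a fault.
golden_run = [{"ttl": 63, "checksum": 0xB861}, {"ttl": 12, "checksum": 0x1A2B}]
faulty_run = [{"ttl": 63, "checksum": 0xB861}, {"ttl": 12, "checksum": 0x1A2F}]
rates = application_error_rate(golden_run, faulty_run)
```

Reporting a separate rate per marked key keeps errors in different data structures distinguishable, which matches the per-category view used later in the deck.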
A Sample Application - Route
• Route – one of the most common networking applications
• Implements IPv4 routing
• Receives each packet – table lookup – processes it to decide the next network hop
• Error Keys
  • Routing Table Initialization (IMPORTANT !!)
  • Checksum value
  • TTL Value
  • Path traversed in Routing Table for each packet
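Two of the error keys listed above, the checksum and the TTL, are ordinary IPv4 header fields. The sketch below shows how they change on a forwarding step, using the standard one's-complement header checksum. It is a simplified illustration rather than the NetBench route code; the function names and the sample header are assumptions.

```python
def ipv4_header_checksum(header: bytes) -> int:
    """One's-complement sum of the header's 16-bit words, checksum field taken as zero."""
    if len(header) < 20 or len(header) % 2:
        raise ValueError("expected an even-length IPv4 header of at least 20 bytes")
    total = 0
    for i in range(0, len(header), 2):
        if i == 10:  # bytes 10-11 hold the checksum itself, so skip them
            continue
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def forward(header: bytes) -> bytes:
    """Decrement the TTL (byte 8) and rewrite the checksum (simplified forwarding step)."""
    hdr = bytearray(header)
    if hdr[8] == 0:
        raise ValueError("TTL expired")
    hdr[8] -= 1
    csum = ipv4_header_checksum(bytes(hdr))
    hdr[10], hdr[11] = csum >> 8, csum & 0xFF
    return bytes(hdr)


# Illustrative 20-byte header with TTL 0x40 and the checksum field zeroed.
sample = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
forwarded = forward(sample)
```

Comparing the TTL and checksum of `forwarded` against a fault-free run is one way these fields could serve as error keys.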
Fault Models for Overclocking
• Overclock a component
  • Increased performance
  • Reduced energy
  • Increase in fault probability
• Goal
  • Find fault probability vs. overclocking aggressiveness
  • Particular circuit design
• Parameters
  • Voltage swing, noise
Opportunity for overclocking
[Figure: Voltage swing vs. time]
• Voltage swing
  • Rapid increase at first
  • Slow increase later
Not so fast, my friend!
• Noise (inductive and/or capacitive)
• Signal deviation
• Overclocking
• Reduced immunity
Approach
• Analyze each component separately
• 6-transistor SRAM cell
• Input, clock, feedback loop
Finding fault probability
[Figure: Noise immunity curves, noise amplitude for switching combinations; curves labeled from 0.39 V_fs to V_fs]
• Analyze the impact of noise on the feedback loop
• Noise immunity curves
• Different noise amplitude probabilities
• Check all switching combinations
$P(A_r) = 28.8 \, e^{-28.8 A_r}$
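If the noise amplitude density on this slide is read as $P(A_r) = 28.8\,e^{-28.8 A_r}$ for relative amplitude $A_r \ge 0$, then the probability that noise exceeds a given immunity threshold is the tail integral $e^{-28.8 a}$. The sketch below assumes that reading; the function names and the idea of comparing against a single threshold are illustrative, not taken from the slides.

```python
import math

NOISE_RATE = 28.8  # rate constant of the exponential density, as read from the slide


def noise_density(a_r: float) -> float:
    """Density of the relative noise amplitude A_r (zero for negative amplitudes)."""
    return NOISE_RATE * math.exp(-NOISE_RATE * a_r) if a_r >= 0 else 0.0


def prob_noise_exceeds(threshold: float) -> float:
    """P(A_r > threshold), the integral of the density from threshold to infinity."""
    return math.exp(-NOISE_RATE * max(threshold, 0.0))
```

A lower immunity threshold, as expected under a reduced voltage swing, gives a larger tail probability, which is the qualitative link between overclocking and faults that the slide describes.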
Estimation model
• Fit distribution into immunity
• Combine it with voltage swing vs. time
[Figure: Fault probability versus voltage swing; x-axis: relative voltage swing (V_rs)]
[Figure: Fault probability versus relative clock frequency; x-axis: relative cycle time (C_r); series: Data, Formula]
$P_E = 2.59 \times 10^{-7} \, e^{F_r^2 / 6} = 2.59 \times 10^{-7} \, e^{1 / (6 C_r^2)}$
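As a rough illustration, the fitted formula can be evaluated for a given relative cycle time. The sketch assumes the slide's formula reads $P_E = 2.59 \times 10^{-7}\, e^{1/(6 C_r^2)}$ with $F_r = 1/C_r$. The function name and the clamp to 1.0 are additions made here so that the result stays a valid probability.

```python
import math

BASE_FAULT_PROBABILITY = 2.59e-7  # constant from the slide's formula


def fault_probability(relative_cycle_time: float) -> float:
    """Per-bit fault probability for a relative cycle time C_r in (0, 1]."""
    if not 0.0 < relative_cycle_time <= 1.0:
        raise ValueError("relative cycle time must be in (0, 1]")
    f_r = 1.0 / relative_cycle_time  # relative clock frequency
    # Work in log space so very small C_r cannot overflow math.exp.
    log_p = math.log(BASE_FAULT_PROBABILITY) + f_r ** 2 / 6.0
    return 1.0 if log_p >= 0.0 else math.exp(log_p)  # clamp is an assumption


# Relative cycle times used for static overclocking in this deck.
table = {c_r: fault_probability(c_r) for c_r in (1.0, 0.75, 0.5, 0.25)}
```

The probability grows quickly as the cycle time shrinks, since the exponent scales with the square of the relative frequency.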
Processor Configuration
• Fault detection
  • No detection
  • Parity: one-strike, two-strikes, three-strikes
• Overclocking
  • Static: 75%, 50%, and 25% of the original clock cycle
  • Dynamic: processor adapts according to the faults observed; frequency is adjusted at the end of each epoch
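The dynamic scheme adjusts the frequency at the end of each epoch based on observed faults. The slides do not give the adjustment policy, so the sketch below is only one plausible shape: the class name, both thresholds, and the one-level-per-epoch rule are hypothetical. Only the four relative cycle times are taken from the static configurations.

```python
from dataclasses import dataclass


@dataclass
class EpochController:
    """Hypothetical epoch-based clock controller; thresholds are illustrative."""
    levels: tuple = (1.0, 0.75, 0.5, 0.25)  # relative cycle times from the slide
    index: int = 0                          # start at the original clock
    back_off_at: int = 4                    # assumed: this many faults slows the clock
    speed_up_at: int = 1                    # assumed: this few faults speeds it up

    def end_of_epoch(self, faults_detected: int) -> float:
        """Pick the relative cycle time for the next epoch."""
        if faults_detected >= self.back_off_at and self.index > 0:
            self.index -= 1  # too many faults: lengthen the cycle
        elif faults_detected <= self.speed_up_at and self.index < len(self.levels) - 1:
            self.index += 1  # few faults: shorten the cycle
        return self.levels[self.index]


controller = EpochController()
# Fault counts per epoch are made up for the example.
history = [controller.end_of_epoch(faults) for faults in (0, 0, 10, 2)]
```

Fault counts between the two thresholds leave the clock unchanged, which keeps the controller from oscillating every epoch.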
Measurement definitions
• Comparison between ideal and erroneous execution
  • Traditional parameters – unfair competition
  • Consider both performance and reliability
• Energy-Delay-Fallibility product
  • $\text{Energy}^k \times \text{delay}^m \times \text{fallibility}^n$
  • Fallibility = unit error occurrence probability
  • Can adjust the importance of faults by changing n
  • In present work, k = 1; m = 2; n = 2
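The product defined above is simple to compute once energy, delay, and fallibility are available. The sketch below uses the exponents stated on the slide (k = 1, m = 2, n = 2). It assumes the three inputs are normalized to a baseline configuration, and the sample numbers are invented for illustration.

```python
def edf_product(energy: float, delay: float, fallibility: float,
                k: int = 1, m: int = 2, n: int = 2) -> float:
    """Energy^k * delay^m * fallibility^n, with the slide's exponents as defaults."""
    return energy ** k * delay ** m * fallibility ** n


# Hypothetical values relative to a baseline of (1.0, 1.0, 1.0).
baseline = edf_product(1.0, 1.0, 1.0)
overclocked = edf_product(energy=0.9, delay=0.8, fallibility=1.2)
```

With n = 2, a 20% rise in fallibility costs as much as a 20% rise in delay, and raising n would penalize faults more heavily.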
Simulations
• SimpleScalar Simulator for StrongARM 110
  • Roughly an execution core of a Network Processor
  • Separate 4 KB direct mapped L1 data and instruction caches
  • 128 KB 4-way set-associative unified L2 cache
• Error Probability
  • At normal clock frequency: error probability = $2.59 \times 10^{-7}$ per bit
  • Increased error probability at higher clock rate according to the fault model
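A per-bit error probability can be turned into the probability that an access of several bits sees at least one faulty bit. The sketch below assumes bit faults are independent, which the slides do not state, and the function name and the 32-bit word size are illustrative choices.

```python
import math

PER_BIT_ERROR = 2.59e-7  # per-bit error probability at normal clock frequency (slide)


def access_fault_probability(per_bit: float, bits: int) -> float:
    """P(at least one faulty bit) = 1 - (1 - p)^bits, assuming independent bits."""
    if not 0.0 <= per_bit < 1.0 or bits < 0:
        raise ValueError("need 0 <= per_bit < 1 and bits >= 0")
    # expm1/log1p keep precision when per_bit is tiny.
    return -math.expm1(bits * math.log1p(-per_bit))


word_fault = access_fault_probability(PER_BIT_ERROR, 32)  # one 32-bit word
```

For probabilities this small, the result is close to `bits * per_bit`, so the rate per access scales almost linearly with access width.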
Application Error Behavior
[Figure: three panels (Data plane; Control plane; Error introduced in both control and data plane). x-axis: Relative Clock Cycle (100%, 75%, 50%, 25%); y-axis: Error Probability; series: Initialization Error, Interface Value, Destn Add, Radix Tree Entry, Translated IP Address, Fatal Error]
Fatal Error Probability
• Curse on the system
• Destroys integrity – unacceptable
• Increases with high clock frequency
• Observed on system with no error detection
[Figure: Fatal error probability per application (route, drr, nat, tl, url, md5, crc, avrg) at relative clock cycles 100%, 75%, 50%, 25%]
Energy-Delay-Fallibility Values
• High Energy-Delay-Fallibility
• Higher fallibility rate
• Increased execution cycle
• Extra instructions due to errors
• Erroneous load → cache miss
[Figure: Energy-Delay^2-Fallibility^2 by recovery scheme (no detection, one-strike, two strikes, three strikes) for relative clock cycles 1, 0.75, 0.5, 0.25, and dynamic]
Conclusions
• Release correctness constraint
• Application-Specific Processors
• Utilizing released correctness
• Application-Specific error metrics
• Overclocking
• Fault modeling for overclocking a data cache
• Error weighting – metrics