energy efficient fault tolerance in chip multiprocessors



  1. Energy-Efficient Fault Tolerance In Chip Multiprocessors Using Critical Value Forwarding
     P. Subramanyan (1), V. Singh (1), K. K. Saluja (2), E. Larsson (3)
     (1) Indian Institute of Science, Bangalore, India
     (2) University of Wisconsin-Madison, Madison, WI, USA
     (3) Linköping University, Linköping, Sweden
     29 June 2010
     Redundant Execution Using Critical Value Forwarding, DSN 2010, 29 June 2010

  2. Outline
     1. Introduction: Motivation, Related Work
     2. RECVF Design: Overview, Design Options, DVFS in the Trailing Core
     3. Evaluation: Methodology, Results
     4. Conclusion


  4. Introduction / Motivation: The Reliability Problem
     Moore's law is expected to apply for the next 10 years, giving us smaller and faster devices with reduced power. But there is a downside:
     - Smaller devices make ICs more susceptible to transient faults
     - Wearout and drift effects are now more prominent: negative bias temperature instability (NBTI), electromigration (EM), hot carrier injection (HCI), etc.
     - Increased process variations
     The upshot of decreased reliability is the need for architectural mechanisms for fault tolerance.

  5. Introduction / Motivation: Requirements of a Reliability Solution
     Traditional fault-tolerance systems are targeted at mainframes or specially designed processors. Fault-tolerant systems for the commodity market have different requirements.
     - Reliability mechanisms need to have low cost
       - low performance overhead
       - low energy overhead
       - small area overhead
     - Mechanism must be configurable at runtime
       - Switched off for users who do not require reliability
       - Switched off for applications that are inherently resilient
     - Transparent to software, i.e., must work with existing software

  6. Introduction / Motivation: Why is Power/Energy Important?
     - Power and peak temperature are key performance limiters in CMPs [1]
       - Since the power budget for a chip is fixed, decreasing the power for a single core increases available power and hence performance of other cores [2][3]
     - Decreasing operating temperatures leads to a significant increase in device reliability [4]
       - Decreasing temperature from 105 °C to 66 °C increased GOI median time-to-breakdown by a factor of 9; NBTI degradation decreased by 29%, equivalent to an eight-fold increase in lifetime
     [1] Isci et al., MICRO '06
     [2] Greskamp et al., HPCA '10
     [3] Intel Nehalem
     [4] Parulkar et al., SELSE '08

  7. Introduction / Related Work: Execution Assistance (1/2)
     [Figure: leader-follower block diagram; labels: inputs, leading core, results, predictions, trailing core, output comparison, error]
     - Two streams of execution in a leader-follower configuration
     - Leader assists execution of follower by forwarding results
     - Forwarded values used as predictions in the follower
       - potentially more accurate than "traditional" predictors
       - help speed up the follower
     [AR-SMT, FTCS '99], [DIVA, MICRO '99], [SRT, ISCA '00]

  8. Introduction / Related Work: Execution Assistance (2/2)
     Classification of Mechanisms for Execution Assistance
     1. Forwarding all values [a]
        - Highest speedup but also requires highest bandwidth
        - Suited for components of a single core or adjacent cores
     2. Forwarding loads and branches [b]
        - Eliminates branch mispredictions and data cache misses
        - Solves the problem of input incoherence
        - Still requires considerable bandwidth
     3. Forwarding only branches [c]
     4. Forwarding critical values
     [a] AR-SMT [FTCS '99], DIVA [MICRO '99], Slipstream* [ASPLOS '00], Madan et al. [TPDS '07]
     [b] SRT [ISCA '00], CRT [ISCA '02], SRTR [ISCA '02], CRTR [ISCA '03], SpecIV [HPCA '08], etc.
     [c] Paceline [PACT '08], PVA [PACT '05], Circuit Pruning [MICRO '07], Decoupled Performance Correctness Architecture [MICRO '08], etc.

  9. Introduction / Related Work: Chip-level Redundant Threading (CRT)
     - Execute a logical thread as two physical threads
     - Load Value Queue (LVQ)
       - both threads see the same memory state
       - trailing thread does not suffer data cache misses
     - Branch Outcome Queue (BOQ)
       - prevents trailing thread from mis-speculating
     - Modified store buffer
       - ensures that stores are identical across threads
     [Mukherjee et al., ISCA 2002]
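
The role of the LVQ can be illustrated with a minimal Python sketch. The class name `LoadValueQueue`, its methods, and the address check on `pop` are assumptions made for illustration, not details taken from the slides. The point shown is that the trailing thread consumes the value the leading thread loaded, so both see the same memory state even if memory changes in between.

```python
from collections import deque


class LoadValueQueue:
    """Hypothetical FIFO of (address, value) pairs from the leading thread."""

    def __init__(self):
        self._entries = deque()

    def push(self, address, value):
        # Leading thread: record each committed load in program order.
        self._entries.append((address, value))

    def pop(self, address):
        # Trailing thread: take the oldest entry instead of reading the cache.
        if not self._entries:
            raise LookupError("trailing thread is ahead of the leading thread")
        lead_address, value = self._entries.popleft()
        # Assumption of this sketch: an address mismatch signals divergence.
        if lead_address != address:
            raise RuntimeError("load address mismatch between threads")
        return value


memory = {0x10: 7, 0x20: 9}
lvq = LoadValueQueue()
for addr in (0x10, 0x20):
    lvq.push(addr, memory[addr])   # leading thread loads from memory
memory[0x10] = 99                  # memory changes before the trailing thread runs
trailing_values = [lvq.pop(0x10), lvq.pop(0x20)]
print(trailing_values)
```

The trailing thread never touches `memory`, which is also why it cannot suffer a data cache miss in this model.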

  10. Introduction / Related Work: Parallelized Verification Architecture (PVA)
      Idea: Split up the verification among two cores and operate the two cores at half voltage-frequency levels.
      - Energy vs. voltage is superlinear
      Drawbacks:
      - Uses three cores for execution of a single thread
      - Trailing threads have to consult leading thread caches
        - Higher performance overhead with increasing latency
      - Limited voltage scaling increases energy consumption
      - Increase in L2 power
      [Rashid et al., PACT 2005]
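
The "superlinear" remark can be made concrete with the standard first-order CMOS model, in which dynamic power scales as V^2 * f and dynamic energy per operation as V^2. The sketch below is only that textbook model with hypothetical function names; it ignores leakage and the limits on voltage scaling that the slide lists as a drawback, and it does not reproduce numbers from the PVA paper.

```python
def relative_dynamic_power(v_scale, f_scale):
    """Dynamic power relative to nominal, using P proportional to V^2 * f."""
    return v_scale ** 2 * f_scale


def relative_dynamic_energy_per_op(v_scale):
    """Dynamic energy per operation relative to nominal (E proportional to V^2)."""
    return v_scale ** 2


# Two verification cores, each at half voltage and half frequency.
per_core_power = relative_dynamic_power(0.5, 0.5)
two_core_power = 2 * per_core_power
energy_per_op = relative_dynamic_energy_per_op(0.5)
print(per_core_power, two_core_power, energy_per_op)
```

Under these idealized assumptions the two half-speed cores together match the throughput of one full-speed core at a quarter of its dynamic power.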


  12. RECVF Design / Overview: Overview
      [Figure: system block diagram; labels: core 0 to core 7, two shared L2 caches, shared bus interconnect, logical threads 1 to 3]
      - One logical thread is executed on two cores
      - Cores exchange information via the shared bus interconnect
      - Cores are designated as leading and trailing cores
        - the leading core assists execution of the trailing core by forwarding critical values

  13. RECVF Design / Overview: Critical Value Forwarding (1/2)
      Critical Value Forwarding: The leading core identifies instructions on the critical path and forwards the results of these instructions to the trailing core.
      - Breaks data dependence chains in the trailing core
        - Dependent instructions can execute "early"
        - Creates a cascade effect that speeds up the trailing core
      - Forward only a few values having the most effect on performance
        - Bandwidth is limited even in CMPs
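
A toy model shows how one forwarded value shortens a dependence chain. The function `finish_times`, the latencies, and the assumption that a forwarded value is available at cycle 0 with unlimited issue width are all illustrative simplifications, not part of the presented design.

```python
def finish_times(latencies, deps, forwarded=frozenset()):
    """Earliest completion cycle of each instruction, unlimited issue width.

    latencies[i] is the latency of instruction i, deps[i] lists the earlier
    instructions it reads, and `forwarded` holds producers whose results
    arrive from the leading core, so consumers need not wait for them.
    """
    done = []
    for i, latency in enumerate(latencies):
        ready = max(
            (done[p] for p in deps[i] if p not in forwarded),
            default=0,
        )
        done.append(ready + latency)
    return done


# A four-instruction chain: load (3 cycles) -> add (1) -> mul (3) -> add (1)
latencies = [3, 1, 3, 1]
deps = [[], [0], [1], [2]]
baseline = max(finish_times(latencies, deps))
with_forwarding = max(finish_times(latencies, deps, forwarded={1}))
print(baseline, with_forwarding)
```

Forwarding only the result of instruction 1 lets the multiply start immediately, so the chain finishes in 4 cycles instead of 8 in this model. The forwarded instruction itself still executes, which matches redundant execution where every result is recomputed.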

  14. RECVF Design / Overview: Critical Value Identification (1/3)
      The mechanism for critical value identification is as follows:
      1. Observe execution of instructions through the processor pipeline
      2. Based on predefined marking criteria, instructions are marked critical
      3. At instruction commit, values of marked instructions are forwarded
      The idea of observing events in the processor to detect critical instructions is from [Tune et al., HPCA 2001].
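
The three steps can be sketched in Python as a mark-then-forward loop. The `Instr` record, the `commit_and_forward` function, and the predicate used in the example are hypothetical stand-ins; a real pipeline would set the mark from observed events rather than from a function call at commit.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    index: int             # position in program order
    result: int            # value produced by the instruction
    critical: bool = False


def commit_and_forward(rob, is_critical, forward_queue):
    """Retire instructions in order and forward results of marked ones."""
    for instr in rob:
        # Marking step: a predicate stands in for the pipeline-event criteria.
        if is_critical(instr):
            instr.critical = True
        # Commit step: only marked instructions send (index, result).
        if instr.critical:
            forward_queue.append((instr.index, instr.result))


rob = [Instr(i, result=i * 10) for i in range(6)]
queue = []
commit_and_forward(rob, lambda ins: ins.index % 3 == 0, queue)
print(queue)
```

Only two of the six results reach the queue here, which is the bandwidth saving that selective forwarding aims for.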

  15. RECVF Design / Overview: Critical Value Identification (2/3)
      Marking heuristics:
      - ROBStall: Unexecuted instruction at head of ROB
      - InstQHead: Unexecuted instruction at head of instruction queue
      - InstQHFree: Instruction producing a value freeing another instruction at head of instruction queue
      - FreedN: Instruction frees at least N instructions
      - FanoutN: Instruction produces a value used by at least N other in-flight instructions
      - EveryN: Every Nth instruction
      - AllBJ: All Branch/Jump Instructions
      - MispredBJ: Mispredicted Branch/Jump Instructions
      - All: All values are forwarded
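
Two of these heuristics are simple enough to sketch in software. The representation of the in-flight window as `(dest_reg, source_regs)` tuples and the function names `fanout_n` and `every_n` are assumptions for illustration; hardware would track consumers with counters rather than by scanning a list.

```python
def fanout_n(in_flight, n):
    """Indices of instructions whose result is read by at least n others.

    in_flight is a list of (dest_reg, source_regs) in program order.
    A read is credited to the most recent earlier writer of that register.
    """
    last_writer = {}                      # register -> index of latest producer
    consumers = [0] * len(in_flight)
    for i, (dest, sources) in enumerate(in_flight):
        for reg in set(sources):
            if reg in last_writer:
                consumers[last_writer[reg]] += 1
        if dest is not None:
            last_writer[dest] = i
    return [i for i, count in enumerate(consumers) if count >= n]


def every_n(num_instructions, n):
    """Indices of every nth instruction (the nth, 2nth, ...), zero-based."""
    return list(range(n - 1, num_instructions, n))


window = [
    ("r1", []),            # 0: produces r1
    ("r2", ["r1"]),        # 1: reads r1
    ("r3", ["r1", "r2"]),  # 2: reads r1 and r2
    ("r1", ["r3"]),        # 3: overwrites r1
    ("r4", ["r1"]),        # 4: reads the new r1 from instruction 3
]
marked_fanout = fanout_n(window, 2)
marked_every = every_n(len(window), 2)
print(marked_fanout, marked_every)
```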

  16. RECVF Design / Overview: Critical Value Identification (3/3)
      Special Handling of Branch/Jump Instructions
      - Mispredicted branch instructions are marked as critical
        - branch direction mispredictions as well as BTB and RAS misses are all considered "mispredicted" branches
      - The outcomes (i.e., branch targets) of these are forwarded
      This scheme eliminates most mispredictions in the trailing core at the cost of forwarding the outcomes of a small fraction of branch/jump instructions.
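
One way to read this rule is that any branch whose predicted next PC differs from the actual next PC is forwarded, regardless of which predictor structure was wrong. The sketch below encodes that reading; the tuple layout, the use of `None` for a BTB or RAS miss, and the function name are assumptions for illustration.

```python
def branches_to_forward(branches):
    """Select (index, actual_next_pc) for branches the leading core got wrong.

    branches is a list of (index, predicted_next_pc, actual_next_pc).
    predicted_next_pc is None when the BTB or RAS supplied no target,
    which this sketch treats the same as a wrong direction prediction.
    """
    return [
        (index, actual)
        for index, predicted, actual in branches
        if predicted != actual
    ]


trace = [
    (4, 0x40, 0x40),    # predicted correctly: not forwarded
    (9, 0x80, 0x24),    # direction misprediction
    (15, None, 0x100),  # BTB/RAS miss
    (21, 0x60, 0x60),   # predicted correctly: not forwarded
]
forwarded = branches_to_forward(trace)
print(forwarded, len(forwarded) / len(trace))
```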

  17. RECVF Design / Overview: Critical Value Forwarding: Microarchitectural Support
      Microarchitectural structures
      - Branch Outcome Queue (BOQ)
        - Holds forwarded branch outcome and index in trailing core
        - Consulted along with branch predictor
        - Branch outcome, if available, overrides branch predictor
      - Instruction Result Queue (IRQ)
        - Holds results and index of forwarded instructions
        - Accessed at the time of instruction dispatch
        - If value is available in IRQ:
          - IRQ value written to destination physical register
          - Dependent instructions can now execute using forwarded value
          - Destination is written to again after execution completes
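
The behavior of the two structures in the trailing core can be sketched as follows. The class `TrailingCoreQueues` and its method names are hypothetical, and the queues are modeled as dictionaries keyed by instruction index purely for brevity; the slides describe them as queues holding a value together with an index.

```python
class TrailingCoreQueues:
    """Toy model of BOQ and IRQ lookups in the trailing core."""

    def __init__(self):
        self.boq = {}   # instruction index -> forwarded branch target
        self.irq = {}   # instruction index -> forwarded result

    def next_pc(self, index, predicted_pc):
        # A BOQ entry, if available, overrides the branch predictor.
        return self.boq.pop(index, predicted_pc)

    def dispatch(self, index, dest_reg, regfile, ready):
        # At dispatch, an IRQ hit writes the destination register early,
        # so dependent instructions can issue with the forwarded value.
        if index in self.irq:
            regfile[dest_reg] = self.irq.pop(index)
            ready.add(dest_reg)

    def writeback(self, dest_reg, computed, regfile, ready):
        # The destination is written again once execution completes.
        regfile[dest_reg] = computed
        ready.add(dest_reg)


queues = TrailingCoreQueues()
queues.boq[9] = 0x24
queues.irq[5] = 42
regfile, ready = {}, set()

queues.dispatch(5, "p7", regfile, ready)
early_value = regfile["p7"]                  # available before execution
overridden_pc = queues.next_pc(9, 0x80)      # BOQ hit overrides the predictor
predicted_pc = queues.next_pc(10, 0x90)      # no BOQ entry: predictor is used
queues.writeback("p7", 42, regfile, ready)
print(early_value, hex(overridden_pc), hex(predicted_pc))
```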

  18. RECVF Design / Overview: Block Diagram
      [Figure: pipeline block diagram; labels: BPred, BOQ, Fetch, Decode, Rename, Critical Value Identification Heuristic, IRQ, Issue, ROB, LSQ, FUs, Reg File, WB, D-cache, Retire, Fingerprint, to trailing core]
