
1 Branch History Table of 1-bit Predictor 1-bit BHT Weakness BHT - PDF document



  1. Lecture 9: Branch Prediction
     Basic idea, saturating counter, BHT, BTB, return address prediction, correlating prediction

     Reducing Branch Penalty
     Branch penalty in dynamically scheduled processors: wasted cycles due to pipeline flushing on mispredicted branches.
     Reduce branch penalty:
       1. Predict branch/jump instructions AND branch direction (taken or not taken)
       2. Predict branch/jump target address (for taken branches)
       3. Speculatively execute instructions along the predicted path

     What to Use and What to Predict
     Available info:
       - Current predicted PC
       - Past branch history (direction and target)
     What to predict:
       - Conditional branch inst: branch direction and target address
       - Jump inst: target address
       - Procedure call/return: target address
     May need instruction pre-decoded.
     [Diagram labels: PC, predictors, pred_PC, IM, PC & Inst, pred info, feedback PC]

     Mis-prediction Detections and Feedbacks
     Detections:
       - At the end of decoding: the target address is known at decoding and does not match. Flush the fetch stage.
       - At commit (most cases): wrong branch direction, or the target address does not match. Flush the whole pipeline (at EXE: MIPS R10000).
     Feedbacks:
       - Any time a mis-prediction is detected
       - At a branch's commit (at EXE: called speculative update)
     [Pipeline diagram stages: FETCH, RENAME, REB/ROB, SCHD, EXE, WB, COMMIT]

     Branch Direction Prediction
     Predict branch direction: taken or not taken (T/NT)
         BNE R1, R2, L1     (taken: continue at L1; not taken: fall through)
         …
         L1: …
     Static prediction: compilers decide the direction.
     Dynamic prediction: hardware decides the direction using dynamic information.
       1. 1-bit Branch-Prediction Buffer
       2. 2-bit Branch-Prediction Buffer
       3. Correlating Branch Prediction Buffer
       4. Tournament Branch Predictor
       5. and more …

     Predictor for a Single Branch
     General form:
       1. Access (by PC)
       2. Predict: output T/NT from the state
       3. Feedback: actual T/NT
     1-bit prediction:
       - State 1: Predict Taken. State 0: Predict Not Taken.
       - Feedback T moves to (or stays in) state 1; feedback NT moves to (or stays in) state 0.
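The access, predict, feedback cycle of the 1-bit predictor for a single branch can be sketched in Python as follows. This is only an illustration, not code from the lecture: the class name `OneBitPredictor`, the helper `count_mispredictions`, and the sample outcome sequence are hypothetical, and the state encoding (1 = predict taken, 0 = predict not taken) is assumed from the state diagram.

```python
class OneBitPredictor:
    """Hypothetical single-branch 1-bit predictor: remembers the last outcome."""

    def __init__(self, initial_taken=False):
        # Assumed encoding: 1 = predict taken, 0 = predict not taken.
        self.state = 1 if initial_taken else 0

    def predict(self):
        # Step 2 (predict): output T/NT from the current state.
        return self.state == 1

    def update(self, taken):
        # Step 3 (feedback): the state simply becomes the actual outcome.
        self.state = 1 if taken else 0


def count_mispredictions(predictor, outcomes):
    """Replay a list of actual outcomes (True = taken) and count wrong guesses."""
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


# Illustrative sequence: taken, taken, not taken, taken.
outcomes = [True, True, False, True]
misses = count_mispredictions(OneBitPredictor(initial_taken=True), outcomes)
print(misses)
```

With this sequence the single not-taken outcome costs two mispredictions: one when it happens, and one on the following taken branch, because the state has just flipped.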

  2. Branch History Table of 1-bit Predictor
     BHT is also called the Branch Prediction Buffer in the textbook.
       - Can use only one 1-bit predictor, but accuracy is low
       - BHT: use a table of 2^k simple predictors, indexed by k bits from the PC
       - Similar to a direct-mapped cache
       - More entries, more cost, but fewer conflicts and higher accuracy
       - BHT can contain complex predictors
     [Diagram: a k-bit branch address indexes the table and selects a prediction]

     1-bit BHT Weakness
     Example: in a loop, a 1-bit BHT will cause 2 mispredictions.
     Consider a loop of 9 iterations before exit:
         for (…){
           for (i=0; i<9; i++)
             a[i] = a[i] * 2.0;
         }
       - End of loop case, when it exits instead of looping as before
       - First time through the loop on the next time through the code, when it predicts exit instead of looping
       - Only 80% accuracy even if the loop branch is taken 90% of the time

     Branch Target Buffer
     Branch Target Buffer (BTB): the address of the branch is used as an index to get the prediction AND the branch target address (if taken).
       - Note: must check for a branch match now, since a wrong branch address cannot be used
     Example: BTB combined with BHT
     [Diagram: in FETCH, the PC of the instruction is compared (=?) with the Branch PC column; each entry holds Branch PC, Predicted PC, and extra prediction state bits]
       - Yes: the instruction is a branch; use the predicted PC as the next PC
       - No: branch not predicted; proceed normally (Next PC = PC+4)

     2-bit Saturating Counter
     Solution: a 2-bit scheme where the prediction changes only after two mispredictions (Figure 3.7, p. 249)
     [State diagram: states 11 and 10 Predict Taken; states 01 and 00 Predict Not Taken; transitions on T and NT]
       - Blue: stop, not taken
       - Gray: go, taken
       - Adds hysteresis to the decision making process

     Return Addresses Prediction
     The target address of a register indirect branch is hard to predict.
       - Many callers, one callee
       - Jump to multiple return addresses from a single address (no PC-target correlation)
     SPEC89: 85% of such branches are procedure returns.
     Since procedures follow a stack discipline, save return addresses in a small buffer that acts like a stack: 8 to 16 entries has a small miss rate.

     Correlating Branches
     Code example showing the potential:
         If (d==0)
           d=1;
         If (d==1)
           …
     Assembly code:
             BNEZ   R1, L1
             DADDIU R1,R0,#1
         L1: DADDIU R3,R1,#-1
             BNEZ   R3, L2
             …
         L2:
     Observation: if BNEZ1 is not taken, then d is set to 1, R3 becomes 0, and BNEZ2 is not taken either.
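The 80% figure from the 1-bit BHT weakness slide, and the benefit of the 2-bit scheme, can be checked with a short simulation. This is a sketch under stated assumptions, not the lecture's own code: the loop branch is modeled as nine taken outcomes followed by one not-taken outcome (which matches the slide's "loop 90% of the time"), the 2-bit predictor is a plain saturating counter (increment on taken, decrement on not taken) whose transitions may differ in detail from the textbook figure, and all class and function names are hypothetical.

```python
class OneBit:
    """1-bit predictor: predicts whatever the branch did last time."""

    def __init__(self):
        self.state = 1  # assumed to start in "predict taken"

    def predict(self):
        return self.state == 1

    def update(self, taken):
        self.state = int(taken)


class TwoBitSaturating:
    """2-bit saturating counter: 3 and 2 predict taken, 1 and 0 predict not taken."""

    def __init__(self):
        self.counter = 3  # assumed to start in state 11

    def predict(self):
        return self.counter >= 2

    def update(self, taken):
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)


def steady_state_misses(predictor, loop_outcomes, warmup_runs=1):
    """Train on warmup_runs executions of the loop, then count the
    mispredictions in one further execution."""
    for _ in range(warmup_runs):
        for taken in loop_outcomes:
            predictor.update(taken)
    misses = 0
    for taken in loop_outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


# Assumed model of the loop branch: taken 9 times, then not taken at exit.
loop = [True] * 9 + [False]
one_bit_misses = steady_state_misses(OneBit(), loop)
two_bit_misses = steady_state_misses(TwoBitSaturating(), loop)
print(f"1-bit: {one_bit_misses}/10 mispredicted")
print(f"2-bit: {two_bit_misses}/10 mispredicted")
```

Under these assumptions the 1-bit predictor misses twice per pass through the loop (80% accuracy), while the 2-bit counter misses only at the loop exit (90% accuracy), because a single not-taken outcome is not enough to flip its prediction.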

  3. Correlating Branch Predictor
     Idea: taken/not taken of recently executed branches is related to the behavior of the next branch (as well as the history of that branch's behavior).
       - The behavior of recent branches then selects between, say, 2 predictions of the next branch, updating just that prediction
       - (1,1) predictor: 1-bit global, 1-bit local
     [Diagram: branch address (4 bits) indexes 1-bit-per-branch local predictors; a 1-bit global branch history (0 = not taken) selects the prediction]

     Correlating Branch Predictor
     General form: (m, n) predictor
       - m bits for global history, n bits for local history
       - Records correlation between m+1 branches
       - Simple implementation: the global history can be stored in a shift register
       - Example: (2,2) predictor, 2-bit global, 2-bit local
     [Diagram: branch address (4 bits) indexes 2-bit-per-branch local predictors; a 2-bit global branch history (01 = not taken then taken) selects the prediction]

     Estimate Branch Penalty
     EX: BHT correct rate is 95%, BTB hit rate is 95%, average miss penalty is 15 cycles.
     How much is the branch penalty?

     Accuracy of Different Schemes (Figure 3.15, p. 206)
     [Figure: bar chart of the frequency of mispredictions (0% to 20%) for nasa7, matrix300, tomcatv, doducd, spice, fpppp, gcc, espresso, eqntott, li. Series: 4,096 entries with 2 bits per entry; unlimited entries with 2 bits per entry; 1,024 entries (2,2).]

     Accuracy of Return Address Predictor
     [Figure not recovered]
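The (m, n) correlating predictor described in the slides above can be sketched as a table of n-bit saturating counters, where the branch address picks a row and the m-bit global history shift register picks a counter within it. This is an illustrative sketch only: the class name `CorrelatingPredictor`, the 4-bit index taken from the word-aligned PC, the all-zero initial counters, and the alternating test pattern are assumptions, not details given in the lecture.

```python
class CorrelatingPredictor:
    """Hypothetical (m, n) predictor: m bits of global history select one of
    2**m n-bit saturating counters in the row indexed by the branch address."""

    def __init__(self, m, n, index_bits=4):
        self.history_mask = (1 << m) - 1
        self.max_count = (1 << n) - 1
        self.threshold = 1 << (n - 1)  # predict taken at or above this value
        self.index_mask = (1 << index_bits) - 1
        self.history = 0  # global history shift register (m bits)
        # One row per branch-address index, one counter per history pattern.
        self.table = [[0] * (1 << m) for _ in range(1 << index_bits)]

    def _row(self, pc):
        # Assumes word-aligned instructions: drop the low 2 bits of the PC.
        return self.table[(pc >> 2) & self.index_mask]

    def predict(self, pc):
        return self._row(pc)[self.history] >= self.threshold

    def update(self, pc, taken):
        row = self._row(pc)
        count = row[self.history]
        # Update only the counter that made this prediction.
        if taken:
            row[self.history] = min(self.max_count, count + 1)
        else:
            row[self.history] = max(0, count - 1)
        # Shift the actual outcome into the global history.
        self.history = ((self.history << 1) | int(taken)) & self.history_mask


def count_misses(predictor, pc, outcomes):
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


# A branch that alternates taken / not taken, 100 executions in total.
alternating = [True, False] * 50
plain_misses = count_misses(CorrelatingPredictor(m=0, n=2), 0x40, alternating)
corr_misses = count_misses(CorrelatingPredictor(m=2, n=2), 0x40, alternating)
print(plain_misses, corr_misses)
```

With `m=0` the structure reduces to a plain 2-bit BHT, which mispredicts every taken outcome of the alternating pattern (50 of 100 under these initial conditions). The (2,2) configuration mispredicts only 3 times during warm-up, because the history value distinguishes the two phases of the pattern and each phase trains its own counter.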
