CS252 Graduate Computer Architecture
Lecture 18: Branch Prediction + analysis resources => ILP
April 2, 2002
Prof. David E. Culler
Computer Science 252, Spring 2002

Today's Big Idea
• Reactive: past actions cause system to adapt use
  – do what you did before better
  – ex: caches
  – TCP windows
  – URL completion, ...
• Proactive: uses past actions to predict future actions
  – optimize speculatively, anticipate what you are about to do
  – branch prediction
  – long cache blocks
  – ???

Review: Case for Branch Prediction when Issue N instructions per clock cycle
1. Branches will arrive up to n times faster in an n-issue processor
2. Amdahl's Law => relative impact of the control stalls will be larger with the lower potential CPI in an n-issue processor
Conversely, need branch prediction to 'see' potential parallelism

Review: 7 Branch Prediction Schemes
1. 1-bit Branch-Prediction Buffer
2. 2-bit Branch-Prediction Buffer
3. Correlating Branch Prediction Buffer
4. Tournament Branch Predictor
5. Branch Target Buffer
6. Integrated Instruction Fetch Units
7. Return Address Predictors

Review: Dynamic Branch Prediction (Jim Smith, 1981)
• Performance = ƒ(accuracy, cost of misprediction)
• Branch History Table: lower bits of PC address index table of 1-bit values
  – Says whether or not branch taken last time
  – No address check (saves HW, but may not be right branch)
• Problem: in a loop, 1-bit BHT will cause 2 mispredictions (avg is 9 iterations before exit):
  – End of loop case, when it exits instead of looping as before
  – First time through loop on next time through code, when it predicts exit instead of looping
  – Only 80% accuracy even if loop 90% of the time
• Better Solution: 2-bit scheme where change prediction only if get misprediction twice:
  [State diagram: two "Predict Taken" states and two "Predict Not Taken" states, with transitions labeled T (taken) and NT (not taken)]
  – Red: stop, not taken
  – Green: go, taken
• Adds hysteresis to decision making process
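
The 80% versus 90% claim for a loop branch can be checked with a small simulation. The sketch below is illustrative only: it assumes the 2-bit scheme behaves as a saturating counter (the slide's state diagram may differ in its exact transitions), assumes both predictors start out predicting taken, and uses hypothetical function names.

```python
def one_bit_accuracy(outcomes):
    """1-bit scheme: predict whatever the branch did last time."""
    last = True  # assumed initial state: predict taken
    correct = 0
    for taken in outcomes:
        correct += (last == taken)
        last = taken
    return correct / len(outcomes)


def two_bit_accuracy(outcomes):
    """2-bit scheme, modeled here as a saturating counter in 0..3."""
    counter = 3  # assumed initial state: strongly taken
    correct = 0
    for taken in outcomes:
        prediction = counter >= 2  # upper half of the range predicts taken
        correct += (prediction == taken)
        # Move one step toward the actual outcome, saturating at 0 and 3.
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
    return correct / len(outcomes)


# A loop branch: taken 9 times, then not taken once at loop exit, repeated.
loop_branch = ([True] * 9 + [False]) * 1000

if __name__ == "__main__":
    print("1-bit accuracy:", one_bit_accuracy(loop_branch))
    print("2-bit accuracy:", two_bit_accuracy(loop_branch))
```

With this pattern the 1-bit predictor misses twice per loop execution (at exit and at re-entry), while the 2-bit counter misses only at exit, because a single not-taken outcome is not enough to flip its prediction.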

Correlating Branches
Idea: taken/not taken of recently executed branches is related to behavior of next branch (as well as the history of that branch behavior)
  – Then behavior of recent branches selects between, say, 4 predictions of next branch, updating just that prediction
• (2,2) predictor: 2-bit global, 2-bit local
  [Diagram: branch address (4 bits) indexes a table of 2-bits-per-branch local predictors; the 2-bit recent global branch history (01 = not taken then taken) selects which prediction is used]

Consider 3 Scenarios
• Branch for loop test
• Check for error or exception
• Alternating taken / not-taken
  – example?
• Your worst-case prediction scenario

Accuracy of Different Schemes (Figure 3.15, p. 206)
  [Bar chart: frequency of mispredictions (0% to 20%) for three predictors: 4,096 entries with 2 bits per entry, unlimited entries with 2 bits per entry, and 1,024 entries (2,2)]
What's missing in this picture?

Re-evaluating Correlation
• Several of the SPEC benchmarks have less than a dozen branches responsible for 90% of taken branches:

  program     branch %   static   # = 90%
  compress    14%        236      13
  eqntott     25%        494      5
  gcc         15%        9531     2020
  mpeg        10%        5598     532
  real gcc    13%        17361    3214

• Real programs + OS more like gcc
• Small benefits beyond benchmarks for correlation? Problems with branch aliases?
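
The (2,2) predictor from the Correlating Branches slide can be sketched as a table of 2-bit counters indexed by branch address, with the global history picking one counter out of four per row. This is a minimal sketch under assumptions not stated on the slides: the class and method names are hypothetical, counters start weakly not-taken, history starts as all not-taken, and word-aligned PCs are assumed (the low 2 bits are dropped before indexing).

```python
class CorrelatingPredictor:
    """(m, 2) correlating predictor: m bits of global history select one
    of 2**m two-bit counters in the row chosen by the branch address."""

    def __init__(self, index_bits=4, history_bits=2):
        self.index_mask = (1 << index_bits) - 1
        self.history_mask = (1 << history_bits) - 1
        self.history = 0  # assumed start: recent branches all not taken
        # One row per branch-address index, one counter per history value.
        self.counters = [[1] * (1 << history_bits)
                         for _ in range(1 << index_bits)]

    def _row(self, pc):
        # Assumes 4-byte instructions, so the low two PC bits carry no info.
        return self.counters[(pc >> 2) & self.index_mask]

    def predict(self, pc):
        return self._row(pc)[self.history] >= 2

    def update(self, pc, taken):
        row = self._row(pc)
        c = row[self.history]
        # Update only the counter that made this prediction.
        row[self.history] = min(3, c + 1) if taken else max(0, c - 1)
        # Shift the outcome into the global history register.
        self.history = ((self.history << 1) | int(taken)) & self.history_mask


def count_mispredictions(predictor, pc, outcomes):
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


# The "alternating taken / not-taken" scenario for a single branch.
alternating = [True, False] * 50
```

On the alternating scenario, a lone 2-bit counter keeps bouncing between its two middle states, whereas here each history value gets its own counter, so the predictor settles after a short warm-up.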

BHT Accuracy
• Mispredict because either:
  – Wrong guess for that branch
  – Got branch history of wrong branch when indexing the table
• With a 4096-entry table, programs vary from 1% misprediction (nasa7, tomcatv) to 18% (eqntott), with spice at 9% and gcc at 12%
• For SPEC92, 4096 entries are about as good as an infinite table

Tournament Predictors
• Motivation for correlating branch predictors is that the 2-bit predictor failed on important branches; by adding global information, performance improved
• Tournament predictors: use 2 predictors, 1 based on global information and 1 based on local information, and combine with a selector
• Hopes to select right predictor for right branch (or right context of branch)
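
The selector idea on the Tournament Predictors slide can be sketched as follows. This is a simplified illustration, not the design of any particular processor: the table sizes, the choice to index the selector by branch address, the initial counter values, and all names are assumptions made for the example.

```python
def bump(counter, up):
    """Move a 2-bit saturating counter one step up or down."""
    return min(3, counter + 1) if up else max(0, counter - 1)


class TournamentPredictor:
    def __init__(self, index_bits=10, history_bits=4):
        self.index_mask = (1 << index_bits) - 1
        self.history_mask = (1 << history_bits) - 1
        self.history = 0
        self.local_table = [1] * (1 << index_bits)     # indexed by branch address
        self.global_table = [1] * (1 << history_bits)  # indexed by global history
        self.selector = [1] * (1 << index_bits)        # >= 2 means "trust global"

    def predict(self, pc):
        i = (pc >> 2) & self.index_mask
        local_guess = self.local_table[i] >= 2
        global_guess = self.global_table[self.history] >= 2
        return global_guess if self.selector[i] >= 2 else local_guess

    def update(self, pc, taken):
        i = (pc >> 2) & self.index_mask
        local_ok = (self.local_table[i] >= 2) == taken
        global_ok = (self.global_table[self.history] >= 2) == taken
        # Train the selector only when the two predictors disagree,
        # nudging it toward whichever one was right.
        if local_ok != global_ok:
            self.selector[i] = bump(self.selector[i], global_ok)
        self.local_table[i] = bump(self.local_table[i], taken)
        self.global_table[self.history] = bump(self.global_table[self.history], taken)
        self.history = ((self.history << 1) | int(taken)) & self.history_mask


def run(predictor, pc, outcomes):
    """Return a list of booleans: True where the prediction was correct."""
    results = []
    for taken in outcomes:
        results.append(predictor.predict(pc) == taken)
        predictor.update(pc, taken)
    return results
```

For an alternating branch the local 2-bit counter never settles, so the selector drifts toward the history-indexed predictor, which learns the pattern after a few outcomes.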

Dynamically finding structure in Spaghetti
  [Figure only; the slide poses an open question: "?"]

Tournament Predictor in Alpha 21264
• 4K 2-bit counters to choose from among a global predictor and a local predictor
• Global predictor also has 4K entries and is indexed by the history of the last 12 branches; each entry in the global predictor is a standard 2-bit predictor
  – 12-bit pattern: ith bit 0 => ith prior branch not taken; ith bit 1 => ith prior branch taken
• Local predictor consists of a 2-level predictor:
  – Top level: a local history table consisting of 1024 10-bit entries; each 10-bit entry corresponds to the most recent 10 branch outcomes for the entry. A 10-bit history allows patterns of up to 10 branches to be discovered and predicted.
  – Next level: the selected entry from the local history table is used to index a table of 1K entries consisting of 3-bit saturating counters, which provide the local prediction
• Total size: 4K*2 + 4K*2 + 1K*10 + 1K*3 = 29K bits! (~180,000 transistors)

Accuracy of Branch Prediction (fig 3.40)
  [Bar chart: branch prediction accuracy (0% to 100%) on tomcatv, doduc, fpppp, li, espresso, and gcc for three schemes: profile-based, 2-bit counter, and tournament]
• Profile: branch profile from last execution (static in that it is encoded in the instruction, but based on a profile)

% of predictions from local predictor in Tournament Prediction Scheme
  [Bar chart: share of predictions taken from the local predictor (0% to 100%) for nasa7, matrix300, tomcatv, doduc, spice, fpppp, gcc, espresso, eqntott, and li]

Accuracy v. Size (SPEC89)
  [Line chart: vertical axis 0% to 10%; horizontal axis total predictor size, 0 to 128 Kbits; three curves: Local, Correlating, Tournament]

Need Address at Same Time as Prediction
• Branch Target Buffer (BTB): address of branch indexes the table to get prediction AND branch address (if taken)
  – Note: must check for branch match now, since can't use wrong branch address (Figure 3.19, 3.20)
  [Diagram: during FETCH, the PC of the instruction is compared (=?) against the stored Branch PC; each entry also holds a Predicted PC and extra prediction state bits]
  – Yes: instruction is a branch; use predicted PC as next PC
  – No: branch not predicted; proceed normally (Next PC = PC+4)
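
The BTB lookup described on the last slide can be sketched as a small direct-mapped table with a tag check. This is a minimal sketch under stated assumptions: the entry count, the policy of storing only taken branches, the omission of the extra prediction state bits, and all names are choices made for illustration rather than details from the slides.

```python
class BranchTargetBuffer:
    """Direct-mapped BTB: each slot holds (branch_pc, predicted_pc) or None."""

    def __init__(self, entries=64):
        self.entries = entries
        self.table = [None] * entries

    def _index(self, pc):
        # Assumes 4-byte instructions; low PC bits select the slot.
        return (pc >> 2) % self.entries

    def next_pc(self, pc):
        slot = self.table[self._index(pc)]
        # The "=?" check: the stored branch PC must match the fetch PC,
        # otherwise the target belongs to some other branch.
        if slot is not None and slot[0] == pc:
            return slot[1]   # predicted taken: fetch from the predicted PC
        return pc + 4        # not predicted as a branch: fall through

    def record_taken(self, pc, target):
        self.table[self._index(pc)] = (pc, target)

    def record_not_taken(self, pc):
        i = self._index(pc)
        # Evict only if the slot really belongs to this branch.
        if self.table[i] is not None and self.table[i][0] == pc:
            self.table[i] = None


if __name__ == "__main__":
    btb = BranchTargetBuffer()
    btb.record_taken(0x100, 0x200)
    print(hex(btb.next_pc(0x100)))  # hit on the recorded branch
    print(hex(btb.next_pc(0x200)))  # same slot, different PC: tag mismatch
```

Unlike the branch history table, a mismatch here cannot be tolerated: using another branch's target would send fetch to the wrong address, which is why the full branch PC is stored and compared.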