ECE 2162 Branch Prediction

Control Dependencies
• Branches are very frequent
  – Approx. 20% of all instructions
• Cannot wait until we know where it goes
  – Long pipelines
    • Branch outcome known after B cycles
    • No scheduling past the branch until outcome known
  – Superscalars (e.g., 4-way)
    • Branch every cycle or so!
    • One cycle of work, then bubbles for ~B cycles?

Surviving Branches: Prediction
• Predict branches
  – And predict them well!
• Fetch, decode, etc. on the predicted path
  – Option 1: No execute until branch resolved
  – Option 2: Execute anyway (speculation)
• Recover from mispredictions
  – Restart fetch from correct path

Branch Prediction
• Need to know two things
  – Whether the branch is taken or not (direction)
  – The target address if it is taken (target)
• Direct jumps, function calls
  – Direction known (always taken), target easy to compute
• Conditional branches (typically PC-relative)
  – Direction difficult to predict, target easy to compute
• Indirect jumps, function returns
  – Direction known (always taken), target difficult

Branch Prediction: Direction
• Needed for conditional branches
  – Most branches are of this type
• Many, many kinds of predictors for this
  – Static: fixed rule, or compiler annotation (e.g., "BEQL" is "branch if equal likely")
  – Dynamic: hardware prediction
• Dynamic prediction usually history-based
  – Example: predict direction is the same as the last time this branch was executed

Static Prediction
• Always predict NT
  – Easy to implement
  – 30-40% accuracy … not so good
• Always predict T
  – 60-70% accuracy
• Displacement based
  – Forward not taken, backward taken
  – Backward branches usually close loops, and loops usually run for a few iterations, so this is like always predicting that the loop branch is taken
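
The displacement-based rule can be sketched in a couple of lines. This is only an illustration; the function name and the idea of passing raw addresses are assumptions, not part of the slides.

```python
def predict_taken_btfn(branch_pc: int, target_pc: int) -> bool:
    """Backward taken, forward not taken: a target below the branch address
    is treated as a loop-closing branch and predicted taken."""
    return target_pc < branch_pc


# A branch at 0xDC50 jumping back to 0xDC08 looks like a loop: predict taken.
print(predict_taken_btfn(0xDC50, 0xDC08))
# A branch at 0xDC44 skipping forward to 0xDC50: predict not taken.
print(predict_taken_btfn(0xDC44, 0xDC50))
```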

One-Bit Branch Predictor
• Branch history table of 2^K entries, 1 bit per entry
• Index: K bits of the branch instruction address
• Use the indexed entry to predict this branch
  – 0: predict not taken
  – 1: predict taken
• When branch direction resolved, go back into the table and update entry: 0 if not taken, 1 if taken
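
A minimal sketch of such a table, assuming the index is simply the low K bits of the branch address (the slide does not say which bits are used). The class and method names are made up for illustration.

```python
class OneBitPredictor:
    """Branch history table with 2**k one-bit entries."""

    def __init__(self, k: int = 10):
        self.mask = (1 << k) - 1
        self.table = [0] * (1 << k)      # 0 = predict not taken, 1 = predict taken

    def _index(self, pc: int) -> int:
        return pc & self.mask            # assumption: low k bits of the address

    def predict(self, pc: int) -> bool:
        return self.table[self._index(pc)] == 1

    def update(self, pc: int, taken: bool) -> None:
        # Once the branch resolves, remember its last outcome.
        self.table[self._index(pc)] = 1 if taken else 0
```

Because only K address bits are used, two branches whose addresses share those bits also share an entry. With `k=4`, for example, 0xDC08 and 0xDC18 alias.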

One-Bit Branch Predictor (cont'd)
0xDC08: for(i=0; i < 100000; i++) {
0xDC44:     if( ( i % 100) == 0 )
                tick( );
0xDC50:     if( (i & 1) == 1)
                odd( );
        }

The Bit Is Not Enough!
• Example: short loop (8 iterations)
  – Taken 7 times, then not taken once
  – Not-taken mispredicted (was taken previously)
• Execute the same loop again
  – First always mispredicted (previous outcome was not taken)
  – Then 6 predicted correctly
  – Then last one mispredicted again
• Each fluke/anomaly in a stable pattern results in two mispredicts per loop

Examples
How often is branch outcome != previous outcome?
DC08: TTTTTTTTTTT ... TTTTTTTTTTNTTTTTTTTT … (100,000 iterations; the T→N and N→T transitions are mispredicted): 2 / 100,000 mispredicted, prediction rate 99.998%
DC44: NNNNN ... NTNNNNN … NTNNNNN …: 2 / 100 mispredicted, prediction rate 98.0%
DC50: TNTNTNTNTNTNTNTNTNTNTNTNTNTNT …: 2 / 2 mispredicted, prediction rate 0.0%
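
The counts above can be checked with a small simulation of a last-outcome predictor for a single branch. This is a sketch: the function name, the initial "not taken" state, and the shortened 1,000-iteration run are assumptions chosen to keep the example small.

```python
def last_outcome_mispredicts(outcomes, last=False):
    """Count mispredictions when each prediction is the previous outcome."""
    wrong = 0
    for taken in outcomes:
        if taken != last:
            wrong += 1
        last = taken
    return wrong


dc44 = [i % 100 == 0 for i in range(1000)]     # taken once every 100 iterations
dc50 = [(i & 1) == 1 for i in range(1000)]     # alternates every iteration
short_loop = ([True] * 7 + [False]) * 10       # 8-iteration loop, run 10 times

print(last_outcome_mispredicts(dc44))          # 2 mispredicts per 100 outcomes
print(last_outcome_mispredicts(dc50))          # nearly every outcome mispredicted
print(last_outcome_mispredicts(short_loop))    # 2 mispredicts per loop execution
```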

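Two Bits are Better Than One
[Figure: two state diagrams. FSM for last-outcome prediction: state 0 predicts NT, state 1 predicts T. FSM for 2bC (2-bit counter): states 0 and 1 predict NT, states 2 and 3 predict T. A transition on a T outcome moves up one state, a transition on an NT outcome moves down one state, saturating at 3 and 0.]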

Example
Initial training/warm-up, then a single N in a run of T outcomes (state shown before each outcome):
1bC state: 0 1 1 1 1 1 1 0 1 1 …
Outcome:   T T T T T T N T T T …
2bC state: 0 1 2 3 3 3 3 2 3 3 …
Outcome:   T T T T T T N T T T …
Only 1 mispredict per N branches now!
DC08: 99.999%   DC44: 99.0%
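
The 2bC row can be reproduced with a short sketch of a saturating 2-bit counter. The function name and return shape are illustrative assumptions; the state encoding (0 and 1 predict NT, 2 and 3 predict T) follows the FSM slide.

```python
def two_bit_trace(outcomes, state=0):
    """Return (state seen before each outcome, number of mispredictions)."""
    states, wrong = [], 0
    for taken in outcomes:
        states.append(state)
        if (state >= 2) != taken:        # states 2 and 3 predict taken
            wrong += 1
        # Saturating increment on taken, saturating decrement on not taken.
        state = min(3, state + 1) if taken else max(0, state - 1)
    return states, wrong


outcomes = [c == "T" for c in "TTTTTTNTTT"]
states, wrong = two_bit_trace(outcomes)
print(states)   # matches the 2bC row: 0 1 2 3 3 3 3 2 3 3
print(wrong)    # two warm-up mispredicts plus one for the single N
```

After the lone N the counter drops only to state 2, which still predicts taken, so the following T is predicted correctly.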

Still Not Good Enough
[Figure: prediction results annotated "These are good", "We can live with these", and "This is bad!"]

Importance of Branches
• 98% → 99%
  – Who cares?
  – Actually, it's 2% misprediction rate → 1%
  – That's a halving of the number of mispredictions
• So what?
  – If misp rate equals 50%, and 1 in 5 insts is a branch, then number of useful instructions that we can fetch is: 5*(1 + ½ + (½)^2 + (½)^3 + …) = 10
  – If we halve the miss rate down to 25%: 5*(1 + ¾ + (¾)^2 + (¾)^3 + …) = 20
  – Halving the miss rate doubles the number of useful instructions that we can try to extract ILP from
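
Both sums are instances of the same geometric series. As a sketch, write m for the misprediction rate and N(m) for the expected number of useful instructions fetched (both symbols are introduced here for illustration), keeping the slide's assumption of one branch every 5 instructions:

```latex
\[
N(m) \;=\; 5 \sum_{k=0}^{\infty} (1-m)^k \;=\; \frac{5}{1-(1-m)} \;=\; \frac{5}{m},
\qquad
N\!\left(\tfrac{1}{2}\right) = 10, \quad N\!\left(\tfrac{1}{4}\right) = 20 .
\]
```

Since N(m) is inversely proportional to m, each halving of the misprediction rate doubles the useful fetch window.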

How about the Branch at 0xDC50?
• 1bC and 2bC don't do too well (50% at best)
• But it's still obviously predictable
• Why?
  – It has a repeating pattern: (NT)*
  – How about other patterns? (TTNTN)*
• Use branch correlation
  – The outcome of a branch is often related to previous outcome(s)

Idea: Track the History of a Branch
[Figure: each PC-indexed entry stores the previous outcome of the branch plus two 2-bit counters, one used if prev=0 and one used if prev=1. The animation steps through a sequence of outcomes: the prev bit selects a counter, that counter gives the prediction (N or T), and then the selected counter and prev are updated with the actual outcome.]

Deeper History Covers More Patterns
[Figure: a PC-indexed entry holding the last 3 outcomes and eight counters, from "Counter if prev=000" through "Counter if prev=111".]
• What pattern has this branch predictor entry learned?
  – 001 → 1; 011 → 0; 110 → 0; 100 → 1
  – 00110011001… (0011)*
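
A sketch of one such entry, with the history length as a parameter. The class name, the helper function, and the choice to start every counter at 1 (weakly not-taken) are assumptions for illustration. A real predictor would hold a table of these entries indexed by PC.

```python
class LocalHistoryEntry:
    """One entry: the branch's recent outcomes plus a 2-bit counter per history value."""

    def __init__(self, history_bits: int):
        self.mask = (1 << history_bits) - 1
        self.history = 0                             # most recent outcome in the low bit
        self.counters = [1] * (1 << history_bits)    # assumed start: weakly not-taken

    def predict(self) -> bool:
        return self.counters[self.history] >= 2

    def update(self, taken: bool) -> None:
        c = self.counters[self.history]
        self.counters[self.history] = min(3, c + 1) if taken else max(0, c - 1)
        self.history = ((self.history << 1) | int(taken)) & self.mask


def steady_state_mispredicts(pattern, history_bits, warmup=100, measured=100):
    """Repeat `pattern` (a string of 'T'/'N') and count mispredicts after warm-up."""
    entry = LocalHistoryEntry(history_bits)
    wrong = 0
    for i in range(warmup + measured):
        taken = pattern[i % len(pattern)] == "T"
        if i >= warmup and entry.predict() != taken:
            wrong += 1
        entry.update(taken)
    return wrong


print(steady_state_mispredicts("NT", 0))     # plain 2-bit counter: half are wrong
print(steady_state_mispredicts("NT", 1))     # one bit of history is enough for (NT)*
print(steady_state_mispredicts("NNTT", 2))   # (0011)* needs two bits
print(steady_state_mispredicts("TTNTN", 4))  # (TTNTN)* needs four bits in this sketch
```

With fewer history bits than the pattern needs, two different continuations share one counter and some mispredicts remain, which is why (TTNTN)* is not learned with a single bit.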

Global vs. Local Branch History
• Local behavior
  – What is the predicted direction of Branch A given the outcomes of previous instances of Branch A?
• Global behavior
  – What is the predicted direction of Branch Z given the outcomes of all* previous branches A, B, …, X and Y?
* The number of previous branches tracked is limited by the history length

Why Global Correlations Exist
• Example: related branch conditions
       p = findNode(foo);
    A: if ( p is parent )
           do something;
       do other stuff; /* may contain more branches */
    B: if ( p is a child )
           do something else;
• Outcome of second branch is always opposite of the first branch

Can we do better?
• Correlating branch predictors also look at other branches for clues
[Figure: each entry holds two predictions, one used if the last branch is NT and one used if the last branch is T.]
(1,1) predictor – uses history of 1 branch and uses a 1-bit predictor

Correlating Branch Predictor
• If we use 2 branches as histories, then there are 4 possibilities (T-T, T-NT, NT-T, NT-NT).
• For each possibility, we need to use a predictor (1-bit, 2-bit).
• And this repeats for every branch.
    if (aa==2)      /* T */
        aa = 0;
    if (bb==2)      /* T */
        bb = 0;
    if (aa!=bb) {   /* NT */
        …
(2,2) branch prediction
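
A sketch of a predictor with global history and 2-bit counters, run on the aa/bb fragment. The class name, the dictionary keyed by (branch label, history), the counter start value of 1, and the input sequence are all assumptions made for illustration. "Taken" here means the `if` condition is true, matching the T/NT labels on the slide.

```python
class CorrelatingPredictor:
    """Global history of `history_bits` outcomes selects a 2-bit counter per branch."""

    def __init__(self, history_bits: int):
        self.mask = (1 << history_bits) - 1
        self.history = 0
        self.counters = {}                   # (branch label, history) -> counter

    def predict(self, pc) -> bool:
        return self.counters.get((pc, self.history), 1) >= 2

    def update(self, pc, taken: bool) -> None:
        key = (pc, self.history)
        c = self.counters.get(key, 1)
        self.counters[key] = min(3, c + 1) if taken else max(0, c - 1)
        self.history = ((self.history << 1) | int(taken)) & self.mask


def third_branch_mispredicts(history_bits, inputs):
    """Run the three branches for each (aa, bb) pair; count mispredicts of aa != bb."""
    p = CorrelatingPredictor(history_bits)
    wrong = 0
    for aa, bb in inputs:
        t1 = aa == 2
        p.update("b1", t1)
        if t1:
            aa = 0
        t2 = bb == 2
        p.update("b2", t2)
        if t2:
            bb = 0
        t3 = aa != bb
        if p.predict("b3") != t3:
            wrong += 1
        p.update("b3", t3)
    return wrong


inputs = [(2, 2), (0, 1), (1, 0), (2, 1)] * 100
print(third_branch_mispredicts(2, inputs))   # with 2 bits of global history
print(third_branch_mispredicts(0, inputs))   # plain 2-bit counter, no history
```

With two bits of global history, the T-T case gets its own counter and the third branch is mispredicted only during warm-up. Without history, the single counter is dominated by the taken cases and mispredicts roughly once per pass through the four inputs.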

Performance of Correlating Branch Prediction
• With the same number of state bits, a (2,2) predictor performs better than a noncorrelating 2-bit predictor.
• It even outperforms a 2-bit predictor with an infinite number of entries.

Other Global Correlations
• Testing same/similar conditions
  – Code might test for NULL before a function call, and the function might test for NULL again
  – Partial correlations: one branch could test for cond1, and another branch could test for cond1 && cond2 (if cond1 is false, then the second branch can be predicted as false)
  – Multiple correlations: one branch tests cond1, a second tests cond2, and a third tests cond1 ⊕ cond2 (which can always be predicted if the first two branches are known)

Tournament Predictors
• No predictor is clearly the best
  – Different branches exhibit different behaviors
    • Some "constant", some global, some local
• Idea: Let's have a predictor to predict which predictor will predict better

Tournament Hybrid Predictors
[Figure: a meta-predictor (table of 2-/3-bit counters) selects between Pred 0 and Pred 1 to produce the final prediction.]
If meta-counter MSB = 0, use pred 0, else use pred 1
Pred 0     Pred 1     Meta update
wrong      wrong      ---
wrong      correct    Inc
correct    wrong      Dec
correct    correct    ---
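
The selection and update rules can be sketched with a single 2-bit meta-counter. The class name is made up, and the slide's table of meta-counters is reduced to one counter here for brevity. The component predictions are passed in as booleans.

```python
class TournamentChooser:
    """One 2-bit meta-counter choosing between two component predictions."""

    def __init__(self):
        self.meta = 0                    # MSB = 0 (values 0, 1): use pred 0

    def choose(self, pred0: bool, pred1: bool) -> bool:
        return pred1 if self.meta >= 2 else pred0

    def update(self, pred0: bool, pred1: bool, taken: bool) -> None:
        ok0, ok1 = pred0 == taken, pred1 == taken
        if ok1 and not ok0:
            self.meta = min(3, self.meta + 1)    # Inc: pred 1 was the better one
        elif ok0 and not ok1:
            self.meta = max(0, self.meta - 1)    # Dec: pred 0 was the better one
        # Both right or both wrong: no information, leave the counter alone.


t = TournamentChooser()
t.update(False, True, True)
t.update(False, True, True)
print(t.choose(False, True))   # after two wins for pred 1, its prediction is used
```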

Direction Predictor Accuracy

Target Address Prediction
• Branch Target Buffer
  – IF stage: need to know fetch addr every cycle
  – Need target address one cycle after fetching a branch
  – For some branches (e.g., indirect) target known only after EX stage, which is way too late
  – Even easily-computed branch targets need to wait until instruction decoded and direction predicted in ID stage (still at least one cycle too late)
  – So, we have a quick-and-dirty predictor for the target that only needs the address of the branch instruction

Reduce Branch Penalty

Branch Target Buffer
• BTB indexed by instruction address
• We don't even know if it is a branch!
• If address matches a BTB entry, it is predicted to be a branch
• BTB entry tells whether it is taken (direction) and where it goes if taken
  – Direction prediction can be factored out into a separate table
• BTB takes only the instruction address, so while we fetch one instruction in the IF stage we are predicting where to fetch the next one from
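
A minimal sketch of a direct-mapped BTB. The class name, the 4-byte instruction size, the index function, and storing the full address as the tag are assumptions for illustration. The lookup needs nothing but the fetch address.

```python
class BranchTargetBuffer:
    """Direct-mapped BTB: each entry is (tag, target, predicted taken) or None."""

    def __init__(self, index_bits: int = 4, inst_bytes: int = 4):
        self.inst_bytes = inst_bytes
        self.entries = [None] * (1 << index_bits)

    def _index(self, pc: int) -> int:
        return (pc // self.inst_bytes) % len(self.entries)

    def next_fetch(self, pc: int) -> int:
        """Predict the next fetch address from the current fetch address alone."""
        entry = self.entries[self._index(pc)]
        if entry is not None and entry[0] == pc and entry[2]:
            return entry[1]              # hit and predicted taken: fetch the target
        return pc + self.inst_bytes      # miss or predicted not taken: fall through

    def update(self, pc: int, target: int, taken: bool) -> None:
        """Record a resolved branch, replacing whatever shared its entry."""
        self.entries[self._index(pc)] = (pc, target, taken)


btb = BranchTargetBuffer()
print(hex(btb.next_fetch(0xDC44)))   # no entry yet: fall through to 0xdc48
btb.update(0xDC44, 0xDC50, True)
print(hex(btb.next_fetch(0xDC44)))   # hit: predicted target 0xdc50
```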

Branch Target Buffer

BTB Operations

Return Address Stack (RAS)
• Function returns are frequent, yet
  – Address is difficult to compute (have to wait until EX stage done to know it)
  – Address difficult to predict with BTB (function can be called from multiple places)
• But return address is actually easy to predict
  – It is the address after the last call instruction that we haven't returned from yet
  – Hence the Return Address Stack

Return Address Stack (RAS)
• Call pushes return address onto the RAS
• When a return instruction is decoded, pop the predicted return address from the RAS
• Accurate prediction even with a small RAS
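
A sketch of the push/pop behavior. The class and method names, the fixed 4-byte call instruction, and the drop-oldest overflow policy are assumptions, since the slides do not specify them.

```python
class ReturnAddressStack:
    """Small fixed-depth stack of predicted return addresses."""

    def __init__(self, depth: int = 8, inst_bytes: int = 4):
        self.depth = depth
        self.inst_bytes = inst_bytes
        self.stack = []

    def on_call(self, call_pc: int) -> None:
        if len(self.stack) == self.depth:
            self.stack.pop(0)                         # assumed policy: drop the oldest entry
        self.stack.append(call_pc + self.inst_bytes)  # address after the call

    def on_return(self):
        """Predicted return address, or None if the stack is empty."""
        return self.stack.pop() if self.stack else None


ras = ReturnAddressStack()
ras.on_call(0x100)            # outer call
ras.on_call(0x200)            # nested call
print(hex(ras.on_return()))   # inner return goes back to 0x204
print(hex(ras.on_return()))   # outer return goes back to 0x104
```

Predictions stay correct as long as the call depth does not exceed the stack depth. Deeper nesting loses the oldest return addresses in this sketch.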
