Branch prediction: Jim, Yale, André, Daniel and the others

1 Branch prediction: Jim, Yale, André, Daniel and the others André Seznec Daniel A. Jiménez

2 Title genuinely inspired by: 4 stars, but many other actors Yeh, Pan, Evers, Young, McFarling, Michaud, Stark, Loh, Sprangle, Mudge, Kaeli, Skadron and many others

3 Prehistory • As soon as one considers pipelining, branches are a performance issue • I was told that IBM considered the problem as early as the late 50’s.

4 Jim ”Let us predict the branches”

5 History begins • Jim Smith (1981): A study of branch prediction strategies • Introduced: dynamic branch prediction, PC-based prediction, 2-bit counter (2bc) prediction • 2bc prediction performs quite well
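A minimal sketch of 2-bit counter prediction, assuming a table indexed with the low bits of the PC. The table size, the weakly-taken initial state, the class name and the toy loop trace are illustrative assumptions, not details from the 1981 paper.

```python
class TwoBitCounterPredictor:
    """PC-indexed table of 2-bit saturating counters (values 0..3)."""

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        # Assumption: every counter starts in the weakly-taken state (2).
        self.table = [2] * (1 << index_bits)

    def predict(self, pc):
        # Values 2 and 3 predict taken, 0 and 1 predict not taken.
        return self.table[pc & self.mask] >= 2

    def update(self, pc, taken):
        i = pc & self.mask
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


def count_mispredictions(predictor, trace):
    """Predict, compare with the real outcome, then train, for each branch."""
    misses = 0
    for pc, taken in trace:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


# Hypothetical loop branch at PC 0x40: taken 9 times, then not taken, 10 times over.
loop_trace = [(0x40, i % 10 != 9) for i in range(100)]
loop_misses = count_mispredictions(TwoBitCounterPredictor(), loop_trace)
print(loop_misses)
```

On this toy trace only the loop exits are mispredicted: the counter drops from 3 to 2 at each exit and still predicts taken when the loop is re-entered.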

6 ”let us use branch history”

7 By 1990, (very) efficient branch prediction became urgent • Deep pipeline: 10 cycles • Superscalar execution: 4 inst/cycle • Out-of-order execution: 50-100 instructions in flight considered possible • Nowadays: much more!

8 Two level history • Tse-Yu Yeh and Yale Patt 91: not just the 2-bit counters indexed by PC, but also the past • Of this branch: local history • Of all branches: global history ☞ global control flow path

9 Global branch history • Yeh and Patt 91; Pan, So, Rahmeh 92 • B1: if cond1 • B2: if cond2 • B3: if cond1 and cond2 • B1 and B2 outputs determine B3 output • Global history: vector of bits (T/NT) representing the past branches • Table indexed by PC + global history
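The B1/B2/B3 example can be sketched as follows. This is an illustration under assumptions: a Python dict keyed by (PC, global history) stands in for the hardware table, the history is 2 bits long, and the PCs 0xB1, 0xB2, 0xB3 are made up.

```python
from itertools import product


class GlobalHistoryPredictor:
    """2-bit counters selected by PC plus the last few branch outcomes."""

    def __init__(self, history_bits=2):
        self.hmask = (1 << history_bits) - 1
        self.ghist = 0          # newest outcome in bit 0
        self.counters = {}      # (pc, ghist) -> 2-bit counter, default weakly taken

    def predict(self, pc):
        return self.counters.get((pc, self.ghist), 2) >= 2

    def update(self, pc, taken):
        key = (pc, self.ghist)
        c = self.counters.get(key, 2)
        self.counters[key] = min(3, c + 1) if taken else max(0, c - 1)
        # Shift the outcome into the global history (T = 1, NT = 0).
        self.ghist = ((self.ghist << 1) | int(taken)) & self.hmask


def b3_mispredictions(rounds=100):
    """Count mispredictions of B3 only, cycling through all (cond1, cond2)."""
    predictor = GlobalHistoryPredictor(history_bits=2)
    misses = 0
    for _ in range(rounds):
        for cond1, cond2 in product((False, True), repeat=2):
            trace = ((0xB1, cond1), (0xB2, cond2), (0xB3, cond1 and cond2))
            for pc, outcome in trace:
                if pc == 0xB3 and predictor.predict(pc) != outcome:
                    misses += 1
                predictor.update(pc, outcome)
    return misses


b3_misses = b3_mispredictions()
print(b3_misses)
```

When B3 is predicted, the 2-bit history holds exactly the outcomes of B1 and B2, so after one training pass per history value B3 is predicted correctly.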

10 Local history • Yeh and Patt 91 • for (i=0; i<100; i++) for (j=0; j<4; j++) loop body • Look at the 3 last occurrences: if all loop backs then loop exit, otherwise loop back • A local history per branch • Table of counters indexed with PC + local history • Loop count is a particular form of local history
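The inner-loop example above can be sketched like this, assuming a 3-bit local history per branch and dicts in place of the hardware tables; the PC value and the names are hypothetical.

```python
class LocalHistoryPredictor:
    """2-bit counters selected by PC plus this branch's own recent outcomes."""

    def __init__(self, history_bits=3):
        self.hmask = (1 << history_bits) - 1
        self.local_hist = {}    # pc -> last outcomes of this branch
        self.counters = {}      # (pc, local history) -> 2-bit counter

    def predict(self, pc):
        h = self.local_hist.get(pc, 0)
        return self.counters.get((pc, h), 2) >= 2

    def update(self, pc, taken):
        h = self.local_hist.get(pc, 0)
        c = self.counters.get((pc, h), 2)
        self.counters[(pc, h)] = min(3, c + 1) if taken else max(0, c - 1)
        self.local_hist[pc] = ((h << 1) | int(taken)) & self.hmask


def inner_loop_misses(outer_iterations=100):
    """Inner loop of 4 iterations: the loop-back branch goes T, T, T, NT."""
    predictor = LocalHistoryPredictor(history_bits=3)
    inner_pc = 0x80             # hypothetical PC of the inner loop-back branch
    misses = 0
    for _ in range(outer_iterations):
        for j in range(4):
            taken = j < 3       # loop back three times, then exit
            if predictor.predict(inner_pc) != taken:
                misses += 1
            predictor.update(inner_pc, taken)
    return misses


local_misses = inner_loop_misses()
print(local_misses)
```

In this sketch the only misprediction is the very first loop exit; once the entry for history "three loop backs" has been trained, every exit is predicted.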

11 Nowadays most predictors exploit: • Global path/branch history • Some form of local history

12 Branch prediction: hot research topic in the late 90’s • McFarling 1993: Gshare (hashing PC and history) + hybrid predictors • « Dealiased » predictors, reducing table conflicts impact: Bimode, e-gskew, Agree (1997) • Essentially relied on 2-bit counters
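A minimal sketch of the gshare idea: the PC and the global history are hashed together (here with XOR) to index a single table of 2-bit counters. The table size, the initial counter state and the alternating test branch are assumptions made for illustration.

```python
class Gshare:
    """One table of 2-bit counters indexed with PC XOR global history."""

    def __init__(self, index_bits=12):
        self.mask = (1 << index_bits) - 1
        self.table = [2] * (1 << index_bits)   # assumption: start weakly taken
        self.ghist = 0                         # newest outcome in bit 0

    def index(self, pc):
        return (pc ^ self.ghist) & self.mask

    def predict(self, pc):
        return self.table[self.index(pc)] >= 2

    def update(self, pc, taken):
        i = self.index(pc)
        self.table[i] = min(3, self.table[i] + 1) if taken else max(0, self.table[i] - 1)
        self.ghist = ((self.ghist << 1) | int(taken)) & self.mask


def miss_flags(predictor, trace):
    """Return one boolean per branch: True where the prediction was wrong."""
    flags = []
    for pc, taken in trace:
        flags.append(predictor.predict(pc) != taken)
        predictor.update(pc, taken)
    return flags


# Hypothetical branch that alternates taken / not taken: a plain 2-bit
# counter cannot track it, but the history bits separate the two cases.
alternating = [(0x40, k % 2 == 0) for k in range(100)]
gshare_flags = miss_flags(Gshare(index_bits=4), alternating)
print(sum(gshare_flags))
```

With a 4-bit history the sketch mispredicts only while the history warms up; in steady state the two alternating history patterns map to two different counters.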

13 Two level history predictors • Generalized usage by the end of the 90’s • Hybrid predictors (e.g. Alpha EV6).

14 A few other highly mentionable folks • Marius Evers (from Yale’s group) showed: power of hybrid predictors to fight aliasing and improve accuracy; most branches predictable with just a few selected ghist bits; potential of long global histories to improve accuracy • Jared Stark (also Yale’s): variable length path BP (long histories, pipelined design); implements these crazy things for Intel, laughs heartily when I ask him how it works • Trevor Mudge could have his own section: many contributions to mitigating aliasing; more good analysis of branch correlation; cool analysis of branch prediction through compression

15 ”let us apply machine learning”

16 A UFO: the perceptron predictor • Jiménez and Lin 2001 • Branch history as (-1,+1) • Signed 8-bit integer weights • Σ of weight × history bit; sign = prediction • Update on mispredictions or if |SUM| < threshold
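A sketch of the perceptron predictor as the slide describes it: history bits as -1/+1, signed 8-bit weights, prediction from the sign of the sum, training on a misprediction or when the sum is below a threshold. The table size, the history length and the default threshold formula floor(1.93 × h + 14) (the value usually quoted from the original work) are assumptions here.

```python
class PerceptronPredictor:
    """Table of perceptrons, one selected per branch by PC."""

    def __init__(self, history_length=8, num_perceptrons=64, threshold=None):
        self.n = num_perceptrons
        # Assumption: default threshold as commonly quoted for this predictor.
        self.theta = int(1.93 * history_length + 14) if threshold is None else threshold
        # weights[i][0] is the bias weight, weights[i][1:] pair with the history.
        self.weights = [[0] * (history_length + 1) for _ in range(num_perceptrons)]
        self.history = [-1] * history_length   # +1 = taken, -1 = not taken

    def output(self, pc):
        w = self.weights[pc % self.n]
        return w[0] + sum(wi * xi for wi, xi in zip(w[1:], self.history))

    def predict(self, pc):
        return self.output(pc) >= 0

    def update(self, pc, taken):
        y = self.output(pc)
        t = 1 if taken else -1
        # Train on a misprediction or when the sum is not confident enough.
        if (y >= 0) != taken or abs(y) < self.theta:
            w = self.weights[pc % self.n]
            w[0] = max(-128, min(127, w[0] + t))
            for i, x in enumerate(self.history, start=1):
                # Strengthen weights whose history bit agreed with the outcome.
                w[i] = max(-128, min(127, w[i] + t * x))
        self.history = self.history[1:] + [t]


def perceptron_miss_flags(predictor, trace):
    flags = []
    for pc, taken in trace:
        flags.append(predictor.predict(pc) != taken)
        predictor.update(pc, taken)
    return flags


# Hypothetical alternating branch: the outcome is the opposite of the last one.
alternating = [(0x40, k % 2 == 0) for k in range(300)]
perceptron_flags = perceptron_miss_flags(PerceptronPredictor(), alternating)
print(sum(perceptron_flags))
```

The multiply in `output` is only a sign flip because the inputs are -1 or +1, so hardware can use an adder tree instead of multipliers.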

17 (Initial) perceptron predictor • Competitive accuracy • High hardware complexity and latency • Often better than classical predictors • Intellectually challenging

18 Rapidly evolved to • Can combine predictions: global path/branch history, local history, multiple history lengths • 4 out of 5 CBP-1 (2004) finalists based on perceptron, including the winner (Gao and Zhou) • Oracle, AMD, Samsung use perceptron (Zen 2 - .. added TAGE)

19 Path-Based Perceptron (2003, 2005) • Path-based predictor reduces latency and improves accuracy • Turns out (2005) it also eliminates the linear separability problem

20 Scaled Neural Analog Predictor (2008) Mixed-signal implementation allows weight scaling, power savings, very low latency

21 Multiperspective Perceptron Predictor (2016) • Traditional perceptron: few perspectives (global and local history) • New idea: multiple perspectives: global/local plus many new features, e.g. recency position, blurry path, André’s IMLI, modulo path, etc. • Greatly improved accuracy • Can combine with TAGE • Work continues…

22 ”let us use very long histories”

23 In the old world

24 EV8 predictor: (derived from) 2bc-gskew • Seznec et al, ISCA 2002 (1999) • e-gskew: Michaud et al 97 • Learnt that: very long path correlations exist; they can be captured

25 In the new world

26 An answer • The geometric length predictors: GEHL and TAGE

27 The basis: a multiple length global history predictor • [Figure: tables T0 to T4, indexed with global history lengths L(0) to L(4), feeding a combining function marked “?”] • With a limited number of tables

28 Underlying idea • H and H’: two history vectors equal on N bits, but differing on bit N+1, e.g. L(1) ≤ N < L(2) • Branches (A,H) and (A,H’) biased in opposite directions • Table T2 should allow discriminating between (A,H) and (A,H’)

29 GEometric History Length predictor • The set of history lengths forms a geometric series: $L(0) = 0$, $L(i) = a^{i-1} L(1)$, e.g. {0, 2, 4, 8, 16, 32, 64, 128} • What is important: L(i)-L(i-1) is drastically increasing • Spends most of the storage for short history!!
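The series on this slide can be computed directly. In this small sketch the function name and the round-to-nearest step are assumptions; the parameters a = 2 and L(1) = 2 are chosen to reproduce the set shown on the slide.

```python
def geometric_history_lengths(num_tables, first_length, ratio):
    """L(0) = 0 and L(i) = ratio**(i-1) * L(1), rounded to an integer."""
    lengths = [0]
    for i in range(1, num_tables):
        lengths.append(int(ratio ** (i - 1) * first_length + 0.5))
    return lengths


lengths = geometric_history_lengths(num_tables=8, first_length=2, ratio=2.0)
# Gap between consecutive history lengths, L(i) - L(i-1).
gaps = [b - a for a, b in zip(lengths, lengths[1:])]
print(lengths)
print(gaps)
```

The gaps grow geometrically too, which is the point made on the slide: most tables, and so most of the storage, cover short histories, while a few tables reach very long ones.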

30 GEHL (2004): prediction through an adder tree • [Figure: tables T0 to T4, indexed with history lengths L(0) to L(4), feed an adder tree Σ; prediction = sign of the sum] • Using the perceptron idea with geometric histories

31 TAGE (2006): prediction through partial match • [Figure: a tagless base predictor indexed with pc, plus tagged tables indexed with pc and h[0:L1], h[0:L2], h[0:L3]; each tagged entry holds ctr, tag, u; tag comparisons (=?) select the prediction]

32 The Geometric History Length Predictors • Tree adder: O-GEHL, Optimized GEometric History Length predictor (CBP-1, 2004, best practice award) • Partial match: TAGE, TAgged GEometric history length predictor: inspired from PPM-like (Michaud 2004) + geometric length + optimized update policy; basis of the CBP-2,-3,-4,-5 winners

33 GEHL (CBP-1, 2004) • Perceptron-inspired • Eliminate the multiply-add • Geometric history length: 4 to 12 tables • Dynamic threshold fitting: Jiménez considers this the most important contribution to perceptron learning • 6-bit counters appear as a good trade-off
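A rough sketch of the GEHL idea: several tables of signed counters, each indexed with the PC and a different global history length, summed to get the prediction. It is an illustration under assumptions: the XOR-folding hash, the table sizes and the fixed update threshold are made up, and dynamic threshold fitting is not modelled.

```python
class GehlSketch:
    """Sum of signed counters read from tables with different history lengths."""

    def __init__(self, lengths=(0, 2, 4, 8), index_bits=10, counter_bits=6, threshold=5):
        self.lengths = lengths
        self.index_bits = index_bits
        self.mask = (1 << index_bits) - 1
        self.cmax = (1 << (counter_bits - 1)) - 1
        self.cmin = -(1 << (counter_bits - 1))
        self.threshold = threshold              # assumption: fixed, not fitted
        self.tables = [[0] * (1 << index_bits) for _ in lengths]
        self.ghist = 0                          # newest outcome in bit 0

    def _indices(self, pc):
        indices = []
        for length in self.lengths:
            h = self.ghist & ((1 << length) - 1)
            folded = 0
            while h:                            # hypothetical hash: XOR-fold the history
                folded ^= h & self.mask
                h >>= self.index_bits
            indices.append((pc ^ folded) & self.mask)
        return indices

    def _sum(self, pc):
        return sum(table[i] for table, i in zip(self.tables, self._indices(pc)))

    def predict(self, pc):
        return self._sum(pc) >= 0               # prediction = sign of the sum

    def update(self, pc, taken):
        total = self._sum(pc)
        # Perceptron-style rule: train on a misprediction or a weak sum.
        if (total >= 0) != taken or abs(total) < self.threshold:
            step = 1 if taken else -1
            for table, i in zip(self.tables, self._indices(pc)):
                table[i] = max(self.cmin, min(self.cmax, table[i] + step))
        self.ghist = ((self.ghist << 1) | int(taken)) & ((1 << max(self.lengths)) - 1)


def gehl_miss_flags(predictor, trace):
    flags = []
    for pc, taken in trace:
        flags.append(predictor.predict(pc) != taken)
        predictor.update(pc, taken)
    return flags


# Hypothetical loop-back branch with the pattern T, T, T, NT repeated.
pattern = [(0x40, k % 4 != 3) for k in range(400)]
gehl_flags = gehl_miss_flags(GehlSketch(), pattern)
print(sum(gehl_flags))
```

Because each table contributes a single counter, the "weights times inputs" of the perceptron reduce to a plain sum, which is what eliminating the multiply-add refers to.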

34 Doing better: TAGE • Partial tag match (almost ..) • Geometric history length • Very effective update policy

35 [Figure: TAGE lookup example; the tag comparisons (=?) are labelled Miss, Hit, Hit; one hitting component provides Pred and another provides Altpred]

36 TAGE update policy • Minimize the footprint of the prediction • Just update the longest history matching component • Allocate at most one otherwise useless entry on a misprediction
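The lookup and the update policy can be sketched as below. This is a heavily simplified illustration, not the published TAGE: the index and tag hashes, the table sizes, the 3-bit counters and the single useful bit are assumptions, and the aging of useful bits is left out.

```python
class TageSketch:
    """Tagless base table plus tagged tables with growing history lengths."""

    def __init__(self, lengths=(2, 4, 8), index_bits=8, tag_bits=8):
        self.lengths = lengths
        self.imask = (1 << index_bits) - 1
        self.tmask = (1 << tag_bits) - 1
        self.base = [2] * (1 << index_bits)     # tagless 2-bit counters
        # Tagged entry: [ctr in -4..3, tag (None = empty), u (useful bit)].
        self.tables = [[[0, None, 0] for _ in range(1 << index_bits)] for _ in lengths]
        self.ghist = 0                          # newest outcome in bit 0

    def _slot(self, pc, t):
        h = self.ghist & ((1 << self.lengths[t]) - 1)
        index = (pc ^ h) & self.imask           # hypothetical index hash
        tag = ((pc >> 2) ^ (h * 3)) & self.tmask  # hypothetical tag hash
        return self.tables[t][index], tag

    def _lookup(self, pc):
        hits = []
        for t in range(len(self.lengths)):
            entry, tag = self._slot(pc, t)
            if entry[1] == tag:
                hits.append((t, entry))
        base_pred = self.base[pc & self.imask] >= 2
        if not hits:
            return None, None, base_pred, base_pred
        provider, entry = hits[-1]              # longest matching history
        altpred = hits[-2][1][0] >= 0 if len(hits) > 1 else base_pred
        return provider, entry, entry[0] >= 0, altpred

    def predict(self, pc):
        return self._lookup(pc)[2]

    def update(self, pc, taken):
        provider, entry, pred, altpred = self._lookup(pc)
        if entry is None:
            b = pc & self.imask
            self.base[b] = min(3, self.base[b] + 1) if taken else max(0, self.base[b] - 1)
        else:
            # Only the longest matching component is updated.
            entry[0] = min(3, entry[0] + 1) if taken else max(-4, entry[0] - 1)
            if pred != altpred:
                entry[2] = 1 if pred == taken else 0
        if pred != taken:
            # Allocate at most one not-useful entry in a longer-history table.
            first = 0 if provider is None else provider + 1
            for t in range(first, len(self.lengths)):
                candidate, tag = self._slot(pc, t)
                if candidate[2] == 0:
                    candidate[0], candidate[1], candidate[2] = (0 if taken else -1), tag, 0
                    break
        self.ghist = ((self.ghist << 1) | int(taken)) & ((1 << max(self.lengths)) - 1)


def tage_miss_flags(predictor, trace):
    flags = []
    for pc, taken in trace:
        flags.append(predictor.predict(pc) != taken)
        predictor.update(pc, taken)
    return flags


# Hypothetical loop-back branch with the pattern T, T, T, NT repeated.
pattern = [(0x40, k % 4 != 3) for k in range(400)]
tage_flags = tage_miss_flags(TageSketch(), pattern)
print(sum(tage_flags))
```

On this toy pattern the sketch stops mispredicting once an entry in the 4-bit-history table has been allocated for the loop exit; the base table keeps serving the easy cases.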

37 TAGE vs OGEHL • Rule of thumb: at equivalent storage budget, 10 % fewer mispredictions with TAGE

38 Hybrid is nice

39 From CBP 2011, « the Statistical Corrector targets » • Branches with poor correlation with history: sometimes better predicted by a single wide PC-indexed counter than by TAGE • More generally, track cases such that: « For this (PC, history, prediction, confidence), TAGE is likely (>50 %) to mispredict » statistically

40 TAGE-GSC (CBP 2011) (was named a posteriori in Micro 2015) • ≈3-5% MPKI reduction • [Figure: TAGE, the main predictor indexed with PC + global history, passes its prediction and confidence to the statistical corrector predictor, which is also indexed with PC + global history] • Just a global history neural predictor + tables indexed with PC, TAGE prediction and confidence

41 TAGE-SC • Micro 2011, CBP4, CBP5 • Use any (relevant) source of information at the entry of the statistical corrector: global history, local history, IMLI counter (Micro 2015) • TAGE-SC = Multiperspective perceptron + TAGE

42 A BP research summary (CBP1 traces) • 2-bit counters 1981: 8.55 misp/KI • No real work before 1991: win 37 % • Gshare 1993: 5.30 misp/KI • Hot topic, heroic efforts: win 28 % • EV8-like 2002 (1999): 3.80 misp/KI • The perceptron era, a few actors: win 25 % • CBP-1 2004: 2.82 misp/KI • TAGE introduction: win 10 % • TAGE 2006: 2.58 misp/KI • A hobby for AS and DJ: win 10 % • TAGE-SC 2016: 2.36 misp/KI
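The "win" figures on this slide are round numbers. As a quick check, the relative reduction between consecutive milestones can be recomputed from the misp/KI values quoted above; only the slide's own numbers are used, and the variable and function names are illustrative.

```python
# (predictor, year, mispredictions per kilo-instruction) as quoted on the slide.
milestones = [
    ("2-bit counters", 1981, 8.55),
    ("Gshare", 1993, 5.30),
    ("EV8-like", 2002, 3.80),
    ("CBP-1", 2004, 2.82),
    ("TAGE", 2006, 2.58),
    ("TAGE-SC", 2016, 2.36),
]


def relative_reductions(rows):
    """Percentage of mispredictions removed by each step over the previous one."""
    return [
        (new[0], 100.0 * (old[2] - new[2]) / old[2])
        for old, new in zip(rows, rows[1:])
    ]


reductions = relative_reductions(milestones)
for name, percent in reductions:
    print(f"{name}: {percent:.1f} % fewer mispredictions than the previous milestone")

# Overall reduction from the 1981 baseline to the 2016 predictor.
overall = 100.0 * (milestones[0][2] - milestones[-1][2]) / milestones[0][2]
print(f"overall: {overall:.1f} %")
```

Recomputed this way the steps come out at roughly 38 %, 28 %, 26 %, 8.5 % and 8.5 %, close to the rounded wins on the slide.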

43 Future of branch prediction research? • See the limit study at CBP-5: about 30 % misp. gap from 512K to unlimited • New workloads are challenging: server, mobile, web • These were in CBP-5, expected in CBP-6 • Need other new ideas to go further: information source? Some better way to extract correlation? Deep learning?
