Advanced branch prediction algorithms
Ryan Gabrys, Ilya Kolykhmatov

Context
• Branches are frequent: 15-25%
• A branch predictor allows the processor to speculatively fetch and execute instructions down the predicted path
• Predictor accuracy is more important for deeper pipelines
  – Pentium 4 with Prescott core pipeline has 31 stages
  – A lot of cycles can be wasted on misprediction:
    • No speculative state may commit
    • Squash instructions in the pipeline
    • Must not allow stores in the pipeline to occur
    • Need to handle exceptions appropriately
  – Pentium III branch penalties:
    • Not taken: no penalty
    • Correctly predicted taken: 1 cycle
    • Mispredicted: at least 9 cycles, as many as 26, average 10-15 cycles
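The penalty figures above can be turned into a rough cost estimate. The sketch below is a back-of-the-envelope model, not something taken from the slides: it assumes every misprediction costs a fixed number of cycles and ignores the 1-cycle taken penalty. The function name and the 5% misprediction rate are illustrative assumptions.

```python
def mispredict_cpi_overhead(branch_fraction, mispredict_rate, penalty_cycles):
    """Extra cycles per instruction lost to branch mispredictions."""
    return branch_fraction * mispredict_rate * penalty_cycles


# 20% branches (inside the 15-25% range), an ASSUMED 5% misprediction rate,
# and a 12-cycle penalty (inside the quoted 10-15 cycle average).
overhead = mispredict_cpi_overhead(0.20, 0.05, 12)
print(f"extra CPI from mispredictions: {overhead:.3f}")
```

Under these assumptions, halving the misprediction rate halves the overhead, which is why accuracy matters more as the penalty (pipeline depth) grows.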

Branch prediction schemes
Tradeoff: accuracy (larger tables, more logic) vs. latency (smaller tables, less logic)

Dynamic branch prediction with perceptrons (2001)
Daniel A. Jimenez and Calvin Lin

Conditional branch prediction as a machine learning problem
• The machine learns to predict conditional branches
• So why not apply a machine learning algorithm?
• Artificial neural networks
  – Simple model of neural networks in brain cells
  – Learn to recognize and classify patterns
• Perceptron – simplest neural network with better accuracy than any previously known predictor

Branch-predicting perceptron
[Figure: perceptron with branch history inputs (…, 1, –1, 1, 1, 1) and weights learned by on-line training; predict taken if y ≥ 0]
• Training finds correlations between history and outcome
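The prediction rule on this slide can be sketched in a few lines. This is a minimal illustration, assuming the usual encoding of 1 for taken and -1 for not taken and a bias weight stored at index 0; the function names and the weight values are made up for the example.

```python
def perceptron_output(weights, history):
    """weights[0] is the bias w_0; weights[1:] pair up with the history bits."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], history))


def predict_taken(weights, history):
    """Predict taken if y >= 0."""
    return perceptron_output(weights, history) >= 0


history = [1, -1, 1, 1, 1]        # 1 = taken, -1 = not taken
weights = [1, 3, -2, 0, 1, -4]    # illustrative values only
print(perceptron_output(weights, history), predict_taken(weights, history))
```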

Organization of the perceptron predictor
[Figure: predictor organization, including a hash stage]

Training algorithm
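The body of this slide did not survive extraction. As a hedged reconstruction, the sketch below follows the commonly described perceptron-predictor update: adjust the weights when the prediction was wrong or when |y| does not exceed the threshold θ. The class name, table size, history length, θ value and the modulo "hash" are assumptions chosen for illustration.

```python
class PerceptronPredictor:
    def __init__(self, n_perceptrons=64, history_len=8, theta=4):
        self.theta = theta
        # One weight vector per table entry; index 0 holds the bias weight.
        self.table = [[0] * (history_len + 1) for _ in range(n_perceptrons)]
        self.history = [-1] * history_len   # most recent outcome first

    def _row(self, pc):
        # Hypothetical hash: branch address modulo the table size.
        return self.table[pc % len(self.table)]

    def output(self, pc):
        w = self._row(pc)
        return w[0] + sum(wi * xi for wi, xi in zip(w[1:], self.history))

    def predict(self, pc):
        return self.output(pc) >= 0

    def train(self, pc, taken):
        y = self.output(pc)
        t = 1 if taken else -1
        # Update on a misprediction or when the output is not yet confident.
        if (y >= 0) != taken or abs(y) <= self.theta:
            w = self._row(pc)
            w[0] += t
            for i, xi in enumerate(self.history, start=1):
                w[i] += t * xi
        # Shift the actual outcome into the global history.
        self.history = [t] + self.history[:-1]


predictor = PerceptronPredictor()
for k in range(2000):                 # loop branch: taken 3 times, then not taken
    predictor.train(0x40, k % 4 != 3)
print(predictor.predict(0x40))
```

Each weight moves toward agreement (+1) or disagreement (-1) between its history bit and the outcome, which is how training picks up the correlations mentioned earlier.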

What do the weights mean?
Correlating weights w_1, …, w_n:
  – w_i is proportional to the probability that the predicted branch agrees with the i-th branch in the history
Bias weight w_0:
  – Proportional to the probability that the branch is taken
  – Doesn't take into account other branches
What's θ?
  – Training threshold: keeps the perceptron from overtraining, so it can adapt quickly to changing behavior

Mathematical intuition
• A perceptron defines a hyperplane in (n+1)-dimensional space: y = w_0 + w_1 x_1 + … + w_n x_n
• In 2D space we have the equation of a line: y = mx + b
• In 3D, we have the equation of a plane: z = ax + by + c
• This hyperplane forms a decision surface separating predicted taken from predicted not taken instances
• This surface intersects the feature space. It is a linear surface, e.g. a line in 2D, a plane in 3D, a 3D hyperplane in 4D…

Example: AND
Representation of the AND function:
[Figure: 2×2 grid of outcomes over B-1 (not taken / taken) and B-2 (taken / not taken)]
A linear decision surface (i.e. a plane in 3D space) intersecting the feature space (i.e. the 2D plane where z = 0) separates not taken from taken instances.

Example: AND
• Watch a perceptron learn the AND function:
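The animation from this slide is not recoverable, but the experiment is easy to reproduce. The sketch below trains a two-input perceptron on AND, assuming the -1/+1 encoding and a threshold of θ = 1; the function names and parameter values are illustrative, not taken from the slides.

```python
def train_perceptron(samples, theta=1, epochs=20):
    """samples: list of ((x1, x2), t) with every value in {-1, +1}."""
    w = [0, 0, 0]                       # w[0] is the bias weight
    for _ in range(epochs):
        for (x1, x2), t in samples:
            y = w[0] + w[1] * x1 + w[2] * x2
            # Update on a wrong prediction or a low-confidence output.
            if (y >= 0) != (t == 1) or abs(y) <= theta:
                w[0] += t
                w[1] += t * x1
                w[2] += t * x2
    return w


def classify(w, x1, x2):
    return 1 if w[0] + w[1] * x1 + w[2] * x2 >= 0 else -1


# "Taken" (+1) only when both earlier branches were taken.
AND = [((-1, -1), -1), ((-1, 1), -1), ((1, -1), -1), ((1, 1), 1)]
w = train_perceptron(AND)
print(w, [classify(w, *x) == t for x, t in AND])
```

Because AND is linearly separable, training settles on weights that classify all four inputs correctly.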

Example: XOR
Decision surface:
                      if (a) not taken     if (a) taken
  if (b) taken        if (x) taken         if (x) not taken
  if (b) not taken    if (x) not taken     if (x) taken

Example: XOR
• Watch a perceptron try to learn XOR
• Perceptron cannot learn such linearly inseparable functions
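The failure on XOR can be checked directly. The sketch below applies the same kind of update rule to XOR and records the best accuracy seen at any point during training; the encoding, θ = 1, the epoch count and all names are assumptions made for the example.

```python
# "Taken" (+1) when exactly one of the two earlier branches was taken.
XOR = [((-1, -1), -1), ((-1, 1), 1), ((1, -1), 1), ((1, 1), -1)]


def count_correct(w, samples):
    return sum((w[0] + w[1] * x1 + w[2] * x2 >= 0) == (t == 1)
               for (x1, x2), t in samples)


def best_accuracy_during_training(samples, theta=1, epochs=50):
    w = [0, 0, 0]                       # w[0] is the bias weight
    best = count_correct(w, samples)
    for _ in range(epochs):
        for (x1, x2), t in samples:
            y = w[0] + w[1] * x1 + w[2] * x2
            if (y >= 0) != (t == 1) or abs(y) <= theta:
                w[0] += t
                w[1] += t * x1
                w[2] += t * x2
            best = max(best, count_correct(w, samples))
    return best


print(best_accuracy_during_training(XOR), "of", len(XOR), "inputs correct at best")
```

No choice of weights gets all four inputs right: the two not-taken cases force w_0 < 0 while the two taken cases force w_0 ≥ 0, so a single linear surface cannot separate them.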

Prediction rate
[Figure: Hardware Budget vs. Prediction Rate on SPEC 2000]
The perceptron predictor is more accurate than the two PHT methods at all hardware budgets over 1 KB.

Hybrid branch predictor
• A single branch predictor may not perform well within and across different executions
• Previous research shows the usefulness of adapting branch predictors at run time
  – Combining advantages of different branch predictors
  – Increasing accuracy
  – Use a choice predictor to decide which branch predictors to favor
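One common way to build the choice predictor is a table of 2-bit saturating counters, as in tournament predictors. The slides do not say which mechanism is meant, so the sketch below is only one possible design; the class name, table size and indexing are assumptions.

```python
class ChoicePredictor:
    """Picks between two component predictions using 2-bit saturating counters."""

    def __init__(self, n_entries=16):
        # Counter values 0-1 favor component A, 2-3 favor component B.
        self.counters = [1] * n_entries

    def choose(self, pc, pred_a, pred_b):
        favor_b = self.counters[pc % len(self.counters)] >= 2
        return pred_b if favor_b else pred_a

    def update(self, pc, pred_a, pred_b, taken):
        i = pc % len(self.counters)
        if pred_a == pred_b:
            return                      # nothing to learn when both agree
        if pred_b == taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


chooser = ChoicePredictor()
# Component A says "taken", component B says "not taken"; the branch is not taken.
chooser.update(5, True, False, False)
print(chooser.choose(5, True, False))
```

The counter only moves when the components disagree, so it tracks which one has been more reliable for that branch.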

Path-based perceptron
• Perceptron predictor uses only pattern history information
  – The same weight vector is used for every prediction of a branch
  – The i-th correlating weight is aliased among many branches
• Path-based predictor uses path information
  – The i-th correlating weight is selected using the i-th branch address
  – This allows the predictor to be pipelined, mitigating latency
  – This strategy improves accuracy because of path information
  – Even more aliasing since the i-th weight could be used to predict many different branches

Path-based perceptron
• Perceptron fetches all weights based on the current branch address
• Path-based perceptron fetches weights along the path leading up to the branch and computes a running partial sum in the pipeline
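The difference in weight selection can be shown side by side. The sketch below is a functional model only (it does not model the pipelined partial sums), and the table layout, the modulo indexing and the function names are assumptions made for the example.

```python
def perceptron_output(table, pc, outcomes):
    """All weights come from the row selected by the current branch address."""
    row = table[pc % len(table)]
    return row[0] + sum(w * x for w, x in zip(row[1:], outcomes))


def path_based_output(table, pc, path, outcomes):
    """Bias from the current branch's row; the i-th correlating weight comes
    from the row of the i-th most recent branch on the path."""
    n = len(table)
    y = table[pc % n][0]
    for i, (addr, x) in enumerate(zip(path, outcomes), start=1):
        y += table[addr % n][i] * x
    return y


table = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]   # 3 rows, history length 2
outcomes = [1, -1]                          # most recent first: taken, not taken
path = [1, 2]                               # addresses of those two branches
print(perceptron_output(table, 0, outcomes))
print(path_based_output(table, 0, path, outcomes))
```

In hardware, each path-based term can be added as its branch goes by, which is what makes the running partial sum possible.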

Ahead pipelining
• Because of the delay in accessing SRAM arrays and going through whatever logic is necessary, the perceptron cannot produce a prediction in the same cycle
  – Decouple the table access for reading the weights from the adder
• Ahead pipelining
  – Start the prediction early to hide the latency of prediction
  – Because the summands for the dot product are added before the branch to be predicted is fetched, some accuracy is lost: the weights were not chosen using the PC of the branch to be predicted, so they may not be optimal
  – Increases destructive aliasing, but the latency benefits are worth the loss in accuracy

Pipelined perceptron
Uses the current address in each cycle to retrieve the weights for the perceptron.

Ahead pipelined perceptron
Uses addresses from the previous cycle to retrieve two weights, then chooses between the two at the beginning of the next cycle based on whether the previous branch was predicted taken or not taken.

Piecewise linear branch prediction
• Generalization of perceptron and path-based predictors
• Weights are selected based on the current branch and the i-th most recent branch
• Forms a piecewise linear decision surface
  – Each piece determined by the path to the predicted branch
• Can solve more problems than perceptron
[Figure: perceptron decision surface for XOR; doesn't classify all inputs correctly]
[Figure: piecewise linear decision surface for XOR; classifies all inputs correctly]
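The weight selection described above can be sketched as a three-dimensional table indexed by the current branch, the address of the i-th most recent branch, and the history position i. This is an idealized functional model, not the published hardware design; the class name, table sizes, θ and the modulo indexing are all assumptions.

```python
class PiecewiseLinearPredictor:
    def __init__(self, n_branches=8, n_paths=8, history_len=4, theta=4):
        self.n, self.m, self.h, self.theta = n_branches, n_paths, history_len, theta
        # W[branch][path address][position]; W[branch][0][0] is the bias weight.
        self.W = [[[0] * (history_len + 1) for _ in range(n_paths)]
                  for _ in range(n_branches)]
        self.path = [0] * history_len       # recent branch addresses (mod n_paths)
        self.outcomes = [-1] * history_len  # matching outcomes, +1 / -1

    def output(self, pc):
        b = pc % self.n
        y = self.W[b][0][0]
        for i in range(self.h):
            y += self.W[b][self.path[i]][i + 1] * self.outcomes[i]
        return y

    def predict(self, pc):
        return self.output(pc) >= 0

    def update(self, pc, taken):
        y = self.output(pc)
        t = 1 if taken else -1
        if (y >= 0) != taken or abs(y) <= self.theta:
            b = pc % self.n
            self.W[b][0][0] += t
            for i in range(self.h):
                self.W[b][self.path[i]][i + 1] += t * self.outcomes[i]
        # Record this branch in the path and outcome histories.
        self.path = [pc % self.m] + self.path[:-1]
        self.outcomes = [t] + self.outcomes[:-1]


plp = PiecewiseLinearPredictor()
for k in range(500):
    plp.update(3, k % 2 == 0)          # a branch that alternates taken / not taken
print(plp.predict(3))
```

With n_paths=1 every branch shares one path slot and the model reduces to the plain perceptron predictor; with n_branches=1 it reduces to the path-based scheme.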

Generalization continued
Perceptron and path-based are the least accurate extremes of piecewise linear branch prediction
