CS 152: Discussion Section 7


  1. CS 152: Discussion Section 7 Branch Predictor and VLIW Albert Ou, Yue Dai 03/13/2020

  2. Administrivia
  ● Problem Set 3 due 10:30am on Mon, March 16
  ● Lab 3 released today, due 10:30am on Mon, April 6
  ● Midterm 1 scores are available on Gradescope
    ○ One week to submit regrade requests
    ○ Regrade window opens at 4pm today
    ○ Solutions posted on course webpage

  3. Agenda
  ● Branch Prediction
    ○ Branch History Table
    ○ Branch Target Buffer
  ● Load/Store Queue
  ● VLIW
    ○ Software Pipelining
  ● Lab 3 overview

  4. Branch Prediction - BHT
  ● Exploit temporal correlation: a branch's own recent outcomes are a good predictor of its next outcome
  ● Q: How can we also learn from spatial correlation, i.e. the outcomes of other recently executed branches?
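A minimal C++ sketch of the standard 2-bit saturating-counter BHT that exploits this temporal correlation (the class and method names below are ours, not Lab 3 code):

```cpp
// 2-bit saturating counters indexed by the low bits of the branch PC.
// Counter values 0-1 predict not-taken, 2-3 predict taken.
#include <cstddef>
#include <cstdint>
#include <vector>

class BHT {
public:
    explicit BHT(size_t entries) : table_(entries, 1) {}    // init weakly not-taken

    bool predict(uint64_t pc) const {
        return table_[index(pc)] >= 2;                       // MSB of the counter
    }

    void update(uint64_t pc, bool taken) {
        uint8_t &ctr = table_[index(pc)];
        if (taken  && ctr < 3) ++ctr;                        // saturate at strongly taken
        if (!taken && ctr > 0) --ctr;                        // saturate at strongly not-taken
    }

private:
    size_t index(uint64_t pc) const {
        return (pc >> 2) % table_.size();                    // drop the byte-offset bits
    }
    std::vector<uint8_t> table_;
};
```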

  5. Branch Prediction - BHT
  ● Use a history register (see the gshare-style sketch below)
  ● Worksheet Q1
  ● Q: What is the limitation of using just a BHT?
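One common way to fold in a history register, sketched here as a gshare-style predictor (an assumption on our part; the lecture's exact design may differ): XOR a global history register with the PC so the same static branch can be predicted differently depending on the path that led to it.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

class GsharePredictor {
public:
    explicit GsharePredictor(unsigned index_bits)
        : index_bits_(index_bits), table_(1u << index_bits, 1), history_(0) {}

    bool predict(uint64_t pc) const {
        return table_[index(pc)] >= 2;
    }

    void update(uint64_t pc, bool taken) {
        uint8_t &ctr = table_[index(pc)];
        if (taken  && ctr < 3) ++ctr;
        if (!taken && ctr > 0) --ctr;
        // Shift the resolved outcome into the global history register.
        history_ = ((history_ << 1) | (taken ? 1u : 0u)) & ((1u << index_bits_) - 1);
    }

private:
    size_t index(uint64_t pc) const {
        return ((pc >> 2) ^ history_) & ((1u << index_bits_) - 1);
    }

    unsigned index_bits_;
    std::vector<uint8_t> table_;
    uint32_t history_;
};
```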

  6. Branch Prediction - BTB
  ● Indexed by branch PC, but needs tag checking: each entry holds both the branch PC and the target PC, and the lookup must verify that the fetch PC actually matches the stored branch PC
  ● Q: What target PC should be stored? Should we store the not-taken target PC?
  ● Q: Which happens earlier, the BTB check or the BHT check?
  ● Q: When should the BTB be updated?
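A rough sketch of a direct-mapped BTB (our own simplified structure, not BOOM's): each entry stores the branch PC as a tag plus its taken-target PC, so a hit in the fetch stage lets the front end redirect before decode.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

struct BTBEntry {
    bool     valid  = false;
    uint64_t tag    = 0;   // branch PC
    uint64_t target = 0;   // taken-target PC
};

class BTB {
public:
    explicit BTB(size_t entries) : table_(entries) {}

    // Returns the predicted target if the fetch PC hits in the table.
    std::optional<uint64_t> lookup(uint64_t pc) const {
        const BTBEntry &e = table_[pc % table_.size()];
        if (e.valid && e.tag == pc) return e.target;
        return std::nullopt;                       // miss: fall through to PC + 4
    }

    // Typically filled in when a taken branch resolves (or at commit).
    void update(uint64_t pc, uint64_t target) {
        table_[pc % table_.size()] = BTBEntry{true, pc, target};
    }

private:
    std::vector<BTBEntry> table_;
};
```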

  7. Branch Prediction - BTB update
  ● Here we assume using both a BTB and a BHT: the BTB is checked in the IF stage, the BHT in the decode stage
  ● But in a real design the fetch stage may itself be pipelined, which pushes the BHT check into a later stage of IF
  ● Figure source: Computer Architecture: A Quantitative Approach, Ch. 3.9

  8. Load/Store Queue
  ● We would like to speculatively issue loads without violating in-order semantics and precise exceptions
  ● Q: What extra structure do you need?

  9. Load/Store Queue
  ● Speculative Store Buffer
    ○ Dispatch:
      ■ Store: allocate an entry in the store buffer in program order
      ■ Load: record the position of the youngest store instruction older than this load
    ○ Execute:
      ■ Store: update the corresponding address and data in the store buffer
      ■ Load: can only execute once all older store addresses are known; search the buffer for older stores to the same address; if there is a match, forward the data from the youngest matching store to the load; if not, load from the cache
    ○ Commit:
      ■ Store: write the data to the cache and free the entry
      ■ Load: commit normally
  ● Q: What if you want to be more aggressive? Speculative loads (see the load-queue sketch after the next slide)
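A simplified C++ model of this conservative scheme (word-sized accesses only; names and raw-position indexing are ours, and a real design would use wrap-around tags rather than positions):

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>

struct StoreEntry {
    uint64_t addr = 0;
    uint64_t data = 0;
    bool     addr_known = false;
};

struct LoadResult {
    bool     must_wait = false;  // some older store address is still unknown
    bool     forwarded = false;  // data came from the store buffer
    uint64_t data      = 0;      // valid only when forwarded; otherwise read the cache
};

class StoreBuffer {
public:
    // Dispatch a store: allocate an entry at the tail, in program order.
    // A later load records the current size as its count of older stores.
    size_t dispatch_store() {
        buf_.emplace_back();
        return buf_.size() - 1;
    }
    size_t num_stores() const { return buf_.size(); }

    // Execute a store: fill in its resolved address and data.
    void execute_store(size_t pos, uint64_t addr, uint64_t data) {
        buf_[pos] = StoreEntry{addr, data, true};
    }

    // Execute a load with 'older_stores' prior stores: it may proceed only
    // once all their addresses are known, then forwards from the youngest
    // older store that matches its address.
    LoadResult execute_load(uint64_t addr, size_t older_stores) const {
        for (size_t i = 0; i < older_stores; ++i)
            if (!buf_[i].addr_known) return {true, false, 0};
        for (size_t i = older_stores; i-- > 0; )             // youngest match wins
            if (buf_[i].addr == addr) return {false, true, buf_[i].data};
        return {false, false, 0};                            // no match: load from cache
    }

    // Commit the oldest store: it writes the cache (not modeled) and frees its entry.
    void commit_store() { buf_.pop_front(); }

private:
    std::deque<StoreEntry> buf_;   // program order, oldest at the front
};
```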

  10. Load/Store Queue
  ● Speculative Store Buffer (SQ) + Load Queue
    ○ A load can execute without waiting for all previous store addresses to be known
    ○ The Load Queue keeps track of the program order of load instructions
    ○ When a store address finishes execution, check the addresses of all loads in the load queue that are younger than this store:
      ■ If there is no match, keep executing normally
      ■ If there is a match, flush all instructions after the oldest matching load
  ● Problem: too expensive; large penalty for inaccurate address speculation
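A sketch of the load-queue check that runs when a store address resolves (names and structure are illustrative, not BOOM's LSU):

```cpp
#include <cstdint>
#include <optional>
#include <vector>

struct LoadEntry {
    uint64_t seq      = 0;      // program-order sequence number
    uint64_t addr     = 0;
    bool     executed = false;  // has this load already read memory speculatively?
};

class LoadQueue {
public:
    void record_load(uint64_t seq, uint64_t addr) {
        lq_.push_back({seq, addr, true});
    }

    // Called when a store's address resolves. Returns the sequence number of
    // the oldest violating load, i.e. the point to flush from, if any.
    std::optional<uint64_t> store_resolved(uint64_t store_seq, uint64_t store_addr) const {
        std::optional<uint64_t> flush_from;
        for (const LoadEntry &ld : lq_) {
            bool younger_and_aliased =
                ld.seq > store_seq && ld.executed && ld.addr == store_addr;
            if (younger_and_aliased && (!flush_from || ld.seq < *flush_from))
                flush_from = ld.seq;    // oldest offending load
        }
        return flush_from;              // nullopt: no violation, keep going
    }

private:
    std::vector<LoadEntry> lq_;
};
```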

  11. VLIW
  ● Compiler
    ○ A VLIW compiler needs to explicitly schedule operations to maximize parallel execution and avoid data hazards
    ○ Guarantees intra-instruction parallelism
  ● Q: How can the code be scheduled better? (see the unrolling sketch below)
    ○ Loop unrolling
    ○ Software pipelining
    ○ Trace scheduling
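Loop unrolling at the source level, for intuition; a VLIW compiler performs the same transformation on machine operations so that independent work from several iterations can be packed into one long instruction word. (The cleanup loop for trip counts that are not a multiple of four is omitted.)

```cpp
#include <cstddef>

void saxpy(float *y, const float *x, float a, size_t n) {
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

// Four independent multiply-adds per trip: more work for the scheduler to
// pack into parallel slots, and 4x fewer branches and index updates.
void saxpy_unrolled(float *y, const float *x, float a, size_t n) {
    for (size_t i = 0; i < n; i += 4) {     // assumes n is a multiple of 4
        y[i]     = a * x[i]     + y[i];
        y[i + 1] = a * x[i + 1] + y[i + 1];
        y[i + 2] = a * x[i + 2] + y[i + 2];
        y[i + 3] = a * x[i + 3] + y[i + 3];
    }
}
```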

  12. VLIW - Software pipelining
  ● Software pipelining pays startup/wind-down costs only once per loop, not once per iteration
  ● Worksheet Q2
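The same idea sketched at the source level (an assumed example, not the worksheet problem): the steady-state loop body mixes the store of iteration i-2, the multiply of iteration i-1, and the load of iteration i, while the prologue and epilogue pay the fill/drain cost once per loop.

```cpp
#include <cstddef>

void scale(float *y, const float *x, float a, size_t n) {
    if (n < 2) {                      // too short to pipeline; do it plainly
        for (size_t i = 0; i < n; ++i) y[i] = a * x[i];
        return;
    }

    // Prologue (startup): fill the pipeline.
    float loaded = x[0];              // load for iteration 0
    float scaled = a * loaded;        // multiply for iteration 0
    loaded = x[1];                    // load for iteration 1

    // Steady state: each trip issues one store, one multiply, and one load,
    // each belonging to a different original iteration.
    for (size_t i = 2; i < n; ++i) {
        y[i - 2] = scaled;            // store for iteration i-2
        scaled   = a * loaded;        // multiply for iteration i-1
        loaded   = x[i];              // load for iteration i
    }

    // Epilogue (wind-down): drain the pipeline.
    y[n - 2] = scaled;
    y[n - 1] = a * loaded;
}
```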

  13. VLIW - Trace scheduling
  ● Find the most frequent branch path and optimize it
  ● Use profiling feedback
  ● Add fix-up code
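A toy source-level illustration of the idea (hypothetical, not from the slides): the hot trace is compiled as straight-line code with the work moved above a rarely taken branch, and compensation code on the cold path repairs the speculated store.

```cpp
// Profiling says 'skip' is almost never true, so the compiler schedules the
// common path unconditionally and fixes things up on the rare path.
void accumulate(int *sum, int v, bool skip) {
    // Hot trace: assume !skip and do the add and store up front.
    int speculated = *sum + v;
    *sum = speculated;
    if (skip) {
        // Fix-up (compensation) code on the off-trace path: undo the store.
        *sum = speculated - v;
    }
}
```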

  14. VLIW - Predicated execution
  ● Remove mispredicted branches by using predicated execution with a predicate register
  ● Predicate register true: execute the instruction; false: nop
  ● Example (predicate register): execute either inst 3 & inst 4 or inst 5 & inst 6
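If-conversion at the source level, for intuition: both sides are computed and a predicate selects the result, which is what predicated VLIW operations (or conditional moves) do without any branch at all.

```cpp
#include <cstdint>

// Branchy version: the branch can be mispredicted.
int32_t clamp_branchy(int32_t x, int32_t limit) {
    if (x > limit)
        return limit;
    return x;
}

// Predicated version: the predicate selects between the two values. With
// predicated instructions, each side's operations simply become nops when
// their predicate is false, so no branch is needed.
int32_t clamp_predicated(int32_t x, int32_t limit) {
    bool p = (x > limit);            // predicate register
    return p ? limit : x;            // typically compiled to a select/cmov
}
```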

  15. BOOM: Berkeley Out-of-Order Machine
  ● Open-source, synthesizable, out-of-order superscalar RISC-V core
  ● Heavily inspired by the MIPS R10000 and Alpha 21264
  ● Unified physical register file with explicit renaming
  ● Split ROB / issue window design
  ● Extensively parameterized:
    ○ Fetch and issue widths, ROB size, LSU size
    ○ Functional unit mix, latencies
    ○ Issue scheduler
    ○ Composable branch predictors, RAS size, BTB size
    ○ Commit map table (R10K rollback vs. Alpha 21264 single-cycle flush)
    ○ Maximum in-flight branches

  16. BOOM: Berkeley Out-of-Order Machine

  17. Open-Ended: Branch predictor design
  ● Implement a branch predictor in C++ that integrates with BOOM
  ● Objective is to improve accuracy over the baseline BHT
  ● Competition:
    ○ Winning team receives 10% extra credit
    ○ Limited division: constrained to 64 KiB of storage, plus 2048 bits of additional budget
    ○ Open division: no restrictions
    ○ Gradescope autograder will be deployed next week
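The exact C++ hooks come from the Lab 3 harness; the interface below is only a guess at the general shape such a model takes (all names are placeholders, not the real hooks). The earlier gshare sketch is one predictor you could plug in behind it.

```cpp
#include <cstdint>

class BranchPredictorModel {
public:
    virtual ~BranchPredictorModel() = default;

    // Called at fetch: return taken/not-taken for the branch at 'pc'.
    virtual bool predict(uint64_t pc) = 0;

    // Called when the branch resolves: train on the actual outcome.
    virtual void update(uint64_t pc, bool taken) = 0;

    // Storage accounting for the limited division (64 KiB + 2048 bits).
    virtual uint64_t state_bits() const = 0;
};
```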

  18. Open-Ended: Spectre attacks
  ● Spectre/Meltdown: microarchitectural side-channel attacks that exploit branch prediction, speculative execution, and cache timing to bypass security mechanisms
  ● Objective is to recreate Spectre attacks on BOOM
  ● Attack scenario:
    ○ A vulnerable Spectre gadget is present in supervisor syscall code
    ○ Write a user program to infer secret data from protected kernel memory using branch predictor mis-training and cache side effects
  ● The team that guesses the most bytes correctly receives 10% extra credit
    ○ Gradescope autograder will be deployed next week
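For orientation, the canonical Spectre v1 (bounds-check bypass) gadget looks roughly like this; the actual Lab 3 gadget and attack harness are provided by the lab, and every name below is illustrative.

```cpp
#include <cstddef>
#include <cstdint>

uint8_t array1[16];              // in-bounds data; the secret lies beyond it
size_t  array1_size = 16;
uint8_t array2[256 * 64];        // probe array; 64-byte stride = one cache line

// If the branch predictor has been mis-trained to predict "in bounds", the
// core may speculatively read array1[x] even when x is out of bounds (e.g.
// pointing into secret kernel data), then touch a secret-dependent line of
// array2. The attacker later times accesses to array2 to recover the byte.
void victim_gadget(size_t x) {
    if (x < array1_size) {                           // mis-trained branch
        uint8_t secret = array1[x];                  // speculative OOB load
        volatile uint8_t tmp = array2[secret * 64];  // secret-dependent cache fill
        (void)tmp;
    }
}
```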
