What about branches?

• Branch outcomes are not known until EXE
• What are our options?

Control Hazards

Today
• Quiz
• Control Hazards
• Midterm review
• Return your papers

Key Points: Control Hazards
• Control hazards occur when we don’t know what the next instruction is
  • Mostly caused by branches
• Strategies for dealing with them
  • Stall
  • Guess!
    • Leads to speculation
    • Flushing the pipeline
• Strategies for making better guesses
• Understand the difference between stall and flush

Control Hazards
• Computing the new PC
    add $s1, $s3, $s2
    sub $s6, $s5, $s2
    beq $s6, $s7, somewhere
    and $s2, $s3, $s1
[Pipeline diagram: Fetch, Decode, EX, Mem, Write back]

Computing the PC
• Non-branch instruction
  • PC = PC + 4
• When is PC ready?
[Pipeline diagram: Fetch, Decode, EX, Mem, Write back]

Computing the PC
• Branch instructions
  • bne $s1, $s2, offset
  • if ($s1 != $s2) { PC = PC + offset; } else { PC = PC + 4; }
• When is the value ready?
[Pipeline diagram: Fetch, Decode, EX, Mem, Write back]

Computing the PC
    if (Instruction is branch) {
        if ($s1 != $s2) {
            PC = PC + offset;
        } else {
            PC = PC + 4;
        }
    } else {
        PC = PC + 4;
    }
• Wait, when do we know?
[Pipeline diagram: Fetch, Decode, EX, Mem, Write back]
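The PC selection logic above can be written as a small runnable sketch. The function name `next_pc` and its parameters are hypothetical, and it follows the slide's simplified model in which a taken `bne` adds the offset directly to the PC.

```python
def next_pc(pc, is_branch, rs_val=0, rt_val=0, offset=0):
    """Return the next PC under the slide's simplified bne model."""
    if is_branch and rs_val != rt_val:
        # Taken branch: redirect to PC + offset.
        return pc + offset
    # Non-branch instruction, or branch not taken: fall through.
    return pc + 4


# Non-branch instruction: always PC + 4.
print(hex(next_pc(0x100, is_branch=False)))
# bne with unequal registers: the branch is taken.
print(hex(next_pc(0x100, True, rs_val=1, rt_val=2, offset=16)))
# bne with equal registers: the branch falls through.
print(hex(next_pc(0x100, True, rs_val=3, rt_val=3, offset=16)))
```

The inputs tell the story: `is_branch` is only known after decode, and the register comparison only after execute, so neither is available when the next fetch has to start.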

There is a constant control hazard
• We don’t even know what kind of instruction we have until decode.
• Let’s consider the non-branch case first.
• What do we do?

Option 1: Smart ISA design
[Pipeline diagram over cycles: add $s0, $t0, $t1 followed by sub $t2, $s0, $t3, each passing through Fetch, Decode, EX, Mem, Write back]
• Make it very easy to tell if the instruction is a branch -- maybe a single bit or just a couple.
  • Decode is trivial
• Pre-decode
  • Do part of decode when the instruction comes on chip.
  • More on this later

Option 2: The compiler
• Use “branch delay” slots.
  • The next N instructions after a branch are always executed
• Good
  • Simple hardware
• Bad
  • N cannot change.

Delay slots
[Pipeline diagram over cycles, labeled “Taken” and “Branch Delay”, showing in order:]
    bne $t2, $s0, somewhere
    add $t2, $s4, $t1
    add $s0, $t0, $t1
    ...
somewhere:
    sub $t2, $s0, $t3

Option 3: Stall
[Pipeline diagram over cycles: add $s0, $t0, $t1; bne $t2, $s0, somewhere; then sub $t2, $s0, $t3 stalled until the branch resolves]
• What does this do to our CPI?
• Speedup?

Performance impact of stalling
• ET = I * CPI * CT
• Branches are about 1 in 5 instructions
• What’s the CPI for branches? 1 + 2 = 3
  • This is really the CPI for the instruction that follows the branch.
• Speedup (relative to a pipeline with no branch stalls) = 1/(0.2/(1/3) + 0.8) = 1/1.4 = 0.714
• ET = 1 * (0.2 * 3 + 0.8 * 1) * 1 = 1.4
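The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch of the slide's own model (base CPI of 1, a 2-cycle stall after every branch); the function name `stall_cpi` is made up for illustration.

```python
def stall_cpi(branch_fraction, stall_cycles, base_cpi=1.0):
    """Average CPI when every branch adds a fixed number of stall cycles."""
    return base_cpi + branch_fraction * stall_cycles


cpi = stall_cpi(branch_fraction=0.2, stall_cycles=2)
# With I and CT held at 1, ET equals the CPI, and the speedup relative to
# an ideal CPI of 1 is simply 1 / CPI.
speedup = 1.0 / cpi
print(f"CPI = {cpi:.3f}, speedup = {speedup:.3f}")  # CPI = 1.400, speedup = 0.714
```

A speedup below 1 is a slowdown: stalling on every branch makes the program about 40% slower than the ideal pipeline.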

Option 4: Simple Prediction
• Can a processor tell the future?
• For non-taken branches, the new PC is ready immediately.
• Let’s just assume the branch is not taken
• Also called “branch prediction” or “control speculation”
• What if we are wrong?

Predict Not-taken
[Pipeline diagram over cycles:]
    bne $t2, $s0, somewhere    (not taken)
    bne $t2, $s4, else         (taken)
    add $s0, $t0, $t1          (squashed)
    ...
else:
    sub $t2, $s0, $t3
• We start the add, and then, when we discover the branch outcome, we squash it.
  • We “flush” the pipeline.
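A rough way to see the cost of predict-not-taken is to count cycles over an instruction trace. The sketch below is an assumption-laden model, not the slide's datapath: it charges one cycle per instruction, ignores pipeline fill, and adds a 2-cycle flush penalty (the same penalty the stall slide uses) only when a branch turns out to be taken. The trace format and function name are hypothetical.

```python
def cycles_predict_not_taken(trace, penalty=2):
    """Count cycles and flushes for a trace of (is_branch, taken) records."""
    cycles = 0
    flushes = 0
    for is_branch, taken in trace:
        cycles += 1  # every instruction costs one cycle in steady state
        if is_branch and taken:
            # Wrong guess: the wrongly fetched instructions are squashed.
            cycles += penalty
            flushes += 1
    return cycles, flushes


# Hypothetical trace: add, bne (not taken), add, bne (taken), sub
trace = [(False, False), (True, False), (False, False), (True, True), (False, False)]
print(cycles_predict_not_taken(trace))  # (7, 1)
```

The not-taken branch costs nothing extra, while stalling would have charged the penalty for both branches.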

Simple “static” Prediction
• “static” means before run time
• Many prediction schemes are possible
• Predict taken
  • Pros? Loops are common
• Predict not-taken
  • Pros? Not all branches are for loops.
• Backward taken/forward not taken
  • Best of both worlds.
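Backward taken/forward not taken needs only the sign of the branch offset, which makes it easy to sketch. The function names and the sample branch list below are invented for illustration; the sample is not measured data.

```python
def predict_btfnt(offset):
    """Predict taken for backward branches (negative offset), else not taken."""
    return offset < 0


def accuracy(branches):
    """Fraction of (offset, actually_taken) records predicted correctly."""
    correct = sum(1 for offset, taken in branches if predict_btfnt(offset) == taken)
    return correct / len(branches)


# Hypothetical sample: a loop branch that exits on its third execution,
# plus two forward branches.
sample = [(-16, True), (-16, True), (-16, False), (24, False), (8, True)]
print(predict_btfnt(-16), predict_btfnt(24))  # True False
print(accuracy(sample))  # 0.6
```

The loop-closing branch is mispredicted only on exit, which is why backward branches are a good bet for "taken".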

Implementing Backward taken/forward not taken
[Datapath diagram: pipelined datapath with a “Compute target” path (Sign Extend, Shift left 2, Add) and an “Insert bubble” path; labeled blocks include PC, Add 4, Instruction Memory, Register File, ALU, Data Memory, Sign Extend (16 to 32 bits), and the pipeline registers IFetch/Dec, Dec/Exec, Exec/Mem, Mem/WB]

Implementing Backward taken/forward not taken
• Changes in control
  • New inputs to the control unit
    • The sign of the offset
    • The result of the branch
  • New outputs from control
    • The flush signal.
    • Inserts “noop” bits in datapath and control

Performance Impact
• ET = I * CPI * CT
• Backward taken/forward not taken (BTFNT) is 80% accurate
• Branches are 20% of instructions
• Changing the front end increases the cycle time by 10%
• What is the speedup of BTFNT compared to just stalling on every branch?
• BTFNT
  • CPI = 0.2 * 0.2 * (1 + 2) + (1 - 0.2 * 0.2) * 1 = 1.08
  • CT = 1.1
  • ET = 1.188
• Stall
  • CPI = 0.2 * 3 + 0.8 * 1 = 1.4
  • CT = 1
  • ET = 1.4
• Speedup = 1.4/1.188 = 1.18
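The comparison can be reproduced with a short script. It is a sketch that plugs the slide's numbers into ET = I * CPI * CT, with I normalized to 1; the helper `exec_time` and the variable names are illustrative.

```python
def exec_time(instructions, cpi, cycle_time):
    """ET = I * CPI * CT."""
    return instructions * cpi * cycle_time


branch_fraction = 0.2   # branches are 20% of instructions
btfnt_accuracy = 0.8    # BTFNT guesses right 80% of the time
penalty = 2             # extra cycles when a branch stalls or is mispredicted

# Only mispredicted branches (20% of 20%) pay the penalty under BTFNT.
cpi_btfnt = 1 + branch_fraction * (1 - btfnt_accuracy) * penalty
et_btfnt = exec_time(1, cpi_btfnt, 1.1)   # cycle time is 10% longer

# When stalling, every branch pays the penalty.
cpi_stall = 1 + branch_fraction * penalty
et_stall = exec_time(1, cpi_stall, 1.0)

print(f"BTFNT: CPI = {cpi_btfnt:.2f}, ET = {et_btfnt:.3f}")  # CPI = 1.08, ET = 1.188
print(f"Stall: CPI = {cpi_stall:.2f}, ET = {et_stall:.3f}")  # CPI = 1.40, ET = 1.400
print(f"Speedup = {et_stall / et_btfnt:.2f}")                # Speedup = 1.18
```

Even with the 10% cycle-time cost, prediction wins because it removes the penalty from 80% of branches.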

The Importance of Pipeline depth
• There are two important parameters of the pipeline that determine the impact of branches on performance
  • Branch decode time -- how many cycles does it take to identify a branch (in our case, this is less than 1)
  • Branch resolution time -- cycles until the real branch outcome is known (in our case, this is 2 cycles)

Pentium 4 pipeline
1. Branches take 19 cycles to resolve
2. Identifying a branch takes 4 cycles.
3. Stalling is not an option.
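To see why stalling is not an option at this depth, one can reuse the stall model from the earlier slides with a longer resolution time. This is a back-of-the-envelope sketch: it assumes the same 20% branch frequency and base CPI of 1 used earlier in the deck, which are not measured Pentium 4 figures, and the function name is hypothetical.

```python
def stall_cpi_for_depth(resolution_cycles, branch_fraction=0.2):
    """CPI if the pipeline stalls for the full resolution time on every branch."""
    return 1.0 + branch_fraction * resolution_cycles


for name, cycles in [("5-stage pipeline from these slides", 2), ("Pentium 4", 19)]:
    print(f"{name}: resolution = {cycles} cycles, stall CPI = {stall_cpi_for_depth(cycles):.1f}")
# 5-stage pipeline from these slides: resolution = 2 cycles, stall CPI = 1.4
# Pentium 4: resolution = 19 cycles, stall CPI = 4.8
```

Under these assumptions the deep pipeline would spend most of its time waiting on branches, which is why good prediction matters more as resolution time grows.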
