
Prediction and speculation: the role of stochastic models of program behaviour in the performance of modern computers
Roberto Innocente, 20 Nov 2005

Speculation
From the Merriam-Webster dictionary: an assumption of an unusual risk in hopes of obtaining commensurate gains.

Speculative execution
A prediction is made of what work is likely to be needed soon. That work is then executed speculatively, in such a way that it can be committed if the prediction was correct or aborted otherwise.
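A minimal sketch of the commit-or-abort idea, in Python. The function name `speculate`, its parameters and the register dictionary are illustrative assumptions, not taken from the slides: the predicted work runs on a shadow copy of the state, and the shadow replaces the visible state only if the prediction turns out to be correct.

```python
def speculate(state, predicted_taken, condition, work_if_taken, work_if_not_taken):
    """Run the predicted path on a shadow copy, then commit or abort.

    Returns (new_state, committed). All names here are hypothetical.
    """
    shadow = dict(state)  # speculative copy; the visible state is untouched
    predicted_work = work_if_taken if predicted_taken else work_if_not_taken
    predicted_work(shadow)  # executed before the condition is known

    actual_taken = condition(state)  # the condition resolves later
    if actual_taken == predicted_taken:
        return shadow, True  # commit: the speculative result becomes visible

    # Abort: discard the shadow and execute the correct path instead.
    recovered = dict(state)
    (work_if_taken if actual_taken else work_if_not_taken)(recovered)
    return recovered, False


def increment_r2(regs):
    regs["r2"] += 1


def decrement_r2(regs):
    regs["r2"] -= 1


def r1_is_zero(regs):
    return regs["r1"] == 0
```

With `{"r1": 0, "r2": 5}` as the starting state, a correct prediction commits the shadow copy directly, while a wrong one pays for the discarded work and then redoes the right path; both end in the same final state.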

von Neumann's model: Stored Program Computer
The Control Counter, today called the Program Counter (PC) or Instruction Pointer (IP), keeps the address of the next instruction to be executed. The control part fetches this instruction, decodes it and executes it. At the end the PC is updated.
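The fetch, decode, execute, update-PC cycle can be sketched as a tiny interpreter. This is only an illustration: the three-instruction machine (`add`, `addi`, `bnez`, plus `halt`) and the tuple encoding are assumptions made for the example, with `bnez` taking its target first as in the assembly used later in the slides.

```python
def run(program, registers, max_steps=1000):
    """Interpret a list of instruction tuples with an explicit program counter."""
    pc = 0
    steps = 0
    while pc < len(program) and steps < max_steps:
        op, *args = program[pc]  # fetch and decode
        next_pc = pc + 1         # default: fall through to the next instruction
        if op == "add":          # add dst, a, b   : dst <- a + b
            dst, a, b = args
            registers[dst] = registers[a] + registers[b]
        elif op == "addi":       # addi dst, a, imm: dst <- a + imm
            dst, a, imm = args
            registers[dst] = registers[a] + imm
        elif op == "bnez":       # bnez target, reg: branch if reg != 0
            target, reg = args
            if registers[reg] != 0:
                next_pc = target
        elif op == "halt":
            break
        else:
            raise ValueError(f"unknown opcode: {op}")
        pc = next_pc             # update the PC at the end of the instruction
        steps += 1
    return registers


# Sum n + (n-1) + ... + 1 into acc.
SUM_PROGRAM = [
    ("add", "acc", "acc", "n"),
    ("addi", "n", "n", -1),
    ("bnez", 0, "n"),
    ("halt",),
]
```

Running `run(SUM_PROGRAM, {"acc": 0, "n": 3})` executes the loop body three times; only the `bnez` instruction ever sets the PC to something other than the next address.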

Linear scaling of speed, quadratic scaling of transistors
- Consider the latest scaling step in silicon lithography, from 0.13 µm to 0.09 µm: a 0.70 linear scaling, hence a 0.49 scaling of surface.
- Gate delays scale linearly, while the number of transistors available scales quadratically.
- We will get much more in available complexity than in gate speed.
[Figure: gate speed and transistor count compared across the 0.25, 0.18, 0.13, 0.09 and 0.065 µm process generations]
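The arithmetic behind this slide can be sketched in a few lines of Python. The function name and the idea of compounding the same 0.7 factor over several generations are assumptions for illustration; the slide only states the single-step figures.

```python
def scaling_gains(generations=1, linear_factor=0.7):
    """Return (gate_speed_gain, transistor_gain) after some process shrinks.

    Assumes, as on the slide, that gate delay scales with the linear
    dimension and transistor count with the inverse of the area.
    """
    speed_gain = (1 / linear_factor) ** generations
    transistor_gain = (1 / linear_factor ** 2) ** generations
    return speed_gain, transistor_gain


area_factor = 0.7 ** 2  # the 0.49 surface scaling of one generation
one_step = scaling_gains(1)
four_steps = scaling_gains(4)  # e.g. 0.25 -> 0.18 -> 0.13 -> 0.09 -> 0.065
```

One step gives roughly 1.4 times the gate speed but about twice the transistors; because the transistor gain is the square of the speed gain, the gap widens with every generation.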

von Neumann's Projection/Collapse postulate of QM
- A system can be described by any mix of states, but when you observe it you can find it only in one of the eigenstates, and you can measure only an eigenvalue.
- (When you look at it, Schrödinger's cat is either dead or alive.)

Modern microprocessors
Today's microprocessors take advantage of the fact that they need to present an architectural state compliant with the standard von Neumann model only from time to time; for the rest of the time they are free to proceed in whatever way they find convenient.

ILP – Instruction Level Parallelism (Fisher 1981)
While obeying the standard semantics when required, try to overlap the execution of multiple instructions as much as possible. (We will see that current microprocessors can have more than 100 instructions in flight.)

Enabling technologies for ILP exploitation
- Pipelining
- Multiple issue = Superscalar

A microprocessor in 1989 (Intel 386)
- CPI = Cycles Per Instruction
- Performance = Frequency / CPI
- Intel 386:
  - feature size: 1 micron
  - frequency: 33 MHz
  - CPI = 5 to 6
  - Performance = 33 M / 6 ~ 5.5 Minstructions/s
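As a quick check of the formula, here is a small Python sketch using the figures on this slide. The function name is an assumption made for the example.

```python
def instructions_per_second(frequency_hz, cpi):
    """Performance = Frequency / CPI."""
    return frequency_hz / cpi


# Intel 386 at 33 MHz, with the slide's range of 5 to 6 cycles per instruction.
i386_at_cpi_6 = instructions_per_second(33e6, 6)
i386_at_cpi_5 = instructions_per_second(33e6, 5)
```

With a CPI of 6 this gives 5.5 million instructions per second, and with a CPI of 5 it gives 6.6 million.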

Pipelining
The work to be done is divided into stages, with a clear signal interface between them. After each stage a latch memorizes the state for the next cycle. The latch adds some overhead, but the hope is to get 1 result per cycle once the pipe is full.
[Figure: a five-stage pipeline, Fetch (F), Decode (D), eXecute (X), Memory (M), Writeback (W), with a pipeline latch after each stage]

Limits of pipelining
- A latch can add 2 or 3 gate delays.
- The work to be done is currently around 400 gate delays.
- With n pipeline stages you get a result every 400/n + 3 gate delays,
- but you add a total overhead of 3n gate delays.
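The trade-off can be made concrete with a short Python sketch. It takes the slide's figures (400 gate delays of work, 3 gate delays per latch) as defaults; the function names are assumptions for illustration.

```python
def cycle_time(stages, work=400, latch=3):
    """Gate delays between two consecutive results in a full pipeline."""
    return work / stages + latch


def latency(stages, work=400, latch=3):
    """Gate delays for a single instruction to traverse the whole pipeline."""
    return work + latch * stages


def speedup(stages, work=400, latch=3):
    """Throughput gain relative to doing all the work in one unlatched step."""
    return work / cycle_time(stages, work, latch)
```

For example, `cycle_time(10)` is 43 gate delays while `latency(10)` grows to 430. However deep the pipeline, the speedup stays below 400/3, about 133, because the latch overhead does not shrink with the stage count.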

Pipeline at work
When there is a dependency we say that the pipeline is stalled, or that a bubble is inserted, while waiting for the dependency to be resolved. Here a control dependency causes a 4-cycle stall.

    cycle  instruction entering the pipeline (F D X M W)
    1      add r1,r3,r4
    2      mul r5,r6,r7
    3      bnez loop,r1
    4      (bubble)
    5      (bubble)
    6      (bubble)
    7      (bubble)
    8      div r8,r3,r6
    9      add r10,r8,r9
    10     jmp loop
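The timing in this example can be reproduced with a small Python model. It is a deliberate simplification, under stated assumptions: one instruction is fetched per cycle, only `bnez` stalls the fetch (by 4 cycles, as on the slide), and an instruction completes `stages - 1` cycles after it is fetched. The function name and parameters are hypothetical.

```python
def fetch_schedule(instructions, stages=5, branch_stall=4, branch_ops=("bnez",)):
    """Return ([(fetch_cycle, instruction), ...], completion_cycle_of_last)."""
    schedule = []
    cycle = 1
    for text in instructions:
        schedule.append((cycle, text))
        opcode = text.split()[0]
        # The next fetch waits one cycle, plus the stall after a branch.
        cycle += 1 + (branch_stall if opcode in branch_ops else 0)
    last_fetch = schedule[-1][0]
    return schedule, last_fetch + stages - 1


SLIDE_CODE = [
    "add r1,r3,r4",
    "mul r5,r6,r7",
    "bnez loop,r1",
    "div r8,r3,r6",
    "add r10,r8,r9",
    "jmp loop",
]

stalled, stalled_done = fetch_schedule(SLIDE_CODE)
ideal, ideal_done = fetch_schedule(SLIDE_CODE, branch_stall=0)
```

With the stall, `div` is fetched in cycle 8 and the last instruction completes in cycle 14; without it the same six instructions would complete in cycle 10.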

Instruction dependencies
- Data dependency:
      add r1,r2,r3 ; r1<-r2+r3
      mul r1,r4,r5 ; r1<-r4*r5
  Solution: register renaming, result forwarding
- Control dependency:
      bne label1,r1,r2
      add r1,r2,r3
      label1:
      mul r4,r5,r6
  Solution: branch prediction
- Structural dependency:
  Solution: add functional units
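A small Python sketch of how data dependencies between two instructions can be detected, and how renaming removes one of them. The RAW/WAR/WAW labels are the standard textbook names and do not appear on the slide; the parser assumes the destination-first operand order used in the slide's comments, and the register name `p7` is a hypothetical physical register.

```python
def parse(instruction):
    """Split 'op dst,src1,src2' into (op, dst, [sources])."""
    op, operands = instruction.split(None, 1)
    dst, *sources = [name.strip() for name in operands.split(",")]
    return op, dst, sources


def data_hazards(first, second):
    """List the data dependencies of `second` on the earlier `first`."""
    _, dst1, src1 = parse(first)
    _, dst2, src2 = parse(second)
    found = []
    if dst1 in src2:
        found.append("RAW")  # second reads what first writes (true dependency)
    if dst2 in src1:
        found.append("WAR")  # second overwrites a source of first
    if dst1 == dst2:
        found.append("WAW")  # both write the same register
    return found


slide_pair = data_hazards("add r1,r2,r3", "mul r1,r4,r5")
# Renaming the second destination to a fresh register removes the conflict.
renamed_pair = data_hazards("add r1,r2,r3", "mul p7,r4,r5")
true_dependency = data_hazards("add r1,r2,r3", "mul r4,r1,r5")
```

Renaming helps with the WAW and WAR cases, which are conflicts over a register name only; a RAW dependency is a real flow of data, which is where result forwarding comes in.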

Multiple issue (Superscalar) Architectures
Architectures that are able to process multiple instructions at a time. While it was common to have multiple execution units (like an integer and an FP unit), only in the '90s did the first superscalar architectures appear, e.g. IBM POWER and Pentium Pro. These architectures require very good branch prediction.
[Figure: a 2-way superscalar, drawn as two parallel F D X M W pipelines]

Superscalar/2
- Current architectures are commonly 4-way or 8-way superscalars.
- The design of the last Alpha, canceled in its late phase, was for an 8-way superscalar.
- Extremely good branch prediction is needed: there can be hundreds of instructions in flight (4 ways * 30 stages = 120).

Superscalar at work
The wasted cycle slots are now many more than in the pipelined-only case.

    cycle  slot 1         slot 2
    1      add r1,r3,r4   mul r5,r6,r7
    2      bnez loop,r1   (bubble)
    3      (bubble)       (bubble)
    4      (bubble)       (bubble)
    5      (bubble)       (bubble)

Real World Architectures
[Figure: IBM POWER5]

15 years of x86

    year  processor      feature size (µm)  transistors (thousands)  cycles/instr.  frequency (MHz)  pipe length  FO4 gates per cycle
    1979  8088           -                  -                        12             -                -            -
    1988  386dx          1                  275                      5              33               -            80
    1991  486dx          1                  1100                     -              50               -            -
    1993  Pentium 60     0.8                3100                     -              60               5            -
    1995  Pentium Pro    0.6                5500                     -              150              10           -
    1997  Pentium II     0.35               7500                     -              233              10           -
    1999  Pentium III    0.25               9500                     -              450              10           -
    2000  Pentium 4      0.18               42000                    -              1300             20           -
    2005  Pentium 4 571  0.09               130000                   -              3800             30           13

Feature size, frequency, complexity
[Figure: three charts plotting feature size, frequency and transistor count for the 386, 486dx, Pentium 60, Pentium Pro, Pentium II, Pentium III, Pentium 4 and Pentium 4 571]

A microprocessor in 2005 (Intel Pentium 4)
- IPC = Instructions Per Cycle
- Performance = Frequency * IPC
- Intel Pentium 4:
  - feature size: 90 nm
  - frequency: 3 GHz
  - IPC ~ 2 to 3 (2 for SPECint, 3 for SPECfp)
  - Performance = 3 G * 2 = 6 Ginstructions/s
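To see where the gain over the 386 comes from, the overall speedup can be split into a frequency part and a per-cycle part. This is a rough Python sketch using only the round figures from the two "A microprocessor in ..." slides (33 MHz at 6 cycles per instruction, 3 GHz at 2 instructions per cycle); the variable names are assumptions.

```python
def performance(frequency_hz, ipc):
    """Performance = Frequency * IPC."""
    return frequency_hz * ipc


pentium4 = performance(3e9, 2)     # SPECint figure from this slide
i386 = performance(33e6, 1 / 6)    # a CPI of 6 is an IPC of 1/6

total_speedup = pentium4 / i386
frequency_gain = 3e9 / 33e6        # contribution of the faster clock
ipc_gain = 2 / (1 / 6)             # contribution of more work per cycle
```

The clock accounts for a factor of about 91 and the work done per cycle for a factor of 12; their product is the total speedup of roughly 1100.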

Control transfer instructions
Some instructions, instead of simply incrementing the PC to the next instruction, change it to a different value. We distinguish:
- unconditional branches, or simply jumps
- conditional branches, or simply branches
- subroutine calls
- subroutine returns
- traps, returns from interrupts or exceptions

Assembly – Machine instructions
Only jumps or branches:
- j <label>
- j @register
- beq <label>
- bne <label>
- bz <label>
- bnz <label>

High level Language – Assembly

for(i=1;i<=4;i++) { .. }

          ld r1,1
          ld r2,4
    loop: cmp r1,r2
          beq out
          ..
          add r1,r1,1
          jmp loop
    out:

if (i) { .. }

          ld r1,i
          bz next
          ..
    next:

while (i--) { .. }

    loop: sub r1,1
          bz out
          ..
          jmp loop
    out:

SPEC (Standard Performance Evaluation Corporation) benchmarks
- A well-known set of benchmarks, continuously updated and recognized as representative of possible workloads
- Divided into 2 big sets:
  - SPECint: integer programs (go, m88ksim, compress, li, ijpeg, perl, vortex)
  - SPECfp: floating point programs (mathematical simulation programs)
- http://www.spec.org

Branches by type
Average from SPECint95:
- Conditional: 72%
- Immediate: 16%
- Returns: 10%
- Indirect: 2%

Branches by frequency
[Figure: bar chart of dynamic instructions, dynamic branches and dynamic conditional branches, in millions of instructions, for the SPEC95 benchmarks compress, perl, gcc, go, ijpeg, m88ksim, vortex and xlisp]

Branches by taken rate
[Figure: pie chart of branches grouped by taken rate (never taken, 0-5%, 5-50%, 50-95%, 95-100%, always taken); average from SPECint95]

Occurrences of branches
- Occurrences of branches (conditional branches):
  - SPECint95: 1 out of 5 instructions executed (20%)
  - SPECfp95: 1 out of 10 instructions executed (10%)
- Basic block is the term used for a sequence of instructions without any control transfer.
Note: this is different from, and much higher than, the rate of branches in the static program.
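These rates explain why prediction quality matters so much. The Python sketch below combines them with the 120 instructions in flight quoted earlier for a 4-way, 30-stage machine. The accuracy values and the assumption that predictions are independent of each other are illustrative only and do not come from the slides.

```python
def average_basic_block_length(branch_fraction):
    """Average number of instructions per branch, e.g. 1 in 5 gives 5."""
    return 1 / branch_fraction


def branches_in_flight(instructions_in_flight, branch_fraction):
    """Expected number of unresolved branches among the in-flight instructions."""
    return instructions_in_flight * branch_fraction


def all_predictions_correct(accuracy, branches):
    """Probability that every in-flight branch was predicted correctly,
    assuming independent predictions (a simplifying assumption)."""
    return accuracy ** branches


specint_branches = branches_in_flight(120, 0.20)  # about 24 branches
chance_at_95 = all_predictions_correct(0.95, specint_branches)
chance_at_99 = all_predictions_correct(0.99, specint_branches)
```

Under these assumptions a predictor that is right 95% of the time leaves less than a one in three chance that all the work in flight is useful, while 99% accuracy raises that to a little under four in five.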
