Five poTAGEs and a COLT for an unrealistic predictor
Pierre Michaud
June 2014

Competition track: Unlimited size

I did not modify the predictor after the submission

Two-level history branch predictors
• First level = context
  - e.g., global branch history, local branch history
• Second level = prediction from the branch address and the first-level context
  - e.g., TAGE

PPM-like second level
• Search for the longest context that has already occurred at least once, and predict from the past history for that context
  - search with the maximum context length L1
  - if no past occurrence for L1, search with L2 < L1
  - if no past occurrence for L2, search with L3 < L2
  - and so on…
• One table per context length
• To know whether a context has already occurred, use tags
  - the false-hit probability is halved every time the tag length increases by 1 bit
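The longest-match search can be sketched as follows. This is a minimal illustration, not the submitted predictor: the class name, the use of exact (address, context) dictionary keys in place of hashed tags, the taken/not-taken count pairs, and the default guess are all assumptions made for the example.

```python
class PPMLikePredictor:
    """Longest-context-first lookup with one table per context length."""

    def __init__(self, context_lengths):
        # Search order: L1 > L2 > L3 > ...
        self.lengths = sorted(context_lengths, reverse=True)
        self.tables = {n: {} for n in self.lengths}

    @staticmethod
    def _key(pc, history, n):
        # Exact keys stand in for hashed tags (assumption of this sketch).
        return (pc, tuple(history[len(history) - n:]))

    def predict(self, pc, history):
        for n in self.lengths:
            if len(history) < n:
                continue
            counts = self.tables[n].get(self._key(pc, history, n))
            if counts is not None:  # this context already occurred
                taken, not_taken = counts
                return taken >= not_taken
        return True  # no context has occurred yet: default guess (assumption)

    def update(self, pc, history, taken):
        # Unlimited storage: record the outcome for every context length.
        for n in self.lengths:
            if len(history) >= n:
                key = self._key(pc, history, n)
                counts = self.tables[n].setdefault(key, [0, 0])
                counts[0 if taken else 1] += 1


# Usage: one branch (hypothetical address 0x40) alternating taken / not-taken.
predictor = PPMLikePredictor([4, 2, 1])
history = []
for i in range(20):
    outcome = (i % 2 == 0)
    predictor.update(0x40, history, outcome)
    history.append(outcome)
```

After this training loop the last outcome was not-taken, so the longest matching context predicts taken for the next occurrence.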

TAGE
• PPM-like (TAgged) with GEometric context lengths
  - does not name a specific predictor but a predictor family
  - PPM-like 2004, TAGE 2006, TAGE 2011
• Most of the tricks are in the update
  - allocation policy, u bit, selection counter, ...
  - these make the difference between a bad TAGE (e.g., PPM-like 2004) and a good TAGE
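As an illustration of geometric context lengths, the sketch below spaces the lengths so that consecutive ones differ by a constant ratio. The function name and the example values (8 tables, lengths from 4 to 640) are hypothetical and are not taken from the slides.

```python
def geometric_lengths(l_min, l_max, num_tables):
    """Context lengths forming a (rounded) geometric series from l_min to l_max."""
    if num_tables < 2:
        raise ValueError("need at least two tables")
    ratio = (l_max / l_min) ** (1.0 / (num_tables - 1))
    return [int(l_min * ratio ** i + 0.5) for i in range(num_tables)]


# Hypothetical configuration, for illustration only.
lengths = geometric_lengths(4, 640, 8)
```

Short contexts are densely covered while a few tables reach very long contexts, which is the point of the geometric spacing.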

Let’s tune TAGE for limit studies

PPM’s main weakness: the cold-counter problem

Biased-coin tossing game
• The coin is biased, but we don’t know toward which side
• We play repeatedly with the same coin
• At game N+1, we count how many times heads occurred vs. tails in the N previous games → we choose the side that occurred the most
  - if the heads and tails counts are equal → choice = outcome of the last game
• Similar to TAGE’s taken/not-taken counters

Cold-counter problem

bias = 90%
game        1      2      3      4      5      6      7      8      9      10
win proba.  0.500  0.820  0.820  0.878  0.878  0.893  0.893  0.898  0.898  0.899

bias = 60%
game        1      2      3      4      5      6      7      8      9      10
win proba.  0.500  0.520  0.520  0.530  0.530  0.537  0.537  0.542  0.542  0.547
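The win probabilities of the coin game can be computed exactly from the binomial distribution. The sketch below is one way to do it; the function name is hypothetical, and it assumes a blind 50/50 guess at game 1, when no history exists. On a tie, the last outcome is the biased side with probability 1/2, because every ordering of the same outcomes is equally likely.

```python
from math import comb


def win_probability(game, bias):
    """Probability of winning game number `game` (1-indexed) with a coin
    that lands on its favoured side with probability `bias`."""
    n = game - 1  # number of previous games observed
    if n == 0:
        return 0.5  # no history yet: blind guess (assumption)
    q = 1.0 - bias
    # Probability that the favoured side holds a strict majority so far.
    p_choose = sum(comb(n, k) * bias ** k * q ** (n - k)
                   for k in range(n // 2 + 1, n + 1))
    if n % 2 == 0:
        # Tie: the choice is the last outcome, which is the favoured side
        # with probability 1/2 given an equal split.
        p_choose += 0.5 * comb(n, n // 2) * (bias * q) ** (n // 2)
    return p_choose * bias + (1.0 - p_choose) * q


for bias in (0.9, 0.6):
    print(bias, [round(win_probability(g, bias), 3) for g in range(1, 11)])
```

The slow convergence toward the bias, especially at 60%, is the cold-counter problem: a counter with little history is a poor estimate of the biased side.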

Cold-counter problem in TAGE
• Limited storage → allocate an entry for a longer context only upon a misprediction
  → the counter is likely to be initialized with the least frequent outcome
• TAGE has a mechanism for reducing the cold-counter problem
  - sometimes the second-longest-match entry is more accurate than the (cold) longest-match entry
  - a single global selection counter chooses between the longest match and the second longest

poTAGE: post-predicted TAGE
• TAGE tuned for limit studies
• Tackles the cold-counter problem
• Replaces the selection counter with a post-predictor
• Aggressive update & allocation for fast ramp-up

Selection counter → post-predictor
• The selection counter is cost-effective, but does not solve the cold-counter problem completely
• Post-predictor → more effective solution

Post-predictor
[Figure: TAGE supplies the 3-bit counters (ctr) of the first, second and third hits, plus a 1-bit u; these 10 bits index a table of 1024 five-bit counters, which gives the T/NT prediction. Update: T → increment, NT → decrement.]
• 5% fewer mispredictions than with the selection counter
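The figure can be read as a small table of counters indexed by TAGE state. The sketch below follows that reading. The class name, the bit order of the 10-bit index, the signed range of the five-bit counters, and the use of the counter sign as the prediction are assumptions; the slide also does not say what happens when fewer than three entries hit, so that case is left out.

```python
class PostPredictor:
    """1024 five-bit counters indexed by three 3-bit TAGE counters and a u bit."""

    CTR_MIN, CTR_MAX = -16, 15  # signed five-bit range (assumption)

    def __init__(self):
        self.table = [0] * 1024

    @staticmethod
    def index(first_ctr, second_ctr, third_ctr, u):
        # Each ctr is a 3-bit value in 0..7 and u is 0 or 1.
        # The packing order is arbitrary here (assumption).
        return (u << 9) | (third_ctr << 6) | (second_ctr << 3) | first_ctr

    def predict(self, idx):
        return self.table[idx] >= 0  # True means taken

    def update(self, idx, taken):
        c = self.table[idx]
        if taken:
            self.table[idx] = min(c + 1, self.CTR_MAX)   # T: increment
        else:
            self.table[idx] = max(c - 1, self.CTR_MIN)   # NT: decrement


# Usage: the same TAGE state repeatedly leads to a not-taken outcome.
post = PostPredictor()
idx = PostPredictor.index(first_ctr=4, second_ctr=1, third_ctr=0, u=0)
for _ in range(3):
    post.update(idx, taken=False)
```

Because the index includes the cold longest-match counter together with the older, warmer counters, the table can learn when the longest match should not be trusted.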

Ramp-up
• Realistic TAGE → careful policy: allocates new entries only upon mispredictions
  - good use of limited storage by minimizing useless allocations
• poTAGE → aggressive policy for reducing cold-start mispredictions
  - update all hitting counters
  - allocate for all context lengths greater than the longest hitting context and for which the u bit is reset
  - stop aggressive allocation for context lengths greater than 200 when all hitting counters are saturated
  - switch to the careful policy after a fixed number of mispredictions
• 4% fewer mispredictions
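The allocation rules above can be sketched as a single decision function. This is only one reading of the slide: the function and parameter names are hypothetical, the aggressive policy is assumed to allocate whether or not the branch was mispredicted, and the careful policy is assumed to allocate a single entry at the shortest eligible length.

```python
def lengths_to_allocate(lengths, longest_hit, u_reset, all_hits_saturated,
                        mispredicted, aggressive, long_threshold=200):
    """Return the context lengths for which a new entry is allocated.

    lengths            -- available context lengths, in ascending order
    longest_hit        -- length of the longest hitting context (0 if none)
    u_reset            -- dict: length -> True if the candidate entry's u bit is reset
    all_hits_saturated -- True if every hitting counter is saturated
    """
    candidates = [n for n in lengths if n > longest_hit and u_reset[n]]
    if not aggressive:
        # Careful policy: allocate only upon a misprediction
        # (a single entry at the shortest eligible length is an assumption).
        return candidates[:1] if mispredicted else []
    if all_hits_saturated:
        # Stop aggressive allocation for context lengths greater than 200.
        candidates = [n for n in candidates if n <= long_threshold]
    return candidates


# Usage with hypothetical lengths and u bits.
lengths = [4, 16, 64, 256, 1024]
u_reset = {4: True, 16: True, 64: False, 256: True, 1024: True}
eager = lengths_to_allocate(lengths, 16, u_reset, all_hits_saturated=False,
                            mispredicted=False, aggressive=True)
```

The switch from the aggressive to the careful policy after a fixed number of mispredictions would be handled by the caller, which sets `aggressive` accordingly.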

Global-path TAGE: footprint problem
• A global path, if long enough, can (in theory) capture all branch correlations
• Problem: high-entropy branches grow the footprint (number of allocations)
• We could try to filter out of the global path the branches that carry no useful correlation information
  - in practice, these branches are difficult to identify
  - filtering them out does not necessarily reduce the footprint
• Alternative approach: intentional path aliasing

Intentional path aliasing
• Path aliasing = several distinct global paths aliased to the same predictor entry and tag
  - something we try to avoid in a global-path TAGE
• Intentional path aliasing reduces the footprint
  - we lose some correlation information → only some branches benefit from it
• Local history can be viewed as intentional path aliasing
• Per-set history (Yeh & Patt, 1993) is intentional path aliasing
  - it was used in the FTL++ predictor (Yasuo Ishii et al., CBP-3)

multi-poTAGE
• Combine several poTAGE predictors using different first-level histories
  - P0: 1 global path
  - P1: 32 local (per-address) subpaths
  - P2: 16 per-set subpaths (128-byte sets)
  - P3: 4 per-set subpaths (2-byte sets)
  - P4: 8 frequency subpaths
• Combined through COLT fusion
  - Loh & Henry, PACT 2002
• Better to have a few long subpaths than many short ones
  - Yasuo Ishii et al., CBP-3
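A first-level history split into subpaths might look like the sketch below. The class name, the selection rule (set number modulo the number of subpaths), the bounded subpath length, and the choice to record branch addresses as path elements are assumptions; the slides give only the number of subpaths and the set sizes.

```python
from collections import deque


class SubpathHistory:
    """First-level history made of several subpaths selected by branch address."""

    def __init__(self, num_subpaths, set_bytes, max_len=64):
        self.num_subpaths = num_subpaths
        self.set_bytes = set_bytes
        self.subpaths = [deque(maxlen=max_len) for _ in range(num_subpaths)]

    def select(self, pc):
        # Branches in the same set share a subpath (selection rule is an assumption).
        return (pc // self.set_bytes) % self.num_subpaths

    def context(self, pc):
        # Context handed to the second level for this branch.
        return tuple(self.subpaths[self.select(pc)])

    def record(self, pc):
        self.subpaths[self.select(pc)].append(pc)


# Configurations mirroring the list above (P4 is selected by frequency instead).
p0 = SubpathHistory(num_subpaths=1, set_bytes=1)     # global path
p1 = SubpathHistory(num_subpaths=32, set_bytes=1)    # local, per address
p2 = SubpathHistory(num_subpaths=16, set_bytes=128)  # per set, 128-byte sets
```

With a single subpath every branch lands in the same history, which is the global path; more subpaths mean more intentional aliasing between the global paths that lead to a branch.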

multi-poTAGE
[Figure: P0 (global), P1 (local), P2 (per set), P3 (per set) and P4 (frequency) each produce a prediction; COLT combines these predictions with the branch address to give the T/NT prediction.]
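The combination step can be sketched as a counter table indexed by the branch address and the vector of component predictions, in the spirit of COLT. This is a simplified illustration, not the submitted design: the class name, the number of address bits, the counter range, the dictionary storage, and the majority-vote fallback for unseen entries are all assumptions.

```python
class ColtFusion:
    """Counters indexed by (branch address bits, component prediction vector)."""

    def __init__(self, addr_bits=10, ctr_max=3):
        self.addr_mask = (1 << addr_bits) - 1
        self.ctr_max = ctr_max
        self.table = {}  # a dictionary stands in for a fixed-size table

    def _index(self, pc, preds):
        vector = 0
        for p in preds:
            vector = (vector << 1) | int(p)
        return (pc & self.addr_mask, vector)

    def predict(self, pc, preds):
        idx = self._index(pc, preds)
        if idx in self.table:
            return self.table[idx] >= 0
        # Unseen combination: fall back to a majority vote (assumption).
        return 2 * sum(preds) > len(preds)

    def update(self, pc, preds, taken):
        idx = self._index(pc, preds)
        c = self.table.get(idx, 0)
        if taken:
            self.table[idx] = min(c + 1, self.ctr_max)
        else:
            self.table[idx] = max(c - 1, -self.ctr_max - 1)


# Usage: for this branch, the majority of components is wrong.
fusion = ColtFusion()
preds = (True, True, True, False, False)  # outputs of P0..P4 (hypothetical)
before = fusion.predict(0x40, preds)
for _ in range(2):
    fusion.update(0x40, preds, taken=False)
after = fusion.predict(0x40, preds)
```

Unlike a majority vote, the table can learn that a particular pattern of agreement and disagreement among the components implies a particular outcome for a given branch.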

Frequency-based first-level history
• Branch frequency = number of times the branch was executed
  - Branch Frequency Table → one counter per branch address
  - increment the counter on each dynamic occurrence
• Exploit correlations between branches with (roughly) the same frequency
• Define 8 frequency bins
  - from high to low frequency
• Associate one subpath with each frequency bin
• Access poTAGE with the subpath corresponding to the branch frequency
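A Branch Frequency Table with 8 bins might be sketched as follows. The slides do not say how the bin boundaries are chosen, so the binning rule here (one bin per factor of 4 in the execution count, bin 0 being the highest frequency) is purely hypothetical, as are the class and method names.

```python
class BranchFrequencyTable:
    """One execution counter per branch address, mapped to 8 frequency bins."""

    NUM_BINS = 8

    def __init__(self):
        self.count = {}

    def record(self, pc):
        # Increment the counter on each dynamic occurrence of the branch.
        self.count[pc] = self.count.get(pc, 0) + 1

    def bin(self, pc):
        # Hypothetical binning: each bin spans a factor of 4 in execution
        # count; bin 0 holds the most frequent branches, bin 7 the rarest.
        level = min(self.count.get(pc, 0).bit_length() // 2, self.NUM_BINS - 1)
        return self.NUM_BINS - 1 - level


# Usage: three branches (hypothetical addresses) with very different frequencies.
bft = BranchFrequencyTable()
for _ in range(20000):
    bft.record(0x100)
for _ in range(100):
    bft.record(0x200)
bft.record(0x300)
```

The bin number then selects one of the 8 frequency subpaths, so that branches executed about equally often share a history.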

Global path: most accurate single component
[Figure: P0 (global) alone, then P0 and the branch address feeding COLT: -0.5 %]

2nd most important: 128-byte sets
[Figure: P0 (global) and P2 (per set) feeding COLT together with the branch address; adding P2: -5 %]

3rd: local
[Figure: P1 (local) added: -3 %]

4th: frequency
[Figure: P4 (frequency) added: -2.5 %]

5th: 4-byte sets
[Figure: P3 (per set) added: -1 %]

Total
[Figure: P0 (global), P1 (local), P2 (per set), P3 (per set) and P4 (frequency) feeding COLT together with the branch address: -10 %]

Conclusion
• A post-predictor is more effective than a selection counter for reducing the cold-counter problem
• A huge TAGE can use aggressive update & allocation
• Fundamental weakness of global-path TAGE: high-entropy branches grow the footprint
• Proposed solution: blind use of intentional path aliasing
• Is it possible to use intentional path aliasing in a cost-effective way?

Questions?
