Execution-based Prediction Using Speculative Slices
Craig Zilles and Guri Sohi
University of Wisconsin - Madison
International Symposium on Computer Architecture, July 2001
The Problem
Two major barriers to achieving high ILP: MISPREDICTED BRANCHES and CACHE MISSES

TRADITIONAL PREDICTION: SOMEWHAT MATURE TECHNOLOGY
• correctly anticipates > 90% of instructions
• exploits patterns in the outcome/address stream
• remaining mispredictions are still expensive

EXECUTION-BASED PREDICTION
• exploits regularity in computations
• speculatively computes results early for use as predictions
• speedups from 1 to 43% on SPECINT 2000

Execution-based Prediction using Speculative Slices - Craig Zilles and Guri Sohi, International Symposium on Computer Architecture (ISCA-28), July 2001
The Solution
1. Identify frequently mispredicting instructions.
2. Extract and pack the dependent computation into code fragments called slices.
[Figure: the retirement stream over time, showing a branch mispredict and a load cache miss; a branch slice and a load slice are extracted from the program.]
The Solution
3. Execute slices in helper threads to generate predictions.
[Figure: timeline comparing execution without slices (branch mispredict, cache miss, idle threads) against execution with forked branch and load slices (prediction available, cache hit), showing the resulting speedup.]
The Outline
• PROBLEM INSTRUCTIONS
• EXECUTION-BASED PREDICTION
• PREDICTION CORRELATION
• PERFORMANCE RESULTS
[Figure: slice timeline (fork, branch slice prediction, load slice, cache hit, speedup).]
Problem Instructions
Misses and mispredictions are not evenly distributed.

EXAMPLE: PERLBMK
• 82 static branches: 68% of mispredictions, 9% of dynamic branches
• 140 static loads: 67% of misses, 2% of dynamic memory instructions

Fixing just the problem instructions gives more than half the performance of a perfect cache and predictor.

OUTCOMES OF THESE INSTRUCTIONS DO NOT EXHIBIT A PREDICTABLE PATTERN ...
• consistently mispredicted
... BUT SOMETIMES THE COMPUTATION IS REGULAR.

    while (...) {
        ...
        ptr = ptr->next;
    }

    while (i < n) {
        if (object[i] != NULL) {
            ...
        }
    }
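One way to picture how a small set of problem instructions could be picked from a profile is sketched below. The record layout, the function name `select_problem_insts`, and the percentage threshold are all assumptions made for illustration; the slides do not say how the authors selected their sets. The sketch sorts static instructions by the number of mispredictions or misses attributed to them and keeps the smallest prefix that reaches the target coverage.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical profile record: one entry per static branch or load. */
struct inst_profile {
    uint64_t pc;      /* address of the static instruction */
    uint64_t events;  /* mispredictions (or cache misses) charged to it */
};

/* Order by event count, largest first; break ties by PC for determinism. */
static int by_events_desc(const void *a, const void *b)
{
    const struct inst_profile *x = a;
    const struct inst_profile *y = b;
    if (x->events != y->events)
        return (x->events < y->events) ? 1 : -1;
    return (x->pc > y->pc) - (x->pc < y->pc);
}

/*
 * Sorts prof[] in place and returns k, the size of the smallest prefix
 * whose events account for at least target_pct percent of all events.
 * After the call, prof[0..k-1] are the candidate problem instructions.
 */
size_t select_problem_insts(struct inst_profile *prof, size_t n,
                            unsigned target_pct)
{
    uint64_t total = 0;
    uint64_t covered = 0;
    size_t k = 0;

    assert(target_pct <= 100);
    for (size_t i = 0; i < n; i++)
        total += prof[i].events;

    qsort(prof, n, sizeof *prof, by_events_desc);

    while (k < n && covered * 100 < (uint64_t)target_pct * total) {
        covered += prof[k].events;
        k++;
    }
    return k;
}
```

With a skewed profile like the perlbmk numbers above, a call such as `select_problem_insts(prof, n, 68)` would return a small `k` relative to `n`.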
Outline
• PROBLEM INSTRUCTIONS
• EXECUTION-BASED PREDICTION
  o A different pre-execution approach
  o Speculative slices and imprecise transformations
  o Slice structure
  o Slice characterization
• PREDICTION CORRELATION
• PERFORMANCE RESULTS
Previous pre-execution proposal
Speculative Data-driven Multithreading: Roth and Sohi, HPCA'01
• Speculatively pre-executes data-driven threads (DDTs)
• Register integration matches DDTs to the main thread
  + avoids re-execution of DDT instructions
  + early branch resolution (at decode stage)
  - DDTs must be a subset of the original program
Two Observations
• benefit comes from prefetches and predictions
• strict program subsets are not the most efficient slices

Our approach: generate predictions/prefetches in as efficient a manner as possible.

OPTIMIZE SLICES:
  + reduce fetch/execution overhead
  + reduce critical path to making a prediction
  - need a new mechanism to correlate predictions
Speculative Slices
DON'T ALLOW SLICES TO AFFECT ARCHITECTED STATE
• only generate prefetches and predictions
• need not be 100% accurate

3 CLASSES OF TRANSFORMATIONS (NOT ORIGINALLY APPLIED BY COMPILER):
• Imprecise
  o static branch assertion (remove branches/cold code)
• Not provably safe
  o register allocation in the presence of aliases
• Previously unprofitable
  o if-conversion (of a subset of a block)
Slice Structure
• problem instructions are frequently in loops
• encapsulate the loop in the slice

BENEFITS:
• lower overhead
• earlier predictions
• amortize fork overhead
• single helper thread

ISSUES & SOLUTIONS: in paper
[Figure: a program forks a slice that covers the problem load.]
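As a rough software analogy for a slice that encapsulates a loop, the sketch below distills the `while (i < n)` example from the Problem Instructions slide down to the computation that decides the problem branch `object[i] != NULL`. Everything here is illustrative: the function name `branch_slice`, the output array, and the `max_iters` bound are assumptions, and a real slice would run as a hardware helper thread that feeds predictions to the fetch unit instead of writing to memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical slice for the problem branch `if (object[i] != NULL)`.
 *
 * Live-in values: object, i, n (copied from the main thread at the fork).
 * max_iters is an assumed bound so that a slice forked with stale or
 * wrong live-ins still terminates.
 *
 * Writes one prediction per loop iteration into pred[] and returns the
 * number of predictions produced. pred[k] is true when the branch
 * condition is expected to hold in the k-th iteration after the fork.
 */
size_t branch_slice(void *const *object, size_t i, size_t n,
                    bool *pred, size_t max_iters)
{
    size_t produced = 0;

    assert(object != NULL && pred != NULL);
    while (i < n && produced < max_iters) {
        /* Only the work that feeds the branch survives in the slice;
           the loop body guarded by the branch is dropped entirely. */
        pred[produced++] = (object[i] != NULL);
        i++;
    }
    return produced;
}
```

Because the loop lives inside the slice, a single fork yields a prediction for every iteration, which is the amortization the slide lists under benefits.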
Slice Characterization
CONSTRUCTED AND OPTIMIZED SLICES BY HAND
• encouraging results

STATISTICS:
• 85% of slices cover multiple static problem instructions
• 70% of slices contain loops
• small static size
  o smaller than 4 × the number of problem instructions covered
• a prefetch or prediction is generated every ~3 dynamic instructions
• small number of live-in values
  o 80% of slices had 2 or fewer

Slices can be very small.
Outline
• PROBLEM INSTRUCTIONS
• EXECUTION-BASED PREDICTION
• PREDICTION CORRELATION
  o difficult problem
  o valid regions
• RESULTS AND ANALYSIS
Prediction Correlation
TO BENEFIT FROM A SLICE-GENERATED PREDICTION
• must bind it to the fetched branch instruction
• overrides the hardware branch predictor

HOW ARE PREDICTIONS CORRELATED TO DYNAMIC BRANCHES?
[Figure: fork, branch slice, prediction, BRANCH.]
Prediction Correlation
Tagged prediction queues
[Figure: three queues, each tagged with a branch PC and holding a sequence of T/F predictions (F T F T T; T T F; T F T T F).]
Related work: Farcy et al., MICRO '98

CHALLENGES:
• re-ordering predictions produced out of order
• recovering from misspeculation by the main thread
• dealing with conditionally executed problem branches
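A tagged prediction queue can be modeled in software as a small FIFO keyed by the problem branch's PC. The sketch below is a minimal model under stated assumptions: the names `pred_queue`, `pq_push`, and `pq_predict`, the fixed depth, and the fallback to a hardware prediction are all hypothetical, and it deliberately ignores the three challenges listed above (out-of-order production, misspeculation recovery, conditionally executed branches).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PQ_DEPTH 8  /* assumed queue capacity */

struct pred_queue {
    uint64_t branch_pc;        /* tag: PC of the problem branch */
    bool     taken[PQ_DEPTH];  /* circular buffer of predictions */
    size_t   head;             /* index of the oldest prediction */
    size_t   count;            /* number of queued predictions */
};

void pq_init(struct pred_queue *q, uint64_t branch_pc)
{
    assert(q != NULL);
    q->branch_pc = branch_pc;
    q->head = 0;
    q->count = 0;
}

/* Slice side: append a prediction. Returns false if the queue is full. */
bool pq_push(struct pred_queue *q, bool taken)
{
    assert(q != NULL);
    if (q->count == PQ_DEPTH)
        return false;
    q->taken[(q->head + q->count) % PQ_DEPTH] = taken;
    q->count++;
    return true;
}

/*
 * Fetch side: if the fetched PC matches the tag and a prediction is
 * queued, dequeue it and let it override the hardware predictor.
 * Otherwise return the hardware prediction unchanged.
 */
bool pq_predict(struct pred_queue *q, uint64_t fetch_pc, bool hw_pred)
{
    bool taken;

    assert(q != NULL);
    if (fetch_pc != q->branch_pc || q->count == 0)
        return hw_pred;
    taken = q->taken[q->head];
    q->head = (q->head + 1) % PQ_DEPTH;
    q->count--;
    return taken;
}
```

This model dequeues a prediction each time the tagged branch is fetched, so it only stays aligned when the branch is fetched exactly once per prediction produced.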
Conditionally-executed problem branches
MINIMIZE OVERHEAD BY BUILDING SIMPLEST SLICE
• compute a prediction for each iteration

NAIVE IMPLEMENTATION
• predictions dequeued when used
• mis-alignment occurs on path CF

CONDITIONALLY GENERATE PREDICTIONS?
• include "existence slice" in slice
• too much overhead

INSIGHT
• existence slice is encoded in the fetch path

[Figure: the program's CFG with blocks A through G, marking the fork point and the problem branch, which is not executed on all iterations.]
Valid Regions
DEFINE REGION WHERE PREDICTION IS VALID
• using assumptions from building slice

[Figure: the CFG (blocks A through G) unrolled over two iterations; the 1st prediction is valid in the first iteration and the 2nd prediction in the second.]
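The valid-region idea can be illustrated with a queue whose head is consulted when the problem branch is fetched but retired only when fetch leaves the region. The sketch assumes the region is one loop iteration, so the main thread signals the end of a region at each iteration boundary; that choice, along with the names `region_queue`, `rq_predict`, and `rq_end_region`, is an assumption for illustration and not the mechanism as specified in the paper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RQ_DEPTH 8  /* assumed queue capacity */

struct region_queue {
    bool   taken[RQ_DEPTH];  /* one prediction per region (iteration) */
    size_t head;             /* prediction for the current region */
    size_t count;
};

void rq_init(struct region_queue *q)
{
    assert(q != NULL);
    q->head = 0;
    q->count = 0;
}

/* Slice side: one prediction per iteration, in iteration order. */
bool rq_push(struct region_queue *q, bool taken)
{
    assert(q != NULL);
    if (q->count == RQ_DEPTH)
        return false;
    q->taken[(q->head + q->count) % RQ_DEPTH] = taken;
    q->count++;
    return true;
}

/* Fetch of the problem branch: read the head without dequeuing it. */
bool rq_predict(const struct region_queue *q, bool hw_pred)
{
    assert(q != NULL);
    return q->count ? q->taken[q->head] : hw_pred;
}

/*
 * Fetch leaves the current region (assumed here: the iteration ends).
 * The head prediction is retired whether or not the branch used it,
 * so a skipped branch cannot shift later predictions out of alignment.
 */
void rq_end_region(struct region_queue *q)
{
    assert(q != NULL);
    if (q->count) {
        q->head = (q->head + 1) % RQ_DEPTH;
        q->count--;
    }
}
```

If the slice queues T, F, T and the second iteration takes a path that skips the problem branch, the third iteration still reads the third prediction, because the unused second one was retired at the iteration boundary.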