Predicting Instruction Cache Behavior
Frank Mueller (FSU), David Whalley (FSU), Marion Harmon (FAMU)
Department of Computer Science, Florida State University, Tallahassee, FL 32304-4019
e-mail: mueller@cs.fsu.edu, whalley@cs.fsu.edu, harmon@vm.cc.famu.edu
LCTS'94

Overview
• caches are often disabled for real-time systems due to "unpredictability"
• analysis of instruction cache behavior is possible
• static cache simulation predicts many references
• allows tighter WET/BET (worst-case/best-case execution time) predictions for regular caches
• new architectural feature: fetch-from-memory bit
  - speedup factor of 3 to 8 over an uncached system
  - no loss in predictability

Introduction
• timing predictions are required for schedulability analysis
• caches bridge the bottleneck between CPU and main memory (MM) speed
• caches are regarded as "unpredictable"
• caches are often disabled for hard real-time systems
• CPU speed is not fully utilized
• the problem will increase in the future

Static Cache Simulation
• addresses of instructions are known statically
• predicts a large portion of instruction cache references
• uses data-flow analysis of call graph and control flow
• categorizes each instruction
• assumes:
  - direct-mapped caches
  - task: code executed between 2 scheduling points
  - non-preemptive static scheduling
  - currently no recursion allowed

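Because instruction addresses are known statically and the cache is direct-mapped, the cache line that each instruction occupies can be computed before the program runs. The following is a minimal sketch of that mapping, not the authors' tool. The function names are hypothetical; the 16-byte line size is the one used in the measurements.

```python
LINE_SIZE = 16  # bytes per cache line (value used in the measurements)


def program_line(addr):
    """Index of the 16-byte program line that contains the instruction at addr."""
    return addr // LINE_SIZE


def cache_line(addr, num_lines):
    """Direct-mapped placement: each program line maps to exactly one cache line."""
    return program_line(addr) % num_lines


def conflicts(addr_a, addr_b, num_lines):
    """True if the two instructions lie in different program lines
    that compete for the same cache line."""
    return (program_line(addr_a) != program_line(addr_b)
            and cache_line(addr_a, num_lines) == cache_line(addr_b, num_lines))
```

With 4 cache lines, the instructions at addresses 0 and 64 compete for cache line 0, while the instructions at addresses 0 and 4 share one program line and therefore never evict each other.
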
Overview of Static Cache Simulation
[Figure: tool flow diagram. Labels: source files, compiler, object files, linker, executable program, control flow information, cache configuration, static cache simulator, instruction annotation.]

Instruction Categorization
• transforms the call graph into a function-instance graph (FIG)
• performs analysis on the FIG and the control-flow graph
• uses data-flow analysis algorithms for prediction
• abstract cache state: potentially cached program lines
• reaching state: reachable program lines
• categories based on these states:
  - always hit
  - always miss
  - first miss: miss on first reference, hit on consecutive ones
  - conflict: either hit or miss (dynamic)

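The idea of an abstract cache state can be illustrated with a small data-flow fixed point over a control-flow graph. This is a simplified sketch under stated assumptions, not the authors' algorithm: the slides do not give the data-flow equations, so the transfer function below (union at joins, a referenced line replaces everything mapped to the same cache line), the "invalid" placeholder entries and all names are assumptions. The reaching state and the first-miss category are not modelled, so a loop body that would be a first miss is reported here as a conflict.

```python
def slot_of(item, num_lines):
    """Cache line occupied by an abstract-state item."""
    kind, value = item
    return value if kind == "invalid" else value % num_lines


def abstract_cache_states(blocks, preds, entry, num_lines):
    """blocks: {block: [program lines referenced]}, preds: {block: [predecessors]}.
    Returns, per block, the set of items potentially cached on entry."""
    invalid = {("invalid", c) for c in range(num_lines)}  # cache starts empty
    in_state = {b: set() for b in blocks}
    out_state = {b: set() for b in blocks}
    changed = True
    while changed:  # iterate until nothing changes
        changed = False
        for b, lines in blocks.items():
            new_in = set(invalid) if b == entry else set()
            for p in preds.get(b, []):
                new_in |= out_state[p]  # union over all incoming paths
            new_out = set(new_in)
            for line in lines:
                slot = line % num_lines
                # a reference replaces whatever occupied that cache line
                new_out = {x for x in new_out if slot_of(x, num_lines) != slot}
                new_out.add(("line", line))
            if new_in != in_state[b] or new_out != out_state[b]:
                in_state[b], out_state[b] = new_in, new_out
                changed = True
    return in_state


def categorize(line, state, num_lines):
    """Simplified three-way categorization of the first reference to a line."""
    me = ("line", line)
    slot = line % num_lines
    rivals = {x for x in state if x != me and slot_of(x, num_lines) == slot}
    if me not in state:
        return "always miss"
    return "always hit" if not rivals else "conflict"


# Hypothetical CFG: A -> B, B -> B (loop), B -> C, B -> D, C -> D, D -> E
blocks = {"A": [0], "B": [1], "C": [4], "D": [0], "E": [0]}
preds = {"B": ["A", "B"], "C": ["B"], "D": ["B", "C"], "E": ["D"]}
states = abstract_cache_states(blocks, preds, "A", 4)
```

In this toy graph, program lines 0 and 4 share cache line 0. The reference to line 0 in block D is a conflict because block C may or may not have replaced it, whereas the reference in block E is an always hit because D has just reloaded the line on every path.
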
[Figure: example control-flow graph with main() (blocks 1 to 7, two calls to foo()) and foo() (block 8, instances (a) and (b)), laid out over program lines 0 to 5. Each instruction is annotated with its category: a-hit (always hit), a-miss (always miss), f-miss (first miss), or conflict.]

Notes on the example figure
• 4 cache lines
• 16 bytes per line (4 instructions)
• instances of foo: (a) block 8a and (b) block 8b
• 7(1): always hit, spatial locality
• 8b(1): always hit, temporal locality
• 3(3): first miss
• 5(1) and 6(1): group first miss
• 3(1): conflict with 8b(2), which is conditionally executed

Fetch-From-Memory Bit
• motivation:
  - better performance than uncached systems
  - no loss of predictability
• fetch-from-memory (FFM) bit encoded in the instruction
• semantics:
  - FFM set: fetch instruction from MM
  - FFM clear: fetch instruction from cache

Fetch-From-Memory Bit (cont.)
• hardware logic:
  - cache miss: fetch from memory (n cycle delay)
  - cache hit and FFM set: fetch from memory (n cycle delay)
  - cache hit and FFM clear: fetch from cache without delay
• relation to instruction categorization:
  - FFM set iff conflict or always miss
  - FFM clear iff first miss or always hit
• first miss:
  - 1st reference results in a cache miss (n cycle delay)
  - consecutive references result in a cache hit with FFM clear (no delay)

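The hardware logic and its link to the categories can be written down as a small model. This is an illustrative sketch only: the function names are hypothetical, and the loop assumes that a miss loads the line and that nothing evicts it afterwards, which the slides do not state explicitly.

```python
def ffm_bit(category):
    """FFM is set iff the instruction is a conflict or an always miss."""
    return category in ("conflict", "always miss")


def fetch_delay(in_cache, ffm_set, n):
    """Fetch delay in cycles for one instruction reference."""
    if not in_cache:
        return n  # cache miss: fetch from memory
    if ffm_set:
        return n  # cache hit, but FFM forces a memory fetch
    return 0      # cache hit and FFM clear: fetch from cache


def first_miss_delays(references, n):
    """Delays seen by a first-miss instruction over repeated references."""
    in_cache = False
    delays = []
    for _ in range(references):
        delays.append(fetch_delay(in_cache, ffm_bit("first miss"), n))
        in_cache = True  # assumption: the miss loaded the line, no eviction follows
    return delays
```

For example, `first_miss_delays(3, 10)` yields `[10, 0, 0]`. An instruction with FFM set costs n cycles whether or not it happens to be in the cache, which is why its timing stays as predictable as in an uncached system.
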
Measurements
• modified the back-end of the optimizing compiler VPO
• performed static cache simulation
• instrumented programs for instruction cache simulation
• direct-mapped cache simulated
• uniform instruction size of 4 bytes simulated
• cache line size was 4 words (16 bytes)

Static Measurements

Cache   FFM set   cache prediction
size              always hit   always miss   first miss   conflict
1kB     25.19%    71.23%       8.66%         3.69%        16.42%
2kB     21.18%    72.09%       5.88%         7.28%        14.75%
4kB     11.35%    72.40%       4.36%         16.64%       6.60%
8kB     4.73%     72.61%       4.03%         22.77%       0.59%

Notes: Static Measurements
• cache sizes 1-8kB
• 12 programs with sizes 5-18kB
• FFM set: measured in the DAG call graph and CFG
• others: measured in the FIG
• caches statically predictable for 84-99% of references
• remaining 1-16% due to conflicts

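The predictable share quoted in the notes follows directly from the table: everything except the conflict category is known statically. The short check below recomputes it from the table values; the variable and function names are illustrative.

```python
# (always hit, always miss, first miss, conflict) in percent, from the static table
static = {
    "1kB": (71.23, 8.66, 3.69, 16.42),
    "2kB": (72.09, 5.88, 7.28, 14.75),
    "4kB": (72.40, 4.36, 16.64, 6.60),
    "8kB": (72.61, 4.03, 22.77, 0.59),
}


def predictable(size):
    """Share of references whose cache behavior is known statically."""
    always_hit, always_miss, first_miss, _conflict = static[size]
    return always_hit + always_miss + first_miss


for size, row in static.items():
    print(f"{size}: categories sum to {sum(row):.2f}%, "
          f"predictable {predictable(size):.2f}%")
```

The four categories add up to 100% for every cache size, and the predictable share grows from about 84% at 1kB to about 99% at 8kB.
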
Dynamic Measurements

Cache   hit ratio             conflicts   % of exec time
size    bit-enc.   cached     cached      bit-enc.   cached
1kB     71.81%     92.40%     25.38%      39.30%     18.71%
2kB     77.81%     97.49%     21.14%      33.30%     13.62%
4kB     90.73%     99.74%     9.12%       20.38%     11.37%
8kB     98.15%     99.99%     1.76%       12.97%     11.13%

Notes: Dynamic Measurements
• uncached: simulated disabled instruction cache with 10 cycles overhead for each instruction fetch
• bit-encoded: simulated translation of the bit-encoding as discussed
• cached: conventional cached system
• 1-19 million instructions executed
• results improve with increasing cache size
• bit-enc.: lower hit ratio than cached (72-98% vs. 92-99%) but much better than uncached!
• bit-enc.: 3-8 times faster than uncached (39-13% of uncached exec time)!
• cached: 5-9 times faster than uncached (18-11% of uncached exec time)
• cached is less predictable; bit-enc. is as predictable as uncached!
• conflicts are the source of unpredictability, 25-5%
• results can still be improved if combined with the timing tool (4-9 speedup)
• very tight estimates for a regular cached system are possible with the timing tool

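The speedup factors quoted in the notes are the reciprocals of the execution-time percentages in the table, since those percentages are relative to the uncached system. A small check, with illustrative names:

```python
# (bit-encoded, cached) execution time as a percentage of the uncached time
exec_time_pct = {
    "1kB": (39.30, 18.71),
    "2kB": (33.30, 13.62),
    "4kB": (20.38, 11.37),
    "8kB": (12.97, 11.13),
}


def speedup(pct_of_uncached):
    """Speedup over the uncached system for a given relative execution time."""
    return 100.0 / pct_of_uncached


for size, (bit_enc, cached) in exec_time_pct.items():
    print(f"{size}: bit-enc. {speedup(bit_enc):.1f}x, cached {speedup(cached):.1f}x")
```

This gives roughly 2.5x to 7.7x for the bit-encoded system and 5.3x to 9.0x for the conventional cache, which the notes round to 3-8 and 5-9.
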
Preliminary Timing Results: Dynamic Worst-Case Measurements

Name         Observed cycles   Our estimated ratio   Naive ratio
Matmult      2,917,887         1.00                  9.21
Matsum       677,204           1.00                  4.63
Matsumcnt    959,064           1.09                  4.31
Bubblesort   7,620,684         1.99                  8.18

Notes: Preliminary Timing Results
• 8 lines of 16 bytes each, i.e. the cache size is 128 bytes
• programs are 4-6 times larger than the cache
• observed: simulated cached system
• our estimate: timing tool
• naive: uncached system (10 cycles fetch delay per instruction)
• matmult: loops, no if-then-else
• matsum: loops, if-then
• matsumcnt: loops, if-then-else
• bubblesort: inner loop counters depend on outer loop counters
• general problem for timing tools (known number of loop iterations)
• surprisingly tight estimates are possible, just as good as without caches

Future Work
• data caching
• recursion
• set-associative caches
• integrate with the timing tool to tightly predict WET/BET
• other applications