Static Transformation for Heap Layout Using Memory Access Patterns
Jinseong Jeon
Computer Science, KAIST
Static Transformation
[Diagram: computing: user, compiler, machine + static transformation]
• Compilers can transform program memory layout.
  – program behaviors: memory access patterns
  – machine properties: memory hierarchy
CS @ KAIST, 2006
Heap Layout Transformation
• [Pool Allocation]
  – complex pointer analysis
• [Field Layout Reconstruction]
  – profiling
Example code:
    Node { int key; char data[6]; Node *next; } *T;
    char* search(int k) {
      ...
      while (...) {
        if (h->key == k)
          return h->data;
        ...
        h = h->next;
      }
      ...
    }
[Diagram: the field accesses in search() are labeled k (key), d (data), and n (next)]
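The field layout idea can be illustrated with the Node structure above. The following is only a sketch, under the assumption that search() touches key and next on every iteration and data only on a hit. The names NodeReordered, gap_original, gap_reordered, and check_reordering are hypothetical and do not come from the slides.

```c
#include <assert.h>
#include <stddef.h>

/* Original layout from the slide: data sits between key and next. */
struct Node {
    int key;
    char data[6];
    struct Node *next;
};

/* Hypothetical reordered layout: the two fields read on every loop
   iteration (key, next) are placed next to each other. */
struct NodeReordered {
    int key;
    struct NodeReordered *next;
    char data[6];
};

/* Byte distance from the start of key to the start of next. */
size_t gap_original(void) {
    return offsetof(struct Node, next) - offsetof(struct Node, key);
}

size_t gap_reordered(void) {
    return offsetof(struct NodeReordered, next)
         - offsetof(struct NodeReordered, key);
}

/* On common ABIs the reordered layout brings key and next closer. */
void check_reordering(void) {
    assert(gap_reordered() < gap_original());
}
```

A smaller gap makes it more likely that key and next share a cache line, which is the effect field layout reconstruction aims for.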
Goal & Direction
• To build a static transformation for heap layout
  – Based on both heap layout transformations
• Predict program behaviors
  – How to represent memory access behaviors
    • Regular expressions
  – How to extract run-time behaviors from code
    • Code → CFG → Automaton → R.E.
• Then, apply optimizing techniques
  – How to interpret predicted behaviors
Overview
[Diagram components: Source code, Access Pattern Analysis, Structure Selection Analysis, Field Affinity Analysis, Layout Transformer, Optimized code]
Structure Selection
• Source code:
    S1.x = T1.c;
    for (...) {
      Ti.a = ...;
      ... = Ti.b;
      Uj.y = ...;
    }
• Structure type projection:
    S = T;
    for ( ) {
      T = ;
      = T;
      U = ;
    }
• Conversion: TS before the loop and TTU inside it, giving TS(TTU)*
• Result: candidate selection for pool allocation
Field Affinity Estimation
• Source code:
    S1.x = T1.c;
    for (...) {
      Ti.a = ...;
      ... = Ti.b;
      Uj.y = ...;
    }
• Field usage projection:
    = .c;
    for ( ) {
      .a = ;
      = .b;
    }
• Conversion: c before the loop and ab inside it, giving c(ab)*
• Symbolic estimation on c(ab)* then drives field layout reconstruction
Field Affinity Estimation
• Symbolic approach
  – record closure marks with nesting information
  – treat all closure marks as the same variable
• Example expressions: (kdn(n)*)* and ((kn)*(kd+ε))*
[Diagram: affinity graphs over the fields k, n, d; closure nesting is marked with * and **, and the symbolic weights shown include 2x^2 + 3x, 3x, x^2, and x]
Code Transformation
• Explicit field names → field accesses on modified layouts
  – Oi.next is converted into *(Oi + offset(next)).
  – Arbitrary pointer dereferences such as *(p + 4) are not allowed.
  – For some accesses, extra instructions are required.
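The conversion of a named field access into an offset-based access can be sketched in C with offsetof. This is an illustration only: the struct layout and the helper names load_next and load_key are assumptions, not the output of the actual transformer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* An assumed modified layout; the field order is illustrative. */
struct Node {
    int key;
    struct Node *next;
    char data[6];
};

/* Reads o.next as *(o + offset(next)), the form named on the slide.
   memcpy keeps the access well defined for any object pointer. */
struct Node *load_next(const void *o) {
    struct Node *result;
    assert(o != NULL);
    memcpy(&result, (const char *)o + offsetof(struct Node, next),
           sizeof result);
    return result;
}

/* Reads o.key the same way. */
int load_key(const void *o) {
    int result;
    assert(o != NULL);
    memcpy(&result, (const char *)o + offsetof(struct Node, key),
           sizeof result);
    return result;
}
```

Because every access goes through a known field offset, the layout can change without touching the call sites, which is also why untyped dereferences like *(p + 4) cannot be supported.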
Code Transformation
• Type-aware malloc → pool allocation routines
  – For custom allocators, feed hints that pair each target structure with its custom allocator
Direct malloc, before:
    ... = malloc(sizeof(T));
Direct malloc, after:
    char* _T_alloc_() {
      // pool allocation
    }
    ... = _T_alloc_();
Custom allocator, before:
    char* my_malloc(int s) {
      ...
      ... = malloc(s);
    }
    ... = my_malloc(sizeof(T));
Custom allocator, after:
    char* _T_alloc_() {
      // pool allocation
    }
    char* my_malloc(int s) {
      ...
      ... = _T_alloc_();
    }
    ... = my_malloc(sizeof(T));
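The slide leaves the body of the pool allocation routine open. A minimal sketch of what such a routine might look like is a bump allocator that hands out consecutive slots of one structure type. Everything here is an assumption: the type T, the chunk size POOL_CHUNK, and the name T_pool_alloc, which stands in for the slide's _T_alloc_ because C reserves identifiers that begin with an underscore followed by an uppercase letter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed target structure for the pool. */
typedef struct T {
    int key;
    struct T *next;
    char data[6];
} T;

#define POOL_CHUNK 64          /* objects per chunk; an arbitrary choice */

static T *pool_cur = NULL;     /* next free slot in the current chunk */
static size_t pool_left = 0;   /* free slots left in the current chunk */

/* Hands out consecutive T-sized slots from a chunk, so objects of the
   same type end up adjacent in memory. Returns NULL when malloc fails.
   Chunks are never freed in this sketch. */
char *T_pool_alloc(void) {
    if (pool_left == 0) {
        pool_cur = malloc(POOL_CHUNK * sizeof(T));
        if (pool_cur == NULL)
            return NULL;
        pool_left = POOL_CHUNK;
    }
    assert(pool_left > 0);
    pool_left--;
    return (char *)pool_cur++;
}
```

Two back-to-back calls return addresses exactly sizeof(T) apart as long as they fall in the same chunk, which is the locality property pool allocation relies on.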
Overview
[Diagram components: Source code, Access Pattern Analysis, Structure Selection Analysis, Field Affinity Analysis, Layout Transformer, Optimized code]
Experimental Environment
• Using the CIL compiler and OCaml
• Red Hat Linux 9.0 PC
  – 2.6GHz Pentium 4 processor
  – 8KB L1D cache, 512KB L2 cache, 1.7GB main memory
• GCC 3.2.2 with -O3
Analysis Time
Benchmark      Program    Lines of Code  Structure Selection  Field Affinity  Code Transform  Total
SPECINT 2000   175.vpr    11301          7.220                0.324           0.107           7.651
SPECINT 2000   300.twolf  17821          15.598               3.455           1.126           20.179
FreeBench      analyzer   763            0.096                0.027           0.012           0.135
McGill         chomp      378            0.021                0.006           0.003           0.030
McGill         misr       181            0.003                0.002           0.001           0.006
Olden suite    bisort     597            0.020                0.003           0.002           0.025
Olden suite    health     474            0.024                0.004           0.002           0.030
Olden suite    mst        408            0.031                0.004           0.002           0.037
Olden suite    perimeter  345            0.012                0.012           0.001           0.025
Olden suite    treeadd    154            0.002                0.000           0.000           0.002
Olden suite    tsp        433            0.011                0.004           0.002           0.017
Olden suite    voronoi    975            0.048                0.004           0.003           0.055
Ptrdist suite  anagram    355            0.031                0.003           0.001           0.035
Ptrdist suite  bc         4303           2.028                0.634           0.193           2.855
Ptrdist suite  ft         926            0.050                0.014           0.010           0.074
Ptrdist suite  ks         551            0.055                0.012           0.020           0.087
Cache Miss - L1D
[Figure: bar chart of normalized L1D cache misses (1.0 = Original) for Pool and Pool + Re across the benchmarks 175.vpr, 300.twolf, analyzer, chomp, misr, bisort, health, mst, perimeter, treeadd, tsp, voronoi, anagram, bc, ft, ks. Labeled values: 1.99, 2.23, 0.86, 0.84]
Cache Miss - L2
[Figure: bar chart of normalized L2 cache misses (1.0 = Original) for Pool and Pool + Re across the benchmarks 175.vpr, 300.twolf, analyzer, chomp, misr, bisort, health, mst, perimeter, treeadd, tsp, voronoi, anagram, bc, ft, ks. Labeled values: 4.10, 4.18, 1.06, 1.00]
Performance
Benchmark      Program    Lines of Code  Original (seconds)  Pool / Original  Pool + Re / Original
SPECINT 2000   175.vpr    11301          10.959              1.01             1.01
SPECINT 2000   300.twolf  17821          435.19              0.98             0.99
FreeBench      analyzer   763            66.64               0.41             0.45
McGill         chomp      378            7.44                0.59             0.47
McGill         misr       181            31.39               0.99             1.01
Olden suite    bisort     597            24.29               0.99             0.99
Olden suite    health     474            86.05               0.71             0.63
Olden suite    mst        408            65.73               0.82             0.82
Olden suite    perimeter  345            7.19                0.78             0.84
Olden suite    treeadd    154            10.17               0.48             0.55
Olden suite    tsp        433            20.44               0.96             0.97
Olden suite    voronoi    975            11.03               0.99             0.99
Ptrdist suite  anagram    355            1.53                0.99             1.11
Ptrdist suite  bc         4303           1.95                0.82             0.81
Ptrdist suite  ft         926            8.25                0.83             0.73
Ptrdist suite  ks         551            7.46                1.03             1.03
Avg.                                                         0.84             0.84
Contribution
• Predict memory access patterns at compile-time
  – Regular expressions
  – Automata reduction algorithm
• Interpret predicted patterns according to heap layout transformations
• Cache misses are reduced by 16%
• Execution times are reduced by 14%
Backup Slides
From CFG to Automaton
[Diagram: CFG of search() with the nodes start, h == NULL (T/F), NotFound, h->key == k (T/F), return h->data, and h = h->next. In the automaton, the accesses are labeled k (key), d (data), and n (next)]
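State Elimination
[Diagram: a state with self-loop e is eliminated. Its incoming edge a and outgoing edge c combine into ae*c, and its incoming edge b and outgoing edge d combine into be*d]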
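The label arithmetic of a single elimination step can be sketched as string concatenation. This is an illustrative helper, not the presented implementation: the function name bypass_label is hypothetical, and the labels are assumed to be plain concatenations that need no extra parentheses of their own.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes the label of the bypass edge created when a state is removed:
   incoming label `in`, self-loop label `loop` (may be ""), outgoing
   label `out`, giving in (loop)* out. A one-letter loop is written
   without parentheses. Returns 0 on success, -1 if buf is too small. */
int bypass_label(const char *in, const char *loop, const char *out,
                 char *buf, size_t size) {
    int n;
    assert(in != NULL && loop != NULL && out != NULL && buf != NULL);
    if (loop[0] == '\0')
        n = snprintf(buf, size, "%s%s", in, out);
    else if (strlen(loop) == 1)
        n = snprintf(buf, size, "%s%s*%s", in, loop, out);
    else
        n = snprintf(buf, size, "%s(%s)*%s", in, loop, out);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

With the labels from the slide, bypass_label("a", "e", "c", ...) produces ae*c and bypass_label("b", "e", "d", ...) produces be*d.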
From Automaton to R.E.
[Diagram: the automaton for search(), with edges k, n, and d, is reduced step by step through the intermediate labels kn and kd+ε]
Result: (kn)*(kd+ε)
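One way to see what the expression (kn)*(kd+ε) claims is to log the field accesses of search() at run time and check the log against the pattern. The sketch below does that. The function names search_traced and matches_search_pattern are hypothetical, and the list head is passed as a parameter here, whereas the slide's search() reads it from elsewhere.

```c
#include <assert.h>
#include <stddef.h>

struct Node {
    int key;
    char data[6];
    struct Node *next;
};

/* search() instrumented to log each field access as k (key), d (data),
   or n (next). Writes a NUL-terminated trace and stops early if the
   buffer fills, so the buffer is assumed to be large enough. */
const char *search_traced(const struct Node *h, int k,
                          char *trace, size_t size) {
    size_t len = 0;
    assert(trace != NULL && size > 0);
    while (h != NULL) {
        if (len + 2 >= size)
            break;                 /* keep room for two letters and NUL */
        trace[len++] = 'k';
        if (h->key == k) {
            trace[len++] = 'd';
            trace[len] = '\0';
            return h->data;
        }
        trace[len++] = 'n';
        h = h->next;
    }
    trace[len] = '\0';
    return NULL;
}

/* Returns 1 if t matches (kn)*(kd+ε), else 0. */
int matches_search_pattern(const char *t) {
    while (t[0] == 'k' && t[1] == 'n')
        t += 2;
    if (t[0] == '\0')
        return 1;
    return t[0] == 'k' && t[1] == 'd' && t[2] == '\0';
}
```

A successful lookup ends in kd and a failed one ends after the last kn, which are the two alternatives in the expression.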
State Compare
state_compare(state s1, state s2)
  b1 ← whether ∃s'.(s' → s1, s1.dfn ≤ s'.dfn)   // 0 or 1
  b2 ← whether ∃s'.(s' → s2, s2.dfn ≤ s'.dfn)   // 0 or 1
  if b1 and not b2 then 1                        // s1 > s2
  else if not b1 and b2 then -1                  // s1 < s2
  else if b1 and b2 then compare(s2.dfn, s1.dfn) // dfn = Depth First Numbering
  else compare(s1.dfn, s2.dfn)
  end if
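The comparison above translates fairly directly into C. The sketch below makes several assumptions that the slide does not state: states keep an explicit predecessor array, compare is a three-way comparison returning -1, 0, or 1, and the names struct state, MAX_PRED, has_back_edge, and compare_int are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PRED 8   /* assumed bound on predecessors, for the sketch */

struct state {
    int dfn;                             /* depth-first number */
    const struct state *pred[MAX_PRED];  /* states s' with s' -> s */
    size_t npred;
};

/* 1 if some predecessor s' satisfies s.dfn <= s'.dfn, that is, an edge
   enters s from a state numbered no earlier than s (a self-loop or a
   back edge); 0 otherwise. */
int has_back_edge(const struct state *s) {
    for (size_t i = 0; i < s->npred; i++)
        if (s->dfn <= s->pred[i]->dfn)
            return 1;
    return 0;
}

/* Three-way integer comparison: -1, 0, or 1. */
int compare_int(int a, int b) {
    return (a > b) - (a < b);
}

/* States with such an incoming edge order after states without one.
   Ties fall back to dfn: descending when both have one, ascending
   when neither has. */
int state_compare(const struct state *s1, const struct state *s2) {
    assert(s1 != NULL && s2 != NULL);
    int b1 = has_back_edge(s1);
    int b2 = has_back_edge(s2);
    if (b1 && !b2) return 1;
    if (!b1 && b2) return -1;
    if (b1 && b2) return compare_int(s2->dfn, s1->dfn);
    return compare_int(s1->dfn, s2->dfn);
}
```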
Automata Reduction
worklist ← ∅
workhorse(state s)
  if s ≠ start state and s ≠ end state then
    for all s' ∈ s.successor do
      delete s' from worklist
    end for
    eliminate(s)
    for all s' ∈ s.successor do
      push s' into worklist
    end for
  end if
Automata Reduction
reduce()
  E ← { s ∈ S | ∃s'. s →ε s' }
  R ← { s ∈ E | ∄s'. s' → s, s.dfn ≤ s'.dfn }
  for all s ∈ R do
    workhorse(s)
  end for
  worklist ← S\R
  while worklist ≠ ∅ do
    workhorse(pop(worklist))
  end while
From Intra- to Inter-proc.
• Functions are processed in reverse topological order of the call graph
• For self-recursive function calls:
    f() {
      ... = s.a;
      if (!end) f();
      ... = s.b;
    }
  – Grammar: F → ab | aFb
  – Exact pattern: a^i b^i
  – Regular approximation: a*abb*
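The gap between the exact pattern a^i b^i and its regular approximation a*abb* can be made concrete with a small trace experiment. This is a sketch only: the names f_traced and matches_approximation are hypothetical, and the recursion depth parameter stands in for the slide's end condition.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors f() from the slide: read s.a, recurse unless done, read s.b.
   depth >= 1 is the number of nested calls. Each access is logged as
   one letter; a letter is dropped if the buffer is full. Returns the
   new trace length; the caller adds the terminating NUL. */
size_t f_traced(int depth, char *trace, size_t len, size_t size) {
    assert(depth >= 1 && trace != NULL);
    if (len + 1 < size)
        trace[len++] = 'a';                      /* ... = s.a;      */
    if (depth > 1)                               /* if (!end) f();  */
        len = f_traced(depth - 1, trace, len, size);
    if (len + 1 < size)
        trace[len++] = 'b';                      /* ... = s.b;      */
    return len;
}

/* Returns 1 if t matches a*abb*, that is, one or more a followed by
   one or more b, else 0. */
int matches_approximation(const char *t) {
    size_t i = 0, na = 0, nb = 0;
    while (t[i] == 'a') { i++; na++; }
    while (t[i] == 'b') { i++; nb++; }
    return t[i] == '\0' && na >= 1 && nb >= 1;
}
```

Every real trace of f() matches the approximation, but the approximation also accepts unbalanced strings such as aab. That is the precision given up by staying within regular expressions.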
Structure Selection
• "One structure per pool"
  – Most pools are used in a type-consistent manner
• Identify which structures are exhaustively used
  – Structure access patterns
  – Repeatedly used ones
• Structure detection in closures
Closure Detection
• Presence of closures
  – EMPTY, NORMAL, HAVE
• Example functions:
    main: ... while(..) foo(); ...
    foo:  ... s->f1; bar1(); bar2(); s->f2; ...
    bar1: ... exc. ...
    bar2: ... exc. ...
• Result: main ↦ HAVE, foo ↦ HAVE, bar1 ↦ HAVE, bar2 ↦ NORMAL
Field Affinity
• Access trace: ... o3.next o4.key o4.next o5.key o5.next o6.key ...
[Diagram: affinity graphs over the fields key, next, and data, and over the grouping {key,next} and data. Weights shown: 712440, 4267275, 2849975, 7580, 37858, 30278, 704860]
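The slide does not define how the affinity weights are computed. One simple reading, used here purely as an assumption, is to count how often one field is accessed immediately after another in the trace. The enum field and the function count_affinity are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

enum field { KEY, DATA, NEXT, NFIELDS };

/* Assumed affinity measure: affinity[x][y] counts how often field x is
   accessed immediately before field y in the trace. The caller passes
   a zero-initialized matrix. */
void count_affinity(const enum field *trace, size_t n,
                    unsigned long affinity[NFIELDS][NFIELDS]) {
    assert(trace != NULL || n == 0);
    for (size_t i = 1; i < n; i++)
        affinity[trace[i - 1]][trace[i]]++;
}
```

For the trace key next key next key data, this gives 2 for key followed by next, 2 for next followed by key, and 1 for key followed by data. Field pairs with high counts are natural candidates for adjacent placement.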
Affinity Relation Abstraction
• F is the set of fields; VAR is the set of function names
• Example functions main, foo, bar1, and bar2 access s->f1, s->f2, and s->f3; main contains while(..) foo(); and foo calls bar1(); and bar2();
• Results:
    main.s ↦ {f3}   main.e ↦ {f2}
    foo.s  ↦ {f1}   foo.e  ↦ {f2}
    bar1.s ↦ {f3}   bar1.e ↦ {f1}
    bar2.s ↦ {f1}   bar2.e ↦ {f2}
[Listing: the relations bar2.r, main.r, foo.r, and bar1.r map field pairs such as (f1,f3) and (f1,f2) to sets of pairs such as {(0,1)}, {(0,2)}, {(0,3)}, {(1,1), (0,1)}, and {(1,1), (0,2)}]