TEMPO, A Program Specializer for C, Renaud MARLET, Compose group

  1. TEMPO, A Program Specializer for C
     Renaud MARLET, Compose group, IRISA / INRIA Rennes (France)
     Dynamo '00

  2. What it is / What it does
     • Automatic compile-time and run-time specialization
     • Program and data specialization
     • Modular specialization
     • Incremental specialization
     • Real-size applications (~6,000 specialized lines)
     • Back-end partial evaluator for Java (Jspec)
     • Publicly available (~40 licenses)

  3. Some Applications of Tempo
     • Operating systems [PEPM’97, ICDCS’97]
       Sun RPC (3.7x), Chorus IPC (1.5x), BPF (4x)
     • Numerical computations [LNCS, ICCL’98, PEPM’99]
       FFT (4–12x), standard library routines
     • Computer graphics [ECOOP’99]
       Convolution filters (4x)
     • Software architectures [ASE’97]
       Selective broadcast, software layers, generic libraries, …
     • Compilers/JITs for interpreters [DSL’97, SRDS’98, ICDCS’99]
       PLAN-P (80x, 96% of C throughput), O’Caml (1.2–2.5x), …

  4. Overview
     Abstract specialization:
       C source + specialization context + behavior of external functions
         → Analysis
         → C source annotated with specialization actions
         → Compile-time specializer generator / Run-time specializer generator
     Concrete specialization:
       Compile-time specializer + context → Specialized source
       Run-time specializer + context → Specialized binary

  5. Specialization Templates (stages: S = static, D = dynamic)
     Source, with size and u[] static and v[] dynamic:
       dotprod(size, u[], v[])
       {
         res = 0;                     /* T1 */
         for (i = 0; i < size; i++)
           res += u[i] * v[i];        /* T2, holes H1 = u[i], H2 = i */
         return res;                  /* T3 */
       }
     Specialized for size = 3, u[] = {7, 4, 6}:
       dotprod_size_u(v[])
       {
         res = 0;                     /* T1 */
         res += 7 * v[0];             /* T2[7, 0] */
         res += 4 * v[1];             /* T2[4, 1] */
         res += 6 * v[2];             /* T2[6, 2] */
         return res;                  /* T3 */
       }
     i.e. dotprod_size_u(v[]) = T1 T2[7, 0] T2[4, 1] T2[6, 2] T3
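
As a hand-written sketch (not Tempo output), the two functions on this slide can be written as compilable C. The `int` types and `const` qualifiers are assumptions, since the slide omits declarations. Each statement of the specialized version is one instance of a template whose holes have been filled with static values.

```c
/* Generic version: size and u[] are static inputs, v[] is dynamic. */
int dotprod(int size, const int u[], const int v[])
{
    int res = 0;                  /* template T1 */
    for (int i = 0; i < size; i++)
        res += u[i] * v[i];       /* template T2, holes H1 = u[i], H2 = i */
    return res;                   /* template T3 */
}

/* Specialized for size = 3, u[] = {7, 4, 6}: the loop is unrolled and
   each T2 instance has its holes filled with static values. */
int dotprod_size_u(const int v[])
{
    int res = 0;                  /* T1 */
    res += 7 * v[0];              /* T2[7, 0] */
    res += 4 * v[1];              /* T2[4, 1] */
    res += 6 * v[2];              /* T2[6, 2] */
    return res;                   /* T3 */
}
```

For example, with v[] = {1, 2, 3}, both dotprod(3, u, v) with u[] = {7, 4, 6} and dotprod_size_u(v) return 33.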

  6. Run-Time Specializer: dedicated code generation instructions (stages: S = static, D = dynamic)
     Source: the same dotprod(size, u[], v[]), split into templates T1, T2 (holes H1, H2) and T3.
     Generated run-time specializer, taking only the static inputs:
       dotprod_spec(size, u[])
       {
         buf = alloc();
         copy_temp(buf, T1);
         for (i = 0; i < size; i++) {
           copy_temp(buf, T2);
           fill_hole(buf, H1, u[i]);
           fill_hole(buf, H2, i);
         }
         copy_temp(buf, T3);
         return buf;
       }
     Resulting buf (size = 3): T1 T2[u[0], 0] T2[u[1], 1] T2[u[2], 2] T3
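
The generated specializer can be simulated portably. In this sketch, which is an assumption rather than Tempo's implementation, a template is a small record instead of a block of machine code, and a tiny interpreter `run` stands in for executing the buffer. `struct instr`, `struct buffer`, `run` and the capacity `BUF_CAP` are hypothetical names; `copy_temp` and `fill_hole` mirror the instructions on the slide.

```c
#include <stdlib.h>

/* Hypothetical identifiers for the compiled templates and their holes. */
enum template_id { T1, T2, T3 };
enum hole_id { H1, H2 };

/* One copied template: which fragment it is and the values in its holes. */
struct instr {
    enum template_id t;
    int hole[2];                  /* H1 and H2, used only by T2 */
};

/* Code buffer; the fixed capacity is an assumption of this sketch. */
#define BUF_CAP 64
struct buffer {
    struct instr code[BUF_CAP];
    int len;
};

/* copy_temp: append one template (stands in for copying machine code). */
static void copy_temp(struct buffer *buf, enum template_id t)
{
    struct instr *in = &buf->code[buf->len++];
    in->t = t;
    in->hole[H1] = 0;
    in->hole[H2] = 0;
}

/* fill_hole: patch a hole of the most recently copied template. */
static void fill_hole(struct buffer *buf, enum hole_id h, int value)
{
    buf->code[buf->len - 1].hole[h] = value;
}

/* Dedicated run-time specializer: only the static inputs are parameters. */
struct buffer *dotprod_spec(int size, const int u[])
{
    if (size < 0 || size > BUF_CAP - 2)   /* room for T1, size x T2, T3 */
        return NULL;
    struct buffer *buf = calloc(1, sizeof *buf);
    if (buf == NULL)
        return NULL;
    copy_temp(buf, T1);
    for (int i = 0; i < size; i++) {
        copy_temp(buf, T2);
        fill_hole(buf, H1, u[i]);
        fill_hole(buf, H2, i);
    }
    copy_temp(buf, T3);
    return buf;
}

/* Stand-in for jumping to the generated code: interpret the buffer on v[]. */
int run(const struct buffer *buf, const int v[])
{
    int res = 0;
    for (int k = 0; k < buf->len; k++) {
        const struct instr *in = &buf->code[k];
        switch (in->t) {
        case T1: res = 0; break;
        case T2: res += in->hole[H1] * v[in->hole[H2]]; break;
        case T3: return res;
        }
    }
    return res;
}
```

Calling dotprod_spec(3, u) with u[] = {7, 4, 6} builds the sequence T1 T2[7, 0] T2[4, 1] T2[6, 2] T3; the specialization loop runs once, and the buffer can then be applied to many dynamic v[] arrays.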

  7. Tentative Balance Sheet for Tempo (1994–1999)
     Pros:
       • Automation, safety
       • Non-intrusiveness
       • Accurate analyses
       • Predictability
       • Low break-even point
       • Easy engineering: AST, compiler re-use
       • Realistic applications
       • Framework for compile-time and run-time (CT/RT) specialization
     Cons:
       • Complex declarations
       • Slicing & re-plugging
       • Fixed precision
       • A posteriori control
       • Code less optimized
       • Limitations: binding-time (BT) precision, optimisation
       • Prototype
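
The "low break-even point" can be made concrete with a simple cost model, which is an assumption of this note rather than a formula from the slides. Let t_gen be the time to generate the specialized code at run time, and t_orig and t_spec the execution times of one call to the original and to the specialized code. Run-time specialization pays off after n calls when:

```latex
n \, t_{\text{orig}} > t_{\text{gen}} + n \, t_{\text{spec}}
\quad\Longleftrightarrow\quad
n > \frac{t_{\text{gen}}}{t_{\text{orig}} - t_{\text{spec}}}
\qquad \text{(assuming } t_{\text{orig}} > t_{\text{spec}}\text{)}
```

A low break-even point means this threshold is small: cheap code generation (copying templates and filling holes) is amortized after only a few calls.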

  8. Precision of the Analyses [PEPM’97, SAS’97, TCS’00]
     Property            Alias analysis       Binding-time analysis
     Interprocedural     yes                  yes
     Flow-sensitive      yes                  yes
     Context-sensitive   on-going work        yes
     Return-sensitive    N.A.                 yes
     Use-sensitive       N.A.                 yes
     Field-sensitive     per struct type      per struct type
                         (or instance)        (or instance)
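
Flow sensitivity, one of the properties listed above, lets a variable be static at one program point and dynamic at a later one. The hand-written pair below illustrates the idea; `f` and `f_s5` are hypothetical functions, not Tempo output, and `s` is assumed static while `d` is dynamic.

```c
/* Generic function: s is static, d is dynamic. */
int f(int s, int d)
{
    int x = s * 2;    /* x is static here: it depends only on s */
    int r = x + 1;    /* static: computable at specialization time */
    x = d;            /* from this point on, x is dynamic */
    return r + x;     /* dynamic */
}

/* What a flow-sensitive specializer could produce for s = 5: r is folded
   to 11 and only the dynamic part remains. A flow-insensitive analysis
   would treat x as dynamic everywhere, so r would stay in residual code. */
int f_s5(int d)
{
    int x = d;
    return 11 + x;
}
```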

  9. Challenges?
     • Detecting specialization opportunities:
       • Existing code already hand-optimized
       • Little hope

  10. Challenges
     • Architecting software for specialization
       • Development methodology
       • More quantitative prediction
     • Declaring specialization
       • More automation: no slicing and plugging (guards)
       • Less inference, more checking: downgrade Tempo
     • Make the technology usable by humans
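
The "slicing and plugging (guards)" item refers to re-inserting specialized code into the original program. One common way to do this, sketched here as an assumption rather than Tempo's mechanism, is a guard that checks the static inputs against the specialization context and falls back to the generic function otherwise. `dotprod_guarded` is a hypothetical name; the context (size = 3, u[] = {7, 4, 6}) is the one from the template slide.

```c
#include <string.h>

/* Generic function. */
int dotprod(int size, const int u[], const int v[])
{
    int res = 0;
    for (int i = 0; i < size; i++)
        res += u[i] * v[i];
    return res;
}

/* Specialized for size = 3, u[] = {7, 4, 6}. */
static int dotprod_size_u(const int v[])
{
    return 7 * v[0] + 4 * v[1] + 6 * v[2];
}

/* Guarded entry point: use the specialized code only when the static
   inputs match the specialization context, else run the generic code. */
int dotprod_guarded(int size, const int u[], const int v[])
{
    static const int spec_u[3] = {7, 4, 6};
    if (size == 3 && memcmp(u, spec_u, sizeof spec_u) == 0)
        return dotprod_size_u(v);
    return dotprod(size, u, v);
}
```

The guard itself costs a few comparisons per call, which is part of why declaring specialization contexts explicitly, rather than slicing and re-plugging by hand, is listed as a challenge.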

  11. Extra slides

  12. Making Templates (stages: S = static, D = dynamic)
     Template source derived from dotprod(size, u[], v[]):
       /* T1_start: */
       dotprod(v[])
       {
         res = 0;
       T1_end:
         while (dummy) {
       T2_start:
           res += &h1 * v[&h2];     /* holes H1, H2: addresses of globals h1, h2 */
       T2_end:
         }
       T3_start:
         return res;
       }
       /* T3_end: */
     Intermediate versions keep the original for loop, with the labels T1_end, T2_start, T2_end and T3_start around its body.
     Requirements:
       • Re-use existing compiler
       • Symbol table
       • Original control flow
       • Prevent inter-template code motion

  13. Generating the Run-Time Specializer
     Inputs: specialization actions; template start and end marks are labels; holes are pointers to global variables.
       • Templates (.c) → tcc, gcc → Templates (.o)
       • Templates (.o) → objdump, bfd → templates description: symbol table, template offsets, holes, inter-template jumps
       • Specialization actions + templates description → Code generator (.c) → gcc → Code generator (.o)
       • Code generator (.o) + Templates (.o) → ld → dedicated run-time specializer (.o)
       • Additional optimisations: peephole optimisations, inlining (register usage)
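
At run time, the dedicated specializer reduces to copying template bytes and patching holes at offsets recorded in the templates description. The sketch below assumes a simplified description (a hypothetical `struct template_desc` with a byte pointer, a size and hole offsets) and 32-bit holes written in host byte order. Real hole encodings are instruction-set specific, and extracting the offsets with objdump or bfd is not shown.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical description of one compiled template, as could be
   extracted from the object file: its bytes and its hole offsets. */
struct template_desc {
    const unsigned char *code;   /* template bytes (machine code in practice) */
    size_t size;                 /* template length in bytes */
    size_t hole_offset[2];       /* byte offsets of the holes in the template */
    int nholes;                  /* number of holes actually used */
};

/* copy_temp: append the template bytes to the output buffer and return
   the offset at which the template was placed. The caller must ensure
   the buffer has room for t->size more bytes. */
size_t copy_temp(unsigned char *buf, size_t *pos, const struct template_desc *t)
{
    size_t start = *pos;
    memcpy(buf + start, t->code, t->size);
    *pos += t->size;
    return start;
}

/* fill_hole: overwrite a 32-bit hole of a copied template with a value. */
void fill_hole(unsigned char *buf, size_t tmpl_start,
               const struct template_desc *t, int hole, int32_t value)
{
    memcpy(buf + tmpl_start + t->hole_offset[hole], &value, sizeof value);
}
```

Copying with memcpy keeps the specializer simple; as the implementation slide notes, this copying is the main run-time cost.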

  14. Run-Time Specialization: Implementation
     • Compilers: gcc, lcc
     • Machines: Sparc, Pentium
     • Main run-time cost: copying instructions
     • Few inter-template optimizations
     • Run-time inlining

  15. Run-Time Specialization: Experimental Results
     [Figure: normalized execution time (0 to 1) of the Original, RT-specialized and CT-specialized versions of five applications: Romberg integration, cubic spline, Chebyshev approximation, dithering and FFT. CT-specialized code compiled with optimizations ⇒ "optimal".]
