Efficient Program Compilation through Machine Learning Techniques



  1. Efficient Program Compilation through Machine Learning Techniques
Gennady Pekhimenko, IBM Canada
Angela Demke Brown, University of Toronto

  2. Motivation
My cool compiler at -O2 runs a pipeline of transformations (Unroll, Inline, Peephole, DCE, ... about 100 optimizations) and produces an executable in a few seconds. But what to do if the executable is slow? Replace -O2 with -O5: compilation now takes 1-10 minutes, but yields a fast executable.

  3. Motivation (2)
Compiling our cool operating system at -O2 takes 1 hour, but the executable is too slow. Recompiling at -O5 takes 20 hours, and we do not have that much time. Why did this happen?

  4. Basic Idea
Do we need all of these ~100 optimizations (Unroll, ...) for every function? Probably not. Compiler writers can typically solve this problem, but how?
1. Describe every function
2. Classify functions based on the description
3. Apply only certain optimizations to each class
Machine learning is good at solving this kind of problem.

  5. Overview  Motivation  System Overview  Experiments and Results  Related Work  Conclusions  Future Work

  6. Initial Experiment 3X difference on average

  7. Initial Experiment (2)
[Chart: SPEC2000 execution time (secs) at -O3 vs. -qhot -O3 across the benchmarks bzip2, applu, crafty, eon, gap, gzip, mcf, vortex, vpr, ammp, art, equake, facerec, fma3d, galgel, lucas, mgrid, sixtrack, swim, wupwise, mesa.]

  8. Our System
Offline — Gather training data: prepare (extract features, modify heuristic values, choose transformations), compile (find hot methods), measure run time, and record the best settings for each feature vector. Learn: train a logistic regression classifier on this data.
Online — Deploy: the classifier runs inside the TPO/XL compiler, and its classification sets heuristic values and parameters.

  9. Data Preparation
Three key elements:
 Feature extraction — total # of insts, loop nest level, # and % of loads, stores, and branches, loop characteristics, float and integer # and %
 Heuristic values modification — the existing XL compiler was missing this functionality; an extension was made to the existing Heuristic Context approach
 Target set of transformations — Unroll, Wandwaving, If-conversion, Unswitching, CSE, Index Splitting, ...
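The feature extraction step above can be sketched as follows. This is a minimal illustration in plain Python, assuming a toy `Function` record; the real features are computed inside the TPO/XL compiler, and the exact feature set here is only the subset named on the slide.

```python
from dataclasses import dataclass

# Hypothetical IR representation for illustration only; each instruction is a
# tuple whose first element is its kind ("load", "store", "branch", ...).
@dataclass
class Function:
    name: str
    insts: list
    max_loop_nest: int

def extract_features(fn: Function) -> list:
    """Build a fixed-length feature vector for one function: total instruction
    count, loop nest depth, and absolute/relative counts of loads, stores,
    and branches."""
    total = len(fn.insts)
    counts = {"load": 0, "store": 0, "branch": 0}
    for kind, *_ in fn.insts:
        if kind in counts:
            counts[kind] += 1
    vec = [total, fn.max_loop_nest]
    for kind in ("load", "store", "branch"):
        vec.append(counts[kind])
        vec.append(counts[kind] / total if total else 0.0)
    return vec
```

Every function then maps to one such vector, which later serves as the classifier input.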

  10. Gather Training Data
 Try to “cut” transformations backwards (from last to first), e.g. Late Unroll, Wandwaving, Inlining
 If run time is not worse than before, the transformation can be skipped
 Otherwise we keep it
 We do this for every hot function of every test
The main benefit is linear complexity.
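The backward "cutting" procedure above can be sketched as a greedy elimination loop. This is an assumed reconstruction, not the paper's code; `measure_runtime` stands in for the real compile-and-time step.

```python
def prune_transformations(transforms, measure_runtime, tolerance=0.0):
    """Greedy backward elimination: walk the transformation list from last to
    first, tentatively drop each one, and keep the drop only if run time does
    not get worse. Cost is linear: one measurement per transformation."""
    active = list(transforms)
    best = measure_runtime(active)
    for t in reversed(transforms):
        trial = [x for x in active if x != t]
        rt = measure_runtime(trial)
        if rt <= best + tolerance:  # not worse: the transformation can be skipped
            active, best = trial, rt
        # otherwise keep t
    return active
```

With n transformations this needs only n + 1 runs, versus 2^n for exhaustive search over subsets.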

  11. Learn with Logistic Regression
Input: function descriptions and the best heuristic values found for them
Candidate learners: Logistic Regression, Neural Networks, Genetic Programming
Output: a classifier that produces heuristic values for the compiler (.hpredict files)
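As a rough sketch of the chosen learner, the snippet below trains one binary logistic-regression predictor (e.g. "apply this transformation to this function?") with plain gradient descent. It is a hand-rolled stand-in, assuming nothing about the paper's actual training infrastructure or .hpredict encoding.

```python
import math

def train_logistic(X, y, lr=0.1, epochs=500):
    """Minimal logistic-regression trainer for one binary decision.
    X: list of feature vectors, y: list of 0/1 labels."""
    n = len(X[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))   # sigmoid
            g = p - yi                       # gradient of the log loss w.r.t. z
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z)) >= 0.5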

  12. Deployment
Online phase, for every function:
 Calculate the feature vector
 Compute the prediction
 Use this prediction as the heuristic context
The overhead is negligible.
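The online steps above can be sketched as a single lookup over per-transformation models. The names and the `(weights, bias)` encoding are illustrative assumptions, not the compiler's actual .hpredict format.

```python
import math

def choose_transformations(features, models):
    """Deployment step: given a function's feature vector and a dict of
    per-transformation logistic models (w, b) learned offline, decide which
    transformations to enable for this function."""
    enabled = []
    for name, (w, b) in models.items():
        z = sum(wj * xj for wj, xj in zip(w, features)) + b
        if 1.0 / (1.0 + math.exp(-z)) >= 0.5:  # predicted beneficial
            enabled.append(name)
    return enabled
```

The cost per function is a handful of dot products, which is why the slide can claim negligible overhead relative to the optimizations themselves.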

  13. Overview  Motivation  System Overview  Experiments and Results  Related Work  Conclusions  Future Work

  14. Experiments
Benchmarks: SPEC2000, plus others from IBM customers
Platform: IBM server, 4 x Power5 1.9 GHz, 32 GB RAM, running AIX 5.3

  15. Results: compilation time
[Chart: normalized compilation time for Oracle and Classifier across the SPEC2000 benchmarks; 2x average speedup (GeoMean).]

  16. Results: execution time
[Chart: execution time (secs) for Baseline, Oracle, and Classifier across the SPEC2000 benchmarks.]

  17. New benchmarks: compilation time
[Chart: normalized compilation time with the Classifier on the new benchmarks.]

  18. New benchmarks: execution time
[Chart: execution time (secs) for Baseline and Classifier on apsi, parser, twolf, dmo, and argonne; 4% speedup.]

  19. Overview  Motivation  System Overview  Experiments and Results  Related Work  Conclusions  Future Work

  20. Related Work  Iterative Compilation  Pan and Eigenmann  Agakov, et al.  Single Heuristic Tuning  Calder, et al.  Stephenson, et al.  Multiple Heuristic Tuning  Cavazos, et al.  MILEPOST GCC

  21. Conclusions and Future Work  2x average compile time decrease  Future work  Execution time improvement  -O5 level  Performance Counters for better method description  Other benefits  Heuristic Context Infrastructure  Bug Finding

  22. Thank you  Raul Silvera, Arie Tal, Greg Steffan, Mathew Zaleski  Questions?
