Impact of different compiler options on energy consumption


  1. Impact of different compiler options on energy consumption ● James Pallister, University of Bristol / Embecosm ● Simon Hollis, University of Bristol ● Jeremy Bennett, Embecosm

  2. Motivation ● Compiler optimizations are claimed to have a large impact on software: – Performance – Energy ● No extensive study prior to this considering: – Different benchmarks – Many individual optimizations – Different platforms ● This work looks at the effect of many different optimizations across 10 benchmarks and 5 platforms. ● 238 optimization passes covered by 150 flags – Huge number of combinations

  3. This Talk ● This talk will cover: – Importance of benchmarks – Platforms – How to explore 2^150 combinations of options – Correlation between time and energy – How to predict the effect of the optimizations

  4. Importance of Benchmarks ● One benchmark can't trigger all optimizations ● Perform differently on different platforms ● Need a range of benchmarks ● Broad categories to be considered for a benchmark: – Integer – Floating point – Branching – Memory

  5. Existing Benchmark Suites Considered ● MiBench ● WCET ● DSPstone ● ParMiBench ● OpenBench ● LINPACK ● Livermore Fortran Kernels ● Dhry/Whet-stone. Issues noted: ● Require embedded Linux ● Targeted at higher-end systems ● Multithreaded benchmarks typically for HPC ● Don't necessarily test all corners of the platform

  6. Our Benchmark List

  7. Choosing the Platforms ● Range of different features in the platforms chosen – Pipeline depth – Multi- vs single-core – FPU available? – Caching – On-chip vs off-chip memory

  8. Platforms Chosen ● ARM Cortex-M0: small memory; simple pipeline ● ARM Cortex-M3: small memory; simple pipeline with forwarding logic, etc. ● ARM Cortex-A8: large memory; complex superscalar pipeline; SIMD/FPU ● XMOS L1: small memory; simple pipeline; multiple threads ● Adapteva Epiphany: on-chip and off-chip memory; simple superscalar pipeline; FPU; 16 cores

  9. Experimental Methodology ● Compiler optimizations have many non-linear interactions ● 238 optimization passes combined into 150 different options (GCC) ● 82 compiler options enabled by O3 ● How to test all of these, while accounting for the interactions between optimizations? ● Fractional Factorial Designs
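
With 150 on/off options there are 2^150 (roughly 1.4 × 10^45) combinations, so exhaustive testing is out of reach. The sketch below shows the general idea of a two-level fractional factorial design on a toy scale: four flags tested in 8 runs instead of 16, with the fourth flag's setting derived from the other three. The flag names `A` to `D`, the generator `D = ABC`, and the `fake_energy` model are all illustrative assumptions; the talk does not state which design or generators were actually used.

```python
from itertools import product

def fractional_factorial(base_factors, generators):
    """Build a two-level fractional factorial design.

    base_factors: names of factors that take every on/off combination.
    generators:   maps each extra factor to the base factors whose
                  product defines it (e.g. {"D": ("A", "B", "C")}).
    Levels are coded -1 (flag off) and +1 (flag on).
    """
    runs = []
    for levels in product((-1, 1), repeat=len(base_factors)):
        run = dict(zip(base_factors, levels))
        for extra, parents in generators.items():
            value = 1
            for parent in parents:
                value *= run[parent]
            run[extra] = value
        runs.append(run)
    return runs

def main_effect(runs, responses, factor):
    """Mean response with the factor on minus mean response with it off."""
    on = [r for run, r in zip(runs, responses) if run[factor] == 1]
    off = [r for run, r in zip(runs, responses) if run[factor] == -1]
    return sum(on) / len(on) - sum(off) / len(off)

# A 2^(4-1) design: 8 runs instead of 16 for four hypothetical flags.
design = fractional_factorial(("A", "B", "C"), {"D": ("A", "B", "C")})

# Placeholder response: a made-up energy model, NOT measured data.
def fake_energy(run):
    return 100.0 - 5.0 * run["A"] + 2.0 * run["D"]

energies = [fake_energy(run) for run in design]
for name in ("A", "B", "C", "D"):
    print(name, main_effect(design, energies, name))
```

The price of the smaller run count is aliasing: with `D = ABC`, the main effect of `D` cannot be separated from the three-way interaction of `A`, `B` and `C`, which is one reason higher-order interactions are hard to model from such data.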

  10. Hardware Measurements ● Current, voltage and power monitor ● 10 kSamples/s ● Low noise ● XMOS board to control and timestamp measurements ● Integrate to get energy consumption
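
The "integrate to get energy" step can be sketched as a numerical integral over the sampled power trace. This is a minimal sketch assuming evenly spaced power samples in watts at the 10 kSamples/s rate from the slide; the function name `energy_joules`, the choice of the trapezoidal rule, and the constant test trace are assumptions, not details from the talk.

```python
def energy_joules(power_watts, sample_rate_hz=10_000):
    """Integrate evenly spaced power samples with the trapezoidal rule."""
    if len(power_watts) < 2:
        return 0.0
    dt = 1.0 / sample_rate_hz
    interior = sum(power_watts[1:-1])
    # End points count half, interior points count fully.
    return dt * (0.5 * (power_watts[0] + power_watts[-1]) + interior)

# Hypothetical trace: a constant 0.5 W for 1 second (10,001 samples).
trace = [0.5] * 10_001
print(energy_joules(trace))  # about 0.5 J
```

If the monitor reports current and voltage separately, each power sample would first be formed as their product before integrating.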

  11. Results ● Energy consumption ≈ Execution time – Generalization, not true in every case ● Optimization unpredictability ● No optimization is universally good across benchmarks and platforms

  12. Overview (figures: FDCT, Cortex-M0; FDCT, Cortex-A8)

  15. Overview

  16. Time ≈ Energy (figure: O1 flags, Blowfish, Cortex-M0)

  17. When Time ≠ Energy ● Complex pipeline ● -ftree-vectorize – NEON SIMD unit – Much lower power (figure: O3 flags, 2DFIR, Cortex-A8)
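
One way to see why a change in power breaks the time/energy link is the standard definition of energy as the integral of power. The relation below is textbook physics rather than something stated on the slides; the symbols $\bar{P}$ (average power), $T$ (execution time) and the subscripts 1 and 2 (two builds of the same program) are notation introduced here for illustration.

```latex
E = \int_0^T P(t)\,dt = \bar{P}\,T
\qquad\Longrightarrow\qquad
\frac{E_2}{E_1} = \frac{\bar{P}_2}{\bar{P}_1}\cdot\frac{T_2}{T_1}
```

When two builds draw about the same average power, the energy ratio equals the time ratio, which is the Time ≈ Energy case. A flag that changes average power noticeably moves the energy ratio away from the time ratio.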

  18. Conclusion: Mostly, Time ≈ Energy ● Highly correlated ● Especially so for 'simple' pipelines ● Little scope for stalling or superscalar execution ● Complex pipelines: – Still a correlation – But more variability – SIMD, superscalar execution ● To get optimal energy consumption we need better than “go fast”
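
How strongly time and energy move together can be quantified with a Pearson correlation coefficient over the per-configuration measurements. The sketch below is one possible way to do that; the talk does not say which statistic was used, and the `times_s` and `energies_j` values are invented purely to make the example runnable.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Made-up values for illustration only, NOT data from the study:
# one (time, energy) pair per hypothetical flag configuration.
times_s = [1.00, 0.90, 0.80, 0.75, 0.70]
energies_j = [0.50, 0.46, 0.41, 0.39, 0.35]

r = pearson(times_s, energies_j)
print(f"r = {r:.3f}")
```

A value close to 1 corresponds to the 'simple pipeline' case on the slide; more scatter, as described for complex pipelines, would show up as a lower coefficient.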

  19. Optimization Unpredictability ● Pairs of optimizations on top of O0 ● Possibly higher order interactions occurring? (figure: O1 flags, Cubic, Cortex-M0)
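
A pairwise interaction can be expressed as the gap between the combined effect of two flags and the sum of their individual effects over the baseline. The helper below is a minimal sketch of that definition; the function name `interaction` and the energy figures are hypothetical, not results from the study.

```python
def interaction(e_none, e_a, e_b, e_both):
    """Two-flag interaction: how far the combined change departs from
    the sum of the two individual changes (0 means purely additive)."""
    combined_change = e_both - e_none
    additive_change = (e_a - e_none) + (e_b - e_none)
    return combined_change - additive_change

# Made-up energies in joules, for illustration only:
# baseline, flag A alone, flag B alone, both flags together.
print(interaction(e_none=10.0, e_a=9.0, e_b=9.5, e_both=9.4))
```

Here each flag helps on its own, but together they save less than the sum of their separate savings, so the interaction term is positive. Terms like this for pairs, and analogous terms for triples and beyond, are what make the combined effect hard to predict.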

  20. Conclusion: Which optimization to choose? For the general case, this question can't be answered ● Unpredictable interactions ● Many non-linear effects between optimizations? ● Evidence of higher order interactions ● Not enough data recorded in the fractional factorial design to model

  21. What does this mean? For the Compiler Writer ● Current optimizations targeted for performance ● Few (if any) optimizations in current compilers designed to reduce energy consumption ● Current optimization levels (O1, O2, etc.) are a good balance between compile time and performance/energy ● Never completely optimal ● Machine learning – MILEPOST – Genetic algorithms

  22. Conclusion ● Time ≈ Energy – True for simple pipelines – Mostly true for complex pipelines – Good approximation ● Optimization unpredictability – Difficult to model the interactions between optimizations

  23. Questions? ● jp@cs.bris.ac.uk ● simon@cs.bris.ac.uk ● jeremy.bennett@embecosm.com ● All data at: www.jpallister.com/wiki

  24. The Best Three Optimizations for Energy

  25. Conclusion: Optimizations are common across architectures... Sometimes ● A few consistently good options for Epiphany – Simpler instruction set – Newer compiler – Many more registers than ARM ● Common options across all the ARM platforms for a particular benchmark
