Impact of different compiler options on energy consumption
James Pallister, University of Bristol / Embecosm
Simon Hollis, University of Bristol
Jeremy Bennett, Embecosm
Motivation
● Compiler optimizations are claimed to have a large impact on software:
  – Performance
  – Energy
● No extensive prior study has considered:
  – Different benchmarks
  – Many individual optimizations
  – Different platforms
● This work looks at the effect of many different optimizations across 10 benchmarks and 5 platforms.
● 238 optimization passes covered by 150 flags
  – Huge number of combinations
This Talk
● This talk will cover:
  – Importance of benchmarks
  – Platforms
  – How to explore 2^150 combinations of options
  – Correlation between time and energy
  – How to predict the effect of the optimizations
Importance of Benchmarks
● One benchmark can't trigger all optimizations
● Perform differently on different platforms
● Need a range of benchmarks
● Broad categories to be considered for a benchmark:
  – Integer
  – Floating point
  – Branching
  – Memory
Existing Benchmark Suites Considered
● Suites:
  – MiBench
  – WCET
  – DSPstone
  – ParMiBench
  – OpenBench
  – LINPACK
  – Livermore Fortran Kernels
  – Dhry/Whet-stone
● Require embedded Linux
● Targeted at higher-end systems
● Multithreaded benchmarks typically for HPC
● Don't necessarily test all corners of the platform
Our Benchmark List
Choosing the Platforms
● Range of different features in the platforms chosen
  – Pipeline depth
  – Multi- vs single-core
  – FPU available?
  – Caching
  – On-chip vs off-chip memory
Platforms Chosen
● ARM Cortex-M0: small memory; simple pipeline
● ARM Cortex-M3: small memory; simple pipeline with forwarding logic, etc.
● ARM Cortex-A8: large memory; complex superscalar pipeline; SIMD/FPU
● XMOS L1: small memory; simple pipeline; multiple threads
● Adapteva Epiphany: on-chip and off-chip memory; simple superscalar pipeline; FPU; 16 cores
Experimental Methodology
● Compiler optimizations have many non-linear interactions
● 238 optimization passes combined into 150 different options (GCC)
● 82 compiler options enabled by O3
● How to test all of these, while accounting for the interactions between optimizations?
● Fractional Factorial Designs
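The slides name fractional factorial designs without giving the construction used in the study. As a minimal sketch, assuming the textbook two-level 2^(k-p) construction (not necessarily the exact design from this work), the idea is to run a full factorial over a few base factors and define every extra factor as a product of base-factor levels. The function names and the flag labels A to G are hypothetical.

```python
from itertools import product


def fractional_factorial(base_names, generators):
    """Build a two-level fractional factorial design.

    base_names: factors that receive a full factorial
                (2 ** len(base_names) runs in total).
    generators: dict mapping each extra factor to the base factors whose
                product defines its level, e.g. {"D": ("A", "B")}.
    Levels are +1 (flag enabled) and -1 (flag disabled).
    """
    names = list(base_names) + list(generators)
    runs = []
    for levels in product((-1, 1), repeat=len(base_names)):
        row = dict(zip(base_names, levels))
        for extra, parents in generators.items():
            value = 1
            for parent in parents:
                value *= row[parent]
            row[extra] = value
        runs.append([row[name] for name in names])
    return names, runs


def main_effect(runs, responses, column):
    """Mean response with the factor on minus mean response with it off."""
    on = [y for r, y in zip(runs, responses) if r[column] == 1]
    off = [y for r, y in zip(runs, responses) if r[column] == -1]
    return sum(on) / len(on) - sum(off) / len(off)


# Hypothetical example: 7 flags covered in 8 runs instead of 2 ** 7 = 128.
names, runs = fractional_factorial(
    ["A", "B", "C"],
    {"D": ("A", "B"), "E": ("A", "C"), "F": ("B", "C"), "G": ("A", "B", "C")},
)
```

Each row of `runs` is one compiler invocation (which flags are on or off), and `main_effect` estimates one flag's average effect on a measured quantity such as energy. The price of so few runs is aliasing: in this small design each main effect is confounded with some two-flag interactions, which is why larger designs are needed to separate them.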
Hardware Measurements
● Current, voltage and power monitor
● 10 kSamples/s
● Low noise
● XMOS board to control and timestamp measurements
● Integrate to get energy consumption
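The "integrate to get energy" step can be sketched as follows, assuming evenly spaced voltage and current samples at the 10 kSamples/s rate from the slide and a trapezoidal rule (the slides do not say which integration rule was used). The function name and argument layout are hypothetical.

```python
def energy_joules(voltage, current, sample_rate_hz=10_000):
    """Integrate instantaneous power (V * I) over time to get energy in joules.

    voltage, current: equally long sequences of samples in volts and amperes,
                      taken at a fixed rate of sample_rate_hz.
    """
    if len(voltage) != len(current):
        raise ValueError("voltage and current traces must have equal length")
    dt = 1.0 / sample_rate_hz
    power = [v * i for v, i in zip(voltage, current)]
    # Trapezoidal rule: average each pair of adjacent power samples and
    # multiply by the sample period.
    return sum((p0 + p1) / 2.0 * dt for p0, p1 in zip(power, power[1:]))
```

For example, a constant 3.0 V and 0.1 A trace spanning one second (10001 samples) integrates to about 0.3 J.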
Results
● Energy consumption ≈ Execution time
  – Generalization, not true in every case
● Optimization unpredictability
● No optimization is universally good across benchmarks and platforms
Overview
[Figure: FDCT, Cortex-M0; FDCT, Cortex-A8]
Overview
Time ≈ Energy
[Figure: O1 flags, Blowfish, Cortex-M0]
When Time ≠ Energy
● Complex pipeline
● -ftree-vectorize
  – NEON SIMD unit
  – Much lower power
[Figure: O3 flags, 2DFIR, Cortex-A8]
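Why a change in power breaks the time ≈ energy rule can be made explicit with the standard relation between energy, average power and run time. The symbols below are introduced here for illustration and do not appear in the slides: subscript 1 is the unoptimized build, subscript 2 the optimized one.

```latex
E = \int_0^{T} P(t)\,dt = \bar{P}\,T
\qquad\Longrightarrow\qquad
\frac{E_2}{E_1} = \frac{\bar{P}_2}{\bar{P}_1}\cdot\frac{T_2}{T_1}
```

When an optimization leaves average power unchanged, the energy ratio equals the time ratio. When it also changes average power, as the slide reports for -ftree-vectorize with the NEON unit, the two ratios diverge.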
Conclusion: Mostly, Time ≈ Energy
● Highly correlated
● Especially so for 'simple' pipelines
● Little scope for stalling or superscalar execution
● Complex pipelines:
  – Still a correlation
  – But more variability
  – SIMD, superscalar execution
● To minimize energy consumption we need to do better than “go fast”
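The strength of the time and energy relationship can be quantified with a correlation coefficient. The slides do not say which statistic was used; the sketch below assumes the Pearson coefficient, computed over paired time and energy measurements from a set of builds. The function name is hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)
```

Called as `pearson(times, energies)`, a value close to 1 corresponds to the "simple pipeline" case, where making the program faster reduces energy almost proportionally. Lower values indicate the extra variability described for complex pipelines.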
Optimization Unpredictability
● Pairs of optimizations on top of O0
● Possibly higher order interactions occurring?
[Figure: O1 flags, Cubic, Cortex-M0]
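One way to examine pairs of optimizations is to compare the measured effect of enabling both against the sum of their individual effects. This is a minimal sketch of that two-factor interaction contrast; it is an assumption about how such an analysis could be done, not the method from the talk. `measure` is a hypothetical callable that builds and runs the benchmark with the given set of flags and returns energy (or time).

```python
from itertools import combinations


def pairwise_interactions(measure, flags):
    """Estimate the interaction of every pair of flags.

    measure: callable taking a frozenset of enabled flags and returning a
             measured quantity such as energy.
    Returns a dict mapping (flag_a, flag_b) to the amount by which the
    combined effect departs from the sum of the two individual effects.
    Zero means the two flags behave additively.
    """
    base = measure(frozenset())
    single = {flag: measure(frozenset([flag])) for flag in flags}
    result = {}
    for a, b in combinations(flags, 2):
        both = measure(frozenset([a, b]))
        additive = (single[a] - base) + (single[b] - base)
        result[(a, b)] = (both - base) - additive
    return result
```

A non-zero entry means the benefit of one flag depends on whether the other is enabled, which is the kind of unpredictability the slide describes. Interactions among three or more flags are not captured by this pairwise contrast.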
Conclusion: Which optimization to choose?
For the general case, this question can't be answered
● Unpredictable interactions
● Many non-linear effects between optimizations?
● Evidence of higher order interactions
● Not enough data recorded in the fractional factorial design to model them
What does this mean?
For the Compiler Writer
● Current optimizations targeted for performance
● Few (if any) optimizations in current compilers designed to reduce energy consumption
● Current optimization levels (O1, O2, etc.) are a good balance between compile time and performance/energy.
● Never completely optimal
● Machine learning
  – MILEPOST
  – Genetic algorithms
Conclusion
● Time ≈ Energy
  – True for simple pipelines
  – Mostly true for complex pipelines
  – Good approximation
● Optimization unpredictability
  – Difficult to model the interactions between optimizations
Questions?
jp@cs.bris.ac.uk
simon@cs.bris.ac.uk
jeremy.bennett@embecosm.com
All data at: www.jpallister.com/wiki
The Best Three Optimizations for Energy
Conclusion: Optimizations are common across architectures... Sometimes
● A few consistently good options for Epiphany
  – Simpler instruction set
  – Newer compiler
  – Many more registers than ARM
● Common options across all the ARM platforms for a particular benchmark