Modelling the Energy Consumption of Soft Real-Time Tasks on Heterogeneous Computing Architectures. H.E. Zahaf (1), R. Olejnik (1), G. Lipari (1), A.E. Benyamina (2). (1) Université de Lille, (2) University of Oran. January 19, 2016
Outline ◮ Introduction ◮ Experimental setting ◮ Time vs. energy ◮ Conclusions and Current work
Context and motivation Computing at the edge
Fog Computing ◮ Fog Computing characteristics ◮ Computing at the edge means that data are pre-processed before being stored in the cloud ◮ thus reducing network load ◮ Fog Computing requirements ◮ Multicore, heterogeneous ◮ different kinds of computation are needed ◮ Low power consumption ◮ (Soft real-time)
Minimise power consumption ◮ Modern processors have many ways of reducing power consumption ◮ Dynamic Voltage and Frequency Scaling (DVFS) ◮ dynamically adjust processor frequency to minimise energy . . . ◮ . . . without reducing performance too much ◮ Dynamic Power Management (DPM) ◮ Turn off processors that are not used/needed ◮ Pack all computation onto a small number of processors . . . ◮ . . . without reducing performance too much ◮ In any case, performance is the key here
Soft real-time tasks ◮ A soft real-time task consists of a sequence of computations to be executed periodically ◮ e.g.: every 20 msec, encode one video frame ◮ Period = 20 msec ◮ Usually associated with a deadline ◮ every video frame must be encoded within 20 msec ◮ Deadline = 20 msec ◮ Goal: ◮ find the minimum frequency such that the task completes within its deadline [Figure: timeline of task τ1, time axis from 0 to 40 msec]
The problem ◮ Model: ◮ a set of soft real-time tasks ◮ with different periods, deadlines, execution time profiles ◮ scheduled by an operating system using a real-time scheduling algorithm ◮ on a set of heterogeneous processors ◮ Problem: ◮ allocate tasks to processors ◮ set frequency ◮ set scheduling parameters ◮ Objective ◮ minimise total energy without missing deadlines
Energy model ◮ In order to solve the problem, we need to have a model of the energy consumption ◮ Problems: ◮ Processor model: ◮ Energy saving mechanisms, internal to the chip and transparent to the programmer, try to minimise energy of micro-operations ◮ Complexity of the hardware/software interaction ◮ Influence of pipeline and cache on execution time ◮ Tasks share resources (caches, memory bus, peripherals) ◮ It is impossible to derive an exact model ◮ we resort to measurement
ARM Big/Little [Figure: ARM Big/Little. A Big cluster of four ARM Cortex A15 cores and a Little cluster of four ARM Cortex A7 cores; each core has its own L1 cache and each cluster has its own L2 cache. Also shown: Mali GPU and RAM memory.] ◮ It is possible to set the frequency of each processor group, but not of individual cores ◮ Each group (Little or Big) has its own characteristics in terms of execution time speed-up and energy consumption
Energy Sensors ◮ Sensors: ◮ current for the Little group ◮ current for the Big group ◮ current for RAM ◮ A global ammeter for the board ◮ used to check the consistency of measurements
Benchmarks ◮ Three periodic tasks ◮ MATMUL (L) ◮ Multiplies two square LxL matrices a fixed number of times ◮ FFT ◮ Fast Fourier transform of a random input signal ◮ FFMPEG ◮ decoding of a specific video input ◮ Tasks were executed periodically every T units of time ◮ we measured execution time, and energy consumption of processors and memory ◮ Linux OS ◮ Frequency governor disabled
Execution time ◮ The execution time of a MATMUL(200x200) thread allocated on one Little or Big core, with no interference [Figure: execution time (s) versus frequency (MHz, 0 to 2,000); series: B-avg, L-avg]
Model of computation time ◮ Computation time varies with frequency according to the following rule: $C_i(f) = \frac{f_m}{f}\, ct_i + mt_i$ ◮ Two components: ◮ $ct_i$ represents the instruction cycles executed on the processor ◮ $mt_i$ represents the main memory accesses ◮ The second component does not vary with frequency, but depends on the number of cache misses ◮ hence on the interference of other tasks on the cache and on the bus
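The rule above can be sketched in a few lines of Python, together with the search for the lowest frequency that still meets a deadline. This is only an illustration under stated assumptions: the slides do not define f_m, so the sketch treats it as the reference frequency at which ct was measured (here 1400 MHz, the Little cluster's maximum); the 200 MHz frequency steps and all function names are hypothetical.

```python
def computation_time(f_mhz, ct_ms, mt_ms, f_ref_mhz):
    """Predicted time C(f) = (f_ref / f) * ct + mt, in milliseconds.

    f_ref_mhz stands in for f_m; treating it as the frequency at which
    ct was measured is an assumption, not something the slides state.
    """
    return (f_ref_mhz / f_mhz) * ct_ms + mt_ms


def min_frequency(deadline_ms, ct_ms, mt_ms, f_ref_mhz, available_mhz):
    """Lowest available frequency whose predicted C(f) fits the deadline.

    Returns None when even the highest frequency is too slow.
    """
    for f in sorted(available_mhz):
        if computation_time(f, ct_ms, mt_ms, f_ref_mhz) <= deadline_ms:
            return f
    return None


if __name__ == "__main__":
    # Hypothetical frequency steps for the Little cluster (assumption).
    little_freqs = range(200, 1401, 200)
    # ct = 98 ms, mt = 15 ms: MATMUL(150) on a Little core.
    print(min_frequency(200, 98, 15, 1400, little_freqs))
```

With ct = 98 ms, mt = 15 ms and a 200 ms deadline, the sketch selects 800 MHz, where the predicted time is 186.5 ms; at 600 MHz the prediction (about 243.7 ms) already exceeds the deadline. Only the ct term shrinks as the frequency rises, so a memory-bound task gains little from a faster clock.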
Computing task’s parameters ◮ We can compute both components for each task in a typical setting, with a simple regression ◮ Example: MATMUL(size) ◮ (times are expressed in milliseconds; L = Little core, B = Big core)
Size   RSS (Kb)   ct (L)   mt (L)   ct (B)   mt (B)
150    1272       98       15       23       7
200    1452       254      17       66       8
250    1651       526      19       146      9
300    1840       978      21       278      10
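The "simple regression" can be sketched as an ordinary least-squares line fit, because the model C(f) = (f_m / f) ct + mt is linear in x = f_m / f, with slope ct and intercept mt. The sketch below is one possible way to do it, not necessarily the procedure used for the table; the function name and the choice of f_m as a caller-supplied reference frequency are assumptions.

```python
def fit_ct_mt(freqs_mhz, times_ms, f_ref_mhz):
    """Least-squares estimate of (ct, mt) from measured execution times.

    Fits times = ct * (f_ref / f) + mt, a straight line in x = f_ref / f.
    f_ref_mhz plays the role of f_m (assumed reference frequency).
    """
    if len(freqs_mhz) != len(times_ms) or len(set(freqs_mhz)) < 2:
        raise ValueError("need matching samples at two or more frequencies")
    xs = [f_ref_mhz / f for f in freqs_mhz]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(times_ms) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, times_ms))
    ct = sxy / sxx              # slope: frequency-dependent part
    mt = mean_y - ct * mean_x   # intercept: frequency-independent part
    return ct, mt
```

In practice the inputs would be average execution times measured at each available frequency; noisy measurements give estimates rather than exact values, so repeating runs per frequency helps.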
Impact of interference ◮ Co-execution of an interfering task (MATMUL(200x200)) ◮ the interference increases with the size of the matrix [Figure: two panels of execution time (s) versus frequency (MHz). Little core, 0 to 1,400 MHz: L-With-P, L-Without. Big core, 500 to 2,000 MHz: B-With-P, B-Without]
Dynamic power ◮ Power consumption of MATMUL(150) on Big and Little cores [Figure: power (W) versus frequency (MHz, 0 to 2,000); series: B-avg, L-avg] ◮ The Little core at $f_{max} = 1400$ MHz consumes less than the Big core at $f_{min} = 200$ MHz ◮ Power can be modelled as a third-degree polynomial: $P(f) = af^3 + bf^2 + cf + d$
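One way to obtain the four coefficients of such a polynomial from measured (frequency, power) pairs is a least-squares fit. The sketch below is self-contained Python and solves the 4x4 normal equations by Gaussian elimination; it is an assumption about how the fit could be done, not a description of the tooling behind the slides, and the function name is hypothetical.

```python
def fit_cubic(freqs_mhz, powers_w):
    """Least-squares fit of P(f) = a*f^3 + b*f^2 + c*f + d, f in MHz.

    Returns (a, b, c, d). Frequencies are rescaled to GHz internally
    so that the normal equations stay reasonably well conditioned.
    """
    if len(freqs_mhz) != len(powers_w) or len(set(freqs_mhz)) < 4:
        raise ValueError("need matching samples at four or more frequencies")
    xs = [f / 1000.0 for f in freqs_mhz]
    rows = [[x ** 3, x ** 2, x, 1.0] for x in xs]
    # Augmented normal equations: (V^T V | V^T y).
    m = [[sum(r[i] * r[j] for r in rows) for j in range(4)]
         + [sum(r[i] * y for r, y in zip(rows, powers_w))]
         for i in range(4)]
    # Forward elimination with partial pivoting.
    for col in range(4):
        pivot = max(range(col, 4), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 4):
            factor = m[r][col] / m[col][col]
            for k in range(col, 5):
                m[r][k] -= factor * m[col][k]
    # Back substitution.
    coef = [0.0] * 4
    for i in range(3, -1, -1):
        rest = sum(m[i][k] * coef[k] for k in range(i + 1, 4))
        coef[i] = (m[i][4] - rest) / m[i][i]
    a, b, c, d = coef
    # Undo the GHz rescaling: x = f / 1000.
    return a / 1e9, b / 1e6, c / 1e3, d
```

A library routine such as a polynomial least-squares fit would do the same job in one call; the explicit version is shown only to keep the example free of dependencies.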
Impact of idle processors ◮ We can only measure the energy consumed by all Little cores together ◮ one single sensor per group of cores [Figure: power (W) versus frequency (MHz, 0 to 2,000); series: One-L, Two-L, Three-L, One-B, Two-B, Three-B] ◮ Not easy to understand what is going on: ◮ the OS puts a core in low-power mode when it is not executing, which also reduces static energy ◮ however, there is a shared "base" for all processors
Power consumption of RAM ◮ RAM consumes slightly less power when the task runs on a Big core ◮ probably due to the larger L2 cache (fewer cache misses) [Figure: RAM power versus frequency (MHz, 0 to 2,000), vertical axis from 0 to 0.1; series: Little core, Big core]
Model of energy ◮ We used a third-degree polynomial of frequency to model the power consumption: $P(f) = af^3 + bf^2 + cf + d$ [Figure: power (W) versus frequency (MHz, 0 to 1,400); measured series Real-FFT-1 and Real-MM-1, regression series Rg-FFT-1 and Rg-MM-1] ◮ Regression:
       FFT            MatMul
a      4.6 · 10^-11   5.2 · 10^-11
b      2.2 · 10^-8    4.1 · 10^-9
c      3.4 · 10^-8    7.8 · 10^-5
d      4.4 · 10^-2    1.7 · 10^-2
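Combining the power polynomial with the computation-time model gives a rough estimate of the energy spent per job, E(f) = P(f) · C(f). The Python sketch below uses the regression coefficients as read from the table, assuming f is expressed in MHz and P in watts (the units are not stated on the slide, but MHz gives values in the plotted 0 to 0.4 W range). Pairing these coefficients with particular ct and mt values, and reading f_m as a 1400 MHz reference frequency, are further assumptions; all names are hypothetical.

```python
# Regression coefficients (a, b, c, d); f in MHz, P in W (assumed units).
COEFFS = {
    "FFT": (4.6e-11, 2.2e-8, 3.4e-8, 4.4e-2),
    "MatMul": (5.2e-11, 4.1e-9, 7.8e-5, 1.7e-2),
}


def power_w(f_mhz, coeffs):
    """Evaluate P(f) = a*f^3 + b*f^2 + c*f + d with Horner's scheme."""
    a, b, c, d = coeffs
    return ((a * f_mhz + b) * f_mhz + c) * f_mhz + d


def energy_per_job_mj(f_mhz, coeffs, ct_ms, mt_ms, f_ref_mhz):
    """Energy of one job in millijoules: P(f) [W] * C(f) [ms].

    C(f) = (f_ref / f) * ct + mt, with f_ref standing in for f_m
    (an assumed reference frequency).
    """
    c_ms = (f_ref_mhz / f_mhz) * ct_ms + mt_ms
    return power_w(f_mhz, coeffs) * c_ms


if __name__ == "__main__":
    # Illustrative pairing: MatMul coefficients with ct = 98 ms, mt = 15 ms.
    for f in range(200, 1401, 200):
        e = energy_per_job_mj(f, COEFFS["MatMul"], 98, 15, 1400)
        print(f, round(e, 2))
```

Scanning the available frequencies this way shows the trade-off behind the talk's objective: lowering the frequency reduces power but lengthens the job, so the energy-minimal frequency that still meets the deadline has to be found per task.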