Efficient Utilization of Scratch-Pad Memory in Preemptive Multi-Task Systems
Hiroyuki Tomiyama
Ritsumeikan University
http://hiroyuki.tomiyama-lab.org/

Motivation
- Memory is one of the most energy-hungry subsystems in embedded systems
  - Up to 50% of total energy
- Cache improves energy efficiency by reducing off-chip memory accesses
- Cache is still energy hungry because of
  - Tag comparison
  - Automatic replacement mechanism
  - Parallel accesses to multiple ways (in high-performance cache)
- Use of SPM instead of (in addition to) cache
- Normalized read energy (calculated by CACTI 5.0)

  Memory type          Normalized read energy
  SPM                  1
  Direct-mapped cache  1.56
  2-way cache          1.93
  4-way cache          2.54

- SPM is energy efficient but small

Overview
- How to efficiently utilize SPM in the presence of multiple tasks?
- For simplicity, this talk focuses on instruction memory
  - Static data is OK, but stack and heap data need special care.

Outline
- SPM partitioning and code allocation for
  - Non-preemptive multi-task systems [Takase, Tomiyama and Takada, VLSI-DAT 2009]
  - Preemptive multi-task systems [Takase, Tomiyama and Takada, DATE 2010]
- Code layout for inter-task interference minimization [Gauthier, Ishihara, Takase, Tomiyama and Takada, CASES 2010]

Main Contributors
- Hideki Takase (Ph.D. Candidate, Nagoya University)
- Lovic Gauthier (Associate Professor, Kyushu University)

SPM Allocation Principle
- Which memory objects should be placed in SPM?
  - Memory objects can be functions (procedures), basic blocks, or other granularity.
  - For simplicity, we consider functions as memory objects
- Knapsack problem
  - x_i: 1 if the i-th function is placed in SPM; otherwise 0
  - fetch_i: number of accesses to the i-th function (obtained by profiling)
  - size_i: code size of the i-th function

  Maximize    \sum_i fetch_i \times x_i
  Subject to  \sum_i size_i \times x_i \le SPMsize
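
The knapsack formulation above can be illustrated with a small dynamic-programming sketch. This is not the authors' tool flow (the talk uses an ILP solver); the function name `allocate_spm` and the sample function names, fetch counts, and sizes are made up for illustration, and sizes are assumed to be non-negative integers in bytes.

```python
def allocate_spm(functions, spm_size):
    """0/1 knapsack: choose functions to place in SPM.

    functions: list of (name, fetch_count, size_bytes) tuples (illustrative data).
    spm_size:  SPM capacity in bytes.
    Returns (total fetches served from SPM, list of chosen function names).
    """
    # best[c] = (max fetches achievable with capacity c, names chosen for it)
    best = [(0, [])] * (spm_size + 1)
    for name, fetch, size in functions:
        # Walk capacities downward so each function is selected at most once.
        for c in range(spm_size, size - 1, -1):
            candidate = best[c - size][0] + fetch
            if candidate > best[c][0]:
                best[c] = (candidate, best[c - size][1] + [name])
    return best[spm_size]


# Hypothetical profile: (function, fetch count, code size in bytes)
profile = [("main", 100, 300), ("fir", 900, 600), ("crc", 500, 400), ("log", 50, 200)]
print(allocate_spm(profile, 1000))  # (1400, ['fir', 'crc'])
```

Here `fir` and `crc` together fill the 1000-byte SPM exactly and serve the most fetches. The multi-task methods that follow extend this single-task selection with a partitioning decision across tasks.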

Task Execution Model
- Task states
  - Dormant / Ready / Running
  [Figure: task state diagram with states Dormant, Ready, Running and transitions activate, dispatch, terminate]
- Task scheduling policy
  - All tasks are periodic and independent
  - Fixed-priority-based scheduling
    - The highest-priority task among ready tasks gets dispatched when the CPU becomes available
    - Periods and priorities of tasks are statically decided
  - No task preemption

SPM Partitioning and Code Allocation
- SPM partitioning
  - Assignment of SPM address space to tasks
- Code allocation
  - Assignment of memory objects to SPM
- Three methods
  - Spatial method
  - Temporal method
  - Hybrid method
  [Figure: one timeline per method over execution time, showing task arrivals, the running task (Task1 to Task3), and MM-SPM copies]

Spatial Method
- SPM space is exclusively partitioned and assigned to tasks
- No transfer necessary between SPM and main memory
- Effective for large SPM
- ILP formulation of simultaneous partitioning and allocation
  - func_{i,j}: j-th function of the i-th task
  - x_{i,j}: 1 if func_{i,j} is placed in SPM
  - period_i: period of the i-th task
  - hyperperiod: least common multiple of periods

Temporal Method
- Running task may use entire SPM space
- When dispatched, code is transferred from main memory to SPM.
- Effective for small SPM
- ILP formulation of simultaneous partitioning and allocation
  - Eoverhead_{i,j}: energy overhead for transfer of func_{i,j}
  - y_{i,j}: 1 if func_{i,j} is placed in SPM
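
As a rough illustration of how the hyperperiod enters such formulations, the sketch below computes the least common multiple of the task periods and the number of times each task runs within it. The last function is only one plausible reading of the temporal method in the non-preemptive case (one transfer per dispatch); it is an assumption for illustration, not the ILP objective from the slides, and all names and numbers are hypothetical.

```python
from functools import reduce
from math import gcd


def hyperperiod(periods):
    """Least common multiple of all task periods (integer time units)."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods)


def jobs_per_hyperperiod(periods):
    """How many times each periodic task is activated in one hyperperiod."""
    h = hyperperiod(periods)
    return [h // p for p in periods]


def temporal_overhead(periods, transfer_energy):
    """Assumed model: each dispatch of task i copies its SPM-resident code once.

    transfer_energy[i] is the (hypothetical) energy of one such copy for task i.
    """
    return sum(n * e for n, e in zip(jobs_per_hyperperiod(periods), transfer_energy))


periods = [10, 20, 50]                               # hypothetical periods
print(hyperperiod(periods))                          # 100
print(jobs_per_hyperperiod(periods))                 # [10, 5, 2]
print(temporal_overhead(periods, [1.0, 2.0, 4.0]))   # 28.0
```

Weighting per-task costs by the activation count is what lets tasks with different periods be compared over a common time window.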

Hybrid Method
- Mixture of spatial and temporal approaches
  - More flexible than the two approaches
- Partition the SPM space into two regions
  - Spatial region
  - Temporal region
- Spatial region is further partitioned and assigned to tasks statically
  [Figure: timeline over execution time showing task arrivals, the running task (Task1 to Task3), and MM-SPM copies, with the SPM split into a spatial region and a temporal region]

Hybrid Method ILP Formulation
- Partitioning of SPM into spatial region and temporal one
- Partitioning of spatial region into tasks
- Code allocation for temporal region

Experimental Setup and Tools
- Simulator: SimpleScalar / ARM
  - An instruction-set simulator of the ARM7TDMI microprocessor
- Compiler: arm-linux-gcc 2.95.2
- ILP solver: GNU GLPK 4.23
- Memory configurations:
  - On-chip: 16 KBytes 4-way cache + 4K / 8K / 12K / 16 KBytes SPM
    - Energy model: CACTI 4.2
  - Off-chip main memory: Mobile DDR SDRAM
    - Energy model: Micron System-Power Calculator
- Benchmark task sets (from MiBench suite)
  - TasksetA: bf / tiff2rgba
  - TasksetB: cjpeg / crc / qsort / tiff2rgba
  - TasksetC: bitcnts / cjpeg / ispell / rawcaudio / sha
  - TasksetD: bitcnts / bf / crc / dijkstra / ispell / qsort / rawcaudio / sha
  - TasksetE: bitcnts / bf / cjpeg / crc / dijkstra / ispell / qsort / rawcaudio / sha / tiff2rgba

Experimental Procedure
  [Figure: experimental procedure]

Results: TasksetE (10 tasks)
  [Figure: stacked bar chart of energy [mJ], split into cache hit, cache miss, SPM hit, and overhead, for Std / Spt / Tmp / Hyb at SPM sizes 4k, 8k, 12k, 16k; annotation: -47.2 %]
- Std: Simple spatial method where SPM is partitioned equally among all tasks
- Spt: Spatial method, Tmp: Temporal method, Hyb: Hybrid method

Results: TasksetA (2 tasks)
  [Figure: stacked bar chart of energy [mJ], split into cache hit, cache miss, SPM hit, and overhead, for Std / Spt / Tmp / Hyb at SPM sizes 4k, 8k, 12k, 16k; annotation: -28.4 %]

Results: TasksetA / TasksetC / TasksetE
  [Figure: stacked bar chart of normalized energy consumption, split into cache hit, cache miss, SPM hit, and overhead, with bars labeled Std / Rgn / Prd / Hyb at SPM sizes 4K, 8K, 12K, 16K for setA, setC, and setE]
- The hybrid approach is consistently good
- Increased SPM size is not always effective

Preemptive Multi-Task Systems
- Task states
  - Dormant / Ready / Running
  [Figure: task state diagram with states Dormant, Ready, Running and transitions activate, dispatch, preempted, terminate]
- Task scheduling policy
  - All tasks are periodic and independent
  - Fixed-priority preemptive scheduling
    - Periods and priorities of tasks are statically decided
    - A higher-priority task preempts a lower-priority task under execution

SPM Partitioning and Code Allocation
- Spatial method
  - Same as non-preemptive systems
- Temporal method
- Hybrid method

Temporal Method
- Running task may use entire SPM space
- Program code is transferred at most twice per execution
  1. When the task gets started
  2. When a higher-priority task is completed and a preempted task resumes execution
     - The contents of the preempted task need to be restored into SPM

Temporal Method: ILP Formulation
- Eoverhead_{i,j}: energy consumption of transferring func_{i,j}
- SPMsize_tmp_i: amount of SPM space that task i can use
- y_{i,j}: 1 if func_{i,j} is placed in SPM

Hybrid Method
- Mixture of the two methods
  - At compile time, SPM is partitioned by the spatial method
  - At run time, a higher-priority task may preempt not only the CPU but also the SPM space of lower-priority tasks
    - Reduces overhead of high-priority tasks
  [Figure: four timelines over execution time showing task arrivals, the running task (Task1 to Task3), and MM-SPM copies; captions: "Task1 preempts SPM spaces of Task2 and 3" and "The contents of SPM are restored"]

Hybrid Method: ILP Formulation
- SPMsize_spt_i: SPM size statically assigned to task i by the spatial method
  - Constraint (1)
- SPMsize_tmp_i: SPM size which task i preempts by the temporal method
  - Constraint (2)

Experimental Setup and Tools
- Simulator: SkyEye-1.2.6_rc1 (ARM920T)
- ILP solver: GNU GLPK 4.23
- Compiler: arm-elf-gcc 4.1.1
- RTOS: TOPPERS/ASP Kernel (Release 1.3.2)
- Memory configurations:
  - On-chip: 4 KBytes 4-way cache + 1 / 2 / 4 / 8 KBytes SPM
  - Off-chip main memory: Mobile DDR SDRAM
  - Energy model: CACTI 5.3
- Task sets: tasks are selected from the EEMBC suites
  - SetA: aifftr, basefp, bitmnp, cacheb, idctrn
  - SetB: bezier, dither, ospf, pktflow, rotate, routelookup, text
  - SetC: conven, rgbcmy, rgbriq, viterb, and SetB
  - SetD: SetA and SetC
- The periods were set proportional to the tasks' execution times
- The total CPU utilization of each task set was set to about 50%
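
The period assignment described in the last two bullets can be sketched as follows. The slides do not give the exact procedure, so this assumes a single proportionality constant k shared by all tasks (period_i = k × exec_i); each task then contributes a utilization of 1/k, and n tasks reach the target utilization when k = n / target. The function name and the execution times are hypothetical.

```python
def assign_periods(exec_times, total_utilization=0.5):
    """Periods proportional to execution times, hitting a target CPU utilization.

    Assumption: one common ratio k = period / execution time for every task,
    so total utilization = n / k.
    """
    n = len(exec_times)
    k = n / total_utilization
    return [k * c for c in exec_times]


exec_times = [1.0, 2.0, 4.0]           # hypothetical execution times (ms)
periods = assign_periods(exec_times)
print(periods)                          # [6.0, 12.0, 24.0]
print(sum(c / p for c, p in zip(exec_times, periods)))  # total utilization, about 0.5
```

Under this assumption every task has the same individual utilization, so larger task sets get proportionally longer periods.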

Overall Workflow
  [Figure: overall workflow]

Results: SetC (11 tasks)
  [Figure: stacked bar chart of energy consumption [uJ], split into cache hit, cache miss, SPM hit, and overhead, for Std / Spt / Tmp / Hyb at SPM sizes 1k, 2k, 4k, 8k; annotation: -73 %]
- Std: Simple method where SPM space is partitioned equally among all tasks
- Spt: Spatial method, Tmp: Temporal method, Hyb: Hybrid method