  1. MULTICORE SHARED MEMORY IN INTERFERENCE ANALYSIS THROUGH HARDWARE PERFORMANCE COUNTERS
    Alfonso Mascareñas González, Youcef Bouchebaba, Luca Santinelli
    GEN-F178-3 (GEN-SCI-029)

  2. PLAN: 1. Objectives, 2. Background, 3. Multicore device, 4. Measurement framework, 5. Task design, 6. Statistical application, 7. Results, 8. Conclusions

  3. OBJECTIVES
    • Design and validate a measurement framework based on Performance Monitor Hardware
    • Analyze memory interference within a multicore system
    • Check the applicability of pWCET to the obtained results

  4. BACKGROUND
    • Critical applications must meet timing constraints
    • Single-core vs. multicore processor systems
    • Multicore systems: + throughput, + SWaP (Size, Weight and Power), - predictability (interference across the whole platform increases)
    • Timing analysis: from tasks' Worst Case Execution Time (WCET) to tasks' probabilistic WCET (pWCET)
    [Figure: single-core architecture (core, cache, interconnection, memory) vs. multicore architecture (core 1 and core 2 with private caches, shared cache, interconnection, memory)]

  5. MULTICORE DEVICE: OVERVIEW
    Keystone II TCI6630K2L
    • 2 ARM cores @ 1.2 GHz
    • 4 DSP cores @ 1.2 GHz
    • L1 and L2 cache memories
    • MSM SRAM and DDR3 memories

  6. MULTICORE DEVICE: MEMORY ORGANIZATION
    [Figure: memory organization]
    • DSP1 to DSP4: 32KB L1P, 32KB L1D and 1MB L2 each
    • ARM1 and ARM2: 32KB L1P and 32KB L1D each, sharing one 1MB L2
    • Shared memories: 2MB MSM SRAM and 2GB DDR

  7. MEASUREMENT FRAMEWORK
    • Performance Monitor Hardware (PMH): coprocessors
    • Performance Monitor Unit (PMU): 6 general-purpose event counters + 1 dedicated cycle counter
    • Start-read access pattern:
      1. Select the counter
      2. Select the event
      3. Enable the counter
      4. Reset the counter
      5. Read the current counter value (first time)
      6. Run the critical task
      7. Read the current counter value (second time) and compute the difference
    • Events (~80), for example: L1 data cache refill, L1 data cache access, mispredicted branch speculatively executed, execution cycles, L2 data cache access, L2 data cache refill, L2 data cache write-back, bus access, data memory access, …
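The start-read access pattern can be sketched as follows. This is a minimal, hypothetical simulation, not code for the Keystone II: the `SimulatedPMU` class, its method names, the string event names and the 32-bit counter width are all assumptions, and on real hardware each step would program PMU registers instead. The critical task is modelled by having it call `pmu.count(...)` for the events it would generate.

```python
# Start-read access pattern against a *simulated* PMU (illustrative only).
# Assumptions: class/method/event names are hypothetical, event counters are
# 32 bits wide, and the dedicated cycle counter is not modelled.

COUNTER_MASK = (1 << 32) - 1  # assumed 32-bit event counters


class SimulatedPMU:
    """Hypothetical stand-in for a PMU with 6 general-purpose event counters."""

    NUM_EVENT_COUNTERS = 6

    def __init__(self):
        self.selected = 0
        self.event = [None] * self.NUM_EVENT_COUNTERS    # event mapped to each counter
        self.enabled = [False] * self.NUM_EVENT_COUNTERS
        self.value = [0] * self.NUM_EVENT_COUNTERS

    def select_counter(self, idx):
        if not 0 <= idx < self.NUM_EVENT_COUNTERS:
            raise ValueError(f"no event counter {idx}")
        self.selected = idx

    def select_event(self, event):
        self.event[self.selected] = event

    def enable(self):
        self.enabled[self.selected] = True

    def reset(self):
        self.value[self.selected] = 0

    def read(self):
        return self.value[self.selected]

    def count(self, event, n=1):
        """Called by the simulated workload: every enabled counter mapped to `event` advances."""
        for i in range(self.NUM_EVENT_COUNTERS):
            if self.enabled[i] and self.event[i] == event:
                self.value[i] = (self.value[i] + n) & COUNTER_MASK


def counter_delta(first, second):
    """Difference between two reads, tolerating at most one counter wrap-around."""
    return (second - first) & COUNTER_MASK


def measure(pmu, counter, event, task):
    pmu.select_counter(counter)          # 1. select the counter
    pmu.select_event(event)              # 2. select the event
    pmu.enable()                         # 3. enable the counter
    pmu.reset()                          # 4. reset the counter
    first = pmu.read()                   # 5. first read
    task(pmu)                            # 6. run the critical task
    second = pmu.read()                  # 7. second read ...
    return counter_delta(first, second)  # ... and difference


def example_critical_task(pmu):
    """Hypothetical workload that triggers 1000 L2 data cache refills."""
    for _ in range(1000):
        pmu.count("L2 data cache refill")


if __name__ == "__main__":
    pmu = SimulatedPMU()
    print(measure(pmu, 0, "L2 data cache refill", example_critical_task))  # 1000
```

Taking the difference modulo 2^32 keeps a single counter overflow between the two reads from yielding a negative count; it cannot detect more than one wrap.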

  8. TASKS DESIGN
    • The real-time applications: loops, simple operations and matrices (the main-memory-demanding source)
    • Critical task: the one under observation; three stressing levels to choose from (safety1, safety2, safety3)
    • Non-critical tasks: act as a memory-stressing source
    • Tasks are executed continuously and are placed as follows:
      ▪ Critical task in 1 ARM
      ▪ Non-critical tasks in 1 ARM and 4 DSPs
    • The ARMs are managed by PikeOS; the DSPs are fully bare metal

  9. STATISTICAL APPLICATION: pWCET & EVT
    • MBTA = Measurement-Based Timing Analysis
    • MBPTA = Measurement-Based Probabilistic Timing Analysis
    • EVT = Extreme Value Theory
    • Hypotheses to fulfill:
      1. Stationarity
      2. Short- or long-range independence
      3. Maximum Domain of Attraction (MDA)
    [Figure: from measurements, MBTA yields a relative WCET, while EVT-based MBPTA yields a relative pWCET]
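As an illustration of the EVT step of MBPTA, the sketch below fits a Gumbel distribution to block maxima of execution-time measurements and reads a pWCET estimate off its inverse CDF. The slides do not state which distribution, block size or estimator was used: the Gumbel family, the method-of-moments fit and all function names here are assumptions, and the three hypotheses (stationarity, independence, MDA) are not checked by this code.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def block_maxima(samples, block_size):
    """Maximum of each consecutive block of measurements (an incomplete last block is dropped)."""
    n_blocks = len(samples) // block_size
    return [max(samples[i * block_size:(i + 1) * block_size]) for i in range(n_blocks)]


def fit_gumbel_moments(maxima):
    """Method-of-moments Gumbel fit: beta = s*sqrt(6)/pi, mu = mean - gamma*beta."""
    beta = statistics.stdev(maxima) * math.sqrt(6) / math.pi
    if beta == 0:
        raise ValueError("all block maxima are equal; cannot fit a Gumbel distribution")
    mu = statistics.mean(maxima) - EULER_GAMMA * beta
    return mu, beta


def gumbel_cdf(x, mu, beta):
    return math.exp(-math.exp(-(x - mu) / beta))


def pwcet(mu, beta, exceedance_prob):
    """Inverse CDF: the value exceeded with probability `exceedance_prob` under the fitted Gumbel."""
    return mu - beta * math.log(-math.log(1.0 - exceedance_prob))


if __name__ == "__main__":
    # Synthetic execution-cycle measurements for illustration only (not data from the slides).
    rng = random.Random(42)
    samples = [rng.gauss(1000.0, 20.0) for _ in range(2000)]
    mu, beta = fit_gumbel_moments(block_maxima(samples, block_size=50))
    for p in (1e-3, 1e-6, 1e-9):
        print(f"exceedance probability {p:g}: {pwcet(mu, beta, p):.1f} cycles")
```

The bound is per block: with blocks of b measurements, the exceedance probability p applies to the maximum of b runs, not to a single run.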

  10. SCENARIOS: DESIGN
    Four possible scenarios:
    1. Critical task analysis (critical task alone on one ARM)
    2. Critical task + ARM non-critical task analysis
    3. Critical task + DSPs non-critical task analysis
    4. Critical task + ARM and DSPs non-critical task analysis
    [Figure: task-to-core placement (ARM, ARM, 4 DSPs) for Scenarios 1 to 4]

  11. SCENARIO 1 RESULTS: EXECUTION CYCLES (SAFETY1)
    [Figure panels: memory usage = 32KB (L1-L2), annotated 38498 and 36000 cycles; memory usage = 128KB (L2), annotated 140957 and 139000 cycles]

  12. SCENARIO 1 RESULTS: EXECUTION CYCLES (SAFETY1)
    [Figure panels: memory usage = 512KB (L2), annotated 542626 cycles; memory usage = 2MB (DDR)]

  13. SCENARIO 1 SUMMARY: EXECUTION CYCLES
    [Figure panels: Safety3; Safety1]

  14. SCENARIO 2 RESULTS: EXECUTION CYCLES (SAFETY1)
    Non-critical task memory usage = 2MB (DDR)
    [Figure panels: memory usage = 128KB; memory usage = 2MB (DDR)]

  15. SCENARIO 2 SUMMARY: EXECUTION CYCLES (SAFETY1)
    Non-critical task memory usage = 2MB

    Memory size (KB) | Mean overhead (%) | Max overhead (%)
    8                | 0.185             | 11.241
    32               | 7.362             | 13.735
    128              | 21.228            | 45.112
    512              | 10.72             | 23.481
    2048             | 4.091             | 4.363
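Overhead columns like these can be computed from two sets of execution-cycle measurements, one with the critical task running alone and one under interference. The slides do not define the two percentages, so the definitions in this sketch (mean versus mean and maximum versus maximum, both relative to the isolated run) are assumptions, as is the function name `overheads`.

```python
import statistics


def overheads(isolated, interfered):
    """Return (mean overhead %, max overhead %) of interfered runs relative to isolated runs.

    Assumed definitions (not given on the slides):
      mean overhead = (mean(interfered) - mean(isolated)) * 100 / mean(isolated)
      max overhead  = (max(interfered)  - max(isolated))  * 100 / max(isolated)
    """
    base_mean = statistics.mean(isolated)
    base_max = max(isolated)
    mean_ovh = (statistics.mean(interfered) - base_mean) * 100.0 / base_mean
    max_ovh = (max(interfered) - base_max) * 100.0 / base_max
    return mean_ovh, max_ovh


if __name__ == "__main__":
    # Synthetic cycle counts, for illustration only.
    alone = [100, 100, 100]
    with_interference = [110, 120, 130]
    print(overheads(alone, with_interference))  # (20.0, 30.0)
```

Under these definitions the maximum overhead can exceed the mean overhead by a wide margin when interference produces only occasional slow runs.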

  16. SCENARIO 3 RESULTS: EXECUTION CYCLES (SAFETY1)
    Non-critical task memory usage = 12MB
    [Figure panels: memory usage = 8MB with 0 DSPs; memory usage = 8MB with 1 DSP]

  17. SCENARIO 3 RESULTS: EXECUTION CYCLES (SAFETY1)
    Non-critical task memory usage = 12MB
    [Figure panels: memory usage = 8MB with 2 DSPs; memory usage = 8MB with 3 DSPs]

  18. SCENARIO 3 RESULTS: EXECUTION CYCLES (SAFETY1)
    Non-critical task memory usage = 12MB
    [Figure panel: memory usage = 8MB with 4 DSPs]

  19. SCENARIO 3 SUMMARY: EXECUTION CYCLES (SAFETY1)

    Cores         | Mean overhead (%) | Max overhead (%)
    ARM           | 0                 | 0
    ARM + 1 DSP   | 0.299             | 0.363
    ARM + 2 DSPs  | 0.659             | 1.105
    ARM + 3 DSPs  | 1.854             | 5.769
    ARM + 4 DSPs  | 4.514             | 14.991

    Data caches have been turned off.

  20. PREDICTABILITY
    EVT application to the different scenarios:
    • Hypothesis check
    • Inverse Cumulative Distribution Function (ICDF)
    • Pay attention to its convergence
    [Figure: memory usage = 128KB]
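One simple way to monitor convergence is to track a quantile of the measurements as the sample grows and check that it stabilises. The sketch below uses the nearest-rank empirical inverse CDF; the slides do not say which quantity was tracked or how, so this is an assumed procedure with hypothetical function names, not the authors' method.

```python
import math


def empirical_icdf(samples, q):
    """Nearest-rank empirical inverse CDF: smallest sample x with at least a fraction q of samples <= x."""
    if not 0.0 < q <= 1.0:
        raise ValueError("q must be in (0, 1]")
    ordered = sorted(samples)
    rank = math.ceil(q * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def convergence_trace(samples, q, step):
    """q-quantile estimate over growing prefixes of the measurement sequence (in acquisition order)."""
    return [(n, empirical_icdf(samples[:n], q)) for n in range(step, len(samples) + 1, step)]


if __name__ == "__main__":
    # Synthetic measurements, for illustration only.
    samples = [5, 1, 4, 2, 3, 6]
    print(convergence_trace(samples, 0.5, 2))  # [(2, 1), (4, 2), (6, 3)]
```

For timing analysis q would normally be close to 1; a trace that keeps drifting as n grows suggests that more measurements are needed before trusting the tail estimate.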

  21. CONCLUSIONS
    • Measurements based on the Performance Monitor Hardware work successfully
    • EVT can successfully predict the outcome
    • The best placement strategy is:
      1. The critical task in one ARM core
      2. Non-critical tasks in the DSPs (resource access arbitration may be used if needed)
      3. Non-critical tasks in the second ARM (main interference source)
