
Multicore Real-Time Systems -- Challenges & Solutions

2010-09-11. Part 2: Multicore Real-Time Systems -- Challenges & Solutions. Wang Yi, Uppsala University, VTSA Summer School. Thanks: Guan Nan, Martin Stigge, Mingsong Lv, Zhang Yi, Erik Hagersten, Bengt Jonsson and Alexander Medvedev.


  1. 2010-09-11

     Part 2: Multicore Real-Time Systems -- Challenges & Solutions
     Wang Yi, Uppsala University
     VTSA Summer School, Luxembourg, Sept 2010

     Thanks
     Guan Nan, Martin Stigge, Mingsong Lv, Zhang Yi, Erik Hagersten, Bengt Jonsson and Alexander Medvedev

     OUTLINE
     • Multicore Challenges (Real-Time Applications?)
       - Why and what are multicores?
       - What we are doing in Uppsala: CoDeR-MP
       - The timing analysis problem
     • Possible Solutions – Partition/Isolation
       - Dealing with Cache Contention [EMSOFT 2009]
       - Dealing with Bus Interference [RTSS 2010]
       - Dealing with Core Sharing [RTAS 2010]

     What is multi-core, and why?
     [Figure: CPUs with private L1 caches sharing an L2 cache, backed by off-chip memory]
     Multicore = Multiple hardware threads sharing the memory system

     Year 2003-2007
     The free lunch is over & Multicores are coming!
     Erik Hagersten
     Chief Architect at SUN (till 1999)
     Professor of Computer Architecture, Uppsala

     Free lunch is over, Erik Hagersten
     [Figure: Performance [log] versus Year, up to "Now"; curves labeled "Single Core" and "Multicore: Requires Parallel Applications"]

  2. Multicore Challenges
     [Figure: CPUs with private L1 caches sharing an L2 cache; off-chip memory reached over limited bandwidth; the L2 cache and memory are marked as shared resources]
     Real-time applications?
       - Cache contention
       - Bus interference
       - Multiprocessor scheduling
     Weak memory models - locking
     Cheap/expensive Synchronization

     Theoretically you may get:
     • Higher Performance
       - Increasing the cores -- unlimited computing power!
     • Lower Power Consumption
       - Increasing the cores, decreasing the frequency
     • Performance (IPC) = Cores * F  ->  2 * Cores * F/2 = Cores * F
     • Power = C * V^2 * F  ->  2 * C * (V/2)^2 * F/2 = C * V^2/4 * F
     • Keep the “same performance” using ¼ of the energy (by doubling the cores)
     This sounds great for embedded & real-time applications!

     Year 2008 (June)
     UPMARC: Uppsala Programming Multicore Architecture Research Center
     Awarded by the Swedish Research Council
     10 million US$: 2008 -- 2018
     Similar centers: Stanford, UC Berkeley

     UPMARC Research Areas
     Applications & Algorithms
       • Climate simulation
       • PDE solvers
       • High Performance Computing
       • Parallel algorithms for RT signal processing
     Computer Networks
       • Parallelization of network protocols
     Verification & Language Technology
       • Erlang, language constructs/libraries, run-time systems
       • Static analysis, model checking, testing, UPPAAL
     Resource Management
       • Efficiency: performance optimization
       • Predictability: real-time applications

     Year 2008 (November)
     CoDeR-MP: Computationally Demanding Real-Time Applications on Multicore Platforms
     Awarded by the Swedish Strategic Research Foundation
     3 million US$: 2009 -- 2014

     Objective (CoDeR-MP)
     New techniques, on multicore, for
       • High-performance software for soft RT applications
       • Predictable software for hard RT applications
     Industry participation
       • Control Software for Industrial Robots – ABB robotics
       • Tracking with parallel particle filter – SAAB
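The power argument on the "Theoretically you may get" slide can be checked numerically. The sketch below is only an illustration of the slide's idealized model (Performance = Cores * F, Power = C * V^2 * F per core); the function names and the baseline values for C, V and F are hypothetical and were chosen for convenience, not taken from the slides.

```python
def performance(cores: int, freq: float) -> float:
    """Idealized throughput from the slide: Performance = Cores * F."""
    return cores * freq


def dynamic_power(cores: int, capacitance: float, voltage: float, freq: float) -> float:
    """Dynamic power C * V^2 * F per core, summed over all cores."""
    return cores * capacitance * voltage ** 2 * freq


# Hypothetical baseline values, used only for illustration.
C, V, F = 1.0, 1.0, 2.0e9

base_perf = performance(1, F)
base_power = dynamic_power(1, C, V, F)

# Double the cores, halve both the frequency and the voltage.
scaled_perf = performance(2, F / 2)
scaled_power = dynamic_power(2, C, V / 2, F / 2)

perf_ratio = scaled_perf / base_perf
power_ratio = scaled_power / base_power
print(f"performance ratio: {perf_ratio}, power ratio: {power_ratio}")
```

Under this model the performance ratio stays at 1 while the power ratio drops to one quarter, which is the slide's claim. The model assumes perfectly parallel work and that voltage can be scaled down together with frequency.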

  3. Real-Time Tracking with parallel particle filter – SAAB

     Parallelization (Speed-up for PF algorithms)
     [Figure: four panels, for number of particles N = 100, N = 500, N = 1000 and N = 10000; x-axis: Number of cores M (1 to 8); y-axis: Speed-up; curves: GDPF, RNA, GPF, RPA and Linear speed up]

     OUTLINE
     • Multicore Challenges
       - Why and what are multicores?
       - What we are doing in Uppsala: CoDeR-MP
       - The timing analysis problem
     • Possible Solutions – Partition/Isolation
       - Dealing with Cache Contention [EMSOFT 2009]
       - Dealing with Bus Interference [RTSS 2010]
       - Dealing with Core Sharing [RTAS 2010]

     Real-Time Control – ABB Robotics
     IRC5 robot controller
     [Figure: a welding program issues high-level instructions, commands and requests to components A, B, C and D, which produce precise moves]
     Mixed Hard and Soft Real-Time Tasks
     20% hard real-time tasks
     Main concerns:
       • Isolation between hard & soft tasks: “fire walls”
       • Real-time guarantee for the 20% “super” RT tasks
     Migration to multicore?

     Single-Processor Timing Analysis
     • Sequential Case (WCET analysis)
       [Figure: task 1 running alone; WCRT = WCET]
     • Concurrent Case (Schedulability analysis)
       [Figure: task 1, task 2 and task 3 with non-deterministic releases; the WCRT of each task is marked]

     On single processor:
     WCET = #instructions + “cache miss penalty”
     “Cache miss penalty” can be estimated “precisely” by e.g. abstract interpretation – based on the history of executions
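The difference between the sequential case (WCRT = WCET) and the concurrent case can be made concrete with a small calculation. The slides do not name a particular schedulability test, so the sketch below assumes the classic fixed-priority response-time recurrence for periodic tasks on one processor; the function name, the task parameters and the deadline are all hypothetical.

```python
import math


def response_time(wcet, higher_priority, deadline):
    """Worst-case response time of a task under fixed-priority preemptive
    scheduling on a single processor (assumed model, not from the slides).

    higher_priority is a list of (wcet, period) pairs for the tasks that
    can preempt this one. Returns None if the deadline would be exceeded.
    """
    r = wcet
    while True:
        # Each higher-priority task can be released ceil(r / period) times
        # within a window of length r.
        interference = sum(math.ceil(r / period) * c for c, period in higher_priority)
        nxt = wcet + interference
        if nxt > deadline:
            return None
        if nxt == r:
            return r
        r = nxt


# Sequential case: nothing can preempt the task, so WCRT = WCET.
alone = response_time(3, [], deadline=12)

# Concurrent case: two higher-priority tasks with (WCET, period) pairs.
shared = response_time(3, [(1, 4), (2, 6)], deadline=12)
print(alone, shared)
```

With these illustrative numbers the task's response time grows from 3 when it runs alone to 10 when two higher-priority tasks share the processor, even though its WCET is unchanged.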

  4. An Experiment on a LINUX machine with 2 cores (Zhang Yi)
     [Figure: bar chart without cache partitioning; y-axis: Execution time (µs), 0 to 350000; x-axis: mcol runs with different programs (mcol cnt, mcol mcol, mcol sha, mcol susane, mcol susans); annotation: WCET (vary 10 – 50%)]

     On multicore processor:
     WCET = #instructions + “cache miss penalty” + …
     “Cache miss penalty” can be much larger due to cache contention from the other cores … and also bus delays
     The WCET of a single task cannot be estimated in isolation

     An Example Architecture
     [Figure: core 1 to core 4, each with a private L1 cache, above a shared L2 cache]

     Cache analysis on multicore
     • L2 cache contents of task 1 may be over-written by task 2
     [Figure: Task 1 to Task 4, each with a private L1 cache, above a shared L2 cache]
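The effect described on the "Cache analysis on multicore" slide, where task 2 over-writes the L2 contents of task 1, can be reproduced with a toy model. The sketch below assumes a direct-mapped shared cache and hand-picked block addresses that collide; the cache size, the traces and the function name are hypothetical and much simpler than a real L2 cache.

```python
def count_misses(trace, num_sets):
    """Simulate a direct-mapped cache and count misses per task.

    trace is a list of (task, block_address) pairs in access order.
    """
    cache = {}   # set index -> block address currently stored there
    misses = {}  # task name -> number of misses
    for task, block in trace:
        misses.setdefault(task, 0)
        index = block % num_sets
        if cache.get(index) != block:
            misses[task] += 1
            cache[index] = block  # evicts whatever was in this set
    return misses


NUM_SETS = 8
ROUNDS = 10
task1_blocks = [0, 1, 2, 3]
task2_blocks = [8, 9, 10, 11]  # map to the same sets as task 1's blocks

# Task 1 running alone: only the first pass over its blocks misses.
alone = [("task1", b) for _ in range(ROUNDS) for b in task1_blocks]

# Task 1 interleaved with task 2 on another core sharing the cache.
shared = []
for _ in range(ROUNDS):
    for b1, b2 in zip(task1_blocks, task2_blocks):
        shared.append(("task1", b1))
        shared.append(("task2", b2))

alone_misses = count_misses(alone, NUM_SETS)
shared_misses = count_misses(shared, NUM_SETS)
print(alone_misses, shared_misses)
```

In this deliberately adversarial setup task 1 goes from 4 misses when alone to 40 misses when interleaved, although its own access pattern is identical. This is why the cache miss penalty of one task depends on what runs on the other cores.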

  5. The multicore challenge: WCET analysis
     • Must explore all interleavings of “execution paths” on all cores
     • Must represent “precise” timing information on each core (to keep track of the progress on each core and cache contents)

     The multicore challenge: Schedulability analysis
     • #cores < #tasks
     [Figure: Task 1 to Task 5 competing for fewer cores]

     The “Impossible” Problem
     1. We must “schedule” the shared cache lines
     2. We must “schedule” the shared memory bus
        • when cache misses occur
     3. We must “schedule” the shared cores

     Cyclic dependence
     [Figure: multicore schedulability analysis and WCET analysis depend on each other]

     OUTLINE
     • Multicore Challenges
       - Why and what are multicores?
       - What we are doing in Uppsala: CoDeR-MP
       - The timing analysis problem
     • Possible Solutions – Partition/Isolation
       - Dealing with Shared Caches [EMSOFT 2009]
       - Dealing with Bus Interference [RTSS 2010]
       - Dealing with Core Sharing [RTAS 2010]

  6. Cache analysis on multicore
     [Figure: Task 1 to Task 4, each with a private L1 cache, above a shared L2 cache]

     Cache-Coloring: partitioning and isolation
     [Figure: Task 1 to Task 4, each with a private L1 cache and its own portion of the shared L2 cache]
     WCET can be estimated using static techniques for single processor platforms (for the given portion of the L2 cache)

     Cache-Coloring: partitioning and isolation
     • E.g. LINUX – Power5 (16 colors)
     [Figure: logical pages of Task A and Task B are mapped to physical pages, a mapping controlled by software (OS); physical pages are mapped to the L2 cache, indexed by hardware]

     An Experiment on a LINUX machine with 2 cores with Cache Coloring/Partitioning [Zhang Yi et al]
     [Figure: two bar charts, without cache partitioning and with cache partitioning; y-axis: Execution time (µs); x-axis: mcol cnt, mcol mcol, mcol sha, mcol susane, mcol susans]

     What to do when #tasks > #cores?
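The cache-coloring idea, where the OS controls which physical pages a task receives so that tasks land in disjoint parts of the L2 cache, can be sketched as follows. The 16 colors come from the slide; everything else is an assumption for illustration: the color is taken as the physical frame number modulo the number of colors (a common simplification, since the real index bits depend on the hardware), and the function names, free-frame list and color split are hypothetical.

```python
NUM_COLORS = 16  # from the slide: Power5 example with 16 colors


def page_color(frame_number, num_colors=NUM_COLORS):
    """Color of a physical page frame (assumed: frame number mod colors)."""
    return frame_number % num_colors


def allocate_frames(free_frames, allowed_colors, count):
    """Pick `count` free frames whose color is in `allowed_colors`."""
    chosen = [f for f in free_frames if page_color(f) in allowed_colors][:count]
    if len(chosen) < count:
        raise MemoryError("not enough free frames of the allowed colors")
    return chosen


# Hypothetical setup: 64 free frames, colors split evenly between two tasks.
free_frames = list(range(64))
task_a_colors = set(range(0, 8))
task_b_colors = set(range(8, 16))

frames_a = allocate_frames(free_frames, task_a_colors, 4)
frames_b = allocate_frames(free_frames, task_b_colors, 4)

colors_a = {page_color(f) for f in frames_a}
colors_b = {page_color(f) for f in frames_b}
print(frames_a, frames_b, colors_a & colors_b)
```

Because the two color sets are disjoint, pages of Task A and Task B never map to the same cache region in this model, so each task's cache behavior can be analyzed as if it had a smaller private cache.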
