2010-09-11

Part 2: Multicore Real-Time Systems -- Challenges & Solutions
Wang Yi, Uppsala University
VTSA Summer School, Luxembourg, Sept 2010

Thanks
Guan Nan, Martin Stigge, Mingsong Lv, Zhang Yi, Erik Hagersten, Bengt Jonsson and Alexander Medvedev

OUTLINE
Multicore Challenges (Real-Time Applications?)
• Why and what are multicores?
• What we are doing in Uppsala: CoDeR-MP
• The timing analysis problem
Possible Solutions – Partition/Isolation
• Dealing with Cache Contention [EMSOFT 2009]
• Dealing with Bus Interference [RTSS 2010]
• Dealing with Core Sharing [RTAS 2010]

What is multi-core, and why?
[Figure: CPUs, each with an L1 cache, around an L2 cache; off-chip memory]
Multicore = Multiple hardware threads sharing the memory system

Year 2003-2007
The free lunch is over & Multicores are coming!
Erik Hagersten: Chief Architect at SUN (till 1999), Professor of Computer Architecture, Uppsala

Free lunch is over, Erik Hagersten
[Figure: Performance [log] versus Year, with "Now" marked on the year axis; curves labeled "Single Core" and "Multicore: Requires Parallel Applications"]
Multicore Challenges
[Figure: CPUs, each with an L1 cache, around an L2 cache; off-chip memory; labels: Bandwidth, Shared Resources]
Real-time applications?
-- Cache contention
-- Bus interference
-- Multiprocessor scheduling
Weak memory models - locking
Cheap/expensive synchronization

Theoretically you may get:
Higher Performance
• Increasing the cores -- unlimited computing power!
Lower Power Consumption
• Increasing the cores, decreasing the frequency
Performance (IPC) = Cores * F  ->  2*Cores * F/2 = Cores * F
Power = C * V^2 * F  ->  2*C * (V/2)^2 * F/2 = C * V^2/4 * F
Keep the "same performance" using ¼ of the energy (by doubling the cores)
This sounds great for embedded & real-time applications!

Year 2008 (June)
UPMARC: Uppsala Programming Multicore Architecture Research Center
Awarded by the Swedish Research Council
10 million US$: 2008 -- 2018
Similar centers: Stanford, UC Berkeley

UPMARC Research Areas
Applications & Algorithms
• Climate simulation
• PDE solvers
• High Performance Computing
• Parallel algorithms for RT signal processing
Computer Networks
• Parallelization of network protocols
Verification & Language Technology
• Erlang, language constructs/libraries, run-time systems
• Static analysis, model-checking, testing, UPPAAL
Resource Management
• Efficiency: performance opt.
• Predictability: real-time applications

Year 2008 (November)
CoDeR-MP: Computationally Demanding Real-Time Applications on Multicore Platforms
Awarded by the Swedish Strategic Research Foundation
3 million US$: 2009 -- 2014

Objective (CoDeR-MP)
New techniques for
• High-performance software for soft RT applications &
• Predictable software for hard RT applications
on multicore
Industry participation
• Control Software for Industrial Robots – ABB robotics
• Tracking with parallel particle filter – SAAB
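The power argument on the "Theoretically you may get" slide can be checked with a few lines of arithmetic. The sketch below uses the slide's own simplified model (Performance = Cores * F, Power = C * V^2 * F per core) and assumes, as the slide does, that the voltage can be halved together with the frequency. The function names and the normalized values are illustrative only. Because the performance is unchanged, the running time is unchanged, so the power ratio is also the energy ratio.

```python
def performance(cores, freq):
    """Aggregate throughput in the slide's model: Cores * F."""
    return cores * freq


def power(cores, cap, volt, freq):
    """Dynamic power in the slide's model: C * V^2 * F for each core."""
    return cores * cap * volt ** 2 * freq


# Baseline: one core with normalized capacitance, voltage and frequency.
base_perf = performance(1, 1.0)
base_power = power(1, 1.0, 1.0, 1.0)

# Doubled cores, halved frequency, halved voltage (assumed to scale with F).
scaled_perf = performance(2, 0.5)
scaled_power = power(2, 1.0, 0.5, 0.5)

print("performance ratio:", scaled_perf / base_perf)
print("power ratio:", scaled_power / base_power)
```

The model ignores static (leakage) power and assumes the workload parallelizes perfectly, which is the point the "Multicore Challenges" slide goes on to question.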
Real-Time Tracking with parallel particle filter – SAAB

Parallelization (Speed-up for PF algorithms)
[Figure: four plots of speed-up versus number of cores M (1 to 8), for number of particles N = 100, 500, 1000 and 10000; curves: GDPF, RNA, GPF, RPA, linear speed-up]

Real-Time Control – ABB Robotics
IRC5 robot controller
[Figure: controller diagram; labels: precise moves, welding program, commands, high-level instructions, requests]
Mixed Hard and Soft Real-Time Tasks
• 20% hard real-time tasks
Main concerns:
• Isolation between hard & soft tasks: "fire walls"
• Real-time guarantee for the 20% "super" RT tasks
Migration to multicore?

Single-Processor Timing Analysis
Sequential Case (WCET analysis)
[Figure: task 1 running alone; WCRT = WCET]
Concurrent Case (Schedulability analysis)
[Figure: tasks 1, 2 and 3 with non-deterministic releases; WCRT marked]

On single processor:
WCET = #instructions + "cache miss penalty"
"Cache miss penalty" can be estimated "precisely" by e.g. abstract interpretation, based on the history of executions
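For the concurrent case on a single processor, one standard way to obtain the WCRT from per-task WCETs is the classic response-time iteration for fixed-priority preemptive scheduling. The slides do not name a particular method, so this is an assumed illustration rather than the analysis used in the talk. The task parameters are made up, tasks are assumed periodic with deadline equal to period, and the list is ordered from highest to lowest priority.

```python
def wcrt(tasks, i, max_iterations=10_000):
    """Worst-case response time of tasks[i], or None if it misses its deadline.

    tasks: list of (wcet, period) pairs, highest priority first.
    Deadlines are assumed equal to periods.
    """
    c_i, t_i = tasks[i]
    r = c_i
    for _ in range(max_iterations):
        # Interference from every higher-priority task released within r.
        # -(-r // t_j) is integer ceiling division.
        interference = sum(-(-r // t_j) * c_j for c_j, t_j in tasks[:i])
        nxt = c_i + interference
        if nxt == r:
            return r          # fixed point reached
        if nxt > t_i:
            return None       # response time exceeds the deadline
        r = nxt
    return None


# Hypothetical task set: (WCET, period), highest priority first.
tasks = [(1, 4), (2, 6), (3, 12)]
for index in range(len(tasks)):
    print("task", index + 1, "WCRT =", wcrt(tasks, index))

# Sequential case: a task running alone has WCRT equal to its WCET.
print("alone:", wcrt([(5, 10)], 0))
```

The iteration takes each WCET as a fixed input, which is exactly the assumption that stops holding once the cache miss penalty depends on what runs on the other cores.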
An Experiment on a LINUX machine with 2 cores (Zhang Yi)
WCET (vary 10 – 50%)
[Figure: bar chart of execution time (µs, axis 0 to 350000) for mcol co-running with cnt, mcol, sha, susane and susans, without cache partitioning]
mcol runs with different programs

On multicore processor:
WCET = #instructions + "cache miss penalty" + …
"Cache miss penalty" can be much larger due to cache contention from the other cores … and also bus delays
WCET of a single task cannot be estimated in isolation

An Example Architecture
[Figure: core 1 to core 4, each with a private L1 cache, above a shared L2 cache]

Cache analysis on multicore
[Figure: Task 1 to Task 4, each with a private L1 cache, above a shared L2 cache]
L2 cache contents of task 1 may be over-written by task 2
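The effect of one task over-writing another task's L2 contents can be reproduced with a toy model. The sketch below simulates a small direct-mapped shared cache and counts the misses of task A when it runs alone and when its accesses are interleaved with those of a task B that maps to the same cache sets. The cache geometry, the addresses and the strict round-robin interleaving are assumptions chosen to make the effect visible; this is not the experiment reported on the slides.

```python
def count_misses(trace, num_sets=8, line_size=64):
    """Simulate a direct-mapped cache and return the miss count per task.

    trace: list of (task_name, byte_address) in access order.
    """
    cache = [None] * num_sets          # block currently held by each set
    misses = {}
    for task, addr in trace:
        block = addr // line_size
        index = block % num_sets
        misses.setdefault(task, 0)
        if cache[index] != block:      # miss: fetch and replace the old block
            misses[task] += 1
            cache[index] = block
    return misses


# Task A loops ten times over four cache lines (sets 0 to 3).
a_addrs = [i * 64 for i in range(4)]
# Task B uses four other lines that map to the same sets 0 to 3.
b_addrs = [(8 + i) * 64 for i in range(4)]

alone = [("A", a) for _ in range(10) for a in a_addrs]
shared = [ref
          for _ in range(10)
          for a, b in zip(a_addrs, b_addrs)
          for ref in (("A", a), ("B", b))]

misses_alone = count_misses(alone)["A"]
misses_shared = count_misses(shared)["A"]
print("task A misses alone:", misses_alone)
print("task A misses with co-runner:", misses_shared)
```

Task A executes the same instructions in both runs; only the co-runner changes, which is why its WCET cannot be bounded by looking at task A in isolation.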
The multicore challenge: Schedulability analysis
#cores < #tasks
[Figure: Task 1 to Task 5 to be scheduled on fewer cores]

The multicore challenge: WCET analysis
• Must explore all interleavings of "execution paths" on all cores
• Must represent "precise" timing information on each core (to keep track of the progress on each core and cache contents)

The "Impossible" Problem
1. We must "schedule" the shared cache lines
2. We must "schedule" the shared memory bus
   • when cache misses occur
3. We must "schedule" the shared cores

Cyclic dependence
[Figure: multicore schedulability analysis and WCET analysis depend on each other]

OUTLINE
Multicore Challenges
• Why and what are multicores?
• What we are doing in Uppsala: CoDeR-MP
• The timing analysis problem
Possible Solutions – Partition/Isolation
• Dealing with Shared Caches [EMSOFT 2009]
• Dealing with Bus Interference [RTSS 2010]
• Dealing with Core Sharing [RTAS 2010]
Cache analysis on multicore
[Figure: Task 1 to Task 4, each with a private L1 cache, above a shared L2 cache]

Cache-Coloring: partitioning and isolation
[Figure: Task 1 to Task 4, each with a private L1 cache, above an L2 cache]
WCET can be estimated using static techniques for single processor platforms (for the given portion of the L2 cache)

Cache-Coloring: partitioning and isolation
E.g. LINUX – Power5 (16 colors)
[Figure: logical pages of Task A and Task B are mapped to physical pages, controlled by software (OS); physical pages are mapped to the L2 cache, indexed by hardware]

An Experiment on a LINUX machine with 2 cores with Cache Coloring/Partitioning [ZhangYi et al]
[Figure: two bar charts of execution time (µs) for mcol co-running with cnt, mcol, sha, susane and susans; one without cache partitioning, one with cache partitioning]

What to do when #tasks > #cores?
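The page-coloring idea from the "Cache-Coloring" slides can be sketched as follows: the hardware picks the cache set from physical address bits, so the OS can confine a task to part of the cache by choosing which physical page frames it receives. The cache parameters below are hypothetical values picked only because they yield 16 colors; they are not claimed to be the Power5 configuration, and `allocate` is an illustrative helper, not a Linux API.

```python
# Hypothetical cache geometry, chosen so that there are 16 page colors.
CACHE_SIZE = 1 << 20      # 1 MiB shared L2
WAYS = 16                 # associativity
LINE_SIZE = 128           # bytes per cache line
PAGE_SIZE = 4096          # bytes per page

NUM_SETS = CACHE_SIZE // (WAYS * LINE_SIZE)
NUM_COLORS = CACHE_SIZE // (WAYS * PAGE_SIZE)


def page_color(frame):
    """Color of a physical page frame: the group of cache sets it maps to."""
    return frame % NUM_COLORS


def cache_sets_of_frame(frame):
    """All cache set indices that the lines of one physical page occupy."""
    base = frame * PAGE_SIZE
    return {((base + off) // LINE_SIZE) % NUM_SETS
            for off in range(0, PAGE_SIZE, LINE_SIZE)}


def allocate(free_frames, colors, count):
    """Take `count` free frames whose color is in `colors` (OS-side policy)."""
    chosen = [f for f in free_frames if page_color(f) in colors][:count]
    if len(chosen) < count:
        raise MemoryError("not enough free frames of the requested colors")
    for f in chosen:
        free_frames.remove(f)
    return chosen


free = list(range(256))
task_a = allocate(free, colors=set(range(0, 8)), count=20)
task_b = allocate(free, colors=set(range(8, 16)), count=20)

sets_a = set().union(*(cache_sets_of_frame(f) for f in task_a))
sets_b = set().union(*(cache_sets_of_frame(f) for f in task_b))
print("colors:", NUM_COLORS)
print("cache sets disjoint:", sets_a.isdisjoint(sets_b))
```

With disjoint colors the two tasks can never evict each other's lines, so each task can be analyzed as if it had a smaller private cache. The cost is that every task sees only a fraction of the L2 capacity.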