Online Teaching
- Lectures are delivered live over Zoom at class time.
  - Also recorded for offline viewing after class.
  - Time to become a Zoom master. :)
- Project demos will be done live in class (preferred if possible), or through prerecorded videos to be played in class.
  - Prepare a prerecorded video as a backup even if you plan a live demo.
  - Send your video and slides to Ruixuan by 11am on demo day.
    - Use YouTube or WU Box.
  - Feel free to adapt your projects.
- Your feedback is welcome!
  - We will work together to optimize online learning.
Coming Up
- Demo II: 3/31 (next Tuesday)
  - 10 min per team.
  - Email Ruixuan your slides and videos by 11am.
  - Gearing up for the final demo.
- Critique #3: 4/7
  - S. Xi, M. Xu, C. Lu, L. Phan, C. Gill, O. Sokolsky, and I. Lee, "Real-Time Multi-Core Virtual Machine Scheduling in Xen," ACM International Conference on Embedded Software (EMSOFT'14), October 2014.
Parallel Real-Time Systems for Latency-Critical Applications
Chenyang Lu
CSE 520S
Cyber-Physical Systems (CPS)
- NSF Cyber-Physical Systems Program Solicitation: "CPS are built from, and depend upon, the seamless integration of computational algorithms and physical components."
- Since the application interacts with the physical world, its computation must be completed under a time constraint.
[Figure: Real-Time Hybrid Simulation (RTHS) spanning the cyber-physical boundary, at the Robert L. and Terry L. Bowen Large Scale Structures Laboratory, Purdue University]
Interactive Cloud Services (ICS)
- Need to respond within 100 ms for users to find the service responsive.*
[Figure: web search pipeline: query → document index search → 2nd-phase ranking → snippet generator → response]
* Jeff Dean et al. (Google), "The Tail at Scale," Communications of the ACM 56(2), 2013.
Interactive Cloud Services (ICS)
- Need to respond within 100 ms for users to find the service responsive.*
- E.g., web search, online gaming, stock trading, etc.
* Jeff Dean et al. (Google), "The Tail at Scale," Communications of the ACM 56(2), 2013.
Real-Time Systems
- The performance of these systems depends not only upon their functional aspects, but also upon their temporal aspects.
- Real-time performance:
  1) Provide hard guarantees of meeting jobs' deadlines (e.g., CPS)
  2) Optimize latency-related objectives for jobs (e.g., ICS)
[Figure: jobs (Job 1, Job 2, Job 3) scheduled onto the cores of a single multi-core machine]
New Generation of Real-Time Systems
Characteristics:
- New classes of applications with complex functionalities
- Increasing computational demand of each application
- Consolidation of multiple applications onto a shared platform
- Rapid increase in the number of cores per chip
Demand: leverage parallelism within the applications to improve real-time performance and system efficiency.
[Figure: jobs scheduled onto the cores of a single multi-core machine]
Parallelism Improves RTHS Accuracy
An RTHS simulates a nine-story building with a first-story damper.
- Previously, sequential processing limited the simulation rate to 575 Hz.
- Parallel execution now allows a rate of 3000 Hz.
Parallelism Improves RTHS Accuracy
An RTHS simulates a nine-story building with a first-story damper.
- Previously, sequential processing limited the simulation rate to 575 Hz.
- Parallel execution now allows a rate of 3000 Hz.
[Figure: normalized error (%) over time (sec) for sequential (575 Hz) vs. parallel (3000 Hz) execution]
- Reduction in error for both acceleration and displacement.
- Parallelism increases accuracy via faster actuation and sensing.
State of the Art
- Real-time systems
  - Schedule multiple sequential jobs on a single core
  - Schedule multiple sequential jobs on multiple cores
- Parallel runtime systems
  - Schedule a single parallel job
  - Schedule multiple parallel jobs to optimize fairness or throughput
- New: parallel real-time systems for latency-critical applications
Challenges for Parallel Real-Time Systems
- Theory: How to provide real-time performance for multiple parallel jobs?
- Systems: How to build parallel real-time systems that are efficient and scalable?
Goal: develop provably good and practically efficient real-time systems for parallel applications.
Parallel Job – Directed Acyclic Graph (DAG)
- Naturally captures programs generated by parallel languages such as Cilk Plus, Intel Threading Building Blocks, and OpenMP.
- Node: sequential computation. Edge: dependence between nodes.
- Work C_i: execution time on one core.
[Figure: example DAG with work C_i = 18 and span L_i = 9]
Parallel Job – Directed Acyclic Graph (DAG)
- Naturally captures programs generated by parallel languages such as Cilk Plus, Intel Threading Building Blocks, and OpenMP.
- Node: sequential computation. Edge: dependence between nodes.
- Work C_i: execution time on one core.
- Span (critical-path length) L_i: execution time on infinitely many cores.
[Figure: example DAG with work C_i = 18 and span L_i = 9]
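To make the DAG abstraction concrete, here is a minimal OpenMP sketch (not from the slides; the function name and workload constants are illustrative). The parallel for-loop spawns independent nodes, the implicit barrier creates the dependence edges, and the closing comment works out the resulting work and span.

```c
/* Minimal sketch (not from the slides): a small OpenMP program whose
 * execution forms a DAG. Each loop iteration is a node; the implicit
 * barrier at the end of the parallel for creates edges from every
 * iteration to the code that follows. Constants are illustrative. */
#include <omp.h>
#include <stdio.h>

#define N 9          /* number of parallel nodes (assumed) */
#define UNIT_MS 2.0  /* assumed execution time of one node, in ms */

static void busy_work(double ms) {
    double t0 = omp_get_wtime();
    while ((omp_get_wtime() - t0) * 1000.0 < ms) { /* spin */ }
}

int main(void) {
    busy_work(UNIT_MS);              /* source node of the DAG */

    #pragma omp parallel for         /* N independent nodes */
    for (int i = 0; i < N; i++) {
        busy_work(UNIT_MS);
    }
                                     /* implicit barrier: dependence edges */
    busy_work(UNIT_MS);              /* sink node */

    /* Work  C_i ~ (N + 2) * UNIT_MS = 22 ms  (total computation),
     * Span  L_i ~ 3 * UNIT_MS       = 6 ms   (source -> one iteration -> sink). */
    printf("done\n");
    return 0;
}
```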
Parallel Real-Time Task Model
- A task periodically releases DAG jobs with deadlines.
- Deadline D_i = period.
[Figure: Task 1 releases Job 1 and Job 2, each with deadline D_i = 12]
Parallel Real-Time Task Model
- A task periodically releases DAG jobs with deadlines.
- Deadline D_i = period.
- Each task is characterized by its worst-case work C_i and worst-case span L_i.
[Figure: Task 1 releases Job 1 and Job 2, each with deadline D_i = 12]
Parallel Real-Time Task Model
- A task periodically releases DAG jobs with deadlines.
- Multiple tasks are scheduled on a multi-core system.
- Goal of the system: guarantee that all tasks can meet all their deadlines.
[Figure: Task 1 (D_i = 12) and Task 2 (D_i = 9), each releasing periodic jobs]
Federated Scheduling
- For parallel tasks, FS has the best known bound in terms of schedulability.
- FS assigns n_i dedicated cores to each parallel task:
    n_i = ⌈ (C_i − L_i) / (D_i − L_i) ⌉
  where C_i is the worst-case work, L_i is the worst-case span, and D_i is the deadline (= period).
- n_i is the minimum number of cores needed for the task to meet its deadline.
[Figure: tasks mapped onto dedicated subsets of cores]
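A minimal sketch of this core-assignment formula (the parameters reuse the illustrative DAG values C_i = 18 and L_i = 9 with D_i = 12; they are not from a real task set):

```c
/* Minimal sketch: computing the federated-scheduling core assignment
 * n_i = ceil((C_i - L_i) / (D_i - L_i)) for an implicit-deadline task.
 * Example parameters reuse the illustrative DAG values from the slides. */
#include <stdio.h>

/* Number of dedicated cores FS assigns to a task with worst-case work C,
 * worst-case span L, and deadline D (requires L < D for feasibility). */
static long fs_cores(long C, long L, long D) {
    long num = C - L;
    long den = D - L;
    return (num + den - 1) / den;   /* integer ceiling */
}

int main(void) {
    long C = 18, L = 9, D = 12;
    /* (18 - 9) / (12 - 9) = 3, so this task needs 3 dedicated cores. */
    printf("n_i = %ld\n", fs_cores(C, L, D));
    return 0;
}
```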
Empirical Comparison
FS platform:
- Middleware platform providing the FS service in Linux.
- Works with the GNU OpenMP runtime system.
- Runs OpenMP programs with minimal modification.
Compared with our Global Earliest Deadline First (GEDF) platform:
- Linux kernel 3.10.5 with the LITMUS^RT patch.
- 16-core machine with 2 Intel Xeon E5-2687W processors.
- GCC version 4.6.3 with OpenMP.
- Each data point has 100 task sets.
- Each task is randomly generated with parallel for-loops (see the sketch below).
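As a rough illustration of what such a synthetic task body might look like, here is a minimal OpenMP sketch. The segment count, iteration counts, and per-iteration workload are assumptions, not the actual generator used in the experiments.

```c
/* Minimal sketch of a synthetic task made of parallel for-loop segments,
 * in the spirit of the randomly generated tasks on this slide. */
#include <omp.h>

static void spin_us(double us) {
    double t0 = omp_get_wtime();
    while ((omp_get_wtime() - t0) * 1e6 < us) { /* busy-wait */ }
}

/* One job of the task: a sequence of parallel segments separated by
 * implicit barriers, which together form the job's DAG. */
static void run_job(int num_segments, int iters_per_segment, double iter_us) {
    for (int s = 0; s < num_segments; s++) {
        #pragma omp parallel for schedule(static)
        for (int i = 0; i < iters_per_segment; i++) {
            spin_us(iter_us);
        }
        /* Implicit barrier: the next segment starts only after every
         * iteration of this segment has finished. */
    }
}

int main(void) {
    run_job(3, 16, 500.0);   /* hypothetical parameters */
    return 0;
}
```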
Empirical Comparison
- Normalized system utilization = (Σ_i C_i / D_i) / m, where m is the number of cores.
- Higher normalized utilization means the task sets are harder to schedule; a lower fraction of task sets missing deadlines means better performance.
[Figure: fraction of task sets missing deadlines vs. normalized system utilization (0.2 to 0.8) for GEDF and FS, on the experimental setup from the previous slide]
Empirical Comparison
- 52% of task sets become schedulable under FS (compared to GEDF).
[Figure: fraction of task sets missing deadlines vs. normalized system utilization for GEDF and FS, on the same experimental setup]
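A minimal sketch of the normalized-utilization metric used on these plots; the three tasks below are made-up examples, not taken from the experiments.

```c
/* Normalized system utilization as defined on the previous slide:
 * (sum_i C_i / D_i) / m, where m is the number of cores. */
#include <stdio.h>

struct task { double C; double D; };   /* worst-case work and deadline (= period) */

static double normalized_utilization(const struct task *t, int n, int m) {
    double u = 0.0;
    for (int i = 0; i < n; i++)
        u += t[i].C / t[i].D;          /* per-task utilization C_i / D_i */
    return u / (double)m;              /* normalize by the number of cores */
}

int main(void) {
    struct task set[] = { {18, 12}, {27, 9}, {40, 20} };   /* hypothetical tasks */
    int m = 16;                                            /* cores, as in the testbed */
    printf("normalized utilization = %.3f\n",
           normalized_utilization(set, 3, m));   /* (1.5 + 3 + 2) / 16 = 0.406 */
    return 0;
}
```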
Summary of Federated Scheduling
For parallel real-time systems with a guarantee of meeting deadlines, Federated Scheduling has:
- the best theoretical bound in terms of schedulability;
- better empirical performance compared to GEDF.
RTHS has used the FS platform to improve system performance.
The End?
[Figure: tasks mapped onto dedicated subsets of cores]
Issue with the Classic System Model
- The classic system model uses the worst-case work for analysis, in order to guarantee that all tasks can meet all deadlines in all cases.
- The worst-case work is significantly larger than the average work.
- → The average system utilization is very low in practice.
[Figure: most cases (10 ms of work) vs. very rare cases (100 ms of work) on 3 cores with a 40 ms deadline]
Mixed-Criticality in Cars
Features with different criticality levels:
- Safety-critical features
- Infotainment features
[Figure: display system with car navigation and infotainment]
Toy Example of MC System
- High-criticality task: deadline 40 ms, worst-case work 100 ms, most-case work 10 ms.
- Low-criticality task: deadline 40 ms, most-case work 80 ms.
[Figure: in the very rare case the high-criticality task occupies 3 cores (100 ms of work); in most cases it needs 1 core (10 ms); the low-criticality task occupies 2 cores (80 ms)]
Most-Case vs. Worst-Case Scenarios
- Single-criticality systems need to provision for the worst-case scenario.
[Figure: 5 cores reserved for the worst case; in most cases only the 10 ms high-criticality job and the 80 ms low-criticality job run, leaving cores idle, while in the very rare case the 100 ms and 80 ms jobs fill all 5 cores]
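A quick check of the core counts behind this figure, assuming (as the toy example depicts) that each job parallelizes freely, so only its total work matters against the 40 ms deadline:

```latex
\[
\left\lceil \tfrac{100}{40} \right\rceil = 3, \qquad
\left\lceil \tfrac{10}{40} \right\rceil = 1, \qquad
\left\lceil \tfrac{80}{40} \right\rceil = 2
\quad\text{(cores for 100 ms, 10 ms, and 80 ms of work, respectively)}
\]
```

So a single-criticality system reserves 3 + 2 = 5 cores for the worst case, even though in most cases only 1 + 2 = 3 of them do useful work.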
MC Model Improves Resource Efficiency
Mixed-criticality system: provide different levels of real-time guarantees.
- Most cases: guarantee that both high-criticality and low-criticality tasks meet their deadlines.
- Very rare cases (overrun): only guarantee that high-criticality tasks meet their deadlines.
[Figure: 3-core timeline in which the 10 ms and 80 ms jobs run normally until an overrun occurs, after which the high-criticality job runs for 100 ms]
MCFS Algorithm at a High Level
For each parallel task, calculate and assign:
(1) dedicated cores in the typical state
[Figure: typical state (most cases), with the m cores partitioned into typical-state dedicated cores for the high-criticality and low-criticality tasks]
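A heavily hedged sketch of step (1): one way the typical-state assignment could be computed, reusing the federated-scheduling formula with typical-case (most-case) work in place of worst-case work. This is an assumption about how the step might be realized, not the published MCFS algorithm, and the task parameters below are made up.

```c
/* Illustrative sketch of typical-state core assignment (an assumption,
 * not the published MCFS algorithm): apply the FS-style formula
 * ceil((C - L) / (D - L)) using each task's typical-case work. */
#include <stdio.h>

struct mc_task {
    const char *name;
    int high_criticality;   /* 1 = high-criticality, 0 = low-criticality */
    double C_typ;           /* typical-case (most-case) work, ms */
    double L;               /* span, ms */
    double D;               /* deadline = period, ms */
};

static int cores_needed(double C, double L, double D) {
    double n = (C - L) / (D - L);     /* requires L < D */
    int k = (int)n;
    if (n > k) k += 1;                /* ceiling */
    return (k < 1) ? 1 : k;           /* at least one core per task */
}

int main(void) {
    struct mc_task tasks[] = {        /* hypothetical parameters */
        { "high-crit", 1, 10.0, 4.0, 40.0 },
        { "low-crit",  0, 60.0, 10.0, 40.0 },
    };
    int m = 16, used = 0;             /* total cores on the machine */

    for (int i = 0; i < 2; i++) {
        int n = cores_needed(tasks[i].C_typ, tasks[i].L, tasks[i].D);
        used += n;
        printf("%s: %d typical-state core(s)\n", tasks[i].name, n);
    }
    printf("typical-state cores used: %d of %d\n", used, m);
    return 0;
}
```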