Challenges for Worst-case Execution Time Analysis of Multi-core Architectures
Jan Reineke, Saarland University, Computer Science
Intel, Braunschweig, April 29, 2013
The Context: Hard Real-Time Systems
Safety-critical applications:
• Avionics, automotive, train industries, manufacturing
  ◦ Side airbag in car: reaction in < 10 msec
  ◦ Crankshaft-synchronous tasks: reaction in < 45 microsec
• Embedded controllers must finish their tasks within given time bounds.
• Developers would like to know the Worst-Case Execution Time (WCET) to give a guarantee.
The Timing Analysis Problem
[Diagram: Embedded Software + Microarchitecture → ? → Timing Requirements]
What does the execution time depend on?
• The input, determining which path is taken through the program.
• The state of the hardware platform:
  ◦ Due to caches, pipelining, speculation, etc.
• Interference from the environment:
  ◦ External interference as seen from the analyzed task on shared busses, caches, memory.
[Figure: three platforms of increasing complexity: (1) simple CPU with memory; (2) complex CPU (out-of-order execution, branch prediction, etc.) with L1 cache and main memory; (3) several complex CPUs, each with an L1 cache, with an L2 cache and main memory]
Example of Influence of Microarchitectural State
x = a + b;
    LOAD r2, _a
    LOAD r1, _b
    ADD  r3, r2, r1
[Figure: PowerPC 755]
Example of Influence of Co-running Tasks in Multicores
Radojkovic et al. (ACM TACO, 2012) on Intel Atom and Intel Core 2 Quad: up to 14x slow-down due to interference on shared L2 cache and memory controller.
Challenges
1. Modeling: How to construct sound timing models?
2. Analysis: How to precisely & efficiently bound the WCET?
3. Design: How to design microarchitectures that enable precise & efficient WCET analysis?
The Modeling Challenge
[Diagram: Microarchitecture → ? → Timing Model]
Timing model = formal specification of the microarchitecture's timing.
Incorrect timing model → possibly incorrect WCET bound.
Current Process of Deriving Timing Model
[Diagram: Microarchitecture → ? → Timing Model]
→ Time-consuming, and
→ error-prone.
1. Future Process of Deriving Timing Model
[Diagram: Microarchitecture → VHDL Model → Timing Model]
Derive timing model automatically from formal specification of microarchitecture.
→ Less manual effort, thus less time-consuming, and
→ provably correct.
2. Future Process of Deriving Timing Model
[Diagram: Microarchitecture → Perform measurements on hardware → Infer model → Timing Model]
Derive timing model automatically from measurements on the hardware using ideas from automata learning.
→ No manual effort, and
→ (under certain assumptions) provably correct.
→ Also useful to validate assumptions about the microarchitecture.
Proof-of-concept: Automatic Modeling of the Cache Hierarchy
• The cache model is an important part of the timing model.
• It can be characterized by a few parameters:
  ◦ ABC: associativity, block size, capacity
  ◦ Replacement policy
[Figure: cache organized as N cache sets, each holding A lines consisting of a tag and a data block of size B. B = block size, A = associativity, N = number of cache sets]
chi [Abel and Reineke, RTAS 2013] derives all of these parameters fully automatically.
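The ABC parameters above are tied together by C = A · B · N. The following minimal sketch shows how they determine where an address lands in the cache. It assumes plain modulo set indexing, which real hardware need not use, and the class name `CacheGeometry` and the example values (8 ways, 64-byte blocks, 32 KB) are illustrative assumptions, not taken from chi.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CacheGeometry:
    """Hypothetical helper: a cache described by its ABC parameters."""
    associativity: int  # A: number of lines per cache set
    block_size: int     # B: bytes per line
    capacity: int       # C: total bytes

    @property
    def num_sets(self) -> int:
        # N follows from C = A * B * N
        return self.capacity // (self.associativity * self.block_size)

    def locate(self, address: int) -> Tuple[int, int, int]:
        """Split a byte address into (tag, set index, block offset).

        Assumes simple modulo indexing; real caches may hash the index.
        """
        offset = address % self.block_size
        block_number = address // self.block_size
        set_index = block_number % self.num_sets
        tag = block_number // self.num_sets
        return tag, set_index, offset


# Illustrative values only (assumed, not measured).
geometry = CacheGeometry(associativity=8, block_size=64, capacity=32 * 1024)
print(geometry.num_sets)         # 64
print(geometry.locate(0x12345))  # (18, 13, 5)
```

Two addresses compete for the same A lines only if they share a set index, which is why associativity and replacement policy matter for timing.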
Example: Intel Core 2 Duo E6750, L1 Data Cache
[Plot: number of L1 misses (y-axis |Misses|, 0 to 90000) against |Size| (x-axis, 1 to 50). Annotations: Capacity = 32 KB, Way Size = 4 KB]
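The idea behind such a plot can be sketched on a simulated cache: sweep a buffer of growing size and watch for the first size at which misses appear. This is only an illustration under stated assumptions. The simulator uses LRU replacement, modulo indexing, and 64-byte blocks, all of which are assumptions, and the functions `misses_for_size` and `infer_capacity` are hypothetical names, not the chi implementation.

```python
from collections import OrderedDict


class SimulatedLRUCache:
    """Toy set-associative cache with LRU replacement (stand-in for hardware)."""

    def __init__(self, capacity, associativity, block_size):
        self.block_size = block_size
        self.associativity = associativity
        self.num_sets = capacity // (associativity * block_size)
        self.sets = [OrderedDict() for _ in range(self.num_sets)]

    def access(self, address):
        """Return True on a miss, False on a hit."""
        block = address // self.block_size
        lines = self.sets[block % self.num_sets]
        if block in lines:
            lines.move_to_end(block)   # mark as most recently used
            return False
        if len(lines) >= self.associativity:
            lines.popitem(last=False)  # evict the least recently used block
        lines[block] = True
        return True


def misses_for_size(make_cache, size, stride):
    """Warm up with one pass over `size` bytes, then count misses in a second pass."""
    cache = make_cache()
    addresses = range(0, size, stride)
    for address in addresses:
        cache.access(address)
    return sum(cache.access(address) for address in addresses)


def infer_capacity(make_cache, stride, step, max_size):
    """Largest probed size whose second pass causes no misses."""
    fitting = 0
    for size in range(step, max_size + 1, step):
        if misses_for_size(make_cache, size, stride) != 0:
            break
        fitting = size
    return fitting


def make_cache():
    # Assumed parameters for the simulation: 32 KB, 8 ways, 64-byte blocks.
    return SimulatedLRUCache(32 * 1024, 8, 64)


print(infer_capacity(make_cache, stride=64, step=1024, max_size=64 * 1024))  # 32768
```

With the values annotated on the plot, the associativity follows by division: a capacity of 32 KB and a way size of 4 KB give 32 / 4 = 8 ways.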
Replacement Policy
Approach inspired by methods to learn finite automata. Heavily specialized to the problem domain.
Discovered a, to our knowledge, undocumented policy of the Intel Atom D525.
[Figure: diagram of the discovered policy; element labels a, b, c, d, e, f, x]
More information: http://embedded.cs.uni-saarland.de/chi.php
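To give a flavor of learning a policy from observations, the sketch below treats a cache set as a black box that only reports hits and misses, and uses one probe sequence to separate LRU-like from FIFO-like behavior. It is a deliberately tiny illustration, not the chi algorithm: the classes `LRUSet` and `FIFOSet` are simulated stand-ins for hardware, `classify` is a hypothetical name, and the probe assumes at least two ways.

```python
class LRUSet:
    """Simulated cache set with least-recently-used replacement."""

    def __init__(self, ways):
        self.ways, self.lines = ways, []

    def access(self, block):
        """Return True on a hit."""
        hit = block in self.lines
        if hit:
            self.lines.remove(block)   # a hit refreshes the block's position
        elif len(self.lines) == self.ways:
            self.lines.pop(0)          # evict the least recently used block
        self.lines.append(block)
        return hit


class FIFOSet:
    """Simulated cache set with first-in first-out replacement."""

    def __init__(self, ways):
        self.ways, self.lines = ways, []

    def access(self, block):
        """Return True on a hit."""
        hit = block in self.lines
        if not hit:
            if len(self.lines) == self.ways:
                self.lines.pop(0)      # evict the oldest insertion
            self.lines.append(block)
        return hit                     # hits do not change the order


def classify(make_set, ways):
    """Probe a black-box set: does a hit protect a block from eviction?"""
    cache_set = make_set(ways)
    for block in range(ways):   # fill the set with blocks 0 .. ways-1
        cache_set.access(block)
    cache_set.access(0)         # touch the oldest block again (a hit)
    cache_set.access(ways)      # a fresh block forces one eviction
    if cache_set.access(0):     # block 0 survived, so the hit refreshed it
        return "LRU-like"
    return "FIFO-like"


print(classify(LRUSet, 4))   # LRU-like
print(classify(FIFOSet, 4))  # FIFO-like
```

A single probe only separates these two candidates; identifying an arbitrary, possibly undocumented policy requires systematically exploring many such access sequences.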
Modeling Challenge: Future Work
Extend automation to other parts of the microarchitecture:
• Translation lookaside buffers, branch predictors
• Shared caches in multicores including their coherency protocols
• Out-of-order pipelines?
The Analysis Challenge
[Diagram: Timing Model + Microarchitecture → Precise & Efficient Timing Analysis]
Consider all possible program inputs and all possible initial states of the hardware:
$$\mathrm{WCET}_H(P) := \max_{i \in \mathit{Inputs}} \; \max_{h \in \mathit{States}(H)} \mathrm{ET}_H(P, i, h)$$
Explicitly evaluating ET for all inputs and all hardware states is not feasible in practice:
• There are simply too many.
⇒ Need for abstraction and thus approximation!
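The definition of the WCET as a maximum over all inputs and all initial hardware states can be evaluated literally on a toy model, which also shows why this does not scale. Everything in this sketch is a made-up assumption for illustration: the latencies `HIT` and `MISS`, the program `toy_program`, and a cache that never evicts. It is not a real timing model.

```python
from itertools import product

HIT, MISS = 1, 10  # hypothetical latencies in cycles


def execution_time(program, inp, initial_cache):
    """Toy ET_H(P, i, h): the input picks the path, the initial state h decides hits."""
    cached = set(initial_cache)
    cycles = 0
    for block in program(inp):
        cycles += HIT if block in cached else MISS
        cached.add(block)  # toy cache of unbounded size: never evicts
    return cycles


def wcet_exhaustive(program, inputs, states):
    """Maximum of ET over every (input, initial hardware state) pair."""
    return max(execution_time(program, i, h) for i in inputs for h in states)


def toy_program(inp):
    # The input selects one of two paths through the program.
    return ["a", "b"] if inp == 0 else ["a", "c", "d", "c"]


blocks = ["a", "b", "c", "d"]
# Every subset of blocks is a possible initial cache state: 2**4 = 16 states.
states = [
    frozenset(b for b, present in zip(blocks, mask) if present)
    for mask in product([False, True], repeat=len(blocks))
]

print(len(states))                                    # 16
print(wcet_exhaustive(toy_program, [0, 1], states))   # 31
```

Even in this toy, the number of initial states doubles with every additional memory block, and the input space of a real program grows similarly, which is the motivation for abstraction.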