WCET Analyzers for Industry: Christian Ferdinand, AbsInt Angewandte Informatik GmbH


  1. WCET Analyzers for Industry. Christian Ferdinand, AbsInt Angewandte Informatik GmbH

  3. AbsInt Angewandte Informatik GmbH
     - Provides advanced development tools for embedded systems, and tools for validation, verification, and certification of safety-critical software
     - Founded in February 1998 by six researchers of Saarland University, Germany
     - Privately held by the founders
     - [Figures: staff growth graph; selected customers]

  4. Hard Real-Time Systems
     - Controllers in planes, cars, plants, … are expected to finish their tasks within reliable time bounds.
     - Schedulability analysis must be performed.
     - Hence, it is essential that an upper bound on the execution time of every task is known.
     - This bound is commonly called the Worst-Case Execution Time (WCET).
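
To make concrete why schedulability analysis needs a WCET per task, here is a minimal sketch of one classical sufficient test, the Liu and Layland utilization bound for rate-monotonic scheduling. The slides do not say which schedulability test is used; the function name and the task set are hypothetical.

```python
def rm_utilization_test(tasks):
    """Sufficient (not necessary) utilization test for rate-monotonic scheduling.

    tasks: list of (wcet, period) pairs in the same time unit.
    Returns (passes, utilization, bound).
    """
    n = len(tasks)
    # Each task contributes WCET / period to the processor utilization.
    utilization = sum(wcet / period for wcet, period in tasks)
    # Liu and Layland bound: n * (2^(1/n) - 1)
    bound = n * (2 ** (1 / n) - 1)
    return utilization <= bound, utilization, bound


# Hypothetical task set: (WCET, period) in milliseconds
tasks = [(1.0, 4.0), (2.0, 8.0), (1.0, 10.0)]
schedulable, utilization, bound = rm_utilization_test(tasks)
print(schedulable, round(utilization, 3), round(bound, 3))
```

If a WCET is underestimated, the computed utilization is too low and the test may accept a task set that can miss deadlines, which is why the bound has to be safe.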

  5. The Timing Problem [Figure: probability distribution over execution time, marking the best-case execution time, an execution time measurement (unsafe), the exact worst-case execution time, and the worst-case execution time estimate (safe).]

  6. Embedded Control Software
     - Tends to be large and complex
     - Lots of functionality
     - Code-generating tools
     - 3rd party software
     - RTOS
     - Communication libraries

  7. The Ever-Growing Gap. The statement x = a + b; compiles to LOAD r2, _a; LOAD r1, _b; ADD r3,r2,r1. [Figures: execution time (clock cycles) on 68K (1990), MPC 5xx (2000), and PPC 755 (2001); execution time depending on flash memory.]

  8. aiT WCET Analyzer
     - Combines global static program analysis by Abstract Interpretation (microarchitecture analysis of caches, pipelines, … plus value analysis) with integer linear programming for path analysis, in a single intuitive GUI.
     - Tool flow: Application Code → Compiler → Linker → Executable (*.elf / *.out). aiT takes the executable, the entry point, and the specifications (*.ais) and produces the Worst Case Execution Time together with visualization and documentation.
     - Application code: void Task (void) { variable++; function(); next++; if (next) do this; terminate() }
     - Specifications (*.ais):
       - clock 10200 kHz ;
       - loop "_codebook" + 1 loop exactly 16 end ;
       - recursion "_fac" max 6;
       - SNIPPET "printf" IS NOT ANALYZED AND TAKES MAX 333 CYCLES;
       - flow "U_MOD" + 0xAC bytes / "U_MOD" + 0xC4 bytes is max 4;
       - area from 0x20 to 0x497 is read-only;

  9. Kelvin D. Nilsen, Bernt Rygg, Worst-Case Execution Time Analysis on Modern Processors: “Furthermore, given the ever increasing sizes of multiple-level cache hierarchies, and the high complexity of static cache-behavior analysis, it seems unlikely that, even in the best of circumstances, the cache analyzer can predict more than 50% of the actual cache hits for realistic workloads.”

  10. PAG: Program Analyzer Generator

  11. Example: Direct Mapped I-Cache
      - Loop in main memory: 1024: add …, 1028: mul …, 1032: ble 1024
      - [Figure: CPU with program counter, I-cache holding the three instructions, and main memory.]
      - Cache hit: ~1 cycle
      - Cache miss: ~ +1 to +100 cycles
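
The loop on this slide can be replayed against a toy direct-mapped cache. This is a minimal sketch: the cache geometry (8 lines of 4 bytes, one instruction per line) and the function name are illustrative assumptions, not taken from the slides.

```python
def simulate_direct_mapped(addresses, num_lines=8, line_size=4):
    """Replay instruction fetches against a direct-mapped cache.

    num_lines and line_size are illustrative assumptions.
    Returns a list of (address, "hit" | "miss") pairs.
    """
    tags = [None] * num_lines  # one stored tag per cache line
    trace = []
    for addr in addresses:
        block = addr // line_size
        index = block % num_lines   # the only line this block may occupy
        tag = block // num_lines
        if tags[index] == tag:
            trace.append((addr, "hit"))
        else:
            tags[index] = tag       # fetch from main memory, overwrite the line
            trace.append((addr, "miss"))
    return trace


# Three iterations of the loop 1024: add, 1028: mul, 1032: ble 1024
fetches = [1024, 1028, 1032] * 3
trace = simulate_direct_mapped(fetches)
misses = sum(1 for _, outcome in trace if outcome == "miss")
print(misses, "misses out of", len(trace), "fetches")
```

Only the first iteration misses; with a miss costing up to about 100 cycles more than a hit, knowing which fetches are guaranteed hits matters for a tight bound.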

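  12. Set Associative Cache [Figure: the CPU address is split into an address prefix, a set number, and the byte in the line. The address prefix is compared with the prefixes stored in the selected set; if not equal, the block is fetched from main memory. Byte select & align then produces the data out.]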
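
The address split in the diagram can be sketched as follows. The line size and number of sets are assumed example values (both powers of two), and the function names are hypothetical.

```python
def split_address(addr, line_size=32, num_sets=128):
    """Split an address into (address prefix, set number, byte in line).

    line_size and num_sets are assumed example values.
    """
    byte_in_line = addr % line_size
    set_number = (addr // line_size) % num_sets
    prefix = addr // (line_size * num_sets)
    return prefix, set_number, byte_in_line


def is_hit(cache_sets, addr, line_size=32, num_sets=128):
    """cache_sets[i] is the collection of prefixes currently stored in set i."""
    prefix, set_number, _ = split_address(addr, line_size, num_sets)
    return prefix in cache_sets[set_number]


parts = split_address(0x12345)
print(parts)
```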

  13. Example: Fully Associative Cache (2 Elements)

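  14. Abstract Semantics: Transfer. Cache lines are listed by age, from “young” to “old”; the access is to [s].
      - Concrete, miss: (z, y, x, t) becomes (s, z, y, x)
      - Concrete, hit: (z, s, x, t) becomes (s, z, x, t)
      - Abstract: ({x}, {}, {s, t}, {y}) becomes ({s}, {x}, {t}, {y})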
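
The abstract transfer on this slide can be sketched as an update on a list of sets, where the list index is an upper bound on a block's LRU age. This is a minimal sketch of a must-analysis update; the function name and data layout are assumptions chosen for illustration.

```python
def must_update(cache, block):
    """Must-analysis update of an abstract LRU cache set.

    cache: list of Python sets; index 0 holds the youngest age bound.
    Returns a new abstract cache after an access to `block`.
    """
    ways = len(cache)
    hit_age = next((i for i, line in enumerate(cache) if block in line), None)
    new = [set() for _ in range(ways)]
    new[0] = {block}  # the accessed block becomes the youngest
    if hit_age is None:
        # Block not guaranteed to be cached: everything ages by one,
        # blocks at the oldest age are no longer guaranteed to be present.
        for i in range(1, ways):
            new[i] = set(cache[i - 1])
    else:
        # Blocks younger than the accessed block age by one.
        for i in range(1, hit_age):
            new[i] = set(cache[i - 1])
        if hit_age > 0:
            new[hit_age] = set(cache[hit_age - 1]) | (cache[hit_age] - {block})
        # Blocks older than the accessed block keep their age bound.
        for i in range(hit_age + 1, ways):
            new[i] = set(cache[i])
    return new


before = [{"x"}, set(), {"s", "t"}, {"y"}]
after = must_update(before, "s")
print(after)
```

The call reproduces the abstract example from the slide: s moves to the front, x ages by one, and t and y keep their age bounds.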

  15. Abstract Semantics: Join
      - Join (must): “intersection + maximal age”
      - Example: joining ({c}, {e}, {a}, {d}) with ({a}, {}, {c, f}, {d}) gives ({}, {}, {a, c}, {d})
      - Interpretation: memory block a is definitely in the (concrete) cache => always hit
      - Question: How many references will a memory block surely survive in the cache?
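
The “intersection + maximal age” rule can be written down directly. A minimal sketch, assuming abstract caches are lists of sets indexed by age bound; the function name is hypothetical.

```python
def must_join(a, b):
    """Join two abstract must-caches at a control-flow merge.

    a, b: lists of sets, index 0 = youngest age bound.
    A block survives only if it is in both caches; it gets the larger age.
    """
    age_a = {blk: i for i, line in enumerate(a) for blk in line}
    age_b = {blk: i for i, line in enumerate(b) for blk in line}
    joined = [set() for _ in range(len(a))]
    for blk in age_a.keys() & age_b.keys():
        joined[max(age_a[blk], age_b[blk])].add(blk)
    return joined


left = [{"c"}, {"e"}, {"a"}, {"d"}]
right = [{"a"}, set(), {"c", "f"}, {"d"}]
joined = must_join(left, right)
print(joined)
```

Blocks e and f appear on only one incoming path, so neither is guaranteed to be cached after the merge.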

  16. Structure of the aiT WCET Analyzer
      - Input: executable program and AIS file
      - CFG builder and loop trafo produce the CRL file
      - Static analyses: loop analyzer, value analyzer, cache/pipeline analyzer
      - Path analysis: ILP generator, LP solver, evaluation
      - Output: WCET, visualization

  17. Pipeline Analysis
      - Goal: calculate all possible pipeline states at a program point
      - Method: perform a cycle-wise evolution of the pipeline, determining all possible successor pipeline states
      - Implementation: from a formal model of the pipeline, its stages and communication between them
      - Generation: from a PAG specification
      - Result: WCET for basic blocks

  18. Pipelines [Figure: instructions Inst 1 to Inst 4 pass through the stages Fetch, Decode, Execute, and Write back, each instruction starting one cycle after its predecessor.] Ideal case: 1 instruction per cycle

  19. Pipeline of the PPC755

  20. Pipeline Model

  21. Visualization of Pipeline Analysis Results

  22. Path Analysis by Integer Linear Programming (ILP)
      - Execution time of a program = $\sum_{b \in \text{Basic\_Blocks}} \text{Execution\_Time}(b) \times \text{Execution\_Count}(b)$
      - ILP solver maximizes this function to get the WCET
      - Program structure described by linear constraints:
        - automatically created from CFG structure
        - user provided loop/recursion bounds
        - arbitrary additional linear constraints to exclude infeasible paths

  23. Path Analysis: Example (simplified constraints)
      - Program: if a then b elseif c then d else e endif; f
      - Block execution times: a = 4t, b = 10t, c = 3t, d = 2t, e = 6t, f = 5t
      - max: $4x_a + 10x_b + 3x_c + 2x_d + 6x_e + 5x_f$
      - where $x_a = x_b + x_c$, $x_c = x_d + x_e$, $x_f = x_b + x_d + x_e$, $x_a = 1$
      - Value of objective function: 19, with $x_a = 1$, $x_b = 1$, $x_c = 0$, $x_d = 0$, $x_e = 0$, $x_f = 1$
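
The example is small enough to check by exhaustive search. The sketch below is not how an ILP solver works; it simply enumerates all 0/1 execution counts, which is sufficient here only because the example has no loops. The names `TIMES`, `feasible`, and `solve` are illustrative.

```python
from itertools import product

# Execution time of each basic block, in units of t (from the slide)
TIMES = {"a": 4, "b": 10, "c": 3, "d": 2, "e": 6, "f": 5}


def feasible(x):
    """Structural constraints of the example's control-flow graph."""
    return (
        x["a"] == 1
        and x["a"] == x["b"] + x["c"]
        and x["c"] == x["d"] + x["e"]
        and x["f"] == x["b"] + x["d"] + x["e"]
    )


def solve():
    """Maximize sum(time * count) over all feasible 0/1 execution counts."""
    best_value, best_counts = None, None
    for values in product((0, 1), repeat=len(TIMES)):
        x = dict(zip(TIMES, values))
        if not feasible(x):
            continue
        value = sum(TIMES[b] * x[b] for b in TIMES)
        if best_value is None or value > best_value:
            best_value, best_counts = value, x
    return best_value, best_counts


wcet, counts = solve()
print(wcet, counts)
```

The three feasible paths cost 19 (a, b, f), 18 (a, c, e, f), and 14 (a, c, d, f), so the maximum matches the objective value on the slide.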

  24. aiT WCET Analyzer

  25. Domino Effect
      - Timing anomaly
      - Execution time increase is not bounded by hardware-determined constants
      - Certain instruction sequences, e.g. in loop bodies, can trigger this effect and increase latencies in further iterations

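  26. Pseudo-LRU Replacement (PPC755). Each setting of B[0..2] points to a specific line. [Figure: binary tree with root bit B0, whose 0 and 1 branches lead to bits B1 and B2; the 0 and 1 branches of B1 lead to lines L0 and L1, those of B2 to lines L2 and L3.]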

  27. 4-way PLRU Domino Effect. Sequence: c, d, f, c, d, h; this sequence is then repeated ad infinitum. Cache-set contents after each access (the PLRU bit states shown per row are omitted):

      | Access | Starting from empty cache | Starting from non-empty cache |
      |--------|---------------------------|-------------------------------|
      | c | c . . . | c e a b |
      | d | c d . . | c e d b |
      | f | c d f . | c f d b |
      | c | c d f . | c f d b |
      | d | c d f . | c f d b |
      | h | c d f h | c h d b |
      | c | c d f h | c h d b |
      | d | c d f h | c h d b |
      | f | c d f h | c f d b |
      | c | c d f h | c f d b |
      | d | c d f h | c f d b |
      | h | c d f h | c h d b |

      Starting from the empty cache, every repetition gives only cache hits; starting from the non-empty cache, there are two misses each time.
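
The effect can be replayed with a small simulation of one 4-way tree-PLRU set. This is a sketch under stated assumptions, not a model of the PPC755: a bit value of 0 is taken to select the left branch, invalid lines are assumed to be filled first (lowest index first), all bits start at 0, and "x" is a hypothetical placeholder for the unknown initial content of the first line.

```python
class PlruSet:
    """One 4-way cache set with tree-based pseudo-LRU replacement."""

    def __init__(self, lines=None, bits=(0, 0, 0)):
        self.lines = list(lines) if lines else [None] * 4
        self.bits = list(bits)  # [B0, B1, B2]

    def _victim(self):
        # Assumption: empty (invalid) lines are filled first.
        if None in self.lines:
            return self.lines.index(None)
        b0, b1, b2 = self.bits
        # B0 = 0 selects the pair L0/L1 via B1, B0 = 1 the pair L2/L3 via B2.
        return (2 + b2) if b0 else b1

    def _touch(self, way):
        # Make the bits on the path point away from the line just used.
        if way < 2:
            self.bits[0] = 1
            self.bits[1] = 1 - way
        else:
            self.bits[0] = 0
            self.bits[2] = 1 - (way - 2)

    def access(self, block):
        """Access a block; returns True on a hit, False on a miss."""
        hit = block in self.lines
        way = self.lines.index(block) if hit else self._victim()
        self.lines[way] = block
        self._touch(way)
        return hit


def misses_per_round(cache, sequence, rounds):
    """Count the misses in each repetition of the access sequence."""
    return [
        sum(0 if cache.access(block) else 1 for block in sequence)
        for _ in range(rounds)
    ]


SEQ = ["c", "d", "f", "c", "d", "h"]
from_empty = misses_per_round(PlruSet(), SEQ, 4)
from_nonempty = misses_per_round(PlruSet(["x", "e", "a", "b"]), SEQ, 4)
print(from_empty, from_nonempty)
```

Under these assumptions the run from the empty set stops missing after the first repetition, while the run from the non-empty set keeps missing on f and h in every repetition, so the difference between the two starting states grows without bound.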

  28. Pipeline of the PPC755

  29. Domino Effect on Instruction Sequence S1
      - Sequence S1:
        - A: lwz r20, 0(r2)
        - B: addi r21, r20, 4
        - C: mullw r19, r14, r29
        - D: lwz r23, 0(r20)
        - E: addi r24, r23, 4
        - F: addi r25, r14, 4
        - G: lwz r26, 0(r19)
        - H: mullw r27, r14, r29
        - I: lwz r28, 0(r26)
        - J: addi r22, r28, 0
      - mullw can only be executed by integer unit IU1
      - lwz can only be executed by the load/store unit LSU
      - S1 must be repeated at least 3 times

  30. Execution Units Overview. Distribution of instruction sequence S1 on the execution units IU1, IU2 and LSU.
      - In cycle 1, instructions A and B are dispatched to LSU and IU2, so C can be dispatched to IU1 in cycle 1.
      - 10 + 9(n-1) cycles are needed, with n being the number of iterations.

  31. Example: Domino Effect. Distribution of instruction sequence S1 on the execution units IU1, IU2 and LSU with an additional leading instruction X.
      - With the insertion of instruction X, B is dispatched to IU1 in cycle 1.
      - C can only be executed by IU1 and so has to wait for B to finish. B has to wait for the results of A.
      - While J is executing, B can already be dispatched to IU1, and the stream is again delayed.
      - 3 more cycles per iteration (33%)

  32. Effort to support new processors?
      - Input: executable program and AIS file
      - Call graph & CFG builder and loop transformation produce the CRL2 file
      - Static analyses: loop-bound analyzer (loop bounds), value analyzer, cache/pipeline analyzer
      - Path analysis: ILP generator, LP solver, evaluation

  33. Pipeline Analyzer Generation
      - Semi-automatic process
      - Based on VHDL specification
      - Generates C code that
        - performs abstract simulation of system behavior,
        - fits into the aiT framework, and
        - incorporates the usual abstractions
      - Theoretical background done in research project AVACS
        - National research program for basic research
        - Saarland University, Prof. Wilhelm
        - Without industrial participation

  34. Semi-Automatic Derivation of Timing Models

  35. Deriving the Timing Model
      - Processor specification too large to be used in the aiT framework: Infineon PCP2 (~40,000 LOC), Leon2 (~80,000 LOC), Infineon TriCore 1.3 (~250,000 LOC)
      - Specification needs to be compressed

  36. SCADE / aiT automated Flow

  37. Analysis Reports
      - Customizable HTML reports
      - Global and detailed reports
      - Diff feature

  38. Integration with Modelling Tools. Example: ETAS ASCET MD
