Continuous Non-Intrusive Hybrid WCET Estimation Using Waypoint Graphs
Boris Dreyer, Christian Hochberger, Alexander Lange, Simon Wegener and Alexander Weiss
Boris Dreyer, dreyer@rs.tu-darmstadt.de
Prof. Dr.-Ing. Christian Hochberger
Computer Systems Group, Department of Electrical Engineering and Information Technology, Technische Universität Darmstadt, Germany
05.07.16 TU Darmstadt | Computer Systems Group | Boris Dreyer
This work was funded within the project CONIRAS by the German Federal Ministry of Education and Research under funding ID 01IS13029. The responsibility for the content remains with the authors.
Agenda
• Motivation
  – Measurement-based Execution Time Estimation
  – Program Flow Trace (PFT)
• Waypoint-based Worst-Case Execution Time Estimation
  – Waypoint Graph
  – Context Model
• Evaluation
  – TACLeBench
• Conclusion
Execution Time Estimation
Boris Dreyer, Christian Hochberger, Simon Wegener, and Alexander Weiss. Precise Continuous Non-Intrusive Measurement-Based Execution Time Estimation. In Francisco J. Cazorla, editor, 15th International Workshop on Worst-Case Execution Time Analysis (WCET 2015), volume 47 of OpenAccess Series in Informatics (OASIcs), pages 45–54, Dagstuhl, Germany, 2015. Schloss Dagstuhl – Leibniz-Zentrum für Informatik.
WCET Estimation – BB Approach
1. Reconstruct the CFG from the executable
2. Adapt the timing analysis module
3. Run the application on the target (processor plus FPGA-based timing analysis)
4. Transfer the basic block (BB) statistics to the host
Context-sensitive BB statistics:
BB address   Max first   Max further
0x1004       8 us        0 us
0x2000       22 us       11 us
0x202c       16 us       8 us
0x2034       2 us        1 us
0x2050       12 us       8 us
0x205c       2 us        1 us
0x206c       8 us        4 us
0x207c       6 us        0 us
0x101c       6 us        0 us
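The "Max first" / "Max further" columns separate the first execution of a basic block from its later executions. A minimal sketch of how such maxima could be aggregated from a timed trace, assuming each sample is tagged with a "context instance" (for example, one entry into the enclosing loop); the trace format and the function name `bb_statistics` are hypothetical, not the authors' hardware design.

```python
from collections import defaultdict


def bb_statistics(trace):
    """Per-BB maxima for first and further executions.

    trace: iterable of (context_instance, bb_address, time_us) tuples.
    A 'context instance' (e.g. one loop entry) is an assumption of this sketch.
    Returns {bb_address: (max_first, max_further)}.
    """
    max_first = defaultdict(int)
    max_further = defaultdict(int)
    seen = set()  # (context_instance, bb_address) pairs already executed once
    for ctx, bb, t in trace:
        key = (ctx, bb)
        if key in seen:
            # Any execution after the first one in this context instance
            max_further[bb] = max(max_further[bb], t)
        else:
            seen.add(key)
            max_first[bb] = max(max_first[bb], t)
    blocks = set(max_first) | set(max_further)
    return {bb: (max_first[bb], max_further[bb]) for bb in blocks}


trace = [(0, 0x2000, 22), (0, 0x2000, 11), (0, 0x2000, 9),
         (1, 0x2000, 20), (1, 0x2000, 10), (1, 0x207c, 6)]
print(bb_statistics(trace))
```

A block that runs only once per context, such as 0x207c here, ends up with a "further" maximum of 0, as several rows of the table above also show.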
WCET Estimation – BB Approach (cont.)
5. Annotate the CFG with the context-sensitive BB statistics (table above)
6. Find the longest path (ILP based) to obtain the overall execution time estimate
• Our method (context-sensitive): 191 us
• Context-insensitive: 258 us
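The gap between 191 us and 258 us comes from how often the larger "first" time is charged. The sketch below illustrates this on a small hypothetical graph. It assumes loops have already been collapsed into single nodes with a known iteration count, so a dynamic-programming longest path on a DAG can stand in for the ILP used in the slides. The node names and the iteration count of 3 are invented for illustration. The per-node times are taken from the table above.

```python
from functools import lru_cache


def node_cost(first, further, count, context_sensitive=True):
    """Upper bound on the total time of a node executed `count` times."""
    if count == 0:
        return 0
    if context_sensitive:
        # First execution is expensive (e.g. cold cache), the rest are cheaper
        return first + (count - 1) * further
    # Without context, every execution must be charged the overall maximum
    return count * max(first, further)


def longest_path(succ, weight, entry, exit_):
    """Longest entry->exit path in a DAG; every non-exit node needs a successor."""
    @lru_cache(maxsize=None)
    def best(n):
        if n == exit_:
            return weight[n]
        return weight[n] + max(best(s) for s in succ[n])
    return best(entry)


# Hypothetical graph: a collapsed loop (3 iterations) followed by a branch
succ = {"entry": ["loop"], "loop": ["then", "else"],
        "then": ["exit"], "else": ["exit"], "exit": []}
weight = {"entry": 8, "loop": node_cost(22, 11, 3),
          "then": 16, "else": 2, "exit": 6}
print(longest_path(succ, weight, "entry", "exit"))
```

In this toy graph, the context-sensitive loop cost is 22 + 2 × 11 = 44 us. The context-insensitive cost is 3 × 22 = 66 us.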
Embedded Trace Units
ETU types compared (ARM Cortex-A, Nexus 5001, ARM CoreSight implementations): traditional branch messages, branch history messages, ETMv3, ETMv4, PFT
Program flow observation level: Branch, Branch, Instruction, Branch, Branch
Cycle count: Yes, No, No, Yes, Yes, Yes
Applicable for hybrid WCET measurement: Yes, No, No, Yes, Yes, Yes
Basic Block Graph
[Figure: basic block graph of a CFG]
Basic Block Graph vs. Waypoint Graph
[Figure: CFG → Maximization → Equivalence → waypoint graph (WPG); legend: waypoint instruction, waypath]
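A branch-level trace such as PFT reports only waypoints, so execution times can be attributed only to the waypaths between consecutive waypoints. The following is a minimal sketch of that segmentation. It assumes the trace delivers (waypoint address, cycle timestamp) pairs and that the offline WPG generation supplies a hypothetical table mapping (start waypoint, end waypoint) to a waypath ID. How the maximization and equivalence steps merge paths is not modelled here.

```python
def split_waypaths(events, waypath_ids):
    """Turn (waypoint_address, cycle_timestamp) events into (waypath_id, cycles).

    waypath_ids: hypothetical mapping (start_waypoint, end_waypoint) -> ID,
    assumed to be produced offline from the waypoint graph.
    """
    samples = []
    prev = None
    for addr, ts in events:
        if prev is not None:
            prev_addr, prev_ts = prev
            # Time between two consecutive waypoints is the waypath's duration
            samples.append((waypath_ids[(prev_addr, addr)], ts - prev_ts))
        prev = (addr, ts)
    return samples


events = [(0x1000, 0), (0x2000, 8), (0x202c, 30)]
ids = {(0x1000, 0x2000): 0, (0x2000, 0x202c): 1}
print(split_waypaths(events, ids))
```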
WCET Estimation Using Waypoint Graphs
1. Reconstruct the waypoint graph from the executable
2. Adapt the timing analysis module
3. Run the application on the target (processor plus FPGA-based timing analysis)
4. Transfer the waypoint statistics to the host
Context-sensitive waypath statistics:
Waypath ID   Max first   Max further
0            8 us        0 us
1            22 us       11 us
2            16 us       8 us
3            2 us        1 us
4            12 us       8 us
5            2 us        1 us
6            8 us        4 us
7            6 us        0 us
8            6 us        0 us
WCET Estimation Using Waypoint Graphs (cont.)
5. Annotate the waypoint graph with the context-sensitive waypath statistics (table above)
6. Find the longest path (ILP based) to obtain the overall execution time estimate
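The slides state only that the longest path is found with an ILP. The usual way to write such a problem is the implicit path enumeration technique (IPET). Below is a sketch of IPET extended with the first/further split. It is an assumed formulation, not necessarily the authors' model. $W$ is the set of waypaths (edges) and $V$ the set of waypoints (nodes). $c^{\mathrm{first}}_w$ and $c^{\mathrm{further}}_w$ are the "Max first" and "Max further" times. $e_{C(w)}$ is the number of entries into the loop context containing $w$, and $b_{C(w)}$ is that loop's observed bound. Both equal 1 for waypaths outside any loop.

```latex
\begin{aligned}
\text{maximize}\quad & \sum_{w \in W} \bigl( c^{\mathrm{first}}_w \, x^{\mathrm{first}}_w
      + c^{\mathrm{further}}_w \, x^{\mathrm{further}}_w \bigr) \\
\text{subject to}\quad
  & x_w = x^{\mathrm{first}}_w + x^{\mathrm{further}}_w
      && \forall w \in W \\
  & \sum_{w \in \mathrm{in}(v)} x_w = \sum_{w \in \mathrm{out}(v)} x_w
      && \forall v \in V \setminus \{v_{\mathrm{entry}}, v_{\mathrm{exit}}\} \\
  & \sum_{w \in \mathrm{out}(v_{\mathrm{entry}})} x_w = 1 \\
  & x^{\mathrm{first}}_w \le e_{C(w)}, \qquad x_w \le b_{C(w)} \, e_{C(w)}
      && \forall w \in W \\
  & x_w,\; x^{\mathrm{first}}_w,\; x^{\mathrm{further}}_w \in \mathbb{Z}_{\ge 0}
\end{aligned}
```

Here $e_{C(w)}$ is itself a sum of flow variables on the waypaths that enter the loop, so the whole model stays linear.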
Execution Time Estimation – Architecture
1. Offline pre-processing: WPG generation from the executable (CFG analysis)
2. Context-sensitive statistics: loop automata cluster; min, max, avg. statistics
3. Offline post-processing: timing analysis, producing the timing report
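The min/max/avg statistics are aggregated continuously during the run. Continuous operation requires constant-memory accumulators instead of stored samples. A minimal software sketch of such an accumulator follows. The class name `RunningStats` is hypothetical and not the hardware statistics module described in the slides.

```python
class RunningStats:
    """Constant-memory min/max/average aggregation of timing samples."""

    def __init__(self):
        self.count = 0
        self.total = 0
        self.minimum = None
        self.maximum = None

    def add(self, cycles):
        # Only a counter, a sum and two extrema are kept, never the samples
        self.count += 1
        self.total += cycles
        self.minimum = cycles if self.minimum is None else min(self.minimum, cycles)
        self.maximum = cycles if self.maximum is None else max(self.maximum, cycles)

    @property
    def average(self):
        return self.total / self.count if self.count else None


s = RunningStats()
for c in (8, 22, 16):
    s.add(c)
print(s.minimum, s.maximum, s.average)
```

To get per-context figures, one such accumulator could be kept for each (waypath ID, context) pair, for example in a `collections.defaultdict(RunningStats)`.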
Loop Automata Cluster
[Figure: a cluster of loop automata; each loop automaton models one loop. Labels: waypath ID and cycles, tracepath reconstruction, instruction event, context, waypath statistics, loop bounds, loop statistics, statistics module, innermost loop selector.]
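When several loop automata report that execution is inside their loop (nested loops), one context must be selected for the current waypath sample. A minimal sketch of an innermost loop selector follows. It assumes each loop's nesting depth is known from the offline CFG analysis. The function name and data layout are hypothetical.

```python
def innermost_active_loop(depth, active):
    """Pick the context for the next waypath sample.

    depth: loop_id -> nesting depth (0 = outermost), assumed known offline.
    active: loop ids whose automata currently report being inside their loop.
    Returns the deepest active loop, or None when outside all loops.
    """
    if not active:
        return None
    return max(active, key=lambda loop_id: depth[loop_id])


depth = {"outer": 0, "inner": 1}
print(innermost_active_loop(depth, {"outer", "inner"}))
```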
Loop Automata
[Figure: one loop automaton, containing a loop event generator, a context FSM, a loop iteration counter FSM and loop iteration statistics. Labels: waypath ID and cycles, tracepath reconstruction, instruction event, context, waypath statistics, loop bounds, loop statistics, statistics module.]
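The context FSM and the loop iteration counter FSM together decide whether a waypath runs in the first or a further iteration, and they record the loop bound. The sketch below shows one way to combine them. It assumes three event kinds ("entry", "back_edge", "exit"), and the state names are hypothetical.

```python
from enum import Enum


class Ctx(Enum):
    OUTSIDE = 0   # not inside this loop
    FIRST = 1     # first iteration after loop entry
    FURTHER = 2   # second or later iteration


class LoopAutomaton:
    """Context FSM plus iteration counter for one loop (hypothetical model)."""

    def __init__(self):
        self.state = Ctx.OUTSIDE
        self.iterations = 0
        self.max_iterations = 0  # largest observed loop bound

    def on_event(self, event):
        if event == "entry":
            self.state = Ctx.FIRST
            self.iterations = 1
        elif event == "back_edge" and self.state is not Ctx.OUTSIDE:
            # Each back edge starts another iteration
            self.state = Ctx.FURTHER
            self.iterations += 1
        elif event == "exit" and self.state is not Ctx.OUTSIDE:
            self.max_iterations = max(self.max_iterations, self.iterations)
            self.state = Ctx.OUTSIDE
            self.iterations = 0
        return self.state


a = LoopAutomaton()
states = [a.on_event(e) for e in ["entry", "back_edge", "back_edge", "exit"]]
print(states, a.max_iterations)
```

Samples taken while the automaton is in `FIRST` would feed the "Max first" statistics. Samples taken in `FURTHER` would feed "Max further".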
Loop Event Generator
[Figure: comparator tree set]
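The loop event generator uses comparators to check whether an observed address matches a loop's entry, back edge or exit. The sketch below models this in software. It checks sequentially what the hardware comparators would check in parallel. It assumes one address per event kind for each loop, and the table layout is hypothetical and would come from offline CFG analysis. Loops with several exits would need a list of exit addresses.

```python
def make_event_generator(loop_table):
    """loop_table: loop_id -> {'entry': addr, 'back_edge': addr, 'exit': addr}.

    Returns a function mapping an observed waypoint address to the list of
    (loop_id, event) pairs it triggers.
    """
    kinds = ("entry", "back_edge", "exit")

    def generate(address):
        events = []
        for loop_id, addrs in loop_table.items():
            for kind in kinds:
                if addrs[kind] == address:  # one comparator per (loop, kind)
                    events.append((loop_id, kind))
        return events

    return generate


gen = make_event_generator({"L1": {"entry": 0x2000, "back_edge": 0x2050, "exit": 0x206c}})
print(gen(0x2050))
```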
Evaluation Setting
• Xilinx Zynq XC7Z020-1CLG484C
  – Dual-core ARM Cortex-A9 (666 MHz)
  – 32 kilobytes of L1 cache
  – 512 kilobytes of L2 cache (disabled)
  – SRAM data memory
• DDR3 instruction memory (533 MHz)
• TACLeBench benchmark collection
  – Each benchmark executed ten times
  – With and without the L1 instruction cache enabled
• Xilinx SDK 2016.1
Context-Insensitive Overestimation (Ratio)
[Figure: context-insensitive runtime estimation / end-to-end runtime, with L1 enabled and with L1 disabled; * marks a trace buffer overflow. Averages: 2.20 and 2.76]
Context-Sensitive Overestimation (Ratio)
[Figure: context-sensitive runtime estimation / end-to-end runtime (L1 enabled), average 2.02; context-insensitivity overhead (L1 enabled), average overhead 6 %]
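Both evaluation plots are based on simple ratios. The sketch below spells them out. The overestimation ratio follows the axis label (estimate divided by end-to-end runtime). The overhead formula is an assumed definition, because the slides do not define it. The function names are hypothetical.

```python
def overestimation_ratio(estimate, end_to_end):
    """Estimated runtime divided by the measured end-to-end runtime."""
    return estimate / end_to_end


def insensitivity_overhead(insensitive, sensitive):
    """Relative extra estimate caused by dropping context (assumed definition)."""
    return (insensitive - sensitive) / sensitive


# BB example from the earlier slide: 258 us (context-insensitive) vs. 191 us
print(round(insensitivity_overhead(258, 191), 3))
```

Under this definition, the single BB example works out to about 35 %. The 6 % on the slide is an average over the benchmarks, so the two figures are not directly comparable.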
Conclusion
● Continuous: we perform direct online aggregation at runtime.
● Non-intrusive: we use the hardware support of modern SoCs.
● Hybrid WCET estimation using waypoint graphs: we measure waypath execution times online and estimate the overall runtime offline.
Thank you!