Last Time
- Response time analysis
- Blocking terms
- Priority inversion
  - And solutions
- Release jitter
- Other extensions
Today
- Timing analysis
  - Answers a question we commonly ask:
    • At most how long can this code take to run?
- Response time over CAN
  - Worst-case message times
  - Holistic scheduling
Timing Analysis Definitions
- Worst-case execution time (WCET): longest execution time of a program on a given platform, considering all possible inputs
- Precise timing analysis problem: compute the WCET
  - Undecidable in general: the halting problem trivially reduces to it
    • Though not in practice
    • But still too hard
- Timing analysis problem: compute a conservative estimate of the WCET
  - I.e., the estimate of WCET can be > true WCET
  - This is decidable
    • A correct analyzer could always return ∞
Timing Analysis by Testing
- WCET is often estimated by looking for the maximum execution time over many executions
  - This is easy
  - However, it does not solve the problem!
[Figure: histogram of number of executions versus execution time, marking longest observed ET #1, longest observed ET #2, the true WCET, and the WCET estimate]
Timing Analysis by Testing
- Always true:
  - Longest observed ET ≤ true WCET ≤ WCET estimate
- Question: What is the requirement for correctly estimating WCET using testing?
Static Timing Analysis
- Static timing analysis: estimate WCET without running a program
- Problem 1: Can’t do this from source code
  - Which variables go into registers?
  - Which functions are inlined?
  - Which switches become jump tables vs. cascaded tests?
  - Solution: Analyze compiler output
- Problem 2: Understanding what’s going on in HW
  - Where are the branch mispredicts?
  - Where are the icache / dcache misses?
  - Solution: Build model of the hardware
Static Timing Analysis

    int foo1 (int a, int b)
    {
      int c = b + 31*a;
      int e = 120 - c - a;
      return e;
    }

Compiler output:

    link    a6,#0
    move.l  8(a6),d0
    moveq   #-32,d1
    muls.l  d1,d0
    sub.l   12(a6),d0
    addi.l  #120,d0
    unlk    a6
    rts

- What does it take to estimate WCET of this code?
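On a processor with no caches or branch prediction, one answer is to add up a worst-case cycle count for each instruction. The sketch below does that for the compiled body of foo1. The cycle counts are made-up placeholders, not figures from any datasheet, and the names `struct insn`, `straight_line_wcet`, and `foo1_wcet` are introduced here only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* One instruction of a basic block with an ASSUMED worst-case cycle cost.
 * The costs used below are illustrative placeholders, not real timings. */
struct insn {
    const char *text;
    unsigned cycles;
};

/* WCET bound for straight-line code: sum of the per-instruction worst cases. */
unsigned straight_line_wcet(const struct insn *code, size_t n)
{
    assert(code != NULL || n == 0);
    unsigned total = 0;
    for (size_t i = 0; i < n; i++)
        total += code[i].cycles;
    return total;
}

/* Compiled body of foo1, annotated with hypothetical costs. */
static const struct insn foo1_code[] = {
    { "link a6,#0",       2 },
    { "move.l 8(a6),d0",  3 },
    { "moveq #-32,d1",    1 },
    { "muls.l d1,d0",    10 },
    { "sub.l 12(a6),d0",  3 },
    { "addi.l #120,d0",   1 },
    { "unlk a6",          2 },
    { "rts",              3 },
};

unsigned foo1_wcet(void)
{
    return straight_line_wcet(foo1_code,
                              sizeof foo1_code / sizeof foo1_code[0]);
}
```

With these placeholder costs `foo1_wcet()` returns 25 cycles. A real analyzer would take the costs from a hardware timing model instead.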
Analyzing Branches

    void foo2 (int a)
    {
      if (a) {
        x += 3*a;
      } else {
        y -= x-a;
      }
    }

Compiler output:

    link    a6,#0
    move.l  8(a6),d2
    tst.l   d2
    beq.s   *+16
    moveq   #3,d0
    muls.l  d0,d2
    add.l   d2,_x
    bra.s   *+24
    move.l  _x,d1
    sub.l   d2,d1
    move.l  _y,d0
    sub.l   d1,d0
    move.l  d0,_y
    unlk    a6
    rts
Analyzing Loops

    void foo3 (int a)
    {
      do {
        y++;
      } while (a--);
    }

Compiler output:

    link    a6,#0
    move.l  8(a6),d2
    move.l  _y,d1
    move.l  d2,d0
    addq.l  #1,d1
    subq.l  #1,d2
    tst.l   d0
    bne.s   *-8
    move.l  d1,_y
    unlk    a6
    rts
Loop Analysis Strategies
1. Programmer annotates loops with bounds
  - Not very fun
  - Doesn’t work well for library code
  - However, could argue that in critical software the programmer should always know the loop bounds
2. Analyzer tries to figure out loop bounds
  - Doesn’t always work
  - Derived bound might be too high
- Reasonable answer: Analyzer figures out simple loops, programmer annotates difficult ones
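Once a loop bound is known, the loop's WCET bound is the setup cost plus the bound times the body cost plus the exit cost. The sketch below applies this to a loop shaped like foo3, whose do/while body runs a + 1 times for a ≥ 0. The bound `A_MAX` stands in for a programmer annotation and is an assumption, as are the function names and any cycle costs passed in.

```c
#include <assert.h>

/* Hypothetical annotation: the caller promises 0 <= a <= A_MAX. */
#define A_MAX 9

/* Counts the iterations of a loop shaped like foo3's: do { ... } while (a--); */
unsigned foo3_iters(int a)
{
    assert(a >= 0 && a <= A_MAX);   /* without a bound there is no useful WCET */
    unsigned n = 0;
    do {
        n++;
    } while (a--);
    return n;
}

/* The annotated bound on a gives the bound on iterations: a + 1. */
unsigned foo3_max_iters(void)
{
    return (unsigned)A_MAX + 1u;
}

/* WCET bound for a loop: setup + body * max_iters + exit. */
unsigned loop_wcet(unsigned setup, unsigned body,
                   unsigned max_iters, unsigned exit_cost)
{
    return setup + body * max_iters + exit_cost;
}
```

For example, with assumed costs of 5 cycles for setup, 5 per iteration, and 6 for exit, `loop_wcet(5, 5, foo3_max_iters(), 6)` gives 61 cycles. Note that a negative `a` would make the loop run until the counter wraps, which is why the annotation matters.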
Bottom-Up WCET Analysis

    void foo4 (void)
    {
      if (y==0) {
        if (x>5) {
          x++;
        } else {
          x = 1;
        }
      } else {
        x *= 3;
      }
    }

Costs propagate from the leaves (each Return costs 3) up to the root:

    foo4:                  15 + 3 = 18
      if y == 0:           max(7, 13) + 2 = 15
        if x > 5:          max(5, 4) + 2 = 7
          x++, Return:     3 + 2 = 5
          x = 1, Return:   3 + 1 = 4
        x *= 3, Return:    3 + 10 = 13
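The bottom-up rule can be sketched as a recursive walk over a tree of program fragments: a sequence adds the costs of its parts, and a branch adds the cost of its test to the more expensive arm. The node layout and names below (`struct node`, `wcet`, `foo4_wcet`) are illustrative assumptions; the costs are the ones shown on the slide (3 per return, 2 per test, 3 for function entry).

```c
#include <assert.h>
#include <stddef.h>

enum kind { LEAF, SEQ, BRANCH };

struct node {
    enum kind kind;
    unsigned cost;             /* LEAF: block cost; BRANCH: cost of the test; SEQ: extra cost (0 here) */
    const struct node *a, *b;  /* SEQ: a then b; BRANCH: then-arm and else-arm */
};

static unsigned max_u(unsigned x, unsigned y) { return x > y ? x : y; }

/* Bottom-up WCET: children are evaluated first, then combined at the parent. */
unsigned wcet(const struct node *n)
{
    assert(n != NULL);
    switch (n->kind) {
    case LEAF:   return n->cost;
    case SEQ:    return n->cost + wcet(n->a) + wcet(n->b);
    case BRANCH: return n->cost + max_u(wcet(n->a), wcet(n->b));
    }
    return 0;
}

/* foo4 encoded as a tree, using the costs from the slide. */
static const struct node ret_n   = { LEAF, 3,  NULL, NULL };
static const struct node inc_n   = { LEAF, 2,  NULL, NULL };   /* x++    */
static const struct node set_n   = { LEAF, 1,  NULL, NULL };   /* x = 1  */
static const struct node mul_n   = { LEAF, 10, NULL, NULL };   /* x *= 3 */
static const struct node entry_n = { LEAF, 3,  NULL, NULL };   /* function entry */
static const struct node inc_ret = { SEQ, 0, &inc_n, &ret_n }; /* 5  */
static const struct node set_ret = { SEQ, 0, &set_n, &ret_n }; /* 4  */
static const struct node mul_ret = { SEQ, 0, &mul_n, &ret_n }; /* 13 */
static const struct node if_x    = { BRANCH, 2, &inc_ret, &set_ret }; /* 7  */
static const struct node if_y    = { BRANCH, 2, &if_x, &mul_ret };    /* 15 */
static const struct node foo4_n  = { SEQ, 0, &entry_n, &if_y };       /* 18 */

unsigned foo4_wcet(void)
{
    return wcet(&foo4_n);
}
```

`foo4_wcet()` reproduces the slide's total of 18. Loops are left out of this sketch; they would need a node kind that carries an iteration bound.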
Real-World Problems
- Timing models for complex processors are difficult to create
  - Probably impossible for processors like Pentium 4
    • Not even Intel knows!
  - However, easy for ColdFire, ARM, and lots of others
- Caches, TLBs, branch predictors enormously complicate WCET analysis
  - Need to estimate cache, TLB, predictor state at every program point
  - Difficult, imprecise, and computationally expensive
- Pointers and heap allocation are very difficult to analyze
  - But critical software typically doesn’t do much of these
Hardware Horror Story
- Start with some simple code:
- Measure time per loop iteration for k = 1..32
Result on NEC V850E
Result on Pentium III
Result on Athlon
Commercial Timing Analysis
- aiT from AbsInt
  - Supports ARM7TDMI, ColdFire 5307, PowerPC 755, MPC5xx
- Analysis steps:
  - Reconstruct control flow from object code
  - Value analysis: computation of address ranges for instructions accessing memory
  - Cache analysis: classification of memory references as cache misses or hits
  - Pipeline analysis: predicting the behavior of the program on the processor pipeline
  - Path analysis: determination of the worst-case execution path of the program
  - Analysis of loops and recursive procedures
Making Predictable Systems
- Avoid recursion
- Avoid deeply nested loops
- Avoid if/else where one branch is a lot faster than the other
- Avoid data-dependent loops
  - Use fixed iteration count whenever possible
- Avoid variable-time data structures
  - E.g. hash tables are usually very unpredictable
- Avoid unpredictable thread blocking
  - E.g. on disk, network, etc.
- Avoid unpredictable processors
  - This is any processor that is much faster than its memory
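As a small illustration of the fixed-iteration-count advice, the sketch below shows two versions of a table search. Both have a bounded worst case, but the first exits early, so its running time depends on the data, while the second always runs `TABLE_SIZE` iterations. The table size and function names are assumptions made for this example.

```c
#include <assert.h>

#define TABLE_SIZE 8   /* assumed fixed table size */

/* Data-dependent version: the iteration count depends on where key sits. */
int find_early_exit(const int table[TABLE_SIZE], int key)
{
    for (int i = 0; i < TABLE_SIZE; i++)
        if (table[i] == key)
            return i;
    return -1;
}

/* Fixed-iteration version: always scans the whole table and
 * remembers the first match, so best and worst case are close. */
int find_fixed(const int table[TABLE_SIZE], int key)
{
    int found = -1;
    for (int i = 0; i < TABLE_SIZE; i++) {
        if (found < 0 && table[i] == key)
            found = i;
    }
    return found;
}
```

The two functions return the same index for any input. The fixed version gives up average-case speed in exchange for execution time that barely varies with the data.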
Timing Analysis Summary
- WCET estimation for simple hardware + simple software
  - Largely a solved problem
  - Technology far less mature than e.g. compiler technology
- WCET estimation for complex hardware + complex software
  - Open problem
  - May not be possible
  - May not even be a good idea
    • I.e. nobody cares about WCET of spell check in MS Word
- Products exist
- What kind of chip should one use for a high-performance, time-critical embedded system?
Time Guarantees over CAN
- Basic idea:
  - Processors scheduled using priorities
  - We know WCET of tasks
  - CAN scheduled using priorities
  - We know the worst-case transmission time (WCTT) of messages
  - Can put it all together using “holistic scheduling”
- Why do we care?
  - Accelerometer on your pickup is on CAN bus
  - Airbags are also on CAN bus
  - Want to guarantee airbag deployment within 150 ms of when you start to roll the truck
    • Even if lots of other stuff is going over the bus
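Modeling the CAN Bus
- Recall:
  - CAN message stores 0-64 bits (0-8 bytes) of data
  - 47 bits of message overhead
  - Bit stuffing: a stuff bit is inserted after 5 consecutive equal bits; because a stuff bit can itself start the next run, the worst case is one stuff bit for every 4 bits after the first 5
  - 34 of the overhead bits are subject to stuffing
- For a message i carrying $s_i$ bytes:
  - Worst-case number of stuff bits = $\lfloor (34 + 8 s_i - 1)/4 \rfloor$
  - Worst-case transmission time, where $\tau_{bit}$ is the time to send one bit:

    $$C_i = \left( 8 s_i + 47 + \left\lfloor \frac{34 + 8 s_i - 1}{4} \right\rfloor \right) \tau_{bit}$$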
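The frame-length arithmetic can be checked with a few lines of C. This is a minimal sketch that follows the stuff-bit count stated above (34 stuffable overhead bits, 47 overhead bits in total, divisor 4); the function names and the choice of nanoseconds for the bit time are assumptions.

```c
#include <assert.h>

/* Worst-case length, in bits, of a CAN frame carrying n data bytes (0..8). */
unsigned can_frame_bits_wc(unsigned n)
{
    assert(n <= 8);
    unsigned stuff = (34u + 8u * n - 1u) / 4u;  /* integer division is the floor */
    return 8u * n + 47u + stuff;
}

/* Worst-case transmission time for a given bit time in nanoseconds. */
unsigned long can_frame_time_ns(unsigned n, unsigned long bit_time_ns)
{
    return (unsigned long)can_frame_bits_wc(n) * bit_time_ns;
}
```

For an 8-byte message this gives 64 + 47 + 24 = 135 bits, so at 1 Mbit/s (a 1000 ns bit time) the frame occupies the bus for at most 135 µs. An empty frame comes out at 55 bits.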
Modeling CAN
- Priority of a message determined by message type
- Message must have minimum period $T_i$
- Blocking term $B_i = 135\,\tau_{bit}$ (the longest possible frame: 8 data bytes)
- Release jitter $J_i$ equal to queuing delay
  - Usually equal to worst-case response time of the task that queues the message
- Now we just reuse the processor scheduling equation!

  $$R_i = C_i + B_i + \sum_{\forall j \in hp(i)} \left\lceil \frac{R_i + J_j}{T_j} \right\rceil C_j$$
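Because the response time appears on both sides of the scheduling equation, it is solved by iterating from an initial guess until the value stops changing. The sketch below assumes the recurrence has the form R_i = C_i + B_i + Σ ⌈(R_i + J_j)/T_j⌉ C_j over higher-priority messages j, with all times in bit times. The struct layout, the function name, and the `limit` cutoff are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>

struct msg {
    unsigned long C;  /* worst-case transmission time, in bit times */
    unsigned long T;  /* minimum period, in bit times */
    unsigned long J;  /* release jitter, in bit times */
};

/* Worst-case response time of message i by fixed-point iteration.
 * m[] is sorted by priority with m[0] highest; B is the blocking term.
 * Returns 0 if the iteration exceeds limit (treated as unschedulable). */
unsigned long response_time(const struct msg *m, size_t i,
                            unsigned long B, unsigned long limit)
{
    unsigned long R = m[i].C + B;          /* initial guess: no interference */
    for (;;) {
        unsigned long next = m[i].C + B;
        for (size_t j = 0; j < i; j++) {   /* every higher-priority message */
            assert(m[j].T > 0);
            /* ceil((R + J_j) / T_j) using integer arithmetic */
            unsigned long releases = (R + m[j].J + m[j].T - 1) / m[j].T;
            next += releases * m[j].C;
        }
        if (next > limit)
            return 0;
        if (next == R)
            return R;                      /* converged */
        R = next;
    }
}
```

With three 8-byte messages (C = 135) of periods 1000, 2000 and 4000 bit times, B = 135 and no jitter, the response times come out as 270, 405 and 540 bit times. Giving the highest-priority message a jitter of 700 raises the second message's response time from 405 to 540, because two releases of the first message can now fall inside the window.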
Modeling the Whole CAN Network
- How to compute minimum queuing time of a message?
  - Could be clever, but 0 is safe
- How to compute maximum queuing time of a message?
  - Equal to worst-case response time of task that queues the message
- How to compute release jitter of a task that awaits a message i?
  - $J_{dest(i)} = R_i - 47\,\tau_{bit}$
- This is nice but there’s a problem…
  - Circular dependency between task and message jitters
Holistic Analysis
- Solution to circular dependencies:
  - Start out with all jitters set to zero
  - Iterate between processor and network scheduling until convergence
  - This is the same trick we used to solve the dependency of task response time on itself last lecture
- Finally: We have worst-case response time for every task and message
  - …and we can figure out if the airbag deploys on time
- What if response times are too long?
  - Can fiddle with message priorities
  - Finding an optimal priority ordering (that minimizes global response times) is NP-hard
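The outer iteration can be sketched as follows. Tasks and messages are both modeled as an "entity" on some resource (a CPU or the bus); an entity's jitter is the response time of the entity that releases it, minus a fixed amount (47 bit times for a task awaiting a message, 0 for a message queued by a task). This is a simplified sketch under stated assumptions: the inner recurrence is R_i = C_i + B_i + Σ ⌈(R_i + J_j)/T_j⌉ C_j, all times share one unit, and `struct entity`, `solve_one`, `holistic`, and `LIMIT` are names invented for this example.

```c
#include <assert.h>
#include <stddef.h>

#define LIMIT 100000ul  /* assumed cutoff: larger response times count as failure */

struct entity {
    int resource;          /* CPU or bus id; entities sharing it compete */
    int prio;              /* smaller number = higher priority */
    unsigned long C, T, B; /* cost, minimum period, blocking term */
    int pred;              /* index of the entity that releases this one, or -1 */
    unsigned long sub;     /* subtracted from R[pred] to obtain the jitter */
    unsigned long J, R;    /* filled in by the analysis */
};

/* Inner fixed point: response time of entity i for the current jitters. */
static int solve_one(struct entity *e, size_t n, size_t i)
{
    unsigned long R = e[i].C + e[i].B;
    for (;;) {
        unsigned long next = e[i].C + e[i].B;
        for (size_t j = 0; j < n; j++) {
            if (j == i || e[j].resource != e[i].resource ||
                e[j].prio >= e[i].prio)
                continue;                  /* not a higher-priority competitor */
            assert(e[j].T > 0);
            next += ((R + e[j].J + e[j].T - 1) / e[j].T) * e[j].C;
        }
        if (next > LIMIT)
            return 0;
        if (next == R) {
            e[i].R = R;
            return 1;
        }
        R = next;
    }
}

/* Outer fixed point: start with zero jitter, recompute response times and
 * jitters in turn until no jitter changes. Returns 1 on convergence. */
int holistic(struct entity *e, size_t n)
{
    for (size_t k = 0; k < n; k++)
        e[k].J = 0;
    for (;;) {
        for (size_t i = 0; i < n; i++)
            if (!solve_one(e, n, i))
                return 0;
        int changed = 0;
        for (size_t k = 0; k < n; k++) {
            if (e[k].pred < 0)
                continue;
            unsigned long r = e[e[k].pred].R;
            unsigned long j = r > e[k].sub ? r - e[k].sub : 0;
            if (j != e[k].J) {
                e[k].J = j;
                changed = 1;
            }
        }
        if (!changed)
            return 1;
    }
}
```

As a usage example, take a sender task (C = 700, T = 1000) that queues a high-priority message (C = 135, T = 1000, B = 135), a second lower-priority message on the same bus (C = 135, T = 2000, B = 135), and a task released by that second message (sub = 47). The first pass, with zero jitter, gives the second message a response time of 405; once the first message inherits a jitter of 700 from its sender, the second message's response time grows to 540 and the awaiting task's jitter settles at 493.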
CAN Bus Scheduling Summary
- Can reason about an entire network of processors plus their network using holistic scheduling
  - Pretty cool result
  - This is used in practice