Dataflow analysis: First example (analysis #1), Available expressions

Michel Schinz, Advanced compiler construction, 2008-05-09

Common subexpression elimination

The following C program fragment sets r to $x^y$ for y > 0. How can it be (slightly) optimised?

     1  int y1 = 1;
     2  int r = x;
     3  while (y1 != y) {
     4    int t = y1*2;
     5    if (t <= y) {
     6      r = r*r;
     7      y1 = y1*2;
     8    } else {
     9      r = r*x;
    10      y1 = y1+1;
    11    }
    12  }

At line 7, y1*2 can be replaced by t.

Available expressions

Why is the previous optimisation valid? Because at line 7, where expression y1*2 appears for the second time, it is available. That is, no matter how we reach line 7, y1*2 will have been computed previously at line 4. The computation of line 4 is still valid at line 7 because no redefinition of y1 appears between those two points.

Generally speaking, we can define for every program point the set of available expressions, which is the set of all non-trivial expressions whose value has already been computed at that point.

For the fragment above, the sets of expressions available before and after each statement are the following. Note: we only consider arithmetic expressions.

    before      statement          after
    {}          int y1 = 1         {}
    {}          int r = x          {}
    {}          while (y1 != y)    {}
    {}          int t = y1*2       {y1*2}
    {y1*2}      if (t <= y)        {y1*2}
    {y1*2}      r = r*r            {y1*2}
    {y1*2}      y1 = y1*2          {}
    {y1*2}      r = r*x            {y1*2}
    {y1*2}      y1 = y1+1          {}

Formalising the analysis

How can these ideas be formalised?

  1. introduce a variable $i_n$ for the set of expressions available before node n, and a variable $o_n$ for the set of expressions available after node n,
  2. define equations between those variables,
  3. solve those equations.

Equations

The control-flow graph of the fragment has one node per statement, numbered after the source line of that statement:

     1  int y1 = 1
     2  int r = x
     3  while (y1 != y)
     4  int t = y1*2
     5  if (t <= y)
     6  r = r*r
     7  y1 = y1*2
     9  r = r*x
    10  y1 = y1+1

Its edges are 1→2, 2→3, 3→4, 4→5, 5→6, 6→7, 7→3, 5→9, 9→10 and 10→3. The equations are:

$$
\begin{aligned}
i_1 &= \{\} & o_1 &= i_1 \\
i_2 &= o_1 & o_2 &= i_2 \\
i_3 &= o_2 \cap o_7 \cap o_{10} & o_3 &= i_3 \\
i_4 &= o_3 & o_4 &= \{\mathtt{y1*2}\} \cup i_4 \\
i_5 &= o_4 & o_5 &= i_5 \\
i_6 &= o_5 & o_6 &= i_6 \downarrow r \\
i_7 &= o_6 & o_7 &= i_7 \downarrow y1 \\
i_9 &= o_5 & o_9 &= i_9 \downarrow r \\
i_{10} &= o_9 & o_{10} &= i_{10} \downarrow y1
\end{aligned}
$$

Notation: $S \downarrow x = S \setminus \{\text{all expressions using } x\}$.

Solving equations

The equations can be solved by iteration:

  1. initialise all sets $i_1, \ldots, i_{10}, o_1, \ldots, o_{10}$ to the set of all non-trivial expressions in the program, here {y1*2, y1+1, r*r, r*x},
  2. viewing the equations as assignments, compute the "new" value of those sets,
  3. iterate until a fixed point is reached.

Initialisation is done that way because we are interested in finding the largest sets satisfying the equations: the larger a set is, the more information it conveys (for this analysis).

To simplify the equations, we can first replace all $i_k$ variables by their value, to obtain a simpler system, and then solve that system. For our example, we get:

$$
\begin{aligned}
o_1 &= \{\} & o_6 &= o_5 \downarrow r \\
o_2 &= o_1 & o_7 &= o_6 \downarrow y1 \\
o_3 &= o_2 \cap o_7 \cap o_{10} & o_9 &= o_5 \downarrow r \\
o_4 &= o_3 \cup \{\mathtt{y1*2}\} & o_{10} &= o_9 \downarrow y1 \\
o_5 &= o_4
\end{aligned}
$$

The simpler system can be solved by iterating until a fixed point is reached, which happens after 7 iterations. In the table below, column 1 holds the initial values and every later column is computed from the previous one.

    It.   1    2    3    4                   5                   6         7
    o1    YR   {}   {}   {}                  {}                  {}        {}
    o2    YR   YR   {}   {}                  {}                  {}        {}
    o3    YR   YR   R    {}                  {}                  {}        {}
    o4    YR   YR   YR   {y1*2, r*r, r*x}    {y1*2}              {y1*2}    {y1*2}
    o5    YR   YR   YR   YR                  {y1*2, r*r, r*x}    {y1*2}    {y1*2}
    o6    YR   Y    Y    Y                   Y                   {y1*2}    {y1*2}
    o7    YR   R    {}   {}                  {}                  {}        {}
    o9    YR   Y    Y    Y                   Y                   {y1*2}    {y1*2}
    o10   YR   R    {}   {}                  {}                  {}        {}

Notation: Y = {y1*2, y1+1}, R = {r*r, r*x}, YR = Y ∪ R.

Generalisation

In general, for a node n of the control-flow graph, the equations have the following form:

$$ i_n = o_{p_1} \cap o_{p_2} \cap \ldots \cap o_{p_k} $$

where $p_1 \ldots p_k$ are the predecessors of n.

$$ o_n = \mathrm{gen}_{AE}(n) \cup (i_n \setminus \mathrm{kill}_{AE}(n)) $$

where $\mathrm{gen}_{AE}(n)$ are the non-trivial expressions computed by n, and $\mathrm{kill}_{AE}(n)$ is the set of all non-trivial expressions that use a variable modified by n.

Substituting $i_n$ in $o_n$, we obtain the following equation for $o_n$:

$$ o_n = \mathrm{gen}_{AE}(n) \cup [(o_{p_1} \cap o_{p_2} \cap \ldots \cap o_{p_k}) \setminus \mathrm{kill}_{AE}(n)] $$

These equations are the dataflow equations for the available expressions dataflow analysis.

Note: generated expressions

The equation giving the expressions available at the exit of node n is:

$$ o_n = \mathrm{gen}_{AE}(n) \cup (i_n \setminus \mathrm{kill}_{AE}(n)) $$

In order for this equation to be correct, expressions that are computed by n but which use a variable modified by n must not be part of $\mathrm{gen}_{AE}(n)$. For example:

$$ \mathrm{gen}_{AE}(\mathtt{x=y*y}) = \{\mathtt{y*y}\} \quad \text{but} \quad \mathrm{gen}_{AE}(\mathtt{y=y*y}) = \{\} $$
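The dataflow equations for available expressions can be tried out with a few lines of Python. The following is only a sketch under stated assumptions: the names `ALL_EXPRS`, `EXPR_VARS`, `AE_CFG`, `kill_ae`, `gen_ae` and `solve_available_expressions` are illustrative and do not come from the slides, the CFG is the nine-node graph of the example, and sets are updated in place during a pass (rather than all at once from the previous iteration), so the number of passes is not the iteration count quoted above. It also applies the general gen/kill form to every node, which leads to the same fixed point for this example.

```python
# All non-trivial expressions of the example program.
ALL_EXPRS = frozenset({"y1*2", "y1+1", "r*r", "r*x"})

# Variables read by each expression (used to derive kill sets).
EXPR_VARS = {"y1*2": {"y1"}, "y1+1": {"y1"}, "r*r": {"r"}, "r*x": {"r", "x"}}

# node -> (predecessors, expressions computed by the node, variable written or None)
# Node ids follow the source line numbers of the example (hypothetical encoding).
AE_CFG = {
    1:  ([],         set(),    "y1"),  # int y1 = 1
    2:  ([1],        set(),    "r"),   # int r = x
    3:  ([2, 7, 10], set(),    None),  # while (y1 != y)
    4:  ([3],        {"y1*2"}, "t"),   # int t = y1*2
    5:  ([4],        set(),    None),  # if (t <= y)
    6:  ([5],        {"r*r"},  "r"),   # r = r*r
    7:  ([6],        {"y1*2"}, "y1"),  # y1 = y1*2
    9:  ([5],        {"r*x"},  "r"),   # r = r*x
    10: ([9],        {"y1+1"}, "y1"),  # y1 = y1+1
}

def kill_ae(node):
    """Expressions that use the variable written by `node`."""
    written = AE_CFG[node][2]
    return {e for e in ALL_EXPRS if written in EXPR_VARS[e]}

def gen_ae(node):
    """Expressions computed by `node` that do not use the variable it writes."""
    return AE_CFG[node][1] - kill_ae(node)

def solve_available_expressions():
    # Start from the largest sets, since the largest solution is wanted.
    out = {n: set(ALL_EXPRS) for n in AE_CFG}
    changed = True
    while changed:
        changed = False
        for n, (preds, _, _) in AE_CFG.items():
            if preds:
                avail_in = set(ALL_EXPRS)
                for p in preds:
                    avail_in &= out[p]      # intersection over predecessors
            else:
                avail_in = set()            # nothing is available at the entry
            new_out = gen_ae(n) | (avail_in - kill_ae(n))
            if new_out != out[n]:
                out[n] = new_out
                changed = True
    return out
```

Calling `solve_available_expressions()` should yield `{"y1*2"}` after nodes 4, 5, 6 and 9 and the empty set after every other node, which agrees with the fixed point of the simplified system.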

Dataflow analysis

Available expressions is one example of a dataflow analysis. Dataflow analysis is a global analysis framework that can be used to approximate various properties of programs.

The results of those analyses can be used to perform several optimisations, for example:

  • common sub-expression elimination, as we have seen,
  • dead-code elimination,
  • constant propagation,
  • register allocation,
  • etc.

Analysis scope

In this course, we will only consider intra-procedural dataflow analyses. That is, analyses that work on a single function at a time.

As in our example, those analyses work on the code of a function represented as a control-flow graph (CFG). The nodes of the CFG are the statements of the function. The edges of the CFG represent the flow of control: there is an edge from $n_1$ to $n_2$ if and only if control can flow immediately from $n_1$ to $n_2$. That is, if the statements of $n_1$ and $n_2$ can be executed in direct succession.

Analysis #2: Live variables

Live variable

A variable is said to be live at a given point if its value will be read later. While liveness is clearly undecidable, a conservative approximation can be computed using dataflow analysis.

This approximation can then be used, for example, to allocate registers: a set of variables that are never live at the same time can share a single register.

Intuitions

Intuitively, a variable is live after a node if it is live before any of its successors.

Moreover, a variable is live before node n if it is either read by n, or live after n and not written by n.

Finally, no variable is live after an exit node.

Equations

We associate to every node n a pair of variables $(i_n, o_n)$ that give the set of variables live when the node is entered or exited, respectively. These variables are defined as follows:

$$ i_n = \mathrm{gen}_{LV}(n) \cup (o_n \setminus \mathrm{kill}_{LV}(n)) $$

where $\mathrm{gen}_{LV}(n)$ is the set of variables read by n, and $\mathrm{kill}_{LV}(n)$ is the set of variables written by n.

$$ o_n = i_{s_1} \cup i_{s_2} \cup \ldots \cup i_{s_k} $$

where $s_1 \ldots s_k$ are the successors of n.

Substituting $o_n$ in $i_n$, we obtain the following equation for $i_n$:

$$ i_n = \mathrm{gen}_{LV}(n) \cup [(i_{s_1} \cup i_{s_2} \cup \ldots \cup i_{s_k}) \setminus \mathrm{kill}_{LV}(n)] $$
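The live-variable equations translate almost directly into a small backward solver. The sketch below is an illustration, not the course's implementation: the names `LV_CFG` and `live_variables` are made up here, the CFG is the six-node graph used as the example for this analysis (x=read-int, y=read-int, if x<y, z=x, z=y, print z), and all sets start empty because the smallest solution is wanted.

```python
# node -> (successors, variables read (gen), variables written (kill))
# Hypothetical encoding of the six-node example CFG.
LV_CFG = {
    1: ([2],    set(),      {"x"}),  # x = read-int
    2: ([3],    set(),      {"y"}),  # y = read-int
    3: ([4, 5], {"x", "y"}, set()),  # if x < y
    4: ([6],    {"x"},      {"z"}),  # z = x
    5: ([6],    {"y"},      {"z"}),  # z = y
    6: ([],     {"z"},      set()),  # print z (exit node)
}

def live_variables(cfg):
    """Return, for every node, the set of variables live on entry to it."""
    live_in = {n: set() for n in cfg}        # start from the smallest sets
    changed = True
    while changed:
        changed = False
        # Visiting nodes in reverse order suits a backward analysis,
        # but any order reaches the same fixed point.
        for n in sorted(cfg, reverse=True):
            succs, gen, kill = cfg[n]
            # Union over successors; empty for an exit node.
            live_out = set().union(*(live_in[s] for s in succs))
            new_in = gen | (live_out - kill)
            if new_in != live_in[n]:
                live_in[n] = new_in
                changed = True
    return live_in
```

For this CFG, `live_variables(LV_CFG)` should return the empty set for node 1, `{"x"}` for node 2, `{"x", "y"}` for node 3, `{"x"}` for node 4, `{"y"}` for node 5 and `{"z"}` for node 6.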

Equation solving

We are interested in finding the smallest sets of variables live at a given point, as the information conveyed by a set decreases as its size increases.

Therefore, to solve the equations by iteration, we initialise all sets with the empty set.

Example

CFG nodes:

    1  x=read-int
    2  y=read-int
    3  if x<y
    4  z=x
    5  z=y
    6  print z

The edges are 1→2, 2→3, 3→4, 3→5, 4→6 and 5→6. Equations (left) and their solution (right):

$$
\begin{aligned}
i_1 &= i_2 \setminus \{x\} & i_1 &= \{\} \\
i_2 &= i_3 \setminus \{y\} & i_2 &= \{x\} \\
i_3 &= \{x, y\} \cup (i_4 \cup i_5) & i_3 &= \{x, y\} \\
i_4 &= \{x\} \cup (i_6 \setminus \{z\}) & i_4 &= \{x\} \\
i_5 &= \{y\} \cup (i_6 \setminus \{z\}) & i_5 &= \{y\} \\
i_6 &= \{z\} & i_6 &= \{z\}
\end{aligned}
$$

Using live variables

The previous analysis shows that neither x nor y is live at the same time as z. Therefore, z can be replaced by x or y, thereby removing one assignment. Replacing z by y turns z=y into y=y, which can be dropped:

    original CFG       analysis result     optimised CFG
    1  x=read-int      i1 = {}             x=read-int
    2  y=read-int      i2 = {x}            y=read-int
    3  if x<y          i3 = {x, y}         if x<y
    4  z=x             i4 = {x}            y=x
    5  z=y             i5 = {y}            (removed)
    6  print z         i6 = {z}            print y

Analysis #3: Reaching definitions

Reaching definitions

The reaching definitions for a program point are the assignments that may have defined the values of variables at that point.

Dataflow analysis can approximate the set of reaching definitions for all program points. These sets can then be used to perform constant propagation, for example.

Intuitions

Intuitively, a definition reaches the beginning of a node if it reaches the exit of any of its predecessors.

Moreover, a definition contained in a node n always reaches the end of n itself.

Finally, a definition reaches the end of a node n if it reaches the beginning of n and is not killed by n itself. (A node n kills a definition d if and only if n is a definition and defines the same variable as d.)

As a first approximation, we consider that no definition reaches the beginning of the entry node.
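The intuitions for reaching definitions can be turned into a forward solver in the same style as the previous analyses. This is a sketch under several assumptions that are not taken from the slides: the equations are a direct reading of the intuitions (union over predecessors on entry; the node's own definition plus the non-killed incoming definitions on exit), all sets start empty, definitions are named after the node that contains them, and the six-node CFG of the live-variable example is reused as test input. The names `RD_CFG` and `reaching_definitions` are hypothetical.

```python
# node -> (predecessors, variable defined by the node or None)
# Hypothetical encoding; a definition is identified by its node id.
RD_CFG = {
    1: ([],     "x"),   # x = read-int (entry node)
    2: ([1],    "y"),   # y = read-int
    3: ([2],    None),  # if x < y
    4: ([3],    "z"),   # z = x
    5: ([3],    "z"),   # z = y
    6: ([4, 5], None),  # print z
}

def reaching_definitions(cfg):
    """Return (reach_in, reach_out): definitions reaching each node's entry and exit."""
    # Group definitions by the variable they define, to compute kill sets.
    defs_of = {}
    for n, (_, var) in cfg.items():
        if var is not None:
            defs_of.setdefault(var, set()).add(n)

    reach_in = {n: set() for n in cfg}    # assumption: start from empty sets
    reach_out = {n: set() for n in cfg}
    changed = True
    while changed:
        changed = False
        for n in sorted(cfg):
            preds, var = cfg[n]
            # Union over predecessors; empty for the entry node.
            new_in = set().union(*(reach_out[p] for p in preds))
            if var is None:
                new_out = set(new_in)                    # nothing generated or killed
            else:
                new_out = {n} | (new_in - defs_of[var])  # own definition survives
            if (new_in, new_out) != (reach_in[n], reach_out[n]):
                reach_in[n], reach_out[n] = new_in, new_out
                changed = True
    return reach_in, reach_out
```

On this input, both definitions of z (nodes 4 and 5) should reach the entry of node 6, together with the definitions of x and y from nodes 1 and 2.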
