Static Analysis
Compilation 2007
Michael I. Schwartzbach
BRICS, University of Aarhus

Interesting Questions
• Is every statement reachable?
• Does every non-void method return a value?
• Will local variables definitely be assigned before they are read?
• Will the current value of a variable ever be read?
• ...
• How much heap space will the program need?
• Does the program always terminate?
• Will the output always be correct?

Rice's Theorem
Theorem 11.9 (Martin, p. 420)
If R is a property of languages that is satisfied by some but not all recursively enumerable languages, then the decision problem
  P_R: Given a TM T, does L(T) have property R?
is unsolvable.

Rice's Theorem Explained
"Every interesting question about the behavior of a program is undecidable."
Static Analysis
• Static analysis provides approximate answers to interesting questions about programs
• The approximation is conservative, meaning that some answers are guaranteed to be true
• Compilers spend most of their time performing static analysis so they may:
  • understand the semantics of programs
  • provide safety guarantees
  • generate efficient code

Conservative Approximation
• A typical scenario for a boolean property:
  • if the analysis says yes, the property definitely holds
  • if it says no, the property may or may not hold
  • only the yes answer will help the compiler
  • a trivial analysis will say no always
  • the engineering challenge is to say yes often enough
• For other kinds of properties, the notion of approximation may be more subtle

A Range of Static Analyses
• Static analysis may take place:
  • at the source code level
  • at some intermediate level
  • at the machine code level
• Static analysis may look at:
  • statement blocks only
  • an entire method (intraprocedural)
  • the whole program (interprocedural)
• The precision and cost both rise as we include more information

The Phases of GCC (1/2)
  Parsing
  Tree optimization
  RTL generation
  Sibling call optimization
  Jump optimization
  Register scan
  Jump threading
  Common subexpression elimination
  Loop optimizations
  Jump bypassing
  Data flow analysis
  Instruction combination
  If-conversion
  Register movement
  Instruction scheduling
  Register allocation
  Basic block reordering
  Delayed branch scheduling
  Branch shortening
  Assembly output
  Debugging output
The Phases of GCC (2/2)
  (same phase list as in part 1/2)
Static analysis uses 60% of the compilation time

Reachability Analysis
• Java requires two reachability guarantees:
  • all statements must be reachable (avoid dead code)
  • all non-void methods must return a value
• These are interesting properties and thus they are undecidable
• But a static analysis may provide conservative approximations
• To ensure that different compilers accept the same programs, the Java language specification mandates a specific static analysis

Constraint-Based Analysis
• For every node S that represents a statement in the AST, we define two boolean properties:
  • C[[S]] denotes that S may complete normally
  • R[[S]] denotes that S is possibly reachable
• A statement may only complete if it is reachable
• For each syntactic kind of statement, we generate constraints that relate C[[...]] and R[[...]]

Information Flow
• The values of R[[...]] are inherited
• The values of C[[...]] are synthesized
[Figure: AST, with R passed down the tree and C passed up]
Reachability Constraints (1/3)
while(E) S:
  R[[S]] = R[[while(E) S]]
  C[[while(E) S]] = R[[while(E) S]]
return:
  C[[return]] = false
return E:
  C[[return E]] = false
throw E:
  C[[throw E]] = false
{ σ x; S }:
  R[[S]] = R[[{ σ x; S }]]
  C[[{ σ x; S }]] = C[[S]]

Reachability Constraints (2/3)
if(E) S:
  R[[S]] = R[[if(E) S]]
  C[[if(E) S]] = R[[if(E) S]]
if(E) S1 else S2:
  R[[Si]] = R[[if(E) S1 else S2]]
  C[[if(E) S1 else S2]] = C[[S1]] ∨ C[[S2]]
while(true) S:
  R[[S]] = R[[while(true) S]]
  C[[while(true) S]] = false
while(false) S:
  R[[S]] = false
  C[[while(false) S]] = R[[while(false) S]]

Reachability Constraints (3/3)
S1 S2:
  R[[S1]] = R[[S1 S2]]
  R[[S2]] = C[[S1]]
  C[[S1 S2]] = C[[S2]]
for any simple statement S:
  C[[S]] = R[[S]]
for any method or constructor body { S }:
  R[[S]] = true

Exploiting the Information
• For any statement S where R[[S]] = false:
    unreachable statement
• For any non-void method with body { S } where C[[S]] = true:
    missing return statement
• These guarantees are sound but conservative
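The constraints above can be solved in a single recursive pass, since R flows down and C flows up. The following is a minimal sketch, not the course's implementation: the `Reachability` class, its `Stmt` node, the `Kind` tags, and the constructor helpers are all hypothetical names introduced for illustration. As a simplification, it does not report a sequence node itself as unreachable, only its members.

```java
import java.util.ArrayList;
import java.util.List;

final class Reachability {
    enum Kind { SIMPLE, RETURN, THROW, IF, IF_ELSE, WHILE, WHILE_TRUE, WHILE_FALSE, SEQ, DECL }

    // Hypothetical mini-AST node: a kind tag, a label for messages, up to two children.
    static final class Stmt {
        final Kind kind; final String label; final Stmt s1, s2;
        Stmt(Kind kind, String label, Stmt s1, Stmt s2) {
            this.kind = kind; this.label = label; this.s1 = s1; this.s2 = s2;
        }
    }

    // Illustrative constructors for the statement kinds used in the examples.
    static Stmt simple(String label) { return new Stmt(Kind.SIMPLE, label, null, null); }
    static Stmt ret(String label) { return new Stmt(Kind.RETURN, label, null, null); }
    static Stmt ifThen(Stmt s) { return new Stmt(Kind.IF, "if", s, null); }
    static Stmt ifElse(Stmt a, Stmt b) { return new Stmt(Kind.IF_ELSE, "if-else", a, b); }
    static Stmt whileTrue(Stmt s) { return new Stmt(Kind.WHILE_TRUE, "while(true)", s, null); }
    static Stmt seq(Stmt a, Stmt b) { return new Stmt(Kind.SEQ, "seq", a, b); }

    // Given R[[s]] = r, pushes R down to the children and returns C[[s]].
    static boolean completes(Stmt s, boolean r, List<String> errors) {
        if (!r && s.kind != Kind.SEQ) errors.add("unreachable statement: " + s.label);
        switch (s.kind) {
            case RETURN:
            case THROW:          // C = false
                return false;
            case IF:
            case WHILE:          // R[[S]] = R, C = R
                completes(s.s1, r, errors);
                return r;
            case IF_ELSE: {      // C = C[[S1]] ∨ C[[S2]]
                boolean c1 = completes(s.s1, r, errors);
                boolean c2 = completes(s.s2, r, errors);
                return c1 || c2;
            }
            case WHILE_TRUE:     // R[[S]] = R, C = false
                completes(s.s1, r, errors);
                return false;
            case WHILE_FALSE:    // R[[S]] = false, C = R
                completes(s.s1, false, errors);
                return r;
            case SEQ:            // R[[S2]] = C[[S1]], C = C[[S2]]
                return completes(s.s2, completes(s.s1, r, errors), errors);
            case DECL:           // { σ x; S }: C = C[[S]]
                return completes(s.s1, r, errors);
            default:             // simple statement: C = R
                return r;
        }
    }

    // A method body starts with R = true; a non-void body must not complete normally.
    static List<String> checkMethod(Stmt body, boolean nonVoid) {
        List<String> errors = new ArrayList<>();
        boolean c = completes(body, true, errors);
        if (nonVoid && c) errors.add("missing return statement");
        return errors;
    }

    public static void main(String[] args) {
        // while(true) a; b;   in a void method
        System.out.println(checkMethod(seq(whileTrue(simple("a")), simple("b")), false));
        // if(E) return ...; if(E') return ...;   in a non-void method
        System.out.println(checkMethod(seq(ifThen(ret("r1")), ifThen(ret("r2"))), true));
    }
}
```

The first call reports `b` as unreachable because C[[while(true) a]] = false becomes R[[b]]. The second reports a missing return because each `if` without `else` completes whenever it is reachable.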
Approximations
• C[[S]] may be true too often:
    some unfair missing return errors may occur
      if (b) return 17;
      if (!b) return 42;
• R[[S]] may be true too often:
    some dead code is not detected
      if (b==!b) { ... }

Definite Assignment Analysis
• Java requires that a local variable is assigned before its value is used
• This is an interesting property and thus it is undecidable
• But a static analysis may provide a conservative approximation
• To ensure that different compilers accept the same programs, the Java language specification mandates a specific static analysis

Constraint-Based Analysis
• For every node S that represents a statement in the AST, we define some set-valued properties:
  • B[[S]] denotes the variables that are definitely assigned before S is executed
  • A[[S]] denotes the variables that are definitely assigned after S is executed
• For every node E that represents an expression in the AST, we similarly define B[[E]] and A[[E]]

Increased Precision
• To handle cases such as:
    { int k;
      if (a>0 && (k=System.in.read())>0) System.out.print(k);
    }
  we also use two refinements of A[[...]]:
  • A_t[[E]] which assumes that E evaluates to true
  • A_f[[E]] which assumes that E evaluates to false
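As a small illustration of how these approximations look from the programmer's side, here is a sketch of two methods that a Java compiler accepts. The class and method names are hypothetical. `pick` rewrites the two-`if` example as an `if`/`else`, so C is false for the body and no missing-return error is raised. `firstPositive` follows the shape of the `&&` example: `k` is only read where the condition is known to be true.

```java
final class Approximations {
    // With two separate ifs the analysis reports "missing return statement";
    // with if/else, neither branch completes normally, so the body does not either.
    static int pick(boolean b) {
        if (b) return 17;
        else return 42;
    }

    // k is assigned only in the right operand of &&, but it is read only
    // when the whole condition is true, which is what A_t captures.
    static int firstPositive(int a) {
        int k;
        if (a > 0 && (k = a * 2) > 0) return k;
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(pick(true));
        System.out.println(firstPositive(3));
    }
}
```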
Information Flow
• The values of B[[...]] are inherited
• The values of A[[...]], A_t[[...]] and A_f[[...]] are synthesized
[Figure: AST, with B passed down the tree and A, A_t, A_f passed up]

Definite Assignment Constraints (1/7)
if(E) S:
  B[[E]] = B[[if(E) S]]
  B[[S]] = A_t[[E]]
  A[[if(E) S]] = A[[S]] ∩ A_f[[E]]
if(E) S1 else S2:
  B[[E]] = B[[if(E) S1 else S2]]
  B[[S1]] = A_t[[E]]
  B[[S2]] = A_f[[E]]
  A[[if(E) S1 else S2]] = A[[S1]] ∩ A[[S2]]

Definite Assignment Constraints (2/7)
(∞ denotes the set of all variables in scope)
while(E) S:
  B[[E]] = B[[while(E) S]]
  B[[S]] = A_t[[E]]
  A[[while(E) S]] = A_f[[E]]
return:
  A[[return]] = ∞
return E:
  B[[E]] = B[[return E]]
  A[[return E]] = ∞
throw E:
  B[[E]] = B[[throw E]]
  A[[throw E]] = ∞

Definite Assignment Constraints (3/7)
E;:
  B[[E]] = B[[E;]]
  A[[E;]] = A[[E]]
{ σ x = E; S }:
  B[[E]] = B[[{ σ x = E; S }]]
  B[[S]] = A[[E]] ∪ {x}
  A[[{ σ x = E; S }]] = A[[S]]
{ σ x; S }:
  B[[S]] = B[[{ σ x; S }]]
  A[[{ σ x; S }]] = A[[S]]
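The statement constraints above can be evaluated the same way as the reachability ones: pass B down, return A. The sketch below makes several assumptions that are not taken from the slides. Expressions are modeled only by the locals they read and assign, with A_t[[E]] = A_f[[E]] = A[[E]] (no boolean operators). Sequencing uses the assumed rule B[[S2]] = A[[S1]]. `throw` is omitted because it mirrors `return E`. All class, method, and helper names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class DefiniteAssignment {
    // Hypothetical expression model: the locals an expression reads and assigns.
    static final class Expr {
        final Set<String> reads, writes;
        Expr(String reads, String writes) { this.reads = vars(reads); this.writes = vars(writes); }
    }

    enum Kind { EXPR, IF, IF_ELSE, WHILE, RETURN, DECL, DECL_INIT, SEQ }

    static final class Stmt {
        final Kind kind; final Expr e; final String x; final Stmt s1, s2;
        Stmt(Kind kind, Expr e, String x, Stmt s1, Stmt s2) {
            this.kind = kind; this.e = e; this.x = x; this.s1 = s1; this.s2 = s2;
        }
    }

    // Illustrative constructors.
    static Expr reads(String vs) { return new Expr(vs, ""); }
    static Expr assigns(String vs) { return new Expr("", vs); }
    static Stmt eval(Expr e) { return new Stmt(Kind.EXPR, e, null, null, null); }
    static Stmt ifThen(Expr e, Stmt s) { return new Stmt(Kind.IF, e, null, s, null); }
    static Stmt ifElse(Expr e, Stmt a, Stmt b) { return new Stmt(Kind.IF_ELSE, e, null, a, b); }
    static Stmt whileLoop(Expr e, Stmt s) { return new Stmt(Kind.WHILE, e, null, s, null); }
    static Stmt ret() { return new Stmt(Kind.RETURN, null, null, null, null); }
    static Stmt decl(String x, Stmt s) { return new Stmt(Kind.DECL, null, x, s, null); }
    static Stmt declInit(String x, Expr e, Stmt s) { return new Stmt(Kind.DECL_INIT, e, x, s, null); }
    static Stmt seq(Stmt a, Stmt b) { return new Stmt(Kind.SEQ, null, null, a, b); }

    static Set<String> vars(String s) {   // "a b" -> {a, b}; "" -> {}
        Set<String> r = new TreeSet<>();
        for (String v : s.trim().split("\\s+")) if (!v.isEmpty()) r.add(v);
        return r;
    }
    static Set<String> plus(Set<String> s, String x) {
        Set<String> r = new TreeSet<>(s); r.add(x); return r;
    }
    static Set<String> intersect(Set<String> a, Set<String> b) {
        Set<String> r = new TreeSet<>(a); r.retainAll(b); return r;
    }

    // Given B[[E]] = b, reports reads of unassigned locals and returns A[[E]].
    static Set<String> afterExpr(Expr e, Set<String> b, List<String> errors) {
        for (String v : e.reads)
            if (!b.contains(v)) errors.add("variable " + v + " might not have been initialized");
        Set<String> a = new TreeSet<>(b);
        a.addAll(e.writes);
        return a;
    }

    // Given B[[S]] = b and the variables in scope, returns A[[S]].
    static Set<String> afterStmt(Stmt s, Set<String> b, Set<String> scope, List<String> errors) {
        switch (s.kind) {
            case EXPR:        // A[[E;]] = A[[E]]
                return afterExpr(s.e, b, errors);
            case IF: {        // A[[if(E) S]] = A[[S]] ∩ A_f[[E]]
                Set<String> ae = afterExpr(s.e, b, errors);
                return intersect(afterStmt(s.s1, ae, scope, errors), ae);
            }
            case IF_ELSE: {   // A = A[[S1]] ∩ A[[S2]]
                Set<String> ae = afterExpr(s.e, b, errors);
                return intersect(afterStmt(s.s1, ae, scope, errors), afterStmt(s.s2, ae, scope, errors));
            }
            case WHILE: {     // A[[while(E) S]] = A_f[[E]]; the body is checked for errors only
                Set<String> ae = afterExpr(s.e, b, errors);
                afterStmt(s.s1, ae, scope, errors);
                return ae;
            }
            case RETURN:      // A = ∞, the set of all variables in scope; e is null for a plain return
                if (s.e != null) afterExpr(s.e, b, errors);
                return new TreeSet<>(scope);
            case DECL:        // B[[S]] = B[[{ σ x; S }]]
                return afterStmt(s.s1, b, plus(scope, s.x), errors);
            case DECL_INIT:   // B[[S]] = A[[E]] ∪ {x}
                return afterStmt(s.s1, plus(afterExpr(s.e, b, errors), s.x), plus(scope, s.x), errors);
            default:          // SEQ, assumed rule: B[[S2]] = A[[S1]]
                return afterStmt(s.s2, afterStmt(s.s1, b, scope, errors), scope, errors);
        }
    }

    // Parameters count as assigned and in scope on entry.
    static List<String> check(Stmt body, String params) {
        List<String> errors = new ArrayList<>();
        afterStmt(body, vars(params), vars(params), errors);
        return errors;
    }

    public static void main(String[] args) {
        Stmt use = eval(reads("k"));
        // { int k; if (a) k = ...; use(k); }
        System.out.println(check(decl("k", seq(ifThen(reads("a"), eval(assigns("k"))), use)), "a"));
        // { int k; if (a) k = ...; else return; use(k); }
        System.out.println(check(decl("k", seq(ifElse(reads("a"), eval(assigns("k")), ret()), use)), "a"));
    }
}
```

In the second example the `else return;` branch contributes ∞, so the intersection keeps `k` and the later read is accepted. Representing ∞ by the current scope set, rather than a special marker, keeps the intersection code uniform.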