Datalog CO444H Pointer analysis Ben Livshits
Call Graphs
• Class analysis:
  • Given a reference variable x, what are the classes of the objects that x refers to at runtime?
  • We saw CHA and RTA
  • Deal with polymorphic/virtual calls: x.m()
• Compilers: can we devirtualize a virtual call x.m()?
• Software engineering:
  • Construct the call graph of the program
  • Why is that important in everyday development?
Features of RTA
• RTA may evaluate a method several times
• If new callers are discovered, the method has to be re-evaluated
• RTA runs until the worklist is empty, at which point it has reached a fixed point and cannot resolve any new call edges to add to the call graph
RTA Revisited
RAPID TYPE ANALYSIS
  RTA = call graph of only methods (no edges)
  CHA = class hierarchy analysis call graph
  W = worklist containing the main method
  while W is not empty
    M = next method in W
    T = set of allocated types in M
    T = T ∪ {allocated types in RTA callers of M}
    for each callsite (C) in M
      if C is a static dispatch or constructor:
        add an edge to statically resolved method
      otherwise:
        M' = methods called from M in CHA
        M' = M' intersection {methods declared in T or supertypes of T}
        add an edge from the method M to each method in M'
        add each method in M' to worklist W
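The pseudocode above can be sketched in Python. This is a minimal illustration, not the course's implementation: the dictionary-based program model (`methods`, `cha_targets`, `supertypes`), the qualified names such as `"A.toString"`, and the `rta` function itself are all assumptions made for the example. A callee is put back on the worklist only when a new edge is found, which is one way to realise "re-evaluate when new callers are discovered" while guaranteeing termination.

```python
from collections import deque

def declaring_type(method):
    # Assumed naming convention: "Type.method" for virtual targets.
    return method.split(".")[0]

def rta(methods, cha_targets, supertypes, main="main"):
    """methods: name -> {"allocs": set of types, "calls": list of call sites}.
    A call site is ("static", target) or ("virtual", site_id).
    cha_targets: site_id -> set of candidate methods from CHA.
    supertypes: type -> set of its supertypes."""
    edges = set()
    callers = {}
    empty = {"allocs": set(), "calls": []}
    worklist = deque([main])
    while worklist:
        m = worklist.popleft()
        body = methods.get(m, empty)
        # T = types allocated in M plus types allocated in M's RTA callers
        types = set(body["allocs"])
        for caller in callers.get(m, ()):
            types |= methods.get(caller, empty)["allocs"]
        allowed = set(types)
        for t in types:
            allowed |= supertypes.get(t, set())
        for kind, target in body["calls"]:
            if kind == "static":
                targets = {target}
            else:
                # keep only CHA targets declared in T or a supertype of T
                targets = {c for c in cha_targets[target]
                           if declaring_type(c) in allowed}
            for callee in targets:
                if (m, callee) not in edges:
                    edges.add((m, callee))
                    callers.setdefault(callee, set()).add(m)
                    worklist.append(callee)  # new caller: (re-)evaluate
    return edges

# Hypothetical program: main calls foo (which allocates A) and bar
# (which makes a virtual toString call on its parameter).
methods = {
    "main": {"allocs": set(), "calls": [("static", "foo"), ("static", "bar")]},
    "foo": {"allocs": {"A"}, "calls": []},
    "bar": {"allocs": set(), "calls": [("virtual", "site1")]},
}
cha_targets = {"site1": {"Object.toString", "A.toString"}}
supertypes = {"A": {"Object"}}
edges = rta(methods, cha_targets, supertypes)
```

In this model neither `bar` nor its caller `main` allocates `A`, so the virtual call in `bar` resolves to no target under this variant of RTA.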
Using RTA in Eclipse
RTA May Be Unsound
  public static void main(String[] args) {
    Object o = foo();
    bar(o);
  }

  public static Object foo() {
    return new A();
  }

  public static void bar(Object o) {
    o.toString();
  }
• main calls foo, which returns an allocation of type A that is then passed as a parameter in the call to bar
• The call edge to A.toString would be missing because neither bar nor its parents (main) allocated a type of A
Call Graph Construction: Reachability Computation
  Queue worklist;
  CallGraph graph;
  worklist.addAtTail( main() )
  graph.addNode( main() )
  while (worklist.notEmpty()) {
    m = worklist.getFromHead();
    process_method_body(m);
  }
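A minimal Python sketch of this reachability loop follows. It assumes that call targets are already resolved and given as a dictionary `callees_of` (a hypothetical stand-in for `process_method_body`); the function name `build_call_graph` is also an assumption for illustration.

```python
from collections import deque

def build_call_graph(callees_of, entry="main"):
    """callees_of: method -> iterable of methods its body may call."""
    nodes = {entry}
    edges = set()
    worklist = deque([entry])
    while worklist:
        m = worklist.popleft()
        # Stand-in for process_method_body(m): record each call edge and
        # enqueue callees that have not been reached before.
        for callee in callees_of.get(m, ()):
            edges.add((m, callee))
            if callee not in nodes:
                nodes.add(callee)
                worklist.append(callee)
    return nodes, edges

# Hypothetical program; "unused" is never reached from main.
callees_of = {
    "main": ["foo", "bar"],
    "bar": ["baz", "main"],
    "unused": ["foo"],
}
nodes, edges = build_call_graph(callees_of)
```

Methods that are never reached from the entry point contribute neither nodes nor edges, which is the point of computing the call graph by reachability.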
Next Steps…
• Ingredients
  • Adding pointers
  • Adding call graphs
  • Combining those two
• How do we mix the ingredients?
  • Can first build a call graph; then add pointers
  • Can do it all at once: we can use Datalog to represent everything, with some Datalog relations encoding intraprocedural aspects and some interprocedural
Pointer Analysis: Basics and Algorithms
Variants of Pointer Analysis
• For C:
  • Andersen analysis
  • Steensgaard analysis
• Pointer analysis for Java
• How to encode these in Datalog
• Other variants
What is the Goal of Pointer Analysis?
• What memory locations can a pointer expression refer to?
• Alias analysis: When do two pointer expressions refer to the same storage location?
    int x;
    p = &x;
    q = p;
  • *p and *q alias
  • as do x and *p
  • and x and *q
Sources of Aliasing
• Aliasing can arise due to several reasons, depending on the language…
• Pointers
  • e.g., int *p, i; p = &i;
• Call-by-reference
    void m(Object a, Object b) { … }
    m(x,x); // a and b alias in m
• Array indexing
  • int i, j, a[100];
  • i = j; // a[i] and a[j] alias
Why do we Want to Know?
• Pointer analysis tells us what memory locations code uses or modifies
• Useful in many analyses
• E.g., available expressions
    *p = a + b;
    y = a + b;
  • If *p aliases a or b, then second computation of a+b is not redundant
• E.g., consider constant propagation
    x = 3;
    *p = 4;
    y = x;
  • Is y constant?
  • If *p and x do not alias each other, then yes.
  • If *p and x always alias each other, then yes.
  • If *p and x sometimes alias each other, then no
Pointer Analysis Dimensions
• Intraprocedural / interprocedural
• Flow-sensitive / flow-insensitive
• Context-sensitive / context-insensitive
• Definiteness: May versus must
• Heap modelling
• Data representation
Flow-sensitive vs. Flow-insensitive Points-To
• Flow-sensitive pointer analysis computes for each program point what memory locations pointer expressions may refer to
• Flow-insensitive pointer analysis computes what memory locations pointer expressions may refer to, at any time in program execution
• Flow-sensitive pointer analysis is (traditionally) too expensive to perform for whole program
• Flow-insensitive pointer analyses typically used for whole program analyses
Context Sensitivity
• Also difficult, but success in scaling up to hundreds of thousands LOC
• BDDs: see Whaley and Lam, PLDI 2004
• Doop: Bravenboer and Smaragdakis, OOPSLA 2009
May vs. Must
• May analysis: aliasing that may occur during execution
  • (cf. must-not alias, although often has different representation)
• Must analysis: aliasing that must occur during execution
• Sometimes both are useful
  • E.g., consider liveness analysis for *p = *q + 4;
  • If *p must alias x, then x in kill set for statement
  • If *q may alias y, then y in gen set for statement
Representation Options
• Points-to pairs: first element points to the second
  • e.g., (p → b), (q → b)
  • *p and b alias, as do *q and b, as do *p and *q
• Pairs that refer to the same memory
  • e.g., (*p,b), (*q,b), (*p,*q), (**r, b)
  • General, may be less concise than points-to pairs
• Equivalence sets: sets that are aliases
  • e.g., {*p,*q,b}
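To relate the first two representations, here is a small Python sketch that expands points-to pairs into alias pairs. The function name `alias_pairs` and the dictionary encoding of points-to pairs are assumptions made for illustration; only one level of dereference is handled.

```python
from itertools import combinations

def alias_pairs(points_to):
    """points_to: pointer name -> set of variables it may point to."""
    pairs = set()
    # (p -> b) means the expression *p and the variable b alias.
    for p, targets in points_to.items():
        for t in targets:
            pairs.add(("*" + p, t))
    # *p and *q alias when p and q share at least one target.
    for p, q in combinations(sorted(points_to), 2):
        if points_to[p] & points_to[q]:
            pairs.add(("*" + p, "*" + q))
    return pairs

pairs = alias_pairs({"p": {"b"}, "q": {"b"}})
```

Two points-to pairs expand into three alias pairs here, which is one reason the alias-pair form may be less concise.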
Modeling Memory Locations
• We want to describe what memory locations a pointer expression may refer to
• How do we model memory locations?
  • For global variables, no trouble, use a single “node”
  • For local variables, use a single “node” per context
    • i.e., just one node if context insensitive
  • For dynamically allocated memory
    • Problem: Potentially unbounded locations created at runtime
    • Need to model locations with some finite abstraction
Modeling Dynamic Memory Locations
• For each allocation statement, use one node per context
  • Note: could choose context-sensitivity for modelling heap locations to be less precise than context-sensitivity for modelling procedure invocation
• Other solutions:
  • One node for entire heap
  • One node for each type
  • Nodes based on analysis of “shape” of heap
Problem Statement
• Let’s consider flow-insensitive may pointer analysis
• Assume program consists of statements of form
    p = &a   (address of, includes allocation statements)
    p = q
    *p = q
    p = *q
• Assume pointers p,q ∈ P and address-taken variables a,b ∈ A are disjoint
  • Can transform program to make this true
  • For any variable v for which this isn’t true, add statement p_v = &a_v, and replace v with *p_v
• Want to compute relation pts : P ∪ A → 2^A
  • Essentially points-to pairs
Andersen-style Pointer Analysis
• View pointer assignments as subset constraints
• Use constraints to propagate points-to information
• Called inclusion-based pointer analysis
Andersen-style Pointer Analysis
• Can solve these constraints directly on sets pts(p)
  • p = &a;   p ⊇ {a}
  • q = p;    q ⊇ p
  • p = &b;   p ⊇ {b}
  • r = p;    r ⊇ p
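The four constraints above can be solved by naive iteration to a fixed point. The sketch below is an illustration under assumed names (`solve`, `base`, `simple`), handling only address-of and copy statements.

```python
def solve(base, simple):
    """base: list of (p, a) for p = &a, giving pts(p) ⊇ {a}.
    simple: list of (dst, src) for dst = src, giving pts(dst) ⊇ pts(src)."""
    pts = {}
    for p, a in base:
        pts.setdefault(p, set()).add(a)
    changed = True
    while changed:  # repeat until no points-to set grows
        changed = False
        for dst, src in simple:
            before = len(pts.setdefault(dst, set()))
            pts[dst] |= pts.get(src, set())
            if len(pts[dst]) != before:
                changed = True
    return pts

# p = &a; q = p; p = &b; r = p
pts = solve(base=[("p", "a"), ("p", "b")], simple=[("q", "p"), ("r", "p")])
```

Because the analysis is flow-insensitive, q ends up with b in its points-to set even though `q = p` appears before `p = &b` in the program.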
Example of Subset Constraints
How Precise Is This Analysis?
Andersen-style as Graph Closure
• Can be cast as a graph closure problem
• One node for each pts(p), pts(a)
• Each node has an associated points-to set
• Compute transitive closure of graph, and add edges according to complex constraints
Work List Algorithm
• Initialize graph and points-to sets using base and simple constraints
• Let W = { v | pts(v) ≠ ∅ } (all nodes with non-empty points-to sets)
• While W not empty
  • v ← select from W
  • for each a ∈ pts(v) do
    • for each constraint p ⊇ *v
      • add edge a→p, and add a to W if edge is new
    • for each constraint *v ⊇ q
      • add edge q→a, and add q to W if edge is new
  • for each edge v→q do
    • pts(q) = pts(q) ∪ pts(v), and add q to W if pts(q) changed
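The worklist algorithm can be sketched in Python as follows. This is an illustrative sketch, not a tuned implementation: the function name `andersen` and the tuple encoding of the four statement forms are assumptions, and the complex constraints are scanned linearly for clarity rather than indexed.

```python
from collections import defaultdict, deque

def andersen(base, simple, loads, stores):
    """base:   (p, a) for p = &a
    simple: (p, q) for p = q
    loads:  (p, q) for p = *q
    stores: (p, q) for *p = q"""
    pts = defaultdict(set)
    succ = defaultdict(set)  # edge x -> y means pts(y) ⊇ pts(x)
    for p, a in base:
        pts[p].add(a)
    for p, q in simple:
        succ[q].add(p)
    work = deque(list(pts))  # all nodes with non-empty points-to sets
    while work:
        v = work.popleft()
        for a in list(pts[v]):
            for dst, ptr in loads:       # constraint dst ⊇ *v
                if ptr == v and dst not in succ[a]:
                    succ[a].add(dst)
                    work.append(a)
            for ptr, src in stores:      # constraint *v ⊇ src
                if ptr == v and a not in succ[src]:
                    succ[src].add(a)
                    work.append(src)
        for q in list(succ[v]):          # propagate along edges v -> q
            if not pts[v] <= pts[q]:
                pts[q] |= pts[v]
                work.append(q)
    return {v: s for v, s in pts.items() if s}

# Hypothetical program: p = &a; q = &b; *p = q; r = *p
result = andersen(
    base=[("p", "a"), ("q", "b")],
    simple=[],
    loads=[("r", "p")],
    stores=[("p", "q")],
)
```

The store `*p = q` adds the edge q→a once a is known to be in pts(p), and the load `r = *p` adds a→r, so b flows from q through a into r.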
Same Example, as A Graph (Initial)
W: p q r s a
Same Example, as A Graph (Final)
W: {}
Cycle Elimination
• Andersen-style pointer analysis is O(n^3), for number of nodes in graph
  • Actually, quadratic in practice [Sridharan and Fink, SAS 09]
  • Improve scalability by reducing the value of n
• Cycle elimination: important optimization for Andersen-style analysis
  • Detect strongly connected components in points-to graph, collapse to single node
  • Why? All nodes in an SCC will have same points-to relation at end of analysis
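As an illustration of the collapsing step, the sketch below finds strongly connected components with Tarjan's algorithm and rewrites the edges between representatives. It is an offline, recursive version intended for small graphs; the function names and the `succ` dictionary encoding are assumptions, and practical solvers typically detect cycles while the analysis runs.

```python
def scc_representatives(succ):
    """succ: node -> set of successors. Returns node -> SCC representative."""
    index, low, rep = {}, {}, {}
    stack, on_stack = [], set()
    counter = [0]

    def visit(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in succ.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of an SCC
            while True:
                w = stack.pop()
                on_stack.discard(w)
                rep[w] = v
                if w == v:
                    break

    nodes = set(succ) | {w for ws in succ.values() for w in ws}
    for v in sorted(nodes):
        if v not in index:
            visit(v)
    return rep

def collapse(succ):
    """Merge each SCC into its representative and drop internal edges."""
    rep = scc_representatives(succ)
    merged = {}
    for v, ws in succ.items():
        for w in ws:
            if rep[v] != rep[w]:
                merged.setdefault(rep[v], set()).add(rep[w])
    return rep, merged

# Hypothetical constraint graph with the cycle p -> q -> r -> p.
rep, merged = collapse({"p": {"q"}, "q": {"r"}, "r": {"p", "s"}})
```

After collapsing, p, q and r share one node, so their common points-to set is stored and propagated once instead of three times.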
Steensgaard-style Analysis
• Also a constraint-based analysis
• Uses equality constraints instead of subset constraints
• Originally phrased as a type-inference problem
• Less precise than Andersen-style, but more scalable
Steensgaard-style Example
[Figure: equivalence classes merging step by step: (p | q | a | b | c), (p,q | a,b | c), (p,q | r | a,b | c), (p,q,s,t | r | a,b | c), (p,q,s,t,r | a,b,c)]
• All pointers end up in the same equivalence class pointing to all the locations
Implementing Steensgaard
• Can be efficiently implemented using Union-Find algorithm
• Nearly linear time: O(n α(n))
• Each statement needs to be processed just once
• Unlike Andersen’s, which is a lot more difficult to scale
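A minimal union-find sketch of a Steensgaard-style analysis is shown below. The class and method names are assumptions made for illustration, each equivalence class has at most one pointee class, and union by rank is omitted for brevity, so this version does not reach the O(n α(n)) bound quoted above.

```python
class Steensgaard:
    def __init__(self):
        self.parent = {}   # union-find parent links
        self.pointee = {}  # class representative -> a node it points to
        self.fresh = 0

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def target(self, x):
        """Representative of the class x points to (created on demand)."""
        r = self.find(x)
        if r not in self.pointee:
            self.fresh += 1
            self.pointee[r] = f"$loc{self.fresh}"  # placeholder location
        return self.find(self.pointee[r])

    def join(self, x, y):
        """Unify two classes and, recursively, the classes they point to."""
        x, y = self.find(x), self.find(y)
        if x == y:
            return
        self.parent[y] = x
        px, py = self.pointee.get(x), self.pointee.pop(y, None)
        if px is None and py is not None:
            self.pointee[x] = py
        elif px is not None and py is not None:
            self.join(px, py)

    def address_of(self, p, a):  # p = &a
        self.join(self.target(p), a)

    def copy(self, p, q):        # p = q
        self.join(self.target(p), self.target(q))

    def load(self, p, q):        # p = *q
        self.join(self.target(p), self.target(self.target(q)))

    def store(self, p, q):       # *p = q
        self.join(self.target(self.target(p)), self.target(q))

    def points_to(self, p, variables):
        t = self.target(p)
        return {v for v in variables if self.find(v) == t}

# p = &a; q = &b; p = q
s = Steensgaard()
s.address_of("p", "a")
s.address_of("q", "b")
s.copy("p", "q")
pts_p = s.points_to("p", ["a", "b", "p", "q"])
pts_q = s.points_to("q", ["a", "b", "p", "q"])
```

Each statement is processed once. The copy `p = q` unifies the classes of a and b, so q is reported as possibly pointing to a, whereas an inclusion-based analysis would keep pts(q) = {b}.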
Datalog-based Formulation of Pointer Analysis