4/5/2010

Points-To Analysis
Meeting 21, CSCI 5535, Spring 2010
Guest Lecture: Manu Sridharan

What is points-to analysis?
• Informally: analysis determining what locations (objects) pointers can point to
• Program:
    main() {
      x = &a;
      y = &b;
      z = x;
    }
• Result: pt(x) = {a}, pt(y) = {b}, pt(z) = {a}

Importance of points-to analysis
• Verification
  • Is { *x = 10 } *y = 3 { *x = 10 } valid?
  • If x and y cannot point to the same location, then yes
• Optimization
  • Can a = *x; *y = 4; b = *x be optimized to a = *x; *y = 4; b = a?
  • If x and y cannot point to the same location, then yes
• Control flow
  • Can l.add() invoke ArrayList.add()?
  • If l can point to an ArrayList object, then yes

Lecture overview
• What we'll cover
  • Definition / complexity of several points-to analysis variants
  • Andersen's analysis via CFL-reachability
  • Some issues with handling method calls
  • Refinement-based analysis
• What we'll skip (lack of time)
  • Shape analysis (Evan will cover later in semester)
  • Many other optimizations, control-flow analysis of functional languages, …

Formal definition (for C)
• Points-to analysis: given a program and two variables p and q, points-to analysis checks if p can point to q in some program execution [Chakaravarthy03]
  • malloc creates a fresh, unnamed variable
• Alias analysis: check if p1 and p2 can point to q simultaneously in some execution
• We'll focus on points-to
• For now, assume no procedure calls

Soundness and precision
• If p can point to q in some execution, a sound analysis will always report it
  • Analysis may be over-approximate, reporting that p can point to q even if it cannot
• A precise analysis is sound but not over-approximate
  • Yields the exact answer given program semantics
• Precise analysis for C is undecidable
  • Or, for any Turing-complete language
  • foo(TuringMachine M, TMInput i) { run M on i; p = &q; }
    (p can point to q only if M halts on i)
• Bottom line: to obtain decidability and efficiency, must approximate program semantics
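The difference between a sound, over-approximate result and a precise one can be illustrated with a small sketch. Everything below is an assumption made for illustration: the three-statement program, the tuple encoding of statements, and the helper names `exact_facts` and `coarse_facts` are hypothetical and do not come from the lecture. The program has no branches, so its single run yields the exact points-to facts. The coarse analysis deliberately ignores statement order.

```python
# Hypothetical statement encoding: ("addr", p, q) means p = &q,
# and ("copy", p, r) means p = r.
PROGRAM = [
    ("addr", "x", "a"),  # x = &a
    ("copy", "y", "x"),  # y = x
    ("addr", "x", "b"),  # x = &b
]


def exact_facts(program):
    """Run the statements in order and record every (p, q) where p points to q."""
    env = {}       # pointer variable -> the one location it currently points to
    facts = set()
    for kind, lhs, rhs in program:
        target = rhs if kind == "addr" else env.get(rhs)
        if target is None:
            env.pop(lhs, None)  # copied from an uninitialized pointer
        else:
            env[lhs] = target
            facts.add((lhs, target))
    return facts


def coarse_facts(program):
    """Over-approximate: apply statements in any order, repeatedly, to a fixpoint."""
    pt = {}
    changed = True
    while changed:
        changed = False
        for kind, lhs, rhs in program:
            incoming = {rhs} if kind == "addr" else set(pt.get(rhs, ()))
            known = pt.setdefault(lhs, set())
            if not incoming <= known:
                known |= incoming
                changed = True
    return {(p, q) for p, targets in pt.items() for q in targets}


EXACT = exact_facts(PROGRAM)
COARSE = coarse_facts(PROGRAM)
SPURIOUS = COARSE - EXACT  # facts reported by the analysis that no run produces
```

On this program the coarse result contains every exact fact, so it is sound, but it also reports that y can point to b, which never happens in the actual run. That extra fact is the over-approximation.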
Approximation 1: Path Insensitivity
• Treat all branches as non-deterministic
  • Given if (c) then p; else q;, always assume either p or q can execute
  • Must still respect execution order (flow sensitive)
• Complexity
  • With dynamic memory (malloc), undecidable
    • See [Ramalingam94, Chakaravarthy03]
  • Without dynamic memory, PSPACE-complete [MD00]
    • Even with just one procedure!
• Bottom line: need to approximate more

Approximation 2: Flow Insensitivity
• Assume statements can execute in any order
  • With possible repetition
  • Assume control-flow graph is complete
• Complexity
  • With dynamic memory (malloc), decidability unknown (!)
  • Without dynamic memory, NP-hard [Horwitz97]
• Bottom line: need even more approximation

Simultaneity
• Semantics of pointer accesses
  • Pointer read: x = *y
  • Pointer write: *x = y
  [Figure: points-to diagrams for the read and the write over x, y, z, w. Note: black arrows must occur simultaneously]
• Issue: some relations cannot arise simultaneously
  • Statement set (flow insensitive): {a=&c; b=a; c=&b; b=&a; *b=c}
  • b points to c: a=&c; b=a
  • a points to b: c=&b; b=&a; *b=c
  • But not both!

Approximation 3: Andersen's
• Assumes discovered points-to relations can all occur simultaneously
  • Hence, less precise handling of pointer accesses
• Challenge: express as approximate semantics?
  • Breaks up multi-level derefs
  • **x = y becomes temp = *x, *temp = y
  • Again, imprecision due to simultaneity reasoning (**x does two derefs atomically)
  • Heap abstraction? Other? (I don't know)
• Complexity: O(N^3); much better!

ANDERSEN'S ANALYSIS IN CFL-REACHABILITY

Andersen's for Java: The Basics
• Four statement types
  • new: x = new Obj()
  • assign: x = y
  • getfield: x = y.f
  • putfield: x.f = y
• Single abstract location for each new
  • Represents objects allocated by all executions
  • For more precise treatment, shape analysis
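The four statement types above are enough to sketch an Andersen-style analysis as a plain fixpoint iteration. This is a minimal sketch under stated assumptions, not the algorithm presented in the lecture: the tuple encoding of statements, the five-statement program, and the names `andersen`, `PT`, and `HEAP` are all hypothetical. Each `new` gets one abstract location, as on the slide.

```python
from collections import defaultdict

# Hypothetical encoding of the four statement types:
#   ("new", x, o)          x = new Obj()   // abstract location o
#   ("assign", x, y)       x = y
#   ("getfield", x, y, f)  x = y.f
#   ("putfield", x, f, y)  x.f = y
PROGRAM = [
    ("new", "p", "o1"),
    ("new", "q", "o2"),
    ("assign", "r", "p"),
    ("putfield", "r", "f", "q"),
    ("getfield", "s", "p", "f"),
]


def andersen(statements):
    """Flow-insensitive points-to sets, computed by iterating to a fixpoint."""
    pt = defaultdict(set)    # variable -> abstract locations
    heap = defaultdict(set)  # (abstract location, field) -> abstract locations
    changed = True
    while changed:
        changed = False
        for stmt in statements:
            kind = stmt[0]
            if kind == "new":
                _, x, o = stmt
                updates = [(pt[x], {o})]
            elif kind == "assign":
                _, x, y = stmt
                updates = [(pt[x], pt[y])]
            elif kind == "getfield":
                _, x, y, f = stmt
                updates = [(pt[x], heap[(o, f)]) for o in list(pt[y])]
            elif kind == "putfield":
                _, x, f, y = stmt
                updates = [(heap[(o, f)], pt[y]) for o in list(pt[x])]
            else:
                raise ValueError(f"unknown statement kind: {kind}")
            for target, source in updates:
                if not source <= target:
                    target |= source  # grows the set stored in pt or heap
                    changed = True
    return pt, heap


PT, HEAP = andersen(PROGRAM)
```

Because r and p both point to o1, the write through r.f is visible to the read through p.f, so s ends up pointing to o2. Re-scanning every statement on each round is simple but not tuned for speed.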
CFL-Reachability
• Grammar: S → S S | ( S ) | ε
• A path is valid if its edge labels, read in order, form a string in the grammar's language
[Figure: example graph with labeled edges; not recoverable from the extracted text]
• Points-to analysis graph:
  • Nodes represent variables / abstract locations
  • Edges represent statements
• Points-to analysis paths:
  • flowsTo-path from o to x: o ∈ pt(x)
  • alias-path from x to y: pt(x) ∩ pt(y) ≠ ∅

More on CFL-Reachability
• Several variants
  • All-pairs: find all pairs of nodes connected by valid paths
  • Single-source: find all nodes to which source is connected by valid path
• General algorithm O(N^3)
  • N is number of nodes
  • Faster algorithms for special cases (see [RHS95])
  • Specialized algorithm needed to scale pointer analysis
• For more details, see [Reps98]

What about alias?
• Want: x alias y ⇔ ∃ o. o flowsTo x ∧ o flowsTo y
• Problem: need all edges in same direction
• Solution: alias => flowsToBar flowsTo
  • flowsToBar (flowsTo with an overbar on the slide) is the inverse of flowsTo
  • Must add inverse edges to graph (e.g., assignBar)
  • See [SB06] for full grammar

Andersen's Analysis in CFL-Reachability
    x = new Obj(); // o1
    z = new Obj(); // o2
    w = x;
    y = x;
    y.f = z;
    v = w.f;
[Figure: graph for this program. Statement edges: new, assign, pf[f], gf[f]; derived edges: flowsTo, alias]
• Without fields: flowsTo => new (assign)*
• With fields: flowsTo => new (pf[f] alias gf[f] | assign)*
  • pf[f] and gf[f] act as balanced parens

METHOD CALLS

Importance of Handling Method Calls
• Used pervasively, esp. in Java-like languages
• Often deeply related to objects and pointers
    class ArrayList {
      Object[] elems; int i;
      public ArrayList() { this.elems = new Object[10]; }   // allocation
      public void add(Object o) { this.elems[i++] = o; }    // pointer write
      public Object get(int i) { return this.elems[i]; }    // pointer read
    }
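Returning to the CFL-reachability formulation of Andersen's analysis from this part of the lecture, the sketch below runs a simple worklist algorithm on the six-statement example (o1, o2, x, z, w, y, v). It is an illustration under several assumptions, not the specialized algorithm the slides refer to. The grammar is rewritten into productions with one or two symbols on the right, the helper nonterminals `putAlias` and `fieldStep` are hypothetical names, only the single field f is modeled, and inverse `flowsToBar` facts are produced by flipping each derived `flowsTo` fact rather than through the barred productions of [SB06].

```python
from collections import deque

# Productions A -> B and A -> B C for
#   flowsTo => new (pf[f] alias gf[f] | assign)*
#   alias   => flowsToBar flowsTo
UNARY = [("flowsTo", "new")]
BINARY = [
    ("flowsTo", "flowsTo", "assign"),
    ("flowsTo", "flowsTo", "fieldStep"),
    ("alias", "flowsToBar", "flowsTo"),
    ("putAlias", "pf[f]", "alias"),       # hypothetical helper nonterminal
    ("fieldStep", "putAlias", "gf[f]"),   # hypothetical helper nonterminal
]

# Graph edges (src, label, dst) for the example program.
EDGES = [
    ("o1", "new", "x"),    # x = new Obj(); // o1
    ("o2", "new", "z"),    # z = new Obj(); // o2
    ("x", "assign", "w"),  # w = x;
    ("x", "assign", "y"),  # y = x;
    ("z", "pf[f]", "y"),   # y.f = z;
    ("w", "gf[f]", "v"),   # v = w.f;
]


def cfl_reach(edges):
    """All-pairs CFL-reachability: returns original plus derived labeled edges."""
    facts, work = set(), deque()

    def add(src, label, dst):
        if (src, label, dst) not in facts:
            facts.add((src, label, dst))
            work.append((src, label, dst))

    for edge in edges:
        add(*edge)
    while work:
        src, label, dst = work.popleft()
        if label == "flowsTo":
            add(dst, "flowsToBar", src)  # shortcut replacing barred productions
        for head, body in UNARY:
            if body == label:
                add(src, head, dst)
        for head, left, right in BINARY:
            for a, other, b in list(facts):
                if left == label and other == right and a == dst:
                    add(src, head, b)    # this fact followed by an existing one
                if right == label and other == left and b == src:
                    add(a, head, dst)    # an existing fact followed by this one
    return facts


def points_to(facts, var):
    """pt(var): all o with a flowsTo-path from o to var."""
    return {src for src, label, dst in facts if label == "flowsTo" and dst == var}


FACTS = cfl_reach(EDGES)
```

Here `points_to(FACTS, "v")` contains only o2: the path goes from o2 to z by new, across pf[f] to y, along the alias between y and w, and across gf[f] to v. Scanning all facts for each worklist item keeps the code short at the cost of efficiency.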
Precise Handling of Method Calls
• Idea: analyze as if all method calls inlined
  • Yields separate copies of local variables / new expressions for each possible call
  • Known as a context-sensitive analysis
• Problem: how to handle recursion
  • Full inlining yields an infinite program
  • But, analysis definitions still work fine!
    • Require variables p and q up front; forces choice of inlined copy
    • Flow-insensitive: find finite sequence from infinite statement set

Decidability with Context Sensitivity
• Precise path-insensitive + dynamic memory still undecidable
  • Already undecidable with just one method
• Flow-insensitive + dynamic memory + precise calls: undecidable
  • Recall that with one method, decidability unknown
  • Via small modification of [Reps00] proof
  • Even Andersen-style analysis + precise calls is undecidable (details coming up)
• No dynamic memory: not well-studied
  • Note that stack frames are a form of dynamic memory

Andersen's and Calls, Simplified
• Four statement types (ignore fields for now)
  • new: x = new Obj()
  • assign: x = y
  • call: x = m(p1, p2, …)
  • return: return x
• Idea: use balanced parentheses to match calls and returns
  • Parens labeled by call site
  • Grammar filters out unrealizable paths (method call returning to wrong site)
  • Classic use of CFL-reachability [RHS95]

Matching Calls and Returns: Example
    id(p) { return p; }
    x = new Obj(); // o1
    y = new Obj(); // o2
    a = id(x);
    b = id(y);
[Figure: graph for this program, with call and return edges labeled by call-site parentheses]
• S → S S | (_i S )_i | N | ε
• N → new | assign

Andersen's and Calls: The Details
• Must allow for partially balanced call parens
  • E.g., to handle makeObj() { return new Obj(); }
• Handle fields and calls simultaneously via intersected languages
  • Enhance N production (previous slide) to include all field accesses
  • Points-to analysis must find paths that are both S paths (for calls) and flowsTo paths (for fields)
  • Also need barred edges, etc.; details in [SB06]

Andersen's and Calls: Decidability
• Analysis requires solving reachability over intersection of two CFLs (S and flowsTo)
  • But, CFLs are not closed under intersection
  • In our case, problem is undecidable
  • Proof via reduction from PCP [Reps00]
• Standard approach for decidability: approximate recursion
  • Collapse SCCs in call graph (change (_i into assign)
  • Yields imprecise handling of recursive calls / returns
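The effect of matching call and return parentheses can be sketched on the id example from this part of the lecture. The code below is an illustrative sketch only: the edge encoding, the function `reachable`, and the `max_depth` guard are hypothetical, fields are ignored, and recursion is not handled at all (the guard simply raises an error instead of approximating). A return edge taken with an empty stack is allowed, which is one way to model partially balanced paths.

```python
# Hypothetical edge encoding: (src, label, dst). A label is "new", "assign",
# ("call", i) for an open paren at call site i, or ("ret", i) for a close paren.
EDGES = [
    ("o1", "new", "x"),       # x = new Obj(); // o1
    ("o2", "new", "y"),       # y = new Obj(); // o2
    ("x", ("call", 1), "p"),  # a = id(x): argument x flows into parameter p
    ("p", ("ret", 1), "a"),   #            returned p flows back to a
    ("y", ("call", 2), "p"),  # b = id(y): argument y flows into parameter p
    ("p", ("ret", 2), "b"),   #            returned p flows back to b
]


def reachable(edges, source, match_calls=True, max_depth=16):
    """Nodes reachable from source; with match_calls, only along realizable paths."""
    start = (source, ())          # (node, stack of pending call sites)
    seen = {start}
    work = [start]
    while work:
        node, stack = work.pop()
        for src, label, dst in edges:
            if src != node:
                continue
            new_stack = stack
            if match_calls and isinstance(label, tuple):
                kind, site = label
                if kind == "call":
                    if len(stack) >= max_depth:
                        raise ValueError("call depth bound exceeded (recursion?)")
                    new_stack = stack + (site,)
                elif stack:
                    if stack[-1] != site:
                        continue          # would return to the wrong call site
                    new_stack = stack[:-1]
                # A return with an empty stack stays allowed (partially balanced).
            state = (dst, new_stack)
            if state not in seen:
                seen.add(state)
                work.append(state)
    return {node for node, _ in seen} - {source}


MATCHED = reachable(EDGES, "o1")
UNMATCHED = reachable(EDGES, "o1", match_calls=False)
```

With matching enabled, o1 reaches a but not b, because the path entering id at call site 1 cannot leave through the return for call site 2. With matching disabled, both a and b are reported, which is the unrealizable path the grammar is meant to filter out.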