CO444H Pointer analysis, Ben Livshits

Datalog. CO444H Pointer analysis. Ben Livshits

Call Graphs
• Class analysis:
  • Given a reference variable x, what are the classes of the objects that x refers to at runtime?
  • We saw CHA and RTA
  • Deal with polymorphic/virtual calls: x.m()
• Compilers: can we devirtualize a virtual call x.m()?
• Software engineering:
  • Construct the call graph of the program
  • Why is that important in everyday development?

Features of RTA
• RTA may evaluate a method several times
• If new callers are discovered, the method has to be re-evaluated
• RTA runs until the worklist is empty, at which point it has reached a fixed point and cannot resolve any new call edges to add to the call graph

RTA Revisited: Rapid Type Analysis

    RTA = call graph of only methods (no edges)
    CHA = class hierarchy analysis call graph
    W = worklist containing the main method
    while W is not empty
        M = next method in W
        T = set of allocated types in M
        T = T U {allocated types in RTA callers of M}
        for each callsite (C) in M
            if C is a static dispatch or constructor:
                add an edge to statically resolved method
            otherwise:
                M' = methods called from M in CHA
                M' = M' intersection {methods declared in T or supertypes of T}
                add an edge from the method M to each method in M'
                add each method in M' to worklist W

Using RTA in Eclipse

RTA May Be Unsound
• main calls foo, which returns an allocation of type A that is then passed as a parameter in the call to bar
• The call edge to A.toString would be missing because neither bar nor its parents (main) allocated a type of A

    public static void main(String[] args){
        Object o = foo();
        bar(o);
    }

    public static Object foo(){
        return new A();
    }

    public static void bar(Object o){
        o.toString();
    }
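The RTA pseudocode from the "RTA Revisited" slide can be sketched in Python and run on the main/foo/bar program above. This is a minimal sketch under stated assumptions: the program model (`SUPER`, `METHODS`, the call-site tuples) is hypothetical, constructor calls are omitted, statically resolved targets are also put on the worklist, and a target is re-queued only when its call edge is new so that the loop terminates.

```python
from collections import deque

# Hypothetical program model; none of these names come from a real tool.
SUPER = {"Object": None, "A": "Object", "Main": "Object"}

# method -> (types allocated in the body, call sites)
# call site: ("static", target) or ("virtual", declared_type, method_name)
METHODS = {
    ("Main", "main"): (set(), [("static", ("Main", "foo")),
                               ("static", ("Main", "bar"))]),
    ("Main", "foo"): ({"A"}, []),
    ("Main", "bar"): (set(), [("virtual", "Object", "toString")]),
    ("Object", "toString"): (set(), []),
    ("A", "toString"): (set(), []),
}

def supertypes(cls):
    """Return cls followed by all of its ancestors."""
    out = []
    while cls is not None:
        out.append(cls)
        cls = SUPER[cls]
    return out

def cha_targets(declared, name):
    """CHA: implementations of `name` reachable from any subtype of `declared`."""
    targets = set()
    for cls in SUPER:
        if declared in supertypes(cls):
            for s in supertypes(cls):      # nearest declaration wins
                if (s, name) in METHODS:
                    targets.add((s, name))
                    break
    return targets

def rta(main):
    edges = set()
    work = deque([main])
    while work:
        m = work.popleft()
        allocs, callsites = METHODS[m]
        types = set(allocs)
        for caller, callee in list(edges):  # allocated types of RTA callers of m
            if callee == m:
                types |= METHODS[caller][0]
        allowed = {s for t in types for s in supertypes(t)}
        for site in callsites:
            if site[0] == "static":
                targets = {site[1]}
            else:
                targets = {t for t in cha_targets(site[1], site[2])
                           if t[0] in allowed}
            for t in targets:
                if (m, t) not in edges:     # assumption: re-queue only on new edges
                    edges.add((m, t))
                    work.append(t)
    return edges

edges = rta(("Main", "main"))
```

Under these assumptions `edges` contains only main→foo and main→bar: the edge from bar to A.toString is missing even though CHA would report it, which is the unsoundness described on the slide.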

Call Graph Construction: Reachability Computation

    Queue worklist;
    CallGraph graph;
    worklist.addAtTail( main() );
    graph.addNode( main() );
    while (worklist.notEmpty()) {
        m = worklist.getFromHead();
        process_method_body(m);
    }
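A minimal Python sketch of the reachability loop above. It assumes that `process_method_body` resolves every call site of the method, adds the call edges, and enqueues callees that were not reached before; the slide does not spell this out. The input table `CALLS` and the function name `build_call_graph` are hypothetical.

```python
from collections import deque

# Hypothetical input: method name -> callees already resolved per call site.
CALLS = {
    "main": ["foo", "bar"],
    "foo": [],
    "bar": ["foo"],
    "unused": ["foo"],   # never reached from main
}

def build_call_graph(entry, calls):
    nodes = {entry}
    edges = set()
    worklist = deque([entry])
    while worklist:
        m = worklist.popleft()
        # Stand-in for process_method_body(m): visit each call site of m.
        for callee in calls[m]:
            edges.add((m, callee))
            if callee not in nodes:      # newly reachable method
                nodes.add(callee)
                worklist.append(callee)
    return nodes, edges

nodes, edges = build_call_graph("main", CALLS)
```

Only methods reachable from the entry point end up in the graph, so `unused` is never processed.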

Next Steps…
• Ingredients
  • Adding pointers
  • Adding call graphs
  • Combining those two
• How do we mix the ingredients?
  • Can first build a call graph; then add pointers
  • Can do it all at once: we can use Datalog to represent everything, with some Datalog relations encoding intraprocedural aspects and some interprocedural

Pointer Analysis: Basics and Algorithms

Variants of Pointer Analysis
• For C:
  • Andersen analysis
  • Steensgaard analysis
• Pointer analysis for Java
• How to encode these in Datalog
• Other variants

What is the Goal of Pointer Analysis?
• What memory locations can a pointer expression refer to?
• Alias analysis: When do two pointer expressions refer to the same storage location?

    int x;
    p = &x;
    q = p;

• *p and *q alias
• as do x and *p
• and x and *q

Sources of Aliasing
• Aliasing can arise due to several reasons, depending on the language…
• Pointers
  • e.g., int *p, i; p = &i;
• Call-by-reference
    void m(Object a, Object b) { … }
    m(x,x); // a and b alias in m
• Array indexing
  • int i, j, a[100];
  • i = j; // a[i] and a[j] alias

Why do we Want to Know?
• Pointer analysis tells us what memory locations code uses or modifies
• Useful in many analyses
• E.g., available expressions

    *p = a + b;
    y = a + b;

  • If *p aliases a or b, then second computation of a+b is not redundant
• E.g., consider constant propagation

    x = 3;
    *p = 4;
    y = x;

  • Is y constant?
  • If *p and x do not alias each other, then yes (y = 3).
  • If *p and x always alias each other, then yes (y = 4).
  • If *p and x sometimes alias each other, then no

Pointer Analysis Dimensions
• Intraprocedural / interprocedural
• Flow-sensitive / flow-insensitive
• Context-sensitive / context-insensitive
• Definiteness: May versus must
• Heap modelling
• Data representation

Flow-sensitive vs. Flow-insensitive Points-To
• Flow-sensitive pointer analysis computes for each program point what memory locations pointer expressions may refer to
• Flow-insensitive pointer analysis computes what memory locations pointer expressions may refer to, at any time in program execution
• Flow-sensitive pointer analysis is (traditionally) too expensive to perform for whole program
• Flow-insensitive pointer analyses typically used for whole program analyses

Context Sensitivity
• Also difficult, but success in scaling up to hundreds of thousands LOC
• BDDs: see Whaley and Lam, PLDI 2004
• Doop: Bravenboer and Smaragdakis, OOPSLA 2009

May vs. Must
• May analysis: aliasing that may occur during execution
  • (cf. must-not alias, although often has different representation)
• Must analysis: aliasing that must occur during execution
• Sometimes both are useful
  • E.g., consider liveness analysis for *p = *q + 4;
  • If *p must alias x, then x in kill set for statement
  • If *q may alias y, then y in gen set for statement

Representation Options
• Points-to pairs: first element points to the second
  • e.g., (p → b), (q → b)
  • *p and b alias, as do *q and b, as do *p and *q
• Pairs that refer to the same memory
  • e.g., (*p,b), (*q,b), (*p,*q), (**r, b)
  • General, may be less concise than points-to pairs
• Equivalence sets: sets that are aliases
  • e.g., {*p,*q,b}

Modeling Memory Locations
• We want to describe what memory locations a pointer expression may refer to
• How do we model memory locations?
• For global variables, no trouble, use a single “node”
• For local variables, use a single “node” per context
  • i.e., just one node if context insensitive
• For dynamically allocated memory
  • Problem: Potentially unbounded locations created at runtime
  • Need to model locations with some finite abstraction

Modeling Dynamic Memory Locations
• For each allocation statement, use one node per context
  • Note: could choose context-sensitivity for modelling heap locations to be less precise than context-sensitivity for modelling procedure invocation
• Other solutions:
  • One node for entire heap
  • One node for each type
  • Nodes based on analysis of “shape” of heap

Problem Statement
• Let’s consider flow-insensitive may pointer analysis
• Assume program consists of statements of form
    p = &a   (address of, includes allocation statements)
    p = q
    *p = q
    p = *q
• Assume pointers p,q ∈ P and address-taken variables a,b ∈ A are disjoint
  • Can transform program to make this true
  • For any variable v for which this isn’t true, add statement pv = &av, and replace v with *pv
• Want to compute relation pts : P ∪ A → 2^A
  • Essentially points-to pairs

Andersen-style Pointer Analysis
• View pointer assignments as subset constraints
• Use constraints to propagate points-to information
• Called inclusion-based pointer analysis

Andersen-style Pointer Analysis
• Can solve these constraints directly on sets pts(p)
    p = &a;    p ⊇ {a}
    q = p;     q ⊇ p
    p = &b;    p ⊇ {b}
    r = p;     r ⊇ p
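The four constraints above can be solved by naive iteration to a fixed point. This is a minimal sketch that handles only base constraints (p ⊇ {a}) and simple constraints (q ⊇ p); the function name `solve` and its argument layout are illustrative assumptions.

```python
def solve(base, copies):
    """base: (p, a) pairs for p ⊇ {a}; copies: (dst, src) pairs for dst ⊇ src."""
    pts = {}
    for p, a in base:
        pts.setdefault(p, set()).add(a)
    changed = True
    while changed:                       # repeat until no set grows
        changed = False
        for dst, src in copies:
            before = len(pts.setdefault(dst, set()))
            pts[dst] |= pts.get(src, set())
            if len(pts[dst]) != before:
                changed = True
    return pts

# p = &a; q = p; p = &b; r = p;
pts = solve(base=[("p", "a"), ("p", "b")],
            copies=[("q", "p"), ("r", "p")])
```

Because the analysis is flow-insensitive, q ends up with {a, b} even though `q = p` is executed before `p = &b`.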

Example of Subset Constraints

How Precise Is This Analysis?

Andersen-style as Graph Closure
• Can be cast as a graph closure problem
• One node for each pts(p), pts(a)
• Each node has an associated points-to set
• Compute transitive closure of graph, and add edges according to complex constraints

Work List Algorithm
• Initialize graph and points-to sets using base and simple constraints
• Let W = { v | pts(v) ≠ ∅ } (all nodes with non-empty points-to sets)
• While W not empty
  • v ← select from W
  • for each a ∈ pts(v) do
    • for each constraint p ⊇ *v
      • add edge a→p, and add a to W if edge is new
    • for each constraint *v ⊇ q
      • add edge q→a, and add q to W if edge is new
  • for each edge v→q do
    • pts(q) = pts(q) ∪ pts(v), and add q to W if pts(q) changed
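A minimal Python sketch of the worklist algorithm, assuming the four statement forms from the problem statement. Loads (p = *v) are read as constraints p ⊇ *v and stores (*v = q) as *v ⊇ q. The function name `andersen` and the tuple encodings are hypothetical.

```python
from collections import defaultdict

def andersen(addr_of, copies, loads, stores):
    """
    addr_of: (p, a) for p = &a   gives p ⊇ {a}
    copies:  (p, q) for p = q    gives p ⊇ q   (edge q -> p)
    loads:   (p, v) for p = *v   gives p ⊇ *v
    stores:  (v, q) for *v = q   gives *v ⊇ q
    """
    pts = defaultdict(set)
    succ = defaultdict(set)   # edge x -> y means pts(y) must include pts(x)
    for p, a in addr_of:
        pts[p].add(a)
    for p, q in copies:
        succ[q].add(p)
    work = [v for v in pts if pts[v]]
    while work:
        v = work.pop()
        for a in list(pts[v]):
            for p, src in loads:              # constraint p ⊇ *v
                if src == v and p not in succ[a]:
                    succ[a].add(p)            # new edge a -> p
                    work.append(a)
            for dst, q in stores:             # constraint *v ⊇ q
                if dst == v and a not in succ[q]:
                    succ[q].add(a)            # new edge q -> a
                    work.append(q)
        for q in list(succ[v]):               # propagate along edges v -> q
            if not pts[v] <= pts[q]:
                pts[q] |= pts[v]
                work.append(q)
    return {k: s for k, s in pts.items() if s}

# p = &a; q = &b; *p = q; r = *p;
result = andersen(addr_of=[("p", "a"), ("q", "b")],
                  copies=[],
                  loads=[("r", "p")],
                  stores=[("p", "q")])
```

The store through p makes a point to b, and the load through p then copies that into r.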

Same Example, as a Graph (Initial)
W: p q r s a

Same Example, as a Graph (Final)
W: {}

Cycle Elimination
• Andersen-style pointer analysis is O(n^3), where n is the number of nodes in the graph
  • Actually, quadratic in practice [Sridharan and Fink, SAS 09]
• Improve scalability by reducing the value of n
• Cycle elimination: important optimization for Andersen-style analysis
  • Detect strongly connected components in points-to graph, collapse to single node
  • Why? All nodes in an SCC will have same points-to relation at end of analysis
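A minimal sketch of the collapsing step, assuming the subset edges are known up front and using Tarjan's algorithm to find strongly connected components. The function names are hypothetical, and the sketch ignores edges that only appear later from complex constraints.

```python
def scc_representatives(nodes, edges):
    """Map every node to a representative of its SCC (recursive Tarjan)."""
    succ = {n: [] for n in nodes}
    for src, dst in edges:
        succ[src].append(dst)
    index, low, on_stack, stack, rep = {}, {}, set(), [], {}

    def visit(n):
        index[n] = low[n] = len(index)
        stack.append(n)
        on_stack.add(n)
        for m in succ[n]:
            if m not in index:
                visit(m)
                low[n] = min(low[n], low[m])
            elif m in on_stack:
                low[n] = min(low[n], index[m])
        if low[n] == index[n]:            # n is the root of an SCC
            while True:
                m = stack.pop()
                on_stack.discard(m)
                rep[m] = n
                if m == n:
                    break

    for n in nodes:
        if n not in index:
            visit(n)
    return rep

def collapse(nodes, edges):
    """Collapse each SCC to one node and drop edges inside an SCC."""
    rep = scc_representatives(nodes, edges)
    merged = {(rep[s], rep[d]) for s, d in edges if rep[s] != rep[d]}
    return rep, merged

# p -> q -> r -> p is a cycle of subset edges; r -> s leaves it.
rep, merged = collapse(["p", "q", "r", "s"],
                       [("p", "q"), ("q", "r"), ("r", "p"), ("r", "s")])
```

After collapsing, p, q and r share one node and one points-to set, so the solver propagates through a graph of two nodes instead of four.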

Steensgaard-style Analysis
• Also a constraint-based analysis
• Uses equality constraints instead of subset constraints
• Originally phrased as a type-inference problem
• Less precise than Andersen-style, but more scalable

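Steensgaard-style Example
[Figure: equivalence classes merged step by step: {p,q} → {a,b}, {c}; then {p,q}, {r} → {a,b}, {c}; then {p,q,s,t}, {r} → {a,b}, {c}; finally {p,q,s,t,r} → {a,b,c}]
• All pointers end up in the same equivalence class pointing to all the locations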

Implementing Steensgaard
• Can be efficiently implemented using the union-find algorithm
• Nearly linear time: O(n α(n))
• Each statement needs to be processed just once
• Unlike Andersen’s, which is a lot more difficult to scale
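A minimal union-find sketch of a Steensgaard-style analysis, assuming the four statement forms from the problem statement. Each equivalence class has at most one target class, and joining two classes also joins their targets. The class name, method names and the placeholder `("tmp", n)` nodes are hypothetical.

```python
import itertools

class Steensgaard:
    def __init__(self):
        self.parent, self.rank = {}, {}
        self.target = {}                 # class representative -> pointed-to node
        self._fresh = itertools.count()

    def find(self, x):
        self.parent.setdefault(x, x)
        self.rank.setdefault(x, 0)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]   # path halving
            x = self.parent[x]
        return x

    def pointee(self, x):
        """Class that x points to; a placeholder class is created on demand."""
        r = self.find(x)
        if r not in self.target:
            self.target[r] = self.find(("tmp", next(self._fresh)))
        return self.find(self.target[r])

    def join(self, x, y):
        x, y = self.find(x), self.find(y)
        if x == y:
            return
        if self.rank[x] > self.rank[y]:  # union by rank: y becomes the root
            x, y = y, x
        if self.rank[x] == self.rank[y]:
            self.rank[y] += 1
        tx, ty = self.target.pop(x, None), self.target.pop(y, None)
        self.parent[x] = y
        if tx is None:
            tx = ty
        if tx is not None:
            self.target[y] = tx
            if ty is not None:
                self.join(tx, ty)        # equal pointers have equal targets

    def address_of(self, p, a):          # p = &a
        self.join(self.pointee(p), a)

    def copy(self, p, q):                # p = q
        self.join(self.pointee(p), self.pointee(q))

    def store(self, p, q):               # *p = q
        self.join(self.pointee(self.pointee(p)), self.pointee(q))

    def load(self, p, q):                # p = *q
        self.join(self.pointee(p), self.pointee(self.pointee(q)))

    def pts(self, p, locations):
        t = self.pointee(p)
        return {a for a in locations if self.find(a) == t}

s = Steensgaard()
s.address_of("p", "a")   # p = &a
s.address_of("q", "b")   # q = &b
s.copy("p", "q")         # p = q
s.address_of("r", "c")   # r = &c
locs = {"a", "b", "c"}
```

Here `p = q` unifies the targets of p and q, so q is reported as pointing to both a and b. An inclusion-based analysis would keep pts(q) = {b}, which illustrates the precision lost in exchange for processing each statement once.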

Datalog-based Formulation of Pointer Analysis
