CS711 Advanced Programming Languages
Pointer Analysis Overview and Flow-Sensitive Analysis
Radu Rugina
8 Sep 2005

Pointer Analysis
• Informally: determine where pointers (or references) in the program may point to.
• A significant amount of research over the past 15 years
  – … and still going
• A fundamental problem in program analysis
  – Required by virtually all other analyses, optimizations, program-understanding tools, bug-finding tools, etc.
  – Worst-case assumptions are too conservative
    • Especially for type-unsafe languages (e.g., C)

Points-To vs. Alias Analysis
• Points-to analysis: compute the set of memory locations that each pointer may point to.
  – Hence, a may analysis
  – E.g., pt(x) = {z, t}, pt(t) = {u}, pt(y) = {z}
  – Essentially, a points-to graph with edges x → z, x → t, t → u, y → z
• (Pointer) alias analysis computes alias pairs
  – E.g., (*x, z), (*x, t), (*t, u), (**x, u), (*y, z)
  – Points-to graphs are a compact representation of alias pairs
  – Alias pairs are used in older analyses, e.g., [LR92]
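
A minimal Python sketch (an illustration, not code from the lecture) of how the alias pairs above follow from the points-to graph: following k edges from x yields the pairs for k dereferences of x, and two pointers may alias when their points-to sets intersect. The names alias_pairs and may_alias are hypothetical, and the max_depth bound is an assumption needed because a cyclic graph would otherwise produce infinitely many pairs.

```python
# Points-to graph from the slide: pt(x) = {z, t}, pt(t) = {u}, pt(y) = {z}
PT = {"x": {"z", "t"}, "t": {"u"}, "y": {"z"}}


def alias_pairs(pt, max_depth=2):
    """Expand a points-to graph into (expression, location) alias pairs.

    ("**x", "u") means the expression **x may refer to location u.
    Only dereference chains of length <= max_depth are listed.
    """
    pairs = set()
    for var in pt:
        frontier = {var}
        for depth in range(1, max_depth + 1):
            # Locations reachable from var by following exactly `depth` edges.
            frontier = {tgt for loc in frontier for tgt in pt.get(loc, ())}
            for loc in frontier:
                pairs.add(("*" * depth + var, loc))
    return pairs


def may_alias(pt, p, q):
    """*p and *q may alias when the points-to sets of p and q intersect."""
    return bool(pt.get(p, set()) & pt.get(q, set()))


print(sorted(alias_pairs(PT)))
print(may_alias(PT, "x", "y"))  # both may point to z
```

With max_depth=2 the sketch reproduces exactly the five pairs on the slide; the graph stores the same information in four edges.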

Classifying Points-To Analyses
• Flow-sensitivity
  – Flow-sensitive (dataflow) analyses
    • Compute a points-to graph at each program point
  – Flow-insensitive analyses
    • Assignments can execute in any order, any number of times
    • Obviously a sound (conservative) model of program execution
    • A single points-to graph for the entire program
    • Two main kinds:
      – Steensgaard, a.k.a. unification-based
      – Andersen, a.k.a. inclusion-based
• Context-sensitivity
  – Distinguish the behavior of a function based on its calling context
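
To make the two flow-insensitive styles concrete, here is a small Python sketch assuming a toy statement encoding of our own, (kind, lhs, rhs). andersen treats every assignment as a subset constraint and iterates to a fixed point; the Steensgaard class instead unifies all targets of a pointer into one equivalence class using union-find. Both are simplified illustrations (no fields, heap, or calls), not the published algorithms in full.

```python
# Statements are (kind, lhs, rhs):  ("addr", x, y) x = &y   ("copy", x, y) x = y
#                                    ("load", x, y) x = *y   ("store", x, y) *x = y


def andersen(stmts):
    """Inclusion-based: iterate subset constraints until nothing changes."""
    pt = {}

    def get(v):
        return pt.setdefault(v, set())

    changed = True
    while changed:  # statements may run in any order, any number of times
        changed = False
        for kind, lhs, rhs in stmts:
            if kind == "addr":
                new, targets = {rhs}, [lhs]
            elif kind == "copy":
                new, targets = set(get(rhs)), [lhs]
            elif kind == "load":
                new, targets = set().union(*(get(z) for z in list(get(rhs)))), [lhs]
            else:  # store: every target of lhs receives pt(rhs)
                new, targets = set(get(rhs)), list(get(lhs))
            for t in targets:
                if not new <= get(t):
                    get(t).update(new)
                    changed = True
    return pt


class Steensgaard:
    """Unification-based: each class points to at most one other class."""

    def __init__(self):
        self.parent = {}   # union-find forest over variables and placeholder nodes
        self.pointee = {}  # class root -> a member of the class it points to
        self.fresh = 0

    def find(self, v):
        self.parent.setdefault(v, v)
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]  # path halving
            v = self.parent[v]
        return v

    def pointee_of(self, v):
        r = self.find(v)
        if r not in self.pointee:  # placeholder pointee class, created on demand
            self.fresh += 1
            self.pointee[r] = ("fresh", self.fresh)
        return self.find(self.pointee[r])

    def join(self, a, b):
        pending = [(a, b)]
        while pending:
            a, b = map(self.find, pending.pop())
            if a == b:
                continue
            self.parent[b] = a
            pb = self.pointee.pop(b, None)
            if pb is not None:
                if a in self.pointee:
                    pending.append((self.pointee[a], pb))  # unify the pointees too
                else:
                    self.pointee[a] = pb

    def add(self, kind, lhs, rhs):
        if kind == "addr":
            self.join(self.pointee_of(lhs), rhs)
        elif kind == "copy":
            self.join(self.pointee_of(lhs), self.pointee_of(rhs))
        elif kind == "load":
            self.join(self.pointee_of(lhs), self.pointee_of(self.pointee_of(rhs)))
        else:  # store
            self.join(self.pointee_of(self.pointee_of(lhs)), self.pointee_of(rhs))

    def points_to(self, v):
        r = self.find(v)
        if r not in self.pointee:
            return set()
        target = self.find(self.pointee[r])
        return {n for n in list(self.parent) if isinstance(n, str) and self.find(n) == target}


program = [("addr", "p", "a"), ("addr", "q", "b"), ("copy", "p", "q")]  # p = &a; q = &b; p = q;
inclusion = andersen(program)
unification = Steensgaard()
for stmt in program:
    unification.add(*stmt)
print(inclusion["p"], inclusion["q"])
print(unification.points_to("p"), unification.points_to("q"))
```

On this program the inclusion-based version keeps pt(q) = {b}, while unification merges a and b into one class, so both p and q end up pointing to {a, b}. That loss of precision is what buys the unification approach its almost-linear running time.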

Classifying Points-To Analyses
[Chart: published analyses placed by context sensitivity (context-sensitive, context-insensitive) and by analysis kind (flow-insensitive Steensgaard-style, flow-insensitive Andersen-style, dataflow); C analyses in yellow, Java analyses in green. Works shown: [RH98], [EGH94], [DLFR01], [WL04], [FRD00], [WL95], [SH98], [And94], [Ruf95], [Ste96], [Das00], [BLQ+03], [SGSB05].]

Points-To Analysis
• “Compute the set of locations where each pointer may point to”
• Ambiguities:
  – What are locations?
  – What about heap-allocated pointers?
  – What about aggregate structures: records, arrays, etc.?
  – What about different instances of the same variable?
• We're missing a notion of memory abstraction

Memory Model
• An abstraction of the memory
  – Map concrete locations to “abstract locations/nodes”
    • One abstract node may represent one or more concrete memory locations
    • Approximate the unbounded concrete program state using a finite abstraction
  – Analysis clients need to know about this abstraction
  – Difficult to compare (results for) different abstractions

Heap Abstraction
• Heap abstraction
  – Typically: one abstract node for each allocation site
  – Think: “one global variable per malloc”; e.g., for 12: x = malloc(…), the graph has x → m12
• Alternatives:
  – Less precise: one node for the entire heap
  – More precise: different nodes for locations allocated in different calling contexts
    • A.k.a. “context-sensitive heap abstraction”
    • Think malloc wrappers
• The model is imprecise for recursive structures
  – Shape analysis is significantly more precise here
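
The three heap abstractions can be seen as three naming functions for allocations. In this Python sketch, heap_node and the policy names "single", "site" and "context" are hypothetical, and the program with a malloc wrapper xmalloc at line 5 is made up for illustration; it shows why per-site naming conflates everything allocated through a wrapper.

```python
def heap_node(alloc_site, call_stack, policy="site", k=1):
    """Name the abstract heap node for one execution of a malloc.

    alloc_site: line number of the malloc call (m12 on the slide = site 12).
    call_stack: line numbers of the call sites leading to the malloc, innermost last.
    policy: "single" (whole heap), "site" (per allocation site) or
            "context" (site plus the k innermost call sites).
    """
    if policy == "single":
        return "heap"
    if policy == "site":
        return f"m{alloc_site}"
    if policy == "context":
        ctx = call_stack[-k:] if k > 0 else []
        return f"m{alloc_site}@" + ",".join(str(c) for c in ctx)
    raise ValueError(f"unknown policy: {policy}")


# Hypothetical program with a malloc wrapper:
#    5: void *xmalloc(size_t n) { return malloc(n); }
#   10: p = xmalloc(8);
#   11: q = xmalloc(8);
def p_and_q_may_alias(policy):
    """p and q may alias exactly when their allocations get the same node."""
    return heap_node(5, [10], policy) == heap_node(5, [11], policy)


for policy in ("single", "site", "context"):
    print(policy, p_and_q_may_alias(policy))
```

Under per-site naming both calls allocate "m5", so p and q look aliased; adding one level of calling context separates them as "m5@10" and "m5@11".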

Records and Structures
• Option A: model each field of each struct variable
  – A.k.a. “field-sensitive”. Think “x.f”
  – For struct { int a, b; } x, y; the nodes are x.a, x.b, y.a, y.b
• Option B: merge all fields of each struct variable
  – A.k.a. “field-independent”, “field-insensitive”. Think “x.*”
  – For struct { int a, b; } x, y; the nodes are x.*, y.*
• Option C: model each field once, shared by all struct variables
  – A.k.a. “field-based”. Think “*.f”
  – For struct { int a, b; } x, y; the nodes are *.a, *.b
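
The three options differ only in how an access var.field is named. A Python sketch with hypothetical helper names (field_node, nodes, conflated), using the slide's struct { int a, b; } x, y; to show which accesses each option conflates:

```python
def field_node(var, field, policy):
    """Abstract node for the access var.field under options A, B and C."""
    if policy == "A":   # field-sensitive: one node per (variable, field)
        return f"{var}.{field}"
    if policy == "B":   # field-independent: one node per variable
        return f"{var}.*"
    if policy == "C":   # field-based: one node per field, shared by all variables
        return f"*.{field}"
    raise ValueError(f"unknown policy: {policy}")


def nodes(policy, variables=("x", "y"), fields=("a", "b")):
    """Abstract nodes for: struct { int a, b; } x, y;"""
    return {field_node(v, f, policy) for v in variables for f in fields}


def conflated(policy, access1, access2):
    """True if two accesses, e.g. ("x", "a") and ("y", "a"), share one node."""
    return field_node(*access1, policy) == field_node(*access2, policy)


for policy in "ABC":
    print(policy, sorted(nodes(policy)))
```

Option B merges fields of the same variable (x.a with x.b), while option C merges the same field across variables (x.a with y.a).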

Unions
• Unions are type-unsafe
  – Sound approach: merge all fields
    • As in “field-independent” (B): union { int a; char b; } x; has the single node x.*
  – Unsound approach: assume fields don't interfere
    • As in “field-sensitive” (A): union { int a; char b; } x; has the nodes x.a, x.b

Arrays
• Merge all array elements together
  – int a[10]; is modeled by the single node a[*]
• Or use a separate abstraction for the first element
  – int a[10]; is modeled by the nodes a[0] and a[1..9]

Nested Arrays and Structures
• Recurse through the nested structure
  – Merge array elements
  – Separate all structure fields
    • Even if the structure is nested in an array
• E.g., struct { int a[3], b; } x[3]; (elements x[0], x[1], x[2]) is modeled by the nodes x[*].a[*] and x[*].b
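
The array and nested-structure rules can be sketched as one recursive naming function over access paths. In this Python sketch, abstract_location and the step encoding ("field", name) / ("index", i, length) are hypothetical; split_first=True gives the variant that keeps the first array element separate.

```python
def abstract_location(base, steps, split_first=False):
    """Map a concrete access path to its abstract node name.

    steps is a sequence of ("field", name) or ("index", i, length) entries.
    Array elements merge into [*]; with split_first=True, element 0 keeps its
    own node and elements 1..length-1 share another. Structure fields are
    always kept separate, even when the structure sits inside an array.
    """
    parts = [base]
    for step in steps:
        if step[0] == "field":
            parts.append("." + step[1])
        elif step[0] == "index":
            _, i, length = step
            if split_first:
                parts.append("[0]" if i == 0 else f"[1..{length - 1}]")
            else:
                parts.append("[*]")
        else:
            raise ValueError(f"unknown step: {step!r}")
    return "".join(parts)


# struct { int a[3], b; } x[3];  accesses x[2].a[1] and x[0].b
print(abstract_location("x", [("index", 2, 3), ("field", "a"), ("index", 1, 3)]))
print(abstract_location("x", [("index", 0, 3), ("field", "b")]))
# int a[10]; with the first element kept separate
print(abstract_location("a", [("index", 7, 10)], split_first=True))
```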

The Flow Analysis
• Program assignments:
  – Address-of: x = &y
  – Copy: x = y
  – Load: x = *y
  – Store: *x = y
• Dataflow information = points-to graphs
  – Use pt(x) = points-to set of x
• Merge operator = set union
• Transfer functions (all points-to sets not mentioned are unchanged)
  – x = &y : pt'(x) = {y}
  – x = y : pt'(x) = pt(y)
  – x = *y : pt'(x) = ⋃ pt(z) over all z ∈ pt(y)
  – *x = y : pt'(z) = pt(z) ∪ pt(y), for all z ∈ pt(x)
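
A direct transcription of these transfer functions and the union merge into Python, as a sketch: the statement encoding and the names transfer, merge and run are our own, and the example program is made up. Note that the store rule here always accumulates, exactly as written on the slide.

```python
def transfer(stmt, pt):
    """Apply one assignment's transfer function; pt maps variables to points-to sets."""
    kind, x, y = stmt
    out = {v: set(s) for v, s in pt.items()}  # copy: facts are kept per program point

    def get(v):
        return pt.get(v, set())

    if kind == "addr":      # x = &y : pt'(x) = {y}
        out[x] = {y}
    elif kind == "copy":    # x = y  : pt'(x) = pt(y)
        out[x] = set(get(y))
    elif kind == "load":    # x = *y : pt'(x) = union of pt(z) for z in pt(y)
        out[x] = set().union(*(get(z) for z in get(y)))
    elif kind == "store":   # *x = y : pt'(z) = pt(z) | pt(y) for z in pt(x)
        for z in get(x):
            out.setdefault(z, set()).update(get(y))
    else:
        raise ValueError(f"unknown statement kind: {kind}")
    return out


def merge(a, b):
    """Merge operator at control-flow joins: pointwise set union."""
    return {v: a.get(v, set()) | b.get(v, set()) for v in a.keys() | b.keys()}


def run(stmts, pt):
    """Analyze straight-line code by applying transfer functions in order."""
    for stmt in stmts:
        pt = transfer(stmt, pt)
    return pt


# if (c) p = &a; else p = &b;   q = &p;   r = *q;
then_pt = run([("addr", "p", "a")], {})
else_pt = run([("addr", "p", "b")], {})
joined = merge(then_pt, else_pt)
after = run([("addr", "q", "p"), ("load", "r", "q")], joined)
print(after["r"])
```

Because facts are kept per program point, a later p = &c replaces pt(p) entirely; a flow-insensitive analysis would instead accumulate {a, b, c}.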

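Strong vs. Weak Updates
• “Strong updates” = replace (overwrite) the value
• “Weak updates” = accumulate values
• Strong updates are more precise
• Weak updates are needed if the analysis can't tell which concrete location is written
  – *x = y, when pt(x) has more than one target (or its target is a summary node)
  – x[i] = y, since all elements of x share one abstract node
• Strong updates are a key difference between flow-sensitive and flow-insensitive analyses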
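
A sketch of the store rule with strong updates, in Python: the write is strong only when pt(x) is a single node that stands for a single concrete location. The summary_nodes parameter (abstract nodes representing many concrete locations, such as a merged array a[*] or an allocation-site heap node) is an assumption of this sketch; the analysis would have to supply it.

```python
def store(pt, x, y, summary_nodes=frozenset()):
    """*x = y, with a strong update only when exactly one concrete location is written.

    summary_nodes: abstract nodes that stand for many concrete locations
    (e.g. a merged array a[*] or an allocation-site heap node).
    """
    out = {v: set(s) for v, s in pt.items()}
    targets = pt.get(x, set())
    values = pt.get(y, set())
    if len(targets) == 1 and not (targets & summary_nodes):
        (z,) = targets
        out[z] = set(values)                          # strong update: overwrite
    else:
        for z in targets:
            out.setdefault(z, set()).update(values)   # weak update: accumulate
    return out


strong = store({"x": {"a"}, "a": {"old"}, "y": {"new"}}, "x", "y")
weak = store({"x": {"a", "b"}, "a": {"old"}, "b": set(), "y": {"new"}}, "x", "y")
# x[i] = y behaves like *p = y where p points to the summary node arr
array = store({"p": {"arr"}, "arr": {"old"}, "y": {"new"}}, "p", "y", frozenset({"arr"}))
print(strong["a"], weak["a"], array["arr"])
```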

Inter-Procedural Analysis [EGH'94]
• Analyze the callee at each function call
  – “Map” the points-to information in the caller
  – Analyze the callee with the mapped information
  – “Unmap” the result and return to the caller
• Mapping process:
  – Use “invisible variables” to model variables that are not in the current scope but are accessible through pointers
  – Store the mapping information and use it during unmap
• Example: foo() { int a, *b = &a; bar(&b); }  bar(int** p) { … }
  – Call-site graph: b → a
  – Mapped graph: p → p_1 → p_2
  – Mapping info: (b, p_1), (a, p_2)
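
A simplified Python sketch of the map and unmap steps. The function names, the naming scheme formal_k for the invisible variable at dereference depth k, and the rule that locations reached at the same depth from the same formal share one invisible variable are all assumptions of this sketch; [EGH'94] uses a more detailed scheme.

```python
def map_call(caller_pt, actual_targets, formals, visible=frozenset()):
    """Build the callee's entry points-to graph, renaming out-of-scope locations.

    caller_pt:      caller points-to graph (name -> set of names).
    actual_targets: per formal, the locations the actual argument points to
                    (e.g. {"b"} for the argument &b).
    visible:        names that are also in scope in the callee (e.g. globals).
    Returns (callee_pt, mapping) with mapping: caller name -> invisible name.
    """
    callee_pt, mapping = {}, {}

    def name_of(loc, formal, level):
        if loc in visible:
            return loc
        return mapping.setdefault(loc, f"{formal}_{level}")

    for formal, targets in zip(formals, actual_targets):
        callee_pt[formal] = {name_of(t, formal, 1) for t in targets}
        frontier, seen, level = set(targets), set(), 1
        while frontier:
            seen |= frontier
            nxt = set()
            for loc in frontier:
                succs = caller_pt.get(loc, set())
                if succs:
                    src = name_of(loc, formal, level)
                    callee_pt.setdefault(src, set()).update(
                        name_of(u, formal, level + 1) for u in succs)
                    nxt |= succs
            frontier = nxt - seen
            level += 1
    return callee_pt, mapping


def unmap(callee_pt, mapping, formals):
    """Translate the callee's result back to caller names using the stored mapping."""
    inverse = {}
    for loc, name in mapping.items():
        inverse.setdefault(name, set()).add(loc)
    result = {}
    for src, dsts in callee_pt.items():
        if src in formals:
            continue  # callee parameters are not visible in the caller
        for loc in inverse.get(src, {src}):
            result.setdefault(loc, set()).update(
                c for d in dsts for c in inverse.get(d, {d}))
    return result


# foo() { int a, *b = &a; bar(&b); }   bar(int** p) { … }
callee_pt, mapping = map_call({"b": {"a"}}, [{"b"}], ["p"])
print(callee_pt)
print(mapping)
```

If bar hypothetically executed *p = &g for a global g, its exit graph would contain p_1 → g, and unmap would turn that into b → g in foo.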

Invocation Graph
• Use an “invocation graph” for context-sensitivity
  – Unroll the call graph, turn it into a tree
• E.g., main() { g(); g(); }  g() { f(); }  f() { … }
  – Invocation graph: main has two children g, and each g has its own child f
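
Unrolling a non-recursive call graph into an invocation graph takes only a few lines. This Python sketch (invocation_graph and size are hypothetical names) rebuilds the slide's example and also shows how the tree can grow exponentially when every function calls the next one twice.

```python
def invocation_graph(call_graph, fn):
    """Unroll a call graph into a tree with one node per calling context.

    call_graph maps each function to the callees of its call sites, in order.
    Assumes no recursion (a recursive call would unroll forever).
    """
    return (fn, [invocation_graph(call_graph, callee) for callee in call_graph.get(fn, [])])


def size(node):
    _, children = node
    return 1 + sum(size(child) for child in children)


# main() { g(); g(); }   g() { f(); }   f() { … }
cg = {"main": ["g", "g"], "g": ["f"], "f": []}
ig = invocation_graph(cg, "main")
print(size(ig))

# Each f_i calls f_{i+1} twice: 20 call sites, but 2**11 - 1 IG nodes.
chain = {f"f{i}": [f"f{i + 1}"] * 2 for i in range(10)}
print(size(invocation_graph(chain, "f0")))
```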

Invocation Graph
• Use an “invocation graph” for context-sensitivity
  – For recursion:
    • Use two nodes: “approximate” and “recursive”
    • Perform a fixed-point computation along the back edge
    • Use summaries for each node
• E.g., main() { f(); }  f() { if (…) g(); }  g() { f(); }
  – Invocation graph: main → f-R → g → f-A, with a back edge from f-A to f-R
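
A Python sketch of the recursive case. To keep it short, the summary is a hypothetical stand-in for points-to summaries: the set of variables each function writes directly (the effects argument). build turns a call to a function already on the path into an approximate node (A) linked back to the recursive node (R); summarize iterates the R node until its approximation stops changing.

```python
def build(call_graph, fn, ancestors=()):
    """Build an invocation graph; a call to a function already on the path
    becomes an approximate node "A" whose back edge targets the recursive node "R"."""
    for anc in ancestors:
        if anc["fn"] == fn:
            anc["kind"] = "R"
            return {"fn": fn, "kind": "A", "children": [], "back": anc}
    node = {"fn": fn, "kind": "plain", "children": []}
    for callee in call_graph.get(fn, []):
        node["children"].append(build(call_graph, callee, ancestors + (node,)))
    return node


def summarize(node, effects):
    """Summary = own effects plus callees' summaries; R nodes iterate to a fixed point."""
    if node["kind"] == "A":
        # Along the back edge, use the recursive node's current approximation.
        return set(node["back"].get("approx", set()))

    def once():
        result = set(effects.get(node["fn"], set()))
        for child in node["children"]:
            result |= summarize(child, effects)
        return result

    if node["kind"] != "R":
        return once()
    node["approx"] = set()
    while True:
        new = once()
        if new == node["approx"]:
            return new
        node["approx"] = new


# main() { f(); }   f() { if (…) g(); }   g() { f(); }
cg = {"main": ["f"], "f": ["g"], "g": ["f"]}
ig = build(cg, "main")
print(summarize(ig, {"f": {"x"}, "g": {"y"}}))
```

The approximation starts empty and only grows, so for finite summaries the loop terminates.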

Function Pointers
• Indirect calls: a “chicken-and-egg” problem
  – Need points-to information to resolve such calls
  – Need to resolve the calls to compute the points-to info
  – Solution: compute both at the same time
  – Once a call is resolved: analyze each callee, merge the results
[Figure: three successive invocation graphs rooted at main, in which indirect-call nodes fp are resolved to the callees g and f; the recursive call to f appears as the nodes f-R and f-A.]
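
The "compute both at the same time" idea can be sketched as one fixed-point loop in which newly resolved callees become reachable and are analyzed in the next round. For brevity this Python sketch is flow- and context-insensitive, unlike the analysis on the slides; the statement encoding, the function name and the example program are hypothetical.

```python
def resolve_and_analyze(functions, entry):
    """Compute points-to sets and indirect-call targets together.

    functions maps a function name to its statements:
      ("addr", x, f) for x = &f,  ("copy", x, y) for x = y,
      ("icall", fp) for (*fp)(), an indirect call through fp.
    Returns (pt, call_edges).
    """
    pt, reachable, edges = {}, {entry}, set()
    changed = True
    while changed:
        changed = False
        for fn in list(reachable):
            for stmt in functions[fn]:
                if stmt[0] == "addr":
                    _, x, y = stmt
                    if y not in pt.setdefault(x, set()):
                        pt[x].add(y)
                        changed = True
                elif stmt[0] == "copy":
                    _, x, y = stmt
                    new = pt.get(y, set()) - pt.setdefault(x, set())
                    if new:
                        pt[x] |= new
                        changed = True
                elif stmt[0] == "icall":
                    # Resolve with the current points-to set of fp; newly found
                    # callees are analyzed in the next round of the loop.
                    for callee in functions.keys() & pt.get(stmt[1], set()):
                        edges.add((fn, callee))
                        if callee not in reachable:
                            reachable.add(callee)
                            changed = True
    return pt, edges


program = {
    "main": [("addr", "fp", "g"), ("icall", "fp")],  # fp = &g; (*fp)();
    "g": [("addr", "fp", "f")],                      # fp = &f;
    "f": [("icall", "fp")],                          # (*fp)();
}
pt, edges = resolve_and_analyze(program, "main")
print(pt, sorted(edges))
```

Resolving the call in main to g exposes fp = &f, which in turn makes f a target of both indirect calls, including a recursive call from f to itself.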

Evaluating an Analysis
• What is the right metric?
  – An ongoing debate
  – Option 1: size of points-to sets
    • At loads and stores, at indirect calls
    • Difficult to compare analyses that use different abstractions
  – Option 2: evaluate the effect on analysis clients
    • E.g., how many virtual calls are disambiguated? How many false data dependences are removed?
    • How much faster do programs run because of a better points-to analysis?
    • How much is the false-positive ratio of a bug-finding tool improved?
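
Option 1 reduces to simple arithmetic over dereference sites. A Python sketch with a hypothetical deref_metrics helper and made-up data: it reports the average points-to set size and the share of sites with a single target.

```python
def deref_metrics(pt, deref_sites):
    """Average points-to set size over dereference sites, and the share of
    sites with exactly one target.

    deref_sites lists the pointer dereferenced at each load, store or
    indirect call (one entry per site, so a pointer can appear repeatedly).
    """
    if not deref_sites:
        return 0.0, 0.0
    sizes = [len(pt.get(p, set())) for p in deref_sites]
    average = sum(sizes) / len(sizes)
    single = sum(1 for s in sizes if s == 1) / len(sizes)
    return average, single


pt = {"p": {"a"}, "q": {"a", "b"}, "r": {"c"}}
print(deref_metrics(pt, ["p", "q", "r", "p"]))
```

The numbers only mean something relative to the memory abstraction: with a single node for the whole heap, every heap pointer has a "precise" points-to set of size one.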

Experiments [EGH'94]
• Programs ranging from 0.1K to 2.2K LOC
• Small points-to set sizes at indirect accesses (avg. 1.13)
• Many indirect accesses with one single target (28%)
  – But only 19% where the target is a program variable
• Invocation graph statistics:
  – Average ratio IG size / call sites = 1.45 (up to 2.5)
  – Ratio IG size / procedures is larger (up to 21)
  – In theory, the IG size can be exponential in the program size

Memoization [WL'95]
• [Wilson, Lam, PLDI'95] “Efficient Context-Sensitive Pointer Analysis for C Programs”
  – Always use procedure summaries (not just for recursion)
    • Called “partial transfer functions” (PTFs)
  – Do not build an invocation graph
  – Build “invisible variables” lazily
  – Memory abstraction using triples (b, f, s), with base b, offset f, and stride s
  – Ratio PTFs / procedures: between 1.00 and 1.39
  – Reports a program with 37 procedures whose invocation graph has more than 700,000 nodes
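
A much-simplified Python sketch of the memoization idea: analyze a callee once per distinct input pattern and reuse the cached result at every call site with the same pattern. Here a pattern is just the parameters' targets renamed to invisible variables, so aliasing between parameters shows up as a shared name; this keying scheme, the class MemoizedCallee and the callee set are hypothetical illustrations, not Wilson and Lam's actual PTF construction.

```python
def input_pattern(actual_targets, formals):
    """Rename each caller location a parameter points to after the first formal
    that reaches it, so aliasing between parameters appears as a shared name.
    Returns (pattern, mapping)."""
    mapping, edges = {}, []
    for formal, targets in zip(formals, actual_targets):
        for loc in sorted(targets):
            edges.append((formal, mapping.setdefault(loc, f"{formal}_1")))
    return frozenset(edges), mapping


class MemoizedCallee:
    """Analyze a callee once per distinct input pattern and reuse the result."""

    def __init__(self, formals, analyze_body):
        self.formals = formals
        self.analyze_body = analyze_body  # pattern -> result graph over callee names
        self.ptfs = {}                    # pattern -> cached result, one per "PTF"

    def call(self, actual_targets):
        pattern, mapping = input_pattern(actual_targets, self.formals)
        if pattern not in self.ptfs:
            self.ptfs[pattern] = self.analyze_body(pattern)
        inverse = {}
        for loc, name in mapping.items():
            inverse.setdefault(name, set()).add(loc)
        result = {}  # translate the cached result back to this caller's names
        for src, dsts in self.ptfs[pattern].items():
            for loc in inverse.get(src, {src}):
                result.setdefault(loc, set()).update(
                    c for d in dsts for c in inverse.get(d, {d}))
        return result


def analyze_set_body(pattern):
    """Hypothetical callee: void set(void **p, void *q) { *p = q; }"""
    pt = {}
    for formal, name in pattern:
        pt.setdefault(formal, set()).add(name)
    return {z: set(pt.get("q", set())) for z in pt.get("p", set())}


set_fn = MemoizedCallee(["p", "q"], analyze_set_body)
print(set_fn.call([{"x"}, {"a"}]))   # set(&x, &a)
print(set_fn.call([{"y"}, {"b"}]))   # set(&y, &b): same pattern, cached result reused
print(set_fn.call([{"x"}, {"x"}]))   # set(&x, &x): p and q alias, a new pattern
print(len(set_fn.ptfs))
```

Three call sites produce only two cached results, which mirrors the small PTFs-per-procedure ratios reported above.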
