refinement based context sensitive points to analysis for
play

Refinement-Based Context-Sensitive Points-To Analysis for Java Manu - PowerPoint PPT Presentation

Refinement-Based Context-Sensitive Points-To Analysis for Java Manu Sridharan, Rastislav Bodk UC Berkeley PLDI 2006 1 What Does Refinement Buy You? Increased scalability: enable new clients Memory: orders of magnitude savings Time:


  1. Refinement-Based Context-Sensitive Points-To Analysis for Java Manu Sridharan, Rastislav Bodík UC Berkeley PLDI 2006 1

  2. What Does Refinement Buy You? Increased scalability: enable new clients • Memory: orders of magnitude savings • Time: answer for a variable comes back in 1 second • ) Suitable for IDE Cast Safety Client Precision : 2

  3. Approach: Focus on the Client Demand-driven: only do requested work Client-driven refinement: stop when client satisfied Example: • client asks: “can x point to o?” • we refine until we answer NO (the good answer) or we time out 3

  4. Context-Sensitive Analysis Costly Context-sensitive analysis (def): • Compute result as if all calls inlined • But, collapse recursive methods Exponential blowup (code growth) 4

  5. Why Not Existing Technique? Most analyses approximate same way in all code • E.g., k-CFA • Precision lost, esp. for data structures Our analysis focuses precision where it matters • Fully precise in the limit • Only small amount of code analyzed precisely • First refinement algorithm for Java 5

  6. Points-To Analysis Overview Compute objects each variable can point to For each var x, points-to set pt(x) Model objects with abstract locations 1: x = new Foo() yields pt(x) = { o 1 } Flow-insensitive: statements in any order 6

  7. Points-To Analysis as CFL-Reachability 1) Assignments x = new Obj(); // o 1 y = new Obj(); // o 2 z = x; o 1 x z a 2) Method calls ( 1 [ f ) 1 id(p) { return p; } ] f a = id(x); d c p id ret id b = id(y); [ g ) 2 ( 2 3) Heap accesses o 2 y b c.f = x; c.g = y; d = c.f; pt(x) = { o | o flowsTo x } flowsTo : path exists flowsTo : balanced call and field parens flowsTo : balanced call parens 7

  8. Summary of Formulation Graph represents program Compute reachability with two filters • Language of balanced call parens • Language of balanced field parens 8

  9. Single path problem … … t 9 t 7 ] j [ p t 8 ) 8 t 6 … t 5 ) 5 ( 7 ( 1 ) 1 o t 0 t 1 t 2 x [ h [ f t 12 ] k [ f ( 1 ) 1 [ h ] g o 2 t 10 Problem: show path is unbalanced t 11 Goal: reduce number of visited edges Insight: enough to find one unbalanced paren 9

  10. Approximation via Match Edges o t 0 t 1 t 2 t 3 t 4 x [ g ] h [ f [ h ] j ] f [ f [ g [ h ] h ] j ] f Match edges connect matched field parens • From source of open to sink of close • Initially, all pairs connected Use match edges to skip subpaths 10

  11. Refining the Approximation o t 0 t 1 t 3 t 4 x [ g [ f ] j ] f [ f [ g [ h ] h ] j ] f Refine by removing some match edges • Exposes more of original path for checking Soundness: Traverse match edge ) assume field parens balanced on skipped path Remove where unbalanced parens expected • Explore deeper levels of pointer indirection 11

  12. Refinement With Both Languages ( 2 ) 3 ( 1 ) 1 o t 0 t 1 t 2 t 3 t 4 t 5 t 6 x [ g ] g ] f [ f Fields: [ f [ g ] g ] f Calls: ( 1 ) 1 ( 2 ) 3 Match edges enable approximation of calls • Only can check calls on match-free subpaths Match edge removal ) more call checking • Key point: refine heap and calls together 12

  13. Evaluation 13

  14. Experimental Configuration Implemented in Soot framework Tested on large benchmarks x 2 clients • SPECjvm98, Dacapo suite • Downcast checking, factory method props Refine context-insensitive result Timeout for long-running queries 14

  15. Precision: Cast Checking 15

  16. Scalability: Time and Memory Average query time less than 1 second • Interactive performance (for IDE) • At most 13 minutes for casts, 4 minutes for factory client Very low memory usage: at most 35MB • Of this, 30MB for context-insensitive result • Compare with >2GB for 1-ObjSens analysis 16

  17. Demand-Driven vs. Exhaustive Demand advantage: no caching required • Hence, low memory overhead • No engineering of efficient sets • Good for changing code; just re-compute Demand advantage: faster for many clients • Often only care about some variables Demand disadvantage: slower querying all vars • At most 90 minutes for all app. vars • But, still good precision, memory 17

  18. Conclusions Novel refinement-based analysis • More precise for tested clients • Interactive performance for queries • Low memory: could scale even more • Relatively easy to implement Insight: refine heap and calls together • Useful for other balanced-paren analyses? 18

Recommend


More recommend