
Special topics on binary-level program analysis: More on Static Analysis



  1. Special topics on binary-level program analysis: More on Static Analysis
     Gang Tan, CSE 597, Spring 2019, Penn State University

  2. ITERATION ALGORITHMS

  3. Chaotic Iteration
     • Suppose there are n equations in total
       – RD_j = F_j(RD_1, …, RD_n), 1 ≤ j ≤ n

         For all j, RD_j := ∅
         while RD_j ≠ F_j(RD_1, …, RD_n) for some j do
             RD_j := F_j(RD_1, …, RD_n)
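A minimal Python sketch of chaotic iteration, assuming each equation is given as a function of the current assignment. The three-equation system `TOY_EQUATIONS` is made up for illustration and is not from the slides; the round-robin sweep is just one admissible order for picking "some j".

```python
def chaotic_iteration(equations):
    """Start every unknown at the empty set and re-apply equations until none changes."""
    rd = {j: frozenset() for j in equations}
    changed = True
    while changed:  # some RD_j may still differ from F_j(RD_1, ..., RD_n)
        changed = False
        for j, f in equations.items():
            new = frozenset(f(rd))
            if new != rd[j]:
                rd[j] = new
                changed = True
    return rd

# Hypothetical toy system (not from the slides):
#   X1 = {a} ∪ X2,   X2 = X1 ∪ {b},   X3 = X2 minus {a}
TOY_EQUATIONS = {
    "X1": lambda rd: {"a"} | rd["X2"],
    "X2": lambda rd: rd["X1"] | {"b"},
    "X3": lambda rd: rd["X2"] - {"a"},
}

solution = chaotic_iteration(TOY_EQUATIONS)
```

Because every right-hand side here is monotone and the sets are finite, the loop terminates with the least solution.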

  4. Example
     • [x := 1]^1; (while [y > 0]^2 do [x := x-1]^3); [x := 2]^4
     • Equations (in each kill set {(x,l)}, l ranges over all labels)
       – RD_entry(1) = {(x,?), (y,?)}
       – RD_entry(2) = RD_exit(1) ∪ RD_exit(3)
       – RD_entry(3) = RD_exit(2)
       – RD_entry(4) = RD_exit(2)
       – RD_exit(1) = (RD_entry(1) \ {(x,l)}) ∪ {(x,1)}
       – RD_exit(2) = RD_entry(2)
       – RD_exit(3) = (RD_entry(3) \ {(x,l)}) ∪ {(x,3)}
       – RD_exit(4) = (RD_entry(4) \ {(x,l)}) ∪ {(x,4)}

  5. Work-list Algorithm for Reaching Definitions
     • dep(j) = {k | RD_k depends on RD_j}
       – That is, if RD_j changes, then RD_k may have to change too; dep(j) collects the things that depend on RD_j

         W ← {1, 2, …, n};
         For all j, RD_j := ∅;
         while W ≠ ∅ do {
             Remove a number j from W;
             if RD_j ≠ F_j(RD_1, …, RD_n) {
                 RD_j := F_j(RD_1, …, RD_n);
                 W ← W ∪ dep(j)
             }
         }

  6. Example
     • [x := 1]^1; (while [y > 0]^2 do [x := x-1]^3); [x := 2]^4
     • Dependencies (n = entry, x = exit, so 1n stands for RD_entry(1) and 1x for RD_exit(1))
       – dep(1n) = {1x}; dep(2n) = {2x}; dep(3n) = {3x}; dep(4n) = {4x};
       – dep(1x) = {2n}; dep(2x) = {3n, 4n}; dep(3x) = {2n}; dep(4x) = { };

  7. Example
     • [x := 1]^1; (while [y > 0]^2 do [x := x-1]^3); [x := 2]^4
     • Solution
       – RD_entry(1) = {(x,?), (y,?)}
       – RD_entry(2) = {(x,1), (x,3), (y,?)}
       – RD_entry(3) = {(x,1), (x,3), (y,?)}
       – RD_entry(4) = {(x,1), (x,3), (y,?)}
       – RD_exit(1) = {(x,1), (y,?)}
       – RD_exit(2) = {(x,1), (x,3), (y,?)}
       – RD_exit(3) = {(x,3), (y,?)}
       – RD_exit(4) = {(x,4), (y,?)}
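The work-list algorithm can be run on this example as a rough Python sketch. The key names (`n1` for the entry of label 1, `x1` for its exit), the helper `kill_gen`, and the FIFO order of the work list are choices made here for illustration; the slides leave the removal order open.

```python
from collections import deque

def kill_gen(facts, var, label):
    # Drop every definition of `var`, then add the definition at `label`.
    return frozenset(d for d in facts if d[0] != var) | {(var, label)}

# One function per reaching-definitions equation of the example program.
EQUATIONS = {
    "n1": lambda rd: frozenset({("x", "?"), ("y", "?")}),
    "n2": lambda rd: rd["x1"] | rd["x3"],
    "n3": lambda rd: rd["x2"],
    "n4": lambda rd: rd["x2"],
    "x1": lambda rd: kill_gen(rd["n1"], "x", 1),
    "x2": lambda rd: rd["n2"],
    "x3": lambda rd: kill_gen(rd["n3"], "x", 3),
    "x4": lambda rd: kill_gen(rd["n4"], "x", 4),
}

# dep(j): the unknowns whose equations read RD_j.
DEP = {
    "n1": ["x1"], "n2": ["x2"], "n3": ["x3"], "n4": ["x4"],
    "x1": ["n2"], "x2": ["n3", "n4"], "x3": ["n2"], "x4": [],
}

def worklist_solve(equations, dep):
    rd = {j: frozenset() for j in equations}
    work = deque(equations)              # W starts with every unknown
    while work:
        j = work.popleft()
        new = equations[j](rd)
        if new != rd[j]:
            rd[j] = new
            for k in dep[j]:             # only dependents need re-checking
                if k not in work:
                    work.append(k)
    return rd

rd = worklist_solve(EQUATIONS, DEP)
```

Unlike a full sweep over all equations, an unknown is re-evaluated here only after something it depends on has changed.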

  8. COMPLETE LATTICE

  9. Foundation of Static Analysis: Fixed Point Theory of Complete Lattice
     • A partial order is a mathematical structure: L = (S, ⊑)
       – S is a set; ⊑ is a binary relation on S
       – Reflexive: ∀ x ∈ S. x ⊑ x
       – Transitive: ∀ x,y,z ∈ S. x ⊑ y ∧ y ⊑ z → x ⊑ z
       – Anti-symmetric: ∀ x,y ∈ S. x ⊑ y ∧ y ⊑ x → x = y
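For a finite set, the three laws can be checked by brute force. The following Python sketch assumes the relation is supplied as a two-argument predicate `leq`; the function name and the sample relations are illustrative only.

```python
from itertools import product

def is_partial_order(elements, leq):
    """Check reflexivity, transitivity and anti-symmetry on a finite carrier set."""
    elements = list(elements)
    reflexive = all(leq(x, x) for x in elements)
    transitive = all(
        leq(x, z)
        for x, y, z in product(elements, repeat=3)
        if leq(x, y) and leq(y, z)
    )
    antisymmetric = all(
        x == y
        for x, y in product(elements, repeat=2)
        if leq(x, y) and leq(y, x)
    )
    return reflexive and transitive and antisymmetric

# Powerset of {a, b} ordered by inclusion.
subsets = [frozenset(), frozenset("a"), frozenset("b"), frozenset("ab")]
inclusion_ok = is_partial_order(subsets, lambda s, t: s <= t)

# Strict "<" fails reflexivity; "same length" fails anti-symmetry.
strict_ok = is_partial_order(range(4), lambda a, b: a < b)
same_len_ok = is_partial_order(["ab", "cd", "e"], lambda a, b: len(a) == len(b))
```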

  10. Partial Order
      • Examples
        – (N, ≤)
        – (N, ≥)
        – (P(A), ⊆)
        – (P(A), ⊇)
      • Partial order diagrams

  11. Upper bound and lower bound
      • y is an upper bound for X, if ∀ x ∈ X: x ⊑ y
      • ⊔X is the least upper bound of X
        – Called the join operator
      • ⊓X is the greatest lower bound of X
        – Called the meet operator
      • L = (S, ⊑) is a complete lattice if
        – It is a partial order, and
        – ⊔X and ⊓X exist for every X ⊆ S
      • ⊤ stands for the greatest element
      • ⊥ stands for the least element
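On a finite partial order, join and meet can be found by direct search, which gives a small executable reading of these definitions. This Python sketch assumes the carrier is a list `S` and the order a predicate `leq`; all names are illustrative, and `None` signals that the bound does not exist.

```python
from itertools import combinations

def powerset(items):
    items = list(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def join(S, leq, X):
    """Least upper bound of X in (S, leq), or None if there is none."""
    ubs = [y for y in S if all(leq(x, y) for x in X)]
    for c in ubs:
        if all(leq(c, d) for d in ubs):
            return c
    return None

def meet(S, leq, X):
    """Greatest lower bound of X in (S, leq), or None if there is none."""
    lbs = [y for y in S if all(leq(y, x) for x in X)]
    for c in lbs:
        if all(leq(d, c) for d in lbs):
            return c
    return None

def is_complete_lattice(S, leq):
    # Both bounds must exist for every subset X of S (including the empty one).
    return all(join(S, leq, X) is not None and meet(S, leq, X) is not None
               for X in powerset(S))

S = powerset({1, 2, 3})
subset = lambda a, b: a <= b
lub = join(S, subset, [frozenset({1}), frozenset({2})])
glb = meet(S, subset, [frozenset({1}), frozenset({2})])
top = meet(S, subset, [])     # greatest element
bottom = join(S, subset, [])  # least element
```

Two incomparable elements on their own, for example `["a", "b"]` ordered by equality, have no join, so that order is not a complete lattice.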

  12. INTERPROCEDURAL ANALYSIS

  13. Interprocedural CFGs
      • Program
          void main() {
            x := 7;
            r := p(x);
            x := r;
            z := p(x + 10);
          }
          int p(int a) {
            y := a+2;
            return y;
          }
      • CFG nodes of main: x:=7; call p(x); r := ret p(x); x:=r; call p(x+10); z := ret p(x+10)
      • CFG nodes of p: y:=a+2; ret y

  14. One Idea for Interprocedural Analysis
      • Ignore the differences between inter- and intra-procedural edges
        – Conflate them into one kind of edge
        – Context-insensitive interprocedural analysis
      • Introduces a lot of imprecision
        – Because of many invalid paths

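  15. Conflating Intra and Inter Edges
      • Program
          void main() {
            x := 7;
            r := p(x);
            x := r;
            z := p(x + 10);
          }
          int p(int a) {
            y := a+2;
            return y;
          }
      • Constant-propagation facts on the conflated CFG (T = not a constant)
        – after x:=7: {x:7}
        – at the entry of p, before y:=a+2: {a:T}
        – after ret y: {y:T}
        – after r := ret p(x): {r:T}
        – after x:=r: {x:T}
        – after z := ret p(x+10): {z:T}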
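The loss of precision can be reproduced with a small Python sketch. It is a simplification, not a CFG-based analyzer: the example program is hand-translated into one equation per variable (`x1` is x after `x := 7`, `x2` is x after `x := r`), and the names `BOT`, `TOP`, `join` and `add` are assumptions of this sketch.

```python
BOT, TOP = "bot", "T"   # "no value yet" and "not a constant"

def join(a, b):
    if a == BOT:
        return b
    if b == BOT:
        return a
    return a if a == b else TOP

def add(a, n):
    # Abstract addition of a constant: BOT and TOP are propagated unchanged.
    return a if a in (BOT, TOP) else a + n

def analyze_context_insensitive():
    v = dict.fromkeys(["x1", "a", "y", "r", "x2", "z"], BOT)
    while True:
        new = {
            "x1": 7,
            # Both call sites flow into the single copy of p's parameter.
            "a": join(v["x1"], add(v["x2"], 10)),
            "y": add(v["a"], 2),
            # The return value flows back to both call sites.
            "r": v["y"],
            "x2": v["r"],
            "z": v["y"],
        }
        if new == v:
            return v
        v = new

facts = analyze_context_insensitive()
```

The first call alone would give `a = 7`, but the value returned to `r` feeds the second call's argument, so 7 and 19 meet at `a` and everything downstream becomes `T`.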

  16. Invalid Paths
      • Information about all call sites is merged
        – Loss of precision
        – Put another way, the analysis considers "the worst case", in which calls and returns do not match
          • That is, returns may return to nonmatching call sites
      • One easy fix: inlining function calls
        – Essentially use a new copy of the function whenever it is called
        – So that different calls don't mix information together

  17. Inlining for the Example
          void main() {
            x := 7;
            r := p1(x);
            x := r;
            z := p2(x + 10);
          }
          int p1(int a) {    {a:7}
            y := a+2;        {y:9}
            return y;
          }
          int p2(int a) {    {a:19}
            y := a+2;        {y:21}
            return y;
          }

  18. Problem with Inlining?
      • Code/CFG blow-up
        – Can be exponential in the worst case
            void p1() { p2(); p2(); }
            void p2() { p3(); p3(); }
            void p3() { p4(); p4(); }
      • Cannot deal with recursion
            void p1() { … p1() … }
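Both problems show up when counting function bodies after full inlining. In this Python sketch the call graph is a dictionary from a function to its list of call targets; `p4` is assumed to be a leaf, and `inlined_size` is a name invented here.

```python
CALLS = {"p1": ["p2", "p2"], "p2": ["p3", "p3"], "p3": ["p4", "p4"], "p4": []}

def inlined_size(func, calls, stack=()):
    """Number of function bodies present once every call in `func` is inlined."""
    if func in stack:
        # Inlining a recursive call would never terminate.
        raise ValueError(f"cannot inline recursive call to {func}")
    return 1 + sum(inlined_size(callee, calls, stack + (func,))
                   for callee in calls[func])

total_bodies = inlined_size("p1", CALLS)
```

With two calls per level the count roughly doubles at each level, which is the exponential worst case named on the slide.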

  19. Context Sensitivity
      • Group calls into a finite number of contexts
        – Label information using contexts so that information related to different contexts does not mix
        – For a context, analyze the callee function w.r.t. that context
      • Common contexts
        – Call-site stack of a finite size k
          • also called the call-string context
        – Let k=1, then interprocedural constant propagation computes information like this:
          • (1, {x:2, y:T}), (2, {x:T, y:3})

  20. Size-one Call-String Contexts
      • Program (call sites labeled 1 and 2)
          void main() {
            x := 7;
            1: r := p(x);
            x := r;
            2: z := p(x + 10);
          }
          int p(int a) {
            y := a+2;
            return y;
          }
      • Facts per context (- is the context of main)
        – after x:=7: (-, {x:7})
        – at the entry of p: (1, {a:7}), (2, {a:19})
        – after y:=a+2: (1, {y:9}), (2, {y:21})
        – after r := ret p(x): (-, {r:9})
        – after x:=r: (-, {x:9})
        – after z := ret p(x+10): (-, {z:21})
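A rough Python sketch of the same computation keeps one copy of p's facts per call site. As a simplification, the program is hand-translated into equations rather than analyzed from a CFG; `BOT`, `TOP`, `add` and the variable names (`x1` for x after `x := 7`, `x2` for x after `x := r`) are assumptions of this sketch.

```python
BOT, TOP = "bot", "T"   # "no value yet" and "not a constant"

def add(a, n):
    # Abstract addition of a constant: BOT and TOP are propagated unchanged.
    return a if a in (BOT, TOP) else a + n

def analyze_call_string_1():
    main = dict.fromkeys(["x1", "r", "x2", "z"], BOT)
    a = {1: BOT, 2: BOT}   # parameter of p, per calling context
    y = {1: BOT, 2: BOT}   # local y of p, per calling context
    while True:
        new_a = {1: main["x1"], 2: add(main["x2"], 10)}
        new_y = {ctx: add(a[ctx], 2) for ctx in a}
        # Each return flows back only to the call site of its own context.
        new_main = {"x1": 7, "r": y[1], "x2": main["r"], "z": y[2]}
        if (new_main, new_a, new_y) == (main, a, y):
            return main, a, y
        main, a, y = new_main, new_a, new_y

main_facts, a_facts, y_facts = analyze_call_string_1()
```

Each context has exactly one call site in this program, so no values are merged and every fact stays a constant.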

  21. Call-String Contexts of Various Sizes
          void main() {
            1: fib(7);
          }
          int fib(int n) {
            if n <= 1
              x := 0
            else {
              2: y := fib(n-1);
              3: z := fib(n-2);
              x := y+z;
            }
            return x;
          }
      • Size-one call strings: -; 1; 2; 3
      • Size-two call strings: -; 1::-; 2::1; 3::1; 2::2; 3::2; 2::3; 3::3
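The two lists can be reproduced by exploring the call graph and truncating each call string to its k most recent call sites. This is a Python sketch with contexts as tuples, most recent call site first, so `()` plays the role of `-` and `(2, 1)` of `2::1`; the table `CALL_SITES` and the function name are made up here.

```python
# Call site label -> (caller, callee) for the fib program.
CALL_SITES = {1: ("main", "fib"), 2: ("fib", "fib"), 3: ("fib", "fib")}

def reachable_contexts(k, entry="main"):
    """All call strings of length at most k that can arise, starting from `entry`."""
    seen = {(entry, ())}
    work = [(entry, ())]
    while work:
        func, ctx = work.pop()
        for site, (caller, callee) in CALL_SITES.items():
            if caller != func:
                continue
            new_ctx = ((site,) + ctx)[:k]   # push the site, keep the k newest
            if (callee, new_ctx) not in seen:
                seen.add((callee, new_ctx))
                work.append((callee, new_ctx))
    return {ctx for _, ctx in seen}

size_one = reachable_contexts(1)
size_two = reachable_contexts(2)
```

Truncation is what keeps the set of contexts finite even though fib is recursive.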

  22. Other Kinds of Contexts
      • Assumption sets
        – What are the states at the call site?
        – Example paper: "ESP: path-sensitive program verification in polynomial time"
      • Caller stack
        – The stack of caller functions
        – Less precise than call-site stack (2::3 versus fib::fib)
      • OO programs
        – Object sensitivity

  23. MISC.

  24. Flow Sensitivity
      • Dataflow analysis is flow sensitive
        – It takes into account the order of statements
        – E.g., "x:=1; y:=x" gets a different liveness analysis result from "y:=x; x:=1"
      • A flow-insensitive analysis
        – Does not consider the order of statements
        – E.g., a simple analysis that collects all the constants used in the program is flow-insensitive
          • "x:=1; y:=x" would produce {1}, and so would "y:=x; x:=1"
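Both analyses fit in a few lines of Python for straight-line code. In this sketch a statement is a pair `(lhs, rhs)` where `rhs` is either an integer constant or a variable name; that encoding and the function names are assumptions, and nothing is taken to be live at the end of the program.

```python
def constants_used(stmts):
    """Flow-insensitive: the order of statements cannot affect the result."""
    return {rhs for _, rhs in stmts if isinstance(rhs, int)}

def live_at_entry(stmts, live_out=frozenset()):
    """Flow-sensitive: walk the statements backwards, killing then generating."""
    live = set(live_out)
    for lhs, rhs in reversed(stmts):
        live.discard(lhs)            # the assignment overwrites lhs
        if isinstance(rhs, str):
            live.add(rhs)            # the assignment reads rhs
    return live

prog_a = [("x", 1), ("y", "x")]      # x:=1; y:=x
prog_b = [("y", "x"), ("x", 1)]      # y:=x; x:=1
```

`constants_used` gives `{1}` for both programs, while `live_at_entry` gives the empty set for `prog_a` and `{"x"}` for `prog_b`.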

  25. Path Sensitivity
      • Dataflow analysis is path insensitive
      [Figure: control-flow graph with nodes annotated AV]

  26. Path Sensitive Analysis
      • Example
          if (x>1) {t1 = x+y} else {t2 = x-y};
          if (x>1) {u = (x+y) - z}      ← Is "x+y" available here?
      • By conventional available expression analysis, "x+y" is not available
      • Path sensitive analysis
        – Associating information with edges
        – At the end of "if (x>1) …"
          • {(x>1, x+y), (x<=1, x-y)}
        – Then "x+y" is available inside the second branch
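The difference between the two analyses at the join point can be sketched in Python. Path conditions are plain strings compared syntactically here, which is a strong simplification (a real analysis would need to reason about the conditions), and the sketch relies on x and y not being reassigned between the two `if` statements; all names are illustrative.

```python
after_then = {"x+y"}   # expressions available after {t1 = x+y}
after_else = {"x-y"}   # expressions available after {t2 = x-y}

# Conventional analysis: keep only what is available on every incoming path.
conventional = after_then & after_else

# Path-sensitive analysis: keep one set of facts per path condition.
path_sensitive = {"x>1": after_then, "x<=1": after_else}

def available_under(facts, condition):
    """Expressions known to be available when `condition` holds."""
    return facts.get(condition, set())

inside_second_branch = available_under(path_sensitive, "x>1")
```

The conventional result is empty, whereas the path-sensitive lookup under `x>1` still contains `x+y`.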

  27. Analyzing the Heap
      • The heap poses a major challenge for static analysis
        – Many static analyses disregard the heap completely
        – Source of false positives and false negatives

  28. Pointer Analysis, Points-to Analysis, Alias Analysis
      • Example:
          int x = 3, y = 4;
          int *p = &x;
          int t = x + y;
          *p = 5;
          if (x+y > 10) {…}      ← Is "x+y" available here?
      • No! x was modified through its alias *p
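A small Python sketch shows how points-to information changes the kill step of available expressions. The statement encoding (`"assign"` for `target = operands`, `"store"` for `*target = …`), the `points_to` map and the function name are all invented for this illustration; an expression is identified by the tuple of variables it reads.

```python
def available(stmts, points_to=None):
    """Available expressions after straight-line code, optionally alias-aware."""
    avail = set()
    for kind, target, operands in stmts:
        if kind == "assign":
            written = {target}
        else:
            # "store": without points-to facts the write is (unsoundly) ignored.
            written = set(points_to.get(target, ())) if points_to else set()
        if operands:
            avail.add(operands)      # the statement computes this expression
        # Kill every expression that reads a variable just written.
        avail = {e for e in avail if not written & set(e)}
    return avail

program = [
    ("assign", "x", ()),             # int x = 3
    ("assign", "y", ()),             # int y = 4
    ("assign", "p", ()),             # int *p = &x
    ("assign", "t", ("x", "y")),     # int t = x + y
    ("store", "p", ()),              # *p = 5
]

naive = available(program)
alias_aware = available(program, points_to={"p": {"x"}})
```

Without the points-to fact the analysis wrongly reports `x + y` as still available; with it, the store through `p` kills the expression.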

  29. Shape Analysis
      • Dataflow analysis
        – Good at analyzing atomic values: labels, constants, variable names
        – Cannot easily extend to data structures in the heap: arrays, trees, lists, …
      • Shape analysis can analyze the shapes of data structures
        – A very active research area
