CO444H Dataflow Dataflow frameworks Ben Livshits Masters Projects - PowerPoint PPT Presentation


  1. Static analysis (CO444H) • Dataflow • Dataflow frameworks • Ben Livshits

  2. Master’s Projects Available
     1. Crashes to exploits
     2. Pointer analysis for JavaScript
     3. Private data management languages
     4. Programming robots to assemble IKEA furniture
     5. Project in software security
     6. Security vulnerabilities in web browsers
     7. Toward auditable financial software
     8. User tracking in mobile browsers

  3. We are in the Idealized World of CFGs [Figure: control-flow graph whose blocks contain the statements t = x+y, a = t, b = t, and c = t]

  4. Data Flow Equations

  5. Dataflow Analysis • Computes facts about values in the program • Little or no interaction between facts • Based on all paths through program • Including, sometimes, infeasible paths • Let’s consider some dataflow analyses …

  6. Some Static Analysis Goals • For example • What values can integer x have? • What locations can pointer p point to? • Can double y be negative? • Can it assume the value 17? • etc. • This is static reasoning: we are approximating runtime execution here

  7. Static vs. Runtime
     • How can we approximate the possible values of i?
     • What can we conclude on the basis of this code?
         i = 1;
         while(true){
           i = i + 2;
           if(…) break;
         }
     • How about now?
         i = 1;
         while(i < 1000){
           i = i + 2;
           a = i*2;
         }

  8. Examples of Dataflow Analysis • We will cover three common types of analysis • Reaching definitions • Available expressions • Live variables

  9. Reaching Definitions

  10. Reaching Definitions • We will start this discussion by talking about an analysis called Reaching Definitions… • A basic block can generate a definition • For a definition that reaches it, a basic block can either • Kill the definition of x if the block surely redefines x • Transmit the definition if the block might not redefine the variable(s) that the definition assigns

  11. IN and OUT The following sets are defined: • IN(B) = set of definitions reaching the beginning of block B • OUT(B) = set of definitions reaching the end of B

  12. Equations Two kinds of equations: • Confluence equations: IN(B) in terms of OUTs of predecessors of B • Transfer equations: OUT(B) in terms of IN(B) and what goes on in block B

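  13. Confluence Equations $\mathrm{IN}(B) = \bigcup_{P \in \mathrm{pred}(B)} \mathrm{OUT}(P)$, where pred(B) is the set of predecessors of B • Example: if B has predecessors P1 and P2 with OUT(P1) = {d1, d2} and OUT(P2) = {d2, d3}, then IN(B) = {d1, d2, d3}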

  14. Transfer Equations • Generate a definition in the block if its variable is not definitely rewritten later in the basic block • Kill a definition if its variable is definitely rewritten in the block • An internal definition may be both killed and generated

  15. Example: GEN and KILL • For each basic block B1, B2, B3 we can compute GEN and KILL sets independently • These will be part of the transfer function
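The per-block computation of GEN and KILL for reaching definitions can be sketched as follows. This is a minimal illustration, not the lecture's own code: the function name `gen_kill`, the `(definition id, variable)` statement encoding, and the three-block program (B1 defines x as d1, B2 defines nothing, B3 defines x as d2) are assumptions made for the example. The sketch leaves a block's own surviving definitions out of KILL, which does not change OUT(B) = (IN(B) – Kill(B)) ∪ Gen(B).

```python
from collections import defaultdict

def gen_kill(blocks):
    """blocks: {block name: [(definition id, variable), ...]} in program order.

    Hypothetical encoding: only assignments matter for reaching definitions.
    """
    # Collect every definition of each variable anywhere in the program.
    defs_of = defaultdict(set)
    for stmts in blocks.values():
        for d, var in stmts:
            defs_of[var].add(d)

    gen, kill = {}, {}
    for name, stmts in blocks.items():
        last = {}  # variable -> last definition of it inside this block
        for d, var in stmts:
            last[var] = d
        # Only the last definition of each variable survives to the block end.
        gen[name] = set(last.values())
        # Every other definition of a variable assigned here is killed.
        kill[name] = set()
        for var in last:
            kill[name] |= defs_of[var]
        kill[name] -= gen[name]
    return gen, kill

blocks = {
    "B1": [("d1", "x")],  # d1: x = 5
    "B2": [],             # if x == 10 (no definitions)
    "B3": [("d2", "x")],  # d2: x = 15
}
gen, kill = gen_kill(blocks)
print(gen)
print(kill)
```

Each block's sets depend only on that block's statements plus the global list of definitions, which is why they can be computed once, before iteration starts.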

  16. Transfer Function for a Block Connecting IN and OUT sets… For any block B: OUT(B) = (IN(B) – Kill(B)) ∪ Gen(B)

  17. Iterative Solution --- (2)
      IN(entry) = ∅;
      for each block B do OUT(B) = ∅;
      while (changes occur) do
        for each block B do {
          IN(B) = ∪ { OUT(P) : P a predecessor of B };
          OUT(B) = (IN(B) – Kill(B)) ∪ Gen(B);
        }
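The iteration above translates almost line for line into Python sets. The sketch below is one possible rendering under stated assumptions: the function name `reaching_definitions` and the dictionary encoding are hypothetical, and the CFG edges (B1 to B2, B2 to B3, B3 back to B2) are inferred from the IN sets of the three-block example in this deck rather than given explicitly.

```python
def reaching_definitions(preds, gen, kill):
    """Round-robin iteration to the least fixed point.

    preds: {block: [predecessor blocks]}; the entry block has none,
    so its IN set stays empty, matching IN(entry) = ∅.
    """
    in_ = {b: set() for b in preds}
    out = {b: set() for b in preds}   # OUT(B) = ∅ initially
    changed = True
    while changed:
        changed = False
        for b in preds:
            # Confluence: union over all predecessors.
            in_[b] = set().union(*(out[p] for p in preds[b]))
            # Transfer: OUT(B) = (IN(B) - Kill(B)) ∪ Gen(B).
            new_out = (in_[b] - kill[b]) | gen[b]
            if new_out != out[b]:
                out[b] = new_out
                changed = True
    return in_, out

# Assumed CFG: B1 -> B2, B2 -> B3, B3 -> B2 (a loop).
preds = {"B1": [], "B2": ["B1", "B3"], "B3": ["B2"]}
gen = {"B1": {"d1"}, "B2": set(), "B3": {"d2"}}
kill = {"B1": {"d2"}, "B2": set(), "B3": {"d1"}}

in_, out = reaching_definitions(preds, gen, kill)
for b in preds:
    print(b, sorted(in_[b]), sorted(out[b]))
```

Starting every OUT set at ∅ and only ever growing the sets is what makes the result the least fixed point.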

  18. Iterative Solution to Equations • For an n-block flow graph, there are 2*n equations and 2*n unknowns. • Alas, the solution is not unique. • Standard theory assumes a field of constants; sets are not a field. • Use an iterative solution to get the least fixed point. • It identifies any def that might reach a point

  19. Reaching Definitions: Algorithm in Action
      Block  Contents      IN          OUT
      B1     d1: x = 5     {}          {d1}
      B2     if x == 10    {d1, d2}    {d1, d2}
      B3     d2: x = 15    {d1, d2}    {d2}

  20. A bit-vector representation for greater computational efficiency
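One way to picture the bit-vector idea: give each definition one bit, so that union, intersection, and set difference become single bitwise operations on machine words. The sketch below uses Python integers as bit vectors; the definition names and the helper names `to_bits`, `to_names`, and `transfer` are assumptions for illustration only.

```python
DEFS = ["d1", "d2", "d3"]                    # hypothetical definitions
BIT = {d: 1 << i for i, d in enumerate(DEFS)}

def to_bits(names):
    """Encode a set of definition names as an integer bit vector."""
    v = 0
    for n in names:
        v |= BIT[n]
    return v

def to_names(v):
    """Decode a bit vector back into a set of definition names."""
    return {d for d in DEFS if v & BIT[d]}

def transfer(in_bits, gen_bits, kill_bits):
    # OUT = (IN - KILL) ∪ GEN: difference is AND-NOT, union is OR.
    return (in_bits & ~kill_bits) | gen_bits

in_b = to_bits({"d1", "d2"})
out_b = transfer(in_b, gen_bits=to_bits({"d3"}), kill_bits=to_bits({"d1"}))
print(sorted(to_names(out_b)))
```

With this encoding the confluence step is `|` for a union-based analysis and `&` for an intersection-based one.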

  21. Aside: Notice the Conservatism • We make not only the most conservative assumption about when a def is KILLed or GEN’d • but also the conservative assumption that any path in the flow graph can actually be taken • Also, this is a may analysis, not a must analysis

  22. Available Expressions

  23. Another Data-Flow Problem: Available Expressions • An expression x+y is available at a point if, no matter what path has been taken to that point from the entry, x+y has been evaluated, and neither x nor y has possibly been redefined since that evaluation • Useful for global common-subexpression elimination

  24. Available expressions example • Watch out for things that are possibly KILLed by an assignment (2010 Stephen Chong, Harvard University)

  25. Defining GEN(B) and KILL(B) • An expression x+y is generated if it is computed in B, and afterwards there is no possibility that either x or y is redefined • An expression x+y is killed if it is not generated in B and either x or y is possibly redefined
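These two definitions can be turned into a single pass over a block's statements. The sketch below is an illustration under assumptions: expressions are encoded as hypothetical `(left, operator, right)` tuples, each statement is a `(target, expression)` pair, and the universe of expressions is supplied by the caller. The sample block uses the statements x = x+y and z = a+b.

```python
def ae_gen_kill(stmts, all_exprs):
    """stmts: [(target, expr)], where expr is (left, op, right) or None."""
    gen, kill = set(), set()
    for target, expr in stmts:
        if expr is not None:
            # Evaluating the expression makes it available ...
            gen.add(expr)
            kill.discard(expr)
        # ... but assigning the target invalidates every expression that
        # reads it, including the one just evaluated (as in x = x+y).
        for e in all_exprs:
            if target in (e[0], e[2]):
                gen.discard(e)
                kill.add(e)
    return gen, kill

X_PLUS_Y = ("x", "+", "y")
W_TIMES_X = ("w", "*", "x")
A_PLUS_B = ("a", "+", "b")
Z_MINUS_W = ("z", "-", "w")
X_PLUS_Z = ("x", "+", "z")
ALL = {X_PLUS_Y, W_TIMES_X, A_PLUS_B, Z_MINUS_W, X_PLUS_Z}

block = [("x", X_PLUS_Y), ("z", A_PLUS_B)]   # x = x+y; z = a+b
ae_gen, ae_kill = ae_gen_kill(block, ALL)
print(ae_gen)
print(ae_kill)
```

Because an expression is dropped from one set whenever it enters the other, GEN and KILL stay disjoint, matching "killed if it is not generated in B".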

  26. Equations for Available Expressions • The equations for AE are essentially the same as for RD, with one exception • Confluence of paths involves intersection of sets of expressions rather than union of sets of definitions • Available expressions is a forward must analysis • Forward means that data facts flow from IN to OUT • Must means that, at join points, we keep only the facts that hold on all of the paths being joined

  27. Example of GEN and KILL for Available Expressions
      • x = x+y: kills x+y, w*x, etc.
      • z = a+b: generates a+b; kills z-w, x+z, etc.

  28. Transfer Equations • Transfer equation is exactly the same as before: OUT(B) = (IN(B) – Kill(B)) ∪ Gen(B) • Which is good: we can use the same template for all GEN/KILL problems

  29. Confluence Equations • Confluence involves intersection, because an expression is available coming into a block if and only if it is available coming out of each predecessor: $\mathrm{IN}(B) = \bigcap_{P \in \mathrm{pred}(B)} \mathrm{OUT}(P)$, where pred(B) is the set of predecessors of B

  30. Iterative Solution
      IN(entry) = ∅;
      for each block B do OUT(B) = ALL;
      while (changes occur) do
        for each block B do {
          IN(B) = ∩ { OUT(P) : P a predecessor of B };
          OUT(B) = (IN(B) – Kill(B)) ∪ Gen(B);
        }
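A Python rendering of this loop differs from the reaching-definitions version only in the initial value (ALL) and the meet (intersection). The sketch below assumes a hypothetical diamond-shaped CFG: B1 computes x+y, B2 redefines x, B3 does nothing relevant, and B4 joins B2 and B3. It also assumes every block other than the entry has at least one predecessor.

```python
def available_expressions(preds, gen, kill, all_exprs, entry):
    in_ = {b: set() for b in preds}
    out = {b: set(all_exprs) for b in preds}   # OUT(B) = ALL initially
    changed = True
    while changed:
        changed = False
        for b in preds:
            if b == entry:
                in_[b] = set()                 # IN(entry) = ∅
            else:
                # Confluence: intersection over all predecessors.
                in_[b] = set(all_exprs).intersection(
                    *(out[p] for p in preds[b]))
            new_out = (in_[b] - kill[b]) | gen[b]
            if new_out != out[b]:
                out[b] = new_out
                changed = True
    return in_, out

E = ("x", "+", "y")
preds = {"B1": [], "B2": ["B1"], "B3": ["B1"], "B4": ["B2", "B3"]}
gen = {"B1": {E}, "B2": set(), "B3": set(), "B4": set()}
kill = {"B1": set(), "B2": {E}, "B3": set(), "B4": set()}

ae_in, ae_out = available_expressions(preds, gen, kill, {E}, "B1")
print(ae_in["B4"])   # x+y is killed on the path through B2
```

Starting from ALL and only ever shrinking the sets means an expression is dropped only when some path actually justifies dropping it.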

  31. Why It Works • An expression x+y is unavailable at point p iff there is a path from the entry to p that either: 1. Never evaluates x+y, or 2. Kills x+y after its last evaluation • IN(entry) = ∅ takes care of #1 above • OUT(B) = ALL, plus intersection during iteration handles #2 above

  32. Example of Why We Want Intersection [Figure: paths from Entry to point p; on one path x+y is killed, on the others x+y is never GEN’d]

  33. Subtle Point • It is conservative to assume an expression isn’t available, even if it is • But we don’t have to be “insanely conservative” • If after considering all paths, and assuming x+y killed by any possibility of redefinition, we still can’t find a path explaining its unavailability, then x+y is available • This is a delicate dance between soundness and precision

  34. How Would the Algorithm Change for a Backwards Analysis?

  35. Live Variables

  36. Live Variable Analysis • Variable x is live at a point p if on some path from p, x is used before it is redefined • Useful in code generation: if x is not live on exit from a basic block, there is no need to copy x from a register to memory • Captures whether there is demand for a variable

  37. Equations for Live Variables • LV is essentially a “backwards” version of RD • In place of GEN(B): Use(B) = set of variables x possibly used in B prior to any certain definition of x • In place of KILL(B): Def(B) = set of variables x certainly defined before any possible use of x

  38. Transfer Equations • Transfer equations give IN’s in terms of OUT’s: IN(B) = (OUT(B) – Def(B)) ∪ Use(B) • This is a little different: the direction is reversed

  39. Confluence Equations • Confluence involves union over successors, so a variable is in OUT(B) if it is live on entry to any of B’s successors: $\mathrm{OUT}(B) = \bigcup_{S \in \mathrm{succ}(B)} \mathrm{IN}(S)$, where succ(B) is the set of successors of B

  40. Iterative Solution for Live Variables
      OUT(exit) = ∅;
      for each block B do IN(B) = ∅;
      while (changes occur) do
        for each block B do {
          OUT(B) = ∪ { IN(S) : S a successor of B };
          IN(B) = (OUT(B) – Def(B)) ∪ Use(B);
        }
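The backward iteration can be sketched in Python much like the forward ones, with successors taking the place of predecessors. The example program is the second loop from the "Static vs. Runtime" slide; how it is split into blocks B1 to B4, and the Use/Def sets that follow from that split, are assumptions made for this illustration.

```python
def live_variables(succs, use, defs):
    """succs: {block: [successor blocks]}; the exit block has none,
    so its OUT set stays empty, matching OUT(exit) = ∅."""
    in_ = {b: set() for b in succs}            # IN(B) = ∅ initially
    out = {b: set() for b in succs}
    changed = True
    while changed:
        changed = False
        for b in succs:
            # Confluence: union over all successors.
            out[b] = set().union(*(in_[s] for s in succs[b]))
            # Transfer: IN(B) = (OUT(B) - Def(B)) ∪ Use(B).
            new_in = (out[b] - defs[b]) | use[b]
            if new_in != in_[b]:
                in_[b] = new_in
                changed = True
    return in_, out

# B1: i = 1;  B2: while (i < 1000);  B3: i = i + 2; a = i*2;  B4: exit
succs = {"B1": ["B2"], "B2": ["B3", "B4"], "B3": ["B2"], "B4": []}
use = {"B1": set(), "B2": {"i"}, "B3": {"i"}, "B4": set()}
defs = {"B1": {"i"}, "B2": set(), "B3": {"a"}, "B4": set()}

lv_in, lv_out = live_variables(succs, use, defs)
for b in succs:
    print(b, sorted(lv_in[b]), sorted(lv_out[b]))
```

In this example a never appears in any live set, so under these assumptions the assignment a = i*2 is dead and its value never needs to be stored.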

  41. Data-Flow Frameworks • Lattice-Theoretic Formulation • Meet-Over-Paths Solution • Monotonicity/Distributivity

  42. Data-Flow Analysis Frameworks • Generalizes and unifies each of the DFA examples from the previous lecture. • Important ingredients:
      Element             Symbol  Explanation
      Direction           D       forward or backward
      Domain              V       possible values for IN, OUT
      Meet operator       ∧       effect of path confluence
      Transfer functions  F       effect of passing through a basic block

  43. Good News! • All three analyses above fit the model • RD: forward, meet = union, transfer functions based on GEN and KILL • AE: forward, meet = intersection, transfer functions based on GEN and KILL • LV: backward, meet = union, transfer functions based on USE and DEF
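To make the "same model" claim concrete, here is a sketch of one solver parameterized by direction, meet, and transfer function, instantiated for RD and LV. All names (`solve`, `start`, `boundary`, `init`) and the four-block CFG are assumptions for illustration; the sketch further assumes a single entry with no predecessors and a single exit with no successors.

```python
from functools import reduce

def solve(succs, start, forward, meet, init, boundary, transfer):
    """Returns (before, after): facts flowing into and out of each block
    in the direction of the analysis (IN/OUT if forward, OUT/IN if not)."""
    preds = {b: [] for b in succs}
    for b, targets in succs.items():
        for t in targets:
            preds[t].append(b)
    sources = preds if forward else succs      # where facts come from
    before = {b: init for b in succs}
    after = {b: init for b in succs}
    changed = True
    while changed:
        changed = False
        for b in succs:
            facts = [after[s] for s in sources[b]]
            # init acts as the top element, so it is a neutral start value.
            before[b] = boundary if b == start else reduce(meet, facts, init)
            new = transfer(b, before[b])
            if new != after[b]:
                after[b] = new
                changed = True
    return before, after

EMPTY = frozenset()
succs = {"B1": ["B2"], "B2": ["B3", "B4"], "B3": ["B2"], "B4": []}
union = lambda a, b: a | b

# Reaching definitions: forward, meet = union, GEN/KILL transfer.
gen = {"B1": {"d1"}, "B2": set(), "B3": {"d2"}, "B4": set()}
kill = {"B1": {"d2"}, "B2": set(), "B3": {"d1"}, "B4": set()}
rd_before, rd_after = solve(succs, "B1", True, union, EMPTY, EMPTY,
                            lambda b, x: (x - kill[b]) | gen[b])

# Live variables: backward, meet = union, USE/DEF transfer.
use = {"B1": set(), "B2": {"x"}, "B3": set(), "B4": set()}
defs = {"B1": {"x"}, "B2": set(), "B3": {"x"}, "B4": set()}
lv_before, lv_after = solve(succs, "B4", False, union, EMPTY, EMPTY,
                            lambda b, x: (x - defs[b]) | use[b])

print(sorted(rd_before["B2"]), sorted(lv_after["B2"]))
```

Available expressions would fit the same function with forward direction, intersection as the meet, and the full expression set as `init`, while keeping the empty set as `boundary`.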

  44. May vs. Must Analysis
                 May                   Must
      Forward    Reaching definitions  Available expressions
      Backward   Live variables        Very busy expressions

  45. Semilattices We say that a set V and an operation meet (denoted ∧) form a semilattice if for all x, y, and z in V:
      1. x ∧ x = x (idempotence)
      2. x ∧ y = y ∧ x (commutativity)
      3. x ∧ (y ∧ z) = (x ∧ y) ∧ z (associativity)
      4. There is a top element ⊤ such that for all x, ⊤ ∧ x = x.
      5. Optionally, there is a bottom element ⊥ such that for all x, ⊥ ∧ x = ⊥.
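For a small finite domain these laws can simply be checked by brute force. The sketch below does so for the powerset of three hypothetical definitions, with union and intersection as candidate meets; the helper names `powerset`, `is_semilattice`, and `top_of` are assumptions, and the check is exhaustive only because the domain has eight elements.

```python
from itertools import combinations, product

def powerset(universe):
    items = list(universe)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_semilattice(values, meet):
    """Check idempotence, commutativity, and associativity exhaustively."""
    idem = all(meet(x, x) == x for x in values)
    comm = all(meet(x, y) == meet(y, x)
               for x, y in product(values, repeat=2))
    assoc = all(meet(x, meet(y, z)) == meet(meet(x, y), z)
                for x, y, z in product(values, repeat=3))
    return idem and comm and assoc

def top_of(values, meet):
    """Return the element t with meet(t, x) == x for all x, if any."""
    for t in values:
        if all(meet(t, x) == x for x in values):
            return t
    return None

V = powerset({"d1", "d2", "d3"})
union = lambda a, b: a | b
intersection = lambda a, b: a & b
difference = lambda a, b: a - b

print(is_semilattice(V, union), sorted(top_of(V, union)))
print(is_semilattice(V, intersection), sorted(top_of(V, intersection)))
print(is_semilattice(V, difference))
```

With union as the meet the top element is the empty set, and with intersection it is the full set, which matches the initial values ∅ and ALL used in the iterative algorithms for RD and AE.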
