
Why Data Flow Models?



  1. Ch 6, slides 1-4 (c) 2007 Mauro Pezzè & Michal Young

     Dependence and Data Flow Models

     Why Data Flow Models?
     • Models from Chapter 5 emphasized control
       – Control flow graph, call graph, finite state machines
     • We also need to reason about dependence
       – Where does this value of x come from?
       – What would be affected by changing this?
       – ...
     • Many program analyses and test design techniques use data flow information
       – Often in combination with control flow
       – Example: “Taint” analysis to prevent SQL injection attacks
       – Example: Dataflow test criteria (Ch.13)

     Learning objectives
     • Understand basics of data-flow models and the related concepts (def-use pairs, dominators…)
     • Understand some analyses that can be performed with the data-flow model of a program
       – The data flow analyses to build models
       – Analyses that use the data flow models
     • Understand basic trade-offs in modeling data flow
       – Variations and limitations of data-flow models and analyses, differing in precision and cost

     Def-Use Pairs (1)
     • A def-use (du) pair associates a point in a program where a value is produced with a point where it is used
     • Definition: where a variable gets a value
       – Variable declaration (often the special value “uninitialized”)
       – Variable initialization
       – Assignment
       – Values received by a parameter
     • Use: extraction of a value from a variable
       – Expressions
       – Conditional statements
       – Parameter passing
       – Returns

  2. Ch 6, slides 5-8

     Def-Use Pairs

         ...
         if (...) {
             x = ... ;            // Definition: x gets a value
             ...
         }
         y = ... + x + ... ;      // Use: the value of x is extracted
         ...

     (Diagram: the corresponding flow graph, with the def-use path running from "x = ..." to "y = ... + x + ...".)

     Def-Use Pairs (3)

         /** Euclid's algorithm */
         public class GCD {
             public int gcd(int x, int y) {
                 int tmp;              // A: def x, y, tmp
                 while (y != 0) {      // B: use y
                     tmp = x % y;      // C: def tmp; use x, y
                     x = y;            // D: def x; use y
                     y = tmp;          // E: def y; use tmp
                 }
                 return x;             // F: use x
             }
         }

     Figure 6.2, page 79

     Def-Use Pairs (3)
     • A definition-clear path is a path along the CFG from a definition to a use of the same variable without* another definition of the variable between
       – If, instead, another definition is present on the path, then the latter definition kills the former
     • A def-use pair is formed if and only if there is a definition-clear path between the definition and the use

     *There is an over-simplification here, which we will repair later.

     Definition-Clear or Killing

         x = ...        // A: def x          (Definition: x gets a value)
         ...
         q = ...
         x = y;         // B: kill x, def x  (Definition: x gets a new value, old value is killed)
         z = ...
         y = f(x);      // C: use x          (Use: the value of x is extracted)

     • Path A..C is not definition-clear
     • Path B..C is definition-clear
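The kill rule on the "Definition-Clear or Killing" slide can be sketched for straight-line code as follows. This is a minimal illustration, not code from the slides: the class `StraightLineDefUse`, the `Stmt` helper, and the `"var: def -> use"` string format are hypothetical names chosen for the example. The unlabelled statements `q = ...` and `z = ...` are left out because they neither define nor use `x`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch: def-use pairs in straight-line code, where a later definition kills an earlier one. */
class StraightLineDefUse {
    /** Hypothetical statement model: a label, the variable defined (or null), the variables used. */
    static final class Stmt {
        final String label;
        final String def;
        final List<String> uses;
        Stmt(String label, String def, List<String> uses) {
            this.label = label;
            this.def = def;
            this.uses = uses;
        }
    }

    /** Returns pairs formatted as "var: defLabel -> useLabel". */
    static List<String> defUsePairs(List<Stmt> code) {
        Map<String, String> latestDef = new HashMap<>(); // variable -> label of its live definition
        List<String> pairs = new ArrayList<>();
        for (Stmt s : code) {
            // Uses are read before the statement's own definition takes effect.
            for (String v : s.uses) {
                String d = latestDef.get(v);
                if (d != null) pairs.add(v + ": " + d + " -> " + s.label);
            }
            // Overwriting the map entry is the "kill": the older definition can no longer reach.
            if (s.def != null) latestDef.put(s.def, s.label);
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<Stmt> code = Arrays.asList(
            new Stmt("A", "x", Collections.<String>emptyList()), // x = ...
            new Stmt("B", "x", Arrays.asList("y")),              // x = y;    kills A's x
            new Stmt("C", "y", Arrays.asList("x")));             // y = f(x); uses x
        System.out.println(defUsePairs(code));
    }
}
```

For this fragment the only pair reported for `x` is B to C, matching the slide: path A..C is not definition-clear, path B..C is. The use of `y` at B finds no pair because the fragment does not show a definition of `y`.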

  3. Ch 6, slides 9-12

     (Direct) Data Dependence Graph
     • A direct data dependence graph is:
       – Nodes: as in the control flow graph (CFG)
       – Edges: def-use (du) pairs, labelled with the variable name
     • Dependence edges show this x value could be the unchanged parameter or could be set at line D

     (Figure 6.3, page 80)

     Control dependence (1)
     • Data dependence: Where did these values come from?
     • Control dependence: Which statement controls whether this statement executes?
       – Nodes: as in the CFG
       – Edges: unlabelled, from entry/branching points to controlled blocks

     Dominators
     • Pre-dominators in a rooted, directed graph can be used to make this intuitive notion of “controlling decision” precise.
     • Node M dominates node N if every path from the root to N passes through M.
       – A node will typically have many dominators, but except for the root, there is a unique immediate dominator of node N which is closest to N on any path from the root, and which is in turn dominated by all the other dominators of N.
       – Because each node (except the root) has a unique immediate dominator, the immediate dominator relation forms a tree.
     • Post-dominators: Calculated in the reverse of the control flow graph, using a special “exit” node as the root.

     Dominators (example)
     (Diagram: flow graph over nodes A, B, C, D, E, F, G.)
     • A pre-dominates all nodes; G post-dominates all nodes
     • F and G post-dominate E
     • G is the immediate post-dominator of B
       – C does not post-dominate B
     • B is the immediate pre-dominator of G
       – F does not pre-dominate G
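The dominator definition above can be sketched as a simple fixed-point computation over sets. This is one straightforward way to compute dominators, not necessarily the one the authors have in mind. The edge list in `exampleGraph` is an assumption: the slide text names only the nodes A to G, and the edges here (A→B, B→C, B→E, C→D, D→G, E→F, F→G) were chosen because they agree with every statement on the example slide. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Sketch: dominator sets by fixed-point iteration. Every node must appear as a key of succ. */
class Dominators {
    /** Reverses every edge; used for predecessor lookup and for post-dominators. */
    static Map<String, List<String>> reverse(Map<String, List<String>> succ) {
        Map<String, List<String>> rev = new LinkedHashMap<>();
        for (String n : succ.keySet()) rev.put(n, new ArrayList<String>());
        for (Map.Entry<String, List<String>> e : succ.entrySet())
            for (String s : e.getValue()) rev.get(s).add(e.getKey());
        return rev;
    }

    /** dom(root) = {root}; dom(n) = {n} plus the intersection of dom(p) over predecessors p. */
    static Map<String, Set<String>> dominators(Map<String, List<String>> succ, String root) {
        Map<String, List<String>> pred = reverse(succ);
        Map<String, Set<String>> dom = new LinkedHashMap<>();
        for (String n : succ.keySet()) {
            Set<String> init = new TreeSet<>();
            if (n.equals(root)) init.add(root); else init.addAll(succ.keySet());
            dom.put(n, init);
        }
        boolean changed = true;
        while (changed) {                       // repeat until no set shrinks any further
            changed = false;
            for (String n : succ.keySet()) {
                if (n.equals(root)) continue;
                Set<String> next = new TreeSet<>(succ.keySet());
                for (String p : pred.get(n)) next.retainAll(dom.get(p));
                next.add(n);
                if (!next.equals(dom.get(n))) { dom.put(n, next); changed = true; }
            }
        }
        return dom;
    }

    /** Assumed edges for the A..G example; the slide text does not list them. */
    static Map<String, List<String>> exampleGraph() {
        Map<String, List<String>> g = new LinkedHashMap<>();
        g.put("A", Arrays.asList("B"));
        g.put("B", Arrays.asList("C", "E"));
        g.put("C", Arrays.asList("D"));
        g.put("D", Arrays.asList("G"));
        g.put("E", Arrays.asList("F"));
        g.put("F", Arrays.asList("G"));
        g.put("G", new ArrayList<String>());
        return g;
    }

    public static void main(String[] args) {
        Map<String, List<String>> g = exampleGraph();
        System.out.println("pre-dominators of G:  " + dominators(g, "A").get("G"));
        // Post-dominators: same computation on the reversed graph, rooted at the exit node G.
        System.out.println("post-dominators of B: " + dominators(reverse(g), "G").get("B"));
    }
}
```

Under the assumed edges, the pre-dominators of G come out as {A, B, G}, so F does not pre-dominate G, and the post-dominators of B come out as {B, G}, so C does not post-dominate B. Sets here include the node itself, so the immediate dominator is the closest member other than the node.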

  4. Ch 6, slides 13-16

     Control dependence (2)
     • We can use post-dominators to give a more precise definition of control dependence:
       – Consider again a node N that is reached on some but not all execution paths.
       – There must be some node C with the following property:
         • C has at least two successors in the control flow graph (i.e., it represents a control flow decision);
         • C is not post-dominated by N;
         • there is a successor of C in the control flow graph that is post-dominated by N.
       – When these conditions are true, we say node N is control-dependent on node C.
     • Intuitively: C was the last decision that controlled whether N executed

     Control Dependence
     (Diagram: flow graph over nodes A, B, C, D, E, F, G.)
     • Execution of F is not inevitable at B
     • Execution of F is inevitable at E
     • F is control-dependent on B, the last point at which its execution was not inevitable

     Data Flow Analysis
     Computing data flow information

     Calculating def-use pairs
     • Definition-use pairs can be defined in terms of paths in the program control flow graph:
       – There is an association (d,u) between a definition of variable v at d and a use of variable v at u iff
         • there is at least one control flow path from d to u
         • with no intervening definition of v.
       – v_d reaches u (v_d is a reaching definition at u).
       – If a control flow path passes through another definition e of the same variable v, v_e kills v_d at that point.
     • Even if we consider only loop-free paths, the number of paths in a graph can be exponentially larger than the number of nodes and edges.
     • Practical algorithms therefore do not search every individual path. Instead, they summarize the reaching definitions at a node over all the paths reaching that node.
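The three conditions for control dependence can be checked mechanically once post-dominator sets are available. The sketch below does this directly from the definition. As before, the edges in `exampleGraph` are an assumption chosen to agree with the statements on the slides, and `ControlDependence`, `postDominators`, and `controllers` are illustrative names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Sketch: control dependence from post-dominators. Every node must appear as a key of succ. */
class ControlDependence {
    /** pdom(exit) = {exit}; pdom(n) = {n} plus the intersection of pdom(s) over successors s. */
    static Map<String, Set<String>> postDominators(Map<String, List<String>> succ, String exit) {
        Map<String, Set<String>> pdom = new LinkedHashMap<>();
        for (String n : succ.keySet()) {
            Set<String> init = new TreeSet<>();
            if (n.equals(exit)) init.add(exit); else init.addAll(succ.keySet());
            pdom.put(n, init);
        }
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String n : succ.keySet()) {
                if (n.equals(exit)) continue;
                Set<String> next = new TreeSet<>(succ.keySet());
                for (String s : succ.get(n)) next.retainAll(pdom.get(s));
                next.add(n);
                if (!next.equals(pdom.get(n))) { pdom.put(n, next); changed = true; }
            }
        }
        return pdom;
    }

    /** Returns every node c such that n is control-dependent on c. */
    static Set<String> controllers(Map<String, List<String>> succ, String exit, String n) {
        Map<String, Set<String>> pdom = postDominators(succ, exit);
        Set<String> result = new TreeSet<>();
        for (String c : succ.keySet()) {
            boolean isDecision = succ.get(c).size() >= 2;          // condition 1
            boolean notPostDominated = !pdom.get(c).contains(n);   // condition 2
            boolean successorPostDominated = false;                // condition 3
            for (String s : succ.get(c))
                if (pdom.get(s).contains(n)) successorPostDominated = true;
            if (isDecision && notPostDominated && successorPostDominated) result.add(c);
        }
        return result;
    }

    /** Assumed edges for the A..G example; the slide text does not list them. */
    static Map<String, List<String>> exampleGraph() {
        Map<String, List<String>> g = new LinkedHashMap<>();
        g.put("A", Arrays.asList("B"));
        g.put("B", Arrays.asList("C", "E"));
        g.put("C", Arrays.asList("D"));
        g.put("D", Arrays.asList("G"));
        g.put("E", Arrays.asList("F"));
        g.put("F", Arrays.asList("G"));
        g.put("G", new ArrayList<String>());
        return g;
    }

    public static void main(String[] args) {
        System.out.println("F is control-dependent on: " + controllers(exampleGraph(), "G", "F"));
    }
}
```

Under the assumed edges, F is control-dependent on B only: B has two successors, F does not post-dominate B, and F post-dominates B's successor E. Node G is control-dependent on nothing, since it post-dominates every node.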

  5. Ch 6, slides 17-20

     Exponential paths (even without loops)
     (Diagram: flow graph through nodes A, B, C, D, E, F, G, V.)
     • 2 paths from A to B
     • 4 from A to C
     • 8 from A to D
     • 16 from A to E
     • ...
     • 128 paths from A to V
     • Tracing each path is not efficient, and we can do much better.

     DF Algorithm
     • An efficient algorithm for computing reaching definitions (and several other properties) is based on the way reaching definitions at one node are related to the reaching definitions at an adjacent node.
     • Suppose we are calculating the reaching definitions of node n, and there is an edge (p,n) from an immediate predecessor node p.
       – If the predecessor node p can assign a value to variable v, then the definition v_p reaches n. We say the definition v_p is generated at p.
       – If a definition v_d of variable v reaches a predecessor node p, and if v is not redefined at that node (in which case we say v_d is killed at that point), then the definition is propagated on from p to n.

     Equations of node E (y = tmp)

         public class GCD {
             public int gcd(int x, int y) {
                 int tmp;              // A: def x, y, tmp
                 while (y != 0) {      // B: use y
                     tmp = x % y;      // C: def tmp; use x, y
                     x = y;            // D: def x; use y
                     y = tmp;          // E: def y; use tmp
                 }
                 return x;             // F: use x
             }
         }

     Calculate reaching definitions at E in terms of its immediate predecessor D:
     • Reach(E) = ReachOut(D)
     • ReachOut(E) = (Reach(E) \ {y_A}) ∪ {y_E}

     Equations of node B (while (y != 0))
     (Same GCD code as on the previous slide.)
     This line has two predecessors: before the loop (A) and the end of the loop (E).
     • Reach(B) = ReachOut(A) ∪ ReachOut(E)
     • ReachOut(A) = gen(A) = {x_A, y_A, tmp_A}
     • ReachOut(E) = (Reach(E) \ {y_A}) ∪ {y_E}
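The gen/kill equations above can be solved for the whole GCD method with a worklist iteration. The following is a minimal sketch under stated assumptions rather than the authors' implementation: definitions are encoded as strings such as `x_A` (variable, underscore, node label), variable names are assumed to contain no underscore, and `ReachingDefinitions`, `gcdSuccessors`, and `gcdDefs` are illustrative names. The kill step here removes every other definition of the redefined variable, which gives the same result as the slide's `\ {y_A}` at node E.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Sketch: reaching definitions by worklist iteration. Every node must be a key of both maps. */
class ReachingDefinitions {
    /** Reach(n) = union of ReachOut(p) over predecessors p; ReachOut(n) = (Reach(n) \ kill) ∪ gen. */
    static Map<String, Set<String>> reach(Map<String, List<String>> succ,
                                          Map<String, List<String>> defs) {
        Map<String, List<String>> pred = new LinkedHashMap<>();
        Map<String, Set<String>> reach = new LinkedHashMap<>();
        Map<String, Set<String>> out = new LinkedHashMap<>();
        for (String n : succ.keySet()) {
            pred.put(n, new ArrayList<String>());
            reach.put(n, new TreeSet<String>());
            out.put(n, new TreeSet<String>());
        }
        for (String p : succ.keySet())
            for (String s : succ.get(p)) pred.get(s).add(p);

        Deque<String> work = new ArrayDeque<>(succ.keySet());
        while (!work.isEmpty()) {
            String n = work.remove();
            Set<String> in = new TreeSet<>();
            for (String p : pred.get(n)) in.addAll(out.get(p));     // merge over predecessors
            reach.put(n, in);
            Set<String> newOut = new TreeSet<>();
            for (String d : in) {
                String var = d.substring(0, d.indexOf('_'));
                if (!defs.get(n).contains(var)) newOut.add(d);      // propagated, not killed
            }
            for (String v : defs.get(n)) newOut.add(v + "_" + n);   // generated at n
            if (!newOut.equals(out.get(n))) {
                out.put(n, newOut);
                for (String s : succ.get(n)) if (!work.contains(s)) work.add(s);
            }
        }
        return reach;
    }

    /** Control flow of gcd, using the node labels A..F from the code comments. */
    static Map<String, List<String>> gcdSuccessors() {
        Map<String, List<String>> g = new LinkedHashMap<>();
        g.put("A", Arrays.asList("B"));
        g.put("B", Arrays.asList("C", "F"));
        g.put("C", Arrays.asList("D"));
        g.put("D", Arrays.asList("E"));
        g.put("E", Arrays.asList("B"));
        g.put("F", new ArrayList<String>());
        return g;
    }

    /** Variables defined at each node, taken from the "def" comments in the GCD code. */
    static Map<String, List<String>> gcdDefs() {
        Map<String, List<String>> d = new LinkedHashMap<>();
        d.put("A", Arrays.asList("x", "y", "tmp"));
        d.put("B", new ArrayList<String>());
        d.put("C", Arrays.asList("tmp"));
        d.put("D", Arrays.asList("x"));
        d.put("E", Arrays.asList("y"));
        d.put("F", new ArrayList<String>());
        return d;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> r = reach(gcdSuccessors(), gcdDefs());
        System.out.println("Reach(B) = " + r.get("B"));
        System.out.println("Reach(E) = " + r.get("E"));
    }
}
```

Each node is revisited only when the output of one of its predecessors changes, so the work is bounded by how often sets can grow, not by the number of paths. For the GCD graph, Reach(B) ends up containing both definitions of each variable (from A and from inside the loop), which reflects the two predecessors of B.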
