DART: Directed Automated Random Testing, PLDI 2005, Patrice Godefroid

Title slide (1/20): Patrice Godefroid (1), Nils Klarlund (1), Koushik Sen (2). (1) Bell Laboratories, Lucent Technologies; (2) University of Illinois at Urbana-Champaign. Presented by Markus, November 10, 2015.


Introduction (slide 3/20)

```c
#include <stdlib.h>

int mul2(int x) {
  return 2 * x;
}

int h(int x, int y) {
  if (x != y) {
    if (mul2(x) == x + 10) {
      abort(); /* error statement */
    }
  }
  return 0;
}
```

◮ Automated random testing
◮ Hard to guess constraints (x == 10)
◮ Directed random testing
◮ Specify reachability as constraints

• To better illustrate why program testing is hard, and the difficulties with current automated techniques, we'll look at this example program.
• Here we have the function h, which we would like to test. We've encoded an error statement in h using the abort statement.
• Two conditions guard the reachability of the abort statement: x must not be equal to y, and the result of calling mul2 on x must be equal to x + 10.
• Random testing is one automated testing technique: it simply applies random inputs to the function under test in the hope of executing different paths.
• Random testing is attractive because it has very low overhead, but it often has difficulty exercising new paths within the program.
• Specifically, the second condition holds only when x equals 10; with 32-bit integers, a random guess for x has only a 1 in $2^{32}$ chance of being correct.
• With such a low probability, random testing will very likely achieve low coverage on this function.
• An alternative approach is what the authors call directed testing.
• Here, the inputs required to reach a specific point in the program are specified as a set of constraints whose satisfying assignments are exactly the inputs that reach that location.
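To make the probability argument concrete, here is a hedged C sketch of pure random testing on h. The helper names (h_reaches_abort, xorshift32, random_test_h) are hypothetical: h_reaches_abort is a non-aborting model of h so a harness can count hits instead of crashing, and it does its arithmetic in int64_t (an assumption of this sketch) to avoid signed overflow in 2 * x. Because the second condition reduces to x == 10, each uniformly random pair reaches abort() with probability of roughly 1 in $2^{32}$.

```c
#include <assert.h>
#include <stdint.h>

/* Non-aborting model of h(): returns 1 exactly when h() would call abort().
   int64_t arithmetic is an assumption that avoids overflow in 2 * x. */
static int h_reaches_abort(int32_t x, int32_t y) {
    if (x != y) {
        if (2 * (int64_t)x == (int64_t)x + 10) {
            return 1;
        }
    }
    return 0;
}

/* Tiny xorshift32 generator so the sketch is deterministic and portable;
   any uniform 32-bit source would do. */
static uint32_t xorshift32(uint32_t *state) {
    uint32_t s = *state;
    s ^= s << 13;
    s ^= s >> 17;
    s ^= s << 5;
    *state = s;
    return s;
}

/* Pure random testing: draw `trials` random (x, y) pairs and count how
   many of them reach the abort() call in h. */
static long random_test_h(long trials, uint32_t seed) {
    uint32_t state = seed;
    long hits = 0;
    assert(seed != 0); /* xorshift state must be non-zero */
    for (long i = 0; i < trials; i++) {
        int32_t x = (int32_t)xorshift32(&state);
        int32_t y = (int32_t)xorshift32(&state);
        hits += h_reaches_abort(x, y);
    }
    return hits;
}
```

Calling random_test_h(1000000, 1) runs a million random trials; given the odds above it would almost certainly report zero hits, whereas the single directed input x = 10, y = 1000 reaches abort() on the first try.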

Introduction (slide 4/20)

Code: mul2 and h, as on slide 3/20.

◮ Input one: x = 20, y = 1000
◮ Second branch not taken: $40 \neq 20 + 10$
◮ Path constraint: $(x \neq y) \wedge (2x \neq x + 10)$
◮ Direct tester to new paths
◮ Alter path constraint & solve
◮ New constraint: $(x \neq y) \wedge (2x = x + 10)$
◮ $x = 10 \wedge y = 1000$

• To better understand the concept of directed testing, we'll continue with this example.
• Suppose we randomly generate the following inputs to h: x equal to 20 and y equal to 1000.
• With this input, the first branch (x not equal to y) is taken, but the second is not, since mul2 returns 40 and 40 is not equal to 30.
• Given this execution, we can capture its path constraint: a logical formula that characterizes all program inputs that follow the same path.
• Specifically, this path constraint states that x is not equal to y and 2x is not equal to x + 10; intuitively, these conditions represent the first branch being taken and the second one not being taken.
• Since our goal is to increase the test coverage of the function, we'd like to direct the tester to explore a new path through the function.
• To do this, we negate the last condition in the path constraint; in other words, we try to find an input that satisfies both the first and the second branch conditions.
• Passing this formula to a solver, we can get the solution x equals 10 and y equals 1000, which is a valid input that reaches the abort statement and finds the bug.
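The solving step can be checked by hand. This worked derivation is added for illustration only; a constraint solver need not reason this way:

$$
2x = x + 10 \;\Longrightarrow\; x = 10, \qquad x \neq y \;\Longrightarrow\; y \neq 10 .
$$

Any $y \neq 10$ therefore satisfies the new constraint together with $x = 10$; keeping $y = 1000$ from the first run gives the slide's solution $x = 10,\ y = 1000$.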

Contributions (slide 5/20)

◮ Random testing + directed testing
◮ Randomly apply function inputs
◮ Gather path constraints on a trace
◮ Use solver to find new inputs
◮ Static method to identify interfaces
◮ Fully automated

• This brings us to the authors' contributions.
• The authors present a framework that combines random testing with directed testing.
• The approach works just as in the previous example: they first apply random function inputs, gather the path constraints along the explored trace, and then use a solver to generate new inputs that guide the program along a new path.
• Along with this testing technique, they also present a static technique to identify interfaces, that is, the locations in the program that should be tested.
• As a result, the authors' analysis is fully automated and requires no effort from developers.

Overview (slide 6/20)

◮ Introduction
◮ Path Constraints
◮ Experimental Results
◮ Conclusions and Questions

• Next, I'll go over how the authors generate path constraints during testing.

Path Constraints: Overview (slide 7/20)

1. Execute the program with random inputs
2. Collect path constraints of execution
3. Negate a condition to generate new inputs
4. Repeat

• Here again is an overview of the exploration technique used by the authors.
• First, they execute the program with random inputs.
• During the execution, they collect the path constraints along the path taken by the dynamic execution.
• To collect these path constraints, they instrument each statement in the program and model its semantics symbolically.
• Next, given the path constraint from one execution, they negate one of its branch conditions and pass the resulting formula to a solver.
• The solver then attempts to find a valuation of the program inputs that satisfies the new path constraint, in other words, input values that drive execution down the new path.
• They then re-execute the program with these newly generated inputs and repeat the process.
• To make this clearer, I'll go over an example.
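The four steps above can be sketched in C for the function h from the introduction. This is a hedged illustration, not the authors' implementation. LinCond, run_h, solve_for_x, and explore are hypothetical names; each branch condition of h is hand-translated into the linear form a*x + b*y + c == 0 (or != 0) in place of real instrumentation; and solve_for_x is a toy stand-in for a constraint solver (it keeps the previous y and solves equalities for x), so it can miss solutions a real solver would find. The search is a simplified recursive depth-first variant that only negates conditions at or beyond the depth flipped to create the current input.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_DEPTH 8

/* Hypothetical linear branch condition: a*x + b*y + c == 0 (is_eq) or != 0. */
typedef struct { long a, b, c; bool is_eq; } LinCond;

static bool holds(LinCond k, long x, long y) {
    long v = k.a * x + k.b * y + k.c;
    return k.is_eq ? v == 0 : v != 0;
}

static LinCond negate(LinCond k) { k.is_eq = !k.is_eq; return k; }

/* Step 2: run h(x, y), recording each branch condition (negated when the
   branch is not taken). Sets *bug when abort() would be reached. */
static int run_h(long x, long y, LinCond pc[], bool *bug) {
    const LinCond conds[2] = {
        {1, -1, 0, false}, /* x != y        ->  x - y  != 0 */
        {1, 0, -10, true}, /* 2x == x + 10  ->  x - 10 == 0 */
    };
    int n = 0;
    *bug = false;
    for (int i = 0; i < 2; i++) {
        bool taken = holds(conds[i], x, y);
        pc[n++] = taken ? conds[i] : negate(conds[i]);
        if (!taken) return n; /* the branches in h are nested */
    }
    *bug = true; /* both branches taken: abort() is reached */
    return n;
}

/* Toy solver: keep y, solve each equality for x, then verify all conjuncts. */
static bool solve_for_x(const LinCond pc[], int n, long x, long y,
                        long *nx, long *ny) {
    for (int i = 0; i < n; i++) {
        if (pc[i].is_eq && pc[i].a != 0) {
            long rhs = -(pc[i].b * y + pc[i].c);
            if (rhs % pc[i].a != 0) return false;
            x = rhs / pc[i].a;
        }
    }
    for (int i = 0; i < n; i++)
        if (!holds(pc[i], x, y)) return false;
    *nx = x;
    *ny = y;
    return true;
}

/* Steps 1-4: execute, then for each condition at depth >= bound, negate it
   (keeping the prefix), solve, and recurse on the new input. Returns the
   number of runs that reached abort(); *runs counts all executions. */
static int explore(long x, long y, int bound, int *runs) {
    LinCond pc[MAX_DEPTH];
    bool bug;
    int n = run_h(x, y, pc, &bug);
    assert(n <= MAX_DEPTH);
    int bugs = bug ? 1 : 0;
    (*runs)++;
    for (int j = n - 1; j >= bound; j--) {
        LinCond alt[MAX_DEPTH];
        long nx, ny;
        for (int i = 0; i < j; i++) alt[i] = pc[i];
        alt[j] = negate(pc[j]);
        if (solve_for_x(alt, j + 1, x, y, &nx, &ny))
            bugs += explore(nx, ny, j + 1, runs);
    }
    return bugs;
}
```

Starting from the slide's random input, explore(20, 1000, 0, &runs) performs three runs, one per path of h: (20, 1000), then (10, 1000), which reaches abort(), then (1000, 1000), which fails the first branch.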

Path Constraints: Example (1) (slide 8/20)

```c
#include <stdbool.h>
#include <stdlib.h>

int f(int x, int y) {  /* line 1 */
  int z = y;           /* line 2 */
  bool c1 = x == z;    /* line 3 */
  if (c1) {            /* line 4 */
    int t2 = x + 10;   /* line 5 */
    bool c2 = y == t2; /* line 6 */
    if (c2) {          /* line 7 */
      abort();         /* line 8 */
    }                  /* line 9 */
  }                    /* line 10 */
  return 0;
}                      /* line 11 */
```

◮ Concrete input: x = 10, y = 20
◮ z = 20 → $x \neq z$
◮ Initially: $-2^{31} \le x \le 2^{31} - 1 \;\wedge\; -2^{31} \le y \le 2^{31} - 1$
◮ After line 2: $\ldots \;\wedge\; z := y$
◮ After line 3: $\ldots \;\wedge\; z := y \;\wedge\; c_1 := (x = z)$
◮ Path constraint: $\neg c_1$

• Here is an example program that is slightly more complicated than the previous one because it has some side effects (assignments to local variables).
• To understand the path-constraint generation approach, we'll go through this program line by line and look at how its state evolves symbolically.
• First, look at the concrete execution with these inputs: the first branch is not taken since x is not equal to z, so the first test run ends after the check of the first branch.
• During the concrete execution of the program, the authors build a symbolic representation of all the variables.
• Before the function executes, the program inputs are unconstrained; here I assume 32-bit integers.
• After executing line 2, the symbolic value of z is updated to the value of y.
• Similarly, after line 3 the value of c1 is updated to the expression x = z. Since z := y, this makes c1 the boolean value of the expression x = y.
• Finally, since the branch was not taken during the concrete execution, we negate its condition to obtain the path constraint for the first run.
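The bookkeeping described above, a concrete value plus a symbolic expression for every variable, can be sketched in C for f. This is an illustrative sketch, not the authors' instrumentation: LinExpr, Cond, lin_sub, lin_add_const, and concolic_f are hypothetical names; symbolic values are restricted to linear expressions over the inputs x and y (enough for f); and the 32-bit range constraints on the inputs are left implicit.

```c
#include <assert.h>
#include <stdbool.h>

/* Symbolic value of an integer variable: ax*x + ay*y + k over the inputs. */
typedef struct { long ax, ay, k; } LinExpr;

/* Symbolic branch condition: e == 0 when is_eq, otherwise e != 0. */
typedef struct { LinExpr e; bool is_eq; } Cond;

static LinExpr lin_sub(LinExpr p, LinExpr q) {
    LinExpr r = { p.ax - q.ax, p.ay - q.ay, p.k - q.k };
    return r;
}

static LinExpr lin_add_const(LinExpr p, long c) {
    p.k += c;
    return p;
}

/* Concolic run of f: concrete values decide which branches are taken,
   symbolic values build the path constraint. Writes the path constraint
   to pc and returns its number of conjuncts. */
static int concolic_f(long x, long y, Cond pc[2]) {
    const LinExpr sx = {1, 0, 0}, sy = {0, 1, 0}; /* unconstrained inputs */
    int n = 0;

    /* line 2: z := y */
    long z = y;
    LinExpr sz = sy;

    /* line 3: c1 := (x = z), i.e. x - y == 0 after substituting z */
    bool c1 = (x == z);
    Cond sc1 = { lin_sub(sx, sz), true };

    if (!c1) {            /* line 4: branch not taken */
        sc1.is_eq = false; /* record the negation, x - y != 0 */
        pc[n++] = sc1;
        return n;
    }
    pc[n++] = sc1;

    /* line 5: t2 := x + 10 */
    long t2 = x + 10;
    LinExpr st2 = lin_add_const(sx, 10);

    /* line 6: c2 := (y = t2), i.e. y - x - 10 == 0 */
    bool c2 = (y == t2);
    Cond sc2 = { lin_sub(sy, st2), true };

    /* line 7: record c2 if taken, its negation otherwise */
    sc2.is_eq = c2;
    pc[n++] = sc2;
    /* line 8: when c2 holds, the real f would call abort() here */
    assert(n == 2);
    return n;
}
```

For the slide's input (10, 20), concolic_f records a single conjunct, x - y != 0, which is the symbolic form of ¬c1 once z := y is substituted.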

  31. Path Constraints: Example (2)

  ◮ Old constraint: $\neg c_1$
  ◮ New constraint: $c_1$
  ◮ Logic formula:
        $-2^{31} \le x \le 2^{31} - 1$
        $\wedge\ -2^{31} \le y \le 2^{31} - 1$
        $\wedge\ z := y$
        $\wedge\ c_1 := (x = z)$
        $\wedge\ c_1$
  ◮ Satisfying assignment: $x = 0 \wedge y = 0$

  (Code: the function f from Example (1); line numbers refer to it.)

  • After generating the symbolic expressions for the variables along with the path constraint, the next step is to generate a new input to the program in order to explore a new path.
  • Since we've only seen one branch, the only new choice we can make is to explore inside this branch, that is, to find program inputs such that c1 is true.
  • To do this, we take the symbolic values of all the variables and conjoin them with the path constraint we want, which gives a new logic formula.
  • Next, we ask a solver for a satisfying assignment to this formula: a valuation of x and y such that all the constraints hold.
  • One such solution is x = 0 and y = 0.
  • The key thing to notice is that the formula is constructed so that any satisfying assignment gives input values that are guaranteed to reach the branch we are interested in.
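
The solver step can be sketched with a brute-force search standing in for a real constraint solver (the authors used lp_solve). Assumptions: `WINDOW`, `formula_holds`, and `solve` are hypothetical names, the 32-bit range constraints are taken to hold for any `int` (assuming a 32-bit `int`), and unlike a real solver, a window search can only report that no solution exists inside the window.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical search window; DART itself hands the formula to lp_solve. */
#define WINDOW 16

/* The slide's logic formula for one candidate input. The range constraints
   hold for any int (assuming 32-bit int); the rest is z := y,
   c1 := (x = z), and the new constraint c1. */
static bool formula_holds(int x, int y) {
    int z = y;
    bool c1 = (x == z);
    return c1;
}

/* Brute-force stand-in for a solver: writes the first (x, y) in
   [0, WINDOW) x [0, WINDOW) that satisfies the formula and returns true,
   or returns false if the window contains no solution. */
static bool solve(int *x_out, int *y_out) {
    assert(x_out && y_out);
    for (int x = 0; x < WINDOW; x++) {
        for (int y = 0; y < WINDOW; y++) {
            if (formula_holds(x, y)) {
                *x_out = x;
                *y_out = y;
                return true;
            }
        }
    }
    return false;
}
```

Because the search starts at 0, the first assignment it finds is x = 0, y = 0, the same one shown on the slide.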

  39. Path Constraints: Example (3)

  ◮ Concrete input: $x = 0$, $y = 0$
  ◮ $c_1 = (x == z) = 1$
  ◮ $t_2 = x + 10 = 10$
  ◮ After line 6:
        $-2^{31} \le x \le 2^{31} - 1$
        $\wedge\ -2^{31} \le y \le 2^{31} - 1$
        $\wedge\ z := y$
        $\wedge\ c_1 := (x = z)$
        $\wedge\ t_2 := x + 10$
        $\wedge\ c_2 := (y = t_2)$
  ◮ Path constraint: $c_1 \wedge \neg c_2$
  ◮ New constraint: $c_1 \wedge c_2$
  ◮ Logic formula:
        $-2^{31} \le x \le 2^{31} - 1$
        $\wedge\ -2^{31} \le y \le 2^{31} - 1$
        $\wedge\ z := y$
        $\wedge\ c_1 := (x = z)$
        $\wedge\ t_2 := x + 10$
        $\wedge\ c_2 := (y = t_2)$
        $\wedge\ c_1 \wedge c_2$
  ◮ Unsatisfiable! (The error is unreachable)

  (Code: the function f from Example (1); line numbers refer to it.)

  • On the next iteration, we use the inputs obtained previously to re-execute the program concretely.
  • During the concrete execution, we enter the first if-branch and then compute t2 = x + 10, which evaluates to 10.
  • c2 checks whether y is equal to t2, which evaluates to false (0 ≠ 10).
  • So, in the second iteration the first branch is taken and the second branch is not.
  • Again, during the concrete execution we build a symbolic representation of the program. It is the same as in the previous iteration except that it also includes t2 and c2.
  • This execution has the path constraint c1 ∧ ¬c2. To generate the next path constraint, we again flip one of the conditions and produce a new logic formula with the desired path conditions.
  • As a human, solving the constraints on the input variables to reach this location is already becoming non-trivial, at least for me.
  • Luckily, a solver can decide this formula: it reports that the formula is unsatisfiable, which means that no input values cause the abort to be reached.
  • For this function at least, the search is complete: every path has now been explored, so we've shown that the abort statement in this function can never be reached.
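
The solver's verdict can be checked by hand by substituting the symbolic definitions into the new constraint $c_1 \wedge c_2$:

$$
\begin{aligned}
c_1 \wedge c_2 &\implies x = z \;\wedge\; z = y \;\wedge\; y = t_2 \;\wedge\; t_2 = x + 10 \\
&\implies x = y \;\wedge\; y = x + 10 \\
&\implies x = x + 10 \\
&\implies 0 = 10.
\end{aligned}
$$

So the formula is unsatisfiable over the integers. It also stays unsatisfiable under 32-bit wraparound, because $x + 10 \equiv x \pmod{2^{32}}$ would require $10 \equiv 0 \pmod{2^{32}}$.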

  44. Implementation Intuition

  ◮ Transfer functions
        ◮ Function from symbolic state to symbolic state
        ◮ $S \to S$
        ◮ Evaluate: z = x
        ◮ $\lambda S.\, S[z := x]$

  • Now that I've gone over an example of their technique, I'll give a high-level intuition of how it works and try to relate it back to what we've seen so far.
  • Like most of the analyses we've seen so far, their technique uses transfer functions.
  • To keep track of the symbolic values of all the variables, the authors define a transfer function for every kind of statement in the program.
  • For example, if we encounter the assignment z = x during execution, we apply a transfer function that takes a symbolic representation S as input and returns a new symbolic representation that is the same as S except that z is mapped to the symbolic value of x.
  • Defining a transfer function for every type of statement allows the analysis to operate on arbitrary sequences of statements.
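
A transfer function of the form $\lambda S.\, S[z := x]$ can be sketched in C with a tiny symbolic store. Everything here is hypothetical and far simpler than DART: symbolic values are plain strings, only variable-to-variable assignment is handled, and `SymStore`, `sym_lookup`, `sym_bind`, and `transfer_assign` are illustrative names. The store is passed and returned by value so that, like the λ-term, the function builds a new state instead of mutating its argument.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_VARS 8
#define NAME_LEN 8
#define EXPR_LEN 32

/* Hypothetical symbolic store S: maps each variable name to a symbolic
   expression over the program inputs, kept as a string. */
typedef struct {
    int count;
    char name[MAX_VARS][NAME_LEN];
    char expr[MAX_VARS][EXPR_LEN];
} SymStore;

/* Returns the symbolic value of var, or NULL if it is unbound. */
static const char *sym_lookup(const SymStore *s, const char *var) {
    for (int i = 0; i < s->count; i++)
        if (strcmp(s->name[i], var) == 0)
            return s->expr[i];
    return NULL;
}

/* Binds var to expr, overwriting any previous binding. */
static void sym_bind(SymStore *s, const char *var, const char *expr) {
    int i = 0;
    while (i < s->count && strcmp(s->name[i], var) != 0)
        i++;
    if (i == s->count) {
        assert(s->count < MAX_VARS);
        s->count++;
    }
    snprintf(s->name[i], NAME_LEN, "%s", var);
    snprintf(s->expr[i], EXPR_LEN, "%s", expr);
}

/* Transfer function for the assignment lhs = rhs: S[lhs := S(rhs)]. */
static SymStore transfer_assign(SymStore s, const char *lhs, const char *rhs) {
    const char *value = sym_lookup(&s, rhs);
    assert(value != NULL);
    char copy[EXPR_LEN];           /* avoids overlapping copies when lhs == rhs */
    snprintf(copy, sizeof copy, "%s", value);
    sym_bind(&s, lhs, copy);
    return s;
}
```

Starting from a store that maps `x` to `x` and `y` to `y`, `transfer_assign(s0, "z", "x")` returns a store in which `z` maps to `x`, while `s0` itself still has no binding for `z`.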

  52. Soundness

  ◮ Programs may be infinite
        ◮ Cannot have infinitely long formulas
        ◮ Solution: bound the depth of the search
  ◮ Under-approximated analysis
        ◮ Bug hunting
        ◮ Not proof generation
  ◮ No false alarms:
        ◮ Detected bugs are guaranteed to exist in the actual program

  • In general, programs may be infinite, for example in the presence of infinite loops, so the analysis cannot handle all programs.
  • This is because we eventually need to produce a logic formula representing a path through the program, and this formula cannot be infinitely long.
  • The solution to this problem is to search only up to a bounded depth of the program.
  • As a result, the authors' analysis is, in general, under-approximate.
  • This means it should be used for bug hunting and not for proof generation.
  • However, under-approximation has a nice side effect: the analysis has no false alarms.
  • This means that any bug the algorithm detects is guaranteed to be a real bug, since it was reached by a concrete execution.
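
The three iterations of the running example and the depth bound from this slide can be combined into a single loop. This is a minimal sketch in the spirit of DART's directed search, under heavy simplifying assumptions: the branch predicates of `f` are written by hand over the inputs (after substituting z := y and t2 := x + 10) instead of being collected by a symbolic executor, and each branch position is assumed to correspond to one fixed condition, which holds for this tiny function; a brute-force search over a small window stands in for lp_solve, so "no solution" only means none in the window; `MAX_DEPTH`, `WINDOW`, and all function names are hypothetical. The `branch`/`done` arrays act as a stack of branch decisions: the deepest decision not yet flipped is negated and the prefix up to it is solved.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_DEPTH 4   /* hypothetical bound on branches recorded per run */
#define WINDOW 32     /* hypothetical search window of the stand-in solver */

/* Symbolic predicate of the i-th branch of f, written over the inputs by
   hand: branch 0 is c1 := (x = y), branch 1 is c2 := (y = x + 10). */
static bool branch_pred(int i, int x, int y) {
    return i == 0 ? (x == y) : (y == x + 10);
}

/* Records a branch outcome unless the depth bound has been reached. */
static void record(bool taken[], int *depth, bool outcome) {
    if (*depth < MAX_DEPTH)
        taken[(*depth)++] = outcome;
}

/* Concrete run of f; returns true where the original calls abort(). */
static bool run_f(int x, int y, bool taken[], int *depth) {
    *depth = 0;
    int z = y;
    bool c1 = (x == z);
    record(taken, depth, c1);
    if (c1) {
        int t2 = x + 10;
        bool c2 = (y == t2);
        record(taken, depth, c2);
        if (c2)
            return true;
    }
    return false;
}

/* Stand-in for lp_solve: finds inputs in [0, WINDOW)^2 whose first n branch
   outcomes equal want[0..n). false only means "none in the window". */
static bool solve(const bool want[], int n, int *x, int *y) {
    assert(n <= MAX_DEPTH);
    for (int cx = 0; cx < WINDOW; cx++)
        for (int cy = 0; cy < WINDOW; cy++) {
            bool ok = true;
            for (int i = 0; i < n && ok; i++)
                ok = (branch_pred(i, cx, cy) == want[i]);
            if (ok) {
                *x = cx;
                *y = cy;
                return true;
            }
        }
    return false;
}

/* Directed search: returns 1 and the bug-triggering input if abort() is
   reached, or 0 once every recorded branch has been flipped. */
static int directed_search(int x, int y, int *bug_x, int *bug_y, int *runs) {
    bool branch[MAX_DEPTH], done[MAX_DEPTH];
    int kept = 0;                        /* stack prefix reused from the last run */
    *runs = 0;
    for (;;) {
        bool taken[MAX_DEPTH];
        int depth;
        (*runs)++;
        if (run_f(x, y, taken, &depth)) {
            *bug_x = x;
            *bug_y = y;
            return 1;
        }
        for (int i = kept; i < depth; i++) {  /* push newly seen branches */
            branch[i] = taken[i];
            done[i] = false;
        }
        int n = depth;
        for (;;) {
            int j = n - 1;
            while (j >= 0 && done[j])         /* deepest unflipped branch */
                j--;
            if (j < 0)
                return 0;                     /* nothing left to explore */
            branch[j] = !branch[j];
            done[j] = true;
            n = j + 1;
            if (solve(branch, n, &x, &y))
                break;                        /* new input found */
            /* no solution: loop again to try a shallower branch */
        }
        kept = n;
    }
}
```

Starting from the initial input `x = 10, y = 20`, `directed_search` performs two concrete runs, first with (10, 20) and then with (0, 0), finds no input for c1 ∧ c2 within its window, and returns 0 because no unflipped branch remains.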

  53. Overview

  Introduction, Path Constraints, Experimental Results, Conclusions and Questions

  • Now that I've gone over a high-level intuition behind their approach, I'll present the experimental results.

  60. Test Bench
◮ Pentium III 800 MHz processor
◮ lp_solve solver
◮ CIL parser
◮ Three programs:
  1. Air-Conditioner Controller
  2. Needham-Schroeder Protocol
  3. oSIP Telephony Library
• The authors implemented their tool to test C programs.
• They ran the tests on a Pentium III processor running at 800 MHz.
• They used a solver called lp_solve to solve the constraint formulas.
• They tested three programs: a small air-conditioner controller example, a cryptographic protocol, and an open-source library called oSIP.
14/20
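The bench hands the path constraints that DART collects to lp_solve. To make that step concrete, here is a minimal sketch that revisits the function h from the introduction. An instrumented copy records each branch it takes as a linear constraint over the inputs, the last constraint is negated, and a brute-force search over a small box stands in for lp_solve. Everything here is an assumption for illustration: the names Lin, holds, negate, h_instrumented, solve and directed_step are hypothetical, the search is not lp_solve's API, and real DART explores branches depth-first and backtracks, whereas this sketch flips only the last branch once.

```c
#include <stdbool.h>

/* A linear constraint over the inputs: a*x + b*y + c (== or !=) 0. */
typedef enum { EQ, NE } Op;
typedef struct { long a, b, c; Op op; } Lin;

static bool holds(Lin k, long x, long y) {
    long v = k.a * x + k.b * y + k.c;
    return k.op == EQ ? v == 0 : v != 0;
}

static Lin negate(Lin k) {
    k.op = (k.op == EQ) ? NE : EQ;
    return k;
}

/* Instrumented copy of h(x, y): records the symbolic condition of every
   branch actually taken in pc[0..*n-1]. Returns true if abort() is reached. */
static bool h_instrumented(long x, long y, Lin *pc, int *n) {
    *n = 0;
    Lin c1 = {1, -1, 0, NE};                 /* x != y                       */
    if (!holds(c1, x, y)) { pc[(*n)++] = negate(c1); return false; }
    pc[(*n)++] = c1;
    Lin c2 = {1, 0, -10, EQ};                /* mul2(x) == x + 10: x - 10 == 0 */
    if (!holds(c2, x, y)) { pc[(*n)++] = negate(c2); return false; }
    pc[(*n)++] = c2;
    return true;                             /* abort() would run here       */
}

/* Brute-force stand-in for a solver such as lp_solve: search a small box. */
static bool solve(const Lin *cs, int n, long lo, long hi, long *x, long *y) {
    for (long i = lo; i <= hi; i++)
        for (long j = lo; j <= hi; j++) {
            bool ok = true;
            for (int k = 0; k < n && ok; k++) ok = holds(cs[k], i, j);
            if (ok) { *x = i; *y = j; return true; }
        }
    return false;
}

/* One directed step: run concretely on (x, y), negate the last recorded
   branch, solve for new inputs, and report whether they reach abort(). */
bool directed_step(long x, long y, long *nx, long *ny) {
    Lin pc[2];
    int n;
    if (h_instrumented(x, y, pc, &n)) { *nx = x; *ny = y; return true; }
    pc[n - 1] = negate(pc[n - 1]);
    if (!solve(pc, n, -20, 20, nx, ny)) return false;
    Lin pc2[2];
    int n2;
    return h_instrumented(*nx, *ny, pc2, &n2);
}
```

Starting from (1, 2), one step negates x - 10 != 0, keeps x != y, and the search returns x = 10 (with y = -20, the first value it tries), which reaches the abort site. Starting from (5, 5), the step can only flip x == y, so the new inputs still miss abort and a second step would be needed; this is why the directed search iterates.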

  65. AC-Controller
◮ Random testing does not work
◮ 2^32 × 2^32 = 2^64 possible pairs of messages
◮ One leads to the error
◮ Never finds the bug after "hours"
◮ DART: less than one second

    #include <stdlib.h>  /* abort */

    int is_room_hot, ac, is_door_closed;

    void ac_controller(int message) {
        if (message == 0) is_room_hot = 1;
        if (message == 1) is_room_hot = 0;
        if (message == 2) {
            is_door_closed = 0;
            ac = 0;
        }
        if (message == 3) {
            is_door_closed = 1;
            if (is_room_hot) ac = 1;
        }
        if (is_room_hot && is_door_closed && !ac) {
            abort();
        }
    }

• First, we can look at the source code of the AC controller.
• The source code is very small but serves as a good comparison with random testing.
• The program is buggy: the abort statement is reachable under certain sequences of inputs.
• To understand how this function is exercised, imagine that it can be called an arbitrary number of times with different values for message.
• It essentially implements a state machine: the three globals hold the state, and each call makes a transition based on its input.
• The abort statement can be reached after two messages: first 3 (the door closes while the room is not hot, so ac stays off) and then 0 (the room becomes hot).
• Because this bug needs at least two messages to manifest, a random tester that picks two 32-bit messages hits it with probability one in 2^64, which is essentially zero.
• DART, on the other hand, finds the bug in less than one second.
15/20
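To check the claim that exactly the sequence 3 then 0 triggers the error, here is a small sketch that replays message sequences against the controller. It assumes a hypothetical error_reached flag in place of abort() so the error can be observed, plus hypothetical helpers reset_state, run_sequence and count_failing_pairs. The exhaustive loop over the constants 0 to 3 is a brute-force stand-in, not DART's algorithm: DART reaches those values by solving path constraints, while random testing draws from all 2^32 values per message.

```c
#include <assert.h>
#include <stddef.h>

/* Controller state, as on the slide. */
static int is_room_hot, ac, is_door_closed;
/* Hypothetical flag that replaces abort() so the error can be observed. */
static int error_reached;

static void reset_state(void) {
    is_room_hot = ac = is_door_closed = 0;
    error_reached = 0;
}

static void ac_controller(int message) {
    if (message == 0) is_room_hot = 1;
    if (message == 1) is_room_hot = 0;
    if (message == 2) {
        is_door_closed = 0;
        ac = 0;
    }
    if (message == 3) {
        is_door_closed = 1;
        if (is_room_hot) ac = 1;
    }
    if (is_room_hot && is_door_closed && !ac) {
        error_reached = 1;  /* abort() in the original */
    }
}

/* Replay a message sequence from the initial state; returns 1 if the error is hit. */
int run_sequence(const int *msgs, size_t n) {
    assert(msgs != NULL || n == 0);
    reset_state();
    for (size_t i = 0; i < n; i++) {
        ac_controller(msgs[i]);
        if (error_reached) return 1;
    }
    return 0;
}

/* Try every two-message sequence over the constants 0..3 that the branches
   compare against. Returns the number of failing pairs and stores the last
   failing pair in *first and *second. */
int count_failing_pairs(int *first, int *second) {
    int found = 0;
    for (int a = 0; a <= 3; a++) {
        for (int b = 0; b <= 3; b++) {
            const int seq[2] = {a, b};
            if (run_sequence(seq, 2)) {
                found++;
                *first = a;
                *second = b;
            }
        }
    }
    return found;
}
```

Replaying {3, 0} returns 1, while {0, 3} returns 0 because the door closes while the room is already hot, so ac is switched on. Over the 16 pairs built from 0 to 3, count_failing_pairs reports exactly one failing pair, (3, 0), matching the slide.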

  72. Needham-Schroeder Protocol
◮ Protocol for two users to authenticate each other
◮ Contains impersonation bug
◮ C implementation (400 LOC)
◮ Used "reasonable" environment constraints
◮ DART: 18 minutes to find error
◮ Re-ran on "fixed" version: found another bug
◮ 22 minutes
• Next, the authors looked at a C implementation of the Needham-Schroeder protocol.
• We do not need to consider the details of the protocol; it is essentially a way for two users to authenticate each other and start a secure communication channel.
• The original algorithm contains a bug that allows an attacker to impersonate a user.
• They tested a 400-line C implementation.
• They constrained the environment, that is, the actions available to the attacker, to match the assumptions made in the paper that describes the fault in the protocol.
• Under these assumptions, DART reproduced the fault after 18 minutes of testing.
• The author who originally reported the fault proposed a fix.
• Re-running DART on the fixed version uncovered another bug, which the author acknowledged.
• It took DART 22 minutes to find this bug.
16/20
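To show what "impersonation" means here, the sketch below replays the classic man-in-the-middle run on the public-key version of the protocol in a toy model. Everything in it is an assumption: "encryption" only records whose public key was used, principals and nonces are small integers, and the names Msg, bob_respond, alice_confirm, bob_finish and attack_succeeds are hypothetical. The attack is scripted by hand, whereas DART had to discover it by searching over attacker inputs within the environment constraints. The fix modeled is the commonly cited one that adds the responder's identity to the second message; the second bug DART found in the fixed version is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model; not the 400-line implementation the authors tested. */
enum { ALICE = 1, BOB = 2, INTRUDER = 3 };

/* "Encryption" only records whose public key was used; fields hold
   nonces or principal names, and 0 means unused. */
typedef struct { int key; int f1, f2, f3; } Msg;

static bool can_read(int who, Msg m) { return m.key == who; }

/* Responder B receives {Na, A}pkB and answers {Na, Nb}pkA,
   or {Na, Nb, B}pkA when the fix is enabled. */
static bool bob_respond(Msg m1, int nb, bool fix, int *peer, Msg *m2) {
    if (!can_read(BOB, m1)) return false;
    *peer = m1.f2;                              /* B now believes it talks to A */
    *m2 = (Msg){ m1.f2, m1.f1, nb, fix ? BOB : 0 };
    return true;
}

/* Initiator A, who started a run with `partner`, checks message 2
   and answers {Nb}pk(partner). */
static bool alice_confirm(Msg m2, int na, int partner, bool fix, Msg *m3) {
    if (!can_read(ALICE, m2) || m2.f1 != na) return false;
    if (fix && m2.f3 != partner) return false;  /* responder identity check */
    *m3 = (Msg){ partner, m2.f2, 0, 0 };
    return true;
}

/* B accepts the run if message 3 carries its nonce. */
static bool bob_finish(Msg m3, int nb) {
    return can_read(BOB, m3) && m3.f1 == nb;
}

/* Scripted run: A talks to the intruder I, and I reuses A's messages
   to convince B that B is talking to A. */
bool attack_succeeds(bool fix) {
    const int na = 11, nb = 22;
    int bob_peer = 0;
    Msg m1 = { INTRUDER, na, ALICE, 0 };        /* A -> I    : {Na, A}pkI */
    assert(can_read(INTRUDER, m1));
    Msg m1_fwd = { BOB, m1.f1, m1.f2, 0 };      /* I(A) -> B : {Na, A}pkB */
    Msg m2, m3;
    if (!bob_respond(m1_fwd, nb, fix, &bob_peer, &m2)) return false;
    /* I cannot read m2 (it is for A) and forwards it unchanged. */
    if (!alice_confirm(m2, na, INTRUDER, fix, &m3)) return false;
    assert(can_read(INTRUDER, m3));             /* I learns Nb */
    Msg m3_fwd = { BOB, m3.f1, 0, 0 };          /* I(A) -> B : {Nb}pkB */
    return bob_finish(m3_fwd, nb) && bob_peer == ALICE;
}
```

Without the fix, B completes a run believing its peer is A even though A only ever talked to I. With the fix, A sees B's name in message 2 where it expected I and aborts the run.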

  80. oSIP
◮ oSIP: Telephone over IP library
◮ Tested external functions
◮ Found many functions not checking NULL pointers
◮ Found denial of service in parser
◮ Request too large a stack frame
◮ Return of alloca not checked
◮ "Bugs" fixed by developers
◮ Intuition: specifications make this technique much better
• oSIP is a library implementing telephony and other multimedia communication over IP.
• The authors tested the library's external functions.
• First, they found many functions that crash when passed a NULL pointer, because the functions seem to assume their pointer arguments are non-null.
• The authors then looked at more functions and found a potential way to crash the library.
• The crash involves an input that causes too much space to be allocated on the stack: the library does not check the return value of the alloca call, which could be NULL, causing a crash.
• Because there is no clear specification, the authors were not sure whether these were real bugs, but they note that the parser issue was fixed by the developers.
• Though the authors do not mention it, this points to one of the issues in building a practical directed testing framework: the tool produces more meaningful results when a specification is present.
17/20
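Both bug classes above come down to missing checks on inputs and allocation results. As a rough sketch, the hypothetical helper below (not part of oSIP's API) shows the defensive pattern: it rejects a NULL argument, rejects input longer than an assumed cap MAX_HEADER_LEN, and allocates on the heap so the result can be tested. It uses malloc rather than alloca because alloca is not part of standard C and whether a failed call returns NULL at all is platform-dependent.

```c
#include <stdlib.h>
#include <string.h>

/* Assumed upper bound on header length; an illustrative value. */
#define MAX_HEADER_LEN 4096

/* Hypothetical helper: duplicate a header value. Returns a heap copy,
   or NULL if the input is NULL, longer than MAX_HEADER_LEN, or the
   allocation fails. The caller must free() the result. */
char *dup_header_checked(const char *src) {
    if (src == NULL) return NULL;           /* the NULL check the tested functions lacked */
    size_t len = strlen(src);
    if (len > MAX_HEADER_LEN) return NULL;  /* bound the size before allocating */
    char *copy = malloc(len + 1);           /* heap allocation with a checkable result */
    if (copy == NULL) return NULL;
    memcpy(copy, src, len + 1);             /* copies the terminating '\0' too */
    return copy;
}
```

Whether returning NULL is the "right" behavior is exactly the kind of question a specification would settle; without one, a tool can only report that the unchecked version crashes.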

  81. Overview
◮ Introduction
◮ Path Constraints
◮ Experimental Results
◮ Conclusions and Questions
• Next, I'll go over some conclusions and open questions in the paper.
18/20

  86. Open Questions
◮ How to handle concurrent programs?
◮ Branches and thread schedules?
◮ Assertion Guided Symbolic Execution of Multithreaded Programs, Shengjian Guo, Markus Kusano, Chao Wang, Zijiang Yang, Aarti Gupta. FSE '15
◮ How to handle unbounded programs?
◮ How scalable is this approach?
• The paper leaves some questions open at the time of writing.
• First, the authors only consider branches as a source of non-determinism in the program.
• In a concurrent program, it is not clear how the technique could simultaneously generate inputs that cover both the branches and the thread schedules.
• There was, however, an interesting-sounding paper by some cool authors at this year's FSE extending the DART approach to handle multi-threaded programs efficiently.
• Second, the analysis is bounded: it's not clear how, or whether, a technique like this can be used for unbounded analysis.
• Third, it is not clear how scalable this analysis is.
• For example, for very complicated functions, or functions with very long loops or deep recursion, it's not clear whether the constraints generated by the analysis will be solvable.
19/20
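The concurrency and scalability questions are both about how fast the space to explore grows. Under simple assumptions, the sketch below counts it: a hypothetical function count_positive has one input-dependent branch per loop iteration, so each iteration doubles the number of distinct paths, and two threads running m and n atomic steps admit I(m, n) = I(m-1, n) + I(m, n-1) interleavings, which multiplies on top of the branch paths. The helper names distinct_paths and interleavings are also hypothetical, and the numbers are counts, not measurements from the paper.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_N 16

/* Hypothetical function under test: one input-dependent branch per
   loop iteration. Bit i of *path records the outcome of iteration i. */
static int count_positive(const int *a, int n, unsigned *path) {
    int count = 0;
    *path = 0;
    for (int i = 0; i < n; i++) {
        if (a[i] > 0) {
            count++;
            *path |= 1u << i;
        }
    }
    return count;
}

/* Run count_positive on every input in {-1, 1}^n and count the distinct
   execution paths (branch-outcome vectors) that occur. */
unsigned long distinct_paths(int n) {
    static bool seen[1u << MAX_N];
    int a[MAX_N];
    unsigned long distinct = 0;
    if (n < 0 || n > MAX_N) return 0;
    memset(seen, 0, sizeof seen);
    for (unsigned mask = 0; mask < (1u << n); mask++) {
        unsigned path;
        for (int i = 0; i < n; i++) a[i] = ((mask >> i) & 1u) ? 1 : -1;
        (void)count_positive(a, n, &path);
        if (!seen[path]) {
            seen[path] = true;
            distinct++;
        }
    }
    return distinct;
}

/* Number of interleavings of two threads executing m and n atomic steps:
   the next step comes from either thread, so I(m, n) = I(m-1, n) + I(m, n-1). */
unsigned long long interleavings(unsigned m, unsigned n) {
    if (m == 0 || n == 0) return 1;
    return interleavings(m - 1, n) + interleavings(m, n - 1);
}
```

For example, distinct_paths(10) is 1024 and interleavings(3, 3) is 20; a bounded explorer that covers every path under every schedule faces the product of such counts, which is why both questions remain open.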

  89. Conclusion
◮ Function-test generation
◮ Fully automated
◮ Faster than random testing
• In conclusion, I presented DART, a tool that generates test inputs for functions in order to automate the creation of unit tests.
• The technique is fully automated: the developer does not need to hand-write test inputs to exercise new paths in a function.
• The experimental results showed that the technique finds bugs much faster than simple random testing: on the AC controller, DART needed less than one second, while random testing found nothing after hours.
• With that, I'll take any questions.
20/20
