SLIDE 1

Efficient Incremental Dynamic Invariant Detection

Jeff Perkins and Michael Ernst MIT CSAIL

27 Oct 2004 20:38 Page 1 Jeff Perkins Efficient Incremental Dynamic Invariant Detection

SLIDE 2

Dynamic invariant detection

Program analysis that generalizes over observed runtime values to hypothesize program properties
The result is a set of likely invariants per program point

Entry to function binary_search(int[] list, int val)
  list is sorted
  list ≠ null
  val ∈ list

Exit from function square(int a)
  return = a ⋅ a

Class Stack
  this.top = this.stack[this.top_stack-1]
  this.stack[this.top_stack..] = null

SLIDE 3

Uses of dynamic invariant detection

Verifying safety properties [Vaziri 98] [Nimmer 02]
Automatic theorem proving [Win 02]
Identifying refactoring opportunities [Kataoka 01]
Predicate abstraction [Dodoo 02]
Generating test cases [Xie 03] [Gupta 03]
Selecting and prioritizing test cases [Harder 03]
Explaining test failures [Groce 03]
Predicting incompatibilities in component upgrades [McCamant 03]
Error detection [Raz 02] [Hangal 02] [Pytlik 03] [Mariani 04] [Brun 04]
Error isolation [Xie 02] [Liblit 03]
Choosing modalities [Lin 04]

SLIDE 4

Goals of this research

Handle moderate to large programs
Produce useful and expressive program properties

Rich set of derived variables
  array references: a[i], a[i..], a[..i]
  pre-state variables: at exit, orig(x) stands for the value at entry

Rich invariant grammar
  unary, binary, and ternary invariants
  invariants over pointers, integers, floats, strings and arrays

SLIDE 5

Outline

Approaches to invariant detection
  Simple incremental algorithm
  Simple incremental algorithm scales poorly
  Many invariants are redundant
  Multiple pass approach
  Multi-pass scales poorly to large data sets
Optimized incremental algorithm
Complications
Results

SLIDE 6

Simple incremental algorithm

Hypothesize each invariant in the grammar
  Over each set of variables
  At each program point
Check the observed values of the variables (a sample) against each invariant
  Discard invariants that are falsified
The remaining invariants are true over the sample data

Examples
  DIDUCE [Hangal 02] - checks 1 invariant over each variable
  Carrot [Pytlik 03] - checks 2 unary and 4 binary invariants
  Daikon version 1
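
The hypothesize/check/discard loop above can be sketched in a few lines of Python. This is only an illustration at a single program point: the tiny grammar (`odd`, `non_negative`, `eq`, `lt`, `le`) and the function names are assumptions made for the example, not the grammar or API of DIDUCE, Carrot, or Daikon.

```python
from itertools import combinations

# Hypothetical miniature invariant grammar: name -> (arity, predicate).
GRAMMAR = {
    "odd": (1, lambda a: a % 2 == 1),
    "non_negative": (1, lambda a: a >= 0),
    "eq": (2, lambda a, b: a == b),
    "lt": (2, lambda a, b: a < b),
    "le": (2, lambda a, b: a <= b),
}


def hypothesize(variables):
    """Instantiate every invariant in the grammar over every variable tuple."""
    candidates = set()
    for name, (arity, _) in GRAMMAR.items():
        for vars_ in combinations(sorted(variables), arity):
            candidates.add((name, vars_))
    return candidates


def process_sample(candidates, sample):
    """Discard (in place) each candidate that this sample falsifies."""
    for name, vars_ in list(candidates):
        predicate = GRAMMAR[name][1]
        if not predicate(*(sample[v] for v in vars_)):
            candidates.discard((name, vars_))


def detect(samples):
    """Run the simple incremental algorithm at one program point."""
    candidates = hypothesize(samples[0].keys())
    for sample in samples:
        process_sample(candidates, sample)
    return candidates


samples = [{"x": 1, "y": 4}, {"x": 3, "y": 3}, {"x": 5, "y": 9}]
likely = detect(samples)  # candidates that survived every sample
```

Every candidate is created up front and touched on every sample, which is what makes this approach simple and also what makes it scale poorly.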

SLIDE 7

Simple incremental algorithm scales poorly

Ternary derived variables (e.g., A[i..j])
  V = the number of source program variables (at a program point)
  V_D = O(V^3)

Ternary invariants
  I = O(V_D^3) = O(V^9)

The number of possible invariants in modest test cases ranged from 460 million to 750 million
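
As a rough worked instance of these bounds, take an assumed V = 10 source variables at a program point and ignore constant factors (both are illustrative assumptions, not figures from the talk):

```latex
\begin{align*}
V   &= 10 \\
V_D &= O(V^3) \approx 10^3 \\
I   &= O(V_D^3) = O\!\left((V^3)^3\right) = O(V^9) \approx 10^9
\end{align*}
```

Even this small V yields on the order of a billion candidate invariants, the same order of magnitude as the hundreds of millions reported above.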

SLIDE 8

Many invariants are redundant

Many invariants are implied by other invariants

Examples
  (x = y) ∧ odd(x) ⇒ odd(y)
  (x = 5) ∧ (y = 6) ⇒ (x < y)
  (x > y) ⇒ (x ≥ y)
  (x ≥ y) at class Stack ⇒ (x ≥ y) at method Stack.top()

SLIDE 9

Multiple pass approach

Processes the input data multiple times
Early passes check simple invariants
Later passes check more complex invariants only if they are not redundant
  Constants are checked first and removed
  Equality is checked next. Only one member of an equal set need be checked in following passes
The multi-pass approach doesn’t create or check invariants implied by earlier passes (saving both time and space)
Example: Daikon version 2

SLIDE 10

Multi-pass scales poorly to large data sets

Even modest traces require gigabytes of space
Possible solutions have drawbacks
  Traces may be too large to store in memory
  File I/O is expensive and disks may be insufficient for larger traces
  Running the target program multiple times is often not acceptable
    Program has side effects
    Program depends on its environment
    Program uses expensive resources (such as human attention)
    Program doesn’t terminate

SLIDE 11

Outline

Approaches to invariant detection
Optimized incremental algorithm
  Optimized incremental algorithm concept
  Constants
  Equality sets
  Program point and variable hierarchy
  Suppression
Complications
Results

SLIDE 12

Optimized incremental algorithm concept

Same processing model as the simple incremental algorithm
Redundant invariants are not instantiated or checked
  Many invariants are implied by others
  As long as the antecedents are true, the consequent need be neither instantiated nor checked
An invariant must be created when its antecedent is falsified
  (x = y) ∧ odd(x) ⇒ odd(y)
  If a sample is seen where x ≠ y, the odd(y) invariant must be created
  The new invariant must be true over all past samples (which are no longer available)
  The new invariant must be checked over future samples

SLIDE 13

Constants

Invariants over (only) constant variables are redundant
  (x = 5) ⇒ odd(x)
  (x = 5) ∧ (y = 6) ⇒ x < y
All variables are initially constant
Invariants are not instantiated between constants
When (var = constant) is falsified
  Invariants are instantiated between var and all remaining constants
  Invariants which are not true over the constant values are discarded
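
A minimal sketch of the constants optimization for binary comparisons follows. The class name, the `OPS` grammar, and the tuple encoding of invariants are assumptions made for this illustration and are not Daikon's implementation. The key point it tries to show is that the old constant values stand in for all past samples, so a newly created invariant can be validated without the discarded trace.

```python
import operator

# Hypothetical binary invariant grammar for this sketch.
OPS = {"<": operator.lt, "<=": operator.le, "==": operator.eq,
       ">=": operator.ge, ">": operator.gt}


class ConstantOptimizer:
    """Defers invariants whose variables have all been constant so far."""

    def __init__(self):
        self.constants = None    # var -> the only value seen so far
        self.invariants = set()  # instantiated (a, op, b) candidates

    def add_sample(self, sample):
        if self.constants is None:
            # First sample: every variable is constant, nothing is instantiated.
            self.constants = dict(sample)
            return
        old = dict(self.constants)  # stands in for all past samples
        for var in [v for v in old if sample[v] != old[v]]:
            del self.constants[var]  # (var = constant) is falsified
            for other in old:
                if other == var:
                    continue
                a, b = sorted((var, other))
                for name, op in OPS.items():
                    # Keep only invariants true over the old constant values.
                    if op(old[a], old[b]):
                        self.invariants.add((a, name, b))
        # Ordinary incremental check of every instantiated invariant.
        self.invariants = {(a, name, b) for a, name, b in self.invariants
                           if OPS[name](sample[a], sample[b])}


opt = ConstantOptimizer()
for s in [{"x": 5, "y": 6, "z": 0},
          {"x": 5, "y": 6, "z": 0},
          {"x": 5, "y": 7, "z": 0}]:   # y stops being constant here
    opt.add_sample(s)
```

After the third sample, invariants relating `y` to `x` and to `z` exist, while nothing has been instantiated between the still-constant `x` and `z`.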

SLIDE 14

Equality sets

If two or more variables are equal, any invariant true over one variable is true over all of them
  (x = y) ∧ f(x) ⇒ f(y)
Initially, all variables are placed in a single equality set
One variable (the leader) represents the set
Invariants are instantiated only between leaders
When (var1 = var2) is falsified
  The set is split into two or more equality sets
  Invariants over each old leader are copied to each new leader
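
The split-and-copy step can be sketched as follows for unary invariants. The `UNARY` grammar and the `EqualitySets` class are hypothetical names for this example only; choosing the alphabetically first member as leader is likewise an assumption.

```python
# Hypothetical unary invariant grammar for this sketch.
UNARY = {
    "odd": lambda v: v % 2 == 1,
    "positive": lambda v: v > 0,
}


class EqualitySets:
    def __init__(self, variables):
        # Initially all variables are in a single set; the first member leads.
        self.sets = [sorted(variables)]
        # Invariants are instantiated only over leaders.
        self.invariants = {self.sets[0][0]: set(UNARY)}

    def add_sample(self, sample):
        new_sets, new_invariants = [], {}
        for members in self.sets:
            old_leader = members[0]
            # Split the set by the values its members take in this sample.
            groups = {}
            for var in members:
                groups.setdefault(sample[var], []).append(var)
            for group in groups.values():
                new_sets.append(group)
                # Copy the old leader's invariants to each new leader; they
                # held for every member while the members were still equal.
                new_invariants[group[0]] = set(self.invariants[old_leader])
        self.sets = new_sets
        # Ordinary incremental check, on leaders only.
        self.invariants = {
            leader: {name for name in names if UNARY[name](sample[leader])}
            for leader, names in new_invariants.items()
        }


eq = EqualitySets(["x", "y", "z"])
eq.add_sample({"x": 3, "y": 3, "z": 3})
eq.add_sample({"x": 5, "y": 5, "z": 8})  # z leaves the set led by x
```

Copying works without the past samples because, up to the split, `z` had the same value as `x` in every sample, so whatever survived on `x` was also true of `z`.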

SLIDE 15

Program point and variable hierarchy

Relationship between program points
Samples are only processed at the leaves of the hierarchy
An invariant is created at the parent iff it is true at each child

[Figure: hierarchy of program points (Class A, A.m1() entry, A.m1() exit, A.m2() entry, A.m2() exit), annotated with the invariant x = y]
Initially each invariant (e.g., x = y) holds at each leaf

SLIDE 16

Program point and variable hierarchy

[Figure: the same hierarchy of program points (Class A, A.m1() entry, A.m1() exit, A.m2() entry, A.m2() exit), annotated with the invariant x = y]
After processing, the invariant was falsified at one program point (shown in red)

SLIDE 17

Program point and variable hierarchy

[Figure: the same hierarchy of program points (Class A, A.m1() entry, A.m1() exit, A.m2() entry, A.m2() exit), annotated with the invariant x = y]
Post processing creates parent invariants
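
The "true at each child" rule amounts to a set intersection over the children's surviving invariants. The sketch below assumes, for illustration only, that Class A is the direct parent of all four leaf program points and that invariants can be compared as plain strings; the `x >= 0` invariant and the choice of which leaf falsified `x = y` are made up for the example.

```python
def merge_children(child_invariants):
    """Return the invariants that hold at every child program point."""
    sets = list(child_invariants.values())
    return set.intersection(*sets) if sets else set()


# Invariants remaining at each leaf after all samples were processed.
leaves = {
    "A.m1() entry": {"x = y", "x >= 0"},
    "A.m1() exit": {"x = y", "x >= 0"},
    "A.m2() entry": {"x >= 0"},  # x = y was falsified at this leaf
    "A.m2() exit": {"x = y", "x >= 0"},
}

class_a = merge_children(leaves)  # invariants created at the parent
```

Because `x = y` failed at one leaf, it is not created at the parent, while `x >= 0` is. A real implementation would also have to map variable names between parent and child program points, which this sketch leaves out.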


SLIDE 19

Suppression

An invariant can be suppressed if it is logically implied by some set of other invariants. For example:
  (x = y) ∧ (z = 1) ⇒ x = y ⋅ z
  (x = z) ∧ (y = 1) ⇒ x = y ⋅ z
Other optimizations are special cases of suppression
Goals
  Instantiate/check only non-redundant invariants
  Use no storage for non-instantiated invariants
When an antecedent is falsified
  Each invariant that might be suppressed is checked
  If a suppression held before the antecedent was falsified, but no suppression holds after, the invariant is instantiated
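
A sketch of that rule for the consequent x = y ⋅ z, using the two suppressions from the example above. The class and helper names are hypothetical. For brevity the sketch tracks the four antecedents explicitly and gives the consequent its own object, whereas the stated goal is to use no storage at all for a suppressed invariant.

```python
# Each suppression is a set of antecedents that together imply "x = y * z".
SUPPRESSIONS = [
    frozenset({"x = y", "z = 1"}),
    frozenset({"x = z", "y = 1"}),
]

ANTECEDENT_CHECKS = {
    "x = y": lambda s: s["x"] == s["y"],
    "z = 1": lambda s: s["z"] == 1,
    "x = z": lambda s: s["x"] == s["z"],
    "y = 1": lambda s: s["y"] == 1,
}


def is_suppressed(holding):
    """True if every antecedent of at least one suppression still holds."""
    return any(suppression <= holding for suppression in SUPPRESSIONS)


class SuppressedProduct:
    """Tracks the consequent x = y * z without checking it while suppressed."""

    def __init__(self):
        self.holding = set(ANTECEDENT_CHECKS)  # antecedents not yet falsified
        self.instantiated = False
        self.falsified = False

    def add_sample(self, sample):
        was_suppressed = is_suppressed(self.holding)
        self.holding = {a for a in self.holding if ANTECEDENT_CHECKS[a](sample)}
        if (not self.instantiated and was_suppressed
                and not is_suppressed(self.holding)):
            # Implied by the antecedents over all past samples, so it is
            # safe to create it now and start checking it.
            self.instantiated = True
        if self.instantiated and not self.falsified:
            self.falsified = sample["x"] != sample["y"] * sample["z"]


inv = SuppressedProduct()
history = []
for s in [{"x": 1, "y": 1, "z": 1},
          {"x": 4, "y": 4, "z": 1},   # second suppression breaks, first holds
          {"x": 6, "y": 3, "z": 2}]:  # first suppression breaks too
    inv.add_sample(s)
    history.append(inv.instantiated)
```

The invariant is instantiated only on the third sample, when the last remaining suppression stops holding, and it is then checked against that sample like any other candidate.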

SLIDE 20

Outline

Approaches to invariant detection
Optimized incremental algorithm
Complications
  Missing variables
  Optimizations interact
Results

SLIDE 21

Missing variables

Suppose a is null. What do we do with the invariant a.b > x?
One choice is to falsify the invariant
  The invariant is thus: (a ≠ null) ∧ (a.b > x)
  Problem: interesting invariants are lost
The alternative is to retain the invariant
  The invariant is thus: (a ≠ null) ⇒ (a.b > x)
  Problem: difficult to implement
Optimizations must take missing values into account
  Constants must never be missing
  Members of an equality set must have identical missing attributes
  Suppressions can’t assume transitivity
    (x > a.b) ∧ (a.b > y) ⇏ (x > y)
    ((a ≠ null) ⇒ (x > a.b)) ∧ ((a ≠ null) ⇒ (a.b > y)) ⇒ ((a ≠ null) ⇒ (x > y))
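
A small example of why transitivity breaks under the "retain" policy. The `MISSING` marker, the `holds_gt` helper, and the sample values are assumptions made for this illustration.

```python
MISSING = object()  # hypothetical marker for a value such as a.b when a is null


def holds_gt(left, right):
    """Check left > right, treating a missing operand as 'not falsified'."""
    if left is MISSING or right is MISSING:
        return True  # retain the invariant: (a != null) implies (left > right)
    return left > right


samples = [
    {"x": 9, "a.b": 5, "y": 2},        # a is non-null
    {"x": 1, "a.b": MISSING, "y": 7},  # a is null, so a.b is missing
]

x_gt_ab = all(holds_gt(s["x"], s["a.b"]) for s in samples)
ab_gt_y = all(holds_gt(s["a.b"], s["y"]) for s in samples)
x_gt_y = all(holds_gt(s["x"], s["y"]) for s in samples)
```

Both x > a.b and a.b > y survive, because the second sample cannot falsify them, yet x > y is false on that same sample. A suppression that derived x > y from the other two would therefore report an invariant that does not hold.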

SLIDE 22

Optimizations interact

When checking to see if an invariant is no longer suppressed, uninstantiated invariants must be considered.
Creating parent invariants using the program point hierarchy
  Suppression optimizations must be undone
  Constant and equality set information must be merged
  Different equalities in different children require special processing
  Uninstantiated invariants between constants must be considered

SLIDE 23

Outline

Approaches to invariant detection
Optimized incremental algorithm
Complications
Results
  Optimizations are effective
  Real programs can be processed
  Performance comparison on the Daikon utilities
  Contributions

SLIDE 24

Optimizations are effective

[Figure: candidate invariant count after each sample is processed. x-axis: sample count (500 to 2500); y-axis: invariant count (log scale, 1 to 1e+07). Series: without optimizations, with all optimizations.]

100 times fewer invariants with the optimizations

SLIDE 25

Real programs can be processed

The optimized algorithm can process non-trivial programs in a reasonable amount of time and space
The multi-pass and simple incremental approaches cannot process our experiments

Experiments
  Flex lexical analyzer
    391 program points averaging 275 variables each
    232,000 samples (9.2 Gbytes of data)
    Processing time of 4 hours
    Max memory use of 750 Mbytes
  Daikon utilities
    1593 program points averaging 60 variables each
    26 million samples (11.5 Gbytes of data)
    Processing time of 1.5 hours
    Max memory use of 150 Mbytes

SLIDE 26

Performance comparison on the Daikon utilities

[Figure: two plots against the number of samples processed (0M to 30M), each comparing the incremental and multi-pass approaches. First plot: memory (Mbytes), axis ticks 200 to 800. Second plot: time (minutes), axis ticks 50 to 200.]

SLIDE 27

Contributions

Effective optimizations in an incremental context
  Redundant invariants are neither instantiated nor checked
  When antecedents are falsified, the optimization is undone and invariants that are no longer redundant are created
Result is usable in a wide variety of contexts
  Handles non-trivial programs
  Supports a rich set of derived variables and invariants
  Supports on-line operation
