Program Verification via Machine Learning
Aditya V. Nori, Programming Languages & Tools group, Microsoft Research India
Joint work with Rahul Sharma and Alex Aiken (Stanford University)
Program verification

1: x = y = 0;
2: while (*)
3:   x++; y++;
4: while (x != 0)
5:   x--; y--;
6: assert (y == 0);

Question: Is the assertion satisfied for all possible inputs?

1: gcd(int x, int y)
2: {
3:   assume(x>0 && y>0);
4:   while (x != y) {
5:     if (x > y) x = x-y;
6:     if (y > x) y = y-x;
7:   }
8:   return x;
9: }

Question: Does gcd terminate for all inputs x, y?
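Both questions can at least be probed dynamically. The sketch below (ours, not from the talk; the names `first_program` and `gcd_steps` are hypothetical) simulates the two programs: it checks the assertion over all bounded sequences of nondeterministic choices, and counts gcd's loop iterations on a grid of inputs. Bounded testing like this yields evidence, not a proof.

```python
import itertools

def first_program(choices):
    """Simulate the nondeterministic program: the while(*) loop runs one
    iteration per leading True in `choices`, then the second loop drains x."""
    x = y = 0
    for c in choices:
        if not c:
            break
        x += 1
        y += 1
    while x != 0:
        x -= 1
        y -= 1
    return y  # the assertion claims this is 0

def gcd_steps(x, y, fuel=10000):
    """Run gcd and count loop iterations; return None if fuel is exhausted."""
    steps = 0
    while x != y:
        if steps >= fuel:
            return None
        if x > y:
            x = x - y
        if y > x:
            y = y - x
        steps += 1
    return steps

# Bounded evidence (not a proof): the assertion holds for up to 8 unrollings
assertion_ok = all(first_program(cs) == 0
                   for n in range(9)
                   for cs in itertools.product([True, False], repeat=n))

# gcd terminates on all small positive inputs
gcd_terminates = all(gcd_steps(x, y) is not None
                     for x in range(1, 30) for y in range(1, 30))
```

This is exactly the gap the talk addresses: the tests suggest the properties hold, and the guess-and-check approach turns such data into actual proofs.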
Current state of affairs
• Precision
• Scalability
• Testing is still the dominant technique for establishing software quality
Question …
• Most applications are associated with test suites, primarily used for regression or fuzz testing
• Can we use these test suites profitably for proving program correctness?
Here's the plan …
• Guess: analyse data from tests in order to infer a candidate invariant (using ML techniques)
• Check: validate the candidate invariant using sound program analysis techniques
• If the check succeeds, then we have a proof!
• If the check fails, use the failure to generate more data and repeat guess+check
• Why is this nice?
  • Program analysis is not so good at guessing invariants
  • Program analysis is good at checking invariants
  • We are able to make use of data generated from programs and existing ML algorithms for analysis
Instantiations of Guess
• Classification
  • Interpolants as Classifiers. Sharma, Nori, Aiken. Computer-Aided Verification (CAV 2012)
  • Program Verification as Learning Geometric Concepts. Sharma, Gupta, Hariharan, Aiken, Nori. Submitted
• Linear algebra
  • A Data Driven Approach for Algebraic Loop Invariants. Sharma, Gupta, Hariharan, Aiken, Nori. European Symposium on Programming (ESOP 2012)
• Regression
  • Termination Proofs from Tests. Nori, Sharma. Submitted
Interpolants
• An interpolant for a pair of formulas A, B such that A ∧ B = ⊥ is a formula I satisfying:
  • A ⇒ I
  • I ∧ B = ⊥
  • vars(I) ⊆ vars(A) ∩ vars(B)
• An interpolant is a "simple" proof
Example
• A = x ≥ y
• B = y ≥ x + 1
• I = 2x + 1 ≥ 2y
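The three interpolant conditions for this example can be sanity-checked exhaustively on a finite integer box. A grid check is supporting evidence, not a logical proof; the code and names below are our illustration, not from the talk.

```python
# The slide's formulas as Python predicates (names A, B, I are ours)
def A(x, y):
    return x >= y

def B(x, y):
    return y >= x + 1

def I(x, y):
    return 2 * x + 1 >= 2 * y

R = range(-50, 51)
# A => I: every point satisfying A also satisfies I
implication_holds = all(I(x, y) for x in R for y in R if A(x, y))
# I and B = false: no point satisfies both
conjunction_unsat = not any(I(x, y) and B(x, y) for x in R for y in R)
```

Note also that vars(I) = {x, y} = vars(A) ∩ vars(B), so the third condition holds trivially here.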
Binary classification
• Input: a set of points X, each labeled +1 or −1
• Goal: find a classifier C: X → {true, false} such that:
  • C(x) = true for every x ∈ X labeled +1, and
  • C(x) = false for every x ∈ X labeled −1
Verification & machine learning
• Interpolant: separates formula A from formula B
• Classifier: separates positive examples from negative examples
Is there a connection?
Yes!
• Main result: view interpolants as classifiers that distinguish '+' examples from '−' examples
• Use state-of-the-art classification algorithms (SVMs) for computing invariants
• SVMs are predictive ⇒ generalized predicates for verification
Verification & machine learning
• Verification: unroll the loops → find interpolants → get general proofs (loop invariants)
• Machine learning: get positive and negative examples → find a classifier → get a predicate which generalizes to test data
Example

1: x = y = 0;
2: while (*)
3:   x++; y++;
4: while (x != 0)
5:   x--; y--;
6: assert (y == 0);
Example …
• A ≡ x₁ = 0 ∧ y₁ = 0 ∧ ite(*, x = x₁ + 1 ∧ y = y₁ + 1, x = x₁ ∧ y = y₁)   (lines 1–3, one unrolling of the first loop)
• B ≡ ite(x ≠ 0, x₂ = x − 1 ∧ y₂ = y − 1, x₂ = x ∧ y₂ = y) ∧ x₂ = 0 ∧ y₂ ≠ 0   (lines 4–6, one unrolling of the second loop, ending in an assertion violation)
• A ∧ B = ⊥
• I(x, y) ≡ x = y
Example
• A ≡ x₁ = 0 ∧ y₁ = 0 ∧ ite(*, x = x₁ + 1 ∧ y = y₁ + 1, x = x₁ ∧ y = y₁)
• B ≡ ite(x ≠ 0, x₂ = x − 1 ∧ y₂ = y − 1, x₂ = x ∧ y₂ = y) ∧ x₂ = 0 ∧ y₂ ≠ 0
• Positive points: (0,0), (1,1)
• Candidate I₁ ≡ 2y ≤ 2x + 1
Example
• A ≡ x₁ = 0 ∧ y₁ = 0 ∧ ite(*, x = x₁ + 1 ∧ y = y₁ + 1, x = x₁ ∧ y = y₁)
• B ≡ ite(x ≠ 0, x₂ = x − 1 ∧ y₂ = y − 1, x₂ = x ∧ y₂ = y) ∧ x₂ = 0 ∧ y₂ ≠ 0
• Positive points: (0,0), (1,1)
• I₂ ≡ 2y ≤ 2x + 1 ∧ 2y ≥ 2x − 1 — an interpolant!
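A quick dynamic check (ours, not the talk's SMT-based one) that I₂ behaves like an interpolant: it holds on sampled loop-head states satisfying A (which all have x = y), and fails on every state in a small box from which one unrolling of the second loop ends in x₂ = 0 ∧ y₂ ≠ 0, i.e. the states satisfying B.

```python
def I2(x, y):
    """Candidate interpolant from the slide: 2y <= 2x + 1 and 2y >= 2x - 1.
    Over the integers this is equivalent to x == y."""
    return 2 * y <= 2 * x + 1 and 2 * y >= 2 * x - 1

# States satisfying A: after the first loop, x == y (sampled here)
reachable = [(k, k) for k in range(0, 20)]

# States satisfying B: one unrolling of the second loop ends with
# x2 == 0 and y2 != 0 (enumerated over a small box)
bad = []
for x in range(-5, 6):
    for y in range(-5, 6):
        x2, y2 = (x - 1, y - 1) if x != 0 else (x, y)
        if x2 == 0 and y2 != 0:
            bad.append((x, y))

separates = (all(I2(x, y) for x, y in reachable)
             and not any(I2(x, y) for x, y in bad))
```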
The algorithm

Interpolant(A, B):
  (X⁺, X⁻) = Init(A, B)
  while (true) {
    I = SVM(X⁺, X⁻)              // find candidate interpolant
    if (SAT(A ∧ ¬I))             // does A ⇒ I fail?
      add the model s to X⁺ and continue;
    if (SAT(B ∧ I))              // does I ∧ B = ⊥ fail?
      add the model s to X⁻ and continue;
    break;                       // exit: interpolant found
  }
  return I;

Theorem: if Interpolant(A, B) terminates, then its output I is an interpolant between A and B.
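The loop above can be sketched end-to-end on a toy instance. Two stand-ins, both our simplifications rather than the talk's actual tools: brute-force search over a finite integer box plays the role of the SAT/SMT checker, and a perceptron (no margin maximization, unlike a real SVM) produces candidate halfspaces. The names `sat`, `fit_halfspace`, and `interpolant` are ours.

```python
BOX = range(-5, 6)

def sat(formula):
    """Return some model (x, y) of `formula` inside the box, else None.
    A brute-force stand-in for the SAT/SMT checker."""
    for x in BOX:
        for y in BOX:
            if formula(x, y):
                return (x, y)
    return None

def fit_halfspace(pos, neg):
    """Perceptron: find w, b with w.p + b > 0 on pos and <= 0 on neg.
    Terminates here because the sampled data is linearly separable."""
    w0 = w1 = b = 0.0
    samples = [(p, +1) for p in pos] + [(p, -1) for p in neg]
    while True:
        done = True
        for (x, y), lab in samples:
            if lab * (w0 * x + w1 * y + b) <= 0:   # misclassified: update
                w0, w1, b = w0 + lab * x, w1 + lab * y, b + lab
                done = False
        if done:
            return lambda x, y: w0 * x + w1 * y + b > 0

def interpolant(A, B):
    pos, neg = [sat(A)], [sat(B)]          # Init: one sample from each side
    while True:
        I = fit_halfspace(pos, neg)        # candidate interpolant
        cex = sat(lambda x, y: A(x, y) and not I(x, y))   # A => I ?
        if cex is not None:
            pos.append(cex)
            continue
        cex = sat(lambda x, y: B(x, y) and I(x, y))       # I and B unsat ?
        if cex is not None:
            neg.append(cex)
            continue
        return I                           # both checks passed

I = interpolant(lambda x, y: x >= y, lambda x, y: y >= x + 1)
ok_A = all(I(x, y) for x in BOX for y in BOX if x >= y)
ok_B = not any(I(x, y) for x in BOX for y in BOX if y >= x + 1)
```

Within the finite box the loop must terminate: the perceptron only returns candidates consistent with all collected samples, so every counterexample is a genuinely new point, and there are finitely many.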
Evaluation
• 1000 lines of C++
• LIBSVM for SVM queries
• Z3 theorem prover
Proving termination
• For every loop, guess a bound on the number of iterations
• Check the bound with a safety checker
Example: GCD

1: gcd(int x, int y)
2: {
3:   assume(x>0 && y>0);
4:   while (x != y) {
5:     if (x > y) x = x-y;
6:     if (y > x) y = y-x;
7:   }
8:   return x;
9: }
Example: Instrumented GCD

1: gcd(int x, int y)
2: {
3:   assume(x>0 && y>0);
4:   // instrumented code
5:   a = x; b = y; c = 0;
6:   while (x != y) {
7:     // instrumented code
8:     c = c+1;
9:     writeLog(a, b, c, x, y);
10:    if (x > y) x = x-y;
11:    if (y > x) y = y-x;
12:  }
13:  return x;
14: }

• Inputs: (x, y) ∈ {(1,2), (2,1), (1,3), (3,1)}
• Logged data, with A holding rows (1, a, b) and C the iteration counts c:

      1 1 2        1
      1 2 1        1
  A = 1 1 3 ,  C = 1
      1 1 3        2
      1 3 1        1
      1 3 1        2

• Find c ≈ w₁a + w₂b + w₃ (linear regression)
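Running the instrumented program on the four test inputs reproduces the logged table. A small Python transliteration (ours) of the slide's C code:

```python
def gcd_logged(x, y):
    """Instrumented gcd from the slide: record (a, b, c) at every iteration."""
    a, b, c = x, y, 0
    log = []
    while x != y:
        c = c + 1
        log.append((a, b, c))          # writeLog(a, b, c, x, y), abridged
        if x > y:
            x = x - y
        if y > x:
            y = y - x
    return log

rows = []
for x, y in [(1, 2), (2, 1), (1, 3), (3, 1)]:
    rows.extend(gcd_logged(x, y))

# Regression data: matrix A has rows (1, a, b), vector C the counts c
A = [(1, a, b) for a, b, c in rows]
C = [c for a, b, c in rows]
```

The six rows match the slide's A and C: inputs (1,3) and (3,1) each take two iterations, so they contribute two log entries apiece.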
Linear regression
• min over w of Σᵢ (w₁aᵢ + w₂bᵢ + w₃ − cᵢ)²
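A minimal least-squares fit of the logged gcd data via the normal equations (pure Python; the 3×3 solver and the name `lstsq3` are ours). On this data the unconstrained fit comes out to roughly c ≈ 0.5a + 0.5b − 0.5, which is not an upper bound on c — the motivation for the constrained version on the next slide.

```python
def lstsq3(data, targets):
    """Least-squares fit c ~ w1*a + w2*b + w3 via the normal equations,
    solved with a tiny Gaussian elimination (fine for 3 unknowns)."""
    X = [[a, b, 1.0] for a, b in data]
    n = len(X)
    M = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(3)]
         for i in range(3)]                                   # X^T X
    v = [sum(X[k][i] * targets[k] for k in range(n)) for i in range(3)]  # X^T c
    for i in range(3):                    # forward elimination with pivoting
        p = max(range(i, 3), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        v[i], v[p] = v[p], v[i]
        for r in range(i + 1, 3):
            f = M[r][i] / M[i][i]
            M[r] = [mr - f * mi for mr, mi in zip(M[r], M[i])]
            v[r] -= f * v[i]
    w = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):                   # back substitution
        w[i] = (v[i] - sum(M[i][j] * w[j] for j in range(i + 1, 3))) / M[i][i]
    return w

# Logged (a, b) pairs and iteration counts c from the four gcd tests
data = [(1, 2), (2, 1), (1, 3), (1, 3), (3, 1), (3, 1)]
counts = [1, 1, 1, 2, 1, 2]
w1, w2, w3 = lstsq3(data, counts)   # about (0.5, 0.5, -0.5) on this data
```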
Quadratic programming
• min over w of Σᵢ (w₁aᵢ + w₂bᵢ + w₃ − cᵢ)²  subject to  Aw ≥ C
• The guess is f(a, b) = a + b − 2
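The constraint Aw ≥ C forces the learned bound to sit above every observed count. A small check (ours) on the logged data: the slide's guess f(a, b) = a + b − 2 is an upper bound on every observed c, while an unconstrained least-squares fit of the same data (approximately 0.5a + 0.5b − 0.5, computed separately) is not.

```python
# Logged rows (a, b, c) from the four gcd test inputs
data = [(1, 2, 1), (2, 1, 1), (1, 3, 1), (1, 3, 2), (3, 1, 1), (3, 1, 2)]

def f(a, b):
    """The guess produced by the constrained fit on the slide."""
    return a + b - 2

# The QP constraint in action: the guess dominates every observed count
guess_is_upper_bound = all(f(a, b) >= c for a, b, c in data)

# The unconstrained least-squares fit underestimates c on some rows
lstsq_is_upper_bound = all(0.5 * a + 0.5 * b - 0.5 >= c for a, b, c in data)
```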
Example: Annotated GCD

1: gcd(int x, int y)
2: {
3:   assume(x>0 && y>0);
4:   a = x; b = y; c = 0;
5:   while (x != y) {
6:     // annotation
7:     free_invariant(c <= a+b-x-y);
8:     // annotation
9:     assert(c <= a+b-2);
10:    if (x > y) x = x-y;
11:    if (y > x) y = y-x;
12:  }
13:  return x;
14: }

• Check with a safety checker
• Free invariant to aid the checker: c ≤ a + b − x − y ∧ x > 0 ∧ y > 0
• Corrective measures
  • Sound rounding for polynomials with integer coefficients
  • Partitioning of tests for discovering disjunctive loop bounds
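The annotations can also be exercised dynamically before handing the program to the safety checker: run gcd on a grid of inputs and assert the free invariant and the guessed bound at every loop head. This is a runtime sketch of ours; the actual verification in the talk is static.

```python
def gcd_checked(x, y):
    """Run gcd, asserting the free invariant and the guessed bound at the
    loop head, mirroring what the safety checker must verify statically."""
    assert x > 0 and y > 0
    a, b, c = x, y, 0
    while x != y:
        assert c <= a + b - x - y and x > 0 and y > 0   # free invariant
        assert c <= a + b - 2                           # guessed loop bound
        c = c + 1
        if x > y:
            x = x - y
        if y > x:
            y = y - x
    return x

# Dynamic evidence over a grid of inputs (the actual proof is static)
all_checks_pass = True
for x in range(1, 40):
    for y in range(1, 40):
        try:
            gcd_checked(x, y)
        except AssertionError:
            all_checks_pass = False
```

The free invariant holds inductively because each iteration increases c by 1 while decreasing x + y by at least 1; combined with x > 0 ∧ y > 0 it implies the asserted bound c ≤ a + b − 2.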
Evaluation
Summary
• Classification-based algorithms can be used for computing proofs in program verification
• Follow-up work on using techniques from linear algebra and PAC learning for scalable proofs
• Proving program termination via linear regression
• Data Driven Program Analysis