
Machine learning Aditya V. Nori Programming Languages & Tools - PowerPoint PPT Presentation



  1. Program verification via Machine learning
     Aditya V. Nori, Programming Languages & Tools group, Microsoft Research India
     Joint work with Rahul Sharma, Alex Aiken (Stanford University)

  2. Program verification
     • Program 1:
         x = y = 0;
         while (*)
           { x++; y++; }
         while (x != 0)
           { x--; y--; }
         assert (y == 0);
       Question: Is the assertion satisfied for all possible inputs?
     • Program 2:
         gcd(int x, int y)
         {
           assume(x>0 && y>0);
           while (x != y) {
             if (x > y) x = x-y;
             if (y > x) y = y-x;
           }
           return x;
         }
       Question: Does gcd terminate for all inputs x, y?

  3. Current state of affairs
     • Precision
     • Scalability
     • Testing is still the dominant technique for establishing software quality

  4. Question …
     • Most applications are associated with test suites, primarily used for regression or fuzz testing
     • Can we use these test suites profitably for proving program correctness?

  5. Here's the plan …
     • Guess: analyse data from tests in order to infer a candidate invariant (use ML techniques)
     • Check: validate the candidate invariant using sound program analysis techniques
     • If check succeeds, then we have a proof!
     • If check fails, use the failure to generate more data and repeat guess+check
     • Why is this nice?
         • Program analysis is not so good at guessing invariants
         • Program analysis is good at checking invariants
         • Able to make use of data generated from programs and existing ML algorithms for analysis
     (Figure: the program feeds a loop between the Guess and Check steps.)

  6. Instantiations of Guess
     • Classification
         • Interpolants as Classifiers. Sharma, N, Aiken. Computer-Aided Verification (CAV 2012)
         • Program Verification as Learning Geometric Concepts. Sharma, Gupta, Hariharan, Aiken, N. Submitted
     • Linear algebra
         • A Data Driven Approach for Algebraic Loop Invariants. Sharma, Gupta, Hariharan, Aiken, N. European Symposium on Programming (ESOP 2012)
     • Regression
         • Termination proofs from tests. N, Sharma. Submitted

  7. Interpolants
     • An interpolant for a pair of formulas A, B s.t. (A ∧ B = ⊥) is a formula I satisfying:
         • A ⇒ I
         • I ∧ B = ⊥
         • Vars(I) ⊆ Vars(A) ∩ Vars(B)
     • An interpolant is a "simple" proof

  8. Example
     • A = x ≥ y
     • B = y ≥ x + 1
     • I = 2x + 1 ≥ 2y
     (Figure: plot with axes x and y.)
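The two semantic conditions from the definition can be spot-checked on the example above. The following is a minimal sketch, assuming integer variables and a bounded grid in place of a real solver; the helper name `is_interpolant_on_grid` and the bound are hypothetical, and passing the check is evidence rather than a proof. The variable condition holds by inspection, since I mentions only x and y.

```python
from itertools import product

def A(x, y):
    # A = x >= y
    return x >= y

def B(x, y):
    # B = y >= x + 1
    return y >= x + 1

def I(x, y):
    # Candidate interpolant I = 2x + 1 >= 2y
    return 2 * x + 1 >= 2 * y

def is_interpolant_on_grid(A, B, I, bound=20):
    """Bounded check of A => I and of I and B never holding together."""
    for x, y in product(range(-bound, bound + 1), repeat=2):
        if A(x, y) and not I(x, y):
            return False  # violates A => I
        if I(x, y) and B(x, y):
            return False  # violates I ∧ B = ⊥
    return True

print(is_interpolant_on_grid(A, B, I))
```

A candidate such as x > y fails the same check, because the point (0, 0) satisfies A but not the candidate.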

  9. Binary classification
     • Input: a set of points X with labels l ∈ {+1, −1}
     • Goal: find a classifier C: X → {true, false} such that:
         • C(a) = true, ∀a ∈ X . label(a) = +1, and
         • C(b) = false, ∀b ∈ X . label(b) = −1
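As an illustration of the goal stated above, here is a minimal sketch that uses a perceptron as the learner. This is a stand-in chosen for brevity, not the SVM-based method of the talk; the function names and the six sample points are assumptions made for the example.

```python
def train_perceptron(points, labels, max_epochs=1000):
    """Return (w, b) that classifies every point correctly, or None."""
    w = [0] * len(points[0])
    b = 0
    for _ in range(max_epochs):
        mistakes = 0
        for p, l in zip(points, labels):
            score = sum(wi * pi for wi, pi in zip(w, p)) + b
            if l * score <= 0:  # misclassified, or exactly on the boundary
                w = [wi + l * pi for wi, pi in zip(w, p)]
                b += l
                mistakes += 1
        if mistakes == 0:
            return w, b
    return None  # no separating hyperplane found within max_epochs

def classify(model, p):
    """C(p): true on the positive side of the learned hyperplane."""
    w, b = model
    return sum(wi * pi for wi, pi in zip(w, p)) + b > 0

# Hypothetical sample: positives satisfy x >= y, negatives satisfy y >= x + 1.
X = [(0, 0), (1, 1), (2, 1), (0, 1), (1, 2), (0, 2)]
labels = [+1, +1, +1, -1, -1, -1]

model = train_perceptron(X, labels)
print([classify(model, p) for p in X])
```

The perceptron only terminates with a model when the labelled points are linearly separable, which is why the sketch caps the number of epochs.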

  10. Verification & Machine-learning
     • Interpolant: separates formula A from formula B
     • Classifier: separates positive examples from negative examples
     Is there a connection?

  11. Yes!
     • Main result: view interpolants as classifiers which distinguish "+" examples from "−" examples
     • Use state-of-the-art classification algorithms (SVMs) for computing invariants
     • SVMs are predictive → generalized predicates for verification

  12. Verification & Machine-learning
     Verification:
     • Unroll the loops
     • Find interpolants
     • Get general proofs (loop invariants)
     Machine learning:
     • Get positive and negative examples
     • Find a classifier
     • This is a predicate which generalizes to test data

  13. Example
         x = y = 0;
         while (*)
           { x++; y++; }
         while (x != 0)
           { x--; y--; }
         assert (y == 0);

  14. Example …
         x = y = 0;
         while (*)
           { x++; y++; }        // A: first loop, unrolled once
         while (x != 0)
           { x--; y--; }        // B: second loop, unrolled once
         assert (y == 0);
     • A ≡ x₁ = 0 ∧ y₁ = 0 ∧ ite(b, x = x₁ + 1 ∧ y = y₁ + 1, x = x₁ ∧ y = y₁)
     • B ≡ ite(x ≠ 0, x₂ = x − 1 ∧ y₂ = y − 1, x₂ = x ∧ y₂ = y) ∧ x₂ = 0 ∧ y₂ ≠ 0
     • A ∧ B = ⊥
     • I(x, y) ≡ x = y

  15. Example
     • A ≡ x₁ = 0 ∧ y₁ = 0 ∧ ite(b, x = x₁ + 1 ∧ y = y₁ + 1, x = x₁ ∧ y = y₁)
     • B ≡ ite(x ≠ 0, x₂ = x − 1 ∧ y₂ = y − 1, x₂ = x ∧ y₂ = y) ∧ x₂ = 0 ∧ y₂ ≠ 0
     • I₁ ≡ 2y ≤ 2x + 1
     (Figure: x-y plot with positive examples "+" at (0,0) and (1,1).)

  16. Example
     • A ≡ x₁ = 0 ∧ y₁ = 0 ∧ ite(b, x = x₁ + 1 ∧ y = y₁ + 1, x = x₁ ∧ y = y₁)
     • B ≡ ite(x ≠ 0, x₂ = x − 1 ∧ y₂ = y − 1, x₂ = x ∧ y₂ = y) ∧ x₂ = 0 ∧ y₂ ≠ 0
     • I₂ ≡ 2y ≤ 2x + 1 ∧ 2y ≥ 2x − 1
     Interpolant!
     (Figure: x-y plot with positive examples "+" at (0,0) and (1,1).)

  17. The algorithm
     Theorem: Interpolant(A, B) terminates only if the output H is an interpolant between A and B

         Interpolant(A, B)
           (X⁺, X⁻) = Init(A, B)
           while (true) {
             H = SVMI(X⁺, X⁻)          // find candidate interpolant
             if (SAT(A ∧ ¬H))           // checks A ⇒ I
               add the satisfying assignment to X⁺ and continue;
             if (SAT(B ∧ H))            // checks I ∧ B = ⊥
               add the satisfying assignment to X⁻ and continue;
             break;                     // exit if interpolant found
           }
           return H;
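The loop above can be sketched end to end on the slide 8 formulas. This is an illustration under stated assumptions, not the implementation from the talk: a perceptron (`learn`) stands in for the SVM call, a brute-force search over a small integer grid (`GRID`) stands in for the SAT queries, and all names are hypothetical. The sketch assumes A and B are linearly separable on the grid; otherwise `learn` would not terminate.

```python
from itertools import product

GRID = list(product(range(-5, 6), repeat=2))  # bounded stand-in for the solver

def A(x, y):
    return x >= y

def B(x, y):
    return y >= x + 1

def learn(pos, neg):
    """Perceptron stand-in for the SVM call; returns (w1, w2, b)."""
    w1 = w2 = b = 0
    data = [(p, 1) for p in pos] + [(p, -1) for p in neg]
    while True:
        updated = False
        for (x, y), l in data:
            if l * (w1 * x + w2 * y + b) <= 0:
                w1, w2, b = w1 + l * x, w2 + l * y, b + l
                updated = True
        if not updated:
            return w1, w2, b

def interpolant(A, B, pos, neg):
    """Guess a half-plane H, check it against A and B, refine on failure."""
    pos, neg = list(pos), list(neg)
    while True:
        w1, w2, b = learn(pos, neg)
        H = lambda x, y: w1 * x + w2 * y + b > 0
        # Stand-in for SAT(A ∧ ¬H): a point of A that H rejects.
        cex = next((p for p in GRID if A(*p) and not H(*p)), None)
        if cex is not None:
            pos.append(cex)
            continue
        # Stand-in for SAT(B ∧ H): a point of B that H accepts.
        cex = next((p for p in GRID if B(*p) and H(*p)), None)
        if cex is not None:
            neg.append(cex)
            continue
        return w1, w2, b

w1, w2, b = interpolant(A, B, pos=[(0, 0)], neg=[(0, 1)])
print(w1, w2, b)
```

Each counterexample is misclassified by the current candidate, so it is always a new sample; on a finite grid the refinement therefore stops after finitely many rounds.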

  18. Evaluation
     • 1000 lines of C++
     • LIBSVM for SVM queries
     • Z3 theorem prover

  19. Proving termination
     • For every loop, guess a bound on the number of iterations
     • Check the bound with a safety checker

  20. Example: GCD
         gcd(int x, int y)
         {
           assume(x>0 && y>0);
           while (x != y) {
             if (x > y) x = x-y;
             if (y > x) y = y-x;
           }
           return x;
         }

  21. Example: Instrumented GCD
         gcd(int x, int y)
         {
           assume(x>0 && y>0);
           // instrumented code
           a = x; b = y; c = 0;
           while (x != y) {
             // instrumented code
             c = c+1;
             writeLog(a, b, c, x, y);
             if (x > y) x = x-y;
             if (y > x) y = y-x;
           }
           return x;
         }
     • Inputs (x, y) = { (1,2), (2,1), (1,3), (3,1) }
     • Logged data, one row per loop iteration (A holds the rows (1, a, b), C holds the column c):
           1  a  b  |  c
           1  1  2  |  1
           1  2  1  |  1
           1  1  3  |  1
           1  1  3  |  2
           1  3  1  |  1
           1  3  1  |  2
     • Find c ≈ w₁a + w₂b + w₃ (linear regression)
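The logged data can be regenerated by running the instrumented loop on the four test inputs. The following sketch is a direct Python transliteration of the slide's code, with a list standing in for `writeLog`; the function name `gcd_instrumented` is hypothetical.

```python
def gcd_instrumented(x, y, log):
    """Subtraction-based GCD that records (a, b, c, x, y) on every iteration."""
    assert x > 0 and y > 0   # assume(x>0 && y>0)
    a, b, c = x, y, 0        # instrumented code: initial inputs and counter
    while x != y:
        c = c + 1            # instrumented code: count this iteration
        log.append((a, b, c, x, y))
        if x > y:
            x = x - y
        if y > x:
            y = y - x
    return x

log = []
for x, y in [(1, 2), (2, 1), (1, 3), (3, 1)]:
    gcd_instrumented(x, y, log)

A = [(1, a, b) for a, b, c, x, y in log]  # rows (1, a, b)
C = [c for a, b, c, x, y in log]          # observed iteration counts

for row, c in zip(A, C):
    print(row, c)
```

Inputs that need two iterations, such as (1, 3), contribute two rows with the same (a, b) but different c, which is why the regression has to be turned into a bound rather than a fit.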

  22. Linear regression
     • min Σᵢ (w₁aᵢ + w₂bᵢ + w₃ − cᵢ)²

  23. Quadratic programming
     • min Σᵢ (w₁aᵢ + w₂bᵢ + w₃ − cᵢ)²  s.t.  Aw ≥ C
     • Guess is τ(a, b) = a + b − 2
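To see why the constraint matters, the sketch below fits the six logged rows by plain least squares and compares the result with the guess a + b − 2. It is an illustration only: it solves the unconstrained normal equations exactly with `fractions.Fraction` and does not implement a quadratic programming solver; the helper names are hypothetical.

```python
from fractions import Fraction

# (a, b, c) rows logged by the instrumented GCD on the four test inputs
rows = [(1, 2, 1), (2, 1, 1), (1, 3, 1), (1, 3, 2), (3, 1, 1), (3, 1, 2)]

def least_squares(rows):
    """Solve the 3x3 normal equations exactly for c ~ w1*a + w2*b + w3."""
    feats = [(Fraction(a), Fraction(b), Fraction(1)) for a, b, _ in rows]
    cs = [Fraction(c) for _, _, c in rows]
    # Augmented matrix [F^T F | F^T c]
    M = [[sum(f[i] * f[j] for f in feats) for j in range(3)]
         + [sum(f[i] * c for f, c in zip(feats, cs))] for i in range(3)]
    for col in range(3):  # Gauss-Jordan elimination
        piv = next(r for r in range(col, 3) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(3):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return tuple(M[i][3] for i in range(3))

def sq_error(w, rows):
    """Objective: sum of squared residuals."""
    return sum((w[0] * a + w[1] * b + w[2] - c) ** 2 for a, b, c in rows)

def is_upper_bound(w, rows):
    """Constraint: the prediction is at least c on every row."""
    return all(w[0] * a + w[1] * b + w[2] >= c for a, b, c in rows)

w_ls = least_squares(rows)   # unconstrained fit
w_guess = (1, 1, -2)         # the guess a + b - 2

print(w_ls, sq_error(w_ls, rows), is_upper_bound(w_ls, rows))
print(w_guess, sq_error(w_guess, rows), is_upper_bound(w_guess, rows))
```

On this data the unconstrained fit has the smaller error but predicts fewer iterations than were observed for (1, 3) and (3, 1), so it cannot serve as a loop bound; the guess a + b − 2 pays a larger error and satisfies the constraint on every row.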

  24. Example: Annotated GCD
         gcd(int x, int y)
         {
           assume(x>0 && y>0);
           a = x; b = y; c = 0;
           while (x != y) {
             // annotation
             free_invariant(c <= a+b-x-y);
             // annotation
             assert(c <= a+b-2);
             if (x > y) x = x-y;
             if (y > x) y = y-x;
           }
           return x;
         }
     • Check with a safety checker
     • Free invariant to aid checker: c ≤ a + b − x − y ∧ x > 0 ∧ y > 0
     • Corrective measures
         • Sound rounding for polynomials with integer coefficients
         • Partitioning of tests for discovering disjunctive loop bounds
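The annotations can be exercised dynamically before handing them to a safety checker. The sketch below is bounded testing, not a proof, and rests on one assumption that the slide does not show: the counter c is incremented at the end of the loop body, so that at the loop head it counts completed iterations. The names `gcd_annotated` and `check_bound` are hypothetical.

```python
def gcd_annotated(x, y):
    """Run GCD while asserting the free invariant and the guessed bound."""
    assert x > 0 and y > 0               # assume(x>0 && y>0)
    a, b, c = x, y, 0
    while x != y:
        assert c <= a + b - x - y        # free invariant
        assert c <= a + b - 2            # guessed bound
        if x > y:
            x = x - y
        if y > x:
            y = y - x
        c += 1                           # assumption: counted at end of body
    assert c <= a + b - 2                # bound on the total iteration count
    return x, c

def check_bound(limit):
    """Exercise every input pair in 1..limit; raises AssertionError on failure."""
    for x in range(1, limit + 1):
        for y in range(1, limit + 1):
            gcd_annotated(x, y)
    return True

print(check_bound(30))
print(gcd_annotated(3, 1))
```

The invariant holds because every iteration decreases x + y by at least 1, and the bound follows from it because x + y is at least 2 whenever both values are positive.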

  25. Evaluation

  26. Summary
     • Classification based algorithms can be used for computing proofs in program verification
     • Follow-up work on using techniques from linear algebra and PAC learning for scalable proofs
     • Proving program termination via linear regression
     • Data Driven Program Analysis
