Program Verification via Machine Learning
Aditya V. Nori, Programming Languages & Tools group, Microsoft Research India
Joint work with Rahul Sharma and Alex Aiken (Stanford University)
Program verification

1: x = y = 0;
2: while (*)
3:   x++; y++;
4: while (x != 0)
5:   x--; y--;
6: assert (y == 0);

Question: Is the assertion satisfied for all possible inputs?

1: gcd(int x, int y)
2: {
3:   assume(x>0 && y>0);
4:   while (x != y) {
5:     if (x > y) x = x-y;
6:     if (y > x) y = y-x;
7:   }
8:   return x;
9: }

Question: Does gcd terminate for all inputs x, y?
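Both questions can at least be probed dynamically. The sketch below (ours, not from the talk; the names `first_program` and `gcd_steps` are hypothetical) simulates the two programs: it checks the assertion over all bounded sequences of nondeterministic choices, and counts gcd's loop iterations on a grid of inputs. Bounded testing like this yields evidence, not a proof.

```python
import itertools

def first_program(choices):
    """Simulate the nondeterministic program: the while(*) loop runs one
    iteration per leading True in `choices`, then the second loop drains x."""
    x = y = 0
    for c in choices:
        if not c:
            break
        x += 1
        y += 1
    while x != 0:
        x -= 1
        y -= 1
    return y  # the assertion claims this is 0

def gcd_steps(x, y, fuel=10000):
    """Run gcd and count loop iterations; return None if fuel is exhausted."""
    steps = 0
    while x != y:
        if steps >= fuel:
            return None
        if x > y:
            x = x - y
        if y > x:
            y = y - x
        steps += 1
    return steps

# Bounded evidence (not a proof): the assertion holds for up to 8 unrollings
assertion_ok = all(first_program(cs) == 0
                   for n in range(9)
                   for cs in itertools.product([True, False], repeat=n))

# gcd terminates on all small positive inputs
gcd_terminates = all(gcd_steps(x, y) is not None
                     for x in range(1, 30) for y in range(1, 30))
```

This is exactly the gap the talk addresses: the tests suggest the properties hold, and the guess-and-check approach turns such data into actual proofs.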
Current state of affairs
• Precision
• Scalability
• Testing is still the dominant technique for establishing software quality
Question …
• Most applications are associated with test suites, primarily used for regression or fuzz testing
• Can we use these test suites profitably for proving program correctness?
Here's the plan …
• Guess: analyse data from tests in order to infer a candidate invariant (using ML techniques)
• Check: validate the candidate invariant using sound program analysis techniques
• If the check succeeds, then we have a proof!
• If the check fails, use the failure to generate more data and repeat guess+check
• Why is this nice?
  • Program analysis is not so good at guessing invariants
  • Program analysis is good at checking invariants
  • We are able to make use of data generated from programs and existing ML algorithms for analysis
Instantiations of Guess
• Classification
  • Interpolants as Classifiers. Sharma, Nori, Aiken. Computer-Aided Verification (CAV 2012)
  • Program Verification as Learning Geometric Concepts. Sharma, Gupta, Hariharan, Aiken, Nori. Submitted
• Linear algebra
  • A Data Driven Approach for Algebraic Loop Invariants. Sharma, Gupta, Hariharan, Aiken, Nori. European Symposium on Programming (ESOP 2012)
• Regression
  • Termination Proofs from Tests. Nori, Sharma. Submitted
Interpolants
• An interpolant for a pair of formulas A, B such that A ∧ B = ⊥ is a formula I satisfying:
  • A ⇒ I
  • I ∧ B = ⊥
  • vars(I) ⊆ vars(A) ∩ vars(B)
• An interpolant is a "simple" proof
Example
• A = x ≥ y
• B = y ≥ x + 1
• I = 2x + 1 ≥ 2y
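The three interpolant conditions for this example can be sanity-checked exhaustively on a finite integer box. A grid check is supporting evidence, not a logical proof; the code and names below are our illustration, not from the talk.

```python
# The slide's formulas as Python predicates (names A, B, I are ours)
def A(x, y):
    return x >= y

def B(x, y):
    return y >= x + 1

def I(x, y):
    return 2 * x + 1 >= 2 * y

R = range(-50, 51)
# A => I: every point satisfying A also satisfies I
implication_holds = all(I(x, y) for x in R for y in R if A(x, y))
# I and B = false: no point satisfies both
conjunction_unsat = not any(I(x, y) and B(x, y) for x in R for y in R)
```

Note also that vars(I) = {x, y} = vars(A) ∩ vars(B), so the third condition holds trivially here.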
Binary classification
• Input: a set of points X, each labeled +1 or −1
• Goal: find a classifier C: X → {true, false} such that:
  • C(x) = true for every x ∈ X labeled +1, and
  • C(x) = false for every x ∈ X labeled −1
Verification & machine learning
• Interpolant: separates formula A from formula B
• Classifier: separates positive examples from negative examples
Is there a connection?
Yes!
• Main result: view interpolants as classifiers that distinguish '+' examples from '−' examples
• Use state-of-the-art classification algorithms (SVMs) for computing invariants
• SVMs are predictive ⇒ generalized predicates for verification
Verification & machine learning
• Verification: unroll the loops → find interpolants → get general proofs (loop invariants)
• Machine learning: get positive and negative examples → find a classifier → get a predicate which generalizes to test data
Example

1: x = y = 0;
2: while (*)
3:   x++; y++;
4: while (x != 0)
5:   x--; y--;
6: assert (y == 0);
Example …
• A ≡ x₁ = 0 ∧ y₁ = 0 ∧ ite(*, x = x₁ + 1 ∧ y = y₁ + 1, x = x₁ ∧ y = y₁)   (lines 1–3, one unrolling of the first loop)
• B ≡ ite(x ≠ 0, x₂ = x − 1 ∧ y₂ = y − 1, x₂ = x ∧ y₂ = y) ∧ x₂ = 0 ∧ y₂ ≠ 0   (lines 4–6, one unrolling of the second loop, ending in an assertion violation)
• A ∧ B = ⊥
• I(x, y) ≡ x = y
Example
• A ≡ x₁ = 0 ∧ y₁ = 0 ∧ ite(*, x = x₁ + 1 ∧ y = y₁ + 1, x = x₁ ∧ y = y₁)
• B ≡ ite(x ≠ 0, x₂ = x − 1 ∧ y₂ = y − 1, x₂ = x ∧ y₂ = y) ∧ x₂ = 0 ∧ y₂ ≠ 0
• Positive points: (0,0), (1,1)
• Candidate I₁ ≡ 2y ≤ 2x + 1
Example
• A ≡ x₁ = 0 ∧ y₁ = 0 ∧ ite(*, x = x₁ + 1 ∧ y = y₁ + 1, x = x₁ ∧ y = y₁)
• B ≡ ite(x ≠ 0, x₂ = x − 1 ∧ y₂ = y − 1, x₂ = x ∧ y₂ = y) ∧ x₂ = 0 ∧ y₂ ≠ 0
• Positive points: (0,0), (1,1)
• I₂ ≡ 2y ≤ 2x + 1 ∧ 2y ≥ 2x − 1 — an interpolant!
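A quick dynamic check (ours, not the talk's SMT-based one) that I₂ behaves like an interpolant: it holds on sampled loop-head states satisfying A (which all have x = y), and fails on every state in a small box from which one unrolling of the second loop ends in x₂ = 0 ∧ y₂ ≠ 0, i.e. the states satisfying B.

```python
def I2(x, y):
    """Candidate interpolant from the slide: 2y <= 2x + 1 and 2y >= 2x - 1.
    Over the integers this is equivalent to x == y."""
    return 2 * y <= 2 * x + 1 and 2 * y >= 2 * x - 1

# States satisfying A: after the first loop, x == y (sampled here)
reachable = [(k, k) for k in range(0, 20)]

# States satisfying B: one unrolling of the second loop ends with
# x2 == 0 and y2 != 0 (enumerated over a small box)
bad = []
for x in range(-5, 6):
    for y in range(-5, 6):
        x2, y2 = (x - 1, y - 1) if x != 0 else (x, y)
        if x2 == 0 and y2 != 0:
            bad.append((x, y))

separates = (all(I2(x, y) for x, y in reachable)
             and not any(I2(x, y) for x, y in bad))
```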
The algorithm

Interpolant(A, B):
  (X⁺, X⁻) = Init(A, B)
  while (true) {
    I = SVM(X⁺, X⁻)              // find candidate interpolant
    if (SAT(A ∧ ¬I))             // does A ⇒ I fail?
      add the model s to X⁺ and continue;
    if (SAT(B ∧ I))              // does I ∧ B = ⊥ fail?
      add the model s to X⁻ and continue;
    break;                       // exit: interpolant found
  }
  return I;

Theorem: if Interpolant(A, B) terminates, then its output I is an interpolant between A and B.
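The loop above can be sketched end-to-end on a toy instance. Two stand-ins, both our simplifications rather than the talk's actual tools: brute-force search over a finite integer box plays the role of the SAT/SMT checker, and a perceptron (no margin maximization, unlike a real SVM) produces candidate halfspaces. The names `sat`, `fit_halfspace`, and `interpolant` are ours.

```python
BOX = range(-5, 6)

def sat(formula):
    """Return some model (x, y) of `formula` inside the box, else None.
    A brute-force stand-in for the SAT/SMT checker."""
    for x in BOX:
        for y in BOX:
            if formula(x, y):
                return (x, y)
    return None

def fit_halfspace(pos, neg):
    """Perceptron: find w, b with w.p + b > 0 on pos and <= 0 on neg.
    Terminates here because the sampled data is linearly separable."""
    w0 = w1 = b = 0.0
    samples = [(p, +1) for p in pos] + [(p, -1) for p in neg]
    while True:
        done = True
        for (x, y), lab in samples:
            if lab * (w0 * x + w1 * y + b) <= 0:   # misclassified: update
                w0, w1, b = w0 + lab * x, w1 + lab * y, b + lab
                done = False
        if done:
            return lambda x, y: w0 * x + w1 * y + b > 0

def interpolant(A, B):
    pos, neg = [sat(A)], [sat(B)]          # Init: one sample from each side
    while True:
        I = fit_halfspace(pos, neg)        # candidate interpolant
        cex = sat(lambda x, y: A(x, y) and not I(x, y))   # A => I ?
        if cex is not None:
            pos.append(cex)
            continue
        cex = sat(lambda x, y: B(x, y) and I(x, y))       # I and B unsat ?
        if cex is not None:
            neg.append(cex)
            continue
        return I                           # both checks passed

I = interpolant(lambda x, y: x >= y, lambda x, y: y >= x + 1)
ok_A = all(I(x, y) for x in BOX for y in BOX if x >= y)
ok_B = not any(I(x, y) for x in BOX for y in BOX if y >= x + 1)
```

Within the finite box the loop must terminate: the perceptron only returns candidates consistent with all collected samples, so every counterexample is a genuinely new point, and there are finitely many.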
Evaluation
• 1000 lines of C++
• LIBSVM for SVM queries
• Z3 theorem prover
Proving termination
• For every loop, guess a bound on the number of iterations
• Check the bound with a safety checker
Example: GCD

1: gcd(int x, int y)
2: {
3:   assume(x>0 && y>0);
4:   while (x != y) {
5:     if (x > y) x = x-y;
6:     if (y > x) y = y-x;
7:   }
8:   return x;
9: }
Example: Instrumented GCD

1: gcd(int x, int y)
2: {
3:   assume(x>0 && y>0);
4:   // instrumented code
5:   a = x; b = y; c = 0;
6:   while (x != y) {
7:     // instrumented code
8:     c = c+1;
9:     writeLog(a, b, c, x, y);
10:    if (x > y) x = x-y;
11:    if (y > x) y = y-x;
12:  }
13:  return x;
14: }

• Inputs: (x, y) ∈ {(1,2), (2,1), (1,3), (3,1)}
• Logged data, with A holding rows (1, a, b) and C the iteration counts c:

      1 1 2        1
      1 2 1        1
  A = 1 1 3 ,  C = 1
      1 1 3        2
      1 3 1        1
      1 3 1        2

• Find c ≈ w₁a + w₂b + w₃ (linear regression)
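Running the instrumented program on the four test inputs reproduces the logged table. A small Python transliteration (ours) of the slide's C code:

```python
def gcd_logged(x, y):
    """Instrumented gcd from the slide: record (a, b, c) at every iteration."""
    a, b, c = x, y, 0
    log = []
    while x != y:
        c = c + 1
        log.append((a, b, c))          # writeLog(a, b, c, x, y), abridged
        if x > y:
            x = x - y
        if y > x:
            y = y - x
    return log

rows = []
for x, y in [(1, 2), (2, 1), (1, 3), (3, 1)]:
    rows.extend(gcd_logged(x, y))

# Regression data: matrix A has rows (1, a, b), vector C the counts c
A = [(1, a, b) for a, b, c in rows]
C = [c for a, b, c in rows]
```

The six rows match the slide's A and C: inputs (1,3) and (3,1) each take two iterations, so they contribute two log entries apiece.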
Linear regression
• min over w of Σᵢ (w₁aᵢ + w₂bᵢ + w₃ − cᵢ)²
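A minimal least-squares fit of the logged gcd data via the normal equations (pure Python; the 3×3 solver and the name `lstsq3` are ours). On this data the unconstrained fit comes out to roughly c ≈ 0.5a + 0.5b − 0.5, which is not an upper bound on c — the motivation for the constrained version on the next slide.

```python
def lstsq3(data, targets):
    """Least-squares fit c ~ w1*a + w2*b + w3 via the normal equations,
    solved with a tiny Gaussian elimination (fine for 3 unknowns)."""
    X = [[a, b, 1.0] for a, b in data]
    n = len(X)
    M = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(3)]
         for i in range(3)]                                   # X^T X
    v = [sum(X[k][i] * targets[k] for k in range(n)) for i in range(3)]  # X^T c
    for i in range(3):                    # forward elimination with pivoting
        p = max(range(i, 3), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        v[i], v[p] = v[p], v[i]
        for r in range(i + 1, 3):
            f = M[r][i] / M[i][i]
            M[r] = [mr - f * mi for mr, mi in zip(M[r], M[i])]
            v[r] -= f * v[i]
    w = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):                   # back substitution
        w[i] = (v[i] - sum(M[i][j] * w[j] for j in range(i + 1, 3))) / M[i][i]
    return w

# Logged (a, b) pairs and iteration counts c from the four gcd tests
data = [(1, 2), (2, 1), (1, 3), (1, 3), (3, 1), (3, 1)]
counts = [1, 1, 1, 2, 1, 2]
w1, w2, w3 = lstsq3(data, counts)   # about (0.5, 0.5, -0.5) on this data
```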
Quadratic programming
• min over w of Σᵢ (w₁aᵢ + w₂bᵢ + w₃ − cᵢ)²  subject to  Aw ≥ C
• The guess is f(a, b) = a + b − 2
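The constraint Aw ≥ C forces the learned bound to sit above every observed count. A small check (ours) on the logged data: the slide's guess f(a, b) = a + b − 2 is an upper bound on every observed c, while an unconstrained least-squares fit of the same data (approximately 0.5a + 0.5b − 0.5, computed separately) is not.

```python
# Logged rows (a, b, c) from the four gcd test inputs
data = [(1, 2, 1), (2, 1, 1), (1, 3, 1), (1, 3, 2), (3, 1, 1), (3, 1, 2)]

def f(a, b):
    """The guess produced by the constrained fit on the slide."""
    return a + b - 2

# The QP constraint in action: the guess dominates every observed count
guess_is_upper_bound = all(f(a, b) >= c for a, b, c in data)

# The unconstrained least-squares fit underestimates c on some rows
lstsq_is_upper_bound = all(0.5 * a + 0.5 * b - 0.5 >= c for a, b, c in data)
```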
Example: Annotated GCD

1: gcd(int x, int y)
2: {
3:   assume(x>0 && y>0);
4:   a = x; b = y; c = 0;
5:   while (x != y) {
6:     // annotation
7:     free_invariant(c <= a+b-x-y);
8:     // annotation
9:     assert(c <= a+b-2);
10:    if (x > y) x = x-y;
11:    if (y > x) y = y-x;
12:  }
13:  return x;
14: }

• Check with a safety checker
• Free invariant to aid the checker: c ≤ a + b − x − y ∧ x > 0 ∧ y > 0
• Corrective measures
  • Sound rounding for polynomials with integer coefficients
  • Partitioning of tests for discovering disjunctive loop bounds
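The annotations can also be exercised dynamically before handing the program to the safety checker: run gcd on a grid of inputs and assert the free invariant and the guessed bound at every loop head. This is a runtime sketch of ours; the actual verification in the talk is static.

```python
def gcd_checked(x, y):
    """Run gcd, asserting the free invariant and the guessed bound at the
    loop head, mirroring what the safety checker must verify statically."""
    assert x > 0 and y > 0
    a, b, c = x, y, 0
    while x != y:
        assert c <= a + b - x - y and x > 0 and y > 0   # free invariant
        assert c <= a + b - 2                           # guessed loop bound
        c = c + 1
        if x > y:
            x = x - y
        if y > x:
            y = y - x
    return x

# Dynamic evidence over a grid of inputs (the actual proof is static)
all_checks_pass = True
for x in range(1, 40):
    for y in range(1, 40):
        try:
            gcd_checked(x, y)
        except AssertionError:
            all_checks_pass = False
```

The free invariant holds inductively because each iteration increases c by 1 while decreasing x + y by at least 1; combined with x > 0 ∧ y > 0 it implies the asserted bound c ≤ a + b − 2.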
Evaluation
Summary
• Classification-based algorithms can be used for computing proofs in program verification
• Follow-up work on using techniques from linear algebra and PAC learning for scalable proofs
• Proving program termination via linear regression
• Data Driven Program Analysis