Machine learning for instance selection in SMT solving (Work in Progress). Jasmin Christian Blanchette (1, 2), Daniel El Ouraoui (2), Pascal Fontaine (2), Cezary Kaliszyk (3). (1) Vrije Universiteit Amsterdam, Amsterdam, The Netherlands; (2) University of Lorraine, CNRS, Inria, and LORIA, Nancy, France; (3) University of Innsbruck, Innsbruck, Austria. 9th April 2019
Contents: 1 Introduction, 2 CDCL(T), 3 Instantiation techniques, 4 Machine learning for instance selection, 5 Evaluation, 6 Conclusion
1 Introduction
Motivations. SMT solvers (Z3, CVC4, veriT, ...) are used for automation in proof assistants, for verification conditions, and for model checking. Quantifier instantiation is hard for SMT solvers and is handled heuristically. Challenge: improve instantiation techniques so that solvers solve more problems and are more efficient.
Our tool: veriT, developed at Université de Lorraine/UFRN (http://www.verit-solver.org).
2 CDCL(T)
Context. $\underbrace{b \neq a \land f(a) = f(b)}_{\text{ground}} \land \underbrace{\forall x\, y.\; f(x) = f(y) \Rightarrow x = y}_{\text{instantiation}}$
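Ground problem. How can we efficiently check the satisfiability of a ground formula such as $(f(a,b) = g(a) \lor d = b) \land d = g(b) \land d \neq f(a,b) \land b = a \land d \neq g(a)$? Each atom is abstracted by a propositional variable ($l_1$: $f(a,b) = g(a)$, $l_2$: $d = b$, $l_3$: $d = g(b)$, $l_4$: $d = f(a,b)$, $l_5$: $b = a$, $l_6$: $d = g(a)$), which gives the Boolean skeleton $(l_1 \lor l_2) \land l_3 \land \neg l_4 \land l_5 \land \neg l_6$. The skeleton is propositionally satisfiable, but the formula itself is not: $b = a$ and $d = g(b)$ imply $d = g(a)$ by congruence, contradicting $d \neq g(a)$.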
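The following is a minimal Python sketch of the Boolean abstraction step applied to this example. The nested-tuple encoding of terms and formulas and the names `abstract` and `show` are hypothetical illustrations, not veriT code. Equalities are treated syntactically, so `d = b` and `b = d` would receive different variables; a real solver normalizes atoms first.

```python
def abstract(phi, atoms):
    """Replace every distinct atom ("=", s, t) by a propositional variable
    l1, l2, ... numbered in order of first occurrence; atoms maps atom -> variable."""
    if phi[0] in ("and", "or"):
        return (phi[0],) + tuple(abstract(p, atoms) for p in phi[1:])
    if phi[0] == "not":
        return ("not", abstract(phi[1], atoms))
    if phi not in atoms:  # phi is an atom ("=", s, t), compared syntactically
        atoms[phi] = f"l{len(atoms) + 1}"
    return atoms[phi]


def show(phi):
    """Pretty-print a propositional formula built from and/or/not and variable names."""
    if isinstance(phi, str):
        return phi
    if phi[0] == "not":
        return "¬" + show(phi[1])
    sep = " ∧ " if phi[0] == "and" else " ∨ "
    return sep.join(show(p) if isinstance(p, str) or p[0] == "not" else f"({show(p)})"
                    for p in phi[1:])


# Terms: constants are strings, applications are tuples (symbol, arg1, ...).
fab = ("f", "a", "b")
formula = ("and",
           ("or", ("=", fab, ("g", "a")), ("=", "d", "b")),
           ("=", "d", ("g", "b")),
           ("not", ("=", "d", fab)),
           ("=", "b", "a"),
           ("not", ("=", "d", ("g", "a"))))
atoms = {}
skeleton = abstract(formula, atoms)
print(show(skeleton))  # (l1 ∨ l2) ∧ l3 ∧ ¬l4 ∧ l5 ∧ ¬l6
```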
CDCL(T) ground solver. Formulas are abstracted into propositional logic and passed to the SAT solver. The SAT solver produces a Boolean model. Theory solvers check this model and produce conflict clauses when it is inconsistent, and the conflict clauses guide the SAT solver.
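The loop just described can be sketched as a lazy SMT procedure on the running ground example. This is a deliberately naive illustration with several stated assumptions. Brute-force enumeration stands in for the CDCL SAT solver. The only theory is equality with uninterpreted functions, decided by a quadratic congruence closure. Each conflict clause simply blocks the whole Boolean model, whereas real theory solvers return small explanations; with the atom numbering used in the code, such an explanation would be $\neg l_3 \lor \neg l_5 \lor l_6$. All function names are hypothetical.

```python
import itertools


def subterms(t, acc):
    """Add t and all its subterms to acc (constants are str, applications tuples)."""
    acc.add(t)
    if isinstance(t, tuple):
        for arg in t[1:]:
            subterms(arg, acc)
    return acc


def euf_consistent(eqs, diseqs):
    """Theory solver: naive congruence closure over ground (dis)equalities."""
    terms = set()
    for s, t in eqs + diseqs:
        subterms(s, terms)
        subterms(t, terms)
    parent = {t: t for t in terms}

    def find(t):
        while parent[t] != t:
            t = parent[t]
        return t

    for s, t in eqs:
        parent[find(s)] = find(t)
    apps = [t for t in terms if isinstance(t, tuple)]
    changed = True
    while changed:  # merge congruent applications until a fixpoint is reached
        changed = False
        for u, v in itertools.combinations(apps, 2):
            if (u[0] == v[0] and len(u) == len(v) and find(u) != find(v)
                    and all(find(x) == find(y) for x, y in zip(u[1:], v[1:]))):
                parent[find(u)] = find(v)
                changed = True
    return all(find(s) != find(t) for s, t in diseqs)


def lazy_smt(atoms, clauses):
    """atoms[i] is the equality abstracted by variable i+1; clauses use +/-(i+1)."""
    clauses = [list(c) for c in clauses]  # do not mutate the caller's clauses
    n = len(atoms)
    while True:
        # "SAT solver": brute-force enumeration stands in for CDCL here
        model = next((m for m in itertools.product([False, True], repeat=n)
                      if all(any(m[abs(lit) - 1] == (lit > 0) for lit in c)
                             for c in clauses)), None)
        if model is None:
            return "UNSAT"
        eqs = [atoms[i] for i in range(n) if model[i]]
        diseqs = [atoms[i] for i in range(n) if not model[i]]
        if euf_consistent(eqs, diseqs):
            return "SAT"
        # conflict clause: block this Boolean model entirely
        clauses.append([-(i + 1) if model[i] else i + 1 for i in range(n)])


fab, ga, gb = ("f", "a", "b"), ("g", "a"), ("g", "b")
atoms = [(fab, ga), ("d", "b"), ("d", gb), ("d", fab), ("b", "a"), ("d", ga)]
print(lazy_smt(atoms, [[1, 2], [3], [-4], [5], [-6]]))  # UNSAT
```

Dropping the unit clause $\neg l_6$ (that is, the disequality $d \neq g(a)$) makes the same call return `SAT`.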
First-order problem. $b \neq a \land f(a) = f(b) \land \forall x\, y.\; f(x) = f(y) \Rightarrow x = y$, where the quantified formula is handled by instantiation.
First-order problem. How can we find an instance that makes the problem UNSAT? The ground part $b \neq a \land f(a) = f(b)$ alone is SAT. Instantiating the quantified formula with $x \mapsto a$, $y \mapsto b$ yields the instance $f(a) \neq f(b) \lor a = b$. Since $a = b$ contradicts $b \neq a$, the problem reduces to $b \neq a \land f(a) = f(b) \land f(a) \neq f(b)$, which is UNSAT.
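First-order CDCL(T) SMT solver. The ground solver produces a first-order (FO) model. The instantiation module uses this model to generate instances, which are passed back to the ground solver.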
3 Instantiation techniques
State of the art. Conflict-based instantiation: introduced by Reynolds, this technique produces relevant sets of instances. Given a ground model $M$ and a quantified formula $\forall (x_1 : \tau_1, \ldots, x_n : \tau_n).\, \varphi$, it looks for a substitution $\sigma$ such that $M \models \neg \varphi\sigma$. Congruence closure with free variables (CCFV): introduced by Barbosa et al., this technique generalizes conflict-based instantiation by reasoning over equivalence classes.
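To make the condition $M \models \neg \varphi\sigma$ concrete, here is a brute-force sketch on the running example. It enumerates substitutions over the terms known to the model and keeps those whose instance is false in $M$. This is neither Reynolds' algorithm nor CCFV, both of which avoid blind enumeration. The model representation is an assumption made for illustration: a hypothetical map from ground terms to equivalence-class ids, plus the pairs of classes that are asserted disequal. The function names are also hypothetical.

```python
import itertools


def subst(t, sigma):
    """Apply sigma (variable name -> ground term) to term t."""
    if isinstance(t, str):
        return sigma.get(t, t)
    return (t[0],) + tuple(subst(a, sigma) for a in t[1:])


def is_false(lit, cls, diseq):
    """Does the ground model M entail the negation of the ground literal lit?"""
    positive, s, t = lit  # positive=True encodes s = t, False encodes s != t
    if s not in cls or t not in cls:
        return False  # a term unknown to M: nothing can be concluded
    if positive:
        return frozenset((cls[s], cls[t])) in diseq  # classes asserted disequal
    return cls[s] == cls[t]  # s != t is false when s and t share a class


def conflicting_substitutions(variables, clause, cls, diseq):
    """Yield every sigma over known terms such that M |= not(clause sigma)."""
    for values in itertools.product(list(cls), repeat=len(variables)):
        sigma = dict(zip(variables, values))
        instance = [(pos, subst(s, sigma), subst(t, sigma)) for pos, s, t in clause]
        if all(is_false(lit, cls, diseq) for lit in instance):
            yield sigma


# Ground model of b != a ∧ f(a) = f(b): class id per term, plus disequal class pairs.
cls = {"a": 0, "b": 1, ("f", "a"): 2, ("f", "b"): 2}
diseq = {frozenset((0, 1))}
# ∀x y. f(x) = f(y) ⇒ x = y, written as the clause f(x) != f(y) ∨ x = y
clause = [(False, ("f", "x"), ("f", "y")), (True, "x", "y")]
print(list(conflicting_substitutions(["x", "y"], clause, cls, diseq)))
# [{'x': 'a', 'y': 'b'}, {'x': 'b', 'y': 'a'}]
```

Each substitution found, for example $x \mapsto a, y \mapsto b$, yields an instance ($f(a) \neq f(b) \lor a = b$) that on its own refutes the current ground model.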
State of the art. Enumerative instantiation: $\forall (x : \tau).\, \psi[x] \equiv \bigwedge_{t \in \mathcal{D}_\tau} \psi[t]$, i.e. enumerate all ground terms over the domain of $x$ (a.k.a. the Herbrand universe). Trigger-based instantiation: a trigger $T$ for a quantified formula $\forall \overline{x}_n.\, \psi$ is a set of non-ground terms $u_1, \ldots, u_k \in \mathbf{T}(\psi)$ such that $\{\overline{x}_n\} \subseteq FV(u_1) \cup \ldots \cup FV(u_k)$. Example: with $E = \{f(a) \simeq g(b),\ a \simeq g(b)\}$, $Q = \forall x.\, f(g(x)) \not\simeq g(x)$, and $T = f(g(x))$, the term $f(a)$ E-matches $f(g(x))$ under $x \mapsto b$, because $a \simeq g(b)$ holds in $E$.
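A naive E-matching sketch for the trigger example follows. It builds the equivalence classes of $E$ with a simple congruence closure and then matches the trigger against every ground term modulo those classes. The encoding and the names `EGraph`, `ematch`, and `match_args` are hypothetical. Production solvers use indexed E-graphs and code trees rather than this exhaustive search.

```python
import itertools


def subterms(t, acc):
    """Add t and its subterms to acc (constants are str, applications are tuples)."""
    acc.add(t)
    if isinstance(t, tuple):
        for arg in t[1:]:
            subterms(arg, acc)
    return acc


class EGraph:
    """Naive congruence closure over the ground terms of a set of equalities E."""

    def __init__(self, equalities):
        self.terms = set()
        for s, t in equalities:
            subterms(s, self.terms)
            subterms(t, self.terms)
        self.parent = {t: t for t in self.terms}
        for s, t in equalities:
            self.parent[self.find(s)] = self.find(t)
        apps = [t for t in self.terms if isinstance(t, tuple)]
        changed = True
        while changed:  # merge congruent applications until a fixpoint is reached
            changed = False
            for u, v in itertools.combinations(apps, 2):
                if (u[0] == v[0] and len(u) == len(v) and self.find(u) != self.find(v)
                        and all(self.find(x) == self.find(y) for x, y in zip(u[1:], v[1:]))):
                    self.parent[self.find(u)] = self.find(v)
                    changed = True

    def find(self, t):
        while self.parent[t] != t:
            t = self.parent[t]
        return t

    def eq_class(self, t):
        return [u for u in self.terms if self.find(u) == self.find(t)]


def ematch(eg, pattern, term, sigma, variables):
    """Yield extensions of sigma under which pattern*sigma equals term modulo E."""
    if isinstance(pattern, str) and pattern in variables:
        if pattern not in sigma:
            yield {**sigma, pattern: term}
        elif eg.find(sigma[pattern]) == eg.find(term):
            yield sigma
    elif isinstance(pattern, str):  # ground constant inside the pattern
        if pattern in eg.terms and eg.find(pattern) == eg.find(term):
            yield sigma
    else:  # application: try every term of the class with the same head symbol
        for cand in eg.eq_class(term):
            if isinstance(cand, tuple) and cand[0] == pattern[0] and len(cand) == len(pattern):
                yield from match_args(eg, pattern[1:], cand[1:], sigma, variables)


def match_args(eg, patterns, args, sigma, variables):
    """Match argument patterns left to right, threading the substitution."""
    if not patterns:
        yield sigma
        return
    for s in ematch(eg, patterns[0], args[0], sigma, variables):
        yield from match_args(eg, patterns[1:], args[1:], s, variables)


E = [(("f", "a"), ("g", "b")), ("a", ("g", "b"))]
eg = EGraph(E)
trigger = ("f", ("g", "x"))
matches = {tuple(sorted(s.items())) for t in eg.terms for s in ematch(eg, trigger, t, {}, {"x"})}
print(matches)  # {(('x', 'b'),)}
```

The only match is $x \mapsto b$. It gives the instance $f(g(b)) \not\simeq g(b)$, which $E$ refutes: $g(b) \simeq a$ implies $f(g(b)) \simeq f(a) \simeq g(b)$.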
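Strategy. CCFV is tried first. If it works, its instances are passed to the ground solver; if it fails, trigger-based and enumerative instantiation are used instead. Figure: instantiation strategy.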
Summary. Conflict-based instantiation and CCFV: Pro: efficient, since a substitution, once found, refutes the current model. Pro: all generated instances are useful. Con: only finds contradictions involving a single instance. Enumerative and trigger-based instantiation: Pro: useful when CCFV fails. Con: many heuristics. Con: generates many instances, a lot of them junk. This last point is what we want to improve!
4 Machine learning for instance selection
Problem. How many lemmas are generated to solve a problem? Around 300 for the UF category of SMT-LIB, and some problems generate more than 100 000 instances. How many lemmas are needed to solve a problem? Only about 10% of that number, and sometimes much less. Question: could we select the good ones?
Our approach. In the ML solver, an instance-selection step sits between the instantiation module and the ground solver. Processing: instances are kept in a priority queue, each instance is encoded, and the predictor is called to rank it. A filter then chooses which instances to keep (several selection strategies are possible). Selected instances go to the ground solver, and the others are delayed.
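The sketch below shows one possible selection strategy under stated assumptions. The predictor is any function that returns a usefulness score in [0, 1]. Instances scoring at least a threshold are selected, up to a per-round budget, and all others are delayed. The slide only says that several strategies exist, so the threshold/budget policy and the name `select_instances` are hypothetical.

```python
import heapq


def select_instances(instances, predictor, threshold=0.5, budget=None):
    """Rank instances by predicted usefulness; return (selected, delayed) lists."""
    heap = []
    for i, inst in enumerate(instances):
        # negate the score to get a max-heap; the index i breaks ties deterministically
        heapq.heappush(heap, (-predictor(inst), i, inst))
    selected, delayed = [], []
    while heap:
        neg_score, _, inst = heapq.heappop(heap)
        if -neg_score >= threshold and (budget is None or len(selected) < budget):
            selected.append(inst)
        else:
            delayed.append(inst)  # kept for a later round rather than discarded
    return selected, delayed


# Toy predictor: a fixed table of scores for four hypothetical instances.
scores = {"i1": 0.9, "i2": 0.2, "i3": 0.7, "i4": 0.6}
print(select_instances(list(scores), scores.get, threshold=0.5, budget=2))
# (['i1', 'i3'], ['i4', 'i2'])
```

Both delayed and selected lists come out in decreasing score order, so a later round can resume from the most promising delayed instances.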
State description. Each candidate instance is described by a triple: the current ground model (literals $l_1, \ldots, l_n$), the quantified formula $\forall \overline{x}_n.\, \psi[\overline{x}_n]$, and the instance, i.e. the substitution $x_1 \mapsto t_1, \ldots, x_n \mapsto t_n$. The data set has one row per instance, grouped by instantiation round. Each row records the quantified formula, the instance, a 0/1 label, the encoded substitution, and the encoded model.
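The slide does not say how the triple is turned into features for the learner. As a hypothetical illustration only, the sketch below uses feature hashing to map the model literals, the quantified formula, and the substitution into a fixed-length count vector. Feature hashing is a swapped-in technique, not necessarily the authors' choice. Terms and formulas are given as plain strings to keep the example short, and `encode` and `bucket` are hypothetical names.

```python
import zlib


def bucket(token: str, dim: int) -> int:
    # crc32 is deterministic across runs, unlike Python's salted hash() on str
    return zlib.crc32(token.encode("utf-8")) % dim


def encode(model_literals, qformula, substitution, dim=64):
    """Hypothetical feature-hashing encoding of one (model, formula, instance) triple."""
    vec = [0] * dim
    for lit in model_literals:
        vec[bucket("model:" + lit, dim)] += 1
    vec[bucket("formula:" + qformula, dim)] += 1
    for var, term in substitution.items():
        vec[bucket("subst:" + var + "->" + term, dim)] += 1
    return vec


v = encode(["l1", "l3"], "forall x y. f(x) = f(y) => x = y", {"x": "a", "y": "b"})
print(len(v), sum(v))  # 64 5
```

Hash collisions only merge counts, so the vector always sums to the number of encoded items. A real encoding would likely need richer structural features of terms.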
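Experiments. Pipeline: veriT produces small proofs, which are pre-processed. The data set is balanced (oversampling, undersampling), and features are extracted. An XGBoost classifier is then trained, feature importance is computed, and the resulting XGBoost classification model, which produces the predictions, is exported as C code.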
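The balancing step can be illustrated with random over- and under-sampling of a binary-labelled data set. The slide does not specify the sampling procedure, so uniform random resampling with a fixed seed is an assumption, and `rebalance` is a hypothetical name. The XGBoost training and the export to C code are not shown. Both classes must be non-empty.

```python
import random


def rebalance(rows, labels, mode="over", seed=0):
    """Randomly over-sample the minority class or under-sample the majority class
    so that labels 0 and 1 end up equally frequent."""
    rng = random.Random(seed)
    by_label = {0: [], 1: []}
    for row, y in zip(rows, labels):
        by_label[y].append(row)
    minority = min(by_label, key=lambda y: len(by_label[y]))
    majority = 1 - minority
    small, large = by_label[minority], by_label[majority]
    if mode == "over":  # duplicate random minority rows
        small = small + [rng.choice(small) for _ in range(len(large) - len(small))]
    elif mode == "under":  # keep a random subset of majority rows
        large = rng.sample(large, len(small))
    else:
        raise ValueError(f"unknown mode: {mode}")
    return small + large, [minority] * len(small) + [majority] * len(large)


rows, labels = list(range(10)), [1, 1] + [0] * 8
r, y = rebalance(rows, labels, "over")
print(len(r), y.count(1), y.count(0))  # 16 8 8
```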
5 Evaluation
Time evaluation. Experiments were run on the UF SMT-LIB benchmarks with a 120 s timeout. veriT without learning solves 2923 problems, and veriT with learning solves 2939.
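For scale, simple arithmetic on the two totals reported above gives the gain:

$$
2939 - 2923 = 16 \text{ additional problems}, \qquad \frac{16}{2923} \approx 0.55\,\%
$$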
Evaluation on test and training sets. Figure: comparison of veriT configurations on the UF SMT-LIB benchmarks.
Evaluation on test set only. Figure: comparison of veriT configurations on the UF SMT-LIB benchmarks.