Testing a Saturation-Based Theorem Prover: Experiences and Challenges

  1. Testing a Saturation-Based Theorem Prover: Experiences and Challenges. Giles Reger (1), Martin Suda (2), and Andrei Voronkov (1, 2). (1) School of Computer Science, University of Manchester, UK; (2) TU Wien, Vienna, Austria. TAP 2017, Marburg, July 19, 2017.

  7. Introduction. First-order automatic theorem proving: a well-established discipline of automated deduction. Main approach: refutational, saturation-based proving. Example systems: E, SPASS, Vampire. Often used in larger projects and systems as black boxes, e.g., program verification, static analysis, interpolation. ➥ Importance of ensuring correctness. How are we doing? CASC competition: preliminary period for testing soundness. SMT-COMP 2016: 79 answers classified as incorrect.

  10. Our Prover. Vampire: an automatic theorem prover for first-order logic and theories. Regular winner of the main divisions of the CASC competition; since 2016, also a successful participant in SMT-COMP. Quite a complex piece of software (≈ 194,000 lines of C++). ➥ Easy to introduce incorrectness when adding a new feature.

  11. Outline. (1) What Does Correctness Mean for Us; (2) Detecting and Investigating Bugs; (3) Challenges; (4) Conclusion.

  16. Theorem proving basics. Standard form of the input: $F := (\mathit{Axiom}_1 \land \dots \land \mathit{Axiom}_n) \rightarrow \mathit{Conjecture}$. (1) Negate $F$ (to seek a refutation): $\neg F := \mathit{Axiom}_1 \land \dots \land \mathit{Axiom}_n \land \neg \mathit{Conjecture}$. (2) Preprocess and transform $\neg F$ to a normal form $S := \{C_1, \dots, C_n\}$. (3) Saturate $S$ with respect to an inference system $I$. Example inference rule: $\dfrac{C_1 \lor P \qquad C_2 \lor \neg P}{C_1 \lor C_2}$.
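
The example inference rule on this slide can be illustrated with a minimal propositional sketch. The encoding is an assumption made for illustration only (literals as nonzero integers, clauses as frozensets) and the function name `resolvents` is hypothetical; it is not how Vampire represents first-order clauses.

```python
from typing import FrozenSet, Set

# Assumed encoding: a literal is a nonzero int, +n for atom n and -n for its negation.
Clause = FrozenSet[int]


def resolvents(c1: Clause, c2: Clause) -> Set[Clause]:
    """Return every clause obtained by one binary resolution step on c1 and c2."""
    out: Set[Clause] = set()
    for lit in c1:
        if -lit in c2:
            # From (C1 v P) and (C2 v ~P) derive (C1 v C2).
            out.add(frozenset((c1 - {lit}) | (c2 - {-lit})))
    return out


if __name__ == "__main__":
    # With P = 1, Q = 2, R = 3: resolve (Q v P) against (R v ~P).
    print(resolvents(frozenset({2, 1}), frozenset({3, -1})))
```

Resolving a unit clause against its complement yields the empty frozenset, which plays the role of false in this encoding.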

  22. The Saturation Process. Saturation = fixed-point (closure) computation. Does the final set S contain false? Basic properties: explosive in nature; may not terminate; various tricks to mitigate the explosion.
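
As a rough illustration of saturation as a fixed-point computation, here is a naive propositional sketch. It assumes clauses are frozensets of nonzero integers and uses a hypothetical `max_clauses` budget to stand in for the resource limits a real prover needs; practical provers use far more refined loops and redundancy elimination.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Optional, Set

Clause = FrozenSet[int]      # assumed encoding: +n is an atom, -n its negation
EMPTY: Clause = frozenset()  # the empty clause plays the role of "false"


def resolvents(c1: Clause, c2: Clause) -> Set[Clause]:
    """All binary resolvents of c1 and c2."""
    return {frozenset((c1 - {l}) | (c2 - {-l})) for l in c1 if -l in c2}


def is_tautology(c: Clause) -> bool:
    """A clause containing both P and ~P is redundant and can be dropped."""
    return any(-l in c for l in c)


def saturate(clauses: Iterable[Clause], max_clauses: int = 10_000) -> Optional[Set[Clause]]:
    """Close the set under resolution; return None if the budget is exceeded."""
    s = set(clauses)
    while True:
        if EMPTY in s:
            return s  # false derived: a refutation exists
        new: Set[Clause] = set()
        for c1, c2 in combinations(s, 2):
            new |= {r for r in resolvents(c1, c2) if not is_tautology(r)}
        if new <= s:
            return s  # fixed point reached: nothing new can be derived
        s |= new
        if len(s) > max_clauses:
            return None  # gave up: the clause set grew past the budget
```

In the propositional case the loop always terminates; the budget only mimics the first-order situation, where the closure may be infinite.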

  28. Possible Answers. Theorem (together with a proof): if the input F is logically valid. Non-theorem: if F is invalid (there is a counter-example); relies on a completeness argument. Unknown: (1) time limit / memory limit; (2) incomplete strategy failed.
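
The three answers can be read as a small decision procedure over the outcome of a run. The sketch below is one hedged way to write that down; the flags `derived_false`, `saturated`, and `strategy_complete` are hypothetical names introduced here, not options or internals of any prover.

```python
from enum import Enum


class Answer(Enum):
    THEOREM = "Theorem"
    NON_THEOREM = "Non-theorem"
    UNKNOWN = "Unknown"


def classify(derived_false: bool, saturated: bool, strategy_complete: bool) -> Answer:
    """Map the outcome of one proof attempt to the answer it justifies."""
    if derived_false:
        # A refutation of the negated input was found, so F is valid.
        return Answer.THEOREM
    if saturated and strategy_complete:
        # Saturation without false only shows invalidity under a complete strategy.
        return Answer.NON_THEOREM
    # Time or memory limit hit, or an incomplete strategy saturated.
    return Answer.UNKNOWN
```

The second branch is where the completeness argument matters: dropping the `strategy_complete` check would turn a failed incomplete strategy into a wrong Non-theorem answer.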

  34. Different Ways of Being Incorrect. Unsoundness: reports Theorem for an invalid F (derives false for a satisfiable S). Check the proof and see what went wrong. Completeness issue: reports Non-theorem for a valid F (finitely saturates an unsatisfiable S without deriving false). Should have said Unknown here! Fairness issue: the prover runs indefinitely while a proof exists (violation of fairness criteria in saturation); never (strictly) violated after finitely many steps.
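
A test harness can tell the first two kinds of incorrectness apart when the status of a benchmark problem is already known. The following is a minimal sketch under that assumption; the function `diagnose` and the plain-string answers are hypothetical and only mirror the terminology of the slide.

```python
from typing import Optional


def diagnose(expected_valid: bool, reported: str) -> Optional[str]:
    """Return the kind of incorrectness shown by one answer, or None if acceptable."""
    if reported == "Theorem" and not expected_valid:
        return "unsoundness"  # derived false for a satisfiable clause set
    if reported == "Non-theorem" and expected_valid:
        return "completeness issue"  # should have answered Unknown
    # Correct answers and Unknown are both acceptable outcomes.
    return None


if __name__ == "__main__":
    print(diagnose(expected_valid=False, reported="Theorem"))
    print(diagnose(expected_valid=True, reported="Non-theorem"))
    print(diagnose(expected_valid=True, reported="Unknown"))
```

Fairness issues do not appear in this check: a run that is cut off after finitely many steps simply reports Unknown, so no single finite run can expose them.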

  38. Violating the Contract of Proper Behaviour. General error conditions shared by any other program. Program crash, e.g., unhandled exceptions, signal interrupts (SIGFPE, SIGSEGV). Assertion violation: defensive development via assertions; around 2500 assertions in total (one per 77 lines on average); potential errors detected early on.

  39. Outline. (1) What Does Correctness Mean for Us; (2) Detecting and Investigating Bugs; (3) Challenges; (4) Conclusion.
