Curriculum Learning and Theorem Proving
  1. Curriculum Learning and Theorem Proving
  Zsolt Zombori 1, Adrián Csiszárik 1, Henryk Michalewski 2, Cezary Kaliszyk 3, Josef Urban 4
  1 Alfréd Rényi Institute of Mathematics, Hungarian Academy of Sciences
  2 University of Warsaw, deepsense.ai
  3 University of Innsbruck
  4 Czech Technical University in Prague

  2. Motivation
  1. ATPs tend to find only short proofs, even after learning
  2. AITP systems are typically trained and evaluated on large proof sets, so it is hard to see what the system has learned
  • Can we build a system that learns to find longer proofs?
  • What can be learned from just a few proofs, maybe even a single one?

  3. Aim
  • Build an internal guidance system for theorem proving
  • Use reinforcement learning
  • Train on a single problem
  • Try to generalize to long proofs with very similar structure

  4. Domain: Robinson Arithmetic
  %theorem: mul(1,1) = 1
  fof(zeroSucc, axiom, ! [X]: (o != s(X))).
  fof(diffSucc, axiom, ! [X,Y]: (s(X) != s(Y) | X = Y)).
  fof(addZero, axiom, ! [X]: (plus(X,o) = X)).
  fof(addSucc, axiom, ! [X,Y]: (plus(X,s(Y)) = s(plus(X,Y)))).
  fof(mulZero, axiom, ! [X]: (mul(X,o) = o)).
  fof(mulSucc, axiom, ! [X,Y]: (mul(X,s(Y)) = plus(mul(X,Y),X))).
  fof(myformula, conjecture, mul(s(o),s(o)) = s(o)).
  • Proofs are non-trivial, but have a strong structure
  • See how little supervision is required to learn some proof types
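To make the target concrete, here is the equational reasoning behind the conjecture, written directly in terms of the axioms above. This is only a sketch of the underlying rewrites, not the actual leanCoP connection proof:

  mul(s(o), s(o)) = plus(mul(s(o), o), s(o))   by mulSucc
                  = plus(o, s(o))              by mulZero
                  = s(plus(o, o))              by addSucc
                  = s(o)                       by addZero (plus(o,o) = o)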

  5. Challenge for Reinforcement Learning
  • Theorem proving provides sparse, binary rewards
  • Long proofs provide extremely little reward

  6. Idea
  • Use curriculum learning
  • Start learning from the end of the proof
  • Gradually move the starting step towards the beginning of the proof
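A minimal sketch of this backward curriculum, assuming a known reference proof is available. The names env.reset_to, run_episode and update_policy, as well as the thresholds, are hypothetical placeholders, not the authors' actual interface:

import_nothing = None  # standard library only

def backward_curriculum(env, proof, run_episode, update_policy,
                        episodes_per_stage=100, success_threshold=0.8):
    # Stage k: replay the first len(proof) - k reference steps, so the agent
    # only has to discover the final k steps of the proof on its own.
    for k in range(1, len(proof) + 1):
        while True:
            successes = 0
            for _ in range(episodes_per_stage):
                state = env.reset_to(proof[:len(proof) - k])
                trajectory, solved = run_episode(env, state)
                update_policy(trajectory)
                successes += int(solved)
            if successes / episodes_per_stage >= success_threshold:
                break  # stage mastered; move the starting point earlier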

  7. Reinforcement Learning Approach
  • Proximal Policy Optimization (PPO)
  • Actor-Critic framework
  • Actor learns a policy (what steps to take)
  • Critic learns a value (how promising a proof state is)
  • Actor is confined to change slowly to increase stability
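The "change slowly" constraint is exactly what PPO's clipped surrogate objective enforces. A generic PyTorch sketch of that standard loss (not the Stable Baselines implementation used by the authors):

import torch

def ppo_clip_loss(new_logp, old_logp, advantage, eps=0.2):
    # Ratio between new and old policy probabilities of the taken action.
    ratio = torch.exp(new_logp - old_logp)
    # Clipping the ratio to [1 - eps, 1 + eps] removes the incentive to move
    # the policy far from the one that generated the data; this is the
    # mechanism that keeps the actor changing slowly.
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return -torch.min(unclipped, clipped).mean()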

  8. PPO challenges
  • Action space is not fixed (different at each step)
  • Action space cannot be directly parameterized
  • Guidance cannot "output" the correct action
  • Guidance takes the state-action pair as input and returns a score
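One common way to realize such a scorer, sketched here under the assumption that both proof states and candidate actions are given as fixed-length feature vectors. The class name, layer sizes and feature dimensions are illustrative, not the authors' architecture:

import torch
import torch.nn as nn

class GuidanceScorer(nn.Module):
    # Scores each (state, action) pair; a softmax over the scores of the
    # currently available actions yields the policy, so the network never
    # has to emit a fixed number of action logits.
    def __init__(self, feature_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * feature_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state_feats, action_feats):
        # state_feats: (1, feature_dim); action_feats: (num_actions, feature_dim)
        pairs = torch.cat([state_feats.expand_as(action_feats), action_feats], dim=-1)
        scores = self.net(pairs).squeeze(-1)
        return torch.softmax(scores, dim=-1)  # distribution over available actions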

  9. Technical Details
  • ATP: leanCoP (OCaml/Prolog)
  • Connection tableau based
  • Available actions are determined by the axiom set (does not grow)
  • Returns (hand-designed) Enigma features
  • Machine learning in Python
  • Learner is a 3-4 layer deep neural network
  • PPO1 implementation from Stable Baselines
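Since the axiom set fixes the action space, the prover can in principle be exposed to Stable Baselines as an ordinary Gym environment. The wrapper below is a hypothetical sketch: the prover object, its restart/apply methods, and the feature/action dimensions are placeholders and may not match the authors' actual OCaml/Prolog-to-Python bridge; only the PPO1 call at the bottom reflects the real Stable Baselines API.

import gym
import numpy as np
from gym import spaces
from stable_baselines import PPO1

class LeanCoPEnv(gym.Env):
    """Hypothetical wrapper around the OCaml/Prolog leanCoP backend."""

    def __init__(self, prover, feature_dim, num_actions):
        super().__init__()
        self.prover = prover
        # Observations: hand-designed Enigma features of the current proof state.
        self.observation_space = spaces.Box(-np.inf, np.inf, (feature_dim,), np.float32)
        # Actions: indices into the fixed set of possible inference steps.
        self.action_space = spaces.Discrete(num_actions)

    def reset(self):
        return self.prover.restart()

    def step(self, action):
        features, proved, done = self.prover.apply(action)
        reward = 1.0 if proved else 0.0  # sparse, binary reward
        return features, reward, done, {}

# model = PPO1("MlpPolicy", LeanCoPEnv(prover, FEATURE_DIM, NUM_ACTIONS))
# model.learn(total_timesteps=100_000)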

  10. Evaluation: STAGE 1
  • Problems of the form N1 + N2 = N3 and N1 × N2 = N3
  • Enough to find a good ordering of the actions
  • Can be fully mastered from the proof of 1 × 1 = 1
  • Useful:
    • Some reward for following the proof
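The "reward for following the proof" bullet suggests a shaped reward on top of the sparse terminal signal. A minimal sketch with an illustrative bonus value and hypothetical argument names:

def shaped_reward(chosen_step, reference_step, proof_closed, follow_bonus=0.1):
    # Sparse terminal reward for actually closing the proof ...
    reward = 1.0 if proof_closed else 0.0
    # ... plus a small bonus whenever the chosen inference step matches the
    # corresponding step of the known reference proof.
    if reference_step is not None and chosen_step == reference_step:
        reward += follow_bonus
    return reward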

  11. Evaluation: STAGE 2
  • Problems of the form RandomExpr = N
  • Features from the current goal become important
  • A couple of "rare" actions
  • Can be mastered from the proof of 1 × 1 × 1 = 1
  • Useful:
    • Features from the current goal
    • Oversample positive trajectories
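Oversampling positive trajectories can be as simple as rebalancing the training batch. A sketch, assuming trajectories have already been split into transitions from successful and failed episodes; the batch size and 50/50 split are illustrative choices:

import random

def build_training_batch(positive_steps, negative_steps,
                         batch_size=256, positive_fraction=0.5):
    # Transitions from proof-closing trajectories are sampled with
    # replacement far more often than their natural frequency, so the rare
    # positive signal is not drowned out by failed episodes.
    n_pos = int(batch_size * positive_fraction)
    batch = random.choices(positive_steps, k=n_pos)
    batch += random.choices(negative_steps, k=batch_size - n_pos)
    random.shuffle(batch)
    return batch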

  12. Evaluation: STAGE 3
  • Problems of the form RandomExpr1 = RandomExpr2
  • More features required
  • "Rare" events tied to global proof progress
  • Trained on 4-5 proofs, we can solve 90% of the problems
  • Useful:
    • Features from the path
    • Features from other open goals
    • Features from the previous action
    • Random perturbation of the curriculum stage
    • Train on several proofs in parallel
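A sketch of how the last two Stage-3 ingredients might combine, under the same hypothetical env.reset_to interface as before: each episode picks one of several reference proofs and starts from a randomly jittered curriculum stage. The jitter range and clamping are illustrative guesses at what "random perturbation of the curriculum stage" means:

import random

def sample_start_state(env, proofs, stage, jitter=2):
    # Train on several proofs in parallel: pick one reference proof per episode.
    proof = random.choice(proofs)
    # Perturb the curriculum stage so episodes start a few steps earlier or
    # later than the nominal stage.
    k = stage + random.randint(-jitter, jitter)
    k = max(1, min(len(proof), k))
    return env.reset_to(proof[:len(proof) - k])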

  13. Future work
  • Extend Robinson arithmetic with other operators
  • Learn on multiple proofs to master multiple strategies in parallel
  • Try some other RL approaches
  • Move beyond Robinson arithmetic
