
Learning theorem proving through self-play (Stanisław Purgał)



  1. Learning theorem proving through self-play Stanisław Purgał

  2. The goal
     Learn to prove theorems without:
     • any proofs
     • any theorems

     What we get:
     • a list of axioms defining the logic

  3. Overview
     • AlphaZero (briefly)
     • Proving game
     • adjusting MCTS for proving game
     • some results

  4. Neural black box: game state $S$ → move policy $\pi \in \mathbb{R}^n$, expected outcome $v \in \mathbb{R}$

  5. Neural black box: training samples $(S_1, \pi_1, v_1), \ldots, (S_n, \pi_n, v_n)$

  6. Monte-Carlo Tree Search: game state $S$ → move policy $\pi \in \mathbb{R}^n$, expected outcome $v \in \mathbb{R}$

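  7. Monte-Carlo Tree Search: from state $S$ with children $S_1, S_2, S_3$, choose a child according to the formula:

     $$c \cdot \frac{\sqrt{n}}{n_i} \cdot \pi_i + v_i, \qquad c = \log\left(\frac{n + c_{base} + 1}{c_{base}}\right) + c_{init}$$

     with $c_{base} = 19652$ and $c_{init} = 1.25$. Diagram labels: $v$, $\pi$, weighted average.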
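
The selection rule on this slide can be sketched in Python roughly as follows. This is a minimal illustration, not the talk's implementation: the function names are hypothetical, and the `n_i + 1` denominator is an assumption borrowed from the published AlphaZero pseudocode so that unvisited children do not divide by zero; the slide's own formula may differ on that point.

```python
import math

C_BASE = 19652   # c_base from the slide
C_INIT = 1.25    # c_init from the slide


def exploration_rate(n: int) -> float:
    """c = log((n + c_base + 1) / c_base) + c_init, with n the parent's visit count."""
    return math.log((n + C_BASE + 1) / C_BASE) + C_INIT


def child_score(n: int, n_i: int, pi_i: float, v_i: float) -> float:
    """Exploration term plus the child's current value estimate v_i.

    Assumption: the denominator is n_i + 1, so a child with n_i = 0
    gets a finite (and large) exploration bonus.
    """
    return exploration_rate(n) * math.sqrt(n) / (n_i + 1) * pi_i + v_i


def select_child(n: int, children: list[tuple[int, float, float]]) -> int:
    """Return the index of the child (n_i, pi_i, v_i) with the highest score."""
    scores = [child_score(n, n_i, pi_i, v_i) for n_i, pi_i, v_i in children]
    return max(range(len(scores)), key=scores.__getitem__)
```

With a parent visited 10 times, a well-explored child `(5, 0.5, 0.1)` scores lower than an unvisited child `(0, 0.3, 0.0)`, because the exploration term shrinks as `n_i` grows.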

  8. Monte-Carlo Tree Search

  9. Monte-Carlo Tree Search

  10. Closing the loop
      • play lots of games
      • choose moves randomly, according to MCTS policy
      • use finished games for training:
        • target value is the result of the game
        • target policy is the MCTS policy
      • also add noise to neural network output to increase exploration
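
A rough Python sketch of these steps, under stated assumptions: the MCTS policy is taken to be the normalised root visit counts and the noise is Dirichlet noise mixed into the prior, both as in AlphaZero. The slide does not say which noise scheme the talk uses, and all function names and the `alpha` and `epsilon` defaults are hypothetical.

```python
import random


def mcts_policy(visit_counts: list[int]) -> list[float]:
    """Normalise root visit counts into a distribution over moves (assumed convention)."""
    total = sum(visit_counts)
    return [count / total for count in visit_counts]


def sample_move(policy: list[float], rng: random.Random) -> int:
    """Choose a move randomly, according to the MCTS policy."""
    return rng.choices(range(len(policy)), weights=policy, k=1)[0]


def add_noise(priors: list[float], rng: random.Random,
              alpha: float = 0.3, epsilon: float = 0.25) -> list[float]:
    """Mix Dirichlet(alpha) noise into the network prior (assumed noise scheme)."""
    gammas = [rng.gammavariate(alpha, 1.0) for _ in priors]
    total = sum(gammas)
    return [(1 - epsilon) * p + epsilon * g / total for p, g in zip(priors, gammas)]


def training_samples(states: list, policies: list, result: float) -> list[tuple]:
    """One (S, pi, v) sample per visited state; the value target is the game result."""
    return [(s, pi, result) for s, pi in zip(states, policies)]
```

Handling the value sign for the two different players is left out of this sketch.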

  11. Proving game (diagram): theorem → Prove the theorem → win / lose

  12. Proving game (diagram labels): Construct a theorem; Adversary wins; Prove the theorem; Prover wins

  13. Prolog-like proving: the inference rule

      $$\frac{A \vdash X \qquad A \vdash Y}{A \vdash X \land Y} \tag{1}$$

      written as a clause:

      holds(A, and(X, Y)) :- holds(A, X), holds(A, Y)    (2)

  14. Prolog-like proving
      • goals: [ X:A ⊢ X ∧ ¬¬X, ... ]
      • rule: A ⊢ X ∧ Y :- A ⊢ X, A ⊢ Y
      • rule instance: X:A ⊢ X ∧ ¬¬X :- X:A ⊢ X, X:A ⊢ ¬¬X
      • new goals: [ X:A ⊢ X, X:A ⊢ ¬¬X, ... ]

  15. Prolog-like proving
      • goals: [ holds(X:A, and(X, not(not(X)))), ... ]
      • rule: holds(A, and(X, Y)) :- holds(A, X), holds(A, Y)
      • rule instance: holds(X:A, and(X, not(not(X)))) :- holds(X:A, X), holds(X:A, not(not(X)))
      • new goals: [ holds(X:A, X), holds(X:A, not(not(X))), ... ]
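
The proving step on these slides is a unification of the first goal with a clause head, followed by replacing that goal with the clause body. A minimal Python sketch of that step follows. The term encoding is an assumption made for illustration (tuples for compound terms, capitalised strings for variables, `("cons", X, A)` for `X:A`), and the sketch omits the occurs check.

```python
def is_var(term):
    """Variables are strings starting with an uppercase letter, as in Prolog."""
    return isinstance(term, str) and term[:1].isupper()


def walk(term, subst):
    """Follow bindings until reaching a non-variable or an unbound variable."""
    while is_var(term) and term in subst:
        term = subst[term]
    return term


def unify(a, b, subst):
    """Return a substitution extending subst that unifies a and b, or None."""
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if is_var(a):
        return {**subst, a: b}
    if is_var(b):
        return {**subst, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return None


def substitute(term, subst):
    """Apply the substitution to every variable inside term."""
    term = walk(term, subst)
    if isinstance(term, tuple):
        return tuple(substitute(t, subst) for t in term)
    return term


def rename(term, suffix):
    """Rename clause variables apart from the goal's variables."""
    if isinstance(term, tuple):
        return tuple(rename(t, suffix) for t in term)
    return term + suffix if is_var(term) else term


def prove_step(goals, clause, suffix="_1"):
    """Replace the first goal by the clause body, or return None if heads clash."""
    head, body = clause
    subst = unify(rename(head, suffix), goals[0], {})
    if subst is None:
        return None
    new_goals = [rename(g, suffix) for g in body] + list(goals[1:])
    return [substitute(g, subst) for g in new_goals]


# The conjunction rule (2) and the goal from the slide.
AND_RULE = (("holds", "A", ("and", "X", "Y")),
            [("holds", "A", "X"), ("holds", "A", "Y")])
GOAL = ("holds", ("cons", "X", "A"), ("and", "X", ("not", ("not", "X"))))
```

Calling `prove_step([GOAL], AND_RULE)` yields the two subgoals shown on the slide, `holds(X:A, X)` and `holds(X:A, not(not(X)))`, in this tuple encoding.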

  16. Prolog-like theorem constructing (bad idea)
      • goals: [ holds(X:A, and(X, not(not(X)))), ... ]
      • rule instance: holds(X:A, and(X, not(not(X)))) :- holds(X:A, X), holds(X:A, not(not(X)))
      • rule: holds(A, and(X, Y)) :- holds(A, X), holds(A, Y)
      • new goals: [ holds(X:A, X), holds(X:A, not(not(X))), ... ]

  17. Prolog-like theorem constructing (bad idea)
      • goals: [ holds(A, ♣), ... ]
      • rule instance: holds(A, ♣) :- holds(A, or(♦, ♥)), holds(A, implies(♦, ♣)), holds(A, implies(♥, ♣))
      • rule: holds(A, Z) :- holds(A, or(X, Y)), holds(A, implies(X, Z)), holds(A, implies(Y, Z))
      • new goals: [ �, �, �, ... ]

  18. Prolog-like theorem constructing
      • goals: [ T ]
      • rule: holds(A, and(X, Y)) :- holds(A, X), holds(A, Y)
      • rule instance: holds(A, and(X, Y)) :- holds(A, X), holds(A, Y)
      • new goals: [ holds(A, X), holds(A, Y) ]

  19. Prolog-like theorem constructing
      T → holds(X:A, and(X, not(not(X)))) → holds(x:a, and(x, not(not(x))))
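
The last step on this slide turns the variables of the constructed theorem into constants. A small Python sketch of that grounding step, assuming a tuple encoding of terms (with `("cons", X, A)` standing in for `X:A`) and assuming that lowercasing a variable name is an acceptable way to produce a constant; both are choices made for this illustration only.

```python
def ground(term):
    """Turn every variable (capitalised string) into a constant by lowercasing it.

    Assumption: no existing constant already uses the lowercased name,
    otherwise distinct symbols would collide.
    """
    if isinstance(term, tuple):
        # term[0] is the functor name; only the arguments can contain variables.
        return (term[0],) + tuple(ground(arg) for arg in term[1:])
    if term[:1].isupper():
        return term.lower()
    return term


CONSTRUCTED = ("holds", ("cons", "X", "A"), ("and", "X", ("not", ("not", "X"))))
THEOREM = ground(CONSTRUCTED)
```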

  20. Forcing termination of the game
      Step limit:
      • ugly extension of game state
      • strategy may depend on number of steps left
      • even if we hide it, there is a correlation: large term constructed ∼ few steps left ∼ will likely lose

  21. Forcing termination of the game
      Sudden death chance:
      • game states nicely equal
      • no hard limit for length of a theorem

      During a training playout, randomly terminate the game with chance $p_d$. In MCTS, adjust the value: $v' = (-1) \cdot p_d + v \cdot (1 - p_d)$.
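
The value adjustment is an expectation over the two cases: with chance `p_d` the game ends in a loss (value -1), otherwise the value stays `v`. A minimal Python sketch, with hypothetical function names:

```python
import random


def sudden_death(p_d: float, rng: random.Random) -> bool:
    """During a training playout, terminate the game with chance p_d."""
    return rng.random() < p_d


def adjusted_value(v: float, p_d: float) -> float:
    """v' = (-1) * p_d + v * (1 - p_d), the value seen by MCTS."""
    return (-1.0) * p_d + v * (1.0 - p_d)
```

Note that even a certain win (`v = 1`) is discounted below 1 for any positive `p_d`, so longer games are worth slightly less than shorter ones.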

  22. Disadvantages of this game
      • two different players - if one player starts winning every game, we can’t learn much
      • proofs use single inference steps - inefficient
      • players don’t take turns - MCTS not designed for that situation

  23. Not using maximum

  24. Not using maximum

  25. Not using maximum

  26. Not using maximum

  27. Certainty propagation

  28. Certainty propagation

  29. Certainty propagation

  30. Certainty propagation

      | recursively | for uncertain leaves | for certain leaves |
      |---|---|---|
      | v = min(u, max(l, a)) | v = � | v = result |
      | a = (� + Σ v_i · n_i) / (n + 1) | a = � | a = result |
      | l = max_i l_i | l = −1 | l = result |
      | u = max_i u_i | u = 1 | u = result |

      when player changes:
      • values and bounds flip
      • lower and upper bound switch places
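
The recursion on this slide can be sketched in Python as below. Several points are assumptions made for the illustration: the unreadable symbol is taken to be the network's value estimate for the node, `n` is taken to be the sum of the children's visit counts `n_i`, `result` is expressed from the point of view of the player to move at that node, and the `Node` class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    player: int                       # which player moves at this node
    estimate: float = 0.0             # assumed: network value for this node
    result: Optional[float] = None    # known outcome, set only for certain leaves
    visits: int = 1                   # n_i
    children: list["Node"] = field(default_factory=list)


def flip(stats):
    """Player changes: values flip sign, lower and upper bound switch places."""
    v, a, l, u = stats
    return (-v, -a, -u, -l)


def propagate(node: Node):
    """Return (v, a, l, u) from the perspective of node.player."""
    if node.result is not None:           # certain leaf
        r = node.result
        return (r, r, r, r)
    if not node.children:                 # uncertain leaf
        return (node.estimate, node.estimate, -1.0, 1.0)
    stats = []
    for child in node.children:
        s = propagate(child)
        stats.append(s if child.player == node.player else flip(s))
    n = sum(child.visits for child in node.children)
    weighted = sum(s[0] * c.visits for s, c in zip(stats, node.children))
    a = (node.estimate + weighted) / (n + 1)
    l = max(s[2] for s in stats)
    u = max(s[3] for s in stats)
    v = min(u, max(l, a))                 # clamp the average into the bounds
    return (v, a, l, u)
```

In this sketch, a node with one proven winning child gets `l = u = 1`, so its value becomes exactly 1 regardless of the average over its other children.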

  31. Learning the proving game
      Like AlphaZero, with a few differences:
      • using Transformer (encoder) for �
      • for theorems that prover failed to prove, show proper path with additional training samples
      • during evaluation, greedy policy and step limit instead of sudden death
      • balance training batches to have even split of won and lost games
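
One simple way to get an even split of won and lost games in a batch is to draw half of the batch from each pool. The Python sketch below is only an illustration of that idea: the function name is hypothetical, and sampling with replacement is an assumption that keeps it working when one pool is much smaller than the other.

```python
import random


def balanced_batch(won: list, lost: list, batch_size: int, rng: random.Random) -> list:
    """Draw about half of the batch from won games and the rest from lost games.

    Samples with replacement, so a small pool can still fill its half.
    Both pools must be non-empty.
    """
    half = batch_size // 2
    batch = rng.choices(won, k=half) + rng.choices(lost, k=batch_size - half)
    rng.shuffle(batch)
    return batch
```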

  32. Proving game evaluation (diagram labels): Construct a theorem; evaluation theorem; Adversary wins; Prove the theorem; Prover wins

  33. Potential problems
      Players are non-symmetrical:
      • Prover could be winning everything
      • Adversary could be winning everything

      Notes on the slide:
      • to some extent this is handled by additional training samples
      • can be solved by more exploration

  34. Uninteresting space of hard theorems
      ∃x f(x) = y (where f is a one-way function)
      • easy to prove if you can choose what y is
      • hard to prove if y is fixed

      So hard that we can’t expect the prover to learn it. This is stable - more learning and/or exploration won’t help.

  35. Results (intuitionistic first-order - sequent calculus). Plot: solved theorems (axis 0 to 20) against time in hours (axis 0 to 25).

  36. Results
      Solved:
      • ⊢ (∀a ∀b p_c(f_c(a, b)) → ∃d ∃e p_c(f_c(d, e)))
      • ⊢ (¬(p_a(∅) → p_b(∅)) → (p_b(∅) → p_a(∅)))

      Unsolved:
      • ⊢ (∃a p_b(a) → ∃c p_b(c))    (3)

  37. Results (intuitionistic first-order - sequent calculus). Chart: percentages (0% to 100%) of construction failed, proven, and not proven, against time (axis 5 to 20).

  38. Results (intuitionistic first-order - sequent calculus)
      unproven theorems - first hour:
      • A, ⊥ ⊢ C
      • ⊢ (⊥ → B)
      • (A → B), A ⊢ B
      • A, B, C, D, E, F, G, H ⊢ H
      • A, B, C, D, E, F, G, H, I, J, K, L, M ⊢ M
      • A, B, C, D, E, F, G, H, I ⊢ I

  39. Results (intuitionistic first-order - sequent calculus)
      unproven theorems - second hour:
      • ∀a Ω a C ⊢ Ω a C
      • ⊢ (B ∨ (¬⊥ ∨ C))
      • (A ∧ Ω c Ω e F) ⊢ ∃e Ω c Ω e F
      • (A ∧ B) ⊢ (D → B)
      • (A ∧ B) ⊢ (D ∨ A)
      • ⊢ ((B ∧ (C ∧ D)) → C)

  40. Results (intuitionistic first-order - sequent calculus)
      unproven theorems - third hour:
      • ∀a (Ω c Ω a E ∧ Ω g (Ω a J ⋆ Ω a L)) ⊢ Ω g (Ω a J ⋆ Ω a L)
      • A, B, C, D, E, F, G, ((H ∧ ⊥) ∧ I) ⊢ ¬K
      • A, B, C, D, E, F, G, H, ⊥ ⊢ (J ∨ K)
      • A, ¬B, C, (D ∧ B) ⊢ (F ∨ G)
      • ∀a (p_b(f_c(f_d(a, ∅), ∅)) ∧ ⊥), ¬¬E ⊢ ∃g Ω g I
      • A, B, ¬C, D, E, (C ∧ F) ⊢ (H ↔ ¬⊥)

  41. Results (intuitionistic first-order - sequent calculus)
      unproven theorems - twelfth hour:
      • A, B, (∀c Ω e (Ω c H ⋆ ¬¬¬¬ Ω j Ω l ¬(¬⊥ ⋆ (¬¬(⊥ ⋆ Ω c Q) ⋆ ¬¬ Ω c S))) ↔ A) ⊢ Ω e (Ω c H ⋆ ¬¬¬¬ Ω j Ω l ¬(¬⊥ ⋆ (¬¬(⊥ ⋆ Ω c Q) ⋆ ¬¬ Ω c S)))
      • A, B, (∀c X ↔ A) ⊢ X

  42. How to do better
      • train longer and/or harder
        (costly)
      • relegate low-level reasoning to some more efficient solver
        (need to invent some other mechanism for generating theorems)
      • allow use of theorems, not only axioms
        (action space becomes large and changing over time)

      All of the above still face the uninteresting theorem space.
      • use some other objective
        (would be nice to find theorems that are useful in proving other theorems - but how exactly would that work?)

  43. Thank you for your attention! Stanisław Purgał
