Learning theorem proving through self-play - Stanisław Purgał (PowerPoint presentation, 2019-10)
  1. Learning theorem proving through self-play Stanisław Purgał

  2. Overview
     • AlphaZero
     • Proving game
     • adjusting MCTS for the proving game
     • some results
     2019-10

  3. Neural black box: a neural network maps a game state S to a move policy π ∈ R^n and an expected outcome v ∈ R.

  4. Neural black box: the network is trained on samples (S_1, π_1, v_1), ..., (S_n, π_n, v_n).

  5. Monte-Carlo Tree Search: the search uses the same black box, mapping a game state S to a move policy π ∈ R^n and an expected outcome v ∈ R.

  6. Monte-Carlo Tree Search: from state S with children S_1, S_2, S_3, ..., choose the child i maximizing

        v_i + c · π_i · √n / (n_i + 1)

      where v_i is the weighted average of outcomes seen below child i, n_i is its visit count, n is the parent's visit count, and

        c = log((n + c_base + 1) / c_base) + c_init,   c_base = 19652,   c_init = 1.25
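The selection formula above can be sketched in Python (my illustration, not code from the talk):

```python
import math

# AlphaZero-style PUCT child selection, following the formula on the slide.
C_BASE = 19652
C_INIT = 1.25

def puct_score(v_i, pi_i, n_i, n):
    """Weighted-average value of a child plus its exploration bonus."""
    c = math.log((n + C_BASE + 1) / C_BASE) + C_INIT
    return v_i + c * pi_i * math.sqrt(n) / (n_i + 1)

def select_child(children, n):
    """children: list of (v_i, pi_i, n_i) triples; n: parent visit count.
    Returns the index of the child to descend into."""
    return max(range(len(children)),
               key=lambda i: puct_score(*children[i], n))
```

Note how an unvisited child (n_i = 0) with a large prior π_i can outscore an already-visited child: the √n/(n_i + 1) term rewards exploration.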

  7. Monte-Carlo Tree Search

  8. Monte-Carlo Tree Search

  9. Why not maximum? The network's expected outcome is only an estimate: v = t + error, where t is the true value.

  10. Why not maximum?
      v_1 = t_1 + error,  v_2 = t_2 + error,  v_3 = t_3 + ERROR
      taking min/max: v = t + ERROR (the extreme child is likely the one with the largest error)

  11. Why not maximum?
      v_1 = t_1 + error,  v_2 = t_2 + error,  v_3 = t_3 + ERROR
      taking the average: v = t + (Σ error) / n (independent errors partly cancel)
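A small simulation (mine, not from the slides) shows the point: the maximum of unbiased noisy estimates is biased upward, while the average keeps the error small.

```python
import random

random.seed(0)

def noisy(t):
    # unbiased noise around the true value t
    return t + random.uniform(-1.0, 1.0)

TRIALS = 20000
true_values = [0.0, 0.0, 0.0]   # all children equally good; true max is 0

max_estimates = []
avg_estimates = []
for _ in range(TRIALS):
    vs = [noisy(t) for t in true_values]
    max_estimates.append(max(vs))
    avg_estimates.append(sum(vs) / len(vs))

bias_of_max = sum(max_estimates) / TRIALS   # clearly above 0
bias_of_avg = sum(avg_estimates) / TRIALS   # close to 0
```

Taking the max selects whichever estimate happened to draw the largest error, so the bias never averages out; the mean does not suffer from this.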

  12. Closing the loop
      • play lots of games
      • choose moves randomly, according to the MCTS policy
      • use finished games for training:
        • desired value is the result of the game
        • desired policy is the MCTS policy
      • also add noise to the neural network output to increase exploration
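The loop can be sketched as follows (my reconstruction, not the author's code; `run_mcts` and `play_move` are hypothetical callbacks standing in for the real search and game mechanics):

```python
import random

random.seed(1)

def sample_move(mcts_policy):
    """Sample a move index from the MCTS visit-count policy."""
    r = random.random()
    acc = 0.0
    for move, p in enumerate(mcts_policy):
        acc += p
        if r < acc:
            return move
    return len(mcts_policy) - 1

def self_play_game(run_mcts, play_move, initial_state):
    """Play one game; return (state, policy, result) training samples."""
    history, state = [], initial_state
    while not state.terminal:
        policy = run_mcts(state)           # MCTS visit-count distribution
        history.append((state, policy))
        state = play_move(state, sample_move(policy))
    # desired value is the result of the game, desired policy the MCTS policy
    return [(s, pi, state.result) for (s, pi) in history]
```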

  13. Proving game: the Prover is given a theorem; it wins if it proves the theorem and loses otherwise.

  14. Proving game: the Adversary constructs a theorem, then the Prover tries to prove it. If the proof succeeds, the Prover wins; otherwise the Adversary wins.

  15. Prolog-like proving. The inference rule

        A ⊢ X    A ⊢ Y
        ───────────────  (1)
          A ⊢ X ∧ Y

      becomes the clause

        holds(A, and(X, Y)) :- holds(A, X), holds(A, Y)  (2)

  16. Prolog-like proving
      goal list:     [ X:A ⊢ X ∧ ¬¬X, ... ]
      rule:          A ⊢ X ∧ Y :- A ⊢ X, A ⊢ Y
      rule instance: X:A ⊢ X ∧ ¬¬X :- X:A ⊢ X, X:A ⊢ ¬¬X
      new goal list: [ X:A ⊢ X, X:A ⊢ ¬¬X, ... ]

  17. Prolog-like proving
      goal list:     [ holds(X:A, and(X, not(not(X)))), ... ]
      rule:          holds(A, and(X, Y)) :- holds(A, X), holds(A, Y)
      rule instance: holds(X:A, and(X, not(not(X)))) :- holds(X:A, X), holds(X:A, not(not(X)))
      new goal list: [ holds(X:A, X), holds(X:A, not(not(X))), ... ]

  18. Prolog-like theorem constructing
      goal list:     [ holds(X:A, and(X, not(not(X)))), ... ]
      rule instance: holds(X:A, and(X, not(not(X)))) :- holds(X:A, X), holds(X:A, not(not(X)))
      rule:          holds(A, and(X, Y)) :- holds(A, X), holds(A, Y)
      built from:    [ holds(X:A, X), holds(X:A, not(not(X))), ... ]
      bad idea

  19. Prolog-like theorem constructing
      goal list:     [ holds(A, ♣), ... ]
      rule instance: holds(A, ♣) :- holds(A, or(♦, ♥)), holds(A, implies(♦, ♣)), holds(A, implies(♥, ♣))
      rule:          holds(A, Z) :- holds(A, or(X, Y)), holds(A, implies(X, Z)), holds(A, implies(Y, Z))
      built from:    [ holds(A, or(♦, ♥)), holds(A, implies(♦, ♣)), holds(A, implies(♥, ♣)), ... ]
      bad idea

  20. Prolog-like theorem constructing
      goal list:     [ T ]
      rule:          holds(A, and(X, Y)) :- holds(A, X), holds(A, Y)
      rule instance: holds(A, and(X, Y)) :- holds(A, X), holds(A, Y)
      new goal list: [ holds(A, X), holds(A, Y) ]

  21. Prolog-like theorem constructing
      T
      holds(X:A, and(X, not(not(X))))
      holds(x:a, and(x, not(not(x))))

  22. Forcing termination of the game
      Step limit:
      • an ugly extension of the game state
      • strategy may depend on the number of steps left
      • even if we hide it, there is a correlation: large term constructed ~ few steps left ~ will likely lose

  23. Forcing termination of the game
      Sudden death chance:
      • game states stay nicely uniform
      • no hard limit on the length of a theorem
      During a training playout, randomly terminate the game with chance p_d. In MCTS, adjust the value accordingly: v′ = (−1) · p_d + v · (1 − p_d).
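The value adjustment is a simple mixture (my one-line illustration): with probability p_d the game ends immediately with value −1, otherwise the subtree estimate v stands.

```python
def adjusted_value(v, p_d):
    """Expected value when sudden death (value -1) occurs with chance p_d."""
    return (-1.0) * p_d + v * (1.0 - p_d)
```

With p_d = 0 the estimate is unchanged; as p_d grows, every position drifts toward a loss, which pushes both players toward short games.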

  24. Disadvantages of this game
      • two different players: if one player starts winning every game, we can't learn much
      • proofs use single inference steps: inefficient
      • players don't take turns: MCTS was not designed for that situation

  25. Not using maximum

  26. Not using maximum

  27. Not using maximum

  28. Not using maximum

  29. Certainty propagation

  30. Certainty propagation

  31. Certainty propagation

  32. Certainty propagation (v_net is the network's value estimate)
      recursively:
        v = min(u, max(l, a))
        a = (v_net + Σ v_i · n_i) / (n + 1)
        l = max_i l_i
        u = max_i u_i
      for uncertain leaves:
        v = v_net,  a = v_net,  l = −1,  u = 1
      for certain leaves:
        v = result,  a = result,  l = result,  u = result
      when the player changes:
      • values and bounds flip sign
      • the lower and upper bound switch places
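A sketch of these rules in Python (my reconstruction under the equations above, not the author's code): each node yields a value v, an averaged estimate a, and certainty bounds [l, u], and a certain result clamps the value via v = min(u, max(l, a)).

```python
class Node:
    def __init__(self, nn_value=0.0, result=None):
        self.children = []        # list of (Node, visit_count) pairs
        self.nn_value = nn_value  # network estimate at this node
        self.result = result      # game result if this leaf is certain

def propagate(node):
    """Return (v, a, l, u) for `node`, from the player-to-move's view."""
    if not node.children:
        if node.result is not None:                      # certain leaf
            r = node.result
            return r, r, r, r
        return node.nn_value, node.nn_value, -1.0, 1.0   # uncertain leaf
    stats = []
    for child, n_i in node.children:
        v, a, l, u = propagate(child)
        # player changes: flip sign; lower and upper bound switch places
        stats.append((-v, -a, -u, -l, n_i))
    n = sum(n_i for (_, _, _, _, n_i) in stats)
    a = (node.nn_value + sum(v * n_i for (v, _, _, _, n_i) in stats)) / (n + 1)
    l = max(l for (_, _, l, _, _) in stats)
    u = max(u for (_, _, _, u, _) in stats)
    return min(u, max(l, a)), a, l, u
```

With one child that is a certain loss for the opponent, the parent's bounds collapse to [1, 1] and its value becomes exactly 1, even though the plain visit-weighted average would stay below 1.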

  33. Toy problem

      ablist([]).
      ablist([a|L]) :- ablist(L).
      ablist([b|L]) :- ablist(L).
      ablist([c|L]) :- ablist(L).
      ablist([d|L]) :- ablist(L).

      rev3([], L, L).
      rev3([H|T], L, Acc) :- rev3(T, L, [H|Acc]).

      revablist(L) :- ablist(T), rev3(L, T, []).
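For intuition, a Python restatement (mine, not from the slides) of what these predicates compute: ablist/1 accepts lists over {a, b, c, d}, rev3/3 reverses a list with an accumulator, and revablist/1 holds when the reversed list is an ablist (equivalently, when the list itself is over {a, b, c, d}).

```python
def ablist(xs):
    """ablist/1: every element is one of a, b, c, d."""
    return all(x in ("a", "b", "c", "d") for x in xs)

def rev3(xs, acc=None):
    """rev3([H|T], L, Acc) :- rev3(T, L, [H|Acc])  -- accumulator reversal."""
    acc = [] if acc is None else acc
    for h in xs:
        acc = [h] + acc
    return acc

def revablist(xs):
    """revablist(L) :- ablist(T), rev3(L, T, [])."""
    return ablist(rev3(xs))
```

The point of the game on this problem is not the answer itself but that proving revablist(L) forces the prover to thread the accumulator through rev3 step by step.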

  34. Toy problem evaluation

      ablist([a,b,a,b,a,b,b]),
      revablist([]),
      revablist([a]),
      revablist([b]),
      revablist([c,d]),
      revablist([c,a,b]),
      revablist([a,d,c,b]),
      revablist([a,d,c,a,a]),
      revablist([a,b,c,d,b,d]),
      revablist([d,b,c,a,d,a,b]),
      revablist([a,c,b,a,c,a,d,d])

  35. Certainty propagation effect

  36. Learning the proving game
      Like AlphaZero, with a few differences:
      • using a Graph Attention Network as the neural black box
      • for theorems that the prover failed to prove, show the proper path with additional policy training samples
      • during evaluation, use a greedy policy and a step limit instead of sudden death

  37. Proving game evaluation: the theorem constructed by the Adversary is replaced by a theorem from the evaluation set; the Prover wins if it proves it, otherwise the Adversary wins.

  38. Learning toy problem

  39. Intuitionistic propositional logic

      holds([A|T], A).
      holds(T, A) :- holds([B|T], A), holds(T, B).
      holds([H|T], A) :- holds(T, A).
      holds(T, impl(A, B)) :- holds([A|T], B).
      holds(T, B) :- holds(T, A), holds(T, impl(A, B)).
      holds(T, or(A, B)) :- holds(T, A).
      holds(T, or(A, B)) :- holds(T, B).
      holds(T, C) :- holds(T, or(A, B)), holds([A|T], C), holds([B|T], C).
      holds(T, and(A, B)) :- holds(T, A), holds(T, B).
      holds(T, A) :- holds(T, and(A, B)).
      holds(T, B) :- holds(T, and(A, B)).
      holds([false|T], A).

  40. Classical propositional logic

      holds([A|T], A).
      holds(T, A) :- holds([B|T], A), holds(T, B).
      holds([H|T], A) :- holds(T, A).
      holds(T, impl(A, B)) :- holds([A|T], B).
      holds(T, B) :- holds(T, A), holds(T, impl(A, B)).
      holds(T, or(A, B)) :- holds(T, A).
      holds(T, or(A, B)) :- holds(T, B).
      holds(T, C) :- holds(T, or(A, B)), holds([A|T], C), holds([B|T], C).
      holds(T, and(A, B)) :- holds(T, A), holds(T, B).
      holds(T, A) :- holds(T, and(A, B)).
      holds(T, B) :- holds(T, and(A, B)).
      holds([false|T], A).
      holds(T, A) :- holds([impl(A, false)|T], false).

  41. Learning classical propositional logic

  42. Constructed theorem example (shown on the slide as a term graph of holds/2, and/2, or/2, impl/2 and constant nodes), with further examples:

      ⊢ (((d ∧ b ∧ c) ∨ (b ∧ c ∧ d)) ⟹ b) ∨ e
      ⊥ ⊢ a ∨ b ∨ c
      ⊢ ((((a ∧ ⊥ ∧ b) ⟹ c) ⟹ d) ⟹ d)
      ((a ∧ b) ⟹ a) ⟹ (⊥ ∧ c) ⊢ d
      ((a ⟹ ⊥) ⟹ b), c, (a ⟹ b) ⊢ b

  43. First-order logic

      %some classical logic
      neq(var([a|_]), var([b|_])).
      neq(var([b|_]), var([a|_])).
      neq(var([_|A]), var([_|B])) :- neq(var(A), var(B)).

      repl(var(A), R, var(A), R).
      repl(var(A), R, var(B), var(B)) :- neq(var(A), var(B)).
      repl(var(A), R, op(O, X1, Y1), op(O, X2, Y2)) :- repl(var(A), R, X1, X2), repl(var(A), R, Y1, Y2).
      repl(var(A), R, q(O, var(A), P), q(O, var(A), P)).
      repl(var(A), R, q(O, var(B), P1), q(O, var(B), P2)) :- neq(var(A), var(B)), repl(var(A), R, P1, P2).
      repl(var(A), R, false, false).
      repl(var(A), R, [], []).
      repl(var(A), R, [H1|T1], [H2|T2]) :- repl(var(A), R, H1, H2), repl(var(A), R, T1, T2).

      holds(T, q(forall, var(A), Phi)) :- repl(var(A), var(B), Phi, PhiBA), repl(var(B), false, [Phi|T], [Phi|T]), holds(T, PhiBA).
      holds(T, Phi) :- holds(T, q(forall, var(A), PhiA)), repl(var(A), B, PhiA, Phi).
      holds(T, q(exists, var(A), Phi)) :- repl(var(A), R, Phi, PhiR), holds(T, PhiR).
      holds(T, P) :- holds(T, q(exists, var(A), Phi)), repl(var(B), false, Phi, Phi), repl(var(A), var(B), Phi, PhiB), holds([PhiB|T], P).

  44. Future work
      • better rule representation?
      • a proper prover with a different construction mechanism?
      • different use cases?
      • more computational power?

  45. Thank you for your attention! Stanisław Purgał
