

  1. LeelaChessZero Open Source Community (F. Huizinga)

  2. Overview ● What is Lc0? ● The GameTree and A0 in a nutshell ● Contribute ● Useful links ● Technical details

  3. What is Lc0? ● 2016 DeepMind’s AlphaGo ● 2017 AlphaZero ● 2017 LeelaZero ● 2018 LeelaChessZero

  4. The Game Tree

  5. Why care? ● General approach, no domain knowledge required (Go, Chess, Shogi, …) ● Visual interpretation of the game allows for a deep positional and material understanding, obtained from self-play ● Fascinating gameplay, see YouTube videos on AlphaZero/LeelaChessZero

  6. LeelaChessZero ● Initially missing details on the neural network architecture ● Variable compute budget ● Obtain dedicated hardware for training ● Always looking for contributors ○ Developers ○ Computational help ○ Testers/Elo estimators ○ Enthusiasts

  7. Links ● lczero.org ● testtraining.lczero.org ● github.com/LeelaChessZero ● discord.gg/pKujYxD

  8. Thanks to ● DeepMind ● Gian-Carlo Pascutto ● Leela Developers ● Lc0 Developers ● Testers ● Chess enthusiasts

  9. Minimax Algorithm (figure: a game tree with alternating max and min levels — the root takes the max over its children, e.g. max(-1, +1, -1), while min nodes take the min over their leaves, e.g. min(0, 0, -1) and min(0, +1))
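The tree walk sketched on this slide can be written in a few lines. This is a minimal illustrative implementation (not Lc0's code): the tree is represented as nested lists whose leaves are integer game results, and the example tree below is an arbitrary one chosen to exercise both player levels.

```python
# Minimax on an explicit game tree: a leaf is an int (the game result),
# an internal node is a list of child subtrees. Levels alternate between
# the maximizing and the minimizing player.
def minimax(node, maximizing=True):
    if isinstance(node, int):
        return node  # terminal node: its value is the game result
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# Example: root (max) over three subtrees; the min player moves below it.
tree = [[0, 0, -1], 1, [0, 1]]
# minimax(tree) == max(min(0, 0, -1), 1, min(0, 1)) == max(-1, 1, 0) == 1
```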

  10. Evaluation Function ● Minimax cannot reach terminal nodes within practical time constraints ● Instead, approximate the minimax value of a subtree ● This requires evaluating non-terminal nodes ● Centuries of human chess understanding went into properly defining this function

  11. Minimax + Eval (figure: the same tree with heuristic evaluations at the cut-off depth — root max(-3, 2, 0) = 2, with min nodes min(8, -3, 1) = -3 and min(2, 4) = 2)
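Combining the two previous slides gives depth-limited minimax: when the search budget runs out before a terminal node, the heuristic evaluation stands in for the true result. A small sketch, using a made-up node representation of `(heuristic_value, children)` tuples; the example tree reproduces the values on this slide (internal heuristic values are set to 0 since they are never consulted at this depth).

```python
# Depth-limited minimax. Each node is (heuristic_value, children); when
# depth reaches 0 or the node has no children, the heuristic value is
# returned instead of searching deeper.
def minimax_eval(node, depth, maximizing=True):
    value, children = node
    if depth == 0 or not children:
        return value
    scores = [minimax_eval(c, depth - 1, not maximizing) for c in children]
    return max(scores) if maximizing else min(scores)

# The tree from the slide: max(min(8, -3, 1), min(2, 4), 0) == 2.
tree = (0, [(0, [(8, []), (-3, []), (1, [])]),
            (0, [(2, []), (4, [])]),
            (0, [])])
```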

  12. AlphaZero ● Main objective: prune the game tree ● Learn the evaluation function (value) and the most promising moves (policy) of the game tree iteratively from self-play data

  13. Neural Network (figure: a tic-tac-toe position is fed into the neural network, which outputs an expected outcome of 1 and a move distribution over the open squares)

  14. Training Data ● Result: Win +1, Loss -1, Draw 0 ● Obtain data through self-play (figure: a game state is fed into the neural network, which outputs a policy — a move distribution — and an expected outcome)
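The two network heads described here (a move distribution and an expected outcome) can be sketched with plain NumPy. This is a toy stand-in, not Lc0's architecture: one shared hidden layer for a flattened 3x3 board, with randomly initialized weights, a softmax policy head over the 9 squares, and a tanh value head in [-1, 1].

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())  # subtract max for numerical stability
    return e / e.sum()

# Hypothetical tiny two-headed network for a 3x3 board:
# shared hidden layer -> policy head (9 move logits) + value head (scalar).
W_hidden = rng.normal(size=(9, 32))
W_policy = rng.normal(size=(32, 9))
W_value = rng.normal(size=(32, 1))

def forward(board):                    # board: flat array of 9 cell values
    h = np.tanh(board @ W_hidden)      # shared representation
    policy = softmax(h @ W_policy)     # move distribution, sums to 1
    value = np.tanh(h @ W_value)[0]    # expected outcome in [-1, 1]
    return policy, value

# An empty board gives a uniform policy and a neutral value here,
# because every pre-activation is exactly zero.
policy, value = forward(np.zeros(9))
```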

  15.–20. (MCT) Search (six animation slides illustrating Monte Carlo Tree Search building out the game tree step by step)

  21. Records of data (State_1, Policy_1, Result_1), (State_2, Policy_2, Result_2), …, (State_n, Policy_n, Result_n), where n is the total number of moves played in the game.
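These records can be assembled as a plain list of tuples once the game ends. A small sketch with placeholder data: the final result is back-filled into every record, and — as a common convention, assumed here rather than stated on the slide — its sign is flipped on alternating moves so each record scores the outcome from the side-to-move's perspective.

```python
# One self-play game of n moves yields n training records:
# (state, MCTS-improved policy, final result from the mover's perspective).
game_states = ["state_1", "state_2", "state_3"]    # placeholder positions
policies = [[0.7, 0.3], [0.1, 0.9], [0.5, 0.5]]    # placeholder move distributions
result = 1                                          # final outcome: win for player 1

records = [
    (state, policy, result if i % 2 == 0 else -result)  # alternate perspective
    for i, (state, policy) in enumerate(zip(game_states, policies))
]
```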
