
Why is Go hard for computers to play? Game tree complexity = b^d



  1. Why is Go hard for computers to play? Game tree complexity = b^d. Brute-force search is intractable: (1) the search space is huge; (2) it is “impossible” for computers to evaluate who is winning.
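
For a rough sense of scale, assuming the commonly cited figures of branching factor b ≈ 250 and game length d ≈ 150 moves for Go (illustrative round numbers, not exact statistics), the game-tree size b^d can be estimated directly:

     import math

     b, d = 250, 150                       # rough branching factor and game length for Go
     exponent = d * math.log10(b)          # log10 of the game-tree size b^d
     print(f"b^d is roughly 10^{exponent:.0f}")   # prints: b^d is roughly 10^360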

  2. Convolutional neural network

  3. Value network: evaluation v(s) of position s

  4. Policy network: move probabilities p(a|s) for position s
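
A minimal sketch of the two network interfaces from slides 3-4, written in a PyTorch style; the trunk depth, filter count and number of input feature planes below are illustrative stand-ins, not AlphaGo's actual architecture:

     import torch
     import torch.nn as nn
     import torch.nn.functional as F

     BOARD = 19
     PLANES = 17   # illustrative number of feature planes encoding position s

     def conv_trunk(filters=64):
         # Small stand-in for the 12-layer convolutional trunk mentioned in the talk.
         return nn.Sequential(
             nn.Conv2d(PLANES, filters, 3, padding=1), nn.ReLU(),
             nn.Conv2d(filters, filters, 3, padding=1), nn.ReLU(),
         )

     class PolicyNet(nn.Module):
         """Move probabilities p(a|s) over the 19x19 board points."""
         def __init__(self, filters=64):
             super().__init__()
             self.trunk = conv_trunk(filters)
             self.head = nn.Conv2d(filters, 1, 1)          # one logit per board point
         def forward(self, s):                             # s: (batch, PLANES, 19, 19)
             logits = self.head(self.trunk(s)).flatten(1)  # (batch, 361)
             return F.softmax(logits, dim=1)

     class ValueNet(nn.Module):
         """Scalar evaluation v(s): expected game outcome from position s."""
         def __init__(self, filters=64):
             super().__init__()
             self.trunk = conv_trunk(filters)
             self.head = nn.Linear(filters * BOARD * BOARD, 1)
         def forward(self, s):
             return torch.tanh(self.head(self.trunk(s).flatten(1)))   # in [-1, 1]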

  5. Neural network training pipeline: human expert positions → supervised learning → policy network → reinforcement learning (self-play) → policy network → self-play data → value network

  6. Supervised learning of policy networks
     Policy network: 12-layer convolutional neural network
     Training data: 30M positions from human expert games (KGS 5+ dan)
     Training algorithm: maximise likelihood by stochastic gradient descent
     Training time: 4 weeks on 50 GPUs using Google Cloud
     Results: 57% accuracy on held-out test data (previous state of the art was 44%)
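
A minimal sketch of this supervised stage, assuming the illustrative PolicyNet above and mini-batches of (position, expert move) pairs; maximising the likelihood of the expert move is ordinary cross-entropy training by stochastic gradient descent:

     import torch
     import torch.nn.functional as F

     def sl_policy_step(policy, optimizer, positions, expert_moves):
         """One SGD step maximising log p(expert_move | position).

         positions:    (batch, PLANES, 19, 19) float tensor of encoded positions
         expert_moves: (batch,) long tensor of board-point indices in [0, 361)
         """
         probs = policy(positions)                                   # p(a|s), shape (batch, 361)
         loss = F.nll_loss(torch.log(probs + 1e-12), expert_moves)   # negative mean log-likelihood
         optimizer.zero_grad()
         loss.backward()
         optimizer.step()
         return loss.item()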

  7. Reinforcement learning of policy networks
     Policy network: 12-layer convolutional neural network
     Training data: games of self-play between policy networks
     Training algorithm: maximise wins z by policy gradient reinforcement learning
     Training time: 1 week on 50 GPUs using Google Cloud
     Results: ~80% win rate against the supervised-learning policy network; raw network ~3 amateur dan
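
A sketch of the policy-gradient (REINFORCE-style) update described here, again using the illustrative PolicyNet; the positions and moves come from one self-play game, and z is that game's final outcome from the current player's perspective (the self-play loop and opponent-pool details are omitted):

     import torch

     def rl_policy_step(policy, optimizer, positions, played_moves, z):
         """Policy-gradient step: reinforce the moves of winning games, discourage losing ones.

         positions:    (T, PLANES, 19, 19) positions seen during one self-play game
         played_moves: (T,) long tensor of the moves the network actually sampled
         z:            scalar game outcome, +1 for a win, -1 for a loss
         """
         probs = policy(positions)                                            # (T, 361)
         log_p = torch.log(probs.gather(1, played_moves[:, None]).squeeze(1) + 1e-12)
         loss = -(z * log_p).mean()           # gradient ascent on z * log p(a|s)
         optimizer.zero_grad()
         loss.backward()
         optimizer.step()
         return loss.item()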

  8. Reinforcement learning of value networks
     Value network: 12-layer convolutional neural network
     Training data: 30 million games of self-play
     Training algorithm: minimise MSE by stochastic gradient descent
     Training time: 1 week on 50 GPUs using Google Cloud
     Results: first strong position-evaluation function (previously thought impossible)
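
A sketch of the value regression described here, using the illustrative ValueNet; each training example pairs a position drawn from a self-play game with that game's final outcome z, and the mean squared error is minimised by stochastic gradient descent:

     import torch
     import torch.nn.functional as F

     def value_step(value_net, optimizer, positions, outcomes):
         """One SGD step minimising the MSE between v(s) and the game outcome z.

         positions: (batch, PLANES, 19, 19) positions sampled from self-play games
         outcomes:  (batch,) final results z in {-1, +1} of the games the positions came from
         """
         v = value_net(positions).squeeze(1)          # predicted evaluation v(s)
         loss = F.mse_loss(v, outcomes.float())       # mean of (v(s) - z)^2 over the batch
         optimizer.zero_grad()
         loss.backward()
         optimizer.step()
         return loss.item()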

  9. Exhaustive search

  10. Reducing depth with value network

  11. Reducing breadth with policy network
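
Slides 9-11 contrast exhaustive search with the two ways the networks prune it: the value network cuts depth (a leaf is evaluated with v(s) instead of being played out), and the policy network cuts breadth (only the most probable moves are expanded). A minimal depth-limited sketch under those assumptions follows; the game-rule and network interfaces passed in are hypothetical, and AlphaGo itself combines these ideas inside Monte Carlo tree search rather than the plain recursion shown here:

     def search(state, depth, legal_moves, play, policy_net, value_net, top_k=5):
         """Depth-limited negamax sketch: the policy net prunes breadth, the value net cuts depth.

         All game knowledge arrives as callables (hypothetical interfaces):
           legal_moves(state) -> list of moves      play(state, move) -> next state
           policy_net(state)  -> {move: prob}       value_net(state)  -> float in [-1, 1],
                                                    from the player-to-move's perspective
         """
         moves = legal_moves(state)
         if depth == 0 or not moves:
             return value_net(state)    # reduce depth: evaluate the leaf with v(s), no playout

         # Reduce breadth: expand only the top_k moves the policy network rates most probable.
         probs = policy_net(state)
         candidates = sorted(moves, key=lambda m: -probs.get(m, 0.0))[:top_k]

         best = float("-inf")
         for move in candidates:
             best = max(best, -search(play(state, move), depth - 1,
                                      legal_moves, play, policy_net, value_net, top_k))
         return best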

  12. Evaluating AlphaGo against computers: Elo rating chart comparing GnuGo, Fuego, Pachi, Zen, Crazy Stone, AlphaGo (Nature v13) and AlphaGo (Seoul v18), with ranks spanning beginner kyu (k), amateur dan (d) and professional dan (p)

  13. Calibration of computer programs against human players: AlphaGo (Oct 2015, Nature match) beat Fan Hui (2p), three-time reigning European Champion, 5-0; AlphaGo (Mar 2016, DeepMind challenge match) beat Lee Sedol (9p), top player of the past decade, 4-1; Crazy Stone and Zen beat KGS amateur humans

  14. What’s Next?

  15. Demis Hassabis
