Mastering the Game of Go without Human Knowledge 06/15/18 - PowerPoint PPT Presentation

Mastering the Game of Go without Human Knowledge 06/15/18 Presented by: Henry Chen CS885 Reinforcement Learning

Introduction Image source: https://medium.com/syncedreview/alphago-zero-approaching-perfection-d8170e2b4e48 PAGE 2

Introduction The Game of Go ▪ ancient board game ▪ 19 x 19 grid ▪ complexity: ~10^170 Challenging AI problem ▪ How to search through an intractable search space? ▪ Breakthrough: AlphaGo Image source: https://medium.com/@karpathy/alphago-in-context-c47718cb95a5 PAGE 3
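
The order of magnitude quoted on this slide can be sanity-checked with simple arithmetic. The sketch below is an illustration added here, not a calculation from the slides: it assumes a crude upper bound in which each of the 361 points is independently empty, black, or white, ignoring legality. The bound lands a little above the quoted figure, which is expected because many such configurations are not legal positions.

```python
BOARD_POINTS = 19 * 19  # 361 intersections on a 19 x 19 grid

# Crude upper bound (assumption for illustration): every point is
# empty, black, or white, with no legality check.
upper_bound = 3 ** BOARD_POINTS

# Power of ten of the bound: number of decimal digits minus one.
exponent = len(str(upper_bound)) - 1

print(f"points on the board: {BOARD_POINTS}")
print(f"3^{BOARD_POINTS} is on the order of 10^{exponent}")
```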

Background AlphaGo ▪ March 2016: defeated 18-time world champion Lee Sedol 4-1 Image source: https://www.tastehit.com/blog/google-deepmind-alphago-how-it-works/ PAGE 4

Background AlphaGo - Architecture 1. Policy Network ▪ Purpose: decide next best move ▪ Convolutional Neural Network (13 hidden layers) ▪ Stage 1: Supervised Learning to predict human expert moves (57% accuracy) ▪ Stage 2: Improve network by Policy Gradient Reinforcement Learning through self-play using roll-out policy (wins more than 80% of games against the stage 1 network) Source: Google DeepMind, Mastering the Game of Go with Deep Neural Networks and Tree Search PAGE 5

Background AlphaGo - Architecture 2. Value Network ▪ Purpose: evaluate chances of winning ▪ Convolutional Neural Network (14 hidden layers) ▪ Train network by regression on state-outcome pairs sampled from self-play data generated with the policy network Source: Google DeepMind, Mastering the Game of Go with Deep Neural Networks and Tree Search PAGE 6

Background
Policy Network (stage 1): ▪ 30 million positions from 160,000 human games ▪ 50 GPUs ▪ 3 weeks
Policy Network (stage 2): ▪ 10,000 mini-batches of 128 self-play games ▪ 50 GPUs ▪ 1 day
Value Network: ▪ 30 million unique positions ▪ 50 GPUs ▪ 1 week
Source: Google DeepMind, Mastering the Game of Go with Deep Neural Networks and Tree Search PAGE 7

Background 3. Monte-Carlo Tree Search (MCTS) ▪ Purpose: combine policy and value networks to select actions by lookahead search ▪ Asynchronous multi-threaded search (distributed, ~50 GPUs) Source: Google DeepMind, Mastering the Game of Go with Deep Neural Networks and Tree Search PAGE 8
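
How a search can combine the two networks may be easier to see in code. The following is a minimal sketch, not the presenter's or DeepMind's implementation: the `Edge` class, the function names, and the constants `c_puct` and `mix` are hypothetical names and illustrative values chosen here. It assumes a PUCT-style rule in which the policy network's prior steers exploration, and a leaf evaluation that blends the value network's estimate with a rollout outcome.

```python
import math
from dataclasses import dataclass


@dataclass
class Edge:
    """Statistics for one (state, action) pair in the search tree."""
    prior: float              # move probability from the policy network
    visits: int = 0           # how often the search has taken this move
    total_value: float = 0.0  # sum of evaluations backed up through it

    @property
    def q(self) -> float:
        """Mean evaluation; 0 for an unvisited edge."""
        return self.total_value / self.visits if self.visits else 0.0


def select_action(edges, c_puct=1.0):
    """Pick the move maximising Q + U, where U favours high-prior, rarely visited moves."""
    total_visits = sum(e.visits for e in edges.values())

    def score(item):
        _, edge = item
        u = c_puct * edge.prior * math.sqrt(total_visits) / (1 + edge.visits)
        return edge.q + u

    return max(edges.items(), key=score)[0]


def leaf_value(value_net_estimate, rollout_outcome, mix=0.5):
    """Blend the value network's estimate with a rollout result (mix is illustrative)."""
    return (1 - mix) * value_net_estimate + mix * rollout_outcome


def backup(edge, value):
    """Record one simulation result on an edge."""
    edge.visits += 1
    edge.total_value += value


# Toy position with three candidate moves (made-up numbers).
edges = {
    "A": Edge(prior=0.6, visits=10, total_value=2.0),
    "B": Edge(prior=0.3, visits=1, total_value=0.5),
    "C": Edge(prior=0.1),
}
print("selected move:", select_action(edges))
```

In the toy position the search prefers the lightly explored move with a good mean value over the heavily visited one, which is the exploration behaviour the prior-weighted bonus is meant to produce.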

Background Limitations ▪ Requires a large data set of expert games ▪ Uses handcrafted features ▪ Asynchronous training and computationally intensive Source: Google DeepMind, Mastering the Game of Go with Deep Neural Networks and Tree Search PAGE 9

Content of paper PAGE 10

Content of paper AlphaGo Zero 1. Uses no human knowledge and learns only by self-play Source: Google DeepMind, Mastering the Game of Go without Human Knowledge PAGE 11

Content of paper AlphaGo Zero Source: Google DeepMind, Mastering the Game of Go without Human Knowledge PAGE 12

Content of paper AlphaGo Zero 2. Single Neural Network with ResNet Structure ▪ Dual purpose: decide next best move and evaluate chances of winning Sources: Google DeepMind, Mastering the Game of Go without Human Knowledge; http://neural.vision/blog/article-reviews/deep-learning/he-resnet-2015/ PAGE 13
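
A network with two outputs can be trained with a single objective that sums a value term and a policy term. The sketch below follows the general form of the loss reported in the AlphaGo Zero paper (squared value error, plus policy cross-entropy, plus L2 weight regularisation); the function name, argument names, and the default regularisation constant are assumptions made for this illustration, not details taken from the slides.

```python
import math


def combined_loss(p, v, pi, z, weights=(), c=1e-4):
    """Joint loss for a two-headed network (illustrative sketch).

    p       -- predicted move probabilities (policy head)
    v       -- predicted outcome in [-1, 1] (value head)
    pi      -- target move probabilities from the search
    z       -- actual game outcome, +1 or -1
    weights -- flat iterable of network weights for the L2 term
    c       -- regularisation strength (assumed value)
    """
    value_loss = (z - v) ** 2
    # Cross-entropy between search target and prediction; zero-target moves contribute nothing.
    policy_loss = -sum(t * math.log(q) for t, q in zip(pi, p) if t > 0)
    l2_penalty = c * sum(w * w for w in weights)
    return value_loss + policy_loss + l2_penalty


# Toy example: three legal moves, the search put all its weight on the first one.
loss = combined_loss(p=[0.5, 0.25, 0.25], v=0.0, pi=[1.0, 0.0, 0.0], z=1.0)
print(f"loss = {loss:.4f}")
```

Because both terms are computed from the same forward pass, a gradient step improves move prediction and position evaluation together, which is what the dual-purpose design relies on.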

Content of paper AlphaGo Zero 3. Simpler Tree Search Source: Google DeepMind, Mastering the Game of Go without Human Knowledge PAGE 14
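
One concrete piece of the simpler search is how its result becomes a move distribution. As described in the AlphaGo Zero paper, leaves are evaluated by the network alone (no rollouts), and the move probabilities are taken in proportion to exponentiated visit counts. The sketch below illustrates only that last step; the function name, the `temperature` parameter name, and the example moves are hypothetical.

```python
def visit_counts_to_policy(visits, temperature=1.0):
    """Turn root visit counts into move probabilities.

    Each probability is proportional to N ** (1 / temperature): a temperature
    of 1 keeps the counts' proportions, smaller values sharpen the distribution.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {move: count ** (1.0 / temperature) for move, count in visits.items()}
    total = sum(scaled.values())
    if total == 0:
        raise ValueError("no visits recorded")
    return {move: s / total for move, s in scaled.items()}


# Toy visit counts after a search (made-up numbers).
visits = {"D4": 60, "Q16": 30, "K10": 10}
print(visit_counts_to_policy(visits))
print(visit_counts_to_policy(visits, temperature=0.5))
```

The resulting distribution can serve both to pick the move actually played and as the training target for the policy output.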

Content of paper AlphaGo Zero 4. Requires no handcrafted features ▪ Only requires the raw board representation and its history, plus some basic game rules, as neural network input 5. Improved computation efficiency ▪ Single machine on Google Cloud with 4 TPUs Source: Google DeepMind, Mastering the Game of Go without Human Knowledge PAGE 15
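
To make "raw board representation and its history" concrete, the sketch below builds a stack of binary planes from recent board positions. It follows the layout described in the AlphaGo Zero paper (eight history steps for each player plus one plane for the colour to play, 17 planes in total); the board encoding (1 for black, -1 for white, 0 for empty) and all function names are assumptions made for this illustration.

```python
BOARD_SIZE = 19
HISTORY = 8            # number of past positions included
BLACK, WHITE, EMPTY = 1, -1, 0


def stone_plane(board, colour):
    """Binary plane: 1 where `colour` has a stone, else 0."""
    return [[1 if cell == colour else 0 for cell in row] for row in board]


def constant_plane(value):
    """Plane filled with a single value."""
    return [[value] * BOARD_SIZE for _ in range(BOARD_SIZE)]


def encode(history, to_play):
    """Build the input planes from a list of boards (oldest first).

    Plane order: current player's stones, opponent's stones, repeated from
    the most recent position backwards; missing history is zero-filled.
    The final plane is all 1s if black is to play, all 0s otherwise.
    """
    recent = list(reversed(history[-HISTORY:]))  # most recent first
    planes = []
    for step in range(HISTORY):
        if step < len(recent):
            planes.append(stone_plane(recent[step], to_play))
            planes.append(stone_plane(recent[step], -to_play))
        else:
            planes.append(constant_plane(0))
            planes.append(constant_plane(0))
    planes.append(constant_plane(1 if to_play == BLACK else 0))
    return planes


# Toy game: empty board, then black plays at row 3, column 3; white to move.
empty = constant_plane(EMPTY)
after_black = [row[:] for row in empty]
after_black[3][3] = BLACK

planes = encode([empty, after_black], to_play=WHITE)
print("number of planes:", len(planes))
```

No liberties, ladders, or other Go-specific features appear in the input; the network sees only stone positions over time and whose turn it is.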

Empirical Evaluation ▪ Training for 3 days Source: Google DeepMind, Mastering the Game of Go without Human Knowledge PAGE 16

Empirical Evaluation ▪ Comparison of neural network architectures Source: Google DeepMind, Mastering the Game of Go without Human Knowledge PAGE 17

Empirical Evaluation ▪ Discovered existing strategies and some unknown to humans Source: Google DeepMind, Mastering the Game of Go without Human Knowledge PAGE 18

Empirical Evaluation ▪ Training for 40 days Source: Google DeepMind, Mastering the Game of Go without Human Knowledge PAGE 19

Conclusion ▪ Pure reinforcement learning is fully feasible, even in the most challenging domains ▪ It is possible to achieve superhuman performance without human knowledge ▪ In a matter of days, AlphaGo Zero rediscovered Go knowledge accumulated by humans over thousands of years; it also discovered new insights and strategies for the game PAGE 20

Discussion ▪ Some critics suggest AlphaGo is a very narrow AI that relies on many properties of Go. Do you think the algorithm can be generalized to other domains? ▪ Did this paper inspire you in any way? Any suggestions for improvement? ▪ Do you think we should use AI to discover more knowledge? ▪ How do you feel about superintelligent AI? Are you in the Elon Musk or Mark Zuckerberg camp? PAGE 21

Image sources: https://jedionston.wordpress.com/2015/02/14/go-wei-chi-vs-tafl-hnafatafl/ https://www.123rf.com/photo_69824284_stock-vector-thank-you-speech-bubble-in-retro-style-vector-illustration-isolated-on-white-background.html PAGE 22
