  1. AlphaGo, etc.

  2. Lab 4
   ● Due Feb. 29 (you have two weeks … 1.5 remaining)
   ● new game0.py with show_values for debugging

  3. Exam on Tuesday in lab
   ● I sent out a topics list last night.
   ● On Monday in lecture, we’ll be doing review problems, plus Q&A.
     ○ We’ll also do Q&A at the end today if there’s time.
     ○ I plan to send out review problems over the weekend.
   What sorts of questions will be on the exam?
   ● selecting an appropriate algorithm for various problems
     ○ state space search vs. local search; BFS vs. A*; minimax vs. MCTS...
   ● setting up an appropriate model for the problem and algorithm
     ○ generating neighbors; identifying a goal; describing utilities; choosing a heuristic...
   ● stepping through algorithms
     ○ identify the next state; list the order nodes are expanded; eliminate dominated strategies...

  4. AlphaGo neural networks
   (diagram: AlphaGo’s search compared with normal MCTS)

  5. AlphaGo neural networks
   (diagram: where the networks enter the search, in selection and evaluation)

  6. Step 1: learn to predict human moves [CS63 topic: neural networks, weeks 7, 14?]
   ● used a large database of online expert games
   ● learned two versions of the neural network
     ○ a fast network Pπ for use in evaluation
     ○ an accurate network Pσ for use in selection
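
   A minimal sketch of what this training step might look like, assuming a toy
   linear softmax policy in place of AlphaGo's deep convolutional networks;
   boards, expert_moves, and the hyperparameters are hypothetical stand-ins for
   the expert-game database and the paper's settings:

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())      # subtract max for numerical stability
        return e / e.sum()

    def train_policy(boards, expert_moves, n_moves, lr=0.01, epochs=10):
        """Fit weights W so softmax(W @ board) predicts the expert's move."""
        W = np.zeros((n_moves, boards.shape[1]))
        for _ in range(epochs):
            for x, move in zip(boards, expert_moves):
                p = softmax(W @ x)
                grad = -np.outer(p, x)   # gradient of log p[move] w.r.t. W
                grad[move] += x
                W += lr * grad           # ascend the log-likelihood
        return W

   The fast network Pπ and the accurate network Pσ share this objective; they
   differ in input features and network size, trading accuracy for speed.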

  7. Step 2: improve the accurate network [CS63 topic: reinforcement learning, weeks 9-10]
   ● run large numbers of self-play games
   ● update the network using reinforcement learning [CS63 topic: stochastic gradient ascent, week 3]
     ○ weights updated by stochastic gradient ascent
     ○ the result is the improved policy network Pρ
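
   A hedged sketch of this update in the spirit of REINFORCE: after each
   self-play game, the policy is nudged toward the moves it made if it won and
   away from them if it lost. The linear policy and all names continue the
   hypothetical setup from the Step 1 sketch:

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def reinforce_update(W, game_states, game_moves, outcome, lr=0.01):
        """outcome: +1 if this player won the self-play game, -1 if it lost.

        One pass of stochastic gradient ascent on outcome * log p(move|state).
        """
        for x, move in zip(game_states, game_moves):
            p = softmax(W @ x)
            grad_log = -np.outer(p, x)   # gradient of log p[move] w.r.t. W
            grad_log[move] += x
            W += lr * outcome * grad_log
        return W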

  8. Step 3: learn a board evaluation network, Vθ
   ● use random samples from the self-play database
   ● prediction target: probability that black wins from a given board
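
   A minimal sketch of this step, assuming logistic regression in place of the
   deep value network Vθ; samples is a hypothetical iterable of (board
   features, outcome) pairs drawn at random from the self-play database:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def train_value(samples, n_features, lr=0.01, epochs=10):
        w = np.zeros(n_features)
        for _ in range(epochs):
            for x, z in samples:          # z = 1 if black went on to win, else 0
                pred = sigmoid(w @ x)     # estimated P(black wins | board)
                w += lr * (z - pred) * x  # ascend the log-likelihood
        return w

   Drawing the samples from many distinct games, rather than consecutive
   positions of one game, keeps the training data less correlated.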

  9. AlphaGo tree policy
   Select nodes randomly according to weight: the prior is determined by the improved policy network Pρ.
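
   One way to realize "select randomly according to weight", sketched with a
   PUCT-style score similar in spirit to the paper's selection rule; the Child
   fields, c_puct, and the exponential weighting are illustrative choices, not
   AlphaGo's exact formula:

    import math
    import random
    from dataclasses import dataclass

    @dataclass
    class Child:
        Q: float      # mean value of simulations through this child
        N: int        # visit count
        prior: float  # Pρ's probability for the corresponding move

    def select_child(children, c_puct=1.0):
        """Sample a child, weighted by its value plus a prior-driven
        bonus that shrinks as the child accumulates visits."""
        total = sum(ch.N for ch in children)
        scores = [ch.Q + c_puct * ch.prior * math.sqrt(total + 1) / (1 + ch.N)
                  for ch in children]
        weights = [math.exp(s) for s in scores]  # scores -> sampling weights
        return random.choices(children, weights=weights)[0]

   The effect is that moves the policy network likes get explored early, while
   the 1 / (1 + N) factor lets the search override the prior once it has real
   simulation evidence.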

  10. AlphaGo default policy
   When expanding a node, its initial value combines:
   ● an evaluation from the value network Vθ
   ● a rollout using the fast policy Pπ
   A rollout according to Pπ selects random moves with the estimated probability a human would select them, instead of uniformly at random.
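
   A sketch of that combination, assuming a constant mixing weight lam between
   the two evaluations; the game-state interface (is_terminal, legal_moves,
   apply, winner_value) and both network callables are hypothetical:

    import random

    def rollout(state, fast_policy):
        """Play to the end of the game, sampling each move with the
        probability Pπ estimates a human would choose it."""
        while not state.is_terminal():
            moves = state.legal_moves()
            probs = [fast_policy(state, m) for m in moves]
            state = state.apply(random.choices(moves, weights=probs)[0])
        return state.winner_value()   # e.g. +1 for a black win, -1 for white

    def evaluate_leaf(state, value_net, fast_policy, lam=0.5):
        # Blend the learned evaluation with one sampled game outcome.
        return (1 - lam) * value_net(state) + lam * rollout(state, fast_policy)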

  11. AlphaGo results
   ● Beat a low-ranked professional player (Fan Hui) 5 games to 0.
   ● Will take on a top professional player (Lee Sedol) March 8-15 in Seoul.
   ● There are good reasons to think AlphaGo may lose:
     ○ AlphaGo’s estimated Elo rating is lower than Lee’s.
     ○ Professionals who analyzed AlphaGo’s moves don’t think it can win.
     ○ Deep Blue lost to Kasparov on its first attempt, after beating lower-ranked grandmasters.

  12. Transforming normal to extensive form
   Key idea: represent simultaneous moves with information sets.

                 Player 2
                  A      B
   Player 1  A   5,5    2,8
             B   1,3    3,0

   (diagram: the same game as a tree; player 1 moves first, player 2's two
   decision nodes are joined in a single information set, and the leaves
   carry the payoffs (5,5) (2,8) (1,3) (3,0))

  13. Transforming extensive to normal form
   Key idea: strategies are complete policies, specifying an action for every information set.

   (diagram: a game tree in which player 1 has three information sets and
   player 2 has one; the leaves carry the payoffs (1,2) (0,3) (4,4) (1,4)
   (3,2) (0,0))

                 Player 2
                  L      R
        LLL      1,2    4,4
        LLR      1,2    4,4
        LRL      0,3    4,4
        LRR      0,3    4,4
        RLL      1,4    3,2
        RLR      1,4    0,0
        RRL      1,4    3,2
        RRR      1,4    0,0
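
   A sketch of the tree-to-matrix conversion for this slide's game, using a
   tree reconstructed from the payoff matrix above (the reconstruction is an
   assumption, since the original figure didn't survive): player 1 moves at
   the root, player 2 replies, and player 1 sometimes moves again at a second
   (b) or third (c) information set. A strategy such as LRL fixes one action
   per information set, even for sets the play never reaches:

    from itertools import product

    def play(s1, s2):
        """Follow the game tree for player 1's complete policy
        s1 = (root action, action at set b, action at set c)
        and player 2's single action s2; return the leaf payoffs."""
        a, b, c = s1
        if a == "L":
            if s2 == "L":
                return (1, 2) if b == "L" else (0, 3)
            return (4, 4)                      # L then R ends the game
        if s2 == "L":
            return (1, 4)                      # R then L ends the game
        return (3, 2) if c == "L" else (0, 0)

    # Enumerate all complete policies and print the 8x2 payoff matrix.
    for s1 in product("LR", repeat=3):
        print("".join(s1), [play(s1, s2) for s2 in "LR"])

   Running this reproduces the matrix above, including the duplicate rows:
   LLL and LLR differ only at an information set that is unreachable once
   player 1 opens with L.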

  14. DESIGN DIMENSIONS
   - modularity
   - representation scheme
   - discreteness
   - planning horizon
   - uncertainty
   - dynamic environment
   - number of agents
   - learning
   - computational limitations

   STATE SPACE SEARCH
   - state space modeling
   - completeness
   - optimality
   - time/space complexity
   Uninformed Search
   - depth-first
   - breadth-first
   - uniform cost
   Informed Search
   - greedy
   - A*
   - heuristics, admissibility
   Improvements
   - iterative deepening
   - branch and bound, IDA*
   - multiple searches

   LOCAL SEARCH
   - state spaces
   - cost functions
   - neighbor generation
   - heuristic evaluation
   Hill-Climbing
   - random restarts
   - random moves
   - simulated annealing
   - temperature, decay rate
   Population Search
   - (stochastic) beam search
   - gibbs sampling
   - genetic algorithms
   - select/crossover/mutate
   - state representation
   - satisfiability
   - gradient ascent

   GAME THEORY
   Utility
   - preferences
   - expected utility maximizing
   Extensive-Form Games
   - game tree representation
   - backwards induction
   - minimax
   - alpha-beta pruning
   Normal Form Games
   - payoff matrix repr.
   - removing dominated strats
   - pure-strategy Nash eq.
     - find one
   - mixed strategy Nash eq.
     - verify one
   - matrix/tree equivalence

   MONTE CARLO SEARCH
   - random sampling evaluation
   - explore/exploit tradeoff
   Monte Carlo Tree Search
   - tree policy
   - default policy
   - UCT/UCB
