Thinking Fast and Slow with Deep Learning and Tree Search
Thomas Anthony, Zheng Tian, and David Barber (University College London)
Presented by Alex Adam and Fartash Faghri, CSC2547
Hex: the two-player connection game used as the paper's test domain
What is MCTS?
● A tree search algorithm that addresses limitations of alpha-beta search
● Alpha-beta search explores O(B^D) nodes in the worst case
● MCTS approximates alpha-beta search by focusing on promising actions and using simulations
1. Select nodes according to the tree policy (e.g. UCT)
2. At a leaf node:
   a. If the node has not been explored, simulate until the end of the game
   b. If the node has been explored, add its child states to the tree, then simulate from a random child state
3. Update the visit counts and value estimates (used to compute UCT) of the nodes along the path from the leaf to the root
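A minimal sketch of one MCTS iteration following the steps above; the Node class, the helpers legal_actions, step, and simulate_to_end, and the exploration constant are assumptions for illustration, not the paper's implementation.

```python
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children = {}               # action -> Node
        self.n, self.total_reward = 0, 0.0

def uct_score(parent, child, c=1.41):
    if child.n == 0:
        return float("inf")              # always try unvisited children first
    # exploit (mean reward) + explore (visit-count bonus)
    return (child.total_reward / child.n
            + c * math.sqrt(math.log(parent.n) / child.n))

def mcts_iteration(root, legal_actions, step, simulate_to_end):
    # 1. Selection: follow the highest-UCT child down to a leaf
    node = root
    while node.children:
        node = max(node.children.values(), key=lambda ch: uct_score(node, ch))
    # 2. Expansion: if the leaf has been visited before, add its children
    if node.n > 0:
        for a in legal_actions(node.state):
            node.children[a] = Node(step(node.state, a), parent=node)
        if node.children:
            node = random.choice(list(node.children.values()))
    # 3. Simulation: random rollout from the chosen state to the end of the game
    reward = simulate_to_end(node.state)
    # 4. Backpropagation: update statistics along the path back to the root
    while node is not None:
        node.n += 1
        node.total_reward += reward
        node = node.parent
```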
MCTS in Action
Why not REINFORCE?
Find a policy π_θ that maximizes the expected reward:
    J(θ) = E_{τ ~ π_θ}[ R(τ) ]
Gradient estimator (score function):
    ∇_θ J(θ) = E_{τ ~ π_θ}[ ∇_θ log π_θ(a | s) · R(τ) ]
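A toy Monte Carlo estimate of the score-function (REINFORCE) gradient for a softmax policy over a single state; the three-action reward table and step size are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(3)                     # logits of a 3-action softmax policy
rewards = np.array([0.0, 1.0, 0.2])     # hypothetical reward for each action

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Monte Carlo estimate of grad J(theta) = E[ grad log pi(a) * r(a) ]
grad = np.zeros_like(theta)
N = 1000
for _ in range(N):
    p = softmax(theta)
    a = rng.choice(3, p=p)
    grad_log_pi = -p
    grad_log_pi[a] += 1.0               # d/dtheta of log softmax(theta)[a]
    grad += grad_log_pi * rewards[a]
grad /= N

theta += 0.1 * grad                     # one ascent step on the expected reward
```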
Why not REINFORCE? Challenges:
● Only differentiable policies can be used (hence the use of MCTS!)
● REINFORCE gradient estimates have high variance
● Need to compute r(s, a) efficiently
  ○ Solution 1: do roll-outs to compute it exactly (with a bit of MCTS)
  ○ Solution 2: approximate r(s, a) with a neural network, called the value network
Imitation Learning
● Consists of an expert and an apprentice
● The apprentice tries to mimic the expert
[Diagram: Expert → Apprentice]
Imitation Learning Limits
● The apprentice will never exceed the performance of the expert
● Nothing can beat tree search given infinite resources and time
● In many domains, like game playing, the expert might not be good enough
ExIt Pseudocode
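Since the slide only names the pseudocode, here is a hedged sketch of the Expert Iteration loop described in the paper; self_play_states, mcts_expert_policy, and train_apprentice are hypothetical helpers, not the authors' code.

```python
def exit_loop(apprentice, iterations, n_states):
    """Expert Iteration: the MCTS expert produces improved move
    distributions, and the apprentice network imitates them."""
    dataset = []
    for _ in range(iterations):
        # 1. Sample positions by self-play with the current apprentice.
        states = self_play_states(apprentice, n_states)
        # 2. Expert improvement: run MCTS guided by the apprentice
        #    to get a stronger move distribution at each position.
        for s in states:
            target = mcts_expert_policy(s, apprentice)   # e.g. tree-policy targets
            dataset.append((s, target))
        # 3. Imitation learning: fit the apprentice to the expert targets.
        apprentice = train_apprentice(apprentice, dataset)
    return apprentice
```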
The Minimal Policy Improvement Technique
MCTS as a policy improvement operator.
Define the goal of learning as finding a policy π* that is a fixed point of the improvement operator:
    MCTS(π*) = π*
Gradient descent to solve this: instead of minimizing the norm of the difference
    || MCTS(π_θ) - π_θ ||,
minimize the cross entropy between the MCTS policy (held fixed as a target) and the apprentice policy:
    L(θ) = E_s[ -Σ_a MCTS(π_θ)(a | s) log π_θ(a | s) ]
Learning Targets
● Chosen-Action Targets (CAT) loss:
    L_CAT = -log π̂(a* | s),
  where a* is the move selected by MCTS.
● Tree-Policy Targets (TPT) loss:
    L_TPT = -Σ_a ( n(s, a) / n(s) ) log π̂(a | s),
  where n(s, a) is the number of times the edge (s, a) has been traversed.
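A small numpy sketch of the two targets for a single state; the logits and MCTS visit counts are made-up example values, and using the most-visited move as the chosen action is an assumption for illustration.

```python
import numpy as np

def log_softmax(z):
    z = z - z.max()
    return z - np.log(np.exp(z).sum())

logits = np.array([0.5, 1.2, -0.3])      # apprentice policy logits for one state
visit_counts = np.array([10, 85, 5])     # n(s, a) from the MCTS expert

# Chosen-Action Target: cross entropy against the single move MCTS selected
# (here taken to be the most-visited move).
a_star = visit_counts.argmax()
cat_loss = -log_softmax(logits)[a_star]

# Tree-Policy Target: cross entropy against the normalised visit counts.
tpt = visit_counts / visit_counts.sum()
tpt_loss = -(tpt * log_softmax(logits)).sum()
```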
Expert Improvement
Upper Confidence bounds applied to Trees (UCT):
    UCT(s, a) = r(s, a) / n(s, a) + c_b · sqrt( log n(s) / n(s, a) )
Bias the MCTS tree policy towards moves the apprentice prefers by adding a prior term:
    UCT_bias(s, a) = UCT(s, a) + c_w · π̂(a | s) / ( n(s, a) + 1 )
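A sketch of the biased tree policy following the formula above; the constants c_b and c_w and the example numbers are illustrative assumptions, not the values used in the paper.

```python
import numpy as np

def biased_uct(q, n_s, n_sa, pi_hat, c_b=1.0, c_w=10.0):
    """UCT score plus an apprentice-policy bonus that decays with visits."""
    explore = c_b * np.sqrt(np.log(n_s) / np.maximum(n_sa, 1))
    prior = c_w * pi_hat / (n_sa + 1)
    return q + explore + prior

# Example: three candidate moves at a node visited 100 times.
scores = biased_uct(q=np.array([0.4, 0.5, 0.1]),
                    n_s=100,
                    n_sa=np.array([30, 60, 10]),
                    pi_hat=np.array([0.2, 0.7, 0.1]))
best_action = scores.argmax()
```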
Value Network and AlphaGo Zero
● Value networks can do better than random rollouts if trained with enough data
● AlphaGo Zero is very similar to ExIt, with a slight difference in the loss function (policy and value are trained jointly)
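For concreteness, a sketch of the AlphaGo Zero-style joint loss mentioned above (squared value error plus policy cross entropy plus L2 regularisation); the function name, shapes, and regularisation constant are illustrative, not the paper's code.

```python
import numpy as np

def alphago_zero_loss(v, z, log_p, pi_mcts, weights, c=1e-4):
    """l = (z - v)^2 - pi^T log p + c * ||theta||^2"""
    value_loss = (z - v) ** 2                          # value head vs. game outcome
    policy_loss = -(pi_mcts * log_p).sum()             # policy head vs. MCTS visit distribution
    l2 = c * sum((w ** 2).sum() for w in weights)      # weight decay on all parameters
    return value_loss + policy_loss + l2
```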
Results: ExIt vs REINFORCE
Results: Value and Policy ExIt vs MoHEX
References
● Anthony, Thomas, Zheng Tian, and David Barber. "Thinking Fast and Slow with Deep Learning and Tree Search." Advances in Neural Information Processing Systems. 2017.
● Silver, David, et al. "Mastering the Game of Go Without Human Knowledge." Nature 550.7676 (2017): 354-359.
● http://www.inference.vc/alphago-zero-policy-improvement-and-vector-fields/
● Farquhar, Gregory, et al. "TreeQN and ATreeC: Differentiable Tree Planning for Deep Reinforcement Learning." arXiv preprint arXiv:1710.11417 (2017).