Extending MCTS 2-17-16
Reading Quiz (from Monday)
What is the relationship between Monte Carlo tree search and upper confidence bound applied to trees?
a) MCTS is a type of UCT
b) UCT is a type of MCTS
c) both (they are the same algorithm)
d) neither (they are different algorithms)
Reading Quiz
Which of these functions from the lab4 pseudocode implements the tree policy?
a) UCB_sample
b) random_playout
c) backpropagation
d) none of these
Generic MCTS algorithm
The tree policy returns a child node in the explored region of the tree. UCT's tree policy draws samples according to UCB.
The default policy returns a value estimate for a newly expanded node. UCT's default policy completes a uniform random playout.
function MCTS(root, rollouts)
    for i = 1 : rollouts
        node = root
        # selection
        while all children expanded and node is not terminal
            node = UCB_sample(node)
        # expansion
        if node not terminal
            node = expand(random unexpanded child of node)
        # simulation
        outcome = random_playout(node's state)
        # backpropagation
        backpropagation(node, root, outcome)
    return move that generates the highest-value successor of root
        (from the current player's perspective)
function UCB_sample(node)
    weights = [UCB_weight(child) for each child of node]
    distribution = normalize(weights)
    return random sample from distribution

function random_playout(state)
    while state is not terminal
        state = random successor of state
    return winner

function backpropagation(node, root, outcome)
    until node is root
        increment node's visits
        update_value(node, outcome)
        node = parent of node
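Below is a minimal, runnable Python sketch of the same loop. The Node fields and the State methods (successors, is_terminal, outcome) are assumptions made for illustration, not part of the lab4 code, and the perspective handling in backpropagation is deliberately simplified; C = 2 matches the constant used in the UCB exercise later in these slides.

import math
import random

C = 2  # tunable exploration parameter

class Node:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = []                       # expanded successors
        self.untried = list(state.successors())  # assumed State method
        self.visits = 0
        self.value_sum = 0.0                     # running sum of backed-up outcomes

    def value(self):
        return self.value_sum / self.visits if self.visits else 0.0

def UCB_weight(node):
    # value estimate plus an exploration bonus that shrinks as visits grow
    return node.value() + C * math.sqrt(math.log(node.parent.visits) / node.visits)

def UCB_sample(node):
    weights = [UCB_weight(child) for child in node.children]
    total = sum(weights)
    return random.choices(node.children, weights=[w / total for w in weights])[0]

def random_playout(state):
    while not state.is_terminal():
        state = random.choice(list(state.successors()))
    return state.outcome()  # assumed: numeric outcome of the terminal state

def backpropagation(node, root, outcome):
    while node is not root:
        node.visits += 1
        node.value_sum += outcome  # a real implementation flips perspective per player
        node = node.parent
    root.visits += 1               # keep parent-visit counts valid for UCB_weight

def MCTS(root, rollouts):
    for _ in range(rollouts):
        node = root
        # selection: descend while fully expanded and not terminal
        while not node.untried and node.children:
            node = UCB_sample(node)
        # expansion: add one random unexpanded child
        if node.untried:
            child = Node(node.untried.pop(random.randrange(len(node.untried))), parent=node)
            node.children.append(child)
            node = child
        # simulation
        outcome = random_playout(node.state)
        # backpropagation
        backpropagation(node, root, outcome)
    # return the successor with the highest value estimate
    return max(root.children, key=lambda c: c.value())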
Upper confidence bound (UCB)
Pick each child node with probability proportional to:

    UCB_weight = value + C * sqrt( ln(parent node visits) / visits )

where value is the child's value estimate, visits is the child's number of visits, and C is a tunable parameter.
● probability is decreasing in the number of visits (explore)
● probability is increasing in a node's value (exploit)
● always tries every option once
Exercise: construct the UCB distribution

Parent: visits = 19, value = .68
Children:
    visits = 5,  value = .6
    visits = 2,  value = .5
    visits = 12, value = .75
    visits = 1,  value = 0

w    = [ 2.13  2.93  1.74  3.43 ]
prob = [ .209  .286  .170  .335 ]
The next time we select the parent...
Which values change? How much?

Parent: visits = 20, value = .65
Children:
    visits = 5,  value = .6
    visits = 2,  value = .5
    visits = 12, value = .75
    visits = 2,  value = 0

old:  w = [ 2.13  2.93  1.74  3.43 ]   prob = [ .209  .286  .170  .335 ]
new:  w = [ 2.15  2.95  1.75  2.45 ]   prob = [ .231  .317  .188  .263 ]
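For reference, the weights on both exercise slides can be reproduced with a few lines of Python; the exploration constant C = 2 is inferred from the numbers rather than stated on the slide.

import math

def ucb_weight(value, visits, parent_visits, C=2):
    return value + C * math.sqrt(math.log(parent_visits) / visits)

children = [(.6, 5), (.5, 2), (.75, 12), (0, 1)]
w = [ucb_weight(v, n, parent_visits=19) for v, n in children]
prob = [x / sum(w) for x in w]
# w    -> [2.13, 2.93, 1.74, 3.43]
# prob -> [.209, .286, .170, .335]

# After the fourth child is selected once (outcome 0), only the parent's visits
# and value and that child's visits change, but every weight moves a little
# because ln(parent visits) grew; the sampled child's weight drops the most.
children = [(.6, 5), (.5, 2), (.75, 12), (0, 2)]
w = [ucb_weight(v, n, parent_visits=20) for v, n in children]
# w -> [2.15, 2.95, 1.75, 2.45]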
Alternative tree policies
The tree policy must trade off exploration and exploitation.
● Epsilon-greedy: pick a uniform random child with probability ε and the best child with probability (1 - ε) (see the sketch below).
● Use UCB, but seed the tree with initial values
    ○ from previous runs
    ○ based on a heuristic
● Other ideas?
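A sketch of the epsilon-greedy option, assuming the Node interface from the MCTS sketch above:

import random

def epsilon_greedy_sample(node, epsilon=0.1):
    if random.random() < epsilon:
        return random.choice(node.children)              # explore
    return max(node.children, key=lambda c: c.value())   # exploit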
Alternative default policies
The default policy must be fast to evaluate and return a value estimate.
● Use the board evaluation heuristic from bounded minimax.
● Run multiple random rollouts for each expanded node (see the sketch below).
● Other ideas?
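One way to realize the multiple-rollout idea, reusing the random_playout sketch above (which is assumed to return a numeric outcome):

def averaged_playouts(state, num_rollouts=5):
    # average the outcomes of several independent random playouts
    return sum(random_playout(state) for _ in range(num_rollouts)) / num_rollouts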
Options for returning a move
● Return the neighbor with the best value estimate.
● Return the neighbor you've visited the most.
● Some combination of the above:
    ○ Continue simulating until they agree (sketched below).
    ○ Use some weighted combination.
        ■ Question: could we use UCB_weight for this?
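A sketch of "continue simulating until they agree", built on the MCTS and Node sketches above; the batch size and round cap are arbitrary choices, not part of the lab code:

def best_move(root, extra_rollouts=100, max_rounds=20):
    # run extra batches of rollouts until the highest-value child and the
    # most-visited child coincide, or give up and trust visit counts
    for _ in range(max_rounds):
        by_value = max(root.children, key=lambda c: c.value())
        by_visits = max(root.children, key=lambda c: c.visits)
        if by_value is by_visits:
            return by_value
        MCTS(root, extra_rollouts)
    return by_visits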
Extension: dynamic or unobservable environment
We're already doing Monte Carlo sampling; just sample over the unknowns!
When we select this action, go to the left child 40% of the time and the right child 60%.
[Diagram: a game tree in which the selected action leads to a chance node N with branch probabilities .4 and .6, above alternating player-1 and player-2 nodes and numeric leaf values.]
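A sketch of sampling over the unknowns inside the playout; the successor_distribution() method returning (probability, state) pairs is a hypothetical interface, not part of the lab code:

def stochastic_playout(state):
    # like random_playout, but chance nodes are resolved by sampling from
    # their stated distribution (e.g. .4 / .6) instead of uniformly
    while not state.is_terminal():
        dist = state.successor_distribution()   # assumed: [(prob, state), ...]
        probs = [p for p, _ in dist]
        succs = [s for _, s in dist]
        state = random.choices(succs, weights=probs)[0]
    return state.outcome()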
Extension: non-zero-sum games
● We now have a tuple of utilities at each outcome node.
● We can maintain a tuple of value estimates at each search tree node.
● The agent deciding at the parent node will use its entry in the value tuple when picking a child node to expand.
[Diagram: player 1 chooses L or R at the root; player 2 moves at each child; the four outcomes have utility tuples (3,1), (1,2), (2,1), (0,0).]
Exercise: construct the UCB distribution

Parent: player 2 to move, visits = 20, value = (2.4, 3.4, 2.55)
Children:
    player 3 to move, visits = 5,  value = (0, 3, 5)
    player 3 to move, visits = 2,  value = (9, 1, 5)
    player 1 to move, visits = 12, value = (2, 4, 1)
    player 1 to move, visits = 1,  value = (6, 3, 4)

w    = [ 4.55  3.45  5.00  6.46 ]
prob = [ .234  .177  .257  .332 ]
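The exercise numbers can be reproduced by applying the UCB weight to just the deciding player's entry in each value tuple (player 2, so index 1 in a (p1, p2, p3) tuple, with C = 2 as before):

import math

def ucb_weight_tuple(value_tuple, player, visits, parent_visits, C=2):
    # only the deciding player's entry of the value tuple contributes
    return value_tuple[player] + C * math.sqrt(math.log(parent_visits) / visits)

children = [((0, 3, 5), 5), ((9, 1, 5), 2), ((2, 4, 1), 12), ((6, 3, 4), 1)]
player = 1  # player 2's entry in the value tuple
w = [ucb_weight_tuple(v, player, n, parent_visits=20) for v, n in children]
prob = [x / sum(w) for x in w]
# w    -> [4.55, 3.45, 5.00, 6.46]
# prob -> [.234, .177, .257, .332]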
Comparing to minimax / backwards induction

UCT / MCTS:
● optimal with infinite rollouts
● anytime algorithm (can give an answer immediately, improves its answer with more time)
● A heuristic is not required, but can be used if available.
● Handles incomplete information gracefully.

Minimax / backwards induction:
● optimal once the entire tree is explored or pruned
● can prove the outcome of the game
● Can be made anytime-ish with iterative deepening.
● A heuristic is required unless the game tree is small.
● Hard to use on incomplete information games.