Planning and Optimization
G4. Asymptotically Suboptimal Monte-Carlo Methods
Gabriele Röger and Thomas Keller
Universität Basel, December 5, 2018
Content of this Course

[Course overview figure: Classical Planning (Tasks, Progression/Regression, Complexity, Heuristics) and Probabilistic Planning with MDPs (Blind Methods, Heuristic Search, Monte-Carlo Methods); this chapter covers Monte-Carlo Methods.]
Motivation
Monte-Carlo Methods: Brief History

- 1930s: first researchers experiment with Monte-Carlo methods
- 1998: Ginsberg's GIB player competes with Bridge experts
- 2002: Kearns et al. propose Sparse Sampling
- 2002: Auer et al. present UCB1 action selection for multi-armed bandits
- 2006: Coulom coins the term Monte-Carlo Tree Search (MCTS)
- 2006: Kocsis and Szepesvári combine UCB1 and MCTS into the famous MCTS variant UCT
- 2007-2016: constant progress of MCTS in Go culminates in AlphaGo's historic defeat of 9-dan player Lee Sedol
Monte-Carlo Methods
Monte-Carlo Methods: Idea

- "Monte-Carlo methods" summarizes a broad family of algorithms
- Decisions are based on random samples (Monte-Carlo sampling)
- Results of samples are aggregated by computing the average (Monte-Carlo backups)
- Apart from that, the algorithms can differ significantly
- Careful: there are many different definitions of MC methods in the literature
Monte-Carlo Backups

Algorithms presented so far used full Bellman backups to update state-value estimates:

  \hat{V}_{i+1}(s) := \min_{\ell \in L(s)} \Big( c(\ell) + \sum_{s' \in S} T(s, \ell, s') \cdot \hat{V}_i(s') \Big)

Monte-Carlo methods use Monte-Carlo backups instead:

  \hat{V}_i(s) := \frac{1}{N(s)} \sum_{k=1}^{i} C_k(s),

where N(s) \le i is a counter for the number of state-value estimates for state s in the first i algorithm iterations, and C_k(s) is the cost of the k-th iteration for state s (assume C_k(s) = 0 for iterations without an estimate for s).

Advantage: no need to know the SSP model; a simulator that samples successor states and costs is sufficient.
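As a hedged illustration (not part of the lecture), the Monte-Carlo backup can be realized as an incremental running average; the function and variable names below are illustrative assumptions:

```python
from collections import defaultdict

# Running statistics for the Monte-Carlo estimates (illustrative names).
N = defaultdict(int)        # N[s]: number of iterations that produced a cost sample for s
V_hat = defaultdict(float)  # V_hat[s]: current Monte-Carlo estimate of the value of s

def monte_carlo_backup(s, sampled_cost):
    """Fold one sampled cost C_k(s) into the average V_hat[s] = (1/N(s)) * sum_k C_k(s)."""
    N[s] += 1
    V_hat[s] += (sampled_cost - V_hat[s]) / N[s]  # incremental average update
```

Iterations that produce no estimate for s simply do not call the update for s, which matches setting C_k(s) = 0 in the sum while dividing only by N(s).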
Hindsight Optimization
Hindsight Optimization: Idea

Perform samples as long as resources (deliberation time, memory) allow:
- Sample outcomes of all actions ⇒ deterministic (classical) planning problem
- For each applicable action ℓ ∈ L(s_0), compute a plan in the sample that starts with ℓ

Execute the action with the lowest average plan cost.
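A minimal sketch of the HOP decision rule, assuming hypothetical helpers sample_determinization (fixes all action outcomes in advance) and plan_cost (a classical planner that returns the optimal plan cost when forced to start with a given action):

```python
def hop_action(s0, applicable_actions, num_samples, sample_determinization, plan_cost):
    """Hindsight optimization: average the clairvoyant plan costs per first action."""
    totals = {a: 0.0 for a in applicable_actions}
    for _ in range(num_samples):
        task = sample_determinization(s0)            # sampled outcomes -> classical task
        for a in applicable_actions:
            totals[a] += plan_cost(task, first_action=a)
    # execute the action with the lowest average plan cost
    return min(applicable_actions, key=lambda a: totals[a] / num_samples)
```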
Hindsight Optimization: Example

[Figure: grid world with initial state s_0 at the bottom left and goal state s⋆ at the top right.]

- cost of 1 for all actions except for moving away from (3,4), where the cost is 3
- the agent gets stuck when moving away from gray cells with probability 0.6
Hindsight Optimization: Example (continued)

[Figures: a sequence of grids showing the 1st sample and its costs C_1(s), the resulting estimates V̂_1(s) with the induced greedy actions, the 2nd sample with C_2(s) and V̂_2(s), and the estimates V̂_10(s), V̂_100(s) and V̂_1000(s) after 10, 100 and 1000 samples.]

- Samples can be described by the number of times the agent is stuck
- Multiplication with the cost to move away from a cell gives the cost of leaving that cell in the sample (a sketch of this sampling step follows below)
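As a hedged reading of how a single sample is generated in this example (the geometric retry interpretation of "getting stuck" is an assumption): each attempt to leave a gray cell is paid, and with probability 0.6 the agent stays and must try again, so the sampled cost of leaving the cell is the number of attempts times the move cost.

```python
import random

def sampled_leaving_cost(move_cost, stuck_prob=0.6):
    """Cost of leaving one gray cell in a sample: every attempt is paid,
    and with probability stuck_prob the agent stays and must try again."""
    attempts = 1
    while random.random() < stuck_prob:
        attempts += 1
    return attempts * move_cost
```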
Hindsight Optimization: Evaluation

HOP is well-suited for some problems. It must be possible to solve the sampled MDP efficiently, e.g., with
- domain-dependent knowledge (e.g., games like Bridge or Skat)
- a classical planner (FF-Hindsight, Yoon et al., 2008)

What about optimality in the limit?
Hindsight Optimization: Optimality in the Limit

[Figure: an SSP with states s_0, ..., s_6 and goal s_6. In s_0, action a_1 leads toward s_1 with further stochastic branches via s_3 and s_4 (edge costs including 0, 10 and 20), while action a_2 leads via s_2 and s_5 to the goal with a deterministic total cost of 6.]

[Figures: the two sampled determinizations of this SSP, occurring with sample probability 60% and 40%, respectively.]

With k \to \infty:
  \hat{Q}_k(s_0, a_1) \to 4
  \hat{Q}_k(s_0, a_2) \to 6
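Assuming the clairvoyant plan through a_1 costs 0 in the 60% determinization and 10 in the 40% determinization, while the plan through a_2 costs 6 in both (a reading of the figures above), the limits follow as:

  \hat{Q}_k(s_0, a_1) \to 0.6 \cdot 0 + 0.4 \cdot 10 = 4
  \hat{Q}_k(s_0, a_2) \to 0.6 \cdot 6 + 0.4 \cdot 6 = 6

HOP therefore prefers a_1, even though the agent cannot know in advance which outcome it will face, so committing to a_1 can be worse than the guaranteed cost of 6 via a_2.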
Hindsight Optimization: Evaluation

HOP is well-suited for some problems. It must be possible to solve the sampled MDP efficiently, e.g., with
- domain-dependent knowledge (e.g., games like Bridge or Skat)
- a classical planner (FF-Hindsight, Yoon et al., 2008)

What about optimality in the limit?
⇒ in general not optimal due to the assumption of clairvoyance
Policy Simulation
Policy Simulation: Idea

Avoid clairvoyance by separating the computation of the policy from its evaluation.

Perform samples as long as resources (deliberation time, memory) allow:
- Sample outcomes of all actions ⇒ deterministic (classical) planning problem
- Compute a policy by solving the sample
- Simulate the policy

Execute the action with the lowest average simulation cost.
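A hedged sketch under the same assumptions as the HOP sketch above, plus a hypothetical simulate helper that executes a policy in the true stochastic problem and returns the incurred cost; evaluating per applicable first action (as in HOP) is also an assumption:

```python
def policy_simulation_action(s0, applicable_actions, num_samples,
                             sample_determinization, solve, simulate):
    """Policy simulation: plan on a sampled determinization,
    but evaluate the resulting policy by simulating it stochastically."""
    totals = {a: 0.0 for a in applicable_actions}
    for _ in range(num_samples):
        task = sample_determinization(s0)           # sampled outcomes -> classical task
        for a in applicable_actions:
            policy = solve(task, first_action=a)    # policy/plan computed on the sample
            totals[a] += simulate(policy, s0)       # cost of executing it without clairvoyance
    return min(applicable_actions, key=lambda a: totals[a] / num_samples)
```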
Policy Simulation: Example

[Figures: the same grid world as in the HOP example, followed by the 1st sample with its per-cell costs.]