Planning and Optimization
F1. Markov Decision Processes
Malte Helmert and Thomas Keller
Universität Basel
November 27, 2019
Content of this Course
[course overview: Planning — Classical: Foundations, Logic, Heuristics, Constraints; Probabilistic: Explicit MDPs, Factored MDPs]
Content of this Course: Explicit MDPs
[part overview: Explicit MDPs — Foundations, Linear Programming, Policy Iteration, Value Iteration]
Motivation
Limitations of Classical Planning
- timetable for astronauts on ISS
Generalization of Classical Planning: Temporal Planning
- timetable for astronauts on ISS
- concurrency required for some experiments
- optimize makespan
Limitations of Classical Planning
- kinematics of robotic arm
Generalization of Classical Planning: Numeric Planning
- kinematics of robotic arm
- state space is continuous
- preconditions and effects described by complex functions
Limitations of Classical Planning
[5×5 grid of image patches]
- satellite takes images of patches on Earth
Generalization of Classical Planning: MDPs
[5×5 grid of image patches]
- satellite takes images of patches on Earth
- weather forecast is uncertain
- find solution with lowest expected cost
Limitations of Classical Planning
- Chess
Generalization of Classical Planning: Multiplayer Games
- Chess: there is an opponent with a contradictory objective
Limitations of Classical Planning
- Solitaire
Generalization of Classical Planning: POMDPs
- Solitaire: some state information cannot be observed
- must reason over belief for good behaviour
Limitations of Classical Planning
- many applications are combinations of these
- all of these are active research areas
- we focus on one of them: probabilistic planning with Markov decision processes
- MDPs are closely related to games (Why?)
Markov Decision Process
Markov Decision Processes
- Markov decision processes (MDPs) studied since the 1950s
- work up to the 1980s mostly on theory and basic algorithms for small to medium-sized MDPs (→ Part F)
- today, focus on large, factored MDPs (→ Part G)
- fundamental data structure for reinforcement learning (not covered in this course) and for probabilistic planning
- different variants exist
Reminder: Transition Systems

Definition (Transition System)
A transition system is a 6-tuple $\mathcal{T} = \langle S, L, c, T, s_0, S_\star \rangle$ where
- $S$ is a finite set of states,
- $L$ is a finite set of (transition) labels,
- $c: L \to \mathbb{R}^+_0$ is a label cost function,
- $T \subseteq S \times L \times S$ is the transition relation,
- $s_0 \in S$ is the initial state, and
- $S_\star \subseteq S$ is the set of goal states.
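The six components map directly onto a data structure. Below is a minimal sketch in Python; the class and field names are illustrative, not part of the slides.

```python
from dataclasses import dataclass

State = str
Label = str

@dataclass(frozen=True)
class TransitionSystem:
    states: frozenset[State]                            # S
    labels: frozenset[Label]                            # L
    cost: dict[Label, float]                            # c: L -> R+_0
    transitions: frozenset[tuple[State, Label, State]]  # T ⊆ S × L × S
    initial_state: State                                # s_0
    goal_states: frozenset[State]                       # S_* ⊆ S
```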
Reminder: Transition System Example
[transition system diagram over states LL, TL, LR, TR, RL, RR]
Logistics problem with one package, one truck, two locations:
- location of package: {L, R, T}
- location of truck: {L, R}
Stochastic Shortest Path Problem

Definition (Stochastic Shortest Path Problem)
A stochastic shortest path problem (SSP) is a 6-tuple $\mathcal{T} = \langle S, L, c, T, s_0, S_\star \rangle$, where
- $S$ is a finite set of states,
- $L$ is a finite set of (transition) labels (or actions),
- $c: L \to \mathbb{R}^+_0$ is a label cost function,
- $T: S \times L \times S \mapsto [0, 1]$ is the transition function,
- $s_0 \in S$ is the initial state, and
- $S_\star \subseteq S$ is the set of goal states.

For all $s \in S$ and $\ell \in L$ with $T(s, \ell, s') > 0$ for some $s' \in S$, we require $\sum_{s' \in S} T(s, \ell, s') = 1$.

Note: An SSP is the probabilistic pendant of a transition system.
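Compared to the transition system sketch above, only the transition relation changes: it becomes a function into $[0, 1]$. A minimal sketch, storing $T$ sparsely and checking the normalization requirement from the definition (names again illustrative):

```python
from dataclasses import dataclass

State = str
Label = str

@dataclass(frozen=True)
class SSP:
    states: frozenset[State]
    labels: frozenset[Label]
    cost: dict[Label, float]                        # c: L -> R+_0
    # T: S x L x S -> [0, 1], stored sparsely as (s, l) -> {s': p}
    transition: dict[tuple[State, Label], dict[State, float]]
    initial_state: State
    goal_states: frozenset[State]

    def validate(self) -> None:
        # outcome probabilities of every applicable (s, l) pair must sum to 1
        for (s, l), outcomes in self.transition.items():
            total = sum(outcomes.values())
            if abs(total - 1.0) > 1e-9:
                raise ValueError(f"T({s}, {l}, .) sums to {total}, not 1")
```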
Reminder: Transition System Example
[transition system diagram over states LL, TL, LR, TR, RL, RR, now with outcome probabilities 0.8 and 0.2 on the move actions]
Logistics problem with one package, one truck, two locations:
- location of package: {L, R, T}
- location of truck: {L, R}
- if truck moves with package, 20% chance of losing package
Markov Decision Process

Definition (Markov Decision Process)
A (discounted reward) Markov decision process (MDP) is a 6-tuple $\mathcal{T} = \langle S, L, R, T, s_0, \gamma \rangle$, where
- $S$ is a finite set of states,
- $L$ is a finite set of (transition) labels (or actions),
- $R: S \times L \to \mathbb{R}$ is the reward function,
- $T: S \times L \times S \mapsto [0, 1]$ is the transition function,
- $s_0 \in S$ is the initial state, and
- $\gamma \in (0, 1)$ is the discount factor.

For all $s \in S$ and $\ell \in L$ with $T(s, \ell, s') > 0$ for some $s' \in S$, we require $\sum_{s' \in S} T(s, \ell, s') = 1$.
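Relative to the SSP sketch, the cost function and goal states are replaced by a reward function and a discount factor. A minimal sketch along the same lines:

```python
from dataclasses import dataclass

State = str
Label = str

@dataclass(frozen=True)
class MDP:
    states: frozenset[State]
    labels: frozenset[Label]
    reward: dict[tuple[State, Label], float]  # R: S x L -> R
    # T: S x L x S -> [0, 1], stored sparsely as (s, l) -> {s': p}
    transition: dict[tuple[State, Label], dict[State, float]]
    initial_state: State
    discount: float                           # gamma, must lie in (0, 1)
```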
Example: Grid World
[3×4 grid; initial state $s_0$ in cell (1,1); cell (4,3) marked +1; cell (4,2) marked −1]
- moving north goes east with probability 0.4
- only applicable action in (4,2) and (4,3) is collect, which sets position back to (1,1)
  - gives reward of +1 in (4,3)
  - gives reward of −1 in (4,2)
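To make the encoding of such dynamics concrete, here is a hypothetical sketch of the transition function of the north action. The slide only states the 0.4 chance of going east; that the remaining 0.6 probability mass actually moves north, and that moves blocked by the grid border leave the position unchanged, are assumptions.

```python
WIDTH, HEIGHT = 4, 3

def north_outcomes(x: int, y: int) -> dict[tuple[int, int], float]:
    """T((x,y), north, .) as a distribution over successor cells."""
    def move(dx: int, dy: int) -> tuple[int, int]:
        nx, ny = x + dx, y + dy
        # assumption: a move off the grid leaves the position unchanged
        return (nx, ny) if 1 <= nx <= WIDTH and 1 <= ny <= HEIGHT else (x, y)

    outcomes: dict[tuple[int, int], float] = {}
    # assumption: probability mass not going east (0.6) goes north
    for cell, p in ((move(0, 1), 0.6), (move(1, 0), 0.4)):
        outcomes[cell] = outcomes.get(cell, 0.0) + p  # merge if both blocked
    return outcomes
```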
Terminology (1)
- If $p := T(s, \ell, s') > 0$, we write $s \xrightarrow{p:\ell} s'$, or $s \to s'$ if not interested in $\ell$.
- If $T(s, \ell, s') = 1$, we also write $s \xrightarrow{\ell} s'$, or $s \to s'$ if not interested in $\ell$.
- If $T(s, \ell, s') > 0$ for some $s'$, we say that $\ell$ is applicable in $s$.
- The set of applicable actions in $s$ is $L(s)$. We assume that $L(s) \neq \emptyset$ for all $s \in S$.
Terminology (2)
- the successor set of $s$ and $\ell$ is $\mathrm{succ}(s, \ell) = \{s' \in S \mid T(s, \ell, s') > 0\}$
- $s'$ is a successor of $s$ if $s' \in \mathrm{succ}(s, \ell)$ for some $\ell$
- with $s' \sim \mathrm{succ}(s, \ell)$ we denote that successor $s' \in \mathrm{succ}(s, \ell)$ of $s$ and $\ell$ is sampled according to probability distribution $T$
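These notions translate directly to code. A sketch building on the sparse SSP representation above (function names are mine):

```python
import random

def applicable(ssp: SSP, s: State) -> list[Label]:
    # L(s): labels with at least one positive-probability outcome in s
    return [l for l in ssp.labels if ssp.transition.get((s, l))]

def succ(ssp: SSP, s: State, l: Label) -> set[State]:
    # succ(s, l) = {s' in S | T(s, l, s') > 0}
    return {t for t, p in ssp.transition.get((s, l), {}).items() if p > 0}

def sample_successor(ssp: SSP, s: State, l: Label) -> State:
    # s' ~ succ(s, l), drawn according to T
    outcomes = ssp.transition[(s, l)]
    return random.choices(list(outcomes), weights=list(outcomes.values()))[0]
```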
Terminology (3)
- $s'$ is reachable from $s$ if there exists a sequence of transitions $s_0 \xrightarrow{p_1:\ell_1} s_1, \dots, s_{n-1} \xrightarrow{p_n:\ell_n} s_n$ s.t. $s_0 = s$ and $s_n = s'$ (Note: $n = 0$ possible; then $s = s'$)
- $s_0, \dots, s_n$ is called (state) path from $s$ to $s'$
- $\ell_1, \dots, \ell_n$ is called (action) path from $s$ to $s'$
- length of path is $n$
- cost of path in SSP is $\sum_{i=1}^{n} c(\ell_i)$ and reward of path in MDP is $\sum_{i=1}^{n} \gamma^{i-1} R(s_{i-1}, \ell_i)$
- $s'$ is reached from $s$ through this path with probability $\prod_{i=1}^{n} p_i$
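The three path quantities as straightforward one-liners over the sketches above, to make the indexing in the formulas explicit:

```python
import math

def path_cost(ssp: SSP, labels: list[Label]) -> float:
    # cost of an action path in an SSP: sum_i c(l_i)
    return sum(ssp.cost[l] for l in labels)

def path_reward(mdp: MDP, states: list[State], labels: list[Label]) -> float:
    # discounted reward: sum_i gamma^(i-1) * R(s_{i-1}, l_i);
    # zip pairs s_0..s_{n-1} with l_1..l_n, dropping the final state s_n
    return sum(mdp.discount ** i * mdp.reward[(s, l)]
               for i, (s, l) in enumerate(zip(states, labels)))

def path_probability(probs: list[float]) -> float:
    # probability of traversing the path: prod_i p_i
    return math.prod(probs)
```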
Policy
Solutions in SSPs
[deterministic transition system diagram; goal path: move-L, pickup, move-R, drop]
- solution in deterministic transition systems is a plan, i.e., a goal path from $s_0$ to some $s_\star \in S_\star$
- cheapest plan is optimal solution
- deterministic agent that executes plan will reach goal
Solutions in SSPs
[probabilistic transition system diagram with outcome probabilities 0.8 and 0.2; executing move-L, pickup, move-R, drop can fail: "can't drop!"]
- probabilistic agent will not reach goal or cannot execute plan
- non-determinism can lead to different outcome than anticipated in plan
- we require a more general solution: a policy
Solutions in SSPs
[diagram of a policy assigning move-L, pickup, move-R, drop to states]
- policy must be allowed to be cyclic
- policy must be able to branch over outcomes
- policy assigns applicable actions to states
Policy for SSPs

Definition (Policy for SSPs)
Let $\mathcal{T} = \langle S, L, c, T, s_0, S_\star \rangle$ be an SSP. A policy for $\mathcal{T}$ is a mapping $\pi: S \to L \cup \{\bot\}$ such that $\pi(s) \in L(s) \cup \{\bot\}$ for all $s$.

The set of reachable states $S^\pi(s)$ from $s$ under $\pi$ is defined recursively as the smallest set satisfying the rules
- $s \in S^\pi(s)$ and
- $\mathrm{succ}(s', \pi(s')) \subseteq S^\pi(s)$ for all $s' \in S^\pi(s) \setminus S_\star$ where $\pi(s') \neq \bot$.

If $\pi(s') \neq \bot$ for all $s' \in S^\pi(s)$, then $\pi$ is executable in $s$.
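The smallest-set definition can be computed as a simple fixed-point traversal. A sketch building on the SSP and succ sketches above; representing $\bot$ as None is my choice, and the executability check follows the definition literally.

```python
from typing import Optional

BOTTOM = None  # stands for ⊥

def reachable_under_policy(ssp: SSP, pi: dict[State, Optional[Label]],
                           s: State) -> set[State]:
    """S^pi(s): smallest set containing s and closed under following pi
    from non-goal states whose action is defined."""
    result, frontier = {s}, [s]
    while frontier:
        t = frontier.pop()
        if t in ssp.goal_states or pi[t] is BOTTOM:
            continue  # goal states and ⊥-states are not expanded
        for t2 in succ(ssp, t, pi[t]):
            if t2 not in result:
                result.add(t2)
                frontier.append(t2)
    return result

def is_executable(ssp: SSP, pi: dict[State, Optional[Label]],
                  s: State) -> bool:
    # literal reading of the definition: pi(s') != ⊥ for all s' in S^pi(s)
    return all(pi[t] is not BOTTOM
               for t in reachable_under_policy(ssp, pi, s))
```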
Policy Representation
- size of explicit representation of executable policy $\pi$ is $|S^\pi(s_0)|$
- often, $|S^\pi(s_0)|$ is similar to $|S|$
- compact policy representation, e.g., via value function approximation or neural networks, is an active research area ⇒ not covered in this course
- instead, we consider small state spaces for basic algorithms, or online planning, where planning for the current state $s_0$ is interleaved with execution of $\pi(s_0)$