

  1. CS885 Reinforcement Learning Lecture 1b: May 2, 2018
     Markov Processes [RusNor] Sec. 15.1
     University of Waterloo, CS885 Spring 2018, Pascal Poupart

  2. Outline
     • Environment dynamics
     • Stochastic processes
       – Markovian assumption
       – Stationary assumption

  3. Recall: RL Problem
     [diagram: agent-environment loop, with the agent receiving a state and reward from the environment and sending back an action]
     Goal: learn to choose actions that maximize rewards

  4. Unrolling the Problem
     • Unrolling the control loop leads to a sequence of states, actions and rewards:
       s_0, a_0, r_0, s_1, a_1, r_1, s_2, a_2, r_2, …
     • This sequence forms a stochastic process (due to some uncertainty in the dynamics of the process)

  5. Common Properties
     • Processes are rarely arbitrary
     • They often exhibit some structure
       – Laws of the process do not change
       – Short history sufficient to predict future
     • Example: weather prediction
       – Same model can be used every day to predict weather
       – Weather measurements of past few days sufficient to predict weather

  6. Stochastic Process
     • Consider the sequence of states only
     • Definition
       – Set of states: S
       – Stochastic dynamics: Pr(s_t | s_{t-1}, …, s_0)
     [diagram: chain of states s_0 through s_4, with each state conditioned on all earlier states]

  7. Stochastic Process
     • Problem:
       – Infinitely large conditional distributions
     • Solutions:
       – Stationary process: dynamics do not change over time
       – Markov assumption: current state depends only on a finite history of past states

  8. K-order Markov Process
     • Assumption: last k states sufficient
     • First-order Markov process
       – Pr(s_t | s_{t-1}, …, s_0) = Pr(s_t | s_{t-1})
       [diagram: chain s_0 → s_1 → s_2 → s_3 → s_4]
     • Second-order Markov process
       – Pr(s_t | s_{t-1}, …, s_0) = Pr(s_t | s_{t-1}, s_{t-2})
       [diagram: chain s_0 … s_4 with arrows from each of the two previous states]

  9. Markov Process
     • By default, a Markov process refers to a
       – First-order process: Pr(s_t | s_{t-1}, s_{t-2}, …, s_0) = Pr(s_t | s_{t-1}) ∀t
       – Stationary process: Pr(s_t | s_{t-1}) = Pr(s_{t'} | s_{t'-1}) ∀t'
     • Advantage: can specify the entire process with a single concise conditional distribution Pr(s' | s)
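The slide's point can be sketched in code: a stationary first-order Markov process over a finite state set is fully specified by one conditional distribution Pr(s' | s), i.e., a single transition matrix, from which trajectories can be sampled. The 3-state matrix below is an illustrative assumption, not from the lecture.

```python
import numpy as np

# Hypothetical 3-state chain (e.g., sunny / cloudy / rainy); the numbers
# are illustrative. Row i is the conditional distribution Pr(s' | s = i).
P = np.array([
    [0.8, 0.15, 0.05],
    [0.3, 0.4,  0.3 ],
    [0.2, 0.3,  0.5 ],
])

def sample_chain(P, s0, steps, rng):
    """Sample s_0, s_1, ..., s_steps from a stationary first-order
    Markov process: each next state depends only on the current one."""
    states = [s0]
    for _ in range(steps):
        states.append(int(rng.choice(len(P), p=P[states[-1]])))
    return states

rng = np.random.default_rng(0)
trajectory = sample_chain(P, s0=0, steps=10, rng=rng)
```

Because the process is stationary, the same matrix `P` is reused at every step; because it is first-order, only `states[-1]` is consulted.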

  10. Examples
     • Robotic control
       – States: x, y, z, θ coordinates of joints
       – Dynamics: constant motion
     • Inventory management
       – States: inventory level
       – Dynamics: constant (stochastic) demand
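As a minimal sketch of the inventory example (the Poisson demand, capacity of 5, and refill-when-empty reorder policy are my assumptions, not from the slides): the inventory level alone is a Markovian state, because the next level depends only on the current level and the stochastic demand.

```python
import numpy as np

CAPACITY = 5  # hypothetical maximum stock level

def step(stock, rng):
    """One period of the inventory process: stochastic demand arrives,
    then a simple reorder policy refills the stock when it runs out."""
    demand = rng.poisson(1.0)          # constant (stochastic) demand
    stock = max(stock - demand, 0)
    if stock == 0:                     # reorder policy: refill to capacity
        stock = CAPACITY
    return stock

rng = np.random.default_rng(1)
levels = [CAPACITY]
for _ in range(30):
    levels.append(step(levels[-1], rng))
```

Note the first-order structure: `step` reads only the current `stock`, never the history.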

  11. Non-Markovian and/or non-stationary processes
     • What if the process is not Markovian and/or not stationary?
     • Solution: add new state components until dynamics are Markovian and stationary
       – Robotics: the dynamics of x, y, z, θ are not stationary when velocity varies…
       – Solution: add velocity to state description, e.g., x, y, z, θ, ẋ, ẏ, ż, θ̇
       – If acceleration varies… then add acceleration to state
       – Where do we stop?
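The augmentation idea can be sketched with a toy 1-D example (my own illustration, not from the slides): for a point mass under constant acceleration, the sequence of positions alone is not Markovian, since the next position depends on the current velocity, which the position does not reveal; the augmented state (position, velocity) is Markovian.

```python
import numpy as np

DT, ACCEL = 0.1, 2.0  # hypothetical time step and constant acceleration

def step(state):
    """Deterministic Markovian dynamics on the augmented state (x, v):
    the next (x, v) depends only on the current (x, v)."""
    x, v = state
    return (x + v * DT, v + ACCEL * DT)

state = (0.0, 0.0)
positions = [state[0]]
for _ in range(3):
    state = step(state)
    positions.append(state[0])

# Successive position increments grow over time: equal increments would
# mean position alone sufficed, so the growth exposes the hidden velocity.
increments = np.diff(positions)
```

If the acceleration itself varied, (x, v) would no longer suffice and the state would need acceleration too, which is exactly the "where do we stop?" question on the slide.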

  12. Markovian Stationary Process
     • Problem: adding components to the state description to force a process to be Markovian and stationary may significantly increase computational complexity
     • Solution: try to find the smallest state description that is self-sufficient (i.e., Markovian and stationary)

  13. Inference in Markov processes
     • Common task:
       – Prediction: Pr(s_{t+k} | s_t)
     • Computation:
       – Pr(s_{t+k} | s_t) = Σ_{s_{t+1} … s_{t+k-1}} Π_{i=1}^{k} Pr(s_{t+i} | s_{t+i-1})
     • Discrete states (matrix operations):
       – Let T be a |S| × |S| matrix representing Pr(s_{t+1} | s_t)
       – Then Pr(s_{t+k} | s_t) = T^k
       – Complexity: O(k |S|^3)
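The matrix-operations bullet can be checked directly: raising the one-step transition matrix T to the k-th power (k - 1 matrix multiplications, each O(|S|^3), hence O(k |S|^3)) gives the k-step prediction Pr(s_{t+k} | s_t). The 3-state matrix below is illustrative, not from the lecture.

```python
import numpy as np

# One-step transition matrix T, with row s holding Pr(s_{t+1} | s_t = s).
T = np.array([
    [0.9, 0.1, 0.0],
    [0.2, 0.6, 0.2],
    [0.0, 0.3, 0.7],
])

k = 4
# k-step prediction Pr(s_{t+k} | s_t) as a matrix power.
Tk = np.linalg.matrix_power(T, k)

# Same quantity built up by chaining one-step predictions, mirroring the
# sum-product formula (each matrix product sums over one intermediate state).
Tk_explicit = T.copy()
for _ in range(k - 1):
    Tk_explicit = Tk_explicit @ T
```

Each row of `Tk` is itself a distribution over states, so the rows still sum to one.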

  14. Decision Making
     • Predictions by themselves are useless
     • They are only useful when they will influence future decisions
     • Hence the ultimate task is decision making
     • How can we influence the process to visit desirable states?
     • Model: Markov Decision Process
