
Module 4: Markov Processes
CS 886: Sequential Decision Making and Reinforcement Learning
University of Waterloo

Sequential Decision Making
• In general: exponentially large decision tree
[Figure: decision tree rooted at s1; each level branches on actions a and b, then on the stochastic outcome of the chosen action (branch probabilities such as .9/.1 and .2/.8), passing through s2, s3, s12, s13 down to the leaves s4 to s11 and s14 to s21]
CS886 (c) 2013 Pascal Poupart
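
The tree on this slide has two actions (a, b), two stochastic outcomes per action, and two levels, which yields the 16 leaves s4 to s11 and s14 to s21. A minimal sketch, assuming every action has the same number of outcomes (the helper name is hypothetical), shows how the number of leaves grows exponentially with the horizon h:

```python
def decision_tree_leaves(num_actions: int, num_outcomes: int, horizon: int) -> int:
    """Number of leaf states in a full decision tree of depth `horizon`.

    Each level branches on every action and then on every stochastic
    outcome of that action, so the branching factor per level is
    num_actions * num_outcomes.
    """
    return (num_actions * num_outcomes) ** horizon


# Tree on the slide: actions {a, b}, 2 outcomes each, 2 levels.
print(decision_tree_leaves(2, 2, 2))   # 16 leaves
print(decision_tree_leaves(2, 2, 10))  # 1048576 leaves after 10 steps
```

Already at 10 steps the same tiny model has over a million leaves, which is what motivates exploiting structure instead of enumerating the tree.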

Common Properties
• Processes are rarely arbitrary
• They often exhibit some structure
  – Laws of the process do not change
  – A short history is sufficient to predict the future
• Example: weather prediction
  – The same model can be used every day to predict the weather
  – Weather measurements from the past few days are sufficient to predict the weather

Stochastic Process
• Problem:
  – Infinitely large conditional probability tables (a separate distribution for every time step, each conditioned on an ever-growing history $s_{t-1}, \ldots, s_0$)
• Solutions:
  – Stationary process: dynamics do not change over time
  – Markov assumption: the current state depends only on a finite history of past states

K-order Markov Process
• Assumption: the last k states are sufficient
• First-order Markov process
  – $\Pr(s_t \mid s_{t-1}, \ldots, s_0) = \Pr(s_t \mid s_{t-1})$
  [Figure: chain s0 → s1 → s2 → s3 → s4]
• Second-order Markov process
  – $\Pr(s_t \mid s_{t-1}, \ldots, s_0) = \Pr(s_t \mid s_{t-1}, s_{t-2})$
  [Figure: chain s0, …, s4 in which each state depends on its two predecessors]
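
A quick count shows why bounding the history helps. Assuming a finite set of |S| states, a k-order table needs one distribution over the next state for each of the |S|^k possible length-k histories, i.e. |S|^k (|S| - 1) free parameters, whereas conditioning on the entire history makes the table grow with t. The helper name and the state count below are hypothetical:

```python
def num_free_parameters(num_states: int, order: int) -> int:
    """Free parameters in one table Pr(s_t | previous `order` states).

    There is one row per possible history of length `order`
    (num_states ** order rows), and each row is a distribution over
    num_states next states, with num_states - 1 free entries because
    the row must sum to 1.
    """
    return num_states ** order * (num_states - 1)


S = 10  # hypothetical number of states
for k in (1, 2, 5):
    print(f"order {k}: {num_free_parameters(S, k)} parameters")
# order 1: 90 parameters
# order 2: 900 parameters
# order 5: 900000 parameters
```

The count is still exponential in k, but it is fixed once k is fixed, unlike a table over the full history.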

Markov Process
• By default, a Markov process refers to a
  – First-order process: $\Pr(s_t \mid s_{t-1}, s_{t-2}, \ldots, s_0) = \Pr(s_t \mid s_{t-1}) \;\forall t$
  – Stationary process: $\Pr(s_t \mid s_{t-1}) = \Pr(s_{t'} \mid s_{t'-1}) \;\forall t'$
• Advantage: can specify the entire process with a single concise conditional distribution $\Pr(s' \mid s)$
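
Because a stationary first-order process is fully described by Pr(s' | s), simulating it only requires that one table, reused at every step. A minimal sketch, assuming a hypothetical two-state weather model whose probabilities are invented for illustration:

```python
import random

# Hypothetical two-state weather model; the numbers are illustrative, not from the slides.
STATES = ["sunny", "rainy"]
# TRANSITION[s][s2] = Pr(s' = s2 | s); each row sums to 1.
TRANSITION = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}


def sample_trajectory(start, length, rng):
    """Sample s_0, ..., s_{length-1} from a stationary first-order Markov process.

    The same TRANSITION table is used at every step (stationarity), and each
    next state is drawn using only the current state (first-order Markov property).
    """
    trajectory = [start]
    for _ in range(length - 1):
        row = TRANSITION[trajectory[-1]]
        next_state = rng.choices(list(row), weights=list(row.values()))[0]
        trajectory.append(next_state)
    return trajectory


print(sample_trajectory("sunny", 7, random.Random(0)))
```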

Examples
• Robotic control
  – States: $x, y, z, \theta$ coordinates of joints
  – Dynamics: constant motion
• Inventory management
  – States: inventory level
  – Dynamics: constant (stochastic) demand
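
To make the inventory example concrete, here is a minimal sketch that turns a stationary demand distribution into a transition matrix over inventory levels. The demand probabilities, the capacity, and the no-restocking rule are all assumptions made for illustration; the slide does not specify them:

```python
# Hypothetical demand distribution Pr(demand = d), the same every day (stationary).
DEMAND = {0: 0.3, 1: 0.5, 2: 0.2}


def inventory_transition_matrix(capacity):
    """Pr(next level | current level) for inventory levels 0..capacity.

    Assumes no restocking: each day the level drops by the random demand,
    and unmet demand is lost, so the level never goes below 0.
    """
    n = capacity + 1
    T = [[0.0] * n for _ in range(n)]
    for level in range(n):
        for demand, p in DEMAND.items():
            T[level][max(level - demand, 0)] += p
    return T


for row in inventory_transition_matrix(3):
    print(row)
```

Since the demand distribution never changes, the resulting matrix is the single Pr(s' | s) that specifies the whole process.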

Non-Markovian and/or Non-stationary Processes
• What if the process is not Markovian and/or not stationary?
• Solution: add new state components until the dynamics are Markovian and stationary
  – Robotics: the dynamics of $x, y, z, \theta$ are not stationary when velocity varies…
  – Solution: add velocity to the state description, e.g. $x, y, z, \theta, \dot{x}, \dot{y}, \dot{z}, \dot{\theta}$
  – If velocity varies… then add acceleration
  – Where do we stop?
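
For a discrete process, "adding a state component" can be sketched concretely: a second-order process over S becomes a first-order process over pairs (s_{t-1}, s_t). The two-state alphabet, the probabilities, and the function name below are hypothetical, chosen only for illustration:

```python
from itertools import product

# Hypothetical second-order dynamics (illustrative numbers, not from the slides):
# SECOND_ORDER[(s_{t-2}, s_{t-1})][s] = Pr(s_t = s | s_{t-2}, s_{t-1}).
STATES = ["A", "B"]
SECOND_ORDER = {
    ("A", "A"): {"A": 0.1, "B": 0.9},
    ("A", "B"): {"A": 0.5, "B": 0.5},
    ("B", "A"): {"A": 0.7, "B": 0.3},
    ("B", "B"): {"A": 0.9, "B": 0.1},
}


def augment_to_first_order(second_order, states):
    """Return a first-order table over pair states (s_{t-1}, s_t).

    Pair (u, v) can only move to a pair (v, w), which it does with probability
    Pr(s_{t+1} = w | s_{t-1} = u, s_t = v); every other move has probability 0.
    """
    pairs = list(product(states, repeat=2))
    return {
        (u, v): {
            (x, w): (second_order[(u, v)][w] if x == v else 0.0)
            for (x, w) in pairs
        }
        for (u, v) in pairs
    }


T = augment_to_first_order(SECOND_ORDER, STATES)
print(len(T))          # 4 pair states = |S|^2
print(T[("A", "B")])   # {('A', 'A'): 0.0, ('A', 'B'): 0.0, ('B', 'A'): 0.5, ('B', 'B'): 0.5}
```

The augmented process is Markovian and stationary, but its state space has |S|^2 states, the kind of growth in computational cost that makes a small self-sufficient state description desirable.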

Markovian Stationary Process
• Problem: adding components to the state description to force a process to be Markovian and stationary may significantly increase computational complexity
• Solution: try to find the smallest state description that is self-sufficient (i.e., Markovian and stationary)

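Inference in Markov Processes
• Common task:
  – Prediction: $\Pr(s_{t+k} \mid s_t)$
• Computation:
  – $\Pr(s_{t+k} \mid s_t) = \sum_{s_{t+1} \ldots s_{t+k-1}} \prod_{i=1}^{k} \Pr(s_{t+i} \mid s_{t+i-1})$
• Matrix operations:
  – Let $T$ be a $|S| \times |S|$ matrix representing $\Pr(s_{t+1} \mid s_t)$
  – Then $\Pr(s_{t+k} \mid s_t) = T^k$
  – Complexity: $O(k|S|^2)$ to compute $\Pr(s_{t+k} \mid s_t)$ for a given $s_t$ with k vector-matrix products; forming the full matrix $T^k$ by repeated multiplication costs $O(k|S|^3)$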
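
The prediction computation can be sketched in plain Python. This assumes a row-stochastic convention, T[s][s'] = Pr(s_{t+1} = s' | s_t = s), and a hypothetical two-state matrix; it checks that k vector-matrix products reproduce the corresponding row of T^k:

```python
# Hypothetical 2-state transition matrix (rows: s_t, columns: s_{t+1}); not from the slides.
T_EXAMPLE = [
    [0.9, 0.1],
    [0.5, 0.5],
]


def step(dist, T):
    """One vector-matrix product: Pr(s_{t+1}) from Pr(s_t). Cost O(|S|^2)."""
    n = len(T)
    return [sum(dist[s] * T[s][s2] for s in range(n)) for s2 in range(n)]


def predict(start_state, T, k):
    """Pr(s_{t+k} | s_t = start_state) using k vector-matrix products: O(k |S|^2)."""
    n = len(T)
    dist = [1.0 if s == start_state else 0.0 for s in range(n)]  # point mass on s_t
    for _ in range(k):
        dist = step(dist, T)
    return dist


def mat_mul(A, B):
    """Plain matrix product; each call costs O(|S|^3)."""
    n = len(A)
    return [[sum(A[i][m] * B[m][j] for m in range(n)) for j in range(n)] for i in range(n)]


def mat_power(T, k):
    """T^k by k repeated multiplications starting from the identity: O(k |S|^3)."""
    n = len(T)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # T^0
    for _ in range(k):
        result = mat_mul(result, T)
    return result


print(predict(0, T_EXAMPLE, 2))       # approximately [0.86, 0.14]
print(mat_power(T_EXAMPLE, 2)[0])     # same row of T^2
```

When only one starting state matters, propagating a distribution vector is cheaper than forming T^k; the full matrix power is worthwhile when predictions from every starting state are needed.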

Decision Making
• Predictions by themselves are useless
• They are only useful when they will influence future decisions
• Hence the ultimate task is decision making
• How can we influence the process to visit desirable states?
• Model: Markov Decision Process
