

  1. Module 4: Markov Processes
     CS 886 Sequential Decision Making and Reinforcement Learning
     University of Waterloo

  2. Sequential Decision Making
     • In general: exponentially large decision tree
     [Figure: decision tree rooted at s1; each state branches on actions a and b, each action leads stochastically to one of two successor states (edges labelled with probabilities such as .9/.1 and .2/.8), ending in leaves s4 to s11 and s14 to s21]
     CS886 (c) 2013 Pascal Poupart
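A rough count shows why the tree grows exponentially. Assuming, as in the figure, |A| actions per state and b possible successor states per action, the number of leaves after h decisions is:

```latex
\#\text{leaves}(h) = (|A| \cdot b)^{h}, \qquad
|A| = 2,\; b = 2,\; h = 2 \;\Rightarrow\; (2 \cdot 2)^{2} = 16 \text{ leaves } (s_4, \dots, s_{11},\, s_{14}, \dots, s_{21})
```

Each additional decision step multiplies the number of leaves by |A| b.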

  3. Common Properties
     • Processes are rarely arbitrary
     • They often exhibit some structure
       – Laws of the process do not change
       – A short history is sufficient to predict the future
     • Example: weather prediction
       – The same model can be used every day to predict the weather
       – Weather measurements from the past few days are sufficient to predict the weather

  4. Stochastic Process
     • Definition
       – Set of states: $S$
       – Stochastic dynamics: $\Pr(s_t \mid s_{t-1}, \dots, s_0)$
     [Figure: sequence of states s0, s1, s2, s3, s4]

  5. Stochastic Process
     • Problem:
       – Infinitely large conditional probability tables
     • Solutions:
       – Stationary process: dynamics do not change over time
       – Markov assumption: current state depends only on a finite history of past states
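A worked count makes the problem and both solutions concrete, assuming a finite state set $S$. Without assumptions, the table for step $t$ needs one row per history $s_{t-1}, \dots, s_0$ (that is, $|S|^t$ rows), and every step needs its own table:

```latex
\begin{align*}
\text{general process:}\quad & |S|^{t} \times |S| = |S|^{t+1} \text{ entries at step } t,\ \text{unbounded as } t \to \infty \\
\text{$k$-order, stationary:}\quad & |S|^{k} \times |S| = |S|^{k+1} \text{ entries in total} \\
\text{first-order, stationary:}\quad & |S| \times |S| = |S|^{2} \text{ entries in total}
\end{align*}
```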

  6. k-order Markov Process
     • Assumption: last k states sufficient
     • First-order Markov process
       – $\Pr(s_t \mid s_{t-1}, \dots, s_0) = \Pr(s_t \mid s_{t-1})$
       [Figure: chain s0 to s4 in which each state depends on the previous state]
     • Second-order Markov process
       – $\Pr(s_t \mid s_{t-1}, \dots, s_0) = \Pr(s_t \mid s_{t-1}, s_{t-2})$
       [Figure: chain s0 to s4 in which each state depends on the two previous states]
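A k-order process can always be rewritten as a first-order one by stacking the last k states into a single composite state. A minimal Python sketch for k = 2, assuming a finite state set; `to_first_order` and the two-state weather table `p2` are hypothetical illustrations, not from the slides:

```python
from itertools import product

def to_first_order(states, p2):
    """Lift a second-order chain to a first-order chain over pairs of states.

    p2[(prev, cur)][nxt] = Pr(s_t = nxt | s_{t-1} = cur, s_{t-2} = prev)
    Returns p1 with p1[(prev, cur)][(cur, nxt)] = Pr(pair_t | pair_{t-1}).
    """
    p1 = {}
    for prev, cur in product(states, repeat=2):
        # The new pair keeps the current state and appends the next one,
        # so it depends only on the previous pair: a first-order process.
        p1[(prev, cur)] = {(cur, nxt): p2[(prev, cur)][nxt] for nxt in states}
    return p1

states = ["sun", "rain"]
# Hypothetical second-order model: tomorrow depends on today and yesterday.
p2 = {
    ("sun", "sun"): {"sun": 0.8, "rain": 0.2},
    ("sun", "rain"): {"sun": 0.4, "rain": 0.6},
    ("rain", "sun"): {"sun": 0.7, "rain": 0.3},
    ("rain", "rain"): {"sun": 0.2, "rain": 0.8},
}
p1 = to_first_order(states, p2)
print(p1[("sun", "rain")])
```

The price is a larger state space: |S|^k composite states instead of |S|.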

  7. Markov Process
     • By default, a Markov process refers to a
       – First-order process: $\Pr(s_t \mid s_{t-1}, s_{t-2}, \dots, s_0) = \Pr(s_t \mid s_{t-1}) \;\; \forall t$
       – Stationary process: $\Pr(s_t \mid s_{t-1}) = \Pr(s_{t'} \mid s_{t'-1}) \;\; \forall t'$
     • Advantage: can specify the entire process with a single concise conditional distribution $\Pr(s' \mid s)$
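Because a stationary first-order process is fully specified by one table $\Pr(s' \mid s)$, simulating it needs only that table and the current state. A minimal Python sketch; the two-state table `P` and the helper `sample_trajectory` are hypothetical examples, not from the slides:

```python
import random

# Hypothetical stationary first-order process: the same table Pr(s' | s)
# is reused at every time step.
P = {
    "sunny": {"sunny": 0.9, "rainy": 0.1},
    "rainy": {"sunny": 0.5, "rainy": 0.5},
}

def sample_trajectory(P, s0, steps, rng):
    """Sample s_0, ..., s_steps; each draw looks only at the current state."""
    traj = [s0]
    for _ in range(steps):
        nxt = P[traj[-1]]  # distribution over successors of the current state
        traj.append(rng.choices(list(nxt), weights=list(nxt.values()))[0])
    return traj

print(sample_trajectory(P, "sunny", 10, random.Random(0)))
```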

  8. Examples
     • Robotic control
       – States: $x, y, z, \theta$ coordinates of joints
       – Dynamics: constant motion
     • Inventory management
       – States: inventory level
       – Dynamics: constant (stochastic) demand
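The inventory example can be written down directly as a transition matrix over inventory levels. A minimal Python sketch; the capacity, the demand distribution, and the fixed per-period restock are hypothetical values (the slide only says demand is constant and stochastic), and the restock is there so the chain does not simply drain to zero:

```python
def inventory_transitions(capacity, demand_probs, restock):
    """Return T[i][j] = Pr(next level = j | current level = i).

    demand_probs[d] = Pr(demand = d), the same every period (stationary).
    restock is a hypothetical fixed delivery added after each period's sales.
    """
    n = capacity + 1
    T = [[0.0] * n for _ in range(n)]
    for level in range(n):
        for demand, p in enumerate(demand_probs):
            sold = min(level, demand)  # cannot sell more than is in stock
            nxt = min(level - sold + restock, capacity)
            T[level][nxt] += p
    return T

T = inventory_transitions(capacity=3, demand_probs=[0.3, 0.5, 0.2], restock=1)
for level, row in enumerate(T):
    print(level, [round(p, 2) for p in row])
```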

  9. Non-Markovian and/or non-stationary processes
     • What if the process is not Markovian and/or not stationary?
     • Solution: add new state components until the dynamics are Markovian and stationary
       – Robotics: the dynamics of $x, y, z, \theta$ are not stationary when velocity varies…
       – Solution: add velocity to the state description, e.g. $x, y, z, \theta, \dot{x}, \dot{y}, \dot{z}, \dot{\theta}$
       – If velocity varies… then add acceleration
       – Where do we stop?
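A one-dimensional sketch of the velocity fix, assuming (hypothetically) that acceleration is fresh Gaussian noise at every step. Under that assumption the pair (x, v) is Markovian and stationary, while x alone is not: predicting the next position requires the velocity, which can only be recovered from earlier positions. The function `step` and its parameters are illustrative, not from the slides:

```python
import random

def step(state, rng, accel_std=0.1, dt=1.0):
    """One transition of a hypothetical 1-D joint with state (x, v).

    The next state depends only on the current (x, v) plus fresh noise,
    so the augmented process is first-order Markov and stationary.
    """
    x, v = state
    a = rng.gauss(0.0, accel_std)  # velocity varies via random acceleration
    return (x + v * dt, v + a * dt)

rng = random.Random(1)
state = (0.0, 1.0)
for t in range(5):
    state = step(state, rng)
    print(t + 1, state)
```

If acceleration itself had persistent structure rather than being independent noise, the same argument would push it into the state as well.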

  10. Markovian Stationary Process
     • Problem: adding components to the state description to force a process to be Markovian and stationary may significantly increase computational complexity
     • Solution: try to find the smallest state description that is self-sufficient (i.e., Markovian and stationary)

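  11. Inference in Markov processes
     • Common task:
       – Prediction: $\Pr(s_{t+k} \mid s_t)$
     • Computation:
       – $\Pr(s_{t+k} \mid s_t) = \sum_{s_{t+1}, \dots, s_{t+k-1}} \prod_{i=1}^{k} \Pr(s_{t+i} \mid s_{t+i-1})$
     • Matrix operations:
       – Let $T$ be an $|S| \times |S|$ matrix representing $\Pr(s_{t+1} \mid s_t)$, with entry $(i, j)$ equal to $\Pr(s_{t+1} = j \mid s_t = i)$
       – Then $\Pr(s_{t+k} \mid s_t) = T^k$
       – Complexity: $O(k|S|^2)$ when propagating the row of a given $s_t$ through $T$ (one vector-matrix product per step); computing the full matrix $T^k$ by repeated multiplication costs $O(k|S|^3)$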
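The explicit sum and the matrix formulation can be checked against each other. A minimal Python sketch, assuming states are indexed 0..|S|-1 and `T[i][j]` = Pr(s_{t+1} = j | s_t = i); the 2×2 matrix is hypothetical. `predict` propagates a one-hot row vector (the $O(k|S|^2)$ route), while `predict_brute_force` evaluates the sum over intermediate states by enumerating all $|S|^k$ state paths:

```python
from itertools import product

def predict(T, s, k):
    """Pr(s_{t+k} = j | s_t = s) for every j, i.e. row s of T^k.

    Each step is one vector-matrix product, O(|S|^2), so O(k |S|^2) overall.
    """
    n = len(T)
    dist = [1.0 if j == s else 0.0 for j in range(n)]
    for _ in range(k):
        dist = [sum(dist[i] * T[i][j] for i in range(n)) for j in range(n)]
    return dist

def predict_brute_force(T, s, k):
    """Same quantity via the explicit sum of products over state paths."""
    n = len(T)
    if k == 0:
        return [1.0 if j == s else 0.0 for j in range(n)]
    out = [0.0] * n
    for path in product(range(n), repeat=k):  # path = (s_{t+1}, ..., s_{t+k})
        p, prev = 1.0, s
        for nxt in path:
            p *= T[prev][nxt]
            prev = nxt
        out[path[-1]] += p
    return out

T = [[0.9, 0.1],
     [0.5, 0.5]]
print(predict(T, 0, 3))
print(predict_brute_force(T, 0, 3))
```

The brute-force version is exponential in k and is only useful as a check on small examples.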

  12. Decision Making
     • Predictions by themselves are useless
     • They are only useful when they will influence future decisions
     • Hence the ultimate task is decision making
     • How can we influence the process to visit desirable states?
     • Model: Markov Decision Process
