EECS 3401 — AI and Logic Prog. — Lecture 20


  1. EECS 3401 — AI and Logic Prog. — Lecture 20. Adapted from the official slides for the 3rd ed. of Russell & Norvig (Ch. 17). Vitaliy Batusov, vbatusov@cse.yorku.ca, York University. November 30, 2020.

  2. Today: Sequential Decision-Making. Required reading: Russell & Norvig, Ch. 17.1–17.3.

  3. Context. Covered to date: Search; Belief Networks. Today: Markov Decision Processes.

  4. Basic Idea behind MDPs. Goal: decision making under uncertainty, with a notion of utility. Random variables describe the world (as in Belief Networks), but now the world is again dynamic. The transition model specifies the probability distribution over the latest state variables, given the previous values. Markov assumption: the current state depends on only a finite, fixed number of previous states. First-order Markov process: the current state depends only on the immediately preceding state.
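In standard notation (not spelled out on the slide), the first-order Markov assumption reads:

```latex
% First-order Markov assumption: the current state is conditionally
% independent of all earlier states given the immediately preceding one.
P(S_t \mid S_{0:t-1}) = P(S_t \mid S_{t-1})
```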

  5. Sequential Decision Problems. Search + explicit actions and subgoals → Planning; Search + uncertainty and utility → MDP; Planning + uncertainty and utility → Decision-theoretic Planning; MDP + uncertain sensing → POMDP.

  6. Example MDP. States s ∈ S, actions a ∈ A. Transition model: T(s, a, s′) ≜ P(s′ | s, a), the probability that doing a in s leads to s′. Reward function: R(s) = −0.04 for non-terminal states (a small penalty), R(s) = ±1 for terminal states.
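The numbers above match the standard 4×3 grid world from R&N. A minimal sketch of that MDP, assuming the usual 0.8/0.1/0.1 action noise, a wall at (2, 2), and terminals at (4, 3) and (4, 2); all names here (`step`, `T`, `R`, etc.) are chosen for illustration:

```python
# Minimal sketch of the 4x3 grid-world MDP from Russell & Norvig (Ch. 17).
# Assumed layout: columns 1..4, rows 1..3, wall at (2, 2),
# terminal states (4, 3) -> +1 and (4, 2) -> -1.

WALL = (2, 2)
TERMINALS = {(4, 3): +1.0, (4, 2): -1.0}
STATES = [(c, r) for c in range(1, 5) for r in range(1, 4) if (c, r) != WALL]
ACTIONS = ['up', 'down', 'left', 'right']
MOVES = {'up': (0, 1), 'down': (0, -1), 'left': (-1, 0), 'right': (1, 0)}
PERP = {'up': ['left', 'right'], 'down': ['left', 'right'],
        'left': ['up', 'down'], 'right': ['up', 'down']}

def step(s, a):
    """Deterministic move; bumping into the wall or an edge leaves s unchanged."""
    c, r = s[0] + MOVES[a][0], s[1] + MOVES[a][1]
    return (c, r) if (c, r) in STATES else s

def T(s, a, s2):
    """P(s2 | s, a): 0.8 for the intended direction, 0.1 for each perpendicular."""
    if s in TERMINALS:                 # no transitions out of terminal states
        return 0.0
    p = 0.8 if step(s, a) == s2 else 0.0
    for side in PERP[a]:
        if step(s, side) == s2:
            p += 0.1
    return p

def R(s):
    """Reward: +-1 in terminal states, a small penalty everywhere else."""
    return TERMINALS.get(s, -0.04)
```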

  7. Solving MDPs. In search problems, the aim is to find an optimal sequence of actions. In MDPs, the aim is to find an optimal policy π(s), i.e., the best action for every possible state s. The optimal policy maximizes the expected sum of rewards. Suppose R(s) = −0.04; the resulting optimal policy for the 4×3 world is shown in the slide's figure.
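In standard notation (consistent with the state utilities U developed on the following slides), the optimal policy is greedy with respect to expected utility:

```latex
% Optimal policy: pick the action with the highest expected utility
% of its outcome state.
\pi^*(s) = \operatorname{argmax}_{a \in A(s)} \sum_{s'} T(s, a, s')\, U(s')
```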

  8. Risk and Reward

  9. Utility of State Sequences
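For reference, the standard definition from R&N Ch. 17.1: the utility of a state sequence is the discounted sum of rewards, with discount factor γ ∈ (0, 1]:

```latex
% Additive discounted utility of a state sequence.
U([s_0, s_1, s_2, \ldots]) = \sum_{t=0}^{\infty} \gamma^t R(s_t),
\qquad 0 < \gamma \le 1
```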

  10. Utility of States

  11. Utility of States
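For reference, state utilities under the optimal policy satisfy the Bellman equation (standard form, R&N Ch. 17.2):

```latex
% Bellman equation: the utility of s is its immediate reward plus the
% discounted expected utility of the best action's outcome.
U(s) = R(s) + \gamma \max_{a \in A(s)} \sum_{s'} T(s, a, s')\, U(s')
```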

  12. Dynamic Programming

  13. Value Iteration Algorithm
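A minimal sketch of value iteration (R&N Fig. 17.4), which repeatedly applies the Bellman update until utilities stabilize; the function and parameter names are illustrative, and it plugs into the grid-world sketch above:

```python
def value_iteration(states, actions, T, R, gamma=0.99, eps=1e-4):
    """Bellman updates until the utility error is provably below eps.
    Stopping rule: if the largest change delta < eps*(1-gamma)/gamma,
    then the returned utilities are within eps of U* (R&N Ch. 17.2)."""
    U = {s: 0.0 for s in states}
    while True:
        U_new, delta = {}, 0.0
        for s in states:
            # Bellman update: reward plus discounted best expected utility.
            U_new[s] = R(s) + gamma * max(
                sum(T(s, a, s2) * U[s2] for s2 in states) for a in actions)
            delta = max(delta, abs(U_new[s] - U[s]))
        U = U_new
        if delta < eps * (1 - gamma) / gamma:
            return U
```

For example, `value_iteration(STATES, ACTIONS, T, R)` recovers the utilities of the 4×3 world; terminal states get exactly their ±1 reward because `T` assigns them no outgoing transitions.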

  14. Convergence
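The convergence argument rests on the Bellman update B being a contraction by a factor of γ in the max norm (R&N Ch. 17.2):

```latex
% B is a gamma-contraction, so value iteration converges to the unique
% fixed point U* from any starting point, with the error shrinking by
% a factor of gamma per iteration.
\lVert B\,U - B\,U' \rVert_\infty \le \gamma \, \lVert U - U' \rVert_\infty
```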

  15. Policy Iteration
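A minimal sketch of policy iteration (R&N Fig. 17.7): evaluate the current policy exactly, then improve it greedily, repeating until the policy is stable. The names and the numpy-based evaluation are my own choices; it reuses the grid-world `STATES`/`ACTIONS`/`T`/`R` sketch above:

```python
import numpy as np

def policy_iteration(states, actions, T, R, gamma=0.99):
    idx = {s: i for i, s in enumerate(states)}
    pi = {s: actions[0] for s in states}          # arbitrary initial policy
    while True:
        # Policy evaluation: U = R + gamma * T_pi U is linear in U,
        # so solve (I - gamma * T_pi) U = R exactly.
        T_pi = np.array([[T(s, pi[s], s2) for s2 in states] for s in states])
        r = np.array([R(s) for s in states])
        U = np.linalg.solve(np.eye(len(states)) - gamma * T_pi, r)
        # Policy improvement: make pi greedy with respect to U.
        stable = True
        for s in states:
            best = max(actions, key=lambda a: sum(
                T(s, a, s2) * U[idx[s2]] for s2 in states))
            if best != pi[s]:
                pi[s], stable = best, False
        if stable:
            return pi, {s: U[idx[s]] for s in states}
```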

  16. Modified Policy Iteration
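Modified policy iteration replaces the exact linear solve in the evaluation step with some number k of simplified Bellman updates (no max over actions, since the policy fixes the action). A sketch of just that evaluation step, with k and all names illustrative:

```python
def approx_evaluate(U, pi, states, T, R, gamma=0.99, k=10):
    """Approximate policy evaluation: k sweeps of the simplified
    Bellman update U(s) <- R(s) + gamma * sum_s' T(s, pi(s), s') U(s')."""
    for _ in range(k):
        U = {s: R(s) + gamma * sum(T(s, pi[s], s2) * U[s2] for s2 in states)
             for s in states}
    return U
```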

  17. Partial Observability

  18. Partial Observability
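Under partial observability (POMDPs), the agent cannot know its state and instead acts on a belief state b(s), a distribution over states. The standard belief update after doing action a and observing e, with α a normalizing constant (notation as in R&N Ch. 17.4):

```latex
% Belief update: push the belief through the transition model,
% then condition on the new observation e; alpha normalizes.
b'(s') = \alpha \, P(e \mid s') \sum_{s} P(s' \mid s, a)\, b(s)
```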
