Function Approximation for (on policy) Prediction and Control
Deep Reinforcement Learning and Control Katerina Fragkiadaki
Carnegie Mellon School of Computer Science Lecture 8, CMU 10-403
Used Materials Disclaimer: Much of the material
[Equations: sums over states s ∈ 𝒯 and over indices n = 1, ..., |𝒯|, with squared terms]
Gradient Monte Carlo Algorithm for Approximating v̂ ≈ v_π
Input: the policy π to be evaluated
Input: a differentiable function v̂ : S × ℝⁿ → ℝ
Initialize value-function weights θ as appropriate (e.g., θ = 0)
Repeat forever:
    Generate an episode S_0, A_0, R_1, S_1, A_1, ..., R_T, S_T using π
    For t = 0, 1, ..., T − 1:
        θ ← θ + α [G_t − v̂(S_t, θ)] ∇v̂(S_t, θ)
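The Gradient Monte Carlo box above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the lecture's own code: the 5-state random walk, the one-hot features, the step size, and every function name (`features`, `generate_episode`, `gradient_mc`) are hypothetical choices made here so that the update θ ← θ + α [G_t − v̂(S_t, θ)] ∇v̂(S_t, θ) can be run end to end.

```python
import random

# Hypothetical toy task (not from the slides): a 5-state random walk.
# States 0..4 are non-terminal. Stepping left of 0 or right of 4 ends the
# episode. Reward is +1 on the right exit and 0 otherwise.
N_STATES = 5


def features(s):
    """One-hot feature vector x(s), so v_hat is linear in theta."""
    return [1.0 if i == s else 0.0 for i in range(N_STATES)]


def v_hat(s, theta):
    """Linear value estimate v_hat(s, theta) = theta . x(s)."""
    return sum(t * x for t, x in zip(theta, features(s)))


def grad_v_hat(s, theta):
    """For a linear approximator the gradient w.r.t. theta is x(s)."""
    return features(s)


def generate_episode(rng):
    """Follow the uniform random policy; return [(S_t, R_{t+1}), ...]."""
    s = N_STATES // 2
    episode = []
    while True:
        s_next = s + rng.choice((-1, 1))
        if s_next < 0:
            episode.append((s, 0.0))
            return episode
        if s_next >= N_STATES:
            episode.append((s, 1.0))
            return episode
        episode.append((s, 0.0))
        s = s_next


def gradient_mc(num_episodes=20000, alpha=0.001, gamma=1.0, seed=0):
    rng = random.Random(seed)
    theta = [0.0] * N_STATES
    for _ in range(num_episodes):
        episode = generate_episode(rng)
        # Compute the returns G_t by scanning the rewards backwards.
        g = 0.0
        returns = []
        for _, r in reversed(episode):
            g = r + gamma * g
            returns.append(g)
        returns.reverse()
        # One gradient step per visited state, for t = 0, ..., T - 1.
        for (s, _), g_t in zip(episode, returns):
            error = g_t - v_hat(s, theta)
            grad = grad_v_hat(s, theta)
            theta = [t + alpha * error * d for t, d in zip(theta, grad)]
    return theta
```

For this particular random walk the true values are 1/6, 2/6, ..., 5/6, so the entries of `gradient_mc()` should end up close to those numbers. With one-hot features the method reduces to tabular Monte Carlo; richer features would only change `features` and `grad_v_hat`.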
Semi-gradient TD(0) for estimating v̂ ≈ v_π
Input: the policy π to be evaluated
Input: a differentiable function v̂ : S⁺ × ℝⁿ → ℝ such that v̂(terminal, ·) = 0
Initialize value-function weights θ arbitrarily (e.g., θ = 0)
Repeat (for each episode):
    Initialize S
    Repeat (for each step of episode):
        Choose A ~ π(·|S)
        Take action A, observe R, S′
        θ ← θ + α [R + γ v̂(S′, θ) − v̂(S, θ)] ∇v̂(S, θ)
        S ← S′
    until S′ is terminal
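A possible Python rendering of the semi-gradient TD(0) box is given below. It is only a sketch: the 5-state random walk, the `TERMINAL` marker, the one-hot features, and the hyperparameters are assumptions introduced for illustration and do not come from the slides. The point it shows is that the target R + γ v̂(S′, θ) is treated as a constant, so only ∇v̂(S, θ) appears in the update.

```python
import random

# Hypothetical toy task (not from the slides): a 5-state random walk with
# reward +1 on the right exit and 0 otherwise.
N_STATES = 5
TERMINAL = None  # hypothetical marker for the terminal state


def features(s):
    """One-hot feature vector x(s)."""
    return [1.0 if i == s else 0.0 for i in range(N_STATES)]


def v_hat(s, theta):
    """Linear value estimate, with v_hat(terminal, .) = 0 by construction."""
    if s is TERMINAL:
        return 0.0
    return sum(t * x for t, x in zip(theta, features(s)))


def step(s, a):
    """Apply action a in {-1, +1}; return (R, S')."""
    s_next = s + a
    if s_next < 0:
        return 0.0, TERMINAL
    if s_next >= N_STATES:
        return 1.0, TERMINAL
    return 0.0, s_next


def semi_gradient_td0(num_episodes=10000, alpha=0.01, gamma=1.0, seed=0):
    rng = random.Random(seed)
    theta = [0.0] * N_STATES
    for _ in range(num_episodes):
        s = N_STATES // 2  # Initialize S
        while s is not TERMINAL:
            a = rng.choice((-1, 1))  # Choose A ~ pi(.|S), uniform here
            r, s_next = step(s, a)   # Take action A, observe R, S'
            # Bootstrapped target; no gradient is taken through it.
            td_error = r + gamma * v_hat(s_next, theta) - v_hat(s, theta)
            grad = features(s)  # gradient of a linear v_hat is x(S)
            theta = [t + alpha * td_error * d for t, d in zip(theta, grad)]
            s = s_next
    return theta
```

On this task the weights returned by `semi_gradient_td0()` should approach the true values 1/6, ..., 5/6. Unlike the Monte Carlo variant, each update happens immediately after a single transition rather than at the end of the episode.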
Episodic Semi-gradient Sarsa for Estimating q̂ ≈ q*
Input: a differentiable function q̂ : S × A × ℝⁿ → ℝ
Initialize value-function weights θ ∈ ℝⁿ arbitrarily (e.g., θ = 0)
Repeat (for each episode):
    S, A ← initial state and action of episode (e.g., ε-greedy)
    Repeat (for each step of episode):
        Take action A, observe R, S′
        If S′ is terminal:
            θ ← θ + α [R − q̂(S, A, θ)] ∇q̂(S, A, θ)
            Go to next episode
        Choose A′ as a function of q̂(S′, ·, θ) (e.g., ε-greedy)
        θ ← θ + α [R + γ q̂(S′, A′, θ) − q̂(S, A, θ)] ∇q̂(S, A, θ)
        S ← S′
        A ← A′
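The Sarsa box above might be implemented roughly as follows. This is a hedged sketch rather than the lecture's implementation: instead of Mountain Car it uses a hypothetical 6-cell corridor (reward −1 per step, goal at the right end), one-hot features over (state, action) pairs, and illustrative names such as `epsilon_greedy` and `semi_gradient_sarsa`, all of which are assumptions made here to keep the example short and runnable.

```python
import random

# Hypothetical toy task (not from the slides): a corridor of 6 cells.
N_STATES = 6      # cells 0..5; cell 5 is the terminal goal
ACTIONS = (0, 1)  # 0 = left, 1 = right


def step(s, a):
    """Deterministic dynamics; returns (R, S', done). Reward is -1 per step."""
    s_next = max(0, s - 1) if a == 0 else s + 1
    return -1.0, s_next, s_next == N_STATES - 1


def features(s, a):
    """One-hot over (state, action) pairs, so q_hat is linear in theta."""
    x = [0.0] * (N_STATES * len(ACTIONS))
    x[s * len(ACTIONS) + a] = 1.0
    return x


def q_hat(s, a, theta):
    return sum(t * x for t, x in zip(theta, features(s, a)))


def epsilon_greedy(s, theta, epsilon, rng):
    """Random action with probability epsilon, else greedy w.r.t. q_hat."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_hat(s, a, theta))


def semi_gradient_sarsa(num_episodes=2000, alpha=0.05, gamma=1.0,
                        epsilon=0.1, seed=0):
    rng = random.Random(seed)
    theta = [0.0] * (N_STATES * len(ACTIONS))
    for _ in range(num_episodes):
        s = 0
        a = epsilon_greedy(s, theta, epsilon, rng)
        while True:
            r, s_next, done = step(s, a)
            grad = features(s, a)  # gradient of a linear q_hat is x(S, A)
            if done:
                target = r  # terminal transition: no bootstrap term
            else:
                a_next = epsilon_greedy(s_next, theta, epsilon, rng)
                target = r + gamma * q_hat(s_next, a_next, theta)
            error = target - q_hat(s, a, theta)
            theta = [t + alpha * error * d for t, d in zip(theta, grad)]
            if done:
                break  # go to next episode
            s, a = s_next, a_next
    return theta


def greedy_policy(theta):
    """Greedy action in each non-terminal cell."""
    return [max(ACTIONS, key=lambda a: q_hat(s, a, theta))
            for s in range(N_STATES - 1)]
```

Calling `greedy_policy(semi_gradient_sarsa())` should give "right" in every cell, since moving left only adds steps at −1 each. Because the next action A′ is sampled from the same ε-greedy policy that is being improved, the update stays on-policy.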
[Figure: MOUNTAIN CAR. Panels: Step 428, Episode 12, Episode 104, Episode 1000, Episode 9000. Axes: Position (−1.2 to 0.6) and Velocity (−.07 to .07); the Goal is marked. Numeric labels: 4, 27, 120, 104, 46.]