Monte Carlo Learning
Deep Reinforcement Learning and Control
Carnegie Mellon School of Computer Science
Lecture 4, CMU 10-403
Katerina Fragkiadaki
Used Materials Disclaimer: Much of the material and slides for this
Iterative policy evaluation:
$$v_{[k+1]}(s) = \sum_{a} \pi(a \mid s)\Big(r(s,a) + \gamma \sum_{s'} p(s' \mid s,a)\, v_{[k]}(s')\Big), \quad \forall s$$
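A minimal sketch of the policy-evaluation update above for a small tabular MDP, assuming the dynamics are stored as dense NumPy arrays: `P[s, a, s2]` holds $p(s' \mid s, a)$, `R[s, a]` holds $r(s, a)$ and `pi[s, a]` holds $\pi(a \mid s)$. The function name, the stopping tolerance and the two-state example MDP are hypothetical illustrations, not part of the lecture.

```python
import numpy as np

def policy_evaluation(P, R, pi, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Iterative policy evaluation with synchronous (two-array) backups.

    P[s, a, s2] = p(s2 | s, a), R[s, a] = r(s, a), pi[s, a] = pi(a | s).
    """
    v = np.zeros(P.shape[0])                 # v_0(s) = 0 for every state
    for _ in range(max_iters):
        q = R + gamma * (P @ v)              # q[s, a] = r(s, a) + gamma * sum_s' p(s'|s, a) v_k(s')
        v_new = np.sum(pi * q, axis=1)       # v_{k+1}(s) = sum_a pi(a|s) q[s, a]
        if np.max(np.abs(v_new - v)) < tol:  # stop once a sweep barely changes v
            return v_new
        v = v_new
    return v

# Hypothetical 2-state MDP: in state 0, action 0 moves to absorbing state 1
# with reward 1; action 1 stays in state 0 with reward 0.
P = np.zeros((2, 2, 2))
P[0, 0, 1] = P[0, 1, 0] = P[1, 0, 1] = P[1, 1, 1] = 1.0
R = np.array([[1.0, 0.0], [0.0, 0.0]])
uniform_pi = np.full((2, 2), 0.5)
v = policy_evaluation(P, R, uniform_pi, gamma=0.9)
# Under the uniform policy v(0) = 0.5 + 0.45 v(0), i.e. v(0) = 0.5 / 0.55, and v(1) = 0.
```

Computing all of `v_new` from the previous `v` mirrors the $v_{[k]} \to v_{[k+1]}$ indexing of the equation; an in-place variant that reuses fresh values within a sweep also converges but no longer matches that indexing exactly.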
Value iteration:
$$v_{[k+1]}(s) = \max_{a \in \mathcal{A}} \Big(r(s,a) + \gamma \sum_{s' \in \mathcal{S}} p(s' \mid s,a)\, v_{[k]}(s')\Big), \quad \forall s$$
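A minimal sketch of the value-iteration update above, under the same assumed array layout (`P[s, a, s2]` for $p(s' \mid s, a)$, `R[s, a]` for $r(s, a)$). The only change from policy evaluation is replacing the average over $\pi(a \mid s)$ with a max over actions. The function name, tolerance and example MDP are hypothetical.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Value iteration; P[s, a, s2] = p(s2 | s, a), R[s, a] = r(s, a)."""
    v = np.zeros(P.shape[0])
    for _ in range(max_iters):
        q = R + gamma * (P @ v)           # one-step lookahead for every (s, a)
        v_new = q.max(axis=1)             # v_{k+1}(s) = max_a q[s, a]
        converged = np.max(np.abs(v_new - v)) < tol
        v = v_new
        if converged:
            break
    greedy = (R + gamma * (P @ v)).argmax(axis=1)  # greedy policy w.r.t. the final v
    return v, greedy

# Hypothetical 2-state MDP: state 0, action 0 -> absorbing state 1 with reward 1;
# state 0, action 1 -> stay in state 0 with reward 0.
P = np.zeros((2, 2, 2))
P[0, 0, 1] = P[0, 1, 0] = P[1, 0, 1] = P[1, 1, 1] = 1.0
R = np.array([[1.0, 0.0], [0.0, 0.0]])
v_star, policy = value_iteration(P, R, gamma=0.9)
# v_star is [1.0, 0.0] and policy[0] == 0 (take the rewarding action).
```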
Both updates require the environment dynamics $p(s', r \mid s, a)$ to be known.
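The two-argument quantities $r(s,a)$ and $p(s' \mid s,a)$ used in the updates above are marginals of the four-argument dynamics $p(s', r \mid s, a)$. These are the standard identities, shown here as a short derivation:

```latex
r(s,a) = \sum_{s'} \sum_{r} r \, p(s', r \mid s, a),
\qquad
p(s' \mid s, a) = \sum_{r} p(s', r \mid s, a)
```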
Law of large numbers
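Approximate the expectation by the sample mean of i.i.d. samples: this estimator has the correct mean (it is unbiased), and by the law of large numbers it converges to the true expectation as the number of samples grows.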
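A small sketch of the two properties above, using a hypothetical target $\mathbb{E}[X^2] = 1/3$ for $X \sim \mathrm{Uniform}(0, 1)$: one large-sample estimate lands near $1/3$ (law of large numbers), and the average of many tiny 10-sample estimates also lands near $1/3$ (unbiasedness). The helper names and sample sizes are illustrative choices.

```python
import numpy as np

def mc_estimate(f, sample, n, rng):
    """Monte Carlo estimate of E[f(X)]: the sample mean of f over n i.i.d. draws of X."""
    return float(np.mean(f(sample(rng, n))))

rng = np.random.default_rng(0)
square = lambda x: x ** 2
sample_uniform = lambda rng, n: rng.uniform(0.0, 1.0, size=n)  # X ~ Uniform(0, 1), E[X^2] = 1/3

# Law of large numbers: a single large-sample estimate is close to 1/3.
big = mc_estimate(square, sample_uniform, 100_000, rng)

# Unbiasedness: each 10-sample estimate is noisy, but their average is close to 1/3.
small = [mc_estimate(square, sample_uniform, 10, rng) for _ in range(2_000)]
avg_small = float(np.mean(small))
```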
Note that:
The ratios of the target probabilities to the sampling probabilities are known as importance weights.
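The equation defining the weights did not survive extraction, so this sketch assumes the standard importance-sampling setting: samples come from a proposal density $q$, the expectation is under a target density $p$, and $\mathbb{E}_p[f(X)] = \mathbb{E}_q[f(X)\, p(X)/q(X)]$, so the weights are $p(x)/q(x)$. The Gaussian densities, helper names and sample size are hypothetical. It returns both the ordinary (plain average) estimate and the weighted (self-normalized) estimate.

```python
import numpy as np

def normal_pdf(x, mean, std):
    """Density of a Gaussian N(mean, std^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def importance_sampling(f, target_pdf, proposal_pdf, sample_proposal, n, rng):
    """Estimate E_p[f(X)] using n draws from the proposal q instead of p."""
    x = sample_proposal(rng, n)
    w = target_pdf(x) / proposal_pdf(x)             # importance weights p(x) / q(x)
    ordinary = float(np.mean(w * f(x)))             # unbiased
    weighted = float(np.sum(w * f(x)) / np.sum(w))  # self-normalized: biased but consistent
    return ordinary, weighted

rng = np.random.default_rng(0)
ordinary, weighted = importance_sampling(
    f=lambda x: x,                                   # estimate the mean of p
    target_pdf=lambda x: normal_pdf(x, 1.0, 1.0),    # p = N(1, 1), so E_p[X] = 1
    proposal_pdf=lambda x: normal_pdf(x, 0.0, 2.0),  # q = N(0, 2^2)
    sample_proposal=lambda rng, n: rng.normal(0.0, 2.0, size=n),
    n=200_000,
    rng=rng,
)
```

The proposal here is deliberately wider than the target. If $q$ had lighter tails than $p$, the weights $p(x)/q(x)$ could have infinite variance and both estimates would become unreliable.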
Every-visit: $\mathcal{T}(s)$ is the set of all time steps in which state $s$ is visited.
$T(t)$ is the first time of termination following time $t$.
$G_t$ is the return after $t$ up through $T(t)$.
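A minimal sketch of how these quantities combine in off-policy every-visit Monte Carlo prediction. It assumes the standard estimators. The ordinary one averages $\rho_{t:T(t)-1} G_t$ over $t \in \mathcal{T}(s)$, and the weighted one divides by $\sum_{t \in \mathcal{T}(s)} \rho_{t:T(t)-1}$ instead. Here $\rho_{t:T(t)-1}$ is the product of $\pi(A_k \mid S_k)/b(A_k \mid S_k)$ for steps $t$ to $T(t)-1$, and $G_t$ is the (discounted) return from $t$ to $T(t)$. The episode format and the names `target_pi` and `behavior_b` are hypothetical. Episodes are processed one at a time, so $T(t)$ is simply the terminal step of the episode containing $t$.

```python
from collections import defaultdict

def off_policy_every_visit_mc(episodes, target_pi, behavior_b, gamma=1.0):
    """episodes: list of episodes generated by behaviour policy b, each a list of
    (S_t, A_t, R_{t+1}) tuples. target_pi(a, s) and behavior_b(a, s) return
    action probabilities. Returns ordinary and weighted IS estimates of v_pi(s)."""
    num = defaultdict(float)   # sum over t in T(s) of rho_{t:T(t)-1} * G_t
    count = defaultdict(int)   # |T(s)|, the number of visits to s
    wsum = defaultdict(float)  # sum over t in T(s) of rho_{t:T(t)-1}
    for episode in episodes:
        g, rho = 0.0, 1.0
        # Walk backwards from the episode's terminal time T(t): g becomes G_t and
        # rho becomes the product of pi/b ratios for steps t .. T(t)-1.
        for s, a, r in reversed(episode):
            g = r + gamma * g
            rho *= target_pi(a, s) / behavior_b(a, s)
            num[s] += rho * g
            count[s] += 1
            wsum[s] += rho
    ordinary = {s: num[s] / count[s] for s in count}
    weighted = {s: (num[s] / wsum[s] if wsum[s] > 0 else 0.0) for s in count}
    return ordinary, weighted

# Hypothetical example: the target policy always picks "L", the behaviour policy is uniform.
target_pi = lambda a, s: 1.0 if a == "L" else 0.0
behavior_b = lambda a, s: 0.5
episodes = [[("s0", "L", 0.0), ("s1", "L", 1.0)], [("s0", "R", 0.0)]]
ordinary, weighted = off_policy_every_visit_mc(episodes, target_pi, behavior_b)
# True v_pi(s0) = 1. Here ordinary["s0"] == 2.0 and weighted["s0"] == 1.0.
```

With few episodes the ordinary estimate can overshoot badly because a single $\rho$ can be large. The weighted estimate stays within the range of observed returns at the cost of some bias.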