Hindsight Credit Assignment
Anna Harutyunyan, Will Dabney, Thomas Mesnard, Mo Azar, Bilal Piot, Nicolas Heess, Hado van Hasselt, Greg Wayne, Satinder Singh, Doina Precup, Remi Munos
NeurIPS 2019
How did past actions influence future outcomes?
RL relies on MDP structure, and takes time as main proxy for credit relevance
[Figure: Credit plotted against Time, with example events Meeting at 4pm, Lunch, and Umbrella]
Instead of only relying on MDP assumptions, let’s learn credit relevance explicitly!
[Diagram: timeline from past to future linking an action to an outcome, with nodes x, y, and z and outcome labels State and Return]
Hindsight Credit Assignment
How relevant was a to get to a state X_k?
How relevant was a to achieve the return Z?
HCA Algorithms: Learn the hindsight distribution P, and use it to better estimate value functions or policy gradients
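To make the idea of reweighting outcomes by a hindsight distribution concrete, here is a minimal sketch for a one-step problem with a discrete return. It is an illustration under stated assumptions, not the algorithm from the talk: it assumes the hindsight distribution is the probability of the action given the observed return, h(a | z), obtained by Bayes' rule, and it computes everything exactly from a known model rather than learning h from data. The problem, its numbers, and all function names are hypothetical.

```python
# Hypothetical one-step problem; every number here is made up for illustration.
policy = [0.7, 0.3]                # pi(a) for two actions
returns = [0.0, 1.0, 5.0]          # possible values of the return Z
p_z_given_a = [[0.5, 0.5, 0.0],    # P(Z = returns[i] | a = 0)
               [0.2, 0.3, 0.5]]    # P(Z = returns[i] | a = 1)


def hindsight_distribution(policy, p_z_given_a):
    """Return (h, p_z) with h[a][i] = P(a | Z = returns[i]) via Bayes' rule."""
    n_actions = len(policy)
    n_returns = len(p_z_given_a[0])
    # Marginal probability of each return when acting with the policy.
    p_z = [sum(policy[a] * p_z_given_a[a][i] for a in range(n_actions))
           for i in range(n_returns)]
    # Posterior over actions given the observed return (assumes p_z[i] > 0).
    h = [[policy[a] * p_z_given_a[a][i] / p_z[i] for i in range(n_returns)]
         for a in range(n_actions)]
    return h, p_z


def hindsight_q(policy, p_z_given_a, returns):
    """Action values from on-policy returns reweighted by h(a | z) / pi(a)."""
    h, p_z = hindsight_distribution(policy, p_z_given_a)
    return [sum(p_z[i] * (h[a][i] / policy[a]) * returns[i]
                for i in range(len(returns)))
            for a in range(len(policy))]


def direct_q(p_z_given_a, returns):
    """Action values computed directly as E[Z | a], for comparison."""
    return [sum(p * z for p, z in zip(row, returns)) for row in p_z_given_a]


if __name__ == "__main__":
    print("direct:   ", direct_q(p_z_given_a, returns))
    print("hindsight:", hindsight_q(policy, p_z_given_a, returns))
```

The two estimates agree because h(a | z) / pi(a) equals P(z | a) / P(z), so returns sampled under the policy can be reused to evaluate every action, including ones that were not taken. In this toy setup the ratio for action 0 at Z = 5 is zero, since that action can never produce that return.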
Experiments
Thank you for your attention! Poster #204 :)