
Dec 31, 2023


SLIDE 1

Policy Continuation with Hindsight Inverse Dynamics

Hao Sun¹, Zhizhong Li¹, Xiaotong Liu², Dahua Lin¹, Bolei Zhou¹

¹ The Chinese University of Hong Kong  ² Peking University

sh018@ie.cuhk.edu.hk

SLIDE 2

Goal-Oriented, Reward-Sparse Tasks

[Figure: a start position and a goal position]
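The slide does not define the reward itself. A minimal sketch of what "reward-sparse" usually means in goal-conditioned benchmarks, assuming the common convention of 0.0 on success and -1.0 otherwise, with a hypothetical distance tolerance `eps`:

```python
import math
from typing import Sequence

def sparse_reward(achieved_goal: Sequence[float], desired_goal: Sequence[float], eps: float = 0.05) -> float:
    """Binary goal-conditioned reward (assumed convention): 0.0 within tolerance, -1.0 otherwise."""
    distance = math.dist(achieved_goal, desired_goal)  # Euclidean distance (Python 3.8+)
    return 0.0 if distance < eps else -1.0

print(sparse_reward([0.0, 0.0], [0.01, 0.0]))  # 0.0 (within tolerance)
print(sparse_reward([0.0, 0.0], [1.0, 0.0]))   # -1.0 (every miss looks the same)
```

Because every failed attempt receives the same -1.0, a randomly exploring agent gets almost no signal about which direction is better, which is what makes these tasks hard.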

SLIDE 3

Inspirations from Human Learning

1. Learning from failures

[Hindsight Experience Replay, M. Andrychowicz et al., 2017]

[Figure: aimed goal vs. achieved goal]
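"Learning from failures" refers to hindsight relabeling. As a rough sketch (not the authors' code), HER's "final" strategy can be written as follows: a failed episode is replayed as if the goal it actually achieved at the end had been the target all along. The `Transition` layout, `reward_fn`, and its tolerance are hypothetical choices for illustration.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

Goal = Tuple[float, ...]

@dataclass(frozen=True)
class Transition:
    state: Tuple[float, ...]
    action: int
    next_achieved: Goal  # goal actually achieved after taking the action
    goal: Goal           # goal the agent was aiming for
    reward: float

def reward_fn(achieved: Goal, goal: Goal, eps: float = 1e-6) -> float:
    # Sparse reward (assumed convention): 0.0 on success, -1.0 otherwise.
    return 0.0 if max(abs(a - g) for a, g in zip(achieved, goal)) < eps else -1.0

def relabel_final(episode: List[Transition]) -> List[Transition]:
    """Pretend the goal reached at the end of the episode was the aimed goal."""
    hindsight_goal = episode[-1].next_achieved
    return [
        replace(t, goal=hindsight_goal, reward=reward_fn(t.next_achieved, hindsight_goal))
        for t in episode
    ]
```

For an episode that aimed at (5.0,) but stopped at (2.0,), every original reward is -1.0; after relabeling, the last transition earns 0.0, so the failure becomes a success for a different goal.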

SLIDE 4

Inspirations from Human Learning

1. Learning from failures

[Hindsight Experience Replay, M. Andrychowicz et al., 2017]

[Figure: aimed goal vs. achieved goal]

SLIDE 5

Inspirations from Human Learning

1. Learning from failures
2. Extrapolating from success

[Figure: learned region vs. extrapolated region]

SLIDE 6

Our Proposed Method

1. Hindsight
2. Extrapolate
3. Policy Continuation

[Figure: inverse dynamics (ID) and hindsight inverse dynamics (HID)]

SLIDE 7

Equip Inverse Dynamics with Hindsight

Inverse Dynamics:

[Figure: State, Goal]

Hindsight Inverse Dynamics:
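One way to read this slide, assuming the usual inverse-dynamics setup in which a model predicts a_t from (s_t, s_{t+1}): replace the next state with the goal it achieves, giving supervised pairs ((s_t, g_{t+1}), a_t). The sketch builds such 1-step pairs from a single trajectory and fits a lookup table as a stand-in for the learned policy. `achieved_goal` and `fit_tabular_policy` are hypothetical helpers; a real implementation would train a neural network instead.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Tuple

State = Hashable
Goal = Hashable

def achieved_goal(state: State) -> Goal:
    # Hypothetical state-to-goal mapping; here the goal space is the state space.
    return state

def build_hid_dataset(trajectory: List[State], actions: List[int]) -> List[Tuple[Tuple[State, Goal], int]]:
    """Turn s_0..s_T with actions a_0..a_{T-1} into 1-step pairs ((s_t, g_{t+1}), a_t)."""
    assert len(trajectory) == len(actions) + 1
    return [
        ((trajectory[t], achieved_goal(trajectory[t + 1])), actions[t])
        for t in range(len(actions))
    ]

def fit_tabular_policy(dataset: List[Tuple[Tuple[State, Goal], int]]) -> Dict[Tuple[State, Goal], int]:
    """Stand-in for supervised learning: majority action for each (state, goal) input."""
    votes: Dict[Tuple[State, Goal], Counter] = defaultdict(Counter)
    for inputs, action in dataset:
        votes[inputs][action] += 1
    return {inputs: counts.most_common(1)[0][0] for inputs, counts in votes.items()}
```

For example, on a 1-D line, `build_hid_dataset([0, 1, 2, 1], [1, 1, -1])` yields ((0, 1), 1), ((1, 2), 1) and ((2, 1), -1). Note that such a policy only knows goals one step away, which is the limitation the next slide raises.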

SLIDE 8

1-step HID Is Not Enough

[Figure: 1-step HID in a linear case vs. a non-linear case]

SLIDE 9

Multi-step Optimality?

Policy Continuation: Test the optimality recursively

[Figure: a 2-step transition, step 1 then step 2]

In 1 step?

SLIDE 10

Multi-step Optimality?

Policy Continuation: Test the optimality recursively

[Figure: a 2-step transition, step 1 then step 2]

In 1 step?

[Figure: a k-step transition, step 1 through step k]

In fewer than k steps?
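The recursive test can be sketched on a toy problem. This sketch assumes a deterministic environment that can be reset to any recorded state (a hypothetical 1-D line where actions -1, 0 and +1 move the agent, with goals equal to states) and uses a lookup table in place of the learned policy. For each horizon k, a recorded segment from s_t to the state reached at t+k becomes a new training pair (s_t, g_{t+k}) with target a_t only if the policy learned from shorter horizons cannot already reach that goal in fewer than k steps. At k = 1 this reduces to 1-step HID.

```python
from typing import Dict, List, Tuple

State = int
# Tabular stand-in for the learned policy: (state, goal) -> action.
Policy = Dict[Tuple[State, State], int]

def step(state: State, action: int) -> State:
    # Hypothetical deterministic 1-D environment.
    return state + action

def reaches_within(policy: Policy, start: State, goal: State, max_steps: int) -> bool:
    """Reset to `start`, roll out `policy`, and report whether `goal` is hit within `max_steps`."""
    state = start
    for _ in range(max_steps):
        if state == goal:
            return True
        action = policy.get((state, goal))
        if action is None:  # the policy has never seen this (state, goal) input
            return False
        state = step(state, action)
    return state == goal

def policy_continuation(trajectory: List[State], actions: List[int], max_k: int) -> Policy:
    """Grow the policy horizon by horizon; keep a k-step segment only if it fails the shorter test."""
    assert len(trajectory) == len(actions) + 1
    policy: Policy = {}
    for k in range(1, max_k + 1):
        new_pairs: Policy = {}
        for t in range(len(actions) - k + 1):
            start, goal = trajectory[t], trajectory[t + k]
            # Test with the policy from horizons 1..k-1, frozen during this pass.
            if not reaches_within(policy, start, goal, k - 1):
                new_pairs.setdefault((start, goal), actions[t])
        # Merge, keeping any answer already learned at a shorter horizon.
        for key, action in new_pairs.items():
            policy.setdefault(key, action)
    return policy
```

On the trajectory 0, 1, 0, 1, 2 with actions +1, -1, +1, +1, the 1-step pass learns (0, 1), (1, 0) and (1, 2). At k = 2 only the segment from 0 to 2 survives the test, because nothing learned so far reaches 2 from 0 in one step, so (0, 2) is added with action +1. All longer segments are already solvable in fewer steps and are skipped.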

SLIDE 11

East Exhibition Hall B + C #194