
Policy Continuation with Hindsight Inverse Dynamics (PowerPoint presentation)

  1. Policy Continuation with Hindsight Inverse Dynamics. Hao Sun¹, Zhizhong Li¹, Xiaotong Liu², Dahua Lin¹, Bolei Zhou¹. ¹The Chinese University of Hong Kong, ²Peking University. sh018@ie.cuhk.edu.hk

  2. Goal-Oriented, Reward-Sparse Tasks (figure labels: Start, Goal)
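The "reward sparse" setting of slide 2 can be made concrete with a minimal sketch. It assumes the 0 / -1 success-within-tolerance reward used by common goal-based robotics benchmarks; the slides do not specify the reward, and `sparse_goal_reward` and `tol` are hypothetical names.

```python
import math
from typing import Sequence


def sparse_goal_reward(achieved: Sequence[float], goal: Sequence[float], tol: float = 0.05) -> float:
    """Assumed convention: 0.0 if the achieved goal lies within `tol` (Euclidean
    distance) of the desired goal, -1.0 otherwise."""
    return 0.0 if math.dist(achieved, goal) < tol else -1.0


# Every step that misses the goal returns the same -1.0, so an agent wandering
# from Start gets no signal about which direction leads toward Goal.
print(sparse_goal_reward([0.0, 0.0], [1.0, 1.0]))   # -1.0
print(sparse_goal_reward([0.99, 1.0], [1.0, 1.0]))  # 0.0
```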

  3. Inspirations from Human Learning: 1. Learning from failures [Hindsight Experience Replay, M. Andrychowicz et al., 2017] (figure labels: Aimed, Achieved)
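A minimal sketch of the hindsight relabeling idea behind slide 3, using the "final" strategy from the HER paper: the aimed goal of a failed episode is replaced by the goal actually achieved at its end, and rewards are recomputed. The grid states, `Transition`, `reward`, and `her_relabel_final` are hypothetical illustrations. HER also describes other strategies (e.g. "future") and keeps the original transitions in the replay buffer alongside the relabeled ones.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

State = Tuple[int, int]  # hypothetical grid position; achieved goal == state here


@dataclass(frozen=True)
class Transition:
    state: State
    action: str
    next_state: State
    goal: State
    reward: float


def reward(achieved: State, goal: State) -> float:
    # Assumed sparse reward: 0 on success, -1 otherwise.
    return 0.0 if achieved == goal else -1.0


def her_relabel_final(episode: List[Transition]) -> List[Transition]:
    """Pretend the state the episode ended in ("Achieved") was the goal
    all along instead of the original one ("Aimed"), and recompute rewards."""
    achieved = episode[-1].next_state
    return [replace(t, goal=achieved, reward=reward(t.next_state, achieved)) for t in episode]


aimed = (3, 3)
path = [(0, 0), (1, 0), (2, 0)]  # the agent never reaches (3, 3)
episode = [Transition(s, "right", s2, aimed, reward(s2, aimed)) for s, s2 in zip(path, path[1:])]
relabeled = her_relabel_final(episode)
print([t.reward for t in episode])    # [-1.0, -1.0]
print([t.reward for t in relabeled])  # [-1.0, 0.0]
```

The failed episode carries no success signal, while its relabeled copy contains one successful transition for the goal (2, 0).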


  5. Inspirations from Human Learning: 1. Learning from failures; 2. Extrapolating success (figure labels: Learned, Extrapolate)

  6. Our Proposed Method: 1. Hindsight, 2. Extrapolate, 3. Policy Continuation (diagram labels: ID, HID)

  7. Equip Inverse Dynamics with Hindsight: Inverse Dynamics vs. Hindsight Inverse Dynamics (figure labels: State, Goal)
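Inverse dynamics predicts the action that links a state to its next state. Reading the slide's State/Goal labels, the hindsight version instead conditions on the goal that the next state happened to achieve. A minimal sketch of building such 1-step training pairs, assuming discrete states, an `achieved_goal` function mapping a state to the goal it satisfies, and a majority-vote lookup table standing in for the learned policy network. `hid_pairs` and `fit_table` are hypothetical names.

```python
from collections import Counter, defaultdict
from typing import Any, Callable, Dict, List, Tuple

Pair = Tuple[Tuple[Any, Any], Any]  # ((state, hindsight goal), action)


def hid_pairs(states: List[Any], actions: List[Any],
              achieved_goal: Callable[[Any], Any]) -> List[Pair]:
    """Input (s_t, g_{t+1}), where g_{t+1} is the goal achieved at s_{t+1}; label a_t."""
    if len(states) != len(actions) + 1:
        raise ValueError("a trajectory needs exactly one more state than actions")
    return [((states[t], achieved_goal(states[t + 1])), actions[t]) for t in range(len(actions))]


def fit_table(pairs: List[Pair]) -> Dict[Tuple[Any, Any], Any]:
    """Stand-in for supervised learning: most frequent action per (state, goal) input."""
    votes: Dict[Tuple[Any, Any], Counter] = defaultdict(Counter)
    for key, action in pairs:
        votes[key][action] += 1
    return {key: counts.most_common(1)[0][0] for key, counts in votes.items()}


# Toy 1-D chain: states are integers, actions are -1 / +1, achieved goal = state.
states = [0, 1, 2, 1]
actions = [1, 1, -1]
pairs = hid_pairs(states, actions, achieved_goal=lambda s: s)
policy = fit_table(pairs)
print(pairs)            # [((0, 1), 1), ((1, 2), 1), ((2, 1), -1)]
print(policy[(2, 1)])   # -1
```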

  8. 1-step HID Is Not Enough (figure panels: Linear Case, Non-linear Case, 1-step HID Case)

  9. Multi-step Optimality? Policy Continuation: test the optimality recursively. (Figure labels: step 1, step 2; "In 1 step?")

  10. Multi-step Optimality? Policy Continuation: test the optimality recursively. (Figure labels: step 1, step 2, step k; "In 1 step?"; "In less than k-1 steps?")
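A toy sketch of the recursive test on slides 9 and 10, under loud assumptions: a deterministic 1-D chain, a lookup table standing in for the learned policy, and our reading of the slides as "keep a k-step segment (s_t, g_{t+k}, a_t) only if the current policy cannot already reach g_{t+k} from s_t in at most k-1 steps". All names (`step`, `reaches_within`, `policy_continuation`) are hypothetical, and this is not presented as the authors' implementation.

```python
from typing import Dict, List, Tuple

State = int
Action = int
Policy = Dict[Tuple[State, State], Action]  # (state, goal) -> action


def step(s: State, a: Action) -> State:
    # Hypothetical deterministic 1-D chain: action -1 or +1 moves one cell.
    return s + a


def reaches_within(policy: Policy, s: State, g: State, n: int) -> bool:
    """Roll out `policy` from s; True if goal g is achieved within n steps."""
    for _ in range(n + 1):
        if s == g:
            return True
        a = policy.get((s, g))
        if a is None:  # the table has no action for this input
            return False
        s = step(s, a)
    return False


def policy_continuation(states: List[State], actions: List[Action], max_k: int) -> Policy:
    policy: Policy = {}  # pi_0: empty, reaches a goal only if already there
    for k in range(1, max_k + 1):
        new_entries: Policy = {}
        for t in range(len(actions) - k + 1):
            s, g, a = states[t], states[t + k], actions[t]
            if (s, g) in policy:
                continue  # our choice: never overwrite a shorter-horizon entry
            # Recursive test: can pi_{k-1} already do it in at most k-1 steps?
            if not reaches_within(policy, s, g, k - 1):
                new_entries[(s, g)] = a  # treat segment as (approximately) k-step optimal
        policy.update(new_entries)  # pi_k = pi_{k-1} plus the accepted k-step data
    return policy


# A wandering trajectory 0 -> 1 -> 2 -> 1 -> 2 -> 3 on the chain.
policy = policy_continuation([0, 1, 2, 1, 2, 3], [1, 1, -1, 1, 1], max_k=3)
print(sorted(policy.items()))
```

In this example the looping segments 1 -> 2 -> 1 and 2 -> 1 -> 2 are rejected, because their end goal is already satisfied at their start (0 steps), while the direct segments 0 -> 2 and 1 -> 3 are kept. With a table, `reaches_within` fails on any unseen (state, goal) input; a learned function approximator could generalize there.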

  11. East Exhibition Hall B + C #194
