Learning Symmetric and Low-energy Locomotion
Wenhao Yu, Greg Turk, C. Karen Liu
Georgia Institute of Technology
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3591461/
https://www.youtube.com/watch?v=Wz3beWec7D4
[Robotis OP2]
[Yin et al. 2009]
[OptiTrack] [Liu et al. 2015]
No finite state machines (FSM), no motion capture (mocap)
[Tan et al. 2011] [Tan et al. 2014]
[Heess et al. 2017] [Peng et al. 2018]
Deep Reinforcement Learning
[Videos: policies learned with deep reinforcement learning at target speeds of 1.0 m/s and 3.0 m/s]
Approach: Deep Reinforcement Learning + Energy Minimization + Gait Symmetry
State, Action, Transition, Reward
Markov Decision Process: {States, Actions, Transition, Reward}
Control Policy π: maps state s to action a
Rollout: [s_0, a_0, s_1, a_1, ..., s_T]
Reinforcement Learning: find the policy π that maximizes the expected total reward of a rollout
Deep Reinforcement Learning: represent π with a deep neural network
Policy Gradient: π_new = π + α · ∂L_RL(π)/∂π
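As a rough illustration of the update the slide denotes, here is a REINFORCE-style policy-gradient step in Python; `grad_log_prob` is a hypothetical user-supplied function, and the talk's experiments use a more sophisticated policy-gradient method, not this plain estimator:

```python
import numpy as np

def policy_gradient_step(theta, rollouts, grad_log_prob, learning_rate=1e-2):
    """One REINFORCE-style update: theta_new = theta + alpha * dL_RL/dtheta."""
    grad = np.zeros_like(theta)
    for states, actions, rewards in rollouts:
        ret = float(np.sum(rewards))            # total reward of this rollout
        for s, a in zip(states, actions):
            # Score-function estimator: return-weighted grad of log pi(a|s)
            grad += ret * grad_log_prob(theta, s, a)
    grad /= len(rollouts)
    return theta + learning_rate * grad         # ascend the RL objective L_RL
```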
Transition T(·): physics simulation
Reward: R(s, a) = −w_v ‖v − v̄‖ − w_a ‖a‖_1 − w_L · LateralDeviation − w_T · TorsoRotation + AliveBonus
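A minimal sketch of this reward in Python; the weight values below are hypothetical placeholders, since the slide omits them:

```python
import numpy as np

# Hypothetical weights; the slide does not give values for w_v, w_a, w_L, w_T.
W_V, W_A, W_L, W_T, ALIVE_BONUS = 3.0, 0.1, 1.0, 1.0, 4.0

def reward(forward_velocity, target_velocity, action,
           lateral_deviation, torso_rotation):
    """R(s, a) from the slide: track the target speed, penalize actuation
    (L1 norm of the action, a proxy for energy), lateral drift, and torso
    rotation; a constant bonus rewards staying upright."""
    return (-W_V * abs(forward_velocity - target_velocity)
            - W_A * float(np.sum(np.abs(action)))
            - W_L * lateral_deviation
            - W_T * torso_rotation
            + ALIVE_BONUS)
```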
✓ Deep Reinforcement Learning
Next: Energy Minimization, via Curriculum Learning
[Venn diagram: among all possible gaits, the ones we want keep balance, go forward, and are energy efficient; a typical RL result keeps balance and goes forward but is only somewhat energy efficient; curriculum learning shrinks the search space toward the gaits we want]
Keep Balance, Go Forward, Energy Efficient
https://www.youtube.com/watch?v=5BiesFPqYWE
https://www.youtube.com/watch?v=5a21xnaqCzk
https://www.youtube.com/watch?v=LjnZHbFpDxc
Assistance: lateral push walls, waist support, propel-forward treadmill
With Assistance
Iteration i
[Plot: propel strength vs. time within a rollout, starting at (0, 0); at iteration i+1 the assistance curve is lowered]
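A sketch of one possible curriculum schedule matching the plots: assistance ramps up from (0, 0) within a rollout, and its ceiling is lowered each learning iteration. All names and constants here are assumptions; the actual method adapts the schedule to learning progress:

```python
def assistance_strength(iteration, t, init_strength=200.0,
                        decay=0.8, ramp_time=1.0):
    """Hypothetical curriculum schedule: within a rollout the assisting
    force ramps up from (0, 0); across learning iterations its ceiling is
    lowered geometrically until the character walks unassisted."""
    ceiling = init_strength * decay ** iteration  # weaker every iteration
    return min(t / ramp_time, 1.0) * ceiling      # ramp within the rollout
```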
✓ Deep Reinforcement Learning ✓ Energy Minimization (Curriculum Learning); next: Gait Symmetry
Flip
[Plot: left hand height and right hand height over time]
Asymmetry = (difference between the left hand height and right hand height curves over time)
Reward -= Asymmetry
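One plausible way to turn the plotted left/right mismatch into the `Reward -= Asymmetry` penalty; the half-cycle shift and the weight `W_SYM` are assumptions, not the slide's exact formula:

```python
import numpy as np

W_SYM = 1.0  # assumed weight on the symmetry penalty

def asymmetry(left_hand_height, right_hand_height, half_period):
    """Compare the left-hand height curve with the right-hand curve shifted
    by half a gait cycle; for a symmetric gait the two curves coincide."""
    left = np.asarray(left_hand_height)
    right = np.roll(np.asarray(right_hand_height), half_period)
    return W_SYM * float(np.mean((left - right) ** 2))

# Per the slide: reward -= asymmetry(left, right, half_period)
```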
[Resulting gaits: symmetric]
Mirror symmetry: action = π(state), action_mirror = π(Mirror(state)); a symmetric policy satisfies action = Mirror(action_mirror)
minimize: −L_RL   subject to: action = Mirror(action_mirror)   (*weights omitted for clarity)
As an unconstrained penalty: minimize −L_RL + ‖action − Mirror(action_mirror)‖²   (*weights omitted for clarity)
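A sketch of the penalty term above; `mirror_state` and `mirror_action` are hypothetical helpers that swap left/right quantities and negate lateral ones, and `W_MIRROR` is an assumed weight:

```python
import numpy as np

W_MIRROR = 4.0  # assumed weight; the slide omits weights for clarity

def mirror_symmetry_loss(policy, states, mirror_state, mirror_action):
    """Penalty from the slide: mean of ||action - Mirror(action_mirror)||^2,
    with action = policy(s) and action_mirror = policy(Mirror(s))."""
    total = 0.0
    for s in states:
        action = policy(s)                          # pi(s)
        action_mirror = policy(mirror_state(s))     # pi(Mirror(s))
        total += float(np.sum((action - mirror_action(action_mirror)) ** 2))
    return W_MIRROR * total / len(states)

# Full objective on the slide: minimize  -L_RL + mirror_symmetry_loss(...)
```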
✓ Deep Reinforcement Learning ✓ Energy Minimization (Curriculum Learning) ✓ Gait Symmetry
Running, Stylized
Walking, Running, Walking backward
Trotting, Galloping
Slow, Fast
Ball Walking
What's Next?
- Biomechanics-based models
- Extend to more agile motions such as gymnastics
- Running on real hardware
Thank you!
- Paper: https://arxiv.org/abs/1801.08093
- Code: https://github.com/VincentYu68/SymmetryCurriculumLocomotion
- Website: wenhaoyu.weebly.com
- Email: wenhaoyu@gatech.edu