Learning to Control Complex Human Motions Using Reinforcement Learning
Libin Liu DeepMotion Inc
http://libliu.info http://deepmotion.com
Physics-based Character Animation
Motion Controller, Control Signal, Physics Engine, Character
[Gang Beasts] [Totally Accurate Battle Simulator]
[Hodgins et al. 1995] [Tan et al. 2014] [Coros et al. 2010] [Peng et al. 2017] [Mordatch et al. 2010] SIMBICON [Yin et al. 2007]
[Figure: repeated sampling stages from Start through Start1, Start2, ..., Startn to End]
Particle filtering / Sequential Monte Carlo
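As a rough illustration of the sequential Monte Carlo idea named on this slide, here is a minimal sketch on a toy 1-D tracking problem. The dynamics, the Gaussian weighting, the particle count, and the name `smc_track` are all assumptions made for illustration; this is not the sampling procedure from the talk.

```python
import math
import random


def smc_track(reference, n_particles=50, noise=0.5, seed=0):
    """Track a 1-D reference trajectory with a toy sequential Monte Carlo loop."""
    rng = random.Random(seed)
    # All particles start at the first reference state.
    particles = [reference[0]] * n_particles
    best_path = [reference[0]]
    for target in reference[1:]:
        # Propagate: each particle moves halfway to the target plus random noise
        # (a stand-in for simulating a randomly perturbed action).
        proposals = [p + 0.5 * (target - p) + rng.gauss(0.0, noise) for p in particles]
        # Weight: proposals that end closer to the reference get larger weights.
        weights = [math.exp(-((x - target) ** 2)) for x in proposals]
        # Resample: draw the next generation in proportion to the weights.
        particles = rng.choices(proposals, weights=weights, k=n_particles)
        # Record the highest-weight proposal of this step.
        best_path.append(max(zip(weights, proposals))[1])
    return best_path


reference = [math.sin(0.1 * i) for i in range(20)]
path = smc_track(reference)
```

Each step keeps many candidate continuations alive and lets the better ones multiply, which is the property that makes this family of methods attractive for tracking a reference motion.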
[Figure: a reference trajectory over time, with the state and the actions (PD-control targets)]
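Since the actions here are PD-control targets, a small sketch of what a PD controller does with such a target may help. The gains, the unit inertia, the time step, and the function names are hypothetical values chosen for illustration; the talk does not specify them.

```python
def pd_torque(q, qdot, q_target, kp=300.0, kd=30.0):
    """Proportional-derivative torque pulling joint angle q toward q_target."""
    return kp * (q_target - q) - kd * qdot


def simulate_joint(q_target, steps=2000, dt=0.001, inertia=1.0):
    """Integrate a single 1-DoF joint driven by the PD torque (semi-implicit Euler)."""
    q, qdot = 0.0, 0.0
    for _ in range(steps):
        tau = pd_torque(q, qdot, q_target)
        qdot += (tau / inertia) * dt  # update velocity first
        q += qdot * dt                # then position, using the new velocity
    return q


final_angle = simulate_joint(q_target=1.0)
```

The controller only sees a target angle, so choosing the sequence of targets over time is what shapes the resulting motion.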
Control Trajectory Simulation
Multiple Open-loop Solutions Feedback Policy
[Figure: SAMCON]
Control Graph
Control Graph Motion Graph
Reference Basin of attraction Simulation
[Mnih et al. 2015, DQN]
[Figure: Q-network. Input: state. Two fully connected hidden layers of 300 ReLUs each, max(0, ·). Output: Q-values.]
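The network on these slides (state in, two fully connected layers of 300 ReLUs, Q-values out) can be sketched as a plain forward pass. The state dimension, the number of actions, the random initialization, and every function name below are assumptions for illustration only; training is omitted.

```python
import random


def relu(v):
    """Element-wise max(0, x)."""
    return [max(0.0, x) for x in v]


def dense(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def init_layer(n_in, n_out, rng):
    """Random weights scaled by the fan-in, zero biases."""
    scale = (2.0 / n_in) ** 0.5
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def make_q_network(state_dim, n_actions, hidden=300, seed=0):
    """Two hidden layers of `hidden` units and a linear output layer."""
    rng = random.Random(seed)
    return [
        init_layer(state_dim, hidden, rng),
        init_layer(hidden, hidden, rng),
        init_layer(hidden, n_actions, rng),
    ]


def q_values(network, state):
    """Map a state vector to one Q-value per action."""
    h = state
    for weights, bias in network[:-1]:
        h = relu(dense(h, weights, bias))
    weights, bias = network[-1]
    return dense(h, weights, bias)  # no activation on the output layer


def greedy_action(network, state):
    """Index of the action with the largest Q-value."""
    q = q_values(network, state)
    return max(range(len(q)), key=q.__getitem__)


network = make_q_network(state_dim=10, n_actions=4)
action = greedy_action(network, [0.1] * 10)
```

Picking the arg-max over the outputs is the standard way a Q-network is turned into a discrete choice among actions.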
actions:
Penalty
In-sequence action Out-of-sequence action
Action Sequence
Open-loop Control Fragments Feedback-augmented Fragments
Motion Clip Open-loop Tracking Control Feedback Policy Feedback Policy Control Scheduler
Libin Liu, Michiel Van De Panne, and Kangkang Yin. 2016. Guided Learning of Control Graphs for Physics-Based Characters. ACM Trans. Graph. 35, 3, Article 29 (May 2016), 14 pages.
Libin Liu and Jessica Hodgins. 2017. Learning to Schedule Control Fragments for Physics-Based Characters Using Deep Q-Learning. ACM Trans. Graph. 36, 3, Article 29 (June 2017), 14 pages.
[Peng et al. 2017, DeepLoco] [Heess et al. 2017] [Holden et al. 2017]
Libin Liu http://libliu.info DeepMotion Inc http://deepmotion.com