Learning to Control Complex Human Motions Using Reinforcement Learning
Libin Liu (http://libliu.info), DeepMotion Inc (http://deepmotion.com). PowerPoint presentation transcript.


slide-1
SLIDE 1

Learning to Control Complex Human Motions Using Reinforcement Learning

Libin Liu DeepMotion Inc

1

http://libliu.info http://deepmotion.com

slide-2
SLIDE 2

Physics-based Character Animation

Motion Controller → (Control Signal) → Physics Engine → Character Animation

[Gang Beasts] [Totally Accurate Battle Simulator]

slide-3
SLIDE 3

Designing Controllers for Locomotion

  • Hand-crafted control policies / simulating abstract models (SIMBICON, IPM, ZMP…)
  • Optimization / policy search
  • Reinforcement learning (actor-critic)

[Hodgins et al. 1995] [Yin et al. 2007, SIMBICON] [Coros et al. 2010] [Mordatch et al. 2010] [Tan et al. 2014] [Peng et al. 2017]

slide-4
SLIDE 4

Designing Controllers for Complex Motions

4

slide-5
SLIDE 5

Designing controllers for complex motions

5

Motion Clip → Tracking Controller

slide-6
SLIDE 6

Tracking Control for Complex Human Motion

Motion Clip → Open-loop Tracking Control → Feedback Policies → Control Scheduler

slide-7
SLIDE 7

Reinforcement Learning

Feedback Policies: learned via Guided Policy Learning

Control Scheduler: learned via Deep Q-Learning

slide-8
SLIDE 8

Outline

  • Construct open-loop control: SAMCON (SAmpling-based Motion CONtrol)
  • Guided learning of linear feedback policies
  • Learning to schedule control fragments using deep Q-learning

8

slide-9
SLIDE 9

Tracking Control

  • PD servo

𝜐 = π‘™π‘ž ΰ·¨ πœ„ βˆ’ πœ„ βˆ’ 𝑙𝑒 ሢ πœ„

9
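As a minimal sketch of the PD servo above (the gains and joint values below are illustrative, not from the talk):

```python
def pd_torque(theta, theta_dot, theta_target, kp, kd):
    """PD servo: tau = kp * (theta_target - theta) - kd * theta_dot."""
    return kp * (theta_target - theta) - kd * theta_dot

# Illustrative joint: at 0.2 rad, moving at 1.0 rad/s, tracking 0.5 rad.
tau = pd_torque(0.2, 1.0, 0.5, kp=300.0, kd=30.0)
print(tau)  # 300*0.3 - 30*1.0 = 60.0
```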

slide-10
SLIDE 10

Mocap Clips as Tracking Target

10

slide-11
SLIDE 11

Correction with Sampling


11

πœ€π‘’

slide-12
SLIDE 12

SAMCON

  • SAmpling-based Motion CONtrol [Liu et al. 2010, 2015]
  • Motion Clip → Open-loop control trajectory

[Diagram: Start → Sample → Start1 → Sample → Start2 → … → Sample → Startn → End]

12

Particle filtering / Sequential Monte Carlo
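The sample-and-resample loop can be sketched with a toy 1-D stand-in for the physics simulation; `samcon_pass` and every constant below are illustrative assumptions, not the paper's implementation:

```python
import random

def samcon_pass(reference, n_samples=100, n_keep=10, noise=0.3):
    """One SAMCON pass (sequential Monte Carlo): at each reference window,
    sample candidate actions around the target, 'simulate' them forward,
    and resample the best-tracking particles. The 1-D state update below
    is a toy stand-in for the physics engine."""
    particles = [(reference[0], [])]          # (state, action sequence)
    for target in reference[1:]:
        candidates = []
        per_particle = max(1, n_samples // len(particles))
        for state, actions in particles:
            for _ in range(per_particle):
                a = target + random.gauss(0.0, noise)   # sampled PD target
                nxt = state + 0.5 * (a - state)         # toy "simulation"
                candidates.append((abs(nxt - target), nxt, actions + [a]))
        candidates.sort(key=lambda c: c[0])             # tracking cost
        particles = [(s, acts) for _, s, acts in candidates[:n_keep]]
    return particles[0][1]   # open-loop actions of the best particle

random.seed(0)
open_loop = samcon_pass([0.0, 0.5, 1.0, 0.5])
print(len(open_loop))  # one action per window -> 3
```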

slide-13
SLIDE 13

SAMCON

13

πœ€π‘’ πœ€π‘’ πœ€π‘’ πœ€π‘’ time Reference Trajectory State

slide-14
SLIDE 14

πœ€π‘’ πœ€π‘’ πœ€π‘’ πœ€π‘’

Sampling & Simulation

14

time Reference Trajectory 𝑒 𝑏 State Actions (PD-control Targets)

slide-15
SLIDE 15

πœ€π‘’ πœ€π‘’ πœ€π‘’ πœ€π‘’

Resampling

15

𝑒 time Reference Trajectory 𝑏 Actions (PD-control Targets) State

slide-16
SLIDE 16

πœ€π‘’ πœ€π‘’ πœ€π‘’ πœ€π‘’

SAMCON Iterations

16

𝑏 πœ€π‘’ time Reference Trajectory State Actions (PD-control Targets)

slide-17
SLIDE 17

πœ€π‘’ πœ€π‘’ πœ€π‘’ πœ€π‘’

SAMCON Iterations

17

𝑏 πœ€π‘’ time Reference Trajectory State Actions (PD-control Targets)

slide-18
SLIDE 18

πœ€π‘’ πœ€π‘’ πœ€π‘’ πœ€π‘’

Constructed Open-loop Control Trajectory

18

𝑏 πœ€π‘’ time Reference Trajectory State Actions (PD-control Targets)

slide-19
SLIDE 19

Control Reconstruction

19

slide-20
SLIDE 20

Linear Policy

δs = s − s̃ (deviation from the reference state)

ρ: a = M δs + â

δa = a − â

Control Trajectory → Simulation

20
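A minimal sketch of applying such a linear feedback policy; the gain matrix `M` and the reference values are made-up illustrations:

```python
import numpy as np

def feedback_action(s, s_ref, a_hat, M):
    """Linear feedback policy: a = a_hat + M @ (s - s_ref),
    i.e. delta-a = M * delta-s around the open-loop action a_hat."""
    return a_hat + M @ (s - s_ref)

M = np.array([[0.5, 0.0],
              [0.0, 0.25]])                 # illustrative gain matrix
s_ref, a_hat = np.zeros(2), np.array([1.0, 2.0])
a = feedback_action(np.array([0.2, -0.4]), s_ref, a_hat, M)
print(a)  # [1.1 1.9]
```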

slide-21
SLIDE 21

For complex motions

21

Uniform Segmentation + Linear Feedback Policies → Control Fragments

slide-22
SLIDE 22

Control Fragment

  • A short control unit 𝒟 = (δt, m̂, ρ):
  • Duration δt ≈ 0.1 seconds
  • Open-loop control segment m̂
  • Linear feedback policy ρ

22

slide-23
SLIDE 23

Controller

  • A chain of control fragments

π’Ÿ1 π’Ÿ2

β‹―

π’ŸπΏ

23

slide-24
SLIDE 24

Guided Learning of Control Policies

24

Multiple Open-loop Solutions → Regression → Feedback Policy

slide-25
SLIDE 25

Guided Learning of Control Policies

25

Multiple Open-loop Solutions → Guided Learning → Feedback Policy


slide-28
SLIDE 28

Example: Cyclical Motion

28

π’Ÿ1 π’Ÿ2 π’Ÿ4 π’Ÿ3

π’Ÿπ‘™: ෝ 𝑛𝑙, πœ€π‘’, πœŒπ‘™

𝑏 𝑒 SAMCON

π““πŸ, π““πŸ‘, π““πŸ’, π““πŸ“, π““πŸ, π““πŸ‘, π““πŸ’, π““πŸ“, π““πŸ, π““πŸ‘, π““πŸ’, π““πŸ“, …

slide-29
SLIDE 29

Example: Cyclical Motion

29

[Figure: guided SAMCON pairs per-fragment states s_l with actions a_l; these (s, a) samples are used to fit the linear policies ρ1, ρ2, … of the cycle]

slide-30
SLIDE 30

Policy Update

30

[Figure: (s, a) samples collected for one control fragment]

slide-31
SLIDE 31

Policy Update

31

[Figure: (s, a) samples]

Regression
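The regression step could be sketched as an affine least-squares fit over the collected (state, action) pairs; the synthetic data below is purely illustrative:

```python
import numpy as np

def fit_linear_policy(S, A):
    """Fit a ~ a0 + M s by least squares over (state, action) pairs
    gathered from multiple SAMCON solutions (the regression step)."""
    X = np.hstack([S, np.ones((len(S), 1))])      # states plus a bias column
    W, *_ = np.linalg.lstsq(X, A, rcond=None)
    return W[:-1].T, W[-1]                        # gain M, offset a0

rng = np.random.default_rng(0)
S = rng.normal(size=(100, 3))
true_M = np.array([[1.0, 0.0, -0.5],
                   [0.2, 0.3, 0.0]])
A = S @ true_M.T + np.array([0.1, -0.2])          # noiseless synthetic data
M, a0 = fit_linear_policy(S, A)
print(np.allclose(M, true_M), np.allclose(a0, [0.1, -0.2]))  # True True
```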

slide-32
SLIDE 32

Guided Learning Iterations

32

[Figure: (s, a) samples] → Regression → Guided SAMCON → [new (s, a) samples]

slide-33
SLIDE 33

Guided Learning Iterations

33

[Figure: alternating Regression and Guided SAMCON iterations]

slide-34
SLIDE 34

34

slide-35
SLIDE 35

Control Graph

  • A graph whose nodes are control fragments

Control Graph

35

slide-36
SLIDE 36

Control Graph

  • A graph whose nodes are control fragments
  • Converted from a motion graph

[Diagram: motion graph converted into a control graph]

36
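A control graph might be represented as a simple adjacency structure over fragments; the fragment names and topology below are invented for illustration:

```python
# Control graph: nodes are control fragments, edges are allowed transitions.
# Invented toy structure: a walk cycle (w1..w4) with a branch into a turn.
control_graph = {
    "w1": ["w2"],
    "w2": ["w3"],
    "w3": ["w4"],
    "w4": ["w1", "turn1"],   # branch taken from the motion graph
    "turn1": ["w1"],
}

def default_schedule(start, steps):
    """Follow the first (in-sequence) edge at every fragment."""
    node, path = start, [start]
    for _ in range(steps):
        node = control_graph[node][0]
        path.append(node)
    return path

print(default_schedule("w1", 5))  # ['w1', 'w2', 'w3', 'w4', 'w1', 'w2']
```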

slide-37
SLIDE 37

37

slide-38
SLIDE 38

38

slide-39
SLIDE 39

Problem of Fixed Time-Indexed Tracking

[Figure: reference trajectory, basin of attraction, simulation]

slide-40
SLIDE 40

Scheduling

[Figure: reference trajectory, basin of attraction, simulation]

slide-41
SLIDE 41

Scheduling

?

slide-42
SLIDE 42

Deep Q-Learning

  • Learn to perform good actions
  • Raw image input
  • Deep convolutional network

[Mnih et al. 2015, DQN]

slide-43
SLIDE 43

A Q-Network For Scheduling

[Network diagram: state input → fully connected → 300 ReLUs (max(0, x)) → 300 ReLUs → Q-values]

slide-44
SLIDE 44

A Q-Network For Scheduling

[Network diagram: state input → two fully connected layers of 300 ReLUs → Q-values]

Input: motion state, environmental state, user command (18~25 DoFs)

slide-45
SLIDE 45

A Q-Network For Scheduling

[Network diagram: state input → two fully connected layers of 300 ReLUs → Q-values]

Action set: control fragments (39~146 actions)
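A sketch of this network's forward pass under the sizes stated on the slides (two fully connected hidden layers of 300 ReLUs, one Q-value per control fragment); the weights here are random placeholders:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)   # max(0, x), as on the slide

def q_network(state, params):
    """Two fully connected hidden layers of 300 ReLUs, then a linear layer
    giving one Q-value per control fragment (sketch of the slide's net)."""
    W1, b1, W2, b2, W3, b3 = params
    h1 = relu(state @ W1 + b1)
    h2 = relu(h1 @ W2 + b2)
    return h2 @ W3 + b3         # Q-value per action (control fragment)

rng = np.random.default_rng(0)
dim_s, n_actions = 25, 39       # within the slide's ranges (18~25, 39~146)
params = (rng.normal(0, 0.1, (dim_s, 300)), np.zeros(300),
          rng.normal(0, 0.1, (300, 300)), np.zeros(300),
          rng.normal(0, 0.1, (300, n_actions)), np.zeros(n_actions))
q = q_network(rng.normal(size=dim_s), params)
best_fragment = int(np.argmax(q))   # the scheduler picks this fragment next
```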

slide-46
SLIDE 46

A Q-Network For Scheduling

[Network diagram: the output layer produces one Q-value per action (control fragment)]

slide-47
SLIDE 47

Training

Pipeline: Exploration/Exploitation → Simulation → Reward → Replay Buffer → Batch SGD
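The replay-buffer stage of this pipeline might look like the following sketch; the capacity and the fake transitions are illustrative:

```python
import random
from collections import deque

class ReplayBuffer:
    """Store (state, action, reward, next_state) transitions and sample
    minibatches for SGD, as in the slide's training pipeline."""
    def __init__(self, capacity=100_000):
        self.buf = deque(maxlen=capacity)

    def push(self, transition):
        self.buf.append(transition)

    def sample(self, batch_size):
        return random.sample(self.buf, batch_size)

buffer = ReplayBuffer()
for t in range(64):                      # fake exploration rollout
    buffer.push((t, t % 4, 1.0, t + 1))  # (s, a, r, s')
batch = buffer.sample(32)                # minibatch for one SGD step
```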

slide-48
SLIDE 48

Reward Function

R = E_tracking + E_preference + E_feedback + E_task + R0

slide-49
SLIDE 49

Importance of the Reference Sequence

  • Original sequence is enforced
  • Original sequence is not enforced
slide-50
SLIDE 50

Tracking penalty term

[Figure: penalty assigned to out-of-sequence actions vs. in-sequence actions]

slide-51
SLIDE 51

Tracking exploration strategy

with probability πœπ‘  select a random action with probability πœπ‘ select an in-sequence action

slide-52
SLIDE 52

Bongo Board Balancing

Action Sequence

slide-53
SLIDE 53

Effect of Feedback Policy

Open-loop Control Fragments vs. Feedback-augmented Fragments

slide-54
SLIDE 54

Discover New Transitions

slide-55
SLIDE 55

Running

slide-56
SLIDE 56

Tripping

slide-57
SLIDE 57

Skateboarding

slide-58
SLIDE 58

Skateboarding

slide-59
SLIDE 59

Walking On A Ball

slide-60
SLIDE 60

Push-Recovery

slide-61
SLIDE 61

Conclusion

Motion Clip → Open-loop Tracking Control → Feedback Policies → Control Scheduler

  • Libin Liu, Michiel van de Panne, and Kangkang Yin. 2016. Guided Learning of Control Graphs for Physics-Based Characters. ACM Trans. Graph. 35, 3, Article 29 (May 2016), 14 pages.
  • Libin Liu and Jessica Hodgins. 2017. Learning to Schedule Control Fragments for Physics-Based Characters Using Deep Q-Learning. ACM Trans. Graph. 36, 3, Article 29 (June 2017), 14 pages.

slide-62
SLIDE 62

Future Work

  • Statistical/generative model
  • Control with raw simulation state and terrain information
  • Active human-object interaction: basketball, soccer
  • Dancing, boxing, martial arts

62

[Peng et al. 2017, DeepLoco] [Heess et al. 2017] [Holden et al. 2017]

slide-63
SLIDE 63

Questions?

Libin Liu http://libliu.info DeepMotion Inc http://deepmotion.com