Dynamic Facial Analysis: From Bayesian Filtering to RNN




  1. DYNAMIC FACIAL ANALYSIS: FROM BAYESIAN FILTERING TO RNN Jinwei Gu, 2017/4/18 with Xiaodong Yang, Shalini De Mello, and Jan Kautz

  2. FACIAL ANALYSIS IN VIDEOS
     Exploit temporal coherence to track facial features in videos.
     Applications: head/face tracking, 3D performance capture.
     Prior work: HeadPoseFromDepth (2015), DeepHeadPose (2015), HyperFace (2016).

  3. CLASSICAL APPROACH: BAYESIAN FILTERING
     It is challenging to design Bayesian filters specific to each task!
     Examples of task-specific trackers: particle filters for head pose tracking [2010], tree-based DPM for face landmark tracking [ICCV 2015], spatial-temporal RNN for face landmarks [ECCV 2016].

  4. FROM BAYESIAN FILTERING TO RNN
     Use an RNN to avoid hand-engineering trackers.
     A Bayesian filter and an unfolded RNN share the same structure:
     • Output (target): z_{t-1}, z_t
     • Hidden state: x_{t-1}, x_t (filter) / h_{t-1}, h_t (RNN)
     • Input (measurement): y_{t-1}, y_t

  5. FROM BAYESIAN FILTERING TO RNN
     Use an RNN to avoid hand-engineering trackers.

  6. AN EXAMPLE: KALMAN FILTERS VS. RNN
     Linear Kalman filter:
     • Process model (state transition + process noise): x_t = A x_{t-1} + w_t
     • Measurement model (noisy observation + measurement noise): y_t = H x_t + v_t
     Simple RNN (i.e., vanilla RNN), driven by the noisy input y_t:
     • Hidden state: h_t = σ_1(U h_{t-1} + V y_t + b_1)
     • Output (target): z_t = σ_2(W h_t + b_2)
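     As a concrete illustration (not from the talk), below is a minimal NumPy sketch of one step of each model. The matrix names follow the slide's notation; the dimensions, noise covariances, and weight values are placeholders chosen for the example.

        # Minimal sketch: one step of a linear Kalman filter next to one step of a
        # vanilla RNN. Matrix names follow the slide notation; sizes/values are illustrative.
        import numpy as np

        rng = np.random.default_rng(0)
        n_state, n_obs, n_hidden = 3, 1, 3

        # Linear Kalman filter models: x_t = A x_{t-1} + w_t,  y_t = H x_t + v_t
        A = np.eye(n_state)                              # state transition
        H = np.zeros((n_obs, n_state)); H[0, 0] = 1.0    # measurement matrix
        Q = 1e-3 * np.eye(n_state)                       # process noise covariance (assumed)
        R = 1e-1 * np.eye(n_obs)                         # measurement noise covariance (assumed)

        def kalman_step(x, P, y):
            """One predict + correct step; returns new state, covariance, and output z = H x."""
            x_pred = A @ x
            P_pred = A @ P @ A.T + Q
            K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)   # Kalman gain
            x_new = x_pred + K @ (y - H @ x_pred)
            P_new = (np.eye(n_state) - K @ H) @ P_pred
            return x_new, P_new, H @ x_new

        # Vanilla RNN: h_t = tanh(U h_{t-1} + V y_t + b1),  z_t = W h_t + b2
        U = 0.1 * rng.normal(size=(n_hidden, n_hidden))
        V = 0.1 * rng.normal(size=(n_hidden, n_obs))
        W = 0.1 * rng.normal(size=(n_obs, n_hidden))
        b1, b2 = np.zeros(n_hidden), np.zeros(n_obs)

        def rnn_step(h, y):
            """One recurrent step; same measurement-in, target-out structure as above."""
            h_new = np.tanh(U @ h + V @ y + b1)
            return h_new, W @ h_new + b2

     The structural difference the next slides highlight: the Kalman gain is recomputed analytically at every step from Q and R, whereas the RNN weights U, V, W are learned from training data and then fixed.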

  7. AN EXAMPLE: KALMAN FILTERS VS. RNN
     Linear Kalman filter, with Kalman gain K_t and noisy input y_t:
     • x_t = A x_{t-1} + K_t (y_t − H x_{t-1}) = (A − K_t H) x_{t-1} + K_t y_t
     • Target output: z_t = H x_t
     Simple RNN (i.e., vanilla RNN), with noisy input y_t:
     • h_t = σ_1(U h_{t-1} + V y_t + b_1)
     • Target output: z_t = σ_2(W h_t + b_2)

  8. AN EXAMPLE: KALMAN FILTERS VS. RNN
     Linear Kalman filter, with Kalman gain K_t and noisy input y_t:
     • x_t = (A − K_t H) x_{t-1} + K_t y_t
     • Target output: z_t = H x_t
     Simple RNN (i.e., vanilla RNN), assuming linear activation and no bias:
     • h_t = U h_{t-1} + V y_t
     • Target output: z_t = W h_t
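     Reading the two columns side by side (this mapping is my summary of the slide, using the notation above), the two updates match term by term:

        U  ↔  (A − K_t H)        V  ↔  K_t        W  ↔  H

     The practical difference is that the Kalman gain K_t is derived analytically at each step from the assumed process and measurement noise models, while the RNN's weights are learned end-to-end from data.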

  9. A TOY EXAMPLE: TRACKING A MOVING CURSOR
     Input: a noisy curve y(t); state: [x, x', x''] (position, velocity, acceleration).
     Kalman filter: x_t = (A − K_t H) x_{t-1} + K_t y_t, z_t = H x_t
     LSTM: h_t = LSTM(h_{t-1}, y_t), z_t = W h_t
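     The slide does not spell out the exact setup, so the following is a hypothetical reconstruction of the Kalman-filter side of the toy example: a constant-acceleration model with state [x, x', x''] smoothing a noisy 1-D cursor trajectory. The dynamics, noise levels, frame rate, and trajectory are assumptions made for illustration.

        # Hypothetical cursor-tracking example: constant-acceleration Kalman filter
        # smoothing a noisy 1-D trajectory y(t), with state [x, x', x''].
        import numpy as np

        dt = 1.0 / 30.0                                   # assumed frame rate
        A = np.array([[1, dt, 0.5 * dt**2],               # state transition for [x, x', x'']
                      [0, 1, dt],
                      [0, 0, 1]])
        H = np.array([[1.0, 0.0, 0.0]])                   # only the position is observed
        Q = 1e-4 * np.eye(3)                              # process noise (assumed)
        R = np.array([[1e-2]])                            # measurement noise (assumed)

        t = np.arange(0, 10, dt)
        truth = np.sin(t)                                  # made-up smooth cursor path
        y = truth + 0.1 * np.random.default_rng(0).normal(size=t.shape)

        x = np.zeros(3)
        P = np.eye(3)
        estimates = []
        for y_t in y:
            # Predict, then correct with the Kalman gain K_t.
            x = A @ x
            P = A @ P @ A.T + Q
            K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
            x = x + K @ (np.array([y_t]) - H @ x)
            P = (np.eye(3) - K @ H) @ P
            estimates.append((H @ x).item())               # z_t = H x_t

     The LSTM counterpart would replace the predict/correct loop with h_t = LSTM(h_{t-1}, y_t) and a learned linear readout z_t = W h_t, trained on pairs of noisy inputs and clean target trajectories.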

  10. FACIAL ANALYSIS IN VIDEOS WITH RNN
      Variants of RNN: FC-RNN*, LSTM, GRU
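      To make the FC-RNN idea concrete, here is a PyTorch sketch of how I understand it (the class name FCRNNCell, the layer sizes, and the wiring are my own illustration, not the talk's code): a pretrained fully-connected layer of the CNN is turned into a recurrent layer by adding one hidden-to-hidden weight matrix, so h_t = relu(W_fc y_t + W_hh h_{t-1} + b_fc). The LSTM/GRU variants would instead feed the same per-frame features into standard nn.LSTM / nn.GRU modules.

        # Sketch (assumptions noted above) of converting a pretrained fc layer into a
        # recurrent FC-RNN layer and running it over per-frame CNN features.
        import torch
        import torch.nn as nn

        class FCRNNCell(nn.Module):
            def __init__(self, pretrained_fc: nn.Linear):
                super().__init__()
                self.fc = pretrained_fc                              # W_fc, b_fc (pretrained)
                out_dim = pretrained_fc.out_features
                self.w_hh = nn.Linear(out_dim, out_dim, bias=False)  # new recurrent weights
                self.act = nn.ReLU()

            def forward(self, y_t, h_prev):
                return self.act(self.fc(y_t) + self.w_hh(h_prev))

        # Example usage on a sequence of per-frame features (dimensions made up):
        fc6 = nn.Linear(4096, 1024)          # stands in for a pretrained fc layer
        cell = FCRNNCell(fc6)
        head = nn.Linear(1024, 3)            # e.g. regress per-frame head-pose angles
        feats = torch.randn(8, 16, 4096)     # (batch, time, feature)
        h = torch.zeros(8, 1024)
        outputs = []
        for step in range(feats.shape[1]):
            h = cell(feats[:, step], h)
            outputs.append(head(h))
        poses = torch.stack(outputs, dim=1)  # (batch, time, 3)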

  11. HEAD POSE FROM VIDEOS Results on the BIWI dataset

  12. HEAD POSE FROM VIDEOS Video comparison: Input vs. Per-Frame + KF vs. RNN (Ours)

  13. LARGE SYNTHETIC DATASET MATTERS! The SynHead Dataset
      • 10 high-quality 3D scans of head models
      • 51,096 head poses from 70 motion tracks
      • 510,960 RGB images in total
      • Accurate head pose and landmark annotations (2D/3D)
      • Available at: https://research.nvidia.com
      (For comparison, the BIWI dataset has 24 videos and 15,678 frames in total.)

  14. LARGE SYNTHETIC DATASET MATTERS! The SynHead Database

  15. FACIAL LANDMARKS FROM VIDEO Comparison of estimated vs. ground-truth landmarks: HyperFace, Per-Frame, and RNN (Ours)

  16. MORE EXAMPLES

  17. VARIANTS OF RNN FOR LANDMARK ESTIMATION (latest results)
      Layer(s)    FC-RNN         FC-LSTM        FC-GRU
      fc6         0.7567, 0.10   0.7690, 0.13   0.7715, 0.15
      fc7         0.7424, 0.06   0.7539, 0.06   0.7554, 0.36
      fc6+fc7     0.7630, 0.28   0.7456, 0.27   0.7605, 0.19

  18. CO-PILOT DEMO IN THE CES KEYNOTE (together with GazeNet by Shalini et al.)

  19. DYNAMIC FACIAL ANALYSIS: From Bayesian Filtering to RNN
      • RNNs can be viewed as a variant of Bayesian filters
      • A general framework to leverage temporal coherence in videos
      • Large synthetic datasets improve performance: the SynHead dataset is available at https://research.nvidia.com
