Real-Time Human Pose Recognition in Parts from Single Depth Images - PowerPoint PPT Presentation

Real-Time Human Pose Recognition in Parts from Single Depth Images. Jamie Shotton, Andrew Fitzgibbon, Mat Cook, Toby Sharp, Mark Finocchio, Richard Moore, Alex Kipman, Andrew Blake. CVPR 2011. Presenter: Ahsan Abdullah.


  1. Real-Time Human Pose Recognition in Parts from Single Depth Images. Jamie Shotton, Andrew Fitzgibbon, Mat Cook, Toby Sharp, Mark Finocchio, Richard Moore, Alex Kipman, Andrew Blake. CVPR 2011. PRESENTER: AHSAN ABDULLAH

  2. PROBLEM

  3. APPROACH
     • Partitioning the body into parts helps localize the joints
     • [Figure labels: left hand, neck, right shoulder, right elbow]
     Shotton et al., CVPR 2011

  4. PIPELINE
     • Design goals: efficiency, robustness
     • Steps: capture depth image & remove background → infer body parts per pixel → cluster pixels to hypothesize body joint positions → fit model & track skeleton

  5. BODY PART CLASSIFICATION
     • Compute $P(c_i \mid w_i)$, where pixel $i = (x, y)$, $c_i$ is the body part, and $w_i$ is the image window
     • Image windows move with the classifier
     • Discriminative approach: learn the classifier $P(c_i \mid w_i)$ from training data

  6. LEARNING DATA
     • synthetic (train & test)
     • real (test)

  7. LEARNING – DATA SYNTHESIS
     • Record MoCap: 500k frames distilled to 100k poses
     • Retarget to several models
     • Render (depth, body parts) pairs

  8. FEATURE SET
     • Depth comparisons: very fast to compute
     • Feature response at image coordinate $\mathbf{x}$ of the input depth image $I$: $f(I, \mathbf{x}) = d_I(\mathbf{x}) - d_I(\mathbf{x} + \Delta)$
     • Offset $\Delta = \mathbf{v} / d_I(\mathbf{x})$ scales inversely with depth
     • Background pixels: $d$ = large constant
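A minimal sketch of the depth comparison feature on this slide, in plain Python. The names `depth_at`, `depth_feature`, and `LARGE_DEPTH`, the use of `None` to mark background pixels, and the rounding of the offset to whole pixels are assumptions made for illustration; the slide only states the formula and that background depth is a large constant.

```python
LARGE_DEPTH = 1e6  # assumed stand-in for the slide's "large constant"

def depth_at(depth, x, y):
    """Depth at pixel (x, y); off-image or background (None) pixels read as LARGE_DEPTH."""
    if 0 <= y < len(depth) and 0 <= x < len(depth[0]):
        d = depth[y][x]
        if d is not None:
            return d
    return LARGE_DEPTH

def depth_feature(depth, x, y, v):
    """f(I, x) = d_I(x) - d_I(x + v / d_I(x)) for one offset v = (vx, vy)."""
    d = depth_at(depth, x, y)
    # Dividing by the depth at x makes the offset shrink for distant pixels.
    dx = int(round(v[0] / d))
    dy = int(round(v[1] / d))
    return d - depth_at(depth, x + dx, y + dy)

# Tiny hypothetical depth image: rows are y, columns are x, None is background.
depth = [[2.0, 2.0, 4.0],
         [2.0, None, 4.0]]
response = depth_feature(depth, 0, 0, (4.0, 0.0))
```

Each response costs two depth lookups and a subtraction, which is consistent with the slide's note that the feature is very fast to compute.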

  9. DECISION FORESTS
     • Aggregation of decision trees

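  10. TRAINING DECISION TREES [Breiman et al. 84]
      • At node $n$, $Q_n = \{(I, \mathbf{x})\}$ is the set of all pixels at the node, with body part distribution $P_n(c)$
      • Split test: $f(I, \mathbf{x}; \Delta_n) > \theta_n$; "no" goes to the left child with $P_l(c)$, "yes" goes to the right child with $P_r(c)$
      • The split should reduce entropy: take the $(\Delta, \theta)$ that maximises information gain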
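The split selection described on this slide can be sketched as follows. This is an illustrative version, not the authors' implementation: it searches over candidate thresholds only, for a fixed offset whose feature responses are already computed, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of body part labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())

def information_gain(labels, responses, theta):
    """Entropy reduction from splitting a non-empty pixel set on response > theta."""
    left = [c for c, f in zip(labels, responses) if not f > theta]   # "no" branch
    right = [c for c, f in zip(labels, responses) if f > theta]      # "yes" branch
    n = len(labels)
    return (entropy(labels)
            - (len(left) / n) * entropy(left)
            - (len(right) / n) * entropy(right))

def best_threshold(labels, responses, candidates):
    """Pick the candidate threshold with the highest information gain."""
    return max(candidates, key=lambda t: information_gain(labels, responses, t))

# Toy data: four pixels with labels and precomputed feature responses.
labels = ["L", "L", "R", "R"]
responses = [-1.0, -0.5, 0.5, 1.0]
theta = best_threshold(labels, responses, [-0.75, 0.0, 0.75])
```

In a full trainer the same search would also range over candidate offsets, and the chosen pair would be stored at the node before recursing into the two children.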

  11. DECISION TREE CLASSIFICATION
      • Toy example: distinguish left (L) and right (R) sides of the body
      • Image window centred at $\mathbf{x}$
      • One node tests $f(I, \mathbf{x}; \Delta_1) > \theta_1$ and another tests $f(I, \mathbf{x}; \Delta_2) > \theta_2$; each test branches on no / yes
      • Each node holds a distribution $P(c)$ over L and R
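A minimal sketch of how one pixel descends a tree like the toy example above. The dictionary layout, the choice of which branch leads to the second test, the thresholds, and the leaf probabilities are all hypothetical values chosen for illustration; the slide does not give them.

```python
def classify(node, feature_fn):
    """Walk from the root to a leaf and return the leaf's distribution P(c).

    A split node is {'offset', 'theta', 'no', 'yes'}; a leaf is {'dist': {...}}.
    feature_fn(offset) returns the feature response for the pixel being classified.
    """
    while "dist" not in node:
        response = feature_fn(node["offset"])
        node = node["yes"] if response > node["theta"] else node["no"]
    return node["dist"]

# Hypothetical two-level tree over the classes L and R.
toy_tree = {
    "offset": "delta1", "theta": 0.0,
    "no": {"dist": {"L": 0.8, "R": 0.2}},
    "yes": {
        "offset": "delta2", "theta": 0.5,
        "no": {"dist": {"L": 0.5, "R": 0.5}},
        "yes": {"dist": {"L": 0.1, "R": 0.9}},
    },
}

# Precomputed responses for one pixel, keyed by offset name.
pixel_responses = {"delta1": 1.0, "delta2": 1.0}
posterior = classify(toy_tree, lambda offset: pixel_responses[offset])
```

Classification cost is one feature evaluation per tree level, so it grows with tree depth rather than with the number of training pixels.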

  12. DECISION FOREST CLASSIFIER [Amit & Geman 97] [Breiman 01] [Geurts et al. 06]
      • Each tree $t = 1, \dots, T$ maps $(I, \mathbf{x})$ to a posterior $P_t(c)$
      • Each tree is trained on a different random subset of images
      • “Bagging” helps avoid over-fitting
      • Average the tree posteriors: $P(c \mid I, \mathbf{x}) = \frac{1}{T} \sum_{t=1}^{T} P_t(c \mid I, \mathbf{x})$
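The averaging step on this slide can be sketched in a few lines. The function name `forest_posterior` and the dictionary representation of each tree's posterior are assumptions for illustration.

```python
def forest_posterior(tree_posteriors):
    """Average per-tree posteriors P_t(c | I, x) into the forest posterior P(c | I, x).

    tree_posteriors is a non-empty list of dicts mapping class label to probability;
    a class missing from one tree's dict is treated as probability 0 for that tree.
    """
    T = len(tree_posteriors)
    classes = set().union(*tree_posteriors)
    return {c: sum(p.get(c, 0.0) for p in tree_posteriors) / T for c in classes}

# Hypothetical leaf distributions reached by the same pixel in two trees.
combined = forest_posterior([{"L": 0.8, "R": 0.2}, {"L": 0.4, "R": 0.6}])
most_likely = max(combined, key=combined.get)
```

Because every tree is evaluated independently for every pixel, this step parallelises naturally across both trees and pixels.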

  13. NUMBER OF TREES [Figure: average per-class accuracy (axis 40% to 55%) against number of trees (1 to 6); panels show ground truth and the most likely inferred body parts for 1 tree, 3 trees, and 6 trees]

  14. TREE DEPTH [Figure: average per-class accuracy (axis 30% to 65%) against depth of trees, in two panels: synthetic test data and real test data]

  15. Body parts to joint hypotheses (pipeline step 3: hypothesize body joints)
      • Define a 3D world space density over the 3D coordinate, summed over pixel index i; each term uses a pixel weight, the 3D coordinate of the i-th pixel, and a bandwidth
      • The pixel weight combines the inferred probability and the depth at the i-th pixel
      • Mean shift for mode detection
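A rough sketch of weighted mean shift for finding a mode of such a density. The slide text does not preserve the equation, so the specific forms here are assumptions: a Gaussian kernel $\exp(-\lVert(\hat{\mathbf{x}} - \hat{\mathbf{x}}_i)/b\rVert^2)$ with bandwidth $b$, and a pixel weight equal to the inferred probability times squared depth, which is how the published paper defines it and should be checked against that source. Function names and the toy points are hypothetical.

```python
import math

def pixel_weight(prob, depth):
    """Assumed weight: inferred body part probability times squared depth."""
    return prob * depth ** 2

def mean_shift_mode(points, weights, bandwidth, start, iters=50, tol=1e-6):
    """Iterate the weighted Gaussian mean shift update from a starting 3D point."""
    x = list(start)
    for _ in range(iters):
        num = [0.0, 0.0, 0.0]
        den = 0.0
        for p, w in zip(points, weights):
            sq = sum(((a - b) / bandwidth) ** 2 for a, b in zip(x, p))
            k = w * math.exp(-sq)          # kernel-weighted contribution of this pixel
            den += k
            for j in range(3):
                num[j] += k * p[j]
        if den == 0.0:                      # no pixel close enough to pull the estimate
            break
        new_x = [v / den for v in num]
        shift = math.dist(new_x, x)
        x = new_x
        if shift < tol:
            break
    return x

# Toy 3D points: a tight cluster near (0, 0, 2) plus one distant outlier.
points = [(0.0, 0.0, 2.0), (0.1, 0.0, 2.0), (-0.1, 0.0, 2.0), (5.0, 5.0, 5.0)]
weights = [1.0, 1.0, 1.0, 1.0]
mode = mean_shift_mode(points, weights, bandwidth=0.5, start=(0.2, 0.0, 2.0))
```

Starting near the cluster, the estimate settles on the cluster centre and the outlier has a negligible effect, which illustrates why mode finding is preferred over a plain weighted average of pixel positions.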

  16. [Figure: input depth; inferred body parts; inferred joint positions in front, side, and top views] No tracking or smoothing.

  17. [Figure: input depth; inferred body parts; inferred joint positions in front, side, and top views] No tracking or smoothing.

  18. JOINT PREDICTION ACCURACY [Figure: average precision (axis 0.0 to 1.0) per joint: Center Head, Center Neck, Left Shoulder, Right…, Left Elbow, Right Elbow, Left Wrist, Right Wrist, Left Hand, Right Hand, Left Knee, Right Knee, Left Ankle, Right Ankle, Left Foot, Right Foot, and Mean AP]

  19. JOINT PREDICTION ACCURACY [Figure: average precision (axis 0.0 to 1.0) per joint, comparing joint prediction from inferred body parts with joint prediction from ground truth body parts: Center Head, Center Neck, Left Shoulder, Right Shoulder, Left Elbow, Right Elbow, Left Wrist, Right Wrist, Left Hand, Right Hand, Left Knee, Right Knee, Left Ankle, Right Ankle, Left Foot, Right Foot, and Mean AP]

  20. ANALYSIS
      • No temporal information: frame-by-frame
      • Very fast: simple depth image feature; parallel decision forest classifier

  21. KINECT SYSTEM (pipeline step 4: track skeleton)
      • Uses: 3D joint hypotheses, kinematic constraints, temporal coherence
      • … to give: full skeleton, higher accuracy, invisible joints, multi-player

  22. SUMMARY
      • Frame-by-frame gives robustness
      • Body parts representation for efficiency
      • Fast, simple machine learning
      • Significant engineering to scale to a massive, varied training data set

  23. QUESTIONS
