Leveraging orientation knowledge to enhance human pose estimation methods
S. Azrour, S. Piérard, M. Van Droogenbroeck
INTELSIG Laboratory, University of Liège, Belgium
Conference on Articulated Motion and Deformable Objects (AMDO 2016), 13-15 July 2016
What is human pose estimation?
Definition (Human pose estimation): In computer vision, it is the study of algorithms and systems that recover the pose of a human body, which consists of joints and rigid parts.
Applications of human pose estimation: some examples
- Motion analysis
- Medical
- Entertainment
- Animation movies
Types of camera-based pose estimation
Camera-based pose estimation (or motion capture) can be marker-based or markerless:
- marker-based: markers are placed on the subject and the pose is recovered by localizing these markers with a multi-camera setup.
- markerless: the subject wears nothing special and the pose is recovered using a body model tracking method or a machine learning technique.
Markerless pose estimation using a machine learning technique
- Pose estimation algorithms developed by Microsoft for the Kinect camera (from "J. Shotton, R. Girshick et al., PAMI 2013").
Silhouette ambiguity
- There is an intrinsic limitation when using color cameras: for a given silhouette, two different poses are possible.
⇒ Depth cameras help to overcome this limitation, but it remains hard to disambiguate the silhouette orientation and predict the body joint positions at the same time.
Using orientation information to improve the pose estimation
Idea: It is preferable to rely on an additional method that is specifically designed for orientation estimation instead of trying to recover the joint positions and disambiguate the silhouette orientation all at once.
How can we estimate the orientation?
- The orientation estimate can be obtained from the image itself or from any kind of sensor, through a machine learning or a tracking algorithm.
Using orientation information to improve the pose estimation
- The configuration considered in this work:
- How do we take advantage of the orientation estimation?
⇒ We slice the full orientation range into smaller ranges and learn a different model for each of these smaller ranges.
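The slicing idea can be sketched in a few lines of Python. This is only an illustration: the slides do not say how the ranges are laid out, so equal-width ranges starting at 0 degrees are assumed here, and the names `orientation_bin`, `select_model` and the placeholder `models` list are hypothetical.

```python
def orientation_bin(angle_deg: float, num_models: int) -> int:
    """Return the index of the orientation range that contains angle_deg.

    Assumption (not from the slides): ranges have equal width and the
    first one starts at 0 degrees.
    """
    width = 360.0 / num_models
    # The final modulo guards against float rounding at the 360 boundary.
    return int((angle_deg % 360.0) // width) % num_models


def select_model(models, angle_deg):
    """Pick the model that was trained for the range containing angle_deg."""
    return models[orientation_bin(angle_deg, len(models))]


# Hypothetical stand-ins for trained pose estimators, one per range.
models = [f"model_{k}" for k in range(4)]
print(select_model(models, 100.0))  # 100 degrees lies in [90, 180): model_1
```

At test time, the orientation estimate only selects which specialized model to run; the pose estimator itself is unchanged.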
Outline of our method
Synthetic data generation
- The body model is created with MakeHuman.
- Depth images are rendered inside Blender.
- Poses are taken randomly from the CMU motion capture database.
Pose estimation algorithm used in this work
We use our own implementation of the offset joint regression algorithm proposed by Microsoft (R. Girshick et al., ICCV, 2011).
- The machine learning technique used is a random forest.
- Each pixel of the silhouette predicts a set of 3D offsets toward the body joints.
- These predictions are then aggregated using Mean Shift.
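The aggregation step can be illustrated with a simplified mean shift in pure Python. This is a sketch under stated assumptions, not the authors' implementation: votes are unweighted, the kernel is Gaussian, a single starting point (the plain average) is used, and the function name `mean_shift_mode`, the bandwidth and the toy votes are all hypothetical.

```python
import math


def mean_shift_mode(votes, bandwidth=0.1, iterations=50, tol=1e-6):
    """Seek a density mode of 3D votes using a Gaussian kernel.

    votes: list of (x, y, z) tuples, each one standing for a pixel's
    3D position plus its predicted offset toward a joint.
    """
    n = len(votes)
    # Start from the plain average of all votes (an arbitrary choice).
    estimate = tuple(sum(v[i] for v in votes) / n for i in range(3))
    for _ in range(iterations):
        # Weight each vote by its closeness to the current estimate.
        weights = [
            math.exp(-sum((v[i] - estimate[i]) ** 2 for i in range(3))
                     / (2.0 * bandwidth ** 2))
            for v in votes
        ]
        total = sum(weights)
        if total == 0.0:
            break  # all weights underflowed; keep the current estimate
        new = tuple(
            sum(w * v[i] for w, v in zip(weights, votes)) / total
            for i in range(3)
        )
        shift = math.dist(new, estimate)
        estimate = new
        if shift < tol:
            break
    return estimate


# Toy votes: four agree on a joint near (1, 2, 3), one is an outlier.
votes = [(1.0, 2.0, 3.0), (1.1, 2.0, 3.0), (0.9, 2.0, 3.0),
         (1.0, 2.1, 2.9), (5.0, 5.0, 5.0)]
joint = mean_shift_mode(votes, bandwidth=0.5)
print(joint)
```

Unlike a plain average, the mode-seeking update lets the cluster of agreeing votes dominate, so the outlier has almost no influence on the final joint position.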
Experiments
- We compared the accuracy of the estimated pose when using 1, 4 and 12 models.
- We considered two scenarios:
  1. A constant global learning dataset size.
  2. A constant learning dataset size per model.
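The two scenarios differ only in how many learning samples each model receives. The short sketch below works through the arithmetic, using the sizes that appear in the legend of the per-orientation error figure (|LS| = 8000 globally, or |LS| = 2000 per model). The function name and the even split by integer division are assumptions made for illustration.

```python
def per_model_size(num_models, scenario, global_size=8000, model_size=2000):
    """Number of learning samples available to each model.

    The default sizes come from the figure legend; splitting the global
    set evenly with integer division is an assumption.
    """
    if scenario == "constant_global":
        return global_size // num_models
    if scenario == "constant_per_model":
        return model_size
    raise ValueError(f"unknown scenario: {scenario}")


for k in (1, 4, 12):
    print(k,
          per_model_size(k, "constant_global"),
          per_model_size(k, "constant_per_model"))
```

With a constant global size, 12 models leave only 666 samples per model, whereas the constant-per-model scenario grows the total to 12 x 2000 samples.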
Results with a constant global learning dataset size
⇒ Significant reduction of the error when going from 1 to 4 models.
⇒ However, going from 4 to 12 models slightly worsens the performance.
Results with a constant learning dataset size per model
⇒ Systematic decrease of the error when the number of models is increased.
⇒ However, the small difference between 4 and 12 models suggests that a plateau is reached.
Mean error according to the orientation
[Figure: six polar plots of the mean error as a function of the orientation, one per joint (right shoulder, right elbow, right wrist, right hip, right knee, right ankle). Angular ticks run from 0 to 330 in steps of 30; radial ticks are at 5, 10 and 15. Curves: 1 model (|LS|=2000), 1 model (|LS|=8000), 4 models (|LS|=4x2000), 12 models (|LS|=12x666), 12 models (|LS|=12x2000).]
Conclusion
- We can improve the accuracy of the estimated pose by taking advantage of an orientation estimation.
- One way to take advantage of the orientation estimation is to learn multiple models specialized for different ranges of orientations.
- We show that accuracy can be significantly improved when the number of models increases, even while keeping a constant global learning dataset size.