Deep Convolutional Poses for Human Interaction Recognition in Monocular Videos
Marcel Sheeny de Moraes
Supervisor: Neil Robertson
HERIOT-WATT UNIVERSITY
VIVA, 15th June 2016 - Marcel Sheeny de Moraes
Outline
• Introduction
• Related Works
• Methodology
• Results
• Conclusion and Future Works
Introduction
• Human Interaction Recognition
  • Surveillance
  • Human-Computer Interaction
  • Automatic video labeling
[Figure: example interactions: kicking, hand shake, high five]
Introduction
• Goal of the project
  • Use human pose estimation to recognize human interactions in monocular (RGB) videos.
Related Works
• Park and Aggarwal (Multimedia Systems, 2004)
  • Ellipse and convex hull features.
  • Hierarchical Bayesian network.
  • 9 types of interactions.
  • 78% accuracy.
Related Works
• Yun, et al. (CVPR 2012)
  • Depth camera (Kinect) used to estimate the human pose.
  • 6 features computed from the human pose.
  • Multiple Instance Learning.
  • 8 types of interactions.
  • 80% accuracy using 3 frames.
  • 91% accuracy on the whole sequence.
Related Works
• Hu, et al. (ECCV 2014)
  • Yun, et al. (2012) dataset.
  • Positive actor features.
  • Hidden Markov Model as the classifier.
  • 76.2% accuracy per frame and 83.3% on the whole sequence.
• Zhu, et al. (AAAI 2016)
  • Yun, et al. (2012) dataset.
  • Deep LSTM network that recognizes the interaction from the estimated human pose.
  • 90.41% accuracy.
Benchmark
• Benchmark for the Two-Person Interaction dataset.

Method               Per frame   Whole sequence
Yun, et al. (2012)   80.30%      91.10%
Hu, et al. (2014)    76.1%       83.33%
Zhu, et al. (2015)   -           90.41%
Methodology
Dataset
• Two-person Interaction Detection Using Body-Pose Features (Yun, et al., 2012).
• 8 interactions: approaching, departing, pushing, kicking, punching, exchanging objects, hugging, and shaking hands.
• 282 samples of interactions.
[Figure: example frames: kicking, punching, hugging, shaking hands]
Person Detection
• “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks”, by Ren, et al. (2015)
• State-of-the-art method for object detection.
• 0.1 s to detect the person in each image.
• Uses a very deep convolutional neural network (VGG-16).
Person Detection Results
Multi-Person Tracking
• Kalman filter with a linear motion model.
• The Hungarian algorithm assigns detections to predictions.
• Thresholds decide when a track is new or lost.
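The assignment and thresholding steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the project: the function names, the pixel threshold, and the use of box centres as positions are all hypothetical. `predict_position` propagates only the state mean of a constant-velocity model (a full Kalman filter would also propagate and correct a covariance), and the minimum-cost assignment is found by exhaustive search over permutations, which for a handful of people returns the same optimum the Hungarian algorithm would.

```python
import math
from itertools import permutations


def predict_position(position, velocity, dt=1.0):
    """Linear (constant-velocity) motion model: propagate the state mean only."""
    return (position[0] + velocity[0] * dt, position[1] + velocity[1] * dt)


def assign_tracks(predictions, detections, max_distance):
    """Pair predicted track positions with detections at minimum total distance.

    Returns (matches, lost_tracks, new_detections) as lists of indices.
    """
    n_pred, n_det = len(predictions), len(detections)
    cost = [[math.hypot(p[0] - d[0], p[1] - d[1]) for d in detections]
            for p in predictions]

    if n_pred <= n_det:
        # Give every track a distinct detection.
        candidates = ([(t, d) for t, d in enumerate(perm)]
                      for perm in permutations(range(n_det), n_pred))
    else:
        # Give every detection a distinct track.
        candidates = ([(t, d) for d, t in enumerate(perm)]
                      for perm in permutations(range(n_pred), n_det))

    # Exhaustive search stands in for the Hungarian algorithm (assumption).
    best_pairs, best_total = [], math.inf
    for pairs in candidates:
        total = sum(cost[t][d] for t, d in pairs)
        if total < best_total:
            best_pairs, best_total = pairs, total

    # Threshold gating: pairs that are too far apart are rejected.
    matches = [(t, d) for t, d in best_pairs if cost[t][d] <= max_distance]
    matched_tracks = {t for t, _ in matches}
    matched_dets = {d for _, d in matches}
    lost_tracks = [t for t in range(n_pred) if t not in matched_tracks]
    new_detections = [d for d in range(n_det) if d not in matched_dets]
    return matches, lost_tracks, new_detections
```

Unmatched tracks are candidates for being declared lost, and unmatched detections are candidates for starting new tracks. In practice a counter over several frames would usually sit on top of this before a track is actually created or dropped.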
Human Pose Estimation
• “Convolutional Pose Machines”, by Wei, et al. (CVPR 2016)
• 12 hierarchical Deep Convolutional Neural Networks (DCNN).
• 12 different size inputs.
• Current state-of-the-art for human pose estimation.

PCKh @ 0.2 PC benchmark using the LSP dataset

Method                        Head   Shoulder  Elbow  Wrist  Hip    Knee   Ankle  Total  AUC
Pishchulin, et al., ICCV’13   87.2   56.7      46.7   38.9   61.0   57.5   52.7   57.1   35.8
Chen and Yuille, NIPS’14      91.8   78.2      71.8   65.5   73.3   70.2   63.4   73.4   40.1
Carreira, et al., CVPR’16     90.5   81.8      65.8   59.8   81.6   70.6   62.0   73.1   41.5
Fan, et al., CVPR’15          92.4   75.2      65.3   64.0   75.7   68.3   70.4   73.0   42.2
Tompson, et al., NIPS’14      90.6   79.2      67.9   63.4   69.5   71.0   64.2   72.3   47.3
Yang, et al., CVPR’16         90.6   78.1      73.8   68.8   74.8   69.9   58.9   73.6   39.3
Pishchulin, et al., CVPR’16   97.0   91.0      83.8   78.1   91.0   86.7   82.0   87.1   63.5
Wei, et al., CVPR’16          97.8   92.5      87.0   83.9   91.5   90.8   89.9   90.5   65.4
Human Pose Estimation
• “Convolutional Pose Machines”, by Wei, et al. (CVPR 2016)
Results
• Video: https://www.youtube.com/watch?v=llLj50gE9GI
Feature Extraction
• 6 types of features were used:
  • XY Joint Position (XY)
  • Distances from Related Joints (DRJ)
  • Distances from One Joint (DOJ)
  • Absolute Difference (AD)
  • Joint Angles (JA)
  • Velocity (VEL)
Joint Position and Distance from Related Joints
• Raw Joint Position (XY): $F(j) = P(j)$
• Distance from Related Joints (DRJ): $F(j) = \lVert P^{1}(j) - P^{2}(j) \rVert$
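A minimal sketch of these two features, assuming a pose is a list of (x, y) joint coordinates and that the two poses being compared are the two people in the frame, with joints listed in the same order. The function names and the data layout are illustrative assumptions, not taken from the project code.

```python
import math


def xy_feature(pose):
    """XY: the raw joint coordinates, flattened into one feature vector."""
    return [coord for joint in pose for coord in joint]


def drj_feature(pose_a, pose_b):
    """DRJ: Euclidean distance between the same joint index in two poses."""
    return [math.hypot(ax - bx, ay - by)
            for (ax, ay), (bx, by) in zip(pose_a, pose_b)]
```

For example, `drj_feature([(0, 0), (3, 4)], [(0, 0), (0, 0)])` gives one distance per joint, here `[0.0, 5.0]`.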
Distance from One Joint and Absolute Difference
• Distance from One Joint (DOJ): $F(j_1, j_2) = \lVert P^{1}(j_1) - P^{2}(j_2) \rVert$
• Absolute Difference (AD): $F(j) = \lvert P^{1}(j) - P^{2}(j) \rvert$
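These two features might be computed along the following lines. This is only a sketch: poses are assumed to be lists of (x, y) joint coordinates for two people, `ref_joint` is a hypothetical parameter selecting the single joint in the first pose, and the absolute difference is taken per coordinate, which is one plausible reading of the formula.

```python
import math


def doj_feature(pose_a, pose_b, ref_joint):
    """DOJ: distance from one joint of pose_a to every joint of pose_b."""
    rx, ry = pose_a[ref_joint]
    return [math.hypot(rx - bx, ry - by) for bx, by in pose_b]


def ad_feature(pose_a, pose_b):
    """AD: per-coordinate absolute difference between matching joints."""
    feature = []
    for (ax, ay), (bx, by) in zip(pose_a, pose_b):
        feature.extend([abs(ax - bx), abs(ay - by)])
    return feature
```

Unlike DRJ, which collapses each joint pair to a single distance, AD keeps the horizontal and vertical components separate, so it retains direction-independent offsets along each axis.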
Joint Angles and Velocity
• Joint Angles (JA): $F(j_1, j_2) = \tan^{-1}\left( \frac{P_y(j_1) - P_y(j_2)}{P_x(j_1) - P_x(j_2)} \right)$
• Velocity (VEL): $F(j, t_1, t_2) = P(j, t_1) - P(j, t_2)$
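The angle and velocity features can be sketched as below, again assuming a pose is a list of (x, y) joint coordinates. The `joint_pairs` argument (which joints form a limb) is a hypothetical input. Note one deliberate substitution: the sketch uses `math.atan2` instead of the plain arctangent of the ratio, so that a vertical limb (zero horizontal difference) does not cause a division by zero; the two agree whenever the horizontal difference is positive.

```python
import math


def ja_feature(pose, joint_pairs):
    """JA: orientation of the segment between joints j1 and j2, in radians."""
    angles = []
    for j1, j2 in joint_pairs:
        dy = pose[j1][1] - pose[j2][1]
        dx = pose[j1][0] - pose[j2][0]
        angles.append(math.atan2(dy, dx))  # atan2 stands in for atan(dy / dx)
    return angles


def vel_feature(pose_t1, pose_t2):
    """VEL: per-joint displacement between two frames, P(j, t1) - P(j, t2)."""
    feature = []
    for (x1, y1), (x2, y2) in zip(pose_t1, pose_t2):
        feature.extend([x1 - x2, y1 - y2])
    return feature
```

The velocity here is a raw displacement in pixels between the two chosen frames; dividing by the time gap would turn it into a rate if frames are not equally spaced.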