3D Human Action Segmentation and Recognition using Pose Kinetic Energy • Junjie Shan, Srinivas Akella • Department of Computer Science, University of North Carolina, Charlotte • Charlotte, North Carolina
Action Recognition for Patient Safety • Microsoft Kinect sensor • Linet bed
Human Poses and Actions • Pose: Configuration (set of 3D joint coordinates) of a human • Action: Sequence of poses [Figure: Pose 1 to Pose 5. Skeletons and RGB images are from the Cornell Activity Dataset.]
Action Recognition Problem • Action Recognition: Given a sequence of poses containing 3D skeleton data, what is the action type?
Challenges • Varied nature of human actions • Spatial variations: human body size differences; orientation and position change; pose estimation error • Temporal variations: non-linear stretching; random pauses; differences in number of repetitions
RGB-D Sensor Features • RGB + Depth + Coordinates • Depth + Coordinates: most common [LiZL12, WangLWY12, SungPSS12] • Coordinates [YangT14] Source: http://pr.cs.cornell.edu/humanactivities/
Classification Approaches • Sequence based – HMM [CalinonB05], MEMM [SungPSS12], DTW [DarrellP93, GavrilaD95, ShaoL13] – Difficult to train • High-level feature extraction – Extract abstract, meaningful features [WangLWY12, YangT14] – Can use many machine learning algorithms
Our Approach • Normalize spatial features: normalize all human poses to the same scale; rotate and translate all poses to the same position and orientation; repair/discard broken poses • Extract temporal features: identify key poses, omit transition poses; ignore random pauses; segment repetitions • Apply a machine learning algorithm (Random Forest, SVM, KNN)
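The spatial normalization step (same scale, position, and orientation) could be sketched as follows. This is only an illustration under stated assumptions: the joint indices `root`, `ref`, `left`, and `right` are hypothetical, the scale is fixed by a single reference bone, and the rotation is restricted to the vertical (y) axis. The slides do not specify these choices, and the repair/discard step for broken poses is not covered here.

```python
import math

def normalize_pose(pose, root=0, ref=1, left=2, right=3):
    """Translate, scale, and rotate one pose (a list of (x, y, z) joints).

    The joint indices are hypothetical; real skeleton layouts differ.
    """
    rx, ry, rz = pose[root]
    # Translate: put the root joint at the origin.
    moved = [(x - rx, y - ry, z - rz) for x, y, z in pose]
    # Scale: give the root-to-reference bone unit length.
    bone = math.dist(moved[root], moved[ref])
    if bone == 0:
        raise ValueError("degenerate pose: zero reference bone length")
    scaled = [(x / bone, y / bone, z / bone) for x, y, z in moved]
    # Rotate about the vertical (y) axis so that the left-to-right joint
    # direction points along +x in the horizontal plane.
    dx = scaled[right][0] - scaled[left][0]
    dz = scaled[right][2] - scaled[left][2]
    angle = math.atan2(dz, dx)
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in scaled]
```

After this step, two people of different sizes facing different directions would produce comparable joint coordinates for the same pose, which is the point of the normalization stage.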
Outline of Approach
Pose Kinetic Energy • Idea: Identify characteristic poses of action, at extrema of movements • Use kinetic energy
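A minimal sketch of a pose kinetic energy computation is given here for illustration. The slides do not show the formula, so this assumes unit mass per joint and joint velocities estimated by finite differences between consecutive frames; the function names and the `dt` parameter are hypothetical.

```python
def pose_kinetic_energy(prev_pose, pose, dt=1.0):
    """Kinetic energy of `pose`, using finite-difference joint velocities.

    Assumes unit mass per joint; poses are lists of (x, y, z) tuples.
    """
    energy = 0.0
    for (x0, y0, z0), (x1, y1, z1) in zip(prev_pose, pose):
        vx, vy, vz = (x1 - x0) / dt, (y1 - y0) / dt, (z1 - z0) / dt
        energy += 0.5 * (vx * vx + vy * vy + vz * vz)
    return energy

def energy_profile(poses, dt=1.0):
    """Energy per frame; the first frame has no predecessor, so it gets 0."""
    return [0.0] + [pose_kinetic_energy(a, b, dt)
                    for a, b in zip(poses, poses[1:])]
```

Under these assumptions, frames where the body momentarily stops (the extrema of a movement) show up as values near zero in the profile.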
Key Poses • Key poses: Poses that have zero kinetic energy • A key pose P* must satisfy E(P*) = 0 • In practice,
Identifying Key Poses
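One way key poses might be picked out of a per-frame energy sequence is sketched below. Noisy sensor data rarely gives exactly zero energy, so this sketch assumes a small threshold; both the `threshold` parameter and the choice of the middle frame of each low-energy run are assumptions, not details taken from the slides.

```python
def find_key_poses(energies, threshold):
    """Return one frame index per run of consecutive low-energy frames.

    Collapsing a run to its middle frame means a pause of any length
    contributes a single key pose.
    """
    keys, run = [], []
    for i, e in enumerate(energies):
        if e < threshold:
            run.append(i)
        elif run:
            keys.append(run[len(run) // 2])  # middle frame of the run
            run = []
    if run:  # sequence ended inside a low-energy run
        keys.append(run[len(run) // 2])
    return keys
```

For example, `find_key_poses([0.0, 0.01, 0.9, 1.2, 0.02, 0.0, 0.01, 0.8, 0.0], 0.05)` returns `[1, 5, 8]`: three low-energy runs, one index each.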
Atomic Action Template • 5-tuple of key and intermediate poses
Atomic Action Template • Intermediate pose: Pose at middle frame between two consecutive key poses • Atomic action templates used as features • Templates preserve temporal order, e.g., sit down versus stand up
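The template construction might look like the sketch below. The exact composition of the 5-tuple is not spelled out in the slides; this assumes the ordering (key, intermediate, key, intermediate, key) over a sliding window of three consecutive key poses, and the function name is hypothetical.

```python
def atomic_action_templates(poses, key_indices):
    """Build 5-tuples (key, intermediate, key, intermediate, key).

    `key_indices` are frame indices of key poses in temporal order.
    Each intermediate pose is the middle frame between two consecutive
    key poses.
    """
    templates = []
    triples = zip(key_indices, key_indices[1:], key_indices[2:])
    for k0, k1, k2 in triples:
        mid01 = (k0 + k1) // 2
        mid12 = (k1 + k2) // 2
        frames = (k0, mid01, k1, mid12, k2)
        templates.append(tuple(poses[i] for i in frames))
    return templates
```

Because each tuple keeps its poses in frame order, reversing an action (sit down versus stand up) yields a different template, matching the temporal-order property noted above.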
Classification Results on Cornell Data Tested on Cornell Activity dataset with • Random Forest (RF) • Support vector machine (SVM) • K-Nearest neighbor (KNN) • Hidden Markov Model (HMM)
Results on Cornell Activity Dataset
Microsoft Action3D Dataset
Temporal Variations • Method works well on actions with small temporal variations • In fact, robust to significant temporal variations • Tested on randomly stretched action samples
Random Temporal Stretching of Actions [Figure panels: Original, Stretched]
Randomly Stretched Action Sample • Can still identify key poses in randomly stretched action samples
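How the stretched samples were generated is not described in the slides. One plausible way to produce such test data, shown purely as an assumption, is to resample the sequence along a randomly slowed-down time axis with linear interpolation between neighboring frames. The parameters `min_step` and `max_step` are hypothetical.

```python
import random

def random_stretch(poses, min_step=0.3, max_step=1.0, rng=None):
    """Resample `poses` with random time steps of at most one frame.

    Steps below 1.0 slow the motion down non-uniformly, so the output
    has at least as many frames as the input. Poses are lists of
    (x, y, z) tuples.
    """
    if not poses:
        return []
    rng = rng or random.Random()
    last = len(poses) - 1
    out, t = [], 0.0
    while t < last:
        i = int(t)
        frac = t - i
        a, b = poses[i], poses[i + 1]
        # Linear interpolation between frame i and frame i + 1.
        out.append([tuple(p + frac * (q - p) for p, q in zip(ja, jb))
                    for ja, jb in zip(a, b)])
        t += rng.uniform(min_step, max_step)
    out.append(poses[last])  # always end on the final original frame
    return out
```

Passing a seeded generator, for example `random_stretch(poses, rng=random.Random(0))`, makes the stretched sample reproducible across runs.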
Results: Random Stretching • Cornell Activity Dataset
Conclusion • Method to extract features from 3D joint coordinates using kinetic energy, and to recognize actions from those features • Atomic action templates with key poses exhibit good discriminative power with multiple classifiers • Can perform as well as or better than existing methods while using less data • Works robustly on randomly stretched actions
Future Work • Inter-person variation still a challenge • Identifying actions in the presence of noise and occlusion • Evaluation on streaming data • Test on action samples that contain a mix of different types of actions
Acknowledgments • National Science Foundation Award IIS-1258335.