Hand Pose Estimation Matthew Krenik Advisor: Fabrizio Pece

Agenda § What is Hand Pose Estimation? § Why does it matter? § How does it work? § What has been done? 2

What is Hand Pose Estimation?
§ Estimate the full set of degrees of freedom (DOF) of a hand from depth images
§ This is a tough problem, especially in real time!
§ Not to be confused with “hand shape estimation”

Why Does it Matter?
§ More than just gestures
§ Ideal for continuous input applications
§ Links your hand dexterity to a computer model
§ Will it redefine how we interact with computers?

Gaming

Design / Engineering

Robot Hand Control – Surgery? Industry?

Communication – Sign Language

How Does it Work? § Its going to take some time to explain § Starting from the ground up! § Decision trees § Ensemble techniques § Random forests § Body Pose estimation § Hand Pose Estimation § Assumption is that everyone has a very basic idea of what machine learning is and does 10

Machine Learning § Goal: § Given training data T with entries ( 𝒚 , 𝒛 ) § Find a model that estimates 𝒛 for unseen 𝒚 § This is called prediction § Quality Measurement: § Minimize the probability of model prediction errors on future data § What are some models? § Linear Regression § Support Vector Machines § Decision Trees! 11

Decision Trees § Very intuitive § Each node asks a question about a feature of the data § Propagates through the tree depending on the answer to each question § When algorithm gets to the end, the decision tree makes a classification 12

How to grow a tree from data? § In what order do we ask the questions (test features)? § Each possible tree has an amount of entropy § Test out all possible questions for a node, and choose the one that reduces the entropy the most (largest information gain) § How do nodes make decisions based on the features? § Same way! § Choose a decision boundary that gives the largest information gain 13

How to grow a tree from data?

Decision Trees: A Pretty Good Model!

Ensemble Learning § Two competing methodologies: § Traditional: Build one really good model § Ensemble: Build many models and average the results § Build a ton of “pretty good” models § Combine them into one “pretty awesome” prediction! § Important for individual models to not be correlated, otherwise there is a strong tendency to overfit § So we add randomness! 16

Ensemble Techniques § Bootstrap Aggregation (Bagging) § Take a random subsample from the training set T, with replacement § Train each model on a different subsample § Classification is the majority vote; Regression is the average § Random Forests: Multiple, randomized decision trees 1. Bagging 2. Randomized Node Optimization: choose random set of questions § Number of questions affects the correlation of the trees 3. Decision boundary of the decision trees: conic, linear, etc. 4. Depth of the component decision trees § More depth means there will be more overfitting 17

Example: Different Trees

Example: Random Decision Forest

Example: Multi-class Decision Trees

Example: Comparison to SVM Model

A quick look at body pose estimation
§ Body Pose Estimation Pipeline
§ Technology found in consumer devices, like the Kinect
§ Very similar to hand pose estimation

Hand Pose Estimation Pipeline

What makes Hand Pose tough?
§ The hand is much smaller than the body, but still has 22 DOF
§ Self-occlusion is very common and severe
§ The hand can be rotated in any direction (the body is always upright)
§ Real depth data can be difficult to label

Some ideas.. § Restrict the viewing area of the hand § One Advantage: Hands are fairly invariant among humans § Train with synthetic data, rendered from 3D models 27

Train based on Synthetic Data § Use 3D hand models to generate data § Train the Random Decision Forests using this data 28

Hand Pose Estimation Pipeline

Pixel Classification (figure panels: One Tree, Two Trees, Three Trees)

Mean shift local mode finding
§ Algorithm used to determine where the joints are
§ Each pixel is given a weighted Gaussian kernel
§ The weight is the class probability times the depth
§ Gradient ascent from many starting points finds the local maxima
§ The highest local maximum determines the joint
§ Threshold the scores to filter out non-visible joints
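The steps on this slide can be sketched for a single joint class as follows. This is an illustration under assumptions, not the presenter's implementation: pixels are plain coordinate tuples, the kernel is exp(-d²/bandwidth²), every pixel is used as a starting point, and the names `mean_shift_mode`, `estimate_joint`, `bandwidth`, and `min_score` are hypothetical.

```python
import math


def _sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _kernel_weights(point, pixels, weights, bandwidth):
    """Weighted Gaussian kernel value of every pixel, evaluated at point."""
    return [w * math.exp(-_sq_dist(point, p) / bandwidth ** 2)
            for p, w in zip(pixels, weights)]


def mean_shift_mode(start, pixels, weights, bandwidth, iterations=50, tol=1e-6):
    """Climb the weighted density from start; return (mode, density at mode)."""
    point = tuple(start)
    for _ in range(iterations):
        k = _kernel_weights(point, pixels, weights, bandwidth)
        total = sum(k)
        if total == 0.0:
            break
        # Mean shift update: move to the kernel-weighted mean of the pixels.
        shifted = tuple(sum(ki * p[d] for ki, p in zip(k, pixels)) / total
                        for d in range(len(point)))
        moved = _sq_dist(shifted, point)
        point = shifted
        if moved < tol ** 2:
            break
    return point, sum(_kernel_weights(point, pixels, weights, bandwidth))


def estimate_joint(pixels, class_probs, depths, bandwidth, min_score):
    """Highest-scoring mode, or None if its score falls below the threshold."""
    weights = [p * d for p, d in zip(class_probs, depths)]  # probability times depth
    modes = [mean_shift_mode(px, pixels, weights, bandwidth) for px in pixels]
    best_point, best_score = max(modes, key=lambda m: m[1])
    return best_point if best_score >= min_score else None


# Hypothetical toy input: a confident cluster near (10, 10) and one weak outlier.
pixels = [(9.0, 10.0), (10.0, 9.0), (10.0, 10.0), (11.0, 10.0), (10.0, 11.0),
          (40.0, 40.0)]
class_probs = [0.9] * 5 + [0.2]
depths = [0.5] * 6
joint = estimate_joint(pixels, class_probs, depths, bandwidth=3.0, min_score=1.0)
```

Here the cluster wins and the joint lands near (10, 10). Raising `min_score` above the cluster's density makes the function return `None`, which is how a non-visible joint would be filtered out.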

Joint Determination

Hand Pose Estimation Algorithm
Strengths
§ Very fast
§ Robust to fast movements and noise
§ No initialization needed
§ Can run on a GPU for interface applications or games
Issues
§ Training must be done offline
§ Number of images ~1-10M, which takes 25-250 GB of data
§ Number of operations is huge even with a simple algorithm

Limitations of Single Layer RDF
§ Difficult to generate every possible hand pose
§ Dataset size is huge!
§ Hard to capture the variation in the data set
§ More variation → deeper trees → more RAM/memory
§ Solution: divide into sub-problems and solve with separate RDFs
§ Lower variation → lower complexity → less RAM/memory

Multi-layered RDFs for Hand Pose

Two Structures of Multi-layer RDFs
§ Local Expert Network
  § Hand shape classification gives each pixel a label
  § Train local expert forests for each pixel label
  § The expert forest depends on the pixel label; each pixel is classified
§ Global Expert Network
  § Hand shape classification gives each pixel a label
  § The hand shape is determined by pixel voting
  § Train global expert forests for each hand shape label
  § The expert forest depends on the hand shape label; each pixel is classified
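The difference between the two structures comes down to how a pixel is routed to an expert. The sketch below assumes the first-layer forest and the experts are plain callables; `shape_forest`, `experts`, and the toy labels are hypothetical stand-ins, not trained forests, and voting uses hard labels for simplicity.

```python
from collections import Counter


def local_expert_network(pixels, shape_forest, experts):
    """LEN: each pixel goes to the expert selected by its own shape label."""
    return [experts[shape_forest(p)](p) for p in pixels]


def global_expert_network(pixels, shape_forest, experts):
    """GEN: pixels vote on one hand shape; that shape's expert labels every pixel."""
    votes = Counter(shape_forest(p) for p in pixels)
    hand_shape = votes.most_common(1)[0][0]
    return [experts[hand_shape](p) for p in pixels]


# Hypothetical stand-ins for trained forests, for illustration only.
def shape_forest(p):
    return "fist" if p < 0.5 else "flat"


experts = {
    "fist": lambda p: "fist-part",
    "flat": lambda p: "flat-part",
}
pixels = [0.1, 0.2, 0.3, 0.9]

len_labels = local_expert_network(pixels, shape_forest, experts)
gen_labels = global_expert_network(pixels, shape_forest, experts)
```

With these toy inputs the last pixel disagrees with the majority: the LEN sends it to the "flat" expert, while the GEN overrides it with the voted "fist" expert.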

Local Expert Network

Global Expert Network

Training a Multi-layer RDF
§ Given the same data as before (hand shape not given)
  1. Cluster the data
  2. Train the Hand Shape Classifier based on all clusters
  3. Train each Pixel Classifier based on a specific cluster

Which is better: GEN or LEN?
§ Global Expert Networks average class distributions → more robust to noise
§ Local Expert Networks use info from each pixel → better at generalizing to unseen data

Test: American Sign Language

Results
§ Huge improvement over single-layer RDFs

Results
§ Remaining errors are concentrated on very similar poses

Summary § What is Hand Pose Estimation? Determine the joint positions to fix all DOFs of the hand § Why does it matter? Continuous Input Applications § How does it work? Randomized Decision Forests § What has been done? Add multiple layers for increased performance. 45

References § [1] Keskin- Hand Pose Estimation and Hand Shape Classification Using Multi-layered Randomized Decision Forests § [2] Thompson-Real Time Continuous Pose Recovery of Human Hands Using Convolutional Networks § [3] Qian- Realtime and Robust Hand Tracking from Depth § [4] Tang- Latent Regression Forest: Structured Estimation of 3D Articulated Hand Posture § [5] Oikonomidis - Evolutionary Quasi-random Search for Hand Articulations Tracking [6] Wang - 6D Hands: Markerless Hand Tracking for Computer Aided Design § § [7] Hilliges - Advanced topics in Gesture Recognition Part II 46

Questions?

Appendix: Getting Hand Shape from Hand Pose
§ Hand shape is just shape information: “fist”, “flat”, etc.
§ Hand pose is the specific joint angles for every DOF
§ With hand pose, an SVM can determine hand shape very robustly
