

  1. Gesture Recognition: Hand Pose Estimation Adrian Spurr Ubiquitous Computing Seminar FS2014 27.05.2014 1

  2. What is hand pose estimation? Input → Computer-usable form 2

  3. Applications: Augmented Reality, Gaming, PC Control, Robot Control 3

  4. Data glove • Utilizes optical flex sensors to measure finger bending. • Advantages: high accuracy, can provide haptic feedback. • Disadvantages: invasive, long calibration time, unnatural feeling, heavily instrumented. 4

  5. Thanks to cheap depth cameras... (figure: depth camera vs. RGB camera) 5

  6. ...and increase in GPU Power 6

  7. Problems occurring • Segmentation • Noisy data 7

  8. Problems occurring • Self-occlusion and viewpoint change: 8

  9. Problems occurring • 27 degrees of freedom per hand -> 280 trillion hand poses: 9

  10. Problems occurring • Performance: for practical use, must be real time. 10

  11. Principle of operation (diagram: input → Algorithm → output) 11

  12. Existing schools of thought • Model-based: keeps internal track of the current pose; updates the pose according to the current pose and observation. • Discriminative: maps directly from observation to pose; “learns” from training data and applies that knowledge to unseen data. 12

  13. Short intro to Random Forests • Ensemble learning • Classification and regression • Consists of decision trees (figure: a decision tree) 13

  14. Short intro to Random Forests Data in feature space Features = «Properties» of data 14


  19. Building a classification tree 19


  22. Random feature sampling • Choose the feature 𝑈_𝑘 which splits the data with maximum information gain. 22
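The split-selection step on this slide can be sketched as follows. This is a generic illustration using entropy-based information gain over raw feature thresholds, not the seminar's actual feature set:

```python
import math

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(labels, left, right):
    """Entropy reduction achieved by splitting `labels` into `left`/`right`."""
    n = len(labels)
    return entropy(labels) - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)

def best_split(samples):
    """Pick the (feature index, threshold) with maximum information gain.
    `samples` is a list of (feature_vector, label) pairs."""
    labels = [y for _, y in samples]
    best = (None, None, -1.0)
    n_features = len(samples[0][0])
    for k in range(n_features):  # in a random forest, k would range over a random feature subset
        for x, _ in samples:
            t = x[k]
            left = [y for f, y in samples if f[k] <= t]
            right = [y for f, y in samples if f[k] > t]
            if not left or not right:
                continue
            gain = information_gain(labels, left, right)
            if gain > best[2]:
                best = (k, t, gain)
    return best
```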

  23. Bagging 23
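Bagging (bootstrap aggregating) means each tree in the ensemble is trained on a bootstrap sample of the training set; a minimal sketch:

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) samples with replacement (one bootstrap sample)."""
    return [rng.choice(data) for _ in range(len(data))]

def bag(data, n_trees, seed=0):
    """Produce one bootstrap sample per tree in the ensemble."""
    rng = random.Random(seed)
    return [bootstrap_sample(data, rng) for _ in range(n_trees)]
```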

  24. Prediction 24
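At prediction time the forest averages the per-tree class distributions (equivalently, the trees vote); a sketch assuming each tree is a callable returning a class-probability dict, which is an illustrative interface rather than any particular library's API:

```python
def forest_predict(trees, x):
    """Average the class-probability dicts returned by each tree for input x."""
    totals = {}
    for tree in trees:
        for label, p in tree(x).items():
            totals[label] = totals.get(label, 0.0) + p
    n = len(trees)
    probs = {label: p / n for label, p in totals.items()}
    # Predicted class is the mode of the averaged distribution.
    return max(probs, key=probs.get), probs
```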

  25. RF for pose estimation • Why Random Forests? Robust, fast, thoroughly studied. • How should we use them? Must choose what to split on; what should the labels be? 25

  26. Advanced body pose recognition [Shotton2011] 26

  27. Advanced body pose recognition • Discriminative approach. • Used in the Kinect. • First paper to use synthetic training data. • Basis for many future papers. [Shotton2011] 27

  28. Creating synthetic data [Shotton2011] 28

  29. Split function: d(x) = depth at position x 29 [Shotton2011]
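In [Shotton2011] the split function compares the depth at two offsets around a pixel, with the offsets scaled by 1/d(x) so that the feature is invariant to the distance of the subject from the camera. A sketch; the large background constant and the nested-list image format are assumptions of this illustration:

```python
BACKGROUND = 1e6  # large depth returned for out-of-silhouette pixels (illustrative choice)

def depth(image, x):
    """Depth at pixel x = (row, col); out-of-bounds pixels get a large value."""
    r, c = x
    if 0 <= r < len(image) and 0 <= c < len(image[0]):
        return image[r][c]
    return BACKGROUND

def split_feature(image, x, u, v):
    """f(I, x) = d(x + u / d(x)) - d(x + v / d(x)): depth-normalized offset difference."""
    d = depth(image, x)
    xu = (x[0] + int(u[0] / d), x[1] + int(u[1] / d))
    xv = (x[0] + int(v[0] / d), x[1] + int(v[1] / d))
    return depth(image, xu) - depth(image, xv)
```

A decision tree thresholds this scalar at each node; because the offsets shrink for far-away (large-depth) pixels, the same (u, v) probes comparable physical locations at any distance.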

  30. Joint prediction [Shotton2011] 30

  31. Per-class accuracy vs. tree depth • Accuracy increases as the depth of the tree increases. • Overfitting occurs for 15k training images. • More training images lead to higher accuracy and less overfitting. [Shotton2011] 31

  32. Negative Results • Failure due to self-occlusion: • Failure due to unseen pose: [Shotton2011] 32

  34. Unresolved issues • To capture all possible poses, need to generate huge amount of training data. • Training RF on big training set means more trees and deeper trees. • Big amount of memory needed. • Solution: Divide training data into sub-sets and solve classification for each set separately. 34

  35. Multi-layered Random Forest • Cluster training data based on similarity. • Train an RF for each cluster on that cluster's data. • First layer assigns the input to the proper cluster. • Second layer gives the final hand part label distribution. [Keskin2012] 35

  36. Clustering training data • Cluster based on weighted differences. • Penalize differences of viewpoint and finger positions. • Label each cluster; labels refer to hand shape. • Train a Random Forest on the clusters. 36

  37. Experts • Use hand part labels. • Train a separate Random Forest for each cluster. • Each such forest is called an expert. 37

  38. Two prediction methods • Global Expert Network: feed the input to the first-layer Random Forest and average the votes to get a single hand shape label; then feed the input to the corresponding expert to get the hand part distribution. 38

  39. Two prediction methods • Local Expert Network: feed the input to the first-layer Random Forest to get a hand shape label for each pixel; then feed each pixel to its corresponding expert to get the hand part distribution. 39
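The two routing strategies above differ only in where the shape decision is made; a schematic sketch in which `first_layer` and the `experts` dict are hypothetical callables standing in for the trained forests:

```python
from collections import Counter

def global_expert(first_layer, experts, pixels):
    """Global Expert Network: one shape label for the whole image,
    decided by majority vote over per-pixel first-layer predictions."""
    votes = Counter(first_layer(p) for p in pixels)
    shape = votes.most_common(1)[0][0]
    return [experts[shape](p) for p in pixels]

def local_expert(first_layer, experts, pixels):
    """Local Expert Network: each pixel is routed to its own expert."""
    return [experts[first_layer(p)](p) for p in pixels]
```

The global variant commits to a single hand shape (cheaper, but a wrong shape label misroutes every pixel); the local variant pays per-pixel routing cost but degrades more gracefully.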

  40. Parts distribution to pose • The RDF returns the hand part distribution. • Get the centre of each distribution using mean shift. 40
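Mean shift finds a density mode by repeatedly moving an estimate to the kernel-weighted mean of the surrounding points; a minimal 2-D sketch (the Gaussian bandwidth is an illustrative choice):

```python
import math

def mean_shift(points, start, bandwidth=1.0, iters=50):
    """Shift `start` toward the local density mode of 2-D `points`."""
    x, y = start
    for _ in range(iters):
        wsum = wx = wy = 0.0
        for px, py in points:
            d2 = (px - x) ** 2 + (py - y) ** 2
            w = math.exp(-d2 / (2 * bandwidth ** 2))  # Gaussian kernel weight
            wsum += w
            wx += w * px
            wy += w * py
        nx, ny = wx / wsum, wy / wsum
        if abs(nx - x) < 1e-9 and abs(ny - y) < 1e-9:
            break  # converged to a mode
        x, y = nx, ny
    return x, y
```

Run once per hand part, over that part's pixel votes, to turn each per-part distribution into a single joint position estimate.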

  41. American Sign Language 41

  42. First layer accuracy on ASL • 2-fold cross-validation: 97.8% • Confusion occurs for (m,n), (m,t) and (n,t) 42

  43. Confusions • Confusion occurs for (m,n), (m,t) and (n,t) 43

  44. Second layer accuracy Q = Number of clusters 44

  46. Problems • Not feasible to capture all possible variations of the hand with synthetic data. • Methods using only synthetic data suffer from synthetic-realistic discrepancies. • But: using real training data is expensive, as it must be manually labelled. • Solution: transductive learning. 46

  47. Transductive Random Forest • Transductive learning: learn from labelled data and transfer that knowledge to related unlabelled data. • Estimate the pose based on knowledge gained from both labelled and unlabelled data. 47

  48. Overview 48

  49. Training data • Training data consists of labelled real and synthetic data, and unlabelled real data. • Labelled elements are image patches, not pixels. • Each label is a tuple (a, p, v): a = viewpoint (e.g. «Front»), p = label of the closest joint (e.g. «Thumb»), v = vector containing all joint positions (3×16 coordinates). 49

  50. Quality Function • Randomly choose between the two terms: the transductive term and the classification-regression term. 50

  51. Quality Function • 𝑅_𝑏: measures the quality of a split with respect to the viewpoint a • 𝑅_𝑞: measures the quality of a split with respect to the joint label p • 𝑅_𝑤: measures the compactness of the vote vectors v 51

  52. Quality Function • Measures the “purity” of the node with respect to either the viewpoint a or the joint label p 52

  53. Quality Function • 𝑅_𝑢: measures image similarity between real data patches • 𝑅_𝑣: measures purity based on the association between the labelled and unlabelled data 53

  54. Kinematic Refinement • The hand is biomechanically constrained in the poses it can reach. • Use this to our advantage. • Apply kinematic refinement to enforce these constraints. 54
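One simple way to enforce such constraints is to clamp each estimated joint angle into its anatomically valid range. The joint names and limits below are illustrative placeholders, not the seminar's actual values:

```python
# Illustrative flexion limits in degrees, per joint type (hypothetical values)
JOINT_LIMITS = {
    "mcp_flexion": (-10.0, 90.0),   # metacarpophalangeal joint
    "pip_flexion": (0.0, 110.0),    # proximal interphalangeal joint
    "dip_flexion": (0.0, 80.0),     # distal interphalangeal joint
}

def refine_pose(pose):
    """Clamp each joint angle of `pose` (dict: joint name -> degrees) to its valid range."""
    refined = {}
    for joint, angle in pose.items():
        lo, hi = JOINT_LIMITS[joint]
        refined[joint] = min(max(angle, lo), hi)
    return refined
```

Real kinematic refinement also models inter-joint dependencies (e.g. DIP flexion coupled to PIP flexion), but per-joint clamping already rejects the grossest invalid estimates.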

  55. Some results 55

  56. Joint prediction accuracy 56

  57. Estimating pose of two hands? • Just apply a single-hand pose estimator twice? • What if both hands are strongly interacting? • The additional occlusion must be accounted for. 57

  58. Dual hand pose estimation • Model-based approach. • Set up a parameter space representing all degrees of freedom of both hands. • Employ PSO (Particle Swarm Optimization) to find the parameters best fitting the observation and current configuration with respect to a cost function. 58

  59. Sample parameter space (axes: z = yaw, y = pitch, x = roll) 59

  60. Cost function over param. space 60

  61. Initialization Random sample of n particles with random velocities. 61

  62. Iterating over parameter space • Update particle velocities with regard to: current velocity, local best position, global best position. • Update particle positions according to their velocities. 62

  63. Tracking • Use the RGB image to create a skin map. • Segment the depth image according to the skin map. 63

  64. Tracking • Cost function to optimize: • P(h): penalizes invalid finger positions. • D(O,h,C): penalizes discrepancies between hypothesis h and observation O. 64

  65. Applying PSO • Change particle velocity according to the PSO update rule, where 𝑃_{𝑖,𝑘} = best known position of particle i in generation k and 𝐺_𝑘 = best known position of all particles in generation k. • Apply PSO for each observation O. Exploit temporal information by sampling particles around the previous hypothesis. 65
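The update on this slide is the standard PSO velocity rule; a minimal sketch minimizing a toy cost function, where the hand-tracking objective D(O,h,C) + P(h) would take the place of `cost` and the search bounds and coefficients are illustrative defaults:

```python
import random

def pso(cost, dim, n_particles=30, iters=100, w=0.72, c1=1.5, c2=1.5, seed=0):
    """Standard PSO: v <- w*v + c1*r1*(P_i - x) + c2*r2*(G - x), then x <- x + v."""
    rng = random.Random(seed)
    xs = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    pbest = [x[:] for x in xs]                    # P_i: best position seen by particle i
    pcost = [cost(x) for x in xs]
    g = pbest[min(range(n_particles), key=lambda i: pcost[i])][:]  # G: global best
    gcost = cost(g)
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vs[i][d] = (w * vs[i][d]
                            + c1 * r1 * (pbest[i][d] - xs[i][d])
                            + c2 * r2 * (g[d] - xs[i][d]))
                xs[i][d] += vs[i][d]
            c = cost(xs[i])
            if c < pcost[i]:
                pbest[i], pcost[i] = xs[i][:], c
                if c < gcost:
                    g, gcost = xs[i][:], c
    return g, gcost
```

The slide's point about temporal information corresponds to replacing the uniform initialization of `xs` with samples drawn around the previous frame's best hypothesis.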

  66. Some results 66

  67. Accuracy 67

  68. Future of hand pose estimation • Academically solved. • Further research in recovering more than pose, such as hand models or 3D skin models. • Including the RGB image in prediction increases accuracy. • Use of real data reduces synthetic-realistic discrepancies. 68

  69. Thank you for your attention! 69
