Human Pose Search using Deep Poselets
Nataraj Jammalamadaka*, Andrew Zisserman§, C. V. Jawahar*
* CVIT, IIIT Hyderabad, India
§ Visual Geometry Group, Department of Engineering Science, University of Oxford
Human Pose: Gesture and Action
Examples: cover drive, walking, gesturing.
Human pose is a very important precursor to gesture and action.
Pose Search: Motivation
Example queries: retrieve cover drive shots; retrieve Bharatanatyam poses.
Pose Search: System
Pipeline: take a query image, build a feature for it, search through the video DB (frame descriptors x_1, ..., x_n), and return the retrieved results.
Overview
Deep Poselets:
• Poselet discovery: cluster the pose space.
• Training: train poselets using convolutional neural networks.
• Detection: detect poselets.
Pose retrieval: given a query image, build a bag of deep poselets and return the retrieved results.
Datasets
• Buffy Stickmen (Season 1, 5 episodes)
• ETHZ Pascal dataset (Flickr images)
• H3D (Flickr images)
Datasets
• FLIC dataset (30 Hollywood movies)
• Movie dataset (ours; 22 Hollywood movies, no overlap with FLIC)
Datasets

Dataset                       Train   Validation   Test    Total
H3D                             238            0      0      238
ETHZ Pascal                       0            0    548      548
Buffy                           747            0      0      747
Buffy-2                         396            0      0      396
Movie                          1098          491   2172     3756
FLIC                           2724         2279      0     5003
Total stickmen annotations     5198         2764   2720    10682
+ Flipped version             10396         5528   5440    21364
Overview
Deep Poselets:
• Poselet discovery: cluster the pose space.
• Training: train poselets using convolutional neural networks.
• Detection: detect poselets.
Pose retrieval: given a query image, build a bag of deep poselets and return the retrieved results.
Poselets
Poselets model body parts in a particular spatial configuration.
[Example average images shown for Poselet 1, Poselet 2, and Poselet 3.]
Poselets: Discovery
• Start from training data with ground-truth stickmen annotations.
• Reorganize the annotations into part sets: left arm (LA), LA + head, LA + head + torso; right arm (RA), RA + head, RA + head + torso; all parts except head.
• For each set, get pose descriptors: for each body part, note its angle.
• K-means clustering: cluster on the angles; each cluster becomes a poselet (poselet average images shown).
A clustering sketch follows.
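A minimal Python sketch of this clustering step, assuming a hypothetical (N, P) array of body-part angles for one part set; the (cos, sin) embedding is my assumption to make Euclidean k-means respect the wrap-around of angles, since the slide only states that the angles are clustered:

```python
import numpy as np
from sklearn.cluster import KMeans

def discover_poselets(part_angles, n_clusters=10, seed=0):
    """Cluster pose descriptors built from body-part angles into poselets.

    part_angles: (N, P) array of part angles in radians for one part set
    (e.g. left arm + head), taken from the stickmen annotations.
    Each resulting cluster defines one poselet.
    """
    # Embed each angle as (cos, sin) so angles near the 0 / 2*pi boundary
    # are treated as close by the Euclidean metric used by k-means.
    desc = np.concatenate([np.cos(part_angles), np.sin(part_angles)], axis=1)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(desc)
    return km.labels_, km.cluster_centers_

# Hypothetical usage: 500 training stickmen, 3 parts in this part set.
angles = np.random.uniform(0, 2 * np.pi, size=(500, 3))
labels, centers = discover_poselets(angles)
```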
Deep Poselets: CNNs
Architecture: input, then convolutional layers (convolution followed by max pooling, Layers 1-5), then fully connected layers (Layers 6-7), then a softmax layer (Layer 8) producing the deep poselet labels.
ReLU non-linearity: f(x) = max(0, x)
Softmax layer: σ(x_j) = exp(x_j) / Σ_k exp(x_k)
[Diagram: convolution and max-pooling stages with their filter and feature-map sizes.]
A sketch of both non-linearities follows.
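The two non-linearities written out as a small NumPy sketch; the max subtraction inside the softmax is a standard numerical-stability trick, not something the slide specifies:

```python
import numpy as np

def relu(x):
    # ReLU non-linearity: f(x) = max(0, x), applied elementwise.
    return np.maximum(0.0, x)

def softmax(x):
    # Softmax over the last axis: sigma(x_j) = exp(x_j) / sum_k exp(x_k).
    z = x - np.max(x, axis=-1, keepdims=True)  # stability: shift by the max
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

scores = np.array([1.0, 2.0, -0.5])
print(softmax(relu(scores)))  # probabilities over deep poselet labels
```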
Deep Poselets: Training
(Same architecture: input, convolutional layers, fully connected layers, softmax, deep poselet labels.)
Input image: x; model parameters: w; ground truth: t; output: y = f(x, w).
Loss function (cross-entropy): L = -Σ_i t_i log(y_i)
Training: stochastic gradient descent, w ← w - η ∂L/∂w.
Architecture from Krizhevsky et al., NIPS 2012. A sketch of the loss and update follows.
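A minimal NumPy sketch of the loss and update rule exactly as written above; the one-hot encoding of t and the small epsilon guard are my additions:

```python
import numpy as np

def cross_entropy(y, t, eps=1e-12):
    # L = -sum_i t_i * log(y_i): t is the one-hot ground-truth vector,
    # y is the softmax output of the network.
    return -np.sum(t * np.log(y + eps))

def sgd_step(w, grad_w, lr=0.01):
    # One stochastic gradient descent update: w <- w - eta * dL/dw.
    return w - lr * grad_w

y = np.array([0.1, 0.7, 0.2])   # network output for one training crop
t = np.array([0.0, 1.0, 0.0])   # one-hot deep poselet label
print(cross_entropy(y, t))
```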
Deep Poselets: Fine-tuning
Challenge:
• The network has 40 million parameters.
• Required training data: ~1-2 million images.
• Available training data: ~50K.
Solution:
• Train the network on a task with enough data.
• Fine-tune the network to the current task.
Fine-tuning procedure:
• Train an image classification task using ImageNet data of size 1.2 million.
• Replace the softmax layer with a randomly initialized one.
• Run the gradient descent.
A sketch of this procedure follows.
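A hedged sketch of this procedure using modern PyTorch/torchvision as a stand-in (the original work used a Krizhevsky-style implementation, not PyTorch); the 122 output classes are inferred from the 122-D descriptor on the indexing slide:

```python
import torch
import torchvision

# Start from a network pre-trained on ImageNet (1.2M images), replace
# the final classification layer with a randomly initialized one sized
# for the deep poselet labels, then run gradient descent on our data.
model = torchvision.models.alexnet(weights="IMAGENET1K_V1")
num_poselets = 122  # assumption: one class per deep poselet
model.classifier[6] = torch.nn.Linear(4096, num_poselets)  # random init

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = torch.nn.CrossEntropyLoss()
# ...standard training loop over the ~50K poselet crops goes here...
```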
Deep Poselets: Detection
Given a test image, run all the deep poselets:
• Each poselet occurs in a localized region within an upper-body detection.
• Run the classifiers only at the expected center points of the poselets.
• This improves both speed and accuracy.
A sketch of this windowed evaluation follows.
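A rough sketch of this windowed evaluation; `crop_patch` and the per-poselet offsets are hypothetical names, since the slide does not spell out these details:

```python
import numpy as np

def detect_poselets(image, classifiers, expected_offsets, ub_box):
    """Score every deep poselet only at its expected location.

    expected_offsets: per-poselet (dx, dy), the expected poselet center
    as a fraction of the upper-body box (learned from training data).
    """
    x, y, w, h = ub_box
    scores = np.zeros(len(classifiers))
    for i, (clf, (dx, dy)) in enumerate(zip(classifiers, expected_offsets)):
        cx, cy = x + dx * w, y + dy * h    # expected poselet center
        patch = crop_patch(image, cx, cy)  # hypothetical crop helper
        scores[i] = clf(patch)             # CNN score for this poselet
    return scores
```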
Deep Poselets: Spatial reasoning
Problem: three detections fired in the same area, with scores 0.3 (detection 1), 0.7 (detection 2), and 0.2 (detection 3).
Deep Poselets: Spatial reasoning
Problem: the three detections fired in the same area (scores 0.3, 0.7, 0.2).
Objective: rescore detection 2 (score 0.7) to 1, and detections 1 and 3 (scores 0.3, 0.2) to 0.
Solution: for each poselet, learn a regression function whose
• input is the scores of the other poselet detections, and
• output is the new score.
A sketch of the rescoring step follows.
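A sketch of the rescoring step; the slide only says "regression function", so the ridge regressor and the 0/1 correctness targets here are my assumptions:

```python
import numpy as np
from sklearn.linear_model import Ridge

def train_rescorers(det_scores, labels, alpha=1.0):
    """One regressor per poselet, mapping the other poselets' scores
    to a new score for this poselet's detection.

    det_scores: (N, P) raw scores for N upper bodies, P poselets.
    labels:     (N, P) 0/1 ground-truth correctness per detection.
    """
    P = det_scores.shape[1]
    rescorers = []
    for p in range(P):
        others = np.delete(det_scores, p, axis=1)  # drop poselet p's own score
        rescorers.append(Ridge(alpha=alpha).fit(others, labels[:, p]))
    return rescorers
```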
Deep Poselets: Results
• Evaluation measure: mean average precision (MAP) on the test set; a sketch of the AP computation follows.
• Comparison: baseline poselets trained using HOG features.

Method                    MAP-test
HOG                       32.6
CNN before fine-tuning    48.6
CNN after fine-tuning     56.0
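For reference, a small NumPy sketch of the average-precision computation behind the MAP numbers above (the standard definition, not code from the paper):

```python
import numpy as np

def average_precision(scores, relevant):
    # AP: mean of the precision values at the rank of each relevant item.
    order = np.argsort(-scores)          # rank results by descending score
    rel = np.asarray(relevant)[order]
    if not rel.any():
        return 0.0
    ranks = np.flatnonzero(rel) + 1      # 1-based ranks of the relevant items
    hits = np.arange(1, len(ranks) + 1)  # cumulative hit count at those ranks
    return float(np.mean(hits / ranks))

# MAP is then the mean of AP over all poselets (or, later, all queries).
```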
Deep Poselets: Results
[Top-ranked detections (ranks 1-36) for two example poselets: one with AP 40.4 and 1863 positives in the train set, one with AP 78.1 and 698 positives.]
Overview
Deep Poselets:
• Poselet discovery: cluster the pose space.
• Training: train poselets using convolutional neural networks.
• Detection: detect poselets.
Pose retrieval: given a query image, build a bag of deep poselets and return the retrieved results.
Pose Search: Indexing
For each frame in the video DB collection:
• Detect the upper body.
• Run all the poselets.
• Perform spatial reasoning.
• Descriptor: max-pool the deep poselet detections into a 122-D vector.
• Index the descriptor in a database.
A sketch of the descriptor follows.
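A minimal sketch of the per-frame descriptor, assuming detections arrive as (poselet_id, score) pairs after spatial reasoning:

```python
import numpy as np

def frame_descriptor(detections, n_poselets=122):
    """Max-pool the deep poselet detections into one vector per frame:
    the descriptor keeps the best score seen for each poselet."""
    d = np.zeros(n_poselets)
    for pid, score in detections:
        d[pid] = max(d[pid], score)
    return d  # the 122-D bag-of-deep-poselets descriptor

print(frame_descriptor([(3, 0.7), (3, 0.4), (17, 0.9)])[[3, 17]])  # [0.7 0.9]
```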
Pose Search: Retrieval
• Given a query image, build its bag-of-deep-poselets descriptor.
• Search through the database using cosine distance.
• Return the retrieved results.
A sketch of the search step follows.
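A minimal NumPy sketch of the search step; ranking by cosine distance is equivalent to ranking by dot product on L2-normalized descriptors, which is what this does:

```python
import numpy as np

def retrieve(query_desc, db_descs, top_k=25):
    # Cosine similarity between the query descriptor and every frame
    # descriptor in the database; return the top-ranked frame indices.
    q = query_desc / (np.linalg.norm(query_desc) + 1e-12)
    db = db_descs / (np.linalg.norm(db_descs, axis=1, keepdims=True) + 1e-12)
    return np.argsort(-(db @ q))[:top_k]
```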
Pose Search: Results
Experimental setup:
• Database: the test data of size 5440 is used as the database.
• Queries: all samples in the test data are used as queries.
• Evaluation metric: mean average precision (MAP).
Methods compared against:
• Bag of visual words (BOVW): detect SIFT → k-means (K = 1000) → vector quantization.
• Berkeley Poselets (BPL): run poselets → bag of parts.
• Human pose estimation (HPE) [1]: run a human pose estimation algorithm, then concatenate (sin(x), cos(x)) of all the body-part angles.

Method     MAP
BOVW       14.2
BPL        15.3
HPE [1]    17.5
Ours       34.6

[1] Y. Yang and D. Ramanan. "Articulated pose estimation with flexible mixtures-of-parts." In CVPR, 2011.
Pose Search: Results
Comparison with the state of the art [histogram: percentage of queries vs. average precision]:
• HPE [1] (MAP 17.5): 75% of queries have < 20% AP; 5% of queries have > 50% AP.
• Ours (MAP 34.6): 45% of queries have < 20% AP; 25% of queries have > 50% AP.
Pose Search: Analysis
HPE [figure: ground truth vs. detection]:
• Pose estimation algorithms often commit to a wrong pose.
• Pose search systems based on them perform poorly.
Ours [figure: poselet detections with scores 0.3, 0.2, 0.7]:
• The bag-of-poselets descriptor encodes multiple proposals weighted by their likelihood.
• Hence it can recover when some of the detections are wrong.
Pose Search: Results
[Three example queries with AP 59.4, 44.5, and 40.3; each shown with its precision-recall curve and the retrieved results at ranks 1, 5, 10, 15, 20, and 25.]
Summary
• We propose a novel deep-poselet-based human pose search system.
• Our deep poselet method outperforms HOG-based poselets by 25% MAP.
• Our pose retrieval method improves on the current state-of-the-art system by 17% MAP.
Thank you. Questions?