Human Pose Search using Deep Poselets


1. Human Pose Search using Deep Poselets
Nataraj Jammalamadaka*, Andrew Zisserman§, C. V. Jawahar*
*CVIT, IIIT Hyderabad, India; §Visual Geometry Group, Department of Engineering Science, University of Oxford

2. Human Pose: Gesture and action
Examples: cover drive, walking, gesturing.
Human pose is a very important precursor to gesture and action.

3. Pose Search: Motivation
Example queries: retrieve cover-drive shots; retrieve Bharatanatyam poses.

4. Pose Search: System
Take a query image, build a feature (y_1, …, y_n), search through the video database, and return the retrieved results.

5. Overview
Deep poselets: poselet discovery (cluster the pose space), training (train poselets using convolutional neural networks), and detection (detect poselets).
Pose retrieval: given a query image, build a bag of deep poselets and return the retrieved results.

6. Datasets
Buffy Stickmen (Season 1, 5 episodes), ETHZ Pascal dataset (Flickr images), H3D (Flickr images).

7. Datasets
FLIC dataset (30 Hollywood movies); Movie dataset (ours, 22 Hollywood movies, no overlap with FLIC).

8. Datasets
Dataset                       Train   Validation   Test    Total
H3D                             238            0      0      238
ETHZ Pascal                       0            0    548      548
Buffy                           747            0      0      747
Buffy-2                         396            0      0      396
Movie                          1098          491   2172     3756
FLIC                           2724         2279      0     5003
Total stickmen annotations     5198         2764   2720    10682
+ flipped versions            10396         5528   5440    21364

9. Overview
Deep poselets: poselet discovery (cluster the pose space), training (train poselets using convolutional neural networks), and detection (detect poselets).
Pose retrieval: given a query image, build a bag of deep poselets and return the retrieved results.

10. Poselets
Poselets model body parts in a particular spatial configuration.

11. Poselets
Poselets model body parts in a particular spatial configuration. Example: Poselet 1.

12. Poselets
Poselets model body parts in a particular spatial configuration. Example: Poselet 2.

13. Poselets
Poselets model body parts in a particular spatial configuration. Example: Poselet 3.

14. Poselets: Discovery
• Start from training data with ground-truth stickmen annotations.
• Reorganize the annotations into part sets: left arm (LA), LA + head, LA + head + torso, right arm (RA), RA + head, RA + head + torso, and all parts except the head.
• For each set, compute pose descriptors: for each body part, note its angle.
• Cluster on the angles with k-means; each cluster defines a poselet (the slide shows poselet average images).
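
A minimal sketch of this clustering step, not taken from the authors' code: the slide only says "cluster on the angles", so the (cos, sin) encoding of each angle, the cluster count, and the data layout below are my assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

def angle_descriptor(part_angles):
    """Encode body-part angles (radians) as (cos, sin) pairs so that
    angles near 0 and 2*pi map to nearby descriptors (assumed encoding)."""
    angles = np.asarray(part_angles, dtype=float)
    return np.concatenate([np.cos(angles), np.sin(angles)])

def discover_poselets(annotations, n_clusters=30, seed=0):
    """Cluster pose descriptors of one part set (e.g. 'LA + head').

    annotations: list of angle lists, one per training stickman.
    Returns a fitted k-means model; each cluster is one poselet.
    """
    X = np.stack([angle_descriptor(a) for a in annotations])
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    km.fit(X)
    return km

# Toy usage with stand-in data: 500 stickmen, 3 parts in this set.
rng = np.random.default_rng(0)
fake_angles = rng.uniform(0, 2 * np.pi, size=(500, 3))
model = discover_poselets(list(fake_angles), n_clusters=10)
print(model.labels_[:10])  # poselet index assigned to the first 10 samples
```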

15. Deep Poselets: CNNs
Architecture: input image → convolutional layers (convolution with 5×5 and 3×3 filters, each followed by max pooling; layers 1-5) → fully connected layers (layers 6-7) → softmax layer (layer 8) producing the deep-poselet labels.
ReLU non-linearity: f(x) = max(0, x).
Softmax layer: f(x_j) = exp(x_j) / Σ_k exp(x_k).
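
To make the two activation formulas concrete, here is a small NumPy illustration (not the authors' code); the max-subtraction inside the softmax is a standard numerical-stability trick, not something stated on the slide.

```python
import numpy as np

def relu(x):
    """ReLU non-linearity: f(x) = max(0, x), applied elementwise."""
    return np.maximum(0.0, x)

def softmax(x):
    """Softmax over the last axis: f(x_j) = exp(x_j) / sum_k exp(x_k).
    Subtracting the max is for numerical stability only."""
    z = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

scores = np.array([2.0, -1.0, 0.5])
print(relu(scores))     # [2.  0.  0.5]
print(softmax(scores))  # probabilities that sum to 1
```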

16. Deep Poselets: Training
Same architecture: input → convolutional layers with pooling → fully connected layers → softmax layer producing the deep-poselet labels.
Notation: input image x, model parameters w, ground-truth label vector g, network output y = f(x, w).
Training: stochastic gradient descent with the cross-entropy loss L = −Σ_k g_k log(y_k) and update w ← w − η ∂L/∂w.
Architecture from Krizhevsky et al., NIPS 2012.
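
A toy sketch of the loss and the SGD update, reduced to a single linear-plus-softmax layer (my simplification; the actual model is the full CNN above, and the learning rate and names are illustrative).

```python
import numpy as np

def softmax(x):
    z = np.exp(x - np.max(x))
    return z / z.sum()

def cross_entropy(y_pred, g):
    """L = -sum_k g_k * log(y_k) for a one-hot ground truth g."""
    return -np.sum(g * np.log(y_pred + 1e-12))

def sgd_step(W, x, g, lr=0.01):
    """One SGD update W <- W - lr * dL/dW for a linear layer followed by softmax."""
    y = softmax(W @ x)
    grad_W = np.outer(y - g, x)   # gradient of the cross-entropy loss w.r.t. W
    return W - lr * grad_W

# Tiny example: 5-dim input, 3 classes, one training sample.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5)) * 0.1
x = rng.normal(size=5)
g = np.array([0.0, 1.0, 0.0])            # one-hot ground truth
for _ in range(100):
    W = sgd_step(W, x, g)
print(cross_entropy(softmax(W @ x), g))  # loss shrinks toward 0
```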

17. Deep Poselets: Fine-tuning
Challenge:
• The network has 40 million parameters.
• Required training data: ~1-2 million images.
• Available training data: ~50K.
Solution: train the network on a task with enough data, then fine-tune it to the current task.
Fine-tuning procedure:
• Train an image-classification task using ImageNet data (1.2 million images).
• Replace the softmax layer with a randomly initialized one.
• Run gradient descent on the deep-poselet task.
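
A hedged PyTorch sketch of the same idea: start from an ImageNet-pretrained AlexNet-style network (the Krizhevsky et al. architecture), replace its final classification layer with a freshly initialized one sized for the deep-poselet labels, and continue gradient descent. The class count (taken from the 122-D descriptor on slide 24), optimizer settings, and the choice to update all layers are my assumptions, not the authors' exact setup.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_POSELET_CLASSES = 122  # assumed from the 122-D descriptor; not stated as the class count

# ImageNet-pretrained AlexNet as the starting point.
net = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)

# Replace the last fully connected layer with a randomly initialized one.
net.classifier[6] = nn.Linear(net.classifier[6].in_features, NUM_POSELET_CLASSES)

# Fine-tune: run gradient descent on the poselet data.
optimizer = torch.optim.SGD(net.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def fine_tune_step(images, labels):
    """One SGD step on a batch of poselet crops and their class labels."""
    optimizer.zero_grad()
    loss = criterion(net(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

# Dummy batch just to show the call signature (8 crops of 224x224 RGB).
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, NUM_POSELET_CLASSES, (8,))
print(fine_tune_step(images, labels))
```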

18. Deep Poselets: Detection
Given a test image, run all the deep poselets:
• Each poselet occurs in a localized region within an upper-body detection.
• Run the classifiers only at the expected center points of the poselets.
• This improves both speed and accuracy.
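
An illustrative sketch of evaluating each poselet classifier only at its expected center point inside an upper-body detection. The normalized center coordinates, crop size, and classifier interface are placeholders introduced here for illustration; the slide does not say how the expected locations are obtained.

```python
import numpy as np

# Expected poselet centers in coordinates normalized to the upper-body box:
# (x, y) in [0, 1] x [0, 1].  The names and values below are made up.
EXPECTED_CENTERS = {
    "left_arm_raised": (0.25, 0.40),
    "right_arm_raised": (0.75, 0.40),
    "head_tilt_left": (0.50, 0.15),
}

def detect_poselets(image, upper_body_box, classifiers, crop_size=64):
    """Run each poselet classifier on a crop around its expected center.

    upper_body_box: (x0, y0, x1, y1) in image coordinates.
    classifiers: dict mapping poselet name -> callable(crop) -> score.
    Returns a dict of poselet scores for this upper-body detection.
    """
    x0, y0, x1, y1 = upper_body_box
    w, h = x1 - x0, y1 - y0
    scores = {}
    for name, (nx, ny) in EXPECTED_CENTERS.items():
        cx, cy = int(x0 + nx * w), int(y0 + ny * h)
        half = crop_size // 2
        crop = image[max(cy - half, 0):cy + half, max(cx - half, 0):cx + half]
        scores[name] = classifiers[name](crop)
    return scores
```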

19. Deep Poselets: Spatial reasoning
Problem: three detections (with scores 0.3, 0.7 and 0.2) fired in the same area.

20. Deep Poselets: Spatial reasoning
Problem: the three detections fired in the same area.
Objective: rescore detection 2 to 1 and detections 1 and 3 to 0 (0.3 → 0, 0.7 → 1, 0.2 → 0).
Solution: for each poselet, learn a regression function whose input is the scores of the other poselet detections and whose output is the new score.
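
A minimal sketch of such per-poselet rescoring. The slide only says "regression function"; the choice of logistic regression, the data layout, and the stand-in labels below are my assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_rescorer(scores, is_correct, target_idx):
    """Learn to rescore poselet `target_idx` from the other poselets' scores.

    scores: (n_samples, n_poselets) detection scores per upper body.
    is_correct: (n_samples,) 1 if poselet `target_idx` is truly present, else 0.
    """
    X = np.delete(scores, target_idx, axis=1)  # scores of the *other* poselets
    model = LogisticRegression(max_iter=1000)
    model.fit(X, is_correct)
    return model

def rescore(model, scores, target_idx):
    X = np.delete(scores, target_idx, axis=1)
    return model.predict_proba(X)[:, 1]  # new score in [0, 1]

# Toy example with 3 poselets mirroring the slide's 0.3 / 0.7 / 0.2 case.
rng = np.random.default_rng(0)
train_scores = rng.uniform(size=(200, 3))
# Stand-in ground truth, correlated with the context (other poselets') scores.
labels = (train_scores[:, [0, 2]].mean(axis=1) > 0.5).astype(int)
m = train_rescorer(train_scores, labels, target_idx=1)
print(rescore(m, np.array([[0.3, 0.7, 0.2]]), target_idx=1))
```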

21. Deep Poselets: Results
• Evaluation measure: mean average precision (MAP).
• Comparison: poselets trained using the HOG feature.

Method                    MAP (test)
HOG                             32.6
CNN before fine-tuning          48.6
CNN after fine-tuning           56.0

22. Deep Poselets: Results
Qualitative results for two poselets: one with AP 40.4 (1863 positives in the train set) and one with AP 78.1 (698 positives in the train set); top-ranked detections shown at ranks 1-36.

23. Overview
Deep poselets: poselet discovery (cluster the pose space), training (train poselets using convolutional neural networks), and detection (detect poselets).
Pose retrieval: given a query image, build a bag of deep poselets and return the retrieved results.

24. Pose Search: Indexing
For each frame in the video database collection:
• Detect the upper body.
• Run all the poselets.
• Perform spatial reasoning.
• Descriptor: max-pool the deep-poselet detections into a 122-D vector.
• Index the descriptor in a database.
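
A small sketch of turning per-frame poselet detections into the 122-D max-pooled descriptor and stacking them into an index. The data layout (a list of score arrays, one per upper-body detection in the frame) is my assumption.

```python
import numpy as np

NUM_POSELETS = 122  # one dimension per deep poselet, as on the slide

def frame_descriptor(detection_scores):
    """Max-pool poselet scores over all detections in one frame.

    detection_scores: list of (NUM_POSELETS,) score arrays, one per
    upper-body detection found in the frame (after spatial reasoning).
    Returns a single (NUM_POSELETS,) descriptor for the frame.
    """
    if not detection_scores:
        return np.zeros(NUM_POSELETS)
    return np.max(np.stack(detection_scores), axis=0)

def build_index(frames_scores):
    """Stack per-frame descriptors into an (n_frames, 122) matrix."""
    return np.stack([frame_descriptor(s) for s in frames_scores])

# Toy example: 3 frames with 2, 1 and 0 upper-body detections respectively.
rng = np.random.default_rng(0)
frames = [[rng.uniform(size=NUM_POSELETS) for _ in range(2)],
          [rng.uniform(size=NUM_POSELETS)],
          []]
index = build_index(frames)
print(index.shape)  # (3, 122)
```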

25. Pose Search: Retrieval
Given a query image, build its bag-of-deep-poselets descriptor, search through the database using the cosine distance, and return the retrieved results.
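
A minimal cosine-similarity search over descriptors such as those indexed above (ranking by largest cosine similarity, i.e. smallest cosine distance); the function names and sizes are illustrative.

```python
import numpy as np

def cosine_search(query_descriptor, index, top_k=5):
    """Rank database frames by cosine similarity to the query descriptor.

    query_descriptor: (d,) bag-of-deep-poselets vector of the query.
    index: (n_frames, d) matrix of database descriptors.
    Returns the indices of the top_k most similar frames.
    """
    q = query_descriptor / (np.linalg.norm(query_descriptor) + 1e-12)
    db = index / (np.linalg.norm(index, axis=1, keepdims=True) + 1e-12)
    similarity = db @ q                      # cosine similarity per frame
    return np.argsort(-similarity)[:top_k]   # highest similarity first

# Toy usage with a random index of 100 frames and a 122-D query.
rng = np.random.default_rng(0)
index = rng.uniform(size=(100, 122))
query = rng.uniform(size=122)
print(cosine_search(query, index, top_k=5))
```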

26. Pose Search: Results
Experimental setup:
• Database: the test data of size 5440 is used as the database.
• Queries: all samples in the test data are used as queries.
• Evaluation metric: mean average precision (MAP).
Methods compared against:
• Bag of visual words (BOVW): detect SIFT → k-means (K = 1000) → vector quantization.
• Berkeley Poselets (BPL): run poselets → bag of parts.
• Human pose estimation (HPE) [1]: run a human pose estimation algorithm and concatenate (sin(x), cos(x)) of all the body-part angles.

Results:
Method     MAP
BOVW      14.2
BPL       15.3
HPE [1]   17.5
Ours      34.6

[1] Y. Yang and D. Ramanan. "Articulated pose estimation with flexible mixtures-of-parts." In CVPR, 2011.
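
For reference, a standard way to compute the mean-average-precision metric used above (this is the usual information-retrieval definition, not code from the paper).

```python
import numpy as np

def average_precision(relevant, ranked_ids):
    """AP for one query: mean of precision@k over the ranks k at which a
    relevant item is retrieved.  `relevant` is a set of ground-truth item
    ids, `ranked_ids` the retrieved ids in rank order."""
    hits, precisions = 0, []
    for k, item in enumerate(ranked_ids, start=1):
        if item in relevant:
            hits += 1
            precisions.append(hits / k)
    return float(np.mean(precisions)) if precisions else 0.0

def mean_average_precision(queries):
    """`queries` is a list of (relevant_set, ranked_ids) pairs."""
    return float(np.mean([average_precision(r, ids) for r, ids in queries]))

# Toy example with two queries.
q1 = ({1, 3}, [3, 2, 1, 4])   # AP = (1/1 + 2/3) / 2
q2 = ({5},    [4, 5])         # AP = 1/2
print(mean_average_precision([q1, q2]))
```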

27. Pose Search: Results
Comparison with the state of the art (histogram of per-query average precision: percentage of queries vs. average precision):
• HPE [1]: MAP 17.5; 75% of queries have AP < 20%, 5% of queries have AP > 50%.
• Ours: MAP 34.6; 45% of queries have AP < 20%, 25% of queries have AP > 50%.

28. Pose Search: Analysis
HPE:
• Pose detection algorithms often commit to a wrong pose.
• Pose search systems based on them therefore perform poorly.
Ours:
• The bag-of-poselets descriptor encodes multiple proposals weighted by their likelihood (e.g., detections with scores 0.3, 0.7 and 0.2 in the figure).
• Hence it can recover when some of the detections are wrong.

29. Pose Search: Results
Qualitative example, AP: 59.4. Query image with its precision-recall curve; retrieved results shown at ranks 1, 5, 10, 15, 20 and 25.

30. Pose Search: Results
Qualitative example, AP: 44.5. Query image with its precision-recall curve; retrieved results shown at ranks 1, 5, 10, 15, 20 and 25.

31. Pose Search: Results
Qualitative example, AP: 40.3. Query image with its precision-recall curve; retrieved results shown at ranks 1, 5, 10, 15, 20 and 25.

32. Summary
• We propose a novel deep-poselet-based method for human pose search.
• Our deep-poselet method outperforms HOG-based poselets by 25% MAP.
• Our pose retrieval method improves on the current state-of-the-art system by 17% MAP.

33. Thank you. Questions?
