semantic spaces for zero shot behaviour analysis


  1. Semantic Spaces for Zero-Shot Behaviour Analysis. Xun Xu, Computer Vision and Interactive Media Lab, NUS, Singapore

  2. Collaborators: Prof. Shaogang Gong, Dr. Timothy Hospedales

  3. Outline • Background • Transductive Zero-Shot Action Recognition • Multi-Task Zero-Shot Embedding • Zero-Shot Crowd Analysis

  4. Video Behaviour • Defined as visually distinguishable activities, covering: • Human Actions • Crowd Behaviour

  5. Human Actions • Activities performed by an individual, or interactions among multiple people. Soomro, K., et al. “UCF101: A Dataset of 101 human actions classes from videos in the wild.” 2012

  6. Human Action Tasks • Action Recognition, e.g. classifying videos as Eye Makeup, Rafting, Swimming, Fencing, Diving or Archery

  7. Human Action Tasks • Action Detection (Retrieval): given the query “Swimming”, return a ranked list of videos, with less relevant videos ranked lower

  8. Crowd Behaviour • A group of people acting collectively. Shao, J., et al. “Deeply learned attributes for crowded scene understanding.” CVPR 2015

  9. Crowd Behaviour Tasks • Crowd Behaviour Profiling

  10. Crowd Behaviour Tasks • Crowd Anomaly Detection. Hassner, T., et al. “Violent flows: Real-time detection of violent crowd behavior.” CVPR 2012

  11. Potential Applications • Human-Computer Interaction • Surveillance • Video Sharing

  12. Outline • Background • Transductive Zero-Shot Action Recognition • Multi-Task Zero-Shot Embedding • Zero-Shot Crowd Analysis

  13. Motivation • Ever-increasing number of categories for action recognition (timeline figure, 2004 to 2015): Weizmann (9 classes), KTH (6 classes), Olympic Sports (16 classes), HMDB51 (51 classes), UCF101 (101 classes), and 203 classes by 2015

  14. Motivation • Ever-increasing number of categories (same timeline as slide 13) • Limitations: • Collecting training data is expensive • Annotating video is costly

  15. Zero-Shot Learning (ZSL) • Can we use videos from known classes to help predict the labels of videos from unknown classes? (Figure: known and unknown classes, with the examples Shot-Put, Hammer Throw and Discus Throw.)

  16. Attribute Semantic Space • Attribute-based: classes such as Hammer Throw and Discus Throw are described by attributes (Throw Away, Outdoor, Turn Around, Ball, Bend)

  17. Attribute Semantic Space • Attribute-based: the attributes of each class, including Shot-put, are known a priori

  18. Attribute Semantic Space • Attribute-based: a test video is described by the same attributes and compared with the classes Hammer Throw, Discus Throw and Shot-put
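
A minimal sketch of attribute-based zero-shot classification follows. It assumes that each class is described by a binary attribute signature. It also assumes that an attribute classifier (not shown) outputs one probability per attribute for the test video. The signatures are hypothetical illustrations built on the slide's attribute names. The likelihood score is one common choice in the style of direct attribute prediction, and is not necessarily the method behind this slide.

```python
import numpy as np

# Attribute vocabulary taken from the slide.
ATTRIBUTES = ["throw_away", "outdoor", "turn_around", "ball", "bend"]

# Hypothetical class-attribute signatures (1 = class has the attribute).
CLASS_SIGNATURES = {
    "hammer_throw": np.array([1, 1, 1, 1, 0]),
    "discus_throw": np.array([1, 1, 1, 0, 0]),
    "shot_put": np.array([1, 1, 0, 1, 1]),
}


def classify_by_attributes(attr_probs, candidate_classes):
    """Return the candidate class whose attribute signature has the highest
    log-likelihood under independent per-attribute probabilities."""
    assert len(attr_probs) == len(ATTRIBUTES)
    p = np.clip(attr_probs, 1e-9, 1 - 1e-9)  # avoid log(0)
    scores = {}
    for name in candidate_classes:
        a = CLASS_SIGNATURES[name]
        # log P(signature | predicted attribute probabilities)
        scores[name] = float(np.sum(a * np.log(p) + (1 - a) * np.log(1 - p)))
    return max(scores, key=scores.get)


probs = np.array([0.9, 0.8, 0.2, 0.9, 0.7])  # hypothetical classifier outputs
print(classify_by_attributes(probs, ["shot_put", "discus_throw"]))  # prints shot_put
```

Restricting `candidate_classes` to the unseen classes is what makes the prediction zero-shot. No Shot-put training video is needed, only its attribute signature.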

  19. Attribute Semantic Space • Limitations: • Ontological problem • Manually labelling attributes is costly for videos • Incompatible with other attribute sets

  20. Word-Vector Semantic Space • A mapping $z = f(x)$ projects a video from the feature space X into the word-vector space Z, in which each class name has a vector, e.g. Discus Throw = [0.2 0.5 0.1 …] and Hammer Throw = [0.1 0.6 0.1 …]

  21. Word-Vector Semantic Space • Further class names, such as Shot-Put = [0.3 0.4 0.2 …], are embedded in the same space as Discus Throw = [0.2 0.5 0.1 …] and Hammer Throw = [0.1 0.6 0.1 …]

  22. Semantic Word-Vector • The skip-gram model learns word vectors by predicting the adjacent words within a window of size $c$, over a corpus of $T$ words: $\max_{\{z\}} \frac{1}{T}\sum_{t=1}^{T}\sum_{-c \le j \le c,\, j \ne 0} \log p(z_{t+j} \mid z_t)$, with $p(z_i \mid z_j) = \frac{\exp(z_i^\top z_j)}{\sum_{i'} \exp(z_{i'}^\top z_j)}$. Result of this optimization: vec(“ball”) = [-0.004 0.01 0.01 -0.03 0.05], vec(“sword”) = [0.16 0.06 0.09 -0.06 -0.002], vec(“archery”) = [0.02 0.01 0.02 -0.03 -0.03], vec(“boxing”) = [-0.08 -0.01 0.15 -0.01 0.09]. Mikolov, T., et al. “Distributed representations of words and phrases and their compositionality.” NIPS 2013. Pennington, J., et al. “GloVe: Global vectors for word representation.” EMNLP 2014.
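
The skip-gram objective can be made concrete with a small sketch. It follows the slide's simplified form, with a single vector per word and a full softmax over the vocabulary. Practical word2vec training uses separate input and output vectors with negative sampling or hierarchical softmax, so treat this as an illustration of the formula rather than a training recipe. The toy matrix `Z` (one row per vocabulary word) and the integer-coded word sequence are assumptions.

```python
import numpy as np


def context_prob(Z, i, j):
    """p(z_i | z_j) = exp(z_i^T z_j) / sum_i' exp(z_i'^T z_j)."""
    logits = Z @ Z[j]               # z_i'^T z_j for every vocabulary word i'
    logits = logits - logits.max()  # shift for numerical stability (softmax unchanged)
    probs = np.exp(logits)
    return probs[i] / probs.sum()


def skipgram_objective(Z, sequence, c):
    """(1/T) sum_t sum_{-c<=j<=c, j!=0} log p(z_{t+j} | z_t) for word ids `sequence`."""
    T = len(sequence)
    total = 0.0
    for t in range(T):
        for j in range(-c, c + 1):
            # Skip the centre word itself and context positions outside the sequence.
            if j == 0 or not 0 <= t + j < T:
                continue
            total += np.log(context_prob(Z, sequence[t + j], sequence[t]))
    return total / T
```

Learning the word vectors amounts to maximizing `skipgram_objective` with respect to `Z`, for example by gradient ascent.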

  23. Benefits • Geometrically meaningful word-vector space: related words lie closer together (Run and Walk, cat and dog), while unrelated words are far away (ship)

  24. Benefits • Unsupervised semantic space: word vectors are learned from unannotated text, without manual labelling

  25. Benefits • Wide coverage of words: vec(“Apple”) = [0.2 0.3 0.1 …], vec(“Bear”) = [0.1 0.9 0.1 …], vec(“Car”) = [0.6 0.2 0.4 …], vec(“Desk”) = [0.2 0.8 0.4 …], vec(“Fish”) = [0.5 0.2 0.3 …], …

  26. Benefits • Uniform across datasets: the same class name has the same vector in Dataset 1 and Dataset 2, e.g. Discus Throw = [0.2 0.5 …] and Hammer Throw = [0.1 0.2 …] in both

  27. Challenges • Domain Shift (figure: Discus Throw, Hammer Throw, Sword Exercise and Play Guitar shown in the feature space X and the semantic vector space Y)

  28. Challenges • Domain Shift: a mapping learned on known classes can project videos of unknown classes to ambiguous positions in the semantic vector space Y, causing confusion between classes such as Discus Throw and Hammer Throw

  29. Our Solution. Xu, X., et al. “Transductive Zero-Shot Action Recognition by Word-Vector Embedding.” IJCV 2017


  31. Low-Level Visual Feature • Improved Trajectory Features are used as the visual feature x. Wang, H. and Schmid, C. “Action recognition with improved trajectories.” ICCV 2013

  32. Our Solution

  33. Combinations of Multiple Words • A phrase vector is constructed from single word vectors by additive composition: vec(“Apply Eye Makeup”) = vec(“Apply”) + vec(“Eye”) + vec(“Makeup”); vec(“Brushing Teeth”) = vec(“Brushing”) + vec(“Teeth”); vec(“Playing Guitar”) = vec(“Playing”) + vec(“Guitar”)
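
Once word vectors are available, additive composition is a one-line sum. In this sketch the `WORD_VECTORS` table is a hypothetical three-dimensional stand-in for pre-trained embeddings. The class name is assumed to be already split into space-separated words.

```python
import numpy as np

# Hypothetical toy word vectors; a real system would load pre-trained
# word2vec or GloVe embeddings instead.
WORD_VECTORS = {
    "apply": np.array([0.1, 0.0, 0.2]),
    "eye": np.array([0.0, 0.3, 0.1]),
    "makeup": np.array([0.2, 0.1, 0.0]),
    "brushing": np.array([0.1, 0.1, 0.1]),
    "teeth": np.array([0.0, 0.2, 0.3]),
}


def phrase_vector(phrase, vectors=WORD_VECTORS):
    """Additive composition: sum the vectors of the words in the phrase."""
    words = phrase.lower().split()
    return np.sum([vectors[w] for w in words], axis=0)
```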

  34. Our Solution

  35. Visual to Semantic Mapping by Regularized Linear Regression • Multi-dimensional regularized linear regression: $\min_W \sum_i \|z_i - Wx_i\|_2^2 + \lambda\|W\|_F^2$, where the sum runs over the training videos, $x_i$ is the N-dimensional visual feature and $z_i$ is the D-dimensional word vector of its class name, so W maps the feature space to the semantic space
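
The regression on this slide has a closed-form solution. Stack the training features as rows of `X` and the class-name word vectors as rows of `Z`. Setting the gradient to zero then gives $W^\top = (X^\top X + \lambda I)^{-1} X^\top Z$. The sketch below assumes that row layout and names the regularization weight `lam`. Here `d_x` and `d_z` play the roles of N and D on the slide.

```python
import numpy as np


def fit_ridge_mapping(X, Z, lam):
    """Closed-form minimizer of sum_i ||z_i - W x_i||^2 + lam * ||W||_F^2.

    X: (n, d_x) visual features, one row per training video.
    Z: (n, d_z) word vectors of the corresponding class names.
    Returns W of shape (d_z, d_x), so that z is approximated by W @ x.
    """
    d_x = X.shape[1]
    A = X.T @ X + lam * np.eye(d_x)       # positive definite for lam > 0
    return np.linalg.solve(A, X.T @ Z).T  # W^T = A^{-1} X^T Z
```

Solving the linear system is cheaper and numerically safer than forming the inverse explicitly.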

  36. Domain Shift – Semi-Supervised (Manifold Regularized) Regression • Semi-supervised regression is applied to tackle domain shift by taking the test data distribution into account. A kNN graph over the target training data $X^{trg}_{tr}$ and the target test data $X^{trg}_{te}$ in the feature space X models the data manifold. It gives the manifold regularizer $\sum_{ij} \omega_{ij} \|f(x_i) - f(x_j)\|_2^2$, $x \in [X^{trg}_{tr}; X^{trg}_{te}]$, where $\omega_{ij}$ is the graph weight

  37. Domain Shift – Semi-Supervised (Manifold Regularized) Regression • With $f(x) = Wx$, the manifold regularizer is added to the regression: $\min_W \sum_i \|z_i - Wx_i\|_2^2 + \lambda\|W\|_F^2 + \gamma\sum_{ij} \omega_{ij} \|Wx_i - Wx_j\|_2^2$. The first sum runs over the labelled target training data. The graph term, with kNN weights $\omega_{ij}$, runs over both $X^{trg}_{tr}$ and $X^{trg}_{te}$
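
A sketch of the manifold-regularized regression, under three assumptions. The weights $\omega_{ij}$ are binary, symmetrized kNN weights under Euclidean distance; the slide does not specify the weighting. The two regularizers have weights `lam` and `gamma`. The graph Laplacian is $L = \mathrm{diag}(\omega\mathbf{1}) - \omega$. With this Laplacian the graph term equals $2\,\mathrm{tr}(W X_{all}^\top L X_{all} W^\top)$, so the minimizer is still closed form.

```python
import numpy as np


def knn_graph(X, k):
    """Symmetric binary kNN weights: omega_ij = 1 if j is among i's k nearest
    neighbours (Euclidean) or i is among j's. Binary weights are an assumption."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T  # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                  # a point is not its own neighbour
    nn = np.argsort(d2, axis=1)[:, :k]
    omega = np.zeros_like(d2)
    rows = np.repeat(np.arange(X.shape[0]), k)
    omega[rows, nn.ravel()] = 1.0
    return np.maximum(omega, omega.T)


def fit_manifold_mapping(X_tr, Z_tr, X_te, lam, gamma, k=5):
    """min_W sum_i ||z_i - W x_i||^2 + lam ||W||_F^2
             + gamma * sum_ij omega_ij ||W x_i - W x_j||^2,
    where the graph term covers training AND unlabelled test features."""
    X_all = np.vstack([X_tr, X_te])
    omega = knn_graph(X_all, k)
    L = np.diag(omega.sum(axis=1)) - omega        # graph Laplacian
    A = (X_tr.T @ X_tr + lam * np.eye(X_tr.shape[1])
         + 2 * gamma * X_all.T @ L @ X_all)
    return np.linalg.solve(A, X_tr.T @ Z_tr).T    # W of shape (d_z, d_x)
```

The unlabelled test features enter only through the Laplacian term, which is what makes the fit transductive.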

  38. Our Solution • Additional datasets are available

  39. Data Augmentation • Use additional training data from an auxiliary dataset to learn a more robust regression. The augmented training data in feature space is $X_{tr} = [X^{trg}_{tr}; X^{aux}]$ and the test data is $X_{te} = X^{trg}_{te}$. Here $X^{trg}_{tr}$ is the target dataset's training data (e.g. HMDB51), $X^{aux}$ is the auxiliary dataset's data (e.g. UCF101) and $X^{trg}_{te}$ is the target test data
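
Data augmentation changes only which rows enter the regression. In this sketch `fit_ridge_mapping` is the closed-form ridge solution and `fit_with_auxiliary` is a hypothetical helper. Both assume that the auxiliary videos use the same visual features as the target videos. They also assume that the auxiliary class names are embedded in the same word-vector space as the target class names.

```python
import numpy as np


def fit_ridge_mapping(X, Z, lam):
    """Closed-form ridge regression from features X (n, d_x) to word vectors Z (n, d_z)."""
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ Z).T


def fit_with_auxiliary(X_trg_tr, Z_trg_tr, X_aux, Z_aux, lam):
    """Stack target training rows and auxiliary rows, i.e. X_tr = [X_trg_tr; X_aux],
    then learn the visual-to-semantic mapping on the augmented set."""
    X_tr = np.vstack([X_trg_tr, X_aux])
    Z_tr = np.vstack([Z_trg_tr, Z_aux])
    return fit_ridge_mapping(X_tr, Z_tr, lam)
```

With very few target training videos, the target data alone can leave W under-determined. Stacking the auxiliary rows restores a well-posed fit.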

  40. Semantic Word Vector Approach

  41. Zero-Shot Recognition by Nearest Neighbor • Perform a nearest-neighbour search in the word-vector space to predict the category of test data. Each test video instance is projected by W and assigned the category name at minimal distance (e.g. HulaHoop, Fencing, Basketball, Diving, Kayaking, Rafting, TaiChi)
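
The nearest-neighbour step can be sketched as follows. The sketch assumes a learned mapping `W`, for example from the regularized regression on slide 35. It also assumes Euclidean distance and a matrix `class_vectors` holding one word vector per unseen category name. Cosine distance is a common alternative.

```python
import numpy as np


def zero_shot_predict(W, X_te, class_names, class_vectors):
    """Project test videos into word-vector space (z = W x) and assign each
    one to the category whose name vector is nearest in Euclidean distance."""
    Z_te = X_te @ W.T  # (n_te, d_z) projected test instances
    # Squared distances between every projected instance and every class vector.
    d2 = ((Z_te[:, None, :] - class_vectors[None, :, :]) ** 2).sum(axis=2)
    return [class_names[i] for i in d2.argmin(axis=1)]
```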

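  42. Domain Shift – Self-Training • Self-training is applied to tackle domain shift. Each test video instance is projected as $z^{te} = f(x^{te})$, and each category name gives a prototype from its word vector, e.g. $Z(\text{"Taichi"}) = g(\text{"Taichi"})$. The prototype is replaced by the mean of its K nearest projected test instances: $Z^*(\text{"Taichi"}) = \frac{1}{K}\sum_{z^{te} \in NN(Z(\text{"Taichi"}), K)} z^{te}$, where $NN(Z_{proto}, K)$ is the kNN function. Example with K = 5: $Z^*(\text{"Taichi"}) = \frac{1}{5}(z_4 + z_5 + z_6 + z_7 + z_8)$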
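
Self-training can be sketched as a single prototype-update pass. The sketch assumes Euclidean nearest neighbours and NumPy arrays for the projected test instances `Z_te` (one row per test video) and the `prototypes` (one word vector per unseen class). The adapted prototypes then replace the raw word vectors in nearest-neighbour classification.

```python
import numpy as np


def self_train_prototypes(Z_te, prototypes, K):
    """Z*(c) = (1/K) * sum of the K projected test instances nearest to Z(c)."""
    # Squared distances from every prototype to every projected test instance.
    d2 = ((prototypes[:, None, :] - Z_te[None, :, :]) ** 2).sum(axis=2)  # (n_cls, n_te)
    nn = np.argsort(d2, axis=1)[:, :K]  # indices of the K nearest instances per class
    return Z_te[nn].mean(axis=1)        # (n_cls, d_z) adapted prototypes
```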

