Towards a Fair Evaluation of Zero-Shot Action Recognition using External Data
Alina Roitberg, Manuel Martinez, Monica Haurilet, Rainer Stiefelhagen
Computer Vision for Human-Computer Interaction (cvhci.anthropomatik.kit.edu)
Karlsruhe Institute of Technology, Germany
Workshop on Shortcomings in Vision and Language, ECCV 2018 (European Conference on Computer Vision 2018)
KIT – The Research University in the Helmholtz Association (www.kit.edu)
Introduction: Zero-Shot Action Recognition
Task: classifying actions without any training data
How? Linking visual and semantic features
A. Roitberg, M. Martinez, M. Haurilet and R. Stiefelhagen, 08.09.2018, Workshop on Shortcomings in Vision and Language, Towards a Fair Evaluation of Zero-Shot Action Recognition using External Data, European Conference on Computer Vision 2018
Zero-Shot Learning premise: source and target classes are disjoint!
What is the origin of the source dataset?
Standard: intra-dataset, same origin of training and test data
Supervised Action Recognition (AR): classifying the already known actions, $T \subset S$
Zero-Shot AR, intra-dataset: source from the same domain, $S = S_{native}$, $T \cap S_{native} = \emptyset$; ZSL premise satisfied
($T$: target classes, $S$: source classes)
Cross-dataset: utilize large-scale external data sources
Zero-Shot AR, cross-dataset (Zhu et al., 2018): source from a different domain, $S = S_{ext}$; boost in accuracy, but $T \cap S_{ext} \neq \emptyset$, so the ZSL premise is not given
Zero-Shot AR, hybrid (ours): source from native and external domains, $S = S_{ext} \cup S_{native}$; accuracy lower-bounded by the intra- and cross-dataset regimes
Our corrective protocol eliminates source-target synonyms: ZSL premise ✔
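The three regimes above differ only in how the source label set is assembled, so the ZSL premise can be checked with plain set operations. The following is a minimal sketch, not the authors' code; the label sets and the helper name `zsl_premise_holds` are hypothetical toy values chosen for illustration.

```python
def zsl_premise_holds(source, target):
    """Return True if the source and target label sets are disjoint."""
    return set(source).isdisjoint(target)

# Hypothetical toy label sets (not taken from any real dataset split).
target = {"brush hair", "drink"}                 # T
s_native = {"eat", "wave"}                       # S_native
s_ext = {"brush hair", "drink beer", "climb"}    # S_ext

# Source set S under each regime.
regimes = {
    "intra-dataset": s_native,           # S = S_native
    "cross-dataset": s_ext,              # S = S_ext
    "hybrid": s_ext | s_native,          # S = S_ext ∪ S_native
}

premise = {name: zsl_premise_holds(src, target) for name, src in regimes.items()}
for name, ok in premise.items():
    print(f"{name}: ZSL premise {'satisfied' if ok else 'violated'}")
```

This check only catches identical label strings. Synonyms and specializations such as "drink beer" pass it unnoticed, which is why a semantic similarity measure is needed on top.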
Exploring the semantic source-target similarity
A dataset does not contain the same action twice: no label synonyms
External datasets intersect with datasets for zero-shot AR!
Example: brushing hair in ActivityNet, Kinetics and HMDB-51
Specializations: drinking beer vs. drinking
Getting rid of the direct matches is not enough!
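To make the last point concrete, the sketch below removes exact label matches from an external source and then shows that a specialization of a target class is still present. The word-containment test is a deliberately crude stand-in chosen for this illustration, not the similarity measure used in the talk, and the label lists are hypothetical toy values.

```python
def remove_exact_matches(source, target):
    """Drop source labels that are string-identical to a target label."""
    target_set = {t.lower() for t in target}
    return [a for a in source if a.lower() not in target_set]

def contains_target_words(source_label, target_label):
    """Crude stand-in for semantic similarity: every word of the target
    label also occurs in the source label (an assumption of this sketch)."""
    return set(target_label.lower().split()) <= set(source_label.lower().split())

# Hypothetical toy labels.
target = ["drinking", "brushing hair"]
external = ["brushing hair", "drinking beer", "rock climbing"]

remaining = remove_exact_matches(external, target)
leftover = [a for a in remaining
            if any(contains_target_words(a, t) for t in target)]

print(remaining)  # labels that survive exact-match removal
print(leftover)   # survivors that are still specializations of a target
```

Here "drinking beer" survives the exact-match filter although it is a specialization of the target "drinking".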
Semantic similarity between the source and target classes is much higher for the external datasets
The accuracy is greatly influenced by the presence of analogue classes
Need for a method to constrain the external datasets
Corrective protocol for fair cross-dataset transfer
1) Calculate the maximum intra-dataset similarity as our rejection threshold $s_{th}$:
$s_{th} = \max_{a_k \in S_{intra},\, t_m \in T} s(\omega(a_k), \omega(t_m))$
2) Filter out the source category if the label is too similar; a source category $a_k$ is kept only if
$\forall t_m \in T:\ s(\omega(a_k), \omega(t_m)) \leq s_{th}$
Notation
$S = \{a_k\}_{k=1}^{K}$: source action classes
$T = \{t_m\}_{m=1}^{M}$: target classes
$\omega(\cdot)$: label mapping to the semantic space, e.g. word2vec
$s_{th}$: threshold similarity
$s(\cdot, \cdot)$: similarity measure
Figure: Proportion of allowed source labels for different maximum similarity thresholds
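The two steps of the protocol can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' implementation: cosine similarity is assumed for the similarity measure, and the hand-written 3-D vectors in `embed` are hypothetical stand-ins for word2vec label embeddings. The function names and labels are likewise made up for the example.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity, assumed here as the similarity measure s."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def rejection_threshold(intra_source, targets, embed):
    """Step 1: s_th = maximum similarity between any intra-dataset source
    label and any target label."""
    return max(cosine(embed[a], embed[t]) for a in intra_source for t in targets)

def filter_external(external_source, targets, embed, s_th):
    """Step 2: keep an external label only if its similarity to every
    target label is at most s_th."""
    return [a for a in external_source
            if all(cosine(embed[a], embed[t]) <= s_th for t in targets)]

# Hypothetical toy embeddings standing in for word2vec vectors.
embed = {
    "drink":      np.array([1.0, 0.0, 0.0]),
    "brush hair": np.array([0.0, 1.0, 0.0]),
    "eat":        np.array([0.6, 0.0, 0.8]),
    "wave":       np.array([0.0, 0.0, 1.0]),
    "drink beer": np.array([0.9, 0.0, 0.4]),
    "climb":      np.array([0.3, 0.3, 0.9]),
}

targets = ["drink", "brush hair"]                  # T
intra_source = ["eat", "wave"]                     # S_intra
external = ["brush hair", "drink beer", "climb"]   # external source labels

s_th = rejection_threshold(intra_source, targets, embed)
allowed = filter_external(external, targets, embed, s_th)

print(f"s_th = {s_th:.3f}")
print("allowed external labels:", allowed)
```

With these toy vectors the exact match "brush hair" and the specialization "drink beer" both exceed the threshold and are rejected, while "climb" is kept. With real embeddings the threshold depends on the chosen dataset split.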
Summary
We empirically show that external sources tend to have actions excessively similar to the target classes, strongly influencing the performance and violating the ZSL premise
We propose an evaluation procedure that enables fair use of external data for zero-shot action recognition
We propose the hybrid ZSL regime, which uses the available training data of the source domain and the large-scale external datasets, improving ZSL performance
Thank you for your attention!
Come and see our poster "Towards a Fair Evaluation of Zero-Shot Action Recognition using External Data"
Alina Roitberg, Manuel Martinez, Monica Haurilet, Rainer Stiefelhagen