European Conference on Computer Vision 2018 Towards a Fair Evaluation of Zero-Shot Action Recognition using External Data Alina Roitberg, Manuel Martinez, Monica Haurilet, Rainer Stiefelhagen Computer Vision for Human-Computer Interaction

SLIDE 1

KIT – The Research University in the Helmholtz Association

Computer Vision for Human-Computer Interaction Karlsruhe Institute of Technology, Germany

www.kit.edu

Towards a Fair Evaluation of Zero-Shot Action Recognition using External Data

Alina Roitberg, Manuel Martinez, Monica Haurilet, Rainer Stiefelhagen

European Conference on Computer Vision 2018

Workshop on Shortcomings in Vision and Language, ECCV 2018

cvhci.anthropomatik.kit.edu

SLIDE 2

Workshop on Shortcomings in Vision and Language, European Conference on Computer Vision 2018

A. Roitberg, M. Martinez, M. Haurilet and R. Stiefelhagen

Towards a Fair Evaluation of Zero-Shot Action Recognition using External Data

Introduction: Zero-Shot Action Recognition

Task: classifying actions for which no training data is available
How? By linking visual and semantic features

08.09.2018

SLIDE 4

Introduction: Zero-Shot Action Recognition

Zero-Shot Learning premise: source and target classes are disjoint!

SLIDE 5

What is the origin of the source dataset?

Zero-Shot Learning premise: source and target classes are disjoint!

Origin?

SLIDE 6

What is the origin of the source dataset?

Standard: intra-dataset, training and test data have the same origin

Supervised Action Recognition (AR)
Classifying the already known actions: $T \subset S$
(T: target classes, S: source classes)

Zero-Shot AR, intra-dataset
Source from the same domain: $S = S_{\mathrm{native}}$
$T \cap S_{\mathrm{native}} = \emptyset$ → ZSL premise satisfied

SLIDE 7

What is the origin of the source dataset?

Standard: intra-dataset, training and test data have the same origin
Cross-dataset: use large-scale external data sources

Zero-Shot AR, cross-dataset (Zhu et al., 2018)
Source from a different domain: $S = S_{\mathrm{ext}}$
Boost in accuracy
$T \cap S_{\mathrm{ext}} \neq \emptyset$ → ZSL premise not satisfied

SLIDE 8

What is the origin of the source dataset?

Zero-Shot AR, hybrid (ours)
Source from native and external domains: $S = S_{\mathrm{ext}} \cup S_{\mathrm{native}}$
Accuracy is lower-bounded by the intra- and cross-dataset regimes
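
The three regimes differ only in which label set serves as the source. A minimal Python sketch, assuming made-up label sets (the labels and the helper name `zsl_premise_holds` are hypothetical, not taken from the actual dataset vocabularies), checks the premise $T \cap S = \emptyset$ by exact label match for each regime:

```python
# Toy label sets; hypothetical examples, not the real dataset vocabularies.
target = {"brushing hair", "drinking"}                        # T: unseen test classes
source_native = {"climbing", "running"}                       # S_native: same dataset as T
source_ext = {"brushing hair", "drinking beer", "surfing"}    # S_ext: external dataset

def zsl_premise_holds(source, target):
    """Return True if no target label also appears as a source label."""
    return source.isdisjoint(target)

regimes = {
    "intra-dataset": source_native,
    "cross-dataset": source_ext,
    "hybrid": source_native | source_ext,  # union of native and external sources
}
premise = {name: zsl_premise_holds(src, target) for name, src in regimes.items()}
for name, ok in premise.items():
    print(f"{name}: ZSL premise {'satisfied' if ok else 'violated'}")
```

Exact matching only catches the duplicated label "brushing hair"; a specialization such as "drinking beer" passes unnoticed, which is why a check on semantic similarity is needed in addition.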

SLIDE 9

What is the origin of the source dataset?

Our corrective protocol eliminates source-target synonyms → ZSL premise satisfied

SLIDE 10

Exploring the semantic source-target similarity

A dataset does not contain the same action twice → no label synonyms within a dataset
External datasets intersect with datasets for zero-shot AR!
Example: brushing hair in ActivityNet, Kinetics and HMDB-51
Specializations: drinking beer vs. drinking
→ Getting rid of the direct matches is not enough!

Semantic similarity between the source and target classes is much higher for the external datasets

SLIDE 11

Exploring the semantic source-target similarity

Semantic similarity between the source and target classes is much higher for the external datasets
The accuracy is greatly influenced by the presence of analogous classes
→ Need for a method to constrain the external datasets

SLIDE 12

Corrective protocol for fair cross-dataset transfer

1) Calculate the maximum intra-dataset similarity as our rejection threshold $s_{th}$:
$s_{th} = \max_{a_k \in S_{\mathrm{intra}},\ t_m \in T} s(\omega(a_k), \omega(t_m))$

2) Filter out the source category if the label is too similar; a source class $a_k$ is kept only if:
$\forall t_m \in T:\ s(\omega(a_k), \omega(t_m)) \leq s_{th}$

[Figure: proportion of allowed source labels for different maximum similarity thresholds]

Notation
$S = \{a_k\}_{k=1}^{K}$: source action classes
$T = \{t_m\}_{m=1}^{M}$: target classes
$\omega(\cdot)$: label mapping to the semantic space, e.g. word2vec
$s(\cdot, \cdot)$: similarity measure
$s_{th}$: threshold similarity
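
The two steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: cosine similarity stands in for $s$, tiny hand-made 2-D vectors stand in for word2vec embeddings $\omega(\cdot)$, and the names `cosine`, `rejection_threshold`, and `filter_sources` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity; an assumed choice for the similarity measure s."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def rejection_threshold(intra_sources, targets, emb):
    """Step 1: s_th is the maximum similarity between any intra-dataset
    source label and any target label."""
    return max(cosine(emb[a], emb[t]) for a in intra_sources for t in targets)

def filter_sources(ext_sources, targets, emb, s_th):
    """Step 2: keep an external source label only if its similarity to
    every target label is at most s_th."""
    return [a for a in ext_sources
            if all(cosine(emb[a], emb[t]) <= s_th for t in targets)]

# Toy stand-ins for word embeddings (hypothetical values).
emb = {
    "drinking":      [1.0, 0.0],   # target class
    "eating":        [0.8, 0.6],   # intra-dataset source class
    "climbing":      [0.0, 1.0],   # intra-dataset source class
    "drinking beer": [0.99, 0.1],  # external source, near-synonym of the target
    "surfing":       [0.1, 1.0],   # external source, unrelated to the target
}
targets = ["drinking"]
s_th = rejection_threshold(["eating", "climbing"], targets, emb)
allowed = filter_sources(["drinking beer", "surfing"], targets, emb, s_th)
print(f"s_th = {s_th:.2f}, allowed external sources: {allowed}")
```

In this toy setup the threshold is set by the closest native pair ("eating" vs. "drinking"), so the near-synonym "drinking beer" is rejected while "surfing" is kept.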

SLIDE 13

Summary

We empirically show that external sources tend to have actions excessively similar to the target classes, strongly influencing the performance and violating the ZSL premise
We propose an evaluation procedure that enables fair use of external data for zero-shot action recognition
We propose the hybrid ZSL regime, which uses the available training data of the source domain and the large-scale external datasets, improving ZSL performance

SLIDE 14

Come and see our poster! Thank you for your attention!

“Towards a Fair Evaluation of Zero-Shot Action Recognition using External Data”

Alina Roitberg, Manuel Martinez, Monica Haurilet, Rainer Stiefelhagen
