Co-Representation Network for Generalized Zero-Shot Learning
Fei Zhang, Guangming Shi
Xidian University
ICML 2019
Introduction
➢ Classic Deep CNN: predict from large amounts of labeled data.
➢ Transfer Learning (data requirements decrease):
• Few-Shot Learning
• One-Shot Learning
• Zero-Shot Learning (ZSL): predict target-space (unseen) classes from source-space (seen) classes via a shared semantic space (attributes such as "legs" and "fur", or word2vecs).
➢ Conventional ZSL (CZSL) vs. Generalized ZSL (GZSL): in CZSL the test set contains only unseen classes; in GZSL it contains both seen and unseen classes.
Bias Problem
➢ Existing embedding models for GZSL project:
• Visual space to semantic space
• Visual & semantic space to a latent space
• Semantic space to visual space
[Figure: average per-class top-1 accuracy in % on unseen classes of DAP, CONSE, SSE, LATEM, ALE, DEVISE, SJE, SYNC, SAE and GFZSL under CZSL vs. GZSL settings; accuracies of roughly 44-68% under CZSL collapse to under 17% under GZSL.]
➢ Bias Problem: unseen samples are easily classified into similar seen classes, e.g. Zebra → Horse.
Xian, Yongqin, et al. "Zero-Shot Learning - A Comprehensive Evaluation of the Good, the Bad and the Ugly." IEEE TPAMI 2017.
Our Model
➢ Co-Representation Network (CRnet)
1. A cooperation module f for visual feature representation (our main contribution): K expert modules f_1, ..., f_K map the semantic input to outputs O_1, ..., O_K, which are combined into a feature anchor.
2. A pre-trained CNN (ResNet-101) for image feature extraction.
3. A relation module g for similarity output, i.e. the classification: the feature anchor is concatenated with the image feature and scored against each class (e.g. Horse, Zebra, Panda, Tiger); the whole network is trained by back-propagation.
(Sung, Flood, et al. "Learning to Compare: Relation Network for Few-Shot Learning." CVPR 2018.)
Algorithm
➢ Initialization
• Perform K-means clustering on the semantic space (the semantic vectors), obtaining K clustering centers.
• Expert module k: a single-layer perceptron f_k, one per cluster.
➢ Cooperation Module
• Sum the outputs of the K expert modules to obtain the feature anchor in the visual embedding space.
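The initialization and cooperation steps above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the ReLU nonlinearity, the weight scales, and the plain K-means routine are assumptions.

```python
import numpy as np

def kmeans(X, K, iters=50, seed=0):
    """Plain K-means on the semantic vectors; the slide uses clustering
    on the semantic space to set up the K expert modules."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), K, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for k in range(K):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return centers, labels

class CooperationModule:
    """Sketch of the cooperation module: K single-layer perceptron
    'experts' whose outputs are summed into one feature anchor."""
    def __init__(self, sem_dim, vis_dim, K, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((K, vis_dim, sem_dim)) * 0.1
        self.b = np.zeros((K, vis_dim))

    def __call__(self, a):
        # a: (sem_dim,) semantic vector -> (vis_dim,) feature anchor
        outs = np.maximum(self.W @ a + self.b, 0.0)  # one output per expert
        return outs.sum(axis=0)                      # cooperation = sum
```

The sum (rather than, say, a gating mixture) is what lets every expert contribute to every anchor while still specializing on its own region of the semantic space.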
Algorithm
➢ Relation Module
• Concatenate the feature anchor (output of the cooperation module) and the image feature v as the input.
• A two-layer perceptron with a Sigmoid output.
➢ Training
• Objective function: regress the similarity output to the ground truth (1 for a matching image-class pair, 0 otherwise).
• Trained in an end-to-end manner.
• When the model converges, the cooperation module divides the semantic space into several parts; semantic vectors located in different parts are projected by different expert modules.
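A minimal sketch of the relation module and its training target, assuming a ReLU hidden layer and the MSE regression objective used by the Relation Network paper the slides cite; the hidden size and weight initialization are illustrative choices:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RelationModule:
    """Sketch of the relation module: concatenate the feature anchor with
    the image feature and score the pair with a two-layer perceptron
    ending in a sigmoid."""
    def __init__(self, vis_dim, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.standard_normal((hidden, 2 * vis_dim)) * 0.1
        self.b1 = np.zeros(hidden)
        self.W2 = rng.standard_normal(hidden) * 0.1
        self.b2 = 0.0

    def __call__(self, anchor, v):
        x = np.concatenate([anchor, v])             # pair input
        h = np.maximum(self.W1 @ x + self.b1, 0.0)  # hidden layer (ReLU)
        return sigmoid(self.W2 @ h + self.b2)       # similarity in (0, 1)

def mse_loss(score, target):
    """Ground truth: 1 for a matching (image, class) pair, 0 otherwise."""
    return (score - target) ** 2
```

At test time an image is assigned to the class whose feature anchor yields the highest similarity score.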
Benchmark Results
Analysis
➢ Bias Problem
• Unseen anchors distribute too close to seen anchors in the embedding space used for classification.
➢ Local Relative Distance (LRD)
• We propose LRD as a metric for the bias problem: a larger LRD means a more uniform embedding space, i.e. a slighter bias problem.
• Illustration: mapping a 1-D semantic space S to a 1-D visual embedding space V (f_G: general fitting curve, serious bias; f_CR: fitting curve of CRnet, slight bias).
• High local linearity results in a larger LRD.
• The cooperation module actually learns a piecewise linear function of K+1 pieces with high local linearity.
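The piecewise-linearity claim can be checked numerically in the 1-D setting of the illustration. Assuming each expert is a single-layer ReLU perceptron (an illustrative choice, not a detail taken from the paper), the sum of K experts has at most K breakpoints, i.e. at most K + 1 linear pieces:

```python
import numpy as np

rng = np.random.default_rng(0)
K = 5
w = rng.standard_normal(K)
b = rng.standard_normal(K)

def f_cr(s):
    """Sum of K single-layer ReLU 'experts' applied to a scalar input s."""
    return np.maximum(w * s + b, 0.0).sum()

# Each expert kinks where its pre-activation crosses zero: s = -b_k / w_k.
# The sum therefore has at most K breakpoints -> at most K + 1 pieces.
kinks = np.sort(-b / w)

# Verify linearity inside each piece: the midpoint value must equal the
# chord midpoint of two symmetric interior samples.
edges = np.concatenate([[kinks[0] - 1.0], kinks, [kinks[-1] + 1.0]])
for lo, hi in zip(edges[:-1], edges[1:]):
    a = lo + 0.25 * (hi - lo)
    m = lo + 0.50 * (hi - lo)
    c = lo + 0.75 * (hi - lo)
    assert abs(f_cr(m) - 0.5 * (f_cr(a) + f_cr(c))) < 1e-8
```

Within each piece the map is exactly linear, which is the "high local linearity" the slide connects to a larger LRD.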
Contrast Experiments
➢ Relation Network (RN)
• A two-layer perceptron is used instead of the cooperation module to embed the semantic vectors; the CNN feature extraction, concatenation, and relation module are kept the same as in CRnet.
(Sung, Flood, et al. "Learning to Compare: Relation Network for Few-Shot Learning." CVPR 2018.)
Contrast Experiments
➢ Results: compared with RN, CRnet achieves:
• A slighter bias problem.
• More sparse and discriminative features.
• A more uniform embedding space (larger LRD).
[Figure: bar chart of per-class Bias Rate and per-class Error Rate of RN and CRnet on AwA2. Bias Rate: the rate in % of misclassification into the closest seen class; Error Rate: per-class classification error rate in %.]
Summary
➢ Co-Representation Network
• A decomposition method for projecting the semantic space to the visual embedding space.
• A cooperation module for representation and a learnable relation module for classification.
Advantages:
✓ Trained in an end-to-end manner.
✓ A slighter bias problem leads to good performance on GZSL.
✓ Simple structure with high expandability.
✓ No need for semantic information of unseen classes during training (compared with generative models).
Email: f.zhang@stu.xidian.edu.cn