

  1. One-shot Learning in Semantic Embedding and Data Augmentation. Yanwei Fu, School of Data Science, Fudan University. yanweifu@fudan.edu.cn http://yanweifu.github.io

  2. One-shot Learning: "learning object categories from just a few images, by incorporating 'generic' knowledge which may be obtained from previously learnt models of unrelated categories." Fei-Fei et al. A Bayesian Approach to Unsupervised One-Shot Learning of Object Categories. ICCV 2003. Fei-Fei et al. One-Shot Learning of Object Categories. IEEE TPAMI 2006.

  3. One-shot Learning by Semantic Embedding. Fu, Y.; Hospedales, T.; Xiang, T.; Gong, S. "Attribute Learning for Understanding Unstructured Social Activity", ECCV 2012. Fu, Y.; Hospedales, T.; Xiang, T.; Gong, S. "Learning Multi-modal Latent Attributes", IEEE TPAMI 2014. Fu et al. Semi-supervised Vocabulary-informed Learning, CVPR 2016 (oral). Fu et al. Vocabulary-informed Zero-shot and Open-set Learning, IEEE TPAMI, to appear.

  4. Attribute Learning Pipeline (figure: mule, lion, horse, zebra; attributes: stripes, tails). Lampert, C. H. Learning to detect unseen object classes by between-class attribute transfer. CVPR 2009.
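The attribute-transfer idea above can be sketched numerically: per-attribute classifiers (trained on seen classes) predict attribute probabilities for a test image, and each unseen class is scored by how well its class-level attribute signature matches those predictions. All probabilities and signatures below are illustrative toy values, not taken from the cited paper.

```python
import numpy as np

# Toy per-attribute probabilities predicted for one test image by
# attribute classifiers trained on *seen* classes (illustrative numbers).
p_attr = np.array([0.9, 0.8, 0.1])  # P(stripes|x), P(tail|x), P(mane|x)

# Binary attribute signatures of *unseen* classes (toy class-level annotation).
class_names = ["zebra", "horse", "lion"]
signatures = np.array([
    [1, 1, 0],  # zebra: stripes, tail
    [0, 1, 1],  # horse: tail, mane
    [0, 0, 1],  # lion: mane
])

# DAP-style scoring: P(c|x) proportional to the product over attributes of
# P(a = a_c | x), taking 1 - p for attributes the class does not have.
scores = np.prod(np.where(signatures == 1, p_attr, 1.0 - p_attr), axis=1)
prediction = class_names[int(np.argmax(scores))]
print(prediction)  # zebra: its signature best matches the predicted attributes
```

No image data or training is needed for the unseen classes; only their attribute signatures, which is what makes zero/one-shot transfer possible.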

  5. Semantic Attributes in Zero/One-shot Learning. Fu, Y.; Hospedales, T.; Xiang, T.; Gong, S. "Attribute Learning for Understanding Unstructured Social Activity", ECCV 2012. Fu, Y.; Hospedales, T.; Xiang, T.; Gong, S. "Learning Multi-modal Latent Attributes", IEEE TPAMI 2014.

  6. Learning Multi-modal Latent Attributes. Fu, Y.; Hospedales, T.; Xiang, T.; Gong, S. "Attribute Learning for Understanding Unstructured Social Activity", ECCV 2012. Fu, Y.; Hospedales, T.; Xiang, T.; Gong, S. "Learning Multi-modal Latent Attributes", IEEE TPAMI 2014.

  7. Experimental Settings. Datasets & settings: • USAA dataset (4 source classes, 4 target classes, multiple rounds of class splits); • Animals with Attributes (AwA) dataset (40 source classes, 10 target classes). Comparisons: • Direct: KNN/SVM from features to classes; • DAP: Direct Attribute Prediction [Lampert et al. CVPR 2009]; • SVM-UD: an SVM generalization of DAP; • SCA: topic models in [Wang et al. CVPR 2009]; • ST: Synthetic Transfer in [Yu et al. ECCV 2010].

  8. Unstructured Social Activity Dataset (USAA). Classes: music performance, non-music performance, wedding ceremony, wedding dance, wedding reception, parade, birthday party, graduation.

  9. One-shot Learning Results For more results, please check our papers.

  10. Vocabulary-informed Learning. Fu et al. Semi-supervised Vocabulary-informed Learning, CVPR 2016 (oral). Fu et al. Vocabulary-informed Zero-shot and Open-set Learning, IEEE TPAMI, to appear.

  11. Supervised Learning Semantic labels Visual feature space airplane car unicycle tricycle

  12. One-shot Learning Semantic labels Visual feature space airplane car unicycle tricycle

  13. Zero/One-shot Learning by Semantic Embedding (Problem Definition). Semantic labels; visual feature space. Zero/one-shot learning: we have zero or one visually labeled instance showing what the target classes (e.g., bicycle, truck) look like.

  14. Learning Semantic labels Visual feature space airplane unicycle bicycle bicycle tricycle car truck truck

  15. Inference airplane unicycle bicycle bicycle tricycle car truck truck Key Question: How do we define semantic space?

  16. Semantic Label Vector Spaces. • Semantic attributes (manual annotation; supervised): good interpretability of each dimension; limited vocabulary. • Semantic word vectors (e.g., word2vec; unsupervised): good vector representations for a vocabulary of millions of words; limited interpretability of each dimension.
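Classification in a word-vector semantic space reduces to nearest-neighbor search among class-name embeddings. The sketch below uses random vectors as stand-ins; a real system would load pretrained word2vec or GloVe embeddings, and the embedded test point is placed near "truck" purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
dim = 6

# Toy "word vectors" for class names (stand-ins for word2vec/GloVe entries).
word_vec = {c: rng.normal(size=dim) for c in ["bicycle", "truck", "car"]}

# A test image embedded into the semantic space, here near "truck" by design.
x = word_vec["truck"] + 0.05 * rng.normal(size=dim)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Predict the class whose word vector is most similar to the embedded image.
pred = max(word_vec, key=lambda c: cosine(x, word_vec[c]))
print(pred)
```

Because any word with a vector can serve as a candidate label, this is what lets the vocabulary extend far beyond the annotated training classes.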

  17. Vocabulary-Informed Recognition Image unicycle tricycle Fu et al. Semi-supervised Vocabulary-informed learning, CVPR 2016 (Oral)

  18. Estimating the Density of Classes in the Space. The knowledge of the margin distribution of instances, rather than a single margin across all instances, is crucial for improving the generalization performance of a classifier. Instance margin: the distance between an instance and the separating hyperplane. By the Extreme Value Theorem, the distribution of the minimal margin distances is characterized by a Weibull distribution. Margin distribution of prototypes: the margin distribution of prototypes in the semantic space. Coverage distribution of prototypes: the probability that h(y) is included in the boundary estimated by h(y_j). Fu et al. Vocabulary-informed Zero-shot and Open-set Learning. IEEE TPAMI, to appear.
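The extreme-value argument above can be sketched with SciPy: collect per-block minimal margin distances, fit a Weibull distribution to them, and read off the probability that a given margin falls inside the estimated boundary. The gamma-distributed toy margins and the threshold value are assumptions for illustration only.

```python
import numpy as np
from scipy.stats import weibull_min

rng = np.random.default_rng(0)

# Toy margin distances of instances to a prototype in the semantic space.
margins = rng.gamma(shape=2.0, scale=1.0, size=(2000, 50))

# Extreme Value Theory: block minima of such margins follow a Weibull-family
# law, so fit a Weibull to the per-block minimal margin distances.
min_margins = margins.min(axis=1)
shape, loc, scale = weibull_min.fit(min_margins, floc=0.0)

# Probability that a point with margin m lies inside the estimated boundary,
# read from the CDF of the fitted minimal-margin distribution.
m = 0.5
p_inside = weibull_min.cdf(m, shape, loc=loc, scale=scale)
print(round(p_inside, 3))
```

Fitting the distribution of minima, rather than using a single global margin, is what yields the per-class coverage estimate described on the slide.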

  19. Experimental Datasets and Tasks. Datasets: • AwA dataset; • ImageNet 2012/2010 dataset. By learning a semantic embedding we can address the following tasks: • supervised recognition; • zero-shot recognition; • generalized zero-shot recognition; • one-shot recognition; • open-set recognition.

  20. Experimental Settings of Few-shot Learning. • Learning classifiers from few source training instances: one-shot recognition on source classes; zero-shot recognition on target classes. • Key insight: leveraging knowledge from the semantic space (vocabulary-informed). • Few-shot target training instances: a few-shot setting consistent with the general definition.

  21. Results on Few-shot Learning (few shots on the source dataset).

  22. Results on Few-shot Learning

  23. One-shot Learning by Data Augmentation. One-shot learning aims to learn information about object categories from one, or only a few, training images. Approaches: meta-learning; data augmentation; meta-augmentation learning.

  24. Multi-level Semantic Feature Augmentation for One-shot Learning. Zitian Chen, Yanwei Fu, Yinda Zhang, Yu-Gang Jiang, Xiangyang Xue, and Leonid Sigal. IEEE Transactions on Image Processing (TIP) 2019.

  25. Motivation. • A straightforward way to tackle one-shot learning is data augmentation. • We want to utilize the semantic space. • Related concepts in the semantic space help learning. (Figure: image feature space and semantic feature space; classes such as killer whale, orca, whale, sea lion, mountain goat, hartebeest, antelopes, pronghorn, muskrat, beaver, badger, woodchuck.)

  26. Method. (Figure: image feature space and semantic feature space, linked by the mappings g(y) and h(y); classes as on the motivation slide.)
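The encode-perturb-decode augmentation can be sketched as follows. The linear maps standing in for g (visual to semantic) and h (semantic to visual) are random placeholders; in the paper both are learned networks, and the noise scale is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
d_vis, d_sem = 16, 8

# Placeholder linear encoder g: visual -> semantic, and decoder h: semantic ->
# visual. Random matrices stand in for the learned networks.
G = rng.normal(size=(d_sem, d_vis))
H = rng.normal(size=(d_vis, d_sem))

def augment(x, n_aug=5, noise=0.1):
    """Encode a visual feature, jitter it in semantic space, decode back."""
    z = G @ x                                            # g(x): to semantic space
    z_aug = z + noise * rng.normal(size=(n_aug, d_sem))  # semantic neighbours
    return z_aug @ H.T                                   # h(z'): back to visual

x = rng.normal(size=d_vis)
features = augment(x)
print(features.shape)  # (5, 16): five synthetic visual features from one shot
```

Perturbing in the semantic space rather than the visual space is what lets related concepts (nearby class embeddings) shape the synthesized features.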

  27. Single-level. • But we want to utilize visual concepts at different levels.

  28. Multi-level. • Use high-level and low-level features to help encode. • Decode the semantic feature into features at different levels to diversify the augmented features.

  29. Visualization

  30. Image Deformation Meta-Networks for One-Shot Learning Zitian Chen, Yanwei Fu, Yu-Xiong Wang, Lin Ma, Wei Liu, Martial Hebert

  31. The Basic Idea of Jigsaw Augmentation Method Image Block Augmentation for One-Shot Learning. Zitian Chen, Yanwei Fu, Kaiyu Chen, Yu-Gang Jiang. AAAI 2019

  32. Visual contents from other images may be helpful to synthesize new images

  33. Deformed image types: stitched, ghosted, partially occluded, montaged. Humans can learn novel visual concepts even when images undergo various deformations.

  34. Deformed Images Visual contents from other images might be helpful

  35. Approach

  36. Motivation: 1. Visual contents from other images may be helpful to synthesize new images. 2. Humans can learn novel visual concepts even when images undergo various deformations. Approach: we design a deformation sub-network that learns to deform images by fusing a pair of images: a probe image that keeps the visual content and a gallery image that diversifies the deformations.

  37. Architecture diagram: the deformation sub-network (branches ANET and BNET) concatenates the probe image with a visually similar gallery image; the result is passed to the embedding sub-network.
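The probe-gallery fusion can be sketched at the array level: patch-wise weights blend a probe image (which keeps the content) with a gallery image (which supplies the deformation). Here the weights are random stand-ins; in the paper they are predicted by the deformation sub-network, and the image and patch sizes are toy values.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy "images": probe keeps the content, gallery diversifies the deformation.
probe = rng.random((6, 6, 3))
gallery = rng.random((6, 6, 3))

# One fusing weight per 3x3 patch (random stand-ins for network predictions).
w = rng.random((2, 2))

deformed = probe.copy()
for i in range(2):
    for j in range(2):
        blk = (slice(3 * i, 3 * i + 3), slice(3 * j, 3 * j + 3))
        # Convex combination per patch: w keeps probe content, 1-w mixes in
        # gallery content to deform the image.
        deformed[blk] = w[i, j] * probe[blk] + (1 - w[i, j]) * gallery[blk]

print(deformed.shape)  # (6, 6, 3)
```

Because each patch is a convex combination, the deformed image stays in the valid pixel range while varying in a structured, jigsaw-like way.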

  38. Top-1 accuracies (%) on miniImagenet (bar chart, axis 50-75%): our method outperforms the baseline in both the 1-shot and 5-shot settings.

  39. Qualitative comparison (Gaussian vs. ours): real probe image, deformed image, real image.

  40. NeurIPS 2019

  41. Falcon Hawk source: https://birdeden.com/distinguishing-between-hawks-falcons

  42. Fine-grained Visual Recognition. • Much harder than ordinary classification. • Data are difficult to collect: crowdsourcing cannot be used; expert annotators are needed. • This demands one-shot learning.

  43. Can we generate more data? • How about state-of-the-art GANs? • Challenge: GAN training itself needs a lot of data.

  44. Our Idea: Fine-tune GANs trained on ImageNet. Transfer generative knowledge from one million general images (BigGAN, latent z) to a single domain-specific image (latent z).

  45. Fine-tune BigGAN with a single image Generated Original

  46. Technical Point: Fine-tune Batch Norm Only. (Figure panels: original; fine-tune all; fine-tune BatchNorm only.)
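The batch-norm-only fine-tuning pattern can be sketched in PyTorch. The small `nn.Sequential` below is only a stand-in for a pretrained BigGAN generator (which would be loaded from a checkpoint); the point is the parameter-freezing pattern, not the architecture.

```python
import torch
import torch.nn as nn

# Toy generator standing in for a pretrained BigGAN (illustration only).
gen = nn.Sequential(
    nn.Linear(8, 16), nn.BatchNorm1d(16), nn.ReLU(),
    nn.Linear(16, 32), nn.BatchNorm1d(32), nn.ReLU(),
)

# Freeze everything, then unfreeze only the batch-norm affine parameters.
for p in gen.parameters():
    p.requires_grad = False
for m in gen.modules():
    if isinstance(m, nn.BatchNorm1d):
        for p in m.parameters():
            p.requires_grad = True

# Optimize only the unfrozen (BatchNorm weight and bias) tensors.
trainable = [p for p in gen.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-4)
print(len(trainable))  # 4: weight and bias of each of the two BN layers
```

Restricting the update to the few BatchNorm affine parameters is what keeps fine-tuning on a single image from destroying the pretrained generative knowledge.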

  47. Our idea: Meta-Augmentation Learning. Learning to reinforce with the original image. Original: J; Generated: H(J); Fused: x·J + (1 − x)·H(J), where the fusing weight x is produced by the image fusion net F. Use meta-learning to learn the best mixing strategy to help one-shot classifiers.
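The fusion rule x·J + (1 − x)·H(J) is a per-pixel convex combination and can be sketched directly. The random weight map below is an assumed stand-in for the output of the fusion net F, and the images are toy arrays.

```python
import numpy as np

rng = np.random.default_rng(3)

original = rng.random((4, 4, 3))   # J: the single real image
generated = rng.random((4, 4, 3))  # H(J): GAN sample conditioned on J

# Pixel-wise fusing weight x (stand-in for the fusion net F's prediction),
# broadcast across the channel axis.
x = rng.random((4, 4, 1))

# Fused image: x * J + (1 - x) * H(J)
fused = x * original + (1 - x) * generated
print(fused.shape)  # (4, 4, 3)
```

Meta-learning then adjusts how x is produced so that the fused samples best improve the downstream one-shot classifier.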

  48. Examples

  49. Our method achieves consistent improvements.

  50. Embodied One-Shot Video Recognition: Learning from Actions of a Virtual Embodied Agent Yuqian Fu, Chengrong Wang, Yanwei Fu, Yu-Xiong Wang, Cong Bai, Xiangyang Xue, Yu-Gang Jiang ACM Multimedia 2019
