Learning from Limited Data
Tatsuya Harada, The Univ. of Tokyo / RIKEN AIP
GTC, March 29, 2018


  1. Learning from Limited Data
  GTC, March 29, 2018
  Tatsuya Harada, The Univ. of Tokyo / RIKEN AIP

  2. Contents
  - Background
    - Deep Learning (DL) is one of the most successful machine learning methods.
    - DL generally requires a huge amount of annotated data.
    - Annotation cost is very expensive.
  - Challenge
    - Obtaining high-quality deep neural networks from limited data.
  - Topics
    - A learning method for supervised learning from limited data.
    - Unsupervised domain adaptation using classifier discrepancy.

  3. Learning from Limited Data: Between-class Learning
  Yuji Tokozume, Yoshitaka Ushiku, Tatsuya Harada
  - "Learning from Between-class Examples for Deep Sound Recognition", to appear, ICLR 2018.
  - "Between-class Learning for Image Classification", to appear, CVPR 2018.
  Code: https://github.com/mil-tokyo/bc_learning_sound
  https://github.com/mil-tokyo/bc_learning_image

  4. Standard Supervised Learning
  1. Select one example from the training dataset (random selection and augmentation).
  2. Train the model to output 1 for the corresponding class and 0 for the other classes.
  [Figure: the model maps a selected input, e.g. a dog image, to the one-hot label Dog 1, Cat 0, Bird 0.]

  5. Between-class (BC) Learning
  Proposed method:
  1. Select two training examples from different classes (random selection and augmentation).
  2. Mix those examples with a random ratio.
  3. Train the model to output the mixing ratio for the mixed classes, e.g. Dog 0.7, Cat 0.3, Bird 0, using a KL-divergence loss.
  At test time, a single example is input to the network.
  Merits:
  - Generates infinite training data from limited data.
  - Learns a more discriminative feature space than standard learning.
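The three steps above can be sketched as follows. This is a minimal illustration, not the paper's exact recipe: the three-class (Dog, Cat, Bird) setup and the simple linear mixing of inputs are assumptions made here for clarity.

```python
import numpy as np

def bc_mix(x1, y1, x2, y2, num_classes=3, rng=None):
    """Mix two examples from different classes with a random ratio r.

    x1, x2: input arrays; y1, y2: integer class indices.
    Returns the mixed input and the soft label (r for class y1, 1-r for y2).
    """
    rng = rng if rng is not None else np.random.default_rng()
    r = rng.uniform(0.0, 1.0)
    x = r * x1 + (1.0 - r) * x2      # simple linear mixing of the inputs
    t = np.zeros(num_classes)        # mixed label, e.g. Dog 0.7, Cat 0.3, Bird 0
    t[y1] += r
    t[y2] += 1.0 - r
    return x, t

def kl_loss(t, p, eps=1e-12):
    """KL divergence between the mixed target t and the model output p."""
    return float(np.sum(t * (np.log(t + eps) - np.log(p + eps))))
```

At test time a single, unmixed example is fed to the network, exactly as in standard learning.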

  6. BC Learning for Sounds
  - Select two training examples from different classes, e.g. a dog sound y_1 and a cat sound y_2, and mix them with a random ratio r.
  - The mixed label interpolates between the pure labels: (Dog: 1, Cat: 0, Bird: 0) and (Dog: 0, Cat: 1, Bird: 0) become (Dog: r, Cat: 1-r, Bird: 0).
  - H_1, H_2: sound pressure levels of y_1, y_2 [dB], used so that the perceived ratio of the two sounds matches the label ratio.
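A sketch of gain-aware sound mixing in this spirit: the mixing coefficient is adjusted using the dB gap between H_1 and H_2 so that the perceived ratio is r : (1-r). The exact coefficient formula below is a reconstruction and should be treated as an assumption.

```python
import numpy as np

def bc_mix_sound(y1, y2, h1, h2, r):
    """Mix two waveforms so their perceptual ratio is r : (1 - r).

    y1, y2: waveforms; h1, h2: sound pressure levels [dB]
    (H_1, H_2 on the slide). The constants here are an assumption.
    """
    # Convert the dB gap into an amplitude ratio, then solve for the
    # coefficient p that makes the perceived mixture ratio equal r.
    p = 1.0 / (1.0 + 10.0 ** ((h1 - h2) / 20.0) * (1.0 - r) / r)
    # Normalize so the mixture's energy stays comparable to the inputs.
    return (p * y1 + (1.0 - p) * y2) / np.sqrt(p ** 2 + (1.0 - p) ** 2)
```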

  7. Results of Sound Recognition
  1. Works on various models.
  2. Works on various datasets.
  3. Compatible with strong data augmentation.
  4. Surpasses the human level.
  BC learning improves recognition performance for any sound network it is applied to.

  8. BC Learning for Images
  - Proposal 1: treat images as waveforms and mix two images with a random ratio, e.g. Dog 1.0 and Cat 1.0 mixed into Dog 0.5, Cat 0.5.
  - Proposal 2 (BC+): an image has a static component (its mean) and a wave component; if CNNs treat input data as waveforms, the static component would not be important, or could even have a bad effect, so BC+ removes it before mixing.
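Proposal 2 (BC+) can be sketched as follows: each image's mean (static component) is removed and the remaining wave components are mixed with a variance-aware coefficient. The specific coefficient formula is a reconstruction, not a verbatim quote of the paper.

```python
import numpy as np

def bc_plus_mix(x1, x2, r):
    """BC+ mixing: remove each image's static component (its mean) and
    mix the remaining wave components so the perceived ratio is r : (1-r).
    Reconstruction of the slide's Proposal 2; treat constants as assumptions.
    """
    m1, m2 = x1.mean(), x2.mean()
    s1, s2 = x1.std(), x2.std()
    # Coefficient chosen so the waveform "energy" ratio matches r : (1 - r).
    p = 1.0 / (1.0 + (s1 / s2) * (1.0 - r) / r)
    # Mix the zero-mean wave components and renormalize the mixture.
    return (p * (x1 - m1) + (1.0 - p) * (x2 - m2)) / np.sqrt(p ** 2 + (1.0 - p) ** 2)
```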

  9. Results on CIFAR
  Our preliminary results were presented at ILSVRC 2017 on July 26, 2017.

  10. Results on ImageNet-1K
  Our preliminary results were presented at ILSVRC 2017 on July 26, 2017.
  Top-1 / top-5 validation error (%):
  - 100 epochs: Standard 20.4 / 5.3 [28]; BC (ours) 19.92 / 4.91
  - 150 epochs: Standard 20.44 / 5.25; BC (ours) 19.43 / 4.80
  BC learning yields around a 1% gain in top-1 error.

  11. How BC Learning Works
  - Less discriminative feature space: the class A, class B, and mixed rA+(1-r)B distributions overlap; Fisher's criterion is small, so the BC learning loss is large.
  - More discriminative feature space: no overlap among the distributions; Fisher's criterion is large, so the BC learning loss is small.

  12. How BC Learning Works
  In classification, the class distributions must be uncorrelated because the teaching signal is discrete.
  - Large correlation among classes: the mixture rA+(1-r)B of classes A and B may be classified into a third class C near the decision boundary, so the BC learning loss is large.
  - Small correlation among classes: the mixture of A and B is not classified into class C, so the BC learning loss is small.

  13. Visualization using PCA
  Activations of the 10th layer of an 11-layer CNN trained on CIFAR-10:
  - Standard learning: Fisher's criterion 1.76
  - BC learning (ours): Fisher's criterion 1.97
  With BC learning, the distributions are more compact than those from standard learning, they are spherical, and Fisher's criterion is larger than that of standard learning.
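Fisher's criterion, used above to compare the two feature spaces, can be computed for two classes of 1-D projected activations as the squared distance between class means over the sum of class variances. This is one common two-class form; the slide does not state which exact variant was used.

```python
import numpy as np

def fisher_criterion(a, b):
    """Two-class Fisher's criterion for 1-D projected activations:
    between-class separation divided by within-class scatter.
    Larger values mean more discriminative features."""
    return (a.mean() - b.mean()) ** 2 / (a.var() + b.var())
```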

  14. Learning from Limited Data: Unsupervised Domain Adaptation using Classifier Discrepancy
  - "Adversarial Dropout Regularization", Kuniaki Saito, Yoshitaka Ushiku, Tatsuya Harada, Kate Saenko. To appear, ICLR 2018.
  - "Maximum Classifier Discrepancy for Unsupervised Domain Adaptation", Kuniaki Saito, Kohei Watanabe, Yoshitaka Ushiku, Tatsuya Harada. To appear, CVPR 2018 (oral presentation).

  15. Domain Adaptation (DA)
  - Problems
    - A supervised learning model needs many labeled examples.
    - Collecting them in various domains is costly.
  - Goal
    - Transfer knowledge from the source domain to the target domain.
    - Obtain a classifier that works well on the target domain.
  - Unsupervised Domain Adaptation (UDA)
    - Labeled examples are given only in the source domain.
    - There are no labeled examples in the target domain.
  [Figure: source domain: synthetic images, labeled; target domain: real images, unlabeled.]

  16. Related Work
  - Distribution-matching methods: match the distributions of source and target features.
    - Domain classifier (GAN-based) [Ganin et al., 2015]
    - Maximum Mean Discrepancy [Long et al., 2015]
  - Problems
    - Features are aligned just by looking at hidden features.
    - The relationship between the decision boundary and target examples is not considered.
    - These methods only consider the whole distribution.
  [Figure: a feature extractor feeds a domain classifier and a category classifier; before and after adaptation, source (labeled) and target (unlabeled) features are shown around the decision boundary.]

  17. Proposed Approach
  - Consider class-specific distributions.
  - Use the decision boundary to align the distributions.
  [Figure: previous work aligns the whole source and target distributions; the proposed method aligns them class by class (class A, class B) with respect to the decision boundary.]

  18. Key Idea
  - Maximize the discrepancy by learning the two classifiers F1 and F2.
  - Minimize the discrepancy by learning the feature space.
  The discrepancy region consists of the target examples that get different predictions from the two classifiers.
  [Figure: alternating steps on source and target features, maximizing the discrepancy via the classifiers and minimizing it via the feature space.]

  19. Network Architecture and Training Loss
  The input goes through a feature generator G into two classifiers F1 and F2, trained with source classification losses L1 and L2. D denotes the discrepancy between the two classifiers' predictions on target examples.
  Algorithm:
  1. Fix the generator G, and find classifiers F1, F2 that maximize D - (L1 + L2).
  2. Find G, F1, F2 that minimize L1 + L2 (minimize the classification error on the source).
  3. For k = 1 to n: fix the classifiers F1, F2, and find a feature generator G that minimizes D.
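A minimal sketch of the discrepancy D as the L1 distance between the two classifiers' class probabilities on target examples, in plain NumPy. In practice, each of the three steps above is one optimizer update in an autodiff framework, ascending or descending this quantity with respect to the classifier or generator parameters as indicated.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax over class logits."""
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def discrepancy(logits1, logits2):
    """D: mean absolute difference between the class probabilities
    produced by classifiers F1 and F2 on the same target batch."""
    return float(np.abs(softmax(logits1) - softmax(logits2)).mean())
```

When the two classifiers agree on every target example, D is zero; target examples near the decision boundary drive D up, which is what the adversarial steps exploit.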

  20. Why Does the Discrepancy Method Work Well?
  Domain adaptation theory bounds the expected error of a hypothesis in the target domain by its expected error in the source domain, plus a discrepancy term between the two domains measured over the hypothesis class, plus the shared error of the ideal joint hypothesis. Minimizing the classifier discrepancy reduces the middle term of this bound.

  21. Object Classification
  - Synthetic images to real images (12 classes).
  - Fine-tune a ResNet-101 [He et al., CVPR 2016] pre-trained on ImageNet.
  - Source: synthetic images; Target: real images.

  22. Semantic Segmentation
  - Simulated images (GTA5) to real images (CityScapes).
  - Fine-tune VGG and a Dilated Residual Network [Yu et al., 2017] pre-trained on ImageNet.
  - Calculate the discrepancy pixel-wise.
  - Evaluate with mean IoU (TP / (TP + FP + FN)).
  [Figure: per-class IoU on CityScapes (road, sidewalk, building, wall, fence, pole, light, sign, vegetation, terrain, sky, person, rider, car, truck, bus, train, motorcycle, bicycle); ours outperforms the source-only baseline.]
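The mean IoU metric used above can be sketched directly from its definition, IoU_c = TP / (TP + FP + FN) averaged over classes; skipping classes absent from both prediction and ground truth is a common convention assumed here.

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean intersection-over-union across classes.

    pred, gt: integer label maps of the same shape.
    Classes absent from both prediction and ground truth are skipped.
    """
    ious = []
    for c in range(num_classes):
        tp = np.sum((pred == c) & (gt == c))   # correctly labeled pixels
        fp = np.sum((pred == c) & (gt != c))   # predicted c, but not c
        fn = np.sum((pred != c) & (gt == c))   # missed pixels of class c
        if tp + fp + fn > 0:
            ious.append(tp / (tp + fp + fn))
    return float(np.mean(ious))
```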

  23. Qualitative Results
  [Figure: RGB input, ground truth, source-only prediction, and adapted prediction (ours).]

  24. Take-Home Messages
  - Between-class (BC) learning
    - Mix two training examples with a random ratio.
    - Train the model to output the mixing ratio.
    - Simple and easy to implement.
    - Can be introduced independently of other techniques: network architectures, data augmentation schemes, optimizers, etc.
  - Unsupervised domain adaptation
    - Unsupervised domain adaptation using classifier discrepancy is effective.
