CS 4803 / 7643: Deep Learning
Topics: Moving beyond supervised learning
Zsolt Kira, Georgia Tech
Administrivia
• Projects!
  – Due April 30th
  – Template online
  – Can use MS Word, but follow the organization/rubric!
• No posters/presentations
Project Note
• Important note:
  – Your project should go beyond just downloading open-source code, fine-tuning, and showing the result
  – This can include:
    • implementation of additional approaches (even if leveraging open-source code),
    • a thorough analysis/investigation of some phenomenon or hypothesis, or
    • theoretical analysis
  – When using external resources, provide references to anything you used in the write-up!
Supervised Learning
● ML has been focused largely on this
● Lots of other problem settings are now coming up:
  ○ What if we have unlabeled data?
  ○ What if we have many datasets?
  ○ What if we only have one example per (new) class?
But wait, there's more!
• Transfer Learning
• Semi-supervised learning
• One/Few-shot learning
• Un/Self-Supervised Learning
• Domain adaptation
• Meta-Learning
• Zero-shot learning
• Continual / Lifelong learning
• Multi-modal learning
• Multi-task learning
• Active learning
• …

Setting                 Source             Target               Shift Type
Semi-supervised         Single labeled     Single unlabeled     None
Domain Adaptation       Single labeled     Single unlabeled     Non-semantic
Domain Generalization   Multiple labeled   Unknown              Non-semantic
Cross-Task Transfer     Single labeled     Single unlabeled     Semantic
Few-Shot Learning       Single labeled     Single few-labeled   Semantic
Un/Self-Supervised      Single unlabeled   Many labeled         Both/Task
An Entire Class on This!
• Deep Unsupervised Learning class (UC Berkeley)
• Link: https://sites.google.com/view/berkeley-cs294-158-sp20/home
But wait, there's more!
• Transfer Learning
• Semi-supervised learning
• One/Few-shot learning
• Un/Self-Supervised Learning
• Domain adaptation
• Meta-Learning
• Zero-shot learning
• Continual / Lifelong learning
• Multi-modal learning
• Multi-task learning
• Active learning
• …
What is Semi-Supervised Learning?
Slide Credit: Pieter Abbeel et al., CS294-158, UC Berkeley
Semi-Supervised Learning
● Classification: fully supervised
  ○ Training data: (image, label); predict the label for new images
● What if we have a few labeled samples and many unlabeled samples? Labeling is generally time-consuming and expensive in certain domains.
● Semi-Supervised Learning
  ○ Training data: labeled data (image, label) and unlabeled data (image)
  ○ Goal: use the unlabeled data to make supervised learning better
  ○ Note: if we already have lots of labeled data, this goal is much harder
Slide Credit: Pieter Abbeel et al., CS294-158, UC Berkeley
Why Semi-Supervised Learning?
● My take: reality might be in between:
  ○ We might be able to improve upon the high-labeled-data regime, but only with exponentially increasing unlabeled data (of the proper type)
Slide: Thang Luong; Slide Credit: Pieter Abbeel et al., CS294-158, UC Berkeley
Agenda
■ Core concepts: confidence vs. entropy
  ■ Pseudo-labeling
  ■ Entropy minimization
  ■ Virtual Adversarial Training
■ Label consistency
  ■ Make sure augmentations of the sample have the same class
  ■ Pi-Model, Temporal Ensembling, Mean Teacher
■ Regularization
  ■ Weight decay
  ■ Dropout
  ■ Data augmentation (MixUp, CutOut)
■ Unsupervised Data Augmentation (UDA), MixMatch
■ Co-Training / Self-Training / Pseudo-Labeling (Noisy Student)
Slide Credit: Pieter Abbeel et al., CS294-158, UC Berkeley
Pseudo-Labeling
● Simple idea:
  ○ Train on labeled data
  ○ Make predictions on unlabeled data
  ○ Add confident predictions to the training data
  ○ Can do both end-to-end (no need for separate stages); a minimal sketch follows below
Slide Credit: Pieter Abbeel et al., CS294-158, UC Berkeley
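The slide gives pseudo-labeling only as prose; here is a minimal PyTorch sketch of one combined training step, not the course's reference code. The names (`model`, `optimizer`, the batches) and the 0.95 confidence threshold are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def pseudo_label_step(model, optimizer, labeled_batch, unlabeled_batch,
                      threshold=0.95, unlabeled_weight=1.0):
    """One end-to-end pseudo-labeling step (sketch; threshold is illustrative)."""
    x_l, y_l = labeled_batch
    x_u = unlabeled_batch

    # Supervised loss on the labeled data
    loss = F.cross_entropy(model(x_l), y_l)

    # Pseudo-labels: argmax predictions on unlabeled data, kept only if confident
    with torch.no_grad():
        probs = F.softmax(model(x_u), dim=1)
        conf, pseudo_y = probs.max(dim=1)
        mask = conf >= threshold  # only trust confident predictions

    if mask.any():
        loss = loss + unlabeled_weight * F.cross_entropy(model(x_u[mask]),
                                                         pseudo_y[mask])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```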
Issue: Confidences on New Data
● Predictions on unlabeled data may be too flat (high entropy)
● Solution: entropy minimization
● Several ways to achieve this (sketched below):
  ○ An explicit loss
  ○ A sharpening function (e.g., temperature scaling)
Image Credit: Figure modified from the MixMatch paper
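Both options fit in a few lines; a sketch where the default T=0.5 follows the MixMatch paper and everything else is illustrative:

```python
import torch

def entropy_loss(logits):
    """Explicit entropy penalty on unlabeled predictions (sketch).
    Minimizing this pushes the model toward confident (low-entropy) outputs."""
    p = torch.softmax(logits, dim=1)
    return -(p * torch.log(p.clamp_min(1e-8))).sum(dim=1).mean()

def sharpen(p, T=0.5):
    """Temperature sharpening of a probability distribution.
    T < 1 pushes the distribution toward one-hot (lower entropy)."""
    p_t = p ** (1.0 / T)
    return p_t / p_t.sum(dim=1, keepdim=True)
```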
Label Consistency with Data Augmentation
● The input could be unlabeled or labeled
● Make sure that the logits are similar across augmentations of the same sample (a minimal sketch follows below)
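A minimal sketch of this consistency objective (Pi-model style: penalize disagreement between predictions on two stochastic augmentations); `augment` is an assumed stochastic transform such as random crop/flip plus noise:

```python
import torch
import torch.nn.functional as F

def consistency_loss(model, x, augment):
    """Consistency regularization (sketch): two stochastic augmentations of the
    same input should yield similar predictions. Works on unlabeled or labeled x."""
    logits_1 = model(augment(x))
    logits_2 = model(augment(x))
    # MSE between softened predictions; stop gradient through one branch
    return F.mse_loss(torch.softmax(logits_1, dim=1),
                      torch.softmax(logits_2, dim=1).detach())
```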
More Data Augmentation → Regularization
Realistic Evaluation of Semi-Supervised Learning
Outline
■ Realistic Evaluation of Semi-Supervised Learning
■ Pi-Model
■ Temporal Ensembling
■ Mean Teacher
■ Virtual Adversarial Training
Pi-Model: "Temporal Ensembling for Semi-Supervised Learning" (Laine & Aila, 2017)
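Temporal Ensembling keeps an exponential moving average (EMA) of per-example predictions; the Mean Teacher variant (Tarvainen & Valpola, 2017) instead keeps an EMA of the model weights. A sketch of the weight-EMA update, with alpha=0.99 as an illustrative smoothing factor:

```python
import copy
import torch

@torch.no_grad()
def update_teacher(student, teacher, alpha=0.99):
    """Mean-Teacher-style EMA update (sketch): teacher weights track an
    exponential moving average of the student weights."""
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(alpha).add_(s_param, alpha=1 - alpha)

# Usage: the teacher starts as a copy of the student, then is updated
# after every optimizer step; the consistency target comes from the teacher.
# teacher = copy.deepcopy(student)
# ... optimizer.step() ...
# update_teacher(student, teacher)
```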
Comparison
Varying the Number of Labels
Class Distribution Mismatch
MixMatch
● Combines label consistency (averaging over K augmentations), sharpening, and MixUp; the slides walk through the pipeline in figures, and a sketch of two core steps follows below
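The MixMatch slides are figure-based; as a rough sketch of two of its core steps (not the full algorithm), where K=2, T=0.5, and alpha=0.75 follow the paper's defaults and `augment` is an assumed stochastic transform:

```python
import torch

def guess_labels(model, x_u, augment, K=2, T=0.5):
    """MixMatch-style label guessing (sketch): average predictions over K
    augmentations of each unlabeled example, then sharpen with temperature T."""
    with torch.no_grad():
        p = torch.stack([torch.softmax(model(augment(x_u)), dim=1)
                         for _ in range(K)]).mean(dim=0)
        p = p ** (1.0 / T)                     # temperature sharpening
        return p / p.sum(dim=1, keepdim=True)  # renormalize

def mixup(x1, y1, x2, y2, beta_alpha=0.75):
    """MixUp (sketch): convex combination of example pairs and their (soft)
    labels. MixMatch additionally takes lam = max(lam, 1 - lam) so the mixed
    example stays closer to the first argument."""
    lam = torch.distributions.Beta(beta_alpha, beta_alpha).sample()
    lam = torch.max(lam, 1 - lam)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2
```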
FixMatch
FixMatch: Results
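FixMatch combines pseudo-labeling with weak/strong augmentation consistency; a sketch of its unlabeled loss, where threshold=0.95 follows the paper and `weak_aug`/`strong_aug` (e.g., flip/crop vs. RandAugment) are assumed transforms:

```python
import torch
import torch.nn.functional as F

def fixmatch_unlabeled_loss(model, x_u, weak_aug, strong_aug, threshold=0.95):
    """FixMatch unlabeled loss (sketch): pseudo-label from a weakly augmented
    view, then train the strongly augmented view to match it, keeping only
    confident pseudo-labels."""
    with torch.no_grad():
        probs = torch.softmax(model(weak_aug(x_u)), dim=1)
        conf, pseudo_y = probs.max(dim=1)
        mask = (conf >= threshold).float()

    logits_s = model(strong_aug(x_u))
    loss = F.cross_entropy(logits_s, pseudo_y, reduction='none')
    return (mask * loss).mean()
```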
But wait, there's more!
• Transfer Learning
• Semi-supervised learning
• One/Few-shot learning
• Un/Self-Supervised Learning
• Domain adaptation
• Meta-Learning
• Zero-shot learning
• Continual / Lifelong learning
• Multi-modal learning
• Multi-task learning
• Active learning
• …
Few-Shot Learning
Slide Credit: Hugo Larochelle
Normal Approach
• Do what we always do: fine-tuning
  – Train a classifier on the base classes
  – Freeze the features
  – Learn classifier weights for the new classes using a small amount of labeled data (at "query" time!); a minimal sketch follows below
Reference: A Closer Look at Few-shot Classification; Wei-Yu Chen, Yen-Cheng Liu, Zsolt Kira, Yu-Chiang Frank Wang, Jia-Bin Huang
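A minimal sketch of this baseline: freeze the pretrained backbone and fit only a new linear head on the few labeled support examples. All names and hyperparameters here are illustrative assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

def finetune_classifier(backbone, feat_dim, n_new_classes,
                        support_x, support_y, steps=100, lr=0.01):
    """Few-shot fine-tuning baseline (sketch): frozen features, new linear head."""
    for p in backbone.parameters():
        p.requires_grad = False
    backbone.eval()

    head = nn.Linear(feat_dim, n_new_classes)
    opt = torch.optim.SGD(head.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()

    with torch.no_grad():
        feats = backbone(support_x)  # features are fixed, so compute them once

    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(head(feats), support_y)
        loss.backward()
        opt.step()
    return head
```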
Cons of the Normal Approach
• The training we do on the base classes does not take the task into account
• There is no notion that we will be performing a bunch of N-way tests
• Idea: simulate what we will see during test time
Meta-Training Approach
• Set up a set of smaller tasks during training that simulate what we will be doing during testing (a sketch of episodic sampling follows below)
  – Can optionally pre-train features on held-out base classes (not typical)
• The testing stage is now the same, but with new classes
https://www.borealisai.com/en/blog/tutorial-2-few-shot-learning-and-meta-learning-i/
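A sketch of sampling one such episode (N-way, K-shot). `data_by_class` is an assumed dict mapping class id to a tensor of examples; the 5-way 1-shot / 15-query values are common defaults, not prescribed by the slides.

```python
import random
import torch

def sample_episode(data_by_class, n_way=5, k_shot=1, n_query=15):
    """Sample one N-way K-shot episode (sketch): pick n_way classes, then split
    each class's examples into a small support set and a query set."""
    classes = random.sample(list(data_by_class.keys()), n_way)
    support, support_y, query, query_y = [], [], [], []
    for label, c in enumerate(classes):
        idx = torch.randperm(len(data_by_class[c]))[:k_shot + n_query]
        examples = data_by_class[c][idx]
        support.append(examples[:k_shot]);  support_y += [label] * k_shot
        query.append(examples[k_shot:]);    query_y += [label] * n_query
    return (torch.cat(support), torch.tensor(support_y),
            torch.cat(query), torch.tensor(query_y))
```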
Meta-Learning Approaches
• Learning a model conditioned on the support set
More Sophisticated Meta-Learning Approaches
• Learn gradient descent itself:
  – Parameter initialization and update rules
  – Output:
    • A parameter initialization
    • A meta-learner that decides how to update the parameters
• Or learn just an initialization and use normal gradient descent (MAML):
  – Output:
    • Just the parameter initialization!
    • We then use plain SGD
Meta-Learner
• How do we parametrize learning algorithms?
• Two approaches to defining a meta-learner:
  – Take inspiration from a known learning algorithm:
    • kNN/kernel machine: Matching Networks (Vinyals et al., 2016)
    • Gaussian classifier: Prototypical Networks (Snell et al., 2017), sketched below
    • Gradient descent: Meta-Learner LSTM (Ravi & Larochelle, 2017), MAML (Finn et al., 2017)
  – Derive it from a black-box neural network:
    • MANN (Santoro et al., 2016)
    • SNAIL (Mishra et al., 2018)
Slide Credit: Hugo Larochelle
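The Prototypical Networks entry lends itself to a compact sketch: classify queries by distance to class-mean embeddings. The encoder `embed` and the tensor shapes are assumptions for illustration.

```python
import torch

def prototypical_logits(embed, support_x, support_y, query_x, n_way):
    """Prototypical Networks (sketch): each class prototype is the mean
    embedding of its support examples; queries are scored by negative squared
    Euclidean distance to the prototypes."""
    z_support = embed(support_x)   # (n_way * k_shot, d)
    z_query = embed(query_x)       # (n_query, d)
    prototypes = torch.stack([z_support[support_y == c].mean(dim=0)
                              for c in range(n_way)])  # (n_way, d)
    # Negative squared distances act as logits for a softmax classifier
    return -torch.cdist(z_query, prototypes) ** 2
```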
Meta-Learner LSTM
Slide Credit: Hugo Larochelle
Model-Agnostic Meta-Learning (MAML)
Slide Credits: Hugo Larochelle, Sergey Levine
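The MAML slides are figure-based; below is a compact sketch of the meta-update, using `torch.func.functional_call` (PyTorch 2.x) for the stateless forward pass with the adapted parameters. The task structure, single inner step, and the inner learning rate are illustrative assumptions.

```python
import torch
import torch.nn.functional as F
from torch.func import functional_call

def maml_meta_step(model, tasks, meta_opt, inner_lr=0.01):
    """One MAML meta-update (sketch). For each task: take one inner gradient
    step on the support set starting from the shared initialization, then
    accumulate the adapted parameters' loss on the query set.
    create_graph=True keeps the second-order terms of the MAML objective."""
    names = [n for n, _ in model.named_parameters()]
    params = [p for _, p in model.named_parameters()]
    meta_loss = 0.0

    for (x_s, y_s), (x_q, y_q) in tasks:
        # Inner loop: one gradient step on the support set
        support_loss = F.cross_entropy(model(x_s), y_s)
        grads = torch.autograd.grad(support_loss, params, create_graph=True)
        adapted = {n: p - inner_lr * g for n, p, g in zip(names, params, grads)}
        # Outer loop: query loss under the adapted parameters
        meta_loss = meta_loss + F.cross_entropy(
            functional_call(model, adapted, (x_q,)), y_q)

    meta_opt.zero_grad()
    meta_loss.backward()   # gradients flow back to the initialization
    meta_opt.step()
    return meta_loss.item()
```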
Comparison
Slide Credit: Sergey Levine