Incremental Classification: First Step into Lifelong Learning


  1. Incremental Classification: First Step into Lifelong Learning. PAN Xinyu, MMLab, Department of IE.

  2-4. Multi-task Incremental Classification: Setup. [Diagram, built up over three slides: training data for a sequence of tasks arrives over time and is used to update the target model.]

  5. Multi-task Incremental Classification: Baselines. Three options and their drawbacks: re-training on all the data is time consuming; feature extraction with a frozen backbone is sub-optimal for the new task; finetuning on the new data causes catastrophic forgetting of the old tasks. (See the sketch below.)
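A minimal PyTorch sketch of the trade-off (toy module sizes and variable names are illustrative, not from the slides): feature extraction freezes the shared backbone and trains only the new head, while finetuning updates everything on the new data, which is exactly what erodes old-task performance.

    import torch.nn as nn

    # Toy two-task network: a shared backbone with one head per task.
    backbone = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                             nn.AdaptiveAvgPool2d(1), nn.Flatten())
    old_head = nn.Linear(16, 10)   # classifier for the old task
    new_head = nn.Linear(16, 5)    # classifier for the new task

    # Baseline "Feature Extraction": freeze the backbone, train new_head only.
    # Old-task accuracy is preserved, but the features never adapt to the
    # new task, hence sub-optimal new-task performance.
    for p in backbone.parameters():
        p.requires_grad = False
    feature_extraction_params = list(new_head.parameters())

    # Baseline "Finetuning": update backbone + new_head on new data only.
    # The shared features drift and old_head silently degrades:
    # catastrophic forgetting.
    for p in backbone.parameters():
        p.requires_grad = True
    finetuning_params = list(backbone.parameters()) + list(new_head.parameters())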

  6. Potential Application Scenarios
     • Limited storage budget that cannot keep all sequential data.
     • Collected data may expire due to privacy issues.
     • Efficient deployment of the model for incremental data.
     • ...

  7. Lifelong Learning via Progressive Distillation and Retrospection. Saihui Hou 1*, Xinyu Pan 2*, Chen Change Loy 3, Zilei Wang 1, Dahua Lin 2. 1 University of Science and Technology of China, 2 The Chinese University of Hong Kong, 3 Nanyang Technological University. (* joint first authorship; accepted at ECCV 2018)

  8-11. Handle Catastrophic Forgetting. (Built up over four slides.) How do we prevent the performance drop on the old task during finetuning? We need an indicator of old-task performance during training. But how do we construct such an indicator if we do not reserve any of the old data? Answer: take the new data as fake old data.

  12. Learning without Forgetting (ECCV 2016). [Diagram: a shared feature extractor F with task-specific classifiers. On the new data, the new-task classifier T_n is trained with Loss_new, while the old-task classifier T_o is trained with a distillation loss Loss_old whose targets are the outputs of the frozen original CNN (F, T_o) on the same new data.]
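A minimal sketch of this objective (PyTorch; the function name and the temperature T are assumptions, since the slide gives no hyperparameters): Loss_new is plain cross-entropy on the new labels, and Loss_old is a softened distillation term keeping the current model's old-task outputs close to those of the frozen original CNN on the same new images.

    import torch.nn.functional as F

    def lwf_loss(new_logits, labels, old_logits_current, old_logits_original,
                 T=2.0):
        # Loss_new: ordinary cross-entropy for the new task.
        loss_new = F.cross_entropy(new_logits, labels)
        # Loss_old: distill the frozen original CNN's old-task predictions,
        # computed on the *new* data acting as fake old data.
        teacher = F.softmax(old_logits_original.detach() / T, dim=1)
        student = F.log_softmax(old_logits_current / T, dim=1)
        loss_old = F.kl_div(student, teacher, reduction="batchmean") * T * T
        return loss_new + loss_old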

  13. Adaptation by Distillation. [Diagram: an Expert CNN (F_n, T_n) is first trained on the new task alone; the target model then learns the new task by distilling from this expert (Loss_new) instead of using hard labels, while Loss_old still distills the old-task outputs from the original CNN.] What if we reserve a small fraction of the old data?

  14. Adaptation by Distillation + Retrospection. [Diagram: the same pipeline as slide 13, plus Retrospection: a small number of reserved old-task samples are fed through the old-task branch, so Loss_old is computed on real old data.]
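A sketch of the combined objective, assuming an exemplar buffer of reserved old samples and a model with hypothetical backbone / old_head / new_head attributes: the new task is learned by distilling from the Expert CNN, and the old task is now preserved on genuinely old inputs rather than on new data standing in for them.

    import torch.nn.functional as F

    def distill(student_logits, teacher_logits, T=2.0):
        # Softened-KL distillation, as in the LwF sketch above.
        teacher = F.softmax(teacher_logits.detach() / T, dim=1)
        student = F.log_softmax(student_logits / T, dim=1)
        return F.kl_div(student, teacher, reduction="batchmean") * T * T

    def distill_retrospect_loss(model, original_cnn, expert_cnn, x_new, x_old):
        # Adaptation by distillation: learn the new task from the Expert CNN.
        loss_new = distill(model.new_head(model.backbone(x_new)),
                           expert_cnn(x_new))
        # Retrospection: preserve the old task on reserved old samples.
        loss_old = distill(model.old_head(model.backbone(x_old)),
                           original_cnn(x_old))
        return loss_new + loss_old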

  15. Overview of Distillation and Retrospection. [Diagram: lifelong learning proceeds task by task. Retrospection replays reserved training data from old tasks, while Adaptation by Distillation transfers each new task from its own Expert CNN (Expert CNN for Task 1, Expert CNN for Task 2, ...).]

  16. Dataset

  17. Some Results

  18. Ablation Study on #Reserved Samples

  19. Learning a Unified Classifier Incrementally via Rebalancing. Saihui Hou 1*, Xinyu Pan 2*, Chen Change Loy 3, Zilei Wang 1, Dahua Lin 2. 1 University of Science and Technology of China, 2 The Chinese University of Hong Kong, 3 Nanyang Technological University. (* joint first authorship; to appear in CVPR 2019)

  20-22. From Multi-task to Multi-class. (Built up over three slides.) In the multi-task setting, an oracle tells which classifier should be used at inference time. In the multi-class setting there is no such oracle: a single unified classifier must cover all classes seen so far. Can we simply adapt distillation and retrospection to this setup?

  23. A Toy Example to Visualize Imbalance

  24. Handle the Imbalance: Cosine Normalization. [Diagram: the magnitudes of the new-class embeddings are much larger than those of the old-class embeddings; cosine normalization removes this magnitude imbalance.] (We use "embedding" and "the weights of the last fully-connected layer" interchangeably in the following.)
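One plausible implementation of such a cosine-normalized last layer (PyTorch; the class name and the learnable scale are assumptions): both the feature vector and each class embedding are L2-normalized, so the logits depend only on angles and the magnitude imbalance between old and new classes disappears.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CosineClassifier(nn.Module):
        def __init__(self, feat_dim, num_classes):
            super().__init__()
            # Rows of `weight` are the class embeddings.
            self.weight = nn.Parameter(torch.randn(num_classes, feat_dim) * 0.01)
            # Learnable scale: raw cosine scores live in [-1, 1], which is
            # too flat for softmax, so they are scaled up before the loss.
            self.sigma = nn.Parameter(torch.tensor(10.0))

        def forward(self, feats):
            cos = F.linear(F.normalize(feats, dim=1),
                           F.normalize(self.weight, dim=1))
            return self.sigma * cos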

  25. Handle the Imbalance: Less-Forget Constraint. [Diagram: during incremental training the features deviate from the previous knowledge; the less-forget constraint penalizes this deviation.]
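A sketch of one natural form of this constraint (the function name is an assumption): on each input, penalize the angular deviation of the current model's feature from the frozen old model's feature, so the previous knowledge encoded in the feature directions is retained.

    import torch.nn.functional as F

    def less_forget_loss(feat_new, feat_old):
        # feat_new: features from the model being trained.
        # feat_old: features from the frozen old model on the same inputs.
        cos = F.cosine_similarity(feat_new, feat_old.detach(), dim=1)
        return (1.0 - cos).mean()   # zero when the directions coincide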

  26. Handle the Imbalance: Inter-Class Separation. [Diagram: a reserved old-class sample serves as the anchor, its own class embedding as the positive, and the new-class embeddings as negatives; separating them by a margin removes the ambiguities between old and new classes.]
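A sketch of a margin-ranking form of this idea (the margin and the number of hard negatives are illustrative values, not necessarily the paper's settings): for each reserved old-class sample, its feature is the anchor, its own class embedding the positive, and the highest-scoring new-class embeddings the negatives.

    import torch.nn.functional as F

    def inter_class_separation_loss(anchor_feats, positive_embeds,
                                    new_class_embeds, margin=0.5, k=2):
        # All inputs are assumed L2-normalized, so dot products are cosines.
        pos = (anchor_feats * positive_embeds).sum(dim=1, keepdim=True)  # (B, 1)
        neg = anchor_feats @ new_class_embeds.t()                        # (B, C_new)
        hard_neg, _ = neg.topk(k, dim=1)                                 # (B, k)
        # Hinge: the true class should beat each hard negative by `margin`.
        return F.relu(margin - pos + hard_neg).mean()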

  27-31. Overview. [Diagram, built up over five slides: reserved samples and new samples pass through both the frozen old model F* and the new model F. The cosine classifier is trained with the cross-entropy loss L_ce; the less-forget constraint L_dis keeps the new features f_new aligned with the old model's features f_old; and the margin ranking loss L_mr enforces inter-class separation.]
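Putting the pieces together, a sketch of one training step's objective that reuses the helpers above (the loss weight lambda_dis is an illustrative value): L_ce on all samples, L_dis between old- and new-model features, and L_mr only on the reserved old-class samples in the batch.

    import torch.nn.functional as F

    def total_loss(logits, labels, feat_new, feat_old,
                   old_anchor_feats, old_anchor_embeds, new_class_embeds,
                   lambda_dis=5.0):
        l_ce = F.cross_entropy(logits, labels)                 # L_ce
        l_dis = less_forget_loss(feat_new, feat_old)           # L_dis
        l_mr = inter_class_separation_loss(old_anchor_feats,   # L_mr
                                           old_anchor_embeds,
                                           new_class_embeds)
        return l_ce + lambda_dis * l_dis + l_mr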

  32. Some Results. [Figures: results over 10 incremental phases, and a 5-phase ablation study.]

  33. Thank you!
