Learning with Multiple Complementary Labels


  1. Learning with Multiple Complementary Labels
  Lei Feng 1*, Takuo Kaneko 2,3*, Bo Han 3,4, Gang Niu 3, Bo An 1, Masashi Sugiyama 2,3
  1 Nanyang Technological University, Singapore; 2 The University of Tokyo, Tokyo, Japan; 3 RIKEN Center for Advanced Intelligence Project, Tokyo, Japan; 4 Hong Kong Baptist University, Hong Kong SAR, China
  * Equal Contribution
  ICML 2020

  2. Outline
  ⚫ Learning Frameworks
  ⚫ Problem Formulation
  ⚫ The Proposed Methods
    ❑ Wrappers
    ❑ Unbiased Risk Estimator
    ❑ Upper-Bound Surrogate Losses
  ⚫ Experiments
  ⚫ Conclusion

  3. Learning Frameworks
  ⚫ Supervised Learning: each instance comes with its true label.
  ⚫ Unsupervised Learning: instances come with no labels.
  ⚫ Semi-Supervised Learning [Chapelle et al., 2006]: some instances have true labels, the rest are unlabeled.
  ⚫ Complementary-Label Learning [Ishida et al., 2017; 2019]: each instance comes with a single false label.
  ⚫ Learning with Multiple Complementary Labels (our paper): each instance comes with multiple false labels.

  4. Data Distribution
  For complementary-label (CL) learning [Ishida et al., 2017; 2019]:
  $\bar{q}(\boldsymbol{y}, \bar{z}) = \frac{1}{l-1} \sum_{z \neq \bar{z}} q(\boldsymbol{y}, z)$.
  For learning with multiple complementary labels (MCLs):
  $\bar{q}(\boldsymbol{y}, \bar{Z}) = \sum_{k=1}^{l-1} q(t = k)\, \bar{q}(\boldsymbol{y}, \bar{Z} \mid t = k)$, where
  $\bar{q}(\boldsymbol{y}, \bar{Z} \mid t = k) := \begin{cases} \frac{1}{\binom{l-1}{k}} \sum_{z \notin \bar{Z}} q(\boldsymbol{y}, z) & \text{if } |\bar{Z}| = k, \\ 0 & \text{otherwise.} \end{cases}$
  ➢ $l$: the number of classes
  ➢ $q(\boldsymbol{y}, z)$: joint distribution with a single true label
  ➢ $\bar{q}(\boldsymbol{y}, \bar{z})$: joint distribution with a single CL
  ➢ $\bar{q}(\boldsymbol{y}, \bar{Z})$: joint distribution with MCLs
  ➢ $q(t = k)$: the probability that the size of the MCL set is $k$
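  This distribution also suggests a simple way to synthesize MCL data from ordinarily labeled data: draw the set size $k$ from $q(t = k)$, then draw $\bar{Z}$ uniformly among the size-$k$ subsets of the labels other than the true one. Below is a minimal Python sketch of that sampling step; the function name and the `size_dist` argument are illustrative conventions, not the paper's.

```python
import random

def sample_mcl(true_label, num_classes, size_dist):
    """Draw one set of multiple complementary labels (MCLs).

    true_label : the (hidden) correct class index in {0, ..., num_classes - 1}
    size_dist  : probabilities over set sizes 1 .. num_classes - 1,
                 i.e. size_dist[k - 1] = q(t = k), assumed to sum to 1
    Returns a set of labels that are all guaranteed to be incorrect.
    """
    sizes = list(range(1, num_classes))
    k = random.choices(sizes, weights=size_dist)[0]       # sample |Z̄| = k ~ q(t = k)
    others = [c for c in range(num_classes) if c != true_label]
    return set(random.sample(others, k))                  # uniform size-k subset of wrong labels

# Example: 5 classes, uniform distribution over set sizes 1..4.
mcl = sample_mcl(true_label=2, num_classes=5, size_dist=[0.25] * 4)
print(mcl)  # e.g. {0, 3, 4}; never contains the true label 2
```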

  5. Wrappers
  ➢ #TP: how many times the correct label serves as a non-complementary label for an instance
  ➢ #FP: how many times labels other than the correct one serve as non-complementary labels for an instance
  ➢ Supervision purity: #TP / (#TP + #FP)
  Decompose a set of MCLs into many single CLs: Decomposition after Shuffle / Decomposition before Shuffle.
  E.g., suppose $\bar{Z} = \{\bar{z}_1, \bar{z}_2\}$; then $(\boldsymbol{y}, \bar{Z})$ is decomposed into $(\boldsymbol{y}, \bar{z}_1)$ and $(\boldsymbol{y}, \bar{z}_2)$.
  With these wrappers, any existing complementary-label learning method can be applied. However, the supervision purity is diluted by decomposition (a small numeric sketch follows).
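  To make the dilution concrete, here is a small Python sketch under my reading of the #TP/#FP definitions above: with $l$ classes and an MCL set of size $k$, treating the set as a whole gives purity $1/(l-k)$, while decomposing it into $k$ single CLs gives $k/(k(l-1)) = 1/(l-1)$. The function names are my own.

```python
def purity_whole(num_classes, set_size):
    # MCL set taken as a whole: the true label is one of the (num_classes - set_size)
    # non-complementary labels, so #TP = 1 and #FP = num_classes - set_size - 1.
    return 1.0 / (num_classes - set_size)

def purity_decomposed(num_classes, set_size):
    # Decomposed into set_size single CLs: the true label is non-complementary in every
    # pair (#TP = set_size), but each pair also keeps (num_classes - 2) wrong
    # non-complementary labels (#FP = set_size * (num_classes - 2)).
    return set_size / (set_size * (num_classes - 1.0))

# 10 classes, an MCL set of size 5:
print(purity_whole(10, 5))       # 0.2
print(purity_decomposed(10, 5))  # ~0.111: purity is diluted by decomposition
```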

  6. Unbiased Risk Estimator
  The classification risk can be equivalently expressed as
  $R(g) = \sum_{k=1}^{l-1} q(t = k)\, \bar{R}_k(g)$, where
  $\bar{R}_k(g) := \mathbb{E}_{\bar{q}(\boldsymbol{y}, \bar{Z} \mid t = k)}\bigl[\bar{\mathcal{L}}_k(g(\boldsymbol{y}), \bar{Z})\bigr]$, and
  $\bar{\mathcal{L}}_k(g(\boldsymbol{y}), \bar{Z}) := \sum_{z \notin \bar{Z}} \mathcal{L}(g(\boldsymbol{y}), z) - \frac{l-1-k}{k} \sum_{z' \in \bar{Z}} \mathcal{L}(g(\boldsymbol{y}), z')$.
  ➢ $R(g)$: the classification risk, defined as $\mathbb{E}_{q(\boldsymbol{y}, z)}[\mathcal{L}(g(\boldsymbol{y}), z)]$
  ➢ $\mathcal{L}(g(\boldsymbol{y}), z)$: a multi-class loss function
  Each set of MCLs is taken as a whole!
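  Below is a minimal PyTorch sketch of the per-example loss $\bar{\mathcal{L}}_k$ above. The function name, the boolean `mcl_mask` encoding of $\bar{Z}_j$, and the plain batch mean (instead of the explicit $q(t=k)$-weighted decomposition) are my own simplifications.

```python
import torch
import torch.nn.functional as F

def mcl_unbiased_loss(logits, mcl_mask, base_loss):
    """Per-example unbiased-estimator loss for MCL data (a sketch).

    logits    : (n, l) raw model outputs g(y_j)
    mcl_mask  : (n, l) boolean tensor, True where a class belongs to the MCL set Z̄_j
    base_loss : callable (logits, targets) -> (n,) per-example multi-class loss L
    """
    n, l = logits.shape
    k = mcl_mask.sum(dim=1).float()                          # |Z̄_j| for each example
    per_class = torch.stack(
        [base_loss(logits, torch.full((n,), c, dtype=torch.long)) for c in range(l)],
        dim=1)                                               # (n, l): L(g(y_j), c) for every class c
    out_sum = (per_class * (~mcl_mask).float()).sum(dim=1)   # sum over z not in Z̄_j
    in_sum = (per_class * mcl_mask.float()).sum(dim=1)       # sum over z' in Z̄_j
    return (out_sum - (l - 1 - k) / k * in_sum).mean()

# Example with cross-entropy as the base loss L (unbounded below -- see the next slide):
ce = lambda lg, y: F.cross_entropy(lg, y, reduction='none')
logits = torch.randn(4, 5, requires_grad=True)
mask = torch.zeros(4, 5, dtype=torch.bool)
mask[:, :2] = True            # pretend classes 0 and 1 are the complementary labels
print(mcl_unbiased_loss(logits, mask, ce))
```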

  7. Practical Implementation
  Observation: the empirical risk estimator may become unbounded below if the loss function used is unbounded, which leads to over-fitting.
  Conjecture: bounded losses work better than unbounded losses.
  Results: we validate via experiments that MAE, MSE, GCE [Zhang & Sabuncu, 2018], and PHuber-CE [Menon et al., 2020] outperform CCE. (A sketch of these bounded losses follows.)
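  For reference, here are common forms of the bounded base losses named above (PHuber-CE is omitted for brevity), written as per-example PyTorch functions; the exact definitions used in the paper's experiments may differ slightly. Each can be plugged in as the `base_loss` of the estimator sketch on the previous slide.

```python
import torch.nn.functional as F

def cce(logits, targets):            # categorical cross-entropy: unbounded as p_y -> 0
    return F.cross_entropy(logits, targets, reduction='none')

def mae(logits, targets):            # mean absolute error on softmax outputs, bounded in [0, 2]
    p = F.softmax(logits, dim=1)
    onehot = F.one_hot(targets, num_classes=logits.shape[1]).float()
    return (p - onehot).abs().sum(dim=1)

def mse(logits, targets):            # mean squared error on softmax outputs, bounded in [0, 2]
    p = F.softmax(logits, dim=1)
    onehot = F.one_hot(targets, num_classes=logits.shape[1]).float()
    return ((p - onehot) ** 2).sum(dim=1)

def gce(logits, targets, q=0.7):     # generalized cross-entropy [Zhang & Sabuncu, 2018], bounded for q in (0, 1]
    p_y = F.softmax(logits, dim=1).gather(1, targets.unsqueeze(1)).squeeze(1)
    return (1.0 - p_y ** q) / q
```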

  8. Is Bounded Loss Good Enough?
  Is the performance of the unbiased risk estimator with a bounded loss good enough?
  Take MAE as an example: inserting MAE into the empirical risk estimator yields the equivalent formulation
  $\bar{\mathcal{L}}'_{\mathrm{MAE}}(g(\boldsymbol{y}_j), \bar{Z}_j) = 1 - \sum_{k \notin \bar{Z}_j} q_{\boldsymbol{\theta}}(k \mid \boldsymbol{y}_j)$,
  whose gradient is
  $\frac{\partial \bar{\mathcal{L}}'_{\mathrm{MAE}}}{\partial \boldsymbol{\theta}} = \begin{cases} -\nabla_{\boldsymbol{\theta}}\, q_{\boldsymbol{\theta}}(k \mid \boldsymbol{y}_j) \cdot 1 & \text{if } k \notin \bar{Z}_j, \\ 0 & \text{otherwise.} \end{cases}$
  ➢ $q_{\boldsymbol{\theta}}(k \mid \boldsymbol{y}_j)$: the model's predicted probability of class $k$ for instance $\boldsymbol{y}_j$
  Each example is treated as equally important for optimization.
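  A minimal PyTorch sketch of this equivalent MAE formulation, again with a boolean `mcl_mask` marking $\bar{Z}_j$ (my own encoding): the gradient weight on every non-complementary class probability is the constant 1, so all examples contribute equally.

```python
import torch.nn.functional as F

def mae_equivalent_loss(logits, mcl_mask):
    # 1 minus the total predicted probability of the non-complementary labels;
    # the gradient pulls up each q_θ(k | y_j) with k not in Z̄_j with a constant weight of 1.
    p = F.softmax(logits, dim=1)
    return 1.0 - (p * (~mcl_mask).float()).sum(dim=1)
```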

  9. Upper-Bound Surrogate Losses
  We propose the following upper-bound surrogate losses:
  $\bar{\mathcal{L}}_{\mathrm{EXP}}(g(\boldsymbol{y}_j), \bar{Z}_j) = \exp\bigl(-\sum_{k \notin \bar{Z}_j} q_{\boldsymbol{\theta}}(k \mid \boldsymbol{y}_j)\bigr)$,
  $\bar{\mathcal{L}}_{\mathrm{LOG}}(g(\boldsymbol{y}_j), \bar{Z}_j) = -\log\bigl(\sum_{k \notin \bar{Z}_j} q_{\boldsymbol{\theta}}(k \mid \boldsymbol{y}_j)\bigr)$.
  Their gradients can be expressed as
  $\frac{\partial \bar{\mathcal{L}}_{\mathrm{EXP}}}{\partial \boldsymbol{\theta}} = \begin{cases} -\nabla_{\boldsymbol{\theta}}\, q_{\boldsymbol{\theta}}(k \mid \boldsymbol{y}_j) \cdot w_{\mathrm{EXP}} & \text{if } k \notin \bar{Z}_j, \\ 0 & \text{otherwise,} \end{cases}$
  $\frac{\partial \bar{\mathcal{L}}_{\mathrm{LOG}}}{\partial \boldsymbol{\theta}} = \begin{cases} -\nabla_{\boldsymbol{\theta}}\, q_{\boldsymbol{\theta}}(k \mid \boldsymbol{y}_j) \cdot w_{\mathrm{LOG}} & \text{if } k \notin \bar{Z}_j, \\ 0 & \text{otherwise,} \end{cases}$
  where $w_{\mathrm{EXP}} = \exp\bigl(-\sum_{k \notin \bar{Z}_j} q_{\boldsymbol{\theta}}(k \mid \boldsymbol{y}_j)\bigr)$ and $w_{\mathrm{LOG}} = \bigl(\sum_{k \notin \bar{Z}_j} q_{\boldsymbol{\theta}}(k \mid \boldsymbol{y}_j)\bigr)^{-1}$.
  Higher weights are given to hard examples!
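  Below is a minimal PyTorch sketch of the EXP and LOG surrogates (same `mcl_mask` convention as above); autograd reproduces the weighted gradients shown here, with the small epsilon in LOG being my own addition for numerical stability.

```python
import torch
import torch.nn.functional as F

def exp_loss(logits, mcl_mask):
    # EXP surrogate: exp(-(total predicted probability of the non-complementary labels)).
    s = (F.softmax(logits, dim=1) * (~mcl_mask).float()).sum(dim=1)
    return torch.exp(-s)

def log_loss(logits, mcl_mask):
    # LOG surrogate: -log(total predicted probability of the non-complementary labels).
    s = (F.softmax(logits, dim=1) * (~mcl_mask).float()).sum(dim=1)
    return -torch.log(s + 1e-12)

# The smaller the non-complementary probability mass s (i.e. the harder the example),
# the larger the gradient weight exp(-s) or 1/s, so hard examples are weighted up.
logits = torch.randn(4, 5, requires_grad=True)
mask = torch.zeros(4, 5, dtype=torch.bool)
mask[:, :2] = True
print(exp_loss(logits, mask).mean().item(), log_loss(logits, mask).mean().item())
```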

  10. Experiments
  ⚫ Benchmark datasets: MNIST, Kuzushiji-MNIST, Fashion-MNIST, CIFAR-10.
  ⚫ UCI datasets: Yeast, Texture, Dermatology, Synthetic Control, 20Newsgroups.
  ⚫ Compared methods: GA, NN, and Free [Ishida et al., 2019]; PC [Ishida et al., 2017]; Forward [Yu et al., 2018]; CLPL [Cour et al., 2011]; the unbiased risk estimator with the bounded losses MAE, MSE, GCE [Zhang & Sabuncu, 2018], and PHuber-CE [Menon et al., 2020] and with the unbounded loss CCE; and the two upper-bound surrogate losses EXP and LOG.
  Extensive experimental results clearly demonstrate the effectiveness of our proposed methods.

  11. Conclusion
  ❑ A novel problem setting that generalizes learning with a single CL to learning with MCLs.
  ❑ Solutions including the wrappers and an unbiased risk estimator.
  ❑ Upper-bound surrogate losses.
  Thank you!
