Distilling Effective Supervision from Severe Label Noise
Zizhao Zhang | Han Zhang | Sercan Ö. Arık | Honglak Lee | Tomas Pfister
Google Cloud AI, Google Brain
Noisy labels in practice
A practically common scenario: crowd-sourcing, web search.
Previous work:
● MentorNet, Jiang et al., ICML 2018
● Learning to Reweight, Ren et al., ICML 2018
● Trusted Data, Hendrycks et al., NeurIPS 2018
(Diagram: a small trusted dataset and a large noisy dataset jointly drive model optimization.)
Motivation
Experiments on CIFAR100 with uniform noise show that previous methods still suffer under high label noise. How can we better utilize the hidden correct labels in large noisy-label datasets?
● Green line: fully-supervised baseline without label noise.
● Blue line: noise-robust methods can be severely affected if the label noise ratio is high, e.g. > 50% label noise.
● Yellow line: semi-supervised learning (SSL) methods, which discard the labels of the large noisy-label dataset.
● Red line: our method significantly improves noise-robust training.
Our method estimates data coefficients with a generalized meta-learning framework to distill effective supervision from label noise.
(Diagram: trusted dataset + noisy dataset → drop noisy labels → semi-supervised learning / noise-robust learning.)
Method
Key training steps:
● Obtain initial pseudo-label candidates.
● Construct meta re-labeling and re-weighting in a generalized meta-learning framework. Re-labeling is formulated as a differentiable selection problem between estimated labels and original labels.
● Construct composed losses with the estimated data coefficients.
● Train the model for one step.
Key insights (see paper):
● Better initial pseudo labels
● Better regularizations
Initial Pseudo Labels
Inspired by MixMatch, Berthelot et al., NeurIPS 2019.
Pseudo label estimator: average the predictions over multiple augmentations, then apply softmax temperature calibration.
For augmentation, we use AutoAugment/RandAugment: geometric/color transformation → flip → random crop → cutout.
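The estimator described above can be sketched as follows. This is a minimal NumPy sketch, assuming K softmax predictions per image are already computed; the function name and temperature value are illustrative, not from the paper.

```python
import numpy as np

def estimate_pseudo_labels(probs_per_aug, temperature=0.5):
    """Average class probabilities over K augmented views of each image,
    then sharpen the averaged distribution via temperature calibration.

    probs_per_aug: array of shape (K, N, C) -- softmax outputs for
    K augmentations of N images over C classes.
    temperature < 1 sharpens the distribution (exponent 1/T > 1).
    """
    avg = probs_per_aug.mean(axis=0)            # (N, C) averaged predictions
    sharpened = avg ** (1.0 / temperature)      # raise to power 1/T
    return sharpened / sharpened.sum(axis=1, keepdims=True)  # renormalize
```

For example, averaging [0.6, 0.4] and [0.8, 0.2] gives [0.7, 0.3], which temperature 0.5 sharpens toward the dominant class.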
Pseudo labels need consistent predictions
(Figure: three augmentations yield Pred 1, Pred 2, Pred 3. Averaging inconsistent predictions produces a flat pseudo label; consistent predictions produce a sharp one. Consistency enforcing encourages the latter.)
Training overview
The training losses are composed of multiple cross-entropy losses using the learned data coefficients (weights and pseudo labels).
Introducing probe data into the actual update: MixUp is used to "gently" introduce the probe data together with possibly-noisy data as training data.
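The "gentle" mixing step can be illustrated with a generic MixUp sketch. This is an assumption-laden illustration, not the paper's exact formulation: the Beta parameter and the max(λ, 1−λ) convention (which keeps the mixed example closer to the probe side) are common MixUp choices.

```python
import numpy as np

def mixup(x_probe, y_probe, x_noisy, y_noisy, alpha=0.75):
    """Convexly combine a clean probe batch with a possibly-noisy batch.

    lam ~ Beta(alpha, alpha); taking max(lam, 1 - lam) biases the mix
    toward the probe example, "gently" introducing it into training.
    """
    lam = np.random.beta(alpha, alpha)
    lam = max(lam, 1.0 - lam)          # stay closer to the probe side
    x = lam * x_probe + (1.0 - lam) * x_noisy
    y = lam * y_probe + (1.0 - lam) * y_noisy
    return x, y
```

Because the combination is convex, mixed label vectors remain valid probability distributions.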
Experiments
State-of-the-art over many benchmarks.
Experiments with uniform noise
Two networks are used: WRN28-10 (default) and ResNet29 (very light).
Table 1: CIFAR10 with uniform noise.
- Up to 9% (86.8% -> 93.7%) improvement.
- Outperforms others with a much smaller ResNet and merely 1 clean image per class (0.01k: 1 probe image per class).
Table 2: CIFAR100 with uniform noise.
- Up to 56% (48.2% -> 75.5%) improvement.
- Outperforms others with a much smaller ResNet and merely 1 clean image per class (0.1k: 1 probe image per class).
Experiments with semantic noise Table 1: Asymmetric noise on CIFAR10. Table 2: Experiments with semantic noise where labels are generated by a neural network trained on limited data. The resulting noise ratio is shown in parentheses. * Trained by us
Large-scale experiments
Table 1: WebVision 2M comparison on the mini and full versions (10 clean ImageNet training images per class are used).
- Up to 25% (63.8% -> 80.0%) improvement.
- Outperforms MentorNet even with a much smaller ResNet50, compared with its default InceptionResNetV2.
mini: 60k (50 classes); full: 2M (1000 classes)
Table 2: Food101N comparison.
Effectiveness of meta re-labeling
Study on CIFAR100.
Data coefficients: exemplar weights and labels.
Binary selection formulation: smaller λ favors pseudo labels.
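The binary selection above can be sketched as a convex combination of the two candidate labels. A minimal sketch, assuming λ is a learned per-example coefficient in [0, 1]; the function name is hypothetical.

```python
def relabel(original_label, pseudo_label, lam):
    """Differentiable selection between the original (possibly noisy) label
    and the estimated pseudo label.

    lam in [0, 1]: lam = 1 keeps the original label, lam = 0 switches
    fully to the pseudo label; smaller lam favors the pseudo label.
    Works elementwise on scalars or label vectors.
    """
    return lam * original_label + (1.0 - lam) * pseudo_label
```

Because the combination is linear in λ, gradients flow through the selection, which is what makes the re-labeling learnable inside the meta-learning framework.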
Conclusion
Our method:
● Estimates data coefficients (exemplar weights and labels) to distill effective supervision for noise-robust model training.
● Significantly outperforms previous methods and sets new state of the art on most benchmarks.
Code: https://github.com/google-research/google-research/tree/master/ieg