ICML 2019
Breaking Inter-Layer Co-Adaptation by Classifier Anonymization
Ikuro Sato¹, Kohta Ishikawa¹, Guoqing Liu¹, Masayuki Tanaka²
¹ Denso IT Laboratory, Inc., Japan
² National Institute of Advanced Industrial Science and Technology, Japan
Summary first
About what? Breaking co-adaptation between the feature extractor and the classifier.
How? By a classifier-anonymization technique.
Theory? Proved: features form a simple point-like distribution.
In reality? The point-like property is largely confirmed on real datasets.
E2E optimization scheme flourishes. Is it always good?

E2E opt.: $(\theta^*, \phi^*) = \arg\min_{\theta,\phi} \sum_{(x,t)\in D} \ell(C_\phi(F_\theta(x)), t)$

[Diagram: input $x$ → DNN = feature extractor $F_\theta$ followed by classifier $C_\phi$ → loss w/ target $t$]

The feature extractor $F_{\theta^*}$ adapts to one particular classifier $C_{\phi^*}$.

Toy example (2-class regression): features may form an excessively complex distribution: disjointed, split. [Figure: feature dim-1 vs. feature dim-2; color: classifier value from "-1" to "+1".]
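As a concrete reference point, here is a minimal PyTorch sketch of the E2E objective above (our illustration, not the authors' code). The feature extractor and the classifier share one optimizer, so the features are free to co-adapt to this one classifier.

import torch
import torch.nn as nn

# Toy E2E setup: F_theta (feature extractor) and C_phi (classifier), trained jointly.
f_theta = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU())
c_phi = nn.Linear(128, 10)
opt = torch.optim.SGD(list(f_theta.parameters()) + list(c_phi.parameters()), lr=0.1)

def e2e_step(x, t):
    # One step of minimizing the batch sum of l(C_phi(F_theta(x)), t).
    opt.zero_grad()
    nn.functional.cross_entropy(c_phi(f_theta(x)), t).backward()
    opt.step()  # theta and phi move together: the source of co-adaptation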
FOCA: Feature-extractor Optimization through Classifier Anonymization

FOCA: $\theta^* = \arg\min_\theta \mathbb{E}_{\phi \sim \Omega(\theta)}\left[\sum_{(x,t)\in D} \ell(C_\phi(F_\theta(x)), t)\right]$

Random weak classifier: $\phi \sim \Omega(\theta)$. Want to know more about $\Omega(\theta)$? Please come to the poster!

The feature extractor $F_{\theta^*}$ adapts to a set of weak classifiers $\{C_\phi\}$.

Features form a simple point-like distribution per class under some conditions. [Figure: feature dim-1 vs. feature dim-2.]
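A hedged sketch of how one FOCA update could look in PyTorch. The exact construction of $\Omega(\theta)$ is in the paper and poster; approximating a weak classifier by a fresh linear head briefly fitted to one small batch with frozen features is an assumption on our part.

import torch
import torch.nn as nn

def sample_weak_classifier(f_theta, x_small, t_small, steps=5, lr=0.5):
    # phi ~ Omega(theta), approximated: a fresh head, briefly fit on one small batch.
    c_phi = nn.Linear(128, 10)
    opt_c = torch.optim.SGD(c_phi.parameters(), lr=lr)
    with torch.no_grad():
        z = f_theta(x_small)  # features are frozen while the weak classifier is made
    for _ in range(steps):
        opt_c.zero_grad()
        nn.functional.cross_entropy(c_phi(z), t_small).backward()
        opt_c.step()
    return c_phi

def foca_step(f_theta, opt_theta, x, t, x_small, t_small):
    # Each step sees a freshly sampled, then discarded ("anonymous") classifier.
    c_phi = sample_weak_classifier(f_theta, x_small, t_small)
    for p in c_phi.parameters():
        p.requires_grad_(False)  # only theta is updated against this classifier
    opt_theta.zero_grad()
    nn.functional.cross_entropy(c_phi(f_theta(x)), t).backward()
    opt_theta.step()

Averaging such steps over many freshly sampled classifiers approximates the expectation over $\phi \sim \Omega(\theta)$ in the objective.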
Proposition about the point-like property

In words: if the feature extractor has sufficient representation ability, all input data of the same class are projected to a single point in the feature space, in a class-separable way, under certain conditions.

Please see the paper for the proof.
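A LaTeX paraphrase of the statement, with our own symbols ($D_c$ for the class-$c$ training inputs, $\mu_c$ for the class representative); the precise conditions are stated only in the paper.

% Point-like property, paraphrased (exact conditions are in the paper).
% D_c: training inputs of class c; F_{\theta^\star}: FOCA-trained feature extractor;
% \mu_c: the single point that class c collapses to (our notation).
\forall c \;\exists \mu_c:\quad
  F_{\theta^\star}(x) = \mu_c \quad \text{for all } x \in D_c,
\qquad \text{with the points } \{\mu_c\} \text{ mutually class-separable}.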
Toy problem demonstration

[Animation: feature space with x-axis: feature dim. #1 and y-axis: feature dim. #2, showing the data used to generate the classifier and its decision boundary, from the start to the end of training.]

A small-batch classifier works as a weak classifier with respect to the entire dataset. Small perturbations lead to a point-like distribution.
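The premise that a classifier fitted to one small batch is weak with respect to the entire dataset can be checked in a few lines of NumPy (our toy construction; a least-squares linear fit stands in for the weak classifier).

import numpy as np

rng = np.random.default_rng(0)
# Two overlapping 2-D Gaussian classes, 500 points each.
X = np.vstack([rng.normal(-1.0, 1.0, (500, 2)), rng.normal(+1.0, 1.0, (500, 2))])
y = np.array([0] * 500 + [1] * 500)

idx = rng.choice(len(X), size=8, replace=False)  # one small batch
A = np.hstack([X[idx], np.ones((8, 1))])         # affine design matrix
w, *_ = np.linalg.lstsq(A, 2.0 * y[idx] - 1.0, rcond=None)
pred = (np.hstack([X, np.ones((len(X), 1))]) @ w) > 0
print("full-set accuracy of the small-batch classifier:", (pred == (y == 1)).mean())
# Typically well below the full-data optimum, i.e. the classifier is weak, as assumed.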
Experiment #1: partial-dataset training

What we wish to confirm: do a full-dataset classifier and a partial-dataset classifier perform similarly for a given $F_{\theta^*}$?
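A sketch of the evaluation protocol under our assumptions (the helper functions and loader names are ours): freeze the trained feature extractor, train one classifier head on the full training set and one on a small subset, and compare their test errors.

import torch
import torch.nn as nn

def train_head(f_theta, loader, epochs=10, lr=0.1):
    # Fit a classifier on top of the fixed features F_theta*.
    head = nn.Linear(128, 10)
    opt = torch.optim.SGD(head.parameters(), lr=lr)
    for _ in range(epochs):
        for x, t in loader:
            with torch.no_grad():
                z = f_theta(x)  # the feature extractor stays frozen
            opt.zero_grad()
            nn.functional.cross_entropy(head(z), t).backward()
            opt.step()
    return head

def test_error(f_theta, head, loader):
    wrong = total = 0
    with torch.no_grad():
        for x, t in loader:
            wrong += (head(f_theta(x)).argmax(1) != t).sum().item()
            total += len(t)
    return wrong / total

# gap = test_error(f, train_head(f, full_loader), test_loader)
#       - test_error(f, train_head(f, partial_loader), test_loader)
# A small gap is the behavior the experiment looks for.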
Experiment #1: partial-dataset training

CIFAR-10 test error rates: the performance gap between a classifier trained with the large dataset and a classifier trained with a small dataset is large for the other methods but much smaller for FOCA, which is one indication of the point-like property. (The same, fixed feature extractor is used within each method.)
More experiments

… including:
• Approximate geodesic-distance measurements between large- and small-dataset solutions
• Low-dimensional analyses to further study the point-like property
Poster #28 tonight

What? Breaking co-adaptation between the feature extractor and the classifier.
How? By classifier anonymization.
Theory? Proved: features form a simple point-like distribution.
Reality? The point-like property is largely confirmed on real datasets.