Solo or Ensemble? Choosing a CNN Architecture for Melanoma Classification
Fábio Perez¹, Sandra Avila², Eduardo Valle¹
¹Recod Lab., DCA, FEEC, University of Campinas (UNICAMP)
²Recod Lab., IC, University of Campinas (UNICAMP)
ISIC Workshop @ CVPR 2019
Convolutional Neural Networks
- SotA for most computer vision problems, including skin lesion analysis
- Used by all winning submissions in the ISIC Challenges 2016, 2017, and 2018
CNN Architectures
AlexNet, ZFNet, VGG, GoogLeNet, Inception, ResNet, Inception-ResNet, DenseNet, ResNeXt, Xception, MobileNet, SqueezeNet, DualPathNet, SE-Net, NASNet, PNASNet
CNN Architectures in the ISIC Challenges
- 2016: ResNet
- 2017: ResNet, Inception
- 2018: ResNet, Inception, DenseNet, ResNeXt, PNASNet, DPN, SENet, ...
Transfer Learning
- The most critical factor for model performance
- SotA for most computer vision problems, including skin lesion analysis
- Also used by all ISIC Challenges winners
Valle et al. (2017). Data, Depth, and Design: Learning Reliable Models for Melanoma Screening. https://arxiv.org/abs/1711.00441
Menegola et al. (2017). Knowledge Transfer for Melanoma Screening with Deep Learning. https://arxiv.org/abs/1703.07479
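As a concrete illustration of the transfer-learning setup above, here is a minimal sketch in PyTorch/torchvision: load an ImageNet-pretrained backbone and swap its head for a melanoma classifier. The choice of ResNet-50, the optimizer, and the learning rate are assumptions for illustration, not the exact configuration used in the experiments.

```python
# Minimal transfer-learning sketch (assumed setup, not the authors' exact code):
# fine-tune an ImageNet-pretrained backbone for binary melanoma classification.
import torch
import torch.nn as nn
from torchvision import models

def build_melanoma_model(num_classes: int = 2) -> nn.Module:
    # Start from ImageNet weights: this is the transfer-learning step.
    model = models.resnet50(pretrained=True)
    # Replace the 1000-way ImageNet head with a melanoma classification head.
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model

model = build_melanoma_model()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    # One fine-tuning step: all layers are updated (full fine-tuning assumed).
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```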
Do better ImageNet models transfer better?
- Short answer: yes
- For multiple natural-image datasets
- For fine-tuning, fixed features, and random initialization
Kornblith et al. (2018). https://arxiv.org/abs/1805.08974
Do better ImageNet models transfer better?
[results figure from Kornblith et al. (2018), arxiv.org/abs/1805.08974]
How to predict model performance?
Experimental Design
- 9 architectures: DenseNet, Dual Path Nets, Inception-v4, Inception-ResNet-v2, MobileNetV2, PNASNet, ResNet, SENet, Xception
- × 5 splits of ISIC 2017 (1750 train / 500 validation / 500 test images)
- × 3 replicates
- = 135 experiments (enumerated in the sketch below)
Explored Factors
- Training factors: validation and test AUC, accuracy, sensitivity, and specificity; validation loss; # of epochs
- Architectural factors: Acc@1 on ImageNet, # of parameters, date of publication
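The slide implies relating each of these factors to final test performance across the 135 runs. Below is a hedged sketch of one way to do that, using Spearman rank correlation against test AUC; the statistic, the file name, and the column names are assumptions, not the authors' actual analysis code.

```python
# Sketch: correlate each explored factor with test AUC across the 135 runs.
# "experiments.csv" and the column names are hypothetical placeholders.
import pandas as pd
from scipy.stats import spearmanr

runs = pd.read_csv("experiments.csv")  # one row per training run (hypothetical file)

FACTORS = [
    "val_auc", "val_accuracy", "val_sensitivity", "val_specificity",
    "val_loss", "num_epochs",                               # training factors
    "imagenet_acc1", "num_parameters", "publication_year",  # architectural factors
]

for factor in FACTORS:
    rho, p_value = spearmanr(runs[factor], runs["test_auc"])
    print(f"{factor:>18}: Spearman rho = {rho:+.2f} (p = {p_value:.3f})")
```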
Results
Results (without MobileNetV2)
Kornblith et al. (2018) vs. Ours
- Datasets: multiple large datasets vs. ISIC 2017 (2750 images)
- Factors: one (Acc@1) vs. multiple
- Hyperparameters: tuned vs. "best-practice" values
- Splits: one per dataset vs. five
- Replicates: none vs. three
Ensembles
Creating the Ensembles
- 9 architectures × 3 replicates = 27 models per split
- For each split, ensemble the first 1, 2, ..., 27 models
- Two strategies for adding models: in random order, or models with the best validation AUC first (sketched below)
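A minimal sketch of this procedure for one split, assuming the per-model test probabilities, test labels, and validation AUCs are already available (random placeholders stand in for them here); it averages the melanoma probabilities of the first k models under both orderings.

```python
# Sketch of the ensembling procedure for one split (illustrative, not the
# authors' exact code): average melanoma probabilities over the first k
# models, for k = 1..27, under the two orderings described above.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Placeholders standing in for real outputs: 27 models x 500 test images.
n_models, n_test = 27, 500
test_labels = rng.integers(0, 2, size=n_test)     # ground-truth labels (0/1)
test_probs = rng.random(size=(n_models, n_test))  # per-model melanoma probabilities
val_aucs = rng.random(size=n_models)              # per-model validation AUC

def ensemble_curve(order):
    """Test AUC of the ensemble of the first k models, for k = 1..n_models."""
    aucs = []
    for k in range(1, n_models + 1):
        avg_prob = test_probs[order[:k]].mean(axis=0)  # average the probabilities
        aucs.append(roc_auc_score(test_labels, avg_prob))
    return aucs

random_curve = ensemble_curve(rng.permutation(n_models))       # strategy 1: random order
best_first_curve = ensemble_curve(np.argsort(val_aucs)[::-1])  # strategy 2: best val AUC first
```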
Results
Results (normalized)
Conclusions
- For the SotA models, performance on ImageNet does not necessarily translate to performance on melanoma detection
- Validation metrics correlate with test metrics much better than validation loss does
- Ensembles are needed for stable SotA performance; large ensembles perform well even when their members are simply picked at random from a pool of SotA individual models
Acknowledgments
RECOD (Reasoning for Complex Data), eScience, UNICAMP
Thanks!