(De)Constructing Bias on Skin Lesion Datasets A. Bissoto¹, M. Fornaciali², E. Valle², S. Avila¹ ¹RECOD Lab., IC, University of Campinas (UNICAMP) ²RECOD Lab., DCA, FEEC, University of Campinas (UNICAMP) ISIC Workshop @ CVPR 2019
RECOD Titans melanoma research 5 years 2014–2019
http://www.todayifoundout.com/index.php/2013/12/anti-tank-dogs-world-war-ii/
Bias Reproduced from: “Unbiased Look at Dataset Bias”, Torralba et al. (2011)
Confounders on Skin Lesion Datasets
➔ Vignetting (dark borders)
➔ Staining
➔ Rulers
➔ Color markers
Reproduced from: “An Overview of Melanoma Detection in Dermoscopy Images Using Image Processing and Machine Learning”, Mishra et al. (2016)
Bias
Play Down Performance: Legitimate (Overlooked?) Correlations
Inflate Performance: Spurious Correlations
Destruction Experiments, Construction Experiments
Datasets
Atlas of Dermoscopy
➔ Educational
➔ Rich Metadata
➔ Clinical and dermoscopic images for every case
➔ Clinical data (location, diameter, elevation)
➔ Metadata for dermoscopic features
ISIC Archive
➔ Large
➔ Diverse
➔ Different sources, different devices
➔ Segmentation masks for lesion (large subset)
➔ Segmentation masks for dermoscopic features (small subset)
Destruction Experiments
Image sets: Traditional, Only Skin, Bbox, Bbox70
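The slides name the image sets (Traditional, Only Skin, Bbox, Bbox70) but do not spell out how each is built. The sketch below is one plausible reading, not the authors' code: it assumes Only Skin blacks out the lesion pixels given by a segmentation mask, Bbox blacks out the lesion's bounding box, and Bbox70 grows that box until it covers at least 70% of the image. The function names, the `min_fraction` parameter, and the choice of black as the fill value are all hypothetical.

```python
import numpy as np

def lesion_bbox(mask):
    """Bounding box (top, bottom, left, right) of a non-empty mask; bottom/right exclusive."""
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    return rows[0], rows[-1] + 1, cols[0], cols[-1] + 1

def only_skin(image, mask):
    """Assumed 'Only Skin': black out every pixel inside the lesion mask."""
    out = image.copy()
    out[mask > 0] = 0
    return out

def bbox(image, mask):
    """Assumed 'Bbox': black out the lesion's whole bounding box."""
    t, b, l, r = lesion_bbox(mask)
    out = image.copy()
    out[t:b, l:r] = 0
    return out

def bbox70(image, mask, min_fraction=0.7):
    """Assumed 'Bbox70': grow the box until it hides at least min_fraction of the image."""
    h, w = mask.shape
    t, b, l, r = lesion_bbox(mask)
    while (b - t) * (r - l) < min_fraction * h * w:
        # Expand one pixel on each side, clipped to the image borders.
        t, l = max(t - 1, 0), max(l - 1, 0)
        b, r = min(b + 1, h), min(r + 1, w)
    out = image.copy()
    out[t:b, l:r] = 0
    return out
```

Here `image` is an H×W×3 array and `mask` an H×W array that is nonzero on the lesion; the Traditional set would simply be the unmodified `image`.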
Performance of machine learning with all cogent information removed on ISIC Archive: 71% AUC
Performance of 157 dermatologists¹ on ISIC Archive: 67% AUC
¹“The Melanoma Classification Benchmark”, Brinker et al. (2019)
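As a reminder of what the AUC figures above measure: the area under the ROC curve equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it by brute force over all positive/negative pairs. The function name and the toy labels and scores are illustrative only and are not data from the talk.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # A correctly ordered pair counts 1, a tie counts 0.5, a reversed pair counts 0.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: 3 of the 4 positive/negative pairs are ordered correctly.
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

Under this definition a model that scores every image identically gets 0.5, which is the baseline against which 71% and 67% should be read.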
Construction Experiments
Image sets: a) Traditional b) Grayscale Attributes c) RGB Attributes d) Traditional + Grayscale Attributes
Conclusions
➔ Machine learning results are probably optimistic
➔ Feeding the model with relevant dermoscopic attributes is worse than feeding it with “only skin” or “bbox” sets
➔ Solving the bias problem is critical for deploying automated skin lesion analysis to the real world
Team
Acknowledgments RECOD: reasoning for complex data
Thanks! ISIC Workshop @ CVPR 2019