
Towards Practical Problems in Deep Learning for Radiology Image Analysis



  1. Towards Practical Problems in Deep Learning for Radiology Image Analysis
     Quanzheng Li, Xiang Li, James H. Thrall
     Center for Clinical Data Science, Department of Radiology, Massachusetts General Hospital, Harvard Medical School

  2. Background Challenges Solution Performance Our Vision
     • Purpose of AI in medical imaging: creating value in the delivery of medical care and of radiology services by:
       – increasing diagnostic certainty
       – decreasing time on task for radiologists
       – making results available faster
       – reducing costs of care
     • Interrogating image data to extract maximum value, with or without a pre-defined model structure.
     • Accuracy, efficiency, robustness.

  3. • Automatically detect (i.e. screen for) the presence of free-air lesion regions in lung CT images.
     • Manual inspection of incoming medical images can be time-consuming and inefficient in handling life-threatening cases (such as pneumothorax).
     • Certain image abnormalities can be subtle to a human inspector, leading to potential mistakes in handling patients.
     • There is no systematic way of using learning-based methods for fully automatic screening.

  4. • Region with free air in the lung.
     • Can be present anywhere.
     • Appears as a low-HU (Hounsfield unit) area in the image.
     • Usually associated with other conditions.
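The low-intensity property can be illustrated with a simple threshold. This is a minimal sketch, assuming pixel values are already in Hounsfield units; the cutoff of -950 HU and both function names are hypothetical (air is roughly -1000 HU, but the slides give no threshold).

```python
AIR_HU_CUTOFF = -950  # hypothetical cutoff, not taken from the slides


def low_hu_mask(slice_hu, cutoff=AIR_HU_CUTOFF):
    """Return a binary mask marking pixels at or below the cutoff."""
    return [[1 if value <= cutoff else 0 for value in row] for row in slice_hu]


def low_hu_fraction(slice_hu, cutoff=AIR_HU_CUTOFF):
    """Fraction of pixels in the slice that fall at or below the cutoff."""
    mask = low_hu_mask(slice_hu, cutoff)
    total = sum(len(row) for row in mask)
    return sum(map(sum, mask)) / total if total else 0.0


# Toy 3 x 3 "slice" in HU: air-like values in the upper-left corner.
toy_slice = [
    [-1000, -980, -700],
    [-990, 40, 60],
    [-850, 30, 300],
]
mask = low_hu_mask(toy_slice)
```

A threshold alone cannot separate free air from normally aerated lung or from air outside the body, which is one reason a learned detector is used instead.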

  5. • Data Storage, Transfer, and Sharing: PACS; HDFS; cloud storage
     • Data Preparation and Quality Control: meta-information processing; format conversion; image enhancement/de-noising
     • Preprocessing for Analysis: patch extraction; data augmentation; network definition
     • Analysis: model training and validation; model testing; model application for practice
     • Postprocessing: heatmap generation; diagnosis; visualization
     ? Positive

  6. Challenges in Data Preparation and Quality Control:
     ? Heterogeneity
     ? Missing / erroneous data items
     ? Online vs. Offline

  7. • High image/feature heterogeneity combined with a lack of training samples makes over-fitting more likely.
     • Data items can be missing or wrong (e.g. DICOM header fields recorded during the scan).
     • Most sophisticated preprocessing techniques (e.g. image restoration, image segmentation) have to be run off-line, because they need group-wise information and/or ground truth.
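Missing or wrong header items can be caught with a validation pass before images enter the pipeline. The sketch below works on a plain dictionary rather than a real DICOM reader. The four field names are standard DICOM attribute keywords, but choosing them as the required set, and the `check_header` helper itself, are assumptions for illustration.

```python
REQUIRED_FIELDS = {  # hypothetical selection of header fields to validate
    "Modality": str,
    "Rows": int,
    "Columns": int,
    "SliceThickness": float,
}


def check_header(header):
    """Return a list of problems found in a header dict (empty if none)."""
    problems = []
    for name, expected_type in REQUIRED_FIELDS.items():
        value = header.get(name)
        if value is None or value == "":
            problems.append(f"{name}: missing")
        elif not isinstance(value, expected_type):
            problems.append(
                f"{name}: expected {expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
    # Range check: a type-correct value can still be physically impossible.
    thickness = header.get("SliceThickness")
    if isinstance(thickness, float) and thickness <= 0:
        problems.append("SliceThickness: must be positive")
    return problems


good = {"Modality": "CT", "Rows": 512, "Columns": 512, "SliceThickness": 2.5}
bad = {"Modality": "CT", "Rows": "512", "SliceThickness": -1.0}
```

Here `check_header(good)` returns an empty list, while `check_header(bad)` reports the wrongly typed `Rows`, the missing `Columns`, and the non-positive `SliceThickness`.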

  8. Challenges in Preprocessing for Analysis:
     ? Low speed and intensive I/O for patch extraction
     ? Lack of training data samples
     ? Arbitrary parameter / model structure

  9. • Deep learning models typically run on small image patches for increased sample size and better feature representation.
     [Figure: example normal control patches and pneumothorax patches]
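Patch extraction can be sketched as a sliding window over one 2D slice. The 36 × 36 patch size matches the figure reported later in the deck. The stride, the list-of-lists image representation, and the `extract_patches` name are assumptions, since the slides do not say how patches were sampled.

```python
def extract_patches(image, size=36, stride=36):
    """Yield (row, col, patch) for every full size x size window."""
    n_rows = len(image)
    n_cols = len(image[0]) if n_rows else 0
    for r in range(0, n_rows - size + 1, stride):
        for c in range(0, n_cols - size + 1, stride):
            # Slice `size` rows, then `size` columns out of each row.
            patch = [row[c:c + size] for row in image[r:r + size]]
            yield r, c, patch


# Toy 72 x 108 "slice" whose pixel value encodes its position.
toy = [[r * 108 + c for c in range(108)] for r in range(72)]
patches = list(extract_patches(toy))
overlapping = list(extract_patches(toy, stride=18))
```

With a stride smaller than the patch size the windows overlap, which multiplies the number of samples drawn from each slice. That is the sample-size benefit named in the bullet above.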

  10. Challenges in Analysis and Postprocessing:
      ? Computational time: large data size + complex models
      ? Need for real-time results

  11. • More complex models and deeper networks increase the computational load on the system.
      • Example: the Deep Residual Learning network 1, with up to 152 layers, was evaluated on the ImageNet 2012 dataset, which consists of 1000 classes and 1.28 million training images; variants with more than 1000 layers were explored on CIFAR-10.
      1 https://arxiv.org/abs/1512.03385

  12. • Given the large data size of most medical image types, high-performance computing is a crucial component of a practical, working solution.
      • Example: a typical CT image has more than 30 million voxels (512 × 512 × 120). The pneumothorax project dataset comprises imaging data from >600 subjects.
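The voxel count can be checked with a few lines of arithmetic. The 2 bytes per voxel (16-bit storage) is an assumption about the pixel format, and 600 is used only as the lower bound the slide gives for the number of subjects.

```python
rows, cols, slices = 512, 512, 120
voxels = rows * cols * slices            # voxels in one CT volume

bytes_per_voxel = 2                      # assumption: 16-bit integers
volume_mib = voxels * bytes_per_voxel / 2**20

subjects = 600                           # lower bound stated on the slide
dataset_gib = subjects * volume_mib / 1024

print(f"{voxels:,} voxels per volume")
print(f"{volume_mib:.1f} MiB per volume")
print(f"{dataset_gib:.1f} GiB for {subjects} volumes")
```

Under these assumptions one volume is about 31.5 million voxels and 60 MiB uncompressed, so the full dataset is in the tens of gigabytes before any patches are extracted.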

  13. • The dataset consists of 648 subjects with or without pneumothorax, 66 of whom are annotated.
      • The network was trained on 31 subjects, for a total of 21,540 patches of size 36 × 36.
      • A 16-layer, VGG-like 2D CNN is trained for lesion detection on two classes: pneumothorax vs. normal.
      • The whole pipeline takes DICOM images as input, generates a lesion heatmap, and provides a diagnosis score for the probability of pneumothorax.
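One plausible way to turn per-patch scores into a lesion heatmap is to average the scores of all patches covering each pixel. The slides do not describe how the heatmap is assembled, so the averaging rule, the `(row, col, score)` input format, and the `build_heatmap` name are all assumptions.

```python
def build_heatmap(shape, patch_scores, size=36):
    """Average per-patch scores over every pixel each patch covers.

    shape: (n_rows, n_cols) of the slice.
    patch_scores: iterable of (row, col, score), where (row, col) is the
    top-left corner of a size x size patch (hypothetical format).
    """
    n_rows, n_cols = shape
    total = [[0.0] * n_cols for _ in range(n_rows)]
    count = [[0] * n_cols for _ in range(n_rows)]
    for r, c, score in patch_scores:
        for i in range(r, min(r + size, n_rows)):
            for j in range(c, min(c + size, n_cols)):
                total[i][j] += score
                count[i][j] += 1
    # Pixels covered by no patch get a score of 0.0.
    return [
        [total[i][j] / count[i][j] if count[i][j] else 0.0
         for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Toy example: 4 x 6 slice, 2 x 2 patches, two of them overlapping.
scores = [(0, 0, 0.2), (0, 1, 0.8), (2, 4, 1.0)]
heatmap = build_heatmap((4, 6), scores, size=2)
```

In the toy example the pixel shared by the first two patches receives their mean score, while uncovered pixels stay at zero.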

  14. Challenges by pipeline stage:
      • Data Storage, Transfer, and Sharing:
        ? Security
        ? Privacy
        ? Transfer speed
      • Data Preparation and Quality Control:
        ? Heterogeneity
        ? Missing / erroneous data items
        ? Online vs. Offline
      • Preprocessing for Analysis:
        ? Low speed and intensive I/O for patch extraction
        ? Lack of training data samples
        ? Arbitrary parameter / model structure
      • Analysis:
        ? Computational time: large data size + complex models
      • Postprocessing:
        ? Integration into the workflow
        ? Real-time feedback
      Solutions:
      √ Co-development with high radiologist involvement
      √ Self-paced learning
      √ Asynchronous I/O
      √ HPC platform supported by DGX1
      √ CUDA implementation

  15. • Intensive involvement of radiologists during the training phase addresses data heterogeneity and under-coverage of training samples.
      • Four types of misclassification were identified (3 false-positive, 1 false-negative):
        – Extra small pneumothorax lesions (mainly caused by the image view).
        – Empyema.
        – Imaging artifacts (e.g. dark strips).
        – Irregular shapes of the trachea and its branches.

  16. • Self-paced learning scheme to further increase the sample size.
      [Figure: spCNN, closed-loop, multiple rounds of training. Bootstrapping module: CNN 1,i … CNN k,i, retrained by bootstrapping on the original samples and the virtual samples from round i-1. Classification module: new CNN at round i, retrained on the original samples. The CNNs are applied to new, unlabeled data to obtain the distribution of prediction probability for virtual sample selection, and to the dataset for analysis to obtain classification results and diagnosis.]
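The virtual-sample step can be sketched as confidence-based selection over an ensemble of bootstrapped models. The slides do not state the selection rule, so the unanimous-agreement criterion, the confidence schedule, and both function names are assumptions. Retraining is left as a comment.

```python
def select_virtual_samples(ensemble_probs, confidence):
    """Pick unlabeled samples on which every model agrees confidently.

    ensemble_probs: {sample_id: [p_1, ..., p_k]}, the pneumothorax
    probability from each of k bootstrapped models (hypothetical format).
    Returns {sample_id: pseudo_label}.
    """
    selected = {}
    for sample_id, probs in ensemble_probs.items():
        if all(p >= confidence for p in probs):
            selected[sample_id] = 1
        elif all(p <= 1.0 - confidence for p in probs):
            selected[sample_id] = 0
    return selected


def self_paced_rounds(ensemble_probs, schedule=(0.95, 0.9, 0.8)):
    """Relax the confidence requirement round by round (easy samples first)."""
    labeled = {}
    for confidence in schedule:
        remaining = {k: v for k, v in ensemble_probs.items() if k not in labeled}
        labeled.update(select_virtual_samples(remaining, confidence))
        # A real loop would retrain the CNNs here and recompute ensemble_probs.
    return labeled


probs = {
    "a": [0.99, 0.97],  # confidently positive
    "b": [0.03, 0.01],  # confidently negative
    "c": [0.85, 0.90],  # accepted only once the requirement is relaxed
    "d": [0.60, 0.20],  # models disagree, never selected
}
virtual = self_paced_rounds(probs)
```

Samples on which the models disagree are never promoted to virtual samples, which limits how much label noise is fed back into training.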

  17. • Tested on 35 subjects; patch-wise accuracy: 93.9%.
      • Subject-wise accuracy is calculated by counting the number of detected patches per subject and thresholding that count.
      [Figure: subject-wise ROC curve; x-axis: false positive rate, y-axis: true positive rate]
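The count-then-threshold rule and the resulting ROC curve can be sketched as follows. The patch-level cutoff of 0.5 and the toy counts and labels are made up for illustration, and the function names are hypothetical. Sweeping the count threshold produces one (false positive rate, true positive rate) point per setting.

```python
def detected_patch_count(patch_probs, patch_threshold=0.5):
    """Number of patches in one subject classified as pneumothorax."""
    return sum(1 for p in patch_probs if p >= patch_threshold)


def roc_points(counts, labels):
    """Sweep the count threshold; return sorted [(fpr, tpr), ...].

    counts: detected-patch count per subject.
    labels: 1 for pneumothorax, 0 for normal; both classes must be present.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    # One threshold per distinct count, plus one above the maximum.
    for t in sorted(set(counts) | {max(counts) + 1}):
        tp = sum(1 for c, y in zip(counts, labels) if c >= t and y == 1)
        fp = sum(1 for c, y in zip(counts, labels) if c >= t and y == 0)
        points.append((fp / negatives, tp / positives))
    return sorted(points)


# Toy data: five subjects, three with pneumothorax.
counts = [12, 7, 0, 3, 1]
labels = [1, 1, 0, 1, 0]
points = roc_points(counts, labels)
```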

  18. • Although detection is performed on each slice (i.e. with a 2D network), the detected lesion boundary is stable across slices.

  19. • Detection (i.e. generating the heatmap) for a single subject takes less than 3 minutes.
      • Enabled by the computing power of DGX1: 50 times faster than a single K40, 10 times faster than a single P100.
      • The most time-consuming step is patch extraction; making more of the I/O asynchronous will help.
      • The detection speed is on the same time scale as a typical CT scan (minutes), which enables real-time screening of patients.
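Overlapping patch loading with model scoring is one way to hide I/O latency. The sketch below uses a thread pool from the Python standard library. `load_patches` and `score_patches` are hypothetical stand-ins for the real extraction and CNN steps, whose implementation the slides do not describe.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def load_patches(slice_index):
    """Stand-in for slow, I/O-bound patch extraction (hypothetical)."""
    time.sleep(0.01)
    return [slice_index] * 4


def score_patches(patches):
    """Stand-in for the CNN forward pass (hypothetical)."""
    return sum(patches)


def run_pipeline(n_slices, workers=4):
    """Load slices in background threads while earlier ones are scored."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() submits every load up front and yields results in order,
        # so scoring slice i overlaps with loading the slices after it.
        return [score_patches(p) for p in pool.map(load_patches, range(n_slices))]


results = run_pipeline(5)
```

Threads suit this case because the loading step mostly waits on disk or network; a CPU-bound extraction step would need processes or native code instead.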
