Computer Aided Detection (CAD) for 3D Breast Imaging and GPU Technology
Xiangwei Zhang, Haili Chui
Imaging and CAD Science, Hologic Inc., Santa Clara, CA
03/19/2015
Summary
- Computer aided detection (CAD) of breast cancer in 3D digital breast tomosynthesis: an introduction;
- GPU kernel optimization: a case study of convolution filtering;
- GPU optimization in DBT CAD: GPU/CPU data copy; GPU memory management;
- Conclusions/Questions
Cancer Incidence/Mortality Rates (USA)

Disease Type        Incidence   Mortality
Lung Cancer         169,400     154,900
Colorectal Cancer   148,300     56,600
Breast Cancer       205,000     40,000
Prostate Cancer     189,000     30,200

Source: American Cancer Society, 2001.
Computer Aided Detection (CAD)
- Early detection of cancer is the key to reducing the mortality rate;
- Medical imaging can help with the early detection of cancer: breast X-ray mammography, chest X-ray, lung CT, colonoscopy, brain MRI, etc.;
- Interpreting the images to find signs of cancer is very challenging for radiologists;
- Automated processing using computer software supports radiologists in clinical decision making;
- Various image analysis software exists, including computer aided detection (CAD) and computer aided diagnosis (CADx);
Medical Imaging Applications
[Figure: example images of lung CT, breast mammography, chest X-ray, and colonoscopy.]
Micro-calcifications in Digital Mammography
[Figure: micro-calcifications in a digital mammogram.]
CAD for 2D Mammography
- Each patient/examination has 4 views (left/right breast, CC/MLO views), so there are four 2D images to be processed;
- CAD generates marker overlays (triangle: micro-calcification cluster; star: mass density or spiculation/architectural distortion);
2D Mammography CAD Processing Flow
- Pre-processing: pixel value transformation (log, invert, scaling);
- Segmentation: breast, pectoral muscle (MLO view), roll-off region;
- Suspicious candidate generation: filtering (general and dedicated), region growing;
- Region analysis/classification: feature extraction/selection, classification;
- It takes ~10 seconds/view to complete (pure CPU implementation);
Digital Breast Tomosynthesis (DBT)
- Acquisition: multiple 2D projection views (PVs), typically 11 to 15, are acquired at different angles; the angular span is limited (15 to 30 degrees) to obtain high in-plane resolution; each projection uses a much lower dose than 2D mammography;
- Reconstruction: back projection is used to reconstruct a 3D volume with a 1 mm slice interval; a volume usually consists of 40 to 80 slices (1560x2457 pixels/slice);
- Advantage (vs 2D mammogram): reduced tissue overlap reveals 3D anatomical structures hidden in 2D;
- Disadvantage: much more data to interpret and store;
DBT Acquisition and Reconstruction
[Diagram: an X-ray tube sweeps about a center of rotation above the compression paddle and compressed breast, acquiring projection views PV 1, PV 2, ..., PV m on a digital detector; the PVs are reconstructed into a stack of slices.]
DBT CAD Processing Flow
- Slice-by-slice processing (similar to the first three steps in 2D CAD);
- 3D region growing;
- 3D region analysis/classification;
- Prototype (2007): pure CPU implementation; it takes ~10 minutes/view to complete, which is clinically unacceptable;
- What can we do to speed this up? CUDA computation on a GPGPU;
GPU Kernel Performance Optimization
- Key requirements for good GPU kernel performance: sufficient parallelism; efficient memory access; efficient instruction execution;
- Efficient memory access: a case study of 1D convolution on a 2D image with different implementations: CPU; GPU global memory; texture memory; shared memory;
GPU CUDA Memory Space
[Diagram: CUDA memory hierarchy. Each thread has its own registers and local memory; each block of a grid has shared memory; all blocks share the device-wide global, constant, and texture memories, which the host can also access.]
GPU Global Memory Access Optimization
- GPU global memory is DRAM: high latency, and not necessarily cached;
- Many algorithms are memory-limited, or at least somewhat sensitive to memory bandwidth: their ratio of arithmetic operations to memory accesses is low;
- Optimization goal: maximize bandwidth utilization;
- Memory accesses are issued per warp (a warp is 32 consecutive threads in a single block) and served in discrete chunks (lines of 128 bytes; segments of 32 bytes);
- The key is to have sufficient concurrent memory access per warp;
Efficient GPU Memory Access
[Diagram: two examples of efficient access. The 32 addresses from a warp fall within one aligned 128-byte region of the memory address range shown (0 to 448), so the warp is served by a single 4-segment transaction.]
Not Efficient GPU Memory Access
[Diagram: an example of inefficient access. The addresses from a warp are spread across or misaligned with the 128-byte lines, so multiple transactions are issued and part of each fetched chunk is wasted.]
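To make the two access patterns concrete, here is a minimal CUDA sketch (kernel and variable names are ours, not from the slides) contrasting a coalesced copy, where each warp touches one aligned 128-byte line, with a strided copy, where each thread lands in a different segment:

```cuda
// Coalesced: consecutive threads read consecutive addresses.
__global__ void coalescedCopy(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i];  // one warp is served by one aligned 128-byte line
}

// Strided: consecutive threads read addresses `stride` elements apart.
__global__ void stridedCopy(const float* in, float* out, int n, int stride)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i * stride < n)
        out[i * stride] = in[i * stride];  // each thread hits a different 32-byte segment
}
```

With a large stride, every thread of the warp triggers its own memory transaction, and most of each fetched chunk is thrown away.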
A Case Study: 1D Vertical Convolution on a 2D Image
- Convolution: for each pixel of interest, the new pixel value is the weighted sum of the pixel values in a defined local neighborhood (here, a vertical 1D neighborhood);
[Figure: a pixel of interest X in a 2D image, with its local vertical neighborhood marked.]
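In symbols (notation ours): writing f for the input image, w for the kernel weights, and r for the kernel half-width, the vertical 1D convolution computes

    g(x, y) = \sum_{k=-r}^{r} w_k \, f(x, y + k)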
Running Time Comparison: CPU and GPU Platform
- Host: Dell Precision 7500; CPU: dual Intel Xeon dual-core @3.07GHz/@3.06GHz; RAM: 16.0GB;
- Device: GeForce GTX 690 (dual card); 3072 CUDA cores (1536x2), 16 SMs; 4GB 512-bit GDDR5; PCI Express 3.0 x16; CUDA 5.5;
- OS: Windows 7 Professional Service Pack 1, 2009;
CPU Based: Two Ways of Serial Processing
- Assumption: the 2D image is stored in a contiguous linear memory space;
- Left: row-wise traversal; right: column-wise traversal (see the sketch below);
[Figure: the two traversal orders over the 2D image.]
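A host-side sketch of the two loop orders (process() is a hypothetical stand-in for the per-pixel convolution work; all names are ours):

```cuda
void process(float v);  // hypothetical per-pixel work

// The image is stored row-major: pixel (x, y) lives at img[y * width + x].

// Row-wise (horizontal): consecutive iterations touch consecutive
// addresses, so cache lines and hardware prefetching are used well.
void rowWise(const float* img, int width, int height)
{
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            process(img[y * width + x]);
}

// Column-wise (vertical): consecutive iterations jump by `width`
// elements, so each access is likely a cache miss.
void columnWise(const float* img, int width, int height)
{
    for (int x = 0; x < width; ++x)
        for (int y = 0; y < height; ++y)
            process(img[y * width + x]);
}
```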
CPU Based: Two Ways of Serial Processing (Results)
- The speeds are different, due to the linear memory structure and data caching;
- Vertical (column-wise) = 146.59 ms/run; horizontal (row-wise) = 104.19 ms/run;
Global Memory Based: Thread-Block Design 1
- CUDA implementation: the whole image is divided into multiple vertical-bar-shaped thread-blocks (1x128);
[Figure: a vertical 1x128 CUDA thread-block over the 2D image, with a pixel of interest X and its local neighborhood.]
Global Memory Based: Thread-Block Design 2
- CUDA implementation: the whole image is divided into multiple horizontal-bar-shaped thread-blocks (128x1); a kernel sketch for this design follows;
[Figure: a horizontal 128x1 CUDA thread-block over the 2D image.]
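A minimal sketch of the design-2 kernel (KRADIUS, d_weights, and all names are our assumptions, not the authors' code). With 128x1 blocks, the threads of a warp map to adjacent x positions, so each read of a row is one coalesced line of global memory:

```cuda
#define KRADIUS 8  // kernel half-width; an assumed value, not from the slides

__constant__ float d_weights[2 * KRADIUS + 1];  // convolution weights

__global__ void convV_global(const float* in, float* out, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;  // adjacent threads -> adjacent x
    int y = blockIdx.y;                             // one image row per block row
    if (x >= width) return;

    float sum = 0.0f;
    for (int k = -KRADIUS; k <= KRADIUS; ++k) {
        int yy = min(max(y + k, 0), height - 1);    // clamp at the image border
        sum += d_weights[k + KRADIUS] * in[yy * width + x];
    }
    out[y * width + x] = sum;
}
```

It would be launched with, e.g., dim3 block(128, 1) and dim3 grid((width + 127) / 128, height). The 1x128 vertical design is the same kernel with x and y roles swapped, which makes every global read strided by `width`.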
Global Memory Based: Vertical vs Horizontal (Results)
- The speeds are quite different, due to the linear memory structure and concurrent aligned reading within a warp;
- Vertical = 13.325 ms/run; horizontal = 1.652 ms/run;
- The horizontal version is a ~60x speedup over the CPU version;
Texture Memory Based Version
- Texture memory: a read-only cache; good for scattered reads; the cache granularity is 32 bytes (one segment);
- Two different thread-block shapes: vertical (1x128); horizontal (128x1); a kernel sketch follows;
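A hedged sketch of the texture variant, reusing KRADIUS and d_weights from the global-memory sketch. We assume the cudaTextureObject_t is created on the host with cudaCreateTextureObject over the input image, with clamp addressing and unnormalized coordinates:

```cuda
__global__ void convV_texture(cudaTextureObject_t texIn, float* out,
                              int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    float sum = 0.0f;
    for (int k = -KRADIUS; k <= KRADIUS; ++k)
        // The texture cache absorbs the scattered vertical reads that
        // penalize the plain global-memory version; clamp addressing
        // handles the image border, and +0.5f centers on the texel.
        sum += d_weights[k + KRADIUS] *
               tex2D<float>(texIn, x + 0.5f, y + k + 0.5f);
    out[y * width + x] = sum;
}
```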
Texture Memory Based: Vertical vs Horizontal (Results)
- The speeds are different: vertical = 2.507 ms/run; horizontal = 1.707 ms/run;
- Horizontal: comparable to the global memory version;
- Vertical: much better than the global memory version (the texture cache is better at scattered reads);
Shared Memory Based Version
- Shared memory: a read/write cache in each SM; low latency compared to global memory, or even texture memory;
- When used as a read cache, the original data still needs to be loaded from global memory first;
- Two different thread-block shapes: vertical (1x768); horizontal (32x24);
Shared Memory Based: Thread-Block Design 1
- CUDA implementation: the whole image is divided into multiple vertical-bar-shaped thread-blocks (1x768);
[Figure: a vertical 1x768 CUDA thread-block over the 2D image.]
Shared Memory Based: Thread-Block Design 2
- CUDA implementation: the whole image is divided into multiple 2D tile-shaped thread-blocks (32x24 = 768 threads; the "horizontal" configuration); a sketch of this design follows;
[Figure: a 32x24 CUDA thread-block tile over the 2D image.]
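A sketch of the 32x24 tile design; the tile sizes, halo handling, and names are our reconstruction of the idea, not the authors' code (KRADIUS and d_weights are reused from the earlier sketches):

```cuda
#define TILE_X 32
#define TILE_Y 24

__global__ void convV_shared(const float* in, float* out, int width, int height)
{
    // The tile plus its vertical halo of KRADIUS rows on each side.
    __shared__ float tile[TILE_Y + 2 * KRADIUS][TILE_X];

    int x  = blockIdx.x * TILE_X + threadIdx.x;
    int y0 = blockIdx.y * TILE_Y;

    // Cooperatively stage the tile and halo; each load is a coalesced
    // row read of 32 consecutive floats from global memory.
    for (int row = threadIdx.y; row < TILE_Y + 2 * KRADIUS; row += TILE_Y) {
        int yy = min(max(y0 + row - KRADIUS, 0), height - 1);  // clamp border
        if (x < width)
            tile[row][threadIdx.x] = in[yy * width + x];
    }
    __syncthreads();

    int y = y0 + threadIdx.y;
    if (x >= width || y >= height) return;

    float sum = 0.0f;
    for (int k = -KRADIUS; k <= KRADIUS; ++k)  // all reads now hit shared memory
        sum += d_weights[k + KRADIUS] * tile[threadIdx.y + KRADIUS + k][threadIdx.x];
    out[y * width + x] = sum;
}
```

Launched with dim3 block(TILE_X, TILE_Y), each pixel of the tile is read from global memory roughly once, and the (2*KRADIUS + 1) neighborhood reads come from low-latency shared memory.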
Shared Memory Based: Vertical vs Horizontal (Results)
- The speeds are quite different: vertical = 3.725 ms/run; horizontal = 1.084 ms/run;
- Horizontal: better than both the global and texture memory versions;
- Vertical: better than global memory, worse than texture memory;