
Accelerating MI-Based B-Spline Registration Using CUDA Enabled GPUs - PowerPoint PPT Presentation



  1. Massachusetts General Hospital Radiation Oncology. Accelerating MI-Based B-Spline Registration Using CUDA Enabled GPUs. James Shackleford (1), Nagarajan Kandasamy (2), Gregory C. Sharp (1). (1) Massachusetts General Hospital, Radiation Oncology; (2) Drexel University, Electrical and Computer Engineering.

  2. Slide 2 of 33: Introduction: What is Deformable Registration? [Figure: fixed image and moving image.]

  3. Slide 3 of 33: Introduction: What is Deformable Registration? [Figure: fixed image and moving image.]

  4. Slide 4 of 33: Introduction: What is Deformable Registration? [Figure: fixed image, moving image, and deformation vector field.]

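  5. Slide 5 of 33: Method: B-Spline Grid Parameterization. [Figure: control-point grid. Labels: $P_X$, $P_Y$ (parameter coefficients); $\beta_X$, $\beta_Y$ (parameter weights); regional influence.]

  6. Slide 6 of 33: $v_X = (\beta_X \beta_Y) P_X$ and $v_Y = (\beta_X \beta_Y) P_Y$. [Figure: control-point grid with the vector components $v_X$, $v_Y$. Labels: $P_X$, $P_Y$ (parameter coefficients); $\beta_X$, $\beta_Y$ (parameter weights); regional influence.]

  7. Slide 7 of 33: 16 contributions: $v_X = \sum_{j=1}^{4} \sum_{i=1}^{4} \beta_{X,i} \beta_{Y,j} P_{X,i,j}$ and $v_Y = \sum_{j=1}^{4} \sum_{i=1}^{4} \beta_{X,i} \beta_{Y,j} P_{Y,i,j}$. [Figure: control-point grid. Labels: $P_X$, $P_Y$ (parameter coefficients); $\beta_X$, $\beta_Y$ (parameter weights); regional influence.]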

  8. Slide 8 of 33: [Flowchart of the registration loop. Stages: Decompress Vector Field; Correspondence and Cost (fixed image F, moving image M); Change in Cost w.r.t. Vectors; Change in Cost w.r.t. Coefficients; Quasi-Newtonian Optimizer. Quantities passed between stages: $v$, $C$, $\partial C / \partial v$, $\partial C / \partial P$, new $P$.]

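  9. Slide 9 of 33: $C = H(F) + H(M) - H(F,M)$, computed from the histograms as $C = \frac{1}{N} \sum_{j=1}^{B_M} \sum_{i=1}^{B_F} h_j(i,j) \ln \frac{N \times h_j(i,j)}{h_M(j) \times h_F(i)}$. [Figure: joint histogram with axes fixed image value (F) and moving image value (M); entropy diagram labeled $H(F)$, $H(M)$, $H(F \mid M)$, $H(M \mid F)$, $H(F,M)$.]

  10. Slide 10 of 33: $\frac{\partial C}{\partial P} = \frac{\partial C}{\partial v} \times \frac{\partial v}{\partial P}$, with $\frac{\partial C}{\partial v} = \frac{\partial C}{\partial h} \times \frac{\partial h}{\partial v}$. Summed over the four neighbors: $\frac{\partial C}{\partial v} = \sum_{n=1}^{4} \left. \frac{\partial C}{\partial h} \right|_{x_n} \times \frac{\partial w_n}{\partial v}$, where $\left. \frac{\partial C}{\partial h} \right|_{x_n} = \ln \frac{N \times h_j(i_n, j_n)}{h_M(j_n) \times h_F(i_n)} - C$. [Figure: histograms (# of voxels vs. intensity) of the static image F and the moving image M; nearest neighbors 1-4 and partial volumes A-D.]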

  11. Slide 11 of 33: Serial Implementation: Following a Single Thread

  12. Slide 12 of 33: Generate Histograms: compute the vector for each voxel; get the corresponding voxels in the moving image; compute partial volumes; use partial volumes for the moving and joint histograms. [Figure: joint histogram with axes fixed image intensity and moving image intensity; F and M; nearest neighbors 1-4 and partial volumes A-D.]

  13. Slide 13 of 33: Generate Histograms: compute the vector for each voxel; get the corresponding voxels in the moving image; compute partial volumes; use partial volumes for the moving and joint histograms. Compute Score: simply cycle thru the histograms; traditional serial CPU is very fast (time required is negligible): $C = \frac{1}{N} \sum_{j=1}^{B_M} \sum_{i=1}^{B_F} h_j(i,j) \ln \frac{N \times h_j(i,j)}{h_M(j) \times h_F(i)}$. [Figure: joint histogram with axes fixed image intensity and moving image intensity; F and M; nearest neighbors 1-4 and partial volumes A-D.]

  14. Slide 14 of 33: Generate Histograms: compute the vector for each voxel; get the corresponding voxels in the moving image; compute partial volumes; use partial volumes for the moving and joint histograms. Compute Score: simply cycle thru the histograms; traditional serial CPU is very fast (time required is negligible): $C = \frac{1}{N} \sum_{j=1}^{B_M} \sum_{i=1}^{B_F} h_j(i,j) \ln \frac{N \times h_j(i,j)}{h_M(j) \times h_F(i)}$. Compute Gradient (change in cost as the vector changes): get the vector for each voxel; get the corresponding voxels in the moving image; compute partial volume derivatives: $\frac{\partial C}{\partial v} = \sum_{n=1}^{4} \left. \frac{\partial C}{\partial h} \right|_{x_n} \times \frac{\partial w_n}{\partial v}$, where $\left. \frac{\partial C}{\partial h} \right|_{x_n} = \ln \frac{N \times h_j(i_n, j_n)}{h_M(j_n) \times h_F(i_n)} - C$. Next: $\frac{\partial C}{\partial P} = \frac{\partial C}{\partial v} \times \frac{\partial v}{\partial P}$. [Figure: joint histogram with axes fixed image intensity and moving image intensity; F and M; nearest neighbors 1-4 and partial volumes A-D.]

  15. Slide 15 of 33: [Flowchart of the registration loop. Stages: Decompress Vector Field; Correspondence and Cost (fixed image F, moving image M); Change in Cost w.r.t. Vectors; Change in Cost w.r.t. Coefficients; Quasi-Newtonian Optimizer. Quantities passed between stages: $v$, $C$, $\partial C / \partial v$, $\partial C / \partial P$, new $P$.]

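  16. Slide 16 of 33: Change in Cost w.r.t. Coefficients: $v_X = \sum_{j=1}^{4} \sum_{i=1}^{4} \beta_{X,i} \beta_{Y,j} P_{X,i,j}$, so $\frac{\partial C}{\partial P} = \frac{\partial C}{\partial v} \frac{\partial v}{\partial P} = \sum \frac{\partial C}{\partial v} \sum_{j=1}^{4} \sum_{i=1}^{4} \beta_{X,i} \beta_{Y,j}$. [Figure: control-point grid with weights $\beta_X$, $\beta_Y$ over a 4 x 4 block of tiles numbered 1-16; fixed image F.]

  17. Slide 17 of 33: Change in Cost w.r.t. Coefficients: $v_X = \sum_{j=1}^{4} \sum_{i=1}^{4} \beta_{X,i} \beta_{Y,j} P_{X,i,j}$, so $\frac{\partial C}{\partial P} = \frac{\partial C}{\partial v} \frac{\partial v}{\partial P} = \sum \frac{\partial C}{\partial v} \sum_{j=1}^{4} \sum_{i=1}^{4} \beta_{X,i} \beta_{Y,j}$. [Figure: control-point grid with weights $\beta_X$, $\beta_Y$ over a 4 x 4 block of tiles numbered 1-16; fixed image F.]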

  18. Slide 18 of 33: Parallelization: Leveraging GPUs, OpenMP, etc.

  19. Slide 19 of 33: What do we parallelize? [Flowchart of the registration loop with four stages marked ✓ and two marked ✗. Stages: Decompress Vector Field; Correspondence and Cost (fixed image F, moving image M); Change in Cost w.r.t. Vectors; Change in Cost w.r.t. Coefficients; Quasi-Newtonian Optimizer. Quantities passed between stages: $v$, $C$, $\partial C / \partial v$, $\partial C / \partial P$, new $P$.]

  20. Slide 20 of 33: [Figure: processing stages. Compute Vector from Coeff; Compute Histograms (F, M); Cycle Hist; Cost (MI); Compute Change in Cost w.r.t. Vector ($\partial C / \partial v$).]

  21. Slide 21 of 33: Generate Histograms: compute the vector for each voxel; get the corresponding voxels in the moving image; compute partial volumes; use partial volumes for the moving and joint histograms. Compute Score: simply cycle thru the histograms; traditional serial CPU is very fast (time required is negligible): $C = \frac{1}{N} \sum_{j=1}^{B_M} \sum_{i=1}^{B_F} h_j(i,j) \ln \frac{N \times h_j(i,j)}{h_M(j) \times h_F(i)}$. Compute Gradient (change in cost as the vector changes): get the vector for each voxel; get the corresponding voxels in the moving image; compute partial volume derivatives: $\frac{\partial C}{\partial v} = \sum_{n=1}^{4} \left. \frac{\partial C}{\partial h} \right|_{x_n} \times \frac{\partial w_n}{\partial v}$, where $\left. \frac{\partial C}{\partial h} \right|_{x_n} = \ln \frac{N \times h_j(i_n, j_n)}{h_M(j_n) \times h_F(i_n)} - C$. Next: $\frac{\partial C}{\partial P} = \frac{\partial C}{\partial v} \times \frac{\partial v}{\partial P}$. [Figure: joint histogram with axes fixed image intensity and moving image intensity; F and M; nearest neighbors 1-4 and partial volumes A-D.]

  22. Slide 22 of 33: Change in Cost w.r.t. Coefficients: $v_X = \sum_{j=1}^{4} \sum_{i=1}^{4} \beta_{X,i} \beta_{Y,j} P_{X,i,j}$, so $\frac{\partial C}{\partial P} = \frac{\partial C}{\partial v} \frac{\partial v}{\partial P} = \frac{\partial C}{\partial v} \sum_{j=1}^{4} \sum_{i=1}^{4} \beta_{X,i} \beta_{Y,j}$. [Figure: two 4 x 4 blocks of tiles numbered 1-16 with weights $\beta_X$, $\beta_Y$; fixed image F; rows of per-tile contributions 1, 2, 3, 4, 5, ..., 16; label CPU 1.]

  23. Slide 23 of 33: Change in Cost w.r.t. Coefficients: $v_X = \sum_{j=1}^{4} \sum_{i=1}^{4} \beta_{X,i} \beta_{Y,j} P_{X,i,j}$, so $\frac{\partial C}{\partial P} = \frac{\partial C}{\partial v} \frac{\partial v}{\partial P} = \frac{\partial C}{\partial v} \sum_{j=1}^{4} \sum_{i=1}^{4} \beta_{X,i} \beta_{Y,j}$. [Figure: two 4 x 4 blocks of tiles numbered 1-16 with weights $\beta_X$, $\beta_Y$; fixed image F; rows of per-tile contributions 1, 2, 3, 4, 5, ..., 16; labels CPU 1 and CPU 2.]

  24. Slide 24 of 33: Constant Control Point Spacing (15 x 15 x 15): 16x speedup, 30 min → 1.8 min. References: J. Shackleford, N. Kandasamy, and G. Sharp, "Deformable Volumetric Registration using B-splines," GPU Computing Gems: Emerald Edition, Morgan Kaufmann Pub, 2011. J. Shackleford, N. Kandasamy, and G. Sharp, "On developing B-spline registration algorithms for multi-core processors," Physics in Medicine and Biology, vol. 55, p. 6329, 2010.

  25. Slide 25 of 33: Constant Volume Size (256 x 256 x 256). References: J. Shackleford, N. Kandasamy, and G. Sharp, "Deformable Volumetric Registration using B-splines," GPU Computing Gems: Emerald Edition, Morgan Kaufmann Pub, 2011. J. Shackleford, N. Kandasamy, and G. Sharp, "On developing B-spline registration algorithms for multi-core processors," Physics in Medicine and Biology, vol. 55, p. 6329, 2010.

  26. Slide 26 of 33: Histogram Computation: Leveraging GPUs, OpenMP, etc. [Figure: OpenMP and CUDA histogram construction. Labels: thread-level histograms (shared memory) + block-level histograms (global memory); complete histograms (global memory).]

  27. Slide 27 of 33: Histogram Computation: Leveraging GPUs, OpenMP, etc. [Figure: OpenMP and CUDA histogram construction. Labels: block; thread-level histograms (shared memory) + block-level histograms (global memory); complete histogram (global memory).]
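The "16 contributions" sum on slide 7 can be sketched in a few lines of Python. This is only an illustration, not the presenters' code: the slides do not give the basis functions, so the standard uniform cubic B-spline basis is assumed here, and the names `bspline_weights` and `displacement` and the `[j][i]` patch layout are hypothetical.

```python
def bspline_weights(u):
    """Four uniform cubic B-spline basis values at offset u in [0, 1).

    Assumption: the slides' beta weights are the standard cubic basis.
    """
    return [
        (1.0 - u) ** 3 / 6.0,
        (3.0 * u**3 - 6.0 * u**2 + 4.0) / 6.0,
        (-3.0 * u**3 + 3.0 * u**2 + 3.0 * u + 1.0) / 6.0,
        u**3 / 6.0,
    ]


def displacement(p_x, p_y, u_x, u_y):
    """Sum the 16 weighted coefficients that influence one pixel (2D case).

    p_x, p_y: 4x4 coefficient patches indexed [j][i] (hypothetical layout).
    u_x, u_y: the pixel's normalized offset inside its tile.
    """
    beta_x = bspline_weights(u_x)
    beta_y = bspline_weights(u_y)
    v_x = sum(beta_x[i] * beta_y[j] * p_x[j][i] for j in range(4) for i in range(4))
    v_y = sum(beta_x[i] * beta_y[j] * p_y[j][i] for j in range(4) for i in range(4))
    return v_x, v_y


# Usage: a constant coefficient patch yields that same constant displacement,
# because the 16 products of weights sum to 1.
patch = [[2.0] * 4 for _ in range(4)]
v_x, v_y = displacement(patch, patch, 0.25, 0.75)
```

Because the weights depend only on a pixel's offset inside its tile, they could be precomputed once per offset and reused across tiles of equal size.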
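The "Compute Score" step on slide 13 cycles through the histograms to evaluate the mutual information cost. The sketch below shows that loop in Python under simplifying assumptions: pixels are already mapped to bin indices, each pixel lands in exactly one bin (no partial volumes), and the function name `mutual_information` is hypothetical.

```python
import math


def mutual_information(fixed_bins, moving_bins, n_bins_f, n_bins_m):
    """C = (1/N) * sum_j sum_i h_j(i,j) * ln(N * h_j(i,j) / (h_M(j) * h_F(i))).

    fixed_bins, moving_bins: per-pixel bin indices of corresponding pixels.
    """
    n = len(fixed_bins)
    h_f = [0] * n_bins_f
    h_m = [0] * n_bins_m
    h_j = [[0] * n_bins_m for _ in range(n_bins_f)]
    for i, j in zip(fixed_bins, moving_bins):
        h_f[i] += 1
        h_m[j] += 1
        h_j[i][j] += 1

    cost = 0.0
    for j in range(n_bins_m):
        for i in range(n_bins_f):
            if h_j[i][j] > 0:  # empty joint bins contribute nothing
                cost += h_j[i][j] * math.log(n * h_j[i][j] / (h_m[j] * h_f[i]))
    return cost / n


# Usage: identical images share all their information; independent ones share none.
mi_same = mutual_information([0, 0, 1, 1], [0, 0, 1, 1], 2, 2)
mi_indep = mutual_information([0, 0, 1, 1], [0, 1, 0, 1], 2, 2)
```

The double loop touches only B_F x B_M bins regardless of image size, which is consistent with the slide's remark that this step takes negligible time.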
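The "Generate Histograms" step on slide 12 spreads each fixed-image pixel over its nearest neighbors in the moving image using partial volumes. A minimal 2D sketch follows. It assumes the partial volumes are the usual bilinear area weights, which the slides illustrate but do not define; `partial_volume_weights`, `accumulate`, and the dictionary-based image are hypothetical simplifications.

```python
import math


def partial_volume_weights(x, y):
    """Return [((ix, iy), weight)] for the four pixels surrounding (x, y).

    Assumption: each neighbor's weight is the area of the opposite
    sub-rectangle (bilinear weights), so the four weights sum to 1.
    """
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0
    return [
        ((x0, y0), (1 - fx) * (1 - fy)),
        ((x0 + 1, y0), fx * (1 - fy)),
        ((x0, y0 + 1), (1 - fx) * fy),
        ((x0 + 1, y0 + 1), fx * fy),
    ]


def accumulate(h_m, h_j, fixed_bin, moving_bins, x, y):
    """Add one fixed pixel's partial volumes to the moving and joint histograms.

    moving_bins: dict mapping (ix, iy) to that moving pixel's histogram bin.
    (x, y): where the fixed pixel lands in the moving image after displacement.
    """
    for pos, w in partial_volume_weights(x, y):
        j = moving_bins[pos]
        h_m[j] += w
        h_j[fixed_bin][j] += w


# Usage: one fixed pixel in bin 1 lands between four moving pixels.
moving_bins = {(0, 0): 0, (1, 0): 1, (0, 1): 1, (1, 1): 0}
h_m = [0.0, 0.0]
h_j = [[0.0, 0.0], [0.0, 0.0]]
accumulate(h_m, h_j, 1, moving_bins, 0.25, 0.5)
```

Fractional weights make the histograms vary smoothly with the displacement, which is what allows the derivative terms on slide 10 to exist.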
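Slides 16 and 17 map the per-pixel cost gradient back onto the spline coefficients. The sketch below assumes that each pixel contributes its gradient times the weight product for coefficient (i, j), which follows from the interpolation formula on those slides. It again assumes the standard cubic B-spline basis; the function names and the single-tile, single-component setup are hypothetical.

```python
def bspline_weights(u):
    """Four uniform cubic B-spline basis values at offset u in [0, 1) (assumed basis)."""
    return [
        (1.0 - u) ** 3 / 6.0,
        (3.0 * u**3 - 6.0 * u**2 + 4.0) / 6.0,
        (-3.0 * u**3 + 3.0 * u**2 + 3.0 * u + 1.0) / 6.0,
        u**3 / 6.0,
    ]


def coefficient_gradient(dc_dv, offsets):
    """Accumulate dC/dP for the 16 coefficients influencing one tile.

    dc_dv: dC/dv_x for each pixel in the tile.
    offsets: each pixel's normalized (u_x, u_y) offset inside the tile.
    Returns a 4x4 list indexed [j][i].
    """
    grad = [[0.0] * 4 for _ in range(4)]
    for g, (u_x, u_y) in zip(dc_dv, offsets):
        beta_x = bspline_weights(u_x)
        beta_y = bspline_weights(u_y)
        for j in range(4):
            for i in range(4):
                grad[j][i] += g * beta_x[i] * beta_y[j]
    return grad


# Usage: three pixels in one tile. The 16 entries sum to the sum of the
# per-pixel gradients because each pixel's weights sum to 1.
grad = coefficient_gradient([0.5, -1.0, 2.0], [(0.1, 0.2), (0.5, 0.5), (0.9, 0.3)])
total = sum(sum(row) for row in grad)
```

Each tile's 16 contributions are computed independently of other tiles, which is the property slides 22 and 23 appear to rely on when assigning tiles to CPU 1 and CPU 2.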
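One reading of slides 26 and 27 is that workers first build private partial histograms and that these are then merged into one complete histogram. The Python sketch below imitates that reduction pattern on the CPU. It is not CUDA or OpenMP code: Python threads merely stand in for OpenMP threads or CUDA blocks, and `partial_histogram` and `complete_histogram` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_histogram(bin_indices, n_bins):
    """Histogram of one chunk, private to the worker that builds it."""
    hist = [0] * n_bins
    for b in bin_indices:
        hist[b] += 1
    return hist


def complete_histogram(bin_indices, n_bins, n_workers=4):
    """Split the data into chunks, histogram each privately, then merge."""
    chunk = max(1, -(-len(bin_indices) // n_workers))  # ceiling division
    chunks = [bin_indices[s:s + chunk] for s in range(0, len(bin_indices), chunk)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(lambda c: partial_histogram(c, n_bins), chunks))

    total = [0] * n_bins
    for hist in partials:  # merge step: plain element-wise sum
        for b, count in enumerate(hist):
            total[b] += count
    return total


# Usage: the merged result matches a direct count of the same data.
data = [0, 1, 1, 2, 2, 2, 3, 3, 3, 3]
result = complete_histogram(data, 4)
```

Keeping the partial histograms private avoids contention on shared bins; the only shared write happens in the final merge.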
