
My Parallel Electromagnetic Solver: A Star-Maxwell-P FDTD Propagator - PowerPoint PPT Presentation



  1. My Parallel Electromagnetic Solver: A Star-Maxwell-P FDTD Propagator. 18.337 Parallel Computing. Alejandro W. Rodriguez.

  2. Outline: Overview of nanophotonics; Statement of the problem; Parallelization: a data-parallel approach; Minimizing temporal and memory scalings; Future work.

  3. “New-School” Electromagnetism: beyond the geometric-optics limit. Omniguides and fiber optical guiding; design of frequency-selective structures; optical “insulators”. [P. Vukusic et al., Proc. Roy. Soc. B: Bio. Sci. 266, 1403 (1999)] [S. Y. Lin et al., Nature 394, 251 (1998)] [A. Rodriguez et al., Opt. Lett. 30 (2005)]

  4. Nanophotonics: λ-scale (wavelength-scale) geometries. Periodic and quasi-periodic geometries with period a give coherent scattering! Any 1D-periodic (layered) structure has a band gap. [Notomi, 2004] [Lederman, 2006] [A. Rodriguez et al., Phys. Rev. B 77, 104201 (2008)] [J. Zi et al., Proc. Nat. Acad. Sci. USA 100, 12576 (2003)] [figs: Blau, Physics Today 57, 18 (2004)]

  5. A Light-Speed Introduction to FDTD. Maxwell's equations (in the units used on the slide):

     \nabla \times H = \frac{\partial D}{\partial t} + 4\pi J, \qquad \nabla \times E = -\frac{\partial B}{\partial t}, \qquad D = \varepsilon E, \qquad B = \mu_0 H.

     On the 2D Yee grid (cell size \Delta x \times \Delta y), E and B are staggered: E is sampled at the grid points, E^n_{i,j} = E(x_i, y_j, t_n), while B is sampled at offset points, B^n_{i+\frac12,j+\frac12} = B(x_{i+\frac12}, y_{j+\frac12}, t_n). E and H updates alternate, each field advanced from spatial differences of the other.

  6. A Light-Speed Introduction to FDTD. The 2D Maxwell equations, continuum and discrete (centered differences on the Yee grid):

     \partial B_x/\partial t = -\partial E_z/\partial y
       \Rightarrow B^n_{x,(i+\frac12,j+\frac12)} = B^{n-1}_{x,(i+\frac12,j+\frac12)} - \frac{\Delta t}{\Delta y}\bigl(E^{n-1}_{z,(i,j+1)} - E^{n-1}_{z,(i,j-1)}\bigr)

     \partial B_y/\partial t = \partial E_z/\partial x
       \Rightarrow B^n_{y,(i+\frac12,j+\frac12)} = B^{n-1}_{y,(i+\frac12,j+\frac12)} + \frac{\Delta t}{\Delta x}\bigl(E^{n-1}_{z,(i+1,j)} - E^{n-1}_{z,(i-1,j)}\bigr)

     H^n_{(i+\frac12,j+\frac12)} = \mu_0^{-1} B^n_{(i+\frac12,j+\frac12)}

     \partial D_z/\partial t = \partial H_y/\partial x - \partial H_x/\partial y - 4\pi J_z
       \Rightarrow D^n_{z,(i,j)} = D^{n-1}_{z,(i,j)} + \frac{\Delta t}{\Delta x}\bigl(H^{n-1}_{y,(i+\frac12,j)} - H^{n-1}_{y,(i-\frac12,j)}\bigr) - \frac{\Delta t}{\Delta y}\bigl(H^{n-1}_{x,(i,j+\frac12)} - H^{n-1}_{x,(i,j-\frac12)}\bigr) - 4\pi\,\Delta t\, J_{z,(i,j)}

     E^n_{(i,j)} = \varepsilon^{-1}_{(i,j)} D^n_{z,(i,j)}

  7. So… why is this a hard problem? Temporal complexity of Maxwell's equations: 2 complex (discrete) 3D fields, ~100 flops/pixel/step. A 3D domain of ~20 × 18 × 18 a³ at a resolution of ~25 pixels/a gives ~10⁸ pixels; with 100-400 time steps, that is ~10¹² flops and ~20 GB of memory. (The structure: an FCC crystal of alternating rod and hole layers, solved in 1995.) Now, let's create our own parallel code!
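The slide's back-of-envelope estimate can be reproduced in a few lines (a sketch using the slide's rough figures of 100 flops/pixel/step and complex-double fields; the factor of 2 fields is the slide's count, and the comment about D and B is my assumption for how the ~20 GB arises):

```python
# Back-of-envelope cost estimate for the 3D FDTD run described on slide 7.
size_in_a3 = 20 * 18 * 18      # domain volume in units of the lattice constant a
res = 25                       # resolution: pixels per a, per dimension
pixels = size_in_a3 * res**3   # total grid points, ~1e8

flops_per_pixel_step = 100     # slide's rough per-pixel cost
steps = 400                    # upper end of the 100-400 step range
total_flops = pixels * flops_per_pixel_step * steps   # ~4e12 (1e12 at 100 steps)

# 2 complex 3D vector fields (E and H), 3 components, 16 B per complex double.
# This gives ~9 GiB; also storing D and B roughly doubles it toward the
# slide's quoted ~20 GB.
mem_bytes = 2 * 3 * 16 * pixels

print(f"pixels ~ {pixels:.1e}")
print(f"flops  ~ {total_flops:.1e}")
print(f"memory ~ {mem_bytes / 2**30:.0f} GiB")
```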

  8. Parallelization Schemes: optimizing complexity, 1D vs. 2D domain decomposition. Per-processor op. counts scale as ~ a·resᵈ/n_p + b·resᵈ⁻¹·(comm. factor): the compute term scales with subdomain volume (task ~ volume), the communication term with subdomain surface (comm. ~ area).

  9. Parallelization Schemes: optimizing complexity, 1D vs. 2D. Per processor, the 1D (slab) decomposition costs ~ O(n²/n_p + n), while the 2D (block) decomposition costs ~ O(n²/n_p + 4n/√n_p). In the most common scenario, n ≫ n_p ⇒ 2D wins!
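These two cost formulas can be compared with a tiny model (a sketch based on my reading of the slide's formulas: per-processor costs, n_p assumed a perfect square, and the constant 4 counting the faces of a 2D subdomain):

```python
def cost_1d(n, n_p):
    # Slab decomposition: each processor owns n*n/n_p cells and
    # exchanges boundary rows of ~n cells with its neighbors.
    return n * n / n_p + n

def cost_2d(n, n_p):
    # sqrt(n_p) x sqrt(n_p) block decomposition: each processor
    # exchanges 4 faces of n/sqrt(n_p) cells.
    return n * n / n_p + 4 * n / n_p**0.5

n, n_p = 10_000, 256                       # n >> n_p, the common regime
print(cost_1d(n, n_p), cost_2d(n, n_p))    # 400625.0 393125.0
assert cost_2d(n, n_p) < cost_1d(n, n_p)   # 2D wins here
```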

  10. Star-P Implementation: the power of vectorization. A simple 1D example: B, E ∈ ℝ^(N×N), with boundary condition E_{N+1,j} = E_{0,j} = 0 and the update B_{i,j} = B_{i,j} + a (E_{i,j+1} − E_{i,j−1}).

      Looped:

        for k=2:(N-1)
          B(:,k) = B(:,k) + a*( E(:,k+1) - E(:,k-1) );
        end

      Vectorized:

        B(:,2:end-1) = B(:,2:end-1) + a*( E(:,3:end) - E(:,1:end-2) );
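The same looped-vs-vectorized contrast can be checked in NumPy (a sketch mirroring the slide's Matlab snippet with 0-based indexing; N, a, and the random data are arbitrary choices):

```python
import numpy as np

N, a = 64, 0.5
rng = np.random.default_rng(0)
E = rng.standard_normal((N, N))
B_loop = rng.standard_normal((N, N))
B_vec = B_loop.copy()

# Looped update (Matlab's k=2:N-1 becomes j=1..N-2 here): interior columns only.
for j in range(1, N - 1):
    B_loop[:, j] += a * (E[:, j + 1] - E[:, j - 1])

# Vectorized update: one slice expression replaces the whole loop.
B_vec[:, 1:-1] += a * (E[:, 2:] - E[:, :-2])

assert np.allclose(B_loop, B_vec)   # both forms give identical results
```

In Star-P, as in NumPy, the vectorized form is what lets the runtime execute the update as a handful of bulk array operations rather than N separate small ones.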

  11. Star-P Implementation: the power of vectorization (1D parallelization). The update B_{i,j} = B_{i,j} + a (E_{i,j+1} − E_{i,j−1}) communicates over the second index. [timing plot: looped vs. vectorized] Is the communication cost too great?

  12. Star-P Implementation: 1D parallelization of B_{i,j} = B_{i,j} + a (E_{i,j+1} − E_{i,j−1}). The arrays can be distributed as E(N, N*p) or as E(N*p, N); the costly operation acts along the second index. Parallelizing over the direction perpendicular to the operation makes the communication cost constant: ~ O(n²/n_p + n) becomes ~ O(n²/n_p + 1).
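The point can be made concrete by counting halo cells (a toy sketch, not Star-P's actual distribution machinery: `dist_axis` is the distributed axis of an N×N grid over p processors, and the stencil shifts by ±1 along axis 1, as in the update above):

```python
def halo_cells_per_proc(N, p, dist_axis, shift_axis=1):
    """Cells each processor must receive per step for a +/-1 shift
    along `shift_axis`, when the grid is split along `dist_axis`."""
    if dist_axis != shift_axis:
        return 0        # shift stays inside each slab: no halo exchange at all
    return 2 * N        # two boundary columns of N cells from the neighbors

N, p = 1024, 16
print(halo_cells_per_proc(N, p, dist_axis=0))  # 0: split perpendicular to the shift
print(halo_cells_per_proc(N, p, dist_axis=1))  # 2048: split along the shift
```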

  13. 2D Maxwell Equations: back to our problem. Our problem mixes the directions of the cost-operations:

     (1) B^n_{x,(i+\frac12,j+\frac12)} = B^{n-1}_{x,(i+\frac12,j+\frac12)} - \frac{\Delta t}{\Delta y}\bigl(E^{n-1}_{z,(i,j+1)} - E^{n-1}_{z,(i,j-1)}\bigr)   [differences along j]

     (2) B^n_{y,(i+\frac12,j+\frac12)} = B^{n-1}_{y,(i+\frac12,j+\frac12)} + \frac{\Delta t}{\Delta x}\bigl(E^{n-1}_{z,(i+1,j)} - E^{n-1}_{z,(i-1,j)}\bigr)   [differences along i]

     H^n_{(i+\frac12,j+\frac12)} = \mu_0^{-1} B^n_{(i+\frac12,j+\frac12)}

     (3) D^n_{z,(i,j)} = D^{n-1}_{z,(i,j)} + \frac{\Delta t}{\Delta x}\bigl(H^{n-1}_{y,(i+\frac12,j)} - H^{n-1}_{y,(i-\frac12,j)}\bigr) - \frac{\Delta t}{\Delta y}\bigl(H^{n-1}_{x,(i,j+\frac12)} - H^{n-1}_{x,(i,j-\frac12)}\bigr) - 4\pi\,\Delta t\, J_{z,(i,j)}   [differences along both i and j]

     E^n_{(i,j)} = \varepsilon^{-1}_{(i,j)} D^n_{z,(i,j)}

     A 1D parallelization will be susceptible to communication costs due to either (1) or (2), and certainly due to (3). Let's try 2D parallelization!
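A minimal serial NumPy sketch of these updates (my own illustration, not the Star-P code: periodic boundaries via `np.roll` stand in for the slide's Yee-grid boundary handling, the update ordering follows the slide, and all parameter values are arbitrary):

```python
import numpy as np

def step(Ez, Bx, By, Dz, Jz, dt, dx, dy, eps, mu0=1.0):
    """One step of the 2D TM updates above, periodic boundaries."""
    # dBx/dt = -dEz/dy,  dBy/dt = +dEz/dx  (centered differences)
    Bx = Bx - dt / (2 * dy) * (np.roll(Ez, -1, axis=1) - np.roll(Ez, 1, axis=1))
    By = By + dt / (2 * dx) * (np.roll(Ez, -1, axis=0) - np.roll(Ez, 1, axis=0))
    Hx, Hy = Bx / mu0, By / mu0
    # dDz/dt = dHy/dx - dHx/dy - 4*pi*Jz
    Dz = (Dz
          + dt / (2 * dx) * (np.roll(Hy, -1, axis=0) - np.roll(Hy, 1, axis=0))
          - dt / (2 * dy) * (np.roll(Hx, -1, axis=1) - np.roll(Hx, 1, axis=1))
          - 4 * np.pi * dt * Jz)
    Ez = Dz / eps             # constitutive relation E = eps^-1 D
    return Ez, Bx, By, Dz

N = 32
eps = np.ones((N, N))
Ez = np.zeros((N, N)); Ez[N // 2, N // 2] = 1.0   # point excitation
Dz = eps * Ez                                      # keep D = eps*E at t=0
Bx = np.zeros((N, N)); By = np.zeros((N, N))
Jz = np.zeros((N, N))                              # no driving current here
for _ in range(10):
    Ez, Bx, By, Dz = step(Ez, Bx, By, Dz, Jz, dt=0.1, dx=1.0, dy=1.0, eps=eps)
print(np.abs(Ez).max())
```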

  14. 2D Maxwell Equations: 2D parallelization? [timing plot: as expected from the previous results; regime of interest marked]

  15. 2D Maxwell Equations. A solution: hybridization! In the hybrid parallelization scheme, auxiliary copies of the fields are kept in both distributions, so that every finite difference acts along a local (non-distributed) index: B_x and its source copy E_{z,x} are distributed as (N*p × N), while B_y and E_{z,y} are distributed as (N × N*p). D_z combines the contribution of H_x from one distribution with that of H_y from the other, and the resulting E_z is written back into both auxiliary copies E_{z,x} and E_{z,y}. The price is the redistribution of the auxiliary fields each step.
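The hybrid idea can be mimicked in serial NumPy by keeping two "distributions" of E_z as lists of per-processor blocks (a toy sketch of the concept only; Star-P's real distributed arrays and redistribution are not shown, and the toy sizes N and p are arbitrary):

```python
import numpy as np

N, p = 8, 4   # N rows/columns per processor block, p "processors"

def split_rows(A, p):   # "distribute" over the first index
    return np.array_split(A, p, axis=0)

def split_cols(A, p):   # "distribute" over the second index
    return np.array_split(A, p, axis=1)

Ez = np.arange((N * p) ** 2, dtype=float).reshape(N * p, N * p)

# Hybrid idea: keep a copy of Ez in each distribution.
Ez_x = split_rows(Ez, p)   # feeds the Bx update (difference along axis 1)
Ez_y = split_cols(Ez, p)   # feeds the By update (difference along axis 0)

# Each block's difference is entirely local: no halo exchange needed.
dBx_blocks = [blk[:, 2:] - blk[:, :-2] for blk in Ez_x]
assert np.allclose(np.vstack(dBx_blocks), Ez[:, 2:] - Ez[:, :-2])

dBy_blocks = [blk[2:, :] - blk[:-2, :] for blk in Ez_y]
assert np.allclose(np.hstack(dBy_blocks), Ez[2:, :] - Ez[:-2, :])
```

The cost that remains is keeping `Ez_x` and `Ez_y` consistent: after each E_z update, one copy must be redistributed (effectively a parallel transpose) into the other layout.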

  16. 2D Maxwell Equations: hybridization wins! [timing plot: roughly an order of magnitude faster]

  17. Example 1: field visualization. A metallic geometry driven by a steady-state source J_z ~ e^{iωt}; the E_z field shows that a quadrupole is born! res = 100 ⇒ N_pixels = 10,000. Serial ~ 1 minute; parallel ~ 10 minutes. Let's go to a more interesting problem…

  18. Example 2: field visualization. A PhC-metal geometry (L ≫ a) driven by J_z, using moderate resolutions: res ~ 30 ⇒ N_pixels = O(10⁸). Is this a job for Star-Maxwell? Serial: impossible; parallel: ~ 1 hour.

  19. Future Work (coming months): C++ for-loops are much faster than Matlab's (try an MPI implementation, a task-parallel approach). A promising optimization (3D geometries):
