Porting the Plasma Simulation PIConGPU to Heterogeneous Architectures with Alpaka

  1. Porting the Plasma Simulation PIConGPU to Heterogeneous Architectures with Alpaka. René Widera (1), Erik Zenker (1,2), Guido Juckeland (1), Benjamin Worpitz (1,2), Axel Huebl (1,2), Andreas Knüpfer (2), Wolfgang E. Nagel (2), Michael Bussmann (1). (1) Helmholtz-Zentrum Dresden – Rossendorf, (2) Technische Universität Dresden.

  2. PIConGPU. Electron Acceleration with Lasers → Compact X-Ray Sources; Ion Acceleration with Lasers → Tumor Therapy; Plasma Instabilities → Astrophysics. Member of the Helmholtz Association. René Widera, Erik Zenker, Guido Juckeland · Computational Radiation Physics · www.hzdr.de/crp · { r.widera, e.zenker, g.juckeland }@hzdr.de

  3. Domain Decomposition: Field and Particle Domain. [Figure: simulation domain populated with positive (+) and negative (-) charges]

  4. Domain Decomposition: Field and Particle Domain. Moving particles create fields; fields act back on particles; particles change cells. [Figure: simulation domain populated with positive (+) and negative (-) charges]
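
The three-step cycle on this slide can be illustrated with a toy one-dimensional loop. This is only a sketch under stated assumptions, not PIConGPU code: the names `Particle`, `deposit_charge`, `solve_field`, `push` and `pic_step`, the nearest-grid-point deposition, the running-sum field solve, the unit cell width, the unit mass and the periodic boundary are all hypothetical choices made to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class Particle:
    x: float  # position in units of the cell width
    v: float  # velocity in cells per time step
    q: float  # charge (+1 or -1 in this toy model)


def cell_of(p, n_cells):
    """Index of the cell that contains the particle (periodic domain)."""
    return int(p.x % n_cells) % n_cells


def deposit_charge(particles, n_cells):
    """Moving particles create fields: accumulate the charge per cell."""
    rho = [0.0] * n_cells
    for p in particles:
        rho[cell_of(p, n_cells)] += p.q
    return rho


def solve_field(rho):
    """Toy field solve (assumption): the field is the running sum of the charge."""
    field, total = [], 0.0
    for r in rho:
        total += r
        field.append(total)
    return field


def push(particles, field, n_cells, dt=1.0):
    """Fields act back on particles; returns how many particles changed cells."""
    moved = 0
    for p in particles:
        old_cell = cell_of(p, n_cells)
        p.v += p.q * field[old_cell] * dt  # unit mass assumed
        p.x = (p.x + p.v * dt) % n_cells
        if cell_of(p, n_cells) != old_cell:
            moved += 1
    return moved


def pic_step(particles, n_cells):
    """One cycle: deposit charge, solve the field, push the particles."""
    rho = deposit_charge(particles, n_cells)
    field = solve_field(rho)
    return push(particles, field, n_cells)
```

Calling `pic_step` repeatedly advances the toy system; the return value counts the particles that crossed a cell boundary in that step, which is the event that forces a real code to re-sort particles between cells.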

  5. Creating Vectorized Data Structures for Particles and Fields. [Figure: field domain with cells 1 to 4 next to the particle domain]

  6. Creating Vectorized Data Structures for Particles and Fields. [Figure: field domain with cells 1 to 4; particles in the particle domain tagged with their cells (Cell 1, Cell 1, Cell 2, Cell 4)]

  7. Creating Vectorized Data Structures for Particles and Fields. Field domain: chunked in supercells, line-wise aligned. [Figure: field and particle domains, cells 1 to 4]

  8. Creating Vectorized Data Structures for Particles and Fields. Field domain: chunked in supercells, line-wise aligned. Particle domain: fixed-size frames, struct of aligned arrays. [Figure: field and particle domains, cells 1 to 4]

  9. Creating Vectorized Data Structures for Particles and Fields. Field domain: chunked in supercells, line-wise aligned. Particle domain: fixed-size frames, struct of aligned arrays. [Figure: field and particle domains, cells 1 to 4]
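
A minimal sketch of the particle layout named on these slides: particles are grouped per supercell into fixed-size frames, and each frame stores one array per attribute (struct of arrays). The class names, the attribute set (`x`, `momentum`, `cell`) and the values of `FRAME_SIZE` and `SUPERCELL_WIDTH` are hypothetical. Memory alignment is not modeled here, because plain Python gives no control over it.

```python
from array import array

FRAME_SIZE = 4       # particles per frame (hypothetical value)
SUPERCELL_WIDTH = 2  # cells per supercell along one axis (hypothetical value)


class Frame:
    """Fixed-size block of particles stored as a struct of arrays."""

    def __init__(self):
        # One contiguous array per attribute instead of one object per particle.
        self.x = array("f", [0.0] * FRAME_SIZE)
        self.momentum = array("f", [0.0] * FRAME_SIZE)
        self.cell = array("i", [0] * FRAME_SIZE)
        self.count = 0  # number of occupied slots


class Supercell:
    """Holds the frames of all particles located in one supercell."""

    def __init__(self):
        self.frames = []

    def add(self, x, momentum, cell):
        # Open a new frame only when the last one is full.
        if not self.frames or self.frames[-1].count == FRAME_SIZE:
            self.frames.append(Frame())
        frame = self.frames[-1]
        slot = frame.count
        frame.x[slot] = x
        frame.momentum[slot] = momentum
        frame.cell[slot] = cell
        frame.count += 1


def supercell_index(cell):
    """Map a 1D cell index to the supercell that contains it."""
    return cell // SUPERCELL_WIDTH
```

Because every frame has the same size, a group of workers can process one frame in lockstep, and reading a single attribute touches one contiguous array rather than striding over whole particle records.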

  10. Algorithm Driven Cache Strategy. [Figure: cells 1 to 4 and their particles]

  11. Algorithm Driven Cache Strategy. [Figure: cells 1 to 4 and their particles, labeled Global Memory]

  12. Algorithm Driven Cache Strategy. [Figure: cells 1 to 4 and their particles, labeled Global Memory and Shared Memory]

  13. Algorithm Driven Cache Strategy. [Figure: cells 1 to 4 and their particles, labeled Shared Memory and Global Memory]

  14. High Utilization of Threads. [Figure: shared and global memory with a thread block of four threads (Thread 1 to Thread 4)]
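
The caching idea on slides 10 to 14 can be sketched as follows: the field values of one supercell, plus a halo of neighboring cells, are copied once from the large array into a small local buffer, and every worker of the block then reads only from that buffer. This is a sequential Python stand-in, not GPU code. The function names, the one-cell halo, the periodic wrap-around and the three-point average used as the per-particle operation are assumptions made for illustration.

```python
def load_tile(global_field, supercell, width, halo=1):
    """Copy one supercell's field values plus halo cells into a local buffer."""
    n = len(global_field)
    start = supercell * width - halo
    # Periodic wrap-around keeps the sketch free of boundary special cases.
    return [global_field[(start + k) % n] for k in range(width + 2 * halo)]


def gather(global_field, particle_cells, supercell, width, halo=1):
    """Average the field over each particle's cell and its two neighbors.

    Each loop iteration stands in for one thread of the thread block;
    all of them read from the cached tile, never from global_field.
    """
    tile = load_tile(global_field, supercell, width, halo)
    first_cell = supercell * width
    results = []
    for cell in particle_cells:
        local = cell - first_cell + halo  # index of the cell inside the tile
        results.append((tile[local - 1] + tile[local] + tile[local + 1]) / 3.0)
    return results
```

With a field of eight cells and a supercell width of four, `gather(field, [4, 7], 1, 4)` loads cells 3 to 7 and the wrapped cell 0 once and then serves both particles from the tile.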

  15. Task-Parallel Execution of Kernels + Asynchronous Communication
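
One way to picture the dependency structure behind this slide is a sketch in which a communication task and an independent compute task are submitted together and the program waits only where the border update needs the received data. The thread pool is a stand-in for device streams and message passing, and the function names and the three-point smoothing kernel are hypothetical. Python threads illustrate the task graph here, not real GPU concurrency.

```python
from concurrent.futures import ThreadPoolExecutor


def smooth_interior(cells):
    """Kernel on cells that need no neighbor data (three-point average)."""
    return [(cells[i - 1] + cells[i] + cells[i + 1]) / 3.0
            for i in range(1, len(cells) - 1)]


def fetch_halo(neighbor_left, neighbor_right):
    """Stand-in for asynchronous communication with neighboring devices."""
    return neighbor_left[-1], neighbor_right[0]


def step(cells, neighbor_left, neighbor_right):
    """Overlap the halo exchange with the interior update, then finish the borders."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        halo = pool.submit(fetch_halo, neighbor_left, neighbor_right)
        interior = pool.submit(smooth_interior, cells)
        left, right = halo.result()  # wait only where the data is needed
        inner = interior.result()
    first = (left + cells[0] + cells[1]) / 3.0
    last = (cells[-2] + cells[-1] + right) / 3.0
    return [first] + inner + [last]
```

The interior update never waits for the halo, so the time spent communicating is hidden as long as the interior work takes at least as long as the exchange.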

  16. PIConGPU: Scales up to 16,384 GPUs. [Figure: strong scaling, speedup vs. number of GPUs on logarithmic axes from 1 to 10000; series: ideal, 1 to 32, 8 to 256, 64 to 2048, 512 to 16384, 4096 to 16384]

  17. PIConGPU: Scales up to 16,384 GPUs. [Figure: strong scaling, speedup vs. number of GPUs on logarithmic axes from 1 to 10000; series: ideal, 1 to 32, 8 to 256, 64 to 2048, 512 to 16384, 4096 to 16384] [Figure: weak scaling, efficiency in % (axis from 80 to 105) vs. number of GPUs from 1 to 10000; series: ideal, PIConGPU]

  18. PIConGPU: Scales up to 16,384 GPUs. Efficiency >95%. [Figure: strong scaling, speedup vs. number of GPUs on logarithmic axes from 1 to 10000; series: ideal, 1 to 32, 8 to 256, 64 to 2048, 512 to 16384, 4096 to 16384] [Figure: weak scaling, efficiency in % (axis from 80 to 105) vs. number of GPUs from 1 to 10000; series: ideal, PIConGPU]

  19. PIConGPU: Scales up to 16,384 GPUs. Efficiency >95%. 6.9 PFlop/s (SP). [Figure: strong scaling, speedup vs. number of GPUs on logarithmic axes from 1 to 10000; series: ideal, 1 to 32, 8 to 256, 64 to 2048, 512 to 16384, 4096 to 16384] [Figure: weak scaling, efficiency in % (axis from 80 to 105) vs. number of GPUs from 1 to 10000; series: ideal, PIConGPU]
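
The two quantities plotted on these slides are usually defined as below. This sketch assumes the common textbook definitions (speedup relative to the smallest run of a fixed-size problem, and weak-scaling efficiency as the ratio of run times when the problem grows with the device count); the slides do not state which exact definitions were used, and the function names are hypothetical.

```python
def strong_scaling_speedup(t_ref, t_n):
    """Speedup of a fixed-size problem: reference run time over scaled run time."""
    return t_ref / t_n


def ideal_speedup(n_ref, n):
    """Speedup expected if run time shrinks in proportion to the device count."""
    return n / n_ref


def weak_scaling_efficiency(t_ref, t_n):
    """Efficiency in percent when the problem size grows with the device count."""
    return 100.0 * t_ref / t_n
```

For example, with made-up timings, a step that takes 9.5 s on the reference configuration and 10.0 s on the largest one gives `weak_scaling_efficiency(9.5, 10.0)`, that is 95 percent.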

  20. More Physics, More Computations, More Power! [Figure: charged particles (+, -) labeled with states s_1 and s_2]

  21. More Physics, More Computations, More Power! Old atom state: vector with entries $s_{1,1}, s_{1,2}, s_{1,3}, \dots, s_{n,m}$. [Figure: charged particles (+, -) labeled with states s_1 and s_2]

  22. More Physics, More Computations, More Power! Atom-physical effects: matrix with entries $t_{1,1}, t_{1,2}, t_{1,3}, \dots, t_{1,n}$ in the first row down to $t_{n,1}, \dots, t_{n,n}$ in the last row. Old atom state: vector with entries $s_{1,1}, s_{1,2}, s_{1,3}, \dots, s_{n,m}$. [Figure: charged particles (+, -) labeled with states s_1 and s_2]

  23. More Physics, More Computations, More Power! Atom-physical effects × old atom state = new atom state: $T\,s^{\mathrm{old}} = s^{\mathrm{new}}$, where $T$ is the matrix with entries $t_{1,1}, \dots, t_{n,n}$ and both state vectors have entries $s_{1,1}, s_{1,2}, s_{1,3}, \dots, s_{n,m}$. [Figure: charged particles (+, -) labeled with states s_1 and s_2]

  24. More Physics, More Computations, More Power! Atom-physical effects × old atom state = new atom state: $T\,s^{\mathrm{old}} = s^{\mathrm{new}}$, where $T$ is the matrix with entries $t_{1,1}, \dots, t_{n,n}$ and both state vectors have entries $s_{1,1}, s_{1,2}, s_{1,3}, \dots, s_{n,m}$. Really Big Data Task: random access on big amounts of data (> 100 GB); good job for powerful CPUs; efficient CPU/GPU cooperation. [Figure: charged particles (+, -) labeled with states s_1 and s_2]
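
The state update shown on the last slides, a matrix of atom-physical effects applied to the old atom state to give the new one, amounts to a matrix-vector product. The sketch below assumes a dense square matrix and the convention that row i of the matrix produces entry i of the new state; the function name and the two-state example values are hypothetical.

```python
def apply_effects(transitions, old_state):
    """Return the new state: one dot product per matrix row (assumed convention)."""
    n = len(old_state)
    assert all(len(row) == n for row in transitions), "matrix must be n x n"
    return [sum(t * s for t, s in zip(row, old_state)) for row in transitions]
```

For a two-state toy case, `apply_effects([[0.5, 0.0], [0.5, 1.0]], [1.0, 0.0])` moves half of the population from the first state to the second. At realistic sizes the matrix is far too large for this dense form, which is the "> 100 GB" random-access problem the slide assigns to CPUs.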
