Partial Wave Analysis using Graphics Cards




  1. Partial Wave Analysis using Graphics Cards
  Niklaus Berger, Hadron 2011, München
  IHEP Beijing

  2. The (computational) problem with partial wave analysis
  A complex calculation (repeated many times over), e.g. the Monte Carlo normalisation sum
  \frac{1}{N_{\mathrm{MC}}^{\mathrm{gen}}} \sum_{i=1}^{N_{\mathrm{MC}}^{\mathrm{rec}}} \sum_{j,k=1}^{n} c_j c_k^* A_j(x_i) A_k^*(x_i),
  + lots of statistics at BaBar, Belle, BES III, COMPASS, GlueX, PANDA etc.
  = something potentially very slow

  3. Four years ago...
  • I moved to IHEP Beijing
  • All I remembered about partial waves was an unpleasant theory exam
  • People at IHEP were worried about a ×100 increase in statistics
  • I did not know about partial waves, but knew how to do things fast
  • I happened to have just read a magazine article about computing on graphics processors
  (Photo: Andreas Rodler)

  4. Partial Wave Analysis as a Computational Problem
  Splits into subtasks:
  • Building a model
  • Determining model parameters through a fit to the data
  • Judging fit results
  • Iterating until satisfied
  Tightly coupled with the physicist: look at plots, adjust model and input parameters

  5. From Model to Likelihood
  Decay amplitudes: resonance and angular structure
  Sum over partial waves

  6. From Model to Likelihood
  Decay amplitudes: resonance and angular structure; sum over partial waves
  Normalisation integral over phase space, product over data events:
  L = \prod_{i=1}^{N_{\mathrm{data}}} \frac{\left|\sum_{j=1}^{n} c_j A_j(x_i)\right|^2}{\sigma}, \quad \sigma = \int \Big|\sum_{j=1}^{n} c_j A_j(x)\Big|^2 \,\mathrm{d}\Phi

  7. From Model to Likelihood
  Normalisation integral over phase space, product over data events; taking the log gives
  \ln L = \sum_{i=1}^{N_{\mathrm{data}}} \ln\Big|\sum_{j=1}^{n} c_j A_j(x_i)\Big|^2 - N_{\mathrm{data}} \ln\Big(\frac{1}{N_{\mathrm{MC}}^{\mathrm{gen}}} \sum_{i=1}^{N_{\mathrm{MC}}^{\mathrm{rec}}} \sum_{j,k=1}^{n} c_j c_k^* A_j(x_i) A_k^*(x_i)\Big)
  Sum over data events, sum over partial waves
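The likelihood just described can be sketched in a few lines of Python. This is a minimal illustration, not GPUPWA code; the function and argument names (`log_likelihood`, `amps_data`, `amps_mc`) are invented for the example, and the decay amplitudes are assumed to be precomputed per event and wave:

```python
import math

def log_likelihood(c, amps_data, amps_mc, n_gen):
    """ln L for a partial wave fit with complex coefficients c.

    amps_data[i][j] = A_j(x_i) for data event i and wave j (precomputed);
    amps_mc is the same for accepted phase-space MC events and n_gen is
    the number of generated MC events (before acceptance cuts).
    """
    # Monte Carlo estimate of the normalisation integral:
    # sigma = (1/N_gen) * sum_i |sum_j c_j A_j(x_i)|^2 over accepted MC
    sigma = sum(abs(sum(cj * aj for cj, aj in zip(c, a))) ** 2
                for a in amps_mc) / n_gen
    # sum over data events of the log of the normalised intensity
    return sum(math.log(abs(sum(cj * aj for cj, aj in zip(c, a))) ** 2 / sigma)
               for a in amps_data)
```

Note the structure: per event a sum over waves, repeated for every data and MC event, every fit iteration. This inner double loop is exactly what gets moved to the GPU.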

  8. From Model to Likelihood: Fixed Amplitudes
  \ln L = \sum_{i=1}^{N_{\mathrm{data}}} \ln\Big|\sum_{j=1}^{n} c_j A_j(x_i)\Big|^2 - N_{\mathrm{data}} \ln\Big(\frac{1}{N_{\mathrm{MC}}^{\mathrm{gen}}} \sum_{i=1}^{N_{\mathrm{MC}}^{\mathrm{rec}}} \sum_{j,k=1}^{n} c_j c_k^* A_j(x_i) A_k^*(x_i)\Big)
  Computationally intensive: O(N_{\mathrm{iteration}} \times N_{\mathrm{event}} \times N_{\mathrm{wave}}^2)
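When the decay amplitudes are fixed (only the coefficients float), the expensive MC sum over events can be pulled out of the fit loop: the n × n matrix of wave-pair sums is computed once, and each iteration then only contracts it with the coefficients. A sketch of that factorisation, with illustrative names and the same amplitude layout as above:

```python
def integral_matrix(amps_mc, n_gen):
    """I[j][k] = (1/N_gen) * sum_i A_j(x_i) * conj(A_k(x_i)).

    O(N_MC * n_wave^2) work, but done only once before the fit."""
    n = len(amps_mc[0])
    I = [[0j] * n for _ in range(n)]
    for a in amps_mc:
        for j in range(n):
            for k in range(n):
                I[j][k] += a[j] * a[k].conjugate()
    return [[x / n_gen for x in row] for row in I]

def normalisation(c, I):
    """sigma = sum_{j,k} c_j c_k* I[j][k].

    O(n_wave^2) per fit iteration, independent of the MC sample size."""
    n = len(c)
    s = sum(c[j] * c[k].conjugate() * I[j][k]
            for j in range(n) for k in range(n))
    return s.real  # sigma is real by construction (I is Hermitian)
```

This is what makes the per-iteration cost scale as N_event × N_wave² for the data part while the normalisation becomes cheap.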

  9. Going parallel!
  • Almost all our hardware is now parallel
  • Almost all our software is not
  • Almost all our problems are trivially parallel (events!)
  • The solution to speed problems is obvious...

  10. How to do parallel?
  • Farm/Cluster: lots of power; some inter-process communication; long latency (network & scheduling)
  • Multi-core CPU: finite power; very fast inter-process communication; almost no latency
  • Grid: almost infinite power; very limited inter-process communication; very long latency
  • Graphics Processor: almost infinite floating-point power; fast communication with CPU; short latency

  11. Parallel PWA
  • PWA is embarrassingly parallel: exactly the same (relatively simple) calculation for each event
  • Every event has its own data; only fit parameters are shared
  • Use parallel hardware and make use of Single Instruction - Multiple Data (SIMD) capabilities
  • Very strong here: graphics processors (GPUs), cheap and powerful hardware
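The event-level parallelism can be made concrete even in plain Python: the identical per-event kernel runs over independent event data, with only the fit parameters shared read-only. Below, a thread pool stands in for SIMD lanes or GPU threads; the names are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

def intensity(c, amps):
    # identical code for every event, no coupling between events:
    # this is the per-event kernel that maps onto SIMD lanes / GPU threads
    return abs(sum(cj * aj for cj, aj in zip(c, amps))) ** 2

def summed_intensity(c, events, n_workers=4):
    # each worker gets its own slice of events; only the shared,
    # read-only fit parameters c are "broadcast" to all of them
    chunks = [events[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(n_workers) as pool:
        partials = pool.map(lambda ch: sum(intensity(c, a) for a in ch), chunks)
    return sum(partials)
```

Because no event ever reads another event's data, the work partitions freely over any number of processing elements.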

  12. Accessing the Power of GPUs
  • Programming for the GPU is less straightforward than for the CPU
  • Early days: use the graphics interface (OpenGL) and translate the problem into drawing a picture
  • Vendor low-level frameworks: Nvidia CUDA and ATI CAL
  • Vendor higher-level framework: Brook+
  • Independent commercial software: RapidMind
  • Emerging standard: OpenCL

  13. ATI Brook+
  We started with ATI Brook+ and had all of the early adopter problems:
  • Lots of bugs and limitations
  • Small user base
  • Mediocre support
  • Uncertain future
  But also:
  • First to provide double precision
  • Hardware with the best performance/price
  • Very clean programming model, narrow interface
  Now discontinued by AMD/ATI; we switched to OpenCL

  14. OpenCL
  • OpenCL is a vendor- and hardware-independent standard for parallel computing (in principle...)
  • Gives you lots of detailed control and optimization options...
  • ...at the cost of a very low-level, hardware-driver-like interface
  • No type safety; optimization depends on machine type
  • For embarrassingly parallel tasks: use some higher-level abstraction
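The "low-level, driver-like" flavour can be conveyed by emulating an OpenCL kernel in Python: data arrive as flat, untyped buffers, and each work-item selects its event via its global id. This is a sketch of the programming model only, not real OpenCL host code (a real kernel would be written in OpenCL C and launched through the OpenCL runtime):

```python
def intensity_kernel(gid, c_re, c_im, a_re, a_im, n_wave, out):
    """Emulates one work-item: gid plays the role of get_global_id(0).

    All buffers are flat float arrays, mirroring device memory; the
    complex amplitudes are split into separate re/im buffers by hand."""
    s_re = s_im = 0.0
    for j in range(n_wave):
        ar = a_re[gid * n_wave + j]  # A_j(x_gid), real part
        ai = a_im[gid * n_wave + j]  # A_j(x_gid), imaginary part
        s_re += c_re[j] * ar - c_im[j] * ai  # complex multiply-accumulate
        s_im += c_re[j] * ai + c_im[j] * ar
    out[gid] = s_re * s_re + s_im * s_im     # |sum_j c_j A_j|^2

def launch(kernel, n_events, *args):
    # the OpenCL runtime would schedule these work-items in parallel
    for gid in range(n_events):
        kernel(gid, *args)
```

The manual index arithmetic and hand-split complex numbers are exactly the kind of bookkeeping (and absent type safety) the slide warns about, and why a higher-level abstraction pays off.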

  15. GPUPWA at BES III
  GPUPWA is our running framework; the transition to OpenCL is just done
  • GPU based tensor manipulation
  • Management of partial waves
  • GPU based normalisation integrals
  • GPU based likelihoods
  • GPU based analytic gradients
  • Interface to ROOT::Minuit2 fitters
  • Projections and plots using ROOT
  [Plot: fit projection onto m(K+K-) [GeV/c^2], 1.8-2.2]
  See: http://gpupwa.sourceforge.net
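Analytic gradients are worth the effort because the derivative of ln L with respect to the real and imaginary part of each coefficient has the same event-sum structure as the likelihood itself, so it parallelises the same way and spares Minuit its finite-difference likelihood calls. A minimal sketch of the idea (illustrative names, not GPUPWA code), with the parameters packed as [Re c_0, Im c_0, Re c_1, ...]:

```python
import math

def nll(pars, amps_data, amps_mc, n_gen):
    """-ln L with MC-estimated normalisation, as in the slides."""
    c = [complex(pars[2 * j], pars[2 * j + 1]) for j in range(len(pars) // 2)]
    sigma = sum(abs(sum(cj * aj for cj, aj in zip(c, a))) ** 2
                for a in amps_mc) / n_gen
    return (len(amps_data) * math.log(sigma)
            - sum(math.log(abs(sum(cj * aj for cj, aj in zip(c, a))) ** 2)
                  for a in amps_data))

def nll_gradient(pars, amps_data, amps_mc, n_gen):
    """Analytic d(-ln L)/d pars: the same O(N_event * n_wave) sums as nll."""
    n = len(pars) // 2
    c = [complex(pars[2 * j], pars[2 * j + 1]) for j in range(n)]
    grad = [0.0] * (2 * n)
    # normalisation and its derivatives from the MC sample;
    # with s = sum_j c_j A_j:  d|s|^2/dRe(c_j) = 2 Re(conj(s) A_j),
    #                          d|s|^2/dIm(c_j) = -2 Im(conj(s) A_j)
    sigma = 0.0
    dsigma = [0.0] * (2 * n)
    for a in amps_mc:
        s = sum(cj * aj for cj, aj in zip(c, a))
        sigma += abs(s) ** 2
        for j in range(n):
            z = s.conjugate() * a[j]
            dsigma[2 * j] += 2 * z.real
            dsigma[2 * j + 1] += -2 * z.imag
    sigma /= n_gen
    dsigma = [d / n_gen for d in dsigma]
    # data part: -sum_i (dI_i/dp) / I_i
    for a in amps_data:
        s = sum(cj * aj for cj, aj in zip(c, a))
        inten = abs(s) ** 2
        for j in range(n):
            z = s.conjugate() * a[j]
            grad[2 * j] -= 2 * z.real / inten
            grad[2 * j + 1] -= -2 * z.imag / inten
    # normalisation part: +N_data * (dsigma/dp) / sigma
    for j in range(2 * n):
        grad[j] += len(amps_data) * dsigma[j] / sigma
    return grad
```

Checking such a gradient against numerical differentiation once, as in the test below, is cheap insurance against sign errors.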

  16. Performance (Brook+)
  We use a toy model J/ψ → γ K+ K- analysis for all performance studies.
  Measurements use an Intel Core 2 Quad 2.4 GHz workstation with 2 GB of RAM and an ATI Radeon 4870 GPU with 512 MB of RAM.
  [Plot: time/iteration vs. number of events (up to 400000): FORTRAN ~10 s; GPUPWA with sums on the CPU ~0.1 s; GPUPWA with sums on the GPU ~0.01 s; ×150 speedup]

  17. Performance (OpenCL)
  [Plot: time/iteration [s] vs. events (100000-500000), comparing Brook+ and OpenCL, both in the 0.01-0.06 s range]

  18. Performance (CPU/GPU)
  [Plot: time/iteration [s] vs. events (100000-500000), log scale from 0.001 to 10 s, for Fortran, OpenCL on the CPU, Brook+ and OpenCL on the GPU]

  19. Indiana framework (CLEO-c, BES III and GlueX)
  Following a presentation by M. Shepherd; work done by M. Shepherd, R. Mitchell and H. Matevosyan, Indiana University
  Using a cluster with the Message Passing Interface (MPI):
  • High-level inter-process communication; "easy" to code and debug
  • Perform the likelihood calculation in parallel, each node with a subset of data and MC
  • Use the Open MPI implementation of MPI-2 (www.open-mpi.org)
  • Scales well over multiple cores; with a fast network also over a small cluster
  Calculation on GPUs using Nvidia's CUDA (also on a cluster):
  • Need more than hundred-fold parallel tasks: amplitude calculation at event level
  • Some cost for copying data to and from the GPU
  • Small fraction of code (large, expensive loops) ported to the GPU
  • Coding/debugging somewhat challenging

  20. Speed benchmarks
  • Tested with a γp → π+ π+ π- n analysis with 5 π+ π+ π- resonances and one floating Breit-Wigner mass
  • Amplitudes and log likelihoods are done on the GPU(s), the rest on the CPU(s)
  • CPU parallelization handled by MPI
  • Preliminary conclusions: MPI parallelization is efficient
  • It is difficult to use the full power of GPUs

  21. Multi-CPU scaling
  • MPI allows very efficient parallelization of the likelihood computation
  • Only parameters and partial sums need to be exchanged between nodes
  • The user never needs to write MPI calls; all taken care of behind the scenes
  • Fast and easy solution for multi-core systems
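The pattern hidden behind the scenes can be sketched without actual MPI calls: each rank computes one scalar partial sum over its share of the events, and a reduction adds the scalars up. Plain functions stand in for MPI scatter/reduce here; the names are illustrative:

```python
import math

def partial_log_sum(c, events):
    # what one MPI rank computes over its share of the data;
    # only this single scalar (plus the parameters c) crosses the network
    return sum(math.log(abs(sum(cj * aj for cj, aj in zip(c, a))) ** 2)
               for a in events)

def reduced_log_sum(c, events, n_ranks=4):
    shares = [events[r::n_ranks] for r in range(n_ranks)]   # "scatter"
    return sum(partial_log_sum(c, s) for s in shares)       # "reduce" (sum)
```

Because the exchanged data per iteration is a handful of floats regardless of the sample size, the scheme scales well even over a modest network.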

  22. Compute-intensive amplitudes on the GPU
  • Same fit with one change: compute π in the Breit-Wigner using the first n terms of the arctan Taylor expansion
  • Now the fit time is dominated by the computational complexity of the amplitude
  • More compute-intensive amplitudes, i.e. more sophisticated models, are an excellent match for GPU-accelerated fitting
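The artificial load is easy to reproduce: π = 4 arctan(1), and truncating the arctan Taylor (Gregory-Leibniz) series at n terms turns an otherwise cheap Breit-Wigner evaluation into an O(n) computation. A sketch; the normalised nonrelativistic Breit-Wigner form below is an assumption, since the slide does not say which parametrisation was used:

```python
def pi_arctan(n_terms):
    # arctan(1) = sum_{k>=0} (-1)^k / (2k+1), so pi = 4 * arctan(1);
    # the slow convergence (error ~ 1/n) is the point: pure arithmetic load
    return 4.0 * sum((-1) ** k / (2 * k + 1) for k in range(n_terms))

def breit_wigner(m, m0, gamma, n_terms=10000):
    # deliberately recompute pi on every call so that the amplitude
    # evaluation, not the bookkeeping, dominates the fit time
    pi = pi_arctan(n_terms)
    return (gamma / (2.0 * pi)) / ((m - m0) ** 2 + gamma ** 2 / 4.0)
```

Raising n_terms scales the arithmetic per amplitude evaluation arbitrarily, which is exactly what makes the benchmark favour the GPU more and more.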
