Updates on VecGeom: focus on SIMD performance and developments
Sandro Wenzel / CERN-PH-SFT, for the VecGeom team
Geant4 collaboration meeting, Fermilab, 31.09.2015
Primary Goals of VecGeom
- Provide a multi-track interface/API to important shape functions and geometry navigation
[Figure: tracks x1, x2, x3, x4 grouped into vectors of particles; ComputeStep for multiple tracks]
Geant4 collaboration meeting (Vector session), Fermilab, 31/09/2015, Sandro Wenzel
Primary Goals of VecGeom
- Provide a multi-track interface/API to important shape functions and geometry navigation
- Gain from CPU SIMD units when processing multiple tracks
  - for simple shapes
  - for logical volumes with few daughters
- Alternatively: gain from CPU SIMD units when processing single tracks
  - for complicated shapes
  - for logical volumes with many daughters
- Code reuse and compilation on many platforms (including GPUs)
Main components of VecGeom
- Geometry modeller: "Shapes" (Box, Tube, ...), LogicalVolume, PlacedVolume, Transformations
  - scalar API: double DistanceToOut(Vector3D const &p, Vector3D const &d)
  - vector API: void DistanceToOut("multi-track interface")
- Navigation: NavigationState, Navigator
  - scalar API: double ComputeStep(Vector3D, Vector3D)
  - vector API: void ComputeStep(..."multi-track interface"...)
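The difference between the scalar and the multi-track signatures on this slide can be illustrated with a minimal sketch. It assumes an axis-aligned box centred at the origin; `Vec3`, `Box`, `TrackSoA` and `DistanceToOutMulti` are hypothetical names chosen for illustration and are not the VecGeom classes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical minimal 3-vector (stand-in for Vector3D, not the VecGeom class).
struct Vec3 { double x, y, z; };

// Hypothetical structure-of-arrays container holding many tracks.
struct TrackSoA { std::vector<double> x, y, z; };

// Axis-aligned box centred at the origin with half-lengths hx, hy, hz.
struct Box {
  double hx, hy, hz;

  // Distance along one axis from an inside position to the face the direction points at.
  static double AxisDistance(double half, double pos, double dir) {
    if (dir > 0) return (half - pos) / dir;
    if (dir < 0) return (-half - pos) / dir;
    return std::numeric_limits<double>::infinity();
  }

  // Scalar API: one track in, one distance out.
  double DistanceToOut(Vec3 const &p, Vec3 const &d) const {
    return std::min({AxisDistance(hx, p.x, d.x), AxisDistance(hy, p.y, d.y),
                     AxisDistance(hz, p.z, d.z)});
  }
};

// Multi-track API: many tracks in, results written to an output array.
// The same operation is applied to every track, which is what a SIMD
// backend would exploit; here it is a plain loop.
void DistanceToOutMulti(Box const &box, TrackSoA const &pos, TrackSoA const &dir,
                        std::vector<double> &out) {
  out.resize(pos.x.size());
  for (std::size_t i = 0; i < pos.x.size(); ++i) {
    out[i] = box.DistanceToOut({pos.x[i], pos.y[i], pos.z[i]},
                               {dir.x[i], dir.y[i], dir.z[i]});
  }
}
```

The structure-of-arrays layout keeps each coordinate contiguous in memory, which is the layout a vectorized backend would typically want to load from.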
Recap of prototype status early 2014
- Provided SIMD-optimized vector interfaces and algorithms for a few elementary solids and geometry base functions (implemented important functions for particle navigation)
- Can run a chain of algorithms in vector/SIMD mode (vector flow):
  1. SIMD distFromInside mother volume
  2. pick next daughter volume
  3. SIMD transform coordinates to daughter frame
  4. SIMD distToOutside daughter volume
  5. SIMD update step + boundary
- CHEP13 paper: http://arxiv.org/pdf/1312.0816.pdf
Recap of prototype status early 2014
- Provided SIMD-optimized vector interfaces and algorithms for a few elementary solids and geometry base functions (implemented important functions for particle navigation)
- Can run a chain of algorithms in vector/SIMD mode (vector flow):
  1. SIMD distFromInside mother volume
  2. pick next daughter volume
  3. SIMD transform coordinates to daughter frame
  4. SIMD distToOutside daughter volume
  5. SIMD update step + boundary
- Good overall performance gains for such an algorithm (in a toy detector with 4 boxes, 3 tubes, 2 cones), compared to ROOT/5.34.17:

  Platform                  16 particles   1024 particles   SIMD max
  Intel IvyBridge (AVX)     ~2.8x          ~4.0x            4x
  Intel Haswell (AVX2)      ~3.0x          ~5.0x            4x
  Intel Xeon Phi (AVX512)   ~4.1x          ~4.8x            8x

  gcc 4.8; -O3 -funroll-loops -mavx; no FMA
- CHEP13 paper: http://arxiv.org/pdf/1312.0816.pdf
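The chain of algorithms on this slide can be sketched as stages that each run over a whole basket of tracks. This is a minimal illustration under stated assumptions, not the prototype's code: all volumes are axis-aligned boxes, daughters are placed by translation only, and every name (`Vec3`, `Box`, `Daughter`, `DistFromInside`, `DistFromOutside`, `ComputeSteps`) is hypothetical. The loops are plain scalar loops; in a SIMD implementation each inner loop would process several tracks per instruction.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

constexpr double kInf = std::numeric_limits<double>::infinity();

struct Vec3 { double x, y, z; };
struct Box { double hx, hy, hz; };          // half-lengths, centred at the local origin
struct Daughter { Box box; Vec3 origin; };  // translation-only placement (assumption)

// Distance from a point inside the box to its surface along direction d.
double DistFromInside(Box const &b, Vec3 const &p, Vec3 const &d) {
  auto axis = [](double half, double pos, double dir) {
    if (dir > 0) return (half - pos) / dir;
    if (dir < 0) return (-half - pos) / dir;
    return kInf;
  };
  return std::min({axis(b.hx, p.x, d.x), axis(b.hy, p.y, d.y), axis(b.hz, p.z, d.z)});
}

// Distance from a point outside the box to its surface (slab method); kInf if missed.
double DistFromOutside(Box const &b, Vec3 const &p, Vec3 const &d) {
  const double half[3] = {b.hx, b.hy, b.hz};
  const double pos[3] = {p.x, p.y, p.z};
  const double dir[3] = {d.x, d.y, d.z};
  double tmin = 0.0, tmax = kInf;
  for (int i = 0; i < 3; ++i) {
    if (dir[i] == 0.0) {
      if (std::abs(pos[i]) > half[i]) return kInf;  // parallel to this slab and outside it
      continue;
    }
    double t1 = (-half[i] - pos[i]) / dir[i];
    double t2 = (half[i] - pos[i]) / dir[i];
    if (t1 > t2) std::swap(t1, t2);
    tmin = std::max(tmin, t1);
    tmax = std::min(tmax, t2);
  }
  return tmin <= tmax ? tmin : kInf;
}

// Step computation for a basket of tracks, organised as stages over all tracks.
void ComputeSteps(Box const &mother, std::vector<Daughter> const &daughters,
                  std::vector<Vec3> const &pos, std::vector<Vec3> const &dir,
                  std::vector<double> &step, std::vector<int> &hit) {
  const std::size_t n = pos.size();
  step.resize(n);
  hit.assign(n, -1);  // -1 means: the track exits through the mother boundary
  // Stage 1: distance to the mother boundary for every track.
  for (std::size_t i = 0; i < n; ++i) step[i] = DistFromInside(mother, pos[i], dir[i]);
  // Stages 2-5, repeated per daughter: pick the daughter, transform all tracks
  // into its frame, compute the distance to it, update step and boundary.
  for (std::size_t k = 0; k < daughters.size(); ++k) {
    Vec3 const &o = daughters[k].origin;
    for (std::size_t i = 0; i < n; ++i) {
      const Vec3 local{pos[i].x - o.x, pos[i].y - o.y, pos[i].z - o.z};
      const double dist = DistFromOutside(daughters[k].box, local, dir[i]);
      if (dist < step[i]) {
        step[i] = dist;
        hit[i] = static_cast<int>(k);
      }
    }
  }
}
```

Looping over daughters on the outside and tracks on the inside keeps the same volume and transformation in use for the whole basket, which is the access pattern a vector flow relies on.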
Summary of developments after prototype
- Transition of the prototype into true library development: gitlab.cern.ch/VecGeom/VecGeom
  - design work, integration with USolids developments, ...
- Porting a considerable portion of the solid code to VecGeom
  - ported/adapted existing (USolids) code into generic, templated and platform-independent code which can be instantiated for the scalar + GPU + multi-track interfaces (following the VecGeom development model)
  - see table on next slide
  - focused somewhat on getting the CMS geometry treatable with VecGeom; now possible
  - a lot of effort went into validating shape algorithms
- Worked on navigator structure, geometry model, etc.
  - very much ongoing (active R&D)
- Integration of VecGeom into the Geant-V simulation framework
  - more or less achieved, but more effort needed
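The idea of writing a shape algorithm once as generic templated code and instantiating it for several interfaces can be sketched as follows. This is only an illustration under assumptions: `ScalarBackend`, `ToyVectorBackend`, `Double4`, `Sqrt` and `OrbSafetyToOut` are hypothetical names, and the 4-lane type is a toy stand-in for a real SIMD type, not the backend mechanism VecGeom actually uses.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Toy 4-lane vector type standing in for a real SIMD type (illustration only).
struct Double4 {
  std::array<double, 4> lane{};
  Double4() = default;
  explicit Double4(double s) { lane.fill(s); }  // broadcast a scalar to all lanes
};

inline Double4 operator+(Double4 a, Double4 const &b) {
  for (std::size_t i = 0; i < 4; ++i) a.lane[i] += b.lane[i];
  return a;
}
inline Double4 operator-(Double4 a, Double4 const &b) {
  for (std::size_t i = 0; i < 4; ++i) a.lane[i] -= b.lane[i];
  return a;
}
inline Double4 operator*(Double4 a, Double4 const &b) {
  for (std::size_t i = 0; i < 4; ++i) a.lane[i] *= b.lane[i];
  return a;
}

// Square root overloads, one per value type.
inline double Sqrt(double a) { return std::sqrt(a); }
inline Double4 Sqrt(Double4 a) {
  for (double &v : a.lane) v = std::sqrt(v);
  return a;
}

// Hypothetical backends: each one only names the value type the kernel works on.
struct ScalarBackend { using Real_v = double; };
struct ToyVectorBackend { using Real_v = Double4; };

// Generic kernel written once: safety distance from a point inside an orb
// (full sphere) of the given radius to its surface.
template <typename Backend>
typename Backend::Real_v OrbSafetyToOut(double radius,
                                        typename Backend::Real_v const &x,
                                        typename Backend::Real_v const &y,
                                        typename Backend::Real_v const &z) {
  using Real_v = typename Backend::Real_v;
  return Real_v(radius) - Sqrt(x * x + y * y + z * z);
}
```

Calling `OrbSafetyToOut<ScalarBackend>(...)` gives the single-track version, while `OrbSafetyToOut<ToyVectorBackend>(...)` evaluates four tracks with the same source code.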
Shape development status mid 2015

  Shape                     VecGeom
  Box                       yes
  Trap + Trd                yes
  Tube[s]                   yes
  Cone[s]                   yes
  GenericTrap/Arb8          (yes)
  Tet
  Polycone                  yes
  Polyhedron                yes
  Torus                     yes
  Parallelepiped            yes
  Extruded solid
  MultiUnion
  Tessellated Solid
  Composites                yes
  Templat. Composites       (yes)
  Hype, Ellipsoid, Parab    yes
  Orb/Sphere                yes
  ... the rest

  "... the rest" is Eltu, Twisted[*], ScaledShape, ...
Shape development status mid 2015

  Shape                     VecGeom   SIMD acceleration (Multi-Track SIMD / Internal SIMD impr.)
  Box                       yes       yes
  Trap + Trd                yes       yes
  Tube[s]                   yes       yes
  Cone[s]                   yes       (incomplete)
  GenericTrap/Arb8          (yes)     (yes) (yes)
  Tet                                 (targeted)
  Polycone                  yes       (targeted)
  Polyhedron                yes       yes
  Torus                     yes       yes
  Parallelepiped            yes       yes
  Extruded solid                      (targeted)
  MultiUnion                          (targeted)
  Tessellated Solid                   (targeted)
  Composites                yes
  Templat. Composites       (yes)     (yes)
  Hype, Ellipsoid, Parab    yes       yes
  Orb/Sphere                yes       yes
  ... the rest

  "... the rest" is Eltu, Twisted[*], ScaledShape, ...
Example for multi-track SIMD performance
- Performance of a hollow tube segment
[Figure: bar chart in time units for DistanceToIn, SafetyToIn and In-or-Out?, comparing Geant4, ROOT, USolids, VecGeom scalar API and VecGeom many-track API]
- Excellent SIMD vector performance
- Total speedup compared to USolids: 7x, 3.3x, 13.62x
- gcc 4.7; -O3 -funroll-loops -mavx; no FMA; Geant4 10.1 (Release); ROOT 5.34.18 (Release); benchmark with 1000 particles
Multi-particle SIMD performance on Xeon Phi
- Often achieving considerable vector performance on the Intel Xeon Phi with the multi-track interface (examples: the trapezoid and a simple tube)
- Theoretical maximum vector gain is 8 for double precision (register width = 512 bits)
[Figure: trapezoid benchmark, Vc vectorization, Intel(R) Xeon Phi(TM); functions shown: Inside, Contains, SafetyToIn, SafetyToOut, DistanceToIn, DistanceToOut]
[Figure: tube benchmark, Vc vectorization, Intel(R) Xeon Phi(TM); functions shown: Inside, Contains, SafetyToIn, SafetyToOut, DistanceToIn, DistanceToOut]
- Benchmark performed by Sofia Vallecorsa + Guilherme Amadio (Intel IPCC)
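The quoted maximum gain follows from dividing the register width by the element width. A minimal sketch of that arithmetic, assuming 512-bit registers and 64-bit IEEE 754 doubles; `MaxLanes` and `kDoubleBits` are hypothetical names:

```cpp
#include <cassert>

// Number of elements that fit into one SIMD register, which bounds the
// speedup obtainable from vectorization alone.
constexpr int MaxLanes(int registerBits, int elementBits) { return registerBits / elementBits; }

constexpr int kDoubleBits = 64;  // IEEE 754 double precision

static_assert(MaxLanes(512, kDoubleBits) == 8, "512-bit registers hold 8 doubles");
static_assert(MaxLanes(256, kDoubleBits) == 4, "256-bit registers hold 4 doubles");
```

The same calculation gives 4 lanes for 256-bit AVX registers.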
Example for 1-track SIMD improvement: Polyhedron
[Figure: timings of DistToIn, DistToOut and SafetyToOut for USolids, VecGeom noSIMD and VecGeom SIMD; one panel for a small test polyhedron, one for HBHalf@CMS]
- For some polyhedra, considerable overall improvement compared to the USolids implementation
- For very complex shapes, the USolids implementation might be the better choice
- Demonstrated gain from internal vectorization (typically a factor of about 1.4)
- Test done on an Intel Core i7 (AVX) with 1000 particles
Global library performance evaluations
A global performance evaluation of 1-track mode
- Trying to benchmark the complete geometry modeller: shapes + navigation
- Developed an X-Ray benchmark: propagate geantinos pixel-by-pixel
  - not a realistic benchmark (G4 is not optimized for geantino tracing) ...
  - ... but an indication that we are globally moving in the right direction

  dir   G4      ROOT    VecGeom*
  y     21.5s   12.7s   5.9s
  z     10.7s   6.58s   4.09s

  Time to obtain the X-Ray image for the CMS calorimeter along different propagation directions
  (* current stable state of master branch; further improvements expected)
Scaling on the Xeon Phi
- Cannot yet compile Geant-V on the Xeon Phi
- But we can compile the VecGeom X-Ray benchmark and use it for some scaling studies
- Idea: treat different pixels in different threads (OpenMP)
[Figure: thread speedup for x-raying the CMS calorimeter]
- Demonstrating:
  - thread safety of VecGeom
  - sharing of the geometry among all threads (memory reduction)
  - perfect scaling up to the number of physical cores
- Preliminary; plot provided by Sofia Vallecorsa (Intel IPCC@CERN)
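The pixel-parallel idea can be sketched with a toy X-ray loop: every pixel is independent and only reads a geometry shared by all threads, so the loop over pixels can be distributed with OpenMP. This is an illustration under assumptions, not the VecGeom benchmark: the geometry is a list of rectangular footprints seen along z, and `Footprint`, `CountCrossings` and `XRay` are hypothetical names. The pragma is guarded so the code also compiles without OpenMP support.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Footprint of a volume as seen along the z direction (toy geometry).
struct Footprint { double xmin, xmax, ymin, ymax; };

// Number of volumes crossed by a ray along z through the point (x, y).
int CountCrossings(std::vector<Footprint> const &geometry, double x, double y) {
  int n = 0;
  for (Footprint const &f : geometry)
    if (x >= f.xmin && x < f.xmax && y >= f.ymin && y < f.ymax) ++n;
  return n;
}

// One ray per pixel; the geometry is read-only and shared by all threads,
// and each thread writes only to its own pixels.
std::vector<int> XRay(std::vector<Footprint> const &geometry, int nx, int ny, double pixelSize) {
  std::vector<int> image(static_cast<std::size_t>(nx) * static_cast<std::size_t>(ny), 0);
#ifdef _OPENMP
#pragma omp parallel for
#endif
  for (int pix = 0; pix < nx * ny; ++pix) {
    const double x = (pix % nx + 0.5) * pixelSize;  // pixel centre
    const double y = (pix / nx + 0.5) * pixelSize;
    image[static_cast<std::size_t>(pix)] = CountCrossings(geometry, x, y);
  }
  return image;
}
```

Because no pixel depends on another and the geometry is never modified, no locking is needed in this sketch.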
Comparing VecGeom/TGeo in Geant-V
- Spent considerable time this year making CMS@Geant-V run with VecGeom
  - many, many debugging sessions :-)
  - more or less stable now (validated by number of steps + simple observables)
- Allows for a first realistic estimate of the overall impact on total simulation time
  - 10 p-p events at 7 TeV in CMS
  - factor ~1.6 improvement in simulation runtime when switching from ROOT to VecGeom
  - using only the scalar mode of VecGeom so far; further speedup expected in future
- Preliminary; plot provided by Andrei Gheata
Comparing VecGeom/TGeo in Geant-V
- VecGeom has thin "NavigationStates" (no caching of the global matrix; usage of 32-bit indices rather than 64-bit volume pointers)
- This leads to a considerable memory reduction in Geant-V track objects and in the overall simulation (which also contributes positively to the speed gain)
- Preliminary; plot provided by Andrei Gheata
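The memory argument can be sketched by comparing a navigation path stored as volume pointers with one stored as 32-bit indices into a volume table shared by all tracks. This is a hypothetical layout for illustration, not the VecGeom NavigationState; `kMaxDepth`, `PlacedVolume`, `PointerState` and `IndexState` are assumed names, and the size advantage assumes a 64-bit platform.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kMaxDepth = 16;  // assumed maximum geometry depth

struct PlacedVolume { int id; };  // hypothetical stand-in for a placed volume

// Path stored as pointers: 8 bytes per level on a 64-bit platform.
struct PointerState {
  std::array<PlacedVolume const *, kMaxDepth> path{};
  std::uint32_t depth = 0;
};

// Path stored as 32-bit indices into a volume table shared by all tracks.
struct IndexState {
  std::array<std::uint32_t, kMaxDepth> path{};
  std::uint32_t depth = 0;

  // Append one level to the path (no bounds check in this sketch).
  void Push(std::uint32_t volumeIndex) { path[depth++] = volumeIndex; }

  // Resolve the deepest volume on demand through the shared table.
  PlacedVolume const &Top(std::vector<PlacedVolume> const &table) const {
    return table[path[depth - 1]];
  }
};
```

On a 64-bit platform the index-based path takes roughly half the space per track, at the cost of one table lookup whenever a volume is needed.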
Latest developments in navigation