  1. Leveraging modern supercomputing infrastructure for tensor contractions in large electronic-structure calculations
     Ilya A. Kaliman, University of Southern California
     September 18-19, 2017

  2. Tensors in Quantum Chemistry
     Schrödinger equation: $\hat{H}\psi = E\psi$
     Coupled Cluster equations

  3. Tensors in Quantum Chemistry
     ● Tensors of floating-point numbers are used extensively in high-level electronic-structure calculations
     ● 4-index tensors are common in Coupled Cluster methods
     ● Contractions are the most expensive step
     ● Tensors have a complex structure – symmetry and sparsity must be exploited
     ● Data sizes are huge (many terabytes)
     ● Large calculations can take weeks
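As an illustration (this particular term is not shown on the slides), a standard 4-index contraction in the coupled-cluster doubles equations is the particle-particle ladder term, written with occupied indices $i, j$, virtual indices $a, b, c, d$, antisymmetrized two-electron integrals $\langle ab\|cd\rangle$ and doubles amplitudes $t_{ij}^{cd}$:

```latex
R_{ij}^{ab} \mathrel{+}= \frac{1}{2} \sum_{c,d} \langle ab \| cd \rangle \, t_{ij}^{cd},
\qquad \text{cost} \sim \mathcal{O}(o^2 v^4)
```

Here $o$ and $v$ are the numbers of occupied and virtual orbitals. Since $v$ is usually much larger than $o$, this term sets the $\mathcal{O}(o^2 v^4)$ scaling of CCSD. Its index pattern (abcd, ijcd → ijab) is the one used in the xm_contract example on slide 11.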

  4. Q-Chem Quantum Chemistry Package
     Q-Chem
     – ccman2 – Coupled Cluster module
     – libcc – library of CC equations
     – libtensor (frontend), with three backends:
       – native backend
       – libxm backend (this work)
       – CTF backend

  5. Data storage using block tensors
     Symmetries exploited: molecular point-group symmetry; permutational symmetry ($a_{ji} = -a_{ij}$); spin symmetry
     [Figure legend: canonical tensor blocks; non-canonical blocks (computed from canonical blocks); zero blocks]
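A minimal sketch, not libtensor's code, of how a block tensor can avoid storing redundant blocks. Assumptions: a 2-index tensor whose two dimensions share the same block partition, permutational antisymmetry $a_{ji} = -a_{ij}$, and an abelian point group (D2h or a subgroup) whose irreps are encoded as bit patterns, so the direct product of two irreps is their bitwise XOR. Spin symmetry is left out. `block_ref` and `resolve_block` are hypothetical names.

```c
#include <assert.h>

/* Hypothetical descriptor: where the data for block (i, j) really lives. */
typedef struct {
    int zero;       /* 1 if the block vanishes by point-group symmetry */
    int ci, cj;     /* indices of the canonical block holding the data */
    int sign;       /* +1 or -1 factor applied to the canonical data */
    int transposed; /* 1 if the canonical block must be transposed */
} block_ref;

/* irrep[k] is the irrep label of block k along either dimension.
 * Assumption: irreps are bit-encoded so that their direct product is XOR. */
block_ref resolve_block(int i, int j, const int *irrep, int target_irrep)
{
    assert(irrep != 0);
    block_ref r = {0, i, j, 1, 0};

    /* Point-group symmetry: a block is non-zero only if irrep(i) x irrep(j)
     * equals the symmetry of the whole tensor (0 = totally symmetric). */
    if ((irrep[i] ^ irrep[j]) != target_irrep) {
        r.zero = 1;
        return r;
    }
    /* Permutational antisymmetry a_ji = -a_ij: only blocks with i <= j are
     * stored; block (j, i) is minus the transpose of block (i, j). */
    if (i > j) {
        r.ci = j;
        r.cj = i;
        r.sign = -1;
        r.transposed = 1;
    }
    return r;
}
```

For irreps {0, 1, 0} and a totally symmetric tensor (target 0), block (2, 0) resolves to minus the transpose of canonical block (0, 2), while block (0, 1) is a zero block that is never stored or computed.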

  6. Block tensor operations
     Contractions (unfolding + BLAS/BLIS), e.g.:
       $C_{11} = A_{11} \otimes B_{11} + A_{21} \otimes B_{12}$
       $C_{12} = A_{12} \otimes B_{11} + A_{22} \otimes B_{12}$
     Additions, block by block: $C_{ij} = A_{ij} + B_{ij}$
     [Figure: block structure of C, A and B for a contraction and an addition; zero blocks marked 0]
     ● Only non-zero canonical blocks (orange) need to be computed
     ● Blocks can be computed independently in parallel
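A minimal sketch, not taken from libtensor or libxm, of the "unfolding + BLAS" idea for one pair of dense blocks. Assumptions: row-major storage, every index with the same extent n, and a hypothetical input layout $B_{cdij}$ chosen so that an explicit unfolding step is needed. The contraction $C_{ijab} = \alpha \sum_{cd} A_{abcd} B_{cdij} + \beta C_{ijab}$ becomes a matrix product once (ij), (ab) and (cd) are treated as composite indices. The naive loop in `gemm_nt` stands in for a BLAS/BLIS dgemm call, and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Naive stand-in for dgemm with the second operand transposed (row-major):
 * Z[m x n] = alpha * X[m x k] * Y[n x k]^T + beta * Z. */
static void gemm_nt(size_t m, size_t n, size_t k, double alpha,
                    const double *x, const double *y, double beta, double *z)
{
    for (size_t p = 0; p < m; p++)
        for (size_t q = 0; q < n; q++) {
            double sum = 0.0;
            for (size_t r = 0; r < k; r++)
                sum += x[p * k + r] * y[q * k + r];
            z[p * n + q] = alpha * sum + beta * z[p * n + q];
        }
}

/* C_ijab = alpha * sum_cd A_abcd * B_cdij + beta * C_ijab (hypothetical helper).
 * Returns 0 on success, -1 if the scratch buffer cannot be allocated. */
int contract_unfolded(size_t n, double alpha, const double *A, const double *B,
                      double beta, double *C)
{
    assert(A != NULL && B != NULL && C != NULL);
    size_t nn = n * n;
    double *b_ijcd = malloc(nn * nn * sizeof *b_ijcd);
    if (b_ijcd == NULL)
        return -1;

    /* Unfolding: B viewed as a (cd) x (ij) matrix is transposed into an
     * (ij) x (cd) matrix, making the contracted pair (cd) contiguous. */
    for (size_t cd = 0; cd < nn; cd++)
        for (size_t ij = 0; ij < nn; ij++)
            b_ijcd[ij * nn + cd] = B[cd * nn + ij];

    /* A_abcd is already an (ab) x (cd) matrix: C(ij,ab) = B(ij,cd) * A(ab,cd)^T. */
    gemm_nt(nn, nn, nn, alpha, b_ijcd, A, beta, C);
    free(b_ijcd);
    return 0;
}

/* Reference: the same contraction as explicit loops over all six indices. */
void contract_reference(size_t n, double alpha, const double *A, const double *B,
                        double beta, double *C)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            for (size_t a = 0; a < n; a++)
                for (size_t b = 0; b < n; b++) {
                    double sum = 0.0;
                    for (size_t c = 0; c < n; c++)
                        for (size_t d = 0; d < n; d++)
                            sum += A[((a * n + b) * n + c) * n + d] *
                                   B[((c * n + d) * n + i) * n + j];
                    size_t k = ((i * n + j) * n + a) * n + b;
                    C[k] = alpha * sum + beta * C[k];
                }
}
```

Both functions should produce the same C for the same inputs. In a block tensor, one such call would be made for each pair of non-zero canonical blocks contributing to an output block, and different output blocks can be processed independently.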

  7. Calculations on a single node
     [Figure: several CPUs working on canonical tensor blocks held in shared memory]

  8. Calculations on a supercomputer
     [Figure: many compute nodes accessing canonical tensor blocks stored on a shared filesystem]
     Can this scale?

  9. Calculations on a supercomputer
     [Figure: many compute nodes accessing canonical tensor blocks on a shared filesystem through a fast cache (SSD, etc.)]
     Can this scale? It can! (with a fast cache)
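A minimal sketch, assuming POSIX `pread`/`pwrite`, of the storage model on these slides: every canonical block lives at a fixed byte offset inside one file on the shared filesystem (or its fast cache), so any node can read or write any block without holding the whole tensor in RAM. The `disk_arena` type and its functions are hypothetical and are not libxm's allocator; the assumption that all MPI ranks perform identical allocations, and therefore agree on offsets, is ours. For the demo, a local temporary file stands in for the shared file.

```c
#define _XOPEN_SOURCE 700 /* exposes pread, pwrite and fileno */
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical disk arena: tensor blocks are stored at byte offsets in one file. */
typedef struct {
    FILE *file; /* backing file; on a cluster it would sit on the shared filesystem */
    int fd;     /* its descriptor, used for positioned I/O */
    off_t next; /* first unallocated byte */
} disk_arena;

/* Demo helper: back the arena with an anonymous temporary file. Returns 0 or -1. */
int disk_arena_open_temp(disk_arena *arena)
{
    arena->file = tmpfile();
    if (arena->file == NULL)
        return -1;
    arena->fd = fileno(arena->file);
    arena->next = 0;
    return 0;
}

void disk_arena_close(disk_arena *arena)
{
    fclose(arena->file);
    arena->file = NULL;
}

/* Bump allocation of room for n doubles; returns the block's offset.
 * Assumption: every MPI rank performs the same allocations in the same order,
 * so all ranks obtain identical offsets without communicating. */
off_t disk_alloc(disk_arena *arena, size_t n)
{
    assert(arena != NULL);
    off_t off = arena->next;
    arena->next += (off_t)(n * sizeof(double));
    return off;
}

/* Write n doubles at offset off, retrying after short writes. Returns 0 or -1. */
int block_write(const disk_arena *arena, off_t off, const double *data, size_t n)
{
    const char *p = (const char *)data;
    size_t left = n * sizeof(double);
    while (left > 0) {
        ssize_t w = pwrite(arena->fd, p, left, off);
        if (w <= 0)
            return -1;
        p += w;
        off += w;
        left -= (size_t)w;
    }
    return 0;
}

/* Read n doubles from offset off, retrying after short reads. Returns 0 or -1. */
int block_read(const disk_arena *arena, off_t off, double *data, size_t n)
{
    char *p = (char *)data;
    size_t left = n * sizeof(double);
    while (left > 0) {
        ssize_t r = pread(arena->fd, p, left, off);
        if (r <= 0)
            return -1; /* I/O error, or end of file before the block was complete */
        p += r;
        off += r;
        left -= (size_t)r;
    }
    return 0;
}
```

Positioned I/O keeps no shared file pointer, so several threads or processes can access different blocks of the same file concurrently; a fast cache in front of the filesystem then determines how quickly those block reads complete.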

  10. Burst Buffer on NERSC Cori
      ● 6.5 GB/s read/write bandwidth
      http://www.nersc.gov/users/computational-systems/cori/burst-buffer/burst-buffer/
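A back-of-envelope estimate, not from the slides: assuming the 6.5 figure is in gigabytes per second and fully available to a single reader, and reading the "over 2 Tb" on slide 12 as 2 terabytes (decimal units), one sequential pass over the data takes at least

```latex
t_{\min} = \frac{S}{B}
         = \frac{2 \times 10^{12}\,\mathrm{B}}{6.5 \times 10^{9}\,\mathrm{B/s}}
         \approx 308\,\mathrm{s} \approx 5\,\mathrm{min}
```

Contractions typically reread input blocks many times, so total I/O time grows with the number of passes; this is why the bandwidth of the fast cache is central to whether the shared-storage model scales.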

  11. Implementation and benchmarks: libxm
      ● libxm is a library of primitive tensor operations
        – xm_contract(1.0, A, B, 2.0, C, "abcd", "ijcd", "ijab");
        – xm_add(1.0, A, 2.0, B, "ij", "ji");
        – ...
      ● Main components
        – MPI-aware disk-backed memory allocator
        – Code for tensor operations
        – Auxiliary routines
      ● Stores all data on disk
      ● Hybrid MPI/OpenMP parallel design
        – Static load balancing between the nodes (MPI)
        – Dynamic load balancing within a node (OpenMP)
      ● https://github.com/ilyak/libxm
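A minimal sketch of the hybrid load-balancing scheme listed above, under stated assumptions: the task array holds only the non-zero canonical output blocks, block b is owned by rank b mod nranks (a static round-robin split decided without communication), and inside a rank an OpenMP loop with `schedule(dynamic, 1)` hands blocks to threads one at a time. `rank` and `nranks` are plain parameters so the sketch builds without MPI; in an MPI program they would come from `MPI_Comm_rank` and `MPI_Comm_size`. `block_task`, `compute_block` and `process_rank_blocks` are hypothetical names, not libxm's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical work item: one non-zero canonical block of the output tensor. */
typedef struct {
    size_t id;     /* block index */
    double result; /* stand-in for the block's computed data */
} block_task;

/* Stand-in for the real per-block work (read inputs, unfold, GEMM, write back). */
static void compute_block(block_task *t)
{
    t->result = (double)(t->id * t->id);
}

/* Compute the blocks owned by one rank and return how many it computed.
 * Static between ranks: block b belongs to rank b % nranks.
 * Dynamic within a rank: OpenMP threads grab this rank's blocks one at a time. */
size_t process_rank_blocks(block_task *tasks, size_t ntasks, int rank, int nranks)
{
    assert(tasks != NULL && nranks > 0 && rank >= 0 && rank < nranks);
    size_t done = 0;
    long n = (long)ntasks;

#pragma omp parallel for schedule(dynamic, 1) reduction(+ : done)
    for (long b = 0; b < n; b++) {
        if (b % nranks != rank)
            continue; /* owned by another rank */
        compute_block(&tasks[b]);
        done++;
    }
    return done;
}
```

Compile with `-fopenmp` (GCC or Clang) to enable threading; without it the pragma is ignored and the loop runs serially with the same result. Round-robin ownership is only one possible static split; a cost-weighted partition would balance blocks of very different sizes better.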

  12. libxm parallel scaling on Cori
      [Table: time in seconds, with speedup relative to one node in parentheses; total tensor data size is over 2 TB]

  13. Conclusions
      ● A new distributed-parallel model for tensor operations is implemented in the libxm library
      ● A shared filesystem serves as common inter-node storage for tensors
      ● Data size is not limited by the amount of RAM or the number of nodes
      ● The hybrid MPI/OpenMP parallel code shows excellent scaling when adequate data caching is employed

  14. Thank you!
      ● Acknowledgments
        – Prof. Anna Krylov, USC
        – Dr. Evgeny Epifanovsky, Q-Chem
      https://github.com/ilyak/libxm
