A Physicist’s Guide to Parallelization at the IHPC
Viðar Guðmundsson, Science Institute, University of Iceland
vidar@hi.is, 2017
Transport of electrons through dots in a photon cavity
Exactly interacting electrons and photons, geometry
Equation of motion
Complicated construction through functional spaces
  ⇓ Non-interacting single-electron basis, Hilbert space
  ⇓ Interacting many-electron states, Fock space
  ⇓ Non-interacting electron-photon basis, Fock space
  ⇓ Photon-dressed electron states, Fock space
  ⇓ Open system → projection on the central system
  ⇓ Equation of motion, linear space of transitions, Liouville space
A lot of analytical work and planning
  ⇓ Heavy use of linear algebra
  ⇓ Truncation of infinite-dimensional spaces
  ⇓ Compact (dense) complex matrices
  ⇓ Intel MKL, BLAS, Lapack, OpenMP – cuBLAS, Magma
  ⇓ G-FORTRAN + ifort – ifort + CUDA
BLAS
Always use the (parallel f95-interface) BLAS for simple linear algebra tasks:
  Level I   : vector – vector operations
  Level II  : matrix – vector operations
  Level III : matrix – matrix operations
You cannot compete with it using your own code!
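As a minimal sketch of a Level III call through the f95 interface (assuming MKL's blas95 module and a Mod_Precision module defining the kind parameter dp, as in the wrappers below):

  PROGRAM gemm_demo
    USE Mod_Precision                  ! assumed: defines dp (double precision kind)
    USE blas95                         ! MKL Fortran 95 interface to BLAS
    IMPLICIT NONE
    INTEGER, PARAMETER :: N = 512
    COMPLEX(KIND=dp), DIMENSION(N,N) :: A, B, C
    A = (1.0_dp, 0.0_dp)
    B = (0.0_dp, 1.0_dp)
    C = (0.0_dp, 0.0_dp)
    CALL GEMM(A, B, C)                 ! Level III: C = A*B, threaded inside MKL
  END PROGRAM gemm_demo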
Module – wrapper
Subroutine → function – matrix multiplication

  MODULE PAB
  CONTAINS
    FUNCTION MATMULVG(Af, Bf)
      USE Mod_Precision
      USE lapack95
      USE blas95
      USE omp_lib
      IMPLICIT NONE
      INTEGER :: Nd1, Nd2, ierr
      COMPLEX(KIND=dp), ALLOCATABLE, DIMENSION(:,:) :: MATMULVG, Af, Bf
      Nd1  = SIZE(Af,1)
      Nd2  = SIZE(Bf,2)
      ierr = 0
      ALLOCATE(MATMULVG(Nd1,Nd2), STAT=ierr)
      MATMULVG = (0.0_dp, 0.0_dp)
      CALL GEMM3M(Af, Bf, MATMULVG)    ! MATMULVG = Af*Bf (complex GEMM, 3M algorithm)
    END FUNCTION MATMULVG
  END MODULE PAB
Usage – readability

  ρ_vec(t) = { V exp(L t) U } ρ(0)

  SUBROUTINE NaZwa_gR_errCorr_I
    ...
    USE PAB
    USE PcAB
    USE PABz
    USE PAv
    USE PAvT
    ...
    rhotv = MATvecVG(MATMULVG(MATMULVG(vrV, expiLt), vlU), rho0v)
    ...
  END SUBROUTINE NaZwa_gR_errCorr_I
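The matrix-vector wrapper MATvecVG used above is not shown on these slides; a minimal sketch of what such a wrapper might look like, built on the Level II routine GEMV in the same style as MATMULVG (the module name PAv is taken from the USE list above, the body itself is an assumption):

  MODULE PAv
  CONTAINS
    FUNCTION MATvecVG(Af, xf)
      ! Sketch of a matrix-vector wrapper analogous to MATMULVG,
      ! assumed to wrap the Level II BLAS routine GEMV.
      USE Mod_Precision
      USE blas95
      IMPLICIT NONE
      COMPLEX(KIND=dp), ALLOCATABLE, DIMENSION(:,:) :: Af
      COMPLEX(KIND=dp), ALLOCATABLE, DIMENSION(:)   :: MATvecVG, xf
      INTEGER :: Nd1, ierr
      Nd1  = SIZE(Af,1)
      ierr = 0
      ALLOCATE(MATvecVG(Nd1), STAT=ierr)
      MATvecVG = (0.0_dp, 0.0_dp)
      CALL GEMV(Af, xf, MATvecVG)      ! MATvecVG = Af*xf
    END FUNCTION MATvecVG
  END MODULE PAv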
GPU-cards
We can also off-load matrix handling to GPU-cards

  MODULE Mod_MVG
  CONTAINS
    FUNCTION MVG(A, B)
      ! Input:  two complex N x N matrices A and B
      ! Output: complex N x N matrix A*B
      USE Mod_Precision
      USE mkl95_lapack
      USE blas95
      USE omp_lib
      IMPLICIT NONE
      INTEGER :: m, k, n, ierr
      COMPLEX(KIND=dp), ALLOCATABLE, DIMENSION(:,:) :: MVG, A, B
      m = SIZE(A,1)
      n = SIZE(B,2)
      k = SIZE(A,2)
      ierr = 0
      ALLOCATE(MVG(m,n), STAT=ierr)
      MVG = (0.0_dp, 0.0_dp)
      CALL matmulvgcuda(A, B, MVG, m, k, n)   ! off-load the product to the GPU
    END FUNCTION MVG
  END MODULE Mod_MVG
extern "C" void matmulvgcuda_ ( cuDoubleComplex *A, cuDoubleComplex *B, cuDoubleComplex *C, cudaSetDevice (0); cuDoubleComplex *d_A = 0; cuDoubleComplex *d_B = 0; cuDoubleComplex *d_C = 0; cuDoubleComplex alpha = {.x=1.0 , .y=0.0}; cuDoubleComplex beta = {.x=0.0 , .y=0.0}; int matrixSizeA = (*m)*(*k); int matrixSizeB = (*k)*(*n); int matrixSizeC = (*m)*(*n); cublasHandle_t handle; /* Initialize CUBLAS */ CUBLAS_HANDLE_ERROR ( cublasCreate (& handle) ); /* Allocate device memory for matrices */ CUDA_HANDLE_ERROR ( cudaMalloc (( void **)&d_A , matrixSizeA *sizeof( cuDoubleComplex ))); CUDA_HANDLE_ERROR ( cudaMalloc (( void **)&d_B , matrixSizeB *sizeof( cuDoubleComplex ))); CUDA_HANDLE_ERROR ( cudaMalloc (( void **)&d_C , matrixSizeC *sizeof( cuDoubleComplex ))); /* Copy host matrixes to device */ CUBLAS_HANDLE_ERROR ( cublasSetVector (matrixSizeA , sizeof( cuDoubleComplex ), A, 1, d_A , CUBLAS_HANDLE_ERROR ( cublasSetVector (matrixSizeB , sizeof( cuDoubleComplex ), B, 1, d_B , /* Perform multiplication with CUBLAS */ CUBLAS_HANDLE_ERROR ( cublasZgemm (handle , CUBLAS_OP_N , CUBLAS_OP_N , *m, *n, *k, &alpha /* Read result back */ CUBLAS_HANDLE_ERROR ( cublasGetVector (matrixSizeC , sizeof( cuDoubleComplex ), d_C , 1, C, /* Clean up device memory */ CUDA_HANDLE_ERROR ( cudaFree(d_A) ); CUDA_HANDLE_ERROR ( cudaFree(d_B) ); CUDA_HANDLE_ERROR ( cudaFree(d_C) ); /* Shutdown */ CUBLAS_HANDLE_ERROR ( cublasDestroy (handle) ); }
GPU – off-loading
  • Information on FORTRAN-CUDA modules: vidar@hi.is
  • Lapack: MAGMA: Matrix Algebra on GPU and Multicore Architectures, http://icl.cs.utk.edu/magma/
  • cuBLAS: https://developer.nvidia.com/cublas
OpenMP
Matrix elements:

  !$OMP PARALLEL DO PRIVATE(fs, sumLp, sumRp, FermiS, Cpnu) &
  !$OMP SHARED(TL, TR, TauLp, TauRp, gamma, Plus, i1, i2, j1, j2) SCHEDULE(DYNAMIC)
  DO nu = 1, N_mes
    DO mu = 1, N_mes
      sumLp = Czero
      sumRp = Czero
      DO ia = 1, N_ses
        FermiS = gamma(nu,ia)
        Cpnu   = Plus(nu,ia)
        IF(mu .EQ. Cpnu) THEN
          fs    = (-1)**FermiS
          sumLp = sumLp + TL(i1,j1,ia)*CMPLX(REAL(fs,dp), 0.0_dp, dp)
          sumRp = sumRp + TR(i2,j2,ia)*CMPLX(REAL(fs,dp), 0.0_dp, dp)
        END IF
      END DO
      TauLp(mu,nu) = sumLp
      TauRp(mu,nu) = sumRp
    END DO
  END DO
  !$OMP END PARALLEL DO
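When parallel regions like this are combined with threaded MKL BLAS/LAPACK calls elsewhere in the code, the two thread pools can be controlled separately; a minimal sketch (the thread counts here are arbitrary, mkl_set_num_threads is MKL's service routine):

  USE omp_lib
  ...
  CALL omp_set_num_threads(16)   ! threads available to the !$OMP regions
  CALL mkl_set_num_threads(16)   ! threads used inside MKL BLAS/LAPACK calls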
Many dimensions
  • OpenMP – shared RAM – up to 32 cores on Garðar
  • MPI – distributed RAM – images
  • OpenMP – depth of parallel tasks
  • BLAS – linear algebra
  • Modules – readability – easy usage
Importance of analytical + numerical work
Efficient determination of the Markovian time-evolution towards a steady-state of a complex open quantum system, Computer Physics Communications, in press (2017), https://doi.org/10.1016/j.cpc.2017.06.018
Conclusions
  • MKL, OpenMP, cuBLAS, MAGMA
  • Faster development than with MPI
  • Build modules (wrappers)
  • Get clever students
  • G-FORTRAN – ifort
Collaboration and support

Collaborators:
  • Þorsteinn Hjörtur Jónsson (UI)
  • Andrei Manolescu (RU)
  • Chi-Shung Tang (NUU)
  • Hsi-Sheng Goan (NTU)
  • Anna Sitek (UI)
  • Nzar Rauf Abdullah (KUS)
  • Maria Laura Bernodusson (ALUF)
  • Sigtryggur Hauksson (McGill)
  • Árni Johnsen (UI)
  • Elías Snorrason (UI)

Support:
  • University of Iceland Research Fund
  • The Icelandic Research Fund
  • The Taiwan Ministry of Technology
  • The Icelandic Infrastructure Fund