Generic Programming Techniques by the example of tensor contraction

Patrick Seewald
Department of Chemistry, University of Zurich

International Fortran Conference, University of Zurich, July 3, 2020
Background

• PhD student in theoretical chemistry
• CP2K project: quantum chemistry & solid state physics, Fortran 2008
  https://github.com/cp2k/cp2k
• CP2K is strong in algorithms based on sparse linear algebra using the sparse matrix/tensor library DBCSR
  https://github.com/cp2k/dbcsr
• My work: algorithms based on sparse tensor contractions, generalization of the DBCSR sparse matrix library to tensors
• Fortran tools used in CP2K / DBCSR:
  • Fypp: preprocessor (Python-based templates / macros)
    https://github.com/aradi/fypp
  • fprettify: auto formatter (whitespace / indentation)
    https://github.com/pseewald/fprettify
  • FORD: documentation generator
    https://github.com/Fortran-FOSS-Programmers/ford
Generic Programming in General

• application of the same algorithm to multiple data types
  → e.g. a sort implementation for arbitrary data types
• solve a class of related problems instead of tackling each specific problem on its own
  → general algorithm that can be applied to many different problems
Generic Programming with Fortran

Important generic programming ingredients:
• Templates and macros (compile-time): generate code for different types
  □ not part of the Fortran standard
• Polymorphism (run-time): generic type representing multiple specific types (e.g. a common shape class for rectangles and triangles) — see the sketch below
  ✓ Fortran 2003

The Fortran standard does not include templates/macros. Compilers implement a basic preprocessor (cpp/fpp) restricted to including common code snippets.

Typical workarounds for missing Fortran macros/templates:
• Code duplication: limited and problematic (bug fixes, refactoring)
• Code generators: delegate generic programming to another language
• Non-standard preprocessors: extend Fortran syntax with an external macro language
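A minimal sketch of the run-time polymorphism ingredient (not from the original slides; the module, type and procedure names are illustrative only) — a common shape class for rectangles and triangles in Fortran 2003:

   ! Sketch: abstract "shape" type with one deferred procedure,
   ! extended by concrete rectangle and triangle types.
   module shapes
      implicit none

      type, abstract :: shape
      contains
         procedure(shape_area), deferred :: area
      end type

      abstract interface
         function shape_area(this) result(a)
            import :: shape
            class(shape), intent(in) :: this
            real :: a
         end function
      end interface

      type, extends(shape) :: rectangle
         real :: w, h
      contains
         procedure :: area => rectangle_area
      end type

      type, extends(shape) :: triangle
         real :: base, height
      contains
         procedure :: area => triangle_area
      end type

   contains

      function rectangle_area(this) result(a)
         class(rectangle), intent(in) :: this
         real :: a
         a = this%w * this%h
      end function

      function triangle_area(this) result(a)
         class(triangle), intent(in) :: this
         real :: a
         a = 0.5 * this%base * this%height
      end function

   end module shapes

Code written against class(shape) then works for any extension type without duplication.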
Example: tensor contractions / Einstein summation

∑_{j,k} A_ijk B_kj = C_i          ∑_{i,j} A_ijkl B_jim = C_mkl

Generic Fortran API:

∑_{j,k} A_ijk B_kj = C_i :

   class(tensor), allocatable :: a, b, c
   real, dimension(:,:,:), allocatable :: data_a
   real, dimension(:,:), allocatable :: data_b
   ! ... allocate and assign data_a, data_b ...
   a = tensor(data_a)
   b = tensor(data_b)
   c = tensor_einsum(a, [1,2,3], b, [3,2], [1])

∑_{i,j} A_ijkl B_jim = C_mkl :

   class(tensor), allocatable :: a, b, c
   integer, dimension(:,:,:,:), allocatable :: data_a
   integer, dimension(:,:,:), allocatable :: data_b
   ! ... allocate and assign data_a, data_b ...
   a = tensor(data_a)
   b = tensor(data_b)
   c = tensor_einsum(a, [1,2,3,4], b, [2,1,5], [5,3,4])

Fortran API as simple as commonly used Python libraries (NumPy, PyTorch)
Implementation in 500 lines of Fortran: https://github.com/pseewald/fortran-einsum-example
Example: tensor contractions / Einstein summation

Naive / direct implementation:

∑_{j,k} A_ijk B_kj = C_i :

   DO i = 1, SIZE(A, 1)
      DO j = 1, SIZE(A, 2)
         DO k = 1, SIZE(A, 3)
            C(i) = C(i) + &
               A(i, j, k)*B(k, j)
         ENDDO
      ENDDO
   ENDDO

∑_{i,j} A_ijkl B_jim = C_mkl :

   DO m = 1, SIZE(B, 3)
      DO l = 1, SIZE(A, 4)
         DO k = 1, SIZE(A, 3)
            DO j = 1, SIZE(A, 2)
               DO i = 1, SIZE(A, 1)
                  C(m, k, l) = C(m, k, l) + &
                     A(i, j, k, l)*B(j, i, m)
               ENDDO
            ENDDO
         ENDDO
      ENDDO
   ENDDO

∑_{i,j,k} A_ijk B_jik = C :

   DO k = 1, SIZE(A, 3)
      DO j = 1, SIZE(A, 2)
         DO i = 1, SIZE(A, 1)
            C = C + &
               A(i, j, k)*B(j, i, k)
         ENDDO
      ENDDO
   ENDDO

Generated code also needs to be optimized (loop unrolling, blocking, parallelization)
→ not a good starting point
Generic einsum implementation: strategy

• all tensor contractions can be mapped to products of matrices and vectors
  → reuse of existing libraries (e.g. Fortran intrinsics or BLAS) for all floating point operations
  • ∑_{j,k} A_ijk B_kj = C_i → matrix-vector product
  • ∑_{i,j} A_ijkl B_jim = C_mkl → matrix-matrix product
  • ∑_{i,j,k} A_ijk B_jik = C → vector-vector inner product
  • A_ij B_k = C_ijk → vector-vector outer product
  (a minimal reshape + matmul sketch follows below)
• tensor type should incorporate
  • different tensor ranks (0–7)
  • different data types (integer/real/complex in 4-byte/8-byte precision)
  → 6 · 8 = 48 different base data types, need a code generator or preprocessor
• generic implementation based on the following hierarchy:
  1. generic: generic algorithms working on generic types (abstract classes)
  2. dynamic: mapping generic algorithms to specific implementations (select type construct)
  3. specific: implementations for all specific types (code generation)
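A minimal sketch of the mapping idea (not from the slides; the function name and layout are illustrative only) — the first contraction delegated to the matmul intrinsic by flattening the contracted dimensions:

   ! Sketch: C_i = sum over j,k of A(i,j,k)*B(k,j) as a matrix-vector product.
   ! The contracted indices (j,k) of A are flattened into one matrix dimension;
   ! B is transposed and flattened into a vector, so the floating point work
   ! is done entirely by matmul (or, equivalently, a BLAS call).
   function contract_ijk_kj(a, b) result(c)
      real, intent(in) :: a(:, :, :)   ! A(i, j, k)
      real, intent(in) :: b(:, :)      ! B(k, j)
      real, allocatable :: c(:)        ! C(i)
      integer :: ni, nj, nk
      ni = size(a, 1); nj = size(a, 2); nk = size(a, 3)
      c = matmul(reshape(a, [ni, nj*nk]), reshape(transpose(b), [nj*nk]))
   end function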
Generic code design

Procedure hierarchy (generic API down to intrinsics):
• tensor_einsum
• tensor_to_matrix, matrix_to_tensor
• tensor_to_0d, tensor_to_1d, tensor_to_2d, tensor_from_0d, tensor_from_1d, tensor_from_2d
• reshape_data, matrix_product, dot_product, outer_product
• reshape, matmul

Code pattern:

   function generic_func(t1) RESULT(t2)
      class(tensor) :: t1
      class(tensor), allocatable :: t2
      ! ...
      t2 = dynamic_func(t1)
      ! ...
   end function

   function dynamic_func(t1) RESULT(t2)
      class(tensor) :: t1
      class(tensor), allocatable :: t2
      select type (t1)
      type is (tensor_1d)
         t2%data = specific_1d_func(t1%data)
      type is (tensor_2d)
         t2%data = specific_2d_func(t1%data)
      ! ...
      end select
   end function

   function specific_2d_func(t1) RESULT(t2)
      real, intent(in) :: t1(:, :)
      real, allocatable :: t2(:, :)
      ! ...
   end function
Tensor einsum: types

   ! abstract data type
   type, abstract :: tensor_data
   end type

   ! concrete data types (48 instances)
   type, extends(tensor_data) :: data_0d_i4
      integer(int32), allocatable :: d
   end type

   type, extends(tensor_data) :: data_0d_i8
      integer(int64), allocatable :: d
   end type

   type, extends(tensor_data) :: data_0d_r4
      real(real32), allocatable :: d
   end type

   ! ...

   type, extends(tensor_data) :: data_1d_i4
      integer(int32), allocatable :: d(:)
   end type

   ! ...

   type, extends(tensor_data) :: data_2d_i4
      integer(int32), allocatable :: d(:, :)
   end type

   ! ...

Automated type generation using the Fypp preprocessor:

   #:for name, type in data_params
   #:for rank in ranks
   type, extends(tensor_data) :: data_${rank}$d_${name}$
      ${type}$, allocatable :: d${shape(rank)}$
   end type
   #:endfor
   #:endfor
Fypp – Python powered Fortran metaprogramming
https://github.com/aradi/fypp

• Insertion of arbitrary Python expressions

   character(*), parameter :: comp_date = "${time.strftime('%Y-%m-%d')}$"

• Iterated output to simulate templates

   interface myfunc
   #:for dtype in ['real', 'dreal', 'complex', 'dcomplex']
      module procedure myfunc_${dtype}$
   #:endfor
   end interface myfunc

• Macros

   #:def ASSERT(cond)
   #:if DEBUG > 0
      if (.not. ${cond}$) then
         print *, "Assert failed in file ${_FILE_}$, line ${_LINE_}$"
         error stop
      end if
   #:endif
   #:enddef ASSERT

   @:ASSERT(size(myArray) > 0)
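For illustration (not on the original slide): with DEBUG defined to a positive value (e.g. fypp -DDEBUG=1), the @:ASSERT call above expands roughly to the plain Fortran below; the file name and line number shown are placeholders that Fypp fills in from _FILE_ and _LINE_:

   ! hypothetical expansion of @:ASSERT(size(myArray) > 0)
   if (.not. size(myArray) > 0) then
      print *, "Assert failed in file my_module.fpp, line 42"
      error stop
   end if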
Tensor einsum: macros

tensor_lib.fpp:

   #:set ranks = range(0, RANK + 1)
   #:set data_name = ['i4', 'i8', 'r4', 'r8', 'c4', 'c8']
   #:set data_type = ['integer(int32)', 'integer(int64)', 'real(real32)', ...]
   #:set data_params = list(zip(data_name, data_type))

   #:def shape(n)
   $: '' if n == 0 else '(' + ':' + ',:'*(n - 1) + ')'
   #:enddef

   ! concrete data types generated using Fypp preprocessor
   #:for name, type in data_params
   #:for rank in ranks
   type, extends(tensor_data) :: data_${rank}$d_${name}$
      ${type}$, allocatable :: d${shape(rank)}$
   end type
   #:endfor
   #:endfor

fypp -DRANK=7 tensor_lib.fpp > tensor_lib.f90

Generated tensor_lib.f90:

   ! ...
   type, extends(tensor_data) :: data_3d_i4
      integer(int32), allocatable :: d(:, :, :)
   end type
   ! ...
Tensor einsum: types and constructor

   ! abstract tensor type
   type, abstract :: tensor
      class(tensor_data), allocatable :: data
      integer, dimension(:), allocatable :: shape
   end type

   ! concrete tensor types
   #:for rank in ranks
   type, extends(tensor) :: tensor_${rank}$d
   #:if rank == 1
      integer :: vector_type = row_vec
   #:endif
   end type
   #:endfor

   ! tensor constructor interface
   interface tensor
   #:for rank in ranks
   #:for name in data_name
      module procedure tensor_${rank}$d_${name}$
   #:endfor
   #:endfor
   end interface

   ! constructor implementation
   #:for rank in ranks
   #:for name, type in data_params
   function tensor_${rank}$d_${name}$(data) result(t)
      ${type}$, intent(in) :: data${shape(rank)}$
      class(tensor), allocatable :: t
      type(tensor_${rank}$d), allocatable :: t_${rank}$d
      type(data_${rank}$d_${name}$), allocatable :: t_data
      integer, dimension(${rank}$) :: sh
   #:if rank > 0
      sh = shape(data)
   #:endif
      allocate(t_${rank}$d)
      allocate(t_${rank}$d%shape(${rank}$), source=sh)
      allocate(t_data)
      allocate(t_data%d, source=data)
      call move_alloc(t_data, t_${rank}$d%data)
      call move_alloc(t_${rank}$d, t)
   end function
   #:endfor
   #:endfor
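For illustration (not on the original slide), instantiating the constructor template above for rank 2 and 4-byte real data would produce roughly:

   function tensor_2d_r4(data) result(t)
      real(real32), intent(in) :: data(:, :)
      class(tensor), allocatable :: t
      type(tensor_2d), allocatable :: t_2d
      type(data_2d_r4), allocatable :: t_data
      integer, dimension(2) :: sh
      sh = shape(data)
      allocate(t_2d)
      allocate(t_2d%shape(2), source=sh)
      allocate(t_data)
      allocate(t_data%d, source=data)
      call move_alloc(t_data, t_2d%data)   ! hand the typed data over to the polymorphic component
      call move_alloc(t_2d, t)             ! return result as class(tensor)
   end function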