A Parallel Generator of Non-Hermitian Matrices computed from Known Given Spectra


  1. A Parallel Generator of Non-Hermitian Matrices computed from Known Given Spectra

     Xinzhe WU (1, 2), Serge G. Petiton (1, 2)

     (1) Maison de la Simulation, Gif-sur-Yvette, 91191, France
     (2) CRIStAL, Université de Lille, France

     PMAA18, Zurich, Jun. 2018

  2. Outline

     1. Introduction
     2. A Scalable Matrix Generator from Given Spectra (SMG2S)
     3. Experimentations, evaluation and analysis
     4. Accuracy Verification
     5. Application: Krylov Solvers Evaluation using SMG2S
     6. Conclusion and Perspectives

     Xinzhe WU (MDLS, France), SMG2S: A Scalable Test Matrix Generator, PMAA18, Zurich, Jun. 2018

  3. Introduction: Linear System Solvers and Spectra

     When we solve linear systems Ax = b with Krylov subspace methods such as GMRES (Saad and Schultz (1986)), where A is a non-Hermitian matrix, the spectrum of A influences the solution process in several ways, such as:

     1. Convergence analysis;
     2. Preconditioners;
     3. Recycling of eigenvalues for a sequence of linear systems;
     4. etc.

  4. Introduction: Requirement of large-scale matrix generator

     Today:
     - the linear problem size is increasing;
     - the numerical methods should adjust to the coming exascale platforms.

     Thus there are four special requirements on the test matrices for the evaluation of numerical algorithms:
     - their spectra must be known and customizable;
     - they should be sparse, non-Hermitian and non-trivial;
     - they should be able to reach a very high dimension, to evaluate the algorithms on large-scale systems;
     - they should be generated in parallel, with low memory requirements during generation.

  5. Introduction: Related works

     The related work:
     - Saad's SPARSKIT (Saad (1990));
     - Tim Davis collection (Davis and Hu (2011));
     - Matrix Market collection (Boisvert et al. (1997));
     - Bai's collection (Bai et al. (1996));
     - Galeri package of Trilinos, which generates simple well-known finite element and finite difference matrices;
     - J. Demmel's generation suite in 1989 to benchmark LAPACK (Demmel and McKenney (1989)), etc.

     Only Demmel's method generates matrices with given spectra: it transforms the diagonal matrix into a dense matrix using orthogonal matrices, and then reduces it to an unsymmetric band matrix by Householder transformations. This method requires $O(n^3)$ time and $O(n^2)$ storage even for generating a small-bandwidth matrix.

  6. Outline (section 2: A Scalable Matrix Generator from Given Spectra (SMG2S))

  7. A Scalable Matrix Generator from Given Spectra (SMG2S): Mathematical notations

     Based on the preliminary theoretical work of H. Galicher (Galicher et al. (2014)), for all matrices $A \in \mathbb{C}^{n \times n}$, $M \in \mathbb{C}^{n \times n}$, $n \in \mathbb{N}$, a linear operator $\widetilde{A_A}$ of matrix $M$ determined by matrix $A$ can be set up as in Formula (1):

     $$\widetilde{A_A} : \mathbb{C}^{n \times n} \to \mathbb{C}^{n \times n}, \quad M \mapsto AM - MA. \tag{1}$$

     $$(\widetilde{A_A})^k (M_0) = \sum_{m=0}^{k} (-1)^m C_k^m A^{k-m} M_0 A^m. \tag{2}$$

     $$M_{i+1} = M_i + \frac{1}{i!} (\widetilde{A_A})^i (M_0), \quad i \in (0, +\infty). \tag{3}$$

     In order to make $(\widetilde{A_A})^i$ tend to 0 in a limited number of steps, we select $A$ to be a nilpotent matrix.
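
As a sanity check on Formula (2), the sketch below applies the operator $M \mapsto AM - MA$ repeatedly and compares the result with the binomial closed form. The function names (`ad`, `ad_power`, `ad_power_binomial`) and the small random test matrices are illustrative choices, not part of the slides.

```python
import numpy as np
from math import comb

def ad(A, M):
    """Apply the operator M -> A M - M A from Formula (1)."""
    return A @ M - M @ A

def ad_power(A, M0, k):
    """Apply the operator k times by repeated commutation."""
    M = M0.copy()
    for _ in range(k):
        M = ad(A, M)
    return M

def ad_power_binomial(A, M0, k):
    """Closed form of Formula (2): sum_m (-1)^m C(k, m) A^(k-m) M0 A^m."""
    total = np.zeros_like(M0)
    for m in range(k + 1):
        left = np.linalg.matrix_power(A, k - m)
        right = np.linalg.matrix_power(A, m)
        total = total + (-1) ** m * comb(k, m) * (left @ M0 @ right)
    return total

# Small integer-valued matrices (hypothetical test data) keep the comparison clean.
rng = np.random.default_rng(0)
A = rng.integers(-2, 3, size=(5, 5)).astype(float)
M0 = rng.integers(-2, 3, size=(5, 5)).astype(float)

for k in range(5):
    assert np.allclose(ad_power(A, M0, k), ad_power_binomial(A, M0, k))
```

The two forms agree because left multiplication by $A$ and right multiplication by $A$ commute, so the $k$-th power of their difference expands binomially.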

  8. A Scalable Matrix Generator from Given Spectra (SMG2S): Nilpotent Matrix

     The selected nilpotent matrix is given as:

     [Figure: Nilpotent Matrix. The recoverable entries are runs of 1s separated by 0s.]

     If $p = 1$ with $d \in \mathbb{N}^*$, or $p = 2$ with $d \in \mathbb{N}^*$ even, the nilpotency of $A$ is $d + 1$.
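
The nilpotency claim can be checked numerically. The figure did not survive extraction, so the sketch below assumes a particular layout: the only nonzero entries sit on the $p$-th superdiagonal, as runs of $d$ ones separated by a single zero. The helper names `nilpotent_matrix` and `nilpotency_index` are hypothetical.

```python
import numpy as np

def nilpotent_matrix(n, p, d):
    """Assumed layout: on the p-th superdiagonal, d ones then one zero, repeated."""
    A = np.zeros((n, n), dtype=np.int64)
    for i in range(n - p):
        if i % (d + 1) != d:
            A[i, i + p] = 1
    return A

def nilpotency_index(A):
    """Smallest q >= 1 with A^q = 0, or None if no power up to n vanishes."""
    P = np.eye(A.shape[0], dtype=A.dtype)
    for q in range(1, A.shape[0] + 1):
        P = P @ A
        if not P.any():
            return q
    return None

print(nilpotency_index(nilpotent_matrix(20, p=1, d=3)))  # 4, i.e. d + 1
print(nilpotency_index(nilpotent_matrix(20, p=2, d=4)))  # 5, i.e. d + 1
```

Under this assumed layout, the restriction to even $d$ for $p = 2$ has a simple reading: with odd $d$, the chain through even-numbered rows never meets a zero entry, so the nilpotency index grows with $n$ instead of staying at $d + 1$.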

  9. A Scalable Matrix Generator from Given Spectra (SMG2S): SMG2S Algorithm

     The SMG2S algorithm is given as:

     Algorithm 1: Matrix Generation Method
     Input: $Spec_{in} \in \mathbb{C}^n$, $h$, $d$
     Output: $M_t \in \mathbb{C}^{n \times n}$
     1: Insert random elements in $h$ lower diagonals of $M_0 \in \mathbb{C}^{n \times n}$
     2: Insert $Spec_{in}$ on the diagonal of $M_0$ and $M_0 = (2d - 2)! \, M_0$
     3: Generate the nilpotent matrix $A \in \mathbb{N}^{n \times n}$ with parameters $p$ and $d$
     4: for $i = 0, \cdots, 2(d - 2) - 1$ do
     5:     $M_{i+1} = M_i + \left( \prod_{k=i+1}^{2d-2} k \right) (\widetilde{A_A})^i (M_0)$
     6: end for
     7: $M_t = \frac{1}{(2d - 2)!} M_{2d-2}$
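
A minimal dense sketch of the idea behind Algorithm 1, not the authors' parallel implementation. It assumes the nilpotent layout of runs of $d$ ones separated by a zero on the $p$-th superdiagonal, and it sums the series $\sum_k \frac{1}{k!} (\widetilde{A_A})^k (M_0)$ directly with floating-point division instead of the factorial pre-scaling of steps 2 and 7. The loop runs up to order $2d$, the last order that can be nonzero when $A^{d+1} = 0$; the slide writes its bounds in terms of $2d - 2$. The names `nilpotent_matrix` and `smg2s_dense` are hypothetical.

```python
import numpy as np

def nilpotent_matrix(n, p, d):
    # Assumed layout: p-th superdiagonal holds d ones, then a zero, repeated.
    A = np.zeros((n, n))
    for i in range(n - p):
        if i % (d + 1) != d:
            A[i, i + p] = 1.0
    return A

def smg2s_dense(spectrum, h, d, p=1, seed=0):
    """Dense sketch: M_t = sum_k (1/k!) ad_A^k(M_0) with ad_A(M) = A M - M A."""
    spectrum = np.asarray(spectrum, dtype=complex)
    n = len(spectrum)
    rng = np.random.default_rng(seed)
    # Random entries on the h lower diagonals, prescribed spectrum on the diagonal.
    R = rng.uniform(-1.0, 1.0, (n, n))
    M0 = (np.tril(R, -1) - np.tril(R, -h - 1)).astype(complex)
    M0[np.diag_indices(n)] = spectrum
    A = nilpotent_matrix(n, p, d)
    # Since A^(d+1) = 0, every term of order k > 2d vanishes and the sum is finite.
    Mt = M0.copy()
    term = M0.copy()
    for k in range(1, 2 * d + 1):
        term = (A @ term - term @ A) / k  # term now holds ad_A^k(M0) / k!
        Mt += term
    return Mt

n = 12
spectrum = np.arange(1, n + 1) + 0.5j * (-1.0) ** np.arange(n)
Mt = smg2s_dense(spectrum, h=2, d=3)
eigs = np.sort_complex(np.linalg.eigvals(Mt))
# Largest gap between the computed and the prescribed eigenvalues.
print(np.max(np.abs(eigs - np.sort_complex(spectrum))))
```

The finite sum equals $e^{A} M_0 e^{-A}$, a similarity transformation, which is why the eigenvalues of the lower triangular $M_0$ (its diagonal) carry over to $M_t$ even though $M_t$ is no longer triangular.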

  10. A Scalable Matrix Generator from Given Spectra (SMG2S): Matrix Generation Example

      Through SMG2S, this nilpotent matrix transforms a lower band matrix into a band matrix that has the same spectrum.

      [Figure: Matrix Generation Example. Recoverable labels: $h$ and $l < 2pd$.]

      The operation complexity is $\max(O(hdn), O(d^2 n))$. If $d \ll n$ and $h \ll n$, this reduces to $O(n)$ operations and memory space.
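
The band structure in the example can be observed on small dense matrices. This sketch reuses the same assumptions as before (runs of $d$ ones separated by a zero on the $p$-th superdiagonal, series summed up to order $2d$) and reads the figure label $l$ as the upper bandwidth, which is an interpretation. It measures the lower and upper bandwidths and the number of nonzeros for growing $n$. The dense products here cost $O(n^3)$, so the code only illustrates the sparsity pattern, not the $O(n)$ complexity of a banded implementation. `generate` and `bandwidths` are hypothetical names.

```python
import numpy as np

def generate(n, h, d, p=1, seed=0):
    """Dense sketch of the generation step (illustration only)."""
    rng = np.random.default_rng(seed)
    R = rng.uniform(0.5, 1.0, (n, n))
    M0 = np.tril(R, -1) - np.tril(R, -h - 1)      # h lower diagonals
    M0[np.diag_indices(n)] = np.arange(1, n + 1)  # prescribed spectrum 1..n
    A = np.zeros((n, n))
    for i in range(n - p):                        # assumed nilpotent layout
        if i % (d + 1) != d:
            A[i, i + p] = 1.0
    Mt, term = M0.copy(), M0.copy()
    for k in range(1, 2 * d + 1):
        term = (A @ term - term @ A) / k
        Mt += term
    return Mt

def bandwidths(M, tol=1e-9):
    """Return (lower, upper) bandwidth, ignoring entries below tol."""
    rows, cols = np.nonzero(np.abs(M) > tol)
    offsets = cols - rows
    return int(max(0, -offsets.min())), int(max(0, offsets.max()))

for n in (50, 100, 200):
    Mt = generate(n, h=3, d=2, p=1)
    lower, upper = bandwidths(Mt)
    nnz = int(np.count_nonzero(np.abs(Mt) > 1e-9))
    print(n, lower, upper, nnz)
```

With $h$, $p$ and $d$ fixed, the bandwidths stay constant, so the nonzero count is bounded by a constant times $n$.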

  11. A Scalable Matrix Generator from Given Spectra (SMG2S): Parallel Implementation on CPUs and GPUs

      We implement SMG2S on homogeneous and heterogeneous machines. The former implementation is based on MPI and PETSc, the latter on MPI, CUDA, and PETSc. The kernel of the implementation is SpGEMM (sparse matrix-matrix multiplication).

      [Figure: The structure of a CPU-GPU implementation of SpGEMM, where each GPU is attached to a CPU. The GPU is in charge of the computation, while the CPU handles the MPI communication among processes.]

  12. A Scalable Matrix Generator from Given Spectra (SMG2S): Optimized Communication Implementation on CPUs

      The implementation of SMG2S, especially the communication of the parallel SpGEMM kernel, can be specifically optimized based on the particular property of the nilpotent matrix $A$.

      [Figure: (a) AM operation; (b) MA operation, shown for a matrix distributed over Proc 0 to Proc 3.]
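
The slide does not spell out the optimization. One property that such an optimization could exploit: when $A$ has a single nonzero superdiagonal at offset $p$, the product $AM$ is a masked upward shift of the rows of $M$, and $MA$ is a masked rightward shift of its columns, so no general SpGEMM is needed. The sketch below checks this against ordinary matrix products; `left_multiply`, `right_multiply` and the vector `a` (the entries of the $p$-th superdiagonal) are illustrative names.

```python
import numpy as np

def left_multiply(a, p, M):
    """A @ M when A carries the vector a on its p-th superdiagonal:
    row i of the result is a[i] times row i + p of M."""
    out = np.zeros_like(M)
    out[:-p, :] = a[:, None] * M[p:, :]
    return out

def right_multiply(a, p, M):
    """M @ A for the same A: column j + p of the result is a[j] times column j of M."""
    out = np.zeros_like(M)
    out[:, p:] = M[:, :-p] * a[None, :]
    return out

n, p, d = 12, 1, 3
# Assumed superdiagonal pattern: d ones, then a zero, repeated.
a = np.array([0.0 if i % (d + 1) == d else 1.0 for i in range(n - p)])
A = np.diag(a, k=p)
M = np.random.default_rng(1).standard_normal((n, n))

assert np.allclose(left_multiply(a, p, M), A @ M)
assert np.allclose(right_multiply(a, p, M), M @ A)
```

In a row-wise distribution, this means $AM$ only needs the $p$ rows that follow each process's block, which belong to the next process, while $MA$ involves no row exchange at all.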

  13. Outline (section 3: Experimentations, evaluation and analysis)

  14. Experimentations, evaluation and analysis: Experimental hardware environment

      We implement SMG2S on the supercomputers Tianhe-2 and ROMEO. The node specifications of the two platforms are given as follows:

      Table: Node Specifications of the cluster ROMEO and Tianhe-2

      | Machine Name | ROMEO                                | Tianhe-2                              |
      |--------------|--------------------------------------|---------------------------------------|
      | Nodes Number | BullX R421 × 130                     | 16000 × nodes                         |
      | Mother Board | SuperMicro X9DRG-QF                  | Specific Infiniband                   |
      | CPU          | 2 × Intel Ivy Bridge 8 cores 2.6 GHz | 2 × Intel Ivy Bridge 12 cores 2.2 GHz |
      | Memory       | DDR3 32GB                            | DDR3 64GB                             |
      | Accelerator  | NVIDIA GPU Tesla K20X × 2            | Intel Knights Corner × 3              |

  15. Experimentations, evaluation and analysis: Scalability and Speedup Evaluation I

      The scaling and speedup evaluations are given as:

      [Figure (a): Strong and Weak Scaling of SMG2S on Tianhe-2. Time (s) versus number of CPU cores (48 to 1536).]
      [Figure (b): Strong and Weak Scaling of SMG2S on ROMEO. Time (s) versus number of CPU cores (16 to 256).]
      Curves in both panels: CD-SS, CD-WS, OCD-SS, OCD-WS, RD-SS, RD-WS, ORD-SS, ORD-WS.

  16. Experimentations, evaluation and analysis: Scalability and Speedup Evaluation II

      [Figure (c): Strong and Weak Scaling of SMG2S on ROMEO with multiple GPUs. Time (s) versus number of GPUs (4 to 64). Curves: CD-SS, RD-SS, CD-WS, RD-WS.]
      [Figure (d): Speedup of different implementations. Speedup relative to 4 CPUs versus CPU or GPU number (4 to 64). Series: SMG2S on CPU, SMG2S on GPU, Optimized SMG2S. Bar labels read 8.4, 8.4, 8.1, 8.0, 7.9; 1.9, 1.9, 1.9, 1.8, 1.8; and 1.0, 1.0, 1.0, 1.0, 0.9.]

  17. Outline (section 4: Accuracy Verification)
