The Sixth International Conference on Parallel Processing and Applied Mathematics (PPAM 2005)

SILC: a Flexible and Environment Independent Interface to Matrix Computation Libraries

Tamito KAJIYAMA 1, 2
Akira NUKADA 1, 2
Hidehiko HASEGAWA 3, 1
Reiji SUDA 2, 1
Akira NISHIDA 2, 1

1. CREST, Japan Science and Technology Agency (JST), Japan
2. The University of Tokyo, Japan
3. University of Tsukuba, Japan

Outline
• Background
  – Matrix computation libraries
  – Traditional programming style based on function calls
• Proposal of SILC
  – Simple Interface for Library Collections
  – How SILC works
• Design and Implementation of SILC
• Experimental Results
• Future work

Background
• Matrix computations
  – Fundamental components in large-scale scientific applications
    • Taking a major proportion of execution time and memory resources
    • Long computation time with relatively small data
  – Matrix computation libraries
    • Facilitating rapid development of user programs
    • A few examples of libraries: LAPACK, IMSL, NAG

The traditional way of using libraries
1. Preparation of matrices and vectors using library-specific data structures
2. Function calls with a function's name and its arguments in a prescribed order
As a result...
• User programs will depend on a specific library
  – Not easy to replace the library by another

You need to use other libraries
• When user programs need to be ported to other computing environments
  – Required to use environment-specific libraries
• When solvers and matrix storage formats in other libraries are necessary
  – The best solver and matrix storage format depend on:
    • The problem to be solved
    • The computing environment in use

An example in the traditional way
    SSI_MATRIX A;
    SSI_SCALAR *b, *x, work[N*6], params[2];
    int options[6], status;
    /* Create matrix A and vector b, allocate buffer for x */
    status = ssi_cg (b, x, work, params, options, &A, NULL);
• A user program to solve A x = b
• Using a library-specific function and data structures
• A source-level dependency upon the library
  – Switch of libraries requires a number of modifications to the user program

Proposal of SILC
• Simple Interface for Library Collections
  – Separating a function call into data transfer and a request of computation
  – Requesting the computation by means of mathematical expressions in the form of text
  – Using separate memory space to carry out the requested computation
[Figure: the user program sends A, b and the request "Solve Ax = b" to the memory space for computation and receives x]

An example in SILC
    silc_envelope_t A, b, x; /* as in A x = b */
    /* Create matrix A and vector b, allocate buffer for x */
    SILC_PUT ("A", &A);
    SILC_PUT ("b", &b);
    SILC_EXEC ("x = A \ b"); /* Call a solver (e.g., ssi_cg) */
    SILC_GET (&x, "x");
• Data transfer and a request of computation
• Mathematical expressions in the form of text
• Computation in separate memory space
→ Independent of any specific library and environment

Main benefits of using SILC
• User programs are independent of libraries
  – Allowing users to change environments easily
[Figure: computing environments (four PCs and an SMP machine)]
• Only the smallest amount of data is needed
  – Temporary buffers for computation are automatically allocated in separate memory space
• Mathematical expressions are well-defined and language-independent
  – Fit for use in many computing environments with various programming languages (C, Fortran, Python)

Functionalities
• Data types: scalar, vector, matrix, cubic array
• Precisions: integer, real, complex (single/double)
• Matrix storage formats: dense, band, CRS
• Mathematical expressions
  – Statements: assignments, procedure calls
  – Components of a statement
    • Binary arithmetic operators (+, −, *, /, %)
    • Solution of systems of linear equations (A \ b)
    • Transposition (A'), complex conjugate (A~)
    • Functions (e.g., "sqrt(b' * b)" is the 2-norm of vector b)
    • Subscript (e.g., "A[1:5, 1:5]" is a 5 × 5 submatrix of A)
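
The split between data transfer and a text request shown in "An example in SILC" can be illustrated with a toy, in-process stand-in. The sketch below is not SILC's API or implementation: `toy_put`, `toy_exec` and `toy_get` are hypothetical names, the store holds only vectors of doubles, and the only statements understood are `dst = lhs + rhs` and `dst = lhs - rhs` with whitespace around every token. It is meant only to show how naming the data lets a program request a computation as text, without calling a library-specific function.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAX_VARS 16
#define TOY_NAME_LEN 16

/* One named vector held by the toy store (hypothetical, not a SILC type). */
typedef struct {
    char name[TOY_NAME_LEN];
    double *data;
    size_t n;
    int used;
} toy_var_t;

static toy_var_t toy_vars[TOY_MAX_VARS];

/* Look up a stored vector by name; returns NULL if it does not exist. */
static toy_var_t *toy_find(const char *name) {
    assert(name != NULL);
    for (int i = 0; i < TOY_MAX_VARS; i++)
        if (toy_vars[i].used && strcmp(toy_vars[i].name, name) == 0)
            return &toy_vars[i];
    return NULL;
}

/* "PUT": copy n doubles into the store under the given name. */
int toy_put(const char *name, const double *src, size_t n) {
    if (n == 0 || strlen(name) >= TOY_NAME_LEN) return -1;
    double *copy = malloc(n * sizeof *copy);
    if (copy == NULL) return -1;
    memcpy(copy, src, n * sizeof *copy);

    toy_var_t *v = toy_find(name);
    if (v != NULL) {
        free(v->data);                      /* overwrite an existing name */
    } else {
        for (int i = 0; i < TOY_MAX_VARS; i++)
            if (!toy_vars[i].used) { v = &toy_vars[i]; break; }
        if (v == NULL) { free(copy); return -1; }   /* store is full */
        strcpy(v->name, name);
        v->used = 1;
    }
    v->data = copy;
    v->n = n;
    return 0;
}

/* "EXEC": run a statement of the form "dst = lhs + rhs" or "dst = lhs - rhs". */
int toy_exec(const char *stmt) {
    char dst[TOY_NAME_LEN], lhs[TOY_NAME_LEN], rhs[TOY_NAME_LEN], op;
    if (sscanf(stmt, "%15s = %15s %c %15s", dst, lhs, &op, rhs) != 4) return -1;

    const toy_var_t *a = toy_find(lhs), *b = toy_find(rhs);
    if (a == NULL || b == NULL || a->n != b->n) return -1;
    if (op != '+' && op != '-') return -1;

    /* Temporary buffer lives on the "server" side, not in the caller. */
    size_t n = a->n;
    double *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL) return -1;
    for (size_t i = 0; i < n; i++)
        tmp[i] = (op == '+') ? a->data[i] + b->data[i]
                             : a->data[i] - b->data[i];
    int rc = toy_put(dst, tmp, n);
    free(tmp);
    return rc;
}

/* "GET": copy the named vector back into the caller's buffer of length n. */
int toy_get(double *dst, size_t n, const char *name) {
    const toy_var_t *v = toy_find(name);
    if (v == NULL || v->n != n) return -1;
    memcpy(dst, v->data, n * sizeof *dst);
    return 0;
}
```

A caller would `toy_put` two equal-length arrays as "x" and "y", request `toy_exec("z = x - y")`, and read the result back with `toy_get(z, n, "z")`. In SILC the store lives in a separate server rather than in the caller's memory, so the same pattern also crosses a network connection.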

How to use alternative solvers
• Alternative solvers as separate modules
  – One module for each solver
  – The "prefer" statement to specify a preferred module
• An example: a comparison of two solvers
    SILC_EXEC ("prefer leq_lu");
    SILC_EXEC ("x1 = A \ b"); /* solved by LU decomposition */
    SILC_EXEC ("prefer leq_cg");
    SILC_EXEC ("x2 = A \ b"); /* solved by the CG method */
    SILC_EXEC ("d = b - A * x1; norm1 = sqrt(d' * d)"); /* ||b - A x1|| */
    SILC_EXEC ("d = b - A * x2; norm2 = sqrt(d' * d)"); /* ||b - A x2|| */

Implementation
• User program (client)
  – Connects to a SILC server
  – Issues PUT, EXEC and GET requests
• Interface thread
  – For communications
  – Puts EXEC requests into the request queue
• Execution thread
  – For computation
  – Handles EXEC requests asynchronously
[Figure: the user program (client), made of the main program and the SILC client routines, communicates with the SILC server; inside the server the interface thread feeds the request queue, which is served by the execution thread using pluggable modules (linear equation solvers, eigenvalue solvers, FFT)]
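
The "prefer" statement on the "How to use alternative solvers" slide selects one solver module by name. A minimal sketch of that idea, assuming a table of named function pointers, is given below. It is not how SILC implements its modules: the names `toy_prefer`, `toy_solve`, `toy_direct` and `toy_jacobi` are hypothetical, the system is fixed at 2 × 2 (row-major array of four values), and Cramer's rule and a fixed number of Jacobi sweeps stand in for LU decomposition and the CG method.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A solver takes a row-major 2x2 matrix A, a right-hand side b,
   and writes the solution of A x = b into x. Returns 0 on success. */
typedef int (*toy_solver_fn)(const double *A, const double *b, double *x);

/* Direct method: Cramer's rule (stand-in for a direct solver module). */
static int solve_direct(const double *A, const double *b, double *x) {
    double det = A[0] * A[3] - A[1] * A[2];
    if (det == 0.0) return -1;              /* singular matrix */
    x[0] = (b[0] * A[3] - A[1] * b[1]) / det;
    x[1] = (A[0] * b[1] - A[2] * b[0]) / det;
    return 0;
}

/* Iterative method: 200 Jacobi sweeps (stand-in for an iterative module).
   Converges only for suitable matrices, e.g. diagonally dominant ones. */
static int solve_jacobi(const double *A, const double *b, double *x) {
    if (A[0] == 0.0 || A[3] == 0.0) return -1;
    double x0 = 0.0, x1 = 0.0;
    for (int k = 0; k < 200; k++) {
        double next0 = (b[0] - A[1] * x1) / A[0];
        double next1 = (b[1] - A[2] * x0) / A[3];
        x0 = next0;
        x1 = next1;
    }
    x[0] = x0;
    x[1] = x1;
    return 0;
}

/* Registry of pluggable modules, looked up by name. */
typedef struct {
    const char *name;
    toy_solver_fn solve;
} toy_module_t;

static const toy_module_t toy_modules[] = {
    { "toy_direct", solve_direct },
    { "toy_jacobi", solve_jacobi },
};

static const toy_module_t *toy_preferred = &toy_modules[0];

/* Select the module used by later toy_solve calls.
   An unknown name leaves the current preference unchanged. */
int toy_prefer(const char *name) {
    assert(name != NULL);
    size_t count = sizeof toy_modules / sizeof toy_modules[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(toy_modules[i].name, name) == 0) {
            toy_preferred = &toy_modules[i];
            return 0;
        }
    }
    return -1;
}

/* Solve A x = b with whichever module is currently preferred. */
int toy_solve(const double *A, const double *b, double *x) {
    return toy_preferred->solve(A, b, x);
}
```

Because the caller only names a module, comparing two solvers needs no change to the code that prepares A and b, which mirrors the two-solver comparison on the slide.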

Implementation (continued)
• User programs
  – Sequential programs (at the moment)
  – Written in C, Fortran and Python
• SILC servers
  – Run in sequential and shared-memory (SMP) parallel computing environments
  – OpenMP is used for parallel computation in the execution thread

Experiments with 4 SILC servers in different computing environments
• A user program (client) that solves A x = b
  – Where A is a tridiagonal matrix in the CRS format
  – Run in the notebook PC of Environment (a)
  – In a 100-Base TX local-area network

Environment | Specification | OpenMP
(a) A notebook PC | Intel Pentium M 733 1.1GHz, 768MB memory, Fedora Core 3 | N/A
(b) SGI Altix3700 | Intel Itanium2 1.3GHz × 32, 32GB memory, Red Hat Linux Advanced Server 2.1 | 1 thread
(c) IBM eServer OpenPower 710 | IBM Power5 1.65GHz × 2 (4 logical CPUs), 1GB memory, SuSE Linux Enterprise Server 9 | 4 threads
(d) SGI Altix3700 | Same as (b) | 16 threads

Experimental results
• About 0.1 second of data communications over the LAN
  – Data size: 0.46MB (N=10,000) to 4.27MB (N=80,000)
• SILC servers in (c) and (d) achieved better performance because of parallel computation
[Figure: execution time (in seconds, logarithmic axis from 1 to 10,000) versus dimension N (10,000, 20,000, 40,000, 80,000) for (a) Notebook PC, (b) Altix3700 (1 thread), (c) OpenPower 710 (4 threads), (d) Altix3700 (16 threads)]

Observations
• Performance of SILC is not bad
  – Speedups by parallel computation even with a time loss due to data communications
• Communication time will have less impact as dimension N increases
  – Communication time is of O(N)
  – Computation time is of O(N^2)
• Faster networks and computing environments also reduce communication time in SILC

Future work
• Ready-made modules for existing matrix computation libraries
• MPI-based SILC for distributed-memory parallel computing environments
• Just-in-time (dynamic) optimizations based on mathematical expressions
• Extension of mathematical expressions to an interactive scripting language

For your information
• The first public release of SILC (version 1.0) will be made on September 20
• Please visit our project home page at http://ssi.is.s.u-tokyo.ac.jp/silc/
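
The scaling argument on the Observations slide can be written out as a simple cost model. This is a sketch under stated assumptions, not a fit to the measured results: α and β are hypothetical positive constants standing for the per-element communication cost and the computation cost coefficient, and lower-order terms are ignored.

```latex
\[
  T(N) = T_{\mathrm{comm}}(N) + T_{\mathrm{comp}}(N),
  \qquad
  T_{\mathrm{comm}}(N) \approx \alpha N,
  \qquad
  T_{\mathrm{comp}}(N) \approx \beta N^{2}
\]
\[
  \frac{T_{\mathrm{comm}}(N)}{T(N)}
  \approx \frac{\alpha N}{\alpha N + \beta N^{2}}
  = \frac{\alpha}{\alpha + \beta N}
  \longrightarrow 0
  \quad \text{as } N \to \infty
\]
```

Under this model, doubling N doubles the communication time but quadruples the computation time, so the share of the total spent on communication shrinks as the problem grows.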