
The Sixth International Conference on Parallel Processing and Applied Mathematics (PPAM 2005)

SILC: a Flexible and Environment Independent Interface to Matrix Computation Libraries

Tamito KAJIYAMA 1, 2
Akira NUKADA 1, 2
Hidehiko HASEGAWA 3, 1
Reiji SUDA 2, 1
Akira NISHIDA 2, 1

1. CREST, Japan Science and Technology Agency (JST), Japan
2. The University of Tokyo, Japan
3. University of Tsukuba, Japan

Outline
• Background
  – Matrix computation libraries
  – Traditional programming style based on function calls
• Proposal of SILC
  – Simple Interface for Library Collections
  – How SILC works
• Design and Implementation of SILC
• Experimental Results
• Future work

Background
• Matrix computations
  – Fundamental components in large-scale scientific applications
    • Taking a major proportion of execution time and memory resources
    • Long computation time with relatively small data
  – Matrix computation libraries
    • Facilitating rapid development of user programs
    • A few examples of libraries: LAPACK, IMSL, NAG

The traditional way of using libraries
1. Preparation of matrices and vectors using library-specific data structures
2. Function calls with a function's name and its arguments in a prescribed order
As a result...
• User programs will depend on a specific library
  – Not easy to replace the library by another

You need to use other libraries
• When user programs need to be ported to other computing environments
  – Required to use environment-specific libraries
• When solvers and matrix storage formats in other libraries are necessary
  – The best solver and matrix storage format depend on:
    • The problem to be solved
    • The computing environment in use

An example in the traditional way
    SSI_MATRIX A;
    SSI_SCALAR *b, *x, work[N*6], params[2];
    int options[6], status;
    /* Create matrix A and vector b, allocate buffer for x */
    status = ssi_cg (b, x, work, params, options, &A, NULL);
• A user program to solve A x = b
• Using a library-specific function and data structures
• A source-level dependency upon the library
  – Switch of libraries requires a number of modifications to the user program

Proposal of SILC
• Simple Interface for Library Collections
  – Separating a function call into data transfer and a request of computation
  – Requesting the computation by means of mathematical expressions in the form of text
  – Using separate memory space to carry out the requested computation
[Figure: the user program sends A, b and the request "Solve Ax = b" to the memory space for computation, which returns x]

An example in SILC
    silc_envelope_t A, b, x; /* as in A x = b */
    /* Create matrix A and vector b, allocate buffer for x */
    SILC_PUT ("A", &A);
    SILC_PUT ("b", &b);
    SILC_EXEC ("x = A ＼ b"); /* Call a solver (e.g., ssi_cg) */
    SILC_GET (&x, "x");
• Data transfer and a request of computation
• Mathematical expressions in the form of text
• Computation in separate memory space
→ Independent of any specific library and environment

Main benefits of using SILC
• User programs are independent of libraries
  – Allowing users to change environments easily
[Figure: several PCs and an SMP machine]
• Only the smallest amount of data is needed
  – Temporary buffers for computation are automatically allocated in separate memory space
• Mathematical expressions are well-defined and language-independent
  – Fit for use in many computing environments with various programming languages (C, Fortran, Python)

Functionalities
• Data types: scalar, vector, matrix, cubic array
• Precisions: integer, real, complex (single/double)
• Matrix storage formats: dense, band, CRS
• Mathematical expressions
  – Statements: assignments, procedure calls
  – Components of a statement
    • Binary arithmetic operators (+, −, *, /, %)
    • Solution of systems of linear equations (A ＼ b)
    • Transposition (A'), complex conjugate (A~)
    • Functions (e.g., “sqrt(b' * b)” is the 2-norm of vector b)
    • Subscript (e.g., “A[1:5, 1:5]” is a 5 × 5 submatrix of A)
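Two items from the Functionalities slide, the CRS storage format and the product b' * b inside "sqrt(b' * b)", can be illustrated in plain C. This is a minimal sketch and not SILC code: the names crs_matrix, crs_matvec and dot are hypothetical, and the slides do not show how SILC lays out CRS data internally.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CRS (compressed row storage) matrix; not a SILC or SSI type. */
typedef struct {
    size_t n;              /* number of rows */
    const size_t *row_ptr; /* n + 1 offsets into col_idx and val */
    const size_t *col_idx; /* column index of each stored entry */
    const double *val;     /* value of each stored entry */
} crs_matrix;

/* y = A * x, visiting only the stored (nonzero) entries of each row. */
static void crs_matvec(const crs_matrix *A, const double *x, double *y)
{
    assert(A != NULL && x != NULL && y != NULL);
    for (size_t i = 0; i < A->n; i++) {
        double sum = 0.0;
        for (size_t k = A->row_ptr[i]; k < A->row_ptr[i + 1]; k++)
            sum += A->val[k] * x[A->col_idx[k]];
        y[i] = sum;
    }
}

/* u' * v for real vectors; the square root of dot(b, b, n) is the 2-norm of b. */
static double dot(const double *u, const double *v, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += u[i] * v[i];
    return sum;
}
```

For a tridiagonal matrix, as used in the experiments, each row stores at most three entries, so the arrays val and col_idx grow linearly with the dimension N.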
How to use alternative solvers
• Alternative solvers as separate modules
  – One module for each solver
  – The “prefer” statement to specify a preferred module
• An example: a comparison of two solvers
    SILC_EXEC ("prefer leq_lu");
    SILC_EXEC ("x1 = A ＼ b"); /* solved by LU decomposition */
    SILC_EXEC ("prefer leq_cg");
    SILC_EXEC ("x2 = A ＼ b"); /* solved by the CG method */
    SILC_EXEC ("d = b − A * x1; norm1 = sqrt(d' * d)"); /* ||b − Ax1|| */
    SILC_EXEC ("d = b − A * x2; norm2 = sqrt(d' * d)"); /* ||b − Ax2|| */

Implementation
• User program (client)
  – Connects to a SILC server
  – Issues PUT, EXEC and GET requests
• Interface thread
  – For communications
  – Puts EXEC requests into the request queue
• Execution thread
  – For computation
  – Handles EXEC requests asynchronously
[Figure: the user program (client), made of the main program and the SILC client routines, communicates with the SILC server. Inside the server, the interface thread feeds the request queue, which is read by the execution thread. The execution thread uses pluggable modules: linear equation solvers, eigenvalue solvers, FFT.]
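The hand-off between the interface thread and the execution thread described in the Implementation slide can be sketched as a small blocking FIFO queue. This is an illustrative sketch under stated assumptions, not the SILC server's code: request_queue, queue_push, queue_pop and the fixed capacity are hypothetical, and requests are represented simply as pointers to statement strings.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define QUEUE_CAPACITY 16

/* Hypothetical request queue shared by the two server threads. */
typedef struct {
    const char *items[QUEUE_CAPACITY]; /* pending EXEC statements (not copied) */
    size_t head;                       /* index of the oldest request */
    size_t count;                      /* number of pending requests */
    pthread_mutex_t lock;
    pthread_cond_t not_empty;
    pthread_cond_t not_full;
} request_queue;

#define REQUEST_QUEUE_INIT \
    { {0}, 0, 0, PTHREAD_MUTEX_INITIALIZER, \
      PTHREAD_COND_INITIALIZER, PTHREAD_COND_INITIALIZER }

/* Interface-thread side: append a request, waiting while the queue is full. */
static void queue_push(request_queue *q, const char *statement)
{
    assert(q != NULL && statement != NULL);
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_CAPACITY)
        pthread_cond_wait(&q->not_full, &q->lock);
    q->items[(q->head + q->count) % QUEUE_CAPACITY] = statement;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Execution-thread side: take the oldest request, waiting while the queue is empty. */
static const char *queue_pop(request_queue *q)
{
    assert(q != NULL);
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    const char *statement = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAPACITY;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return statement;
}
```

In such a design the interface thread would call queue_push for each EXEC request it receives and return to communications immediately, while the execution thread loops on queue_pop and evaluates statements in arrival order. That ordering is what lets a "prefer" statement take effect before the solve that follows it.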

Implementation (continued)
• User programs
  – Sequential programs (at the moment)
  – Written in C, Fortran and Python
• SILC servers
  – Run in sequential and shared-memory (SMP) parallel computing environments
  – OpenMP is used for parallel computation in the execution thread

Experiments with 4 SILC servers in different computing environments
• A user program (client) that solves A x = b
  – Where A is a tridiagonal matrix in the CRS format
  – Run in the notebook PC of Environment (a)
  – In a 100-Base TX local-area network

Environment                     Specification                                                                                   OpenMP
(a) A notebook PC               Intel Pentium M 733 1.1GHz, 768MB memory, Fedora Core 3                                         N/A
(b) SGI Altix3700               Intel Itanium2 1.3GHz × 32, 32GB memory, Red Hat Linux Advanced Server 2.1                      1 thread
(c) IBM eServer OpenPower 710   IBM Power5 1.65GHz × 2 (4 logical CPUs), 1GB memory, SuSE Linux Enterprise Server 9             4 threads
(d) SGI Altix3700               Same as (b)                                                                                     16 threads

Experimental results
• About 0.1 second of data communications over the LAN
  – Data size: 0.46MB (N=10,000) to 4.27MB (N=80,000)
• SILC servers in (c) and (d) achieved better performance because of parallel computation
[Figure: execution time (in seconds, axis ticks from 1 to 10,000) versus dimension N (10,000, 20,000, 40,000, 80,000) for (a) Notebook PC, (b) Altix3700 (1 thread), (c) OpenPower 710 (4 threads), (d) Altix3700 (16 threads)]

Observations
• Performance of SILC is not bad
  – Speedups by parallel computation even with a time loss due to data communications
• Communication time will have less impact as dimension N increases
  – Communication time is of O(N)
  – Computation time is of O(N^2)
• Faster networks and computing environments also reduce communication time in SILC

Future work
• Ready-made modules for existing matrix computation libraries
• MPI-based SILC for distributed-memory parallel computing environments
• Just-in-time (dynamic) optimizations based on mathematical expressions
• Extension of mathematical expressions to an interactive scripting language

For your information
• The first public release of SILC (version 1.0) will be made on September 20
• Please visit our project home page at http://ssi.is.s.u-tokyo.ac.jp/silc/
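The scaling argument from the Observations slide (communication time of O(N), computation time of O(N^2)) can be written out as a short worked estimate. The constants c_comm and c_comp are assumptions introduced only for illustration; the slides state the orders of growth, not the constants.

```latex
T(N) \approx \underbrace{c_{\mathrm{comm}}\, N}_{\text{communication}}
           + \underbrace{c_{\mathrm{comp}}\, N^{2}}_{\text{computation}},
\qquad
\frac{c_{\mathrm{comm}}\, N}{T(N)}
  \approx \frac{c_{\mathrm{comm}}}{c_{\mathrm{comm}} + c_{\mathrm{comp}}\, N}
  \longrightarrow 0 \quad (N \to \infty).
```

Under this model the share of total time spent on communication shrinks roughly like 1/N, which is the sense in which communication "will have less impact" as N grows.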
