High Performance Fortran (HPF) Source: Chapter 7 of "Designing - PowerPoint PPT Presentation

High Performance Fortran (HPF)
Source: Chapter 7 of "Designing and Building Parallel Programs" (Ian Foster, 1995)

Question
• Can't we just have a clever compiler generate a parallel program from a sequential program?
• Fine-grained parallelism
    x = a*b + c*d
• Trivial parallelism
    for i := 1 to 100 do
      for j := 1 to 100 do
        C[i, j] := dotproduct(A[i, *], B[*, j]);
      od
    od
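The parallelism in the loop nest is "trivial" because every C[i, j] is an independent dot product. A minimal Python sketch of that observation follows, assuming small hypothetical matrices and a thread pool standing in for parallel processors; the names `dot` and `matmul_parallel` are illustrative and not from the slides.

```python
from concurrent.futures import ThreadPoolExecutor


def dot(row, col):
    """Dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(row, col))


def matmul_parallel(A, B, workers=4):
    """Compute C = A * B, submitting each C[i][j] as an independent task."""
    n, m = len(A), len(B[0])
    cols = [[B[k][j] for k in range(len(B))] for j in range(m)]  # columns of B
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [[pool.submit(dot, A[i], cols[j]) for j in range(m)]
                   for i in range(n)]
        return [[fut.result() for fut in row] for row in futures]


# Hypothetical 2x2 inputs, chosen only to keep the example small
A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = matmul_parallel(A, B)
```

The sketch only shows that no task depends on another; CPython threads are unlikely to give a real speedup on this kind of arithmetic.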

Automatic parallelism
Automatic parallelization of arbitrary programs is extremely hard.
Solutions:
• Restrict the source program
• Restrict the kind of parallelism used
• Use a semi-automatic approach
• Use application-domain-oriented languages

High Performance Fortran (HPF)
• Designed by a forum from industry, government, and universities
• Extends Fortran 90
• Intended for computationally expensive numerical applications
• Portable to SIMD machines, vector processors, shared-memory MIMD, and distributed-memory MIMD machines

Fortran 90 - Base language of HPF
Extends Fortran 77 with 'modern' features:
• abstract data types, modules
• recursion
• pointers, dynamic storage
Array operators:
    A = B + C
    A = A + 1.0
    A(1:7) = B(1:7) + B(2:8)
    WHERE (X /= 0) X = 1.0/X
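For readers unfamiliar with Fortran 90 array syntax, the operators above can be spelled out element by element. The following is a rough Python sketch with hypothetical data values; note that Fortran sections are 1-based and inclusive, while Python slices are 0-based and exclude the upper bound.

```python
# Hypothetical data: B(1:8) = 1.0 .. 8.0, A(1:8) = 0.0
B = [float(i) for i in range(1, 9)]
A = [0.0] * 8

# A(1:7) = B(1:7) + B(2:8): element-wise sum of two shifted sections
A[0:7] = [x + y for x, y in zip(B[0:7], B[1:8])]

# A = A + 1.0: the scalar is applied to every element
A = [a + 1.0 for a in A]

# WHERE (X /= 0) X = 1.0/X: only elements selected by the mask are updated
X = [2.0, 0.0, 4.0]
X = [1.0 / x if x != 0 else x for x in X]
```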

Data parallelism
• Data parallelism: the same operation is applied to different data elements in parallel
• Data parallel program: a sequence of data parallel operations
• Overall approach:
  – Programmer does the domain decomposition
  – Compiler partitions the operations automatically
• Data may be regular (array) or irregular (tree, sparse matrix)
• Most data parallel languages deal only with arrays

Data parallelism - Concurrency
Explicit parallel operations:
    A = B + C          ! A, B, and C are arrays
Implicit parallelism:
    do i = 1,m
      do j = 1,n
        A(i,j) = B(i,j) + C(i,j)
      enddo
    enddo

Compiling data parallel programs
• Programs are translated automatically into parallel SPMD (Single Program Multiple Data) programs
• Each processor executes the same program on a subset of the data
• Owner computes rule:
  – Each processor owns a subset of the data structures
  – Operations required for an element are executed by its owner
  – Each processor may read (but not modify) elements owned by other processors
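The owner computes rule can be illustrated with a small simulation. The sketch below assumes a block distribution over 4 simulated processors; `block_owner` and `spmd_scale` are hypothetical helper names. Every "processor" runs the same loop and updates only the elements it owns.

```python
def block_owner(i, n, nprocs):
    """Owner (0-based) of 1-based element i under a block distribution."""
    block = -(-n // nprocs)  # ceiling division
    return (i - 1) // block


def spmd_scale(X, nprocs, factor):
    """Every simulated processor runs the same loop but updates only what it owns."""
    n = len(X)
    result = list(X)
    work = [0] * nprocs                          # elements updated per processor
    for me in range(nprocs):                     # same program on each processor
        for i in range(1, n + 1):
            if block_owner(i, n, nprocs) == me:  # owner computes rule
                result[i - 1] = X[i - 1] * factor
                work[me] += 1
    return result, work


scaled, work = spmd_scale([1.0] * 100, nprocs=4, factor=3.0)
```

A real SPMD program would restrict the loop bounds to the local block rather than test ownership for every index; the test is kept here only to make the rule visible.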

Example
    real s, X(100), Y(100)        ! s is scalar, X and Y are arrays
    X = X * 3.0                   ! Multiply each X(i) by 3.0
    do i = 2,99
      Y(i) = (X(i-1) + X(i+1))/2  ! Communication required
    enddo
    s = SUM(X)                    ! Communication required
X and Y are distributed (partitioned); s is replicated on each machine.
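To see where the stencil loop needs communication, one can list the reads that cross an ownership boundary. This sketch assumes X and Y are block-distributed identically over 4 processors, which the slide does not state; `remote_reads` is a hypothetical helper.

```python
def block_owner(i, n, nprocs):
    """Owner (0-based) of 1-based element i under a block distribution."""
    block = -(-n // nprocs)  # ceiling division
    return (i - 1) // block


def remote_reads(n, nprocs):
    """List (i, neighbour) pairs where computing Y(i) reads an X element owned elsewhere."""
    remote = []
    for i in range(2, n):                       # do i = 2, n-1
        me = block_owner(i, n, nprocs)          # the owner of Y(i) does the work
        for j in (i - 1, i + 1):
            if block_owner(j, n, nprocs) != me:
                remote.append((i, j))
    return remote


pairs = remote_reads(100, 4)
```

Under these assumptions only the elements next to a block boundary involve another processor; all other iterations are purely local.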

HPF primitives for data distribution
• Directives:
    PROCESSORS: shape & size of abstract processors
    ALIGN: align elements of different arrays
    DISTRIBUTE: distribute (partition) an array
• Directives affect the performance of the program, not its result

Processors directive
    !HPF$ PROCESSORS P(32)
    !HPF$ PROCESSORS Q(4,8)
• Mapping of abstract to physical processors is not specified in HPF (implementation-dependent)

Alignment directive
• Aligns an array with another array
• Specifies that specific elements should be mapped to the same processor
    real A(50), B(50)
    !HPF$ ALIGN A(I) WITH B(I)      ! A(1) on same cpu as B(1), etc
    !HPF$ ALIGN A(I) WITH B(I+2)    ! A(1) on same cpu as B(3), etc
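The effect of an alignment with an offset can be sketched as a lookup: A(i) lives wherever its target element of B lives. The example below assumes B(50) is block-distributed over 4 processors, which the slide does not specify; `aligned_owner` is a hypothetical helper, and this sketch simply refuses indices whose target falls outside B.

```python
def block_owner(i, n, nprocs):
    """Owner (0-based) of 1-based element i under a block distribution."""
    block = -(-n // nprocs)  # ceiling division
    return (i - 1) // block


def aligned_owner(i, offset, n_target, nprocs):
    """Owner of A(i) when A(I) is aligned WITH B(I+offset) and B is block-distributed."""
    target = i + offset
    if not 1 <= target <= n_target:
        raise IndexError(f"A({i}) would align with B({target}), outside B(1:{n_target})")
    return block_owner(target, n_target, nprocs)


# Assumed setup: B(50) over 4 processors, so the block size is 13
same_index = [aligned_owner(i, 0, 50, 4) for i in (1, 13, 14)]   # ALIGN A(I) WITH B(I)
shifted = [aligned_owner(i, 2, 50, 4) for i in (1, 11, 12)]      # ALIGN A(I) WITH B(I+2)
```

With the offset of 2, the block boundary seen by A moves two elements earlier: A(12) already lands on the second processor because B(14) does.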

Distribution directive
• Specifies how elements should be partitioned among the local memories
• Each dimension can be distributed as follows:
    *            no distribution
    BLOCK (n)    block distribution
    CYCLIC (n)   cyclic distribution
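The two distributions can be written as index-to-owner functions. This is a minimal sketch with hypothetical helper names, using 8 elements on 2 processors; processors are numbered from 0 and array indices from 1.

```python
def block_owner(i, n, nprocs, size=None):
    """BLOCK(size): contiguous blocks; the default size is ceil(n / nprocs)."""
    if size is None:
        size = -(-n // nprocs)
    return (i - 1) // size


def cyclic_owner(i, nprocs, size=1):
    """CYCLIC(size): blocks of `size` elements dealt out round-robin."""
    return ((i - 1) // size) % nprocs


n, p = 8, 2
block = [block_owner(i, n, p) for i in range(1, n + 1)]
cyclic = [cyclic_owner(i, p) for i in range(1, n + 1)]
cyclic2 = [cyclic_owner(i, p, size=2) for i in range(1, n + 1)]
```

Block distribution keeps neighbours together, which suits stencil-style access; cyclic distribution spreads consecutive elements over processors, which tends to help load balance when the work per element varies.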

Figure 7.7 from Foster's book

Example: Successive Overrelaxation (SOR)
Recall the algorithm discussed in the Introduction:
    float G[1:N, 1:M], Gnew[1:N, 1:M];
    for (step = 0; step < NSTEPS; step++) {
      for (i = 2; i < N; i++)        /* update grid */
        for (j = 2; j < M; j++)
          Gnew[i,j] = f(G[i,j], G[i-1,j], G[i+1,j], G[i,j-1], G[i,j+1]);
      G = Gnew;
    }

Parallel SOR with message passing
    float G[lb-1:ub+1, 1:M], Gnew[lb-1:ub+1, 1:M];
    for (step = 0; step < NSTEPS; step++) {
      SEND(cpuid-1, G[lb]);          /* send 1st row left */
      SEND(cpuid+1, G[ub]);          /* send last row right */
      RECEIVE(cpuid-1, G[lb-1]);     /* receive from left */
      RECEIVE(cpuid+1, G[ub+1]);     /* receive from right */
      for (i = lb; i <= ub; i++)     /* update my rows */
        for (j = 2; j < M; j++)
          Gnew[i,j] = f(G[i,j], G[i-1,j], G[i+1,j], G[i,j-1], G[i,j+1]);
      G = Gnew;
    }
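The row exchange in the message-passing version can be checked against the sequential version with a small simulation. In this sketch `f` is a hypothetical update function (a plain four-neighbour average), the grid is 8x8 with made-up values, and copying a neighbour's boundary row stands in for SEND/RECEIVE. It assumes the number of CPUs divides the number of rows.

```python
def f(c, up, down, left, right):
    """Hypothetical update function: plain average of the four neighbours."""
    return (up + down + left + right) / 4.0


def step_sequential(G):
    """One sweep over all interior points of the whole grid."""
    n, m = len(G), len(G[0])
    new = [row[:] for row in G]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            new[i][j] = f(G[i][j], G[i-1][j], G[i+1][j], G[i][j-1], G[i][j+1])
    return new


def step_partitioned(G, nprocs):
    """One sweep where each simulated CPU owns a band of rows plus ghost rows."""
    n, m = len(G), len(G[0])
    rows = -(-n // nprocs)                        # rows per CPU (ceiling)
    new = [row[:] for row in G]
    for cpu in range(nprocs):
        lb, ub = cpu * rows, min((cpu + 1) * rows, n) - 1
        local = {i: G[i] for i in range(lb, ub + 1)}
        if lb > 0:
            local[lb - 1] = G[lb - 1][:]          # ghost row "received" from cpu-1
        if ub < n - 1:
            local[ub + 1] = G[ub + 1][:]          # ghost row "received" from cpu+1
        for i in range(max(lb, 1), min(ub, n - 2) + 1):   # skip fixed boundary rows
            for j in range(1, m - 1):
                new[i][j] = f(local[i][j], local[i-1][j], local[i+1][j],
                              local[i][j-1], local[i][j+1])
    return new


grid = [[float(i * i + j) for j in range(8)] for i in range(8)]
seq = step_sequential(grid)
par = step_partitioned(grid, nprocs=4)
```

Each simulated CPU touches only its own rows and the two ghost rows, yet the result matches the sequential sweep, which is the point of the halo exchange.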

Finite differencing (~ SOR) in HPF
See Ian Foster, Program 7.2, which uses a convergence criterion instead of a fixed number of steps.
    program hpf_finite_difference
    !HPF$ PROCESSORS pr(4)                    ! use 4 CPUs
    real X(100, 100), New(100, 100)           ! data arrays
    !HPF$ ALIGN New(:,:) WITH X(:,:)
    !HPF$ DISTRIBUTE X(BLOCK,*) ONTO pr       ! row-wise
    New(2:99, 2:99) = (X(1:98, 2:99) + X(3:100, 2:99) + &
                       X(2:99, 1:98) + X(2:99, 3:100))/4
    diffmax = MAXVAL (ABS (New-X))
    end
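The slide shows a single sweep; a convergence-driven loop around it might look like the following Python sketch. The grid size, boundary values, tolerance, and the names `jacobi_step` and `solve` are assumptions for illustration and are not taken from Foster's Program 7.2.

```python
def jacobi_step(X):
    """Return (New, diffmax) for one four-point averaging sweep over the interior."""
    n, m = len(X), len(X[0])
    new = [row[:] for row in X]
    diffmax = 0.0
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            new[i][j] = (X[i-1][j] + X[i+1][j] + X[i][j-1] + X[i][j+1]) / 4.0
            diffmax = max(diffmax, abs(new[i][j] - X[i][j]))
    return new, diffmax


def solve(X, tol=1e-4, max_steps=10_000):
    """Repeat sweeps until the largest change falls below tol."""
    diffmax = float("inf")
    for step in range(1, max_steps + 1):
        X, diffmax = jacobi_step(X)
        if diffmax < tol:
            return X, step, diffmax
    return X, max_steps, diffmax


# Hypothetical problem: 10x10 grid, top edge held at 1.0, everything else 0.0
grid = [[1.0 if i == 0 else 0.0 for _ in range(10)] for i in range(10)]
result, steps, diffmax = solve(grid)
```

In HPF the reduction `MAXVAL` is where processors would have to combine their local maxima, so the convergence test itself costs communication on every step.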

Changing the distribution
Use a two-dimensional block distribution instead of a row-wise distribution:
    program hpf_finite_difference
    !HPF$ PROCESSORS pr(2,2)                      ! use 2x2 grid
    real X(100, 100), New(100, 100)               ! data arrays
    !HPF$ ALIGN New(:,:) WITH X(:,:)
    !HPF$ DISTRIBUTE X(BLOCK, BLOCK) ONTO pr      ! block-wise
    New(2:99, 2:99) = (X(1:98, 2:99) + X(3:100, 2:99) + &
                       X(2:99, 1:98) + X(2:99, 3:100))/4
    diffmax = MAXVAL (ABS (New-X))
    end
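One way to compare the two distributions is to count how many stencil reads cross an ownership boundary in a single sweep. The sketch below does that for the 100x100 grid on 4 processors; the helper names are hypothetical, and the count covers array elements only, ignoring per-message startup cost.

```python
def owner_rows(i, j, n, p):
    """(BLOCK,*) over p processors: rows split into p bands."""
    return (i - 1) // (-(-n // p))


def owner_grid(i, j, n, pr, pc):
    """(BLOCK,BLOCK) over a pr x pc grid of processors."""
    return ((i - 1) // (-(-n // pr)), (j - 1) // (-(-n // pc)))


def remote_reads(n, owner):
    """Count stencil reads of X that cross an ownership boundary."""
    count = 0
    for i in range(2, n):                         # New(2:n-1, 2:n-1)
        for j in range(2, n):
            me = owner(i, j)
            for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                if owner(a, b) != me:
                    count += 1
    return count


n = 100
rowwise = remote_reads(n, lambda i, j: owner_rows(i, j, n, 4))
blockwise = remote_reads(n, lambda i, j: owner_grid(i, j, n, 2, 2))
```

Under these assumptions the 2x2 block layout moves fewer elements in total, because square blocks have a shorter perimeter than long row bands of the same area.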

Performance
Distribution affects:
• Load balance
• Amount of communication
Example (communication costs):
    !HPF$ PROCESSORS pr(3)
    integer A(8), B(8), C(8)
    !HPF$ ALIGN B(:) WITH A(:)
    !HPF$ DISTRIBUTE A(BLOCK) ONTO pr
    !HPF$ DISTRIBUTE C(CYCLIC) ONTO pr
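The owners implied by these directives can be tabulated to estimate communication. The sketch below assumes two hypothetical element-wise assignments, A(i) = B(i) and A(i) = C(i), and counts the elements whose source lives on a different processor than the destination.

```python
def block_owner(i, n, p):
    """Owner (0-based) of 1-based element i under BLOCK."""
    return (i - 1) // (-(-n // p))


def cyclic_owner(i, p):
    """Owner (0-based) of 1-based element i under CYCLIC."""
    return (i - 1) % p


n, p = 8, 3
owner_A = [block_owner(i, n, p) for i in range(1, n + 1)]
owner_B = owner_A[:]                        # ALIGN B(:) WITH A(:)
owner_C = [cyclic_owner(i, p) for i in range(1, n + 1)]

# Hypothetical statements: A(i) = B(i) versus A(i) = C(i)
moves_B = sum(a != b for a, b in zip(owner_A, owner_B))
moves_C = sum(a != c for a, c in zip(owner_A, owner_C))
```

Because B is aligned with A, the first assignment is entirely local, while mixing a BLOCK array with a CYCLIC one moves most elements between processors.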

Figure 7.9 from Foster's book

Historical Evaluation
• See: "The rise and fall of High Performance Fortran: an historical object lesson" by Ken Kennedy, Charles Koelbel, and Hans Zima. In: Proceedings of the third ACM SIGPLAN conference on History of programming languages, June 2007 [Optional, obtainable from ACM Digital Library]

Problems with HPF
• Immature compiler technology
  – Upgrading to Fortran 90 was complicated
  – Implementing the HPF extensions took a long time
• The HPC community was impatient and started using MPI
• Missing features:
  – Support for sparse arrays and other irregular data structures
• Obtaining portable performance was difficult
• Performance tuning was difficult

Impact of HPF
• Huge impact on parallel language design
  – Very frequently cited
  – Some impact on OpenMP (shared-memory standard)
  – Impact on programming systems for GPUs
  – New wave of High Productivity Computing Systems (HPCS) languages: Chapel (Cray), Fortress (Sun), X10 (IBM)
• Used in extended form (HPF/JA) for the Japanese Earth Simulator

Conclusions
• High-level model
• User specifies the data distribution
• Compiler generates the parallel program and its communication
• More restrictive than the general message passing model (only data parallelism)
• Restricted to array-based data structures
• HPF programs are easy to modify, which enhances portability
• Changing the data distribution only requires changing directives
