Steve Deitz Cray Inc.
A new parallel language
  - Under development at Cray Inc.
  - Supported through the DARPA HPCS program
Goals
  - Improve programmer productivity
  - Improve the programmability of parallel computers
  - Match or improve performance of MPI/UPC/CAF
  - Provide better portability than MPI/UPC/CAF
  - Improve robustness of parallel codes
  - Support multi-core and multi-node systems
  - What is Chapel?
  - Chapel’s Parallel Programming Model
  - HPCC STREAM Triad in Chapel
  - HPCC RA in Chapel
  - Summary and Future Work
HPCC STREAM and RA in Chapel
Programming model
  - The mental model of a programmer
Fragmented models
  - Programmers take point-of-view of a single processor/thread
SPMD models (Single Program, Multiple Data)
  - Fragmented models with multiple copies of one program
Global-view models
  - Programmers write code to describe computation as a whole
Chapel: Background
Index          1     2     3     4     5     6
Initial state  2.0   0.0   0.0   0.0   0.0   12.0
Iteration 1    2.0   1.0   0.0   0.0   6.0   12.0
Iteration 2    2.0   1.0   0.5   3.0   6.0   12.0
Iteration 3    2.0   1.25  2.0   3.25  7.5   12.0
...
Steady state   2.0   4.0   6.0   8.0   10.0  12.0
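The rows above are consistent with replacing every interior element by the average of its two neighbours while the end values stay fixed, which is the update the code slide uses. A minimal Python sketch of that iteration follows; the function names `jacobi_step` and `iterate` are illustrative and do not come from the slides.

```python
def jacobi_step(a):
    """Return a new list in which each interior element is the mean of its two neighbours."""
    b = list(a)  # the two boundary values are carried over unchanged
    for i in range(1, len(a) - 1):
        b[i] = (a[i - 1] + a[i + 1]) / 2
    return b


def iterate(a, steps):
    """Apply jacobi_step repeatedly and return every state, starting with the initial one."""
    history = [list(a)]
    for _ in range(steps):
        a = jacobi_step(a)
        history.append(a)
    return history


initial = [2.0, 0.0, 0.0, 0.0, 0.0, 12.0]
for step, state in enumerate(iterate(initial, 3)):
    print(step, state)
```

Each step reads only the previous state, so all interior elements can be updated independently, which is what makes the computation a natural fit for a parallel loop.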
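Global-View vs. Fragmented Computation
[Figure: two panels, Global-View and Fragmented. The global-view panel shows the neighbour average ( + )/2 = applied once to the whole array; the fragmented panel shows the same average applied separately to several pieces of the array.]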
Global-View vs. Fragmented Code

Global-View

    def main() {
      var n = 1000;
      var A, B: [1..n] real;

      forall i in 2..n-1 do
        B(i) = (A(i-1)+A(i+1))/2;
    }

Fragmented (assumes p divides n)

    def main() {
      var n = 1000;
      var me = commID(), p = commProcs(),
          myN = n/p, myLo = 1, myHi = myN;
      var A, B: [0..myN+1] real;

      if me < p {
        send(me+1, A(myN));
        recv(me+1, A(myN+1));
      } else myHi = myN-1;
      if me > 1 {
        send(me-1, A(1));
        recv(me-1, A(0));
      } else myLo = 2;
      for i in myLo..myHi do
        B(i) = (A(i-1)+A(i+1))/2;
    }
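To make the comparison concrete, the following Python sketch simulates the fragmented version in a single process and checks it against the global-view loop. It is only an illustration under stated assumptions: the "processes" are entries in a list, the send/recv pairs are replaced by direct copies into halo cells, processes are numbered from 0 rather than 1, and the names `global_view` and `fragmented` are hypothetical.

```python
def global_view(a):
    """Whole-array formulation: average the two neighbours of every interior element."""
    n = len(a)
    b = [0.0] * n
    for i in range(1, n - 1):
        b[i] = (a[i - 1] + a[i + 1]) / 2
    return b


def fragmented(a, p):
    """Per-process formulation, simulated sequentially for p processes."""
    n = len(a)
    assert n % p == 0, "like the slide's code, this assumes p divides n"
    my_n = n // p

    # Each simulated process owns my_n elements in local[1..my_n]
    # plus two halo cells at local index 0 and my_n + 1.
    local = []
    for me in range(p):
        chunk = a[me * my_n:(me + 1) * my_n]
        local.append([0.0] + chunk + [0.0])

    # Boundary exchange: direct copies stand in for the send/recv pairs.
    for me in range(p):
        if me < p - 1:
            local[me][my_n + 1] = local[me + 1][1]
        if me > 0:
            local[me][0] = local[me - 1][my_n]

    # Local computation; the first and last process skip the global end points.
    result = []
    for me in range(p):
        lo = 2 if me == 0 else 1
        hi = my_n - 1 if me == p - 1 else my_n
        b = [0.0] * (my_n + 2)
        for i in range(lo, hi + 1):
            b[i] = (local[me][i - 1] + local[me][i + 1]) / 2
        result.extend(b[1:my_n + 1])
    return result


data = [float((i * i) % 7) for i in range(12)]
print(fragmented(data, 4) == global_view(data))
```

Even in this toy form, most of the fragmented function is bookkeeping (chunk bounds, halo cells, special cases at the ends) rather than the averaging itself, which is the point the slide makes.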
[Slide: Fortran + Co-Array Fortran source shown in several narrow columns. It contains the subroutines rprj3, comm3, comm1p, give3, and take3, which use caf_intrinsics and include 'cafnpb.h' and 'globals.h'. The column layout did not survive extraction, so the listing is not reproduced here.]

The same rprj3 stencil in Chapel:

    def rprj3(S, R) {
      const Stencil = [-1..1, -1..1, -1..1],
            W: [0..3] real = (0.5, 0.25, 0.125, 0.0625),
            W3D = [(i,j,k) in Stencil] W((i!=0)+(j!=0)+(k!=0));

      forall inds in S.domain do
        S(inds) = + reduce [offset in Stencil]
                      (W3D(offset) * R(inds + offset*R.stride));
    }
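The weight table in the Chapel version is indexed by how many components of the stencil offset are nonzero: 0.5 for the centre, 0.25 for the 6 face neighbours, 0.125 for the 12 edge neighbours, and 0.0625 for the 8 corners. A rough Python sketch of the same idea is given below. The grid layout is an assumption made for illustration (nested lists for a cubic fine grid, coarse point j centred at fine index stride*j + 1) and may differ from how the original code maps coarse to fine indices.

```python
from itertools import product

# Weights indexed by the number of nonzero components in a stencil offset.
W = (0.5, 0.25, 0.125, 0.0625)
STENCIL = list(product((-1, 0, 1), repeat=3))
W3D = {off: W[sum(c != 0 for c in off)] for off in STENCIL}


def rprj3(r, stride=2):
    """Restrict a cubic fine grid r (nested lists) to a coarser grid.

    Assumed layout: coarse point j sits at fine index stride*j + 1,
    so every coarse point has a full 3x3x3 neighbourhood in r.
    """
    n = len(r)
    m = (n - 3) // stride + 1  # coarse points per dimension
    s = [[[0.0] * m for _ in range(m)] for _ in range(m)]
    for j1, j2, j3 in product(range(m), repeat=3):
        c1, c2, c3 = stride * j1 + 1, stride * j2 + 1, stride * j3 + 1
        # Weighted sum over the 27-point neighbourhood (the "+ reduce").
        s[j1][j2][j3] = sum(
            W3D[(o1, o2, o3)] * r[c1 + o1][c2 + o2][c3 + o3]
            for o1, o2, o3 in STENCIL
        )
    return s


ones = [[[1.0] * 5 for _ in range(5)] for _ in range(5)]
print(rprj3(ones)[0][0][0])  # sum of all 27 weights
```

As in the Chapel code, no communication appears anywhere in the sketch; in the global-view model, data movement is left to the language implementation.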
Given: m-element vectors A, B, C
Compute: forall i in 1..m do A(i) = B(i) + α * C(i);
[Figure: the computation A = B + α * C drawn first as a single whole-vector operation, then as four independent pieces.]
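Because each A(i) depends only on B(i) and C(i), the triad can be split into independent pieces without changing the result. The Python sketch below illustrates this with four chunks that are processed one after another; it is a sequential stand-in for the parallel loop, and the names `triad` and `triad_chunked` are illustrative rather than taken from the slides.

```python
def triad(b, c, alpha):
    """Whole-vector STREAM Triad: a[i] = b[i] + alpha * c[i]."""
    return [bi + alpha * ci for bi, ci in zip(b, c)]


def triad_chunked(b, c, alpha, chunks=4):
    """Same computation carried out on contiguous chunks of the index range."""
    m = len(b)
    a = [0.0] * m
    for k in range(chunks):
        lo = k * m // chunks
        hi = (k + 1) * m // chunks
        # Each chunk touches only its own slice, so chunks could run in any order.
        a[lo:hi] = triad(b[lo:hi], c[lo:hi], alpha)
    return a


B = [float(i) for i in range(10)]
C = [float(2 * i + 1) for i in range(10)]
print(triad_chunked(B, C, 3.0) == triad(B, C, 3.0))
```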