  1. Using the Global Arrays Toolkit to Reimplement NumPy for Distributed Computation
     Jeff Daily, Pacific Northwest National Laboratory, jeff.daily@pnnl.gov
     Robert R. Lewis, Washington State University, bobl@tricity.wsu.edu
     SciPy, July 13, 2011

  2. Motivation
     • Lots of NumPy applications
     • NumPy (and Python) are for the most part single-threaded
     • Resources underutilized
     • Computers have multiple cores
     • Academic/business clusters are common
     • Lots of parallel libraries and programming languages: Message Passing Interface (MPI), Global Arrays (GA), X10, Co-Array Fortran, OpenMP, Unified Parallel C, Chapel, Titanium, Cilk
     • Can we transparently parallelize NumPy?

  3. Background – Parallel Programming
     • Single Program, Multiple Data (SPMD)
     • Each process runs the same copy of the program
     • Different branches of code are run by different processes

         if my_id == 0:
             foo()
         else:
             bar()

  4. Background – Message Passing Interface
     • Each process is assigned a rank starting from 0
     • Excellent Python bindings – mpi4py
     • Two models of communication:
       • Two-sided, i.e. message passing (MPI-1 standard)
       • One-sided (MPI-2 standard)

         if MPI.COMM_WORLD.rank == 0:
             foo()
         else:
             bar()
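     A minimal runnable sketch of this rank-based SPMD branching with two-sided messaging in mpi4py; the payload contents and tag value are made up for illustration.

         # Minimal mpi4py sketch of SPMD rank branching and two-sided messaging.
         # Run with: mpiexec -np 2 python two_sided.py
         from mpi4py import MPI

         comm = MPI.COMM_WORLD
         rank = comm.Get_rank()

         if rank == 0:
             payload = {"greeting": "hello", "from": rank}   # illustrative picklable data
             comm.send(payload, dest=1, tag=7)               # the sender must participate ...
         elif rank == 1:
             payload = comm.recv(source=0, tag=7)            # ... and so must the receiver
             print("rank 1 received:", payload)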

  5. Background – Communication Models
     • Message passing (MPI): a message requires cooperation on both sides. The processor sending the message (P1) and the processor receiving the message (P0) must both participate.
     • One-sided communication (SHMEM, ARMCI, MPI-2 one-sided): once the message is initiated on the sending processor (P1), it can continue computation. The receiving processor (P0) is not involved; data is copied directly from the switch into memory on P0.
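     For contrast, a minimal mpi4py sketch of the one-sided model (MPI-2 RMA): P1 deposits data directly into P0's exposed memory with a Put, and P0 never posts a receive. The window size and values are illustrative.

         # Minimal mpi4py one-sided (MPI-2 RMA) sketch; sizes and values are illustrative.
         # Run with: mpiexec -np 2 python one_sided.py
         import numpy as np
         from mpi4py import MPI

         comm = MPI.COMM_WORLD
         rank = comm.Get_rank()

         local = np.zeros(4)                       # memory every process exposes
         win = MPI.Win.Create(local, comm=comm)    # RMA window over that memory

         win.Fence()                               # open an access epoch (collective)
         if rank == 1:
             src = np.arange(4, dtype=np.float64)
             win.Put([src, MPI.DOUBLE], 0)         # copy src straight into rank 0's window;
                                                   # rank 0 does not call recv at all
         win.Fence()                               # close the epoch; data now visible

         if rank == 0:
             print("rank 0 window holds:", local)
         win.Free()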

  6. Background – Global Arrays
     (Figure: physically distributed data owned by processes 0–7, presented as a single global address space)
     • Distributed dense arrays that can be accessed through a shared-memory-like style
     • Single, shared data structure / global indexing
       • e.g., ga.get(a, (3,2)) rather than buf[6] on process 1
     • Local array portions can be ga.access()'d
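     A hedged sketch of the usage described here, assuming the GA Python bindings are importable as `ga` and the script is launched under MPI; the array size and patch bounds are illustrative, and whether `hi` is inclusive or exclusive follows the bindings' own convention. This is not GAiN code, just the underlying toolkit.

         # Hedged GA sketch; run with: mpiexec -np 4 python ga_sketch.py
         import ga                          # Global Arrays Python bindings (assumed import name)

         ga.initialize()
         me = ga.nodeid()                   # this process's rank within GA

         # One globally indexed, physically distributed 2D array of doubles.
         g_a = ga.create(ga.C_DBL, (6, 12))
         ga.zero(g_a)

         if me == 0:
             # Global indexing: fetch a patch regardless of which processes own it
             # (cf. buf = ga.get(g_a, lo, hi, buffer) on the next slide).
             patch = ga.get(g_a, lo=(1, 2), hi=(3, 5))
             print("fetched patch of shape", patch.shape)

         # Each process may also touch only the block it owns, as a local ndarray.
         local = ga.access(g_a)             # may be None if this process owns no block
         if local is not None:
             local[...] = me                # owner fills its own block
             ga.release_update(g_a)         # hand the modified block back to GA

         ga.sync()
         ga.destroy(g_a)
         ga.terminate()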

  7. Remote Data Access in GA vs MPI

     Message Passing:
         identify size and location of data blocks
         loop over processors:
             if (me == P_N) then
                 pack data in local message buffer
                 send block of data to message buffer on P0
             else if (me == P0) then
                 receive block of data from P_N in message buffer
                 unpack data from message buffer to local buffer
             endif
         end loop
         copy local data on P0 to local buffer

     Global Arrays:
         buf = ga.get(g_a, lo=None, hi=None, buffer=None)
         (g_a: Global Array handle; lo, hi: global lower and upper indices of the data patch; buffer: local ndarray)
     (Figure: a patch spanning processes P0–P3 fetched into a single local buffer)

  8. Background – Global Arrays
     • Shared data model in the context of distributed dense arrays
     • Much simpler than message passing for many applications
     • Complete environment for parallel code development
     • Compatible with MPI
     • Data locality control similar to the distributed memory / message passing model
     • Extensible
     • Scalable

  9. Previous Work to Parallelize NumPy
     • Star-P
     • Global Arrays Meets MATLAB (yes, it's not NumPy, but…)
     • IPython
     • gpupy
     • Co-Array Python

  10. Design for Global Arrays in NumPy (GAiN)
     • All documented NumPy functions are collective
     • GAiN programs run in SPMD fashion
     • Not all arrays should be distributed
     • GAiN operations should allow mixed NumPy/GAiN inputs
     • Reuse as much of NumPy as possible (obviously)
     • Distributed nature of arrays should be transparent to the user
     • Use the owner-computes rule to attempt data-locality optimizations

  11. Why Subclassing numpy.ndarray Fails
     • The hooks: __new__(), __array_prepare__(), __array_finalize__(), __array_priority__
     • The first hook, __array_prepare__(), is called after the output array has been created
     • No means of intercepting array creation
     • Array is allocated on each process – not distributed
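     An illustrative sketch (not GAiN code) of why the hooks come too late, using the NumPy 1.x-era subclassing hooks the slide names; later NumPy releases deprecate __array_prepare__, so treat this as a period-accurate illustration rather than current best practice.

         # Illustrative only: a subclass showing that the hooks fire after allocation.
         import numpy as np

         class Traced(np.ndarray):
             def __new__(cls, shape, dtype=float):
                 # Still allocates an ordinary, process-local buffer.
                 return np.ndarray.__new__(cls, shape, dtype)

             def __array_finalize__(self, obj):
                 # Runs for every new view/result, i.e. after its memory already exists.
                 print("__array_finalize__: buffer already allocated locally")

             def __array_prepare__(self, out_arr, context=None):
                 # First chance to touch a ufunc's output, but out_arr is already a
                 # fully allocated local ndarray; there is nothing left to distribute.
                 print("__array_prepare__: output shape", out_arr.shape, "already exists")
                 return out_arr

         a = Traced((4, 4))
         a[...] = 1.0
         b = a + a   # hooks run, yet every process would still hold the whole array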

  12. The gain.ndarray in a Nutshell
     (Figure: a (6,12) global array, i.e. [0:6,0:12], distributed over P processes as (3,3) local blocks [0:3,0:3], [0:3,3:6], …, [3:6,9:12])
     • Global shape and P local shapes
     • Memory allocated from the Global Arrays library, wrapped in a local numpy.ndarray
     • The memory distribution is static
     • Views and array operations query the current global_slice

  13. Example: Slice Arithmetic
     • Observation: in both cases shown here, array b could be created either using the standard notation (top) or the "canonical" form (bottom)

         # Case 1: standard notation
         a = ndarray(6,12)
         b = a[::2,::3]
         c = b[1,:]
         # Case 1: canonical form
         b = a[slice(0,6,2), slice(0,12,3)]
         c = a[2, slice(0,12,3)]

         # Case 2: standard notation
         a = ndarray(6,12)
         b = a[1:-1,1:-1]
         c = b[1:-1,1:-1]
         # Case 2: canonical form
         b = a[slice(1,5,1), slice(1,11,1)]
         c = a[slice(2,4,1), slice(2,10,1)]
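     A small runnable illustration of this slice arithmetic (not GAiN's internal code): composing a slice of a slice into a single canonical slice on the base array via slice.indices(), checked against NumPy on the slide's second example. Positive steps only, for brevity.

         # Illustrative sketch: compose a slice of a slice into one canonical slice.
         import numpy as np

         def canonical(s, length):
             """slice s over an axis of this length, with concrete non-negative bounds."""
             start, stop, step = s.indices(length)
             return slice(start, stop, step)

         def compose(outer_len, first, second):
             """Canonical slice equivalent to applying `second` after `first`
             on an axis of length outer_len (positive steps only)."""
             f = canonical(first, outer_len)
             inner_len = len(range(f.start, f.stop, f.step))
             s = canonical(second, inner_len)
             return slice(f.start + s.start * f.step,
                          f.start + s.stop * f.step,
                          f.step * s.step)

         # Check the slide's second example: a[1:-1,1:-1] followed by [1:-1,1:-1].
         a = np.arange(6 * 12).reshape(6, 12)
         composed = tuple(compose(n, slice(1, -1), slice(1, -1)) for n in a.shape)
         assert np.array_equal(a[1:-1, 1:-1][1:-1, 1:-1], a[composed])
         print(composed)   # (slice(2, 4, 1), slice(2, 10, 1)), the canonical form above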

  14. Example: Binary Ufunc
     (Figure: two identically distributed input arrays added to produce a distributed output)
     • Owner-computes rule means the output array owner does the work
     • ga.access() the other input array portions, since all distributions and shapes are the same
     • Call the original NumPy ufunc on the pieces
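     A hedged sketch of this aligned owner-computes pattern, assuming the GA Python bindings are importable as `ga`, that ga.access() hands back the calling process's block as a local numpy.ndarray (slide 6), and the usual GA access/release discipline; it is not the actual GAiN ufunc dispatch, and the helper name is made up.

         # Hedged sketch of the aligned owner-computes pattern (not actual GAiN code).
         import ga
         import numpy as np

         def add_aligned(g_a, g_b, g_c):
             """c = a + b where all three global arrays share the same distribution."""
             c_loc = ga.access(g_c)                # block owned by this process (may be None)
             if c_loc is not None:
                 a_loc = ga.access(g_a)            # same distribution => same local bounds
                 b_loc = ga.access(g_b)
                 np.add(a_loc, b_loc, out=c_loc)   # original NumPy ufunc on the pieces
                 ga.release(g_a)                   # hand the read-only blocks back ...
                 ga.release(g_b)
                 ga.release_update(g_c)            # ... and flag the output block as modified
             ga.sync()                             # collective: wait until every owner is done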

  15. Example: Binary Ufunc with Sliced Arrays
     (Figure: sliced views of distributed arrays added to produce a distributed output)
     • Owner-computes rule means the output array owner does the work
     • ga.get() the other input array portions, since the arrays are not aligned
     • Call the original NumPy ufunc
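     A hedged sketch of this unaligned case, again assuming the GA Python bindings are importable as `ga`; the offset parameters are made-up illustrations of input views that start at different global indices, and the return convention assumed for ga.distribution() should be checked against the bindings' documentation. This is not the actual GAiN ufunc code.

         # Hedged sketch of the unaligned owner-computes case (not actual GAiN code).
         import ga
         import numpy as np

         def add_unaligned(g_a, g_b, g_c, a_offset, b_offset):
             """c = a_view + b_view, where each input view starts at a different global
             offset than c, so the needed input data generally lives on other processes."""
             c_loc = ga.access(g_c)                    # block this process owns (may be None)
             if c_loc is not None:
                 lo, hi = ga.distribution(g_c)         # assumed: global bounds of the owned block
                 lo, hi = np.asarray(lo), np.asarray(hi)
                 # One-sided ga.get() fetches the matching input patches without any
                 # cooperation from the processes that own them.
                 a_patch = ga.get(g_a, lo=lo + a_offset, hi=hi + a_offset)
                 b_patch = ga.get(g_b, lo=lo + b_offset, hi=hi + b_offset)
                 np.add(a_patch, b_patch, out=c_loc)   # original NumPy ufunc does the math
                 ga.release_update(g_c)
             ga.sync()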

  16. Example: Binary Ufunc
     (Figure: a distributed array added to a small, non-distributed array)
     • Broadcasting works too
     • Not all arrays are distributed

  17. How to Use GAiN
     Ideally, change one line in your script:

         #import numpy
         import ga.gain as numpy

     Run using the MPI process manager:

         $ mpiexec -np 4 python script.py

  18. Live Demo: laplace.py
     2D Laplace equation using an iterative finite difference scheme (four-point averaging, Gauss-Seidel or Gauss-Jordan).
     I'll now show you how to use GAiN. (This is not the "pretty pictures" part of the presentation – there's nothing pretty about raw computation.)
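     The demo script itself is not reproduced in these notes; below is a hedged sketch of the NumPy-slicing form of the four-point-averaging update that scripts like laplace.py perform, the kind of code that runs unchanged after the one-line import swap. Grid size, boundary values, and iteration count are illustrative.

         # Hedged sketch of a four-point-averaging Laplace update in NumPy slicing style.
         # To try it under GAiN, swap the import for `import ga.gain as numpy` and launch
         # with `mpiexec -np 4 python laplace_sketch.py`.
         import numpy

         nx, ny = 500, 500
         u = numpy.zeros((nx, ny))
         u[0, :] = 1.0                      # fixed boundary condition on one edge

         dx2 = dy2 = 1.0
         for _ in range(100):
             # Update the interior with the average of the four neighbours.
             u[1:-1, 1:-1] = ((u[:-2, 1:-1] + u[2:, 1:-1]) * dy2 +
                              (u[1:-1, :-2] + u[1:-1, 2:]) * dx2) / (2.0 * (dx2 + dy2))

         print("mean interior value:", u[1:-1, 1:-1].mean())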

  19. laplace.py Again, but Bigger

  20. GAiN is Not Complete (yet)
     • What's finished:
       • Ufuncs (all, but not reduceat or outer)
       • ndarray (mostly)
       • flatiter
       • numpy dtypes are reused!
       • Various array creation and other functions:
         • zeros, zeros_like, ones, ones_like, empty, empty_like
         • eye, identity, fromfunction, arange, linspace, logspace
         • dot, diag, clip, asarray
     • Everything else doesn't exist, including order=
     • GAiN is here to stay – it's officially supported by the GA project (me!)

  21. Thanks! Time for Questions
     Using the Global Arrays Toolkit to Reimplement NumPy for Distributed Computation
     Jeff Daily, Pacific Northwest National Laboratory, jeff.daily@pnnl.gov
     Robert R. Lewis, Washington State University, bobl@tricity.wsu.edu
     Where to get the code until the pnl.gov domain is restored: https://github.com/jeffdaily/Global-Arrays-Scipy-2011
     Where to get the code, usually: https://svn.pnl.gov/svn/hpctools/trunk/ga
     Website (documentation, download releases, etc.): http://www.emsl.pnl.gov/docs/global
