Parallel computing with Python
Delft University of Technology
Álvaro Leitao Rodríguez
December 10, 2014
Outline
1 Python tools for parallel computing
2 Parallel Python
   What is PP?
   API
3 MPI for Python
   MPI
   mpi4py
4 GPU computing with Python
   GPU computing
   CUDA
   PyCUDA
   Anaconda Accelerate - Numbapro
Symmetric multiprocessing
• Multiprocessing: included in the standard library.
• Parallel Python.
• IPython.
• Others: POSH, pprocess, etc.
Cluster computing
• Message Passing Interface (MPI): mpi4py, pyMPI, pypar, ...
• Parallel Virtual Machine (PVM): pypvm, pynpvm, ...
• IPython.
• Others: Pyro, ScientificPython, ...
Parallel GPU computing
• PyCUDA.
• PyOpenCL.
• Copperhead.
• Anaconda Accelerate.
Next: Parallel Python
Parallel Python - PP
• PP is a Python module.
• Parallel execution of Python code on SMP systems and clusters.
• Makes it easy to convert a serial application into a parallel one.
• Automatic detection of the optimal configuration.
• Dynamic processor allocation (the number of processes can be changed at runtime).
• Cross-platform portability and interoperability (Windows, Linux, Unix, Mac OS X).
• Cross-architecture portability and interoperability (x86, x86-64, etc.).
• Open source: http://www.parallelpython.com/
Next: Parallel Python - API
PP - module API
• Idea: the Server provides you with workers (processors).
• Workers do a job.
• class Server - Parallel Python SMP execution server:
   • __init__(self, ncpus='autodetect', ppservers=(), secret=None, restart=False, proto=2, socket_timeout=3600)
   • submit(self, func, args=(), depfuncs=(), modules=(), callback=None, callbackargs=(), group='default', globals=None)
   • Others: get_ncpus(), set_ncpus(), print_stats(), ...
• class Template:
   • __init__(self, job_server, func, depfuncs=(), modules=(), callback=None, callbackargs=(), group='default', globals=None)
   • submit(self, *args)
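A minimal sketch of this API (not from the original slides; the function cube is just an illustrative workload): create a Server, submit jobs, and collect results by calling the returned job objects.

    import pp

    def cube(x):
        return x ** 3

    # 'autodetect' lets PP pick the number of local worker processes
    job_server = pp.Server(ncpus='autodetect')

    # submit() returns a job object immediately; calling the job object
    # blocks until the worker has produced the result
    jobs = [job_server.submit(cube, (i,)) for i in range(8)]
    print([job() for job in jobs])

    job_server.print_stats()  # per-worker timing statistics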
PP - Examples
• First example: pp_hello_world.py
• More useful example: pp_sum_primes_ntimes.py
• What happens if the values of n are very different?
• A really useful example: pp_sum_primes.py (see the sketch below)
• How long does the execution take with different numbers of workers?
• Template example: pp_sum_primes_ntimes_Template.py
• More involved examples: pp_montecarlo_pi.py and pp_midpoint_integration.py
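The example files themselves are not included here; the following is a hedged reconstruction in the spirit of pp_sum_primes.py (the helper names isprime and sum_primes are assumptions, modelled on the canonical PP demo):

    import math
    import pp

    def isprime(n):
        """Naive primality test."""
        if n < 2:
            return False
        for i in range(2, int(math.sqrt(n)) + 1):
            if n % i == 0:
                return False
        return True

    def sum_primes(n):
        """Sum of all primes below n."""
        return sum(x for x in range(2, n) if isprime(x))

    job_server = pp.Server()
    inputs = (100000, 100100, 100200, 100300)
    # depfuncs lists the helper functions each worker needs;
    # modules lists the modules to import on the workers
    jobs = [(n, job_server.submit(sum_primes, (n,), (isprime,), ("math",)))
            for n in inputs]
    for n, job in jobs:
        print("Sum of primes below %d is %d" % (n, job()))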
Next: MPI for Python - MPI
What is MPI?
• An interface specification: MPI = Message Passing Interface.
• MPI is a specification for the developers and users of message-passing libraries.
• By itself, it is NOT a library: it is the specification of what such a library should be.
• MPI primarily follows the message-passing parallel programming model.
• The interface attempts to be practical, portable, efficient and flexible.
• It provides virtual topology, synchronization, and communication functionality between a set of processes.
• Today, MPI implementations run on many hardware platforms: distributed memory, shared memory, hybrid, ...
MPI concepts
• MPI processes.
• Communicator: connects groups of processes.
• Communication:
   • Point-to-point:
      • Synchronous: MPI_Send, MPI_Recv.
      • Asynchronous: MPI_Isend, MPI_Irecv.
   • Collective: MPI_Bcast, MPI_Reduce, MPI_Gather, MPI_Scatter.
• Rank: within a communicator, every process has its own unique integer identifier.
• These concepts are illustrated with a small mpi4py sketch below.
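These concepts map directly onto mpi4py, introduced in the next section. A minimal sketch (not from the original slides) showing a communicator, ranks, and blocking versus non-blocking point-to-point calls:

    from mpi4py import MPI

    comm = MPI.COMM_WORLD      # default communicator: all started processes
    rank = comm.Get_rank()     # unique integer id within the communicator

    if rank == 0:
        comm.send("ping", dest=1, tag=0)          # blocking send
        req = comm.isend("pong", dest=1, tag=1)   # non-blocking: returns a request
        req.wait()                                # complete the request
    elif rank == 1:
        print(comm.recv(source=0, tag=0))         # blocking receive
        print(comm.recv(source=0, tag=1))

Run with, for example, mpiexec -n 2 python example.py.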
Next: MPI for Python - mpi4py
mpi4py
• Python implementation of the MPI standard.
• API based on the standard MPI-2 C++ bindings.
• Almost all MPI calls are supported.
• Code is easy to write, maintain and extend.
• Faster than other solutions (mixed Python and C codes).
• A pythonic API that runs at C speed.
• Open source: http://mpi4py.scipy.org/
mpi4py - Basic functions
• Python objects:
   • send(self, obj, int dest=0, int tag=0)
   • recv(self, obj, int source=0, int tag=0, Status status=None)
   • bcast(self, obj, int root=0)
   • reduce(self, sendobj, recvobj, op=SUM, int root=0)
   • scatter(self, sendobj, recvobj, int root=0)
   • gather(self, sendobj, recvobj, int root=0)
• C-like structures:
   • Send(self, buf, int dest=0, int tag=0)
   • Recv(self, buf, int source=0, int tag=0, Status status=None)
   • Bcast(self, buf, int root=0)
   • Reduce(self, sendbuf, recvbuf, Op op=SUM, int root=0)
   • Scatter(self, sendbuf, recvbuf, int root=0)
   • Gather(self, sendbuf, recvbuf, int root=0)
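A short sketch (not from the slides) contrasting the two flavours: the lowercase methods pickle arbitrary Python objects, while the uppercase methods move buffer-like objects such as NumPy arrays at near-C speed:

    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    # lowercase: any picklable Python object
    if rank == 0:
        comm.send({"answer": 42}, dest=1, tag=0)
    elif rank == 1:
        obj = comm.recv(source=0, tag=0)

    # uppercase: pre-allocated, contiguous buffers (the fast path)
    buf = np.zeros(10, dtype="d")
    if rank == 0:
        buf[:] = np.arange(10)
        comm.Send([buf, MPI.DOUBLE], dest=1, tag=1)
    elif rank == 1:
        comm.Recv([buf, MPI.DOUBLE], source=0, tag=1)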
mpi4py - Examples
• First example: mpi_hello_world.py
• Message passing example: mpi_simple.py
• Point-to-point example: mpi_buddy.py
• Collective example: mpi_matrix_mul.py
• Reduce example: mpi_midpoint_integration.py (see the sketch below)
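The example files are not reproduced here; the following hedged sketch shows what mpi_midpoint_integration.py plausibly does, assuming the standard pattern: each rank sums its share of midpoints and the partial sums are combined with reduce:

    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    def f(x):
        return 4.0 / (1.0 + x * x)   # integrates to pi over [0, 1]

    n = 10**6       # total number of midpoints
    h = 1.0 / n

    # each rank handles a strided subset of the midpoints
    local = h * sum(f((i + 0.5) * h) for i in range(rank, n, size))

    # combine the partial sums on rank 0
    total = comm.reduce(local, op=MPI.SUM, root=0)
    if rank == 0:
        print("integral =", total)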
Next: GPU computing with Python
What is GPU computing?
• GPU computing is the use of a graphics processing unit (GPU) together with a CPU to accelerate applications.
• A CPU consists of a few cores optimized for sequential serial processing.
• A GPU has a massively parallel architecture consisting of thousands of smaller, more efficient cores designed for handling multiple tasks simultaneously.
• The GPU can be seen as a co-processor of the CPU.
GPU computing
• Uses standard video cards by Nvidia or sometimes ATI.
• Uses a standard PC running Linux, Windows or Mac OS.
• Programming model: SIMD (Single Instruction, Multiple Data).
• Parallelisation inside the card is done through threads.
• SIMT (Single Instruction, Multiple Threads).
• Dedicated software is needed to access the card and start kernels (see the sketch below).
• CUDA by Nvidia and OpenCL are the most popular solutions.
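To make "kernels" and SIMT concrete ahead of the PyCUDA section, here is a minimal sketch (not from the original slides): a kernel is a single instruction stream executed by many threads, each picking its own data element.

    import numpy as np
    import pycuda.autoinit              # initialises the CUDA driver and context
    import pycuda.driver as drv
    from pycuda.compiler import SourceModule

    # the kernel: every thread runs the same code on a different element (SIMT)
    mod = SourceModule("""
    __global__ void double_it(float *a)
    {
        int i = threadIdx.x + blockIdx.x * blockDim.x;
        a[i] *= 2.0f;
    }
    """)
    double_it = mod.get_function("double_it")

    a = np.arange(256, dtype=np.float32)
    # launch 256 threads in one block; InOut copies the array to and from the device
    double_it(drv.InOut(a), block=(256, 1, 1), grid=(1, 1))
    print(a[:5])   # [0. 2. 4. 6. 8.]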
GPU computing - Advantages
• Hardware is cheap compared with workstations or supercomputers.
• A simple GPU is already inside many desktops, requiring no extra investment.
• Capable of thousands of parallel threads on a single GPU card.
• Very fast for algorithms that can be efficiently parallelised.
• Better speedup than MPI for many threads, thanks to shared memory.
• Several new high-level libraries hide the complexity: BLAS, FFTW, SPARSE, ...
• More are in progress.
GPU computing - Disadvantages
• Limited amount of memory available (max. 2-24 GByte).
• Memory transfers between the host and the graphics card cost extra time.
• Fast double-precision GPUs are still quite expensive.
• Slow for algorithms without enough data parallelism.
• Debugging code on the GPU can be complicated.
• Combining several GPUs to build a cluster is (was?) complex (often done with pthreads, MPI or OpenMP).
• Improvements are in progress.
GPU computing
[Figure slide: image not preserved in this extraction.]
GPU hardware structure
[Figure slide: image not preserved in this extraction.]