Jussi Enkovaara, Martti Louhivuori
Python in High-Performance Computing
CSC – IT Center for Science Ltd, Finland
January 29-31, 2018
All material (C) 2018 by the authors. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License, http://creativecommons.org/licenses/by-nc-sa/3.0/
PRACTICALITIES

Computing servers

We will use either the classroom computers or CSC's computing server Taito for the exercises. You can also use your own laptop if all the necessary Python modules (numpy, cython, mpi4py, etc.) are installed.

To log in to Taito, use the provided trngXX username and password, e.g.

% ssh -X trng10@taito.csc.fi

For editing files you can use e.g. the nano editor:

% nano example.py

Other popular editors (vim, emacs) are also available.

Python environment at CSC

Python is available on all computing servers (taito, sisu). To use the scientific computing packages, load the correct Python environment with the module command. The default Python version is 2.7; in this course we will use Python 3.4, which can be loaded on Taito with

% module load python-env/3.4.5

On the classroom computers, module load is not needed.

Exercise material

All the exercise material (skeleton codes, input files, model solutions, etc.) is available on GitHub at https://github.com/csc-training/hpc-python

The material can be downloaded (cloned) with

% git clone https://github.com/csc-training/hpc-python.git

Each exercise is contained in its own subdirectory, e.g.

numpy/reference-copy
numpy/one-dimensional-arrays
…
General exercise instructions

Simple exercises can be carried out directly in the interactive interpreter. For more complex ones it is recommended to write the program into a .py file. Still, it is useful to keep an interactive interpreter open for testing!

Some exercises refer to functions or modules that are not covered in the lectures. In these cases Python's interactive help (and Google) are useful, e.g.

In [4]: help(numpy)

It is not necessary to complete all the exercises; you may leave some for further study at home. Some bonus exercises are also provided at the end of the sheet.

Visualisation

In some exercises it might be convenient to do visualisations with the matplotlib Python package. Interactive plotting is most convenient with the IPython enhanced interpreter. To enable interactive plotting, start IPython with the --matplotlib argument:

% ipython --matplotlib

Simple x-y plots can then be made as:

In [1]: import matplotlib.pyplot as plt
…
In [6]: plt.plot(x, y) # line
In [7]: plt.plot(x, y, 'ro') # individual points in red

See the matplotlib documentation for additional information on visualisation.

Parallel calculations

On the classroom workstations, one needs to load the MPI environment before using mpi4py:

% module load mpi/openmpi-x86_64

After that, MPI-parallel Python programs can be launched with mpirun, e.g. to run with 4 MPI tasks one issues

% mpirun -np 4 python3 example.py
On Taito one can launch interactive MPI programs with srun:

% srun -n4 python3 hello.py

Note that for real production calculations on Taito one should use batch job scripts, see https://research.csc.fi/taito-user-guide
Basic array manipulation

1. Reference vs. copy

Investigate the behavior of the statements below by looking at the values of the arrays a and b after each assignment:

a = np.arange(5)
b = a
b[2] = -1
b = a[:]
b[1] = -1
b = a.copy()
b[0] = -1

2. One dimensional arrays

Start from a Python list containing both integers and floating point values, and construct a NumPy array from the list.

Generate a 1D NumPy array containing all numbers from -2.0 to 2.0 with a spacing of 0.2. Use the optional start and step arguments of the np.arange() function.

Generate another 1D NumPy array containing 11 equally spaced values between 0.5 and 1.5. Extract every second element of the array.

3. Two dimensional arrays and slicing

First, create a 4x4 array with arbitrary values, then
a) Extract every element from the second row.
b) Extract every element from the third column.
c) Assign a value of 0.21 to the upper left 2x2 subarray.

Next, create an 8x8 array with a checkerboard pattern, i.e. alternating zeros and ones:

$$\begin{pmatrix} 1 & 0 & 1 & \dots \\ 0 & 1 & 0 & \dots \\ 1 & 0 & 1 & \dots \\ \dots & \dots & \dots & \dots \end{pmatrix}$$
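The difference between a plain assignment, a slice and a copy (exercise 1) can be checked directly with np.shares_memory(). The snippet below is only a small sketch on a made-up 6-element array, with illustrative variable names, and not the model solution. It also shows that assigning through a strided slice modifies the original array, which may be helpful for the checkerboard pattern.

```python
import numpy as np

a = np.arange(6)

alias = a               # same object, no new array is created
view = a[:]             # new array object that shares a's data buffer
independent = a.copy()  # separate array with its own data

print(alias is a)                         # True
print(np.shares_memory(view, a))          # True
print(np.shares_memory(independent, a))   # False

# A strided slice is also a view, so assigning through it
# changes the original array in place.
a[::2] = 0
print(a)            # [0 1 0 3 0 5]
print(independent)  # [0 1 2 3 4 5], unaffected by the assignment
```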
4. Splitting and combining arrays

Continue with the previous 4x4 array.
a) Use the np.split() function to split the array into two new 2x4 arrays. Reconstruct the original 4x4 array by using np.concatenate().
b) Repeat the above exercise, but now create 4x2 subarrays and then combine them.

5. Subdiagonal matrices

Create a 6x6 matrix with 1's above and below the diagonal and zeros otherwise:

$$\begin{pmatrix} 0 & 1 & 0 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 & 0 \end{pmatrix}$$

Use the numpy.eye() function.
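As a hint for the two exercises above, the following sketch shows the k argument of numpy.eye() and a split/concatenate round trip along the first axis. The array sizes and variable names are chosen only for illustration and differ from those asked for in the exercises.

```python
import numpy as np

# np.eye(N, k=offset) puts ones on the k-th diagonal:
# k > 0 is above the main diagonal, k < 0 is below it.
upper = np.eye(4, k=1, dtype=int)
print(upper)

# Splitting along an axis and concatenating along the same axis
# restores the original array.
m = np.arange(16).reshape(4, 4)
top, bottom = np.split(m, 2, axis=0)              # two 2x4 arrays
restored = np.concatenate((top, bottom), axis=0)  # back to 4x4
print(np.array_equal(m, restored))
```

Using axis=1 in both calls would give the column-wise variant.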
NumPy tools

6. Input and output

The file xy_coordinates.dat contains a list of (x,y) value pairs. Read the data with numpy.loadtxt(). Then add 2.5 to all y values and write the new data into a file using numpy.savetxt().

7. Polynomials

Fit a second order polynomial to the data of the previous exercise by using numpy.polyfit(). Construct the values of the polynomial in the interval [-6,6] with numpy.polyval(). You can use matplotlib for plotting both the original points and the fitted polynomial.

8. Random numbers

Generate a one dimensional 1000 element array of uniformly distributed random numbers using the numpy.random module. Calculate the mean and standard deviation of the array using numpy.mean() and numpy.std(). Choose some other random distribution and calculate its mean and standard deviation. You can visualize the random distributions with matplotlib's hist() function.

9. Linear algebra

Construct two symmetric 2x2 matrices A and B. (Hint: a symmetric matrix can be constructed easily as $A_{\mathrm{sym}} = A + A^T$.) Calculate the matrix product C = A * B using numpy.dot(). Calculate the eigenvalues of matrix C with numpy.linalg.eigvals().
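The polyfit/polyval workflow of exercise 7 can be tried out first on data where the answer is known. The sketch below uses synthetic points on a hypothetical parabola instead of xy_coordinates.dat, so the data and the coefficients are assumptions made for illustration only.

```python
import numpy as np

# Synthetic data (an assumption, not the course file):
# points lying exactly on y = 2x^2 - x + 1.
x = np.linspace(-3.0, 3.0, 13)
y = 2.0 * x**2 - x + 1.0

# polyfit returns the coefficients with the highest power first.
coeffs = np.polyfit(x, y, 2)

# polyval evaluates the fitted polynomial on any grid of x values.
x_fine = np.linspace(-6.0, 6.0, 101)
y_fit = np.polyval(coeffs, x_fine)

print(coeffs)
```

With noise-free input the fitted coefficients should reproduce the generating ones to within rounding error; with measured data they are a least-squares estimate.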
Advanced NumPy

10. Advanced indexing

Start with a 10x10 array of uniformly distributed random numbers (np.random.random). Find all the elements larger than 0.5 by using a Boolean mask. Find also the indices of the elements larger than 0.5. You can use the above mask and numpy.nonzero() for finding the indices. Check that the values obtained by using the index array are the same as with the earlier direct Boolean indexing.

11. Translation with broadcasting

The file points_circle.dat contains x, y coordinates along a circle. Translate all the coordinates with some vector, e.g. (2.1, 1.1). Plot both the original and translated points in order to see the effect of the translation.

12. Finite-difference

Derivatives can be calculated numerically with the finite-difference method as:

$$f'(x_i) = \frac{f(x_i + \Delta x) - f(x_i - \Delta x)}{2 \Delta x}$$

Construct a 1D NumPy array containing the values of $x_i$ in the interval $[0, \pi/2]$ with spacing $\Delta x = 0.1$. Evaluate numerically the derivative of sin in this interval (excluding the end points) using the above formula. Try to avoid for loops. Compare the result to the function cos in the same interval.

13. Numerical integration

A simple method for evaluating integrals numerically is the middle Riemann sum

$$S = \sum_{i=1}^{n} f(x'_i) \, (x_i - x_{i-1})$$
with $x'_i = (x_i + x_{i-1})/2$. Use the same interval as in the finite-difference exercise and investigate how much the Riemann sum of sin differs from 1.0. Avoid for loops. Investigate also how the result changes with the choice of $\Delta x$.

14. Temporary arrays

Try different NumPy array expressions and investigate how much memory is used in temporary arrays. You can use the Unix command '/usr/bin/time -v' for finding out the maximum memory usage of a program, or utilize the maxmem() function in demos/memory_usage.py

15. Numexpr

Try different array expressions and investigate how much numexpr can speed them up. Try also to vary the number of threads used by numexpr. IPython and the %timeit magic can be useful in testing.
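For the finite-difference and numerical integration exercises, loops can be avoided by combining shifted slices of the same array. The following is one possible sketch rather than the model solution. It assumes the central-difference form of the derivative and a grid built with np.arange(), which stops at the last point below π/2.

```python
import numpy as np

dx = 0.1
x = np.arange(0.0, np.pi / 2, dx)  # 0.0, 0.1, ..., 1.5
f = np.sin(x)

# Central difference at the interior points, (f[i+1] - f[i-1]) / (2 dx),
# written with shifted slices instead of a loop.
dfdx = (f[2:] - f[:-2]) / (2.0 * dx)
max_error = np.max(np.abs(dfdx - np.cos(x[1:-1])))

# Middle Riemann sum: evaluate sin at the interval midpoints and
# weight each value with the interval width.
midpoints = 0.5 * (x[1:] + x[:-1])
integral = np.sum(np.sin(midpoints) * (x[1:] - x[:-1]))

# The grid ends at x[-1] = 1.5 rather than pi/2, so the sum
# approximates 1 - cos(1.5) and not exactly 1.0.
print(max_error, integral)
```

Building the grid with np.linspace(0, np.pi / 2, n) instead would include the end point exactly, at the cost of a spacing that is not exactly 0.1.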
Performance analysis

16. Using cProfile

The file heat_simple.py contains a (very inefficient) implementation of the two-dimensional heat equation. Use cProfile to investigate where the time is spent in the program. You can also try to profile the more efficient model solution in numpy/heat-equation.
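cProfile can also be driven from inside a script, which is convenient when only part of a program is of interest. The sketch below profiles a hypothetical stand-in workload (slow_sum and main are made-up names, not functions from heat_simple.py) and prints the most expensive calls sorted by cumulative time.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Deliberately slow pure-Python loop (hypothetical stand-in workload)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def main():
    return [slow_sum(20000) for _ in range(20)]


profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Sort by cumulative time and print the five most expensive entries.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)
report = stream.getvalue()
print(report)
```

A whole script can be profiled without any code changes from the command line, e.g. python3 -m cProfile -s cumulative heat_simple.py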