
what you need to learn now to decide what you need to learn next - PowerPoint PPT Presentation

Scientific computing: An introduction to tools and programming languages what you need to learn now to decide what you need to learn next Bob Dowling rjd4@cam.ac.uk University Information Services Course outline Basic concepts


  1. Scientific computing: An introduction to tools and programming languages. “What you need to learn now to decide what you need to learn next.” Bob Dowling, rjd4@cam.ac.uk, University Information Services

  2. Course outline Basic concepts Good practice Specialist applications Programming languages

  3. Course outline Basic concepts Good practice Specialist applications Programming languages

  4. Serial computing Single CPU

  5. Parallel computing: multiple CPUs. Single Instruction Multiple Data (SIMD), MPI, OpenMP

  6. Parallel computing courses Parallel Programming: Options and Design Parallel Programming: Introduction to MPI

  7. Distributed computing Multiple computers

  8. Distributed computing courses HTCondor and CamGrid

  9. High Performance Computing course: High Performance Computing: An Introduction

  10. Floating point numbers, e.g. numerical simulations. Universal principles: 0.1 → 0.1000000000001 and worse…

          >>> 0.1 + 0.1
          0.2
          >>> 0.1 + 0.1 + 0.1
          0.30000000000000004
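
The same effect can be reproduced in a script. The sketch below also shows one common way of coping with it, comparing with a tolerance rather than with `==`; the use of the standard-library function `math.isclose` is an addition here, not something the slide mentions.

```python
import math

total = 0.1 + 0.1 + 0.1

# 0.1 has no exact binary representation, so rounding error shows up in the sum.
print(total)                     # 0.30000000000000004
print(total == 0.3)              # False

# Compare with a tolerance instead of testing for exact equality.
print(math.isclose(total, 0.3))  # True
```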

  11. Floating point courses Program Design: How Computers Handle Numbers

  12. Text processing, e.g. sequence comparison, text searching. “Regular expressions”: the pattern ^f.*x$ matches fabliaux, factrix, falx, faulx, faux, fax, feedbox, …, fornix, forty-six, fourplex, fowlpox, fox, fricandeaux, frutex, fundatrix
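
A minimal sketch of the `^f.*x$` search using Python's standard `re` module. The word list is a small sample: the first five words come from the slide, while `fable`, `box` and `foxes` are added here as examples that should not match.

```python
import re

words = ["fabliaux", "fax", "fox", "forty-six", "feedbox", "fable", "box", "foxes"]

# ^f  : the word must start with "f"
# .*  : followed by any characters (possibly none)
# x$  : and must end with "x"
pattern = re.compile(r"^f.*x$")

matches = [word for word in words if pattern.search(word)]
print(matches)  # ['fabliaux', 'fax', 'fox', 'forty-six', 'feedbox']
```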

  13. Regular expression courses: Programming Concepts: Pattern Matching Using Regular Expressions; Python 3: Advanced Topics (Self-paced) (includes a regular expressions unit)

  14. Course outline Basic concepts Good practice Specialist applications Programming languages

  15. “Divide and conquer”. “Divide”: split a complex problem into less complex problems, and those into simple problems. “Conquer”: solve each simple problem to get a partial solution. “Glue”: combine the partial solutions into a complete solution.

  16. “Divide and conquer”: the trick. Each simple problem is “conquered” separately to give a partial solution. No need to use the same tool for each “mini-conquest”!

  17. Example: “Read 256 lines of data represented in a CSV format. Each line should have 256 numbers on it, but some are split into two lines of 128 numbers each. Run Aardvark’s algorithm on each 256×256 set of data. Write out the output as text in the same CSV format (exactly 256 numbers per line, every line) and plot a heat graph of the output to a separate file. Keep reading 256-line lumps like this until they’re all done.”

  18. Example Read 256 lines of data represented in a CSV format. Each line should have 256 numbers on it, but some are split into two lines of 128 numbers each. Run Aardvark’s algorithm on each 256×256 set of data. Write out the output as text in the same CSV format (exactly 256 numbers per line, every line) and plot a heat graph of the output to a separate file. Keep reading 256-line lumps like this until they’re all done.

  19. Example, divided into stages. Read: read 256 lines of data in CSV format; each line will have 256 numbers on it, giving a 256×256 set of data. Process: Aardvark’s algorithm. Write: output CSV (write file in CSV format) and graphics (plot a heat graph, write file). Repeat: keep reading 256-line lumps like this until they’re all done.
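
As an illustration of treating one stage as its own small problem, here is a sketch of the “Read” stage only. The function names `read_rows` and `read_blocks` are hypothetical, and the handling of split lines (joining consecutive short lines until a full row is collected) is an assumption about the data. The row width and block height are parameters so the sketch can be tried on tiny inputs.

```python
import csv
import io

def read_rows(lines, width=256):
    """Yield rows of exactly `width` numbers, joining rows split across lines."""
    pending = []
    for record in csv.reader(lines):
        if not record:
            continue  # skip blank lines
        pending.extend(float(field) for field in record)
        if len(pending) == width:
            yield pending
            pending = []
        elif len(pending) > width:
            raise ValueError("row has more numbers than expected")
    if pending:
        raise ValueError("incomplete final row")

def read_blocks(lines, width=256, height=256):
    """Yield blocks of `height` complete rows until the input runs out."""
    block = []
    for row in read_rows(lines, width):
        block.append(row)
        if len(block) == height:
            yield block
            block = []
    if block:
        raise ValueError("incomplete final block")

# Tiny demonstration: 4 numbers per row, 2 rows per block, second row split in two.
sample = io.StringIO("1,2,3,4\n5,6\n7,8\n")
blocks = list(read_blocks(sample, width=4, height=2))
print(blocks)  # [[[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]]
```

The “Process” and “Write” stages would consume these blocks and could be written, tested and replaced independently of the reader.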

  20. “Structured programming”: Split program into “lumps”. Use lumps methodically. Do not repeat code. “Lumps”? Programs, functions, modules, units.

  21. Example: unstructured code (repetition!)

          a_norm = 0.0
          for i in range(0,100):
              a_norm += a[i]*a[i]
          …
          b_norm = 0.0
          for i in range(0,100):
              b_norm += b[i]*b[i]
          …
          c_norm = 0.0
          for i in range(0,100):
              c_norm += c[i]*c[i]

  22. Example: structured code

          def norm2(v):                  # single instance of the code
              v_norm = 0.0
              for i in range(0,100):
                  v_norm += v[i]*v[i]
              return v_norm

          a_norm = norm2(a)              # calling the function three times
          …
          b_norm = norm2(b)
          …
          c_norm = norm2(c)

  23. Structured programming: write the function once! Then import, test, time, debug and improve that one function. All good practice follows from structured programming.

  24. Example: improved code

          def norm2(v):                  # improved code
              w = []
              for i in range(0,100):
                  w.append(v[i]*v[i])
              w.sort()
              v_norm = 0.0
              for i in range(0,100):
                  v_norm += w[i]
              return v_norm

          a_norm = norm2(a)              # no change to calling function
          …
          b_norm = norm2(b)
          …
          c_norm = norm2(c)
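
The slide does not say why sorting the squares counts as an improvement. One plausible reading is accuracy: adding the smallest terms first loses less to floating point rounding. The sketch below illustrates that reading with a hand-picked vector; the names `norm2_naive` and `norm2_sorted` are hypothetical.

```python
def norm2_naive(v):
    """Sum the squares in the order given."""
    total = 0.0
    for item in v:
        total += item * item
    return total

def norm2_sorted(v):
    """Sum the squares from smallest to largest."""
    total = 0.0
    for square in sorted(item * item for item in v):
        total += square
    return total

# The exact sum of squares is 1e16 + 2.
v = [1e8, 1.0, 1.0]

# Adding 1.0 to 1e16 is lost to rounding each time, so both small terms vanish.
print(norm2_naive(v) == 1e16)       # True

# Adding the small terms first gives 2.0, which survives the final addition.
print(norm2_sorted(v) == 1e16 + 2)  # True
```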

  25. Example: code improved again

          def norm2(v):
              w = [item*item for item in v]    # more flexible, “pythonic” code
              w.sort()
              v_norm = 0.0
              for w_item in w:
                  v_norm += w_item
              return v_norm

          a_norm = norm2(a)                    # still no change to calling function
          …
          b_norm = norm2(b)
          …
          c_norm = norm2(c)

  26. Example: best code

          from library import norm2    # somebody else’s code!

          a_norm = norm2(a)            # no change to calling function
          …
          b_norm = norm2(b)
          …
          c_norm = norm2(c)
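
The module name `library` on the slide is a placeholder. As one concrete stand-in, and purely as an assumption about what such a library might offer, the sketch below delegates the summation to `math.fsum` from the Python standard library, which tracks the rounding error of the partial sums for you.

```python
import math

def norm2(v):
    """Sum of squares, with the summation delegated to math.fsum."""
    return math.fsum(item * item for item in v)

a_norm = norm2([3.0, 4.0])
print(a_norm)  # 25.0
```

The calling code is unchanged: it still just calls `norm2`.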

  27. Structured programming courses Programming Concepts: Introduction for Absolute Beginners

  28. Libraries Written by experts In every area Learn what libraries exist in your area Use them Save your effort for your research

  29. Example libraries: Numerical Algorithms Group (NAG), Scientific Python (SciPy), Numerical Python (NumPy)

  30. Hard to improve on library functions

          for(int i=0; i<N; i++) {
              for(int j=0; j<P; j++) {
                  for(int k=0; k<Q; k++) {
                      a[i][j] += b[i][k]*c[k][j];
                  }
              }
          }

      Swapping the order of the j and k loops is a “trick” that may save you 1% on each matrix multiplication. It is a complete waste of time!

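  31. Hard to improve on library functions

      $$\begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix} = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}$$

      $$\begin{aligned}
      M_1 &= (A_{11} + A_{22})(B_{11} + B_{22}) & C_{11} &= M_1 + M_4 - M_5 + M_7 \\
      M_2 &= (A_{21} + A_{22})\,B_{11} & C_{12} &= M_3 + M_5 \\
      M_3 &= A_{11}\,(B_{12} - B_{22}) & C_{21} &= M_2 + M_4 \\
      M_4 &= A_{22}\,(B_{21} - B_{11}) & C_{22} &= M_1 - M_2 + M_3 + M_6 \\
      M_5 &= (A_{11} + A_{12})\,B_{22} \\
      M_6 &= (A_{21} - A_{11})(B_{11} + B_{12}) \\
      M_7 &= (A_{12} - A_{22})(B_{21} + B_{22})
      \end{aligned}$$

      Applied recursively: much faster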
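
The seven-product scheme can be checked on a single 2×2 case with plain numbers standing in for the sub-blocks. This sketch uses the standard form of the formulas, in which C11 = M1 + M4 − M5 + M7; the function names `strassen_2x2` and `naive_2x2` are hypothetical. It only verifies the algebra and is not a recursive implementation.

```python
def strassen_2x2(A, B):
    """Multiply two 2x2 matrices using seven products instead of eight."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]

def naive_2x2(A, B):
    """Multiply two 2x2 matrices the usual way, with eight products."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(strassen_2x2(A, B))  # [[19, 22], [43, 50]]
print(naive_2x2(A, B))     # [[19, 22], [43, 50]]
```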

  32. Algorithms: time taken / memory used vs. size of dataset / required accuracy. O(n²) notation. Algorithm selection makes or breaks programs.
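
A small illustration of why the choice matters, using a problem picked here for convenience rather than taken from the slides: checking a list for duplicates. The first function compares every pair, which is O(n²); the second remembers what it has seen in a set, which is O(n) on average. Both count their steps so the growth is visible. The function names are hypothetical.

```python
def has_duplicates_quadratic(items):
    """Compare every pair of items: O(n^2) comparisons in the worst case."""
    comparisons = 0
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            comparisons += 1
            if items[i] == items[j]:
                return True, comparisons
    return False, comparisons

def has_duplicates_linear(items):
    """Remember items already seen in a set: O(n) steps on average."""
    seen = set()
    steps = 0
    for item in items:
        steps += 1
        if item in seen:
            return True, steps
        seen.add(item)
    return False, steps

data = list(range(1000))  # worst case: no duplicates at all
print(has_duplicates_quadratic(data))  # (False, 499500)
print(has_duplicates_linear(data))     # (False, 1000)
```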

  33. Unit testing: Test each function individually. Test each function’s common use, “edge cases” and bad data handling. Catch your bugs early! Extreme version: “Test Driven Development” (TDD)
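
A minimal sketch of the three kinds of test, written with Python's standard `unittest` module against a simplified `norm2`. Both the simplified function and the choice of `unittest` are assumptions made for illustration; the course itself may use different tools.

```python
import unittest

def norm2(v):
    """Sum of the squares of the items in v."""
    return sum(item * item for item in v)

class TestNorm2(unittest.TestCase):
    def test_common_use(self):
        self.assertEqual(norm2([3.0, 4.0]), 25.0)

    def test_edge_case_empty(self):
        # An empty vector is a typical edge case.
        self.assertEqual(norm2([]), 0)

    def test_bad_data(self):
        # Strings cannot be squared, so bad data should raise an error.
        with self.assertRaises(TypeError):
            norm2(["three", "four"])

# Run the tests programmatically and report whether they all passed.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestNorm2)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```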

  34. Revision control Code “checked in” and “checked out” Branches for trying things out Communal working Reversing out errors.

  35. Revision control. Two main programs: subversion, git. Starting from scratch? git. Something in use already? Use it! github.com: free repository (for open source). try.github.io: free online training.

  36. Integrated Development Environment. “All in one” systems: necessarily quite complex. Eclipse: most languages. Visual Studio: C++, C#, VB, F#, … Xcode: most languages. Qt Creator: C++, JavaScript. NetBeans: Java.

  37. make: the original build system. Command line tool:

          $ make target

      Makefile (dependencies and build rules):

          target: target.c
                  cc target.c -o target

      Used behind the scenes by many IDEs

  38. Building software courses Unix: Building, Installing and Running Software

  39. Course outline Basic concepts Good practice Specialist applications Programming languages

  40. Specialist applications Often no need to program Or only to program simple snippets All have pros and cons

  41. Spreadsheets Microsoft Excel LibreOffice Calc Apple Numbers

  42. Spreadsheets. Taught at school, but taught badly at school! Easy to tinker, but easy to corrupt data. Easy to get started, but hard to be systematic. Very hard to debug. Example: best selling book, buggy spreadsheets!

  43. Excel courses Excel 2010/2013: Introduction Analysing and Summarising Data Functions and Macros Managing Data & Lists

  44. Statistical software

  45. Statistical software Stata: Introduction R: Introduction for Beginners SPSS: Introduction for Beginners SPSS: Beyond the Basics

  46. Mathematical manipulation Matlab Octave Mathematica

  47. Mathematical software courses Matlab: Introduction for Absolute Beginners Linear Algebra Graphics (Self-paced)

  48. Drawing graphs Manual or automatic?

  49. Courses for drawing graphs: Python 3: Advanced Topics (Self-paced) (includes a matplotlib unit)

  50. Course outline Basic concepts Good practice Specialist applications Programming languages

  51. Computer languages, on a spectrum from interpreted and untyped to compiled and typed: Shell script, Perl, Python, Java, C, C++, Fortran. [Diagram labels: what you do, what files get created, what the system sees]

  52. Shell script. Suitable for: gluing programs together, “wrapping” programs, small tasks. Unsuitable for: performance-critical jobs, floating point, GUIs, complex tasks. Easy to learn. Very widely used.

  53. Shell script. Several “shell” languages: /bin/sh, /bin/csh, /bin/bash, /bin/ksh, /bin/tcsh, /bin/zsh. Example:

          #!/bin/bash
          job="${1}"
          …

  54. Shell scripting courses Unix: Introduction to the Command Line Interface (Self-paced) Simple Shell Scripting for Scientists Simple Shell Scripting for Scientists — Further Use
