Multi/Many Core Programming Strategies


  1. Multicore Challenge Conference 2012, UWE, Bristol
     Multi/Many Core Programming Strategies
     Greg Michaelson, School of Mathematical & Computer Sciences, Heriot-Watt University

  2. Overview
     • good old fashioned parallel computing is based on lots of identical single CPUs, over either shared memory (PEs connected by a network to a common RAM) or distributed memory (each PE with its own RAM, connected by a network)

  3. Overview
     • Moore's Law implications have changed
       – speed of CPUs now stable at ~3.5 GHz
       – performance increases come from multi- & many-core CPUs
     [images: Intel 4004, 1971 (http://en.wikipedia.org/wiki/Intel_4004); Intel Core i7, 2008 (http://en.wikipedia.org/wiki/Intel_Core_i7)]

  4. Overview
     • multi-processor architectures increasingly hierarchical & heterogeneous
     • message passing grids of clusters of:
       – now: shared memory multi-core
     [example: HECToR, Edinburgh Parallel Computing Centre: 464 compute blades, each with 4 compute nodes, each with 2 x 12-core processors, giving 44,544 cores (http://www.hector.ac.uk/abouthector/hectorbasics/)]

  5. Overview
     • multi-processor architectures increasingly hierarchical & heterogeneous
     • message passing grids of clusters of:
       – soon: message passing many-core arrays
     [example: SCC, Intel Research (http://techresearch.intel.com/ProjectDetails.aspx?Id=1)]

  6. Overview
     • cores also have SIMD processors (MMX/SSE)
     • non-uniform memory
       – differing degrees/levels of private & shared cache
     • old programming strategies break down
       – one size no longer fits all
     • need for hybrid strategies

  7. Overview
     • developing multi-processor software is still a black art
     • would like:
       – low effort
       – flexibility
       – scalability
       – future proofing
       – re-use

  8. Overview
     • different approaches:
       – require different effort
       – offer different degrees of control over:
         • task division
         • communications
         • process placement

  9. Methodological choices
     [decision tree: START]

  10. Methodological choices
      [decision tree: START → automatic parallelisation]

  11. Automatic Parallelisation
      • vector/array parallelisation
      • implicit
        – e.g. SIMD in C with gcc
      • language directives
        – Fortrans: Fortran 90; F; High Performance Fortran

  12. Automatic Parallelisation
      • low effort
        – no communications
        – no/minimal task division
      • poor flexibility/scalability
        – good for regular problems
        – good on uniform architectures

  13. Methodological choices
      [decision tree: START → automatic parallelisation | do it yourself]

  14. Methodological choices
      [decision tree: START → automatic parallelisation | do it yourself → skeleton]

  15. Algorithmic skeletons
      • capture common patterns of data & control parallelism
        – e.g. pipeline; farm; divide & conquer
      • skeleton libraries for C/Java
      [diagrams: process farm (farmer distributing tasks to workers); pipeline (stage 1 → stage 2 → … → stage N)]

  16. Algorithmic skeletons
      • capture common patterns of data & control parallelism
        – e.g. pipeline; farm; divide & conquer
      • skeleton libraries for C/Java
      [diagram: divide & conquer (tree of parent and parent/child nodes)]

  17. Algorithmic skeletons
      • industrial frameworks
        – e.g. Google Map-Reduce; Apache Hadoop
      [image: Google Map-Reduce (http://labs.google.com/papers/mapreduce-osdi04-slides/index-auto-0008.html)]

  18. Algorithmic skeletons
      • industrial frameworks
        – e.g. Microsoft Dryad
      [image: Microsoft Dryad (www.wikibench.eu/CloudCP2011/wp-content/.../Isaacs-keynote.ppsx)]

  19. Algorithmic skeletons
      • can choose appropriate skeleton for problem class
      • medium effort to use skeleton library/industrial framework
        – must fit problem to skeleton
      • high effort to develop own skeletons
        – must make communication & task division explicit

  20. Algorithmic skeletons
      • can hand tune for:
        – problem
        – irregularity
        – scalability
        – process placement
      • strong potential re-use of components

  21. Methodological choices
      [decision tree: START → automatic parallelisation | do it yourself → skeleton | programmed parallelisation]

  22. Methodological choices
      [decision tree: START → automatic parallelisation | do it yourself → skeleton | programmed parallelisation → operating system]

  23. Operating system
      • independent programs
        – realised as threads
      • communication via pipes/sockets
      • bolted together with shell scripts

  24. Operating system
      • low effort
      • highly dependent on underlying operating system for:
        – communication
        – scheduling
        – process placement
      • unpredictable performance

  25. Methodological choices
      [decision tree: START → automatic parallelisation | do it yourself → skeleton | programmed parallelisation → operating system | explicit processes]

  26. Methodological choices
      [decision tree: START → automatic parallelisation | do it yourself → skeleton | programmed parallelisation → operating system | explicit processes → library]

  27. Library
      • shared memory
        – OpenMP: platform & architecture independent
        – Posix Threads: Unix/Linux specific, architecture independent
        – Intel Threading Building Blocks: platform/architecture independent

  28. Library
      • distributed memory
        – MPI & PVM
      • specialised hardware
        – SIMD on MMX/SSE
        – CUDA & OpenCL for GPU arrays

  29. Library
      • now common to use:
        – MPI for inter-cluster
        – OpenMP for intra-cluster
      • medium to high effort
        – explicit communication & task division
      • can shape algorithm to architecture
      • best for irregular problem/architecture

  30. Library
      • often end up re-inventing some standard algorithmic skeleton
      • good potential for reuse of:
        – structure
        – components

  31. Methodological choices
      [decision tree: START → automatic parallelisation | do it yourself → skeleton | programmed parallelisation → operating system | explicit processes → library | hand crafted]

  32. Hand crafted
      • very low level
      • shared memory
        – critical regions via semaphores
      • distributed memory
        – communication over RS232; USB

  33. Hand crafted
      • very high effort
      • highly problem/architecture specific
      • best for embedded systems

  34. Questions...
      • is my problem suitable for parallelisation?
      • how do I know how my problem scales?
      • if I parallelise my problem, how do I tell how much communication overhead will be incurred?
      • how do I assess the benefits of shared versus distributed memory?
      (28th June, 2011: KTN ICT Scalable Applications & Services)

  35. Questions...
      • can I do better with smarter solutions on my existing technology?
      • where can I get help with deciding how to proceed?
      • have other people already come up with solutions that might work for me?

  36. Future
      • UK has major research strengths in multi-processor architectures, parallel languages/compilers, skeletons etc.
      • groups don't talk much to each other or to practitioners, e.g. in eScience
      • need to build inclusive UK community
      • opportunities through:
        – EPSRC multi-core priority for ICT
        – TSB ICT KTN for multi-core
