1. High performance computing for numerical probability
Jérôme Lelong
Université de Grenoble Alpes – Ensimag / LJK
Thursday 15 October 2015
J. Lelong (Univ. Grenoble Alpes) 15/10/2015 1 / 31

2. Introduction to parallel computing
1 Why we need parallel computing
    Parallel architectures
    Generating random numbers in parallel
2 How HPC can help numerical probability
3 Monte Carlo
    PDE methods
    Tree methods
    Non-linear problems


4. Why we need HPC (1)
◮ We want to handle larger problems (more data or variables).
◮ We want programs to run faster.
Sequential programming has reached its limits:
◮ Moore's law is hitting physical limits (broken since 2004).
◮ It is no longer possible to double the density of transistors every 18 months (Gordon Moore, 1965).
◮ Increasing the frequency increases both energy consumption and heating.
◮ As a result, the frequency stays almost constant, but the number of cores increases.

5. Why we need HPC (2)
◮ Memory is a real bottleneck.
◮ Since the 70s, the frequency of CPUs has increased much faster than that of memory: processors keep waiting for data.
◮ Example: computing a[i] = b[i] + c[i] on an Intel Core i7-6700HQ.
  ◮ Memory bandwidth: 34 GB/s over 2 channels.
  ◮ Maximum turbo frequency: 3.5 GHz.
  ◮ 4 adds per cycle.
  ◮ Transferring the data takes (3 × 8) / (34E9 / 2) / (1 / (3.5E9 × 4)) ≈ 19.7 times longer than computing the sum (amount of data / bandwidth, divided by the time for one add).
◮ Parallel computing gives access to more memory and more memory channels.
◮ When all the cores of a processor are computing, the clock speed of each core decreases a bit, but all the memory channels can be used, so memory is less of a bottleneck.
◮ Memory bandwidth increases very slowly.
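The ratio on this slide can be reproduced directly from the figures given (bandwidth, turbo frequency, adds per cycle); the script below is a minimal sketch of that back-of-the-envelope computation:

```python
# Rough estimate of how much longer moving the data for a[i] = b[i] + c[i]
# takes than the addition itself, using the i7-6700HQ figures from the slide.
bytes_per_element = 3 * 8          # three doubles (a, b, c), 8 bytes each
bandwidth_per_channel = 34e9 / 2   # 34 GB/s total over 2 channels
adds_per_second = 3.5e9 * 4        # 3.5 GHz, 4 adds per cycle

transfer_time = bytes_per_element / bandwidth_per_channel  # seconds per element
add_time = 1 / adds_per_second                             # seconds per add

ratio = transfer_time / add_time
print(f"memory transfer is {ratio:.1f}x slower than the add")  # ≈ 19.8
```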

6. Main architectures (1)
◮ Multi-core units: shared memory. All the cores share the same global memory.
  ◮ Scaling is often hard/poor, mainly because of cache synchronization.
  ◮ Programming is pretty easy at first sight (OpenMP).
  ◮ No need for message passing, but beware of concurrent accesses.
◮ Clusters: distributed memory. An aggregation of processing units linked through a high-speed network. Each unit has its local memory and there is no global memory access.
  ◮ Scaling is better.
  ◮ A specific communication protocol is needed to exchange data between the processing units.
  ◮ The programmer must handle data passing explicitly.
  ◮ Optimizing the communication/computation ratio requires careful design.

7. Main architectures (2)
◮ Clusters of multi-core nodes: two levels of parallelism. Distributed memory between the nodes (first level) and shared memory inside each node (second level).
  ◮ Programming is delicate, with two different parallel paradigms to combine.
  ◮ Optimizing is complex.
  ◮ Can achieve better performance.

8. Main architectures (3)
[figure]

9. Main architectures (4)
◮ Grids: heterogeneous processing units linked through a low-speed network.
  ◮ Slow network.
  ◮ Only for applications with very little communication.
  ◮ Startup latencies.
◮ Accelerators:
  ◮ GPU: requires a dedicated programming language.
  ◮ Intel MIC (Xeon Phi): 60 cores with 8 GB of RAM; pragma-based programming.

10. Accelerators
Excerpt from the Top 500 highlights (June 2015):
◮ The No. 1 and No. 7 systems use Intel Xeon Phi processors to speed up their computational rate. The No. 2 and No. 6 systems use NVIDIA GPUs to accelerate computation.
◮ A total of 90 systems on the list use accelerator/co-processor technology, up from 75 in November 2014. 52 of these use NVIDIA chips, 4 use ATI Radeon, and there are now 35 systems with Intel MIC technology (Xeon Phi). 4 systems use a combination of NVIDIA and Intel Xeon Phi accelerators/co-processors.
◮ 97% of the systems use processors with six or more cores, and 87.8% use eight or more cores.


12. Shared memory (1)
◮ A global memory space for all processing units, plus local caches.
◮ All units see the same memory, and memory changes are viewed "instantaneously". But maintaining memory coherence costs system time and may cause concurrent accesses.
◮ Two kinds of shared-memory architectures:
  ◮ SMP (Symmetric Multi-Processors): all units are plugged into a unique memory bus.
[diagram: four CPUs connected to a single MEMORY]

13. Shared memory (2)
◮ NUMA (Non-Uniform Memory Access): the memory topology ensures faster access to closer memory.
[diagram: four groups of CPUs, each attached to its own local MEMORY bank]

14. Distributed memory
◮ Each processor has its own memory; there is no global memory space.
◮ Access to another processor's memory requires message passing through an interconnect network. There are no concurrent-access problems, but communication is explicit.
[diagram: four CPUs, each with its own MEMORY, linked by an INTERCONNECT NETWORK]


16. Random number generators
◮ A random number generator is a deterministic recurrent sequence with statistical properties close to those of an i.i.d. sample from the uniform distribution.
◮ It is fully determined by its initial state and its transition function, e.g.
  x_{n+1} = (a_0 x_n + a_1 x_{n−1} + · · · + a_k x_{n−k} + b) mod m.
◮ What do we expect from parallel random number generators?
  ◮ Reproducibility: the ability to rerun a scenario (at least with the same number of processing units).
  ◮ Independence of the numbers generated on different processing units and within one unit.
  ◮ No communication between the generators (at least after startup).
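The k = 0 case of the recurrence above is the classic linear congruential generator. A minimal sketch, using the MINSTD parameters purely for illustration:

```python
# Linear congruential generator: x_{n+1} = (a * x_n + b) mod m,
# rescaled to (0, 1). Parameters are the MINSTD choice (illustrative only).
def lcg(seed, a=16807, b=0, m=2**31 - 1):
    x = seed
    while True:
        x = (a * x + b) % m
        yield x / m  # uniform-looking draw in (0, 1)

gen = lcg(seed=12345)
sample = [next(gen) for _ in range(3)]
```

The whole stream is indeed determined by the seed (initial state) and by (a, b, m) (the transition function): rerunning with the same seed reproduces the same sequence.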

17. Parallel random number generators
Two possible approaches:
◮ Splitting: split a single generator into several independent streams, each of them still having the same properties as the initial generator but with a shorter period.
◮ Parametrization: each stream uses different parameters for its transition function (usually the multipliers of the recurrence) in order to achieve both independence and a maximal period.
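Parametrization can be sketched with the LCG from the previous slide: each processing unit runs the same kind of recurrence but with its own multiplier. The multipliers below are illustrative only; in practice they must be chosen carefully (e.g. as primitive roots modulo m) so every stream keeps a full period and the streams behave independently.

```python
# One generator per processing unit, differing only in the multiplier a.
def lcg_stream(a, seed, m=2**31 - 1):
    x = seed
    while True:
        x = (a * x) % m
        yield x / m

multipliers = [16807, 48271, 69621]  # one per unit (illustrative values)
streams = [lcg_stream(a, seed=1) for a in multipliers]
draws = [next(s) for s in streams]   # first draw of each unit's stream
```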

18. Splitting (1)
◮ Blocking: assign each processing unit a contiguous block of the sequential generator.
  ◮ It requires the ability to jump ahead at startup (usually only possible for a jump of size 2^k, see [L'Ecuyer and Côté, 91]).
  ◮ Shorter period per stream.
  ◮ The way the sequential samples are spread over the processing units may depend on the number of units.
  ◮ Long-range correlations may occur at short scale.
[diagram: the sequence cut into five consecutive blocks of length M, one per unit]
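For a multiplicative LCG, jumping ahead is cheap: advancing J steps amounts to multiplying the state by a^J mod m, which modular exponentiation computes in O(log J) time. A hedged sketch of this idea (not the scheme of [L'Ecuyer and Côté, 91], which handles more general generators):

```python
# Jump-ahead for x_{n+1} = a * x_n mod m: x_{n+J} = (a**J mod m) * x_n mod m.
a, m = 16807, 2**31 - 1  # MINSTD parameters, illustrative only

def jump_ahead(x, steps):
    # pow(a, steps, m) is modular exponentiation in O(log steps) time.
    return (pow(a, steps, m) * x) % m

def block_start(seed, unit, block_size):
    # Unit `unit` starts at position unit * block_size of the global sequence.
    return jump_ahead(seed, unit * block_size)

# Check: jumping 3 steps at once equals stepping 3 times.
x, stepped = 12345, 12345
for _ in range(3):
    stepped = (a * stepped) % m
assert jump_ahead(x, 3) == stepped
```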

19. Splitting (2)
◮ Leap-frog method: assign each processing unit an equally spaced subsequence of the original sequence.
  ◮ It requires making a jump of size p at each call as quickly as getting the next value in the sequence; this is usually possible for linear congruential generators.
  ◮ The numbers used by each processing unit change with the number of units.
  ◮ The elements of a sub-stream are equally spaced in the original sequence. Such sub-streams may show high correlations, see [Matteis and Pagnutti, 84].
[diagram: x_1, x_2, ..., x_26 dealt out cyclically to the processing units]
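For a multiplicative LCG, the leap-frog jump of size p is again a single multiplication by a^p mod m, so each call stays as cheap as one ordinary step. A minimal sketch: with p units, unit j consumes x_{j+1}, x_{j+1+p}, x_{j+1+2p}, ...

```python
# Leap-frog splitting of x_{n+1} = a * x_n mod m across p processing units.
a, m = 16807, 2**31 - 1  # MINSTD parameters, illustrative only

def leapfrog_stream(seed, unit, p):
    a_p = pow(a, p, m)                     # multiplier for jumps of size p
    x = (pow(a, unit + 1, m) * seed) % m   # unit's first value: x_{unit+1}
    while True:
        yield x
        x = (a_p * x) % m                  # one O(1) call = one jump of size p

def original(seed, n):
    xs, x = [], seed
    for _ in range(n):
        x = (a * x) % m
        xs.append(x)
    return xs

# With 2 units, interleaving the two sub-streams recovers the full sequence.
s0 = leapfrog_stream(12345, unit=0, p=2)
s1 = leapfrog_stream(12345, unit=1, p=2)
merged = []
for _ in range(3):
    merged.append(next(s0))
    merged.append(next(s1))
assert merged == original(12345, 6)
```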

20. Splitting (3)
◮ Splitting the Mersenne Twister generator, see [Haramoto et al., 08] and [Haramoto, 08].
◮ Attractive because of the huge period N = 2^19937 − 1 with negligible long-range correlations. With 10^5 processors, each stream still has a period of about 2^19920 ≈ 10^5996.
◮ Hard to implement because of the computation of the n-th power of the multiplier.
