Introduction to OpenMP


  1. Agenda
     • Introduction to parallel computing
     • Classification of parallel computers
     • What is OpenMP?
     • OpenMP examples

     Parallel computing
     A form of computation in which many instructions are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved concurrently (in parallel). Parallel computing has become a dominant paradigm in computer architecture, mainly in the form of multi-core processors. Why is it required?

     Demand for parallel computing
     With the increased use of computers in every sphere of human activity, computer scientists are faced with two crucial issues today:
     • Processing has to be done faster
     • Larger or more complex problems need to be solved
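     As a first taste of the "OpenMP examples" item on the agenda, here is a minimal hello-world sketch (the program text is illustrative, but omp_get_thread_num and omp_get_num_threads are standard OpenMP runtime calls):

```c
#include <stdio.h>
#include <omp.h>

int main(void) {
    /* Fork a team of threads; each thread executes this block. */
    #pragma omp parallel
    {
        printf("Hello from thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}
```

     Compiled with, e.g., gcc -fopenmp hello.c, the thread count defaults to the number of available cores and can be overridden via the OMP_NUM_THREADS environment variable.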

  2. Parallelism
     Increasing the number of transistors as per Moore's law is not a solution, as it increases power consumption. Power consumption has been a major issue recently, as it causes a problem of processor heating. The perfect solution is parallelism, in hardware as well as in software.
     Moore's law (1965): The density of transistors will double every 18 months.

     Serial computing
     A problem is broken into a discrete series of instructions. Instructions are executed one after another. Only one instruction may execute at any moment in time.

     Parallel computing
     A problem is broken into discrete parts that can be solved concurrently. Each part is further broken down into a series of instructions. Instructions from each part execute simultaneously on different CPUs.

     Forms of parallelism
     • Bit-level parallelism: more than one bit is manipulated per clock cycle
     • Instruction-level parallelism: machine instructions are processed in multi-stage pipelines
     • Data parallelism: focuses on distributing the data across different parallel computing nodes; also called loop-level parallelism (see the sketch after this list)
     • Task parallelism: focuses on distributing tasks across different parallel computing nodes; also called functional or control parallelism
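     Loop-level (data) parallelism is what OpenMP's parallel for construct expresses most directly. A minimal sketch; the saxpy kernel is chosen here purely as an illustration, not taken from the slides:

```c
/* Data parallelism: the same operation is applied to different
   elements; loop iterations are divided among the threads. */
void saxpy(int n, float a, const float *x, float *y) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```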

  3. Key difference between data and task parallelism
     • Data parallelism: a task 'A' is divided into sub-parts, which are then processed in parallel
     • Task parallelism: a task 'A' and a task 'B' are processed separately by different processors (see the sketch after this slide)

     Parallelism versus concurrency
     • Concurrency is when two tasks can start, run, and complete in overlapping time periods. It doesn't necessarily mean that they will ever be running at the same instant of time. E.g., multitasking on a single-threaded machine.
     • Parallelism is when two tasks literally run at the same time, e.g. on a multi-core computer.

     Granularity
     • Granularity is the ratio of computation to communication.
     • Fine-grained parallelism means individual tasks are relatively small in terms of code size and execution time. The data are transferred among processors frequently, in amounts of one or a few memory words.
     • Coarse-grained is the opposite: data are communicated infrequently, after larger amounts of computation.

     Flynn's classification of computers (M. Flynn, 1966)
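     Task parallelism can be written in OpenMP with the sections construct. A minimal sketch, where task_A and task_B are hypothetical placeholders for any two independent pieces of work:

```c
void task_A(void);   /* hypothetical independent task */
void task_B(void);   /* hypothetical independent task */

/* Task parallelism: two different tasks run at the same
   time on different threads. */
void run_tasks(void) {
    #pragma omp parallel sections
    {
        #pragma omp section
        task_A();    /* one thread executes task A */
        #pragma omp section
        task_B();    /* another thread executes task B */
    }
}
```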

  4. Flynn's taxonomy: SISD
     [Diagram: one instruction stream (load a, add b, store c) executed by a single CPU over time steps t1 … tn]
     A serial (non-parallel) computer.
     Single Instruction: only one instruction stream is acted on by the CPU in one clock cycle.
     Single Data: only one data stream is used as input during any clock cycle.
     Examples: uni-core PCs, single-CPU workstations and mainframes.

     Flynn's taxonomy: MIMD
     [Diagram: each processing unit executes its own instruction stream on its own data over time steps t1 … tn]
     Multiple Instruction: every processing unit may be executing a different instruction at any given clock cycle.
     Multiple Data: each processing unit can operate on a different data element.
     Examples: most modern computers fall into this category.

     Flynn's taxonomy: SIMD
     [Diagram: all processing units execute the same instruction (load a(i), add b(i), store c(i)) on different data elements in lockstep]
     Single Instruction: all processing units execute the same instruction at any given clock cycle.
     Multiple Data: each processing unit can operate on a different data element.
     Best suited for problems of a high degree of regularity, such as image processing.
     Examples: Connection Machine CM-2, Cray C90, NVIDIA.

     Flynn's taxonomy: MISD
     [Diagram: a single data stream is fed through multiple processing units, each with its own instruction stream]
     A single data stream is fed into multiple processing units.
     Each processing unit operates on the data independently via independent instruction streams.
     Examples: few actual examples of this category have ever existed.
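     On a single modern core, SIMD-style execution is exposed in OpenMP 4.0 and later through the simd construct. A minimal sketch, assuming a compiler with OpenMP 4.0 support:

```c
/* SIMD: the compiler vectorizes the loop so that one
   instruction operates on several data elements at once. */
void scale(int n, float a, float *x) {
    #pragma omp simd
    for (int i = 0; i < n; i++)
        x[i] = a * x[i];
}
```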

  5. Flynn's Taxonomy (1966)
     [Diagram: the four classes SISD, SIMD, MISD and MIMD, arranged by number of instruction streams and number of data streams]

     Parallel classification
     Parallel computers:
     • SIMD (Single Instruction Multiple Data): synchronized execution of the same instruction on a set of processors
     • MIMD (Multiple Instruction Multiple Data): asynchronous execution of different instructions

     Parallel machine memory models
     • Distributed memory model
     • Shared memory model
     • Hybrid memory model

     Distributed memory
     Processors have local (non-shared) memory. A communication network is required: data from one processor must be communicated to another if required. Synchronization is the programmer's responsibility.
     + Very scalable
     + No cache coherence problems
     + Use commodity processors
     - A lot of programmer responsibility
     - NUMA (Non-Uniform Memory Access) times

  6. Shared memory
     Multiple processors operate independently but access a global memory space. Changes in one memory location are visible to all processors.
     + User friendly
     + UMA (Uniform Memory Access) times; UMA = SMP (Symmetric Multi-Processor)
     - Expense for large computers
     Poor scalability is a "myth".

     Hybrid memory
     The biggest machines today have both. Advantages and disadvantages are those of the individual parts.

     Shared memory access
     Uniform Memory Access (UMA), a.k.a. Symmetric Multi-Processors (SMP):
     • centralized shared memory; accesses to global memory from all processors have the same latency
     Non-Uniform Memory Access (NUMA), a.k.a. Distributed Shared Memory (DSM):
     • memory is distributed among the nodes; local accesses are much faster than remote accesses
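     OpenMP is a programming model for exactly this shared-memory picture: all threads see the same data, and the only coordination needed is around shared accumulators. A minimal sketch; the reduction clause gives each thread a private partial sum and combines them safely at the end:

```c
/* Shared memory in OpenMP: the array a is visible to every
   thread; reduction(+:sum) avoids a data race on sum. */
double sum_array(int n, const double *a) {
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += a[i];
    return sum;
}
```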

  7. Moore's Law (1965)
     Moore's Law, despite predictions of its demise, is still in effect. Despite power issues, transistor densities are still doubling every 18 to 24 months. With the end of frequency scaling, these new transistors can be used to add extra hardware, such as additional cores, to facilitate parallel computing.

     Parallelism in hardware and software
     • Hardware: going parallel (many cores)
     • Software? Herb Sutter (Microsoft): "The free lunch is over. Software performance will no longer increase from one generation to the next as hardware improves ... unless it is parallel software."

     Amdahl's Law [1967]
     Gene Amdahl, IBM designer.
     Assume a fraction $f_p$ of the execution time is parallelizable, with no overhead for scheduling, communication, synchronization, etc. The fraction $f_s = 1 - f_p$ is serial (not parallelizable).
     Time on 1 processor: $T_1 = f_p T_1 + (1 - f_p) T_1$
     Time on N processors: $T_N = \frac{f_p T_1}{N} + (1 - f_p) T_1$
     Speedup: $\frac{T_1}{T_N} = \frac{1}{\frac{f_p}{N} + (1 - f_p)} \le \frac{1}{1 - f_p}$
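     A quick worked instance of the bound, using an illustrative f_p = 0.9 (not a value from the slides): on N = 10 processors the speedup is 1 / (0.9/10 + 0.1) ≈ 5.26, and no number of processors can push it past 1 / (1 − 0.9) = 10. The same arithmetic as a sketch:

```c
#include <stdio.h>

/* Amdahl's Law: speedup(N) = 1 / (f_p/N + (1 - f_p)). */
static double amdahl(double f_p, int n) {
    return 1.0 / (f_p / n + (1.0 - f_p));
}

int main(void) {
    /* f_p = 0.9 is illustrative; the speedup saturates
       near the serial bound 1/(1 - f_p) = 10. */
    for (int n = 1; n <= 1024; n *= 4)
        printf("N = %4d   speedup = %5.2f\n", n, amdahl(0.9, n));
    return 0;
}
```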
