

  1. Analytical Modeling of Parallel Programs (Chapter 5) Alexandre David B2-206

  2. Topic Overview
     - Sources of overhead in parallel programs.
     - Performance metrics for parallel systems.
     - Effect of granularity on performance.
     - Scalability of parallel systems.
     - Minimum execution time and minimum cost-optimal execution time.
     - Asymptotic analysis of parallel programs.
     - Other scalability metrics.

  3. Analytical Modeling – Basics
     - A sequential algorithm is evaluated by its runtime as a function of its input size.
     - O(f(n)), Ω(f(n)), Θ(f(n)).
     - The asymptotic runtime is independent of the platform: analysis "up to a constant factor".
     - A parallel algorithm has more parameters. Which ones?

  4. Analytical Modeling – Basics
     - A parallel algorithm is evaluated by its runtime as a function of
       - the input size,
       - the number of processors,
       - the communication parameters.
     - Which performance measures?
     - Compare against which (serial) baseline?

  5. Sources of Overhead in Parallel Programs
     - Overheads: wasted computation, communication, idling, contention.
     - Inter-process interaction.
     - Load imbalance.
     - Dependencies.

  6. Performance Metrics for Parallel Systems
     - Execution time = time elapsed between
       - the beginning and the end of execution on a sequential computer;
       - the start of the first processor and the finish of the last processor on a parallel computer.

  7. Performance Metrics for Parallel Systems
     - Total parallel overhead.
     - Total time collectively spent by all processing elements = pT_P.
     - Time spent doing useful work (serial time) = T_S.
     - Overhead function: T_O = pT_P - T_S.

  8. Performance Metrics for Parallel Systems
     - What is the benefit of parallelism? Speedup, of course… let's define it.
     - Speedup S = T_S/T_P.
     - Example: compute the sum of n elements.
       - Serial algorithm: Θ(n).
       - Parallel algorithm: Θ(log n) (on n processing elements).
       - Speedup = Θ(n/log n).
     - The baseline T_S is the best sequential algorithm available.

  9. Speedup
     - Theoretically, speedup can never exceed p: if S > p, you have found a better sequential algorithm. Best case: T_P = T_S/p.
     - In practice, super-linear speedup is sometimes observed. How?
       - The serial algorithm does more work.
       - Cache effects (more aggregate cache in the parallel system).
       - Exploratory decompositions.

  10. Speedup – Example: Depth-First Search
     - 1 processing element: 14 t_c.
     - 2 processing elements: 5 t_c.
     - Speedup: 14/5 = 2.8 (super-linear).

  11. Performance Metrics
     - Efficiency E = S/p.
       - Measures the fraction of time spent doing useful work.
       - Previous sum example: E = Θ(1/log n).
     - Cost C = pT_P.
       - A.k.a. work or processor-time product.
       - Note: E = T_S/C.
     - Cost optimal if E is a constant (see the sketch below).
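A minimal sketch of these metrics in Python; the `parallel_metrics` helper and the timing values are illustrative, not from the slides.

```python
# Minimal sketch: compute the metrics of slides 7-11 from measured times.
# The example numbers are hypothetical, chosen only to illustrate the formulas.

def parallel_metrics(t_serial, t_parallel, p):
    """Return speedup S, efficiency E, cost C and overhead T_O
    for a run on p processing elements."""
    speedup = t_serial / t_parallel      # S = T_S / T_P
    efficiency = speedup / p             # E = S / p
    cost = p * t_parallel                # C = p * T_P (processor-time product)
    overhead = cost - t_serial           # T_O = p*T_P - T_S
    return speedup, efficiency, cost, overhead

# Hypothetical timings: T_S = 10.0 s sequentially, T_P = 1.6 s on p = 8 PEs.
s, e, c, o = parallel_metrics(10.0, 1.6, 8)
print(f"S = {s:.2f}, E = {e:.2f}, C = {c:.1f}, T_O = {o:.1f}")
# -> S = 6.25, E = 0.78, C = 12.8, T_O = 2.8
```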

  12. Effect of Granularity on Performance
     - Scaling down: use fewer processing elements than the maximum possible.
     - Naïve way to scale down:
       - Assign the work of n/p original processing elements to each of the p processing elements.
       - Computation increases by a factor of n/p.
       - Communication growth ≤ n/p.
     - If a parallel system with n processing elements is cost optimal, then it is still cost optimal with p.
     - If it is not cost optimal, it may still not be cost optimal after the granularity increase.

  13. Adding n Numbers – Bad Way
     (Figure: n = 16 numbers, four per processing element 0–3.)

  14. Adding n Numbers – Bad Way
     (Figure: pairwise partial sums 0+1, 2+3, …, 14+15 formed on the four PEs.)

  15. Adding n Numbers – Bad Way
     (Figure: the partial sums 0+1+2+3, …, 12+13+14+15 are combined across the PEs.)
     Bad way: T = Θ((n/p) log p).

  16. Adding n Numbers – Good Way
     (Figure: PE i holds numbers 4i … 4i+3 and adds them locally.)

  17. Adding n Numbers – Good Way
     (Figure: local sums 0+1+2+3, 4+5+6+7, 8+9+10+11, 12+13+14+15 on PEs 0–3, then combined.)
     Much less communication: T = Θ(n/p + log p). A sketch of this scheme follows.
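A minimal sketch of the "good way", assuming p is a power of two that divides n, and simulating the PEs sequentially (a real implementation would use MPI or similar). It mirrors the Θ(n/p) local phase followed by log p tree-reduction steps.

```python
# Good way: each of the p processing elements first sums its own block of
# n/p numbers, then the p partial sums are combined in a log(p)-deep
# binary-tree reduction. Simulated sequentially here.

def parallel_sum(values, p):
    n = len(values)
    block = n // p  # assume p divides n (and p is a power of two)
    # Local phase: Theta(n/p) additions per PE, no communication.
    partial = [sum(values[i * block:(i + 1) * block]) for i in range(p)]
    # Reduction phase: log p communication/addition steps.
    step = 1
    while step < p:
        for i in range(0, p, 2 * step):
            partial[i] += partial[i + step]  # PE i receives from PE i+step
        step *= 2
    return partial[0]

print(parallel_sum(list(range(16)), 4))  # -> 120, matching sum(range(16))
```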

  18. Scalability of Parallel Systems
     - In practice we develop and test on small systems with small problems.
     - Problem: what happens for real, large problems on large systems?
     - It is difficult to extrapolate the results.

  19. Problem with Extrapolation
     (Figure omitted.)

  20. Scaling Characteristics of Parallel Programs
     - Rewrite efficiency E:

       E = S/p = T_S/(pT_P),  and since pT_P = T_S + T_O:

       E = 1/(1 + T_O/T_S)

     - What does it tell us?

  21. Example: Adding Numbers

       T_P = n/p + 2 log p

       ⇒ S = n/(n/p + 2 log p)

       ⇒ E = S/p = 1/(1 + (2p log p)/n)

     (Evaluated numerically in the sketch below.)
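A small sketch evaluating this efficiency formula; the (n, p) values are arbitrary illustrations, chosen to show E falling with p at fixed n and recovering as n grows.

```python
# Evaluate E = 1 / (1 + 2 p log2(p) / n) from slide 21 for a few (n, p) pairs.
from math import log2

def efficiency(n, p):
    return 1.0 / (1.0 + 2.0 * p * log2(p) / n)

for n in (64, 192, 512):
    row = ", ".join(f"p={p}: {efficiency(n, p):.2f}" for p in (1, 4, 8, 16))
    print(f"n={n}: {row}")
# n=64:  p=1: 1.00, p=4: 0.80, p=8: 0.57, p=16: 0.33
# n=192: p=1: 1.00, p=4: 0.92, p=8: 0.80, p=16: 0.60
# n=512: p=1: 1.00, p=4: 0.97, p=8: 0.91, p=16: 0.80
```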

  22. Speedup
     (Figure: speedup vs. number of processing elements.)

  23. Scalable Parallel System
     - A system that can maintain constant efficiency when the number of processors and the problem size increase together.
     - In many cases T_O = f(T_S, p) and grows sub-linearly with T_S. It can then be possible to increase p and T_S while keeping E constant.
     - Scalability measures the ability to increase speedup as a function of p.

  24. Cost-Optimality
     - Cost-optimal parallel systems have efficiency Θ(1).
     - So scalability and cost-optimality are linked.
     - Adding-numbers example: becomes cost-optimal when n = Ω(p log p).

  25. Scalable System
     - Efficiency can be kept constant when
       - the number of processors increases, and
       - the problem size increases.
     - At which rate should the problem size increase with the number of processors?
     - This rate determines the degree of scalability.
     - In complexity theory, problem size = size of the input. Here it is the number of basic operations needed to solve the problem, denoted W.

  26. Rewrite Formulas (in terms of W)
     - Parallel execution time: T_P = (W + T_O(W, p))/p.
     - Speedup: S = W/T_P = W·p/(W + T_O(W, p)).
     - Efficiency: E = S/p = 1/(1 + T_O(W, p)/W).

  27. Isoefficiency Function
     - For scalable systems, efficiency can be kept constant if T_O/W is kept constant.
     - From E = 1/(1 + T_O/W), a target efficiency E means keeping

       W = K·T_O(W, p),  with K = E/(1 - E) constant.

     - W = K·T_O(W, p) is the isoefficiency function (evaluated below for the adding example).
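A sketch of this relation for the adding example, assuming T_O = 2p log p (slides 21 and 28) and an illustrative target efficiency of 0.8.

```python
# Isoefficiency sketch for adding n numbers: T_O = 2 p log2 p, so keeping
# E fixed requires W = K * 2 p log2 p with K = E / (1 - E).
from math import log2

def required_W(p, E=0.8):
    K = E / (1.0 - E)              # from E = 1 / (1 + T_O / W)
    return K * 2.0 * p * log2(p)   # W = K * T_O(W, p)

for p in (2, 4, 8, 16, 32):
    print(f"p={p:2d}: W must grow to ~{required_W(p):.0f}")
# p= 2: ~16   p= 4: ~64   p= 8: ~192   p=16: ~512   p=32: ~1280
```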

  28. Example
     - Adding numbers: we saw that T_O = 2p log p.
     - We get W = K·2p log p.
     - If we increase p to p', the problem size must increase by a factor of (p' log p')/(p log p) to keep the same efficiency.
     - For instance, going from p = 4 to p' = 16 (4x the PEs) requires 8x the problem size, since (16·4)/(4·2) = 8 (see the sketch below).
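The growth factor as a one-liner, confirming the 4 → 16 example above (the helper name is illustrative):

```python
# Scaling p -> p' requires the problem size to grow by (p' log p')/(p log p)
# to hold efficiency constant for the adding example.
from math import log2

def growth_factor(p, p_new):
    return (p_new * log2(p_new)) / (p * log2(p))

print(growth_factor(4, 16))  # -> 8.0: 4x the PEs needs 8x the problem size
```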

  29. Example
     (Worked overhead function in figure, omitted.) Isoefficiency = Θ(p³).

  30. Why?
     - After an isoefficiency analysis, we can test our parallel program on a few processors and then predict what will happen on larger systems.

  31. Link to Cost-Optimality
     - A parallel system is cost-optimal iff pT_P = Θ(W).
     - Equivalently: a parallel system is cost-optimal iff its overhead T_O does not exceed (asymptotically) the problem size.

  32. Lower Bounds
     - For a problem consisting of W units of work, at most p ≤ W processors can be used optimally.
     - W = Ω(p) is the lower bound.
     - For a degree of concurrency C(W): p ≤ C(W).
     - C(W) = Θ(W) for optimality (necessary condition).

  33. Example
     - Gaussian elimination: W = Θ(n³).
     - But the n variables are eliminated consecutively, each step with Θ(n²) operations → C(W) = O(n²) = O(W^(2/3)).
     - To use all processors: C(W) = Θ(p) → W = Ω(p^(3/2)), as derived below.
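A short derivation of the last two bullets, only rearranging the quantities already defined (using n = Θ(W^{1/3})):

```latex
% Degree of concurrency for Gaussian elimination, W = \Theta(n^3):
% each of the n consecutive eliminations offers at most \Theta(n^2)
% concurrent operations.
\[
  C(W) = O(n^2) = O\!\left((W^{1/3})^2\right) = O\!\left(W^{2/3}\right),
\]
\[
  p \le C(W) = O\!\left(W^{2/3}\right)
  \;\Longrightarrow\;
  W = \Omega\!\left(p^{3/2}\right).
\]
```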

  34. Minimum Execution Time
     - Viewing T_P as a function of p, we want its minimum: find p_0 such that dT_P/dp = 0.
     - Adding n numbers: T_P = n/p + 2 log p → p_0 = n/2 → T_P^min = 2 log n (derivation below).
     - Fastest, but not necessarily cost-optimal.
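The minimization spelled out (a sketch; like the slide, it treats d(log p)/dp as 1/p, which only affects constant factors):

```latex
% Minimizing T_P(p) = n/p + 2 \log p for adding n numbers.
\[
  \frac{dT_P}{dp} = -\frac{n}{p^2} + \frac{2}{p} = 0
  \;\Longrightarrow\; p_0 = \frac{n}{2},
\]
\[
  T_P^{\min} = \frac{n}{n/2} + 2\log\frac{n}{2}
             = 2 + 2\log n - 2 = 2\log n .
\]
```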

  35. Cost-Optimal Minimum Execution Time
     - If we solve cost-optimally, what is the minimum execution time?
     - We saw that if the isoefficiency function is Θ(f(p)), then a problem of size W can be solved cost-optimally iff p = O(f⁻¹(W)).
     - Cost-optimal system: T_P = Θ(W/p) → T_P^cost_opt = Ω(W/f⁻¹(W)).

  36. Example: Adding Numbers
     - Isoefficiency function: f(p) = Θ(p log p). W = n = f(p) = p log p → log n = log p + log log p ≈ log p, so p ≈ n/log n = f⁻¹(n).
     - T_P^cost_opt = Ω(W/f⁻¹(W)) = Ω((n/log n)·log(n/log n)/(n/log n)) = Ω(log(n/log n)) = Ω(log n - log log n) = Ω(log n).
     - T_P = Θ(n/p + log p) = Θ(log n + log(n/log n)) = Θ(2 log n - log log n) = Θ(log n).
     - For this example, T_P^cost_opt = Θ(T_P^min).

  37. Remark
     - If p_0 > C(W), then its value is meaningless: T_P^min is obtained for p = C(W).

  38. Asymptotic Analysis of Parallel Programs
