Recap: Brent's principle


  1. Recap: Brent's principle ● Sequential algorithms: time = work ● Parallel algorithms (PRAM): – time T(n) – work W(n) ● Brent's principle ● Work matters!
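
Brent's principle is commonly stated as the bound below. The symbols p (number of processors) and T_p(n) (running time on p processors) are notation assumed here, not taken from the slide.

```latex
T_p(n) \;\le\; \frac{W(n)}{p} + T(n)
```

With a fixed number of processors the W(n)/p term dominates for large n, which is one way to read "work matters".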

  2. Work efficiency ● A parallel algorithm is work-efficient if: – its work is asymptotically the same as that of the sequential algorithm ● Example: Parallel Alg. A; Parallel Alg. B

  3. Work efficiency ● A parallel algorithm is work-efficient if: – its work is asymptotically the same as that of the sequential algorithm ● Example: Parallel Alg. A: work-efficient; Parallel Alg. B: NOT work-efficient

  4. Prefix Sums ● Given A: an array of n integers ● Find B: the prefix sums of A ● A: 3 1 1 7 2 5 9 2 4 3 3 ● B: 3 4 5 12 14 19 28 30 34 37 40
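
As a formula, and assuming 0-based indexing (the slide does not state an indexing convention), the definition can be written as:

```latex
B[i] \;=\; \sum_{k=0}^{i} A[k], \qquad 0 \le i < n
```

For the example above, B[3] = 3 + 1 + 1 + 7 = 12.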

  5. Sequential Prefix Sums ● Trivial sequential algorithm ● A: 3 1 1 7 2 5 9 2 4 3 3 ● B: 0 0 0 0 0 0 0 0 0 0 0

  6. Sequential Prefix Sums ● Trivial sequential algorithm ● A: 3 1 1 7 2 5 9 2 4 3 3 ● B: 3 0 0 0 0 0 0 0 0 0 0

  7. Sequential Prefix Sums ● Trivial sequential algorithm ● A: 3 1 1 7 2 5 9 2 4 3 3 ● B: 3 4 0 0 0 0 0 0 0 0 0

  8. Sequential Prefix Sums ● Trivial sequential algorithm ● A: 3 1 1 7 2 5 9 2 4 3 3 ● B: 3 4 5 0 0 0 0 0 0 0 0

  9. Sequential Prefix Sums ● Trivial sequential algorithm ● A: 3 1 1 7 2 5 9 2 4 3 3 ● B: 3 4 5 12 14 19 28 30 34 37 40

  10. Sequential Prefix Sums ● Trivial sequential algorithm ● Can we do this in-place?

  12. Sequential Prefix Sums ● Trivial sequential algorithm ● What is the complexity of this algorithm?

  13. Sequential Prefix Sums ● Trivial sequential algorithm ● What is the complexity of this algorithm? ● Loop runs n-1 times: T_1(n) = O(n)
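
The code for the trivial sequential algorithm is not part of this transcript. A minimal Python sketch of what these slides describe might look like the following; the function names are illustrative, not the lecture's.

```python
def prefix_sums(a):
    """Return a new list b with b[i] = a[0] + ... + a[i]."""
    b = [0] * len(a)
    if a:
        b[0] = a[0]
    for i in range(1, len(a)):      # loop runs n-1 times
        b[i] = b[i - 1] + a[i]
    return b


def prefix_sums_in_place(a):
    """Overwrite a with its own prefix sums; no second array is needed."""
    for i in range(1, len(a)):      # a[i-1] already holds a prefix sum
        a[i] += a[i - 1]
    return a


A = [3, 1, 1, 7, 2, 5, 9, 2, 4, 3, 3]
print(prefix_sums(A))               # [3, 4, 5, 12, 14, 19, 28, 30, 34, 37, 40]
```

Both versions perform one addition per loop iteration, which gives the O(n) bound.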

  14. Parallel approach ● Similar to summation – Except all cells need correct values ● "Binary tree" approach – Add values of increasing distance ● Goal: time T(n) = O(log n), work W(n) = O(n)

  15. Naive parallel algorithm ● 3 1 1 7 2 5 9 2 → 3 4 2 8 9 7 14 11

  16. Naive parallel algorithm ● 3 1 1 7 2 5 9 2 → 3 4 2 8 9 7 14 11 → 3 4 5 12 11 15 23 18

  17. Naive parallel algorithm ● 3 1 1 7 2 5 9 2 → 3 4 2 8 9 7 14 11 → 3 4 5 12 11 15 23 18 → 3 4 5 12 14 19 28 30
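
The rows above are consistent with adding, in round j, the value at distance 2^j to the left of each cell. The sketch below simulates that on one processor. It is an assumed reconstruction rather than the lecture's pseudocode, and the snapshot copy stands in for the PRAM's simultaneous reads.

```python
def naive_parallel_prefix_sums(a):
    """Simulate the naive parallel scan; returns (result, rounds, additions)."""
    a = list(a)
    n = len(a)
    rounds = 0
    additions = 0
    dist = 1                          # distance 2^j used in round j
    while dist < n:
        prev = a[:]                   # snapshot: all reads happen before any write
        for i in range(dist, n):      # on a PRAM these iterations run at once
            a[i] = prev[i] + prev[i - dist]
            additions += 1
        dist *= 2
        rounds += 1
    return a, rounds, additions


print(naive_parallel_prefix_sums([3, 1, 1, 7, 2, 5, 9, 2]))
# ([3, 4, 5, 12, 14, 19, 28, 30], 3, 17)
```

The returned counters (3 rounds, 17 additions for n = 8) are useful later when comparing time and work.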

  18. Proof of correctness ● Does this correctly solve prefix sums? – Let's prove it: ● Before step j: – the first 2^j elements are prefix sums, – each A[i] = sum of the previous 2^j elements (ending at position i) ● Step 0: 3 1 1 7 2 5 9 2 → 3 4 2 8 9 7 14 11 ● Step 1: 3 4 2 8 9 7 14 11 → 3 4 5 12 11 15 23 18

  19. Proof of correctness ● Claim: before each round j, each A[i] holds the sum of the (at most) 2^j input elements ending at position i ● Proof: ● Base case: j = 0

  20. Proof of correctness ● Inductive step: Assume the claim holds before the j-th step ● Step j' = j+1:

  21. Proof of correctness ● [Figure: A[i-2^j] and A[i], each covering 2^j elements]

  22. Proof of correctness
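
The formulas of this proof are not in the transcript. One way to write the induction is sketched below; a_k denotes the original input values, a symbol introduced here, and indexing is assumed to be 0-based.

```latex
\begin{aligned}
&\textbf{Claim: before round } j:\quad
  A[i] = \sum_{k=\max(0,\; i-2^j+1)}^{i} a_k \\[4pt]
&\textbf{Base case } (j = 0):\quad A[i] = a_i \\[4pt]
&\textbf{Step } (i \ge 2^j):\quad
  A'[i] = A[i-2^j] + A[i]
        = \sum_{k=\max(0,\; i-2^{j+1}+1)}^{i-2^j} a_k
          \;+\; \sum_{k=i-2^j+1}^{i} a_k
        = \sum_{k=\max(0,\; i-2^{j+1}+1)}^{i} a_k \\[4pt]
&\textbf{Step } (i < 2^j):\quad
  A'[i] = A[i] = \sum_{k=0}^{i} a_k \quad\text{(already a full prefix sum)}
\end{aligned}
```

Once 2^j ≥ n the lower limit is 0 for every i, so every cell holds its prefix sum.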

  23. Naive algorithm time ● What is T(n)? – Remember: we have unlimited processors!

  24. Naive algorithm time ● What is T(n)? – Inner loop is done in parallel: O(1) – Outer loop is sequential: O(log n) ● T(n) = O(log n)

  25. Naive algorithm work ● What is W(n)?
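
The slide's formula is not in the transcript. Assuming n is a power of two and that round j updates the n - 2^j cells with index at least 2^j, the work can be written as a sum with two terms:

```latex
W(n) \;=\; \sum_{j=0}^{\log_2 n - 1} \left( n - 2^j \right)
     \;=\; n \log_2 n \;-\; \sum_{j=0}^{\log_2 n - 1} 2^j
```

For n = 8 this gives 7 + 6 + 4 = 17 additions.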

  26. Solving the summation ● What is the 2nd term? ● [Figure: binary tree labeled k] sum of internal (red) nodes =

  27. Proof of summation ● Claim: ● Proof by induction: ● Base case: k = 1

  28. Proof of summation ● Inductive step: ● Assume the claim for k ≥ 1, show it for k' = k+1
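
The claim itself is missing from the transcript. Assuming it is the usual geometric sum (which matches a base case of k = 1), the induction would run as follows:

```latex
\begin{aligned}
&\textbf{Claim:}\quad \sum_{i=0}^{k-1} 2^i = 2^k - 1 \\[4pt]
&\textbf{Base case } (k = 1):\quad 2^0 = 1 = 2^1 - 1 \\[4pt]
&\textbf{Inductive step:}\quad
  \sum_{i=0}^{k} 2^i = \left( \sum_{i=0}^{k-1} 2^i \right) + 2^k
                     = (2^k - 1) + 2^k = 2^{k+1} - 1
\end{aligned}
```

With k = log_2 n the sum equals n - 1, so under this reading the naive algorithm's work is n log_2 n - (n - 1) = Θ(n log n).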

  29. Naive algorithm work ● Is this work-efficient?

  30. Naive algorithm work ● Is this work-efficient? No. Our sequential algorithm was O(n)

  31. Ideas for a work-efficient alg. ● What if we had half the prefix sums? ● A: 3 1 1 7 2 5 9 2 | B: 3 4 5 12 _ _ _ _ ● No help :(

  32. Ideas for a work-efficient alg. ● What about this half? ● A: 3 1 1 7 2 5 9 2 | B: _ 4 _ 12 _ 19 _ 30 ● How would we compute the rest?

  33. Ideas for a work-efficient alg. ● B[i] = B[i-1] + A[i] ● A: 3 1 1 7 2 5 9 2 | B: 3 4 5 12 14 19 28 30

  34. Work-efficient algorithm ● Idea based on balanced binary trees: – Depth of a tree is O(log n) – Number of nodes is O(n) ● Compute prefix sum for every odd element ● Use these to compute the remaining ones

  35. Computing prefix for all 2^i ● Perform summation (reduce) ● Save intermediate values in B ● B: 4 8 7 11 | A: 3 1 1 7 2 5 9 2

  36. Computing prefix for all 2^i ● Continue until total sum is found ● B: 4 12 7 18 | 4 8 7 11 | A: 3 1 1 7 2 5 9 2

  37. Computing prefix for all 2^i ● Continue until total sum is found ● B: 4 12 7 30 | 4 12 7 18 | 4 8 7 11 | A: 3 1 1 7 2 5 9 2

  38. Remaining odd elements ● Each prefix sum at 2^i is correct! ● B: 4 12 7 30 | A: 3 1 1 7 2 5 9 2

  39. Remaining odd elements ● Each prefix sum at 2^i is correct! ● Add the preceding 2^i entry to fix the other odd B entries ● B: 4 12 7 30 → 4 12 19 30 | A: 3 1 1 7 2 5 9 2
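
The two phases on these slides (reduce while saving intermediate values, then fix the remaining entries) can be sketched iteratively as below. This is an assumed reconstruction: it stores B compactly, so b[k] stands for the prefix sum at index 2k+1 of A, and it assumes len(a) is a power of two.

```python
def odd_index_prefix_sums(a):
    """Prefix sums of a at odd indices 1, 3, 5, ... (len(a) a power of two, >= 2)."""
    # Level 1: pairwise sums, stored compactly (b[k] belongs to index 2k+1 of a).
    b = [a[2 * k] + a[2 * k + 1] for k in range(len(a) // 2)]
    m = len(b)

    # Reduce: sum at doubling distances, keeping the intermediate values in b.
    s = 1
    while s < m:
        for i in range(2 * s - 1, m, 2 * s):   # iterations are independent
            b[i] += b[i - s]
        s *= 2

    # Fix-up: add the preceding finished entry to each entry still incomplete.
    s = m // 4
    while s >= 1:
        for i in range(3 * s - 1, m, 2 * s):   # iterations are independent
            b[i] += b[i - s]
        s //= 2
    return b


print(odd_index_prefix_sums([3, 1, 1, 7, 2, 5, 9, 2]))   # [4, 12, 19, 30]
```

On the slide's input, b passes through 4 8 7 11, then 4 12 7 18, then 4 12 7 30, and the fix-up turns the 7 into 19.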

  40. Defining the algorithm ● Each odd-index prefix sum is correct ● Empty spaces in B – Easier to use consecutive spaces – Compress to the first n/2 positions ● B: 4 12 19 30 | A: 3 1 1 7 2 5 9 2

  41. Computing prefix for all odds ● B: 4 12 19 30 | A: 3 1 1 7 2 5 9 2

  42. Proof of correctness ● Claim: ● Proof: ● Base case:

  43. Proof of correctness ● Inductive step: Assume true for i > 0, show for i' = i + 1

  44. Even index sums ● We compute all odd-index prefix sums ● Simple to fill in evens:

  45. Putting it all together ● Odd-index prefix sums ● Fill in even indices
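
The combined algorithm's code is not in the transcript. A recursive sketch of the two steps named on this slide is given below. It assumes 0-based indexing and uses the rule B[i] = B[i-1] + A[i] from the earlier slide to fill even indices; the function name is illustrative.

```python
def work_efficient_prefix_sums(a):
    """Recursive scan: odd-index sums via recursion, then fill in even indices."""
    n = len(a)
    if n == 0:
        return []
    if n == 1:
        return [a[0]]
    # Pairwise sums (one parallel step). For odd n the last element has no
    # partner and is handled by the even-index rule below.
    half = [a[2 * k] + a[2 * k + 1] for k in range(n // 2)]
    odd = work_efficient_prefix_sums(half)     # odd[k] = prefix sum at index 2k+1
    b = [0] * n
    for i in range(n):                         # one parallel step
        if i % 2 == 1:
            b[i] = odd[i // 2]
        elif i == 0:
            b[i] = a[0]
        else:
            b[i] = odd[i // 2 - 1] + a[i]      # B[i] = B[i-1] + A[i]
    return b


print(work_efficient_prefix_sums([3, 1, 1, 7, 2, 5, 9, 2]))
# [3, 4, 5, 12, 14, 19, 28, 30]
```

Each call does O(n) additions outside the recursion and recurses on n/2 elements, which is the structure the time and work analyses rely on.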

  46. Time analysis ● Each recursive call works on ½ as many elements – Can be done in parallel: – Solution?
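
The recurrence is missing from the transcript. Assuming the pairwise sums and the even-index fill each take O(1) parallel time, it would read:

```latex
T(n) \;=\; T\!\left(\tfrac{n}{2}\right) + O(1)
\quad\Longrightarrow\quad
T(n) \;=\; O(\log n)
```

The recursion depth is log_2 n and each level adds constant time.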

  48. Work analysis ● Have to consider work done at each level: – Solution?

  49. Work analysis ● Have to consider work done at each level: – Solution? – (or using the Master Theorem)
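
The work recurrence is likewise missing. Assuming O(n) work per level outside the recursive call, a sketch of both routes to the solution is:

```latex
\begin{aligned}
W(n) &= W\!\left(\tfrac{n}{2}\right) + O(n) \\
     &= O\!\left( n + \tfrac{n}{2} + \tfrac{n}{4} + \dots + 1 \right)
      \;=\; O(2n) \;=\; O(n) \\[6pt]
\text{Master Theorem: } & a = 1,\; b = 2,\; f(n) = \Theta(n),\;
  n^{\log_b a} = n^0 = 1 \;\Rightarrow\; W(n) = \Theta(f(n)) = \Theta(n)
\end{aligned}
```

The per-level work forms a geometric series dominated by the top level.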

  50. Algorithm overview ● Parallel runtime same as our naive algorithm ● Less work... but is it work-efficient?

  51. Algorithm overview ● Parallel runtime same as our naive algorithm ● Less work... but is it work-efficient? Yes!

  52. One more example ● B: (empty) | A: 2 1 1 3 2 4 1 1 3 2 6 1 8 4 2 1

  53. One more example ● B: 3 4 6 2 5 7 12 3 | A: 2 1 1 3 2 4 1 1 3 2 6 1 8 4 2 1

  54. One more example ● B: 7 8 12 15 | A: 3 4 6 2 5 7 12 3 | 2 1 1 3 2 4 1 1 3 2 6 1 8 4 2 1

  55. One more example ● B: 15 27 | A: 7 8 12 15 | 3 4 6 2 5 7 12 3 | 2 1 1 3 2 4 1 1 3 2 6 1 8 4 2 1

  56. One more example ● B: 42 | A: 15 27 | 7 8 12 15 | 3 4 6 2 5 7 12 3 | 2 1 1 3 2 4 1 1 3 2 6 1 8 4 2 1

  57. One more example ● B: 42 | A: 15 42 | 7 8 12 15 | 3 4 6 2 5 7 12 3 | 2 1 1 3 2 4 1 1 3 2 6 1 8 4 2 1

  58. One more example ● 42 | B: 15 42 | A: 7 15 27 42 | 3 4 6 2 5 7 12 3 | 2 1 1 3 2 4 1 1 3 2 6 1 8 4 2 1

  59. One more example ● 42 | 15 42 | B: 7 15 27 42 | A: 3 7 13 15 20 27 39 42 | 2 1 1 3 2 4 1 1 3 2 6 1 8 4 2 1

  60. One more example ● 42 | 15 42 | 7 15 27 42 | B: 3 7 13 15 20 27 39 42 | A: 2 1 1 3 2 4 1 1 3 2 6 1 8 4 2 1

  61. One more example ● 42 | 15 42 | 7 15 27 42 | B: 3 7 13 15 20 27 39 42 | A: 2 3 4 7 9 13 14 15 18 20 26 27 35 39 41 42
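
To reproduce the rows of this example, the sketch below records each level on the way up (pairwise sums) and on the way back down (prefix sums). It is an illustrative helper, not code from the lecture, and it assumes the input length is a power of two.

```python
def trace_scan(a):
    """Return (up, down): pairwise-sum rows going up, prefix-sum rows coming down."""
    up = [list(a)]
    while len(up[-1]) > 1:                     # assumes len(a) is a power of two
        row = up[-1]
        up.append([row[2 * k] + row[2 * k + 1] for k in range(len(row) // 2)])
    down = [up[-1]]                            # the total is its own prefix sum
    for row in reversed(up[:-1]):
        odd = down[-1]                         # prefix sums at odd indices of row
        filled = []
        for i, x in enumerate(row):
            if i % 2 == 1:
                filled.append(odd[i // 2])
            else:
                filled.append(x if i == 0 else odd[i // 2 - 1] + x)
        down.append(filled)
    return up, down


up, down = trace_scan([2, 1, 1, 3, 2, 4, 1, 1, 3, 2, 6, 1, 8, 4, 2, 1])
for row in up:
    print("up:  ", row)
for row in down:
    print("down:", row)
```

The last row of down is the full prefix-sum array shown on the final slide.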
