The dynamic multithreading model (part 1)
CSE 6230, Fall 2014, August 26
Recall: DAG model of parallel computation
Work = total number of operations. Could interpret as sequential time.
Span (or depth) = length of the longest sequential dependence chain.
Example: W = 15, D = 4. “Available” parallelism = W / D = 3.75.
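To make the definitions concrete, here is a minimal Python sketch that computes work and span for a small DAG. The DAG used (a complete binary tree of 15 unit-cost nodes) and the helper name `work_and_span` are assumptions chosen for illustration because they reproduce W = 15 and D = 4; this is not necessarily the DAG drawn on the slide.

```python
from functools import lru_cache


def work_and_span(dag):
    """Return (work, span) for a DAG of unit-cost operations.

    `dag` maps each node to the list of nodes that depend on it.
    Span is counted as the number of nodes on the longest dependence chain.
    """
    work = len(dag)  # one unit of work per node

    @lru_cache(maxsize=None)
    def longest_from(node):
        # Longest chain that starts at `node`, including `node` itself.
        return 1 + max((longest_from(s) for s in dag[node]), default=0)

    span = max(longest_from(v) for v in dag)
    return work, span


# Hypothetical example DAG: complete binary tree, edges from parent to child.
tree = {i: [c for c in (2 * i + 1, 2 * i + 2) if c < 15] for i in range(15)}

W, D = work_and_span(tree)
print(W, D, W / D)  # prints: 15 4 3.75
```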
Relating work, depth, and time on p “ideal” processors
Work and span laws: $T_p \ge \frac{W}{p}$ and $T_p \ge D$
Speedup and ideal speedup: $S_p \equiv \frac{T_1}{T_p} \le p$
Brent’s theorem: $T_p \le D + \frac{W - D}{p}$
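As a quick numeric check of these bounds, the sketch below evaluates the work and span laws (a lower bound on T_p) and Brent's theorem (an upper bound) for the earlier example with W = 15 and D = 4. The function name `time_bounds` is a hypothetical helper, and times are in the same abstract unit-cost operations as W and D.

```python
def time_bounds(W, D, p):
    """Return (lower, upper) bounds on T_p for work W, span D, p processors."""
    lower = max(W / p, D)      # work law and span law combined
    upper = D + (W - D) / p    # Brent's theorem
    return lower, upper


for p in (1, 2, 4, 8):
    lower, upper = time_bounds(15, 4, p)
    # The best possible speedup T_1 / T_p is reached when T_p hits its lower bound.
    print(f"p={p}: {lower} <= T_p <= {upper}, speedup <= {15 / lower}")
```

With these numbers the speedup bound stops improving once p exceeds W / D = 3.75, since the span law then dominates the lower bound.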
Parallel primitives to generate DAGs
A dynamic multithreading model
  Augment the usual sequential model with three concurrency keywords: spawn, sync, parallel-for
  Generates nested data-parallel DAGs
  Permits “simple” analysis of work, depth
  See new “readings” link at website for PDF file
Data-parallel operations, e.g., vector-add, scan
Quicksort
[Figure: recursion tree of quicksort on an example array of digits]
“Natural” parallelism exists at each branch in the recursion.
Quicksort

function Y ← qsort(X)    // |X| = n
  if |X| ≤ 1 then
    return Y ← X
  else
    X_L, Y_M, X_R ← partition-seq(X)    // Pivot
    Y_L ← qsort(X_L)
    Y_R ← qsort(X_R)
    return Y ← Y_L ∪ Y_M ∪ Y_R
  endif
Quicksort

function Y ← qsort(X)    // |X| = n
  if |X| ≤ 1 then
    return Y ← X
  else
    X_L, Y_M, X_R ← partition-seq(X)    // Pivot
    Y_L ← spawn qsort(X_L)
    Y_R ← spawn qsort(X_R)
    return Y ← Y_L ∪ Y_M ∪ Y_R
  endif
Quicksort

function Y ← qsort(X)    // |X| = n
  if |X| ≤ 1 then
    return Y ← X
  else
    X_L, Y_M, X_R ← partition-seq(X)    // Pivot
    Y_L ← spawn qsort(X_L)
    Y_R ← spawn qsort(X_R)
    sync
    return Y ← Y_L ∪ Y_M ∪ Y_R
  endif
Quicksort

function Y ← qsort(X)    // |X| = n
  if |X| ≤ 1 then
    return Y ← X
  else
    X_L, Y_M, X_R ← partition-seq(X)    // Pivot
    Y_L ← spawn qsort(X_L)
    Y_R ← qsort(X_R)
    sync
    return Y ← Y_L ∪ Y_M ∪ Y_R
  endif
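The spawn/sync structure of this last version can be imitated in Python as a rough sketch. It assumes that `spawn` maps to starting a `threading.Thread` and `sync` maps to `join`; that mapping, the middle-element pivot, and the list-comprehension partition are illustrative choices, not the course's implementation. In CPython the global interpreter lock prevents real speedup here, so the sketch only shows the control structure.

```python
import threading


def qsort(xs):
    """Sort a list, spawning the left recursive call as a child thread."""
    if len(xs) <= 1:
        return list(xs)

    # Sequential partition around a pivot (assumed: the middle element).
    pivot = xs[len(xs) // 2]
    left = [x for x in xs if x < pivot]
    mid = [x for x in xs if x == pivot]
    right = [x for x in xs if x > pivot]

    result = {}

    def child():
        result["left"] = qsort(left)

    t = threading.Thread(target=child)
    t.start()                      # spawn: child may run concurrently
    sorted_right = qsort(right)    # parent continues with the right half
    t.join()                       # sync: wait for the spawned child
    return result["left"] + mid + sorted_right


print(qsort([3, 1, 4, 7, 9, 5, 2, 8]))  # prints: [1, 2, 3, 4, 5, 7, 8, 9]
```

Only one of the two recursive calls needs to be spawned, because the parent can do the other call itself while the child runs. The `join` must come before the results are combined, since the child's result is not guaranteed to exist until then.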
Parallel loops

for i ← 1 to n do
  f(i)
Parallel loops

parallel-for i ← 1 to n do
  f(i)
Parallel loops

parallel-for i ← 1 to n do
  f(i)

Work and span?
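One way to explore the question is to fix an implementation and count. The sketch below assumes, as one common scheme and not something the slide states, that parallel-for is realized by recursively halving the index range with spawn and sync, that each split costs one unit, and that each f(i) runs serially with a known cost. The name `parallel_for_cost` and the `body_cost` parameter are hypothetical.

```python
def parallel_for_cost(lo, hi, body_cost):
    """Return (work, span) of a parallel-for over indices lo .. hi-1.

    Assumes recursive halving with spawn/sync, unit cost per split,
    and a serial body whose cost for index i is body_cost(i).
    """
    if hi - lo == 1:
        cost = body_cost(lo)
        return cost, cost  # a serial body has work equal to span
    mid = (lo + hi) // 2
    work_l, span_l = parallel_for_cost(lo, mid, body_cost)
    work_r, span_r = parallel_for_cost(mid, hi, body_cost)
    # Both halves are executed (work adds); they run in parallel (span takes the max).
    return 1 + work_l + work_r, 1 + max(span_l, span_r)


print(parallel_for_cost(0, 8, lambda i: 1))     # prints: (15, 4)
print(parallel_for_cost(0, 1024, lambda i: 1))  # prints: (2047, 11)
```

Under these assumptions, with unit-cost bodies, the work grows linearly in n (n bodies plus n − 1 splits) while the span grows like log n plus the cost of the slowest body.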