Prefix sums on GPUs

Bruce Merry
Department of Computer Science, University of Cape Town
GPGPU2 Workshop 2014
Outline

1 Definition and Applications
    Motivating Problem
    Definitions
    Other Applications
2 Parallel Algorithms
    Kogge-Stone
    Brent-Kung
3 GPU Strategies
    Reduce-then-Scan
    Two-Level Prefix Sum
Problem Statement

For every object in a set, output a list of the other objects that differ by less than some amount.

This is deliberately vague: it could be for n-body simulation, clustering, or scattered data interpolation.
Output Format

The lists should be packed together contiguously.

A0 A1 A2 B0 B1 D0 E0 E1

Assuming one workitem per object, how do the workitems know where to start?
Solution

This can be solved with a multi-pass approach:

1 Every workitem counts how many records to emit, and writes this number to a buffer.
2 The buffer is processed to determine the start position for each object, and these positions are written to a buffer.
3 Each workitem reads this buffer, and emits its records in the right place.
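The three passes above can be sketched sequentially in Python. This is only an illustration under stated assumptions, not the implementation from the talk: objects are taken to be plain numbers, "differ by less than some amount" is taken to mean an absolute difference below a threshold, and the names `neighbours` and `pack_neighbour_lists` are hypothetical. Each loop iteration stands in for one workitem.

```python
from itertools import accumulate


def neighbours(values, i, threshold):
    """Indices of the other objects whose value is within threshold of values[i]."""
    return [j for j, v in enumerate(values)
            if j != i and abs(v - values[i]) < threshold]


def pack_neighbour_lists(values, threshold):
    n = len(values)

    # Pass 1: every "workitem" counts how many records it will emit.
    counts = [len(neighbours(values, i, threshold)) for i in range(n)]

    # Pass 2: the counts are turned into start positions.
    # accumulate(..., initial=0) yields n + 1 running totals (Python 3.8+):
    # the first n are the start positions, the last is the total output size.
    totals = list(accumulate(counts, initial=0))
    starts, total = totals[:n], totals[n]

    # Pass 3: every "workitem" writes its records at its own start position.
    out = [None] * total
    for i in range(n):
        for k, j in enumerate(neighbours(values, i, threshold)):
            out[starts[i] + k] = j
    return starts, out


starts, out = pack_neighbour_lists([1.0, 1.5, 10.0, 1.2], threshold=1.0)
print(starts)  # [0, 2, 4, 4]
print(out)     # [1, 3, 0, 3, 0, 1]
```

The third object has no neighbours, so it emits nothing and shares its start position with the object after it; the packed output has no gaps.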
Exclusive Prefix Sum

Given an operator $\oplus$ and an identity element $I$, the exclusive prefix sum of $(a_0, a_1, \ldots, a_{n-1})$ is

$$(I,\; a_0,\; a_0 \oplus a_1,\; a_0 \oplus a_1 \oplus a_2,\; \ldots,\; a_0 \oplus \cdots \oplus a_{n-2}), \quad \text{where element } i \text{ is } \bigoplus_{j=0}^{i-1} a_j.$$

In other words, element i is the sum of all elements strictly before i.

Input:  4 3 7 9 2 3
Output: 0 4 7 14 23 25
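As a sequential reference for the exclusive prefix sum, a minimal Python sketch follows. The function name `exclusive_scan` and its parameters are hypothetical, chosen for illustration; the operator defaults to addition with identity 0, but any associative operator with its identity can be passed in.

```python
import operator


def exclusive_scan(values, op=operator.add, identity=0):
    """Element i of the result combines all inputs strictly before i."""
    result = []
    running = identity          # combination of everything seen so far
    for v in values:
        result.append(running)  # emit before including the current element
        running = op(running, v)
    return result


print(exclusive_scan([4, 3, 7, 9, 2, 3]))  # [0, 4, 7, 14, 23, 25]
```

The first output is always the identity element, and the last input never contributes to any output.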
Inclusive Prefix Sum

Given an operator $\oplus$ and an identity element $I$, the inclusive prefix sum of $(a_0, a_1, \ldots, a_{n-1})$ is

$$(a_0,\; a_0 \oplus a_1,\; a_0 \oplus a_1 \oplus a_2,\; \ldots,\; a_0 \oplus \cdots \oplus a_{n-1}), \quad \text{where element } i \text{ is } \bigoplus_{j=0}^{i} a_j.$$

In other words, element i is the sum of all elements before and including i.

Input:  4 3 7 9 2 3
Output: 4 7 14 23 25 28
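The inclusive prefix sum can be sketched sequentially with the standard library's `itertools.accumulate`. The wrapper name `inclusive_scan` is hypothetical. The last lines illustrate how, for addition, the exclusive result can be recovered from the inclusive one by shifting right and inserting the identity.

```python
from itertools import accumulate
import operator


def inclusive_scan(values, op=operator.add):
    """Element i of the result combines all inputs up to and including i."""
    return list(accumulate(values, op))


data = [4, 3, 7, 9, 2, 3]
inclusive = inclusive_scan(data)
print(inclusive)  # [4, 7, 14, 23, 25, 28]

# Shifting right by one and inserting the identity (0 for addition)
# gives the exclusive prefix sum of the same input.
exclusive = [0] + inclusive[:-1]
print(exclusive)  # [0, 4, 7, 14, 23, 25]
```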