parallel models



  1. Advanced Algorithms
     Piyush Kumar
     (Lecture 10: Parallel Algorithms)
     Courtesy Baker 05.

     Parallel Models
     • An abstract description of a real world parallel machine.
     • Attempts to capture essential features (and suppress details?)
     • What other models have we seen so far? RAM? External Memory Model?

     RAM
     • Random Access Machine Model
       – Memory is a sequence of bits/words.
       – Each memory access takes O(1) time.
       – Basic operations take O(1) time: Add/Mul/Xor/Sub/AND/not…
       – Instructions cannot be modified.
       – No consideration of memory hierarchies.
       – Has been very successful in modelling real world machines.

     Parallel RAM aka PRAM
     • Generalization of RAM
     • P processors with their own programs (and unique id)
     • MIMD processors: at each point in time the processors might be executing different instructions on different data.
     • Shared Memory
     • Instructions are synchronized among the processors

     PRAM
     • Shared Memory
     • EREW/ERCW/CREW/CRCW
     • EREW: no two processors are allowed to access the same memory location at the same time.

     Variants of CRCW
     • Common CRCW: CW iff processors write same value.
     • Arbitrary CRCW
     • Priority CRCW
     • Combining CRCW
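The four CRCW variants differ only in how simultaneous writes to one shared cell are resolved. The sketch below is one way to picture that. The function name `resolve_crcw_write` and the policy strings are hypothetical, not from the slides. For Priority CRCW it assumes the lowest processor id wins, which is a common convention but only an assumption here. For Combining CRCW it assumes the values are combined with `+` unless another operator is passed.

```python
from functools import reduce
import operator


def resolve_crcw_write(writes, policy, combine=operator.add):
    """Decide what ends up in ONE shared cell after concurrent writes.

    writes : list of (processor_id, value) pairs, all aimed at the same cell
    policy : "common", "arbitrary", "priority" or "combining" (hypothetical names)
    """
    if not writes:
        raise ValueError("no writes to resolve")
    values = [value for _, value in writes]

    if policy == "common":
        # Concurrent write is legal only if every processor writes the same value.
        if any(v != values[0] for v in values):
            raise ValueError("Common CRCW: processors wrote different values")
        return values[0]
    if policy == "arbitrary":
        # Any single writer may succeed; this sketch simply takes the first listed.
        return values[0]
    if policy == "priority":
        # Assumption: the processor with the smallest id has the highest priority.
        return min(writes, key=lambda w: w[0])[1]
    if policy == "combining":
        # All written values are combined with an associative operator.
        return reduce(combine, values)
    raise ValueError(f"unknown policy: {policy}")


if __name__ == "__main__":
    writes = [(2, 5), (0, 7), (1, 5)]
    for policy in ("arbitrary", "priority", "combining"):
        print(policy, resolve_crcw_write(writes, policy))
```

Under Common CRCW the same `writes` list would be rejected, because processors 0 and 2 try to store different values.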

  2. Why PRAM?
     • Lot of literature available on algorithms for PRAM.
     • One of the most “clean” models.
     • Focuses on what communication is needed (and ignores the cost/means to do it)

     PRAM Algorithm design.
     • Problem 1: Produce the sum of an array of n numbers.
     • RAM = ?
     • PRAM = ?

     Problem 2: Prefix Computation
     Let $X = \{s_0, s_1, \ldots, s_{n-1}\}$ be in a set S.
     Let $\otimes$ be a binary, associative, closed operator with respect to S
     (usually $\Theta(1)$ time – MIN, MAX, AND, +, ...)
     The result of $s_0 \otimes s_1 \otimes \cdots \otimes s_k$ is called the k-th prefix.
     Computing all such n prefixes is the parallel prefix computation.
       1st prefix       $s_0$
       2nd prefix       $s_0 \otimes s_1$
       3rd prefix       $s_0 \otimes s_1 \otimes s_2$
       ...              ...
       (n-1)th prefix   $s_0 \otimes s_1 \otimes \cdots \otimes s_{n-1}$

     Prefix computation
     • Suffix computation is a similar problem.
     • Assumes Binary op takes O(1)
     • In RAM = ?

     Prefix Computation (Akl)
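The slides leave "RAM = ?" and "PRAM = ?" as questions. As a minimal sketch of the usual answers: a single left-to-right scan gives all prefixes (and the sum) in O(n) operations on a RAM, while pairing elements up in a balanced tree gives the sum in about lg n synchronous rounds when enough processors are available. The names `ram_prefixes` and `pram_style_sum` are hypothetical, and the "parallel" rounds are only simulated sequentially here.

```python
import operator


def ram_prefixes(xs, op=operator.add):
    """Sequential (RAM) prefix computation: one pass, O(n) applications of op."""
    out = []
    for x in xs:
        out.append(x if not out else op(out[-1], x))
    return out


def pram_style_sum(xs, op=operator.add):
    """Tree reduction: each round combines neighbouring pairs.

    On a PRAM every pair in a round would be handled by its own processor,
    so the number of rounds (not the total work) models the parallel time.
    Returns (result, number_of_rounds).
    """
    vals = list(xs)
    if not vals:
        raise ValueError("need at least one element")
    rounds = 0
    while len(vals) > 1:
        # Conceptually simultaneous: pair i combines vals[2i] and vals[2i+1].
        nxt = [op(vals[i], vals[i + 1]) for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2:          # an unpaired last element is carried over
            nxt.append(vals[-1])
        vals = nxt
        rounds += 1
    return vals[0], rounds


if __name__ == "__main__":
    print(ram_prefixes([3, 1, 4, 1, 5]))        # running sums
    print(ram_prefixes([3, 1, 4, 1, 5], min))   # running minima
    print(pram_style_sum(range(1, 9)))          # sum of 1..8 and rounds used
```

Because the operator is only assumed to be associative, the reduction keeps the left-to-right order of the operands and never swaps them.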

  3. EREW PRAM Prefix computation
     • Assume PRAM has n processors and n is a power of 2.
     • Input: $s_i$ for $i = 0, 1, \ldots, n-1$.
     • Algorithm Steps:
         for $j = 0$ to $(\lg n) - 1$ do
           for $i = 2^j$ to $n-1$ do in parallel
             $h = i - 2^j$
             $s_i = s_h \otimes s_i$
           endfor
         endfor
     • Total time in EREW PRAM?

     Problem 3: Array packing
     • Assume that we have
       – an array of n elements, $X = \{x_1, x_2, \ldots, x_n\}$
       – Some array elements are marked (or distinguished).
     • The requirements of this problem are to
       – pack the marked elements in the front part of the array.
       – place the remaining elements in the back of the array.
     • While not a requirement, it is also desirable to
       – maintain the original order between the marked elements
       – maintain the original order between the unmarked elements

     In RAM?
     • How would you do this?
     • Inplace?
     • Running time?
     • Any ideas on how to do this in PRAM?

     EREW PRAM Algorithm
     1. Set $s_i$ in $P_i$ to 1 if $x_i$ is marked and set $s_i = 0$ otherwise.
     2. Perform a prefix sum on $S = (s_1, s_2, \ldots, s_n)$ to obtain destination $d_i = s_i$ for each marked $x_i$.
     3. All PEs set $m = s_n$, the total number of marked elements.
     4. $P_i$ sets $s_i$ to 0 if $x_i$ is marked and otherwise sets $s_i = 1$.
     5. Perform a prefix sum on S and set $d_i = s_i + m$ for each unmarked $x_i$.
     6. Each $P_i$ copies array element $x_i$ into address $d_i$ in X.

     PRAM Array Packing
     • Assume n processors are used above.
     • Optimal prefix sums requires O(lg n) time.
     • The EREW broadcast of $s_n$ needed in Step 3 takes O(lg n) time using a binary tree in memory.
     • All other steps require constant time.
     • Runs in O(lg n) time and is cost optimal.
     • Maintains original order in unmarked group as well.
     Notes:
     • Algorithm illustrates usefulness of Prefix Sums
     • There are many applications for the Array Packing algorithm

     Problem 4: MergeSort
     • RAM Merge Sort recursion?
     • PRAM Merge Sort recursion?
     • Can we speed up the merging?
       – Merging n elements with n processors can be done in O(log n) time.
       – Assume all elements are distinct
       – Rank(a, A) = number of elements in A smaller than a. For example rank(8, {1,3,5,7,9}) = 4
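The doubling prefix computation and the six array-packing steps on this page can be traced with a small sequential simulation. This is a sketch under stated assumptions, not the lecture's code. The names `prefix_doubling` and `pack` are hypothetical. Each "parallel" round is simulated by taking a snapshot before writing. Step 6 writes into a fresh list rather than back into X, so that no element is overwritten before it has been read.

```python
import operator


def prefix_doubling(s, op=operator.add):
    """Simulate the doubling prefix algorithm; returns (prefixes, rounds).

    In round j every position i >= 2**j combines itself with position i - 2**j.
    """
    s = list(s)
    n = len(s)
    dist = 1            # dist == 2**j
    rounds = 0
    while dist < n:
        old = s[:]      # snapshot: all processors read before anyone writes
        for i in range(dist, n):            # conceptually "do in parallel"
            s[i] = op(old[i - dist], old[i])
        dist *= 2
        rounds += 1
    return s, rounds


def pack(xs, marked):
    """Array packing via two prefix sums (steps 1 to 6), order-preserving."""
    n = len(xs)
    dest = [0] * n                                   # 1-based destinations d_i
    # Steps 1-2: flag marked elements, prefix-sum the flags.
    pre, _ = prefix_doubling([1 if mk else 0 for mk in marked])
    for i in range(n):
        if marked[i]:
            dest[i] = pre[i]
    # Step 3: m = s_n, the total number of marked elements.
    m = pre[-1] if n else 0
    # Steps 4-5: flag unmarked elements, prefix-sum again, shift by m.
    pre, _ = prefix_doubling([0 if mk else 1 for mk in marked])
    for i in range(n):
        if not marked[i]:
            dest[i] = pre[i] + m
    # Step 6: every element is copied to its destination address.
    out = [None] * n
    for i in range(n):
        out[dest[i] - 1] = xs[i]
    return out


if __name__ == "__main__":
    print(prefix_doubling([1, 2, 3, 4, 5, 6, 7, 8]))
    print(pack(list("abcdefgh"),
               [False, True, False, False, True, True, False, True]))
```

For an input of length 8 the simulation uses three rounds, matching the `(lg n)` iterations of the outer loop in the pseudocode.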

  4. PRAM Merging
     A = 2, 3, 10, 15, 16          B = 1, 8, 12, 14, 19
     (rank in the other array + position in own array = position in the merged output)
       Rank(2)  = 1   +1           Rank(1)  = 0   +1
       Rank(3)  = 1   +2           Rank(8)  = 2   +2
       Rank(10) = 2   +3           Rank(12) = 3   +3
       Rank(15) = 4   +4           Rank(14) = 3   +4
       Rank(16) = 4   +5           Rank(19) = 5   +5
     Merged: 1 2 3 8 10 12 14 15 16 19

     PRAM Merge Sort
     • T(n) = T(n/2) + O(log n)
     • Using the idea of pipelined divide and conquer, PRAM Mergesort can be done in O(log n) time.
     • D&C is one of the most powerful techniques to solve problems in parallel.

     Problem 5: Closest Pair
     • RAM Version?

     Closest Pair: RAM Version
     Closest-Pair($p_1, \ldots, p_n$) {
        Compute separation line L such that half the points
        are on one side and half on the other side.                  O(n log n)
        $\delta_1$ = Closest-Pair(left half)
        $\delta_2$ = Closest-Pair(right half)                        2T(n / 2)
        $\delta = \min(\delta_1, \delta_2)$
        Delete all points further than $\delta$ from separation line L    O(n)
        Sort remaining points by y-coordinate.                       O(n log n)
        Scan points in y-order and compare distance between
        each point and next 11 neighbors. If any of these
        distances is less than $\delta$, update $\delta$.            O(n)
        return $\delta$.
     }
     [Figure: point set split by separation line L, with $\delta = \min(12, 21)$]

     Closest Pair: PRAM Version?
     Closest-Pair($p_1, \ldots, p_n$) {
        Compute separation line L such that half the points
        are on one side and half on the other side.                  O(1): use sorted lists
        $\delta_1$ = Closest-Pair(left half)
        $\delta_2$ = Closest-Pair(right half)                        T(n / 2): in parallel
        $\delta = \min(\delta_1, \delta_2)$
        Delete all points further than $\delta$ from separation line L
        Sort remaining points by y-coordinate.                       O(log n): use presorting and prefix computation
        Scan points in y-order and compare distance between
        each point and next 11 neighbors.                            O(1)
        Find min of all these distances, update $\delta$.            O(log n): again use prefix computation
        return $\delta$.
     }
     Recurrence: T(n) = T(n/2) + O(log n)

     Other Interesting Algorithms
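The rank table in the PRAM Merging slide can be reproduced with a short sketch. Each element finds its rank in the other array by binary search, which is where the O(log n) per processor comes from, and adds its own position to get its final slot. The function names `rank` and `merge_by_rank` are hypothetical. The loops stand in for "one processor per element", and the code assumes, as the slides do, that all elements are distinct and both inputs are sorted.

```python
from bisect import bisect_left


def rank(a, A):
    """Number of elements in the sorted list A that are smaller than a."""
    return bisect_left(A, a)


def merge_by_rank(A, B):
    """Merge two sorted lists of distinct elements by computing final positions.

    Element A[i] has i smaller elements in A and rank(A[i], B) smaller
    elements in B, so it belongs at 0-based index i + rank(A[i], B).
    """
    out = [None] * (len(A) + len(B))
    for i, a in enumerate(A):        # conceptually one processor per element
        out[i + rank(a, B)] = a
    for j, b in enumerate(B):
        out[j + rank(b, A)] = b
    return out


if __name__ == "__main__":
    A = [2, 3, 10, 15, 16]
    B = [1, 8, 12, 14, 19]
    print([rank(a, B) for a in A])   # ranks of A's elements in B
    print(merge_by_rank(A, B))
```

Since every element writes to a different output slot, no two simulated processors ever target the same location.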

  5. Interesting Classes at FSU
     In case you liked this class:
       – Parallel Algorithms
       – Computational Geometry
       – Advanced Algorithms

     A List
     • Approximation Algorithms
     • Online Algorithms
     • Learning Algorithms
     • Network Algorithms
     • Advanced Data Structures.
     • Flow Algorithms.
     • Algorithmic Game Theory
     • Quantum Algorithms.
     • Geometric Algorithms

     Next Class
     • Practice Problem Solving for Finals.
     • Extra Office Hours:
       – Wednesday, I will be in office and accessible anytime for questions.
