Parallel Algorithms

Parallel Algorithms: Examples - PowerPoint PPT Presentation



  1. Parallel Algorithms
     - Examples
     - Concepts & Definitions
     - Analysis of Algorithms

  2. Lemma
     - Any complete binary tree with $n$ leaves has:
       - $n - 1$ internal nodes (i.e., $2n - 1$ nodes in total);
       - height $\log_2 n$.
     - Exercise: Prove it.

  3. Warming up
     - Consider the BTIN (Binary Tree Interconnected Network) computational model. Suppose the tree has $n$ leaves (and hence $2n - 1$ processors).
     - If we have $n$ numbers stored at the leaves, how can we obtain the sum?
     - How can we obtain the max or min?
     - How can we propagate a number stored at the root to all leaves?
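One way to approach the sum and max/min questions above is to let every internal node combine the values of its two children, one tree level per step. The following is a minimal sequential simulation of that idea, not the deck's own solution; the name `tree_reduce` is hypothetical, and it assumes the number of leaves is a power of two.

```python
def tree_reduce(leaves, op):
    """Combine leaf values level by level, as the internal nodes of a BTIN might.

    Assumes len(leaves) is a power of two. Returns (result, steps).
    """
    level = list(leaves)
    steps = 0
    while len(level) > 1:
        # All parents of one level combine their two children in the same step.
        level = [op(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        steps += 1
    return level[0], steps


values = [3, 1, 4, 1, 5, 9, 2, 6]
total, steps = tree_reduce(values, lambda a, b: a + b)
largest, _ = tree_reduce(values, max)
```

With 8 leaves the loop runs 3 times, which matches the tree height $\log_2 n$ from the lemma.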

  4. Warming up
     - Suppose we have $n - 1$ numbers stored at $n - 1$ arbitrary leaves. How can we move these numbers to the $n - 1$ internal nodes?
     - If the leftmost $n/2$ leaves have numbers, how can we move them to the rightmost leaves?
     - How many steps does each of the above computations require?

  5. Example 1.4: Grouping in a Shared-Memory PC
     - Given a sequence of pairs $\{(x_1, d_1), \ldots, (x_n, d_n)\}$ where $x_i \in \{0, 1, \ldots, m-1\}$, $m < n$, and $d_i$ is an arbitrary datum.
     - By the pigeonhole principle, several $x_i$ will be repeated because $m < n$. Write a parallel algorithm to group these pairs according to the $x_i$'s.

  6. Example 1.4: Grouping in a Shared-Memory Model
     - Sequential algorithm: at each step $i$, read the pair $(x_i, d_i)$ and insert it into the hash table entry for $x_i$.
     - Time = $n$ steps
     - Memory = $\Theta(n)$
     - [Figure: hash table with buckets 0, 1, ..., m-1]
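The sequential algorithm above can be sketched as follows. This is only an illustration: the function name `group_sequential` is hypothetical, and a plain list of `m` buckets stands in for the hash table mentioned on the slide.

```python
def group_sequential(pairs, m):
    """Group (x, d) pairs by x, where every x is in range(m).

    One pass over the input, so n steps for n pairs.
    """
    buckets = [[] for _ in range(m)]  # one bucket per possible value of x
    for x, d in pairs:
        buckets[x].append(d)
    return buckets


grouped = group_sequential([(0, "a"), (1, "b"), (0, "c")], m=2)
```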

  7. Example 1.4: Parallel Algorithm (Grouping in a Shared-Memory Model)
     - Shared memory with $n$ processors $P_1, P_2, \ldots, P_n$.
     - Memory = $m(2n - 1)$
     - Think of $m$ complete binary trees $T_0, T_1, \ldots, T_{m-1}$, each with $n$ leaves numbered $1, 2, \ldots, n$, which correspond to $P_1, \ldots, P_n$.
     - [Figure: buckets 0, 1, ..., m-1, each holding a tree $T_0, T_1, \ldots, T_{m-1}$ with leaves numbered 1 to n]

  8. Example 1.4: Parallel Algorithm (Grouping in a Shared-Memory Model)
     - Phase 1: Each processor $P_i$ reads the pair $(x_i, d_i)$ and inserts it in leaf $i$ of the tree $T_{x_i}$.
     - Phase 2: Each processor $P_i$ tries to move the pair $(x_i, d_i)$ higher up in its tree until it can go no higher, according to the shifting-up rule.
     - [Figure: buckets 0, 1, ..., m-1, each holding a tree $T_0, T_1, \ldots, T_{m-1}$ with leaves numbered 1 to n]

  9. Example 1.4: Parallel Algorithm (Grouping in a Shared-Memory Model)
     - Shifting-up rule: if node $u$ is free, then the pair in the right child (if any) takes precedence in moving to $u$ over the pair in the left child (if any).

  10. Example 1.4: Parallel Algorithm (Analysis)
     - In a shared-memory parallel computer, all processors run a common program and execute synchronously, so:
       - Phase 1 takes 1 step only.
       - Phase 2 takes only $\log_2 n$ steps, because each tree has height $\log_2 n$.
       - Total = $\log_2 n + 1$ steps.
     - The extra empty cells in the $m(2n - 1)$ memory can be released.
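The two phases and the shifting-up rule can be simulated sequentially as in the sketch below. Several points are assumptions rather than details from the slides: each tree is stored in heap order (root at index 0, leaves at indices `n-1` to `2n-2`), `n` is a power of two, and within one step the nodes are swept top-down so that a node vacated in that step can be refilled from below while no pair moves more than one level. The name `group_parallel` is hypothetical.

```python
def group_parallel(pairs, m):
    """Simulate the tree-based grouping of n pairs (x, d) with x in range(m)."""
    n = len(pairs)                 # assumed to be a power of two
    height = n.bit_length() - 1    # log2(n)
    # trees[t][u] is node u of tree T_t; None marks a free node.
    trees = [[None] * (2 * n - 1) for _ in range(m)]

    # Phase 1 (one parallel step): P_i writes its pair into leaf i of T_{x_i}.
    for i, (x, d) in enumerate(pairs):
        trees[x][n - 1 + i] = (x, d)

    # Phase 2 (log2(n) parallel steps): shift pairs up.
    for _ in range(height):
        for tree in trees:
            for u in range(n - 1):             # internal nodes, top-down
                if tree[u] is None:
                    left, right = 2 * u + 1, 2 * u + 2
                    # The right child takes precedence over the left child.
                    src = right if tree[right] is not None else left
                    tree[u], tree[src] = tree[src], None
    return trees


pairs = [(0, "a"), (1, "b"), (0, "c"), (1, "d")]
trees = group_parallel(pairs, m=2)
groups = [sorted(p[1] for p in tree if p is not None) for tree in trees]
```

After the run, each tree `T_x` holds exactly the pairs whose first component is `x`, packed toward its root; here the root of `T_0` ends up holding `(0, "c")`.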

  11. Example 1.5: Pipelining Database in a BTIN Model
     - Consider a BTIN with $n$ leaves (processors) containing $n$ distinct records, each of the form $(k, d)$, where $k$ is a key and $d$ is a datum.
     - Suppose that the root receives a query to retrieve the record whose key is $K$ (if it exists).
     - Write a parallel algorithm.

  12. Example 1.5: Pipelining Database in a BTIN Model
     - Sequential algorithm: sort the records according to the keys, then use the binary search algorithm.
     - Time = $\Theta(n \log n)$, dominated by the sort.

  13. Example 1.5: Parallel Algorithm (Pipelining Database in a BTIN Model)
     - The root sends the key $K$ to its children, which in turn send it to their children, and so on.
     - This continues until $K$ reaches the leaves, where it is compared with the keys stored there.
     - The leaf that contains the key sends the corresponding record up to the root through its parent and grandparents. The other leaves send a null message.
     - When a parent receives a record from one of its children, it sends the same record to its own parent; otherwise it sends null.
     - And so on, up to the root.

  14. Example 1.5: Analysis
     - This is called the pipeline technique.
     - All processors run the same program and work asynchronously, acting when they receive messages from parents or children.
     - Time = $2 \log_2 n$ steps to send the key down the tree and receive back the record.
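The query algorithm and its step count can be illustrated with a small level-by-level simulation. This is a sketch under assumptions not stated in the slides: the number of leaves is a power of two, the broadcast down is counted as $\log_2 n$ steps without being simulated message by message, and the name `btin_query` is hypothetical.

```python
def btin_query(records, key):
    """Look up `key` among (k, d) records stored at the leaves of a BTIN.

    Returns (record_or_None, steps).
    """
    n = len(records)               # assumed to be a power of two
    depth = n.bit_length() - 1     # log2(n) links between root and leaves

    # Downward pass: the key needs `depth` steps to reach the leaves.
    steps = depth
    # Each leaf compares the key with its own; non-matching leaves send null.
    level = [rec if rec[0] == key else None for rec in records]

    # Upward pass: each parent forwards whichever child sent a record.
    while len(level) > 1:
        level = [
            level[i] if level[i] is not None else level[i + 1]
            for i in range(0, len(level), 2)
        ]
        steps += 1
    return level[0], steps


records = [(5, "e"), (2, "b"), (9, "i"), (7, "g")]
found, steps = btin_query(records, 9)
missing, _ = btin_query(records, 4)
```

For 4 leaves the count is $2 \log_2 4 = 4$ steps, whether or not the key is present.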

  15. Example 1.5: Parallel Algorithm (Pipelining Database in a BTIN Model)
     - What if we make $m$ queries $K_1, K_2, \ldots, K_m$?
     - Solution: send them sequentially, one after another, so that successive queries follow each other through the tree.
     - Total time = $2 \log_2 n + m - 1$
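A worked instance may make the benefit of pipelining concrete. The values $n = 1024$ and $m = 100$ are hypothetical, chosen only for the arithmetic; the non-pipelined figure simply applies the single-query cost of $2 \log_2 n$ steps $m$ times.

```latex
\begin{align*}
T_{\text{pipelined}} &= 2\log_2 n + m - 1 = 2 \cdot 10 + 100 - 1 = 119 \text{ steps} \\
T_{\text{one at a time}} &= m \cdot 2\log_2 n = 100 \cdot 20 = 2000 \text{ steps}
\end{align*}
```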

  16. Example 1.6: Prefix (Partial) Sums
     - Given $n$ numbers $x_0, x_1, \ldots, x_{n-1}$, where $n$ is a power of 2.
     - Compute the partial sums $S_k = x_0 + x_1 + \cdots + x_k$ for all $k = 0, 1, \ldots, n-1$.

  17. Example 1.6: Prefix (Partial) Sums
     - Sequential algorithm: we need to make the unavoidable $n - 1$ additions.
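A minimal sketch of the sequential computation, counting the additions it performs; the function name `prefix_sums_sequential` is hypothetical and the input is assumed to be non-empty.

```python
def prefix_sums_sequential(xs):
    """Return (prefix sums of xs, number of additions performed)."""
    sums = [xs[0]]          # S_0 = x_0 needs no addition
    additions = 0
    for x in xs[1:]:
        sums.append(sums[-1] + x)   # S_k = S_{k-1} + x_k
        additions += 1
    return sums, additions


sums, additions = prefix_sums_sequential([1, 2, 3, 4])
```

Each new partial sum reuses the previous one, which is why $n - 1$ additions are enough.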

  18. Example 1.6: Parallel Algorithm (Prefix (Partial) Sums)
     - For $i = 0, 1, \ldots, n-1$, let initially $S_i = x_i$.
     - Then for $j = 0, \ldots, \log_2 n - 1$, let $S_i \leftarrow S_{i-2^j} + S_i$ for every $i \geq 2^j$, in parallel.
     - This can be done using the combinational circuit model with $n(\log_2 n + 1)$ processors distributed over $\log_2 n + 1$ columns and $n$ rows.
     - At each step we add the value at a distance equal to twice the distance used in the previous step.
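The doubling scheme above can be simulated one round (one circuit column) at a time. This is a sequential sketch, not a parallel implementation: the name `prefix_sums_parallel` is hypothetical, `n` is assumed to be a power of two, and each round reads only the previous round's values so that all updates of a round behave as if they happened simultaneously.

```python
def prefix_sums_parallel(xs):
    """Simulate the log2(n)-round doubling scheme for prefix sums."""
    n = len(xs)                    # assumed to be a power of two
    s = list(xs)                   # initially S_i = x_i
    rounds = n.bit_length() - 1    # log2(n)
    for j in range(rounds):
        dist = 2 ** j
        prev = s                   # values produced by the previous column
        # Every position i >= 2^j adds the value 2^j places to its left.
        s = [prev[i] + prev[i - dist] if i >= dist else prev[i]
             for i in range(n)]
    return s


result = prefix_sums_parallel([1, 2, 3, 4, 5, 6, 7, 8])
```

For 8 inputs the distances used are 1, 2 and 4, so three rounds of additions produce all eight partial sums.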

  19. Example 1.6: Parallel Algorithm (Analysis)
     - The number of processors in the model is $n(\log_2 n + 1)$.
     - The number of columns is $\log_2 n + 1$.
     - Each processor does at most one step (addition).
     - The processors in any fixed column work in parallel.
     - Time = $\log_2 n + 1$ steps, one per column.

  20. Summary
     - At the cost of increasing the computation power (the number of processors & memory), we may be able to decrease the computation time drastically.
     - !!Expensive computations!!
     - Is it worth it?
