+ Design of Parallel Algorithms: Parallel Sorting Algorithms
+ Topic Overview
- Issues in Sorting on Parallel Computers
- Sorting Networks
- Bubble Sort and its Variants
- Quicksort
- Bucket and Sample Sort
- Other Sorting Algorithms
+ Sorting: Overview
- One of the most commonly used and well-studied kernels.
- Sorting can be comparison-based or noncomparison-based.
- The fundamental operation of comparison-based sorting is compare-exchange.
- The lower bound on any comparison-based sort of n numbers is Θ(n log n).
- We focus here on comparison-based sorting algorithms.
+ Sorting: Basics
What is a parallel sorted sequence? Where are the input and output lists stored?
- We assume that the input and output lists are distributed.
- The sorted list is partitioned with the property that each partitioned list is sorted, and each element in processor P_i's list is less than every element in P_j's list if i < j.
+ Sorting: Parallel Compare-Exchange Operation
A parallel compare-exchange operation. Processes P_i and P_j send their elements to each other. Process P_i keeps min{a_i, a_j}, and P_j keeps max{a_i, a_j}.
+ Sorting: Basics
What is the parallel counterpart to a sequential comparator?
- If each processor has one element, the compare-exchange operation stores the smaller element at the processor with the smaller id. This can be done in t_s + t_w time.
- If we have more than one element per processor, we call this operation a compare-split. Assume each of the two processors has n/p elements.
- After the compare-split operation, the smaller n/p elements are at processor P_i and the larger n/p elements at P_j, where i < j.
- The time for a compare-split operation is Θ(t_s + t_w·n/p), assuming that the two partial lists were initially sorted.
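The compare-split operation can be sketched as a sequential simulation (the function name `compare_split` is ours), assuming both input blocks are already sorted:

```python
def compare_split(a, b):
    """Simulate a compare-split between two processes.

    a and b are the sorted n/p-element blocks held by P_i and P_j (i < j).
    Each process conceptually sends its block to the other; here we merge
    in linear time and return (smaller half for P_i, larger half for P_j).
    """
    merged, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            merged.append(a[i]); i += 1
        else:
            merged.append(b[j]); j += 1
    merged += a[i:] + b[j:]            # append whichever block has leftovers
    return merged[:len(a)], merged[len(a):]
```

The linear-time merge mirrors the stated cost: each process touches each of the 2n/p elements at most once after the t_s + t_w·n/p exchange.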
+ Sorting: Parallel Compare-Split Operation
A compare-split operation. Each process sends its block of size n/p to the other process. Each process merges the received block with its own block and retains only the appropriate half of the merged block. In this example, process P_i retains the smaller elements and process P_j retains the larger elements.
+ Sorting Networks
- Networks of comparators designed specifically for sorting.
- A comparator is a device with two inputs x and y and two outputs x' and y'. For an increasing comparator, x' = min{x, y} and y' = max{x, y}; a decreasing comparator does the reverse.
- The speed of the network is proportional to its depth.
+ Sorting Networks: Comparators A schematic representation of comparators: (a) an increasing comparator, and (b) a decreasing comparator.
+ Sorting Networks A typical sorting network. Every sorting network is made up of a series of columns, and each column contains a number of comparators connected in parallel.
+ Sorting Networks: Bitonic Sort
- A bitonic sorting network sorts n elements in Θ(log² n) time.
- A bitonic sequence has two tones: increasing then decreasing, or vice versa. Any cyclic rotation of such a sequence is also considered bitonic.
- 〈1,2,4,7,6,0〉 is a bitonic sequence, because it first increases and then decreases. 〈8,9,2,1,0,4〉 is another bitonic sequence, because it is a cyclic shift of 〈0,4,8,9,2,1〉.
- The kernel of the network is the rearrangement of a bitonic sequence into a sorted sequence.
+ Sorting Networks: Bitonic Sort
- Let s = 〈a_0, a_1, …, a_{n-1}〉 be a bitonic sequence such that a_0 ≤ a_1 ≤ ··· ≤ a_{n/2-1} and a_{n/2} ≥ a_{n/2+1} ≥ ··· ≥ a_{n-1}.
- Consider the following subsequences of s:
  s1 = 〈min{a_0, a_{n/2}}, min{a_1, a_{n/2+1}}, …, min{a_{n/2-1}, a_{n-1}}〉
  s2 = 〈max{a_0, a_{n/2}}, max{a_1, a_{n/2+1}}, …, max{a_{n/2-1}, a_{n-1}}〉
- Note that s1 and s2 are both bitonic and each element of s1 is less than every element in s2.
- We can apply the procedure recursively on s1 and s2 to get the sorted sequence.
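The min/max split above translates directly into a recursive merge. A minimal sketch (the function name is ours), assuming the sequence length is a power of two:

```python
def bitonic_merge(s, ascending=True):
    """Sort a bitonic sequence s (len(s) a power of two) via bitonic splits."""
    n = len(s)
    if n == 1:
        return s
    half = n // 2
    # One bitonic split: s1 takes the pairwise minima, s2 the maxima.
    s1 = [min(s[k], s[k + half]) for k in range(half)]
    s2 = [max(s[k], s[k + half]) for k in range(half)]
    if not ascending:                  # a decreasing comparator swaps roles
        s1, s2 = s2, s1
    # Both halves are again bitonic; recurse on each and concatenate.
    return bitonic_merge(s1, ascending) + bitonic_merge(s2, ascending)
```

Each level halves the sequence, giving the log n bitonic splits used in the network.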
+ Sorting Networks: Bitonic Sort
Merging a 16-element bitonic sequence through a series of log 16 = 4 bitonic splits.
+ Sorting Networks: Bitonic Sort
- We can easily build a sorting network to implement this bitonic merge algorithm.
- Such a network is called a bitonic merging network.
- The network contains log n columns. Each column contains n/2 comparators and performs one step of the bitonic merge.
- We denote a bitonic merging network with n inputs by ⊕BM[n].
- Replacing the ⊕ comparators by ⊖ comparators results in a decreasing output sequence; such a network is denoted by ⊖BM[n].
+ Sorting Networks: Bitonic Sort
A bitonic merging network for n = 16. The input wires are numbered 0, 1, …, n-1, and the binary representation of these numbers is shown. Each column of comparators is drawn separately; the entire figure represents a ⊕BM[16] bitonic merging network. The network takes a bitonic sequence and outputs it in sorted order.
+ Sorting Networks: Bitonic Sort
How do we sort an unsorted sequence using a bitonic merge?
- We must first build a single bitonic sequence from the given sequence.
- Any sequence of length 2 is a bitonic sequence.
- A bitonic sequence of length 4 can be built by sorting the first two elements using ⊕BM[2] and the next two using ⊖BM[2].
- This process can be repeated recursively to generate larger bitonic sequences.
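Putting the two phases together: sort each half in opposite directions to obtain a bitonic sequence, then merge it. A self-contained sketch (function names are ours; the length must be a power of two):

```python
def bitonic_merge(s, ascending=True):
    # Sort a bitonic sequence by repeated bitonic splits.
    if len(s) == 1:
        return s
    half = len(s) // 2
    s1 = [min(s[k], s[k + half]) for k in range(half)]
    s2 = [max(s[k], s[k + half]) for k in range(half)]
    if not ascending:
        s1, s2 = s2, s1
    return bitonic_merge(s1, ascending) + bitonic_merge(s2, ascending)

def bitonic_sort(s, ascending=True):
    # Build a bitonic sequence: sort the first half ascending (⊕BM) and
    # the second half descending (⊖BM); a final merge sorts the whole.
    if len(s) <= 1:
        return s
    half = len(s) // 2
    return bitonic_merge(bitonic_sort(s[:half], True) +
                         bitonic_sort(s[half:], False), ascending)
```

The recursion matches the schematic: smaller ⊕BM/⊖BM networks feed one full-width merging network at the end.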
+ Sorting Networks: Bitonic Sort A schematic representation of a network that converts an input sequence into a bitonic sequence. In this example, ⊕ BM[ k ] and Ө BM[ k ] denote bitonic merging networks of input size k that use ⊕ and Ө comparators, respectively. The last merging network ( ⊕ BM[ 16 ]) sorts the input. In this example, n = 16 .
+ Sorting Networks: Bitonic Sort The comparator network that transforms an input sequence of 16 unordered numbers into a bitonic sequence.
+ Sorting Networks: Bitonic Sort
- The depth of the network is Θ(log² n).
- Each stage of the network contains n/2 comparators. A serial implementation of the network would have complexity Θ(n log² n).
+ Mapping Bitonic Sort to Hypercubes
- Consider the case of one item per processor. The question becomes one of how the wires in the bitonic network should be mapped to the hypercube interconnect.
- Note from our earlier examples that the compare-exchange operation is performed between two wires only if their labels differ in exactly one bit!
- This implies a direct mapping of wires to processors. All communication is nearest-neighbor!
+ Mapping Bitonic Sort to Hypercubes Communication during the last stage of bitonic sort. Each wire is mapped to a hypercube process; each connection represents a compare-exchange between processes.
+ Mapping Bitonic Sort to Hypercubes Communication characteristics of bitonic sort on a hypercube. During each stage of the algorithm, processes communicate along the dimensions shown.
+ Mapping Bitonic Sort to Hypercubes
Parallel formulation of bitonic sort on a hypercube with n = 2^d processes.
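The hypercube formulation can be illustrated with a small sequential simulation (one element per "process"; all names are ours). In stage i, processes communicate along dimensions i-1 down to 0, and the partner along dimension j is simply rank XOR 2^j:

```python
def hypercube_bitonic_sort(elems):
    """Simulate bitonic sort on p = 2^d hypercube processes, one element each."""
    a = list(elems)
    p = len(a)
    d = p.bit_length() - 1               # assumes p is a power of two
    for i in range(1, d + 1):            # stage i builds sorted runs of 2^i
        for j in range(i - 1, -1, -1):   # communicate along dimension j
            for rank in range(p):
                partner = rank ^ (1 << j)   # labels differ in exactly bit j
                if partner > rank:
                    # bit i of the rank selects the sort direction
                    ascending = ((rank >> i) & 1) == 0
                    if (a[rank] > a[partner]) == ascending:
                        a[rank], a[partner] = a[partner], a[rank]
    return a
```

Every compare-exchange involves ranks differing in one bit, so in a real hypercube each step is a single nearest-neighbor message.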
+ Mapping Bitonic Sort to Hypercubes
- During each step of the algorithm, every process performs a compare-exchange operation (a single nearest-neighbor communication of one word).
- Since each step takes Θ(1) time, the parallel time is T_P = Θ(log² n).
- This algorithm is cost optimal w.r.t. its serial counterpart, but not w.r.t. the best sorting algorithm.
+ Mapping Bitonic Sort to Meshes
- The connectivity of a mesh is lower than that of a hypercube, so we must expect some overhead in this mapping.
- Consider the row-major shuffled mapping of wires to processors.
+ Mapping Bitonic Sort to Meshes
Different ways of mapping the input wires of the bitonic sorting network to a mesh of processes: (a) row-major mapping, (b) row-major snakelike mapping, and (c) row-major shuffled mapping.
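One way to realize the row-major shuffled mapping is to de-interleave the bits of the wire label (a sketch under that convention; the function name is ours):

```python
def shuffled_position(wire, d):
    """Map a d-bit wire label to (row, col) on a 2^(d/2) x 2^(d/2) mesh
    by de-interleaving its bits (one convention for the shuffled mapping)."""
    row = col = 0
    for j in range(d):
        bit = (wire >> j) & 1
        if j % 2 == 0:
            col |= bit << (j // 2)     # even bit positions -> column index
        else:
            row |= bit << (j // 2)     # odd bit positions -> row index
    return row, col
```

With this placement, two wires whose labels differ in bit j (0-indexed) land 2^⌊j/2⌋ links apart, which is the distance property exploited in the mesh analysis.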
+ Mapping Bitonic Sort to Meshes The last stage of the bitonic sort algorithm for n = 16 on a mesh, using the row-major shuffled mapping. During each step, process pairs compare-exchange their elements. Arrows indicate the pairs of processes that perform compare-exchange operations.
+ Mapping Bitonic Sort to Meshes
- In the row-major shuffled mapping, wires that differ at the i th least-significant bit are mapped onto mesh processes that are 2^⌊(i-1)/2⌋ communication links apart.
- The total amount of communication performed by each process is
  Σ_{i=1}^{log n} Σ_{j=1}^{i} 2^⌊(j-1)/2⌋ ≈ 7√n = Θ(√n).
- The total computation performed by each process is Θ(log² n).
- The parallel runtime is T_P = Θ(log² n) [comparisons] + Θ(√n) [communication].
- This is not cost optimal w.r.t. the bitonic sort algorithm!
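The double sum can be checked numerically; a small script (ours) shows the per-process communication approaching 7√n:

```python
def total_communication(d):
    """Sum of link distances 2^floor((j-1)/2) over stages i = 1..d and
    steps j = 1..i of bitonic sort with the shuffled mapping (n = 2^d)."""
    return sum(2 ** ((j - 1) // 2)
               for i in range(1, d + 1)
               for j in range(1, i + 1))
```

For n = 2^20, the total (7121) is within 1% of 7√n = 7·1024; the lower-order terms vanish as n grows.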
+ Block of Elements Per Processor
- The parallel bitonic sort algorithm is not cost optimal with respect to the fastest serial algorithm. To find a cost-optimal algorithm, modify the algorithm to support n/p elements per processor as follows:
- Each process is assigned a block of n/p elements.
- The first step is a local sort of the block.
- Each subsequent compare-exchange operation is replaced by a compare-split operation.
- We can effectively view the bitonic network as having (1 + log p)(log p)/2 steps.
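The steps above can be sketched end-to-end as a sequential simulation (all names are ours): it is the hypercube schedule with compare-exchange replaced by compare-split.

```python
def compare_split(a, b):
    # Merge two sorted blocks; return (smaller half, larger half).
    merged, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            merged.append(a[i]); i += 1
        else:
            merged.append(b[j]); j += 1
    merged += a[i:] + b[j:]
    return merged[:len(a)], merged[len(a):]

def block_bitonic_sort(blocks):
    """Sort p = 2^d blocks of n/p elements each: local sort, then the
    (1 + log p)(log p)/2 compare-split steps of the bitonic schedule."""
    blocks = [sorted(b) for b in blocks]        # step 1: local sort
    p = len(blocks)
    d = p.bit_length() - 1
    for i in range(1, d + 1):
        for j in range(i - 1, -1, -1):
            for rank in range(p):
                partner = rank ^ (1 << j)
                if partner > rank:
                    lo, hi = compare_split(blocks[rank], blocks[partner])
                    if ((rank >> i) & 1) == 0:   # ascending region
                        blocks[rank], blocks[partner] = lo, hi
                    else:                        # descending region
                        blocks[rank], blocks[partner] = hi, lo
    return blocks
```

Because compare-split preserves sortedness of each block, the network's correctness carries over unchanged from the one-element-per-process case.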