A Parallel Implementation of Quicksort and its Performance Evaluation
Philippas Tsigas, Yi Zhang
Department of Computing Science, Chalmers University of Technology
(c) Ph. Tsigas, Y. Zhang
The aim of our work
• Sorting is an important kernel.
• Parallel implementations of sorting
  - Designed for message-passing machines
  - Sample sort
• New developments in computer architecture open new research opportunities:
  - Cache-coherent shared memory
  - Tightly coupled multiprocessors
Quicksort
• Advantages
  - General purpose
  - In-place
  - Good cache behavior
  - Simple
• Disadvantages
  - Existing parallel implementations do not scale up.
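The "in-place" and "simple" advantages above can be illustrated with a minimal sequential quicksort in Python. This is a generic textbook version (Hoare partitioning around the middle element), chosen here as an illustrative assumption. The function names are hypothetical and this is not the authors' code. Elements are exchanged inside the list itself, so apart from the recursion stack no extra storage is needed.

```python
from typing import Optional


def partition(a: list, lo: int, hi: int) -> int:
    """Hoare partition of a[lo..hi] (inclusive) around the middle element.

    Returns j with lo <= j < hi such that every element of a[lo..j] is
    <= every element of a[j+1..hi]. Works in place with O(1) extra space.
    """
    pivot = a[(lo + hi) // 2]
    i, j = lo - 1, hi + 1
    while True:
        i += 1
        while a[i] < pivot:      # find an element that belongs on the right
            i += 1
        j -= 1
        while a[j] > pivot:      # find an element that belongs on the left
            j -= 1
        if i >= j:
            return j
        a[i], a[j] = a[j], a[i]  # exchange the two misplaced elements


def quicksort(a: list, lo: int = 0, hi: Optional[int] = None) -> None:
    """Sort a[lo..hi] (inclusive) in place."""
    if hi is None:
        hi = len(a) - 1
    while lo < hi:
        p = partition(a, lo, hi)
        # Recurse on the smaller part and loop on the larger one, which
        # keeps the recursion depth at O(log n).
        if p - lo < hi - p:
            quicksort(a, lo, p)
            lo = p + 1
        else:
            quicksort(a, p + 1, hi)
            hi = p
```

For example, `xs = [3, 1, 2]; quicksort(xs)` leaves `xs == [1, 2, 3]`.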
Our Approach: 3+1 Phases
• Parallel partition of the data
  - Block-based partition
  - Cache efficient
• Sequential partition of the data
  - At most P+1 blocks (P: number of processors)
• Process partition
• Sequential sorting with helping
  - Load balancing
  - Non-blocking synchronization
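The block-based partition idea can be illustrated with a small Python sketch. It is a sequential, single-"processor" simplification under stated assumptions, not the authors' implementation. Fixed-size blocks are claimed from the left and right ends of the range. `neutralize` swaps elements between one left and one right block until at least one of them lies entirely on the correct side of the pivot. Whatever remains in the middle is then partitioned sequentially. In a parallel version, several processors would claim blocks concurrently (for example with an atomic fetch-and-add on the block counters), which is why a bounded number of unfinished blocks is left over for the sequential phase. The block size `B`, the pivot argument, and all function names here are hypothetical.

```python
LEFT, RIGHT, BOTH = "left", "right", "both"


def neutralize(a, l, r, B, pivot):
    """Swap elements between the left block a[l:l+B] and the right block
    a[r:r+B] until at least one block is neutralized: a neutralized left
    block holds only values <= pivot, a neutralized right block only
    values >= pivot. Returns LEFT, RIGHT or BOTH."""
    i = j = 0
    while True:
        while i < B and a[l + i] <= pivot:   # already on the correct side
            i += 1
        while j < B and a[r + j] >= pivot:   # already on the correct side
            j += 1
        if i == B or j == B:
            break
        # Here a[l+i] > pivot and a[r+j] < pivot: exchange them.
        a[l + i], a[r + j] = a[r + j], a[l + i]
        i += 1
        j += 1
    if i == B and j == B:
        return BOTH
    return LEFT if i == B else RIGHT


def block_partition(a, lo, hi, pivot, B):
    """Partition a[lo:hi] in place; return s with a[lo:s] <= pivot <= a[s:hi]."""
    n_blocks = (hi - lo) // B   # whole blocks that fit in the range
    ln = rn = 0                 # neutralized blocks on the left / right
    if n_blocks >= 2:
        left_blk, right_blk = lo, hi - B
        taken = 2               # blocks claimed so far from either end
        while True:
            side = neutralize(a, left_blk, right_blk, B, pivot)
            if side != RIGHT:   # left block is finished
                ln += 1
                left_blk = None
            if side != LEFT:    # right block is finished
                rn += 1
                right_blk = None
            # Claim fresh blocks for the finished ones while any remain.
            # A parallel version would use a shared atomic counter here.
            if left_blk is None and taken < n_blocks:
                left_blk, taken = lo + ln * B, taken + 1
            if right_blk is None and taken < n_blocks:
                right_blk, taken = hi - (rn + 1) * B, taken + 1
            if left_blk is None or right_blk is None:
                break
    # Sequential phase: the region between the neutralized prefix and
    # suffix (at most one unfinished block plus the remainder here) is
    # partitioned with a plain two-pointer scan.
    i, j = lo + ln * B, hi - rn * B - 1
    while True:
        while i <= j and a[i] <= pivot:
            i += 1
        while i <= j and a[j] >= pivot:
            j -= 1
        if i >= j:
            return i
        a[i], a[j] = a[j], a[i]
```

For example, `xs = [5, 3, 9, 1]` is too short for any block with `B = 8`, so `block_partition(xs, 0, 4, 4, 8)` goes straight to the sequential scan, returns 2 and leaves `xs == [1, 3, 9, 5]`. The two resulting subranges are what a process-partition phase would then hand to groups of processors.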
The advantages of our approach
• General purpose
• In-place
• Good cache behavior
• Fine-grained parallelism
• Good speedup in theory
Experimental Results (8M Integers)
[Figure: speedup of PQuick vs. PSRS on 1P, 2P, 4P, 8P, 16P and 32P for the input sets [U], [G], [Z], [B] and [S] of 8M integers each]

Experimental Results (32M Integers)
[Figure: speedup of PQuick vs. PSRS on 1P, 2P, 4P, 8P, 16P and 32P for the input sets [U], [G], [Z], [B] and [S] of 32M integers each]

Experimental Results (64M Integers)
[Figure: speedup of PQuick vs. PSRS on 1P, 2P, 4P, 8P, 16P and 32P for the input sets [U], [G], [Z], [B] and [S] of 64M integers each]

Experimental Results (128M Integers)
[Figure: speedup of PQuick vs. PSRS on 1P, 2P, 4P, 8P, 16P and 32P for the input sets [U], [G], [Z], [B] and [S] of 128M integers each]
Conclusions
• Quicksort can outperform sample sort on cache-coherent shared memory multiprocessors.
• Fine-grained parallelism that incorporates non-blocking synchronization can be efficient.
• Cache-coherent shared memory multiprocessors offer many new research opportunities.