
A Parallel Implementation of Quicksort and its Performance Evaluation



  1. A Parallel Implementation of Quicksort and its Performance Evaluation
     Philippas Tsigas, Yi Zhang
     Department of Computing Science, Chalmers University of Technology
     (c) Ph. Tsigas, Y. Zhang

  2. The aim of our work
     - Sorting is an important kernel
     - Parallel implementations of sorting
       - Based on message-passing machines
       - Sample sort
     - New developments in computer architecture bring us new research opportunities
       - Cache-coherent shared memory
       - Tightly-coupled multiprocessor

  3. Quicksort
     - Advantages
       - General purpose
       - In-place
       - Good cache behavior
       - Simple
     - Disadvantages
       - Parallel implementations do not scale up.

  4. Our Approach: 3+1 Phases
     - Parallel Partition of the Data
       - Block-based partition
       - Cache efficient
     - Sequential Partition of the Data
       - At most P+1 blocks (P: number of processors)
     - Process Partition
     - Sequential Sorting with Helping
       - Load balancing
       - Non-blocking synchronization
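The first two phases on this slide can be illustrated with a small single-threaded simulation. This is a sketch under stated assumptions, not the authors' code: the names `neutralize` and `block_partition`, the round-robin scheduling of simulated workers, and the default block size are all hypothetical, and the sequential clean-up here is simplified to one two-pointer pass over the whole unfinished range rather than a dedicated treatment of the at most P+1 leftover blocks. A real implementation would run the workers as threads and claim blocks with atomic operations.

```python
import random


def neutralize(a, li, lend, ri, rend, pivot):
    """Swap misplaced items between left block a[li:lend] and right block a[ri:rend].

    On return at least one of the two blocks is fully scanned: every scanned
    item on the left is < pivot and every scanned item on the right is >= pivot.
    """
    while li < lend and ri < rend:
        if a[li] < pivot:
            li += 1
        elif a[ri] >= pivot:
            ri += 1
        else:
            a[li], a[ri] = a[ri], a[li]
            li += 1
            ri += 1
    return li, ri


def block_partition(a, pivot, block_size=4, workers=3):
    """Partition a in place so that a[:k] < pivot <= a[k:], and return k."""
    next_left, next_right = 0, len(a)  # unclaimed middle is a[next_left:next_right]
    # Per-worker state [li, lend, ri, rend]; li == lend means "needs a left block".
    state = [[0, 0, 0, 0] for _ in range(workers)]
    active = list(range(workers))

    # Phase 1: parallel on the slide, simulated here by taking turns.
    while active:
        for w in list(active):
            s = state[w]
            if s[0] == s[1]:  # claim a fresh block from the left end
                if next_right - next_left < block_size:
                    active.remove(w)
                    continue
                s[0], s[1] = next_left, next_left + block_size
                next_left += block_size
            if s[2] == s[3]:  # claim a fresh block from the right end
                if next_right - next_left < block_size:
                    active.remove(w)
                    continue
                s[2], s[3] = next_right - block_size, next_right
                next_right -= block_size
            s[0], s[2] = neutralize(a, s[0], s[1], s[2], s[3], pivot)

    # Phase 2 (simplified): each worker holds at most one unfinished block, and
    # the unclaimed middle is shorter than one block. Everything left of lo is
    # already < pivot and everything from hi onward is already >= pivot.
    lo = min([next_left] + [s[0] for s in state if s[0] < s[1]])
    hi = max([next_right] + [s[3] for s in state if s[2] < s[3]])
    i, j = lo, hi - 1
    while i <= j:
        if a[i] < pivot:
            i += 1
        elif a[j] >= pivot:
            j -= 1
        else:
            a[i], a[j] = a[j], a[i]
            i += 1
            j -= 1
    return i


# Usage: partition 40 random integers around the pivot value 50.
data = [random.randrange(100) for _ in range(40)]
k = block_partition(data, pivot=50)
assert all(x < 50 for x in data[:k]) and all(x >= 50 for x in data[k:])
```

After this step the two sides `a[:k]` and `a[k:]` are independent, which is what allows the processors to be divided between them in the Process Partition phase.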

  5. The advantages of our approach
     - General purpose
     - In-place
     - Good cache behavior
     - Fine-grain parallelism
     - Good speedup in theory

  6. Experimental Results (8M Integers)
     [Figure: speedup (y-axis, 0 to 16) of PQuick and PSRS on 1, 2, 4, 8, 16 and 32 processors for the input sets [U]-8M, [G]-8M, [Z]-8M, [B]-8M and [S]-8M.]

  7. Experimental Results (32M Integers)
     [Figure: speedup (y-axis, 0 to 30) of PQuick and PSRS on 1, 2, 4, 8, 16 and 32 processors for the input sets [U]-32M, [G]-32M, [Z]-32M, [B]-32M and [S]-32M.]

  8. Experimental Results (64M Integers)
     [Figure: speedup (y-axis, 0 to 30) of PQuick and PSRS on 1, 2, 4, 8, 16 and 32 processors for the input sets [U]-64M, [G]-64M, [Z]-64M, [B]-64M and [S]-64M.]

  9. Experimental Results (128M Integers)
     [Figure: speedup (y-axis, 0 to 30) of PQuick and PSRS on 1, 2, 4, 8, 16 and 32 processors for the input sets [U]-128M, [G]-128M, [Z]-128M, [B]-128M and [S]-128M.]

  10. Conclusions
      - Quicksort can beat Sample Sort on cache-coherent shared memory multiprocessors.
      - Fine-grain parallelism that incorporates non-blocking synchronization can be efficient.
      - Cache-coherent shared memory multiprocessors offer many new research opportunities.
