the sliding window algorithm

The Sliding Window Algorithm - PowerPoint PPT Presentation



  1. The Sliding Window Algorithm ● The “Sliding Window” algorithm sums several small sub-matrices of a matrix of values. ● The maximum of these sums is then located and its coordinates are passed out of the program. ● This is used as a calorimetry trigger – used for locating events with high-energy jets. – (The following 6 slides have been copied from Matthew's presentation)
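As a rough illustration of the serial procedure described on this slide, the sketch below sums every window-sized sub-matrix and then scans the sums for the largest one. The function names, the square window anchored at its top-left corner, and the small test grid are assumptions made for the example; the slides do not show the original code, and border handling is ignored here.

```python
def sliding_window_sums(matrix, window):
    """Return the sum of every window x window sub-matrix of `matrix`."""
    rows, cols = len(matrix), len(matrix[0])
    out_rows, out_cols = rows - window + 1, cols - window + 1
    sums = [[0] * out_cols for _ in range(out_rows)]
    for r in range(out_rows):
        for c in range(out_cols):
            total = 0
            # Add up the window whose top-left corner is (r, c).
            for dr in range(window):
                for dc in range(window):
                    total += matrix[r + dr][c + dc]
            sums[r][c] = total
    return sums


def locate_max(sums):
    """Return (value, row, col) of the largest sum; the first one wins ties."""
    best = (sums[0][0], 0, 0)
    for r, row in enumerate(sums):
        for c, value in enumerate(row):
            if value > best[0]:
                best = (value, r, c)
    return best


# Hypothetical 4x4 grid of energy deposits, 2x2 window.
grid = [
    [1, 2, 0, 1],
    [0, 5, 6, 0],
    [1, 7, 8, 2],
    [0, 1, 1, 0],
]
sums = sliding_window_sums(grid, 2)
peak = locate_max(sums)  # (26, 1, 1): the window starting at row 1, column 1
```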

  2. Sliding window: serial

  3. Sliding window: serial

  4. Sliding window: serial

  5. Sliding window: serial

  6. Sliding window: serial etc...

  7. Sliding window: parallel (5x5 border around each submatrix – use your imagination)

  8. Two Approaches to the Sliding Window ● CPU Algorithm (Standard) ● GPU Algorithm ● Hybrid Algorithm – Use the GPU to perform the sliding window sum – Transfer the resulting matrix of sums to CPU – Use the CPU to locate the maximum
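The hybrid split can be sketched in plain Python as two stages. This is only a structural illustration under stated assumptions: `gpu_stage` stands in for a kernel launch in which each output element would be computed by its own thread, the device-to-host copy is represented by a comment, and all function names are hypothetical.

```python
def window_sum_at(matrix, r, c, window):
    """Work of one hypothetical GPU thread: sum the window with top-left (r, c)."""
    return sum(matrix[r + dr][c + dc]
               for dr in range(window)
               for dc in range(window))


def gpu_stage(matrix, window):
    """Stand-in for the GPU step: every output element is independent."""
    out_rows = len(matrix) - window + 1
    out_cols = len(matrix[0]) - window + 1
    return [[window_sum_at(matrix, r, c, window) for c in range(out_cols)]
            for r in range(out_rows)]


def cpu_stage(sums):
    """CPU step: one linear scan for the maximum and its coordinates."""
    flat = [(value, r, c)
            for r, row in enumerate(sums)
            for c, value in enumerate(row)]
    return max(flat, key=lambda item: item[0])


def hybrid(matrix, window):
    sums = gpu_stage(matrix, window)
    # On real hardware the matrix of sums would be copied to the host here.
    return cpu_stage(sums)


result = hybrid([[1, 2, 3], [4, 5, 6], [7, 8, 9]], 2)  # (28, 1, 1)
```

Keeping the two stages as separate functions mirrors the design choice on the slide: the sum stage is the part that parallelizes well, while the maximum search is a single pass over a comparatively small result.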

  9. Motivation : Time Complexity ● The time complexity of the algorithm to locate a maximum on the CPU is linear, O(N). ● The time complexity of the best algorithm to do the same on the GPU is O(N log N). – Even if the GPU cores and CPU cores had the same processing speed, the GPU would need more calculations to perform the same task.
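The slides do not show which GPU maximum search was used, so the following is only a generic sketch. It compares a serial scan with a pairwise (tree) reduction simulated in Python. The `lane_slots` count assumes that all N lanes stay scheduled for every halving step, which is one way a cost of order N log N can arise; whether that matches the author's implementation is an assumption.

```python
def serial_max(values):
    """CPU-style scan: returns (maximum, comparisons), with N - 1 comparisons."""
    best = values[0]
    comparisons = 0
    for v in values[1:]:
        comparisons += 1
        if v > best:
            best = v
    return best, comparisons


def lockstep_tree_max(values):
    """Simulated pairwise reduction: returns (maximum, steps, lane_slots)."""
    data = list(values)
    n = len(data)
    steps = 0
    stride = 1
    while stride < n:
        # In each step, element i absorbs the element `stride` places away.
        for i in range(0, n - stride, 2 * stride):
            data[i] = max(data[i], data[i + stride])
        stride *= 2
        steps += 1
    # Assumption: all n lanes are scheduled in every step, busy or not.
    return data[0], steps, n * steps


sample = [3, 9, 2, 7, 5, 1, 8, 4]
cpu = serial_max(sample)         # (9, 7)
gpu = lockstep_tree_max(sample)  # (9, 3, 24)
```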

  10. Small Problem : Find Max Note that at ATLAS-scale problem sizes (~5,000) this algorithm performs MUCH worse than the CPU version. Matthew wrote a new algorithm that works better at ATLAS scale but not as well at extreme values. The speed-up is not fully realized for small window sizes because the GPU finishes each calculation nearly as fast as new calculation commands are issued.

  11. Small Problem : Sliding Window Speed-Up For those concerned : The sudden drops on this plot are caused by my testing procedure, which uses rectangular grids of varying dimensions. Threads are issued 1 warp (32 threads) at a time, and I declared each block to be a constant 256 threads. Because of this, there are problem sizes for which a large number of threads are inactive.
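The effect of a fixed block size can be worked through with a little arithmetic. The sketch below assumes a one-dimensional launch with one thread per output element, which the slide does not state; the warp and block sizes are taken from the slide, and the function name is hypothetical.

```python
import math

WARP_SIZE = 32    # threads per warp, from the slide
BLOCK_SIZE = 256  # threads per block, from the slide


def launch_stats(n_items, block_size=BLOCK_SIZE):
    """Return (blocks, launched, inactive, idle_warps) for n_items > 0."""
    blocks = math.ceil(n_items / block_size)
    launched = blocks * block_size
    inactive = launched - n_items
    # Warps in which no thread has any work to do.
    idle_warps = inactive // WARP_SIZE
    return blocks, launched, inactive, idle_warps


exact_fit = launch_stats(256)  # (1, 256, 0, 0)
one_over = launch_stats(257)   # (2, 512, 255, 7)
```

Going from 256 to 257 items doubles the number of launched threads while adding a single unit of work, which is the kind of step that could produce sudden drops in a speed-up plot.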

  12. Large Problem : Find Max Even at extremely large sizes, the speed-up offered by the GPU algorithm pales compared to the speed-up of the sliding window.

  13. Large Problem : Sliding Window Speed-Up Here the algorithm has plateaued. The speed-up for any algorithm is limited by the number of GPU cores which can run simultaneously.

  14. Motivation : Processing Speed vs Copy Speed ● The GPU cores are individually much slower than a CPU core. ● The copy speed from the GPU to CPU is very fast – and the result that needs to be copied is relatively small. – It may be worth the time to copy the matrix of sums to the CPU, as the CPU can locate the maximum much faster.
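A back-of-envelope estimate shows why the copy is cheap. The sketch below uses the ~15.75 GB/s host bandwidth quoted on the backup slide and assumes, purely for illustration, a result of 5,000 values stored as 4-byte floats. It models bandwidth only; real transfers also pay a fixed per-transfer latency that is not included here.

```python
def copy_time_seconds(n_values, bytes_per_value=4,
                      bandwidth_bytes_per_s=15.75e9):
    """Bandwidth-only estimate of a device-to-host copy (latency ignored)."""
    return n_values * bytes_per_value / bandwidth_bytes_per_s


# Assumed ATLAS-scale result: 5,000 sums of 4 bytes each (20,000 bytes).
estimate = copy_time_seconds(5000)  # about 1.27e-06 seconds
```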

  15. Small Problem Fraction Plots

  16. Large Problem Fraction Plots

  17. Conclusion ● At ATLAS scale, the speed-up grows fastest and is greatest for the Hybrid algorithm (see ratio plot) ● Beyond the ATLAS scale (a factor of ~10 greater) the purely GPU algorithm becomes better.

  18. Small Problem : Ratio At small problem sizes (current ATLAS size is around 5,000) the Hybrid algorithm provides a greater speed-up.

  19. Large Problem : Ratio This shows that at extremely large problem sizes the purely GPU based algorithm provides a greater speed-up than the hybrid.

  20. Small Problem Speed-Up

  21. Large Problem Speed-Up

  22. Backup Slide : CUDA Card Specs ● 8 SM (streaming multiprocessors) with 192 cores each (1,536 cores) @ ~1000 MHz each ● ~15.75 GB/s bandwidth to host
