

FPGA vs GPU Performance Comparison on the Implementation of FIR Filters

Abstract

FIR filters are used in digital signal processing applications that require stopping one frequency band while passing another, or removing noise. Because of the complex structure and inherent parallelism of FIR filters, dedicated reconfigurable hardware is preferred over CPUs for their implementation. Recently, GPGPU has emerged as an effective technique for solving computation-intensive problems with a massive level of parallelism. In this paper, we implement FIR filters with different tap sizes on different FPGA and GPU models, using both OpenCL and CUDA. We evaluate the filters' performance using two different kernels on the GPU and compare the results with various FPGA implementations, taking an OpenMP implementation that utilizes all available cores of a single CPU as the baseline. In general, the FPGAs outperformed the GPUs in terms of output samples produced per second. However, GPUs are the practical choice when very high-order filters are needed, for which FPGAs lack sufficient logic units.

Keywords

FIR Filter, GPGPU, FPGA, heterogeneous computing

1. Introduction

FIR (finite impulse response) filters are the most common digital filters in signal processing applications because of their linear phase response and guaranteed stability. They are usually used to stop one frequency band while passing another, or to remove noise from an information-carrying signal. FIR filters are used in applications ranging from radar, satellite, and military systems to numerous industrial systems; in fact, whenever an application involves signals, processing them is inevitable, and filtering is the most common operation.

FIR filters are inherently parallel structures, so with extra resources they can be implemented in a parallel fashion to reduce the operation time. For high-order FIR filters, FPGAs have been the common solution for achieving a massive level of parallelism. However, programming FPGAs is not as easy as programming microcontrollers or digital signal processors (DSPs). Recently, GPGPU has emerged as an efficient technique for solving computation-intensive problems with a massive level of parallelism while remaining easy to program. OpenCL and CUDA are the two most common frameworks for programming GPUs for general-purpose applications [13] [16]. OpenMP is a parallel programming platform for CPUs and can also be used to parallelize FIR filter applications on CPU platforms [19]. However, because CPUs have a small number of cores compared to GPUs, even with OpenMP the performance results are usually not comparable with those of GPUs and FPGAs.

In this work, we take the FIR filtering application and implement it on different platforms, namely CPU, GPU, and FPGA. To make the comparison fairer, we choose several models for each platform, since performance differs steeply between different models and architectures of CPUs, GPUs, and FPGAs. Previous work usually takes a single model from each platform and compares the platforms on the results of that one model alone, which may mislead researchers. Therefore, 3 different FPGAs, 5 GPUs, and 4 different CPU models are selected for comparison. Section two reviews previous work in which FPGAs and GPUs are compared for performance. Section three gives a summary of the GPGPU programming architecture for OpenCL and CUDA. The details of the FIR filter implementations on the different platforms are given in section four. We show comprehensive performance results and discuss them in section five. Finally, section six concludes the discussion.

2. Related Work

Llamocca et al. compare the energy, performance, and accuracy of implementations of a 2D difference of Gaussians (DOG) filter for real-time digital video processing applications on FPGA and GPU. The article concludes that for 2D filtering applications GPUs are better for performance and precision, but FPGAs have the advantage of lower power dissipation [1]. Pauwels et al. compared FPGA and GPU performance on the computation of phase-based optical flow, stereo, and local image features. According to their work, GPUs outperform FPGAs, mainly because of the GPU's higher memory bandwidth and clock speed [2]. Kalarot and Morris compare FPGA and GPU for the implementation of real-time stereo vision applications. Although prior work states that FPGAs outperform GPUs [3], they conclude that GPUs are as effective as FPGAs when the graphics processors are utilized efficiently with CUDA [4].

Zhang et al. use sparse matrix-vector multiplication (SpMV) to compare the performance of FPGA and GPU. The GPU greatly outperforms the FPGA when memory transactions are taken into account; however, when the FPGA memory performance is scaled to GPU rates, the FPGA exceeds the performance of the GPU [5]. In the digital video processing field, dynamic partial reconfiguration allows designers to control resources based on energy, performance, and accuracy considerations. Several FPGA implementations have applied dynamic partial reconfiguration to digital video processing [6] [7]. Recently, image and video processing applications programmed with OpenCL and CUDA have achieved practical performance, as reported in [8] and [9]. Che et al. compare FPGA, GPU, and CPU for three different applications: Gaussian elimination, the Data Encryption Standard (DES), and the Needleman-Wunsch algorithm. They conclude that application characteristics are important in choosing the platform to accelerate a specific application [10]. In their work Howes et
