Using GPU VSIPL & CUDA to Accelerate RF Clutter Simulation
2010 High Performance Embedded Computing Workshop, 23 September 2010
Dan Campbell, Mark McCans, Mike Davis, Mike Brinkmann
dan.campbell@gtri.gatech.edu
ECRB - HPC - 1 GTRI_B-1
Outline
• RF Clutter Simulation
• Validation Approach
• GPU VSIPL
• Precision Issues
• VSIPL Port, Optimization, and Results
Radar Clutter
Radar will observe echo from object… as well as a strong return from the ground.
Strong returns from the ground, called “clutter”, often limit the performance of radars in air-to-air and air-to-ground operations.
Synthetic Air-to-Air Clutter
[Figure: range-Doppler clutter maps, Range Bin vs. Doppler (Hz). Panel labels: 7,500 Hz; 10,000 Hz; 12,500 Hz; MPRF; Look-Down MPRF; RG-HPRF; HPRF.]
Targets at same range/Doppler as clutter will be obscured.
RF Clutter Simulation
Approach: Subdivide the ground into a number of unresolvable clutter patches and compute the contribution of each.
RF Clutter Simulation
[Figure: radar signal with annotations "Delayed Signal" and "Phase Shift"]
Radar clutter data is the sum of delayed and phase shifted versions of the radar waveform.
RF Clutter Simulation: Notional Parameters
                     | Air-to-Air                     | SAR Imaging                         | Our Test (Air-to-Ground)
# of Range Bins      | 200                            | 1750                                | 500
# of Pulses          | 128                            | 3000                                | 8
# of Clutter Patches | 6,800 Rng x 96 Az = 6.5 x 10^5 | 14,500 Rng x 26,812 Az = 3.8 x 10^8 | 566 Rng x 52 Az = 29,432
Computational load depends on radar parameters and collection geometry (e.g., high resolution scenarios require a large number of independent clutter patches).
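The patch counts in the table can be checked with a few lines of arithmetic. The sketch below also multiplies patches by pulses as a rough proxy for work, on the assumption (not stated on the slide) that every patch is visited once per pulse. The dictionary keys and function names are illustrative only.

```python
# Notional parameters copied from the table above.
SCENARIOS = {
    "Air-to-Air": {"range_rings": 6_800, "azimuth_cells": 96, "pulses": 128},
    "SAR Imaging": {"range_rings": 14_500, "azimuth_cells": 26_812, "pulses": 3_000},
    "Our Test (Air-to-Ground)": {"range_rings": 566, "azimuth_cells": 52, "pulses": 8},
}

def patch_count(s):
    """Number of clutter patches: range rings times azimuth cells."""
    return s["range_rings"] * s["azimuth_cells"]

def patch_pulse_pairs(s):
    """Assumed work proxy: one contribution per patch per pulse."""
    return patch_count(s) * s["pulses"]

for name, s in SCENARIOS.items():
    print(f"{name}: {patch_count(s):,} patches, {patch_pulse_pairs(s):,} patch-pulse pairs")
```

The SAR product comes out to 388,774,000 patches, which the slide quotes as 3.8 x 10^8.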
RF Clutter Simulation: Algorithm
Inputs
• Radar Parameters (waveform, antenna, etc.)
• Location of platform for each pulse
Output
• Simulated radar data cube (sample voltage for each pulse, each channel, and each range bin)
For each pulse and for each range bin…
For each clutter patch in this range ring…
1. Compute range, azimuth, and elevation from platform to clutter patch.
2. Scale contribution of this clutter patch according to the radar range equation.
3. Accumulate the contribution of this clutter patch to the simulated data cube.
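The three steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the presenters' MATLAB or VSIPL code. It uses a single channel, ignores antenna gain (so azimuth and elevation are not used), loops over patches directly instead of over range rings, and assumes the receive window starts at time zero. The function name and all parameters are hypothetical.

```python
import cmath
import math

C = 299_792_458.0  # speed of light, m/s

def simulate_clutter(platform_positions, patches, waveform, fs, fc, n_bins):
    """Return cube[pulse][range_bin] of complex voltages (single channel).

    platform_positions: one (x, y, z) tuple per pulse, in metres
    patches: list of ((x, y, z), rcs) pairs; rcs is an assumed patch cross-section
    waveform: complex baseband samples of the transmitted pulse
    fs, fc: sample rate and carrier frequency in Hz
    """
    cube = [[0j] * n_bins for _ in platform_positions]
    for p_idx, pos in enumerate(platform_positions):
        for patch_pos, rcs in patches:
            r = math.dist(pos, patch_pos)               # step 1: range to patch
            tau = 2.0 * r / C                           # two-way delay
            amp = math.sqrt(rcs) / r**2                 # step 2: power ~ rcs / R^4
            phase = cmath.exp(-2j * math.pi * fc * tau) # carrier phase shift
            start = round(tau * fs)                     # first range bin hit
            for n, w in enumerate(waveform):            # step 3: accumulate
                b = start + n
                if 0 <= b < n_bins:
                    cube[p_idx][b] += amp * phase * w
    return cube

# One pulse, one patch 1500 m away, two-sample waveform.
cube = simulate_clutter([(0.0, 0.0, 0.0)], [((1500.0, 0.0, 0.0), 1.0)],
                        [1.0, 0.5], fs=1e6, fc=10e9, n_bins=16)
```

Each patch contributes a delayed, phase shifted, scaled copy of the waveform, so the inner loop is the same operation repeated many times on small vectors.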
Validation Needs
• Porting MATLAB → C introduces changes
  • Random Number Generator
  • Double → Single
  • Implementation of some functions, e.g. transcendentals
  • Reordering of operations
  • Programmer Error
• Identical output too costly
• Derive acceptance criteria from expected usage needs
Validation Approach
• Modify sim to capture RNG stream from MATLAB
• Automate large number of runs for golden data
• Accelerated port optionally ingests RNG stream
• Capture port output and compare to golden data
• Acceptance Criteria:
  • CNR ∆ = (CNR_M - CNR_T) / CNR_M < 10^-4
  • ECR = 20 log10( norm(M(:) - T(:)) / norm(M(:)) ) < -60 dB
  • ADMSE = Mean( | fft2(M(:)) - fft2(T(:)) |^2 ) < 10^-3
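The acceptance criteria can be written as small functions. The sketch below assumes M is the golden MATLAB output and T the output under test, both flattened to equal-length complex sequences, and it takes the two CNR values as given because the slide does not say how CNR is computed. The absolute value in `cnr_delta` is an added assumption. In MATLAB, `fft2` applied to a column vector such as `M(:)` reduces to a 1-D FFT, so a naive 1-D DFT stands in for it here. All function names are illustrative.

```python
import cmath
import math

def cnr_delta(cnr_m, cnr_t):
    """Relative CNR difference; abs() is an assumption, the slide omits it."""
    return abs(cnr_m - cnr_t) / abs(cnr_m)

def ecr_db(m, t):
    """20*log10(norm(M - T) / norm(M)) for flattened sequences."""
    num = math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(m, t)))
    den = math.sqrt(sum(abs(a) ** 2 for a in m))
    return -math.inf if num == 0 else 20.0 * math.log10(num / den)

def dft(x):
    """Naive O(N^2) unnormalized DFT, adequate for a small illustration."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * j * k / n) for k in range(n))
            for j in range(n)]

def admse(m, t):
    """Mean squared magnitude of the difference of the two spectra."""
    fm, ft = dft(m), dft(t)
    return sum(abs(a - b) ** 2 for a, b in zip(fm, ft)) / len(m)

def accept(cnr_m, cnr_t, m, t):
    """Apply the three thresholds quoted on the slide."""
    return (cnr_delta(cnr_m, cnr_t) < 1e-4
            and ecr_db(m, t) < -60.0
            and admse(m, t) < 1e-3)

golden = [1 + 0j, 2 + 0j, 3 + 0j, 4 + 0j]
ported = [1 + 1e-6 + 0j, 2 + 0j, 3 + 0j, 4 + 0j]
print(ecr_db(golden, ported), admse(golden, ported))
```

ECR is a relative error, while ADMSE as written is absolute, so the ADMSE threshold only has meaning for a fixed data scaling.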
GPU VSIPL
VSIPL (http://www.vsipl.org)
• Industry standard C API for portable dense linear algebra & signal processing
• Also C++, Python
• Accelerated implementations for many platforms, primarily embedded, coprocessor-based systems
GPU VSIPL (http://gpu-vsipl.gtri.gatech.edu)
• VSIPL implementation that exploits Graphics Processing Units to accelerate VSIPL applications, developed at GTRI
Original Validation Results
VSIPL versions compared to MATLAB version
               | VSIPL Double | VSIPL Single | Threshold
CNR Consistent | Yes          | Yes          |
CNR ∆          | 10^-16       | 10^-6        | < 10^-4
ECR            | -152 dB      | 2.9 dB       | < -60 dB
ADMSE          | 10^-12       | 10^4         | < 10^-3
Single Precision
Single precision errors are caused by high dynamic range in the platform to clutter patch range calculation:
d(Platform → clutter) >>> d(clutter patch → clutter patch)
Solution: use far-field approximation technique
• Double precision used to compute a base range
• Single precision for sets of ∆R values
• Small number of double precision calculations has negligible effect on performance
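The effect can be reproduced without a GPU by rounding every intermediate value to IEEE-754 single precision. The sketch below compares a range computed entirely in single precision against a double precision base range plus a single precision ∆R from a linear far-field term. The geometry (patch about 51 km away, 10 km altitude, 300 m/s, 20 kHz PRF, 128 pulses) is an assumed example, and the helper names are hypothetical.

```python
import math
import struct

def f32(x):
    """Round a Python float (double) to the nearest single precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def range_f64(p, c):
    """Reference range in double precision."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, c)))

def range_f32(p, c):
    """Same formula with every intermediate rounded to single precision."""
    acc = 0.0
    for a, b in zip(p, c):
        d = f32(f32(a) - f32(b))
        acc = f32(acc + f32(d * d))
    return f32(math.sqrt(acc))

def delta_range_f32(p, p0, u):
    """Linear far-field term -u.(p - p0), evaluated in single precision."""
    acc = 0.0
    for a, a0, ui in zip(p, p0, u):
        d = f32(f32(a) - f32(a0))
        acc = f32(acc + f32(f32(ui) * d))
    return -acc

PRF, SPEED, N_PULSES = 20e3, 300.0, 128
patch = (50e3, 2e3, 0.0)
p0 = (0.0, 0.0, 10e3)                               # platform at CPI center
r0 = range_f64(p0, patch)                           # base range, double, once per patch
u = tuple((c - a) / r0 for a, c in zip(p0, patch))  # unit vector toward the patch

err_single = err_hybrid = 0.0
for k in range(N_PULSES):
    p = (SPEED * (k - (N_PULSES - 1) / 2) / PRF, 0.0, 10e3)
    exact = range_f64(p, patch)
    err_single = max(err_single, abs(range_f32(p, patch) - exact))
    err_hybrid = max(err_hybrid, abs(r0 + delta_range_f32(p, p0, u) - exact))

print(f"all-single max error: {err_single:.3e} m")
print(f"double base + single delta max error: {err_hybrid:.3e} m")
```

At ranges of tens of kilometres, adjacent single precision values are several millimetres apart, which is a large fraction of an X band wavelength. The ∆R values are on the order of a metre, so single precision resolves them far more finely.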
Far Field Approx. via Taylor Expansion
Range between platform at x and clutter patch at y
[Equations not preserved; surviving annotations:]
• Linear approximation near x_0
• Unit vector from CPI center to clutter patch
• Distance from center of scene
• Distance travelled in direction orthogonal to “lines” of constant range
• Quadratic Term
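For reference, the standard second-order Taylor expansion of the range about the CPI-center position is sketched below. The symbols r_0, u-hat, and d are introduced here for illustration and may not match the notation on the original slide.

```latex
\begin{aligned}
R(\mathbf{x}) &= \lVert \mathbf{y} - \mathbf{x} \rVert, \qquad
r_0 = \lVert \mathbf{y} - \mathbf{x}_0 \rVert, \qquad
\hat{\mathbf{u}} = \frac{\mathbf{y} - \mathbf{x}_0}{r_0}, \qquad
\mathbf{d} = \mathbf{x} - \mathbf{x}_0 \\
R(\mathbf{x}) &= r_0 \sqrt{1 - \frac{2\,\hat{\mathbf{u}}\cdot\mathbf{d}}{r_0}
                 + \frac{\lVert \mathbf{d} \rVert^2}{r_0^2}} \\
&\approx r_0 - \hat{\mathbf{u}}\cdot\mathbf{d}
   + \frac{\lVert \mathbf{d} \rVert^2 - (\hat{\mathbf{u}}\cdot\mathbf{d})^2}{2\,r_0}
\end{aligned}
```

The first two terms form the linear approximation. The last term is the quadratic term, and it is bounded by ‖d‖² / (2 r_0).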
Bounding Error
Approximation Error
Case 1: Air-to-Air
• 128 pulses, 20 kHz PRF, 300 m/s velocity
• 10 km Altitude
• error < 50 µm, < 0.06° phase at X band
Case 2: SAR
• 10 second dwell, 100 m/s velocity
• 10 km Altitude
• error < 12.5 m >> λ at X band!!!
Linear approximation to range may be appropriate for typical air-to-air scenarios.
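The two range-error figures are consistent with bounding the neglected quadratic term by d² / (2 r), where d is half the distance flown during the dwell and r is at least the altitude. That reading is an assumption: the slide does not show how its numbers were derived. The function name is hypothetical.

```python
def quadratic_range_error_bound(dwell_s, speed_mps, min_range_m):
    """Upper bound on the neglected quadratic term, d^2 / (2 r).

    Assumes the expansion point is the center of the dwell, so the largest
    displacement d is half the distance flown.
    """
    half_path = 0.5 * speed_mps * dwell_s
    return half_path ** 2 / (2.0 * min_range_m)

# Case 1: air-to-air, 128 pulses at 20 kHz PRF, 300 m/s, 10 km altitude.
case1 = quadratic_range_error_bound(128 / 20e3, 300.0, 10e3)
# Case 2: SAR, 10 s dwell, 100 m/s, 10 km altitude.
case2 = quadratic_range_error_bound(10.0, 100.0, 10e3)

WAVELENGTH_X_BAND = 0.03  # metres, roughly 10 GHz
print(f"case 1: {case1 * 1e6:.1f} um ({case1 / WAVELENGTH_X_BAND:.1e} wavelengths)")
print(f"case 2: {case2:.1f} m ({case2 / WAVELENGTH_X_BAND:.0f} wavelengths)")
```

Under these assumptions, case 1 stays below 50 µm and case 2 comes out at 12.5 m, hundreds of wavelengths at X band.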
Validation Results
Comparison to original MATLAB version
• Approximation technique used in each version listed
               | MATLAB Single | VSIPL Double | VSIPL Single | Threshold
CNR Consistent | Yes           | Yes          | Yes          |
CNR ∆          | 10^-7         | 10^-14       | 10^-5        | < 10^-4
ECR            | -101 dB       | -130 dB      | -98 dB       | < -60 dB
ADMSE          | 10^-7         | 10^-10       | 10^-6        | < 10^-3
VSIPL Port
• MATLAB to VSIPL port made easier due to VSIPL functions that emulate MATLAB operations
• Original MATLAB code very complex, particularly for radar novice
• First pass of the port was done with almost no attempts at optimization
• GPU transition required some additional changes
  • Single vs Double precision issues
  • Time cost of operations differs between TASP and GPU
  • VSIPL needs “sample” function
Optimization Issues
• MATLAB code written for readability over speed
  • Too many nested loops, operations involving small datasets
  • Many redundant calculations
• Original code was very flexible, due to large user base
  • Most optimizations required removing some generality
  • Assumptions need to be made about the scenario
• Abstraction barrier issues
  • Small operations less costly on CPU than GPU
  • Operation fusion, coarser operations, and leaving small things in C each helped
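Operation fusion can be illustrated in plain Python, even though the benefit in the port came from GPU library calls. The sketch assumes each library-style call makes one full pass over the data and writes a temporary, and shows three such passes collapsed into one. The function names are illustrative and are not VSIPL calls.

```python
def unfused(a, b, c):
    """Three separate elementwise passes, each producing a full temporary."""
    prod = [x * y for x, y in zip(a, b)]       # pass 1: multiply
    total = [p + z for p, z in zip(prod, c)]   # pass 2: add
    return [abs(t) for t in total]             # pass 3: magnitude

def fused(a, b, c):
    """One pass computing |a*b + c| per element, with no temporaries."""
    return [abs(x * y + z) for x, y, z in zip(a, b, c)]

a = [1 + 2j, 3 - 1j, 0.5j]
b = [2 - 1j, 1 + 1j, 4 + 0j]
c = [0.5 + 0j, -1j, 1 + 1j]
print(fused(a, b, c))
```

On a GPU, each separate pass would typically cost a kernel launch plus a round trip through device memory, so fusing matters most when the vectors are small.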
HPC Port: Performance
Optimization progression of single precision VSIPL:
[Figure: run time in seconds (axis 0 s to 180 s); series: Matlab, VSIPL, GPU VSIPL]