A Fully-Automated High Performance Geolocation Improvement Workflow for Problematic Imaging Systems
Devin White 1, Sophie Voisin 1, Christopher Davis 1, Andrew Hardin 1, Jeremy Archuleta 2, David Eberius 3
1 Scalable and High Performance Geocomputation Team, Geographic Information Science and Technology Group, Oak Ridge National Laboratory
2 Data Architectures Team, Computational Data Analytics Group, Oak Ridge National Laboratory
3 Innovative Computing Laboratory, Department of Electrical Engineering and Computer Science, University of Tennessee – Knoxville
GTC 2016 – April 5, 2016
Outline
– Project background
– System overview
– Scientific foundation
– Technological solution
– Current system performance
Background
– Overhead imaging systems (spaceborne and airborne) can vary substantially in their geopositioning accuracy
– The sponsor wanted an automated, near real time geocoordinate correction capability at ground processing nodes, upstream of their entire user community
– The extensible automated solution uses well-established photogrammetric, computer vision, and high performance computing techniques to reduce risk and uncertainty
– Robust multi-year advanced R&D portfolio aimed at continually improving the system through science, engineering, software, and hardware innovation
– We are moving towards on-board processing
[Platform illustrations: satellites, manned aircraft, unmanned aerial systems]
Isn’t This a Solved Problem?
Systemic constraints
– Space
– Power
– Quality/reliability of components
– Subject matter expertise
– Time
– Budget
– Politics
Operational constraints
– Collection conditions
– Sensor and platform health
– Existing software quality and performance
– System independence
Many of these issues are greatly amplified on UAS platforms
Sponsor Requirements
Solution must:
– Be completely automated
– Be government-owned and based on open source/GOTS code
– Be sensor agnostic by leveraging the Community Sensor Model framework
– Be standards-based (NITF, OGC, etc.) to enable interoperability
– Clearly communicate the quantified level of uncertainty using standard methods
– Be multithreaded and hardware accelerated
– Construct RPC and RSM replacement sensor models as well as generate SENSRB/GLAS and BLOCKA tagged record extensions (TREs)
– Improve geolocation accuracy to within a specific value
– Complete a run within a specific amount of time
The first sensor supported is one of the sponsor’s most important, but also its most problematic
Technical Approach (General)
1. Ingest and preprocessing
2. Trusted source selection
3. Global localization (coarse alignment, in ground space)
4. Image registration to generate GCPs (fine alignment, in image space)
5. Sensor model resection and uncertainty propagation
6. Generation and export of new and improved metadata
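A minimal host-side sketch of how these six stages might chain together; all type and function names below are hypothetical stand-ins, since the slides do not show PRIMUS's actual interfaces.

```cuda
// Hypothetical driver for the six-stage flow; empty stubs stand in for the
// real PRIMUS stages, purely to show the data handed from stage to stage.
#include <string>

struct NitfImage {};   // source image plus metadata (sensor model, TREs)
struct ControlSet {};  // trusted imagery and elevation covering the footprint
struct GcpSet {};      // ground control points produced by fine registration

NitfImage  ingestAndPreprocess(const std::string&)                { return {}; } // 1
ControlSet selectTrustedSources(const NitfImage&)                 { return {}; } // 2
void       globalLocalization(NitfImage&, const ControlSet&)      {}             // 3: coarse, ground space
GcpSet     registerToControl(const NitfImage&, const ControlSet&) { return {}; } // 4: fine, image space
void       resectAndPropagate(NitfImage&, const GcpSet&)          {}             // 5: resection + uncertainty
void       exportMetadata(const NitfImage&, const std::string&)   {}             // 6: new metadata, RPC/RSM, TREs

int main()
{
    NitfImage  img = ingestAndPreprocess("input.ntf");
    ControlSet ctl = selectTrustedSources(img);
    globalLocalization(img, ctl);
    GcpSet gcps = registerToControl(img, ctl);
    resectAndPropagate(img, gcps);
    exportMetadata(img, "output.ntf");
    return 0;
}
```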
PRIMUS Pipeline – Photogrammetric Registration of Imagery from Manned and Unmanned Systems
[Pipeline diagram] Input NITF → Preprocessing → Source Selection (controlled sources from R2D2) → Orthorectification / Reprojection / Mosaicking → Global Localization → Registration → Resection → Metadata → Output NITF; stages are marked as CPU or GPU implementations
Core libraries: NITRO (Glycerin), GDAL, Proj.4, libpq (Postgres), OpenCV, CUDA, OpenMP, CSM, MSP
Source Selection
Find and assemble trusted control imagery and elevation data that cover the spatial extent of an image.
[Diagram: input image → Source Selection → control imagery and elevation]
Mosaic Generation
Start → Create bounding box → Grow bounding box (150%) → Query R2D2’s DB (returns image paths) → Read images from disk → Create (elevation + geoid) mosaic → Mosaic imagery
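A small sketch of the bounding-box growth step from the flow above, assuming a simple axis-aligned geographic box and that the 150% factor scales the box about its center; the struct, function names, and example coordinates are illustrative, not PRIMUS code.

```cuda
// Grow an image footprint before querying R2D2 for control/elevation tiles,
// so the mosaic covers the footprint plus any pointing error.
#include <cstdio>

struct BoundingBox {
    double minX, minY, maxX, maxY;
};

// Scale a box about its center; factor = 1.5 mirrors the 150% step in the flow.
BoundingBox grow(const BoundingBox& b, double factor = 1.5)
{
    double cx = 0.5 * (b.minX + b.maxX);
    double cy = 0.5 * (b.minY + b.maxY);
    double hw = 0.5 * (b.maxX - b.minX) * factor;
    double hh = 0.5 * (b.maxY - b.minY) * factor;
    return { cx - hw, cy - hh, cx + hw, cy + hh };
}

int main()
{
    BoundingBox footprint = { -84.40, 35.90, -84.20, 36.05 };   // example footprint (degrees)
    BoundingBox query = grow(footprint);
    // The grown box would be sent to R2D2's database (e.g. via libpq) to get
    // the paths of control images and elevation tiles that intersect it.
    std::printf("query box: %.3f %.3f %.3f %.3f\n",
                query.minX, query.minY, query.maxX, query.maxY);
    return 0;
}
```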
System Hardware
CPU/GPU hybrid architecture
– 12 Dell C4130 HPC nodes
– Each node has: 48 logical processors, 256 GB of RAM, dual high speed SSDs, 4 Tesla K80s
– Virtual Machine option
A Note on Virtualization
– We ran VMware on one of our nodes with mixed results
– We were able to access one GPU on that node through a VM using PCI passthrough, but the other seven remained unavailable due to VMware software limitations
– VMware, GPU, and OS resource requirements limited us to two VMs per node, which is not very helpful
– We greatly appreciate the technical assistance NVIDIA provided as we conducted this experiment
– Verdict: it’s still a little too early for virtualization to be really useful for high-density compute nodes with multiple GPUs
PRIMUS Pipeline (overview slide repeated; see above) – next stage: Orthorectification
Orthorectification Process
Begin → Create bounding box → Grow bounding box → Query R2D2’s DB (returns image paths) → Read images from disk → Create (elevation + geoid) mosaic → Orthorectify source image → outputs feed Control Selection and Global Localization
Orthorectification Solution
– Accelerate portions of our OpenMP-enabled code with GPUs using CUDA: sensor model calculations and band interpolation calculations
– Optimize both CUDA kernels and their associated memory operations
– Create in-house Transverse Mercator CUDA device functions
– Combine the Sensor Model and Band Interpolation kernels
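A hedged sketch of what a fused kernel of this kind can look like: one thread per output pixel performs the sensor-model projection and the band interpolation in a single pass, avoiding a round trip through global memory between the two stages. The groundToImage stand-in, the names, and the signatures are assumptions; the real pipeline evaluates full RPC polynomials and its in-house Transverse Mercator device functions.

```cuda
// Illustrative stand-in for the ground-to-image sensor model. A real kernel
// would first invert the Transverse Mercator projection (map -> lon/lat) and
// then evaluate the full RPC rational polynomials; here an affine placeholder
// keeps the sketch short.
__device__ void groundToImage(double mapX, double mapY, double hae,
                              const double* rpc, double* line, double* samp)
{
    *line = rpc[0] + rpc[1] * mapY + rpc[2] * mapX + rpc[3] * hae;
    *samp = rpc[4] + rpc[5] * mapY + rpc[6] * mapX + rpc[7] * hae;
}

// Fused kernel: stage 1 (sensor model) and stage 2 (band interpolation) for
// one output pixel per thread.
__global__ void orthorectifyFused(const unsigned short* srcBand, int srcW, int srcH,
                                  const float* haeMosaic, const double* rpc,
                                  double originX, double originY, double gsd,
                                  unsigned short* dstBand, int dstW, int dstH)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= dstW || y >= dstH) return;

    // Map-space coordinate of this output pixel on the UTM grid, with height
    // taken from the on-the-fly (elevation + geoid) HAE mosaic.
    double gx = originX + x * gsd;
    double gy = originY - y * gsd;
    double h  = haeMosaic[y * dstW + x];

    // Stage 1: sensor model - project the ground point into the source image.
    double line, samp;
    groundToImage(gx, gy, h, rpc, &line, &samp);

    // Stage 2: band interpolation - bilinear resample of the source band.
    int c0 = (int)floor(samp), r0 = (int)floor(line);
    if (c0 < 0 || r0 < 0 || c0 + 1 >= srcW || r0 + 1 >= srcH) {
        dstBand[y * dstW + x] = 0;
        return;
    }
    double fc = samp - c0, fr = line - r0;
    double top = (1.0 - fc) * srcBand[r0 * srcW + c0]       + fc * srcBand[r0 * srcW + c0 + 1];
    double bot = (1.0 - fc) * srcBand[(r0 + 1) * srcW + c0] + fc * srcBand[(r0 + 1) * srcW + c0 + 1];
    dstBand[y * dstW + x] = (unsigned short)((1.0 - fr) * top + fr * bot);
}
```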
Orthorectification Optimized
Orthorectification Performance
– JPEG2000-compressed commercial image pair (36,000 × 30,000 pixels each)
– GPU-enabled RPC orthorectification to UTM
– Each image is done in 8 seconds, using one eighth of a single node’s horsepower
– 65,000,000,000 pixels per minute per node, running on multiple nodes
– That includes building HAE terrain models on the fly from tiled global sources
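A rough consistency check of the quoted rate, assuming all eight one-eighth slices of a node are kept busy: 36,000 × 30,000 ≈ 1.08 × 10^9 pixels per image, and eight images in flight at 8 seconds each gives about 1.08 × 10^9 pixels per second, or roughly 65 × 10^9 pixels per minute per node.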
PRIMUS Pipeline (overview slide repeated; see above) – next stage: Global Localization
Global Localization – Coarse Adjustment
Roughly determine where source and control images match, then adjust the sensor model. This is a triage step in the pipeline.
Input: source and control images. Output: coarse sensor model adjustments.
[Diagram: source (S) and control (C) footprints before and after global localization]
Computation – Solution Space
Solution space: each possible shift of the source chip over the control image (exhaustive search)
Solution: the similarity coefficient between the source and the control sub-image at each shift
[Diagram: source chip (S) swept over the control image (C) to build the solution space]
Similarity Metric
Normalized Mutual Information:
NMI = (H(S) + H(C)) / H(J)
H = −Σ_{i=0}^{k} p_i log2 p_i, where H is the entropy, p_i is the probability density function, and k ∈ 0..255 for S and C, 0..65535 for the joint histogram J
Source image and mask: N_S × M_S pixels. Control image and mask: N_C × M_C pixels.
Histograms are computed over the masked area to exclude missing data, artifacts, and homogeneous areas.
Solution space: n × m NMI coefficients
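A host-side sketch of the metric as defined above, computing NMI from a 256 × 256 joint histogram of the masked overlap; the marginals are row and column sums of the joint counts. Function and variable names are illustrative, and mask construction is omitted.

```cuda
// NMI = (H(S) + H(C)) / H(J) from a precomputed 256x256 joint histogram.
#include <cmath>
#include <vector>

static double entropy(const std::vector<double>& p)
{
    double h = 0.0;
    for (double v : p)
        if (v > 0.0) h -= v * std::log2(v);
    return h;
}

// jointCounts: 256*256 counts of (source gray, control gray) pairs inside the mask.
double nmi(const std::vector<unsigned long long>& jointCounts)
{
    const int bins = 256;
    double total = 0.0;
    for (auto c : jointCounts) total += (double)c;

    std::vector<double> pJ(bins * bins), pS(bins, 0.0), pC(bins, 0.0);
    for (int s = 0; s < bins; ++s)
        for (int c = 0; c < bins; ++c) {
            double p = jointCounts[s * bins + c] / total;
            pJ[s * bins + c] = p;
            pS[s] += p;   // marginal over the source image
            pC[c] += p;   // marginal over the control image
        }
    return (entropy(pS) + entropy(pC)) / entropy(pJ);
}
```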
Visual Example
Histogram computation (for normalized mutual information):
– NVIDIA samples: Histogram64, Histogram256
– Literature: joint histograms of roughly 80 × 80 bins
– Our problem: a (joint) histogram with 65,536 bins, computed n × m times over N_S × M_S data
Kernel Families
How to leverage the GPU to compute one solution (one joint histogram, 65,536 bins):
– 1 kernel per NMI computation
  Pros: use shared memory to piecewise fill the histogram
  Cons: atomicAdd, __syncthreads for reduction, one CPU launch per solution
– 1 block per NMI computation (K1, K2)
  Pros: use shared memory to piecewise fill the histogram; 1 kernel to evaluate all solutions
  Cons: atomicAdd, __syncthreads for reduction
– 1 thread per NMI computation (K3, K4, K5)
  Pros: read-only global memory access, no atomicAdd, no __syncthreads; 1 kernel to evaluate all solutions
  Cons: stack frame of 264,192 bytes per thread
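A sketch of the "one thread per NMI computation" family: each thread evaluates one candidate shift and builds its own histograms in local memory, so there are no atomics or synchronization, at the cost of a large per-thread stack frame. Names are assumptions and mask handling is omitted; note that 65,536 joint bins plus two 256-bin marginals at 4 bytes each is exactly the 264,192-byte stack frame quoted on the slide.

```cuda
#include <cmath>

#define BINS 256   // 8-bit source/control gray levels; joint histogram is 256*256 = 65,536 bins

__device__ float entropyFromCounts(const unsigned* counts, int nBins, float total)
{
    float h = 0.0f;
    for (int i = 0; i < nBins; ++i) {
        if (counts[i] == 0) continue;
        float p = counts[i] / total;
        h -= p * log2f(p);
    }
    return h;
}

__global__ void nmiPerThread(const unsigned char* source, int nS, int mS,
                             const unsigned char* control, int nC, int mC,
                             float* nmiOut, int nShiftsX, int nShiftsY)
{
    int sx = blockIdx.x * blockDim.x + threadIdx.x;   // candidate shift in x
    int sy = blockIdx.y * blockDim.y + threadIdx.y;   // candidate shift in y
    if (sx >= nShiftsX || sy >= nShiftsY) return;

    // Per-thread histograms live in local memory: 65,536*4 + 2*256*4 bytes
    // is the 264,192-byte stack frame listed as the con for this family.
    unsigned histS[BINS] = {0};
    unsigned histC[BINS] = {0};
    unsigned histJ[BINS * BINS] = {0};

    unsigned count = 0;
    for (int r = 0; r < mS; ++r) {
        for (int c = 0; c < nS; ++c) {
            int cr = r + sy, cc = c + sx;   // overlay the source chip on the control at this shift
            if (cr >= mC || cc >= nC) continue;
            unsigned char s = source[r * nS + c];
            unsigned char k = control[cr * nC + cc];
            ++histS[s];
            ++histC[k];
            ++histJ[s * BINS + k];
            ++count;
        }
    }
    if (count == 0) { nmiOut[sy * nShiftsX + sx] = 0.0f; return; }

    float total = (float)count;
    float hS = entropyFromCounts(histS, BINS, total);
    float hC = entropyFromCounts(histC, BINS, total);
    float hJ = entropyFromCounts(histJ, BINS * BINS, total);
    nmiOut[sy * nShiftsX + sx] = (hS + hC) / hJ;   // NMI = (H(S) + H(C)) / H(J)
}
```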