SkyStitch: A Cooperative Multi-UAV-based Real-time Video Surveillance System with Stitching
Xiangyun Meng, Wei Wang and Ben Leong
National University of Singapore
Motivation Aerial video surveillance has become ubiquitous • Search & rescue • TV live broadcast • Border monitoring
Motivation We always prefer… • Higher resolution (More details) • Larger field of view (Better awareness)
The problem Single aircraft Limited resolution Limited field of view
Increasing #aircraft? Hard to correlate multiple videos!
Stitch them together! Video Stitching Video Streaming
Challenges • Image stitching is computationally expensive • Artefacts affect perceptual experience
Our contributions
• 4X improvement on stitching speed: Distributed Architecture, Sensor Hints, GPU
• Ensuring good quality under dynamic conditions: Failure Recovery, Sanity Check, Fusion
Image stitching in a nutshell
1. Feature extraction
2. Feature matching
3. Estimation of the homography H
4. Compositing
Steps 1 to 3 make up image alignment.
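To make the role of H concrete, here is a minimal sketch of how a 3x3 homography maps a pixel from one image into the other, which is the operation compositing relies on. The function name and the example matrix are hypothetical and are not taken from SkyStitch.

```python
def apply_homography(H, x, y):
    """Map pixel (x, y) through a 3x3 homography H given as nested lists."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity")
    # Divide by the homogeneous coordinate to get back to pixel coordinates.
    return u / w, v / w


# Hypothetical example: a pure translation of 100 px to the right.
H_shift = [[1.0, 0.0, 100.0],
           [0.0, 1.0, 0.0],
           [0.0, 0.0, 1.0]]
```

A general H also encodes rotation, scale and perspective; the division by w is what distinguishes it from an affine map.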
Conventional architecture
Video frames → Ground Station Stitching Pipeline: Feature Extraction → Feature Matching → RANSAC → Compositing
Conventional architecture
Video frames + Features → Ground Station Stitching Pipeline: Feature Extraction (one per video source, 6 ms / image) → Feature Matching → RANSAC → Compositing
Offloading feature extraction
[Plot: feature extraction time (ms) vs. number of video sources (2 to 12); series: CPU Benchmark, GPU Benchmark, SkyStitch]
• 6X faster than CPU benchmark
• 2X faster than GPU benchmark
• Scalability: constant
Speed optimization
Video streams + Features → Ground Station Stitching Pipeline: Feature Matching → RANSAC → Compositing
Speeding up feature matching
• Brute-force feature matching: inefficient and error-prone
• 1K features ➔ 1 million comparisons ➔ 16 ms
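The quadratic cost is easy to see in a sketch. The matcher below compares every descriptor in one image against every descriptor in the other and counts the comparisons; the squared Euclidean distance and the tuple descriptors are assumptions made for illustration, since the slides do not name the descriptor type.

```python
def brute_force_match(desc_a, desc_b):
    """Nearest-neighbour matching by exhaustive search.

    Returns (matches, comparisons), where matches[i] is the index in desc_b
    of the descriptor closest to desc_a[i].
    """
    comparisons = 0
    matches = []
    for da in desc_a:
        best_j, best_d = -1, float("inf")
        for j, db in enumerate(desc_b):
            comparisons += 1
            # Squared Euclidean distance (assumed metric).
            d = sum((p - q) ** 2 for p, q in zip(da, db))
            if d < best_d:
                best_j, best_d = j, d
        matches.append(best_j)
    return matches, comparisons
```

With 1,000 descriptors per image the inner loop runs 1,000 x 1,000 = 1,000,000 times per image pair, and any visually similar feature anywhere in the image can win the match, which is where the error-proneness comes from.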
Exploiting flight status information
• Accelerometer, Gyroscope ➔ Camera Attitude
• Compass ➔ Camera Heading
• GPS ➔ Camera Location
Exploiting flight status information
• Idea: estimate the matched feature’s location
Exploiting flight status information
• Idea: search for the matched feature around the estimated location
• Search radius r < 30 pixels for a 1280x1024 image
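A minimal sketch of the windowed search, assuming each feature is an `(x, y, descriptor)` tuple and that a hypothetical `predict` callback turns flight status data into an estimated location in the other image. Neither the data layout nor the callback comes from SkyStitch, and a real implementation would index features in a grid rather than scan the whole list.

```python
def guided_match(feats_a, feats_b, predict, radius=30.0):
    """Match features only inside a window around the predicted location.

    feats_a, feats_b: lists of (x, y, descriptor) tuples.
    predict: maps (x, y) in image A to the expected (x, y) in image B.
    Returns (matches, comparisons) with matches as (index_a, index_b) pairs.
    """
    r2 = radius * radius
    matches = []
    comparisons = 0
    for i, (xa, ya, da) in enumerate(feats_a):
        px, py = predict(xa, ya)
        best_j, best_d = None, float("inf")
        for j, (xb, yb, db) in enumerate(feats_b):
            if (xb - px) ** 2 + (yb - py) ** 2 > r2:
                continue  # outside the search window: descriptor never compared
            comparisons += 1
            d = sum((p - q) ** 2 for p, q in zip(da, db))
            if d < best_d:
                best_j, best_d = j, d
        if best_j is not None:
            matches.append((i, best_j))
    return matches, comparisons
```

Restricting candidates to the window cuts the number of descriptor comparisons and also rejects look-alike features that lie far from where the match can physically be.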
Speeding up feature matching
[Plot: feature matching time (ms) vs. number of video sources (2 to 12); series: GPU Benchmark, SkyStitch]
• 70X faster than CPU benchmark
• 4X faster than GPU benchmark
• Note: SkyStitch’s feature matching runs on CPU! Potentially much faster if implemented on GPU.
Speed optimization
Video streams + Features → Ground Station: Feature Matching++ → GPU RANSAC (20X faster) → GPU Compositing (30 ms for compositing 12 HD images)
Putting things together…
[Plot: stitching rate (fps) vs. number of video sources (2 to 12); series: CPU Benchmark, GPU Benchmark, SkyStitch]
• SkyStitch: 22 fps
• GPU Benchmark: 4.2 fps
• CPU Benchmark: 1.4 fps
Speed is not everything • Perspective distortion • Frame drops • Perspective jerkiness
Frame drops
• When we get a bad homography, we have to drop the frame
[Diagram: Frame n from each UAV is stitched with $H_n$; stitched frame n+1 fails (×)]
Failure recovery
• Instead of dropping the frame, we predict a good one
• Compute an optical flow homography F on each UAV
• Predict $H_{n+1}$:
  $H_{n+1} = M_{2,n+1} F_{2,n+1} M_{2,n}^{-1} H_n M_{1,n} F_{1,n+1}^{-1} M_{1,n+1}^{-1}$
  (M is the orthorectification matrix)
[Diagram: on UAV 1, $F_{1,n+1}$ links Frame n to Frame n+1; on UAV 2, $F_{2,n+1}$ does the same; $H_n$ and the predicted $H_{n+1}$ give a valid stitched frame n+1 (✔)]
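The prediction is a chain of seven 3x3 matrix products: the previous homography is sandwiched between each UAV's optical flow homography and orthorectification matrices. A minimal sketch follows, using nested lists for matrices; the helper names and argument order are hypothetical, and the direction conventions of H, F and M are assumed to be whatever the formula on the slide implies.

```python
def mat_mul(A, B):
    """Product of two 3x3 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def mat_inv(A):
    """Inverse of a 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = A
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if det == 0:
        raise ValueError("singular matrix")
    return [[(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
            [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
            [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det]]


def predict_homography(H_n, M1_n, M1_next, F1_next, M2_n, M2_next, F2_next):
    """H_{n+1} = M2_next F2_next M2_n^-1 H_n M1_n F1_next^-1 M1_next^-1."""
    chain = [M2_next, F2_next, mat_inv(M2_n), H_n,
             M1_n, mat_inv(F1_next), mat_inv(M1_next)]
    result = chain[0]
    for m in chain[1:]:
        result = mat_mul(result, m)
    return result
```

As a sanity check, with identity orthorectification matrices and pure-translation flows, the predicted translation is the old one plus UAV 2's motion minus UAV 1's motion.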
Perspective jerkiness
Quantifying jerkiness
• Decompose the homography: $H_1 = R + t n^T$
• R gives roll, pitch, yaw
[Diagram: Frame 1 from each UAV related by $H_1$]
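Once R is available, the three angles can be read out directly. The sketch below assumes the common ZYX convention, R = Rz(yaw) Ry(pitch) Rx(roll); the slides do not say which convention SkyStitch uses, and the decomposition of H into R, t and n is taken as already done.

```python
import math


def rotation_zyx(roll, pitch, yaw):
    """Build R = Rz(yaw) * Ry(pitch) * Rx(roll); angles in radians."""
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    return [[cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
            [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
            [-sp, cp * sr, cp * cr]]


def euler_zyx(R):
    """Recover (roll, pitch, yaw) in radians from a ZYX rotation matrix."""
    # Clamp to guard against values slightly outside [-1, 1] from rounding.
    pitch = -math.asin(max(-1.0, min(1.0, R[2][0])))
    roll = math.atan2(R[2][1], R[2][2])
    yaw = math.atan2(R[1][0], R[0][0])
    return roll, pitch, yaw
```

Tracking how these angles change from frame to frame gives a per-frame number for jerkiness, as in the pitch-angle plots on the following slides.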
Stitching each pair of frames
[Diagram: Frames 1 to 4 from both UAVs, each pair stitched independently with $H_1$, $H_2$, $H_3$, $H_4$]
Stitching is noisy
[Plot: pitch angles in estimated homographies; angle (deg) vs. frames (0 to 250); series: Only Stitching]
Observation
• We have two homography solutions for a pair of frames
• One is from stitching
• The other is from prediction
[Diagram: Frames n and n+1 on each UAV are linked by $F_{1,n+1}$ and $F_{2,n+1}$; $H_n$ and $H_{n+1}$ produce stitched frames n and n+1]
Keep doing prediction
[Diagram: Frames 1 to 4 from both UAVs; $H_1$ comes from stitching, while $H_2$, $H_3$ and $H_4$ each come from prediction]
Optical flow is drifty
[Plot: pitch angles in estimated homographies; angle (deg) vs. frames (0 to 250); series: Only Optical Flow]
Stitching vs. Prediction
             Short Term   Long Term
Stitching    Noisy        Stable
Prediction   Smooth       Drifty
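Idea: fuse them
• Decompose $H_{stitching}$ into $R_1$, $t_1$, $n_1$ and $H_{prediction}$ into $R_2$, $t_2$, $n_2$
• Fuse $R_1$ and $R_2$ with a Multiplicative Extended Kalman Filter to get $R'$
• Solve for the translation $t'$ using $R'$ and the matched features
• Output the fused homography $H'$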
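The intuition behind the fusion can be shown on a single angle. The sketch below is a scalar Kalman filter and deliberately not the Multiplicative Extended Kalman Filter named on the slide: it propagates the estimate with the smooth frame-to-frame change from prediction and corrects it with the noisy absolute value from stitching. The function name and the noise parameters `q` and `r` are hypothetical.

```python
def fuse_pitch(stitch_angles, flow_deltas, q=0.01, r=1.0):
    """Fuse absolute and relative pitch estimates with a scalar Kalman filter.

    stitch_angles[k]: noisy absolute pitch from stitching at frame k.
    flow_deltas[k]: pitch change from frame k-1 to k from prediction
                    (flow_deltas[0] is ignored).
    q: variance added per frame to model drift of the prediction.
    r: variance of the stitching measurement.
    """
    est, var = stitch_angles[0], r
    fused = [est]
    for z, delta in zip(stitch_angles[1:], flow_deltas[1:]):
        # Predict: follow the smooth relative motion; uncertainty grows.
        est += delta
        var += q
        # Update: pull towards the noisy but drift-free measurement.
        gain = var / (var + r)
        est += gain * (z - est)
        var *= 1.0 - gain
        fused.append(est)
    return fused
```

A small `q` relative to `r` trusts the prediction in the short term, while the repeated updates keep the estimate from drifting in the long term.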
Fusion
[Plot: pitch angles in estimated homographies; angle (deg) vs. frames (0 to 250); series: Only Stitching, Only Optical Flow, Fused]
Implementation
• UAVs: Two DIY Quadcopters ($1200 USD each)
• Ground station: A single Linux desktop
• 16k lines of C/C++
Demo 1 • Synchronized shutter • Very strong wind • Camera views: UAV 1, UAV 2
Demo 2 • Simulation of stitching 12 video streams
Conclusion & Future work
Present
• Real-time performance (20 fps @ 12 videos)
• High quality (High success rate, low jerkiness)
Future
• Test in more complex scenarios
• Maximizing field of view
• Network optimization
Thank you!
Unused slides
Performance of existing solutions
1. Feature extraction: 6 ms per image (GPU)
2. Feature matching: 3 ms per image pair (GPU)
3. Estimation of H: 11 ms per image pair (CPU)
4. Compositing
Steps 1 to 3 make up image alignment.
Test setup: Intel Core i7 2600K; GeForce GTX 670; OpenCV 2.4.8 with CUDA; 1000 features per image
Performance of existing solutions
• Each stage could be a computational bottleneck
• Optimize each stage one by one
Offloading feature extraction
Video streams + Features → Ground Station Stitching Pipeline: Feature Matching → RANSAC → Compositing
Exploiting flight status information
• Cameras can be tilted due to wind turbulence
• Orthorectification: warp each image as if the camera is always pointed vertically downwards
• $M = K B R^{-1} B^{-1} K^{-1}$
[Diagram: the flight controller supplies the quadcopter attitude R; camera video frames pass through orthorectification to become orthorectified video frames]
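The orthorectification matrix is a product of five 3x3 matrices. The sketch below evaluates it with plain nested lists. The slide does not define K and B, so they are treated here as given inputs (K is conventionally the camera intrinsic matrix); the helper names are hypothetical.

```python
def mat_mul(A, B):
    """Product of two 3x3 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def mat_inv(A):
    """Inverse of a 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = A
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if det == 0:
        raise ValueError("singular matrix")
    return [[(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
            [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
            [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det]]


def orthorectification_matrix(K, B, R):
    """Evaluate M = K B R^-1 B^-1 K^-1 for 3x3 inputs."""
    M = mat_mul(K, B)
    M = mat_mul(M, mat_inv(R))
    M = mat_mul(M, mat_inv(B))
    return mat_mul(M, mat_inv(K))
```

When the attitude R is the identity (the camera already points straight down), the factors cancel and M reduces to the identity, so the frame is left untouched.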
Speed optimization
Video streams + Features → Ground Station: Feature Matching++ → RANSAC → Compositing
Speeding up RANSAC
• Existing RANSAC homography estimator
  - Each iteration: solve a 4-point homography
    - Do SVD on a 9x9 matrix to find the eigenvector corresponding to eigenvalue zero
  - Takes 11 ms for 512-iteration RANSAC on a 3.4 GHz Core i7
  - GPU SVD?
    - SVD is not well suited to GPU architecture (shown later)
Speeding up RANSAC
• Idea: no need to do SVD at all!
  - Just find the null vector for the 9x9 matrix
  - Gauss-Jordan elimination is sufficient
    - Well suited to GPU architecture
    - Much simpler code
    - No branching
    - Takes 0.6 ms for 512-iteration RANSAC
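A minimal CPU sketch of the idea, under two stated simplifications: it fixes $h_{33} = 1$ and solves the resulting 8x8 linear system rather than extracting the null vector of a 9x9 matrix, and it uses partial pivoting, which does branch. It illustrates that elimination alone recovers a 4-point homography; it is not SkyStitch's GPU kernel, and the function name is hypothetical.

```python
def homography_from_4_points(src, dst):
    """Solve a 4-point homography by Gauss-Jordan elimination (h33 fixed to 1).

    src, dst: four (x, y) points each, with dst[i] the image of src[i].
    Returns H as a 3x3 nested list.
    """
    # Each correspondence gives two rows of the augmented 8x9 system [A | b].
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y, u])
        rows.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y, v])
    n = 8
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        if abs(rows[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        p = rows[col][col]
        rows[col] = [val / p for val in rows[col]]
        # Eliminate this column from every other row (Gauss-Jordan).
        for r in range(n):
            if r != col:
                f = rows[r][col]
                rows[r] = [a - f * b for a, b in zip(rows[r], rows[col])]
    h = [rows[i][8] for i in range(n)] + [1.0]
    return [h[0:3], h[3:6], h[6:9]]
```

Inside RANSAC this solve runs once per iteration on a random 4-point sample, so its cost is multiplied by the iteration count (512 on the slide).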
Speeding up RANSAC
• Idea: maximize parallelism and minimize I/O
  - Compute ALL pairwise homographies in one pass
[Diagram: GPU RANSAC data flow for image pairs $(i_1, j_1)$ … $(i_k, j_k)$. Uploaded from CPU: the 4-point correspondences and all matches (p, q) for each pair. On the GPU: Gauss-Jordan elimination turns the 4-point correspondences into candidate homographies $H_0$ … $H_{n-1}$; reprojection error over all matches gives an inlier mask and scores. Downloaded from GPU: inlier mask and scores]
Speeding up RANSAC
[Plot: RANSAC time (ms) vs. number of video sources (2 to 12), 512 RANSAC iterations; series: benchmark, CPU; benchmark, GPU; SkyStitch]
• 24X faster than CPU benchmark
• 18X faster than GPU benchmark based on Jacobi SVD
Multiple video sources
[Diagram: Cameras 1 to 4 linked by pairwise homographies $H_1$, $H_2$, $H_3$, $H_4$]