GPU-CPU Implementation for Super-Resolution Mosaicking of Unmanned Aircraft System (UAS) Surveillance Video

Aldo Camargo¹, Richard R. Schultz¹, Yi Wang¹, Ronald A. Fevig², Qiang He³

¹ Department of Electrical Engineering, University of North Dakota, Grand Forks, ND 58202 USA
² Department of Space Studies, University of North Dakota, Grand Forks, ND 58202 USA
³ Department of Mathematics, Computer and Information Sciences, Mississippi Valley State University, Itta Bena, MS 38941 USA

Abstract— Unmanned Aircraft Systems (UAS) have been used in many military and civilian applications, particularly surveillance. One of the best ways to use the capacity of a UAS imaging system is by constructing a mosaic of the recorded video. In this paper, we present a novel algorithm to compute a super-resolution mosaic for UAS video that is both fast and robust. In this algorithm, the feature points between frames are found using SIFT (Scale-Invariant Feature Transform), and RANSAC (Random Sample Consensus) is then used to estimate the homography between two consecutive frames. Next, a low-resolution (LR) mosaic is computed. LR images are extracted from the LR mosaic and subtracted from the input frames to form LR error images, which are used to compute an error mosaic. A regularization term based on Huber prior information is added to the error mosaic to form the super-resolution (SR) mosaic. The proposed algorithm was implemented using both a GPU (Graphics Processing Unit) and a CPU (Central Processing Unit). The first part of the algorithm, the construction of the LR mosaic, is performed by the GPU, and the rest is performed by the CPU; as a result, there is a significant speed-up of the algorithm. The proposed algorithm has been tested in both the infrared (IR) and visible spectra, using real and synthetic data. The results for all these cases show a great improvement in resolution, with a PSNR of 41.10 dB for synthetic data and greater visual detail for the real UAS surveillance data.

Index Terms— GPU, Computer vision, Huber Regularization, Ill-Posed Inverse Problem, Mosaic, Parallel Programming, Super-Resolution.

I. INTRODUCTION

Multi-frame super-resolution refers to the particular case where multiple images of a particular scene are available [3]. The idea is to use low-resolution images containing motion, camera zoom, and out-of-focus blur to recover extra data with which to reconstruct an image at a resolution above the limits of the camera. The super-resolved image should have more detail than any of the low-resolution images. Mosaicking is the alignment or stitching of two or more images into a single composition representing a large 2D scene [5]. Generally, mosaics are used to create a map that is impossible to visualize with only one video frame. Super-resolution mosaicking combines both methods, and it has a number of applications when frames from a UAS or satellite are used. One clear application is the surveillance of areas of interest during nighttime hours with an IR imaging system: the UAS can fly over the areas of interest and generate super-resolved mosaics that can be analyzed at the ground control station. Another important application is the monitoring of high-voltage transmission lines, oil pipelines, and the highway system. NASA also uses super-resolution mosaicking to study the surface of Mars, the Moon, and other planets.

Super-resolution mosaicking has been studied by many researchers. Zomet and Peleg [6] use the overlapping area within a sequence of video frames to create a super-resolved mosaic. In this method, the SR reconstruction technique proposed in [7] is applied to a strip rather than an entire image.


This means that the resolution of each strip is enhanced by the use of all frames that contain that particular strip. The disadvantage is that it is computationally expensive. Ready and Taylor [8] use a Kalman filter to compute the super-resolved mosaic. They add unobserved data to the mosaic with Dellaert's method: they construct a matrix that relates the observed pixels to estimated mixel values. This matrix is built from the homography and the point spread function (PSF). However, the matrix is extremely large, so they use a Kalman filter and diagonalization of the covariance matrix to reduce the amount of storage and computation required. The drawback of this algorithm is the use of the large matrix, and the best results with synthetic data obtain a PSNR of 31.6 dB. Smolic and Wiegand [9] use a method based on image warping. In this method, each pixel of every frame is mapped into the SR mosaic, and its gray value is assigned to the corresponding pixel in the SR mosaic within a range of ±0.2 pixel units. The drawback of this method is that the estimated motion vectors and homography must be accurate, which is difficult when dealing with real surveillance video from a UAS. Wang et al. [10] use the overlapped area within five consecutive frames of a video sequence, along with sparse matrices to model the relationship between the LR and SR frames, which is solved using maximum a posteriori estimation. To deal with the ill-posed nature of the super-resolution model, they use hybrid regularization. The drawback of this method is that it has to be applied every five frames, which means that several sparse matrices have to be constructed every five frames. Therefore, this method does not seem appropriate for a real video sequence that has thousands of frames.


Pickering and Ye [1] proposed an interesting model for the mosaicking and super-resolution of video sequences, using the Laplacian operator to determine the regularization factor. The problem with the Laplacian, however, is that it enforces spatial smoothness; noise and edge pixels are therefore removed in the regularization process, eliminating sharp edges [11]. Our method combines the ideas of many of these techniques, but it also introduces a different way of dealing with super-resolution mosaicking that does not require the construction of sparse matrices. Therefore, it is feasible to apply the algorithm to a relatively large video sequence and obtain a video mosaic. Additionally, we use Huber regularization, which preserves high-frequency pixels, so sharp edges are maintained.

This paper is organized as follows. Section II describes the observation model used in this paper. Section III presents the mathematical formulation used to find the optimal solution to the ill-posed problem. Section IV describes the implementation of the algorithm and how tasks are distributed between the GPU and CPU processors. Finally, Section V shows the results for color synthetic data and real IR surveillance video.

II. OBSERVATION MODEL

Assuming that there are $K$ LR frames available, the observation model can be represented as

$$y_k = D B_k W_k R[x]_k + n_k = H_k x + n_k. \quad (1)$$

Here, $y_k$ ($k = 1, 2, \ldots, K$), $x$, and $n_k$ represent the $k$th LR image, the part of the real world depicted by the super-resolution mosaic, and the additive noise, respectively. The observation model in (1) introduces $R[\cdot]_k$, which represents the reconstruction of the $k$th warped SR image from the original high-resolution data $x$ [1]. The geometric warp operator and the blur matrix between $x$ and the $k$th LR image $y_k$ are represented by $W_k$ and $B_k$, respectively. The decimation operator is denoted as $D$. The warp operator is found by first computing the SIFT features and then finding the best match between two consecutive frames using RANSAC. SIFT is used to find the features because it is robust to photometric and geometric changes between consecutive frames [13].
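As an illustration of how the frame-to-frame homography behind the warp operator $W_k$ can be obtained, the following Python sketch uses OpenCV's SIFT detector and RANSAC-based homography estimation. It is a minimal sketch of the matching step described above, not the authors' implementation; the Lowe ratio test and the RANSAC reprojection threshold are assumed values.

```python
import cv2
import numpy as np

def estimate_homography(frame_a, frame_b, ratio=0.75, ransac_thresh=3.0):
    """Estimate the 3x3 homography mapping frame_a onto frame_b using SIFT
    features and RANSAC (sketch; ratio and threshold are assumptions)."""
    # Convert to grayscale if the frames are color (IR frames may already be single channel).
    gray_a = frame_a if frame_a.ndim == 2 else cv2.cvtColor(frame_a, cv2.COLOR_BGR2GRAY)
    gray_b = frame_b if frame_b.ndim == 2 else cv2.cvtColor(frame_b, cv2.COLOR_BGR2GRAY)

    sift = cv2.SIFT_create()
    kp_a, desc_a = sift.detectAndCompute(gray_a, None)
    kp_b, desc_b = sift.detectAndCompute(gray_b, None)

    # Match descriptors and keep only matches that pass Lowe's ratio test.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    candidates = matcher.knnMatch(desc_a, desc_b, k=2)
    good = []
    for pair in candidates:
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            good.append(pair[0])

    src = np.float32([kp_a[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_b[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

    # RANSAC rejects outlier correspondences and fits the homography.
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, ransac_thresh)
    return H
```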

III. ROBUST SUPER-RESOLUTION MOSAICKING

The estimation of the unknown SR mosaic image is based not only on the observed LR images, but also on several assumptions about the blurring process and the additive noise. The motion is estimated using a projective model representing the homography between frames, and the blur is considered to be purely optical. The additive noise $n_k$ is assumed to be independent and identically distributed white Gaussian noise. Therefore, the problem of finding the maximum likelihood estimate of the SR mosaic image $\hat{x}$ can be formulated as

$$\hat{x} = \arg\min_x \left\{ \sum_{k=1}^{K} \left\| y_k - D B_k W_k R[x]_k \right\|_2^2 \right\}. \quad (2)$$

Here, $\|\cdot\|_2$ denotes the 2-norm. As SR reconstruction is an ill-posed inverse problem, we need to add a regularization term that contains prior information for SR mosaicking. This regularization term converts the ill-posed inverse problem into a well-posed problem. We use Huber regularization because it preserves edges [2][3]:

$$\hat{x} = \arg\min_x \left\{ \sum_{k=1}^{K} \left\| y_k - D B_k W_k R[x]_k \right\|_2^2 + \lambda \sum_{g \in Gx} \rho(g, \alpha) \right\}. \quad (3)$$

The Huber function [3] is defined as

$$\rho(x, \alpha) = \begin{cases} x^2, & \text{if } |x| \le \alpha \\ 2\alpha|x| - \alpha^2, & \text{otherwise.} \end{cases} \quad (4)$$

Based on the gradient descent algorithm for minimizing (3), the robust iterative update for $\hat{x}$ can be expressed as

$$\hat{x}^{(n+1)} = \hat{x}^{(n)} + \alpha^{(n)} \left\{ \sum_{k=1}^{K} R^T \left[ W_k^T B_k^T D^T \left( y_k - D B_k W_k R[\hat{x}^{(n)}]_k \right) \right] - \lambda^{(n)} G^T \rho'\!\left( G \hat{x}^{(n)}, \alpha \right) \right\} \quad (5)$$

where $G$ is the gradient operator over the cliques [2][4], and the regularization parameter is computed as

$$\lambda^{(n)} = \frac{\displaystyle\sum_{k=1}^{K} \left\| y_k - D B_k W_k R[\hat{x}^{(n)}]_k \right\|^2}{\displaystyle K \sum_{g \in Gx} \rho(g, \alpha)}. \quad (6)$$
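The Huber penalty of (4), its derivative, and the update of (5)-(6) map directly onto array operations. The Python/NumPy sketch below is one possible realization under the assumption that the composite operators $D B_k W_k R[\cdot]_k$ and their transposes are supplied as callables (they are built from the image operations of Section IV); it is not the authors' code.

```python
import numpy as np

def huber(x, alpha):
    """Huber function of Eq. (4), applied elementwise."""
    x = np.asarray(x, dtype=np.float64)
    return np.where(np.abs(x) <= alpha, x ** 2, 2.0 * alpha * np.abs(x) - alpha ** 2)

def huber_prime(x, alpha):
    """Derivative of the Huber function, used in the update of Eq. (5)."""
    x = np.asarray(x, dtype=np.float64)
    return np.where(np.abs(x) <= alpha, 2.0 * x, 2.0 * alpha * np.sign(x))

def sr_update(x_hat, lr_frames, forward_ops, backward_ops, G, GT, step, alpha):
    """One iteration of Eq. (5).

    forward_ops[k](x)  applies D B_k W_k R[.]_k to the current SR mosaic,
    backward_ops[k](e) applies R^T W_k^T B_k^T D^T to an LR error image,
    and G / GT apply the clique gradient operator and its transpose.
    These callables are placeholders for the image operations of Section IV;
    `step` plays the role of the step size and `alpha` is the Huber threshold.
    """
    x_hat = np.asarray(x_hat, dtype=np.float64)
    data_term = np.zeros_like(x_hat)
    residual_energy = 0.0
    for y_k, fwd, bwd in zip(lr_frames, forward_ops, backward_ops):
        err = y_k - fwd(x_hat)            # LR error image y_k - D B_k W_k R[x]_k
        residual_energy += np.sum(err ** 2)
        data_term += bwd(err)             # back-projected error in mosaic coordinates

    # Regularization weight of Eq. (6): residual energy divided by K times
    # the Huber energy of the clique gradients (small constant avoids 0/0).
    g = G(x_hat)
    lam = residual_energy / (len(lr_frames) * np.sum(huber(g, alpha)) + 1e-12)

    return x_hat + step * (data_term - lam * GT(huber_prime(g, alpha)))
```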

IV. IMPLEMENTATION

The implementation of the proposed algorithm is not complicated, because it does not require the construction of a sparse matrix. Hence, the proposed algorithm is implemented using simple image operations such as warping, convolution, interpolation, and decimation. These operations are described as follows:


• $R$ and $R^T$: $R$ represents the reconstruction of an LR frame from the mosaic image, and $R^T$ represents the construction of a mosaic.
• $W$ and $W^T$: $W$ represents backward warping, and $W^T$ represents forward warping using the inverse of the homography. The homography is computed using SIFT features and the RANSAC algorithm [12].
• $B$ and $B^T$: $B$ represents the blurring effect and is implemented by convolution with the PSF kernel. $B^T$ is implemented by convolution with the flipped [1] PSF kernel.
• $D$ and $D^T$: $D$ represents image decimation, while $D^T$ represents image interpolation. Image interpolation refers to upsampling followed by appropriate lowpass filtering, and image decimation refers to downsampling after appropriate anti-alias filtering.

Figure 1 shows the task distribution between the GPU and CPU processors. The GPU task finds the SIFT features in the LR frames and then finds the matching points to compute the homography. With the homography, we construct the LR mosaic. The CPU takes this LR mosaic and creates reconstructed LR frames by using the inverse transformation (i.e., the inverse of the homography); the LR error images are computed by subtracting the reconstructed images from the original LR frames. After this, the LR error images are interpolated (application of the operator $D^T$) and then added to the regularization term to obtain the error mosaic. This error mosaic is an iterative update to the original LR mosaic to construct the final SR mosaic.

Table I shows the distribution of the processing time for Task 1 (GPU) and Task 2 (CPU). The table was constructed from the results of the SR mosaic for five LR IR video frames. We also show the time for Task 1 computed on the CPU: Task 1 runs approximately 6 times faster on the GPU than on the CPU.

TABLE I
DISTRIBUTION OF THE PROCESSING TIME BETWEEN THE GPU AND CPU TASKS

Task                                                                          GPU time (ms)   CPU time (ms)
Task 1: Find SIFT features, then match them to find the homography.                328            1985
Task 2: Construct the LR mosaic, error mosaic, regularization, and SR mosaic.       —             5422
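To make the operator list concrete, the sketch below shows one way these image operations could be realized with OpenCV and NumPy. The magnification factor, the Gaussian PSF parameters, and the interpolation method are illustrative assumptions; the paper does not specify them.

```python
import cv2
import numpy as np

SCALE = 2                      # assumed SR magnification factor
PSF_SIZE, PSF_SIGMA = 5, 1.0   # assumed Gaussian PSF parameters

def blur_B(img):
    """B: optical blur, implemented as convolution with the Gaussian PSF."""
    return cv2.GaussianBlur(img, (PSF_SIZE, PSF_SIZE), PSF_SIGMA)

def decimate_D(img):
    """D: anti-alias filtering followed by downsampling."""
    smoothed = cv2.GaussianBlur(img, (PSF_SIZE, PSF_SIZE), PSF_SIGMA)
    return smoothed[::SCALE, ::SCALE]

def interpolate_DT(img):
    """D^T: upsampling followed by lowpass (interpolating) filtering."""
    h, w = img.shape[:2]
    return cv2.resize(img, (w * SCALE, h * SCALE), interpolation=cv2.INTER_LINEAR)

def warp_W(img, H, mosaic_shape):
    """W: warp a frame into the mosaic coordinate system with homography H."""
    return cv2.warpPerspective(img, H, (mosaic_shape[1], mosaic_shape[0]))

def lr_error_image(lr_frame, lr_mosaic, H_inv):
    """Reconstruct an LR frame from the mosaic via the inverse homography and
    subtract it from the observed frame, as in the CPU task of Fig. 1."""
    h, w = lr_frame.shape[:2]
    reconstructed = cv2.warpPerspective(lr_mosaic, H_inv, (w, h))
    return lr_frame.astype(np.float32) - reconstructed.astype(np.float32)
```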

V. RESULTS

We conducted two different tests, the first with synthetic color data and the second with real IR frames from UAS surveillance video. The parameter $\alpha^{(n)}$ was chosen to be 1.75 for all tests, and the algorithm was set to run for 7 iterations. Section V-A shows the results with the synthetic data, and Section V-B describes the experiments with real UAS video frames.

A. Results Using Synthetic Color Frames

We created synthetic LR color frames from a high-resolution image; these frames were generated using different translations, rotations, and scales. Additionally, they were blurred with a Gaussian kernel. Figure 2(a) is the input LR mosaic, Figure 2(b) is the SR mosaic obtained by the proposed algorithm, and Figure 2(c) is the mosaic built from the high-resolution frames, representing ground truth. The results obtained by our algorithm are close to the high-resolution mosaic. The low-resolution mosaic (Figure 2(a)) is cloudy, with less detail and less sharpness than the SR mosaic (Figure 2(b)). The resolution improvement is easy to see, and the SR mosaic is close to the ground-truth high-resolution data (Figure 2(c)). The PSNR, computed according to (7), is 41.10 dB:

$$PSNR = 10 \log_{10} \left( \frac{255^2 N}{\| x - \hat{x} \|^2} \right) \quad (7)$$

where $N$ is the number of pixels.

B. Results Using Real IR Surveillance Video from a UAS

We use LR frames from an aerial UAS IR surveillance video to compute the SR mosaic. The SR mosaic (Figure 2(e)) contains more detail and more sharpness than the low-resolution mosaic (Figure 2(d)). To provide a better appreciation of the results, Figure 2(f) shows a selected region of interest (ROI) where the enhancement of the LR mosaic produced by our algorithm can be seen more clearly.
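For completeness, the PSNR metric of Eq. (7) used to evaluate the synthetic-data results can be computed as in the short sketch below, assuming $x$ and $\hat{x}$ are same-sized arrays with an 8-bit dynamic range.

```python
import numpy as np

def psnr(x, x_hat):
    """PSNR of Eq. (7): 10*log10(255^2 * N / ||x - x_hat||^2),
    where N is the number of pixels."""
    x = np.asarray(x, dtype=np.float64)
    x_hat = np.asarray(x_hat, dtype=np.float64)
    sse = np.sum((x - x_hat) ** 2)
    return 10.0 * np.log10(255.0 ** 2 * x.size / sse)
```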

VI. CONCLUSIONS

The proposed algorithm performs as demonstrated in Figure 2 with both synthetic and real UAS surveillance frames, and it works with both IR and visible (i.e., RGB) frames. The visual results show that the resolution of the mosaic is improved. It is not necessary to use sparse matrices, so the processing time is reduced, and the GPU-CPU implementation is faster than a CPU-only implementation.

VII. FUTURE WORK

The parameter $\alpha^{(n)}$ was set to a fixed value. We will compute this value at every iteration to make the algorithm more adaptive. Also, only a small number of frames extracted from a video were used; we are working to improve the algorithm so that it takes the entire video and creates an SR video mosaic. Furthermore, we are considering the use of a parallel implementation of the conjugate gradient algorithm instead of the steepest descent method to compute the optimal solution of (2), as well as splitting the image as in [14] to increase parallelization; consequently, the speed-up will increase even more.

ACKNOWLEDGMENT

This research was supported in part by the FY2006 Defense Experimental Program to Stimulate Competitive Research (DEPSCoR) program, Army Research Office grant number 50441-CI-DPS, Computing and Information Sciences Division, "Real-Time Super-Resolution ATR of UAV-Based Reconnaissance and Surveillance Imagery" (Richard R. Schultz, Principal Investigator, active dates June 15, 2006, through June 14, 2010). This research was also supported in part by Joint Unmanned Aircraft Systems Center of Excellence contract number FA4861-06-C-C006, "Unmanned Aerial System Remote Sense and Avoid System and Advanced Payload Analysis and Investigation," as well as the North Dakota Department of Commerce grant, "UND Center of Excellence for UAV and Simulation Applications." Additionally, the authors would like to acknowledge the contributions of the Unmanned Aircraft Systems Engineering (UASE) Laboratory team at the University of North Dakota.


REFERENCES

[1] Mark Pickering, Getian Ye, Michael Frater, and John Arnold, "A Transform-Domain Approach to Super-Resolution Mosaicing of Compressed Images," 4th AIP International Conference and the 1st Congress of the IPIA, Journal of Physics: Conference Series 124 (2008) 012039.
[2] Richard R. Schultz, Li Meng, and Robert L. Stevenson, "Subpixel Motion Estimation for Multiframe Resolution Enhancement," Proc. Visual Communication and Image Processing 1997, pp. 1317-1328.
[3] Lynsey C. Pickup, "Machine Learning in Multi-frame Image Super-resolution," Ph.D. Dissertation, University of Oxford, 2007.
[4] Sean Borman, "Topics in Multiframe Superresolution Restoration," Ph.D. Dissertation, University of Notre Dame, Notre Dame, Indiana, 2004.
[5] David Peter Capel, "Image Mosaicing and Super-resolution," Ph.D. Dissertation, University of Oxford, 2001.
[6] Assaf Zomet and Shmuel Peleg, "Efficient Super-resolution and Applications to Mosaics," Proc. International Conference on Pattern Recognition, Sept. 2000.
[7] M. Irani and S. Peleg, "Improving Resolution by Image Registration," Graph. Models Image Process., vol. 53, pp. 231-239, March 1991.
[8] Bryce B. Ready, Clark N. Taylor, and Randal W. Beard, "A Kalman-filter Based Method for Creation of Super-resolved Mosaicks," Proc. IEEE International Conference on Robotics and Automation (ICRA), 2006.
[9] Aljoscha Smolic and Thomas Wiegand, "High-resolution Video Mosaicking," Proc. IEEE International Conference on Image Processing, Thessaloniki, Greece, October 2001.
[10] Yi Wang, Ronald Fevig, and Richard R. Schultz, "Super-Resolution Mosaicking of UAV Surveillance Video," Proc. IEEE International Conference on Image Processing (ICIP), pp. 345-348, 2008.
[11] Sina Farsiu, Dirk Robinson, Michael Elad, and Peyman Milanfar, "Fast and Robust Multi-Frame Super-resolution," IEEE Transactions on Image Processing, vol. 13, no. 10, pp. 1327-1344, 2004.
[12] M. Brown and D. G. Lowe, "Recognising Panoramas," Proceedings of the Ninth IEEE International Conference on Computer Vision (ICCV), vol. 2, p. 1218, 2003.
[13] Krystian Mikolajczyk and Cordelia Schmid, "A Performance Evaluation of Local Descriptors," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 10, 2005.
[14] Antonio Plaza, Javier Plaza, and Hugo Vegas, "Improving the Performance of Hyperspectral Image and Signal Processing Algorithms Using Parallel, Distributed and Specialized Hardware-Based Systems," Journal of Signal Processing Systems, DOI 10.1007/s11265-010-0453-1.

Fig. 1. Task distribution between the GPU and CPU processors.

Fig. 2. Results of the proposed SR mosaicking algorithm. (a) LR mosaic for the synthetic color frames; (b) SR mosaic, the output of our proposed algorithm; (c) HR mosaic (i.e., ground truth); (d) LR mosaic for a real sequence of IR video frames; (e) SR mosaic, the output of our proposed algorithm; (f) region of interest (ROI) extracted to show more detail of the LR (f-1) and SR (f-2) mosaics.
