IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 15, NO. 2, FEBRUARY 2006
Target Tracking in Infrared Imagery Using Weighted Composite Reference Function-Based Decision Fusion

Amer Dawoud, M. S. Alam, Senior Member, IEEE, A. Bal, and C. Loo
Abstract—In this paper, we propose a novel decision fusion algorithm for target tracking in forward-looking infrared image sequences recorded from an airborne platform. An important part of this study is identifying the failure modes in this type of imagery. Our strategy is to prevent these failure modes from developing into tracking failures. The results furnished by competing ego-motion compensation and tracking algorithms are evaluated based on their similarity to a target model constructed using the weighted composite reference function.
Index Terms—Decision fusion, forward-looking infrared (FLIR), target tracking (TT), weighted composite reference function (WCRF).

Manuscript received May 25, 2004; revised January 25, 2005. This work was supported by the U.S. Army Research Office under Grant DAAD 19-02-1-0150. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Bruno Carpintieri.
A. Dawoud is with the Department of Electrical and Computer Engineering, The University of South Alabama, Mobile, AL 36688 USA, and also with Rotoflex International, Inc., Mississauga, ON L6L 5K4 Canada (e-mail: adawoud@usouthal.edu).
M. S. Alam, A. Bal, and C. Loo are with the Department of Electrical and Computer Engineering, The University of South Alabama, Mobile, AL 36688 USA (e-mail: [email protected]; [email protected]; helmentloo@rocketmail.com).
Digital Object Identifier 10.1109/TIP.2005.860626

I. INTRODUCTION

Detection and tracking of targets in forward-looking infrared (FLIR) imagery is a challenging problem in pattern recognition and tracking applications. In contrast to visual images, the images obtained from an IR sensor have an extremely low signal-to-noise ratio (SNR), which leaves limited information for performing detection and tracking tasks. In addition, in FLIR images, the nonrepeatability of the target signature, competing clutter, lack of a priori information, and high ego-motion of the sensor make detection and tracking of a target considerably harder. Most detection and tracking algorithms depend on the hot-spot technique, which assumes that the target is brighter than the background and that noise remains at acceptable levels. Bal and Alam [2] used an intensity variation function to capture the target intensity signature. Strehl and Aggarwal [6] compensated the global motion using a multiresolution scheme based on the affine motion model. Yilmaz et al. [8] utilized fuzzy clustering, edge fusion, and local texture energy for detecting targets in FLIR imagery; the position and size of the detected targets are then used to initialize the tracking algorithm. Alam and Karim [1] used fringe-adjusted joint transform correlation (FJTC). Loo and Alam [5] used an invariant synthetic discriminant function (SDF) to represent the target model in FJTC tracking. Wang et al. [7] applied a modular neural network classifier to automatic target recognition in FLIR imagery. Bharadwaj and Carin [3] classified IR image targets (vehicles) using hidden Markov trees. Cooper and Miller [4] presented a deformable template representation accommodating both the geometric and signature variability of objects in FLIR imagery.

II. TRACKING FAILURE MODES IN CLOSING FLIR SEQUENCES
An important part of this research is understanding the failure modes that contribute to the failure of correlation-based tracking in FLIR imagery. We identified three failure modes, which are discussed in the rest of this section and demonstrated in the frames shown in Fig. 1. For efficient target tracking (TT), the search area should be chosen small, both to reduce the probability of false-alarm errors and to reduce the computational cost. Therefore, in Fig. 1, we assume that the tracking algorithm has certain operational limits, indicated by the large white box. The result of TT is indicated by the small black box.

A. Ego-Motion Compensation (EMC) Failure Mode

FLIR closing sequences taken from an airborne platform suffer from abrupt discontinuities in motion. Failure to compensate for this ego-motion can move the target outside the operational limits of the tracking algorithm, as shown in Fig. 1(a). This failure mode can lead to an unrecoverable tracking failure: the real target will remain outside the operational limits of the tracking algorithm in subsequent frames.

B. Tracking Failure Mode

In this failure mode, shown in Fig. 1(b), the target is located within the operational limits of the tracking algorithm, yet the algorithm fails to determine the correct target location. This failure is mainly due to the low SNR and the competing clutter in this IR imagery. The tracking failure resulting from this failure mode can be recovered from: the tracking algorithm can still go back and detect the real target, particularly if the target is not moving.

C. Reference Image Distortion Failure Mode

Correlation-based tracking acquires a reference image and searches for the most similar area to estimate the target position in the current frame. The reference image distortion failure mode is caused by the gradual accumulation of walk-off error,
Fig. 1. Tracking failure modes in closing FLIR video sequences. Large white boxes show the operational limits of the tracking algorithm; small black boxes show the target-tracking results. (a) EMC failure. (b) TT failure. (c) Reference image-drifting failure.
especially when the object of interest changes in size, shape, or orientation from frame to frame. It can also result from the previous two failure modes. This failure mode, shown in Fig. 1(c), is characterized by the gradual departure of the target from the tracking window.

III. PROPOSED DECISION FUSION ALGORITHM

The proposed decision fusion algorithm allows the integration of complementary ego-motion compensation and TT algorithms. The flowchart in Fig. 2 summarizes the components of the proposed algorithm, and Fig. 3 is used for demonstration. We used the information from the ground truth (GT) file to initialize tracking; tracking in the following frames is done automatically without using GT information.

A. EMC Component

The first component of the proposed algorithm is the EMC. Different and complementary EMC algorithms are used to generate initial locations for the target. In the flowchart in Fig. 2, we assume that two EMC algorithms are used, and the two initial target locations are referred to as (x1, y1) and (x2, y2). More EMC algorithms could be added to this component. In Fig. 3, we can see that EMC algorithm #1 succeeded in compensating the ego-motion, while algorithm #2 failed. For the subsequent TT to be successful, at least one EMC algorithm should succeed in placing the target within the operational limits of the tracking algorithm. The failure of all EMC algorithms results in an unrecoverable tracking error.

B. TT Component

The next component is the TT. In the vicinity of the initial locations generated by the EMC algorithms, different TT algorithms are used to pinpoint the target by generating tentative locations. In the flowchart in Fig. 2, we assume that two TT algorithms are used; more TT algorithms could be added to this component. The number of tentative target locations is the product of the number of EMC algorithms and the number of TT algorithms. In the example in Fig. 3, four tentative target locations, referred to as (x11, y11), (x12, y12), (x21, y21), and (x22, y22), are generated. The correct location of the target is (x11, y11); the other locations are false alarms.

C. Decision Fusion Component: Selecting the Final Target Location by Template Matching

This component selects the final target location from among the tentative locations. This is achieved by template matching: the segmented targets ST_ij from the tentative locations are compared with a reference image for the target that we call the weighted composite reference function (WCRF), which is discussed in Section III-D. An error margin EM_ij is computed
Fig. 2. Flowchart of proposed decision-fusion algorithm for correlation-based TT in FLIR imagery.
for every ST_ij. EM_ij reflects the dissimilarity between ST_ij and the reference image WCRF:

$$\mathrm{EM}_{ij} = \mathrm{STD}\left(\mathrm{ST}_{ij} - \mathrm{WCRF}\right) \qquad (1)$$

where STD is the standard deviation and, for an N-pixel image X with mean value \bar{X},

$$\mathrm{STD}(X) = \sqrt{\frac{1}{N}\sum_{p}\left(X(p) - \bar{X}\right)^{2}}. \qquad (2)$$

The ST_ij with the lowest EM_ij is selected as the final detected target (FDT). In the example in Fig. 3, EM_11 has the lowest error margin; therefore, (x11, y11) is selected as the final target location, and FDT is ST_11.

D. Update Target Model Using WCRF

The next component of the proposed algorithm updates the target model, using the WCRF method. The WCRF is generated by summing the FDT images of the target used in the previous frames, multiplied by a set of weighting coefficients:

$$\mathrm{WCRF}_{n} = \sum_{i=1}^{n} w_{i}\,\mathrm{FDT}_{i} \qquad (3)$$

where w_i are the weighting coefficients. The WCRF of (3) is designed in such a way that if all consecutive reference images are identical, i.e., FDT_1 = FDT_2 = ... = FDT_n,
Fig. 3. Example demonstrating the initial target locations generated by the EMC algorithms, (x1, y1) and (x2, y2), and the tentative target locations generated by the tracking algorithms, (x11, y11), (x12, y12), (x21, y21), and (x22, y22).
Fig. 4. WCRF target model update.
then the corresponding WCRF composite image would be the reference image, i.e., WCRF_n = FDT_n. Under the aforementioned condition, the summation of all coefficients becomes equal to 1. To eliminate the need for preserving all of the preceding reference images in the formulation of the WCRF shown in (3), a recursive form of the WCRF is proposed herein, defined as

$$\mathrm{WCRF}_{n} = \frac{k\,\mathrm{WCRF}_{n-1} + \mathrm{FDT}_{n}}{k+1}, \qquad \mathrm{WCRF}_{1} = \mathrm{FDT}_{1} \qquad (4)$$

where k is an arbitrary constant. Rewriting (4) in the form of a series of summations yields

$$\mathrm{WCRF}_{n} = \left(\frac{k}{k+1}\right)^{n-1}\mathrm{FDT}_{1} + \frac{1}{k+1}\sum_{i=2}^{n}\left(\frac{k}{k+1}\right)^{n-i}\mathrm{FDT}_{i}. \qquad (5)$$

Comparing (3) and (5), we get

$$w_{1} = \left(\frac{k}{k+1}\right)^{n-1}, \qquad w_{i} = \frac{1}{k+1}\left(\frac{k}{k+1}\right)^{n-i}, \quad i = 2, \ldots, n. \qquad (6)$$
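The recursive target-model update can be sketched as follows. This is a minimal illustration, not the authors' implementation: a recursion of the form WCRF_n = (k·WCRF_{n-1} + FDT_n)/(k+1) is assumed, and the function names are hypothetical.

```python
import numpy as np

def update_wcrf(wcrf_prev, fdt, k=4.0):
    """One recursive WCRF update: blend the previous model with the new FDT.
    Larger k weights prior reference images more heavily (assumed form)."""
    return (k * np.asarray(wcrf_prev, dtype=float)
            + np.asarray(fdt, dtype=float)) / (k + 1.0)

def build_wcrf(fdt_frames, k=4.0):
    """Fold a sequence of final detected targets into a WCRF target model,
    initializing with the first frame (WCRF_1 = FDT_1)."""
    frames = iter(fdt_frames)
    wcrf = np.asarray(next(frames), dtype=float)
    for fdt in frames:
        wcrf = update_wcrf(wcrf, fdt, k)
    return wcrf
```

If every FDT is the identical image, the model reproduces that image exactly, consistent with the weighting coefficients summing to 1; a single noisy frame perturbs the model by only 1/(k+1) of its deviation.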
Fig. 5. Sequence L21-04: Example demonstrating successful rejection of ego-motion failure in frame 3.
Fig. 6. Sequence L15-NS: Example demonstrating successful tracking with low signature profile targets.
The summation of the coefficients in (6) remains equal to 1, i.e.,

$$\sum_{i=1}^{n} w_{i} = 1. \qquad (7)$$

The larger the value of k, the more information from prior reference images is incorporated into the WCRF formulation. This makes the tracking algorithm less vulnerable to the effects of noisy frames. In addition, it becomes less susceptible to changes in target appearance caused by various three-dimensional distortions, such as rotation and scale variations. Fig. 4 illustrates an example of the WCRF assisting in the remedy of the failure mode caused by reference image distortion. Images in the left column are the input FLIR images, where the tracked target is shown within a small box. Images in the middle column are the segmented reference images, FDT, of the corresponding frames, while images in the right column are the associated WCRF. In this FLIR sequence, frame #4 is totally corrupted by noise, which results in a failure to track the target in that frame.
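The decision fusion selection of Section III-C can be sketched as below. This is a minimal illustration rather than the authors' implementation: the error margin is assumed here to be the standard deviation of the pixelwise difference between a segmented target chip and the WCRF model, and all names are hypothetical.

```python
import numpy as np

def error_margin(st_chip, wcrf):
    """Dissimilarity EM between a segmented target chip ST and the WCRF.
    Assumed form: standard deviation of the pixelwise difference."""
    return float(np.std(np.asarray(st_chip, dtype=float) - wcrf))

def fuse_decisions(tentative, wcrf):
    """Pick the final detected target (FDT) among tentative detections.
    `tentative` maps an (EMC, TT) index pair to (location, segmented chip)."""
    margins = {key: error_margin(chip, wcrf)
               for key, (loc, chip) in tentative.items()}
    best = min(margins, key=margins.get)  # lowest error margin wins
    return best, tentative[best][0]
```

With two EMC and two TT algorithms this yields four candidates; the candidate whose chip best matches the WCRF model is kept, and the others are treated as false alarms.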
E. Detecting a Target Going Out of the Screen

Because the sensor can capture only a fraction of the field of view at any point in time, it may not be possible to track all targets of interest at once. Therefore, a component that detects targets leaving the view of the camera is essential. Detecting a target leaving the scene is achieved by estimating the next location of the target from its motion trajectory information gathered in the preceding frames. If the estimated location of the target in the next frame is outside the 128 x 128 pixels of the frame, tracking of that target is put on hold. Simple time-domain polynomial equations are employed to extrapolate the position of the target in the new frame from its positions in the previous three frames. Two coordinate sets for the location of the target are estimated from the following equations:

Set 1:

$$x_{t+1} = a_{1}t + b_{1} \qquad (8)$$

$$y_{t+1} = c_{1}t + d_{1} \qquad (9)$$

Set 2:

$$x_{t+1} = a_{2}t + b_{2} \qquad (10)$$

$$y_{t+1} = c_{2}t + d_{2} \qquad (11)$$

where t is the frame number, and a, b, c, and d are the real coefficients determined by solving the equations using the x and y coordinate locations of the target from the previous three frames. To reduce the false-alarm rate, both sets of estimated target locations must lie outside the screen for the target to be considered to be leaving the screen.

IV. RESULTS AND DISCUSSION

We applied the proposed decision fusion algorithm to the AMCOM FLIR data set, which consists of FLIR sequences of grayscale frames (128 x 128 pixels). The proposed algorithm fuses the algorithms presented in [1], [2], and [5]: EMC algorithm #1 and TT algorithm #1 use the FJTC [1]; EMC algorithm #2 uses correlation coefficients, and TT algorithm #2 uses the intensity variation function [2]; TT algorithm #3 [5] uses the SDF reference image for FJTC-based tracking.

Figs. 5 and 6 demonstrate the robustness of the proposed fusion algorithm. The white boxes in these figures are the tracking results of the different algorithms, and the black boxes are the final tracking results selected by the fusion algorithm. In Fig. 5, which shows frames from sequence L21-04, there was a large camera motion in frame 3; the proposed fusion algorithm successfully rejected the false-alarm errors resulting from the failure of one ego-motion algorithm. In Fig. 6, which shows frames from sequence L15-NS, the targets had low signature profiles; the fusion algorithm successfully tracked the targets and rejected the false alarms.

TABLE I
SUMMARY OF THE RESULTS OF SUCCESSFUL TRACKING PERCENTAGES FOR SOME CHALLENGING FLIR SEQUENCES

TABLE II
TOTAL NUMBER OF TARGETS THAT WERE SUCCESSFULLY TRACKED OUT OF THE 101 TARGETS

To demonstrate the robustness of the proposed algorithm, we compared its performance with that of the individual algorithms of [1], [2], and [5]. Table I summarizes the successful tracking percentages for some sequences. The results show that the proposed algorithm consistently performed better than the individual algorithms; for each target, the fusion algorithm produced results that are better than (or at least equal to) the results of the best individual algorithm. The FLIR data set consists of 50 sequences that include 101 targets. Table II reports the number of targets that were successfully tracked; the results indicate a substantial improvement over the individual algorithms.

The WCRF method for updating the target model prevents distortion of the reference image while allowing the target model to update its shape, size, and orientation from frame to frame. It also enhances the probability of recovering after a failure at a certain frame.

V. CONCLUSION

We presented a novel decision fusion algorithm for TT in FLIR imagery recorded from an airborne platform. The algorithm allows the fusion of complementary EMC and tracking algorithms. The overall performance of the proposed algorithm has been found to be significantly better than that of any individual tracking algorithm used in the fusion. Experiments performed on the AMCOM FLIR data set verify the robustness of the proposed algorithm.

ACKNOWLEDGMENT

The authors would like to thank R. Sims and W. Sanders for many rewarding discussions.

REFERENCES

[1] M. S. Alam and M. A. Karim, "Multiple target detection using a modified fringe-adjusted joint transform correlator," Opt. Eng., vol. 33, pp. 1610–1617, 1994.
[2] A. Bal and M. S. Alam, "Automatic target tracking in FLIR image sequences," presented at the SPIE Defense and Security Symp., Apr. 2004.
[3] P. Bharadwaj and L. Carin, "Infrared-image classification using hidden Markov trees," IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 10, pp. 1394–1398, Oct. 2002.
[4] M. L. Cooper and M. I. Miller, "Information measures for object recognition accommodating signature variability," IEEE Trans. Inf. Theory, vol. 46, no. 7, pp. 1896–1907, Jul. 2000.
[5] C. Loo and M. S. Alam, "Invariant object tracking using fringe-adjusted joint transform correlation," Opt. Eng., to be published.
[6] A. Strehl and J. K. Aggarwal, "Detecting moving objects in airborne forward looking infrared sequences," Mach. Vis. Appl., vol. 11, pp. 267–276, 2000.
[7] L. Wang, S. D. Der, and N. M. Nasrabadi, "Automatic target recognition using a feature-decomposition and data-decomposition modular neural network," IEEE Trans. Image Process., vol. 7, no. 8, pp. 1113–1121, Aug. 1998.
[8] A. Yilmaz, K. Shafique, and M. Shah, "Target tracking in airborne forward looking infrared imagery," Image Vis. Comput., vol. 21, pp. 623–635, 2003.
Amer Dawoud received the B.Sc. degree in electrical engineering from Yarmouk University, Jordan, in 1988, the M.Sc. degree in electrical engineering from Kuwait University, Kuwait, in 1998, and the Ph.D. degree in systems design engineering from the University of Waterloo, Waterloo, ON, Canada, in 2002. He was a Postdoctoral Research Associate and part-time faculty member with the Electrical and Computer Engineering Department, University of South Alabama, Mobile. Currently, he is with Rotoflex International, Inc., as a Vision Systems Engineering Specialist. His research interests include image processing, computer vision systems, real-time imaging, computation acceleration, pattern recognition, data fusion, and target tracking in infrared imagery.
IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 15, NO. 2, FEBRUARY 2006
M. S. Alam (SM’98) is a Professor and Chair of the Electrical and Computer Engineering Department, University of South Alabama, Mobile. His research interests include ultrafast computer architectures and algorithms, image processing, pattern recognition, fiber optics, infrared systems, digital system design, and smart energy management and control. He is the author or coauthor of more than 280 published papers, including 121 articles in refereed journals and ten book chapters. He has presented more than 55 invited papers, seminars, and tutorials at international conferences and research institutions in the USA and abroad. Dr. Alam has received numerous awards for excellence in research, teaching, and service, including the 2003 Scholar of the Year award from the USA. He has served as the PI or Co-PI of many research projects totaling nearly 12 million dollars, supported by the National Science Foundation, FAA, DoE, ARO, AFOSR, WPAFB, and ITT Industries. He is a Fellow of the OSA, SPIE, and the IEE (U.K.), and a member of the ASEE and AIP. He was the Chairman of the Fort Wayne Section of the IEEE from 1995 to 1996.
A. Bal received the B.S. degree in electronics and telecommunication engineering from Istanbul Technical University, Istanbul, Turkey, in 1993, and the M.S. and Ph.D. degrees in electrical engineering from Yildiz Technical University, Istanbul, in 1997 and 2002, respectively. He was a faculty member at Yildiz Technical University. He has been a Postdoctoral Research Associate with the Electrical and Computer Engineering Department, University of South Alabama, Mobile, since December 2002. His research interests include digital-optical signal and image processing, pattern recognition, artificial neural networks, machine learning, wavelet theory, and data and decision fusion.
C. Loo received the B.S. and M.S. degrees in electrical engineering from the University of South Alabama, Mobile, in 2001 and 2003, respectively. Currently, he holds a research position with the Department of Electrical and Computer Engineering, University of South Alabama. His research interests are automatic target recognition (ATR) and remote sensing.