1846
IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, VOL. 54, NO. 5, OCTOBER 2005
Automatic Target Tracking in FLIR Image Sequences Using Intensity Variation Function and Template Modeling

A. Bal and M. S. Alam, Senior Member, IEEE
Abstract—A novel automatic target tracking (ATT) algorithm for tracking targets in forward-looking infrared (FLIR) image sequences is proposed in this paper. The proposed algorithm efficiently utilizes the target intensity feature, surrounding background, and shape information for tracking purposes. This algorithm involves the selection of a suitable subframe and a target window based on the intensity and shape of the known reference target. The subframe size is determined from the region of interest and is constrained by target size, target motion, and camera movement. Then, an intensity variation function (IVF) is developed to model the target intensity profile. The IVF model generates the maximum peak value where the reference target intensity variation is similar to the candidate target intensity variation. In the proposed algorithm, a control module has been incorporated to evaluate IVF results and to detect a false alarm (missed target). Upon detecting a false alarm, the controller triggers another algorithm, called template model (TM), which is based on the shape knowledge of the reference target. By evaluating the outputs from the IVF and TM techniques, the tracker determines the real coordinates of one or more targets. The proposed technique also alleviates the detrimental effects of camera motion, by appropriately adjusting the subframe size. Experimental results using real-life long-wave and medium-wave infrared image sequences are shown to validate the robustness of the proposed technique.

Index Terms—Automatic target tracking (ATT), intensity variation function (IVF), long-wave infrared imagery, medium-wave infrared imagery, template model.
I. INTRODUCTION
Target detection, recognition, and tracking in forward-looking infrared (FLIR) imagery are challenging problems. For target tracking, FLIR imagery poses different types of challenges such as low signal-to-noise ratio, background clutter, unpredictable camera motion, target occlusion, and illumination variation due to weather conditions. In addition, the presence of stationary or moving nontarget objects in the input scene complicates the detection and tracking process. Several algorithms for target detection and tracking in FLIR imagery have been reported in the literature which may alleviate some of the aforementioned shortcomings [1]–[5]. While a particular
Manuscript received January 3, 2004; revised April 25, 2005. This work was supported by a grant from the Army Research Office under Grant 43004-CI. The authors are with the Department of Electrical and Computer Engineering, University of South Alabama, Mobile, AL 36688-0002 USA (e-mail:
[email protected];
[email protected]). Digital Object Identifier 10.1109/TIM.2005.855090
algorithm yields good results for target tracking with some sequences, the same technique produces unsatisfactory results for other FLIR sequences. Only a limited amount of work addressing the aforementioned problems has been reported in the literature. In a recent study, Strehl and Aggarwal [1] proposed a segmentation and Bayesian estimation-based detection and motion estimation approach. Yilmaz et al. [2], [5] applied statistical methods and sensor ego-motion compensation. Mahalanobis et al. [3] defined and utilized a signal-to-clutter metric for target tracking. Sun et al. [4] focused on modeling and artificial neural network-based classification for target detection and tracking. In addition, Kwon et al. [6] and Borghys et al. [7] utilized data fusion methods for detection and tracking using data from multiple sensors. Alam et al. [8] investigated the joint transform correlation approach for multiple target detection.

In this paper, we propose a novel automatic target tracking (ATT) algorithm for target detection and tracking in long-wave and medium-wave FLIR imagery involving targets affected by motion blur, illumination variation, noise, in-plane and out-of-plane distortions, as well as other three-dimensional artifacts. The proposed algorithm consists of three building blocks: the IVF, the controller, and the template model (TM). In addition, the proposed approach alleviates the detrimental effects due to camera motion. In the proposed technique, a target IVF is used to detect the location of the target in consecutive frames. Since the IVF is produced by the target window from the previous frame, this function yields the maximum peak value when there is a match between the intensity variations of the known reference target and the unknown candidate target window in the subframe. The subframe is chosen via the segmentation process to ensure that the target is present in the region of interest [9], [10].
However, for some frames, we may get more than one maximum peak value due to background or other nontarget-related effects leading to false alarms. To overcome these problems, the TM technique is used which utilizes shape information, another distinct property of the target. Consequently, the tracker determines new target coordinates by evaluating the outputs available from the IVF and TM techniques. Experimental results using real-life FLIR imagery are presented to verify the robustness of the proposed technique. This paper has been organized as follows. Each component of the proposed tracking algorithm is introduced in Section II. The tracker system for single and multiple targets is described
0018-9456/$20.00 © 2005 IEEE
BAL AND ALAM: AUTOMATIC TARGET TRACKING IN FLIR IMAGE SEQUENCES
1847
Fig. 1. Automatic target tracking algorithm block diagram.
in Section III. Experimental results using real FLIR image sequences are also included in Section III. Finally, concluding comments are included in Section IV.
Fig. 2. (a) Target window for the long-wave sequence (lwir-1918). (b) Subframe (33 × 33) selected for the subsequent frame.
II. ATT ALGORITHM

The ATT algorithm has three modules, as illustrated in Fig. 1. First, image preprocessing, extraction of the local maximum, and subframe segmentation are performed on two consecutive frames. Then, the first module, called the IVF module, is initiated for target detection. The second module, called the controller, evaluates the results of the IVF module using a distance metric. When the reliability of the results obtained from the IVF module is low, a third module, called the TM module, is triggered by the controller. Otherwise, the IVF results are passed to the evaluation module to determine the new location of the target.

A. Intensity Variation Function (IVF)

The proposed IVF-based target tracking approach primarily utilizes target intensity information. In general, target intensity changes between consecutive frames, but these changes are not abrupt. In addition, the generation and maintenance of kinetic energy by the target usually causes brighter spots or regions on the target in FLIR image sequences. The IVF technique is formulated using the aforementioned intensity properties of the target, given by

$$\mathrm{IVF}_k(m,n)=\sqrt{\frac{1}{PQ}\sum_{i=1}^{P}\sum_{j=1}^{Q}\left[S_k(m+i,\,n+j)-\lambda_{k-1}\right]^{2}}\qquad(1)$$

where $\mathrm{IVF}_k(m,n)$ is the IVF for the subframe matrix $S_k$ in the $k$th frame, $P\times Q$ is the size of the target window, and $\lambda_{k-1}$ is the local maximum value of the target from the $(k-1)$th frame, computed over a small window as

$$\lambda_{k-1}=\frac{1}{9}\sum_{i=-1}^{1}\sum_{j=-1}^{1}T_{k-1}(x^{*}+i,\;y^{*}+j)\qquad(2)$$

where $(x^{*},y^{*})$ is the spatial coordinates of the maximum intensity pixel in the rectangular window $T_{k-1}$ which includes the target, as shown in Fig. 2(a). In addition, a region of interest in the form of a subframe corresponding to each candidate target is selected from the subsequent frame. The target window is used for searching the target within the subframe, as illustrated in Fig. 2(b). Furthermore, to accommodate camera movement as well as target motion, a larger (33 × 33 pixels) subframe is used in this work.
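The search implied by (1) and (2) can be sketched in a few lines of NumPy. This is a minimal illustration only; the function names, the choice of averaging the 3 × 3 neighborhood of the brightest pixel, and the window sizes are assumptions, not the authors' code.

```python
import numpy as np

def local_max_value(target_win):
    # Hedged reading of (2): mean of the 3x3 neighborhood centered on the
    # brightest pixel of the previous frame's target window.
    r, c = np.unravel_index(np.argmax(target_win), target_win.shape)
    r0, r1 = max(r - 1, 0), min(r + 2, target_win.shape[0])
    c0, c1 = max(c - 1, 0), min(c + 2, target_win.shape[1])
    return target_win[r0:r1, c0:c1].mean()

def ivf_map(subframe, win_shape, lam):
    # (1): RMS deviation of every candidate window in the subframe from the
    # local-maximum value lam; the best match gives the smallest value.
    P, Q = win_shape
    H, W = subframe.shape
    out = np.empty((H - P + 1, W - Q + 1))
    for m in range(H - P + 1):
        for n in range(W - Q + 1):
            win = subframe[m:m + P, n:n + Q]
            out[m, n] = np.sqrt(np.mean((win - lam) ** 2))
    return out
```

The exponential mapping of (3) then turns the smallest IVF value into the highest peak of the correlation output plane, e.g. `np.exp(-alpha * ivf_map(...))`.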
Although the IVF has been formulated using standard deviation properties [2], [11], and [12], in this research, we utilized the local maximum value from the previous frame instead of the mean value, which provides superior results. It may be mentioned that the selection of mean value has some disadvantages.
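A toy example (illustrative numbers only) makes the difference concrete: a single bright noise spike in the window shifts the window mean, while the maximum-based statistic of the target region is unchanged.

```python
import numpy as np

# A 5x5 window containing a bright 3x3 target on a dark background.
win = np.zeros((5, 5))
win[1:4, 1:4] = 10.0

mean_clean = win.mean()   # mean used by the standard deviation formulation
max_clean = win.max()     # local maximum used by the IVF formulation

# Corrupt a single background pixel with a bright noise spike.
noisy = win.copy()
noisy[0, 4] = 10.0

mean_noisy = noisy.mean() # the mean shifts upward
max_noisy = noisy.max()   # the local maximum is unchanged
```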
Fig. 3. (a) First frame of a long-wave sequence (lwir-1913). (b) Results obtained using the standard deviation approach. (c) Results obtained using the IVF algorithm.
For example, it is easily affected by background variation, noise, camera zooming, rotation and scale variations, partial occlusion of the target, as well as other artifacts. However, the local maximum value is not generally affected by the aforementioned factors, and it propagates the reference target intensity information from the previous frame to the subsequent frame. In addition, it contains distinguishing properties corresponding to other targets and nontarget objects that may be present in the input scene. Furthermore, a small window (e.g., 3 × 3 pixels) is used to determine the local maximum instead of using one maximum value based on a single pixel, which can be easily affected by noise or other surrounding pixels. Also, an arbitrary pixel value may be higher than the maximum pixel value corresponding to the candidate target. To compare the performance of the standard deviation and IVF techniques, we considered an image sequence supplied by the Army Missile Command (AMCOM). Fig. 3 shows the results obtained using the standard deviation and IVF techniques. Fig. 3(a) depicts the first frame of a long-wave sequence (lwir-1913), Fig. 3(b) shows the results obtained using the standard deviation approach, and Fig. 3(c) illustrates the results obtained using the IVF technique. The standard deviation technique shows the difference between the background and the target as well as nontarget objects. However, the IVF technique represents the target by incorporating the variation around the local maximum, which carries inherent information of the target from the previous frames. To represent the candidate target coordinates in the form of a peak value, an exponential function, called the correlation output plane, is used as shown in the following equation:

$$C_k(m,n)=\exp\left[-\alpha\,\mathrm{IVF}_k(m,n)\right]\qquad(3)$$

where $\alpha$ is a constant and $(m,n)$ represents the coordinates of the target window in the subframe. The correlation output plane corresponding to (3) is shown in Fig. 4. Fig. 4(a) shows the
Fig. 4. (a) First frame of a long-wave sequence (lwir-1913). (b) Correlation output plane.

first frame of a long-wave sequence (lwir-1913), and the corresponding correlation output plane is shown in Fig. 4(b). From Fig. 4, it is evident that the candidate target coordinates can be obtained from the maximum peak value by using the following equations:

$$x_k=x_{k-1}-\frac{s}{2}+m^{*},\qquad y_k=y_{k-1}-\frac{s}{2}+n^{*}\qquad(4)$$

where $(x_k,y_k)$ represents the new coordinates of the target in the $k$th frame, $(x_{k-1},y_{k-1})$ represents the coordinates of the target in the $(k-1)$th frame, $s$ denotes the subframe size, and $(m^{*},n^{*})$ denotes the coordinates corresponding to the maximum peak value in the correlation plane, given by

$$(m^{*},n^{*})=\arg\max_{(m,n)}C_k(m,n)\qquad(5)$$

where $(m,n)$ is limited by the subframe size.

TABLE I
FALSE ALARM RATE FOR THE IVF TECHNIQUE

B. Template Model (TM)

Although the IVF technique produces robust results, occasionally it may generate false alarms due to background effects or neighborhood properties of the target which may appear around one or more pixels of the target. To alleviate the false alarms, we utilized the TM approach in this work. The proposed technique can be implemented in two ways.
• The IVF and TM techniques can be executed simultaneously and the results can be combined using some criteria or a suitable data fusion technique [5], [6].
• The IVF technique is executed first and, based on a control criterion, the TM algorithm is initiated.
In this paper, we used the second, i.e., the controller-based approach, because of the difficulty associated with finding the best criterion for all FLIR sequences. The occasional generation of false alarms by the IVF technique can be alleviated by implementing a suitable criterion in the controller such that the controller triggers the execution of the TM model for a sequence generating false alarms. Thus, the TM model is executed for very few sequences. Table I shows the false alarm rate for 10 sequences obtained from AMCOM using the IVF technique. From Table I, it is evident that the minimum false alarm rate is 1.79% while the maximum is 23.71% for the IVF technique. The aforementioned controller module uses a distance metric between the reference image and the new target candidate introduced from the subsequent frame, which is defined by

$$d=\sqrt{(x_c-x_r)^{2}+(y_c-y_r)^{2}}\qquad(6)$$

When

$$d>\varepsilon\qquad(7)$$

the TM technique is initiated. In (6) and (7), $\varepsilon$ is a constant, $(x_c,y_c)$ represents candidate target coordinates, and $(x_r,y_r)$ denotes reference target coordinates. The value of $\varepsilon$ is chosen by considering the computational complexity and the sensitivity needed for tracking a target.

In the TM technique, a rectangular window which is larger than the reference target size is used to obtain the target shape information. Therefore, this window contains the target as well as surrounding background information, which provides distinguishing properties, such as boundary information and variation
Fig. 5. (a) Frame #11 of the long-wave sequence (lwir-1913) used in Fig. 3. (b) False alarm generated by the IVF technique which shows two maximum peak values at coordinates (17, 14) and (17, 15). (c) Improved result obtained using the TM technique which shows the correct coordinates at (16, 15).
with background, to differentiate between other similar target or nontarget objects. The TM technique may be modeled as

$$\mathrm{TM}_k(m,n)=\frac{1}{PQ}\sum_{i=1}^{P}\sum_{j=1}^{Q}\left|S_k(m+i,\,n+j)-T_{k-1}(i,j)\right|\qquad(8)$$

where $\mathrm{TM}_k(m,n)$ represents the TM-based matching result for the $(m,n)$th region of the subframe in the $k$th frame, and $P\times Q$ represents the size of the target window $T_{k-1}$. To construct the new correlation output plane for the TM technique, an exponential term, similar to (3), is used as shown in the following equation:

$$C'_k(m,n)=\exp\left[-\beta\,\mathrm{TM}_k(m,n)\right]\qquad(9)$$

In (9), $C'_k(m,n)$ represents the correlation output plane for the $(m,n)$th region and $\beta$ is a constant. Sometimes, more than one peak value may be generated owing to background or target neighborhood effects. For such cases, the distance metric is utilized for evaluating the IVF and TM results, which are based on the minimum distance between the results obtained from the two techniques. Fig. 5
Fig. 6. Single-target tracking results. (a) Sample low-contrast frames (200, 222, 500, and 707) for a long-wave sequence (lwir 2115). (b) Sample frames (26, 139, 250, and 500) with noisy background in a medium-wave sequence (mwir 1415). (c) Sample frames (126, 140, 160, and 170) showing high-speed target and close proximity between targets and bright nontarget objects in a long-wave sequence (lwir 1907).
Fig. 7. Multiple-target tracking block diagram.
shows the false alarm for the IVF technique and the compensated results obtained using the TM algorithm. The new coordinates $(x_k,y_k)$ of the target being tracked are found by using the correlation output defined by

$$(m^{*},n^{*})=\arg\max_{(m,n)}C'_k(m,n)\qquad(10)$$

where $(m,n)$ is limited by the subframe size. Fig. 5(a) shows Frame #11 of the long-wave sequence (lwir-1913) used in Fig. 4. Fig. 5(b) depicts the false alarm generated by the IVF technique, which shows the maximum peak values at coordinates (17, 14) and (17, 15), respectively. Fig. 5(c) shows the improved results obtained using the TM technique, which depicts the accurate target coordinates at (16, 15).

III. EXPERIMENTAL RESULTS

In this section, real-life FLIR image sequences supplied by AMCOM are used to test the performance of the proposed algorithm. Each of these gray-level FLIR sequences has more than 200 frames, and each frame consists of 128 × 128 pixels. The FLIR sequences used in the experiment can be divided into two
classes: long-wave infrared (LWIR) and medium-wave infrared (MWIR) image sequences. It may be mentioned that the MWIR sequences involve noisy images with low-contrast targets. The proposed tracking algorithm is tested for both single- and multiple-target tracking. A detailed MATLAB software package has been developed to test the performance of the proposed algorithm.

A. Single-Target Tracking

The FLIR sequences under consideration include a single target affected by various degrading factors such as low contrast, noise, proximity to bright objects, camera motion, rotation, and scale changes. The proposed tracking algorithm shows satisfactory performance under these challenging scenarios. The proposed algorithm involves the following steps.
Step 1) In the first frame, the target coordinates are obtained from a ground truth file and the target window is selected in such a way that it is larger than the target size. The selection process for the target window is illustrated in Fig. 2(a).
Step 2) Use (2) to obtain the local maximum value. Use this region information to generate the IVF function around the maximum intensity pixels inside the subframe as shown in Fig. 4(a).
Step 3) The IVF technique may produce more than one maximum peak value or a false alarm for some frames due to background effects, low contrast, or other artifacts. Use the distance metric corresponding to (6) and (7) to detect the aforementioned false alarms.
Step 4) To compensate for the break points which may occasionally be generated due to false alarms in Step 3), the controller module triggers the TM technique to accurately determine the new coordinates corresponding to the desired target.
Step 5) By saving the new coordinates obtained in Step 4), the first step is initiated for the subsequent frame. Repeat the aforementioned steps until all frames of the sequence are processed.
The results obtained for single-target tracking using the proposed algorithm are shown in Fig. 6. Fig. 6(a) shows low-contrast target tracking in a long-wave infrared image sequence (lwir 2115). Fig. 6(b) demonstrates the performance of the algorithm for a noisy medium-wave infrared image sequence (mwir 1415). To illustrate the performance of the proposed algorithm for fast target tracking, another long-wave infrared image sequence (lwir 1907) is used, and the corresponding results are shown in Fig. 6(c).
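Steps 1)–5) can be condensed into a minimal tracking loop. This is a sketch only: the window and subframe sizes, the threshold `eps`, and the sum-of-absolute-differences score standing in for the TM stage are illustrative assumptions, not the authors' MATLAB implementation.

```python
import numpy as np

def track_single_target(frames, init_xy, win=3, sub=9, eps=2.0):
    """Sketch of Steps 1)-5): carry the previous frame's target window
    forward, search a subframe around the last position, and fall back to
    the template (TM) result when the IVF result jumps too far."""
    h, s = win // 2, sub // 2
    x, y = init_xy                       # Step 1: coordinates from ground truth
    tmpl = frames[0][x - h:x + h + 1, y - h:y + h + 1]
    coords = [(x, y)]
    for frame in frames[1:]:
        lam = tmpl.max()                 # Step 2: local maximum of previous window
        x0, y0 = x - s, y - s            # top-left corner of the subframe
        best_ivf, ivf_xy = None, (x, y)
        best_tm, tm_xy = None, (x, y)
        for m in range(sub - win + 1):
            for n in range(sub - win + 1):
                cand = frame[x0 + m:x0 + m + win, y0 + n:y0 + n + win]
                ivf = np.sqrt(np.mean((cand - lam) ** 2))  # cf. (1)
                tm = np.mean(np.abs(cand - tmpl))          # cf. (8), SAD stand-in
                cxy = (x0 + m + h, y0 + n + h)
                if best_ivf is None or ivf < best_ivf:
                    best_ivf, ivf_xy = ivf, cxy
                if best_tm is None or tm < best_tm:
                    best_tm, tm_xy = tm, cxy
        jump = np.hypot(ivf_xy[0] - x, ivf_xy[1] - y)   # Step 3: controller, cf. (6)-(7)
        x, y = tm_xy if jump > eps else ivf_xy          # Step 4: TM fallback
        coords.append((x, y))                           # Step 5: carry forward
        tmpl = frame[x - h:x + h + 1, y - h:y + h + 1]
    return coords
```

Running the loop on two synthetic frames with a small bright blob that moves by one pixel in each direction returns the updated blob center for the second frame.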
Fig. 8. Multiple-target tracking results. (a) Sample frames (2, 50, 60, and 80) showing low contrast and noisy background in a mid-wave FLIR sequence (mwir 1410). (b) Sample frames (210, 220, 230, and 270) showing close proximity between two targets and target rotation in a long-wave FLIR sequence (lwir 19NS). (c) Sample frames (67, 150, 180, and 300) showing similarity between the background and target appearance in a long-wave FLIR sequence (lwir 1812).

TABLE II
TRACKING RESULTS FOR LONG-WAVE AND MEDIUM-WAVE FLIR IMAGE SEQUENCES
B. Multiple-Target Tracking

FLIR sequences with multiple targets pose additional challenges such as similarity between targets, overlapping of targets, and proximity of targets to each other. In the literature, most target tracking applications are reported for single-target tracking. We modified the proposed algorithm for multiple-target tracking by adding features such as the target entrance/exit scenario. When a new target enters the input scene, target coordinates for the first frame are obtained either by using a suitable target detection algorithm or from a ground truth file. When a target leaves a frame, target exit is detected by comparing the border coordinates of the frame and the coordinates of the target location. The IVF and TM techniques distinguish targets from each other and ensure the tracking of similar or dissimilar targets while eliminating the tracking of a wrong target or nontarget objects. The block diagram for the multiple-target tracking algorithm is illustrated in Fig. 7. To verify the performance of the proposed algorithm for multiple-target tracking, FLIR sequences were chosen with painstaking care. Fig. 8(a) shows the results for a FLIR sequence (mwir 1410) which contains two targets. This sequence involves low-contrast frames, and the targets are located in close proximity. In addition, these targets are affected by the appearance of challenging high-contrast moving objects. Even when targets appear similar or are in close proximity, the IVF technique yields dissimilar peak values. For instance, Fig. 8(b) depicts the results obtained for another long-wave sequence (lwir 19NS) involving targets in close proximity. Fig. 8(c) shows another difficult long-wave sequence (lwir 1812) to demonstrate low signal-to-noise ratio (SNR) and low-contrast target tracking. Tracking performance is also shown in Table II
for long-wave and medium-wave sequences. From Table II, it is evident that the proposed algorithm effectively tracks both single and multiple targets. This is a significant improvement over similar results reported in the literature [2], [5] for comparable FLIR image sequences.

IV. CONCLUSION

In this paper, we proposed robust algorithms for single- and multiple-target tracking. These algorithms are primarily based on intensity and shape features for both MWIR and LWIR
image sequences. The IVF technique effectively distinguishes and tracks targets in challenging scenarios, such as similarity between targets and nontarget objects or input scenes with low SNR. When the IVF technique generates a false alarm, the shape-based TM algorithm is executed to ensure accurate determination of the target coordinates. Often, a fixed subframe size prevents accommodating the effects of camera motion; the proposed algorithm therefore adjusts the subframe size and accommodates target speed without any detrimental effect on tracking performance. Camera zooming causes scale variations of the target, which are accommodated by the IVF technique. A high-speed tracking algorithm is desirable for real-time applications; the proposed algorithm ensures high processing speed by eliminating redundant image preprocessing steps and by minimizing the number of computations. Finally, Table I demonstrates the robustness of the IVF technique, which produces only 9.2% false alarms on average. The TM technique fully compensates for these false alarms for most of the sequences, as illustrated in Table II. The slightly lower tracking performance for some targets may be attributed to targets blending with the background and targets overlapping with each other.

ACKNOWLEDGMENT

The authors would like to thank J. Khan for many rewarding discussions. A. Bal would like to thank Yildiz Technical University for granting the leave of absence to pursue research on this project.

REFERENCES

[1] A. Strehl and J. K. Aggarwal, "Detecting moving objects in airborne forward looking infrared sequences," Mach. Vision Appl. J., vol. 11, pp. 267–276, 2000.
[2] A. Yilmaz, K. Shafique, and M. Shah, "Target tracking in airborne forward looking infrared imagery," Image Vision Comput. J., vol. 21, pp. 623–635, 2003.
[3] A. Mahalanobis, A. R. Sims, and A. V. Nevel, "Signal-to-clutter measure for measuring automatic target recognition performance using complimentary eigenvalue distribution analysis," Opt. Eng., vol. 42, pp. 1144–1151, 2003.
[4] S. G. Sun and H. W. Park, "Automatic target recognition using boundary partitioning and invariant features in forward-looking infrared images," Opt. Eng., vol. 42, pp. 524–533, 2003.
[5] A. Yilmaz et al., "Target-tracking in FLIR imagery using mean-shift and global motion compensation," in Proc. IEEE Workshop Computer Vision Beyond Visible Spectrum, Kauai, HI, 2001.
[6] H. Kwon, S. Z. Der, and N. M. Nasrabadi, "Adaptive multisensor target detection using feature-based fusion," Opt. Eng., vol. 41, pp. 69–80, 2002.
[7] D. Borghys, P. Verlinde, C. Perneel, and M. Acheroy, "Multilevel data fusion for the detection of targets using multispectral image sequences," Opt. Eng., vol. 37, pp. 477–484, 1998.
[8] M. S. Alam and M. A. Karim, "Multiple target detection using a modified fringe-adjusted joint transform correlator," Opt. Eng., vol. 33, pp. 1610–1617, 1994.
[9] E. Oron, A. Kumar, and Y. Barshalom, "Precision tracking with segmentation for imaging sensors," IEEE Trans. Aerosp. Electron. Syst., vol. 29, no. 3, pp. 977–987, Jul. 1993.
[10] M. S. Alam, J. G. Bognar, R. C. Hardie, and B. J. Yasuda, "Infrared image registration and high resolution reconstruction using multiple translationally shifted aliased video frames," IEEE Trans. Instrum. Meas., vol. 49, no. 5, pp. 915–923, Oct. 2000.
[11] D. P. Hader, Image Analysis: Methods and Applications. Boca Raton, FL: CRC, 2001.
[12] M. Seul, L. O'Gorman, and M. J. Sammon, Image Analysis. Cambridge, U.K.: Cambridge Univ. Press, 2000.
A. Bal received the B.S. degree in electronics and telecommunication engineering from Istanbul Technical University, Istanbul, Turkey, in 1993, and the M.S. and Ph.D. degrees in electrical engineering from Yildiz Technical University, Istanbul, in 1997 and 2002, respectively. He has been a Faculty Member at Yildiz Technical University and a Research Scholar in the Electrical and Computer Engineering Department, University of South Alabama, Mobile, since December 2002. His research interests include digital-optical signal and image processing, pattern recognition, artificial neural networks, machine learning, wavelet theory, and data/decision fusion.
M. S. Alam (SM'95) is a Professor and Chair of the Electrical and Computer Engineering Department, University of South Alabama (USA), Mobile. His research interests include ultrafast computer architectures and algorithms, image processing, pattern recognition, fiber optics, infrared systems, digital system design, and smart energy management and control. He is the author or coauthor of more than 275 published papers, including 117 articles in refereed journals and 10 book chapters. He has presented over 55 invited papers, seminars, and tutorials at international conferences and research institutions in the USA and abroad. Prof. Alam received numerous research, teaching, and service excellence awards, including the 2003 Scholar of the Year Award from the USA. He served as the PI or Co-PI of many research projects totaling nearly $12M, supported by NSF, FAA, DoE, ARO, AFOSR, WPAFB, and ITT industry. He is a Fellow of the Optical Society of America (OSA), a Fellow of the Institution of Electrical Engineers (U.K.), a Fellow of the Society of Photo-Optical Instrumentation Engineers (SPIE), and a member of the American Society for Engineering Education (ASEE) and the American Institute of Physics (AIP). He was the Chairman of the Fort Wayne Section of IEEE for 1995–1996.