Multisensor Image Fusion Using a Region-Based Wavelet Transform Approach Zhong Zhang and Rick S. Blum Electrical Engineering and Computer Science Department Lehigh University, Bethlehem, PA 18015
[email protected] [email protected]

Abstract

Using multiple sensors in a vision system can significantly reduce both human and machine errors in the detection and recognition of objects. A particular case of interest is where images from possibly different types of sensors are to be combined. An image fusion scheme is proposed which combines aspects of feature-level fusion with pixel-level fusion. Images are fused by combining their wavelet transforms. Important features identified in each image, such as edges and regions of interest, are used to guide the fusion process. Experiments show that this algorithm works well in many situations.
1 Introduction

In recent years, multisensor image fusion has received significant attention for both military and industrial applications. Concealed weapon detection (CWD) is one interesting application. CWD appears to be a critical technology for dealing with terrorism. Detecting concealed weapons is especially difficult when one wants to monitor an area where portal systems are not practical. Portable systems, which could be placed in a police car, would be desirable. Due to the difficult nature of the problem, an extensive study indicated that no single sensor technology can provide acceptable performance over all of the scenarios of interest [Currie et al.-1996]. This justifies a study of fusion techniques to achieve improved CWD procedures. A number of compatible sensor technologies have already been identified which could provide improved performance if a fusion scheme were available [Currie et al.-1996].
This work is supported in part by the ONR/DOD MURI program, contract N00014-95-1-0601. The source images in Fig. 3 were obtained from Thermotex Corporation.
Most of these technologies produce images, so image fusion is of interest. We use the term image fusion to denote a process by which multiple images, or information from multiple images, are combined. These images may be obtained from different types of sensors. The majority of research on image fusion can be classified into two categories: pixel-level image fusion and feature-level image fusion [Luo and Kay-1995]. Pixel-level fusion generates a fused image in which each pixel is determined from a set of pixels in the source images. The fused image is expected to be such that the performance of a particular task of interest, such as object detection, is improved. Feature-level fusion first employs feature extraction separately on each image and then performs the fusion based on the extracted features. It enables the detection of useful features with higher confidence, and a fused image is not necessarily generated in this case. Currently, most research appears to focus on pixel-level fusion. Image fusion based on pyramid decomposition is one of the popular fusion methods. Pyramid decomposition methods construct a fused pyramid representation from the pyramid representations of the original images. The fused image is then obtained by taking an inverse pyramid transform. The approach was apparently first introduced in [Burt and Adelson-1983, Burt-1984] for image coding and for binocular fusion in human vision. Several other pyramid-based image fusion schemes were proposed in [Toet-1990, Akerman III-1992, Burt and Kolczynski-1993]. More recently, approaches based on the wavelet transform have begun to receive considerable attention [Ranchin et al.-1993, Chipman et al.-1995, Li et al.-1995]. In [Ranchin et al.-1993], the authors studied fusion based on multiresolution
image decomposition and reconstruction using the wavelet transform. They presented a technique for enhancing the spatial resolution of a SPOT image using another image from a different band of the same satellite. In [Chipman et al.-1995], the focus is on fusing multispectral aerial photos using a set of basic operations on particular sets of wavelet coefficients which correspond to certain frequency bands. In [Li et al.-1995], a wavelet transform approach is considered which uses an area-based maximum selection rule and a consistency verification step. The wavelet-transform-based approaches exhibit advantages in terms of compactness, directional selectivity and orthogonality [Li et al.-1995]. However, previous research has considered relatively simple methods for combining the wavelet coefficients, which do not make full use of the spatial information contained in the source images. In this paper, we illustrate a wavelet-transform-based image fusion approach in which we combine aspects of both pixel-level and feature-level fusion. The feature used is an object or region of interest, which we refer to simply as a region here. Since objects and parts of objects carry the information of interest, it is reasonable to focus on them in the fusion algorithm. Previous approaches consider each pixel in the image separately, or at most the pixel and its close neighbors, such as a 3 x 3 or 5 x 5 window; they neglect the fact that each pixel is just one part of an object or region, and the objects are what we are really interested in. By considering each pixel separately, noise and blurring effects are often introduced during the fusion process. Our region-based method appears to be a good way to address these problems. In this paper, we consider fusion of two images only; extensions to more than two images can be developed in a similar way. The proposed fusion scheme is described in Section 2. Some experimental results are presented in Section 3. Section 4 presents our conclusions.
2 The Region-Based Image Fusion Scheme

We take it as a prerequisite that the source images are registered, so that the corresponding pixels are aligned. The discrete wavelet transform of each of the two registered images is computed. Then, using a scheme discussed later, the decision map is generated. Each pixel of the decision map denotes which image best describes that pixel. Based on the decision map, we fuse the two images in the wavelet transform domain. The final fused image is obtained by taking the inverse wavelet transform.
For each source image, the edge image, region image and region activity table are generated as shown in Figure 1. Next, the region activity tables of the two images are used to create the fusion decision map. This is also illustrated in Figure 1. Each pixel in the fusion decision map tells which image should be used to provide the wavelet coefficients related to the corresponding pixel in the region image.

Figure 1: Data flow for creating the decision map (for each source image: wavelet coefficients, an edge image, a region image and a region activity table; the two region activity tables are combined by the fusion criterion to produce the fusion decision map)
2.1 Wavelet Transform

The wavelet transform [Vetterli and Herley-1992, Mallat-1989] of an image provides a multiscale pyramid decomposition of the image. This decomposition will typically have several stages. There are four frequency bands after each decomposition stage: the low-low, low-high, high-low and high-high bands. The next stage of the decomposition operates only on the low-low part of the previous result. This produces a pyramid hierarchy as shown in Figure 2, in which the top of the pyramid, denoted by LL2, is a low-low frequency band. We can think of this low-low band as the lowpass-filtered and subsampled source image. All the other bands, which we call high frequency bands, contain transform coefficients that reflect the differences between neighboring pixels and thus can be positive or negative. If we are dealing with grayscale images, then the absolute values of the high frequency coefficients represent the intensity of the brightness fluctuations of the scene at a given scale. Larger values imply more distinct brightness changes, which typically correspond to the salient features of objects. Thus, a simple fusion rule is to select the larger absolute value of the two corresponding wavelet coefficients from the two source images. There are two disadvantages of this method: it may have high sensitivity to noise and it may produce a blurring effect. To eliminate these undesirable features,
we first divide each image into different objects and regions. Then, instead of performing the fusion pixel by pixel, we make the decision object by object and region by region. Thus, in the fused image, each object is described by the data from the clearer of the two images.

Figure 2: Pyramid hierarchy of the wavelet transform (LL2 at the top; LH2, HL2 and HH2 at resolution level 2; LH1, HL1 and HH1 at resolution level 1; L denotes a low frequency band and H a high frequency band)
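For reference, the simple coefficient-maximum rule discussed in this subsection can be written out in a few lines. This is not the proposed region-based method, only the baseline whose noise sensitivity and blurring motivate it. The sketch assumes the PyWavelets package, two registered grayscale arrays of the same size, and simple averaging of the low-low band (a common choice that the text does not specify).

```python
import numpy as np
import pywt

def max_rule_fusion(img_a, img_b, wavelet="haar", levels=2):
    """Baseline fusion: in the high frequency bands, keep the coefficient
    with the larger absolute value; average the low-low band."""
    ca = pywt.wavedec2(img_a, wavelet, level=levels)
    cb = pywt.wavedec2(img_b, wavelet, level=levels)
    fused = [(ca[0] + cb[0]) / 2.0]               # low-low (approximation) band
    for det_a, det_b in zip(ca[1:], cb[1:]):      # (LH, HL, HH) per level
        fused.append(tuple(np.where(np.abs(a) >= np.abs(b), a, b)
                           for a, b in zip(det_a, det_b)))
    return pywt.waverec2(fused, wavelet)
```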
2.2 Region Labeling

We apply Canny edge detection to the low-low band of the coefficient pyramid obtained from the wavelet transform. The low-low band has a lower resolution than the source image, but it still contains the spatial region information. The output of the Canny detector is an edge image, on which region segmentation is performed using a labeling algorithm described in [Zhang and Blum-1997]. The result is a labeled image in which each distinct value represents a different region and zero corresponds to edge pixels. The focus of this paper is on image fusion: while we have employed specific edge detection and region labeling algorithms, other edge detection and region labeling algorithms, from current or future studies, can easily be substituted for ours. The edge detection and labeling algorithms we chose are not necessarily the best; they simply illustrate our approach.
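A minimal sketch of this step, substituting scikit-image's Canny detector and SciPy's connected-component labeling for the particular detector and labeling algorithm of [Zhang and Blum-1997] (the text notes that such substitutions are acceptable); normalizing the low-low band before edge detection is our own precaution, not part of the paper:

```python
import numpy as np
import pywt
from scipy import ndimage
from skimage import feature

def label_regions(img, wavelet="haar", levels=2, sigma=1.0):
    """Edge-detect the low-low band, then label the connected non-edge areas.
    Returns (edge_image, region_image); region labels are 1, 2, ... and
    0 marks edge pixels, as in Section 2.2."""
    ll = pywt.wavedec2(img, wavelet, level=levels)[0]   # low-low band
    ll = (ll - ll.min()) / (np.ptp(ll) + 1e-12)         # scale to [0, 1]
    edges = feature.canny(ll, sigma=sigma)              # boolean edge image
    region_image, _ = ndimage.label(~edges)             # edge pixels stay 0
    return edges, region_image
```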
2.3 Fusion

Information about the salient features of an object is partially captured by the magnitude of the high frequency wavelet coefficients corresponding to that object. Consider two regions of similar size and signal-to-noise ratio (SNR) in two registered images, each representing the same object in a real scene. The region with the larger-magnitude high frequency components will generally contain more detail. Under this assumption, we first calculate the activity level of each region as the average of the absolute values of its high frequency band wavelet coefficients. Next, we generate the decision map according to the activity level, size and position of each object or region. If the SNRs of the two images are different, this can be accounted for by introducing a weight factor in the activity level calculation.

Recall that the region image corresponds to the low-low band of the wavelet coefficient pyramid. The activity level of region $k$ in source image $n$, $AL_n(k)$, is given by

$$AL_n(k) = \frac{1}{N_k} \sum_{j=1}^{N_k} P_j \qquad (1)$$

where $N_k$ is the total number of pixels in region $k$ and $P_j$ is the activity intensity of pixel $j$ in region $k$, given by

$$P_j = \frac{W}{M} \sum_{m=1}^{M} \frac{1}{3 \cdot 2^{2(M-m)}} \sum_{i=1}^{3 \cdot 2^{2(M-m)}} \left| C_i^j \right| \qquad (2)$$

where $W$ is a weight determined by the SNR of the image and other factors, $M$ is the number of wavelet decomposition levels, and $C_i^j$ is one of the wavelet coefficients in the high frequency bands corresponding to pixel $j$. The inner sum in (2) is over all the wavelet coefficients that correspond to pixel $j$ in the high frequency bands of the $m$-th decomposition stage.
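Equations (1) and (2) can be transcribed directly, assuming a region image at the resolution of the coarsest low-low band (as produced in Section 2.2), the Haar wavelet so that band sizes halve exactly at each stage, and a caller-supplied weight W; this is a sketch, not the authors' code:

```python
import numpy as np
import pywt

def activity_levels(img, region_image, wavelet="haar", levels=2, weight=1.0):
    """Per-region activity levels AL(k) from equations (1) and (2).
    region_image holds region labels at low-low-band resolution (0 = edge)."""
    coeffs = pywt.wavedec2(img, wavelet, level=levels)
    rows, cols = region_image.shape
    P = np.zeros((rows, cols))
    for idx, details in enumerate(coeffs[1:]):    # coeffs[1] is stage M, ...
        m = levels - idx                          # decomposition stage
        s = 2 ** (levels - m)                     # band pixels per map pixel, per axis
        block = np.zeros((rows, cols))
        for band in details:                      # LH, HL, HH at stage m
            a = np.abs(band[:rows * s, :cols * s])
            block += a.reshape(rows, s, cols, s).sum(axis=(1, 3))
        P += block / (3 * s * s)                  # average of 3 * 2^{2(M-m)} values
    P *= weight / levels                          # equation (2)
    activity = {}
    for k in np.unique(region_image):
        if k == 0:                                # skip edge pixels
            continue
        activity[k] = P[region_image == k].mean() # equation (1)
    return activity, P
```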
Next, we describe how to produce the binary decision map. Suppose we have two registered images A and B to be fused. If a given pixel in the decision map is a "1", then all the wavelet coefficients corresponding to that pixel are taken from image A; if the pixel is a "0", they are taken from image B. A specific pixel P(i,j) of the decision map may be:
1. in region m of image A and region n of image B,
2. an edge point in one image and inside a region in the other image, or
3. an edge point in both images.
We assign the value of each pixel in the decision map according to the following criteria (one possible reading of the first case is sketched after the list):
- Small regions are preferred over large regions when comparing activity levels.
- Edge points are preferred over non-edge points when comparing activity levels.
- A high activity level is preferred over a low activity level.
- Decisions are made for non-edge points first; their neighbors are then considered when deciding edge points.
- Isolated points in the decision map are avoided.
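The criteria above are stated qualitatively, so the following is only one possible reading of the simplest case (case 1, a pixel lying in a labeled region of both images): compare the two regions' activity levels and take the coefficients from the more active one. The handling of edge points, the small-region preference and the suppression of isolated points are omitted here, and the helper names are our own.

```python
import numpy as np

def decide_region_pixels(regions_a, regions_b, activity_a, activity_b):
    """Assign 1 (take image A) or 0 (take image B) to pixels that lie in a
    labeled region of both images, by comparing region activity levels.
    Pixels involving an edge (label 0) are left as -1 and would be decided
    afterwards from their already-decided neighbors."""
    lookup_a = np.vectorize(lambda k: activity_a.get(k, 0.0))
    lookup_b = np.vectorize(lambda k: activity_b.get(k, 0.0))
    decision = -np.ones(regions_a.shape, dtype=np.int8)
    both = (regions_a > 0) & (regions_b > 0)
    decision[both] = (lookup_a(regions_a)[both] >=
                      lookup_b(regions_b)[both]).astype(np.int8)
    return decision
```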
A binary decision map is now created to fuse the two wavelet coefficient arrays into one. Each pixel in the decision map corresponds to a set of wavelet coefficients in each frequency band at every decomposition level. The size of the decision map is $1/2^M$ of the original image in each dimension, where $M$ is the number of decomposition stages. The value of $M$ should not be too small; otherwise we cannot take advantage of the decrease in image size provided by the wavelet transform, and the computational complexity increases sharply. A very large number of decomposition stages is also undesirable, since the resolution available for region detection becomes low. In practice, the choice of $M$ is made according to the size and resolution of the source images. For our second example in Fig. 4, which uses 512 x 512 source images, an appropriate number of decomposition stages is two. In this case, the edge image, region image and decision map are each 128 x 128.
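Applying the decision map then amounts to replicating each of its pixels over the block of coefficients it governs in every band and level, selecting from image A where the map is 1 and from image B where it is 0, and inverting the transform. A minimal sketch under the same PyWavelets assumptions as above, with nearest-neighbor replication of the map to each band's size:

```python
import numpy as np
import pywt

def fuse_with_decision_map(coeffs_a, coeffs_b, decision_map, wavelet="haar"):
    """decision_map: 1 -> take the coefficients from A, 0 -> from B.
    It has the size of the coarsest low-low band, i.e. 1/2^M of the image."""
    def expand(mask, shape):
        # Nearest-neighbor replication of the map to a band's shape.
        r = (np.arange(shape[0]) * mask.shape[0]) // shape[0]
        c = (np.arange(shape[1]) * mask.shape[1]) // shape[1]
        return mask[np.ix_(r, c)]

    fused = [np.where(decision_map, coeffs_a[0], coeffs_b[0])]   # low-low band
    for det_a, det_b in zip(coeffs_a[1:], coeffs_b[1:]):
        fused.append(tuple(np.where(expand(decision_map, a.shape), a, b)
                           for a, b in zip(det_a, det_b)))
    return pywt.waverec2(fused, wavelet)
```

With 512 x 512 inputs and M = 2, the map is 128 x 128: each of its pixels selects a single coefficient in each level-2 band and a 2 x 2 block in each level-1 band.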
3 Experimental Results

We tested our algorithm on several pairs of images; some of the results are described here. Figure 3 shows a pair of visual and 94 GHz millimeter-wave (MMW) images. The visual image provides the outline and the appearance of the people, while the MMW image shows the existence of a gun. From the fused image, we can clearly and promptly see that the person on the right has a concealed gun beneath his clothes. This fused image may be very helpful to a police officer, for example, who must respond quickly. Figure 4 shows a pair of multifocus test images. In one image, the focus is on the Pepsi can; in the other, the focus is on the testing card. In the fused image, the Pepsi can, the table, and the testing card are all in focus. These examples illustrate that our algorithm works whether the images come from the same type of sensor or from different types of sensors.
4 Conclusion

We have presented a new approach to multisensor image fusion which combines the frequency information from the wavelet transform with the spatial information from the original image. We use a particular image feature, regions that we believe represent objects, to guide the fusion process. Since objects and parts of objects carry the information of interest, it is reasonable to focus on them in the fusion algorithm. Concealed weapon detection is one interesting application of our fusion algorithm; however, the algorithm can also be used in other applications.
References

[Akerman III, 1992] A. Akerman III. Pyramid techniques for multisensor fusion. In Sensor Fusion V, volume 1828, pages 124–131. SPIE, 1992.

[Burt and Adelson, 1983] P. J. Burt and E. Adelson. The Laplacian pyramid as a compact image code. IEEE Trans. Communications, COM-31(4):532–540, 1983.

[Burt and Kolczynski, 1993] P. J. Burt and R. J. Kolczynski. Enhanced image capture through fusion. In Proc. 4th Intl. Conf. on Computer Vision, pages 173–182, Berlin, Germany, May 1993.

[Burt, 1984] P. J. Burt. The pyramid as a structure for efficient computation. In Multiresolution Image Processing and Analysis, pages 6–35. Springer-Verlag, 1984.

[Chipman et al., 1995] L. J. Chipman, T. M. Orr and L. N. Graham. Wavelets and image fusion. In Wavelet Applications in Signal and Image Processing III, volume 2569, pages 208–219. SPIE, 1995.

[Currie et al., 1996] N. Currie, F. Demma, D. Ferris, R. McMillan, M. Wicks, and K. Zyga. Imaging sensor fusion for concealed weapon detection. In SPIE Symposium on Enabling Technologies for Law Enforcement and Security: Investigative Image Processing. SPIE, Boston, MA, Nov. 1996.

[Li et al., 1995] H. Li, B. S. Manjunath and S. K. Mitra. Multisensor image fusion using the wavelet transform. Graphical Models and Image Processing, 57(3):235–245, May 1995.

[Luo and Kay, 1995] R. C. Luo and M. G. Kay. Multisensor Integration and Fusion for Intelligent Machines and Systems, pages 7–10. Ablex Publishing Corp., 1995.

[Mallat, 1989] S. G. Mallat. A theory for multiresolution signal decomposition: the wavelet representation. IEEE Trans. Pattern Anal. Machine Intell., PAMI-11:674–693, July 1989.

[Ranchin et al., 1993] T. Ranchin, L. Wald and M. Mangolini. Efficient data fusion using wavelet transform: the case of SPOT satellite images. In Mathematical Imaging: Wavelet Applications in Signal and Image Processing, volume 2934, pages 171–178. SPIE, 1993.

[Toet, 1990] A. Toet. Hierarchical image fusion. Mach. Vision Appl., pages 1–11, Mar. 1990.

[Vetterli and Herley, 1992] M. Vetterli and C. Herley. Wavelets and filter banks: theory and design. IEEE Trans. Signal Processing, 40:2207–2232, September 1992.

[Zhang and Blum, 1997] Z. Zhang and R. S. Blum. Region-based image fusion scheme for concealed weapon detection. In Proc. 30th Conf. on CISS, March 1997.