IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 40, NO. 10, OCTOBER 2002
Context-Driven Fusion of High Spatial and Spectral Resolution Images Based on Oversampled Multiresolution Analysis

Bruno Aiazzi, Luciano Alparone, Stefano Baronti, Member, IEEE, and Andrea Garzelli, Member, IEEE
Abstract—This paper compares two general and formal solutions to the problem of fusing multispectral images with high-resolution panchromatic observations. The former exploits the undecimated discrete wavelet transform, an octave bandpass representation achieved from a conventional discrete wavelet transform by omitting all decimators and upsampling the wavelet filter bank. The latter relies on the generalized Laplacian pyramid, another oversampled structure obtained by recursively subtracting from an image an expanded decimated lowpass version of itself. Both methods selectively perform spatial-frequency spectrum substitution from one image to another. In both schemes, context dependency is exploited by thresholding the local correlation coefficient between the images to be merged, to avoid injection of spatial details that are unlikely to occur in the target image. Unlike other multiscale fusion schemes, neither of the present decompositions is critically subsampled, thus avoiding possible impairments in the fused images due to missing cancellation of aliasing terms. Results are presented and discussed on SPOT data.

Index Terms—Context modeling, data fusion, Laplacian pyramid, multiresolution analysis, multispectral images, undecimated discrete wavelet transform (UDWT).
I. INTRODUCTION
MULTISENSOR data fusion has nowadays become a discipline from which more and more general formal solutions to a number of application cases are demanded. In remote sensing applications, the increasing availability of spaceborne sensors, imaging in a variety of ground scales and spectral bands, undoubtedly provides strong motivation. Due to the physical constraint of a tradeoff between spatial and spectral resolution, spatial enhancement of poor-resolution multispectral (MS) data is desirable. More generally, spectral enhancement of data collected with adequate ground resolution but poor spectral selection (as a limit case, a single panchromatic (P) band) is pursued. Spaceborne imaging sensors routinely allow a global coverage of the earth's surface. MS observations, however, may exhibit ground resolutions that are inadequate to specific identification tasks. Data fusion techniques have been designed not only to allow the integration of different information sources, but also to take advantage of complementary spatial and spectral resolution characteristics. In fact, the P band is transmitted with the maximum resolution allowed by the imaging sensor, while the MS bands are usually acquired and transmitted at coarser resolutions, e.g., two or four times lower. At the receiving station, the P image may be merged with the MS bands to enhance the spatial resolution of the latter.

Since the pioneering high-pass filtering (HPF) technique [1], fusion methods based on injecting high-frequency components into resampled versions of the MS data have demonstrated superior performance [2]. HPF basically consists of an addition of spatial details, taken from a high-resolution P observation, into a bicubically resampled version of the low-resolution MS image. Such details are obtained by taking the difference between the P image and its lowpass version achieved through simple local pixel averaging, i.e., box filtering. Later efforts benefit from an underlying multiresolution analysis employing the discrete wavelet transform (DWT) [3], [4], uniform rational filter banks (borrowed from audio coding) [5], and the Laplacian pyramid (LP) [6], [7]. Although never explicitly addressed by most of the literature, the rationale of highpass detail injection as a problem of spatial-frequency spectrum substitution from one signal to another was formally developed in a multiresolution framework as an application of filter-bank theory [8]. The DWT has recently been employed for remote sensing data fusion [9]–[11]. According to the basic DWT fusion scheme [12], couples of subbands of corresponding frequency content are merged together. The fused image is synthesized by taking the inverse transform.

Manuscript received September 29, 2001; revised July 12, 2002. This work was supported by grants from the Italian Ministry of Education, University and Research (MIUR)—Project on Data Fusion (PRIN 2000).
B. Aiazzi and S. Baronti are with the Institute of Applied Physics "N. Carrara," National Research Council (IFAC-CNR), I-50127 Florence, Italy (e-mail: [email protected]; [email protected]).
L. Alparone is with the Department of Electronics and Telecommunications, University of Florence, I-50139 Florence, Italy (e-mail: [email protected]).
A. Garzelli is with the Department of Information Engineering, University of Siena, I-53100 Siena, Italy (e-mail: [email protected]).
Digital Object Identifier 10.1109/TGRS.2002.803623
Fusion schemes based on the "à trous" wavelet (ATW) algorithm were recently proposed [13]–[15]. Unlike the DWT, which is critically subsampled, the ATW and the LP are oversampled. The absence of decimation allows an image to be decomposed into nearly disjoint bandpass channels in the spatial-frequency domain without losing the spatial connectivity of its highpass details, e.g., edges and textures. As a simple outcome of multirate signal processing theory [16], the LP can easily be generalized (GLP) to deal with scales whose ratios are arbitrary integers or even fractional numbers [17]–[19].

The remainder of the paper is organized as follows. Section II lays the foundations of multiresolution analysis and highlights the concept of oversampling in the transformed domain. The rationale of contextual fusion is introduced in Section III, together with two original procedures based on undecimated wavelets
0196-2892/02$17.00 © 2002 IEEE
and Laplacian pyramids, respectively. Experimental results and comparisons on SPOT data (XS+P) are presented and discussed in Section IV. Conclusions are drawn in Section V.

II. MULTIRESOLUTION ANALYSIS

The theoretical fundamentals of multiresolution analysis will be briefly reviewed in this section with specific reference to the dyadic case, i.e., an analysis whose scales vary as powers of two. Thus, the outcome frequency bands exhibit octave structure, i.e., their extent doubles with increasing frequency. Although this constraint may be relaxed to allow more general analyses [20], such an issue will not be addressed here for the sake of clarity and conciseness. The goal is to demonstrate that multiresolution analysis is a unifying framework in which novel and existing image fusion schemes can easily be accommodated.

A. Fundamental Principles

Let $L^2(\mathbb{R})$ denote the Hilbert space of real square summable functions, with a scalar product $\langle f, g \rangle = \int f(t)\, g(t)\, dt$. Multiresolution analysis with $J$ levels of a continuous signal $f(t)$ having finite energy is a projection of $f(t)$ onto a basis $\{\phi_{j,k}(t)\}$ [21]. Basis functions result from translations and dilations of a same function $\phi(t)$, called the scaling function, verifying $\int \phi(t)\, dt = 1$:

$$\phi_{j,k}(t) = 2^{-j/2}\, \phi(2^{-j} t - k).$$

The family $\{\phi_{j,k}(t),\ k \in \mathbb{Z}\}$ spans a subspace $V_j$ of $L^2(\mathbb{R})$. The projection of $f(t)$ onto $V_j$ gives an approximation $a_j(k) = \langle f, \phi_{j,k} \rangle$ of $f(t)$ at the scale $2^j$.

Analogously, basis functions $\psi_{j,k}(t) = 2^{-j/2}\, \psi(2^{-j} t - k)$ are the result of dilations and translations of the same function $\psi(t)$, called the wavelet function, which fulfills $\int \psi(t)\, dt = 0$. The family $\{\psi_{j,k}(t),\ k \in \mathbb{Z}\}$ spans a subspace $W_j$. The projection of $f(t)$ onto $W_j$ yields the wavelet coefficients of $f(t)$, $w_j(k) = \langle f, \psi_{j,k} \rangle$, representing the details between two successive approximations: the data to be added to $V_j$ to obtain $V_{j-1}$. Hence, $W_j$ is the complement of $V_j$ in $V_{j-1}$

$$V_{j-1} = V_j \oplus W_j. \qquad (1)$$

The subspaces $\{V_j\}$ realize the multiresolution analysis. They present the following properties [22]:
• $V_j \subset V_{j-1}$, $\forall j \in \mathbb{Z}$;
• $\bigcap_j V_j = \{0\}$;
• $f(t) \in V_j \Leftrightarrow f(2t) \in V_{j-1}$, $\forall j \in \mathbb{Z}$;
• $\bigcup_j V_j$ is dense in $L^2(\mathbb{R})$;
• a scaling function $\phi(t)$ exists such that $\{\phi(t-k),\ k \in \mathbb{Z}\}$ is a basis of $V_0$;
• a wavelet function $\psi(t)$ exists such that $\{\psi(t-k),\ k \in \mathbb{Z}\}$ is a basis for $W_0$.

Eventually, multiresolution analysis with $J$ levels yields the following decomposition of $V_0$:

$$V_0 = V_J \oplus \bigoplus_{j=1}^{J} W_j. \qquad (2)$$

All functions $f(t) \in V_0$ can be decomposed as follows:

$$f(t) = \sum_k a_J(k)\, \tilde\phi_{J,k}(t) + \sum_{j=1}^{J} \sum_k w_j(k)\, \tilde\psi_{j,k}(t). \qquad (3)$$
The functions $\tilde\phi_{J,k}(t)$ and $\tilde\psi_{j,k}(t)$ are generated from translations and dilations of dual functions $\tilde\phi(t)$ and $\tilde\psi(t)$ that are to be defined in order to ensure a perfect reconstruction. The connection between filter banks and wavelets stems from dilation equations allowing us to pass from a finer scale to a coarser one [21]

$$\phi(t) = \sqrt{2} \sum_k h(k)\, \phi(2t - k), \qquad \psi(t) = \sqrt{2} \sum_k g(k)\, \phi(2t - k) \qquad (4)$$

with $h(k) = \langle \phi(t), \sqrt{2}\, \phi(2t - k) \rangle$ and $g(k) = \langle \psi(t), \sqrt{2}\, \phi(2t - k) \rangle$. Normalization of the scaling function implies $\sum_k h(k) = \sqrt{2}$. Analogously, $\int \psi(t)\, dt = 0$ implies $\sum_k g(k) = 0$. Multiresolution analysis of a signal can be performed with a filter bank composed of a lowpass analysis filter $h$ and a highpass analysis filter $g$

$$a_{j+1}(k) = \sum_n h(n - 2k)\, a_j(n), \qquad w_{j+1}(k) = \sum_n g(n - 2k)\, a_j(n). \qquad (5)$$

As a result, successive coarser approximations of $f(t)$ are provided by successive lowpass filtering, with a downsampling operation applied on each filter output. Wavelet coefficients at the scale $2^{j+1}$ are obtained by highpass filtering an approximation of $f(t)$ at the scale $2^j$, followed by downsampling. The signal reconstruction is directly derived from (1)

$$a_j(k) = \sum_n \tilde h(k - 2n)\, a_{j+1}(n) + \sum_n \tilde g(k - 2n)\, w_{j+1}(n) \qquad (6)$$

where the coefficients $\tilde h(k)$ and $\tilde g(k)$ define the synthesis filters. If the wavelet analysis is applied to a discrete sequence $x(n)$, the original signal samples are regarded as the coefficients $a_0(n)$ of the projection of a continuous function onto $V_0$. The coefficients relative to the lower resolution subspace and to its orthogonal complement can be obtained through the subsampling of the discrete convolution of $a_0(n)$ by the coefficients of the impulse responses of the two digital filters $h$ and $g$, lowpass and highpass, respectively [22]. The two output sequences represent a smoothed version of $x(n)$ (or approximation) and the rapid changes occurring within the signal (or detail). To achieve reconstruction of the original signal, the coefficients of the approximation and detail signals are upsampled and filtered by the dual, or synthesis, filters $\tilde h$ and $\tilde g$, which are still lowpass and highpass filters, respectively. The scheme of a wavelet coefficient decomposition and reconstruction is depicted in Fig. 1(a), in which $x(n)$ is a discrete one-dimensional (1-D) sequence, and $\hat x(n)$ is the sequence reconstructed after the analysis/synthesis stages. As can be seen, the wavelet representation is closely related to a subband decomposition scheme [16].

1) Orthogonal Wavelets: The functions $\phi(t)$ and $\psi(t)$ can be constructed in such a way as to realize an orthogonal decomposition of the signal; then $W_j$ is the orthogonal complement of
Fig. 1.
Dyadic wavelet decomposition (analysis) and reconstruction (synthesis). (a) Decimated. (b) Undecimated.
$V_j$ in $V_{j-1}$. These filters cannot be chosen independently of each other if perfect reconstruction (PR) is desired. The synthesis bank must be composed of filters having an impulse response that is a reversed version of that of the analysis ones [16], [22], i.e., $\tilde h(k) = h(-k)$ and $\tilde g(k) = g(-k)$. Quadrature mirror filters (QMF) satisfy all these constraints [21], [23], with $g(k) = (-1)^k\, h(1-k)$; hence $|G(\omega)| = |H(\omega + \pi)|$. Thus, the power-complementary (PC) property, stated in the frequency domain as $|H(\omega)|^2 + |G(\omega)|^2 = 2$, which allows cancellation of the aliasing created by downsampling in the dyadic analysis/synthesis scheme shown in Fig. 1(a), becomes $|H(\omega)|^2 + |H(\omega + \pi)|^2 = 2$. Despite the mathematical elegance of the decomposition, the constraints imposed on QMF do not allow the design of filters with an impulse response symmetric around the zeroth coefficient, i.e., with null phase, since the number of coefficients is necessarily even. Furthermore, the bandwidth value is fixed to be exactly one half (in the dyadic case) of the available one.

2) Biorthogonal Wavelets: If the orthogonality constraint is relaxed, we can have symmetric (zero-phase) filters, which are suitable for image processing. Furthermore, the filters of the bank are no longer constrained to have the same size and may be chosen independently of each other. In order to obtain PR, two conditions must be met by the conjugate filters of the filter bank [24]
$$H(\omega)\, \tilde H(\omega) + G(\omega)\, \tilde G(\omega) = 2, \qquad H(\omega + \pi)\, \tilde H(\omega) + G(\omega + \pi)\, \tilde G(\omega) = 0. \qquad (7)$$

The former implies a correct data restoration from one scale to another, and the latter represents the compensation of the recovery effects introduced by downsampling, i.e., the aliasing compensation. Synthesis filters are derived from the analysis filters with the aid of the following relations:

$$\tilde g(k) = (-1)^k\, h(1-k), \qquad \tilde h(k) = (-1)^{k+1}\, g(1-k). \qquad (8)$$
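To make the dyadic analysis/synthesis scheme of Fig. 1(a) concrete, the sketch below implements one decomposition level with the two-tap Haar pair, the simplest filters meeting the QMF/PR constraints above; the filter choice and test signal are illustrative only, not the filters used in the paper.

```python
import numpy as np

# One dyadic analysis/synthesis level (cf. Fig. 1(a)) with the two-tap Haar
# QMF pair. Synthesis uses the time-reversed filters, as required for PR.
s2 = 1.0 / np.sqrt(2.0)
h = np.array([s2, s2])     # lowpass analysis filter
g = np.array([s2, -s2])    # highpass analysis filter

def analyze(x):
    """Filter, then downsample by 2: approximation a1 and detail w1."""
    a = np.convolve(x, h)[1::2]
    w = np.convolve(x, g)[1::2]
    return a, w

def synthesize(a, w):
    """Upsample by 2 (zero insertion), filter with the reversed bank, sum."""
    au = np.zeros(2 * len(a)); au[::2] = a
    wu = np.zeros(2 * len(w)); wu[::2] = w
    return np.convolve(au, h[::-1]) + np.convolve(wu, g[::-1])

x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 7.0])
a1, w1 = analyze(x)
y = synthesize(a1, w1)[:len(x)]   # exact PR for the Haar pair
```

Running the two stages back to back reproduces the input exactly, which is the PR property that longer (e.g., Daubechies or biorthogonal) banks achieve under the constraints of (7) and (8).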
B. Undecimated Discrete Wavelet Transform

The multiresolution analysis described above does not preserve translation invariance. In other words, a translation of the original signal does not necessarily imply a translation of the corresponding wavelet coefficients. This property is essential in image processing. On the contrary, wavelet coefficients generated by an image discontinuity could arbitrarily disappear. This nonstationarity of the representation is a direct consequence of the downsampling operation following each filtering stage. In order to preserve the translation-invariance property, some authors have introduced the concept of stationary wavelet transforms [25]. The downsampling operation is suppressed, but the filters are upsampled by $2^j$, i.e., dilated by inserting $2^j - 1$ zeroes between any couple of consecutive coefficients

$$h_j(k) = \begin{cases} h(k/2^j), & \text{if } k = 2^j m,\ m \in \mathbb{Z} \\ 0, & \text{otherwise} \end{cases} \qquad g_j(k) = \begin{cases} g(k/2^j), & \text{if } k = 2^j m,\ m \in \mathbb{Z} \\ 0, & \text{otherwise.} \end{cases} \qquad (9)$$
The frequency responses of the filters in (9) are $H(2^j \omega)$ and $G(2^j \omega)$, respectively [16]. The wavelet decomposition may be applied recursively, i.e., the lowpass output of the wavelet transform may be further decomposed into two sequences. This process creates a set of levels of wavelet decomposition that represent the signal viewed at different scales. If the decomposition of the lowpass signal is repeated $J$ times, $J + 1$ sequences are obtained: one sequence represents the approximation of the original signal, containing a fraction ($1/2^J$) of the original spectrum around zero; the other $J$ sequences are the detail information that allows the original signal to be reconstructed. The scheme of the decomposition of a signal into three levels ($J = 3$) is shown in Fig. 2(a). This representation will be referred to as a decimated, or critically subsampled, wavelet. In the decimated domain, $a_j$ and $w_j$ denote the approximation (i.e., lowpass) and the detail (i.e., highpass or bandpass) sequences at the output of the $j$th stage, respectively. An equivalent representation is given in Fig. 2(b), obtained from that of Fig. 2(a) by shifting the downsamplers toward the output and by using upsampled filters [16]. The coefficients before the downsamplers will be denoted by $\tilde a_j$ and $\tilde w_j$, and this last representation will be referred to as the undecimated, or oversampled, discrete wavelet transform (UDWT). Notice that the coefficients $a_j$ and $w_j$ can be obtained by downsampling $\tilde a_j$ and $\tilde w_j$ by a factor $2^j$. PR is achieved in both cases. In the undecimated domain, lowpass and highpass coefficients are obtained by filtering the original signal: from Fig. 2(b), it can be noticed that at the $j$th decomposition level, the sequences $\tilde a_j$ and $\tilde w_j$ can be obtained by filtering the original signal with a bank of equivalent filters given by

$$H_j^{eq}(\omega) = \prod_{m=0}^{j-1} H(2^m \omega), \qquad G_j^{eq}(\omega) = G(2^{j-1} \omega) \prod_{m=0}^{j-2} H(2^m \omega). \qquad (10)$$

The frequency responses of the equivalent analysis filters are shown in Fig. 3. As it appears, apart from the lowpass filter
(leftmost), all the other filters are bandpass, with bandwidths roughly halved as $j$ increases by one. The prototype filters $h$ and $g$ are Daubechies-4 [23] with eight coefficients. A sequence can be reconstructed from the wavelet subbands, either decimated [see Fig. 1(a)] or not [Fig. 1(b)], by using the synthesis filters $\tilde h$ and $\tilde g$.

Fig. 2. (a) Three-level scheme (J = 3) for decimated wavelet decomposition. (b) Equivalent scheme with undecimated wavelet subbands (denoted with a tilde).

Fig. 3. Frequency responses of the equivalent analysis filters of an undecimated wavelet decomposition, for J = 3.

C. Translation-Invariant Wavelet Decomposition of an Image

Image multiresolution analysis was introduced in [22] in the decimated case. However, the 1-D filter bank used for the stationary wavelet decomposition can still be applied in the two-dimensional (2-D) case. Image rows and columns are then filtered separately. The filtering relationships to obtain the $(j+1)$th level from the $j$th level are the following, in which $(m, n)$ stands for pixel position:

$$\begin{aligned} a_{j+1}(m, n) &= \sum_k \sum_l h_j(k)\, h_j(l)\, a_j(m+k, n+l) \\ w_{j+1}^{H}(m, n) &= \sum_k \sum_l g_j(k)\, h_j(l)\, a_j(m+k, n+l) \\ w_{j+1}^{V}(m, n) &= \sum_k \sum_l h_j(k)\, g_j(l)\, a_j(m+k, n+l) \\ w_{j+1}^{D}(m, n) &= \sum_k \sum_l g_j(k)\, g_j(l)\, a_j(m+k, n+l) \end{aligned} \qquad (11)$$

where $a_j$ is the approximation of the original image at the scale $2^j$, giving the low-frequency content in the subband $[0, 1/2^{j+1}]$. Image details are contained in three high-frequency zero-mean 2-D signals $w_j^H$, $w_j^V$, and $w_j^D$, corresponding to horizontal, vertical, and diagonal detail orientations, respectively. Wavelet coefficients of the $j$th level give high-frequency information in the subband $[1/2^{j+1}, 1/2^j]$. For each decomposition level, in the undecimated case, images preserve their original size, since the downsampling operations after each filter have been suppressed. Thus, such a decomposition is highly redundant: in a $J$-level decomposition, a number of coefficients $(3J + 1)$ times greater than the number of pixels is generated. Fig. 4 shows examples of DWT and UDWT.

D. "À Trous" Wavelet Decomposition of an Image

The ATW [26] is a nonorthogonal multiresolution decomposition defined by a filter bank $h(k)$ and $g(k) = \delta(k) - h(k)$, with $\delta(k)$ the Kronecker operator denoting an allpass filter. Such filters are not QMF; thus, the filter bank does not allow PR if the output is decimated. In the absence of decimation, the lowpass filter is upsampled by $2^j$, as in (9), before processing the $j$th level; hence the name "à trous," which means "with holes." In two dimensions, the filter bank becomes $h(k)\,h(l)$ and $\delta(k)\,\delta(l) - h(k)\,h(l)$, which means that the 2-D detail signal is given by the pixel difference between two successive approximations, which all have the same scale, i.e., a single detail image per level. The prototype lowpass filter is usually zero-phase symmetric. For a $J$-level decomposition, the ATW accommodates a number of coefficients $(J + 1)$ times greater than the number of pixels. Incidentally, HPF [1] uses a frequency decomposition identical to an ATW with $J = 1$, apart from the analysis filter, which, however, is odd-sized with constant coefficients and thus zero-phase. The frequency responses of box filters of different sizes, plotted in Fig. 5, show that a smooth transition band is accompanied by a large ripple outside the passband.

E. Laplacian Pyramid

The LP, originally proposed in [27] before multiresolution wavelet analysis was introduced, is a bandpass image decomposition derived from the Gaussian pyramid (GP).
Fig. 4. Landsat TM band 5 of the island of Elba. (a) Original. (b) DWT. (c) UDWT. (b) and (c) are with J = 1.
The GP is a multiresolution image representation obtained through a recursive reduction (lowpass filtering and decimation) of the image dataset. In the present multiresolution framework, a modified version of the LP, known as the enhanced LP (ELP) [28], can be regarded as an ATW in which the image is recursively lowpass filtered and downsampled to generate a lowpass subband, which is reexpanded and subtracted pixel by pixel from the original image to yield a 2-D detail signal having zero mean. The output of a separable 2-D filter is downsampled along rows and columns to yield the next level of approximation. Again, the detail is given as the difference between the original image and an expanded version of the lowpass approximation. Unlike the baseband approximation, the 2-D detail signal cannot be decimated if PR is desired. The attribute enhanced refers to the zero-phase expansion filter being forced to cut off at exactly one half of the bandwidth and being chosen independently of the reduction filter, which may be half-band as well or not. The ELP outperforms the former LP of [27] where image compression is concerned [29], thanks to its layers being almost completely uncorrelated with one another. Fig. 6 shows the GP and ELP applied to a typical optical remote sensing image. Notice the lowpass octave structure of the GP layers, as well as the bandpass octave structure of the ELP layers. An octave LP is oversampled by a factor of 4/3 at most (when the baseband is one pixel wide). This moderate data overhead is achieved thanks to the decimation of the lowpass component.

In the case of a scale ratio of two, i.e., frequency octave decomposition, polynomial kernels with three (linear), seven (cubic), 11 (fifth-order), 15 (seventh-order), 19 (ninth-order), and 23 (11th-order) coefficients have been assessed [17]. The term polynomial stems from interpolation and denotes fitting an $n$th-order polynomial to the nonzero samples. The seven-tap kernel is widely used to yield a bicubic interpolation. It is noteworthy that half-band filters have all the even-order coefficients, except the zeroth one, identically null [16]. The frequency responses of all the filters are plotted in Fig. 7. Frequency is normalized to the sampling frequency, which is known to be twice the bandwidth available to the discrete signal. The above kernels are defined by the coefficients reported in Table I. The filter design stems from a tradeoff between selectivity (sharp frequency cutoff) and computational cost (number of nonzero coefficients). In particular, the absence of ripple, which can be appreciated in the plots with a logarithmic scale, is one of the most favorable characteristics.

Fig. 5. Frequency responses of zero-phase box filters used in HPF fusion. The 7 × 7 box was specifically used in [1] for fusion of Landsat-TM and SPOT-P (1:3).

III. CONTEXT-DRIVEN MULTIRESOLUTION DATA FUSION
From multiresolution analysis reviewed in Section II, it appears that DWT, UDWT, ATW, and (E)LP may be regarded as particular cases of multiresolution analysis. The critical subsampling property, featured by DWT only, though essential for data compression, is not required for other applications, e.g., for data fusion. In the remainder of this section, two similar schemes, respectively based on UDWT and ELP (octave) or better GLP (nonoctave), will be described in greater detail. A. Context-Based Injection Model Data fusion based on multiresolution analysis requires the definition of a model establishing how the missing highpass information to be injected into the MS bands is extracted from the P band [13]. Such a model can be global over the whole image
Fig. 6.
Examples of (a) GP and (b) ELP applied to a Landsat TM image.
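The reduce/expand machinery of the GP and (E)LP shown in Fig. 6 can be sketched in 1-D (rows and columns are processed separably in 2-D); the 3-tap linear kernel below is a minimal stand-in for the longer polynomial kernels of Table I, and the test signal is arbitrary.

```python
import numpy as np

# One GP/LP level in 1-D. The 3-tap linear kernel is illustrative only,
# standing in for the longer polynomial kernels of Table I.
r = np.array([0.25, 0.5, 0.25])   # reduction lowpass, dc gain 1
e = 2.0 * r                        # expansion lowpass, dc gain 2 (cf. Fig. 7)

def reduce_(x):
    """Lowpass filter, then decimate by 2 (one Gaussian-pyramid step)."""
    return np.convolve(x, r, mode="same")[::2]

def expand(x):
    """Upsample by 2 (zero insertion), then interpolate with e."""
    u = np.zeros(2 * len(x))
    u[::2] = x
    return np.convolve(u, e, mode="same")

x = np.sin(np.linspace(0.0, np.pi, 16))   # arbitrary 16-sample test signal
g1 = reduce_(x)            # next (coarser) GP level
l0 = x - expand(g1)        # LP detail layer: difference from the expansion
y = expand(g1) + l0        # PR holds by construction: detail not decimated
```

Because the detail layer is defined as exactly the residual of the expansion, reconstruction is perfect regardless of the kernel chosen, which is why only the lowpass component can be decimated.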
or depend on the context, either spectral [30], [31] or spatial [32], [33]. The goal is to make the fused bands as similar as possible to what the narrowband MS sensor would image if it had the same resolution as the broadband sensor acquiring the P band. In this paper, the higher frequency coefficients taken from the higher resolution image are selected based on statistical congruence and weighted by a space-varying factor to achieve gain equalization of otherwise different sensors. This is accomplished by measuring the degree of matching between each of the expanded MS bands and a lowpass version of the P band having the same spatial scale, i.e., the smaller one. The matching function is thresholded to establish whether detail injection should occur or not. The gain factor mapping the highpass coefficients from the P image into the resampled MS band is locally given by the ratio of the standard deviations of the target (one MS band) to the source (P image). The approach is similar to that used in [32], which, however, does not exploit the benefits of multiresolution analysis to discriminate lowpass information (common to both MS and P sensors) from highpass information (possessed by the P observation only).

B. Wavelet-Based Data Fusion Scheme

Fig. 8 outlines a procedure based on the UDWT, suitable for fusion of MS and P image data whose scale ratio is two [34]. For ratios greater than two, but still powers of two, the UDWT is achieved from an octave wavelet transform by omitting all decimators and upsampling the filter bank, as shown in Fig. 2(b). With reference to Fig. 8, both the higher resolution P image and the lower resolution MS image dataset are decomposed by the one-level UDWT. The MS images have previously been interpolated by two along rows and columns, in order to process MS images having the same spatial scale as the P image. To this purpose, the 23-tap pyramid-generating lowpass filter, whose frequency response is shown in Fig. 7, is applied along rows and columns, after upsampling by two.
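The one-level 2-D undecimated decomposition used on both P and interpolated MS images can be sketched as follows; the Haar-derived taps are a minimal stand-in for the actual Daubechies filter bank, and circular border handling is an arbitrary simplification.

```python
import numpy as np

# One-level undecimated 2-D wavelet analysis: separable filtering along the
# two axes with no downsampling, so every subband keeps the input size.
# Haar taps stand in for the Daubechies filters used in the paper.
s2 = 1.0 / np.sqrt(2.0)
h = np.array([s2, s2])     # lowpass
g = np.array([s2, -s2])    # highpass

def filt(img, f, axis):
    """Circular convolution along one axis (keeps the original size)."""
    out = np.zeros_like(img, dtype=float)
    for k in range(f.size):
        out += f[k] * np.roll(img, k, axis=axis)
    return out

def udwt1(img):
    """Return the LL, LH, HL, HH subbands of a one-level undecimated DWT."""
    lo = filt(img, h, 0)
    hi = filt(img, g, 0)
    return {"LL": filt(lo, h, 1), "LH": filt(lo, g, 1),
            "HL": filt(hi, h, 1), "HH": filt(hi, g, 1)}

img = np.arange(64, dtype=float).reshape(8, 8)
sub = udwt1(img)   # each subband has shape (8, 8): oversampled representation
```

All four subbands retain the spatial support of the input, which is what makes per-pixel comparison and substitution of detail coefficients between the P and MS decompositions straightforward.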
Two sets of undecimated wavelet coefficients are obtained, including approximation (LL) and detail (HL, LH, and HH) signals of the original data. The approximation coefficients of the MS bands and of the P image are considered for computing local correlation coefficients (LCC) over a square sliding window. An LCC map is computed between the approximation of each of the MS bands and that of the P image, both at the scale of the latter, i.e., of the fused image. When injection of the smaller scale details takes place, the higher frequency coefficients from the high-resolution image are weighted to equalize the contributions coming from different sensors, according to the context model that has been chosen. The ratio of the local standard deviation of the low-resolution image to the corresponding value of the high-resolution image is used as the local weight. To avoid injecting unlikely details, injection is accomplished only if the context-based criterion is locally fulfilled. For each pixel of the approximation of a given band of the MS dataset at the scale of the P image, if the LCC is greater than a threshold $\theta$, the three detail coefficients of the P image at that location are multiplied by the local weight and replace the corresponding coefficients in the UDWT of that MS band. The threshold value may be selected, e.g., by minimizing the standard deviation between the fused image (obtained from a P image and a degraded version of an MS dataset) and the original MS data. Finally, undecimated wavelet synthesis is performed for each band of the MS image by applying the synthesis filter bank. The above context model can also be extended to the DWT (decimated). The LCC is now calculated between the approximation of P and the MS band itself, both having the same scale (twice that of the undecimated case for one level of dyadic analysis).

C.
Pyramid-Based Data Fusion Scheme

Since its first appearance in the literature [35], LP-based fusion has been progressively upgraded and generalized to different application cases [19]. The block diagram reported in Fig. 9 describes the multirate data fusion algorithm for the more general case of two image datasets whose scale ratio is an integer $p$, and which have been preliminarily registered on each other. Let $\{A\}$ be the dataset constituted by a single image $A$ having smaller scale (i.e., finer resolution) and size $pM \times pN$, and let $\{B_k,\ k = 1, \ldots, K\}$ be the dataset made up of $K$ MS observations having scale larger by a factor $p$ (i.e., coarser resolution) and thus size $M \times N$. The goal is to obtain a set of $K$ MS images, each having the same spatial resolution as $A$. The upgrade of $\{B_k\}$ to the resolution of $A$ is given by the zeroth level of the (zero-mean) GLP of $A$. The images of the set $\{B_k\}$ have to be interpolated by $p$ to match the finer resolution. Then, the highpass component from $A$ is added to the expanded $\{B_k\}$, which constitute the lowpass component, in order to yield a spatially enhanced set of MS observations. Although only one level of decomposition ($J = 1$) with $p = 4$ is capable of yielding 1:4 fusion, $J = 2$ and $p = 2$ are preferable for computational convenience, since 1:4 filters are much longer than 1:2 filters of similar characteristics. Furthermore,
Fig. 7. Frequency responses of pyramid-generating filters. (a) Linear scale. (b) Logarithmic scale. Reduction filters are also used for expansion, with dc gain doubled.

TABLE I
NONZERO COEFFICIENTS OF POLYNOMIAL 1-D KERNELS $r(i) = (1/2)\,e(i)$
Fig. 8.
Flowchart of undecimated wavelet-based image fusion procedure for a 1:2 scale ratio.
fewer data are to be processed at the second level, thanks to the decimation after the first level.

Crucial points of the above scheme are, on the one hand, a check on the congruence of injection, to prevent the introduction of "ghost" details in some of the MS bands, particularly annoying on urban areas, and on the other hand, an equalization of the pyramid levels before merging, to compensate for the different sensor responses. Again, the LCC between the lowpass version of the high-resolution image and each of the expanded MS bands seems to be a simple yet effective matching function. Therefore, the LCC is calculated and thresholded at each pixel position. If the LCC exceeds a threshold, which is expected to be a function of the scale ratio, as well as of the spectral band, injection takes place at the current pixel position; otherwise, details are not injected. The LCC between image $A$ and image $B$ at pixel $(m, n)$ is given by

$$\mathrm{LCC}(m, n) = \frac{\sigma_{AB}(m, n)}{\sigma_A(m, n)\, \sigma_B(m, n)}.$$
Fig. 9. Flowchart of GLP data fusion for an MS image and a P one, whose integer scale ratio is p > 1. r is the p-reduction lowpass filter with frequency cutoff at 1/p of the bandwidth and unity gain at zero frequency. e is the p-expansion lowpass filter with cutoff at 1/p and dc gain equal to p.
As an outcome of the LCC calculation, the local gain (LG) factor by which the details at pixel $(m, n)$ are to be scaled before injection into the MS bands is obtained at no extra cost as $\mathrm{LG}(m, n) = \sigma_B(m, n)/\sigma_A(m, n)$; there will be one gain for each of the MS bands. The high-frequency component of $A$ will be multiplied by LG before being added to the expanded version of $B$. All the local statistics are calculated on square windows.
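The local statistics above can be sketched with a brute-force sliding window; the window size, threshold value, and synthetic test images below are illustrative, and zero-padded borders are a simplification.

```python
import numpy as np

# Local correlation coefficient (LCC) and local gain (LG) over a square
# sliding window. a plays the role of the (lowpass) P image, b that of one
# expanded MS band; values near the borders are biased by zero padding.
def box(x, w):
    """Local mean over a w x w window (brute force, zero-padded borders)."""
    out = np.zeros_like(x, dtype=float)
    xp = np.pad(x, w // 2)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = xp[i:i + w, j:j + w].mean()
    return out

def lcc_lg(a, b, w=9, eps=1e-9):
    ma, mb = box(a, w), box(b, w)
    va = box(a * a, w) - ma ** 2            # local variance of a
    vb = box(b * b, w) - mb ** 2            # local variance of b
    cov = box(a * b, w) - ma * mb           # local covariance
    lcc = cov / np.sqrt(np.maximum(va * vb, eps))
    lg = np.sqrt(np.maximum(vb, 0.0) / np.maximum(va, eps))  # sigma_b/sigma_a
    return lcc, lg

rng = np.random.default_rng(0)
a = rng.standard_normal((32, 32))                  # stand-in for P lowpass
b = 0.5 * a + 0.1 * rng.standard_normal((32, 32))  # correlated MS band
lcc, lg = lcc_lg(a, b)
mask = lcc > 0.3   # inject P details (scaled by lg) only where mask is True
```

Both maps come from the same moving-average passes, which is the "no extra cost" point made above: once the LCC is available, the LG is a byproduct of the same local variances.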
TABLE II
MEANS (μ) AND STANDARD DEVIATIONS (σ) OF THE ORIGINAL 20-m XS DATA
IV. EXPERIMENTAL RESULTS AND COMPARISONS

A test SPOT image portraying the Tweed Heads region, New South Wales, Australia, was available in the three XS bands, i.e., B1 (green), B2 (red), and B3 (near infrared). Table II reports the means and standard deviations of the original data, having eight-bit wordlength. The XS image was used to synthesize a perfectly overlapped P band at a 20-m scale, which is shown in Fig. 10. Then, the XS bands were decimated by two and four and used, together with the P at 20 m, to synthesize B1, B2, and B3 back at 20 m. The reason underlying the merging of simulated P and decimated XS data is twofold. First, all the images are spatially coregistered, being acquired simultaneously from the same platform; no geometric corrections, which are likely to affect the results of data fusion [36], [37], need to be carried out preliminarily. Second, the true XS data at 20 m are available for objective comparisons.

Four multiresolution fusion schemes were comparatively assessed on the above test dataset. They are as follows:
1) HPF [1], which implicitly relies on an ATW achieved by a box filter (3 × 3 and 5 × 5 box filters yield the best results for 1:2 and 1:4 fusion, respectively);
2) a DWT-based scheme (decimated) utilizing the Daubechies-8 (16-tap) orthogonal wavelet filter bank and the same context model as that outlined in Section III-B;
3) the UDWT-based scheme (Section III-B), still utilizing Daubechies-8 filters;
4) the GLP-based scheme (Section III-C, with $p = 2$) utilizing the 23-tap filter reported in Table I.
The last three schemes utilize the same context model; HPF does not. The size of the local window on which the local statistics driving the model are calculated is roughly halved in the DWT scheme, because of decimation.
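For reference, scheme 1) reduces to the following sketch; the random arrays stand in for the P image and a resampled MS band, and circular border handling is a simplification.

```python
import numpy as np

# HPF-style fusion: add to a resampled MS band the detail obtained as the
# difference between the P image and its box-filtered version (3 x 3 box,
# i.e., the 1:2 case reported in the text).
def box3(x):
    """3 x 3 box lowpass via circular shifts (sketch-level borders)."""
    acc = np.zeros_like(x, dtype=float)
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            acc += np.roll(np.roll(x, di, axis=0), dj, axis=1)
    return acc / 9.0

def hpf_fuse(pan, ms_resampled):
    detail = pan - box3(pan)      # highpass detail of P
    return ms_resampled + detail  # injected with no context weighting

rng = np.random.default_rng(1)
pan = rng.random((16, 16))        # stand-in for the P image
ms = rng.random((16, 16))         # stand-in for a bicubically resampled band
fused = hpf_fuse(pan, ms)
```

Unlike schemes 2)–4), no local statistic gates or scales the injected detail, which is exactly why HPF serves as the uncontextualized baseline here.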
Fig. 10.
SPOT-P synthesized from XS at 20 m as P = (B1 + B2 + B3)/3.
The decision threshold $\theta$ acts as a tap ruling detail injection. When $\theta = -1$, injection is enabled regardless of context. In the case $\theta = 1$, injection is disabled; thus, fusion reduces to plain resampling of the low-resolution data at the new scale. All the intermediate cases are characterized by a context-conditioned injection. The values of $\theta$ optimizing performance in terms of root mean square error (RMSE) between 20-m originals and fused images are indicated in Fig. 11. A 9 × 9 window for the calculation of local statistics provides the best performance in terms of RMSE, even though all the plots exhibit extremely flat minima. In practice, the threshold value and the window size become little relevant for scale ratios equal to four (or larger).
Fig. 11. RMSE (averaged on the three XS bands) between 20-m original and fused XS bands as a function of LCC decision threshold driving detail injection. (a) UDWT. (b) GLP.
Table III reports correlation coefficients (CC) between the 20-m reference originals and the fused XS bands. This parameter measures how well the shape of the fused image reflects that of the original. CC, however, is insensitive to a constant gain and bias between the two images. Although widely reported in the literature for performance comparisons of data fusion techniques, this parameter does not allow a subtle discrimination of possible fusion artifacts. Table IV reports the RMSE between the 20-m originals and the fused XS bands, expressed as a percentage of the mean of each band (NRMSE). This is an objective measurement of the extent to which the target is matched by the fused product. However, it may be scarcely related to visual appearance, in the sense that a low NRMSE value is not necessarily accompanied by high visual quality, and vice versa. Table V reports the percentages of pixels whose differences between the 20-m originals and the fused XS bands do not exceed one in modulus or, more generally speaking, 1/256th of the full scale. This score parameter was originally proposed in [1] as a detector of spectral distortions originated by fusion, measured between the fused image and the expanded version of the MS data, since MS data with the same resolution as the P data are not available in practice. The underlying rationale was that spectral distortion may occur when details are injected into the resampled MS bands. If injection does not take place, distortion is trivially null; hence, the score is 100%. On the other hand, the details to be injected should be confined to heterogeneous areas, because homogeneous ones, e.g., water, are unchanged when they are imaged at a finer scale. When degraded MS images are fused and compared with the reference originals, as in the present case, this parameter should still be as large as possible, to indicate that mismatches between the fused bands and the reference originals are concentrated in a reduced number of pixels rather than spread over the whole image.
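The three scores can be sketched as follows (illustrative helper functions, assuming eight-bit bands held in NumPy arrays; names are not from the paper):

```python
import numpy as np

def cc(ref, fused):
    """Global correlation coefficient between reference and fused band;
    insensitive to a constant gain and bias between the two images."""
    r, f = ref.ravel().astype(float), fused.ravel().astype(float)
    return np.corrcoef(r, f)[0, 1]

def nrmse_percent(ref, fused):
    """RMSE expressed as a percentage of the reference-band mean."""
    err = ref.astype(float) - fused.astype(float)
    return 100.0 * np.sqrt(np.mean(err ** 2)) / np.mean(ref)

def unchanged_percent(ref, fused, tol=1):
    """Percentage of pixels whose absolute difference does not exceed
    tol (one gray level, i.e., 1/256th of full scale for 8-bit data)."""
    diff = np.abs(ref.astype(float) - fused.astype(float))
    return 100.0 * np.mean(diff <= tol)
```

A fused band identical to its reference scores CC = 1, NRMSE = 0%, and 100% unchanged pixels; a uniform bias larger than one gray level leaves CC at 1 but drives the unchanged-pixel score to zero, which is why the three scores are reported jointly.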
Furthermore, an abnormally low value may reveal the presence of otherwise imperceivable structured impairments (ringing effects and canvas-like patterns) originated, for example, by wavelet analysis/synthesis [38]. Previous work carried out by some of the authors [39], [40] on simulated SPOT data has demonstrated that the percentages of "unchanged" pixels, calculated either with respect to the resampled low-resolution MS data that are available in practice,
TABLE III CORRELATION COEFFICIENTS BETWEEN ORIGINAL XS IMAGES (20 m) AND THOSE OBTAINED BY 1:2 AND 1:4 FUSION. EXP DENOTES PLAIN RESAMPLING WITHOUT DETAIL INJECTION, i.e., ABSENCE OF FUSION
TABLE IV AVERAGE RMSES BETWEEN ORIGINAL XS IMAGES (20 m) AND THOSE OBTAINED BY 1:2 AND 1:4 FUSION, AS PERCENTAGES OF THE MEANS OF EACH ORIGINAL BAND. EXP STANDS FOR “NO FUSION”
TABLE V PERCENTAGES OF UNCHANGED PIXELS (±1 ERROR) BETWEEN 20-m XS ORIGINALS AND THOSE OBTAINED BY 1:2 AND 1:4 FUSION. EXP = "NO FUSION"
or with respect to the high-resolution (target) MS data (available in simulations only), follow the same trend when the fusion algorithm is varied. All three scores are comparable for the UDWT and GLP schemes and somewhat poorer for the DWT scheme. Notice that, in the present implementation, the only difference between the UDWT and DWT fusion schemes is the absence of decimation
Fig. 12. (a) 512 × 512 detail from original SPOT-XS (20 m). (b) 80-m XS expanded by 4. (c) UDWT-fused XS (20 m). (d) GLP-fused XS (20 m). (e) DWT-fused XS (20 m). (f) HPF-fused XS (20 m).
Fig. 13. Highpass components (rescaled by a gain factor two and biased by 128 for display convenience) injected into B3 by (a) UDWT, (b) GLP, (c) DWT, and (d) HPF.
of the detail components. HPF, which is undecimated as well, yields intermediate results. As far as the percentage of unchanged pixels is concerned, the selection of details based on thresholding the LCC yields appreciable benefits with respect to unconditioned injection, especially on the near-infrared (NIR) band B3, which is weakly correlated with the two visible bands B1 and B2. This explains why the percentage of unchanged pixels for 1:4 HPF fusion is even lower than that corresponding to plain resampling (EXP): unlike the other three schemes, HPF does not utilize any context model. Since HPF relies on an undecimated multiresolution analysis scheme similar to ATW, its relatively poor performance is due to the choice of box filters, whose frequency responses (see Fig. 5) are far different from those of the filters conventionally used to yield multiresolution analyses, especially for what concerns ripple outside the passband. Fig. 12(a) shows the original 20-m XS image as a color composite (B3, B2, and B1 as R-G-B). Fig. 12(b) shows the 80-m XS expanded to the 20-m scale by means of the pyramid filter (23 taps)
and its spatially enhanced versions: through UDWT fusion with P in Fig. 12(c) (LCC and LG calculated on 9 × 9 windows), GLP fusion in Fig. 12(d) (9 × 9 LCC and LG), DWT fusion in Fig. 12(e) (5 × 5 LCC and LG), and HPF fusion in Fig. 12(f). As it appears, the UDWT and GLP fused images are hardly distinguishable from each other, as well as from the original [Fig. 12(a)]. Although the HPF fused images look sharp and pleasant at first glance, they reveal spatial distortions and degradations to a keen observer. Overenhancement is perhaps the major drawback. To provide a deeper insight into the mechanism of injection in a multiresolution framework, either decimated or not, Fig. 13 shows the highpass component injected into B3 (NIR) by the four methods analyzed. Such 2-D zero-mean signals are biased by a constant offset (128) for displaying convenience and stretched by a gain factor equal to two, to allow easier inspection. As it appears, out of the four sketches, the three employing context models are similar to one another; HPF is somewhat different. A visible clue of the presence of artifacts
is represented by the coastline. GLP yields a hardly perceivable overshoot, or ringing (an outcome of the Gibbs effect for 2-D signals), notwithstanding that pyramid filters (23 taps) are longer than the filters used by DWT/UDWT (16 taps) and HPF (three or five taps). In general, for a given frequency cutoff, the longer the filter, the lower the (N)RMSE and the higher the CC. UDWT yields a more pronounced overshoot, due to the length of the QMF bank. The length chosen for the QMFs stems from a tradeoff between CC/NRMSE performance and absence of ringing. DWT exhibits another annoying artifact, due to the aliasing introduced by decimation: a canvas-like pattern superimposed on the overshoots around sharp edges. The lower part of the coastline highlights such an impairment.

V. CONCLUDING REMARKS

Multiresolution analysis has been shown to be the common framework of fusion algorithms. The absence of detail decimation featured by UDWT, HPF, and GLP is the key to avoiding artifacts and spatial distortions in general. Although results were reported only on a simulated case, the immediate conclusion is that the translation-invariance property deriving from the missing decimation is invaluable in practical cases concerning different sensors, since possible misregistrations of the data may be emphasized if the transformation achieving the multiresolution analysis is not shift-invariant. Results of UDWT and GLP are comparable for both 1:2 and 1:4 fusion. Thanks to LCC-based detail selection and space-varying sensor equalization by the ratio of local RMS, both methods achieve impressive scores. As a peculiarity common to most multiresolution schemes, spectral distortions, regarded as changes in the color hues of the composite image, never occur in any of the fused images. Ringing artifacts are completely missing for GLP and moderate for UDWT.
The most notable benefit of the context-based injection is that spectral signatures of small size (two or three pixels) may be restored [41], even though they appear to be heavily smeared in the expanded XS image. In conclusion, the two proposed methods can fit any (reasonable) scaling requirement and are practically equivalent in performance. Besides the lower computational effort required by the GLP method, the main difference is that the GLP does not require advanced signal processing expertise to set up the filter bank when the scale ratios are not powers of two [20], or even noninteger [8]. Another attractive characteristic is that the GLP reduction filter (dyadic case) may be designed such that its cascade with the optical transfer function (OTF) of the imaging system is halfband, with the additional benefit that a restoration of the spatial frequency content is achieved together with the enhancement of the MS bands. This feature can be valuable in application contexts, since the OTF of real systems is often far from the ideal case.

ACKNOWLEDGMENT

The authors are grateful to F. Argenti for valuable discussions on wavelet theory and multirate filter banks, as well as to the anonymous reviewers for their constructive remarks and useful suggestions.
REFERENCES [1] P. S. Chavez Jr, S. C. Sides, and J. A. Anderson, “Comparison of three different methods to merge multiresolution and multispectral data: Landsat TM and SPOT panchromatic,” Photogramm. Eng. Remote Sens., vol. 57, no. 3, pp. 295–303, 1991. [2] L. Wald, T. Ranchin, and M. Mangolini, “Fusion of satellite images of different spatial resolutions: Assessing the quality of resulting images,” Photogramm. Eng. Remote Sens., vol. 63, no. 6, pp. 691–699, 1997. [3] B. Garguet-Duport, J. Girel, J.-M. Chassery, and G. Pautou, “The use of multiresolution analysis and wavelet transform for merging SPOT panchromatic and multispectral image data,” Photogramm. Eng. Remote Sens., vol. 62, no. 9, pp. 1057–1066, 1996. [4] D. A. Yocky, “Multiresolution wavelet decomposition image merger of Landsat Thematic Mapper and SPOT panchromatic data,” Photogramm. Eng. Remote Sens., vol. 62, no. 9, pp. 1067–1074, 1996. [5] B. Aiazzi, L. Alparone, F. Argenti, S. Baronti, and I. Pippi, “Multisensor image fusion by frequency spectrum substitution: Subband and multirate approaches for a 3:5 scale ratio case,” in Proc. IGARSS, 2000, pp. 2629–2631. [6] T. A. Wilson, S. K. Rogers, and M. Kabrisky, “Perceptual-based image fusion for hyperspectral data,” IEEE Trans. Geosci. Remote Sensing, vol. 35, pp. 1007–1017, July 1997. [7] L. Alparone, V. Cappellini, L. Mortelli, B. Aiazzi, S. Baronti, and R. Carlà, “A pyramid-based approach to multisensor image data fusion with preservation of spectral signatures,” in Future Trends in Remote Sensing, P. Gudmandsen, Ed. Rotterdam, The Netherlands: Balkema, 1998, pp. 418–426. [8] F. Argenti and L. Alparone, “Filterbanks design for multisensor data fusion,” IEEE Signal Processing Lett., vol. 7, pp. 100–103, May 2000. [9] J. Zhou, D. L. Civco, and J. A. Silander, “A wavelet transform method to merge Landsat TM and SPOT panchromatic data,” Int. J. Remote Sens., vol. 19, no. 4, pp. 743–757, 1998. [10] T. Ranchin and L. 
Wald, “Fusion of high spatial and spectral resolution images: The ARSIS concept and its implementation,” Photogramm. Eng. Remote Sens., vol. 66, no. 1, pp. 49–61, Jan. 2000. [11] P. Scheunders and S. De Backer, “Fusion and merging of multispectral images with use of multiscale fundamental forms,” J. Opt. Soc. Amer. A, vol. 18, no. 10, pp. 2468–2477, Oct. 2001. [12] H. Li, B. S. Manjunath, and S. K. Mitra, “Multisensor image fusion using the wavelet transform,” Graph. Models Image Process., vol. 57, no. 3, pp. 235–245, 1995. [13] J. Núñez, X. Otazu, O. Fors, A. Prades, V. Palà, and R. Arbiol, “Multiresolution-based image fusion with additive wavelet decomposition,” IEEE Trans. Geosci. Remote Sensing, vol. 37, pp. 1204–1211, May 1999. [14] Y. Chibani and A. Houacine, “Model for multispectral and panchromatic image fusion,” in Proc. SPIE Image and Signal Processing for Remote Sensing VI, S. B. Serpico, Ed., 2000, vol. 4170, EUROPTO Series, pp. 238–244. [15] A. Garzelli, G. Benelli, M. Barni, and C. Magini, “Improving waveletbased merging of panchromatic and multispectral images by contextual information,” in Proc. SPIE Image and Signal Processing for Remote Sensing VI, S. B. Serpico, Ed., 2000, vol. 4170, EUROPTO Series, pp. 82–91. [16] P. P. Vaidyanathan, Multirate Systems and Filter Banks. Englewood Cliffs, NJ: Prentice-Hall, 1992. [17] B. Aiazzi, L. Alparone, S. Baronti, V. Cappellini, R. Carlà, and L. Mortelli, “A Laplacian pyramid with rational scale factor for multisensor image data fusion,” in Proc. Int. Conf. Sampling Theory and Applications-SampTA 97, 1997, pp. 55–60. [18] B. Aiazzi, L. Alparone, S. Baronti, and I. Pippi, “Fusion of 18 m MOMS-2P and 30 m Landsat TM multispectral data by the generalized Laplacian pyramid,” ISPRS Int. Arch. Photogramm. Remote Sens., vol. 32, no. 7-4-3W6, pp. 116–122, 1999. [19] B. Aiazzi, L. Alparone, A. Barducci, S. Baronti, and I. 
Pippi, “Multispectral fusion of multisensor image data by the generalized Laplacian pyramid,” in Proc. IGARSS, 1999, pp. 1183–1185. [20] P. Blanc, T. Blu, T. Ranchin, L. Wald, and R. Aloisi, “Using iterated rational filter banks within the ARSIS concept for producing 10 m Landsat multispectral images,” Int. J. Remote Sens., vol. 19, no. 12, pp. 2331–2343, 1998. [21] I. Daubechies, Ten Lectures on Wavelets. Philadelphia, PA: SIAM, 1992, vol. 61, CBMS-NSF Regional Conference Series in Applied Mathematics.
[22] S. Mallat, “A theory for multiresolution signal decomposition: The wavelet representation,” IEEE Trans. Pattern Anal. Machine Intell., vol. 11, pp. 674–693, July 1989. [23] I. Daubechies, “Orthonormal bases of compactly supported wavelets,” Commun. Pure Appl. Math., vol. 41, pp. 909–996, 1988. [24] A. Cohen, I. Daubechies, and J. C. Feauveau, “Biorthogonal bases of compactly supported wavelets,” Commun. Pure Appl. Math., vol. 45, pp. 485–500, 1995. [25] G. P. Nason and B. W. Silverman, “The stationary wavelet transform and some statistical applications,” in Wavelets and Statistics, A. Antoniadis and G. Oppenheim, Eds. New York: Springer-Verlag, 1995, vol. 103, Lecture Notes Statist., pp. 281–299. [26] M. J. Shensa, “The discrete wavelet transform: Wedding the à trous and Mallat algorithm,” IEEE Trans. Signal Processing, vol. 40, pp. 2464–2482, Oct. 1992. [27] P. J. Burt and E. H. Adelson, “The Laplacian pyramid as a compact image code,” IEEE Trans. Commun., vol. COMM-31, pp. 532–540, Apr. 1983. [28] S. Baronti, A. Casini, F. Lotti, and L. Alparone, “Content-driven differential encoding of an enhanced image pyramid,” Signal Process.: Image Commun., vol. 6, no. 5, pp. 463–469, 1994. [29] B. Aiazzi, L. Alparone, S. Baronti, and F. Lotti, “Lossless image compression by quantization feedback in a content-driven enhanced Laplacian pyramid,” IEEE Trans. Image Processing, vol. 6, pp. 831–843, June 1997. [30] R. Nishii, S. Kusanobu, and S. Tanaka, “Enhancement of low spatial resolution image based on high resolution bands,” IEEE Trans. Geosci. Remote Sensing, vol. 34, pp. 1151–1158, Sept. 1996. [31] B. Zhukov, D. Oertel, F. Lanzl, and G. Reinhäckel, “Unmixing-based multisensor multiresolution image fusion,” IEEE Trans. Geosci. Remote Sensing, vol. 37, pp. 1212–1226, May 1999. [32] J. Hill, C. Diemer, O. Stöver, and T. Udelhoven, “A local correlation approach for the fusion of remote sensing data with different spatial resolutions in forestry applications,” ISPRS Int. 
Arch. Photogramm. Remote Sens., vol. 32, no. 7-4-3W6, pp. 167–174, July 1999. [33] L. Alparone, S. Baronti, and A. Garzelli, “Assessment of image fusion algorithms based on noncritically decimated pyramids and wavelets,” in Proc. IGARSS, 2001, pp. 852–854. [34] A. Garzelli and F. Soldati, “Context-driven image fusion of multispectral and panchromatic data based on a redundant wavelet representation,” in Proc. IEEE/ISPRS Joint Workshop on Remote Sensing and Data Fusion over Urban Areas, Nov. 8–9, 2001, pp. 122–126. [35] B. Aiazzi, L. Alparone, S. Baronti, and R. Carlà, “A pyramid approach to fusion of Landsat TM and SPOT-PAN data to yield multispectral high-resolution images for environmental archaeology,” in Proc. SPIE Remote Sensing for Geography, Geology, Land Planning and Cultural Heritage, vol. 2960, EUROPTO Series, 1996, pp. 153–162. [36] P. Terretaz, “Comparison of different methods to merge SPOT P and XS data: Evaluation in an urban area,” in Future Trends in Remote Sensing, P. Gudmandsen, Ed. Rotterdam, The Netherlands: Balkema, 1998, pp. 435–443. [37] P. Blanc, L. Wald, and T. Ranchin, “Importance and effect of co-registration quality in an example of pixel to pixel fusion process,” in Proc. 2nd Conf. Fusion of Earth Data: Merging Point, Measurements, Raster Maps and Remotely Sensed Images, 1998, pp. 67–74. [38] D. A. Yocky, “Artifacts in wavelet image merging,” J. Opt. Eng., vol. 35, no. 7, pp. 2094–2101, July 1996. [39] B. Aiazzi, L. Alparone, S. Baronti, and R. Carlà, “Assessment of pyramid-based multisensor image data fusion,” in Proc. SPIE Image and Signal Processing for Remote Sensing IV, vol. 3500, EUROPTO Series, S. B. Serpico, Ed., 1998, pp. 237–248. [40] B. Aiazzi, L. Alparone, F. Argenti, and S. Baronti, “Wavelet and pyramid techniques for multisensor data fusion: A Performance comparison varying with scale ratios,” in Proc. SPIE Image and Signal Processing for Remote Sensing V, vol. 3871, EUROPTO Series, S. B. Serpico, Ed., 1999, pp. 251–262. 
[41] B. Aiazzi, L. Alparone, S. Baronti, and I. Pippi, “Quality assessment of decision-driven pyramid-based fusion of high resolution multispectral with panchromatic image data,” in Proc. IEEE/ISPRS Joint Workshop on Remote Sensing and Data Fusion Over Urban Areas, Nov. 8–9, 2001, pp. 337–341.
Bruno Aiazzi received the “Laurea” degree in electronic engineering from the University of Florence, Florence, Italy, in 1991. Since 2001, he has been a Researcher at the “Nello Carrara” IFAC-CNR (formerly IROE-CNR), Florence, Italy, where he is currently involved in research activities on image quality definition and measurement, advanced methods for lossless and near-lossless data compression, multiresolution image analysis, data fusion, and synthetic aperture radar image processing. Mr. Aiazzi won a fellowship on “Digital Image Compression for Broad Band Communications Networks” in 1992, which was supported by the National Research Council (CNR) of Italy.
Luciano Alparone was born in Florence, Italy, in 1960. He received the “Laurea” degree (cum laude) in electronic engineering and the Ph.D. degree from the University of Florence, Florence, Italy, in 1985 and 1990, respectively. In 1992, he joined the Department of Electronics and Telecommunications (formerly Department of Electronic Engineering) of the University of Florence, as an Assistant Professor and is now an Associate Professor of Electrical Communications. In 1989, he was a Postgraduate Research Fellow with the Signal Processing Division at the University of Strathclyde, Glasgow, U.K. During the spring 2000 and summer 2001, he was Visiting Professor at the Tampere International Centre for Signal Processing (TICSP), Tampere, Finland. His main research interests are compression of still images and video for low-rate communications, especially lossless and near-lossless compression for medical and remote sensing applications, multiresolution image analysis and processing, nonlinear filtering, multisensor data fusion, and processing and analysis of synthetic aperture radar images. He is an author or coauthor of 40 papers published in international peer-reviewed journals.
Stefano Baronti (M’98) was born in Florence, Italy, in 1954. He received the “Laurea” degree in electronic engineering from the University of Florence, Florence, Italy, in 1980. In 1985, he joined the National Research Council of Italy (CNR), as a Researcher of the “Nello Carrara” IFAC-CNR (formerly IROE-CNR), Florence, Italy. From 1985 to 1989, he was involved in an ESPRIT Project of the European Union aimed at the development of an automated system for quality control of composite materials through analysis of infrared image sequences. Later he moved toward remote sensing image processing by participating in and as responsible for several projects funded by the Italian, French, and European Space Agencies. His research topics are in digital image processing and analysis aimed at computer vision and cultural heritage applications, data compression and image communication (including medical imaging), and optical and microwave remote sensing by synthetic aperture radar. He has coauthored over 25 papers published in international peer-reviewed journals. Mr. Baronti is a member of IEEE Signal Processing Society and the IEEE Geoscience and Remote Sensing Society’s Data Fusion Committee.
Andrea Garzelli (M’99) received the “Laurea” degree (summa cum laude) in electronic engineering and the Ph.D. degree in information and telecommunication engineering from the University of Florence, Florence, Italy, in 1991 and 1995, respectively. In 1995, he joined the Department of Information Engineering of the University of Siena, Siena, Italy, as an Assistant Professor. He is now Associate Professor of Telecommunications at the same department, where he holds a course on digital signal processing. His research interests are in signal and image analysis, processing, and communication; nonlinear filtering; fractal analysis of SAR signals; and image fusion for optical and SAR remote sensing applications. Dr. Garzelli is a member of the IEEE Geoscience and Remote Sensing Society’s Data Fusion Committee.