A New Saliency-Driven Fusion Method Based on Complex Wavelet Transform for Remote Sensing Images

Libao Zhang, Member, IEEE, and Jue Zhang
Abstract— In remote sensing images, demands for spectral and spatial resolution vary from region to region. Regions with abundant texture and well-defined boundaries (like residential areas and roads) need more spatial details to provide better descriptions of various ground objects, while regions such as farmland and mountains are mainly discriminated by spectral characteristics. However, most existing fusion algorithms for remote sensing images apply a uniform processing to the whole image, leaving those important needs out of consideration. Employing diverse fusion strategies for regions with different needs provides an effective solution to this problem. In this letter, we propose a new saliency-driven fusion method based on the complex wavelet transform. First, an adaptive saliency detection method based on clustering and spectral dissimilarity is presented to generate a saliency factor indicating the diverse needs of the two kinds of resolution across regions. Then, we combine the nonlinear intensity–hue–saturation transform with multiresolution analysis based on the dual-tree complex wavelet transform so that the two complement each other's advantages. Finally, the saliency factor is employed to control the detail injection in the fusion, helping to satisfy the different needs of different regions. Experiments reveal the validity and advantages of our proposal.

Index Terms— Complex wavelet transform, image fusion, remote sensing, saliency detection, spectral dissimilarity.
I. INTRODUCTION
OWING to limitations of remote sensing systems, a large collection of remote sensors offer satellite imagery with either high spectral diversity (e.g., multispectral (MS) images) or high spatial resolution (e.g., panchromatic (Pan) images). Remote sensing image fusion methods aim at integrating MS images with Pan images, thereby offering new informative images with better spatial and spectral resolution than either sensor alone [1]. Over the past decades, numerous valuable studies have been developed to tackle the issue; they can be generally categorized into two groups: component substitution (CS) methods and multiresolution analysis (MRA) methods. The intensity–hue–saturation (IHS) transform, principal component analysis (PCA) [2],
and generalized IHS (GIHS) [3] are conventional CS-based fusion schemes. Recently, a number of improved algorithms have been presented, such as the adaptive GIHS [4], the band-dependent spatial-detail model (BDSD) [5], the partial replacement adaptive CS method (PRACS) [6], and the matting-model-based method [7].

The advent of MRA offers a powerful new tool for image fusion. The discrete wavelet transform (DWT) and the Laplacian pyramid are classical tools in this group. Since DWT suffers from shift variance and a lack of directionality, several improved modalities, such as the additive wavelet transform (AWT) [8], the à trous wavelet transform (ATWT) [9], and the dual-tree complex wavelet transform (DT-CWT) [10], have been proposed to overcome these drawbacks. In particular, DT-CWT has an advantage over other modalities because of its near shift invariance and rotation invariance.

Previous studies have indicated that demands for spectral and spatial resolution in remote sensing images vary from region to region. Regions with abundant texture and well-defined boundaries (like residential areas and roads) need more spatial details for better visualization. In contrast, regions like farmland and mountains are mainly discriminated by spectral characteristics, and in those cases undistorted spectral diversity holds the key to the validity of further processing. However, most existing methods apply a uniform fusion processing to the whole image without distinguishing the diverse needs of different regions. Motivated by this, we come up with a novel idea: distinguish regions with different needs in the image, and then apply different fusion strategies according to their needs and features. The key step is therefore to find an index for indicating those regions.

Saliency detection originates from research on the visual attention mechanism and aims to extract distinct details or unique parts that draw immediate attention in images. Extracted saliency maps have been introduced into many computer vision applications such as object recognition [11] and adaptive image compression [12]. However, conventional saliency detection methods do not consider the unique features of remote sensing images and cannot work directly in the fusion. Hence, we propose an adaptive saliency detection method for indicating the diverse needs of spatial and spectral resolution in different regions and then present a new saliency-driven fusion method for remote sensing images. The main contributions of this letter are highlighted as follows.
Fig. 1. Flowchart of the proposed method.
1) The key contribution of this letter is a saliency-driven fusion method that fulfills fusion tasks with special consideration of the different demands of different regions on spectral and spatial resolution in remote sensing.
2) We propose an adaptive saliency detection method based on clustering and spectral dissimilarity, designed particularly for indicating the diverse needs of spatial and spectral resolution.
3) We take the extracted saliency map as a saliency factor to drive the fusion framework and make it attuned to subregional needs.
4) We integrate MRA based on DT-CWT with the nonlinear IHS transform, combining the advantages of both MRA-based and CS-based methods in our fusion framework.

II. METHODOLOGY

In this section, we present a new saliency-driven fusion method based on DT-CWT for remote sensing images. Fig. 1 gives an overview of our proposal. The proposed method includes three steps.

Step 1: A feature vector containing spectral and spatial features is used in fuzzy c-means (FCM) clustering to classify pixels into different groups. Then, a saliency factor is generated based on the spectral dissimilarity between the groups.

Step 2: The MS image is projected onto IHS space by the nonlinear IHS transform, yielding the intensity (I), hue (H), and saturation (S) components.

Step 3: After the Pan image is histogram matched to I, both the matched Pan image and I are decomposed by DT-CWT. Then, the saliency factor is used to adjust the coefficient fusion. With the new coefficients generated, the fused image is obtained after backward conversion.

The rest of this section is organized as follows. Section II-A presents the principles of the 2-D DT-CWT. The saliency detection method and the generation of the saliency factor are explained in detail in Sections II-B and II-C. Section II-D demonstrates the whole fusion process.
A. Two-Dimensional Dual-Tree Complex Wavelet Transform

Despite achieving great success in many cases, DWT has proven disappointing in certain applications in which the data are high-dimensional or corrupted by noise. Therefore, improved versions such as AWT and ATWT have been exploited to address the issue. Selesnick et al. [13] further introduced DT-CWT as an enhancement to obtain more compact representations and then extended DT-CWT to 2-D cases. The 2-D DT-CWT not only provides perfect decomposition and reconstruction but also performs excellently in terms of negligible aliasing and directional selectivity. The computational load of the 2-D DT-CWT is also much lower. Integrating direction-selecting ability with good artifact suppression, the 2-D DT-CWT has a promising prospect in remote sensing image processing.

To understand how the 2-D DT-CWT obtains these good properties, we begin with the extension of the 1-D complex wavelet. The 1-D complex wavelet function is as follows:

\[ \psi(t) = \psi_h(t) + j\,\psi_g(t) \quad (1) \]
\[ \psi_g \approx \mathcal{H}\{\psi_h\} \quad (2) \]

where \(\mathcal{H}\) denotes the Hilbert transform, \(\psi_h\) is the real part of the function, and \(\psi_g\) is the imaginary part as well as the Hilbert transform of \(\psi_h\). The corresponding 2-D complex wavelet is obtained as a separable product:

\[ \psi(x, y) = [\phi_h(x) + j\,\phi_g(x)][\varphi_h(y) + j\,\varphi_g(y)] = \phi_h(x)\varphi_h(y) - \phi_g(x)\varphi_g(y) + j\,[\phi_g(x)\varphi_h(y) + \phi_h(x)\varphi_g(y)]. \quad (3) \]

The 2-D DT-CWT implements a row–column operation based on the 1-D wavelet transform, thereby producing the 2-D complex wavelet \(\psi(x, y)\). A 2-D complex wavelet is oriented along six directions (±15°, ±45°, ±75°), which makes it directionally selective. Using six separate basis functions, the 2-D DT-CWT can decompose an image into six high-frequency subbands at each scale, and the low-frequency subband is obtained at the lowest scale. In this letter, we use \(C^h_{\lambda,\theta}\) to denote the high-frequency subband at scale \(\lambda\) and direction \(\theta\), and \(C^l\) refers to the low-frequency subband.
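As an illustration of the decomposition just described, the following is a minimal sketch using the open-source Python dtcwt package (an assumption on our part; the letter does not state which implementation is used). It decomposes a 2-D array into a real low-pass subband and six complex, directionally selective high-pass subbands per scale, matching the \(C^l\) and \(C^h_{\lambda,\theta}\) notation.

```python
import numpy as np
import dtcwt  # open-source dual-tree complex wavelet package (assumed available)

# A small synthetic grayscale image standing in for the intensity component I.
image = np.random.rand(256, 256)

transform = dtcwt.Transform2d()
pyramid = transform.forward(image, nlevels=3)   # depth r = 3, as in the letter

# pyramid.lowpass          -> real low-frequency subband C^l (coarsest scale)
# pyramid.highpasses[s]    -> complex array of shape (H_s, W_s, 6): the six
#                             directional high-frequency subbands at scale s
for level, hp in enumerate(pyramid.highpasses, start=1):
    print(f"scale {level}: {hp.shape[2]} directional subbands of size {hp.shape[:2]}")

# Near-perfect reconstruction (up to numerical precision).
reconstructed = transform.inverse(pyramid)
print("max reconstruction error:", np.abs(reconstructed - image).max())
```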
Fig. 2. Flowchart of the saliency detection method.
B. Adaptive Saliency Detection Based on Clustering

Many traditional saliency detection methods merely evaluate saliency by measuring per-pixel visual importance. However, according to the principles of the human visual system (HVS), people always pay more attention to objects and targets in images rather than to individual pixels [14]. A pixel's saliency can change significantly with its placement and context. Pixel-based saliency detection ignores integral information and is less effective in locating salient regions. Hence, we introduce an adaptive saliency detection method based on clustering, in which we take FCM as a way to mimic how the HVS processes visual stimuli with integral information. Fig. 2 shows the flowchart of our saliency detection method.

As a classical and efficient soft clustering method, FCM outperforms conventional hard clustering algorithms (like K-means and bisecting K-means) in assigning the membership of each pixel when applied to remote sensing images. To obtain a classification suited to image fusion, we exploit the correlation between the MS bands and the Pan image by using spatial along with spectral features for clustering. The vector \(X_i = [X^1_{ms,i}, \ldots, X^M_{ms,i}, X_{pan,i}]\) is the feature vector of pixel \(i\), where \(M\) is the number of MS bands, \(X^b_{ms,i}\) represents the \(b\)th band of the MS image, \(b \in (1, M)\), and \(X_{pan,i}\) refers to the intensity value in the Pan image. In this letter, \(M = 3\).

In FCM, the objective function \(J\) is designed as follows to determine the membership of each pixel:

\[ J = \sum_{j=1}^{k} \frac{1}{n_j} \sum_{i \in G_j} u_{ij}^{m} \cdot \mathrm{dist}(X_i, C_j) \quad (4) \]

where \(m\) denotes the fuzzy coefficient weighting the impact of membership on the clustering results, \(k\) indicates the predefined number of clusters, \(G_j\) is the \(j\)th cluster with centroid \(C_j\), containing \(n_j\) pixels, and \(\mathrm{dist}(X_i, C_j)\) represents the Euclidean distance between \(X_i\) and \(C_j\). In this letter, we set \(m = 2\) and \(k = 10\). Then, we can obtain the membership degree \(u_{ij}\) and the cluster center \(C_j\) by the following equations:

\[ u_{ij} = \frac{1}{\displaystyle\sum_{p=1}^{k} \left( \frac{\mathrm{dist}(X_i, C_j)}{\mathrm{dist}(X_i, C_p)} \right)^{2/(m-1)}} \quad (5) \]

\[ C_j = \frac{\displaystyle\sum_{i=1}^{N} u_{ij}^{m} X_i}{\displaystyle\sum_{i=1}^{N} u_{ij}^{m}} \quad (6) \]

In order to reduce the computational time, the clustering centers and memberships are initialized by K-means. Finally, pixel \(i\) in the image belongs to cluster \(G_p\), as (7) shows. Fig. 3 shows saliency detection results of our proposal.

\[ p = \arg\max_{j \in (1, k)} \{u_{ij}\} \quad (7) \]

Fig. 3. Examples of saliency detection results. (a) MS image. (b) Saliency map (yellow indicates a higher SF; blue, a lower one). (c) Enlarged view of residential areas. (d) Enlarged view of roads and farmland.
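The clustering step can be sketched with a small self-contained NumPy implementation of FCM over the per-pixel feature vectors \(X_i\) (three MS bands plus the Pan intensity). This is only an illustrative sketch under the stated parameters (m = 2, k = 10), not the authors' code; the variable names are our own, and the centers are initialized here from randomly chosen pixels rather than by K-means.

```python
import numpy as np

def fcm(X, k=10, m=2.0, n_iter=50, seed=0):
    """Plain fuzzy c-means over row-wise feature vectors X of shape (N, D).

    Returns membership matrix U of shape (N, k) and centers C of shape (k, D).
    """
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    # Simple initialization from randomly chosen pixels
    # (the letter initializes centers and memberships by K-means instead).
    C = X[rng.choice(N, size=k, replace=False)]
    for _ in range(n_iter):
        # Euclidean distances to every center; small epsilon avoids division by zero.
        d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2) + 1e-12
        # Membership update, eq. (5): u_ij = 1 / sum_p (d_ij / d_ip)^(2/(m-1)).
        ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
        U = 1.0 / ratio.sum(axis=2)
        # Center update, eq. (6).
        W = U ** m
        C = (W.T @ X) / W.sum(axis=0)[:, None]
    return U, C

# Toy data standing in for an MS + Pan scene (M = 3 bands plus the Pan intensity).
rng = np.random.default_rng(1)
ms = rng.random((64, 64, 3))
pan = rng.random((64, 64))
X = np.concatenate([ms.reshape(-1, 3), pan.reshape(-1, 1)], axis=1)

U, C = fcm(X, k=10, m=2.0)
labels = U.argmax(axis=1).reshape(64, 64)   # hard assignment, eq. (7)
```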
C. Generation of Saliency Factor Based on Spectral Dissimilarity

In the HVS, color always works as an important cue for distinguishing diverse scenes. It is also widely believed that the HVS is more sensitive to high-contrast stimuli or differences than to absolute values within receptive fields [15]. In remote sensing images, we notice that salient regions usually stand out because of their unique colors in contrast to the background, and this color contrast theoretically corresponds to spectral dissimilarity in MS images. In addition, since spectral distortion can be evaluated by calculating spectral differences between fused images and original images, it is feasible to take such differences as a measure of whether more detail injection is preferable. Thus, we generate the saliency factor SF for every group by measuring the spectral dissimilarity between different clusters in the image. In this letter, the spectral dissimilarity SD for group \(G_p\) is estimated as the weighted sum of the spectral distances to the other clusters:

\[ \mathrm{SD}(G_p) = \frac{1}{n_p} \sum_{j \neq p} n_j \cdot \varphi_{pj} \quad (8) \]

\[ \varphi_{pj} = \exp\!\left( -\frac{\mathrm{dist}(C_p, C_j)}{\sigma^2} \right) \quad (9) \]

where \(\sigma\) is a parameter controlling the influence of the spectral dissimilarity and is set to 0.4. For each pixel \(i\) in cluster \(G_p\), SF is defined as follows:

\[ \mathrm{SF}(i) = \mathrm{norm}(\mathrm{SD}(G_p)) \quad (10) \]

where \(\mathrm{norm}(\cdot)\) normalizes the input to \([0, 1]\).
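Continuing the sketch above, the saliency factor can be derived from the clustering output as follows. The NumPy code mirrors (8)-(10) under our reading of the formulas (the weight symbol and the normalization of the weighted sum are reconstructed from the text); it is a sketch, not the authors' implementation.

```python
import numpy as np

def saliency_factor(labels, centers, sigma=0.4):
    """Per-pixel saliency factor from cluster labels and centers, eqs. (8)-(10).

    labels  : (H, W) integer cluster index per pixel
    centers : (k, D) cluster centroids C_j in feature space
    """
    k = centers.shape[0]
    counts = np.bincount(labels.ravel(), minlength=k).astype(float)  # n_j

    # Pairwise center distances and weights, eq. (9).
    dist = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    phi = np.exp(-dist / sigma ** 2)

    # Spectral dissimilarity SD(G_p): weighted sum over the other clusters, eq. (8).
    sd = np.empty(k)
    for p in range(k):
        others = np.arange(k) != p
        sd[p] = np.sum(counts[others] * phi[p, others]) / max(counts[p], 1.0)

    # Normalize to [0, 1] and broadcast back to pixels, eq. (10).
    sd = (sd - sd.min()) / (sd.max() - sd.min() + 1e-12)
    return sd[labels]

# Toy usage with the kind of output produced by the FCM sketch above.
rng = np.random.default_rng(2)
labels = rng.integers(0, 10, size=(64, 64))
centers = rng.random((10, 4))
sf = saliency_factor(labels, centers)        # values in [0, 1], one per pixel
```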
D. Saliency-Driven Image Fusion
In this section, we integrate MRA based on DT-CWT with the nonlinear IHS transform, combining the advantages of both MRA-based and CS-based methods in our fusion framework. The saliency factor obtained in Section II-C is exploited
here to drive the fusion framework. The MS image is first projected onto IHS space by the nonlinear IHS transform, from which we obtain the intensity (I), hue (H), and saturation (S) of the MS image. To reduce spectral distortion, we carry out a histogram matching of the Pan image to the I component, so that the mean and standard deviation of the Pan image and the I component are restricted to the same range:

\[ \mathrm{PAN\_HM} = \frac{\sigma_I}{\sigma_{\mathrm{Pan}}} \, (\mathrm{Pan} - \mu_{\mathrm{Pan}}) + \mu_I \quad (11) \]

where \(\sigma_{\mathrm{Pan}}\) and \(\sigma_I\) refer to the standard deviations of the Pan image and the I component, respectively, and \(\mu_{\mathrm{Pan}}\) and \(\mu_I\) refer to their average values. After that, both the matched Pan image and the I component are decomposed by the 2-D DT-CWT with a depth equal to \(r\), generating high-frequency and low-frequency coefficients, respectively. In this letter, we set \(r = 3\):

\[ \{C^h_{\lambda,\theta}, C^l\}_{\mathrm{PAN\_HM}} = \text{2-D DT-CWT}(\mathrm{PAN\_HM}) \quad (12) \]

\[ \{C^h_{\lambda,\theta}, C^l\}_{I} = \text{2-D DT-CWT}(I). \quad (13) \]
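A minimal sketch of this matching and decomposition step is given below. The dtcwt package and all variable names are assumptions on our part, consistent with the sketch in Section II-A.

```python
import numpy as np
import dtcwt  # assumed implementation of the 2-D DT-CWT

def histogram_match(pan, intensity):
    """Match the global mean/std of the Pan image to the I component, eq. (11)."""
    return (intensity.std() / pan.std()) * (pan - pan.mean()) + intensity.mean()

# Synthetic stand-ins for the Pan image and the intensity component I.
rng = np.random.default_rng(3)
pan = rng.random((256, 256))
intensity = 0.5 * rng.random((256, 256)) + 0.2

pan_hm = histogram_match(pan, intensity)

transform = dtcwt.Transform2d()
pyr_pan = transform.forward(pan_hm, nlevels=3)     # {C^h, C^l}_PAN_HM, eq. (12)
pyr_i = transform.forward(intensity, nlevels=3)    # {C^h, C^l}_I,      eq. (13)
```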
Fig. 4. Fusion results for SPOT-5 scene (1024 × 1024). (a) Degraded MS image. (b) Pan image. (c) GIHS. (d) PCA. (e) AWLP. (f) BDSD. (g) PRACS. (h) ATWT-CE. (i) MORPH. (j) Our proposal.
Subsequently, the saliency factor SF is used during the fusion of the high-frequency coefficients, controlling the proper injection of spatial details, while the low-frequency coefficients of the I component are kept unchanged:

\[ \{C^h_{\lambda,\theta}\}_{\mathrm{Fused}} = (1 - \mathrm{SF}) \times \big( \alpha \times \{C^h_{\lambda,\theta}\}_{\mathrm{PAN\_HM}} + \beta \times \{C^h_{\lambda,\theta}\}_{I} \big) + \mathrm{SF} \times \{C^h_{\lambda,\theta}\}_{\mathrm{PAN\_HM}} \quad (14) \]

\[ \{C^l\}_{\mathrm{Fused}} = \{C^l\}_{I} \quad (15) \]
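The saliency-driven coefficient fusion of (14) and (15) can be sketched as below. This is an illustrative NumPy/dtcwt sketch under our reading of (14): SF is downsampled to each subband's grid, and α and β are placeholder values, since the letter selects them experimentally per sensor. It is not the authors' implementation.

```python
import numpy as np
import dtcwt  # assumed implementation of the 2-D DT-CWT

def fuse_pyramids(pyr_pan, pyr_i, sf, alpha=1.0, beta=0.5):
    """Saliency-driven fusion of DT-CWT coefficients, eqs. (14)-(15).

    pyr_pan, pyr_i : dtcwt Pyramid objects of the matched Pan image and I
    sf             : (H, W) saliency factor in [0, 1] at full resolution
    alpha, beta    : placeholder sensor-dependent constants
    """
    fused_high = []
    for hp_pan, hp_i in zip(pyr_pan.highpasses, pyr_i.highpasses):
        # Nearest-neighbour downsampling of SF to this subband's grid.
        h, w = hp_pan.shape[:2]
        rows = np.arange(h) * sf.shape[0] // h
        cols = np.arange(w) * sf.shape[1] // w
        sf_sub = sf[np.ix_(rows, cols)][:, :, None]    # broadcast over 6 directions

        # Eq. (14): full Pan detail where SF is high, weighted mix elsewhere.
        fused = (1 - sf_sub) * (alpha * hp_pan + beta * hp_i) + sf_sub * hp_pan
        fused_high.append(fused)

    # Eq. (15): keep the low-frequency coefficients of I unchanged.
    return dtcwt.Pyramid(pyr_i.lowpass, tuple(fused_high))

# Toy usage: random stand-ins for the matched Pan image, I component, and SF.
rng = np.random.default_rng(4)
pan_hm, intensity = rng.random((256, 256)), rng.random((256, 256))
sf = rng.random((256, 256))

transform = dtcwt.Transform2d()
fused_pyr = fuse_pyramids(transform.forward(pan_hm, nlevels=3),
                          transform.forward(intensity, nlevels=3), sf)
fused_intensity = transform.inverse(fused_pyr)   # new I before backward IHS conversion
```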
where \(\alpha\) and \(\beta\) are two constants that depend on the satellite sensor and are selected through experiments. Finally, after multiresolution reconstruction and backward conversion, we obtain the fused image.

III. EXPERIMENTAL RESULTS

To verify the performance of our proposal, we used registered MS and Pan images for the experiments and selected two typical scenes to show visual and quantitative performance. The first scene is a suburb of Beijing, China, provided by the SPOT-5 satellite sensor, at 2.5-m Pan and 10-m MS resolution. The second one is captured by the GeoEye-1 satellite sensor in Inakadate, Japan, at 0.46-m Pan and 1.84-m MS resolution. Because absolute references do not exist, we spatially degrade the original images to obtain MS images with lower resolution and take the original ones as references. We compare our proposal with seven competing algorithms: GIHS [3], PCA [2], AWLP [8], BDSD [5], PRACS [6], ATWT-CE [9], and MORPH [16].

A. Visual Assessment

The visual performance of the algorithms mentioned above on the SPOT-5 and GeoEye-1 data is shown in Figs. 4 and 5. We can observe that the PCA method, as well as the MORPH method, produces apparent spatial artifacts. The GIHS method suffers from apparent spectral distortion, as the colors of roofs and farmland are obviously altered in the fused images. The visual results also evidence the insufficient spatial injection of AWLP,
Fig. 5. Fusion results for GeoEye-1 scene (512 × 512). (a) Degraded MS image. (b) Pan image. (c) GIHS. (d) PCA. (e) AWLP. (f) BDSD. (g) PRACS. (h) ATWT-CE. (i) MORPH. (j) Our proposal.
PRACS, and ATWT-CE. Our proposal produces fused images in which the colors of farmland are closer to the original image and the details of residential areas are significantly improved.

B. Quantitative Assessment

For the quantitative assessment, we compute various performance indices for the algorithms. Four criteria with references, namely the universal image quality index (UIQI), the correlation coefficient (CC), the spectral angle mapper (SAM), and the relative dimensionless global error in synthesis (ERGAS), are utilized in this assessment [17]. In addition, the quality with no reference (QNR) index [18] works as a joint evaluation of spectral and spatial quality for the overall assessment. All the metrics are reported in Tables I and II, with optimal values displayed in the second line. The best performance for each metric is shown in bold, indicating the superior ability of our proposal. Both the visual and quantitative assessments reveal that our proposal not only considers the varying needs of diverse regions in images but also achieves excellent performance in improving spatial resolution and maintaining spectral quality.
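To make the spectral metrics concrete, here is a small NumPy sketch of SAM and ERGAS as we understand their standard definitions. This is not the evaluation code of [17]; the resolution ratio and band handling are our own assumptions.

```python
import numpy as np

def sam_degrees(reference, fused):
    """Mean spectral angle mapper (SAM) in degrees between two (H, W, B) images."""
    dot = np.sum(reference * fused, axis=2)
    norms = np.linalg.norm(reference, axis=2) * np.linalg.norm(fused, axis=2) + 1e-12
    angles = np.arccos(np.clip(dot / norms, -1.0, 1.0))
    return np.degrees(angles.mean())

def ergas(reference, fused, ratio=0.25):
    """Relative dimensionless global error in synthesis (ERGAS).

    ratio is the Pan/MS pixel-size ratio, e.g. 2.5 m / 10 m = 0.25 for SPOT-5.
    """
    rmse = np.sqrt(np.mean((reference - fused) ** 2, axis=(0, 1)))   # per-band RMSE
    means = reference.mean(axis=(0, 1)) + 1e-12                       # per-band mean
    return 100.0 * ratio * np.sqrt(np.mean((rmse / means) ** 2))

# Toy usage with synthetic images standing in for reference and fused MS data.
rng = np.random.default_rng(5)
ref = rng.random((128, 128, 3))
fus = ref + 0.01 * rng.standard_normal(ref.shape)
print(f"SAM = {sam_degrees(ref, fus):.3f} deg, ERGAS = {ergas(ref, fus):.3f}")
```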
TABLE I: QUANTITATIVE EVALUATION FOR THE SPOT-5 SCENE

TABLE II: QUANTITATIVE EVALUATION FOR THE GEOEYE-1 SCENE

In order to analyze the effectiveness of the saliency-based rule used in the fusion of the wavelet coefficients, we also compared it with two traditional rules, the maximum rule and the average rule [19]. We select the results for the GeoEye-1 scene for exhibition in Table III. We can observe that the maximum rule sharpens the image excessively, leading to severe spectral distortion, while the average rule exhibits inadequate spatial injection. The saliency-based rule provides fused images with well-preserved spectral information as well as a sufficient improvement in spatial quality. Overall, the saliency-based strategy is more competitive and effective than the two traditional rules.

TABLE III: QUANTITATIVE EVALUATION OF DIFFERENT RULES FOR THE GEOEYE-1 SCENE

IV. CONCLUSION

In this letter, a new saliency-driven image fusion method based on the complex wavelet transform is proposed for remote sensing images to satisfy the different needs of spatial and spectral resolution in different regions. The key step is to discriminate diverse regions with different needs of the two kinds of resolution. Hence, we introduce an adaptive saliency detection method to generate a saliency factor indicating the diverse needs of spatial and spectral resolution in different regions. Then, we integrate MRA based on DT-CWT with the nonlinear IHS transform, combining the advantages of both MRA-based and CS-based methods in our fusion framework. The saliency factor is exploited to drive the fusion framework. The visual and quantitative evaluations demonstrate that our proposal provides better performance in improving spatial resolution, maintaining spectral quality, and particularly meeting the varying needs of diverse regions.

REFERENCES
[1] H. Ghassemian, “A review of remote sensing image fusion methods,” Inf. Fusion, vol. 32, pp. 75–89, Nov. 2016. [2] P. Chavez, Jr., S. C. Sides, and J. A. Anderson, “Comparison of three different methods to merge multiresolution and multispectral data: Landsat TM and SPOT panchromatic,” Photogramm. Eng. Remote Sens., vol. 57, no. 3, pp. 295–303, Mar. 1991. [3] T.-M. Tu, S.-C. Su, H.-C. Shyu, and P. S. Huang, “A new look at IHS-like image fusion methods,” Inf. Fusion, vol. 2, no. 3, pp. 177–186, Sep. 2001. [4] S. Rahmani, M. Strait, D. Merkurjev, M. Moeller, and T. Wittman, “An adaptive IHS pan-Sharpening method,” IEEE Geosci. Remote Sens. Lett., vol. 7, no. 4, pp. 746–750, Oct. 2010. [5] A. Garzelli, F. Nencini, and L. Capobianco, “Optimal MMSE pan sharpening of very high resolution multispectral images,” IEEE Trans. Geosci. Remote Sens., vol. 46, no. 1, pp. 228–236, Jan. 2008. [6] J. Choi, K. Yu, and Y. Kim, “A new adaptive component-substitutionbased satellite image fusion by using partial replacement,” IEEE Trans. Geosci. Remote Sens., vol. 49, no. 1, pp. 295–309, Jan. 2011. [7] X. Kang, S. Li, and J. A. Benediktsson, “Pansharpening with matting model,” IEEE Trans. Geosci. Remote Sens., vol. 52, no. 8, pp. 5088–5099, Aug. 2014. [8] X. Otazu, M. Gonzalez-Audicana, O. Fors, and J. Nunez, “Introduction of sensor spectral response into image fusion methods. Application to wavelet-based methods,” IEEE Trans. Geosci. Remote Sens., vol. 43, no. 10, pp. 2376–2385, Oct. 2005. [9] G. Vivone, R. Restaino, M. Dalla Mura, G. Licciardi, and J. Chanussot, “Contrast and error-based fusion schemes for multispectral image pansharpening,” IEEE Geosci. Remote Sens. Lett., vol. 11, no. 5, pp. 930–934, May 2014. [10] P. Hill, M. E. Al-Mualla, and D. Bull, “Perceptual image fusion using wavelets,” IEEE Trans. Image Process., vol. 26, no. 3, pp. 1076–1088, Mar. 2017. [11] L. Zhang, A. Li, Z. Zhang, and K. Yang, “Global and local saliency analysis for the extraction of residential areas in high-spatial-resolution remote sensing image,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 7, pp. 3750–3763, Jul. 2016. [12] L. Zhang, J. Chen, and B. Qiu, “Region-of-interest coding based on saliency detection and directional wavelet for remote sensing images,” IEEE Geosci. Remote Sens. Lett., vol. 14, no. 1, pp. 23–27, Jan. 2017. [13] I. W. Selesnick, R. G. Baraniuk, and N. C. Kingsbury, “The dual-tree complex wavelet transform,” IEEE Signal Process. Mag., vol. 22, no. 6, pp. 123–151, Nov. 2005. [14] M. Jian, K.-M. Lam, J. Dong, and L. Shen, “Visual-patch-attentionaware saliency detection,” IEEE Trans. Cybern., vol. 45, no. 8, pp. 1575–1586, Aug. 2015. [15] Y. Dong, M. T. Pourazad, and P. Nasiopoulos, “Human visual systembased saliency detection for high dynamic range content,” IEEE Trans. Multimedia, vol. 18, no. 4, pp. 549–562, Apr. 2016. [16] R. Restaino, G. Vivone, M. Dalla Mura, and J. Chanussot, “Fusion of multispectral and panchromatic images based on morphological operators,” IEEE Trans. Image Process., vol. 25, no. 6, pp. 2882–2895, Jun. 2016. [17] F. Palsson, J. R. Sveinsson, M. O. Ulfarsson, and J. A. Benediktsson, “Quantitative quality evaluation of pansharpened imagery: Consistency versus synthesis,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 3, pp. 1247–1259, Mar. 2016. [18] L. Alparone, B. Aiazzi, S. Baronti, A. Garzelli, F. Nencini, and M. Selva, “Multispectral and panchromatic data fusion assessment without reference,” Photogramm. Eng. Remote Sens., vol. 74, no. 2, pp. 193–200, Feb. 2008. [19] S. 
Li, X. Kang, L. Fang, J. Hu, and H. Yin, “Pixel-level image fusion: A survey of the state of the art,” Inf. Fusion, vol. 33, pp. 100–112, Jan. 2017.