Sensors — Article
Hierarchical Sparse Learning with Spectral-Spatial Information for Hyperspectral Imagery Denoising
Shuai Liu 1,2,*, Licheng Jiao 1,2 and Shuyuan Yang 1,2
1 Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, Xidian University, Xi’an 710071, China; [email protected] (L.J.); [email protected] (S.Y.)
2 Joint International Research Laboratory of Intelligent Perception and Computation, Xidian University, Xi’an 710071, China
* Correspondence: [email protected]; Tel.: +86-185-919-86967
Academic Editor: Jonathan Li Received: 20 July 2016; Accepted: 9 October 2016; Published: 17 October 2016
Abstract: During the acquisition process, hyperspectral images (HSI) are inevitably corrupted by various noises, which greatly degrade their visual quality and subsequent applications. In this paper, a novel Bayesian approach integrating hierarchical sparse learning and spectral-spatial information is proposed for HSI denoising. Based on structure correlations, spectral bands with similar and continuous features are segmented into the same band-subset. To exploit local similarity, each subset is then divided into overlapping cubic patches. Each patch can be regarded as consisting of a clean image component, a Gaussian noise component and a sparse noise component. The first term is depicted by a linear combination of dictionary elements, where a Gaussian process with a Gamma distribution is applied to impose spatial consistency on the dictionary. The last two terms are utilized to fully depict the noise characteristics. Furthermore, the sparseness of the model is adaptively manifested through the Beta-Bernoulli process. Inferred by a Gibbs sampler, the proposed model can directly predict the noise and the dictionary without prior information about the degraded HSI. Experimental results on both synthetic and real HSI demonstrate that the proposed approach suppresses the various existing noises and preserves the structural and spectral-spatial information better than the compared state-of-the-art approaches.

Keywords: hierarchical sparse learning; denoising; spectral-spatial information; hyperspectral images
Sensors 2016, 16, 1718; doi:10.3390/s16101718; www.mdpi.com/journal/sensors

1. Introduction

As three-dimensional (3D) data integrating both spectral and spatial information, hyperspectral imagery (HSI) can provide reliable and accurate information about observed objects. It has numerous applications in remote sensing, diagnostic medicine and mineralogy, etc., with encouraging results [1–3]. HSI can be considered as a set of gray-scale images over wavelength, whose entries are the spectral responses. Due to limited light, the imaging system, photon effects and calibration error, hyperspectral images are inevitably contaminated by annoying noises with different statistical properties [4,5]. The existing noises greatly reduce the quality of the HSI and increase the difficulty of subsequent processing, such as target detection, agriculture assessment, mineral exploitation and ground-cover classification. Therefore, noise reduction is an essential research topic in hyperspectral image analysis.

Over the past two decades, various approaches have been developed for the noise reduction of HSI, which aim to restore the original HSI from its noisy images as accurately as possible. Traditional denoising algorithms in the two-dimensional (2D) image domain have been widely utilized to remove noise from HSI [6–10], such as the wavelet transform, total variation, nonlocal means (NLM), K-singular value decomposition (KSVD) and block matching three-dimensional filtering (BM3D). These methods denoise the HSI band by band and destroy the latent high-dimensional structure of the HSI, which results in a great loss of spectral correlations. Especially when the spectral bands are contaminated by heavy noise, these approaches usually cannot effectively restore the HSI. Lam et al. [4] showed that spectral-domain statistics can help improve the quality of the restored HSI. In this regard, some techniques treat HSI as a 3D data cube rather than a 2D picture, which can fully exploit the correlations in the spectral domain. In [11,12], kernel nonnegative Tucker decomposition and spatial-spectral wavelet shrinkage are proposed for denoising HSI. NLM techniques based on 3D cubes are applied for HSI recovery by exploiting the high structural sparseness/redundancy [13,14]. Total variation algorithms have proved their effectiveness in preserving edge information when denoising HSI. In [15], the cubic total variation is developed to denoise HSI, which is achieved by exploring the 2D total variation in the spatial dimension and the 1D total variation in the spectral domain. Using spectral noise differences and spatial information differences, a spectral-spatial adaptive total variation model is proposed [16]. Nonlocal models integrated with adaptive multidimensional total variation are shown in [17]. Noise reduction based on tensor factorization is given in [18]. Block matching 4D (BM4D) filtering is considered in [19], which reduces the noise in volumetric data by grouping similar cubes and exploring the collaborative filtering paradigm. By solving a Bayesian least squares optimization problem, Monte Carlo sampling has been used to estimate the posterior probability for denoising HSI [20]. In [21], an algorithm employing the hybrid conditional random field model has been introduced.
To improve the denoising performance, a metric-Q-weighted fusion algorithm is proposed to merge the denoising results of both the spatial and spectral views [22]. Notably, works based on sparse dictionary learning have proven their efficacy and popularity for image recovery. Their success can be attributed to the fact that the valid data in corrupted images are intrinsically sparse while the noises are uniformly spread in many domains [23]; based on this viewpoint, the noise can be successfully reduced. 3D sparse coding is exploited to denoise HSI in [24], which fully explores the spectral information by extracting different patches and obtains competitive results. In [25], dictionary learning in a Bayesian manner is applied to recover the hyperspectral cube. Exploiting the intraband structure and the interband correlation, a joint spectral-spatial distributed sparse representation is proposed for noise reduction of HSI [26]. Sparse nonnegative matrix factorization combined with spectral-spatial information has also been implemented to reduce the noise of HSI [27]. Under a new linear model exploring singular value decomposition and the wavelet transform, HSI denoising is achieved via sparse regularization [28]. In HSI, adjacent spectral bands and spatial pixels are typically highly correlated, which gives HSI its low-rank characteristics [29–32]. Using the low-rank matrix recovery framework, an HSI restoration technique (LRMR) is explored to simultaneously remove various noises in HSI [31]. Zhu et al. presented an HSI mixed-noise removal method [32], which achieves promising image quality by combining a low-rank constraint with total variation regularization. Deep learning has been widely used in HSI analysis [33,34]. Motivated by this, a deep dictionary learning method (DDL3+FT), consisting of a hierarchical dictionary, feature denoising and fine-tuning, was developed to effectively suppress the noise in HSI [34].
However, most denoising methods are applied globally to the whole HSI, or are utilized on specific spectral bands obtained by partitioning the spectra with a fixed value; they thus discard the high correlations in the spectral domain. Additionally, real-world HSI are usually contaminated by signal-independent noise, signal-dependent noise components, or a mixture of them, e.g., Gaussian noise, Poisson noise, dead pixel lines and stripes. The dead pixel lines and stripes only exist in some pixels/bands, and they can be regarded as sparse noises [31]. Most of the studies above focus on suppressing only one or two kinds of specific image noise, such as Gaussian noise with constant variance across the bands or Poisson noise, which cannot exactly depict the noise characteristics of real-world HSI.
To address the previous limitations, this paper integrates spectral-spatial information into a hierarchical dictionary model to efficiently suppress the existing noises in HSI. The contributions of this paper can be summarized as follows:

(1) First, we present a spectral-spatial data extraction approach for HSI based on their high structural correlations in the spectral domain. Using this approach, spectral bands with similar and continuous features can be adaptively segmented into the same band-subset.
(2) Second, a hierarchical dictionary model is organized by prior distributions and hyper-prior distributions to deeply illustrate the noisy HSI. It depicts the noiseless data by utilizing the dictionary, in which spatial consistency is exploited by a Gaussian process. Meanwhile, by decomposing the noise term into a Gaussian noise term and a sparse noise term, the proposed method can well represent the intrinsic noise properties.
(3) Last but not least, many experiments performed under different conditions are displayed. Compared with other state-of-the-art denoising approaches, the suggested method shows superior performance in suppressing various noises, including Gaussian noise, Poisson noise, dead pixel lines and stripes.
The remainder of this paper is organized as follows: the denoising framework for 2D images based on sparse learning is presented in Section 2. The spatial-spectral data extraction and the proposed hierarchical sparse learning model, along with the Bayesian inference implemented by a Gibbs sampler, are explained in detail in Section 3. Experimental results on both synthetic and real-world hyperspectral images are reported in Section 4, with comparisons between the suggested method and the state-of-the-art approaches. Concluding remarks are given in Section 5.

2. Sparse Learning for Denoising 2D Images

Let X represent a 2D noisy image of size T1 × T2. Common restoration algorithms assume that pixels extracted from the image are, to some degree, similar to pixels extracted from their neighboring region. In this sense, observed images are often divided into many 2D patches before restoration. By transforming the i-th patch of X into the column vector x_i, the commonly used image degradation framework can be formulated by Equation (1):

x_i = D a_i + N    (1)
The first term on the right side of the equation represents the clean component of the i-th patch, where D is the underlying dictionary and a_i are the sparse coefficients with respect to the dictionary elements. The second part N defines the zero-mean Gaussian noise component with standard deviation σ. Generally, the noiseless signals can be estimated by solving the l_p-minimization problem given by Equation (2):

{D̂, â_i} = arg min_{D, a_i} ||x_i − D a_i||_2^2 + λ ||a_i||_p    (2)

In Equation (2), ||x_i − D a_i||_2^2 is the data fidelity term, which stands for the fidelity between the observed noisy image and the original clean image, and ||a_i||_p is the regularization term, which encodes the prior knowledge of noiseless images. λ ≥ 0 is a regularization parameter, which trades off between the fidelity term and the sparseness of the coefficients; a larger λ usually implies a much sparser solution of a. Denoising approaches aim to restore the noise-free patch x̂_i from the degraded data x_i by computing x̂_i = D â_i. The sparsity exponent p is usually set to zero or one. When p = 0, ||a_i||_0 counts the nonzero elements of a_i; when p = 1, ||a_i||_1 is the sum of the elements' absolute values. The l_0 norm is not convex and cannot guarantee the uniqueness of the solution; moreover, l_0-minimization is in general an NP-hard combinatorial search problem, which is usually solved by greedy algorithms. The l_1 norm is a convex function, and uniqueness of the solution can be guaranteed by replacing the l_0 norm with the l_1 norm [35]. Traditional convex optimization methods solve the problem by alternating minimization over D and a_i.
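As a concrete illustration, the l_1 variant of Equation (2) with a fixed dictionary can be solved by iterative shrinkage-thresholding (ISTA). The Python/NumPy sketch below minimizes the standard Lasso-style objective (1/2)||x − Da||_2^2 + λ||a||_1; the dictionary, sizes and parameter values are illustrative, not taken from the paper:

```python
import numpy as np

def soft_threshold(v, t):
    """Element-wise soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(x, D, lam, n_iter=500):
    """Solve min_a 0.5*||x - D a||_2^2 + lam*||a||_1 for a fixed dictionary D."""
    a = np.zeros(D.shape[1])
    L = np.linalg.norm(D, 2) ** 2           # Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)            # gradient of the fidelity term
        a = soft_threshold(a - grad / L, lam / L)
    return a

# Toy check: a patch built from 2 of 8 atoms plus light noise is recovered sparsely.
rng = np.random.default_rng(0)
D = rng.standard_normal((16, 8))
D /= np.linalg.norm(D, axis=0)              # unit-norm atoms
a_true = np.zeros(8)
a_true[[1, 5]] = [2.0, -1.5]
x = D @ a_true + 0.01 * rng.standard_normal(16)
a_hat = ista(x, D, lam=0.1)
x_hat = D @ a_hat                            # denoised patch estimate
```

The soft-thresholding step is what enforces sparsity; greedy pursuits such as OMP play the analogous role for the l_0 formulation.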
According to the type of dictionary basis, methods can be broadly classified into two groups [36]: (1) transformation-based methods, in which a mathematical model is developed to predict a dictionary, e.g., the discrete cosine transformation (DCT), wavelet, curvelet, contourlet and shearlet; (2) learning-based methods [9,37], in which learning techniques are utilized to construct the dictionary from the training data. The learned dictionary has shown better performance in many applications. Once D and â_i are given, the ideal image can be obtained as x̂_i = D â_i. Bayesian methods have been widely used for learning the sparse dictionary [25,38], and have achieved competitive restoration performance.
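As a sketch of the transformation-based route, an overcomplete 2D DCT dictionary (a common analytic choice, and a typical initialization for K-SVD) can be constructed directly; the sizes below are illustrative:

```python
import numpy as np

def overcomplete_dct_dictionary(patch_size=8, n_atoms_1d=11):
    """Build an overcomplete 2D DCT dictionary of size (patch_size^2) x (n_atoms_1d^2)
    with unit-norm columns, as commonly used to initialize dictionary learning."""
    D1 = np.zeros((patch_size, n_atoms_1d))
    for k in range(n_atoms_1d):
        v = np.cos(np.arange(patch_size) * k * np.pi / n_atoms_1d)
        if k > 0:
            v -= v.mean()                 # remove the DC component from non-constant atoms
        D1[:, k] = v / np.linalg.norm(v)
    # 2D atoms are separable products of 1D atoms (Kronecker structure)
    return np.kron(D1, D1)

D = overcomplete_dct_dictionary()
# 64-dimensional patches, 121 atoms -> overcomplete, unit-norm columns
```

A learning-based method would instead start from such a dictionary and update its atoms on training patches.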
All these conventional methods have achieved good performance on denoising 2D images, but for more complicated pictures, e.g., hyperspectral images, the conventional 2D methods usually have less ability to explain the structural or textural details, which significantly limits the representational ability and the range of application of dictionary learning. In this case, hierarchical dictionary learning methods have been introduced to deeply exploit the latent characteristics of images [37–39], with promising results.
3. HSI Denoising with Spectral-Spatial Information and Hierarchical Sparse Learning

To restore the HSI well, we extend the denoising framework for 2D images into a hierarchical dictionary learning framework in a Bayesian manner. Let Y be an HSI with a size of l_m × l_n × l_λ, where l_m × l_n represents the number of spatial pixels and l_λ defines the size of the spectral domain. Generally, the spectral correlations of HSI are of more importance than the spatial characteristics, and they have recently been reported to enhance performance in tracking, classification and target detection, etc. The proposed method consists of two main stages: spatial-spectral data extraction and noise-free estimation based on hierarchical sparse learning. Firstly, according to the structure similarity index measure (SSIM), a band-subset partition is introduced to segment the HSI into multiple band-subsets (or a single one). Each band-subset consists of continuous bands with high structure correlation. Then a local homogeneous region can be extracted for each spectral pixel. Secondly, a hierarchical sparse learning model, which is composed of a clean image term, a Gaussian noise term and a sparse noise term, is constructed to suppress the various noises. To effectively capture the latent spatial information of HSI, a Gaussian process with Gamma constraints is imposed on the dictionary in the clean image term. The second and the third terms are used to infer the statistical characteristics of the existing noises, e.g., Gaussian noise, Poisson noise, dead pixel lines, stripes or a mixture of them. By solving this framework in a Bayesian manner, the restored results for each band-subset can be obtained. The framework of the proposed method is shown in Figure 1.
[Figure 1 flowchart: a noisy hyperspectral image first undergoes spatial-spectral data extraction (spectral correlation calculation between adjacent bands, local segmentation point detection, non-overlapped band-subset generation, and overlapped cubic patch partition for each band-subset); each band-subset is then denoised by hierarchical sparse learning (hierarchical model construction and Bayesian inference for each band-subset), yielding the denoised hyperspectral image.]

Figure 1. Framework of the proposed approach.
3.1. Spatial-Spectral Data Extraction

In HSI, neighboring bands are acquired under relatively similar sensor conditions, and they are strongly correlated in the spectral domain. Based on this, SSIM is utilized to measure the spectral correlations between adjacent bands [40]. Suppose B_j and B_{j+1} represent the 2D images lying in the j-th and (j+1)-th bands, respectively. The structure similarity between the j-th and (j+1)-th bands can be defined by Equation (3):

SSIM(B_j, B_{j+1}) = [(2 μ_{B_j} μ_{B_{j+1}} + c_1)(2 σ_{B_j B_{j+1}} + c_2)] / [(μ_{B_j}^2 + μ_{B_{j+1}}^2 + c_1)(σ_{B_j}^2 + σ_{B_{j+1}}^2 + c_2)]    (3)

In Equation (3), μ_{B_j} and σ_{B_j}^2 are the mean and variance of band B_j, respectively; μ_{B_{j+1}} and σ_{B_{j+1}}^2 are the mean and variance of band B_{j+1}; σ_{B_j B_{j+1}} is the covariance between the two bands; and the predefined constants c_1 and c_2 are applied to stabilize the division with a weak denominator. Normally, the closer SSIM(B_j, B_{j+1}) is to one, the stronger the structural correlations between the j-th and (j+1)-th spectral bands. Setting S_c(j) = SSIM(B_j, B_{j+1}), the structural correlation curve S_c can be generated. Figure 2 shows the structural correlation curves of different hyperspectral images.
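Equation (3) can be computed directly from band statistics. A minimal NumPy sketch (the constants c_1, c_2 and all helper names here are illustrative choices, not values from the paper):

```python
import numpy as np

def band_ssim(Bj, Bj1, c1=1e-4, c2=9e-4):
    """Global SSIM between two spectral bands, following Eq. (3)."""
    mu1, mu2 = Bj.mean(), Bj1.mean()
    var1, var2 = Bj.var(), Bj1.var()
    cov = ((Bj - mu1) * (Bj1 - mu2)).mean()   # covariance sigma_{B_j B_{j+1}}
    return ((2 * mu1 * mu2 + c1) * (2 * cov + c2)) / \
           ((mu1**2 + mu2**2 + c1) * (var1 + var2 + c2))

def correlation_curve(Y):
    """Structural correlation curve S_c over an l_m x l_n x l_lambda cube Y."""
    return np.array([band_ssim(Y[..., j], Y[..., j + 1])
                     for j in range(Y.shape[-1] - 1)])

rng = np.random.default_rng(0)
Y = rng.random((16, 16, 6))                   # toy cube with independent bands
Sc = correlation_curve(Y)                     # low values: bands are uncorrelated
```

Identical bands give an SSIM of exactly one, while statistically independent bands score near zero, which is what makes the drops in S_c detectable.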
Figure 2. Structural correlation curves of different hyperspectral images: (a) Pavia data; (b) Urban data; (c) Indian Pines data.
Based on the curves in Figure 2, it can be found that the correlation coefficients across adjacent bands vary considerably for different hyperspectral images. The curve in Figure 2a displays a relatively steady trend. Obvious drops exist in Figure 2b,c, which means that some adjacent spectral bands in the Urban data and the Indian Pines data have much lower structural similarity; meanwhile, the continuous spectral bands between two neighboring drops show a relatively stable trend. However, most previous studies neglect this property of spectral bands. These approaches are applied to directly restore the entire set of bands, or to sequentially denoise band-subsets constructed by partitioning all spectral bands with a fixed value. To make the best of the correlations across neighboring bands, this paper explores the optimal partition in the spectral domain by estimating the drops in the curve S_c.

The detailed procedure of spatial-spectral data extraction is as follows: firstly, form the structural correlation curve S_c using Equation (3). Secondly, detect the local segmentation points S_c(j) in the curve S_c, where S_c(j) meets the condition denoted by Equation (4):

S_c(j − 1) − S_c(j) > η  and  S_c(j + 1) − S_c(j) > η    (4)

In Equation (4), η is a predefined threshold, which is applied to avoid the local disturbance caused by noise in the curve S_c. The beginning and ending points of S_c can be treated as intrinsic local partition points. To better exploit useful spectral characteristics and suppress the noise [40], a fused image is generated by averaging the spectral bands between neighboring segmentation points. Based on Equations (3) and (4), it is determined whether adjacent fused images are to be merged or not. In this way, the segmentation points in the curve S_c can be identified.
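The drop test of Equation (4) amounts to a local-minimum check on the curve S_c. A small sketch (the threshold value 0.05 is illustrative; the paper leaves η as a tuning parameter):

```python
import numpy as np

def local_segmentation_points(Sc, eta=0.05):
    """Detect indices j where S_c(j-1) - S_c(j) > eta and
    S_c(j+1) - S_c(j) > eta, i.e., the sharp drops of Eq. (4)."""
    return [j for j in range(1, len(Sc) - 1)
            if Sc[j - 1] - Sc[j] > eta and Sc[j + 1] - Sc[j] > eta]

# A curve that is flat except for one sharp drop at index 4:
Sc = np.array([0.95, 0.96, 0.95, 0.96, 0.70, 0.95, 0.96, 0.95])
points = local_segmentation_points(Sc)
```

Only genuine drops deeper than η on both sides are reported, so small noisy wiggles in S_c do not trigger a split.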
As shown in Figure 2a, the correlations across all neighboring bands have a relatively stable trend, and there is no need for spectral band segmentation in this case; the HSI itself can then be considered as a special band-subset. As shown in Figure 2b,c, the HSI can instead be divided into non-overlapping band-subsets based on the segmentation points. Let C denote the number of band-subsets. Overall, the noisy data can be reshaped as X = {X^1, ..., X^c, ..., X^C}, c = 1, ..., C, where X^c is the c-th band-subset. Finally, to effectively preserve the local details of HSI in the spatial domain [41], we utilize cubic patches instead of 2D patches during the denoising process. Thus, each band-subset is divided into many overlapping cubic patches. The size of each cubic patch is fixed as l_x × l_y × l_c, where l_x × l_y is the size of the spatial dimensions and l_c defines the number of spectral bands in the c-th band-subset. We note that various material categories occur within the same cubic patch, which increases the spectral-signature contamination (mixing/blurring) as the spatial size of the cubic patches gets bigger. Therefore, larger l_x and l_y may result in unstable classification accuracies, while a neighborhood that is too small cannot well explore the spatial information. In this paper, we strike a balance between stability and the exploitation of spatial information, and we set l_x = l_y = 4 in the experiments. After reshaping the cubic patches into vectors, we obtain the c-th band-subset X^c = {x_1^c, ..., x_i^c, ..., x_M^c}, where M = (l_m − l_x + 1)(l_n − l_y + 1) is the number of cubic patches and x_i^c ∈ R^P, P = l_x × l_y × l_c, is the vector generated by the i-th cubic patch of the c-th band-subset.
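The overlapping cubic-patch extraction described above can be sketched with a stride-1 sliding window; the toy cube dimensions below are illustrative:

```python
import numpy as np

def cubic_patches(Xc, lx=4, ly=4):
    """Divide a band-subset of shape (l_m, l_n, l_c) into overlapping
    lx x ly x l_c cubic patches (stride 1), each reshaped into a column vector."""
    lm, ln, lc = Xc.shape
    M = (lm - lx + 1) * (ln - ly + 1)        # number of cubic patches
    P = lx * ly * lc                          # patch vector length
    patches = np.empty((P, M))
    idx = 0
    for i in range(lm - lx + 1):
        for j in range(ln - ly + 1):
            patches[:, idx] = Xc[i:i + lx, j:j + ly, :].ravel()
            idx += 1
    return patches

Xc = np.arange(6 * 5 * 3, dtype=float).reshape(6, 5, 3)
Pmat = cubic_patches(Xc)
# shape: (4*4*3, (6-4+1)*(5-4+1)) = (48, 6)
```

Each column of the result is one vector x_i^c; the denoised cube is later rebuilt by averaging the overlapping patch estimates back into place.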
3.2. Hierarchical Sparse Learning for Denoising Each Band-Subset

In this subsection, the hierarchical sparse learning framework is constructed with prior distributions and hyper-prior distributions to restore the noisy HSI [27,37,42]; it integrates a Gaussian process and a Gamma distribution into the dictionary atoms to simultaneously explore the spatial consistency of HSI. A prior consisting of a prior distribution and a hyper-prior distribution is often called a hierarchical prior. Note that a sparse learning framework with multiple hierarchical priors can be regarded as a special case of deep learning. Considering the dataset X = {X^1, ..., X^c, ..., X^C}, c = 1, ..., C, the suggested approach sequentially depicts the characteristics of each subset to recover the HSI by utilizing independent draws. For the c-th band-subset X^c = {x_1^c, ..., x_i^c, ..., x_M^c}, the hierarchical denoising model can be denoted as Equation (5):

X^c = D^c A^c + N^c + Q^c ◦ S^c
(5)
where the symbol ◦ represents element-wise multiplication. The first term on the right side of Equation (5) represents the ideally noiseless estimation of X^c, which is represented as a linear combination of dictionary atoms. The columns of D^c = [d_1^c, ..., d_k^c, ..., d_K^c] ∈ R^{P×K} denote the dictionary atoms to be learned, where K is the number of dictionary atoms. In the context of HSI, the signatures of x_i^c have high spatial consistency with samples located in its neighboring region. To better learn the HSI [43], this prior knowledge is explicitly imposed on the dictionary atom d_k^c by using a Gaussian process (GP), which is constrained by a Gamma distribution. Additionally, there is a very high probability that spectral pixels with similar characteristics in a local region share the same dictionary atoms, which is consistent with the cognition of vision and structure. A^c = [a_1^c, ..., a_i^c, ..., a_M^c] ∈ R^{K×M} places a sparse restriction on the vectors to remove noise, and can be written as A^c = W^c ◦ Z^c. The matrix W^c = [w_1^c, ..., w_i^c, ..., w_M^c], of size K × M, represents the weights of the matrix A^c, where w_i^c is drawn from a Gaussian distribution. The sparseness of the matrix A^c is adjusted by Z^c = [z_1^c, ..., z_i^c, ..., z_M^c] ∈ {0, 1}^{K×M}, which is drawn from the Beta-Bernoulli process. Let the symbol ∼ define an i.i.d. draw from a distribution, let N represent the Normal distribution, and I_K is
the K × K identity matrix, and Bern and Beta refer to the Bernoulli and Beta distributions, respectively. Then the noise-free data D^c A^c may be represented in the following manner:

d_k^c ∼ N(0, Σ),  Σ_{jj'} = ξ_1 exp(−||x_j^c − x_{j'}^c||_2^2 / ξ_2),  ξ_1 ∼ Γ(β, ρ)
w_i^c ∼ N(0, I_K / γ_w^c),  z_{ki}^c ∼ Bern(π_k^c),  π_k^c ∼ Beta(a_0/K, b_0(K − 1)/K)

(6)
where ξ_1 is drawn from the Gamma distribution and ξ_2 is a predefined constant; both are applied to express the smoothness between x_j^c and x_{j'}^c. If ||x_j^c − x_{j'}^c||_2^2 is small, the corresponding components of x_j^c and x_{j'}^c are regarded as having great spatial consistency. In this way, both the spectral correlation and the spatial consistency of the c-th band-subset X^c can be exploited simultaneously. z_{ki}^c denotes whether the atom d_k^c is utilized to represent x_i^c (with probability π_k^c) or not. When K → ∞, the expectation E(π_k^c) = a_0 / [a_0 + (K − 1)b_0] approaches zero. Accordingly, most elements in the set {z_{ki}^c}_{k=1,...,K} are equal to zero, and sparsity is reasonably imposed on the vector a_i^c, which consists of a few non-zero values and numerous zeros. In this sense, the constraint constructed by the Beta-Bernoulli process can be regarded as a special type of l_0 norm. Notice that the noiseless estimation of each x_i^c uses a specific subset of the dictionary atoms {d_k^c}_{k=1,...,K}, which is specified by the sparse vector {z_{ki}^c}_{k=1,...,K}; z_{ki}^c = 1 indicates that the atom d_k^c is employed for representing the coefficient a_{ki}^c. By analyzing the number and locations of the non-zero elements in the set {z_{ki}^c}_{k=1,...,K}, the size of the dictionary can be learned adaptively, including atom selection and prediction.

The matrix N^c = [n_1^c, ..., n_i^c, ..., n_M^c], of size P × M, defines the zero-mean Gaussian noise. n_i^c is drawn from zero-mean Gaussian noise components with precision γ_n^c, as displayed in Equation (7). Additionally, Poisson noise is a signal-dependent noise, which means that pixels with higher light intensity can be very strongly corrupted. To remove the signal dependence of the noise variance, the variance stability transform (VST) is introduced to convert Poisson noise into Gaussian noise before the denoising approach is implemented [14,44].
After recovery, a corresponding inverse transformation is utilized to obtain the final HSI restoration results.

n_i^c ∼ N(0, I_P / γ_n^c)
(7)
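A common choice of VST is the Anscombe transform, which approximately maps Poisson counts to unit-variance Gaussian data; a sketch (the plain algebraic inverse is used here for brevity, whereas practical pipelines often use an asymptotically unbiased inverse):

```python
import numpy as np

def anscombe(x):
    """Anscombe variance-stabilizing transform: Poisson(lambda) counts are
    mapped to approximately Gaussian data with variance close to 1."""
    return 2.0 * np.sqrt(x + 3.0 / 8.0)

def inverse_anscombe(y):
    """Simple algebraic inverse of the Anscombe transform."""
    return (y / 2.0) ** 2 - 3.0 / 8.0

rng = np.random.default_rng(2)
counts = rng.poisson(lam=30.0, size=100_000)
stabilized = anscombe(counts)
# After the VST the sample variance is close to 1 regardless of lambda,
# so a Gaussian-noise model such as Eq. (7) becomes applicable.
```

After denoising in the stabilized domain, `inverse_anscombe` maps the estimate back to the original intensity range.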
The third term denotes the sparse noise, such as dead pixel lines, and exploits the Beta-Bernoulli process to manifest sparseness, as displayed in Equation (8). The matrix Q^c = [q_1^c, ..., q_i^c, ..., q_M^c], of size P × M, represents the intensity of the sparse noise, and S^c = [s_1^c, ..., s_i^c, ..., s_M^c] ∈ R^{P×M} depicts the location information where the sparse noise may exist.

q_i^c ∼ N(0, I_P / γ_v^c),  s_{pi}^c ∼ Bernoulli(θ_{pi}^c),  θ_{pi}^c ∼ Beta(a_θ, b_θ)

(8)
When s_pi^c = 1, the p-th element of x_i^c is polluted by sparse noise with amplitude q_pi^c. The expectation E(θ_pi^c) = a_θ/(a_θ + b_θ) can be made close to zero by adjusting the shape parameters of the Beta distribution (a_θ and b_θ). Each element of s_i^c is drawn i.i.d. from its Bernoulli distribution, which depicts well the arbitrariness of the positions of the sparse noise. γ_w^c, γ_n^c and γ_v^c are interpretable as non-informative hyper-parameters, which regulate the precisions of w_i^c, n_i^c and q_i^c. To flexibly solve the model with the posterior PDF, Gamma distributions are adopted for these hyper-parameters, as shown in Equation (9):

γ_w^c ∼ Γ(c, d)
γ_n^c ∼ Γ(e, f)
γ_v^c ∼ Γ(g, h)    (9)
Sensors 2016, 16, 1718
The negative logarithm of the posterior density function of the above model (applied jointly to all data X^c = {x_i^c}_{i=1,...,M}) may be represented as Equation (10):

−log p(D^c, {w_i^c}, {z_i^c}, {q_i^c}, {s_i^c}, {π_k^c}, {θ_pi^c}, ξ_1^c, γ_w^c, γ_n^c, γ_v^c | {x_i^c})
= (γ_n^c/2) Σ_i ||x_i^c − D^c(w_i^c ◦ z_i^c) − q_i^c ◦ s_i^c||_2^2 + (1/2) Σ_k (d_k^c)^T Σ_k^{−1} d_k^c
+ (γ_w^c/2) Σ_{i,k} (w_{ik}^c)^2 − Σ_k log Beta(π_k^c | a_0/K, b_0(K − 1)/K) − Σ_{i,k} log Bernoulli(z_{ik}^c | π_k^c)
+ (γ_v^c/2) Σ_{p,i} (q_{pi}^c)^2 − Σ_{p,i} log Beta(θ_pi^c | a_θ, b_θ) − Σ_{p,i} log Bernoulli(s_pi^c | θ_pi^c)
− log Gamma(ξ_1^c | β, ρ) − log Gamma(γ_w^c | c, d) − log Gamma(γ_n^c | e, f) − log Gamma(γ_v^c | g, h) + const    (10)
According to Equation (10), the full posterior of all parameters can be obtained rather than a point approximation, which is commonly applied by maximum a posteriori (MAP) estimation. All parameters necessary for solving the model can be learned from the prior and hyper-prior distributions, so the proposed hierarchical sparse learning framework is more robust and accurate. A graphical representation of the complete model is shown in Figure 3.
Figure 3. Graphical representation of the hierarchical sparse learning model.
Gibbs sampling is then implemented to solve the hierarchical model by employing the conjugacy of the model posterior distributions [45,46]. The update equations for each random variable are drawn from the conditional distributions given the most recent values of all other variables in the model. The details of the update equations can be found in the Appendix. After performing the Gibbs sampling, D^c and A^c can be obtained based on {d_k^c}, {w_i^c} and {z_i^c}. Then the restored images for the c-th band-subset X^c = [x_1^c, ..., x_i^c, ..., x_M^c] can be inferred by calculating D^c A^c. Additionally, as there are several solutions for the same spectral pixel due to the usage of overlapping cubic patches, the final restored result is constructed by averaging all overlapping cubic patches. For all band-subsets X = {X^1, ..., X^c, ..., X^C}, c = 1, ..., C, the whole HSI after denoising can be obtained by sequentially performing this operation.
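The final averaging over overlapping patches can be sketched as follows. This is a simplified 2D, single-band illustration with hypothetical helper names; the paper operates on cubic patches, but the accumulate-and-divide logic per band is the same:

```python
import numpy as np

def extract_patches(img, p, stride=1):
    # All overlapping p x p patches together with their top-left positions.
    H, W = img.shape
    patches, positions = [], []
    for r in range(0, H - p + 1, stride):
        for c in range(0, W - p + 1, stride):
            patches.append(img[r:r + p, c:c + p].copy())
            positions.append((r, c))
    return patches, positions

def average_patches(patches, positions, shape, p):
    # Accumulate every (denoised) patch and divide by the number of
    # patches covering each pixel.
    acc = np.zeros(shape)
    cnt = np.zeros(shape)
    for patch, (r, c) in zip(patches, positions):
        acc[r:r + p, c:c + p] += patch
        cnt[r:r + p, c:c + p] += 1
    return acc / cnt

img = np.arange(16, dtype=float).reshape(4, 4)
patches, pos = extract_patches(img, p=2)
rec = average_patches(patches, pos, img.shape, p=2)
assert np.allclose(rec, img)  # exact when the patches are unmodified
```

In the actual pipeline, each extracted patch would be replaced by its denoised estimate D^c A^c before the averaging step.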
4. Experimental Results and Discussion

To evaluate the performance of the proposed approach, six state-of-the-art denoising methods were selected for a comparison based on both simulated and real data. The compared methods include K-SVD [9], BM3D [10], ANLM3D [13], BM4D [19], LRMR [31] and DDL3+FT [34]. The necessary parameters in the K-SVD, BM3D, ANLM3D and DDL3+FT methods were finely tuned or selected automatically to generate the optimal experimental results as the references suggested. In BM4D,
the noise variance is chosen from the set {0.01, 0.03, 0.04, 0.05, 0.07, 0.09, 1.1}. In LRMR, the rank of the noiseless matrix is selected from {4, 5, 6, 7}, and the cardinality of the sparse term is chosen from the set {0, 500, 1000, 1500, 2000, 3000, 4000, 5000}. For the proposed method, K is set to a relatively small value to reduce the computational time; here we choose K = 128. The parameters of the hyper-priors in the Gaussian process are set to β = ρ = 10^−6 and ξ_2 = 200. The iteration number of Gibbs sampling is set to 100. For setting the remaining parameters of the hyper-priors, two cases are taken into consideration: (1) the HSI is contaminated by Gaussian noise, dead pixel lines or a mixture of them; in this case the parameters are set as a_0 = b_0 = c = d = 10^−6, e = f = 10^−6 and a_θ = b_θ = g = h = 10^−6; (2) the HSI is contaminated by a mixture of Gaussian and Poisson noise, or a mixture of Gaussian noise, Poisson noise and dead pixel lines; the parameters are then set as a_0 = b_0 = 10^−4, c = d = 10^−5, e = f = 10^−6, a_θ = b_θ = 10^−4 and g = h = 10^−5. Once the types of noise are identified, the parameters of the hyper-priors are set as described above and need no further tuning. Using these hyper-parameters, the proposed framework can efficiently learn the latent information and suppress the various noises in the input HSI by sampling the infinite prior space. For one Gibbs sampling iteration, the computational complexity of the proposed method is approximately O(K(P + M) + PM). It should be pointed out that the proposed framework requires considerably more computation time than the six compared algorithms.

4.1. Experiment on Simulated Data

An image of Pavia University, acquired by the Reflective Optics System Imaging Spectrometer (ROSIS) over Pavia, northern Italy, is used for our experiments on simulated data.
It consists of 103 spectral bands, each with a size of 610 × 340 pixels, and contains reflectance characteristics collected from 430 nm to 860 nm with a 10 nm spectral resolution. Before the simulated experiments, the gray values of the Pavia University data are scaled to the interval [0, 1]. For the sake of comparison, two kinds of evaluations are presented: (1) the visual appearances are shown to qualitatively estimate the experimental results before and after denoising, including spatial images at selected bands and spectral signatures of a few pixels; (2) two commonly used metrics, the peak signal-to-noise ratio (PSNR) and the feature similarity index measure (FSIM), are provided to make a quantitative assessment of the denoising results. PSNR measures the gray-level similarity between the restored image and the reference image according to the MSE. FSIM estimates the restored result by integrating the gradient magnitude feature with the phase congruency feature from the view of human perceptual consistency [17,47]. Better denoising performance is indicated by higher PSNR and FSIM values. To estimate the effectiveness of the proposed approach, three types of noise are considered for the Pavia data: (1) zero-mean Gaussian noise; (2) Poisson noise, where the noisy HSI X_Poisson is generated by drawing Poisson-distributed samples with mean X × peak, X being the reference image and peak denoting the Poisson noise intensity; (3) dead pixel lines, which are added at the same positions of the selected bands in the HSI, with widths varying from one to three lines. In the simulated experiments, the noise is added to the Pavia data in the following three cases:

Case 1: The noise standard deviation σ for each band of the HSI is randomly selected from 0.02 to 0.15. Figure 4 displays the PSNR and FSIM values of each band before and after recovery, which serves as the quantitative evaluation for the Pavia data.
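A minimal sketch of the three degradations and of the PSNR metric is given below. The function names and the dead-line positions are ours, the Poisson model follows the peak parameterization described above, and FSIM is omitted since it requires the phase-congruency machinery of [47]:

```python
import numpy as np

rng = np.random.default_rng(0)

def add_gaussian(band, sigma):
    # Zero-mean Gaussian noise with per-band standard deviation sigma.
    return band + rng.normal(0.0, sigma, band.shape)

def add_poisson(band, peak):
    # Scale to the desired intensity level, draw Poisson counts,
    # then scale back toward [0, 1].
    return rng.poisson(band * peak) / peak

def add_dead_lines(band, cols, value=0.0):
    # Vertical dead pixel lines at fixed column positions.
    out = band.copy()
    out[:, cols] = value
    return out

def psnr(ref, rec, data_range=1.0):
    # Peak signal-to-noise ratio in dB, computed from the MSE.
    mse = np.mean((ref - rec) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)

# Case-3-style degradation of a synthetic band in [0, 1).
clean = rng.random((64, 64))
noisy = add_dead_lines(add_gaussian(add_poisson(clean, 30), 0.15), [10, 11])
print(psnr(clean, np.clip(noisy, 0.0, 1.0)))
```

Applying the Poisson step before the Gaussian step keeps the Poisson mean non-negative; the paper does not specify this ordering, so it is an implementation choice of the sketch.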
In Figure 4, the noisy HSI curves show large fluctuations across the spectral bands, which is caused by the varying σ. Therefore, the noise information is very important for substantially improving the denoising performance. By adaptively predicting the noise and fully exploring the spectral-spatial information, the proposed method obtains higher PSNR and FSIM values than its competitors in most bands. KSVD and BM3D restore the HSI with a predefined fixed noise variance and do not learn σ during the simulated process. Moreover, they are applied to the noisy HSI band by band and ignore the strong spectral correlations; accordingly, KSVD and BM3D show lower values in both Figure 4a,b. ANLM3D and BM4D suppress the noise by exploiting both spatial and spectral information, which yields better simulated results compared with KSVD and BM3D, although ANLM3D presents quite unstable performance, as shown in Figure 4. LRMR
takes advantage of the low-rank property of HSI, and it presents a better FSIM value by well retaining the latent features of HSI. By exploring hierarchical deep learning and fine-tuning, DDL3+FT shows comparable PSNR values, as shown in Figure 4a. LRMR and DDL3+FT transform the HSI into a 2D matrix during the recovery process, and both of them cannot effectively exploit the spatial consistency of HSI. Obviously, the plots obtained by the suggested approach have a more stable trend than those of the six compared approaches. This demonstrates the effectiveness and robustness of the proposed method in reducing zero-mean Gaussian noise with σ varying across bands.
Figure 4. Quantitative evaluation results for Pavia data: (a) PSNR; (b) FSIM.
In terms of visual comparison, Figure 5 shows the denoised results of band 101 calculated by the different approaches. It can be clearly seen that the proposed method works well in suppressing the noise and preserving the detailed local structure.
(e) 5. Restored images (f) of band 101corruptedwith Gaussian noise: (a) clean HSI; (b) noisy (i) HSI; Figure (c) KSVD; (d) BM3D; (e) ANLM3D; (f) BM4D; (g) LRMR; (h) DDL3+FT; (i)clean Our method. Figure 5. Restored images of band 101corruptedwith Gaussian noise: (a) HSI; (b) noisy HSI;
Figure 5. Restored images of band 101corruptedwith Gaussian noise: (a) clean HSI; (b) noisy HSI; (c) KSVD; (d) BM3D; (e) ANLM3D; (f) BM4D; (g) LRMR; (h) DDL3+FT; (i) Our method. (c) KSVD; (d) BM3D; (e) ANLM3D; (f) BM4D; (g) LRMR; (h) DDL3+FT; (i) Our method.
This is further depicted by the amplified area in the restored images of all competing methods. KSVD shows very poor performance and loses useful structural information. BM3D smooths out some important objects and thus also performs poorly in the recovery. ANLM3D can effectively utilize the high nonlocal self-similarity to balance smoothing and structural preservation, but it still fails to recover the profiles of local targets. The denoised results obtained by BM4D and DDL3+FT lose some fine objects. LRMR can obtain comparable results to the proposed method, but the Gaussian noise is not well reduced, as shown in Figure 5g. Clearly, these visual assessment results are totally consistent with the above numerical evaluations.

Case 2: Mixed noise consisting of zero-mean Gaussian noise with standard deviation σ = 0.2 and dead pixel lines. For Case 2, the restored results at band 95 calculated by the different methods, including KSVD, BM3D, ANLM3D, BM4D, LRMR, DDL3+FT and the proposed approach, are shown in Figure 6. One area of interest is amplified in the recovery images of all methods. As shown in Figure 6a–h, we can find that the result obtained by the suggested method is much closer to the clean HSI in visual performance. The dead pixel lines in Figure 6c–h are still obvious, which means that the six compared methods fail to suppress the dead pixel lines. What is more, according to the amplified area, it can be easily observed that our restoration method can effectively restore the homogeneous regions while preserving the edges. KSVD can partly ameliorate the noisy image quality, but it destroys the structure of the sparse coefficients in the dictionary learning process, which results in the loss of edges and other structural details, as shown in Figure 6c. BM3D reduces the noise by utilizing the statistics of similar neighboring patches and achieves a much better visual impression than KSVD, but it smooths out the structural information, and its result is blurred.
Figure 6. Restored results of band 95 corrupted with a mixture of Gaussian noise and dead pixel lines: (a) clean HSI; (b) noisy HSI; (c) KSVD; (d) BM3D; (e) ANLM3D; (f) BM4D; (g) LRMR; (h) DDL3+FT; (i) Our method.
ANLM3D can effectively utilize the high nonlocal self-similarity to achieve a better balance between smoothing and structural preservation. It fails, however, to preserve the edges and the details. BM4D can achieve visual improvements by utilizing the 3D nonlocal self-similarity of the data cube; however, it removes some of the fine objects and over-smooths the HSI. As shown in Figure 6g, the blurred black lines mean that LRMR only removes part of the dead pixel lines; meanwhile, the blurry white dots indicate that LRMR cannot efficiently remove the heavy Gaussian noise. Clearly, the DDL3+FT method fails to restore the mixed noisy image, which can be seen from the blurry edges and obvious dead pixel lines shown in Figure 6h.

Figures 7 and 8 present the vertical and horizontal profiles of band 95 at pixel (559, 150) in the simulated experiment of Case 2, respectively. Clearly, the results are visually different in both shape and amplitude, where rapid fluctuations are introduced by the existence of dead pixel lines.
Clearly, the DDL3+FT method fails tomixed restorenoisy the mixed noisy image, which can be the seenblurry from the blurry method fails to restore the image, which can be seen from edges andedges obvious DDL3+FT method fails to restore the mixed noisy image, which can be seen from the blurry edges andpixel obvious dead pixel lines as shown Figure 6h. dead lines as shown in Figure 6h. in and obvious dead pixel lines as shown in Figure 6h. Figures 7 and 8 present the vertical and horizontal profiles of band 95 at pixel (559, 150) in the Figures 7 and 8 present profiles of ofband band95 95atatpixel pixel(559, (559, 150) Figures 7 and 8 presentthe thevertical verticaland andhorizontal horizontal profiles 150) in in thethe simulated experiment of Case 2, respectively. Clearly, the results are visually different in both shape simulated experiment of Case 2, respectively. Clearly, the results are visually different in both shape simulated experiment of Case 2, respectively. the results are visually different in both shape and amplitude, where rapid fluctuations are introduced by the existence of dead pixel lines. and amplitude, where rapid existenceof ofdead deadpixel pixellines. lines. and amplitude, where rapidfluctuations fluctuationsare areintroduced introduced by the existence 0.9 0.9
0.9 0.9
0.9 0.9
0.9 0.9
0.45 0.45
0.45 0.45
0.45 0.45
0.45 0.45
0 00 0
0.9 0.9
350 700 350 700 Row number Row number (a) (a)
0.45 0.45
0 00 0
0.9 0.9
350 700 350 700 Row number Row number (b) (b)
0 00 0
0.9 0.9
350 700 350 700 Row number Row number (c) (c)
0.45 0.45
0.45 0.45
0 00 0
0.9 0.9
350 700 350 700 Row number Row number (d) (d)
0.45 0.45
Figure 7. Vertical profiles of band 95 at pixel (559, 150) before and after denoising: (a) clean HSI; (b) KSVD; (c) BM3D; (d) ANLM3D; (e) BM4D; (f) LRMR; (g) DDL3+FT; (h) Our method.
Figure 8. Horizontal profiles of band 95 at pixel (559, 150) before and after denoising: (a) clean HSI; (b) KSVD; (c) BM3D; (d) ANLM3D; (e) BM4D; (f) LRMR; (g) DDL3+FT; (h) Our method.
From Figures 7 and 8, it can be observed that the profiles produced by the proposed method are closest to those of the original HSI and give the best removal of Gaussian noise and dead pixel lines. The curves in Figures 7b–g and 8b–g are not ideally consistent with those in Figures 7a and 8a, which reflects the limited denoising ability of the compared methods and greatly supports the analysis above.

Spectral signatures of the clean image and the restored images at pixel (559, 150) are displayed in Figure 9; such signatures are very critical for the classification and identification of the HSI. KSVD and BM3D denoise the HSI band by band and destroy the spectral-spatial correlation: as shown in Figure 9b,c, there are recognizable artifacts in the restored signatures obtained by KSVD and BM3D. Using ANLM3D and BM4D, the restored signatures are much nearer to the initial spectral reflectance curve due to the exploitation of the spectral-spatial information, but they fluctuate strongly compared with the clean spectral signature. The restored signatures of LRMR and DDL3+FT have similar trends and shapes, but the detailed information is not well retained. Obviously, the spectral signature calculated by the proposed method is the closest to the original, which demonstrates the superiority of the proposed method in reducing Gaussian noise and dead pixel lines.
Figure 9. Spectral reflectance curves at pixel (559, 150) before and after denoising: (a) clean HSI; (b) KSVD; (c) BM3D; (d) ANLM3D; (e) BM4D; (f) LRMR; (g) DDL3+FT; (h) Our method.
Case 3: Mixed noise consisting of zero-mean Gaussian noise, Poisson noise and dead pixel lines, with σ = 0.15 and peak = 30. For Case 3, band 90 of the initial HSI and the denoised results are presented in Figure 10. One area of all listed images is magnified to make a clear comparison. KSVD shows poor performance, as shown in Figure 10c, and BM3D over-smooths and loses some useful objects. Also, there are obvious dead pixel lines in Figure 10c,d, which indicates that the KSVD and BM3D methods fail to remove the dead pixel lines. The ANLM3D and BM4D algorithms can only reduce part of the dead pixel lines, as presented in Figure 10e,f, and both of them do not work well in preserving the fine objects. The restored image of LRMR still has dead pixel lines and some Gaussian noise. From Figure 10h, it can be found that DDL3+FT fails in suppressing the dead pixel lines. As presented in Figure 10i, our approach can effectively remove the Gaussian noise, Poisson noise and dead pixel lines while preserving local details such as edges and textures. Obviously, it performs the best among the six compared methods.

Figures 11 and 12 display the vertical and horizontal profiles of band 90 at pixel (399, 290) in the simulated experiment of Case 3, respectively. The spectral reflectance curves of all competing approaches at location (399, 290) before and after denoising are shown in Figure 13.
Figure 10. Restored results of band 90 corrupted with a mixture of Gaussian noise, Poisson noise and dead pixel lines: (a) clean HSI; (b) noisy HSI; (c) KSVD; (d) BM3D; (e) ANLM3D; (f) BM4D; (g) LRMR; (h) DDL3+FT; (i) Our method.
350 700 0 350 700 0 350 700 0 350 700 0 0 0 Row number Row number Row number Row number 0 350 700 0 350 700 0 350 700 0 350 700 (e) (f) (g) (h) Row number Row number Row number Row number Figure(e) 11. Vertical profiles of band(f) 90 at pixel (399, 290) before(g) and after denoising: (a) clean HSI; (b) (h)
KSVD; (c) BM3D; (d) ANLM3D; (e) BM4D; (f) LRMR; (g) DDL3+FT; (h) Our method. Figure 11. Vertical profiles of band 90 at pixel (399, 290) before and after denoising: (a) clean HSI; (b) Figure 11. Vertical profiles of band 90 at pixel (399, 290) before and after denoising: (a) clean HSI; KSVD; (c) BM3D; (d) ANLM3D; (e) BM4D; (f) LRMR; (g) DDL3+FT; (h) Our method. (b) KSVD; (c) BM3D; (d) ANLM3D; (e) BM4D; (f) LRMR; (g) DDL3+FT; (h) Our method.
Figure 12. Horizontal profiles of band 90 at pixel (399, 290) before and after denoising: (a) clean HSI; (b) KSVD; (c) BM3D; (d) ANLM3D; (e) BM4D; (f) LRMR; (g) DDL3+FT; (h) Our method.

The rapid fluctuations of the results in Figure 10 are caused by the dead pixel lines. A visual comparison is made based on the differences in the shape and amplitude of the curves in Figures 11–13. KSVD is inferior in suppressing the mixed noise and preserving the spectral information, which can be clearly seen in Figures 11b, 12b and 13b. According to Figures 11c, 12c and 13c, BM3D is oversmoothing while preserving the structure and introduces some recovery artifacts. From Figures 11d, 12d and 13d, it can be found that ANLM3D can partly reduce the mixed noise, which can be easily seen from the reduction of the rapid fluctuations shown in Figure 12d. As shown in Figures 11e,f, 12e,f and 13e,f, the curves obtained by BM4D and LRMR are much closer to the initial curves than those of KSVD, BM3D, ANLM3D and DDL3+FT, but they fail in detail preservation. By comparing the regions marked by the red rectangles, it can be easily observed that our method achieves the best approximation to the intrinsic patterns of the clean HSI, which is fully consistent with the analysis above.
Figure 13. Spectral reflectance curves before and after denoising at pixel (399, 290): (a) clean HSI; (b) KSVD; (c) BM3D; (d) ANLM3D; (e) BM4D; (f) LRMR; (g) DDL3+FT; (h) Our method.
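The profiles and spectral curves compared in Figures 11–13 are plain slices of the data cube. A minimal sketch, assuming a (row, column, band) array layout:

```python
import numpy as np

def profiles_at(hsi, row, col, band):
    """Return the vertical profile (along rows), the horizontal profile
    (along columns) and the spectral curve of a [rows, cols, bands] cube."""
    vertical = hsi[:, col, band]    # intensity vs. row number (Figure 11 style)
    horizontal = hsi[row, :, band]  # intensity vs. column number (Figure 12 style)
    spectrum = hsi[row, col, :]     # reflectance vs. spectral band (Figure 13 style)
    return vertical, horizontal, spectrum
```

Plotting these three slices for the clean, noisy and restored cubes reproduces the kind of comparison shown in Figures 11–13.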
4.2. Experiment on Real Data
Two well-known real data sets are adopted for evaluating the proposed method, including the Urban data and the Indian Pines data. The visual impressions are presented to qualitatively estimate the experimental results before and after restoration. For real HSI, there is no reference image with which to implement the numerical evaluation, e.g., PSNR and FSIM. Hence, the classification accuracies on the Indian Pines data are utilized to quantitatively estimate the denoising performance.

4.2.1. Denoising for Urban Data

The Urban data, with an original size of 307 × 307 × 210, are acquired by the HYDICE sensor. Due to the detector-to-detector difference, they contain different intensity stripes and mixed noises across bands. After removing bands 104–108, 139–151, and 207–210 of the Urban data, which are polluted by the atmosphere and water absorption, a subset with the size of 150 × 150 × 188 is used in the following experiments. Figure 14 presents the recovered images of band 186 obtained by the different methods. To facilitate the visual comparison, yellow arrows are utilized to mark the obvious stripes in Figure 14. Meanwhile, Figure 15 gives the enlarged details in the red rectangle of Figure 14. KSVD shows poor performance on the reduction of the stripes and the preservation of the structures, as shown in Figures 14b and 15b. Obviously, BM3D and BM4D smooth out the important textures and fine targets, and both of them fail in removing the stripes. According to Figures 14d,g and 15d,g, it can be found that ANLM3D and DDL3+FT can partly suppress the stripes, and DDL3+FT has a better ability for preserving the texture and edge information than ANLM3D. LRMR shows better recovery performance than the other five compared methods on noise reduction and structure preservation. From Figure 14f,h, it can be clearly observed that LRMR is inferior to the proposed method in removing the stripes. Generally, our proposed restoration method can efficiently restore the Urban data and convincingly outperforms the six compared denoising approaches.
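For the synthetic experiments, where a clean reference exists, PSNR can be evaluated per band; its absence for real HSI is exactly why classification accuracy is used instead. A minimal sketch, assuming reflectance values scaled to [0, peak]:

```python
import numpy as np

def band_psnr(clean_band, restored_band, peak=1.0):
    """PSNR in dB for one band, with data assumed scaled to [0, peak]."""
    mse = np.mean((clean_band - restored_band) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```

Averaging this value over all bands gives a single score for a restored cube in the simulated cases.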
Figure 14. Restored results in the Urban image: (a) Original band 186; (b) KSVD; (c) BM3D; (d) ANLM3D; (e) BM4D; (f) LRMR; (g) DDL3+FT; (h) Our method.
In addition, Figure 16 presents the results at band 104 before and after denoising. As shown in Figure 16a, there are many heavy stripes within the initial band 104. The false-color images of the Urban data before and after restoration are presented in Figure 17, which consist of the 1st, 104th and 135th bands. KSVD updates the dictionary atoms one by one and destroys the structure of the sparse coefficients. As shown in Figures 16b and 17b, it blurs the image structures and shows very poor performance. By exploiting block matching and a 3D collaborative filter, BM3D greatly enhances the image quality of the initial ones, but there are still obvious stripes in Figures 16c and 17c. According to the regions marked by the red ellipses, it can be found that ANLM3D, BM4D, LRMR and DDL3+FT can partly remove the stripes, and LRMR works worse in effectively restoring the images polluted by serious noises. Obviously, the proposed method shows superior performance in denoising the flat regions with better edge preservation. Especially, by comparing the regions marked by the blue rectangle in Figure 16, it can be easily found that the suggested method can restore the targets well and retain the edges while efficiently reducing the stripes and mixed noises. Above all, the restored images obtained by the proposed method present a better visual result than KSVD, BM3D, ANLM3D, BM4D, LRMR and DDL3+FT, with satisfying detail preservation.

Figure 15. ×4 magnified results of the various approaches in the red rectangle of Figure 14: (a) Original band 186; (b) KSVD; (c) BM3D; (d) ANLM3D; (e) BM4D; (f) LRMR; (g) DDL3+FT; (h) Our method.

Figure 16. Restored results in the Urban image: (a) Original band 104; (b) KSVD; (c) BM3D; (d) ANLM3D; (e) BM4D; (f) LRMR; (g) DDL3+FT; (h) Our method.
Figure 17. Restored results of the Urban image: (a) original false-color image (R: 1, G: 104, and B: 135); (b) KSVD; (c) BM3D; (d) ANLM3D; (e) BM4D; (f) LRMR; (g) DDL3+FT; (h) Our method.
4.2.2. Experimental Results on Indian Pines Data

The second data set is named Indian Pines, which was recorded by the NASA AVIRIS sensor over the Indian Pines region in 1992; this dataset contains much random noise in some bands introduced during the acquisition process. It comprises 220 spectral bands, and the spatial dimension of each spectral band is 145 × 145 pixels. For the Indian Pines data, the ground truth has 16 land cover classes and a total of 10,366 labeled pixels.

Figures 18–20 display the visual comparisons of different bands polluted by different noises. Obviously, the proposed method achieves better visual quality than the compared ones. From Figures 18b, 19b and 20b, KSVD shows poorer denoising capability than the other methods. BM3D significantly blurs the images and loses the texture information and edges.
Figure 18. Restored results in the Indian Pines image: (a) Original band 164; (b) KSVD; (c) BM3D; (d) ANLM3D; (e) BM4D; (f) LRMR; (g) DDL3+FT; (h) Our method.
Figure 19. Restored results in the Indian Pines image: (a) Original band 220; (b) KSVD; (c) BM3D; (d) ANLM3D; (e) BM4D; (f) LRMR; (g) DDL3+FT; (h) Our method.
Figure 20. Restored results in the Indian Pines image: (a) Original band 1; (b) KSVD; (c) BM3D; (d) ANLM3D; (e) BM4D; (f) LRMR; (g) DDL3+FT; (h) Our method.
As shown in Figures 18d and 19d, the image quality improvements obtained by ANLM3D appear very small and can be neglected. BM4D is oversmoothing and loses texture information. By observing the regions marked by the blue rectangles in Figures 18–20, our denoising method has a better ability for edge and structure preservation than LRMR and DDL3+FT; meanwhile, LRMR and DDL3+FT work worse in efficiently removing the random noises, as shown in the regions marked by the red ellipses in Figures 18–20. According to Figures 18–20, our algorithm is much superior to the six compared methods for seriously corrupted images, which is consistent with the above analysis, so our algorithm performs best in the removal of random noises from the Indian Pines data, while effectively improving the quality of the noisy HSI and restoring the texture and structure details. For classification-based evaluation, two cases are taken into consideration according to the testing data: (1) classifying the 20 heavily corrupted bands of the Indian Pines data, including bands 104–108, 150–163 and 220; (2) classifying the Indian Pines data with the 20 heavily corrupted bands removed.
Similar to the setting in [48], the number of training samples for the small classes "alfalfa", "grass/pasture mowed", and "oats" is set to 15 samples per class, and the number for the remaining classes is set to 50. A support vector machine (SVM) is utilized as the classification method. As usual, the commonly used overall accuracy (OA) and the kappa coefficient (κ) are selected as the evaluation metrics, and the map of the results is used for visual estimation.

Table 1. SVM classification accuracies on the heavily corrupted bands (104–108, 150–163, 220) of the Indian Pines data.

       Initial HSI   KSVD      BM3D      ANLM3D    BM4D      LRMR      DDL3+FT   Ours
OA     15.74%        57.96%    81.2%     25.17%    69.27%    48.38%    49.12%    83.76%
κ      0.0912        0.5368    0.7803    0.2135    0.679     0.3664    0.4284    0.8109

Table 2. SVM classification accuracies on the Indian Pines data (bands 104–108, 150–163, 220 removed).

       Initial HSI   KSVD      BM3D      ANLM3D    BM4D      LRMR      DDL3+FT   Ours
OA     74.39%        87.96%    0.9215%   81.06%    87.59%    87.18%    85.92%    90.35%
κ      0.7183        0.8531    0.8673    0.775     0.8568    0.8548    0.8442    0.8726

Table 1 lists the overall accuracy (OA) and kappa coefficient (κ) of the results for the heavily corrupted bands. Table 2 shows the OA and kappa coefficient of the results for the remaining bands. After achieving the recovery of the testing data, the values of the OA and kappa coefficient are obviously enhanced, as shown in Tables 1 and 2, which demonstrates the necessity of HSI denoising before implementing the classification. Compared with the other algorithms, the proposed method obtains the best OA and kappa coefficients in both Tables 1 and 2, which means that our denoising method can greatly restore the structure information (which is essential for classification) in the seriously polluted bands or the remaining 200 bands. We note that the ANLM3D method obtains an OA value of 25.17% and a κ of 0.2135 in Table 1, which are just 9.43% and 0.1223 higher than those of the initial HSI. This lower classification accuracy and kappa coefficient are totally in line with the poor denoising performance displayed in Figure 19d.

The classification maps of the different algorithms are displayed in Figure 21, where the first row shows the results of the 20 heavily corrupted bands and the second row presents the results of the remaining 200 bands before and after restoration. According to Figure 21, it can be easily observed that the result of the suggested method presents a better visual effect than the six compared algorithms.
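The two quantitative metrics follow their standard definitions; the sketch below computes OA and the kappa coefficient from predicted and ground-truth label vectors. It is a generic implementation, not code from the paper:

```python
import numpy as np

def overall_accuracy_and_kappa(y_true, y_pred, n_classes):
    """Compute overall accuracy (OA) and the kappa coefficient
    from integer label vectors via the confusion matrix."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    n = cm.sum()
    oa = np.trace(cm) / n  # observed agreement
    # Chance agreement from the marginal (row/column) distributions.
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n ** 2
    kappa = (oa - pe) / (1.0 - pe)
    return oa, kappa
```

Applying this to the SVM outputs on the denoised cubes yields the kind of numbers reported in Tables 1 and 2.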
Figure 21. Classification results for the Indian Pines data before and after denoising: the first row is the results of the 20 heavily corrupted bands; the second row presents the results of the remaining 200 bands: (a) Original HSI; (b) KSVD; (c) BM3D; (d) ANLM3D; (e) BM4D; (f) LRMR; (g) DDL3+FT; (h) Our method.
4.3. Discussion

4.3.1. Threshold Parameter η

The value of the parameter η in Equation (4) is selected from 0 to 1. Quantitative evaluations of different η values were implemented by comparing the OA metric on Indian Pines data. Noticing that, for the data consisting of the 20 heavily polluted bands, the SSIM values of adjacent bands are very similar to each other, as shown in Figure 2c, we analyzed only the remaining 200 bands here. Figure 22 presents the relative SVM classification accuracies. It can be seen that the highest classification accuracies occur for η ∈ (0.4, 0.5) and the lowest for η ∈ (0, 0.1). In particular, the case η = 0 means that the segmentation points are selected in line with the local dropping points, which results in the absence of sample information for some band-subsets and decreases the classification accuracy. When η = 1, there is no need to divide the bands. Observing the curve in Figure 22, it can be found that an appropriate η may well enhance the classification accuracies, which also demonstrates the necessity of the band-subset segmentation.
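The band-subset partition described above can be illustrated with a short sketch. Note that the paper's exact segmentation rule in Equation (4) is not reproduced in this section, so the thresholding step below (cutting where the adjacent-band SSIM falls below an η-scaled level above its minimum) is an assumed, illustrative variant, and the function name is hypothetical:

```python
import numpy as np

def band_subset_partition(cube, eta=0.45):
    """Split an HSI cube of shape (H, W, B) into band-subsets by cutting
    where the global SSIM between adjacent bands drops. Illustrative
    sketch only: the cut rule approximates, not reproduces, Equation (4)."""
    H, W, B = cube.shape
    ssim = np.empty(B - 1)
    for b in range(B - 1):
        x, y = cube[:, :, b].ravel(), cube[:, :, b + 1].ravel()
        mx, my = x.mean(), y.mean()
        vx, vy = x.var(), y.var()
        cxy = ((x - mx) * (y - my)).mean()
        c1, c2 = (0.01 * 255) ** 2, (0.03 * 255) ** 2  # standard SSIM constants
        ssim[b] = ((2 * mx * my + c1) * (2 * cxy + c2)) / \
                  ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
    # cut between bands whose similarity falls below the eta-scaled level
    thresh = ssim.min() + eta * (ssim.max() - ssim.min())
    cuts = np.where(ssim < thresh)[0] + 1
    bounds = [0] + cuts.tolist() + [B]
    return [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)
            if bounds[i] < bounds[i + 1]]
```

With small η the cuts track every local dip in the SSIM curve, and with η near 1 no cut survives, mirroring the two extreme cases discussed above.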
Figure 22. SVM classification accuracies of Indian Pines data (bands 104–108, 150–163 and 220 removed).
The number of band subsets, C, is only dominated by the parameter η. For the HSI with C band subsets, the computational complexity is approximately O(∑_{c=1}^{C} l_x l_y (l_c(K_c + M) + K_c M)) for one iteration. The values of l_x, l_y and M are fixed in our experiments. Hence, the computational complexity of the proposed method is determined by the parameters l_c and K_c. To simplify the analysis, we consider the special case in which the number of dictionary atoms, K_c, is set to the same value K' for all band-subsets. Then the computational complexity can be rewritten as O(l_x l_y ((K' + M)(l_1 + ... + l_C) + CK'M)). It is noted that the expression (l_1 + ... + l_C) equals the number of bands in the HSI and that M is much larger than K', so the computational complexity is mainly dominated by the term CK'M. Due to the strong spectral correlations, C is usually smaller than 10, as shown in Figure 2. When C changes from 1 to 10, the empirical value of K' shows a rapid decrease, and the relative value of CK'M shows a downward trend, which implies a lower computational complexity.

4.3.2. The Sparse Noise Term

To evaluate the effectiveness of the sparse noise term, we performed a comprehensive comparison between the restored images obtained by the proposed method with the sparse noise term disabled and enabled. The Urban dataset is applied for the simulation, and the method with the sparse noise term disabled is named SNT-DIS. Figure 23 shows the visual impression of band 133 obtained by SNT-DIS and the proposed method. Compared with the corrupted image in Figure 23a, it can easily be found that both algorithms greatly improve the image quality, as displayed in Figure 23b,c, but SNT-DIS can only reduce a small part of the stripes, as presented in Figure 23b, and it also fails to preserve the fine objects. As presented in Figure 23c, the suggested method can effectively remove heavy stripes and mixed noises while preserving local details such as edges and textures. The obvious superiority of the proposed method demonstrates that the denoising performance can be greatly improved by introducing the sparse noise term.
Sensors 2016, 16, 1718
Figure 23. Restored results in Urban data: (a) Original band 133; (b) SNT-DIS; (c) Our method.
5. Conclusions

In this paper, a novel hierarchical framework combining structural correlations in the spectral domain and sparse learning is developed for HSI denoising. First, with the suggested band-subset partition, spectral-spatial cubic patches can be effectively extracted, where each patch has strong structure correlations between adjacent bands and high spatial similarities within the cubic patch. Second, noise-free estimations for each band-subset are generated by solving a sparse problem constructed by a Beta-Bernoulli process. The Gaussian process with Gamma distribution is regarded as the precursor of the dictionary elements to fully exploit the spatial consistency in HSI. By utilizing the spectral-spatial information and Gibbs sampling, the suggested framework can effectively learn the latent detail and structure information in HSI. It has the advantage of adaptively predicting the noise in a data-driven manner, which enables the framework to better depict the existing noises in HSI. Meanwhile, it can automatically infer the size and the atoms of the dictionary. Compared with KSVD, BM3D, ANLM3D, BM4D, LRMR and DDL3+FT, the proposed framework achieves superior performance both numerically and visually, which has been verified on simulated and real HSI. It can suppress various noises in HSI, such as Gaussian noise, Poisson noise, dead pixel lines, stripes or a mixture of them, with better image quality improvement and structure characteristics preservation. Further research will be directed toward exploiting the appropriate hidden structure and reducing the computational time for the analysis of HSI.

Acknowledgments: This work is supported by the National Basic Research Program of China (973 Program) with No. 2013CB329402, and the Program for Cheung Kong Scholars and Innovative Research Team in University with No. IRT_15R53.

Author Contributions: Shuai Liu is the corresponding author who theoretically proposed the original method and wrote this paper; Licheng Jiao and Shuyuan Yang gave some valuable suggestions.

Conflicts of Interest: The authors declare no conflict of interest.
Appendix A

The symbol $-$ represents "the remaining variables except the one being sampled". The update equations for the c-th band-subset are listed as follows.

Sampling $d_k^c$:
$$p(d_k^c \mid -) \sim N\big(d_k^c \mid \mu_{d_k^c}, \Omega_{d_k^c}\big) \quad (\mathrm{A1})$$

where the covariance $\Omega_{d_k^c}$ and mean $\mu_{d_k^c}$ are expressed as:

$$\Omega_{d_k^c} = \Big( \Sigma^{-1} + \gamma_n^c \sum_i (w_{ik}^c)^2 (z_{ik}^c)^2 \Big)^{-1} \quad (\mathrm{A2})$$

$$\mu_{d_k^c} = \gamma_n^c \, \Omega_{d_k^c} \sum_i w_{ik}^c z_{ik}^c x_{(i,-k)}^c \quad (\mathrm{A3})$$

where $x_{(i,-k)}^c = x_i^c - D^c(w_i^c \circ z_i^c) - q_i^c \circ s_i^c + (w_{ik}^c z_{ik}^c)\, d_k^c$.

Sampling $z_{ik}^c$:
$$p(z_{ik}^c \mid -) \sim \mathrm{Bern}\Big( \frac{p_1}{p_1 + p_0} \Big) \quad (\mathrm{A4})$$

The posterior probability of $z_{ik}^c = 1$ is proportional to:
$$p_1 = \pi_k^c \exp\Big( -\frac{\gamma_n^c}{2} \big( (w_{ik}^c)^2 (d_k^c)^T d_k^c - 2 w_{ik}^c (d_k^c)^T x_{(i,-k)}^c \big) \Big) \quad (\mathrm{A5})$$

The posterior probability of $z_{ik}^c = 0$ is proportional to:
$$p_0 = 1 - \pi_k^c \quad (\mathrm{A6})$$

Sampling $w_{ik}^c$:
$$p(w_{ik}^c \mid -) \sim N\big(w_{ik}^c \mid \mu_{w_{ik}}^c, \Omega_{w_{ik}}^c\big) \quad (\mathrm{A7})$$
$$\Omega_{w_{ik}}^c = \big( \gamma_w^c + \gamma_n^c (z_{ik}^c)^2 (d_k^c)^T d_k^c \big)^{-1} \quad (\mathrm{A8})$$
$$\mu_{w_{ik}}^c = \gamma_n^c \, \Omega_{w_{ik}}^c (d_k^c)^T x_{(i,-k)}^c \quad (\mathrm{A9})$$

Sampling $\pi_k^c$:
$$p(\pi_k^c \mid -) \sim \mathrm{Beta}\Big( a_0/K + \sum_i z_{ik}^c, \; b_0(K-1)/K + M - \sum_i z_{ik}^c \Big) \quad (\mathrm{A10})$$

Sampling $\xi_1^c$:
$$p(\xi_1^c \mid -) \sim \Gamma\Big( \beta + PK/2, \; \rho + \sum_k (d_k^c)^T \Sigma^{-1} d_k^c / 2 \Big) \quad (\mathrm{A11})$$

Sampling $\gamma_w^c$:
$$p(\gamma_w^c \mid -) \sim \Gamma\Big( c + MK/2, \; d + \sum_i (w_i^c)^T w_i^c / 2 \Big) \quad (\mathrm{A12})$$

Sampling $\gamma_n^c$:
$$p(\gamma_n^c \mid -) \sim \Gamma\Big( e + PM/2, \; f + \sum_i \big\| x_i^c - D^c(w_i^c \circ z_i^c) - q_i^c \circ s_i^c \big\|^2 / 2 \Big) \quad (\mathrm{A13})$$

Sampling $s_{pi}^c$:
$$p(s_{pi}^c \mid -) \sim \mathrm{Bern}\Big( \frac{v_1}{v_1 + v_0} \Big) \quad (\mathrm{A14})$$
$$v_1 = \theta_p^c \exp\Big( -\frac{\gamma_v^c}{2} \big( (q_{pi}^c)^2 - 2 q_{pi}^c x_{(i,s)}^c \big) \Big) \quad (\mathrm{A15})$$
$$v_0 = 1 - \theta_p^c \quad (\mathrm{A16})$$

where $x_{(i,s)}^c = x_i^c - D^c(w_i^c \circ z_i^c)$. The posterior probability that $s_{pi}^c = 1$ is proportional to $v_1$, and the posterior probability that $s_{pi}^c = 0$ is proportional to $v_0$.

Sampling $q_{pi}^c$:
$$p(q_{pi}^c \mid -) \sim N\big(\mu_{q_{pi}}, \Omega_{q_{pi}}\big) \quad (\mathrm{A17})$$
$$\Omega_{q_{pi}} = \big( \gamma_v^c + \gamma_n^c (s_{pi}^c)^2 \big)^{-1} \quad (\mathrm{A18})$$
$$\mu_{q_{pi}} = \gamma_n^c \, \Omega_{q_{pi}} s_{pi}^c x_{(i,s)}^c \quad (\mathrm{A19})$$

Sampling $\theta_p^c$:
$$p(\theta_p^c \mid -) \sim \mathrm{Beta}\Big( a_\theta + \sum_i s_{pi}^c, \; b_\theta + M - \sum_i s_{pi}^c \Big) \quad (\mathrm{A20})$$

Sampling $\gamma_v^c$:
$$p(\gamma_v^c \mid -) \sim \Gamma\Big( g + PM/2, \; h + \sum_i (q_i^c)^T q_i^c / 2 \Big) \quad (\mathrm{A21})$$
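As a concrete illustration of these conditionals, the binary-indicator draw of Equations (A4)–(A6) can be sketched as follows. This is a minimal sketch assuming NumPy vectors for a single band-subset; the function name and argument layout are hypothetical, and the log-space normalization is an implementation choice for numerical stability, not part of the derivation:

```python
import numpy as np

def sample_z_ik(x_res, d_k, w_ik, pi_k, gamma_n, rng):
    """One Gibbs draw of the indicator z_ik^c per Equations (A4)-(A6).
    x_res plays the role of x_{(i,-k)}^c, the residual with atom k's
    contribution added back in."""
    dTd = d_k @ d_k
    # log of p1 (Equation (A5)) and p0 (Equation (A6))
    log_p1 = np.log(pi_k) - 0.5 * gamma_n * (w_ik ** 2 * dTd
                                             - 2.0 * w_ik * (d_k @ x_res))
    log_p0 = np.log1p(-pi_k)
    # normalize in log-space before forming the Bernoulli probability (A4)
    m = max(log_p1, log_p0)
    p1, p0 = np.exp(log_p1 - m), np.exp(log_p0 - m)
    return rng.random() < p1 / (p1 + p0)
```

When the residual is strongly explained by the atom, the exponential in (A5) dominates and the indicator is drawn as 1 with probability close to one; when the atom explains nothing, the draw reverts toward the prior weight $\pi_k^c$.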