Wavelet Based Hyperspectral Image Restoration Using Spatial and Spectral Penalties

Behnood Rasti, Johannes R. Sveinsson, Magnus O. Ulfarsson and Jon Atli Benediktsson
University of Iceland, Faculty of Electrical and Computer Engineering, Hjardarhagi 2-6, IS-107 Reykjavik, Iceland

ABSTRACT

In this paper, a penalized least squares cost function with a new spatial-spectral penalty is proposed for hyperspectral image restoration. The new penalty is a combination of a Group LASSO (GLASSO) penalty and a First Order Roughness Penalty (FORP) in the wavelet domain. The restoration criterion is solved using the Alternating Direction Method of Multipliers (ADMM). The results are compared with those of other restoration methods: the proposed method outperforms them on a simulated noisy data set in terms of Signal to Noise Ratio (SNR), and visually on a real degraded data set.

Keywords: Hyperspectral image restoration, penalized least squares, wavelets, group lasso penalty, first order roughness penalty, sparse regularization, alternating direction method of multipliers.

1. INTRODUCTION

Hyperspectral imaging is one of the latest imaging technologies for observing the Earth's surface using ground-based, airborne or spaceborne hyperspectral sensors. Improvements in spectroscopy technology have led to a new generation of sensors, called hyperspectral sensors, that capture contiguous rather than discrete reflectance spectra. The hyperspectral image cube is a 3-Dimensional (3D) data set in which the first two dimensions represent the spatial information and the third dimension represents the spectral information of a scene. The received radiance at the sensor is degraded by atmospheric effects and instrumental noise, which can be modeled as additive noise [1, 2].

In this paper we only consider the additive noise and assume that the atmospheric effects are compensated for. The information loss in some bands due to these degrading effects can be significant; such bands are usually removed before further processing. An alternative strategy is to restore the corrupted bands. Hyperspectral image restoration and its effect on the analysis of these images have been studied extensively. Due to the high correlation between bands, classic band-by-band restoration methods are not efficient for hyperspectral image restoration. Thus, efforts have focused on exploiting the joint spatial-spectral information in the data cube. Noise reduction for hyperspectral images was done using the Discrete Fourier Transform (DFT) and the 2-Dimensional Discrete Wavelet Transform (2D-DWT), where the DFT was used to decorrelate the channels and the 2D-DWT to decorrelate spatially [3]. A sparse synthesis regularization technique [4] was proposed to restore junk bands using the GLASSO penalty [5] on 2D wavelet coefficients. 3D wavelet shrinkage was applied for multi-spectral image denoising [6]. 2D bivariate wavelet shrinkage (2D BiShrink) [7] was extended to the data cube [8]. Recently, a 3D undecimated wavelet transform with analysis sparse regularization was used for hyperspectral image denoising [9].

Recently, a three-mode factor analysis called the Tucker3 decomposition [10] was used for hyperspectral image restoration [11], where the hyperspectral image was assumed to be a third order tensor and the "best" lower rank of the decomposition was chosen by minimizing a Frobenius norm. A similar idea was exploited for hyperspectral image restoration by applying more reduction spectrally [12]. A Genetic Algorithm (GA) was developed for choosing the rank of the Tucker3 decomposition [13].

Due to spectral redundancy, Principal Component Analysis (PCA) is very efficient for decorrelating hyperspectral images spectrally. PCA and wavelet shrinkage were used for hyperspectral noise reduction [14], where 2D BiShrink was applied on each Principal Component (PC) and then a 1D dual-tree complex wavelet transform [15] was applied on each pixel. A linear model using the Singular Value Decomposition (SVD) and the 2D-DWT for hyperspectral image denoising was given [16], where Stein's Unbiased Risk Estimator (SURE) was used for selecting the regularization parameters automatically. Wavelet based sparse PCA in a sparse regularization framework was developed for hyperspectral image restoration [17]. A cyclic type algorithm was given to solve the non-convex cost function, where the wavelet coefficients of the PCs and the PCA transformation matrix were simultaneously extracted with the help of an additional orthogonality constraint. The results showed significant improvements compared to other methods.

Penalized least squares using FORP was used for hyperspectral image restoration and showed a great improvement based on SNR and classification indices [18, 19]. Here, we propose a novel spatial-spectral penalty using a combination of FORP and GLASSO for penalized least squares in the wavelet domain. FORP is used as the spectral penalty and GLASSO as the spatial one. The resulting cost function is solved using ADMM [20-22]. The restoration results show considerable improvements compared to other methods, both based on SNR and visually.

The rest of the paper is organized as follows. In Section 2, a penalized least squares cost function with a new spatial-spectral penalty, a combination of GLASSO and FORP called GLASSORP, is given and solved using ADMM. In Section 3, the performance of the proposed method on both simulated and real data sets is investigated and compared with other methods from the literature using SNR and also visually. Finally, Section 4 concludes the paper.

2. PENALIZED LEAST SQUARES USING GLASSORP

The hyperspectral image model is given by

\[
\mathbf{Y} = \mathbf{D}\mathbf{W} + \mathbf{N},
\]

where \(\mathbf{Y} = [\mathbf{y}_{(i)}]\) is an \(n \times p\) matrix containing the vectorized observed image at band \(i\) in its \(i\)-th column, \(\mathbf{W} = [\mathbf{w}_{(i)}]\) is an \(n \times p\) matrix containing the 2D wavelet coefficients for the \(i\)-th band in its \(i\)-th column, and \(\mathbf{N} = [\mathbf{n}_{(i)}]\) is an \(n \times p\) matrix containing the noise at band \(i\) in its \(i\)-th column. Here, \(\mathbf{n}_{(i)}\) is a zero-mean Gaussian noise vector with covariance \(\boldsymbol{\Omega} = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_p^2)\), and \(\mathbf{D}\) is a 2D wavelet basis (\(n \times n\)). The cost function for estimating \(\mathbf{W}\) is given by

\[
J(\mathbf{W}) = \frac{1}{2}\left\|(\mathbf{Y}-\mathbf{D}\mathbf{W})\,\boldsymbol{\Omega}^{-1/2}\right\|_F^2 + \frac{\lambda_{\mathrm{spec}}}{2}\left\|\mathbf{W}\mathbf{R}^T\right\|_F^2 + \lambda_{\mathrm{spat}}\sum_{j=1}^{n}\|\mathbf{w}_j\|_2, \tag{1}
\]

where \(\mathbf{R}\) is a \((p-1)\times p\) difference matrix,

\[
\mathbf{R} = \begin{pmatrix}
-1 & 1 & 0 & \cdots & 0 & 0\\
0 & -1 & 1 & \cdots & 0 & 0\\
\vdots & & \ddots & \ddots & & \vdots\\
0 & 0 & \cdots & 0 & -1 & 1
\end{pmatrix},
\]

\(\|\cdot\|_F\) is the Frobenius norm, and \(\mathbf{w}_j^T\) is the \(j\)-th row of the matrix \(\mathbf{W}\).

Equation (1) only uses the sparsity property of the wavelets; by also considering the Multi-Resolution Analysis (MRA) property, we can allow the tuning parameters to be scale dependent:

\[
\hat{\mathbf{W}} = \arg\min_{\mathbf{W}} \frac{1}{2}\left\|(\mathbf{Y}-\mathbf{D}\mathbf{W})\,\boldsymbol{\Omega}^{-1/2}\right\|_F^2 + \sum_l \frac{\lambda^l_{\mathrm{spec}}}{2}\left\|\mathbf{W}^l\mathbf{R}^T\right\|_F^2 + \sum_l \lambda^l_{\mathrm{spat}} \sum_j \left\|\mathbf{w}^l_j\right\|_2, \tag{2}
\]

where \(\mathbf{W} = [\mathbf{W}^1; \mathbf{W}^2; \ldots; \mathbf{W}^{L+1}]\) (the notation ';' shows vertical concatenation) and \(L\) is the level of wavelet decomposition. Since \(\mathbf{D}\) is unitary and the Frobenius norm is unitarily invariant, (2) is separable and thus

\[
\hat{\mathbf{w}}^l_j = \arg\min_{\mathbf{w}^l_j} \frac{1}{2}\left\|\boldsymbol{\Omega}^{-1/2}\left(\mathbf{v}^l_j - \mathbf{w}^l_j\right)\right\|_2^2 + \frac{\lambda^l_{\mathrm{spec}}}{2}\left\|\mathbf{R}\mathbf{w}^l_j\right\|_2^2 + \lambda^l_{\mathrm{spat}}\left\|\mathbf{w}^l_j\right\|_2, \tag{3}
\]

where \(\mathbf{D}^T\mathbf{Y} = \mathbf{V}\) and \((\mathbf{v}^l_j)^T\) is the \(j\)-th row of the matrix \(\mathbf{V}^l\), with \(\mathbf{V} = [\mathbf{V}^1; \mathbf{V}^2; \ldots; \mathbf{V}^{L+1}]\). We use ADMM [20-22] to solve (3). Here, we explain the procedure briefly. For simplicity, we drop the variable indices (upper and lower) in this part. Using variable splitting, our minimization problem becomes

\[
\arg\min_{\mathbf{w},\mathbf{s}}\; \frac{1}{2}\left\|\boldsymbol{\Omega}^{-1/2}(\mathbf{v}-\mathbf{w})\right\|_2^2 + \frac{\lambda_{\mathrm{spec}}}{2}\|\mathbf{R}\mathbf{w}\|_2^2 + \lambda_{\mathrm{spat}}\|\mathbf{s}\|_2 \quad \text{s.t.} \quad \mathbf{w} = \mathbf{s}. \tag{4}
\]

The Augmented Lagrangian (AL) for (4) is given by

\[
\arg\min_{\mathbf{w},\mathbf{s}}\; \frac{1}{2}\left\|\boldsymbol{\Omega}^{-1/2}(\mathbf{v}-\mathbf{w})\right\|_2^2 + \frac{\lambda_{\mathrm{spec}}}{2}\|\mathbf{R}\mathbf{w}\|_2^2 + \lambda_{\mathrm{spat}}\|\mathbf{s}\|_2 + \frac{\mu}{2}\|\mathbf{w}-\mathbf{s}-\mathbf{d}\|_2^2. \tag{5}
\]

To solve (5), a cyclic descent type algorithm is used: the solution is obtained by solving the problem w.r.t. one variable at a time while the other is kept fixed. Thus, the solution w.r.t. \(\mathbf{w}\) is given by

\[
\hat{\mathbf{w}} = \boldsymbol{\Lambda}^{-1}\left(\boldsymbol{\Omega}^{-1}\mathbf{v} + \mu(\mathbf{s}+\mathbf{d})\right), \tag{6}
\]

where \(\boldsymbol{\Lambda} = \boldsymbol{\Omega}^{-1} + \mu\mathbf{I} + \lambda_{\mathrm{spec}}\mathbf{R}^T\mathbf{R}\), and the solution w.r.t. \(\mathbf{s}\) is

\[
\hat{\mathbf{s}} = \max\left(0,\; \|\mathbf{w}-\mathbf{d}\|_2 - \frac{\lambda_{\mathrm{spat}}}{\mu}\right)\frac{\mathbf{w}-\mathbf{d}}{\|\mathbf{w}-\mathbf{d}\|_2}.
\]

The last step is to update the Lagrange multiplier \(\mathbf{d}\) as

\[
\mathbf{d} \leftarrow \mathbf{d} - \mathbf{w} + \mathbf{s}.
\]
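The s-update is recognizable as the proximal operator of the \(\ell_2\) norm; for completeness, this standard identity (not specific to this paper) reads

\[
\arg\min_{\mathbf{s}}\; \lambda_{\mathrm{spat}}\|\mathbf{s}\|_2 + \frac{\mu}{2}\|\mathbf{w}-\mathbf{d}-\mathbf{s}\|_2^2 = \left(1 - \frac{\lambda_{\mathrm{spat}}/\mu}{\|\mathbf{w}-\mathbf{d}\|_2}\right)_{+}(\mathbf{w}-\mathbf{d}),
\]

which is exactly the block soft-threshold stated above.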

A compact form of the algorithm is given in Algorithm 1. Note that the upper index \(l\) was dropped for simplicity. Since the problem is separable, the MRA property can easily be taken into account by substituting \(\lambda_{\mathrm{spec}}\) and \(\lambda_{\mathrm{spat}}\) with \(\lambda^l_{\mathrm{spec}}\) and \(\lambda^l_{\mathrm{spat}}\), respectively; consequently, all the variables are updated accordingly.

Algorithm 1: ADMM for GLASSORP.

Input: D: (2D-wavelet) basis; Y: the observed signal; λ_spec > 0, λ_spat > 0 and μ > 0: regularization parameters; ε: tolerance value.
Output: X̂: estimated signal.
Initialization: d^0, s^0, V = D^T Y, Λ = Ω^{-1} + μI + λ_spec R^T R
while |J^{k+1} - J^k| > ε do
    for j = 1 : n do
        ŵ_j^k = Λ^{-1}(Ω^{-1} v_j + μ(s_j^k + d_j^k))
        ŝ_j^{k+1} = max(0, ‖w_j^k - d_j^k‖_2 - λ_spat/μ) (w_j^k - d_j^k)/‖w_j^k - d_j^k‖_2
        d_j^{k+1} = d_j^k - w_j^k + s_j^{k+1}
    end
end
X̂ = DŴ.
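As an illustration of Algorithm 1, here is a minimal numpy sketch for a single decomposition level. It is our reading of the updates, not the authors' code: Ω is stored as a length-p variance vector, a fixed iteration count stands in for the objective-based stopping rule, and all helper names are ours.

```python
import numpy as np

def admm_glassorp(V, Omega, lam_spec, lam_spat, mu, n_iter=100):
    """ADMM for GLASSORP (Algorithm 1), single level, fixed iterations.

    V     : n x p wavelet coefficients of the observation (V = D^T Y).
    Omega : length-p vector of per-band noise variances.
    Returns the estimated coefficient matrix W (n x p).
    """
    n, p = V.shape
    R = np.zeros((p - 1, p))
    R[:, :-1] -= np.eye(p - 1)
    R[:, 1:] += np.eye(p - 1)

    Omega_inv = np.diag(1.0 / Omega)
    # Lambda = Omega^{-1} + mu I + lam_spec R^T R is shared by every row j,
    # so its inverse can be precomputed once.
    Lam_inv = np.linalg.inv(Omega_inv + mu * np.eye(p) + lam_spec * R.T @ R)

    W = V.copy()                # coefficient estimates, rows w_j
    S = np.zeros((n, p))        # split variables s_j, stacked row-wise
    Dual = np.zeros((n, p))     # scaled dual variables d_j

    for _ in range(n_iter):
        # w-update, Eq. (6): w_j = Lambda^{-1}(Omega^{-1} v_j + mu (s_j + d_j)),
        # applied to all rows at once (Lambda is symmetric).
        W = (V / Omega + mu * (S + Dual)) @ Lam_inv.T
        # s-update: group soft-threshold of (w_j - d_j) with radius lam_spat/mu.
        Z = W - Dual
        norms = np.linalg.norm(Z, axis=1, keepdims=True)
        S = np.maximum(0.0, 1.0 - (lam_spat / mu) / np.maximum(norms, 1e-12)) * Z
        # dual update: d_j <- d_j - w_j + s_j.
        Dual = Dual - W + S
    return W
```

The final image estimate is then obtained by applying the inverse 2D wavelet transform (X̂ = DŴ) band by band.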

3. EXPERIMENTAL RESULTS

GLASSORP was applied to both simulated and real hyperspectral data sets. The results were compared with SURE Shrinkage (SUREShrink) [23], bivariate shrinkage (BiShrink) [7] and bivariate shrinkage for PCA (PCABiShrink) [14], based on SNR and also visually. The noise standard deviation for the i-th band (\(\sigma_i\)) is estimated in the wavelet domain as [23]

\[
\sigma_i = \frac{\mathrm{median}\left(\left|\mathbf{W}^{L+1}_i\right|\right)}{0.6745}, \tag{7}
\]

where \(\mathbf{W}^{L+1}_i\) is the 2D wavelet coefficients of the subband \(L+1\) (HH) in the \(i\)-th band.
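A minimal sketch of the estimator (7) using PyWavelets (the band-last array layout is our assumption; 'db6' has 12 filter coefficients, matching the Daubechies wavelet used later in the paper):

```python
import numpy as np
import pywt

def estimate_band_sigma(band, wavelet="db6"):
    """Noise std of one band via Eq. (7): MAD of the finest HH subband."""
    _, (_, _, cD) = pywt.dwt2(band, wavelet)  # cD is the diagonal (HH) subband
    return np.median(np.abs(cD)) / 0.6745

def estimate_sigma_per_band(cube, wavelet="db6"):
    """cube: (rows, cols, p) hyperspectral array -> length-p sigma estimates."""
    return np.array([estimate_band_sigma(cube[:, :, i], wavelet)
                     for i in range(cube.shape[2])])
```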

3.1 Simulated Hyperspectral Data Set

A hyperspectral data set was simulated using the USGS spectral library [24]. Ten different spectra were chosen from USGS and are listed in Table 1; each was allocated to one class, shown in Figure 1. To get variability in the spectra in each band, different spectra from the same label in USGS were chosen randomly for each pixel and a moving average (3 × 3) was applied on them. The first and last bands are ignored since the spectra chosen from USGS have mostly zero values at those wavelengths. Thus, the final simulated data set has 222 bands of size 128 × 128. A noisy hyperspectral data set was simulated by adding zero-mean Gaussian noise to the simulated data set. The noise variance along the spectral axis (\(\sigma_i^2\)) varies like a Gaussian shape centered at the middle band (\(p/2\)) as

\[
\sigma_i^2 = \sigma^2\,\frac{e^{-\frac{(i-p/2)^2}{2\eta^2}}}{\sum_{j=1}^{p} e^{-\frac{(j-p/2)^2}{2\eta^2}}}, \tag{8}
\]

where the power of the noise is controlled by \(\sigma\) and \(\eta\) plays the role of the standard deviation of the Gaussian shape (in this paper, \(\eta = 15\)) [25]. To evaluate the simulation results, the SNR in dB is given by

\[
\mathrm{SNR}_{\mathrm{out}} = 10\log_{10}\!\left(\frac{\|\mathbf{X}\|_F^2}{\|\mathbf{X}-\hat{\mathbf{X}}\|_F^2}\right),
\]

and the level of input noise is given by

\[
\mathrm{SNR}_{\mathrm{in}} = 10\log_{10}\!\left(\frac{\|\mathbf{X}\|_F^2}{\|\mathbf{X}-\mathbf{Y}\|_F^2}\right).
\]
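The simulation protocol of (8) and the two SNR measures can be sketched as follows (scaling σ to reach a target SNR_in is our reading of how the experiment is set up; the function names are ours):

```python
import numpy as np

def gaussian_noise_profile(p, eta=15.0):
    """Normalized Gaussian-shaped variance profile of Eq. (8) (for sigma^2 = 1)."""
    i = np.arange(1, p + 1)
    g = np.exp(-((i - p / 2.0) ** 2) / (2.0 * eta ** 2))
    return g / g.sum()

def snr_db(X, X_est):
    """SNR in dB between a reference X and an estimate or noisy observation."""
    return 10.0 * np.log10(np.sum(X ** 2) / np.sum((X - X_est) ** 2))

def add_simulated_noise(X, target_snr_in, eta=15.0, rng=None):
    """Add zero-mean Gaussian noise with the band-dependent variances of
    Eq. (8), with sigma^2 chosen so that SNR_in hits the requested value."""
    rng = np.random.default_rng() if rng is None else rng
    n, p = X.shape  # vectorized pixels x bands, as in the model Y = DW + N
    profile = gaussian_noise_profile(p, eta)
    # Required total noise energy: ||X||_F^2 / 10^(SNR_in/10) = n * sum(sigma_i^2).
    sigma2 = np.sum(X ** 2) / (10.0 ** (target_snr_in / 10.0)) / n
    sigmas = np.sqrt(sigma2 * profile)
    return X + rng.standard_normal(X.shape) * sigmas  # broadcasts per band
```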

Table 1: Names of the information classes for the simulated data set.

Cl. #   Class Name             Sample #
1       Lodgepole-Pine         13783
2       Olivine                289
3       Grass-dry              289
4       Antigorite             289
5       Jarosite               289
6       Melting-snow           289
7       Sagebrush              289
8       Espruce-Sr             289
9       Grass-Fescue-Wheatg    289
10      Hematite-Coatd-Qtz     289

Figure 1: Labels for the simulated data set.

3.1.1 Spectral vs. Spatial Penalty

Here, the contributions of the spatial penalty (GLASSO) and the spectral penalty (FORP) are investigated on the simulated data set based on SNR. Zero-mean Gaussian noise was added to the simulated data set using (8) to get SNR_in = 15 dB. Figure 2 shows SNR_out w.r.t. λ_spec and λ_spat. The optimum value is at λ_spec = 1.00 and λ_spat = 0.45, where SNR_out = 30.54 dB. From Figure 2 it can be seen that when only the spatial penalty is used (λ_spec = 0.00) we get a very low SNR_out; the optimum in this case occurs at λ_spat = 1.50, where SNR_out = 24.76 dB. By adding the spectral penalty (λ_spec = 0.20), SNR_out increases significantly; this is where the slope of the mesh graph in Figure 2 is high. On the other hand, for the spectral penalty alone (λ_spat = 0.00) we get a fairly high SNR_out, with the optimum at λ_spec = 1.20, where SNR_out = 29.03 dB. A Daubechies wavelet with 12 coefficients and three decomposition levels is used in this paper [26]. The proportion of the regularization parameters over the wavelet decomposition levels is chosen as [λ¹_spat, λ²_spat, λ³_spat, λ⁴_spat] = λ_spat [0, 1/2, 1/√2, 1] for the GLASSO penalty, and for FORP it is chosen as [λ¹_spec, λ²_spec, λ³_spec, λ⁴_spec] = λ_spec [λ₁, λ₂, λ₃, λ₄], where λ_l is given by minimizing Stein's Unbiased Risk Estimator (SURE) given in Ref. 19 when λ_spat = 0, as

\[
\mathrm{SURE}^{LL}_{\mathrm{FORP}} = \frac{1}{n/2^{2L}}\sum_{j=1}^{n/2^{2L}} \left( \left\|\hat{\mathbf{w}}^1_j - \mathbf{v}^1_j\right\|_2^2 + 2\,\mathrm{tr}\!\left(\left(\boldsymbol{\Omega}^{-1} + \lambda^1_{\mathrm{spec}}\mathbf{R}^T\mathbf{R}\right)^{-1}\right) \right)
\]

for the coarse coefficients, and for the detail ones at the \(l\)-th level of the wavelet decomposition

\[
\mathrm{SURE}^{l}_{\mathrm{FORP}} = \frac{1}{3\times\mathrm{idx}}\sum_{j=\mathrm{idx}+1}^{4\times\mathrm{idx}} \left( \left\|\hat{\mathbf{w}}^l_j - \mathbf{v}^l_j\right\|_2^2 + 2\,\mathrm{tr}\!\left(\left(\boldsymbol{\Omega}^{-1} + \lambda^l_{\mathrm{spec}}\mathbf{R}^T\mathbf{R}\right)^{-1}\right) \right),
\]

where \(\mathrm{idx} = n/2^{2(L-l+1)}\). However, the regularization parameters are expected to be different for GLASSORP; indeed, they are selected automatically based on the amount of noise in each level for FORPDN using the aforementioned SURE. Hence, they can only represent the proportions of \(\lambda_{\mathrm{spec}}\) for GLASSORP.
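As a sketch of how λ^l could be selected by minimizing such a SURE criterion over a grid (the grid, the normalization and the helper names are our assumptions; the closed-form FORP estimate ŵ_j = (Ω^{-1} + λ R^T R)^{-1} Ω^{-1} v_j follows from (3) with λ_spat = 0):

```python
import numpy as np

def sure_forp(Vsub, Omega, lam, R):
    """SURE score for the FORP estimate on one subband (rows of Vsub are v_j)."""
    Omega_inv = np.diag(1.0 / Omega)
    A_inv = np.linalg.inv(Omega_inv + lam * R.T @ R)
    # Closed-form FORP estimate: w_j = (Omega^{-1} + lam R'R)^{-1} Omega^{-1} v_j.
    W_hat = Vsub @ (A_inv @ Omega_inv).T
    resid = np.sum((W_hat - Vsub) ** 2, axis=1)
    df = 2.0 * np.trace(A_inv)          # the 2 tr((Omega^{-1} + lam R'R)^{-1}) term
    return np.mean(resid + df)

def pick_lambda(Vsub, Omega, R, grid=np.logspace(-3, 1, 50)):
    """Pick the lambda minimizing SURE over a (hypothetical) grid."""
    scores = [sure_forp(Vsub, Omega, lam, R) for lam in grid]
    return grid[int(np.argmin(scores))]
```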

3.1.2 Restoration Performance w.r.t. Noise Power

The performance of GLASSORP is compared, based on SNR, with the aforementioned methods in Figure 3, in which SNR_out is shown in dB for different levels of added noise. The amount of input noise is given by SNR_in. The results shown in Figure 3 are averages over 10 experiments; the error bars show the standard deviations over the experiments. GLASSORP outperforms the other methods used in the experiments for different values of noise, except for a very low level of noise (SNR_in = 40 dB), where PCABiShrink gives a slightly better result. Figure 4 shows the reconstruction of band 127 of the simulated data set for all the methods used in the experiments. It can be seen that GLASSORP outperforms the other methods both visually and based on SNR.

Figure 2: The effects of the spatial and spectral penalties on SNR_out for the simulated data set.

Figure 3: GLASSORP performance compared to other methods applied on the simulated data set.

3.2 Real Hyperspectral Data Set

Here, we consider the restoration of the Indian Pines data set*. The Indian Pines data set is 145 × 145 pixels in 220 bands, collected by the AVIRIS sensor. The whole data set was restored by GLASSORP and visually compared with the methods used for comparison in the simulated part of the experiment. Visual results are shown in Figure 5 for bands 1, 105, 156 and 220. It can be seen that the results obtained by the proposed method give considerable visual improvement compared to the other methods. It is also worth mentioning that GLASSORP preserves more details than the other methods used in this part of the experiment.

* Available through Purdue University's MultiSpec site.

Figure 4: The restoration of band number 127 for the simulated data set when SNR_in = 15 dB for the whole data cube. (a) Original band, (b) noisy band (10.32 dB); denoised band using (c) SUREShrink2D (17.49 dB), (d) BiShrink2D (18.05 dB), (e) PCABiShrink (21.81 dB) and (f) GLASSORP (27.36 dB).

4. CONCLUSION

In this paper, a new hyperspectral restoration method called GLASSORP was proposed. GLASSORP is a penalized least squares method using a spatial-spectral penalty: a first order roughness penalty is used as the spectral penalty and a group lasso penalty as the spatial one. The method was applied to a simulated and a real hyperspectral data set. It was shown that GLASSORP outperforms other methods for the simulated data set, except for very low noise (SNR_in = 40 dB), where PCABiShrink showed slightly better performance based on SNR. Moreover, the visual improvements of GLASSORP applied on the real data set were significant compared to the other methods.

ACKNOWLEDGMENTS

This work was supported by the Doctoral Grants of the University of Iceland Research Fund, the University of Iceland Research Fund, and the Icelandic Research Fund (130635051).

Figure 5: Restoration of the Indian Pines data set. From top to bottom: data, SUREShrink, BiShrink, PCABiShrink and GLASSORP; from left to right: bands 1, 105, 156 and 220.

REFERENCES

[1] Kerekes, J. P. and Baum, J. E., "Hyperspectral imaging system modeling," Lincoln Laboratory Journal 14(1), 117-130 (2003).
[2] Landgrebe, D. and Malaret, E., "Noise in remote-sensing systems: The effect on classification error," IEEE Transactions on Geoscience and Remote Sensing GE-24, 294-300 (March 1986).
[3] Atkinson, I., Kamalabadi, F., and Jones, D., "Wavelet-based hyperspectral image estimation," in [Proc. of IEEE International Geoscience and Remote Sensing Symposium (IGARSS)], 743-745 (Jul 2003).
[4] Zelinski, A. and Goyal, V., "Denoising hyperspectral imagery and recovering junk bands using wavelets and sparse approximation," in [Proceedings of IEEE International Geoscience and Remote Sensing Symposium (IGARSS)], 387-390 (Aug 2006).
[5] Solo, V. and Ulfarsson, M., "Threshold selection for group sparsity," in [IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)], 3754-3757 (March 2010).
[6] Basuhail, A. A. and Kozaitis, S. P., "Wavelet-based noise reduction in multispectral imagery," in [SPIE, Algorithms for Multispectral and Hyperspectral Imagery IV], 3372, 234-240 (July 1998).
[7] Sendur, L. and Selesnick, I. W., "Bivariate shrinkage functions for wavelet-based denoising exploiting interscale dependency," IEEE Trans. Signal Process. 50(11), 2744-2756 (Nov. 2002).
[8] Chen, G., Bui, T. D., and Krzyzak, A., "Denoising of three-dimensional data cube using bivariate wavelet shrinking," International Journal of Pattern Recognition and Artificial Intelligence (IJPRAI) 25(3), 403-413 (2011).
[9] Rasti, B., Sveinsson, J. R., Ulfarsson, M. O., and Benediktsson, J. A., "Hyperspectral image denoising using 3D wavelets," in [IEEE Proceedings of International Geoscience and Remote Sensing Symposium (IGARSS)], 1349-1352 (2012).
[10] Tucker, L. R., "Some mathematical notes on three-mode factor analysis," Psychometrika 31, 279-311 (1966).
[11] Lathauwer, L. D., Moor, B. D., and Vandewalle, J., "On the best rank-1 and rank-(R1,R2,...,RN) approximation of higher-order tensors," SIAM J. Matrix Anal. Appl. 21, 1324-1342 (March 2000).
[12] Renard, N., Bourennane, S., and Blanc-Talon, J., "Denoising and dimensionality reduction using multilinear tools for hyperspectral images," IEEE Geoscience and Remote Sensing Letters 5, 138-142 (April 2008).
[13] Karami, A., Yazdi, M., and Asli, A., "Best rank-r tensor selection using Genetic Algorithm for better noise reduction and compression of hyperspectral images," in [Fifth International Conference on Digital Information Management (ICDIM)], 169-173 (July 2010).
[14] Chen, G. and Qian, S.-E., "Denoising of hyperspectral imagery using principal component analysis and wavelet shrinkage," IEEE Trans. Geoscience and Remote Sensing 49, 973-980 (2011).
[15] Selesnick, I., Baraniuk, R., and Kingsbury, N., "The dual-tree complex wavelet transform," IEEE Signal Processing Magazine 22, 123-151 (Nov. 2005).
[16] Rasti, B., Sveinsson, J. R., Ulfarsson, M. O., and Benediktsson, J. A., "Hyperspectral image denoising using a new linear model and sparse regularization," in [IEEE Proceedings of International Geoscience and Remote Sensing Symposium (IGARSS)], (2013).
[17] Rasti, B., Sveinsson, J. R., Ulfarsson, M. O., and Sigurdsson, J., "Wavelet based sparse principal component analysis for hyperspectral denoising," in [IEEE 5th Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS)], (2013).
[18] Rasti, B., Sveinsson, J. R., Ulfarsson, M. O., and Sigurdsson, J., "First order roughness penalty for hyperspectral image denoising," in [IEEE 5th Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS)], (2013).
[19] Rasti, B., Sveinsson, J. R., Ulfarsson, M. O., and Benediktsson, J. A., "Hyperspectral image denoising using first order spectral roughness penalty in wavelet domain," accepted, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (2013).
[20] Eckstein, J., [Splitting Methods for Monotone Operators with Applications to Parallel Optimization], Center for Intelligent Control Systems, M.I.T. (1989).
[21] Eckstein, J. and Bertsekas, D. P., "On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators," Math. Program. 55, 293-318 (June 1992).
[22] Boyd, S., Parikh, N., Chu, E., Peleato, B., and Eckstein, J., "Distributed optimization and statistical learning via the alternating direction method of multipliers," Foundations and Trends in Machine Learning 3, 1-122 (Jan 2011).
[23] Donoho, D. and Johnstone, I. M., "Adapting to unknown smoothness via wavelet shrinkage," Journal of the American Statistical Association 90, 1200-1224 (1995).
[24] Clark, R. N., "USGS digital spectral library," (September 2007). (http://speclab.cr.usgs.gov/spectral-lib.html).
[25] Bioucas-Dias, J. and Nascimento, J., "Hyperspectral subspace identification," IEEE Transactions on Geoscience and Remote Sensing 46, 2435-2445 (Aug. 2008).
[26] Selesnick, I. W. and Figueiredo, M. A. T., "Signal restoration with overcomplete wavelet transforms: Comparison of analysis and synthesis priors," in [Proceedings of SPIE], 7446 (Wavelets XIII) (2009).