Hyperspectral image super-resolution using deep convolutional neural network

Yunsong Li^{a,b}, Jing Hu^{a,b,∗}, Xi Zhao^{a,b}, Weiying Xie^{a,b}, JiaoJiao Li^{a,b}

a State Key Laboratory of Integrated Service Network, Xidian University, Xi'an 710071, China
b Joint Laboratory of High Speed Multi-source Image Coding and Processing, Xidian University, Xi'an 710071, China

Abstract

Limited by existing imaging hardware, it is challenging to obtain a hyperspectral image (HSI) with a high spatial resolution. Super-resolution (SR) focuses on ways to enhance the spatial resolution, and HSI SR is a highly attractive topic in computer vision that has drawn the attention of many researchers. However, most HSI SR methods improve the spatial resolution while severely distorting the important spectral information. This paper presents an HSI SR method that combines a spatial constraint (SCT) strategy with a deep spectral difference convolutional neural network (SDCNN) model; it super-resolves the HSI while preserving the spectral information. The SCT strategy constrains the low-resolution (LR) HSI generated from the reconstructed high-resolution (HR) HSI to be spatially close to the input LR HSI. The SDCNN model is proposed to learn an end-to-end spectral difference mapping between the LR HSI and the HR HSI. Experiments have been conducted on three databases with both indoor and outdoor scenes. Comparative analyses validate that the proposed method enhances the spatial information better than state-of-the-art methods while simultaneously preserving the spectral information.

Keywords: hyperspectral image, super-resolution, convolutional neural network

∗ Corresponding author. Tel: (86) 153-1998-0169. Email address: [email protected] (Jing Hu)

Preprint submitted to Journal of LaTeX Templates, May 3, 2017

1. Introduction

Hyperspectral imagery is the technology that acquires images of the same scene in many contiguous and narrow spectral bands. The acquired hyperspectral image (HSI) is a three-dimensional data cube that merges the spatial and spectral domains and contains both spatial and spectral information [1]. The rich spectral information can be exploited to distinguish a material of interest among various others. Owing to this spectral information, HSIs have been widely used in many remote sensing and computer vision applications, such as military object detection [2], geological exploration [3] and change detection [4, 5].

While an HSI achieves a high spectral resolution through these contiguous and narrow bands, its spatial resolution is usually much coarser than that of the RGB images in our daily life [6]. This is because the dense spectral bands of hyperspectral sensors leave only a limited number of photons, on average, for each narrow spectral window. The straightforward way to increase the spatial resolution is to reduce the pixel size or to increase the chip size through sensor manufacturing technology. Reducing the pixel size increases the number of pixels per unit area; however, as the pixel size decreases, the amount of light available to each pixel also decreases, and the image quality is severely degraded by shot noise [7]. Increasing the chip size, in turn, increases the capacitance, and a larger capacitance makes it difficult to speed up the charge transfer rate, so this approach is impractical as well [7]. Signal post-processing allows the measured signal to be enhanced after measurement [8], and can be used to address the low spatial resolution of HSIs. Super-resolution (SR) reconstruction is a classical signal post-processing technique that aims at obtaining a high-resolution (HR) image (or sequence) from one or multiple observed low-resolution (LR) images. The major advantages of SR are that it is economical and that existing LR imaging systems can still be utilized. SR has proved useful in many practical cases, including medical imaging [9], satellite imaging [10] and video applications [11]. Therefore, it is of great necessity to develop HSI SR techniques to improve the spatial resolution of HSIs.

In this paper, we present an HSI SR method that combines a spatial constraint (SCT) strategy with a deep spectral difference convolutional neural network (SDCNN) model. The SCT strategy constrains the LR HSI generated from the reconstructed HR HSI to be spatially close to the input LR HSI, and aims at enhancing the spatial resolution. The SDCNN model is a specific convolutional neural network (CNN) proposed to learn an end-to-end spectral difference mapping between the LR HSI and the HR HSI; it is designed to preserve the important spectral information. Our method differs fundamentally from existing HSI SR methods in that it requires neither an extra panchromatic image as input nor an explicitly learned dictionary for modeling the patch space. Spatial information is improved while spectral information is preserved through the proposed SCT strategy and SDCNN model (named SCT SDCNN), with little preprocessing or post-processing. Experiments have been conducted on HSIs from three different databases that include both indoor and outdoor scenes. Comparative analyses have validated the effectiveness of the proposed SCT SDCNN method.

2. Related work

2.1. Image super-resolution

Existing SR methods can be classified into three categories: interpolation-based methods [12, 13], reconstruction-based methods [14, 15, 16, 17] and learning-based methods [18, 19, 20]. Interpolation-based methods are simple in theory but cannot bring in much extra information; it is difficult for them to recover the missing high-frequency detail of the HR image and to enhance the spatial resolution. Reconstruction-based methods first build an image acquisition model and then solve for the unknown HR image from the LR input. This model is usually ill-posed, so during the SR process constraints such as edge priors can be enforced on the model to make the optimal solution close to the real scene [15]. Learning-based methods are the most popular in recent years. They can be further divided into two sub-types: internal-learning and external-learning SR methods. Internal-learning methods hold that, for one input image patch, similar patches can be found at other locations or other scales, and a combination of similar patches can be used to reconstruct an HR image. External-learning methods first extract the "meta-detail" shared among external images and then reconstruct the HR image from this "meta-detail". SR methods via deep learning are the typical external-learning methods, and they achieve state-of-the-art performance nowadays. This type of method learns a mapping between LR and HR images from a large amount of training data.

It is noted that most SR methods are designed for monochromatic and color images. In recent years, with the development of hyperspectral sensors and their wide applications, HSI SR technology has received much more attention from researchers. Akgun et al. [21] proposed a complex model of the HSI acquisition process and put forward a projection-onto-convex-sets-based SR method. Zhang et al. [6] divided the bands of an HSI into three groups, and a principal-component-analysis-based SR method was applied to the primary component. Simoes et al. [22] proposed an optimization model that contains two data-fitting terms and an edge-preserving regularizer. Akhtar et al. [23] proposed a sparse coding method exploiting the non-negativity and spatio-spectral sparsity of the scene. Dong et al. [24] formulated the estimation of the HR HSI as a joint estimation of the dictionary and the sparse code. However, most of these methods tackle HSI SR as an image fusion problem by using an auxiliary HR image. In reality, it is very difficult to obtain a completely registered pair of an HR panchromatic image and an HSI of the same scene, which makes this kind of method impractical. Sub-pixel mapping was introduced by Atkinson et al. [25] to transform a fraction image by the unmixing operation; it tackles the SR problem via unmixing without requiring an auxiliary panchromatic image. This technique divides a pixel into sub-pixels and assigns each new, smaller sub-pixel to a land cover class. However, the noise generated by the unmixing operation is inevitable during the mapping, which may have a negative influence on the SR process. Moreover, all these methods super-resolve the HSI without specifically preserving the important spectral information.

2.2. Deep learning for HSI applications

CNNs date back decades [26]. Owing to the rapid growth of computing power and of the amount of available data, deep CNNs have recently enjoyed explosive popularity due to their success in image processing applications such as object detection [27, 28], face recognition [29] and image denoising [30]. Following these successes on RGB images, there have also been studies using deep CNN techniques for HSI applications such as classification [31, 32, 33], dimensionality reduction [34] and feature extraction [35, 36]. Zhang et al. [33] proposed a deep ensemble framework for scene classification for the first time. Zhao et al. [32] explored a multiscale CNN to transform the original HSI into a pyramid structure that contains spatial features for HSI classification, with the final result given by majority voting. Chen et al. [36] introduced the deep belief network (DBN) to extract spectral-spatial features as a pre-processing step for future HSI applications. Yuan et al. [37] applied the CNN model proposed by Dong et al. [20] directly to hyperspectral images, but this method took into account neither the preservation of spectral information nor the difference between hyperspectral and RGB images.

In this paper, the proposed SDCNN model is, to our knowledge, the first application of a CNN model specifically to the HSI SR problem. The SDCNN mainly consists of convolutional layers. Different from the aforementioned HSI SR methods in section 2.1, the proposed SDCNN model learns the mapping between the input and the output, and it can be trained in an end-to-end manner by back-propagation [38]. In order to preserve the important spectral information, which most HSI fusion methods cannot achieve [6], the input data and label of the proposed SDCNN are the spectral differences of the LR HSI and of the HR HSI, respectively. We have also incorporated the SCT strategy to spatially constrain the LR HSI generated from the reconstructed HR HSI to be close to the input LR HSI. More details are discussed in section 3.

The remainder of the paper is organized as follows. Section 3 describes the proposed method. Experimental results are provided in section 4, and conclusions are drawn in section 5.

3. Proposed method

The proposed SCT SDCNN method consists of four main parts: building an observation model of the HSI, spatial reconstruction by the SCT strategy, spectral reconstruction by the SDCNN model, and SR reconstruction by the combined SCT SDCNN method. Detailed descriptions of these parts are presented in the following subsections.

3.1. Observation model of the HSI

An observation model bridges the desired HR HSI and the input LR HSI. To develop the HSI SR method, we first formulate an observation model.

Let the desired HR HSI be denoted as H = {H_1, H_2, ..., H_K}, where K is the total number of bands, and each band of the HR HSI has sN_1 × sN_2 pixels; the parameter s is the down-sampling factor in the spatial domain. The LR HSI is represented as L = {L_1, L_2, ..., L_K}, whose spatial size is N_1 × N_2. Both H_i and L_i are spatial descriptions of the scene at one band. During the imaging process, the camera lens and aperture produce a blurred version of the object, and the charge-coupled device (CCD) turns this degraded analog signal into a discrete image. In addition, the images are contaminated by additive noise from various sources: quantization error, sensor measurement, and so on [39, 40]. Hence, an image observation model is built to depict the relationship between the desired HR HSI and the input LR HSI as follows:

L = (H * G) ↓ + n    (1)

where G represents a spatial filter, '*' the convolution operation and '↓' the down-sampling operation; n denotes additive noise following a zero-mean Gaussian distribution. For each band in the HSI, the observation model can be rewritten as

L_i = (H_i * G) ↓ + n    (2)
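As a concrete illustration, the per-band degradation of Eq. (2) can be sketched in NumPy. This is only a minimal sketch: the kernel size, σ and noise level below are illustrative assumptions, not values given in the paper.

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    """2-D Gaussian spatial filter G (illustrative size/sigma)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return k / k.sum()

def degrade_band(H_i, G, s=2, noise_std=0.0, seed=0):
    """L_i = (H_i * G) ↓ + n  (Eq. (2)): blur, down-sample by s, add noise."""
    pad = G.shape[0] // 2
    Hp = np.pad(H_i, pad, mode='edge')        # edge padding avoids border artifacts
    h, w = H_i.shape
    blurred = np.zeros_like(H_i)
    for r in range(h):                        # direct 2-D convolution with G
        for c in range(w):
            blurred[r, c] = np.sum(Hp[r:r + G.shape[0], c:c + G.shape[1]] * G)
    down = blurred[::s, ::s]                  # the '↓' operator
    rng = np.random.default_rng(seed)
    return down + rng.normal(0.0, noise_std, down.shape)  # the '+ n' term

# Example: a 16x16 HR band degraded to an 8x8 LR band
H_i = np.linspace(0.0, 1.0, 256).reshape(16, 16)
L_i = degrade_band(H_i, gaussian_kernel(), s=2)
print(L_i.shape)   # (8, 8)
```

This is exactly the pipeline used later in section 4.1.1 to synthesize the LR test inputs.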

where i = 1, 2, ..., K indexes the current band. Considering the spatial constraint and the spectral constraint in the SR process, we minimize the following energy function, which enforces constraints in both the spatial and the spectral domain:

E(Ĥ_i) = E(Ĥ_i | L_i) + λ E(Ĥ_i | Ĥ_{i−1}, L_i, L_{i−1})    (3)

where L_{i−1} and L_i denote the (i−1)th and ith bands of the input LR HSI L, respectively, and Ĥ_{i−1} and Ĥ_i the (i−1)th and ith bands of the reconstructed HR HSI Ĥ. E(Ĥ_i | L_i) and E(Ĥ_i | Ĥ_{i−1}, L_i, L_{i−1}) denote the constraints provided by the SCT strategy and the SDCNN model, respectively. Detailed descriptions of the reconstruction in the spatial and spectral domains are presented in sections 3.2 and 3.3.

3.2. Spatial reconstruction by the SCT strategy

Given the LR HSI L, we denote the constraint in the spatial domain as

E(Ĥ_i | L_i) = ||(Ĥ_i * G) ↓ − L_i||_F^2    (4)

This energy function enforces the LR band generated from the reconstructed band Ĥ_i to be close to the input LR band L_i, and can be minimized by the gradient descent method:

∂E(Ĥ_i | L_i) / ∂Ĥ_i = 2((Ĥ_i * G) ↓ − L_i) ↑ * G    (5)

where '↑' denotes the up-scaling operation; the specific up-scaling operation used in this paper is bicubic interpolation. Under the ideal condition that the down-sampling operation is reversible and the blur imported by the filtering operation is fully removed, the iteration strategy

Ĥ_i^{t+1} = Ĥ_i^t − τ((Ĥ_i^t * G) ↓ − (H_i * G) ↓) ↑ * G    (6)

can be written as

Ĥ_i^{t+1} = Ĥ_i^t − τ(Ĥ_i^t − H_i)    (7)
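One SCT iteration of Eq. (6) can be sketched as follows. This is a simplified sketch under stated assumptions: the blur G is approximated by the identity and up/down-sampling by nearest-neighbour replication, whereas the paper uses a Gaussian filter and bicubic interpolation.

```python
import numpy as np

def downsample(X, s):
    """'↓': keep every s-th pixel."""
    return X[::s, ::s]

def upsample(X, s):
    """'↑': nearest-neighbour up-scaling (the paper uses bicubic)."""
    return np.repeat(np.repeat(X, s, axis=0), s, axis=1)

def sct_update(H_hat, L_i, s=2, tau=0.5):
    """One iteration of Eq. (6), with the blur G taken as identity:
    H^{t+1} = H^t - tau * ((H^t)↓ - L_i)↑"""
    residual = downsample(H_hat, s) - L_i        # mismatch in LR space
    return H_hat - tau * upsample(residual, s)   # project back and correct

# Example: with an invertible '↓'/'↑' pair the LR residual shrinks
# geometrically, as predicted by Eq. (9): r_k = (1 - tau)^k * r_0
L_i = np.arange(16.0).reshape(4, 4)
H_hat = np.zeros((8, 8))                         # crude initial guess
for _ in range(10):
    H_hat = sct_update(H_hat, L_i, s=2, tau=0.5)
print(np.abs(downsample(H_hat, 2) - L_i).max())  # small after 10 iterations
```

The printed residual equals (1 − τ)^10 times the initial one, matching the geometric decay derived next.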

In this way, minimizing the spatial constraint leads to the following iterations:

Ĥ_i^1 = Ĥ_i^0 − τ(Ĥ_i^0 − H_i) = (1−τ)Ĥ_i^0 − (1−τ)H_i + H_i
Ĥ_i^2 = Ĥ_i^1 − τ(Ĥ_i^1 − H_i) = (1−τ)^2 Ĥ_i^0 − (1−τ)^2 H_i + H_i
...
Ĥ_i^k = Ĥ_i^{k−1} − τ(Ĥ_i^{k−1} − H_i) = (1−τ)^k Ĥ_i^0 − (1−τ)^k H_i + H_i    (8)

where k is the number of iterations. The difference between the spatially reconstructed HR band Ĥ_i^k and the desired HR band H_i can then be written as

Ĥ_i^k − H_i = (1−τ)^k Ĥ_i^0 − (1−τ)^k H_i = (1−τ)^k (Ĥ_i^0 − H_i)    (9)

Both Ĥ_i^0 and H_i have constant values and τ lies between 0 and 1, so the difference between Ĥ_i^k and H_i shrinks as k grows. In reality, however, the down-sampling operation is irreversible and the blur is hard to remove fully; each iteration brings some information loss, so a much too large k is not beneficial to the SR performance. Moreover, as the scaling factor grows, the information loss per iteration becomes larger, which degrades the performance of the SCT strategy.

3.3. Spectral reconstruction by the SDCNN model

As described in section 1, the spectral curves of HSIs depict the variation of the scene reflectance with wavelength and are extremely important for HSI applications. Post-processing such as the SCT strategy may introduce some distortion of the spectral information, so we propose the SDCNN model to keep the spectral information of the reconstructed HSI close to that of the desired HSI. Meanwhile, the SCT strategy can hardly import extra information, which limits its performance as the scaling factor grows. The SDCNN model learns information from the training data and is trained to describe the mapping between a spectral difference of the LR HSI and that of the corresponding HR HSI. Thus, a simulated spectral difference of the HR HSI

LHd K 1 LHd 2

-

L1

LK LK 1

Ld K 1 Ld 2

L2

LHd1 up-sampling

Ld1

...

...

...

L1

-

...

LK

LK 1

L2

minus operation between neighboring bands

HK H K 1

-

...

HK

Hd K 1

H K 1

Hd 2

H2 Hd1

H1

minus operation between neighboring bands

HR HSI

data of the training process

...

H2 H1

-

spectral difference of the LR HSI

...

LR HSI

label of the training process

Fig. 1: The generating process of the training data and label

with respect to the LR HSI can be imported by the SDCNN model. As the scaling factor grows, the spatial information enhancement of the SDCNN model is more stable than that of the SCT strategy. The proposed SDCNN has three main parts: data and label setup, formulation of the SDCNN, and HSI SR using the SDCNN. Detailed descriptions are presented as follows.

3.3.1. Data and label setup

The LR HSI has experienced blurring, down-sampling and noise, and its spectral information has been distorted from that of the desired HR HSI. Deep CNN models can represent a hierarchy of increasingly complex features [41], so we propose to build an SDCNN model to depict the mapping between the spectral difference of the LR HSI and that of the desired HR HSI. For the SDCNN training process, the input data is the bicubic-upsampled version of the spectral difference between neighboring LR HSI bands, and the label is the corresponding spectral difference between neighboring HR HSI bands. The generation process of the training data and label is plotted in Fig. 1, where Hd and Ld represent the spectral differences of the HR HSI and of the LR HSI, respectively:

Hd_{i−1} = H_i − H_{i−1}    for i > 1
Ld_{i−1} = L_i − L_{i−1}    for i > 1    (10)

where i is the index of the band in the HSI. Up-sampling is applied to Ld to obtain an LHd whose size is the same as that of Hd; for simplicity we still call LHd the spectral difference of the LR HSI. The computed LHd and Hd are the data and label of the training process. In deep CNN training, normalization is a pre-processing step that accelerates convergence [42], so we normalize the data and label of our SDCNN by the following formula:

data  = (LHd − min(LHd, Hd)) / (max(LHd, Hd) − min(LHd, Hd))
label = (Hd − min(LHd, Hd)) / (max(LHd, Hd) − min(LHd, Hd))    (11)
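The data/label construction of Eqs. (10) and (11) can be sketched as follows. Assumptions are flagged in the comments: nearest-neighbour up-sampling stands in for the paper's bicubic interpolation, and the toy LR cube is produced by plain down-sampling.

```python
import numpy as np

def spectral_differences(cube):
    """Eq. (10): differences between neighbouring bands of a (K, H, W) cube."""
    return cube[1:] - cube[:-1]              # shape (K-1, H, W)

def upsample_nn(X, s):
    """Stand-in for the paper's bicubic up-sampling (nearest-neighbour here)."""
    return np.repeat(np.repeat(X, s, axis=1), s, axis=2)

def normalize_pair(LHd, Hd):
    """Eq. (11): joint min-max normalization of data and label to [0, 1],
    using the min/max over BOTH arrays so they share one scale."""
    lo = min(LHd.min(), Hd.min())
    hi = max(LHd.max(), Hd.max())
    return (LHd - lo) / (hi - lo), (Hd - lo) / (hi - lo)

# Example with a toy 4-band HSI
rng = np.random.default_rng(0)
H = rng.random((4, 8, 8))                    # HR cube, K = 4
L = H[:, ::2, ::2]                           # toy LR cube (plain down-sampling)
Hd = spectral_differences(H)                 # label
LHd = upsample_nn(spectral_differences(L), 2)  # data, same size as Hd
data, label = normalize_pair(LHd, Hd)
print(data.shape, label.shape)               # (3, 8, 8) (3, 8, 8)
```

Note that the joint normalization keeps data and label on the same scale, so the learned mapping is not biased by independent per-array rescaling.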

Both data and label of the SDCNN are normalized jointly to values between 0 and 1.

3.3.2. Formulation of the SDCNN

We aim to learn a mapping model, represented by the function F(), between LHd and Hd. The mapping process contains three operations: patch extraction and representation, non-linear mapping, and reconstruction.

• Patch extraction and representation: this operation extracts overlapping patches from the input LHd, and each patch is represented as a high-dimensional vector. We call these patches feature maps.

• Non-linear mapping: this operation is the core of the model. The convolutional layers here act as a non-linear function that maps one high-dimensional vector to another. In our SDCNN, the mapped high-dimensional vectors are conceptually the representations of the patches in Hd.

[Fig. 2: Typical SDCNN architecture with three stages: input → (convolution, nonlinearity) × 3 → output.]

• Reconstruction: this operation combines the mapped HR patches into the final spectral difference of the HR HSI, Hd̂, which is expected to be close to the desired spectral difference Hd.

The SDCNN is a trainable multilayer architecture composed of a series of stages. Each stage consists of two layers: 1) a convolutional layer and 2) a nonlinearity layer. Although pooling would make network training computationally more feasible, it reduces the resolution [43], so no pooling layer is included in the SDCNN model designed for super-resolution. On the other hand, because both the data and the label are images, contrary to a many-to-one classification problem, no fully-connected layer is included either. A typical SDCNN architecture with three stages is plotted in Fig. 2. Each layer type is described as follows:

• Convolutional layer: both the input and the output of the convolutional layer are 3-D arrays composed of 2-D feature maps. If the size of the input is w × h × c, the input consists of c feature maps of size w × h; the size of the corresponding output is w1 × h1 × n. Padding can be added in this layer to keep the input and output feature maps the same size. The convolutional layer acts as a detector of local conjunctions of features from the previous layer [44].

• Nonlinearity layer: this layer consists of a point-wise nonlinearity function applied to each component of the feature map, denoted as a = f(z). f() can be chosen as the rectified linear unit (ReLU) or the sigmoid function; in the SDCNN training process, f() is chosen as ReLU for easier optimization [45].

Learning the end-to-end mapping function F() requires estimating the model parameters Θ = {w_1, ..., w_i, b_1, ..., b_i}, where w_j and b_j represent the weight and bias of the jth convolutional layer, respectively, and i is the total number of convolutional layers. In the SDCNN training process, these parameters Θ are initialized from a Gaussian distribution with zero mean and variance 0.001. Their optimal values are obtained by minimizing the loss between the reconstructed HR spectral difference F(Ld; Θ) and the corresponding label Hd. The mean squared error (MSE) is utilized as the loss function:

L(Θ) = (1/n) Σ_{j=1}^{n} ||F(Ld_j; Θ) − Hd_j||²    (12)
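The three-stage forward pass and the MSE objective of Eq. (12) can be sketched in plain NumPy. The kernel sizes and channel counts below are illustrative assumptions (the paper selects them experimentally in section 4.3), and the weights are random rather than trained.

```python
import numpy as np

def conv2d_same(x, w, b):
    """'Same'-padded convolution: x is (c, H, W), w is (n, c, k, k), b is (n,).
    Returns (n, H, W), so input and output feature maps keep the same size."""
    n, c, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    H, W = x.shape[1], x.shape[2]
    out = np.empty((n, H, W))
    for o in range(n):
        for r in range(H):
            for col in range(W):
                out[o, r, col] = np.sum(xp[:, r:r + k, col:col + k] * w[o]) + b[o]
    return out

def relu(z):
    return np.maximum(z, 0.0)

def sdcnn_forward(x, params):
    """Stages of convolution followed by ReLU, with no pooling and no
    fully-connected layer, mirroring the three-stage SDCNN of Fig. 2."""
    for w, b in params:
        x = relu(conv2d_same(x, w, b))
    return x

def mse_loss(pred, target):
    """Eq. (12) for a single training sample."""
    return float(np.mean((pred - target) ** 2))

# Toy three-stage network: 1 -> 4 -> 4 -> 1 feature maps, 3x3 kernels
rng = np.random.default_rng(0)
params = [
    (0.1 * rng.standard_normal((4, 1, 3, 3)), np.zeros(4)),
    (0.1 * rng.standard_normal((4, 4, 3, 3)), np.zeros(4)),
    (0.1 * rng.standard_normal((1, 4, 3, 3)), np.zeros(1)),
]
LHd = rng.random((1, 8, 8))          # one up-sampled spectral difference
pred = sdcnn_forward(LHd, params)
print(pred.shape)                    # (1, 8, 8)
print(mse_loss(pred, rng.random((1, 8, 8))) >= 0.0)
```

In practice the parameters would be fitted by stochastic gradient descent with back-propagation, as described next, rather than left random.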

where n is the number of training samples. The loss L(Θ) is minimized via stochastic gradient descent with back-propagation [46]. During SDCNN training, the learning rate and momentum are set to 0.001 and 0.9, respectively.

3.3.3. HSI SR using SDCNN

When an LR HSI is input, a minus operation between neighboring bands gives the spectral difference Ld, and an LHd with the same size as the desired Hd is then obtained by the up-sampling operation. With the SDCNN learned in the aforementioned step, the output Hd̂_{i−1} = F(L_i − L_{i−1}) is exploited to restrict the spectral information in the spectral domain. The SR method constrained by the spectral difference can be expressed as

E(Ĥ_i | Ĥ_{i−1}, L_i, L_{i−1}) = ||(Ĥ_i − Ĥ_{i−1}) − F(L_i − L_{i−1})||_F^2    (13)

and it can be minimized by

H̃_i = Ĥ_{i−1} + F(L_i − L_{i−1})    (14)
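The band-by-band recursion of Eq. (14) can be sketched as follows. For illustration, an identity mapping stands in for the trained SDCNN F, and the "LR" cube is assumed to be already up-sampled to HR size.

```python
import numpy as np

def reconstruct_bands(H1_hat, L, F):
    """Spectral reconstruction via Eq. (14): H̃_i = Ĥ_{i-1} + F(L_i − L_{i−1}),
    starting from an already-reconstructed first band.
    L is a (K, H, W) cube up-sampled to HR size; F is the learned
    spectral-difference mapping (any callable here)."""
    bands = [H1_hat]
    for i in range(1, L.shape[0]):
        bands.append(bands[-1] + F(L[i] - L[i - 1]))  # telescoping sum of differences
    return np.stack(bands)

# Sanity check: if F is exact (identity) and the differences are exact,
# the telescoping sum recovers every band from the first one.
rng = np.random.default_rng(0)
H = rng.random((4, 8, 8))                  # ground-truth HR cube
H_rec = reconstruct_bands(H[0], H, F=lambda d: d)
print(np.abs(H_rec - H).max())             # ~0 up to float rounding
```

Because each band is anchored to its predecessor, errors in F accumulate along the spectral axis; this is exactly why the SCT refinement of section 3.2 is re-applied to every band in the full algorithm.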

Because the spectral information of a pixel is a vector combining a set of values at different bands, restricting the correlation between neighboring bands keeps both the magnitude and the direction of the spectral vector. This results in good SR reconstruction performance in the spatial domain while preserving the spectrum.

3.4. SR reconstruction by the SCT SDCNN method

When the SCT strategy is combined with the SDCNN model, the proposed SR method performs well at both spatial information enhancement and spectral information preservation. The overall algorithm for super-resolving the HR HSI is summarized in Algorithm 1.

Algorithm 1: HSI SR via the SCT SDCNN method

Input: LR HSI L, SDCNN-learned spectral differences Hd̂, iteration times m, band number K
Output: reconstructed HR HSI Ĥ
 1  for i = 1 to K do
 2      if i == 1 then
 3          up-sample L_1 → LH_1;
 4          set Ĥ_1^0 = LH_1;
 5          for k = 1 to m do
 6              update Ĥ_1^k via Eq. (6);
 7          set Ĥ_1 = Ĥ_1^m;
 8      else
 9          set Ĥ_i^0 = Ĥ_{i−1} + Hd̂_{i−1};
10          for k = 1 to m do
11              update Ĥ_i^k via Eq. (6);
12  return Ĥ;

The whole process of the proposed

method is shown in Fig. 3. When super-resolving the (i+1)th band, the former band Ĥ_i has already been super-resolved. Ĥ_i is combined with the SDCNN-mapped spectral difference F(L_{i+1} − L_i) (namely Hd̂_i) to obtain an initial value of Ĥ_{i+1}, and then Ĥ_{i+1} is updated by the SCT strategy through Eq. (6).

[Fig. 3: Whole process of the proposed method: super-resolution by the SDCNN model (Eq. (14)) followed by super-resolution by the SCT strategy (minimizing ||(Ĥ_{i+1} * G) ↓ − L_{i+1}||_F^2).]
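Algorithm 1 can be sketched end to end as follows. As before, this is a simplified sketch: nearest-neighbour resampling stands in for bicubic interpolation, the blur G is approximated by the identity, and an identity mapping stands in for the trained SDCNN F.

```python
import numpy as np

def up(X, s):
    """Nearest-neighbour stand-in for the paper's bicubic '↑'."""
    return np.repeat(np.repeat(X, s, axis=0), s, axis=1)

def down(X, s):
    """Simple '↓': keep every s-th pixel."""
    return X[::s, ::s]

def sct_sdcnn_sr(L, F, s=2, m=10, tau=0.5):
    """Algorithm 1 sketch: SDCNN spectral prediction plus m SCT iterations
    per band (blur G approximated by the identity for brevity)."""
    K = L.shape[0]
    H_hat = []
    for i in range(K):
        if i == 0:
            Hi = up(L[0], s)                      # lines 3-4: initialize band 1
        else:
            LHd = up(L[i] - L[i - 1], s)          # up-sampled spectral difference
            Hi = H_hat[-1] + F(LHd)               # line 9: Ĥ_i^0 = Ĥ_{i-1} + Hd̂_{i-1}
        for _ in range(m):                        # lines 5-6 / 10-11: SCT updates
            Hi = Hi - tau * up(down(Hi, s) - L[i], s)
        H_hat.append(Hi)
    return np.stack(H_hat)

# Toy run with an identity mapping in place of the trained SDCNN
rng = np.random.default_rng(0)
L = rng.random((3, 4, 4))
H_rec = sct_sdcnn_sr(L, F=lambda d: d, s=2, m=5)
print(H_rec.shape)   # (3, 8, 8)
```

The per-band structure makes clear why the method needs no auxiliary panchromatic image: each band is initialized from its spectral neighbour and then refined against its own LR observation.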

4. Experimental results

We first validate the performance of the proposed SCT SDCNN method module by module, and then show the overall performance. The experimental data come from three different databases [47, 48, 49]. The 32 HSIs in the CAVE database [47] cover a wide variety of real-world materials and objects under controlled laboratory illumination; their wavelengths range from 400 to 700 nm with a 10 nm interval per band, and the size of all these HSIs is 512 × 512. The Harvard database [48] contains 50 outdoor and indoor HSIs under daylight illumination, each with 31 bands from 420 nm to 720 nm. The database created by Foster et al. [49] (named the Foster database) contains 25 HSIs of outdoor urban and rural scenes; each HSI has 33 bands from 400 to 720 nm with 10 nm per band, and its size is 1341 × 1022. In the experiment, all HSIs in the Harvard and Foster databases are cropped to 1008 × 1008 in the spatial domain. We have selected two HSIs with comparatively rich textures from each database to validate the performance of the proposed method. The remaining HSIs from the three databases are used for the training process, in which 95 HSIs and six HSIs serve as training data and cross-validation data, respectively. To provide an eye-appealing exhibition of each HSI for evaluation, we have also reconstructed an RGB image for each HSI by averaging the 1st to the 11th band (400-500 nm or 420-520 nm) as the B channel, the 11th to the 21st band (500-600 nm or 520-620 nm) as the G channel, and the 21st to the 31st band (600-700 nm or 620-720 nm) as the R channel. Fig. 4 shows the original HSIs to be tested.

[Fig. 4: The HR RGB images reconstructed from the HSIs of the three databases. (a) flowers and (b) fake and real food, of size 512 × 512, from the CAVE database [47]; (c) imgg1 and (d) imgb0, under daylight illumination, from the Harvard database [48]; (e) Braga Graffiti and (f) Yellow Rose from the database built by Foster et al. [49]. The size of (c)-(f) is 1008 × 1008.]

4.1. Experiment design and parameter setting

4.1.1. Experimental design

The LR HSIs are obtained by the following steps: (1) the original HR HSIs are convolved with a Gaussian filter G; (2) the convolved HR HSIs are down-sampled by the scaling factor; (3) zero-mean Gaussian noise is added to the down-sampled HSIs. Implementations of the compared methods all come from the publicly available codes provided by their authors, and all input LR HSIs are the same. For the fusion methods [24, 50, 51, 52, 53] that require an auxiliary panchromatic image, we use as one of the inputs a panchromatic image created by interpolation and spectral averaging over the entire range of the LR HSI [54].

Four quantitative measurements are employed to evaluate the quality of the reconstructed HSI: root mean square error (RMSE), peak signal-to-noise ratio (PSNR), structural similarity index (SSIM) [55] and spectral angle mapper (SAM) [6]. In addition, to provide some evidence from the information-theoretic view, the following entropy measurement is employed to evaluate the

[Fig. 5: (a) The average PSNR as a function of the value of λ; (b) the average PSNR as a function of the number of iterations.]

performance of the reconstructed HSIs:

ENTR(i) = Σ_{val=0}^{65535} −p(I(x, y) = val) · log(p(I(x, y) = val))    (15)
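The entropy measurement of Eq. (15) can be sketched as follows. One assumption is flagged: the paper does not state the logarithm base, so the natural logarithm is used here (base 2 would only rescale the values).

```python
import numpy as np

def band_entropy(band, levels=65536):
    """Eq. (15): Shannon entropy of one band after normalizing to
    [0, levels-1] and quantizing to integers."""
    b = band.astype(np.float64)
    b = (b - b.min()) / max(b.max() - b.min(), 1e-12)   # normalize to [0, 1]
    q = np.rint(b * (levels - 1)).astype(np.int64)      # integer grey levels
    counts = np.bincount(q.ravel(), minlength=levels)
    p = counts[counts > 0] / q.size                     # drop zero-probability bins
    return float(-np.sum(p * np.log(p)) + 0.0)          # natural log (assumption)

def global_entropy(cube):
    """Global ENTR: average of ENTR(i) over all bands of a (K, H, W) cube."""
    return float(np.mean([band_entropy(b) for b in cube]))

# A constant band carries no information, so its entropy is 0;
# a two-level band with equal halves has entropy ln(2).
print(band_entropy(np.ones((8, 8))))
two_level = np.concatenate([np.zeros(32), np.ones(32)]).reshape(8, 8)
print(band_entropy(two_level))
```

A higher ENTR indicates that the reconstructed band carries more information, which is how Table 1 uses it to compare bicubic interpolation with the SCT strategy.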

where x ∈ [1, sN_1] and y ∈ [1, sN_2], the image I is normalized into the range [0, 65535] and quantized to integers, and p(I(x, y) = val) denotes the probability that a pixel value equals val. ENTR(i) represents the information entropy of the ith band, and the global ENTR is computed by averaging over the whole image. During the SDCNN training process, in order to evaluate the performance of the trained model, we define pl PSNR and pl SSIM to evaluate the similarity between the prediction output and the label.

4.1.2. Parameter setting

To evaluate the sensitivity of the key parameters, we vary the value of λ and the number of iterations. Fig. 5(a) plots the average PSNR of the six reconstructed HR HSIs as a function of λ. The proposed method performs best when λ lies in the range 1.0-1.2 and is insensitive to variations of λ within this range; in our implementation, we set λ = 1.0. Fig. 5(b) plots the average PSNR of the six reconstructed test HR HSIs as a function of the number of iterations. It is

noted that the performance of the proposed method grows slowly when iteration times is larger than 10. Moreover, the performance goes decline when iteration time is larger than 15. The reason may be that the information loss caused by 350

down-sampling and blur operations has suppressed the information brought by the SCT strategy. Hence, we set the iteration time as 10 in the experiment. 4.2. SR performance of the SCT strategy In order to verify the performance of our method in a clear way, we have tested its performance module by module. Table 1 shows the SR performance

355

of the bicubic interpolation and the SCT strategy for each test HSI. Seen from Table 1 in the horizontal direction, the SCT strategy can greatly improve the SR performance when the scaling factor is small. But when compared the data in the vertical direction, the performance of the SCT strategy will get worse as the scaling factor increases. According to the data in Table 1, when scaling

360

factor is 8, the performance of SCT strategy may be worse than that of the bicubic interpolation. The reason for this phenomenon is that when the scaling factor is small, most information about the original HR HSI is retained in the LR HSI, enforcing the LR HSI generated by the reconstructed HSI be close to the input LR HSI can be benefit for the SR process. When the scaling factor

365

increases, the information of the LR HSI is badly ruined from the original HR HSI, and the noise imported by this strategy will suppress the information the LR HSI brings, making a negative influence on the SR process. 4.3. SR performance of the SDCNN model Parameters sensitivity analysis: In order to learn an SDCNN between

370

the LHd and Hd, we have tested the performances with different network layers to choose the best layer number. When the best layer number is determined, different groups of kernel sizes have been tested to choose the best kernel size. Generally speaking, the deeper a network is, the more complex relationship it can represent, making the performance more acceptable. But Dong et al. [20]


observed that in practice it is hard to say that deeper is always better. This is because


Table 1: Quantitative measurement of the experimental results by the SCT strategy (each cell: bicubic / SCT)

Scaling factor = 2
| HSI                     | ENTR            | RMSE            | PSNR (dB)     | SSIM            | SAM              |
| CAVE flowers            | 5.3951 / 5.4865 | 0.0156 / 0.0110 | 36.17 / 39.19 | 0.9689 / 0.9795 | 5.0834 / 4.0978  |
| CAVE fake_and_real_food | 5.6635 / 5.9225 | 0.0137 / 0.0099 | 37.27 / 40.07 | 0.9701 / 0.9874 | 4.0176 / 3.3308  |
| Harvard imgg1           | 4.6158 / 4.8399 | 0.0074 / 0.0057 | 42.60 / 44.92 | 0.9889 / 0.9919 | 4.5515 / 4.4638  |
| Harvard imgb0           | 2.3830 / 2.5767 | 0.0189 / 0.0154 | 34.46 / 36.24 | 0.9080 / 0.9310 | 2.1498 / 2.1620  |
| Foster Braga_Graffiti   | 3.3134 / 3.3889 | 0.0128 / 0.0075 | 37.85 / 42.47 | 0.9713 / 0.9850 | 10.6775 / 9.9880 |
| Foster Yellow_Rose      | 2.2391 / 2.3068 | 0.0058 / 0.0034 | 44.70 / 49.45 | 0.9896 / 0.9946 | 3.4218 / 3.0349  |

Scaling factor = 4
| HSI                     | ENTR            | RMSE            | PSNR (dB)     | SSIM            | SAM               |
| CAVE flowers            | 5.2983 / 5.3538 | 0.0200 / 0.0186 | 33.97 / 34.59 | 0.9523 / 0.9522 | 6.3232 / 5.7587   |
| CAVE fake_and_real_food | 5.6596 / 5.7503 | 0.0174 / 0.0159 | 35.17 / 35.99 | 0.9538 / 0.9551 | 5.1150 / 4.7207   |
| Harvard imgg1           | 4.6376 / 4.6623 | 0.0089 / 0.0083 | 41.00 / 41.62 | 0.9854 / 0.9865 | 4.6716 / 5.0783   |
| Harvard imgb0           | 2.3365 / 2.4656 | 0.0222 / 0.0211 | 33.09 / 33.52 | 0.8826 / 0.8891 | 2.2831 / 2.2714   |
| Foster Braga_Graffiti   | 3.2785 / 3.3699 | 0.0176 / 0.0153 | 35.07 / 36.30 | 0.9515 / 0.9569 | 11.5335 / 11.5323 |
| Foster Yellow_Rose      | 2.2371 / 2.2375 | 0.0078 / 0.0067 | 42.14 / 43.42 | 0.9833 / 0.9854 | 3.8215 / 3.6748   |

Scaling factor = 8
| HSI                     | ENTR            | RMSE            | PSNR (dB)     | SSIM            | SAM               |
| CAVE flowers            | 5.0167 / 5.0149 | 0.0305 / 0.0310 | 30.30 / 30.16 | 0.9063 / 0.9027 | 9.6781 / 9.3706   |
| CAVE fake_and_real_food | 5.4655 / 5.4727 | 0.0267 / 0.0265 | 31.38 / 31.52 | 0.9077 / 0.9055 | 8.1185 / 7.9154   |
| Harvard imgg1           | 4.3006 / 4.4141 | 0.0126 / 0.0127 | 38.02 / 37.94 | 0.9745 / 0.9740 | 4.9423 / 5.0018   |
| Harvard imgb0           | 2.1714 / 2.2160 | 0.0280 / 0.0281 | 31.05 / 31.02 | 0.8453 / 0.8440 | 2.5620 / 2.6230   |
| Foster Braga_Graffiti   | 3.1296 / 3.1744 | 0.0297 / 0.0294 | 30.53 / 30.63 | 0.9016 / 0.9007 | 13.3825 / 14.8103 |
| Foster Yellow_Rose      | 2.2200 / 2.2238 | 0.0141 / 0.0140 | 37.00 / 37.10 | 0.9597 / 0.9591 | 4.7035 / 5.0010   |

a deeper network means more parameters to be determined, which makes training more difficult. The SDCNN training process is conducted on the 95 training data and six cross-validation data. Once the SDCNNs with different layer numbers or different kernel sizes have been trained, finding which

SDCNN achieves the best performance is carried out on the six test HSIs. The pl PSNR and pl SSIM are defined to measure the average similarity between the reconstructed spectral difference Ĥd and the desired spectral difference Hd over these six test HSIs. Comparisons between three layers and four layers are plotted in Fig. 6. Here we use '+' to denote the pl PSNR and pl SSIM of three


layers, and 'x' to denote those of four layers. Three groups of kernel sizes have been set for each layer number. It is observed that as the number of iterations grows, the pl PSNR and pl SSIM of three layers grow first and then reach a stable

Fig. 6: (a) pl PSNR comparison between three and four layers with different kernel sizes; (b) pl SSIM comparison between three and four layers with different kernel sizes.

Fig. 7: (a) pl PSNR comparison between four and five layers; (b) pl PSNR comparison between three and two layers.

state, but the performance of four layers is unstable and consistently worse than that of three layers. When we go deeper to five layers, the performance comparison

with four layers is shown in Fig. 7(a). It is noted that the SDCNN with five layers performs worse than that with four layers. We have also compared three layers with two layers; the experimental data are shown in Fig. 7(b). As seen from Fig. 7(b), the SDCNN with three layers performs better than that with two layers, so we have selected three


as the final number of network layers. After the number of layers was decided, we explored the best kernel size for each layer. Three groups of kernel sizes (9-5-3, 7-5-3 and

Fig. 8: (a) pl PSNR comparison between three groups of kernel sizes; (b) pl SSIM comparison between three groups of kernel sizes.

Fig. 9: pl PSNR comparison between kernel sizes 9-5-3 and 9-7-3.

7-5-1) were tested in the experiment. Experimental results for the different kernel sizes under the same numbers of iterations are shown in Fig. 8. As seen from Fig. 8,

as the number of iterations grows, the overall tendency of the pl PSNR and pl SSIM is to grow until reaching a stable state. It can also be seen that the SDCNN with kernel size 9-5-3 performs better than the others. Since 9-5-3 uses larger kernels than 7-5-3 and 7-5-1, we tried another group with the even larger kernel size 9-7-3; the experimental result is shown in Fig. 9. As seen from Fig. 9, the


optimal performance of the SDCNN with kernel size 9-7-3 is slightly better than that of the SDCNN with 9-5-3, but the SDCNN with kernel size 9-5-3 stays more stable as the number of iterations grows, so we have set 9-5-3 as the final kernel size. Some feature maps of the flowers HSI at different layers are shown in Fig. 10.
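As a concrete illustration of the selected architecture, a forward pass through a three-layer network with the chosen 9-5-3 kernel sizes can be sketched as follows. This is a minimal NumPy sketch: the hidden-layer filter counts (64 and 32), the ReLU activations, and the random weights are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def conv2d(x, kernels):
    # x: (c_in, h, w); kernels: (c_out, c_in, k, k)
    # 'valid' cross-correlation, as computed by CNN "convolution" layers
    c_out, c_in, k, _ = kernels.shape
    h, w = x.shape[1] - k + 1, x.shape[2] - k + 1
    out = np.zeros((c_out, h, w))
    for o in range(c_out):
        for i in range(c_in):
            for dy in range(k):
                for dx in range(k):
                    out[o] += kernels[o, i, dy, dx] * x[i, dy:dy + h, dx:dx + w]
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 33, 33))             # one spectral-difference input patch
w1 = rng.standard_normal((64, 1, 9, 9)) * 0.01   # layer 1: 9x9 kernels
w2 = rng.standard_normal((32, 64, 5, 5)) * 0.01  # layer 2: 5x5 kernels
w3 = rng.standard_normal((1, 32, 3, 3)) * 0.01   # layer 3: 3x3 kernels

h1 = np.maximum(conv2d(x, w1), 0)   # ReLU (an assumption here)
h2 = np.maximum(conv2d(h1, w2), 0)
y = conv2d(h2, w3)                  # predicted spectral difference
print(y.shape)  # (1, 19, 19): 33 - (9-1) - (5-1) - (3-1)
```

With 'valid' convolutions the output shrinks by (9-1)+(5-1)+(3-1)=14 pixels relative to the input, which is why training patches must be somewhat larger than the target region.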


Performance: Experimental results of the SDCNN model on each single

Fig. 10: (a) some feature maps at the first layer; (b) some feature maps at the second layer; (c) feature map at the third layer.

Table 2: Quantitative measurement of the experimental results by the SDCNN model

Scaling factor = 2
| HSI                     | ENTR   | RMSE   | PSNR (dB) | SSIM   | SAM     |
| CAVE flowers            | 5.4349 | 0.0113 | 38.91     | 0.9757 | 4.2008  |
| CAVE fake_and_real_food | 5.7974 | 0.0109 | 39.24     | 0.9725 | 3.5523  |
| Harvard imgg1           | 5.1091 | 0.0083 | 41.66     | 0.9776 | 4.5649  |
| Harvard imgb0           | 2.5820 | 0.0173 | 35.23     | 0.9195 | 2.2095  |
| Foster Braga_Graffiti   | 3.3543 | 0.0100 | 40.03     | 0.9780 | 10.4546 |
| Foster Yellow_Rose      | 2.2671 | 0.0057 | 44.93     | 0.9905 | 3.0456  |

Scaling factor = 4
| HSI                     | ENTR   | RMSE   | PSNR (dB) | SSIM   | SAM     |
| CAVE flowers            | 5.4126 | 0.0172 | 35.29     | 0.9550 | 5.6806  |
| CAVE fake_and_real_food | 5.7533 | 0.0152 | 36.35     | 0.9527 | 4.7810  |
| Harvard imgg1           | 4.7908 | 0.0083 | 41.67     | 0.9849 | 4.6869  |
| Harvard imgb0           | 2.5532 | 0.0205 | 33.77     | 0.8942 | 2.2401  |
| Foster Braga_Graffiti   | 3.3072 | 0.0150 | 36.51     | 0.9592 | 11.4133 |
| Foster Yellow_Rose      | 2.2411 | 0.0062 | 44.19     | 0.9874 | 3.6508  |

Scaling factor = 8
| HSI                     | ENTR   | RMSE   | PSNR (dB) | SSIM   | SAM     |
| CAVE flowers            | 5.1238 | 0.0283 | 30.96     | 0.9068 | 8.7770  |
| CAVE fake_and_real_food | 5.4916 | 0.0254 | 31.89     | 0.8929 | 7.8734  |
| Harvard imgg1           | 4.6573 | 0.0127 | 37.95     | 0.9740 | 4.8372  |
| Harvard imgb0           | 2.2388 | 0.0273 | 31.29     | 0.8473 | 2.5548  |
| Foster Braga_Graffiti   | 3.2570 | 0.0286 | 30.88     | 0.9003 | 13.4083 |
| Foster Yellow_Rose      | 2.2383 | 0.0130 | 37.73     | 0.9613 | 4.6629  |

test HSI from three databases are shown in Table 2. Analyzing the data in Table 2, the SDCNN model reduces the spectral information loss while increasing the spatial information at the same time. This is due to the merged spatial-spectral character of HSIs: excellent preservation of the spectral information also helps ensure a high spatial resolution. For the HSIs in the Harvard database, which have more homogeneous textures than the HSIs in the other databases, deep learning is better suited to extracting their features, and the SDCNN model shows a stable and outstanding performance. Comparing the data in Tables 1 and 2, the performance of the SDCNN model


is not better than that of the SCT strategy when the scaling factor is small, for example 2, but as the scaling factor grows, the SDCNN model becomes superior to the SCT strategy. We therefore conclude that the SDCNN model is not only good at preserving the spectral information but also has a more stable SR performance as the scaling factor increases.
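For reference, the core idea of the SCT strategy, pushing the LR image generated from the reconstructed HR estimate toward the observed LR input, can be sketched as a simple back-projection loop. Block averaging and nearest-neighbour upsampling below are crude stand-ins for the paper's actual blur, down-sampling, and interpolation operators; they are assumptions for illustration only.

```python
import numpy as np

def downsample(img, s):
    # block averaging: a crude stand-in for blur + decimation
    # (assumes the image size is divisible by s)
    h, w = img.shape
    return img.reshape(h // s, s, w // s, s).mean(axis=(1, 3))

def upsample(img, s):
    # nearest-neighbour stand-in for bicubic interpolation
    return np.kron(img, np.ones((s, s)))

def sct_refine(hr_est, lr_obs, s, n_iter=10, step=1.0):
    # SCT idea: keep the LR image generated from the reconstructed
    # HR estimate close to the observed input LR image
    for _ in range(n_iter):
        residual = lr_obs - downsample(hr_est, s)
        hr_est = hr_est + step * upsample(residual, s)
    return hr_est

rng = np.random.default_rng(2)
hr_true = rng.random((32, 32))              # one band of an HR HSI
lr_obs = downsample(hr_true, 2)             # observed LR band
hr0 = upsample(lr_obs, 2) + 0.05 * rng.standard_normal((32, 32))
hr1 = sct_refine(hr0, lr_obs, s=2, n_iter=10)
err = np.abs(downsample(hr1, 2) - lr_obs).max()
print(err < 1e-8)  # the refined estimate is consistent with the LR input
```

With these particular operators the loop converges quickly; with realistic blur kernels the step size and iteration count (set to 10 in the experiments above) matter, and too many iterations can re-amplify noise, consistent with the behaviour reported in Section 4.2.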


4.4. SR performance of the SCT SDCNN method

As described above, the SR performance of the SCT strategy is excellent at a small scaling factor but decreases rapidly as the scaling factor grows, whereas the SR performance of the SDCNN model is relatively more stable in both spatial information enhancement and spectral information


preservation, so we combine the SCT strategy with the SDCNN model to super-resolve an LR HSI into an HR one.

4.4.1. Evaluation for the CAVE database

In Table 3, we provide quantitative measurements averaged over the two test HSIs in the CAVE database.

Table 3: The average result of ENTR, RMSE, PSNR (dB), SSIM and SAM on the two test HSIs in the CAVE database

| Eval. | Scale | Bicubic | GS [50] | GSA [51] | GFPCA [52] | HySure [53] | NSSR [24] | SCT_SDCNN |
| ENTR  | 2 | 5.5113 | 5.5567  | 5.5342  | 5.6490 | 5.5254  | 5.7483 | 5.7542 |
| ENTR  | 4 | 5.4790 | 5.3997  | 5.3750  | 5.4660 | 5.3844  | 5.5360 | 5.5944 |
| ENTR  | 8 | 5.2411 | 4.9466  | 4.9229  | 5.1294 | 4.8581  | 5.2685 | 5.3455 |
| RMSE  | 2 | 0.0146 | 0.0148  | 0.0147  | 0.0158 | 0.0139  | 0.0148 | 0.0101 |
| RMSE  | 4 | 0.0187 | 0.0203  | 0.0202  | 0.0196 | 0.0241  | 0.0173 | 0.0162 |
| RMSE  | 8 | 0.0287 | 0.0249  | 0.0320  | 0.0295 | 0.0332  | 0.0274 | 0.0263 |
| PSNR  | 2 | 36.72  | 36.60   | 36.65   | 36.06  | 37.13   | 36.59  | 39.95  |
| PSNR  | 4 | 34.57  | 33.87   | 33.91   | 34.18  | 32.37   | 35.23  | 35.83  |
| PSNR  | 8 | 30.84  | 32.07   | 29.91   | 30.63  | 29.59   | 31.24  | 31.60  |
| SSIM  | 2 | 0.9695 | 0.9679  | 0.9692  | 0.9641 | 0.9675  | 0.9649 | 0.9804 |
| SSIM  | 4 | 0.9531 | 0.9382  | 0.9391  | 0.9500 | 0.9272  | 0.9542 | 0.9546 |
| SSIM  | 8 | 0.9070 | 0.8737  | 0.8695  | 0.9078 | 0.8388  | 0.9017 | 0.9074 |
| SAM   | 2 | 4.5505 | 4.3629  | 4.2833  | 4.8469 | 6.3298  | 7.1989 | 3.6036 |
| SAM   | 4 | 5.7191 | 6.4412  | 6.3896  | 5.4542 | 7.8015  | 5.9957 | 5.0730 |
| SAM   | 8 | 8.8983 | 10.4128 | 10.3768 | 7.7394 | 10.8693 | 8.8658 | 8.0551 |

The data in Table 3 show that our proposed SCT SDCNN method outperforms the other methods. Fig. 11(a) gives a direct view of the RGB images created from the flowers HSI reconstructed by the various methods when the scaling factor is 2. One point in the main body of the flowers is randomly selected; it lies at the center of the red circle, whose radius is two pixels. Its corresponding spectral curve is plotted in Fig. 11(c)-(i), and the comparison between these spectral curves is shown in Fig. 11(b). According to Fig. 11, our proposed SCT SDCNN method obtains better spatial information and achieves better spectral information preservation at the same time.

4.4.2. Evaluation for the Harvard database

Experiments have also been conducted on the two test HSIs with comparatively rich textures in the Harvard database; the results are shown in Table 4. The data in Table 4 indicate that the proposed SCT SDCNN method outperforms all the other competing methods. It is noted that the ENTR of the ground truth imgg1 and imgb0 is 7.2147 and 8.1576, respectively, but the ENTR values in Table 4 are much smaller. Judging from the bicubic results in Table 4, this may be due to the degradation, which badly damages the input HSIs. We provide a direct view of the RGB images created from the imgb0 HSI reconstructed by the various methods in Fig. 12(a). Because the size of the imgb0 is much too

Fig. 11: (a) experimental results of the flowers with different methods; (b)-(i) spectral profiles of the point in (a) by different methods.

Table 4: The average result of ENTR, RMSE, PSNR (dB), SSIM and SAM on the two test HSIs in the Harvard database

| Eval. | Scale | Bicubic | GS [50] | GSA [51] | GFPCA [52] | HySure [53] | NSSR [24] | SCT_SDCNN |
| ENTR  | 2 | 3.4994 | 3.5047 | 3.5086 | 3.5314 | 3.5544 | 3.7636 | 3.8804 |
| ENTR  | 4 | 3.4871 | 3.4062 | 3.4040 | 3.4359 | 3.4976 | 3.5600 | 3.7468 |
| ENTR  | 8 | 3.2360 | 3.2446 | 3.2389 | 3.2699 | 3.1895 | 3.3342 | 3.5064 |
| RMSE  | 2 | 0.0118 | 0.0115 | 0.0115 | 0.0131 | 0.0126 | 0.0125 | 0.0093 |
| RMSE  | 4 | 0.0140 | 0.0140 | 0.0140 | 0.0156 | 0.0167 | 0.0137 | 0.0128 |
| RMSE  | 8 | 0.0188 | 0.0189 | 0.0189 | 0.0204 | 0.0219 | 0.0189 | 0.0182 |
| PSNR  | 2 | 38.53  | 38.77  | 38.79  | 38.63  | 39.01  | 38.06  | 40.65  |
| PSNR  | 4 | 37.05  | 37.07  | 37.11  | 37.02  | 36.30  | 37.28  | 37.82  |
| PSNR  | 8 | 34.54  | 34.48  | 34.49  | 34.51  | 33.72  | 34.49  | 34.82  |
| SSIM  | 2 | 0.9485 | 0.9546 | 0.9547 | 0.9524 | 0.9534 | 0.9542 | 0.9620 |
| SSIM  | 4 | 0.9340 | 0.9407 | 0.9408 | 0.9403 | 0.9330 | 0.9427 | 0.9406 |
| SSIM  | 8 | 0.9099 | 0.9185 | 0.9184 | 0.9196 | 0.9080 | 0.9146 | 0.9119 |
| SAM   | 2 | 3.3507 | 3.3738 | 3.3738 | 3.6200 | 3.7375 | 4.0040 | 3.2497 |
| SAM   | 4 | 3.4774 | 3.6275 | 3.6286 | 3.6744 | 4.1228 | 3.8124 | 3.3919 |
| SAM   | 8 | 3.7522 | 4.0500 | 4.0888 | 3.8155 | 4.7386 | 4.9684 | 3.7140 |

large and it is inconvenient to provide a comparison within the limited space,

we cropped a rectangular region in the HR HSI and selected one point on an edge to show the spectral curve comparison in Fig. 12(c)-(i). The overall comparison is shown in Fig. 12(b). From the created images in Fig. 12(a), it can be seen that our proposed SCT SDCNN method reconstructs a more eye-appealing HR HSI. According to the curves plotted in Fig. 12(c)-(i) and


the overall comparison in Fig. 12(b), our SCT SDCNN method has a better spectral preservation ability than the other methods on the Harvard database.

4.4.3. Evaluation for the Foster database

Experiments have also been conducted on the two test HSIs from the Foster database. Results of the SCT SDCNN method and the other methods have


been shown in Table 5. The ENTR of the ground truth Rose and Braga Graffiti is 3.5752 and 2.4438, respectively, so the ENTR of the reconstructed HSIs is not very large either, as shown in Table 5. We have also cropped a

Fig. 12: (a) experimental results of the imgb0 with different methods; (b)-(i) spectral profiles of the point in (a) by different methods.

Table 5: The average result of ENTR, RMSE, PSNR (dB), SSIM and SAM on the two test HSIs in the Foster database

| Eval. | Scale | Bicubic | GS [50] | GSA [51] | GFPCA [52] | HySure [53] | NSSR [24] | SCT_SDCNN |
| ENTR  | 2 | 2.7763 | 2.7947  | 2.8076  | 2.7982 | 2.8039  | 2.8564 | 2.9333 |
| ENTR  | 4 | 2.7578 | 2.7781  | 2.7859  | 2.7770 | 2.7762  | 2.8039 | 2.8255 |
| ENTR  | 8 | 2.6748 | 2.6992  | 2.7003  | 2.6837 | 2.6751  | 2.7472 | 2.7553 |
| RMSE  | 2 | 0.0086 | 0.0085  | 0.0085  | 0.0088 | 0.0090  | 0.0082 | 0.0048 |
| RMSE  | 4 | 0.0117 | 0.0125  | 0.0124  | 0.0114 | 0.0122  | 0.0073 | 0.0092 |
| RMSE  | 8 | 0.0205 | 0.0219  | 0.0218  | 0.0210 | 0.0193  | 0.0194 | 0.0188 |
| PSNR  | 2 | 41.27  | 41.40   | 41.45   | 41.11  | 41.56   | 41.77  | 46.31  |
| PSNR  | 4 | 38.61  | 38.06   | 38.15   | 38.89  | 38.30   | 42.69  | 40.77  |
| PSNR  | 8 | 33.77  | 33.20   | 33.25   | 33.54  | 34.27   | 34.24  | 34.54  |
| SSIM  | 2 | 0.9805 | 0.9813  | 0.9814  | 0.9768 | 0.9807  | 0.9816 | 0.9905 |
| SSIM  | 4 | 0.9674 | 0.9656  | 0.9658  | 0.9645 | 0.9567  | 0.9863 | 0.9746 |
| SSIM  | 8 | 0.9307 | 0.9256  | 0.9242  | 0.9302 | 0.9076  | 0.9333 | 0.9329 |
| SAM   | 2 | 7.0497 | 7.2560  | 7.2558  | 7.4827 | 7.8265  | 7.5801 | 6.3711 |
| SAM   | 4 | 7.6775 | 8.8116  | 8.8346  | 8.9295 | 8.5882  | 7.4369 | 7.6036 |
| SAM   | 8 | 9.0430 | 11.1369 | 11.3432 | 9.6921 | 11.9425 | 9.4430 | 9.0382 |

rectangular region in the HR HSI to provide a better comparison between the different methods. In Fig. 13(a), it can be seen that our SCT SDCNN reconstructs a

more eye-appealing HR HSI. We have also randomly selected one point in the main region of the Braga Graffiti HSI to compare the spectral difference; the point lies at the center of the red circle, whose radius is two pixels. The spectral curves of the HSIs reconstructed by the different methods are plotted in Fig. 13(c)-(i), and the overall spectral curve comparison is presented in Fig.


13(b). The spectral curve created by the SCT SDCNN is also closer to the curve of the original HSI.
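The quality measures used throughout Tables 1-5 can be computed as in the following sketch (RMSE, PSNR and SAM shown; ENTR and SSIM are omitted for brevity). The peak value and averaging conventions are assumptions for illustration; the paper's exact definitions may differ slightly.

```python
import numpy as np

def rmse(ref, rec):
    # root-mean-square error over all pixels and bands
    return np.sqrt(np.mean((ref - rec) ** 2))

def psnr(ref, rec, peak=1.0):
    # peak signal-to-noise ratio in dB (peak=1.0 assumes [0, 1] reflectances)
    return 10 * np.log10(peak ** 2 / np.mean((ref - rec) ** 2))

def sam_degrees(ref, rec, eps=1e-12):
    # spectral angle mapper: angle between the spectra at each pixel,
    # averaged over the image; ref, rec have shape (h, w, bands)
    dot = np.sum(ref * rec, axis=-1)
    denom = np.linalg.norm(ref, axis=-1) * np.linalg.norm(rec, axis=-1) + eps
    ang = np.arccos(np.clip(dot / denom, -1.0, 1.0))
    return np.degrees(ang.mean())

rng = np.random.default_rng(1)
gt = rng.random((8, 8, 31))  # stand-in ground-truth HR HSI with 31 bands
rec = np.clip(gt + 0.01 * rng.standard_normal(gt.shape), 0, 1)
print(rmse(gt, rec), psnr(gt, rec), sam_degrees(gt, rec))
```

A lower RMSE or SAM and a higher PSNR indicate a better reconstruction, which is how the entries in Tables 3-5 are read.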

5. Conclusion

In this paper, we propose a novel HSI SR method that combines an SCT strategy to improve the spatial information with an SDCNN model to preserve

the spectral information. The proposed SDCNN learns an end-to-end spectral


Fig. 13: (a) experimental results of the Braga Graffiti with different methods; (b)-(i) spectral profiles of the point in (a) by different methods.

difference mapping between the LR HSI and the HR HSI. Unlike most state-of-the-art methods, which improve the resolution by fusing an HSI with an RGB image of the same scene, a setup that is difficult to achieve in the real world, our proposed SCT SDCNN method improves the spatial resolution with

just one LR HSI as the input. The spatial resolution is improved by constraining the LR band generated from the reconstructed HR HSI to be close to the input LR band, while the spectral difference of the reconstructed HSI is restricted to be close to the spectral difference learned by the SDCNN. In this way, the reconstructed HR HSI is excellent in both spatial information recovery and spectral information


preservation. We have tested the performance of our proposed SCT SDCNN method on three different databases, which contain both indoor scenes with controlled illumination and outdoor scenes under daylight. Comparative analyses demonstrate that the proposed method outperforms the existing state-of-the-art methods.


Acknowledgments

This work was supported by the National Science Foundation of China under Grants 61222101, 61272120, 61301287, 61301291 and 61350110239.

References

[1] J. Li, J. M. Bioucas-Dias, A. Plaza, Spectral-spatial hyperspectral image segmentation using subspace multinomial logistic regression and Markov random fields, IEEE Transactions on Geoscience and Remote Sensing 50 (3) (2012) 809–823.
[2] M. P. Nelson, L. Shi, L. Zbur, R. J. Priore, P. J. Treado, Real-time shortwave infrared hyperspectral conformal imaging sensor for the detection of threat materials, in: SPIE Defense + Commercial Sensing (DCS) Symposium, Vol. 9824, 2016, pp. 1–9.
[3] S. Asadzadeh, C. R. de Souza Filho, A review on spectral processing methods for geological remote sensing, International Journal of Applied Earth Observation and Geoinformation 47 (2016) 69–90.
[4] B. Du, L. Zhang, A discriminative metric learning based anomaly detection method, IEEE Transactions on Geoscience and Remote Sensing 52 (11) (2014) 6844–6857.
[5] K. Tan, X. Jin, A. Plaza, X. Wang, L. Xiao, P. Du, Automatic change detection in high-resolution remote sensing images by using a multiple classifier system and spectral-spatial features, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (2016) 3439–3451.
[6] H. Zhang, L. Zhang, H. Shen, A super-resolution reconstruction algorithm for hyperspectral images, Signal Processing 92 (9) (2012) 2082–2096.
[7] S. C. Park, M. K. Park, M. G. Kang, Super-resolution image reconstruction: a technical overview, IEEE Signal Processing Magazine 20 (4) (2003) 21–36.
[8] F. Jiru, Introduction to post-processing techniques, European Journal of Radiology 67 (2) (2008) 202–217.
[9] M. Abdel-Nasser, J. Melendez, A. Moreno, O. A. Omer, D. Puig, Breast tumor classification in ultrasound images using texture analysis and super-resolution methods, Engineering Applications of Artificial Intelligence 59 (2017) 84–92.
[10] A. Hori, T. Toda, Regulation of centriolar satellite integrity and its physiology, Cellular and Molecular Life Sciences 74 (2) (2017) 213–229.
[11] A. Kappeler, S. Yoo, Q. Dai, A. K. Katsaggelos, Super-resolution of compressed videos using convolutional neural networks, in: IEEE International Conference on Image Processing, 2016, pp. 1150–1154.
[12] W. C. Siu, K. W. Hung, Review of image interpolation and super-resolution, in: Signal and Information Processing Association Summit and Conference, 2012, pp. 1–10.
[13] X. Li, M. T. Orchard, New edge-directed interpolation, IEEE Transactions on Image Processing 10 (10) (2000) 1521–1527.
[14] J. Sun, J. Sun, Z. Xu, H. Y. Shum, Image super-resolution using gradient profile prior, in: IEEE Conference on Computer Vision and Pattern Recognition, 2008, pp. 1–8.
[15] Y. W. Tai, S. Liu, M. S. Brown, S. Lin, Super resolution using edge prior and single image detail synthesis, in: IEEE Conference on Computer Vision and Pattern Recognition, 2010, pp. 2400–2407.
[16] M. K. Ozkan, A. M. Tekalp, M. I. Sezan, POCS-based restoration of space-varying blurred images, IEEE Transactions on Image Processing 3 (4) (1994) 450–454.
[17] R. R. Schultz, R. L. Stevenson, Extraction of high-resolution frames from video sequences, IEEE Transactions on Image Processing 5 (6) (1996) 996–1011.
[18] J. Yang, J. Wright, T. S. Huang, Y. Ma, Image super-resolution via sparse representation, IEEE Transactions on Image Processing 19 (11) (2010) 2861–2873.
[19] J. Bruna, P. Sprechmann, Y. LeCun, Image super-resolution using deep convolutional networks, arXiv:1511.05666v2 (2015) 1–15.
[20] C. Dong, C. C. Loy, K. He, X. Tang, Image super-resolution using deep convolutional networks, IEEE Transactions on Pattern Analysis and Machine Intelligence 38 (2) (2015) 295–307.
[21] T. Akgun, Y. Altunbasak, R. M. Mersereau, Super-resolution reconstruction of hyperspectral images, IEEE Transactions on Image Processing 14 (11) (2005) 1860–1875.
[22] M. Simoes, J. Bioucas-Dias, L. B. Almeida, J. Chanussot, Hyperspectral image superresolution: An edge-preserving convex formulation, in: IEEE International Conference on Image Processing, 2014, pp. 4166–4170.
[23] N. Akhtar, F. Shafait, A. Mian, Sparse spatio-spectral representation for hyperspectral image super-resolution, in: European Conference on Computer Vision, 2014, pp. 63–78.
[24] W. Dong, F. Fu, G. Shi, X. Cao, J. Wu, G. Li, X. Li, Hyperspectral image super-resolution via non-negative structured sparse representation, IEEE Transactions on Image Processing 25 (5) (2016) 2337–2351.
[25] P. M. Atkinson, Mapping sub-pixel vector boundaries from remotely sensed images, Proceedings of GISRUK '96, Canterbury, UK, 1996.
[26] Y. LeCun, B. Boser, J. Denker, D. Henderson, Backpropagation applied to handwritten zip code recognition, Neural Computation 1 (4) (1989) 541–551.
[27] W. Ouyang, X. Wang, X. Zeng, S. Qiu, DeepID-Net: Deformable deep convolutional neural networks for object detection, in: IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 2403–2412.
[28] C. Szegedy, S. Reed, D. Erhan, D. Anguelov, S. Ioffe, Scalable, high-quality object detection, arXiv:1412.1441v3 (2015) 1–10.
[29] Y. Sun, X. Wang, X. Tang, Deep learning face representation by joint identification-verification, Advances in Neural Information Processing Systems 27 (2014) 1988–1996.
[30] H. M. Li, Deep learning for image denoising, International Journal of Signal Processing, Image Processing and Pattern Recognition 7 (3) (2014) 171–180.
[31] J. Zabalza, J. Ren, J. Zheng, H. Zhao, C. Qing, Z. Yang, P. Du, S. Marshall, Novel segmented stacked autoencoder for effective dimensionality reduction and feature extraction in hyperspectral imaging, Neurocomputing 185 (2016) 1–10.
[32] W. Zhao, S. Du, Learning multiscale and deep representations for classifying remotely sensed imagery, ISPRS Journal of Photogrammetry and Remote Sensing 113 (2016) 155–165.
[33] F. Zhang, B. Du, L. Zhang, Scene classification via a gradient boosting random convolutional network framework, IEEE Transactions on Geoscience and Remote Sensing 54 (3) (2016) 1793–1802.
[34] W. Zhao, S. Du, Spectral-spatial feature extraction for hyperspectral image classification: A dimension reduction and deep learning approach, IEEE Transactions on Geoscience and Remote Sensing 54 (8) (2016) 4544–4554.
[35] A. Romero, C. Gatta, G. Camps-Valls, Unsupervised deep feature extraction for remote sensing image classification, IEEE Transactions on Geoscience and Remote Sensing 54 (3) (2015) 1349–1362.
[36] Y. Chen, X. Zhao, X. Jia, Spectral-spatial classification of hyperspectral data based on deep belief network, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 8 (6) (2015) 2381–2392.
[37] Y. Yuan, X. Zheng, X. Lu, Hyperspectral image superresolution by transfer learning, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 10 (5) (2017) 1963–1974.
[38] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (11) (1998) 2278–2324.
[39] N. Nguyen, P. Milanfar, G. Golub, Efficient generalized cross-validation with applications to parametric image restoration and resolution enhancement, IEEE Transactions on Image Processing 10 (9) (2001) 1299–1308.
[40] M. Elad, A. Feuer, Restoration of a single superresolution image from several blurred, noisy, and undersampled measured images, IEEE Transactions on Image Processing 6 (12) (1997) 1646–1658.
[41] Y. Li, W. Xie, H. Li, Hyperspectral image reconstruction by deep convolutional neural network for classification, Pattern Recognition 63 (2017) 371–383.
[42] S. Ioffe, C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, arXiv:1502.03167v3 (2015) 1–11.
[43] A. Dosovitskiy, P. Fischer, E. Ilg, FlowNet: Learning optical flow with convolutional networks, in: IEEE International Conference on Computer Vision, 2015, pp. 2758–2766.
[44] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature (2015) 436–444.
[45] K. Jarrett, K. Kavukcuoglu, M. Ranzato, Y. LeCun, What is the best multi-stage architecture for object recognition?, in: Proc. International Conference on Computer Vision, 2009, pp. 2146–2153.
[46] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (11) (1998) 2278–2324.
[47] F. Yasuma, T. Mitsunaga, D. Iso, S. K. Nayar, Generalized assorted pixel camera: postcapture control of resolution, dynamic range, and spectrum, IEEE Transactions on Image Processing 19 (9) (2010) 2241–2253.
[48] A. Chakrabarti, T. Zickler, Statistics of real-world hyperspectral images, in: IEEE Conference on Computer Vision and Pattern Recognition, 2011, pp. 193–200.
[49] D. H. Foster, S. M. C. Nascimento, K. Amano, Information limits on neural identification of colored surfaces in natural scenes, Visual Neuroscience 21 (2004) 331–336.
[50] C. A. Laben, B. V. Brower, Process for enhancing the spatial resolution of multispectral imagery using pan-sharpening, US Patent 6011875 (2000) 1–9.
[51] B. Aiazzi, S. Baronti, M. Selva, Improving component substitution pansharpening through multivariate regression of MS+Pan data, IEEE Transactions on Geoscience and Remote Sensing 45 (10) (2007) 3230–3239.
[52] W. Liao, X. Huang, F. Van Coillie, S. Gautama, A. Pizurica, W. Philips, H. Liu, Processing of multiresolution thermal hyperspectral and digital color data: Outcome of the 2014 IEEE GRSS Data Fusion Contest, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 8 (6) (2015) 2984–2996.
[53] M. Simoes, J. Bioucas-Dias, L. B. Almeida, J. Chanussot, A convex formulation for hyperspectral image superresolution via subspace-based regularization, IEEE Transactions on Geoscience and Remote Sensing 53 (6) (2015) 3373–3388.
[54] M. T. Eismann, R. C. Hardie, Application of the stochastic mixing model to hyperspectral resolution enhancement, IEEE Transactions on Geoscience and Remote Sensing 42 (9) (2004) 1924–1933.
[55] C. Y. Yang, J. B. Huang, M. H. Yang, Exploiting self-similarities for single frame super-resolution, in: Asian Conference on Computer Vision, 2010, pp. 497–510.
