Hyperspectral image super-resolution using deep convolutional neural network

Yunsong Li a,b, Jing Hu a,b,∗, Xi Zhao a,b, Weiying Xie a,b, JiaoJiao Li a,b

a State Key Laboratory of Integrated Service Network, Xidian University, Xi'an 710071, China
b Joint Laboratory of High Speed Multi-source Image Coding and Processing, Xidian University, Xi'an 710071, China
Abstract: Limited by existing imaging hardware, it is challenging to obtain a hyperspectral image (HSI) with high spatial resolution. Super-resolution (SR) focuses on ways to enhance the spatial resolution, and HSI SR is a highly attractive topic in computer vision that has attracted the attention of many researchers. However, most HSI SR methods improve the spatial resolution while severely distorting the important spectral information. This paper presents an HSI SR method that combines a spatial constraint (SCT) strategy with a deep spectral difference convolutional neural network (SDCNN) model. It super-resolves the HSI while preserving the spectral information. The SCT strategy constrains the low-resolution (LR) HSI generated from the reconstructed high-resolution (HR) HSI to be spatially close to the input LR HSI. The SDCNN model is proposed to learn an end-to-end spectral difference mapping between the LR HSI and the HR HSI. Experiments have been conducted on three databases with both indoor and outdoor scenes. Comparative analyses validate that the proposed method enhances the spatial information better than state-of-the-art methods while simultaneously preserving the spectral information.

Keywords: hyperspectral image, super-resolution, convolutional neural network
∗Corresponding author. Tel: (86) 153-1998-0169. Email address: [email protected] (Jing Hu)
Preprint submitted to Journal of LaTeX Templates
May 3, 2017
1. Introduction

Hyperspectral imagery is a technology that acquires images of the same scene in many contiguous and narrow spectral bands. The acquired hyperspectral image (HSI) is a three-dimensional data cube that merges the spatial and spectral domains and contains both spatial information and spectral information [1]. The rich spectral information can be exploited to distinguish a material of interest among various others. Owing to this spectral information, HSI has been widely used in many remote sensing and computer vision applications, such as military object detection [2], geological exploration [3] and change detection [4, 5].
While the HSI can achieve a high spectral resolution with these contiguous and narrow bands, the spatial resolution of HSIs is usually much coarser than that of the RGB images in our daily life [6]. This is due to the fact that the dense spectral bands in hyperspectral sensors leave only a limited number of photons reaching each narrow spectral window on average. The straightforward way to increase the spatial resolution is to reduce the pixel size or to increase the chip size through sensor manufacturing technology. Reducing the pixel size increases the number of pixels per unit area. However, when the pixel size decreases, the amount of light available to each pixel also decreases, so the image quality is severely degraded by shot noise [7]. As to increasing the chip size, it increases the capacitance; a larger capacitance makes it difficult to speed up the charge transfer rate, so this approach is impractical as well [7]. Signal post-processing techniques allow the enhancement of the measured signal after measurement [8], and can be utilized to address the low spatial resolution of HSIs. Super-resolution (SR) reconstruction is a classical signal post-processing technique that aims at obtaining a high-resolution (HR) image (or sequence) from one or multiple observed low-resolution (LR) images. The major advantages of the SR approach are that it is economical and that existing LR imaging systems can still be utilized. SR has proved useful in many practical cases, including medical imaging [9], satellite imaging [10], video applications [11], and so on. Therefore, it is of great
necessity to develop HSI SR techniques to improve the spatial resolution of HSIs.

In this paper, we present an HSI SR method that combines a spatial constraint (SCT) strategy with a deep spectral difference convolutional neural network (SDCNN) model. The SCT strategy constrains the LR HSI generated from the reconstructed HR HSI to be spatially close to the input LR HSI, and aims at enhancing the spatial resolution. The SDCNN model is a specific convolutional neural network (CNN) model that is proposed to learn an end-to-end spectral difference mapping between the LR HSI and the HR HSI; it is designed to preserve the important spectral information. Our method differs fundamentally from existing HSI SR methods in that it does not require an extra panchromatic image as input, nor does it explicitly learn a dictionary for modeling the patch space. Spatial information is improved while spectral information is preserved through our proposed SCT strategy and SDCNN model (named SCT SDCNN), with little pre-processing or post-processing. Experiments have been conducted on HSIs from three different databases that include both indoor and outdoor scenes. Comparative analyses have validated the effectiveness of our proposed SCT SDCNN method.
2. Related work

2.1. Image super-resolution

Current SR methods can be classified into three categories: interpolation-based methods [12, 13], reconstruction-based methods [14, 15, 16, 17] and learning-based methods [18, 19, 20]. The interpolation-based methods are simple in theory, but cannot bring in much extra information; it is difficult for this type of method to recover the missing high-frequency detail of the HR image and to enhance the spatial resolution. The reconstruction-based methods first build an image acquisition model, and then solve for the unknown HR image according to the LR input. This model is usually ill-posed. During the SR process, constraints such as edge priors can be enforced on the model to make the optimal
solution close to the real scene [15]. The learning-based methods are the most popular in recent years. They can be further classified into two sub-types: internal-learning and external-learning SR methods. The internal-learning methods hold that, for one input image patch, similar patches can be found at other locations or other scales of the same image, and a combination of these similar patches can be used to reconstruct an HR image. The external-learning methods first extract the "meta-detail" shared among external images, and then reconstruct the HR image from this "meta-detail". SR methods via deep learning are typical external-learning methods that currently achieve state-of-the-art performance. This type of method learns a mapping between LR images and HR
images from a large amount of training data.

It is noted that most SR methods are designed for monochromatic and color images. In recent years, with the development of hyperspectral sensors and their wide applications, HSI SR technology has received much more attention from researchers. Akgun et al. [21] proposed a complex model of the HSI acquisition process, and put forward a projection-onto-convex-sets-based SR method. Zhang et al. [6] divided the bands of an HSI into three groups, and a principal-component-analysis-based SR method was applied to the primary component. Simoes et al. [22] proposed an optimization model that contains two data-fitting terms and an edge-preserving regularizer. Akhtar et al. [23] proposed a sparse coding method that exploits the non-negativity and spatio-spectral sparsity of the scene. Dong et al. [24] formulated the estimation of the HR HSI as a joint estimation of the dictionary and the sparse code. However, most of these methods tackle the HSI SR problem as an image fusion problem using an auxiliary HR image. In reality, it is very difficult to obtain a completely registered pair of an HR panchromatic image and an HSI of the same scene, which makes this kind of method less practical. Sub-pixel mapping was introduced by Atkinson et al. [25] to transform a fraction image by the unmixing operation, a method that tackles the SR problem without requiring an auxiliary panchromatic image. This technology divides a pixel into sub-pixels and assigns each new smaller sub-pixel to a land cover class. However, the noise generated by the unmixing operation is inevitable during the mapping operation, which may have a negative influence on the SR process. Moreover, all these methods super-resolve the HSI without specifically preserving the important spectral information.
2.2. Deep learning for HSI applications

CNN dates back decades [26]. Owing to the rapid growth of computing power and the amount of available data, deep CNNs have recently shown explosive popularity for their success in image processing applications, such as object detection [27, 28], face recognition [29] and image denoising [30]. Following these successful applications to RGB images, there have also been studies using deep CNN techniques for HSI applications, such as classification [31, 32, 33], dimensionality reduction [34] and feature extraction [35, 36]. Zhang et al. [33] proposed a deep ensemble framework for scene classification for the first time. Zhao et al. [32] explored a multiscale CNN to transform the original HSI into a pyramid structure that contains spatial features for HSI classification, with the final result obtained by majority voting. Chen et al. [36] introduced the deep belief network (DBN) to extract spectral-spatial features as a pre-processing step for future HSI applications. Yuan et al. [37] applied the CNN model proposed by Dong et al. [20] directly to hyperspectral images, but this method did not take into consideration the preservation of spectral information or the differences between hyperspectral images and RGB images.

In this paper, our proposed SDCNN model is, to our knowledge, the first application of a CNN model specifically to the HSI SR problem. The SDCNN is mainly equipped with convolutional layers. Different from all the aforementioned HSI SR methods
in section 2.1, the proposed SDCNN model learns the mapping between the input and the output, and it can be trained in an end-to-end manner by the back-propagation method [38]. In order to preserve the important spectral information, which most HSI fusion methods cannot achieve [6], the input data and label of the proposed SDCNN are the spectral differences of the LR HSI and of the HR HSI, respectively. We have also incorporated the SCT strategy to spatially constrain the LR HSI generated from the reconstructed HR HSI to be close to the input LR HSI. More details are discussed in section 3.

The remainder of the paper is organized as follows. Section 3 describes our proposed method. Experimental results are provided in section 4, and we conclude in section 5.

3. Proposed method

The proposed SCT SDCNN method consists of four main parts: building an observation model of the HSI, spatial reconstruction by the SCT strategy, spectral reconstruction by the SDCNN model and SR reconstruction by the SCT SDCNN method. Detailed descriptions of these parts are presented in the following subsections.

3.1. Observation model of the HSI

An observation model bridges the desired HR HSI and the input LR HSI. To develop the HSI SR method, we first formulate an observation model.
Let the desired HR HSI be denoted as $H = \{H_1, H_2, ..., H_K\}$, where $K$ is the total number of bands, and each band of the HR HSI has $sN_1 \times sN_2$ pixels. The parameter $s$ represents the down-sampling factor in the spatial domain. The LR HSI is represented as $L = \{L_1, L_2, ..., L_K\}$, whose size is $N_1 \times N_2$ in the spatial domain. Both $H_i$ and $L_i$ are spatial descriptions of the scene at one band.

During the imaging process, the camera lens and aperture produce a blurred version of the object, and the charge-coupled device (CCD) turns this degraded analog signal into a discrete image. In addition, the images are contaminated by additive noise from various sources: quantization error, sensor measurement, and so on [39, 40]. Hence, an image observation model is built to depict the relationship between the desired HR HSI and the input LR HSI as follows:

$$L = (H \ast G)\downarrow + n \tag{1}$$

where $G$ represents a spatial filter, $\ast$ represents the convolution operation and $\downarrow$ represents the down-sampling operation. $n$ denotes noise that follows a Gaussian distribution with zero mean. For each band in the HSI, the observation model can be rewritten as

$$L_i = (H_i \ast G)\downarrow + n \tag{2}$$

where $i = 1, 2, ..., K$ is the index of the current band. Considering the spatial constraint and the spectral constraint in the SR process, we minimize the following energy function by enforcing constraints in both the spatial domain and the spectral domain:

$$E(\hat{H}_i) = E(\hat{H}_i \mid L_i) + \lambda E(\hat{H}_i \mid \hat{H}_{i-1}, L_i, L_{i-1}) \tag{3}$$

where $L_{i-1}$ and $L_i$ denote the $(i-1)$th and $i$th bands of the input LR HSI $L$, respectively, and $\hat{H}_{i-1}$ and $\hat{H}_i$ represent the $(i-1)$th and $i$th bands of the reconstructed HR HSI $\hat{H}$. $E(\hat{H}_i \mid L_i)$ and $E(\hat{H}_i \mid \hat{H}_{i-1}, L_i, L_{i-1})$ denote the constraints provided by the SCT strategy and the SDCNN model, respectively. Detailed descriptions of the reconstruction in the spatial and spectral domains are
presented in sections 3.2 and 3.3.

3.2. Spatial reconstruction by the SCT strategy

Given the LR HSI $L$, we denote the constraint in the spatial domain as

$$E(\hat{H}_i \mid L_i) = \|(\hat{H}_i \ast G)\downarrow - L_i\|_F^2 \tag{4}$$

This energy function enforces the LR band generated from the reconstructed band $\hat{H}_i$ to be close to the input LR band $L_i$, and can be minimized by the gradient descent method:

$$\frac{\partial E(\hat{H}_i \mid L_i)}{\partial \hat{H}_i} = 2\big((\hat{H}_i \ast G)\downarrow - L_i\big)\uparrow \ast\, G \tag{5}$$

where $\uparrow$ denotes the up-scaling operation; the specific up-scaling operation used in this paper is bicubic interpolation.

Under the ideal condition that the down-sampling operation is reversible and the blur introduced by the filtering operation is fully removed, the iteration strategy

$$\hat{H}_i^{t+1} = \hat{H}_i^{t} - \tau\big((\hat{H}_i^{t} \ast G)\downarrow - (H_i \ast G)\downarrow\big)\uparrow \ast\, G \tag{6}$$

can be rewritten as

$$\hat{H}_i^{t+1} = \hat{H}_i^{t} - \tau(\hat{H}_i^{t} - H_i) \tag{7}$$

In this way, minimizing the spatial constraint leads to the following iterations:

$$\hat{H}_i^{1} = \hat{H}_i^{0} - \tau(\hat{H}_i^{0} - H_i) = (1-\tau)\hat{H}_i^{0} - (1-\tau)H_i + H_i$$
$$\hat{H}_i^{2} = \hat{H}_i^{1} - \tau(\hat{H}_i^{1} - H_i) = (1-\tau)^2\hat{H}_i^{0} - (1-\tau)^2 H_i + H_i$$
$$\cdots$$
$$\hat{H}_i^{k} = \hat{H}_i^{k-1} - \tau(\hat{H}_i^{k-1} - H_i) = (1-\tau)^k\hat{H}_i^{0} - (1-\tau)^k H_i + H_i \tag{8}$$

where $k$ represents the number of iterations. The difference between the spatially reconstructed HR band $\hat{H}_i$ and the desired HR band $H_i$ can then be written as

$$\hat{H}_i^{k} - H_i = (1-\tau)^k\hat{H}_i^{0} - (1-\tau)^k H_i = (1-\tau)^k(\hat{H}_i^{0} - H_i) \tag{9}$$

Both $\hat{H}_i^{0}$ and $H_i$ have constant values, and $\tau$ lies between 0 and 1, so the difference between $\hat{H}_i^{k}$ and $H_i$ gets smaller as $k$ grows. In reality, however, the down-sampling operation is irreversible and the blur is hard to remove fully, so each iteration brings in some information loss, and a much too large $k$ will not benefit the SR performance. Meanwhile, as the scaling factor grows, the information loss brought by each iteration becomes larger, causing a decline in the performance of the SCT strategy.

3.3. Spectral reconstruction by the SDCNN model

As described in section 1, the spectral curves of HSIs depict the reflectance variation of the scene with wavelength and are extremely important for HSI applications. Post-processing like the SCT strategy may bring in some distortion
on the spectral information. So we propose the SDCNN model to ensure that the spectral information of the reconstructed HSI stays close to that of the desired HSI. Meanwhile, the SCT strategy can hardly import extra information, which limits its performance when the scaling factor grows. The SDCNN model learns information from the training data, and is trained to
describe the mapping between the spectral difference of the LR HSI and that of the corresponding HR HSI. Thus, a simulated spectral difference of the HR HSI with respect to the LR HSI can be produced by the SDCNN model.

Fig. 1: The generating process of the training data and label.

As the scaling factor grows, the spatial information enhancement performance of the SDCNN model is more stable than that of the SCT strategy. There are
three main parts of the proposed SDCNN: data and label setup, formulation of the SDCNN, and HSI SR using the SDCNN. Detailed descriptions are presented as follows.

3.3.1. Data and label setup

The LR HSI has experienced blurring, down-sampling and noise, and its spectral information has been distorted from that of the desired HR HSI. Deep CNN models can represent a hierarchy of increasingly complex features [41], so we propose to build an SDCNN model to depict the mapping between the spectral difference of the LR HSI and that of the desired HR HSI. For the SDCNN training process, our input data is the bicubic version of the spectral difference between neighboring bands of the LR HSI, and the label is the corresponding spectral difference between neighboring bands of the HR HSI. The generation process of the training data and label is plotted in Fig. 1, where $Hd$ and $Ld$ represent the spectral differences of the HR HSI and of the LR HSI, respectively. They can be denoted as

$$Hd_{i-1} = H_i - H_{i-1}, \quad Ld_{i-1} = L_i - L_{i-1}, \quad \text{for } i > 1 \tag{10}$$

where $i$ is the index of the band in the HSI. Up-sampling is applied to $Ld$ to obtain $LHd$, whose size is the same as that of $Hd$; for simplicity we still call $LHd$ the spectral difference of the LR HSI. The computed $LHd$ and $Hd$ are the data and label of the training process. In a deep CNN training process, normalization is a pre-processing step that accelerates convergence [42], so we normalize the data and the label of our SDCNN by the following formula:

$$data = \frac{LHd - \min(LHd, Hd)}{\max(LHd, Hd) - \min(LHd, Hd)}, \qquad label = \frac{Hd - \min(LHd, Hd)}{\max(LHd, Hd) - \min(LHd, Hd)} \tag{11}$$

Both the data and the label of the SDCNN are thus normalized simultaneously to values between 0 and 1.

3.3.2. Formulation of the SDCNN

We plan to learn a mapping model to represent the relationship between
the $LHd$ and $Hd$, and the function $F(\cdot)$ is used to represent this mapping. Three operations are contained in the mapping process: patch extraction and representation, non-linear mapping, and reconstruction.

• Patch extraction and representation: this operation extracts overlapping patches from the input $LHd$, and each patch is then represented as a high-dimensional vector. We call these representations feature maps.

• Non-linear mapping: this operation is the core of the model. The convolutional layers here act as a non-linear function that maps one high-dimensional vector to another. In our SDCNN, the mapped high-dimensional vectors are conceptually the representations of the patches in $Hd$.
Fig. 2: Typical SDCNN architecture with three stages (input → convolution → nonlinearity, repeated three times, → output).
• Reconstruction: this operation combines the mapped HR patches into the final spectral difference of the HR HSI, $\hat{Hd}$, which is expected to be close to the desired spectral difference $Hd$.

The SDCNN is a trainable multilayer architecture composed of a series of stages. Each stage consists of two layers: 1) a convolutional layer; and 2) a nonlinearity layer. Although pooling would make the network training computationally more feasible, it results in reduced resolution [43], so no pooling layer is contained in the SDCNN model designed for super-resolution. On the other hand, because both the data and the label are images, contrary to the many-to-one classification problem, no fully-connected layer is contained in the SDCNN model either. A typical SDCNN architecture with three stages is plotted in Fig. 2. Each layer type is described as follows:

• Convolutional layer: both the input and the output of the convolutional layer are 3-D arrays consisting of several 2-D feature maps. The size of the input is $w \times h \times c$, meaning that the input is $c$ feature maps of size $w \times h$; the size of the corresponding output is $w_1 \times h_1 \times n$. Padding can be added in this layer to keep the input and output feature maps the same size. The convolutional layer acts as a detector of local conjunctions of features from the previous layer [44].

• Nonlinearity layer: this layer simply consists of a point-wise nonlinearity function applied to each component of the feature map, which can be denoted as $a = f(z)$. $f(\cdot)$ can be chosen as the rectified linear unit (ReLU) or the sigmoid function. In the SDCNN training process, $f(\cdot)$ is chosen as the ReLU function for easier optimization [45].
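To make the stage structure concrete, the forward pass can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' implementation: the kernel sizes and channel counts below are made up for the example, biases are omitted, and the convolution is implemented naively (as correlation) for clarity.

```python
import numpy as np

def conv2d(x, kernels):
    """'Same'-size 2-D correlation: x is (c, h, w), kernels is (n, c, k, k), k odd.
    Zero padding keeps the input and output feature maps the same size."""
    n, c, k, _ = kernels.shape
    pad = k // 2
    h, w = x.shape[1], x.shape[2]
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((n, h, w))
    for j in range(n):                      # output feature maps
        for i in range(c):                  # input feature maps
            for dy in range(k):
                for dx in range(k):
                    out[j] += kernels[j, i, dy, dx] * xp[i, dy:dy + h, dx:dx + w]
    return out

def relu(z):
    """Point-wise nonlinearity a = f(z) with f chosen as the ReLU."""
    return np.maximum(z, 0.0)

def sdcnn_forward(lhd, stages):
    """Conv + ReLU stages, as in Fig. 2: no pooling, no fully-connected layer."""
    a = lhd
    for kernels in stages:
        a = relu(conv2d(a, kernels))
    return a

# Illustrative 3-stage network on a single-channel spectral difference.
rng = np.random.default_rng(0)
stages = [rng.normal(0, 0.001, (4, 1, 3, 3)),   # patch extraction and representation
          rng.normal(0, 0.001, (4, 4, 1, 1)),   # non-linear mapping
          rng.normal(0, 0.001, (1, 4, 3, 3))]   # reconstruction
out = sdcnn_forward(rng.random((1, 8, 8)), stages)
print(out.shape)  # (1, 8, 8): spatial size is preserved by the padding
```

Because every stage uses 'same'-size padding and no pooling, the output spectral difference keeps the spatial dimensions of the input, which is exactly what the SR setting requires.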
Learning the end-to-end mapping function $F(\cdot)$ requires the estimation of the model parameters $\Theta = \{w_1, ..., w_i, b_1, ..., b_i\}$, where $w_j$ and $b_j$ represent the weight and bias of the $j$th convolutional layer, respectively, and $i$ is the total number of convolutional layers. In the SDCNN training process, the values of these parameters $\Theta$ are initialized from a Gaussian distribution with zero mean and a variance of 0.001. The optimum values of the parameters are obtained by minimizing the loss between the reconstructed HR spectral difference $F(Ld; \Theta)$ and the corresponding label $Hd$. The mean squared error (MSE) is utilized as the loss function:

$$L(\Theta) = \frac{1}{n}\sum_{1}^{n} \|F(Ld; \Theta) - Hd\|^2 \tag{12}$$

where $n$ is the number of training samples. The loss $L(\Theta)$ is minimized via stochastic gradient descent with back-propagation [46]. During the SDCNN training process, the learning rate and momentum are set to 0.001 and 0.9, respectively.

3.3.3. HSI SR using SDCNN

When an LR HSI is input, a minus operation is applied between neighboring
bands to get the spectral difference $Ld$, and then an $LHd$ with the same size as the desired $Hd$ is obtained by the up-sampling operation. Using the SDCNN learned in the aforementioned step, the output $\hat{Hd}_{i-1} = F(L_i - L_{i-1})$ is exploited to restrict the spectral information in the spectral domain. The SR method constrained by the spectral difference can be expressed as

$$E(\hat{H}_i \mid \hat{H}_{i-1}, L_i, L_{i-1}) = \|(\hat{H}_i - \hat{H}_{i-1}) - F(L_i - L_{i-1})\|_F^2 \tag{13}$$

and it can be minimized by

$$\tilde{H}_i = \hat{H}_{i-1} + F(L_i - L_{i-1}) \tag{14}$$

Because the spectral information of a pixel is a vector composed of a set of values at different bands, restricting the correlation between neighboring bands keeps both the magnitude and the direction of the spectral vector. This results in good SR reconstruction performance in the spatial domain while preserving the spectral information.

3.4. SR reconstruction of the SCT SDCNN method

Combining the SCT strategy with the SDCNN model, the proposed SR method achieves good performance in both spatial information enhancement and
spectral information preservation. The overall algorithm for super-resolving the HR HSI is summarized in Algorithm 1.

Algorithm 1: HSI SR via the SCT SDCNN method
Input: LR HSI $L$, SDCNN-learned spectral differences $\hat{Hd}$, iteration number $m$, band number $K$
Output: reconstructed HR HSI $\hat{H}$
1:  for $i = 1$; $i \le K$; $i{+}{+}$ do
2:    if $i == 1$ then
3:      up-sample $L_1 \rightarrow LH_1$;
4:      set $\hat{H}_1^0 = LH_1$;
5:      for $k = 1$; $k \le m$; $k{+}{+}$ do
6:        update $\hat{H}_1^k$ via Eq. (6);
7:      set $\hat{H}_1^m$ as $\hat{H}_1$;
8:    else
9:      update $\hat{H}_i^0 = \hat{H}_{i-1} + \hat{Hd}_{i-1}$;
10:     for $k = 1$; $k \le m$; $k{+}{+}$ do
11:       update $\hat{H}_i^k$ via Eq. (6);
12: return $\hat{H}$;

The whole process of the proposed method is shown in Fig. 3.

Fig. 3: Whole process of the proposed method.

When super-resolving the $(i+1)$th band, the former
band $\hat{H}_i$ has already been super-resolved. $\hat{H}_i$ is combined with the SDCNN-mapped spectral difference $F(L_{i+1} - L_i)$ (namely $\hat{Hd}_i$) to obtain an initial value of $\hat{H}_{i+1}$, and then $\hat{H}_{i+1}$ is updated by the SCT strategy through Eq. (6).
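The per-band loop of Algorithm 1 (the Eq. (14) initialization followed by the Eq. (6) SCT refinement) can be sketched as follows. This is a toy NumPy sketch under simplifying assumptions that are not the paper's actual choices: the blur G is taken as the identity, nearest-neighbour resampling stands in for bicubic interpolation, and the trained SDCNN mapping F is replaced by a plain up-scaling of the band difference.

```python
import numpy as np

s = 2                                   # scaling factor (illustrative)

def up(x):                              # nearest-neighbour stand-in for bicubic up-scaling
    return np.kron(x, np.ones((s, s)))

def down(x):                            # spatial down-sampling by decimation
    return x[::s, ::s]

def sct_update(h, l, tau=0.5):
    """One SCT gradient step, Eq. (6), with the blur G taken as identity."""
    return h - tau * up(down(h) - l)

def F(ld):
    """Placeholder for the trained SDCNN spectral-difference mapping."""
    return up(ld)

def super_resolve(lr_hsi, m=30):
    """Band-wise SR loop of Algorithm 1: Eq. (14) initialization + m SCT iterations."""
    bands = []
    for i in range(lr_hsi.shape[0]):
        h = up(lr_hsi[0]) if i == 0 else bands[-1] + F(lr_hsi[i] - lr_hsi[i - 1])
        for _ in range(m):
            h = sct_update(h, lr_hsi[i])
        bands.append(h)
    return np.stack(bands)

lr = np.random.default_rng(1).random((3, 4, 4))     # toy 3-band LR cube
hr = super_resolve(lr)
print(hr.shape)                                     # (3, 8, 8)
```

Because the decimation here exactly inverts the nearest-neighbour up-scaling, the iterations behave like Eq. (7): the residual at the sampled positions shrinks by a factor (1 − τ) per step, so down(hr[i]) converges geometrically to lr[i], which is exactly the spatial constraint of Eq. (4).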
4. Experimental results

We first validate the performance of the proposed SCT SDCNN method module by module, and then show the overall performance. The experimental data come from three different databases [47, 48, 49]. The 32 HSIs in the CAVE database [47] cover a wide variety of real-world materials and objects under controlled illumination in a laboratory environment; their wavelengths range from 400 to 700 nm with a 10 nm interval per band, and the size of all these HSIs is 512 × 512. The Harvard database [48] contains 50 outdoor and indoor HSIs under daylight illumination, and each HSI contains 31 bands from 420 nm to 720 nm. The database created by Foster et al. [49] (named the Foster database) contains 25 HSIs of outdoor urban and rural scenes; each HSI has 33 bands from 400 to 720 nm with 10 nm per band, and its size is 1341 × 1022. In the experiment, all HSIs in the Harvard and Foster databases are cropped to 1008 × 1008 in the spatial domain. We have selected two HSIs with comparatively rich textures from each database to validate the performance of our proposed method. The remaining HSIs from these three databases are used for the training process, in which 95 HSIs and six HSIs are used as training data and cross-validation data, respectively. In order to provide an eye-appealing exhibition of the HSIs for evaluation, we have also reconstructed an RGB image for each HSI by averaging the 1st to the 11th band (400-500 nm or 420-520 nm) as the B channel, the 11th to the 21st band (500-600 nm or 520-620 nm) as the G channel, and the 21st to the 31st band (600-700 nm or 620-720 nm) as the R channel. Fig. 4 provides an exhibition of the original HSIs to be tested.
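As a concrete illustration of this band-averaging rule, the RGB rendering can be computed as below. This is a small NumPy sketch: the 31-band, 1-based index ranges follow the text, while the (bands, height, width) cube layout is an assumption made for the example.

```python
import numpy as np

def hsi_to_rgb(hsi):
    """Render a (K, H, W) HSI cube (K = 31) as an RGB image for display:
    bands 1-11 -> B, 11-21 -> G, 21-31 -> R (1-based, inclusive)."""
    b = hsi[0:11].mean(axis=0)    # 400-500 nm (or 420-520 nm)
    g = hsi[10:21].mean(axis=0)   # 500-600 nm (or 520-620 nm)
    r = hsi[20:31].mean(axis=0)   # 600-700 nm (or 620-720 nm)
    return np.stack([r, g, b], axis=-1)

cube = np.random.default_rng(2).random((31, 6, 6))
rgb = hsi_to_rgb(cube)
print(rgb.shape)  # (6, 6, 3)
```

Note that the 11th and 21st bands are shared between neighbouring channels, as the inclusive ranges in the text imply.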
Fig. 4: The HR RGB images reconstructed from the HSIs of three different databases. (a) flowers and (b) fake and real food, with size 512 × 512, from the CAVE database [47]. (c) imgg1 and (d) imgb0, under daylight illumination, from the Harvard database [48]. (e) Braga Graffiti and (f) Yellow Rose from the database built by Foster et al. [49]. The size of (c)-(f) is 1008 × 1008.
4.1. Experiment design and parameter settings

4.1.1. Experimental design

The LR HSIs are obtained by the following steps: (1) the original HR HSIs are convolved with a Gaussian filter G; (2) the convolved HR HSIs are down-sampled by the scaling factor; (3) zero-mean Gaussian noise is added to the down-sampled HSIs. The implementations of the compared methods are all from the publicly available codes provided by the authors, and all the input LR HSIs are the same. For the fusion methods [24, 50, 51, 52, 53] that require an auxiliary panchromatic image, we use a panchromatic image created by interpolating and spectrally averaging over the entire range of the LR HSI [54] as one of the inputs.
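The three degradation steps can be sketched as follows. The kernel size, standard deviations and scaling factor below are illustrative placeholders, not the values used in the experiments.

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    """1-D Gaussian kernel; the 2-D filter G is applied separably below."""
    ax = np.arange(size) - size // 2
    k = np.exp(-ax**2 / (2.0 * sigma**2))
    return k / k.sum()

def blur(band, k):
    """Separable Gaussian filtering with replicated borders."""
    pad = len(k) // 2
    tmp = np.pad(band, pad, mode='edge')
    tmp = np.apply_along_axis(lambda r: np.convolve(r, k, mode='valid'), 1, tmp)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode='valid'), 0, tmp)

def degrade(hr_hsi, s=2, noise_sigma=0.01, seed=0):
    """Steps (1)-(3): Gaussian blur, down-sampling by s, zero-mean Gaussian noise."""
    rng = np.random.default_rng(seed)
    k = gaussian_kernel()
    lr = np.stack([blur(band, k)[::s, ::s] for band in hr_hsi])
    return lr + rng.normal(0.0, noise_sigma, lr.shape)

hr = np.random.default_rng(3).random((4, 8, 8))   # toy 4-band HR cube
lr = degrade(hr)
print(lr.shape)  # (4, 4, 4)
```

Applying the same degradation to every method's input guarantees that the comparison in the following subsections is made on identical LR HSIs.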
Fig. 5: (a) The average PSNR curve as a function of the value of λ; (b) the average PSNR curve as a function of the number of iterations.

The following four quantitative measurements are employed to evaluate the quality of the reconstructed HSI: root mean square error (RMSE), peak signal-to-noise ratio (PSNR), structural similarity index (SSIM) [55] and spectral angle mapper (SAM) [6]. In addition, to provide some evidence from the information-theoretic point of view, the following entropy measurement is employed to evaluate the
performance of the reconstructed HSIs:

$$ENTR(i) = \sum_{val=0}^{65535} -p\big(I(x, y) = val\big)\,\log\big(p(I(x, y) = val)\big) \tag{15}$$

where $x \in [1, sN_1]$ and $y \in [1, sN_2]$. The image $I$ is normalized into the range $[0, 65535]$ and rounded to integers. $p(I(x, y) = val)$ denotes the probability that a pixel value is equal to $val$. $ENTR(i)$ represents the information entropy of the $i$th band, and the global $ENTR$ is computed by averaging over the whole image. During the SDCNN training process, in order to evaluate the performance of the trained model, we have defined pl PSNR and pl SSIM to evaluate the similarity between the prediction output and the label.

4.1.2. Parameter settings
To evaluate the sensitivity of the proposed method to its key parameters, we have varied the value of λ and the number of iterations. Fig. 5(a) plots the average PSNR of the six reconstructed HR HSIs as a function of λ. We can see that the proposed method performs best when λ lies in the range 1.0-1.2 and is insensitive to variations of λ within this range. In our implementation, we set λ = 1.0. Fig. 5(b) plots the average PSNR of the six reconstructed test HR HSIs as a function of the number of iterations. It is noted that the performance of the proposed method grows slowly once the iteration number exceeds 10. Moreover, the performance declines when the iteration number is larger than 15. The reason may be that the information loss caused by the down-sampling and blur operations suppresses the information brought by the SCT strategy. Hence, we set the iteration number to 10 in the experiments.

4.2. SR performance of the SCT strategy

In order to verify the performance of our method clearly, we have tested it module by module. Table 1 shows the SR performance
of the bicubic interpolation and the SCT strategy for each test HSI. Reading Table 1 in the horizontal direction, the SCT strategy greatly improves the SR performance when the scaling factor is small. But comparing the data in the vertical direction, the performance of the SCT strategy gets worse as the scaling factor increases. According to the data in Table 1, when the scaling factor is 8, the performance of the SCT strategy may even be worse than that of the bicubic interpolation. The reason for this phenomenon is that when the scaling factor is small, most of the information of the original HR HSI is retained in the LR HSI, so enforcing the LR HSI generated from the reconstructed HSI to be close to the input LR HSI benefits the SR process. When the scaling factor increases, the information of the LR HSI is badly degraded from the original HR HSI, and the noise imported by this strategy suppresses the information the LR HSI brings, which has a negative influence on the SR process.

4.3. SR performance of the SDCNN model

Parameter sensitivity analysis: In order to learn an SDCNN between
the LHd and Hd, we have tested the performance with different numbers of network layers to choose the best layer number. Once the best layer number is determined, different groups of kernel sizes have been tested to choose the best kernel size. Generally speaking, the deeper a network is, the more complex the relationships it can represent, which tends to improve performance. But Dong et al. [20] observed that, in practice, it is hard to say that deeper is better. This is because
Table 1: Quantitative measurement of the experimental results by the SCT strategy. Each cell gives bicubic / SCT. Columns: CAVE (flowers; fake and real food), Harvard (imgg1; imgb0), Foster (Braga Graffiti; Yellow Rose).

Scaling factor = 2
Eval.Mat | flowers           | fake and real food | imgg1             | imgb0             | Braga Graffiti      | Yellow Rose
ENTR     | 5.3951 / 5.4865   | 5.6635 / 5.9225    | 4.6158 / 4.8399   | 2.3830 / 2.5767   | 3.3134 / 3.3889     | 2.2391 / 2.3068
RMSE     | 0.0156 / 0.0110   | 0.0137 / 0.0099    | 0.0074 / 0.0057   | 0.0189 / 0.0154   | 0.0128 / 0.0075     | 0.0058 / 0.0034
PSNR     | 36.17 / 39.19     | 37.27 / 40.07      | 42.60 / 44.92     | 34.46 / 36.24     | 37.85 / 42.47       | 44.70 / 49.45
SSIM     | 0.9689 / 0.9795   | 0.9701 / 0.9874    | 0.9889 / 0.9919   | 0.9080 / 0.9310   | 0.9713 / 0.9850     | 0.9896 / 0.9946
SAM      | 5.0834 / 4.0978   | 4.0176 / 3.3308    | 4.5515 / 4.4638   | 2.1498 / 2.1620   | 10.6775 / 9.9880    | 3.4218 / 3.0349

Scaling factor = 4
ENTR     | 5.2983 / 5.3538   | 5.6596 / 5.7503    | 4.6376 / 4.6623   | 2.3365 / 2.4656   | 3.2785 / 3.3699     | 2.2371 / 2.2375
RMSE     | 0.0200 / 0.0186   | 0.0174 / 0.0159    | 0.0089 / 0.0083   | 0.0222 / 0.0211   | 0.0176 / 0.0153     | 0.0078 / 0.0067
PSNR     | 33.97 / 34.59     | 35.17 / 35.99      | 41.00 / 41.62     | 33.09 / 33.52     | 35.07 / 36.30       | 42.14 / 43.42
SSIM     | 0.9523 / 0.9522   | 0.9538 / 0.9551    | 0.9854 / 0.9865   | 0.8826 / 0.8891   | 0.9515 / 0.9569     | 0.9833 / 0.9854
SAM      | 6.3232 / 5.7587   | 5.1150 / 4.7207    | 4.6716 / 5.0783   | 2.2831 / 2.2714   | 11.5335 / 11.5323   | 3.8215 / 3.6748

Scaling factor = 8
ENTR     | 5.0167 / 5.0149   | 5.4655 / 5.4727    | 4.3006 / 4.4141   | 2.1714 / 2.2160   | 3.1296 / 3.1744     | 2.2200 / 2.2238
RMSE     | 0.0305 / 0.0310   | 0.0267 / 0.0265    | 0.0126 / 0.0127   | 0.0280 / 0.0281   | 0.0297 / 0.0294     | 0.0141 / 0.0140
PSNR     | 30.30 / 30.16     | 31.38 / 31.52      | 38.02 / 37.94     | 31.05 / 31.02     | 30.53 / 30.63       | 37.00 / 37.10
SSIM     | 0.9063 / 0.9027   | 0.9077 / 0.9055    | 0.9745 / 0.9740   | 0.8453 / 0.8440   | 0.9016 / 0.9007     | 0.9597 / 0.9591
SAM      | 9.6781 / 9.3706   | 8.1185 / 7.9154    | 4.9423 / 5.0018   | 2.5620 / 2.6230   | 13.3825 / 14.8103   | 4.7035 / 5.0010
a deeper network means more parameters must be determined, which is a more difficult process. The SDCNN training is conducted on the 95 training HSIs and the six cross-validation HSIs. Once the SDCNNs with different layer numbers or kernel sizes have been trained, the best-performing SDCNN is selected on the six test HSIs. The pl_PSNR and pl_SSIM are defined to measure the average similarity between the reconstructed spectral difference and the desired spectral difference Hd over these six test HSIs. Comparisons between three and four layers are plotted in Fig. 6, where '+' denotes the pl_PSNR and pl_SSIM of three layers and 'x' denotes those of four layers; three groups of kernel sizes were set for each layer number. As the number of iterations grows, the pl_PSNR and pl_SSIM of the three-layer network rise at first and then settle into a stable
[Fig. 6: (a) pl_PSNR comparison between three and four layers with different kernel sizes; (b) pl_SSIM comparison between three and four layers with different kernel sizes.]

[Fig. 7: (a) pl_PSNR comparison between four and five layers; (b) pl_PSNR comparison between three and two layers.]
state, while the performance of the four-layer network is unstable and consistently worse than that of the three-layer network. Going deeper still, the comparison between five and four layers is shown in Fig. 7(a): the five-layer SDCNN performs worse than the four-layer one. We also compared three layers with two layers, with the experimental data shown in Fig. 7(b); the three-layer SDCNN outperforms the two-layer one. We therefore selected three as the final number of network layers.

After deciding the number of layers, we explored the best kernel size for each layer. Three groups of kernel sizes (9-5-3, 7-5-3 and
[Fig. 8: (a) pl_PSNR comparison between three groups of kernel sizes; (b) pl_SSIM comparison between three groups of kernel sizes.]

[Fig. 9: pl_PSNR comparison between kernel sizes 9-5-3 and 9-7-3.]
7-5-1) were set in the experiment. Experimental results for the different kernel sizes under the same numbers of iterations are shown in Fig. 8. Seen from Fig. 8, as the number of iterations grows, the overall pl_PSNR and pl_SSIM rise until they reach a stable state, and the SDCNN with kernel sizes 9-5-3 performs better than the others. Since 9-5-3 is larger than 7-5-3 and 7-5-1, we also tried another, larger group of kernel sizes, 9-7-3; the result is shown in Fig. 9. The optimal performance of the SDCNN with kernel sizes 9-7-3 is slightly better than that with 9-5-3, but the 9-5-3 network stays more stable as the number of iterations grows, so we set 9-5-3 as the final kernel sizes. Some feature maps of the flowers HSI at different layers are shown in Fig. 10.
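To make the chosen configuration concrete, the forward pass of a three-layer network with kernel sizes 9-5-3 can be sketched in numpy. This is a hedged illustration: the filter counts (64 and 32 feature maps), the ReLU activations, and the random weights are assumptions in the spirit of SRCNN-style networks, not the exact SDCNN settings.

```python
import numpy as np

def conv2d(x, w):
    """Naive 'valid' 2-D convolution: x is (H, W, C_in), w is (k, k, C_in, C_out)."""
    k = w.shape[0]
    H, W = x.shape[0] - k + 1, x.shape[1] - k + 1
    out = np.zeros((H, W, w.shape[3]))
    for i in range(H):
        for j in range(W):
            # Correlate the k-by-k patch with every output filter at once.
            out[i, j, :] = np.tensordot(x[i:i + k, j:j + k, :], w,
                                        axes=([0, 1, 2], [0, 1, 2]))
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((33, 33, 1))             # one spectral-difference band (33x33 patch)
w1 = 0.01 * rng.standard_normal((9, 9, 1, 64))   # layer 1: 9x9 kernels, 64 maps (assumed count)
w2 = 0.01 * rng.standard_normal((5, 5, 64, 32))  # layer 2: 5x5 kernels, 32 maps (assumed count)
w3 = 0.01 * rng.standard_normal((3, 3, 32, 1))   # layer 3: 3x3 kernels, one output band

h1 = np.maximum(conv2d(x, w1), 0.0)              # 33 - 9 + 1 = 25 -> (25, 25, 64)
h2 = np.maximum(conv2d(h1, w2), 0.0)             # 25 - 5 + 1 = 21 -> (21, 21, 32)
y = conv2d(h2, w3)                               # 21 - 3 + 1 = 19 -> (19, 19, 1)
print(h1.shape, h2.shape, y.shape)
```

With 'valid' convolutions a 9-5-3 stack trims seven pixels from every border, which is why training patches must be taken larger than the target output.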
Performance: Experimental results of the SDCNN model on each single
[Fig. 10: (a) some feature maps at the first layer; (b) some feature maps at the second layer; (c) the feature map at the third layer.]
Table 2: Quantitative measurement of the experimental results by the SDCNN model

Scaling factor = 2
| Test HSI                  | ENTR   | RMSE   | PSNR  | SSIM   | SAM     |
| flowers (CAVE)            | 5.4349 | 0.0113 | 38.91 | 0.9757 | 4.2008  |
| fake and real food (CAVE) | 5.7974 | 0.0109 | 39.24 | 0.9725 | 3.5523  |
| imgg1 (Harvard)           | 5.1091 | 0.0083 | 41.66 | 0.9776 | 4.5649  |
| imgb0 (Harvard)           | 2.5820 | 0.0173 | 35.23 | 0.9195 | 2.2095  |
| Braga Graffiti (Foster)   | 3.3543 | 0.0100 | 40.03 | 0.9780 | 10.4546 |
| Yellow Rose (Foster)      | 2.2671 | 0.0057 | 44.93 | 0.9905 | 3.0456  |

Scaling factor = 4
| Test HSI                  | ENTR   | RMSE   | PSNR  | SSIM   | SAM     |
| flowers (CAVE)            | 5.4126 | 0.0172 | 35.29 | 0.9550 | 5.6806  |
| fake and real food (CAVE) | 5.7533 | 0.0152 | 36.35 | 0.9527 | 4.7810  |
| imgg1 (Harvard)           | 4.7908 | 0.0083 | 41.67 | 0.9849 | 4.6869  |
| imgb0 (Harvard)           | 2.5532 | 0.0205 | 33.77 | 0.8942 | 2.2401  |
| Braga Graffiti (Foster)   | 3.3072 | 0.0150 | 36.51 | 0.9592 | 11.4133 |
| Yellow Rose (Foster)      | 2.2411 | 0.0062 | 44.19 | 0.9874 | 3.6508  |

Scaling factor = 8
| Test HSI                  | ENTR   | RMSE   | PSNR  | SSIM   | SAM     |
| flowers (CAVE)            | 5.1238 | 0.0283 | 30.96 | 0.9068 | 8.7770  |
| fake and real food (CAVE) | 5.4916 | 0.0254 | 31.89 | 0.8929 | 7.8734  |
| imgg1 (Harvard)           | 4.6573 | 0.0127 | 37.95 | 0.9740 | 4.8372  |
| imgb0 (Harvard)           | 2.2388 | 0.0273 | 31.29 | 0.8473 | 2.5548  |
| Braga Graffiti (Foster)   | 3.2570 | 0.0286 | 30.88 | 0.9003 | 13.4083 |
| Yellow Rose (Foster)      | 2.2383 | 0.0130 | 37.73 | 0.9613 | 4.6629  |
test HSI from the three databases are shown in Table 2. The data in Table 2 show that the SDCNN model reduces the spectral information loss while increasing the spatial information at the same time. This follows from the merged spatial-spectral nature of HSIs: good preservation of the spectral information also supports a high spatial resolution. For the HSIs in the Harvard database, which have more homogeneous textures than those in the other databases, deep learning is particularly well suited to feature extraction, and the SDCNN model performs stably and strongly. Comparing the data in Tables 1 and 2, the SDCNN model does not beat the SCT strategy when the scaling factor is small (for example, 2), but as the scaling factor grows, the SDCNN model becomes superior to the SCT strategy. We therefore conclude that the SDCNN model not only preserves the spectral information well, but also delivers a more stable SR performance as the scaling factor increases.
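The SCT strategy's spatial constraint, keeping the LR image generated from the reconstructed HR HSI close to the observed LR input, can be illustrated band by band with a back-projection-style update. This is only a sketch: the box-average downsampling and nearest-neighbour upsampling below are hypothetical stand-ins for the actual degradation and interpolation operators used in the paper.

```python
import numpy as np

def downsample(img, s):
    """Box-average downsampling by factor s (stand-in for the true degradation)."""
    H, W = img.shape[0] // s, img.shape[1] // s
    return img[:H * s, :W * s].reshape(H, s, W, s).mean(axis=(1, 3))

def upsample(img, s):
    """Nearest-neighbour upsampling by factor s (stand-in for bicubic interpolation)."""
    return np.repeat(np.repeat(img, s, axis=0), s, axis=1)

def spatial_constraint(hr_est, lr_obs, s, n_iter=10, step=1.0):
    """Push the downsampled HR estimate toward the observed LR band."""
    hr = hr_est.copy()
    for _ in range(n_iter):
        residual = lr_obs - downsample(hr, s)    # mismatch in the LR domain
        hr = hr + step * upsample(residual, s)   # project the residual back to HR
    return hr

rng = np.random.default_rng(1)
hr_true = rng.random((32, 32))                   # toy HR band
lr_obs = downsample(hr_true, 2)                  # observed LR band
hr0 = upsample(lr_obs, 2) + 0.05 * rng.standard_normal((32, 32))  # imperfect HR estimate
hr1 = spatial_constraint(hr0, lr_obs, 2)
print(np.abs(downsample(hr1, 2) - lr_obs).max())  # the LR-domain mismatch is driven to ~0
```

The update cannot invent detail lost at large scaling factors, which matches the observation above that the spatial constraint helps most when the scaling factor is small.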
4.4. SR performance of the SCT_SDCNN method

As described above, the SR performance of the SCT strategy is excellent at a small scaling factor but degrades rapidly as the scaling factor grows, while the SDCNN model is more stable in both spatial information enhancement and spectral information preservation. We therefore combine the SCT strategy with the SDCNN model to super-resolve an LR HSI into an HR one.

4.4.1. Evaluation on the CAVE database

Table 3 provides quantitative measurements averaged over the two test HSIs in the CAVE database. The data in Table 3 show that our proposed SCT_SDCNN method outperforms the other methods. Fig. 11(a) directly shows the RGB images created from the reconstructed flowers HSI by the various methods at a scaling factor of 2. One point in the main body of the flowers is randomly selected; it lies at the center of the red circle, whose radius is two pixels. Its corresponding spec-
Table 3: The average result of ENTR, RMSE, PSNR (dB), SSIM and SAM on the two test HSIs in the CAVE database

| Eval. Mat | Scale | Bicubic | GS [50] | GSA [51] | GFPCA [52] | HySure [53] | NSSR [24] | SCT_SDCNN |
| ENTR      | 2     | 5.5113  | 5.5567  | 5.5342   | 5.6490     | 5.5254      | 5.7483    | 5.7542    |
| ENTR      | 4     | 5.4790  | 5.3997  | 5.3750   | 5.4660     | 5.3844      | 5.5360    | 5.5944    |
| ENTR      | 8     | 5.2411  | 4.9466  | 4.9229   | 5.1294     | 4.8581      | 5.2685    | 5.3455    |
| RMSE      | 2     | 0.0146  | 0.0148  | 0.0147   | 0.0158     | 0.0139      | 0.0148    | 0.0101    |
| RMSE      | 4     | 0.0187  | 0.0203  | 0.0202   | 0.0196     | 0.0241      | 0.0173    | 0.0162    |
| RMSE      | 8     | 0.0287  | 0.0249  | 0.0320   | 0.0295     | 0.0332      | 0.0274    | 0.0263    |
| PSNR      | 2     | 36.72   | 36.60   | 36.65    | 36.06      | 37.13       | 36.59     | 39.95     |
| PSNR      | 4     | 34.57   | 33.87   | 33.91    | 34.18      | 32.37       | 35.23     | 35.83     |
| PSNR      | 8     | 30.84   | 32.07   | 29.91    | 30.63      | 29.59       | 31.24     | 31.60     |
| SSIM      | 2     | 0.9695  | 0.9679  | 0.9692   | 0.9641     | 0.9675      | 0.9649    | 0.9804    |
| SSIM      | 4     | 0.9531  | 0.9382  | 0.9391   | 0.9500     | 0.9272      | 0.9542    | 0.9546    |
| SSIM      | 8     | 0.9070  | 0.8737  | 0.8695   | 0.9078     | 0.8388      | 0.9017    | 0.9074    |
| SAM       | 2     | 4.5505  | 4.3629  | 4.2833   | 4.8469     | 6.3298      | 7.1989    | 3.6036    |
| SAM       | 4     | 5.7191  | 6.4412  | 6.3896   | 5.4542     | 7.8015      | 5.9957    | 5.0730    |
| SAM       | 8     | 8.8983  | 10.4128 | 10.3768  | 7.7394     | 10.8693     | 8.8658    | 8.0551    |
tral curve is plotted in Fig. 11(c)-(i), and the comparison among these spectral curves is shown in Fig. 11(b). According to Fig. 11, our proposed SCT_SDCNN method obtains better spatial information while achieving better spectral information preservation at the same time.

4.4.2. Evaluation on the Harvard database

Experiments were also conducted on the two test HSIs with comparatively rich textures in the Harvard database; results are shown in Table 4. The data in Table 4 indicate that the proposed SCT_SDCNN method outperforms all the competing methods. Note that the ENTR of the ground truth imgg1 and imgb0 is 7.2147 and 8.1576, respectively, yet the ENTR values in Table 4 are much smaller. Judging from the bicubic result in Table 4, this is likely due to the degradation, which badly damages the input HSIs. Fig. 12(a) directly shows the RGB images created from the reconstructed imgb0 HSI by the various methods. Because the size of the imgb0 is much too
[Fig. 11: (a) experimental results of the flowers with different methods; (b)-(i) spectral profiles of the point in (a) by different methods.]
Table 4: The average result of ENTR, RMSE, PSNR (dB), SSIM and SAM on the two test HSIs in the Harvard database

| Eval. Mat | Scale | Bicubic | GS [50] | GSA [51] | GFPCA [52] | HySure [53] | NSSR [24] | SCT_SDCNN |
| ENTR      | 2     | 3.4994  | 3.5047  | 3.5086   | 3.5314     | 3.5544      | 3.7636    | 3.8804    |
| ENTR      | 4     | 3.4871  | 3.4062  | 3.4040   | 3.4359     | 3.4976      | 3.5600    | 3.7468    |
| ENTR      | 8     | 3.2360  | 3.2446  | 3.2389   | 3.2699     | 3.1895      | 3.3342    | 3.5064    |
| RMSE      | 2     | 0.0118  | 0.0115  | 0.0115   | 0.0131     | 0.0126      | 0.0125    | 0.0093    |
| RMSE      | 4     | 0.0140  | 0.0140  | 0.0140   | 0.0156     | 0.0167      | 0.0137    | 0.0128    |
| RMSE      | 8     | 0.0188  | 0.0189  | 0.0189   | 0.0204     | 0.0219      | 0.0189    | 0.0182    |
| PSNR      | 2     | 38.53   | 38.77   | 38.79    | 38.63      | 39.01       | 38.06     | 40.65     |
| PSNR      | 4     | 37.05   | 37.07   | 37.11    | 37.02      | 36.30       | 37.28     | 37.82     |
| PSNR      | 8     | 34.54   | 34.48   | 34.49    | 34.51      | 33.72       | 34.49     | 34.82     |
| SSIM      | 2     | 0.9485  | 0.9546  | 0.9547   | 0.9524     | 0.9534      | 0.9542    | 0.9620    |
| SSIM      | 4     | 0.9340  | 0.9407  | 0.9408   | 0.9403     | 0.9330      | 0.9427    | 0.9406    |
| SSIM      | 8     | 0.9099  | 0.9185  | 0.9184   | 0.9196     | 0.9080      | 0.9146    | 0.9119    |
| SAM       | 2     | 3.3507  | 3.3738  | 3.3738   | 3.6200     | 3.7375      | 4.0040    | 3.2497    |
| SAM       | 4     | 3.4774  | 3.6275  | 3.6286   | 3.6744     | 4.1228      | 3.8124    | 3.3919    |
| SAM       | 8     | 3.7522  | 4.0500  | 4.0888   | 3.8155     | 4.7386      | 4.9684    | 3.7140    |
large and it is inconvenient to show the whole image within the limited space, we cropped a rectangular region in the HR HSI and selected one point on an edge to compare the spectral curves in Fig. 12(c)-(i); the overall comparison is shown in Fig. 12(b). Seen from the created images in Fig. 12(a), our proposed SCT_SDCNN method reconstructs a more eye-appealing HR HSI. According to the curves plotted in Fig. 12(c)-(i) and the overall comparison in Fig. 12(b), the SCT_SDCNN method preserves the spectra better than the other methods on the Harvard database.

4.4.3. Evaluation on the Foster database

Experiments were also conducted on the two test HSIs from the Foster database; results of the SCT_SDCNN method and the competing methods are shown in Table 5. The ENTR of the ground truth Rose and Braga Graffiti is 3.5752 and 2.4438, respectively, so the ENTR of the reconstructed HSIs is not very large either, as shown in Table 5. We also cropped a
[Fig. 12: (a) experimental results of the imgb0 with different methods; (b)-(i) spectral profiles of the point in (a) by different methods.]
Table 5: The average result of ENTR, RMSE, PSNR (dB), SSIM and SAM on the two test HSIs in the Foster database

| Eval. Mat | Scale | Bicubic | GS [50] | GSA [51] | GFPCA [52] | HySure [53] | NSSR [24] | SCT_SDCNN |
| ENTR      | 2     | 2.7763  | 2.7947  | 2.8076   | 2.7982     | 2.8039      | 2.8564    | 2.9333    |
| ENTR      | 4     | 2.7578  | 2.7781  | 2.7859   | 2.7770     | 2.7762      | 2.8039    | 2.8255    |
| ENTR      | 8     | 2.6748  | 2.6992  | 2.7003   | 2.6837     | 2.6751      | 2.7472    | 2.7553    |
| RMSE      | 2     | 0.0086  | 0.0085  | 0.0085   | 0.0088     | 0.0090      | 0.0082    | 0.0048    |
| RMSE      | 4     | 0.0117  | 0.0125  | 0.0124   | 0.0114     | 0.0122      | 0.0073    | 0.0092    |
| RMSE      | 8     | 0.0205  | 0.0219  | 0.0218   | 0.0210     | 0.0193      | 0.0194    | 0.0188    |
| PSNR      | 2     | 41.27   | 41.40   | 41.45    | 41.11      | 41.56       | 41.77     | 46.31     |
| PSNR      | 4     | 38.61   | 38.06   | 38.15    | 38.89      | 38.30       | 42.69     | 40.77     |
| PSNR      | 8     | 33.77   | 33.20   | 33.25    | 33.54      | 34.27       | 34.24     | 34.54     |
| SSIM      | 2     | 0.9805  | 0.9813  | 0.9814   | 0.9768     | 0.9807      | 0.9816    | 0.9905    |
| SSIM      | 4     | 0.9674  | 0.9656  | 0.9658   | 0.9645     | 0.9567      | 0.9863    | 0.9746    |
| SSIM      | 8     | 0.9307  | 0.9256  | 0.9242   | 0.9302     | 0.9076      | 0.9333    | 0.9329    |
| SAM       | 2     | 7.0497  | 7.2560  | 7.2558   | 7.4827     | 7.8265      | 7.5801    | 6.3711    |
| SAM       | 4     | 7.6775  | 8.8116  | 8.8346   | 8.9295     | 8.5882      | 7.4369    | 7.6036    |
| SAM       | 8     | 9.0430  | 11.1369 | 11.3432  | 9.6921     | 11.9425     | 9.4430    | 9.0382    |
rectangle region in the HR HSI to provide a better comparison between the methods. As Fig. 13(a) shows, our SCT_SDCNN reconstructs a more eye-appealing HR HSI. We also randomly selected one point in the main region of the Braga Graffiti HSI to compare the spectral differences; the point lies at the center of the red circle, whose radius is two pixels. The spectral curves of the HSIs reconstructed by the different methods are plotted in Fig. 13(c)-(i), and the overall comparison is presented in Fig. 13(b). The spectral curve produced by SCT_SDCNN is again the closest to that of the original HSI.
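For reproducibility, the RMSE, PSNR, and SAM figures reported in Tables 1-5 can be computed as follows. This is a minimal numpy sketch assuming intensities normalized to [0, 1]; ENTR and SSIM are omitted for brevity, and the toy cubes only stand in for real HSIs.

```python
import numpy as np

def rmse(ref, rec):
    return float(np.sqrt(np.mean((ref - rec) ** 2)))

def psnr(ref, rec, peak=1.0):
    """Peak signal-to-noise ratio in dB for intensities in [0, peak]."""
    return float(10.0 * np.log10(peak ** 2 / np.mean((ref - rec) ** 2)))

def sam_degrees(ref, rec, eps=1e-12):
    """Mean spectral angle (degrees) between per-pixel spectra of (H, W, B) cubes."""
    r = ref.reshape(-1, ref.shape[-1])
    x = rec.reshape(-1, rec.shape[-1])
    cos = np.sum(r * x, axis=1) / (np.linalg.norm(r, axis=1) *
                                   np.linalg.norm(x, axis=1) + eps)
    return float(np.degrees(np.mean(np.arccos(np.clip(cos, -1.0, 1.0)))))

ref = np.random.default_rng(2).random((8, 8, 31))  # toy ground-truth cube with 31 bands
rec = ref + 0.01                                   # toy reconstruction with a constant bias
print(rmse(ref, rec), psnr(ref, rec), sam_degrees(ref, rec))
```

A constant additive bias leaves RMSE at 0.01 and PSNR at 40 dB, while SAM stays small but nonzero, since adding a constant slightly rotates each spectral vector.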
5. Conclusion

In this paper, we have proposed a novel HSI SR method that combines an SCT strategy, which improves the spatial information, with an SDCNN model, which preserves the spectral information. The proposed SDCNN learns an end-to-end spectral
[Fig. 13: (a) experimental results of the Braga Graffiti with different methods; (b)-(i) spectral profiles of the point in (a) by different methods.]
difference mapping between the LR HSI and the HR HSI. Unlike most state-of-the-art methods, which improve the resolution by fusing an HSI with an RGB image of the same scene (a requirement that is difficult to satisfy in the real world), our proposed SCT_SDCNN method improves the spatial resolution with just one LR HSI as input. The spatial resolution is improved by constraining the LR band generated from the reconstructed HR HSI to be close to the input LR band, and the spectral difference of the reconstructed HSI is restricted to be close to the spectral difference learned by the SDCNN. In this way, the reconstructed HR HSI excels in both spatial information recovery and spectral information preservation. We have tested the proposed SCT_SDCNN method on three different databases, containing both indoor scenes with controlled illumination and outdoor scenes in daylight. Comparative analyses demonstrate that the proposed method outperforms the existing state-of-the-art methods.

Acknowledgments

This work was supported by the National Science Foundation of China under Grants 61222101, 61272120, 61301287, 61301291 and 61350110239.
References

[1] J. Li, J. M. Bioucas-Dias, A. Plaza, Spectral-spatial hyperspectral image segmentation using subspace multinomial logistic regression and Markov random fields, IEEE Transactions on Geoscience and Remote Sensing 50 (3) (2012) 809–823.

[2] M. P. Nelson, L. Shi, L. Zbur, R. J. Priore, P. J. Treado, Real-time short-wave infrared hyperspectral conformal imaging sensor for the detection of threat materials, in: SPIE Defense + Commercial Sensing (DCS) Symposium, Vol. 9824, 2016, pp. 1–9.

[3] S. Asadzadeh, C. R. de Souza Filho, A review on spectral processing methods for geological remote sensing, International Journal of Applied Earth Observation and Geoinformation 47 (2016) 69–90.

[4] B. Du, L. Zhang, A discriminative metric learning based anomaly detection method, IEEE Transactions on Geoscience and Remote Sensing 52 (11) (2014) 6844–6857.

[5] K. Tan, X. Jin, A. Plaza, X. Wang, L. Xiao, P. Du, Automatic change detection in high-resolution remote sensing images by using a multiple classifier system and spectral-spatial features, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (2016) 3439–3451.

[6] H. Zhang, L. Zhang, H. Shen, A super-resolution reconstruction algorithm for hyperspectral images, Signal Processing 92 (9) (2012) 2082–2096.

[7] S. C. Park, M. K. Park, M. G. Kang, Super-resolution image reconstruction: a technical overview, IEEE Signal Processing Magazine 20 (4) (2003) 21–36.

[8] F. Jiru, Introduction to post-processing techniques, European Journal of Radiology 67 (2) (2008) 202–217.

[9] M. Abdel-Nasser, J. Melendez, A. Moreno, O. A. Omer, D. Puig, Breast tumor classification in ultrasound images using texture analysis and super-resolution methods, Engineering Applications of Artificial Intelligence 59 (2017) 84–92.

[10] A. Hori, T. Toda, Regulation of centriolar satellite integrity and its physiology, Cellular and Molecular Life Sciences 74 (2) (2017) 213–229.

[11] A. Kappeler, S. Yoo, Q. Dai, A. K. Katsaggelos, Super-resolution of compressed videos using convolutional neural networks, in: IEEE International Conference on Image Processing, 2016, pp. 1150–1154.

[12] W. C. Siu, K. W. Hung, Review of image interpolation and super-resolution, in: Signal and Information Processing Association Summit and Conference, 2012, pp. 1–10.

[13] X. Li, M. T. Orchard, New edge-directed interpolation, IEEE Transactions on Image Processing 10 (10) (2001) 1521–1527.

[14] J. Sun, J. Sun, Z. Xu, H. Y. Shum, Image super-resolution using gradient profile prior, in: IEEE Conference on Computer Vision and Pattern Recognition, 2008, pp. 1–8.

[15] Y. W. Tai, S. Liu, M. S. Brown, S. Lin, Super resolution using edge prior and single image detail synthesis, in: IEEE Conference on Computer Vision and Pattern Recognition, 2010, pp. 2400–2407.

[16] M. K. Ozkan, A. M. Tekalp, M. I. Sezan, POCS-based restoration of space-varying blurred images, IEEE Transactions on Image Processing 3 (4) (1994) 450–454.

[17] R. R. Schultz, R. L. Stevenson, Extraction of high-resolution frames from video sequences, IEEE Transactions on Image Processing 5 (6) (1996) 996–1011.

[18] J. Yang, J. Wright, T. S. Huang, Y. Ma, Image super-resolution via sparse representation, IEEE Transactions on Image Processing 19 (11) (2010) 2861–2873.

[19] J. Bruna, P. Sprechmann, Y. LeCun, Super-resolution with deep convolutional sufficient statistics, arXiv:1511.05666v2 (2015) 1–15.

[20] C. Dong, C. C. Loy, K. He, X. Tang, Image super-resolution using deep convolutional networks, IEEE Transactions on Pattern Analysis and Machine Intelligence 38 (2) (2016) 295–307.

[21] T. Akgun, Y. Altunbasak, R. M. Mersereau, Super-resolution reconstruction of hyperspectral images, IEEE Transactions on Image Processing 14 (11) (2005) 1860–1875.

[22] M. Simoes, J. Bioucas-Dias, L. B. Almeida, J. Chanussot, Hyperspectral image superresolution: An edge-preserving convex formulation, in: IEEE International Conference on Image Processing, 2014, pp. 4166–4170.

[23] N. Akhtar, F. Shafait, A. Mian, Sparse spatio-spectral representation for hyperspectral image super-resolution, in: European Conference on Computer Vision, 2014, pp. 63–78.

[24] W. Dong, F. Fu, G. Shi, X. Cao, J. Wu, G. Li, X. Li, Hyperspectral image super-resolution via non-negative structured sparse representation, IEEE Transactions on Image Processing 25 (5) (2016) 2337–2351.

[25] P. M. Atkinson, Mapping sub-pixel vector boundaries from remotely sensed images, in: Proceedings of GISRUK '96, Canterbury, UK, 1996.

[26] Y. LeCun, B. Boser, J. Denker, D. Henderson, Backpropagation applied to handwritten zip code recognition, Neural Computation 1 (4) (1989) 541–551.

[27] W. Ouyang, X. Wang, X. Zeng, S. Qiu, DeepID-Net: Deformable deep convolutional neural networks for object detection, in: IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 2403–2412.

[28] C. Szegedy, S. Reed, D. Erhan, D. Anguelov, S. Ioffe, Scalable, high-quality object detection, arXiv:1412.1441v3 (2015) 1–10.

[29] Y. Sun, X. Wang, X. Tang, Deep learning face representation by joint identification-verification, Advances in Neural Information Processing Systems 27 (2014) 1988–1996.

[30] H. M. Li, Deep learning for image denoising, International Journal of Signal Processing, Image Processing and Pattern Recognition 7 (3) (2014) 171–180.

[31] J. Zabalza, J. Ren, J. Zheng, H. Zhao, C. Qing, Z. Yang, P. Du, S. Marshall, Novel segmented stacked autoencoder for effective dimensionality reduction and feature extraction in hyperspectral imaging, Neurocomputing 185 (2016) 1–10.

[32] W. Zhao, S. Du, Learning multiscale and deep representations for classifying remotely sensed imagery, ISPRS Journal of Photogrammetry and Remote Sensing 113 (2016) 155–165.

[33] F. Zhang, B. Du, L. Zhang, Scene classification via a gradient boosting random convolutional network framework, IEEE Transactions on Geoscience and Remote Sensing 54 (3) (2016) 1793–1802.

[34] W. Zhao, S. Du, Spectral-spatial feature extraction for hyperspectral image classification: A dimension reduction and deep learning approach, IEEE Transactions on Geoscience and Remote Sensing 54 (8) (2016) 4544–4554.

[35] A. Romero, C. Gatta, G. Camps-Valls, Unsupervised deep feature extraction for remote sensing image classification, IEEE Transactions on Geoscience and Remote Sensing 54 (3) (2015) 1349–1362.

[36] Y. Chen, X. Zhao, X. Jia, Spectral-spatial classification of hyperspectral data based on deep belief network, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 8 (6) (2015) 2381–2392.

[37] Y. Yuan, X. Zheng, X. Lu, Hyperspectral image superresolution by transfer learning, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 10 (5) (2017) 1963–1974.

[38] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (11) (1998) 2278–2324.

[39] N. Nguyen, P. Milanfar, G. Golub, Efficient generalized cross-validation with applications to parametric image restoration and resolution enhancement, IEEE Transactions on Image Processing 10 (9) (2001) 1299–1308.

[40] M. Elad, A. Feuer, Restoration of a single superresolution image from several blurred, noisy, and undersampled measured images, IEEE Transactions on Image Processing 6 (12) (1997) 1646–1658.

[41] Y. Li, W. Xie, H. Li, Hyperspectral image reconstruction by deep convolutional neural network for classification, Pattern Recognition 63 (2017) 371–383.

[42] S. Ioffe, C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, arXiv:1502.03167v3 (2015) 1–11.

[43] A. Dosovitskiy, P. Fischer, E. Ilg, FlowNet: Learning optical flow with convolutional networks, in: IEEE International Conference on Computer Vision, 2015, pp. 2758–2766.

[44] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (2015) 436–444.

[45] K. Jarrett, K. Kavukcuoglu, M. Ranzato, Y. LeCun, What is the best multi-stage architecture for object recognition?, in: IEEE International Conference on Computer Vision, 2009, pp. 2146–2153.

[46] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (11) (1998) 2278–2324.

[47] F. Yasuma, T. Mitsunaga, D. Iso, S. K. Nayar, Generalized assorted pixel camera: postcapture control of resolution, dynamic range, and spectrum, IEEE Transactions on Image Processing 19 (9) (2010) 2241–2253.

[48] A. Chakrabarti, T. Zickler, Statistics of real-world hyperspectral images, in: IEEE Conference on Computer Vision and Pattern Recognition, 2011, pp. 193–200.

[49] D. H. Foster, S. M. C. Nascimento, K. Amano, Information limits on neural identification of colored surfaces in natural scenes, Visual Neuroscience 21 (3) (2004) 331–336.

[50] C. A. Laben, B. V. Brower, Process for enhancing the spatial resolution of multispectral imagery using pan-sharpening, US Patent 6011875 (2000) 1–9.

[51] B. Aiazzi, S. Baronti, M. Selva, Improving component substitution pansharpening through multivariate regression of MS+Pan data, IEEE Transactions on Geoscience and Remote Sensing 45 (10) (2007) 3230–3239.

[52] W. Liao, X. Huang, F. Van Coillie, S. Gautama, A. Pizurica, W. Philips, H. Liu, Processing of multiresolution thermal hyperspectral and digital color data: Outcome of the 2014 IEEE GRSS data fusion contest, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 8 (6) (2015) 2984–2996.

[53] M. Simoes, J. Bioucas-Dias, L. B. Almeida, J. Chanussot, A convex formulation for hyperspectral image superresolution via subspace-based regularization, IEEE Transactions on Geoscience and Remote Sensing 53 (6) (2015) 3373–3388.

[54] M. T. Eismann, R. C. Hardie, Application of the stochastic mixing model to hyperspectral resolution enhancement, IEEE Transactions on Geoscience and Remote Sensing 42 (9) (2004) 1924–1933.

[55] C. Y. Yang, J. B. Huang, M. H. Yang, Exploiting self-similarities for single frame super-resolution, in: Asian Conference on Computer Vision, 2010, pp. 497–510.