Multi-Scale Residual Convolutional Neural Network for Haze Removal of Remote Sensing Images

Hou Jiang 1,2 and Ning Lu 1,3,*

1 State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China; [email protected]
2 College of Resources and Environment, University of Chinese Academy of Sciences, Beijing 100049, China
3 Jiangsu Center for Collaborative Innovation in Geographical Information Resource Development and Application, Nanjing 210023, China
* Correspondence: [email protected]; Tel.: +86-10-6488-9981
Received: 14 May 2018; Accepted: 12 June 2018; Published: 14 June 2018
Abstract: Haze removal is a pre-processing step that operates on at-sensor radiance data prior to the physically based image correction step to enhance hazy imagery visually. Most current haze removal methods focus on point-to-point operations and utilize information in the spectral domain, without taking into consideration the multi-scale spatial information of haze. In this paper, we propose a multi-scale residual convolutional neural network (MRCNN) for haze removal of remote sensing images. MRCNN utilizes 3D convolutional kernels to extract spatial–spectral correlation information and abstract features from surrounding neighborhoods for haze transmission estimation. It takes advantage of dilated convolution to aggregate multi-scale contextual information for the purpose of improving its prediction accuracy. Meanwhile, residual learning is utilized to avoid the loss of weak information while deepening the network. Our experiments indicate that MRCNN performs accurately, achieving an extremely low validation error and testing error. The haze removal results for several scenes of Landsat 8 Operational Land Imager (OLI) data show that the visibility of the dehazed images is significantly improved, and the color of the recovered surface is consistent with the actual scene. Quantitative analysis proves that the dehazed results of MRCNN are superior to those of the traditional methods and other networks. Additionally, a comparison to haze-free data illustrates the spectral consistency after haze removal and reveals the changes in the vegetation index.

Keywords: haze removal; multi-scale context aggregation; residual learning; convolutional neural network; Landsat 8 OLI
1. Introduction

During the acquisition of optical satellite images, light reflected from the surface is usually scattered in the process of propagation due to the presence of water vapor, ice, fog, sand, dust, smoke, or other small particles in the atmosphere. This process reduces the image contrast and blurs the surface colors, leading to difficulties in many fields including cartography and web mapping, land use planning, archaeology, and environmental studies. Therefore, an effective haze removal method is of great importance to improve the capability and accuracy of applications that use satellite images.

Haze removal aims at eliminating haze effects on at-sensor radiance data prior to the physically based image correction that converts at-sensor radiance to surface reflectance. In cloudy areas, there is no information about the ground surface, whereas in areas affected by haze the image still contains valuable spectral information. Although haze transparency presents an opportunity for image
restoration, an efficient and widely applicable haze removal method for handling various haze or thin clouds is still a great challenge, especially when only a single hazy image is available.

Single-image based haze removal has made significant progress recently by relying on different assumptions and prior information. Chavez [1,2] presented an improved dark object subtraction (DOS) technique to correct optical data for atmospheric scattering, assuming a constant haze over the whole scene. Liang et al. [3,4] proposed a cluster matching technique for Landsat TM data, assuming that each land cover cluster has the same visible reflectance in both clear and hazy regions. The demand for aerosol-transparent bands makes it impractical in some situations, as visible and near-infrared bands are usually contaminated by haze. Zhang et al. [5] proposed a haze optimized transformation (HOT) to characterize the spatial distribution of haze based on the assumption that the radiances of the red and blue bands are highly correlated for pixels within the clearest portions of a scene and that this relationship holds for all surface types. However, the sensitivity of HOT to water bodies, snow cover, bare soil, and urban targets limits its application. To reduce the impact of spurious HOT responses, various strategies have been proposed in the literature [6–9]. Liu et al. [10] developed a background suppressed haze thickness index (BSHTI) to estimate the relative haze thickness and used a virtual cloud point method to remove haze. Makarau et al. [11,12] utilized a haze thickness map (HTM) for haze evaluation, based on the premise that a local dark object reflects the haze thickness of the image. Shen et al. [13] developed a simple and effective method by using a homomorphic filter [14] for the removal of thin clouds in visible remote sensing (RS) images. He et al. [15] discovered the dark channel prior (DCP): in most of the non-sky patches of haze-free outdoor images, at least one color channel has very low intensity at some pixels. DCP combined with an image degradation model has proved to be simple and effective enough for haze removal. However, it is computationally intensive and may be invalid in special cases. Some improved algorithms [16–20] have been proposed to overcome these limitations.

The great success of DCP in computer vision attracted the attention of researchers working on satellite applications. Long et al. [21] redefined the transmission of DCP and used a low-pass Gaussian filter to refine the atmospheric veil instead of the soft matting method. Pan et al. [22] noted that the average intensity of the dark channel of remote sensing images is low, but not close to zero. Thus, they added a constant term into the image degradation model for haze removal. Jiang et al. [23] utilized a proportional strategy to gain accurate haze thickness maps for all bands from the original dark channel in order to prevent underestimation. These methods succeed in solving specific scenarios in practical applications. However, adjustable parameters in an algorithm must be well designed for various situations to obtain ideal results, which requires a considerable number of experiments on a wide variety of selected images. In addition, these algorithms are effective for local operations, but cannot handle a whole satellite image properly.

In recent years, haze removal methods have been developed in the machine learning framework. Tang et al.
[24] combined four types of haze-relevant features with random forests [25] to estimate the haze transmission. Zhu et al. [26] created a linear model to evaluate the scene depth of a hazy image based on a color attenuation prior. The parameters of the model are learned with a supervised learning method. Despite the remarkable progress, the limitation of these methods lies in the fact that the haze-relevant features or heuristic cues are not effective enough. Following the success of the convolutional neural network (CNN) for image restoration or reconstruction [27–29], Cai et al. [30] proposed DehazeNet, a trainable CNN-based end-to-end system for haze transmission estimation. DehazeNet provides superior performance on natural images over existing methods and maintains efficiency and ease of use. Nevertheless, the “shallow” DehazeNet cannot handle RS images properly due to the dramatic spatial variability of the images or the haze in them and the complicated nonlinear relationship between haze transmission and the spectral–spatial information of images [31]. It is believed that deep learning architectures are generally more robust to nonlinear input data owing to their ability to extract high-level, hierarchical, and abstract features. In the RS community, a large number of deep networks have recently been developed in the fields of hyperspectral image classification [32,33], semantic
labelling [34,35], image segmentation [36], object detection [37,38], change detection [39], etc. However, to the best of our knowledge, a deep network has not yet been proposed for haze removal of RS images.

In this study, we propose a multi-scale residual convolutional neural network (MRCNN) for the first time that can automatically learn the mapping relations between hazy images and their associated haze transmission. MRCNN behaves well in predicting accurate haze transmission by extracting spatial–spectral correlation information and high-level abstract features from hazy image blocks. Specifically, dilated convolution is utilized to obtain local-to-global contexts, including details of haze and the trend of haze spatial variations. Technically, residual blocks are introduced into the network to avoid the loss of weak information, and dropout is used to improve the generalization ability and prevent overfitting. Experiments on Landsat 8 Operational Land Imager (OLI) data demonstrate the effectiveness of MRCNN for haze removal.

The remaining parts of this paper are organized as follows. Section 2 briefly introduces the haze degradation model and provides some basic information about CNNs. Section 3 describes the details of the proposed MRCNN. The experimental results and the comparison to other state-of-the-art methods are shown in Section 4. The model performance, spectral consistency before and after haze removal, and influence on the vegetation index are discussed in Section 5. Finally, our conclusions are outlined in Section 6.

2. Preliminaries

2.1. Haze Degradation Model

To describe the formation of a hazy image, different haze models have been proposed in the literature. The widely used models include an additive model [11,12] and a haze degradation model [40,41]. Herein, we adopt the latter for the sake of developing a trainable end-to-end deep neural network. The mathematical expression is written as follows [40,41]:

I(x) = J(x) t(x) + A (1 − t(x)),  (1)

t(x) = e^{−β d(x)},  (2)

where x is the position of a pixel in the image, I is the hazy image, J is the clear image to be recovered, A is the global atmospheric light, t represents the haze transmission describing the portion of atmospheric light that reaches the sensor, d is the distance between the object and the observer, and β represents the scattering coefficient of the atmosphere. The clear image J can be recovered after A and t are estimated properly. Equation (2) suggests that when d goes to infinity, t approaches zero. Together with Equation (1), we have:

A = I(x), d(x) → ∞.  (3)
In a practical imaging process, d cannot be infinity, but it can be a long distance that leads to a very low transmission t0. Thus, the atmospheric light A is estimated as follows:

A = max_{y ∈ {x | t(x) ≤ t0}} I(y).  (4)
The discussion above indicates that to recover a clear image, i.e., to achieve haze removal, the key is to estimate an accurate haze transmission map. In this paper, we plan to build a deep neural network for the estimation of haze transmission according to the input of hazy images, i.e., taking hazy images as input and outputting corresponding transmission maps. Once A and t are estimated, the clear image J can be recovered as follows:

J(x) = (I(x) − A) / t(x) + A.  (5)
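As a minimal illustration (not from the paper; function and variable names are ours), Equations (4) and (5) translate directly into a few NumPy operations once a transmission map has been predicted:

```python
import numpy as np

def estimate_atmospheric_light(I, t, t0):
    """Eq. (4): A is the maximum radiance among pixels whose transmission is at most t0."""
    mask = t <= t0                         # pixels dominated by haze
    return I[mask].max(axis=0)             # one value per band

def recover(I, t, A, t_min=0.1):
    """Eq. (5): J = (I - A)/t + A, with t clipped to avoid division blow-up."""
    t = np.clip(t, t_min, 1.0)[..., None]  # broadcast the 2D transmission over bands
    return (I - A) / t + A

# I: (H, W, 5) hazy image scaled to [0, 1]; t: (H, W) predicted transmission map
# t0 = np.quantile(t, 0.1)                 # lowest decile, as described in Section 3.4
# J = recover(I, t, estimate_atmospheric_light(I, t, t0))
```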
In this paper, the haze degradation model is used to simulate training pairs of hazy patches and target haze transmissions from sampled clear blocks, and finally to recover a clear image according to the predicted transmission map.
2.2. CNNs

CNNs [42] are biologically inspired variants of the multilayer perceptron that can learn hierarchical representations from raw data and are capable of initially abstracting simple concepts and then constructing more complex concepts by comprehending the simpler ones. There are two special aspects in the architecture of a CNN, i.e., sparse connections and shared weights. CNNs exploit a spatially local correlation by enforcing a local connectivity pattern between neurons of adjacent layers. We illustrate this graphically in Figure 1, where units in layer m are connected to three spatially contiguous units in layer m − 1. In addition, in CNNs, each convolutional filter is replicated across the entire layer, sharing the same weights and biases. In Figure 2, the weights of the same color are shared, i.e., constrained to be identical. In this way, CNNs are able to achieve better generalization on vision problems, and the learning efficiency is increased by greatly reducing the number of free parameters to be learnt.
Figure 1. Local connectivity pattern in CNNs. Each unit is connected to three spatially contiguous units in the layer below.
Figure 2. Shared weights in CNNs. The same color indicates the same weight.
The input and output of each layer are sets of arrays called feature maps. A standard CNN usually contains three kinds of basic layers: a convolutional layer, a nonlinear layer, and a pooling layer. Deep CNNs are constructed by stacking several basic layers to form the deep architecture.

The convolutional layer performs a series of convolutional operations on the previous layer with a small kernel (e.g., 3 × 3, 5 × 5, etc.). Each convolutional operation computes the dot product between the weights of the kernel and a local region (called the receptive field) of the input feature map. The value at position (x, y) of the jth feature map in the mth layer is denoted as follows:

v_{mj}^{xy} = b_{mj} + Σ_i Σ_{p=0}^{P−1} Σ_{q=0}^{Q−1} ω_{mji}^{pq} v_{(m−1)i}^{(x+p)(y+q)},  (6)
where i indexes the feature map in the (m − 1)th layer connected to the current jth feature map, N represents the number of feature maps in the (m − 1)th layer, ω_{mji}^{pq} is the weight at position (p, q) connected to the ith feature map, P and Q are the height and the width of the convolutional kernel, and b_{mj} is the bias of the jth feature map in the mth layer. Figure 3 shows two layers of a CNN. Layer m − 1 contains four feature maps and layer m contains two feature maps. The pixels in layer m (blue or red squares) are computed from the pixels of layer m − 1, which fall within their 2 × 2 receptive field
in the layer below (shown as colored rectangles). It is noted that the subscript m is omitted in the weight matrices.

The nonlinear layer embeds a nonlinear activation function that is applied to each feature map to learn nonlinear representations. The rectified linear unit (ReLU) [43,44] is widely used in recent deep neural networks and can be defined as f(x) = max(0, x). In other words, ReLU thresholds the non-positive value as zero and keeps the positive value unchanged. ReLU can alleviate the vanishing gradient problem [45] and speed up the learning process to achieve a considerable reduction in training time [46].

The pooling layer provides a way to perform sub-sampling along the spatial dimension and to make features invariant with respect to location. It summarizes the outputs of neighboring groups of neurons in the same kernel map. There are two kinds of pooling operations: max-pooling and average-pooling. The former samples the maximum in the region to be pooled, while the latter computes the mean value. Figure 4 shows an example of the different pooling operations. Traditionally, the neighborhoods summarized by adjacent pooling units do not overlap. Overlapping max-pooling (OMP) is used in ImageNet [46], which is proven to be an effective way to avoid overfitting.
Figure 3. Example of a convolutional layer. Pixels in layer m (blue or red squares) are computed from pixels of layer m − 1 that fall within their 2 × 2 receptive field in the layer below (shown as colored rectangles). Weight matrices for feature maps in layer m are listed in the squares with the same color.
Figure 4. Example of pooling operations. The kernel size is 2 × 2 and the stride is 2 pixels.
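As a concrete companion to Figure 4, the following NumPy sketch (illustrative only, not from the paper) applies non-overlapping 2 × 2 max-pooling and average-pooling with a stride of 2 to a small feature map:

```python
import numpy as np

def pool2x2(fmap, mode="max"):
    """Non-overlapping 2x2 pooling with stride 2 on a single 2D feature map."""
    h, w = fmap.shape
    blocks = fmap.reshape(h // 2, 2, w // 2, 2)       # group pixels into 2x2 windows
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.mean(axis=(1, 3))

fmap = np.array([[1, 3, 2, 0],
                 [4, 2, 1, 5],
                 [0, 1, 3, 2],
                 [2, 2, 4, 4]], dtype=float)
print(pool2x2(fmap, "max"))    # [[4. 5.] [2. 4.]]
print(pool2x2(fmap, "mean"))   # [[2.5  2.  ] [1.25 3.25]]
```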
3. Data and Method

3.1. Data

The remote sensing (RS) images used in this study are Landsat 8 Operational Land Imager (OLI) data, obtained from the Earth Explorer of the United States Geological Survey (USGS) (https://earthexplorer.usgs.gov/). The OLI is an instrument onboard the Landsat 8 satellite, which was launched in February 2013. The satellite collects images of the Earth with a 16-day repeat cycle.
The approximate scene size is 170 km north–south by 183 km east–west. In total, the OLI sensor has eight multispectral bands. The spatial resolution of the OLI multispectral bands is 30 m, and the digital numbers (DNs) of the sensor data are 16-bit pixel values. Haze usually influences the visible and near-infrared (NIR) bands, sequentially including band 1 (coastal/aerosol, 0.43–0.45 µm), band 2 (blue, 0.45–0.51 µm), band 3 (green, 0.53–0.59 µm), band 4 (red, 0.64–0.67 µm), and band 5 (NIR, 0.85–0.88 µm); thus, only the first five bands of the OLI images are used as inputs of the following network for the prediction of haze transmission.
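In practice, this means stacking the first five bands and rescaling the 16-bit DNs to [0, 1] before they enter the network (the scale factor mentioned later in Section 3.4). A minimal sketch, assuming the bands have already been read into NumPy arrays (the band names and the reading step are not from the paper):

```python
import numpy as np

def stack_and_scale(bands, bit_depth=16):
    """Stack OLI bands 1-5 into a (bands, height, width) cube and scale DNs to [0, 1]."""
    cube = np.stack(bands, axis=0).astype(np.float32)
    return cube / (2 ** bit_depth - 1)     # 16-bit DNs -> [0, 1]

# bands = [coastal, blue, green, red, nir]   # five 2D arrays read from the OLI product
# hazy_input = stack_and_scale(bands)
```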
3.2. CNN Architecture

The haze degradation model in Section 2.1 suggests that the estimation of the haze transmission map is the most important task for recovering a clear image. To this end, we present a multi-scale residual CNN (MRCNN) to learn the mapping relations between raw hazy images and their associated haze transmission automatically. Figure 5 illustrates the architecture of MRCNN, which mainly consists of four modules: spectral–spatial feature extraction, multi-scale context aggregation, residual learning, and fully connected layers. The detailed configurations of all layers are summarized in Table 1 and are explained in the following.
Figure 5. Overview of the proposed MRCNN. The network mainly consists of four modules: spectral–spatial feature extraction, multi-scale context aggregation, residual learning, and fully connected layers.

Table 1. Detailed configurations of MRCNN. Conv represents the convolutional layer, Conv-i (i = 1, 2, 4) represents the dilated convolutional layer whose dilation rate equals i, OMP represents the overlapping max-pooling layer, and FC represents the fully-connected layer.

| Module | Unit   | Input Size   | Kernel Num. | Kernel Size | Pad | Activation Function | Output Size  |
|--------|--------|--------------|-------------|-------------|-----|---------------------|--------------|
| Input  | -      | -            | -           | -           | -   | -                   | 5 × 16 × 16  |
| M1     | Conv   | 5 × 16 × 16  | 64          | 3 × 3       | 1   | -                   | 64 × 16 × 16 |
| M1     | Maxout | 64 × 16 × 16 | 4           | -           | -   | -                   | 16 × 16 × 16 |
| M2     | Conv-1 | 16 × 16 × 16 | 16          | 3 × 3       | 1   | -                   | 16 × 16 × 16 |
| M2     | Conv-2 | 16 × 16 × 16 | 16          | 3 × 3       | 3   | -                   | 16 × 16 × 16 |
| M2     | Conv-4 | 16 × 16 × 16 | 16          | 3 × 3       | 7   | -                   | 16 × 16 × 16 |
| M2     | OMP    | 48 × 16 × 16 | -           | 9 × 9       | 0   | -                   | 48 × 8 × 8   |
| M3     | Conv   | 48 × 8 × 8   | 64          | 3 × 3       | 1   | ReLU                | 64 × 8 × 8   |
| M3     | Conv   | 64 × 8 × 8   | 64          | 3 × 3       | 1   | ReLU                | 64 × 8 × 8   |
| M3     | Conv   | 64 × 8 × 8   | 64          | 3 × 3       | 1   | ReLU                | 64 × 8 × 8   |
| M3     | OMP    | 64 × 8 × 8   | -           | 5 × 5       | 0   | -                   | 64 × 4 × 4   |
| M3     | Conv   | 64 × 4 × 4   | 128         | 3 × 3       | 1   | ReLU                | 128 × 4 × 4  |
| M3     | Conv   | 128 × 4 × 4  | 128         | 3 × 3       | 1   | ReLU                | 128 × 4 × 4  |
| M3     | Conv   | 128 × 4 × 4  | 128         | 3 × 3       | 1   | ReLU                | 128 × 4 × 4  |
| M3     | OMP    | 128 × 4 × 4  | -           | 3 × 3       | 0   | -                   | 128 × 2 × 2  |
| FCs    | FC     | 512          | -           | -           | -   | ReLU                | 512          |
| FCs    | FC     | 512          | -           | -           | -   | ReLU                | 64           |
| FCs    | FC     | 64           | -           | -           | -   | -                   | 1            |
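To make the layer bookkeeping in Table 1 concrete, the following is a minimal sketch of the same layout written with tf.keras. It is not the authors' implementation (the paper used the keras package with the Theano backend); the channels-last tensor layout, the "same" padding of the dilated branches, and the grouping of consecutive channels in the Maxout unit are our simplifying assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_mrcnn(patch=16, bands=5):
    inp = layers.Input(shape=(patch, patch, bands))              # a 16 x 16 patch with 5 bands

    # M1: spectral-spatial feature extraction; a 3x3 Conv2D over all five bands is equivalent
    # to one 5 x 3 x 3 kernel per output map, followed by Maxout (64 -> 16 maps)
    x = layers.Conv2D(64, 3, padding="same")(inp)
    x = layers.Lambda(lambda t: tf.reduce_max(
            tf.reshape(t, (-1, patch, patch, 16, 4)), axis=-1))(x)

    # M2: multi-scale context aggregation with 1-, 2- and 4-dilated 3x3 convolutions,
    # concatenation (48 maps) and a 9x9 overlapping max-pooling with stride 1
    branches = [layers.Conv2D(16, 3, dilation_rate=r, padding="same")(x) for r in (1, 2, 4)]
    x = layers.Concatenate()(branches)
    x = layers.MaxPooling2D(pool_size=9, strides=1)(x)           # 16x16 -> 8x8

    # M3: two residual blocks (Figure 7), each followed by overlapping max-pooling
    def residual_block(t, k):
        t = layers.Conv2D(k, 3, padding="same", activation="relu")(t)   # learned features x
        r = layers.Conv2D(k, 3, padding="same", activation="relu")(t)
        r = layers.Conv2D(k, 3, padding="same", activation="relu")(r)   # residual F(x)
        return layers.Add()([t, r])                                      # H(x) = F(x) + x

    x = residual_block(x, 64)
    x = layers.MaxPooling2D(pool_size=5, strides=1)(x)           # 8x8 -> 4x4
    x = residual_block(x, 128)
    x = layers.MaxPooling2D(pool_size=3, strides=1)(x)           # 4x4 -> 2x2

    # FC layers with 50% dropout, regressing one transmission value per patch
    x = layers.Flatten()(x)
    x = layers.Dense(512, activation="relu")(x)
    x = layers.Dropout(0.5)(x)
    x = layers.Dense(64, activation="relu")(x)
    out = layers.Dense(1)(x)
    return models.Model(inp, out)
```

Keras' default glorot_uniform kernel initializer corresponds to the Xavier initialization described in Section 3.3, so no extra initializer argument is needed in this sketch.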
3.2.1. Spectral–Spatial Feature Extraction

To address the ill-posed nature of the single-image dehazing problem, existing methods propose various empirical assumptions or prior knowledge to extract intermediate haze-relevant features, such as the dark channel [15], hue disparity [47], and color attenuation [26]. These features reflect different perspectives of the original image and are helpful for the estimation of haze transmission. Inspired by this, we design the first module (M1) of MRCNN for haze-relevant feature extraction. In the field of image classification, it has been proved that using spectral features and spatial information in a combined fashion can significantly improve the final accuracy [48]. Thus, we utilize 3D convolutional kernels to extract spectral and spatial features simultaneously in M1. The 3D convolution operation computes each pixel in association with its d × d spatial neighborhood and n spectral bands to exploit the important discriminative information and take full advantage of the structural characteristics of the 3D input data cubes. After feature extraction, we apply a Maxout unit [49] for nonlinear mapping as well as dimension reduction, which is able to eliminate information redundancies and improve the performance of the network by removing multi-collinearity. Maxout generates a new feature map by taking a pixel-wise maximization operation over k feature maps in the layer below:

h_{mj}(x) = max_{i ∈ [k∗j, k∗j+k)} f_{(m−1)i}(x), j ∈ [0, N − 1],  (7)
where j indexes the feature map in the mth layer, i indexes the feature map in the (m − 1)th layer, N means the number of feature maps in the mth layer, and ∗ denotes multiplication. Specifically, M1 connects to the input layer that contains 3D hazy patches of size 5 × 16 × 16 (channels × width × height, similarly hereafter). The inputs are padded with one pixel in the spatial dimensions. We use zero padding in this paper. The convolutional layer filters the inputs with 64 kernels of size 5 × 3 × 3. The stride of these kernels, or the distance between the receptive fields' centers of neighboring neurons, is one pixel. The 3 × 3 convolutional filter is the smallest kernel able to seize patterns in different directions, such as center, up/down, and left/right. Additionally, small convolutional filters increase the nonlinearities inside the network and thus make the network more discriminative. The Maxout unit takes four feature maps generated by the convolutional layer in a non-overlapping manner as input, calculates the maximum value at each pixel, and finally outputs one feature map with an unchanged size. Finally, M1 outputs 16 feature maps of size 16 × 16.

3.2.2. Multi-Scale Context Aggregation

When observing an image, we often zoom in or out to recognize its characteristics from local to global. This process demonstrates that features at different scales are important for inferring relative haze thickness from a single image when additional information is lacking. Herein, we design the second module (M2) for multi-scale context aggregation. Basically, there are two approaches to gain features with a large receptive field: deepening the network or enlarging the size of the convolutional kernels. Although, theoretically, features from high-level layers of a network have a larger receptive field on the input image, in practice, they are much smaller [37]. Enlarging the convolution kernel size directly can also obtain wider information, but it is always associated with an exponential growth of learnable parameters. Dilated convolution [50] provides us with a new approach to capture multi-scale context by using different dilation rates. Dilated convolution expands the receptive field without extra parameters so that it can efficiently learn more extensive, powerful, and abstract information. In addition, dilated convolution is capable of aggregating multi-scale contextual information without losing resolution or analyzing rescaled images.

Figure 6 illustrates an example of 2-dilated convolution. The convolution kernel is of size 3 × 3 and its dilation rate equals 2. Thus, each element in the feature map after dilated convolution has a
receptive field of 7 × 7. In general, the size of the receptive field F_r of 3 × 3 filters with different dilation rates r can be computed as follows:

F_r = (2^{i+2} − 1) × (2^{i+2} − 1), i = max{ j | 2^j ≤ r, j ∈ N },  (8)
where N represents the set of natural numbers. To make the size of the resulting feature map unchanged, the padding rate should be set as F_r/2 in the corresponding direction. By setting a group of small-to-large dilation rates, a series of feature maps with local-to-global contexts can be obtained. Local contexts record low-level haze details, while global contexts identify the trend of haze spatial variations with their wide visual cues. Meanwhile, the generated feature maps with multi-scale contexts can be aligned automatically due to their equal resolution.
Figure 6. An illustration of dilated convolution. The input feature map is of size 9 × 9 and padded with 3 pixels. The padding value is zero. The convolution kernel is of size 3 × 3 and its dilation rate equals 2. Each element in the resulting feature map has a receptive field of 7 × 7.
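A small helper (ours, not from the paper) that evaluates Equation (8) reproduces the receptive fields of the three dilation rates used in M2, including the 7 × 7 case of Figure 6:

```python
import math

def receptive_field(r):
    """Receptive field of a 3x3 kernel with dilation rate r, following Eq. (8)."""
    i = math.floor(math.log2(r))      # largest j with 2**j <= r
    side = 2 ** (i + 2) - 1
    return side, side

print(receptive_field(1), receptive_field(2), receptive_field(4))   # (3, 3) (7, 7) (15, 15)
```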
Specifically, M2 takes as input the output of M1. It contains three parallel sub-layers using 1-dilated, 2-dilated, and 4-dilated convolution, respectively. Each sub-layer has 16 convolution kernels of size 3 × 3. Thus, their actual receptive fields correspond to 3 × 3, 7 × 7, and 15 × 15. To ensure the multi-scale outputs are of the same size, the inputs of the three sub-layers are padded with 1, 3, and 7 pixels, respectively. The multi-scale feature maps are concatenated to form a 48 × 16 × 16 feature block before being fed into the following OMP layer. The kernel size of the pooling layer is 9 × 9 and its stride is 1 pixel. Therefore, the final output of M2 is 48 feature maps of size 8 × 8.

It should be noted that no activation function is used in M1 and M2 due to the experimental fact that remote sensing image blocks usually produce large gradients in the early stage. A large gradient flowing through a ReLU neuron could cause the weights to update in such a way that the neuron will never activate on any data point again. If this occurs, then the gradient flowing through the unit will forever be zero from that point on, i.e., the training process would "die". As the process of extracting features proceeds, the distribution of the feature maps in deeper layers tends to be more stable. Therefore, it is more appropriate to add the ReLU activation function in the deeper convolutional layers instead of the shallow ones.

3.2.3. Residual Learning

Obtaining an accurate estimation of haze is not easy, because surface coverage, not haze, is the dominant information in RS images. The features from high-level layers are likely to lose weak information, such as haze. Deeper networks also face a degradation problem [51]: with an increase in the network depth, accuracy gets saturated and then degrades rapidly; this outcome is not caused by overfitting. Herein, we introduce residual learning [52] to resolve these issues. Instead of anticipating that each layer will directly fit a desired underlying mapping, we explicitly allow some layers to fit a residual mapping. Formally, denoting the desired underlying mapping as H(x), we expect the stacked layers to fit another mapping F(x) = H(x) − x. Therefore, the original mapping is recast into
F(x) + x. The residual learning is very effective in a deep network, because it is easier to fit F(x) than to directly fit H(x) when the network deepens.

Figure 7 shows the residual block used in the third module (M3). All three convolutional layers have k kernels of size 3 × 3, equipped with the ReLU nonlinear activation function. The first convolutional layer outputs its learned features x, which are sent to the second and third convolutional layers for learning the residual features F(x). F(x) and x are then fused using a sum operation to form the target features H(x). The following OMP layer performs local aggregation on H(x) without padding. Specifically, M3 connects to the outputs of M2, which are of size 48 × 8 × 8. The inputs are sent to two sequential residual blocks. The convolutional layers in the first block have 64 kernels, while 128 kernels are used in the second block. The kernel sizes of the OMP layers are 5 × 5 and 3 × 3, respectively. Finally, M3 outputs features of size 128 × 2 × 2.
Figure 7. Residual block used in MRCNN. The first convolutional layer outputs its learned features x, which are sent to the second and third convolutional layers for learning residual features F(x). F(x) and x are then fused using a sum operation to form the target features H(x).
3.2.4. Fully Connected Layers

At the end of the proposed network, we utilize fully connected (FC) layers to achieve our regressive task, i.e., predicting haze transmission relying on the abstract features from the stacked convolutional layers. The feature maps of the last convolutional layer are flattened and fed into the FC layers. However, the FC layers are prone to overfitting, thus hampering the generalization ability of the overall network. Dropout, a regularization method proposed by Hinton et al. [53], randomly sets a portion of the hidden neurons to zero during training. The dropped neurons do not contribute in the forward pass and are not used in the back-propagation procedure. Dropout has been proven to improve the generalization ability and largely prevents overfitting [54].

Specifically, three FC layers are implemented with 512, 64, and one node, respectively. They compute their outputs as y_i = f(ω_i y_{i−1} + b_i), where ω_i are weight matrices, b_i are bias vectors, y_{i−1} is the output of the previous layer, and f(·) represents the ReLU activation function. In addition, we have allowed a 50% dropout in the first FC layer. Finally, the FC layers produce a single value representing the haze transmission at the central pixel of each input hazy patch.
This work is based on two assumptions: first, the image content is independent of transmission, i.e.,This the work is based on two assumptions: first, the image content is independent of transmission, i.e., same content can appear under any transmission; and second, the transmission is locally constant, the i.e., same content appear under transmission; and second, transmission locally constant, image pixels incan a small patch haveany a similar transmission. Given athe clear patch PJ , theisatmospheric light i.e., image pixels in a small patch have a similar transmission. Given a clear patch , P J A, and a random transmission t ∈ [0, 1], a simulated hazy patch PI is generated as PI = tPJ + (1 − tthe ) A. atmospheric A , and a random transmission t [0,1]when , a simulated hazy patch is As the surfacelight radiances reached the sensor would be too weak the transmission is lower P Ithan 0.3, we restrict [0.3, 0.95]. To reduce the uncertainty of variables in learning, A is simply t Prange generated as PtIinthe J (1 t )A . As the surface radiances reached the sensor would be too weak set to 1 in all five channels, i.e., A = [1, 1, 1, 1, 1]. Herein, a training pair is composed of the generated when the transmission is lower than 0.3, we restrict t in the range [0.3,0.95] . To reduce the PI and given t. uncertainty of variables in learning, A is simply set to 1 in all five channels, i.e., A [1,1,1,1,1] . Herein, a training pair is composed of the generated P I and given t .
Considering the difficulty of building a complete dataset containing various kinds of surface cover types, clear samples are collected in a local clear region of a single scene of Landsat 8 OLI data (path 123, row 032, acquisition date 23 September 2015) in our experiments. In addition, if too many surface types are selected for training, the dataset would become extremely large when ensuring sufficient samples for each type, which requires a considerable amount of computer memory and training time. In total, 60 clear blocks of size 240 × 240 are sampled from the test scene. Some examples are shown in Figure 8. All original clear blocks are normalized using the max-min method to ensure an identical scale and that the atmospheric light equals 1. For each block, we uniformly generate 20 random transmissions to produce hazy blocks, which are then tiled into patches with a size of 16 × 16. Thus, 270,000 simulated hazy patches are collected. To ensure robustness, these patches are shuffled to break potential correlations. Finally, all patches are sorted into a 90% training set and a 10% testing set, whose numbers are 243,000 and 27,000, respectively. Hereafter, we refer to this dataset as D1. It is important to note that the simulated patches are directly used as input of the training network without additional normalization, which would change the real haze depth.
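The sampling procedure above can be condensed into a short NumPy sketch (illustrative only; the block reading and max-min normalization are assumed to have been done beforehand, and the function and variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_dataset(clear_blocks, n_trans=20, patch=16, t_range=(0.3, 0.95), A=1.0):
    """Simulate hazy 16 x 16 patches and target transmissions from clear 240 x 240 blocks."""
    patches, targets = [], []
    for block in clear_blocks:                           # block: (240, 240, bands) in [0, 1]
        for t in rng.uniform(*t_range, size=n_trans):    # 20 random transmissions per block
            hazy = block * t + A * (1.0 - t)             # haze degradation, Eq. (1)
            for i in range(0, block.shape[0], patch):    # tile into non-overlapping 16 x 16 patches
                for j in range(0, block.shape[1], patch):
                    patches.append(hazy[i:i + patch, j:j + patch, :])
                    targets.append(t)
    patches, targets = np.asarray(patches), np.asarray(targets)
    order = rng.permutation(len(targets))                # shuffle to break correlations
    split = int(0.9 * len(targets))                      # 90% training / 10% testing
    return (patches[order[:split]], targets[order[:split]],
            patches[order[split:]], targets[order[split:]])

# 60 blocks x 20 transmissions x 225 patches = 270,000 simulated patches (dataset D1)
```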
Figure 8. Examples of clear blocks sampled from Landsat 8 OLI data (path: 123, row: 32, acquisition date: 23 September 2015).

The MRCNN is trained through mini-batch stochastic gradient descent (MSGD) and an early-stopping mechanism. Gradient descent is a simple algorithm in which we repeatedly make small steps downward on an error surface defined by a loss function of some parameters. MSGD estimates the gradient from a mini-batch of examples to proceed more quickly. The batch size is set to 500 in our training. As the predicted variable is continuous, we use the mean squared error as the loss function:

L = (1/N) Σ_{k=1}^{N} ||tr_k − tp_k||^2,  (9)

where tr refers to the real transmission value, tp represents the predicted value, and N is the batch size. Early-stopping combats overfitting by monitoring the network's performance on a validation set. This technique relinquishes further optimization when the network's performance ceases to improve sufficiently on the validation set, or even degrades with further optimization. During training, we randomly chose 80% of the training samples to learn the parameters, and the remaining 20% of the training samples were used as the validation set to identify whether the network was overfitting. In our experiments, if the validation score is not improved by 1.0 × 10−5 within 10 epochs, the training process is terminated. The testing set is used to assess the final prediction performance of the trained
network with the best validation score. The filter weights of each convolutional layer are initialized through the Xavier initializer [45], which uniformly samples from a symmetric interval:

[ −sqrt(6 / (f_in + f_out)), sqrt(6 / (f_in + f_out)) ],  (10)
The learning rate is 0.01 and is decreased by 0.5 when reaching a learning plateau. We implement our model using the keras [55] package with the theano backend [56]. Based on the parameters above, the best validation score is 0.0289% obtained at epoch 198, with a test performance of 0.0288%. It takes 145.68 min to finish training using an NVIDIA Quadro K620 GPU. Hereafter, we refer to this trained MRCNN as MODEL-O.

3.4. Dehazing and Post-Processing

After finishing training, we can predict the haze transmission given a new hazy patch. It is feasible to feed a small block into the network at one time for prediction. For each pixel in the block, its 16 × 16 surrounding neighborhood is used to predict the haze transmission. For pixels on the borders of the block, a 16 × 16 surrounding neighborhood cannot be defined. We have implemented a simple algorithm to replicate borders that allows us to handle all the border pixels as any other pixel in the block, i.e., mirroring eight pixels (half of the patch size) of the border outwards to create the corresponding patches of the original border pixels. When handling a full-size image that occupies considerable physical memory, it is necessary to slice the original image and mosaic the output tiles to avoid running out of computer memory. The adjacent tiles overlap each other by a number of pixels equal to the size of the training patch, i.e., 16 pixels in our test, to prevent visual disruption. Furthermore, the original hazy image should be multiplied by a scale factor depending on the pixel depth of the original data to ensure that the input values are in [0, 1]. In this way, we can predict the complete haze transmission map of the input block or full-size image. Finally, the clear image can be recovered according to Equations (4) and (5). The lowest decile of the predicted transmission map is used as the threshold t0 in Equation (4).

Generally, the radiances of direct dehazing results are lower than those of the clear scenes, as both the haze contribution and the clear-scene aerosol are removed entirely. Meanwhile, inaccurate estimation of the atmospheric light might leave a residual in the radiances. Herein, we utilize clear regions, least influenced by haze, as a reference for the compensation of scene aerosol or correction of the residual. We slice the predicted transmission map into tiles of size 200 × 200. The tile with the maximum mean is considered as the clear region. The aim of compensation is to ensure that the mean radiance of the dehazed block over the clear region equals that of the original block in all bands. The process of compensation can be expressed as:

$$R'_i = R_i + (M_i^O - M_i^R), \qquad (11)$$

where i is the band number, R' is the final radiance, R is the directly recovered value, M^O represents the mean value of the original image block corresponding to the clear region, and M^R represents the mean value of the directly recovered image block corresponding to the clear region.
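A minimal NumPy sketch of this post-processing step is given below; the array names and the bands-last layout are assumptions for illustration, not part of the original implementation.

```python
import numpy as np

def clear_region(transmission, tile=200):
    """Return the slices of the 200 x 200 tile with the highest mean transmission (the clear region)."""
    h, w = transmission.shape
    best, sel = -1.0, None
    for r in range(0, h - tile + 1, tile):
        for c in range(0, w - tile + 1, tile):
            m = transmission[r:r + tile, c:c + tile].mean()
            if m > best:
                best, sel = m, (slice(r, r + tile), slice(c, c + tile))
    return sel

def compensate(dehazed, original, region):
    """Shift each band so that the dehazed mean over the clear region matches
    the original mean over the same region (Equation (11))."""
    out = dehazed.copy()
    for i in range(dehazed.shape[-1]):
        m_o = original[region + (i,)].mean()   # M_i^O
        m_r = dehazed[region + (i,)].mean()    # M_i^R
        out[..., i] += m_o - m_r
    return out
```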
3.5. Transferring Application

To reduce the demand for computer memory and time consumption, MODEL-O is trained on a limited dataset. It is feasible to apply MODEL-O for haze removal of surrounding areas that cover similar surface types. When handling images in another region, a newly trained network is required. Learning from scratch usually costs too much and is unnecessary. Transfer learning is an
effective approach to apply stored knowledge gained while solving one problem to a different but related problem. The core of haze removal in different areas is essentially the same, but the surface types are dissimilar. Thus, the new MRCNN can be retrained on the basis of MODEL-O. We choose another scene (path 119, row 038; acquisition date 14 April 2013) to collect new training samples. In total, 40 blocks of size 240 × 240 are sampled and then 180,000 simulated hazy patches are generated for training. Hereafter, we refer to this dataset as D2. The filter weights and biases are initialized using the learned parameters of MODEL-O, and other settings remain unchanged. The best validation score is 0.0128% obtained at epoch 43, with a test performance of 0.0168%. It takes 31.32 min to complete the optimization. Herein, the retrained MRCNN is named MODEL-T. After fine-tuning, MODEL-T gains the ability to handle images covering this new type of surface coverage and theoretically still owns the ability of the previous network. The trained network can extend its applicability constantly through transfer learning. The more new types of samples are used for fine-tuning, the stronger the network will be. We expect that the network will finally be capable of addressing various complex situations after several cycles.
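As a rough illustration of this retraining step, the sketch below fine-tunes a saved MODEL-O on the D2 patches in Keras; the file names, array names, batch size, and validation split are assumptions, and the early-stopping criterion mirrors the one used for the original training.

```python
import numpy as np
from keras.models import load_model
from keras.callbacks import EarlyStopping

# Hypothetical inputs: MODEL-O saved to disk, D2 patches and their transmissions as arrays.
model = load_model('model_o.h5')
x_d2 = np.load('d2_hazy_patches.npy')        # e.g., (180000, 16, 16, 5)
y_d2 = np.load('d2_transmissions.npy')       # e.g., (180000, 1)

model.compile(optimizer='sgd', loss='mse')   # same MSE objective as the original training
stopper = EarlyStopping(monitor='val_loss', min_delta=1e-5, patience=10)
model.fit(x_d2, y_d2, validation_split=0.2, epochs=50, batch_size=128, callbacks=[stopper])
model.save('model_t.h5')                     # the fine-tuned network, MODEL-T
```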
4. Experimental Results
4.1. Dehazing Results on Simulated Images

We first test the effectiveness of the trained networks using simulated hazy images. To better simulate spatially varying haze, randomly generated haze transmissions are uniformly sampled in a small range, such as [0.5, 0.6] or [0.6, 0.8], and smoothed with a Gaussian filter. The original clear images are used as references to assess the dehazed results quantitatively. MRCNN is compared with traditional haze removal methods (HOT [5], DCP [15], HTM [11]) and other networks (DehazeNet [30], VGGNet [57]). Herein, we use the trained networks with the best validation errors after 200-epoch training on D1 and 50-epoch fine-tuning on D2. Figure 9 shows an example of the dehazing results on simulated images using different methods. Serious color distortion appears in the HOT results. There still exists slight haze in DCP's result. The colors in the HTM results are oversaturated. The recovered image of DehazeNet looks dim and has poor local contrast. In contrast, the results of VGGNet and MRCNN are better visually.
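A minimal sketch of this simulation step is shown below (NumPy/SciPy); it follows the standard degradation model I = J·t + A·(1 − t) used for training-sample generation earlier in the paper, and the atmospheric light value and Gaussian smoothing width are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def simulate_hazy(clear, t_low=0.5, t_high=0.6, airlight=1.0, sigma=20, rng=np.random):
    """Create a spatially varying hazy image from a clear block (H x W x bands, scaled to [0, 1])."""
    h, w = clear.shape[:2]
    t = rng.uniform(t_low, t_high, size=(h, w))   # transmission sampled in a small range
    t = gaussian_filter(t, sigma=sigma)           # smooth it so the haze varies gradually
    t = t[..., None]                              # broadcast over the spectral bands
    hazy = clear * t + airlight * (1.0 - t)
    return hazy, t
```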
Figure 9. Dehazing results of different methods on simulated images: (a) original clear image; (b) HOT [5]; (c) DCP [15]; (d) HTM [11]; (e) simulated hazy image; (f) DehazeNet [30]; (g) VGGNet [57]; (h) MRCNN.
Furthermore, we collect 20 clear image blocks of size 1000 × 1000 and generate 400 hazy images in total for our test. To quantitatively assess different methods, we utilize a series of indices, including the mean square error (MSE), peak signal-to-noise ratio (PSNR), visual image fidelity (VIF) [58], universal quality index (UQI) [59] and structural similarity (SSIM) [60], for image quality assessment (IQA) of dehazed results. Table 2 reports the average IQA results on simulated images. All indices indicate that the trained networks obtain dehazing results superior to the traditional methods using priors and that MRCNN is state-of-the-art. Although MRCNN is optimized by the MSE loss function, it also achieves the best performance on the other types of evaluation indices. The average VIF is −0.0306, very close to zero, implying the image distortion is minimal or acceptable. The IQA result of UQI, which measures the loss of correlation, luminance distortion and contrast distortion of images, supports the same conclusion. Meanwhile, MRCNN gains relatively high values of SSIM, reflecting its powerful ability to preserve structural information.
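For reference, a sketch of how a few of these indices can be computed with scikit-image is given below (MSE, PSNR, SSIM; UQI and VIF need additional code or a separate package such as sewar). The channel_axis argument assumes scikit-image ≥ 0.19, and scaling the images to [0, 1] is an assumption.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def iqa(reference, dehazed):
    """MSE, PSNR and SSIM between a dehazed block and its clear reference (H x W x bands in [0, 1])."""
    mse = float(np.mean((reference - dehazed) ** 2))
    psnr = peak_signal_noise_ratio(reference, dehazed, data_range=1.0)
    ssim = structural_similarity(reference, dehazed, data_range=1.0, channel_axis=-1)
    return {'MSE': mse, 'PSNR': psnr, 'SSIM': ssim}
```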
Table 2. Image quality assessment results of dehazed results.

IQA     Hazy       HOT        HTM        DCP        DehazeNet   VGGNet     MRCNN
MSE     0.1206     0.0334     0.0146     0.0045     0.000810    0.000641   0.000579
PSNR    57.3178    62.8991    66.4915    71.5961    79.0475     80.0592    80.5033
VIF     −6.2915    −4.3330    −1.3643    −0.1087    −0.0417     −0.0337    −0.0306
UQI     0.0001     0.4647     0.6599     0.8787     0.9155      0.9345     0.9363
SSIM    0.9025     0.9820     0.9991     0.9996     0.9999      0.9999     0.9999

4.2. Dehazing Results on Hazy RS Images
In this section, we validate the effectiveness of different MRCNN models on hazy RS images. Figure 10 shows the dehazing results of MODEL-O on three sub-scenes cut out from a full scene (path 123, row 033, acquisition date 23 September 2015). The surface of this area is mainly covered by bare soil, with sparse vegetation. The visibility of the dehazed images is significantly improved and the color of the recovered surface is basically consistent with the actual surface. More details are visible, such as urban buildings in the central part of Figure 10d, the shape of rural villages in Figure 10e, and the transverse road in the middle of Figure 10f. Haze and thin clouds are entirely removed while thick cumulus clouds remain unchanged, especially in Figure 10b. It is proven that the trained model is suitable for haze removal of surrounding areas covering surface types similar to the training samples in D1.
Figure 10. Dehazing results (sub-scenes) of MODEL-O. (a–c) hazy sub-scenes cut out from a full scene (path 123, row 033, acquisition date 23 September 2015); (d–f) dehazed images using MODEL-O.
Figure 11 shows the dehazing results using MODEL-T. The three hazy images are from different scenes near the training samples in D2. Figure 11a,b covers rich vegetation while the surface in Figure 11c is much barer. The effect of the haze has been eliminated completely, and all surface covers, including bare soil, building areas, vegetation, water body etc., exhibit the proper color. As the training dataset contains samples for the water body, MODEL-T removes haze over the river properly, as shown in Figure 11e,f. MODEL-T handles these situations properly, implying that MRCNN gains the ability to solve new problems through transfer learning.
Figure 11. Dehazing results (sub-scenes) of MODEL-T. (a) hazy block (path 119, row 037, acquisition date 22 April 2016); (b) hazy block (path 119, row 038, acquisition date 22 July 2014); (c) hazy block (path 119, row 038, acquisition date 2 February 2016); (d–f) dehazed images using MODEL-T.
Figure 12. Dehazing results of full-size Landsat 8 OLI data using MODEL-T. (a–d) original hazy images with spatially varying haze or clouds; (a) path 123, row 032, acquisition date 3 October 2013; (b) path 123, row 033, acquisition date 7 May 2015; (c) path 119, row 038, acquisition date 22 July 2017; (d) path 119, row 038, acquisition date 29 December 2014; (e–h) dehazed images.
Figure 12 shows the results of four full-size images affected by haze or transparent clouds using MODEL-T. Visually, hazy regions are recovered successfully, and clear regions remain unchanged. Although Figure 12a,b are far away from the training area and show different characteristics, MODEL-T succeeds in removing the haze over them, proving that MODEL-T inherits the learned knowledge of MODEL-O. Since MRCNN extracts abstract features of input images for prediction, MODEL-T does well in removing the haze over apparently different water bodies in Figure 11b,c and Figure 12c,d, even though the training samples only contain one kind of water body, similar to Figure 11b.

Figure 13 shows the dehazing results in the coastal/aerosol band and near-infrared band of Landsat 8 OLI data. Obviously, the effect of haze has been eliminated, and the visibility of the results has been significantly enhanced.
Figure 13. Dehazing results in other bands: (a) coastal/aerosol band (0.43–0.45 µm) of Figure 10c; (b) near-infrared band (0.85–0.88 µm) of Figure 10c; (c) coastal/aerosol band of Figure 11c; (d) near-infrared band of Figure 11c; (e–h) dehazed results of (a–d) respectively.
5. Discussion

5.1. Model Performance

The estimation of the haze transmission map is of the greatest importance for haze removal. Its accuracy is directly related to the quality of dehazed images. To validate the accuracy of the proposed MRCNN for the prediction of haze transmission of RS images, the novel network is compared with DehazeNet [30] and VGGNet [57]. DehazeNet was originally designed for natural images and has five weight layers. We use the same network architecture except for replacing the BReLU [61] activation function with the Sigmoid [62] function, based on the experimental fact that the network is difficult to converge when using BReLU. VGGNet is designed for the image classification task. Herein, we adopt a network that has 16 weight layers and remove the softmax classifier to fit the regression problem. The proposed MRCNN has 13 weight layers, which achieves a balance between the network depth and trainable parameters. Three models are initialized in the same way and trained using the same datasets and method.
The qualities of the trained networks can be reflected by learning curves. As shown in Figure 14a, the training errors of the three networks reduce quickly and finally reach a relatively stable state. The speed of convergence for MRCNN is faster than that for DehazeNet and VGGNet. Compared with DehazeNet and VGGNet, the validation error of MRCNN declines more quickly and achieves a more stable state. With further training, the validation error for VGGNet and MRCNN varies between increasing and decreasing but can reduce to a lower value, indicating that the networks are trying to search for better solutions, while it only decreases to the original level for DehazeNet, implying that the network converges to a local minimum in the end and the learning process is apparently slowed down. The smaller amplitude of oscillation in the validation error indicates that MRCNN is much easier to adjust. When applying the trained networks to a new dataset, MRCNN can achieve the best result as shown in Figure 14c,d, originally with 0.0294% training error and 0.1455% validation error. Fine-tuning can improve the performance of the networks. Finally, DehazeNet, VGGNet, and MRCNN can obtain the best validation error of 0.0317%, 0.0310%, and 0.0168%, respectively. The validation error of VGGNet remains approximately unchanged after 20 epochs, indicating that the network might have encountered a bottleneck. We further validate the feasibility of the transferring application using a small number of training samples. We divide the dataset D2 into a 30% training set and a 70% testing set. It takes approximately 13.76 min to fine-tune MRCNN. The best validation score is 0.0179% obtained at epoch 45, with a test performance of 0.0193%. The accuracy is close to that of MODEL-T, but the training time is reduced greatly. Similar experiments are performed for DehazeNet and VGGNet. Their training errors closely approach 0.035% after 100 epochs of fine-tuning, while the validation and testing errors remain stable at over 0.04% without further improvement.
Figure 14. Learning curves for DehazeNet, VGGNet, and the proposed MRCNN. (a,b) training error and validation error on D1 during the original training process; (c,d) training error and validation error on D2 during the transferring application.

Figure 15 plots the predicted value versus the true transmission on the testing set. The first row corresponds to the results of the trained networks using D1, and the second row shows the results of the fine-tuned networks using D2. The testing errors of different networks are labeled
at the upper left of each sub-plot. The predicted values of DehazeNet are lower than the truth values when the real transmissions are close to 1. The center line of VGGNet is discrete, meaning that some transmissions are missed in the prediction result. For MRCNN, the predicted values are centered on the 45-degree line, and the testing errors are lower than those of the other networks. In fact, DehazeNet is inherently a shallow network and the extracted features are still low-level. The network generally suffers from different surface types, so the trained network is unable to handle other images in most cases. VGGNet, a deep learning architecture, is capable of extracting high-level, hierarchical, and abstract features to avoid the influence from the surface coverage. However, the high-level features tend to lose weak information and local details, resulting in difficulties for achieving accurate haze transmission estimation. In contrast, MRCNN takes advantage of the multi-scale context aggregation and residual learning to balance between deepening the depth of the network and preventing information loss. The trained or fine-tuned MRCNN can achieve higher prediction accuracy than the other networks. Although the three networks can obtain a similar final training error, MRCNN converges faster and achieves better validation errors and testing errors during the original training and the transferring application.
Figure 15. The density plots between the predicted and truth transmission for DehazeNet, VGGNet, and the proposed MRCNN. (a–c) the results using the testing set of D1 during the original training process; (d–f) the results using the testing set of D2 during transferring learning.

5.2. Analysis of Spectral Consistency
It is difficult to evaluate a single-image based haze removal method quantitatively as the ground information corresponding to the hazy image is usually unknown. Therefore, we choose a pair of hazy and haze-free images that have a minimal time difference and a minimal difference of sun/sensor geometry for our evaluation. The spectra of the images are compared in two aspects: (1) the difference between the hazy image and the dehazed result in clear regions must be minimal; and (2) the spectra of the dehazed result and the haze-free image are highly similar. Figure 16 illustrates the dehazing result (Figure 16b) of a hazy image (Figure 16a) and the selected haze-free image (Figure 16c). The hazy and haze-free data were acquired with a time difference of 16 days. The dehazed image exhibits good recovery of the surface except for slight residual under heavy hazy regions.
Figure 17 illustrates the spectra of pixels collected in the hazy (asterisk), dehazed (plus), and haze-free (diamond) images. Figure 17a,b are spectral profiles of pixels (cross 1 and 2 in Figure 17a) located in clear regions. The spectra of the different images have the same shape and similar DN values, proving that dehazing does not modify the spectral properties in clear regions. Figure 17c,d show spectral profiles of pixels (cross 3 and 4 in Figure 17a) located in hazy regions. The spectra of the different images have a similar shape. The DN values of the dehazed and haze-free image are in close agreement but are different from the hazy image. This suggests that dehazing adjusts the radiances of hazy pixels to a certain degree and produces a spectrally consistent result.
Figure 16. Comparison of dehazing result and haze-free image: (a) hazy image (path: 119, row: 038, acquisition date: 1 January 2016); (b) dehazed image using MODEL-T; (c) haze-free image (path: 119, row: 038, acquisition date: 16 December 2015).
Figure 17. Spectral profiles of pixels in hazy (asterisk), dehazed (plus) and haze-free (diamond) images. The location of sampled pixels is marked by cross 1–4 in Figure 8a. (a–d) spectra corresponding to cross 1 to 4, respectively. The X-axis is band index and Y-axis is DN (digital numbers) value.
Figure 18 presents the blue and red band profiles of the hazy (red line), dehazed (green line), and haze-free (blue line) images. Along the vertical line, the hazy image is relatively clear on both sides while affected by haze in the middle. Correspondingly, the left and right parts of Figure 18a,c have similar shapes and DN values, while the middle parts are different. Along the horizontal line, the image is entirely covered by haze. In Figure 18b,d, the curves of the hazy image are remarkably different from the others. In contrast, the spectra of the dehazed and haze-free image are highly overlapping. This indicates that dehazing maintains the similarity in clear regions properly and reveals a noticeable enhancement in hazy regions. The remaining differences can be attributed to residual haze, different atmospheric conditions, or a change of surface.
Figure 18. Band profiles taken from hazy (red line), dehazed (green line) and haze-free (blue line) images. The location is marked by the red cross in Figure 8a. (a) vertical profile of blue band (0.483 µm); (b) horizontal profile of blue band; (c) vertical profile of red band (0.655 µm); (d) horizontal profile of red band.
5.3. Influence on Vegetation Index

Spectral consistency of dehazing results ensures that haze removal would not affect the algorithms that rely on the spectral information of remote sensing images. The dehazed images are expected to be used as data sources for land cover classification and mapping, surface change detection and other applications involving ground information extraction. Herein, we implement a test of extracting
vegetation using the normalized difference vegetation index (NDVI) from the original hazy image, dehazed image, and haze-free reference image respectively. The expression of NDVI is written as:

$$\mathrm{NDVI} = \frac{\rho_{nir} - \rho_r}{\rho_{nir} + \rho_r}, \qquad (12)$$
where ρ is the reflectivity for the indicated band, and nir and r stand for the NIR and red band, respectively. An example is given in Figure 19. The original hazy image, covering the urban and suburban areas of Beijing, was acquired on 19 August 2014 and processed with MODEL-T. The haze-free reference image was acquired on 4 September 2014, having a minimal time difference with the hazy image. Both the original hazy or clear image and the dehazing result are run through atmospheric correction in the FLAASH [63] module of the ENVI software. Figure 19a–c shows the corrected results for the hazy image, dehazing result, and reference image respectively. Figure 19d–f are classification results using an NDVI threshold of 0.5. In this period, the vegetation in the north finishes growing and begins withering, yet a comparison of Figure 19d,f displays a rapid growth of vegetation, which defies common sense. Due to the influence of haze, the NDVI of the hazy image is smaller than usual, leading to the omission of some vegetation with a usual threshold. In contrast, the comparison of Figure 19e,f is much closer to the truth, which reflects a reduction of vegetation with the arrival of autumn. In fact, when adjusting the threshold of Figure 19d to 0.35, we reach the same conclusion as Figure 19e, with 66.74% of vegetation. This indicates that NDVI is strongly influenced by severe haze conditions, and haze removal can correct the bias to some degree.
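The classification step reduces to a simple thresholding of this index; a minimal sketch is shown below, where the reflectance arrays are placeholders for the FLAASH-corrected bands.

```python
import numpy as np

def ndvi(nir, red, eps=1e-6):
    """Normalized difference vegetation index from NIR and red reflectance (Equation (12))."""
    return (nir - red) / (nir + red + eps)

# Placeholder reflectance arrays; in practice these come from the FLAASH-corrected images.
nir_band = np.random.rand(200, 200)
red_band = np.random.rand(200, 200)

vegetation = ndvi(nir_band, red_band) >= 0.5          # threshold used in Figure 19d-f
print('vegetation fraction: %.2f%%' % (100.0 * vegetation.mean()))
```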
Figure 19. Atmospheric correction of dehazing results: (a) hazy image (path: 123, row: 032, acquisition date: 19 August 2014); (b) dehazed image using MODEL-T; (c) haze-free image (path: 123, row: 032, acquisition date: 4 September 2014); (d–f) classification results of (a–c) respectively; the green represents pixels whose NDVI are equal to or larger than 0.5; the brown represents pixels whose NDVI are smaller than 0.5; the numbers at the bottom indicate the percentage of vegetation.
6. Conclusions

We present a multi-scale residual convolutional neural network (MRCNN), which takes advantage of both spatial and spectral information for haze removal of remote sensing images. The overall architecture mainly contains four sequential modules: (1) spectral–spatial feature extraction, which utilizes 3D convolutional kernels to extract spatial–spectral correlation information; (2) multi-scale context aggregation, which uses dilated convolution to capture abstract features in different receptive fields and aggregate multi-scale contextual information without losing resolution; (3) residual learning, which avoids the loss of weak information while deepening the network for high-level features; and (4) fully connected layers, which take advantage of dropout to improve the generalization ability and prevent overfitting. The network takes hazy patches as input and outputs haze transmission. The training datasets are generated from clear image blocks by simulating the haze degradation process. Considering the difficulty of building a complete dataset containing various kinds of surface cover types, clear samples are collected in a local clear region of a single scene in our experiments. MRCNN is trained through mini-batch stochastic gradient descent (MSGD) with an early-stopping mechanism to minimize the mean squared error between the predicted values and the true haze transmissions. After finishing training, the network is capable of predicting the haze transmission of hazy images in surrounding areas. Post-processing is necessary for the correction of the latent residual of dehazed images.

The trained network can be reinforced and fine-tuned by means of further learning from new samples collected in other areas during the transferring application. The optimization costs little time since the initialized parameters have learned sufficient knowledge in the previous stage. The fine-tuned network not only gains the ability to solve new problems but also inherits the ability of the previous network. The trained network can extend its applicability constantly through transfer learning. Experiments show that the trained network can achieve a validation score of 0.0289% and a testing performance of 0.0288% during the original training, and a 0.1455% validation error during the transferring application, which can reach 0.0168% with further fine-tuning. Taking advantage of the multi-scale context aggregation and residual learning, MRCNN converges faster and can achieve a higher prediction accuracy compared with DehazeNet [30] and VGGNet [57].

We selected several scenes of Landsat 8 OLI data for haze removal. The result of image quality assessment indicates that the trained MRCNN is state-of-the-art in obtaining dehazed images whose color is consistent with the actual scene. Compared with the traditional methods based on different priors, the proposed MRCNN has a more powerful generalization ability. Since MRCNN extracts high-level, hierarchical, and abstract features for haze transmission estimation, it hardly suffers from different surface types and various haze or thin clouds. Meanwhile, MRCNN is able to preserve structural information and prevent the loss of correlation, luminance distortion, and contrast distortion of images. A comparison to haze-free reference data reveals that the dehazing process maintains the proper similarity in clear regions and produces a noticeable enhancement in hazy regions.
In addition, the spectral consistency of the dehazing results ensures that haze removal would not affect algorithms that rely on the spectral information of remote sensing images.

Author Contributions: N.L. conceived and designed the experiments; H.J. performed the experiments and wrote the paper.

Funding: This work was supported by the Young Talent Fund of Institute of Geographic Sciences and Natural Resources Research (2015RC203).

Acknowledgments: The Landsat 8 OLI data were obtained from the Global Visualization Viewer of the United States Geological Survey (USGS).

Conflicts of Interest: The authors declare no conflict of interest. The founding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.
References

1. Chavez, P.S. An improved dark-object subtraction technique for atmospheric scattering correction of multispectral data. Remote Sens. Environ. 1988, 24, 459–479. [CrossRef]
2. Chavez, P.S. Image-based atmospheric corrections revisited and improved. Photogramm. Eng. Remote Sens. 1996, 62, 1025–1036.
3. Liang, S.; Fang, H.; Chen, M. Atmospheric correction of Landsat ETM+ land surface imagery. I. Methods. IEEE Trans. Geosci. Remote Sens. 2001, 39, 2490–2498. [CrossRef]
4. Liang, S.; Fang, H.; Morisette, J.T.; Chen, M.; Shuey, C.J.; Walthall, C.; Daughtry, C.S.T. Atmospheric correction of Landsat ETM+ land surface imagery: II. Validation and applications. IEEE Trans. Geosci. Remote Sens. 2002, 40, 2736–2746. [CrossRef]
5. Zhang, Y.; Guindon, B.; Cihlar, J. An image transform to characterize and compensate for spatial variations in thin cloud contamination of Landsat images. Remote Sens. Environ. 2002, 82, 173–187. [CrossRef]
6. He, X.Y.; Hu, J.B.; Chen, W.; Li, X.Y. Haze removal based on advanced haze-optimized transformation (AHOT) for multispectral imagery. Int. J. Remote Sens. 2010, 31, 5331–5348. [CrossRef]
7. Jiang, H.; Lu, N.; Yao, L. A high-fidelity haze removal method based on HOT for visible remote sensing images. Remote Sens. 2016, 8, 844. [CrossRef]
8. Chen, S.L.; Chen, X.H.; Chen, J.; Jia, P.F.; Cao, X.; Liu, C.Y. An iterative haze optimized transformation for automatic cloud/haze detection of Landsat imagery. IEEE Trans. Geosci. Remote Sens. 2016, 54, 2682–2694. [CrossRef]
9. Sun, L.X.; Latifovic, R.; Pouliot, D. Haze removal based on a fully automated and improved haze optimized transformation for Landsat imagery over land. Remote Sens. 2017, 9, 972.
10. Liu, C.; Hu, J.; Lin, Y.; Wu, S.; Huang, W. Haze detection, perfection and removal for high spatial resolution satellite imagery. Int. J. Remote Sens. 2011, 32, 8685–8697. [CrossRef]
11. Makarau, A.; Richter, R.; Muller, R.; Reinartz, P. Haze detection and removal in remotely sensed multispectral imagery. IEEE Trans. Geosci. Remote Sens. 2014, 52, 5895–5905. [CrossRef]
12. Makarau, A.; Richter, R.; Schlapfer, D.; Reinartz, P. Combined haze and cirrus removal for multispectral imagery. IEEE Geosci. Remote Sens. Lett. 2016, 13, 379–383. [CrossRef]
13. Shen, H.F.; Li, H.F.; Qian, Y.; Zhang, L.P.; Yuan, Q.Q. An effective thin cloud removal procedure for visible remote sensing images. ISPRS J. Photogramm. Remote Sens. 2014, 96, 224–235. [CrossRef]
14. Mitchell, O.R.; Delp, E.J.; Chen, P.L. Filtering to remove cloud cover in satellite imagery. IEEE Trans. Geosci. Electron. 1977, 15, 137–141. [CrossRef]
15. He, K.; Sun, J.; Tang, X. Single image haze removal using dark channel prior. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 33, 2341–2353. [PubMed]
16. Xie, B.; Guo, F.; Cai, Z. Improved single image dehazing using dark channel prior and multi-scale retinex. In Proceedings of the 2010 International Conference on Intelligent System Design and Engineering Application, Changsha, China, 13–14 October 2010; pp. 848–851.
17. Zhu, Q.; Yang, S.; Heng, P.A.; Li, X. An adaptive and effective single image dehazing algorithm based on dark channel prior. In Proceedings of the 2013 IEEE International Conference on Robotics and Biomimetics, Shenzhen, China, 12–14 December 2013; pp. 1796–1800.
18. Xiao, C.; Gan, J. Fast image dehazing using guided joint bilateral filter. Vis. Comput. 2012, 28, 713–721. [CrossRef]
19. He, K.; Sun, J.; Tang, X. Guided image filtering. In Proceedings of the 2010 European Conference on Computer Vision, Heraklion, Greece, 5–11 September 2010; pp. 1–14.
20. Tarel, J.P.; Hautière, N. Fast visibility restoration from a single color or gray level image. In Proceedings of the 2009 IEEE International Conference on Computer Vision, Kyoto, Japan, 29 September–2 October 2009; pp. 2201–2208.
21. Long, J.; Shi, Z.W.; Tang, W.; Zhang, C.S. Single remote sensing image dehazing. IEEE Geosci. Remote Sens. Lett. 2014, 11, 59–63. [CrossRef]
22. Pan, X.; Xie, F.; Jiang, Z.; Yin, J. Haze removal for a single remote sensing image based on deformed haze imaging model. IEEE Signal Process. Lett. 2015, 22, 1806–1810. [CrossRef]
23. Jiang, H.; Lu, N.; Yao, L.; Zhang, X.X. Single image dehazing for visible remote sensing based on tagged haze thickness maps. Remote Sens. Lett. 2018, 9, 627–635. [CrossRef]
24. Tang, K.T.; Yang, J.C.; Wang, J. Investigating haze-relevant features in a learning framework for image dehazing. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Beijing, China, 23–28 June 2014; pp. 2995–3002.
25. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [CrossRef]
26. Zhu, Q.S.; Mai, J.M.; Shao, L. A fast single image haze removal algorithm using color attenuation prior. IEEE Trans. Image Process. 2015, 24, 3522–3533. [PubMed]
27. Schuler, C.J.; Burger, H.C.; Harmeling, S.; Scholkopf, B. A machine learning approach for non-blind image deconvolution. In Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Portland, OR, USA, 23–28 June 2013; pp. 1067–1074.
28. Eigen, D.; Krishnan, D.; Fergus, R. Restoring an image taken through a window covered with dirt or rain. In Proceedings of the 2013 IEEE International Conference on Computer Vision (ICCV), Sydney, Australia, 1–8 December 2013; pp. 633–640.
29. Dong, C.; Chen, C.L.; He, K.; Tang, X. Image super-resolution using deep convolutional networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 295–307. [CrossRef] [PubMed]
30. Cai, B.L.; Xu, X.M.; Jia, K.; Qing, C.M.; Tao, D.C. DehazeNet: An end-to-end system for single image haze removal. IEEE Trans. Image Process. 2016, 25, 5187–5198. [CrossRef] [PubMed]
31. Ghamisi, P.; Chen, Y.S.; Zhu, X.X. A self-improving convolution neural network for the classification of hyperspectral data. IEEE Geosci. Remote Sens. Lett. 2016, 13, 1537–1541. [CrossRef]
32. Yang, J.X.; Zhao, Y.Q.; Chan, J.C.W.; Yi, C. Hyperspectral image classification using two-channel deep convolutional neural network. In Proceedings of the 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Beijing, China, 10–15 July 2016; pp. 5079–5082.
33. Paoletti, M.E.; Haut, J.M.; Plaza, J.; Plaza, A. A new deep convolutional neural network for fast hyperspectral image classification. ISPRS J. Photogramm. Remote Sens. 2017. [CrossRef]
34. Sherrah, J. Fully convolutional networks for dense semantic labelling of high-resolution aerial imagery. arXiv 2016, arXiv:1606.02585.
35. Liu, Y.; Fan, B.; Wang, L.; Bai, J.; Xiang, S.; Pan, C. Semantic labeling in very high resolution images via a self-cascaded convolutional neural network. ISPRS J. Photogramm. Remote Sens. 2017. [CrossRef]
36. Langkvist, M.; Kiselev, A.; Alirezaie, M.; Loutfi, A. Classification and segmentation of satellite orthoimagery using convolutional neural networks. Remote Sens. 2016, 8, 329. [CrossRef]
37. Zhou, B.; Khosla, A.; Lapedriza, A.; Oliva, A.; Torralba, A. Object detectors emerge in deep scene CNNs. arXiv 2014, arXiv:1412.6856.
38. Diao, W.; Sun, X.; Dou, F.; Yan, M.; Wang, H.; Fu, K. Object recognition in remote sensing images using sparse deep belief networks. Remote Sens. Lett. 2015, 6, 745–754. [CrossRef]
39. Pacifici, F.; Frate, F.D.; Solimini, C.; Emery, W.J. An innovative neural-net method to detect temporal changes in high-resolution optical satellite imagery. IEEE Trans. Geosci. Remote Sens. 2007, 45, 2940–2952. [CrossRef]
40. Narasimhan, S.G.; Nayar, S.K. Vision and the atmosphere. Int. J. Comput. Vis. 2002, 48, 233–254. [CrossRef]
41. Narasimhan, S.G.; Nayar, S.K. Contrast restoration of weather degraded images. IEEE Trans. Pattern Anal. Mach. Intell. 2003, 25, 713–724. [CrossRef]
42. Cun, Y.L.; Boser, B.; Denker, J.S.; Howard, R.E.; Habbard, W.; Jackel, L.D.; Henderson, D. Handwritten digit recognition with a back-propagation network. Adv. Neural Inf. Process. Syst. 1990, 2, 396–404.
43. Nair, V.; Hinton, G.E. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 2010 International Conference on Machine Learning (ICML), Haifa, Israel, 21–24 June 2010; pp. 807–814.
44. Glorot, X.; Bordes, A.; Bengio, Y. Deep sparse rectifier neural networks. JMLR W&CP 2012, 15, 315–323.
45. Glorot, X.; Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. J. Mach. Learn. Res. 2010, 9, 249–256.
46. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. In Proceedings of the International Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–8 December 2012; pp. 1097–1105.
47. Ancuti, C.O.; Ancuti, C.; Hermans, C.; Bekaert, P. A fast semi-inverse approach to detect and remove the haze from a single image. Lect. Notes Comput. Sci. 2011, 6493, 501–514.
48. Li, Y.; Zhang, H.K.; Shen, Q. Spectral–spatial classification of hyperspectral imagery with 3D convolutional neural network. Remote Sens. 2017, 9, 67. [CrossRef]
49. Goodfellow, I.J.; Warde-Farley, D.; Mirza, M.; Courville, A.; Bengio, Y. Maxout networks. In Proceedings of the 2013 ICML, Atlanta, GA, USA, 16–21 June 2013; pp. 1319–1327.
50. Yu, F.; Koltun, V. Multi-scale context aggregation by dilated convolutions. arXiv 2015, arXiv:1511.07122.
51. He, K.M.; Zhang, X.Y.; Ren, S.Q.; Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. In Computer Vision—ECCV 2014, Part III; Springer: Cham, Switzerland, 2014; Volume 8691, pp. 346–361.
52. He, K.M.; Zhang, X.Y.; Ren, S.Q.; Sun, J. Deep residual learning for image recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
53. Hinton, G.E.; Srivastava, N.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R.R. Improving neural networks by preventing co-adaptation of feature detectors. Comput. Sci. 2012, 3, 212–223.
54. Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958.
55. Chollet, F. Keras. Available online: http://keras-cn.readthedocs.io/en/latest (accessed on 12 January 2018).
56. Theano Development Team; Alrfou, R.; Alain, G.; Almahairi, A.; Angermueller, C.; Bahdanau, D.; Ballas, N.; Bastien, F.; Bayer, J.; Belikov, A.; et al. Theano: A Python framework for fast computation of mathematical expressions. arXiv 2017, arXiv:1605.02688.
57. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
58. Sheikh, H.R.; Bovik, A.C. Image information and visual quality. IEEE Trans. Image Process. 2006, 15, 430–444. [CrossRef] [PubMed]
59. Wang, Z.; Bovik, A.C. A universal image quality index. IEEE Signal Process. Lett. 2002, 9, 81–84. [CrossRef]
60. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [CrossRef] [PubMed]
61. Wu, Z.; Lin, D.; Tang, X. Adjustable bounded rectifiers: Towards deep binary representations. arXiv 2015, arXiv:1511.06201.
62. Hinton, G.E.; Salakhutdinov, R.R. Reducing the dimensionality of data with neural networks. Science 2006, 313, 504–507. [CrossRef] [PubMed]
63. Cooley, T.; Anderson, G.P.; Felde, G.W.; Hoke, M.L.; Ratkowski, A.J.; Chetwynd, J.H.; Gardner, J.A.; Adler-Golden, S.M.; Matthew, M.W.; Berk, A.; et al. FLAASH, a MODTRAN4-based atmospheric correction algorithm, its application and validation. Int. Geosci. Remote Sens. Symp. 2002, 3, 1414–1418.

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).