Electronics (Article)

Saliency Preprocessing Locality-Constrained Linear Coding for Remote Sensing Scene Classification

Lipeng Ji 1,*, Xiaohui Hu 2 and Mingye Wang 1

1 School of Automation Science & Electrical Engineering, Beihang University, Beijing 100083, China; [email protected]
2 Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; [email protected]
* Correspondence: [email protected]; Tel.: +86-010-8231-7751

Received: 30 July 2018; Accepted: 26 August 2018; Published: 30 August 2018


Abstract: Locality-constrained Linear Coding (LLC) shows superior image classification performance due to its underlying properties of local smooth sparsity and good reconstruction. It encodes the visual features in remote sensing images and, through a computer, models human visual perception of an image. However, it does not consider the saliency preprocessing performed by the human visual system. Saliency detection preprocessing can effectively enhance a computer's perception of remote sensing images. To better accomplish the task of remote sensing image scene classification, this paper proposes a new approach combining saliency detection preprocessing and LLC. The saliency detection preprocessing is realized using spatial pyramid Gaussian kernel density estimation. Experiments show that the proposed method achieves better performance on remote sensing scene classification tasks.

Keywords: saliency preprocessing LLC; saliency detection; image processing; scene classification

1. Introduction

Over recent decades, an overwhelming number of high-resolution (HR) remote sensing images have become available. Since remote sensing images contain abundant structural patterns and spatial information that are difficult to apply directly, we need to correctly classify them by scene before further processing. Therefore, remote sensing scene classification is a central issue in remote sensing applications [1,2].

Various methods have been proposed to classify remote sensing scenes over the years. Bag-of-Features (BoF) [3,4] is a classical method in whole-image categorization tasks. This method first forms a histogram based on a remote sensing image's local features, and then uses the histogram to represent the image. However, this method does not consider the spatial layout of features in remote sensing images. There are several improvements on the BoF method, such as those shown in References [5,6]; among them, the Spatial Pyramid Matching (SPM) method [7] is a successful one. The SPM method divides a remote sensing image into spatial sub-regions at different scales. Then, histograms of local features from each sub-region are computed. Usually, 2^l × 2^l sub-regions (where l = 0, 1, 2) are used. SPM has shown better performance than BoF in most image classification tasks; however, the traditional SPM approach also has limitations: it requires nonlinear classifiers to complete classification. To improve on SPM, a new coding algorithm, named Locality-constrained Linear Coding (LLC), was proposed [8]. The LLC method is widely applied in image classification tasks. It considers both locality constraints and global sparsity when coding remote sensing images, and shows state-of-the-art classification accuracy. In this paper, the proposed remote sensing scene classification approach is based on LLC. The proposed approach

Electronics 2018, 7, 169; doi:10.3390/electronics7090169


will improve the codebook technology in traditional LLC by combining it with saliency detection technology, which benefits remote sensing scene classification.

LLC has achieved state-of-the-art performance in several image classification tasks; however, it is still based on modeling the cognitive mechanism of human vision to complete the image classification task. Physiological and psychophysical evidence indicates that the human visual system has evolved a specialized focus processing treatment called the "attentive" mode, which is directed to particular locations in the visual field [9]. Based on the "attentive" mode, researchers have proposed several visual saliency detection methods [10]. LLC does not take visual saliency detection into account. Thus, we can improve LLC by combining it with visual saliency detection.

Saliency detection was introduced into the field of computer vision in the late 1990s. From a computational point of view, saliency detection algorithms can be grouped into several categories. One category follows the center-surround idea: this type of algorithm assumes that a local window exists which can divide an image into a center containing an object and a surround [10–13]. Another category of saliency detection algorithms is based on frequency-domain methods [14–16]. The remaining category relies on information-theoretic concepts [17–19]. In Reference [13], Fast and Efficient Saliency Detection (FESD) was proposed, which belongs to the first category. In the FESD method, a saliency map is built by computing a kernel density estimate, which is faster to compute than several other density estimates [20,21]; however, for most remote sensing images, the saliency maps obtained using FESD are not always the best. This is because remote sensing images usually have a wide field of view, and the kernel size used for FESD, chosen by experience, is not necessarily optimal. However, we found that FESD can be improved using a simple technique named spatial pyramid Gaussian kernel density estimation (SPGKDE), which is suitable for saliency detection in remote sensing images. In our research, SPGKDE is used for the preprocessing of remote sensing images. Then, by combining it with LLC encoding, the classification accuracy for remote sensing images can be improved. The main contribution of this paper is to propose a new remote sensing scene classification method that combines SPGKDE saliency detection preprocessing and LLC to improve remote sensing scene classification accuracy.

This paper is structured as follows. Section 2 describes the proposed remote sensing image scene classification method in detail. In particular, this section outlines SPGKDE saliency detection preprocessing, and explains how and why remote sensing scene classification accuracy can be improved by SPGKDE preprocessing. Section 3 shows a comparison of the experimental results based on traditional LLC classification and the proposed method, followed by discussions. Section 4 concludes the paper.

2. Methodology

As mentioned above, this paper proposes a saliency preprocessing locality-constrained linear coding method for remote sensing scene classification. Figure 1 shows the realization of this method. The core technologies are SPGKDE saliency preprocessing and LLC. More specifically, descriptions of these two techniques are as follows.

[Figure 1 shows the pipeline: input image → SPGKDE → saliency map → image preprocessing → SIFT/LBP feature extraction → LLC → SVM → output label.]

Figure 1. The flow chart of the proposed method.


2.1. Spatial Pyramid Gaussian Kernel Density Estimation Saliency Detection Preprocessing

FESD builds a saliency map of a remote sensing image using Gaussian kernel density estimation [13]. This method also implicitly considers sparse sampling and center bias. Since the human eye focuses more on the center of an image, and people are accustomed to taking photos with the subject in the center, this method works well for most images. However, it is not universal for remote sensing images. In some remote sensing images, the scene occupies the entire image. Center bias can then lead to the loss of some salient information, which is detrimental to the classification of remote sensing images. We therefore need to improve FESD, and we propose SPGKDE. Figure 2 shows the weakness of the FESD method, namely the loss of salient information.
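The density-estimation idea underlying FESD can be illustrated with a generic Gaussian kernel density estimate over feature vectors, p(q) = (1/n) Σ_k N(q; f_k, h²I). The sketch below is only this generic estimator, not the exact FESD probability model Pr_n(1|f, x) of Reference [13]:

```python
import numpy as np

def gaussian_kde(samples, query, bandwidth=1.0):
    """Gaussian kernel density estimate of p(query) from d-dimensional samples.

    p(q) = (1/n) * sum_k N(q; sample_k, bandwidth^2 * I)
    """
    samples = np.atleast_2d(np.asarray(samples, dtype=float))   # (n, d)
    q = np.asarray(query, dtype=float).ravel()                  # (d,)
    n, d = samples.shape
    sq_dist = np.sum((samples - q) ** 2, axis=1) / bandwidth ** 2
    norm = (2.0 * np.pi) ** (d / 2.0) * bandwidth ** d          # Gaussian normaliser
    return float(np.exp(-0.5 * sq_dist).sum() / (n * norm))
```

In FESD this kind of estimate is evaluated over sparsely sampled pixel features; the bandwidth plays the role of the fixed kernel size criticized above.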

FESD FESD

SPGKDE SPGKDE

Figure 2. Saliency maps comparison between FESD (fast and efficient saliency detection method, Figure 2. Saliency Saliencymaps mapscomparison comparison between FESD and efficient saliency detection method, Figure between (fast(fast and efficient saliency detection method, which which 2. is proposed by reference [13]) andFESD SPGKDE (spatial pyramid Gaussian kernel density which is proposed by reference [13]) and SPGKDE (spatial pyramid Gaussian kernel density is proposed by reference [13]) and SPGKDE (spatial pyramid Gaussian kernel density estimation). estimation). estimation).

In this paper, a spatial pyramid Gaussian kernel density estimation (SPGKDE) saliency detection method based on FESD is proposed. It requires only minor changes but offers a great improvement over FESD. It is proposed to obtain a remote sensing image's saliency map more effectively, and the resulting map is used for saliency preprocessing locality-constrained linear coding.

The spatial pyramid method (SPM [7]) is a simple and practical method in computer vision. This method divides a remote sensing image, from coarse to fine, by level; local features in each level are then aggregated. Usually, 2^l × 2^l sub-regions (where l = 0, 1, 2) are used. The proposed method, SPGKDE, is based on this idea. Figure 3 shows the realization of SPGKDE.

[Figure 3 shows the pipeline: input image → spatial pyramid layers → FESD on each sub-region → splice and overlap the maps → saliency map.]

Figure 3. The flow chart of SPGKDE.
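The flow in Figure 3 can be sketched as follows. Here `fesd_saliency` is a hypothetical stand-in for the per-sub-region FESD map of Equation (1) (a simple min-max intensity normalization, so the sketch runs); the 2^l × 2^l split, the level weights d_l = 1/2^l, and the final I' = I + ξ·S follow Equations (2) and (3) below:

```python
import numpy as np

def fesd_saliency(region):
    """Hypothetical stand-in for the per-region FESD map of Equation (1).

    A real implementation convolves a circular averaging filter A with the
    kernel-density probability map Pr_n(1|f, x)^alpha [13]; here a min-max
    intensity normalisation keeps the sketch runnable.
    """
    r = region.astype(float)
    rng = r.max() - r.min()
    return (r - r.min()) / rng if rng > 0 else np.zeros_like(r)

def spgkde_saliency(image):
    """Equation (2): S = sum_l sum_i d_l * S_i^l over a 3-level pyramid."""
    h, w = image.shape                      # assumes a 2-D grayscale array
    S = np.zeros((h, w), dtype=float)
    for l in (0, 1, 2):
        n = 2 ** l                          # n x n sub-regions at level l
        d_l = 1.0 / 2 ** l                  # decaying level weight d_l = 1/2^l
        S_l = np.zeros((h, w), dtype=float)
        for i in range(n):
            for j in range(n):
                ys = slice(i * h // n, (i + 1) * h // n)
                xs = slice(j * w // n, (j + 1) * w // n)
                S_l[ys, xs] = fesd_saliency(image[ys, xs])  # splice the maps
        S += d_l * S_l                      # overlap (sum) the spliced levels
    return S

def preprocess(image, xi=0.5):
    """Equation (3): I' = I + xi * S, with xi in (0, 1]."""
    return image.astype(float) + xi * spgkde_saliency(image)
```

Because every level covers the whole image, no region is suppressed by center bias, which is the failure mode of plain FESD shown in Figure 2.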


Assume that there is a remote sensing image I and that R_i^l is one of its sub-regions. For each sub-region image R_i^l, each pixel exists as x = (x̄, f), where f is a feature vector extracted from R_i^l and x̄ denotes the coordinate of pixel x in R_i^l. For each sub-region R_i^l, we can get its corresponding saliency map S_i^l using the FESD method proposed in Reference [13]:

S_i^l(x) = A ∗ [Pr_n(1 | f, x̄)]^α    (1)

where ∗ is the convolution operator, A is a circular averaging filter, Pr_n(1 | f, x̄) is a calculated probability characterizing the pixels in the salient areas, and α ≥ 1 is a factor that accentuates high-probability areas. Then, the saliency map S of the remote sensing image I can be calculated as follows:

S = d_0 S_1^0 + d_1 ∑_{i=1}^{4} S_i^1 + d_2 ∑_{i=1}^{16} S_i^2 = ∑_{l=0}^{2} ∑_{i=1}^{4^l} d_l S_i^l    (2)

where d_l (l = 0, 1, 2) is a level weight, defined as d_l = 1/2^l, and S_i^l is acquired by Equation (1). As the level l increases, the weight d_l decays to prevent collecting too much useless salient information. Finally, a preprocessed image is obtained by:

I' = I + ξ · S    (3)

where I' is the preprocessed image, and ξ ∈ (0, 1] is used to avoid adding invalid details caused by the saliency map. SPGKDE has a stronger saliency detection ability than FESD because it re-aggregates salient information from different image spatial scales. After saliency detection preprocessing, the salient information can increase the inter-class variations between different remote sensing scenes, which is beneficial for improving the accuracy of classification tasks. In this paper, the Gaussian kernel is fixed at a 9 × 9 size, and the proportional control coefficient ξ is fixed at 0.5.

2.2. Saliency Preprocessing Locality-Constrained Linear Coding

Different kinds of coding algorithms have been proposed in the past few decades [5,8], most of which consist of feature extraction and feature coding. Experimental results have shown that, with a given visual codebook, different coding schemes directly affect remote sensing scene classification accuracies [22,23]. Meanwhile, sparsity is less essential than locality under certain assumptions, as pointed out in Reference [24]. Therefore, Locality-constrained Linear Coding (LLC) was proposed [8]. LLC is widely applied in image classification tasks. It adds locality restrictions to achieve global sparsity. Given a visual codebook, LLC provides analytical solutions, and it also has a fast coding speed. In this paper, LLC is selected as the basic method for remote sensing image feature coding.

Let F be a set of L-dimensional local descriptors extracted from the remote sensing image, i.e., F = [f_1, f_2, ..., f_P] ∈ R^{L×P}. Given a codebook B = [b_1, b_2, ..., b_Q] ∈ R^{L×Q} with Q entries, LLC obeys the following criterion:

min_C ∑_{i=1}^{P} ‖f_i − B c_i‖² + λ ‖d_i ⊙ c_i‖²,   s.t. 1^T c_i = 1, ∀i    (4)

where d_i ∈ R^Q is the locality adaptor and ⊙ denotes element-wise multiplication. Usually, we have:

d_i = exp(dist(f_i, B) / σ)    (5)

where dist(f_i, B) = [dist(f_i, b_1), ..., dist(f_i, b_Q)]^T, and dist(f_i, b_j) represents the distance between f_i and b_j. σ is a weight used for adjusting the decay speed of the locality adaptor. Further, we can get the following solutions:

c̃_i = (C_i + λ diag(d_i)) \ 1    (6)

c_i = c̃_i / (1^T c̃_i)    (7)

C_i = (B^T − 1 f_i^T)(B^T − 1 f_i^T)^T    (8)

Unlike most coding methods, LLC provides analytical solutions, which is of great benefit to computation. It can also be seen from Equation (8) that the quality of the given codebook B directly affects the coding results in LLC and, further, indirectly affects the classification accuracy of remote sensing image scene categories. To better complete the task of remote sensing scene classification, we make full use of the saliency detection information. Thus, we propose saliency preprocessing LLC.

For each remote sensing image I, we can obtain its preprocessed image I' by the method proposed in Section 2.1. We use images I and I' to compute the corresponding codebooks B and B'. Then, the given codebook B in Equation (8) can be replaced by B_final, given by the following formula:

B_final = (B + B') / 2    (9)

where codebook B represents the codebook of original remote sensing image features, and B' comes from the corresponding saliency-preprocessed remote sensing images. Figure 4 shows the differences between the traditional LLC codebook and the proposed method's codebook. Obviously, the features used to generate the new codebook B_final are more prominent than those of codebook B constructed from traditional LLC features, and this new codebook B_final will improve the performance of remote sensing scene classification.
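The closed-form coding of Equations (5)–(8) and the codebook fusion of Equation (9) can be sketched per descriptor in NumPy (the σ and λ values here are illustrative, not the paper's settings):

```python
import numpy as np

def llc_code(f, B, sigma=1.0, lam=1e-4):
    """Analytic LLC code of one descriptor f (length L) for codebook B (L x Q)."""
    dist = np.linalg.norm(B - f[:, None], axis=0)   # dist(f, b_j) for each atom
    d = np.exp(dist / sigma)                        # locality adaptor, Eq. (5)
    z = B - f[:, None]                              # shifted codebook, B^T - 1 f^T
    C = z.T @ z                                     # data covariance, Eq. (8)
    c = np.linalg.solve(C + lam * np.diag(d), np.ones(B.shape[1]))  # Eq. (6)
    return c / c.sum()                              # enforce 1^T c = 1, Eq. (7)

def fused_codebook(B, B_prime):
    """Equation (9): B_final = (B + B') / 2."""
    return 0.5 * (B + B_prime)
```

The λ diag(d) term makes the system matrix positive definite, so the solve is well conditioned even when Q exceeds the descriptor dimension L.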

Input:fi

Codebook: B={bj}j=1,…,q LLC

Codebook: Bfinal={½(bj+bj’)}j=1,…,q Saliency Preprocessing LLC

Figure 4. 4. Comparison Comparison between between LLC LLC (Locality-constrained (Locality-constrained Linear Linear Coding) Coding) and and proposed proposed method. method. Figure

Saliency coding. In Saliency preprocessing preprocessing LLC LLC only solves the problem of feature coding. In this this paper, Scale Invariant Feature Feature Transform Transform(SIFT) (SIFT)[25] [25] and Local Binary Pattern (LBP) [26]used are to used to extract and Local Binary Pattern (LBP) [26] are extract image image features and a support vector machine (SVM) [27,28] is used as the training classifier. Of features and a support vector machine (SVM) [27,28] is used as the training classifier. Of course, course, prove the effectiveness of the proposed thisuses paper uses a19-class public 19-class remote to provetothe effectiveness of the proposed method,method, this paper a public remote sensing sensing scene dataset to conduct experiments The dataset is proposed by Dengxin Dai Wen and scene dataset to conduct experiments [29,30].[29,30]. The dataset is proposed by Dengxin Dai and Yang, and named as WHU-RS dataset. The images in this dataset are a fixed size of 600 × 600 pixels. All the images are collected from Google Earth. In the early days, the dataset contained 12 categories of physical scenes in the satellite imagery [29]; later, it was expanded to 19 classes [30]. The 19 classes of the dataset are airport, beach, bridge, river, forest, farmland, meadow, mountain, pond, parking, port, park, viaduct, desert, football field, railway station, residential area, industrial area and commercial area.

The dataset has higher intra-class variations and smaller inter-class dissimilarities. The experimental results and discussions are given in the next section.

3. Experiments and Discussion

In this section, we focus on experiments based on the WHU-RS dataset, demonstrate the advantage of SPGKDE preprocessing for remote sensing images, and then show the performance of both the traditional LLC and the proposed method. Thereafter, the results are briefly discussed.

3.1. SPGKDE Preprocessing

Figure 5 shows several sample images of the WHU-RS dataset and the corresponding saliency detection results computed by SPGKDE (proposed in this paper).

[Figure 5 shows, for sample scenes (airport, commercial, football field, pond, park, parking, port and viaduct), the original image, the SPGKDE saliency map, and the preprocessed input image.]

Figure 5. Samples of the 19-class dataset, SPGKDE saliency detection, and the final preprocessed images for LLC input.

As shown in Figure 5, most of the images obtain useful and expected saliency maps via the proposed SPGKDE saliency detection, and after preprocessing, key parts of the images are made more prominent. The preprocessing technique can increase inter-class variations and reduce intra-class dissimilarities. Inevitably, there are some unsatisfactory saliency maps that may lead to classification confusion; for instance, in the pond scene image in Figure 5, the features of the pond are vague, whereas the bridge becomes rather prominent after saliency detection preprocessing, so the image could be wrongly classified into the park scene class. This phenomenon is, however, very rare. In fact, a very high percentage of images receive correct saliency detection preprocessing.

3.2. Performance of Traditional LLC and the Proposed Method

In this paper, two kinds of low-level features are used separately to form a codebook for LLC; namely, Scale Invariant Feature Transform (SIFT) and Local Binary Pattern (LBP).
3.2.1. Performance Based on the SIFT Feature

The SIFT vector has a dimension of 128. Half of the dataset is used as the training set and the other half is used for testing. The classification accuracies of traditional LLC and the proposed method based on SIFT are shown in Table 1, where the BoF performance is also shown as a baseline.

Table 1. Classification accuracies based on Scale Invariant Feature Transform (SIFT).

Descriptor | Traditional BoF | Traditional LLC | Proposed Method
SIFT       | 72.87%          | 73.27%          | 79.01%

Figure 6 shows the confusion matrices generated by the traditional LLC and the proposed method based on SIFT, which allows the differences between the two methods to be observed in more detail. We can observe that more scene image categories, such as airport, bridge and commercial area, were correctly classified by the proposed method. Although the classification accuracies of the meadow, parking and residential area categories were reduced, especially that of meadow, which dropped sharply by 12%, the proposed method improved the classification accuracy over the entire public dataset by about 6%.

Figure 6. Confusion matrices based on SIFT: (a) traditional LLC (SIFT); (b) proposed method (SIFT).

To investigate the impact of saliency detection preprocessing, different proportions of training and test data are used in the experiments. The training proportion ranges from 20% to 80%. The experimental results are shown in Figure 7. The classification accuracies are clearly improved with our proposed method.

Figure 7. Different ratios of training set experiments based on SIFT.

3.2.2. Performance Based on LBP Feature

The LBP vector is a 256-dimensional feature vector. Half of the dataset is used as the training set and the other half is used for testing. The classification accuracies of the traditional LLC and the proposed method based on LBP are shown in Table 2, where the BoF performance is shown as a baseline method as well.

Table 2. Classification accuracies based on Local Binary Pattern (LBP).

Descriptor   Traditional BoF   Traditional LLC   Proposed Method
LBP          68.71%            72.87%            78.22%
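For reference, the 256-dimensional LBP descriptor used here can be sketched as follows. This is a minimal 8-neighbour variant operating on a grayscale image array, not necessarily the exact multiresolution LBP of Ojala et al. cited by the paper:

```python
import numpy as np

def lbp_histogram(img):
    """Basic 8-neighbour LBP: each interior pixel is coded by thresholding
    its 8 neighbours against the centre value, giving a code in [0, 255];
    the descriptor is the normalized 256-bin histogram of these codes."""
    img = np.asarray(img, dtype=np.int32)
    c = img[1:-1, 1:-1]                       # centre pixels (borders skipped)
    h, w = img.shape
    # 8 neighbours, clockwise from the top-left, weighted by powers of two
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(c)
    for bit, (dy, dx) in enumerate(offsets):
        nb = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes += (nb >= c).astype(np.int32) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()                  # 256-dimensional feature vector
```

The normalized histogram is what gets fed into BoF, LLC, or the proposed pipeline as the per-image (or per-patch) LBP feature.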

Electronics 2018, 7, 169

To observe more classification details of the proposed method, we present the confusion matrices of the traditional LLC and the proposed method in Figure 8. The confusion matrices are generated by the traditional LLC and the proposed method based on LBP. We can see that the classification accuracies of four categories, namely football field, industrial, meadow, and mountain, are reduced, accounting for 21% of all categories, whereas the classification accuracy of the entire public dataset is improved by approximately 5.4%.
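The trade-off described above, where some per-class accuracies drop while the overall accuracy rises, can be read directly off a confusion matrix. A small helper with an illustrative two-class example (the matrices below are made up for illustration, not the paper's results):

```python
import numpy as np

def accuracies(cm):
    """Per-class accuracy (recall) and overall accuracy from a confusion
    matrix whose rows are true classes and columns are predictions."""
    cm = np.asarray(cm, dtype=float)
    per_class = np.diag(cm) / cm.sum(axis=1)  # correct / total per true class
    overall = np.trace(cm) / cm.sum()         # correct / total over all samples
    return per_class, overall

# Toy illustration: class 0 gets slightly worse (0.9 -> 0.8) while the
# overall accuracy still improves (0.75 -> 0.85).
before = np.array([[9, 1], [4, 6]])
after = np.array([[8, 2], [1, 9]])
```

This is exactly the pattern in Figures 6 and 8: a few diagonal entries shrink, but the diagonal as a whole grows.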

Figure 8. Confusion matrices based on LBP: (a) Traditional LLC (LBP); (b) Proposed method (LBP).

Toinvestigate investigatethe theimpact impactofofsaliency saliencydetection detectionpreprocessing, preprocessing,different differentproportions proportionsofoftraining training To To investigate the impact of saliency detection preprocessing, different proportions of training and andtest testare areused usedfor forthe theexperiments. experiments.The Thetraining trainingproportion proportionranged rangedfrom from20% 20%toto80%. 80%.The The and test are used for the experiments. The training proportion ranged from 20% to 80%. The experimental experimentalresults resultsare areshown shownininFigure Figure9.9.As Ascan canbebeseen, seen,the theclassification classificationaccuracies accuraciesare are experimental results are shown in Figure 9. As can be seen, the classification accuracies are improved with our improvedwith withour ourproposed proposedmethod. method. improved proposed method.

Figure 9. Different ratios of training set experiments based on LBP.
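The ratio experiments of Figures 7 and 9 can be mimicked with a simple per-class split. The sketch below is illustrative only: the classifier is a nearest-centroid stand-in on synthetic data, not the linear SVM used in the paper, and `split_per_class` is our own helper name:

```python
import numpy as np

def split_per_class(X, y, ratio, rng):
    """Split the indices of each class into train/test at the given train ratio."""
    tr, te = [], []
    for c in np.unique(y):
        idx = rng.permutation(np.where(y == c)[0])
        k = max(1, int(ratio * len(idx)))
        tr.extend(idx[:k])
        te.extend(idx[k:])
    return np.array(tr), np.array(te)

def nearest_centroid_accuracy(X, y, ratio, seed=0):
    """Train at the given ratio and classify test points by nearest class centroid."""
    rng = np.random.default_rng(seed)
    tr, te = split_per_class(X, y, ratio, rng)
    classes = np.unique(y)
    centroids = np.stack([X[tr][y[tr] == c].mean(axis=0) for c in classes])
    dist = ((X[te][:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    pred = classes[dist.argmin(axis=1)]
    return (pred == y[te]).mean()

# Synthetic demo: two well-separated Gaussian classes, sweeping the
# training proportion from 20% to 80% as in Figures 7 and 9.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (50, 8)), rng.normal(5.0, 1.0, (50, 8))])
y = np.repeat([0, 1], 50)
accs = [nearest_centroid_accuracy(X, y, r) for r in (0.2, 0.4, 0.6, 0.8)]
```

Splitting per class rather than globally keeps every training proportion balanced across categories, which matters for datasets like WHU-RS with many classes of equal size.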

Through the above experiments, we can see that the proposed method can improve the scene classification accuracies of remote sensing images, regardless of whether the features are extracted by the SIFT or LBP methods. However, the classification accuracies of the meadow category are reduced using the proposed method. To determine why the meadow scene images are more difficult to identify using the proposed method, we reviewed the entire process of handling meadow-category images. We found that almost all the meadow-category images in the WHU-RS dataset did not have an obvious saliency region, as shown in Figure 10. The most striking feature of the meadow-category images was the color. When we used SPGKDE to preprocess meadow-category images, not only did we obtain little color information, but spurious saliency region information was also introduced. Furthermore, weakening the color information and forcibly mining saliency region information can lead to generating the wrong codebook for LLC, which is detrimental for classification based on LLC. This may explain why the proposed method reduced the classification accuracies of the meadow category. Fortunately, there are saliency regions in most remote sensing images, and this problem arises only for meadow-category images.
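Since the quality of the codebook directly determines the LLC codes, it is worth recalling the approximated LLC coding step of Wang et al. [8], which this paper builds on. A minimal numpy sketch (variable names and the regularization constant `beta` are our choices):

```python
import numpy as np

def llc_code(x, B, k=5, beta=1e-4):
    """Approximated LLC coding: reconstruct descriptor x from its k nearest
    codewords under a sum-to-one constraint.
    x: (d,) descriptor; B: (M, d) codebook. Returns an M-dim sparse code."""
    d2 = ((B - x) ** 2).sum(axis=1)
    idx = np.argsort(d2)[:k]                 # k nearest codewords
    z = B[idx] - x                           # shift codewords to the descriptor
    C = z @ z.T                              # local covariance
    C += beta * np.trace(C) * np.eye(k)      # regularize for numerical stability
    w = np.linalg.solve(C, np.ones(k))       # solve the constrained least squares
    w /= w.sum()                             # enforce sum(w) = 1
    code = np.zeros(B.shape[0])
    code[idx] = w
    return code
```

Each descriptor thus receives at most k non-zero coefficients that sum to one, which is why a misleading codebook (as in the meadow case) is so damaging: the few codewords selected no longer reflect the true local structure of the feature.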


Figure 10. Meadow category images and their corresponding SPGKDE saliency maps.

From the above experimental results, we have good reason to believe that the proposed method can improve remote sensing scene classification performance.

4. Conclusions

In this paper, an improved saliency detection method named SPGKDE is proposed, based on the existing saliency detection method FESD, for remote sensing image preprocessing. The new saliency detection method re-aggregates salient information from different image space scales and is clearly more applicable to remote sensing images. SPGKDE has a wider field of view than FESD.

Thus, a new remote sensing scene classification method that combines SPGKDE saliency detection preprocessing with LLC is proposed. The method is easy to operate. Visual saliency detection plays an important role in remote sensing image analysis, yet traditional LLC classification ignores it, resulting in limited image classification accuracy. This paper proposes integrating the improved saliency detection method SPGKDE into LLC classification. This method can increase inter-class variations and reduce intra-class dissimilarities. In fact, the preprocessing method improves the codebook technology in traditional LLC, a core technology that directly determines the classification result. This method achieves a better simulation of the human visual system than traditional LLC. In this paper, both SIFT and LBP features were used for the experiments. The experiments show that the proposed method is useful and can improve remote sensing scene classification accuracy.

Author Contributions: Conceptualization, L.J.; Data curation, L.J. and M.W.; Formal analysis, L.J.; Funding acquisition, X.H.; Investigation, M.W.; Methodology, L.J. and X.H.; Software, L.J. and M.W.; Writing—original draft, L.J.

Funding: This research was funded by the National Natural Science Foundation of China (U1435220).

Acknowledgments: The authors acknowledge the National Natural Science Foundation of China (U1435220).

Conflicts of Interest: The authors declare no conflicts of interest.


References

1. Cheng, G.; Han, J.; Guo, L.; Liu, Z.; Bu, S.; Ren, J. Effective and efficient midlevel visual elements-oriented land-use classification using VHR remote sensing images. IEEE Trans. Geosci. Remote Sens. 2015, 53, 4238–4249.
2. Anwer, R.M.; Khan, F.S.; van de Weijer, J.; Molinier, M.; Laaksonen, J. Binary patterns encoded convolutional neural networks for texture recognition and remote sensing scene classification. ISPRS J. Photogramm. Remote Sens. 2018, 138, 74–85.
3. Sivic, J.; Zisserman, A. Video Google: A text retrieval approach to object matching in videos. In Proceedings of the IEEE International Conference on Computer Vision, Nice, France, 13–16 October 2003; Volume 2, pp. 1470–1477.
4. Csurka, G.; Dance, C.R.; Fan, L.; Bray, C. Visual categorization with bags of keypoints. In Workshop on Statistical Learning in Computer Vision, ECCV, 2004, 44, 1–22.
5. Bosch, A.; Zisserman, A.; Muñoz, X. Scene classification using a hybrid generative/discriminative approach. IEEE Trans. Pattern Anal. Mach. Intell. 2008, 30, 712–727.
6. Yang, L.; Jin, R.; Sukthankar, R.; Jurie, F. Unifying discriminative visual codebook generation with classifier training for object category recognition. In Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA, 23–28 June 2008.
7. Lazebnik, S.; Schmid, C.; Ponce, J. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Proceedings of the Conference on Computer Vision and Pattern Recognition, New York, NY, USA, 17–22 June 2006; pp. 2169–2178.
8. Wang, J.; Yang, J.; Yu, K.; Lv, F.; Huang, T.; Gong, Y. Locality-constrained linear coding for image classification. In Proceedings of the Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010; pp. 3360–3367.
9. Koch, C.; Ullman, S. Shifts in selective visual attention: Towards the underlying neural circuitry. Hum. Neurobiol. 1985, 4, 219–227.
10. Achanta, R.; Estrada, F.; Wils, P.; Süsstrunk, S. Salient region detection and segmentation. In Computer Vision Systems; Springer: Berlin, Germany, 2008; pp. 66–75.
11. Seo, H.J.; Milanfar, P. Training-free, generic object detection using locally adaptive regression kernels. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 32, 1688–1704.
12. Rahtu, E.; Heikkilä, J. A simple and efficient saliency detector for background subtraction. In Proceedings of the IEEE International Conference on Computer Vision Workshops, Kyoto, Japan, 27 September–4 October 2009; pp. 1137–1144.
13. Tavakoli, H.R.; Rahtu, E. Fast and efficient saliency detection using sparse sampling and kernel density estimation. In Image Analysis; Springer-Verlag: Berlin, Germany, 2011; pp. 666–675.
14. Guo, C.; Ma, Q.; Zhang, L. Spatio-temporal saliency detection using phase spectrum of quaternion Fourier transform. In Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA, 23–28 June 2008.
15. Hou, X.; Zhang, L. Saliency detection: A spectral residual approach. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Minneapolis, MN, USA, 17–22 June 2007.
16. Achanta, R.; Hemami, S.; Estrada, F.; Susstrunk, S. Frequency-tuned salient region detection. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 1597–1604.
17. Bruce, N.D.B.; Tsotsos, J.K. Saliency based on information maximization. In Proceedings of the 18th International Conference on Neural Information Processing Systems (NIPS'05), Cambridge, MA, USA, 5–8 December 2005; pp. 155–162.
18. Lin, Y.; Fang, B.; Tang, Y. A computational model for saliency maps by using local entropy. In Proceedings of the National Conference on Artificial Intelligence, Atlanta, GA, USA, 11–15 July 2010; Volume 2, pp. 967–973.
19. Mahadevan, V.; Vasconcelos, N. Spatiotemporal saliency in dynamic scenes. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 32, 171–177.
20. Aiazzi, B.; Alparone, L.; Baronti, S.; Garzelli, A.; Zoppetti, C. Nonparametric change detection in multitemporal SAR images based on mean-shift clustering. IEEE Trans. Geosci. Remote Sens. 2013, 51, 2022–2031.
21. Liu, J.; Tang, Z.; Cui, Y.; Wu, G. Local competition-based superpixel segmentation algorithm in remote sensing. Sensors 2017, 17, 1364.
22. Chen, J.; Li, Q.; Peng, Q.; Wong, K.H. CSIFT based locality-constrained linear coding for image classification. Pattern Anal. Appl. 2015, 18, 441–450.
23. Yu, K.; Zhang, T.; Gong, Y. Nonlinear learning using local coordinate coding. In Proceedings of the Advances in Neural Information Processing Systems 22, Vancouver, BC, Canada, 7–10 December 2009; pp. 2223–2231.
24. Wang, S.; Ding, Z.; Fu, Y. Marginalized denoising dictionary learning with locality constraint. IEEE Trans. Image Process. 2018, 27, 500–510.
25. Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110.
26. Ojala, T.; Pietikainen, M.; Maenpaa, T. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 971–987.
27. Boser, B.E.; Guyon, I.M.; Vapnik, V.N. A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual ACM Workshop on Computational Learning Theory, New York, NY, USA, 27–29 July 1992; pp. 144–152.
28. Fan, R.E.; Chang, K.W.; Hsieh, C.J.; Wang, X.-R.; Lin, C.-J. LIBLINEAR: A library for large linear classification. J. Mach. Learn. Res. 2008, 9, 1871–1874.
29. Dai, D.; Yang, W. Satellite image classification via two-layer sparse coding with biased image representation. IEEE Geosci. Remote Sens. Lett. 2011, 8, 173–176.
30. Sheng, G.; Yang, W.; Xu, T.; Sun, H. High-resolution satellite scene classification using a sparse coding based multiple feature combination. Int. J. Remote Sens. 2012, 33, 2395–2412.

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
