Sensors (Article)

Feature Selection Method Based on High-Resolution Remote Sensing Images and the Effect of Sensitive Features on Classification Accuracy

Yi Zhou 1, Rui Zhang 1,2, Shixin Wang 1,* and Futao Wang 1,*

1 Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences, Beijing 100101, China; [email protected] (Y.Z.); [email protected] (R.Z.)
2 University of Chinese Academy of Sciences, Beijing 100049, China
* Correspondence: [email protected] (S.W.); [email protected] (F.W.)
Received: 26 May 2018; Accepted: 20 June 2018; Published: 22 June 2018
Abstract: With the advent of high spatial resolution remote sensing imagery, numerous image features can be utilized. Applying a reasonable feature selection approach is critical to effectively reduce feature redundancy and improve the efficiency and accuracy of classification. This paper proposes a novel feature selection approach in which ReliefF, a genetic algorithm and a support vector machine (RFGASVM) are integrated to extract buildings. We adopt the ReliefF algorithm to preliminarily filter the high-dimensional features in the feature database. After eliminating the low-ranked features, the remaining feature subset and the C and γ parameters of the support vector machine (SVM) are encoded into the chromosome of the genetic algorithm. A fitness function is constructed considering the sample identification accuracy, the number of selected features and the feature cost. The proposed method was applied to high-resolution images obtained from different sensors: GF-2, BJ-2 and unmanned aerial vehicles (UAV). The confusion matrix, precision, recall and F1-score were applied to assess the accuracy. The results showed that the proposed method achieved feature reduction, and the overall accuracy (OA) was more than 85%, with Kappa coefficient values of 0.80, 0.83 and 0.85, respectively. The precision for each image was more than 85%. The time efficiency of the proposed method was two-fold greater than that of SVM with all features. The RFGASVM method has the advantages of large feature reduction and high extraction performance and can be applied in feature selection.

Keywords: SVM; feature selection; genetic algorithm; object-based; accuracy evaluation
1. Introduction

High-resolution remote sensing images are widely used in different fields, such as land cover mapping and monitoring, classification analysis, road detection and automatic building extraction in complex environments [1]. Extraction methods can be categorized into three main groups, namely visual interpretation, pixel-based and object-based methods; the main difference among them is the basic analysis unit. The visual interpretation method is inefficient and easily affected by human factors. Pixel-based approaches use pixels as the basic analysis units and cannot satisfy the need for information extraction as image spatial resolution increases, whereas object-based approaches split an image into homogeneous regions (objects) of different sizes containing multiple pixels [2]. The latter also consider the spectral, geometric, texture and topological relationships of image objects, which makes it possible to utilize contextual information. Although the generation of hundreds of different features for each image object is an advantage of object-based approaches, the large number of features results in two main problems at the same time. On the one hand, the computational burden of the procedure and the calculation of features become
Sensors 2018, 18, 2013; doi:10.3390/s18072013
www.mdpi.com/journal/sensors
time-consuming. More importantly, the classification accuracy is degraded when samples are limited. Thus, optimizing the feature subset is important for classification and information extraction based on high spatial resolution images [3]. Consequently, feature selection methods can be applied to tackle these problems. Feature selection methods include filter, wrapper and embedded algorithms. Filter algorithms remove features directly from the original feature set, independently of the learning algorithm, which may be a classification algorithm or a clustering algorithm. Some studies of filter methods have selected features for object-based extraction. ReliefF (RF) is a typical feature selection algorithm, which assigns higher weights to features associated with categories, removes irrelevant features quickly and has a high operating efficiency when dealing with multiclass problems. However, the RF algorithm has difficulty removing redundant features from datasets. Wrapper and embedded algorithms select features concurrently with the learning process and generally lead to better results than filter methods; an example is genetic algorithms, which measure the performance of features with a classifier and improve the effect of the learning algorithm at the same time. Numerous object-based models for selecting features have been developed in recent decades to study optimum feature selection; these models include empirical analysis [4], separability and thresholds (SEaTH) [5], minimal redundancy maximal relevance (mRMR) [6] and the genetic algorithm (GA) [7]. Most studies have focused on one method, and only a few works have concentrated on coupling methods. Rajesh et al. [8] proposed a GA-based method for selecting a subset from the combination of wavelet packet statistical and wavelet packet co-occurrence textural feature sets. Wang et al. [9] adopted the ReliefF algorithm (RF) to eliminate redundant features.
Thus, combining the advantages of previous methods and optimizing both the goodness-of-fit and the number of variables is worth studying. With the development of space and sensor technology, the amount of high-resolution remote sensing data has increased dramatically [10], and features have become massive and high-dimensional. Thus, extracting effective features of targets from feature sets is a key stage in information extraction. Previous works have focused on single feature extraction methods and pixel-based analyses. However, another problem in object-based methods is that primitive features are calculated over large areas, so the efficiency and precision of information extraction are also challenges. These works do not take advantage of the different types of feature selection and object-based methods and do not take into account the optimization of classifier parameters. To solve the problem of high-dimensional feature redundancy and slow convergence in object-based information extraction, we propose a feature extraction method integrating ReliefF, the genetic algorithm and the support vector machine, namely the RFGASVM method, which combines the filter with the wrapper method. Our approach can be described in four steps. First, improved multi-scale segmentation is utilized to construct blocks of homogeneous regions. Second, the features are ranked based on their weights and irrelevant features are removed. Third, the preliminarily selected feature subsets are encoded with the support vector machine (SVM) parameters (C and γ) into the chromosome and optimized with the genetic algorithm (GA). Finally, the SVM classifier is employed to test the proposed method, and its sensitivity is compared with related methods.

2. Methodology

2.1. Data Sets

The described methodology was applied to high-resolution optical satellite images (1 m data for GaoFen-2 and 0.5 m data for Beijing-2) and 0.2 m unmanned aerial vehicle (UAV) images (Figure 1).
The sizes of the sample images from the dataset were 900 × 500, 1000 × 1000 and 1500 × 1000, respectively. The remote sensing images included two kinds of data: panchromatic and multispectral (blue, green, red and near-infrared). For the optical remote sensing images, radiometric calibration, Gram–Schmidt pan sharpening algorithm fusion and atmospheric correction were used to obtain high spatial resolution multispectral images. For original aerial drone images, Pixel-Grid software was used
to correct the distortion of the original photos, and the images were rotated according to the actual overlapping direction. A position and orientation system (POS) was used under the condition of no control points: through POS-assisted aerial triangulation with three-dimensional free-network adjustment, an original digital orthophoto map (DOM) was generated from the original single-photo mosaic. Although the urban remote sensing imagery used in this work was of high resolution, some object edges were still fuzzy, which resulted in the objects being unrecognizable from the background. Therefore, edge enhancement with an image Gaussian filtering method that reduces the effect of noise was used. This method can also decrease the complexity of image computation and remove system noise [10]. Edge enhancement is widely used in fields such as pattern recognition and image semantic segmentation.
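As an illustration of this preprocessing step, the following is a minimal sketch of Gaussian smoothing followed by unsharp-mask edge enhancement on a single-band NumPy image. The `sigma` and `amount` values are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def enhance_edges(band, sigma=1.0, amount=1.5):
    """Gaussian-smooth a band to suppress noise, then sharpen edges with
    an unsharp mask (original plus a scaled high-frequency residual)."""
    band = band.astype(np.float64)
    smoothed = gaussian_filter(band, sigma=sigma)  # noise reduction
    residual = band - smoothed                     # high-frequency detail
    return smoothed + amount * residual            # edge-enhanced result
```

On a flat region the residual is zero, so the filter leaves homogeneous areas untouched while boosting contrast at object boundaries.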
(a)
(b)
(c)
Figure 1. Experimental data ((a) GF-2 satellite data, (b) BJ-2 satellite data, and (c) unmanned aerial vehicle (UAV) data).
2.2. Related Theories

This research attempted to extract buildings by employing object-based image analysis. The goal of extraction is to obtain the highest identification accuracy while using relatively few features. In this work, we used the SVM classifier to extract building information. The two parameters, namely C and γ, considerably influence the final classification accuracy. In feature selection based on the previous GA, the number of features was barely considered, and the optimization and improvement of the input parameters of the classifier were not considered. This work combined the GA and SVM classifiers using the RF feature weighting algorithm. When the fitness function in the GA was set, three factors were considered (classification accuracy, number of features and feature cost). This case is a typical multi-objective optimization problem. Multi-objective optimization enables multiple targets to reach the optimal state at the same time under specific constraints.
2.2.1. Multiresolution Segmentation Methods Based on High-Resolution Remote Sensing Images

We defined typical land cover elements for segmentation according to their characteristics. An image is segmented into clusters, called objects, which carry shape information [11]. The created image objects should represent real objects [12,13]. In the present study, an adaptive multiscale segmentation model was used to create image objects and to optimize the scale parameters. For high-resolution remote sensing images, the fractal net evolution approach (FNEA) is a bottom-up region-growing algorithm. Based on the principle of least heterogeneity, neighboring pixels with similar spectral information are merged into a homogeneous image object. All pixels that belong to the same object after segmentation represent the same feature.
In image segmentation, the spatial, spectral and shape features of the image object operate simultaneously to generate an object with spectral homogeneity and homogeneous spatial and shape characteristics. The scale parameter of the FNEA segmentation algorithm is the region merger cost, which is a threshold of "heterogeneity change" when objects are merged. The multiscale expression of images was thereby achieved to a certain extent; however, the results of previously set scaling parameters were barely recorded before segmentation, so this method obtained a limited number of multiscale expressions. For issues such as unclear hierarchical relationships and scale conversion, an efficient graph-based image segmentation model (EGSM) was proposed by Felzenszwalb in 2004 [14]. This work adopted the optimal scale method, a novel bilevel scale-set model (BSM), which was proposed by Hu [15] based on the EGSM. The method combines the FNEA algorithm and layered iterative optimization of regional consolidation methods.
The regional hierarchy structure was constructed, and a multiscale representation of the house images was obtained (Figure 2). Using the BSM, global evolutionary analysis and unsupervised scale-set reduction were applied; the hierarchical region consolidation process was recorded completely, the hierarchical relationships were preserved, and each region was indexed on a scale. Scale reduction based on global evolution analysis was performed according to the minimum-risk Bayesian decision framework. The BSM can be used to calculate the image segmentation result at any scale inversely, so as to solve the problem of adjusting the scale parameters.
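The least-heterogeneity merge criterion that drives FNEA-style region growing can be sketched as follows. This is a simplified spectral-only cost written for illustration; full implementations additionally weight a shape term, which is omitted here.

```python
import numpy as np

def color_heterogeneity_change(obj1, obj2, weights=None):
    """Spectral part of an FNEA-style merge cost: the increase in
    size-weighted standard deviation caused by merging two objects
    (arrays of shape (n_pixels, n_bands)). Region growing merges the
    neighboring pair with the smallest change, i.e. least heterogeneity."""
    merged = np.vstack([obj1, obj2])
    n1, n2, nm = len(obj1), len(obj2), len(merged)
    if weights is None:
        weights = np.ones(obj1.shape[1])  # equal band weights
    delta = 0.0
    for b, w in enumerate(weights):
        delta += w * (nm * merged[:, b].std()
                      - (n1 * obj1[:, b].std() + n2 * obj2[:, b].std()))
    return delta
```

Merging two spectrally identical objects costs nothing, while merging spectrally distinct ones increases the cost, so thresholding this value acts as the "heterogeneity change" scale parameter described above.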
(a)
(b)
(c)

Figure 2. Results of segmented images ((a) GF-2 satellite data, (b) BJ-2 satellite data, and (c) UAV data).
2.2.2. Feature Extraction Structure

In the investigated images (obtained from satellite and UAV platforms), the variable features were extracted using eCognition 9.1. Such features included the spectral, geometry, texture, shadow, context and geoscience auxiliary features of image objects. To test the performance of feature optimization and selection, we collected a total of 113 features from the high-resolution remote sensing images (GF-2 and BJ-2 satellite images) and 67 features from the UAV images. A description of the object features is shown in Tables 1 and 2. The UAV images only contained R, G and B bands; as such, their spectral and shadow characteristics considerably differed from those of the satellite images.

Table 1. Description of features extracted from high resolution remote-sensing images.
Spectral features: Mean L (R, G, B, NIR); brightness; SD L (R, G, B, NIR); ratio L (R, G, B, NIR); max.diff; MBI index (Huang Xin et al.); BAI: (B − NIR)/(B + NIR); NDBI: (MIR − NIR)/(MIR + NIR); NDVI: (NIR − R)/(NIR + R); DVI: NIR − R; RVI: NIR/R; SAVI: 1.5 × (NIR − R)/(NIR + R + 0.5); OSAVI: (NIR − R)/(NIR + R + 0.16); SBI: (R^2 + NIR^2)^0.5; NDWI: (G − NIR)/(G + NIR)

Geometrical features: Area; length; width; length/width; boundary length; pixel number; shape index; density; main direction; asymmetry; compactness; rectangular fit; elliptic fit; differential of morphological profiles (DMP)

Textural features: GLCM entropy; GLCM angular second moment; GLCM correlation; GLCM homogeneity; GLCM contrast; GLCM mean; GLCM SD; GLCM dissimilarity; GLDV angular second moment; GLDV entropy; GLDV contrast; GLDV mean

Shadow indexes: SI: (R + G + B + NIR)/4; indices related to shadow: Chen1: 0.5 × (G + NIR)/R − 1; Chen2: (G − R)/(R + NIR); Chen3: (G + NIR − 2R)/(G + NIR + 2R); Chen4: (R + B)/(G − 2); Chen5: |R + G − 2B|

Geo-auxiliary features: Digital elevation model (DEM); slope; aspect; building vectors

Contextual features: Object numbers; object layers; image resolution; mean of image layers

(Remark: Mean L: mean of the bands; SD L: standard deviation of the bands; ratio L: ratio of the bands; MBI: Morphological Building Index; BAI: Building Area Index; NDBI: Normalized Difference Build-up Index; NDVI: Normalized Difference Vegetation Index; DVI: Difference Vegetation Index; RVI: Ratio Vegetation Index; SAVI: Soil-Adjusted Vegetation Index; OSAVI: Optimized Soil-Adjusted Vegetation Index; SBI: Soil Brightness Index; NDWI: Normalized Difference Water Index; GLCM: Gray Level Co-occurrence Matrix; GLDV: Grey Level Difference Vector; SI: Shadow Index; Chen: custom features.)
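Several of the band-ratio indices in Table 1 are straightforward to compute from band arrays. A minimal NumPy sketch follows; the bands are assumed to be reflectance-like floats, and the epsilon guard against division by zero is an implementation detail, not part of the paper.

```python
import numpy as np

def spectral_indices(R, G, B, NIR):
    """Compute a few of the Table 1 spectral indices from band arrays."""
    R, G, B, NIR = (x.astype(np.float64) for x in (R, G, B, NIR))
    eps = 1e-12  # guard against division by zero
    return {
        "NDVI": (NIR - R) / (NIR + R + eps),      # vegetation
        "NDWI": (G - NIR) / (G + NIR + eps),      # water
        "SAVI": 1.5 * (NIR - R) / (NIR + R + 0.5),  # soil-adjusted
        "BAI":  (B - NIR) / (B + NIR + eps),      # building area
        "SBI":  np.sqrt(R**2 + NIR**2),           # soil brightness
    }
```

In the object-based workflow, each index is averaged over a segmented object's pixels to yield one feature value per object.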
Table 2. Description of features extracted from the unmanned aerial vehicle image.

Spectral features: Mean L (R, G, B, NIR); brightness; SD L (R, G, B, NIR); ratio L (R, G, B, NIR); max.diff; Green Index: GR = G/(R + G + B); Red-Green Vegetation Index: NGRDI = (G − R)/(G + R); GLI = (2G − R − B)/(2G + R + B)

Geometrical features: Area; length; width; length/width; boundary length; pixel number; shape index; density; main direction; asymmetry; compactness; rectangular fit; elliptic fit; differential of morphological profiles (DMP); normalized digital surface model (nDSM); height standard deviation

Textural features: GLCM entropy; GLCM angular second moment; GLCM correlation; GLCM homogeneity; GLCM contrast; GLCM mean; GLCM SD; GLCM dissimilarity; GLDV angular second moment; GLDV entropy; GLDV contrast; GLDV mean

Shadow indexes: Chen4: (R + B)/(G − 2); Chen5: |R + G − 2B|

Contextual features: Object numbers; object layers; image resolution; mean of image layers

Geo-auxiliary features: Digital elevation model (DEM); slope; aspect; building vectors

(Remark: GR: Green Index; GLI: Green Leaf Index; NGRDI: Red-Green Vegetation Index.)
2.2.3. Feature Selection Based on the ReliefF Algorithm and Coupled GA-SVM Models

ReliefF algorithm: ReliefF (RF) is an extension of the Relief method, which is efficient in estimating the quality of attributes but is limited to two-class problems. ReliefF can calculate distances between sample distributions, reliably estimate probabilities, and handle incomplete and multiclass data sets while the complexity remains the same [16,17]. When dealing with multiclass problems (and, for continuous data, regression problems), the RF algorithm does not uniformly select the nearest neighbor samples from all different sample sets but selects the nearest neighbor samples from each class of samples; the importance of a feature is evaluated by calculating its ability to separate the nearest neighbors of any two classes. Given a sample set S, a sample R is selected from S, and the K nearest neighbors of R are found. The closest same-class instance of R is called the "near-hit (NH)," and the closest different-class instance of R is called the "near-miss (NM)." The weight of feature t is denoted as ω_t and is updated as below. To reduce the randomness in feature evaluation, the entire process is repeated m times and the average is taken as the final weight:

$$\omega_t^i = \omega_t^{i-1} + \sum_{c \neq \mathrm{class}(x)} \frac{p(c)}{1 - p(\mathrm{class}(x))} \cdot \frac{\sum_{j=1}^{k} \mathrm{diff}(x, M_j(x))}{m \times k} - \frac{\sum_{j=1}^{k} \mathrm{diff}(x, H_j(x))}{m \times k},$$
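A compact sketch of this update rule follows: a simplified multiclass ReliefF in which diff() is the range-normalized per-feature distance, and the nearest hits and misses play the roles of H and M defined in the text. The sampling and normalization details are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def relieff_weights(X, y, m=50, k=3, seed=0):
    """Simplified multiclass ReliefF: for m sampled instances, reward
    features that separate the k nearest misses of each other class
    (weighted by class prior) and penalize distance to the k nearest hits."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    span = X.max(axis=0) - X.min(axis=0) + 1e-12   # for normalized diff()
    classes, counts = np.unique(y, return_counts=True)
    prior = dict(zip(classes, counts / n))
    w = np.zeros(d)
    for _ in range(m):
        i = rng.integers(n)
        x, c = X[i], y[i]
        diff = np.abs(X - x) / span                # per-feature diff(x, .)
        dist = diff.sum(axis=1)
        dist[i] = np.inf                           # exclude the sample itself
        same = y == c
        hits = np.argsort(dist[same])[:k]          # k nearest same-class
        w -= diff[same][hits].sum(axis=0) / (m * k)
        for c2 in classes:
            if c2 == c:
                continue
            other = y == c2
            misses = np.argsort(dist[other])[:k]   # k nearest of class c2
            w += (prior[c2] / (1 - prior[c])
                  * diff[other][misses].sum(axis=0) / (m * k))
    return w
```

A discriminative feature accumulates large miss distances and small hit distances, and so ends up with a high weight.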
where diff() indicates the distance between samples on feature t; M(x) and H(x) represent the closest different-class sample (near-miss) and the closest same-class sample (near-hit) of sample x, respectively; p(c) represents the proportion of samples of class c in S; m is the number of iterations; and k is the number of nearest neighbors.

SVM model: The basic principles of SVM can be found in the studies of Cortes and Vapnik [18] and Devroye et al. [19]. SVM provides the optimal hyperplane (Figure 3) to maximize the margin between the closest positive and negative samples and is effective in working with linearly non-separable and high-dimensional datasets [20]. The white and black points are samples of two categories (Figure 3). H is the classification line, H1 and H2 are the parallel lines through the closest samples of each class, and the distance between them is the classification interval. The optimal classification hyperplane makes the classification correct while maximizing the separation margin.
Figure 3. The optimal hyperplane.

The original SVM algorithm seeks a linear decision surface (H) using $f(x) = w^T x + b$, where $w$ is the coefficient vector and $b$ is the offset. The linear SVM achieves the optimal hyperplane by solving the following optimization problem:

$$\min \frac{1}{2}\|W\|^2 \quad \mathrm{s.t.}\ Y_i (W^T X_i + b) \geq 1,\ i = 1, 2, 3, \ldots, N.$$

The optimization of the optimal hyperplane can be converted into a Lagrangian dual problem:

$$L(W, b, a) = \frac{1}{2}\|W\|^2 - \sum_{i=1}^{N} a_i \left[ Y_i (W^T X_i + b) - 1 \right],$$

where $a_i \geq 0$ is the Lagrangian multiplier. The final classification discriminant function can be expressed as

$$f(X) = \sum_{i=1}^{N} a_i Y_i X_i^T X + b.$$

In most cases, SVM maps nonlinear training samples to a high-dimensional feature space and constructs linear discriminant functions there. One of the most popular and frequently used kernel functions is the radial basis function (RBF), which has good generalization ability:

$$f(X) = \sum_{i=1}^{N} a_i Y_i K(X_i, X) + b.$$

SVM uses a kernel function to map nonlinearly separable classes from a low-dimensional to a higher-dimensional feature space. RBF is a useful function and has been implemented widely. It can map non-linear primitive features to high dimensions and deal with problems of non-linear separability; the linear kernel function is a special case of RBF. In contrast, the polynomial kernel function has a large number of parameters and requires the inner products to be calculated; as a result, the model is complex and there are calculation problems, such as overflow. The small number of RBF kernel parameters makes the model more convenient and efficient to calculate. An RBF kernel needs two parameters (C and γ) which should be set to obtain an improved classification model. C is a preset value that penalizes misclassification, and γ controls the width of the RBF kernel [21]. To obtain an optimal combination of C and γ, the present work used grid search and 10-fold cross-validation. Grid search is a process where various combinations of C and γ are selected within a predefined range at a certain interval. Cross-validation is used to test the accuracy of classification in terms of the different combinations of C and γ.
GA: This algorithm consists of a series of genetic operations, such as selection, crossover and mutation, to generate a new generation of groups which gradually evolve to include or come close to the optimal solution [22]. In feature selection, first, the feature set to be optimized and the C and γ of the SVM classifier are encoded into a chromosome. A fitness function is constructed considering the recognition accuracy of the house, and an initial population is generated. The initial population is selected through selection and cross-mutation operations. Individuals in the population are optimized to produce the optimal subset of features and the optimal C and γ.

The basic procedure for chromosome coding can be summarized as shown in Figure 4. In the chromosome design, the chromosome includes three parts: the feature set, C and γ. The detailed design for the chromosome is shown in Figure 4, where $g_f^1$ to $g_f^{n(f)}$ is the encoding of the feature set (f), in which $n(f)$ represents the number of bits of the code: 1 represents that the feature is selected and 0 represents that the feature is ignored; $g_C^1$ to $g_C^{n(C)}$ encodes the SVM parameter C, and $g_\gamma^1$ to $g_\gamma^{n(\gamma)}$ encodes the SVM parameter γ, while $n(C)$ and $n(\gamma)$ represent the bits of those codes.
Figure 4. The chromosome design. SVM: support vector machine.
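As a sketch of this encoding, a chromosome can be split into its three parts and the parameter bits decoded onto a value grid; the bit widths, parameter ranges, and the linear decoding used below are illustrative assumptions, not the paper's actual configuration:

```python
# Decode a 0/1 chromosome laid out as [n(f) feature bits | n(C) bits for C | n(gamma) bits for gamma].
def decode(chrom, n_f, n_c, n_g, c_range=(0.1, 100.0), g_range=(0.001, 10.0)):
    """Split a chromosome into (feature mask, C, gamma)."""
    mask = chrom[:n_f]
    c_bits = chrom[n_f:n_f + n_c]
    g_bits = chrom[n_f + n_c:n_f + n_c + n_g]
    # Interpret a bit list as an unsigned integer (most significant bit first).
    to_int = lambda bits: sum(b << i for i, b in enumerate(reversed(bits)))
    # Map the integer linearly onto [lo, hi].
    scale = lambda v, bits, lo, hi: lo + v * (hi - lo) / (2 ** len(bits) - 1)
    c = scale(to_int(c_bits), c_bits, *c_range)
    g = scale(to_int(g_bits), g_bits, *g_range)
    return mask, c, g
```

A feature bit of 1 keeps the feature, 0 drops it, matching the g_f encoding described above.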
Based on the feature weights calculated by ReliefF, the initial population is provided for GA. In the population initialization phase, the population size of GA is based on the feature weights, and the maximum number of iterations must be set appropriately. The weights of the retained features are normalized, and the result is set as the probability that the feature is selected. An individual's fitness in the proposed method is primarily determined by three evaluation criteria, namely the classification accuracy, the size of the selected feature subset and the feature cost. However, the feature cost was ignored in previous studies. Thus, a small feature subset with a low total feature cost and a high classification accuracy is preferred, and the individual demonstrating the best fitness is chosen during the evolutionary process. An individual's fitness can be obtained as follows:

fitness = W_a × accuracy + W_f × (1 − (1/n) Σ_{i=1}^{n} f_i × C_i),

where W_a represents the weight of the classification accuracy, which is the classification result on the test samples; W_f represents the weight of the number of selected features with their feature costs; and C_i represents the cost of feature i. If f_i = 1, then the feature is selected as an input feature for the SVM classifier; if f_i = 0, then the feature is ignored. Algorithm 1 shows the flow of the proposed feature selection method.
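The fitness formula above translates directly into code; in this sketch, the weights W_a and W_f, the accuracy value, and the per-feature costs C_i are placeholder inputs rather than values from the paper:

```python
# fitness = W_a * accuracy + W_f * (1 - (1/n) * sum(f_i * C_i))
def fitness(accuracy, mask, costs, w_a=0.8, w_f=0.2):
    """mask: 0/1 feature selection bits f_i; costs: per-feature costs C_i."""
    n = len(mask)
    feature_cost = sum(f * c for f, c in zip(mask, costs)) / n
    return w_a * accuracy + w_f * (1.0 - feature_cost)
```

A fitter individual is one that classifies the test samples accurately while keeping few, cheap features, which is exactly the trade-off the second term rewards.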
Algorithm 1. Flow of the proposed feature selection method.
Input: S is an initial sample feature set and g_f^{n(f)}, g_C^{n(C)}, g_γ^{n(γ)} are the initial population, where f encodes the feature set, and C and γ are the encoded SVM parameters.
Output: Extracted features based on the RFGASVM method.
Repeat
1. Sequence the sample feature set using ReliefF; the weight of feature t (ω_t) is updated m times to obtain the average value.
2. Initialize the population with the RFGA method.
3. Set the individual's fitness. The feature cost is (1/n) Σ_{i=1}^{n} f_i × C_i, where C_i represents the cost of feature i and f_i = 1, 0.
Until the termination test is met.
4. Output a small feature subset with a low total feature cost and a high classification accuracy.
(Remark: RFGASVM: ReliefF, genetic algorithm, and support vector machine method; RFGA: ReliefF integrated with the genetic algorithm method.)
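Step 1 of Algorithm 1 relies on Relief-style feature weighting. A minimal binary-class sketch of the weight update is shown below; the full ReliefF used in the paper additionally averages over k nearest hits/misses and handles multiple classes, and the m sampled instances here are chosen at random:

```python
import random

def relief(samples, labels, m=20, seed=0):
    """samples: list of feature vectors scaled to [0, 1]; returns one weight per feature."""
    rng = random.Random(seed)
    n_feat = len(samples[0])
    w = [0.0] * n_feat
    dist = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    for _ in range(m):
        i = rng.randrange(len(samples))
        x, y = samples[i], labels[i]
        # Nearest neighbour of the same class (hit) and of the other class (miss).
        hit = min((s for j, s in enumerate(samples) if j != i and labels[j] == y),
                  key=lambda s: dist(x, s))
        miss = min((s for j, s in enumerate(samples) if labels[j] != y),
                   key=lambda s: dist(x, s))
        # Reward features that separate classes, penalize features that vary within a class.
        for f in range(n_feat):
            w[f] += (abs(x[f] - miss[f]) - abs(x[f] - hit[f])) / m
    return w
```

Features whose values differ between classes but agree within a class accumulate large weights, which is what allows the sorted features to be filtered before the GA stage.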
The algorithm mentioned above is shown in Figure 5.
Figure 5. Flow chart of the extraction of house information based on the RFGASVM optimization algorithm.
2.3. Accuracy Assessment
The accuracy assessment was conducted using a confusion matrix from the perspective of classification, and the performance of the SVM classifier was evaluated by precision, recall, and F1-score based on the recognition rate. We evaluated the accuracy of the proposed method from these two perspectives.
From the perspective of classification, the overall accuracy (OA), the producer's accuracy (PA), the user's accuracy (UA), and the Kappa coefficient (Kappa) [23] were evaluated using the accuracy evaluation function of eCognition. The Kappa coefficient is the most significant one, because it marks the robustness of an algorithm. If the coefficient is over 0.6, the algorithm is recognized as having good performance. OA is an overall assessment which indicates the general performance of the technique.

OA = (TP + TN) / T,

Kappa = (T × (TP + TN) − Σ) / (T × T − Σ),
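These confusion-matrix metrics, together with the recognition-rate measures (precision, recall, F1-score) defined later in this section, can be sketched directly from the four counts TP, FP, TN and FN:

```python
# Binary building / non-building confusion-matrix metrics from Section 2.3.
def oa(tp, fp, tn, fn):
    """Overall accuracy: fraction of all T pixels labelled correctly."""
    t = tp + fp + tn + fn
    return (tp + tn) / t

def kappa(tp, fp, tn, fn):
    """Kappa = (T*(TP+TN) - S) / (T*T - S), with S the chance-agreement term."""
    t = tp + fp + tn + fn
    s = (tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)
    return (t * (tp + tn) - s) / (t * t - s)

def precision(ntp, nfp):
    """Percentage of extracted buildings that are real."""
    return ntp / (ntp + nfp) * 100.0

def recall(ntp, nfn):
    """Percentage of actual buildings that were extracted."""
    return ntp / (ntp + nfn) * 100.0

def f1(pre, rec):
    """Harmonic mean of precision and recall."""
    return 2.0 * pre * rec / (pre + rec)
```

For a balanced example with TP = TN = 40 and FP = FN = 10, the chance term S is 5000, so Kappa = (8000 − 5000) / (10000 − 5000) = 0.6, exactly the threshold the paper cites for good performance.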
PA = TP / (TP + FN),

UA = TP / (TP + FP),

where Σ = (TP + FP) × (TP + FN) + (FN + TN) × (FP + TN); TP is the correctly extracted pixels; FP is the incorrectly extracted pixels; TN is the non-building pixels that are correctly rejected; and FN is the building pixels that are not detected.
From the perspective of the recognition rate [24], precision is the percentage of building objects that are correctly classified by the SVM classifier. Recall is the percentage of correctly classified building objects among all actual buildings. The F1-score is a combination of precision and recall:

Pre = Ntp / (Ntp + Nfp) × 100%,

Rec = Ntp / (Ntp + Nfn) × 100%,

F1 = 2 × (Pre × Rec) / (Pre + Rec).

The building extraction can be considered to be a binary classification, where building objects are positives and the remaining non-building objects are negatives. Ntp denotes the number of buildings correctly extracted; the detected buildings are at least partially real. Nfp denotes the number of buildings mistakenly extracted. Nfn denotes the number of buildings that were not detected.

3. Experimental Results and Discussion

3.1. Selection of Building Samples

According to the characteristics of different sensor images and the principle that human eyes can recognize houses in high-resolution images, a remote sensing classification system for houses was determined. The houses were divided into four types: high-rise buildings, multistorey buildings, factory buildings and general houses. Various typical house samples were selected through the segmentation of GF-2, BJ-2 and UAV images. During sample selection, the samples were distributed as evenly as possible and included each type of house. Given that the SVM model was used, samples of roads, vegetation, shadows, water and bare land were also selected. To reduce the influence of mixed pixels on classification accuracy, we tried to avoid mixed pixels when selecting a sample. The house training and test samples are shown in Figure 6, and the selected land types and their quantities are listed in Table 3.
Figure 6. Schematic of training and test samples.
Table 3. Sample statistics selected from different high-resolution images.

Data         Sample Category    Building   Road   Vegetation   Shadow   Water   Bare Land
GF-2 image   Training samples   95         75     85           68       70      92
             Testing samples    106        92     113          72       85      110
BJ-2 image   Training samples   95         80     87           79       –       91
             Testing samples    102        95     92           86       –       95
UAV image    Training samples   105        110    95           90       –       90
             Testing samples    112        115    102          98       –       102
3.2. Building Identification Results and Accuracy of the Proposed Method

The new method was validated with three images captured by the GF-2 satellite, the BJ-2 satellite, and a UAV. These images described parts of urban and rural areas. However, visual interpretation of the image of an entire administrative area is rarely practical; as such, three typical images, which contained dark roofs and spectral characteristics similar to roads, were selected to obtain reliable results on performance. Therefore, the performance of our feature optimization algorithm may not seem the same as that in other studies. However, the proposed method acquired satisfactory results under relatively poor conditions, and high accuracy would be easily accessible. The experiments were performed 15 times on the three resolution images (Figure 7a: GF-2; Figure 7b: BJ-2; Figure 7c: UAV), and the average value represents the highest recognition accuracy. Figure 7a shows the housing extraction results of GF-2 imagery; in the figure, buildings are differentiated from other land types, especially high-rise and multistorey buildings in urban areas. Figure 7b was the most difficult to detect among the three images, because all the buildings and roads shared similar spectral characteristics. It is difficult to distinguish buildings from the background when there is no shadow from the building. We obtained four different rural houses in the UAV remote sensing images and compared the proposed algorithm with manual visual interpretation. The experimental results are shown in Figure 7c. The left image is the original remote sensing image; the black area on the right represents the results of the proposed algorithm, and the red polygons represent the visual interpretation results.
Figure 7. Extraction results of urban and rural areas.
For the optimal accuracy of building identification, the selected feature subset of image 1 (GF-2) contains the mean b, mean r, SBI, GLCM mean (all directions), MBI, NDVI, length/width and the elliptic fit; that of image 2 (BJ-2) contains the max.diff, mean r, shape index, GLCM homogeneity (all directions), GLDV entropy and Chen3; and that of image 3 (UAV) contains the ratio g, green index, brightness, rectangular fit, density, GLCM SD, GLCM ASM, main direction, length/width and the elliptic fit. The statistical analysis of the accuracy is shown in Table 4. The new technique has a robust Kappa coefficient, concentrated at 0.83. The preferred features are more robust and resist variation in the images, whether their buildings are densely distributed or not. The UAV images only have R, G, and B bands; as such, the longer the optimization time for extracting classification features is, the higher the number of features used for identification is.

Table 4. Statistical analysis of the accuracy of the proposed method for processing high-resolution imagery.

High-Resolution Imagery    GF-2 Satellite Image   BJ-2 Satellite Image   UAV Image
Overall accuracy (OA)      88.52                  89.75                  91.30
Kappa coefficient          0.80                   0.83                   0.85
Producer's Accuracy (PA)   91                     93.12                  96.21
User's Accuracy (UA)       89.65                  89                     90.38
Number of features         8                      6                      10
Optimization time          7.85                   13.79                  18
3.3. Verification of Feature Selection Based on Kernel Density Estimation
To analyze the object features, we used the KS density (Kernel Smoothing function estimate) to fit the probability distribution density of every feature for different category samples. The kernel distribution is a nonparametric representation of the probability density function (PDF) of a random variable. Fitting the probability distribution of an object feature by using the KS density is reasonable. The formula of the kernel density estimator is as follows [25]:

f̂_h(x) = (1/(nh)) Σ_{i=1}^{n} K((x − x_i)/h),

where n is the sample size; x_i is the object feature value; K(·) is the Kernel Smoothing function; and h is the bandwidth. The kernel distribution places the values into discrete bins and sums the component smoothing functions for each data value to produce a smooth, continuous probability curve. Figure 8 below represents the probability distribution density of different object features from the three typical study areas. Land types can be well distinguished based on the features, and residential land can be separated from other adjacent land types, thereby facilitating information extraction.
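A minimal sketch of this estimator is given below, assuming a Gaussian smoothing kernel K (the formula above leaves K(·) generic, so the kernel choice here is an illustrative assumption):

```python
import math

def ks_density(x, samples, h):
    """Kernel density estimate f_h(x) = 1/(n*h) * sum_i K((x - x_i)/h),
    with a Gaussian kernel K(u) = exp(-u^2/2) / sqrt(2*pi)."""
    n = len(samples)
    k = lambda u: math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    return sum(k((x - xi) / h) for xi in samples) / (n * h)
```

Evaluating `ks_density` over a grid of x values for each land-type sample set produces smooth per-class curves like those in Figure 8; classes whose curves overlap little are easy to separate on that feature.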
Figure 8. Probability distribution density of different object features based on extraction from high-resolution images ((a) BJ-2 imagery, (b) UAV imagery, and (c) GF-2 imagery).
3.4. Accuracy and Efficiency Assessment of Selected Features

We compared the RFGASVM method with other methods, namely, SVM with all features and RFSVM (without GA to optimize the parameters). The OA values of RFGASVM, SVM (all features) and RFSVM are shown in Table 5. The OA for features selected by RFGASVM had a mean value of over 80%, and for UAV imagery it reached 91.3%. This finding indicates that features selected by RFGASVM are more representative than those selected by the other two methods. The accuracy of SVM (all features) also reached 80%; however, the use of many features brings huge computational costs. RFSVM had a lower accuracy than the other two methods. RFGASVM-selected features achieved higher accuracy and effectiveness based on the OA, Kappa coefficient and feature number. Hence, the proposed method is more suitable for the identification of residential land. Table 6 shows that our feature dimensionality reduction and optimization strategy outperformed the other methods for high-resolution remote sensing images. The precision of each image was more than 85%, and the precision and recall were significantly greater than those of the other two methods.
Table 5. Comparison among RFGASVM and related methods.

Experimental Data   Evaluation Index        RFGASVM   SVM (All Features)   RFSVM
GF-2 imagery        Overall accuracy (OA)   88.52     86.46                83.02
                    Kappa coefficient       0.90      0.88                 0.85
                    Number of features      8         85                   13
BJ-2 imagery        Overall accuracy (OA)   89.75     81.06                80
                    Kappa coefficient       0.93      0.85                 0.85
                    Number of features      6         85                   13
UAV imagery         Overall accuracy (OA)   91.30     86                   90.25
                    Kappa coefficient       0.91      0.88                 0.85
                    Number of features      10        70                   15
Table 6. Results of the test samples of satellite and UAV imagery between the proposed RFGASVM and related methods in terms of precision, recall and F1-score.

Experimental Data   Method               Precision   Recall   F1-Score
GF-2 imagery        RFGASVM              85.50       86.81    86.15
                    SVM (all features)   83.25       82.53    82.89
                    RFSVM                81.0        80.0     80.50
BJ-2 imagery        RFGASVM              89.51       88.12    88.81
                    SVM (all features)   78.67       77.50    78.08
                    RFSVM                80.10       79.1     79.60
UAV imagery         RFGASVM              92.25       90.05    91.14
                    SVM (all features)   86.51       78.81    82.48
                    RFSVM                87.35       85.0     86.41
Feature redundancy increases the size of the search space and affects the speed of algorithms. We extracted the running time at different iteration counts for the BJ-2 imagery and then compared the proposed method with SVM (all features) and RFSVM without GA to measure the computational efficiency. As shown in Figure 9, the SVM (all features) method took more time because of its large number of features: global optimization takes a lot of time as the iterations increase. The implementation of RFGASVM took much less time, up to two times faster than the other two methods. The results show that efficiency is greatly improved when dealing with images of large regions.
Figure 9. Comparison of the efficiency of the proposed RFGASVM and related methods using different iteration times.
4. Conclusions and Future Work

In this study, we proposed a novel feature dimensionality reduction and optimization strategy to extract buildings using an object-based image analysis approach. The feature selection method is based on the ReliefF, genetic algorithm (GA) and support vector machine (SVM) methods and is called the "RFGASVM" method. We collected several samples from three high-resolution remote sensing images (GaoFen-2, Beijing-2 and UAV images), and then selected features to extract buildings and evaluate the performance of the proposed method. The approach consisted of four steps. First, the image pixels were grouped into objects using a multiresolution segmentation algorithm. Then, features were calculated by object-based image analysis; these stable features, derived from the inherent characteristics of the objects, can be applied to high-resolution images. Next, the features were ranked with the ReliefF method to reduce redundancy. Finally, the preliminarily selected feature subset and the SVM parameters were optimized by the genetic algorithm (GA), which selects the optimal feature set from the remaining sorted features. The experimental results demonstrated the effectiveness of the proposed method in terms of efficiency and classification accuracy.

The proposed feature selection method reduces the redundancy of object-based image analysis and is well suited to high-resolution remote sensing images. In addition, it can be applied to feature selection and information extraction and has the advantage of a higher reduction rate. In our future research, we plan to design and implement high-quality samples for high-performance feature selection.

Author Contributions: Z.Y., W.S. and W.F. assisted with the study design and the interpretation of the results; Z.R. designed and wrote the paper.
Funding: This research was funded by the National Key Research and Development Program of China (grant number 2017YFB0504100, 2016YFC0803000), Major Project of High Resolution Earth Observation System (Civil Part) (00-Y30B15-9001-14/16-1) and Youth Innovation Promotion Association of CAS (2015129). Conflicts of Interest: The authors declare that they have no conflicts of interest.
References 1. 2.
3.
1. Moser, G.; Serpico, S.B.; Benediktsson, J.A. Land-Cover Mapping by Markov Modeling of Spatial-Contextual Information in Very-High-Resolution Remote Sensing Images. Proc. IEEE 2013, 101, 631–651. [CrossRef]
2. Blaschke, T.; Hay, G.J.; Kelly, M.; Lang, S.; Hofmann, P.; Addink, E.; Feitosa, R.Q.; van der Meer, F.; van der Werff, H.; van Coillie, F.; et al. Geographic object-based image analysis—Towards a new paradigm. ISPRS J. Photogramm. Remote Sens. 2014, 87, 180–191. [CrossRef] [PubMed]
3. Laliberte, A.S.; Browning, D.M.; Rango, A. A comparison of three feature selection methods for object-based classification of subdecimeter resolution UltraCamL imagery. Int. J. Appl. Earth Observ. Geoinf. 2012, 15, 70–78. [CrossRef]
4. Takayama, T.; Iwasaki, A. Optimal wavelength selection on hyperspectral data with fused lasso for biomass estimation of tropical rain forest. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2016, 8, 101–108.
5. Yu, X.; Zhan, F.; Liao, M.; Hu, J. Object-oriented feature selection algorithms based on improved SEaTH algorithms. Geomat. Inf. Sci. Wuhan Univ. 2012, 37, 921–924. [CrossRef]
6. Ding, C.; Peng, H. Minimum redundancy feature selection from microarray gene expression data. Proc. Comput. Syst. Bioinf. 2003, 3, 523–528.
7. El Akadi, A.; Amine, A.; El Ouardighi, A.; Aboutajdine, D. A two-stage gene selection scheme utilizing MRMR filter and GA wrapper. Knowl. Inf. Syst. 2011, 26, 487. [CrossRef]
8. Rajesh, S.; Arivazhagan, S.; Moses, K.P.; Abisekaraj, R. Genetic Algorithm Based Feature Subset Selection for Land Cover/Land Use Mapping Using Wavelet Packet Transform. J. Indian Soc. Remote Sens. 2013, 41, 237–248. [CrossRef]
9. Wang, Z.; Zhang, Y.; Chen, Z.; Yang, H.; Sun, Y.; Kang, J.; Yang, Y.; Liang, X. Application of Relief algorithm to selection features sets for classification of high resolution remote sensing image. In Proceedings of the 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Beijing, China, 10–15 July 2016; pp. 755–758. [CrossRef]
10. Chang, X.; He, L. System Noise Removal for Gaofen-4 Area-Array Camera. Remote Sens. 2018, 10, 759. [CrossRef]
11. Nussbaum, S.; Menz, G. Object-Based Image Analysis and Treaty Verification; Springer: Heidelberg, Germany, 2008.
12. Hofmann, P.; Strobl, J.; Blaschke, T.; Kux, H. Detecting informal settlements from QuickBird data in Rio de Janeiro using an object-based approach. In Proceedings of the 1st International Conference on Object-Based Image Analysis, Salzburg University, Salzburg, Austria, 4–5 July 2006.
13. Vu, T.T.; Matsuoka, M.; Yamazaki, F. Shadow analysis in assisting damage detection due to earthquakes from QuickBird imagery. In Proceedings of the XXth ISPRS Congress, Istanbul, Turkey, 12–23 July 2004; pp. 607–610.
14. Felzenszwalb, P.F.; Huttenlocher, D.P. Efficient graph-based image segmentation. Int. J. Comput. Vis. 2004, 59, 167–181. [CrossRef]
15. Hu, Z.; Li, Q.; Zou, Q.; Zhang, Q.; Wu, G. A Bilevel Scale-Sets Model for Hierarchical Representation of Large Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2016, 54, 12. [CrossRef]
16. Huang, Y.; McCullagh, P.J.; Black, N.D. An optimization of ReliefF for classification in large datasets. Data Knowl. Eng. 2009, 68, 1348–1356. [CrossRef]
17. Spolaor, N.; Cherman, E.A.; Monard, M.C.; Lee, H.D. ReliefF for multi-label feature selection. In Proceedings of the 2nd Brazilian Conference on Intelligent Systems, Fortaleza, Brazil, 20–24 October 2013; pp. 6–11.
18. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297. [CrossRef]
19. Devroye, L.; Györfi, L.; Lugosi, G. Vapnik-Chervonenkis theory. In A Probabilistic Theory of Pattern Recognition; Springer: New York, NY, USA, 1996; pp. 187–213.
20. Kalantar, B.; Pradhan, B.; Naghibi, S.A.; Motevalli, A.; Mansor, S. Assessment of the effects of training data selection on the landslide susceptibility mapping: A comparison between support vector machine (SVM), logistic regression (LR) and artificial neural networks (ANN). Geomat. Nat. Hazard. Risk 2017, 9, 49–69. [CrossRef]
21. Zhang, R.; Ma, J.W. Feature selection for hyperspectral data based on recursive support vector machines. Int. J. Remote Sens. 2009, 30, 3669–3677. [CrossRef]
22. Ye, F. Evolving the SVM model based on a hybrid method using swarm optimization techniques in combination with a genetic algorithm for medical diagnosis. Multimed. Tools Appl. 2018, 77, 3889–3918. [CrossRef]
23. Yang, X.; Zhao, S.; Qin, X.; Zhao, N.; Liang, L. Mapping of urban surface water bodies from Sentinel-2 MSI Imagery at 10 m resolution via NDWI-based image sharpening. Remote Sens. 2017, 9, 596. [CrossRef]
24. Yang, X.; Qin, X.; Wang, J.; Wang, J.; Ye, X.; Qin, Q. Building façade recognition using oblique aerial images. Remote Sens. 2015, 7, 10562–10588. [CrossRef]
25. Xu, X.Y.; Yan, Z.; Xu, S.L. Estimating wind speed probability distribution by diffusion-based kernel density method. Electr. Power Syst. Res. 2015, 121, 28–37. [CrossRef]

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).