An Efficient Feature Selection for SAR Target Classification

Moussa Amrani¹, Kai Yang², Dongyang Zhao³, Xiaopeng Fan¹, and Feng Jiang¹

¹ School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, People's Republic of China
[email protected]
² Chinese People's Liberation Army Aviation School, Jilin, China
³ Beijing Institute of Computer Application, Beijing, China
Abstract. Selecting appropriate features is a prerequisite for attaining accurate and efficient SAR target classification. Inspired by the great success of the Bag-of-Visual-Words (BoVW) paradigm, we address this issue by proposing an efficient feature selection method for SAR target classification. First, Histogram of Oriented Gradients (HOG) based features are extracted from the training SAR images. Second, a discriminative codebook is generated using the k-means clustering algorithm. Third, after feature encoding by computing the closest Euclidean distance, only two bags of features are retained. Fourth, to obtain the best results at lower time complexity, Discriminant Correlation Analysis (DCA) is used to combine the relevant information into new discriminant features. Finally, an SVM is used as the baseline classifier for target classification. Experiments on the MSTAR public release dataset demonstrate that the proposed method outperforms the state-of-the-art methods.

Keywords: Synthetic aperture radar · Target classification · HOG · DCA · BoVW
1 Introduction

Airborne and spaceborne synthetic aperture radar (SAR) has been a great success in targeting systems because it works in all-weather, day-and-night conditions and provides microwave images of extremely high resolution. A SAR system sends high-power electromagnetic pulses from a radar mounted on a moving platform toward a fixed area of interest and sequentially receives the echoes of the backscattered signal. SAR collects data from multiple viewing angles and combines them coherently to attain a very high-resolution description of the target. SAR has been utilized for many target-related applications [1] such as surveillance, reconnaissance, recognition, and classification. However, SAR images are harder to interpret manually than optical images, which give a better description of a target's appearance. This motivates the development of more capable automatic target recognition (ATR) algorithms for SAR images. Defining a good way to represent and recognize targets in synthetic aperture radar automatic target recognition
(SAR-ATR) systems has become a challenging task. In this regard, developing a well-designed feature selection method is a very important issue. Recently, many methods have been proposed to understand the target in SAR images, including geometric descriptors such as peak locations, edges, corners, and shapes [2], and transform-domain coefficients such as wavelet coefficients [3]. Although the above-mentioned methods have some advantages, most of them fail to achieve promising classification performance (i.e., accuracy and time). In this paper, we address these problems using the Bag-of-Visual-Words (BoVW) paradigm [4], which can solve multi-target SAR image classification effectively. The contributions of the proposed method mainly include four aspects. First, Histogram of Oriented Gradients (HOG) based features [5] are introduced; they achieve fast feature extraction and higher precision than many alternatives. Second, the k-means clustering algorithm is adopted to generate a discriminative codebook and to reduce the overall computational complexity. Meanwhile, the closest Euclidean distance between the extracted features and the visual dictionary is computed, and the two bags of features with the highest performance are retained. Third, for better classification accuracy and lower time complexity, Discriminant Correlation Analysis (DCA) [6] is used to combine the relevant information by concatenation, forming new discriminant features. Finally, for simplicity, a linear support vector machine (SVM) is used as the baseline classifier throughout the study. The simplicity comes from the fact that the SVM applies a simple linear method to the data in a high-dimensional feature space that is nonlinearly related to the input space; moreover, even though the SVM can be viewed as a linear algorithm in a high-dimensional space, in practice it does not involve any computations in that high-dimensional space. Besides, the fusion of the SVM classifier with the discriminant features is studied, which brings higher classification precision than common traditional methods. The proposed method is evaluated on real SAR images from the Moving and Stationary Target Acquisition and Recognition (MSTAR) public release, and the experimental results validate its effectiveness.

The rest of this paper is organized as follows: Sect. 2 describes the framework of the proposed method and introduces the HOG feature extraction, the BoVW feature representation, and the DCA-based feature fusion by concatenation. The experimental results are presented in Sect. 3. Finally, Sect. 4 gives the concluding remarks.
2 The Proposed Method

The proposed framework consists of four main steps, as shown in Fig. 1: (1) HOG feature extraction; (2) computing the bag of features with the BoVW paradigm based on k-means; (3) bag-of-features fusion based on DCA; and (4) SVM as a baseline classifier for SAR target classification. The overall SAR target classification framework is given in Algorithm 1.
Fig. 1. Overall architecture of the proposed method.

Algorithm 1: Feature selection and classification.
Input: Ctrain = {Ctr1, ..., CtrN} training category set, Ctest = {Cts1, ..., CtsN} testing category set, visual dictionary D.
1:  for i = 1 to N do
2:    Extract HOG-based features from Ctrain and Ctest: Xhtrain = {xhtrain,1, ..., xhtrain,n}, Xhtest = {xhtest,1, ..., xhtest,m}.
3:    Generate the codebook using the k-means clustering algorithm.
4:    Select Vtrain1 and Vtest1 using the Euclidean distance d(Xhtrain, D = 30).
5:    Select Vtrain2 and Vtest2 using the Euclidean distance d(Xhtrain, D = 70).
6:    Compute the transformation matrices Wx and Wy.
7:    Project Vtrain1, Vtrain2 into the DCA subspace: Vtrain1 = Wx·Vtrain1, Vtrain2 = Wy·Vtrain2.
8:    Project Vtest1, Vtest2 into the DCA subspace: Vtest1 = Wx·Vtest1, Vtest2 = Wy·Vtest2.
9:    Fuse Vtrain1, Vtrain2 and Vtest1, Vtest2 by concatenation.
10: end for
11: Perform SVM classification.
Output: Overall accuracy.
In the following subsections, we describe the three main steps of the proposed method (i.e., feature extraction, feature representation, and feature fusion) in more detail.

2.1 HOG Feature Extraction
In this paper, HOG features are adopted to enhance the target classification performance of the proposed method. The HOG feature extraction technique counts occurrences of gradient orientations in regions of interest (ROI) of SAR sample images, as illustrated in Fig. 2.
Fig. 2. HOG feature extraction framework.
For each pixel $(x_i, y_i)$, the gradient is computed by applying two discrete derivative kernels, $G_h = [-1, 0, 1]$ and $G_v = [-1, 0, 1]^T$, to obtain the horizontal difference $\partial_x(x_i, y_i)$ and the vertical difference $\partial_y(x_i, y_i)$, respectively. The gradient magnitude $m(x_i, y_i)$ and the orientation $\theta(x_i, y_i)$ are determined as follows:

$$m(x_i, y_i) = \sqrt{\partial_x(x_i, y_i)^2 + \partial_y(x_i, y_i)^2} \tag{1}$$

$$\theta(x_i, y_i) = \tan^{-1}\frac{\partial_y(x_i, y_i)}{\partial_x(x_i, y_i)} \tag{2}$$
First, a histogram is generated [5, 7]. Second, as shown in Fig. 3, each cell in a feature block is represented by a 9-D orientation histogram: to reduce the effect of contrast changes between sample images of the same target class, the normalized histograms are grouped into 16 × 16 pixel blocks with a cell size of 8 × 8 pixels. Third, to limit complexity, the orientation range from 0 to 360° is quantized into 9 bins, each corresponding to 40°. Finally, the weighted magnitude $m(x_i, y_i)\,w_G(x_i, y_i)$ is accumulated into a specific bin according to $\theta(x_i, y_i)$, where $w_G(x_i, y_i)$ is a weighting mask drawn from a Gaussian distribution.

Fig. 3. HOG feature extraction of a sample SAR image: (a) sample SAR image, (b) HOG features with a cell size of 8, (c) gradient magnitude of the SAR image.
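To make the extraction step concrete, the following is a minimal NumPy sketch of this HOG variant, assuming signed 0–360° gradients, 9 bins of 40° each, 8 × 8 cells, and L2-normalized 16 × 16 pixel (2 × 2 cell) blocks; the Gaussian weighting mask $w_G$ and the block stride are simplifications, not the authors' exact implementation.

```python
import numpy as np

def hog_features(img, cell=8, block=2, n_bins=9):
    """Minimal HOG sketch: signed 0-360 deg orientations quantized into
    n_bins bins, per-cell magnitude histograms, L2-normalized blocks."""
    img = img.astype(np.float64)
    # Horizontal/vertical differences via the [-1, 0, 1] kernels (Gh, Gv)
    gx, gy = np.zeros_like(img), np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]
    gy[1:-1, :] = img[2:, :] - img[:-2, :]
    mag = np.hypot(gx, gy)                        # Eq. (1)
    ang = np.degrees(np.arctan2(gy, gx)) % 360.0  # Eq. (2), mapped to [0, 360)
    bins = np.minimum((ang * n_bins / 360.0).astype(int), n_bins - 1)

    # Accumulate gradient magnitudes into per-cell orientation histograms
    # (the Gaussian weighting mask wG is omitted in this sketch)
    n_cy, n_cx = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((n_cy, n_cx, n_bins))
    for i in range(n_cy):
        for j in range(n_cx):
            m = mag[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell]
            b = bins[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell]
            for k in range(n_bins):
                hist[i, j, k] = m[b == k].sum()

    # Group 2x2 cells into 16x16 pixel blocks, L2-normalize, concatenate
    feats = []
    for i in range(n_cy - block + 1):
        for j in range(n_cx - block + 1):
            v = hist[i:i + block, j:j + block].ravel()
            feats.append(v / (np.linalg.norm(v) + 1e-8))
    return np.concatenate(feats)
```

For a 128 × 128 MSTAR chip this yields one long HOG vector per image; the exact dimensionality depends on the block stride, which the paper does not specify.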
2.2 BoVW Feature Representation
In this paper, the BoVW feature representation is applied in two fundamental phases: codebook construction and feature encoding. In the first phase, a robust and discriminative codebook is generated by clustering the HOG feature vectors of the training feature set. For implementation simplicity and lower complexity, the k-means clustering algorithm is used: starting from a set of randomly chosen initial centers, each input point is repeatedly assigned to its nearest center, and the centers are then recomputed from the point assignments. In this phase, k-means seeks clusters that minimize the objective function

$$D\left(\{p_c\}_{c=1}^{k}\right) = \sum_{c=1}^{k} \sum_{a_i \in p_c} \|a_i - m_c\|^2 \tag{3}$$

$$m_c = \frac{\sum_{a_i \in p_c} a_i}{|p_c|} \tag{4}$$
where the centroid of cluster $p_c$ is denoted by $m_c$, and the number of visual words (i.e., the value of k) depends on the training dataset used. In the second phase, the closest Euclidean distance between the HOG feature vectors of the training and testing feature sets and the constructed vocabulary is computed, forming new robust bags of features that represent all the SAR targets.
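As an illustration, a minimal sketch of both phases using scikit-learn's KMeans is shown below; treating each image as a set of local HOG descriptors and hard-assigning each descriptor to its nearest visual word is an assumption consistent with, but not verbatim from, the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(train_descriptors, k):
    """Cluster training HOG descriptors into k visual words (codebook rows);
    k-means minimizes the objective in Eq. (3)."""
    km = KMeans(n_clusters=k, n_init=10, random_state=0)
    km.fit(train_descriptors)
    return km.cluster_centers_

def encode(descriptors, codebook):
    """Assign each descriptor to its nearest word by Euclidean distance and
    return the normalized word-occurrence histogram (the bag of features)."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(np.float64)
    return hist / (hist.sum() + 1e-8)
```

With dictionaries of D = 30 and D = 70 words (the two sizes retained in Sect. 3), encode produces the 30-D and 70-D bags of features that are later fused by DCA.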
2.3 Feature Fusion and Classification
The fusion of the two feature sets is done in two principal phases. In the first phase, we compute the transformation matrices Wx and Wy, project the training feature sets into the DCA subspace, and fuse the two transformed training feature sets by concatenation. In the second phase, we project the testing feature sets into the DCA subspace and fuse the two transformed testing feature sets by concatenation.

2.3.1 Transformation Matrix Computation and Projection of the Training Feature Sets into the DCA Subspace
First, we compute the mean vectors of the training feature sets for each class. The n columns of the data matrix X are divided into c separate classes, where $n_i$ columns belong to the ith class ($n = \sum_{i=1}^{c} n_i$). Let $x_{ij} \in X$ denote the feature vector corresponding to the jth sample in the ith class, and let $\bar{x}_i$ and $\bar{x}$ denote the means of the $x_{ij}$ vectors in the ith class and in the whole feature set, respectively; that is, $\bar{x}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} x_{ij}$ and $\bar{x} = \frac{1}{n}\sum_{i=1}^{c}\sum_{j=1}^{n_i} x_{ij} = \frac{1}{n}\sum_{i=1}^{c} n_i \bar{x}_i$. Second, we diagonalize the between-class scatter matrix $S_{bx}$ of the first training feature set, where

$$S_{bx(p \times p)} = \sum_{i=1}^{c} n_i (\bar{x}_i - \bar{x})(\bar{x}_i - \bar{x})^T = \Phi_{bx}\Phi_{bx}^T \tag{5}$$

$$\Phi_{bx(p \times c)} = \left[\sqrt{n_1}(\bar{x}_1 - \bar{x}),\; \sqrt{n_2}(\bar{x}_2 - \bar{x}),\; \ldots,\; \sqrt{n_c}(\bar{x}_c - \bar{x})\right] \tag{6}$$
Third, we project the training feature set onto the space of the between-class scatter matrix, reducing the dimensionality of X from p to r:

$$W_{bx}^T S_{bx} W_{bx} = I \tag{7}$$

$$X'_{(r \times n)} = W_{bx(r \times p)}^T X_{(p \times n)} \tag{8}$$

where I is the identity matrix and X′ is the projection of X onto the space in which the between-class scatter matrix is unitized. Similarly, the between-class scatter matrix of the second modality, $S_{by}$, is used to compute the transformation matrix $W_{by}$, which reduces the dimensionality of the training feature set Y from q to r:

$$W_{by}^T S_{by} W_{by} = I \tag{9}$$

$$Y'_{(r \times n)} = W_{by(r \times q)}^T Y_{(q \times n)} \tag{10}$$
Fourth, after using the between-class scatter matrices to transform X and Y into X′ and Y′, respectively, singular value decomposition (SVD) is utilized to diagonalize the between-set covariance matrix of the transformed feature sets, $S'_{xy} = X'Y'^T$:

$$S'_{xy(r \times r)} = U \Sigma V^T \;\Rightarrow\; U^T S'_{xy} V = \Sigma \tag{11}$$

where $\Sigma$ is a non-zero diagonal matrix. The transformation matrices $W_{cx}$ and $W_{cy}$ obtained from this decomposition are then used to transform the training feature set matrices as follows:

$$X^* = W_{cx}^T X' = W_{cx}^T W_{bx}^T X = W_x X \tag{12}$$

$$Y^* = W_{cy}^T Y' = W_{cy}^T W_{by}^T Y = W_y Y \tag{13}$$

where $W_x = W_{cx}^T W_{bx}^T$ and $W_y = W_{cy}^T W_{by}^T$ are the final transformation matrices for X and Y, respectively. Finally, the transformed training feature sets $X^*$ and $Y^*$ are fused by concatenation.
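A compact NumPy sketch of this training-phase computation, following Eqs. (5)–(13) and the DCA formulation of [6], is given below; the eigendecomposition of $\Phi^T\Phi$ in place of the larger p × p scatter matrix and the small regularization constants are implementation choices, not the authors' code.

```python
import numpy as np

def dca_matrix(X, labels, r):
    """Return the r x p transform W with W Sb W^T = I for one feature set.
    X is p x n with one column per sample; r <= (number of classes) - 1."""
    xbar = X.mean(axis=1)
    # Columns of Phi are sqrt(n_i) * (class mean - global mean), Eq. (6)
    Phi = np.column_stack(
        [np.sqrt((labels == c).sum()) * (X[:, labels == c].mean(axis=1) - xbar)
         for c in np.unique(labels)])
    # Eigen-decompose the small c x c matrix Phi^T Phi instead of Sb (p x p)
    lam, Q = np.linalg.eigh(Phi.T @ Phi)
    idx = np.argsort(lam)[::-1][:r]
    lam, Q = np.maximum(lam[idx], 1e-10), Q[:, idx]
    W = (Phi @ Q) / lam        # p x r; satisfies W^T Sb W = I, Eq. (7)
    return W.T                 # r x p

def dca_fuse(X, Y, labels, r):
    """DCA-transform two training sets (p x n and q x n) and fuse them."""
    Wbx, Wby = dca_matrix(X, labels, r), dca_matrix(Y, labels, r)
    Xp, Yp = Wbx @ X, Wby @ Y                   # Eqs. (8) and (10)
    U, s, Vt = np.linalg.svd(Xp @ Yp.T)         # Eq. (11)
    s = np.maximum(s, 1e-10)
    Wx = (U / np.sqrt(s)).T @ Wbx               # Wx = Wcx^T Wbx^T, Eq. (12)
    Wy = (Vt.T / np.sqrt(s)).T @ Wby            # Wy = Wcy^T Wby^T, Eq. (13)
    return np.vstack([Wx @ X, Wy @ Y]), Wx, Wy  # fusion by concatenation
```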
2.3.2 Projecting the Testing Feature Sets into the DCA Subspace
In this phase, the final transformation matrices $W_x$ and $W_y$ are used to transform the testing feature sets $X_{test}$ and $Y_{test}$ as follows:

$$X^*_{test} = W_x X_{test} \tag{14}$$

$$Y^*_{test} = W_y Y_{test} \tag{15}$$

Then, the projected testing feature sets $X^*_{test}$ and $Y^*_{test}$ are fused by concatenation. Finally, the linear SVM is trained to classify unknown targets into one of the class labels learned from the training set. More precisely, the classifier calculates the similarity to all trained classes and assigns each unlabeled target to the class with the highest similarity measure.
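A short sketch of this testing-phase projection and the SVM step, reusing the hypothetical dca_fuse helper above; scikit-learn's LinearSVC stands in for the paper's unspecified SVM implementation, the Vtrain/Vtest arrays are hypothetical inputs, and r = 9 per modality is an assumption matching the 18-D fused vectors reported in Sect. 3 for ten classes.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Hypothetical inputs: Vtrain1/Vtrain2 and Vtest1/Vtest2 are the 30-D and
# 70-D bags of features with one row per sample; transpose them to the
# column-per-sample layout used by dca_fuse above.
Z_train, Wx, Wy = dca_fuse(Vtrain1.T, Vtrain2.T, train_labels, r=9)

# Project the testing sets with the *training* transforms, Eqs. (14)-(15),
# then fuse by concatenation.
Z_test = np.vstack([Wx @ Vtest1.T, Wy @ Vtest2.T])

clf = LinearSVC()                      # linear baseline classifier
clf.fit(Z_train.T, train_labels)       # rows are samples for scikit-learn
accuracy = clf.score(Z_test.T, test_labels)
```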
3 Experimental Results

3.1 Dataset
In this section, the publicly downloadable MSTAR mixed-target dataset [8] is used for the evaluation of the system. This dataset consists of SAR images of ground vehicle targets from different categories. All SAR images have 1-foot resolution and were collected by Sandia National Laboratory (SNL) using the STARLOS X-band SAR sensor in spotlight mode, with a circular flight path and diverse depression angles.

3.2 Experimental Results
The effectiveness of the proposed method is evaluated on ten different vehicle targets from the dataset: armored personnel carriers (BMP-2, BRDM-2, BTR-60, and BTR-70), tanks (T-62, T-72), a rocket launcher (2S1), an air defense unit (ZSU-234), a truck (ZIL-131), and a bulldozer (D7). For testing, we used 3211 sample images at a 15° depression angle, and for training, we used 3681 sample images at a 17° depression angle, as detailed in Table 1. First, a histogram is generated and normalized; the normalized histograms are grouped into 16 × 16 pixel blocks with a cell size of 8 × 8 pixels. For lower complexity, the orientation range is quantized into 9 bins, each corresponding to 40°. Thus, the HOG features of each SAR image are concatenated into a 7936-D feature vector. Second, we obtained the codebook of the features extracted from the training SAR images by applying the BoVW approach, and the constructed codebook is used to compute the bag of features for both the training and testing sets. Third, DCA is utilized to fuse the selected bags of features by concatenation. Finally, the targets are classified by comparing the bags of features of the testing set to those of the training set; this process is repeated five times.
Table 1. Number of training and testing samples used in the experiments for the MSTAR dataset.

Target   | Train depression | Train images | Test depression | Test images
---------|------------------|--------------|-----------------|------------
BMP-2    | 17°              | 699          | 15°             | 587
BTR-70   | 17°              | 233          | 15°             | 196
T-72     | 17°              | 699          | 15°             | 588
BTR-60   | 17°              | 256          | 15°             | 196
2S1      | 17°              | 299          | 15°             | 274
BRDM-2   | 17°              | 299          | 15°             | 274
D7       | 17°              | 299          | 15°             | 274
T-62     | 17°              | 299          | 15°             | 274
ZIL-131  | 17°              | 299          | 15°             | 274
ZSU-234  | 17°              | 299          | 15°             | 274
During our experimental analysis, the sensitivity to both the cell size and the dictionary size was examined by varying each in turn. The cell size was varied from 4 to 16, and a cell size of 8 produced the highest accuracy on the dataset. Likewise, classification accuracies with dictionary sizes from 10 to 100 were compared, and the experiments showed that feature vectors of size 30 and 70 achieved the best results, as shown in Fig. 4. To achieve the highest accuracy at the lowest time complexity, the feature vectors of size 30 and 70 are fused together, forming a new discriminant feature vector of size 18.
Fig. 4. Effect of the dictionary size on the classification accuracy.
The proposed method has low time complexity owing to the small size of the fused feature vectors (i.e., 18-D). Figure 5 shows the classification time of the proposed method compared with different approaches: A-Convnets [9], CNN [10], DWT + Real-Adaboost [11], BCS + Scattering centers [12], SVM + Scattering centers [12], and Object Matching + SIFT [13]. The test machine has an Intel(R) Core i5 processing unit @ 2.67 GHz, 4 GB of internal memory, and Microsoft Windows 10 Pro 64-bit; all methods were implemented in MATLAB R2016a, and all classification times are measured in milliseconds (ms). As shown in Fig. 5, the proposed method has a lower classification time than the other methods. BCS + Scattering centers [12] and SVM + Scattering centers [12] have the longest classification times due to the high dimension of the feature vectors (i.e., 1024) used for classification; however, SVM + Scattering centers [12] is faster because SVM has lower complexity than BCS. DWT + Real-Adaboost [11] and Object Matching + SIFT [13] use feature vectors of size 75 and 128, respectively, so their classification times are also longer than the proposed method's. CNN [10] and A-Convnets [9] need GPU support and take considerable time: CNN [10] uses a sparse autoencoder and a convolutional neural network for feature extraction plus a softmax layer, and A-Convnets [9] uses a deep convolutional network with five trainable layers. Thus, the proposed method attains the lowest time complexity.
(Reported classification times: the proposed method, 2090 ms; DWT + Real-Adaboost [11], 4540 ms; Object Matching + SIFT [13], 6220 ms; SVM + Scattering centers [12], 45200 ms; BCS + Scattering centers [12], 48170 ms.)
Fig. 5. The classification time comparison between the proposed method and the state-of-the-art methods.
The classification accuracy of our algorithm is compared with the same approaches listed above. The experimental results in Fig. 6 clearly show that the proposed method yields the best performance. The confusion matrix of the classification performance is shown in Fig. 7.
(Fig. 6 reports an accuracy of 99.5% for the proposed method and 99.13% for A-Convnets [9]; the other compared methods, Object Matching + SIFT [13], BCS + Scattering centers [12], CNN [10], SVM + Scattering centers [12], and DWT + Real-Adaboost [11], score between 67.2% and 99.3%.)
Fig. 6. The performance comparison between the proposed method and the state-of-the-art methods.
Fig. 7. Confusion matrix of the classification performance on MSTAR dataset: the rows and columns of the matrix indicate the actual and predicted classes, respectively.
4 Conclusion

Feature extraction plays a key role in the classification performance of SAR-ATR, and choosing appropriate features to train a classifier is a crucial prerequisite. This paper proposed an efficient feature selection method that takes advantage of BoVW to precisely represent target features and then combines the relevant features to obtain discriminative ones. In contrast to previous SAR target classification studies, which incurred high classification complexity, the proposed method achieves high accuracy at low complexity thanks to the small size of the discriminative feature vectors. Experimental results on the MSTAR dataset demonstrate the effectiveness of the proposed method compared with the state-of-the-art methods.
References

1. Ozdemir, C.: Inverse Synthetic Aperture Radar Imaging with MATLAB Algorithms, vol. 210. Wiley, Hoboken (2012)
2. Olson, C.F., Huttenlocher, D.P.: Automatic target recognition by matching oriented edge pixels. IEEE Trans. Image Process. 6(1), 103–113 (1997)
3. Sandirasegaram, N.: Spot SAR ATR using wavelet features and neural network classifier. DTIC Document, Technical report (2005)
4. Sivic, J., Zisserman, A.: Video Google: a text retrieval approach to object matching in videos. In: Proceedings of the 9th IEEE International Conference on Computer Vision, vol. 2, pp. 1470–1477, 13–16 October 2003
5. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), vol. 1, pp. 886–893. IEEE (2005)
6. Haghighat, M., Abdel-Mottaleb, M., Alhalabi, W.: Discriminant correlation analysis: real-time feature level fusion for multimodal biometric recognition. IEEE Trans. Inf. Forensics Secur. 11(9), 1984–1996 (2016)
7. Porikli, F.: Integral histogram: a fast way to extract histograms in Cartesian spaces. In: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), vol. 1, pp. 829–836. IEEE (2005)
8. SDMS: MSTAR data. https://www.sdms.afrl.af.mil/index.php?collection=mstar. Accessed 23 Nov 2016
9. Chen, S., Wang, H., Xu, F., Jin, Y.Q.: Target classification using the deep convolutional networks for SAR images. IEEE Trans. Geosci. Remote Sens. 54(8), 4806–4817 (2016)
10. Chen, S., Wang, H.: SAR target recognition based on deep learning. In: 2014 International Conference on Data Science and Advanced Analytics (DSAA), pp. 541–547. IEEE, October 2014
11. Zhao, X., Jiang, Y.: Extracting high discrimination and shift invariance features in synthetic aperture radar images. Electron. Lett. 52(11), 958–960 (2016)
12. Zhang, X., Qin, J., Li, G.: SAR target classification using Bayesian compressive sensing with scattering centers features. Prog. Electromagnet. Res. 136, 385–407 (2013)
13. Agrawal, A., Mangalraj, P., Bisherwal, M.A.: Target detection in SAR images using SIFT. In: 2015 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), pp. 90–94. IEEE, December 2015