New Cost Function for Backpropagation Neural Networks with Application to SAR Imagery Classification
Hossam Osman and Steven D. Blostein
Department of Electrical and Computer Engineering, Queen's University, Kingston, Ontario, Canada, K7L 3N6
Abstract

This paper proposes the minimization of a new cost function while training backpropagation (BP) neural networks to solve pattern classification problems. The new cost function is referred to as the gain-weighted normalized-target mean-square error (GWNTMSE). The paper proves that the minimization of the GWNTMSE is optimal in the sense of yielding a network classifier with minimum variance from the optimal Bayes classifier in the limit of an asymptotically large number of statistically independent training patterns. Experimental results are presented. The application selected is the classification of ship targets in airborne synthetic aperture radar (SAR) imagery. The number of ship classes is 8: 2 destroyers, 2 cruisers, 2 aircraft carriers, a frigate, and a support ship. The obtained results indicate that BP classifiers trained by minimizing the GWNTMSE consistently outperform those trained by minimizing the standard MSE.
Keywords: pattern classification, mean-square error, backpropagation neural networks, SAR
1 INTRODUCTION

Backpropagation (BP) neural network classifiers have shown good performance in a great variety of applications [1]. Training these classifiers involves the minimization of a cost function over an available training set. For pattern classification in general, and for BP classifiers in particular, the cost function that has been used more frequently than any other alternative is the standard mean-square error (MSE) [1, 2]. The standard MSE has the advantage of assuming no prior knowledge of class distributions or a priori probabilities. This paper proposes training BP classifiers using a new MSE cost function referred to as the gain-weighted normalized-target mean-square error (GWNTMSE). The paper shows that, like the standard MSE, the GWNTMSE is optimal in the sense of yielding a network output with minimum variance from the Bayes vector in the limit of an asymptotically large number of statistically independent training patterns. The paper also compares the performance of BP classifiers trained by minimizing the GWNTMSE to that of BP classifiers trained by minimizing the standard MSE. The application selected is the classification of ship targets in airborne synthetic aperture radar (SAR) imagery. SAR is a microwave-based sensor characterized by the production of images of very high resolution and the possibility of all-weather operation. Recently, the problem of automatically interpreting SAR imagery containing ship patterns has received increasing attention [3-5].
This paper is organized as follows. Section 2 introduces the GWNTMSE cost function. Section 3 employs BP classifiers trained by minimizing the GWNTMSE in the application of SAR imagery classification, and also compares the performance of these classifiers with that of BP classifiers trained by minimizing the standard MSE. Finally, Section 4 contains the conclusions of this work.
2 GAIN-WEIGHTED NORMALIZED-TARGET MEAN-SQUARE ERROR

Consider employing a BP network to assign an n-dimensional input x to one of c possible classes. Class C_j is associated with a c-dimensional prototype vector P_j = (P_{1j}, P_{2j}, \ldots, P_{cj})^t. Before training the BP network, a coding scheme has to be selected, where selecting such a scheme is equivalent to choosing the matrix

$$ P \triangleq [\, P_1 \; P_2 \; \cdots \; P_c \,] \qquad (1) $$
Let p(C_j|x) denote the a posteriori probability of class C_j given pattern x, and define the Bayes gain vector b(x) = (b_1(x), b_2(x), \ldots, b_c(x))^t by [2]

$$ b(x) \triangleq P \, \big( p(C_1|x), \; p(C_2|x), \; \ldots, \; p(C_c|x) \big)^t \qquad (2) $$
The components of b(x) are interpreted as gains obtained by assigning pattern x to a specific class. Let N denote the size of an available training set, x_i denote the ith training pattern, and o(x_i) denote the c-dimensional network output given x_i. Then, consider training the BP network by minimizing

$$ E \triangleq \frac{1}{N} \sum_{j=1}^{c} \sum_{x_i \in C_j} \Big( \sum_{k=1}^{c} P_{kj} \Big) \Big\| \frac{P_j}{\sum_{k=1}^{c} P_{kj}} - o(x_i) \Big\|^2 \qquad (3) $$
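As a concrete sketch of the cost in (3) (not from the paper; the array layout and function name are assumptions), the GWNTMSE can be computed in NumPy as follows:

```python
import numpy as np

def gwntmse(P, outputs, labels):
    """Gain-weighted normalized-target MSE of Eq. (3).

    P       : (c, c) prototype matrix; column j is the prototype vector P_j.
    outputs : (N, c) network outputs o(x_i).
    labels  : (N,) class index j of each training pattern x_i.
    """
    w = P.sum(axis=0)                  # total gain per class, sum_k P_kj
    targets = (P / w)[:, labels].T     # normalized targets P_j / sum_k P_kj, shape (N, c)
    sq_err = ((targets - outputs) ** 2).sum(axis=1)
    return (w[labels] * sq_err).mean()  # weight each error term by its class gain

# When every column of P sums to unity, the weights are 1 and the normalized
# targets equal the prototypes, so E reduces to the standard MSE.
```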
The cost function E is an MSE in which the target outputs are normalized and each error term is weighted by the total gain assigned to the class of the corresponding training pattern. Thus, it is referred to as the gain-weighted normalized-target MSE (GWNTMSE). The GWNTMSE reduces to the standard MSE when the components of each prototype vector sum to unity.
Let n_j denote the number of training patterns that belong to class C_j, \pi_j denote the a priori probability of C_j, and p(x) denote the probability density function of x. Then, given an asymptotically large number of statistically independent training patterns, E can be rewritten as

$$ \lim_{N \to \infty} E = \lim_{N \to \infty} \sum_{j=1}^{c} \frac{n_j}{N} \cdot \frac{1}{n_j} \sum_{x \in C_j} \Big( \sum_{k=1}^{c} P_{kj} \Big) \Big\| \frac{P_j}{\sum_{k=1}^{c} P_{kj}} - o(x) \Big\|^2 $$
$$ = \sum_{j=1}^{c} \int \Big[ \Big( \sum_{k=1}^{c} P_{kj} \Big) \| o(x) \|^2 + \frac{\| P_j \|^2}{\sum_{k=1}^{c} P_{kj}} \Big] p(C_j|x) \, p(x) \, dx \; - \; 2 \int o^t(x) \sum_{j=1}^{c} P_j \, p(C_j|x) \, p(x) \, dx $$
$$ = E\Big\{ \sum_{j=1}^{c} \frac{\| P_j \|^2}{\sum_{k=1}^{c} P_{kj}} \, p(C_j|x) \Big\} \; - \; E\Big\{ \frac{\big\| \sum_{j=1}^{c} P_j \, p(C_j|x) \big\|^2}{\sum_{j=1}^{c} \big( \sum_{k=1}^{c} P_{kj} \big) p(C_j|x)} \Big\} \; + \; E_1 \qquad (4) $$

where

$$ E_1 = E\Big\{ \Big( \sum_{j=1}^{c} b_j(x) \Big) \Big\| \frac{b(x)}{\sum_{j=1}^{c} b_j(x)} - o(x) \Big\|^2 \Big\} \qquad (5) $$
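The role of the normalized Bayes gain vector can be checked numerically for a single fixed input x. The prototype matrix and posterior probabilities below are made up for illustration; the check confirms that the output minimizing the expected gain-weighted error is b(x) divided by the sum of its components:

```python
import numpy as np

# Hypothetical 3-class prototype matrix (column j = P_j) and posteriors p(C_j|x).
P = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 5.0]])
post = np.array([0.5, 0.3, 0.2])

w = P.sum(axis=0)    # total gains, w_j = sum_k P_kj
b = P @ post         # Bayes gain vector b(x) of Eq. (2)

# For a fixed x, the expected GWNTMSE over the class label is
#   sum_j p(C_j|x) * w_j * || P_j/w_j - o ||^2,
# a quadratic in o minimized by the weighted mean of the normalized targets:
o_star = (P @ post) / (w @ post)

# That minimizer equals the normalized Bayes gain vector:
assert np.allclose(o_star, b / b.sum())
```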
Since the first two terms on the right side of (4) are independent of the network weights, it follows that minimizing E is equivalent to minimizing E_1. Hence, the minimization of the GWNTMSE, like the minimization of the standard MSE, is optimal in the sense of yielding a network classifier with minimum variance from the optimal Bayes classifier in the limit of an asymptotically large number of statistically independent training patterns. However, in real-world applications, the available training set is of finite size, the network may have insufficient functional capacity to approximate the optimal Bayes classifier, and the minimized MSE may attain a local rather than a global minimum. In these practical situations, we expect that minimizing the GWNTMSE, where the target outputs are normalized and each error term is weighted by the total gain assigned to the class of the corresponding training pattern, would most probably yield better classification performance. Experimental results are presented in the next section.
3 SAR IMAGERY CLASSIFICATION

A data set of 8 ship classes was generated using a SAR imagery simulator [6]. The classes represented 2 destroyers, 2 cruisers, 2 aircraft carriers, a frigate, and a supply ship. The class number assigned to each class type is given in Table 1. Since a ship signature depends on the ship orientation relative to the SAR sensor, while collecting signatures the aspect angle for each ship was varied in increments of 2° until the whole range of interest was covered. Each collected signature was centered in a window of 80 × 80 pixels and scaled such that its minimum and maximum intensities corresponded to gray values of 0 and 255, respectively. For each ship class, at each aspect angle, eighteen 80 × 80 images with resolution 3 m × 3 m per pixel were generated. This yielded 864 images per class for a total of 8 × 864 = 6912 images. The generated images were divided into a training set and a test set. The training set was needed to construct the BP classifier, whereas the test set was needed to measure its expected classification performance. The division was done such that for each ship class, at each aspect angle, 9 of the eighteen generated images were taken for the training set and 9 were taken for the test set. This gave 432 images per class per set for a total of 8 × 432 = 3456 images per set. Sample SAR images are shown in Figure 1. Since patterns of different ship classes were very similar, special attention was given to the selection of the prototype vectors P_j. Specifically, the matrix P in (1) was taken as
$$ P = \begin{bmatrix}
100 &   5 &   5 &  5 &  5 &  5 &  5 &   0 \\
  5 & 100 &   0 &  5 &  5 &  5 &  5 &   5 \\
  5 &   0 & 100 &  5 &  5 &  5 &  5 &   5 \\
  5 &   5 &   5 & 80 &  5 &  5 &  5 &   5 \\
  5 &   5 &   5 &  5 & 80 &  5 &  5 &   5 \\
  5 &   5 &   5 &  5 &  5 & 80 &  5 &   5 \\
  5 &   5 &   5 &  5 &  5 &  5 & 80 &   5 \\
  0 &   5 &   5 &  5 &  5 &  5 &  5 & 100
\end{bmatrix} $$
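As an illustrative check (not part of the paper), the total gain each class receives under this choice of P, i.e. the GWNTMSE weight of each class, can be computed directly:

```python
import numpy as np

# Prototype matrix P from the text; column j is the prototype of class C_j.
P = np.array([
    [100,   5,   5,  5,  5,  5,  5,   0],
    [  5, 100,   0,  5,  5,  5,  5,   5],
    [  5,   0, 100,  5,  5,  5,  5,   5],
    [  5,   5,   5, 80,  5,  5,  5,   5],
    [  5,   5,   5,  5, 80,  5,  5,   5],
    [  5,   5,   5,  5,  5, 80,  5,   5],
    [  5,   5,   5,  5,  5,  5, 80,   5],
    [  0,   5,   5,  5,  5,  5,  5, 100],
])

w = P.sum(axis=0)   # total gain per class, sum_k P_kj
print(w)            # [130 130 130 115 115 115 115 130]
# In the GWNTMSE, class j's error term is weighted by w[j] and its
# target is the normalized column P[:, j] / w[j].
```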
The vectors P_1, P_2, P_3, and P_8 were set in a way different from that of the other prototype vectors because of the similarity between the patterns of classes C_1 and C_8 (see columns 1 and 8 in Figure 1) and between those of classes C_2 and C_3 (see columns 2 and 3 in Figure 1).

Class | Class Type
1 | Destroyer
2 | Cruiser
3 | Destroyer
4 | Supply ship
5 | Aircraft carrier
6 | Cruiser
7 | Aircraft carrier
8 | Frigate

Table 1: Classes of the generated SAR data set.
Figure 1: Sample SAR images. Each column contains images of one ship class obtained at different aspect angles.
A 3-layer BP network classifier with 8 output units, where 8 is the number of ship classes, was employed to solve this classification problem. The whole SAR image was taken as input to the BP classifier after using simple averaging to reduce its resolution to 16 × 16. Thus, the BP network had 256 input units. It should be mentioned that the reduction of the image resolution was done to avoid using a huge BP network and to smooth, to some extent, the speckle present in the SAR images [7]. The number of hidden units was varied between 7 and 22 in increments of 3. For each number of hidden units, the network was trained by minimizing the GWNTMSE over the available training set until convergence. Using different network weight initializations, 10 training runs were performed. Once a training run was complete, the network classification rate on the test set was determined. Figure 2 plots the average of the 10 classification rates versus the number of hidden units. Figure 2 also shows the results obtained when training the network by minimizing the standard MSE instead; network weight initializations identical to those of the GWNTMSE training runs were used. As shown in Figure 2, the performance obtained by minimizing the GWNTMSE was significantly and consistently better than that obtained by minimizing the standard MSE.
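The 80 × 80 to 16 × 16 reduction by simple averaging can be sketched as follows (an illustrative implementation; the paper does not give its exact preprocessing code, and the function name is assumed):

```python
import numpy as np

def downsample(img, factor=5):
    """Reduce resolution by simple block averaging, e.g. 80x80 -> 16x16."""
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

# A random stand-in for one 80x80 SAR signature:
img = np.random.rand(80, 80)
x = downsample(img).ravel()   # 256-dimensional input vector for the BP network
```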
4 CONCLUSION

This paper has proposed the minimization of a new cost function while training BP networks as classifiers. The proposed cost function is an MSE referred to as the gain-weighted normalized-target MSE (GWNTMSE). This name stems from the fact that in the GWNTMSE, the target outputs are normalized and each error term is weighted by the total gain assigned to the class of the corresponding training pattern. The paper has proved that the minimization of the GWNTMSE is optimal in the sense of yielding a network classifier with minimum variance from the optimal Bayes classifier in the limit of an asymptotically large number of statistically independent training patterns. Thus the GWNTMSE, like the standard MSE, achieves asymptotic optimality. However, in real-world applications, with the target outputs normalized and each error term gain-weighted, we have found that minimizing the GWNTMSE rather than the standard MSE would most probably yield better classification performance. This has been demonstrated using an application of
significant industrial relevance: the classification of ship targets in airborne synthetic aperture radar (SAR) imagery, where the image classes were 2 destroyers, 2 cruisers, 2 aircraft carriers, a frigate, and a support ship. The obtained results have demonstrated that BP classifiers trained by minimizing the GWNTMSE consistently outperform those trained by minimizing the standard MSE.

Figure 2: Correct classification rate on the test set, plotted for both the GWNTMSE and the standard MSE, versus the number of hidden units.
5 ACKNOWLEDGMENT

The authors would like to thank Mr. Li Pan for his assistance in generating the SAR imagery. The authors would also like to acknowledge the support given by the Natural Sciences and Engineering Research Council (NSERC) of Canada under grant OGP0041731, by NSERC CRD grant no. 177119, and by Lockheed Martin Electronic Systems Canada.
6 REFERENCES

1. S. Haykin, Neural Networks: A Comprehensive Foundation, Macmillan College, New York, 1994.
2. K. Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press, New York, 1990.
3. G. Benelli, A. Garzelli, and A. Mecocci, "Complete processing system that uses fuzzy logic for ship detection in SAR images," IEE Proc.-Radar, Sonar Navig., vol. 141, no. 4, pp. 181-186, 1994.
4. K. Eldhuset, "An automatic ship and ship wake detection system for spaceborne SAR images in coastal regions," IEEE Transactions on Geoscience and Remote Sensing, vol. 34, no. 4, pp. 1010-1019, 1996.
5. H. Osman, L. Pan, S. D. Blostein, and L. Gagnon, "Classification of ships in airborne SAR imagery using backpropagation neural networks," in Proceedings of SPIE Symposium on Radar Processing, Technology, and Applications II, vol. 3161, pp. 126-136, California, USA, 1997.
6. Technology Service Corporation, "RIG, The Radar Imagery Simulator," version 3.0, 1996.
7. Jong-Sen Lee, "Speckle analysis and smoothing of synthetic aperture radar images," Computer Graphics and Image Processing, vol. 17, pp. 24-32, 1981.