Experimental results using a nonlinear extension of the MACE filter

John W. Fisher III and Jose C. Principe
Computational NeuroEngineering Laboratory
University of Florida, Gainesville, FL 32611

ABSTRACT

The minimum average correlation energy (MACE) filter has been shown to have superior performance for rejecting out-of-class inputs in pattern recognition applications. The MACE filter exhibits a sharp correlation peak at a specified location in the output plane and low correlation energy elsewhere. It has also been shown that the MACE filter suffers from poor generalization. Increasing the number of exemplars used to compute the filter coefficients can improve generalization, but the number of exemplars is restricted by the stability of the computation. We show a simple extension of the MACE filter to nonlinear processing techniques (i.e., nonlinear associative memories) which exhibits improved generalization and discrimination performance. The operating parameters of the proposed extension are difficult to compute analytically, and adaptive learning methods are needed. Since the output of the MACE filter is optimized over the output plane, any nonlinear extension of the MACE filter should encompass the output plane as well. In general this leads to exhaustive training over the entire output plane over all training exemplars. We present an efficient method for computing the parameters of the nonlinear extension which greatly reduces the training iterations required. Experimental results with 35 GHz inverse synthetic aperture radar (ISAR) data are also shown.

Keywords: correlation filters, neural networks, pattern recognition, radar target classification

1. INTRODUCTION

We have recently proposed a methodology by which minimum average correlation energy (MACE) filtering techniques can be extended to nonlinear signal processing.20 It has been shown that the MACE filter can be decomposed as a pre-whitening filter followed by a synthetic discriminant function (SDF),6 which can also be viewed as a special case of Kohonen's linear associative memory (LAM).10,13 Extension to nonlinear processing is not without cost. As a practical matter, closed-form solutions are difficult and such filters must be computed iteratively via gradient search techniques. Unlike the MACE filter, the output plane variance cannot, in general, be characterized by the average power spectrum of the inputs. Iterative techniques must consider the entire output plane. As a result, the order of the filter computation increases by a factor of N1N2 (the image dimension) using brute-force methods. We will show a technique by which the order of the filter computation is nearly the same as in the linear case (when computed iteratively) while performance in terms of generalization and classification is improved over the MACE filter.
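For reference, the linear filter being extended has a closed form: in the frequency domain the MACE solution is h = D⁻¹X(XᴴD⁻¹X)⁻¹d, where the columns of X are the exemplar spectra and D is the diagonal average power spectrum.3 A minimal NumPy sketch of that closed form (function and variable names are ours; normalization constants and conjugation conventions vary across papers):

```python
import numpy as np

def mace_filter(exemplars, d):
    """Closed-form MACE filter in the frequency domain (a sketch).

    exemplars : (Nt, N1, N2) array of training images
    d         : (Nt,) desired peak values, typically all ones
    Returns the filter H of shape (N1, N2) in the frequency domain.
    """
    Nt, N1, N2 = exemplars.shape
    # X: column-stacked 2-D DFTs of the exemplars, shape (N1*N2, Nt).
    X = np.stack([np.fft.fft2(img).ravel() for img in exemplars], axis=1)
    # D: average power spectrum over the training set (diagonal of the
    # correlation-energy matrix, stored as a vector).
    D = np.mean(np.abs(X) ** 2, axis=1) + 1e-12
    Dinv_X = X / D[:, None]                        # D^{-1} X
    A = X.conj().T @ Dinv_X                        # X^H D^{-1} X, (Nt, Nt)
    H = Dinv_X @ np.linalg.solve(A, d.astype(complex))
    return H.reshape(N1, N2)
```

The correlation plane for a test image then follows from the inverse DFT of the conjugate filter multiplied by the image spectrum.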


Our current interest is in applying these techniques to synthetic aperture radar (SAR) data. We show experimental results here using 35 GHz TABILS 24 ISAR data. We compare the performance of the linear MACE filter (i.e., a LAM preceded by pre-whitening over the training set) to a nonlinear extension of the MACE filter (a nonlinear associative memory structure preceded by a pre-whitening filter).

2. SIMPLE NONLINEAR EXTENSION

One possible, and straightforward, approach for a nonlinear extension of the MACE filter is to replace the LAM portion of the filter decomposition shown in figure 1 with a nonlinear associative memory structure such as a feed-forward multi-layer perceptron (MLP). The pre-whitening filter, computed over the average spectrum of the training images, is the same as in the MACE filter. This simple modification is illustrated in figure 2. We have shown limited results with this architecture recently using ISAR data.15
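The pre-whitening stage can be read as dividing each input spectrum by the square root of the average power spectrum of the training set. A minimal sketch under that reading (function names are ours):

```python
import numpy as np

def avg_power_spectrum(exemplars):
    """Mean |FFT2|^2 over the training exemplars, shape (N1, N2)."""
    return np.mean([np.abs(np.fft.fft2(img)) ** 2 for img in exemplars], axis=0)

def prewhiten(image, P, eps=1e-12):
    """Whiten an image's spectrum by the average power spectrum P."""
    Xw = np.fft.fft2(image) / np.sqrt(P + eps)     # D^{-1/2} X
    return np.real(np.fft.ifft2(Xw))
```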

Figure 1. Decomposition of the MACE filter as a pre-whitener followed by an SDF/LAM (input → pre-whitening filter → pre-whitened image → SDF/LAM → scalar output).

Figure 2. Nonlinear variation of the MACE filter; an MLP (N1N2-2-4-1) replaces the SDF/LAM (input → pre-whitening filter → pre-whitened image → MLP → scalar output).

The notation (N1N2-2-4-1) in figure 2 represents the connectivity of the MLP used in our experiments. The MLP has N1N2 input units (one for each image pixel) feeding into two (2) nonlinear processing elements (PEs) in the first hidden layer. The outputs of these PEs feed into four (4) nonlinear PEs in the second hidden layer, which in turn feed a single nonlinear PE at the output. The choice of two PEs on the input layer was for experimental convenience only; additional PEs on the input layer of the MLP may yield better performance with respect to generalization and discrimination. The choice of two PEs is, however, enough to demonstrate improved performance with a single additional feature. It should be noted that a single PE on the input layer can only implement a linear discriminant function.
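A sketch of the corresponding parameter shapes, with an illustrative image size and an initialization scheme of our choosing:

```python
import numpy as np

rng = np.random.default_rng(0)
N1, N2 = 64, 64                                   # illustrative image size

# Parameter shapes for the N1N2-2-4-1 topology of figure 2.
W1 = rng.normal(scale=0.01, size=(2, N1 * N2))    # input layer: two correlators
W2 = rng.normal(scale=0.1, size=(4, 2))           # second hidden layer
W3 = rng.normal(scale=0.1, size=(1, 4))           # output PE
theta1, theta2, theta3 = np.zeros(2), np.zeros(4), np.zeros(1)
alpha = (W1, W2, W3, theta1, theta2, theta3)
```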


Mathematically, the output of the filter can be written as a function of the pre-whitened input image, $x$, as

$$
g(\alpha, x) = \sigma\big(W_3\,\sigma(W_2\,\sigma(W_1 x + \theta_1) + \theta_2) + \theta_3\big),
\qquad
\alpha = \{\, W_3 \in \Re^{1 \times 4};\; W_2 \in \Re^{4 \times 2};\; W_1 \in \Re^{2 \times N_1 N_2};\; \theta_1, \theta_2, \theta_3 \text{ constant bias vectors} \,\},
\tag{1}
$$

where the $W_i$ are matrices of weighted connections linking the nonlinear outputs of one layer to the inputs of the next layer, $\sigma(\cdot)$ is a nonlinear sigmoidal function, and $x$ is an $N_1 \times N_2$ image that has been reordered by row or column into a column vector. The MACE criterion can be restated for this architecture20 as the problem of finding the parameters $\alpha$ which minimize the criterion

$$ J = E\{ g(\alpha, x)^2 \}, $$

where $E(\cdot)$ is the expectation operator, such that the constraints

$$ g(\alpha, x_i) = d_i; \qquad i = 1 \ldots N_t $$

are satisfied. From this perspective, $x \in \Re^{N_1 N_2 \times 1}$ is a random variable whose second-order statistics are characterized by all shifts of the centered exemplars $x_i$. Analytic solutions to this optimization problem are difficult in general; however, gradient search methods can be used to yield satisfactory results. In order to use gradient search techniques, the optimization criterion is approximated with

$$ J = \beta E\{ g(\alpha, x)^2 \} + (1 - \beta) \sum_{i=1}^{N_t} \big( g(\alpha, x_i) - d_i \big)^2. \tag{2} $$
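Equations (1) and (2) translate directly into code. A sketch, using tanh as the sigmoidal nonlinearity and estimating the expectation by an average over rejection-class samples (both our choices):

```python
import numpy as np

def g(alpha, x):
    """Forward map of equation (1); tanh stands in for the sigmoid sigma."""
    W1, W2, W3, th1, th2, th3 = alpha
    s = np.tanh
    return s(W3 @ s(W2 @ s(W1 @ x + th1) + th2) + th3)

def relaxed_criterion(alpha, rejection_samples, exemplars, d, beta):
    """Equation (2): beta * E{g^2} + (1 - beta) * constraint error.

    rejection_samples : vectorized images representing the rejection class
                        (shifts of the exemplars, or white noise after
                        pre-whitening)
    exemplars         : the Nt centered, vectorized training images
    d                 : desired outputs for the centered exemplars
    """
    energy = np.mean([(g(alpha, x) ** 2).item() for x in rejection_samples])
    err = sum(((g(alpha, xi) - di) ** 2).item() for xi, di in zip(exemplars, d))
    return beta * energy + (1.0 - beta) * err
```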

The criterion of equation (2) relaxes the equality constraints. The degree to which the constraints are emphasized is controlled by setting the parameter β in the range (0, 1). In the case of a linear mapping and an under-determined set of constraints, minimizing equation (2) yields an optimal trade-off SDF.11

There are several advantages to this architecture and criterion. The input layer can be implemented with correlators operating in parallel prior to the nonlinearity. The architecture retains the shift-invariance property of the MACE filter since the signal graph is a feed-forward mapping of these correlator outputs. It is well known that a feed-forward MLP with a single hidden layer can implement any "smooth" discriminant function of its inputs. If we view the nonlinear outputs following the input correlators as extracted features, the second hidden layer allows for any smooth discriminant function of these features; consequently, an explicit form of the discriminant function is not assumed. Finally, with the approximation of the original MACE filter criterion, the back-propagation algorithm can be used to solve for the filter coefficients adaptively.

3. EFFICIENT TRAINING

Adaptation methods become necessary when the associative memory takes a nonlinear form. Because of the optimization criterion, the adaptation must take place over the entire output image plane when brute-force methods are used. This yields N1N2Nt image presentations per training epoch (i.e., one for each of the Nt exemplar images at each of its N1N2 possible shifts). As was pointed out in earlier work,20 the statistical approach yields a more efficient approach to training. We seek, from the criterion, the parameters α which minimize the filter's response to all sequences with the same second-order statistics (i.e., autocorrelation sequence) as those characterized by our exemplar images. When pre-whitening is done first, however, any white sequence will have the same characteristics. Instead of training over all shifts of the exemplars, random white-noise sequences can be presented to the MLP as representative of the rejection class during adaptation. Each white-noise sequence is used as a noisy estimate of the gradient over the rejection class. This results in (Nt + 1) image presentations per training epoch (one for each centered exemplar and one noise exemplar). The relative weighting of the gradients is controlled with the parameter β.

A further modification to the training algorithm yields an interesting result.20 If the rows of the matrix W1 representing the input layer connectivity are required to be orthogonal, that is,

$$
W_1 W_1^{T} =
\begin{bmatrix} h_1^{T} \\ h_2^{T} \end{bmatrix}
\begin{bmatrix} h_1 & h_2 \end{bmatrix}
=
\begin{bmatrix} \sigma_{h_1}^2 & 0 \\ 0 & \sigma_{h_2}^2 \end{bmatrix},
$$

where $h_1, h_2 \in \Re^{N_1 N_2 \times 1}$, then the projected random variables will also be orthogonal when the input sequences are white. This can be shown mathematically as

$$
E\big\{ (h_1^{T} x)(h_2^{T} x)^{T} \big\} = h_1^{T} E\{ x x^{T} \} h_2 = h_1^{T} I h_2 = h_1^{T} h_2 = 0. \tag{3}
$$
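Equation (3) is easy to check numerically; a small sketch, building orthogonal rows with a QR decomposition (our choice) and white Gaussian inputs:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 256                                  # stands in for N1*N2

# Build two orthogonal rows for W1 via a QR decomposition.
Q, _ = np.linalg.qr(rng.normal(size=(N, 2)))
h1, h2 = Q[:, 0], Q[:, 1]

# For white inputs, E{x x^T} = I, so E{(h1.x)(h2.x)} -> h1.h2 = 0.
xs = rng.normal(size=(5000, N))
print(np.mean((xs @ h1) * (xs @ h2)))    # approximately zero, per eq. (3)
```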

If pre-whitening of the inputs is not done, then equation (3) holds only if the orthogonality condition is enforced and one of the vectors {h1, h2} is an eigenvector of the rejection-class autocorrelation matrix. In other reporting15,20 it was observed that the linear solution is a strong attractor; enforcing orthogonality on the input layer increases the likelihood of avoiding this solution. It is possible that the linear solution is the optimal solution, but there are better methods for computing linear discriminant functions.6,8,9,11

4. EXPERIMENTAL RESULTS

At this point we describe experimental results using the nonlinear extension described above on 35 GHz fully polarimetric ISAR data. For our experiments we use turn-table target data which can be separated into three classes of vehicle: A, B, and C. Within classes A and B there are two target vehicles, while in class C there are three vehicles. Examples of the class and vehicle types are shown in figure 3. The data for all but vehicle 1 in class B were taken at a nominal depression angle of 20 degrees; the depression angle for vehicle 1 in class B was 15 degrees. Prior to filtering, all targets are processed with a polarimetric whitening filter19 (PWF) and then logarithmically scaled. Target exemplars were sampled at intervals of 0.5 degrees of aspect angle over the range 30 to 60 degrees. Training exemplars were taken at 2.0-degree intervals of aspect, resulting in 16 training exemplars per vehicle and 45 testing exemplars. Examples of the imagery used are shown in figure 3.


Figure 3. 35 GHz ISAR images, by class and vehicle number.

In these experiments we wish to compare discrimination between classes and discrimination within classes while maintaining the properties of low output energy and a narrow constrained peak about the center of the output plane. We first consider the output of the MACE filter as compared to the nonlinear variation. Two qualities are of interest: the variance in the output image plane and the presence of a sharp, narrow constrained peak at the center for targets in the recognition class of the filter. Vehicle 1 in class A will be used as an example. A MACE filter was computed using the training exemplars from this vehicle's data set. Two other filters using the architecture of figure 2, referred to herein as NL-MACE filters, were also computed using gradient descent. The first NL-MACE filter was trained exhaustively over all training exemplars and the entire output plane; the second was trained using white-noise sequences at the input to the MLP together with the centered training exemplars. Typical outputs of these filters are shown in figure 4. In the figure, adapting the NL-MACE filter coefficients exhaustively yields a noticeably reduced variance in the output plane as compared to the MACE filter outputs for the same target exemplars. The output plane variance for the noise-trained NL-MACE filter is reduced as well, although not as much as in the exhaustively trained case. The quantitative differences are shown in table 1.

TABLE 1. Quantitative comparison of the MACE filter and NL-MACE filters for vehicle 1 in class A.

                                             MACE filter   NL-MACE (exhaustive)   NL-MACE (noise)
output plane variance                        0.00174       0.00035                0.00077
peak constraint error (training exemplars)   0.0           0.001                  0.023
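The two metrics of Table 1 can be estimated directly from an output plane; a sketch, in which the extent of the excluded central peak region is our assumption:

```python
import numpy as np

def output_plane_metrics(out_plane, d=1.0, peak_halfwidth=2):
    """Output plane variance and squared peak-constraint error (a sketch).

    out_plane : (N1, N2) filter output with the constrained peak at the center
    """
    N1, N2 = out_plane.shape
    c1, c2 = N1 // 2, N2 // 2
    mask = np.ones((N1, N2), dtype=bool)
    mask[c1 - peak_halfwidth:c1 + peak_halfwidth + 1,
         c2 - peak_halfwidth:c2 + peak_halfwidth + 1] = False
    variance = np.var(out_plane[mask])             # output plane variance
    peak_error = (out_plane[c1, c2] - d) ** 2      # peak constraint error
    return variance, peak_error
```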

The desirable properties of the MACE filter (i.e., sharp correlation peak, low output plane variance) appear to have been preserved in the NL-MACE. In practice, however, the real comparison of these filters should be with regard to discrimination performance over several vehicle and target classes; this issue is addressed later.

Another issue, the number of training iterations, is of importance for the NL-MACE filters. When using exhaustive training, N1N2Nt images (each training exemplar at all possible shifts) are presented to the MLP for each epoch (training cycle over all images), while in the noise-training case only Nt + 1 images (one for each training exemplar and one white-noise image) are used per epoch. Convergence time should therefore be compared on the basis of training images presented. Using this metric, the normalized mean square error (NMSE) over the training images (including the rejection class) is plotted in figures 5 and 6 for both filters as a function of the number of images presented during training. The difference in convergence is significant when measured this way: approximately 2000 to 4000 image presentations for noise training versus nearly 80,000 image presentations for exhaustive training. Aside from the marginal differences in peak constraint error (over the training images) and output plane variance, the classification/discrimination performance of the two filters was nearly identical.
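A single noise-training epoch therefore reduces to Nt + 1 presentations. A sketch of the presentation schedule, with a hypothetical `step` callable standing in for one backpropagation update:

```python
import numpy as np

def noise_training_epoch(step, exemplars, d, beta, rng):
    """One epoch of noise training: Nt + 1 image presentations (a sketch).

    step : hypothetical callable(x, target, weight) performing one
           backpropagation update on weight * (g(alpha, x) - target)^2.
    exemplars : list of Nt centered, pre-whitened, vectorized exemplars
    """
    # One white-noise image serves as a noisy estimate of the gradient
    # over the whole (pre-whitened) rejection class.
    noise = rng.normal(size=exemplars[0].shape)
    step(noise, 0.0, beta)
    # The Nt centered exemplars enforce the peak constraints.
    for xi, di in zip(exemplars, d):
        step(xi, di, 1.0 - beta)
```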


Figure 4. Sample outputs of the MACE filter (top) and of the NL-MACE filter with exhaustive training (middle) and noise training (bottom), for a training (left) and a testing (right) exemplar of vehicle 1 in class A.


Figure 5. Learning curve for exhaustive training of the NL-MACE over the entire output plane.

Figure 6. Learning curve for training of the NL-MACE with centered exemplars only and white noise as the rejection class.

4.1 Classification/Discrimination Experiments

As stated, the comparison of the MACE and NL-MACE filters should be made on the basis of discrimination and generalization. Experiments were performed to illustrate such differences. Since the classification/discrimination performance did not differ significantly between the exhaustively trained NL-MACE and the NL-MACE trained with white noise, the experiments report results only for the latter type of filter. The following rules were used to determine target classification.


1. The minimum threshold for detection/classification was set to twice the maximum level, excluding the central peak region, found in the output image plane over the training exemplars (of the training vehicle), or the minimum peak response over the non-training exemplars (of the training vehicle), whichever was higher. In essence, unless there is a spurious peak away from the center of the output image over the training exemplars, the threshold is set for 100% detection of the training vehicle.

2. If more than one filter was being used, the classification went to the filter class with the maximum response if it exceeded the threshold for that filter; otherwise the target was not considered to be detected/classified.

3. If the class of the target under test corresponded to the filter class, the detection peak had to occur in the 5x5-pixel region surrounding the origin in order for the target to be detected/classified.

4. If the class of the target under test did not correspond to the filter class, the target was considered detected/misclassified if the threshold was exceeded anywhere in the output image plane (note: this happened only twice, and in both cases the peak was within a 10x10-pixel region about the origin).

The first experiment was designed to compare the inter-class discrimination/detection capabilities of the MACE filter versus the NL-MACE filter. Three filters were trained, one for each of the vehicle classes. The training exemplars from a single vehicle in each class were used to train both the MACE and NL-MACE filters: vehicle 1 in class A, vehicle 2 in class B, and vehicle 3 in class C. All remaining non-training exemplars were used to test the detection performance of the filters. This left 45 non-training exemplars for the training vehicles and 61 for the non-training vehicles. Table 2 details the detection/classification results. In all cases, the non-training exemplars corresponding to the training vehicle were detected and classified for both the MACE and NL-MACE filters.

The experiment bears some other interesting results. In particular, vehicle 2 in class A has been modified to reduce the radar backscatter; as a result the intensity of the ISAR image is also reduced. In this case the MACE filter was able to detect/classify only 7 images of this vehicle while incorrectly classifying 9. The NL-MACE, on the other hand, detected and classified 25 images of this vehicle while incorrectly classifying only 3. This tends to support the idea that the NL-MACE has improved dynamic range.

In the second experiment class A is used as a confusion class. The same filters are used for class B and class C. The results of this experiment are shown in table 3 and are also of interest. The detection/classification results for classes B and C do not change from the first experiment since none of these target images were misclassified as class A. The MACE filter is seen to pick up 43 more misclassifications from the confusion class, while the number of misclassifications for the NL-MACE filter increases by only 23. The majority of new misclassifications occur in class C for both filters.

The final experiment was designed to illustrate intra-class discrimination. Class C is used for this experiment. A new filter is computed for vehicle 2 in class C. The results of this experiment are shown in table 4. Recall that since vehicles 2 and 3 were used to train the filters there are only 45 possible detections for these vehicles, while the other vehicles include 61 images.
The results of this experiment appear to demonstrate improved discrimination in the NL-MACE. Only 8 targets from the non-training exemplars of vehicles 2 and 3 in class C were misclassified, while in the case of the MACE filter 13 were misclassified. The misclassifications of the confusion targets from classes A and B drop only slightly with the addition of the new filter, indicating that there is not much difference between the two vehicles. The classification of vehicle 1 in class C is not easily interpreted, as there are similarities between both vehicles. The overall detection performance of the MACE filter did improve, as 8 more images of vehicle 1 were detected.

TABLE 2. Confusion matrix for the MACE and NL-MACE filters when trained over class A (vehicle 1), class B (vehicle 2), and class C (vehicle 3).

Vehicle class          A    A    B    B    C    C    C
Vehicle number         1    2    1    2    1    2    3

MACE class A          45    7    0    0    0    0    0
MACE class B           0    1   50   45    1    0    0
MACE class C           0    8    0    0   49   52   45
Unclassified           0   45   11    0   11    9    0
Number of targets     45   61   61   45   61   61   45

NL-MACE class A       45   25    0    0    0    0    0
NL-MACE class B        0    3   59   45    1    0    0
NL-MACE class C        0    0    0    0   59   57   45
Unclassified           0   33    2    0    1    4    0
Number of targets     45   61   61   45   61   61   45

TABLE 3. Confusion matrix for the MACE and NL-MACE filters when trained over class B (vehicle 2) and class C (vehicle 3), using class A as a confusion class.

Vehicle class          A    A    B    B    C    C    C
Vehicle number         1    2    1    2    1    2    3

MACE class B           1    1   50   45    1    0    0
MACE class C          41   11    0    0   49   52   45
Unclassified          19   49   11    0   11    9    0
Number of targets     61   61   61   45   61   61   45

NL-MACE class B        1    1   59   45    1    0    0
NL-MACE class C       20    4    0    0   59   57   45
Unclassified          40   56    2    0    1    4    0
Number of targets     61   61   61   45   61   61   45


TABLE 4. Confusion matrix for the MACE and NL-MACE filters for intra-class discrimination in class C, using filters trained for class C vehicle 2 and class C vehicle 3.

Vehicle class                A    A    B    B    C    C    C
Vehicle number               1    2    1    2    1    2    3

MACE class C, vehicle 2      2    1    0    0   19   33    1
MACE class C, vehicle 3     39   11    1    2   38   12   44
Unclassified                20   49   60   59    4    0    0
Number of targets           61   61   61   61   61   45   45

NL-MACE class C, vehicle 2   0    0    0    1   24   38    1
NL-MACE class C, vehicle 3  18    4    0    0   35    7   44
Unclassified                43   57   61   60    2    0    0
Number of targets           61   61   61   61   61   45   45

5. CONCLUSIONS

We have presented results using a recently proposed extension of the MACE filter to nonlinear signal processing. The extension is necessarily rooted in adaptive training methods, since globally optimal analytic solutions to nonlinear optimization problems are difficult in general. Since any extension of the MACE filter must consider the entire output image plane, nonlinear adaptive methods would ordinarily lead to exhaustive training over the entire output image plane. A significant result was the improvement in training convergence speed when white-noise exemplars were used to represent the rejection class, which circumvented the need to train exhaustively. This characterization of the rejection class was possible as a consequence of the pre-whitening filter. It should be possible to extend to nonlinear signal processing, in the fashion we have described, any filter that can be decomposed as a linear pre-processor followed by a linear associative memory.

The experimental results demonstrate that the approach shows promise for the ATR/D problem. The NL-MACE demonstrated improved discrimination/classification and generalization while maintaining the desirable properties of the MACE filter: a sharp centralized peak near the origin of the output image and minimum energy elsewhere. It should be noted that the experiments were performed on high-quality ISAR imagery and that further experimentation is needed in order to characterize the performance of the NL-MACE in target-plus-clutter environments. We are currently pursuing this line of research.

6. ACKNOWLEDGMENTS

This research was partially supported by ARPA grant N60921-93-C-A335.


7. REFERENCES

1. B. V. K. Vijaya Kumar, "Tutorial survey of composite filter designs for optical correlators," Appl. Opt. 31, no. 23, 4773-4801, 1992.
2. C. F. Hester and D. Casasent, "Multivariant technique for multiclass pattern recognition," Appl. Opt. 19, 1758-1761, 1980.
3. A. Mahalanobis, B. V. K. Vijaya Kumar, and D. Casasent, "Minimum average correlation energy filters," Appl. Opt. 26, no. 17, 3633-3640, 1987.
4. B. V. K. Vijaya Kumar, "Minimum variance synthetic discriminant functions," J. Opt. Soc. Am. A 3, no. 10, 1579-1584, 1986.
5. T. Kohonen, Self-Organization and Associative Memory (1st ed.), Springer Series in Information Sciences, vol. 8, Springer-Verlag, 1988.
6. Ph. Réfrégier and J. Figue, "Optimal trade-off filter for pattern recognition and their comparison with the Wiener approach," Opt. Comp. Proc. 1, 3-10, 1991.
7. J. Fisher and J. C. Principe, "Formulation of the MACE filter as a linear associative memory," Proc. of the IEEE Int. Conf. on Neural Networks, vol. 5, p. 2934, 1994.
8. J. Fisher and J. C. Principe, "A performance comparison of the MACE filter to a simple nonlinear extension," Proceedings of the Joint Automatic Target Recognizer Systems and Technology Conference IV, Monterey, CA, August 1994.
9. L. M. Novak, M. C. Burl, and W. W. Irving, "Optimal polarimetric processing for enhanced target detection," IEEE Trans. Aerospace and Electronic Systems, vol. 29, p. 234, 1993.
10. J. Fisher and J. C. Principe, "A nonlinear extension of the MACE filter," currently under review, 1994.

