IEEE JOURNAL OF OCEANIC ENGINEERING, VOL. 24, NO. 4, OCTOBER 1999
A Comparison of Inter-Frame Feature Measures For Robust Object Classification in Sector Scan Sonar Image Sequences Ioseba Tena Ruiz, David M. Lane, and Mike J. Chantler
Abstract—This paper presents an investigation of the robustness of an inter-frame feature measure classifier for underwater sector scan sonar image sequences. In the initial stages, the images are of either divers or remotely operated vehicles (ROV's). The inter-frame feature measures are derived from sequences of sonar scans to characterize the behavior of the objects over time. The classifier has been shown to produce error rates of 0%–2% using real, noise-free images. The investigation looks at the robustness of the classifier under increased noise conditions and changes in the filtering of the images. It also identifies a set of features that are less susceptible to increased noise conditions and changes in the image filters. These features are the mean, variance, and the variance of the rate of change in time of the intra-frame feature measures: area, perimeter, compactness, maximum dimension, and the first and second invariant moments of the objects. It is shown how the performance of the classifier can be improved. Success rates of up to 100% were obtained for a classifier trained under normal noise conditions, with a signal-to-noise ratio (SNR) around 9.5 dB, and a noisy test sequence with an SNR of 7.6 dB. Index Terms—Robust classification, remotely operated vehicles, sonar images.
I. INTRODUCTION
SECTOR SCAN sonars are widely used as sensors for unmanned underwater vehicles (UUV's), with application in vehicle navigation, obstacle avoidance, and general inspection/survey of the surrounding environment. An ability to automatically detect and classify objects observed by such sonars would be useful in all three applications. For terrain-based vehicle navigation [1], it would assist with the correct re-acquisition of previously detected objects, thus bounding increasing navigation errors. For obstacle avoidance [2], it provides more reliable object detection and additional information on possible object motion. For inspection and survey, it would allow interesting objects to be discriminated and the mission to be adapted on-line and progress accordingly. Such a detection/classification capability cannot currently be acquired in the public/commercial domain, and hence our objective
Manuscript received December 29, 1998; revised August 5, 1999. This work was supported by the Centre for Marine and Petroleum Technology (CMPT) acting as agents for the UK Engineering and Physical Sciences Research Council (EPSRC) in the Project CLASS, Robust Classification of Sector Scan Sonar Image Sequences 1993-97, under Grant GR/J17012. The authors are with the Ocean Systems Laboratory, Department of Computing & Electrical Engineering, Heriot-Watt University, Edinburgh EH14 4AS, U.K. Publisher Item Identifier S 0364-9059(99)0884507.
Fig. 1. Overall system.
is to research possible algorithms and approaches leading to enhanced operational performance. Our previous research (on which this investigation is based) presented an inter-frame feature measure classifier (previously known as a temporal feature measure classifier) [3]–[7], which performed well with a limited set of real data. However, the image data used consisted of “clean” sonar returns with no significant noise. In real-world conditions, noise plays a limiting role in the performance of any system. Similarly, care was taken in object segmentation to minimize the effects on classification performance. In this paper, we therefore examine the robustness of the system with changing noise conditions and segmentation errors, to better evaluate the usefulness of these methods. Fig. 1 illustrates the main aspects of the system. Sonar image data is filtered and a binary segmentation performed to identify objects. Characteristic features are then obtained from the binary and original images. During training, these are used alongside a priori knowledge of object identities to configure the classifier. During testing, they are used by the classifier to identify objects at the output. The filtering and segmentation, feature extraction, feature selection, and classifier will be examined in the following sections. The goal of the investigation is to observe the robustness of this system to changes in noise or segmentation and to attempt to improve the classifier performance. To achieve this, we first observe the performance of the classifier as described in [4], with no alterations made. We then observe the robustness of
0364–9059/99$10.00 1999 IEEE
the classification process to errors in each individual feature measure. Finally, we examine the performance of the classifier for changes in noise or segmentation, using only those features identified as being robust. The sequences of images were obtained during trials at Oban on the west coast of Scotland. The sonar used was the SeaBat 6012. This sonar has a sector size of 90° by 15°. The sonar head contains all the solid-state electronics required to form and transmit acoustic pulses at 455 kHz and to receive the returned energy into 60 electronically formed beams of 1.5° each. The video output of the processor is in RGB, Y/C (S-Video), or composite, PAL or NTSC format. During trials, the images were recorded in S-Video format and then extracted into a file at a rate of five frames per second. The finished product is a sequence of 8-bit grayscale images. The SeaBat's range was set at 10 m and the sonar was placed 1 m above the seabed in a water depth varying between 5 and 9 m (due to tides). The objects observed were both divers and a remotely operated vehicle (ROV) (Hyball, marketed by Hydrovision). Section II describes the segmentation, feature extraction, and classification stages. Section III then describes the experiments that analyze the system's performance, both in the presence of increased noise conditions and with changes to the segmentation threshold. In Section III, we also present a reduced set of robust features and the performance of the system using only those features.
II. THEORETICAL BACKGROUND

A. Literature Review

The research on which this study is based initially used grayscale and shape descriptors derived from single sonar scans to classify targets [5]–[7]. Problems were experienced with objects that change significantly over time, such as divers (Fig. 2). However, it was also noted that the ways in which a sonar scan changes over time provide important cues for the identification of targets. A new set of inter-frame features was therefore developed [4] that quantitatively describes the behavior of an object over a sequence of scans. This new system was observed to give classification errors between 1% and 2%, and sometimes even zero. However, it has become evident that for a system to be useful outside the laboratory it must be able to succeed in the face of varying conditions; in our case, these are noise and changes to the segmentation. For robust classification in general, several approaches have previously been reported. In [8], emphasis was placed on feature selection, using a sub-optimal Sequential Backward Search (SBS) algorithm which incorporates Genetic Algorithms (GA's). The use of context-sensitive features was outlined in [9], which showed by example that contextual information can increase the accuracy of classifiers. In [10], an approach for interpreting infrared images with the aid of thermophysical invariant features is presented, the aim being to identify changes in an observed site. We also appreciate the
Fig. 2. (a)–(d) Diver changing shape in time. Returns from a diver taken at five frames per second.
importance of features and have placed considerable effort into finding a robust set for classification. Another approach to robust classification has been the use of artificial neural networks. Hierarchical neural networks are proposed in [11] as an architecture for the classification of multiresolution invariant image representations. The results are obtained from simulations, but seem promising. In [12], neural networks are used for the classification of one-dimensional time-series information from a sonar. The authors show good classification results for a signal-to-noise ratio (SNR) of 5 dB or above with a time-delay neural network (TDNN). This compares with their adaptive spatio-temporal recognizer (ASTER), which performs better under noisier conditions but requires more experimentation. Another interesting development was presented in [13], in which a strategy named computerized consensus diagnosis (CCD) was proposed. Its purpose is to provide robust classification of biomedical data. The strategy involves the cross-validated training of several classifiers of diverse conceptual and methodological origin on the same data and the appropriate combination of their outcomes. The authors showed how CCD gives better and more reliable predictions than any individual classification method on its own. For our application, the main drawback is the processing power required to produce the result. For sonar images, a recent paper [14] addressed a system that recognizes manmade objects and estimates their two-dimensional attitude. It shows a certain robustness to noise, achieving good results for an SNR of 10 dB. Although it identifies manmade objects, no distinctions are made between them. Their approach was similar to that of [15], which reported a vision system for industrial scenes using a polygonal approximation of the object silhouette. The authors then pro-
posed a two-stage matching algorithm. In the first stage, they generate hypotheses to assign images to model polygons and also to the object's pose. Corresponding continuous measures of similarity are derived from the turning functions of the curves. In the second stage, compatible matches of polygons are collected by using a voting scheme in transformation space. They show robustness to noise, broken image contours, and partial occlusion of objects. Our principal effort here is to underpin this kind of approach by investigating the robustness of an established image processing and classification method, thus establishing the need for further sophistication. The following sections therefore present more detailed descriptions of the filtering and segmentation, the feature measures, and the classifier which are the basis for this study.
B. Filtering and Segmentation

With reference to Fig. 1, filtering and segmentation are used to identify significant returns in the sonar image. The emphasis of this investigation is the performance of the classifier, and hence the filtering and segmentation have been kept simple, consisting of preprocessing filters, segmentation, region growing, and size filtering. Initially, a median filter of size 9 × 9 is used to eliminate much of the salt-and-pepper noise. For segmentation, a single threshold filter of value 20 is used to identify the significant targets. This threshold yields a binary image. The binary image is then used as the seed for a region-growing operation

$$P(R) = \text{TRUE} \quad \text{for all pixels } (i,j) \in R \tag{1}$$

where all the pixels in some region $R$ share some property $P$. For our purposes, the chosen property is that pixels must be equal in value and connected, i.e., four neighbors. Once this operation has been performed, those regions which lie close to each other are joined to form a single region by applying a 5 × 5 kernel. Finally, a size filter is applied and those regions with fewer than 20 × 20 pixels are removed. The significant returns are tracked throughout the sequence of scans to provide sequences of objects whose features can be calculated. For the purpose of this experiment, the tracking is performed manually. However, we have also developed an automatic tracking mechanism [3], [5].

C. Feature Measures

The classifier uses inter-frame feature measures [2] (previously known as temporal feature measures) to describe the behavior of an object over a sequence of frames. Intra-frame feature measures (previously known as static feature measures) are used to derive their inter-frame counterparts. Intra-frame feature measures are gray-level and shape descriptors of the targets for a single scan. These features were previously found to give the best classification performance with a fixed segmentation threshold and no added noise. However, a few changes have been introduced in the feature measures used, and these are described in the following sections.

1) Intra-Frame Feature Measures: The intra-frame feature measures used here are taken from [1], with the exception of major axis, minor axis, eccentricity, and orientation, which are no longer used. In their place, two other features have been introduced: the first and the second invariant moments of the objects [14]. The intra-frame feature measures are obtained from a single frame and, in conjunction with the measures taken in subsequent frames, are used to calculate the inter-frame feature measures. The intra-frame features are listed and explained below.
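Before the feature definitions, the filtering and segmentation chain of Section II-B can be sketched as follows. This is an illustrative reimplementation using SciPy, not the authors' code; in particular, morphological closing with a 5 × 5 structuring element stands in for the region-joining step, which is an assumption.

```python
import numpy as np
from scipy import ndimage

def segment_sonar_frame(img, threshold=20, min_size=20 * 20):
    """Sketch of the filtering/segmentation chain of Section II-B."""
    # 9x9 median filter to suppress salt-and-pepper noise.
    smoothed = ndimage.median_filter(img, size=9)
    # Single-threshold segmentation to a binary image.
    binary = smoothed > threshold
    # Join nearby regions (the paper's 5x5 kernel; closing is an assumption).
    joined = ndimage.binary_closing(binary, structure=np.ones((5, 5), bool))
    # Size filter: keep 4-connected regions of at least 20x20 = 400 pixels.
    labels, n = ndimage.label(joined)
    sizes = np.asarray(ndimage.sum(joined, labels, range(1, n + 1)))
    keep_labels = 1 + np.flatnonzero(sizes >= min_size)
    return np.isin(labels, keep_labels)

# A bright 35x35 target on a dark background survives the size filter.
frame = np.zeros((100, 100), dtype=np.uint8)
frame[30:65, 30:65] = 120
mask = segment_sonar_frame(frame)
```

The returned boolean mask plays the role of the binary image from which the intra-frame features below are computed.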
1) Area: This is the surface area of the object, defined as

$$\text{Area} = \sum_{i}\sum_{j} B(i,j) \tag{2}$$

where $B(i,j)$ has a value of one for a pixel in the object and zero if not.

2) Perimeter: Two pixels are four neighbors if they share a common boundary. The boundary of an object consists of those pixels that have four neighbors in the background. The perimeter can then be defined as the number of boundary pixels.

3) Compactness: The compactness of a continuous geometric figure is measured by the isoperimetric inequality

$$\frac{P^{2}}{4\pi A} \geq 1 \tag{3}$$

where $P$ is the perimeter and $A$ is the area. In this research, the algorithm used is

$$\text{Compactness} = \frac{P^{2}}{4\pi A} \tag{4}$$

A circle is the most compact figure, i.e., it has the smallest compactness value.

4) Maximum dimension: The maximum dimension is defined as

$$\text{Maximum Dimension} = \max_{k,l}\sqrt{(x_{k}-x_{l})^{2} + (y_{k}-y_{l})^{2}} \tag{5}$$

where $k$ and $l$ are indices for the boundary pixels and $(x_{k}, y_{k})$ and $(x_{l}, y_{l})$ are the coordinates of those pixels.

5) Mean: This feature measures the average gray-level value of an object. The output of the binary threshold filter is used as a mask on the original image. The resulting object is then analyzed,

$$\text{Mean} = \frac{1}{N}\sum_{(i,j)\in O} g(i,j) \tag{6}$$

where $g(i,j)$ is the gray-level value of the pixels and $N$ is the total number of pixels in the object.

6) Variance: The variance of the gray levels in the object is

$$\text{Variance} = \frac{1}{N}\sum_{(i,j)\in O} \left(g(i,j) - \text{Mean}\right)^{2} \tag{7}$$
7) Contrast: The contrast of the object is measured by comparing the mean of the object to the mean of the background

$$\text{Contrast} = \frac{\text{Object Mean} - \text{Background Mean}}{\text{Resolution}} \tag{8}$$

where Background Mean is

$$\text{Background Mean} = \frac{1}{N_{b}}\sum_{(i,j)\in B} g(i,j) \tag{9}$$

where the pixels belonging to $B$ are those which lie in the background, i.e., are four neighbors with the boundary pixels, and $N_{b}$ is the number of pixels belonging to $B$. Also, Object Mean is

$$\text{Object Mean} = \frac{1}{N}\sum_{(i,j)\in O} g(i,j) \tag{10}$$

where $N$ is the number of pixels belonging to the object and the pixels belonging to $O$ are those whose coordinates lie within the object region. Resolution is the number of gray levels for any pixel, in this case 255; this gives a value for the contrast between 0 and 1.

8) First and second invariant moment: These features are derived from the second-order normalized central moments of the objects and are invariant to translation, rotation, and scale [17]. The moments are obtained as
$$m_{pq} = \sum_{x}\sum_{y} x^{p} y^{q} f(x,y) \tag{11}$$

and the central moments are then

$$\mu_{pq} = \sum_{x}\sum_{y} (x-\bar{x})^{p} (y-\bar{y})^{q} f(x,y) \tag{12}$$

where

$$\bar{x} = \frac{m_{10}}{m_{00}}, \qquad \bar{y} = \frac{m_{01}}{m_{00}} \tag{13}$$

The normalized central moments, denoted by $\eta_{pq}$, are then

$$\eta_{pq} = \frac{\mu_{pq}}{\mu_{00}^{\gamma}} \tag{14}$$

where

$$\gamma = \frac{p+q}{2} + 1 \tag{15}$$

Finally, the first ($\phi_{1}$) and second ($\phi_{2}$) invariant moments can be derived as

$$\phi_{1} = \eta_{20} + \eta_{02} \tag{16}$$

$$\phi_{2} = (\eta_{20} - \eta_{02})^{2} + 4\eta_{11}^{2} \tag{17}$$

2) Inter-Frame Feature Measures: Four inter-frame feature measures are employed: the mean value of the feature, the variance of the feature, the mean rate of change of the feature, and the variance of the rate of change of the feature. The percentage rate of change and the standard deviation of the percentage rate of change used in [4] are not employed. Since each intra-frame feature now has four inter-frame counterparts, the set of features used in the training (explained in the following section) consists of 36 inter-frame feature measures, as opposed to the 66 features previously reported. The intra-frame feature measures are calculated at a rate of one every 2 s to produce the inter-frame feature measures. Ten intra-frame feature values are used to produce a single inter-frame feature measure. No time averaging has been performed. These parameters have shown greater classification success, but a detailed discussion of this falls beyond the scope of this investigation. The inter-frame feature measures are listed and explained below.

1) Mean: The mean value of the intra-frame feature is

$$\bar{F} = \frac{1}{N}\sum_{i=1}^{N} F_{i} \tag{18}$$

where $N$ is the number of scans or images (ten in our case) and $F_{i}$ is the value of the intra-frame feature at scan $i$.

2) Variance: The variance of the intra-frame feature is

$$\frac{1}{N}\sum_{i=1}^{N} \left(F_{i} - \bar{F}\right)^{2} \tag{19}$$

where $\bar{F}$ is the inter-frame mean for the same intra-frame feature.

3) Mean rate of change: The mean of the rate of change of the intra-frame feature is

$$\overline{\Delta F} = \frac{1}{N-1}\sum_{i=1}^{N-1} \left(F_{i+1} - F_{i}\right) \tag{20}$$

4) Variance of the rate of change: The variance of the rate of change of the intra-frame feature is

$$\frac{1}{N-1}\sum_{i=1}^{N-1} \left(F_{i+1} - F_{i} - \overline{\Delta F}\right)^{2} \tag{21}$$
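To make the intra-/inter-frame pipeline concrete, here is a minimal sketch (an illustrative reimplementation, not the authors' code; the $P^2/(4\pi A)$ compactness normalization and the first-difference rate of change are assumptions consistent with (4) and (20), and the invariant moments are omitted for brevity):

```python
import numpy as np

def intra_frame_features(mask, gray):
    """Per-scan descriptors, eqs. (2)-(7): area, perimeter, compactness,
    maximum dimension, and gray-level mean and variance."""
    area = int(mask.sum())                                   # eq. (2)
    padded = np.pad(mask, 1)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                padded[1:-1, :-2] & padded[1:-1, 2:])
    boundary = mask & ~interior          # 4-neighbor boundary pixels
    perimeter = int(boundary.sum())
    ys, xs = np.nonzero(boundary)
    pts = np.stack([xs, ys], 1).astype(float)
    d = pts[:, None, :] - pts[None, :, :]
    vals = gray[mask].astype(float)
    return np.array([area,
                     perimeter,
                     perimeter ** 2 / (4 * np.pi * area),    # eq. (4)
                     np.sqrt((d ** 2).sum(-1)).max(),        # eq. (5)
                     vals.mean(),                            # eq. (6)
                     vals.var()])                            # eq. (7)

def inter_frame_measures(seq):
    """Eqs. (18)-(21) for a sequence of intra-frame feature vectors:
    mean, variance, and mean/variance of the first difference in time."""
    f = np.asarray(seq, float)
    rate = np.diff(f, axis=0)            # F_{i+1} - F_i
    return {"mean": f.mean(0), "variance": f.var(0),
            "mean_rate": rate.mean(0), "variance_rate": rate.var(0)}

# Ten scans of a square object whose side grows by one pixel per scan.
seq = []
for k in range(10):
    m = np.zeros((60, 60), dtype=bool)
    m[10:20 + k, 10:20 + k] = True
    g = np.full((60, 60), 100, dtype=np.uint8)
    seq.append(intra_frame_features(m, g))
stats = inter_frame_measures(seq)
```

For this toy sequence, the mean-rate-of-change of the area is positive (the object grows) while the gray-level descriptors are constant, which is exactly the kind of temporal behavior the inter-frame measures are designed to capture.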
D. Classifier

The classifier performs supervised classification. It uses discriminant functions, which assign sequences of significant returns to object classes that are known in advance. We have used a linear discriminant function that uses class statistics derived from our training data. This classifier has shown good results in the past [1]. The discriminant function and the feature selection algorithm are briefly explained in this section.
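A minimal sketch of such a classifier follows (an illustrative reimplementation; the squared-Mahalanobis form of the score under a pooled covariance, and pooling by averaging the per-class covariance matrices, are assumptions consistent with the description of the discriminant function in this section):

```python
import numpy as np

class LinearDiscriminantClassifier:
    """Supervised classifier: class statistics are learned from training
    feature vectors, and an unknown object is assigned to the class with
    the lowest discriminant score."""

    def fit(self, X_by_class):
        # X_by_class: {class_label: (n_samples, n_features) array}
        self.means = {c: X.mean(axis=0) for c, X in X_by_class.items()}
        # Pooled covariance: mean of the per-class covariance matrices.
        covs = [np.cov(X, rowvar=False) for X in X_by_class.values()]
        self.pooled_inv = np.linalg.inv(np.mean(covs, axis=0))
        return self

    def score(self, x, c):
        d = x - self.means[c]
        return float(d @ self.pooled_inv @ d)   # lowest score wins

    def predict(self, x):
        return min(self.means, key=lambda c: self.score(x, c))

# Two well-separated 2-D classes standing in for diver/ROV feature vectors.
rng = np.random.default_rng(0)
diver = rng.normal([0.0, 0.0], 0.1, size=(30, 2))
rov = rng.normal([5.0, 5.0], 0.1, size=(30, 2))
clf = LinearDiscriminantClassifier().fit({"diver": diver, "rov": rov})
```

In practice the feature vectors would be the selected inter-frame feature measures rather than synthetic Gaussians.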
1) Discriminant Functions: Discriminant functions $d_c$ are generated for each class $c$, and the classification rule used is simply to assign the object with feature vector $\mathbf{x}$ to the class with the lowest discriminant score $d_c(\mathbf{x})$. The linear discriminant is defined as

$$d_c(\mathbf{x}) = (\mathbf{x} - \mathbf{m}_c)^{T}\,\mathbf{C}^{-1}\,(\mathbf{x} - \mathbf{m}_c) \tag{22}$$

where:
• $\mathbf{C}^{-1}$ is the inverse of the pooled variance/covariance matrix, whose entries are the means of the corresponding entries of the variance/covariance matrices of each class; $\mathbf{C}_c$ is the $p$ by $p$ variance/covariance matrix of features for class $c$;
• $\mathbf{m}_c$ is the $p$-element vector of feature measure means for class $c$;
• $p$ is the number of feature measures contained within the column feature vector $\mathbf{x}$;
• $T$ signifies the transpose of the matrix.

The implementation is straightforward. The training set is selected. Feature vectors are generated for each of the training set's objects. The statistics for each class can then be generated from these vectors. The discriminant function is then applied to the feature vector of an unknown object, and the object is assigned to the class with the lowest discriminant score. The feature measures to be included in the feature vector are selected by the use of a feature selection algorithm, which is explained below.

2) Feature Selection: Since there are 36 inter-frame feature measures, it is not practical to assess every possible feature subset, which would require $2^{36}$ classification experiments. A suboptimal feature selection method was therefore used, known as Sequential Forward Selection (SFS) or stepwise feature selection [15]. It works by adding one feature at a time to the existing feature set. At the start, the feature with the greatest ability to separate object classes by itself is selected, and this becomes the first candidate set. The remaining features are then tested in combination with the first candidate set, and the feature pair giving the best separation estimate is selected as the next candidate feature set. This continues with triples, where two of the variables are already set and are tested in combination with the remaining single features. The process is repeated until all features have been selected. The ordering of the features provided by this method gives no indication as to the number of features that should be used. Extensive tests [16] have shown that the best results are obtained when this number lies between seven and ten, as opposed to the two features indicated in [4]. Our initial approach classified a dynamic target (diver) amongst static targets (pier legs, anchors, etc.); however, our current application classifies two dynamic targets, thus changing the requirements. In the following experiments, the number of features chosen to perform the classification was eight.

A measure of goodness attempts to estimate the probable classification error that would be achieved with a given feature set; it is in effect the ability of a feature set to separate object classes. The measure of goodness used was the Conditional F ratio [17], defined as

$$F = \frac{(n_1 + n_2 - q - 2)\,c\,(D_p^2 - D_q^2)}{(p - q)\left[(n_1 + n_2 - 2) + c\,D_q^2\right]} \tag{23}$$

where

$$c = \frac{n_1 n_2}{n_1 + n_2} \tag{24}$$

and where $n_1$ and $n_2$ are the number of significant returns for classes 1 and 2, $p$ is the number of features in the new set, $q$ is the number of features already selected, and $D_p^2$ and $D_q^2$ are the Mahalanobis [16] distances between classes 1 and 2 in the new feature space and the previously selected feature space, respectively. The Mahalanobis distance $D^2$ is a measure of the difference of the means of two multivariate groups

$$D^2 = (\mathbf{m}_1 - \mathbf{m}_2)^{T}\,\mathbf{C}^{-1}\,(\mathbf{m}_1 - \mathbf{m}_2) \tag{25}$$

where $\mathbf{m}_1$ and $\mathbf{m}_2$ are the means of each class for feature vector $\mathbf{x}$, $T$ signifies matrix transpose, and $\mathbf{C}$ is the variance-covariance matrix.

The conditional F ratio measures the effect of adding one feature to an existing set of features. It is proportional to the ratio of the inter-group sum of squares and the intra-group sum of squares. The conditional F ratio differs from other measures of goodness in that it evaluates individual variables rather than the group of variables as a whole. This makes it particularly appropriate for simple stepwise procedures.

III. EXPERIMENTS

In the following experiments, the robustness of the classifier outlined in the previous sections is tested as noise and changes in the segmentation threshold are introduced. Section III-A will discuss the SNR and observe the level at which the segmentation fails. Sections III-B and III-C will look at the robustness of the existing classifier to changes in the segmentation and to increased noise from surface/seabed clutter. Sections III-D and III-E will try to increase the robustness of the classifier by eliminating those features most sensitive to changes in filtering, segmentation, and noise.

A. SNR and Segmentation Performance

The SNR can be defined as the ratio between the mean power of the object echo and the mean power of the background echo

$$\text{SNR} = 10\log_{10}\frac{\text{Mean Power of Object}}{\text{Mean Power of Background}} \tag{26}$$

and

$$\text{Mean Power of Object} = \frac{1}{N}\sum_{(i,j)\in O} g(i,j)^{2} \tag{27}$$

where $N$ is the number of pixels in the object and the pixels belonging to $O$ are those whose coordinates lie within the
Fig. 3. A typical scan of a diver (top left quadrant). Sonar parameters are as specified in Section I.
Fig. 4. A typical seabed clutter scan multiplied by a scale factor f.
object region. Finally,

$$\text{Mean Power of Background} = \frac{1}{N_{W}}\sum_{(i,j)\in W} g(i,j)^{2} \tag{28}$$

where $N_{W}$ is the total number of pixels contained within a window $W$ of the local background, of size 50 × 50 pixels. A local background window has been chosen so that the signal (diver/ROV) measure is not affected by the rest of the noise in the image.

With these definitions in mind, we wish to find the SNR at which the filtering and segmentation fail. This then defines the upper limit on noise level for the study. Further noise can only be introduced if a more sophisticated segmentation scheme is employed. Two sets of experiments are performed in this section, using image sequences chosen at random and sampled at five frames per second. In the first set of experiments, the filtering and segmentation (with threshold 20) are left untouched. In sequence 1, 16 frames of seabed clutter returns are multiplied by a scale factor f and added to another 16 frames of returns from a diver. In sequence 2, 13 frames of seabed clutter returns are multiplied by a scale factor f and added to 13 frames of returns from an ROV. By adding scaled images in this way, we are able to control the amount of noise and thus make a more reliable assessment of the system's performance. Although not exact, the statistical content of the scaled noise bears some resemblance to that of genuine clutter at similar levels. Finally, in sequence 3, 16 frames of the surface return are multiplied by the same factor and added to 16 frames of returns from a diver. Figs. 3–5 portray this method. For the experiment, the scale factor was increased from zero until the segmentation failed, at which point the SNR was calculated. Table I illustrates the results obtained. In the second set of experiments, the segmentation threshold filter was changed to 30 and then left untouched throughout the experiments. In sequence 1, 16 frames of seabed clutter returns were multiplied by a factor f and added to 16 frames of diver returns.
In sequence 2, 15 frames of surface returns were multiplied by the same factor and added to 15 frames of returns from an ROV. Again, the factor was increased from zero until the segmentation failed. Table II illustrates the results obtained.
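The SNR measure of (26)–(28) can be sketched as follows (an illustrative reimplementation; the 10 log₁₀ dB scaling and the synthetic pixel values are assumptions consistent with the dB figures quoted in the text):

```python
import numpy as np

def snr_db(frame, object_mask, background_window):
    """SNR per eqs. (26)-(28): mean object power over the mean power in
    a local background window, expressed in dB."""
    obj = frame[object_mask].astype(float)
    bg = frame[background_window].astype(float)
    return 10.0 * np.log10((obj ** 2).mean() / (bg ** 2).mean())

# Synthetic frame: background level 10, object level 100 -> 20 dB.
frame = np.full((100, 100), 10.0)
obj_mask = np.zeros((100, 100), dtype=bool)
obj_mask[60:80, 60:80] = True
frame[obj_mask] = 100.0
bg_win = np.zeros((100, 100), dtype=bool)
bg_win[0:50, 0:50] = True        # 50x50 local background window
```

Adding scaled clutter frames to `frame`, as in the experiments above, raises the background power and drives this measure down toward the failure point of the segmentation.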
Fig. 5. The diver and the seabed clutter returns added together.
TABLE I MEASURED SNR FOR WHICH THE SEGMENTATION FAILED, WITH FILTERING AND SEGMENTATION LEFT UNCHANGED AND A THRESHOLD OF 20
TABLE II MEASURED SNR FOR WHICH THE SEGMENTATION FAILED, OBTAINED WITH A THRESHOLD OF 30
Although incrementing the value of the threshold yields better classification results, some diver sequences and, especially, some ROV sequences will not be picked up by the segmentation even with an SNR greater than 8.2 dB. This is because the mean object power is not high enough for the set threshold. In separate experiments, it was also found that the feature values can be affected by changes in the image filters. However, the threshold filter has the most significant effect, and it thus remains the focus of this study.

B. Robustness of the Classifier for Changes in Object Segmentation Threshold

The robustness of the classifier for changes in the segmentation was investigated by observing the classification error
TABLE III PERFORMANCE OF THE CLASSIFIER UNDER VARYING THRESHOLDS, FROM 20 TO 35, WITH A MEAN SNR OF 9.5 dB
for changes in the threshold filter. It is important to know how the classifier performs as the segmentation parameters are changed, to determine whether the classifier would have to be retrained for each different setting. To train the classifier, six sequences of 150 frames in length were employed. This number is the maximum limit, as the digitized sequences are all 150 frames in length. Three of the sequences contained divers and the other three ROV's. For testing, two sequences also 150 frames in length, one of a diver and one of an ROV, were employed. Thus, the training sequences used were different from those used for testing. During segmentation, the thresholds were varied from 20 to 35, increasing in steps of 5. The sequences were obtained at a rate of five frames per second. Ten intra-frame features were used to create their inter-frame feature counterparts, obtained at a rate of one every two seconds. Table III shows the classification performance as the segmentation threshold is changed, both for the training and the test data. These results clearly show that the existing classifier must be retrained if the segmentation threshold is to be altered. If the classifier is trained at a threshold of 20 and an operator wishes to alter it to 35 (because of increased noise conditions), the classification success will drop to 15.6%, which renders the classifier useless.
TABLE IV PERFORMANCE OF THE CLASSIFIER FOR A DECREASING SNR, FROM 9.5 dB to 7.6 dB, AND THE THRESHOLD SET AT 30
TABLE V MAHALANOBIS DISTANCE D² BETWEEN CLASSES FOR EACH FEATURE. RESULTS WERE OBTAINED FROM POOLED FEATURES AT THRESHOLDS OF 20 AND 35
C. Robustness of Classifier for a Decreasing SNR

The main objective of this investigation is to observe the classifier robustness as the seabed and surface clutter is increased. The following experiment therefore observes the performance of the classifier as the mean SNR is reduced to the point where the segmentation collapses. The threshold filter was set at 30 and the rest of the filters and the segmentation procedure were left untouched. This threshold value was chosen to allow the noise to be increased and the mean SNR reduced to the lowest possible level without the loss of too many sequences. As explained in Section III-A, some objects are not picked up by a threshold of 30 because their mean power is too low, and increasing the threshold value further does not leave enough data.

In the experiment, we used a training set of six sequences, each 150 frames in length: three sequences of divers and three of ROV's. These were added to six sequences of the same length of seabed clutter returns multiplied by a scale factor f. The test set consisted of two sequences, also 150 frames in length, one of a diver and one of an ROV. Another sequence of seabed clutter returns, 150 frames in length and multiplied by a scale factor f, was added to each test sequence individually. The factor was increased to the point where the segmentation failed. The performance of the classifier trained under different noise conditions was observed as it attempted to classify the test set at varying levels of noise. Table IV displays the classification success for the different combinations. The mean SNR for an image with no added noise is 9.5 dB. Thus, the classifier shows a great degree of robustness to changes in noise within the limits imposed by the segmentation method.
D. Feature Robustness

A set of experiments was performed to determine how different feature combinations were affected by both an increasing amount of noise and changes in the segmentation threshold. Those features that are less robust will have a negative effect on the classification success and will add noise to the feature vector used for classification. The experiment consisted of finding the Mahalanobis distance (25) between the classes for each individual feature. For a single feature, this measure simplifies to

$$D^{2} = \frac{(m_{1} - m_{2})^{2}}{s^{2}} \tag{29}$$

where $m_{1}$ and $m_{2}$ are the means of the feature's outputs for the two classes concerned and $s^{2}$ is the pooled variance of the two classes. Higher values signify greater separation between classes.

To find which features were more robust to variations in the segmentation threshold filter, the values of the features at
Fig. 6. Distribution function of the mean of the second invariant moment.
Fig. 7. Distribution function of the rate of change of the perimeter.
a threshold of 20 were pooled with the values at a threshold of 35. $D^{2}$ was then found for each individual feature. Table V shows the values of $D^{2}$ for each feature, with the most robust features highlighted using a bold font. From these experiments, it was concluded that only half the features were robust to changes in the threshold. Grayscale descriptors seem to be the most sensitive to changes in the image filters, as are the mean-rate-of-change inter-frame feature measurements. To illustrate this difference, Fig. 6 shows the distribution function of the feature with the highest Mahalanobis distance between classes, and Fig. 7 shows that with the lowest distance.
The same procedure was also used to find which features were more robust to increased noise. The values of the features at an SNR of 9.5 dB were pooled with the values at an SNR of 7.6 dB, and $D^{2}$ was found for each individual feature. Table VI shows the values of $D^{2}$ for each feature, with the most robust features highlighted using a bold font. As in the previous experiment, the grayscale descriptors and the mean-rate-of-change inter-frame feature measurements seem to be the most sensitive to increased noise. Again, only half the features show robustness. Figs. 8 and 9 show the distribution functions of the features with the highest and lowest Mahalanobis distance between classes, respectively.
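The per-feature robustness measure of (29) can be sketched as follows (illustrative; the sample-size-weighted form of the pooled variance is an assumption about the exact estimator):

```python
import numpy as np

def single_feature_d2(vals1, vals2):
    """Per-feature Mahalanobis distance D^2 of eq. (29): the squared
    difference of the class means over the pooled variance."""
    a, b = np.asarray(vals1, float), np.asarray(vals2, float)
    pooled = (((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1))
              / (len(a) + len(b) - 2))
    return (a.mean() - b.mean()) ** 2 / pooled

# A well-separated feature yields a large D^2; identical classes yield 0.
d2 = single_feature_d2([1.0, 1.2, 0.9, 1.1], [3.0, 3.1, 2.8, 3.2])
```

Ranking features by this score, computed on values pooled across the two extreme settings (e.g., thresholds 20 and 35, or SNR's of 9.5 dB and 7.6 dB), identifies the features that remain discriminative under those changes.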
Fig. 8. Distribution function for the mean of the area.
Fig. 9. Distribution function for the mean of the mean.
E. Performance of the Linear Classifier with a Robust Feature Set

We now present the results obtained for the classifier when only those features indicated as robust by the previous set of experiments (Section D) are used in training. The experiments are identical to those performed in Sections B and C, except that the set of features available to the SFS is reduced to the robust subset. Table VII shows the classification success for varying thresholds, and Table VIII shows the classification success for varying levels of noise. These results show that, by eliminating the sensitive features, the classifier achieves exceptional robustness to
changes in the filters and increased robustness to changes in noise. The results also show that pooling both extremes in both circumstances improves performance even further, i.e., using the pooled inter-frame feature measures at thresholds of 20 and 35 both in the feature selection (where the features used are robust) and in the training of the classifier. Figs. 10 and 11 compare the classifier with the full set of features against the classifier using only the robust features; the classifier training across the different thresholds and noise levels has been averaged out. These charts clearly illustrate the improved performance obtained using only the robust features.
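The sequential forward selection (SFS) wrapper over the reduced robust feature pool can be sketched generically as follows. The `score` callback and the toy criterion are illustrative assumptions, standing in for the paper's actual classification-success estimator.

```python
def sfs(candidate_features, score, k):
    """Sequential forward selection restricted to a pre-screened robust
    candidate pool.  `score(subset)` returns a quality estimate for a
    feature subset (here a hypothetical stand-in for classification
    success); selection stops at k features or when no candidate helps."""
    selected = []
    remaining = list(candidate_features)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: score(selected + [f]))
        current = score(selected) if selected else float("-inf")
        if score(selected + [best]) <= current:
            break  # no remaining feature improves the criterion
        selected.append(best)
        remaining.remove(best)
    return selected

def toy_score(subset):
    # Invented criterion: reward features 2 and 3, penalize subset size.
    return len(set(subset) & {2, 3}) - 0.1 * len(subset)

print(sfs([1, 2, 3, 4], toy_score, 4))
```

Restricting `candidate_features` to the robust subset keeps the wrapper from latching onto features whose separability collapses under noise or filter changes.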
Fig. 10. In this chart, the values for classification success for the different training sequences have been averaged out.
Fig. 11. In this chart, the values for classification success for the different training sequences have been averaged out.
IV. CONCLUSION

The purpose of this investigation was to examine the robustness of the inter-frame feature measure classifier for underwater sector scan sonar images. It was shown that the classifier was sensitive to changes in the segmentation algorithm but robust, within the limits allowed by the segmentation, to a decreased SNR. To improve the classification performance, the robustness of each feature was examined under both changes in the segmentation threshold and changes in the noise. The inter-frame features of mean, variance, and variance of the rate of change derived from the intra-frame features of area, perimeter, compactness, maximum dimension, and the first and second invariant moments were found to be robust to these changes. The intra-frame features mean, variance, and contrast are pixel-value-based and are found to be brittle to changes in segmentation or increased noise. Consider, for instance, the intra-frame feature mean: as the threshold value is increased, the value of this feature will also increase, since all pixels with values below the threshold are dropped from the object. Similarly, if the noise is increased, so is the value of this feature. The changes are of such
TABLE VI
MAHALANOBIS DISTANCE D^2 BETWEEN CLASSES FOR EACH FEATURE. RESULTS OBTAINED FROM POOLED FEATURES AT SNRs OF 9.5 dB AND 7.6 dB
The results have shown that crude but effective classification of broadly similar objects can be achieved with modest computational resources in acoustically noisy environments, provided that the objects can be successfully segmented from the raw sonar data.

ACKNOWLEDGMENT

The authors would like to thank everyone in the Ocean Systems Laboratory at Heriot-Watt University and, in particular, N. Williams and D. Y. Dai.
TABLE VII
PERFORMANCE OF THE CLASSIFIER UNDER VARYING THRESHOLDS, FROM 20 TO 35, WITH A ROBUST SET OF FEATURES AND A MEAN SNR OF 9.5 dB
TABLE VIII PERFORMANCE OF THE CLASSIFIER FOR A DECREASING SNR, FROM 9.5 dB TO 7.6 dB, WITH A ROBUST SET OF FEATURES AND THE THRESHOLD SET AT 30
magnitude that pixel-value-based features are rendered useless.

As part of any future work, we would like to address the robustness of the segmentation algorithms. We believe that, with a more robust segmentation, and by using the features identified in this report, the classifier may be able to work in more realistic conditions without the need for constant retraining. A further need is to examine how the different inter-frame features correlate: by omitting highly correlated features, the processing time may be reduced without sacrificing classification performance. We would also point out the need to test the classifier with an increased number of classes. Theoretically, there is no upper bound on the number of classes, but in practice there is no guarantee that the features that successfully discriminated between a diver and an ROV would be useful for a new problem. We also believe that the robust set of features could form the backbone of other types of classifiers, not just the statistical classifier presented in this paper; unsupervised classifiers such as the k-means clustering algorithm could easily be adjusted to work with these features.
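The brittleness of the pixel-value-based mean feature argued above can be demonstrated with a toy example (the pixel values are invented for illustration): raising the segmentation threshold discards dim object pixels, so the mean of the surviving pixels can only rise.

```python
import numpy as np

# Invented grayscale values for one segmented object.
pixels = np.array([25, 30, 40, 60, 90, 120], dtype=float)

def segmented_mean(pixels, threshold):
    """Mean intensity of the pixels surviving a segmentation threshold."""
    kept = pixels[pixels >= threshold]
    return kept.mean()

print(segmented_mean(pixels, 20))  # all six pixels kept
print(segmented_mean(pixels, 35))  # the two dimmest pixels are dropped
```

The feature value thus tracks the threshold (and, by the same mechanism, added noise) rather than the object itself, which is why such features were excluded from the robust set.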
REFERENCES

[1] H. J. S. Feder, J. J. Leonard, and C. M. Smith, "Adaptive sensing for terrain aided navigation," in Proc. IEEE OCEANS'98, Sept. 1998, vol. 1, pp. 336–341.
[2] Y. Petillot, I. Tena Ruiz, D. M. Lane, Y. Wang, E. Trucco, and N. Pican, "Underwater vehicle path planning using a multi-beam forward looking sonar," in Proc. IEEE OCEANS'98, Sept. 1998, vol. 2, pp. 1194–1198.
[3] D. M. Lane, M. J. Chantler, and D. Dai, "Robust tracking of multiple objects in sector scan sonar image sequences using optical flow motion estimation," IEEE J. Oceanic Eng., vol. 23, pp. 31–46, Jan. 1998.
[4] M. J. Chantler and J. P. Stoner, "Automatic interpretation of sonar image sequences using temporal feature measures," IEEE J. Oceanic Eng., vol. 22, pp. 47–56, Jan. 1997.
[5] M. J. Chantler, D. M. Lane, D. Y. Dai, and N. Williams, "Detection and tracking of returns in sector scan sonar image sequences," in Proc. IEE Radar, Sonar and Navigation, June 1996, vol. 143, no. 3, pp. 157–162.
[6] D. M. Lane and J. P. Stoner, "Automatic interpretation of sonar imagery sequences using qualitative feature matching," IEEE J. Oceanic Eng., vol. 19, pp. 391–405, July 1994.
[7] G. T. Russell and D. M. Lane, "A knowledge-based system framework for environmental perception in a subsea robotics context," IEEE J. Oceanic Eng., vol. OE-11, pp. 401–412, July 1986.
[8] H. Vafaie and K. A. De Jong, "Robust feature selection algorithms," in Proc. 5th Int. Conf. Tools with Artificial Intelligence, Boston, MA, Nov. 1993.
[9] P. Turney, "Robust classification with context sensitive features," in Proc. 6th Int. Conf. Industrial and Engineering Applications of Artificial Intelligence and Expert Systems, Edinburgh, U.K., June 1993, pp. 268–276.
[10] N. Nandhakumar, J. D. Michel, D. G. Arnold, G. A. Tsihrintzis, and V. Velten, "Robust thermophysics-based interpretation of radiometrically uncalibrated IR images for ATR and site change detection," IEEE Trans. Image Processing, vol. 6, pp. 65–78, Jan. 1997.
[11] S. D. Kollias, "A multiresolution neural network approach to invariant image recognition," Neurocomputing, vol. 12, pt. 1, pp. 35–37, Mar. 1995.
[12] J. Ghosh, N. V. Gangishetti, and S. V. Chakravarthy, "Robust classification of variable length sonar sequences," in SPIE Conf. Applications of Artificial Neural Networks IV, Apr. 1993, vol. 1965.
[13] R. L. Somorjai, A. E. Nikulin, N. Pizzi, D. Jackson, G. Scarth, B. Dolenko, H. Gordon, P. Russell, C. L. Lean, L. Delbridge, C. E. Mountford, and I. C. P. Smith, "Computerized consensus diagnosis: A classification strategy for the robust analysis of MR spectra. I. Application to 1H spectra of thyroid neoplasms," Magnetic Resonance in Medicine, vol. 33, pt. 2, pp. 257–263, 1995.
[14] G. L. Foresti, V. Murino, C. S. Regazzoni, and A. Trucco, "A voting-based approach for fast object recognition in underwater acoustic images," IEEE J. Oceanic Eng., vol. 22, pp. 57–65, Jan. 1997.
[15] R. Gerdes, R. Otterbach, and R. Kammüller, "Fast and robust recognition and localization of 2-D objects," Machine Vision and Applications, vol. 8, pt. 6, pp. 365–374, 1995.
[16] N. Williams, "Recognizing objects in sector scan sonar images," Ph.D. dissertation, Heriot-Watt University, U.K., May 1998.
[17] M. Hu, "Visual pattern recognition by moment invariants," IRE Trans. Inform. Theory, vol. IT-8, pp. 179–187, 1962.
[18] A. W. Whitney, "A direct method of nonparametric measurement selection," IEEE Trans. Computers, vol. 20, pp. 1100–1103, 1971.
[19] M. Kendall, A. Stuart, and J. K. Ord, The Advanced Theory of Statistics, 4th ed., vol. 3. London, U.K.: Charles Griffin, 1983.
Ioseba Tena Ruiz was born in La Laguna, Santa Cruz de Tenerife, Spain. He received the degree in electrical and electronic engineering from Heriot-Watt University, Edinburgh, U.K., in 1996. He is currently working toward the Ph.D. degree at the same university. He is currently a Research Associate at Heriot-Watt University. His research interests are sonar image processing, sonar image classification, terrain-based navigation for unmanned underwater vehicles, and obstacle avoidance.
David M. Lane received the B.S. degree in electrical and electronic engineering in 1980 and the Ph.D. degree in 1986 for work on subsea robotics. He is a Professor in the Department of Computing and Electrical Engineering at Heriot-Watt University, Edinburgh, U.K. His research interests involve using advanced technology in the ocean, embracing tethered and autonomous underwater vehicles and subsea robotics. He is currently the coordinator of the MAST-III project AMADEUS and principal Heriot-Watt investigator on the collaborative CEC programs MAST-III ARAMIS, MAST-II AMADEUS Phase I, and the ESPRIT III UNION Basic Research Action, now finished. He also holds grants for several U.K. government and industry funded projects involving underwater robotics for North Sea oil and gas exploration and production. He was previously involved in the EUREKA EU191 Advanced Underwater Robots AUV program. He has worked in the U.K. defence industry as a Development Engineer and in the offshore industry on the operations and maintenance of manned underwater vehicles for inspection and survey. Dr. Lane is a Chartered Engineer in the U.K. He is a member of the Institution of Electrical Engineers (IEE). He was the initial chairman of the European MARABOT special interest group on Subsea Robotics, is a member of the IEE Professional Group B3, Intelligent Automation and Robotics, and the U.K. Society for Underwater Technology Underwater Robotics Group. He is Associate Editor of the International Journal of Systems Science and has recently acted on numerous program committees for the IEEE Oceanic Engineering and Robotics & Automation Societies, including organizing special sessions at the annual international conferences.
Mike J. Chantler received the B.Sc. degree in electronic and electrical engineering from Glasgow University, U.K., in 1979 and the Ph.D. degree in image processing from Heriot-Watt University, Edinburgh, U.K., in 1994. He is a Senior Lecturer at the Department of Computing & Electrical and Electronic Engineering, Heriot-Watt University, which he joined in 1983. Over the last 15 years, he has had numerous research projects on image processing and applied artificial intelligence, many concerned with subsea applications. He has published over seventy papers in these areas and has served on numerous conference, technical, and professional committees. His current work is focused on three-dimensional texture classification in the presence of varying, or multiple, illumination sources.