Segmentation of 3D acoustic images for object recognition purposes Rajendra C. Patel, Alistair R. Greig

Abstract: This paper addresses the problem of segmenting 3D acoustic images for object recognition purposes. For remotely operated vehicle (ROV) navigation the 3D surroundings have to be understood. Sensors, such as 3D acoustic imaging sensors, can be used for this purpose. Automatic segmentation and reconstruction of an underwater scene could make many underwater operations more effective and reliable. The characteristics of 3D acoustic images differ from those of conventional optical or range images, and simple standard segmentation methods developed for optical or range imaging are often rendered more or less useless. 3D acoustic images usually have low resolution and exhibit characteristics inherent to acoustical imaging, such as speckle noise and target shadows. The main advantages are that the acoustic imaging sensor remains functional even in turbid water conditions and is capable of providing 3D (range) information. A literature review of work in the field of acoustic image segmentation is given, summarising the state-of-the-art in acoustic image segmentation approaches for object recognition purposes. Four different approaches, a thresholding, a fuzzy clustering, a Markov Random Fields and a connected components approach, are implemented and tested on synthetic and real 3D acoustic images.

Keywords: 3D acoustic image, acoustic camera, segmentation, object recognition.

I. Introduction

R.C. Patel and A.R. Greig are with the Automatic Control Group, Department of Mechanical Engineering, Torrington Place, University College London (UCL), London WC1E 7JE, England. E-mail: ra [email protected].

The segmentation of an image is an essential step prior to any object recognition or data fusion processes in computer vision applications. The 3D acoustic camera is a relatively new development which could be useful as a sensor on board an ROV, providing real-time 3D acoustic images of an underwater scene in order to ease navigation operations. Few researchers have yet addressed the problem of 3D acoustic image processing. Much work has been done using conventional acoustic sensors such as sector-scan sonars, side-scan sonars or multibeam echosounders. These kinds of sensors can also be used to obtain 3D information similar to that of an acoustic camera, but not in real time. The acoustic image segmentation methods described in the literature often address problems in geological applications (e.g. sea-bed classification) or methods for the detection of small objects on the sea-floor (e.g. mine hunting). Little work describing acoustic image segmentation for object recognition or navigational purposes can be found. The purpose of this paper is to give an overview

of segmentation methods which can be used with the data from a 3D acoustic camera in order to assist the subsequent object recognition and data fusion processes. The main problem is therefore to segment the acoustic image into object and background regions. In the following sections the three representation schemes for 3D acoustic images (the range and intensity image, the voxel representation and the frames representation) are described, and a literature review of work in the field of acoustic image segmentation for object recognition purposes is given, summarising the state-of-the-art in segmentation concepts for acoustic images. Four of these concepts, a thresholding segmentation, a clustering approach, a connected components approach and a Markov Random Fields (MRF) based method, are implemented, and the results of segmentation tests on real and synthetic 3D acoustic images are presented. Qualitative and quantitative comparisons are given and the segmentation results are discussed.

II. 3D acoustic images

A. Image formation

3D acoustic images are formed by active acoustic imaging devices. An acoustic signal is transmitted and the returns from targets are collected and processed in such a way that acoustical intensities and range information can be retrieved for several viewing directions (beam directions). The real images used in this paper stem from the EchoScope, a 3D real-time acoustic camera produced by OmniTech AS, Bergen, Norway. The camera has a 40x40 array of receiver transducers and operates on the pulse-echo principle. Acoustic holography principles are then used to reconstruct the 3D image from the collected acoustic returns. A description of an earlier prototype is given by Hansen et al [1][2] and Allen [3]. The synthetic images are produced by a simulation of this type of acoustic camera. The image formation principle is based on a simplified discrete backward projection method [4]. Both the real and synthetic 3D acoustic images reconstruct the image for an array of 64x64 beam directions and a minimum and maximum detection range, defining a viewing volume in the underwater scene.

B. Representation of the data

To reduce the amount of data only the maximum intensity along each beam and the corresponding range is considered. The data of 3D acoustic images can then be

represented in three ways, as
- intensity and range images,
- points in three-dimensional space (voxels), or
- equidistant contours (frames).
Examples of acoustic images of a real underwater scene with three cylinders and the sea floor can be seen in figures 1, 2 and 3. Figure 1 shows the voxel representation of the data. The greylevel of each point is determined by the corresponding acoustic intensity and the viewing volume boundaries are represented by the solid lines. This representation of the data is the most useful for human interpretation, and as such would already be very useful for ROV operators as an additional navigational tool. Figure 2 shows the intensity and range images of the same scene. Dark pixels in the intensity image correspond to returns with high acoustic intensity, and dark pixels in the range image correspond to points at a greater distance. Each pixel of the 64x64 intensity and range image corresponds to a distinct beam direction. Knowing the viewing angles corresponding to each pixel, the cartesian coordinates of the image points can be calculated to arrive at the voxel representation. Equidistant contours can be extracted from the 3D acoustic image data by considering only those intensities found at a certain range. This is equivalent to looking at the intensities of a spherical slice through the viewing volume. Figure 3 shows the intensities for two spherical slices through the viewing volume at distances d = 950cm and d = 960cm from the acoustic camera. None of the reported segmentation methods for acoustic images uses the frames representation of the data. This is not surprising, as each frame would have to be processed separately and the results would have to be combined, resulting in a lot of unnecessary processing. If the data is not reduced by considering only the maximum intensity along each beam, the frames representation scheme can also be used to visualise the full data set of all the intensities recorded in the whole viewing volume. Therefore this representation is still useful for detailed analysis of the 3D acoustic images. The intensity and range, as well as the frames representation schemes are not easy for a human to interpret for navigational purposes, but they provide a very useful representation for machine based processing of the images. Computer vision methods can be easily implemented for these two representations.
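The conversion from beam directions to cartesian voxel coordinates mentioned above can be sketched as follows. The field of view, the even angular spacing and the axis convention are assumptions for illustration; the paper does not give the exact EchoScope beam geometry.

```python
import numpy as np

def beams_to_voxels(intensity, rng, fov_deg=45.0):
    """Convert 64x64 intensity/range images to cartesian voxel coordinates.

    Each pixel (i, j) is assumed to correspond to a beam direction with
    azimuth/elevation spread evenly over a hypothetical field of view;
    the true sensor geometry may differ.
    """
    n = intensity.shape[0]
    half = np.deg2rad(fov_deg) / 2.0
    az = np.linspace(-half, half, n)      # azimuth, one angle per column
    el = np.linspace(-half, half, n)      # elevation, one angle per row
    A, E = np.meshgrid(az, el)
    # Spherical to cartesian: travel `rng` along each beam direction
    x = rng * np.cos(E) * np.sin(A)
    y = rng * np.sin(E)
    z = -rng * np.cos(E) * np.cos(A)      # camera assumed to look down -z
    return np.stack([x, y, z], axis=-1)

# Example: a flat surface at a constant range of 950 cm
rng = np.full((64, 64), 950.0)
inten = np.ones((64, 64))
voxels = beams_to_voxels(inten, rng)
print(voxels.shape)  # (64, 64, 3)
```

By construction the euclidean norm of each voxel equals the measured range, which is a quick sanity check on any chosen angle convention.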


Fig. 1. Voxel representation of a real 3D acoustic image

Fig. 2. Intensity and range image of a real 3D acoustic image

III. Literature review

For online visualisation of acoustic images generally one threshold is used on the raw signals to segment the acoustic image into object and background regions. In many applications an acoustic image can also be segmented into three classes: shadow, echo, and reverberation. To achieve this segmentation a simple two-threshold approach can be used. The selection of the thresholds is crucial to the segmentation quality and precision. Guillaudeux et al [5] present an improvement of a sonar image processing chain using the thresholding segmentation approach by replacing the thresholdings with an unsupervised fuzzy clustering strategy. A fuzzy k-means clustering approach with a two-dimensional feature vector for each pixel is implemented. The first feature is the greylevel, which can discriminate between shadow pixels with low greylevels and echo pixels with high greylevels. The second feature is the variance, calculated from a 5x5 window, which helps partitioning of the reverberation from the echo and shadow groups.

Fig. 3. Equidistant contours at range distance d = 950cm and d = 960cm of a real 3D acoustic image

Murino et al [6][7][8][9] describe probabilistic techniques for restoration and reconstruction of underwater acoustic images based on Markov Random Field (MRF) methodology. Range and intensity images are modelled as MRFs whose associated probability distributions are specified by energy functions designed to embed the physics of the acoustic image formation process. Both types of data, range and intensity, are exploited in an integrated manner for the restoration and reconstruction of 3D acoustic images. Two basic approaches are described, one using a single energy function and the other using separate MRFs for the range and the intensity image. For the minimisation of the energy functions leading to optimal reconstructed and restored estimates a simulated annealing procedure is suggested.

Collet et al [10][11] describe an approach to sonar image segmentation based on hierarchical Markovian modelling. The output of the processing chain is a two-class segmentation of the sonar image into shadow and sea-bottom reverberation regions. The Markov Random Field model takes into account the phenomenon of speckle noise through Rayleigh's law, as well as a-priori information about the geometry of object shadows. Parameters of the Markovian model are the noise parameters and the a-priori model parameters. A multigrid method is applied, and in order to minimise the energy functions associated with the MRF model at each scale the Iterated Conditional Modes (ICM) algorithm is used.

After thresholding and prefiltering of the raw sonar data, Auran and Malvig [12][13] organise the returns of the sonar in a dynamical 3D occupancy framework. A 3D spherical map is used as it represents the sonar data given for the various beam directions in a more natural way. This map represents a cluster map corresponding to the target objects. Segmentation in this case is understood as separating the echo clusters of individual objects. This is done by a connected component algorithm based on the connectivity between spherical segments in an 8-neighbourhood of the sonar beams.

Rajpal, Banerjee and Bahl [14] describe a framework for automatic object identification from acoustic shadow. The pre-processing involves segmentation of the sonar image into background and shadow regions. The segmentation is done by converting the image into a binary image using a threshold, and the regions of interest are then extracted using a connectivity algorithm on the binary image.

Zerr and Stage [15] describe an approach for reconstruction of three-dimensional information from a sequence of conventional sector scanning sonar images. Segmentation of each image into background, shadow and object regions is needed. For high contrast images a two-threshold approach is thought to be sufficient and

for low contrast images segmentation based on Markov Random Fields is proposed. A connected component analysis can be used to remove undesired shadow and object regions using size and location criteria.

Dai et al [16] describe a method to separate moving and static objects in sector-scan sonar image sequences. Time-domain and frequency-domain approaches are employed: static objects are detected in the time domain and moving objects in the frequency domain.

Subramaniam and Bahl [17] describe an approach for segmentation and reconstruction of a dense surface from sparse sonar data. A piecewise-smooth graph surface fitting procedure is proposed to fit surfaces to the sparse data, resulting in the grouping of the image pixels into regions based on their describing surface functions. The surface fitting procedure is split up into the following functional groups: selection of seed points, surface fitting over points, region updating, and continuation/termination. A dense representation of the image or the rendering of the 3D scene can be performed by deriving object and region masks from the sparse data.

From the literature review it can be seen that several different approaches have been used for acoustic image segmentation, either separately or combined. The main concepts are
- thresholding,
- clustering,
- Markov Random Fields,
- connected components, and
- surface fitting.

IV. Segmentation tests

Segmentation approaches based on thresholding, fuzzy clustering, Markov Random Fields, and connected components analysis have been implemented and segmentation tests have been performed. As the real test image the 3D acoustic image shown in figures 1 and 2 is used. A synthetic 3D acoustic image has been created using 256x256 points extracted from a virtual scene. These points serve as input to the 3D acoustic imaging simulation. The virtual scene and the resulting acoustic image in voxel representation are shown in figure 4.

A. Thresholding

It is assumed that acoustic intensities above a certain threshold value can be associated with returns from an object in the scene. Therefore the image can be segmented into object and background regions by using a single threshold level. Knowledge of the expected sidelobe levels of the acoustic imaging system can be used to determine a minimum threshold level.
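A minimal sketch of this single-threshold segmentation, assuming the intensity image is normalised to [0, 1]; the threshold value here is a hypothetical stand-in for one derived from the sidelobe level.

```python
import numpy as np

def threshold_segment(intensity, level=0.2):
    """Binary segmentation: 1 = object, 0 = background.

    `level` is a hypothetical example threshold; in practice it would be
    chosen from the expected sidelobe level of the imaging system.
    """
    return (intensity > level).astype(np.uint8)

# Example: a bright square "object" on a dark background
img = np.zeros((64, 64))
img[20:30, 20:30] = 0.8
seg = threshold_segment(img, level=0.2)
print(seg.sum())  # 100 object pixels
```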

Fig. 4. Virtual scene and synthetic image

B. Fuzzy Clustering

A fuzzy k-means clustering algorithm has been implemented. Clustering algorithms partition a group of data points into a number of subgroups. The number of subgroups in this case is known to be two, as a binary segmentation of the image is anticipated. An iterative optimisation minimising an objective function is performed. The feature vector in the segmentation test consists of the intensity and the variance of intensity, as proposed by Guillaudeux et al [5]. Other features such as the variance of range have also been tested, but the segmentation results did not improve. The degree of membership to the two subgroups is calculated for each data point according to a euclidean distance measure, and the pixel is classified according to the higher degree of membership.

C. Connected Components

For the connected component analysis approach a 64x64 connectivity matrix is first calculated for the 3D acoustic image by applying an L-shaped mask to the intensity and range image. The L-shaped mask looks at three pixels: the centre pixel, and the neighbouring pixels to the right of the centre and below the centre. The centre beam direction (centre pixel) is said to be connected to its neighbour beam direction (pixel to the right or below) if the following two conditions are fulfilled:
1. the centre pixel intensity value is above a certain intensity threshold, and
2. the absolute difference between the centre range value and the neighbour range value is below a certain threshold.
The elements of the connectivity matrix can have four different label values, depending on the number of connected pixels. The connectivity matrix is then used to create a label image of connected components, where zero denotes the background and values above zero belong to object regions. For a binary segmentation image the labels above zero are set to one.

D. MRF

The implemented MRF based segmentation produces a binary label image based on the intensity image. Energy functions for the sensor model and the a-priori model have to be chosen. Here a Gaussian sensor model is assumed, and apart from a multiplicative constant the same energy term related to the prior model is chosen as in Murino et al [6][7][8][9] for the restoration of the intensity image. For fast minimisation of the energy function an Iterated Conditional Modes (ICM) algorithm is used.

E. Qualitative and quantitative evaluation

The output of all the implemented segmentation methods is a binary segmentation of the 3D acoustic image into object points and background points. The best representation for qualitative evaluation of the segmentation results is a 64x64 binary image where each of the pixels can be associated with a certain beam direction, just as in the intensity and range representation of the 3D acoustic data. As the scene is exactly known, ground truth images can be extracted in order to quantitatively assess the segmentation results. Figure 5 shows the ground truth binary image and the acoustic intensity image of the synthetic 3D acoustic image for the virtual scene given in figure 4. The binary segmentation results for the synthetic image can be seen in figure 6. The segmentation results for the real image given in figure 2 are shown in figure 7. Thresholding produces a segmentation with several small holes, which are due to destructive interference and speckle noise in the acoustic image. It can be seen that the clustering and connected components methods produce qualitatively better results in this respect. The MRF based segmentation implemented here produces thinner object regions, which in fact comes closer to the ground truth. Table I shows the results of the quantitative analysis, where the ground truth image is compared to the binary segmentation, the numbers of oversegmented and undersegmented pixels are counted, and the percentage of the overall number of pixels is calculated.
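The over- and undersegmentation counts described above can be computed as in the following sketch, assuming 64x64 binary images in which 1 marks object pixels; the example images are invented for illustration.

```python
import numpy as np

def seg_errors(seg, truth):
    """Count oversegmented (false object) and undersegmented (missed
    object) pixels, and their percentages of the total pixel count."""
    over = int(np.sum((seg == 1) & (truth == 0)))
    under = int(np.sum((seg == 0) & (truth == 1)))
    total = truth.size
    return over, under, 100.0 * over / total, 100.0 * under / total

# Hypothetical ground truth: an 80-pixel object region
truth = np.zeros((64, 64), dtype=int)
truth[20:30, 20:28] = 1
# Hypothetical segmentation: 20 extra pixels, 16 missed pixels
seg = truth.copy()
seg[20:30, 28:30] = 1   # oversegmentation
seg[20:22, 20:28] = 0   # undersegmentation
over, under, over_pct, under_pct = seg_errors(seg, truth)
print(over, under)  # 20 16
```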
The quantitative results underline the qualitative evaluation of the results given above.
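Returning to the connected components approach of section C, the two connectivity conditions can be sketched as follows. The threshold values are hypothetical, and the labelling uses a generic flood fill over the induced 4-connectivity rather than the paper's four-valued connectivity matrix; the right/below L-shaped mask, applied at every pixel, yields the same adjacency.

```python
import numpy as np
from collections import deque

def connected_components(intensity, rng, i_thresh=0.2, r_thresh=10.0):
    """Label connected beam directions; 0 denotes background.

    Two neighbouring pixels are connected when both intensities exceed
    i_thresh and their absolute range difference is below r_thresh
    (both thresholds are hypothetical example values).
    """
    h, w = intensity.shape
    labels = np.zeros((h, w), dtype=int)
    next_label = 0
    for si in range(h):
        for sj in range(w):
            if intensity[si, sj] <= i_thresh or labels[si, sj]:
                continue
            next_label += 1
            labels[si, sj] = next_label
            queue = deque([(si, sj)])
            while queue:  # breadth-first flood fill of one component
                i, j = queue.popleft()
                for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                    if (0 <= ni < h and 0 <= nj < w
                            and not labels[ni, nj]
                            and intensity[ni, nj] > i_thresh
                            and abs(rng[i, j] - rng[ni, nj]) < r_thresh):
                        labels[ni, nj] = next_label
                        queue.append((ni, nj))
    return labels

# Example: two bright patches at the same range -> two components
inten = np.zeros((64, 64))
inten[10:20, 10:20] = 1.0
inten[40:50, 40:50] = 1.0
rng = np.full((64, 64), 500.0)
labels = connected_components(inten, rng)
print(labels.max())  # 2
```

Setting every label above zero to one then gives the binary segmentation used in the comparisons.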

Fig. 5. Ground truth and acoustic intensity image

The binary segmentation of the 3D acoustic image can be used to interpolate a smooth surface for a more user-friendly representation of the results (see figure 8).

V. Discussion

In this paper the main concepts for acoustic image segmentation have been summarised by looking at previous work done in this field. Four of these concepts have

Fig. 6. Synthetic image segmentation results (panels: Thresholding, Clustering, Conn. Comp., MRF)

Fig. 7. Real image segmentation results (panels: Thresholding, Clustering, Conn. Comp., MRF)

Fig. 8. Interpolated smooth surface

been implemented, and segmentation tests have been performed on real and synthetic data. The segmentation results have been presented and qualitative and quantitative evaluations of the results have been given. It should be noted, though, that different implementations of the same concepts using different parameters and implementation details might yield different results. Using more sophisticated methods than thresholding can improve the segmentation results, but at the cost of a higher processing time.

Acknowledgements

The authors would like to thank OmniTech AS, who kindly provided the real 3D acoustic images, and MTD/EPSRC, who fund the project.

TABLE I
Quantitative evaluation of segmentation results

Method             Undersegm.     Oversegm.
Thresholding       182 = 4.4%     365 = 8.9%
Fuzzy Clustering   115 = 2.8%     494 = 12.1%
Connected Comp.     73 = 1.8%     522 = 12.7%
MRF                200 = 4.9%     169 = 4.1%

References
[1] R.K. Hansen and P.A. Andersen. A 3D underwater acoustic camera: properties and applications. In P. Tortoli and L. Masotti, editors, Acoustical Imaging, volume 22, pages 607-611, New York, 1996. Plenum Press.
[2] R.K. Hansen and P.A. Andersen. 3D acoustic camera for underwater imaging. In Y. Wei and B. Gu, editors, Acoustical Imaging, volume 20, pages 723-727, New York, 1993. Plenum Press.
[3] H. Allen. Echoscope reveals all. Offshore Engineer, June 1996.
[4] H. Lee, G. Wade, and J. Fontana. Digital reconstruction of acoustic holograms in the space domain with a vector space approximation. In K.Y. Wang, editor, Acoustical Imaging, volume 9, pages 631-641, New York, 1980. Plenum Press.
[5] S. Guillaudeux, S. Daniel, and E. Maillard. Optimization of a sonar image processing chain: a fuzzy rules based expert system approach. In Proceedings of the 1996 MTS/IEEE Oceans Conference, volume 3, pages 1319-1323, Piscataway, NJ, USA, September 1996.
[6] V. Murino. Integration of confidence information by Markov random fields for reconstruction of underwater 3D acoustic images. In M. Pelillo and E.R. Hancock, editors, Energy Minimization Methods in Computer Vision and Pattern Recognition, Lecture Notes in Computer Science 1223, pages 476-490, 1997.
[7] V. Murino, A. Trucco, and C.S. Regazzoni. A probabilistic approach to the coupled reconstruction and restoration of underwater acoustic images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(1):9-22, January 1998.
[8] V. Murino and A. Trucco. Markov-based methodology for the restoration of underwater acoustic images. International Journal of Imaging Systems and Technology, 8(4):386-395, 1997.
[9] V. Murino. Acoustic image reconstruction by Markov random fields. Electronics Letters, 32(7):697-698, March 1996.
[10] C. Collet, P. Thourel, P. Perez, and P. Bouthemy. Hierarchical MRF modeling for sonar picture segmentation. In IEEE International Conference on Image Processing, volume 3, pages 979-982, Los Alamitos, CA, USA, September 1996.
[11] C. Collet, M. Mignotte, P. Perez, and P. Bouthemy. Unsupervised Markovian segmentation of sonar images. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, volume 4, pages 2781-2784, Munich, Germany, April 1997.
[12] P.G. Auran and K.E. Malvig. Realtime extraction of connected components in 3D sonar range images. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 580-585, San Francisco, CA, USA, June 1996.
[13] P.G. Auran and K.E. Malvig. Clustering and feature extraction in a 3D real-time echo management framework. In Proceedings of the 1996 IEEE Symposium on Autonomous Underwater Vehicle Technology, pages 300-307, Monterey, CA, USA, June 1996.
[14] N. Rajpal, S. Banerjee, and R. Bahl.
Automatic object identification from sonar image shadow. In Y. Wei and B. Gu, editors, Acoustical Imaging, volume 20, pages 773-779, New York, 1993. Plenum Press.
[15] B. Zerr and B. Stage. Three-dimensional reconstruction of underwater objects from a sequence of sonar images. In Proceedings of the IEEE International Conference on Image Processing, volume 3, pages 927-930, September 1996.
[16] D. Dai, M.J. Chantler, D.M. Lane, and N. Williams. A spatial-temporal approach for segmentation of moving and static objects in sector scan sonar image sequences. In Proceedings of the 5th International Conference on Image Processing and its Applications, pages 163-167, Stevenage, UK, July 1995.
[17] L.V. Subramaniam and R. Bahl. Segmentation and surface fitting of sonar images for 3-D visualization. In Proceedings of the 8th International Symposium on Unmanned Untethered Submersible Technology, pages 290-298, September 1993.