Automatic classification of deep benthic habitats: detection of microbial mats and siboglinid polychaete fields from optical images on the Håkon Mosby Mud Volcano

C. Chailloux*, A.G. Allais**, P. Simeoni**, K. Olu*

*Département Etudes des Ecosystèmes Profonds / Laboratoire Environnement Profond, Ifremer Brest, BP 70, 29280 Plouzané, France
**Département Systèmes sous-Marin / Service Positionnement Robotique Acoustique Optique, Ifremer Toulon, BP 330, 83507 La Seyne sur Mer Cedex, France

Abstract- Seafloor surveys produce a growing volume of data, yet few automatic techniques have been developed to analyze the resulting images. This study addresses the automatic detection and quantification of microbial mats and fields of siboglinid polychaetes (tubeworms) colonizing the seafloor of the deep-sea Håkon Mosby Mud Volcano, surveyed by ROV. Three algorithms are developed to segment the high-resolution optical images of the seafloor: a watershed transformation coupled with a region growing technique, a similarity measure combined with a region growing technique, and a texture analysis based on the Kullback-Leibler divergence. The results are compared using a classification ratio computed against a human-made classification.

A. Scientific Background and objective

The Håkon Mosby Mud Volcano (HMMV), like many marine mud volcanoes observed all over the world, is located along an active continental margin. HMMV is an active mud, fluid and methane-venting seep, located at 1250-1266 m water depth in the Norwegian Sea at 72°00.25’N 14°43.50’E (Fig. 1). It has a circular caldera about 1 km in diameter.

Keywords-Classification; watershed; region growing; mutual information; texture analysis; bacterial mats; siboglinid polychaete

I. INTRODUCTION

Most seafloor habitat mapping studies are carried out manually, by analyzing video or photo transects to identify the species colonizing the seafloor (e.g. [1]). However, the use of ROVs (Remotely Operated Vehicles) for surveys of deep-sea ecosystems has greatly increased the size of image datasets, and a first study promoting an automatic tool to detect bacterial mats in video data has recently been published [2]. In this new study, based on the analysis of still images from a high-sensitivity black-and-white camera, we propose an automatic classification tool to detect and quantify the different habitat classes of the Håkon Mosby Mud Volcano (HMMV), namely bacterial mats, siboglinid polychaete fields and bottom. For biologists, an automatic classification tool serves two purposes:
1. objectively classify habitats to map their spatial distribution and detect changes over time;
2. rapidly classify large data sets.

Figure 1. Bathymetry of the HMMV provided by the RESON 7125 multibeam sonar at 20m, highlighting three different zones: the flat central unit with a mud flow oriented toward the south-east, the peripheral crown, and the surrounding moat [2].

A previous study [2] accurately describes HMMV and its three characteristic concentric units:
1. a central zone, flat and colonized in its southern part by a dense bacterial cover;
2. the periphery, a hummocky area mainly colonized by siboglinid polychaetes and sparsely by bacterial mats;
3. a non-colonized moat surrounding the entire volcano.

The zone of interest in this study is the south-eastern part, where a recent mud flow has spread. Active mud volcanoes are seeps of natural gas (methane) and are therefore often densely populated by bacteria, tubeworms, bivalves, and other symbiotic organisms [4]. These organisms indicate the presence of active gas seeps, and their distribution and density are indicative of low versus high methane discharge [3]. Density varies widely, which leads biologists to manually split a class into many sub-classes, each corresponding to an interval of coverage percentage. One sub-class, enclosed by a polygon, can for example contain 25-50% of bacterial mats and 50% of siboglinid polychaetes [2]. Our study provides more accurate results because the image analysis operates at the pixel scale.

B. Image processing

The survey, conducted during the VICKING oceanographic cruise, was carried out by the ROV Victor 6000 equipped with the high-resolution OTUS camera and a RESON 7125 multibeam sonar. Optical data were acquired with the OTUS camera, which provides images of 1024x1024 pixels at a height of 8 m above the seafloor, equivalent to a resolution of 1.03 cm/pixel. Image processing consisted of several region-based segmentation methods, summarized below.

A first method defines the seed points of a region-growing algorithm as the centers of gradient-based watershed regions. A homogeneity measure is used as the growing criterion: a neighbouring region is merged into the reference region (around a seed point) if its mean and variance fall within a grey-level interval defined by the reference region and updated at each iteration of the region-growing process.

A second method, also based on a region-growing algorithm, initializes the seed points from a similarity criterion. This requires a human operator to select some homogeneous regions representative of a class. These samples (9x9 pixel regions) show typical grey-level distributions. A similarity measure, the mutual information, is then used to decide which samples, among all those composing the image, belong to a class by comparing their scores with a fixed threshold (samples whose scores exceed 95% of the maximum similarity score are retained).

Finally, a third method, based on texture analysis, was compared. It uses co-occurrence matrix parameters and Gabor filter parameters, which have proved competitive with wavelet analysis or Fourier transform parameters. The Kullback-Leibler divergence was used to estimate the similarity between the parameter distributions of the different image samples. The performance of the algorithms is illustrated by a classification ratio computed on a synthetic texture image composed of several real textures, and on real optical data compared with a human-made classification. An accurate percentage of cover was estimated, showing the distribution of each class (microbial mats or tubeworm fields). This approach will allow us to carry out long-term observation of deep-sea benthic habitats.

II. IMAGE ANALYSIS

The first step of the image analysis is to filter the data, because the artificial light needed to illuminate the scene produces illumination variations in the optical images. The underwater vehicle carries its own light source, which produces a kind of halo effect. The relative positions of the camera and the lights explain this phenomenon: the OTUS camera is at the front of the vehicle while the spot lights are at the back.

A. Optical image filtering

Many solutions have been suggested in the literature to overcome illumination variations in optical images [5, 6]. Several were tested: homomorphic filtering, local histogram equalization (CLAHE), radiometric correction, and the illumination-reflectance model. The latter, which treats the noise as a multiplicative variable, is well suited to our application. A brief description of this approach follows [5]. The image is considered as the product of the illumination and the reflectance, as described by equation (1):

$$ f(x,y) = i(x,y)\, r(x,y) \qquad (1) $$

where f(x,y) is the image generated by the camera sensor, r(x,y) is the reflectance function (the ideal image in the absence of shading), and i(x,y) is the multiplicative illumination factor. Taking the characteristics of the camera into account adds a multiplicative gain g(x,y) and an offset o(x,y) that is negligible with respect to i(x,y):

$$ f(x,y) = g(x,y)\, i(x,y)\, r(x,y) + o(x,y) \qquad (2) $$

The multiplicative factor g(x,y).i(x,y) can be modeled by a smooth function. To model this non-uniform illumination, a Gaussian-smoothed version of the image, fs(x,y), is used. The acquired image is corrected by a point-by-point division by the smoothed image, giving an estimate of the ideal reflectance (or ideal image):

$$ \hat{r}(x,y) = \delta\, \frac{f(x,y)}{f_s(x,y)} \qquad (3) $$

where δ is a normalization constant which restores the overall image luminance. The contrast of the resulting image is then enhanced by equalizing the estimated reflectance. The histogram of the image is normalized, and the fractions of the image to saturate at low and high pixel values are set by the pair [0.1, 0.9], which defines thresholds on the values of the normalized histogram. The smoothed image is obtained by averaging a set of successive images and fitting a 2D Gaussian function to the average image. The filtering results are shown in Fig. 2.
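As an illustration, the point-by-point division by a smoothed image can be sketched in Python as follows. This is a minimal sketch under stated assumptions and not the implementation used in the study: the smoothed image is obtained here by blurring the single input image with a separable Gaussian kernel (rather than by fitting a 2D Gaussian to an average of successive images), δ is assumed to be the mean grey level of the input image, and the function names and the parameters `sigma`, `radius` and `eps` are hypothetical.

```python
import math

def gaussian_kernel(sigma, radius):
    """1-D normalized Gaussian kernel of length 2 * radius + 1."""
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth_rows(image, kernel):
    """Convolve every row with the kernel, replicating the edge pixels."""
    radius = len(kernel) // 2
    width = len(image[0])
    out = []
    for row in image:
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, w in enumerate(kernel):
                xx = min(max(x + k - radius, 0), width - 1)
                acc += w * row[xx]
            new_row.append(acc)
        out.append(new_row)
    return out

def transpose(image):
    return [list(col) for col in zip(*image)]

def gaussian_smooth(image, sigma, radius):
    """Separable 2-D Gaussian smoothing: rows first, then columns."""
    kernel = gaussian_kernel(sigma, radius)
    rows_done = smooth_rows(image, kernel)
    return transpose(smooth_rows(transpose(rows_done), kernel))

def correct_illumination(image, sigma=8.0, radius=16, eps=1e-6):
    """Estimate the reflectance as delta * f / f_s.

    Assumption: delta is the mean grey level of the input image, so that
    the overall luminance is approximately restored.
    """
    smoothed = gaussian_smooth(image, sigma, radius)
    height, width = len(image), len(image[0])
    delta = sum(map(sum, image)) / (height * width)
    return [[delta * image[y][x] / (smoothed[y][x] + eps)
             for x in range(width)]
            for y in range(height)]
```

On an image whose only variation is a smooth halo, the corrected image is much flatter than the input, which is the behaviour expected from the illumination-reflectance model.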

Figure 2. Raw optical image on the left and filtered image A corrected by an average of Gaussians (Illumination-reflectance model)

B. Watershed & Region growing

The set of optical images of HMMV presents three main classes. One of them, bacterial mats, is easy to detect because of its white colour, and can be characterized by the most intuitive way of segmenting an image, namely a threshold on grey levels. The segmentation of bacterial mats is split into two parts:
1. an initial region-based segmentation by a watershed transform [7];
2. a region growing algorithm which iterates until a homogeneity criterion converges.

The watershed method partitions the image into a set of homogeneous regions based on grey-value discontinuities. The gradient image (the first derivative of the image) is computed and interpreted as a topographic map. It is then iteratively flooded from its local minima to obtain a region-based over-segmentation of the image. A double threshold (an interval defined by t1=0 and t2=3) is applied to the grey levels of the watershed transform to extract the regions corresponding to the class “Bacterial mats”. A pixel of the watershed transform is validated when all the pixels of its 3x3 neighbourhood lie within this interval.
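The validation step that follows the double threshold can be sketched as below. The sketch assumes that the watershed transform is already available as a 2-D list of grey levels (the flooding itself is not reproduced here), and that border pixels, whose 3x3 neighbourhood is incomplete, are rejected; the function name `validate_pixels` is hypothetical.

```python
def validate_pixels(values, t1=0, t2=3):
    """Return a mask that is True where the whole 3x3 neighbourhood
    of a pixel lies within the interval [t1, t2]."""
    height, width = len(values), len(values[0])
    in_range = [[t1 <= v <= t2 for v in row] for row in values]
    mask = [[False] * width for _ in range(height)]
    # Assumption: border pixels are left unvalidated because their
    # 3x3 neighbourhood is incomplete.
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            mask[y][x] = all(in_range[y + dy][x + dx]
                             for dy in (-1, 0, 1)
                             for dx in (-1, 0, 1))
    return mask
```

This is equivalent to eroding the thresholded mask with a 3x3 structuring element, which discards isolated pixels that fall in the interval by chance.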

Then the center of each region is selected as a seed to initialize the region growing algorithm [8]. The latter is based on a homogeneity criterion which requires estimating the mean and two different standard deviations over a well-defined 3x3 pixel neighbourhood. The mean mgv is estimated from the median rather than from the average of the pixel values. The two standard deviations ud and ld are then estimated separately from the grey values of the sample pixels that are respectively greater and lower than the mean. Two thresholds, lower and upper, are then defined to determine region membership:

$$ T_{lower} = m_{gv} - \left( w\, l_d + c(n) \right) \qquad (4) $$

$$ T_{upper} = m_{gv} + \left( w\, u_d + c(n) \right) \qquad (5) $$

where n is the number of pixels used for the estimation and w is set to 1.5 for a Gaussian distribution. The function c(n) = 20/n decreases with n to compensate for the estimation error of mgv, ld and ud. Whereas [8] performs a randomized region growing around each seed point, we chose to examine the 3x3 neighbourhood of each seed point first and then to extend the analysis iteratively to each pixel accepted into the region. Recall that a pixel is considered to belong to a region when its grey value lies within the interval [Tlower, Tupper]. These thresholds are re-evaluated at each iteration for the new set of points. The set of region points is also updated at each iteration, discarding the previously processed points to avoid redundant computation. The surface coverage of the bacterial mat is evaluated at each iteration and serves as the convergence criterion: the algorithm stops when the surface stops increasing (about a dozen iterations are necessary).
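The growing procedure described above can be sketched as follows. This is an illustrative reading of the text, not the code used in the study: the thresholds are taken as the median minus or plus (w times the one-sided deviation plus c(n)), with c(n) = 20/n as stated above; the initial sample is assumed to be the full 3x3 neighbourhood of the seed; and the names `thresholds`, `neighbours` and `grow_region` are hypothetical.

```python
import math
from statistics import median

def thresholds(values, w=1.5):
    """Lower and upper membership thresholds from a sample of grey values."""
    n = len(values)
    mgv = median(values)
    above = [v for v in values if v > mgv]
    below = [v for v in values if v < mgv]
    # One-sided deviations around the median (0 when a side is empty).
    ud = math.sqrt(sum((v - mgv) ** 2 for v in above) / len(above)) if above else 0.0
    ld = math.sqrt(sum((v - mgv) ** 2 for v in below) / len(below)) if below else 0.0
    c = 20.0 / n  # correction term as written in the text
    return mgv - (w * ld + c), mgv + (w * ud + c)

def neighbours(y, x, height, width):
    """Yield the 8-connected neighbours of (y, x) inside the image."""
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            ny, nx = y + dy, x + dx
            if (dy or dx) and 0 <= ny < height and 0 <= nx < width:
                yield ny, nx

def grow_region(image, seed, w=1.5, max_iter=100):
    """Grow a region from a seed until its surface stops increasing."""
    height, width = len(image), len(image[0])
    # Assumption: the initial sample is the seed and its 3x3 neighbourhood.
    region = {seed} | set(neighbours(seed[0], seed[1], height, width))
    frontier = set(region)
    for _ in range(max_iter):
        t_low, t_up = thresholds([image[y][x] for y, x in region], w)
        new_points = set()
        for y, x in frontier:
            for ny, nx in neighbours(y, x, height, width):
                if (ny, nx) not in region and t_low <= image[ny][nx] <= t_up:
                    new_points.add((ny, nx))
        if not new_points:  # the surface no longer increases
            break
        region |= new_points
        frontier = new_points  # only new points are examined next time
    return region
```

Keeping only the newly accepted points in `frontier` mirrors the idea of discarding old points to avoid redundant computation; the number of pixels in the returned set gives the surface coverage of the region.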

Figure 3. Watershed transform of the optical filtered image A on the left, and segmentation of bacterial mats after 15 iterations of the region growing algorithm

C. Mutual Information & Region growing

The previous method is not able to detect and segment siboglinid polychaete fields, as shown in the next part (cf. Experimental results). A convenient way to characterize these fields is to use their grey-level distributions (see the distributions in Fig. 4).

The weakness of the previous method comes from the watershed transform failing to detect seed points in siboglinid polychaete fields. Consequently, rather than using a watershed transform to initialize the seed points, in this section an operator first selects some samples that faithfully represent the desired class, here the bacterial mats class. Six samples are selected, then averaged and normalized to obtain a representative model of the grey-level distribution of bacterial mats. This reference distribution is then compared with the distributions computed over the entire image: each current distribution is systematically compared with the reference using a similarity measure, which yields a similarity score. We use mutual information as the similarity measure because it handles multimodal distributions well; indeed, distributions evaluated on 9x9 windows are sometimes non-Gaussian. The Mutual Information (MI) I(X,Y) of two random variables X and Y, with probability densities p(x) and p(y) and joint probability density p(x,y), is the sum of their marginal entropies minus their joint entropy:

$$ I(X,Y) = H(X) + H(Y) - H(X,Y) \qquad (6) $$

where H(X) and H(Y) are the marginal entropies of the random variables and H(X,Y) is the joint entropy. In the continuous case their expressions are:

$$ H(X) = -\int p(x) \log p(x)\, dx, \qquad H(X,Y) = -\iint p(x,y) \log p(x,y)\, dx\, dy \qquad (7) $$

Another expression is the following:

$$ I(X,Y) = \iint p(x,y) \log \frac{p(x,y)}{p(x)\, p(y)}\, dx\, dy \qquad (8) $$

where p(x,y) is the joint probability density function of X and Y, and p(x) and p(y) are the marginal probability density functions of X and Y respectively.
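For discrete grey levels, the integral form of the mutual information becomes a sum over a joint histogram, which can be sketched as below. How the reference and current samples are paired is not detailed in the text; this sketch assumes a pixel-wise pairing of two windows of equal size (for example two flattened 9x9 windows), a quantization of the grey levels into `bins` classes, and natural logarithms. The function name and its parameters are hypothetical.

```python
import math
from collections import Counter

def mutual_information(sample_a, sample_b, bins=8, levels=256):
    """Mutual information (in nats) between two equally sized samples
    of grey levels, estimated from their joint histogram."""
    if len(sample_a) != len(sample_b):
        raise ValueError("samples must contain the same number of pixels")
    n = len(sample_a)
    step = levels / bins
    # Quantize the grey levels so that the joint histogram is not too sparse.
    qa = [min(int(v / step), bins - 1) for v in sample_a]
    qb = [min(int(v / step), bins - 1) for v in sample_b]
    joint = Counter(zip(qa, qb))
    count_a = Counter(qa)
    count_b = Counter(qb)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        p_a = count_a[a] / n
        p_b = count_b[b] / n
        mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi
```

Under these assumptions, a candidate window would be retained as a seed when its score exceeds a fixed fraction of the maximum score observed over the image.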

Figure 4. Corresponding distributions of samples of bacterial mats (top) and siboglinid polychaete fields (bottom)

Figure 5. Segmentation of bacterial mats after 3 iterations by the analysis of grey level distributions by the mutual information and a region growing algorithm on the image A

The second part of the algorithm, namely the region growing algorithm, is similar to the previous one. In contrast to the previous algorithm, however, only three iterations are needed to converge, because the ‘similarity of distributions’ method determines more seed points than the watershed transform.

D. Texture analysis & Mutual Information

A third classification method is necessary to deal with siboglinid polychaete fields. Indeed, grey-level distributions are not accurate enough to distinguish regions of siboglinids from regions of bioturbation (see Experimental results in the next part). Texture analysis is a more powerful segmentation tool than grey-level distribution analysis because it additionally includes a spatial analysis of relative pixel positions. The following section relies mainly on one study [9], whose originality lies in the fusion of several texture descriptors. Each class is characterized by 121 co-occurrence distributions, 50 marginal distributions obtained by Gabor filtering, and 48 marginal distributions of wavelet coefficients. That study shows that Gabor filters perform better than co-occurrence matrices on oriented textures; moreover, Gabor filters are independent of variations in the average grey level, while co-occurrence matrices take into account the relative grey-level values of pixels and capture their relative organization and orientation. Wavelet coefficient descriptors characterize texture less well than co-occurrence or Gabor filter descriptors. It is also notable that the best classification rate is obtained with only 3 of the 219 distributions. The segmentation algorithm is based on the Kullback-Leibler divergence, which measures the similarity between the reference distribution and the current ones. Writing P for the reference distribution and Q for the current one, its standard expression is:

$$ D_{KL}(P \,\|\, Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)} \qquad (9) $$
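A comparison of descriptor distributions with the Kullback-Leibler divergence can be sketched as below. The sketch works on plain histograms (lists of counts) rather than on the co-occurrence or Gabor descriptors of [9], takes the divergence from the reference to the current distribution, and adds a small constant `eps` to every bin to avoid taking the logarithm of zero; these choices, as well as the names `kl_divergence` and `closest_class`, are assumptions made for illustration.

```python
import math

def kl_divergence(p_counts, q_counts, eps=1e-9):
    """Kullback-Leibler divergence D(P || Q) between two histograms
    given as lists of counts over the same bins."""
    if len(p_counts) != len(q_counts):
        raise ValueError("histograms must have the same number of bins")
    # Assumption: eps is added to every bin so that empty bins do not
    # produce a division by zero or log(0).
    p_total = sum(p_counts) + eps * len(p_counts)
    q_total = sum(q_counts) + eps * len(q_counts)
    divergence = 0.0
    for pc, qc in zip(p_counts, q_counts):
        p = (pc + eps) / p_total
        q = (qc + eps) / q_total
        divergence += p * math.log(p / q)
    return divergence

def closest_class(current, references):
    """Return the label whose reference histogram has the smallest
    divergence to the current histogram."""
    return min(references,
               key=lambda label: kl_divergence(references[label], current))
```

A local histogram would then be assigned to the class whose reference distribution gives the smallest divergence, for instance `closest_class(local_histogram, {"mats": mats_histogram, "siboglinid": siboglinid_histogram})`, where the three histograms are placeholders for data computed elsewhere.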

Each pixel is characterized by a descriptor set, represented by the empirical distributions of the texture filtered by a filter bank and estimated on a local spatial window centered on the pixel. The results obtained with this method appear satisfactory for both the bacterial mats and siboglinid polychaete classes (see Fig. 6), except for an error due to illumination variation in the top right corner of the image. This problem highlights a weakness of the algorithm, which requires perfectly filtered data. The error can also be explained by the fact that bacterial mats do not show a texture with a distinctive signature, because of their homogeneous texture and their bright white colour, which gives them a highly reflective aspect.

Figure 6. Optical filtered image A and segmentation of bacterial mats by texture analysis.

Of these three algorithms, some appear to under-estimate the coverage of bacterial mats (the watershed approach and the similarity measure approach), while the texture analysis seems to over-estimate it and even produces some errors (see the artifacts in the top right corner, where some bacterial mats are detected). To confirm these first observations, we tested the algorithms on more complex images containing both bacterial mats and siboglinid polychaete fields.

III. EXPERIMENTAL RESULTS AND DISCUSSION

The results of the different segmentation algorithms are presented and compared hereafter in two experiments, which give the classification ratio obtained by each algorithm on each image. The reference segmentation map required to evaluate the classification ratio is established by an expert in biology (see Fig. 7). To estimate the performance of a segmentation algorithm, we subtract the segmentation map it delivers from the reference segmentation map. The classification ratio µ (equation 10) is the number of pixels common to the reference class and the current one, divided by the total number of pixels composing the reference class:

$$ \mu = \frac{\left| R_{ref} \cap R_{seg} \right|}{\left| R_{ref} \right|} \qquad (10) $$

where R_ref is the set of pixels of the class in the reference map and R_seg is the set of pixels assigned to that class by the algorithm.
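The classification ratio defined above can be computed directly from two label maps, as in the following sketch. It assumes that both maps are 2-D lists of the same size whose entries are class labels; the function name and the label strings are hypothetical.

```python
def classification_ratio(reference, segmented, label):
    """Fraction of the reference pixels of a class that the
    segmentation also assigns to that class."""
    reference_pixels = 0
    common_pixels = 0
    for ref_row, seg_row in zip(reference, segmented):
        for ref_value, seg_value in zip(ref_row, seg_row):
            if ref_value == label:
                reference_pixels += 1
                if seg_value == label:
                    common_pixels += 1
    if reference_pixels == 0:
        raise ValueError("the reference map contains no pixel of this class")
    return common_pixels / reference_pixels
```

Note that this ratio only counts reference pixels that are recovered; pixels wrongly assigned to the class outside the reference region do not lower the score, so an over-estimating algorithm is not penalized by this measure.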

The results show that the texture analysis performs better than the watershed approach or the distribution analysis, for both the bacterial mats and the siboglinid polychaete classes. However, the texture analysis has some weaknesses (see image A in Fig. 6 in the previous part), which can appear whenever illumination variations are present. This observation, together with the classification ratio µ (approximately the same for the distribution analysis and the texture analysis, see Fig. 9), suggests using the distribution analysis to segment the bacterial mats and the texture analysis to determine the corresponding siboglinid polychaete fields. The classification ratios are not high, which could be explained by the fact that the reference classification map remains quite subjective (because it relies on human expertise). It would be possible, for example, to take into account a larger number of pixels belonging to the siboglinid polychaete fields. Indeed, this class can be subdivided into sub-classes of different densities. The same holds for the bacterial mats class, which can be divided into many sub-classes. This is particularly the case for the segmentation of bacterial mats on image B: the expert included several densities of bacterial mats in the bacterial mats class, which explains the difference between the human-made classification and the results of the segmentation algorithms, which associate only one density with one class. The second classification experiment is based on image C, which presents more compact classes than image B, whose regions are sparse. Basically, image C can be separated into three distinct areas, as shown in Figure 10. The classification ratios show that the scores can vary from one image to another.
Once again the texture analysis performs better, with classification ratios of 57.37% for the bacterial mats and 48.65% for the siboglinid polychaete fields. The classification ratios of the watershed approach and of the distribution analysis are less than half that of the texture analysis, and they concern the bacterial mats class only: recall that neither the watershed approach nor the distribution analysis is able to detect the siboglinid class. The segmentation results appear in Figure 11, and the corresponding classification ratios are given in the table of Figure 12. A convenient solution for classifying the bacterial mats class could be to merge both analyses (texture and distribution), so as to avoid the artifacts generated by the texture analysis. The siboglinid polychaete class is segmented by the texture analysis, and an improvement could consist in merging the classification results of the texture analysis of both the backscattering data and the optical data.

IV. CONCLUSION

A segmentation tool has been developed to allow biologists to objectively classify habitats and to map the spatial distribution over time of the species colonizing the HMMV. The segmentation algorithm relies on a texture analysis to estimate the siboglinid polychaete fields, assisted by a distribution analysis to segment the bacterial mats. The results, illustrated by a classification ratio, are not satisfactory since the scores are low, but this test was carried out against a subjective human-made classification and not against a classification based on a synthetic texture image. A next step to improve this segmentation method will be to add to the current segmentation map another segmentation map derived from backscattering data, and to compare both segmentation results with a view to a possible fusion.

Figure 8. Segmentation of the bacterial mats on image B with the watershed method (top left), the distribution analysis (top right), the texture analysis (bottom left), and the segmentation of the siboglinid polychaete fields with the texture analysis (bottom right)

                     WATERSHED    DISTRIBUTION ANALYSIS    TEXTURE ANALYSIS
BACTERIAL MATS       9.23%        16.53%                   17.25%
SIBOGLINID FIELDS    -            -                        53.61%

Figure 9. Good classification ratio (µ) for the two classes as a function of the segmentation algorithm, established on image B (cf. Fig. 7)

Figure 7. Hand-made classification of image B into 3 classes: bacterial mats (blue), siboglinid polychaete (red) and bottom (white).

Figure 10. Hand-made classification of image C into 3 classes: bacterial mats (blue), siboglinid polychaete (red) and bottom (white).

Figure 11. Segmentation of the bacterial mats on image C with the watershed method (top left), the distribution analysis (top right), the texture analysis (bottom left), and the segmentation of the siboglinid polychaete fields with the texture analysis (bottom right)

                     WATERSHED    DISTRIBUTION ANALYSIS    TEXTURE ANALYSIS
BACTERIAL MATS       21.63%       22.49%                   57.37%
SIBOGLINID FIELDS    -            -                        48.65%

Figure 12. Good classification ratio (µ) for the two classes as a function of the segmentation algorithm, established on image C (cf. Fig. 10)

ACKNOWLEDGMENT

We are grateful to the chief scientist (H. Nouzé, Ifremer), the project manager (J.P. Foucher), and the crews of the N/O Pourquoi pas? and of the ROV Victor 6000 for data acquisition during the Vicking cruise. This cruise was part of the European Integrated Project HERMES (Hotspot Ecosystem Research on the Margins of European Seas) and the study is supported by the Carnot institute (France). We also thank I. Karoui and J.M. Augustin for making their Matlab code for the texture analysis available to us.

REFERENCES

[1] K. Olu-le Roy et al., “Cold-seep assemblages on a giant pockmark off West Africa: spatial patterns and environmental control”, Marine Ecology, vol. 28(1), p. 115, 2007.
[2] K. Jerosch, A. Lüdtke, M. Schlüter, G.T. Ioannidis, “Automatic content-based analysis of georeferenced image data: Detection of Beggiatoa mats in seafloor video mosaics from the Håkon Mosby Mud Volcano”, Journal of Computer & Geosciences, vol. 33, p. 202-218, 2007.
[3] M. Sibuet, K. Olu, “Biogeography, biodiversity and fluid dependence of deep-sea cold-seep communities at active and passive margins”, Deep-Sea Research II, vol. 45, p. 517-567, 1998.
[4] A. Gebruk, E. Krylova, A. Lein, G. Vinogradov, E. Anderson, N. Pimenov, G. Cherkashev, K. Crane, “Methane seep community of the Håkon Mosby Mud Volcano (the Norwegian sea): composition and trophic aspects”, Sarsia: North Atlantic Marine Science, vol. 88 (6), 2003.
[5] R. Garcia, T. Nicosevici, X. Cufi, “On the way to solve lighting problems in underwater imaging”, Proceedings of Oceans ’02 MTS/IEEE, vol. 2, p. 1018-1024, 29-31 Oct. 2002.
[6] S. Bazeille, I. Quidu, L. Jaulin, J.P. Malkasse, “Automatic underwater image pre-processing”, Proceedings of CMM’06, 16-19 Oct. 2006.
[7] J.B.T.M. Roerdink, A. Meijster, “The watershed transform: Definitions, algorithms and parallelization strategies”, Fundamenta Informaticae, vol. 41, p. 187-228, 2001.
[8] R. Pohle, K.D. Tönnies, “Segmentation of medical images using adaptive region growing”, Proceedings of SPIE (Medical Imaging 2001), vol. 4322, p. 1337-1346, San Diego, 2001.
[9] I. Karoui, R. Fablet, J.M. Boucher, J.M. Augustin, “Unsupervised region-based image segmentation using texture statistics and level-set methods”, IEEE International Symposium on Intelligent Signal Processing (WISP 2007), 3-5 Oct. 2007.