AUTOMATIC SEGMENTATION AND MODELLING OF TWO-DIMENSIONAL ELECTROPHORESIS GELS

E. Bettens, P. Scheunders, J. Sijbers, D. Van Dyck, L. Moens
RUCA, University of Antwerp
Vision Lab, Dept. of Physics
Groenenborgerlaan 171
B-2020 Antwerpen, BELGIUM

ABSTRACT

An important issue in the analysis of two-dimensional electrophoresis images is the detection and quantification of protein spots. In this paper we describe a new, robust technique to segment and model the spots present in the gels. Segmentation is performed with a watershed technique. For the quantification of the spots, a new spot model based on diffusion principles is constructed. Besides having a physical interpretation, the model is shown to describe the spots better than the commonly used Gaussian models.

1. INTRODUCTION

Two-dimensional electrophoresis (2-DE), an important technique in protein research, separates proteins according to their molecular weight (Mr) and isoelectric point (pI) [1]. In the resulting two-dimensional spot pattern, each spot represents a specific protein. Each protein is characterized by its position in the gel, which determines its pI and Mr values, and by geometrical parameters describing the shape and volume of its spot. The complexity of these spot patterns requires powerful computers and image processing techniques to analyze the gels [2-7].

The analysis proceeds in several stages. The first step is the segmentation of the spots from the background; the aim is to extract as many spots from the gel as possible. The second step is the quantification of the extracted spots: the centre of each spot is determined, together with additional features describing its intensity, size and shape. The third step is calibration: the spot coordinates obtained in the previous step are relative and depend on the deformation of the gel during the 2-DE process.
A transformation is needed to obtain absolute coordinates, namely the pI and Mr values. In the last step, patterns of proteins are recognized and classified. Combinations of spots correspond to the simultaneous appearance of different proteins, and the recognition of complex spot patterns yields detailed information on the protein content of the biological samples.

In this paper we concentrate on the first two steps. To segment the spots from the background, the density peaks in the image have to be found. We present a fully automatic watershed algorithm that segments the image into spot regions, chosen so that the spot in each region can be quantified correctly. A major advantage of this algorithm is its robustness: it is not influenced by a variable background (low-frequency variations). As a result, the background corrections needed by other methods, such as gradient or Laplacian filters [5], are no longer necessary.

The next step is the modeling of the spots. All existing parametric methods use Gaussian models, which have five parameters: two half-widths, the x- and y-coordinates of the centre, and the height. However, when the local concentration of a protein is high, saturation effects can occur that cause the spot profile to deviate from the Gaussian shape. This saturation is inherent to the diffusion process of the proteins themselves and differs from saturation due to the staining process. The number of saturated spots varies from gel to gel and depends on the concentrations of the proteins in the investigated sample. A gel can contain up to 50% non-Gaussian spots, in which case the Gaussian model is no longer justified. To improve the spot model, we took into account the diffusion process that underlies the formation of a spot, and on this basis we propose an improved, more realistic spot model.

The outline of this paper is as follows: Section 2 describes the watershed algorithm used for the segmentation, Section 3 elaborates the new spot model, and Section 4 presents the results and conclusions.

2. SEGMENTATION: WATERSHED ALGORITHM

For the segmentation of the spots in the gel, we made use of a watershed algorithm [8][9]. In a watershed algorithm, the grey-scale image is regarded as a topographic relief in which the grey value of each pixel corresponds to an elevation. The aim is to find the watersheds of this relief, or equivalently the catchment basins belonging to its local minima. An efficient and accurate watershed algorithm was developed by Vincent and Soille [10], who used an immersion-based approach to compute the watersheds. The technique can be pictured as follows: holes are pierced in each local minimum of the topographic surface, and the surface is then slowly immersed in a lake, so that the catchment basins fill up, starting with the basin associated with the global minimum. Wherever two catchment basins would merge, a dam is built. The procedure results in a partition of the image into catchment basins whose borders define the watersheds. The immersion technique is efficient in terms of both processing time and memory. It is based on sorting the pixels in increasing order of grey value, which allows a fast scan of the plateaus using a first-in, first-out (FIFO) queue.

This watershed algorithm is well suited to segmenting the spots in a 2-DE gel because, after a small mean filter has been applied, each spot has a profile that increases monotonically and thereafter decreases. Each spot therefore produces its own catchment basin. The approach is very robust: a varying background intensity has no influence on finding the spot regions. To exclude small regions corresponding to background noise, a threshold was set on the minimal size of the basins. The remaining basins delineate the regions of most spots.
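The immersion principle described above can be illustrated with a short, pure-Python sketch. It is not the Vincent and Soille implementation [10]: it is a priority-queue flooding variant of the same idea (pixels are processed in increasing grey value, equal values first-in, first-out), it assigns every pixel to a basin instead of building explicit dam lines, and it assumes that spots are dark, i.e. grey-level minima, as in the text. The names `mean_filter`, `regional_minima`, `watershed` and the `min_size` threshold are hypothetical.

```python
import heapq
from collections import deque

# 4-connected neighbourhood (an assumption; 8-connectivity is also common)
N4 = ((-1, 0), (1, 0), (0, -1), (0, 1))


def neighbours(i, j, h, w):
    """Yield the in-bounds 4-neighbours of pixel (i, j)."""
    for di, dj in N4:
        y, x = i + di, j + dj
        if 0 <= y < h and 0 <= x < w:
            yield y, x


def mean_filter(img, k=1):
    """(2k+1) x (2k+1) box filter; border pixels average only the neighbours that exist."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            win = [img[y][x]
                   for y in range(max(0, i - k), min(h, i + k + 1))
                   for x in range(max(0, j - k), min(w, j + k + 1))]
            out[i][j] = sum(win) / len(win)
    return out


def regional_minima(img):
    """Return the flat connected plateaus that have no lower neighbour."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    minima = []
    for i in range(h):
        for j in range(w):
            if seen[i][j]:
                continue
            v, plateau, is_min = img[i][j], [], True
            seen[i][j] = True
            queue = deque([(i, j)])
            while queue:  # breadth-first walk over the equal-valued plateau
                y, x = queue.popleft()
                plateau.append((y, x))
                for ny, nx in neighbours(y, x, h, w):
                    if img[ny][nx] < v:
                        is_min = False
                    elif img[ny][nx] == v and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if is_min:
                minima.append(plateau)
    return minima


def watershed(img, min_size=1):
    """Flood the relief from its regional minima in order of increasing grey value.

    Returns (labels, seeds): labels[i][j] is the basin number (1, 2, ...) or 0 for
    pixels of basins smaller than min_size; seeds maps each kept basin to a pixel
    of its minimum, i.e. a first estimate of the spot position.
    """
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    heap, seeds, order = [], {}, 0
    for lab, plateau in enumerate(regional_minima(img), start=1):
        seeds[lab] = plateau[0]
        for y, x in plateau:
            labels[y][x] = lab
            heapq.heappush(heap, (img[y][x], order, y, x))
            order += 1  # tie-breaker: equal grey values are flooded first-in, first-out
    while heap:
        _, _, y, x = heapq.heappop(heap)
        for ny, nx in neighbours(y, x, h, w):
            if labels[ny][nx] == 0:  # claimed by the first (lowest) flooded neighbour
                labels[ny][nx] = labels[y][x]
                heapq.heappush(heap, (img[ny][nx], order, ny, nx))
                order += 1
    sizes = {}
    for row in labels:
        for lab in row:
            sizes[lab] = sizes.get(lab, 0) + 1
    keep = {lab for lab, n in sizes.items() if n >= min_size}  # drop small noise basins
    labels = [[lab if lab in keep else 0 for lab in row] for row in labels]
    return labels, {lab: pos for lab, pos in seeds.items() if lab in keep}
```

Calling `watershed(mean_filter(img), min_size=10)` on a list of lists of grey values returns the basin label image and one seed per basin; each seed is a starting estimate of the spot position for the fitting step of Section 3.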
However, some spots overlap in such a way that they give rise to only one catchment basin, and as a result they are identified as a single spot. A solution to this problem is given in Section 3. A first approximation of each spot position is given by the pixel with the minimal grey level in its basin. The spots are then modeled by a pixel-by-pixel fitting procedure, which fits a parameterized spot model to each spot within its corresponding basin.

3. MODELING OF THE SPOTS

Until now, most parametric models have been based on the following Gaussian model:

$$C(x,y) = I \exp\left(-\frac{(x-x_o)^2}{2\sigma_x^2}\right) \exp\left(-\frac{(y-y_o)^2}{2\sigma_y^2}\right) \tag{1}$$

where $I$ is the height of the spot, $(x_o, y_o)$ its centre, and $\sigma_x$, $\sigma_y$ its widths in the x- and y-directions.
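As a minimal sketch (the function names are hypothetical), equation (1) can be evaluated directly; it has exactly the five parameters listed in the introduction. Here the widths `sx` and `sy` are taken to be the standard deviations σ_x and σ_y; if a half-width at half-maximum is wanted instead, it equals σ√(2 ln 2).

```python
import math


def gaussian_spot(x, y, I, x0, y0, sx, sy):
    """Separable 2-D Gaussian of eq. (1): height I, centre (x0, y0), widths sx, sy."""
    return (I * math.exp(-(x - x0) ** 2 / (2 * sx ** 2))
              * math.exp(-(y - y0) ** 2 / (2 * sy ** 2)))


def half_width(s):
    """Half-width at half-maximum corresponding to the standard deviation s."""
    return s * math.sqrt(2 * math.log(2))
```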

However, we found that this model fails for certain spots, which led us to examine the diffusion process itself. If the total amount M of diffusing substance is initially concentrated at a single point, the solution of the diffusion equation in a one-dimensional medium is given by:

$$C = \frac{M}{2\sqrt{\pi D t}} \exp\left(-\frac{x^2}{4Dt}\right) \tag{2}$$

with C the concentration of the diffusing substance, a function of the position x and time t, and D the diffusion constant. In the case of a 2-DE gel, we assume that the medium in which the diffusion takes place is two-dimensional and anisotropic: there are two main directions of diffusion, each with its own diffusion properties. Moreover, the initial distribution is not concentrated at one point but occupies a finite region. In reality this distribution has a complicated form, e.g. due to the electric forces during the 2-DE process, but we approximate it by assuming that the diffusing substance is initially distributed uniformly over a circle of radius a. The solution of the corresponding diffusion equation is given by

$$C(x,y,t) = \frac{1}{2} C_o \left[ \operatorname{erf}\left(\frac{a-r}{2\sqrt{Dt}}\right) + \operatorname{erf}\left(\frac{a+r}{2\sqrt{Dt}}\right) \right] - \frac{C_o}{r}\sqrt{\frac{Dt}{\pi}} \left[ \exp\left(-\frac{(a-r)^2}{4Dt}\right) - \exp\left(-\frac{(a+r)^2}{4Dt}\right) \right] \tag{3}$$

with

$$r = \sqrt{D\left(\frac{(x-x_o)^2}{D_x} + \frac{(y-y_o)^2}{D_y}\right)} \tag{4}$$

with $C_o$ the initial concentration in the circle, $x_o$ and $y_o$ the coordinates of the centre, and $D_x$ and $D_y$ the diffusion constants in the x- and y-directions. Note that for small a, equation (3) reduces to a Gaussian profile of the form of equation (2); the Gaussian model is thus valid in the limit of small a. Larger spots, however, are formed by proteins whose concentrations in the sample are relatively high. In that case the parameter a, which is related to the initial region of the gel occupied by such proteins, cannot be assumed small, and the Gaussian model is no longer valid. To use equation (3), combined with equation (4), as a new model in a fitting procedure, the redundant parameters are eliminated and an extra parameter is added to compensate for the background. The new model is then given by:

$$C(x,y) = B + \frac{1}{2} C_o \left[ \operatorname{erf}\left(\frac{a'-r'}{2}\right) + \operatorname{erf}\left(\frac{a'+r'}{2}\right) \right] - \frac{C_o}{\sqrt{\pi}\, r'} \left[ \exp\left(-\frac{(a'-r')^2}{4}\right) - \exp\left(-\frac{(a'+r')^2}{4}\right) \right] \tag{5}$$

with

$$r' = \sqrt{\frac{(x-x_o)^2}{D'_x} + \frac{(y-y_o)^2}{D'_y}} \tag{6}$$

where the seven parameters to be fitted are

$$B,\quad C_o,\quad a' = \frac{a}{\sqrt{Dt}},\quad D'_x = D_x t,\quad D'_y = D_y t,\quad x_o,\quad y_o.$$

The best parameters for each spot are determined by minimizing a χ² function. As mentioned above, two overlapping spots may be identified by the watershed algorithm as a single spot and thus modeled as one. This problem is solved by subtracting the modeled spot from the original spot: if a peak appears in the difference image, this indicates the presence of an extra spot in the spot region. In that case the region is modeled again, now with a model that is the sum of two spots, and the χ² value determines whether the two-spot fit is better and should be retained.

4. RESULTS AND CONCLUSIONS

The spot segmentation and modeling algorithms were developed on an HP-9000 Unix workstation and written in standard C. To test the algorithms we used silver-stained gels, digitized with a laser scanning densitometer at a spatial sampling interval of 50 µm. Figure 1a shows the digitized image of part of a 2-DE gel, 512 × 512 pixels in size with an 8-bit grey-level resolution. This gel contains both Gaussian and non-Gaussian spots. The segmentation into spot regions was done with the watershed algorithm, and the modeling used equations (5) and (6). Figure 1b shows the resulting synthetic image: for every detected spot, the modeled result is shown in the corresponding spot region. The overlapping spots were also correctly detected and modeled together.

In a second experiment we compared the results of the new diffusion model with those of the Gaussian model. Figure 2 shows the profile of an original saturated (inverted) spot (2a) and the profiles found by the fitting procedure using the diffusion model (2b) and the Gaussian model (2c). The Gaussian model clearly fails, especially in the flattened region, whereas the diffusion model yields a more accurate shape. This was the case for all the spots we tested.

To conclude: the watershed algorithm is a very robust algorithm for detecting spots, with the major advantage that no background subtraction is needed. Secondly, we showed that our new spot model is much better suited to spot modeling than the Gaussian models used until now.

Fig. 1: (a) Example of an original digitized 2-DE gel and (b) synthetic image with the resulting modeled spots.
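The reduced model of equations (5) and (6), the χ² objective and the residual check for overlapping spots described in Section 3 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the pixel values are assumed to be the inverted grey levels of one basin (so a spot is a peak, as in the inverted profile of Fig. 2a), and χ² uses a single assumed noise level `sigma`. The paper does not state which minimiser was used, so none is shown; any nonlinear least-squares routine could minimise `chi2` over the seven parameters. The case r' = 0 is handled with the analytic limit of the last term of equation (5).

```python
import math


def diffusion_spot(x, y, B, Co, ap, Dxp, Dyp, x0, y0):
    """Eqs. (5)-(6): background B plus a spot diffused from a uniform initial region.

    ap = a/sqrt(D t), Dxp = Dx*t and Dyp = Dy*t are the reduced parameters.
    """
    rp = math.sqrt((x - x0) ** 2 / Dxp + (y - y0) ** 2 / Dyp)  # eq. (6)
    core = 0.5 * Co * (math.erf((ap - rp) / 2) + math.erf((ap + rp) / 2))
    if rp < 1e-8:
        # analytic limit of [exp(-(ap-rp)^2/4) - exp(-(ap+rp)^2/4)] / rp as rp -> 0
        edge = ap * math.exp(-ap * ap / 4)
    else:
        edge = (math.exp(-(ap - rp) ** 2 / 4) - math.exp(-(ap + rp) ** 2 / 4)) / rp
    return B + core - Co / math.sqrt(math.pi) * edge


def chi2(pixels, params, sigma=1.0):
    """Sum of squared residuals / sigma^2 over the (x, y, value) triples of one basin."""
    return sum(((v - diffusion_spot(x, y, *params)) / sigma) ** 2 for x, y, v in pixels)


def residual_peak(pixels, params):
    """Largest positive residual (data minus model) and the pixel where it occurs.

    A clear peak here hints at a second, overlapping spot in the same basin.
    """
    return max((v - diffusion_spot(x, y, *params), (x, y)) for x, y, v in pixels)
```

For large a' the model has a flat top at B + Co (a saturated spot), while for small a' its profile approaches exp(-r'²/4), the Gaussian limit mentioned in Section 3. A two-spot model could be formed as the sum of two such terms sharing B; because it has more parameters, its minimum χ² can never be higher than that of the one-spot fit, so the retention decision needs some threshold, which the text does not specify.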

REFERENCES

[1] O'Farrell, P. H., “High resolution two-dimensional electrophoresis of proteins”, J. Biol. Chem., vol. 250, pp. 4007-402, 1975.
[2] Anderson, N. L., Taylor, J., Scandora, A. E., Coulter, B. P., Anderson, N. G., “The TYCHO system for computer analysis of two-dimensional gel electrophoresis patterns”, Clin. Chem., vol. 27, pp. 1807-1820, 1981.
[3] Garrels, J. I., “The Quest system for quantitative analysis of two-dimensional gels”, J. Biol. Chem., vol. 264, pp. 5269-5282, 1989.
[4] Lemkin, P. F., Lipkin, L. E., “GELLAB: A computer system for 2D gel electrophoresis analysis I: segmentation of spots and preliminaries”, Comput. Biomed. Res., vol. 14, pp. 272-297, 1981.
[5] Yecheng, W., Lemkin, P. F., Upton, K., “A fast spot segmentation algorithm for two-dimensional gel electrophoresis analysis”, Electrophoresis, vol. 14, pp. 1351-1356, 1993.
[6] Lemkin, P. F., Myrick, J. E., Upton, K. M., “Splitting merged spots in two-dimensional polyacrylamide gel electrophoresis images”, Appl. Theor. Electrophoresis, vol. 3, pp. 163-172, 1993.
[7] Solomon, J. E., Harrington, M. G., “A robust, high-sensitivity algorithm for automated detection of proteins in two-dimensional electrophoresis gels”, CABIOS, vol. 9, no. 2, pp. 133-139, 1993.
[8] Beucher, S., Lantuéjoul, C., “Use of watersheds in contour detection”, Proc. Int. Workshop Image Processing, Real time edge and motion detection/estimation, Rennes, France, Sept. 17-21, 1979.
[9] Beucher, S., “Watersheds of functions and picture segmentation”, Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, Paris, France, May 1982, pp. 1928-1931.
[10] Vincent, L., Soille, P., “Watersheds in Digital Spaces: An Efficient Algorithm Based on Immersion Simulations”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 6, pp. 583-59, 1991.

Fig. 2: (a) Profile of an original spot, (b) resulting profile after modeling with the Gaussian model, (c) resulting profile after modeling with the diffusion model.