IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 15, NO. 10, OCTOBER 2006
A Deformable Grid-Matching Approach for Microarray Images Michele Ceccarelli, Member, IEEE, and Giuliano Antoniol, Member, IEEE
Abstract—A fundamental step of microarray image analysis is the detection of the grid structure for the accurate location of each spot, representing the state of a given gene in a particular experimental condition. This step is known as gridding and belongs to the class of deformable grid matching problems, which are well known in the literature. Most of the available microarray gridding approaches require human intervention, for example, to specify landmarks, some points in the spot grid, or even to precisely locate individual spots. Automating this part of the process can allow high-throughput analysis. This paper focuses on the development of a fully automated procedure for microarray gridding. It is grounded on the Bayesian paradigm and on image analysis techniques. The procedure has two main steps. The first step, based on the Radon transform, is aimed at generating a grid hypothesis; the second step accounts for local grid deformations. The accuracy and properties of the procedure are quantitatively assessed over a set of synthetic and real images; the results are compared with well-known methods available from the literature.
Index Terms—Bayesian image analysis, microarray gridding, Radon transform.
I. INTRODUCTION
DNA microarray technology [11] has a large impact in many application areas, such as the diagnosis of human diseases and treatments (determination of risk factors, monitoring of disease stage and treatment progress, etc.), agricultural development (plant biotechnology), quantification of genetically modified organisms, and drug discovery and design. In cDNA microarrays, a set of genetic DNA probes (from several hundred to some thousands) is spotted on a slide. Two populations of mRNA, tagged with fluorescent dyes, are then hybridized with the slide spots, and finally the slide is read with a scanner. The outlined process produces two images, one for each mRNA population, each of which varies in intensity according to the level of hybridization, represented as the quantity of fluorescent dye contained in each spot. Image analysis is an essential aspect of microarray experiments; measures over the scanned image can substantially affect successive steps such as clustering and identification of differentially expressed genes. Scanned microarray image processing has three main tasks [28]: 1) gridding, which is the process of
Manuscript received May 23, 2005; revised February 1, 2006. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Mario A. T. Figueiredo. The authors are with Research Center on Software Technologies RCOST, University of Sannio, 82100 Benevento, Italy (e-mail:
[email protected];
[email protected]). Color versions of Figs. 1, 2, 7, 8, 9, 13, and 14 are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TIP.2006.877488
assigning image coordinates to the spots; 2) segmentation, which separates foreground from background pixels; and 3) intensity extraction, which computes the average foreground and background intensities for each spot of the array. Most of the available gridding approaches require human intervention, for example, to specify some points in the spot grid or even to precisely adjust individual spots. Automating this part of the process will allow high-throughput analysis. Typically, the technician interactively places some reference points and manually adjusts the spot locations. Manual gridding is one of the main sources of variation in microarray studies. An interesting experiment reported in [19] demonstrates that, for the same image, the intensities extracted by different researchers can easily show large variations. Therefore, this paper focuses on the development of an automated procedure for microarray gridding. Automated segmentation is a different problem, and different contributions have been published in the literature [6], [18], [25]. The problem of automatic gridding is complicated by the fact that microarray images are usually highly contaminated with noise and artifacts of the wet lab processes. Rotations, misalignment, and local deformations of the ideal rectangular grid can often occur. There is a strong need for microarray gridding methods that are robust and flexible at the same time. Some efforts to help automatic microarray data processing have recently emerged in the literature. However, most of them impose different kinds of restrictions and are based on stringent assumptions. For example, the approaches in [14] and [21] require that the grid rows and columns be strictly aligned with the x and y image axes. Other approaches, such as [16] and [17], rely on the Bayesian paradigm to deal with uncertainty and noise.
In particular, the approach presented in [17] describes a second-order prior for microarray gridding, whereas [16] presents a general approach to the grid matching and image warping problems. Here, we adopt a Bayesian approach with a prior containing just 1-cliques. This simplifies, from a computational point of view, the search for the maximum a posteriori (MAP) solution. The adopted prior requires the previous computation of a reference regular grid, which is obtained by using the Radon transform (RT) [23] of a filtered microarray image. The use of a voting-based method such as the Hough transform to get the overall grid parameters was originally proposed in [9]. In particular, potential starting spot locations are computed according to the orientation matching transform (OM) [10]. The use of a rectangular grid model has also been successfully exploited in [21] together with a heuristic search procedure, where several restrictions (alignment with the x and y axes) over the grid are imposed without any further adaptation, and computing times
1057-7149/$20.00 © 2006 IEEE
CECCARELLI AND ANTONIOL: DEFORMABLE GRID-MATCHING APPROACH FOR MICROARRAY IMAGES
may be a problem in the adopted search. Our method improves the approach reported in [21] by allowing arbitrary grid alignments with respect to the image axes and by refining the interpolating grid to deal with local deformations. It also simplifies the general model in [16] by adopting a simpler prior based on 1-cliques. The final result is a MAP grid obtained by a Markov chain Monte Carlo approach. The adopted probabilistic model allows one to deal with local deformations of a reference regular grid. Our approach has some similarities with the heuristic method reported in [4], where a similar idea of a prior image filtering step is exploited. However, here we show how the steps of angular optimization and parameter optimization, which are performed by heuristic search in [4], can be efficiently implemented by exploiting the properties of the RT. Our approach also has some similarities with the method reported in [7] where, instead, a set of guide spots is used to generate the grid parameters. The idea of using guide spots and interpolation has also been used in [21], and for arbitrary grid rotations in [2]. Energy-based gridding has also been recently proposed in [26].
The paper is organized as follows. Section II reports the background methodologies our algorithm relies on, whereas in Section III we present our gridding algorithm. A set of experiments on computer-generated and real images is then reported in the last section.
II. GRIDDING
A microarray contains up to several thousand fragments of DNA. Each sequence is a probe for capturing a given gene. In practice, each probe will hybridize with a specific sequence of complementary RNA. The genes extracted from the tissues to be studied are first labeled with fluorescent dyes and then hybridized to the array. The genes which are more activated will be evident on the array, as they will light up.
Biological studies are aimed at explaining how the genes are differentially activated in the same tissues under different conditions, for example, by comparing healthy and sick tissues, or treated and untreated tissues. Therefore, the array will measure how the genes are differentially expressed in two cell cultures: one is the sample and the other is the control. First, the probe sequences are spotted on the array and arranged over a bidimensional grid. Then the mRNA is extracted and labeled, with a red dye for the sample and a green dye for the control tissues. The labeled mRNA is then hybridized with the sequences spotted on the array. Finally, the array is scanned to obtain an image where each spot represents the differential level of hybridization of each gene in the two tissues. The differential activation of the genes in the control and in the sample tissue will produce a set of different configurations. The scanning process produces an RGB image where the first channel corresponds to the expression level of genes in the sample tissue, the second channel corresponds to the expression level of genes in the control tissue, and the third channel is just set to zero. For example, with respect to the image of Fig. 1, the spots with a green color correspond to genes which are more expressed in the control tissue, whereas red spots correspond to genes which are more expressed in the sample, and yellow spots represent genes which are more or less equally expressed in both tissues.
Fig. 1. Microarray image taken from the Stanford Microarray Database, ExptID 15739, [3].
Gridding is the first step in the analysis of microarray images; it consists in addressing each spot in the image. As already explained, this step is one of the major sources of variation in microarray experiments [19] and has a strong influence on the successive statistical analysis steps. As we can appreciate from the image of Fig. 1, the nature of the wet acquisition process makes the images subject to several sources of noise, such as imperfect alignment of spots, local deformations, small rotations due to wrong placement under the image scanner, skew of the axes, and irregular spot shape and size. Here we try to deal with severe image conditions such as image skewness, misalignments, and local geometrical deformations. The solution of our problem is a deformable grid which has to satisfy some regularity constraints. We first apply a matched filter for the enhancement of circular objects, and then we compute a rectangular grid which is successively deformed according to a MAP estimate of the optimal grid with a Bayesian inference scheme, as reported in Section III. Fig. 1 also shows that real images contain a set of grids which are organized at two levels (4 × 4 or 12 × 4). We focus our attention on inner grids (one example is reported in Fig. 2), since the problem of segmenting the outer grid has been approached by several authors. Here, we use a variant of the approach reported in [1] for segmenting inner grids, as explained in Section IV.
A. The Orientation Matching Transform (OM)
The two channels of the input images are first filtered according to the OM; this filtering process is aimed at the detection of candidate spot centers. The OM was proposed in [10]. It is an extension of the Hough transform for circles and has several advantages: it is a correlation-based transform; it does not require a prior edge-detection phase; and it can be applied to circles with a wide range of radii. Finally, it can be tailored to recognize light spots on a dark background, dark spots on a light background, or both.
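As a rough illustration of the idea behind the OM, the following sketch scores each pixel by the average cosine between the image gradient direction and the radial direction over an annulus around it. It is a minimal sketch under stated assumptions, not the operator of [10]: the function name `orientation_matching`, the central-difference gradient, the uniform averaging over the annulus, and the sign convention (bright spots on a dark background) are all choices made here for illustration.

```python
import math


def orientation_matching(image, r_min, r_max):
    """Average cosine between gradient direction and inward radial direction
    over the annulus r_min <= r <= r_max around each pixel (illustrative)."""
    h, w = len(image), len(image[0])
    # Unit gradient field from central differences (zero where the gradient vanishes).
    gx = [[0.0] * w for _ in range(h)]
    gy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dx = (image[y][x + 1] - image[y][x - 1]) / 2.0
            dy = (image[y + 1][x] - image[y - 1][x]) / 2.0
            norm = math.hypot(dx, dy)
            if norm > 0:
                gx[y][x], gy[y][x] = dx / norm, dy / norm
    # Offsets belonging to the annulus, with their outward unit radial vectors.
    ring = []
    for oy in range(-r_max, r_max + 1):
        for ox in range(-r_max, r_max + 1):
            d = math.hypot(ox, oy)
            if r_min <= d <= r_max:
                ring.append((ox, oy, ox / d, oy / d))
    om = [[0.0] * w for _ in range(h)]
    for y in range(r_max, h - r_max):
        for x in range(r_max, w - r_max):
            total = 0.0
            for ox, oy, ux, uy in ring:
                # For a bright spot the gradient points toward the centre,
                # i.e. opposite to the outward radial vector, hence the minus sign.
                total -= gx[y + oy][x + ox] * ux + gy[y + oy][x + ox] * uy
            om[y][x] = total / len(ring)
    return om
```

On a single smooth bright blob, the response of this sketch approaches 1 at the blob center and decreases away from it, which is the behavior the text attributes to the OM as a measure of certainty of being a spot center.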
Fig. 3. Grid representation in the GridMatching step.
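The GridMatching step (Section II-C) can be sketched in a few functions: a discrete projection of the filtered image for a given angle, a search for the two best-aligned directions, a spacing estimate from one projection, and the construction of the regular grid from its six parameters. This is only a sketch under assumptions. The names `radon_column`, `principal_directions`, `grid_spacing`, and `grid_nodes` are hypothetical; the alignment score (sum of squared projection values) is an assumption, chosen because the plain integral of a projection is the same for every angle; the median of the peak distances stands in for the outlier test used in the paper; and the node formula is one plausible parametrization of Fig. 3.

```python
import math


def radon_column(image, theta_deg):
    """One column of a discrete Radon transform: pixel values accumulated in
    bins of s = x*cos(theta) + y*sin(theta), rounded to the nearest bin."""
    h, w = len(image), len(image[0])
    c, s = math.cos(math.radians(theta_deg)), math.sin(math.radians(theta_deg))
    offset = int(math.ceil(math.hypot(h, w)))  # shift so that bin indices are >= 0
    column = [0.0] * (2 * offset + 1)
    for y in range(h):
        for x in range(w):
            if image[y][x]:
                column[int(round(x * c + y * s)) + offset] += image[y][x]
    return column, offset


def principal_directions(image):
    """Two angles (degrees, 0..179) with the highest smoothed alignment score."""
    # Assumed score: sum of squares of the projection, large when mass is aligned.
    score = [sum(v * v for v in radon_column(image, a)[0]) for a in range(180)]
    # Circular low-pass filter with impulse response (0.3, 0.4, 0.3).
    smooth = [0.3 * score[a - 1] + 0.4 * score[a] + 0.3 * score[(a + 1) % 180]
              for a in range(180)]
    peaks = [a for a in range(180)
             if smooth[a] > smooth[a - 1] and smooth[a] > smooth[(a + 1) % 180]]
    peaks.sort(key=lambda a: smooth[a], reverse=True)
    return sorted(peaks[:2])


def grid_spacing(column):
    """Median distance between consecutive strong peaks of one projection."""
    level = 0.5 * max(column)
    pos = [i for i in range(1, len(column) - 1)
           if column[i] >= level and column[i] >= column[i - 1] and column[i] > column[i + 1]]
    gaps = sorted(b - a for a, b in zip(pos, pos[1:]))
    return gaps[len(gaps) // 2]


def grid_nodes(x0, y0, theta_x, theta_y, dx, dy, n_cols, n_rows):
    """Regular grid from origin, two direction angles (degrees), and two spacings."""
    ax, ay = math.radians(theta_x), math.radians(theta_y)
    return [(x0 + i * dx * math.cos(ax) + j * dy * math.cos(ay),
             y0 + i * dx * math.sin(ax) + j * dy * math.sin(ay))
            for j in range(n_rows) for i in range(n_cols)]
```

Note that with this parametrization of the projection, a peak at angle theta corresponds to spot alignments perpendicular to the direction (cos theta, sin theta); for a nearly rectangular grid the two peaks and the two grid directions form the same pair of angles modulo 180 degrees.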
Fig. 2. Subgrid of the image in Fig. 1.

Let $A$ be an annulus of radii $r_m$ and $r_M$ centered at a point $p$ of the plane and, for each point $q$, let us define $\phi(q)$ as the orientation of an ideal circle centered at the origin and with radius $|q|$. Let $I$ represent one of the two fluorescence images and let $\theta(q)$ be the orientation of the gradient of $I$ at $q$; then the OM is given by (1).

We have $\mathrm{OM}(p) \le 1$, with equality when all of the level lines of the image in the annulus are circles or, more properly, when the direction of their gradients is the same as that of ideal circles. The operator OM performs a mapping from the image space onto itself, and the value of $\mathrm{OM}(p)$ can be interpreted as the degree of certainty that the point $p$ is the center of a spot. As the cosine can be implemented with a scalar product, the transform (1) can be easily implemented by template matching of the image gradient with a bidimensional vector field given by the direction of the gradient of an ideal annulus, as expressed by (2) through a suitable kernel. Therefore, $\mathrm{OM}(p)$ represents the matching between the orientation of the image gradient and the orientation of an ideal annulus of radii $r_m$ and $r_M$.

B. The Radon Transform
The RT can be used to describe a function in terms of its (integral) projections [23]; it is a mapping from the function onto the projection space. The inverse RT corresponds to the reconstruction of the function from the projections. In the case of images, the two-dimensional (2-D) RT of an image $f(x, y)$ is formulated as

$$R(s, \theta) = \iint f(x, y)\, \delta(s - x\cos\theta - y\sin\theta)\, dx\, dy \qquad (3)$$

where $\delta$ is the Dirac delta function. Within the field of image analysis, the RT is mostly known for its role in computed tomography, where it is used to model the process of acquiring projections of the original object using X-rays. Given the projection data, the inverse RT, in whatever form (e.g., back-projection), can be applied to reconstruct the original object. Here, we propose to find the initial parameters of the input grid from the RT of the filtered image.

C. GridMatching
This step computes an initial guess consisting of a regular grid described by six parameters: the coordinates of the upper left spot, the angles between the two grid directions and the x axis, and the grid spacings in both directions (see Fig. 3). Together with the number of spots along each grid direction, these parameters determine the coordinates of all the grid points.

The main step of the GridMatching phase consists in the computation of the Radon transform of the filtered images. By analyzing the peaks of the RT, we can compute all six parameters of the grid. Each channel of the input image is filtered by the OM, having as parameters just the minimum and maximum spot radii. Note that the operator (1) weights all the radii in the same manner and, therefore, this range can be wide. This is demonstrated in the experimental section.

Fig. 4. OM transform of the image in Fig. 2.

Fig. 5. Plot of the function in (4).

As an example, consider the image of Fig. 2, presenting a microarray with a global rotation and various local deformations. The sum of the OM over the image channels is reported in Fig. 4. We can compute the two grid angles, as anticipated, by using the RT of the transformed image. In order to find the principal directions of the grid, we consider just the direction of the projection, by integrating over the space variable in the transform. This allows us to select the directions having the maximum score, corresponding to the angles along which we have the maximum number of aligned spot centers. In particular, our algorithm computes the two main peaks of the function
(4) Our peak-detection algorithm is very simple; we just compute the two best local maxima of this function. The reason lies in the fact that, to detect the directions along which the spot locations are maximally aligned, the RT should have been scored many times for several values of the space variable. Following the above image example, we report in Fig. 5 the plot of the function, which has two distinct peaks corresponding to 1° and 92°; we select these as the principal orientations of the grid. In order to remove the noise, the function is low-pass filtered before the peak detection; the adopted filter has an impulse response (0.3 0.4 0.3). Once the principal orientations of the grid have been computed, we are able to determine the two spacing parameters of the reference grid. This is done again by using the properties of the RT. We actually know that the possible spot centers are maximally aligned along the two principal directions. Therefore, if we project the OM image along one of them, we obtain a sequence of values which correspond to the spot alignments which are parallel to this direction. Fortunately, we know that the spots form a grid, and the alignments parallel to this direction are organized as a sequence of rows which are one grid spacing apart. Therefore, the projection of the OM image along this direction will eventually produce a profile having a sequence of peaks which are one grid spacing apart. Once the RT has been computed, it is enough to extract the columns corresponding to the two principal directions, which are respectively reported in Fig. 6(a) and (b).

Fig. 6. Two columns of the RT corresponding to (a) the first and (b) the second principal direction.

It is worth noting that this method is a generalization of the algorithm proposed in [14]. Indeed, the use of the RT allows one to develop a framework able to deal with a wide range of directions; on the contrary, the original approach reported in [14] requires the grid to be perfectly aligned with the x and y axes. Our estimate of the grid spacing in one direction is the average distance between consecutive peaks in the corresponding column of the RT, whereas the estimate of the spacing in the other direction is the average distance between consecutive peaks in the other column. However, in order to define a robust procedure, the peak detection must be as immune as possible to outliers and image imperfections. For example, it is possible that a complete row or column of the grid
does not contain any significant spot. To this purpose, we use the following procedure for a robust estimation of the grid spacing. Let us consider the positions of the local maxima of the column of the RT corresponding to one principal direction. We sort these positions in increasing order, and then we compute the mean and standard deviation of the series of distances between consecutive positions. We extract from such series the values passing the Grubbs' test for detecting outliers [5]; the value of the spacing is then selected as the mean of this new set. In the same way, the spacing in the other direction is computed starting from the column of the RT corresponding to the other principal direction. Finally, the computation of the grid origin is a straightforward step: it is enough to project back the coordinates of the spot centers and take the minimum of the attained values. A plot of the result for the image in Fig. 2 is reported in Fig. 7.

Fig. 7. Result of the GridMatching step.

III. BAYESIAN GRID REFINEMENT
The Bayesian approach to image analysis provides the means to incorporate prior knowledge in the process of discovering useful patterns from measured images [12]. Bayesian analysis relies on the posterior probability, which summarizes the degree of certainty concerning a given situation. Bayes' law states that the posterior probability is proportional to the product of the likelihood and the prior probability. The likelihood encompasses the information contained in the observed data. The prior expresses the degree of certainty concerning the expected result before observing the data [15]. Although the posterior probability completely describes the state of certainty about any possible image, it is often necessary to select a single datum as the result of the analysis. A typical choice is the image, or feature, that maximizes the posterior probability, which is called the MAP estimate. For the case of microarray image gridding, we have an observed datum, the input image $I$, which is the raw visual representation of an ideal grid $g$, consisting of a sequence of spot locations with a well-defined organization. The MAP grid estimate, therefore, is obtained by searching for the most likely grid given the observed image, i.e., the grid maximizing the posterior $p(g \mid I)$, and the Bayes principle states that this posterior is proportional to the product between the prior and the likelihood

$$p(g \mid I) \propto p(g)\, p(I \mid g) \qquad (5)$$

This paper assumes a notation similar to the one adopted in [16]; let us define the list of node locations $g = (g_1, \ldots, g_n)$, where $g_i$ is the vector of image coordinates of the node $i$. Let $g^0 = (g^0_1, \ldots, g^0_n)$ be the nodes of the reference grid computed by the GridMatching step detailed in the previous section; then the joint distribution of $g$ is modeled by the Gaussian random field (6).

Therefore, the grid location of the point $i$ has a Gaussian distribution with mean $g^0_i$. This means that we assume that the real grid can deviate in a Gaussian manner from regularity. Other approaches, such as [16], are based on a second-order prior and take into account local interactions between grid locations. In any case, since the GridMatching step produces a regular interpolating grid as a result, this prior serves as a regularizer, in the sense that it tends to favor solutions which are not far from the regularity imposed by the reference grid $g^0$. Here, we do not assume mutual dependence of the Gaussian displacement along the two axes, and therefore we fix
(7) where $\sigma$ is a constant that regulates the amplitude of the spot displacement; it was kept fixed in all our experiments. To apply the Bayes principle, we must consider an observation model for the grid matching problem. In particular, having selected a grid configuration, the observation model should weight the difference between the observed image and the ideal image corresponding to a specific grid configuration [20]. In other words, the likelihood term in the Bayes principle encompasses our knowledge about the ideal image [12]. In our case, the likelihood describes the probability of observing the image given the grid. Of course, the ideal image should have a spot center at each vertex of the grid or, equivalently, the OM transform should reach its maximum value at each grid vertex. Therefore, we adopt as an observation model
(8) Notice that once the OM transform has been computed at the beginning of the analysis, the computation of (8) requires only its values at the vertex points, without recomputing it.
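To make the combination of prior and observation model concrete, the sketch below evaluates a log-posterior of the general kind described in this section: a quadratic penalty on the displacement of each node from its reference position (the Gaussian 1-clique prior) plus a data term that reads the precomputed OM at each node. The function name `log_posterior`, the weights `alpha` and `beta`, and the choice of a plain sum of OM values as the data term are assumptions made for illustration; the exact expressions of the paper are not reproduced here.

```python
def log_posterior(grid, reference, om, alpha, beta):
    """Illustrative log-posterior, up to an additive constant.

    grid, reference: lists of integer (x, y) node positions.
    om: precomputed OM image as a list of rows, indexed om[y][x].
    alpha, beta: hypothetical weights of the prior and data terms.
    """
    prior = 0.0
    data = 0.0
    for (x, y), (x0, y0) in zip(grid, reference):
        # Gaussian prior: squared distance from the reference node.
        prior -= (x - x0) ** 2 + (y - y0) ** 2
        # Data term: OM response looked up at the node, never recomputed.
        data += om[y][x]
    return alpha * prior + beta * data
```

With a large `beta` relative to `alpha`, moving a node onto a nearby strong OM response increases the score; with a large `alpha`, the node is held at its reference position.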
By combining (6) and (8), we obtain the MAP grid estimate as the grid that maximizes the log-posterior given in (9) and (10), where two parameters balance the weights of the prior and of the likelihood in the posterior. The ratio between them reflects the trade-off between the regularity of the reference grid and the local deformation imposed by maximizing the OM over the grid vertices; it is known in the literature as the regularization parameter [22]. To further refine the solution, we adopt a metaheuristic approach; in particular, we apply the simulated annealing scheme to maximize (10). In the maximization phase, we restrict the grid node positions to correspond to pixel positions, and the moves, as in [16], are allowed among the four neighbor pixels. The sampling scheme is as follows.
1) Start with the reference grid computed by the GridMatching step.
2) Select a node and move it to a new position; this defines a new grid configuration.
3) Compute the change in the log-posterior (10) produced by the move.
4) Replace the current configuration by the new one with a probability that depends on this change and on the temperature.
5) Go to step 2) until convergence.
As suggested in [12], the temperature parameter gradually decreases toward zero. Here, we adopt a logarithmic schedule as a function of the number of iterations. The image in Fig. 8 reports the final result obtained after 1000 sampling iterations with the parameter values given in the caption of Fig. 8, which are the same that we adopted in all our experiments, both synthetic and real, as reported in the next section. Convergence is declared when the solution has not changed for a given number of iterations, or when the maximum number of iterations has been reached.
IV. RESULTS
We have demonstrated the output of each step of the proposed method with a typical microarray image. Here, we want to investigate the robustness of the method and evaluate its accuracy. However, due to the difficulty of obtaining exactly labeled grid images, and in order to perform an accurate quantitative analysis, as a first experiment we investigate the capability of the proposed method over synthetic images representing microarray grids with artificially introduced deformations. Our grid-generation module has several parameters:
• minimum and maximum radius of a spot;
• the grid parameters;
• spot location noise.
Fig. 8. Output of the GridRefinement step; the two weighting parameters are set to 10 and 100.
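The sampling scheme of the GridRefinement step (steps 1 to 5 of Section III) can be sketched as a Metropolis-style simulated annealing loop. The code below is an illustration under assumptions rather than the authors' implementation: the function name `refine_grid`, the weights `alpha` and `beta`, the quadratic-prior-plus-OM score, the acceptance rule `exp(delta / T)`, and the cooling constant in `T = 1 / log(1 + k)` are all hypothetical choices consistent with the description in the text.

```python
import math
import random


def refine_grid(reference, om, alpha, beta, n_iter=1000, seed=0):
    """Simulated-annealing refinement of integer grid nodes (illustrative)."""
    rng = random.Random(seed)
    h, w = len(om), len(om[0])
    grid = list(reference)                    # step 1: start from the reference grid
    for k in range(1, n_iter + 1):
        temperature = 1.0 / math.log(1.0 + k)  # logarithmic cooling (assumed constant)
        i = rng.randrange(len(grid))           # step 2: pick a node and one of its
        x, y = grid[i]                         # four neighbor pixels
        dx, dy = rng.choice(((1, 0), (-1, 0), (0, 1), (0, -1)))
        nx, ny = x + dx, y + dy
        if not (0 <= nx < w and 0 <= ny < h):
            continue                           # moves outside the image are skipped
        x0, y0 = reference[i]
        # step 3: change in the score; with a 1-clique prior only node i matters.
        old_d2 = (x - x0) ** 2 + (y - y0) ** 2
        new_d2 = (nx - x0) ** 2 + (ny - y0) ** 2
        delta = beta * (om[ny][nx] - om[y][x]) - alpha * (new_d2 - old_d2)
        # step 4: always accept improvements, accept worsening moves with
        # probability exp(delta / temperature).
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            grid[i] = (nx, ny)
    return grid                                # step 5: stop after n_iter iterations
```

Because the prior involves only 1-cliques, evaluating a move touches a single node, which is what keeps each iteration cheap. A fixed iteration budget is used here in place of the convergence check described in the text.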
For each spot location, the grid-generation module randomly selects a radius in the given interval and then alters the center coordinates according to the adopted noise model. In our experiments, we assume a Gaussian distribution of the spot location noise and a Gaussian distribution of the deviation of the grid directions with respect to the horizontal and vertical directions of the image. Each spot center is translated with Gaussian-distributed random x and y offsets. With the help of the grid-generation module, we want to quantitatively evaluate the robustness of the proposed method with respect to grid noise and deformations. To this aim, we separately evaluate the GridMatching step and the GridRefinement step. Given a set of reference grid parameters, we run the experiments by generating synthetic noisy images. The second kind of artificially induced distortion refers to the deviation of the grid principal directions from the x and y image axes; in this case, the two grid angles are Gaussian random variables whose standard deviation is measured in degrees. Since the results of the algorithm do not depend on the specific values of the other grid parameters, we do not consider particular alterations of these parameters in the quantitative analysis. For convenience, we fix the dimension of the image to be 700 × 700, together with the remaining grid parameters. There is no reason to vary these parameters as well, since the spot location noise is expressed in pixels; once the image dimensions and grid spacing have been fixed, one has fixed the scale of the problem, and the performance of the algorithm can be evaluated as a function of the noise parameters in the adopted scale. We run our experiments with the spot location noise in the range [0, 24] and the angular noise in the range [0, 8] degrees. Therefore, the maximum value of the spot location noise is near the value of the grid spacing. Values of the spot displacement noise above the grid spacing do not make sense, as the resulting image would not appear as a grid, but just as a cluster of spots. Fig. 9(a) depicts a synthetic grid with a spot location noise of 20, whereas Fig. 9(b) has a value of 30.

Fig. 9. Typical synthetic grids used in the experimental evaluation. The output of the GridMatching step is reported in (a) and (b), whereas the output of the GridRefinement step is reported in (c) and (d). The first column refers to a noisy grid image with a noise level of 20 (below the grid spacing), whereas the second column corresponds to a noise level of 30 (above the grid spacing).

As we increase the noise above the grid spacing, we obtain a very complex image, and the algorithm is often unable to correctly identify the right grid structure. This is also due to the fact that, as the noise level gets larger than the grid spacing, the spots tend to overlap each other and the grid structure itself is lost. It must also be said that such a noise level (above the grid spacing) seems to be quite unrealistic for real images. Therefore, once the problem dimensions have been fixed, we believe that a set of experiments with a noise level up to the grid spacing can be considered an effective testbed for the performance of the algorithm. The second row of Fig. 9 reports the result of the GridRefinement step for both grids. For each combination of the two noise parameters, we generated a set of 100 images and recorded the matching grids computed by GridMatching. As a figure of merit, we adopt the root mean square error (rmse) between the true solution and the solution attained by the algorithm over all the 100 images for each value of the noise parameters. We report in Fig. 10 the rmse for the six parameters resulting from the GridMatching step. We observe that the parameter which is most affected by the noise is the origin of the grid. This is due
to the fact that it is computed by intersecting two lines, one for each principal orientation, each passing through the spot centers with minimum coordinates in the respective direction. Such spot centers are the local maxima of the OM which are above a fixed high threshold; in all the reported experiments (simulated and real), we adopted a threshold of 0.8. This threshold serves to locate well-shaped spots whose coordinates are then projected back along the two principal directions. This parameter is important, as the detection of a false spot due to noise would introduce an error in the computation of the origin of the grid. Therefore, it cannot be chosen too low. Conversely, too high a threshold could prevent the detection of true spots, especially for noisy images. After these considerations, we fixed this threshold to 0.8 (80% confidence of being a spot center), as it seems a good compromise between the selectivity and the sensitivity of the method. Of course, for noise-free images, this threshold can be raised; however, since the reported experiments refer to real images, which often present noisy and not well-shaped spots, the adopted value seems to work well and we adopted it in all the experiments.
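The synthetic evaluation described in this section can be mimicked with a small generator and an error measure. The sketch below is a minimal illustration under assumptions: the names `synthetic_spots` and `rmse`, the zero-mean Gaussian offsets, the uniform choice of the radius, a single spacing for both directions, and the Euclidean form of the rmse are choices made here, not details taken from the paper.

```python
import math
import random


def synthetic_spots(x0, y0, spacing, n_cols, n_rows,
                    sigma, sigma_theta, r_min, r_max, seed=0):
    """Noisy spot centers and radii for a nominally axis-aligned grid.

    sigma: standard deviation (pixels) of the spot location noise.
    sigma_theta: standard deviation (degrees) of the grid direction noise.
    """
    rng = random.Random(seed)
    # Grid directions deviate from the image axes by Gaussian angles.
    ax = math.radians(rng.gauss(0.0, sigma_theta))
    ay = math.radians(90.0 + rng.gauss(0.0, sigma_theta))
    spots = []
    for j in range(n_rows):
        for i in range(n_cols):
            cx = x0 + i * spacing * math.cos(ax) + j * spacing * math.cos(ay)
            cy = y0 + i * spacing * math.sin(ax) + j * spacing * math.sin(ay)
            center = (cx + rng.gauss(0.0, sigma), cy + rng.gauss(0.0, sigma))
            spots.append((center, rng.uniform(r_min, r_max)))
    return spots


def rmse(estimated, true):
    """Root mean square Euclidean distance between paired positions."""
    se = [(ex - tx) ** 2 + (ey - ty) ** 2
          for (ex, ey), (tx, ty) in zip(estimated, true)]
    return math.sqrt(sum(se) / len(se))
```

Sweeping `sigma` and `sigma_theta` over a range of values, generating a batch of grids for each pair, and averaging `rmse` over the batch reproduces the structure of the experiment, though not necessarily its numbers.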
Fig. 10. Experimental rmse as a function of the two noise parameters (spot location noise and angular noise), for the various parameters of the GridMatching step. The error is computed on a set of 100 random grids generated for each pair of noise values. The interval for the spot radius is [4, 8] pixels. The first row reports the error on the grid origin, the second row reports the error on the two grid angles, and the last row reports the error on the grid spacings along x and y.
As Fig. 10 reports, the grid orientations and the grid spacing parameters are always computed very accurately. The grid spacing appears to be computed accurately, given that we constrained its values to be integers. From these results, we see that the GridMatching step is very effective in detecting the grid parameters even in the presence of random deviations of the same order as the spot radius, since we used an interval between 4 and 8 pixels. The accuracy of this step affects the overall performance and makes the grid refinement step particularly simple, as most of the spot centers are already near the grid nodes. As we can see, the results are quite independent of the angular noise; this is due to the fact that the properties of the RT actually allow one to recover the right orientations of the grid with respect to the image axes. As a second quantitative experiment, we evaluate the performance of the GridRefinement step. Since it reports the estimated position of each spot, we can measure the mean square error with respect to the true spot position of all the spot locations
where the average is taken over all the 100 experiments, the two positions compared for each spot are, respectively, the true spot position and the computed spot position, and the error is of course expressed in pixel units. Also in this case, we apply the same noise model of the previous experiment and, as we can see from Fig. 11, the global error always lies below the maximum spot radius.

Fig. 11. Rmse computed over all the grid points as a function of the noise level and averaged on 100 images.

A further useful experiment consists in the comparison of the accuracy of the proposed method with the accuracy of a well-known approach in the literature. Here, we want to compare the mean square error between the true solution and the estimated solution as a function of the noise variance. We compare the results obtained with the proposed algorithm with those obtained by using the Spot algorithm [14]. Spot is well known as a completely unsupervised gridding method. In order to perform the comparison, it is important to remark that Spot is not able to deal with rotations of the grid; therefore, the only noise which we could add to the image consists, as above, of small deviations from the ideal spot location on the grid. We report in Fig. 12 the MSE of the two methods.

Fig. 12. MSE obtained with the proposed method and the Spot algorithm [14] as a function of the noise variance on the spot location.

We see that the Bayesian approach can efficiently recover the grid geometry even in the presence of noise. In addition, the Bayesian approach can efficiently deal with severe grid rotations, as demonstrated in the previous experiments. This makes our method particularly robust, also considering that the only significant parameter required by the algorithm is the interval of the spot radius. Having demonstrated the accuracy of our approach over synthetic images, it is useful to consider its behavior over real microarray images. Real microarray images present a set of spots organized in a two-level grid, as in Fig. 1. In our experiments, we perform a first unsupervised segmentation of the
outer grid and then independently apply the described algorithm to each inner grid. For the segmentation of the outer grid, we use the algorithm reported in [1], where the authors use morphological filtering to connect spots in the same grid and to separate spots in different grids; the only variant of the algorithm is that we apply morphological filtering to the OM image rather than to the original image.
To validate the method over real images, we refer to the collection of microarray images available in the Stanford Microarray Database (SMD) [13], a research tool used by hundreds of researchers. SMD functions as a resource for the entire biological research community by providing unrestricted access to microarray data published by SMD users and by disseminating its source code. SMD can store, retrieve, display, and analyze the complete raw data produced by several microarray platforms and image analysis software packages. It contains hundreds of images annotated with the bounding box of each spot. These images have been analyzed by researchers with one of the interactive image processing platforms compatible with SMD. Therefore, starting from the bounding box of each spot, we can easily compute the spot centers and compare them with the gridding obtained by our algorithm. It must be said that these annotations are the result of the interaction of researchers with image processing software and, therefore, may contain some small errors and variations. As a comparison, we report in Table I a measure of the accuracy obtained by our method compared with the results obtained by using the Spot package [14]. The table reports the total error and the rmse for the x and y coordinates of the spot locations. These images are particularly challenging, as they present several types of distortions and contain a high number of spots; the proposed Bayesian grid matching method seems to perform very accurately even in the presence of noise. The last row of the table reports the results obtained with a less noisy image, where the Spot package performs relatively better. In any case, the accuracy of our method is compatible with the application, as it stays under two pixels. The spot radius for the reported images varies between five and seven pixels.
The ultimate goal of gridding is to help in the segmentation phase, where the image intensities must be measured in order to be converted into the expression level of each gene. Having accurately detected the spot locations with the automatic gridding method, we perform a simple local background estimation based on clustering. The basic method is similar to the approach proposed in [6]: the pixels near the spot location generated by the gridding algorithm serve to initialize the cluster centroid for the foreground, and the pixels far from the spot center serve to initialize the background cluster. Clustering-based segmentation has also been proposed in [24]. The segmentation step makes no assumption on the shape of the spots. From the image of Fig. 2, it is also evident that every segmentation approach should necessarily perform a local estimation of the background, and this is possible only if an accurate gridding method is adopted. Fig. 13 shows the resulting segmented image with the corresponding grid. We also report in Fig. 14 the segmentation result obtained with the Spot algorithm, where the error due to the misalignment of the grid axis
TABLE I ACCURACY OF GRIDDING OBTAINED BY OUR METHOD OVER SOME REAL MICROARRAY IMAGES TAKEN FROM THE STANFORD MICROARRAY DATABASE. THE TABLE REPORTS THE RMSE IN THE x AND y DIRECTIONS AND THE TOTAL ERROR. THE LAST THREE COLUMNS REFER TO THE RESULTS OBTAINED WITH THE SPOT [14] ALGORITHM OVER THE SAME IMAGES
Fig. 13. Segmented image.
Fig. 14. Image of Fig. 1 segmented by the Spot algorithm.
is quite evident and the algorithm erroneously detects a whole column at the beginning of the grid.
Last, we report the computing time of the various steps of the algorithm over the image of Fig. 2, of dimensions 503 × 503. Table II shows the elapsed time in seconds on a 2.4-GHz Pentium running Linux. The most computationally demanding phase is the RT in the GridMatching step, as it takes about 65% of the whole time. This is the time to process each inner grid. In order to process the whole image, the times must be multiplied by the number of subgrids; in the case of the image in Fig. 1, of dimensions 1872 × 1900, we have 16 subgrids. In addition, in order to process the whole image and to extract the subgrids by the algorithm of [1], there are some other computations consisting of two main morphological closing operations, which for this image take approximately 3.5 s on the adopted computing platform. What the times show is that our procedure, despite being completely unsupervised, can be executed efficiently on standard platforms; other approaches, such as those based on guide spots and heuristic searches for the best interpolating grid (e.g., [2] and [21]), can suffer seriously from slow computing times and from dependence on the initial search parameters.
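The scaling argument above can be written as a rough cost model. This is only a sketch under stated assumptions: the symbols $T_{\text{grid}}$ (per-subgrid time, i.e., the total of Table II), $T_{\text{RT}}$, $T_{\text{morph}}$, and $N_{\text{sub}}$ are introduced here for illustration and do not appear in the original text, and the subgrid of Fig. 2 is assumed to be representative of every subgrid.

```latex
\begin{align}
T_{\text{total}} &\approx N_{\text{sub}}\, T_{\text{grid}} + T_{\text{morph}}, \\
T_{\text{RT}}    &\approx 0.65\, T_{\text{grid}} .
\end{align}
```

For the image of Fig. 1, $N_{\text{sub}} = 16$ and $T_{\text{morph}} \approx 3.5$ s on the stated platform, so under this model the per-subgrid runs dominate the total whenever $T_{\text{grid}}$ exceeds roughly $3.5/16 \approx 0.22$ s.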
TABLE II COMPUTATION TIME FOR THE IMAGE OF FIG. 2 OVER A PENTIUM PC
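The accuracy measure reported in Table I (spot centers derived from annotated bounding boxes, compared with the estimated grid) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the `(x_min, y_min, x_max, y_max)` box layout, the index-based matching of annotated and estimated spots, and the definition of the total error as the root of the mean squared Euclidean distance are all assumptions made here.

```python
import math


def bbox_center(box):
    """Return the center (x, y) of a box given as (x_min, y_min, x_max, y_max).

    The box layout is an assumption; the annotation format is not specified
    in the text.
    """
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def gridding_rmse(annotated_boxes, estimated_centers):
    """Return (rmse_x, rmse_y, rmse_total) in pixels.

    Spot i of the annotation is assumed to correspond to spot i of the
    estimated grid. The total error is taken as the root of the mean squared
    Euclidean distance, which is one plausible reading of "total error".
    """
    if not annotated_boxes or len(annotated_boxes) != len(estimated_centers):
        raise ValueError("need two non-empty sequences of equal length")

    sum_sq_x = 0.0
    sum_sq_y = 0.0
    for box, (est_x, est_y) in zip(annotated_boxes, estimated_centers):
        true_x, true_y = bbox_center(box)
        sum_sq_x += (est_x - true_x) ** 2
        sum_sq_y += (est_y - true_y) ** 2

    n = len(annotated_boxes)
    rmse_x = math.sqrt(sum_sq_x / n)
    rmse_y = math.sqrt(sum_sq_y / n)
    rmse_total = math.sqrt((sum_sq_x + sum_sq_y) / n)
    return rmse_x, rmse_y, rmse_total


if __name__ == "__main__":
    # Two hypothetical annotated spots and the corresponding estimated centers.
    boxes = [(0, 0, 10, 10), (20, 0, 30, 10)]
    estimates = [(6.0, 5.0), (25.0, 7.0)]
    print(gridding_rmse(boxes, estimates))
```

With this definition the total error satisfies rmse_total² = rmse_x² + rmse_y², so it can be compared directly with the spot radius (five to seven pixels for the reported images).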
V. CONCLUSION
The paper reported a microarray gridding algorithm grounded on the Bayesian approach and on image analysis. The first step exploits the properties of the RT, whereas the second step computes the maximum a posteriori estimate of the grid given the observed data. The method can efficiently deal with various kinds of perturbation such as image rotations, spot irregularities, and deviations. The experimental results over synthetic and real images suggest that it can outperform other methods for noisy images. The contributions of the paper can be summarized as follows:
— adoption of a simpler prior, with respect to [16], for the regularization constraint in Bayesian gridding of microarray images;
— proposal of a procedure for the computation of the reference grid parameters exploiting the properties of the RT;
— improvement of previous approaches based on projections [14] and on guide spots [2], [21] by using the advantages of both approaches within a Bayesian framework.
The simplification of the model proposed by Hartelius and Carstensen [16], however, can have some drawbacks. Indeed, the proposed method does not deal with gross distortions, such as major row gaps and major warps, as well as the model in [16] does, and is thus more dependent on the correctness and consistency of the initially estimated parameters.
ACKNOWLEDGMENT
The authors would like to thank the anonymous reviewers who helped them improve the manuscript and pointed out some references and useful data.
REFERENCES
[1] J. Angulo and J. Serra, “Automatic analysis of DNA microarray images using mathematical morphology,” Bioinformatics, vol. 19, no. 5, pp. 553–562, 2003.
[2] G. Antoniol and M. Ceccarelli, “A Markov random field approach to microarray image gridding,” in Proc. IEEE Int. Conf. Pattern Recognition, 2004, vol. 3, pp. 550–553.
[3] M. Arbeitman, E. Furlong, F. Imam, E. Johnson, B. Null, B. Baker, M. Krasnow, M. Scott, R. Davis, and K. White, “Gene expression during the life cycle of Drosophila melanogaster,” Science, vol. 297, pp. 2270–2275, 2002.
[4] P. Bajcsy, “Gridline: Automatic grid alignment in DNA microarray scans,” IEEE Trans. Image Process., vol. 13, no. 1, pp. 15–25, Jan. 2004.
[5] V. Barnett and T. Lewis, Outliers in Statistical Data. New York: Wiley, 1994.
[6] D. Bozinov and J. Rahnenführer, “Unsupervised technique for robust target separation and analysis of DNA microarray spots through adaptive pixel clustering,” Bioinformatics, vol. 18, no. 5, pp. 747–756, 2002.
[7] N. Brandle, H. Bischof, and H. Lapp, “Robust DNA microarray image analysis,” Mach. Vis. Appl., vol. 15, pp. 11–28, 2003.
[8] M. Bredel, C. Bredel, D. Juric, G. R. Harsh, H. Vogel, L. D. Recht, and B. I. Sikic, “High-resolution genome-wide mapping of genetic alterations in human glial brain tumors,” Cancer Res., vol. 65, no. 10, pp. 4088–4096, 2005.
[9] J. M. Carstensen, “An active lattice model in a Bayesian framework,” Comput. Vis. Image Understand., vol. 63, no. 2, pp. 380–387, 1996.
[10] M. Ceccarelli and A. Petrosino, “The orientation matching approach to circular object detection,” in Proc. IEEE Int. Conf. Image Processing, 2001, pp. 712–715.
[11] M. B. Eisen and P. O. Brown, “DNA arrays for analysis of gene expression,” Meth. Enzymol., vol. 303, pp. 179–205, 1999.
[12] S. Geman and D. Geman, “Stochastic relaxation, Gibbs distribution and the Bayesian restoration of images,” IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-6, pp. 721–742, 1984.
[13] J. Gollub, “The Stanford microarray database: Data access and quality assessment tools,” Nucl. Acids Res., vol. 31, pp. 94–96, 2003.
[14] A. N. Jain, T. Tokuyasu, A. Snijders, R. Segraves, D. Albertson, and D. Pinkel, “Fully automatic quantification of microarray image data,” Genome Res., vol. 12, pp. 325–332, 2003.
[15] E. T. Jaynes, “Bayesian methods – An introductory tutorial,” in Maximum Entropy and Bayesian Methods in Applied Statistics, J. H. Justice, Ed. Cambridge, U.K.: Cambridge Univ. Press, 1986.
[16] K. Hartelius and J. M. Carstensen, “Bayesian grid matching,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 2, pp. 162–173, Feb. 2003.
[17] M. Katzer, F. Kummert, and G. Sagerer, “A Markov random field model of microarray gridding,” presented at the ACM Symp. Applied Computing, 2003.
[18] R. Lukac and K. Plataniotis, “Vector median root signals determination for cDNA microarray image segmentation,” Lecture Notes Comput. Sci., vol. 3656, pp. 982–989, 2005, A. Campilho and M. Kamel, Eds.
[19] N. D. Lawrence, M. Milo, M. Niranjan, P. Rashbass, and P. Soullier, “Reducing the variability in cDNA microarray image processing by Bayesian inference,” Bioinformatics, vol. 20, no. 4, pp. 518–526, 2004.
[20] S. Z. Li, Markov Random Field Modeling in Image Analysis, 2nd ed. New York: Springer-Verlag, 2001.
[21] A. W. Liew, H. Yan, and M. Yang, “Robust adaptive spot segmentation of DNA microarray images,” Pattern Recognit., vol. 36, pp. 1251–1254, 2003.
[22] T. Poggio, V. Torre, and C. Koch, “Computational vision and regularization theory,” Nature, vol. 317, pp. 314–318, 1986.
[23] A. G. Ramm and A. I. Katsevich, The Radon Transform and Local Tomography. Boca Raton, FL: CRC, 1996.
[24] L. Rueda and L. Qin, “An improved clustering-based approach for DNA microarray image segmentation,” Lecture Notes Comput. Sci., vol. 3212, pp. 17–24, 2004, A. Campilho and M. Kamel, Eds.
[25] L. Rueda and L. Qin, “A new method for DNA microarray image segmentation,” Lecture Notes Comput. Sci., pp. 886–893, 2005, A. Campilho and M. Kamel, Eds.
[26] L. Rueda and V. Vidyadharan, “A new approach to automatically detecting grids in DNA microarray images,” Lecture Notes Comput. Sci., vol. 3656, pp. 982–989, 2005, A. Campilho and M. Kamel, Eds.
[27] S. Subramanian, R. B. West, R. J. Marinelli, T. O. Nielsen, B. P. Rubin, J. R. Goldblum, R. M. Patel, S. Zhu, K. Montgomery, T. L. Ng, C. L. Corless, M. C. Heinrich, and M. van de Rijn, “The gene expression profile of extraskeletal myxoid chondrosarcoma,” J. Pathol., vol. 206, pp. 443–444, 2005.
[28] Y. H. Yang, M. M. Buckley, S. Dudoit, and T. Speed, “Comparison of methods for image analysis on cDNA microarray data,” J. Comput. Graph. Statist., vol. 11, pp. 108–136, 2002.
Michele Ceccarelli (M’94) received the Laurea degree in computer science from the University of Salerno, Salerno, Italy, in 1989. He was with the Italian National Research Council. Since 1997, he has been with the University of Sannio, Via Traiano, Italy, where he is an Associate Professor of Computer Science. He leads, with G. Antoniol, the Scientific Computing and Data Analysis (SCODA) group at the Research Center on Software Technologies. His main scientific interests are in various fields of pattern recognition, computer vision, and image processing such as Bayesian image analysis, nonlinear image restoration, and texture analysis. He has published about 60 papers in journals and international conferences in this field. He has been the leader of several national projects in his research field. Mr. Ceccarelli is a member of IAPR and SIREN. Details of his work can be found at http://www.scoda.unisannio.it.
Giuliano Antoniol (M’93) received the degree in electronics engineering from the Universita’ di Padova, Padova, Italy, in 1982, and the Ph.D. degree in electrical engineering from the Ecole Polytechnique de Montreal, Montreal, QC, Canada, in 2004. He has worked at companies, research institutions, and universities. He has published more than 90 papers in journals and international conferences. Dr. Antoniol has served as a Program Committee Member of international conferences and workshops, such as the International Conference on Software Maintenance, the International Workshop on Program Comprehension, and the International Symposium on Software Metrics. He is presently a member of the Editorial Boards of the Journal Software Testing Verification & Reliability, Journal Information and Software Technology, Journal of Empirical Software Engineering, and Journal of Software Quality. In 2005, he was awarded the Canada Research Chair Tier I in Software Change and Evolution.