IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 13, NO. 10, OCTOBER 2004


A Downstream Algorithm Based on Extended Gradient Vector Flow Field for Object Segmentation Cheng-Hung Chuang and Wen-Nung Lie

Abstract—For object segmentation, traditional snake algorithms often require human interaction; region-growing methods depend considerably on the selected homogeneity criterion and initial seeds; watershed algorithms suffer from over-segmentation. This paper presents a new downstream algorithm, based on a proposed extended gradient vector flow (E-GVF) field model, for multiobject segmentation. The proposed flow field, on one hand, diffuses and propagates gradients near object boundaries to provide an effective guiding force and, on the other hand, offers a higher directional resolution than the traditional GVF field. The downstream process starts with a set of seeds scored and selected by considering local gradient-direction information around each pixel. This step is automatic and requires no human interaction, making our algorithm more suitable for practical applications. Experiments show that our algorithm is noise resistant and has the advantage of segmenting objects from the background while ignoring their internal structures. We have tested the proposed algorithm on several real images (e.g., medical images and images with complex backgrounds) and obtained good results.

Index Terms—Gradient vector flow (GVF), image segmentation, object detection, region growing, snakes, watershed.

I. INTRODUCTION

IMAGE segmentation plays an important role in applications where image analysis or understanding is necessary. Numerous techniques have been proposed in recent years to address this difficult problem. These techniques can, in general, be categorized into two types: edge based and region based. Edge-based segmentation emphasizes detecting significant gray-level changes near region boundaries, while region-based segmentation emphasizes partitioning pixels into regions of uniform properties such as grayscale, texture, etc. High-level image segmentation means distinguishing objects from the background. In the literature, active contour models (also called snakes) [1]–[7] are well known to provide good detection of object boundaries. These models utilize a closed contour that approaches the object boundary by iteratively minimizing an energy function. The energy depends on the shape of the contour (internal energy) and the image gradient where the contour resides (external energy), and its minimization results in a desired object boundary. One drawback of traditional snake algorithms is that the construction of the initial contour often requires

Manuscript received December 10, 2001; revised December 29, 2003. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Aleksandra Mojsilovic. C.-H. Chuang is with the Institute of Statistical Science, Academia Sinica, Taipei City 115, Taiwan, R.O.C. (e-mail: [email protected]). W.-N. Lie is with the Department of Electrical Engineering, National Chung Cheng University, Chia-Yi 621, Taiwan, R.O.C. (e-mail: [email protected]). Digital Object Identifier 10.1109/TIP.2004.834663

human interaction. It is laborious for users to draw an initial contour near the object boundary. Yuen et al. [4] proposed an automatically initialized snake algorithm for multiobject segmentation. Nevertheless, their algorithm is limited to some specifically distributed objects, e.g., those spreading around the center of gravity. Moreover, in traditional snakes, the segmentation results are highly sensitive to the initial contour. In other words, if the initial contour is far from the true object boundary, the snake will be strongly affected by local noise and the result will be inaccurate. Furthermore, it is difficult for the snake to approach concave parts of object boundaries. An iterative split-and-merge procedure [5] and a gradient vector flow (GVF) field model [6], [7] were proposed to solve this problem. Although these snakes are less sensitive to the initial contour, they are still in need of manual drawing. On the other hand, region-growing methods [8]–[11] often start with selected seeds and group pixels into regions based on a homogeneity property. Unfortunately, pure region-growing methods normally segment objects inaccurately (over-growing or under-growing). To increase the accuracy, a morphological grayscale algorithm [12] was proposed to reconstruct edge information while smoothing to remove noise. The performance of segmentation depends fully on the traversing order of image pixels [8] as well as the number and locations of selected seeds. Some additional merging criteria [9], [10], e.g., the contrast between adjacent regions, can be used to assist in distinguishing objects from backgrounds. In addition, one approach to improving accuracy is to fuse the intermediate grown results with information provided by edge detection [11]. Another important issue is how to choose the seeds without human interaction.
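The seeded region growing referred to above can be illustrated with a minimal sketch. The homogeneity criterion used here (absolute intensity difference to the seed value, with 4-connectivity) is a hypothetical choice for illustration, not the criterion of any cited method:

```python
from collections import deque

def region_grow(img, seed, tol):
    """Grow a region from `seed`, absorbing 4-neighbors whose intensity
    differs from the seed intensity by at most `tol`."""
    h, w = len(img), len(img[0])
    ref = img[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and (ny, nx) not in region \
                    and abs(img[ny][nx] - ref) <= tol:
                region.add((ny, nx))
                queue.append((ny, nx))
    return region

# A 4x4 image with a bright 2x2 object in the top-left corner.
img = [[9, 9, 0, 0],
       [9, 9, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]]
print(sorted(region_grow(img, (0, 0), tol=1)))
# -> [(0, 0), (0, 1), (1, 0), (1, 1)]
```

Note how the result depends entirely on the seed and the tolerance, which is exactly the sensitivity the text describes.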
To automate the process, seeds can be extracted automatically by evaluating certain quantities, e.g., the centroids of patterns enclosed by edges [11]. The watershed algorithm [13]–[15] is also a powerful region-based method for segmenting images without human interaction. Watersheds and catchment basins in topography correspond to the edges and interiors of regions in gradient images, respectively. The flooding method [13] is fast and accurate for performing watershedding. It assumes that there is a hole at the local minimum of each catchment basin. The holes are then immersed in water, and dams are built where floods of water from different catchment basins would meet. After the immersion is completed, the remaining dams represent the watersheds of the image. The watershed algorithm can find contiguous edges in an image accurately but suffers from the over-segmentation problem. Hence, a robust merging algorithm [14] is required to combine small regions; otherwise, other techniques, e.g., multiscale image analysis [15], should be integrated to prevent over-segmentation.

In this paper, we propose a new downstream algorithm based on an extended gradient vector flow (E-GVF) field model for object segmentation. The downstream behavior stands in clear contrast to the flooding process adopted in the watershed algorithm. Our algorithm starts with an initial set of seeds, which are automatically chosen by considering properties of the local force vectors around each pixel. The computed E-GVF field is then used to guide the downstream operation for object region growing. Since the E-GVF field is capable of preserving gradients near object boundaries, on one hand, and diffusing them elsewhere (e.g., to areas of uniform intensity), on the other hand, fewer over- or under-growing (i.e., under- or over-segmentation) problems exist in our results. Our method features automatic operation and multiple-object processing. Experiments show the robust performance of our algorithm for object segmentation.

This paper is organized as follows. The E-GVF field model is presented in Section II. In Section III, the proposed downstream algorithm based on the E-GVF field is described. In Section IV, simulations and performance analysis are demonstrated. Finally, Section V draws some conclusions.

II. EXTENDED GVF FIELD MODEL

A. Traditional GVF Field

It is well known that snake (active contour) algorithms, which are actually edge-based segmentation techniques, are effective methods for detecting object boundaries if an initial guess of the contour can be given. The traditional snake model is a conjunction of internal and external energy, by which a contour is animated and deformed toward object boundaries. Principally, the minimized internal energy maintains smoothness and compactness of the contour shape, but causes degeneration to a single point in the extreme case.
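The internal energy mentioned above can be written down concretely. The following is a sketch of the standard discrete form (an elasticity term plus a bending term, with assumed weights alpha and beta); it illustrates why minimizing the internal energy alone favors shrinkage of the contour:

```python
def internal_energy(contour, alpha=1.0, beta=1.0):
    """Discrete internal energy of a closed contour (list of (x, y) points):
    alpha * squared point spacing (elasticity) plus
    beta * squared second difference (bending)."""
    n = len(contour)
    e = 0.0
    for i in range(n):
        xp, yp = contour[(i - 1) % n]
        x, y = contour[i]
        xn, yn = contour[(i + 1) % n]
        e += alpha * ((x - xp) ** 2 + (y - yp) ** 2)
        e += beta * ((xn - 2 * x + xp) ** 2 + (yn - 2 * y + yp) ** 2)
    return e

# Shrinking a square contour lowers both terms, illustrating the
# degeneration-to-a-point tendency mentioned above.
small = [(0, 0), (1, 0), (1, 1), (0, 1)]
large = [(0, 0), (2, 0), (2, 2), (0, 2)]
print(internal_energy(small) < internal_energy(large))  # True
```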
The minimized external energy, however, adjusts the contour so that it is consistent with the image gradient at its location. The two energies can be considered as two forces that guide the snake's movement. The negative of the image gradient magnitude is often adopted as the external force of a snake model. Obviously, object boundaries that have larger gradient magnitudes lead to a smaller external energy, so minimizing the external energy moves snake contours toward the object boundaries. Unfortunately, this kind of external force is restricted to areas close to the object boundaries. When snake contours are located far from object boundaries, snakes will fail to approach them effectively. For the same reason, snakes based on traditional image gradient fields are also deficient in converging to the concave parts of an object contour. In [6], Xu et al. defined a GVF field as the external force of a snake model to solve the problem mentioned above, i.e., the sensitivity to the initial contour. The GVF field is defined to be a force field of vectors. Each vector $\mathbf{g}(x, y) = (u(x, y), v(x, y))$, for any image pixel $(x, y)$, is computed by minimizing the energy functional

$$E = \iint \mu \left( u_x^2 + u_y^2 + v_x^2 + v_y^2 \right) + |\nabla f|^2 \, |\mathbf{g} - \nabla f|^2 \, dx \, dy \qquad (1)$$
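A minimal sketch of solving (1) iteratively, using the explicit update from Xu and Prince's formulation (Laplacian diffusion balanced against a data term weighted by the squared gradient magnitude); the tiny grid, step size, and iteration count here are assumptions for illustration:

```python
def gvf(fx, fy, mu=0.2, iters=200):
    """Iterative GVF solution (explicit Euler on the Euler-Lagrange
    equations of (1)): u <- u + mu*Lap(u) - (fx^2+fy^2)*(u - fx),
    and the same for v."""
    h, w = len(fx), len(fx[0])
    u = [row[:] for row in fx]
    v = [row[:] for row in fy]
    mag = [[fx[y][x] ** 2 + fy[y][x] ** 2 for x in range(w)] for y in range(h)]

    def lap(a, y, x):
        # 5-point Laplacian with replicated borders
        up = a[max(y - 1, 0)][x]; dn = a[min(y + 1, h - 1)][x]
        lf = a[y][max(x - 1, 0)]; rt = a[y][min(x + 1, w - 1)]
        return up + dn + lf + rt - 4 * a[y][x]

    for _ in range(iters):
        nu = [[u[y][x] + mu * lap(u, y, x) - mag[y][x] * (u[y][x] - fx[y][x])
               for x in range(w)] for y in range(h)]
        nv = [[v[y][x] + mu * lap(v, y, x) - mag[y][x] * (v[y][x] - fy[y][x])
               for x in range(w)] for y in range(h)]
        u, v = nu, nv
    return u, v

# A single vertical edge response at column 2: the x-gradient diffuses
# into the flat columns, so pixels far from the edge acquire a force
# pointing toward it.
fx = [[0.0, 0.0, 1.0, 0.0, 0.0] for _ in range(5)]
fy = [[0.0] * 5 for _ in range(5)]
u, v = gvf(fx, fy)
print(u[2][0] > 0.0)  # True: the leftmost pixel now points rightward (+x)
```

This diffusion of the edge response into flat areas is exactly the long-range guiding force that the original gradient field lacks.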

Fig. 1. Definition of a four-component vector in the proposed E-GVF field model.

where $\nabla f$ is the gradient of the edge image $f$ derived from the original intensity image, $\mu$ is a regularization parameter, and the subscripts represent partial derivatives with respect to the $x$ and $y$ axes. Since the traditional GVF field is stable only under the Courant–Friedrichs–Lewy step-size condition [6], the regularization parameter should be restricted accordingly. After the minimization process, $\mathbf{g}$ will approximate $\nabla f$ where $|\nabla f|$ is large and be smooth elsewhere, i.e., the gradients are diffused from object boundaries to homogeneous regions. Each GVF vector will point toward object boundaries even if it is far from them. The GVF field is then adopted as the external energy term, giving stronger resistance to noise than the traditional image gradient alone. Consequently, snake processing based on the GVF field can approach object boundaries even if the initial contour is located far from them or certain concave parts of the object boundaries are present. However, this type of snake algorithm still requires human interaction (i.e., drawing initial contours). In this paper, we extend the GVF field for use in a downstream process, which not only makes the best use of GVF's diffused gradient information, but also resolves the drawbacks noted above.

B. Extended Gradient Vector Flow Field

In our method, a force field of four-component vectors is defined, where $u$, $v$, $p$, and $q$ represent the flow amplitudes in the $\hat{x}$, $\hat{y}$, $\hat{x}'$, and $\hat{y}'$ directions, respectively (as shown in Fig. 1). Among them, $(\hat{x}, \hat{y})$ and $(\hat{x}', \hat{y}')$ constitute two orthogonal coordinate systems with a rotation angle of 45°. That is,

$$\hat{x}' = \frac{1}{\sqrt{2}}(\hat{x} + \hat{y}), \qquad \hat{y}' = \frac{1}{\sqrt{2}}(-\hat{x} + \hat{y}) \qquad (2)$$

As an extension of the original GVF field model [6], this force vector field can be described as

$$\mathbf{F}(x, y) = u\,\hat{x} + v\,\hat{y} + p\,\hat{x}' + q\,\hat{y}'$$

which minimizes the energy functional

$$E = \iint \left[ \mu \left( u_x^2 + u_y^2 + v_x^2 + v_y^2 \right) + |\nabla f|^2 \, |\mathbf{g} - \nabla f|^2 \right] + \left[ \mu \left( p_{x'}^2 + p_{y'}^2 + q_{x'}^2 + q_{y'}^2 \right) + |\nabla' f|^2 \, |\mathbf{g}' - \nabla' f|^2 \right] dx \, dy \qquad (3)$$

where $\mathbf{g} = (u, v)$ and $\mathbf{g}' = (p, q)$,
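The rotated-frame derivatives appearing in (3) can be approximated by central differences along the image diagonals; a small sketch (this particular discretization is an assumption for illustration):

```python
import math

SQRT2 = math.sqrt(2.0)

def rotated_gradient(f, y, x):
    """Central differences along the 45-degree axes x' = (x+y)/sqrt(2),
    y' = (-x+y)/sqrt(2); diagonal neighbors are sqrt(2) apart."""
    fxp = (f[y + 1][x + 1] - f[y - 1][x - 1]) / (2 * SQRT2)
    fyp = (f[y + 1][x - 1] - f[y - 1][x + 1]) / (2 * SQRT2)
    return fxp, fyp

# A ramp f = x + y increases fastest along x', so f_x' = sqrt(2), f_y' = 0.
f = [[x + y for x in range(3)] for y in range(3)]
fxp, fyp = rotated_gradient(f, 1, 1)
print(round(fxp, 6), round(fyp, 6))  # 1.414214 0.0
```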

Fig. 3. Block diagram of proposed downstream algorithm based on the E-GVF field.

Fig. 2. Curve of energy versus iteration number in solving the E-GVF field.

where the gradient operators are defined as $\nabla = (\partial/\partial x, \partial/\partial y)$ and $\nabla' = (\partial/\partial x', \partial/\partial y')$, and the other symbols have definitions similar to those in (1). The force vector field $(u, v, p, q)$ can be solved from the Euler equations by applying the calculus of variations [16] to (3). Integrals are then computed by substituting the $(u, v, p, q)$ values of each pixel into (3) to get the total energy $E$ in each iteration. Convergence is reached when the energy value can hardly be decreased further (within a threshold). Fig. 2 gives an example of the converged energy value. Basically, the main goal of this iterative process is to diffuse the gradients all over the image to form the force field for the downstream processing that follows.

It is apparent from (3) that the minimization of the E-GVF energy $E$ is equivalent to the simultaneous minimization of its two integral terms, since they are independent and both positive definite. Hence, our E-GVF field can be considered as a joint of two GVF fields that describe the force flows in orientations 45° apart. This extended field, having a higher directional resolution, provides more abundant information for the proposed downstream process than the traditional GVF field can. A further verification of this viewpoint is presented in the section on experimental results.

Since the outward downstream paths at each pixel are limited to its eight neighbors only, the force vectors $(u, v)$ and $(p, q)$ are further quantized into eight directions (i.e., $0, \pi/4, \pi/2, \ldots, 7\pi/4$). That is,

$$(\hat{u}, \hat{v}) = (\mathrm{sgn}(u),\ 0) \quad \text{if } |v| \le |u| \tan(\pi/8) \qquad (4)$$
$$(\hat{u}, \hat{v}) = (0,\ \mathrm{sgn}(v)) \quad \text{if } |u| \le |v| \tan(\pi/8) \qquad (5)$$
$$(\hat{u}, \hat{v}) = (\mathrm{sgn}(u),\ \mathrm{sgn}(v)) \quad \text{otherwise} \qquad (6)$$

$$(\hat{p}, \hat{q}) = (\mathrm{sgn}(p),\ 0) \quad \text{if } |q| \le |p| \tan(\pi/8) \qquad (7)$$
$$(\hat{p}, \hat{q}) = (0,\ \mathrm{sgn}(q)) \quad \text{if } |p| \le |q| \tan(\pi/8) \qquad (8)$$
$$(\hat{p}, \hat{q}) = (\mathrm{sgn}(p),\ \mathrm{sgn}(q)) \quad \text{otherwise} \qquad (9)$$
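One consistent reading of the quantization in (4)–(9) can be sketched as follows; the nearest-of-eight-directions threshold tan(pi/8) is an assumption made here for illustration:

```python
import math

TAN_PI_8 = math.tan(math.pi / 8)  # ~0.4142: 22.5-degree sector boundary

def sgn(t):
    return (t > 0) - (t < 0)

def quantize(a, b):
    """Quantize a 2-D force vector (a, b) to the nearest of eight directions,
    returning components in {-1, 0, +1}; applied to (u, v) and to (p, q)."""
    if a == b == 0:
        return 0, 0
    if abs(b) <= abs(a) * TAN_PI_8:      # within 22.5 deg of the a-axis
        return sgn(a), 0
    if abs(a) <= abs(b) * TAN_PI_8:      # within 22.5 deg of the b-axis
        return 0, sgn(b)
    return sgn(a), sgn(b)                # diagonal sector

print(quantize(1.0, 0.1))   # (1, 0)
print(quantize(0.7, -0.6))  # (1, -1)
print(quantize(-0.1, 1.0))  # (0, 1)
```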

Fig. 4. Illustration of force contradiction elimination.

and $\mathrm{sgn}(\cdot)$ is the sign function, which yields $+1$ for a positive argument and $-1$ for a negative one. Consequently, each pixel has four downstream flow components, $\hat{u}\hat{x}$, $\hat{v}\hat{y}$, $\hat{p}\hat{x}'$, and $\hat{q}\hat{y}'$, and can be connected outwards to at most four of its eight neighboring pixels. Via this downstream linking, regions can be grown to approach object boundaries. A real case is shown in Fig. 5(a), which is a magnified E-GVF field for an area in Fig. 14(a).

To lower the computational load of the E-GVF field, one can first figure out the vector $(u, v)$ from (1) [6] and then project it onto the $(\hat{x}', \hat{y}')$ coordinate system to approximate $(p, q)$, i.e., compute

$$p = \frac{u + v}{\sqrt{2}}, \qquad q = \frac{-u + v}{\sqrt{2}} \qquad (10)$$

This approximation is undoubtedly not accurate enough. However, the discreteness of the pixel grid and the subsequent quantization of $(u, v)$ and $(p, q)$ [i.e., (4)–(9)] make the flow components $(\hat{u}, \hat{v}, \hat{p}, \hat{q})$ computed via (10) nearly indistinguishable from those obtained by solving (3) directly. Experiments found this approximation feasible. The main goal of computing the E-GVF field is to find, for each pixel, a set of flow directions (at most four) for downstream processing. The traditional GVF field, on the other hand, generates only two flow directions for each pixel, which are generally insufficient for downstream processing (see the experiments in Fig. 12).

III. PROPOSED DOWNSTREAM ALGORITHM

Our new downstream algorithm based on the E-GVF field differs from the flooding-based watershed algorithm but is similar to another waterfall-like approach [17] that relies on finding downstream paths. The flooding watershed algorithm starts by filling regions from the bottoms of catchment basins, whereas the waterfall process performs watershed segmentation by finding downstream paths from local maxima of gradients to other local minima. A catchment basin is then recognized as a region where multiple downstream paths all end up in the

1382

IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 13, NO. 10, OCTOBER 2004

Fig. 5. Real example of E-GVF field [a partial area in Fig. 14(a)] and elimination of force contradictions. (a) Original E-GVF field in terms of the four flow components, $\hat{u}\hat{x}$, $\hat{v}\hat{y}$, $\hat{p}\hat{x}'$, and $\hat{q}\hat{y}'$, for each pixel. (b) Result after force contradiction elimination.

same local minimum. The main differences between our method and this waterfall approach lie in the starting positions (i.e., the seeds) and the downstream guidance. The downstream behavior is guided by the E-GVF force field in our proposed method, but by gradient magnitudes in the waterfall approach. As regards the selection of seeds, pixels with the maximum number of outward gradient directions (see later) are considered in our method, while the seeds correspond to pixels of locally maximal gradient magnitude in the waterfall approach. There are two difficulties to be overcome in our method. One is force contradiction in the E-GVF field; the other concerns the overlap of grown regions near object boundaries. Solutions to these two problems are described subsequently, and the block diagram of the proposed algorithm is illustrated in Fig. 3.

A. Force Contradiction Elimination

Each pixel in the E-GVF field has at most four force flow components along the four defined axes. It is possible that the quantized force flow components $(\hat{u}, \hat{v}, \hat{p}, \hat{q})$ at two adjacent pixels are in opposite directions. Such force contradictions mostly occur at object boundaries, where force flows are directed inwards from both sides. They should be eliminated so that the downstream paths are cut, preventing the over-immersion that may otherwise occur during growing. That is, a contradicting flow component, $\hat{u}$, $\hat{v}$, $\hat{p}$, or $\hat{q}$, between two adjacent pixels is removed (set to zero). Fig. 4 illustrates this treatment of force contradiction. Fig. 5 shows a real example demonstrating the result of force contradiction elimination.

B. Seed Generation

To search for starting seeds, we score, for each pixel, the force flow components pointing toward it from its adjacent neighbors. We call this the seed generation process. Fig. 6 demonstrates six possible cases, where the arrows represent the directions (inward or outward) of the force flow components relative to the central pixel.
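The scoring illustrated in Fig. 6 can be sketched as follows; encoding each pixel's quantized flows as a set of outgoing (dy, dx) steps is an assumed representation for illustration:

```python
def sgn(t):
    return (t > 0) - (t < 0)

def seed_scores(flows, h, w):
    """Score each pixel as 8 minus the number of neighbors whose quantized
    flow points inward to it (off-image neighbors never point inward, so
    border and corner pixels tend to score high).
    `flows[y][x]` is the set of outgoing (dy, dx) steps at pixel (y, x)."""
    scores = [[8] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy, dx in flows[y][x]:
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    scores[ny][nx] -= 1  # (y, x) points inward to (ny, nx)
    return scores

# 3x3 field in which every outer pixel flows toward the center:
h = w = 3
flows = [[set() for _ in range(w)] for _ in range(h)]
for y in range(h):
    for x in range(w):
        if (y, x) != (1, 1):
            flows[y][x] = {(sgn(1 - y), sgn(1 - x))}

scores = seed_scores(flows, h, w)
print(scores[1][1], scores[0][0])  # 0 8
```

The center pixel, receiving all eight inward flows, scores 0 and would never be chosen as a seed; the corner, receiving none, scores the maximum 8.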
Basically, the scoring evaluates the number of neighboring pixels whose force flow components do not point inwards to the considered central pixel. For example, Fig. 6(a) and (b) have the highest and lowest scores, 8 and 0, respectively. Any pixel having eight neighbors has a score ranging from 0 to 8. For image boundary pixels with only five neighbors, the highest and lowest scores, as represented by Fig. 6(c) and (d), are 8 and 3, respectively. As for image corner pixels, the scoring range is from 5 to 8, e.g., Fig. 6(e) and (f). Pixels with larger scores normally have a lower probability of being immersed by other downstream paths and hence can be used as the seeds for downstream processing. By the above definitions, image boundary or corner pixels are likely to be chosen as seeds. In processing, pixels with a score of 8 are selected first and grouped by a connected-component labeling technique to form seed regions. These seed regions are then considered as the tops of waterfalls and used to initiate the subsequent downstream processing. If the grown regions cannot cover all the image pixels, pixels with a score of 7 are selected from the untracked pixels as the second set of seeds. This seed selection process continues with descending scores until all image pixels are tracked. Since the above procedure requires no manual interaction, the proposed method is fully automatic.

C. Downstream Processing

The downstream paths start at the seed regions selected in the previous step and move outwards along the directions of the force flow components at each traced pixel. The concept of downstream processing is explained first with a one-dimensional (1-D) example in Fig. 7. Fig. 7(a) shows the original signal, approximately divided into three regions with a bump in the middle. Fig. 7(b) shows the profiled gradients of an edge function computed from the original signal [Fig. 7(a)]. The triangular marks indicate the force flow directions (leftward or rightward for this 1-D case) calculated based on the gradient field. On the other hand, in Fig.
7(c), the force flows are computed based on the diffusion and propagation of the original gradient responses (i.e., the GVF field). Four chosen seeds are positioned by circles and marked A, B, C, and D. Seed A initiates a downstream path from the left side,


Fig. 6. Illustration of force flows from neighborhoods. The numbers represent the scoring for the central pixels. (a), (b) Interior pixels. (c), (d) Image boundary pixels. (e), (f) Image corner pixels.

seed D initiates another from the right side, and seeds B and C are grouped into a seed region that activates two outward paths. These downstream paths travel along the triangular-mark directions and stop at points where they meet (i.e., at force flow contradictions). Three regions are formed, as shown in Fig. 7(c), and object boundaries can then be accurately identified from the junctions between them. Notice the differences in force flow directions between Fig. 7(b) and (c). This, resulting from the diffusion and propagation of the gradient fields, prevents the introduction of small regions (i.e., over-segmentation). Otherwise, there would be 13 regions in total, as shown in Fig. 7(b), if the same downstream process were applied using the original gradient field.

As regards the case of two-dimensional (2-D) images, each pixel has at most four outward downstream paths (some of which may be eliminated due to force contradiction). A region is then constructed by downstream processing from seed regions. When two grown regions overlap each other by a considerable area, they are merged to form a larger one. Our downstream algorithm based on the E-GVF field is described in detail below.

Step 0: Set the pass index $k = 0$.
Step 1: Assume that there are $N_k$ seed regions in the $k$th pass and let them be the initialized regions $R_n$, $n = 1, \ldots, N_k$. Put the pixels of each $R_n$ into its track queue $TQ_n$. Perform Steps 2–4 for each region $R_n$, $n = 1, \ldots, N_k$.
Step 2: Reset the track_tag arrays $T_n(x, y)$, $n = 1, \ldots, N_k$, to zero values (0: untracked; 1: tracked).
Step 3: For each pixel $(x, y)$ in a given $TQ_n$, track along its $\hat{u}$, $\hat{v}$, $\hat{p}$, and $\hat{q}$ directions to get the descendant pixels, and then delete the pixel itself from $TQ_n$. The descendant pixels are

$$(x + \hat{u},\ y) \quad \text{if } \hat{u} \ne 0 \qquad (11)$$
$$(x,\ y + \hat{v}) \quad \text{if } \hat{v} \ne 0 \qquad (12)$$
$$(x + \hat{p},\ y + \hat{p}) \quad \text{if } \hat{p} \ne 0 \qquad (13)$$
$$(x - \hat{q},\ y + \hat{q}) \quad \text{if } \hat{q} \ne 0 \qquad (14)$$

Include in $R_n$ the descendants not previously belonging to it and insert them into the track queue $TQ_n$. Also, mark $(x, y)$ as a tracked pixel in the image (i.e., set $T_n(x, y) = 1$).
Step 4: Iterate Step 3 until no pixels remain in any of the track queues $TQ_n$, $n = 1, \ldots, N_k$.

Fig. 7. One-dimensional example of proposed downstream processing. (a) Original signal. (b) Profile of gradients with computed force flows. (c) Profile of gradients with optimized GVF force flows and selected seed positions (A–D).



Step 5: If there still exist untracked pixels in the image, then choose from them the pixels with the highest


Fig. 8. Two-dimensional example of downstream processing based on the E-GVF field. (a) Computed force flow field. (b) Force flow field after elimination of force contradiction. (c) Seed-selection scores. (d) Final segmentation result.

score as the seeds and go to Step 1 for the next pass ($k \leftarrow k + 1$) of downstream processing. If all image pixels are tracked (i.e., each belongs to at least one of the grown regions), then go to Step 6.
Step 6: Examine every possible pair of grown regions, denoted $R_i$ and $R_j$. If they satisfy $|R_i \cap R_j| / |R_i| \ge T$ or $|R_i \cap R_j| / |R_j| \ge T$, then merge them together, where $T$ is a preset threshold ($0 < T < 1$) and $|\cdot|$ means the size of a region. If $R_i$ and $R_j$ cannot be merged but satisfy $R_i \cap R_j \ne \emptyset$, then perform reclassification of the overlapping pixels between them (see Section III-D below). Iterate Step 6 until no two regions can be merged.

Fig. 8 shows a 2-D example of downstream processing in a 6 × 6 block image. The E-GVF force flow field is computed as in Fig. 8(a). After the elimination of force contradictions, the new force flow field is given in Fig. 8(b). Since the top two corner pixels have the highest scores [Fig. 8(c)], they are selected as seeds in the first pass. Fig. 8(d) demonstrates the result, which contains only two distinct regions (only one pass is required in this example).

D. Reclassification of Overlapping Pixels Between Two Grown Regions

The elimination of force contradictions in most cases prevents over-growing of regions by cutting the downstream paths near object boundaries. Since the force flows are generally four directional, however, they cannot be fully eliminated at boundary pixels. Once a region over-grows and immerses into another, the force flows therein will soon guide it back to the boundaries, resulting in a slightly overlapping area between two adjacent grown regions. An example of this over-growing effect is shown in Fig. 9. In Fig. 9(a), two regions grown after downstream processing are marked with the dashed and solid lines, respectively. In order to locate object boundaries accurately, a two-phase reclassification procedure is performed for the overlapping pixels.

Phase 1: Reclassification by Dominant Labels: Initially, consider all overlapping pixels as unclassified.
For each unclassified overlapping pixel, check and find the dominant region label among its eight neighbors (excluding the unclassified pixels), assign that region label to the pixel, and consider it classified afterwards. The dominant region label is defined as the label of the region that has the largest number of pixels within these eight neighbors, something like the eight-nearest-neighbor classification rule (8-NNR). Repeat this process

Fig. 9. Reclassification of overlapping boundary pixels. (a) Two regions (with the dashed and solid boundaries) obtained after downstream growing. (b) Two-phase reclassification into R1 or R2. (c) Final refined segmentation.

until no pixels can be further classified. If there still exist some overlapping pixels not classified by using the dominant region label

Fig. 10. Automatic annular boundary detection by using proposed downstream algorithm. (a) "Annulus" image (256 × 256 pixels) corrupted with Gaussian noise (SNR = 2.51 dB). (b) Edge image obtained with $w_{\mathrm{BPV}} = w_{\mathrm{MAD}} = 7$. (c) Visualization of seed scoring. (d) Selected seed regions. (e) Final segmentation result. (f) Reference boundaries.

in this phase (e.g., in the case that two labels have an equal number of pixels among the eight neighbors), then proceed to Phase 2. Otherwise, stop the reclassification procedure.

Phase 2: Reclassification by Dominant Inward Force Flows: Check and find the dominant region, i.e., the one emitting the largest number of inward force flows from the neighbors to the considered pixel. Assign that region label to the pixel and consider it classified afterwards.

Fig. 9(b) illustrates the reclassification procedure. Most of the overlapping pixels, except the circled one, can be successfully reclassified in Phase 1. The circled pixel has four neighbors classified as R1 and four classified as R2, so the classification rule of Phase 2 must be applied. Since there are 3 and 0 force flows pointing inwards to the circled pixel from regions R1 and R2, respectively, it is finally classified as a pixel in R1. The result of this example is shown in Fig. 9(c), with the region boundaries accurately detected.

IV. EXPERIMENTAL RESULTS

In the following experiments, the block-pixel variation (BPV) quantity [7] is first extracted at each image pixel as a preprocessing step to reduce the texture effect. After that, the mean absolute deviation (MAD) [18] is used to compute the edge information and construct the proposed E-GVF field by approximation [i.e., by (1) and (10)]. The window size, in pixels, used to compute the BPV and MAD values is determined by the degree of image noisiness (it should be larger for noisy images) and the speed requirement (a larger window costs more CPU time). On the other hand, the threshold in the region-merging criterion of Step 6 in Section III-C is set to 0.15. This low value accounts for the possibility of a complicated internal structure in the objects to be segmented (e.g., Figs. 15 and 16), which

Fig. 11. Evaluation of segmentation accuracy with varying SNRs: normalized mean and standard deviation of distances [19].

induces several seed regions growing according to their own local structures and overlapping one another by considerable parts.

First, we use artificial annular images, corrupted with Gaussian noise to yield a range of signal-to-noise ratios (SNRs), for simulation. Fig. 10(a) shows an image whose SNR is about 2.5 dB. The edge image is shown in Fig. 10(b). The E-GVF field is then solved after 80 iterations of computation, converging to a flow field for the downstream processing that follows. Fig. 10(c) gives a visualization of the seed scoring, where blacker pixels have more outward force flows to their neighbors (i.e., higher scores) and should be chosen as seeds. Eight-connected seed pixels are then grouped to form seed regions.
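The grouping of eight-connected seed pixels into seed regions can be sketched with a standard connected-component pass; the boolean-mask representation here is an assumption for illustration:

```python
def label_8connected(mask):
    """Group the marked seed pixels (True in `mask`) into seed regions
    by 8-connected component labeling; returns a list of pixel sets."""
    h, w = len(mask), len(mask[0])
    seen, regions = set(), []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and (y, x) not in seen:
                stack, comp = [(y, x)], set()
                seen.add((y, x))
                while stack:
                    cy, cx = stack.pop()
                    comp.add((cy, cx))
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] \
                                    and (ny, nx) not in seen:
                                seen.add((ny, nx))
                                stack.append((ny, nx))
                regions.append(comp)
    return regions

# Two diagonally touching seed pixels form one region; a distant one is separate.
mask = [[True, False, False],
        [False, True, False],
        [False, False, False],
        [False, False, True]]
print([sorted(r) for r in label_8connected(mask)])
# -> [[(0, 0), (1, 1)], [(3, 2)]]
```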

Fig. 12. Comparison of downstream algorithm based on the proposed E-GVF versus traditional GVF field models. (a), (f) Original image (128 × 128 pixels) and another version of it corrupted with open-edge noise. (b), (g) Edge images of (a) and (f) obtained with $w_{\mathrm{BPV}} = w_{\mathrm{MAD}} = 3$. (c), (h) Seed scoring. (d), (i) Segmentation results using the E-GVF field. (e), (j) Segmentation results using the traditional GVF field.
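The downstream growing of Section III-C can be sketched as a breadth-first traversal along each pixel's remaining quantized flow steps (i.e., after contradiction elimination has pruned the opposing components); the flow representation here is an assumption for illustration:

```python
from collections import deque

def downstream(flows, seeds, h, w):
    """Grow a region from `seeds` by following each tracked pixel's
    outgoing quantized flow steps (at most four per pixel)."""
    region = set(seeds)
    queue = deque(seeds)
    while queue:
        y, x = queue.popleft()
        for dy, dx in flows[y][x]:
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and (ny, nx) not in region:
                region.add((ny, nx))
                queue.append((ny, nx))
    return region

# A 1x4 row whose opposing flows between columns 1 and 2 were removed
# (Section III-A): rightward flow survives on the left, leftward on the
# right, so growth from seeds at both ends stops at the cut.
flows = [[{(0, 1)}, set(), set(), {(0, -1)}]]
left = downstream(flows, [(0, 0)], 1, 4)
right = downstream(flows, [(0, 3)], 1, 4)
print(sorted(left), sorted(right))  # [(0, 0), (0, 1)] [(0, 2), (0, 3)]
```

The eliminated contradiction acts exactly as the dam does in the watershed analogy: the two grown regions meet at the boundary without flooding past it.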

Fig. 10(d) shows the result of seed selection (identified as black pixels). After downstream processing, these seed regions grow under the guidance of the E-GVF field to form three distinct regions [Fig. 10(e)]. Although the original image is quite noisy, our algorithm is still capable of detecting the annulus reasonably. This is owing to the BPV preprocessing and the efficient diffusion and propagation of gradients via the proposed E-GVF field model. To quantitatively evaluate the segmentation quality of the proposed algorithm, an objective evaluation criterion [19] that computes the normalized mean and standard deviation of distances between the detected [Fig. 10(e)] and reference [Fig. 10(f)] boundaries is adopted. The mean and standard deviation of distances are computed separately for each circle and normalized by the diameter it should have. The measured data for the two concentric circles are then averaged to obtain the final estimate. Fig. 11 shows the curves of the normalized mean and standard deviation of distances evaluated at varying SNRs. The performance stabilizes at a mean of 1.5% and a standard deviation of 1% when the SNR is approximately 7.0 dB or higher. At worst (SNR approximately 2.5 dB), a mean distance of less than 3.5% can still be maintained by our proposed algorithm.

Subsequently, experiments were performed to demonstrate the robustness of our E-GVF field over the traditional GVF field [6] in downstream processing. Fig. 12(a) and (b) shows the intensity and edge images, respectively. The BPV is capable of reducing the intensity variation inside the lips. A visualization of the seed scoring is shown in Fig. 12(c). Fig. 12(d) and (e) shows the segmentation results using the proposed E-GVF and the traditional GVF fields, respectively. In Fig. 12(e), the lips region is obviously over-segmented.
The reason is that only two force flow components are provided for each pixel in the GVF field, which is not enough to guide the downstream paths to immerse the whole object. That is, the traditional GVF field has a lower directional resolution than the E-GVF field and supplies deficient guidance information for the proposed downstream processing. Fig. 12(f)

shows the image corrupted with drawn segments, i.e., open edges. The edge image and the visualization of seed scoring are shown in Fig. 12(g) and (h), respectively. Our downstream algorithm yields the result in Fig. 12(i), showing its capability of resisting this kind of noise. However, the algorithm based on the traditional GVF field is subject to interference from this noise and yields the segmentation result shown in Fig. 12(j). This reveals that local edges are likely to interfere with downstream processing when it is guided by only two force flow components.

The following experiment demonstrates our capability in multiobject segmentation. Fig. 13(a) shows an image, "multiple cars," containing military trucks and a cluttered background. Fig. 13(b)–(d) shows, as before, the edge image, seed scoring, and final segmentation result, respectively. The computation of the E-GVF field converges after 80 iterations, which is sufficient to diffuse the gradients. Our algorithm still gives a satisfactory result for this complex image, in spite of a failure to segment out the roof of the central truck, which is due to nearly indistinguishable target/background contrast.

For applications to medical images, we use computed tomography (CT) kidney and brain images for experiments. First, the areas of interest are cropped for further processing [Fig. 14(a): 176 × 144 pixels; Fig. 14(e): 130 × 140 pixels]. The number of iterations to compute the E-GVF field is reduced to 20 owing to the smaller image sizes. The proposed algorithm can detect the kidney and brain ventricles accurately, as shown in Fig. 14(d) and (h).

To compare with other methods, two images, "dog" [Fig. 15(a)] and "twin dogs" [Fig. 16(a)], were used for experiments. Our goal is to segment out the whole bodies of the dogs. The difficulties come from the dog's internal structure in Fig. 15(a) and the complex background in Fig. 16(a).
The BPV preprocessing is not applied here because these images are essentially texture- and noise-free. The block size for computing the MAD is enlarged to 10 to obtain the edge images shown in Figs. 15(b) and 16(b). The number of iterations for computing the E-GVF field is

CHUANG AND LIE: DOWNSTREAM ALGORITHM BASED ON EXTENDED GRADIENT VECTOR FLOW FIELD

Fig. 13. Experiments for the image "multiple cars." (a) Original image (256 × 256 pixels). (b) Edge image. (c) Seed scoring. (d) Final segmentation result.
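The edge images in these figures derive from a blockwise mean absolute deviation (MAD) of gray levels. The following sketch illustrates one plausible form of such a measure with a simple sliding window; the function name and the border handling are assumptions for illustration, not the paper's exact implementation.

```python
import numpy as np

def mad_edge_image(img, w=10):
    """Mean absolute deviation of gray levels over a w-by-w window
    around each pixel: large values mark gray-level transitions."""
    img = img.astype(float)
    h, wid = img.shape
    out = np.zeros_like(img)
    r = w // 2
    for y in range(h):
        for x in range(wid):
            # Clip the window at the image borders.
            block = img[max(0, y - r):y + r + 1, max(0, x - r):x + r + 1]
            out[y, x] = np.mean(np.abs(block - block.mean()))
    return out
```

A large window (w = 10 in the experiments) responds to the strong object/background transitions while averaging away weak internal detail, which is consistent with the paper's goal of object rather than region segmentation.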

Fig. 14. Experiments for CT images. (a)–(d) Kidney and (e)–(h) brain ventricle. (a), (e) Cropped images. (b), (f) Edge images. (c), (g) Seed scorings. (d), (h) Final segmentation results.

set to 80. The seed scorings and final segmentation results are shown in Figs. 15(c) and (d) and 16(c) and (d), respectively. To explore the effect of gradient diffusion, Figs. 15(e) and (f) and 16(e) and (f) demonstrate the results when only 30 iterations were performed in forming the E-GVF fields. Obviously, without sufficient gradient diffusion, the pattern of seed regions becomes more complicated and under-growing (i.e., over-segmentation) occurs. Examining the image shown in Fig. 15(a), we find many internal structures within the dog's body, e.g., the dog's forearms, eyes, and nose, which could interfere with our downstream processing. Since the block size for computing the MAD is large and gradient diffusion is capable of removing the boundary effect of weak edges (in contrast to the strong edges between the objects and the background), the dog can be mostly segmented out, in spite of some imperfections near the dog's ears. On the other hand, small regions, such as the eyes or paws, are not segmented by the proposed method, owing to the removal of local boundary information in the estimated E-GVF field. For the "twin dogs," regions growing from the seeds at the image borders are capable of penetrating through open edges [just like those in Fig. 12(f)] in the background, hence resulting in a clean object segmentation in Fig. 16(d). The side effect, however, is that segmentation accuracy is lost at

the left dog's paws and tail, due to excessive gradient diffusion there. Figs. 15(j) and 16(j) specify some rectangular areas, whose magnified E-GVF fields are given in Figs. 15(k) and (l) and 16(k) and (l), respectively. By contrast, Figs. 15(g) and 16(g) demonstrate the segmentation results obtained by using the region growing algorithm based on morphological grayscale reconstruction [12], while Figs. 15(h) and (i) and 16(h) and (i) were obtained by applying the watershed algorithm [13] plus some merging criteria based on the analysis of region mean and variance. A comparison shows that only large regions, i.e., the dogs and the background, are detected in Figs. 15(d) and 16(d), while 21 and 24 regions are formed in Figs. 15(g) and 16(g), respectively, due to over-segmentation. As for the well-known watershed algorithm, it is ineffective in segmenting out the objects unless a robust merging procedure is adopted. The under- or over-segmentation problem still occurs in Figs. 15(h) and (i) and 16(h) and (i), even when the merging threshold is set to higher values. In comparison to the results in Fig. 15(g)–(i), where some object regions are merged with the background, our E-GVF field is more effective in providing sufficient edge information for object/background separation. The above experiments were all implemented in C and run on a Pentium IV 2.4-GHz CPU. The average


IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 13, NO. 10, OCTOBER 2004

Fig. 15. Experiments for the image "dog." (a) Original image (310 × 364 pixels). (b) Edge image obtained with w = 10. (c), (e) Seed scorings. (d), (f) Final segmentation results based on the E-GVF fields computed in 80 and 30 iterations, respectively.

Fig. 15 (Continued). Experiments for the image "dog." (g) Segmented regions by using the region growing method [12]. (h), (i) Results obtained by using the watershed algorithm [13] plus some merging procedure. (j) Two specified areas. (k), (l) Magnified views of the E-GVF field for the specified areas.

CPU time spent in preprocessing (BPV and MAD), E-GVF computation, and downstream processing depends on the image size, the block size, the number of iterations required in solving the E-GVF field, and the number of seed regions. For example, it takes a total of 2.61 s and 1.76 s to process "dog" and "twin dogs," respectively (approximately 300 × 300 pixels and 80 iterations). Table I shows a quantitative comparison of performance among several techniques for the "twin dogs": downstream based on the E-GVF field, downstream based on the traditional GVF field, region growing, and watershed. The reference contours are obtained via manual segmentation. Note that we ignore the incoherent parts due to over-segmentation in computing the segmentation errors. As a result, region growing and watershed with the intensity-merging criterion paradoxically show


Fig. 16. Experiments for the image "twin dogs." (a) Original image (320 × 240 pixels). (b) Edge image obtained with w = 10. (c), (e) Seed scorings. (d), (f) Final segmentation results based on the E-GVF fields computed in 80 and 30 iterations, respectively. (g) Segmented regions by using the region growing method [12]. (h) Results obtained by using the watershed algorithm [13] with a merging procedure.

a lower mean error and standard deviation than ours. On the other hand, the proposed algorithm avoids over-segmentation and maintains a reasonable segmentation accuracy, at the expense of a higher computing load.

V. CONCLUSION

In this research, a new downstream algorithm based on the E-GVF field model is proposed for automatic multiobject

segmentation. Our algorithm extends the GVF field originally proposed in [6] by adding two components in a 45°-rotated coordinate system. The optimization to achieve a minimum energy for the flow field remains the same. Since the E-GVF field model naturally gives eight-neighborhood gradient information toward the object boundary, it enables more accurate image segmentation than the traditional GVF field does. We perform scoring and selection of seeds by considering local gradient direction information around each pixel. This step is automatic and


Fig. 16 (Continued). Experiments for the image "twin dogs." (i) Results obtained by using the watershed algorithm [13] with a merging procedure. (j) Two specified areas. (k), (l) Magnified views of the E-GVF field for the specified areas.

TABLE I. COMPARISONS ON SEGMENTATION ACCURACY (IGNORING OVER-SEGMENTED PARTS) AND EXECUTION TIME BETWEEN THE DOWNSTREAM (BASED ON E-GVF OR GVF), REGION GROWING, AND WATERSHED TECHNIQUES FOR "TWIN DOGS" IN FIG. 16(a)
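The accuracy entries in Table I compare segmented contours against manually drawn references. One common way to obtain such figures, sketched below under the assumption of a nearest-point pixel-distance metric (the table's exact error definition is not restated here), is to report the mean and standard deviation of the distance from each segmented-contour point to its nearest reference-contour point.

```python
import numpy as np

def contour_error(segmented, reference):
    """Mean and standard deviation of the distance from each point of
    the segmented contour to its nearest reference-contour point."""
    seg = np.asarray(segmented, dtype=float)   # (N, 2) pixel coordinates
    ref = np.asarray(reference, dtype=float)   # (M, 2) pixel coordinates
    # Pairwise distances, then the nearest reference point per seg point.
    d = np.linalg.norm(seg[:, None, :] - ref[None, :, :], axis=2)
    nearest = d.min(axis=1)
    return nearest.mean(), nearest.std()
```

Ignoring the incoherent over-segmented parts, as done in Table I, would amount to excluding those contour points before computing the statistics.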

requires no human interaction, making our algorithm well suited for practical applications. Based on the E-GVF field (with four directional flow components and the elimination of force contradiction), object segmentation can be easily achieved via the proposed downstream processing. Our algorithm suffers less from the over-segmentation that often occurs in watershed algorithms, owing to sufficient

gradient diffusion in forming the GVF field and the higher directional resolution in guiding the downstream paths (i.e., E-GVF). Hence, our algorithm is advantageous in segmenting objects that are separated from the background, while ignoring their internal structures. That is, our method is positioned as an algorithm for object segmentation, not region segmentation. If users want to segment out internal features


of an object (e.g., eyes or paws), manually drawing a region of interest (ROI) first is necessary. The computation of E-GVF fields can be approximated via the projection of the original GVF fields without a distinguishable loss of segmentation performance, thus reducing the computing load by nearly one half with respect to optimizing (3) directly. Nevertheless, this may still be critical in certain real-time applications (e.g., about 1 s for a 310 × 364-pixel image on a Pentium IV 2.4-GHz CPU). Fortunately, a multiresolution GVF computing architecture has been proposed [20] to gain a speedup of nearly 40 times. Our downstream processing can undoubtedly benefit from this speedup. The proposed algorithm is versatile, with possible applications wherever automatic detection or segmentation of multiple objects is required, e.g., content-based image retrieval systems and biomedical image analysis. Further work will be to reduce the computing load, investigate other robust criteria (e.g., color, texture, size, etc.) for merging overlapping regions, and extend the method to three-dimensional objects or video object plane segmentation in MPEG-4 standard applications.

REFERENCES

[1] M. Kass, A. Witkin, and D. Terzopoulos, "Snakes: Active contour models," Int. J. Comput. Vis., vol. 1, pp. 321–331, 1988.
[2] D. J. Williams and M. Shah, "A fast algorithm for active contours and curvature estimation," CVGIP: Image Understanding, vol. 55, no. 1, pp. 14–26, 1992.
[3] K.-M. Lam and H. Yan, "Locating head boundary by snakes," in Proc. Int. Symp. Speech, Image Processing, Neural Networks, vol. 1, 1994, pp. 17–20.
[4] P. C. Yuen, G. C. Feng, and J. P. Zhou, "A contour detection method: Initialization and contour model," Pattern Recognit. Lett., vol. 20, no. 2, pp. 141–148, 1999.
[5] Y. Y. Wong, P. C. Yuen, and C. S. Tong, "Segmented snake for contour detection," Pattern Recognit., vol. 31, no. 11, pp. 1669–1679, 1998.
[6] C. Xu and J. L. Prince, "Snakes, shapes, and gradient vector flow," IEEE Trans. Image Processing, vol. 7, pp. 359–369, Feb. 1998.
[7] W.-N. Lie and C.-H. Chuang, "A fast and accurate snake model for object contour detection," Electron. Lett., vol. 37, no. 10, pp. 624–626, 2001.
[8] R. Adams and L. Bischof, "Seeded region growing," IEEE Trans. Pattern Anal. Machine Intell., vol. 16, pp. 641–647, June 1994.
[9] Y.-L. Chang and X. Li, "Adaptive image region-growing," IEEE Trans. Image Processing, vol. 3, pp. 868–872, July 1994.
[10] S. A. Hojjatoleslami and J. Kittler, "Region growing: A new approach," IEEE Trans. Image Processing, vol. 7, pp. 1079–1084, Aug. 1998.
[11] J. Fan, D. K. Y. Yau, A. K. Elmagarmid, and W. G. Aref, "Automatic image segmentation by integrating color-edge extraction and seeded region growing," IEEE Trans. Image Processing, vol. 10, pp. 1454–1466, Oct. 2001.
[12] L. Vincent, "Morphological grayscale reconstruction in image analysis: Applications and efficient algorithms," IEEE Trans. Image Processing, vol. 2, pp. 176–201, Jan. 1993.
[13] L. Vincent and P. Soille, "Watersheds in digital spaces: An efficient algorithm based on immersion simulations," IEEE Trans. Pattern Anal. Machine Intell., vol. 13, pp. 583–598, June 1991.
[14] K. Haris, S. N. Efstratiadis, N. Maglaveras, and A. K. Katsaggelos, "Hybrid image segmentation using watersheds and fast region merging," IEEE Trans. Image Processing, vol. 7, pp. 1684–1699, Dec. 1998.
[15] J. M. Gauch, "Image segmentation and analysis via multiscale gradient watershed hierarchies," IEEE Trans. Image Processing, vol. 8, pp. 69–79, Jan. 1999.
[16] P. V. O'Neil, Advanced Engineering Mathematics, 3rd ed. Belmont, CA: Wadsworth, 1991, ch. 9.
[17] M. Sonka, V. Hlavac, and R. Boyle, Image Processing, Analysis, and Machine Vision, 2nd ed. Pacific Grove, CA: Brooks/Cole, 1999, ch. 5.
[18] T. Pham-Gia and T. L. Hung, "The mean and median absolute deviations," Math. Comput. Modeling, vol. 34, no. 7–8, pp. 921–936, 2001.
[19]
[20] K. S. Ntalianis, N. D. Doulamis, A. D. Doulamis, and S. D. Kollias, "Multiresolution gradient vector flow field: A fast implementation toward video object plane segmentation," in IEEE Proc. Int. Conf. Multimedia and Expo, Tokyo, Japan, 2001, pp. 1–4.

Cheng-Hung Chuang received the B.S. degree in electrical engineering from the Tatung Institute of Technology, Taiwan, R.O.C., in 1994 and the M.S. and Ph.D. degrees in electrical engineering from the National Chung Cheng University, Chia-Yi, Taiwan, in 1996 and 2003, respectively. In 2003, he joined the Institute of Statistical Science, Academia Sinica, Taipei, Taiwan, as a Postdoctoral Fellow for human brain science research. His research interests include optical and biomedical signal processing, 3-D display systems, computer vision, image/video processing, and pattern recognition.

Wen-Nung Lie was born in Tainan, Taiwan, R.O.C., in 1962. He received the B.S., M.S., and Ph.D. degrees, all in electrical engineering, from the National Tsing-Hua University, Taiwan, in 1984, 1986, and 1990, respectively. From September 1990 to June 1996, he served as an Assistant Scientist at the Chung Shan Institute of Science and Technology (CSIST), Taiwan, where he was devoted to the development of infrared imaging systems and target trackers for military applications. In August 1996, he joined the Department of Electrical Engineering, National Chung Cheng University, Chia-Yi, Taiwan, as an Associate Professor. From July 2000 to January 2001, he was a Visiting Scholar at the University of Washington, Seattle. His research interests include image/video compression, networked video transmission, audio/image/video watermarking, multimedia content analysis, standard-compliant multimedia encryption, infrared image processing, 3-D stereo virtual reality by image-based rendering, and industrial inspection by computer vision techniques.