IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 15, NO. 5, MAY 2006
A Faster Converging Snake Algorithm to Locate Object Boundaries

Mustafa Sakalli, Member, IEEE, Kin-Man Lam, Member, IEEE, and Hong Yan, Senior Member, IEEE
Abstract—A new contour search algorithm is presented in this paper that provides faster convergence to the object contours than both the greedy snake algorithm (GSA) and the fast greedy snake algorithm (FGSA). The new algorithm performs the search in an alternately skipping way between the even and odd nodes (snaxels) of a snake with different step sizes, such that the snake moves toward a likely local minimum in a twisting way. The alternating step sizes are adjusted so that the snake is less likely to be trapped at a pseudo-local minimum. The iteration process is based on a coarse-to-fine approach to improve the convergence. The proposed algorithm is compared with the FGSA, which employs two alternating search patterns without altering the search step size. The algorithm is also applied in conjunction with subband decomposition to extract face profiles in a hierarchical way.

Index Terms—Active contour model, boundary detection, fast greedy snake algorithm (FGSA), greedy snake algorithm (GSA), locating human face boundaries.

Manuscript received August 28, 2003; revised October 30, 2004. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Aly A. Farag.

M. Sakalli is with the School of Electrical and Information Engineering, University of Sydney, Sydney NSW 2006, Australia, and also with the Centre for Multimedia Signal Processing, Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Kowloon, Hong Kong (e-mail: [email protected]).

K.-M. Lam is with the Centre for Multimedia Signal Processing, Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Kowloon, Hong Kong (e-mail: [email protected]).

H. Yan is with the School of Electrical and Information Engineering, University of Sydney, Sydney NSW 2006, Australia, and also with the Research Center for Media Technology, Department of Computer Engineering and Information Technology, City University of Hong Kong, Kowloon, Hong Kong (e-mail: [email protected]).

Digital Object Identifier 10.1109/TIP.2006.871401
I. INTRODUCTION
A snake is an active contour model that has been widely used to extract the boundaries of an object in facial image processing [1], [2], object tracking [3], [4], medical imaging [5]–[7], and image segmentation [8]. Initially, the location of the object under examination is estimated, and a snake is set around the object boundaries. An iterative process is then initiated to allow the snake to converge to the likely boundaries. One of the major advantages of the snake [9], [10] over edge-detection algorithms is the continuous representation of the contours. The active contour model was introduced by Kass et al. [10] as an elastic contour extraction technique for boundary detection. In this modeling technique, the candidate boundary points are examined and moved in such a way that the overall energy level of a snake reaches a minimum with respect to all the possible object contours. In [2] and [10], locating the best matching contour was presented as a variational problem. A dynamic programming approach was proposed in [11], which was incorporated with the geometric models of face images in [12]. Later on, the greedy snake algorithm (GSA) developed in [13] was based on investigating the first-order neighboring pixels around a snaxel. This algorithm can provide a considerable improvement in the convergence speed of the snaxels. A faster version of the GSA, namely the fast greedy snake algorithm (FGSA), was introduced in [1], and was also implemented for locating head boundaries. The GSA uses a search pattern with a step size equal to one, and considers all the surrounding pixels in the search pattern. The FGSA employs two alternate pixel search patterns during the iteration process. It can increase the speed of convergence, but at the expense of a twisted route. In this paper, an approach employing different step sizes for the search patterns of the GSA and FGSA is proposed to improve the speed of convergence.

The attractive force that draws the snakes to the object boundaries is the image gradient, which is obtained by means of an edge-detection step to approximate the contour. However, in the case of weak gradients, the snake model tends to collapse or does not work properly. A solution for the partial differential equation representing the energy functional of a snake can be obtained by using finite difference methods in an iterative fashion, in which case snakes converge only to localized solutions. Balloon forces were introduced into the equation to prevent a collapse. The statistical snakes proposed in [14] are based on a statistical region-growing scheme, where pixels within standard deviations of the regional mean generate an expanding pressure while all other pixels produce a collapsing pressure. A class of probabilistic shape models, including fuzzy approaches, was proposed in [15], and, recently, a sophisticated pattern recognition-based method was elaborated in [16] for the pressure snakes, again in conjunction with fuzzy approaches, to enhance the performance in localizing contours across a variety of objects and backgrounds. A fuzzy curve tracing (FCT) algorithm was proposed with fuzzy and crisp clustering techniques in [17] and [18], together with the theoretical basis for its stable convergence conditions. Different snake models (the traditional, the principal curve-based, the balloon, and the gradient vector flow (GVF) based models) are compared there for their convergence performance. It is found that the FCT outperforms the others in terms of the accuracy of the traced curve and the speed of convergence. This is because the energy function is computed for a cluster of pixels instead of individual pixels. The clustering procedure has an averaging effect and can smooth or filter out the influence of noisy and outlier pixels, but the smoothing effect of the clustering may also prevent snakes from converging to the vigorous details of the contours, such as sharp corners.
The emphasis of this paper is mainly on the convergence speed of the traditional snake model, although the proposed search technique can also be adopted by other algorithms. A snake model based on a B-spline representation was proposed in [19], where the emphasis was on achieving global optimization in multiple stages, for both accurate localization and speedy convergence. It employs Zernike moments to locate corners accurately. Compared to our approach presented below, this representation is complex and sophisticated.

This paper is organized as follows. In Section II, a brief overview of the snake model is presented, including the recent GSA and FGSA algorithms. In Section III, we present our new algorithm, which is called the skippy greedy snake algorithm (SGSA). Section IV gives the details of the implementation considerations. The subsequent section incorporates our algorithm with subband decomposition in a multiresolution approach in order to provide further improvements to the convergence speed. The idea is to estimate the direction of the gradient information over the subbands of facial images so that only the snaxels at a certain direction will be considered in an iteration. The last section concludes the paper with the performance achieved by our algorithm.

II. SHORT REVIEW OF SNAKE ALGORITHMS

The snake is a method of contour representation with a number of nodes v_i, called snaxels, where the normalized arc length s lies in the range [0, 1]. The convergence is governed by two energy functionals, the internal energy and the external energy, which are defined based on the template properties and the image properties, respectively. The internal energy consists of two terms, the continuity force and the curvature force, denoted as E_cont and E_curv, respectively. The energy functional to be minimized for a snake with n snaxels is as follows:

E = \sum_{i=1}^{n} \left[ \alpha_i E_{\mathrm{cont}}(v_i^m) + \beta_i E_{\mathrm{curv}}(v_i^m) + \gamma_i E_{\mathrm{img}}(v_i^m) \right]    (1)

where m refers to the (2δ + 1)^2 neighboring pixel positions under examination with step size δ, based on a pixel search pattern around the snaxel v_i. The snaxel movement is to choose the pixel in the neighborhood that minimizes (1). For the traditional snake (GSA), the step size is δ = 1, which means that all the pixels in the first-degree neighborhood are reached in all directions. The coefficients α, β, and γ are weighting factors that control the relative importance of the continuity energy, the bending energy, and the image forces, respectively. The continuity energy is approximated as the first-order continuity function of the snaxels with

E_{\mathrm{cont}}(v_i^m) = \left| \bar{d} - \lVert v_i^m - v_{i-1} \rVert \right|    (2)

where ‖v_i^m − v_{i−1}‖ refers to the distance between the two consecutive snaxels v_i^m and v_{i−1}. The average distance between the adjacent snaxels is d̄. E_cont measures the degree of uniformity of the distance distribution
between adjacent snaxels of a contour. When the distances between the adjacent snaxels are close to the average distance d̄, E_cont approaches zero; d̄ is updated at the end of each iteration. The continuity term encourages the snaxels to be evenly spaced, while the curvature energy E_curv indicates the degree to which a snaxel is bent with respect to its two adjacent snaxels. E_curv is calculated as follows:

E_{\mathrm{curv}}(v_i^m) = \lVert v_{i-1} - 2 v_i^m + v_{i+1} \rVert^2    (3)

These two energy terms are normalized by their respective largest values in the neighborhood. The counteracting energy functional E_img, which is also normalized, is defined based on the local gradient magnitude as follows:

E_{\mathrm{img}}(v_i^m) = \frac{G_{\min} - \lVert \nabla I(v_i^m) \rVert}{G_{\max} - G_{\min}}    (4)

where ∇I(v_i^m) represents the intensity gradient at the candidate point of the snaxel with a step size of δ, and G_min and G_max are the minimum and maximum gradient magnitudes over all pixels in the neighborhood. The new position of a snaxel is the one that results in the maximum reduction of the total energy based on (1).
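For concreteness, the energy terms in (2)–(4) and the weighted sum in (1) can be evaluated for one candidate position as in the following sketch. This is our illustrative Python rendering, not the authors' code; the function and variable names are assumptions, and the neighborhood normalization of the two internal terms is left to the caller.

```python
import numpy as np

def snaxel_energies(cand, v_prev, v_next, d_bar, grad_mag, g_min, g_max):
    """Energy terms (2)-(4) for one candidate position of snaxel v_i.

    cand, v_prev, v_next: (x, y) of the candidate and the two adjacent snaxels
    d_bar:                average inter-snaxel distance, updated per iteration
    grad_mag:             gradient magnitude at the candidate position
    g_min, g_max:         min/max gradient magnitude over the neighborhood
    """
    cand, v_prev, v_next = (np.asarray(p, dtype=float)
                            for p in (cand, v_prev, v_next))
    e_cont = abs(d_bar - np.linalg.norm(cand - v_prev))        # (2)
    e_curv = np.linalg.norm(v_prev - 2.0 * cand + v_next) ** 2  # (3)
    e_img = (g_min - grad_mag) / (g_max - g_min + 1e-12)        # (4), most
    return e_cont, e_curv, e_img            # negative at the strongest edge

def total_energy(alpha, beta, gamma, e_cont, e_curv, e_img):
    # One summand of (1); e_cont and e_curv are assumed already normalized
    # by their largest values over the candidate neighborhood.
    return alpha * e_cont + beta * e_curv + gamma * e_img
```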
Greedy Snake Algorithm and Fast Greedy Snake Algorithm: The GSA [13] and FGSA [1] search for the new position of a snaxel based on search patterns with a step size equal to 1. In the case of the GSA, the energy functional is computed at each snaxel for its current pixel and for its eight neighboring pixels to determine its new position in an iteration. Therefore, the number of neighboring pixels examined is M = 9, as shown in Fig. 1(a). The FGSA employs two patterns that are swapped alternately at successive iterations, so the number of pixels examined is reduced to M = 5. One search pattern has a cross shape and the other a diagonal shape, as illustrated in Fig. 1(b).

The route traced during the optimization process depends on the local variations of the image content, which makes it harder to find an optimal solution along a simple, straight path. An example is given in Fig. 2, in which it is assumed that the energy functionals guide the path of the search in a diagonal direction from one corner of an image segment to its opposite counterpart. The number of iterations required for the GSA, denoted as T_GSA, is 14 for this segment. In the case of the FGSA, the number, denoted as T_FGSA, is increased by about one third as compared with T_GSA [1], due to the use of the alternate search patterns in FGSA. This results in a zigzag moving path instead of a straight line to the destination. Suppose that the first search pattern used is the cross pattern at the left-bottom corner of Fig. 2(a). The path of a snaxel to reach the pixels along the diagonal axis with the minimum number of iterations will be as presented in the same figure. If the target is the next diagonal pixel, and the current search pattern is cross shaped, then reaching the first diagonal pixel requires a minimum of two iterations, without going through the corners. The next pixel in the diagonal direction can be reached in three iterations only if the snaxel follows a close sideway parallel to the diagonal axis, so that it can then reach the next pixel vertically [Fig. 2(a)].
Fig. 3. Minimum number of iterations required to reach a target pixel position by a snaxel using GSA, FGSA, and SGSA. In the case of FGSA, the shape of the different routing courses will require a higher number of iterations.
Fig. 1. Black pixels are examined for a possible local minimum (a) in GSA; (b) in FGSA, alternating between the patterns (b1) and (b2); and (c) in the case of SGSA, where the step sizes of adjacent snaxels are alternated between δ1 = 1 in (c1) and δ2 = 2 in (c2).
Fig. 4. Computational efficiency (the number of iterations times the number of pixels examined at each iteration) for FGSA, SGSA, and SFGSA as compared with GSA for 100 iterations.
The next search pattern, used at the fourth iteration, is the diagonal pattern, which gives the way directly to the next (third) diagonal pixel. There will always be such a delay in the diagonal routing of a snaxel when the pixel search pattern is cross shaped. So, the minimum number of iterations required to reach the pixels along a diagonal axis follows the sequence 2, 3, 4, 6, 7, 8, 10, etc. [the third line in Fig. 2(b)], which translates to one iteration of delay at every three iterations compared with T_GSA. That is,

T_{\mathrm{FGSA}} = T_{\mathrm{GSA}} + \left\lceil T_{\mathrm{GSA}} / 3 \right\rceil    (5)

where T_GSA equals the number of pixels traversed along the diagonal path.
Fig. 2. (a) Routing pattern along the diagonal axis by which a pixel position can be reached with the minimum number of iterations using GSA, FGSA, and SGSA (the search pattern for SGSA is given for δ1 = 1 and δ2 = 2). (b) Number of iterations required by each method for the same routing course given in (a), with a length of 15 pixels. The pixels visited in the reverse direction (marked black) are those for which the energy functionals in (1) will not be executed if they do not satisfy the minimum energy functional.
Even though the number of iterations increases, this algorithm requires less computation because fewer pixels are examined. The computational efficiency is calculated in terms of the number of iterations and the number of pixels examined at each iteration [1], that is, η = (M_GSA · T_GSA)/(M · T), where M is the number of pixels examined at each iteration, T is the corresponding number of iterations, and M_GSA = 9 for the full GSA pattern. A comparison of the performance in terms of the number of iterations required to reach a pixel along the diagonal path is presented in Fig. 3, and the computational efficiency presented in Fig. 4 takes into account both the number of iterations required and the number of pixels examined at each iteration. For the FGSA, η converges to 1.35 as the length of the path becomes longer.
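These counts and the efficiency measure can be checked numerically. The script below is a sketch of ours, not part of the paper: t_sgsa implements (6) of Section III with δ1 = 1 and δ2 = 2, and the printed ratios reproduce the asymptotic values quoted for Figs. 3 and 4.

```python
import math

def t_gsa(n):   # GSA: one iteration per pixel along the diagonal path
    return n

def t_fgsa(n):  # (5): one extra iteration for every three, from pattern switching
    return n + math.ceil(n / 3)

def t_sgsa(n):  # (6) with delta1 = 1, delta2 = 2 (derived in Section III)
    kappa = 2 if n % 3 == 1 else 0   # pixels reached only in the reverse direction
    return n - math.ceil(n / 3) + kappa

def eta(m, t, n):
    # Efficiency relative to GSA: (M_GSA * T_GSA) / (M * T), with M_GSA = 9
    return (9 * t_gsa(n)) / (m * t)

for n in (15, 30, 99):
    print(n,
          round(eta(5, t_fgsa(n), n), 3),   # FGSA:  5 pixels/iter -> ~1.35
          round(eta(9, t_sgsa(n), n), 3),   # SGSA:  9 pixels/iter -> ~1.5
          round(eta(5, t_sgsa(n), n), 3))   # SFGSA: 5 pixels/iter -> ~2.7
```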
During the iteration process, a snake might be trapped around an undesired local minimum due to the slithering behavior of the convergence [11]. In this sense, FGSA has the advantage of introducing a perturbation type of behavior. Nevertheless, FGSA still has a high probability of being trapped around an undesirable local minimum. Possible methods of escaping from local minima include the exploration of all possible positions of the snake and the application of simulated annealing. However, these approaches are too computationally expensive for practical applications. Possible solutions to alleviate the effect of local minima are to smooth the images with either a median or a Gaussian filter with a large window size, and to increase the step size of the search windows. In fact, a snaxel in FGSA cannot move in a straight path because of the use of alternate patterns. The maximum cost occurs when the routing path is a vertical or a horizontal straight line: the number of iterations required is then doubled, and the advantage of reducing the number of pixels examined disappears. In the next section, we introduce a method that can reduce the number of iterations required irrespective of the path of the search, and reduce the effect of local minima, by introducing a perturbation kind of behavior to the snaxels during the iteration process.

III. SKIPPY GREEDY SNAKE ALGORITHM

Our algorithm adopts search patterns of two different step sizes, denoted by δ1 and δ2. The step sizes of even- and odd-ordered snaxels alternate between δ1 and δ2 at each iteration. The step size of the search can be set to a higher value if the neighboring pixels are highly correlated, which is the case pointed out for natural images [20]. Therefore, the pixels in the vicinity of the current snaxel can be searched in a skipped way. However, it is possible that a pixel position which can produce a smaller value of the energy functional might be skipped. This implies that the step sizes should be selected in such a way that all possible positions can be reached by the snaxels, particularly in the fine-tuning stage. For the same reason, if large step sizes are chosen, alternating them with shorter step sizes between adjacent snaxels will probably prevent the snaxels from going astray. Using this idea, the pixels that are skipped during the large step size will be examined with the small step size in the reverse direction, as illustrated in Fig. 2. While one set of snaxels executes the search with the smaller step size, the adjacent snaxels execute the search with the larger step size, and the roles are switched at every iteration. This is performed continuously, like the hopscotch-like behavior of dancing partners, as shown in Fig. 1(c). The patterns of the step sizes applied in this paper are fixed to a certain shape, which could be further diversified with other patterns or with local image dependencies. With a step size of δ, the number of pixels covered by the search for a new position is (2δ + 1)^2. Only nine of these pixels, those located at the corners and the edge midpoints, including the current snaxel position, are examined. If the step size is increased to a value of 3 or more, some of the positions cannot be reached. Therefore, the applicable step-size patterns must be determined cautiously. An optimal solution to determining the step size is to employ the local statistical features. In the next section, this is experimented with on the edge images to attract the snaxels gradually to their final destinations.
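The bookkeeping of the skipped search can be made concrete with a minimal sketch (ours; the names are illustrative). It generates the nine candidate offsets, at the corners and edge midpoints of the (2δ + 1) × (2δ + 1) window plus the current position, and alternates δ1 and δ2 between even- and odd-ordered snaxels, swapping the roles at every iteration.

```python
from itertools import product

def search_pattern(delta):
    """Nine candidate offsets for step size delta: corners, edge midpoints,
    and the current position of the (2*delta+1)^2 window."""
    return list(product((-delta, 0, delta), repeat=2))

def step_size(snaxel_index, iteration, d1=1, d2=2):
    """Alternate d1/d2 between even- and odd-ordered snaxels and swap the
    roles at each iteration, so adjacent snaxels never share a step size."""
    return d1 if (snaxel_index + iteration) % 2 == 0 else d2

print(search_pattern(2))                 # offsets in {-2, 0, 2} x {-2, 0, 2}
print(step_size(0, 0), step_size(1, 0))  # iteration 0: 1 2
print(step_size(0, 1), step_size(1, 1))  # iteration 1: 2 1
```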
For an image in which a snaxel takes a diagonal path, the minimum number of iterations required along the route using the FGSA increases from 1.2 to 1.35 times T_GSA, as calculated in (5). On the other hand, the minimum number of iterations required for our new approach decreases. This is calculated as follows. Suppose that the current pixel position along the path is p and the initial step size is δ2. The new position will be p + δ2. After the second step, the snaxel position will be p + δ2 + δ1. The step size of the consecutive search is δ1 in all directions around the snaxel positioned at p + δ2, so the pixels examined will be in the range p + δ2 − δ1 to p + δ2 + δ1: one in the backward direction and one in the forward direction. This is clearly depicted in Fig. 2(a) and (b), with δ1 toward the skipped region in the backward direction and with δ1 toward the forward direction to examine further locations. Thus, at every leap of δ2 taken, δ2 − δ1 pixel positions will be skipped. The search algorithm will not be executed for the skipped pixels along the path, but they will be examined if they can be reached. If a step size δ1 > 1 is chosen, then the pixels skipped by the small step also need to be taken into consideration. In our algorithm, δ1 is set at one to examine the close vicinity of the snaxel that is reached after a large skip. This gives a gain, as the skipped pixels are those for which the algorithm of (1) will not be executed. As a result, the minimum number of iterations required to reach a pixel position will be reduced by the number of pixels skipped when compared with the number of iterations required for GSA. The number of pixels skipped is proportional to the number of large steps taken, which is ⌈T_GSA/(δ1 + δ2)⌉. Since the small step size does not contribute to reducing the number of pixels examined at this stage, it is offset in the second part of (6). This part of the equation is rounded to the upper integer with ⌈·⌉ to cover the effective range of the large step, which is between (δ1 + δ2)(n_l − 1) (not inclusive) and (δ1 + δ2)n_l (inclusive), where n_l is the number of complete leaps of δ2 for the given T_GSA. As a result, the number of iterations required in our case is

T_{\mathrm{SGSA}} = T_{\mathrm{GSA}} - \left\lceil \frac{(\delta_2 - \delta_1)\, T_{\mathrm{GSA}}}{\delta_1 + \delta_2} \right\rceil + \kappa    (6)

where the term κ adds two extra iterations for the pixels that can be reached only in the reverse direction with the step size δ1, as indicated in Fig. 2, after an immediate large step of δ2. As an example, in the same figure, the number of pixels skipped while reaching the pixel ranges {2, 3, 4}, {5, 6, 7}, {8, 9, 10}, ..., is 1, 2, 3, ..., respectively, where {·} represents the range of pixels that can be reached after n_l leaps of δ2. After subtracting the skipped values from T_GSA, the number of iterations required will be T_GSA − ⌈T_GSA/3⌉, except for the first pixel in each group, i.e., the second, fifth, eighth, ... pixels, which are reached in the reverse direction after the middle pixels of each group are reached [Fig. 2(b)]. Therefore, the value of κ for these pixels must count the two additional steps for each backward move; otherwise, it is assigned zero at all other locations. The coefficient κ can thus be determined as an indicator function
which marks the pixels reached in the reverse direction, taking the value 2 at those positions [the second, fifth, eighth, ... pixels in Fig. 2(b), i.e., the target positions for which T_GSA ≡ 1 (mod δ1 + δ2)] and zero at all other locations. Equation (6) remains valid for all different step sizes, but the formulation of κ then needs to be reconsidered. Substituting the values δ1 = 1 and δ2 = 2 into (6) yields

T_{\mathrm{SGSA}} = T_{\mathrm{GSA}} - \left\lceil T_{\mathrm{GSA}} / 3 \right\rceil + \kappa.    (7)

Notice that the second term, with the ceiling ⌈·⌉, in (6) is subtracted from the value of T_GSA, while the second term with ⌈·⌉ in (5) for FGSA is added to T_GSA. The third term in (7) is either 2 or 0 for the given step sizes δ1 = 1 and δ2 = 2. Another straightforward way of obtaining the same result for T_SGSA is to count two iterations for every complete leap pair, 2⌊T_GSA/(δ1 + δ2)⌋, and then to add one iteration for a residual that is reached with the large step δ2, or two iterations for a residual that must be reached in the reverse direction with the small step δ1. Equation (6) is preferred, since it formulates the reduction in the number of iterations of the SGSA in direct contrast with the increase in the number of iterations of the FGSA in (5).

In the skippy case, if the step size is chosen to be very large, some pixel positions cannot be reached and examined. Therefore, the path taken by a snaxel will be discontinuous (interruptive), whereas in the case of the FGSA and GSA the path is always continuous. In the case of SGSA, if the step sizes are chosen as δ1 = 1 and δ2 = 2, then all the pixels along the course of a snaxel can be examined, although the routing behavior of the snaxels will always be skippy. This means that some pixels that are probed along the path followed by a snaxel will not take the role of a snaxel. If the step size is set larger than 2, there will be a number of pixels that cannot be considered as candidates. This may be one of the disadvantages of the skippy behavior, in which case a snaxel may skip the true contour positions and/or may even be trapped between two local minima an approximate distance of δ2 apart. For example, in one of the images in Figs. 6 and 7, at the fine-tuning stage in our experiment, the snaxels above the forehead collapse toward the spectacle frame. This situation can be avoided by applying a different size of δ2 in the nearby regions of the likely contour positions, with different sets of weighting parameters, which, however, is not applicable to the other images. Besides, if a larger step size is used close to the final contours, the convergence of the snaxels results in a zigzagged contour shape, which requires an additional number of iterations for the snaxels to recover a final smooth contour. Therefore, it is important to determine the step sizes cautiously. A unified pseudocode of the algorithms is given in Fig. 5.

Fig. 3 illustrates a comparison of the number of iterations required for the GSA, FGSA, and SGSA: FGSA requires a higher number of iterations than GSA, while SGSA requires fewer. The efficiency achieved by FGSA is due to the fact that a smaller number of pixels is examined over a larger area for any search pattern.
Fig. 5. Unified pseudocode of the snake algorithms of GSA, FGSA, and SGSA.
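Since Fig. 5 is not reproduced in this text, the following self-contained sketch (ours, with illustrative names) outlines the unified iteration: GSA, FGSA, and SGSA differ only in how each snaxel's candidate offsets are generated, while the greedy update and the convergence test are shared.

```python
import numpy as np

def offsets(alg, it, i):
    """Candidate offsets for snaxel i at iteration it (cf. Fig. 1)."""
    if alg == "GSA":                        # full 3 x 3 neighborhood
        return [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
    if alg == "FGSA":                       # alternate cross / diagonal patterns
        pat = ([(1, 0), (-1, 0), (0, 1), (0, -1)] if it % 2 == 0
               else [(1, 1), (1, -1), (-1, 1), (-1, -1)])
        return [(0, 0)] + pat
    # SGSA: corner/midpoint pattern, step sizes alternating per snaxel
    d = 1 if (i + it) % 2 == 0 else 2
    return [(dx, dy) for dx in (-d, 0, d) for dy in (-d, 0, d)]

def iterate(snake, energy, alg="SGSA", max_iter=100, min_moved=3):
    """Greedy iteration: move each snaxel to its minimum-energy candidate.
    `energy(snake, i, pos)` evaluates (1) for snaxel i at position pos."""
    snake = [np.asarray(v, dtype=int) for v in snake]
    for it in range(max_iter):
        moved = 0
        for i, v in enumerate(snake):
            cands = [v + np.asarray(o) for o in offsets(alg, it, i)]
            best = min(cands, key=lambda p: energy(snake, i, p))
            if not np.array_equal(best, v):
                snake[i] = best
                moved += 1
        if moved < min_moved:               # stopping criterion (cf. Sec. IV)
            break
    return snake
```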
When the reduction in the number of pixels examined is taken into account, the FGSA has a computational efficiency of 1.2 to 1.35 times that of GSA, as shown in Fig. 4. The SGSA achieves a higher efficiency level, reaching 1.5 when the number of iterations exceeds 20, as shown in the same figure. The zigzagging behavior of the curves in Fig. 4 is due to the switching of the search patterns and the switching between the alternate step sizes at each iteration. From (5) and (6), the computational efficiency of the proposed skippy GSA is higher than that of the FGSA, as presented in Fig. 4. The improvement becomes noticeable after a number of iterations are taken along the path: the zigzag variation becomes smaller in amplitude relative to the length of the path followed by a snaxel, while the improvement in speedup remains constant in the long term.

An integrated version of the SGSA and FGSA, called the SFGSA, is also evaluated. Similar to FGSA, the search in this algorithm is switched between two search patterns, with the additional cost of a zigzagged route. By choosing a step size larger than 2, different search patterns may also be considered, e.g., a star-shaped pattern, or vertically or horizontally organized patterns. Similar to the FGSA, the SFGSA examines only five pixels while switching between the cross and diagonal patterns. The computational efficiency of SFGSA approaches the upper limit of 2.7, and the lower limit converges to approximately 2.4 after the thirteenth iteration, as shown in Fig. 4. Both SGSA and SFGSA always achieve a positive gain with respect to GSA. The computation required by SFGSA is calculated by substituting (6) in place of (5), and by considering five pixels instead of nine in the search patterns. The oscillating efficiency curves presented in Fig. 4 are due to the skipping action. In summary, the computational efficiency of FGSA is limited to 1.35. This efficiency can be increased to 1.5 by employing the SGSA, and to about 2.5 if switching patterns are also adopted, as in the SFGSA.
IV. IMPLEMENTATION OF SFGSA TO LOCATE FACE BOUNDARIES

In this section, the FGSA and SFGSA algorithms are applied to extract human face boundaries in front- and side-looking images. The main issues with the snake are the initialization of the snaxels [6], [21], [22], the need to prevent the snaxels from being attracted to undesired local minima [21], and the need to avoid their mingling with each other, particularly while approaching an object with large step sizes. Therefore, a successful application of the snake requires an arduous preprocessing stage to emphasize the likely positions of the targeted contours for a smooth convergence to the final contours. In this sense, the direction and the magnitude of the contour gradient are applied to guide the snaxels, as described in Section V.

The method of initialization employed is based on the vertical and horizontal projections [23], [24] of head images [25]. The marginal points of a head in an image are estimated as the peak values of the first-degree derivative of the projections, which coincide with the inclining or declining ends of the head projections [24], [26]. In some cases, the marginal points could not be located accurately, due to the existence of nonuniform structures in the faces, such as glasses, beards, and rotations. Based on these four marginal points, a snake is initialized as an ellipse with 30 snaxels placed close to the facial contour.

Two types of facial contour images are produced for each image. One is the edge image, while the other is a further smoothed edge image used to attract the snake to the close vicinity of the likely contour regions. This creates a capture region that draws the snake to the likely boundaries even if it is placed far away from the object, and prevents the snake from converging to strong but higher frequency components of the image. The initial image is obtained by using a combination of a first-degree averaging filter and a morphological edge detector using blurring, erosion, and dilation operators. Each image is extended symmetrically by a number of pixels equal to half of the filter dimension minus one, and is then filtered with a 17 × 17 Gaussian mask [1].

The coefficients α, β, and γ are determined empirically, owing to their dependence on the local properties of the snaxel location. The stiffness of the model is controlled by the relative values of α and β. However, a consistent set of parameters could not be achieved. It is observed that, when the value of γ is kept close to, or slightly higher than, α and β, the balance between the model and image forces results in a satisfactory contour; no important position-dependent setting of α and β against each other was observed. In our opinion, this may be due to the preprocessing stage, which amplifies the image forces that attract the snaxels to the final contours. The parameters α and β are set at a value between 0 and 1, and the value of γ is set between 1 and 1.2 [1], [13].

The Gaussian-filtered edge image is used in the first stage of the convergence process, until the number of snaxels changing their positions in an iteration is less than a certain number, set to 10. Then the second, fine-tuning stage is started, and a less smoothed edge image is used. The iteration process is carried on until fewer than three snaxels change their positions, or until the last routine has been repeated four times. This contour is then assumed to be the one, among all the possible contours, that satisfies the minimum energy constraints.
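The initialization and the two-stage preprocessing described above can be sketched as follows. This is our illustrative reconstruction under stated assumptions: scipy.ndimage supplies the filters, a morphological gradient serves as the edge detector, and the four marginal points are taken from the peaks of the projection derivatives; it is not the authors' exact pipeline.

```python
import numpy as np
from scipy import ndimage

def edge_images(img):
    """Fine and coarse edge maps: a morphological gradient (dilation minus
    erosion) of a lightly averaged image, then a strong Gaussian blur that
    roughly matches the paper's 17 x 17 mask for the coarse first stage."""
    smooth = ndimage.uniform_filter(img.astype(float), size=3)
    fine = (ndimage.grey_dilation(smooth, size=3)
            - ndimage.grey_erosion(smooth, size=3))
    coarse = ndimage.gaussian_filter(fine, sigma=4, truncate=2.0)  # 17-tap kernel
    return fine, coarse

def init_snake(head_img, n_snaxels=30):
    """Ellipse of snaxels from the peaks of the projection derivatives."""
    pv, ph = head_img.sum(axis=0), head_img.sum(axis=1)      # vert./horiz. proj.
    x0, x1 = np.argmax(np.diff(pv)), np.argmin(np.diff(pv))  # left/right margins
    y0, y1 = np.argmax(np.diff(ph)), np.argmin(np.diff(ph))  # top/bottom margins
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    ax, ay = abs(x1 - x0) / 2.0, abs(y1 - y0) / 2.0
    t = np.linspace(0.0, 2.0 * np.pi, n_snaxels, endpoint=False)
    return np.stack([cx + ax * np.cos(t), cy + ay * np.sin(t)], axis=1)
```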
The step sizes applied in the first stage of the SFGSA are set at larger values for coarse localization; in the second stage of fine tuning, δ2 is reduced to 2. With this arrangement, the search patterns can have a larger perturbation (a larger step size) initially, such that the probability of being trapped in a local minimum is reduced. In the second stage, the perturbation is reduced so that the snaxels can move to the targeted boundary points accurately. With a large step size, the snaxel under consideration can reach a position further away. This makes the effect of the bending and discontinuity forces more significant. However, not all the snaxels in a snake move with this large step size. The alternating step sizes between adjacent snaxels provide a balance, in such a way that a snaxel with the small step size can slow down an undesired excessive deviation of its adjacent snaxels, which move with the large step size. The advantage of alternating the step sizes between adjacent snaxels is that they provide a perturbing behavior to the snaxels, resulting in a better convergence.

To compare the performances of FGSA and SFGSA, the same initialization procedure and weighting factors are used in the experiments. When the average convergence speed of SFGSA is compared to that of FGSA, it is observed that, for the first stage, the speed of SFGSA increases by more than 40%, which is lower than the figures given for the ideal cases in Fig. 4. This is mainly due to the complexities of the image forces of the object concerned; these are unlikely to attract the snaxels to move to the object boundary along a straight, diagonal path. Besides, the snaxels are interrelated, and they will not converge and settle at the same time with the same number of iterations. Therefore, the average speedup obtained in the experiments is reasonable. At the second stage, the speedup is 21% compared to FGSA, as presented in Table I. The improvement is smaller for the second stage due to the use of a smaller step size.

Figs. 6 and 7 indicate that the precision of the final contours based on FGSA and SFGSA is very similar. Comparing the final contour representations for FGSA and SFGSA, the experimental results illustrated in Figs. 6 and 7 are almost the same; in some cases, SFGSA provides slightly better results, in that the snaxels are attracted closer to the details of the likely positions rather than merely somewhere in the vicinity of the boundaries. For example, in one image the snaxels fail to converge precisely for both algorithms, while in another, FGSA seems to provide a smoother representation. In a third image, a snaxel is trapped at the left temporal side of the forehead when FGSA is employed (Fig. 6); the snaxel moves toward the facial boundaries with SFGSA, as shown in Fig. 7, while the snaxel on the right side of the face, just over the cheek, cannot move to the expected position. This is due to the fact that the image forces in this area are weak. These experiments indicate that, while the suggested skippy behavior of the algorithm provides a closer convergence to the details, in a few cases snaxels are trapped around details far from the facial boundaries, or move further inside the face.
As a result, one disadvantage of the skippy behavior is the likelihood of skipping boundaries, or the possibility of being attracted to other details in close proximity to the boundaries. One reason for this is that there are no structural or semantic restrictions bounding the behavior of the snaxels as a model. Another obvious reason is the statistical features of the image and the geometric variations of the object in the image.
TABLE I MEANS AND VARIANCES OF THE NUMBER OF ITERATIONS FOR THE COARSE TUNING STAGE S1 AND THE FINE TUNING STAGE S2 OF CONVERGENCE USING THE FGSA AND SFGSA BASED ON 40 IMAGES
Fig. 6. Some results of the facial images whose boundaries are located with FGSA. From left to right and from top to bottom, the images are denoted by s1, s2, ....
Fig. 7. Some results of the facial images whose boundaries are located with SFGSA. From left to right and from top to bottom, the images are denoted by s1, s2, ....
V. GRADIENT-GUIDED (GG) SNAKES

In this section, the gradient information of an image is incorporated with the search direction of the snaxels. The purpose of this is two-fold: to use a multiresolution approach [18] and to detect the likely direction of the search. The estimation of the edge direction is formulated in terms of weighted directional wavelet coefficients.
Potentially, this can provide a three-fold reduction in the number of positions to be examined for each snaxel to determine a new position, without increasing the number of iterations, and even without employing alternate pixel search patterns or switching step sizes. In GSA and SGSA, the search is performed in all directions, while the search in FGSA and SFGSA is performed in four directions only. The idea of the gradient-guided (GG) approach is to consider only those positions in the search pattern where the snaxels are likely to obtain an optimum result, based on the gradient information. Two search patterns are experimented on in the GG-FGSA, one with a wider angle and the other with a narrower angle.

A wavelet-based but complex and computationally heavy approach was proposed in [27], where zero crossings between adjacent discrete wavelet frame transform coefficients were utilized to find the edge orientation, and a balloon expansion approach was adopted. An approach utilizing the gradient vector flow (GVF) was introduced in [28] as a one-way deterministic solution of the gradient vector field applied to the snaxels. This is obtained by solving the Euler–Lagrange equation iteratively, such that at each iteration the snaxels diffuse toward their final contours by an amount of gradient multiplied by an inverse matrix of parameters, where the multiplying coefficient acts like a gain factor. Therefore, in GVF, the snaxels do not need to perform a directional search process, and, consequently, the convergence speed is not considered. On the other hand, the technique proposed in this paper focuses on the case where a directional search is executed, and for this an optimal solution is sought at each iteration of the convergence; consequently, the search pattern and direction are important to reduce the computational workload required. Nevertheless, our method is applicable to the GVF, to skip a number of pixels, possibly without pattern switching, and it will therefore yield a faster convergence.

In our experiments, the gradient information is extracted as directional features from the subband decomposition of face images at three-level depth. The filters used for the subband decomposition are the Haar transforms, which have the features of symmetry and orthogonality, with the coefficients {1/2, 1/2} and {1/2, −1/2} for the low-pass and high-pass decomposition filters, and {1, 1} and {1, −1} for the low-pass and high-pass reconstruction filters, respectively. The Haar scaling function φ(t) is 1 on [0, 1) and 0 otherwise, and the wavelet function ψ(t) is 1 on [0, 1/2), −1 on [1/2, 1), and 0 otherwise. The higher frequency subband coefficients W_h, W_v, and W_d are the coefficients obtained after l layers of subband decomposition [29]. That is, the sums and differences between the odd and even rows (and later between columns) are calculated and scaled by half at every level. In determining the pixel search direction, the gradient information implies four possible directions: horizontal, vertical, diagonal, and anti-diagonal. Moving toward one of the possible directions leaves only three pixels to be examined, in the direction perpendicular to the gradient. The search direction of the snaxels is determined by comparing the three bands at the corresponding layer. Using only the higher frequency subband components is not sufficient to direct the snaxels to the target contour, since the snaxels become trapped around details with smaller energies. Therefore, the subbands are further filtered with a Gaussian filter.
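One level of this Haar decomposition, exactly as described (averages and differences of even-odd row pairs, then of column pairs, scaled by half), can be written as the short sketch below; this is our illustration, and it assumes even image dimensions at each level.

```python
import numpy as np

def haar_level(img):
    """One 2-D Haar level; returns (LL, W_v, W_h, W_d). Assumes even dims."""
    a = np.asarray(img, dtype=float)
    lo = (a[0::2, :] + a[1::2, :]) / 2.0    # row low-pass  {1/2,  1/2}
    hi = (a[0::2, :] - a[1::2, :]) / 2.0    # row high-pass {1/2, -1/2}
    ll = (lo[:, 0::2] + lo[:, 1::2]) / 2.0
    wv = (lo[:, 0::2] - lo[:, 1::2]) / 2.0  # vertical-detail band
    wh = (hi[:, 0::2] + hi[:, 1::2]) / 2.0  # horizontal-detail band
    wd = (hi[:, 0::2] - hi[:, 1::2]) / 2.0  # diagonal-detail band
    return ll, wv, wh, wd

def decompose(img, levels=3):
    """Three-level decomposition, keeping the directional bands per level."""
    bands, ll = [], img
    for _ in range(levels):
        ll, wv, wh, wd = haar_level(ll)
        bands.append((wv, wh, wd))
    return ll, bands
```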
Fig. 8. (a) Original face profile; (b) blurred edge contours from the lowest frequency band at level 3; (c), (d) Horizontal and vertical edge profiles of the horizontal and vertical bands at the same level, respectively. The projections of subband edge profiles are used for the initial estimation of the head contours.
The edge profile image of the lowest band at level three is obtained with the Canny edge detector, with the threshold set as a fraction of the maximum value in the current band. The overall procedure is E_l = G(C(LL_l)), where LL_l is the lowest frequency band at the l-layer depth of wavelet decomposition, and C(·) and G(·) are the Canny edge detector and the Gaussian filter for additional blurring of the edges, respectively, as shown in Fig. 8(b). Fig. 8(c) and (d) illustrate the smoothed horizontal and vertical edge properties of the decomposed subbands at level 3, respectively. The snake is initialized with this coarse image of LL_3, which is 8 × 8 times smaller than the original image. Approximate head positions for initialization are determined roughly from the vertical and horizontal projections as follows:

P_v(x) = \sum_y \left| W_v^{(l)}(x, y) \right|, \qquad P_h(y) = \sum_x \left| W_h^{(l)}(x, y) \right|    (8)

where W_v and W_h are the subband coefficients of the vertical and horizontal bands, respectively, at the l-level depth of decomposition. At all levels, the coefficients of the vertical, horizontal, and diagonal directional bands smaller than a fraction of the maximum value are nullified, where the threshold is determined empirically. The direction to proceed is determined as the dominant subband coefficient, i.e., the largest of the Gaussian-weighted magnitudes of W_h, W_v, and W_d. If the dominant coefficient is not strong enough, the search is performed in all directions with switching step sizes; otherwise, the search is performed in a certain direction over three pixels. An example of the gradient map obtained for the lowest frequency band at level 3 is illustrated in Fig. 9(a).
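The direction selection can be sketched as follows (our illustration; the threshold value and the names are assumptions). Weak coefficients are nullified, the Gaussian-smoothed band magnitudes are compared, and the dominant band is mapped to the three candidate pixels perpendicular to the gradient; separating the diagonal from the anti-diagonal direction would additionally require a sign test on the diagonal band, which is omitted here.

```python
import numpy as np
from scipy import ndimage

def direction_map(wh, wv, wd, rel_thresh=0.2, sigma=1.0):
    """Dominant search direction per pixel from the directional subbands.
    Returns 0 (horizontal band), 1 (vertical band), 2 (diagonal band),
    or -1 where no band dominates (fall back to the full switched search)."""
    mags = []
    for w in (wh, wv, wd):
        m = np.abs(np.asarray(w, dtype=float))
        m[m < rel_thresh * m.max()] = 0.0     # empirical nullification
        mags.append(ndimage.gaussian_filter(m, sigma))
    stack = np.stack(mags)
    idx = stack.argmax(axis=0)
    idx[stack.max(axis=0) == 0.0] = -1
    return idx

# Three candidates perpendicular to the gradient: a horizontal-edge response
# is searched along the vertical axis, and vice versa.
PERP = {0: [(0, -1), (0, 0), (0, 1)],
        1: [(-1, 0), (0, 0), (1, 0)],
        2: [(-1, 1), (0, 0), (1, -1)]}
```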
Fig. 9. Locating facial boundaries while reconstructing the decomposed image. (a) Gradient direction mapped, from the brightest to the darkest points, to the diagonal, vertical, horizontal, and anti-diagonal search directions. (b) 24 snaxels initialized at level 3. (c) Snaxels converged to the facial boundary using the gradient map (GG-SGSA) incorporated with the horizontal and vertical subbands. (d) Snaxels' positions scaled to new positions at the next level of reconstruction, with the snaxel number doubled.
Experimental Results of GG-Snakes: Experiments are performed for the FGSA, the GG-FGSA, and the GG-SGSA (with the two alternating step sizes δ1 and δ2). The initial number of snaxels is set to 24 at the lowest frequency (third) layer of the decomposition, as shown in Fig. 9(b); the number is doubled at every upper layer of the synthesis. If the pixel search fields of two snaxels start to overlay each other, these snaxels are merged into one snaxel. Once the snake converges at a level, for example at level 3, as shown in Fig. 9(c), the snaxel positions are scaled by two to determine the initial positions at the next level of the synthesis. The iteration is continued until the number of snaxels displacing their positions drops to a few. The result is shown in Fig. 9(d). Smoothing is performed in all the subbands with a Gaussian mask to extract the gradient, as shown in Fig. 9(a). It is observed that, in some of the GG-GSA cases, convergence cannot be achieved thoroughly, while the GG-FGSA can converge to the final contour with more iterations. When the snake is allowed to behave in a skippy way with GG-SGSA, the number of iterations required decreases to between 5 and 8, whereas it is between 18 and 19 for GG-FGSA. We also found that, at some points where GG-FGSA does not converge to the expected final contours, GG-SGSA can provide convergence without any need for additional iterations, while a larger step size causes a deformed organization of the snaxels in the representation of the facial contours. Fig. 10 compares the convergence speed of the different methods in terms of the number of iterations. While the traditional snake requires 300 iterations, SGSA and GG-SGSA take 200 and 80 iterations to converge, respectively. Approaching a final contour with a large step size, for example 8, causes the pathways of adjacent snaxels to intersect, and the shape of the contour therefore becomes more disoriented, with the snaxels intertwined. This then requires a reordering of the snaxels, and a longer time at the final tuning stage, in order to obtain a smoother contour shape. To avoid this, the step sizes are reduced to 3, while the snaxels approaching each other are merged into one snaxel.
Fig. 10. Comparison of the four methods—(a) greedy snakes, (b) SGSA, (c) GG-SGSA, and (d) GVF—in terms of the number of iterations required to move from an initial circular contour to the contours of a U-shaped object. Step sizes in SGSA are chosen as (1, 3) and (1, 1) during the coarse and fine-tuning stages, respectively. In the case of GG-SGSA, the snaxels settle around the object in 40 iterations, and around their final positions in a total of 80 iterations. Although the snaxels initially converge fast in GG-SGSA, GVF provides a faster convergence; this is because the gradient flow vector field gives a direct solution at every iteration, so the snaxels proceed without any need to search the pixels around them.
Convergence of GVF takes 30 iterations, and it can continue to converge into the concave region. This means that the improvement provided in the case of conventional snakes can be maintained by incorporating our approach into the GVF, possibly in progressive steps (from a coarse to a fine-tuning stage), or at least by modifying the gain factor accordingly. Nevertheless, achieving an optimal performance depends on many factors, particularly on the complexity of the local region concerned in the case of facial images. The issues that need to be pointed out are that a different gradient and thresholding measure, and different subband filter families, will lead to different results, and that determining the best performing filter families for the routing patterns with minimum redundancy (pixel search pattern and/or the step-size switching of snaxels, depending on the image features) needs to be studied further under various imaging conditions.

VI. CONCLUSION

In this paper, a skippy kind of greedy search algorithm (with an alternating step size between adjacent snaxels) is presented as the SGSA and SFGSA, by assuming the statistical closeness of adjacent pixel features. We have demonstrated not only that the convergence speed can be improved irrespective of the type of search pattern applied, but also that the zigzagging route caused by the FGSA can be avoided by using the SGSA. In the experiments for locating face boundaries, the increase in speed compared to FGSA is found to be more than 40% in the first stage of coarse convergence, and more than 20% in the fine-tuning stage. In addition, the GG snake approach is also implemented in a hierarchical manner, which can decrease the burden of preprocessing by using smoothing filters and images of smaller sizes. The gradient information is obtained at every level from the wavelet-decomposed images, and the convergence of the snake model is examined for all the possible behaviors, with alternate patterns, with and without skipping. Obtaining gradient information from the wavelet coefficients requires further study, particularly for different imaging environments. We found that the advantages provided by the skippy behavior of snaxels are not only the reduced number of iterations and, as a result, the improvement in convergence speed, but also the reduced likelihood of snaxels converging to false local minima. This is due to the perturbative behavior introduced to the snakes, i.e., switching the step size of the walking patterns between adjacent snaxels and switching between different pixel search patterns.

ACKNOWLEDGMENT
The authors would like to thank the Olivetti & Oracle Research Laboratory [25] for the face database used in this paper, as well as B. Achermann at the University of Bern [29] for the face profile images.

REFERENCES

[1] K. M. Lam and H. Yan, “Fast algorithm for locating head boundaries,” J. Electron. Imag., vol. 3, no. 4, pp. 352–359, 1994.
[2] J. B. Waite and W. J. Welsh, “Head boundary location using snakes,” Brit. Telecom Tech. J., vol. 8, no. 3, pp. 127–135, 1990.
[3] W. N. Lie, “Automatic target segmentation by locally adaptive image thresholding,” IEEE Trans. Image Process., vol. 4, no. 7, pp. 1036–1041, Jul. 1995.
[4] P. A. Couvignou, N. P. Papanikolopoulos, and P. K. Khosla, “On the use of snakes for 3-D robotic visual tracking,” in Proc. IEEE CVPR, 1993, pp. 750–751.
[5] T. McInerney and D. Terzopoulos, “Deformable models in medical image analysis: A survey,” Med. Imag. Anal., vol. 1, no. 2, pp. 91–108, 1996.
[6] A. K. Jain, S. P. Smith, and E. Backer, “Segmentation of muscle cell pictures: A preliminary study,” IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-2, no. 3, pp. 232–242, Mar. 1980.
[7] Y. L. Fok, J. C. K. Chan, and R. T. Chin, “Automated analysis of nerve-cell images using active contour models,” IEEE Trans. Med. Imag., vol. 15, no. 3, pp. 353–368, Jun. 1996.
[8] B. Ginneken, A. F. Frangi, J. J. Staal, B. M. Haar Romeny, and M. A. Viergever, “Active shape model segmentation with optimal features,” IEEE Trans. Med. Imag., vol. 21, no. 8, pp. 924–933, Aug. 2002.
[9] D. Marr and E. Hildreth, “A theory of edge detection,” Proc. Roy. Soc. B, vol. 207, pp. 187–217, 1980.
[10] M. Kass, A. Witkin, and D. Terzopoulos, “Snakes: Active contour models,” Int. J. Comput. Vis., vol. 1, no. 4, pp. 321–331, 1987.
[11] A. A. Amini, S. Tehrani, and T. E. Weymouth, “Using dynamic programming for minimizing the energy of active contours in the presence of hard constraints,” in Proc. Int. Conf. Computer Vision, 1988, pp. 95–99.
[12] A. L. Yuille, D. S. Cohen, and P. W. Hallinan, “Feature extraction from faces using deformable templates,” in Proc. IEEE CVPR, 1989, pp. 104–109.
[13] D. J. Williams and M. Shah, “A fast algorithm for active contours and curvature estimation,” Comput. Vis. Graph. Image Process., vol. 55, pp. 14–26, 1992.
[14] J. Ivins and J. Porrill, “Active region models for segmenting medical images,” in Proc. IEEE Int. Conf. Image Processing, 1994, p. 227.
[15] J. S. Marques and A. J. Abrantes, “A class of probabilistic shape models,” in Proc. IEEE CVPR, 1997, pp. 1054–1059.
[16] W. Abd-Almageed and C. Smith, “Active deformable models using density estimation,” Int. J. Image Graph., pp. 343–361, Jul. 2004.
[17] H. Yan, “Fuzzy curve-tracing algorithm,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 31, no. 5, pp. 768–780, Oct. 2001.
[18] H. Yan, “Convergence condition and efficient implementation of the fuzzy curve-tracing (FCT) algorithm,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 33, no. 1, pp. 1–10, Feb. 2003.
[19] M. Wang, J. Evans, L. Hassebrook, and C. Knapp, “A multistage, optimal active contour model,” IEEE Trans. Image Process., vol. 5, no. 11, pp. 1586–1591, Nov. 1996.
[20] N. S. Jayant and P. Noll, Digital Coding of Waveforms—Principles and Applications to Speech and Video. Englewood Cliffs, NJ: Prentice-Hall, 1984.
[21] K. F. Lai and R. T. Chin, “Deformable contours: Modeling, extraction, detection and classification,” Ph.D. dissertation, Dept. Elect. Comput. Eng., Univ. Wisconsin, 1994.
[22] W. Neuenschwander, P. Fua, G. Szekely, and O. Kubler, “Initializing snakes,” in Proc. IEEE Computer Vision and Pattern Recognition, Jun. 1994, pp. 658–663.
[23] T. Kanade, “Picture processing by computer complex and recognition of human faces,” Dept. Inf. Sci., Kyoto Univ., Kyoto, Japan, 1973.
[24] K. M. Lam, “Computerized human face recognition,” Ph.D. dissertation, School Elect. Inf. Eng., Univ. Sydney, Sydney, Australia, Apr. 1996.
[25] The ORL Database of Faces, AT&T Laboratories Cambridge, Cambridge, U.K. [Online]. Available: ftp://ftp.uk.research.att.com/pub/data/orl_faces.tar.Z
[26] M. Sakalli and H. Yan, “Feature-based compression of human face images,” Opt. Eng., vol. 37, pp. 1520–1529, 1998.
[27] H. Wu, J. Liu, and C. Chui, “A wavelet-frame based image force model for active contouring algorithms,” IEEE Trans. Image Process., vol. 9, no. 11, pp. 1983–1988, Nov. 2000.
[28] C. Xu and J. L. Prince, “Gradient vector flow: A new external force for snakes,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 1997, pp. 66–71.
[29] B. Achermann, Inst. für Informatik und angewandte Mathematik, Univ. Bern, Bern, Germany. Profiles.tar.gz, located at iamftp.unibe.ch in /pub/Images/FaceImages/.
Mustafa Sakalli (M’88) received the B.S. degree in electronics and communications from Istanbul Technical University, Istanbul, Turkey, in 1980, the M.S. degree in biomedical engineering from Bogazici University, Istanbul, in 1988, and the Ph.D. degree from the School of Electrical and Information Engineering, University of Sydney, Sydney, Australia, in 1999. Prior to this, he was a Biomedical Design and Research Engineer with the Applied Research Group, Royal Prince Alfred Hospital, and with Telectronics Pacing Systems, Ltd., Australia. After receiving the Ph.D. degree, he was a Postdoctoral Fellow with INRIA, Rennes, France, and then joined Netas, Nortel Networks, as a Communications Research Engineer, where he worked on high-speed optical communication protocols and error-correcting codes. He has also held visiting academic positions as an Assistant Professor and Research Fellow with the Department of Electronic and Information Engineering, Hong Kong Polytechnic University, and with the School of Electrical and Information Engineering, University of Sydney. Currently, he is a Visiting Academic Fellow at ECSE, Rensselaer Polytechnic Institute, Troy, NY, working on SPIHT and SPECK compression algorithms. His interests include cryptographic and biologically inspired algorithms, statistical approaches in image processing, video encoding, variational methods and convex optimization, high-speed communication protocols, optical switches, and the modeling of insect vision and flight.
Kin-Man Lam (M’96) received the Associateship in Electronic Engineering (with distinction) from The Hong Kong Polytechnic University (formerly Hong Kong Polytechnic) in 1986, the M.Sc. degree in communication engineering from the Department of Electrical Engineering, Imperial College of Science, Technology and Medicine, London, U.K., in 1987, under the S. L. Poa Scholarship for overseas studies, and the Ph.D. degree from the Department of Electrical Engineering, University of Sydney, Sydney, Australia, in August 1996. He joined the Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, as an Assistant Professor in October 1996, and became an Associate Professor in February 1999. His current research interests include human face recognition, image and video processing, and computer vision. Dr. Lam won an Australian Postgraduate Award and the IBM Australia Research Student Project Prize. He is actively involved in professional activities. Currently, he is the Treasurer of the IEEE Hong Kong Chapter of Signal Processing, and a member of the program committees of the Advanced Concepts for Intelligent Vision Systems conference (ACIVS 2004), the 8th International Conference on Control, Automation, Robotics, and Vision (ICARCV 2004), and the IASTED International Conference on Internet and Multimedia Systems and Applications (EuroIMSA 2005). He was also the Technical Chair of the 2004 International Symposium on Intelligent Multimedia, Video, and Speech Processing (ISIMP 2004) and a Technical Co-Chair of the 2005 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS 2005). In addition, he was a Guest Editor for the Special Issue on Biometric Signal Processing of the EURASIP Journal on Applied Signal Processing.
Hong Yan (SM’93) received the B.E. degree from the Nanking Institute of Posts and Telecommunications, Nanking, China, in 1982, the M.S.E. degree from the University of Michigan, Ann Arbor, in 1984, and the Ph.D. degree from Yale University, New Haven, CT, in 1989, all in electrical engineering. From 1982 to 1983, he worked on signal detection and estimation as a graduate student and Research Assistant with Tsinghua University, China. From 1986 to 1989, he was a Research Scientist at General Network Corporation, New Haven, where he worked on the design and optimization of computer and telecommunications networks. He joined the University of Sydney, Sydney, Australia, in 1989, and became Professor of imaging science in 1997. He is currently a Professor of computer engineering at the City University of Hong Kong. His research interests include image processing, pattern recognition, and bioinformatics. He is the author, coauthor, or editor of two books and 300 journal and conference papers in these areas. Dr. Yan is a Fellow of the International Association for Pattern Recognition (IAPR); a Fellow of the Institution of Engineers, Australia (IEAust); and a member of the International Society for Computational Biology (ISCB).