Pattern Recognition Letters 29 (2008) 126–141
www.elsevier.com/locate/patrec

Parametric active contours for object tracking based on matching degree image of object contour points

Qiang Chen a,*, Quan-Sen Sun a, Pheng-Ann Heng b,c, De-Shen Xia a

a School of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing 210094, China
b Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong
c Shenzhen Institute of Advanced Integration Technology, Chinese Academy of Sciences/The Chinese University of Hong Kong

Received 25 July 2006; received in revised form 11 August 2007
Available online 1 October 2007
Communicated by H.H.S. Ip
Abstract

A parametric active contour model is presented for object tracking based on the matching degree image of object contour points. We first construct a matching degree image according to the object contour points, and then track the object using parametric active contours. This paper presents a new feature matching approach and a new directional filter. Assuming that the motion of objects between frames is small, we constrain the motion of the object contour to a band around the contour, constructed with the narrow-band generation method of the level set method. Experimental results demonstrate that our method can effectively track rigid and non-rigid objects. We apply the proposed tracking method to face tracking, outer contour segmentation of left ventricle magnetic resonance (MR) images, and brain stem segmentation of Chinese visible human datasets, which demonstrates that our method is feasible for practical applications.
© 2007 Elsevier B.V. All rights reserved.

Keywords: Object tracking; Feature matching; Active contour; Face tracking; MR image segmentation; Chinese visible human
1. Introduction

Object tracking is an important task in many computer vision applications such as driver assistance (Handmann et al., 1998; Avidan, 2001), video surveillance (Aggarwal and Cai, 1999; Gavrila, 1999; Kettnaker and Zabih, 1999), and object-based video compression (Lee et al., 1997; Bue et al., 2002). Various methods have been proposed and improved, from simple, rigid object tracking with a static camera to complex, non-rigid object tracking with a moving camera. For ease of discussion, we classify these methods into two categories: region-based methods and contour-based methods.
* Corresponding author. E-mail address: [email protected] (Q. Chen).
doi:10.1016/j.patrec.2007.09.009
The basic idea of region-based methods is to track the object based on a similarity measure between object regions. The Bhattacharyya coefficient and the Kullback–Leibler divergence are two popular similarity measures, and the mean-shift algorithm has achieved considerable success in similarity region search due to its simplicity and robustness. The real-time kernel-based object tracking proposed by Comaniciu et al. (2003) can successfully track partially occluded non-rigid objects, but cannot cope with large deformations of the object contour. A more discriminative similarity measure in spatial-feature space was proposed by Yang et al. (2005): a symmetric similarity function between spatially smoothed kernel-density estimates of the model and target distributions. To cope with occlusions effectively, the Kalman filter (Comaniciu et al., 2003; Zhu et al., 2002) and the particle filter (Chang and Ansari, 2005; Deguchi et al., 2004) are often combined with the mean-shift algorithm. In clutter, there are typically several
competing observations, and these tend to encourage methods based on a non-Gaussian state density. Kalman filtering is inadequate here because it is based on Gaussian densities, which are unimodal and cannot represent simultaneous alternative hypotheses. CONDENSATION (Conditional Density Propagation), proposed by Isard and Blake (1998), aims at tracking curves in dense visual clutter. Foreground detection techniques, which identify for each frame of the video which pixels belong to moving regions, are also widely used for tracking objects. Cheng and Chen (2006) proposed a real-time method for tracking and identifying multiple objects based on the discrete wavelet transform, which uses color and spatial information to identify moving objects. Conte et al. (2006) proposed a graph-based multi-resolution algorithm for tracking objects in the presence of occlusions.

For contour-based object tracking, active contour models, such as the parametric active contour model (snake) (Peterfreund, 1999; Sun et al., 2003; Park et al., 2005) and the geometric active contour model (level set) (Feghali and Mitiche, 2004; Paragios and Deriche, 2005; Niethammer et al., 2006; Bunyak et al., 2007; Rathi et al., 2007), are widely used. Peterfreund presented the velocity snake (Peterfreund, 1999) and Kalman snake (Peterfreund, 1999) models, in which the energy function is mainly constructed from optical flow. Yilmaz et al. (2004) incorporated prior shape into object energy functions and used level sets to evolve the contour by minimizing the energy functional. Contour-based methods can achieve a high tracking precision, but their robustness is usually lower than that of region-based methods. Furthermore, the computational cost of contour-based methods is usually high, especially for large and fast-moving objects.

Combining the region-based and contour-based approaches, we present an object tracking method based on a matching degree image of object contour points, which consists of the following steps:

(1) According to an initial object contour, generate a narrow band in the current frame I^k and determine the nearest object contour point of the previous frame I^{k-1} for each point within the narrow band.
(2) In a feature space constructed from intensity, texture, etc., calculate the optimal matching degree between each point of the current frame I^k within the narrow band and several nearby object contour points of the previous frame I^{k-1}, which yields the matching degree image of object contour points.
(3) Smooth the matching degree image using the proposed directional filter.
(4) Based on the smoothed matching degree image, adopt the parametric active contour model to track the object contour.

Fig. 1 shows the whole processing flow of our method.

Fig. 1. Flow chart of our tracking method (video input → initialization of the object contour → generation of narrow band → generation of matching degree image → directional filter → snake model → tracking results).

The paper is organized as follows. Section 2 introduces the generation of the matching degree image of object contour points. In Section 3, a new directional filter is constructed to smooth the matching degree image. Section 4 proposes the parametric active contour model for object tracking based on the matching degree image. Section 5 shows some experimental results. Three applications of our tracking method are demonstrated in Section 6. Finally, conclusions are drawn.

2. Generation of matching degree image of object contour points

The basic idea of object tracking by matching object contour points is to find, for each object contour point of the previous frame, its position in the current frame. Due to the continuity and limited speed of object motion, the displacement between two consecutive frames is not too large, so we can restrict the search region of the object contour points to a band around the contour. For each object contour point, we only have to search its nearby points in the narrow band, which reduces the computational complexity and improves the matching precision. We construct the feature space from intensity, texture, etc. and calculate the matching degree between each point in the narrow band and its nearby object contour points; the optimal value is taken as the feature matching degree of this point. To improve the tracking precision, we constrain the feature matching degree by the gradient image, which gives the final matching degree image.

2.1. Generation of narrow band

Osher and Sethian (1988) presented the level set method. To reduce its computational complexity, Adalsteinsson and Sethian (1995) presented the narrow band level set method. While generating the narrow band, the corresponding object contour point for each point in the narrow band is determined. Fig. 2 shows a narrow band with a radius of 3 pixels, generated with the approach in (Chen et al., 2004), where the white points are object contour points and the different color regions correspond to different object contour points. In other words, for each object contour point, there are some nearest points in the narrow band that belong to the same color region.

Fig. 2. A narrow band.

The radius of the narrow band can be determined according to the approximate displacement of the object between two consecutive frames: if the displacement is small, the radius can be set small, and vice versa. For the traditional snake model, the initial contour must, in general, be close to the true boundary or it will likely converge to a wrong result. Because our method is based on the snake model, and the initial contour of the current frame is the tracking result of the previous frame, the displacement of the object between two consecutive frames must be small. In this paper, we assume that the displacement is smaller than 5 pixels, so the radius of the narrow band is set to 5 pixels.
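The paper's MATLAB implementation is not listed; as a concrete illustration of this step, the sketch below builds a narrow band of radius 5 and the per-pixel nearest contour point with a Euclidean distance transform. The function name and the use of SciPy are our assumptions, not the authors' code.

from scipy.ndimage import distance_transform_edt

def narrow_band(contour_mask, radius=5):
    """Build a narrow band of the given radius around a binary contour.

    contour_mask : 2-D bool array, True exactly at object contour points.
    Returns (band, nearest): `band` is a bool mask of all pixels within
    `radius` pixels of the contour, and `nearest` is a (2, H, W) index
    array giving the row/column of each pixel's closest contour point.
    """
    # The EDT measures the distance to the nearest zero pixel, so we feed
    # it the complement of the contour mask; return_indices also yields
    # the coordinates of that nearest contour point for every pixel.
    dist, nearest = distance_transform_edt(~contour_mask, return_indices=True)
    band = dist <= radius
    return band, nearest

The `nearest` index map plays the role of the colored regions in Fig. 2: each band pixel (i, j) is associated with the contour point (nearest[0][i, j], nearest[1][i, j]).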
2.2. Generation of matching degree image

Let the feature space be F = {F1, F2, ..., Fn}, where Fi (i = 1, ..., n) is a feature image generated from intensity, texture, etc. Under the assumption that the displacement of the object between two consecutive frames is small, the displacement of each object contour point is also small. Assuming that the displacement of each object contour point is smaller than 5 pixels, for each point (i, j) in the narrow band we calculate the feature matching degree with its three closest object contour points, i.e. those with the shortest Euclidean distance to the current point, d_m(i, j), m = 1, 2, 3:

d_m(i,j) = \left[ \sum_{l=1}^{n} \sum_{dx=-r}^{r} \sum_{dy=-r}^{r} \left( F_l^k(i+dx,\, j+dy) - F_l^{k-1}(i_m+dx,\, j_m+dy) \right)^2 \right]^{1/2} \qquad (1)
where F^k and F^{k-1} denote the feature spaces of the current frame and the previous frame, respectively, and (i_m, j_m) is the coordinate of the mth object contour point. r is the radius of the matching window. If r is too small, the matching degree will not have a regional characteristic and cannot effectively distinguish different object contour points. If r is too large, the computing cost will be high with no significant improvement in the matching effect. Balancing computing cost against matching effect, we set r from 3 to 5. The optimal matching degree of point (i, j) is

d(i,j) = \frac{\min_m \left( d_m(i,j) \right)}{\operatorname{var}\!\left( I^k(i,j) \right)} \qquad (2)

where var(I^k(i, j)) denotes the variance within the current matching window of the original intensity image I^k. In Eq. (2), dividing by the intensity variance prevents the object contour points from running into the interior of objects or into background regions with homogeneous intensity, since the intensity in the vicinity of the object contour is commonly inhomogeneous.

To improve the tracking precision when tracking the object by matching object contour points, we use the gradient image of the current frame to constrain the optimal matching degree. Let the gradient image of the current frame be G^k; the final matching degree of point (i, j) is

\tilde{d}(i,j) = \frac{d(i,j)}{\max(d)} + k \left( 1 - \frac{G^k(i,j)}{\max(G^k)} \right) \qquad (3)

Fig. 3. Generation of matching degree image.
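To make Eqs. (1)–(3) concrete, here is a minimal sketch under our reading of the text: a windowed sum of squared feature differences against the three closest contour points, the minimum divided by the local intensity variance, and the gradient term with weight k = 0.3. All names are illustrative, border handling is omitted, and this is not the authors' implementation.

import numpy as np

def matching_degree(F_prev, F_cur, I_cur, G_cur, band, contour_pts, r=3, k=0.3):
    """Matching degree image of object contour points, per Eqs. (1)-(3).

    F_prev, F_cur : lists of 2-D feature images of frames k-1 and k.
    I_cur, G_cur  : intensity and gradient-magnitude images of frame k.
    band          : bool mask of the narrow band in frame k.
    contour_pts   : (N, 2) int array of contour point (row, col) coordinates
                    in frame k-1.
    """
    d = np.full(I_cur.shape, np.inf)
    for i, j in zip(*np.nonzero(band)):
        # the three contour points with the shortest Euclidean distance
        dist2 = ((contour_pts - (i, j)) ** 2).sum(axis=1)
        best = np.inf
        for im, jm in contour_pts[np.argsort(dist2)[:3]]:
            # Eq. (1): windowed sum of squared feature differences
            # (window clipping at the image border is omitted for brevity)
            ssd = sum(((Fc[i-r:i+r+1, j-r:j+r+1]
                        - Fp[im-r:im+r+1, jm-r:jm+r+1]) ** 2).sum()
                      for Fc, Fp in zip(F_cur, F_prev))
            best = min(best, np.sqrt(ssd))
        # Eq. (2): divide the optimal match by the local intensity variance
        d[i, j] = best / (np.var(I_cur[i-r:i+r+1, j-r:j+r+1]) + 1e-8)
    # Eq. (3): add the gradient constraint to the normalized matching degree
    d_tilde = d / d[band].max() + k * (1.0 - G_cur / G_cur.max())
    return np.where(band, d_tilde, np.inf)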
Here we have normalized the gradient image. The parameter k is a weight governing the tradeoff between the feature image and the gradient image. If the background is simple and the object contour is obvious, a large k should be set. If the background is complex or the object contour is obscure (namely with small gradient), k should be set small. In this paper, we set k = 0.3. The effect of the gradient image is to make the evolving contour approach regions with relatively large gradient values. Since the gradient near the object contour is normally large, the gradient image can rectify errors of the optimal matching degree and thus improve the tracking precision to some extent. In clutter, background regions with large gradients may exist near the object contour, so k should be set small.

The matching degree is based on the feature space. Because the illumination variation and the displacement of the object between two consecutive frames are small, only the intensity information is used to construct the feature space in this paper. If the illumination variation between two consecutive frames is large, some illumination-invariant features should be used, such as the gradient features gx = ∂I/∂x and gy = ∂I/∂y. If the rotation variation between two consecutive frames is large, some rotation-invariant features should be used, such as texture features or the intensity variance. Therefore, we present a uniform construction framework for the feature space, which can be built from different information according to different application conditions.
Fig. 3 shows a matching degree image (Fig. 3b) of object contour points generated using Eq. (3). Here, we only adopt the intensity image to construct the feature space. The radii of both the narrow band and the matching window are 5 pixels. The blacker a pixel is in Fig. 3b, the better its match with the object contour points. From Fig. 3b, we can observe that the object contour can be tracked by finding the black points with an optimal matching degree in the narrow band. In the following sections, we introduce the active contour method for finding the object contour in detail. The generation time of the matching degree image (Fig. 3b) is 3.1 s with a MATLAB 7 implementation; it depends mainly on the number of object contour points, the radii of the narrow band and the matching window, and the dimension of the feature space. For Fig. 3b, the number of object contour points is about 265, and the dimension of the feature space is 1.

3. Directional filter

It can be seen from Fig. 3b that though most object contour points are matched properly in the current frame, some are matched improperly because of noise, weak boundaries between object and background, large changes of the object contour between two consecutive frames, etc., for example in the wheel region of the car. In order to remove noise while preserving the correct matching region, we present a new directional filter based on the direction of object contour points.
In order to preserve image edges, non-linear diffusion filtering (Perona and Malik, 1990; Rudin et al., 1992; Weickert et al., 1998) is widely used for image smoothing. For fingerprint recognition, directional filters (O'Gorman and Nickerson, 1989; Hong et al., 1998) are used to enhance fingerprint images so as to facilitate feature extraction and recognition. It is critical for such smoothing and enhancement methods to correctly estimate the direction of each point in the narrow band. In this paper, we do not adopt the traditional estimation method based on the image gradient, but estimate the direction according to the direction of the object contour points. Our direction estimation algorithm is as follows. Let {p_i}, i = 1, ..., N, be the set of object contour points, sampled with a Euclidean distance a (1 ≤ a ≤ 3) along the object contour. Fig. 4 illustrates the direction calculation of the object contour point p_i. The unit direction vector of p_i is the vector from p_i to v_i, which is parallel to the line through p_{i-1} and p_{i+1}. Let v_{ix} and v_{iy} be the horizontal and vertical projections of this vector, respectively. Then the direction angle of p_i is θ_i = arctan(v_{iy}/v_{ix}), and the directions of the other points on the object contour are determined similarly. After calculating the directions of the object contour points, we can determine the directions of all points in the narrow band by extending the directions of the object contour points.

Fig. 4. Direction calculation of object contour points.
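A minimal sketch of this direction estimate, assuming a closed contour stored as an (N, 2) point array; we use arctan2 rather than the paper's arctan(v_{iy}/v_{ix}) to handle vertical chords, which is a small deviation from the text.

import numpy as np

def contour_directions(pts):
    """Direction angle of each point of a closed, evenly sampled contour.

    pts : (N, 2) array of (x, y) contour points {p_i}.
    The direction of p_i is taken parallel to the chord from p_{i-1} to
    p_{i+1}, as in Fig. 4.
    """
    v = np.roll(pts, -1, axis=0) - np.roll(pts, 1, axis=0)  # p_{i+1} - p_{i-1}
    return np.arctan2(v[:, 1], v[:, 0])                     # theta_i

Each narrow-band pixel then inherits the angle of its corresponding contour point, e.g. through the `nearest` index map of the narrow-band sketch above, yielding a direction field like the one in Fig. 5.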
Fig. 8. Tracking results of the traffic intersection sequence. The frames 1, 5, 10, 15, 20, 25, 32, 40 and 50 are shown.
In the generation process of the narrow band, the corresponding object contour point for each point in the narrow band is determined, so the direction of each point in the narrow band can be taken as the direction of its corresponding object contour point. Fig. 5 shows the direction field in the narrow band of Fig. 3b, which is parallel to the direction of the initial object contour. Due to the assumption that the deformation of the object between two consecutive frames is small, the directions of the object contour points in two consecutive frames are similar. Therefore, the direction of the matching degree image should be similar to the direction of the object contour points, in order to remove noise while preserving the correct matching region. Fig. 5 indicates that the direction of each point in the narrow band is indeed similar to the direction of the object contour points, which makes it suitable for smoothing the matching degree image.

Fig. 5. Direction field.

After determining the direction field, we can adopt various non-linear diffusion filters to smooth the matching degree image. We adopt a template-based smoothing method with a Gaussian lowpass filter. Let h be the filter template; then

h_g(n_1, n_2) = e^{-(n_1^2 + n_2^2)/(2\sigma^2)}, \qquad h(n_1, n_2) = \frac{h_g(n_1, n_2)}{\sum_{n_1} \sum_{n_2} h_g(n_1, n_2)} \qquad (4)
where n1 and n2 are the row and column indexes, and σ is the standard deviation. Let the numbers of rows and columns in h be nr and nc, respectively.
Fig. 9. Tracking results of the flame.
For a horizontal filter template, nr must be smaller than nc. In this paper, we set nr = 1, nc = 5, and σ = 1 to generate the horizontal (direction angle 0°) filter template h = [0.0545, 0.2442, 0.4026, 0.2442, 0.0545]. Filter templates for other directions are generated by rotating h by the direction angle θ. For example, the filter template with direction angle 90° is [0.0545, 0.2442, 0.4026, 0.2442, 0.0545]′, generated by rotating h by 90°. We use the corresponding template to smooth each point in the narrow band. Fig. 6a shows the smoothing result of Fig. 3b, and Fig. 6b and c show the same part of Fig. 3b and Fig. 6a, respectively. The generation time of the smoothing result (Fig. 6a) is 0.28 s. From Fig. 6 we can observe that our directional filter reduces the noise whose direction differs from the direction of the object contour points, and preserves the correct matching regions whose direction is similar to it. If a traditional directional filter or a non-linear diffusion filter based on image gradients were adopted to smooth the matching degree image, noise with large gradients could not be smoothed out.

Fig. 6. Smoothing result with our directional filter.
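The rotated-template smoothing can be sketched as follows. Instead of storing one rotated template per angle, this version (our reformulation, not the authors' code) samples nc points along each band pixel's direction with bilinear interpolation and weights them with the normalized 1-D Gaussian of Eq. (4); for nc = 5 and σ = 1 the weights reproduce [0.0545, 0.2442, 0.4026, 0.2442, 0.0545].

import numpy as np
from scipy.ndimage import map_coordinates

def directional_smooth(M, band, theta, nc=5, sigma=1.0):
    """Smooth the matching degree image M along the local contour direction.

    M     : matching degree image.
    band  : bool mask of the narrow band.
    theta : per-pixel direction angles (each band pixel carries the angle
            of its corresponding object contour point).
    """
    n = np.arange(nc) - nc // 2                 # offsets -2 .. 2 for nc = 5
    h = np.exp(-n**2 / (2.0 * sigma**2))
    h /= h.sum()                                # normalization of Eq. (4)
    ii, jj = np.nonzero(band)
    ang = theta[ii, jj]
    # sample coordinates along (cos(theta), sin(theta)) for every band pixel
    rows = ii[:, None] + np.outer(np.sin(ang), n)
    cols = jj[:, None] + np.outer(np.cos(ang), n)
    samples = map_coordinates(M.astype(float), [rows.ravel(), cols.ravel()],
                              order=1, mode='nearest').reshape(rows.shape)
    out = M.astype(float)
    out[ii, jj] = samples @ h                   # weighted directional average
    return out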
4. Object contour evolution based on parametric active contour model

A traditional parametric active contour is a curve X(s) = [x(s), y(s)], s ∈ [0, 1], which moves through the spatial domain of an image to minimize the energy functional

E_{snake} = \int_0^1 \frac{1}{2} \left( \alpha |X'(s)|^2 + \beta |X''(s)|^2 \right) + E_{ext}(X(s)) \, ds \qquad (5)
where α and β are weighting parameters that control the snake's tension and rigidity, respectively, and X′(s) and X″(s) denote the first and second derivatives of X(s) with respect to s, which give rise to the internal forces coming from within the curve itself. The internal forces hold the curve together (elasticity forces) and keep it from bending too much (bending forces). The external energy function Eext is derived from the image so that it takes on its smaller values at the features of interest, such as image intensities and boundaries. In this paper, the external energy function Eext is designed to lead the active contour toward edges of the matching degree image:
Fig. 10. Tracking results of the flower garden sequence without texture information. The frames 1, 10, 20 and 30 are shown.
E_{ext}(x, y) = -\left| \nabla M(x, y) \right|^2 \qquad (6)
where M(x, y) denotes the smoothed matching degree image. Kass et al. (1987) presented the numerical solution of the energy functional (5):

x_t = (A + \gamma I)^{-1} \left( \gamma x_{t-1} - f_x(x_{t-1}, y_{t-1}) \right), \qquad y_t = (A + \gamma I)^{-1} \left( \gamma y_{t-1} - f_y(x_{t-1}, y_{t-1}) \right) \qquad (7)
where A is a pentadiagonal banded matrix composed of α and β, γ is a step size, and fx(i) = ∂Eext/∂xi, fy(i) = ∂Eext/∂yi. For more details about the numerical solution, we refer the reader to Kass et al. (1987).

Fig. 7 shows the tracking result: Fig. 7a shows the original image of the second frame with the tracked object contour, and Fig. 7b shows the matching degree image with the initial object contour (blue line) and the tracked object contour (red line). The time for the object contour evolution is 1.1 s. Fig. 7 demonstrates that the tracked object contour is close to the optimal matching degree points and, due to the influence of the internal forces of the snake model, the tracked object contour is smooth and connected, which is consistent with the characteristics of image sequences.

Fig. 7. Tracking result of Fig. 3.
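For illustration, the following sketch implements the closed-snake iteration of Eq. (7) on the smoothed matching degree image, with the external force derived from Eq. (6). The circulant pentadiagonal form of A and the nearest-pixel force sampling are standard choices we assume; the paper does not list its implementation.

import numpy as np

def snake_track(M, x, y, alpha=0.05, beta=0.02, gamma=1.0, iters=200):
    """Evolve a closed snake on the smoothed matching degree image M.

    x, y : 1-D float arrays of contour coordinates (the previous frame's
           result serves as the initialization, as in the tracking pipeline).
    """
    N = len(x)
    # external energy E_ext = -|grad M|^2 (Eq. (6)) and its partial
    # derivatives f_x, f_y (the external force components)
    gy, gx = np.gradient(M.astype(float))
    e_ext = -(gx**2 + gy**2)
    fy, fx = np.gradient(e_ext)
    # pentadiagonal circulant matrix A built from alpha (tension) and
    # beta (rigidity) for a closed contour
    A = np.zeros((N, N))
    for i in range(N):
        A[i, i] = 2*alpha + 6*beta
        A[i, (i - 1) % N] = A[i, (i + 1) % N] = -alpha - 4*beta
        A[i, (i - 2) % N] = A[i, (i + 2) % N] = beta
    inv = np.linalg.inv(A + gamma * np.eye(N))
    for _ in range(iters):
        # nearest-pixel sampling of the force field (bilinear interpolation
        # would be smoother); images are indexed as [row, col] = [y, x]
        xi = np.clip(np.round(x).astype(int), 0, M.shape[1] - 1)
        yi = np.clip(np.round(y).astype(int), 0, M.shape[0] - 1)
        x = inv @ (gamma * x - fx[yi, xi])   # Eq. (7)
        y = inv @ (gamma * y - fy[yi, xi])
    return x, y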
5. Experimental results and analysis

We have tested our method on various sequences and compared it with some methods based on active contours. In this paper, the feature space is constructed from the intensity image only, because the experimental results have demonstrated that the intensity feature together with the gradient constraint is good enough to track objects effectively. Unless otherwise noted, the parameters of the snake model are set as follows: α = 0.05, β = 0.02, γ = 1. Our experiments were performed on a 2.8 GHz Pentium 4 PC with 512 MB memory, and the algorithm is implemented in MATLAB.

5.1. Tracking of rigid and non-rigid objects

Fig. 8 shows the tracking results of the traffic intersection sequence (http://i21www.ira.uka.de/image_sequences/); the image size of each frame is 178 × 220. The car is a rigid object, and its deformation is an affine transformation. The initial object contour in the first frame is given manually. As the car moves away from the camera, the evolving curve adaptively shrinks to track the object contour precisely.
Fig. 11. Tracking results of the flower garden sequence with texture information. The frames 1, 10, 20 and 30 are shown.
Due to the effect of the variance and gradient images in the generation of the matching degree image, our method overcomes the "tail" phenomenon that exists in the original snake model, as shown in Fig. 10b. Fig. 9 shows the tracking results of the flame sequence provided by Cremers (http://www.cs.ucla.edu/~doretto/projects/dynamic-segmentation.html), a synthetic sequence generated by superimposing a sequence of fire on a sequence of ocean waves. The image size of each frame is 287 × 350.
The experiment is a very challenging one (Doretto et al., 2003), since the flame and the ocean waves are continuously changing in time. Our method obtains good tracking results when the change between two consecutive frames is not too sharp. The experimental results in Figs. 8 and 9 indicate that our method can effectively track rigid and non-rigid objects, because the parametric active contour model can effectively deal with object deformation.

5.2. Utilization of texture and color information

Figs. 10 and 11 show the tracking results of the flower garden sequence without and with texture information, respectively. The initialization of Fig. 10 is the same as that of Fig. 11, and the image size of each frame is 240 × 244. The parameters for the experiments shown in Figs. 10 and 11 are α = 0.1, β = 0.01. The generation method of the texture feature images in (Chen et al., 2007) was adopted. Let the texture feature image be FT, which is constructed as follows:

F_T = \frac{1}{4} \sum_{i=1}^{4} F_i \qquad (8)
where Fi is the feature image generated based on orientation and local variance. Fig. 12 shows the texture feature image of frame 1 of the flower garden sequence. From Fig. 12, we can observe that the texture feature image can effectively distinguish the object (tree) from the background.

Fig. 12. Texture feature image of the frame 1.
Fig. 13. Tracking results of a color image sequence. The frames 1, 4, 7 and 10 are shown.
For Fig. 11, the feature space is F = {I, FT}. The object contour points in Fig. 10 run into the background, whereas the tracking results in Fig. 11 are better. Because the variance and gradient of the textured background are large, the object contour points easily run into the background; the texture feature image overcomes this problem to a certain extent. Comparing Fig. 10 with Fig. 11, we can observe that texture information is useful for object tracking against textured backgrounds. Fig. 13 shows the tracking result of a color image sequence. The image size of each frame is 240 × 284, and the feature space is F = {R, G, B}, where R, G and B are the three color components. For Fig. 13, the variance term in Eq. (2) is

\operatorname{var}(I^k(i,j)) = \prod_{f \in \{R,G,B\}} \operatorname{var}(I_f^k(i,j))

The experimental results above indicate that our method provides a uniform construction framework for the feature space, which can be built from different information (such as texture and color information) according to different application conditions.
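The uniform feature-space framework can be summarized in a short sketch (ours; names are illustrative) covering the three configurations used in the experiments: F = {I}, F = {I, FT} via Eq. (8), and F = {R, G, B} together with the product-of-variances term used for Fig. 13.

import numpy as np

def build_feature_space(frame, texture_feats=None):
    """Uniform feature-space construction.

    frame : 2-D gray image, or (H, W, 3) color image.
    texture_feats : optional list of four orientation/local-variance
        feature images; their mean gives the texture image F_T of Eq. (8).
    """
    if frame.ndim == 3:                       # F = {R, G, B}
        return [frame[..., c].astype(float) for c in range(3)]
    F = [frame.astype(float)]                 # F = {I} or F = {I, F_T}
    if texture_feats is not None:
        F.append(np.mean(texture_feats, axis=0))   # Eq. (8)
    return F

def color_variance(F_rgb, i, j, r):
    """Variance term of Eq. (2) for color images: the product of the
    per-channel variances over the matching window."""
    return np.prod([np.var(Fc[i-r:i+r+1, j-r:j+r+1]) for Fc in F_rgb])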
5.3. Comparison of some active contour models

Fig. 14 shows a comparison of tracking results on the Hamburg taxi sequence; we give the tracking results after 19 frames. The initial frame and the corresponding initial contour are given in Fig. 14a. The initializations of Fig. 14b–f are similar to that of Fig. 14a. Fig. 14b–d show the tracking results of the original snake model, the batch-mode model with affine velocity, and the optical flow constraint-based model, respectively (see Fig. 1b, c and e in (Peterfreund, 1999), respectively). Fig. 14e shows the tracking result of the VSnake model (Sun et al., 2003). Fig. 14f shows the tracking result of our method, where the radii of the narrow band and the matching window are set to 5 pixels and 3 pixels, respectively. From Fig. 14, we can observe that both the original snake model and the VSnake model show a "tail", i.e. object contour points run into the interior of objects or into background regions with homogeneous intensity. Our method, the batch-mode model, and the optical flow constraint-based model lead to similar tracking results.

6. Applications

The proposed tracking method is applied to face tracking, outer contour segmentation of left ventricle MR images, and brain stem segmentation of Chinese visible human datasets.

6.1. Face tracking
Fig. 15 shows the tracking result of a face sequence of 240 × 320 pixels, downloaded from the homepage of Black (http://www.cs.brown.edu/people/black/). The intensity of the right part of the face is similar to that of the wall, and the boundary between the face and the wall is weak. With the traditional snake model, the evolving contour would leak out through the weak boundary, but our method overcomes this problem by exploiting the matching degree image.
Fig. 14. Comparison of tracking results of the Hamburg taxi sequences (after 19 frames).
Fig. 15. Face tracking results. The frames 1, 8, 15, 22, 29, 36, 43 and 51 are shown.
From Fig. 15, we can observe that our method can adaptively track the pose change of the face.

6.2. Outer contour segmentation of left ventricle MR images

For the automatic segmentation of sequential medical images, contour tracking methods have been adopted, such as
the segmentation of the aorta from CT images (Ip et al., 1997) and the segmentation of left ventricle MR images (Latson et al., 2001; Cho, 2004). Fig. 16 shows the tracking results of the outer contours of the left ventricle in MR temporal sequential images on an identical image layer. The tracking precision is similar to the segmentation precision in (Mitchell et al., 2001; de Bruijne et al., 2004; Chen et al., 2006).
Fig. 16. Tracking results of the outer contours of the left ventricle MR temporal sequential images on an identical image layer.
These segmentation methods are semi-automatic and involve onerous manual work, while our tracking method obtains the outer contours of the remaining frames automatically after the outer contour of the first frame is given manually. According to our segmentation experience (Chen et al., 2006), if only intensity and gradient information is used to construct the energy function, some problems, such as leakage of the evolving curve, still exist when the snake model is used to segment the outer contour of left ventricle MR images. For a detailed discussion of these problems, we refer the reader to Chen et al. (2006). To solve them, prior knowledge, such as a shape prior, can be used to constrain the evolution of the active contour. Though no prior knowledge is used here, our tracking method can precisely segment the outer contour of the left ventricle.

6.3. Brain stem segmentation of Chinese visible human datasets

Because the slice thickness of the Chinese visible human is 0.2 mm, the organs have fine continuity between two serial slices. This continuity can be seen as the deformation of temporal sequential images on an identical slice, so we can use the proposed tracking method to segment the serial slices. Fig. 17 shows the brain stem segmentation results of the Chinese visible human dataset. To facilitate the display,
the local region of 240 × 240 pixels containing the brain stem is shown. The Chinese visible human image has a section resolution of 3872 × 2048 pixels. In order to decrease the computational complexity, the image was reduced to a resolution of 1246 × 1271 pixels, and the color image was converted to a gray image. The initial object contour is given manually in frame 1527, and the following hundreds of frames are tracked automatically. Currently, Chinese visible human datasets are segmented manually by experts with anatomical knowledge. Our tracking method is almost automatic, and the segmentation precision is high for some simple organs.

The three applications of our tracking method indicate that it has an extensive application domain: it can track temporal sequential images and also spatial sequential images. Table 1 shows the average tracking time per frame, excluding the first frame, which is initialized by hand. The performance can be improved with a C++ implementation, which would yield near real-time performance. Due to the relatively high feature dimension and the relatively large number of object contour points, the average tracking time of Fig. 13 is a little high. In the tracking process, the computational complexity of generating the matching degree image is higher than that of the other steps, such as directional filtering and contour evolution. Thus, the dimension of the feature space has an obvious influence on the tracking time.
Fig. 17. Brain stem segmentation results of Chinese visible human serial slices. The frames 1527, 1536, 1553, 1565, 1575, 1585, 1598, 1615, 1630, 1660, 1690, 1720, 1760, 1800, 1830 and 1858 are shown.
Table 1
Average tracking time per frame (unit: s)

Fig. 8   Fig. 9   Fig. 11   Fig. 13   Fig. 15   Fig. 16   Fig. 17
1.2      2.5      5.5       13.2      1.6       2.2       2.1
7. Conclusion

This paper presents a parametric active contour model for object tracking based on the matching degree image. We first generate a matching degree image by matching object contour points, and smooth it using a directional filter constructed according to the direction of the object contour points. Then the snake model is used to track the object contour based on the smoothed matching degree image. The proposed matching degree function is effective for distinguishing object contour points, and the problem of object contour points without a good matching degree can be alleviated by the narrow band constraint, the directional filter, and the snake model's properties of continuity and smoothness. Our tracking method is based on the narrow band and the snake model, so the displacement of the object between two consecutive frames should be small; otherwise the radius of the narrow band must be large, which increases the computational complexity, and the snake model may converge to a local minimum.

The experimental results and applications above show that our method has good computational performance and high tracking precision for rigid and non-rigid objects that have a relatively small displacement between two consecutive frames and are not occluded. For tracking an object with a large displacement, we can first estimate the location of the object with a Kalman filter or localize the object with the kernel-based method (Comaniciu et al., 2003), and then use our method to track the object contour. For tracking an occluded object, we can first detect the occlusion according to the value of the matching degree, and then use some prior constraint, such as a shape prior, to solve the problem. Several applications of our tracking method demonstrate that it is effective and feasible for tracking objects with small displacement.
Acknowledgements

The authors sincerely thank the Prince of Wales Hospital for providing the cardiac MR images, and the Chinese University of Hong Kong and the Third Military Medical University for providing the processed Chinese visible human datasets. This work was supported by the Research Grants Council of the Hong Kong Special Administrative Region under Project CUHK 4461/05M and by a CUHK Direct Grant Allocation under Project 2050345. This research was also supported by the National Science Foundation of China under grant no. 60773172.

References

Adalsteinsson, D., Sethian, J.A., 1995. A fast level set method for propagating interfaces. J. Comput. Phys. 118 (2), 269–277.
Aggarwal, J., Cai, Q., 1999. Human motion analysis: a review. Computer Vision and Image Understanding 73 (3), 428–440.
Avidan, S., 2001. Support vector tracking. In: Proc. IEEE Conf. on Computer Vision and Pattern Recognition, vol. I. Kauai, Hawaii, pp. 184–191.
Bue, A.D., Comaniciu, D., Ramesh, V., Regazzoni, C., 2002. Smart cameras with real-time video object generation. In: Proc. IEEE International Conference on Image Processing, vol. III. Rochester, NY, pp. 429–432.
Bunyak, F., Palaniappan, K., Nath, S.K., Seetharaman, G., 2007. Geodesic active contour based fusion of visible and infrared video for persistent object tracking. In: IEEE Workshop on Applications of Computer Vision (WACV'07).
Chang, C., Ansari, R., 2005. Kernel particle filter for visual tracking. IEEE Signal Process. Lett. 12 (3), 242–245.
Chen, Q., Zhou, Z.M., Qu, Y.G., Heng, P.A., Xia, D.S., 2004. Level set based auto segmentation of the tagged left ventricle MR images. In: Medicine Meets Virtual Reality, vol. 12. IOS Press.
Chen, Q., Zhou, Z.M., Tang, M., Heng, P.A., Xia, D.S., 2006. Shape statistics variational approach for the outer contour segmentation of left ventricle MR images. IEEE Trans. Infor. Technol. Biomed. 10 (3), 588–597.
Chen, Q., Luo, J., Heng, P.A., Xia, D.S., 2007. Fast and active texture segmentation based on orientation and local variance. J. Visual Comm. Image Representation 18 (2), 119–129.
Cheng, F.H., Chen, Y.L., 2006. Real time multiple objects tracking and identification based on discrete wavelet transform. Pattern Recognition 39 (6), 1126–1139.
Cho, J., 2004. Sequential cardiac segmentation by seed contour tracking. Electron. Lett. 40 (23), 1467–1469.
Comaniciu, D., Ramesh, V., Meer, P., 2003. Kernel-based object tracking. IEEE Trans. Pattern Anal. Machine Intell. 25 (5), 564–577.
Conte, D., Foggia, P., Jolion, J.M., Vento, M., 2006. A graph-based multi-resolution algorithm for tracking objects in presence of occlusions. Pattern Recognition 39 (4), 562–572.
de Bruijne, M., Nielsen, M., 2004. Shape particle filtering for image segmentation. In: MICCAI 2004, LNCS, pp. 168–175.
Deguchi, K., Kawanaka, O., Okatani, T., 2004. Object tracking by the mean-shift of regional color distribution combined with the particle-filter algorithm. ICPR 3, 506–509.
Doretto, G., Cremers, D., Favaro, P., Soatto, S., 2003. Dynamic texture segmentation. In: Proc. of ICCV'03. Nice, France, pp. 1236–1242.
Feghali, R., Mitiche, A., 2004. Spatiotemporal motion boundary detection and motion boundary velocity estimation for tracking moving objects with a moving camera: a level sets PDEs approach with concurrent camera motion compensation. IEEE Trans. Image Process. 13 (11), 1473–1490.
Gavrila, D., 1999. The visual analysis of human movement: a survey. Computer Vision and Image Understanding 73 (1), 82–98.
Handmann, U., Kalinke, T., Tzomakas, C., Werner, M., von Seelen, W., 1998. Computer vision for driver assistance systems. In: Proc. SPIE, vol. 3364, pp. 136–147.
Hong, L., Wan, Y.F., Jain, A., 1998. Fingerprint image enhancement: algorithm and performance evaluation. IEEE Trans. Pattern Anal. Machine Intell. 20 (8), 777–789.
Ip, H.H.S., Hanka, R., Tang, H.Y., 1997. Segmentation of the aorta using a temporal active contour model with regularization scheduling. Proc. SPIE Medical Imaging 3043, 323–332.
Isard, M., Blake, A., 1998. CONDENSATION – conditional density propagation for visual tracking. Int. J. Comput. Vision 29 (1), 5–28.
Kass, M., Witkin, A., Terzopoulos, D., 1987. Snakes: active contour models. Int. J. Comput. Vision 1 (4), 321–331.
Kettnaker, V., Zabih, R., 1999. Bayesian multi-camera surveillance. In: Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Fort Collins, CO, pp. 253–259.
Latson, L.A., Powell, K.A., Sturm, B., Schvartzman, P.R., White, R.D., 2001. Clinical validation of an automated boundary tracking algorithm on cardiac MR images. Int. J. Cardiovascular Imaging 17 (4), 279–286.
Lee, M., Chen, W., Lin, B., Gu, C., Markoc, T., Zabinsky, S., Szeliski, R., 1997. A layered video object coding system using sprite and affine motion model. IEEE Trans. Circuits Systems Video Technol. 7 (1), 130–145.
Mitchell, S.C., Lelieveldt, B.P.F., van der Geest, R.J., Bosch, H.G., Reiber, J.H.C., Sonka, M., 2001. Multistage hybrid active appearance model matching: segmentation of left and right ventricles in cardiac MR images. IEEE Trans. Medical Imaging 20 (5), 415–423.
Niethammer, M., Tannenbaum, A., Angenent, S., 2006. Dynamic active contours for visual tracking. IEEE Trans. Automatic Control 51 (4), 562–579.
O'Gorman, L., Nickerson, J.V., 1989. An approach to fingerprint filter design. Pattern Recognition 22 (1), 29–38.
Osher, S., Sethian, J.A., 1988. Fronts propagating with curvature-dependent speed: algorithms based on Hamilton–Jacobi formulations. J. Comput. Phys. 79, 12–49.
Paragios, N., Deriche, R., 2005. Geodesic active regions and level set methods for motion estimation and tracking. Computer Vision and Image Understanding 97 (3), 259–282.
Park, S.C., Lim, S.H., Sin, B.K., Lee, S.W., 2005. Tracking non-rigid objects using probabilistic Hausdorff distance matching. Pattern Recognition 38 (12), 2373–2384.
Perona, P., Malik, J., 1990. Scale-space and edge detection using anisotropic diffusion. IEEE Trans. Pattern Anal. Machine Intell. 12 (7), 629–639.
Peterfreund, N., 1999. The velocity snake: deformable contour for tracking in spatio-velocity space. Computer Vision and Image Understanding 73 (3), 346–356.
Peterfreund, N., 1999. Robust tracking of position and velocity with Kalman snakes. IEEE Trans. Pattern Anal. Machine Intell. 21 (6), 564–569.
Rathi, Y., Vaswani, N., Tannenbaum, A., Yezzi, A., 2007. Tracking deforming objects using particle filtering for geometric active contours. IEEE Trans. Pattern Anal. Machine Intell. 29 (8), 1470–1475.
Rudin, L., Osher, S., Fatemi, E., 1992. Nonlinear total variation based noise removal algorithms. Physica D 60, 259–268.
Sun, S., Haynor, D.R., Kim, Y., 2003. Semiautomatic video object segmentation using VSnakes. IEEE Trans. Circuits Systems Video Technol. 13 (1), 75–82.
Weickert, J., ter Haar Romeny, B.M., Viergever, M.A., 1998. Efficient and reliable schemes for nonlinear diffusion filtering. IEEE Trans. Image Process. 7 (3), 398–410.
Yang, C., Duraiswami, R., Davis, L., 2005. Efficient mean-shift tracking via a new similarity measure. In: Proc. 2005 IEEE Comput. Society Conf. on Comput. Vision and Pattern Recognition (CVPR'05), vol. 1, pp. 176–183.
Yilmaz, A., Li, X., Shah, M., 2004. Contour-based object tracking with occlusion handling in video acquired using mobile cameras. IEEE Trans. Pattern Anal. Machine Intell. 26 (11), 1531–1536.
Zhu, Z., Ji, Q., Fujimura, K., 2002. Combining Kalman filtering and mean shift for real time eye tracking under active IR illumination. ICPR 4, 318–321.