IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 20, NO. 4, APRIL 2010

Two-Stage Object Tracking Method Based on Kernel and Active Contour Qiang Chen, Quan-Sen Sun, Pheng Ann Heng, Senior Member, IEEE, and De-Shen Xia

Abstract—This letter presents a two-stage object tracking method that combines a region-based method and a contour-based method. First, a kernel-based method is adopted to locate the object region. Then the diffusion snake is used to evolve the object contour in order to improve the tracking precision. In the object localization stage, the initial target position is predicted and evaluated by the Kalman filter and the Bhattacharyya coefficient, respectively. In the contour evolution stage, the active contour evolves on an object feature image generated from the color information in the initial object region. During the evolution, similarities of the target region are compared to ensure that the object contour evolves in the right direction. A comparison between our method and the kernel-based method demonstrates that our method can effectively cope with severe deformation of the object contour and therefore achieves a higher tracking precision.

Index Terms—Diffusion snake, Kalman filter, mean-shift, object deformation, object tracking.

I. Introduction

OBJECT TRACKING is an important task in many computer vision applications such as driver assistance [1], video surveillance [2], and object-based video compression [3]. Various methods have been proposed and improved, from simple rigid object tracking with a static camera to complex nonrigid object tracking with a moving camera. For ease of discussion, we classify these methods into two categories: region-based methods and contour-based methods.

Manuscript received May 31, 2006; revised February 10, 2007 and October 12, 2008. First version published January 29, 2010; current version published April 2, 2010. This work was supported by the National Science Foundation of China under Grants 60805003 and 60773172, by the Special Grade of the China Postdoctoral Science Foundation under Grant 200902519, and by a grant from the Research Grants Council of the Hong Kong Special Administrative Region (Project No. CUHK4121/08E). This paper was recommended by Associate Editor P. Topiwala.
Q. Chen and Q.-S. Sun are with the School of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing 210094, China (e-mail: [email protected]; [email protected]).
P. A. Heng is with the Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China, and also with the Shenzhen Institute of Advanced Integration Technology, Chinese Academy of Sciences/The Chinese University of Hong Kong, Shenzhen, China (e-mail: [email protected]).
D.-S. Xia is with the School of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing 210094, China, and also with the ESIC/ELEC, Rouen, France, and the Computer Graphics Laboratory, Centre National de la Recherche Scientifique (CNRS), Paris, France (e-mail: deshen [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TCSVT.2010.2041819

The basic idea of region-based methods is to track the object using a similarity measure over the object region. The Bhattacharyya coefficient and the Kullback–Leibler divergence are two popular similarity measures, and the mean-shift algorithm has achieved considerable success in searching for the most similar region owing to its simplicity and robustness. The real-time kernel-based object tracking proposed by Comaniciu et al. [4] can successfully track partially occluded nonrigid objects, but cannot cope with severe deformation of the object contour. Yang et al. [5] proposed a more discriminative similarity measure in the joint spatial-feature space: a symmetric similarity function between spatially smoothed kernel density estimates of the model and target distributions. This measure copes effectively with translation and scaling of the object, but does not consider rotation invariance. To cope with occlusions, the Kalman filter [4], the particle filter [6], and SIFT features [7] have been combined with the mean-shift algorithm. The scale of the mean-shift kernel is a crucial parameter, so many mechanisms have been proposed for choosing or updating it. Moments of the sample weight image were used in [8] to compute blob scale and orientation. In [4], the mean-shift algorithm is repeated at each iteration with window sizes of ±10% of the current size, and the best scale is selected using the Bhattacharyya coefficient. Collins [9] added a scaling factor to the similarity measure and applied an updating rule to the scale. However, none of these scale-updating schemes handles severe object deformation effectively.

Contour-based methods mainly use snakes [10], [11] or level sets [12] to track object contours. Peterfreund presented the Kalman snake model [13], in which the energy function is constructed mainly from optical flow. Chung and Chen [14] presented a video segmentation system that integrates Markov random field (MRF)-based contour tracking with graph-cut image segmentation. Yilmaz et al. [15] incorporated prior shape into the object energy functional and used a level set to evolve the contour by minimizing it. To track objects with non-Gaussian state densities in clutter, Isard and Blake [16] presented the condensation algorithm. Contour-based methods can achieve high tracking precision, but their robustness is usually inferior to that of region-based methods. Furthermore, their computational cost is usually high, especially for large and fast-moving objects.

Some tracking methods use both region and contour information. Sung and Kim [17] proposed an active


contour-based active appearance model (AAM) to improve the tracking accuracy and the convergence rate of the existing robust AAM [18]. Rathi et al. [19] formulated a particle filtering algorithm in the geometric active contour framework for tracking moving and deforming objects.

Combining the merits of region-based and contour-based methods, we introduce a two-stage object tracking method. First, the kernel-based method is adopted to locate the object region, with the Kalman filter and the Bhattacharyya coefficient used to determine the initial object tracking position. Then an object feature image is generated from the color information in the object region, and the diffusion snake is used to evolve the object contour in order to improve the tracking precision.

II. Target Localization

A. Target Prediction Based on the Kalman Filter

In 1960, Kalman [20] published his famous paper describing a recursive solution to the discrete-data linear filtering problem. The Kalman filter is a set of mathematical equations that provides an efficient computational (recursive) means to estimate the state of a process in a way that minimizes the mean squared error. In this letter, the Kalman filter is used to predict the center $[x_c, y_c]$ of the object region. Let the displacements in the $x$ and $y$ directions be $d_x$ and $d_y$, respectively, so that the state vector is $X = [x_c, y_c, d_x, d_y]^T$. The Kalman filter system model is

$$X_{k+1} = F X_k + W_k \tag{1}$$

and the measurement model is

$$Z_k = H X_k + V_k \tag{2}$$

where

$$F = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \qquad H = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}.$$
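As a concrete illustration, one predict/update cycle of this constant-velocity model takes only a few lines of numpy. This is a minimal sketch: the noise covariances Q and R below are illustrative assumptions, since the letter does not specify them.

import numpy as np

# State X = [xc, yc, dx, dy]^T, with F and H as in (1) and (2).
F = np.array([[1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [0., 0., 1., 0.],
              [0., 0., 0., 1.]])
H = np.array([[1., 0., 0., 0.],
              [0., 1., 0., 0.]])
Q = 0.01 * np.eye(4)  # process noise covariance (assumed, not from the letter)
R = 1.0 * np.eye(2)   # measurement noise covariance (assumed)

def kalman_step(x, P, z):
    """One predict/update cycle; z is the measured object center [xc, yc]."""
    x_pred = F @ x                       # predict the state with (1)
    P_pred = F @ P @ F.T + Q
    S = H @ P_pred @ H.T + R             # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(4) - K @ H) @ P_pred
    return x_new, P_new

The predicted center H @ x_pred serves as the initial position of the target region in the next frame.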

Details about the Kalman filter can be found in [21] and [22].

B. Evaluation of the Initial Target Region

The center $y$ of the candidate region in the kernel-based method is crucial. When the target region intersects the object region, the mean-shift algorithm finds the location that maximizes the Bhattacharyya coefficient, which becomes the new position of the target. If there is no intersection, the procedure fails, i.e., the object is lost. Although the Kalman filter improves the validity of the initial target region, the object can still be lost when it changes its moving direction suddenly. In this letter, the Bhattacharyya coefficient of the target region is calculated in order to judge whether the target region intersects the object region. If the Bhattacharyya coefficient is very small, the target region does not intersect the object region, and we relocate the target region with the following method. Let the center of the object region in the previous frame be $(x_0, y_0)$, and let the height and width of the region be $h$ and $w$,

respectively. We calculate the Bhattacharyya coefficients of four regions with centers $[x_0 \pm 0.5h, y_0 \pm w]$, and take the center with the maximum coefficient as the center of the initial target region.

C. Kernel-Based Object Tracking

Let the reference target model be the probability density function (pdf) $q$ in the feature space. To reduce the computational cost, $m$-bin histograms are used. Thus, we have the target model $\hat{q} = \{\hat{q}_u\}_{u=1,\ldots,m}$ with $\sum_{u=1}^{m} \hat{q}_u = 1$, and the target candidate $\hat{p}(y) = \{\hat{p}_u(y)\}_{u=1,\ldots,m}$ with $\sum_{u=1}^{m} \hat{p}_u = 1$, where $y$ is the center of the object region. Let $\{X_i^*\}_{i=1,\ldots,n}$ be the normalized pixel locations in the region defined as the target model, centered at the origin. The function $b : \mathbb{R}^2 \to \{1, \ldots, m\}$ associates the pixel at location $X_i^*$ with the index $b(X_i^*)$ of its bin in the quantized feature space. The probability of the feature $u = 1, \ldots, m$ in the target model is then computed as

$$\hat{q}_u = C \sum_{i=1}^{n} k\left(\|X_i^*\|^2\right) \delta\left[b(X_i^*) - u\right] \tag{3}$$

where $k$ is the Epanechnikov kernel and $\delta$ is the Kronecker delta function. The normalization constant $C$ is derived by imposing the condition $\sum_{u=1}^{m} \hat{q}_u = 1$.

Let $\{X_i\}_{i=1,\ldots,n_h}$ be the normalized pixel locations of the target candidate, centered at $y$ in the current frame. Then

$$\hat{p}_u(y) = C_h \sum_{i=1}^{n_h} k\left(\left\|\frac{y - X_i}{h}\right\|^2\right) \delta\left[b(X_i) - u\right] \tag{4}$$

where $h$ is the bandwidth and $C_h$ is the normalization constant. The similarity function defines a distance between the target model and the candidates. We adopt the Bhattacharyya coefficient to define this distance

$$\rho(y) \equiv \rho\left[\hat{p}(y), \hat{q}\right] = \sum_{u=1}^{m} \sqrt{\hat{p}_u(y)\,\hat{q}_u}. \tag{5}$$

The detailed object localization process with the Bhattacharyya coefficient was introduced in [4].
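To make the localization stage concrete, the following sketch computes the kernel-weighted histograms of (3)/(4) and the Bhattacharyya coefficient of (5). It assumes that bins_img already stores the bin index b(x) of every pixel (0-based here, rather than 1-based as in the text), and it absorbs the kernel normalization into the final histogram normalization.

import numpy as np

def epanechnikov(r2):
    # Epanechnikov profile k(r^2), up to a constant absorbed by C.
    return np.where(r2 < 1.0, 1.0 - r2, 0.0)

def kernel_histogram(bins_img, center, h, m):
    """m-bin kernel-weighted histogram as in (3)/(4); center = (cx, cy)."""
    ys, xs = np.mgrid[0:bins_img.shape[0], 0:bins_img.shape[1]]
    r2 = ((xs - center[0]) ** 2 + (ys - center[1]) ** 2) / h ** 2
    w = epanechnikov(r2)                        # kernel weight of each pixel
    hist = np.bincount(bins_img.ravel(), weights=w.ravel(), minlength=m)
    return hist[:m] / hist.sum()                # normalization constant C (or C_h)

def bhattacharyya(p, q):
    """Similarity rho of (5) between two normalized histograms."""
    return float(np.sum(np.sqrt(p * q)))

In a mean-shift iteration, the candidate center y is then shifted toward the weighted mean of the pixel locations, with weights derived from sqrt(q_u / p_u(y)) looked up through the bin indices; the full procedure is given in [4].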

III. Evolution of Object Contour

After the target localization with the kernel-based method, we adopt the diffusion snake to evolve the object contour in the object feature space, in order to improve the tracking precision.

A. Generation of Object Feature Image

Let $Q_0$ be the image obtained by quantizing each color component of the initial RGB color image $I_0$ into $a$ bins, and let $\{Y_i\}_{i=1,\ldots,n}$ be the pixel locations in the initial object region. The color probability density function in the object region is

$$w_r = C \left( \sum_{i=1}^{n} \delta\left[b(Y_i) - r\right] \right)^{s}, \quad r = 1, \ldots, a^3 \tag{6}$$

where the definitions of the constant $C$ and the function $\delta$ are the same as those in (3), and the exponent $0 < s < 2$ adjusts the color difference between the object and the background.


Fig. 1. Generation of the object feature image. (a) Initialization. (b) Object feature image. (c) Smoothed object feature image.

The values of the parameters $a$ and $s$ depend on the consistency of the color in the object region. If the consistency is good, i.e., there are few color varieties in the object region and the color of the object changes little throughout the sequence, the values of $a$ and $s$ should be larger; if there are many color varieties and large color changes, the values of $a$ and $s$ should be smaller. To reduce the influence of illumination changes, the chrominance components can be used instead of the RGB space to generate the object feature image; if the texture in the object region is very distinctive, a texture feature can be used instead. According to the color probability density function, the object feature image is obtained as

$$F_I(i, j) = w\left(b(Z(i, j))\right) \tag{7}$$

where $Z(i, j)$ denotes the pixel with image coordinate $(i, j)$. The basic idea of (7) is that, for each pixel in the image, the corresponding value of the color probability density function is taken as the value of the object feature image. Fig. 1 shows the generation of an object feature image: Fig. 1(a) shows the initial object contour in the first frame; Fig. 1(b) shows the object feature image generated with (7), where $a = 8$ and $s = 1$; and Fig. 1(c) is the object feature image smoothed with a Gaussian filter, which removes some noise.
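A minimal numpy sketch of (6) and (7), assuming an 8-bit RGB input image; the function and variable names are ours, not from the letter.

import numpy as np

def feature_image(img, mask, a=8, s=1.0):
    """Object feature image of (6)-(7). img is an HxWx3 uint8 RGB image,
    mask is a boolean HxW map of the initial object region, a is the number
    of bins per color channel, and s is the exponent (0 < s < 2)."""
    q = (img.astype(np.int32) * a) // 256              # quantize each channel into a bins
    b = q[..., 0] * a * a + q[..., 1] * a + q[..., 2]  # bin index b(Z(i, j)) in [0, a^3)
    hist = np.bincount(b[mask], minlength=a ** 3).astype(float)
    w = hist ** s                                      # (6): counts raised to the exponent s
    w /= w.sum()                                       # normalization constant C
    return w[b]                                        # (7): FI(i, j) = w(b(Z(i, j)))

With a = 8 and s = 1 this reproduces the setting used for Fig. 1(b); applying a Gaussian blur to the result corresponds to Fig. 1(c).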

B. Diffusion Snake

Cremers et al. [23] presented the diffusion snake method by integrating shape statistics into the Mumford–Shah model [24]. To construct the energy function conveniently, we adopt a closed spline curve to represent the boundary

$$C : [0, 1] \to \Omega, \qquad C(s) = \sum_{n=1}^{N} p_n B_n(s) \tag{8}$$

where the $B_n$ are periodic, quadratic B-spline basis functions, $p_n = (x_n, y_n)^t$ are the spline control points, $s$ is the knot parameter of the B-spline curve $C$, and $N$ is the total number of control points. In 1989, Mumford and Shah presented a variational approach to image segmentation, which consists of minimizing the energy function

$$E_i(u, C) = \frac{1}{2} \int_{\Omega} (I - u)^2 \, dx + \frac{\lambda^2}{2} \int_{\Omega - C} |\nabla u|^2 \, dx + \nu\,|C| \tag{9}$$

where $I$ is the input image (in this letter, the object feature image $F_I$), $u$ is a piecewise smooth function, $\Omega$ is the image plane, and $\lambda$, $\nu$ are positive constants. Replacing the original length norm $|C|$ by the squared $L^2$-norm, $\mathcal{L}(C) = \int_0^1 C_s^2 \, ds$, gives the diffusion snake functional

$$E_i(u, C) = \frac{1}{2} \int_{\Omega} (I - u)^2 \, dx + \frac{\lambda^2}{2} \int_{\Omega - C} |\nabla u|^2 \, dx + \nu \mathcal{L}(C). \tag{10}$$

The diffusion snake can be considered a hybrid model that combines the external energy of the Mumford–Shah functional with the internal energy of the snake. For a fixed segmentation $u$, minimizing the diffusion snake functional with respect to the contour $C$ gives the Euler–Lagrange equation

$$\frac{\partial E_i}{\partial C} = \left( e^-(s) - e^+(s) \right) \cdot n(s) - \nu C_{ss}(s) = 0, \quad \forall s \in [0, 1] \tag{11}$$

where $C_{ss}$ is the second derivative of the B-spline curve with respect to $s$, and the terms $e^+$ and $e^-$ denote the energy density inside and outside the contour $C(s)$, respectively,

$$e^{+/-} = (I - u)^2 + \lambda^2 (\nabla u)^2 \tag{12}$$

and $n$ denotes the outer normal vector on the contour. Solving the minimization problem of (10) by gradient descent results in the evolution equation

$$\frac{\partial C}{\partial t} = -\frac{\partial E_i}{\partial C} = \left( e^+(s) - e^-(s) \right) \cdot n(s) + \nu C_{ss}(s). \tag{13}$$

Equation (13) can be converted into an evolution equation for the control points by inserting the spline definition (8) of the contour. For control point $m$ we obtain

$$\frac{dx_m(t)}{dt} = \sum_i \left( B^{-1} \right)_{mi} \left[ \left( e^+(s_i, t) - e^-(s_i, t) \right) n_x(s_i, t) + \nu \left( x_{i-1} - 2x_i + x_{i+1} \right) \right]$$
$$\frac{dy_m(t)}{dt} = \sum_i \left( B^{-1} \right)_{mi} \left[ \left( e^+(s_i, t) - e^-(s_i, t) \right) n_y(s_i, t) + \nu \left( y_{i-1} - 2y_i + y_{i+1} \right) \right] \tag{14}$$

where the cyclic tridiagonal matrix $B$ contains the spline basis functions evaluated at the nodes.

C. Evolution Control

Because the generated object feature image only coarsely distinguishes object from background, the object contour may wrongly evolve into the background. To keep the contour evolving in the right direction, we compare the similarities of the target regions during the contour evolution: after a few evolution steps, the Bhattacharyya coefficient of the target region is calculated; if the coefficient increases, the object contour continues to evolve, otherwise the evolution stops.
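The following is a simplified sketch of one explicit step of (14): the piecewise smooth u is approximated by two constants u_in and u_out (so e+ and e- of (12) reduce to squared residuals), the control points are treated directly as polygon vertices, and the cyclic matrix B is replaced by the identity. It illustrates the structure of the update, not the exact diffusion snake.

import numpy as np

def snake_step(pts, img, u_in, u_out, nu=0.2, dt=0.1):
    """One explicit update of the N x 2 control-point array pts on the
    object feature image img, following the structure of (13)-(14)."""
    # Normals from the central-difference tangent of the closed contour;
    # depending on the vertex orientation this may be the inner normal,
    # so flip the sign if the contour evolves the wrong way.
    tang = np.roll(pts, -1, axis=0) - np.roll(pts, 1, axis=0)
    normals = np.stack([tang[:, 1], -tang[:, 0]], axis=1)
    normals /= np.linalg.norm(normals, axis=1, keepdims=True) + 1e-12
    # Sample the image at the (rounded, clipped) control points.
    ij = np.clip(pts.round().astype(int), 0, np.array(img.shape[::-1]) - 1)
    I = img[ij[:, 1], ij[:, 0]]
    force = (I - u_in) ** 2 - (I - u_out) ** 2   # e+(s_i) - e-(s_i)
    # Internal term nu * (p_{i-1} - 2 p_i + p_{i+1}) of (14).
    lap = np.roll(pts, 1, axis=0) - 2.0 * pts + np.roll(pts, -1, axis=0)
    return pts + dt * (force[:, None] * normals + nu * lap)

The evolution control of Section III-C then amounts to calling snake_step a few times, recomputing the Bhattacharyya coefficient of the region enclosed by the contour, and stopping as soon as the coefficient no longer increases.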

IV. Experimental Results and Discussion

We have tested our method on various videos downloaded from the Internet, running the experiments on a 2.8 GHz Pentium 4 PC with 512 MB of memory. In the experiments, the RGB color space was taken as the feature space and quantized into 16 × 16 × 16 bins.

Fig. 2. Tracking of the Plane sequence with the kernel-based method (top) and our method (bottom). Frames 1, 8, 15, 25, 45, and 54 are shown.

Fig. 3. Tracking of the Ball Board sequence with the kernel-based method (top) and our method (bottom). Frames 1, 11, 23, 33, and 45 are shown.

Fig. 4. Comparison of the tracking error. (a) Tracking error of Fig. 2. (b) Tracking error of Fig. 3.

Fig. 5. Comparison of the location error. (a) Location error of Fig. 2. (b) Location error of Fig. 3.

Fig. 2 shows the tracking results of the Plane sequence with the kernel-based method [4] and our method; the frames with large deformation are shown. The Plane sequence has 60 frames of 160 × 120 pixels. The color spaces for our method and the kernel-based method are quantized into 8 × 8 × 8 bins and 16 × 16 × 16 bins, respectively. The first frame is initialized manually. The same Kalman filter is used for target prediction, and the scale of the kernel is adjusted by ±10% of the current size. Fig. 3 shows the tracking results of the Ball Board sequence, which has 45 frames of 352 × 240 pixels, with the kernel-based method [4] and our method.

To evaluate the tracking precision quantitatively, the error measure in this letter is defined as

$$\text{error} = 1 - \frac{|O_1 \cap O_2| + |B_1 \cap B_2|}{|O_1| + |B_1|} \tag{15}$$

where $O_1$ and $B_1$ are the object region and background region in the manual segmentation result, and $O_2$ and $B_2$ are the object region and background region in the result of the object tracking method. In addition, the location error of the object center is adopted to evaluate the location precision

$$d = \|C_a - C_m\| \tag{16}$$

where $C_a$ and $C_m$ are the object center coordinates of the tracking result and the manual segmentation result, respectively.

Fig. 4 shows the tracking errors for Figs. 2 and 3. It indicates that our method copes with deformation and scale change more effectively than the kernel-based method, and therefore achieves a higher tracking precision. However, the time performance of our method is worse than that of the kernel-based method: for the Plane sequence, tracking takes 463 s with our method but 54 s with the kernel-based method. Fig. 5 shows the location errors for Figs. 2 and 3. From Fig. 5, we can observe that our method locates the object center better than the kernel-based method for most frames. For Figs. 2 and 3, no object loss occurs. If we omit the target localization and use only the active contour, object loss appears; the reason is that the initial target region does not include the object region, or includes only a very small part of it.
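Both evaluation measures take only a few lines of numpy; a short sketch under the reading of (15) given above, with boolean object masks and the background taken as the complement of the object:

import numpy as np

def tracking_error(obj_manual, obj_tracked):
    """Region error of (15); arguments are boolean HxW object masks."""
    o1, o2 = obj_manual, obj_tracked
    b1, b2 = ~o1, ~o2                          # backgrounds as complements
    agree = np.sum(o1 & o2) + np.sum(b1 & b2)  # |O1 ∩ O2| + |B1 ∩ B2|
    return 1.0 - agree / o1.size               # |O1| + |B1| = total pixel count

def location_error(c_tracked, c_manual):
    """Center distance d of (16)."""
    return float(np.linalg.norm(np.asarray(c_tracked) - np.asarray(c_manual)))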


Fig. 6. Tracking of the person's head in a Movie sequence with the kernel-based method (top) and our method (bottom). Frames 1, 2, 16, and 24 are shown.

Fig. 6 shows the tracking of a person's head in a Movie sequence. Because the displacement of the object between two consecutive frames is large (e.g., frames 1 and 2 in Fig. 6), the object is easily lost by the active contour method alone, without the kernel-based object localization.

V. Conclusion

By combining the merits of the region-based method and the contour-based method, we have presented a two-stage object tracking method. Using the kernel-based method, we can locate the object effectively under complex conditions such as camera motion, partial occlusion, and clutter, but the tracking precision is not high when the object deforms severely. To improve the tracking precision, we use the contour-based method to track the object contour precisely after the target localization. The experimental results demonstrate that our method achieves a higher tracking precision than the kernel-based method, but it is more time-consuming. Because the object feature image in this letter is based on color information, our method cannot effectively track the object when the color feature of the object is very similar to that of the background. In future research, we will combine other image information, such as texture and shape, with the color information to generate a more robust object feature image.

References

[1] U. Handmann, T. Kalinke, C. Tzomakas, M. Werner, and W. von Seelen, "Computer vision for driver assistance systems," in Proc. SPIE, vol. 3364, 1998, pp. 136–147.
[2] D. Gavrila, "The visual analysis of human movement: A survey," Comput. Vision Image Understand., vol. 73, no. 1, pp. 82–98, Jan. 1999.
[3] M. Lee, W. Chen, B. Lin, C. Gu, T. Markoc, S. Zabinsky, and R. Szeliski, "A layered video object coding system using sprite and affine motion model," IEEE Trans. Circuits Syst. Video Technol., vol. 7, no. 1, pp. 130–145, Feb. 1997.
[4] D. Comaniciu, V. Ramesh, and P. Meer, "Kernel-based object tracking," IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 5, pp. 564–577, May 2003.
[5] C. Yang, R. Duraiswami, and L. Davis, "Efficient mean-shift tracking via a new similarity measure," in Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recognit. (CVPR), vol. 1, 2005, pp. 176–183.
[6] C. Chang and R. Ansari, "Kernel particle filter for visual tracking," IEEE Signal Process. Lett., vol. 12, no. 3, pp. 242–245, Mar. 2005.
[7] H. Y. Zhou, Y. Yuan, and C. M. Shi, "Object tracking using SIFT features and mean shift," Comput. Vision Image Understand., vol. 113, no. 3, pp. 345–352, Mar. 2009.

[8] G. R. Bradski, "Computer vision face tracking for use in a perceptual user interface," Intel Technol. J., vol. 2, no. 2, pp. 1–15, 1998.
[9] R. T. Collins, "Mean-shift blob tracking through scale space," in Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recognit. (CVPR), vol. 2, 2003, pp. 234–240.
[10] S. Sun, D. R. Haynor, and Y. Kim, "Semiautomatic video object segmentation using VSnakes," IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 1, pp. 75–82, Jan. 2003.
[11] Q. Chen, Q. S. Sun, P. A. Heng, and D. S. Xia, "Parametric active contours for object tracking based on matching degree image of object contour points," Pattern Recognit. Lett., vol. 29, no. 2, pp. 126–141, Jan. 2008.
[12] N. Paragios and R. Deriche, "Geodesic active regions and level set methods for motion estimation and tracking," Comput. Vision Image Understand., vol. 97, no. 3, pp. 259–282, Mar. 2005.
[13] N. Peterfreund, "Robust tracking of position and velocity with Kalman snakes," IEEE Trans. Pattern Anal. Mach. Intell., vol. 21, no. 6, pp. 564–569, Jun. 1999.
[14] C. Y. Chung and H. H. Chen, "Video object extraction via MRF-based contour tracking," IEEE Trans. Circuits Syst. Video Technol., vol. 20, no. 1, pp. 149–155, Jan. 2010.
[15] A. Yilmaz, X. Li, and M. Shah, "Contour-based object tracking with occlusion handling in video acquired using mobile cameras," IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 11, pp. 1531–1536, Nov. 2004.
[16] M. Isard and A. Blake, "Condensation: Conditional density propagation for visual tracking," Int. J. Comput. Vision, vol. 29, no. 1, pp. 5–28, 1998.
[17] J. Sung and D. Kim, "A background robust active appearance model using active contour technique," Pattern Recognit., vol. 40, no. 1, pp. 108–120, Jan. 2007.
[18] R. Gross, I. Matthews, and S. Baker, "Constructing and fitting active appearance models with occlusion," in Proc. IEEE Workshop Face Process. Video, 2004, pp. 674–679.
[19] Y. Rathi, N. Vaswani, A. Tannenbaum, and A. Yezzi, "Tracking deforming objects using particle filtering for geometric active contours," IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 8, pp. 1470–1475, Aug. 2007.
[20] R. E. Kalman, "A new approach to linear filtering and prediction problems," Trans. ASME–J. Basic Eng., vol. 82, no. 1, pp. 35–45, 1960.
[21] G. Welch and G. Bishop, "An introduction to the Kalman filter," Dept. Comput. Sci., Univ. North Carolina, Chapel Hill, Tech. Rep. TR95-041, 2004.
[22] D. Salmond, "Target tracking: Introduction and Kalman tracking filters," in Proc. IEE Target Tracking: Algorithms Applicat. (Ref. No. 2001/174), vol. 2, 2001, pp. 1–16.
[23] D. Cremers, F. Tischhäuser, J. Weickert, and C. Schnörr, "Diffusion snakes: Introducing statistical shape knowledge into the Mumford–Shah functional," Int. J. Comput. Vision, vol. 50, no. 3, pp. 295–313, Dec. 2002.
[24] D. Mumford and J. Shah, "Optimal approximations by piecewise smooth functions and associated variational problems," Commun. Pure Appl. Math., vol. 42, no. 5, pp. 577–685, 1989.