RANGE BASED OBJECT TRACKING AND SEGMENTATION

Jehoon Lee¹, Peter Karasev¹, and Allen Tannenbaum¹,²

¹ Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA
² Electrical Engineering, Technion-Israel Institute of Technology, Haifa, Israel

[email protected], [email protected], [email protected]

ABSTRACT
We present an approach for tracking a moving object based on range information in stereoscopic temporal imagery. The range information is filtered by the proposed dynamic scheme to improve the quality of active contour segmentation and to better estimate the global motion of the object. Region-based active contours driven by the Bhattacharyya gradient flow are exploited for their robustness to the noise of a cluttered depth map. This sensor-fusion approach, combining an active contour segmentation of the image data with the statistics of a depth map, provides a state estimate that is more accurate than what would be possible with either of the two alone. Experimental results demonstrate the applicability of the proposed method on several stereoscopic sequences.

Index Terms— Visual tracking, segmentation, geometric active contours, range information, depth maps.

This work was supported in part by grants from NSF, AFOSR, ARO, as well as by a grant from NIH (NAC P41 RR-13218) through Brigham and Women's Hospital. This work is part of the National Alliance for Medical Image Computing (NAMIC), funded by the National Institutes of Health through the NIH Roadmap for Medical Research, Grant U54 EB005149. Information on the National Centers for Biomedical Computing can be obtained from http://nihroadmap.nih.gov/bioinformatics/.

1. INTRODUCTION

Visual tracking is a central topic in computer vision. Successful controlled active vision requires accurate state estimation via extraction of the relative camera-target pose from a temporal sequence of image data. Segmentation is a vital step that identifies the portion of the image generated by the target. Numerous algorithms have been proposed to segment and track objects of interest in recent years (e.g., see [1] and the references therein).

In this note, we address the problem of tracking a moving object in a stereoscopic image sequence with statistics-based active contours. Because of noise, various photometric artifacts, clutter, and so on, an active contour may not always converge to the boundary of the object of interest. To address this problem, range data (or depth maps), computed by determining correspondences between stereo image pairs, can be used to dramatically improve the quality of tracking/segmentation
via active contours. Such information is particularly crucial for problems involving aim-point maintenance.

Since there are many papers devoted to this subject, the following review is illustrative but by no means exhaustive. Adaptive object segmentation methods based on depth and spatio-temporal information are presented in [2], where different disparity planes are used to segment areas with smooth disparity variations. In [3], a depth-based tracker is introduced to enhance object tracking using a time-of-flight range imaging sensor under limited illumination. A method combining stereo-derived edge information and image intensity for contour-based segmentation is proposed in [4].

In this paper, we combine the advantages of geometric active contours and range information for improved object segmentation and tracking in a stereo sequence. The proposed algorithm is closely related to and based on our previous work in [5]. Compared with the approach in [5], the weighted depth map is here generated dynamically according to the previous tracking results and the range information, and it is used to estimate the global motion of the object in place of particle filters. In addition, the convex hull of the segmented contour of the previous frame is used as the initial contour for the current frame. These new approaches lead to better computational efficiency and a simpler overall tracking framework.

2. PROPOSED ALGORITHM

2.1. Geometric Active Contours

In the present work, our approach to segmentation relies upon the general active contour concept originally introduced by Kass et al. [6]. The basic idea of active contours is to deform a curve so as to capture the object of interest within the given image via the minimization of a certain energy functional. Active contours are often represented implicitly via the level set method, which offers a powerful representation for the numerical implementation of curve evolution [7]. In level set methods, a closed curve $C$ is represented as the zero level set of a higher dimensional function $\Phi$, which is typically chosen to be a signed distance function: $\Phi < 0$ inside $C$ and $\Phi > 0$ outside $C$. The curve can then be described by an
implicit surface: $C = \{x \mid \Phi(x) = 0,\ x \in \Omega\}$, where $\Omega$ is the domain of the image $I(x): \mathbb{R}^2 \to Z$, which maps to the photometric variable $z \in Z$.

Fig. 1. Upper row: original left image with an initial contour (left), depth map (middle), and the weighted depth map with $f_w \sim N(52, 2.5^2)$ and $\gamma = 255$ (right). Lower row: segmentation results using only image intensities (left), the depth map (middle), and the weighted depth map (right).

In the present work, we adopt region-based active contours driven by the Bhattacharyya gradient flow [8]. Segmentation is thus achieved by separating the object of interest from the background, i.e., by making the distributions interior and exterior to the segmenting curve maximally different. This method is appropriate for our scenario because it is robust enough against noise to deal with highly cluttered depth maps. The Bhattacharyya distance between two probability density functions (pdfs) is defined by:

$$D_B = -\log B, \quad (1)$$
where $B = \int_Z \sqrt{P_i(z) P_o(z)}\, dz$, which varies between 0 and 1 (0 indicates a complete mismatch, while 1 represents a perfect correspondence). $P_i$ and $P_o$ are the pdfs defined inside and outside the curve $C$, respectively:

$$P_i(z) = \frac{\int_\Omega K(z - I(x))\, H(-\Phi(x))\, dx}{\int_\Omega H(-\Phi(x))\, dx}, \qquad
P_o(z) = \frac{\int_\Omega K(z - I(x))\, H(\Phi(x))\, dx}{\int_\Omega H(\Phi(x))\, dx}, \quad (2)$$

where $K$ is the given kernel; popular choices for $K$ are a Gaussian or the Dirac delta function. $H$ is the Heaviside step function, with $H(\Phi) = 1$ for $\Phi \ge 0$ and $H(\Phi) = 0$ otherwise. The optimal level set function, with a regularizing term for smooth curve evolution, is then defined as:

$$\Phi^\star = \arg\inf_\Phi \left\{ B(\Phi) + \alpha \int_\Omega \|\nabla H(\Phi)\|\, dx \right\}, \quad (3)$$

where $\alpha > 0$ is a user-defined regularization constant and $\nabla$ denotes the gradient.
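As a concrete illustration of Eqs. (1)-(2), the following minimal Python sketch estimates the interior and exterior pdfs with a binned (Dirac-delta) kernel and evaluates the Bhattacharyya coefficient $B$. The synthetic image, the level set, and the bin count are illustrative assumptions, not values prescribed by the paper.

```python
# Minimal sketch of Eqs. (1)-(2): histogram-based interior/exterior pdfs
# and the Bhattacharyya coefficient B with distance D_B = -log(B).
import numpy as np

def region_pdfs(image, phi, n_bins=64):
    """Estimate P_i (phi < 0) and P_o (phi >= 0) with a binned kernel."""
    inside, outside = image[phi < 0], image[phi >= 0]
    lo, hi = image.min(), image.max()
    p_i, _ = np.histogram(inside, bins=n_bins, range=(lo, hi), density=True)
    p_o, _ = np.histogram(outside, bins=n_bins, range=(lo, hi), density=True)
    return p_i, p_o

def bhattacharyya(p_i, p_o, bin_width):
    """B = sum sqrt(P_i * P_o) dz, D_B = -log B (Eq. (1))."""
    b = np.sum(np.sqrt(p_i * p_o)) * bin_width
    return b, -np.log(max(b, 1e-12))

# Toy usage: a disk-shaped contour on a synthetic two-region image.
yy, xx = np.mgrid[0:128, 0:128]
phi = np.sqrt((xx - 64.0) ** 2 + (yy - 64.0) ** 2) - 30.0  # signed distance
image = np.where(phi < 0, 0.3, 0.8) + 0.05 * np.random.randn(128, 128)
p_i, p_o = region_pdfs(image, phi)
B, D_B = bhattacharyya(p_i, p_o, (image.max() - image.min()) / 64)
print(f"B = {B:.3f}, D_B = {D_B:.3f}")  # well-separated regions -> small B
```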
Differentiating $P_i$, $P_o$, and $B$ with respect to $\Phi$ yields the gradient flow for the level set evolution (see [5, 8] for a detailed derivation):

$$\frac{\partial \Phi}{\partial t} = \delta(\Phi)(\alpha\kappa - S), \qquad
S = \frac{B}{2}\left(\frac{1}{A_i} - \frac{1}{A_o}\right)
+ \frac{1}{2}\int_Z K(z - I(x)) \left( \frac{1}{A_o}\sqrt{\frac{P_i(z)}{P_o(z)}} - \frac{1}{A_i}\sqrt{\frac{P_o(z)}{P_i(z)}} \right) dz. \quad (4)$$
Here $\delta(\cdot)$ is the delta function, and $A_i$ and $A_o$ are the areas inside and outside the segmenting curve, respectively. The curvature $\kappa$ is given by $\kappa = \operatorname{div}\{\nabla\Phi / \|\nabla\Phi\|\}$. The gradient flow (4) converges to a contour that maximizes the discrepancy between the distributions inside and outside the curve.
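The discretization below is a rough, single-iteration sketch of the flow (4) with a delta kernel $K$, so the integral reduces to a per-pixel lookup of the bracketed term at each pixel's intensity bin. The bin count, the smoothed delta function, the finite-difference curvature, and the time step are our assumptions, not the paper's implementation.

```python
# One explicit evolution step of Eq. (4) with a Dirac-delta kernel.
import numpy as np

def curvature(phi):
    """kappa = div(grad(phi)/|grad(phi)|) via central differences."""
    gy, gx = np.gradient(phi)
    mag = np.sqrt(gx ** 2 + gy ** 2) + 1e-8
    nyy, _ = np.gradient(gy / mag)
    _, nxx = np.gradient(gx / mag)
    return nxx + nyy

def evolve_step(phi, image, alpha=0.2, dt=0.5, n_bins=64):
    """phi <- phi + dt * delta(phi) * (alpha * kappa - S); assumes a
    nonempty interior and exterior region."""
    inside, outside = phi < 0, phi >= 0
    a_i, a_o = inside.sum(), outside.sum()
    lo, hi = image.min(), image.max()
    bins = np.clip(((image - lo) / (hi - lo + 1e-12) * n_bins).astype(int),
                   0, n_bins - 1)
    # Delta-kernel pdf estimates: normalized bin counts per region.
    p_i = np.bincount(bins[inside], minlength=n_bins) / float(a_i)
    p_o = np.bincount(bins[outside], minlength=n_bins) / float(a_o)
    b = np.sum(np.sqrt(p_i * p_o))                      # Bhattacharyya coeff.
    ratio = (np.sqrt(p_i / (p_o + 1e-12)) / a_o
             - np.sqrt(p_o / (p_i + 1e-12)) / a_i)      # bracketed term of (4)
    s = 0.5 * b * (1.0 / a_i - 1.0 / a_o) + 0.5 * ratio[bins]
    delta = 1.0 / (np.pi * (1.0 + phi ** 2))            # smoothed delta, eps=1
    return phi + dt * delta * (alpha * curvature(phi) - s)

# Toy usage: one step on a synthetic two-region image.
yy, xx = np.mgrid[0:128, 0:128]
phi = np.sqrt((xx - 64.0) ** 2 + (yy - 64.0) ** 2) - 30.0
image = np.where(phi < 0, 0.3, 0.8) + 0.05 * np.random.randn(128, 128)
phi = evolve_step(phi, image)  # iterate until the contour stabilizes
```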
2.2. Weighted Depth Maps

Given the calibration information of the cameras, the disparity function $D(x)$ yields the depth map $D_p(x): \mathbb{R}^2 \to \mathcal{D}$, which maps a pair of images to the depth (or range) value $d \in \mathcal{D}$. Thus, the position vector of an object is represented by $[x, y, d]^T$. Depth information simplifies the segmentation task and enhances its performance because the range values of an object are quasi-homogeneous even when its intensity distribution is multi-modal and non-homogeneous. Moreover, it provides an essential cue for dealing with partial occlusions; see Figure 2.

Fig. 2. (a) Occlusion handling using range information: original depth map (left), the result after weighting the range data (middle), and the result after a morphological smoothing filter (right). (b) Segmentation of a car through partial occlusions: original left image with an initial contour (far left), and segmentation results using only image intensities (second column), the depth map (third column), and the weighted depth map (far right).

The basic idea of the proposed segmentation scheme is to weight depth values in proportion to the probability of the object's appearance before applying active contour segmentation. This allows for improved segmentation results in a highly cluttered environment. The kernel $f_w$ is a non-negative, real-valued, integrable weighting function; in our work, we chose a Gaussian, $N(d_p, \sigma_w^2)$. The weighted depth map is given by:

$$D_w(x) = f_w(D_p(x)) = \gamma \cdot N(d_p, \sigma_w^2) \otimes f_s(\cdot)
= \frac{\gamma}{\sqrt{2\pi}\,\sigma_w} \exp\left( -\frac{(D_p(x) - d_p)^2}{2\sigma_w^2} \right) \otimes f_s(\cdot), \quad (5)$$
where $\gamma > 0$ is a weight parameter and $f_s(\cdot)$ is a morphological smoothness regularization filter. The speed term $S$ for (4) is then obtained by substituting $D_w(x)$ from (5) for the image $I(x)$ and choosing the delta function as the kernel $K$ in (4):

$$S = \frac{B}{2}\left(\frac{1}{A_i} - \frac{1}{A_o}\right)
+ \frac{1}{2}\int_Z \delta(z - D_w(x)) \left( \frac{1}{A_o}\sqrt{\frac{P_i(z)}{P_o(z)}} - \frac{1}{A_i}\sqrt{\frac{P_o(z)}{P_i(z)}} \right) dz. \quad (6)$$

Figure 1 shows segmentation results using different image information; only the segmentation obtained using the weighted depth map gives an acceptable result, without loss of information or divergence. Occlusion handling by the proposed algorithm is illustrated in Figure 2: the stop sign disappears after weighting the depth map, and the car is highlighted after applying the smoothing filter $f_s(\cdot)$. As can be seen in Figure 2(b), the contours leak into the stop sign and nearby structures except when the proposed scheme is used.
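A minimal sketch of the weighted depth map of Eq. (5) might look as follows. scipy's grey_closing is used here as a stand-in for the unspecified morphological filter $f_s$, and the values of gamma, sigma_w, and the structuring-element size are illustrative choices, not the paper's.

```python
# Eq. (5): re-weight depth values by a Gaussian centered at the target
# depth d_p, then apply a morphological smoothing filter.
import numpy as np
from scipy.ndimage import grey_closing

def weighted_depth_map(depth, d_p, sigma_w, gamma=255.0, se_size=5):
    """Return D_w(x): Gaussian weighting of the depth map plus smoothing."""
    gauss = gamma / (np.sqrt(2.0 * np.pi) * sigma_w) * np.exp(
        -((depth - d_p) ** 2) / (2.0 * sigma_w ** 2))
    # grey_closing stands in for f_s; size of the structuring element is
    # an assumption.
    return grey_closing(gauss, size=(se_size, se_size))

# Usage: emphasize structures near d_p = 52 (cf. Fig. 1), suppress the rest.
depth = np.random.uniform(0, 100, size=(120, 160)).astype(np.float32)
D_w = weighted_depth_map(depth, d_p=52.0, sigma_w=2.5)
```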
2.3. Dynamic Weighting Scheme

The global position of an object is described by its centroid and depth value $[x_c, y_c, d_p]^T$, and its local motion is represented by the segmenting curve $C$ evolved by the gradient flow (4) with the speed term (6). Here, we assume that the difference in the target's location between consecutive frames is small. The problem of tracking the position of an object at time $t$ then amounts to estimating the appropriate parameters, $(d_p)_t$ and $(\sigma_w^2)_t$, for the weighted depth map in (5). The depth value is obtained from the segmented curve of the previous frame:

$$(d_p)_t = \frac{\int_\Omega (D_p)_{t-1}(x)\, H(-\Phi_{t-1}(x))\, dx}{\int_\Omega H(-\Phi_{t-1}(x))\, dx}. \quad (7)$$

The variance $\sigma_w^2$ should be chosen to reflect the variation of depth values inside the tracked object. It is therefore defined by a combination of the mean and the variance of the depth values inside the segmented contour of the previous frame:

$$(\sigma_w^2)_t = \frac{1}{2}\left( \frac{\zeta - (d_p)_t}{\max(I_t) - \min(I_t)} + V \right), \quad (8)$$

where

$$V = \frac{\int_\Omega \left[ (D_p)_{t-1}(x)\, H(-\Phi_{t-1}(x)) - (d_p)_t \right]^2 dx}{\int_\Omega H(-\Phi_{t-1}(x))\, dx}, \quad (9)$$
and $\zeta$ is a maximum allowance for $\sigma_w^2$; in our work, $\zeta$ was chosen to be about 20. The centroid of the object is taken as the centroid of the segmented curve $C_{t-1}$, and the initial contour for $C_t$ is obtained from the convex hull of $C_{t-1}$.

The proposed tracking framework is composed of two parts: the dynamic weighting scheme, which estimates the parameters of the weighting kernel $f_w$ in order to obtain the weighted depth map, and active contour segmentation, which tracks the deformations of the object. The entire tracking procedure is illustrated in Figure 3. The initialization is carried out manually to identify the position of the target in the first frame.

Fig. 3. The diagram of the proposed tracking framework. $I_L$ and $I_R$ denote a left image and a right image, respectively.
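The per-frame update of Eqs. (7)-(9) reduces to simple statistics over the previous inside region. The sketch below reads Eq. (9) as the depth variance inside the previous contour, represents the contour by a boolean mask rather than a level set, and borrows scikit-image's convex_hull_image for the initial region of frame t; all of these are our assumptions about one possible implementation.

```python
# Eqs. (7)-(9): new Gaussian kernel parameters from the previous frame,
# plus the convex-hull initialization of the next contour.
import numpy as np
from skimage.morphology import convex_hull_image

def update_kernel_params(depth_prev, inside_mask, frame_t, zeta=20.0):
    """Return (d_p)_t and (sigma_w^2)_t from the previous inside-mask."""
    d_vals = depth_prev[inside_mask]
    d_p = d_vals.mean()                                # Eq. (7): mean depth
    v = ((d_vals - d_p) ** 2).mean()                   # Eq. (9): depth variance
    intensity_range = float(frame_t.max() - frame_t.min())
    sigma_w_sq = 0.5 * ((zeta - d_p) / intensity_range + v)   # Eq. (8)
    return d_p, sigma_w_sq

# Toy usage: a square inside-mask from frame t-1 seeds frame t.
depth_prev = np.random.uniform(40, 60, size=(100, 100))
inside_mask = np.zeros((100, 100), dtype=bool)
inside_mask[30:70, 30:70] = True
frame_t = np.random.uniform(0, 255, size=(100, 100))
d_p, sigma_w_sq = update_kernel_params(depth_prev, inside_mask, frame_t)
init_mask = convex_hull_image(inside_mask)             # initial region for C_t
```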
3. EXPERIMENTS

The proposed algorithm was tested on several illustrative sequences captured in real environments. The experimental results are shown only on the left image of each stereo pair. The results were obtained without a severe computational burden (approximately 5 seconds per frame on a 3.6 GHz Windows machine with 2 GB of memory). In test sequence I, the car moves in clockwise loops while the stereo camera is fixed, so the car's shape changes significantly as it moves. Nevertheless, the car is accurately tracked by the proposed algorithm despite the large shape variations, as shown in Figure 4. Test sequence II was taken from a moving camera following the car while trying to keep a constant distance on a road; the car turns left and right several times.
Fig. 4. Test sequence I (frame order: from left to right).

Fig. 5. Test sequence II (frame order: from left to right).

Fig. 6. Test sequence III (frame order: from left to right).

Fig. 7. First row: loss of tracking via [5] due to an object with similar statistical depth information. Second row: successful result of the proposed scheme.
Note that the illumination conditions change and the background is cluttered. The shape deformations of the car are smaller than in test sequence I, but its size is not large enough to extract it easily from the background. The captured frames in Figure 5 demonstrate the robust tracking results of the proposed algorithm. Figure 6 shows reliable results for tracking a walking pedestrian; here, as the stereo camera approaches the pedestrian, the pedestrian's depth value decreases substantially. In Figure 7, the tracker introduced in [5] eventually loses the tracked car because of another truck with similar statistical depth information, which prevents the tracker from sticking to the car. In contrast, the proposed dynamic weighting scheme allows the tracker to follow the car robustly without divergence.
4. CONCLUSION

In this note, we presented a straightforward algorithm to track deformable objects in a time-varying stereo sequence. The dynamic weighting scheme and region-based active contours are used for estimating the global motion of an object and its local deformations, respectively. The proposed method robustly tracks a moving object with large deformations in a highly cluttered environment. In future research, more sophisticated shape analysis could be adopted for more challenging problems, such as tracking multiple objects with similar range values.
5. REFERENCES

[1] A. Yilmaz, O. Javed, and M. Shah, "Object tracking: A survey," ACM Computing Surveys (CSUR), vol. 38, no. 4, 2006.

[2] J. Wei, S. Wang, L. Chen, and T. Guan, "Adaptive stereo video object segmentation based on depth and spatio-temporal information," in WRI World Congress on Computer Science and Information Engineering, 2009, pp. 140–144.

[3] L. Sabeti, E. Parvizi, and Q.M.J. Wu, "Visual tracking using color cameras and time-of-flight range imaging sensors," Journal of Multimedia, vol. 3, no. 2, p. 29, 2008.

[4] D. Markovic and M. Gelautz, "Experimental combination of intensity and stereo edges for improved snake segmentation," Pattern Recognition and Image Analysis, vol. 17, no. 1, pp. 131–135, 2007.

[5] J. Lee, S. Lankton, and A. Tannenbaum, "Object tracking and target reacquisition based on 3D range data for moving vehicles," submitted for publication to IEEE Transactions on Image Processing, 2009.

[6] M. Kass, A. Witkin, and D. Terzopoulos, "Snakes: Active contour models," International Journal of Computer Vision, vol. 1, no. 4, pp. 321–331, 1988.

[7] J.A. Sethian, Level Set Methods and Fast Marching Methods, Cambridge University Press, 1999.

[8] O. Michailovich, Y. Rathi, and A. Tannenbaum, "Image segmentation using active contours driven by the Bhattacharyya gradient flow," IEEE Transactions on Image Processing, vol. 16, no. 11, pp. 2787–2801, 2007.