2010 International Conference on Pattern Recognition
3D Model Based Vehicle Tracking Using Gradient Based Fitness Evaluation Under Particle Filter Framework

Zhaoxiang Zhang1, Kaiqi Huang2, Tieniu Tan2 and Yunhong Wang1
1 School of Computer Science and Engineering, Beihang University, Beijing, China
2 National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
[email protected], {kqhuang, tnt}@nlpr.ia.ac.cn, [email protected]
Abstract

We address the problem of 3D model based vehicle tracking from monocular videos of calibrated traffic scenes. A 3D wire-frame model is set up as prior information, and an efficient fitness evaluation method based on image gradients is introduced to estimate the fitness score between the projection of the vehicle model and image data, which is then combined into a particle filter based framework for robust vehicle tracking. Numerous experiments are conducted, and the experimental results demonstrate the effectiveness of our approach for accurate vehicle tracking and its robustness to noise and occlusions.
1 Introduction
Object tracking in videos is an important issue in image processing and video analysis. With object tracking achieved, temporal information can be applied to enhance the accuracy and robustness of low level processing such as object detection and classification. Furthermore, object tracking makes it possible to collect trajectories of objects in videos for high level processing such as behavior analysis and semantic interpretation.

On the other hand, object tracking is a very challenging problem. Firstly, objects in videos exhibit significant appearance variation across view angles and are susceptible to illumination changes, noise and occlusions. Secondly, objects in videos may move arbitrarily, so that a motion prediction model may fail to track them. In addition, appearance models of objects should be distinctive enough to distinguish the target from nearby distractors. Besides the most popular appearance based tracking, there are also other strategies such as point tracking and silhouette tracking [1]. However, these two categories utilize even less information and are not sufficient for robust object tracking.

Due to its importance and challenges, much work has been done in the field of object tracking. In [2, 3, 4], objects are represented using statistics of object appearance, such as colors, with all geometrical layouts ignored.

1051-4651/10 $26.00 © 2010 IEEE DOI 10.1109/ICPR.2010.437
Wang et al. [5] proposed an adaptive model using a Gaussian Mixture Model in a joint color space. These methods show outstanding performance in handling occlusions and pose variations; however, the features they use are sensitive to illumination changes. In contrast, Hager and Belhumeur [4] made use of a vector space formulation to deal with illumination changes, but their method suffers from occlusions and pose variations. In [6], an online subspace learning algorithm was proposed to model appearance changes by updating the subspace incrementally during tracking. Tuzel et al. [7] proposed a covariance matrix descriptor to characterize object appearance, which is robust to both illumination changes and pose variations but shows limited ability to preserve object geometrical layouts. In addition, Lou et al. proposed a method for 3D model based object tracking using the Extended Kalman Filter (EKF) [8]. However, that method requires manual initialization of tracking, and the EKF cannot handle overly complex motion patterns.

In this paper, we focus on 3D model based vehicle tracking in monocular videos of calibrated traffic scenes. Instead of requiring a distinctive appearance model to describe vehicles in different poses, a 3D wire-frame model is set up as prior information to enhance the accuracy and robustness of vehicle tracking. 3D model based methods have inherent advantages in dealing with pose variations and occlusions. Robustness to illumination changes is obtained by introducing an image gradient based fitness evaluation to estimate fitness scores between the projection of the 3D model and image data, which is then combined into a particle filter based framework for vehicle tracking. Extensive experiments are conducted, and the experimental results demonstrate the effectiveness and robustness of our approach, which is described in detail in the following sections.
2 Preprocessing
3D model based vehicle tracking is to estimate the 3D pose of the vehicle model frame by frame from traffic scene videos. To achieve this aim, we should first acquire the vehicle model and initialize its pose parameters for tracking.

The 3D vehicle model can be automatically acquired from image data, as described in detail in our previous work [9]. One example of a recovered 3D sedan model is shown in Fig. 1(a), which can be taken as prior information for model based tracking. It is a common constraint that most vehicles move on the ground plane, so the 3D pose of the vehicle model can be composed of its position (X, Y) on the ground plane and its orientation θ about its vertical axis. With the camera calibrated, we can conveniently initialize the 3D pose from the detected bounding box of a vehicle. The position (X, Y) is initialized as the point on the ground plane homographic to the center of the bounding box in the image plane. The orientation θ is initialized using the Weak Perspective Assumption (WPA). With the two main dominant directions d1 and d2 estimated within the bounding box using gradient information and the angle α between d1 and d2 calculated, θ can be simply initialized by solving:

cos α = (d1 · d2) / (|d1| |d2|)    (1)
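As a rough illustration, the initialization above can be sketched as follows. The function name, the image-to-ground homography H_img2ground, and the way θ is read off from the angle α of Eq. (1) are assumptions made for this sketch, not the paper's exact formulation:

```python
import numpy as np

def init_pose(bbox, H_img2ground, d1, d2):
    """Sketch of 3D pose initialization from a detected bounding box.

    bbox: (x_min, y_min, x_max, y_max) in image coordinates.
    H_img2ground: 3x3 homography mapping image points to the ground plane
                  (from camera calibration; assumed given here).
    d1, d2: dominant gradient directions (2-vectors) inside the box.
    """
    # (X, Y): the ground-plane point homographic to the bounding-box center.
    cx = 0.5 * (bbox[0] + bbox[2])
    cy = 0.5 * (bbox[1] + bbox[3])
    p = H_img2ground @ np.array([cx, cy, 1.0])
    X, Y = p[0] / p[2], p[1] / p[2]

    # Eq. (1): cos(alpha) = (d1 . d2) / (|d1||d2|).
    cos_a = np.dot(d1, d2) / (np.linalg.norm(d1) * np.linalg.norm(d2))
    alpha = np.arccos(np.clip(cos_a, -1.0, 1.0))

    # Placeholder: the paper solves Eq. (1) for theta under the weak
    # perspective assumption; the exact mapping from alpha to theta
    # depends on the calibration, so alpha is returned directly here.
    theta = alpha
    return X, Y, theta
```

With an identity homography the box center maps to itself, so the sketch can be sanity-checked without real calibration data.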
The recovered pose is used to initialize the subsequent model based tracking, which should recover the 3D pose frame by frame accurately enough that the projection of the 3D model fits the image data well, as illustrated in Fig. 1(b).

(a) 3D sedan model    (b) Required results

Figure 1. Problem description [10]

3 Fitness Evaluation

In this section, we briefly introduce our method for fitness evaluation between the projection of the 3D model and image data, which is utilized to measure the similarity between the observation and the object model in the particle filter based tracking framework. Part of the following section is referenced from [10].

The projection of the 3D wire-frame model in the image plane is composed of a series of visible line segments. For every projected line with direction α and length L in the image plane, we define an L × 2w virtual rectangle symmetric about it, as shown in Fig. 2. If the line fits the image data well, the gradient directions of pixels with large gradient magnitudes in the rectangle should concentrate on the direction perpendicular to the projected line.

Figure 2. Fitness evaluation [10]

As a result, we can estimate the fitness score from all pixels within the rectangle. For a pixel Si within the rectangle, we simply calculate its gradient magnitude m(x, y) and orientation β(x, y) from pixel differences. The fitness score ESi contributed by Si is then measured by the component of its gradient magnitude perpendicular to the line direction:

ESi = |m(x, y) · sin(β(x, y) − α)|    (2)

It is evident that not all pixels in the rectangle should have the same weight in the fitness evaluation: pixels closer to the projected line should contribute more to the fitness evaluation function (FEF). As a result, we give every pixel Si a weight equal to Nμ,σ(di), where di is the distance between Si and the projected line and Nμ,σ is a Gaussian distribution with μ = 0 and σ = w. In this way, the fitness score of the projected line l is measured by the weighted sum of the ESi:

El = Σ_Si [ESi · G0,w(di)]    (3)

and we calculate the whole FEF between the projection of the vehicle model and the image data from all visible lines as:

E = Σ_l [log(El)]    (4)

The fitness scores are taken as a similarity measurement between the observation and the object model, and are combined into the particle filter based tracking framework described in detail in the next section.
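A minimal NumPy sketch of the fitness evaluation of Eqs. (2)-(4); the function names and the use of flattened per-pixel arrays are our own illustration, not the authors' code:

```python
import numpy as np

def line_fitness(mag, ori, alpha, dist, w):
    """Fitness score E_l of one projected line, Eqs. (2)-(3).

    mag, ori: gradient magnitude m(x, y) and orientation beta(x, y) for the
              pixels inside the L x 2w rectangle (flattened 1-D arrays).
    alpha:    direction of the projected line.
    dist:     distance d_i of each pixel to the projected line.
    w:        half-width of the rectangle, used as the Gaussian std. dev.
    """
    # Eq. (2): gradient component perpendicular to the line direction.
    e = np.abs(mag * np.sin(ori - alpha))
    # Gaussian weight G_{0,w}(d_i): pixels closer to the line count more.
    g = np.exp(-dist ** 2 / (2.0 * w ** 2)) / (np.sqrt(2.0 * np.pi) * w)
    # Eq. (3): weighted sum over all pixels S_i in the rectangle.
    return float(np.sum(e * g))

def model_fitness(line_scores):
    """Eq. (4): overall FEF as the sum of log line scores over visible lines."""
    return float(np.sum(np.log(line_scores)))
```

In practice the per-pixel magnitudes and orientations would come from image gradients (e.g. simple pixel differences, as in the paper); here they are passed in directly to keep the sketch self-contained.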
4 Particle Filter Based Tracking

The state variable St in model based tracking corresponds to the 3 pose parameters (Xt, Yt, θt). In our method, we assume each component of St to be independent and to satisfy a Gaussian distribution around its counterpart in St−1:

p(St | St−1) = N(St; St−1, Σ)    (5)

(a) Tracking result of the red sedan    (b) Tracking result of the black hatchback

Figure 3. Tracking results of vehicles

The weight of each particle is set proportional to the fitness score of its state:

wt^(i) ∝ Et^(i)    (6)

With all fitness scores calculated and the weights normalized, the object state at time t is estimated as the state with the highest weight:

Ŝt = St^(j), where wt^(j) = max_i(wt^(i))    (7)

Every object state St^(i) is then used to generate N × wt^(i) object states at time t + 1 from the assumption:

St+1^(j) − St^(i) ∼ N(0; Σ)    (8)
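A single tracking step along the lines of Eqs. (5)-(8) can be sketched as follows; the function names, the seeded random generator, the pluggable fitness stub, and the rounding of N × wt^(i) to integer counts are our assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded for reproducibility of the sketch

def particle_filter_step(particles, fitness, sigma, n_total):
    """One step of the particle filter tracking framework.

    particles: (N, 3) array of states (X, Y, theta) at time t.
    fitness:   function mapping a state to a positive fitness score E
               (the gradient based FEF of Section 3 in the paper).
    sigma:     (3,) std. devs. of the Gaussian motion model, Eq. (5).
    n_total:   total number of particles N to generate for time t + 1.
    """
    # Eq. (6): weights proportional to fitness scores, then normalized.
    scores = np.array([fitness(s) for s in particles])
    weights = scores / scores.sum()

    # Eq. (7): the state estimate is the particle with the highest weight.
    estimate = particles[np.argmax(weights)]

    # Eq. (8): each state spawns about N * w_i children, each perturbed
    # by zero-mean Gaussian noise N(0, Sigma).
    counts = np.round(weights * n_total).astype(int)
    children = np.repeat(particles, counts, axis=0)
    children = children + rng.normal(0.0, sigma, size=children.shape)
    return estimate, children
```

Because the children are resampled in proportion to the weights, particles with low fitness die out while high-fitness regions of the pose space are explored more densely at the next frame.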
In this section, we have described the whole framework for 3D model based vehicle tracking. Experimental results and analysis are presented in detail in the next section.

5 Experimental Results and Analysis

Extensive experiments are conducted and experimental results are presented in this section to demonstrate the performance of the proposed approach. The PETS 2000 Database [11] is used for processing, and all experiments are carried out on a PC with a P4 3.0 GHz CPU and 512 MB DDR memory.

Two sequences from the database are taken to test the accuracy of our model based tracking. One shows a red sedan running into the scene, while the other shows a black hatchback coming from far away. With the 3D model obtained using [9] and the 3D pose initialized with the method described above, both sequences achieve very good performance, as shown in Fig. 3(a) and Fig. 3(b), which demonstrates the effectiveness of our approach for model based tracking.

To further test the accuracy of model based tracking, we sample several frames of the red sedan sequence and manually adjust the pose parameters of the 3D model so that the projections fit the image data perfectly, taking the resulting (X*, Y*, θ*) as "ground truth". The accuracy of tracking is evaluated by the distance between the recovered 3D pose and the "ground truth". The curves of the 3 pose parameters are shown in Fig. 4. As we can see, the recovered pose parameters are very close to the "ground truth", which illustrates the accuracy of our model based tracking.

Figure 4. Accuracy of the recovered pose parameters ((a) accuracy of X; axis labels: Pose X (mm), Pose Angle, Frame Number; curves: ground truth vs. tracking result)
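The accuracy evaluation described above — comparing each recovered pose against the manually fitted "ground truth" pose — can be sketched as follows; the helper name and the use of a wrapped angular difference for θ are our assumptions:

```python
import numpy as np

def pose_errors(tracked, truth):
    """Per-frame tracking error against manually fitted 'ground truth' poses.

    tracked, truth: (T, 3) arrays of (X, Y, theta) for T sampled frames.
    Returns the ground-plane position error (in the units of X and Y,
    e.g. mm) and the absolute orientation error per frame.
    """
    # Euclidean distance on the ground plane between recovered and true pose.
    pos_err = np.hypot(tracked[:, 0] - truth[:, 0],
                       tracked[:, 1] - truth[:, 1])
    # Orientation difference wrapped into (-pi, pi] before taking |.|,
    # so that e.g. 359 degrees vs. 1 degree counts as a 2 degree error.
    d = tracked[:, 2] - truth[:, 2]
    ang_err = np.abs(np.arctan2(np.sin(d), np.cos(d)))
    return pos_err, ang_err
```

Plotting these two error series over the frame number reproduces the kind of accuracy curves shown in Fig. 4.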