Tracking Objects through Occlusions Using Improved Kalman Filter

Jin Wang, Fei He, Xuejie Zhang, Yun Gao
School of Information Science and Technology
Yunnan University
Kunming, China
[email protected], [email protected], [email protected], [email protected]

Abstract—In a visual surveillance system, robust tracking of moving objects that are partially or even fully occluded is very difficult. In this paper, we present a method of tracking objects through occlusions using a combination of a Kalman filter and color histograms. By changing the covariance of the process noise and measurement noise in the Kalman filter, the method can maintain the tracking of moving objects before, during, and after occlusion. Experiments on several test sequences from the public PETS2000 and PETS2001 datasets demonstrate the effectiveness and robustness of this method.

Keywords—tracking; occlusion; Kalman filter; color histogram
I. INTRODUCTION

Visual surveillance in dynamic scenes, especially of humans and vehicles, is currently one of the most active research topics in computer vision [1]. It has a wide range of potential applications, such as traffic surveillance in cities, human identification, and security guarding of communities or important buildings. In general, the processing framework of visual surveillance in dynamic scenes includes the following stages: object detection, tracking, understanding and description of behaviors, and human identification. Behavior understanding and human identification depend on the results of object tracking, which is one of the most important parts of visual surveillance.

Tracking is, in fact, the construction of correspondence relationships between "tracked objects" in previous frames and "detected objects" in the current frame [2]. If a "tracked object" moves behind a view-block, such as a tree or a car, it becomes undetected. In this case, construction of correspondence becomes difficult: when the "undetected" object appears again, it is tracked as a new object. In a robust visual surveillance system, occlusion cannot be ignored.

A. Related Work

Occlusion is a significant problem in moving object detection and tracking. Some previous works did not deal with occlusion at all. Bobick et al. [3] and Grimson et al. [4] minimized occlusions by mounting the cameras at a high vantage point, looking down on the plane of object motion. Chang et al. [5] and Dockstader et al. [6] constructed multiple-camera surveillance systems that fuse multiple camera inputs to overcome occlusion in multi-object tracking. Khan and Shah [7] presented a framework for tracking people in the presence of occlusion: first, they segmented a person into classes of similar color using the Expectation Maximization algorithm; then they used a maximum a posteriori probability approach to track these classes in successive image frames. Jang et al. [8] proposed a model-based tracking algorithm that constructs a model from the detected moving object and matches the model against successive image frames to track the target object; they used a Kalman filter to predict motion information in order to reduce computational complexity. Peterfreund [9] used a Kalman filter to track contours of nonrigid objects, employing optical-flow measurements together with the filter to detect and reject measurements that belong to other objects. These works cannot deal with occlusions. Tao et al. [10] proposed a dynamic layer representation for tracking moving objects, comprising layer motion, ownership, and appearance; these three components are estimated simultaneously over time in a MAP framework, and their system can deal with partial occlusion of passing vehicles. Roh et al. [11] used an appearance model based on temporal color to track multiple people in the presence of occlusion. Their temporal color features combine color values with associated weights, where the weights are determined by the size, duration, frequency, and adjacency of a color object. In some instances, this method can only deal with partial occlusion.

B. Our Method

In actual tracking, if only appearance information is available, tracking objects that are disturbed by complex occlusion, illumination variation, and other disturbances is very difficult. A combination of appearance and motion information can improve the robustness of visual surveillance. On the other hand, tracking can be considered a probability estimation problem [12]: it can be solved with a statistical model that describes the motion state together with a Bayesian estimation method. The Kalman filter is a classic Bayesian estimator; under linear and Gaussian conditions, it yields the optimal estimate of the motion state of objects. We therefore propose a method to track objects based on both appearance and motion information. We use a Kalman filter to model the positions and velocities of objects and combine it with a color histogram that describes the appearances of objects. By dynamically changing the covariance of the process noise and measurement noise in the Kalman filter, objects in occlusion can be tracked correctly.

II. OVERVIEW OF OUR METHOD

The motivation of this paper is to present a robust method to track people and vehicles that are occluded. An overview of our approach is illustrated in Fig. 1.

Figure 1. Overview of tracking moving objects with the improved Kalman filter

The first step is background modeling and background subtraction. The video sequence captured by the cameras is used to estimate the background model, which is then used to perform background subtraction. A morphology step removes small isolated spots and fills holes in the foreground image, and the foreground regions of each frame are grouped into connected components; a size filter removes small components (these steps are sketched below). Each component is bounded by a 2D bounding box, which is then used to compute a color histogram. By matching color histograms, the first tracking step constructs correspondences between the detected foreground regions and those predicted by the Kalman filter. The second tracking step uses the measurement to update the Kalman filter and the color histogram. Given object positions and occupancy locations, we can detect occlusions in the image plane. If there is an occlusion, the moving object is tracked by changing the covariance of the process noise and measurement noise in the Kalman filter. When the end of the occlusion event is detected, the tracker is updated by matching color histograms. Finally, we describe results on the PETS2000 dataset and the PETS2001 dataset 2, camera 2, and summarize our approach and evaluation experience.
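As a minimal sketch of the detection-side post-processing just described (morphology, connected components, size filter, bounding boxes), assuming OpenCV and a binary foreground mask from any background subtractor; the kernel size and area threshold are illustrative choices, not values from the paper:

```python
import cv2
import numpy as np

def extract_components(fg_mask, min_area=200):
    """Clean a binary foreground mask and return bounding boxes.

    fg_mask: uint8 image, 255 = foreground, 0 = background.
    min_area: size-filter threshold (illustrative value).
    """
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    # Opening removes small isolated spots; closing fills holes.
    mask = cv2.morphologyEx(fg_mask, cv2.MORPH_OPEN, kernel)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)

    # Group the remaining foreground pixels into connected components.
    n, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
    boxes = []
    for i in range(1, n):  # label 0 is the background
        x, y, w, h, area = stats[i]
        if area >= min_area:  # size filter drops small components
            boxes.append((x, y, w, h))
    return boxes
```

Each returned box then feeds the histogram computation and the tracker association described in Secs. IV and V.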

III. BACKGROUND SUBTRACTION

The background subtraction method we use here is similar to Horprasert et al. [13], which is able to cope with local illumination changes, such as shadows and highlights, and even global illumination changes. In this method, the background is statistically modeled at each pixel. A computational color model, comprising a brightness distortion and a chromaticity distortion, is used to distinguish a shaded background from the ordinary background or moving foreground objects. A pixel is modeled by a 4-tuple [Ei, si, ai, bi], where Ei is a vector of expected color values, si is a vector of the standard deviations of the color values, ai is the variation of the brightness distortion, and bi is the variation of the chromaticity distortion of the ith pixel. In the next step, the difference between the background image and the current image is evaluated, and each pixel is classified into one of four categories: original background, shaded background or shadow, highlighted background, and moving foreground object. Suitable thresholds are selected automatically; details can be found in [13]. Using a morphological filter and a size filter, each foreground region is grouped into a component described by a bounding box. In Fig. 2, this algorithm is shown to be robust and reliable in an outdoor scene.

Figure 2. An example of moving object detection using background subtraction
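A compact sketch of this per-pixel classification, assuming the background statistics (Ei, si, ai, bi) have already been estimated from training frames. The thresholds and the exact normalization are placeholders on our part, whereas [13] selects its thresholds automatically from histograms of the normalized distortions:

```python
import numpy as np

def classify(I, E, s, a, b,
             tau_cd=3.0, tau_lo=-3.0, tau_1=1.5, tau_2=-1.5):
    """Label each pixel: 0=background, 1=shadow, 2=highlight, 3=foreground.

    I, E, s: float arrays of shape (H, W, 3); a, b: shape (H, W).
    Thresholds are illustrative, not the automatically selected
    values of [13].
    """
    # Brightness distortion: the scalar alpha minimizing ||I - alpha*E||
    # in coordinates scaled by the per-channel standard deviation.
    num = np.sum(I * E / s**2, axis=2)
    den = np.sum((E / s)**2, axis=2)
    alpha = num / den
    # Chromaticity distortion: residual orthogonal to the brightness axis.
    cd = np.sqrt(np.sum(((I - alpha[..., None] * E) / s)**2, axis=2))

    a_hat = (alpha - 1.0) / a      # normalized brightness distortion
    cd_hat = cd / b                # normalized chromaticity distortion

    label = np.full(I.shape[:2], 3, dtype=np.uint8)        # foreground
    bg = (cd_hat < tau_cd) & (a_hat > tau_lo)
    label[bg & (a_hat < tau_1) & (a_hat > tau_2)] = 0      # background
    label[bg & (a_hat <= tau_2)] = 1                       # shadow
    label[bg & (a_hat >= tau_1)] = 2                       # highlight
    return label
```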

IV. KALMAN FILTER FORMULATION

The Kalman filter [14] can be used to estimate the state of a process under linear and Gaussian conditions. Our process is governed by the following linear difference equation and measurement equation:

$s_k = A s_{k-1} + w_{k-1}$,  (1)

$z_k = H s_k + v_k$,  (2)

where $s_k$ and $z_k$ represent the state and the measurement at time $k$, $A$ is the system evolution matrix, and $H$ is the measurement matrix. The random variables $w_k$ and $v_k$ represent the process and measurement noise. They are assumed to be independent of each other and normally distributed:

$p(w) \sim N(0, Q)$,  (3)

$p(v) \sim N(0, R)$.  (4)

In practice, the process noise covariance $Q$ and the measurement noise covariance $R$ can change with each time step or measurement. The Kalman filter estimates a process by using a form of feedback control: the filter estimates the process state at some time and then obtains feedback in the form of measurements. The equations for the Kalman filter therefore fall into two groups: time update equations and measurement update equations. The time update equations are responsible for projecting forward the current state and error covariance estimates to obtain the a priori (note the "minus") estimates for the next time step:

$\hat{s}_k^- = A \hat{s}_{k-1}$,  (5)

$P_k^- = A P_{k-1} A^T + Q$.  (6)

The measurement update equations are responsible for the feedback:

$K_k = P_k^- H^T (H P_k^- H^T + R)^{-1}$,  (7)

$\hat{s}_k = \hat{s}_k^- + K_k (z_k - H \hat{s}_k^-)$,  (8)

$P_k = P_k^- - K_k H P_k^-$.  (9)

The time update equations can also be thought of as predictor equations, while the measurement update equations can be thought of as corrector equations. In our tracking process, each object is modeled as a planar rectangle moving along a linear trajectory at constant velocity. Our state vector is then

$s_k = (x_k, y_k, w_k, h_k, x_{k-1}, y_{k-1})$,  (10)

where $(x_k, y_k)$ are the centroid coordinates of the object in the image plane, and $w_k$ and $h_k$ represent the width and height of the planar rectangle. Our measurement vector is therefore

$z_k = (x_k, y_k, w_k, h_k)$.  (11)

We use $x_{k-1} - x_{k-2}$ to represent the instant velocity at time $k-1$; the position at time $k$ is then $x_{k-1} + (x_{k-1} - x_{k-2}) = 2x_{k-1} - x_{k-2}$. The motion equation and measurement equation are therefore

$s_k = A s_{k-1} + w_{k-1} = \begin{bmatrix} 2 & 0 & 0 & 0 & -1 & 0 \\ 0 & 2 & 0 & 0 & 0 & -1 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} x_{k-1} \\ y_{k-1} \\ w_{k-1} \\ h_{k-1} \\ x_{k-2} \\ y_{k-2} \end{bmatrix} + \begin{bmatrix} w_{k-1}^x \\ w_{k-1}^y \\ w_{k-1}^w \\ w_{k-1}^h \\ 0 \\ 0 \end{bmatrix}$,  (12)

$z_k = H s_k + v_k = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \end{bmatrix} s_k + \begin{bmatrix} v_k^x \\ v_k^y \\ v_k^w \\ v_k^h \end{bmatrix}$.  (13)
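As a minimal sketch of this filter, the following NumPy code transcribes Eqs. (5)–(9) with the $A$ and $H$ of Eqs. (12) and (13). The zero-velocity initialization and the initial error covariance are our assumptions; $Q$ and $R$ are supplied by the caller (see the occlusion-handling sketch in Sec. VI):

```python
import numpy as np

# State s = (x, y, w, h, x_prev, y_prev); measurement z = (x, y, w, h).
# Row 1 of A encodes x_k = 2*x_{k-1} - x_{k-2}, as derived for Eq. (12).
A = np.array([[2., 0., 0., 0., -1.,  0.],
              [0., 2., 0., 0.,  0., -1.],
              [0., 0., 1., 0.,  0.,  0.],
              [0., 0., 0., 1.,  0.,  0.],
              [1., 0., 0., 0.,  0.,  0.],
              [0., 1., 0., 0.,  0.,  0.]])
H = np.eye(4, 6)  # Eq. (13): x, y, w, h are measured directly

class KalmanTracker:
    def __init__(self, z0, Q, R):
        x, y, w, h = z0
        # Assume zero initial velocity: x_prev = x, y_prev = y.
        self.s = np.array([x, y, w, h, x, y], dtype=float)
        self.P = np.eye(6)  # initial error covariance (assumption)
        self.Q, self.R = Q, R

    def predict(self):
        """Time update, Eqs. (5)-(6); returns the predicted measurement."""
        self.s = A @ self.s
        self.P = A @ self.P @ A.T + self.Q
        return H @ self.s

    def correct(self, z):
        """Measurement update, Eqs. (7)-(9)."""
        K = self.P @ H.T @ np.linalg.inv(H @ self.P @ H.T + self.R)
        self.s = self.s + K @ (np.asarray(z, dtype=float) - H @ self.s)
        self.P = self.P - K @ H @ self.P
```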

V. TRACKING

In order to distinguish moving objects that have similar position information, we model the appearance of each object. We represent the appearance by color histograms in HSV color space. The distance $D$ between two $m$-bin histograms $H_1$ and $H_2$ is computed as

$D_{\mathrm{Bhattacharyya}}(H_1, H_2) = \sqrt{1 - \sum_i^m \frac{\sqrt{H_1(i) \cdot H_2(i)}}{\sqrt{\sum_i H_1(i) \cdot \sum_i H_2(i)}}}$.  (14)

For Bhattacharyya matching, low scores indicate good matches and high scores indicate bad matches: a perfect match is 0 and a total mismatch is 1. In our approach, scores below 0.55 are considered good matches. Tracking then proceeds by the following operations:

Step 1. When an object moves into the FOV (field of view), a tracker, consisting of a Kalman filter and a color histogram, is assigned to this object, and the parameters of the Kalman filter are initialized.

Step 2. Based on the centroid location predicted by the Kalman filter, we locate a search region in the image plane. By matching the color histogram of the tracked object against the detected objects in this region, we find the best matching candidate.

Step 3. The state of the object is corrected based on the position information of the candidate, and the Kalman filter and the color histogram are updated.

Step 4. Input the next frame and repeat from Step 2.
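A sketch of this appearance model and matching rule, using OpenCV's built-in Bhattacharyya comparison (which computes Eq. (14)). The paper does not state the bin counts, so the 16x16 H-S histogram is our assumption:

```python
import cv2
import numpy as np

def hsv_histogram(frame, box, bins=(16, 16)):
    """H-S histogram of the region inside a bounding box (x, y, w, h).
    The bin counts are illustrative; the paper does not state them."""
    x, y, w, h = box
    roi = cv2.cvtColor(frame[y:y+h, x:x+w], cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([roi], [0, 1], None, list(bins),
                        [0, 180, 0, 256])
    return cv2.normalize(hist, hist).flatten()

def best_match(target_hist, candidate_hists, threshold=0.55):
    """Return the index of the closest candidate, or None if every
    Bhattacharyya distance exceeds the paper's 0.55 threshold."""
    dists = [cv2.compareHist(target_hist.astype(np.float32),
                             c.astype(np.float32),
                             cv2.HISTCMP_BHATTACHARYYA)
             for c in candidate_hists]
    i = int(np.argmin(dists))
    return i if dists[i] < threshold else None
```

In Step 2, `candidate_hists` would be computed only for detections inside the search region around the Kalman-predicted centroid.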

VI. OCCLUSION HANDLING

In the tracking process, the process and measurement noise indicate the reliability of the prediction and the measurement. In the case of low noise and no obvious disturbance, a moving object detection algorithm can easily and accurately locate the object; the measurement noise should then be as low as possible, because the state of the object is mainly obtained from the measurement. When the object is partially or even fully occluded, it becomes undetected; the measurement noise should then be high and the process noise low. The covariances of the process and measurement noise, $Q$ and $R$, are therefore usually not constant. If there is an occlusion, the system adaptively changes $Q$ and $R$ to keep tracking the object, based on the result of occlusion detection.


Our occlusion detection routine predicts future locations of objects based on the current estimates of velocity and position, using the system dynamics formulation of Sec. IV:

$s_{k+\Delta t} = A_{\Delta t} s_k$,  (15)

and a prediction in measurement space is computed:

$z_{k+\Delta t} = H s_{k+\Delta t}$.  (16)

Given the occupancy locations, it is easy to predict when and where occlusions will probably happen. Our confidence in a probable occlusion is determined by histogram matching in the reliable regions. In the case of no occlusion, with the given covariance matrices $Q$ and $R$, the Kalman filter tracks normally. If an object is occluded, the detection result is invalid and no longer accurately indicates the object in the current frame. By increasing the measurement noise covariance matrix toward infinity and diminishing the process noise covariance matrix toward zero, we prevent the system from using wrong measurements in the update:

$Q = \begin{cases} \mathrm{diag}[1, 1, 1, 1], & \text{normal} \\ 0, & \text{occluded} \end{cases}$  (17)

$R = \begin{cases} \mathrm{diag}[10^{-1}, 10^{-1}, 0, 0], & \text{normal} \\ \infty, & \text{occluded} \end{cases}$  (18)
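To make the mechanism concrete, here is a sketch of the occlusion test and the noise switching. It checks one predicted step ahead rather than a general Δt, approximates ∞ and 0 by large and small finite values for numerical stability, pads Q with zeros for the two velocity-memory states, and uses a hypothetical `associate` helper standing in for the histogram matching of Sec. V — all assumptions on our part:

```python
import numpy as np

def boxes_overlap(b1, b2):
    """Overlap test on predicted measurements (x, y, w, h),
    where (x, y) is the centroid as in the state vector."""
    return (abs(b1[0] - b2[0]) < (b1[2] + b2[2]) / 2 and
            abs(b1[1] - b2[1]) < (b1[3] + b2[3]) / 2)

# Eqs. (17)-(18); Q is padded with zeros for the two velocity-memory
# states, and infinity/zero are approximated by finite values.
Q_NORMAL = np.diag([1., 1., 1., 1., 0., 0.])
R_NORMAL = np.diag([0.1, 0.1, 0., 0.])
Q_OCCLUDED = 1e-6 * np.eye(6)
R_OCCLUDED = 1e6 * np.eye(4)

def track_step(trackers, detections, associate):
    """One frame of occlusion-aware tracking; `trackers` are
    KalmanTracker objects from the Sec. IV sketch, and `associate`
    is a hypothetical stand-in for the histogram matching of Sec. V."""
    predicted = [t.predict() for t in trackers]
    for i, t in enumerate(trackers):
        occluded = any(boxes_overlap(predicted[i], predicted[j])
                       for j in range(len(trackers)) if j != i)
        # Coast on the motion model during occlusion, Eqs. (17)-(18).
        t.Q, t.R = ((Q_OCCLUDED, R_OCCLUDED) if occluded
                    else (Q_NORMAL, R_NORMAL))
        z = associate(t, detections)
        if z is not None:
            t.correct(z)  # with R huge, the correction is negligible
```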

VII. EXPERIMENTS

To verify the proposed method, we performed a number of experiments on the public PETS2000 and PETS2001 datasets. For reasons of speed and storage economy, we chose to process the video at half resolution. The application operates on AVI video files generated from the distributed JPEG images; naturally, higher accuracy and reliability are to be expected from processing the video at full size and without compression artifacts. In our experiments, a tracked object is represented with three bounding boxes: the blue bounding box is the state of the object, the green one is the measurement from detection, and the red one is the prediction from the Kalman filter. The size and position of every bounding box are determined by the state vector, as introduced in Sec. IV.

Fig. 3 illustrates some frames from the PETS2000 dataset. It shows that our approach can accurately track objects in normal conditions. In this experiment, different objects are labeled with centroid trajectories of different colors. Two vehicles enter the field of view in succession and then one leaves, from frame 150 to frame 680. All objects are correctly detected, matched, and tracked in most cases, except in the following two situations:
• Due to the size filter, people in the image plane are too small to be detected.
• Objects that appear in the image plane for only a few frames can hardly be tracked.

Figure 3. An example of tracking in normal conditions

Fig. 4 shows an example of partial occlusion on PETS2001 dataset 2, camera 2, from frame 1298 to frame 1448. In this example, a vehicle is occluded by a tree from frame 1350 to frame 1375, and the results of background subtraction (green bounding box) fail to locate the moving object. With our method, the object is still correctly tracked, as in normal conditions.

Figure 4. An example of tracking through partial occlusion

Fig. 5 shows two other examples of tracking through occlusion. Two people walk through the field of view and are totally occluded by a tree; both are correctly tracked by our method. When the end of the occlusion event is caught by the system, the trackers try to update the model using the results of histogram matching, and the measurements are then used to correct the motion trajectories of the objects. Consequently, some trajectory errors appear in portions of the sequences. Still, the trajectories produced by our method are accurate and even smooth.

Figure 5. Two other examples of tracking through occlusions: (a), (b)

For a scene that contains about five objects, the background subtraction currently runs at about 15 fps, and the tracking process takes a similar amount of time. We therefore believe that our method can also operate in real time and is applicable to real-world tracking problems.

VIII. CONCLUSION AND FUTURE WORK

In this paper, we have proposed a method that can correctly track moving objects before, during, and after occlusion, as seen in the PETS2000 and PETS2001 datasets. A combination of appearance and motion information is effective for tracking moving objects in a complex environment. By dynamically changing the covariances of the process noise and measurement noise, our method makes it possible to locate objects accurately in the image plane even when they are partially or fully occluded. Our method has been tested on several real video sequences from the PETS2000 and PETS2001 datasets, and the experimental results demonstrate its effectiveness, efficiency, and robustness.

The appearance model we use, the color histogram, is sometimes unreliable: when different objects have similarly colored appearances, the method may produce wrong correspondences. In future work, we will focus on a new appearance model that is adequate for real-world, more complex environments. With further optimization, we shall also test our method on live data captured from digital cameras.

ACKNOWLEDGMENT

We thank the High Performance Computing Center of Yunnan University for experimental support. This work was also supported in part by two projects: a project of the Yunnan Provincial Science and Technology Department, "Research of Key Technologies based on Behavior Pattern Recognition and System Development for Intelligent Video Surveillance in Detoxification Institute," and an Innovation Group Project of Yunnan University.

REFERENCES

[1] W. Hu, T. Tan, L. Wang, and S. Maybank, "A survey on visual surveillance of object motion and behaviors," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 34, no. 3, August 2004.
[2] A. Yilmaz, O. Javed, and M. Shah, "Object tracking: A survey," ACM Computing Surveys, vol. 38, no. 4, p. 45, December 2006.
[3] A. F. Bobick et al., "The KidsRoom: A perceptually-based interactive and immersive story environment," Teleoperators and Virtual Environments, vol. 8, pp. 367–391, 1999.
[4] W. Grimson, C. Stauffer, R. Romano, and L. Lee, "Using adaptive tracking to classify and monitor activities in a site," Conference on Computer Vision and Pattern Recognition, pp. 22–29, 1998.
[5] T. H. Chang, S. Gong, and E. J. Ong, "Tracking multiple people under occlusion using multiple cameras," Proc. British Machine Vision Conference, 2000.
[6] S. L. Dockstader and A. M. Tekalp, "Multiple camera fusion for multi-object tracking," Proceedings of the IEEE, vol. 89, pp. 1441–1455, October 2001.
[7] S. Khan and M. Shah, "Tracking people in presence of occlusion," Asian Conference on Computer Vision, 2000.
[8] D. S. Jang and H. I. Choi, "Active models for tracking moving objects," Pattern Recognition, vol. 33, no. 7, pp. 1135–1146, 2000.
[9] N. Peterfreund, "Robust tracking of position and velocity with Kalman snakes," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, pp. 564–569, June 2000.
[10] H. Tao, H. S. Sawhney, and R. Kumar, "Dynamic layer representation with applications to tracking," Proc. Computer Vision and Pattern Recognition, vol. 2, pp. 134–141, 2000.
[11] H. Roh, S. Kang, and S. W. Lee, "Multiple people tracking using an appearance model based on temporal color," Proc. International Conference on Pattern Recognition, vol. 4, pp. 643–646, 2000.
[12] S. Wachter and H. H. Nagel, "Tracking persons in monocular image sequences," Computer Vision and Image Understanding, vol. 74, no. 3, pp. 174–192, 1999.
[13] T. Horprasert, D. Harwood, and L. S. Davis, "A statistical approach for real-time robust background subtraction and shadow detection," IEEE ICCV'99 Frame-Rate Workshop, Corfu, Greece, September 1999.
[14] G. Welch and G. Bishop, "An introduction to the Kalman filter," Technical Report TR 95-041, Department of Computer Science, University of North Carolina at Chapel Hill, 1995.