2015 2nd International Conference on Pattern Recognition and Image Analysis (IPRIA 2015) March 11-12, 2015

Object Tracking with Occlusion Handling Using Mean Shift, Kalman Filter and Edge Histogram

Iman Iraei, Karim Faez
Dept. of Electrical Engineering, Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran
E-mail: [email protected], [email protected]

Abstract—This paper proposes an algorithm that uses Mean Shift and a Kalman Filter for object tracking, and an Edge Histogram for occlusion handling. First, the Mean Shift algorithm is used to obtain the center of the desired object. Because the robustness of this tracking alone is not satisfactory, a Kalman Filter is added to improve the tracking. The Bhattacharyya coefficient and the Edge Histogram are used to detect both partial and full occlusions. With this approach the object can be tracked more accurately. The results show that the resulting tracking is robust.

Keywords—Kalman Filter; Occlusion Handling; Mean Shift; Object Tracking; Edge Histogram; MATLAB Simulation

I. INTRODUCTION

Object tracking is an important topic in computer vision. Tracking is the task of identifying the position of a target in an image sequence. The Kalman Filter, the Mean Shift algorithm and edge detection each play an important role in computer vision and image processing. Target tracking is a key component of video surveillance, vehicle navigation, robotics and industrial automation, and it leads to higher-level processing such as recognition, clustering and re-identification. Tracking algorithms can be classified into two major groups: state space approaches, such as the Particle Filter or the Kalman Filter, and kernel based approaches, such as the Mean Shift algorithm [1], [2]. State space approaches are based largely on probability and estimation theory; their ability to recover from lost tracks makes them among the most widely used tracking algorithms [3], although they may require high computational cost and more memory because of their recursive nature. The Mean Shift algorithm is a nonparametric method. It is an iterative, kernel based, deterministic procedure that converges to a local maximum by computing the mean of the distribution and iterating in the direction of maximum increase in probability density [4]. Mean Shift has advantages and disadvantages. One of its most important disadvantages is that it does not work properly when the color of the target candidate is similar to the background, in which case important information may be lost. Another problem occurs when the target is occluded [1], [5], [6], [7]. Hence Mean Shift cannot track the object in all frames. Therefore, this paper proposes a tracking algorithm that is largely invariant to occlusions and to different backgrounds. When there is no occlusion, the Mean Shift algorithm runs and, simultaneously, the Kalman Filter receives the centers obtained by Mean Shift and estimates the next position, velocity and acceleration. The Bhattacharyya coefficient of the target candidate obtained by Mean Shift and the Bhattacharyya coefficient of the candidate predicted by the Kalman Filter are compared, and the one with the greater coefficient wins. The winning object location is stored in memory as the trajectory. When an occlusion occurs, the algorithm must first decide whether it is partial or full. In the proposed method the Edge Histogram is used to distinguish the two cases: the occlusion is partial if the magnitude of the Edge Histogram of the target candidate is greater than a threshold, and full if it is below the threshold. Although occlusion handling has been studied by several researchers, little attention has been paid to the use of the Edge Histogram for this problem. Thus, in this paper we present a new algorithm to overcome the occlusion problem.

The remainder of this paper is organized as follows. Section II presents the object tracking system, including Mean Shift, the similarity measurement and the Kalman Filter. Section III describes the preprocessing and feature analysis methods used to better detect the object during tracking. Section IV describes the proposed algorithm. Section V provides experimental results and shows the performance of the method on a video sequence. Finally, the paper is concluded in Section VI.

II. OBJECT TRACKING SYSTEM

A. Mean Shift Tracking Algorithm

The Mean Shift algorithm is a non-parametric density gradient estimator [8], [9]. It is basically an iterative expectation-maximization clustering algorithm executed within local search regions. The mean shift tracker provides accurate localization and is computationally feasible [4]. A target is typically defined by a rectangular region surrounding a region of interest in an image, and a feature space is chosen to determine a histogram of the pixel distribution in the target region.

The histogram represents the probability density of the feature distribution in the region of interest. The kernel function plays an important role in the algorithm. A widely used form of target representation is the color histogram, because of its invariance to scaling and rotation and its robustness to partial occlusions [2], [4]. Define the target model as its normalized color histogram, $\hat{q} = \{\hat{q}_u\}_{u=1,\dots,m}$:

$\hat{q}_u = C \sum_{i=1}^{n} k\left(\|x_i\|^2\right)\,\delta\left[b(x_i) - u\right]$    (1)

where m is the number of bins and C is a normalization constant. The normalized color distribution of a target candidate $\hat{p}(y) = \{\hat{p}_u(y)\}_{u=1,\dots,m}$ centered at y can be calculated as:

$\hat{p}_u(y) = C_h \sum_{i=1}^{n_h} k\left(\left\|\frac{y - x_i}{h}\right\|^2\right)\delta\left[b(x_i) - u\right]$    (2)

where $\{x_i\}_{i=1,\dots,n_h}$ are the $n_h$ pixel locations of the target candidate in the target area, $b(x_i)$ associates the pixel $x_i$ with its histogram bin, $k(x)$ is the kernel profile [10] with bandwidth h, and $C_h$ is a normalization function defined as:

$C_h = \dfrac{1}{\sum_{i=1}^{n_h} k\left(\left\|\frac{y - x_i}{h}\right\|^2\right)}$    (3)

In order to calculate the likelihood of a candidate we need a similarity function that defines a distance between the model and the candidate. A metric can be based on the Bhattacharyya coefficient, defined between the two normalized histograms $\hat{p}(y)$ and $\hat{q}$ as:

$\rho\left[\hat{p}(y), \hat{q}\right] = \sum_{u=1}^{m} \sqrt{\hat{p}_u(y)\,\hat{q}_u}$    (4)

The result lies between 0 and 1; the closer it is to 1, the more similar the target candidate is to the target model [1].

By taking the Taylor expansion around the target candidate probability values [4], [5], [6], the linear approximation of the Bhattacharyya coefficient can be written as:

$\rho\left[\hat{p}(y), \hat{q}\right] \approx \dfrac{1}{2}\sum_{u=1}^{m} \sqrt{\hat{p}_u(y_0)\,\hat{q}_u} + \dfrac{C_h}{2}\sum_{i=1}^{n_h} w_i\, k\left(\left\|\frac{y - x_i}{h}\right\|^2\right)$    (5)

where

$w_i = \sum_{u=1}^{m} \sqrt{\dfrac{\hat{q}_u}{\hat{p}_u(y_0)}}\;\delta\left[b(x_i) - u\right]$    (6)

Note that the second term of equation (5) is a density estimate of the object centered at y in the current frame, computed with the kernel profile k(x) and weighted by $w_i$. The maximum of this density in the local neighborhood (starting from the last known position of the target) gives the most probable target position in the current frame, and it can be found by employing a mean shift procedure. During this procedure, the center of the target candidate is successively shifted by:

$\hat{y}_1 = \dfrac{\sum_{i=1}^{n_h} x_i\, w_i\, g\left(\left\|\frac{\hat{y}_0 - x_i}{h}\right\|^2\right)}{\sum_{i=1}^{n_h} w_i\, g\left(\left\|\frac{\hat{y}_0 - x_i}{h}\right\|^2\right)}$    (7)

where $\hat{y}_0$ is the current location of the candidate center and $g(x) = -k'(x)$ is the negative derivative of the kernel profile. Since the derivative of the Epanechnikov kernel profile is constant, the above expression reduces to a weighted average of the pixel locations. As mentioned above, the objects' density estimates were weighted by a monotonically decreasing Epanechnikov kernel given by:

$k(x) = \begin{cases} \frac{1}{2}\,c_d^{-1}(d + 2)(1 - x) & \text{if } x \le 1 \\ 0 & \text{otherwise} \end{cases}$    (8)

where $c_d$ is the volume of the unit d-dimensional sphere and x denotes the normalized pixel coordinates within the target, relative to its center. Since we were dealing with a two-dimensional image space, our kernel function was of the form:

$k(x) = \dfrac{2}{\pi}\left(1 - \|x\|^2\right)$    (9)

The rationale for using a kernel to assign smaller weights to pixels farther from the center is that those pixels are the least reliable, since they are the ones most affected by occlusion or interference from the background. A kernel with an Epanechnikov profile was essential for the derivation of the smooth similarity function between the distributions, since its derivative is constant; thus the kernel masking led to a function suitable for gradient optimization, which gives the direction of the target's movement [4]. The search for the matching target candidate is therefore restricted to a much smaller area and is much faster than an exhaustive search. Other kernel profiles, such as the normal or the uniform kernel, may be used, but they have little impact on the localization accuracy of Mean Shift. To track the target, the Mean Shift algorithm iterates the following steps:
1. Choose a search window size and the initial location of the search window.
2. Compute the mean location in the search window.
3. Center the search window at the mean location computed in Step 2.

4. Repeat Steps 2 and 3 until convergence (or until the mean location moves less than a preset threshold).
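As an illustration of the procedure above, the following Python/NumPy sketch (an assumption for illustration, not code from the paper) implements the weight computation of equation (6) and the update of equation (7) for the Epanechnikov kernel; because the kernel's derivative is constant, the update reduces to a weighted average of pixel coordinates. The function names, data layout and convergence threshold are hypothetical.

import numpy as np

def epanechnikov_k(coords, center, h):
    """Kernel profile k(||(y - x_i)/h||^2) of eq. (9), up to normalization."""
    d2 = np.sum(((coords - center) / h) ** 2, axis=1)
    return np.where(d2 <= 1.0, 1.0 - d2, 0.0)

def epanechnikov_g(coords, center, h):
    """g(x) = -k'(x): constant inside the unit ball, zero outside."""
    d2 = np.sum(((coords - center) / h) ** 2, axis=1)
    return (d2 <= 1.0).astype(float)

def weighted_histogram(bins, kernel_weights, m):
    """Normalized kernel-weighted histogram over m bins (eqs. 1-3)."""
    hist = np.bincount(bins, weights=kernel_weights, minlength=m).astype(float)
    s = hist.sum()
    return hist / s if s > 0 else hist

def mean_shift_iteration(coords, bins, q_model, y0, h, m, max_iter=20, eps=0.5):
    """Iterate eqs. (6)-(7) until the candidate center converges.

    coords : (n_h, 2) pixel coordinates of the candidate/search region
    bins   : (n_h,)   histogram bin index b(x_i) of each pixel
    q_model: (m,)     normalized target model histogram, eq. (1)
    """
    center = np.asarray(y0, dtype=float)
    for _ in range(max_iter):
        k = epanechnikov_k(coords, center, h)
        p_cand = weighted_histogram(bins, k, m)           # eq. (2) at the current center
        # sqrt(q_u / p_u(y0)); the small floor avoids division by zero
        ratio = np.sqrt(q_model / np.maximum(p_cand, 1e-12))
        w = ratio[bins]                                   # weights w_i, eq. (6)
        g = epanechnikov_g(coords, center, h)
        den = np.sum(w * g)
        if den == 0:
            break
        new_center = np.sum(coords * (w * g)[:, None], axis=0) / den  # eq. (7)
        if np.linalg.norm(new_center - center) < eps:
            center = new_center
            break
        center = new_center
    return center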

B. Kalman Filter

The Mean Shift algorithm is not well suited for tracking an object in the presence of full occlusion. The Kalman Filter belongs to the state space class of tracking algorithms. It solves the tracking problem based on a state equation and a measurement equation, so object tracking with the Kalman Filter is divided into two steps: prediction and correction. In this paper the Kalman Filter estimates the position, velocity and acceleration of the object in each frame of the sequence, under the assumption that the change in the target's speed is limited, in other words, that the acceleration of the target is not very high. To simplify the problem, we assume that the system noise and the observation noise are white and uncorrelated with each other [5], [11]. We predict the location, velocity and acceleration of the moving target after its center has been found by the Mean Shift algorithm. These quantities form the state vector and the measurement vector of the Kalman Filter. The state vector is composed of the position, velocity and acceleration of the target at time $t_k$:

$S_k = (x_k, y_k, \dot{x}_k, \dot{y}_k, \ddot{x}_k, \ddot{y}_k)^T$    (10)

The measurement vector is composed of the position $(x_c, y_c)$, the center of mass of the object, at time $t_k$:

$Z_k = (x_c, y_c)^T$    (11)

The Kalman Filter estimates the state S as a discrete process. This state is modeled by the linear equation:

$S_k = A S_{k-1} + w_{k-1}$    (12)

where A is the transition matrix (13), $w_{k-1}$ is the process noise, of the form (14), and dt is the time difference between the instants k and k-1 (dt = 1):

$A = \begin{bmatrix} 1 & 0 & dt & 0 & \frac{dt^2}{2} & 0 \\ 0 & 1 & 0 & dt & 0 & \frac{dt^2}{2} \\ 0 & 0 & 1 & 0 & dt & 0 \\ 0 & 0 & 0 & 1 & 0 & dt \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix}$    (13)

$w_{k-1} = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 \end{bmatrix}^T$    (14)

The measurement model is defined by equation (15):

$Z_k = H S_k + V_k$    (15)

where H is the measurement matrix (16) and the measurement noise $V_k$ is given in (17):

$H = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \end{bmatrix}$    (16)

$V_k = \begin{bmatrix} 0.1 \\ 0.1 \end{bmatrix}$    (17)

The covariances of the process noise $w_{k-1}$ and the measurement noise $V_k$ are given by the matrices (18) and (19):

$Q = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix}$    (18)

$R = \begin{bmatrix} 15 & 0 \\ 0 & 15 \end{bmatrix}$    (19)

The output equations for the prediction and correction blocks of the Kalman Filter are as follows.

Prediction equations:

$\hat{S}_k^- = A \hat{S}_{k-1}$    (20)

$P_k^- = A P_{k-1} A^T + Q$    (21)

Correction equations:

$K_k = P_k^- H^T (H P_k^- H^T + R)^{-1}$    (22)

$\hat{S}_k = \hat{S}_k^- + K_k (Z_k - H \hat{S}_k^-)$    (23)

$P_k = P_k^- - K_k H P_k^-$    (24)

The prediction equations project the estimated state variables and the error covariance forward in time, while the correction equations provide the feedback, incorporating the new measurement into the a priori estimate to obtain an improved a posteriori estimate. The expression of the estimation error [8] is given by:

$E_x = X_{Kalman} - X_{Mean\,Shift}$    (25)
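A minimal Python/NumPy sketch of the filter described above is given below (an illustration, not the authors' MATLAB code), using the matrices A, H, Q and R of equations (13), (16), (18) and (19) with dt = 1; the variable names are assumptions.

import numpy as np

dt = 1.0
# State transition matrix A, eq. (13): constant-acceleration model for (x, y).
A = np.array([
    [1, 0, dt, 0, dt**2 / 2, 0],
    [0, 1, 0, dt, 0, dt**2 / 2],
    [0, 0, 1, 0, dt, 0],
    [0, 0, 0, 1, 0, dt],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1],
])
# Measurement matrix H, eq. (16): only the position (x, y) is observed.
H = np.array([
    [1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
], dtype=float)
Q = np.eye(6)                 # process noise covariance, eq. (18)
R = 15.0 * np.eye(2)          # measurement noise covariance, eq. (19)

def kf_predict(s, P):
    """Prediction equations (20) and (21)."""
    s_pred = A @ s
    P_pred = A @ P @ A.T + Q
    return s_pred, P_pred

def kf_correct(s_pred, P_pred, z):
    """Correction equations (22)-(24); z is the measured center (x_c, y_c)."""
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)        # Kalman gain, eq. (22)
    s = s_pred + K @ (z - H @ s_pred)          # eq. (23)
    P = P_pred - K @ H @ P_pred                # eq. (24)
    return s, P

# Usage: during full occlusion only kf_predict is applied, since no
# measurement (Mean Shift center) is available.
s = np.zeros(6)               # [x, y, vx, vy, ax, ay]
P = np.eye(6)
s_pred, P_pred = kf_predict(s, P)
s, P = kf_correct(s_pred, P_pred, z=np.array([120.0, 80.0]))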

III. IMAGE FEATURES

A. Color Features

Since one of the biggest issues in visual tracking is the robustness of the algorithm under changing video conditions, including illumination and shape changes [4], the first problem we addressed was the choice of the color space in which our algorithm would operate. We needed a color model that is invariant to illumination changes and to changes of the object's shape [12], [13], [14], both of which are present in most video sequences. The easiest alternative was the normalized RGB color space, since it is invariant to viewpoint, illumination and object shape changes. For our task we could also have used hue and saturation, which are even less dependent on changing conditions. However, since we decided to use color histograms as the representation of the objects' color probability density functions, the three channels of the normalized RGB color space give better discriminating power with a three-dimensional histogram than hue and saturation would with a two-dimensional representation.

For our implementation, we closely followed the choices and the algorithm above and decided to use color histograms in the normalized RGB model. The input frames were first converted to the normalized RGB space: since I = R + G + B, all pixel values were divided by the sum of their R, G and B components, eliminating the intensity information from the color and thus the colors' dependency on it. Then a weighted 3D histogram of the three components was calculated; the number of bins in each dimension was restricted to 16, which was estimated to give enough discriminating power to the object's color distribution. The weighting kernels were adapted to the size of the target by the choice of the smoothing parameter h, which normalizes the target's rectangular region to a unit circle (by dividing each distance coordinate independently by $h_x$ and $h_y$).
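The binning step can be sketched in Python/NumPy as follows (an illustrative assumption, not the paper's implementation): each frame is converted to normalized RGB and every pixel is mapped to one of the 16×16×16 bins, yielding the bin index b(x_i) used in equations (1) and (2).

import numpy as np

BINS = 16  # bins per channel, i.e. a 16x16x16 color histogram

def normalized_rgb(frame):
    """Divide each pixel by I = R + G + B, removing intensity information."""
    frame = frame.astype(float)
    intensity = frame.sum(axis=2, keepdims=True)
    return frame / np.maximum(intensity, 1e-12)

def color_bin_indices(frame):
    """Map every pixel to a single index in [0, BINS**3) of the 3-D histogram."""
    rgb = normalized_rgb(frame)
    # normalized channels lie in [0, 1]; quantize each into BINS levels
    q = np.minimum((rgb * BINS).astype(int), BINS - 1)
    return q[..., 0] * BINS * BINS + q[..., 1] * BINS + q[..., 2]

# Example on a random 8-bit color frame
frame = np.random.randint(0, 256, size=(240, 320, 3), dtype=np.uint8)
bins = color_bin_indices(frame)   # shape (240, 320), values in [0, 4096)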

B. Edge Features

To achieve the best tracking performance we also use edge features, since they help localize the target within the parts of the image where the target is most strongly present [7]. Edges describe the structure of an image and provide useful descriptive information for object tracking when the objects in a scene have similar colors. Therefore a two-dimensional edge histogram is used, which captures the magnitude of the edge feature in different parts of the rectangular region of the target candidate. Since partial or full occlusions may occur in the tracking scene, this edge histogram helps us locate the target more accurately, and by choosing a threshold we distinguish between partial and full occlusion. The gray scale image is convolved with horizontal and vertical edge operators. Suppose that, for a particular pixel of the image, the outputs of these two operations are $\nabla H$ and $\nabla V$, respectively. Then a gradient vector for this pixel is calculated with the following formulas.

Magnitude of the vector:

$|G| = \sqrt{\nabla H^2 + \nabla V^2}$    (26)

Angle of the vector:

$\theta = \tan^{-1}\left(\dfrac{\nabla V}{\nabla H}\right)$    (27)

where the edge direction $\theta(x, y)$ lies in the range $0 \le \theta(x, y) \le 360$ degrees. Edges were filtered, and only edges with magnitudes above a threshold were considered in the edge feature histogram.

Fig. 1 The left images are normalized target candidates and the right images are weighted target candidates for frames 3, 6 and 10
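A possible Python/NumPy realization of equations (26) and (27) and of the threshold test is sketched below. The Sobel kernels, the use of scipy.ndimage, the threshold values and the occlusion-classification helper are illustrative assumptions, not choices stated in the paper.

import numpy as np
from scipy.ndimage import convolve

# Horizontal and vertical edge operators (Sobel kernels assumed here).
KX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
KY = KX.T

def edge_magnitude_and_angle(gray):
    """Gradient magnitude (eq. 26) and angle in [0, 360) degrees (eq. 27)."""
    gh = convolve(gray.astype(float), KX)
    gv = convolve(gray.astype(float), KY)
    mag = np.sqrt(gh ** 2 + gv ** 2)
    ang = np.degrees(np.arctan2(gv, gh)) % 360.0
    return mag, ang

def edge_histogram_magnitude(gray_patch, mag_threshold=50.0):
    """Total magnitude of edges above the threshold inside the candidate box."""
    mag, _ = edge_magnitude_and_angle(gray_patch)
    return float(mag[mag > mag_threshold].sum())

def classify_occlusion(edge_mag, full_threshold):
    """Partial occlusion if the edge-histogram magnitude stays above the
    threshold, full occlusion if it drops below it (hypothetical helper)."""
    return "partial" if edge_mag > full_threshold else "full"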

IV. PROPOSED ALGORITHM

Fig. 2 Projections of a 3D RGB color histogram

First, the boundary of the desired object is selected. From its position the center of the object is calculated and the rectangular tracking region is obtained (initialization of the location of the search window). The histogram of the target model is then built from the chosen target model features. After initializing the iteration number, the Mean Shift algorithm is started. In every frame the histogram of the target candidate and the new center are updated and a new mean shift vector is calculated; this continues until the exact center for tracking has been found. Once the best center has been produced by Mean Shift, this position is fed into the Kalman Filter, and because measurement data is available, the Kalman Filter starts predicting and correcting the new position. Finally, the Bhattacharyya coefficient of the histogram at the position calculated by Mean Shift and the Bhattacharyya coefficient of the histogram at the position obtained by the Kalman Filter are compared, and the one that is closer to 1 wins as the best position for tracking. This process continues until an occlusion occurs. At that point the Bhattacharyya coefficient and the magnitude of the edge histogram of the target candidate decrease sharply, which tells us that an occlusion has occurred. When there is a full occlusion we have no measurement data, since no center can be obtained, so only the prediction equations of the Kalman Filter are used to estimate the position of the target in the following frames.
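The per-frame logic described above can be summarized by the following Python sketch. It is a simplified reconstruction, not the authors' code: it assumes that the helper functions from the earlier sketches (color_bin_indices, epanechnikov_k, weighted_histogram, mean_shift_iteration, bhattacharyya_coefficient, kf_predict, kf_correct, edge_histogram_magnitude) are in scope, and the thresholds and search-window handling are illustrative assumptions.

import numpy as np

def candidate_histogram(pix_bins, coords, center, h, m):
    """Kernel-weighted candidate histogram (eq. 2) at a given center,
    built from the helpers sketched in the previous sections."""
    k = epanechnikov_k(coords, center, h)
    return weighted_histogram(pix_bins, k, m)

def track_frame(frame, gray_patch, q_model, coords, h, m,
                kf_state, kf_cov, last_center,
                occ_threshold=0.4, edge_threshold=1e3):
    """One iteration of the proposed tracker.

    coords holds the integer (x, y) pixel coordinates of a search window
    around last_center; the thresholds are illustrative values.
    """
    frame_bins = color_bin_indices(frame)
    pix_bins = frame_bins[coords[:, 1], coords[:, 0]]

    # Mean Shift center and its Bhattacharyya similarity to the model.
    ms_center = mean_shift_iteration(coords, pix_bins, q_model, last_center, h, m)
    rho_ms = bhattacharyya_coefficient(
        candidate_histogram(pix_bins, coords, ms_center, h, m), q_model)

    # Kalman Filter prediction and its similarity.
    s_pred, P_pred = kf_predict(kf_state, kf_cov)
    kf_center = s_pred[:2]
    rho_kf = bhattacharyya_coefficient(
        candidate_histogram(pix_bins, coords, kf_center, h, m), q_model)

    # Occlusion handling: when both similarities collapse, the edge-histogram
    # magnitude of the candidate decides between partial and full occlusion.
    if max(rho_ms, rho_kf) < occ_threshold:
        if edge_histogram_magnitude(gray_patch) < edge_threshold:
            # Full occlusion: no measurement, use the prediction only.
            return kf_center, s_pred, P_pred

    # Otherwise the center with the larger coefficient wins and is fed
    # back to the Kalman Filter as the measurement.
    best = ms_center if rho_ms >= rho_kf else kf_center
    s_new, P_new = kf_correct(s_pred, P_pred, np.asarray(best, dtype=float))
    return best, s_new, P_new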

V. EXPERIMENTAL RESULTS

We implemented the proposed algorithm in MATLAB. The algorithm tracks the object from one frame to the next in approximately 0.019 seconds on a PC (Core i7 with 8 GB RAM). The proposed algorithm's performance is compared with other tracking algorithms. The algorithm uses 16×16×16 RGB color histograms. Fig. 3 shows the magnitude of the edge histogram; it is evident that between frames 360 and 385 the magnitude decreases, because there is no visible target during the full occlusion, and this is how the method detects when occlusion occurs. Fig. 4 shows the two Bhattacharyya coefficients obtained from Mean Shift and from the Kalman Filter. As the graph shows, in some frames the Bhattacharyya coefficient resulting from the Kalman Filter's position estimate is greater than the Mean Shift one, and the Bhattacharyya coefficient drops at frame 356 when the full occlusion occurs. The visual tracking performance of the proposed algorithm can be seen in Fig. 5. The sequence is taken from the TLD dataset for tracking a car: the green rectangle shows the Mean Shift position estimate, the red rectangle shows the Kalman Filter position estimate, and the white rectangle shows the Kalman Filter prediction during the occlusion; in some frames the red rectangle is the more accurate estimate. As can be seen from the video and from Fig. 5, the algorithm is robust to conditions such as background color similarity and object disappearance.

Fig. 4 The blue line shows the Bhattacharyya coefficient of Mean Shift and the red line shows the Bhattacharyya coefficient of the Kalman Filter

Fig. 3 Magnitude of the Edge Histogram over 408 frames

Fig. 5 The result of the experiment in the presence of partial and full occlusion. The green rectangle represents the Mean Shift tracker, the red rectangle represents the Kalman Filter's estimate as a tracker, and the white rectangle shows the Kalman Filter's prediction during the occlusion

We tested this algorithm on 408 frames. Note that when only Mean Shift is used, the algorithm can act as a tracker only before the occlusion occurs; therefore the numbers reported for that algorithm are the numbers of correctly and incorrectly tracked frames before the occlusion. More detailed results are tabulated in Table 1.


Table 1. Processing time and number of frames tracked correctly by each algorithm (MS: Mean Shift, KF: Kalman Filter)

Algorithm                Number of True Frames   Number of False Frames   Time for each Frame
MS [1]                   338                     28                       0.0052 sec
MS+KF [1]                392                     16                       0.0133 sec
MS+KF+EDGE [Proposed]    401                     7                        0.0191 sec

VI. CONCLUSION

A new tracking algorithm using Mean Shift, the Kalman Filter and the Edge Histogram has been presented. We proposed a real-time tracking algorithm that copes with occlusion at a small computational cost. It can also be concluded that tracking differs from one object to another and that several parameters can affect the results of tracking. The algorithm performs robustly in complex scenes where background color similarity or partial and full occlusions occur. Experimental results show that our algorithm (MS + KF + EDGE) is superior in terms of selecting the best object position, although, as the table shows, the execution time per frame increases. The edge histogram helps us detect occlusion more reliably.

REFERENCES

[1] R. Panahi, I. Golamipour, and M. Jamzad, "Real Time Occlusion Handling Using Kalman Filter and Mean-Shift," 8th Iranian Conference on Machine Vision and Image Processing (MVIP), pp. 320-323, September 2013.
[2] M. S. Arulampalam, "A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking," IEEE Transactions on Signal Processing, vol. 50, no. 2, pp. 174-188, February 2002.
[3] J. Ren and J. Hao, "Mean shift tracking algorithm combined with Kalman Filter," 5th International Congress on Image and Signal Processing (CISP), pp. 727-730, October 2012.
[4] D. Comaniciu and P. Meer, "Mean Shift: A Robust Approach Toward Feature Space Analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 5, pp. 603-619, May 2002.
[5] C. Mu, Z. Yuan, J. Song, and Y. Chen, "A New Approach to Track Moving Target with Improved Mean Shift Algorithm and Kalman Filter," 4th International Conference on Intelligent Human-Machine Systems and Cybernetics, vol. 1, pp. 359-362, August 2012.
[6] D. Comaniciu, V. Ramesh, and P. Meer, "Kernel-Based Object Tracking," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 5, pp. 564-577, May 2003.
[7] B. Z. de Villiers, W. A. Clarke, and P. E. Robinson, "Mean Shift Object Tracking with Occlusion Handling," 23rd Annual Symposium of the Pattern Recognition Association of South Africa (PRASA), pp. 192-199, November 2012.
[8] K. Fukunaga and L. Hostetler, "The estimation of the gradient of a density function, with applications in pattern recognition," IEEE Transactions on Information Theory, vol. 21, no. 1, pp. 32-40, January 1975.
[9] B. Georgescu, I. Shimshoni, and P. Meer, "Mean shift based clustering in high dimensions: A texture classification example," International Conference on Computer Vision, vol. 1, pp. 456-463, October 2003.
[10] M. Singh and N. Ahuja, "Regression based bandwidth selection for segmentation using Parzen windows," International Conference on Computer Vision, vol. 1, pp. 2-9, October 2003.
[11] A. Salhi and A. Y. Jammoussi, "Object tracking system using Camshift, Meanshift and Kalman filter," International Science Index, vol. 6, no. 64, pp. 674-677, April 2012.
[12] G. Hager and P. Belhumeur, "Efficient region tracking with parametric models of geometry and illumination," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 10, pp. 480-484, October 1998.
[13] D. Cremers, M. Rousson, and R. Deriche, "A review of statistical approaches to level set segmentation: Integrating color, texture, motion and shape," International Journal of Computer Vision, vol. 72, no. 2, pp. 195-215, April 2007.
[14] H. Jin, P. Favaro, and S. Soatto, "Real-time feature tracking and outlier rejection with changes in illumination," Proceedings of the International Conference on Computer Vision, pp. 684-689, July 2001.
