Robust Multiple Target Tracking Under Occlusion Using Fragmented Mean Shift and Kalman Filter
Gargi Phadke
Indian Institute of Technology Bombay, Mumbai, India 400076
Email: [email protected]
Abstract—Object tracking is critical to visual surveillance and activity analysis. A major issue in multiple visual target tracking is occlusion handling. In this paper, we investigate how to improve the robustness of visual tracking for multiple targets under occlusion. We propose a weighted fragment-based mean shift tracker combined with a Kalman filter, built on the color features of the target. The discrete wavelet transform is used to detect the target automatically: the inter-frame difference of the LL subband is used for detection, and fragments are obtained automatically by calculating the mean and standard deviation of the detected target. The weights of the fragments are derived from the likelihood function of the foreground and background of each fragment, using color histograms. The output of the weighted fragmented mean shift is updated with the help of a Kalman filter. The proposed tracking algorithm has been tested on several challenging videos of different situations and compared with the mean shift method using the Bhattacharyya coefficient and Bhattacharyya distance. Extensive experiments confirm the robustness and reliability of the proposed method.

I. INTRODUCTION

Object detection and tracking are important in any vision-based surveillance system and are among the most demanding research areas in computer vision, with applications such as video surveillance, robot localization, and driver assistance. Fast object motion, changes in appearance, clutter, and especially occlusion make tracking an object in video a challenging task. Generally, tracking algorithms can be categorized into two major groups: mean shift trackers [1] and particle filtering based trackers. Although particle filter based tracking is robust, its high computational cost restricts its application in real-time scenarios. The mean shift tracker proposed by Comaniciu et al. [1] has the advantages of low complexity, robustness, and invariance to object deformations, but it fails under fast motion, illumination changes, and occlusion, and various approaches have been proposed to overcome these drawbacks. Yilmaz et al. [16] gave a detailed literature survey of video tracking methods. Yang presented a new similarity measure between the target and the target candidate in place of the Bhattacharyya metric [2]. [3] and [4] introduced modifications of mean shift to improve video tracking; these approaches account for partial occlusion and pose changes by representing the target in multiple fragments. Jeyakar et al. also used fragmentation [6], [7], as did multiple-feature methods [13]. [12] used mean shift with initialization of particle filtering for multiple target tracking.

In the following section, we describe the method of automatic detection of the target. In Section III, the target appearance model is explained in detail. In Section IV, foreground separation is done using a likelihood ratio, followed by a detailed description of the fragmented weighted mean shift and the Kalman filter used to update the target center. Section VII deals with results and discussion, and future work is described in Section VIII.
II. TARGET IDENTIFICATION
Initialization of the target is an important process in tracking. Most of the time it is done manually; to make the system automatic, the target should be detected automatically. Here we have used the wavelet-based method described in [9], with a proposed modification to reduce computational cost. The Discrete Wavelet Transform (DWT) is adopted to detect the moving target, since most of the unwanted motion in the background is decomposed into the higher-frequency subbands. We decompose the image into sub-images using the two-dimensional DWT up to level 3. The low-frequency subband (approximate or LL band) is used for further processing to reduce computing cost. This also makes the approach less susceptible to noise, as the approximate band has less noise than the original image. The following subsection describes the detection method.
A. Inter Frame Difference
Let Diff(x,y) be the inter-frame difference between the approximate bands of two neighboring frames, where (x,y) is the position of the wavelet coefficient, as in equation 1. D(x,y) is calculated using equation 2, generating a binary image D. Diff(x,y) will have a value less than the threshold when the object is not moving.
Diff(x,y) = |LL[3]_n(x,y) - LL[3]_{n-1}(x,y)|   (1)

D(x,y) = { 1, if Diff(x,y) > T; 0, otherwise }   (2)
where LL[3]_n(x,y) is the wavelet coefficient of the LL band of frame n at position (x,y). The value of the threshold T is found empirically. A value of D(x,y) = 1 indicates motion; this helps in identifying the moving target automatically, as the target will

978-1-4244-9799-7/11/$26.00 ©2011 IEEE
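As an illustrative sketch of equations (1) and (2), the following pure-Python fragment builds the level-3 LL approximation by simple 2x2 block averaging per level (a Haar-style stand-in; a real implementation would use a wavelet library such as PyWavelets) and thresholds the inter-frame difference. The function names and the averaging shortcut are our own assumptions, not from the paper.

```python
def ll_subband(img, levels=3):
    """Low-frequency (LL) approximation of a separable DWT, obtained here
    by simple 2x2 block averaging per level (a Haar-style stand-in)."""
    for _ in range(levels):
        h, w = len(img), len(img[0])
        img = [[(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4.0
                for x in range(0, w - 1, 2)]
               for y in range(0, h - 1, 2)]
    return img

def motion_mask(frame_prev, frame_cur, T, levels=3):
    """Binary image D of equations (1)-(2): threshold the absolute
    inter-frame difference of the level-3 LL subbands."""
    a = ll_subband(frame_prev, levels)
    b = ll_subband(frame_cur, levels)
    return [[1 if abs(b[y][x] - a[y][x]) > T else 0
             for x in range(len(a[0]))]
            for y in range(len(a))]
```

Working on the LL band at level 3 shrinks each frame by a factor of 8 per axis, which is what keeps the per-frame detection cost low.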
Fig. 1. The 8 neighbors of pixel 'p'.

Fig. 2. (a) Inter-frame difference image (b) detected target with bounding box (c) fragmented target (d) target with foreground and background windows.
be 'white' in the binary image D. However, the value of the pixels near the center of the moving object is almost zero, giving 'black' pixels in the center of the target, like a 'hole', in the binary image D. We fill this hole using a morphological closing operation. In the resultant binary image, the region which belongs to the target is 'white', as shown in Figure 2(a).
B. Image Labeling
Scan the binary image D(x,y) pixel by pixel, from left to right and top to bottom. Let 'p' denote the pixel at any step of the scanning process. Out of the 8 neighboring pixels shown in Figure 1, we consider only four pixels at a time; this is useful for finding the extreme left and right corners of the target. The labeling procedure is as described in [19]. The extreme left corner is detected by considering pixel neighbors 1, 2, 3 and 4 of Figure 1, whose values should all be zero. Similarly, for the extreme right corner we consider pixel neighbors 0, 7, 6 and 5, whose values should all be zero. In this way we can find multiple targets, as given in equations 3 and 4. Using the inter-band spatial relationship of the discrete wavelet transform, the extreme points can be determined in the original image.

B_i^min = (x_i^min, y_i^min)   (3)

B_i^max = (x_i^max, y_i^max)   (4)

Here i indexes the moving objects. B_i^min is the left-top corner coordinate of moving object i and B_i^max is its right-bottom corner coordinate. Figure 2(b) shows the extracted targets with bounding boxes.

III. TARGET APPEARANCE MODELING

The previous section described the automatic detection of the target; in this section we describe the development of the target model. To improve the performance of the mean shift tracker, a fragmented model is developed. Here the fragmentation is done automatically.

A. Fragmentation

Fragmentation is an important step for improving the robustness of trackers. The choice of fragments is not specially restricted, but less distinctive fragments may not reflect the true motion of the target. To obtain proper fragments, a vertical projection method is used. The mean of each row of the extracted target region is calculated to form the vector v = (v1, v2, v3, ..., vn) [7], where n is the height of the target. Then, using the mean vector v, a new vector sv is computed as the difference between the next and the previous mean value, as in equation 5.

sv_i = |v_{i+1} - v_{i-1}|   (5)

Using equation 5 we get the vector sv = (sv1, sv2, sv3, ..., svn). The fragments are extracted using the threshold value given in equation 6.

T = mean(sv) + a * std(sv)   (6)

Here a is a constant (we have taken a = 2) and std is the standard deviation. Using this threshold, the points are segmented vertically; the result is shown in Figure 2(c). A minimum of four fragments should be used to achieve robustness. Once the target is extracted and fragmented, the target model is developed considering the color histogram feature as given in [1].

IV. WEIGHTED MEAN SHIFT TRACKER

The basic mean shift tracker considers only foreground features of the target and target candidate [1]. In the proposed method, the foreground and background of each fragment are used for feature extraction.
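The row-projection fragmentation of equations (5) and (6) might be sketched as follows; the function name and the list-based image representation are illustrative assumptions.

```python
import statistics

def fragment_rows(region):
    """Vertical-projection fragmentation of equations (5)-(6). `region` is
    the extracted target as a 2-D list of gray values; returns the row
    indices chosen as fragment boundaries. a = 2 follows the text."""
    # v_i: mean of each row of the target region
    v = [sum(row) / len(row) for row in region]
    # sv_i = |v_{i+1} - v_{i-1}| for the interior rows
    sv = [abs(v[i + 1] - v[i - 1]) for i in range(1, len(v) - 1)]
    # T = mean(sv) + a * std(sv)
    T = statistics.mean(sv) + 2 * statistics.pstdev(sv)
    # rows whose projection difference exceeds T become cut points
    return [j + 1 for j, s in enumerate(sv) if s > T]
```

Rows where the projection changes sharply (for example the neck or waist of a walking person) end up above the threshold and become the fragment boundaries.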
A. Foreground Feature Extraction
The likelihood of a color being found in the foreground of the region of interest is calculated [6]. Two windows, i.e., two areas, are considered. One is the foreground window, which represents the target. The other is the background window, whose area is twice that of the foreground (target) window; that is, we also consider the area around the target, as shown in Figure 2(d). The joint
color histograms h_ob and h_bg in RGB space are calculated over the target and background windows. Here h_ob is the histogram of the target and h_bg is the histogram of the background. L(x_i), the likelihood that a particular pixel belongs to the foreground, is given as
L(x_i) = log( max(h_ob[b(x_i)], ε) / max(h_bg[b(x_i)], ε) )   (7)
where b(x_i) is the color bin of the pixel at x_i and ε is included in the equation to avoid numerical instability. The likelihood is thresholded to identify whether the pixel belongs to the foreground, as given in equation 8.
T(x_i) = { 1, if L(x_i) > Th; 0, otherwise }   (8)

This likelihood is integrated with the basic mean shift target model. The threshold Th used in our experiments is 0.8.
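A minimal sketch of the likelihood ratio of equations (7) and (8), assuming normalized RGB histograms flattened into 1-D bin arrays; the helper names are ours.

```python
import math

def color_likelihood(fg_hist, bg_hist, eps=1e-6):
    """Per-bin log-likelihood ratio of equation (7):
    L = log(max(h_ob, eps) / max(h_bg, eps))."""
    return [math.log(max(f, eps) / max(b, eps))
            for f, b in zip(fg_hist, bg_hist)]

def foreground_mask(pixel_bins, fg_hist, bg_hist, th=0.8):
    """Equation (8): a pixel whose bin likelihood exceeds Th (0.8 in the
    experiments reported here) is marked as foreground."""
    L = color_likelihood(fg_hist, bg_hist)
    return [1 if L[b] > th else 0 for b in pixel_bins]
```

Colors that are much more frequent in the target window than in the surrounding background window get a large positive likelihood and survive the threshold.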
B. Likelihood Weight Calculation
Unlike the basic mean shift tracker, here we add a weight [6]. Let L_u be the likelihood calculated for the u-th histogram bin; it gives a measure of probability for that particular color. But L_u may be positive or negative, so to avoid this, a sigmoid mapping A_u is used, as given in equation 9.
A_u = max( 1 - 1 / (1 + exp[-(L_u - a)/b]), 0.1 )   (9)
Here a is based on the foreground region and b controls the slope of the mapping. We considered the values (a, b) = (1, 1).
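A sketch of the weight mapping of equation (9), with (a, b) = (1, 1) as in the text; the function name is illustrative.

```python
import math

def sigmoid_weight(L_u, a=1.0, b=1.0):
    """Equation (9): map the (possibly negative) bin likelihood L_u to a
    positive weight A_u via a sigmoid, floored at 0.1."""
    return max(1.0 - 1.0 / (1.0 + math.exp(-(L_u - a) / b)), 0.1)
```

The floor of 0.1 keeps every bin's weight strictly positive, so no color is discarded entirely from the kernel sums that follow.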
C. Mean Shift Vector
The target model is modified as in equation 10, using the weight computed in equation 9.

q_u = C Σ_{i=1}^{m} k(||x_i||^2) A_u δ[b(x_i) - u]   (10)
and the target candidate considering color is computed as follows:

p_u = C Σ_{i=1}^{m} k(||x_i||^2) A_u δ[b(x_i) - u]   (11)
Unlike the basic mean shift tracker, here we consider the fragments of the individual target, and mean shift tracking is applied separately to each individual fragment. Since the vectors p_u and q_u have the same length, the Bhattacharyya distance is still a valid metric. Hence we use the Bhattacharyya coefficient ρ as given in equation 12 and the Bhattacharyya distance as given in equation 13.

ρ = Σ_{u=1}^{m} √(p_u q_u)   (12)

d = √(1 - ρ)   (13)
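The Bhattacharyya coefficient and distance of equations (12) and (13) reduce to a few lines; this sketch assumes normalized histograms of equal length.

```python
import math

def bhattacharyya(p, q):
    """Equations (12)-(13): coefficient rho and distance d = sqrt(1 - rho)
    between two normalized histograms of equal length."""
    rho = sum(math.sqrt(pu * qu) for pu, qu in zip(p, q))
    return rho, math.sqrt(max(1.0 - rho, 0.0))
```

Identical histograms give ρ = 1 and d = 0; disjoint histograms give ρ = 0 and d = 1, which is why a well-tracked fragment shows a high coefficient and a small distance.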
The calculation of the mean shift vector and the tracking are done as in [1]. The new center for each fragment of the target is given in equation 14.

y_1 = Σ_{i=1}^{n} x_i w_i g(||(y_0 - x_i)/h||^2) / Σ_{i=1}^{n} w_i g(||(y_0 - x_i)/h||^2)   (14)
V. FRAGMENT BASED WEIGHTED MEAN SHIFT TRACKER
The previous section explained how to improve the basic mean shift tracker considering the color features of the entire target. Fragmented mean shift trackers have been used [6], [7], [11] to handle partial occlusion, but they consider only the maximum Bhattacharyya coefficient when updating the fragment centers. Here we consider all the centers when finding the new center of the total target, as given in equation 15.

center = Σ_{i=1}^{f} ρ_i (y_i - d_i) / Σ_{i=1}^{f} ρ_i   (15)
Here i indicates the fragment number, f is the total number of fragments, and d_i is the distance of the fragment from the center of the target; this distance always remains the same. ρ_i gives the Bhattacharyya coefficient of each fragment. The new center is updated using Kalman filtering, as explained in the next section. Using the final center, new centers are then calculated for all fragments to initialize them for the next frame: y_j0 is the updated center of fragment j, d_j is the distance of the fragment center from the target center, and y_j is the new fragment center, as given in equation 16.

y_j = y_j0 + d_j   (16)
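Equations (15) and (16) might be sketched as follows, with fragment centers, fixed offsets and Bhattacharyya coefficients held in parallel lists; the helper names and the 2-D tuple representation are illustrative assumptions.

```python
def combined_center(centers, offsets, rhos):
    """Equation (15): target center as the rho-weighted mean of the
    fragment centers, each shifted back by its fixed offset d_i."""
    wsum = sum(rhos)
    cx = sum(r * (c[0] - d[0]) for r, c, d in zip(rhos, centers, offsets)) / wsum
    cy = sum(r * (c[1] - d[1]) for r, c, d in zip(rhos, centers, offsets)) / wsum
    return (cx, cy)

def reinit_fragments(center, offsets):
    """Equation (16): fragment centers for the next frame, y_j = y_j0 + d_j."""
    return [(center[0] + d[0], center[1] + d[1]) for d in offsets]
```

Because each fragment votes with its own coefficient, an occluded fragment (low ρ) barely moves the combined center, which is the point of the weighting.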
VI. UPDATE USING KALMAN FILTER
We decided to use the Kalman filter because it models the motion of the target, which adds robustness to the tracker. The Kalman filter can be described by two equations: the state equation and the measurement equation [15].
x(k) = A(k-1) x(k-1) + B(k) w(k)   (17)

z(k) = C(k) x(k) + v(k)   (18)
x(k) is the state vector, z(k) is the measured value at time k, A(k-1) is the state transition matrix and B(k) is the control matrix. In this paper A is a constant-velocity transition matrix, C(k) is the measurement matrix, and w(k) and v(k) are process and measurement noise (assumed Gaussian with zero mean). The state vector used in this paper is

x = [x, x', y, y']^T   (19)
x and y represent the coordinates of the target center position, and x' and y' represent the corresponding velocity components. The measured value is

z(k) = [x_c, y_c]^T   (20)

Here the measured value for the Kalman filter is the updated center obtained from the weighted mean shift tracker, where x_c and y_c are the x and y coordinates of the center. The final center position of the target is updated using the Kalman filter to handle occlusion and the motion of the target.
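As a sketch of the constant-velocity model of equations (17)-(20), the following runs one two-state (position, velocity) Kalman filter per coordinate; the class and the noise values q and r are illustrative assumptions, not from the paper.

```python
class Kalman1D:
    """Constant-velocity Kalman filter for one coordinate (equations
    (17)-(18) with state [position, velocity]); the 2-D tracker would run
    one instance for x and one for y."""
    def __init__(self, pos, q=1e-2, r=1.0):
        self.x = [pos, 0.0]                 # state: position, velocity
        self.P = [[1.0, 0.0], [0.0, 1.0]]   # state covariance
        self.q, self.r = q, r               # process / measurement noise

    def step(self, z, dt=1.0):
        # predict: x = A x with A = [[1, dt], [0, 1]]; P = A P A^T + qI
        px = self.x[0] + dt * self.x[1]
        pv = self.x[1]
        P = self.P
        p00 = P[0][0] + dt * (P[1][0] + P[0][1]) + dt * dt * P[1][1] + self.q
        p01 = P[0][1] + dt * P[1][1]
        p10 = P[1][0] + dt * P[1][1]
        p11 = P[1][1] + self.q
        # update with the measured center z from the mean shift tracker
        k0 = p00 / (p00 + self.r)           # Kalman gain, position
        k1 = p10 / (p00 + self.r)           # Kalman gain, velocity
        self.x = [px + k0 * (z - px), pv + k1 * (z - px)]
        self.P = [[(1 - k0) * p00, (1 - k0) * p01],
                  [p10 - k1 * p00, p11 - k1 * p01]]
        return self.x[0]
```

During an occlusion the measurement can simply be skipped (prediction only), so the estimated center keeps moving with the learned velocity until the target reappears.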
Fig. 3. Failure of the basic mean shift tracker for challenge 1: (a) Frame 379 (b) Frame 419 (c) Frame 435 (d) Frame 455
Fig. 5. Comparison of the tracking performance of the two algorithms for challenge 1: (a) Bhattacharyya coefficients (b) Bhattacharyya distance
Fig. 4. Output of the proposed method for challenge 1: (a) Frame 379 (b) Frame 419 (c) Frame 435 (d) Frame 455
VII. RESULTS AND DISCUSSION
In this section, we provide implementation details and show results on a number of challenging sequences. After detection, the target is automatically fragmented based on the mean and standard deviation of the target, and the Kalman filter is initialized. The background window is taken as double the size of the target foreground. First, tracking with the mean shift method described in [1] is shown in Figure 3: the basic mean shift tracker fails on the challenge one video, where two persons cross each other at the same distance from the camera. Figure 4 shows that the proposed method tracks the same sequence successfully, handling occlusion and motion unlike the basic mean shift tracker. For comparison we use the Bhattacharyya coefficient and Bhattacharyya distance, as given in equations 12 and 13 and shown graphically in Figure 5. When the tracker works properly, the Bhattacharyya coefficient is maximal and the distance is reduced. Here we used 100 frames; the first target (the man in the white shirt) is 30 x 100 and the second target (the man in the yellow shirt) is 24 x 80. To check the robustness of the proposed method, we considered different challenging video sequences. The next sequence is taken from the CAVIAR data set [17] as challenge
Fig. 6. Output of Proposed Method for Caviar Video Database Challenge 2: (a) Frame 1937 (b) Frame 2006 (c) Frame 2030 (d) Frame 2036
two. Here three persons are walking, and the method handles a double occlusion of the man wearing the colorful shirt. The proposed method works properly, as shown in Figure 6. Figure 7 shows the comparison of the Bhattacharyya coefficients and distances for this video. Here we use 120 frames; the target sizes are 24 x 80 (for the woman) and 28 x 100 (for the man). The next sequence is taken from the PETS2006 database [18] as challenge three. In this sequence one person comes towards the camera while another walks away from it, and in between they overlap each other. This is where mean shift fails, but the proposed method tracks properly, as shown in Figure 8. Figure 9 gives the comparison between the proposed method and the basic mean shift method for this sequence, considering the Bhattacharyya coefficient and distance. There are 50 frames in total, and the target sizes are 12 x 40 (the man in the yellow shirt) and 14 x 50 (the man in the black shirt).
VIII. CONCLUSION AND FUTURE WORK
In this work, we have proposed a simple but effective method for handling occlusion and motion. We have considered different sequences with different challenges. Though the conditions are challenging, the proposed method performs successfully. It worked for multiple occlusions in
challenge two and total occlusion in challenge three. Some limitations of the proposed method are that it is not adaptive to scale and orientation changes, and illumination parameters are not considered. We plan to address these issues to make the method more effective and robust in the future.

REFERENCES
Fig. 7. Comparison of the tracking performance of the two algorithms for the CAVIAR video database: (a) Bhattacharyya coefficients (b) Bhattacharyya distance
Fig. 8. Output of the proposed method for PETS2006 as challenge 3: (a) Frame 2417 (b) Frame 2447 (c) Frame 2450 (d) Frame 2460
Fig. 9. Comparison of the tracking performance of the two algorithms for PETS2006 as challenge 3: (a) Bhattacharyya coefficients (b) Bhattacharyya distance
[1] D. Comaniciu, V. Ramesh and P. Meer, "Real-time tracking of non-rigid objects using mean shift," Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 142-149, June 2000.
[2] C. Yang, R. Duraiswami and L. Davis, "Efficient mean-shift tracking via a new similarity measure," Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2005.
[3] J.-S. Hu, C.-W. Juan and J.-J. Wang, "A spatial-color mean-shift object tracking algorithm with scale and orientation estimation," Pattern Recognition Letters, 2008.
[4] A. Adam, E. Rivlin and I. Shimshoni, "Robust fragments-based tracking using the integral histogram," Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 798-805, 2006.
[5] K. Lee and Y.-M. Lee, "Tracking multi-person to illumination changes and occlusions," ICAT, 2004.
[6] J. Jeyakar, R. V. Babu and K. R. Ramakrishnan, "Robust object tracking with background-weighted local kernels," Computer Vision and Image Understanding, pp. 296-309, 2008.
[7] F. Wang, S. Yu and J. Yang, "Robust and efficient fragments-based tracking using mean shift," AEU - International Journal of Electronics and Communications, 2009.
[8] M. Khansari and H. Rabiee, "Occlusion handling for object tracking in crowded video scenes based on undecimated wavelet features," IEEE, 2007.
[9] F.-H. Cheng and Y.-L. Chen, "Real time multiple objects tracking and identification based on discrete wavelet transform," Pattern Recognition, 2005.
[10] J. Zhao et al., "An approach based on mean shift and Kalman filter for target tracking under occlusion," Int. Conf. on Machine Learning and Cybernetics, Baoding, July 2009.
[11] V. Srikrishnan, T. Nagaraj and S. Chaudhuri, "Fragment based tracking for scale and orientation adaptation," Indian Conf. on Computer Vision, Graphics and Image Processing, 2008.
[12] Satoshi Y., "Multiple tracking using mean shift with particle filter based initialization," Int. Conf. on Information Visualization, 2008.
[13] A. Babaeian and S. Rastegar, "Mean shift-based object tracking with multiple features," Southeastern Symposium on System Theory, University of Tennessee Space Institute, 2009.
[14] A. Miller, A. Basharat et al., "Person and vehicle tracking in surveillance video," Springer-Verlag Berlin Heidelberg, pp. 174-178, 2008.
[15] Y. Bar-Shalom and X.-R. Li, "Estimation with Applications to Tracking and Navigation," John Wiley and Sons, New York, USA, 2001.
[16] A. Yilmaz, O. Javed and M. Shah, "Object tracking: A survey," ACM Computing Surveys, vol. 38, 2006.
[17] http://groups.inf.ed.ac.uk/vision/caviar/caviardata
[18] http://www.pets2006.net
[19] R. C. Gonzalez and R. E. Woods, "Digital Image Processing," Prentice Hall, New Jersey, USA, 2005.