
2014 IEEE International Conference on Advanced Communication Control and Computing Technologies (ICACCCT)

Multiple Objects Tracking Using Extended Kalman Filter, GMM and Mean Shift Algorithm – A Comparative Study
D. Harihara Santosh1, P.G. Krishna Mohan2
1 Research Scholar, JNTU College of Engineering, Hyderabad, Andhra Pradesh
2 Department of ECE, Institute of Aeronautical Engineering, Hyderabad, Andhra Pradesh

Abstract: Object tracking is a primary step in image processing applications such as object recognition, navigation systems and surveillance systems. The conventional approach in image processing is to distinguish the current image from a background image. Algorithms based on image subtraction are mainly used to extract features of moving objects and to gather information across frames. Here three algorithms, namely the Extended Kalman Filter (EKF), the Gaussian Mixture Model (GMM) and the Mean Shift Algorithm (MSA), are compared in the context of multiple object tracking. The comparative results show that GMM performs well when there are occlusions. The Extended Kalman filter fails because the distributions of the random variables are no longer normal after a nonlinear transformation, and it cannot identify multiple objects when there are occlusions. The mean shift algorithm is best suited to single object tracking and is very sensitive to the window size, which is adaptive. Results show that it fails to detect multiple objects when there is even slight occlusion.

Keywords: Gaussian mixture model, Extended Kalman Filter, Mean Shift Algorithm, multiple object tracking, background subtraction, foreground detection.

I. INTRODUCTION
Tracking a moving object in video is an important problem in computer vision. Object tracking is a primary step in surveillance systems, navigation systems and object recognition, and it is particularly important in real-time environments [1]. Its applications include: security and surveillance, to recognize people and to provide a better sense of security using visual information; medical therapy, to improve the quality of life of physical therapy patients and disabled people; retail space instrumentation, to analyse the shopping behaviour of customers and to enhance building and environment design; video abstraction, to obtain automatic annotation of videos and to generate object-based summaries; traffic management, to analyse flow and to detect accidents; and video editing, to eliminate cumbersome human operator interaction and to design futuristic video effects. Computer vision researchers are interested in tracking because it is both a difficult and a significant problem. The main task in tracking is to establish the correspondence of objects and object parts between consecutive frames of video [2]. It is a critical task in image processing applications because it provides cohesive

ISBN No. 978-1-4799-3914-5/14/$31.00 ©2014 IEEE

temporal data about moving objects, which is used to improve lower-level processing such as motion segmentation and to enable higher-level data extraction such as behaviour recognition and activity analysis. Tracking is difficult to apply in complex scenes because objects are often improperly segmented. Common causes of erroneous segmentation are long shadows and the full or partial occlusion of objects by one another. A robust algorithm must deal with shadows and occlusions at both the segmentation level and the tracking level.

Here three object tracking algorithms are considered. The main aim of the comparison is to analyse how these algorithms behave in the presence of severe occlusions. Section II describes the Extended Kalman filter and gives the equations needed to develop the algorithm. Section III explains the Gaussian Mixture Model; since it is based on background subtraction, the steps of background subtraction are explained before the algorithm itself. Section IV explains the Mean Shift Algorithm. Section V presents the results, showing the frames on which each algorithm performs best and worst to illustrate the differences between the algorithms. Section VI draws conclusions from the results.

II. THE EXTENDED KALMAN FILTER (EKF)
A. The Process to be Estimated
The Kalman filter addresses the general problem of estimating the state of a discrete-time controlled process that is governed by a linear stochastic difference equation. But what happens if the process to be estimated and (or) the measurement relationship to the process is non-linear? Some of the most interesting and successful applications of Kalman filtering have been in such situations. A Kalman filter that linearizes about the current mean and covariance is referred to as an extended Kalman filter or EKF.
In something akin to a Taylor series, we can linearize the estimation around the current estimate using the partial derivatives of the process and measurement functions to compute estimates even in the face of non-linear relationships.
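To make the linearization idea concrete, the following is a minimal sketch of one predict/update cycle of a scalar EKF. It is not taken from the paper: it follows the standard textbook recursion, and it assumes additive process and measurement noise with variances `q` and `r`. The function name `ekf_step` and the callables `f`, `h`, `df_dx` and `dh_dx` (the process function, the measurement function and their partial derivatives with respect to the state) are illustrative names only.

```python
def ekf_step(x_est, p_est, u, z, f, h, df_dx, dh_dx, q, r):
    """One predict/update cycle of a scalar extended Kalman filter.

    Assumes additive noise: q is the process noise variance and
    r is the measurement noise variance (both are assumptions here).
    """
    # Predict: push the previous estimate through f with the noise set to zero.
    x_pred = f(x_est, u)
    a = df_dx(x_est, u)              # slope of f at the previous estimate
    p_pred = a * p_est * a + q       # linearized error variance

    # Predicted measurement, again with the noise set to zero.
    z_pred = h(x_pred)
    c = dh_dx(x_pred)                # slope of h at the predicted state

    # Update: blend prediction and measurement using the Kalman gain.
    s = c * p_pred * c + r           # innovation variance
    k = p_pred * c / s               # Kalman gain
    x_new = x_pred + k * (z - z_pred)
    p_new = (1.0 - k * c) * p_pred
    return x_new, p_new


# Example with a quadratic measurement h(x) = x**2 (hypothetical values).
x_new, p_new = ekf_step(
    x_est=1.0, p_est=1.0, u=0.0, z=1.5,
    f=lambda x, u: x + u, h=lambda x: x * x,
    df_dx=lambda x, u: 1.0, dh_dx=lambda x: 2.0 * x,
    q=0.0, r=1.0,
)
```

Because the variance is propagated through the local slopes only, the result is exact for linear `f` and `h` and an approximation otherwise.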


To formalize this, assume that the process has a state vector $x \in \mathbb{R}^n$, but that the process is now governed by the non-linear stochastic difference equation

$$x_k = f(x_{k-1}, u_k, w_{k-1}) \tag{1}$$

with a measurement $z \in \mathbb{R}^m$ that is

$$z_k = h(x_k, v_k) \tag{2}$$

where the random variables $w_k$ and $v_k$ represent the process and measurement noise. Here the non-linear function $f$ in the difference equation relates the state at the previous time step $k-1$ to the state at the current time step $k$. It includes as parameters any driving function $u_k$ and the zero-mean process noise $w_k$. The non-linear function $h$ in the measurement equation relates the state $x_k$ to the measurement $z_k$. In practice, of course, one does not know the individual values of the noise $w_k$ and $v_k$ at each time step. However, one can approximate the state and measurement vectors without them as

$$\tilde{x}_k = f(\hat{x}_{k-1}, u_k, 0) \tag{3.a}$$

$$\tilde{z}_k = h(\tilde{x}_k, 0) \tag{3.b}$$

where $\hat{x}_{k-1}$ is the a posteriori estimate of the state from the previous time step.

It is important to note that a fundamental flaw of the EKF is that the distributions (or densities in the continuous case) of the various random variables are no longer normal after undergoing their respective nonlinear transformations. The EKF is simply an ad hoc state estimator that only approximates the optimality of Bayes' rule by linearization.

III. OBJECT DETECTION USING GAUSSIAN MIXTURE MODEL
The Gaussian Mixture Model (GMM) is a parametric model of a probability density function that can be used for object detection. It is represented as a weighted sum of Gaussian component densities, and it can serve as a background model. To obtain the desired output, the background pixels are subtracted from each frame of the video [3]. Background subtraction raises several issues that must be addressed when developing an algorithm for recognizing objects. Moreover, the algorithm should respond to changes such as objects starting to move, objects coming to a halt, and variations in illumination.

A. Background Subtraction
Background subtraction relies on the following four steps.

1. Preprocessing: Temporal or spatial smoothing is applied in the early stages of preprocessing to remove device noise, which can be a problem under varying light intensities. Smoothing also removes transient environmental elements such as rain and snow. In real-time systems, the frame rate and frame size are commonly reduced to lower the data rate for processing. The important factor while

preprocessing is the data format used by the background subtraction model. Many algorithms process luminance intensity, which is a single scalar value per pixel. Figures 1 and 2 show a snowy scene before (left) and after (right) spatial and temporal smoothing, which removes the snow effectively and yields a clearer image.

Figure 1 and Figure 2: The image on the left shows snowfall; the image on the right is the result of smoothing.

2. Background Modeling: This step uses the video frames to compute a background model. The main goal in developing a background model is that it should withstand environmental changes in the background. Recognizing the moving objects under these conditions, however, is difficult.

3. Foreground Detection: This step compares the video frame with the background model and identifies the candidate foreground pixels. The usual approach is to check whether a pixel differs significantly from the corresponding background estimate.

4. Data Validation: The final step eliminates pixels that do not belong to objects of interest. It refines the foreground mask based on data extracted from the background model.

Background models have three main limitations:
1. They disregard any correlation among adjacent pixels.
2. The rate of adaptation may not match the speed of the foreground objects [4].
3. Non-stationary pixels, from moving leaves or from shadows cast by moving objects, are easily mistaken for true foreground.

B. Methodology
The GMM [5] is a mixture of K Gaussian distributions that models the changes of state of corresponding pixels from frame to frame. The algorithm therefore fits Gaussian mixtures to each frame and converts the color images to binary images [6]. Pixels whose state does not change are assigned the value 0 (black), and pixels whose state changes are assigned the value 1 (white). This makes it possible to locate the moving objects in the video.
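The per-pixel idea behind the methodology can be sketched as follows. This is a simplified illustration, not the authors' implementation: it keeps an online mixture of K Gaussians for a single grayscale pixel, and the class name `PixelGMM`, the learning rate `alpha`, the matching threshold `match_sigmas`, the background ratio `bg_ratio` and the initial and minimum variances are all assumed values chosen for the example.

```python
import math


class PixelGMM:
    """Simplified online mixture of K Gaussians for one grayscale pixel."""

    def __init__(self, k=3, alpha=0.05, init_var=225.0, min_var=4.0,
                 match_sigmas=2.5, bg_ratio=0.7):
        self.k = k
        self.alpha = alpha                # learning rate (assumed value)
        self.init_var = init_var          # variance given to new components
        self.min_var = min_var            # floor so the variance never collapses
        self.match_sigmas = match_sigmas  # match if within this many sigmas
        self.bg_ratio = bg_ratio          # share of weight treated as background
        self.comps = []                   # each component: [weight, mean, variance]

    def _rank(self):
        # Components with high weight and low spread are most background-like.
        self.comps.sort(key=lambda c: c[0] / math.sqrt(c[2]), reverse=True)

    def update(self, value):
        """Absorb one intensity sample; return True if it is foreground."""
        a = self.alpha
        self._rank()
        owner = None
        for c in self.comps:
            if abs(value - c[1]) <= self.match_sigmas * math.sqrt(c[2]):
                owner = c
                break
        # Decay every weight and reinforce only the matched component.
        for c in self.comps:
            c[0] = (1.0 - a) * c[0] + (a if c is owner else 0.0)
        if owner is not None:
            diff = value - owner[1]
            owner[1] += a * diff
            owner[2] = max(self.min_var, owner[2] + a * (diff * diff - owner[2]))
        else:
            # No match: start a new low-weight component at this value.
            owner = [a, float(value), self.init_var]
            if len(self.comps) < self.k:
                self.comps.append(owner)
            else:
                weakest = min(range(len(self.comps)),
                              key=lambda i: self.comps[i][0])
                self.comps[weakest] = owner
        total = sum(c[0] for c in self.comps)
        for c in self.comps:
            c[0] /= total
        # The top-ranked components covering bg_ratio of the weight are background.
        self._rank()
        cumulative = 0.0
        for c in self.comps:
            if c is owner:
                return False
            cumulative += c[0]
            if cumulative > self.bg_ratio:
                break
        return True


# A pixel that shows road (intensity 100) for 50 frames, then a passing car (200).
model = PixelGMM()
history = [model.update(100) for _ in range(50)]
car_mask_value = 1 if model.update(200) else 0    # 1 = white = changed state
road_mask_value = 1 if model.update(100) else 0   # 0 = black = unchanged state
```

Running one such model per pixel and writing 0 or 1 into an output image gives the binary frame described above; a practical system would vectorize this over the whole frame.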


Figure 3

Figure 4

In Figure 3, the pixels corresponding to the road belong to the background and do not change state; they are therefore assigned the value 0 and appear black in Figure 4. The pixels corresponding to the cars undergo a marked change of state; they are therefore assigned the value 1 and appear white in Figure 4.

IV. MEAN SHIFT ALGORITHM
A. The kernel mask
The object's density estimates were weighted by a monotonically decreasing Epanechnikov kernel given by

$$K_E(x) = \begin{cases} \frac{1}{2} c_d^{-1} (d+2) \left(1 - \|x\|^2\right) & \text{if } \|x\| < 1 \\ 0 & \text{otherwise} \end{cases} \tag{4}$$

where $c_d$ is the volume of the unit d-dimensional sphere and $x$ are the normalized pixel coordinates within the target, relative to the center (i.e. $\|x\|^2$ is the squared Euclidean distance of each pixel from the center of the target; see Figure 5). Since we were dealing with a two-dimensional image space, our kernel function was of the form

$$K_E(x) = \begin{cases} \frac{2}{\pi} \left(1 - \|x\|^2\right) & \text{if } \|x\| < 1 \\ 0 & \text{otherwise} \end{cases}$$

Figure 5: kernel mask

B. Distance minimization
Based on the fact that the probability of classification error is directly related to the similarity of the two distributions, the similarity measure was chosen so as to maximize the Bayes error arising from the comparison of the target and candidate pdfs. Being closely related to the Bayes error, the Bhattacharyya coefficient was chosen, and its maximum was searched for to estimate the target localization. The Bhattacharyya coefficient of two statistical distributions is defined as

$$\rho[p(y), q] = \int \sqrt{p_z(y)\, q_z}\; dz \tag{5}$$

and in our case the distributions $p_z$ and $q_z$ are the histograms of the candidate and the target, respectively.

C. Mean shift
In order to find the best match of our target in the sequential frames, we needed to maximize the Bhattacharyya coefficient, which means that we needed to maximize the term

$$\sum_{i=1}^{n} \omega_i \, k\!\left( \left\| \frac{y - x_i}{h} \right\|^2 \right) \tag{6}$$

where $h$ is the kernel's smoothing parameter, or bandwidth, and $\omega_i$ is given by

$$\omega_i = \sum_{u=1}^{m} \sqrt{\frac{\hat{q}_u}{\hat{p}_u(\hat{y}_0)}} \; \delta[b(x_i) - u] \tag{7}$$

where $\hat{y}_0$ is the current location of the candidate center and $\delta$ is the Kronecker delta function, equal to 1 only at $u$ and 0 otherwise (i.e. only equal to 1 at the particular bin $u$). The terms $\hat{q}_u$ and $\hat{p}_u$ are the values of the target and candidate histograms corresponding to pixel $x_i$ of the candidate object. This mapping of colour values given by $\hat{q}_u$ and $\hat{p}_u$ can be visualized to demonstrate how the target object changes over time and what the corresponding distribution of weights is. The figures show gray-scale images of some of these mappings, taken from the football sequence.

The mean shift step moves the candidate center to a weighted average of the pixel positions, in which each weight is multiplied by $g(x)$, the derivative of the kernel profile. Since the derivative of the Epanechnikov kernel profile is constant, the expression reduces to a weighted distance average:

$$\hat{y} = \frac{\sum_{i=1}^{n} x_i \, \omega_i}{\sum_{i=1}^{n} \omega_i} \tag{8}$$

The details of the mean-shift procedure are outlined in [7]. Here we refer only to the implementation choices that we made and the results that we obtained.
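As an illustration of Equations (5), (7) and (8), the sketch below computes the discrete Bhattacharyya coefficient of two histograms, the per-pixel weights, and one mean shift update of the candidate center. It is only a sketch under stated assumptions: histograms are plain lists that sum to 1, `bins[i]` holds the histogram bin b(x_i) of pixel i, and the function names are hypothetical rather than taken from the paper.

```python
import math


def bhattacharyya(p_hat, q_hat):
    """Discrete form of Eq. (5): sum over bins of sqrt(p_u * q_u)."""
    return sum(math.sqrt(p * q) for p, q in zip(p_hat, q_hat))


def pixel_weights(bins, p_hat, q_hat):
    """Eq. (7): each pixel gets sqrt(q_u / p_u) for the bin u it falls in."""
    weights = []
    for b in bins:
        # Empty candidate bins get zero weight to avoid dividing by zero.
        weights.append(math.sqrt(q_hat[b] / p_hat[b]) if p_hat[b] > 0 else 0.0)
    return weights


def mean_shift_update(pixels, weights):
    """Eq. (8): new center is the weighted average of pixel positions."""
    total = sum(weights)
    if total == 0:
        raise ValueError("all weights are zero; the center cannot be updated")
    x = sum(w * px for w, (px, _) in zip(weights, pixels)) / total
    y = sum(w * py for w, (_, py) in zip(weights, pixels)) / total
    return x, y


# Two candidate pixels, one per histogram bin (hypothetical values).
pixels = [(0.0, 0.0), (2.0, 0.0)]
bins = [0, 1]
p_hat = [0.5, 0.5]   # candidate histogram at the current location
q_hat = [0.2, 0.8]   # target histogram

similarity = bhattacharyya(p_hat, q_hat)
weights = pixel_weights(bins, p_hat, q_hat)
new_center = mean_shift_update(pixels, weights)
```

In this example the target model contains more of bin 1 than the candidate does, so the second pixel receives twice the weight of the first and the center moves toward it. Repeating the update until the center stops moving gives the usual mean shift iteration.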

V. RESULTS
All the algorithms were tested on a video of 124500 frames. To best illustrate their performance, a video containing multiple buses was chosen. Figure 6.a is a sample frame containing only one object of interest, and Figure 6.b shows how this object is detected. Figure 6.d shows the yellow color extraction of the frame in Figure 6.c. Foreground detection for frame 1422 is shown in Figures 6.e and 6.f. Since the object of interest here is a yellow bus, an auto, which is also yellow, is detected as a bus.

Figure 6.a

Figure 6.b

Figure 6.c: Frame 1422 of the traffic video sequence

Figure 6.d: Yellow color extraction from frame 1422

Figure 6.e: Foreground detection from frame 3335

Figure 6.f: Auto in frame 3335 is not tracked

A. Comparative analysis
All three tracking algorithms are compared on the same video sample, and some typical frames are shown for analysis. Fig. 7.1 is a typical frame in which two objects of interest overlap. The foreground bus is detected by all three algorithms, but only GMM detects the second bus (Fig. 7.2). GMM works well when multiple objects of interest overlap each other.

Fig. 7.3 (frame 4512) is a typical frame in which one object is almost completely occluded by the other. In such cases only GMM works well; Fig. 7.4 shows how MSA fails. If the objects of interest are far away (and therefore small), neither EKF nor MSA works.

Fig. 7.5 is a frame in which there are other objects of interest. Fig. 7.6 shows that the EKF algorithm fails to track the objects of interest: it also detects a van, which GMM and MSA do not.

Fig. 7.7 is a frame in which the objects of interest are far away (very small). In such cases GMM is able to detect all the objects distinctly, as shown in Fig. 7.8. Fig. 7.9 shows that EKF fails to detect objects when their size is small. Fig. 7.10 shows that when the object size is small, MSA cannot distinguish the objects from one another and detects them as a single object.

Fig. 7.1: Frame 4512

Fig. 7.2: Detection of occluded objects by GMM

Fig. 7.3: Frame 4512

Fig. 7.4: Failure in detecting occluded objects by MSA

Fig. 7.5: Frame 35412

Fig. 7.6: Failure in detecting object which is not of interest by EKF

Fig. 7.7: Frame 115412

Fig. 7.8: Detection of occluded object by GMM

Fig. 7.9: Failure in detection of occluded object by EKF

Fig. 7.10: Detecting multiple objects as single object by MSA

B. Case study

TABLE 1: TRACKING EFFICIENCY OF THE PROPOSED ALGORITHM

| Time slot | Buses passed | Buses detected using GMM | Buses detected using EKF | Buses detected using MSA |
|---|---|---|---|---|
| 3:30 to 3:45 PM | 11 | 10 | 10 | 9 |
| 3:45 to 4:00 PM | 13 | 12 | 9 | 7 |
| 4:00 to 4:15 PM | 4 | 4 | 1 | 2 |
| 4:15 to 4:30 PM | 6 | 6 | 4 | 2 |
| 4:30 to 4:45 PM | 17 | 15 | 8 | 10 |
| 4:45 to 5:00 PM | 10 | 9 | 9 | 7 |
| Total number of buses | 61 | 56 | 41 | 37 |
| Overall efficiency in % | | 93.44% | 67.2% | 60.65% |
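The overall efficiency row of Table 1 is the ratio of detected buses to buses that passed, summed over the six time slots. The short sketch below recomputes it from the per-slot counts in the table; the variable and function names are illustrative. Note that 56/61 evaluates to roughly 91.8%, which differs from the 93.44% printed for GMM (that figure would correspond to 57 detections out of 61); the counts are used here exactly as tabulated.

```python
# Per-slot counts copied from Table 1 (3:30 PM to 5:00 PM, 15-minute slots).
passed = [11, 13, 4, 6, 17, 10]
detected = {
    "GMM": [10, 12, 4, 6, 15, 9],
    "EKF": [10, 9, 1, 4, 8, 9],
    "MSA": [9, 7, 2, 2, 10, 7],
}


def overall_efficiency(detected_counts, passed_counts):
    """Return detected / passed over all slots, as a percentage."""
    return 100.0 * sum(detected_counts) / sum(passed_counts)


efficiency = {name: overall_efficiency(counts, passed)
              for name, counts in detected.items()}

for name, value in efficiency.items():
    print(f"{name}: {sum(detected[name])}/{sum(passed)} buses, {value:.2f}%")
```

The totals (56, 41 and 37 out of 61) match the table, as do the EKF and MSA percentages up to rounding.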

To calculate the density of institutional buses and to compute the efficiency of the algorithms, six time slots of 15 minutes each were considered between 3:30 PM and 5:00 PM. Table 1 shows that the efficiency of the algorithms varies between 60.65% and 93.44%. The observed causes of this variation are:
1. the average speed of the vehicles is below the threshold;
2. occlusions caused while overtaking other vehicles.

Table 1 shows that a total of 61 buses passed through, of which 56 were detected successfully by the GMM algorithm, 41 by the Extended Kalman filter and 37 by the mean shift algorithm.

VI. CONCLUSIONS
There may be a concern that linear motion estimation fails for objects moving in a complicated nonlinear way. However, if the movement is not extremely fast, the deviation from the estimated positions between successive frames is small, and then correct tracking is reliably achieved. Furthermore, if mistracking occurs at some frame because of occlusion or because of newly appearing or disappearing objects, the proposed algorithm can recover correct tracking after a couple of frames. GMM makes the algorithm more robust, so that the tracker recovers if occlusion takes place. The comparative results show that GMM performs well when there are occlusions. The Extended Kalman filter fails because the distributions of the random variables are no longer normal after a nonlinear transformation, and it cannot identify multiple objects when there are occlusions. The mean shift algorithm is best suited to single object tracking and is very sensitive to the window size, which is adaptive. Results show that it fails to detect multiple objects when there is even slight occlusion.

REFERENCES
[1] D. Harihara Santosh and P. G. Krishna Mohan, "Tracking Manifold Objects in Motion Using Gaussian Mixture Model And Blob Analysis", I2CT 2014, Pune, India, 6th to 8th April 2014.
[2] Branko Ristic, Sanjeev Arulampalam and Neil James Gordon, "Beyond the Kalman filter: particle filters for tracking applications", Artech House publications, 2004.
[3] Wang Zhan-qing, Wu Chao-zhong, Wang Chuan-ting and Fan You-fu, "Target tracking using Kalman Filter Embedded Trust Region", pp. 119-122, vol. 1, IEEE Transactions, 2009.
[4] Mikhel E. Hawkins, "High Speed Target Tracking Using Kalman Filter and Partial Window Imaging", George Woodruff School of Mechanical Engineering, Georgia Institute of Technology, April 2002.
[5] John G. Proakis and G. Manolakis, "Digital signal processing", 4th edition, PHA publications, 2007.
[6] Rafael C. Gonzalez, Richard E. Woods and Steven L. Eddins, "Digital image processing using MATLAB", 2nd edition, TMH publications, New Delhi, 2011.
[7] Prasad Kalane, "Target Tracking Using Kalman Filter", PREC Loni, Pune University, International Journal of Science & Technology, ISSN (online): 2250-141X, Vol. 2, Issue 2, April 2012.