
A Real-time Approach for Autonomous Detection and Tracking of Moving Objects from UAV

Pouria Sadeghi-Tehran; Christopher Clarke; Plamen Angelov
Intelligent Systems Lab, Data Science Group
School of Computing and Communications
Lancaster University, United Kingdom
Email: (p.sadeghi-tehran; c.clarke; p.angelov)@lancs.ac.uk

Abstract—A new approach to autonomously detect and track moving objects in real time in video captured by a moving camera on board a UAV is proposed in this paper. The introduced approach replaces the need for a human operator to perform video analytics by autonomously detecting moving objects and clustering them for tracking purposes. The effectiveness of the introduced approach is tested on footage taken from a real UAV and the evaluation results are presented in this paper.

Index Terms—autonomous object detection; mobile visual surveillance platform; UAV

I. INTRODUCTION

Object detection and tracking in video streams has become an important part of our daily lives. It is quite natural for humans to precisely detect and track objects, such as when we are looking for our keys, with minimum effort. However, autonomously detecting and tracking objects in computer vision applications are extremely challenging tasks, and they have been studied extensively by computer vision researchers over the last decades. Most of the proposed techniques for object detection and tracking are designed for static cameras [1]–[3]. Nevertheless, the rising demand for mobile platforms, such as unmanned aerial vehicles (UAVs), shows the need to detect and track moving objects autonomously with a moving camera. Due to the difficulty of detecting moving objects with a moving camera, only a handful of methods have been proposed compared to the stationary-camera case. In this paper, we tackle the challenge of aerial visual surveillance applications by tracking regions of interest (keypoints) and applying clustering techniques to detect and track moving objects from a UAV. Unlike many other detection techniques, no prior knowledge about the number or shape of the objects to be tracked is required.

There are two main approaches in video stream processing that can be employed for detecting and tracking moving objects. The first approach uses motion-based information and the second uses visual features such as colour, shape and texture. One of the most popular and desirable features in visual-based object detection is colour, which is widely used in a variety of applications to model moving objects [3]–[5]. However, colour-based techniques experience difficulty with respect to colour constancy due to changes in illumination conditions, inter-reflections with other objects, shadow, etc.


Shape-based methods are among the most demanding techniques in visual detection due to the difficulty of segmenting objects in an image. First, the object of interest needs to be pre-processed in order to distinguish its border. In addition, the selection of the pre-processing algorithm is application dependent, e.g. the algorithm used to detect a car may be different from the algorithm used to detect a person. Increasing the number of objects and occlusions in a scene also makes the detection and shape characterisation of the objects more difficult [6]. Template-based techniques are popular methods among computer vision researchers for matching features between a template and the image sequence [7]–[9]. The quality of matching depends on the details provided by the object template. On the other hand, an ideal match may require sampling a large number of points; thus, in general, detecting an object with an exact match is not computationally efficient.

Motion detection is one of the principal approaches to object detection in video streams. The object tracking approaches which use motion information can be grouped into two main categories: model-based and motion-based techniques. Motion-based approaches use robust methods to group visual motion consistencies over time [10]. Although these approaches are comparably fast, they struggle to track non-rigid objects and movements. Model-based approaches are more robust than motion-based methods; however, for complex models they need to cope with translation, scaling, deformation and rotation of the objects, which is not computationally efficient. Region-based methods rely on information extracted from the entire region using motion estimation techniques. In motion estimation, motion vectors that describe the transformation from one frame to another are determined. In such techniques, a point-to-point correspondence is required within the whole region. Another widely used method in this category is optical flow, in which the apparent direction and velocity of every pixel in a frame is computed. In order to stabilise the image on the background plane, a background model can be calculated.

Feature/keypoint tracking techniques have been studied by Lucas and Kanade [11], Shi and Tomasi [12], and Baker and Matthews [13]. Kanade-Lucas-Tomasi (KLT) feature tracking assumes small spatial and temporal changes of appearance across an image sequence; therefore, it is not robust to camera motion. In this paper, we tackle the challenge of tracking

features with a moving camera by compensating for the camera movement. In order to do that, we build a mosaic image consisting of the current frame and the background model. For every new frame, matching keypoints in the mosaic image are updated. The differences between the current frame and the background model are then evaluated to find any changes in the scene. Using this technique improves the precision of detecting changes in a scene when background subtraction techniques are applied to moving camera footage. Binary Robust Invariant Scalable Keypoints (BRISK) [14] is used to find the matching keypoints between the current frame and the background model. Once the keypoints are extracted, the motion parameters are estimated using the RANSAC algorithm [15]. Optical flow is then used to track keypoints which move from frame to frame. Once the moving keypoints are extracted, we apply clustering algorithms to the moving features in order to extract the moving objects. We present the results of using both BRISK features (derived when warping the frames) and Good Features to Track (GFT) [12] (derived after the warping) for identifying which keypoints are passed to the optical flow algorithm. Three clustering techniques, greedy clustering, mean-shift clustering, and Evolving Local Means (ELM) clustering [16], are used and their performance is evaluated.

II. THE PROPOSED APPROACH

In this paper, we aim to autonomously detect moving objects using a moving camera mounted on a UAV. We use camera motion estimation techniques to overcome the limitations of feature tracking with a moving camera, and clustering algorithms to group moving objects for the purpose of tracking. Fig. 1 illustrates the main steps of the process. In the first step the camera movement is compensated for and a mosaic image is built. In the second step moving keypoints between the mosaic image and the next frame are extracted. In the final step clustering is used to group moving keypoints into moving objects.

A. Estimation of the Camera Motion

A few techniques have been proposed to estimate camera motion [17]–[20]; however, some of these approaches assume that there are no significant camera orientation changes during the process, which makes them impractical for mobile platforms such as cameras mounted on UAVs. In order to compensate for the camera motion, a homography, H, is calculated for each consecutive frame, which transforms the new frame into the background model's coordinate frame. We use the well-known feature-based Binary Robust Invariant Scalable Keypoints (BRISK) technique [14] to detect feature points in image sequences and to estimate the transformation, H, between frames k and k + 1. It has been shown that BRISK is computationally efficient and outperforms other feature detection techniques such as SIFT and SURF in terms of speed and robustness [21].

Fig. 1. Schematic diagram of the introduced method

After a set of keypoints is extracted and matched with the BRISK keypoints from the previous frame, the transformation matrix, H, is calculated using RANSAC [15].

1) Feature Detection using Binary Robust Invariant Scalable Keypoints (BRISK): Since the robustness of the BRISK approach for keypoint detection, description and matching has been demonstrated before [7], [14], [21], it is selected for our application. In BRISK, the points of interest are identified across both the frame and scale dimensions. Keypoints are detected in octave layers of the frame pyramid to increase the efficiency of computation. The location and scale of each keypoint are then calculated in the continuous domain via quadratic function fitting [14]. After keypoint detection, the corresponding features in different frames are determined. The characteristic direction of each keypoint is identified to allow for orientation-normalised descriptors. Once keypoints and their descriptors from two frames are extracted, preliminary

feature matches between frames are executed to find the match between keypoints extracted from the first frame and the corresponding keypoints in the second frame. Due to the binary nature of the descriptor in BRISK, the keypoints can be matched very efficiently [14].

2) Estimation of the Homography Matrix and Development of the Mosaic Image: In general, tens to hundreds of keypoints are extracted from each frame. In order to remove mismatches and reject outliers, the RANSAC algorithm is used to find inliers while calculating the homography matrix, H. To do so, it randomly samples corresponding matches, fits a homography to those samples, and calculates the error between the remaining samples and the model. Then, based on a threshold, T, the points are classified as outliers or inliers, and the process continues until the number of outliers is sufficiently small [22].

The generation of the homography matrix is time and computationally expensive relative to the other steps used in the proposed approach. The time to perform the homography process can be reduced by using fewer keypoints. This can be achieved by setting the parameters of the feature detector and extractor to extract fewer keypoints from the frames. However, modifying the parameters in this way reduces the quality of the keypoints and, therefore, the resultant warped frame is less accurate. To alleviate this, we split the image into 32 bins and perform feature detection and extraction on each bin with the algorithm parameters set to produce accurate keypoints. The keypoints and descriptors from each bin are then combined to obtain the keypoints for the whole frame. This decreases the time taken for the warping process because keypoint detection algorithms do not extract features close to the borders, and by splitting the image into bins we artificially introduce additional borders. This reduces the time to perform the homography whilst maintaining the accuracy of the resultant warped frame.

Fig. 2 shows an example of the warping process, where the current frame is warped into the previous frame. The bridge is only visible in the previous frame, and the configuration of vehicles on the motorway is different in the current frame. The resultant warped image shows the new configuration of vehicles in the perspective of the previous frame, and hence the bridge is visible, which enables the application of background subtraction techniques to detect objects.

B. Track Moving Features in the Background Model

In this stage, we aim to track moving keypoints extracted from the mosaic image and locate them in the next frame. This can be accomplished using the KLT optical flow function. Given a vector of keypoints inside the mosaic image, KLT returns a vector of new keypoint locations in the next frame corresponding to the keypoints passed into the algorithm. Keypoints which have not moved between the current frame and the background model are rejected, and only the keypoints whose movement is greater than the threshold, T, are stored. The keypoints which have moved are then passed into the clustering algorithm to group moving objects.

Fig. 2. Example of the warping process showing (a) the previous image, (b) the current image and (c) the current image warped into the perspective of the previous image
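To make the camera-motion compensation step of Section II-A concrete, the sketch below detects BRISK keypoints bin-by-bin, matches them, estimates H with RANSAC, and warps the current frame as in Fig. 2. It is a minimal illustration only: the paper states that the system is written in C++ on Linux but does not name a vision library, so the use of OpenCV and all function and parameter choices here are our assumptions.

```cpp
// Illustrative sketch of the camera-motion compensation step (Section II-A).
// OpenCV is assumed; all names and parameters are illustrative.
#include <opencv2/core.hpp>
#include <opencv2/features2d.hpp>
#include <opencv2/calib3d.hpp>
#include <opencv2/imgproc.hpp>
#include <vector>

// Detect BRISK keypoints bin-by-bin (8 x 4 = 32 bins) so that the detector
// parameters can stay strict while the artificial bin borders reduce the
// total number of keypoints, as described above.
static void detectInBins(const cv::Mat& gray, const cv::Ptr<cv::BRISK>& brisk,
                         std::vector<cv::KeyPoint>& kps, cv::Mat& descs)
{
    const int binsX = 8, binsY = 4;
    const int bw = gray.cols / binsX, bh = gray.rows / binsY;
    for (int by = 0; by < binsY; ++by) {
        for (int bx = 0; bx < binsX; ++bx) {
            const cv::Rect roi(bx * bw, by * bh, bw, bh);
            std::vector<cv::KeyPoint> local;
            cv::Mat localDescs;
            brisk->detectAndCompute(gray(roi), cv::noArray(), local, localDescs);
            for (std::size_t i = 0; i < local.size(); ++i) {
                local[i].pt.x += roi.x;   // shift back to full-frame coordinates
                local[i].pt.y += roi.y;
                kps.push_back(local[i]);
            }
            if (!localDescs.empty()) descs.push_back(localDescs);
        }
    }
}

// Estimate the homography H between the current frame and the background
// model, then warp the current frame into the background's coordinates.
cv::Mat compensateMotion(const cv::Mat& current, const cv::Mat& background)
{
    cv::Ptr<cv::BRISK> brisk = cv::BRISK::create();
    std::vector<cv::KeyPoint> kpsCur, kpsBg;
    cv::Mat dCur, dBg;
    detectInBins(current, brisk, kpsCur, dCur);
    detectInBins(background, brisk, kpsBg, dBg);

    cv::BFMatcher matcher(cv::NORM_HAMMING);  // Hamming distance for binary descriptors
    std::vector<cv::DMatch> matches;
    matcher.match(dCur, dBg, matches);

    std::vector<cv::Point2f> src, dst;
    for (const cv::DMatch& m : matches) {
        src.push_back(kpsCur[m.queryIdx].pt);
        dst.push_back(kpsBg[m.trainIdx].pt);
    }
    // RANSAC rejects mismatches and keypoints on moving objects as outliers
    // (at least four good matches are assumed to be available).
    cv::Mat H = cv::findHomography(src, dst, cv::RANSAC, 3.0);

    cv::Mat warped;
    cv::warpPerspective(current, warped, H, background.size());
    return warped;  // current frame expressed in the background model's frame
}
```

Detecting bin-by-bin keeps the detector parameters strict (high-quality keypoints) while the artificial bin borders reduce the overall keypoint count, mirroring the trade-off described above.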

The new positions of the keypoints in the next frame should be located in order to track which keypoints have moved between the current frame and the background model. Assuming that the intensity of a point does not change between two frames, for a displacement (u, v):

$$I(x, y, t) = I(x + u, y + v, t + 1) \qquad (1)$$

where $I_t$ and $I_{t+1}$ denote the current frame and the next frame respectively. In general, the constant-intensity assumption holds for small displacements between frames taken at two nearby instants. Thus, the Taylor expansion of $I(x + u, y + v, t + 1)$ at $(x, y, t)$ can be used to approximate Eq. 1 by an equation involving the image derivatives [12]:

$$I(x + u, y + v, t + 1) \approx I(x, y, t) + \frac{\partial I}{\partial x}u + \frac{\partial I}{\partial y}v + \frac{\partial I}{\partial t} \qquad (2)$$

$$I(x + u, y + v, t + 1) - I(x, y, t) = \frac{\partial I}{\partial x}u + \frac{\partial I}{\partial y}v + \frac{\partial I}{\partial t} \qquad (3)$$

$$\frac{\partial I}{\partial x}u + \frac{\partial I}{\partial y}v + \frac{\partial I}{\partial t} \approx 0 \;\Rightarrow\; \frac{\partial I}{\partial x}u + \frac{\partial I}{\partial y}v = -\frac{\partial I}{\partial t} \qquad (4)$$

This constraint is the fundamental optical flow constraint equation. If we assume that all points in the neighbourhood of a keypoint undergo the same displacement, this assumption can be exploited by the LK feature tracking algorithm [11]: the optical flow constraint is then applied to those points, with the displacement (u, v) unknown. It should be noted that, although Lucas-Kanade optical flow estimation produces accurate angular estimates of the displacements, it usually produces large errors in displacement magnitudes if iteration towards convergence is not performed for each point.
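As an illustration of this step, the sketch below (OpenCV assumed, names illustrative) runs pyramidal Lucas-Kanade flow between the background model and the current frame and keeps only the keypoints whose displacement exceeds the threshold T; T = 2 pixels is used in the experiments of Section III.

```cpp
// Illustrative sketch of the moving-keypoint extraction step using
// pyramidal Lucas-Kanade (KLT) optical flow. OpenCV is assumed.
#include <opencv2/core.hpp>
#include <opencv2/video/tracking.hpp>
#include <cmath>
#include <vector>

// Track 'pts' from the background model into the current frame and keep
// only the keypoints that moved by more than the threshold T.
std::vector<cv::Point2f> movingKeypoints(const cv::Mat& backgroundGray,
                                         const cv::Mat& frameGray,
                                         const std::vector<cv::Point2f>& pts,
                                         double T = 2.0)
{
    std::vector<cv::Point2f> next;
    std::vector<unsigned char> status;  // 1 if the flow for a point was found
    std::vector<float> err;
    cv::calcOpticalFlowPyrLK(backgroundGray, frameGray, pts, next, status, err);

    std::vector<cv::Point2f> moving;
    for (std::size_t i = 0; i < pts.size(); ++i) {
        if (!status[i]) continue;       // track was lost: discard
        const double d = std::hypot(next[i].x - pts[i].x, next[i].y - pts[i].y);
        if (d > T) moving.push_back(next[i]);  // keep only keypoints that moved
    }
    return moving;
}
```

The input points can be seeded either with the BRISK keypoints already computed for the homography or with a dedicated detector such as cv::goodFeaturesToTrack, the option examined next.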

The BRISK features extracted for performing the homography are passed to the optical flow algorithm to extract moving keypoints. However, it is also possible to apply a different feature detection algorithm for use with optical flow. This is beneficial because different feature detection algorithms extract features based on different criteria and hence detect different sets of keypoints. The keypoints used for the homography must be located mainly on the background to ensure that moving objects do not affect the resultant homography matrix, whereas keypoints used in the optical flow algorithm must be located on the moving objects in order for them to be identified. As previously mentioned, we artificially reduce the number of keypoints generated for the warping process, which, in turn, means there are fewer keypoints available for the optical flow algorithm. However, the time saved during the warping process is much greater than the time required to extract keypoints from the frame; therefore, this is a compromise that reduces the overall time of the algorithm whilst maintaining its accuracy. Careful selection of these algorithms can yield much better results, as shown in Section III. In addition to the BRISK feature detector we implemented the Good Features to Track (GFT) algorithm [12]. The GFT algorithm assumes an affine image change model and uses a Lucas-Kanade-style Newton-Raphson minimisation procedure to discriminate between good and bad features based on measurements of dissimilarity. It was chosen because it has been experimentally proven to produce good features for the purpose of tracking and is computationally inexpensive.

C. Multiple-Object Identification using ELM Clustering

After the moving keypoints are identified, the next step is to group them using a clustering technique and label each cluster as a moving object. The spatial co-ordinates x and y of the moving keypoints in the frame are used for clustering in order to identify the moving objects. Multiple-object identification has always been a challenge in computer vision. The methods proposed to address this problem are usually limited by the number of objects [24] or require the type of tracked object to be specified beforehand [25], [26]. In our experiment the type and number of moving objects in the scene are unknown; therefore, classical clustering algorithms such as k-means or fuzzy C-means are not suitable due to their fixed structures and pre-defined number of clusters. To address this problem, we use Greedy clustering [5], Mean Shift clustering [27], and ELM clustering [16], which do not require the number of clusters to be defined in advance and are applied only to the moving keypoints.

ELM is based on the concept of non-parametric gradient estimation of density functions using local (per data) means [28]. The local means are updated for each incoming data vector, allowing the data set to be expanded. If the density pattern changes, a new cluster is created. For each data cluster i that is being formed, the local mean $\mu_i$ and variance $\sigma_i$ are calculated. In ELM, an initial radius of the data clouds

is defined (r = 0.05). As a new point in the frame is processed, its distance to all existing data clusters is computed. If the following condition is satisfied, the region around the point X and the cluster centre $c_i$ overlap, and the point is assigned to data cluster i:

$$d_i < \max\left(\|\sigma_i\|, r\right) + r \qquad (5)$$

where $d_i$ is the distance from the point X = (x, y) to the data cluster mean $\mu_i$. If the region around the point X overlaps with more than one data cluster, the nearest data cluster is selected. When a new point is added to an existing cluster, the centre of cluster i and the variance $\sigma_i$ are updated recursively as described in [16]. In order to avoid noise and clutter, any new cluster formed around a small number of points, p, is ignored. This number is determined in such a way that the size of a moving object is expected to be comparable with the size of a regular blob formed by p points. If any of the newly formed clusters has fewer than p points as members, it is removed and is not identified as a moving object.
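A minimal sketch of this clustering procedure is given below. The recursive variance update is simplified relative to the full method of [16], and normalised point coordinates are assumed so that the initial radius r = 0.05 is meaningful; the minimum cluster size p is an illustrative parameter.

```cpp
// Simplified sketch of ELM-style clustering of moving keypoints; the
// variance update is a basic incremental estimate, not the exact update
// of [16]. Normalised coordinates are assumed so that r = 0.05 is sensible.
#include <opencv2/core.hpp>
#include <cmath>
#include <vector>

struct Cluster {
    cv::Point2f mean;   // local mean (mu_i)
    float       sigma;  // scalar spread estimate (||sigma_i||)
    int         count;  // number of member points
};

std::vector<Cluster> elmCluster(const std::vector<cv::Point2f>& points,
                                float r = 0.05f, int p = 3)
{
    std::vector<Cluster> clusters;
    for (const cv::Point2f& pt : points) {
        int best = -1; float bestDist = 0.f;
        for (std::size_t i = 0; i < clusters.size(); ++i) {
            const float d = std::hypot(pt.x - clusters[i].mean.x,
                                       pt.y - clusters[i].mean.y);
            // Overlap test from Eq. (5): d_i < max(||sigma_i||, r) + r;
            // if several clusters overlap, keep the nearest one.
            if (d < std::max(clusters[i].sigma, r) + r &&
                (best < 0 || d < bestDist)) { best = (int)i; bestDist = d; }
        }
        if (best < 0) {
            clusters.push_back({pt, 0.f, 1});    // density changed: new cluster
        } else {
            Cluster& c = clusters[best];         // recursive update of mean/spread
            ++c.count;
            const cv::Point2f delta = pt - c.mean;
            c.mean += delta * (1.f / c.count);
            const float d2 = delta.x * delta.x + delta.y * delta.y;
            c.sigma = std::sqrt(((c.count - 1) * c.sigma * c.sigma + d2) / c.count);
        }
    }
    // Discard clusters formed by fewer than p points (noise and clutter).
    std::vector<Cluster> objects;
    for (const Cluster& c : clusters)
        if (c.count >= p) objects.push_back(c);
    return objects;
}
```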

III. EXPERIMENTAL RESULTS

In this section, we present experiments on real footage to identify moving objects using the introduced method. The videos were taken outdoors from a UAV flying over traffic. The proposed approach was developed on Linux using C++, and the experiments were run on an Intel Core 2 Duo running at 2.4 GHz with video at a resolution of 640 × 480. The video consists of a number of different scenes taken from a UAV. Two of the scenes are analysed in the following section: scene A consists of the UAV overlooking a motorway, and scene B consists of the UAV overlooking a crossroad from an isometric perspective.

During the experiment, keypoints were extracted from each frame using BRISK, matched with the corresponding keypoints from the next frame, and refined using RANSAC. The keypoints for performing the homography are extracted by splitting the frame into 32 bins, corresponding to 8 bins across the horizontal axis and 4 bins across the vertical axis. The mosaic image is then built to model the background. In the next step we either reuse the corresponding keypoints derived from the BRISK feature detector or generate new keypoints using the GFT feature detector. The keypoints that have moved are then extracted using the optical flow algorithm. If the movement of a keypoint between the background model and the current frame is less than the threshold, T, it is rejected; T = 2 for the following experiments, as shown in Fig. 3. In the final step, we applied the three clustering techniques, greedy clustering, mean-shift clustering, and ELM clustering, to group moving points which potentially correspond to moving objects. The three clustering methods were applied to the moving keypoints extracted using both the BRISK feature detector and the GFT detector, the outputs of which are shown in Fig. 4, Fig. 5, and Fig. 6.

Fig. 3. Screenshots showing (a) all keypoints detected by the GFT detector and (b) keypoints with displacement, calculated from the optical flow, greater than the threshold

Fig. 5. Greedy clustering on GFT features for scenes (a) A and (b) B, and BRISK features for scenes (c) A and (d) B

Fig. 4. ELM clustering on GFT features for scenes (a) A and (b) B, and BRISK features for scenes (c) A and (d) B

Positive detections are where a keypoint is associated with a moving object; false positives are where a keypoint is associated with a non-moving object; and false negatives are moving objects that have not been detected. These results are affected by the choice of feature detection algorithm used to determine the moving keypoints. A single object multiple detection (SOMD) is where a single object is split into two or more clusters, whereas a multiple object single detection (MOSD) is where multiple objects are grouped into a single cluster. These results are affected by the choice of clustering algorithm, and both should be minimised. A detected object is one with which more than one moving keypoint is associated.

A. Timings

The whole approach takes approximately 400 ms to process one frame. Table I provides details of the timings of each step of the proposed approach run on the video containing scene A. It can be seen that the homography step dominates the time taken to process a frame but is required in order to compensate for the motion of the camera.

Fig. 6. Mean shift clustering on GFT features for scenes (a) A and (b) B, and BRISK features for scenes (c) A and (d) B

The approach of splitting the image into 32 bins reduces the time taken to calculate the homography by more than a factor of two. This could be further reduced by using more bins; however, the output of the resultant homography begins to deteriorate beyond 32 bins. The results also demonstrate the increased time taken when using the GFT feature detector instead of the BRISK features generated during the homography process. However, because the time for a frame is dominated by the homography process, the additional time to extract the GFT features is offset by the increased robustness of the output. The time taken for the clustering step depends on the number of keypoints passed into the algorithm. The number of

keypoints extracted for scene A ranges from 12 to 1797, with an average of 118. Greedy clustering is the quickest clustering technique, which is to be expected given the simplicity of the algorithm. ELM and Mean Shift clustering offer more intelligent methods for clustering and hence both take longer than Greedy clustering. However, on average Mean Shift takes at least twice as long as ELM and at least three times as long as Greedy clustering. Observation of the maximum time for clustering also shows that Greedy clustering and ELM scale better than Mean Shift clustering, which is to be expected.

TABLE I
TIMINGS OF EACH STEP OF THE PROPOSED APPROACH (SCENE A)

Step of the proposed approach | Minimum time (ms) | Maximum time (ms) | Average time (ms)
Homography (32 bins)          | 174               | 444               | 367
Homography (no bins)          | 479               | 1007.9            | 884
Optical Flow (BRISK)          | 19.4              | 45.2              | 25.5
Optical Flow (GFT)            | 30.8              | 64.8              | 32.4
ELM clustering                | 0.0161            | 3.65              | 0.121
Greedy clustering             | 0.0139            | 1.04              | 0.089
Mean Shift clustering         | 0.0666            | 10.6              | 0.245
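Per-step timings such as those in Table I can be collected by wrapping each stage with a steady clock; the helper below is an illustrative sketch, not the paper's actual instrumentation.

```cpp
// Illustrative helper for collecting per-step timings such as those in
// Table I; not the paper's actual measurement code.
#include <chrono>

template <typename Step>
double timeMs(Step&& step)
{
    const auto t0 = std::chrono::steady_clock::now();
    step();  // run one stage of the pipeline
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

// Example: double homographyMs = timeMs([&] { warped = compensateMotion(frame, bg); });
```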

B. Feature Detectors

Table II shows the results of running the different feature extraction techniques on scene A prior to using optical flow to extract the keypoints that have moved. It can be seen that the GFT feature detector generates more keypoints, and more accurate ones, than the BRISK feature detector. This is a result of the GFT detector's ability to detect the most prominent corners in the image. Another contributory factor is the reduction in keypoints extracted using the BRISK algorithm for the purpose of the homography estimation.

TABLE II
RESULTS USING GFT AND BRISK FEATURES FOR SCENE A

Feature Detection | Objects detected | Positives | False positives | False negatives
GFT               | 9                | 87        | 0               | 0
BRISK             | 6                | 32        | 2               | 3

Table III shows the results from scene B. It can be seen that a number of moving objects are not detected by the proposed method. This is because the objects appear in the top left-hand corner of the scene, where they have very low pixel velocity due to the isometric perspective of the UAV. This could be alleviated by reducing the threshold when detecting moving keypoints; however, reducing the threshold too much results in artefacts and increases the number of false positives. It should be noted that the number of false negatives at the crossroad intersection in scene B is 0 for both the GFT and BRISK feature detectors.

TABLE III
RESULTS USING GFT AND BRISK FEATURES FOR SCENE B

Feature Detection | Objects detected | Positives | False positives | False negatives
GFT               | 3                | 15        | 0               | 8
BRISK             | 3                | 8         | 3               | 8

These results validate the use of a different feature detection algorithm, in this case GFT, because it provides a more robust solution. This is due to the different criteria that must be satisfied in the homography and optical flow steps of the proposed method, i.e. the detection of background keypoints for use in the homography step, and the detection of object keypoints for use in the optical flow step.

It should also be noted that there are frames where the number of false positives and false negatives dramatically increases due to the camera losing focus. When the camera loses focus the pixels, and hence objects, become blurred, which reduces the precision of the proposed algorithms due to their reliance on the ability to precisely track the location of pixels in a scene. This could be alleviated by selecting a camera with sophisticated auto-focus capabilities.

C. Clustering Algorithms

Table IV shows the results of running the clustering algorithms on the different scenes. Mean shift clustering appears to be the best due to its reduced number of SOMD and MOSD. It is followed closely by the ELM algorithm, which, however, results in a higher number of MOSD. The Greedy clustering algorithm fared the worst with respect to SOMD, which is to be expected because of the simplistic nature of the algorithm. Despite the proximity of some of the cars on the motorway in scene A and at the crossroads in scene B, there were very few instances of MOSD.

It can be seen that the choice of feature detector influences the outputs of the clustering algorithms. The BRISK feature detector provides better results for clustering because it produces a lower number of keypoints and, therefore, SOMD or MOSD are less likely to occur. However, there is a risk that not enough keypoints will be detected to identify an object. The instances of SOMD in scene A occurred on the lorry when it passes under the bridge, with the exception of the ELM clustering algorithm applied to BRISK keypoints. This is because keypoints are only detected on the front and rear of the lorry and the distance between them is greater than the radii of the clustering algorithms. This could be alleviated by reducing the radii of the clustering algorithms; however, it would result in a higher number of MOSD.
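For comparison with the ELM sketch above, a minimal flat-kernel mean-shift mode-seeking routine [27] over the 2-D moving keypoints might look as follows; the bandwidth h plays the role of the clustering radius discussed above, and the value of 30 pixels is an assumption rather than a parameter reported in the paper.

```cpp
// Illustrative sketch of mean-shift mode seeking [27] on 2-D moving
// keypoints, using a flat kernel of bandwidth h; parameters are assumptions.
#include <opencv2/core.hpp>
#include <cmath>
#include <vector>

std::vector<cv::Point2f> meanShiftModes(const std::vector<cv::Point2f>& pts,
                                        float h = 30.f, int maxIter = 20)
{
    std::vector<cv::Point2f> modes;
    for (cv::Point2f m : pts) {                 // start one trajectory per point
        for (int it = 0; it < maxIter; ++it) {
            cv::Point2f sum(0.f, 0.f);
            int n = 0;
            for (const cv::Point2f& p : pts)    // flat kernel: average neighbours
                if (std::hypot(p.x - m.x, p.y - m.y) < h) { sum += p; ++n; }
            if (n == 0) break;                  // degenerate: keep current position
            const cv::Point2f next = sum * (1.f / n);
            const float shift = std::hypot(next.x - m.x, next.y - m.y);
            m = next;
            if (shift < 1e-3f) break;           // converged to a density mode
        }
        bool merged = false;                    // merge trajectories that converge
        for (const cv::Point2f& mode : modes)   // to (nearly) the same mode
            if (std::hypot(mode.x - m.x, mode.y - m.y) < h / 2) { merged = true; break; }
        if (!merged) modes.push_back(m);
    }
    return modes;                               // one mode per detected cluster
}
```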

Due to the low number of dimensions and the small number of keypoints, the clustering algorithms do not add a significant amount of time to the proposed method, which is dominated by the time taken to perform the homography. One thing to note is that the ELM algorithm has the capability to cluster between frames, which could further improve the results; however, this was not exploited in this paper.

TABLE IV
RESULTS USING THE CLUSTERING ALGORITHMS ON SCENES A AND B

Clustering Algorithm | Feature Detector | Scene | SOMD | MOSD
ELM                  | GFT              | A     | 1    | 1
ELM                  | GFT              | B     | 0    | 1
ELM                  | BRISK            | A     | 0    | 0
ELM                  | BRISK            | B     | 0    | 0
Greedy               | GFT              | A     | 3    | 2
Greedy               | GFT              | B     | 1    | 1
Greedy               | BRISK            | A     | 1    | 0
Greedy               | BRISK            | B     | 0    | 1
Mean Shift           | GFT              | A     | 1    | 0
Mean Shift           | GFT              | B     | 0    | 0
Mean Shift           | BRISK            | A     | 1    | 0
Mean Shift           | BRISK            | B     | 0    | 0

IV. CONCLUSION

In this paper, we introduced a method for autonomously identifying multiple moving objects using an airborne camera in real time. The introduced approach is fully autonomous and is able to identify multiple moving objects in a scene. The approach has been tested on aerial videos taken by a real UAV. The results show the high reliability of the approach for identifying and tracking moving objects.

ACKNOWLEDGMENT

This research was supported by the GAMMA Programme, which is funded through the Regional Growth Fund. The Regional Growth Fund (RGF) is a £3.2 billion fund supporting projects and programmes which are using private sector investment to generate economic growth as well as creating sustainable jobs between now and the mid-2020s. For more information, please go to www.bis.gov.uk/rgf.

REFERENCES

[1] P. Angelov, P. Sadeghi-Tehran, and R. Ramezani, "A Real-time Approach to Autonomous Novelty Detection and Object Tracking in Video Stream," International Journal of Intelligent Systems, vol. 26, pp. 189–205, Jan. 2011.
[2] B. Leibe, K. Schindler, N. Cornelis, and L. van Gool, "Coupled Object Detection and Tracking from Static Cameras and Moving Vehicles," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, pp. 1683–1698, Jan. 2008.
[3] A. M. Elgammal, R. Duraiswami, D. Harwood, and L. S. Davis, "Background and foreground modeling using nonparametric kernel density estimation for visual surveillance," in Proceedings of the IEEE, 2002, pp. 1–13.
[4] T. Gevers, "Robust segmentation and tracking of colored objects in video," IEEE Transactions on Circuits and Systems for Video Technology, no. 6, 2004.
[5] P. Sadeghi-Tehran, P. Angelov, and R. Ramezani, A Fast Approach to Autonomous Detection, Identification, and Tracking of Multiple Objects in Video Streams under Uncertainties. Springer Berlin Heidelberg, Jan. 2010.
[6] C. Faloutsos, W. Equitz, M. Flickner, W. Niblack, D. Petkovic, and R. Barber, "Efficient and Effective Querying by Image Content," Journal of Intelligent Information Systems, vol. 3, pp. 231–262, Jan. 1994.
[7] P. Sadeghi-Tehran and P. Angelov, "ATDT: Autonomous Template-based Detection and Tracking of Objects from Airborne Camera," IEEE IS'2014 (in press), Feb. 2014.
[8] W. K. Pratt, "Correlation techniques of image registration," IEEE Trans. on Aerospace and Electronic Systems, vol. 10, pp. 353–358, Jan. 1974.
[9] M. Onoe and M. Saito, "Automatic threshold setting for the sequential similarity detection algorithm," IEEE Trans. on Comput., pp. 1052–1053, Jan. 1974.
[10] N. Diehl, "Object Oriented Motion Estimation and Segmentation in Image Sequences," IEEE Trans. on Image Processing, vol. 3, pp. 1901–1904, Jan. 1991.
[11] B. Lucas and T. Kanade, "An iterative image registration technique with an application to stereo vision," IJCAI, pp. 121–130, Jan. 1981.
[12] J. Shi and C. Tomasi, "Good features to track," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '94), 1994.
[13] S. Baker and I. Matthews, "Lucas-Kanade 20 years on: A unifying framework," International Journal of Computer Vision, Jan. 2004.
[14] S. Leutenegger, M. Chli, and R. Y. Siegwart, "BRISK: Binary Robust Invariant Scalable Keypoints," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2011, pp. 2548–2555.
[15] M. A. Fischler and R. C. Bolles, "Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography," Comm. of the ACM, vol. 24, no. 11, pp. 381–395, Jan. 1981.
[16] R. D. Baruah and P. Angelov, "Evolving Local Means Method for Clustering of Streaming Data," IEEE World Congress on Computational Intelligence, pp. 2161–2168, Jan. 2012.
[17] P. Sadeghi-Tehran and P. Angelov, "ARTOD: Autonomous Real Time Objects Detection by a Moving Camera using Recursive Density Estimation," in Novel Applications of Intelligent Systems. Springer Verlag, Jan. 2014.
[18] E. Hayman and J. O. Eklundh, "Statistical background subtraction for a mobile observer," in Proceedings of the Ninth IEEE International Conference on Computer Vision, vol. 2, 2003.
[19] A. Mittal and D. Huttenlocher, "Scene modeling for wide area surveillance and image synthesis," IEEE Computer Society, Jan. 2000.
[20] E. Tsinko, "Background Subtraction with a Pan/Tilt Camera," Ph.D. dissertation, The University of British Columbia, Jan. 2010.
[21] I. Khvedchenia, "A battle of three descriptors: SURF, FREAK and BRISK," Computer Vision Talks, Jan. 2012.
[22] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision. Cambridge University Press, Jan. 2003.
[23] X. Li and W. Hu, "A coarse-to-fine strategy for vehicle motion trajectory clustering," Pattern Recognition, pp. 591–594, Jan. 2006.
[24] P. J. MacCormick and A. Blake, "A Probabilistic Exclusion Principle for Tracking Multiple Objects," in Computer Vision, Jan. 1999, pp. 572–578.
[25] I. Celasun, A. M. Tekalp, M. H. Gokeetekin, and D. M. Harmanci, "2D mesh based video object segmentation and tracking with occlusion resolution," Signal Processing: Image Communication, vol. 16, pp. 949–962, Jan. 2001.
[26] X. Mao, F. Qi, and W. Zhu, "Multiple-part based Pedestrian Detection using Interfering Object Detection," in Proceedings of the Third International Conference on Natural Computation (ICNC 2007), 2007.
[27] Y. Cheng, "Mean shift, mode seeking, and clustering," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 17, pp. 790–799, Jan. 1995.
[28] K. Fukunaga, "Clustering," in Introduction to Statistical Pattern Recognition. Academic Press Professional, Jan. 1990.