Mean-shift-FAST Algorithm to Handle Motion-Blur with Tracking Fiducial Markers
Eman R. AlBasiouny
Amany Sarhan
Electrical Engineering Dept. Kafrelsheikh University Kafrelsheikh, Egypt
[email protected]
Computer & Control Engineering Dept. Tanta University Tanta, Egypt
[email protected]
Abstract—Vision-based registration methods for augmented reality systems have recently been the subject of intensive research because of their potential to align virtual objects accurately with the real world. The drawbacks of these approaches, however, are their high computational cost and lack of robustness. Motion blur and partial occlusion are two of the most critical problems affecting the robustness of fiducial marker tracking, which is used in many vision-based applications such as augmented reality. To overcome these two problems, this paper presents a novel method that merges FAST detection with the mean shift tracking algorithm. The original color-based mean shift tracker has major difficulty tracking fiducial markers. Therefore, we use “keypoints” as the feature to make the markers more distinguishable. These keypoints are detected by the FAST corner detector and tracked by the mean shift tracker. Experiments show that the proposed algorithm handles motion blur and partial occlusion efficiently.
Kafrelsheikh University Kafrelsheikh, Egypt
[email protected]
implement, their high speed and high degree of precision and accuracy across six degrees of freedom; markers also provide the correct scale and convenient coordinate frames, and may encode information or at least have an identity. Motion blur is one of the main obstacles for vision-based tracking and augmentation. Because it makes salient features almost disappear, motion blur disturbs tracking algorithms that rely on feature extraction. Motion blur is an image distortion caused by the relative motion between the camera and the scene. It is pervasive in real videos because of the low speed of the camera and the fast motion of the target, and it confounds visual tracking tasks by destroying critical features of the target [9]. Tracking a blurred target is hard for several reasons [9]: (1) the degradation in appearance frequently disrupts target inference, (2) the accompanying abrupt motion brings large uncertainty to the estimate of the target position, and (3) the degree of blur itself can vary significantly over frames, ranging from blur-free to drastic blur.
Keywords—mean shift tracker; FAST detector; motion blur; fiducial marker tracking
I. INTRODUCTION
Augmented reality [1, 2] is the process of augmenting video or photographic displays by overlaying the images with useful computer-generated data. To enhance the sensation of realism, the registration between virtual and real-world objects should be stable and accurate. To accomplish that, the system needs to know the position of the user relative to the camera and what the user is looking at. Visual tracking systems, which use computer vision techniques to deduce the pose of the camera from what it observes, have been highly successful.
Some dynamic tracking algorithms, which are based on target representation and localization, employ a probabilistic model of the object appearance and try to detect this model in consecutive frames of the image sequence. The object’s position is then estimated by minimizing a cost function between the model’s histogram and candidate histograms in the next image. A representative method of this kind is the mean shift algorithm [10, 11]. Owing to its robustness and computational efficiency, the color histogram, masked by an isotropic kernel, has been successfully applied in mean shift based tracking algorithms. However, it often causes false positives, especially when similar color modes exist in the target’s neighborhood [12]. Therefore, several researchers have tried to stabilize the mean shift algorithm by combining it with a feature-based tracking algorithm.
According to [3], the visual tracking system depends on several detection approaches, which are classified into fiducial markers, natural keypoints and natural edges. Marker-based systems consist of detectable predefined patterns that are mounted in the environment and automatically detected using an appropriate detection algorithm. Fiducial markers can be thought of as advanced barcodes with the potential not only to label an object but also to position it accurately. Fiducial tracking libraries, such as ARToolkit [4], ARTag [5], ARToolkitPlus [6], ALVAR [7] and ArUco [8], have become very popular, allowing anyone with a PC, webcam and printer to become an augmented reality researcher. The popularity of marker-based systems is explained by the fact that they are easy to
978-1-4673-9971-5/15/$31.00 ©2015 IEEE
T. Medhat Electrical Engineering Dept.
In this paper, a tracking method is proposed that combines the FAST corner detector [13] with the mean shift tracker. Our proposal builds on the detection method used in the ArUco toolkit. That method has the drawback of producing false negatives, because it loses blurred and occluded markers. Our algorithm complements the ArUco detection method: when that method fails to detect a marker, our algorithm is invoked.
Experimental results show that the proposed algorithm can track the target in the presence of velocity changes, and can detect and track blurred and partially occluded markers.
real-time with a GPU. In experiments, they confirmed that the mono-spectrum marker can be accurately detected in blurred and defocused images in real-time.
The remainder of this paper is organized as follows. Section 2 presents an overview of algorithms used to handle motion blur in visual tracking, together with a brief explanation of the FAST corner detector and the mean shift tracker, which are used in our proposal. Section 3 describes the proposed framework and methodology. Section 4 presents the experiments performed on the implemented framework and their results. Finally, Section 5 concludes the paper.
Okumura et al. [21] first estimate the amount of blur to try to remove its effect in the input images and improve the registration. Klein and Murray developed a SLAM method robust to motion blur [22]. Kernel density estimation (KDE) [23] is a common non-parametric estimation method. It has been widely used in target detection and tracking.
II. RELATED WORK
The mean shift algorithm is based on kernel density gradient estimation. The mean shift procedure is a popular object tracking algorithm because it is fast, easy to implement and performs well in a range of conditions [24]. It was first proposed by Fukunaga and Hostetler and was later promoted by Yizong Cheng [25], which greatly expanded the application of the algorithm. Comaniciu and Meer [26] successfully applied it to feature space analysis, and it is now widely used in many areas of image processing. Many tracking algorithms based on the mean shift framework have been proposed recently. In [27], normalized cross-correlation is applied as an additional step for occlusion handling, triggered by a threshold on the Bhattacharyya coefficient.
Considerable research has been carried out on marker-based augmented reality systems. Several different approaches to marker detection are known in the literature. All try to find a solution for efficient, accurate and fast augmented reality applications. The efficiency of marker recognition is affected by the efficiency of both marker detection and marker identification. Several criteria determine the quality of marker recognition: (1) Good detection: there should be a minimum number of false positives. (2) Good localization: the marker location must be reported as close as possible to the correct position. (3) Speed: the algorithm should be fast enough to be usable in real-time applications. (4) High precision: the algorithm should be able to distinguish between different markers in different luminance conditions. (5) Robustness to motion blur and occlusion: if the marker is blurred or part of it is occluded, this should not affect the global marker pose estimation.
However, color-histogram-based algorithms are sensitive to similar backgrounds and illumination variation. Therefore, robust and distinctive features such as SIFT [28] and SURF [29] were introduced into mean shift tracking. Haner et al. [30] improved the robustness of the mean shift tracking algorithm by using SURF features. Kai Du et al. [31] then proposed a tracking algorithm that fuses an improved mean shift (MS) with SIFT. Finally, Nan Luo et al. [32] used FAST to detect the feature points and the DAISY descriptor [33, 34] to describe them. They then built histograms of both the target model and the candidate from the feature points and found the correspondence between two frames. The new location of the target was calculated by matching the histograms in the mean shift framework.
Most existing tracking methods ignore blur and are therefore prone to failure when the input images are blurred. Only a few methods explicitly consider blur. A natural solution to the motion-blur problem is to first deblur the contents and then apply tracking. In image processing, a large number of robust deconvolution methods have been developed, from the earlier approaches based on regularization [14] to the latest ones using image statistics [15, 16], edge priors [17], and sparse representation [18, 19]. However, most deblurring algorithms are computationally expensive and therefore not suitable for time-sensitive visual tracking tasks. In addition, dealing simultaneously with different degrees of blur is not a trivial problem.
In this paper, an effective marker tracking algorithm is proposed to handle the motion-blur problem in the ArUco toolkit. This algorithm is a combination of the mean shift tracker and the FAST corner detector. Compared with other feature detectors, such as SURF, SIFT or BRISK, FAST is a high-speed corner-based detector that yields more feature points. With more feature points available, marker detection becomes more precise.
In [9], a novel BLUr-driven Tracker (BLUT) framework for tracking motion-blurred targets is presented. BLUT actively uses the information from blurs without performing deblurring. They further use the motion information inferred by blurs to guide the sampling process in the particle filter based tracking.
For marker detection in augmented reality applications, the problem with conventional markers is that their patterns consist of high-frequency components, such as sharp edges and corners, which are attenuated in blurred or defocused images. As a result, these markers are difficult to detect and identify under image blur and defocus. In [20], a mono-spectrum marker consisting of a single low-frequency component is presented. It can be detected in
A. FAST Corner Detector
The FAST (Features from Accelerated Segment Test) [13] feature detector belongs to the class of corner detectors that examine a small patch of an image to see whether it looks like a corner. Such detectors are computationally efficient because they examine only a few pixels for each corner detected, and FAST is fast enough to be used in real-time applications. As shown in Fig. 1 [13], the test is performed at pixel ‘c’ by examining the 16-pixel circle of radius 3 pixels surrounding it. If the intensities of at least 12 contiguous pixels of the 16-pixel circle are all brighter than or all darker than the intensity of ‘c’ by some threshold t, ‘c’ is detected as a corner [13].
To reject non-corners quickly, the test first examines only the four pixels at 1, 5, 9 and 13 (the four compass directions). If ‘c’ is a corner, then at least three of them must be brighter than (Ic + t) or darker than (Ic − t). If neither of these is the case, then ‘c’ cannot be a corner. The full segment test criterion is then applied to the remaining candidates by examining all pixels in the circle. This type of corner detection uses the intensities of the 16-pixel circle as a feature vector [13].
Fig. 1. FAST feature detection in an image patch [13].
Fig. 2. Block diagram of the ArUco_detection algorithm: obtain all borders in the frame by segmentation; find contours; remove borders with a small number of points; apply polygonal approximation to keep the 4-corner contours; sort corners in anti-clockwise direction; remove rectangles that are too close together to keep the most external border.
B. Mean Shift Algorithm
high ability in tracking rigid and non-rigid objects. However, the color-based mean shift algorithm is vulnerable when tracking such objects [38]. Consequently, relying on a single feature in the tracking process is unsuitable. A color histogram describes only the color distribution and ignores spatial information, so the tracked target is easily lost against a complex background or under changing illumination [12].
The important inherent drawback of the mean shift tracking algorithm is its local optimization [37]. The original mean shift tracking algorithm assumes that the initialization point falls within the basin of attraction of the desired mode, but this assumption may not hold when the displacement between successive frames is relatively large.
Mean shift is a feature-based tracking algorithm. It is a simple iterative procedure that shifts each data point to the average of the data points in its neighborhood [10], as described in the following general steps [35]:
• Consider a set S of n data points x_i, i = 1, ..., n, in d-dimensional Euclidean space X.
• Let K(x) denote a kernel function that indicates how much x contributes to the estimation of the mean.
• Then, the sample mean m at x with kernel K is given by:
$$ m(x) = \frac{\sum_{i=1}^{n} K(x - x_i)\, x_i}{\sum_{i=1}^{n} K(x - x_i)} \tag{1} $$
• The difference m(x) − x is called the mean shift.
• Mean shift algorithm: iteratively move each data point to its sample mean.
• In each iteration, x ← m(x).
• The algorithm stops when m(x) = x.
• The sequence x, m(x), m(m(x)), ... is called the trajectory of x.
• If sample means are computed at multiple points, the update is applied simultaneously to all these points at each iteration.
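The iteration listed above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the Gaussian kernel, the `bandwidth` parameter, the tolerance and the function names are all assumptions chosen for the example.

```python
import numpy as np

def sample_mean(x, points, bandwidth=1.0):
    """Kernel-weighted sample mean m(x) of Eq. (1), with an assumed Gaussian kernel."""
    diff = points - x
    weights = np.exp(-np.sum(diff ** 2, axis=1) / (2.0 * bandwidth ** 2))
    return (weights[:, None] * points).sum(axis=0) / weights.sum()

def mean_shift(x0, points, bandwidth=1.0, tol=1e-6, max_iter=200):
    """Repeat x <- m(x) until the shift m(x) - x is (numerically) zero."""
    x = np.asarray(x0, dtype=float)
    trajectory = [x]
    for _ in range(max_iter):
        m = sample_mean(x, points, bandwidth)
        trajectory.append(m)
        if np.linalg.norm(m - x) < tol:  # m(x) = x up to the tolerance
            break
        x = m
    return trajectory[-1], trajectory

# Four samples placed symmetrically around (5, 5); start away from the mode.
points = np.array([[4.9, 5.0], [5.1, 5.0], [5.0, 4.9], [5.0, 5.1]])
mode, trajectory = mean_shift([4.0, 4.0], points)
```

The returned `trajectory` is the sequence x, m(x), m(m(x)), ... mentioned in the steps; in exact arithmetic the stopping rule is m(x) = x, which the tolerance approximates.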
III. THE PROPOSED ALGORITHM
As mentioned, using the color as a feature that represents the tracked target causes some drawbacks in mean shift tracking, especially with tracking rigid objects like markers. Therefore, in the proposed algorithm, some robust and distinguishing features such as keypoints are introduced into the marker tracking field. These keypoints are detected by using FAST corner detector.
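As a reminder of how such keypoints are found, the segment test of Section II-A can be sketched as follows. This is a minimal sketch under stated assumptions, not the detector of [13] itself: the default threshold `t=20`, the function names, and the requirement that the caller keeps a 3-pixel margin from the image border are illustrative choices, and no non-maximum suppression is performed.

```python
import numpy as np

# Offsets (dx, dy) of the 16-pixel circle of radius 3, starting at pixel 1
# (directly above the center) and going clockwise.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]

def longest_circular_run(flags):
    """Length of the longest run of True values on a circular list."""
    if all(flags):
        return len(flags)
    best = run = 0
    for f in list(flags) + list(flags):  # doubling the list handles wrap-around
        run = run + 1 if f else 0
        best = max(best, run)
    return best

def is_fast_corner(img, row, col, t=20, n=12):
    """Segment test at (row, col); assumes a 3-pixel margin to the border."""
    ic = int(img[row, col])
    ring = [int(img[row + dy, col + dx]) for dx, dy in CIRCLE]
    brighter = [p > ic + t for p in ring]
    darker = [p < ic - t for p in ring]
    # Quick rejection on pixels 1, 5, 9 and 13 (list indices 0, 4, 8, 12).
    compass = (0, 4, 8, 12)
    if (sum(brighter[i] for i in compass) < 3
            and sum(darker[i] for i in compass) < 3):
        return False
    # Full test: at least n contiguous pixels all brighter or all darker.
    return (longest_circular_run(brighter) >= n
            or longest_circular_run(darker) >= n)

blob = np.zeros((21, 21), dtype=np.uint8)
blob[10, 10] = 255                    # isolated bright pixel
edge = np.zeros((21, 21), dtype=np.uint8)
edge[:, 10:] = 255                    # straight vertical edge
```

With n = 12, any qualifying arc covers at least three of the four compass pixels, which is why the quick rejection step cannot discard a true corner.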
The mean shift tracking algorithm first needs to determine the area that contains a dynamic target feature. Many features can be used to represent the target, such as feature points, contours, and color distribution. The original mean shift algorithm is a color-based object tracking method. In each frame, it searches for the region whose color histogram is close to the object’s reference color histogram [36].
The algorithm, which tracks markers in an image using both the FAST corner detector and mean shift tracking, comprises several steps aimed at tracking markers under different conditions of motion, illuminance and occlusion. Although each component of our combined algorithm has been applied in previous work, the combination itself is a novel contribution.
The distance between two histograms is measured through the Bhattacharyya coefficient and the search process is continued to find the object’s location via the mean shift iterations, which are initiated from the object’s location as estimated in the previous frame [36].
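For reference, the Bhattacharyya coefficient between two histograms and the distance commonly derived from it in mean shift tracking can be computed as below. The function names are illustrative, and the histograms are assumed to be non-negative with a non-zero sum.

```python
import numpy as np

def bhattacharyya_coefficient(p, q):
    """Similarity of two histograms after normalizing each to sum to 1."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(np.sqrt(p * q)))

def bhattacharyya_distance(p, q):
    """Distance sqrt(1 - coefficient); 0 for identical histograms."""
    return float(np.sqrt(max(0.0, 1.0 - bhattacharyya_coefficient(p, q))))

same = bhattacharyya_coefficient([1, 2, 1], [2, 4, 2])      # same shape
disjoint = bhattacharyya_coefficient([1, 0, 0], [0, 0, 1])  # no overlap
```

A coefficient close to 1 means the candidate region resembles the target model, so maximizing the coefficient is equivalent to minimizing the distance.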
Our algorithm is based on the marker detection algorithm used in the ArUco toolkit and presented in [38]. Its authors performed marker detection in two steps. The first step is “Image Segmentation”, which employs a local adaptive thresholding approach that has proven to be very robust to different lighting conditions. The second step is “Contour Extraction and Filtering”. A contour extraction was
Due to its robustness, high speed and computational efficiency, the mean shift algorithm is widely used, with
The mean (center) of the keypoints is calculated as follows:
$$ \text{mean}_x = \frac{1}{n}\sum_{i=1}^{n} \text{keypoints}_{i,x}, \qquad \text{mean}_y = \frac{1}{n}\sum_{i=1}^{n} \text{keypoints}_{i,y} \tag{2} $$
where n is the number of keypoints within the rectangle.
The center of the bounding rectangle is shifted to the new mean by applying the following equations:
$$ \text{bounding\_box}_x = \text{mean}_x - \text{bounding\_box.width}/2, \qquad \text{bounding\_box}_y = \text{mean}_y - \text{bounding\_box.height}/2 \tag{3} $$
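A single update of this kind can be written out as a short sketch. The `(x, y, width, height)` box layout, the half-open containment test and the function names are assumptions made for illustration; the paper does not specify them.

```python
import numpy as np

def keypoints_in_box(keypoints, box):
    """Return the keypoints (x, y) that fall inside box = (x, y, width, height)."""
    x, y, w, h = box
    kp = np.asarray(keypoints, dtype=float).reshape(-1, 2)
    inside = ((kp[:, 0] >= x) & (kp[:, 0] < x + w) &
              (kp[:, 1] >= y) & (kp[:, 1] < y + h))
    return kp[inside]

def shift_box_to_mean(keypoints, box):
    """One update: mean of the keypoints in the box, then re-center the box."""
    x, y, w, h = box
    inside = keypoints_in_box(keypoints, box)
    if len(inside) == 0:          # nothing to follow, keep the box unchanged
        return box
    mean_x, mean_y = inside.mean(axis=0)          # mean of the n keypoints
    return (mean_x - w / 2.0, mean_y - h / 2.0, w, h)   # new top-left corner

kps = [(12, 12), (14, 12), (12, 14), (14, 14), (100, 100)]
new_box = shift_box_to_mean(kps, (0, 0, 20, 20))
```

Here the keypoint at (100, 100) lies outside the rectangle and is ignored, so n = 4 and the mean is (13, 13).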
To clarify, Fig. 4 shows the steps of the proposed algorithm applied to frames 66 and 67 of the recorded video that is used in the ArUco toolkit experiments and is available on its web page [8]. As this example shows, running “ArUco_detection” as the first step in each frame reduces the effect of local optimization, which severely degrades the performance of the original mean shift algorithm.
As the mean shift algorithm alone suffers from accumulating the detection mistakes and losing track of the
performed using the Suzuki and Abe [39] algorithm. Then, a polygonal approximation was performed using the Douglas-Peucker [40] algorithm. Finally, they simplified nearby contours, leaving only the external ones. Fig. 2 describes the steps of the algorithm used in the ArUco toolkit and presented in [38]. In the remainder of this paper, we refer to this method as “ArUco_detection” for simplicity.
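To illustrate the polygonal approximation step, the following is a small pure-Python sketch of Douglas-Peucker simplification for an open polyline. It is not the ArUco code: the function names and the tolerance `epsilon` are illustrative, distances are measured to the chord segment rather than to the infinite line, and handling of closed contours (needed before keeping only 4-corner candidates) is left out.

```python
import math

def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))     # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def douglas_peucker(points, epsilon):
    """Drop every point closer than epsilon to the simplified polyline."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    idx, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], a, b)
        if d > dmax:
            idx, dmax = i, d
    if dmax <= epsilon:           # the chord a-b is a good enough fit
        return [a, b]
    left = douglas_peucker(points[:idx + 1], epsilon)
    right = douglas_peucker(points[idx:], epsilon)
    return left[:-1] + right      # the split point appears only once

noisy_l = [(0, 0), (1, 0.05), (2, -0.05), (3, 0), (3.05, 1), (2.95, 2), (3, 3)]
simplified = douglas_peucker(noisy_l, 0.2)
```

In this example the slightly noisy L-shaped polyline collapses to its two end points and the corner at (3, 0).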
In this paper, the proposed method uses the mean shift algorithm to track markers based on FAST keypoints. We replace color, the most commonly used feature in the mean shift algorithm, with the “keypoints” feature. Point features have strong characteristics [13], which makes it relatively easy to localize them and to find correspondences between frames. This makes point-based systems robust to large, unpredictable inter-frame motions. In addition, points have the advantage of not being affected by illumination and have the property of rotational invariance.
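Fig. 3. Block diagram of the proposed method: for each input video frame, perform ArUco_detection; if the marker is not detected, detect FAST keypoints in the current frame, initialize the tracking window to the last known position, detect the keypoints falling within the tracking window, and update the center of the window to the centroid of those keypoints, repeating until convergence and stopping at the center of the marker.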
Fig. 4. The main steps of the proposed algorithm. (a) Frame 66: all markers are detected with ArUco_detection method. (b) Frame 67: one marker is lost by ArUco_detection method. (c) Detect the FAST keypoints in the current frame. (d) Initialize the tracking rectangle to bound coordinates of this marker in the last frame. (e) Determine keypoints within the bounding rectangle. (f) For 10 times, determine which keypoints fall within the tracking rectangle and update the center of the tracking rectangle to be the centroid of all the keypoints that fall within it.
Fig. 3 shows the steps of the proposed algorithm for tracking one marker. The number of iterations needed for convergence is determined as the marker is followed through the image sequence.
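The control flow for one marker and one frame might look like the sketch below. It is a hedged illustration rather than the authors' code: `track_frame`, `mean_shift_on_keypoints`, the `(x, y, width, height)` box layout and the `eps` convergence threshold are hypothetical, while the limit of 10 iterations and the 4-frame absence limit (`max_num_absent_frames`) follow values stated in the paper. The marker detector and the FAST detector are assumed to run elsewhere and pass in their results.

```python
import numpy as np

def mean_shift_on_keypoints(keypoints, box, max_iter=10, eps=0.5):
    """Move box = (x, y, w, h) until it is centered on the keypoints inside it."""
    x, y, w, h = box
    kp = np.asarray(keypoints, dtype=float).reshape(-1, 2)
    for _ in range(max_iter):
        inside = kp[(kp[:, 0] >= x) & (kp[:, 0] < x + w) &
                    (kp[:, 1] >= y) & (kp[:, 1] < y + h)]
        if len(inside) == 0:
            break
        cx, cy = inside.mean(axis=0)
        new_x, new_y = cx - w / 2.0, cy - h / 2.0
        moved = np.hypot(new_x - x, new_y - y)
        x, y = new_x, new_y
        if moved < eps:           # converged: the window no longer moves
            break
    return (x, y, w, h)

def track_frame(detected_box, keypoints, last_box, num_absent,
                max_num_absent_frames=4):
    """Return (box or None, updated absent-frame counter) for one frame."""
    if detected_box is not None:  # the detector found the marker: re-initialize
        return detected_box, 0
    num_absent += 1
    if last_box is None or num_absent > max_num_absent_frames:
        return None, num_absent   # assume the marker has left the scene
    return mean_shift_on_keypoints(keypoints, last_box), num_absent

# A 5 x 5 grid of keypoints centered on (50, 50) stands in for a marker.
grid = [(x, y) for x in range(40, 61, 5) for y in range(40, 61, 5)]
box, absent = track_frame(None, grid, (30, 30, 26, 26), 0)
```

Resetting the counter whenever the detector succeeds is what keeps the mean shift step from drifting for more than a few frames.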
TABLE 1. AVERAGE PROCESSING TIME FOR THE THREE ALGORITHMS TO DETECT MARKERS.
Algorithm               Processing Time (ms)
ArUco_detection         13.55552
Meanshift-FAST          13.66245
Meanshift-FAST-DAISY    59.23562
markers. The “ArUco_detection” algorithm is mainly used to reinitialize the tracker at the correct position, which helps when the displacement between successive frames is relatively large. We therefore use a variable called max_num_absent_frames to ensure that this reinitialization guide is never lost for more than 4 frames. The proposed algorithm steps can be summarized as follows:
The proposed algorithm is compared with the “ArUco_detection” method described in the previous section and with another mean shift based tracking algorithm, presented in [32]. The authors of [32] used FAST to detect the feature points and the DAISY descriptor to describe them. They then built histograms of both the target model and the candidate from the feature points and found the correspondence between two frames. Finally, they calculated the new location of the target by matching the histograms in the mean shift framework. In the remainder of this paper, we refer to this algorithm as “meanshift-FAST-DAISY”. The performance of these algorithms is evaluated using three criteria: processing time, motion blur and partial occlusion.
Algorithm: Tracking multiple-marker board by using meanshift-FAST algorithm
1. max_num_absent_frames = 4   # if a marker has been absent for more than 4 frames, assume it has left the scene
while load video frame I_t, t = 0, 1, 2, ..., N do
    2. Detect the markers using ArUco_detection algorithm → detected_markers[]
    3. Detect FAST keypoints in I_t
    while markers[] has not desired size do   # markers[] is the array of all currently detected markers
        if marker is detected by ArUco_detection algorithm then
            4. markers[] ← detected_markers[]
            5. num_absent_frames = 0
            6. return
        end if
        7. num_absent_frames = num_absent_frames + 1   # marker has not been detected by ArUco_detection
        8. Compute bounding_box   # the bounding rectangle of the last known position of the marker
        while index