Precision feature point tracking method using a drift-correcting template update strategy

Xiaoming Peng* (a), Qian Ma (b), Qiheng Zhang (b), Wufan Chen (a), Zhiyong Xu (b)
(a) College of Automation, University of Electronic Science and Technology of China, No. 4, Section 2, North Jianshe Road, Chengdu, Sichuan 610054, China
(b) 5th Lab, Institute of Optics and Electronics, Chinese Academy of Sciences, Chengdu 350 P. O. Box, Chengdu, Sichuan 610209, China

ABSTRACT

We present a drift-correcting template update strategy for precisely tracking a feature point in 2D image sequences. The proposed strategy greatly extends the template tracking strategy of Matthews et al. [I. Matthews, T. Ishikawa and S. Baker, The template update problem, IEEE Trans. PAMI 26 (2004) 810-815] by incorporating a robust non-rigid image registration step used in medical imaging. Matthews et al.'s strategy uses the first template to correct drift in the current template; however, drift still builds up if the first template becomes quite different from the current one as the tracking continues. In our strategy the first template is updated promptly whenever it becomes quite different from the current one, so that the updated first template can be used to correct template drift in subsequent frames. The method based on the proposed strategy yields tracking results of sub-pixel accuracy, as measured against the commercial software REALVIZ® MatchMover® Pro 4.0. Our method runs fast on a desktop PC (3.0 GHz Pentium® IV CPU, 1 GB RAM, Windows® XP Professional, Microsoft Visual C++ 6.0®), taking about 0.03 seconds on average to track the feature point in a frame (under a general affine transformation model, with a 61×61-pixel template) and, when required, less than 0.1 seconds to update the first template. We also propose an architecture for implementing our strategy in parallel.

Keywords: feature point tracking, template matching, template update strategy

1. INTRODUCTION

We are interested in tracking a pre-selected local feature point precisely and consecutively through a long 2D image sequence containing thousands of frames, which is of high interest to motion analysis, 3D scene reconstruction and, in particular, military applications. It is non-trivial to track a feature point consecutively and precisely in a long image sequence because the appearance of the feature point can change substantially over a number of frames, owing to changes in illumination and/or scene structure. Some commercial software packages (e.g., REALVIZ® MatchMover® Pro) can do the job well, but the technologies they use are not open. In addition, such packages adopt batch computations that are not suitable for on-line tracking applications.

In the public literature, most feature point tracking methods (e.g., [1]-[5]) extract features in two consecutive frames and then find correspondences between the features. Such methods have two drawbacks. First, as far as the authors know, no existing widely-used feature point extractor (e.g., the Harris detector [6], the SIFT detector [7], the SUSAN detector [8], or a wavelet-based feature detector [9]) can reliably extract the feature point in every frame of a long sequence under different illumination conditions and appearance variations. Second, a feature point extractor may output multiple feature points that are very close to one another; to limit the number of output feature points, non-maximum suppression is usually applied, which may suppress the real feature point of interest.

We consider using template matching for this task. Template matching utilizes the whole content of the template without relying on the accuracy of feature extraction, and is thus more robust than feature-extraction-and-matching methods. Some methods use 3D CAD models ([10] and [11]) or a set of appearance models [12] to assist tracking. However, such models are not always available in practice, and in most cases we can only use the template extracted from the first frame. Assume that we extract a template in the first frame with the feature point to be tracked located at the center of the template. The problem is then to match the template in the remaining frames and find the pixels (to sub-pixel accuracy) corresponding to the center of the template in those frames.

The underlying assumption behind template matching is that the appearance of the template remains the same throughout the entire image sequence. This assumption is often violated in practice, especially in a long image sequence. One solution to this problem is the naive strategy, in which the template is updated every frame (or every few frames) with a new template (called the current template) extracted from the current image at the current location of the template. The problem with this naive strategy is that the template drifts (and so does the feature point located at the center of the template).

A remedy for template drift was proposed in [12], where the template is first updated using the naive strategy, and the updated template is then aligned with the first template to eliminate drift. However, the alignment of the first template with the updated template might fail when the two templates are quite different in appearance (e.g., due to partial occlusion). If the difference between the two templates persists over many frames, drift builds up just as in the naive strategy. A very recent work [13] proposed a robust drift-correcting template matching method. It adapts the inverse compositional algorithm [14] by adding robust weights to the least-squares solution and the Hessian matrix, and then uses this adaptation to successively align the current and first templates with the current frame. This method is more robust to appearance variations and illumination changes than the original drift-correcting method of [12], the robust weights effectively suppressing non-object pixels. However, we have found that this method fails once the first template becomes considerably different from the current one. Furthermore, the template update methods of [12] and [13] share a common shortcoming: both use a single transformation model (e.g., a rigid transformation model, for computational efficiency) to describe the template deformations throughout the tracking, which is generally insufficient as the template may undergo complex or non-rigid deformations.

The main contribution of this paper is a powerful template update strategy for precision feature point tracking. By incorporating a robust non-rigid image registration step, the proposed strategy is able to update the first template promptly when it becomes quite different from the current one due to illumination changes, partial occlusion, or non-rigid deformations. Thereafter, the updated first template can be used to correct template drift in subsequent frames. It is worth pointing out that, in contrast to most methods [1]-[5], we emphasize the tracking of a single feature point because in some applications (particularly military ones) there is only a single feature point of interest.

The rest of the paper is organized as follows. The proposed template update strategy is given in section 2. In section 3 we present experimental results obtained with the proposed template update strategy and compare them with the results of the two previous template update strategies. Conclusions and future work are given in section 4.

2. THE PROPOSED TEMPLATE UPDATE STRATEGY

Let the image sequence be represented by $\{I_n\}_{0 \le n \le L-1}$, where $L$ is the total number of frames. The feature point to be tracked is chosen in the first frame $I_0$, either manually or with the assistance of a feature point extractor. The first template¹ $T_1$ is then extracted from $I_0$ with the chosen feature point as the template's center. The current templates are denoted $\{T_n\}_{1 \le n \le L-1}$. An overview of the proposed template update strategy is given in Fig. 1. The strategy is composed of two parts. The first part (enclosed by bold dashed lines in Fig. 1), based on the template update strategy of [12]², updates the current templates. The second part is a non-rigid image registration step followed by an update of the starting template. The feature point is tracked in the subsequent frames $\{I_n\}$ ($n \ge 1$) by alternating between the two parts of the strategy.

2.1 Updating the current templates

The idea of this part is to use the starting template $T_s$ (initialized to the first template $T_1$ at the very beginning of the tracking) to correct the drift in the current template $T_n$. The current frame $I_n$ is first tracked with template $T_n$, starting from the previous parameter $\mathbf{p}_{n-1}$, which yields a new parameter $\mathbf{p}_n$. A more accurate parameter $\mathbf{p}_n^*$ is then obtained by tracking $T_s$ in $I_n$ starting at $\mathbf{p}_n$. When the condition $\|W(\mathbf{c};\mathbf{p}_n) - W(\mathbf{c};\mathbf{p}_n^*)\| \le \varepsilon$ is satisfied, i.e., the two locations $W(\mathbf{c};\mathbf{p}_n)$ and $W(\mathbf{c};\mathbf{p}_n^*)$ of the feature point found in $I_n$ by the two templates $T_n$ and $T_s$ are close to each other, the drift correction is viewed as successful, and $I_n(W(\mathbf{X};\mathbf{p}_n^*))$ is used as $T_{n+1}$ for the next frame $I_{n+1}$.

¹ In the following text we introduce the "starting template" to differentiate it from the "first template" of [12]. In [12] the first template is extracted from the first frame of the sequence and stays unchanged throughout the tracking. Our "starting template" is initialized as the "first template" but can change over time.
² The original template update strategy of [12] is for template tracking. We adapt it slightly to fit our feature point tracking framework; nevertheless, the idea of the adaptation is the same as in the original version.

[Figure 1: flowchart of the proposed template update strategy, showing the "tracking" modules of part 1 (current template update, section 2.1) and the non-rigid registration branch of part 2 (starting template update, section 2.2).]

Fig. 1 The flowchart diagram of the proposed template update strategy for feature point tracking.
Notation: $T_n$ - current template; $I_n$ - current image; $T_s$ - starting template; $\mathbf{p}$ - parameter set that warps a template; $W(\mathbf{x};\mathbf{p})$ - takes a pixel location $\mathbf{x}$ in the coordinate frame of a template and maps it to a sub-pixel location $W(\mathbf{x};\mathbf{p})$ in the coordinate frame of $I_n$; $\mathbf{c}$ - center of a template; $\mathbf{X}$ - the set of all pixel locations in a template; $I_n(W(\mathbf{X};\mathbf{p}))$ - the set of pixel intensities interpolated at locations $W(\mathbf{X};\mathbf{p})$ of image $I_n$; $T_1(\mathbf{p}_n)$ - warped version of $T_1$ by $\mathbf{p}_n$.
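To make the control flow of part 1 concrete, the following minimal Python/NumPy sketch traces one frame through the left half of Fig. 1. The helpers `track` (an alignment routine such as the inverse compositional algorithm [14]) and `extract_template` are hypothetical placeholders, not the authors' code.

```python
import numpy as np

def warp_center(c, p):
    """Affine warp of Eq. (1), given below, applied to the template center c = (x, y)."""
    x, y = c
    p1, p2, p3, p4, p5, p6 = p
    return np.array([(1 + p1) * x + p3 * y + p5,
                     p2 * x + (1 + p4) * y + p6])

def track_one_frame(I_n, T_n, T_s, p_prev, c, eps, track, extract_template):
    """One pass of part 1: drift-corrected update of the current template."""
    p_n = track(I_n, T_n, p_prev)        # track the current template from p_{n-1}
    p_star = track(I_n, T_s, p_n)        # re-track with the starting template
    if np.linalg.norm(warp_center(c, p_n) - warp_center(c, p_star)) <= eps:
        # Drift correction succeeded: report W(c; p*) and use I_n(W(X; p*))
        # as the current template T_{n+1} for the next frame.
        return warp_center(c, p_star), p_star, extract_template(I_n, p_star), True
    # Otherwise part 2 (section 2.2) must try to update the starting template;
    # meanwhile the drift-uncorrected patch I_n(W(X; p_n)) serves as T_{n+1}.
    return warp_center(c, p_n), p_n, extract_template(I_n, p_n), False
```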

This part involves choosing a transformation model for $W(\mathbf{x};\mathbf{p})$. Although in general the transformation model can be arbitrarily complex [12], it is not practical to choose a complex one, for two reasons. On the one hand, it is generally difficult to select a general model able to describe all the possible deformations (including non-rigid ones) of the template. On the other hand, even if one could use a complex transformation model, the computational efficiency would drop drastically as the degrees of freedom of the deformation model increase³.

³ We use a fast algorithm, the inverse compositional algorithm [14], to track a template in a frame in the "tracking" module of Fig. 1. According to [14], for each current template the one-time pre-computation cost of computing the steepest descent images and the Hessian is $O(n^2 N)$, and the cost of each succeeding iteration is $O(nN + n^2)$, where $n$ is the number of parameters of the transformation model and $N$ is the number of pixels in the template.
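As a point of reference for these costs, here is a compressed sketch of the inverse compositional update for the affine warp of Eq. (1) below, following the standard formulation of [14]; it is an illustration, not the authors' implementation. The sampling helper `I_warp`, which returns the frame interpolated at $W(\mathbf{X};\mathbf{p})$, is assumed rather than shown.

```python
import numpy as np

def affine_matrix(p):
    """3x3 homogeneous matrix of the affine warp in Eq. (1)."""
    p1, p2, p3, p4, p5, p6 = p
    return np.array([[1 + p1, p3, p5],
                     [p2, 1 + p4, p6],
                     [0.0, 0.0, 1.0]])

def params_from_matrix(A):
    return np.array([A[0, 0] - 1, A[1, 0], A[0, 1], A[1, 1] - 1, A[0, 2], A[1, 2]])

def inverse_compositional(I_warp, T, p0, n_iter=30, tol=1e-4):
    """Align template T to a frame; I_warp(p) must return the frame sampled
    at W(X; p) with the same shape as T (a hypothetical sampling helper)."""
    h, w = T.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    Ty, Tx = np.gradient(T.astype(float))          # template gradients (precomputed once)
    # Steepest descent images for the affine warp, in the order p1..p6
    SD = np.stack([Tx * xs, Ty * xs, Tx * ys, Ty * ys, Tx, Ty], axis=-1).reshape(-1, 6)
    H_inv = np.linalg.inv(SD.T @ SD)               # inverse Hessian (precomputed once)
    A = affine_matrix(p0)
    for _ in range(n_iter):
        err = (I_warp(params_from_matrix(A)) - T).reshape(-1)
        dp = H_inv @ (SD.T @ err)
        A = A @ np.linalg.inv(affine_matrix(dp))   # inverse compositional composition
        if np.linalg.norm(dp) < tol:
            break
    return params_from_matrix(A)
```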

We choose the general affine transformation [Eq. (1)] as the transformation model; it is widely used in many real applications and has a moderate number of parameters. Moreover, the affine transformation is sufficient for describing the deformations between consecutive frames [4]. As will be seen in section 2.2, this assumption does not prevent us from capturing the non-rigid deformations of the templates.

$$W(\mathbf{x};\mathbf{p}) = \begin{cases} (1+p_1)x + p_3 y + p_5 \\ p_2 x + (1+p_4) y + p_6 \end{cases} \quad (1)$$

where $\mathbf{p} = [p_1, p_2, p_3, p_4, p_5, p_6]^t$ and $\mathbf{x} = [x, y]^t$. $\mathbf{p}$ is initialized to $[0, 0, 0, 0, x_0, y_0]^t$, where $[x_0, y_0]^t$ is the location of the template in the first frame.

The inverse compositional algorithm outputs a $\mathbf{p}$ ($\mathbf{p}$ is $\mathbf{p}_n$ or $\mathbf{p}_n^*$) that best matches a template $T$ ($T$ is $T_n$ or $T_s$) to the frame $I_n$. Since $\mathbf{p}$ is computed under the assumption that the deformations between $T$ and $I_n$ can be described by the general affine transformation (which might not be accurate), we adopt a sub-pixel refinement technique [5] to further refine $\mathbf{p}$. Assume that the center of the warped template $T^{(\mathbf{p})}$ is offset by $[\Delta x, \Delta y]^t$ relative to $W(\mathbf{c};\mathbf{p})$, where $\mathbf{c}$ is the center of template $T$. Then for a small neighborhood $-d \le x, y \le d$ around $W(\mathbf{c};\mathbf{p})$ we have $I_n(x, y) = T^{(\mathbf{p})}(x - \Delta x, y - \Delta y)$. The difference $D$ between $T^{(\mathbf{p})}$ and $I_n$ in the small neighborhood is

$$D(x, y) = T^{(\mathbf{p})}(x, y) - I_n(x, y) = T^{(\mathbf{p})}(x, y) - T^{(\mathbf{p})}(x - \Delta x, y - \Delta y) \approx \frac{\partial T^{(\mathbf{p})}}{\partial x}\Delta x + \frac{\partial T^{(\mathbf{p})}}{\partial y}\Delta y. \quad (2)$$

Eq. (2) can be written in matrix form as $\mathbf{J} = \mathbf{G}\mathbf{O}$, where $\mathbf{J} = [D(-d,-d), \ldots, D(0,0), \ldots, D(d,d)]^t$,

$$\mathbf{G} = \begin{bmatrix} \partial T^{(\mathbf{p})}(-d,-d)/\partial x & \ldots & \partial T^{(\mathbf{p})}(0,0)/\partial x & \ldots & \partial T^{(\mathbf{p})}(d,d)/\partial x \\ \partial T^{(\mathbf{p})}(-d,-d)/\partial y & \ldots & \partial T^{(\mathbf{p})}(0,0)/\partial y & \ldots & \partial T^{(\mathbf{p})}(d,d)/\partial y \end{bmatrix}^t,$$

and $\mathbf{O} = [\Delta x, \Delta y]^t$. The offset vector $\mathbf{O}$ can be computed as $\mathbf{O} = (\mathbf{G}^t\mathbf{G})^{-1}\mathbf{G}^t\mathbf{J}$. The refined $\mathbf{p}$ is written as

$$\begin{cases} p_5 \leftarrow p_5 + \Delta x \\ p_6 \leftarrow p_6 + \Delta y \end{cases} \quad (3)$$

which is the output of the "tracking" module.
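A minimal NumPy sketch of this least-squares refinement, assuming the two (2d+1)×(2d+1) patches (the warped template $T^{(\mathbf{p})}$ around its center, and $I_n$ around $W(\mathbf{c};\mathbf{p})$) have already been sampled by interpolation helpers not shown here:

```python
import numpy as np

def subpixel_offset(patch_T, patch_I):
    """Solve J = G O of Eq. (2) for O = [dx, dy] over the neighborhood."""
    Ty, Tx = np.gradient(patch_T.astype(float))   # spatial derivatives of T^(p)
    J = (patch_T - patch_I).reshape(-1)           # D(x, y), stacked as in J
    G = np.column_stack([Tx.reshape(-1), Ty.reshape(-1)])
    O, *_ = np.linalg.lstsq(G, J, rcond=None)     # O = (G^t G)^(-1) G^t J
    return O                                      # added to (p5, p6) as in Eq. (3)
```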

2.2 Updating the starting template

When the condition $\|W(\mathbf{c};\mathbf{p}_n) - W(\mathbf{c};\mathbf{p}_n^*)\| \le \varepsilon$ is violated, we know that the current template $T_n$ has become appreciably different from the starting template $T_s$. The original drift-correcting strategy of [12] would treat this case by using $I_n(W(\mathbf{X};\mathbf{p}_n))$ as $T_{n+1}$, a template whose drift has not been corrected. If the difference between $T_n$ and $T_s$ persists over many frames, drift builds up just as in the naive strategy. This part deals with how to update $T_s$ when this happens.

Intuitively, the updated $T_s$ should satisfy two conditions. First, it should resemble $T_n$ in appearance. Second, the drift in the updated $T_s$ should be corrected by the first template $T_1$. Obviously, $I_n(W(\mathbf{X};\mathbf{p}_n))$ is a good candidate for the first condition. The key is then to correct the drift between $I_n(W(\mathbf{X};\mathbf{p}_n))$ and $T_1$. Generally speaking, the discrepancies between the two templates can be ascribed to intensity variations caused by illumination changes, partial occlusion, non-rigid deformation, or a combination of these factors. It would be difficult (if not impossible) for a method to account for all of them. Fortunately, the aim of this paper is to track a feature point, so we only need to correct the drift between the two respective feature point positions in $I_n(W(\mathbf{X};\mathbf{p}_n))$ and $T_1$. For this purpose, we extract two small patches from the warped template $T_1(\mathbf{p}_n)$ and from $I_n$, with the centers of the patches coincident with the center of $T_1(\mathbf{p}_n)$ and with $W(\mathbf{c};\mathbf{p}_n)$ respectively, where $\mathbf{c}$ is the center of template $T_1$. We then non-rigidly register the two patches. Using small patches instead of the two templates themselves has two advantages. First, non-rigid registration methods are usually time-consuming; small patches take less time to register than the full templates. Second, even if $I_n(W(\mathbf{X};\mathbf{p}_n))$ differs considerably from $T_1$ due to partial occlusion, the registration may still be carried out provided that the area occupied by the small patch in $I_n$ is not occluded. The size of the small patches is 12×12 pixels, as a balance between registration accuracy and computational efficiency.

Non-rigid image registration methods are mainly used in medical image registration ([15]-[17]). We choose a robust general-purpose algorithm [17] to register the patches. This algorithm not only corrects non-rigid deformations but also explicitly accounts for local and global variations in image intensities. A brief description of the method is given below; the interested reader is referred to [17]. Let the two patches of $T_1(\mathbf{p}_n)$ and $I_n$ be denoted $f(x, y, t)$ and $f(\hat{x}, \hat{y}, t-1)$, respectively⁴. $f(x, y, t)$ is the source image, to be registered to the target image $f(\hat{x}, \hat{y}, t-1)$. A registration model that accounts for the local motion between the patches as well as local intensity variations can be expressed as

$$m_7 f(x, y, t) + m_8 = f(m_1 x + m_2 y + m_5,\; m_3 x + m_4 y + m_6,\; t-1), \quad (4)$$

where $m_1 \sim m_6$ are local affine transformation parameters, and $m_7$ and $m_8$ are local parameters that embody a change in contrast and brightness, respectively. To estimate these parameters, a quadratic error function is minimized:

$$E(\mathbf{m}) = \sum_{(x,y)\in\Omega} \{[m_7 f(x, y, t) + m_8] - f(m_1 x + m_2 y + m_5,\; m_3 x + m_4 y + m_6,\; t-1)\}^2, \quad (5)$$

where $\mathbf{m} = [m_1, m_2, \ldots, m_6, m_7, m_8]^t$ and $\Omega$ denotes a small spatial neighborhood. Eq. (5) can be expanded using a first-order truncated Taylor series and rewritten as

$$E(\mathbf{m}) = \sum_{(x,y)\in\Omega} [k - \mathbf{b}^t \mathbf{m}]^2, \quad (6)$$

where $k = f_t - f + x f_x + y f_y$ and $\mathbf{b} = [x f_x, y f_x, x f_y, y f_y, f_x, f_y, -f, -1]^t$; here $f_x$, $f_y$ and $f_t$ are the spatial/temporal derivatives of $f$. Differentiating Eq. (6) with respect to $\mathbf{m}$ and setting the result to zero yields

$$\mathbf{m} = \Big[\sum_{(x,y)\in\Omega} \mathbf{b}\mathbf{b}^t\Big]^{-1} \Big[\sum_{(x,y)\in\Omega} \mathbf{b}k\Big]. \quad (7)$$

In fact, a smoothness constraint on $\mathbf{m}$ is introduced in [17], and $\mathbf{m}$ is solved iteratively as

$$\mathbf{m}^{(j+1)} = (\mathbf{b}\mathbf{b}^t + \mathbf{L})^{-1}(\mathbf{b}k + \mathbf{L}\bar{\mathbf{m}}^{(j)}), \quad (8)$$

where $\bar{\mathbf{m}}^{(j)}$ is the component-wise average of $\mathbf{m}^{(j)}$ over the small spatial neighborhood $\Omega$, and $\mathbf{L}$ is an 8×8 diagonal matrix with diagonal elements $\lambda_i$ ($i = 1 \sim 8$) and zeros off the diagonal. The initial estimate $\mathbf{m}^{(0)}$ is obtained from the closed-form solution of Eq. (7).

After the non-rigid registration of $f(x, y, t)$ and $f(\hat{x}, \hat{y}, t-1)$, we get a warped version $f^w(x, y, t)$ of $f(x, y, t)$. We then compute the normalized cross-correlation (NCC) between $f^w(x, y, t)$ and $f(\hat{x}, \hat{y}, t-1)$. If the result is larger than a threshold (0.7 in this paper), we regard the non-rigid registration as successful. The center $\mathbf{c} = [x_c, y_c]^t$ of $f(x, y, t)$ (which is also the center of $T_1(\mathbf{p}_n)$) is mapped to $(m_{c,1} x_c + m_{c,2} y_c + m_{c,5},\; m_{c,3} x_c + m_{c,4} y_c + m_{c,6})^t$ in $f(\hat{x}, \hat{y}, t-1)$, where $m_{c,1} \sim m_{c,6}$ are the first six parameters of $\mathbf{m}_c$, the local registration parameter vector for $\mathbf{c}$. The drift $[\Delta x', \Delta y']^t$ between the centers of the two patches can then be computed as

$$\begin{cases} \Delta x' = (m_{c,1} - 1) x_c + m_{c,2} y_c + m_{c,5} \\ \Delta y' = m_{c,3} x_c + (m_{c,4} - 1) y_c + m_{c,6} \end{cases} \quad (9)$$

A refined version $\mathbf{p}_n'$ of $\mathbf{p}_n$ is obtained using $[\Delta x', \Delta y']^t$ as $p_{n,i}' = p_{n,i}$ ($i = 1 \sim 4$), $p_{n,5}' = p_{n,5} + \Delta x'$, and $p_{n,6}' = p_{n,6} + \Delta y'$. Thereafter $I_n(W(\mathbf{X};\mathbf{p}_n'))$ is used as the new starting template $T_s$, and the tracking switches back to the first part. If the NCC between $f^w(x, y, t)$ and $f(\hat{x}, \hat{y}, t-1)$ is smaller than the predefined threshold, we view the non-rigid registration as unsuccessful. When this happens, we use $I_n(W(\mathbf{X};\mathbf{p}_n))$ as $T_{n+1}$ and try to update $T_s$ at the next frame.

⁴ The temporal parameter $t$ is introduced for consistency within a differential formulation [18].

Fig. 2 shows some representative updated starting templates from an experiment. The image sequence contains 2018 frames, each of size 640×480 pixels.
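A NumPy sketch of the parameter estimation of Eqs. (6)-(9) is given below. It simplifies $\mathbf{L}$ to $\lambda\mathbf{I}$ (a single smoothness weight) and leaves patch sampling and the final NCC test to the caller; it illustrates the formulation above and is not the implementation of [17].

```python
import numpy as np
from scipy.ndimage import uniform_filter

def estimate_m(f_src, f_tgt, lam=1.0, n_iter=10, win=3):
    """Per-pixel estimation of m = (m1..m8) by iterating Eq. (8).
    f_src is the patch from T1(p_n) (time t), f_tgt the patch from I_n (time t-1)."""
    f_src = np.asarray(f_src, float)
    f_tgt = np.asarray(f_tgt, float)
    h, w = f_src.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    fy, fx = np.gradient(0.5 * (f_src + f_tgt))   # spatial derivatives f_x, f_y
    ft = f_src - f_tgt                            # temporal derivative f_t
    f = f_src
    k = ft - f + xs * fx + ys * fy                # k of Eq. (6)
    b = np.stack([xs * fx, ys * fx, xs * fy, ys * fy,
                  fx, fy, -f, -np.ones_like(f)])  # b of Eq. (6), shape 8 x h x w
    # Per-pixel sums of b b^t and b k over the neighborhood Omega (win x win)
    S = win * win
    bbT = np.empty((h, w, 8, 8))
    bk = np.empty((h, w, 8))
    for i in range(8):
        bk[..., i] = uniform_filter(b[i] * k, size=win) * S
        for j in range(8):
            bbT[..., i, j] = uniform_filter(b[i] * b[j], size=win) * S
    ridge = 1e-6 * np.eye(8)                      # guards Eq. (7) in flat regions
    m = np.linalg.solve(bbT + ridge, bk[..., None])[..., 0]   # closed form, Eq. (7)
    L = lam * np.eye(8)                           # simplification: L = lambda * I
    for _ in range(n_iter):                       # smoothness iteration, Eq. (8)
        m_bar = np.stack([uniform_filter(m[..., i], size=win)
                          for i in range(8)], axis=-1)
        m = np.linalg.solve(bbT + L, (bk + lam * m_bar)[..., None])[..., 0]
    return m

def center_drift(m_c, xc, yc):
    """Eq. (9): drift of the patch center under its local parameters m_c."""
    dx = (m_c[0] - 1.0) * xc + m_c[1] * yc + m_c[4]
    dy = m_c[2] * xc + (m_c[3] - 1.0) * yc + m_c[5]
    return dx, dy
```

The warped patch $f^w$ for the NCC test can then be generated by resampling $f(x, y, t)$ under the per-pixel affine parameters; both the resampling and the NCC computation are standard and omitted here.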

[Figure 2: ten image patches, panels (a)-(j).]

Fig. 2 Some representative updated starting templates in an experiment. The feature point to be tracked is an eye corner of a toy cat model. (a) the initial starting template, extracted from frame 1; (b)-(j) updated starting templates at frames 534, 622, 707, 940, 1106, 1320, 1519, 1800 and 1979, respectively.

Comparing these starting templates, one can see that no pair of them can be registered by a rigid transformation. This reveals an interesting characteristic of our method: even though we assume a global parametric transformation model (the affine transformation in this paper) to describe the template deformation, the assumption does not prevent us from capturing the non-rigid deformations of the templates.

2.3 Extension to parallel processing

Our tracking method based on the proposed template update strategy runs fast on a desktop PC (3.0 GHz Pentium® IV CPU, 1 GB RAM, Windows® XP Professional, Microsoft Visual C++ 6.0®), taking about 0.03 seconds on average to track the feature point in a frame and, when required, less than 0.1 seconds to update the starting template. Thus the maximum time for processing a frame is the sum of these two periods. In practice the average speed of the method is still quite high (about 28 Hz on average in all our experiments) because updates of the starting template are infrequent. Although this speed is adequate for off-line tracking, it might not suffice for on-line tracking applications. We suggest two ways of overcoming this drawback. One is to implement the proposed method in special-purpose hardware, which could be much faster but also more expensive. The other is probably more practical: use two processing units to handle the two parts of the proposed strategy respectively and simultaneously. In this architecture, the first processing unit tracks the feature point through the frames, which can be done in real time. When the condition $\|W(\mathbf{c};\mathbf{p}_n) - W(\mathbf{c};\mathbf{p}_n^*)\| \le \varepsilon$ is violated, the second processing unit starts to update the starting template while the first processing unit keeps tracking the feature point with the drift-uncorrected templates $I_n(W(\mathbf{X};\mathbf{p}_n))$. Within a few frames the second processing unit finishes updating the starting template and informs the first processing unit that a new starting template is available. The first processing unit can then use the updated starting template for drift correction in subsequent frames. Some tracking accuracy may be lost in those few frames because the templates $I_n(W(\mathbf{X};\mathbf{p}_n))$ used for tracking are drift-uncorrected. Nevertheless, this sacrifice is acceptable, as the drift accumulated over a few frames is not prominent.
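The handshake between the two units can be prototyped with two threads and a shared, lock-protected slot for $T_s$. The sketch below (hypothetical names, not the authors' implementation) shows the update unit; the tracking unit simply reads the slot each frame and posts a request whenever the drift condition fails.

```python
import threading
import queue

class TemplateSlot:
    """Shared slot holding the most recent starting template T_s."""
    def __init__(self, T_s):
        self._lock = threading.Lock()
        self._T_s = T_s

    def get(self):
        with self._lock:
            return self._T_s

    def set(self, T_s):
        with self._lock:
            self._T_s = T_s

def update_unit(requests: queue.Queue, slot: TemplateSlot, update_starting_template):
    """Second processing unit: rebuild T_s in the background (section 2.2)."""
    while True:
        job = requests.get()                       # (I_n, p_n) posted by the tracking unit
        if job is None:                            # shutdown sentinel
            break
        new_T_s = update_starting_template(*job)   # non-rigid registration + NCC test
        if new_T_s is not None:                    # registration successful
            slot.set(new_T_s)                      # tracking unit picks this up next frame
```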

3. EXPERIMENTAL RESULTS

We implemented the method based on the proposed template update strategy in Microsoft Visual C++ 6.0® and tested it in many experiments on a desktop PC (3.0 GHz Pentium® IV CPU, 1 GB RAM, Windows® XP Professional). A feature point can be tracked up to a few frames before it disappears or becomes occluded. The size of the templates for the first part of the method is 61×61 pixels; the proposed method took 0.03 seconds on average to run this part. The size of the small spatial neighborhood $\Omega$ in the second part is 3×3 pixels; the method took 0.08 seconds on average to run this part. We used multithreaded programming to simulate the parallel processing architecture of section 2.3, implementing the two parts of the proposed template update strategy in two separate threads. Experimentally we have found that satisfactory results are achieved when $\varepsilon$ takes a value between 0.1 and 0.3.

We also implemented the original template update strategy of [12] and the robust drift-correcting template matching strategy of [13] for comparison. To compare the three methods on a fair basis, we also incorporated the sub-pixel refinement technique used in the first part of the proposed strategy into the latter two methods. We present the results of two examples in this section. In the first example we generated an image sequence in which the location of the feature point in each frame was recorded and used afterwards to measure the tracking accuracy of a given method. In the second example we recorded an image sequence with a hand-held Canon PowerShot G9® digital camera. As ground truth was not available, we used the commercial software REALVIZ MatchMover Pro 4.0® as a benchmark to measure the tracking accuracies of the three methods. REALVIZ MatchMover Pro 4.0® uses a SMART (Scalable Matching Architecture for Tracking) technology that yields reasonably accurate results.

3.1 Example 1: simulated image sequence tracking

In this example we generated a simulated image sequence containing 300 frames, each frame 400×400 pixels in size. First, we generated a square of 51×51 pixels [Fig. 3(a)]. The upper-left quarter of the square is a uniform 25×25-pixel region with gray level 50; the upper-right quarter is a uniform 25×25-pixel region with gray level 100; the bottom-left quarter is a uniform 25×25-pixel region with gray level 150; the bottom-right quarter is a uniform 25×25-pixel region with gray level 200; and the gray level of the one-pixel-thin lines in the square is 250. We overlaid the square on a black background (gray level 0), generating the first frame. We then smoothly enlarged, rotated and translated the square and introduced non-rigid deformations with thin-plate splines [18], overlaying the square on a black background to generate the subsequent frames. Finally, we added moderate Gaussian white noise to the frames; the average SNR of each noise-contaminated frame is 8.79 dB⁵. The generated image sequence is shown in Fig. 3(b)-(e). We also simulated partial occlusion between frames 128 and 211.
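A sketch of the synthetic-frame construction just described is given below. The four-quadrant square and the noise/SNR bookkeeping follow the text; the affine motion and the thin-plate-spline deformation [18] are left to the caller, since their exact schedules and control points are not specified here.

```python
import numpy as np

def make_square():
    """51x51 test square: four 25x25 uniform quarters split by one-pixel lines."""
    sq = np.zeros((51, 51), dtype=float)
    sq[:25, :25], sq[:25, 26:] = 50.0, 100.0    # upper-left, upper-right quarters
    sq[26:, :25], sq[26:, 26:] = 150.0, 200.0   # bottom-left, bottom-right quarters
    sq[25, :], sq[:, 25] = 250.0, 250.0         # one-pixel-thin dividing lines
    return sq

def add_noise(frame, sigma_n, rng=None):
    """Add Gaussian white noise and report the SNR of footnote 5 (in dB)."""
    if rng is None:
        rng = np.random.default_rng(0)
    snr_db = 20.0 * np.log10(frame.std() / sigma_n)
    return frame + rng.normal(0.0, sigma_n, frame.shape), snr_db
```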

[Figure 3: five image panels, (a)-(e).]

Fig. 3 The simulated object whose center is to be tracked, and the simulated image sequence for example 1. (a) the simulated object; (b) frame 1; (c) frame 100; (d) frame 169; (e) frame 300.

The tracking errors⁶ of the three methods are listed in Table 1, from which it can be seen that the proposed template update strategy outperformed the two previous template update strategies in tracking the feature point in this example. When implemented in parallel, the tracking performance of the proposed strategy deteriorated slightly, but still achieved a mean accuracy of 0.42 pixels.

⁵ The SNR for each frame is computed as $20\log_{10}(\sigma_f/\sigma_n)$, where $\sigma_f$ is the standard deviation of frame $f$ and $\sigma_n$ is the standard deviation of the added noise.
⁶ Assume that at a frame we obtain a feature point location $\mathbf{x}$ using a method. The tracking error of the given method for that frame is defined as the Euclidean distance between $\mathbf{x}$ and the feature point's true location.

Table 1 Tracking errors (in pixels) of different methods for example 1.

Method                                                                  Mean     Standard deviation   Max
original template update method of [12] + sub-pixel refinement [5]     1.51     1.07                 3.43
robust template update method of [13] + sub-pixel refinement [5]       (error reached 3 pixels at frame 86 and continued to increase as the tracking went on)
proposed template update method (ε = 0.2)                               0.381    0.235                1.01
proposed template update method (implemented in parallel)               0.420    0.273                2.06

3.2 Example 2: real image sequence tracking

In this example, we attached a chessboard patch to the keyboard of a telephone and placed a toy model partly overlaying the patch. The image sequence was recorded with a hand-held Canon PowerShot G9® digital camera and contains 2018 frames, each 640×480 pixels in size. The feature point to be tracked was the center of the patch in the first frame [Fig. 4(a)]. We moved the camera from side to side or up and down; in the course of the camera movements the patch could be partially occluded by the toy model, and illumination conditions would also deteriorate when partial occlusion happened. In addition, the toy model's movement relative to the patch could produce a non-rigid motion of the template.

Tracking results obtained with our method in this example are given in Fig. 4(b)-(d). The non-rigid registration algorithm was applied on 80 frames, accounting for 3.80% of the total. From frame 85 onwards, the condition $\|W(\mathbf{c};\mathbf{p}_n) - W(\mathbf{c};\mathbf{p}_n^*)\| \le \varepsilon$ ($\varepsilon = 0.15$ in this example) no longer held, and the current templates in the original template update strategy of [12] were no longer drift-corrected. As a result, the tracked feature point slowly drifted away from its true position. At frame 256 the tracking error was 2.08 pixels for the original template update strategy of [12], versus 0.30 pixels for our strategy at the same frame. The robust drift-correcting template matching strategy of [13] performed better than the original strategy of [12], but also failed, at frame 991, with a tracking error of 2.58 pixels; the tracking error of our strategy at that frame was 0.59 pixels. A comparison of the tracking accuracy of the three template update strategies is given in Fig. 4(e), from which it can be seen that our strategy performed best. The mean and standard deviation of the tracking errors of our method in this example were 0.46 and 0.25 pixels, respectively; the maximum tracking error was 1.06 pixels.

Fig. 4(f) gives the tracking errors of our template update strategy implemented in parallel. The mean and standard deviation of the tracking errors for the parallel implementation were 0.50 and 0.28 pixels, respectively; the maximum tracking error was 1.69 pixels. The explanation for the performance deterioration is that in some frames the current templates were drift-uncorrected while being used for tracking. In Table 2 we give the percentage of frames in which the tracking errors of our method were within a given bound. It can be seen from Table 2 that the tracking errors were no larger than 0.5 pixels for about 53.02 percent of all frames (51.59 percent for parallel processing), and less than 1.0 pixel for about 99.50 percent (94.25 percent for parallel processing).


[Figure 4: tracking results for example 2. Panels (a)-(d) show frames with the tracked point marked; panels (e) and (f) plot tracking error (in pixels) against frame number (0-2000) for the original template update strategy [12], the robust template update strategy [13], our template update strategy, and our template update strategy with parallel processing.]

Fig. 4 Tracking results in the second example. (b)-(d) are the tracking results of our method; the small images to the left of the frames show enlarged parts of the frames, with the locations of the feature point denoted by white crosses. (a) frame 1; (b) frame 745, tracking error 0.71 pixels; (c) frame 1066, tracking error 0.61 pixels; (d) frame 1809, tracking error 0.18 pixels; (e) comparison of the tracking accuracy of the three methods; (f) tracking errors of the proposed template update strategy implemented with parallel processing.

Table 2 Percentage of frames in which the tracking errors of our method were within a given bound in example 2.

Method                                                       Tracking error (in pixels)
                                                             ≤0.2    ≤0.4    ≤0.5     ≤0.6    ≤0.8
proposed template update method (ε = 0.15)                    …       …      53.02%    …       …
proposed template update method (implemented in parallel)     …       …      51.59%    …       …

4. CONCLUSION AND FUTURE WORK