IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 14, NO. 8, AUGUST 2005
Fast Robust Correlation

Alistair J. Fitch, Member, IEEE, Alexander Kadyrov, William J. Christmas, and Josef Kittler, Member, IEEE
Abstract—A new, fast, statistically robust, exhaustive, translational image-matching technique is presented: fast robust correlation. Existing methods are either slow or non-robust, or rely on optimization. Fast robust correlation works by expressing a robust matching surface as a series of correlations. Speed is obtained by computing correlations in the frequency domain. Computational cost is analyzed and the method is shown to be fast. Speed is comparable to conventional correlation and, for large images, thousands of times faster than direct robust matching. Three experiments demonstrate the advantage of the technique over standard correlation.

Index Terms—Correlation, fast algorithms, image registration, robust statistics.
I. INTRODUCTION
Correlation is a signal-matching technique. It is a key component of radar, sonar, digital communications, and many other systems. Mathematically similar to convolution, correlation is computationally expensive [1]. The first practical correlation methods were optical. Optical correlation was developed in the late 1950s and early 1960s [2]. Digital correlation took off with the advent of the fast Fourier transform (FFT), developed in the mid 1960s [3]. In this paper, we address the application of digital correlation to image registration.

Image registration, or matching, is the process of aligning two or more images [4]. The topic has a wide range of applications, including super-resolution, face detection, video coding, medical imaging, database classification, and mosaicking.

In selecting a suitable image registration method, one must consider the nature of the transformation aligning the images [5]. Transformations can be parametric, e.g., translational, isometric, similarity, affine, projective, or polynomial, or non-parametric, e.g., elastic deformations or thin-plate splines. Correlation finds parametric translation transformations. This is a prominent transformation in many of the above applications. Applications which require a more complex transformation may still use a translational model as the first stage of the estimation process [6], [7].

Two matching methodologies are prevalent in the literature: area-based methods, also known as direct methods, and feature-based methods. Area-based methods match measurable image quantities, e.g., brightness [8], absolute gradient [9], or phase [10], [11]. Feature-based methods match features extracted from the images, e.g., corners [12] or junctions [6]. Correlation is an area-based method. Correlation, and the fast robust correlation method presented in this paper, can match normalized images, gradient images, or phase images.

With the transformation selected and the method of evaluating a transformation defined, image registration becomes an optimization task to find the best transformation parameters. In the case of a translational transformation, correlation has a unique ability to accomplish an exhaustive search quickly. The FFT allows fast correlation of digital signals. By the end of the 1960s, correlation through FFTs was applied to image registration [13].

Correlation is not robust. In fields such as computer vision and image processing, robustness is important. For example, consider matching images Fig. 1(a) and (b), the first and fifth frames of a video sequence. Over 90% of each frame is background. The background of the two frames is aligned with a (0, 6) translation. Matching the two frames with correlation returns a translation of (0, 0), matching neither the background nor the objects in the scene. Closer examination reveals a four-pixel black border on the right of the frames. The black lines represent less than 1.2% of the pixels in each of the frames. Correlation is weighted in such a way as to consider the alignment of these black lines more important than other potential matches.

At the heart of correlation is a squared-error kernel. This is why correlation is not robust. A variety of improvements to correlation have been proposed over the years [10], [14], [15]. These have improved correlation results and addressed issues such as illumination invariance. Comparative studies have been made [16], [17]. While some of these methods claim to be robust, all are based on the squared-error kernel; they are not statistically robust. Huber defines robust statistics as "…insensitivity to small deviations from the assumptions" [18].

Manuscript received April 26, 2002; revised August 28, 2003. This work was supported by EPSRC projects GR/M74658 and GR/M88600/01. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Patrick Perez.

The authors are with the Centre for Vision, Speech, and Signal Processing, University of Surrey, Surrey GU2 7XH, U.K. (e-mail: [email protected]; [email protected]; [email protected]; [email protected]).

Digital Object Identifier 10.1109/TIP.2005.849767
Note that "small" may imply small deviations for all data (e.g., Gaussian noise) or large deviations for a small quantity of data (outliers). In relation to image registration, robustness implies correct registration in the presence of noise, occlusion, revealed regions, new objects, and highlights: in general, any effect that may cause deviation from a perfect match. Registration based on a squared-error kernel has optimal performance for normally distributed errors (Gaussian noise). However, as shown with the example in Fig. 1, a squared-error kernel is unable to handle serious mismatches at the correct registration (outliers).

1057-7149/$20.00 © 2005 IEEE

Fig. 1. Coastguard frames. (a) First. (b) Fifth.

Robust statistics have been applied to image registration [19], and to the related fields of motion estimation [20], [21] and optic flow [22], [23]. The median, used in [22], is robust but computationally expensive. The most prevalent robust method in the literature is the solution of M-estimators with iteratively reweighted least squares [19], [21], [23]. None of these methods are both exhaustive and robust. Redescending M-estimators reject extreme outliers entirely [18]; thus, a small complete mismatch at the correct registration will not affect a correct match. The weakness of M-estimators is a relatively low breakdown point [22]; good initial estimates are required for success [24].

Exhaustive robust matching provides the best of both worlds: a robust kernel (as used in M-estimators) and an exhaustive search (negating the requirement for a good initial estimate). However, in doing this, one cannot use FFTs to compute correlation quickly. The main contribution of this paper is to present a method of computing robust correlation fast: fast robust correlation. As with conventional correlation, our method utilizes the speed of the FFT.

Recently, Chen et al. advocated the necessity of developing fast algorithms and proposed a fast implementation of robust template matching [25]. Their method makes use of a set of image pyramids they refer to as a p-pyramid. With the reasonable assumption of non-decreasing kernel functions, they present a fundamental inequality. The inequality is used to set bounds limiting the search in the p-pyramid while still guaranteeing the optimal solution. Experimental results present the gain in computation time over full search. The best results show the gain to be just over a factor of 10. However, this is with no outliers in the data, in which case non-robust methods would be adequate. As outliers are added to the data, the computational cost increases and, in some cases, there is no saving in computational cost. Compared with fast robust correlation, Chen et al.'s method can use standard M-estimator kernels but has a non-deterministic computational cost.

The remainder of this paper is arranged as follows. Fast robust correlation is detailed in Section II.
Section III makes a theoretical comparison of the computational cost of fast robust correlation against: 1) correlation and 2) directly computing an exhaustive robust match. Fast robust correlation is shown to be computationally similar to correlation and, for large images, many orders of magnitude faster than direct robust matching.
Section IV experimentally demonstrates the advantage of fast robust correlation over correlation. Advantage is shown in three experiments: block motion estimation for video coding, video frame registration, and tolerance of rotation and zoom. Conclusions are drawn in Section V.

II. METHOD

We wish to match images I_1 and I_2. The extent of the images is defined by alpha masks^1 \alpha_1 and \alpha_2. Alpha masks allow the registration of non-rectangular objects. Image pixels are indexed using (x, y), where x and y are integers.

To represent the quality of a match at every translational position, we use a matching surface. A matching surface is created by shifting one image with respect to the other and measuring the difference between the images at each position. We are interested in a pixel-precision matching surface and, thus, shift by (j, k), where j and k are integers. At a given shift, the difference between I_1 and I_2 is measured at each pixel using a kernel \rho(r), where r is the pixel difference. The result of the kernel function is appropriately multiplied by the alpha masks. Thus, in general, the expression for a matching surface is

D(j,k) = \sum_{x,y} \alpha_1(x,y)\,\alpha_2(x-j,y-k)\,\rho(I_1(x,y) - I_2(x-j,y-k)).   (1)

The two-dimensional (2-D) translational vector aligning I_1 and I_2 is found from the position of the matching surface minimum. The meaning of alpha masks in the context of image registration shows itself in (1). Note that the alpha masks are applied to the output of the kernel function, not individually to the images.

The remainder of this section is arranged as follows. First, the relation between correlation and squared-error kernel matching is shown. Then, our new method, fast robust correlation, is presented.

^1Also known as alpha mattes or alpha channels, alpha masks associate variable transparency with an image. Image formats such as PNG and TIFF include alpha masks. For examples, see http://www.libpng.org/pub/png/.
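To make the matching-surface definition concrete, the following minimal NumPy sketch evaluates it by brute force; the function and argument names are our own illustrative choices, and cyclic shifts are assumed so every pixel pair is defined:

```python
import numpy as np

def matching_surface_direct(i1, a1, i2, a2, shifts, rho):
    # D(j, k) = sum_{x,y} a1(x,y) a2(x-j,y-k) rho(i1(x,y) - i2(x-j,y-k)),
    # evaluated directly at each candidate shift (cyclic wrapping).
    surface = {}
    for (j, k) in shifts:
        i2s = np.roll(i2, (j, k), axis=(0, 1))  # i2s(x, y) = i2(x-j, y-k)
        a2s = np.roll(a2, (j, k), axis=(0, 1))
        surface[(j, k)] = np.sum(a1 * a2s * rho(i1 - i2s))
    return surface
```

The aligning translation is the position of the surface minimum; with a squared `rho` this reproduces squared-kernel matching, and Section II-B replaces `rho` with the robust sinusoidal kernel.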
Fig. 2. Standard and cyclic matching. (a) Standard matching. (b) Cyclic matching.
A. Correlation

Substituting the squared matching kernel, \rho(r) = r^2, in (1), we have the squared kernel-matching surface

D_2(j,k) = \sum_{x,y} \alpha_1(x,y)\,\alpha_2(x-j,y-k)\,(I_1(x,y) - I_2(x-j,y-k))^2.   (2)

Equation (2) can be expressed as a series of correlations

D_2 = (\alpha_1 I_1^2) \otimes \alpha_2 - 2(\alpha_1 I_1) \otimes (\alpha_2 I_2) + \alpha_1 \otimes (\alpha_2 I_2^2)   (3)

where the symbol \otimes stands for correlation

(f \otimes g)(j,k) = \sum_{x,y} f(x,y)\, g^*(x-j,y-k)

where * denotes complex conjugate. Correlation can be computed as the inverse Fourier transform

f \otimes g = \mathcal{F}^{-1}(F G^*)   (4)

where F and G are the Fourier transforms of equal-sized f and g [1]. If FFTs are used, (4) is cyclic correlation. Cyclic matching/correlation is illustrated in Fig. 2. Cyclic matching results from using the FFT algorithm; images are shifted on a torus. The effect can be removed through zero padding but, in most cases, zero padding is not required. Most often, when correlation is referred to in the context of matching, cyclic correlation is implied.

Under the condition of a cyclic match of equal-sized I_1 and I_2 and unity alpha masks, the first and third terms of (3) are constant. The remaining second term is correlation with a coefficient of -2. Thus, we can say that correlation will give the same matching result as a cyclic match with a squared kernel.

B. Fast Robust Correlation

The squared kernel in (2) is sensitive to outliers. To achieve robustness, the contribution of the kernel function \rho to the criterion (1) should have a negligible effect for large matching errors. Examples of such kernels can be found in [18] and [20]. In our method, we approximate an ideal kernel \hat\rho with a limited number of sinusoidal terms

\rho(r) = \sum_{\mu=1}^{K} c_\mu\,(1 - \cos(\omega_\mu r)).   (5)

Such a function is well suited to approximating kernels; zero maps to zero, \rho(0) = 0, and the function is even, \rho(r) = \rho(-r). This particular function is chosen because the matching surface of such a function can be computed with a series of correlations. As justification for our chosen kernel, consider harmonically spaced \omega_\mu; in this case, (5) is a Fourier cosine series. Fourier's theorem states that any continuous function can be described with a series of sinusoids. Thus, we postulate that a finite number of sinusoids can give a usable approximation of a desired kernel. Since we are interested in even kernel functions, the odd sine terms are unnecessary. The advantage of using freely chosen \omega_\mu is shown in Section IV; fewer terms are required to create useful kernels.

Each term in (5) has the same underlying function as the M-estimate proposed by Andrews [18]. Andrews defined an influence function (derivative of kernel) as

\psi(r) = \sin(r/a) for |r| \le a\pi, and 0 elsewhere.

Since image pixel values normally have a fixed range (e.g., [0 : 255] for byte pixels), it is possible for our kernel to be
equivalent to Andrews'. In this paper, it is assumed that pixel values are within, or have been normalized to, the range [0 : 1]. Selection of kernel parameters is shown in Section IV. Section IV contains plots of example kernels (see Fig. 4) and example distributions of pixel differences (see Fig. 5).

Substituting (5) in (1) gives an equation for our robust kernel-matching surface

D_\rho(j,k) = \sum_{\mu=1}^{K} c_\mu \sum_{x,y} \alpha_1(x,y)\,\alpha_2(x-j,y-k)\,(1 - \cos(\omega_\mu(I_1(x,y) - I_2(x-j,y-k)))).   (6)

This equation is expanded into a series of correlations using trigonometric functions

D_\rho = \sum_{\mu=1}^{K} c_\mu\,[\alpha_1 \otimes \alpha_2 - (\alpha_1 \cos(\omega_\mu I_1)) \otimes (\alpha_2 \cos(\omega_\mu I_2)) - (\alpha_1 \sin(\omega_\mu I_1)) \otimes (\alpha_2 \sin(\omega_\mu I_2))]   (7)

or exponential functions

D_\rho = \sum_{\mu=1}^{K} c_\mu\,[\alpha_1 \otimes \alpha_2 - \mathrm{Re}\{(\alpha_1 e^{i\omega_\mu I_1}) \otimes (\alpha_2 e^{i\omega_\mu I_2})\}].   (8)

Implementing the correlations in (7) and (8) using FFTs allows translational matching of images that is fast, exhaustive, and robust. Fig. 3 shows the flow diagram of a possible implementation of (8). Further discussion relevant to the implementation of both correlation and fast robust correlation is located in the appendices.

Fig. 3. Flow diagram of a possible implementation of fast robust correlation.

III. COMPUTATIONAL COST

Two computational cost comparisons are made. First, fast robust correlation is compared with correlation in terms of the number of FFTs required. Second, fast robust correlation is compared with equivalent data-domain processing in terms of the number of low-level operations. Due to the wide availability of efficient FFTs, it has been assumed that the correlation will be computed with FFTs. Note that other, theoretically faster, ways of computing correlation are available [26], [27].

A. Versus Correlation

The case of matching two equal-sized real images with unity alpha masks is considered. No zero padding is applied; the match is cyclic. Correlation requires three (two forward and one backward) real FFTs. Since the second correlation in (8) is complex, the trigonometric (7) and exponential (8) forms of fast robust correlation are computationally equivalent. To count the number of real FFTs required for fast robust correlation, the trigonometric form (7) is used. Since the alpha masks are unity, their FFT and correlation are trivial. In this case, computing fast robust correlation requires two correlations for each of the K terms. Using the Fourier linearity property [1], fast robust correlation is computed with 4K + 1 (4K forward and one backward) real FFTs. In Section IV, we demonstrate an effective kernel using K = 1. In this case, fast robust correlation requires five real FFTs, compared to correlation's three.

In a parallel or multiprocessor system, both correlation and fast robust correlation take the same time. With correlation, the two forward FFTs can be computed simultaneously. In fast robust correlation, all forward FFTs, regardless of the number of kernel terms, can be computed simultaneously. Both correlation and fast robust correlation will take two FFT cycles to complete.
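As a sketch of how the exponential form (8) maps onto FFTs, the following minimal NumPy implementation computes the robust surface with one complex correlation per kernel term; the function names and the cyclic, equal-size setup are illustrative assumptions rather than the paper's reference implementation:

```python
import numpy as np

def cyclic_correlate(f, g):
    # Cyclic cross-correlation via FFT: the inverse transform of F * conj(G);
    # works for complex inputs, as needed by the exponential form.
    return np.fft.ifft2(np.fft.fft2(f) * np.conj(np.fft.fft2(g)))

def fast_robust_surface(i1, a1, i2, a2, coeffs, freqs):
    # Robust matching surface for rho(r) = sum_mu c_mu (1 - cos(w_mu r)):
    # each term needs the mask correlation plus one complex correlation.
    mask_corr = np.real(cyclic_correlate(a1, a2))
    surface = np.zeros_like(mask_corr)
    for c, w in zip(coeffs, freqs):
        term = cyclic_correlate(a1 * np.exp(1j * w * i1),
                                a2 * np.exp(1j * w * i2))
        surface += c * (mask_corr - np.real(term))
    return surface
```

With the single-term kernel of Section IV (c = 1/2, omega = 5.35), the surface minimum recovers a translation even when part of one image is corrupted, which is exactly where the squared kernel fails.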
TABLE I COMPARING NUMBER OF OPERATIONS FOR MATCHING SURFACE GENERATION
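The growth rates compared in Table I can be reproduced with a short script. The operation counts below are assumptions matching this section's analysis: N^4 kernel evaluations for the direct method on N x N images, and, for fast robust correlation, 4K real FFTs of size 2N costing about 2(2N)^2 log2(2N) real multiplications each:

```python
import math

def direct_ops(n):
    # one kernel evaluation per overlapping pixel pair, summed over all shifts
    return n ** 4

def frc_ops(n, k=1):
    # assumed count: 4k real FFTs of size 2n x 2n,
    # each costing about 2 * (2n)^2 * log2(2n) real multiplications
    return 4 * k * 2 * (2 * n) ** 2 * math.log2(2 * n)

for n in (64, 256, 1024):
    print(f"N={n:5d}  direct={direct_ops(n):.2e}  "
          f"frc={frc_ops(n):.2e}  ratio={direct_ops(n) / frc_ops(n):.0f}")
```

Under these counts the advantage of fast robust correlation grows rapidly with image size, reaching roughly three orders of magnitude for 1024 x 1024 images.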
B. Versus Direct Method

Here, fast robust correlation is compared with directly computing an exhaustive robust translational match (the standard way of computing an equivalent robust match). The computational cost for each method to produce a matching surface is evaluated. The only difference in the resulting matching surfaces will be the fast robust correlation's approximation of the kernel. The match is to be non-cyclic, as is appropriate for direct matching. We consider matching two real images with unity alpha masks: image I_1 of size N_1 by N_1 and image I_2 of size N_2 by N_2.

To assess the computational cost of direct robust matching, we count the number of pixel evaluations of the kernel function required to generate a matching surface. Summing the arithmetic series of the number of overlapping pixels at each match position, we have N_1^2 N_2^2 operations.

In fast robust correlation, the majority of the computational cost is in the FFTs. For an equivalent measure to the number of direct-matching operations, we count the number of real multiplications required for the FFTs. For N = 2^p, where p is a natural number, the FFT of a complex sequence requires (N/2) log_2 N complex multiplications [28]. If the size of the FFT has small prime factors, the FFT will be fast, i.e., close to (N/2) log_2 N multiplications (see [29] or Appendix A). For a 2-D sequence of size M by M, we have M^2 log_2 M complex multiplications, i.e., 4 M^2 log_2 M real multiplications.

The correlation of zero-padded unity alpha masks is a 2-D ramp function; thus, the first correlation in (7) is trivial. With unity alpha masks, fast robust correlation requires 4K + 1 Fourier transforms. For a non-cyclic match, full zero padding is required. All Fourier transforms are then of size N_1 + N_2 by N_1 + N_2; for simplicity, we consider a size of 2N by 2N, with N = max(N_1, N_2). Since the images I_1 and I_2 are real, the number of multiplications can be halved [29]. Because of the zero padding, one quarter of the computation of the forward FFTs does not require processing; this computational saving is equivalent to removing one of the forward FFTs. From the above, we form an expression for the number of real multiplications for fast robust correlation

C_{FRC} = 4K \cdot 2(2N)^2 \log_2(2N) = 32 K N^2 (\log_2 N + 1).   (9)

To compare the number of operations required for direct robust matching and fast robust correlation, images of size N by N are considered. In this case, matching using the direct method requires N^4 operations, and fast robust correlation requires 32 K N^2 (\log_2 N + 1) operations. As image size increases, direct-matching operations will increase as a function of N^4, while fast robust correlation operations will increase as a function of N^2 \log N. Actual values are compared in Table I. In comparing the number of operations for direct robust matching and fast robust correlation, we observe the following.

• There is no benefit in using fast robust correlation if one, or both, of the images are small.
• The benefit of using fast robust correlation rapidly increases with image size.
• For large images, using fast robust correlation dramatically reduces the computational cost, e.g., by a factor of roughly 3000 for N = 1024, K = 1.

We now comment on the ease of implementing fast robust correlation. The majority of the computational cost is in the Fourier transforms. Computing Fourier transforms quickly has been the task of much work over the last 50 years. Off-the-shelf code [30], DSP chips with custom algorithms, and even full-custom microchips based on [3] are readily available. By utilizing the FFT, we are able to make use of the existing body of work aimed at speeding up the discrete Fourier transform.

IV. EXPERIMENTS

The experiments section is arranged as follows. First, kernel coefficient selection is addressed. Following this, three experiments demonstrating the advantages of fast robust correlation (FRCorr) over correlation (Corr) are presented: video coding, video frame registration, and tolerance of rotation and zoom. The section finishes with a discussion. As appropriate for comparison with correlation, cyclic fast robust correlation is used in each experiment.

A. Kernel Tuning

To use FRCorr, kernel coefficients must be selected; two approaches are discussed here. The second approach is used to select a single-term (K = 1) kernel for the experiments. Since robust kernels are dependent on the width of the error distribution, a method for adjusting the kernel to a new distribution is given. Note that this adjustment is not used in the experiments; this is because each experiment differences the same quantity (pixel intensity) and, thus, has similar error distributions.

Our first approach to kernel tuning is to approximate known kernels.
Given a desired robust kernel \hat\rho(r) and a number of kernel terms K, we fit our robust kernel by minimizing the square of the area between the two curves

\min_{c_\mu, \omega_\mu} \int_0^1 (\hat\rho(r) - \rho(r))^2\, dr.   (10)
Fig. 4. Example kernels. Squared kernel is shown for comparison.
Alternatively, an influence function (derivative of kernel) [18] can be approximated

\min_{c_\mu, \omega_\mu} \int_0^1 (\hat\psi(r) - \rho'(r))^2\, dr.   (11)

The kernel (5) is linear in the coefficients c_\mu so, for given K and \omega_\mu, one can find the c_\mu using least squares. One can minimize error measures other than (10) and (11), or choose coefficients by trial and error. Using such methods, the kernels shown in Fig. 4 can be produced. In all kernels, it is assumed that differences are scaled to [−1 : 1]. Since kernels are symmetric functions, each kernel is shown over the range [0 : 1].

A second approach to kernel tuning is to search for coefficients on a sample of a given task. We have found this to be a more fruitful approach. Such an approach requires a task with a defined and measurable goal. We use block motion estimation for video coding; the goal is to minimize prediction error [31]. Details of the block motion estimation are given in Section IV-B. Since the scale of \rho is arbitrary, we can impose a normalization constraint on the coefficients c_\mu. Thus, for a given K, we need to search a reduced parameter space to determine the coefficients. Ideally, the global minimum of the space is then found. We conducted a search by sampling the space and searching for a minimum. The sampling interval was then reduced and the process repeated. As it sufficiently illustrates the point, only the kernel for K = 1 is presented.

To tune a kernel for general images, but not a specific sequence, the first pair of adjacent frames from three standard test sequences are used. The resulting kernel, (1/2)(1 − cos(5.35 r)), is shown in Fig. 5. To put the kernel into context, it is shown with normalized histograms of cyclic-match pixel differences, i.e., the pixel differences applied to the kernel. Fig. 5(a) shows the kernel with the histogram of absolute pixel differences at the best (full search) motion vector positions. Fig. 5(b) shows the kernel with the histogram of all differences. These are typical image-matching error distributions. Note: 1) at the best matches, there are many non-zero differences; 2) the majority of all differences are relatively near zero.

An artifact of the kernel in Fig. 5 is minima at differences other than zero. This contrasts with typical robust kernels found in [18], [20]. Non-zero minima are particularly problematic for optimization techniques. In optimization, non-zero differences could be minimized, resulting in an increased likelihood of becoming stuck in a local minimum (the downfall of many optimization methods). As will be demonstrated by the following experiments, the non-zero minima are not a problem for fast robust correlation. We hypothesize that this is due to the exhaustive search. The kernel is only required to pick the global minimum, not direct a search. The advantage of such a kernel is that it rises quickly. As the histograms in Fig. 5 show, this is necessary for an effective image-matching kernel.

In tuning a single-term kernel, only the kernel width needs to be selected. Setting the kernel width must be addressed when using any robust kernel. If images with significantly different error distributions are to be matched, the width of the robust kernel must be adjusted. If preprocessing is applied to the images, e.g., gradient or phase methods, the width of a robust kernel must be reset. We now consider adjusting the kernel shown in Fig. 5 to other error distributions. For K = 1, there is only one unknown parameter (\omega_1), which determines the width of the kernel. Since we wish the kernel width to scale with the distribution, we can determine \omega_1 from the width of the distribution. We use the median of absolute deviations (MAD) to determine this scale, as it is considered [18] to be the most useful estimate of scale, i.e.,

MAD = med |r − med r|.

For the distribution in Fig. 5(b), MAD was computed to be 0.0471, with \omega_1 = 5.35. Hence, for other distributions, we can compute \omega_1 as

\omega_1 = 5.35 \times \frac{0.0471}{\mathrm{MAD}} = \frac{0.252}{\mathrm{MAD}}.   (12)
Thus, given a task with a different error distribution, we propose to measure the median absolute deviation of the distribution and use (12) to set the width of the kernel. Note that the same kernel (K = 1) is used in all experiments presented in this paper. We would expect the kernel to perform well on the video coding experiment, as the kernel has been tuned on a sample of this experiment. Since the other experiments also use intensity differences, we expect the error distributions to be similar and the same kernel to perform well.

Fig. 5. Kernel (1/2)(1 − cos(5.35 r)) with normalized histogram of (a) differences at correct matches and (b) all differences.

TABLE II COMPUTATIONAL COST OF MOTION ESTIMATION PER BLOCK

TABLE III MEAN ABSOLUTE PREDICTION ERROR (MAE)

B. Video Coding

FRCorr is first compared to Corr in the application of block motion estimation for video coding. Experiments are undertaken on the first 150 frames of three Cif-size (288 × 352) video sequences, using standard MPEG-size (16 × 16) blocks. A well-defined goal, minimizing the mean absolute prediction error (MAE), makes for a good comparative test

MAE_n = mean |f_n − \hat{f}_n|

where \hat{f}_n is the motion-compensated reconstruction of frame f_n. \hat{f}_n is constructed from the immediately preceding frame, f_{n−1}, and the motion vectors generated from frames f_{n−1} and f_n. The defined goal also defines an optimal method: full search with absolute differences (FS). The FS method gives optimal results for the MAE criterion, but is avoided in many applications because of its computational cost. Comparison of Corr and FRCorr is made in relation to the baseline FS method. In all methods, motion vectors are limited to [−8 : 7] with integer pixel accuracy.

Corr and FRCorr are applied to block motion estimation for video coding by correlating temporally adjacent blocks. Correlating sequential blocks is efficient; compared to non-sequential matching, the overall number of forward FFTs can be halved. However, a relatively small block size does not suit correlation techniques; the benefits of the FFT are greater for larger images. Counting FFT multiplications for Corr and FRCorr and pixel operations for FS, the computational cost is shown in Table II. Note that correlating adjacent blocks restricts performance; ideally, a block is matched within a larger region of the previous frame (not just the adjacent block). FS is not restricted in this way; this is one reason for the drop in performance of the correlation methods. The correlation methods could be applied to match blocks to regions, but the increased computational cost would make such implementations inefficient.

The mean absolute prediction error (MAE) for the first 150 frames of the bus, coastguard, and foreman sequences is shown in Table III. On all three sequences, FRCorr outperforms Corr. Compared to Corr, FRCorr is significantly closer to the optimal performance of FS.
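The per-block procedure can be sketched as follows. This is an illustrative NumPy version with made-up function names: it matches temporally adjacent blocks cyclically with unity masks and the single-term kernel, and the cyclic surface indices map naturally onto signed vectors in [−8 : 7] for 16 × 16 blocks:

```python
import numpy as np

def block_motion(frame_prev, frame_cur, block=16, coeff=0.5, omega=5.35):
    # Per-block motion estimation by cyclic fast robust correlation of
    # temporally adjacent blocks (vectors wrap on the block torus).
    h, w = frame_cur.shape
    vectors = {}
    for by in range(0, h, block):
        for bx in range(0, w, block):
            b1 = frame_cur[by:by + block, bx:bx + block]
            b2 = frame_prev[by:by + block, bx:bx + block]
            e1 = np.exp(1j * omega * b1)
            e2 = np.exp(1j * omega * b2)
            corr = np.fft.ifft2(np.fft.fft2(e1) * np.conj(np.fft.fft2(e2)))
            # robust surface with unity masks: mask correlation is constant
            D = coeff * (b1.size - np.real(corr))
            u, v = np.unravel_index(np.argmin(D), D.shape)
            # map cyclic indices to signed vectors in [-block/2 : block/2 - 1]
            vectors[(by, bx)] = ((u + block // 2) % block - block // 2,
                                 (v + block // 2) % block - block // 2)
    return vectors
```

A real coder would follow this with motion compensation and the MAE measurement described above; the sketch stops at the vector field.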
C. Video Frame Registration

Our second experiment demonstrates FRCorr overcoming the failings of Corr shown by the example in the introduction. Coastguard is a 300-frame Cif video sequence. The camera tracks a boat in a pan left, then tilts to track a second boat moving in the opposite direction. The background is at sufficient depth for a translational match to be a good approximation of the motion in the scene. As shown in the introduction, taking the first and fifth frames of coastguard, Corr matches neither the background nor the boat. Using the kernel shown in Fig. 5, FRCorr matches the frames such that the background is correctly aligned with a (0, 6) translation.

As a more general experiment, the first frame of the coastguard sequence is matched with the nth frame. Using the measured translation, the first and nth frames are mosaicked. The mosaic is then placed in the nth position of a video of mosaics. Playing the video of mosaics allows visual interpretation of the matching. With Corr, the only meaningful match is between the first and second frames. FRCorr correctly matches the background up to the 226th frame. The mosaic of the first and 226th frames, as found by FRCorr, is shown in Fig. 6.

The matching surfaces from matching the first and 80th coastguard frames are shown in Fig. 7. Surfaces are inverted to aid visualization. The frames are correctly aligned by FRCorr with a (−51, −10) translation. Both surfaces have a ridge at zero column shift caused by the line blanking. The Corr matching surface has a global minimum at (−10, 0); no minima are evident near the correct shift. FRCorr has a clear, sharp, global minimum at the correct shift.
Fig. 6. Mosaic of first and 226th frames of coastguard sequence using FRCorr.

Fig. 7. Inverted matching surfaces, matching first and 80th coastguard frames. The background is matched with a (−51, −10) translation. (a) Correlation. (b) Fast robust correlation.
D. Tolerance of Rotation and Zoom

Experiments measuring the tolerance to rotation and zoom of Corr and FRCorr are presented. Each experiment uses a large image and extracts subimages offset by a shift of (40, 50) pixels. The subimages are deformed and then cropped to Cif size (288 × 352). This process avoids undefined areas in the cropped images. The correct match of undeformed, cropped subimages is shown in Fig. 8. With no deformation, both methods correctly find the match.

The rotation experiment counter-rotates each subimage by half the rotation amount. In this fashion, both subimages have undergone equivalent levels of resampling. The zoom experiment reduces the size of one subimage by the zoom factor. The zoom is equivalent to a scale change centered on the image center. Deformed image pairs are matched, and the matching surface maximum or minimum, as appropriate, is found. This gives a measured translation. As an error measure, we use the Euclidean distance between the true and measured translation of the center of the image. Graphs showing the matching error
against (a) rotation and (b) zoom are shown in Fig. 9. Again, FRCorr uses the kernel shown in Fig. 5. From Fig. 9, we observe that FRCorr outperforms Corr in terms of tolerance to both rotation and zoom. FRCorr matches images rotated by 12° to within 15 pixels, and images differing by a scale factor of 1.3 to within 16 pixels. As deformation increases, FRCorr degrades gracefully.

E. Discussion

The sole difference between Corr and FRCorr is the error kernel. Cyclic matching involves wrapping, which is likely to cause outliers. Each of the experiments adds further outliers: video coding and video frame registration contain multiple motions; the coastguard sequence also contains line blanking; and rotation and zoom cause outliers for a translational match. In each case, the squared kernel of Corr will give significant weight to such outlying errors. The robust kernel of FRCorr will not.
Fig. 8. Correct match in rotation and zoom experiment, no rotation or zoom.

Fig. 9. Robustness to rotation and zoom. (a) Rotation. (b) Zoom.
V. CONCLUSION This paper has presented fast robust correlation. Fast robust correlation is a fast, robust, exhaustive, translational imagematching technique. The method is comparable to correlation, which is not robust, and exhaustive translational robust matching, which, until now, has not been computationally feasible. The method is derived by expressing a robust matching surface in terms of correlations. The speed of the method comes from using fast Fourier transforms to compute the correlations.
Theoretical computational cost comparisons have been made. In a parallel or multiprocessor system, correlation and fast robust correlation will take the same time to complete. On a serial system, fast robust correlation requires a minimum of five fast Fourier transforms compared to correlation’s three. Compared to exhaustive translational robust matching, fast robust correlation is thousands of times faster for large images. Three experiments showing the advantage of fast robust correlation over correlation have been presented. Advantages have been shown in block motion estimation for video coding, video frame registration and tolerance of rotation and zoom.
Fig. 10. Mean CPU time against FFT size. (a) All sizes. (b) Sizes with prime factors less than 17.
APPENDIX A
ZERO PADDING FOR SPEED

To perform correlation using FFTs, it may be necessary to zero pad the images before the FFTs. There are three different reasons for zero padding: first, the two transforms to be element-wise multiplied must be of the same size; second, to avoid cyclic correlation; and, third, to increase the speed of the FFT. It is now explained why, and demonstrated how, increasing the size of the image can increase the speed of the FFT.

In general, the larger the sequence, the greater the time required for the FFT. However, there are many sizes from which an increase in size, for example, through zero padding, will result in a significant decrease in computation time. The FFT works by splitting sequences into small pieces. The basic FFT algorithm works on sequences of length 2^n, where n is a natural number, continually halving the sequence down to sections of length 2. Modern FFTs, such as [30], work on sequences of any length. However, such FFTs favor sizes with small prime factors.

To illustrate this statement, we measure the CPU time taken for MatLab 6.1 [32] to FFT a complex, square, 2-D sequence of varying size. MatLab 6.1 FFT routines are based on the Fastest Fourier Transform in the West [30]. Times were measured using the Linux version of MatLab on a 1-GHz Intel Pentium III processor. The average times of 100 runs for square images of sizes between 200 × 200 and 1000 × 1000 are shown in Fig. 10(a). From Fig. 10(a), we observe that, as image size increases, the variance within similar FFT sizes increases. Above image sizes of 600 × 600, there are differences in processing time greater than a factor of ten between FFTs of images that differ in size by only a few pixels. Fig. 10(a) also shows there are many optimal sizes of FFT. Fig. 10(b) shows that sizes of FFT with small prime factors are fast.

APPENDIX B
NORMALIZATION

If zero padding or non-unity alpha masks are used, there will be a variation in the number of pixels being matched. A sum
matching surface requires normalization. It is now shown how this can quickly be achieved.

We define the alpha masks to have value 1 over included pixels, 0 over pixels not to be included, and values between 0 and 1 for increasing partial inclusion. Each alpha mask is the same size as its respective image and is zero padded in the same way as its respective image. With masks defined in such a way, the correlation of the zero-padded masks gives the measure of the amount of pixel overlap at each matching-surface position. Dividing the sum-matching surface by the correlation of the zero-padded masks, we obtain a normalized, or mean, matching surface (13). The denominator of (13) can also be used to remove small-overlap matches: thresholding it indicates the positions of small-overlap matches.

REFERENCES

[1] J. G. Proakis and D. Manolakis, Digital Signal Processing: Principles, Algorithms, and Applications, 3rd ed. New York: Prentice-Hall, 1996.
[2] A. VanderLugt, “Signal detection by complex spatial filtering,” IEEE Trans. Inf. Theory, vol. 10, no. 2, pp. 139–145, Apr. 1964.
[3] J. Cooley and J. Tukey, “An algorithm for the machine calculation of complex Fourier series,” Math. Comput., vol. 19, pp. 297–301, 1965.
[4] L. G. Brown, “A survey of image registration techniques,” ACM Comput. Surv., vol. 24, no. 2, pp. 325–376, Dec. 1992.
[5] C. A. Glasbey and K. V. Mardia, “A review of image warping methods,” J. Appl. Statist., vol. 25, pp. 155–171, 1998.
[6] A. Can, C. V. Stewart, and B. Roysam, “Robust hierarchical algorithm for constructing a mosaic from images of the curved human retina,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Jun. 1999, pp. 286–292.
[7] R. J. Althof, M. G. J. Wind, and J. T. Dobbins III, “A rapid and automatic image registration algorithm with subpixel accuracy,” IEEE Trans. Med. Imag., vol. 16, no. 6, pp. 308–316, Jun. 1997.
[8] M. Irani and P. Anandan, “All about direct methods,” in Proc. Int. Workshop Vision Algorithms, W. Triggs, A. Zisserman, and R. Szeliski, Eds., 1999, pp. 267–277.
[9] C. A. Glasbey and N. J. Martin, “Multimodality microscopy by digital image processing,” J. Microsc., vol. 181, pp. 225–237, 1996.
[10] C. D. Kuglin and D. C. Hines, “The phase correlation image alignment method,” in Proc. IEEE Int. Conf. Cybernetics and Society, 1975, pp. 163–165.
[11] C. A. Glasbey and K. V. Mardia, “A penalized likelihood approach to image warping,” J. Roy. Statist. Soc. B, vol. 63, pp. 465–514, 2001.
[12] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision. Cambridge, U.K.: Cambridge Univ. Press, 2001.
[13] P. E. Anuta, “Spatial registration of multispectral and multitemporal digital imagery using fast Fourier transform techniques,” IEEE Trans. Geosci. Electron., vol. GE-8, no. 4, pp. 353–368, Oct. 1970.
[14] W. K. Pratt, “Correlation techniques of image registration,” IEEE Trans. Aerosp. Electron. Syst., vol. AES-10, pp. 353–358, 1974.
[15] J. P. Lewis, “Fast normalized cross-correlation,” Vis. Interf., pp. 120–123, 1995.
[16] P. Aschwanden and W. Guggenbühl, “Experimental results from a comparative study on correlation-type registration algorithms,” in Robust Computer Vision: Quality of Vision Algorithms, W. Förstner and S. Ruwiedel, Eds. Karlsruhe, Germany: Wichmann, 1992, pp. 268–289.
[17] J. Martin and J. L. Crowley, “Experimental comparison of correlation techniques,” presented at the Int. Conf. Intelligent Autonomous Systems, 1995.
[18] P. J. Huber, Robust Statistics. New York: Wiley, 1981.
[19] S.-H. Lai, “Robust image matching under partial occlusion and spatially varying illumination change,” Comput. Vis. Image Understand., vol. 78, pp. 84–98, 2000.
[20] M. Bober and J. Kittler, “Estimation of complex multimodal motion: An approach based on robust statistics and Hough transform,” Image Vis. Comput., vol. 12, no. 10, pp. 661–668, Dec. 1994.
[21] J. M. Odobez and P. Bouthemy, “Robust multiresolution estimation of parametric motion models applied to complex scenes,” J. Vis. Commun. Image Represent., vol. 6, no. 4, pp. 348–365, Dec. 1995.
[22] A. Bab-Hadiashar and D. Suter, “Robust optic flow computation,” Int. J. Comput. Vis., vol. 29, no. 1, pp. 59–77, 1998.
[23] M. J. Black and P. Anandan, “The robust estimation of multiple motions: Parametric and piecewise-smooth flow fields,” Comput. Vis. Image Understand., vol. 63, no. 1, pp. 75–104, 1996.
[24] T. Darrell and A. P. Pentland, “Cooperative robust estimation using layers of support,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 17, no. 5, pp. 474–487, May 1995.
[25] J.-H. Chen, C.-H. Chen, and Y.-S. Chen, “Fast algorithm for robust template matching with M-estimators,” IEEE Trans. Signal Process., vol. 51, no. 1, pp. 230–243, Jan. 2003.
[26] R. C. Agarwal and C. S. Burrus, “Number theoretic transforms to implement fast digital convolution,” Proc. IEEE, vol. 63, no. 4, pp. 550–560, Apr. 1975.
[27] R. C. Agarwal and J. W. Cooley, “New algorithms for digital convolution,” IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-25, no. 10, pp. 392–410, Oct. 1977.
[28] J. S. Walker, Fast Fourier Transforms, 2nd ed. Boca Raton, FL: CRC, 1996.
[29] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes in C: The Art of Scientific Computing, 2nd ed. Cambridge, U.K.: Cambridge Univ. Press, 1992.
[30] M. Frigo and S. G. Johnson. Fastest Fourier Transform in the West. [Online]. Available: http://www.fftw.org/
[31] J. L. Mitchell, W. B. Pennebaker, C. E. Fogg, and D. J. LeGall, MPEG Video Compression Standard. New York: Chapman & Hall, 1997.
[32] MatLab [Online]. Available: http://www.mathworks.com/
Alistair J. Fitch (M’03) was born in the U.K. in 1977. He received the B.Eng. and Ph.D. degrees from the University of Surrey, Surrey, U.K., where his doctoral studies were undertaken in the Centre for Vision, Speech, and Signal Processing. He has worked for a number of companies on image-processing tasks, including real-time motion compensated interpolation of video, fully automatic removal of red-eye from still photos, special effects for video post production, and facial image processing. His research interests include image processing, pattern recognition, and computer vision. Dr. Fitch is a member of the IEE and BMVA. In 2002, he received an award for the work he presented at the British Machine Vision Conference.
Alexander Kadyrov received the degree in mathematics in 1976 and the Ph.D. degree in mathematics in 1983 from St. Petersburg State University, St. Petersburg, Russia. From 1979 to 1998, he was with Penza State University, Russia. In 1998, he joined the University of Surrey, Surrey, U.K.
William J. Christmas received the Ph.D. degree from the University of Surrey, Surrey, U.K., while working on the use of probabilistic methods for matching geometric features. He holds a University Fellowship in Technology Transfer at the Centre for Vision, Speech, and Signal Processing, University of Surrey. After studying engineering science at the University of Oxford, Oxford, U.K., he spent some years with the British Broadcasting Corporation as a Research Engineer, working on a wide range of projects related to broadcast engineering. He then moved to BP Research International as a Senior Research Engineer, working on research topics that included hardware aspects of parallel processing, real-time image processing, and computer vision. His other interests have included region-based video coding and the integration of machine vision algorithms to create complete applications. Currently, he is working on projects concerned with automated, content-based annotation of video and multimedia material.
Josef Kittler (M’78) received the degree in electrical engineering, the Ph.D. degree in pattern recognition, and the Sc.D. degree from the University of Cambridge, Cambridge, U.K., in 1971, 1974, and 1991, respectively. He joined the Department of Electronic and Electrical Engineering, University of Surrey, Surrey, U.K., in 1986, where he is currently a Professor in charge of the Centre for Vision, Speech, and Signal Processing. He coauthored the book Pattern Recognition: A Statistical Approach (Upper Saddle River, NJ: Prentice-Hall, 1982). He has published more than 400 papers. He is on the editorial boards of Image and Vision Computing, Pattern Recognition Letters, Pattern Recognition and Artificial Intelligence, Pattern Analysis and Applications, and Machine Vision and Applications. He has worked on various theoretical aspects of pattern recognition and on many applications, including automatic inspection, remote sensing, video retrieval, speech recognition, and document processing. His current research interests include pattern recognition, image processing, and computer vision.