Robust automated registration in the spatial domain

Stéfan van der Walt, Ben Herbst
Department of Electrical and Electronic Engineering, University of Stellenbosch, South Africa
[email protected]

Abstract

Image registration algorithms have a great number of applications. Recently, many such methods have been developed based on invariant localised interest points. These methods are fast, accurate and generally robust – ideal in almost every way. Unfortunately, they are inadequate under certain circumstances. Zokai & Wolberg suggested the log-polar transform (LPT) for use in cases involving large changes in scale and rotation. We demonstrate that, in addition to these properties, the LPT is useful in cases where other local descriptors fail. Although registration using the LPT is normally computationally intensive, we suggest ways in which its computational cost may be significantly reduced. Stacking of images, often used in astronomy, as well as panoramic stitching, are used as examples.

1. Introduction

Registration algorithms can be divided into two broad classes: those that operate in the spatial domain and those that operate in the frequency (i.e. Fourier) domain. In the spatial domain, there are sparse methods, including local descriptors, that depend on some form of feature extraction, and dense methods, such as optical flow and correlation, that operate directly on image values. The two classes generally differ in that the spatial methods are localised, whereas the frequency domain methods [15, 6, 8, 7] operate globally. Attempts have been made to bridge this gap by using wavelet and other transforms to locate information-carrying energy [4]; these have met with varying success.

Each registration method has its own particular advantages and disadvantages. Fourier methods, for example, are fast but inaccurate, suffer from resampling and occlusion effects [16, p. 1425], and only operate globally. Iterative registration, on the other hand, is highly accurate but extremely slow, and prone to misregistration due to local minima in the minimisation space. These problems led to the development of methods based on localised interest points [1, 2, 10, 17, 18], such as the scale-invariant feature transform (SIFT) [13], Speeded Up Robust Features (SURF) [9] and others [11]. All these methods depend on unique localised features, which are available in many images. There are, however, cases where it is very difficult to distinguish one feature from another without examining its spatial context.

As an example, we will use frames recorded by a CCD mounted on a telescope pointing at a deep-space object. It is very difficult to find features to track in these images, because the stars (all potential features) are virtually identical and rotationally invariant. Since local features fail, and global methods are slow and unreliable, we would like to find an algorithm that can bridge the gap. We will show that the log-polar transform (LPT) is an ideal candidate. While its use has previously been limited by its high computational cost, we develop ways of reducing those costs and making the LPT behave more like local features.

2. The log-polar transform

The log-polar transform (LPT) spatially warps an image onto new axes, angle (θ) and log-distance (L). Using the centre of the image, (x_c, y_c), as reference, pixel coordinates (x, y) are written in terms of their offset from the centre,

$$\bar{x} = x - x_c, \qquad \bar{y} = y - y_c.$$

For each pixel, the angle is defined by

$$\theta = \begin{cases} \arctan\left(\bar{y}/\bar{x}\right) & \bar{x} \neq 0 \\ 0 & \bar{x} = 0, \end{cases}$$

with a distance of

$$L = \log_b\!\left(\sqrt{\bar{x}^2 + \bar{y}^2}\right).$$

The base, b, which determines the width of the transform output, is chosen to be

$$b = e^{\ln(d)/w} = d^{1/w},$$

where d is the distance from (x_c, y_c) to the corner of the image, and w is the width or height of the input image, whichever is largest.
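As a concrete illustration, the forward coordinate mapping can be written in a few lines of NumPy. This is a minimal sketch of the equations above, not code from the paper: the function name is ours, the image centre is taken as ((cols − 1)/2, (rows − 1)/2), and arctan2 is used so that angles cover the full circle.

```python
import numpy as np

def log_polar_coords(x, y, shape):
    """Map image coordinates (x, y) to log-polar coordinates (theta, L)."""
    rows, cols = shape
    xc, yc = (cols - 1) / 2.0, (rows - 1) / 2.0   # assumed image centre

    # Base b = d**(1/w): d is the centre-to-corner distance, w the larger
    # of the image width and height.
    d = np.hypot(xc, yc)
    w = max(rows, cols)
    b = d ** (1.0 / w)

    x_bar, y_bar = x - xc, y - yc
    theta = np.arctan2(y_bar, x_bar) % (2 * np.pi)    # angle in [0, 2*pi)
    L = np.log(np.hypot(x_bar, y_bar)) / np.log(b)    # log_b of the radius
    return theta, L     # note: L is undefined at the exact centre (radius 0)
```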

Figure 1: Illustration of the log-polar transform. A colour input image is shown alongside its log-polar transform, whose axes are the angle (θ), running from 0 to 2π, and the distance from the centre on a logarithmic scale.

When warping images, it is not possible to use the forward transform directly. Since we use discrete coordinates (integer x and y values), more than one input coordinate may map to the same output coordinate; worse still, not every output coordinate will be covered. One solution is to calculate the irregular grid of coordinates obtained by transforming each input coordinate (without discretising), and then to warp and resample the input (using interpolation) at the required output positions. An easier and computationally less intensive method is to reverse the process: for each output coordinate, the transformation is applied in reverse to obtain a coordinate in the input image, and an output value is then determined from the input using interpolation. This can be done if, ignoring the effect of discretisation, the transformation function is bijective (a one-to-one correspondence in which all input and output coordinates are mapped).

Given θ and L, we would now like to find x and y. First, calculate the distance r from the centre,

$$r = e^{L \ln(b)} = e^{L \ln(d)/w},$$

after which x and y can be recovered as

$$x = r\cos(\theta) + x_c, \qquad y = r\sin(\theta) + y_c.$$

Note the relationship of the input image to the axes of the LPT: rotating the input results in a shift along the θ axis, whereas scaling the input results in a shift along the L axis. It is this property of the LPT that is used in registration.
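The backward warp described above is straightforward to sketch with SciPy's interpolation routines. This is an illustrative implementation under our own assumptions (a grey-scale input, the image centre as origin, and 360 angular samples), not the authors' code:

```python
import numpy as np
from scipy import ndimage as ndi

def log_polar_transform(image, n_angles=360):
    """Log-polar transform of a grey-scale image via backward mapping."""
    rows, cols = image.shape
    xc, yc = (cols - 1) / 2.0, (rows - 1) / 2.0   # assumed centre

    d = np.hypot(xc, yc)        # distance from the centre to a corner
    w = max(rows, cols)         # number of log-distance (L) samples

    # Output grid: angle down the rows, log-distance across the columns.
    theta = np.linspace(0, 2 * np.pi, n_angles, endpoint=False)[:, np.newaxis]
    L = np.arange(w)[np.newaxis, :]

    # Inverse mapping: r = exp(L ln(d) / w), x = r cos(theta) + xc, ...
    r = np.exp(L * np.log(d) / w)
    x = r * np.cos(theta) + xc
    y = r * np.sin(theta) + yc

    # Sample the input at the computed (row, col) = (y, x) positions.
    return ndi.map_coordinates(image, [y, x], order=1, mode='constant')
```

To transform a square patch cut from a frame, as in the registration procedure below, the same function is simply applied to the patch.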

3. Fast registration based on the log-polar transform

In [16], affine registration based on the log-polar transform is described. Given a reference frame, R(x, y), and a target frame, B(x, y), we want to find a transformation T such that R(x, y) = B(T(x, y)). Assuming that the frames are images taken of the same object from a long distance, we know that the transformation must be a similarity, i.e. it is limited to translation, rotation and scale. If we express a coordinate (x, y) as a homogeneous coordinate p = [x, y, 1]^T, we can view the transformation as a matrix multiplication, T(p) = Mp, where

$$M = \begin{bmatrix} s\cos\theta & -s\sin\theta & t_x \\ s\sin\theta & s\cos\theta & t_y \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} sR & t \\ 0^T & 1 \end{bmatrix}$$

and R represents rotation, t translation and s scale. LPT registration then proceeds as follows:

• From the centre of the reference frame, p_r = [x_r, y_r]^T, cut a square roughly 20% of the size of the image (or at least the size of the objects we wish to track), and obtain its LPT. The square represents a feature.

• For each position in the target frame, p_t = [x_t, y_t]^T, cut out a square of the same size and obtain its LPT. Correlate the resulting LPT with the LPT of the reference square, taking care to wrap vertically (easily accomplished using the FFT). If the normalised correlation is higher than at any other point thus far, store the position, rotation (θ) and scale (s).

The s and R components of M can now be determined, after which the position is related to the translation by t = p_r − sRp_t.
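The correlation step and the recovery of the similarity parameters can be sketched as follows. This is an illustrative sketch, not the authors' implementation: full normalised correlation is reduced to zero-mean correlation, wrap-around along the L axis is accepted as a side effect of the FFT, and the sign conventions depend on which frame is treated as the reference.

```python
import numpy as np

def lpt_match(lpt_ref, lpt_target, base_b):
    """Estimate the rotation and scale relating two log-polar patches.

    Both inputs must have the same shape, with angle along the rows and
    log-distance (L) along the columns.
    """
    n_angles, n_L = lpt_ref.shape

    # Zero-mean patches; proper normalised correlation is omitted for brevity.
    a = lpt_ref - lpt_ref.mean()
    b = lpt_target - lpt_target.mean()

    # Circular cross-correlation via the FFT (wraps along both axes).
    corr = np.real(np.fft.ifft2(np.fft.fft2(a) * np.conj(np.fft.fft2(b))))

    # The correlation peak gives the shift along each axis.
    i, j = np.unravel_index(np.argmax(corr), corr.shape)
    d_theta = i - n_angles if i > n_angles // 2 else i   # signed theta shift
    d_L = j - n_L if j > n_L // 2 else j                 # signed L shift

    rotation = 2 * np.pi * d_theta / n_angles   # shift in theta -> rotation angle
    scale = base_b ** d_L                       # shift in L -> scale factor
    return rotation, scale, corr[i, j]

def recover_translation(p_ref, p_target, rotation, scale):
    """Recover the translation t = p_r - s R p_t from a matched pair of positions."""
    c, s = np.cos(rotation), np.sin(rotation)
    R = np.array([[c, -s], [s, c]])
    return np.asarray(p_ref, dtype=float) - scale * R @ np.asarray(p_target, dtype=float)
```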

The method outlined above works well, even in the presence of high levels of noise, changes in illumination and large differences in scale, rotation and translation. Unfortunately, its computational cost proves prohibitive: even for a small image of 200 × 200 pixels, roughly 40 000 log-polar transforms and correlations need to be calculated. Furthermore, unless the target frame overlaps with the centre of the reference frame, the registration fails.

To address this problem, we choose more squares in the reference frame, but severely limit the number of squares in the target frame, thereby reducing the number of log-polar transforms and correlations that need to be calculated. To accomplish this, we need to identify areas in both the reference and target frames that are likely to contain useful features. Smooth areas are less likely to contain such information. An intuitive measure of smoothness is variance—as is often used in texture analysis—but this commonly fails. For example, imagine two images, one a checker-board pattern and the other divided into two halves, one black and one white. The intensities in these images have the same variance, while their content and texture differ. To counter this problem, we first apply the high-frequency emphasis kernel,

$$\kappa = \begin{bmatrix} 0 & 1 & 0 \\ 1 & -2 & 1 \\ 0 & 1 & 0 \end{bmatrix},$$

of which the frequency response is shown in Figure 2, to each frame.

Figure 2: Fourier domain representation of the high-frequency emphasis kernel.
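Figure 2 can be reproduced, up to scaling, by examining the kernel's discrete Fourier transform. The padded size of 256 × 256 is an arbitrary choice, made here only to obtain a smooth plot:

```python
import numpy as np

# Frequency response of the high-frequency emphasis kernel (cf. Figure 2):
# zero-pad the 3x3 kernel, take the 2-D FFT and centre the spectrum.
kappa = np.array([[0, 1, 0], [1, -2, 1], [0, 1, 0]], dtype=float)
response = np.abs(np.fft.fftshift(np.fft.fft2(kappa, s=(256, 256))))
# `response` now holds the magnitude spectrum that Figure 2 visualises.
```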

The variance is then calculated over a moving window, yielding a map indicating potentially useful features, as shown in Figure 3. To locate the positions of such features, the map is divided into smaller blocks, and the peak value in each block is taken as the position of a feature. We use 10 equally sized blocks in the reference frame and 40 equally sized blocks in each of the target frames. Target positions are rejected when their variance falls outside the range of variances at the reference positions. Features can also be rejected according to other criteria, for example for being symmetric, since such features can yield invalid matches at different angles. The process outlined earlier is then followed, correlating the log-polar transforms of the selected feature squares in the reference and target frames to find the registration parameters. Note that the number of log-polar transforms and correlations is reduced from tens of thousands to fewer than 500. All transforms operate on images of the same dimensions, so the coordinates involved need only be calculated once.

The algorithms were implemented in the Python language [5, 19, 3], calling C routines to obtain higher speed. On a 1.6 GHz Intel T2300 processor, a log-polar transform of a 200 × 200 image takes less than 0.1 s to compute. Depending on the number of features found, and on the sampling of the LPT and the kind of interpolation used, registration of such an image can be computed in roughly 3 s (without the refinement described below).

Since the variance map has a smoothing effect on the input, we should allow for a small error in the position of maximum correlation. A small search area, typically 5 × 5 pixels, is defined around this point, over which the correlation procedure is repeated to provide a refined measurement. It is suggested in [16] that iterative minimisation of the registration parameters then be performed, increasing the accuracy even further. We have found, however, that this process often diverges due to the large number of local minima in the minimisation space.
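The feature-selection step described above can be sketched as follows. The window size and the block layout (here, blocks split along image rows) are our own illustrative choices; the paper specifies only the kernel and the number of blocks:

```python
import numpy as np
from scipy import ndimage as ndi

KAPPA = np.array([[0,  1, 0],
                  [1, -2, 1],
                  [0,  1, 0]], dtype=float)    # high-frequency emphasis kernel

def feature_positions(frame, n_blocks=10, window=15):
    """Return candidate feature positions and the variance map of a frame."""
    hf = ndi.convolve(frame.astype(float), KAPPA, mode='reflect')

    # Moving-window variance: E[x^2] - (E[x])^2 over a uniform window.
    mean = ndi.uniform_filter(hf, window)
    mean_sq = ndi.uniform_filter(hf ** 2, window)
    var_map = mean_sq - mean ** 2

    # Divide the map into blocks and take the peak of each block as a feature.
    positions = []
    for block_rows in np.array_split(np.arange(var_map.shape[0]), n_blocks):
        block = var_map[block_rows, :]
        i, j = np.unravel_index(np.argmax(block), block.shape)
        positions.append((block_rows[0] + i, j))
    return positions, var_map
```

Target positions whose variance falls outside the range of the reference variances would then be discarded, as described above.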

Figure 3: The variance map highlights areas of interest.

Figure 4 shows 25 frames from the telescope CCD, registered and stacked. In Figure 5 we show a crude panorama, stitched after LPT registration. The images are not blended, in order to show the alignment more clearly; notice how the foreground is misaligned, since the registration is limited to a similarity. Figure 6 shows typical reference feature positions (white dots) and the matching target positions (connected by red lines). Figure 7 shows an example containing an edge, one type of feature that is hard to position accurately, and for which the algorithm failed.

Figure 4: Stacking is often used in astronomy to limit noise and gather more light on the camera sensor. Here, 25 observations are stacked after automatic registration, using the log-polar algorithm (photographs by Chris Forder of the Cederberg Observatory).

Figure 7: Incorrectly identified edge feature. (a) Example input frame; (b) result.

4. Conclusion

For most applications of registration, localised interest points provide fast, localised and accurate results. There are, however, cases in which these algorithms fail, requiring a different approach. The log-polar transform provides such an alternative, albeit at significant computational cost. Using the methods described in this paper, the computational cost of LPT-based registration can be reduced sufficiently to make it both practical and effective for this purpose. The main features of the different registration approaches are summarised in Table 1.

While the proposed method worked well on many data sets, it did fail in some cases. This is mainly because the complete registration depends on only one corresponding feature in the reference and the target. Further investigation is required to find ways of using LPT-based features in the same way we would use other localised features. This would allow for a more robust approach, in which features can be selected and rejected using consensus theory (see for example [14]), after which the registration parameters can be estimated using a least-squares fit.

Figure 5: Rough panorama of three frames, stitched after alignment using LPT registration.

Class                       Advantages                   Disadvantages
Unmodified Fourier          Fast                         Inaccurate, global
Localised interest points   Fast, localised, accurate    Does not take feature surroundings into account
Log-polar transform         Robust                       Slower, cannot estimate skew
Iterative minimisation      Accurate                     Slow, overly sensitive, global

Table 1: Properties of certain common classes of registration algorithms. See [12] for a more detailed exposition.

Figure 6: Features found using the LPT. The white dots indicate feature positions in the reference frame, while the red dots show the best corresponding features in the target frames.

5. References

[1] Bruce D. Lucas and Takeo Kanade. An Iterative Image Registration Technique with an Application to Stereo Vision. In Proceedings of the International Joint Conference on Artificial Intelligence, pages 674–679, August 1981.

[2] Carlo Tomasi and Takeo Kanade. Detection and Tracking of Point Features. Technical Report CMU-CS-91-132, Carnegie Mellon University, April 1991.

[3] Eric Jones, Travis Oliphant, Pearu Peterson, and others. SciPy: Open source scientific tools for Python, 2001.

[4] George Lazaridis and Maria Petrou. Image Registration Using the Walsh Transform. IEEE Transactions on Image Processing, 15(8):2343–2357, August 2006.

[5] Guido van Rossum and others. The Python language.

[6] Hanzhou Liu, Baolong Guo, and Zongzhe Feng. Pseudo-Log-Polar Fourier Transform for Image Registration. IEEE Signal Processing Letters, 13(1):17–20, January 2006.

[7] Harold S. Stone, Stephen A. Martucci, Michael T. Orchard, and Ee-Chien Chang. A fast direct Fourier-based algorithm for subpixel registration of images. IEEE Transactions on Geoscience and Remote Sensing, 39(10):2235–2243, October 2001.

[8] Hassan Foroosh (Shekarforoush) and Josiane B. Zerubia. Extension of Phase Correlation to Subpixel Registration. IEEE Transactions on Image Processing, 11(3):188–200, March 2002.

[9] Herbert Bay, Tinne Tuytelaars, and Luc Van Gool. SURF: Speeded Up Robust Features. In Proceedings of the Ninth European Conference on Computer Vision, May 2006.

[10] Jianbo Shi and Carlo Tomasi. Good Features to Track. In IEEE Conference on Computer Vision and Pattern Recognition, pages 593–600, 1994.

[11] K. Mikolajczyk and C. Schmid. An Affine Invariant Interest Point Detector. In Proceedings of the European Conference on Computer Vision, pages 128–142, 2002.

[12] L. G. Brown. A survey of image registration techniques. ACM Computing Surveys, 24(4):325–376, December 1992.

[13] D. G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110, 2004.

[14] M. A. Fischler and R. C. Bolles. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6):381–395, 1981.

[15] B. S. Reddy and B. N. Chatterji. An FFT-based technique for translation, rotation, and scale-invariant image registration. IEEE Transactions on Image Processing, 5(8):1266–1271, August 1996.

[16] Siavash Zokai and George Wolberg. Image Registration Using Log-Polar Mappings for Recovery of Large-Scale Similarity and Projective Transformations. IEEE Transactions on Image Processing, 14(10):1422–1434, October 2005.

[17] T. Tommasini, A. Fusiello, E. Trucco, and V. Roberto. Making good features track better. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 178–183, Santa Barbara, CA, June 1998. IEEE Computer Society Press.

[18] Tony Lindeberg. Feature Detection with Automatic Scale Selection. International Journal of Computer Vision, 30(2):77–116, 1998.

[19] Travis E. Oliphant. Guide to NumPy. Brigham Young University, Provo, UT, 2006.
