Study of Feature based Image Registration Algorithms ...

Indian Journal of Science and Technology, Vol 8(22), DOI: 10.17485/ijst/2015/v8i22/79315, September 2015

ISSN (Print) : 0974-6846 ISSN (Online) : 0974-5645

Study of Feature based Image Registration Algorithms for Navigation of Unmanned Aerial Vehicles K. Divya Lakshmi* and V. Vaithiyanathan School of Computing, SASTRA University, Thirumalaisamudram, Thanjavur - 613401, Tamil Nadu, India; [email protected], [email protected]

Abstract Image registration is an important preprocessing step for computer vision tasks such as object recognition by a robot or change detection over a natural terrain from satellite images. This paper discusses the basic steps of feature based image registration and surveys the methodologies from the literature applicable to each step. The evolution of algorithms on image features is explained with some of their mathematical concepts, along with their limitations. Two videos captured from an unmanned aerial vehicle were used for experimentation. The performance of intensity based image registration using normalized cross correlation and of feature based registration using Speeded Up Robust Features was observed by registering various frames of the videos. Intensity based methods perform well for all types of images when the images to be registered differ only by a translation. Feature based methods are suitable when the possible transformations between the images are unknown. Feature based methods fail when the images lack distinct features such as corner points. The study presented would enable one to choose the right methodology for each step of the image registration algorithm for various real time computer vision applications such as navigation of unmanned aerial vehicles.

Keywords: Image Matching, Image Registration, Local Features, Registration Algorithms, UAV Navigation

1. Introduction Vision based guidance is normally executed in two phases: offline and online¹. The offline phase aims to produce reference imagery by combining data from images captured over the terrain the vehicle is planned to fly over. The online phase compares the in-flight sensed image with the reference imagery, both of which are aerial down looking or forward looking images. The fusion and comparison have to deal with spectral, temporal and viewpoint variations between images. Image registration deals with finding correspondences between two images in the presence of the above said variations. Challenges in aerial image registration for vision based guidance include the following: 1. The registration algorithm should satisfy the real time constraints imposed by the speed of the vehicle. 2. The algorithm should be robust to variations introduced by the image capture mechanism, such as the sensor used and the position of the capturing device, and to variations introduced by atmospheric changes. 3. The algorithm has to deal with occlusion, where successful matching has to be established even if only a part of the details of the reference imagery is visible in the in-flight sensed image.

*Author for correspondence

2. Motivation Unmanned Aerial Vehicles (UAVs) that incorporate human-like intelligence can, when successfully implemented, represent the success of technology in competing with human abilities. UAVs capable of flying at lower altitudes can be used for close monitoring of regions affected by disasters such as earthquakes, where human intervention carries higher risk. Several methods have been developed for registering two images, targeting various domains and applications. The technique chosen should closely relate to the variations it has to handle. Image registration techniques can be broadly classified as 1. intensity based image registration and 2. feature based image registration. Intensity based image registration mainly works on the principle of pixel by pixel comparison of the two images. Various comparison metrics have been developed; the cross correlation coefficient and mutual information are some of the widely used ones. The speed of intensity based image registration can be improved by transforming the images to the frequency domain, with slightly reduced accuracy². The variations it can handle are restricted mainly to translation, caused by displacement of the camera parallel to the image plane, and a certain degree of rotation and scale. It does not deal with occlusion or with wide changes in scale and viewpoint. Many of the limitations of intensity based image registration are overcome in feature based image registration techniques. Instead of comparing pixels, features extracted from the images are used for comparison. The features extracted can be local or global. Global features are extracted from the entire image, so the pattern can be learnt at a higher level covering the entire image. The main drawback of global features is that they cannot handle occlusions. This drawback is overcome when features are extracted from restricted portions of the image. There are also some hybrid techniques where global descriptions are added to the descriptions of the local features³.
Owing to these advantages, feature based image registration is chosen for the purpose of image guided navigation, with more focus on local features.
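The pixel by pixel comparison used by intensity based registration can be illustrated with a small sketch. It assumes grayscale images stored as NumPy arrays and a sensed patch that differs from the reference only by a translation. The function names `normalized_cross_correlation` and `locate_template` are illustrative and are not taken from the paper, and the exhaustive spatial search is written for clarity rather than speed.

```python
import numpy as np


def normalized_cross_correlation(image, template):
    """Return the NCC score map of `template` slid over `image` (valid positions only)."""
    image = np.asarray(image, dtype=np.float64)
    template = np.asarray(template, dtype=np.float64)
    th, tw = template.shape
    t_zero = template - template.mean()
    t_norm = np.sqrt((t_zero ** 2).sum())
    rows = image.shape[0] - th + 1
    cols = image.shape[1] - tw + 1
    scores = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            patch = image[r:r + th, c:c + tw]
            p_zero = patch - patch.mean()
            denom = np.sqrt((p_zero ** 2).sum()) * t_norm
            # Flat patches have zero variance; their score is left at 0.
            if denom > 0:
                scores[r, c] = (p_zero * t_zero).sum() / denom
    return scores


def locate_template(image, template):
    """Return the (row, col) of the top-left corner of the best NCC match."""
    scores = normalized_cross_correlation(image, template)
    return np.unravel_index(np.argmax(scores), scores.shape)


# Synthetic example (assumed data): the sensed patch is a translated window of the reference.
rng = np.random.default_rng(0)
reference = rng.random((40, 40))
sensed = reference[12:22, 7:17]
best_row, best_col = locate_template(reference, sensed)
print(best_row, best_col)
```

Because the score is computed on mean-subtracted, norm-scaled patches, it is insensitive to uniform brightness and contrast changes, but a rotated or rescaled patch no longer lines up pixel by pixel, which is the limitation described above.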

3. Methodology Figure 1 shows the steps involved in the methodology followed for image matching based on feature based image registration. The various methods found in the literature for each of the steps depicted in Figure 1 are discussed in the subsequent sections.


Vol 8 (22) | September 2015 | www.indjst.org

Figure 1. Steps in feature based matching of images.

3.1 Image Preprocessing Image preprocessing mainly focuses on removing details in the images that do not contribute significant information content. This step also plays an important role in making the process robust to noise and, in the case of inter spectral image registration, to spectral variations. The image can be converted to gray scale and further to a binary or an edge image. Conversion to gray scale or binary scale depends on the nature of the image and the performance requirements of the application. Intensity scale conversion also helps in normalizing the intensity variations caused by a change in sensor. Rough Features Along Edges (RFAE)⁴ targets registration of thermal and visual images captured by low altitude unmanned aerial vehicles. It converts the input image to a binary edge image. This improves the quality of the features extracted and at the same time reduces the running time of the subsequent stages. The preprocessing step also impacts the accuracy of the subsequent steps.


3.2 Feature Detection Features are structures in the scene that are unique and identifiable from different viewpoints of the scene. A scene in an image is composed of edges, corners and regions of uniform intensity or texture. Of these components, corners are the distinct point features that can be used for comparison. A feature detection algorithm attempts to find the corner points, which lie at the intersection of edges or contours of an image. Eliminating edge points poses a challenge in a corner or interest point detection algorithm. Repeatability⁵ is an important performance measure for a point feature. It is the percentage of features that are located at corresponding locations in two images containing a scene from different viewpoints. There are basically two types of detectors for point features: corner and blob. The algorithms based on corners and blobs are discussed below.
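As a rough illustration of the repeatability measure, the sketch below counts how many detections from one image reappear in the other. It is a simplified version under stated assumptions: the two views are related by a known 3x3 homography, detections are given as (x, y) arrays, and a fixed pixel tolerance (a hypothetical value of 1.5) decides whether two detections coincide. The function name and parameters are illustrative, not from the paper.

```python
import numpy as np


def repeatability(points_a, points_b, homography, tolerance=1.5):
    """Percentage of detections in image A that are re-detected in image B.

    points_a, points_b: (N, 2) and (M, 2) arrays of (x, y) detections.
    homography: 3x3 matrix mapping image A coordinates to image B (assumed known).
    tolerance: maximum pixel distance for two detections to count as the same point.
    """
    points_a = np.asarray(points_a, dtype=np.float64)
    points_b = np.asarray(points_b, dtype=np.float64)
    if len(points_a) == 0 or len(points_b) == 0:
        return 0.0
    # Project A's detections into B using homogeneous coordinates.
    homogeneous = np.hstack([points_a, np.ones((len(points_a), 1))])
    projected = homogeneous @ np.asarray(homography, dtype=np.float64).T
    projected = projected[:, :2] / projected[:, 2:3]
    # Distance from every projected point to every detection in B.
    distances = np.linalg.norm(projected[:, None, :] - points_b[None, :, :], axis=2)
    repeated = (distances.min(axis=1) <= tolerance).sum()
    return 100.0 * repeated / len(points_a)


# Assumed example: image B is image A shifted by (5, 3) pixels.
shift = np.array([[1.0, 0.0, 5.0], [0.0, 1.0, 3.0], [0.0, 0.0, 1.0]])
detected_a = np.array([[10.0, 10.0], [20.0, 30.0], [40.0, 15.0], [50.0, 50.0]])
# Three of the four points are re-detected in B, plus one unrelated detection.
detected_b = np.vstack([detected_a[:3] + [5.0, 3.0], [[100.0, 100.0]]])
score = repeatability(detected_a, detected_b, shift)
print(score)  # 75.0
```

Published evaluations usually restrict the count to the region visible in both images; that refinement is omitted here for brevity.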

3.2.1 Corner Based Rotation Covariant Feature Corners are characterized as pixels in an image that have strong intensity variations in all directions. Detectors based on corners try to measure the cornerness of a pixel in an image. The Moravec (1980) interest point operator calculates a variance measure around a pixel within a window⁶. The sum of squares of the differences of pixels in each of four directions (horizontal, vertical, major diagonal, minor diagonal) is calculated, and the minimum value among the four directions is taken as the variance measure. The pixel is considered a feature if its variance measure is a local maximum above a threshold. The drawback of the Moravec corner detector is that it considers shifts only at multiples of 45 degrees, so its response is not rotation invariant. The response is also noisy owing to its binary rectangular window. Harris improved the Moravec corner detector with the following changes. Harris proposed the use of an analytic expansion to deal with small shifts in any direction. The window is circular and Gaussian. The variance measure is based on the principal curvatures of the second moment matrix of the Gaussian weighted patch of the image surrounding the pixel. The eigenvalues of the second moment matrix are the principal curvatures, and they are rotation invariant. A pixel is chosen as a feature if it has high values for both principal curvatures⁷. SUSAN (Smallest Univalue Segment Assimilating Nucleus) is a corner based feature detector that uses a morphological approach rather than a differential approach, which is noise sensitive and computationally expensive. SUSAN classifies a pixel as a corner based on the results of intensity comparisons with its neighborhood. FAST (Features from Accelerated Segment Test) is an efficient implementation of the SUSAN detector. It considers a circular neighborhood of a pixel and uses the ID3 machine learning algorithm, which is decision tree based, to classify a pixel as a corner⁸. The corner detectors discussed so far produce features that are characterized by their x and y coordinates. They do not produce repeatable features in the presence of scale variations.
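The idea that a corner has two large principal curvatures of the second moment matrix can be sketched as follows. This is a minimal illustration, not the detector of the cited work: it scores each pixel by the smaller eigenvalue of the Gaussian weighted second moment matrix, whereas the Harris detector itself combines the determinant and trace of that matrix. The helper names and the window width `sigma=1.5` are assumptions made for the example.

```python
import numpy as np


def gaussian_blur(image, sigma):
    """Separable Gaussian smoothing with edge replication (kernel truncated at 3 sigma)."""
    radius = int(3 * sigma + 0.5)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(image, radius, mode="edge")
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")


def min_eigenvalue_response(image, sigma=1.5):
    """Smaller eigenvalue of the Gaussian weighted second moment matrix at every pixel."""
    image = np.asarray(image, dtype=np.float64)
    iy, ix = np.gradient(image)          # derivatives along rows (y) and columns (x)
    sxx = gaussian_blur(ix * ix, sigma)  # entries of the second moment matrix
    syy = gaussian_blur(iy * iy, sigma)
    sxy = gaussian_blur(ix * iy, sigma)
    # Closed-form eigenvalues of [[sxx, sxy], [sxy, syy]].
    half_trace = 0.5 * (sxx + syy)
    root = np.sqrt(np.maximum(0.25 * (sxx - syy) ** 2 + sxy ** 2, 0.0))
    return half_trace - root


# Assumed test image: a bright square on a dark background.
square = np.zeros((40, 40))
square[10:30, 10:30] = 1.0
response = min_eigenvalue_response(square)
print("corner:", response[10, 10])
print("edge:  ", response[20, 10])
print("flat:  ", response[20, 20])
```

On this image the response is clearly positive at the corner of the square and essentially zero along a straight edge and inside the flat region, which shows how requiring both eigenvalues to be large suppresses edge points.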

3.2.2 Blob based Feature Detector A blob in an image can be considered as a group of neighboring pixels whose intensity values are constant, or vary within a narrow range, and differ from those of the surrounding region. Point feature detectors based on blobs can also be termed interest point detectors. The Hessian is a widely used blob based feature detector. The feature detector by Beaudet is based on the Hessian matrix and detects interest points based on blobs⁹. The determinant of the Hessian matrix produces a higher value for blob regions. Speeded Up Robust Features (SURF) initially selects key points from the image using a Hessian threshold¹⁰.
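A short sketch of the determinant of Hessian response is given below. It assumes the input has already been smoothed, uses central finite differences from `numpy.gradient` for the second derivatives, and is not the box filter approximation used by SURF. The function name `hessian_determinant` and the synthetic blob are illustrative.

```python
import numpy as np


def hessian_determinant(smoothed):
    """Determinant of the Hessian at every pixel of an already smoothed image."""
    smoothed = np.asarray(smoothed, dtype=np.float64)
    ly, lx = np.gradient(smoothed)   # first derivatives along rows (y) and columns (x)
    lyy, lyx = np.gradient(ly)       # second derivatives of ly
    lxy, lxx = np.gradient(lx)       # second derivatives of lx
    return lxx * lyy - lxy * lyx


# Assumed test image: one bright Gaussian blob centred at row 20, column 25.
rows, cols = np.mgrid[0:48, 0:48]
blob = np.exp(-((rows - 20) ** 2 + (cols - 25) ** 2) / (2.0 * 4.0 ** 2))
response = hessian_determinant(blob)
peak = tuple(int(i) for i in np.unravel_index(np.argmax(response), response.shape))
print(peak)  # (20, 25)
```

Both second derivatives are negative at the centre of a bright blob, so their product is large and positive there; along an edge one of them is close to zero, which keeps the determinant small. A detector would then keep the local maxima of this response that exceed a threshold.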

3.2.3 Scale Invariance to Corners and Blobs The concept of scale space is the key to scale invariant features. The concept of scale space in computer vision was introduced by Lindeberg. The scale space method produces images of different scales from a given image by means of successive blurring and sub-sampling. The scale space thus generated is also called an image pyramid, which comprises images of different scales derived from a base image¹¹. Scale invariant methods found in the literature involve the generation of the image pyramid in their detection process. The method of generating the pyramid varies depending on the speed and accuracy requirements. Laplacian of Gaussian is one of the basic methods of generating the image pyramid. Scale Invariant Feature Transform (SIFT) uses Difference of Gaussian, which is an approximation of the Laplacian of Gaussian method¹². SURF generates the image pyramid using a box filter method, which is yet another approximation of Laplacian of Gaussian¹⁰. Blobs detected in the image have a characteristic scale, which is found by computing the filter response at the blob location across the images of the pyramid. The scale giving the maximum response is chosen as the characteristic scale of the blob. The blob detectors thus produce point features that are characterized by x coordinate, y coordinate and scale. Harris-Laplace and Hessian-Laplace are scale invariant algorithms that extend the Harris and Hessian interest point detectors respectively¹³.
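Characteristic scale selection can be sketched with a scale-normalized Laplacian evaluated at one location over several blur levels. The example is a simplification under stated assumptions: the image is blurred at each candidate scale without sub-sampling, the Laplacian is a 5-point finite difference, and the response is multiplied by sigma squared so that values at different scales are comparable. The names `gaussian_blur` and `characteristic_scale` and the list of candidate scales are hypothetical.

```python
import numpy as np


def gaussian_blur(image, sigma):
    """Separable Gaussian smoothing with edge replication (kernel truncated at 3 sigma)."""
    radius = int(3 * sigma + 0.5)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(image, radius, mode="edge")
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")


def characteristic_scale(image, row, col, sigmas):
    """Return the sigma with the largest scale-normalized Laplacian magnitude at (row, col).

    (row, col) must be an interior pixel so that its four neighbours exist.
    """
    image = np.asarray(image, dtype=np.float64)
    responses = []
    for sigma in sigmas:
        smooth = gaussian_blur(image, sigma)
        laplacian = (smooth[row - 1, col] + smooth[row + 1, col]
                     + smooth[row, col - 1] + smooth[row, col + 1]
                     - 4.0 * smooth[row, col])
        # Multiplying by sigma^2 compensates for the decay of derivatives with blur.
        responses.append(sigma ** 2 * abs(laplacian))
    return sigmas[int(np.argmax(responses))], responses


# Assumed test image: a Gaussian blob of standard deviation 4 centred at (32, 32).
rows, cols = np.mgrid[0:64, 0:64]
blob = np.exp(-((rows - 32) ** 2 + (cols - 32) ** 2) / (2.0 * 4.0 ** 2))
best_sigma, responses = characteristic_scale(blob, 32, 32, [2.0, 3.0, 4.0, 5.0, 6.0])
print(best_sigma)
```

For an ideal Gaussian blob the normalized response peaks when the blur level equals the size of the blob, so the selected scale follows the blob when the image is enlarged or reduced. Recomputing the full blur for every scale is wasteful; pyramid based methods reuse intermediate images.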

3.2.4 Affine Invariance to Local Features While scale invariant methods apply uniform scaling over the image, affine invariant methods apply non-uniform scaling. Thus, as scale invariance deals with isotropic features of an image, affine invariance deals with anisotropic features. Hessian-Affine and Harris-Affine are the affine invariant extensions of the Hessian and Harris interest point detectors respectively¹³. ASIFT (Affine SIFT) uses affine simulation of the image, based on the affine camera model, to detect features that are truly distinct over various affine transformations¹⁴. PSIFT (Perspective SIFT) uses the perspective camera model, which models real world transformations better than the affine camera model, to detect features that are distinct over perspective distortions¹⁵.

3.3 Feature Description A feature descriptor is a vector associated with a point feature of an image that is used for finding the corresponding point feature in the other image. The descriptor is constructed from the local region surrounding the point feature, and some descriptors also include a global component.

3.3.1 Orientation Assignment to a Feature Detected features are represented by their spatial coordinates in corner based detectors, and by spatial coordinates and scale in scale invariant feature detectors. An orientation is assigned to a feature based on its neighborhood, which makes the descriptor comparison tolerant to rotation. SIFT computes the orientation by constructing an orientation histogram in which the Gaussian weighted gradient orientations of pixels within a circular neighborhood around the feature are accumulated; the orientations are calculated from pixel differences, and the neighborhood depends on the scale of the feature. The orientations having peak values are assigned to the feature. A single feature may thus be assigned multiple orientations, increasing the number of features. This property is found to increase the stability of matching in the presence of noise¹². SURF finds the orientation by sliding a sector of angle π/3 and radius 6s, where s is the scale associated with the feature. The angle of the sector at which the points within the sector give the maximum Haar wavelet response while sliding is the orientation associated with the point feature¹⁰.
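A histogram based orientation assignment in the spirit of the SIFT procedure described above can be sketched as follows. It is a simplified illustration: gradients come from `numpy.gradient`, the patch is square rather than circular, no histogram smoothing or peak interpolation is applied, and the 36 bins and the 0.8 peak ratio are assumed values. Angles are measured in image coordinates, with the y axis pointing down the rows.

```python
import numpy as np


def dominant_orientations(patch, sigma, num_bins=36, peak_ratio=0.8):
    """Return bin-centre orientations (degrees) of the strong peaks of the gradient histogram."""
    patch = np.asarray(patch, dtype=np.float64)
    dy, dx = np.gradient(patch)                       # pixel differences along rows and columns
    magnitude = np.hypot(dx, dy)
    angle = np.degrees(np.arctan2(dy, dx)) % 360.0
    # Gaussian weight centred on the patch, so nearby pixels count more.
    rows, cols = np.indices(patch.shape)
    centre_r = (patch.shape[0] - 1) / 2.0
    centre_c = (patch.shape[1] - 1) / 2.0
    weight = np.exp(-((rows - centre_r) ** 2 + (cols - centre_c) ** 2) / (2.0 * sigma ** 2))
    bin_width = 360.0 / num_bins
    bins = (angle // bin_width).astype(int) % num_bins
    histogram = np.bincount(bins.ravel(), weights=(magnitude * weight).ravel(),
                            minlength=num_bins)
    # A bin is kept if it is a local peak (circularly) and close to the highest peak.
    is_peak = ((histogram > np.roll(histogram, 1))
               & (histogram > np.roll(histogram, -1))
               & (histogram >= peak_ratio * histogram.max()))
    return [float((b + 0.5) * bin_width) for b in np.flatnonzero(is_peak)]


# Assumed test patch: intensity increases equally along rows and columns.
rows, cols = np.indices((17, 17))
ramp = (rows + cols).astype(float)
orientations = dominant_orientations(ramp, sigma=4.0)
print(orientations)  # [45.0]
```

The function returns a list because more than one bin can pass the peak test, in which case the feature would be duplicated with one entry per orientation.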

3.3.2 Formation of Descriptor SIFT constructs a vector of size 128 as a descriptor. The orientation histograms constructed over pixels in the neighborhood form the descriptor. The histogram is normalized to achieve luminance invariance. There are variants of SIFT that modify the construction of the descriptor. PCA-SIFT reduces the size of the descriptor using principal component analysis¹⁶. KBP-SIFT uses kernel based projection to reduce the size of the descriptor to 36¹⁷. These two algorithms improve the speed of finding corresponding features. There are also algorithms that improve the accuracy of SIFT by adding components to the descriptor or by improving the process with additional modules. GLOH adds a global descriptor to the SIFT descriptor to achieve true correspondences in the presence of repeated local patterns within an image³. SURF computes the descriptor using Haar wavelet responses of the sub-regions of a square region constructed around the feature, aligned to the assigned orientation. The alignment of the square neighborhood is not required for Upright SURF, which does not handle rotational transformations¹⁰. Center Surround Extrema (CenSurE) is a feature detector that aims for real time detection and matching. It is a fast variant of the upright SURF detector with polygon shaped neighborhood regions¹⁸. STAR modifies CenSurE by replacing the polygon with two overlapping squares that differ by 45 degrees in their alignment¹⁹.


3.4 Point Feature Correspondence The corresponding features between the images are established by comparing the descriptors of both images. The methods for comparing feature pairs and classifying them as corresponding features are discussed below.

3.4.1 Distance based Methods A distance measure such as the Euclidean distance is calculated between two descriptors, one from each of the images to be registered. In threshold based comparison, if the distance falls below a threshold the pair is labeled as a corresponding feature pair of the images. More than one match may be returned for a single feature, thus producing ambiguous matches. The drawback of multiple matches is overcome in the nearest neighbor symmetric method, which returns the nearest feature in the other set as the corresponding feature. It returns only one matching feature. This method thus strongly depends on the descriptors and is less robust to noise. In the nearest neighbor ratio method, the ratio of the distance to the first nearest feature to the distance to the second nearest feature is calculated. The nearest neighbor is chosen as the corresponding feature when this ratio falls below a threshold. It is a widely used method that is computationally simple and returns reasonable match results³.
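The nearest neighbor ratio method can be sketched in a few lines. The example assumes descriptors stored as rows of NumPy arrays and uses brute force Euclidean distances; the function name `ratio_test_matches` and the threshold of 0.8 are illustrative choices, not values prescribed by the paper.

```python
import numpy as np


def ratio_test_matches(desc_a, desc_b, max_ratio=0.8):
    """Return (index_a, index_b) pairs that pass the nearest neighbor ratio test."""
    desc_a = np.asarray(desc_a, dtype=np.float64)
    desc_b = np.asarray(desc_b, dtype=np.float64)
    if len(desc_a) == 0 or len(desc_b) < 2:
        return []  # the ratio needs a first and a second nearest neighbor
    # Euclidean distance between every descriptor in A and every descriptor in B.
    distances = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    order = np.argsort(distances, axis=1)
    matches = []
    for i, (first, second) in enumerate(order[:, :2]):
        nearest = distances[i, first]
        runner_up = distances[i, second]
        # Accept only when the best match is clearly better than the second best.
        if nearest < max_ratio * runner_up:
            matches.append((i, int(first)))
    return matches


# Assumed toy descriptors (2-D for readability; real descriptors have 64 or 128 entries).
set_b = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
set_a = np.array([[0.5, 0.0],    # clearly closest to set_b[0]
                  [5.0, 0.0],    # equally far from set_b[0] and set_b[1]: ambiguous
                  [0.2, 9.5]])   # clearly closest to set_b[2]
matches = ratio_test_matches(set_a, set_b)
print(matches)  # [(0, 0), (2, 2)]
```

The second descriptor is rejected even though it has a nearest neighbor, because its two best candidates are equally close. This is how the ratio test discards ambiguous matches that a plain distance threshold would accept.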

3.4.2 Other Methods Methods have been developed for more accurate and faster matching. One such method is to check whether the neighboring features also possess corresponding features in the other set before labeling a feature pair as corresponding features. Indexing based on the sign of the Laplacian is proposed in SURF to speed up the retrieval of features from the feature set¹⁰.



3.5 Region Correspondence The scene captured by the unmanned aerial vehicle has to be located in the reference map for navigation. The region corresponding to the captured scene has to be computed from the set of corresponding features. A straightforward approach to find the corresponding region of the sensed image in the reference map is as follows:
• Estimation of the transformation function from the set of corresponding features.
• Finding the corresponding points for the boundary points of the sensed image using the estimated transformation function²⁰.

3.5.1 Choice of Transformation The affine transformation model, homography and projective transformation models are some of the transformation functions used to model the distortion between the images to be registered. There is a trade-off between the accuracy of the transformation model in modeling the distortion and the speed of estimation of the transformation function, while some algorithms involve methodologies for high speed computation of an accurate function.

3.5.2 Estimation of Transformation Function The parameters of the chosen transformation function have to be estimated from the set of corresponding features. Linear regression, Random Sample Consensus (RANSAC) and M-estimator Sample Consensus (MSAC) are some of the widely used methods for the estimation of the transformation function.

4. Results and Discussion Experiments were conducted for scene matching of UAV images. Two UAV videos were taken. One contained a natural terrain with trees and bushes and only translational movement (Figure 2), and the other contained building structures (Figure 3). A portion of the first frame of the video was taken and trials were made to compute its location in the subsequent frames of the video. Intensity based registration using Normalized Cross Correlation (NCC) produced the expected matching results for the video of natural terrain with only translational movement. Intensity based registration failed to work for the second video, which has rotational movement, as can be observed in Figure 4. Image matching was also performed using SURF, which is a variant of SIFT. It worked for the video containing manmade structures, and for a limited range of rotational movement, where the sensed and reference images contained distinct corner points; the matching probability decreased with increasing rotation, as can be seen from the graph in Figure 4. It failed on the first video, which lacked distinct corners.

Figure 2. Frame from Video 1 of the dataset.

Figure 3. Frame from Video 2 of the dataset.

Figure 4. Performance of Algorithms on Video 2.

5. Conclusion Feature based image matching is found to be ideal for the registration of images that contain manmade structures and whose possible transformations and spectrum are unknown. Feature based methods that are variants of SIFT are reported in the literature to provide superior performance, and Perspective SIFT is found to be suitable for real time matching, as it focuses on low altitude remote sensing images and models the transformation using the perspective camera model, which is closer to the real world scenario. It requires efficient implementation in hardware and software to meet time constraints.


6. References
1. Carr JR, Sobek JS. Digital scene matching area correlation. Proceedings SPIE 0238, Image Processing for Missile Guidance, 36; 1980 Dec 23. Available from: http://dx.doi.org/10.1117/12.959130. doi:10.1117/12.959130.
2. Lewis JP. Fast normalized cross-correlation. Vision Interface. 2003; 120–3.
3. Mikolajczyk K, Schmid C. A performance evaluation of local descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2005 Oct; 27(10):1615–30.
4. Yahyanejad S, Rinner B. A fast and mobile system for registration of low-altitude visual and thermal aerial images using multiple small-scale UAVs. ISPRS Journal of Photogrammetry and Remote Sensing. 2015 Jun; 104:189–202.
5. Tuytelaars T, Mikolajczyk K. Local invariant feature detectors: a survey. FnT Computer Graphics and Vision. 2008 Jan; 3(3):177–280.
6. Moravec H. Obstacle avoidance and navigation in the real world by a seeing robot rover. Carnegie-Mellon University, Robotics Institute; 1980 Sep. Tech Report CMU-RI-TR-3.
7. Harris C, Stephens M. A combined corner and edge detector. Proceedings of Fourth Alvey Vision Conference; 1988. p. 147–51.
8. Smith SM, Brady JM. SUSAN - A new approach to low level image processing. International Journal of Computer Vision. 1997 May; 23(1):45–78.
9. Zhiqiang HW, Earn Z, Teoh K. Gray level corner detection. IAPR Workshop on Machine Vision Applications. 1998 Nov 17–19; Makuhari, Chiba, Japan.
10. Bay H, Tuytelaars T, Gool LC. Surf: Speeded up robust features. ECCV. 2006; 3951:404–17.
11. Lindeberg T. Scale-space theory in computer vision. Kluwer Academic Publishers, Netherlands; 1994; p. 256.
12. Lowe DG. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision. 2004 Nov; 60(2):91–110.
13. Mikolajczyk K, Schmid C. Scale & affine invariant interest point detectors. International Journal of Computer Vision. 2004 Oct; 60(1):63–86.
14. Yu G, Morel JM. A fully affine invariant image comparison method. Proceedings of ICASSP ’09; Apr 19–24; Taipei; 2009; p. 1597–600.
15. Cai GR, Jodoin PM, Li SZ, Wu YD, Su SZ, Huang ZK. Perspective-SIFT: an efficient tool for low-altitude remote sensing image registration. Signal Processing. 2013 Nov; 93(11):3088–110.
16. Ke Y, Sukthankar R. PCA-SIFT: a more distinctive representation for local image descriptors. Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004); 2004 Jun 27-Jul 2; p. II-506–13.
17. Zhao G, Chen L, Chen G, Yuan J. KBP-SIFT: a compact local feature descriptor. Proceedings of the International Conference on Multimedia (MM '10); 2010; p. 1175–78.
18. Agrawal M, Konolige K, Blas MR. CenSurE: Center Surround Extremas for realtime feature detection and matching. Computer Vision. ECCV 2008 Lecture Notes in Computer Science. 2008; 5305:102–15.
19. Senst T, Unger B, Keller I, Sikora T. Performance evaluation of feature detection for local optical flow tracking. International Conference on Pattern Recognition Applications and Methods (ICPRAM 2012); 2012. 2:303–9.
20. Lowe DG. Object recognition from local scale-invariant features. Proceedings of the International Conference on Computer Vision. Kerkyra; 1999 Sep 20-27. 2:1150–7.
