IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 50, NO. 11, NOVEMBER 2012


A Vector SIFT Detector for Interest Point Detection in Hyperspectral Imagery

Leidy P. Dorado-Muñoz, Miguel Vélez-Reyes, Senior Member, IEEE, Amit Mukherjee, and Badrinath Roysam, Senior Member, IEEE

Abstract—This paper presents an algorithm for the extraction of interest points in hyperspectral images. Interest points are spatial features of the image that capture information from their neighbors, are distinctive and stable under transformations such as translation and rotation, are helpful in data reduction, and reduce the computational burden of various algorithms such as image registration by replacing an exhaustive search over the entire image domain with a probe into a concise set of highly informative points. Interest points have been applied to problems in computer vision, including image matching, recognition, 3-D reconstruction, and change detection. Interest point operators for monochromatic images were proposed more than a decade ago and have been studied extensively. An interest point operator seeks out points in an image that are structurally distinct, invariant to imaging conditions, and stable under geometric transformations. An extension of Lowe’s scale-invariant feature transform (SIFT) to vector images is proposed here. The approach takes the vectorial nature of the hyperspectral images into account. Furthermore, the multiscale representation of the image is generated by vector nonlinear diffusion, which leads to improved detection, because it preserves edges in the image better than the Gaussian blurring used in Lowe’s original approach. Experiments with hyperspectral images of the same and different resolutions that were collected with the Airborne Hyperspectral Imaging System (AISA) and Hyperion sensors are presented. Evaluation of the proposed approach using the repeatability criterion and image registration is carried out. Comparisons with other approaches that were described in the literature are presented.

Index Terms—Hyperspectral image processing, image registration, interest points, scale-invariant feature transform (SIFT), spatial feature detection.

Manuscript received February 22, 2011; revised December 22, 2011; accepted February 22, 2012. Date of publication May 9, 2012; date of current version October 24, 2012. This work was supported in part by the Bernard M. Gordon Center for Subsurface Sensing and Imaging Systems through the Engineering Research Centers Program of the National Science Foundation under Award EEC-9986821 and by the U.S. Department of Defense.
L. P. Dorado-Muñoz is with the Rochester Institute of Technology, Rochester, NY 14623-5603 USA.
M. Vélez-Reyes was with the Electrical and Computer Engineering Department, University of Puerto Rico at Mayagüez, Mayagüez 00681, Puerto Rico. He is now with the Electrical and Computer Engineering Department, University of Texas, El Paso, TX 79968 USA (e-mail: [email protected]).
A. Mukherjee is with the Rensselaer Polytechnic Institute, Troy, NY 12180 USA.
B. Roysam is with the Department of Electrical and Computer Engineering, University of Houston, Houston, TX 77204-4005 USA.
Digital Object Identifier 10.1109/TGRS.2012.2191791

I. INTRODUCTION

HYPERSPECTRAL sensors simultaneously collect data in hundreds of narrow and contiguous spectral bands over a range of the electromagnetic spectrum, which provide densely sampled spectra for each pixel in an image. After suitable calibration, this spectral signature can be compared with library signatures of different materials for the recognition or mapping of surfaces such as vegetation or minerals, which allows us to monitor, detect, and analyze global and environmental changes that are produced on the Earth’s surface [1].

Many applications such as change detection and image mosaicking require the registration of images of the same scene that might have been taken at different times, from different viewpoints, or by different sensors. The first step in the image registration process is feature detection [2], and the performance of registration depends heavily on this step. Feature detection methods should have good localization accuracy and be invariant to image transformations. In other words, the number of common elements or features that are detected should be sufficiently high, regardless of changes in image geometry, scale, and illumination, or the presence of noise [3].

Many feature selection methods are based on the extraction of salient and distinctive spatial structures or points in an image. Points that are invariant and stable under geometric transformations are called interest points. The operators that detect these points are called interest point operators; they have wide application in computer vision, and many algorithms have been developed. Some interest point operators detect intensity changes within a neighborhood to find regions of maximum change using the autocorrelation function of the signal [4], [5]. Other detectors add scale invariance, such as the methods that were reported by Lindeberg [6], Mikolajczyk [7], and Lowe [8]. Lowe’s operator [8] searches for local extrema in the scale space of an image. The scale space is generated by Gaussian smoothing.
The aforementioned operators have shown great performance on gray-scale or single-band images, but their extension to multispectral or hyperspectral images needs to consider that the information is distributed over several bands, and it is often difficult to accurately map a vector image onto a scalar image while preserving all the necessary structural information. In this paper, an extension of Lowe’s operator to hyperspectral images is carried out. In this extension, the structural information is captured from all spectral bands of the image, and the vectorial nature of pixels in the image is taken into account. A modification to Lowe’s approach is also proposed, where the scale-space representation of the image is generated by vector nonlinear diffusion instead of Gaussian smoothing. Nonlinear diffusion better preserves edges in the generated scale-space representation, which leads to improved detection of interest points.

This paper is organized as follows. A review of feature detection methodologies is presented in Section II-A. Lowe’s methodology for the detection of interest points in gray-scale images is reviewed in Section II-B. An extension of Lowe’s approach to hyperspectral images as proposed by Mukherjee is described in Section II-C. Section III presents the proposed approach for interest point detection in hyperspectral images. Section IV presents experimental results for different hyperspectral images. The performance of the algorithm was evaluated using the repeatability criterion and the matching percentage for image registration using hyperspectral images at different spatial resolutions over the same scene. A comparison between Mukherjee’s approach and the proposed vector SIFT detector is also presented.

0196-2892/$31.00 © 2012 IEEE

II. BACKGROUND

A feature is a concept used in image processing and computer vision to denote a piece of information that is relevant for solving a particular image processing task. Spatial features that are useful for registration are specific structures such as points, lines, edges, or corners, and more complex structures such as objects. The choice of features for registration is dependent on the data type. The idea is to extract salient and significant features that are spread over the image and are common elements in the images that are registered, even when one of the images has undergone changes of scale or transformations such as rotations and translations, when the images do not cover exactly the same scene, or when they are degraded by noise in the acquisition system [9]. These salient spatial features are called interest points or keypoints. In general, interest points are located at distinctive locations such as corners, junctions, edges, or blobs (a point or region that is brighter or darker than its surroundings) where the region around them is highly informative. In addition, interest points are helpful in data reduction and in the minimization of computational burden and cost, because processing can be performed using only the detected features.

A. Methodologies for the Detection of Interest Points

The operators that are used for interest point detection are based on edge or blob detection and on differentiation operators, where the difference between two horizontally or vertically adjacent pixels can give information about changes in intensity and can emphasize the boundaries of features or objects within an image [10]. In addition, the operators should be invariant to scale, illumination, affine transformations, and noise to guarantee that detected features are stable and invariant.

In [11], Schmid et al. classify interest point detectors as follows: 1) contour-based methods; 2) intensity-based methods; and 3) parametric methods. Parametric methods fit a parametric intensity function to the image. Contour-based methods detect points from curves of 2-D objects, taking advantage of spatial characteristics such as changes of curvature and inflexion [12]. These methods extract line segments from curves and contours to use the intersection among grouped line segments as interest points [13]. Intensity-based detectors compute a measure that indicates the presence of interest points from the image gray-level values.

Scale invariance for feature detection was proposed by Lindeberg [4], where a methodology for detecting features and performing an automatic scale selection was proposed. The method was based on the hypothesis that local maxima over different scales of normalized derivatives were likely candidates to correspond to structures in the image. Mikolajczyk and Schmid [7] used this scale invariance to propose an interest point detector that was invariant to scale and affine transformations. Because of affine transformations, invariance is determined by the repeatability of the results. More recently, Lowe [8] proposed an operator for intensity images that finds interest points through a multiscale representation of the image and the detection of local extrema. This operator, called the scale-invariant feature transform (SIFT) operator, involves the following two stages: 1) interest point detection and 2) interest point description. The SIFT operator, aside from being invariant to scale and viewpoint transformations, performs feature extraction more efficiently than other approaches and identifies a large number of features, allowing reliable matching of features through a considerable range of affine distortions.

Variations to the SIFT descriptor have been proposed for scalar images. Ke and Sukthankar [14] improved the local image descriptor of SIFT by applying principal components analysis (PCA) instead of using SIFT’s smoothed weighted histograms, as in the original SIFT methodology. They demonstrated that a local descriptor based on PCA is more distinctive, more robust to image deformations, and more compact than the original SIFT descriptor. An extension of SIFT to color images was proposed in [15], where a comparison of this descriptor with other color descriptors was carried out, looking at discriminative power and constancy under imaging conditions. An extension of Lowe’s operator to hyperspectral images was proposed in [16], which used a scale-space representation for the first few principal components of the image and combined them using a norm to produce a scalar image that was analyzed using Lowe’s approach.
The PCA transformation decorrelates the spectral bands, and the scale-space representation is generated in the reduced-dimensionality space. SIFT has also been used in hyperspectral imaging for spectral matching in [17], where a 1-D SIFT operator was applied to the spectral signature at each pixel, and spectral interest points were extracted.

B. Lowe’s Approach to Gray-Scale or Scalar Images

The SIFT operator finds repeatable and distinctive features in redundant data, and it is widely used in image matching. Lowe’s SIFT operator is composed of two stages. The first stage is the detection of interest points. The interest point operator seeks out points in the image that are brighter or darker than their surroundings, i.e., blobs. Fig. 1 summarizes Lowe’s methodology for interest point detection [8].

Fig. 1. Lowe’s interest point detector [8] for scalar images.

Initially, Gaussian blurring is used to generate a linear scale-space function and to ensure that interest point locations are invariant to scale changes. The scale-space representation is built from the difference-of-Gaussian (DoG) function, which is computed from the difference of two nearby scales that are separated by a constant multiplicative factor k, i.e.,

$$\mathrm{DoG}(x, y, \sigma) = L(x, y, k\sigma) - L(x, y, \sigma) \tag{1}$$

where x, y are the spatial dimensions of the image, and σ is the standard deviation of the Gaussian function. L(x, y, σ) and L(x, y, kσ) are smoothed images that are computed according to

$$L(x, y, \sigma) = G(x, y, \sigma) * u(x, y) \tag{2}$$

with u(x, y) being the original image, and G(x, y, σ) being a Gaussian kernel, i.e.,

$$G(x, y, \sigma) = \frac{1}{2\pi\sigma^2}\, e^{-\frac{x^2 + y^2}{2\sigma^2}}. \tag{3}$$

The DoG function (1) provides a close approximation to the scale-normalized Laplacian of Gaussian that was studied by Lindeberg [18], which is required for true scale invariance. Because this operator is based on blob detection, a search for the local extrema of the DoG(x, y, σ) function is performed. The local extrema are found by comparing the current sample point with its 26 neighbors: 8 neighbors are located in the current scale, and 18 neighbors are from the scales above and below. If the intensity value of the sample pixel is a minimum or maximum, the pixel is considered an extremum, and it qualifies as an interest point candidate.

The DoG function response is strong along edges, even when their location is poorly defined. Local maxima can also be detected in a neighborhood of contours where the points are more sensitive to noise; hence, it is necessary to perform, for each extremum, a subpixel interpolation in position and scale. First, each candidate point is fitted with a quadratic approximation of the DoG function. Then, the unsuitable extrema (e.g., those with low contrast or with a strong response along edges but a poorly defined location) are rejected. To determine if a point is an unsuitable extremum, the principal curvatures of the DoG function are used, which give information about how the surface bends by different amounts in different directions at a given point. At each point, there are two directions in which the curvature reaches its maximum and minimum values. These directions are perpendicular and are called principal directions, and the maximum and minimum curvature values are called principal curvatures, denoted as α and β. These directions are related to the eigendecomposition of the Hessian matrix [19], [20], where its eigenvalues α and β are the principal curvatures, and its eigenvectors are the principal directions. For a 2-D scalar image, the Hessian of the DoG function (1) is given by

$$H(\mathrm{DoG}(x, y)) = \begin{bmatrix} \dfrac{\partial^2}{\partial x^2}\mathrm{DoG} & \dfrac{\partial^2}{\partial x\,\partial y}\mathrm{DoG} \\[2mm] \dfrac{\partial^2}{\partial y\,\partial x}\mathrm{DoG} & \dfrac{\partial^2}{\partial y^2}\mathrm{DoG} \end{bmatrix}. \tag{4}$$

The sum and the product of the eigenvalues α and β can be calculated from the trace and the determinant of H as [7]

$$\begin{aligned} \mathrm{Tr}(H) &= \frac{\partial^2}{\partial x^2}\mathrm{DoG} + \frac{\partial^2}{\partial y^2}\mathrm{DoG} = \alpha + \beta \\ \mathrm{Det}(H) &= \frac{\partial^2}{\partial x^2}\mathrm{DoG}\,\frac{\partial^2}{\partial y^2}\mathrm{DoG} - \left(\frac{\partial^2}{\partial x\,\partial y}\mathrm{DoG}\right)^2 = \alpha\beta. \end{aligned} \tag{5}$$

Using the ratio r = α/β, the ratio of the squared trace and the determinant of H can be written as

$$\frac{\mathrm{Tr}(H)^2}{\mathrm{Det}(H)} = \frac{(\alpha + \beta)^2}{\alpha\beta} = \frac{(r\beta + \beta)^2}{r\beta^2} = \frac{(r + 1)^2}{r}. \tag{6}$$


Fig. 2. Interest point detector that was proposed by Mukherjee et al. [16] for hyperspectral images.

The ratio in (6) depends only on r, and the quantity $(r + 1)^2/r$ is a minimum when r = 1 and increases with r. When r = 1, the principal curvatures are equal, which means that the curvature of the surface in both directions is the same. On the other hand, if r > 1, the curvature in one direction is larger than the curvature in the other direction. Notice that, as $r \to \infty$, $(r + 1)^2/r = r + 2 + 1/r$ grows like r. A threshold test using (6) can be used to determine whether the point under consideration is an interest point as follows:

$$\text{if } \frac{\mathrm{Tr}(H)^2}{\mathrm{Det}(H)} < \frac{(r + 1)^2}{r}, \text{ then the extremum is an interest point.} \tag{7}$$
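The candidate search over 26 neighbors and the curvature-ratio test in (7) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the DoG stack is a nested list indexed as `dog[scale][row][col]`, the function names `is_local_extremum` and `passes_edge_test` are hypothetical, the Hessian is approximated with central finite differences, the extremum comparison is strict, and points with a nonpositive determinant are rejected (an added convention for curvatures of opposite sign).

```python
from typing import List

Image = List[List[float]]
Stack = List[Image]  # assumed layout: dog[scale][row][col]


def is_local_extremum(dog: Stack, s: int, y: int, x: int) -> bool:
    """True if dog[s][y][x] is strictly above or below all 26 neighbors.

    The caller must keep (s, y, x) at least one sample away from every border.
    """
    center = dog[s][y][x]
    neighbors = [
        dog[s + ds][y + dy][x + dx]
        for ds in (-1, 0, 1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (ds, dy, dx) != (0, 0, 0)
    ]
    return center > max(neighbors) or center < min(neighbors)


def passes_edge_test(img: Image, y: int, x: int, r: float = 10.0) -> bool:
    """Test (7): Tr(H)^2 / Det(H) < (r + 1)^2 / r at an interior pixel."""
    # Second derivatives by central finite differences (an assumption).
    dxx = img[y][x + 1] - 2.0 * img[y][x] + img[y][x - 1]
    dyy = img[y + 1][x] - 2.0 * img[y][x] + img[y - 1][x]
    dxy = (img[y + 1][x + 1] - img[y + 1][x - 1]
           - img[y - 1][x + 1] + img[y - 1][x - 1]) / 4.0
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0.0:
        # Curvatures of opposite sign, or a flat direction: reject.
        return False
    return trace * trace / det < (r + 1.0) ** 2 / r
```

On an isotropic blob the two curvatures are equal, so the ratio is 4 and the point is kept; on a ridge, where one curvature is much smaller than the other, the ratio exceeds the threshold and the point is rejected.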

A value of r = 10 is used by Lowe [8] and also by Mukherjee et al. [16].

Hyperspectral imagery (HSI) can be considered a vector or a multichannel image. Interest point detection could be individually performed by processing each band and then fusing results from all bands. This approach does not take advantage of the correlation between bands.

C. Mukherjee et al.’s Extension to Hyperspectral Images

Mukherjee et al. [16] presented an approach for a SIFT detector for HSI, which takes into account the vectorial nature of the image. There, the image is initially transformed to an uncorrelated image using principal components, and each PC is individually processed. Fig. 2 shows a diagram that describes Mukherjee et al.’s approach [16]. PCA is used for decorrelation and dimensionality reduction and to eliminate false detections that arise from noise or spatial irregularities. Principal components whose variance spans more than 5% of the data variance are retained, as well as smaller components that are acceptably smooth in a sense defined by

$$S_{n \times n} = \frac{\mathrm{std}(\hat{A}_i)}{\mathrm{std}(A_i - \hat{A}_i)} \tag{8}$$

where $A_i$ is the ith principal component, $\hat{A}_i$ is the minimum mean absolute error estimate in the local n × n neighborhood of each pixel, and std(·) is the standard deviation. Once the hyperspectral image has been reduced to a few PC projections, the generation of the scale-space representation is individually performed on each PC using (2) and (3). The DoG for each PC is computed, where the standard deviation in the Gaussian kernels is selected as follows:

$$\sigma_s \in \{k^s\}; \quad k = 2^{1/3}, \quad s = -1, 0, 1, \ldots, \frac{\log(s_{\max})}{\frac{1}{3}\log(2)}. \tag{9}$$

The scale discretization in (9) is motivated by Lowe [8], and a factor of 3 implies that each octave of scale is sampled at three sample points. The constant $s_{\max}$ is specified according to the maximum scale of the interest points, given in pixels. The multiple DoGs are combined using a function $f_m : \mathbb{R}^M \to \mathbb{R}^+$, called a combination function, that generates a scalar image by combining the same-scale DoG for each PC. The following four combination functions were evaluated by Mukherjee et al.: 1) L1-norm; 2) L1-top50; 3) earth mover’s distance [21]; and 4) diffusion distance [22]. Once the DoG functions are combined, the stages of local extrema candidate detection and unstable point rejection exactly follow those used in Lowe’s approach, as shown in Fig. 1.

Mukherjee et al. [16] evaluate the methodology using the repeatability of interest points found in two time-lapse Hyperion images [34]. The criterion for defining if two interest points from the two images match is, as stated in [16], “if they are located within one standard deviation of their average scale and the ratio of their scale is between 0.5 and 2” when the images are registered.

Although, in many cases, a suitable selection of PCs reduces the dimensionality of the data set and keeps the information required by some applications, this selection is dependent on the data and could produce information loss. Likewise, there is a loss of information in combining the DoG functions of the PCs; hence, it is of interest to search for other ways of extending Lowe’s operator to hyperspectral images that directly consider the vectorial nature of spectral responses.

III. SIFT OPERATOR FOR VECTOR IMAGES

The proposed vector SIFT operator for interest point detection is depicted in Fig. 3. The extension to vector images is achieved by replacing scalar operations in Lowe’s approach with vector-valued operations. In addition, the scale-space representation is generated using vector nonlinear diffusion [23] instead of Gaussian smoothing. For local extrema detection and rejection of unstable points, vector-ordering methods [27]–[31] and the second fundamental form [32] are used. These operations are described next.

Fig. 3. Proposed vector SIFT detector for hyperspectral images.

A. Generation of Scale Space With Vector Nonlinear Diffusion

Linear Scale Space: The first step in interest point detection is to identify locations and scales of distinctive and stable points. The search for these points is performed across all possible scales of a function called a scale space [8]. The scale space is an image representation, parameterized by a scale parameter, that represents simplified structures at coarse scales and more complex structures at fine scales. In 1983, Witkin [24] introduced a methodology for constructing a linear scale-space representation by the consecutive convolution of a Gaussian kernel with the original image, which satisfied the axioms of linearity, causality, isotropy, and homogeneity [18]. Parabolic partial differential equations obey the maximum principle [25], which is equivalent to the causality axiom of the scale-space theory. Therefore, the diffusion equation, which is a specific case of a parabolic partial differential equation, can be used to generate a scale-space representation. The diffusion equation is given by

$$\frac{\partial}{\partial t} u(x, y, t) = \nabla \cdot \big(c\, \nabla u(x, y, t)\big) = \frac{\partial}{\partial x}\left(c(x, y, t)\,\frac{\partial u}{\partial x}\right) + \frac{\partial}{\partial y}\left(c(x, y, t)\,\frac{\partial u}{\partial y}\right). \tag{10}$$

In linear scale space, the diffusion coefficient c(x, y, t) is a constant, and the diffusion can be shown to be equivalent to Gaussian smoothing [see (2) and (3)]. Gaussian blurring smooths homogeneously across the entire image without considering the natural boundaries, which hinders the detection of edges and distinctive features such as interest points.

Fig. 4. DoS and DoG comparison using a simulated 2-m panchromatic image. (a) DoS images at three different scales. (b) DoG images at the same scales. To improve image clarity, a linear stretch was applied to DoS and DoG images.

Nonlinear Scale Space: Linear diffusion causes blurring of edges as the scale is increased. However, in many image-processing applications, such as interest point detection, it is important that natural boundaries that are present in an image are preserved and the blurring level is controlled. This problem can be addressed by using nonlinear diffusion, where the value of c(x, y, t) varies in a way that sharpens natural edges of the image and smooths uniform regions. The use of nonlinear diffusion for image processing was proposed by Perona and Malik [25], where by appropriately choosing the diffusion coefficient c(x, y, t), it is possible to blur within separate regions without interaction among them. It can also satisfy the homogeneity and isotropy criteria. The diffusion coefficient c(x, y, t) is a nonlinear function of the image gradient magnitude |∇u|, with 0 ≤ c(x, y, t) ≤ 1.

The extension of nonlinear diffusion to vector-valued images was proposed by Weickert and Brox [26], and Duarte-Carvajalino et al. [23] applied it to hyperspectral images. In addition, Duarte-Carvajalino et al. [23] added the term 1/M, with M being the number of spectral bands, to make the diffusion coefficient independent of the number of bands, i.e.,

$$\frac{\partial}{\partial t} u_i(x, y, t) = \nabla \cdot \left( c\left( \frac{1}{M} \sum_{j=1}^{M} \left|\nabla u_j(x, y, t)\right|^2 \right) \nabla u_i \right), \quad i = 1, 2, \ldots, M. \tag{11}$$
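To make the coupling in (11) concrete, the following is a minimal sketch of one explicit Euler step on a small multiband image stored as `u[band][row][col]`. Several choices here are assumptions for illustration and are not taken from the paper: the diffusivity form `1 / (1 + s2 / kappa**2)` (a Perona-Malik-type function), the contrast parameter `kappa`, the explicit time stepping with `dt <= 0.25`, the averaged edge diffusivity, and the reflecting borders. The function names are hypothetical.

```python
from typing import List

Band = List[List[float]]


def diffusivity(s2: float, kappa: float) -> float:
    """Perona-Malik-type diffusivity in (0, 1]; the exact form is an assumption."""
    return 1.0 / (1.0 + s2 / (kappa * kappa))


def mean_sq_gradient(u: List[Band]) -> Band:
    """(1/M) * sum_j |grad u_j|^2 per pixel, central differences, clamped borders."""
    m, h, w = len(u), len(u[0]), len(u[0][0])
    g = [[0.0] * w for _ in range(h)]
    for band in u:
        for y in range(h):
            for x in range(w):
                gx = 0.5 * (band[y][min(x + 1, w - 1)] - band[y][max(x - 1, 0)])
                gy = 0.5 * (band[min(y + 1, h - 1)][x] - band[max(y - 1, 0)][x])
                g[y][x] += (gx * gx + gy * gy) / m
    return g


def diffusion_step(u: List[Band], dt: float = 0.2, kappa: float = 1.0) -> List[Band]:
    """One explicit step of (11); every band shares the same coefficient c."""
    h, w = len(u[0]), len(u[0][0])
    g = mean_sq_gradient(u)
    c = [[diffusivity(g[y][x], kappa) for x in range(w)] for y in range(h)]
    out = []
    for band in u:
        new = [row[:] for row in band]
        for y in range(h):
            for x in range(w):
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w:  # reflecting border
                        c_edge = 0.5 * (c[y][x] + c[ny][nx])
                        new[y][x] += dt * c_edge * (band[ny][nx] - band[y][x])
        out.append(new)
    return out
```

Because the coefficient is computed once from all bands, an edge that is strong in any band slows diffusion across that location in every band. A small `kappa` therefore preserves a step edge, whereas a very large `kappa` makes `c` close to 1 everywhere and the step approaches linear diffusion.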

A computationally efficient algorithm for (11) was proposed by Duarte-Carvajalino et al. [23], where fast linear solvers and approximation schemes such as additive operator splitting are used. Their approach is used for generating the scale-space representation in the proposed vector SIFT detector. We generate a scale-space representation of the hyperspectral image as a stack of vector images, with the original image being the initial condition and scale values being equal to $T = \frac{1}{2}\sigma^2$ [27], where the σ values are calculated from (9). Once the smoothed images at different scales have been generated, the vector difference between adjacent scales (DoS) is computed, and it is used to find the extrema. The DoS function serves the same purpose as the DoG function in Lowe’s approach for scalar images. Because nonlinear diffusion preserves edges better than Gaussian smoothing across scales, this improves the detection of edges and similar structures that are candidates for interest points. Fig. 4 shows the DoG and the DoS functions for a scalar image. Notice how edges of the image are much better preserved in the DoS function across scales, whereas in the DoG function, edges disappear because of Gaussian blurring when the scale value is increased. Hence, when performing the local extrema operation on the DoS function, edges and similar structures are more likely to appear as extrema points than on the DoG function, where they are flattened out because of Gaussian blurring.
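The scale bookkeeping described above can be sketched as follows: σ values from (9), diffusion times T = σ²/2, and the DoS as a per-pixel vector difference between adjacent scale levels. This is an illustration under assumptions: the function names are hypothetical, the upper index in (9) is rounded down, the stack layout is `stack[level][band][row][col]`, and the sign convention (coarser level minus finer level) simply mirrors (1). The smoothing itself is not performed here.

```python
import math
from typing import List


def scale_schedule(s_max: float, per_octave: int = 3) -> List[float]:
    """Sigma values k**s for s = -1, 0, 1, ..., with k = 2**(1/per_octave)."""
    k = 2.0 ** (1.0 / per_octave)
    # Rounding the last index down is an assumption; the small epsilon
    # guards against floating-point error when s_max is a power of k.
    s_last = math.floor(math.log(s_max) / math.log(k) + 1e-9)
    return [k ** s for s in range(-1, s_last + 1)]


def diffusion_times(sigmas: List[float]) -> List[float]:
    """Diffusion stopping times T = sigma^2 / 2 for each scale."""
    return [0.5 * s * s for s in sigmas]


def difference_of_scales(stack):
    """DoS: element-wise difference between adjacent levels of the stack.

    stack[level][band][row][col] -> result with one level fewer,
    computed as (coarser level) - (finer level).
    """
    return [
        [
            [[b - a for a, b in zip(row_lo, row_hi)]
             for row_lo, row_hi in zip(band_lo, band_hi)]
            for band_lo, band_hi in zip(level_lo, level_hi)
        ]
        for level_lo, level_hi in zip(stack, stack[1:])
    ]
```

With three samples per octave, σ doubles every three levels, so the diffusion time T grows by a factor of four per octave.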

B. Local Extrema of Pixel-Vectors

The next stage of the SIFT detector is the determination of extrema points for the DoS function. Because pixels in HSI are vectors, finding the extrema points is not as well defined as in the scalar case. According to [28]–[31], vector-ordering methods can be classified into marginal, reduced, and conditional ordering. In marginal ordering (or M-ordering), the components of the vector are independently ordered along each dimension. In reduced ordering (R-ordering), vectors are mapped onto a scalar value and are ordered according to an ordering of the scalar. In conditional ordering (C-ordering), the vectors are ordered based on the marginal ordering of some of their components. The advantages and disadvantages of the methods are discussed in the referenced literature. For example, marginal ordering can output vectors that are not present in the input vector set. Conditional ordering uses few components for ranking and may not take full advantage of the vectorial nature of the data but produces an output that is a member of the input set. Reduced ordering also does not take full advantage of the vectorial nature of the data but also produces an output that is a member of the input set.

Lexicographical ordering [29] is used here for pixel ordering. This is a type of conditional ordering and has two characteristics that make it suitable for vector ordering in the context of hyperspectral image data. First, lexicographical ordering preserves input vectors and does not introduce any new vector pixels in the results (i.e., the output vector is a member of the input set). In addition, the computed extrema vectors are unique [29]. Equation (12) is a classical lexicographical comparison, and some variations have been proposed [30], with the purpose of better tuning the priority of the components and the degree of their influence

$\forall a, b \in \mathbb{R}^M$, a
