Improved WαSH Feature Matching Based on 2D-DWT for Stereo Remote Sensing Images

Mei Yu 1,2,3, Kazhong Deng 1,2,*, Huachao Yang 1,2 and Changbiao Qin 4

1 NASG Key Laboratory of Land and Environment and Disaster Monitoring, China University of Mining and Technology, Xuzhou 221116, China; [email protected] (M.Y.); [email protected] (H.Y.)
2 School of Environment Science and Spatial Informatics, China University of Mining and Technology, Xuzhou 221116, China
3 Jiangsu Key Laboratory of Resources and Environmental Information Engineering, China University of Mining and Technology, Xuzhou 221116, China
4 Wits Mining Institute, University of the Witwatersrand, Johannesburg 2050, South Africa; [email protected]
* Correspondence: [email protected]; Tel.: +86-135-1256-5985
Received: 16 July 2018; Accepted: 15 October 2018; Published: 16 October 2018
Abstract: Image matching is an outstanding issue because of the geometric and radiometric distortion that exists between stereo remote sensing images. Weighted α-shape (WαSH) local invariant features are tolerant to image rotation, scale change, affine deformation, illumination change, and blurring. However, since the number of WαSH features is small, it is difficult to obtain enough matches to estimate a satisfactory homography matrix or fundamental matrix. In addition, the WαSH detector is extremely sensitive to image noise because it is built on sampled edges. Considering these shortcomings of the WαSH detector, this paper improves the WαSH feature matching method based on the 2D discrete wavelet transform (2D-DWT). The method firstly performs 2D-DWT on the image and then detects WαSH features on the transformed images. According to the method used to construct descriptors for the WαSH features, three matching methods, based on wavelet transform WαSH features (WWF), improved wavelet transform WαSH features (IWWF), and layered IWWF (LIWWF), are distinguished with respect to the character of the sub-images. The experimental results on a dataset containing affine distortion, scale distortion, illumination change, and noisy images showed that the proposed methods acquired more matches and better stability than WαSH. Experiments on remote sensing images with slight affine distortion and slight noise showed that the proposed methods obtained correct matching rates greater than 90%. For images containing severe distortion, KAZE obtained a 35.71% correct matching rate, which is unacceptable for calculating the homography matrix, while IWWF achieved a 71.42% correct matching rate. IWWF was the only method that achieved a correct matching rate of no less than 50% for all four test stereo remote sensing image pairs and was the most stable compared to MSER, DWT-MSER, WαSH, DWT-WαSH, KAZE, WWF, and LIWWF.

Keywords: stereo remote sensing image; feature matching; 2D-DWT; WαSH; MSER; image deformation
1. Introduction

Image feature matching, which aims to acquire reliable homonymous features—matches between images—is one of the most basic and important processes of remote sensing image processing [1,2]. The accuracy of the matching results directly affects the reliability of image registration [3] and fusion, automatic moving object detection [4], and aerial triangulation and 3D reconstruction [5]. Feature matching includes three steps: feature detecting, feature describing, and similarity matching.

At present, the most widely used feature matching algorithm is the Scale Invariant Feature
Transform (SIFT) algorithm [6], which firstly uses the difference-of-Gaussians (DoG) operator to identify potential interest features, followed by feature localization and orientation assignment, then describes the features according to local image gradients, and finally performs similarity matching to acquire matches. The SIFT algorithm is adaptable to distortions such as scale, rotation, and translation. Therefore, it is used in most matching processes for aerial or aerospace images. Yu et al. [1] employed the SIFT algorithm to create a set of matches, then removed outliers from the matches and estimated the transformation relationship between unmanned aerial vehicle (UAV) images by using a maximum likelihood framework. This achieved feature matching of UAV images under non-rigid transformation in precision agriculture. Chang et al. [7] combined the SIFT algorithm with scale constraints and geometric constraints to achieve highly accurate registration of high-resolution satellite images. Xiang et al. [8] used a SIFT-like matching algorithm to extract reliable matches for GF-3 SAR images and then provided a coarse-to-fine registration algorithm, giving a robust, efficient, and precise performance for SAR images. Qi and Zhu [9] decomposed remote sensing images by the discrete wavelet transform (DWT), extracted SIFT features on the low-frequency sub-images, and then used a coarse-to-fine strategy to improve the precision and efficiency of SIFT.

However, because of the affine distortion caused by camera view changes at the time of image acquisition, the matching performance of the SIFT algorithm decreases sharply, with the result that reliable potential matches cannot be obtained [10,11]. In this case, affine-invariant features, such as Harris-Affine [12], maximally stable extremal regions (MSER) [13], and edge-based regions (EBR) [14,15], are required. Mikolajczyk et al. [16] compared six common affine-invariant feature detectors; the results showed that MSER performs best in most cases and does well on images containing homogeneous regions with distinctive boundaries. Based on the MSER detector, Zhang et al. [17] proposed an automatic coarse-to-fine image registration strategy that achieves high accuracy for images containing affine distortion. Sedaghat and Ebadi [18] extracted MSER and Harris-Affine features and selected the initial matches using the Euclidean distance between feature descriptors, followed by a consistency check process; oriented least squares matching was then developed to improve the positional accuracy of the affine-invariant features. Their affine-invariant image matching method was successfully applied to matching various synthetic and real close-range and satellite images. Yu et al. [19] acquired initial matches based on the MSER detector and calculated the affine transformations between the initial matches. A small number of high-accuracy matches were detected by combining SIFT feature matching under the constraint of affine transformation with the neighborhood supporting principle; propagation was then performed based on global affine transformation constraints to find more matches satisfying the accurate registration of oblique images.

The weighted α-shape (WαSH) [20,21] algorithm has a certain stability to the geometric and radiometric distortion of images and can detect local invariant features with high distinctiveness.
Compared with Hessian-Affine [12], MSER, the medial feature detector (MFD) [22], edge foci (EF) [23], and KAZE [24], Varytimidis et al. [20] found that the WαSH feature detection algorithm performs among the best in all cases of image distortion through a comprehensive analysis of factors such as repeatability, matching score, and number of matches. However, there are also some shortcomings of WαSH feature matching: (1) when there are large geometric distortions between images, including affine changes and scale changes, WαSH feature matching detects few matches, which contain a significant proportion of false matches, and this causes failure in calculating the transformation model between image pairs; (2) since the WαSH features are extracted based on image edge information, the repeatability and matching score decrease greatly if there is significant noise in the images.

In order to deal with the problems above, we improve the WαSH feature matching method based on 2D-DWT. The method firstly performs 2D-DWT decomposition on the original image and obtains four sub-images: LL, LH, HL, and HH. We discard the HH layer, in which the noise in the original image is concentrated, and transform the remaining sub-images to the original image size. The WαSH detector is then performed on the transformed images. Depending on the descriptor construction methods for the WαSH features, three matching methods, based on the wavelet transform WαSH feature (WWF), the improved wavelet transform WαSH feature (IWWF), and the layered IWWF (LIWWF), respectively, are distinguished. The matches are finally determined by similarity matching. The procedure of improved WαSH feature matching based on 2D-DWT is illustrated in Figure 1.
Figure 1. Procedure of improved WαSH feature matching based on 2D-DWT. I1 and I2 are the reference image and the image to be matched, respectively.
2. 2D-DWT
The DWT of an image transforms the image data from the spatial domain to the frequency domain by using hierarchical decomposing functions [25]. A 2D decomposition is achieved by sequentially applying the 1D-DWT along the horizontal and vertical directions, respectively [26]. In a 1D decomposition, there are both low-pass (L) and high-pass (H) filters along each direction, which leads to one "L" and one "H" sub-band, respectively. The "L" sub-band corresponds to low-frequency components in the wavelet domain, while the "H" sub-band corresponds to high-frequency ones. After the first-level 2D decomposition, there are four sub-bands in the wavelet domain, which are labeled LL, LH, HL, and HH, respectively.

The LL sub-band is generated by convolving the low-pass wavelet filter along the horizontal and vertical directions of the image. It is an approximate representation of the original image. The LH sub-band is generated by convolving a low-pass wavelet filter in the horizontal direction and then a high-pass wavelet filter in the vertical direction; it represents the characteristics in the vertical direction of the original image. The HL sub-band is generated by convolving a low-pass wavelet filter in the horizontal direction after a convolution with a high-pass wavelet filter in the vertical direction; it represents the horizontal characteristics of the original image. The HH sub-band is generated by convolving high-pass wavelet filters in both directions; it represents the diagonal edge characteristics of the image. Therefore, the LL sub-band contains most of the features of the original image, while the noise in the original image is concentrated in the HH sub-band.
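To make this step concrete, the following Python sketch performs a one-level Haar 2D-DWT, discards the HH sub-band, and resizes the remaining sub-bands back to the original image size, as the proposed methods require. This is a minimal sketch assuming the PyWavelets and OpenCV libraries, which the paper does not name; pywt labels the detail bands as horizontal, vertical, and diagonal, and we map them here to the LH, HL, and HH naming used above.

```python
# Minimal sketch of the 2D-DWT preprocessing described above.
# PyWavelets (pywt) and OpenCV (cv2) are illustrative choices, not the
# paper's stated implementation.
import cv2
import numpy as np
import pywt

def dwt_layers(image):
    """Return the LL, LH, and HL sub-images, resized to the input size.

    The HH sub-band, where most image noise is concentrated, is discarded.
    """
    # pywt returns the approximation and (horizontal, vertical, diagonal)
    # detail bands; we map them to the paper's LH/HL/HH naming.
    ll, (lh, hl, hh) = pywt.dwt2(image.astype(np.float32), 'haar')
    h, w = image.shape[:2]
    # Resize each retained sub-band back to the original image size so that
    # feature coordinates from all layers share one coordinate system.
    resize = lambda band: cv2.resize(band, (w, h), interpolation=cv2.INTER_LINEAR)
    return resize(ll), resize(lh), resize(hl)

if __name__ == '__main__':
    img = cv2.imread('reference.png', cv2.IMREAD_GRAYSCALE)  # hypothetical file name
    ll, lh, hl = dwt_layers(img)
```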
3. WαSH Feature Detector

The WαSH detector exploits stable and distinctive dominant structural features in an image on the basis of image edges and the α-shape. It can be divided into four parts, namely edge detecting, edge sampling, WαSH constructing, and feature extracting. The processes are as follows:

(1) Edge detecting: g is the normalized gradient image of the input image in [0, 1]. The binary edge image e is acquired by applying the Canny detector on g.

(2) Edge sampling: with a fixed sampling interval s, e is sampled uniformly along edges to obtain a discrete set of edge points P ⊆ R². For each point p ∈ P, the weight w(p) is defined as a multiple of its gradient strength:

w(p) = g(p)(s/2)²,  (1)

where g(p) ∈ [0, 1] is the value of g at p.

(3) WαSH construction: the regular triangulation of P is calculated. The line segments and triangles of the triangulation are added into a collection K and are ordered by descending size, followed by the weighted α-shape construction. For more details, please see [20,21].

(4) Feature extracting: the neighbors of each triangle σT ∈ K are its three edges, while the neighbors of each line segment σE ∈ K are the two adjacent triangles in the triangulation. Since the size of an edge is not larger than that of its two adjacent triangles, and α is decreasing, the intuition is that this edge can keep the two triangles disconnected until it is processed itself. Based on this neighborhood system, each connected weighted α-complex is called a component. Considering each element (line segment or triangle) of K, in descending order of size, as an independent component, the components are joined with their neighbors that have already been processed. Decreasing the value of α continuously, the strength of a component k_U is calculated following Equation (2) when it is joined to another component through its boundary element:

s(k_U) = a(k_U)/ρ_T,  (2)

where a(k_U) is the total area of k_U (the area of a line segment is 0) and ρ_T is the size of the boundary element. If the strength is greater than a fixed threshold τ, the component is determined to be a WαSH feature, which is consequently fitted to an ellipse. We assume the image coordinates of the ellipse center are (u, v) and the ellipse parameters are (a, b, c); then the ellipse equation is a(x − u)(x − u) + 2b(x − u)(y − v) + c(y − v)(y − v) = 1. The WαSH feature is denoted as x = (u, v, a, b, c)ᵀ in this paper.
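As a concrete illustration of steps (1) and (2), the Python sketch below computes the normalized gradient image, extracts Canny edges, samples edge points at a fixed interval, and assigns each point the weight of Equation (1). The function name sample_edges, the Canny thresholds, and the interval s = 10 are our own assumptions; steps (3) and (4) are omitted because the regular triangulation and weighted α-shape construction need a dedicated computational geometry implementation (see [20,21]).

```python
# Sketch of WαSH steps (1)-(2): edge detection, sampling, and weighting.
# Names and parameter values here are illustrative assumptions.
import cv2
import numpy as np

def sample_edges(image, s=10):
    """Sample edge points every s pixels and weight them by Equation (1)."""
    # Normalized gradient magnitude g in [0, 1].
    gx = cv2.Sobel(image, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(image, cv2.CV_32F, 0, 1)
    g = cv2.magnitude(gx, gy)
    g /= max(g.max(), 1e-12)

    # Binary edge image e from the Canny detector.
    e = cv2.Canny(image, 100, 200)

    # Uniform sampling along edges: keep every s-th edge pixel per contour.
    contours, _ = cv2.findContours(e, cv2.RETR_LIST, cv2.CHAIN_APPROX_NONE)
    points, weights = [], []
    for contour in contours:
        for pt in contour[::s]:
            x, y = pt[0]
            points.append((x, y))
            weights.append(g[y, x] * (s / 2.0) ** 2)  # Equation (1)
    return np.array(points), np.array(weights)
```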
4. Algorithm

4.1. Feature Set Construction

The wavelet transform (the Haar wavelet basis is used in this paper) is firstly performed on the reference image and the image to be matched to obtain the four sub-bands LL, LH, HL, and HH. Since the sub-bands are two-dimensional, we consider them as four images: the LL, LH, HL, and HH layers. The LL layer image retains the content information of the original image, while the LH layer and the HL layer, respectively, maintain the vertical and horizontal information of the original image after the wavelet transform. They are favorable for edge information and WαSH local feature detection. The noise in the original image is concentrated in the HH layer. Therefore, we perform the WαSH detector on the images of the LL, LH, and HL layers while discarding the HH layer. In order to avoid inconsistent coordinates, we transform the sizes of the three layer images to be equal to the original image. We perform WαSH detection on the images of the LL, LH, and HL layers, and obtain the WαSH feature sets S^LL = {x_i^LL | i = 0, 1, ..., n^LL}, S^LH = {x_i^LH | i = 0, 1, ..., n^LH}, and S^HL = {x_i^HL | i = 0, 1, ..., n^HL}, respectively, as displayed in Figure 2.
Figure 2. WαSH features on the (a) original image, (b) LL layer, (c) LH layer, and (d) HL layer.
According to the normalization theory in [16], the WαSH feature, an elliptical region, can be normalized into a circular one, on which the description of the feature is more robust to affine distortion. A variety of feature describing operators were compared in [27], reaching the conclusion that SIFT-based description operators are more robust than others. Therefore, we use the SIFT algorithm to calculate the descriptors for the WαSH features.

Three feature sets—the wavelet transform WαSH feature (WWF) set, the improved wavelet transform WαSH feature (IWWF) set, and the layered IWWF (LIWWF) set—are constructed in this paper based on the above-mentioned WαSH local invariant features detected on the images of the LL, LH, and HL layers. As the features consist of the coordinates on the image x and the description vectors d, we denote the features using the representation f = (x, d). In the following context, we use a superscript to distinguish the different feature set constructing methods and a subscript to distinguish the original image for the feature sets M containing descriptors.

4.1.1. WWF Set Constructing

The sets S^LL, S^LH, and S^HL are combined into the set S^W = {x_i^W | i = 0, 1, ..., n^W}, where n^W = n^LL + n^LH + n^HL is the number of features in WWF. The description vector of feature x_i^W is calculated by using the SIFT algorithm on the original image. Then the set of WWF is acquired and denoted as M^W = {f_i^W | i = 0, 1, ..., n^W}, where f_i^W = (x_i^W, d_i^W). The process is demonstrated in Figure 3a.
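A minimal sketch of the WWF construction is given below, assuming OpenCV's SIFT implementation. Treating each ellipse center as a plain keypoint with a fixed patch size is a simplification of this paper's procedure, which first normalizes the elliptical region according to [16] before description.

```python
# Sketch of WWF set construction: merge the per-layer WαSH feature sets and
# describe each feature with SIFT on the original image. Treating each ellipse
# center as a cv2.KeyPoint is our simplification; the paper first normalizes
# the elliptical region before description.
import cv2
import numpy as np

def build_wwf(original, feats_ll, feats_lh, feats_hl, patch_size=31.0):
    """feats_* are arrays of (u, v) ellipse centers from the three layers."""
    centers = np.vstack([feats_ll, feats_lh, feats_hl])  # S^W = S^LL U S^LH U S^HL
    keypoints = [cv2.KeyPoint(float(u), float(v), patch_size) for u, v in centers]
    sift = cv2.SIFT_create()
    # compute() describes the given keypoints on the original image.
    keypoints, descriptors = sift.compute(original, keypoints)
    return keypoints, descriptors  # M^W: features f = (x, d)
```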
Figure 3. Feature set constructing process of (a) WWF, (b) IWWF, and (c) LIWWF.
4.1.2. IWWF Set Constructing

For x_i^LL, the features on the image of the LL layer, the description vector is calculated on the image of the LL layer, while the description vectors of x_i^LH and x_i^HL are calculated on the original image due to the lack of image information on the LH and HL layers. We acquire three sets M^LL = {f_i^LL | i = 0, 1, ..., n^LL}, M^LH = {f_i^LH | i = 0, 1, ..., n^LH}, and M^HL = {f_i^HL | i = 0, 1, ..., n^HL}, where f_i^LL = (x_i^LL, d_i^LL), f_i^LH = (x_i^LH, d_i^LH), and f_i^HL = (x_i^HL, d_i^HL), respectively. Merging the sets M^LL, M^LH, and M^HL, the set of IWWF is obtained and denoted as M^I = {f_i^I | i = 0, 1, ..., n^I}, where n^I = n^LL + n^LH + n^HL is the number of features in IWWF. The process is demonstrated in Figure 3b.

4.1.3. LIWWF Set Constructing

Similar to the IWWF set constructing method, the description vectors of features in S^LL are calculated on the image of the LL layer, and the descriptor vectors of features in S^LH and S^HL are calculated on the original image. The difference from the IWWF set constructing method is that a variable F is introduced to distinguish the layer to which the features belong in LIWWF. Therefore, we obtain the sets M^LL = {(f_i^LL, F) | i = 0, 1, ..., n^LL, F = 1}, M^LH = {(f_i^LH, F) | i = 0, 1, ..., n^LH, F = 2}, and M^HL = {(f_i^HL, F) | i = 0, 1, ..., n^HL, F = 3}, respectively. Merging the sets M^LL, M^LH, and M^HL, the set of LIWWF is obtained and denoted as M^L = {(f_i^L, F) | i = 0, 1, ..., n^L, F = 1, 2, 3}, where n^L = n^LL + n^LH + n^HL is the number of features in LIWWF. The process is demonstrated in Figure 3c.

4.2. Similarity Matching

The WWF sets M_1^W and M_2^W are constructed separately for the reference image I_1 and the image to be matched I_2, and the numbers of features are n_1^W and n_2^W. For each element in M_1^W, the Euclidean distances between its feature description vector d_i and all of the feature description vectors d_j in M_2^W are calculated. If the nearest neighbor distance ratio (NNDR), the ratio of the nearest to the second-nearest distance, is less than 0.8, the corresponding features with the minimum distance in M_1^W and M_2^W are determined to be a match. For the IWWF sets of I_1 and I_2, the matches are determined by the same process as that of WWF. For the LIWWF sets of I_1 and I_2, we first compare the values of F. If the values of F are equal, the Euclidean distance of the two description vectors is calculated; if not, we set the distance to infinity and skip the distance calculation. Finally, the corresponding features are determined under the same criteria as those of WWF.
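The matching rule can be sketched in a few lines of Python. The function name match_nndr and the brute-force search are our own illustrative choices; passing the optional layer flags reproduces the LIWWF behavior, in which candidates from a different layer are treated as infinitely distant.

```python
# Sketch of the similarity matching of Section 4.2 (NNDR threshold 0.8).
# For LIWWF, descriptors whose layer flags F differ are given infinite distance.
import numpy as np

def match_nndr(d1, d2, f1=None, f2=None, ratio=0.8):
    """d1, d2: descriptor arrays (n1 x 128, n2 x 128); f1, f2: optional layer flags."""
    matches = []
    for i, d in enumerate(d1):
        dist = np.linalg.norm(d2 - d, axis=1)  # Euclidean distances to all of d2
        if f1 is not None:
            dist[f2 != f1[i]] = np.inf  # skip candidates from a different layer
        order = np.argsort(dist)
        if len(order) >= 2 and np.isfinite(dist[order[0]]):
            # Nearest neighbor distance ratio test.
            if dist[order[0]] < ratio * dist[order[1]]:
                matches.append((i, int(order[0])))
    return matches
```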
5. Experiments and Analysis

5.1. Performance Compared to WαSH for Different Distortions

Six representative image sequences from Krystian Mikolajczyk's personal homepage [28], with affine distortion, scale distortion, and illumination change, and a group of UAV image pairs with different levels of noise, are employed to test the performance of WαSH, WWF, IWWF, and LIWWF. The test images are shown in Figure 4, in which each sequence contains six images.
Figure 4. Test image sequences with affine distortion, scale distortion, illumination change, and noise change. (a) graffiti: affine distortion increasing from left to right; (b) wall: affine distortion increasing from left to right; (c) bikes: scale distortion increasing from left to right; (d) mosaic: illumination increasing from left to right; (e) Leuven: illumination decreasing from left to right; (f) library: noise increasing from left to right.

The number of total matches (Nt), the number of correct matches (Nc), and the correct matching rate (Cr) are employed to evaluate the performances of the four methods. The matching results are shown in Figure 5, in which the values 1, 2, 3, 4, and 5 on the horizontal axes represent the image pairs 2 to 1, 3 to 1, 4 to 1, 5 to 1, and 6 to 1, respectively. Cr is calculated as the percentage of Nc to Nt. According to [16], a pair of matches is determined to be correct when the overlap error is less than 50%.
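The evaluation can be summarized as follows. Note that this is a simplified sketch: it reprojects matched feature centers under the ground-truth homography H and thresholds the reprojection error, whereas the actual criterion of [16] measures the overlap error of the matched elliptical regions.

```python
# Simplified evaluation sketch: Nt, Nc, and Cr for a set of matches, using
# center reprojection under the ground-truth homography H as an approximate
# stand-in for the region overlap-error criterion of [16].
import numpy as np

def evaluate_matches(pts1, pts2, H, tol=5.0):
    """pts1, pts2: Nx2 matched centers; H: 3x3 ground-truth homography."""
    ones = np.ones((len(pts1), 1))
    proj = np.hstack([pts1, ones]) @ H.T            # project pts1 into image 2
    proj = proj[:, :2] / proj[:, 2:3]               # dehomogenize
    err = np.linalg.norm(proj - pts2, axis=1)
    nt = len(pts1)                                  # number of total matches
    nc = int(np.sum(err < tol))                     # number of correct matches
    cr = 100.0 * nc / nt if nt else 0.0             # correct matching rate (%)
    return nt, nc, cr
```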
Figure 5. Results of the different matching methods for the test data. Nt: number of total matches; Nc: number of correct matches; Cr: correct matching rate (%).

According to the above experimental results, the analysis is as follows.

From Figure 5, we can find that similar orders and changes occurred in the plots of the number of total matches and the number of correct matches. WWF, IWWF, and LIWWF offered higher Nt and Cr than WαSH for most image pairs. That is because the improved methods use features on the images of the LL, LH, and HL layers, which contain obvious image structure characteristics. By combining the features on all three sub-images, WWF, IWWF, and LIWWF achieved more matches than the original WαSH method.

Take the 5th image pair of Leuven in Figure 5 for example. The results of WαSH, WWF, IWWF, and LIWWF for this image pair are shown in Figure 6, and the repeatability and correct matching rate are displayed in Table 1. The repeatability measures how large a proportion of the detected features is repeated in the corresponding scene region. The correct matching rate assesses the distinctiveness of the detected features.
Figure 6. Results of (a) WαSH (Nt = 34, Cr = 64.71%), (b) WWF (Nt = 49, Cr = 77.55%), (c) IWWF (Nt = 55, Cr = 81.82%), and (d) LIWWF (Nt = 60, Cr = 80%) for the 5th image pair of Leuven. The red lines indicate correct matches while the blue lines are incorrect matches.
Table 1. Repeatability and correct matching rate of WαSH features on different image layers of the 5th image pair of Leuven.

WαSH Features on    Repeatability    Cr
Original image      53.719%          67.68% (described on original image)
LL layer            48.7395%         79% (described on original image); 88.36% (described on LL layer)
LH layer            52.5424%         62.5% (described on LH layer)
HL layer            38.2022%         84.62% (described on HL layer)
Table 1 shows that the WαSH features on the LL and LH layers had a relatively high repeatability. The reason is that the images of the LL and LH layers contain obvious image structure characteristics. In addition, the matching accuracy of features described on the LL layer was higher than the matching accuracy of features described on the original image. That means the image context of the LL layer is more distinctive than the image context of the original image in this case. Therefore, the Cr scores of IWWF and LIWWF are higher than those of WαSH and WWF.

5.2. Application for Stereo Remote Sensing Image Matching

In order to perform a comprehensive analysis of the proposed methods, we selected four pairs of stereo remote sensing images obtained by UAV for the experimental analysis. As seen in Figure 7, these UAV images contain different distortions, including affine distortion, rotation, blur, and noise. The MSER+SIFT matching algorithm (abbreviated as MSER), DWT-MSER, which applies DWT to smooth images before MSER detecting, the WαSH+SIFT matching algorithm (abbreviated as WαSH), DWT-WαSH, and KAZE [24] are compared in this experiment. KAZE is a multiscale feature detection and description algorithm in nonlinear scale spaces [24]. It is similar to SIFT, but more robust than SIFT. In the MSER/WαSH method, the MSER/WαSH detector is applied to extract local features, which are subsequently described by the SIFT describing algorithm. All of the comparison methods determine matches by using NNDR. The threshold value is set to 0.8, which is equal to that of the proposed methods. The matches of the proposed methods are depicted in Figure 7. The matching results are presented in Table 2. Nt is the number of total matches; Nc is the number of correct matches, which have an overlap error lower than 50%; Cr is the correct matching rate, equal to the percentage of Nc to Nt.

Table 2. Results of matching for stereo remote sensing image pairs.
                 MSER    DWT-MSER   WαSH    DWT-WαSH   KAZE     WWF      IWWF     LIWWF
Figure 7a  Nt    9       1          7       10         28       15       14       18
           Nc    4       0          6       6          10       7        10       10
           Cr    44.44   0.00       85.71   60.00      35.71    46.67    71.43    55.56
Figure 7b  Nt    34      11         13      16         206      21       22       28
           Nc    7       3          6       7          111      5        11       14
           Cr    20.59   27.27      46.15   43.75      53.88    23.81    50.00    50.00
Figure 7c  Nt    66      55         15      34         1236     18       34       40
           Nc    46      38         13      27         1114     18       33       36
           Cr    69.70   69.09      88.67   79.41      90.13    100.00   97.06    90.00
Figure 7d  Nt    19      9          14      30         199      28       36       42
           Nc    8       4          7       12         82       13       19       20
           Cr    42.11   44.44      50.00   40.00      46.23    46.43    52.78    47.62
Based on the results displayed in Table 2, the analysis is as follows.

Comparing the Nt of MSER and DWT-MSER, we can find that the number of total matches decreased when DWT was paired with MSER. Nevertheless, when DWT was paired with WαSH, it increased. The reason is that MSER detected many fewer features on the smoothed images than on the original images, while WαSH detected a similar number due to its edge detecting process. The values of Nt and Nc of WWF, IWWF, and LIWWF were improved relative to WαSH, which is consistent with the result in Section 5.1. This is because WWF, IWWF, and LIWWF detect WαSH features on three images, which increases the number of features. Among the WαSH-based methods (i.e., WαSH, DWT-WαSH, WWF, IWWF, and LIWWF), LIWWF achieved the largest number of total matches and correct matches, followed by IWWF.

The Cr values of IWWF and LIWWF were relatively stable. For the images in Figure 7b, which contain simple texture and large rotation distortion, the correct matching rates of MSER, DWT-MSER, WαSH, DWT-WαSH, and WWF were less than 50% with a low Nt, which makes it difficult to estimate the homography matrix between the image pairs, while IWWF and LIWWF obtained relatively reliable matches. For the images in Figure 7c, containing affine distortion and slight noise, the Cr values of KAZE and the proposed methods were all above 90%, and the correct matching rate of WWF reached 100%. KAZE obtained a larger Nt than the other methods. However, KAZE obtained a 35.71% correct matching rate, which is extremely low, in Figure 7a, which may cause failure in calculating the homography matrix between the image pairs. IWWF was the only method that achieved a correct matching rate of no less than 50% for all of the test images. Therefore, IWWF is the most stable to different kinds of distortion among these comparative methods.
Figure 7. Results of the proposed methods for remote sensing images obtained by UAV. The red lines indicate correct matches while the blue lines are incorrect matches. (a-1), (b-1), (c-1), and (d-1) display the results of WWF; (a-2), (b-2), (c-2), and (d-2) display the results of IWWF; (a-3), (b-3), (c-3), and (d-3) display the results of LIWWF.
6. Conclusions

The WαSH local invariant feature has advantages in distinctiveness and robustness. However, image matching based on WαSH features yields a small number of matches and is sensitive to noise. Focusing on these problems, this paper improved WαSH based on 2D-DWT and proposed three methods, WWF, IWWF, and LIWWF, for image matching. Experiments on sequences of images with affine distortion, scale distortion, and illumination change, and on a sequence of simulated noise images,
showed that WWF, IWWF, and LIWWF acquired more matches and better performance than WαSH. For remote sensing images with slight affine distortion and slight noise, the correct matching rates of KAZE and the proposed methods were higher than those of MSER and WαSH, and the correct matching rate of WWF reached 100%. Nevertheless, for images containing severe distortion, KAZE obtained an extremely low correct matching rate, which is unacceptable for calculating the homography matrix, while IWWF achieved a 71.42% correct matching rate. IWWF was the only method that achieved a correct matching rate of no less than 50% for all of the test images and was the most stable compared with MSER, DWT-MSER, WαSH, DWT-WαSH, KAZE, WWF, and LIWWF.

The proposed methods are 2–3 times slower than WαSH because they detect WαSH features on three sub-images. In addition, the normalizing and resampling processes moderately increase the computational cost of the proposed methods. The proposed methods improved the number of matches and the stability at the expense of computational efficiency. We are going to focus on speeding up the proposed methods in future work.

Author Contributions: Conceptualization, M.Y.; Funding acquisition, K.D.; Methodology, M.Y.; Resources, H.Y.; Supervision, K.D.; Validation, H.Y.; Writing—original draft, M.Y.; Writing—review & editing, C.Q.

Funding: National Key Research and Development Program of China: 2017YFE0107100; Natural Science Foundation of China: 51774270, 41604005, 41371438; Enterprise Commissioned Project: D010102.

Acknowledgments: This work was supported by the National Key Research and Development Program of China (No. 2017YFE0107100), the Natural Science Foundation of China (Nos. 51774270, 41604005, and 41371438), and the Enterprise Commissioned Project (No. D010102). The authors would like to thank the authors who have provided their WαSH algorithm as a free and open-source resource, which was very helpful for the research in this paper.

Conflicts of Interest: The authors declare no conflict of interest.
References

1. Yu, Z.; Zhou, H.; Li, C. Fast non-rigid image feature matching for agricultural UAV via probabilistic inference with regularization techniques. Comput. Electron. Agric. 2017, 143, 79–89. [CrossRef]
2. Sedaghat, A.; Mokhtarzade, M.; Ebadi, H. Uniform Robust Scale-Invariant Feature Matching for Optical Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2011, 49, 4516–4527. [CrossRef]
3. Kahaki, S.M.M.; Arshad, H.; Nordin, M.J.; Ismail, W. Geometric feature descriptor and dissimilarity-based registration of remotely sensed imagery. PLoS ONE 2018, 13, e200676. [CrossRef] [PubMed]
4. Kalantar, B.; Mansor, S.B.; Abdul Halin, A.; Shafri, H.Z.M.; Zand, M. Multiple Moving Object Detection from UAV Videos Using Trajectories of Matched Regional Adjacency Graphs. IEEE Trans. Geosci. Remote Sens. 2017, 55, 5198–5213. [CrossRef]
5. Nex, F.; Remondino, F. UAV for 3D mapping applications: A review. Appl. Geomat. 2014, 6, 1–15. [CrossRef]
6. Lowe, D.G. Distinctive Image Features from Scale-Invariant Keypoints. Int. J. Comput. Vis. 2004, 60, 91–110. [CrossRef]
7. Chang, X.; Du, S.; Li, Y.; Fang, S. A Coarse-to-Fine Geometric Scale-Invariant Feature Transform for Large Size High Resolution Satellite Image Registration. Sensors 2018, 18, 1360. [CrossRef] [PubMed]
8. Xiang, Y.; Wang, F.; You, H. An Automatic and Novel SAR Image Registration Algorithm: A Case Study of the Chinese GF-3 Satellite. Sensors 2018, 18, 672. [CrossRef] [PubMed]
9. Qi, Y.; Zhu, E. A New Fast Matching Algorithm by Trans-Scale Search for Remote Sensing Image. Chin. J. Electron. 2015, 24, 654–660.
10. Jiang, S.; Jiang, W. Efficient structure from motion for oblique UAV images based on maximal spanning tree expansion. ISPRS J. Photogramm. 2017, 132, 140–161. [CrossRef]
11. Morel, J.; Yu, G. ASIFT: A New Framework for Fully Affine Invariant Image Comparison. SIAM J. Imaging Sci. 2009, 2, 438–469. [CrossRef]
12. Mikolajczyk, K.; Schmid, C. Scale & Affine Invariant Interest Point Detectors. Int. J. Comput. Vis. 2004, 60, 63–86. [CrossRef]
13. Matas, J.; Chum, O.; Urban, M.; Pajdla, T. Robust Wide-baseline Stereo from Maximally Stable Extremal Regions. Image Vis. Comput. 2004, 22, 761–767. [CrossRef]
14. Tuytelaars, T.; Van Gool, L. Matching Widely Separated Views Based on Affine Invariant Regions. Int. J. Comput. Vis. 2004, 51, 61–85. [CrossRef]
15. Tuytelaars, T.; Van Gool, L. Content-based image retrieval based on local affinely invariant regions. In Proceedings of the International Conference on Visual Information and Information Systems, Amsterdam, The Netherlands, 2–4 June 1999; Volume 1614, pp. 493–500.
16. Mikolajczyk, K.; Tuytelaars, T.; Schmid, C.; Zisserman, A.; Matas, J.; Schaffalitzky, F.; Kadir, T.; Van Gool, L. A Comparison of Affine Region Detectors. Int. J. Comput. Vis. 2005, 65, 43–72. [CrossRef]
17. Zhang, Q.; Wang, Y.; Wang, L. Registration of Images with Affine Geometric Distortion Based on Maximally Stable Extremal Regions and Phase Congruency. Image Vis. Comput. 2015, 36, 23–39. [CrossRef]
18. Sedaghat, A.; Ebadi, H. Accurate Affine Invariant Image Matching Using Oriented Least Square. Photogramm. Eng. Remote Sens. 2015, 81, 733–743. [CrossRef]
19. Yu, M.; Yang, H.; Deng, K.; Yuan, K. Registrating oblique images by integrating affine and scale-invariant features. Int. J. Remote Sens. 2018, 39, 3386. [CrossRef]
20. Varytimidis, C.; Rapantzikos, K.; Avrithis, Y.; Kollias, S. α-shapes for local feature detection. Pattern Recogn. 2016, 50, 56–73. [CrossRef]
21. Varytimidis, C.; Rapantzikos, K.; Avrithis, Y. WαSH: Weighted α-Shapes for Local Feature Detection. In Proceedings of the 12th European Conference on Computer Vision (ECCV 2012), Florence, Italy, 7–13 October 2012; pp. 788–801.
22. Avrithis, Y.; Rapantzikos, K. The Medial Feature Detector: Stable Regions from Image Boundaries. In Proceedings of the IEEE International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 1724–1731.
23. Zitnick, C.L.; Ramnath, K. Edge foci interest points. In Proceedings of the International Conference on Computer Vision (ICCV), Barcelona, Spain, 6–13 November 2011; pp. 359–366.
24. Alcantarilla, P.F.; Bartoli, A.; Davison, A.J. KAZE features. In Proceedings of the European Conference on Computer Vision, Florence, Italy, 7–13 October 2012; Volume 7577, pp. 214–227.
25. Hussein, A.J.; Hu, F.; He, F. Multisensor of thermal and visual images to detect concealed weapon using harmony search image fusion approach. Pattern Recogn. Lett. 2017, 94, 219–227. [CrossRef]
26. Luo, X.; Bhakta, T. Estimating observation error covariance matrix of seismic data from a perspective of image denoising. Comput. Geosci. 2017, 21, 205–222. [CrossRef]
27. Mikolajczyk, K.; Schmid, C. A Performance Evaluation of Local Descriptors. IEEE Trans. Pattern Anal. 2005, 27, 1615–1630. [CrossRef] [PubMed]
28. Krystian Mikolajczyk Personal Homepage. Available online: http://lear.inrialpes.fr/people/mikolajczyk/ (accessed on 15 January 2018).