A robust keypoint extraction and matching algorithm based on wavelet transform and information theory for point-based registration in endoscopic sinus cavity data

Nasim Dadashi Serej, Alireza Ahmadian, Shohreh Kasaei, Seyed Musa Sadrehosseini and Parastoo Farnia

Signal, Image and Video Processing (SIViP), ISSN 1863-1703, Volume 10, Number 5, pp. 983–991 (2016). DOI 10.1007/s11760-015-0849-2


ORIGINAL PAPER

Received: 21 December 2014 / Revised: 10 October 2015 / Accepted: 17 November 2015 / Published online: 13 December 2015
© Springer-Verlag London 2015

Abstract Feature extraction is one of the most important steps in processing endoscopic data. The extracted features should be invariant to image scale and rotation to provide robust matching across a substantial range of affine distortions and changes in 3D space. In this study, a method is proposed on the basis of the dual-tree complex wavelet transform. First, a map is estimated for each scale, and then a Gaussian weighted additive function (GWAF) is determined. Keypoints are selected from the local peaks of the GWAF. Matching and registration are performed by applying normalized mutual information and our modified iterative closest point. Results are reported in terms of robustness to rotation, noise, color, brightness, number of keypoints, index of matching, and execution time for the building, standard clinical, and phantom sinus datasets. Although the results are comparable to those of the speeded up robust features, scale invariant feature transform, and Harris methods, they are more robust to variations in rotation, brightness, color, and noise than those obtained from the other methods. The registration errors obtained for consecutive frames of the building, clinical, and phantom datasets are 0.97, 1.46 and 1.1 mm, respectively.

Keywords Endoscopic sinus images · Repeatable and reproducible keypoints · DTCWT · NMI · Modified ICP · Gaussian weighted function

Electronic supplementary material The online version of this article (doi:10.1007/s11760-015-0849-2) contains supplementary material, which is available to authorized users.

Corresponding author: Alireza Ahmadian ([email protected]). Other authors: Nasim Dadashi Serej ([email protected]), Shohreh Kasaei ([email protected]), Seyed Musa Sadrehosseini ([email protected]), Parastoo Farnia ([email protected]).

Affiliations: 1 Medical Physics and Biomedical Engineering Department, School of Medicine, Tehran University of Medical Sciences, Tehran, Iran; 2 Research Center for Biomedical Technologies and Robotics (RCBTR), Tehran University of Medical Sciences, Tehran, Iran; 3 Department of Computer Engineering, Sharif University of Technology, Tehran, Iran; 4 Skull Base Center, Imam Khomeini Hospital Complex, Tehran University of Medical Sciences, Tehran, Iran

1 Introduction

In recent years, minimally invasive surgical (MIS) systems have become more vital for conducting complex surgical procedures. The major problem with these systems is the low accuracy of intraoperative and preoperative image registration, which has a direct impact on the final target registration error. In order to acquire sufficient accuracy during surgery, an image-guided system (IGS) is used to intraoperatively register the applied tools to a preoperative space of the patient's anatomy. During surgery, the possibility of motion estimation followed by a 3D reconstruction of anatomical structures from endoscopic video sequences can be of great help in improving the accuracy of an intraoperative video-based registration procedure [1,2]. A critical step in a 3D reconstruction process is to extract the features and points of interest in individual images and to match them in any arbitrary pair of images. The extracted features should be scale and rotation invariant. They should also provide a robust matching capability across a wide range of distortions;


including the changes in 3D viewpoint and illumination, and the presence of noise [1,3].

Various methods have been developed for detecting keypoints. Of these, the Harris corner detector is not a multi-scale technique. The difference of Gaussians (DoG) method functions as an isotropic filter and therefore requires an extra step to distinguish between keypoints and edges [4–6]. The scale invariant feature transform (SIFT) shows improved robustness to changes in image scale compared to earlier keypoint detectors such as the Harris corner detector [4,5]. The speeded up robust features (SURF) detector is much faster than SIFT while offering similar performance [7].

The wavelet transform has also been applied as a multi-scale technique for detecting keypoints. The discrete wavelet transform (DWT) was used by Loupias et al. [8] for extracting keypoints. The major drawback of the DWT is that its coefficients are neither shift-invariant nor directionally selective; however, shift invariance and directional sensitivity are crucial for keypoint detection. The complex wavelet transform is a simple remedy for these shortcomings of the DWT. However, traditional formulations of complex wavelets are rarely used, because they commonly suffer from either a lack of speed or poor inversion properties [9]. The dual-tree complex wavelet transform (DTCWT) is almost shift-invariant, and it offers low redundancy, good computational efficiency, and good directional selectivity [10,11]. In fact, the DTCWT replaces the tree structure of the conventional wavelet transform with a dyadic tree of Hilbert-transform-pair filters to compute the real and imaginary components. In the DTCWT, at each scale, an image is decomposed into six directionally sensitive and approximately analytic subbands oriented at angles (30k − 15)°, where k = 1, ..., 6.
The rotational symmetry of the DTCWT can be further improved by adding a bandpass filter in each direction and applying a phase correction to make the coefficients conjugate symmetric [11].

The keypoint matching process proceeds by first using a keypoint detector to find salient features, such as edges, corners, and blobs, in two different images. Then, the features extracted from the keypoints in the first image are compared to those in the second image. Since the keypoints will not necessarily be centered on exactly the same object components from one image to another, robustness to small displacement errors can be the key to the success of this method. Among existing intensity-based registration and matching methods, mutual information (MI), based on the concepts of information theory, is one of the most frequently used due to its accuracy and robustness [12,13]. MI was introduced as a similarity measure between images simultaneously by Viola et al. [14] and Maes et al. [15]. MI assumes no prior functional relationship between the images; rather, it assumes a statistical relationship that can be captured by analyzing the joint entropy. NMI improves the robustness of MI by avoiding some misregistrations [16].

Keypoints extracted with the proposed method are also used for registration purposes. The registration can be performed with a point-based algorithm, because the acquired data are in the form of a set of points. Various point-based registration algorithms exist that iteratively find the correspondence between points and then estimate the transformation parameters based on those correspondences. One of the most popular point-based registration algorithms is the iterative closest point (ICP) [17]. Point matching can be difficult due to the existence of outliers and missing points in the point sets. To overcome these limitations, some variants of ICP have been introduced that affect all steps of the algorithm, from the selection and matching of points to the minimization strategy [18]. These methods aim to recover the correspondence, the transformation required to align the point sets, or both. An appropriate variant of the ICP algorithm, which has already been evaluated by the authors of this paper, is used to register the two sets of keypoints [19,20].

In this study, it is shown that the unique advantages of the DTCWT over other multi-scale decompositions make it an ideal candidate for multi-scale, robust, and computationally efficient keypoint detection, which are desirable properties for visual recognition tasks. Matching and registration are performed using the obtained keypoints. For comparison purposes, the Harris, SURF, and SIFT algorithms are applied to the same datasets, and the results are compared according to repeatability, number of keypoints, robustness to rotation, noise, changes in brightness and color, execution time, and performance in the matching and registration processes. The obtained results show the efficiency of the proposed method.

The rest of this paper is organized as follows: in Sect. 2, the DTCWT-based procedure of keypoint detection and localization is explained in detail. Then, the modified ICP algorithm for the matching and registration processes is described step by step. In Sect. 3, the experimental results of the proposed method are presented. Finally, Sect. 4 concludes the paper.

2 Proposed method and materials

The proposed method of keypoint extraction is described in Sect. 2.1. Point-to-point correspondence is needed for comparison with SIFT, using the extracted keypoints for tracking and 3D reconstruction. To do this, points (or features) in one image are matched with the corresponding points (or features) in another image. The matching among keypoints of consecutive frames is performed by using the NMI; it is described in Sect. 2.2. The ICP algorithm is used to represent the extracted keypoints as robust landmarks for image registration. In the algorithm, one point cloud (as

the reference) is kept fixed, while the other (the source) is transformed to best match the reference. The registration algorithm is described in Sect. 2.3.

Fig. 1 Structure of 1D Q-shift dual-tree

Fig. 2 Flowchart of proposed keypoint extraction method: selected frame → apply DTCWT (with selected scale) → calculate magnitude of subbands (Mag) → product of subbands (Index) → calculate GWAF → detect peaks of the function → keypoints

2.1 Keypoint extraction

The core of the proposed algorithm is based on the dual-tree complex wavelet transform shown in Fig. 1. In this work, the "rgb2gray" function of MATLAB is first used for color transformation of the frames; then, the DTCWT is applied to each frame of our dataset, and six subbands are obtained. For each scale, the magnitude of each of the six subbands is estimated as

Mag_i = (RealCoef_i^2 + IMGCoef_i^2)^(1/2), for i = 1, 2, ..., 6   (1)

where RealCoef and IMGCoef stand for the real and imaginary coefficients of each subband, respectively. Based on the product of all six subband magnitudes, an index is defined for each scale as

I_s = ∏_{i=1}^{6} Mag_i.   (2)

To obtain an accurate keypoint localization, each scale index is interpolated with a 2D Gaussian kernel up to the original image size by a factor of 2^s, where s is the scale level. A new function called the Gaussian weighted additive function (GWAF) is proposed based on the interpolated indexes from scales s = 1 to m, as

GWAF = Σ_{s=1}^{m} [WF · II_s]   (3)

WF = e^{−(s − s_md)^2 / ξ^2}   (4)

where II_s is the interpolated index, m is the number of scales, WF is the weight factor, s_md is the median of the scale numbers for an odd number of scales (i.e., if m = 5, s_md is 3) and m/2 for an even number of scales (i.e., if m = 4, s_md is 2), and ξ (known as the kernel width) is the weighting variance. As ξ gets larger, the weight of all points approaches 1.0, and the GWAF becomes a global summation of all scales as in [21,22]. As ξ gets smaller, all but the very median scales, which have more details (than the first scales) and less noise (than the last scales), get a maximum weight. The number of keypoints is also controllable by applying different ξ. As we move toward the last scales, details and noise increase, so the choice of ξ is very important to make a trade-off between details and noise. The GWAF is a mixture of Gaussians centered on each feature. Keypoint locations are defined as the peak locations of the GWAF by detecting the maximum values of this function in a pre-defined neighborhood. The flowchart of the proposed method is shown in Fig. 2. The experimental results show that the keypoints correspond to the visual image content.

2.2 Keypoints matching

Mutual information, known as relative entropy, is a measure of the information that two images have in common. Maximizing the mutual information can be thought of as minimizing the joint entropy H(A, B) relative to the marginal entropies H(A) and H(B) in the overlapping region of the images [14]. The form of the definition that we used is related to the Kullback–Leibler distance, defined as Σ_i p(i) log (p(i)/q(i)) for two distributions p and q; it is a measure of the distance between the two distributions. Analogous to the Kullback–Leibler measure, the mutual information of images A and B is defined as

I(A, B) = Σ_{a,b} p_AB(a, b) log [ p_AB(a, b) / (p_A(a) p_B(b)) ]   (5)

and

I(A, B) = H(A) + H(B) − H(A, B).   (6)

The normalized measure of mutual information has been shown to be more robust for registration than the standard mutual information [23]; it is defined as

NMI(A, B) = [H(A) + H(B)] / H(A, B).   (7)
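To make the pipeline of Sect. 2.1 concrete, the following sketch traces Eqs. (1)–(4) in Python with NumPy. It is an illustration under stated assumptions, not the authors' MATLAB implementation: the complex DTCWT subbands are taken as given (a DTCWT library would supply them), the scale indexes are assumed to be already interpolated to the original image size, and `local_peaks` is a simplified one-peak-per-block detector.

```python
import numpy as np

def scale_index(subbands):
    """Eqs. (1)-(2): per-subband magnitude, then the product over the six subbands.

    subbands: complex array of shape (6, H, W) standing in for the six DTCWT
    coefficients of one scale (assumed given; a DTCWT library would supply them).
    """
    mags = np.abs(subbands)          # (RealCoef^2 + IMGCoef^2)^(1/2)
    return np.prod(mags, axis=0)     # I_s = product of the six magnitudes

def gwaf(interp_indexes, xi, s_md):
    """Eqs. (3)-(4): Gaussian-weighted sum of the interpolated scale indexes."""
    out = np.zeros_like(interp_indexes[0], dtype=float)
    for s, idx in enumerate(interp_indexes, start=1):
        wf = np.exp(-((s - s_md) ** 2) / xi ** 2)   # weight factor WF
        out += wf * idx
    return out

def local_peaks(img, win=10):
    """Keypoints as local maxima of the GWAF (simplified: one peak per block)."""
    pts = []
    h, w = img.shape
    for y in range(0, h - win + 1, win):
        for x in range(0, w - win + 1, win):
            block = img[y:y + win, x:x + win]
            dy, dx = np.unravel_index(np.argmax(block), block.shape)
            pts.append((y + dy, x + dx))
    return pts
```

Feeding the per-scale indexes of a frame into `gwaf` and then `local_peaks` mirrors the Fig. 2 flowchart end to end.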


Corresponding points are found by maximizing the normalized mutual information (NMI):

CP = argmax NMI(A, B)   (8)

where NMI(A, B) is the NMI of sub-image A and sub-image B, defined as a neighboring window around the keypoints of two different frames. The flowchart of the proposed matching method is illustrated in Fig. 3.

Fig. 3 Flowchart of proposed matching method: for the keypoints of the reference frame and the floating frame, the NMI between two windows around each keypoint of each frame is calculated; if the NMI is greater than 1.2, the keypoints are accepted as corresponding matches

2.3 Registration using keypoints

The ICP algorithm selects points, matches them with their corresponding points, weights them, rejects outliers, and finally iteratively minimizes the RMS error. For matching, a brute-force method can be selected, which calculates the distances to all neighbor candidates and picks the closest point. In the weighting step, a constant weight can be used for all points, or (in other implementations) a variable weight can be calculated. In this study, to eliminate outliers, the points having the highest 10 % point-to-point distances are removed from the pre-deformation point cloud.

The point-to-plane error is used as the error criterion for the registration process. This criterion minimizes the sum of differences between the source points and the tangent plane matched to the corresponding target points. This can be done by minimizing the dot products of the vectors p_i q_i and the normals n_i, where p and q are the source and target points, respectively. These errors can be expressed as

E = Σ_{i=1}^{N} || R p_i + T − q_i ||^2   (9)

E = Σ_{i=1}^{N} [ (R p_i + T − q_i) · n_i ]^2   (10)

where R and T are the rotation and translation, respectively. For the iterative minimization of the error criterion, the SVD and Levenberg–Marquardt methods can be used [20]. For comparison with previous studies, the SURF, SIFT, and Harris algorithms were applied in order to extract the interest points in individual images and to match them in any arbitrary pair of images [4,5,7].

2.4 Used dataset

Our dataset includes 8 standard images of a building (with a resolution of 1536 × 1024) taken from different camera viewpoints. For the assessment of varying texture and edge details, the method is tested on phantom endoscopic video data of the frontal sinus cavity; and to evaluate the method in a clinical setting, it was tested on an endoscopic intraoperative video of the frontal sinus cavity with translation and rotation of the endoscopic camera. For each of the videos, 129 consecutive frames were extracted, and the extracted frames were checked based on the method in [24]. The resolution of the frames is 512 × 512. Some typical samples of these datasets are provided as supplementary material.
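The NMI-based matching of Sect. 2.2 (Eqs. 5–8 and Fig. 3) can be sketched as follows. This is our own Python illustration, not the paper's implementation: the histogram bin count and the brute-force search over candidate keypoints are simplifications we introduce.

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a (possibly multi-dimensional) probability array."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def nmi(a, b, bins=32):
    """Eq. (7): NMI(A, B) = (H(A) + H(B)) / H(A, B) for two equally sized patches."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p_ab = joint / joint.sum()
    p_a, p_b = p_ab.sum(axis=1), p_ab.sum(axis=0)
    return (entropy(p_a) + entropy(p_b)) / entropy(p_ab)

def match_keypoints(frame_a, kps_a, frame_b, kps_b, half=5, thr=1.2):
    """Sketch of Fig. 3: accept the pair maximizing NMI when it exceeds thr (Eq. 8)."""
    def patch(f, y, x):
        return f[y - half:y + half + 1, x - half:x + half + 1]
    matches = []
    for (ya, xa) in kps_a:
        scored = [(nmi(patch(frame_a, ya, xa), patch(frame_b, yb, xb)), (yb, xb))
                  for (yb, xb) in kps_b]
        best_score, best_pt = max(scored)
        if best_score > thr:
            matches.append(((ya, xa), best_pt))
    return matches
```

For two identical patches the NMI equals 2, so the 1.2 acceptance threshold of Fig. 3 passes exact matches comfortably.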

3 Experimental results

For each frame, the DTCWT is applied, and the scale indexes are obtained. Then, each scale index is interpolated to the original image size, and the GWAF is estimated for each dataset. The resulting map for the first frame of each dataset is illustrated in Fig. 4. As shown, the GWAF map of the phantom data is smoother than the others and has fewer dominant peaks. This is reasonable because of the data structure, which has less texture and fewer edges compared to the other datasets. Keypoints are detected, as described in the previous section, in a pre-defined neighborhood. The obtained keypoints are shown on the first frame of each dataset in Fig. 5. The method is tested on the frames of each dataset. The average numbers of keypoints in the Building, clinical sinus, and phantom datasets for ξ = 4, smd = 2 with a peak detection window size of 10 × 10 are 3358, 578 and 479, respectively.

As described earlier, the matching between keypoints of consecutive frames was performed using the NMI algorithm. The keypoint locations were located in the original images, and then the NMI was calculated with a neighboring window of 11 × 11 pixels around the keypoint locations in two different frames. The results of matching different frames of each dataset are illustrated in Fig. 6.

In order to represent the extracted keypoints as robust landmarks for image registration purposes, all frames were registered to each other using our modified ICP algorithm. By conducting 15 runs of the ICP algorithm, the mean and


Fig. 4 GWAF for first frame of each dataset with smd = 2 and ξ = 4: a Building dataset, b clinical sinus dataset, and c phantom dataset

Fig. 5 Obtained keypoints for the first frame of each dataset

Fig. 6 Results of matching of different frames of: a Building dataset and b clinical sinus dataset

Table 1 Registration RMSE error and execution time of proposed modified ICP

Dataset           RMSE error (mm)   Execution time (s)
Building          0.97              31
Sinus             1.46              38
Phantom dataset   1.16              36

standard deviation of the root-mean-square error (RMSE) and the execution time were obtained. The RMSE was calculated as the RMS difference between the coordinates of the corresponding points of two frames registered with the ICP algorithm. The ground truth for the registration process with the modified ICP was obtained in our previous work, with magnetic resonance imaging (MRI) as a gold standard, on a brain-shaped phantom [20,25]. For validation of the registration process on this dataset, the first image of the clinical dataset was rotated and translated by predefined values (45° and 3 mm). Then, the original and transformed versions of the frame were registered to each other, and the RMSE was calculated between the coordinates of the corresponding points; the obtained value was very close to zero. The registration RMSE and execution time of the proposed modified ICP are listed in Table 1.

In addition, to evaluate the impact of the parameter ξ on the keypoints, two values, ξ = 5 and ξ = 7, were also examined with a window size of 50 × 50. The results are shown in Fig. 7. The selection of 'WF' in the GWAF is an important factor in the keypoint extraction process. Different values of 'WF' based on ξ and a constant smd = 3 are listed in Table 2. As shown in Fig. 7 and Table 2, for large values of ξ the weights of all scales get close to each other, and the weighting function approaches a uniform function. Hence, the GWAF becomes a global summation of all scales and gets smoother. This leads to finding fewer but more salient keypoints across the scales.
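The weighting behavior discussed above follows directly from Eq. (4). As a quick check, the following minimal Python sketch (our illustration, not part of the paper's MATLAB code) reproduces the 'WF' values of Table 2 up to rounding:

```python
import numpy as np

def weight_factors(m, s_md, xi):
    """Eq. (4): WF(s) = exp(-(s - s_md)^2 / xi^2) for scales s = 1..m."""
    s = np.arange(1, m + 1)
    return np.exp(-((s - s_md) ** 2) / xi ** 2)

# Reproducing Table 2 (m = 5 scales, s_md = 3):
for xi in (1, 3, 7):
    print("xi =", xi, np.round(weight_factors(5, 3, xi), 4))
```

For ξ = 1 the weights fall off sharply around the median scale (0.0183, 0.367, 1, 0.367, 0.0183), while for ξ = 7 they flatten toward a uniform function, matching the discussion above.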

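The registration experiments above rely on an ICP variant whose point-to-point error (Eq. 9) admits a closed-form minimizer per iteration via SVD, which the paper mentions as one option. The sketch below is a generic Kabsch-style 2D solver in Python, an illustration under our own naming rather than the authors' modified ICP (outlier rejection and the point-to-plane criterion of Eq. 10 are omitted, and correspondences are assumed known); the usage mirrors the validation of registering a frame against a copy rotated by 45° and translated by 3 mm.

```python
import numpy as np

def best_rigid_transform(P, Q):
    """One ICP alignment step: R, T minimizing Eq. (9), sum ||R p_i + T - q_i||^2.

    P, Q: (N, 2) arrays of corresponding 2D source/target points.
    Closed-form solution via SVD (Kabsch).
    """
    mu_p, mu_q = P.mean(axis=0), Q.mean(axis=0)
    H = (P - mu_p).T @ (Q - mu_q)            # cross-covariance of centered sets
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
    R = Vt.T @ D @ U.T
    T = mu_q - R @ mu_p
    return R, T

def rmse(P, Q, R, T):
    """Registration RMSE between the transformed source and the target points."""
    diff = (P @ R.T + T) - Q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

Registering a point set against a copy of itself rotated by a known angle and translated by a known offset should recover that transform exactly, with RMSE near zero, as reported for the clinical-frame validation.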

Fig. 7 Obtained keypoints with: a smd = 3 and ξ = 5, b smd = 4 and ξ = 7, and c GWAF for smd = 4 and ξ = 7

Table 2 Different values of 'WF' based on ξ and constant smd = 3

        WF for s1   WF for s2   WF for s3   WF for s4   WF for s5
ξ = 1   0.0183      0.367       1           0.367       0.0183
ξ = 3   0.641       0.894       1           0.894       0.641
ξ = 7   0.92        0.97        1           0.97        0.92

Fig. 8 Average number of keypoints in different rotations, for each method

Fig. 9 VDC percentage between keypoints detected in each rotation degree and 0 rotation degree for each method in Building dataset

3.1 Rotation robustness

To obtain robust and strong keypoints, the parameters of the proposed method are set to smd = 3, ξ = 5, and a window size of 15 × 15 in the rest of the experiments. To compare our work with previous studies, the SIFT, SURF, and Harris algorithms were applied to the datasets. For the robustness experiments, we applied the keypoint detectors to the original frames and, with the same parameters, to the affected frames of the datasets; the resulting average number of keypoints was then reported. As an index, a variant of the DICE coefficient (VDC) is used to evaluate the performance of the methods in this study. To calculate the coefficient, each algorithm was run on the original images and the keypoint locations were located in the image; the procedure was then repeated on the affected versions of the image, and finally the number of matches between the two images was reported. The VDC gives the percentage of matched features normalized by the number of extracted features:

VDC = 2 N_mf / (N_orig + N_other)   (11)

where N_mf is the number of matched features between the two images, N_orig is the number of features in the original image, and N_other is the number of features in the other image. For two identical images the value of VDC will be equal to 1, and for two completely different images it will be zero.
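For illustration, the VDC of Eq. (11) can be computed from two keypoint lists as follows. The greedy one-to-one matching and the distance tolerance are simplifications of our own (the paper matches keypoint locations between the original and affected images):

```python
def vdc_from_sets(kps_orig, kps_other, tol=0.0):
    """Eq. (11): VDC = 2*N_mf / (N_orig + N_other), computed from keypoint lists.

    A keypoint counts as matched if the other set contains a point within
    `tol` (Euclidean distance); tol = 0 requires exact location agreement.
    Greedy one-to-one matching: each matched point is consumed once.
    """
    other = list(kps_other)
    n_mf = 0
    for (y, x) in kps_orig:
        for j, (v, u) in enumerate(other):
            if (y - v) ** 2 + (x - u) ** 2 <= tol ** 2:
                n_mf += 1
                other.pop(j)       # one-to-one: consume the matched point
                break
    return 2.0 * n_mf / (len(kps_orig) + len(kps_other))
```

Identical keypoint sets give a VDC of 1, fully disjoint sets give 0, matching the boundary cases stated above.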


To assess the robustness of the proposed keypoint detector against rotation compared to SIFT, SURF, and Harris, the algorithms were applied to rotated versions (1°–360°) of the datasets. The average number of keypoints in the rotated images for each method is illustrated in Fig. 8. To quantify the robustness of the proposed method, all keypoints detected at each rotation angle were matched with the keypoints detected in the original image (0° rotation). The VDC percentage of matching between the two sets of keypoints remained above 75 % across orientations for our method in the three datasets, compared to above 16 % for the Harris method, 55 % for the SIFT method, and above 70 % for the SURF method, indicating better robustness. The VDC percentage for different rotations for each method in the Building dataset is shown in Fig. 9. As can be seen from this figure, the proposed method extracts more robust and accurate keypoints.


3.2 Noise robustness

Noise is a very important issue for detection methods, since a noisy pixel can be identified as a feature. Our proposed method reduces the fine-scale energies in the GWAF to provide a smoother map; therefore, in the presence of noise, only strong keypoints are detected. To evaluate the robustness against noise, Gaussian noise with different variances was added to the datasets. Figure 10 shows the noisy frames of the Building dataset. The average number of extracted keypoints in the presence of Gaussian noise with different variances is shown in Figs. 11, 12 and 13 for the Building, sinus, and phantom datasets, respectively. As these figures show, with the proposed method and the SURF method, even in the presence of strong noise the average number of keypoints stays almost the same. However, as the noise increases, the other keypoint extraction methods produce more keypoints, which demonstrates their sensitivity to noise.

Fig. 12 Average number of keypoints in presence of Gaussian noise with different variances in clinical sinus dataset

Fig. 13 Average number of keypoints in presence of Gaussian noise with different variances in phantom sinus dataset

Fig. 14 VDC percentage in presence of Gaussian noise with different variances for each method in Building dataset

Fig. 15 Images with various percentages of brightness for Building dataset. Percentage of brightness: a 50 %, b 70 %, c 80 %, and d 90 %

The VDC percentage between the two sets of keypoints achieved with the SIFT and Harris methods reduces considerably as the noise variance increases, while SURF and the proposed method show better robustness against Gaussian noise. The VDC percentage for Gaussian noise with different variances for each method in the Building dataset is shown in Fig. 14. As can be seen from this figure, the proposed method extracts more robust and accurate keypoints.

Fig. 10 Noisy dataset with different variances. Building dataset with variance of a 0.001, b 0.003, and c 0.005

Fig. 11 Average number of keypoints in presence of Gaussian noise with different variances in Building dataset

3.3 Brightness robustness Another critical issue is the brightness of frames in the clinical endoscopy data. Various percentages of brightness are applied on the datasets and the results are compared to each other. The images with their brightness changed are shown in Fig. 15. The average number of extracted keypoints for each method after applying a brightness change is shown in Figs. 16, 17 and 18 for Building, sinus and phantom dataset, respectively. As shown in these figures, the average number of keypoints reduces considerably in other methods as the bright-

123

Author's personal copy 990

SIViP (2016) 10:983–991

Fig. 20 Normal and color changed versions of Building dataset

Fig. 16 Average number of extracted keypoints after applying various percentage of brightness in Building dataset

Fig. 17 Average number of extracted keypoints after applying various percentage of brightness in clinical dataset

Table 3 Difference in the average number of keypoints for each method in the normal and color changed datasets

ness percentage increases, while in the proposed method it has a much less reduction in slope. The VDC percentage after applying various percentage of brightness in each method in Building dataset is shown in Fig. 19. As can be seen from this figure, the slope reduction of the location accuracy in the proposed method and SURF is less than other methods.

Keypoints difference proposed method (%)

42.435

31

22.5

2.6

25

18.5

4.5

1.2

30.30

16.10

5.40

1.75

Building Sinus Phantom

Table 4 Elapsed times for each method Executing Executing Executing time Harris time SIFT time SURF (s) (s) (s)

Executing time proposed method (s)

Building 5.54 ± 0.13

13.2 ± 0.69

10.03 ± 1.03 5.43 ± 0.19

1.77 ± 0.24

3.15 ± 0.44

2.82 ± 0.53

1.2 ± 0.22

Phantom 1.64 ± 0.11

2.62 ± 0.24

2.02 ± 0.12

1.01 ± 0.17

Sinus

Fig. 19 VDC percentage after applying various percentage of brightness for each method in Building dataset

Keypoints difference of SURF (%)

Keypoints difference of Harris (%)

Dataset

Fig. 18 Average number of extracted keypoints after applying various percentage of brightness in phantom dataset

Keypoints difference of SIFT (%)

Color

combined. In Fig. 20, the normal and color changed versions of datasets are shown. The results of difference in the number of keypoints are listed in Table 3. As shown in Table 3, variations in the average number of keypoints in the other methods except SURF are very significant compared to that of the proposed method. The proposed method obtains the best results. 3.5 Computational cost For a fair comparison, standard implementations were used for Harris, SIFT and SURF [26–28]. The algorithms are implemented in MATLAB and are tested using a Core i5-2.4 GHz processor with 4 GB RAM. Experiments are repeated three times and the results are listed in Table 4.

4 Discussion and conclusion 3.4 Color robustness To assess the robustness of the proposed method against other keypoint detectors, the sensitivity to color changes are also considered. One of the major limitations in frontal sinus cavity surgeries is the bleeding during surgical procedures that impairs the endoscopic view. After functioning the blood, shade of red color on the tissue remains. We tried to simulate this situation with color transformation. The color coefficients in RGB image were separated, and the red color coefficients were changed. Then, the new coefficients were

123

For processing endoscopic sinus surgery frames in addition to other descriptor specifications, keypoints with high repeatability in terms of number and location are required. In this study, a method based on dual-tree complex wavelet transform is proposed. The proposed method was applied on three types of datasets, the Building, the clinical sinus and the phantom dataset. First, a scale index is estimated for each scale and then, the Gaussian weighted summation function is determined. Keypoints are defined as local peaks of the determined function. The number of keypoints is control-

Author's personal copy SIViP (2016) 10:983–991

lable and can be varied by applying different variances in the Gaussian weighting function and window size of local maximum detection. Setting very small values for variance will emphasize details in the median scales, improve the localization, and results in more keypoints, but it will make the detector more sensitive to noise. There should be a trade-off between the number of keypoints and the robustness of keypoints to noise. An advantage of this approach is that we can also determine the location of keypoints based on the parameters of the method to some extent. By adjusting values of smd and ξ more corners, edges, blobs, and textures are extracted based on their strength and saliency in the frame. Based on figures, in repeated trials on frames the percentages of VDC were 100 % which indicates the localization accuracy too. Compared to the Harris, SURF, and SIFT algorithms, the robustness to rotation, noise, and changes in brightness and color are prominent in the proposed method. In this study, it is shown that the unique advantages of DTCWT over other multi-scale decompositions makes it an ideal candidate for multi-scale, robust, and computationally efficient keypoint detection; which are desirable properties for visual recognition tasks. Results on the Building dataset illustrate that the keypoints are compatible with the visual content of the image. We aim to implement this method on a procedure of motion estimation followed by 3D reconstruction of frontal sinus cavity from endoscopic video sequences in our next study. Acknowledgments This research has been supported by Tehran University of Medical Sciences & health Services grant 90-04-30-15836. Also, the authors would like to thank Research Center for Biomedical Technology & Robotics, RCBTR for supporting and providing an environment to carry on this project.

References

1. Mirota, D.J., et al.: High-accuracy 3D image-based registration of endoscopic video to C-arm cone-beam CT for image-guided skull base surgery. In: Medical Imaging 2011: Visualization, Image-Guided Procedures, and Modeling, vol. 7964 (2011)
2. Mirota, D.J., Uneri, A., Schafer, S., Nithiananthan, S., Reh, D., Ishii, M., Gallia, G.L., Taylor, R.H., Hager, G.D., Siewerdsen, J.H.: Evaluation of a system for high-accuracy 3D image-based registration of endoscopic video to C-arm cone-beam CT for image-guided skull base surgery. IEEE Trans. Med. Imaging 32(7) (2013)
3. Thormahlen, T., Broszio, H., Meier, P.N.: Three-dimensional endoscopy. In: Falk Symposium, Medical Imaging in Gastroenterology and Hepatology, vol. 124, pp. 199–212. Hannover (2002)
4. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004)
5. Harris, C., Stephens, M.: A combined corner and edge detector. In: Alvey Vision Conference, pp. 147–151 (1988)
6. Mikolajczyk, K., Schmid, C.: A performance evaluation of local descriptors. IEEE Trans. Pattern Anal. Mach. Intell. 27(10), 1615–1630 (2005)
7. Bay, H., Tuytelaars, T., Van Gool, L.: SURF: speeded up robust features. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) Computer Vision—ECCV 2006, pp. 404–417. Springer, Berlin (2006)
8. Loupias, E., et al.: Wavelet-based salient points for image retrieval. In: IEEE International Conference on Image Processing (2000)
9. Neumann, J., Steidl, G.: Dual-tree complex wavelet transform in the frequency domain and an application to signal classification. Int. J. Wavelets Multiresolut. Inf. Process. 3(43) (2005). doi:10.1142/S0219691305000749
10. Kingsbury, N.G.: Complex wavelets for shift invariant analysis and filtering of signals. J. Appl. Comput. Harmon. Anal. 10(3), 234–253 (2001)
11. Selesnick, I.W., Baraniuk, R.G., Kingsbury, N.G.: The dual-tree complex wavelet transform. IEEE Signal Process. Mag. 22(6), 123–151 (2005)
12. Papademetris, X., et al.: Integrated intensity and point-feature nonrigid registration. In: Barillot, C., Haynor, D.R., Hellier, P. (eds.) MICCAI (1), pp. 763–770. Springer, Berlin (2004)
13. Nabatchian, A., Abdel-Raheem, E., Ahmadi, M.: Illumination invariant feature extraction and mutual-information-based local matching for face recognition under illumination variation and occlusion. Pattern Recognit. 44(10–11), 2576–2587 (2011)
14. Wells III, W.M., et al.: Multi-modal volume registration by maximization of mutual information. Med. Image Anal. 1(1), 35–51 (1996)
15. Maes, F., et al.: Multimodality image registration by maximization of mutual information. IEEE Trans. Med. Imaging 16(2), 187–198 (1997)
16. Estevez, P.A., et al.: Normalized mutual information feature selection. IEEE Trans. Neural Netw. 20(2), 189–201 (2009)
17. Besl, P.J., McKay, N.D.: A method for registration of 3-D shapes. IEEE Trans. Pattern Anal. Mach. Intell. 14(2), 239–256 (1992)
18. Rusinkiewicz, S., Levoy, M.: Efficient variants of the ICP algorithm. In: 3-D Digital Imaging and Modeling. Quebec City, QC (2001)
19. Nazem, F., et al.: Two-stage point-based registration method between ultrasound and CT imaging of the liver based on ICP and unscented Kalman filter: a phantom study. Int. J. Comput. Assist. Radiol. Surg. 9(1), 39–48 (2014)
20. Ahmadian, A., et al.: An efficient method for estimation of soft tissue deformation based on intra-operative stereo image features and point-based registration. Int. J. Imaging Syst. Technol. 23(4), 294–303 (2013)
21. Bendale, P., Triggs, B., Kingsbury, N.: Multiscale keypoint analysis based on complex wavelets. In: Proceedings of the British Machine Vision Conference. BMVA Press (2010)
22. Fauqueur, J., Kingsbury, N., Anderson, R.: Multiscale keypoint detection using the dual-tree complex wavelet transform. In: International Conference on Image Processing (2006)
23. Studholme, C., Hill, D.L.G., Hawkes, D.J.: An overlap invariant entropy measure of 3D medical image alignment. Pattern Recognit. 32(1), 71–86 (1999)
24. Abretske, D., et al.: Intelligent frame selection for anatomic reconstruction from endoscopic video. In: WACV, pp. 1–5. IEEE Computer Society (2009)
25. Serej, N.D., Ahmadian, A., Mohagheghi, S., Sadrehosseini, S.M.: A projected landmark method for reduction of registration error in image-guided surgery systems. Int. J. Comput. Assist. Radiol. Surg. 10(5), 541–554 (2015)
26. Kovesi, P.: http://www.csse.uwa.edu.au/~pk/research/matlabfns/
27. Kroon, D.: http://www.mathworks.com/matlabcentral/fileexchange/28300-opensurf-including-image-warp/
28. Lowe, D.: http://www.cs.ubc.ca/spider/lowe/keypoints/siftDemoV4.zip

