Robust Visual Tracking for Retinal Mapping in Computer-assisted Slit-lamp Imaging

Mateus Souza, Rogerio Richa*, Andre Puel, Jonas Caetano, Eros Comunello, Aldo von Wangenheim

Brazilian National Institute for Digital Convergence - INCoD, Florianopolis, Brazil
[email protected]

Abstract—The slit lamp is a very popular device for inspecting the human retina due to its magnification, stereoscopy and ease of control. In this context, computer assistance can provide information overlay to surgeons, improving navigation, accuracy and safety in procedures where the slit lamp is used. Toward this goal, we describe a robust method for tracking and mapping the human retina using slit-lamp images. Inspired by [1], the proposed method is essentially a direct visual tracking method using local illumination compensation, as well as a forward-backward strategy for achieving superior tracking resilience. For achieving fast tracking speeds, an efficient pixel selection scheme is used. Experiments conducted on several human patients confirm the practical value of the system.

I. INTRODUCTION

Laser panretinal photocoagulation (PRP) is the standard treatment for preventing blindness in patients affected by diabetic retinopathy and retinal vein occlusions. PRP aims to destroy ischemic areas located between the macula and the periphery. Selective laser photocoagulation is also an efficient treatment for macular edema, which frequently occurs in both retinal diseases cited above. Naturally, with the aging of the population worldwide and the increasing rates of diabetes, there is a strong interest in developing and improving laser treatment techniques for the prevention of blindness.

The slit lamp in conjunction with a wide angle contact lens is the most commonly used device for laser delivery, even though the field of view inside the retina is smaller in slit-lamp images compared to fundus camera devices [2]. To overcome some of the technical limitations associated with the slit-lamp device, researchers have developed methods for creating intra-operative retina maps to aid surgeons in tasks such as pre-operative planning, navigation, view augmentation and photodocumentation. The first works in this field were reported by Markov et al. [3] and Barrett et al. [4], who used a computer system to stabilize the retinal image using fundus images. Later, Berger et al. [5] proposed the first system for tracking and mapping the retina in slit-lamp images. The system was able to track the 2D displacement of the retina and blend a mosaic with the tracked retina keyframes. Although the system proved the feasibility of retina tracking and mosaicking, it was still not robust enough for the challenging clinical setting. In the field of functional imaging, similar mosaicking techniques [6], [7], [8] have been developed. In these works, direct visual tracking techniques are also employed to overcome difficulties with noise and lack of texture. Nevertheless, functional imaging imposes a different set of problems compared to slit-lamp imaging, and tracking and mapping techniques used in that context do not apply directly to retina images.

Fig. 1. A slit-lamp device with a camera coupled to the biomicroscope, which allows the operator to use a high-definition display instead of the eye piece. It allows the creation of an intra-operative retina map on a side window.

Inspired by recent progress in tracking and mapping for vitreo-retinal surgery, we adopted and extended the framework described in [1] to the context of slit-lamp imaging. In this paper, we describe three novelties. The first is a local illumination compensation method [9] for direct visual tracking. The second is a forward-backward tracking strategy inspired by [10] to increase tracking resilience. The third is the use of the efficient pixel selection scheme proposed in [11] for tracking at frame-rate. The proposed mapping system was deployed on a computer-assisted slit-lamp prototype for testing on several human subjects. Experimental results show the significant performance improvement in tracking and mapping using the extensions proposed in this paper, as well as the practical value of the system for intra-operative navigation, diagnosis and photodocumentation.

This paper is organized as follows. In the next section, we describe in detail the proposed method for tracking and mapping the retina using a computer-assisted slit-lamp device. In Section III, we discuss the results of experiments conducted on human subjects. Finally, we conclude the paper in Section IV.

Fig. 2. The retinal mapping method - Using direct visual tracking, a retina map is built on-the-fly by registering adjacent template keyframes. Template centers, the currently tracked template, the map center and the active template are marked on the left; the result is the retina map shown on the right.

II. MATERIALS AND METHODS

A. Computer-assisted Slit-lamp Setup

In Figure 1, we show the main components of a computer-assisted slit-lamp. The setup is composed of a wide angle Volk Super Quad lens (Volk Optical, USA) for visualizing the extreme periphery of the retina, a 720p high definition camera acquiring at 60fps (Aptina, USA) and a wide screen monitor for visualization. The camera is coupled to a DeckLink acquisition board (BlackMagic, USA) in a PC with 32GB of RAM and an i7 3.8GHz Intel processor. Visualization and mapping are performed at frame-rate. In our laboratory setup, we use a silicone phantom eye for simulation.

B. Mapping the retina

A schematic overview of the visual tracking and mosaicking method is given in Figure 2. During the exam, only a small portion of the retina is visible. A pre-processing step is required for extracting the visible part of the retina in the images from the biomicroscope. For initializing the retinal mapping method, an initial reference image of the retina is selected. The center of this initial reference image represents the origin of the retina map. As the operator explores the retina, additional templates are incorporated into the retina map. New templates are incorporated as the distance between the current visible part of the retina and the map origin increases, as illustrated in Figure 2 (left); notice that adjacent templates overlap. At a given moment, the template closest to the current view of the retina is tracked using the direct tracking method detailed in Section II-B2.

1) Segmenting the retina: During examination, only a small, roughly rectangular window of the retina is visible. The size of the visible area is manually adjusted by the operator, who manages a tradeoff between patient comfort and view clarity. Due to the considerable amount of reflections from the contact lens and cornea, a pre-processing step is required to extract the visible part of the retina. This pre-processing step is basically a thresholding based on the average intensity level among all color channels, followed by a morphological closing operation (dilation followed by erosion). A binary mask is generated, where true values indicate the pixels which belong to the retina. The center of mass of the resulting binary mask is also computed to provide a reference to the retina tracker, which is presented next.
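The sketch below illustrates this pre-processing step using OpenCV's Python bindings. It is a minimal sketch, not the authors' implementation: the threshold level (here, the global mean intensity) and the kernel size are assumptions, since the paper does not report the exact values used.

```python
import cv2
import numpy as np

def segment_retina(frame_bgr):
    """Extract the visible retina window (Sec. II-B1): threshold the
    per-pixel average over all color channels, then apply a
    morphological closing (dilation followed by erosion)."""
    mean_img = frame_bgr.mean(axis=2).astype(np.uint8)
    # Using the global mean intensity as the threshold level is an
    # assumption; the paper does not state the exact level.
    level = float(mean_img.mean())
    _, mask = cv2.threshold(mean_img, level, 255, cv2.THRESH_BINARY)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (15, 15))
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    # Center of mass of the mask, used as a reference by the tracker.
    m = cv2.moments(mask, binaryImage=True)
    center = (m["m10"] / m["m00"], m["m01"] / m["m00"]) if m["m00"] else None
    return mask, center
```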

2) Direct visual tracking using local illumination compensation: At a given moment, the template in the retina map closest to the current view of the retina is tracked. In [1], the Sum of Conditional Variance (SCV) is used for tracking the retina under illumination variations. Compared to other robust image similarity measures from the medical image registration domain, such as Mutual Information and the Normalized Cross Correlation (NCC), the SCV shows a good trade-off between robustness and convergence radius. Nevertheless, none of the previously cited similarity measures is suitable for tracking under local illumination variations [12]. For this reason, we adopt the local compensation method proposed in [9] to cope with the complex illumination variations in slit-lamp imaging.

Tracking is formulated as an optimization problem: for every image, we seek the parameters p of the transformation function w(x, p) that best align the template T with the current image I(w(x, p)), minimizing the sum of squared differences (SSD):

$$\min_{p,\mu,\beta} \mathrm{SSD} = \min_{p,\mu,\beta} \sum_{c}\sum_{x} \left[\,{}^{c}I^{*}(w(x,p),\mu,\beta) - {}^{c}T(x)\,\right]^{2}, \qquad (1)$$

where c indicates a specific color channel and µ and β are illumination parameters. Since patients use a head stand during examination, scale variations can be considered negligible in retinal images acquired with the biomicroscope. Therefore, the transformation function w(·) is chosen to be a rotation and translation model. Since the retina absorbs most of the blue light component, the blue channel is ignored for tracking. Hence, tracking is performed using the red and green channels (c = {R, G}).

The local illumination compensation method consists in modeling local variations in contrast using a deformable parametric surface such as a Thin-Plate Spline [13]:

$$I^{*}(w(x,p),\mu,\beta) = g(x,\mu)\,I(w(x,p)) + \beta. \qquad (2)$$

The parameter vector µ is a vector of control point values, according to the formulation proposed in [13]. The parameter β is a variable for estimating global brightness variations.
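To make eq. (2) concrete, the sketch below evaluates a thin-plate spline gain surface g(x, µ) from a set of control point values and applies the compensation. It is a minimal sketch under assumptions: the function names, the control point placement and the kernel U(r²) = r² log r² follow the standard TPS interpolation layout, while the exact parameterization in the authors' system follows [13].

```python
import numpy as np

def tps_gain_surface(ctrl_pts, ctrl_vals, grid_x, grid_y):
    """Evaluate a thin-plate spline g(x, mu) interpolating the control
    values ctrl_vals (the vector mu) at the 2D points ctrl_pts."""
    def U(r2):
        # TPS radial basis U(r) = r^2 log(r^2), with U(0) = 0.
        return np.where(r2 > 0, r2 * np.log(r2 + 1e-12), 0.0)

    n = len(ctrl_pts)
    d2 = ((ctrl_pts[:, None, :] - ctrl_pts[None, :, :]) ** 2).sum(-1)
    K = U(d2)
    P = np.hstack([np.ones((n, 1)), ctrl_pts])     # affine part [1, x, y]
    A = np.zeros((n + 3, n + 3))
    A[:n, :n], A[:n, n:], A[n:, :n] = K, P, P.T
    b = np.concatenate([ctrl_vals, np.zeros(3)])
    coef = np.linalg.solve(A, b)                   # [weights | a0 a1 a2]

    pts = np.stack([grid_x.ravel(), grid_y.ravel()], axis=1).astype(float)
    d2g = ((pts[:, None, :] - ctrl_pts[None, :, :]) ** 2).sum(-1)
    g = U(d2g) @ coef[:n] + coef[n] + pts @ coef[n + 1:]
    return g.reshape(grid_x.shape)

def compensate(I_warped, ctrl_pts, mu, beta):
    """Apply eq. (2): I* = g(x, mu) * I(w(x, p)) + beta."""
    h, w = I_warped.shape
    gx, gy = np.meshgrid(np.arange(w), np.arange(h))
    return tps_gain_surface(ctrl_pts, mu, gx, gy) * I_warped + beta
```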

3) Efficient pixel selection scheme: Due to the size of the tracked templates, the minimization problem in eq. (1) is computationally very expensive. To track at frame rate (60 fps), we have adopted the pixel selection technique described by Meilland et al. [11], for the following reasons. First, the chosen scheme is not based entirely on the image gradient intensity, as in [14]; the pixel selection takes into consideration the transformation model used for tracking. Second, it does not require an expensive pre-processing step as in [15], which would be too restrictive in our context since templates are continuously incorporated into the retinal map. Finally, it can be run on a CPU (no additional hardware such as Graphics Processing Units is required).

The selection scheme consists of selecting the strongest M pixels from T for tracking, where M is typically chosen around 25% of N, the total number of pixels in T. For each color channel, an N × 3 Jacobian matrix J(p) is computed for p = p0 (i.e. the original template configuration, where I(w(x, p0)) = T(x)). The values in each column of J are sorted in decreasing order of magnitude, resulting in J0. The set of active pixels is chosen by taking pixel positions from each of the three columns of J0 in turn, taking care not to incorporate the same pixel twice. This ensures that the chosen pixels equally represent each degree of freedom in the transformation estimation process. In summary, the algorithm selects the pixels from T with the strongest gradients in parameter space, instead of image space. Details on the computation of J can be found in [16]. The final Jacobian matrix J* is obtained by stacking the Jacobian matrices generated by the selected pixels from each color component c.
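A minimal sketch of the round-robin column selection, assuming the N × 3 per-pixel Jacobian has already been computed as in [16]:

```python
import numpy as np

def select_pixels(J, m):
    """Select m pixel indices from an (N x 3) template Jacobian J,
    cycling through the columns (one per degree of freedom) in
    decreasing order of magnitude and skipping duplicates."""
    order = np.argsort(-np.abs(J), axis=0)   # per-column ranking
    selected, seen = [], set()
    rank = 0
    while len(selected) < m and rank < J.shape[0]:
        for dof in range(J.shape[1]):        # e.g. tx, ty, rotation
            idx = int(order[rank, dof])
            if idx not in seen:
                seen.add(idx)
                selected.append(idx)
                if len(selected) == m:
                    break
        rank += 1
    return np.array(selected)
```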

To solve the minimization problem in equation (1), the Efficient Second Order Minimization (ESM) method [16] is used. Weights are incorporated into its formulation for removing pixels which do not belong to the retina from the estimation of tracking parameters, using the binary mask described in Section II-B1. The ESM loop is executed until the sum of the absolute values of the increments drops below a certain threshold ε, i.e. Σ[|∆p| + |∆µ| + |∆β|] < ε.

4) Forward-backward tracking: In practice, sudden motions, specular reflections, glare and distortions often degrade tracking quality. To avoid tracking loss and constant map relocalization, we implemented a forward-backward tracking strategy inspired by [10]. The principle is illustrated in Figure 3. The conventional approach described in Section II-B2 consists in tracking the template from the retina map closest to the current view; this is named forward tracking. To maintain tracking in situations where forward tracking is not reliable, we introduce a backward tracking step. It consists in extracting a region of interest in the center of the current image at time t (marked by a dashed line in Figure 3) and tracking its position in the previous frame at t − 1. Since the translation and rotation model forms a group, we can invert the transformation from t to t − 1 and use the most reliable tracking estimate between the forward and backward trackers. In practice, tracking backward in time can help bridge certain tracking degradations and avoid the more computationally expensive map relocalization step, described next in Section II-B5.
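The decision logic can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the `track` callable stands in for the ESM tracker, and the pose conventions and composition of the inverted backward estimate are assumptions.

```python
import numpy as np

def se2(tx, ty, theta):
    """Homogeneous 3x3 matrix for the rotation + translation model w(x, p)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, tx],
                     [s,  c, ty],
                     [0.0, 0.0, 1.0]])

def forward_backward_step(track, template, roi, prev_img, curr_img, M_prev):
    """One tracking step combining forward and backward estimates.

    `track(patch, img, M_init)` is assumed to return (M, ncc), where M
    maps patch coordinates into `img` and ncc is the confidence score.
    `roi` is the central region of interest of the current image."""
    # Forward: track the map template directly in the current image.
    M_fwd, ncc_fwd = track(template, curr_img, M_prev)
    # Backward: track the current-image ROI in the previous image, then
    # invert the estimated motion (the rigid model forms a group).
    M_bwd, ncc_bwd = track(roi, prev_img, np.eye(3))
    M_from_bwd = M_prev @ np.linalg.inv(M_bwd)
    # Keep the estimate with the higher confidence score.
    return (M_fwd, ncc_fwd) if ncc_fwd >= ncc_bwd else (M_from_bwd, ncc_bwd)
```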

Fig. 3. Backward-forward tracking - Tracking backward and forward in time can greatly increase tracking resilience compared to the conventional forward tracking. Since the transformation model is invertible, we are able to use the most reliable tracking results for updating the retina map coordinates.

5) Map relocalization: Although the forward-backward strategy greatly increases tracking resilience, tracking is eventually lost in situations where the surgeon needs to clean the contact lens or move the light source, or when sudden movements occur. If the tracking confidence drops below a given threshold λ = 0.85, tracking is deemed unreliable and suspended. Tracking confidence is measured as the average NCC between the reference image $^{c}T$ and the current compensated warp $^{c}I^{*}(w(x,p))$ (equation (2)) over the red and green color channels c. The NCC was chosen as the confidence measure since it is a bounded global measure (it varies between [−1, 1]). To continue expanding the retina map, the position of the currently visible window in the retina map must be redetermined. In our work, we use the SURF feature-based strategy proposed in the original formulation [1]. Features from every template in the retina map are matched to those from the current image using RANSAC. The estimated location of the retina map on the current image is used to re-initialize the tracking and mapping system.
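The confidence measure can be sketched as a plain NCC averaged over the red and green channels; the RGB array layout is an assumption for illustration.

```python
import numpy as np

def tracking_confidence(T, I_comp):
    """Average NCC between the reference template T and the compensated
    warped image I_comp over the R and G channels. Both arrays are
    assumed HxWx3 (RGB); the blue channel is ignored for tracking."""
    scores = []
    for c in (0, 1):  # R and G channels
        a = T[..., c].astype(np.float64).ravel()
        b = I_comp[..., c].astype(np.float64).ravel()
        a -= a.mean()
        b -= b.mean()
        scores.append((a @ b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    return float(np.mean(scores))  # tracking suspended below 0.85
```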

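For the relocalization step itself, a hedged sketch is shown below. The paper uses SURF features; since SURF ships only with opencv-contrib, ORB is used here as a stand-in detector, and the rigid fit via cv2.estimateAffinePartial2D (which also allows uniform scale) is an illustrative choice rather than the authors' exact pipeline.

```python
import cv2
import numpy as np

def relocalize(map_templates, curr_img):
    """Match each stored map template against the current image and
    return the best RANSAC-supported rigid estimate (or None)."""
    det = cv2.ORB_create(nfeatures=1000)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    gray_c = cv2.cvtColor(curr_img, cv2.COLOR_BGR2GRAY)
    kp_c, des_c = det.detectAndCompute(gray_c, None)
    if des_c is None:
        return None
    best_M, best_inliers = None, 0
    for tpl in map_templates:
        gray_t = cv2.cvtColor(tpl, cv2.COLOR_BGR2GRAY)
        kp_t, des_t = det.detectAndCompute(gray_t, None)
        if des_t is None:
            continue
        matches = matcher.match(des_t, des_c)
        if len(matches) < 6:
            continue
        src = np.float32([kp_t[m.queryIdx].pt for m in matches])
        dst = np.float32([kp_c[m.trainIdx].pt for m in matches])
        M, inl = cv2.estimateAffinePartial2D(src, dst, method=cv2.RANSAC)
        if M is not None and inl is not None and int(inl.sum()) > best_inliers:
            best_M, best_inliers = M, int(inl.sum())
    return best_M  # 2x3 transform locating the map in the current image
```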
6) Mosaic blending: When mapping the retina, differences in lighting conditions between adjacent templates create seams in the resulting retina map, which can confuse the operator during the exam. To circumvent this issue, we employ the weighted averaging blending method described in [17] to render a more photo-realistic mosaic of the retina, taking advantage of the overlap between stored templates. Even though this blending technique is not capable of compensating for blur resulting from tracking drift, the visual quality of the resulting mosaic is satisfactory for our purposes.
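One common realization of weighted averaging in the spirit of [17] weights each template pixel by its distance to the template border, so seams between overlapping keyframes fade smoothly. The distance-transform weighting below is an assumption for illustration, not necessarily the exact weighting used in the system.

```python
import cv2
import numpy as np

def blend_templates(templates, masks, offsets, map_shape):
    """Weighted-average blending of keyframe templates into the map.
    `offsets` are integer (x, y) positions of each template."""
    acc = np.zeros(map_shape + (3,), np.float64)
    wsum = np.zeros(map_shape, np.float64)
    for tpl, mask, (ox, oy) in zip(templates, masks, offsets):
        # Weight each pixel by its distance to the template border.
        w = cv2.distanceTransform(mask, cv2.DIST_L2, 3).astype(np.float64)
        h, wid = mask.shape
        acc[oy:oy + h, ox:ox + wid] += w[..., None] * tpl
        wsum[oy:oy + h, ox:ox + wid] += w
    out = acc / np.maximum(wsum, 1e-9)[..., None]
    return out.astype(np.uint8)
```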

III. EXPERIMENTS

The proposed mapping method was tested on a large database of videos recorded from several human patients using the device shown in Figure 1. The goal in this section is to evaluate the practical value of the proposed illumination compensation method, the forward-backward tracking strategy and the pixel selection scheme. We also describe certain key computational aspects and performance numbers concerning the proposed tracking and mapping method.

A. Computational aspects

All methods were implemented using OpenCV and parallelized using OpenMP. The system runs at frame-rate on an Intel i7 3.8GHz computer with 4 cores (with hyper-threading) and 32GB of RAM. Tracking parameters and system settings remained fixed in all experiments. The direct visual tracking method with illumination compensation described in Section II-B2 runs at an average of 0.45ms per iteration for tracking 270 × 110 images from the retina map. It is important to highlight that the forward-backward tracking strategy requires tracking to be executed twice per frame, which doubles the time requirement per frame.

To prove the value of the efficient pixel selection scheme, we varied the percentage of active pixels for a given video sequence and measured the number of iterations for tracking convergence. The length of the video was 1000 frames. We selected percentages of 100%, 70%, 50% and 20% (the lower bound suggested in [11]). Tracking results were practically equal for all percentages and the average number of iterations was 2.25, 2.21, 2.21 and 2.15, respectively. These results show the value of the pixel selection scheme for reducing the tracking computational load while maintaining high tracking accuracy. We repeated this experiment using five different sequences and the results systematically indicate equivalent tracking efficiency regardless of the chosen active pixel percentage. For this reason, we set the number of active pixels to 20% for all experiments.

B. Analysis of mapping results

A typical retina map reconstructed using the proposed method is shown in Figure 4. A visual comparison with images acquired by a 30° Canon CR-2 fundus camera attests the consistency of the reconstructed mosaic.

Fig. 4. Typical mapping results using the proposed method (left: reconstructed retina map; right: 30° fundus images). A comparison between the retina map and fundus images attests the consistency of the resulting map.

In Figure 5, we show performance numbers for typical operation conditions. The video corresponding to this plot can be found in the supplementary materials. In this example, we recorded the NCC score, which represents tracking quality, the number of required ESM iterations for tracking convergence, the horizontal and vertical tracked displacements in pixels and the map rotation (in degrees).

Analyzing the tracking confidence value, one can clearly see the robust tracking results using the proposed method in Section II-B2 in conjunction with the forward-backward tracking strategy described in Section II-B4. The maximum number of ESM iterations was set to 15 in all experiments and the threshold ε was set to 0.1, yet the average number of iterations required for convergence is 5.18 in this specific example. Only during more significant tracking disturbances is the maximum number of iterations reached. The relocalization step was not required in this specific example. From the plot, we also measured the vertical and horizontal displacements. In the most extreme case, the tracked inter-frame horizontal displacement was 15.25 pixels, which corresponds to approximately 15% of the width of the tracked template. This illustrates the ability to track fast motions using the proposed direct tracking approach. Finally, from the rotation θ plot, we can notice slight rotations due to head tilts. We can also detect two occasions where the patient blinks, which cause fast oscillations in the plot. Tracking successfully copes with such events.

Fig. 5. Tracking statistics for a typical mapping procedure. From top to bottom: the tracking confidence value (NCC), the number of ESM iterations for convergence, the tracked horizontal and vertical displacements in pixels (X-Disp, Y-Disp) and the map rotation θ (in degrees), plotted over the frames of the sequence.

C. Illumination compensation

Comparing the reconstructed mosaics using the proposed local illumination compensation method and the previous SCV-based tracking method [1], the advantages in tracking accuracy and robustness become very clear. In Figure 6, one can clearly notice the accuracy gain using the proposed method, which consequently reduces mapping drift. We also provide videos as supplementary material to illustrate the capability of creating larger mosaics using the local compensation method.

Fig. 6. Reconstructed mosaics (left) using the proposed SSD-based tracking method with local illumination compensation and (right) using the original SCV-based tracking method with global illumination compensation [1].

D. Forward-backward tracking

To show additional evidence of the value of the forward-backward strategy, in Figure 7 we plot the NCC scores of both forward and backward trackers for a given example. From the plots, we can identify in frames [2940, 2942] a confidence drop below the minimum threshold (λ = 0.85) in the forward tracker. During this interval, the backward tracking performance is significantly higher (the NCC score is above 0.98). Consequently, the forward-backward strategy is able to overcome such short-term tracking disturbances. Videos are provided as supplementary materials to illustrate the value of the proposed forward-backward strategy.

It is also important to highlight that if both backward and forward trackers do not perform well, tracking is deemed unreliable and the map relocalization step described in Section II-B5 is executed. In the real clinical scenario, the operator can always return to a previously mapped position to restart tracking. However, this slows down the mapping process. The forward-backward strategy helps improve the system usability, increasing the chances of its adoption by practitioners.

Fig. 7. The NCC score representing tracking confidence for both forward and backward trackers. During the interval [2940, 2942], the forward tracker confidence drops below the minimum threshold. Using the backward tracker, the disturbance is successfully bridged.

IV. CONCLUSION

In this paper, we describe a method for creating intra-operative retina maps using images acquired by a slit-lamp device. We propose three major improvements over the original tracking and mapping formulation in [1]. The first is a direct visual tracking method with a robust local illumination compensation technique capable of tracking in extreme illumination conditions. The second is a forward-backward tracking strategy for superior tracking resilience. The third is an efficient pixel selection scheme for reducing the computational load. The experimental results attest the value of the proposed methods and suggest their capability of generating high-quality intra-operative retina maps for navigation, diagnosis and documentation.

REFERENCES

[1] R. Richa, B. Vagvolgy, M. Balicki, G. Hager, and R. Taylor, "Hybrid Tracking and Mosaicking for Information Augmentation in Retinal Surgery," in Medical Image Computing and Computer-Assisted Intervention (MICCAI'12), Nice, France, 2012, pp. 397–404.
[2] A. Broehan, T. Rudolph, C. Amstutz, and J. Kowal, "Real-time Multimodal Retinal Image Registration for a Computer-Assisted Laser Photocoagulation System," IEEE Transactions on Biomedical Engineering (TBME), vol. 58, no. 10, pp. 2816–2824, 2011.
[3] M. Markov, G. Rylander, and A. Welch, "Real-time algorithm for retinal tracking," IEEE Transactions on Biomedical Engineering, vol. 40, no. 12, pp. 1269–1281, 1993.
[4] S. Barrett, M. Jerath, H. Rylander, and A. Welch, "Digital tracking and control of retinal images," Optical Engineering, vol. 33, no. 1, pp. 150–159, 1994.
[5] J. Berger and D. Shin, "Image-guided macular laser therapy: design considerations and progress toward implementation," in Ophthalmic Technologies IX - SPIE Medical Imaging, 1999, pp. 241–247.
[6] M. Hu, G. Penney, D. Rueckert, P. Edwards, F. Bello, M. Figl, R. Casula, Y. Cen, J. Liu, Z. Miao, and D. Hawkes, "A Robust Mosaicing Method with Super-Resolution for Optical Medical Images," in MIAR, ser. LNCS, vol. 6326. Springer Berlin, 2010, pp. 373–382.
[7] K. Loewke, D. Camarillo, W. Piyawattanameth, M. Mandella, C. Contag, S. Thrun, and J. Salisbury, "In Vivo Micro-Image Mosaicing," IEEE Transactions on Biomedical Engineering, vol. 58, no. 1, pp. 159–171, 2011.
[8] T. Vercauteren, A. Perchant, G. Malandain, X. Pennec, and N. Ayache, "Robust mosaicing with correction of motion distortions and tissue deformations for in vivo fibered microscopy," Medical Image Analysis, vol. 10, no. 5, pp. 673–692, 2006.
[9] G. Silveira and E. Malis, "Real-time Visual Tracking under Arbitrary Illumination Changes," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'07), Minneapolis, USA, 2007, pp. 1–6.
[10] Z. Kalal, K. Mikolajczyk, and J. Matas, "Forward-Backward Error: Automatic Detection of Tracking Failures," in International Conference on Pattern Recognition, 2010, pp. 2756–2759.
[11] M. Meilland, A. Comport, and P. Rives, "Dense visual mapping of large scale environments for real-time localization," in Proceedings of the IEEE Conference on Intelligent Robots and Systems (IROS'11), San Francisco, USA, 2011, pp. 4242–4248.
[12] G. Scandaroli, M. Meilland, and R. Richa, "Improving NCC-based Direct Visual Tracking," in Proceedings of the European Conference on Computer Vision (ECCV'12), Firenze, Italy, 2012, pp. 442–455.
[13] J. Lim and M. Yang, "A Direct Method for Modeling Non-Rigid Motion with Thin Plate Spline," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'05), Washington, USA, 2005, pp. 1196–1202.
[14] S. Baker and I. Matthews, "Lucas-Kanade 20 Years On: A Unifying Framework: Part 1," Robotics Institute, Carnegie Mellon University, Pittsburgh, USA, Tech. Rep., 2002.
[15] S. Benhimane, A. Ladikos, V. Lepetit, and N. Navab, "Linear and Quadratic Subsets for Template-Based Tracking," in IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'07), Minneapolis, USA, 2007, pp. 1–6.
[16] S. Benhimane and E. Malis, "Homography-based 2D Visual Tracking and Servoing," International Journal of Robotics Research (IJRR), vol. 26, no. 7, pp. 661–676, 2007.
[17] R. Szeliski and H. Shum, "Creating full view panoramic image mosaics and environment maps," in Proceedings of the 24th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH'97), New York, NY, USA, 1997, pp. 251–258.