
IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, VOL. 58, NO. 1, JANUARY 2011


In Vivo Micro-Image Mosaicing

Kevin E. Loewke*, David B. Camarillo, Wibool Piyawattanametha, Michael J. Mandella, Christopher H. Contag, Sebastian Thrun, and J. Kenneth Salisbury

Abstract—Recent advances in optical imaging have led to the development of miniature microscopes that can be brought to the patient for visualizing tissue structures in vivo. These devices have the potential to revolutionize health care by replacing tissue biopsy with in vivo pathology. One of the primary limitations of these microscopes, however, is that the constrained field of view can make image interpretation and navigation difficult. In this paper, we show that image mosaicing can be a powerful tool for widening the field of view and creating image maps of microanatomical structures. First, we present an efficient algorithm for pairwise image mosaicing that can be implemented in real time. Then, we address two of the main challenges associated with image mosaicing in medical applications: cumulative image registration errors and scene deformation. To deal with cumulative errors, we present a global alignment algorithm that draws upon techniques commonly used in probabilistic robotics. To accommodate scene deformation, we present a local alignment algorithm that incorporates deformable surface models into the mosaicing framework. These algorithms are demonstrated on image sequences acquired in vivo with various imaging devices including a hand-held dual-axes confocal microscope, a miniature two-photon microscope, and a commercially available confocal microendoscope.

Index Terms—Confocal microscopy, image mosaicing, in vivo pathology, optical biopsy.

I. INTRODUCTION

Over the last several years, a new class of imaging devices has emerged that can be used for noninvasive detection of cancerous and precancerous tissues at the cellular scale. These devices have the potential to revolutionize health care by bringing the microscope to the patient for in vivo pathology. One of the most promising technologies for enabling in vivo

Manuscript received May 5, 2010; revised July 23, 2010; accepted September 21, 2010. Date of publication October 7, 2010; date of current version December 17, 2010. The work of K. Loewke was supported by a National Science Foundation (NSF) Graduate Research Fellowship. Asterisk indicates corresponding author.
*K. E. Loewke is with the Department of Mechanical Engineering, Stanford University, Stanford, CA 94305 USA (e-mail: [email protected]).
D. B. Camarillo is with the Department of Bioengineering, Stanford University, Stanford, CA 94305 USA (e-mail: [email protected]).
W. Piyawattanametha is with the National Electronics and Computer Technology Center, Pathumthani 12120, Thailand, and also with the Advanced Imaging Research Center, Faculty of Medicine, Chulalongkorn University, Pathumwan, Bangkok 10330, Thailand (e-mail: [email protected]).
M. J. Mandella is with the Department of Electrical Engineering, Stanford University, Stanford, CA 94305 USA (e-mail: [email protected]).
C. H. Contag is with the Departments of Pediatrics, Microbiology and Immunology, and Radiology, Stanford University, Stanford, CA 94305 USA (e-mail: [email protected]).
S. Thrun is with the Department of Computer Science, Stanford University, Stanford, CA 94305 USA (e-mail: [email protected]).
J. K. Salisbury is with the Departments of Computer Science, Surgery, and Mechanical Engineering, Stanford University, Stanford, CA 94305 USA (e-mail: [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TBME.2010.2085082

Fig. 1. Image mosaic of colon tissue. Images were acquired in vivo with a Mauna Kea CellVizio microendoscope [2]. The mosaic shows a transition between dysplastic colonocytes, highlighted with a fluorescein-conjugated peptide, and normal colon mucosa. (a) Commercial microendoscope [1]. (b) Single input image. (c) Image mosaic composed of 22 images. Scale bar is 200 μm.

pathology is laser scanning microscopy (LSM), which can acquire high-resolution optical images of cellular and subcellular structures in a miniaturized form. Although miniature LSM can provide image resolution comparable to histopathology, many devices are inherently limited by a constrained field of view. This “tunnel-vision” effect can make operation difficult, as only a few hundred micrometers of tissue can be imaged at any given moment, whereas the user is often interested in much larger areas for context. In order to improve physician confidence during in vivo pathology, it will be necessary to visualize tissue at micrometer-scale resolution across centimeter-sized fields of view for greater tissue coverage.

One approach to widen the field of view is to apply image mosaicing techniques to stitch multiple images together. Fig. 1 shows an example image mosaic of human colon tissue acquired in vivo with a commercially available confocal microendoscope [2]. Image contrast was achieved by topically administering a fluorescent peptide designed to bind specifically to premalignant tissue. Images were acquired at 12 frames per second with 50-μm working distance, and 2.5-μm transverse and 20-μm axial resolution. The image mosaic enlarges the field of view and allows for improved interpretation by clearly showing the transition between colonic dysplasia and normal colon mucosa.

0018-9294/$26.00 © 2011 IEEE



Fig. 2. Examples of miniature LSMs. Example data for these devices are shown in Fig. 3. (a) NTROI 10-mm-diameter confocal microscope [4]. (b) NTROI 5-mm-diameter confocal microscope [6]. (c) Miniature two-photon microscope [7]–[9].

A. Laser Scanning Microscopy

While traditionally a tabletop technology, LSM has been miniaturized for in vivo applications through advances in fiberoptics and microelectromechanical systems (MEMS) technology [3]–[5]. The in vivo use of these devices is often referred to as “optical biopsy,” since they can be used as a noninvasive alternative to traditional tissue biopsy. Fig. 2 shows three examples of miniature LSMs, and Fig. 3 shows corresponding images acquired in vivo. Fig. 2(a) and (b) are a 10-mm and 5-mm version, respectively, of the dual-axes confocal microscope developed by the Network for Translational Research in Optical Imaging (NTROI) group, Stanford University [3], [4], [6]. The 10-mm version is hand-held and used for in vivo imaging of skin surfaces and ex vivo imaging of tissue samples, while the 5-mm version fits down the working channel of an endoscope for in vivo imaging of the gastrointestinal tract. In this paper, we present data acquired with the dual-axes confocal microscopes as well as a miniature two-photon fluorescence microscope [7]–[9], as shown in Fig. 2(c), and a commercially available confocal microendoscope [1], as shown in Fig. 1.

Fig. 3. In vivo image data acquired by the microscopes shown in Fig. 2. (a) Image of human skin acquired by the 10-mm NTROI microscope. (b) Image of human skin acquired by the 5-mm NTROI microscope. (c) Image of blood vessels in mouse brain acquired with a two-photon microscope.

B. Background on Image Mosaicing

Image mosaicing is an active field of research in computer vision and has found applications in several areas, such as panorama imaging, mapping, teleoperation, and virtual travel [10]. Traditionally, a mosaic is created by stitching two or more overlapping images together to create a single larger image through a process involving registration, warping, resampling, and blending. The central step, image registration, is used to precisely align the images and can be achieved through a combination of different techniques [11]–[13]. Image mosaicing is becoming an increasingly popular tool for confocal microscopy [14]–[17] and medical imaging in general [18]–[22].

A common problem in image mosaicing is dealing with cumulative image registration errors. That is, if the images are registered in a sequential pairwise fashion, alignment errors will propagate through the image chain, becoming most prominent when the path closes a loop or traces back upon itself. In [23], cumulative errors were dealt with for rotational panoramic mosaics using a block adjustment algorithm, where the error between the first and last overlapping images was distributed among all of the views. A more general approach was taken in [24], where a global objective function was used to optimize both the mosaic-to-frame alignment and the frame-to-frame alignment. A groupwise alignment method was developed in [25] to register multiple images simultaneously. A frame-to-frame estimate was used to initialize a graph structure in [26], and these estimates were refined using an iterative optimization procedure. This problem was addressed in [27] by building an overconstrained system of pairwise registrations that could be solved using linear optimization. This framework was extended in [16] and [28] by using transformations on a Riemannian manifold to model rigid-body motion.

One of the other challenges of image mosaicing, and in particular mosaicing in medical applications, is scene deformation. Most mosaicing algorithms assume static scenes and therefore use rigid transformations to align the images. In many medical applications, however, the assumption of a static scene may not apply. This becomes important when imaging with miniature LSCM for two reasons. First, deformations can occur when the imaging probe moves too quickly during image acquisition. This skew effect is a common phenomenon with scanning imaging devices, where the output image is not an instantaneous snapshot, but rather a collection of data points acquired at different times. Second, and more importantly, deformations can be induced through contact between the tissue surface and


Fig. 4. Summary of the techniques presented in Sections II–IV.

the imaging probe. Both of these issues were addressed in [16] and [28] for mosaicing of microscopic images by modeling the motion of the imaging probe. Frame-to-reference mappings were estimated for both the rigid motion between frames as well as the local deformations.

C. Outline

In this paper, we present a new set of algorithms for mosaicing of image sequences acquired with miniature LSMs. An overview of the steps involved in these techniques is presented in Fig. 4. We begin in Section II by presenting an efficient algorithm for pairwise image mosaicing. As first reported in [14], this can allow the user to “paint” an image mosaic in real time and aids navigation by localizing the current view with respect to the larger image map. The algorithm recovers camera motion using an optical flow routine and blends images together using multiresolution pyramidal blending. We demonstrate this algorithm on image sequences acquired in vivo with various imaging devices.

In Section III, we present a global image-registration algorithm that builds upon prior work in probabilistic robotics [29] and was first reported in [15]. The key idea is that rigid links between images are replaced by soft constraints that can be bent, but at a penalty. This algorithm results in a linear least-squares optimization problem that is similar to [27], as well as [16] and [28]. The key advantage of our approach is that it is easily modified to incorporate a local optimization algorithm for simultaneously handling cumulative errors and scene deformations. Here, we also present a new strategy for applying this global optimization in an incremental fashion, and present new applications toward in vivo mosaicing with a two-photon microscope.

In Section IV, we present a novel and computationally efficient local optimization algorithm that builds upon prior work in deformable 3-D scan registration [30]. As first reported in [15], this local optimization algorithm extends the mathematical framework of the global optimization algorithm to handle scene deformations by placing soft constraints at intermediate points within each individual image. Earlier methods have used an iterative refinement scheme, where a single image is deformed, the entire mosaic is updated, and then this process is repeated [16], [28]. The contribution of our approach is a computationally efficient framework, where all deformations are modeled and corrected for in a single noniterative optimization problem. Radial basis functions are used to unwarp the images after running the optimization. Our local optimization algorithm is applied toward in vivo imaging with a miniature confocal microendoscope.

II. PAIRWISE IMAGE MOSAICING

In this section, we present an algorithm for pairwise image mosaicing that can be implemented to run in real time. Our algorithm assumes that the motion between frames is small and primarily translational. In general, a good rule of thumb regarding motion is to achieve at least 80% overlap between images, which means that the velocity of the imaging probe should be less than 20% of the field of view multiplied by the frame rate. We do not explicitly model axial rotation, since we have found experimentally that, in our applications, the orientation of the imaging devices remains roughly constant. This is an assumption that is valid for some, but not all, applications of microimaging. Accordingly, in subsequent sections, we discuss how our optimization algorithms can be modified to incorporate axial rotation. We also assume that any fluctuations in imaging depth are small and negligible.

The pairwise algorithm consists of three steps: correcting for image distortions, image registration, and image blending. It can also be modified to include a color-mapping algorithm as discussed in [14].

A. Correcting for Image Distortions

LSMs are often subject to image distortion that results from the acquisition process [31]. For example, the dual-axes confocal microscopes shown in Fig. 2(a) and (b) use a 2-D MEMS mirror to scan the illumination and collection beams for image generation. The inner axis of the MEMS device (which corresponds to the vertical axis in the image) is driven using a sinusoidal motion at its resonant frequency. While this enables fast generation of raster-scan images, the resulting images are elongated due to the nonlinear rotational velocity of the scanning mirror combined with uniform pixel-sampling frequency. To unwarp the images, we apply a correction factor that is determined using the period of the scanning mirror and the frequency of the sampling clock [31]. Details for implementing this algorithm are deferred to [14]. Fig. 5(a) shows the results of applying this correction factor for an image of a calibration pattern, and Fig. 5(b) shows the results for an image of human skin acquired in vivo. To validate the results, we measured the sizes



Fig. 5. Images acquired with the 10-mm dual-axes confocal microscope, before and after correcting for scanning distortions. (a) Image of a calibration pattern before (left) and after (right) correcting for scanning distortions. (b) In vivo image of human skin before (left) and after (right) correcting for scanning distortions.

of the bars shown in Fig. 5(a). Before applying the correction, the heights of the vertical and horizontal bars had a standard deviation of roughly 13% and 18%, respectively. After applying the correction, these values were reduced to 6% and 9%.

B. Image Registration

Our method for image registration begins with an optical flow routine. While all of the devices used in this paper implement a raster-scanning technique, we expect that our image-registration algorithm would work with other scanning techniques, such as spiral or Lissajous scanning [32]. In order to track motion between successive image pairs, we use a pyramidal version of the Lucas–Kanade tracker [33]. This implementation uses a coarse-to-fine approach, where the images are downsampled using a Gaussian pyramid, and the flow is computed iteratively at each level of the pyramid. A disadvantage of this technique is that downsampling can smooth out fine features [34], but our datasets have usually contained enough larger features that can be tracked successfully to recover the global motion. The motion of features between image frames can be visualized using flow vectors, as shown in Fig. 6.

Once the motion vectors have been found, we convert them into a single translation vector. In most cases, there will be outliers, which are caused primarily by violations of the brightness constancy assumption (which can be mitigated using other techniques, such as [35]), as well as the fact that neighboring features can appear similar. Accordingly, we calculate the translation vector as the mean value of the motion vectors, while rejecting values outside two standard deviations.

Before selecting an image for mosaicing, we impose a minimum motion requirement. That is, if the computed translation vector does not exceed a predefined threshold (typically 5 pixels), the image is discarded.
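The conversion from flow vectors to a single translation, followed by the minimum motion check, can be sketched as follows. This is a minimal illustration rather than the implementation used in this paper: the function names `robust_translation` and `passes_min_motion` are hypothetical, the two-standard-deviation test is applied per axis (the text does not say whether rejection is per axis or on vector magnitude), and the threshold is compared against the Euclidean length of the translation.

```python
import math
from statistics import mean, pstdev


def robust_translation(flow_vectors, n_sigma=2.0):
    """Mean of the flow vectors after dropping per-axis outliers.

    flow_vectors: list of (dx, dy) tuples from the optical flow routine.
    A vector is kept only if both components lie within n_sigma standard
    deviations of the component means (assumed per-axis rejection).
    """
    xs = [v[0] for v in flow_vectors]
    ys = [v[1] for v in flow_vectors]
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    kept = [
        (x, y)
        for x, y in flow_vectors
        if abs(x - mx) <= n_sigma * sx and abs(y - my) <= n_sigma * sy
    ]
    if not kept:  # defensive fallback; keep everything
        kept = list(flow_vectors)
    return mean(x for x, _ in kept), mean(y for _, y in kept)


def passes_min_motion(translation, min_motion_px=5.0):
    """True if the translation exceeds the minimum motion threshold."""
    return math.hypot(translation[0], translation[1]) > min_motion_px
```

In this sketch, a frame whose estimated translation stays at or below `min_motion_px` is reported as not selected, which corresponds to discarding the image under the minimum motion requirement.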
This prevents an excess number of images from being added to the mosaic while the microscope is stationary or barely moving, which can significantly increase the number of images for subsequent optimization algorithms. An image is

Fig. 6. Examples of optical flow fields. The tails of the arrows indicate the location of features in the displayed image, and the heads of the arrows indicate the corresponding locations of features in previous images (not shown). That is, the microscope is moving in the direction away from the arrows.

selected for mosaicing if one of two events occurs: 1) the accumulated flow exceeds 5 pixels or 2) a maximum number of sequential images have been discarded (typically 10). After an image has been selected, we add it to the mosaic. In some cases, the estimated translation vector will still have a small amount of error. We therefore fine-tune the result using template matching with normalized cross correlation. The template matching searches small displacements of only a couple of pixels. It should be noted that template matching can often be used on its own as a suitable registration technique, but we have found experimentally that it can sometimes find an obviously incorrect solution, since the microimages are homogeneous in nature.

C. Image Blending

Our image blending routine uses a multiresolution pyramidal blending algorithm [36], where overlapping regions are decomposed into different frequency bands, merged at those frequency bands using a blending mask, and then recombined to form the composite image. First, the two images $I_1$ and $I_2$ are each decomposed into their Laplacian pyramids $L_i(I_1)$ and $L_i(I_2)$, where $i$ indicates the level of the pyramid. Then, the blending mask $I_M$ is decomposed into a Gaussian image pyramid $G_i(I_M)$. The Laplacian pyramid of the combined (or blended) image $I_B$ is constructed as

$$L_i(I_B) = L_i(I_1)\,G_i(I_M) + L_i(I_2)\,\bigl(1 - G_i(I_M)\bigr). \tag{1}$$
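As a minimal illustration of (1), the sketch below blends two 1-D signals rather than 2-D images. It is written under simplifying assumptions that are not taken from this paper: a two-sample box average stands in for the Gaussian smoothing kernel, expansion is done by sample repetition, and the signal length must be divisible by 2^(levels − 1). All function names are hypothetical.

```python
def reduce_level(signal):
    """Halve the resolution by averaging adjacent pairs (box filter stand-in)."""
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]


def expand_level(signal):
    """Double the resolution by repeating each sample."""
    return [v for v in signal for _ in range(2)]


def gaussian_pyramid(signal, levels):
    """Successively reduced copies of the signal, finest level first."""
    pyramid = [list(signal)]
    for _ in range(levels - 1):
        pyramid.append(reduce_level(pyramid[-1]))
    return pyramid


def laplacian_pyramid(signal, levels):
    """Band-pass levels: each level minus the expanded next-coarser level."""
    g = gaussian_pyramid(signal, levels)
    lap = [
        [a - b for a, b in zip(g[i], expand_level(g[i + 1]))]
        for i in range(levels - 1)
    ]
    lap.append(g[-1])  # coarsest level is kept as-is
    return lap


def blend_1d(i1, i2, mask, levels=3):
    """Blend per level as in (1), then reconstruct by expanding and summing."""
    l1 = laplacian_pyramid(i1, levels)
    l2 = laplacian_pyramid(i2, levels)
    gm = gaussian_pyramid(mask, levels)
    lb = [
        [a * m + b * (1 - m) for a, b, m in zip(x, y, w)]
        for x, y, w in zip(l1, l2, gm)
    ]
    out = lb[-1]
    for level in reversed(lb[:-1]):
        out = [a + b for a, b in zip(level, expand_level(out))]
    return out
```

A mask of all ones returns the first signal and a mask of all zeros returns the second; intermediate masks mix the two signals band by band.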

The final blended image is then reconstructed from its Laplacian image pyramid by expanding and then summing all levels of the pyramid. This procedure can be modified by using different numbers of frequency bands (i.e., levels in the pyramids). In addition, different types of masks can be used at different frequency bands, such as a binary mask at high frequencies to preserve sharp details in the composite image. This is a key advantage of pyramid blending, since weighted averaging can



Fig. 7. Image mosaic of human skin acquired in vivo at a depth of 60 μm, composed of roughly 300 images. Inset on left: input frame. Inset on right: corresponding area of mosaic with an improved SNR compared to the input frame. Scale bar is 85 μm.

often smooth away these details. However, it should be noted that other blending routines can be used, and in Section IV-C, we present results using both pyramid blending and simple averaging.

D. Results With a Hand-Held Confocal Microscope

Fig. 7 shows the results of applying our pairwise image mosaicing algorithm to a sequence of roughly 300 images of human skin obtained in vivo with the 10-mm dual-axes confocal microscope. This microscope acquires images at a rate of five frames per second with 5-μm transverse resolution, 7-μm axial resolution, and a field of view of 27 × 85 μm sampled at 200 × 500 pixels. The images were first corrected for scanning distortions, registered, and then stitched using pyramidal blending. Images were added to the mosaic if they passed the minimum motion threshold discussed in Section II-B. Fig. 7 also demonstrates how image mosaicing can increase the SNR. The degree of this effect depends on the amount of image overlap as well as the choice of the blending mask. In this example, a mask was used that gave more weight to the center of each new image and almost zero weight toward the edges.

III. GLOBAL OPTIMIZATION

Fig. 8. Fundamental idea: to deal with cumulative errors (represented by the misaligned bullseyes), rigid links between images are replaced by soft constraints, or springs. The spring resting positions come from the initial registrations. To close the loop, the springs are allowed to stretch, but at a penalty.

One of the main challenges associated with mosaicing from image to image is that alignment errors will propagate through the image chain [16], [24], [27]. This effect is most noticeable when the image chain traces back upon itself or closes a loop. Here, we present an efficient global image-alignment algorithm that draws upon a technique from probabilistic robotics [29]. The fundamental idea is that rigid links between images are replaced by soft constraints, or “springs.” These links can be bent, but bending them incurs a penalty. This idea is illustrated in Fig. 8.

The mathematical framework for our algorithms is as follows. Images are registered in 2-D image space, and each image location is denoted as

$$\mathbf{x}_k = (x_k, y_k)^T. \tag{2}$$

The estimated correspondence (from image registration) between two images is denoted as $\mathbf{x}_{k\to k+1}$. Images are registered in a sequential pairwise fashion, with link constraints placed between neighboring images

$$\mathbf{x}_{k\to k+1} = \mathbf{x}_{k+1} - \mathbf{x}_k. \tag{3}$$

When the image path attempts to close a loop or trace back upon an earlier imaged area, cumulative registration errors will cause a misalignment with the mosaic, thereby requiring additional link constraints. For example, if the image chain attempts to close the loop by stitching the $n$th image to the first image, we would have a constraint based on the estimated correspondence $\mathbf{x}_{n\to 1}$. The $n$th image would then have two constraints: one with the previous neighboring image, and the other with the first image. In a more general case, the $k$th image could overlap with either a neighboring image or any arbitrary location in the preexisting mosaic. This general constraint is of the form

$$\mathbf{x}_{k\to l} = \underbrace{\mathbf{x}_l - \mathbf{x}_k}_{=:\,\hat{\mathbf{x}}_{k\to l}}. \tag{4}$$
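The effect of cumulative error on a loop-closing constraint can be made concrete with a small sketch. The function names and the numbers in the usage note are hypothetical and chosen only for illustration: images are placed by summing the sequential offsets of (3), and the measured correspondence of a loop-closing link is then compared with the offset implied by the chain, as in (4).

```python
def chain_positions(pairwise_offsets):
    """Place images by summing sequential offsets; image 0 is at the origin."""
    positions = [(0.0, 0.0)]
    for dx, dy in pairwise_offsets:
        x, y = positions[-1]
        positions.append((x + dx, y + dy))
    return positions


def link_error(positions, k, l, measured):
    """Measured offset from image k to image l minus the offset x_l - x_k
    implied by the current image positions."""
    predicted = (
        positions[l][0] - positions[k][0],
        positions[l][1] - positions[k][1],
    )
    return (measured[0] - predicted[0], measured[1] - predicted[1])
```

For example, with sequential offsets `[(100.0, 1.0), (0.0, 100.0), (-100.0, 1.0)]` and a measured correspondence of `(0.0, -100.0)` from the last image back to the first, `link_error` reports a 2-pixel vertical discrepancy. This is the kind of accumulated error that the soft constraints introduced next are meant to absorb.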

To handle cumulative errors in the mosaic, we allow for a violation (or stretch) of these link constraints. To achieve this, we give each initial registration a probability distribution for the amount of certainty in the measurement. This distribution is assumed Gaussian (which is commonly used in probabilistic robotics for measurement uncertainties), and potentials are placed at each link between the $k$th and $l$th images [29], [30]

$$h_{k\to l} = |2\pi\Sigma_{k\to l}|^{-1/2} \exp\left\{-\tfrac{1}{2}\,(\mathbf{x}_{k\to l} - \hat{\mathbf{x}}_{k\to l})^T\, \Sigma_{k\to l}^{-1}\, (\mathbf{x}_{k\to l} - \hat{\mathbf{x}}_{k\to l})\right\} \tag{5}$$

where $\Sigma_{k\to l}$ is a covariance matrix that specifies the strength of the link. In practice, this is a diagonal matrix, where the x- and y-directions are weighted equally. These parameters are chosen based on the quality of initial registration, quantified by the sum-of-squared difference in pixel intensities (although normalized cross correlation or some other metric could alternatively be used). The negative logarithm of the potentials, summed over all links, is written as (constant omitted)

$$H = \sum_{k\to l} (\mathbf{x}_{k\to l} - \hat{\mathbf{x}}_{k\to l})^T\, \Sigma_{k\to l}^{-1}\, (\mathbf{x}_{k\to l} - \hat{\mathbf{x}}_{k\to l}). \tag{6}$$

Equation (6) represents the error between the initial image registration and the final image placement. By minimizing (6), we maximize (5), and thus maximize the probability of correct registration.

To minimize $H$, we set up a system of overdetermined linear equations that can be solved via linear least squares. We denote $\tilde{\mathbf{x}} \in \mathbb{R}^{2N\times 1}$ as the state vector containing all of the camera poses $\mathbf{x}_k$, where $N$ is the total number of poses. Then, we denote $\mathbf{u} \in \mathbb{R}^{2P\times 1}$ as the state vector containing all of the correspondence estimates $\mathbf{x}_{k\to l}$, where $P$ is the total number of constraints. Note that the first constraint fixes the first image to the origin of the reference frame. The matrix $\tilde{\Sigma} \in \mathbb{R}^{2P\times 2P}$ is a diagonal matrix with the weighting elements $\Sigma_{k\to l}$. To anchor the first image, we set the weight to $1/\infty$, which can be implemented with a small positive number. Finally, we denote the matrix $\mathbf{J} \in \mathbb{R}^{2P\times 2N}$ as the Jacobian of the motion equation (4) with respect to the camera poses $\mathbf{x}_k$. The likelihood function $H$ can then be rewritten as

$$H = (\mathbf{u} - \mathbf{J}\tilde{\mathbf{x}})^T\, \tilde{\Sigma}^{-1}\, (\mathbf{u} - \mathbf{J}\tilde{\mathbf{x}}). \tag{7}$$

Expanding (7), we get

$$H = \mathbf{u}^T \tilde{\Sigma}^{-1} \mathbf{u} - 2\,\mathbf{u}^T \tilde{\Sigma}^{-1} \mathbf{J}\tilde{\mathbf{x}} + \tilde{\mathbf{x}}^T \mathbf{J}^T \tilde{\Sigma}^{-1} \mathbf{J}\tilde{\mathbf{x}}. \tag{8}$$

In order to find the $\tilde{\mathbf{x}}$ that maximizes the probability of correct image registrations, we take the derivative of (8) and set it equal to zero. Making use of the derivative formulas for quadratic matrix equations, we get

$$\frac{\partial H}{\partial \tilde{\mathbf{x}}} = 0 = -2\,\mathbf{J}^T \tilde{\Sigma}^{-1} \mathbf{u} + \tilde{\mathbf{x}}^T \left[(\mathbf{J}^T \tilde{\Sigma}^{-1} \mathbf{J})^T + (\mathbf{J}^T \tilde{\Sigma}^{-1} \mathbf{J})\right]. \tag{9}$$

This can be rewritten as

$$\mathbf{J}^T \tilde{\Sigma}^{-1} \mathbf{J}\,\tilde{\mathbf{x}} = \mathbf{J}^T \tilde{\Sigma}^{-1} \mathbf{u} \tag{10}$$

which is of the form $\mathbf{A}\tilde{\mathbf{x}} = \mathbf{b}$ and can be solved using least squares. It should be noted that, although our optimization algorithm does not explicitly model axial rotations, it is straightforward to extend our methods to accommodate rotation. The main difference is that the Jacobian will be nonlinear, and the resulting optimization can be solved using a variety of methods [29], [30], [37].

A. Implementation Strategy

With the global optimization algorithm defined, we now need a strategy for implementation. This is straightforward when the image path completes a single loop. However, in many situations, we are faced with arbitrary image paths that can trace back upon any earlier imaged area. To deal with this type of arbitrary motion, one option would be to register all image-pair combinations and apply link constraints whenever a successful image registration is found. However, the computational demand for this can be prohibitive, especially for real-time applications. Instead, to correct for cumulative errors, we perform the following steps: first, images are registered in a pairwise fashion using the technique discussed in Section II. Then, the global position of each new image is checked against the global positions of neighboring images in the mosaic. If a sufficiently close image is found (typically a distance less than the image width), an additional image registration is performed between the two images to check for cumulative errors. Additional link constraints are placed if two criteria are satisfied: 1) the registration is successful (quantified by the residual error) and 2) the detected cumulative error is large enough (typically greater than a few pixels). After a certain number of cumulative errors have been detected (typically between 10 and 20), the optimization is performed. This process can then be repeated.
We have found that this method works well, but to make the system more robust and less sensitive to registration errors, one can use robust least squares to solve (10).
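The normal equations (10) can be assembled and solved in a few lines for the translation-only case. The following is a minimal sketch under stated assumptions rather than the implementation used in this paper. Because each link covariance is diagonal with equal x and y weights, the two axes decouple and are solved separately. Each link contributes a row of the Jacobian with −1 at image k and +1 at image l. The scalar `weight` plays the role of the inverse link variance, and the first image is anchored with a large weight, which corresponds to the small anchor covariance described in Section III. The names `solve_linear`, `global_align`, and `anchor_weight` are hypothetical, and a plain Gaussian elimination is used instead of a robust or sparse solver.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def global_align(n_images, links, anchor_weight=1e6):
    """Least-squares image positions from soft link constraints.

    links: list of (k, l, (dx, dy), weight), where (dx, dy) is the measured
    offset from image k to image l and weight is an inverse variance.
    """
    per_axis = []
    for axis in (0, 1):
        # Accumulate A = J^T W J and b = J^T W u for this axis.
        a = [[0.0] * n_images for _ in range(n_images)]
        b = [0.0] * n_images
        a[0][0] += anchor_weight  # anchor image 0 at the origin
        for k, l, offset, w in links:
            a[k][k] += w
            a[l][l] += w
            a[k][l] -= w
            a[l][k] -= w
            b[k] -= w * offset[axis]
            b[l] += w * offset[axis]
        per_axis.append(solve_linear(a, b))
    return list(zip(per_axis[0], per_axis[1]))
```

With four equally weighted links around a closed loop whose measured offsets sum to (0, 2) instead of (0, 0), the solution spreads the 2-pixel discrepancy evenly, stretching each link by 0.5 pixel. Lowering the weight of a poorly registered link would shift more of the correction onto that link.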



Fig. 9. Image mosaic before and after global optimization. The images show live mouse brain blood vessels acquired in vivo with a miniature two-photon microscope. The mosaic is composed of 305 images. There are several areas with cumulative errors, which are corrected after running the optimization. (a) Before optimization. (b) After optimization. (c) Global optimization at intermediate stages, with link constraints. Boxes show images detected to have cumulative errors.

B. Results With a Two-Photon Microscope

Fig. 9 shows the results of applying the global optimization algorithm to images of blood vessels in mouse brain acquired in vivo with a miniature two-photon microscope [7]–[9]. In this example, the microscope was positioned on a tabletop x–y stage 250 μm above the surface of the brain, and there was no direct tissue contact or scene deformation. To create the final mosaic, we performed the search strategy discussed in Section III-A. After ten cumulative errors were detected, the

global optimization was performed to update the mosaic and the procedure was repeated. In the final mosaic shown in Fig. 9, the global optimization was performed at four intermediate points during the processing of 305 images.

C. Deficiencies of Rigid Mosaicing

The example shown in Fig. 9 was a good dataset for demonstrating the global optimization algorithm because the microscope was moved slowly and was not in direct contact with the



Section III. Each image is partitioned into patches, with a local node assigned to the center of each patch. The location of a node is denoted by

$$\mathbf{x}_{i,k} = (x_{i,k}, y_{i,k})^T. \tag{11}$$

We then introduce two new sets of constraints on the local nodes within each image. The first set of constraints is based on a node’s relative position to its neighbors within an individual image

$$\delta\mathbf{x}_{i\to j,k} = \underbrace{\mathbf{x}_{j,k} - \mathbf{x}_{i,k}}_{=:\,\delta\hat{\mathbf{x}}_{i\to j,k}} \tag{12}$$

where $\delta\mathbf{x}_{i\to j,k}$ is a constant value that represents the nominal spacing between the nodes. The second set of constraints is based on a node’s relative position to the corresponding node in a neighboring image

Fig. 10. To model nonrigid deformations, images are partitioned into patches. Soft constraints, or springs, connect the patches within each image (as shown) as well as overlapping patches between neighboring images (not shown for clarity).

tissue. As a result, scene deformation was negligible. However, in other situations where controlled motion of the device is more difficult, such as imaging with flexible microendoscopes, scene deformation can be induced due to contact between the probe and the tissue surface. Accordingly, in the following section, we present a local optimization algorithm to correct for these local misalignments.

IV. LOCAL OPTIMIZATION

In this section, we present a method for integrating deformable surface models into the image mosaicing algorithms. This method builds upon the techniques discussed in Section III, and is similar to prior work in block matching [23] and bundle adjustment [24]. We begin by partitioning each image into patches, and assigning a node to the center of each patch. The nodes are spaced evenly on a regular grid. The number of patches depends on the amount of anticipated deformation, since too small a patch size will not be able to accurately recover larger deformations. Typically, four or nine patches are used. The key idea is that, in addition to the global constraints, or springs, between neighboring images, we place local constraints between the neighboring nodes within each image. As before, these constraints can be bent, but bending them incurs a penalty. We now have two sets of soft constraints: intraimage, which preserve image integrity and prevent an image from being deformed too much, and interimage, which model the deformation between neighboring images. Fig. 10 illustrates this concept. To measure the amount of deformation, we register the partitioned patches in each image with the corresponding patches in the earlier image using template matching with normalized cross correlation.
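The patch registration step can be illustrated with a small template-matching sketch. It assumes grayscale images stored as lists of rows and an exhaustive search over a few pixels around the expected patch location. The function names and the `search` radius are hypothetical, and this unoptimized loop is not the implementation used in this paper.

```python
import math


def ncc(a, b):
    """Normalized cross correlation of two equal-size patches (lists of rows)."""
    fa = [v for row in a for v in row]
    fb = [v for row in b for v in row]
    ma = sum(fa) / len(fa)
    mb = sum(fb) / len(fb)
    num = sum((x - ma) * (y - mb) for x, y in zip(fa, fb))
    den = math.sqrt(
        sum((x - ma) ** 2 for x in fa) * sum((y - mb) ** 2 for y in fb)
    )
    return num / den if den > 0 else 0.0  # flat patches carry no information


def crop(image, top, left, height, width):
    """Sub-image with its upper-left corner at (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]


def match_patch(template, image, top, left, search=2):
    """Best NCC match within +/- search pixels of the expected location.

    Returns (dy, dx, score), the displacement from (top, left) and its score.
    """
    h, w = len(template), len(template[0])
    best = (0, 0, -2.0)  # NCC is never below -1
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            t, l = top + dy, left + dx
            if t < 0 or l < 0 or t + h > len(image) or l + w > len(image[0]):
                continue  # candidate window falls outside the image
            score = ncc(template, crop(image, t, l, h, w))
            if score > best[2]:
                best = (dy, dx, score)
    return best
```

The displacement returned for each patch, added to the global offset between the two images, is one way to obtain the measured local deformation that enters the interimage constraints. The score could be used to down-weight patches that match poorly.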

$$\delta\mathbf{x}_{i,k\to l} = \underbrace{\mathbf{x}_{i,l} - \mathbf{x}_{i,k}}_{=:\,\delta\hat{\mathbf{x}}_{i,k\to l}} \tag{13}$$

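where $\delta\mathbf{x}_{i,k\to l}$ contains the measured local deformation. If no deformation is present, then the values of $\delta\mathbf{x}_{i,k\to l}$ are simply set to the global displacement between the two images. To accommodate nonrigid deformations in the scene, we allow for a violation of these local link constraints, and apply the familiar Gaussian potentials

$$g_{i\to j,k} = |2\pi\Theta_{i\to j,k}|^{-1/2} \exp\left\{-\tfrac{1}{2}\,(\delta\mathbf{x}_{i\to j,k} - \delta\hat{\mathbf{x}}_{i\to j,k})^T\, \Theta_{i\to j,k}^{-1}\, (\delta\mathbf{x}_{i\to j,k} - \delta\hat{\mathbf{x}}_{i\to j,k})\right\} \tag{14}$$

$$g_{i,k\to l} = |2\pi\Theta_{i,k\to l}|^{-1/2} \exp\left\{-\tfrac{1}{2}\,(\delta\mathbf{x}_{i,k\to l} - \delta\hat{\mathbf{x}}_{i,k\to l})^T\, \Theta_{i,k\to l}^{-1}\, (\delta\mathbf{x}_{i,k\to l} - \delta\hat{\mathbf{x}}_{i,k\to l})\right\} \tag{15}$$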

where $\Theta_{i \to j,k}$ and $\Theta_{i,k \to l}$ are diagonal matrices. The interimage constraints, given by $\Theta_{i,k \to l}$, are quantified by the sum-of-squared difference in pixel intensities. The intraimage constraints, given by $\Theta_{i \to j,k}$, reflect the rigidity of the surface, and are tunable based on the amount of allowable deformation. In practice, we set these to a fixed value roughly an order of magnitude larger than the interimage constraints. The negative logarithm of these potentials, summed over all links, is written as (constant omitted)

$$G = \sum_{i \to j,k} (\delta x_{i \to j,k} - \delta\hat{x}_{i \to j,k})^{T}\, \Theta_{i \to j,k}^{-1}\, (\delta x_{i \to j,k} - \delta\hat{x}_{i \to j,k}) + \sum_{i,k \to l} (\delta x_{i,k \to l} - \delta\hat{x}_{i,k \to l})^{T}\, \Theta_{i,k \to l}^{-1}\, (\delta x_{i,k \to l} - \delta\hat{x}_{i,k \to l}). \tag{16}$$

A. Local Optimization Algorithm

The mathematical framework for modeling the nonrigid deformations is similar to the optimization algorithm presented in Section III.
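To make the structure of a quadratic link cost such as (16) concrete, the sketch below minimizes a sum of terms of the form (x_b − x_a − δ̂)² / θ over node coordinates by weighted linear least squares. It is an illustration under stated assumptions rather than the algorithm of this paper: it treats a single coordinate (diagonal covariances let the x and y coordinates decouple), it omits the global term H, it fixes one node at zero to remove the translational ambiguity, and the name `solve_links` and the link tuple layout are hypothetical.

```python
import numpy as np

def solve_links(n_nodes, links, anchor=0):
    """Minimize sum((x[b] - x[a] - d)**2 / var) over node coordinates x.

    links  : iterable of (a, b, d, var) with node indices a and b, measured
             displacement d from a to b, and link variance var.
    anchor : index of the node held at 0 to fix the overall translation.
    """
    rows, rhs = [], []
    for a, b, d, var in links:
        w = 1.0 / np.sqrt(var)       # whitening weight for this link
        row = np.zeros(n_nodes)
        row[a], row[b] = -w, w       # encodes w * (x[b] - x[a])
        rows.append(row)
        rhs.append(w * d)
    A = np.array(rows)
    y = np.array(rhs)
    free = [i for i in range(n_nodes) if i != anchor]
    sol, *_ = np.linalg.lstsq(A[:, free], y, rcond=None)
    x = np.zeros(n_nodes)
    x[free] = sol
    return x

if __name__ == "__main__":
    # Nodes 0, 1 belong to one image and nodes 2, 3 to the next (hypothetical).
    links = [
        (0, 1, 10.0, 1.0),  # link within the first image (nominal grid spacing)
        (2, 3, 10.0, 1.0),  # link within the second image
        (0, 2, 5.0, 1.0),   # link between images, first node pair
        (1, 3, 7.0, 1.0),   # link between images, second node pair
    ]
    print(solve_links(4, links))
```

Because the two between-image measurements disagree, no assignment satisfies every link exactly; the solver spreads the discrepancy across the links in proportion to their variances, which is the sense in which the constraints can be bent at a penalty.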

Fig. 11. Example of image warping with radial basis functions. Left: original image. Right: warped image. Green triangles show the source landmarks, and red circles show the destination landmarks.

As earlier, we can rewrite G as a set of linear equations using state vectors and the Jacobian of the motion equations. Our new optimization algorithm now minimizes the combined target function

$$\min_{\delta x,\, x} \; (G + H) \tag{17}$$

to simultaneously recover the global image locations as well as the local scene deformation. As discussed in Section III, this framework can be extended to include axial rotations, resulting in a nonlinear optimization problem [29], [30], [37].

B. Warping With Radial Basis Functions

After running the optimization algorithm, each image must be unwarped based on the estimated deformation. For this, we use radial basis functions, which are commonly used in nonrigid medical image registration. Radial basis functions are smooth, radially symmetric functions centered on control points. Although several types are available, we have found that Gaussian-based functions work well for our images.

Radial basis functions use an interpolation function to transform pixel locations in a source image to pixel locations in a destination image. This interpolation function is derived by imposing constraints from a set of control points, or landmarks, and a set of destination points [38]. For our application, the control points are given by the centers of the patches within an image, while the corresponding destination points are recovered using the local optimization algorithm.

Fig. 11 shows an example of warping an image using this method. In this example, a set of nine control points is placed at intermediate locations within the image. Fig. 12 shows an example of blending two images before and after compensating for the nonrigid deformation. The images show tissue structures on the skin of a human hand acquired with the CellVizio microendoscope. The deformation was corrected by segmenting the images into four patches, aligning the patches, and warping one of the images using radial basis functions. The root-mean-square error (RMSE) of pixel intensity for the two images was 50.1 for rigid registration and 27.3 for nonrigid registration. The remaining RMSE is due to smaller deformations that are not recovered by our algorithm as well as signal noise.

Fig. 12. Comparison of rigid versus nonrigid registration. (a) Two images are globally aligned and averaged. (b) Nonrigid deformation is corrected by warping one of the images. Triangles show the original control points, and circles show their warped locations. The visual gain is noticeable on the upper half. (a) Rigid registration. Left: average, right: difference. (b) Nonrigid registration. Left: average, right: difference.

C. Results With a Confocal Microendoscope

Fig. 13 shows the results of applying our algorithm to images of skin tissue taken with a Mauna Kea CellVizio fibered confocal microendoscope [1] (model S-1500-5.0), with a frame rate of 12 images/s, field of view of 424 × 424 μm, and image size of 336 × 336 pixels. In this example, the images are blended using a simple average of overlapping pixels (although in subsequent examples we use pyramidal blending). The results show that our global alignment algorithm reduces cumulative errors that are most noticeable at the loop closure. To remove local misalignments that cause blurriness, we model the deformation using nine patches per image and run the complete optimization algorithm. The optimized mosaic shown in Fig. 13(C) is composed of 93 images and took 73 s to complete on an Intel Xeon 2.33-GHz processor.

Fig. 14 shows another example using the same experimental setup. In this case, the tissue stretch was substantially larger than in Fig. 13, and each image was, therefore, partitioned into four patches to accommodate larger deformations. The images were blended using pyramidal blending. The optimized mosaic in Fig. 14(B) is composed of 134 images and took 46 s to complete.

A final example is shown in Fig. 15, again using four patches per image and pyramidal blending. In this case, the image path does not close a loop. The optimized mosaic in Fig. 15(b) is composed of 80 images and took 19 s to complete.

Table I provides a summary of the quantitative results for our local optimization algorithm. We provide the average RMSE of pixel intensity, where the RMSE was calculated for each sequential image pair and then averaged over all pairs in a single mosaic. We report the RMSE for the global optimization (7) as well as the local optimization (17). For the local optimization algorithm, the images were warped prior to calculating the RMSE. As shown, the percent improvement ranges from roughly 5% to 20%. The remaining RMSE for the local optimization is primarily due to unrecovered smaller deformations and signal noise. Table I also compares the computation times for these mosaics.

Fig. 13. (A) Original image mosaic of tissue structures on the back of a human hand acquired with a microendoscope. There are 93 images in the mosaic. The image path starts at the top, and moves downward in a clockwise circle. The cumulative error is most noticeable where the path closes the loop, as shown in box 1. (B) After the global alignment optimization. The cumulative error is reduced, as shown in box 2. However, there are still local misalignments due to scene deformation, as shown in box 3. (C) After the complete optimization. By modeling the nonrigid deformation, the image mosaic appears sharper and small details are more noticeable.

Fig. 14. Mosaics of tissue on a human hand, composed of 134 images. The image path starts at the lower left, and moves counterclockwise. (A) Original image mosaic. The cumulative error is most noticeable where the path closes the loop, as shown in box 1. (B) After the global alignment optimization. The cumulative error is reduced, as shown in box 2. However, there are still local misalignments due to scene deformation, as shown in box 3. (C) After the complete optimization. By modeling the nonrigid deformation, the image mosaic appears sharper and small details are more noticeable, as shown in box 4. Lattices before and after optimization are shown for every few images. Scale bar in (A) is 400 μm.

Fig. 15. Top row: Mosaic of tissue on a human hand, composed of 80 images, with misalignments in boxes 1 and 2. Bottom row: After the optimization, with sharper details in boxes 3 and 4.

TABLE I
QUANTITATIVE RESULTS OF THE GLOBAL VERSUS LOCAL OPTIMIZATION ALGORITHMS

V. CONCLUSION

In this paper, we showed that image mosaicing can be a powerful tool for widening the field of view and creating image maps of microanatomical structures. We presented a set of algorithms that can be applied toward mosaicing of image sequences acquired by miniature LSMs as well as other devices. First, we presented an efficient algorithm for pairwise image mosaicing that can be implemented in real time. Second, to deal with cumulative errors, we presented a global alignment algorithm that draws upon techniques commonly used in probabilistic robotics. Third, to accommodate scene deformation, we presented a local alignment algorithm that incorporates deformable surface models into the mosaicing framework. The key contribution of this framework compared to the prior methods is that all deformations are modeled and corrected for in a single noniterative optimization problem. These algorithms were demonstrated on image sequences acquired in vivo with various imaging devices, including a hand-held dual-axes confocal microscope, a miniature confocal microendoscope, and a miniature two-photon microscope. In the future, we plan to explore the potential clinical impacts of this study for enhancing diagnosis and treatment during in vivo pathology.

ACKNOWLEDGMENT

This work was a collaboration between the Stanford BioRobotics Lab and the Stanford Network for Translational Research in Optical Imaging (NTROI), and the authors would like to thank members from both groups for advice, discussions, and technical assistance.

REFERENCES

[1] Mauna Kea Technologies Website. (2010). [Online]. Available: http://www.maunakeatech.com/.
[2] P. Hsiung, J. Hardy, S. Friedland, R. Soetikno, C. Du, A. Wu, P. Sahbaie, J. Crawford, A. Lowe, C. Contag, and T. Wang, “Detection of colonic dysplasia in vivo using a targeted heptapeptide and confocal microendoscopy,” Nature Med., vol. 14, pp. 454–458, 2008.
[3] J. Liu, M. Mandella, H. Ra, L. Wong, O. Solgaard, G. Kino, W. Piyawattanametha, C. Contag, and T. Wang, “Miniature near-infrared dual-axes confocal microscope utilizing a two-dimensional microelectromechanical systems scanner,” Opt. Lett., vol. 32, no. 3, pp. 256–258, 2007.
[4] W. Piyawattanametha, H. Ra, M. Mandella, J. Liu, L. Wong, C. Du, T. Wang, C. Contag, G. Kino, and O. Solgaard, “Three-dimensional in vivo real time imaging by a miniature dual-axes confocal microscope based on a two-dimensional MEMS scanner,” in Proc. Int. Conf. Solid-State Sens., Actuators, Microsyst., 2007, pp. 439–442.
[5] V. Becker, T. Vercauteren, C. von Weyern, C. Prinz, R. Schmid, and A. Meining, “High resolution miniprobe-based confocal microscopy in combination with video-mosaicing,” Gastrointestinal Endosc., vol. 66, no. 5, pp. 1001–1007, 2007.
[6] W. Piyawattanametha, M. Mandella, H. Ra, J. Liu, E. Garai, G. Kino, O. Solgaard, and C. Contag, “MEMS based dual-axes confocal clinical endoscope for real time in vivo imaging,” in Proc. Opt. MEMS Nanophoton., 2008, pp. 42–43.
[7] W. Piyawattanametha, R. Barretto, T. Ko, B. Flusberg, E. Cocker, H. Ra, D. Lee, O. Solgaard, and M. Schnitzer, “Fast-scanning two-photon fluorescence imaging based on a microelectromechanical systems two dimensional scanning mirror,” Opt. Lett., vol. 31, no. 13, pp. 2018–2020, 2006.
[8] K. Deisseroth, G. Feng, A. Majewska, G. Miesenbock, A. Ting, and M. Schnitzer, “Next-generation optical technologies for illuminating genetically targeted brain circuits,” J. Neurosci., vol. 26, no. 41, pp. 10 380–10 386, 2006.
[9] W. Piyawattanametha, E. Cocker, L. Burns, R. Barretto, J. Jung, H. Ra, O. Solgaard, and M. Schnitzer, “In vivo brain imaging using a portable 2.9 gram two-photon microscope based on a microelectromechanical systems scanning mirror,” Opt. Lett., 2009, to be published.
[10] R. Szeliski, “Video mosaics for virtual environments,” IEEE Comput. Graph. Appl., vol. 16, no. 2, pp. 22–30, Mar. 1996.
[11] L. Brown, “A survey of image registration techniques,” ACM Comput. Surv., vol. 24, no. 4, pp. 325–376, 1992.
[12] T. Vercauteren, X. Pennec, A. Perchant, and N. Ayache, “Symmetric log-domain diffeomorphic registration: A demons-based approach,” in Proc. MICCAI, 2008, pp. 754–761.
[13] T. Vercauteren, X. Pennec, A. Perchant, and N. Ayache, “Non-parametric diffeomorphic image registration with the demons algorithm,” in Proc. MICCAI, 2007, pp. 319–326.
[14] K. Loewke, D. Camarillo, W. Piyawattanametha, D. Breeden, and K. Salisbury, “Real-time image mosaicing with a hand-held dual-axes confocal microscope,” in Proc. SPIE, 2008, vol. 6851, pp. 68510F-1–68510F-9.
[15] K. Loewke, D. Camarillo, K. Salisbury, and S. Thrun, “Deformable image mosaicing for optical biopsy,” in Proc. ICCV, 2007, pp. 1–8.
[16] T. Vercauteren, A. Perchant, G. Malandain, X. Pennec, and N. Ayache, “Robust mosaicing with correction of motion distortions and tissue deformations for in vivo fibered microscopy,” Med. Image Anal., vol. 10, no. 5, pp. 673–692, 2006.
[17] Y. Patel, K. Nehal, I. Aranda, Y. Li, A. Halpern, and M. Rajadhyaksha, “Confocal reflectance mosaicing of basal cell carcinomas in Mohs surgical skin excisions,” J. Biomed. Opt., vol. 12, no. 3, p. 034027, 2007.
[18] K. Loewke, D. Camarillo, C. Jobst, and K. Salisbury, “Real-time image mosaicing for medical applications,” in Proc. MMVR, 2007, pp. 304–309.
[19] S. Seshamani, W. Lau, and G. Hager, “Real-time endoscopic mosaicking,” in Proc. MICCAI, 2006, pp. 355–363.
[20] S. Brekke, S. Rabben, A. Stoylen, A. Haugen, G. Haugen, E. Steen, and H. Torp, “Volume stitching in three-dimensional echocardiography: distortion analysis and extension to real time,” Ultrasound Med. Biol., vol. 33, no. 5, pp. 782–796, 2007.
[21] A. Can, C. Stewart, B. Roysam, and H. Tanenbaum, “A feature-based algorithm for joint, linear estimation of high-order image-to-mosaic transformations: Mosaicing the curved human retina,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 3, pp. 412–419, Mar. 2002.
[22] S. Atasoy, D. Noonan, S. Benhimane, N. Navab, and G. Yang, “A global approach for automatic fibroscopic video mosaicing in minimally invasive diagnosis,” in Proc. MICCAI, 2008, pp. 850–857.
[23] H. Y. Shum and R. Szeliski, “Construction of panoramic mosaics with global and local alignment,” Int. J. Comput. Vis., vol. 36, no. 2, pp. 101–130, 2000.
[24] H. Sawhney, S. Hsu, and R. Kumar, “Robust video mosaicing through topology inference and local to global alignment,” in Proc. ECCV, 1998, vol. 2, pp. 103–119.
[25] C. Wachinger, B. Glocker, J. Zeltner, N. Paragios, N. Komodakis, M. S. Hansen, and N. Navab, “Deformable mosaicing for whole-body MRI,” in Proc. MICCAI, 2008, pp. 113–121.
[26] S. Seshamani, M. Smith, J. Corso, M. Filipovich, A. Natarajan, and G. Hager, “Direct global adjustment methods for endoscopic mosaicking,” in Proc. SPIE, 2009, vol. 7261, pp. 1–9.
[27] J. Davis, “Mosaics of scenes with moving objects,” in Proc. CVPR, 1998, pp. 354–360.
[28] T. Vercauteren, A. Perchant, X. Pennec, and N. Ayache, “Mosaicing of confocal microscopic in vivo soft tissue video sequences,” in Proc. MICCAI, 2005, pp. 753–760.
[29] S. Thrun, W. Burgard, and D. Fox, Probabilistic Robotics. Cambridge, MA: MIT Press, 2005.
[30] D. Hähnel, S. Thrun, and W. Burgard, “An extension of the ICP algorithm for modeling nonrigid objects with mobile robots,” in Proc. IJCAI, 2003, pp. 915–920.
[31] M. Sanderson, “Acquisition of multiple real-time images for laser scanning microscopy,” Microsc. Anal., vol. 18, no. 4, pp. 17–23, 2004.
[32] J. Liu, M. Mandella, N. Loewke, H. Haeberle, H. Ra, W. Piyawattanametha, O. Solgaard, G. Kino, and C. Contag, “Micromirror-scanned dual-axis confocal microscope utilizing a gradient-index relay lens for image guidance during brain surgery,” J. Biomed. Opt., vol. 15, no. 2, pp. 026029-1–026029-5, 2010.
[33] J. Bouguet, “Pyramidal implementation of the Lucas Kanade feature tracker,” Intel Corporation, Microprocessor Research Labs, 2000.
[34] T. Brox, C. Bregler, and J. Malik, “Large displacement optical flow,” in Proc. CVPR, 2009, pp. 1–8.
[35] Y. Wu and J. Fan, “Contextual flow,” in Proc. CVPR, 2009, pp. 1–8.
[36] P. Burt and E. Adelson, “A multiresolution spline with application to image mosaics,” ACM Trans. Graph., vol. 2, no. 4, pp. 217–236, 1983.
[37] E. Olson, J. Leonard, and S. Teller, “Fast iterative alignment of pose graphs with poor initial estimates,” in Proc. ICRA, 2006, pp. 2262–2269.
[38] M. Fornefett, K. Rohr, and H. Stiehl, “Elastic registration of medical images using radial basis functions with compact support,” in Proc. CVPR, 1999, pp. 402–407.

Authors’ photographs and biographies not available at the time of publication.