Copyright © 2001 Society of Photo-Optical Instrumentation Engineers. This paper was published in Medical Imaging 2001: Visualization, Display, and Image-Guided Procedures, Proc. SPIE 4319 (San Diego, CA, February 17–22, 2001), pages 445–456, and is made available as an electronic preprint with permission of SPIE. One print or electronic copy may be made for personal use only. Systematic or multiple reproduction, distribution to multiple locations via electronic or other means, duplication of any material in this paper for a fee or for commercial purposes, or modification of the content of the paper are prohibited.
Augmented reality visualization of brain structures with stereo and kinetic depth cues: System description and initial evaluation with head phantom

Calvin R. Maurer, Jr.,a,b Frank Sauer,d Bo Hu,c Benedicte Bascle,d Bernhard Geiger,d Fabian Wenzel,d Filippo Recchi,b Torsten Rohlfing,a Christopher M. Brown,c Robert S. Bakos,a Robert J. Maciunas,a Ali Bani-Hashemid

a Department of Neurological Surgery, b Department of Biomedical Engineering, c Department of Computer Science, University of Rochester, Rochester, NY
d Siemens Corporate Research, Princeton, NJ
ABSTRACT

We are developing a video see-through head-mounted display (HMD) augmented reality (AR) system for image-guided neurosurgical planning and navigation. The surgeon wears an HMD that presents him with the augmented stereo view. The HMD is custom fitted with two miniature color video cameras that capture a stereo view of the real-world scene. We are concentrating specifically at this point on cranial neurosurgery, so the images will be of the patient's head. A third video camera, operating in the near infrared, is also attached to the HMD and is used for head tracking. The pose (i.e., position and orientation) of the HMD is used to determine where to overlay anatomic structures segmented from preoperative tomographic images (e.g., CT, MR) on the intraoperative video images. Two SGI 540 Visual Workstation computers process the three video streams and render the augmented stereo views for display on the HMD. The AR system operates in real time at 30 frames/sec with a temporal latency of about three frames (100 ms) and zero relative lag between the virtual objects and the real-world scene. For an initial evaluation of the system, we created AR images using a head phantom with actual internal anatomic structures (segmented from CT and MR scans of a patient) realistically positioned inside the phantom. When using shaded renderings, many users had difficulty appreciating overlaid brain structures as being inside the head. When using wire frames and texture-mapped dot patterns, most users correctly visualized brain anatomy as being internal and could generally appreciate spatial relationships among various objects. The 3-D perception of these structures is based on both stereoscopic depth cues and kinetic depth cues, with the user looking at the head phantom from varying positions. The perception of the augmented visualization is natural and convincing. The brain structures appear rigidly anchored in the head, manifesting little or no apparent swimming or jitter. The initial evaluation of the system is encouraging, and we believe that AR visualization might become an important tool for image-guided neurosurgical planning and navigation.

Keywords: Augmented reality, head-mounted display, calibration, registration, tracking, visualization.
Further author information: Send correspondence to C.R.M., Department of Neurological Surgery, University of Rochester, 601 Elmwood Avenue, Box 670, Rochester, NY 14642. F.S.: Siemens Corporate Research, 755 College Road East, Princeton, NJ 08540.

1. INTRODUCTION

In conventional image-guided surgery, one or more three-dimensional (3-D) images (e.g., CT, MR) are acquired preoperatively. At the time of surgery, image-to-physical space registration is performed (e.g., using a stereotactic frame, skin-affixed markers, or bone-implanted markers) to establish the one-to-one mapping or transformation between coordinates in the image and in physical space such that points in the two spaces that correspond to the same anatomic point are mapped to each other. Stereotactic procedures use the image-to-physical transformation to direct a needle (stereotactic biopsy) or energy (stereotactic radiosurgery) to a surgical target (e.g., tumor) located in the images.
Surgical navigation systems use the transformation to track in real time the changing position of a surgical probe or instrument on a display of the preoperative images. The use of a computer monitor for display of such information requires the surgeon using the system to look away from the surgical scene to see the position of the probe in the preoperative images. This situation is less than ideal for several reasons. The surgeon must mentally relate the surgical view and the computer display. It is potentially dangerous to look away from the surgical field while holding an instrument near vulnerable anatomy. The two-dimensional (2-D) nature of the monitor display limits the surgeon's ability to appreciate the 3-D structure of the image information relative to the surgical scene.

One way to address these limitations is with augmented reality (AR). In virtual reality (VR), the user's sensory input is totally replaced by computer-generated information. In AR, computer-generated stimuli augment the natural environment, which can be useful to facilitate an application or task. One form of enhancement is to use computer-generated graphics to add virtual objects (e.g., wire-frame models) to the real-world scene. AR systems in which information derived from pre- or intraoperative images is superimposed on the real-world scene are being developed for a wide variety of applications in surgery and medicine, including neurosurgery,10,14,18,19,25 cerebrovascular neurosurgery,23 ENT surgery,10,19 oral and maxillofacial surgery,2 breast biopsy16 and surgery,29 orthopaedic surgery,3 laparoscopy,15 endoscopy,21 and ophthalmology.1

We are developing a video see-through head-mounted display (HMD) AR system for image-guided neurosurgical planning and navigation. Our main goals are to accurately register virtual objects derived from preoperative images with the real-world scene viewed by the surgeon, to provide depth perception (with stereo disparity, motion parallax, perspective, and possibly shading) so that the surgeon can appreciate the 3-D structure of the image information relative to the surgical scene, and to provide the augmented views in real time with low latency and zero relative lag between the virtual objects and the real-world scene. With such a system, a neurosurgeon could look at a patient's head before starting a surgical procedure and see structures such as the brain surface, a tumor, blood vessels, etc., either individually or in some combination, inside the head. We believe that this may be of great benefit in planning and visualizing neurosurgical procedures.

Most applications of AR in neurosurgery are operating microscope systems in which preoperative image information is injected and merged with the real-world scene in a common focal plane. Early academic efforts14,25 and all current commercial systems are monochrome and monocular, i.e., they inject into only one of the two light paths, and they display only a planned surgical trajectory, a contour that is the intersection of the focal plane with the target, or a silhouette of the target.
One recent system overlays projections into both light paths of a stereo operating microscope, thereby achieving stereoscopic depth perception of 3-D virtual objects representing internal anatomic structures segmented from preoperative images.6,7,8,9,10,19 A different group is doing something conceptually similar in a head-mounted stereo operating microscope.2 Our approach uses a video see-through head-mounted display and is thus fundamentally different from the optical see-through approaches used in most other neurosurgical applications (see Ref. 28 for a comparison of optical vs. video see-through AR systems). One neurosurgical application augments the surgical scene by overlaying graphics on a single, stationary video camera image.18 Our approach is probably most similar to the breast biopsy system described in Ref. 16, except that we determine the pose of the surgeon's head relative to the patient by tracking fiducial markers in the real-world scene rather than by optically tracking the surgeon's head using infrared (IR) light emitting diodes (LEDs). It is thus in a sense a natural extension of previous work at the University of Rochester20 from an affine representation of virtual objects to a Euclidean representation.

In this paper, we describe our system and present an initial evaluation performed with a head phantom. The system at the University of Rochester is modified from and substantially similar to a working system developed at Siemens Corporate Research.30
2. METHODS

2.1. Design choices and system description

We chose to build a video see-through system rather than an optical see-through system. The latter approach would not require that the surgeon wear two video cameras on his head, would not require the processing of the corresponding pair of video streams, and would provide the real-world scene at much higher resolution than is possible with current cameras. A video see-through system, however, composites the augmented view in the computer and thus allows much more control over the result:
• Camera calibration is necessary for embedding virtual objects into video frames, because the geometric relationship among the various physical objects, the virtual objects, and the camera needs to be established to obtain correct views of the model object. Calibration of a video see-through system is relatively straightforward and can be quite accurate.
• An optical see-through system has an intrinsic time lag between the presentation of the real-world scene, which is immediate, and the presentation of the virtual objects from the corresponding viewpoint, which occurs later because of the time required to determine the head pose and to render and display the virtual objects. With a video see-through system, the real image and the virtual objects can be synchronized. There will be a latency, but there will be no relative time lag, and thus the virtual objects can appear as true static parts of the real-world scene and do not appear to float.
• A video see-through approach allows complete control of visualization. This makes it possible, for example, to easily match virtual and real image brightness and contrast. It also allows the system designer to provide tools to manipulate the real-world scene, which is particularly important for creating solutions to problems of poor depth perception caused by occlusion.
• The augmented view from a video see-through system is trivially shared with many people and thus has potential advantages with regard to surgical training and remote expert consultation.

Figure 1. Photographs of the HMD, camera mount, three cameras, and IR illuminator.

The user wears a custom video see-through HMD (see left and middle panels of Fig. 1). The HMD is a ProView XL35 (Kaiser Electronics, San Jose, CA). This HMD has the highest resolution of the currently commercially available LCD-based displays that we are aware of. For each eye, it features three LCD chips in the primary colors, which are combined optically to yield true XGA resolution (1024 × 768 pixels). This resolution is available in both mono and stereo (two independent signals) modes. The diagonal field of view (FOV) is 35 deg, the brightness is adjustable from 5 to 50 ft-lamberts, the contrast is 40:1, and the weight is approximately 1 kg.

We use two Panasonic GP-KS1000 color cameras to capture the real-world scene. These lightweight (camera head, 18 g) cameras in small "lipstick" format (17 mm diameter × 42 mm length) have a 0.5 in. CCD image sensor with 830 × 970 pixels. We attach telescopic lenses with f = 15 mm that cover a diagonal FOV of 30 deg. Thus the image is perceived as slightly magnified in the HMD, which has a 35 deg diagonal FOV.

Figure 2. Left panel: Photograph of calibration object. Right panel: Photograph of tracking plate and head phantom. The tracking plate is used for both tracking and calibration.

Although it is possible to track the pose of the real-world scene cameras using an optical or magnetic tracking system, as is done with most medical AR systems, we instead chose to determine the pose of the scene cameras by tracking fiducial markers in the real-world scene. We use a Sony XC-ST50 black-and-white camera for tracking. This camera weighs 110 g and has a 0.5 in. CCD image sensor with 768 × 494 pixels. We attach a wide-angle lens (Cosmicar Pentax C60402) with f = 4.2 mm that covers a diagonal FOV of 87 deg. Seventeen markers are arranged on a frame around the workspace of the real-world object, which will eventually be a patient but is currently in our case a head phantom (see right panel of Fig. 2). To simplify marker detection and localization, the tracking camera operates in the near IR region, which was achieved by replacing the visible-pass, IR-block filter that comes with the camera with a custom-ordered visible-block, IR-pass filter (830 nm cut-off wavelength). The markers are retroreflective (3M Scotchlite Silver Superbond 8830) circular disks (0.75 in. diameter) punched from a roll of material. They are illuminated with a custom-built IR illuminator consisting of twenty IR LEDs (Radio Shack 276-143, 5 mm diameter, 940 nm wavelength) circumferentially arranged on a printed circuit board and attached to the tracking camera wide-angle lens (see right panel of Fig. 1). The IR LEDs are continuously powered during operation.

The three cameras are rigidly held in a custom-built aluminum frame that can be mounted on either the HMD (see Fig. 1) or a tripod (see right panel of Fig. 2). The color real-world scene cameras are separated by approximately 65 mm, which is a typical interpupillary distance. The optical axes of these two cameras converge at a distance of approximately 0.6 m (our camera frame allows this distance to be adjusted, but we found that 0.6 m, which is approximately the distance to the end of an outstretched arm, seems to work best). The vertical angle between the scene cameras and the tracking camera is adjustable. We set it at approximately 15 deg for the work described in this paper so that when the scene cameras look at an object placed in front of the tracking plate, the tracking markers are approximately centered in the image of the tracking camera (see right panel of Fig. 2).

Accurate projection of a virtual object requires knowing precisely the combined effect of the object-to-world, world-to-camera, and camera-to-image transformations.13 Calculation of the first transformation is discussed in Section 2.2 below. The second and third transformations are obtained by tracking and camera calibration. We need to estimate the internal parameters of all three cameras and the transformations between the scene and the tracking camera coordinate systems. The tracking plate (see right panel of Fig. 2) is used for both tracking and calibration, the latter by adding thirty-one additional markers in a 3-D configuration (see left panel of Fig. 2). We use a freeware implementation∗ of Tsai's camera calibration algorithm.33 The 3-D world coordinates of each marker were measured by collecting points around the circumference using an Optotrak 3020 (Northern Digital, Inc., Waterloo, Ontario, Canada) and a probe with six IR LEDs. The circle that best fits the set of circumferential points in a least-squares sense was calculated17 and the 3-D marker position taken as the center of this circle. The 2-D image coordinates of the markers (intensity-weighted centroids) were determined in the three camera images. Each camera was calibrated independently, thus producing three sets of internal and external parameters. The two transformations between the scene and the tracking camera coordinate systems were determined from the three sets of external parameters.

∗ URL: http://www.cs.cmu.edu/cil/v-source.html.
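To make the transformation chain concrete, the sketch below shows how a vertex of a virtual object would be mapped through the object-to-world and world-to-camera transformations and then projected onto the image plane using the internal parameters estimated by the calibration. This is our own illustration, not the system's code: the type and function names are hypothetical, and lens distortion (which Tsai's model includes) is omitted.

```cpp
// Illustrative sketch (hypothetical code): project a virtual-object vertex into
// image pixels using object-to-world, world-to-camera, and camera-to-image
// (pinhole) transformations. Lens distortion is omitted for brevity.
struct Vec3 { double x, y, z; };

struct RigidTransform {            // rotation R (row-major 3x3) and translation t
    double R[3][3];
    double t[3];
    Vec3 apply(const Vec3& p) const {
        return { R[0][0]*p.x + R[0][1]*p.y + R[0][2]*p.z + t[0],
                 R[1][0]*p.x + R[1][1]*p.y + R[1][2]*p.z + t[1],
                 R[2][0]*p.x + R[2][1]*p.y + R[2][2]*p.z + t[2] };
    }
};

struct Intrinsics {                // internal parameters from camera calibration
    double fx, fy;                 // focal lengths in pixels
    double cx, cy;                 // principal point in pixels
};

// Map an object vertex to pixel coordinates (u, v) for one scene camera.
// objectToWorld comes from the registrations of Section 2.2; worldToCamera is
// derived from the tracked pose of the camera rig.
bool projectVertex(const Vec3& vObject,
                   const RigidTransform& objectToWorld,
                   const RigidTransform& worldToCamera,
                   const Intrinsics& K,
                   double& u, double& v)
{
    Vec3 pWorld  = objectToWorld.apply(vObject);
    Vec3 pCamera = worldToCamera.apply(pWorld);
    if (pCamera.z <= 0.0) return false;        // behind the camera
    u = K.fx * (pCamera.x / pCamera.z) + K.cx; // perspective division + pixel scaling
    v = K.fy * (pCamera.y / pCamera.z) + K.cy;
    return true;
}
```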
Real-time tracking was accomplished using optical guidelines. The markers are detected, identified, and localized in each video frame independently, without relying on information from the previous frame. Three bars, two vertical and one horizontal, serve as guidelines (see Fig. 3). The guidelines are detected very quickly with a few line scans. We then search a small area along these guidelines to find the markers. This approach is robust, since we do not require that markers be close to their positions in the previous video frame, and fast, since we are able to limit the search to a small area.
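The following sketch illustrates the kind of guideline-based search described above; it is a hypothetical illustration rather than the actual tracker code. It scans a narrow band of rows around a previously detected horizontal guideline, groups contiguous bright columns into blobs, and localizes each blob with an intensity-weighted centroid.

```cpp
// Illustrative sketch of guideline-based marker localization (hypothetical code).
// The IR image is assumed to be an 8-bit grayscale buffer; retroreflective
// markers appear as bright blobs near the guideline.
#include <algorithm>
#include <vector>

struct Marker2D { double u, v; };   // sub-pixel image coordinates

std::vector<Marker2D> findMarkersAlongRow(const unsigned char* image,
                                          int width, int height,
                                          int guidelineRow,     // row found by the line scans
                                          int searchHalfHeight, // half-height of search band
                                          unsigned char threshold)
{
    std::vector<Marker2D> markers;
    int rowMin = std::max(0, guidelineRow - searchHalfHeight);
    int rowMax = std::min(height - 1, guidelineRow + searchHalfHeight);

    int col = 0;
    while (col < width) {
        // Skip columns with no bright pixel in the search band.
        bool bright = false;
        for (int r = rowMin; r <= rowMax && !bright; ++r)
            bright = image[r * width + col] > threshold;
        if (!bright) { ++col; continue; }

        // Accumulate an intensity-weighted centroid over the contiguous bright run.
        double sumI = 0.0, sumU = 0.0, sumV = 0.0;
        int runStart = col;
        for (; col < width; ++col) {
            bool any = false;
            for (int r = rowMin; r <= rowMax; ++r) {
                unsigned char I = image[r * width + col];
                if (I > threshold) { any = true; sumI += I; sumU += I * col; sumV += I * r; }
            }
            if (!any) break;                      // end of this blob
        }
        if (col - runStart >= 2 && sumI > 0.0)    // ignore single-pixel noise
            markers.push_back({ sumU / sumI, sumV / sumI });
    }
    return markers;
}
```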
Figure 3. Real-time tracking with optical guidelines. Left panel: Image of the tracking plate from the black-and-white tracking camera. Right panel: Processed image showing detected and localized marker positions.

We use two SGI Visual Workstation 540 computers. One advantage of these machines is that they have a unified memory architecture, i.e., main, video, graphics, and texture memory all share the same physical memory. This feature helps us meet our design goal of providing augmented views in real time by reducing the amount of information that has to be transferred among the various functional components of the machine. Each machine comes standard with one analog video input channel. We use the optional Digital Video Board that supports two additional video streams in serial digital format. The color real-world scene camera outputs are fed into the analog video input channels. The black-and-white tracking camera output is first converted to serial digital format with an ASD-101i (Miranda Technologies, Inc., Ville Saint-Laurent, Quebec, Canada) analog-to-digital video converter and then fed into the digital video input channel on the optional Digital Video Board. The incoming video streams are genlocked (i.e., captured simultaneously). We record the MSC/UST (media sequence count/unadjusted system time) timestamp for each video frame and use this timestamp to synchronize the real-world scene images with the camera pose calculated from the tracking camera image.

The software is written in C++ and runs under the Windows 2000 operating system. Video programming was written using the SGI Digital Media Software Development Kit library. The tracking program and the two augmentation programs (one each for the left and right scene images) are implemented as separate threads. These programs need to exchange information regarding camera pose, synchronization, and the choice of virtual objects for augmentation. We implement communication with TCP/IP using the Windows socket interface. The tracking and left augmentation programs run on one computer, and communication takes place within the computer. The right augmentation program runs on the other computer, and communication occurs over an Ethernet link between the two machines. We found that message transfer takes about 1–3 ms for both types of communication.

We create the augmented scene by rendering the virtual objects using OpenGL. We display the view of the OpenGL camera. The real-world scene video is texture mapped onto a "virtual screen" in the background of the augmented scene. The video images are scaled from the capture format of 720 × 468 to the XGA output format of 1024 × 768 via texture mapping, which is hardware supported.
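As an illustration of the background "virtual screen," the sketch below shows how a captured video frame can be drawn behind the virtual objects with legacy OpenGL. It is a simplified example under our own assumptions (the texture is allocated elsewhere, and the SGI digital-media capture code is omitted), not the system's actual rendering code.

```cpp
// Illustrative sketch (hypothetical code): draw a captured video frame as a
// background "virtual screen" with legacy OpenGL, behind the virtual objects.
// The texture handle is assumed to have been allocated with glTexImage2D elsewhere.
#include <GL/gl.h>

void drawVideoBackground(GLuint videoTexture,
                         const unsigned char* frameRGB,  // captured scene frame
                         int frameWidth, int frameHeight)
{
    // Upload the latest frame into the previously allocated texture.
    glBindTexture(GL_TEXTURE_2D, videoTexture);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, frameWidth, frameHeight,
                    GL_RGB, GL_UNSIGNED_BYTE, frameRGB);

    // Draw a screen-filling quad with depth writes disabled so that the
    // virtual objects rendered afterwards always appear in front of it.
    glMatrixMode(GL_PROJECTION); glPushMatrix(); glLoadIdentity();
    glOrtho(0.0, 1.0, 0.0, 1.0, -1.0, 1.0);
    glMatrixMode(GL_MODELVIEW);  glPushMatrix(); glLoadIdentity();

    glDepthMask(GL_FALSE);
    glEnable(GL_TEXTURE_2D);
    glBegin(GL_QUADS);
        // Texture coordinates are flipped vertically for top-down video rows.
        glTexCoord2f(0.0f, 1.0f); glVertex2f(0.0f, 0.0f);
        glTexCoord2f(1.0f, 1.0f); glVertex2f(1.0f, 0.0f);
        glTexCoord2f(1.0f, 0.0f); glVertex2f(1.0f, 1.0f);
        glTexCoord2f(0.0f, 0.0f); glVertex2f(0.0f, 1.0f);
    glEnd();
    glDisable(GL_TEXTURE_2D);
    glDepthMask(GL_TRUE);

    glPopMatrix();                       // restore modelview
    glMatrixMode(GL_PROJECTION); glPopMatrix();
    glMatrixMode(GL_MODELVIEW);
}
```

The quad fills the viewport, so the 720 × 468 frame is scaled to the 1024 × 768 output resolution by the hardware texture filtering, consistent with the description above.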
2.2. Head phantom

We use a head phantom (Pacific Research Laboratories, Inc., Vashon Island, WA) consisting of a hard plastic skull and soft rubber skin. We acquired a CT image† of the phantom after attaching eight skin-affixed markers (multimodality radiographic markers, IZI Medical Products, Baltimore, MD) to it. Internal anatomic structures were segmented from MR‡ and CT§ images of a patient with a recurrent right parietal glioblastoma multiforme.
† The phantom CT image was acquired using a GE HiSpeed Advantage CT scanner. There are 94 slices that are 3 mm thick with no interslice gap or slice overlap. Each slice contains 512 × 512 pixels of size 0.49 × 0.49 mm. Table advance was conventional and the gantry tilt angle was zero.
Figure 4. Left panel: Slice from MR image of a patient. Middle panel: Slice from CT image of the same patient. Right panel: Slice from CT image of the head phantom. Six skin-affixed markers can be seen in this slice.

Figure 4 shows sample slices from these images. The right panel of Fig. 5 shows a photograph of the head phantom with the skin-affixed markers. The tumor and lateral, third, and fourth ventricles were manually segmented from the MR image. Contours (polygons) were interactively created using an independent implementation of live wire,11,24 and a 3-D triangle set representation of the surface was generated from the 2-D contours using nuages¶.4,5 Isointensity triangle set representations of the skull and skin surfaces were automatically segmented from the patient and phantom CT images, respectively, using the marching cubes algorithm22 as implemented in Amira (TGS, Inc., San Diego, CA). The triangle sets were decimated using Amira to produce polyhedra with approximately 9,000 (skin), 9,000 (skull), 800 (tumor), 1,500 (lateral ventricles), and 1,000 (third and fourth ventricles) triangles.

All of the triangle sets were mapped to the physical space (world) coordinate system using three transformations: patient MR-to-patient CT (image-to-image), patient CT-to-phantom CT (image-to-object), and phantom CT-to-physical space (object-to-world). The first two transformations were obtained using an independent implementation26,27 of the intensity-based image registration algorithm described in Refs. 31 and 32. This implementation optimizes the "normalized mutual information" (NMI) similarity measure.32 Since the patient MR and CT images were obtained from the same patient, we determined the patient MR-to-patient CT mapping by searching for the six parameters of the rigid-body transformation that best matches the patient MR image to the patient CT image. The left panel of Fig. 5 illustrates the quality of the transformation we obtained.

Ideally we would like to have a head phantom manufactured from the patient's MR or CT image (e.g., a stereolithography model). The particular head phantom we use is manufactured from a person's CT image, but not this patient's CT image. Our objective in this paper is to create AR images using a head phantom with actual internal anatomic structures that are realistically positioned inside the phantom. We accomplished this by searching for the nine parameters of the affine transformation (rigid-body transformation plus anisotropic scale factors) that best matches the patient CT image to the phantom CT image. The middle panel of Fig. 5 illustrates the quality of the transformation we obtained. Visual inspection suggests that the transformation is realistic. The registration appears to succeed reasonably well because it can match the plastic bone of the head phantom with the bone of the patient. The phantom CT-to-physical space transformation was obtained using point-based registration.

‡ The patient MR image was acquired using a GE 1.5 T Signa Echospeed MR scanner with version 5.8 configuration. A transverse T1-weighted image was acquired using the head coil and an incoherent 3-D gradient-echo (RF spoiled, SPGR) sequence with TE = 2 ms, TR = 9 ms, and flip angle = 30 deg. There are 128 slices that are 1.5 mm thick with no interslice gap. Each slice contains 256 × 256 pixels of size 0.98 × 0.98 mm.
§ The patient CT image was acquired using a GE HiSpeed Advantage CT scanner. There are 46 slices that are 3 mm thick with no interslice gap or slice overlap. Each slice contains 512 × 512 pixels of size 0.67 × 0.67 mm. Table advance was helical and the gantry tilt angle was zero.
¶ URL: http://www-sop.inria.fr/prisme.
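Mapping the MR-derived triangle sets into the physical space (world) coordinate system amounts to composing the three transformations listed above. The sketch below illustrates this composition with 4 × 4 homogeneous matrices; the type and function names are our own, and a column-vector convention is assumed.

```cpp
// Illustrative sketch (hypothetical code): compose the three registration
// transformations as 4x4 homogeneous matrices and map MR-derived triangle
// vertices into the physical (world) coordinate system.
// Column-vector convention: worldPoint = M * mrPoint.
#include <vector>

struct Mat4 {
    double m[4][4];
    static Mat4 multiply(const Mat4& A, const Mat4& B) {   // C = A * B
        Mat4 C{};
        for (int i = 0; i < 4; ++i)
            for (int j = 0; j < 4; ++j) {
                C.m[i][j] = 0.0;
                for (int k = 0; k < 4; ++k)
                    C.m[i][j] += A.m[i][k] * B.m[k][j];
            }
        return C;
    }
};

struct Point3 { double x, y, z; };

Point3 transformPoint(const Mat4& M, const Point3& p) {
    return { M.m[0][0]*p.x + M.m[0][1]*p.y + M.m[0][2]*p.z + M.m[0][3],
             M.m[1][0]*p.x + M.m[1][1]*p.y + M.m[1][2]*p.z + M.m[1][3],
             M.m[2][0]*p.x + M.m[2][1]*p.y + M.m[2][2]*p.z + M.m[2][3] };
}

// mrToCt:         rigid-body (6 parameters), intensity-based NMI registration
// ctToPhantom:    affine (9 parameters: rigid-body plus anisotropic scaling)
// phantomToWorld: rigid-body, point-based fiducial registration
void mapTriangleSetToWorld(std::vector<Point3>& vertices,
                           const Mat4& mrToCt,
                           const Mat4& ctToPhantom,
                           const Mat4& phantomToWorld)
{
    Mat4 mrToWorld = Mat4::multiply(phantomToWorld,
                       Mat4::multiply(ctToPhantom, mrToCt));
    for (Point3& v : vertices)
        v = transformPoint(mrToWorld, v);
}
```

Structures segmented directly from the CT images (skull, skin) would use the correspondingly shorter chains.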
Figure 5. Left panel: Illustration of the quality of the patient MR-CT image rigid-body registration. Bone segmented from CT by thresholding is overlaid in red on the MR image using alpha blending. Middle panel: Illustration of the quality of the patient-phantom CT image affine registration. Bone and skin segmented from the phantom CT by thresholding are overlaid in red on the patient CT using alpha blending. Right panel: Photograph of head phantom showing skin-affixed markers used for phantom CT-physical space rigid-body registration.
Figure 6. Visual assessment of registration accuracy. The wire-frame model of the skin surface that was segmented from the phantom CT scan appears to accurately match the real skin surface. Three red spheres whose centers are the fiducial marker positions appear to accurately match the real skin-affixed markers.

The skin-affixed fiducial markers are thin disks, each with a cylindrical hole in the center (see right panel of Fig. 5). We define the fiducial point as the intersection of the axis of the cylindrical hole with the skin surface. The markers were manually localized in the phantom CT image. The marker positions in physical space were determined using an Optotrak 3020 and a probe with six IR LEDs. For an overview of point-based and intensity-based image registration see Ref. 12.
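For illustration, the point-based rigid-body registration of the CT fiducial positions to the Optotrak-measured physical positions can be computed with the standard SVD-based least-squares solution, sketched below. The use of the Eigen library for linear algebra is our assumption for the example, not part of the original system; see Ref. 12 for background on this class of methods.

```cpp
// Illustrative sketch of point-based rigid registration (fiducial positions in
// the CT image to physical-space positions). Standard SVD-based least-squares
// solution; Eigen is used here only for illustration.
#include <vector>
#include <Eigen/Dense>

struct RigidResult {
    Eigen::Matrix3d R;   // rotation
    Eigen::Vector3d t;   // translation: physical = R * image + t
};

RigidResult registerPoints(const std::vector<Eigen::Vector3d>& imagePts,
                           const std::vector<Eigen::Vector3d>& physicalPts)
{
    const size_t n = imagePts.size();
    Eigen::Vector3d cImg = Eigen::Vector3d::Zero(), cPhy = Eigen::Vector3d::Zero();
    for (size_t i = 0; i < n; ++i) { cImg += imagePts[i]; cPhy += physicalPts[i]; }
    cImg /= double(n);  cPhy /= double(n);

    // Cross-covariance of the demeaned point sets.
    Eigen::Matrix3d H = Eigen::Matrix3d::Zero();
    for (size_t i = 0; i < n; ++i)
        H += (imagePts[i] - cImg) * (physicalPts[i] - cPhy).transpose();

    Eigen::JacobiSVD<Eigen::Matrix3d> svd(H, Eigen::ComputeFullU | Eigen::ComputeFullV);
    Eigen::Matrix3d R = svd.matrixV() * svd.matrixU().transpose();
    if (R.determinant() < 0.0) {               // guard against a reflection
        Eigen::Matrix3d V = svd.matrixV();
        V.col(2) *= -1.0;
        R = V * svd.matrixU().transpose();
    }

    RigidResult result;
    result.R = R;
    result.t = cPhy - R * cImg;
    return result;
}
```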
3. RESULTS

The AR system operates in real time at 30 frames/sec with a temporal latency of about three frames (100 ms) and zero relative lag between the virtual objects and the real-world scene. As mentioned in the previous section, we eliminated the relative lag between the virtual and real components by synchronizing the tracking camera's pose information with the video images of the real-world scene.

For an initial evaluation of the system, we created AR images using a head phantom with actual internal anatomic structures (segmented from CT and MR scans of a patient) realistically positioned inside the phantom and/or the skin surface of the phantom (segmented from a CT scan of the phantom).
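The synchronization can be pictured with a small sketch: poses estimated from tracking-camera frames are stored under the media sequence count (MSC) of the genlocked frame, and each real-world scene frame is augmented with the pose stored under the same MSC. The buffer below is a hypothetical illustration of this bookkeeping only, not the actual multi-threaded, multi-machine implementation.

```cpp
// Illustrative sketch (hypothetical code) of timestamp-based synchronization:
// a pose computed from a tracking-camera frame is stored under that frame's
// media sequence count (MSC); the augmentation thread renders a scene frame
// with the pose carrying the same MSC, so the virtual objects share the video's
// latency and exhibit no relative lag.
#include <map>
#include <mutex>

struct CameraPose { double R[3][3]; double t[3]; };

class PoseBuffer {
public:
    void store(long long msc, const CameraPose& pose) {
        std::lock_guard<std::mutex> lock(mutex_);
        poses_[msc] = pose;
        while (poses_.size() > 8)            // keep only a few frames of history
            poses_.erase(poses_.begin());
    }
    bool lookup(long long msc, CameraPose& pose) const {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = poses_.find(msc);
        if (it == poses_.end()) return false; // pose not (yet) available
        pose = it->second;
        return true;
    }
private:
    mutable std::mutex mutex_;
    std::map<long long, CameraPose> poses_;   // MSC -> pose
};
```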
Figure 7. Depth cues. There are two primary depth cues in our system—stereo disparity and motion parallax. The left and right panels illustrate an augmented image from two different orientations.
Figure 8. Occlusion problem. When using shaded renderings, most users correctly visualize brain anatomy as being internal if there are no occlusions, e.g., the brain tumor shown in the left panel. However, only a few users correctly visualize brain anatomy as being internal if there is a surface occlusion such as a skin-affixed marker or an ear, e.g., the third and fourth ventricles shown in the middle panel. The occlusion problem is worse when multiple virtual objects are displayed, as illustrated in the right panel. No user is able to correctly visualize brain anatomy as being internal if a person's hand is placed in front of the virtual object.

The internal anatomic structures (tumor; lateral, third, and fourth ventricles; and skull) and the skin are represented as triangle sets. These virtual objects are displayed as either shaded surfaces, wire frames, or point sets (triangle set vertices). We also display a virtual biopsy needle as a long, thin cylinder.

We have performed only minimal tests of the accuracy of the system. The camera calibration 2-D projection error is generally less than 1 mm. A few preliminary tests suggest that the 3-D reprojection error of the system is less than 2 mm. We visually assessed the total system error, which includes the phantom image-to-physical space registration error, by examining an augmented image of the head phantom with a wire-frame model of the skin surface plus three red spheres whose centers are the fiducial marker positions. This assessment was carried out from many orientations. One sample image is shown in Fig. 6. The wire-frame model appears to accurately match the real skin surface. The three red spheres appear to accurately match the real skin-affixed markers. Our visual assessment suggests that the total system error is on the order of 2 mm.
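For illustration, the three display modes used for the triangle sets (shaded surface, wire frame, point set) can be selected in legacy OpenGL roughly as sketched below; the vertex layout and function are hypothetical, not the system's code.

```cpp
// Illustrative sketch: drawing a triangle set as a shaded surface, a wire
// frame, or a point set with legacy OpenGL. Vertex/index layout is hypothetical.
#include <GL/gl.h>

enum class DisplayMode { Shaded, WireFrame, Points };

void drawTriangleSet(const float* vertices,       // xyz triples
                     const float* normals,        // per-vertex normals
                     const unsigned int* indices, // three indices per triangle
                     int numTriangles,
                     DisplayMode mode)
{
    switch (mode) {
        case DisplayMode::Shaded:
            glEnable(GL_LIGHTING);
            glPolygonMode(GL_FRONT_AND_BACK, GL_FILL);
            break;
        case DisplayMode::WireFrame:
            glDisable(GL_LIGHTING);
            glPolygonMode(GL_FRONT_AND_BACK, GL_LINE);
            break;
        case DisplayMode::Points:
            glDisable(GL_LIGHTING);
            glPolygonMode(GL_FRONT_AND_BACK, GL_POINT);
            break;
    }

    glBegin(GL_TRIANGLES);
    for (int i = 0; i < 3 * numTriangles; ++i) {
        unsigned int v = indices[i];
        glNormal3fv(&normals[3 * v]);
        glVertex3fv(&vertices[3 * v]);
    }
    glEnd();

    glPolygonMode(GL_FRONT_AND_BACK, GL_FILL);    // restore default fill mode
}
```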
Figure 9. Non-opaque overlays. Virtual objects can be displayed as shaded surfaces (left panel), wire-frame models (middle panel), or texture-mapped dot patterns (right panel).

The 3-D perception of the internal anatomic structures is based on both stereoscopic depth cues and kinetic depth cues (motion parallax), with the user looking at the head phantom from varying positions. The perception of the augmented visualization is natural and convincing. The brain structures appear rigidly anchored in the head, manifesting little or no apparent swimming or jitter.

We had ten volunteers, including two neurosurgeons, give us feedback about 3-D visualization using the AR system. When using shaded renderings, most users (8/10) correctly visualized brain anatomy as being internal if there were no occlusions (see left panel of Fig. 8). However, only half of the users (5/10) correctly visualized brain anatomy as being internal if there was a surface occlusion such as a skin-affixed marker or an ear (see middle panel of Fig. 8). The occlusion problem was worse when multiple virtual objects were displayed (see right panel of Fig. 8). No user was able to correctly visualize brain anatomy as being internal if a person's hand was placed in front of the virtual object.

When using wire-frame models and texture-mapped dot patterns, most users correctly visualized brain anatomy as being internal and could generally appreciate spatial relationships among various objects. For example, almost all users (9/10) claimed that they could correctly visualize the brain tumor as being just inside the skin in the right posterior region of the head when viewing the phantom from a fixed orientation (see left panel of Fig. 7), and this increased to all users when the user changed the viewing orientation (see right panel of Fig. 7). The same results were obtained for both wire frames and point sets. Also, almost all users (8/10) claimed that they could correctly visualize the distal tip of the virtual biopsy needle as being in the middle of the brain tumor (see left panel of Fig. 7), and this increased to all users when the user changed the viewing orientation (see right panel of Fig. 7). Almost all users (8/10) could correctly appreciate spatial relationships among multiple virtual objects when they were displayed as wire frames or point sets (see Fig. 9).

There is a continuum from reality to AR to VR (see Fig. 10). One advantage of a video see-through approach is the ability to manipulate reality. Figure 11 illustrates VR images created by turning off the scene cameras and using a wire-frame model of the phantom head skin surface. We examined many combinations of virtual objects. In all cases, the user reported that perception of spatial relationships among virtual objects, both relative to each other and relative to the wire-frame model of the skin surface, was better in the VR images than in the corresponding AR images with the real-world scene present.
4. CONCLUSION

The initial evaluation of our video see-through HMD AR system is encouraging, and we believe that AR visualization might become an important tool for image-guided neurosurgical planning and navigation. Both neurosurgeons who participated in the evaluation are very enthusiastic about the potential usefulness of AR visualization for neurosurgical planning.
Figure 10. The continuum of reality (left panel), AR (middle panel), and VR (right panel).
Figure 11. Examples of useful VR images.
ACKNOWLEDGMENT

Most of the equipment used in this study was purchased using funds generously provided by a grant from the Glen and Maude Wyman-Potter Foundation. Some equipment was obtained, and Calvin Maurer and Torsten Rohlfing were supported, using funds generously provided by the Ronald L. Bittner Endowed Fund in Biomedical Research. Bo Hu was supported by a grant from Siemens Corporate Research. Filippo Recchi received partial support via a Special Opportunity Award from the Whitaker Foundation to the University of Rochester. The authors thank Kenneth Adams and William Schulz in the Instrument Machine Shop at the University of Rochester for helping design and build the camera mount. The authors thank the Department of Radiology at the University of Rochester for scanning the head phantom used in this study.
REFERENCES

1. J. W. Berger and D. S. Shin, "Computer vision-enabled augmented reality fundus biomicroscopy", Ophthalmology, vol. 106, pp. 1935–1941, 1999.
2. W. Birkfellner, M. Figl, K. Huber, F. Watzinger, F. Wanschitz, R. Hanel, A. Wagner, D. Rafolt, R. Ewers, and H. Bergmann, "The Varioscope AR—A head-mounted operating microscope for augmented reality", in Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2000, S. L. Delp, A. M. DiGioia, and B. Jaramaz, Eds., pp. 869–877. Springer-Verlag, Berlin, 2000.
3. M. Blackwell, F. Morgan, and A. M. DiGioia, III, "Augmented reality and its future in orthopaedics", Clin. Orthop., vol. 354, pp. 111–122, 1998.
4. J.-D. Boissonnat and B. Geiger, "Three-dimensional reconstruction of complex shapes based on the Delaunay triangulation", Tech. Rep. 1697, INRIA, 1992.
5. J.-D. Boissonnat and B. Geiger, "Three-dimensional reconstruction of complex shapes based on the Delaunay triangulation", Biomedical Image Processing and Biomedical Visualization 1993, vol. Proc. SPIE 1905, pp. 964–975, 1993.
6. P. J. Edwards, D. J. Hawkes, D. L. G. Hill, D. Jewell, R. Spink, A. Strong, and M. Gleeson, "Augmentation of reality using an operating microscope for otolaryngology and neurosurgical guidance", J. Image Guid. Surg. (now Comput. Aided Surg.), vol. 1, pp. 172–178, 1995.
7. P. J. Edwards, D. L. G. Hill, D. J. Hawkes, and R. Spink, "Stereo overlays in the operating microscope for image-guided surgery", in Computer Assisted Radiology 1995, H. U. Lemke, K. Inamura, C. C. Jaffe, and M. W. Vannier, Eds., pp. 1197–1202. Springer-Verlag, Berlin, 1995.
8. P. J. Edwards, A. P. King, D. J. Hawkes, O. J. Fleig, C. R. Maurer, Jr., D. L. G. Hill, M. R. Fenlon, D. A. de Cunha, R. P. Gaston, S. Chandra, J. Mannss, A. J. Strong, M. J. Gleeson, and T. C. S. Cox, "Stereo augmented reality in the surgical microscope", in Medicine Meets Virtual Reality VII, J. D. Westwood, H. M. Hoffman, R. A. Robb, and D. Stredney, Eds., pp. 102–108. IOS Press, Amsterdam, 1999.
9. P. J. Edwards, A. P. King, C. R. Maurer, Jr., D. A. de Cunha, D. J. Hawkes, D. L. G. Hill, R. P. Gaston, M. R. Fenlon, S. Chandra, A. J. Strong, C. L. Chandler, A. Richards, and M. J. Gleeson, "Design and evaluation of a system for microscope-assisted guided interventions (MAGI)", in Medical Image Computing and Computer-Assisted Intervention (MICCAI) 1999, C. J. Taylor and A. C. F. Colchester, Eds., pp. 842–851. Springer-Verlag, Berlin, 1999.
10. P. J. Edwards, A. P. King, C. R. Maurer, Jr., D. A. de Cunha, D. J. Hawkes, D. L. G. Hill, R. P. Gaston, M. R. Fenlon, A. Jusczyzck, A. J. Strong, C. L. Chandler, and M. J. Gleeson, "Design and evaluation of a system for microscope-assisted guided interventions (MAGI)", IEEE Trans. Med. Imaging, vol. 19, pp. 1082–1093, 2000.
11. A. X. Falcao, J. K. Udupa, S. Samarasekera, S. Sharma, B. E. Hirsch, and R. D. A. Lotufo, "User-steered image segmentation paradigms: Live wire and live lane", Graphical Models Image Processing, vol. 60, pp. 233–260, 1998.
12. J. M. Fitzpatrick, D. L. G. Hill, and C. R. Maurer, Jr., "Image registration", in Handbook of Medical Imaging, Volume 2: Medical Image Processing and Analysis, M. Sonka and J. M. Fitzpatrick, Eds., pp. 447–513. SPIE Press, Bellingham, WA, 2000.
13. J. D. Foley and A. van Dam, Computer Graphics: Principles and Practice, Addison-Wesley, Reading, MA, 2nd edition, 1990.
14. E. M. Friets, J. W. Strohbehn, J. F. Hatch, and D. W. Roberts, "A frameless stereotaxic operating microscope for neurosurgery", IEEE Trans. Biomed. Eng., vol. 36, pp. 608–617, 1989.
15. H. Fuchs, M. A. Livingston, R. Raskar, D. Colucci, K. Keller, A. State, J. R. Crawford, P. Rademacher, S. H. Drake, and A. A. Meyer, "Augmented reality visualization for laparoscopic surgery", in Medical Image Computing and Computer-Assisted Intervention (MICCAI) 1998, W. M. Wells, III, A. C. F. Colchester, and S. L. Delp, Eds., pp. 934–943. Springer-Verlag, Berlin, 1998.
16. H. Fuchs, A. State, E. D. Pisano, W. F. Garrett, G. Hirota, M. Livingston, M. C. Whitton, and S. M. Pizer, "Towards performing ultrasound-guided needle biopsies from within a head-mounted display", in Visualization in Biomedical Computing 1996, K. H. Höhne and R. Kikinis, Eds., pp. 591–600. Springer-Verlag, Berlin, 1996.
17. W. Gander, G. H. Golub, and R. Strebel, "Least-squares fitting of circles and ellipses", BIT, vol. 34, pp. 558–578, 1994.
18. W. E. L. Grimson, G. J. Ettinger, S. J. White, T. Lozano-Perez, W. M. Wells, III, and R. Kikinis, "An automatic registration method for frameless stereotaxy, image guided surgery, and enhanced reality visualization", IEEE Trans. Med. Imaging, vol. 15, pp. 129–140, 1996.
19. A. P. King, P. J. Edwards, C. R. Maurer, Jr., D. A. de Cunha, R. P. Gaston, M. Clarkson, D. L. G. Hill, D. J. Hawkes, M. R. Fenlon, A. J. Strong, T. C. S. Cox, and M. J. Gleeson, "Stereo augmented reality in the surgical microscope", Presence: Teleoperators Virtual Environments, vol. 9, pp. 360–368, 2000.
20. K. N. Kutulakos and J. R. Vallino, "Calibration-free augmented reality", IEEE Trans. Visualization Comput. Graph., vol. 4, pp. 1–20, 1998.
21. M. L. Levy, J. D. Day, F. Albuquerque, G. Schumaker, S. L. Giannotta, and J. G. McComb, "Heads-up intraoperative endoscopic imaging: A prospective evaluation of techniques and limitations", Neurosurgery, vol. 40, pp. 526–531, 1997.
22. W. E. Lorensen and H. E. Cline, "Marching Cubes: A high resolution 3D surface construction algorithm", Comput. Graph., vol. 21, pp. 163–169, 1987.
23. Y. Masutani, T. Dohi, F. Yamane, H. Iseki, and K. Takakura, "Augmented reality visualization system for intravascular neurosurgery", Comput. Aided Surg., vol. 3, pp. 239–247, 1998.
24. E. N. Mortensen and W. A. Barrett, "Interactive segmentation with intelligent scissors", Graphical Models Image Processing, vol. 60, pp. 349–384, 1998.
25. D. W. Roberts, J. W. Strohbehn, J. F. Hatch, W. Murray, and H. Kettenberger, "A frameless stereotaxic integration of computerized tomographic imaging and the operating microscope", J. Neurosurg., vol. 65, pp. 545–549, 1986.
26. T. Rohlfing, Multimodale Datenfusion für die bildgesteuerte Neurochirurgie und Strahlentherapie, PhD thesis, Technical University Berlin, Berlin, 2000.
27. T. Rohlfing, J. B. West, J. Beier, T. Liebig, C. A. Taschner, and U.-W. Thomale, "Registration of functional and anatomical MRI: Accuracy assessment and application in navigated neurosurgery", Comput. Aided Surg., vol. 5, pp. 414–425, 2000.
28. J. P. Rolland and H. Fuchs, "Optical versus video see-through head-mounted displays in medical visualization", Presence: Teleoperators Virtual Environments, vol. 9, pp. 287–309, 2000.
29. Y. Sato, M. Nakamoto, Y. Tamaki, T. Sasama, I. Sakita, Y. Nakajima, M. Monden, and S. Tamura, "Image guidance of breast cancer surgery using 3-D ultrasound images and augmented reality visualization", IEEE Trans. Med. Imaging, vol. 17, pp. 681–693, 1998.
30. F. Sauer, F. Wenzel, S. Vogt, Y. Tao, Y. Genc, and A. Bani-Hashemi, "Augmented workspace: Designing an AR testbed", Proc. IEEE Int. Symp. Augmented Reality (ISAR) 2000, pp. 47–53, 2000.
31. C. Studholme, D. L. G. Hill, and D. J. Hawkes, "Automated 3D registration of MR and PET brain images by multi-resolution optimisation of voxel similarity measures", Med. Phys., vol. 24, pp. 25–35, 1997.
32. C. Studholme, D. L. G. Hill, and D. J. Hawkes, "An overlap invariant entropy measure of 3D medical image alignment", Pattern Recognit., vol. 33, pp. 71–86, 1999.
33. R. Y. Tsai, "A versatile camera calibration technique for high-accuracy 3-D machine vision metrology using off-the-shelf TV cameras and lenses", IEEE Trans. Robotics Automat., vol. 3, pp. 323–344, 1987.