The 4th Annual IEEE International Conference on Cyber Technology in Automation, Control and Intelligent Systems June 4-7, 2014, Hong Kong, China
A Preliminary Study on Surgical Instrument Tracking based on Multiple Modules of Monocular Pose Estimation

Jiaole Wang1,2, Hongliang Ren2, Member, IEEE, and Max Q.-H. Meng1, Fellow, IEEE

Abstract— Optical instrument tracking technology has been widely used in image-guided surgery and is regarded as the de facto standard for tracking rigid bodies under the constraint of direct line-of-sight. Although optical tracking systems have been successful in clinical scenarios, their innate drawbacks, such as bulky volume, the line-of-sight requirement and the occlusion constraint, remain unresolved. To address these challenges, in this article we propose a surgical instrument tracking system based on the dynamic configuration of multiple monocular pose estimation modules. The main approach is to enable the system to dynamically reconfigure the multiple vision sensors when partial occlusion occurs within the workspace. The corresponding multi-camera calibration algorithm and multi-camera based instrument tracking method are proposed, and evaluation experiments are carried out.
I. INTRODUCTION

Towards minimally invasive procedures, image-guided surgery and intervention typically depend on surgical instrument tracking to obtain the real-time spatial relationship between the instruments and the surrounding anatomical structures [1]–[5]. The goal of surgical instrument tracking is to track the position and orientation of the instruments with respect to the patient anatomy during pre-operative registration and intra-operative navigation. The most successful applications are surgical instrument tracking in orthopedic operations and in tumor puncture biopsy and ablation procedures [6]–[10].

Tracking devices used in the surgical theater include mechanical devices [11], optical devices [12], [13], and electromagnetic devices [12], [14], [15], among others. Optical tracking systems such as the Optotrak 3020 and Polaris (both from Northern Digital Inc. (NDI)) and the MicronTracker (Claron Technology Inc.) are the mainstream systems that predominate in the operating room because of their high accuracy, robustness and reliability. However, optical tracking approaches have been reported to be constrained by the line-of-sight requirement, and many studies have addressed this challenge [5], [16].

This project is partially supported by RGC GRF Project #415512 awarded to Max Q.-H. Meng, and in part by the Singapore Academic Research Fund, under Grants R397000139133, R397000173133 and R397000157112, awarded to Hongliang Ren.
1 Jiaole Wang and Max Q.-H. Meng are with the Department of Electronic Engineering, The Chinese University of Hong Kong, N.T., Hong Kong SAR, China ([email protected], [email protected]).
2 Jiaole Wang and Hongliang Ren are with the Department of Biomedical Engineering, National University of Singapore, Singapore ([email protected]).
978-1-4799-3669-4/14/$31.00 © 2014 IEEE

A. Related Work

He et al. [17] proposed a quadric-ocular optical tracking system which utilizes infrared reflective markers in a passive
way to do surgical instrument tracking for a spine surgical robot. This system aimed to solve the occlusion problem of binocular systems and to improve accuracy and robustness through a multi-ocular approach.

The state-of-the-art optical tracking systems are all binocular systems, which have two cameras, either infrared or RGB, mounted onto a rigid body to keep a fixed geometric relation. Binocular systems exploit the binocular disparity of the two calibrated cameras to provide depth perception by triangulation [18]. The merits of binocular systems, such as relatively simple algorithms and an intuitive analogy to the human vision system, make them the de facto standard among optical tracking systems. However, the demerits are also obvious: the bulky size of the state-of-the-art devices, the occlusion constraint, etc.

Guler and Yaniv [19] proposed an alternative to the binocular system: monocular tracking of fiducial markers using a calibrated webcam and software-based pose estimation. Fiducial-marker-based monocular tracking uses a calibrated camera to determine the position and orientation (6 DOF) of a printed planar marker with respect to the camera. This technique is well studied in the augmented reality field [20]–[22], and several libraries perform monocular tracking-by-detection of planar markers, such as ARToolkit [23], ARTag [24], and ArUco [25]. Although their proposed system presents low accuracy and robustness, the idea of monocular tracking of surgical instruments opens new possibilities for image-guided surgery because of its high maneuverability.

B. The Proposed System and Contributions

In this paper, we propose to use multiple monocular pose estimation modules to handle the occlusion constraint of optical tracking systems for surgical probe tracking. The monocular pose estimation is carried out by the ArUco library (IGSTK [26], BSD license) to track the 6-DOF position and orientation of printed square markers. To handle the most influential occlusion constraint in optical tracking systems, we further extend the monocular tracking system in a multi-camera manner, and propose an agile multi-camera calibration and tracking method. The proposed system is able to recalibrate the multi-camera system rapidly after reconfiguration when occlusion occurs during tracking. Furthermore, it works in a cooperative way, in that all the cameras in the system can contribute to both occlusion avoidance and instrument tracking.
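As context for the monocular modules used throughout this paper, the following is a minimal sketch of fiducial-marker pose estimation with OpenCV's aruco module. It is an illustration only, not the authors' exact pipeline; it assumes the functional aruco API of OpenCV 4.x before 4.7, a calibrated camera matrix K, distortion coefficients dist, and the printed marker's side length marker_len (in millimeters), all obtained from camera calibration.

```python
# Sketch: monocular 6-DOF marker pose estimation (OpenCV aruco, pre-4.7 API).
# K, dist, marker_len are assumed to come from the camera calibration step.
import cv2

def detect_marker_pose(frame, K, dist, marker_len):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    aruco_dict = cv2.aruco.Dictionary_get(cv2.aruco.DICT_4X4_50)
    corners, ids, _ = cv2.aruco.detectMarkers(gray, aruco_dict)
    if ids is None:
        return None  # no marker in view (e.g., occluded)
    # One rotation/translation vector per detected marker, in the camera frame.
    rvecs, tvecs, _ = cv2.aruco.estimatePoseSingleMarkers(
        corners, marker_len, K, dist)
    return ids[0][0], rvecs[0].ravel(), tvecs[0].ravel()
```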
Fig. 1. The proposed system consists of several client modules and a server. The OpenIGTLink protocol is used for data communication between the clients and the server. Clients are independent modules that can carry out monocular pose estimation.
The rest of the paper is organized as follows: Section II presents the proposed tracking system and the related algorithms. The experimental results, including the calibration and tracking evaluations, are illustrated in Section III. We further discuss the results and the proposed system, and draw conclusions at the end of this paper.

II. METHODS

A. Modular Structure

The basic idea of the proposed system is to enable the adjustment of the system configuration when occlusion occurs; thus, a modular approach is adopted. The configuration adjustment can be implemented manually by surgeons or automatically by controllers, and adding new modules to the system is also considered a valid adjustment. Automatic control of the modules is out of the scope of this paper, and will be discussed elsewhere.

As shown in Fig. 1, the proposed system consists of a server and several client modules, connected by a TCP-based network. An RGB camera and a client processor constitute a tracking module with the following functionalities: image capture, image processing, fiducial marker detection and tracking, and streaming of the calculated 6-DOF instrument data. The server, on the other hand, has functionalities such as receiving the 6-DOF tool data, multi-camera calibration, surgical instrument calibration, and multi-camera multi-tool data fusion and tracking. To enable communication between the server and the clients, OpenIGTLink [27], a TCP-based network protocol designed to cope with the numerous hardware and software components in image-guided surgery, is adopted. Instead of directly sending image data to the server, the 6-DOF instrument position and orientation data are calculated on the clients and streamed, to alleviate network traffic and the server's computational load; a minimal sketch of such a client streaming loop is given below.
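The sketch below illustrates the client-side data flow under stated assumptions: the real system streams OpenIGTLink TRANSFORM messages, whereas for brevity this sketch sends a newline-delimited JSON payload over the same TCP client/server pattern, so the message framing is a stand-in for, not an implementation of, the OpenIGTLink wire format. detect_marker_pose is the hypothetical detection helper sketched in Section I.

```python
# Sketch: a client module streaming 6-DOF marker poses to the server over TCP.
# The JSON framing below is a placeholder for the OpenIGTLink TRANSFORM message.
import json
import socket
import time

import cv2

def run_client(server_host, server_port, camera_id, K, dist, marker_len):
    cap = cv2.VideoCapture(0)  # the module's RGB camera (device index assumed)
    sock = socket.create_connection((server_host, server_port))
    while True:
        ok, frame = cap.read()
        if not ok:
            continue
        result = detect_marker_pose(frame, K, dist, marker_len)
        if result is None:
            continue  # marker occluded or out of view; nothing to stream
        marker_id, rvec, tvec = result
        msg = {"camera": camera_id, "marker": int(marker_id),
               "rvec": rvec.tolist(), "tvec": tvec.tolist(),
               "stamp": time.time()}
        sock.sendall((json.dumps(msg) + "\n").encode())
```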
(a) Examples of fiducial markers and a customized tool. (b) Tool pivot calibration.

Fig. 2. The customized tool that combines the fiducial marker and the Polaris marker. Monocular fiducial marker tracking can be performed on one module. Tool calibration is carried out by pivoting the customized tool at a fixed position.
B. Monocular Probe Tracking based on a Fiducial Marker

We propose to calculate the position and orientation of the surgical probe tip by mounting a fiducial marker on it. The surgical probe is used during the intra-operative stage of surgical procedures to locate the intervention site with respect to the pre-operative images. To this end, a customized tool is built by mounting a planar ArUco marker onto an NDI Polaris commercial surgical probe (shown in Fig. 2(a)). As the monocular fiducial marker pose estimation only provides the marker translation and orientation with respect to the camera frame, it is essential to carry out a tool calibration to find the probe tip translation with respect to the fiducial marker frame. The tool calibration, as shown in Fig. 2(b), pivots the probe at a fixed location several times while the mounted fiducial marker is being tracked by the camera. Since the tip stays at the fixed pivot point, each measurement satisfies ${}^{cam}R_i\,{}^{M}t_{tip} + {}^{cam}t_i = {}^{cam}t_{fix}$, so the probe tip translation with respect to the fiducial marker frame can be calculated in a least-squares manner from the stacked system

$$\begin{bmatrix} {}^{cam}R_1 & -I \\ \vdots & \vdots \\ {}^{cam}R_n & -I \end{bmatrix} \begin{bmatrix} {}^{M}t_{tip} \\ {}^{cam}t_{fix} \end{bmatrix} = \begin{bmatrix} -{}^{cam}t_1 \\ \vdots \\ -{}^{cam}t_n \end{bmatrix}, \qquad (1)$$

where the left superscripts $cam$ and $M$ denote the camera and marker frames, and $t_n$ and $R_n$ are the $n$th measurements of the translation vector and rotation matrix, respectively. After the tool calibration, the probe tip translation and rotation with respect to the camera frame can be tracked; a sketch of this least-squares solve is given below.
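As an illustration of Eq. (1), a least-squares pivot-calibration solve might look as follows. This is a minimal sketch assuming numpy; Rs and ts are the n marker rotation matrices and translation vectors observed in the camera frame while pivoting.

```python
# Sketch: solve Eq. (1) for the tip offset (marker frame) and the fixed
# pivot point (camera frame) from n tracked marker poses.
import numpy as np

def pivot_calibration(Rs, ts):
    n = len(Rs)
    A = np.zeros((3 * n, 6))
    b = np.zeros(3 * n)
    for i, (R, t) in enumerate(zip(Rs, ts)):
        A[3 * i:3 * i + 3, :3] = R           # [ cam_R_i  -I ] block row
        A[3 * i:3 * i + 3, 3:] = -np.eye(3)
        b[3 * i:3 * i + 3] = -np.asarray(t)  # right-hand side: -cam_t_i
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    t_tip, t_fix = x[:3], x[3:]              # ^M t_tip and ^cam t_fix
    return t_tip, t_fix
```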
C. Multi-camera Calibration

By extending the monocular tool tracking, we introduce a multi-camera approach which incorporates the monocular tracking modules to handle the line-of-sight limitation and the occlusion constraint. Since each tracking module only calculates the probe pose with respect to itself, it is essential to find the relative rotations and translations among the multiple cameras of the modules. Moreover, it is important to be able to recalibrate the involved cameras, because the configuration of the multi-camera system may change when occlusion happens.

Much research has addressed the multi-camera calibration problem with a variety of methods. Warren et al. [28] proposed an automatic multi-camera calibration toolbox (AMCC) which uses a chessboard and requires a high degree of field-of-view (FOV) overlap. Svoboda et al. [29] proposed a multi-camera self-calibration toolbox which uses an extra point light and requires dark environment lighting; the benefit of this approach is that it can calculate the camera intrinsic parameters at the same time.
Fig. 3. Multi-camera calibration by moving the pointer tool in front of the cameras. The flowchart shows the basic procedure for calibration.
Barreto et al. [30] proposed the EasyCal toolbox, which also uses an extra point light but requires the camera intrinsic parameters. Our approach requires neither a chessboard, nor dark environment lighting, nor an extra point-light device, all of which are necessary in the previous research. Instead, we directly use the fiducial marker that is mounted on the surgical instrument, and simply wave the instrument slowly in front of the multiple cameras for a few seconds to carry out the calibration. In the following, a stereo camera setup is studied first; the multi-camera setup is then considered as an extension of the stereo one. The multi-camera calibration in a camera-pair manner is illustrated in Fig. 3.

Data: camera intrinsic parameters (K_i) and lens distortion coefficients Kc_1, ..., Kc_4
Result: relation X between the two camera frames
initialization;
while not enough pair data do
    read a new image (I_i);
    detect and calculate the marker ID (N) and the homogeneous transformation (A = {^M R | ^M T}_i);
    if the marker is also seen in the other camera(s) then
        save the data pairs ({A_1, B_1}, {A_2, B_2}, ...);
    else
        go to the next iteration;
    end
end
solve X using the listed data pairs;
Algorithm 1: Multi-camera calibration algorithm

As shown in Fig. 3, X denotes the homogeneous transformation matrix (translation and rotation) of camera j with respect to camera i, while A and B denote the fiducial marker frame with respect to the two camera frames, respectively. The relation among them is

$$X \cdot B = A. \qquad (2)$$

To find the unknown relation between the two camera frames, we first use a quaternion-based approach to solve the rotational component of X in a least-squares manner [31]; the translational part can then be easily solved by substitution. A sketch of this camera-pair solve is given below.
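The following sketch illustrates one way to realize this camera-pair solve, assuming numpy and scipy; the quaternion averaging below is a standard eigendecomposition-based least-squares average, used here in place of the specific formulation of [31].

```python
# Sketch: estimate X (camera j w.r.t. camera i) from paired marker poses
# {A_k, B_k} satisfying X @ B_k = A_k (Eq. 2). Poses are 4x4 matrices.
import numpy as np
from scipy.spatial.transform import Rotation

def solve_camera_pair(As, Bs):
    # Per-pair estimates X_k = A_k B_k^{-1}, reduced to quaternions.
    quats = []
    for A, B in zip(As, Bs):
        X_k = A @ np.linalg.inv(B)
        quats.append(Rotation.from_matrix(X_k[:3, :3]).as_quat())
    Q = np.array(quats)
    Q[Q[:, 3] < 0] *= -1  # resolve quaternion sign ambiguity (scalar-last)
    # Least-squares rotation average: principal eigenvector of sum(q q^T).
    _, v = np.linalg.eigh(Q.T @ Q)
    R_x = Rotation.from_quat(v[:, -1]).as_matrix()
    # Translation by substitution: R_x t_Bk + t_x = t_Ak for every pair.
    t_x = np.mean([A[:3, 3] - R_x @ B[:3, 3] for A, B in zip(As, Bs)], axis=0)
    X = np.eye(4)
    X[:3, :3], X[:3, 3] = R_x, t_x
    return X
```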
The algorithms introduced for stereo camera calibration become the cornerstones of the multiple camera calibration. The multi-camera calibration method, as introduced in Algorithm 1, can be summarized in three steps:

Step 1. The clients detect the fiducial markers, calculate the corresponding rotations and translations with respect to their camera frames, and then stream the data to the server.
Step 2. The server lists the retrieved data in a camera-pair manner, and checks whether the marker was seen by both cameras of a pair at the same time.
Step 3. The unknown X can then be calculated using the aforementioned approach once there are enough data.

D. Multi-camera Tracking and Data Fusion Method

After the calibration, the surgical instrument can be tracked in a data-fusion manner by multiple cameras. The flow of the multi-camera tracking and data fusion method is shown in Algorithm 2.

Data: world frame (K) assigned to an appropriate camera frame
Result: targets' position and orientation w.r.t. the world frame
initialization;
receive streaming data, including the marker ID (^{cam_i}N_j) and homogeneous transformation (^{cam_i}{R|T}_j) w.r.t. the ith camera frame (cam_i);
transform the received data to the world frame;
if the same marker is seen in other camera(s) then
    fuse the sensor data by weighted averaging across the multiple cameras to output the targets' position and orientation w.r.t. the world frame;
else
    output the targets' position and orientation w.r.t. the world frame;
end
Algorithm 2: Multi-camera multi-target tracking algorithm

In order to track surgical instruments with the proposed system, the tracking method takes full advantage of the multiple-camera structure. Instead of using the tracking data from only one camera, we fuse the data from all cameras that capture the same marker. The captured 6-DOF data of the probe tip across the multiple cameras are first transformed into the world frame using the geometric information from the previous calibration. The transformed 6-DOF data are then fused in a least-squares manner to provide an occlusion-free and range-increased solution; a sketch of this fusion step is given below. The evaluation of this method is presented in Section III-B.
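A minimal sketch of the weighted fusion step follows. It is an assumed helper, not the system's exact implementation: poses are 4x4 matrices already transformed into the world frame, and the weights could reflect, for example, viewing distance or the marker's size in each image.

```python
# Sketch: fuse the same marker's world-frame poses seen by several cameras.
import numpy as np
from scipy.spatial.transform import Rotation

def fuse_poses(poses, weights=None):
    n = len(poses)
    w = np.full(n, 1.0 / n) if weights is None else np.asarray(weights, float)
    w = w / w.sum()
    # Weighted mean of the translations.
    t = sum(wi * P[:3, 3] for wi, P in zip(w, poses))
    # Weighted quaternion average (principal-eigenvector method).
    Q = np.array([Rotation.from_matrix(P[:3, :3]).as_quat() for P in poses])
    Q[Q[:, 3] < 0] *= -1  # resolve sign ambiguity before averaging
    M = sum(wi * np.outer(q, q) for wi, q in zip(w, Q))
    _, v = np.linalg.eigh(M)
    fused = np.eye(4)
    fused[:3, :3] = Rotation.from_quat(v[:, -1]).as_matrix()
    fused[:3, 3] = t
    return fused
```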
III. RESULTS

The experiments were carried out using four Logitech C310 webcams [32] (1280×720 pixels, 30 Hz). The intrinsic parameters and lens distortion coefficients of all the cameras were calibrated with the camera calibration toolbox for Matlab [33], and a residual error of 0.22 pixels on the cameras' image planes was obtained. As shown in Fig. 2(a), a planar fiducial marker has been attached to a pointer tool of the Polaris system. This setup allows the probe to be tracked by the Polaris system and ours simultaneously.
Fig. 4. Two configurations of the system used for evaluating the proposed system and the corresponding algorithms. The Polaris system is shown in the background for comparison; the H type (a) and V type (b) configurations are shown in the foreground.
TABLE I
RESULTS OF MULTI-CAMERA CALIBRATION IN TERMS OF ROTATIONAL AND TRANSLATIONAL ERRORS UNDER TWO CONFIGURATIONS

Configuration   World frame   err_rot (°)   err_trans (mm)
H type          Camera 3      0.120         0.338
V type          Camera 1      0.091         0.252
A. Results of Multi-camera Calibration

As shown in Fig. 4, two system configurations, called the H type (Fig. 4(a)) and the V type (Fig. 4(b)), have been installed and calibrated. During the calibration process, we moved the instrument in front of the cameras to capture 750 frames for each camera; the whole process takes 25 seconds. Let the world frame be fixed onto camera i; the marker positions with respect to camera j are then transformed into the world frame using the calibrated relation between the camera i and camera j frames. Subsequently, the translational error err_trans is defined as

$$err_{trans} = \left\| {}^{cam_i}t - {}^{\widehat{cam}_j}t \right\|_2, \qquad (3)$$

where ${}^{cam_i}t$ stands for the translation of the marker with respect to the camera i frame, and the left superscript $\widehat{cam}_j$ stands for the transformed camera j frame. Expressing the rotation in axis-angle form and regarding the axis of rotation as irrelevant, the rotational error err_rot is defined as

$$err_{rot} = \left| {}^{cam_i}\theta - {}^{\widehat{cam}_j}\theta \right|, \qquad (4)$$

where ${}^{cam_i}\theta$ is the rotation angle of the marker with respect to the camera i frame. A small sketch of these two metrics is given below.
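As an illustration of Eqs. (3)-(4), the two error metrics could be computed as follows. This is a sketch assuming numpy and scipy; T_i is the marker pose in the camera i frame and T_j_hat the pose transformed from camera j into that same frame.

```python
# Sketch: translational (Eq. 3) and rotational (Eq. 4) calibration errors
# between a marker pose and its counterpart transformed from another camera.
import numpy as np
from scipy.spatial.transform import Rotation

def calibration_errors(T_i, T_j_hat):
    err_trans = np.linalg.norm(T_i[:3, 3] - T_j_hat[:3, 3])   # mm
    theta_i = Rotation.from_matrix(T_i[:3, :3]).magnitude()   # axis-angle angle
    theta_j = Rotation.from_matrix(T_j_hat[:3, :3]).magnitude()
    err_rot = np.degrees(abs(theta_i - theta_j))              # degrees
    return err_rot, err_trans
```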
The calibration results are shown in Fig. 5 and Tab. I. As shown in Figs. 5(a) and 5(b), the spatial positions of the markers captured during the calibration are shown as squares in different colors, and the cameras of both the H type and V type configurations are correctly recovered, with their frames drawn in red and blue.
(a) Calibration under H type configuration. (b) Calibration under V type configuration.

Fig. 5. Calibration results under the two configurations of the system. The red and blue frames are the camera positions; the squares in different colors represent the marker positions used for the calibration.
In Tab. I, we show two examples of the calibration results under the two configurations. Note that the world frame can be set to any camera in the workspace. The RMS errors of both configurations show sub-millimeter accuracy in translation. To summarize, the calibration algorithm finds the rotations and translations of the multi-camera setup accurately and precisely. The calibration procedure is simple and fast; it only requires the marker-mounted tool to be waved in front of the cameras for a few seconds. It is therefore feasible to dynamically reconfigure the multiple cameras in the workspace, for instance by moving the cameras or adding new cameras to the tracking system as needed.

B. Evaluation of the Proposed System

To evaluate the proposed system, we moved the customized probe along the cross points of a well machined chessboard (error