Tracking of Instruments in Minimally Invasive Surgery for Surgical Skill Analysis

Stefanie Speidel¹, Michael Delles¹, Carsten Gutt², and Rüdiger Dillmann¹

¹ Institute of Computer Science and Engineering, University of Karlsruhe, Germany
² Department of General, Visceral and Accident Surgery, University of Heidelberg, Germany
Abstract. Intraoperative assistance systems aim to improve the quality of surgery and enhance the surgeon's capabilities. Ideally, such a system provides support depending on the surgical context and the skills being performed. This requires the automated analysis and recognition of surgical skills during an intervention. In this paper, a robust method for tracking instruments in minimally invasive surgery based on endoscopic image sequences is presented. The instruments were not modified, and the tracking was tested on sequences acquired during a real intervention. The generated instrument trajectories provide information which can be used further for surgical gesture interpretation.
1 Introduction
Minimally invasive surgery (MIS), unlike conventional surgery, is characterized by reduced patient trauma and the use of special instruments and operation techniques. The typically long and thin instruments and a camera (endoscope) are inserted into the patient's body through small incisions. The surgeon performs the intervention by looking at a monitor on which the endoscopic images are displayed. For the patient, the advantages of MIS are numerous, including reduced pain, a lower risk of infection and a rapid recovery resulting in shortened hospitalization. MIS has led to changes for the patient, but it also has an impact on the surgeon. Difficulties due to the limited field of view, restricted mobility and complex hand-eye coordination lead to added strain and fatigue. Particular surgical gestures in MIS that require high dexterity from the surgeon differ from open surgery and have to be learned anew. To alleviate these problems, several robotic assistance systems [1] have been developed to enhance the surgeon's capabilities. This means providing the surgeon with new tools that improve quality and accuracy, but also reduce the surgeon's strain and the patient's risk during surgery. In this scenario the surgeon works cooperatively with the robot system to carry out the surgical procedure. This requires an intraoperative assistive human-machine interface that recognizes the surgical actions during the intervention, facilitates the interaction with technical aids and provides support depending on the surgical context and actions. Support includes, e.g., the visualization of risk structures through augmented reality techniques or the semi-automatic execution of specific, standardized actions.
A surgical action is defined here as a surgical skill which consists of single elementary operations and describes the complex action. Examples in MIS include dissection of tissue, suturing, knot-tying and cutting [2]. Recently, surgical skill analysis and gesture interpretation have received increased attention. The systems proposed in [3,4,5,6] rely on various kinds of sensory input for analyzing surgical skills. The precondition for an assistance system that gives situation-dependent support is the generation and classification of surgical skills in MIS. In this paper, a concept for the generation and classification of surgical skills in an intraoperative assistance context is presented. The objective is to provide automated assistance and augmentation to surgeons during an intervention, using the endoscopic images as sensory input. This requires recognizing surgical skills during the intervention, which therefore have to be generated and classified in advance. Preliminary results in vision-based tracking of the minimally invasive instruments, which is necessary for analyzing surgical skills, are presented. This research was conducted within the research training group "Intelligent Surgery", a cooperation between the University of Heidelberg, the University of Karlsruhe and the German Cancer Research Center (DKFZ).
2 Overall Concept
The endoscopic image sequences provide a rich source of information about the intervention which can be used to analyze the surgical gestures. The benefit of using the images is that no changes to the operation setup have to be made. The objective is to observe the surgeon performing a skill, extract as much information as possible from the demonstration and map it into an abstract, generalized representation. Usually this is achieved by observing multiple demonstrations of the same skill and identifying their common features. This procedure is similar to the "Programming by Demonstration" paradigm [7,8] applied in robotics. It can be divided into four steps:

1. Identification: specific, standardized skills in MIS.
2. Acquisition: visual observation of multiple skill demonstrations using the endoscope.
3. Analysis: segmentation of the endoscopic images by characteristic features.
4. Classification: mapping segmented subsequences to a formal representation for recognition.

2.1 Identification and Acquisition
Surgical skills in MIS include, e.g., suturing, knot tying and cutting, as identified in the "Surgical Skills Workshop" [2], whose purpose was to establish basic definitions and standards of surgical technical skills. Skills which are repetitive and appear frequently are suited for classification and subsequent recognition for assistance or training.
Complex endoscopic interventions comprise, on the one hand, dissection and preparation of tissue and, on the other hand, reconstruction of organic structures. In the reconstruction phase, standardized anastomosis techniques are frequently applied which consist of single steps such as suturing and knot tying. The acquisition is done by observing multiple demonstrations of the same skill performed by different surgeons and recording the endoscopic image sequences.

2.2 Analysis
In order to understand the surgeon's intention, the observed demonstration has to be segmented and analyzed. In a first step, the sensor data is preprocessed to extract reliable measurements and key points, which are used in a second step for segmenting the demonstration and analyzing the surgeon's gestures. The preprocessing of the endoscopic images is necessary to provide the quality needed for computer-based processing. Specular highlights, distortion, reddish color and smoke degrade the images; they can be compensated using techniques described in [9]. After processing the image data, the next step consists of fragmenting the demonstration into time-related key points which serve to trigger the evaluation of elemental actions. The objective is to detect features and key points which describe the skill unambiguously and to detect invariant attributes. Features which can be extracted from the image sequence include, e.g., the trajectory, velocity and shape of the instruments; a small sketch of such feature extraction is given below. Possible key points are the contact points of the instrument with tissue or material. To segment features and key points from the image data alone, two preconditions must be fulfilled:

– Tracking of the instruments to generate a trajectory.
– A 3D model of the scene generated from the image data.

The tracking of the instruments is described in detail in Section 3.
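To illustrate the kind of features meant here, the following Python sketch derives velocities from a 2D instrument trajectory and flags low-velocity frames as candidate key points. It is purely illustrative: the frame rate and the velocity threshold are assumptions, not values from the paper.

```python
import numpy as np

def trajectory_features(trajectory, fps=25.0, v_thresh=2.0):
    """Derive velocity from a 2D trajectory and flag candidate key points.

    trajectory: (T, 2) array of instrument positions in pixels, one row per frame.
    fps:        assumed frame rate of the endoscopic sequence.
    v_thresh:   assumed speed (pixels/frame) below which a frame is a candidate
                key point, e.g. a possible contact with tissue or material.
    """
    traj = np.asarray(trajectory, dtype=float)
    step = np.diff(traj, axis=0)                 # displacement per frame
    velocity = step * fps                        # pixels per second
    speed = np.linalg.norm(step, axis=1)         # pixels per frame
    key_point_frames = np.flatnonzero(speed < v_thresh) + 1
    return velocity, key_point_frames
```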
2.3 Classification

The objective of the classification stage is to generate an abstract description of the demonstrated skill so that the system is able to recognize and interpret surgical skills. The knowledge of an intervention is mostly implicit; there is no formal description of the single steps. The features and key points from the analysis step need to be merged and represented. Complex systems usually have a hierarchical design in which the different levels are characterized by their information content. A surgical skill can be decomposed into single elementary operators at the lowest level, corresponding to the segmented subsequences of the analysis step. Additionally, an elementary operator vocabulary depending on the sensors used is generated, which can be reused or extended when a new skill is generated and classified.
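One possible reading of this hierarchy as a data structure is sketched below; the operator names and the vocabulary mechanism are assumptions chosen for illustration, not part of the paper.

```python
from dataclasses import dataclass, field

@dataclass
class ElementaryOperator:
    """Lowest-level unit, corresponding to one segmented subsequence."""
    name: str       # e.g. "grasp", "pull", "pierce" (assumed labels)
    features: dict  # sensor-dependent features, e.g. trajectory statistics

@dataclass
class SurgicalSkill:
    """A skill is a sequence of elementary operators, e.g. suturing or knot tying."""
    name: str
    operators: list = field(default_factory=list)

# The operator vocabulary can be reused or extended when a new skill is classified
vocabulary: dict = {}
```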
3 Tracking: Method and Results
In order to analyze the temporal behavior of the instruments and to achieve automated localization, a robust tracking algorithm based on the monocular endoscopic images was implemented. The objective is to compute the 2D trajectories of the instruments to analyze the surgeon's gestures, but the tracking could also be used for visual servoing applications such as automatic guidance of the endoscope. The image sequences used for testing were acquired from the endoscope of the da Vinci surgical system [10] during a real intervention. Robust vision-based tracking without modifications to the instruments is a challenging task due to the complex scene, e.g. the moving background, specular highlights and varying lighting conditions. The proposed tracking algorithm can be divided into two steps:

1. Segmentation: color segmentation is first applied to the images to separate the instruments from the background tissue.
2. Tracking: the instruments are tracked to obtain their motion for analyzing gestures.

3.1 Segmentation
Color is an important low-level attribute that can be used for detecting the instruments in the minimally invasive scene. The instruments used in MIS have a typical color characteristic which differs from the tissue's color. For color segmentation, the HSV color space is preferable to the RGB color space: it separates the chromaticity from the luminance components and is therefore more robust to changes in lighting. The color characteristic of an instrument or of tissue can be analyzed directly in the H-S plane. The segmentation of the instruments was realized using a Bayes classifier which was trained on a large sample of image sequences [11]. For this purpose, 60 images were segmented manually, and for each color in the H-S plane the probability of belonging to instrument or tissue was calculated. Assuming a uni- or multimodal Gaussian distribution turned out to be disadvantageous for the given images. The probability of a color belonging to a class can be computed using Bayes' rule:

$$P(K \mid F) = \frac{P(F \mid K) \cdot P(K)}{P(F)} = \frac{P(F \cap K)}{P(K)} \cdot \frac{P(K)}{P(F)} = \frac{P(F \cap K)}{P(F)} = \frac{\text{number of pixels of class } K \text{ with color } F \text{ in the training data}}{\text{number of pixels with color } F \text{ in the training data}}$$

where K is the event that a pixel belongs to a certain class (instrument or tissue) and F is the event that a pixel has a certain color.
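The following Python sketch illustrates how such a histogram-based Bayes classifier could be trained on the H-S plane. It is a minimal illustration, not the authors' implementation; the bin counts, the image and mask lists and the OpenCV usage are assumptions.

```python
import cv2
import numpy as np

H_BINS, S_BINS = 64, 64  # assumed quantization of the H-S plane

def train_hs_histograms(images, masks):
    """Accumulate H-S color histograms for instrument and tissue pixels.

    images: list of BGR endoscopic frames
    masks:  list of binary masks (255 = instrument, 0 = tissue),
            e.g. from the 60 manually segmented images.
    """
    w = np.zeros((H_BINS, S_BINS))  # instrument counts per H-S bin
    g = np.zeros((H_BINS, S_BINS))  # tissue counts per H-S bin
    for img, mask in zip(images, masks):
        hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
        h = (hsv[..., 0].astype(int) * H_BINS) // 180  # OpenCV hue range is 0..179
        s = (hsv[..., 1].astype(int) * S_BINS) // 256
        inst = mask > 0
        np.add.at(w, (h[inst], s[inst]), 1)
        np.add.at(g, (h[~inst], s[~inst]), 1)
    return w, g
```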
152
S. Speidel et al.
The probability of a pixel being instrument or tissue then yields the decision rule

$$\text{Pixel} = \begin{cases} \text{Instrument}, & \text{if } \frac{w}{w+g} \ge \tau \cdot \frac{g}{w+g} \\[4pt] \text{Tissue}, & \text{if } \frac{w}{w+g} < \tau \cdot \frac{g}{w+g} \end{cases}$$

where w is the number of training pixels of the given color belonging to instrument and g the number belonging to tissue. The factor τ was determined empirically; τ = 0.18 was ideal for the given image sequences. For run-time improvement, only every fourth row and column was used in the calculations, resulting in a classified binary image (Fig. 1). A combination of masked thresholding and median filtering enhances the result; incorrectly detected instrument areas, e.g. due to specular highlights, were removed (Fig. 1). A sketch of this classification step follows below.
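A minimal sketch of applying the trained histograms with this rule, under the same assumptions as the training snippet above; the subsampling stride follows the text, while the 5x5 median kernel is an assumption.

```python
import cv2
import numpy as np

H_BINS, S_BINS = 64, 64  # H-S quantization, as in the training sketch above
TAU = 0.18               # empirically determined factor from the paper

def classify_frame(img, w, g, stride=4):
    """Classify every stride-th pixel as instrument (255) or tissue (0)."""
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)[::stride, ::stride]
    h = (hsv[..., 0].astype(int) * H_BINS) // 180
    s = (hsv[..., 1].astype(int) * S_BINS) // 256
    w_px, g_px = w[h, s], g[h, s]
    # Decision rule: instrument if w/(w+g) >= tau * g/(w+g), i.e. w >= tau * g
    binary = (w_px >= TAU * g_px).astype(np.uint8) * 255
    # Median filtering removes small misclassified areas such as specular
    # highlights; the kernel size of 5 is an assumption
    return cv2.medianBlur(binary, 5)
```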
Fig. 1. Segmentation results: original image (left), classified image (middle) and filtered image (right)
3.2 Tracking
The tracking of the instruments to derive their motion behavior was realized using the CONDENSATION algorithm (Conditional Density Propagation over time) [12]. The algorithm uses factored sampling with learned dynamical models to propagate an estimate of the state, i.e. the probability of the object position, over time. The first classified image is used to initialize one tracker per detected instrument at the center of the classified region. To enhance the initialization in case instruments are crossing, only the border area of the image is considered. The succeeding images are used for tracking the motion, which results in a 2D trajectory for every instrument (Fig. 2).

The algorithm uses a set of N samples $(s_t^{(n)}, \pi_t^{(n)})$, $n = 1 \ldots N$, representing approximately the conditional state density $p(x_t \mid Z_t)$ at time t, where $s_t^{(n)}$ denote pixels in the image plane and $\pi_t^{(n)}$ their corresponding weights. In detail, the algorithm iterates through the following steps:

1. First classified image: initialization of the N samples at the starting point; the weight of every sample is set to 1/N.
2. For every new classified image at time t:
   – Select: choose N samples from the set with probability proportional to their weights.
   – Predict: each new sample is calculated from a dynamic model taking the sample from the last time step and the weighted difference between the two previous mean samples [13,14]:

     $$s_t^{(n)} = s_{t-1}^{(n)} + A(\bar{s}_{t-1} - \bar{s}_{t-2}) + B\omega \qquad (1)$$

     where A defines the deterministic component of the model and ω is a vector of independent standard normal random variables scaled by the diagonal matrix B so that $BB^T$ is the process noise covariance.
   – Measure: the new weight is calculated from a likelihood function [15]:

     $$\pi_t^{(n)} = p(z_t \mid s_t^{(n)}) \propto \exp\left( -\frac{1}{2\sigma_r^2} \cdot \frac{1}{M_r} \sum_{m=1}^{M_r} (1 - r_m)^2 \right) \qquad (2)$$

     where $r_m$ denotes the segmentation value of the m-th pixel from the set of $M_r$ instrument pixels. The weights are normalized so that $\sum_n \pi_t^{(n)} = 1$.
   – Estimate: the tracking position is estimated as the weighted mean of the samples:

     $$\bar{s}_t = \sum_{n=1}^{N} \pi_t^{(n)} s_t^{(n)} \qquad (3)$$

A condensed sketch of this loop is given after the list.
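The following Python sketch condenses steps (1)-(3) into a single tracker update. It is an illustrative reading of the algorithm, not the authors' code; the values of N, σ_r, A and B, as well as the single-pixel measurement standing in for the set of M_r instrument pixels, are assumptions.

```python
import numpy as np

N = 200                    # number of samples per tracker (assumed)
SIGMA_R = 0.4              # measurement noise scale sigma_r (assumed)
A_COEF, B_COEF = 1.0, 3.0  # dynamic model parameters A and B (assumed scalars)

def condensation_step(samples, weights, mean_prev, mean_prev2, binary):
    """One select/predict/measure/estimate iteration for a single tracker.

    samples: (N, 2) pixel positions, weights: (N,) normalized weights,
    mean_prev, mean_prev2: the two previous weighted-mean positions,
    binary: classified image (255 = instrument, 0 = tissue).
    """
    # Select: resample N samples proportionally to their weights
    idx = np.random.choice(len(samples), size=N, p=weights)
    samples = samples[idx]
    # Predict, Eq. (1): drift by the weighted mean difference plus Gaussian noise
    omega = np.random.randn(N, 2)
    samples = samples + A_COEF * (mean_prev - mean_prev2) + B_COEF * omega
    # Measure, Eq. (2): a single pixel lookup per sample stands in for the
    # set of M_r instrument pixels used in the paper
    ys = np.clip(samples[:, 1].astype(int), 0, binary.shape[0] - 1)
    xs = np.clip(samples[:, 0].astype(int), 0, binary.shape[1] - 1)
    r = binary[ys, xs] / 255.0
    weights = np.exp(-(1.0 - r) ** 2 / (2 * SIGMA_R ** 2))
    weights /= weights.sum()
    # Estimate, Eq. (3): the weighted mean is the new tracking position
    mean = (weights[:, None] * samples).sum(axis=0)
    return samples, weights, mean
```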
Fig. 2. Tracking results of three image sequences: Instrument with trajectory highlighted in green
The change of instruments, which happens often in MIS, is handled by the procedure as well. A tracker is deleted if it does not lie inside an instrument area for five consecutive images. If a new instrument appears, a new tracker is initialized as soon as the classified instrument area touches the border area of the image. In general, newly initialized trackers are ignored as long as their lifetime is below a certain threshold. In the special case of overlapping instruments, the trackers of the individual instruments converge to the same position, and one of them is deleted. A sketch of this bookkeeping follows below.
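A minimal sketch of this tracker bookkeeping, with the five-image timeout from the text; the minimum-lifetime threshold and the border-region input are assumptions.

```python
import numpy as np

MAX_MISSES = 5     # delete a tracker after five images outside an instrument area
MIN_LIFETIME = 10  # ignore trackers younger than this (assumed threshold)

class Tracker:
    def __init__(self, position):
        self.position = np.asarray(position, dtype=float)  # weighted-mean estimate
        self.misses = 0  # consecutive images outside an instrument area
        self.age = 0     # images since initialization

def update_trackers(trackers, binary, border_centers):
    """Delete stale trackers, merge overlapping ones, spawn new ones at the border."""
    for t in trackers:
        t.age += 1
        x, y = int(t.position[0]), int(t.position[1])
        inside = (0 <= y < binary.shape[0] and 0 <= x < binary.shape[1]
                  and binary[y, x] > 0)
        t.misses = 0 if inside else t.misses + 1
    trackers = [t for t in trackers if t.misses < MAX_MISSES]
    # Overlapping instruments: trackers at the same position collapse to one
    seen, kept = set(), []
    for t in trackers:
        key = (int(t.position[0]), int(t.position[1]))
        if key not in seen:
            seen.add(key)
            kept.append(t)
    # Classified instrument areas touching the image border spawn new trackers;
    # border_centers is assumed to hold the centers of such regions
    kept.extend(Tracker(c) for c in border_centers)
    # Young trackers are kept but their trajectories are ignored until MIN_LIFETIME
    return kept, [t for t in kept if t.age >= MIN_LIFETIME]
```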
4 Conclusion and Future Work
The overall concept for the generation and classification of surgical skills in MIS was presented. This procedure can be used to provide context-specific support in an intraoperative assistance scenario using the endoscopic image sequences as sensory input. To extract as much information as possible from the observation of a skill, a robust tracking of the instruments was implemented and presented. The resulting trajectories provide the information needed to segment a surgical skill into elementary operators. The next step will be the extension of the tracking to three dimensions and the computation of a 3D model from the endoscopic image sequences.
References

1. R. Taylor, D. Stoianovici: Medical Robotics in Computer-Integrated Surgery. IEEE Transactions on Robotics and Automation, 2003.
2. R. Satava, A. Cuschieri, J. Hamdorf: Metrics for objective assessment. Journal of Surgical Endoscopy, 2003.
3. H. Lin, I. Shafran, T. Murphy, A. Okamura, D. Yuh, G. Hager: Automatic Detection and Segmentation of Robot-Assisted Surgical Motions. International Conference on Medical Image Computing and Computer-Assisted Intervention, 2005.
4. J. Rosen, M. Solazzo, B. Hannaford, M. Sinanan: Objective Evaluation of Laparoscopic Skills Based on Haptic Information and Tool/Tissue Interactions. Journal of Computer Aided Surgery, 2002.
5. B. Lo, A. Darzi, G. Yang: Episode Classification for the Analysis of Tissue/Instrument Interaction with Multiple Visual Cues. International Conference on Medical Image Computing and Computer-Assisted Intervention, 2003.
6. H. Mayer, I. Nagy, A. Knoll: Skill Transfer and Learning by Demonstration in a Realistic Scenario of Laparoscopic Surgery. International Conference on Humanoid Robots, 2003.
7. M. Pardowitz, R. Zöllner, R. Dillmann: Incremental Acquisition of Task Knowledge Applying Heuristic Relevance Estimation. International Conference on Robotics and Automation, 2006.
8. R. Zöllner, O. Rogalla, R. Dillmann, M. Zöllner: Understanding Users Intention: Programming Fine Manipulation Tasks by Demonstration. International Conference on Intelligent Robots and Systems, 2002.
9. F. Vogt, S. Krüger, H. Niemann, C. Schick: A System for Real-Time Endoscopic Image Enhancement. International Conference on Medical Image Computing and Computer-Assisted Intervention, 2003.
10. G. S. Guthart, J. K. Salisbury: The intuitive telesurgery system: Overview and application. International Conference on Robotics and Automation, 2000.
11. S. Phung, A. Bouzerdoum, D. Chai: Skin Segmentation Using Color Pixel Classification: Analysis and Comparison. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005.
12. M. Isard, A. Blake: Condensation - conditional density propagation for visual tracking. International Journal of Computer Vision, 1998.
13. P. Azad, A. Ude, R. Dillmann, G. Cheng: A Full Body Human Motion Capture System Using Particle Filtering and On-the-Fly Edge Detection. International Conference on Humanoid Robots, 2004.
14. P. Azad: Integrating Vision Toolkit (IVT). http://ivt.sourceforge.net
15. J. Deutscher, A. Blake, I. Reid: Articulated Body Motion Capture by Annealed Particle Filtering. International Conference on Computer Vision and Pattern Recognition, 2000.