Tracking of Instruments in Minimally Invasive Surgery for Surgical Skill Analysis

Stefanie Speidel¹, Michael Delles¹, Carsten Gutt², and Rüdiger Dillmann¹

¹ Institute of Computer Science and Engineering, University of Karlsruhe, Germany
² Department of General, Visceral and Accident Surgery, University of Heidelberg, Germany
Abstract. Intraoperative assistance systems aim to improve the quality of surgery and enhance the surgeon's capabilities. Ideally, such a system provides support depending on the surgical context and the skills being performed. This requires the automated analysis and recognition of surgical skills during an intervention. In this paper, a robust method for tracking instruments in minimally invasive surgery based on endoscopic image sequences is presented. The instruments were not modified, and the tracking was tested on sequences acquired during a real intervention. The generated instrument trajectories provide information which can be used further for surgical gesture interpretation.
1 Introduction
Minimally invasive surgery (MIS), unlike conventional surgery, is characterized by reduced patient trauma and the use of special instruments and operation techniques. The typically long and thin instruments and a camera (endoscope) are inserted into the patient's body through small incisions. The surgeon performs the intervention by looking at a monitor on which the endoscopic images are displayed. For the patient, the advantages of MIS are numerous, including reduced pain, a lower risk of infection and a rapid recovery resulting in shortened hospitalization. MIS has led to changes for the patient, but it also has an impact on the surgeon. Difficulties due to the limited field of view, restricted mobility and complex hand-eye coordination lead to added strain and fatigue. Particular surgical gestures in MIS that require high dexterity from the surgeon differ from open surgery and have to be learned anew. To alleviate these problems, several robotic assistance systems [1] have been developed to enhance the surgeon's capabilities. This means providing the surgeon with new tools that improve quality and accuracy, but also reduce the surgeon's strain and the patient's risk during surgery. In this scenario the surgeon works cooperatively with the robot system to carry out the surgical procedure. This requires an intraoperative assistive human-machine interface that recognizes the surgical actions during the intervention, facilitates the interaction with technical aids and provides support depending on the surgical context and actions. Support includes, e.g., the visualization of risk structures through augmented reality techniques or the semi-automatic execution of specific, standardized actions.
A surgical action is defined here as a surgical skill which consists of single elementary operations and describes the complex action. Examples in MIS include dissection of tissue, suturing, knot-tying and cutting [2]. Recently, surgical skill analysis and gesture interpretation have received increased attention. The systems proposed in [3,4,5,6] rely on various kinds of sensory input for analyzing surgical skills. The precondition for an assistance system that gives situation-dependent support is the generation and classification of surgical skills in MIS. In this paper, a concept for the generation and classification of surgical skills in an intraoperative assistance context is presented. The objective is to provide automated assistance and augmentation to surgeons during an intervention, using the endoscopic images as sensory input. This requires recognizing surgical skills during the intervention, which therefore have to be generated and classified in advance. Preliminary results in vision-based tracking of the minimally invasive instruments, which is necessary for analyzing surgical skills, are presented. This research was conducted within the research training group "Intelligent Surgery", a cooperation between the University of Heidelberg, the University of Karlsruhe and the German Cancer Research Center (DKFZ).
2 Overall Concept
The endoscopic image sequences provide a rich source of information about the intervention which can be used to analyze the surgical gestures. The benefit of using the images is that no changes to the operation setup have to be made. The objective is to observe the surgeon performing a skill, extract as much information as possible from the demonstration and map it into an abstract, generalized representation. Usually this is achieved by observing multiple demonstrations of the same skill and identifying their common features. This procedure is similar to the "Programming by Demonstration" paradigm [7,8] applied in robotics. It can be divided into four steps:

1. Identification: specific, standardized skills in MIS.
2. Acquisition: visual observation of multiple skill demonstrations using the endoscope.
3. Analysis: segmentation of the endoscopic images by characteristic features.
4. Classification: mapping segmented subsequences to a formal representation for recognition.

2.1 Identification and Acquisition
Surgical skills in MIS include, e.g., suturing, knot tying and cutting, as identified in the "Surgical Skills Workshop" [2], whose purpose was to establish basic definitions and standards of surgical technical skills. Skills which are repetitive and appear frequently are suited for classification and subsequent recognition for assistance or training.
Complex endoscopic interventions comprise, on the one hand, dissection and preparation of tissue and, on the other hand, reconstruction of organic structures. In the reconstruction phase, standardized anastomosis techniques are frequently applied which consist of single steps such as suturing and knot tying. The acquisition is done by observing multiple demonstrations of the same skill performed by different surgeons and recording the endoscopic image sequences.

2.2 Analysis
In order to understand the surgeon's intention, the observed demonstration has to be segmented and analyzed. In a first step, the sensor data is preprocessed to extract reliable measurements and key points, which are used in a second step for segmenting the demonstration and analyzing the surgeon's gestures. The preprocessing of the endoscopic images is necessary to provide the quality needed for computer-based processing. Specular highlights, distortion, reddish color and smoke degrade the images; they can be compensated using techniques described in [9]. After processing the image data, the next step consists of fragmenting the demonstration into time-related key points which serve to trigger the evaluation of elemental actions. The objective is to detect features and key points which describe the skill unambiguously and to detect invariant attributes. Features which can be extracted from the image sequence include, e.g., the trajectory, velocity and shape of the instruments; a small sketch of such feature extraction is given below. Possible key points are the contact points of the instrument with tissue or material. To segment features and key points from the image data alone, two preconditions must be fulfilled:

– Tracking of the instruments to generate a trajectory.
– A 3D model of the scene generated from the image data.

The tracking of the instruments is described in detail in Section 3.
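To illustrate the kind of features meant here, the following Python sketch derives velocities from a 2D instrument trajectory and flags low-velocity frames as candidate key points. It is purely illustrative: the frame rate and the velocity threshold are assumptions, not values from the paper.

```python
import numpy as np

def trajectory_features(trajectory, fps=25.0, v_thresh=2.0):
    """Derive velocity from a 2D trajectory and flag candidate key points.

    trajectory: (T, 2) array of instrument positions in pixels, one row per frame.
    fps:        assumed frame rate of the endoscopic sequence.
    v_thresh:   assumed speed (pixels/frame) below which a frame is a candidate
                key point, e.g. a possible contact with tissue or material.
    """
    traj = np.asarray(trajectory, dtype=float)
    step = np.diff(traj, axis=0)                 # displacement per frame
    velocity = step * fps                        # pixels per second
    speed = np.linalg.norm(step, axis=1)         # pixels per frame
    key_point_frames = np.flatnonzero(speed < v_thresh) + 1
    return velocity, key_point_frames
```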
2.3 Classification

The objective of the classification stage is to generate an abstract description of the demonstrated skill so that the system is able to recognize and interpret surgical skills. The knowledge of an intervention is mostly implicit; there is no formal description of the single steps. The features and key points from the analysis step need to be merged and represented. Complex systems usually have a hierarchical design in which the different levels are characterized by their information content. A surgical skill can be decomposed into single elementary operators at the lowest level, corresponding to the segmented subsequences of the analysis step. Additionally, an elementary operator vocabulary depending on the sensors used is generated, which can be reused or extended when a new skill is generated and classified.
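One possible reading of this hierarchy as a data structure is sketched below; the operator names and the vocabulary mechanism are assumptions chosen for illustration, not part of the paper.

```python
from dataclasses import dataclass, field

@dataclass
class ElementaryOperator:
    """Lowest-level unit, corresponding to one segmented subsequence."""
    name: str       # e.g. "grasp", "pull", "pierce" (assumed labels)
    features: dict  # sensor-dependent features, e.g. trajectory statistics

@dataclass
class SurgicalSkill:
    """A skill is a sequence of elementary operators, e.g. suturing or knot tying."""
    name: str
    operators: list = field(default_factory=list)

# The operator vocabulary can be reused or extended when a new skill is classified
vocabulary: dict = {}
```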
3 Tracking: Method and Results
In order to analyze the temporal behavior of the instruments and to achieve automated localization, a robust tracking algorithm based on the monocular endoscopic images was implemented. The objective is to compute the 2D trajectories of the instruments to analyze the surgeon's gestures, but the tracking could also be used for visual servoing applications such as automatic guidance of the endoscope. The image sequences used for testing were acquired from the endoscope of the da Vinci surgical system [10] during a real intervention. Robust vision-based tracking without modifications to the instruments is a challenging task due to the complex scene, e.g. the moving background, specular highlights and varying lighting conditions. The proposed tracking algorithm can be divided into two steps:

1. Segmentation: color segmentation is first applied to the images to separate the instruments from the background tissue.
2. Tracking: the instruments are tracked to obtain their motion for analyzing gestures.

3.1 Segmentation
Color is an important low-level attribute that can be used for detecting the instruments in the minimally invasive scene. The instruments used in MIS have a typical color characteristic which differs from the tissue's color. For color segmentation, the HSV color space is preferable to the RGB color space: it separates the chromaticity from the luminance components and is therefore more robust to changes in lighting. The color characteristic of an instrument or of tissue can be analyzed directly in the H-S plane. The segmentation of the instruments was realized using a Bayes classifier which was trained on a large sample of image sequences [11]. For this purpose, 60 images were segmented manually, and for each color in the H-S plane the probability of belonging to instrument or tissue was calculated. Assuming a uni- or multimodal Gaussian distribution turned out to be disadvantageous for the given images. The probability of a color belonging to a class can be computed using Bayes' rule:

$$P(K \mid F) = \frac{P(F \mid K) \cdot P(K)}{P(F)} = \frac{P(F \cap K)}{P(K)} \cdot \frac{P(K)}{P(F)} = \frac{P(F \cap K)}{P(F)} = \frac{\text{number of pixels of class } K \text{ with color } F \text{ in the training data}}{\text{number of pixels with color } F \text{ in the training data}}$$

where K is the event that a pixel belongs to a certain class (instrument or tissue) and F is the event that a pixel has a certain color.
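The following Python sketch illustrates how such a histogram-based Bayes classifier could be trained on the H-S plane. It is a minimal illustration, not the authors' implementation; the bin counts, the image and mask lists and the OpenCV usage are assumptions.

```python
import cv2
import numpy as np

H_BINS, S_BINS = 64, 64  # assumed quantization of the H-S plane

def train_hs_histograms(images, masks):
    """Accumulate H-S color histograms for instrument and tissue pixels.

    images: list of BGR endoscopic frames
    masks:  list of binary masks (255 = instrument, 0 = tissue),
            e.g. from the 60 manually segmented images.
    """
    w = np.zeros((H_BINS, S_BINS))  # instrument counts per H-S bin
    g = np.zeros((H_BINS, S_BINS))  # tissue counts per H-S bin
    for img, mask in zip(images, masks):
        hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
        h = (hsv[..., 0].astype(int) * H_BINS) // 180  # OpenCV hue range is 0..179
        s = (hsv[..., 1].astype(int) * S_BINS) // 256
        inst = mask > 0
        np.add.at(w, (h[inst], s[inst]), 1)
        np.add.at(g, (h[~inst], s[~inst]), 1)
    return w, g
```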
152
S. Speidel et al.
The probability of a pixel being instrument or tissue then yields the decision rule

$$\text{Pixel} = \begin{cases} \text{Instrument}, & \text{if } \frac{w}{w+g} \ge \tau \cdot \frac{g}{w+g} \\[4pt] \text{Tissue}, & \text{if } \frac{w}{w+g} < \tau \cdot \frac{g}{w+g} \end{cases}$$

where w is the number of training pixels of the given color belonging to instrument and g the number belonging to tissue. The factor τ was determined empirically; τ = 0.18 was ideal for the given image sequences. For run-time improvement, only every fourth row and column was used in the calculations, resulting in a classified binary image (Fig. 1). A combination of masked thresholding and median filtering enhances the result; incorrectly detected instrument areas, e.g. due to specular highlights, were removed (Fig. 1). A sketch of this classification step follows below.
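A minimal sketch of applying the trained histograms with this rule, under the same assumptions as the training snippet above; the subsampling stride follows the text, while the 5x5 median kernel is an assumption.

```python
import cv2
import numpy as np

H_BINS, S_BINS = 64, 64  # H-S quantization, as in the training sketch above
TAU = 0.18               # empirically determined factor from the paper

def classify_frame(img, w, g, stride=4):
    """Classify every stride-th pixel as instrument (255) or tissue (0)."""
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)[::stride, ::stride]
    h = (hsv[..., 0].astype(int) * H_BINS) // 180
    s = (hsv[..., 1].astype(int) * S_BINS) // 256
    w_px, g_px = w[h, s], g[h, s]
    # Decision rule: instrument if w/(w+g) >= tau * g/(w+g), i.e. w >= tau * g
    binary = (w_px >= TAU * g_px).astype(np.uint8) * 255
    # Median filtering removes small misclassified areas such as specular
    # highlights; the kernel size of 5 is an assumption
    return cv2.medianBlur(binary, 5)
```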
Fig. 1. Segmentation results: original image (left), classified image (middle) and filtered image (right)
3.2 Tracking
The tracking of the instruments to derive their motion behavior was realized using the CONDENSATION algorithm (Conditional Density Propagation over time) [12]. The algorithm uses factored sampling with learned dynamical models to propagate an estimate of the state, i.e. the probability of the object position, over time. The first classified image is used to initialize one tracker per detected instrument at the center of the classified region. To enhance the initialization in case instruments are crossing, only the border area of the image is considered. The succeeding images are used for tracking the motion, which results in a 2D trajectory for every instrument (Fig. 2).

The algorithm uses a set of N samples $(s_t^{(n)}, \pi_t^{(n)})$, $n = 1 \ldots N$, representing approximately the conditional state density $p(x_t \mid Z_t)$ at time t, where $s_t^{(n)}$ denote pixels in the image plane and $\pi_t^{(n)}$ their corresponding weights. In detail, the algorithm iterates through the following steps:

1. First classified image: initialization of the N samples at the starting point; the weight of every sample is set to 1/N.
2. For every new classified image at time t:
   – Select: choose N samples from the set with probability proportional to their weights.
   – Predict: each new sample is calculated from a dynamic model taking the sample from the last time step and the weighted difference between the two previous mean samples [13,14]:

     $$s_t^{(n)} = s_{t-1}^{(n)} + A(\bar{s}_{t-1} - \bar{s}_{t-2}) + B\omega \qquad (1)$$

     where A defines the deterministic component of the model and ω is a vector of independent standard normal random variables scaled by the diagonal matrix B so that $BB^T$ is the process noise covariance.
   – Measure: the new weight is calculated from a likelihood function [15]:

     $$\pi_t^{(n)} = p(z_t \mid s_t^{(n)}) \propto \exp\left( -\frac{1}{2\sigma_r^2} \cdot \frac{1}{M_r} \sum_{m=1}^{M_r} (1 - r_m)^2 \right) \qquad (2)$$

     where $r_m$ denotes the segmentation value of the m-th pixel from the set of $M_r$ instrument pixels. The weights are normalized so that $\sum_n \pi_t^{(n)} = 1$.
   – Estimate: the tracking position is estimated as the weighted mean of the samples:

     $$\bar{s}_t = \sum_{n=1}^{N} \pi_t^{(n)} s_t^{(n)} \qquad (3)$$

A condensed sketch of this loop is given after the list.
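The following Python sketch condenses steps (1)-(3) into a single tracker update. It is an illustrative reading of the algorithm, not the authors' code; the values of N, σ_r, A and B, as well as the single-pixel measurement standing in for the set of M_r instrument pixels, are assumptions.

```python
import numpy as np

N = 200                    # number of samples per tracker (assumed)
SIGMA_R = 0.4              # measurement noise scale sigma_r (assumed)
A_COEF, B_COEF = 1.0, 3.0  # dynamic model parameters A and B (assumed scalars)

def condensation_step(samples, weights, mean_prev, mean_prev2, binary):
    """One select/predict/measure/estimate iteration for a single tracker.

    samples: (N, 2) pixel positions, weights: (N,) normalized weights,
    mean_prev, mean_prev2: the two previous weighted-mean positions,
    binary: classified image (255 = instrument, 0 = tissue).
    """
    # Select: resample N samples proportionally to their weights
    idx = np.random.choice(len(samples), size=N, p=weights)
    samples = samples[idx]
    # Predict, Eq. (1): drift by the weighted mean difference plus Gaussian noise
    omega = np.random.randn(N, 2)
    samples = samples + A_COEF * (mean_prev - mean_prev2) + B_COEF * omega
    # Measure, Eq. (2): a single pixel lookup per sample stands in for the
    # set of M_r instrument pixels used in the paper
    ys = np.clip(samples[:, 1].astype(int), 0, binary.shape[0] - 1)
    xs = np.clip(samples[:, 0].astype(int), 0, binary.shape[1] - 1)
    r = binary[ys, xs] / 255.0
    weights = np.exp(-(1.0 - r) ** 2 / (2 * SIGMA_R ** 2))
    weights /= weights.sum()
    # Estimate, Eq. (3): the weighted mean is the new tracking position
    mean = (weights[:, None] * samples).sum(axis=0)
    return samples, weights, mean
```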
Fig. 2. Tracking results of three image sequences: Instrument with trajectory highlighted in green
The change of instruments, which happens often in MIS, is handled by the procedure as well. A tracker is deleted if it does not lie inside an instrument area for five consecutive images. If a new instrument appears, a new tracker is initialized as soon as the classified instrument area touches the border area of the image. In general, newly initialized trackers are ignored as long as their lifetime is below a certain threshold. In the special case of overlapping instruments, the trackers of the individual instruments converge to the same position, and one of them is deleted. A sketch of this bookkeeping follows below.
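A minimal sketch of this tracker bookkeeping, with the five-image timeout from the text; the minimum-lifetime threshold and the border-region input are assumptions.

```python
import numpy as np

MAX_MISSES = 5     # delete a tracker after five images outside an instrument area
MIN_LIFETIME = 10  # ignore trackers younger than this (assumed threshold)

class Tracker:
    def __init__(self, position):
        self.position = np.asarray(position, dtype=float)  # weighted-mean estimate
        self.misses = 0  # consecutive images outside an instrument area
        self.age = 0     # images since initialization

def update_trackers(trackers, binary, border_centers):
    """Delete stale trackers, merge overlapping ones, spawn new ones at the border."""
    for t in trackers:
        t.age += 1
        x, y = int(t.position[0]), int(t.position[1])
        inside = (0 <= y < binary.shape[0] and 0 <= x < binary.shape[1]
                  and binary[y, x] > 0)
        t.misses = 0 if inside else t.misses + 1
    trackers = [t for t in trackers if t.misses < MAX_MISSES]
    # Overlapping instruments: trackers at the same position collapse to one
    seen, kept = set(), []
    for t in trackers:
        key = (int(t.position[0]), int(t.position[1]))
        if key not in seen:
            seen.add(key)
            kept.append(t)
    # Classified instrument areas touching the image border spawn new trackers;
    # border_centers is assumed to hold the centers of such regions
    kept.extend(Tracker(c) for c in border_centers)
    # Young trackers are kept but their trajectories are ignored until MIN_LIFETIME
    return kept, [t for t in kept if t.age >= MIN_LIFETIME]
```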
4 Conclusion and Future Work
The overall concept for the generation and classification of surgical skills in MIS was presented. This procedure can be used to provide context-specific support in an intraoperative assistance scenario using the endoscopic image sequences as sensory input. To extract as much information as possible from the observation of a skill, a robust tracking of the instruments was implemented and presented. The resulting trajectories provide the information needed to segment a surgical skill into elementary operators. The next step will be the extension of the tracking to three dimensions and the computation of a 3D model from the endoscopic image sequences.
References

1. R. Taylor, D. Stoianovici: Medical Robotics in Computer-Integrated Surgery. IEEE Transactions on Robotics and Automation, 2003.
2. R. Satava, A. Cuschieri, J. Hamdorf: Metrics for objective assessment. Journal of Surgical Endoscopy, 2003.
3. H. Lin, I. Shafran, T. Murphy, A. Okamura, D. Yuh, G. Hager: Automatic Detection and Segmentation of Robot-Assisted Surgical Motions. International Conference on Medical Image Computing and Computer-Assisted Intervention, 2005.
4. J. Rosen, M. Solazzo, B. Hannaford, M. Sinanan: Objective Evaluation of Laparoscopic Skills Based on Haptic Information and Tool/Tissue Interactions. Journal of Computer Aided Surgery, 2002.
5. B. Lo, A. Darzi, G. Yang: Episode Classification for the Analysis of Tissue/Instrument Interaction with Multiple Visual Cues. International Conference on Medical Image Computing and Computer-Assisted Intervention, 2003.
6. H. Mayer, I. Nagy, A. Knoll: Skill Transfer and Learning by Demonstration in a Realistic Scenario of Laparoscopic Surgery. International Conference on Humanoid Robots, 2003.
7. M. Pardowitz, R. Zöllner, R. Dillmann: Incremental Acquisition of Task Knowledge Applying Heuristic Relevance Estimation. International Conference on Robotics and Automation, 2006.
8. R. Zöllner, O. Rogalla, R. Dillmann, M. Zöllner: Understanding Users Intention: Programming Fine Manipulation Tasks by Demonstration. International Conference on Intelligent Robots and Systems, 2002.
9. F. Vogt, S. Krüger, H. Niemann, C. Schick: A System for Real-Time Endoscopic Image Enhancement. International Conference on Medical Image Computing and Computer-Assisted Intervention, 2003.
10. G. S. Guthart, J. K. Salisbury: The intuitive telesurgery system: Overview and application. International Conference on Robotics and Automation, 2000.
11. S. Phung, A. Bouzerdoum, D. Chai: Skin Segmentation Using Color Pixel Classification: Analysis and Comparison. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005.
12. M. Isard, A. Blake: Condensation - conditional density propagation for visual tracking. International Journal of Computer Vision, 1998.
13. P. Azad, A. Ude, R. Dillmann, G. Cheng: A Full Body Human Motion Capture System Using Particle Filtering and On-the-Fly Edge Detection. International Conference on Humanoid Robots, 2004.
14. P. Azad: Integrating Vision Toolkit (IVT). http://ivt.sourceforge.net
15. J. Deutscher, A. Blake, I. Reid: Articulated Body Motion Capture by Annealed Particle Filtering. International Conference on Computer Vision and Pattern Recognition, 2000.