Comparison of Image Features Calculated in Different Dimensions for Computer-Aided Diagnosis of Lung Nodules

Ye Xu*a, Michael C. Leea, Lilla Boroczkya, Aaron D. Cann, Alain C. Borczukb, Steven M. Kawutb, Charles A. Powellb

a Philips Research North America, 345 Scarborough Road, Briarcliff Manor, NY, USA
b Columbia University, College of Physicians and Surgeons, New York, NY, USA

ABSTRACT

Features calculated from different dimensions of images capture quantitative information about lung nodules through one or multiple image slices. Previously published computer-aided diagnosis (CADx) systems have used either two-dimensional (2D) or three-dimensional (3D) features, though there has been little systematic analysis of the relevance of the different dimensions or of the impact of combining them. The aim of this study is to determine the importance of combining features calculated in different dimensions. We performed CADx experiments on 125 pulmonary nodules imaged using multi-detector row CT (MDCT). The CADx system computed 192 2D, 2.5D, and 3D image features of the lesions. Leave-one-out experiments were performed using five different combinations of features from different dimensions: 2D, 3D, 2.5D, 2D+3D, and 2D+3D+2.5D. The experiments were performed ten times for each group. Accuracy, sensitivity and specificity were used to evaluate the performance. Wilcoxon signed-rank tests were applied to compare the classification results from these five combinations of features. Our results showed that 3D image features generate the best result compared with the other combinations of features. This suggests one approach to potentially reducing the dimensionality of the CADx data space and the computational complexity of the system while maintaining diagnostic accuracy.

Keywords: lung, CAD development, classifier design, pulmonary nodule

1. INTRODUCTION

There are 208,657 new cases of lung cancer expected to occur in the US in 2008, accounting for 15% of cancer diagnoses and ranking second among new cancer cases [1]. Moreover, 161,775 lung cancer deaths are expected in the US in 2008, making lung cancer the leading cause of cancer death [1]. One in 14 men and women will be diagnosed with lung cancer during their lifetime, and the overall 5-year survival rate is only 16.0% [2]. Recent studies showed that survival for particular types of lung cancer can be improved by chemotherapy, surgery, radiation therapy and intervention at early stages. Consequently, the earlier lung cancer can be diagnosed, the greater the chance it can be treated. However, currently only 16% of lung cancers are diagnosed at an early stage [3]. Solitary pulmonary nodules (SPNs) are small (less than 3-4 cm), rounded, elliptical or irregularly shaped densities in the lungs, often spotted on chest x-rays or computed tomography (CT). CT scans are extremely sensitive, detecting nodules as small as 2 or 3 mm within the lungs that cannot be seen on a conventional chest x-ray. Identification of pulmonary nodules in CT images is important because it represents a potential finding of lung cancer. In current practice, 40% of the nodules found on CT are malignant [4]. A final, accurate diagnosis of the nodule can be acquired by a biopsy of the nodule tissue sample. However, even minimally invasive biopsies involve some level of patient risk, discomfort, and cost. In addition to detecting SPNs, CT scans can be used to non-invasively assess whether a nodule should be biopsied, or whether a second scan at a later date is more appropriate. Computer-aided diagnosis (CADx) can reduce physicians' workload and provide a second opinion by using a computer to assign a probability to a diagnosis.
The imaging characteristics of nodules, such as shape, volume, and margin, are very important features in identifying malignant nodules; e.g., many malignant lesions have irregular shapes with spiculated margins, while many benign nodules are round with a smooth margin. Fig. 1 shows a malignant lung nodule with a spiculated margin in one slice of a CT volume.

Medical Imaging 2009: Computer-Aided Diagnosis, edited by Nico Karssemeijer, Maryellen L. Giger, Proc. of SPIE Vol. 7260, 72600Z · © 2009 SPIE · CCC code: 1605-7422/09/$18 · doi: 10.1117/12.807866

Fig. 1. Left image shows one slice of a CT image of a malignant nodule in the periphery of the left lung. The right image shows an enlarged view of the nodule, showing its spiculated margin and irregular shape.

Calculation of image features to quantify the characteristics of the abnormal area is essential for lung nodule CADx. Armato et al. investigated automated lung nodule classification based on CT using a combination of gray-level and morphology-based 2D and 3D features [4]. Way et al. applied a rubber band straightening transform to the margin of the nodules and calculated run-length statistic texture features on the transformed data; inside the nodules, they calculated several morphological and gray-level features [5]. Yoshida et al. worked on CADx of pulmonary nodules in chest radiographs using wavelet snake features combined with morphological features. Among the research groups working on CADx of nodules, some systems have employed only 2D features [6-9], some used combinations of 2D and 3D features [4], and some groups used 3D features alone [5, 10]. We have previously described a CADx algorithm combining image features in different dimensions, including 2D, 2.5D and 3D features [11]. 2D features characterize a suspicious area in the slice containing the largest nodule cross-sectional area. 3D features are calculated on the whole volume of interest (VOI) through multiple slices. 2.5D features are the size-weighted average of the features of the VOI in multiple slices. We have calculated different features in various dimensions, such as grayscale, shape, texture and gradient features [11, 12]. That system achieved 78% accuracy, 79% sensitivity, and 78% specificity. However, there has been little systematic study of the relevance of the different dimensions and of the impact of combining different dimensions. The primary aim of this study is to determine the importance of combining 2D, 3D and 2.5D features in the characterization of lung nodules. Secondly, we want to compare features in different dimensions to find the solution with the best accuracy, specificity and sensitivity.
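The 2.5D definition above (a size-weighted average of per-slice 2D feature values) can be sketched in a few lines. This is a minimal illustration, not the paper's implementation; the function name `feature_2p5d` and the toy per-slice values are hypothetical.

```python
import numpy as np

def feature_2p5d(per_slice_values, per_slice_areas):
    """Sketch of a 2.5D feature: the average of a 2D feature computed
    on each slice of the VOI, weighted by the nodule cross-sectional
    area in that slice (larger slices contribute more)."""
    values = np.asarray(per_slice_values, dtype=float)
    areas = np.asarray(per_slice_areas, dtype=float)
    return float(np.sum(values * areas) / np.sum(areas))

# Toy example: a compactness-like 2D feature measured on 4 slices
print(feature_2p5d([0.9, 0.8, 0.7, 0.6], [10.0, 30.0, 40.0, 20.0]))  # 0.73
```

Unlike a plain mean over slices, this weighting reduces the influence of the small nodule cross-sections at the top and bottom of the VOI, which tend to be noisy.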
This report is organized as follows: our proposed CAD system will be presented in section 2, and we will explain the details of the experiments in section 3. The conclusion and discussion will be presented in section 4.

2. METHODS

2.1 Data Preparation

Data were collected from 121 patients scanned using multi-detector row CT (MDCT) at the Columbia University Medical Center (New York, NY) between 2000 and 2003 (in-plane resolution 0.5-0.8 mm, slice thickness 5 mm), with one to three nodules per patient. 125 nodules (63 benign; 62 malignant) were used for leave-one-out cross-validation of the CADx system. "Ground truth" diagnosis was established from either biopsy results or from observed two-year stability in nodule size.

2.2 System Overview

The diagram of our CADx system is shown in Fig. 2. After extracting a VOI containing each nodule, the nodule and surrounding normal structures were segmented [13]. We then calculated 192 image features on the segmented volumes. The above procedure was applied to both the training data (124 nodules in each leave-one-out iteration) and the test data (the 1 nodule left out). We next used a Cross-generation Heterogeneous-crossover Cataclysmic-mutation (CHC) genetic algorithm (GA) wrapped around a linear support vector machine classifier (GA-SVM) to find optimal features based on the training data [14]. A linear SVM was then used to classify the test data based on the feature subset identified by the GA. By randomly splitting the full training data (124 nodules in each leave-one-out iteration) into GA training sets of 100 nodules and GA testing sets of 24 nodules, different feature subsets and classifications were obtained. This random splitting was performed n times to create n classifications, which were then combined to yield the final ensemble SVM result. The final likelihood of malignancy for the VOI was calculated by combining the individual SVMs of the ensemble by voting (0.50 was used as the threshold for the diagnosis).
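The voting step at the end of this pipeline can be sketched as follows. This is a minimal illustration of combining binary ensemble votes into a likelihood of malignancy with a 0.50 decision threshold; the `ensemble_vote` helper is hypothetical and the SVM training itself is not shown.

```python
import numpy as np

def ensemble_vote(member_predictions, threshold=0.5):
    """Combine binary votes (0 = benign, 1 = malignant) from the
    ensemble members. The mean vote serves as the likelihood of
    malignancy and is compared against the decision threshold."""
    votes = np.asarray(member_predictions, dtype=float)
    likelihood = float(votes.mean())
    return likelihood, likelihood > threshold

# e.g. an ensemble of 50 SVMs, 32 of which vote "malignant"
preds = [1] * 32 + [0] * 18
likelihood, malignant = ensemble_vote(preds)
print(likelihood, malignant)
```

Voting makes the final score easy to interpret (the fraction of ensemble members that consider the nodule malignant) and is robust to any single SVM trained on an unlucky random split.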


Fig. 2. Block diagram of the CADx system with solid lines representing the training steps and dashed lines representing the test steps.

2.3 Feature Extraction

We used three dimensions of image features (2D, 2.5D and 3D) since features in different dimensions may each provide useful information for the characterization. Fig. 3 shows a 3D view of a nodule at two different angles 90 degrees apart. In the right image, we can see this nodule is not round; it is more like a cone, especially at the tail of the nodule. If we take a slice of the nodule in a plane parallel to the left-hand view, the nodule will appear round. If the slice is taken parallel to the right image, it may look like an elongated ellipse. Thus, 2D information alone might not characterize the nodule adequately, since it depends on the orientation of the nodule relative to the slice.
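This orientation dependence is easy to demonstrate on synthetic data. The sketch below (a made-up illustration, not from the paper) builds an elongated ellipsoid "lesion" and compares a simple 2D shape measure, the bounding-box aspect ratio, in two orthogonal slice planes.

```python
import numpy as np

# Synthetic elongated lesion: an ellipsoid with semi-axes 5, 5, 15 voxels.
# It looks round in a slice perpendicular to its long axis and elongated
# in a slice containing the long axis.
z, y, x = np.mgrid[-20:21, -20:21, -20:21]
lesion = (x / 5.0) ** 2 + (y / 5.0) ** 2 + (z / 15.0) ** 2 <= 1.0

def aspect_ratio(mask2d):
    """Ratio of the longer to the shorter bounding-box side of a
    binary 2D mask (valid here because the lesion is convex)."""
    h = np.any(mask2d, axis=1).sum()  # number of occupied rows
    w = np.any(mask2d, axis=0).sum()  # number of occupied columns
    return max(h, w) / min(h, w)

axial = lesion[20, :, :]     # plane perpendicular to the long axis
sagittal = lesion[:, :, 20]  # plane containing the long axis
print(aspect_ratio(axial), aspect_ratio(sagittal))  # ~1.0 vs ~2.8
```

The same lesion yields an aspect ratio near 1 in one plane and near 3 in the other, which is exactly why a single-slice 2D feature can misrepresent a non-spherical nodule.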


Fig. 3. A lesion in 3D view (left image); the same lesion rotated 90 degrees (right image).

The numbers of features calculated in each dimension in our study are given in Table 1. We calculated grayscale, shape, invariant moment, gradient, and texture features [11, 12] for the nodules and the surrounding areas in 2D, 2.5D and 3D, as defined in the Introduction. The 3D features extended some of the 2D features to 3D while also adding features such as grayscale histogram, texture and gradient features.

Table 1: Number of features of each type used in the 2D, 2.5D and 3D feature calculation

           total   grayscale   shape   invariant moments   texture   gradient   histogram
  2D         56        7         15            6              27         1          NA
  2.5D       56        7         15            6              27         1          NA
  3D         80        6          8            6              47         8           5
  Total     192

3. EXPERIMENTS

We performed the experiment on five groups of combinations of features in different dimensions, named 2D, 3D, 2.5D, 2D+3D, and 2D+3D+2.5D respectively. For example, 2D means using 2D features alone, 2D+3D means using both 2D and 3D features, etc. For each combination, we first determined the optimal feature subset size using an ensemble classifier of 10 SVMs. Candidate sizes were tested by forcing the GA to produce subsets of a fixed size, using the ensemble SVM approach described in section 2.2. Then, we performed leave-one-out cross-validation on the 125 cases. For each test case, a final ensemble classifier of 50 SVMs was constructed using the optimal subset size, training each ensemble member on the other 124 cases. Each experiment was repeated ten times. Each time, the random number seeds for the GA-SVM algorithm were reinitialized, resulting in selection of different features. The accuracy, sensitivity and specificity were averaged across the ten experiments. The Wilcoxon signed-rank test was applied to compare the accuracy, sensitivity and specificity across the different feature-combination groups. Fig. 4 illustrates the set-up of the experiment.
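The three reported metrics can be computed from the pooled leave-one-out predictions as follows. This is a minimal sketch with made-up toy labels; the helper name is hypothetical.

```python
import numpy as np

def sensitivity_specificity_accuracy(y_true, y_pred):
    """Compute the three evaluation metrics from binary labels
    (1 = malignant, 0 = benign)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))  # true positives
    tn = np.sum((y_true == 0) & (y_pred == 0))  # true negatives
    fp = np.sum((y_true == 0) & (y_pred == 1))  # false positives
    fn = np.sum((y_true == 1) & (y_pred == 0))  # false negatives
    sens = tp / (tp + fn)          # fraction of malignant cases caught
    spec = tn / (tn + fp)          # fraction of benign cases cleared
    acc = (tp + tn) / len(y_true)  # overall fraction correct
    return sens, spec, acc

# Toy example: 4 malignant and 4 benign cases
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
print(sensitivity_specificity_accuracy(y_true, y_pred))
```

In each of the ten repetitions, these metrics would be computed over the 125 leave-one-out predictions and then averaged across repetitions.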


Fig. 4. Set-up of the experiment. For each feature group (2D, 2.5D, 3D, 2D+3D, 2D+2.5D+3D), the optimal feature subset size is first found (100 training cases, 24 testing cases, 10 SVMs); leave-one-out cross-validation is then run at that size (100 training cases, 24 testing cases, 50 SVMs); the procedure is repeated 10 times, the ensemble votes are combined, and sensitivity, specificity and accuracy are averaged over the 10 results.

4. RESULTS

During the search for the optimal feature subset size, we found that for each combination of features in different dimensions, the accuracy of classification increased initially and then declined after the optimal subset size was reached. For example, in Fig. 5, for the 2D+2.5D+3D features, the accuracy reached its highest value at 40 features and decreased after that. The optimal feature subset sizes for the different feature combinations are shown in Table 2.

Fig. 5. Accuracy change with feature subset size variation for the 2D+2.5D+3D features.

Table 2. Optimal feature subset size for each combination of features in different dimensions

  Experiments             Optimal Size
  2D features                  20
  2.5D features                20
  3D features                  40
  2D+3D features               30
  2D+2.5D+3D features          40

Fig. 6 shows the average leave-one-out accuracy, sensitivity, and specificity for the 2D, 2.5D, 3D, 2D+3D, and 2D+3D+2.5D experiments. The Wilcoxon signed-rank tests showed that 3D image features alone generated the best result among the 5 groups (p
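The paired comparison used here can be sketched with `scipy.stats.wilcoxon`, which pairs the per-repetition scores of two feature groups. The accuracy values below are made up for illustration; they are not the paper's results.

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical per-repetition accuracies for two feature groups
# (10 repetitions each, paired by repetition)
acc_3d = np.array([0.80, 0.79, 0.81, 0.80, 0.78, 0.80, 0.79, 0.81, 0.80, 0.79])
acc_2d = np.array([0.75, 0.74, 0.76, 0.75, 0.73, 0.76, 0.74, 0.75, 0.76, 0.74])

# Paired, non-parametric test across the ten repetitions: appropriate
# here because the ten accuracy values per group are few and not
# guaranteed to be normally distributed.
stat, p_value = wilcoxon(acc_3d, acc_2d)
print(stat, p_value)
```

With only ten paired observations per group, a non-parametric paired test is a reasonable choice; a significant p-value indicates a consistent per-repetition difference rather than a difference driven by one lucky run.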
