Computers in Biology and Medicine 62 (2015) 196–205
Semiautomatic computer-aided classification of degenerative lumbar spine disease in magnetic resonance imaging
Silvia Ruiz-España a, Estanislao Arana b, David Moratal a,*
a Center for Biomaterials and Tissue Engineering, Universitat Politècnica de València, 46022 Valencia, Spain
b Radiology Department, Fundación Instituto Valenciano de Oncología, Valencia, Spain
Article info
Article history: Received 16 December 2014; Accepted 16 April 2015
Keywords: Lumbar intervertebral discs; Disc degeneration; Herniation; Lumbar spinal stenosis; Segmentation; Reproducibility

Abstract
Background: Computer-aided diagnosis (CAD) methods for detecting and classifying lumbar spine disease in magnetic resonance imaging (MRI) can assist radiologists in their decision-making tasks. In this paper, CAD software able to classify and quantify spine disease (disc degeneration, herniation and spinal stenosis) in two-dimensional MRI has been developed.
Methods: A set of 52 lumbar discs from 14 patients was used for training and 243 lumbar discs from 53 patients for testing in conventional two-dimensional MRI of the lumbar spine. To classify disc degeneration according to the gold standard, the Pfirrmann classification, a method based on the measurement of disc signal intensity and structure was developed. A Gradient Vector Flow algorithm was used to extract disc shape features and to detect contour abnormalities. A signal intensity method was used for segmenting and detecting spinal stenosis. Novel algorithms have also been developed to quantify the severity of these pathologies. Variability was evaluated by kappa (κ) and intra-class correlation (ICC) statistics.
Results: Segmentation inaccuracy was below 1%. Almost perfect agreement, as measured by the κ and ICC statistics, was obtained for all the analyzed pathologies: disc degeneration (κ = 0.81 with 95% CI = [0.75..0.88]) with a sensitivity of 95.8% and a specificity of 92.6%, disc herniation (κ = 0.94 with 95% CI = [0.87..1]) with a sensitivity of 60% and a specificity of 87.1%, categorical stenosis (κ = 0.94 with 95% CI = [0.90..0.98]) and quantitative stenosis (ICC = 0.98 with 95% CI = [0.97..0.98]) with a sensitivity of 70% and a specificity of 81.7%.
Discussion: The proposed methods are reproducible and should be considered as a possible alternative when compared to reference standards.
© 2015 Elsevier Ltd. All rights reserved.
* Correspondence to: Center for Biomaterials and Tissue Engineering, Universitat Politècnica de València, Camí de Vera, s/n, 46022 Valencia, Spain. Tel.: +34 96 387 70 07x88939; fax: +34 96 387 72 76. E-mail address: [email protected] (D. Moratal).
http://dx.doi.org/10.1016/j.compbiomed.2015.04.028
0010-4825/© 2015 Elsevier Ltd. All rights reserved.

1. Introduction
Lumbar disc degeneration, herniation and spinal stenosis are very common entities that affect millions of people, causing lower back pain (LBP), which can restrict mobility and interfere with daily routine or posture [1]. Degenerative changes such as loss of disc height or osteophyte formation, together with degenerative disc herniation, cause most cases (>90%) of lumbar spinal central and lateral stenosis [2,3]. Approximately one in every 1000 individuals over the age of 65 undergoes laminectomy surgery annually for spinal stenosis [2,4]. One-third of adults over the age of 20 show evidence of herniated discs [5] and 90% of herniations occur in the lumbar and lumbosacral regions of the spine [6].
Although imaging techniques have limitations, magnetic resonance imaging (MRI) is the preferred modality for the accurate diagnosis of intervertebral disc pathology and spinal stenosis [7]. As inter-rater agreement among radiologists is often moderate, reliable methods to quantify and classify these entities are needed [5,7]. In addition, demand for computer-aided diagnosis (CAD) methods has increased in the past decade as a way to reduce radiologist workload in the imaging diagnosis of lower back pain [8] and to improve repeatability.
Several algorithms have been developed, with variable success. For disc degeneration, some studies measured the T1, T1ρ and T2 relaxation times and the apparent diffusion coefficient (ADC) [9–11]. However, their main drawbacks are long acquisition times and the need for specific image acquisition protocols, which make their routine clinical use difficult; their relationship with outcomes is also controversial [7,12]. In addition, other studies based on shape, context, intensity, and texture information differentiated only between normal and degenerated discs, without specifying the degeneration grade [13–15]. By contrast, several approaches used classification systems [12,16,17], such as Pfirrmann classification.
Pfirrmann classification is a 5-level grading system for classifying the severity of disc degeneration. Disc degeneration is graded on MRI images by evaluating the homogeneity of disc structure, signal intensity, differentiation between nucleus and annulus, and disc height. This information is converted into five grades. A disc is graded I when signal intensity is homogeneous and bright, there is a clear distinction between nucleus and annulus, and height is normal. Grade V shows low signal intensity, no distinction between nucleus and annulus, and a collapsed disc space [6]. However, most of the approaches based on Pfirrmann classification were not capable of distinguishing between grades IV and V [12,16]. An early degenerative change normally seen in MRI is a decrease of the mean intensity on T2 sequences, so mean T2 intensity is the most common and widely accepted basis for disc degeneration classification [7], and it has been taken into account to develop the method presented in this work.
There are not many clinically useful CAD systems for the detection of disc herniations in the lumbar area. Some studies proposed methods based mainly on geometrical features (shape, size and location) [18–21]. However, the majority of them only distinguished discs as normal or abnormal [18,21], or between bulging and herniation [19]. Tsai et al. [20] developed a method able to distinguish among bulging, protrusion, extrusion and separation, but it was only patented for educational purposes. In this work, we present a method, also based on shape features, capable of classifying discs as normal, focal-based protrusion, broad-based protrusion or extrusion, regardless of their location.
Methods developed to diagnose lumbar spinal stenosis with standard definitions are limited [21–24] and quantitative techniques are even scarcer [2]. The method presented in this work makes it possible not only to detect but also to quantify spinal stenosis.
In addition, the existence of one abnormality provokes the development of other abnormalities [1,2], so CAD systems able to detect several related pathologies would be very useful for clinical routine. However, combined methods to detect several pathologies are rarely reported [15,21], and they are only capable of classifying the discs as normal or abnormal.
Another drawback is the absence of a gold standard, which can only be obtained in cadaver specimens [25,26]. Currently, the only accepted source for the definition of a ground truth (GT) is based on signal intensities and boundary markings performed by expert radiologists [27], which are absent in several spine imaging studies [2,5,7]. Therefore, grading classifications such as Pfirrmann's for disc degeneration [6], and disc contour classification [28] according to standardized nomenclature, are encouraged [29].
The purpose of supervised learning in a CAD system is to deduce a functional relationship from training data that generalizes well to unknown data. This requisite is not always accomplished in diagnostic imaging studies [5,7,30,31]. In this work, we restrict ourselves to two-dimensional MRI as it is the gold standard in clinical practice [2], due to its relative simplicity and low computational requirements.
The purpose of this paper is to present a three-way CAD methodology for detecting and quantifying degenerative disc disease (according to Pfirrmann classification), disc contour abnormalities and spinal stenosis with minimal user input. To the best of our knowledge, no full CAD system is available to detect and also to quantify spinal stenosis.
2. Materials and methods
2.1. Subjects
For validation and testing, 14 (9 male, 5 female) and 53 (25 male, 28 female) subjects, respectively, were randomly selected among patients referred for lumbar MRI to our Radiology Department for LBP and/or sciatica in 2013, and were randomly assigned to each group. All were assessed with a visual analog pain scale (VAS, range 0–10). There were no statistical differences regarding age, gender, or VAS pain scale between the two groups (t-test, p = 0.12). Their characteristics are shown in Table 1.
All lumbar intervertebral levels were selected for Pfirrmann's grade analysis. Disc contour was studied where axial images were obtained, since in clinical practice discs judged normal by the MRI technologist in sagittal images are not explored in axial sequences. Prevalence of disc degeneration was available for the 70 discs used for validation and for the 265 discs used for testing. Its global value was 31.5% (32.3% for the validation cases and 31.1% for the test cases) (Table 1). No statistical differences were found between the two datasets (ANOVA F: 1.04, p = 0.08). Disc contour abnormalities were validated in 52 discs of the former group and tested in 180 discs of the testing dataset (Table 1).
2.2. Magnetic resonance imaging
All examinations were performed on a 1.5-T MRI (Siemens Symphony, Erlangen, Germany) with a 6-channel phased-array spine coil. The same image acquisition protocols were used for validation and testing. The sequences commonly used for detecting spinal pathology, axial and sagittal T2-weighted, were used in this study without fat suppression [32]:
Sagittal T2-weighted turbo spin echo: 2896–3300 ms/102–120 ms (TR/effective TE), 416–576 × 448–1024 matrix, 270 mm field of view, 11 slices of 4 mm thickness and a pixel spacing of [0.4492–0.8203] × [0.4492–0.8203], 2 acquisitions, 12 echo train length.
Axial T2-weighted turbo spin echo: 2896–3040 ms/103–120 ms (TR/effective TE), 256–512 × 256–512 matrix, 180 mm field of view, 15 slices of 4 mm thickness and a pixel spacing of [0.3906–0.8594] × [0.3906–0.8594], 3 acquisitions, 5 echo train length. Slices were placed in the plane of the five lower discs.
2.3. Disc and spinal stenosis qualitative classification
Qualitative classification of disc degeneration based on the Pfirrmann grading system was made by an experienced radiologist (15 years' experience in spine imaging). Discs were classified into 5 grades (from grade I: normal disc to grade V: collapsed disc space).

Table 1
Characteristics of patients included in the study. M, male; F, female; LBP, low back pain; VAS, visual analog scale; y, years.

                                             Validation group (14 patients)   Testing group (53 patients)
Gender                                       9 M/5 F                          25 M/28 F
Age (y) a                                    46.1 ± 13.7                      47.3 ± 12.7
LBP intensity (VAS)                          6.1 ± 1.2                        6.3 ± 1.7
Disc degeneration (Pfirrmann's grade) (%)    70 discs                         265 discs
  I                                          –                                –
  II                                         7.14                             7.2
  III                                        21.3                             21.6
  IV                                         42.8                             43
  V                                          28.8                             28.2
Disc contour (%)                             52 discs                         180 discs
  Normal                                     28.8                             29.4
  Bulging                                    48.1                             50.5
  Herniation                                 23.1                             20.1
Spinal stenosis (yes)                        50                               48.8

a Mean and standard deviation.
Qualitative classification of disc herniation and detection of spinal stenosis were also made by the same radiologist. Herniation was defined as a localized displacement of disc material beyond the limits of the intervertebral disc space [28]. Every disc was therefore classified as normal, bulging disc, or herniation: protrusion (focal-based or broad-based) or extrusion [28,32]. An example of these pathologies is shown in Fig. 1. This radiologist also performed a manual disc segmentation on 2D MRI axial and sagittal slices, blinded to imaging reports, one month later.
Before starting any analysis, localizer lines indicating the location of the transverse axial image slices were superimposed on the mid-sagittal slice. In this way, when the user selects the discs to be analyzed, the localizer lines positioned in the middle of the discs are selected at the same time, locating not only the disc levels but also the axial images to be analyzed.
All semi-automatic analyses were performed by the same engineer, naïve to MRI reading and blinded to imaging reports. In the whole process of detecting and quantifying degenerative disc disease and contour abnormalities, manual tasks were limited to placing a seed point in the intervertebral disc. A seed point was also necessary to segment the spinal canal, together with 5 landmarks per patient to detect and quantify stenosis in the whole lumbar region. A processing time of 40 s per lumbar spine was required to perform a complete analysis. A diagram of the process can be seen in Fig. 2. This was done with an in-house graphical user interface developed on MATLAB 7.10 (R2010a) (The MathWorks, Inc., Natick, MA, USA), as were all the scripts implemented in the following sections.
Fig. 1. Axial T2-weighted MR images taken through L5-S intervertebral discs. (a) Normal disc where it is possible to distinguish the increased signal of the normal nucleus and decreased signal of the normal annulus. (b) Bulging disc (white arrows). Focal-based protrusion (c), broad-based protrusion (d) and an extrusion (e) are outlined by the dotted white lines.

Fig. 2. Segmentation steps of the proposed methods to classify and quantify disc degeneration and herniation (a) and spinal stenosis (b).

2.3.1. Disc degeneration. Quantitative evaluation
Evaluation of disc degeneration was performed in sagittal T2-weighted MRI; a mid-sagittal slice was selected for each case. Prior to other analyses, localization and labeling of the intervertebral discs was a mandatory first step [33]. Secondly, an image preprocessing procedure was performed. A contrast stretching was applied to the images, extending the intensity distribution around an ad-hoc level of interest automatically chosen for each image using Otsu's method [34]. A Gaussian filter was applied for denoising, followed by a Canny edge detection method to enhance the edges and prepare the image for segmentation [35]. To solve poor convergence to concave boundaries, a Gradient Vector Flow algorithm was used and manually initialized using a single seed point [36]. A comparison between semi-automatic segmentation and manual segmentation is shown in Fig. 3.
The skeleton of the segmented area was extracted after sagittal image segmentation, using a skeleton pruning method based on contour partitioning with Discrete Curve Evolution (DCE) [37]. Pruning was applied to obtain a skeleton with only four terminal points, shown in Fig. 4(a). The longest horizontal (x-direction) path of the skeleton was then chosen, keeping only two terminal points and one point per pixel. As degenerated discs present decreased height, disc height was measured [38] at the midpoint of the horizontal skeleton, depicting the vertical (y-direction) intensity profile of the disc along the line orthogonal to the horizontal path of the skeleton. Grade V (severe disc degeneration) was categorized not only considering the intensity profile but also when disc height was below 6 mm (Fig. 4(b)), a threshold selected by comparing the results obtained with the training dataset and the qualitative evaluation.
On the vertical intensity profile of each disc, the 10th–90th percentiles were computed to avoid signal intensity outliers (Fig. 4(c)). The 10% central values of the disc profile were thoroughly studied (ranging from 1 to 3 pixels depending on the disc height), as differentiation of nucleus and annulus is a key finding in
Pfirrmann classification. The vertical intensity profiles were then normalized according to their min–max intensities. Normalized vertical intensity profiles of the 52 lumbar intervertebral discs are shown in Fig. 5(a) and the magnification of the 10% central values in Fig. 5(b). From the intensity profiles, intensity summary statistics were calculated (Fig. 6(a) and (b)). Mean intensities were selected as classification thresholds as they differentiated significantly among Pfirrmann grades. Therefore, using the training dataset, classification thresholds were obtained from intensity profiles depicting a hypointensity gradient as degeneration advances.
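The profile features described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' MATLAB implementation: the function names, the nearest-rank percentile rule and the reading that the 10th–90th percentiles bound the values before min–max normalization are assumptions, and the example profile is hypothetical. Only the 10% central window and the 6 mm height threshold come from the text.

```python
def normalize_profile(profile):
    """Clip to the 10th-90th percentile range, then min-max normalize to [0, 1]."""
    ordered = sorted(profile)
    n = len(ordered)
    lo = ordered[round(0.10 * (n - 1))]  # nearest-rank 10th percentile (assumed rule)
    hi = ordered[round(0.90 * (n - 1))]  # nearest-rank 90th percentile (assumed rule)
    if hi == lo:
        return [0.0] * n
    return [(min(max(v, lo), hi) - lo) / (hi - lo) for v in profile]


def central_mean(norm_profile, fraction=0.10):
    """Mean of the central `fraction` of the profile (at least one pixel)."""
    n = len(norm_profile)
    width = max(1, round(fraction * n))
    start = (n - width) // 2
    window = norm_profile[start:start + width]
    return sum(window) / len(window)


def is_collapsed(height_px, pixel_spacing_mm, threshold_mm=6.0):
    """True when the disc height is below the 6 mm threshold used for grade V."""
    return height_px * pixel_spacing_mm < threshold_mm


# Hypothetical vertical profile (arbitrary intensity units) with one bright outlier.
profile = [5, 20, 40, 70, 90, 100, 90, 70, 40, 20, 200]
norm = normalize_profile(profile)
nucleus_mean = central_mean(norm)  # mean intensity of the nucleus zone
collapsed = is_collapsed(height_px=12, pixel_spacing_mm=0.45)
```

The grade thresholds themselves are not reproduced here because the text does not report their values; in practice they would be derived from the training profiles, as described above.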
Fig. 3. (a–d) Comparison between semiautomatic segmentation using a Gradient Vector Flow (yellow line) and manual segmentation (red line) of a disc classified as grade II (a), as grade III (b), as grade IV (c) and as grade V (d). Orange points indicate where semiautomatic segmentation and manual segmentation coincide. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 4. (a) Skeleton of the disc with four terminal points. (b) Skeleton with two terminal points (longest path) and central height of the disc (interpolated vector). (c.1) Vertical intensity profile. (c.2) Vertical intensity profile ordered removing the 10% profile ends.

2.3.2. Disc herniation. Quantitative evaluation
Evaluation of disc contour abnormalities was performed in axial T2-weighted MRI; the central slice of each disc analyzed was identified and selected for each case. For disc segmentation, the Gradient Vector Flow algorithm was again used, initialized with a manually selected seed point. Subsequently, binary masks of the segmented regions were obtained and the boundary of the intervertebral discs was approximated by an ellipse. This approximation was made by 2D tensor scale, which is a morphometric parameter that provides local measurements of size, orientation and anisotropy [39]. An approximation of a disc by an ellipse can be seen in Fig. 7(a). A binary mask was obtained from the ellipse and the two masks were subtracted. Connected areas bigger than 15 mm² outside this border were considered disc contour abnormalities (Fig. 7(b) and (c)), a threshold also selected by comparing the results obtained with the training dataset and the qualitative evaluation.
Contour abnormality length was computed by a normal vector at the midpoint of the interpolation between the ellipse and the selected area, while contour abnormality width was calculated by a perpendicular vector at the middle of the obtained length. Differentiation between a disc protrusion and an extrusion was made by geometry in the axial plane. If object width was bigger than object length, the object was named a protrusion; otherwise it was named an extrusion [28]. Length in millimeters was considered the herniation size (Fig. 7(d)). Distinction between focal-based protrusion and broad-based protrusion depends on the ratio between the border of the protrusion/extrusion and ellipse arc perimeter between the
intersection points. A focal-based protrusion involves less than 25% of the perimeter of the ellipse and a broad-based protrusion involves between 25% and 50% of the perimeter of the ellipse [28].
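The decision rules of this section can be summarized as a small function. The sketch below is an illustration under stated assumptions rather than the authors' code: the function and parameter names are hypothetical, a tie between length and width is treated as a protrusion (the text does not cover that case), and an object spanning more than 50% of the perimeter is returned as "unclassified" because the text defines no category for it.

```python
def area_mm2(n_pixels, spacing_x_mm, spacing_y_mm):
    """Convert a pixel count into an area using the in-plane pixel spacing."""
    return n_pixels * spacing_x_mm * spacing_y_mm


def classify_contour(area, length_mm, width_mm, base_fraction, min_area_mm2=15.0):
    """Classify an object lying outside the fitted ellipse.

    base_fraction is the share of the ellipse perimeter spanned by the object.
    """
    if area <= min_area_mm2:
        return "normal"                      # too small to count as an abnormality
    if length_mm > width_mm:
        return "extrusion"                   # longer than wide
    if base_fraction < 0.25:
        return "focal-based protrusion"      # under 25% of the perimeter
    if base_fraction <= 0.50:
        return "broad-based protrusion"      # between 25% and 50%
    return "unclassified"                    # assumption: no rule given above 50%


# Hypothetical object: 100 pixels at 0.4 mm x 0.4 mm spacing, wider than long.
label = classify_contour(area_mm2(100, 0.4, 0.4), length_mm=4.0,
                         width_mm=8.0, base_fraction=0.20)
```

Passing the area threshold as a parameter keeps the 15 mm² value visible and adjustable, since it was tuned on the training dataset.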
2.3.3. Spinal stenosis. Quantitative evaluation Detection and quantification of spinal stenosis was performed in sagittal T2-weighted MRI at intervertebral disc level [2]; midsagittal slice was again selected for this purpose.
To segment the spinal canal, a suitably sized intensity window was established by setting a threshold. A small circle (1.5 mm radius) was generated from a seed point manually placed in the cerebrospinal fluid (CSF) of the spinal canal. The adjustable window was centered on the mean intensity and the window width was proportional to the signal standard deviation in this region of interest (Fig. 8(a)).
In order to model the ideal contour of the spinal canal, a B-spline contour was fitted to landmarks, indicated by the radiologist, at the midpedicular level of each lumbar vertebra. The real contour was obtained by tracing normal vectors to the B-spline at each of its points. The intersection between each normal vector and the anterior margin of the segmented spinal canal provided a sequence of points describing the real contour of the spinal canal (Fig. 8(b) and (c)). A comparison between both contours was used to establish a threshold for the subsequent detection of spinal stenosis; the threshold was set to 4 mm (Fig. 9(a)). Quantification of the dural sac canal ratio (DSCR) was performed at the disc level as the main criterion for calculation of spinal stenosis [40]:

$$\mathrm{DSCR} = \left(1 - \frac{d}{m}\right) \times 100 \qquad (1)$$

where $d$ is the spinal canal diameter at disc level and $m$ is the mean of the diameters obtained at the unaffected mid-pedicle level of the two closest vertebral bodies, modified from Zheng et al. [40] (Fig. 9(b)).

Fig. 5. (a) Normalized vertical intensity profiles of the 52 lumbar intervertebral discs, with a magnification of the 10% central values (b). Profiles are marked by grade: grade II, grade III, grade IV (□), and grade V (*).

Fig. 6. (a) Mean intensity, standard deviation and maximum and minimum intensity values of the 10% central range of each disc. (b) Box-plot showing mean intensity summary statistics. The red line is the median, the top of the box is the 75th percentile, the bottom of the box is the 25th percentile and the whiskers represent the maximum and minimum values. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

2.4. Statistical analysis
Data analysis addressed 3 factors influencing semiautomatic evaluation:
1. Inaccuracy was calculated to evaluate segmentation quality in the training set.
2. Accuracy was estimated against the reference radiologist, after dichotomization. At the analysis phase, disc degeneration was classified into grades I, II and III (normal) versus IV and V (pathological) [32]. Disc contour was dichotomized into normal (including bulging) and abnormal (protrusion or extrusion). Spinal stenosis was graded categorically (yes/no) and quantitatively (percentage of reduction of spinal canal diameter). A 4-way analysis of variance (ANOVA) was used to compare the Pfirrmann degeneration grades and signal intensity profiles.
3. Reproducibility of quantitative measures was evaluated with the intraclass correlation coefficient (ICC) for each analysis. The intraclass correlation coefficient is a widely used measure of inter-rater reliability for the case of quantitative ratings:

$$\mathrm{ICC} = \frac{S_b^2}{S_b^2 + S_w^2} \qquad (2)$$

where $S_w^2$ is the pooled variance within subjects, $S_b^2$ is the variance of the measurements between subjects and $S_b^2 + S_w^2$ is the total variance. Following the criterion of Fleiss et al. [41], ICC > 0.75 has been taken as good agreement.
To assess reproducibility of categorical variables, subjective analysis of the training set of 52 lumbar discs was performed four times by the same engineer and analysis of the 243 lumbar discs used for testing was performed twice, with an interval of a month. Ratings from each test performance were cross-tabulated and agreement was measured by using the κ statistic. The κ statistic (Cohen's kappa) is intended to measure agreement in statistical treatments of qualitative data:

$$\kappa = \frac{P_o - P_e}{1 - P_e} \qquad (3)$$

where $P_o$ is the relative observed agreement among raters and $P_e$ is the hypothetical probability of chance agreement. Following the criterion of Landis et al. [42], κ was categorized as reflecting an almost perfect (0.81–1.0), substantial (0.61–0.80), moderate (0.41–0.60), fair (0.21–0.40), slight (0.00–0.20), or poor (<0.00) agreement. Agreement between each item (κ) was calculated with a 95% level of confidence. Bootstrap estimation of 95% confidence limits was calculated using 1000 resamples [43]. Statistical packages (GraphPad Prism, GraphPad Software, Inc., La Jolla, USA and IBM SPSS Statistics version 20, Somers, NY, USA) were used for data analysis.

Fig. 7. (a) Approximation of the intervertebral disc by an ellipse. (b) Objects obtained after subtraction of the segmented intervertebral disc and the filled ellipse. (c) Selection of objects with areas bigger than 15 mm² (the square indicates the magnified region shown in (d)). (d) Detection and classification of a disc as a focal-based protrusion (zoom of (c)). Pink lines are the length and the width of the object. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 8. (a) Segmented spinal canal (the square indicates the zoomed region shown in (b) and (c)). (b) Ideal and real contours in the magnified original image. Yellow points are the landmarks, indicated by the radiologist, at the midpedicular level of each lumbar vertebra. (c) Ideal and real contours in the magnified segmented image. Blue lines are the ideal contours, pink lines are the real contours and gray lines indicate the intervertebral disc locations. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 9. (a) Difference between real and ideal contours in mm. (b) Calculation of the spinal canal narrowing. Gray lines indicate the intervertebral disc locations.
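The two agreement statistics used in this section, the ICC as a ratio of between-subject variance to total variance and Cohen's kappa, can be computed as sketched below. This is an illustrative sketch only; the study itself used GraphPad Prism and SPSS. The one-way random-effects estimator of the variance components is an assumption (the text does not state which estimator was applied), and the function names and example ratings are hypothetical.

```python
from collections import Counter


def icc_oneway(ratings):
    """ICC = S_b^2 / (S_b^2 + S_w^2) from per-subject lists of repeated measurements."""
    n = len(ratings)
    k = len(ratings[0])  # repetitions per subject, assumed equal for all subjects
    means = [sum(r) / k for r in ratings]
    grand = sum(means) / n
    ms_between = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    ms_within = sum((x - m) ** 2
                    for r, m in zip(ratings, means) for x in r) / (n * (k - 1))
    var_between = (ms_between - ms_within) / k  # between-subject variance estimate
    return var_between / (var_between + ms_within)


def cohen_kappa(first, second):
    """kappa = (P_o - P_e) / (1 - P_e) for two rating passes over the same items."""
    n = len(first)
    p_o = sum(a == b for a, b in zip(first, second)) / n
    freq_1, freq_2 = Counter(first), Counter(second)
    p_e = sum(freq_1[c] * freq_2[c] for c in freq_1) / (n * n)
    return (p_o - p_e) / (1 - p_e)  # undefined when P_e equals 1


def landis_koch(kappa):
    """Verbal category following the Landis and Koch ranges quoted in the text."""
    if kappa < 0.0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical data: three subjects measured twice; four discs rated in two passes.
icc = icc_oneway([[1, 1], [2, 2], [3, 3]])
kappa = cohen_kappa(["yes", "yes", "no", "no"], ["yes", "no", "no", "no"])
category = landis_koch(kappa)
```

The bootstrap confidence limits mentioned in the text would be obtained by resampling the rated items with replacement and recomputing the statistic on each resample; that step is omitted here for brevity.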
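The accuracy analysis after dichotomization, in which Pfirrmann grades I to III count as normal and grades IV and V as pathological, amounts to a two-by-two comparison against the reference radiologist. The following sketch shows one way to obtain sensitivity and specificity from paired gradings; the function names and the example grades are hypothetical and are not data from the study.

```python
def is_pathological(grade):
    """Pfirrmann grades IV and V are pathological; grades I to III are normal."""
    return grade >= 4


def sensitivity_specificity(reference, predicted):
    """Sensitivity and specificity of boolean predictions against a reference."""
    tp = fn = tn = fp = 0
    for ref, pred in zip(reference, predicted):
        if ref and pred:
            tp += 1
        elif ref:
            fn += 1
        elif pred:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("reference must contain both positive and negative cases")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical gradings (1 to 5) for six discs: radiologist versus software.
reference_grades = [2, 3, 4, 5, 4, 1]
software_grades = [2, 4, 4, 5, 3, 1]
sens, spec = sensitivity_specificity(
    [is_pathological(g) for g in reference_grades],
    [is_pathological(g) for g in software_grades],
)
```

The same function applies to disc contour (normal including bulging versus protrusion or extrusion) and to categorical stenosis once each is reduced to a yes/no label.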
3. Results
3.1. Disc degeneration
Segmentation inaccuracy of disc segmentation was below 1% both for sagittal and axial images. Normalized vertical intensity profiles of the 52 lumbar intervertebral discs obtained by application of our method are shown in Fig. 5(a) and a magnified detail of the 10% central values is depicted in Fig. 5(b). Intensity means, standard deviation and minimum and maximum values of the range that contains the central 10% of the values (nucleus zone) are represented in Fig. 6(a). Mean intensities differentiated significantly among Pfirrmann grades (ANOVA, F = 3.87, p = 0.015) (Fig. 6(b)).
In this section, both κ and ICC statistics were used to examine reproducibility of disc degeneration. From now on, values of ICC and κ statistics will be given, followed by their 95% confidence interval, denoted by CI_{95}. Quantitative measures showed an ICC reproducibility of 0.9 with CI_{95} = [0.85..0.94] and a substantial agreement of κ = 0.65 with CI_{95} = [0.39..0.9]. In the testing dataset, ICC was 0.46 with CI_{95} = [0.36..0.56], while κ = 0.81 with CI_{95} = [0.75..0.88] showed an almost perfect agreement. Sensitivity was 95.8% and specificity 92.6%. Table 2 shows information relating to the ICC and κ calculations.

3.2. Disc herniation and spinal stenosis
In this section, ICC has been used to assess reproducibility of quantitative stenosis and κ statistics to assess agreement of herniation and categorical stenosis.
At the testing phase, herniation reliability showed an almost perfect agreement of κ = 0.95 with CI_{95} = [0.87..1.03], while categorical stenosis depicted a κ = 0.85 with CI_{95} = [0.77..0.93] and quantitative stenosis showed an ICC of 0.95 with CI_{95} = [0.91..0.97]. For disc herniations, validation showed an almost perfect agreement of κ = 0.94 with CI_{95} = [0.87..1], and categorical stenosis also showed an almost perfect agreement of κ = 0.94 with CI_{95} = [0.90..0.98], while quantitative stenosis obtained an ICC of 0.98 with CI_{95} = [0.97..0.98], as shown in Table 2. However, sensitivity for herniation diagnosis was 60% and specificity 87.1%. Categorical stenosis diagnosis showed 70% sensitivity and 81.7% specificity. Combined detection of disc herniations and spinal stenosis showed 68% sensitivity and 85.6% specificity.

Table 2
Intraclass correlation coefficient and kappa statistics.

                  Intraclass correlation (ICC)   95% Confidence interval       Kappa (κ)   95% Confidence interval
                                                 Lower bound   Upper bound                 Lower bound   Upper bound
Validation
  Degeneration    0.9                            0.85          0.94            0.65        0.39          0.9
  Herniation      –                              –             –               0.94        0.87          1
  Stenosis        0.98                           0.97          0.98            0.94        0.90          0.98
Testing
  Degeneration    0.46                           0.36          0.56            0.81        0.75          0.88
  Herniation      –                              –             –               0.95        0.87          1.03
  Stenosis        0.95                           0.91          0.97            0.85        0.77          0.93

4. Discussion
Three complete procedures to evaluate lumbar MRI for disc degeneration, disc herniation and spinal stenosis are proposed, which are highly precise and give reproducible results.
Previous studies for the diagnosis of degenerative intervertebral disc disease were only able to classify the discs as normal or degenerated [15]. The present work, however, also differentiated among the Pfirrmann degeneration grades usually found in adults, based on conventional T2-weighted MRI. Alternative approaches to disc degeneration quantification include diverse MRI sequences and the apparent diffusion coefficient [9–11]. These quantitative approaches provide supplementary information to the conventional T2-weighted images, regarding both the biochemical composition and structural integrity of the intervertebral discs, but they require specific image protocols with longer acquisition times, which preclude their application in clinical routine [7,12]. Previous approaches based on textural descriptors could not distinguish between grades IV and V [12]. Similar problems were found comparing apparent diffusion coefficient (ADC) and quantitative T2 mapping images [10]. The latter was more accurate but, even with manual delineation of discs, could not differentiate grades IV and V [9]. Those results make measurement of disc space narrowing, an ancillary finding in grade V, mandatory. This has been accomplished with the current method, based on segmentation and skeletonization of the disc. Although segmentation requires the user's interaction, only a seed point is necessary; from it, the whole procedure to distinguish between nucleus and annulus, which is based on intensity, as well as the measurement of the disc height, is fully automatic.
Our method for detection and characterization of disc herniation has shown moderate sensitivity and high specificity, in the range of recent studies correlating MRI with clinical examination [44]. Our yield was similar to the single classifier type described by Koh et al. [8]. However, an overall better sensitivity and specificity was obtained by developing an ensemble classifier with four different algorithms [8]. Some approaches [19,20] do not give numeric values of sensitivity, specificity or accuracy. Tsai et al. [20] presented the results as excellent (75%) and good (25%). In addition, these results were obtained from limited patient data (16 patients). The statistical analysis of Mayerhoefer et al. [19] was focused on identifying the three textural and three geometric descriptors that were best suited for distinguishing among normal, bulging or herniated discs. Better values of sensitivity and specificity were obtained in the study developed by Alomari et al. using a Bayesian-based classifier [18]. However, they were only able to classify the discs as normal or abnormal. Caveats exist in the diagnosis of lumbar spinal pathology by MRI compared to radiologists as ground truth because inter-reader agreement is moderate, irrespective of the nomenclature or the radiologists' experience [32].
Present accuracy of spinal stenosis was similar to the only study which used surgery as the reference standard, depicting lower sensitivity (70% vs 87%) but higher specificity (81.7% vs 75%) [45]. Substantial agreement was found, better than poor findings in clinical practice among radiologists [32]. This reinforces a clear need for quantitative measurement for spinal stenosis. In a recent systematic review, only four out of 63 studies reported quantitative measures relating to spinal stenosis with patients symptoms [2]. Among these quantitative techniques, computer-assisted segmentations on MRI were even scarcer [22]. Most recent approaches have been assessed on axial images analyzed by an artificial neural network [22]. Although their accuracy ranged from 92–96% there is no data regarding reproducibility in literature [2]. Current study has some strengths. To the best of our knowledge there is no full CAD system able to detect and also quantify the spinal canal narrowing. It obtained highly reproducible results quantifying spinal stenosis, being robust against different sagittal spinal curvatures and could be extended to other spinal segments. It uses accepted standard nomenclature and definitions for disc contour abnormalities (see [6,28]) and indeed classification is based on the standard protocol and in the slice thickness values habitually used in lumbar spine MRI [32]. This study also has limitations. Firstly, although excellent segmentations were obtained, small abnormalities (o15 mm2) may not be detected, considering the disc as normal. Due to the large prevalence of asymptomatic disc contour abnormalities, it is probably not a clinical relevant issue [46–48] and, in addition, although small abnormalities may not be detected, if there is central stenosis caused by this abnormality, it will be detected. Secondly, it was hampered by absent bulging disc detection. 
Our algorithm regards a normal intervertebral disc as an ellipse, and detection of herniated discs is based on contour abnormalities relative to that geometry. A bulging disc involves more than 50% of the perimeter of the disc (180°) and still resembles an ellipse, so with this geometric approach such abnormalities could remain undetected. However, bulging does not necessarily indicate a pathological process [28,32]. Lastly, technical validation to identify the source of discogenic pain and to correlate findings with outcome is needed, as are other advanced quantitative methods for degenerated discs [7].
As shown, the main advantages of future CAD algorithms are the combined detection and quantification of findings wherever human agreement is poor, as happens with other spinal stenosis measures [1]. These results support enhanced reproducibility of MRI reports, since in clinical practice only moderate agreement among radiologists can realistically be expected. The algorithms proposed in this work can be translated relatively quickly into clinical routine and should be taken into account when compared to reference standards.
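The limitation regarding bulging discs can be illustrated with a small Python sketch. It is not the implementation used in this work (which relies on a gradient vector flow contour); it assumes, purely for illustration, a disc contour centred at the origin with axis-aligned axes, fits an ellipse by algebraic least squares, and flags contour points lying outside the fitted ellipse by more than a tolerance. All function names, the 5% tolerance, the synthetic contours and the modelling of bulging as a uniformly enlarged ellipse are assumptions.

```python
import math


def fit_axis_aligned_ellipse(points):
    """Least-squares fit of u*x^2 + v*y^2 = 1 (centre at origin, assumed)."""
    sxx = sum(x ** 4 for x, y in points)
    sxy = sum(x * x * y * y for x, y in points)
    syy = sum(y ** 4 for x, y in points)
    bx = sum(x * x for x, y in points)
    by = sum(y * y for x, y in points)
    det = sxx * syy - sxy * sxy          # 2x2 normal equations, Cramer's rule
    u = (bx * syy - by * sxy) / det
    v = (by * sxx - bx * sxy) / det
    return u, v


def abnormal_fraction(points, tol=0.05):
    """Fraction of contour points more than `tol` outside the fitted ellipse."""
    u, v = fit_axis_aligned_ellipse(points)
    flagged = sum(1 for x, y in points
                  if math.sqrt(u * x * x + v * y * y) - 1.0 > tol)
    return flagged / len(points)


def contour(a, b, bump=0.0, centre_deg=270, half_width_deg=10):
    """Synthetic contour: ellipse (a, b) with an optional focal outward bump."""
    pts = []
    for deg in range(360):
        t = math.radians(deg)
        scale = 1.0
        offset = deg - centre_deg
        if bump and abs(offset) <= half_width_deg:
            # Smooth protrusion that falls to zero at the edges of the bump.
            scale += bump * math.cos(math.pi * offset / (2 * half_width_deg)) ** 2
        pts.append((scale * a * math.cos(t), scale * b * math.sin(t)))
    return pts


normal = abnormal_fraction(contour(25, 18))
bulging = abnormal_fraction(contour(27, 20))          # larger, still elliptical
focal = abnormal_fraction(contour(25, 18, bump=0.25)) # focal protrusion
print(normal, bulging, focal)
```

In this toy setting the focal protrusion produces a non-zero flagged fraction, whereas the uniformly enlarged contour is absorbed by the fitted ellipse and yields none, which mirrors why a purely ellipse-based criterion can miss bulging.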
Informed consent
Informed consent was obtained from all patients for inclusion in the study. The identity of the subjects under study is not revealed.
Conflict of interest statement
All authors declare that they have no conflict of interest to disclose.
Acknowledgments
This work was supported by the Spanish Ministerio de Economía y Competitividad (MINECO) and by FEDER funds under Grant TEC2012-33778. The authors thank Dr. F.M. Kovacs, Director of the Kovacs Foundation, as well as the Kovacs Foundation and the Spanish Back Pain Research Network (REIDE), for their support and for the patient dataset.
References
[1] H.S. An, P.A. Anderson, V.M. Haughton, J.C. Iatridis, et al., Introduction: disc degeneration: summary, Spine (Phila Pa 1976) 29 (2004) 2677–2678. http://dx.doi.org/10.1097/01.brs.0000147573.88916.c6.
[2] J. Steurer, S. Roner, R. Gnannt, J. Hodler, Quantitative radiologic criteria for the diagnosis of lumbar spinal stenosis: a systematic literature review, BMC Musculoskelet. Disord. 12 (2011) 175. http://dx.doi.org/10.1186/1471-2474-12-175.
[3] H.N. Herkowitz, The Lumbar Spine, third ed., Lippincott Williams & Wilkins, Philadelphia, 2004.
[4] J.N. Katz, M. Dalgas, G. Stucki, S.J. Lipson, Diagnosis of lumbar spinal stenosis, Rheum. Dis. Clin. N. Am. 20 (1994) 471–483.
[5] J.P. Jenkins, D.S. Hickey, X.P. Zhu, M. Machin, I. Isherwood, MR imaging of the intervertebral disc: a quantitative study, Br. J. Radiol. 58 (1985) 705–709.
[6] C.W. Pfirrmann, A. Metzdorf, M. Zanetti, J. Hodler, N. Boos, Magnetic resonance classification of lumbar intervertebral disc degeneration, Spine (Phila Pa 1976) 26 (2001) 1873–1878.
[7] M. Brayda-Bruno, M. Tibiletti, K. Ito, J. Fairbank, F. Galbusera, et al., Advances in the diagnosis of degenerated lumbar discs and their possible clinical application, Eur. Spine J. 23 (2014) S315–S323. http://dx.doi.org/10.1007/s00586-013-2960-9.
[8] J. Koh, V. Chaudhary, G. Dhillon, Disc herniation diagnosis in MRI using a CAD framework and a two-level classifier, Int. J. Comput. Assist. Radiol. Surg. 7 (2012) 861–869. http://dx.doi.org/10.1007/s11548-012-0674-9.
[9] D. Stelzeneder, G.H. Welsch, B.K. Kovács, S. Goed, T. Paternostro-Sluga, et al., Quantitative T2 evaluation at 3.0T compared to morphological grading of the lumbar intervertebral disc: a standardized evaluation approach in patients with low back pain, Eur. J. Radiol. 81 (2012) 324–330. http://dx.doi.org/10.1016/j.ejrad.2010.12.093.
[10] G. Niu, J. Yang, R. Wang, S. Dang, E.X. Wu, et al., MR imaging assessment of lumbar intervertebral disk degeneration and age-related changes: apparent diffusion coefficient versus T2 quantitation, Am. J. Neuroradiol. 32 (2011) 1617–1623. http://dx.doi.org/10.3174/ajnr.A2556.
[11] G.H. Welsch, S. Trattnig, T. Paternostro-Sluga, K. Bohndorf, S. Goed, et al., Parametric T2 and T2* mapping techniques to visualize intervertebral disc degeneration in patients with low back pain: initial results on the clinical use of 3.0 T MRI, Skelet. Radiol. 40 (2011) 543–551. http://dx.doi.org/10.1007/s00256-010-1036-8.
[12] S. Michopoulou, L. Costaridou, M. Vlychou, R. Speller, A. Todd-Pokropek, Texture-based quantification of lumbar intervertebral disc degeneration from conventional T2-weighted MRI, Acta Radiol. 52 (2011) 91–98. http://dx.doi.org/10.1258/ar.2010.100166.
[13] A. Neubert, J. Fripp, C. Engstrom, R. Schwarz, L. Lauer, et al., Automated detection, 3D segmentation and analysis of high resolution spine MR images using statistical shape models, Phys. Med. Biol. 57 (2012) 8357–8376. http://dx.doi.org/10.1088/0031-9155/57/24/8357.
[14] B.P. Bechara, S.K. Leckie, B.W. Bowman, C.E. Davies, B.I. Woods, et al., Application of a semiautomated contour segmentation tool to identify the intervertebral nucleus pulposus in MR images, Am. J. Neuroradiol. 31 (2010) 1640–1644. http://dx.doi.org/10.3174/ajnr.A2162.
[15] A.B. Oktay, N.B. Albayrak, Y.S. Akgul, Computer aided diagnosis of degenerative intervertebral disc diseases from lumbar MR images, Comput. Med. Imaging Graph. 38 (2014) 613–619. http://dx.doi.org/10.1016/j.compmedimag.2014.04.006.
[16] S. Michopoulou, Image analysis for the diagnosis of MR images of the lumbar spine, Doctoral thesis, University College London, 2011 〈http://discovery.ucl.ac.uk/1317776/〉 (accessed 02.03.15).
[17] R.I. Riesenburger, M.G. Safain, R. Ogbuji, J. Hayes, S.W. Hwang, A novel classification system of lumbar disc degeneration, J. Clin. Neurosci. 22 (2015) 346–351. http://dx.doi.org/10.1016/j.jocn.2014.05.052.
[18] R.S. Alomari, J.J. Corso, V. Chaudhary, G. Dhillon, Toward a clinical lumbar CAD: herniation diagnosis, Int. J. Comput. Assist. Radiol. Surg. 6 (2011) 119–126. http://dx.doi.org/10.1007/s11548-010-0487-7.
[19] M.E. Mayerhoefer, D. Stelzeneder, W. Bachbauer, G.H. Welsch, T.C. Mamisch, et al., Quantitative analysis of lumbar intervertebral disc abnormalities at 3.0 T: value of T2 texture features and geometric parameters, NMR Biomed. 25 (2012) 866–872. http://dx.doi.org/10.1002/nbm.1803.
[20] M.D. Tsai, S. Bin Jou, M.S. Hsieh, A new method for lumbar herniated intervertebral disc diagnosis based on image analysis of transverse sections, Comput. Med. Imaging Graph. 26 (2002) 369–380. http://dx.doi.org/10.1016/S0895-6111(02)00033-2.
[21] R.S. Alomari, J.J. Corso, V. Chaudhary, G. Dhillon, Computer-aided diagnosis of lumbar disc pathology from clinical lower spine MRI, Int. J. Comput. Assist. Radiol. Surg. 5 (2010) 287–293. http://dx.doi.org/10.1007/s11548-009-0396-9.
[22] S. Koompairojn, K. Hua, K.A. Hua, J. Srisomboon, Computer-Aided Diagnosis of Lumbar Stenosis Conditions, in: Proceedings of the Medical Imaging 2010: Computer-Aided Diagnosis, 2010, p. 76241C. http://dx.doi.org/10.1117/12.844545.
[23] F. Jäger, J. Hornegger, S. Schwab, R. Janka, Computer-aided assessment of anomalies in the scoliotic spine in 3-D MRI images, Med. Image Comput. Assist. Interv. 12 (2009) 819–826.
[24] J. Koh, Lumbar spinal stenosis CAD from clinical MRM and MRI based on inter- and intra-context features with a two-level classifier, in: Proceedings of the Medical Imaging 2011: Computer-Aided Diagnosis, 2011, p. 796304.
[25] J.M. Ho, P.J. Ben-Galim, B.K. Weiner, L.E. Karbach, C.A. Reitman, et al., Toward the establishment of optimal computed tomographic parameters for the assessment of lumbar spinal fusion, Spine J. 11 (2011) 636–640. http://dx.doi.org/10.1016/j.spinee.2011.04.027.
[26] N. Attias, A. Hayman, J.A. Hipp, P. Noble, S.I. Esses, Assessment of magnetic resonance imaging in the diagnosis of lumbar spine foraminal stenosis—a surgeon's perspective, J. Spinal Disord. Tech. 19 (2006) 249–256. http://dx.doi.org/10.1097/01.bsd.0000203942.81050.c8.
[27] J. Swets, Measuring the accuracy of diagnostic systems, Science 240 (1988) 1285–1293. http://dx.doi.org/10.1126/science.3287615.
[28] D.F. Fardon, Nomenclature and classification of lumbar disc pathology, Spine (Phila Pa 1976) 26 (2001) 461–462. http://dx.doi.org/10.1097/00007632-200103010-00007.
[29] A. Kettler, H.J. Wilke, Review of existing grading systems for cervical or lumbar disc and facet joint degeneration, Eur. Spine J. 15 (2006) 705–718. http://dx.doi.org/10.1007/s00586-005-0954-y.
[30] D.L. Sackett, Evidence base of clinical diagnosis: the architecture of diagnostic research, BMJ 324 (2002) 539–541. http://dx.doi.org/10.1136/bmj.324.7336.539.
[31] E. Alpaydin, Introduction to Machine Learning, The MIT Press, Cambridge, MA, 2004.
[32] E. Arana, A. Royuela, F.M. Kovacs, A. Estremera, H. Sarasíbar, et al., Lumbar spine: agreement in the interpretation of 1.5-T MR images by using the Nordic Modic Consensus Group classification form, Radiology 254 (2010) 809–817. http://dx.doi.org/10.1148/radiol.09090706.
[33] R.S. Alomari, J.J. Corso, V. Chaudhary, Labeling of lumbar discs using both pixel- and object-level features with a two-level probabilistic model, IEEE Trans. Med. Imaging 30 (2011) 1–10. http://dx.doi.org/10.1109/TMI.2010.2047403.
[34] N. Otsu, A threshold selection method from gray-level histograms, IEEE Trans. Syst. Man Cybern. 20 (1979) 62–66.
[35] J. Canny, A computational approach to edge detection, IEEE Trans. Pattern Anal. Mach. Intell. 8 (1986) 679–698. http://dx.doi.org/10.1109/TPAMI.1986.4767851.
[36] C. Xu, J.L. Prince, Snakes, shapes, and gradient vector flow, IEEE Trans. Image Process. 7 (1998) 359–369. http://dx.doi.org/10.1109/83.661186.
[37] X. Bai, L.J. Latecki, W.Y. Liu, Skeleton pruning by contour partitioning with discrete curve evolution, IEEE Trans. Pattern Anal. Mach. Intell. 29 (2007) 449–462. http://dx.doi.org/10.1109/TPAMI.2007.59.
[38] N. Roberts, C. Gratin, G.H. Whitehouse, MRI analysis of lumbar intervertebral disc height in young and older populations, J. Magn. Reson. Imaging 7 (1997) 880–886. http://dx.doi.org/10.1002/jmri.1880070517.
[39] P.K. Saha, Novel theory and methods for tensor scale: a local morphometric parameter, in: Proceedings of the Medical Imaging 2003: Image Process, 2003, pp. 743–753. http://dx.doi.org/10.1117/12.480645.
[40] F. Zheng, J.C. Farmer, H.S. Sandhu, P.F. O'Leary, A novel method for the quantitative evaluation of lumbar spinal stenosis, HSS J. 2 (2006) 136–140. http://dx.doi.org/10.1007/s11420-006-9006-3.
[41] J. Fleiss, J. Cohen, The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability, Educ. Psychol. Meas. 33 (1973) 613–619.
[42] J.R. Landis, G.G. Koch, The measurement of observer agreement for categorical data, Biometrics 33 (1977) 159–174. http://dx.doi.org/10.2307/2529310.
[43] M. Wood, Statistical inference using bootstrap confidence intervals, Significance 1 (2004) 180–182. http://dx.doi.org/10.1111/j.1740-9713.2004.00067.
[44] N.H. Al Nezari, A.G. Schneiders, P.A. Hendrick, Neurological examination of the peripheral nervous system to diagnose lumbar spinal disc herniation with suspected radiculopathy: a systematic review and meta-analysis, Spine J. 13 (2013) 657–674. http://dx.doi.org/10.1016/j.spinee.2013.02.007.
[45] R.J. Bischoff, R.P. Rodriguez, K. Gupta, A. Righi, J.E. Dalton, et al., A comparison of computed tomography-myelography, magnetic resonance imaging, and myelography in the diagnosis of herniated nucleus pulposus and spinal stenosis, J. Spinal Disord. 6 (1993) 289–295. http://dx.doi.org/10.1097/00002517-199306040-00002.
[46] S.J. Kim, T.H. Lee, S.M. Lim, Prevalence of disc degeneration in asymptomatic Korean subjects. Part 1: lumbar spine, J. Korean Neurosurg. Soc. 53 (2013) 31–38. http://dx.doi.org/10.3340/jkns.2013.53.1.31.
[47] W. Brinjikji, P.H. Luetmer, B. Comstock, B.W. Bresnahan, L.E. Chen, et al., Systematic literature review of imaging features of spinal degeneration in asymptomatic populations, Am. J. Neuroradiol. (2014), http://dx.doi.org/10.3174/ajnr.A4173.
[48] F.M. Kovacs, E. Arana, A. Royuela, A. Estremera, G. Amengual, et al., Vertebral endplate changes are not associated with chronic low back pain among Southern European subjects: a case control study, Am. J. Neuroradiol. 33 (2012) 1519–1524. http://dx.doi.org/10.3174/ajnr.A3087.