ORIGINAL ARTICLE

An open-source framework of neural networks for diagnosis of coronary artery disease from myocardial perfusion SPECT

Levent A. Guner, MD, MS,a Nese Ilgin Karabacak, MD,a Ozgur U. Akdemir, MD,a Pinar Senkul Karagoz, PhD,b Sinan A. Kocaman, MD,c Atiye Cengel, MD,c and Mustafa Unlu, MDa

Background. The purpose of this study is to develop and analyze an open-source artificial intelligence program built on artificial neural networks that can participate in and support the decision making of nuclear medicine physicians in detecting coronary artery disease from myocardial perfusion SPECT (MPS).

Methods and Results. Two hundred and forty-three patients, who had MPS and coronary angiography within three months, were selected to train neural networks. Six nuclear medicine residents, one experienced nuclear medicine physician, and the neural networks evaluated images of 65 patients for the presence of coronary artery stenosis. The area under the curve (AUC) of receiver operating characteristic analysis for the networks and the expert was .74 and .84, respectively. The AUC of the other physicians ranged from .67 to .80. There were no significant differences between the expert, the neural networks, and the standard quantitative values, summed stress score and total stress defect extent.

Conclusions. The open-source neural networks developed in this study may provide a framework for further testing, development, and integration of artificial intelligence into the nuclear cardiology environment. (J Nucl Cardiol 2010)

Key Words: Ischemia · myocardial · myocardial perfusion imaging: SPECT · basic science
INTRODUCTION

An artificial neural network (ANN) is computer software that mimics a biological neural network. It is composed of a set of processing units called nodes. The nodes are grouped in layers, and each node in a layer is connected to the nodes in the previous and next layers. Connection weights determine how strongly each node affects the nodes it is connected to (i.e., when a sufficiently strong input is received, like neurotransmitters, the receiving node fires).
From the Department of Nuclear Medicine,a Gazi University School of Medicine, Besevler, Ankara, Turkey; Department of Computer Engineering,b Middle East Technical University (ODTU), Ankara, Turkey; and Department of Cardiology,c Gazi University School of Medicine, Besevler, Ankara, Turkey. Received for publication Dec 6, 2009; final revision accepted Feb 11, 2010. Reprint requests: Levent A. Guner, MD, MS, Department of Nuclear Medicine, Gazi University School of Medicine, Besevler, Ankara, Turkey; [email protected], [email protected]. 1071-3581/$34.00. Copyright © 2010 by the American Society of Nuclear Cardiology. doi:10.1007/s12350-010-9207-5
In this way an ANN builds its own memory within its structure composed of nodes, connections, and associated weights.1 Diagnosis from medical images is usually a pattern recognition task, and ANNs are well suited for pattern recognition. While it is not possible to recognize all possible patterns, an ANN may complement physicians by recognizing some patterns that they miss, improving overall accuracy. There has been a great deal of interest in the effects of computer aid on physician performance in the field of radiology,2,3 but so far there have been only a few attempts to examine the impact of ANNs on the decision process in the domain of nuclear medicine.4 In our study, we developed a standalone artificial intelligence (AI) program based on artificial neural networks, analyzed its performance, and determined whether we could improve physician performance in diagnosing significant coronary artery stenosis from myocardial perfusion SPECT (MPS). Some commercial systems that analyze MPS have been developed previously.5,6 In this study, we developed and provide the full source of our experiments and of the final software so that interested researchers can freely build on or use parts of it (accessible at http://sourceforge.net/projects/cadnetworks/).
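The node-and-weight mechanics described at the beginning of this section can be made concrete with a short code sketch. The following Java fragment is purely illustrative and is not taken from the framework's source: each node computes a weighted sum of the previous layer's outputs plus a bias and "fires" through a sigmoid activation.

```java
// Minimal illustration of one fully connected layer of an ANN (not from the
// framework's source): each node forms a weighted sum of the inputs it is
// connected to and passes it through a sigmoid "firing" function.
public final class DenseLayer {
    private final double[][] weights; // weights[node][input]
    private final double[] bias;      // one bias per node

    public DenseLayer(double[][] weights, double[] bias) {
        this.weights = weights;
        this.bias = bias;
    }

    public double[] forward(double[] inputs) {
        double[] out = new double[weights.length];
        for (int n = 0; n < weights.length; n++) {
            double sum = bias[n];
            for (int i = 0; i < inputs.length; i++) {
                sum += weights[n][i] * inputs[i];  // connection weight scales each input
            }
            out[n] = 1.0 / (1.0 + Math.exp(-sum)); // sigmoid activation of the node
        }
        return out;
    }
}
```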
MATERIALS AND METHODS
Selection of Patients

We selected 690 patients who were referred to our department for MPS followed by coronary angiography between January 2004 and March 2008. We then refined the list by excluding patients with more than 3 months between SPECT and angiography, evidence of myocardial infarction, a history of coronary artery bypass operation, a myocardial bridge, or borderline ("limit") lesions for which further coronary reserve measurements were recommended. All of the SPECT data were evaluated by a nuclear medicine physician for image artifacts. After this refinement, the final list contained 308 patients. This study was retrospective and was performed in accordance with the regulations of the local Ethics Committee.
Imaging Protocols

All imaging protocols were performed in concordance with the relevant procedure guidelines. Patient preparation included overnight fasting and avoiding beta-blockers for 48 h and calcium-channel blockers for 24 h. Patients were imaged in the supine position. The radiopharmaceutical was 99mTc-MIBI (methoxyisobutylisonitrile) for 30 patients (9.7%) and 201Tl for the remaining 278 patients (90.3%). Patients imaged with 201Tl underwent stress imaging first with 3 mCi of 201Tl; after 4 h, rest-redistribution imaging was performed. 99mTc-MIBI imaging was done according to a single-day rest-stress protocol, with a 10-12 mCi 99mTc-MIBI injection for rest and 30-35 mCi for stress imaging; 45-60 minutes elapsed between 99mTc-MIBI injection and imaging. The stress test was performed with dobutamine for 34 patients (11%) and treadmill exercise for 274 patients (89%); one patient exercised with the modified Bruce protocol and the rest with the Bruce protocol. Dobutamine stress testing was performed with an intravenous infusion of 10 µg kg-1 min-1, increased every 3 minutes up to a maximal infusion rate of 40 µg kg-1 min-1 and a maximal total duration of 12 minutes. The radiopharmaceutical was injected when patients achieved 85% of the maximal age-predicted heart rate or at the end of the maximal duration of infusion. Exercise and pharmacologic stress tests were symptom limited when necessary.
Nuclear Imaging

Images were acquired with a GE Optima (General Electric Medical Systems; Milwaukee, WI) dual-head cardiac gamma camera, with low-energy general-purpose collimators for 201Tl and low-energy high-resolution collimators for 99mTc-MIBI, starting from right anterior oblique at 45 degrees to left posterior oblique over 180 degrees in a circular orbit. The energy window was centered at 72 keV ± 10% for 201Tl and 140 keV ± 10% for 99mTc-MIBI. Matrix size was 64 × 64. For 201Tl imaging, the acquisition had 32 projections of 40 seconds each; for 99mTc-MIBI imaging, 64 projections of 25 seconds each. Filtered back projection with decay correction was used for reconstruction. Motion correction was applied when necessary.
Angiography

An experienced interventional cardiologist who was blinded to the study examined the coronary angiograms. Standard selective coronary angiography, with at least four views of the left coronary system and two views of the right coronary artery, was performed using the Judkins technique. Percentage luminal diameter stenosis was recorded for the coronary arteries.
Image Processing

Acquired images were processed on a GE Entegra workstation (version 2.0315) with ECToolbox software (GE version, Emory University). Raw polar maps of the stress and rest examinations were obtained in DICOM P10 format at 64 × 64 matrix resolution using the open-source dcm4che software (downloadable from http://www.dcm4che.org). Rest and stress polar maps were normalized by finding the point with the maximum count in the stress map, drawing a 64-pixel square around this point in both the stress and rest maps, and then calculating the normalization factor between the maps from the median count values in these squares. Difference polar maps were calculated from the stress and rest maps. In order to limit the input size, two-dimensional Fourier transformations of the stress and difference polar maps were computed using the FFTW library.7 From each polar map image, 12 frequency components (magnitude and phase) and a brightness value were recorded and normalized to the [-1, 1] interval (average 0, range 2). The patients were randomly split into three groups, 60% training (n = 182), 20% validation (n = 61), and 20% test (n = 65), using a random sequence generator (accessible at http://www.random.org).
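A minimal sketch of the normalization step described above is given below. It assumes the "64 pixel square" denotes an 8 × 8 (64-pixel) window centred on the stress maximum; the class and method names are illustrative and are not part of the published framework's API.

```java
import java.util.Arrays;

// Sketch of polar-map normalization: locate the stress maximum, compare the
// median counts of an 8 x 8 window around it in the stress and rest maps,
// scale the rest map accordingly, and return the stress-minus-rest difference map.
public final class PolarMapNormalizer {

    public static double[][] differenceMap(double[][] stress, double[][] rest) {
        int n = stress.length;                           // 64 x 64 polar maps
        int maxR = 0, maxC = 0;                          // maximum-count pixel of stress map
        for (int r = 0; r < n; r++)
            for (int c = 0; c < n; c++)
                if (stress[r][c] > stress[maxR][maxC]) { maxR = r; maxC = c; }

        double factor = median(window(stress, maxR, maxC, 8))
                      / median(window(rest,   maxR, maxC, 8)); // normalization factor

        double[][] diff = new double[n][n];
        for (int r = 0; r < n; r++)
            for (int c = 0; c < n; c++)
                diff[r][c] = stress[r][c] - rest[r][c] * factor;
        return diff;
    }

    private static double[] window(double[][] map, int row, int col, int size) {
        int n = map.length, half = size / 2, k = 0;
        double[] values = new double[size * size];
        for (int r = row - half; r < row + half; r++)
            for (int c = col - half; c < col + half; c++) {
                int rr = Math.min(Math.max(r, 0), n - 1);  // clamp at map borders
                int cc = Math.min(Math.max(c, 0), n - 1);
                values[k++] = map[rr][cc];
            }
        return values;
    }

    private static double median(double[] v) {
        double[] s = v.clone();
        Arrays.sort(s);
        return s.length % 2 == 1 ? s[s.length / 2]
                                 : (s[s.length / 2 - 1] + s[s.length / 2]) / 2.0;
    }
}
```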
Training of Neural Networks

Networks were trained to recognize significant coronary artery stenosis, defined as ≥70% luminal diameter stenosis in any of the major coronary arteries or their major branches. The machine learning environment was RapidMiner.8 Neural networks were created using the Weka multilayer perceptron.9 The network structure includes an input layer of 50 nodes (for the stress and difference polar maps, 12 magnitude, 12 phase, and 1 brightness value each), one hidden layer of three to five nodes, and one output node. The networks that accept image data were named NN_img. Additionally, networks that use gender, image, and body-mass-index data (NN_sbmi) and networks with only gender and image information (NN_s) were trained. Parameters including learning rate [0.1-0.9], decay, number of iterations [500-2000], momentum, and number of hidden neurons [3-5] were varied with a parameter iterator. During training, the networks created were applied to the validation group. The best performing networks on the validation group were selected together with their optimum threshold points on their receiver operating characteristic (ROC) curves (i.e., the minimum number of false-negatives and false-positives).
This was done in order to achieve the best obtainable sensitivity and specificity from each network. Six of the best performing networks on the validation set were selected to form an ensemble. Each network output was subjected to its optimum threshold value to output either zero for exclusion or one for existence of coronary artery disease (CAD). The ensemble's final decision was obtained by arithmetic averaging.
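The training loop and the ensemble decision can be sketched directly against the Weka API. This is an illustration, not the actual experiment scripts: the study drove Weka through RapidMiner's parameter iterator, the parameter values below merely reflect the ranges listed above, and the index of the CAD class is an assumption.

```java
import weka.classifiers.functions.MultilayerPerceptron;
import weka.core.Instance;
import weka.core.Instances;

// Sketch of training one ensemble member and of the ensemble's thresholded vote.
public final class EnsembleTrainer {

    public static MultilayerPerceptron train(Instances train, double rate,
                                             int epochs, int hidden) throws Exception {
        MultilayerPerceptron mlp = new MultilayerPerceptron();
        mlp.setLearningRate(rate);                      // iterated over [0.1, 0.9]
        mlp.setMomentum(0.2);                           // also varied by the iterator
        mlp.setTrainingTime(epochs);                    // iterated over [500, 2000]
        mlp.setHiddenLayers(Integer.toString(hidden));  // one hidden layer, 3-5 nodes
        mlp.buildClassifier(train);                     // 50 image inputs -> CAD output
        return mlp;
    }

    /** Each member votes 0/1 through its own optimum ROC threshold; the ensemble
     *  is positive when enough members vote positive (e.g. at least 2 of 6). */
    public static boolean ensembleDecision(MultilayerPerceptron[] members,
                                           double[] thresholds, Instance patient,
                                           int minPositive) throws Exception {
        int votes = 0;
        for (int i = 0; i < members.length; i++) {
            // assumed: index 1 of the class distribution is the CAD class
            double pCad = members[i].distributionForInstance(patient)[1];
            if (pCad >= thresholds[i]) votes++;
        }
        return votes >= minPositive;
    }
}
```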
SPECT Evaluation and Human-Computer Interaction

One expert reader with 10 years of experience and six nuclear medicine residents with two to four years of experience in nuclear cardiology took part in the study. Without knowledge of the networks' output, they evaluated the test group (n = 65) for the existence of CAD on a five-point scale (normal, probably normal, equivocal, probably abnormal, and abnormal). Four physicians re-evaluated the images with knowledge of the networks' output but without any information on the performance of the networks. In addition, we designed a hypothetical computer aid such that when the networks (NN_img) are confident in their decision (i.e., at least five of the six networks output the same result) and the physician is either uncertain or not completely certain, the physician's decision is changed in favor of the networks. Differences in performance were assessed with ROC curves using the method of Hanley and McNeil.10 Sensitivity, specificity, and accuracy values were compared using McNemar tests to reflect use in clinical practice. A physician decision was defined as positive when he/she was at least uncertain about CAD (i.e., ≥equivocal). Two thresholds, ≥70% and ≥50% stenosis, were used to test differences between network and human performance. Using the global recognition of CAD by the networks and the expert, the vascular distribution of ≥70% stenosis was analyzed in left anterior descending artery (LAD)-only and right coronary artery (RCA)/left circumflex artery (LCx)-only groups. Summed Stress Score (SSS) and Total Stress Defect Extent (TSE) values were obtained from the ECToolbox software, and ROC curves of SSS and TSE were generated. For the networks and the expert reader, a cross-table was built showing the distribution of classified patients into true-positive, true-negative, false-positive, and false-negative. The image data were transferred to a General Electric Xeleris workstation (version 2.1220) with ECToolbox software (version 3.05). On this version of ECToolbox, the expert system Perfex was executed for the test group and the results were recorded as levels of confidence.
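The hypothetical computer aid amounts to a simple decision rule. The sketch below reflects our reading of the description; the enum of the five-point scale and the method name are illustrative only.

```java
// Sketch of the hypothetical computer aid: when at least five of the six networks
// agree and the reader is not completely certain (anything other than "normal" or
// "abnormal"), the reader's call is replaced by the networks' call.
public final class HypotheticalAid {

    public enum Reading { NORMAL, PROBABLY_NORMAL, EQUIVOCAL, PROBABLY_ABNORMAL, ABNORMAL }

    /** @param positiveVotes number of the six networks voting positive for CAD
     *  @return final binary decision (true = positive for CAD) */
    public static boolean aidedDecision(Reading reader, int positiveVotes) {
        // >= equivocal counts as a positive physician decision
        boolean readerPositive = reader.ordinal() >= Reading.EQUIVOCAL.ordinal();
        // at least five of six networks output the same result
        boolean networksConfident = positiveVotes >= 5 || positiveVotes <= 1;
        boolean readerCertain = reader == Reading.NORMAL || reader == Reading.ABNORMAL;
        if (networksConfident && !readerCertain) {
            return positiveVotes >= 5;   // decision changed in favor of the networks
        }
        return readerPositive;           // otherwise keep the reader's own call
    }
}
```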
RESULTS

Characteristics of Patient Data

Table 1 summarizes the demographics of the patients used during neural network training and validation (n = 243) and of the test group (n = 65).
Table 1. Demographics of training, validation, and test patient groups

Demographic data | Training and validation group (n = 243) | Test group (n = 65) | P-value
Age | 57.5 ± 10.9 | 57.9 ± 12.2 | NS
Sex (female) | 91 (37.4%) | 23 (35.3%) | NS
Weight (kg) | 78.0 ± 11.9 | 78.6 ± 12.9 | NS
Height (cm) | 167.0 ± 8.7 | 166.3 ± 9.0 | NS
Body mass index | 28.1 ± 4.1 | 28.5 ± 4.8 | NS
Diabetes | 61 (25.1%) | 23 (35.4%) | NS
Hypertension | 134 (55.3%) | 40 (61.2%) | NS
Smoker | 97 (40.0%) | 22 (33.8%) | NS
Family history | 77 (31.4%) | 17 (25.8%) | NS
Hypercholesterolemia | 130 (53.6%) | 31 (48.3%) | NS
Pretest symptom | 186 (76.6%) | 45 (69.3%) | NS
% Achieved heart rate | 93.3% ± 12.0% | 94.8% ± 8.8% | NS
Achieved level of exercise for treadmill stress (METS) | 9.8 ± 4.6 | 8.8 ± 2.4 | NS
Radiopharmaceutical 99mTc-MIBI/201Tl | 22 (9%)/221 (91%) | 8 (12.3%)/57 (87.7%) | NS
Treadmill stress test | 215 (88.5%) | 59 (90.8%) | NS
Pharmacological stress test | 28 (11.5%) | 6 (9.2%) | NS
Significant CADa | 89 (36.6%) | 24 (36.9%) | NS

a ≥70% luminal diameter stenosis is considered significant.
Table 2. Vascular distribution of training-validation and test groups

 | Training and validation (n = 243) | Test (n = 65) | P-value
Significant CADa | 89 (36.6%) | 24 (36.9%) | NS
Single vessel disease | 53 (21.8%) | 9 (13.8%) | NS
Double vessel disease | 23 (9.5%) | 9 (13.8%) | NS
Triple vessel disease | 13 (5.3%) | 6 (9.2%) | NS
LAD | 54 (22.2%) | 16 (24.6%) | NS
LCx | 35 (14.4%) | 13 (20%) | NS
RCA | 49 (20.1%) | 16 (24.6%) | NS

a ≥70% luminal diameter stenosis is considered significant.
In the test group, the average age of the patients was 57.9 ± 12.2 years. Twenty-three patients (35.3%) were female. Forty-five patients (69.3%) were symptomatic (typical or atypical angina, exertional dyspnea) before the stress test. Table 2 shows the vascular distribution of significant CAD in the same groups. Twenty-four patients (36.9%) had ≥70% stenosis, of which nine (13.8%) had single-vessel, nine (13.8%) double-vessel, and six (9.2%) triple-vessel disease. Twenty-eight patients (43.1%) had ≥50% stenosis. No variable differed significantly between the groups.
Performance of Neural Networks

The areas under the curve (AUC) of ROC analysis of the networks for detecting ≥70% stenosis were 0.74 (NN_img), 0.65 (NN_s), and 0.72 (NN_sbmi); the differences were not significant. For ≥50% stenosis the values were 0.75, 0.65, and 0.69 (NN_img 0.75 vs NN_s 0.65, P = .033). Since NN_img had the largest AUC, it was used for the remainder of the analyses. The sensitivity, specificity, and accuracy of NN_img were 71%, 68%, and 69% (≥70% stenosis, ensemble output threshold set at ≥2 networks positive for CAD).
Comparisons at ≥70% Luminal Diameter Stenosis

At the threshold of ≥70% stenosis, the expert reader's AUC was .84; his sensitivity, specificity, and accuracy were 83%, 65%, and 72%, respectively. The AUC of the networks was .74 (P = NS, expert vs networks). At 68% specificity, the sensitivity of the networks was 71%. The AUCs of the other physicians were .71, .67, .72, .68, .80, and .73. When compared with the expert, .67 vs .84 (P = .007) and .68 vs .84 (P = .03) were significant. The sensitivity, specificity, and accuracy of the inexperienced readers were on average 70% (50-83%), 63% (54-75%), and 66% (61-71%).

Comparisons at ≥50% Luminal Diameter Stenosis

The expert reader's AUC, sensitivity, specificity, and accuracy were .84, 82%, 70%, and 75%, respectively. The AUC of the networks was .75 (P = NS, expert vs networks). At a fixed specificity of 70%, the sensitivity of the networks was 68%. The sensitivity, specificity, and accuracy of the inexperienced readers were on average 66% (50-75%), 65% (51-78%), and 65% (58-69%). No statistically significant differences between the networks and any of the physicians were detected at the 50% or 70% threshold.

Subgroup Analysis Based on Vascular Distribution

In the test group (n = 65) there were five patients with LAD disease without LCx or RCA disease and eight patients with LCx or RCA disease without LAD disease. In the LAD-only subgroup the networks detected 3/5 patients (60%) while the expert detected 5/5 patients (100%). In the LCx-RCA-only subgroup the networks detected 6/8 patients (75%) and the expert detected 5/8 patients (62%) (P = NS for both subgroups between networks and expert).

Subgroup Analysis Based on Extent of Disease

The SSS of true-positive classifications of the networks and the expert were 18.8 ± 7.1 and 18.1 ± 7.1, and the SSS of true-negative classifications were 8.0 ± 4.6 and 8.4 ± 4.8, respectively. No significant difference was detected between the subgroups (Table 3).

Subgroup Analysis Based on Number of Territories Involved

The networks detected 5/9 patients with single-vascular disease (56%), 6/9 patients with double-vascular disease (67%), and 6/6 patients with triple-vascular disease (100%). The detection rates for the expert were 6/9 patients with single-vascular disease (67%), 8/9 patients with double-vascular disease (89%), and 6/6 patients with triple-vascular disease (100%).
Table 3. Subgroup analysis of neural networks and expert based on summed stress score and total stress defect extent (n = 65)

 | SSS, NN_img | SSS, Expert reader | P-value | TSE, NN_img | TSE, Expert reader | P-value
True positive | 18.8 ± 7.1 (n = 17) | 18.1 ± 7.1 (n = 20) | NS | 26.6 ± 13.6 (n = 17) | 24.5 ± 13.9 (n = 20) | NS
False positive | 12.5 ± 4.2 (n = 13) | 11.4 ± 4.6 (n = 14) | NS | 18.2 ± 12.6 (n = 13) | 14.1 ± 13.3 (n = 14) | NS
True negative | 8.0 ± 4.6 (n = 28) | 8.4 ± 4.8 (n = 27) | NS | 9.3 ± 11.6 (n = 28) | 11.1 ± 12.1 (n = 27) | NS
False negative | 11.1 ± 5.6 (n = 7) | 8.5 ± 2.4 (n = 4) | NS | 10.1 ± 6.7 (n = 7) | 8.2 ± 2.2 (n = 4) | NS

Table 4. Inter-rater agreement between neural networks and expert*

 | NN_imga: No CAD | NN_imga: CAD
Expert readerb: No CAD | 24 (3)c | 7 (1)
Expert readerb: CAD | 11 (4) | 23 (16)

*kappa = .448. a Classified as ≥2 networks reporting positive for CAD. b Classified as ≥equivocal for CAD. c The number in parentheses is the number of patients having ≥70% luminal diameter stenosis in any of the coronary arteries on angiography.

Concordance with Expert Reader

The inter-rater agreement kappa value was .448 (Table 4). The networks and the expert reader agreed on 24 patients as negative, and 87.5% (21 of 24) of these patients did not have CAD. They agreed on 23 patients as positive for CAD, of which 69.6% (16 of 23) had significant CAD.

Physicians and Neural Networks

When the physician decisions were combined with the networks, sensitivity, specificity, and accuracy were on average 72%, 76%, and 74%. Individually, the changes were statistically significant for two physicians: the second reader's accuracy increased from 66% to 77% (P = .04) and the third reader's specificity increased from 63% to 80% (P = .01). Four physicians re-evaluated the images with information on the networks' output. On average, sensitivity, specificity, and accuracy increased from 70% to 74%, 65% to 69%, and 67% to 70%, respectively. Individually, no statistically significant improvements were detected.

SSS and TSE for Detecting CAD
Average SSS of the patients negative for CAD was 9.4 ± 4.9 and positive for CAD was 16.5 ± 7.5 (P < .0001). Average TSE of the patients negative for CAD was 12.1 ± 12.5 and positive for CAD was 21.8 ± 14.1 (P = .0054). The AUC for SSS and TSE was .78 and .74, respectively. There were no statistically significant differences between the AUCs of SSS, TSE, the expert, and the networks.
Comparison with Expert System

When the confidence levels in the Perfex output were plotted (83% sensitivity and 73% specificity), the AUC was .69. Perfex achieved its highest sensitivity of 96% (23/24) when any abnormal interpretation was taken as positive for CAD (i.e., the equivocal, possible, probable, and is-abnormal states). The maximum attainable specificity of Perfex was 46% (19/41) regardless of the threshold selected. In comparison with the networks, the sensitivity and specificity differences were significant (96% vs 71%, P = .0313; 46% vs 68%, P = .049).

DISCUSSION

The training-validation and test groups were similar in their demographics and in CAD prevalence, which validates the random separation of the groups. In order to examine the diagnostic pattern of the networks, analyses based on extent of disease, number and location of vascular territories involved, and different percentage luminal stenosis thresholds, as well as an analysis of concordance with the expert reader, were performed. SSS and TSE are frequently used for standard quantification of MPS images and are available from commonly used nuclear cardiology software.11-16 Wolak et al recently analyzed three commonly used software packages.17 They found sensitivities between 80% and 87% and specificities between 49% and 71% (SSS ≥ 4). Although the difference in radiopharmaceutical agents (99mTc-MIBI and 201Tl) prevents us from making direct comparisons with that study, the expert reader and the networks are similar in their performance when compared with these results. When we compared the SSS and TSE values obtained in this study with the expert and the networks, we found no statistically significant difference in detecting CAD. Table 3 shows the distribution of SSS and TSE among the classifications of the expert and the networks. The networks classified 35 cases as negative (SSS 8.6 ± 4.9) and 30 cases as positive (SSS 16.0 ± 6.7). The classifications seemed to follow the hypoperfusion detected by the quantitative values even for false decisions (SSS of false-positives 12.5 vs all negatives 8.6, P = .0158). This distribution shows that the networks classified patients in parallel to the severity and extent of hypoperfusion. The absence of any significant difference in SSS and TSE between the expert and the networks suggests that the networks capture the functional information in the images. When the global CAD detection rates of the networks and the expert were compared in the LAD and RCA-LCx subgroups, no significant differences were found.
However, the networks were not trained for individual territories, and the numbers in each subgroup are too limited for a local analysis. Changing the threshold of stenosis from 70% to 50% did not create a significant change in the performance of either the physicians or the networks. In addition, at both thresholds the networks were indistinguishable from the physicians in terms of AUC, sensitivity, and specificity. For the networks and the expert reader, the detection rates paralleled the number of vascular territories involved. Both the networks and the physician correctly diagnosed all six patients with triple-vascular disease; the expert reader detected three more cases in the single- and double-vascular groups. Concordance between the networks and the expert reader was moderate (kappa = .448); they agreed for 47/65 patients (72.3%). While the expert's positive predictive value (PPV) was 58.8%, the PPV of a joint positive decision was 69.6%. Conversely, the expert's negative predictive value (NPV) was 87.1% and the NPV of a joint negative decision was 87.5%. In the disagreements, when only the expert's decision was positive, the PPV dropped to 36.4%. When further analyzed, the SSS of the first disagreement group (expert positive, networks negative) was 10.6 ± 5.2 and of the second group (expert negative, networks positive) 10.9 ± 4.0 (P = NS between the two groups). The similarity of the SSS values in the disagreement groups may suggest that the behavior of the networks follows the expert reader; however, the expert reader correctly classified more of these patients as positive. The TSE value for the first disagreement group was slightly lower than for the second disagreement group (9% ± 6.5% vs 14.3% ± 7.0%, P = NS). The networks may have emphasized extent more than the expert did, or the networks may not have detected some of the small hypoperfused areas. When there is a disagreement in which the human reader is positive and the networks are negative, a consultation with another expert may be requested. Joint positive and negative decisions have stronger PPV and NPV and thus may require no further consultation with another physician. In our test group, there were four patients with stenosis ≥50% but <70%. Interestingly, three of these patients were in the disagreement groups and one was in the joint positive group; two were in the first and one in the second disagreement group. All of these patients were correctly classified by either the expert or the networks, which may imply that any disagreement with the networks deserves a secondary opinion from another experienced physician. One purpose of this study was to test the benefits of using networks as an aid to physicians. There were net increases in average physician performance in sensitivity, specificity, and accuracy, with individually statistically significant increases in two cases.
Even though we detected some statistically significant differences between physicians before computer aid, there were no statistically significant differences after (not shown). Similarly, Lindahl et al detected a decrease in interobserver variability when physicians used network outputs as an advisory.4 In practice, physicians may need more time to adapt to using such aids and may want to know the strengths and weaknesses of the aid beforehand. Due to the study design, physicians were blind to the performance results of the networks.

Previous Work

Pattern recognition learned through experience is an important component of the interpretation of myocardial perfusion scintigraphy. In 1992, using polar maps, Fujita et al found that networks performed comparably to physicians.18 Later, Porenta et al developed networks that detect CAD from planar myocardial scintigraphies; with specificity fixed at 90%, the trained networks had 51% sensitivity against 72% for an expert.19 In our study, the highest ROC area belonged to the nuclear medicine physician who had the most experience in nuclear cardiology (AUC .84 vs .74, expert vs NN_img, P = NS). When the comparison was made in terms of sensitivity, specificity, and accuracy, the ensemble was statistically indistinguishable from the expert (71% vs 83% sensitivity, 68% vs 65% specificity, 69% vs 72% accuracy, NN_img vs expert, P = NS). This shows that the pattern recognition ability of neural networks can be trained at least to the level of several years of experience in the field. There are confounding variables, such as gender and body habitus, that affect the acquired images, and physicians consider these data when they interpret images.20 In our study, unexpectedly, gender data (NN_s) made a negative impact in terms of ROC analysis. Adding body mass index improved the performance of the networks, but not beyond that of the networks using only images. Similarly, Lindahl et al fed networks with gender data and detected no improvement (AUC 0.92 gender with image vs 0.94 image).21 Scott et al detected improved accuracy with the inclusion of exercise test data alongside image data (82% vs 68%, P < .05).22 Tagil et al did not find improvement in network performance with gender-specific or tracer-specific networks.23 Haraldsson et al did not detect improvement with exercise data in addition to images (AUC .78 vs .77, P = NS).24 The literature tends to confirm the hypothesis that the images contain much of the information needed to diagnose significant CAD correctly. In order to achieve good generalization, the number of inputs to the networks should be limited in comparison with the number of training cases. In some studies, pixel-averaging values were used as inputs to train ANNs.25
Lindahl et al converted polar map images into Fourier components.26 In terms of CAD diagnosis, their networks had higher sensitivity than visual interpretation in the LAD territory (77% vs 73%, P = .038), but not in other territories. In this study, we chose the same method of reducing the images, but the generation and selection of Fourier components from the polar maps were different. Ensembles of neural networks provide a smoother response and stabilize the results. Furthermore, even without changing internal parameters, each network can have different characteristics depending on its starting weights. Lomsky et al found that an ensemble of networks was better at simulating physician interpretations than commonly used quantification algorithms (at 90% sensitivity, 85% vs 46% specificity, P < .001).27 In our study, internal network parameters were iterated within a predefined range, because their optimum values cannot be determined beforehand. Among the thousands of networks created during training, the best performing ones were selected to form an ensemble, in an attempt to obtain a stable network structure in which each member has unique features. ANNs are not the only machine learning tools on which researchers have focused. Expert systems and case-based reasoning (CBR) are also actively investigated.28-32 Khorsand et al developed a CBR system and, at 80% specificity, achieved 67% sensitivity compared with 65% for automated scoring and 74% for visual interpretation.29 Garcia et al developed a knowledge-based expert system (Perfex) and found statistically significant differences in the detection of CAD.6,33 This expert system is integrated into later versions of the ECToolbox software. One advantage of neural networks is the ability to use raw images as inputs, which provides an unbiased representation of what a physician actually sees. In addition, neural networks can accept Fourier components as inputs and can map complex functions. An expert system, on the other hand, can explain the pathway to its decision; indeed, this may be important for the acceptance of a tool into clinical practice. The possibilities are not limited to a single learner. Research shows that hybrid learners are more successful than learners built on a single algorithm.34 The success of combinations of machine learning algorithms may support our hypothesis that, rather than comparisons, cooperation between physicians and computers may be more beneficial, which is the main concept of computer-aided diagnosis. What differentiates our software from the valuable work in the literature is threefold. First, the case data will be made available upon individual request, except for commercial use. Second, the availability of the software source code and experiments will enable researchers to build on this framework for further development and to test its clinical value in their own departments.
Third, development in the Java environment enables multiplatform usage on Windows, Linux, or Mac.

Study Limitations

Clinical application programs require 510(k) FDA clearance in the USA, and use for clinical purposes may carry liability issues; other nations may have further regulatory rules regarding such programs. The software developed in this study does not have FDA clearance. Therefore, we encourage that use of the software presented here be limited to a preliminary opinion prompting further consultation with another physician, and to research purposes. The test group size (n = 65) was selected with regard to the total number of available cases (n = 308). There are three main methods of estimating the generalization error: split-sample validation, cross-validation, and bootstrapping. Even though the test group size is relatively limited, we selected split-sample validation, as it lets us directly compare the trained networks to human readers and it is intuitively easier to compare with other quantitative outputs (SSS, TSE, etc.). Furthermore, the completely random three-way split that we used means that no network was trained on, or selected according to, its test group performance. This provides an unbiased estimate of the generalization error. We used coronary angiography as the gold standard, which may not reflect the actual physiological information obtained from MPS. However, coronary angiography provides an independent reference for comparing the neural networks to the physicians, and we performed the concordance analysis with the expert reader to address this limitation. The selection of patients was retrospective, from patients who had undergone coronary angiography, so there will obviously be verification bias toward lower specificity. There are methods to correct this bias mathematically, but the aim of this study was not to accurately predict the performance of MPS. A relatively high number of cases in this study had normal angiograms; this may have resulted from the frequency and coexistence of multiple pretest risk factors and stress-test findings. The selection of six networks to form an ensemble was decided based on observation of the individual performance of each trained network and the total response time. We used SPECT data obtained from our department database. We intentionally did not include any patients with previous myocardial infarction, and it is best to avoid using the networks created in this study to interpret such patients.
Similarly, separate networks for individual vascular territories can be trained; in this study, global CAD recognition was preferred in order to increase the number of cases available for proper training of the networks. Because of software and technical limitations, we based our development on the GE Entegra workstation and ECToolbox software. The geometrical modeling and sampling of the myocardium differ among nuclear cardiology programs.11,16,35 Although we used raw polar maps, slightly different results may be obtained with other cardiology software.
CONCLUSION

In this study, we have developed and provided a detailed analysis of open-source artificial intelligence software based on neural networks that can diagnose CAD from myocardial perfusion SPECT. Researchers can use this framework for further testing, development, and integration into the clinical environment.

Acknowledgments

We would like to thank the residents of the Department of Nuclear Medicine, Dr Unal, Dr Cakir, Dr Sucak, Dr Doksoz, and Dr Sahiner, who participated in the experiments of this study.
References

1. Cross SS, Harrison RF, Kennedy RL. Introduction to neural networks. Lancet 1995;346:1075-9.
2. Awai K, Murao K, Ozawa A, Nakayama Y, Nakaura T, Liu D, et al. Pulmonary nodules: Estimation of malignancy at thin-section helical CT: Effect of computer-aided diagnosis on performance of radiologists. Radiology 2006;239:276-84.
3. Petrick N, Haider M, Summers RM, Yeshwant SC, Brown L, Iuliano EM, et al. CT colonography with computer-aided detection as a second reader: Observer performance study. Radiology 2008;246:148-56.
4. Lindahl D, Lanke J, Lundin A, Palmer J, Edenbrandt L. Improved classifications of myocardial bull's-eye scintigrams with computer-based decision support system. J Nucl Med 1999;40:96-101.
5. Ohlsson M. WeAidU: A decision support system for myocardial perfusion images using artificial neural networks. Artif Intell Med 2004;30:49-60.
6. Garcia EV, Cooke CD, Folks RD, Santana CA, Krawczynska EG, De Braal L, et al. Diagnostic performance of an expert system for the interpretation of myocardial perfusion SPECT studies. J Nucl Med 2001;42:1185-91.
7. Frigo M, Johnson SG. The design and implementation of FFTW3. Proc IEEE 2005;93:216-31.
8. Mierswa I, Wurst M. YALE: Rapid prototyping for complex data mining tasks. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2006.
9. Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten I. The WEKA data mining software. ACM SIGKDD Explor Newsl 2009;11:10-8.
10. Hanley JA, McNeil BJ. A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology 1983;148:839-43.
11. Germano G, Kavanagh PB, Waechter P, Areeda J, Van Kriekinge S, Sharir T, et al. A new algorithm for the quantitation of myocardial perfusion SPECT. I: Technical principles and reproducibility. J Nucl Med 2000;41:712-9.
12. Slomka PJ, Nishina H, Berman DS, Akincioglu C, Abidov A, Friedman JD, et al. Automated quantification of myocardial perfusion SPECT using simplified normal limits. J Nucl Cardiol 2005;12:66-77.
13. Garcia EV, Faber TL, Cooke CD, Folks RD, Chen J, Santana C. The increasing role of quantification in clinical nuclear cardiology: The Emory approach. J Nucl Cardiol 2007;14:420-32.
14. Garcia EV, DePuey EG, DePasquale EE. Quantitative planar and tomographic thallium-201 myocardial perfusion imaging. Cardiovasc Intervent Radiol 1987;10:374-83.
15. Ficaro EP, Kritzman JN, Corbett JR. Development and clinical validation of normal Tc-99m sestamibi database: Comparison of 3D-MSPECT to CEqual. J Nucl Med 1999;40:506.
16. Kritzman JN, Ficaro EP, Liu YH, Wackers FJT, Corbett JR. Evaluation of 3-D MSPECT for quantification of Tc-99m sestamibi defect size. J Nucl Med 1999;40:817.
17. Wolak A, Slomka PJ, Fish MB, Lorenzo S, Acampa W, Berman DS, et al. Quantitative myocardial-perfusion SPECT: Comparison of three state-of-the-art software packages. J Nucl Cardiol 2008;15:27-34.
18. Fujita H, Katafuchi T, Uehara T, Nishimura T. Application of artificial neural network to computer-aided diagnosis of coronary artery disease in myocardial SPECT bull's-eye images. J Nucl Med 1992;33:272-6.
19. Porenta G, Dorffner G, Kundrat S, Petta P, Duitschedlmayer J, Sochor H. Automated interpretation of planar thallium-201 dipyridamole stress-redistribution scintigrams using artificial neural networks. J Nucl Med 1994;35:2041-7.
20. Toft J, Hesse B, Rabol A, Carstensen S, Ali S. Myocardial sestamibi single-photon emission tomography: Variations in reference values with gender, age and rest versus stress? Eur J Nucl Med 1997;24:409-14.
21. Lindahl D, Toft J, Hesse B, Palmer J, Ali S, Lundin A, et al. Scandinavian test of artificial neural network for classification of myocardial perfusion images. Clin Physiol 2000;20:253-61.
22. Scott JA, Aziz K, Yasuda T, Gewirtz H. Integration of clinical and imaging data to predict the presence of coronary artery disease with the use of neural networks. Coron Artery Dis 2004;15:427-34.
23. Tagil K, Underwood SR, Davies G, Latus KA, Ohlsson M, Gotborg CW, et al. Patient gender and radiopharmaceutical tracer is of minor importance for the interpretation of myocardial perfusion images using an artificial neural network. Clin Physiol Funct Imaging 2006;26:146-50.
24. Haraldsson H, Ohlsson M, Edenbrandt L. Value of exercise data for the interpretation of myocardial perfusion SPECT. J Nucl Cardiol 2002;9:169-73.
25. Allison JS, Heo JY, Iskandrian AE. Artificial neural network modeling of stress single-photon emission computed tomographic imaging for detecting extensive coronary artery disease. Am J Cardiol 2005;95:178-81.
26. Lindahl D, Palmer J, Ohlsson M, Peterson C, Lundin A, Edenbrandt L. Automated interpretation of myocardial SPECT perfusion images using artificial neural networks. J Nucl Med 1997;38:1870-5.
27. Lomsky M, Gjertsson P, Johansson L, Richter J, Ohlsson M, Tout D, et al. Evaluation of a decision support system for interpretation of myocardial perfusion gated SPECT. Eur J Nucl Med Mol Imaging 2008;35:1523-9.
28. Kurgan LA, Cios KJ, Tadeusiewicz R, Ogiela M, Goodenday LS. Knowledge discovery approach to automated cardiac SPECT diagnosis. Artif Intell Med 2001;23:149-69.
29. Khorsand A, Haddad M, Graf S, Moertl D, Sochor H, Porenta G. Automated assessment of dipyridamole 201Tl myocardial SPECT perfusion scintigraphy by case-based reasoning. J Nucl Med 2001;42:189-93.
30. Cios KJ, Teresinska A, Konieczna S, Potocka J, Sharma S. A knowledge discovery approach to diagnosing myocardial perfusion. IEEE Eng Med Biol Mag 2000;19:17-25.
31. Haddad M, Adlassnig KP, Porenta G. Feasibility analysis of a case-based reasoning system for automated detection of coronary heart disease from myocardial scintigrams. Artif Intell Med 1997;9:61-78.
32. Khorsand A, Graf S, Sochor H, Schuster E, Porenta G. Automated assessment of myocardial SPECT perfusion scintigraphy: A comparison of different approaches of case-based reasoning. Artif Intell Med 2007;40:103-13.
33. Ezquerra N, Mullick R, Cooke CD, Krawczynska EG, Garcia EV. PERFEX: An expert system for interpreting 3D myocardial perfusion. Expert Syst Appl 1993;6:459-68.
34. Bell RM, Koren Y. Lessons from the Netflix Prize challenge. SIGKDD Explor Newsl 2007;9:75-9.
35. Maddahi J, Van Train K, Prigent F, Garcia EV, Friedman J, Ostrzega E, et al. Quantitative single photon emission computed thallium-201 tomography for detection and localization of coronary artery disease: Optimization and prospective validation of a new technique. J Am Coll Cardiol 1989;14:1689-99.