
G Model ARTMED-1291; No. of Pages 9

ARTICLE IN PRESS Artificial Intelligence in Medicine xxx (2013) xxx–xxx

Contents lists available at SciVerse ScienceDirect

Artificial Intelligence in Medicine journal homepage: www.elsevier.com/locate/aiim

An approach for Ewing test selection to support the clinical assessment of cardiac autonomic neuropathy

Andrew Stranieri a, Jemal Abawajy b, Andrei Kelarev b,∗, Shamsul Huda a, Morshed Chowdhury b, Herbert F. Jelinek c

a School of Science, Information Technology and Engineering, University of Ballarat, P.O. Box 663, Ballarat, Victoria 3353, Australia
b School of Information Technology, Deakin University, 221 Burwood Highway, Victoria 3125, Australia
c School of Community Health, Charles Sturt University, P.O. Box 789, Albury, New South Wales 2640, Australia

Article info

Article history:
Received 27 August 2012
Received in revised form 23 February 2013
Accepted 25 April 2013

MSC: 68T05, 68T10

Keywords: Optimal sequence of tests; Accuracy of classification; Decision trees; Ewing features; Cardiac autonomic neuropathy; Diabetes patients

Abstract

Objective: This article addresses the problem of determining optimal sequences of tests for the clinical assessment of cardiac autonomic neuropathy (CAN). We investigate the accuracy of using only one of the recommended Ewing tests to classify CAN and the additional accuracy obtained by adding the remaining tests of the Ewing battery. This is important because not all five Ewing tests can be applied in every situation in practice.

Methods and material: We used a new and unique database of the diabetes screening research initiative project, which is more than ten times larger than the data set used by Ewing in his original investigation of CAN. We utilized decision trees and the optimal decision path finder (ODPF) procedure for identifying optimal sequences of tests.

Results: We present experimental results on the accuracy of using each one of the recommended Ewing tests to classify CAN and the additional accuracy that can be achieved by adding the remaining tests of the Ewing battery. We found the best sequences of tests for the cost-function equal to the number of tests. The accuracies achieved by the initial segments of the optimal sequences (after 1, 2, 3 and 4 tests) are 80.80, 91.33, 93.97 and 94.14 for 2 categories of CAN; 79.86, 89.29, 91.16 and 91.76 for 3 categories; and 78.90, 86.21, 88.15 and 88.93 for 4 categories. They show significant improvement compared to the sequence considered previously in the literature and to the mathematical expectations of the accuracies of a random sequence of tests. The complete outcomes obtained for all subsets of the Ewing features are required for determining optimal sequences of tests for any cost-function with the use of the ODPF procedure. We have also found the two most significant additional features that can increase the accuracy when some of the Ewing attributes cannot be obtained.

Conclusions: The outcomes obtained can be used to determine the optimal sequences of tests for each individual cost-function by following the ODPF procedure. The results show that the best single Ewing test for diagnosing CAN is the deep breathing heart rate variation test. Optimal sequences found for the cost-function equal to the number of tests guarantee that the best accuracy is achieved after any number of tests and provide an improvement in comparison with the previous ordering of tests or a random sequence.

© 2013 Elsevier B.V. All rights reserved.

∗ Corresponding author. Tel.: +61 3 5330 2379; fax: +61 3 5227 1722. E-mail addresses: [email protected] (A. Stranieri), [email protected] (J. Abawajy), [email protected] (A. Kelarev), [email protected] (S. Huda), [email protected] (M. Chowdhury), [email protected] (H.F. Jelinek).

0933-3657/$ – see front matter © 2013 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.artmed.2013.04.007

1. Introduction

The problem of finding an optimal sequence of tests in diverse knowledge domains has been addressed by various authors, including Chi et al. [1] for classification or diagnosis, Thompson [2] for determining the sequence of tests that maximizes the predictive accuracy of a disease diagnosis, and Oddi and Cesta [3] for scheduling tasks to manage medical resources. Artificial intelligence (AI) methods were applied to planning and scheduling of tests for a number of diseases in [4–6]. Scheduling tests is also a well-known topic outside medicine, including vehicle fault diagnosis [7] and other domains [8].

This is the first article devoted to a systematic investigation of sequences of the Ewing tests required for the identification of cardiac autonomic neuropathy (CAN). We use data mining methods to find optimal sequences of tests for the clinical risk assessment of CAN, so that the predictive ability of the tests becomes as high as possible as quickly as possible. We also find the most effective additional tests to supplement the Ewing battery and increase the predictive accuracy in situations where some Ewing tests cannot be performed. Since not all five tests recommended by Ewing can be applied in every situation in practice, we determine the accuracy of using each one of the recommended Ewing tests to classify CAN and the additional accuracy obtained by adding the remaining tests recommended by Ewing, as well as additional supplementary tests.

Complete results are obtained here for all subsets of the Ewing features. These results are required for determining the optimal sequences of tests for any individual cost-function. In addition, visual representations are proposed that may simplify the use of the experimental results in practice. For any given cost-function of performing the tests, the outcomes of our experiments can be used to determine sequences of tests such that the predictive ability of every initial segment of the sequence, corresponding to each fixed value of the cost-function, is the best possible.

Data mining, as a part of knowledge discovery from databases, can be used to identify novel, valid and useful patterns in data and has been applied extensively to data from the medical domain, for example, in [9–15]. CAN is a complication of diabetes that involves damage to the autonomic nerve fibres that innervate the heart and blood vessels. The resulting abnormalities in heart rate control and vascular dynamics are thought to account for many deaths [16]. The Ewing battery of tests [17–19] is used clinically to assess a patient's risk of CAN.

There are five tests in the Ewing battery: changes in heart rate associated with (1) lying to standing, (2) deep breathing and (3) attempted exhalation against a closed airway (Valsalva manoeuvre), and changes in blood pressure associated with (4) hand grip and (5) lying to standing. Ewing and Clarke [18] recommended performing all five tests for the diagnosis of CAN. They considered only one sequence of tests and did not provide any data comparing this sequence with other possible sequences, noting only that in general the question of finding the best sequence is a difficult one.

However, the tests take time and not all of them can be completed for every patient. For instance, the hand grip test may not be performed due to arthritis. The lying to standing tests often cannot be performed due to mobility challenges, and some patients have conditions where forceful breathing is contra-indicated. Further, clinicians sometimes have an idiosyncratic preference for one test or sequence of tests over another [20,21]. Ewing [22] writes that the question of finding an optimal sequence of tests remains a difficult open question that requires further investigation. This is also confirmed by the more recent work [23–27]. Although the time to perform all five tests, at around 20–30 min, is not long, this is sometimes difficult to achieve in the context of a busy practice.

These issues result in CAN risk assessments being made in practice on the basis of only a subset of the Ewing tests. This is why it is important to obtain data on the performance of decision trees for various subsets of the Ewing features. This problem is solved completely in the present article: our experiments assess the performance of decision trees for all subsets of the Ewing battery of tests. We use these outcomes to determine the best sequence of Ewing tests for the clinical assessment of CAN. Although we assume that each Ewing test is equally costly, the determination of the optimal sequence of tests using an individual cost-function for each test can be carried out using our tables. The main benefit inherent in the use of decision trees for the identification of an optimal sequence of tests is that decision trees are simple to generate and easily understood by clinicians.

The paper is organized as follows. The next section elaborates on the dataset and pre-processing deployed for this study. Following that, Section 3 is devoted to the methods investigated in the present paper. Section 4 contains the experimental results and discussions. A summary of conclusions is presented in Section 5.

2. Cardiac autonomic neuropathy

We used a new and unique database of the diabetes screening research initiative (DiScRi) project [16], which is more than ten times larger than the data set used by Ewing in his original investigation of CAN. DiScRi is a diabetes complications screening programme in Australia where members of the general public participate in a comprehensive health review consisting of tests including an electrocardiogram (ECG), the Ewing battery, retinal scans, and peripheral nerve function and blood supply assessments, for early detection and timely intervention in diabetes and cardiovascular disease. Data on over 200 variables from over two thousand attendances have been collected in recent years. The dataset has been used in data mining applications in [16,23,24,28,29].

The presence of CAN from DiScRi data was determined using the Ewing battery of tests. Several expert editing rules were used to reduce the number of missing values in the database. These rules were determined during discussions with the experts maintaining the database. Preprocessing of the data using these rules produced 1177 rows with complete values of all Ewing fields, which were used for the experimental evaluation of the performance of data mining algorithms.

The classification of disease progression associated with CAN is important, because it has implications for planning of timely treatment, which can lead to improved wellbeing of the patients and a reduction in morbidity and mortality associated with cardiac arrhythmias in diabetes. As indicated above, the most important tests required for a risk assessment of CAN rely on responses in heart rate and blood pressure to various activities, usually consisting of the tests described in [17–19]. In particular, Ewing and Clarke [18] recommended that the tests be performed in a specific sequence as follows:

(1) Heart rate response to the Valsalva manoeuvre (VAHR), where the patient exhales against 40 mmHg pressure while the heart rate is observed.
(2) Heart rate variation during deep breathing (DBHR), where the patient sits quietly and breathes deeply while an electrocardiogram records the heart rate variation over 6 breathing cycles.
(3) Blood pressure response to sustained hand-grip (HGBP), where the systolic blood pressure variation is recorded before and after a sustained hand grip.
(4) Heart rate response to moving from a lying to a standing position (LSHR), where the beat to beat (R–R) interval change in response to standing from a lying position is measured.
(5) Blood pressure response to moving from lying to standing (LSBP), where the blood pressure change in response to standing from a lying position is measured.

Table 1 contains the boundary points for each test derived in [17–19] from physiological evidence in association with in-field trials. These boundary values are also explained by Ewing et al. [19] in great detail. The categorical variables abnormal, borderline and normal are introduced in the Ewing and Clarke formulation for each test. The boundary points given in Table 1 may not necessarily be the optimal boundary points for distinguishing categories of CAN risk when a subset of Ewing tests is used. New boundary points may be identified by decision trees to maximize the predictive accuracy of CAN assessment with missing features.

Table 1
Ranges and boundary values determining categorical variables for the Ewing battery.

Test (value)        Normal    Borderline    Abnormal
VAHR (ratio)        ≥1.21     1.11–1.20     ≤1.10
DBHR (beats/min)    ≥15       11–14         ≤10
HGBP (mmHg)         ≥16       11–15         ≤10
LSHR (ratio)        ≥1.04     1.01–1.03     ≤1.00
LSBP (mmHg)         ≤10       11–29         ≥30

The DiScRi database contains a separate attribute LSBPneg that can take on the values FALSE and TRUE. If LSBPneg = TRUE and LSBP ≥ 30, then the result is abnormal. If LSBPneg = TRUE and 29 ≥ LSBP ≥ 11, then the result is borderline. In all other cases the result of this test is normal. Before applying the cut-offs to DiScRi data for DBHR, LSBP and HGBP, we round off fractional parts to the second decimal place. This means that we are a little more conservative. Likewise, for VAHR and LSHR we discard the third and further digits after the decimal point.

Ewing et al. [17,19] defined the five classes for a CAN risk assessment given in Table 2. Ewing et al. [19] considered alternative approaches to the classification of CAN and compared the categorization given in Table 2 with two scoring systems used by other researchers: (1) giving 0 for a normal test, 1/2 for a borderline result and 1 for an abnormal result, thus giving a score of 0–5 for each subject; and (2) the number of tests definitely abnormal, again giving a score of 0–5 for each subject. Ewing et al. [19] demonstrate that these scoring systems give roughly equivalent categorizations and seem to carry no real advantages.

Since there are very few atypical patients in the DiScRi database, we investigated three original classifications of cardiac autonomic neuropathy progression introduced by Ewing et al. [17,19]. They have 2, 3 and 4 classes, respectively. The first one divides all patients into two classes, allocating each patient either to the normal class or to the definite class. The second one divides all patients into three classes: normal, early and definite. The third classification divides all patients into four classes: normal, early, definite and severe.

3. Methods

The optimal decision path finder (ODPF) procedure for determining optimal sequences of tests was proposed and investigated in [1]. The main idea of the procedure is briefly illustrated in Fig. 1 for the purposes of our work. The ODPF uses a pre-specified threshold of confidence required for the diagnosis of a disease. The first test selected is the one that is most likely to lead to a threshold crossing for a diagnosis. The next test selected depends on the result of the first test. For instance, if the first test involves blood pressure, which is found to be 160/90, then the second test is the one that is most likely to cross the disease classification threshold given that blood pressure is 160/90.

Fig. 1. Optimal decision path finder procedure. [Flowchart: start with an empty set of tests; find a test that optimizes the next prediction; train classifiers on the outcomes of the selected tests; Confident? Yes: conclude a diagnosis; No: select a further test.]

Table 2
CAN classes defined by Ewing et al. [19].

Class       Test values
Normal      All tests normal or one borderline.
Early       One of the three heart rate tests abnormal or two borderline.
Definite    Two or more of the heart rate tests abnormal.
Severe      Two or more of the heart rate tests abnormal plus one or both of the blood pressure tests abnormal, or both borderline.
Atypical    Any other combination of abnormal tests.
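The boundary values of Table 1, the LSBPneg rule quoted in Section 2 and the class definitions of Table 2 can be combined into a small rule-based labeller. The following is only a minimal sketch under stated assumptions: the function names `categorize` and `can_class` are hypothetical, the rounding conventions described in Section 2 are omitted, measurements are assumed to be already rounded as in Table 1, and the wording of Table 2 is read in one plausible way (for example, "two borderline" is taken to mean any two borderline tests with no abnormal test).

```python
from typing import Dict

# Table 1 boundaries for tests where larger values are normal:
# (smallest normal value, largest abnormal value).
HIGHER_IS_NORMAL = {
    "VAHR": (1.21, 1.10),  # ratio
    "DBHR": (15, 10),      # beats/min
    "HGBP": (16, 10),      # mmHg
    "LSHR": (1.04, 1.00),  # ratio
}
HEART_RATE_TESTS = ("VAHR", "DBHR", "LSHR")
BLOOD_PRESSURE_TESTS = ("HGBP", "LSBP")


def categorize(test: str, value: float, lsbp_neg: bool = True) -> str:
    """Map one measurement to 'normal', 'borderline' or 'abnormal'."""
    if test == "LSBP":
        # Rule from the text: only readings flagged by LSBPneg can be
        # borderline or abnormal; everything else is normal.
        if lsbp_neg and value >= 30:
            return "abnormal"
        if lsbp_neg and value >= 11:
            return "borderline"
        return "normal"
    normal_min, abnormal_max = HIGHER_IS_NORMAL[test]
    if value >= normal_min:
        return "normal"
    if value <= abnormal_max:
        return "abnormal"
    return "borderline"


def can_class(cat: Dict[str, str]) -> str:
    """Assign a Table 2 class from the five per-test categories (one reading)."""
    hr_abn = sum(cat[t] == "abnormal" for t in HEART_RATE_TESTS)
    bp_abn = sum(cat[t] == "abnormal" for t in BLOOD_PRESSURE_TESTS)
    bp_bord = sum(cat[t] == "borderline" for t in BLOOD_PRESSURE_TESTS)
    n_bord = sum(v == "borderline" for v in cat.values())
    if hr_abn >= 2:
        return "severe" if (bp_abn >= 1 or bp_bord == 2) else "definite"
    if hr_abn + bp_abn == 0:
        if n_bord <= 1:
            return "normal"
        if n_bord == 2:
            return "early"
    if hr_abn == 1 and bp_abn == 0:
        return "early"
    return "atypical"


# Illustrative measurements only (not taken from the DiScRi data).
example = {"VAHR": 1.25, "DBHR": 9, "HGBP": 20, "LSHR": 1.05, "LSBP": 5}
categories = {t: categorize(t, v) for t, v in example.items()}
print(categories, can_class(categories))
```

A table-driven layout keeps the thresholds in one place, so a different reading of Table 2 or boundary points learned by a decision tree could be substituted without touching the control flow.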

The cost-functions used in the practical assessment of CAN include (1) the number of tests, (2) the time required to perform the tests and (3) the minimization of the individual difficulties faced by each patient in performing the tests, together with the minimization of the negative impact in those cases where one or more of the Ewing tests cannot be completed at all. This paper explores the use of decision trees to infer optimal sequences of tests given that some tests may not be able to be performed at all. The main claim made here is that the outcomes of training decision trees for all subsets of tests can be effectively applied to determine the optimal sequence of tests in the ODPF procedure.

Decision trees are among the most important algorithms for clinical applications. Their outcomes are easy to generate, interpret and understand. In addition, a relatively simple visual presentation can show the sequence in a manner that can be used in practice. In this paper, following [14,15], we report the performance of decision trees as one of the main data mining algorithms used in clinical applications involving heart disease. Decision tree induction implementations, advanced by Quinlan [30], are readily available. We used the Waikato Environment for Knowledge Analysis (WEKA) to test the J48 classifier [31]. Decision tree induction algorithms are also available in Rattle [32,33] and many other sources.

Lamb et al. [34] address the issue of selecting an optimal sequence of tests to predict the risk of falls for elderly women. They use classification tree algorithms to generate a decision tree that depicts a minimum number of tests or questions needed to make an accurate prediction of a falls risk. Zuzek et al. [35] note that the problem of identifying an optimal sequence of tests in order to reach a diagnosis at minimum cost has been studied by numerous authors in relation to mechanical or electrical fault diagnosis.

4. Experimental results and discussion

In this study, we used the J48 decision tree induction algorithm in WEKA [31] for each single Ewing test, then for each group of 2, 3 and 4 tests. Throughout all experiments, 10-fold cross validation was used to estimate predictive accuracy. The training was repeated for classifications with 2, 3 and 4 classes of Ewing outcomes. Following [14], we used accuracy as the main measure of performance of the decision trees, since it is essential for guiding clinicians in determining the best sequence of tests for a particular patient.
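The experimental design described above, one classifier per subset of tests, and the way a table of subset accuracies can then drive a test ordering for the cost-function equal to the number of tests, can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: the names `nonempty_subsets` and `greedy_sequence` are hypothetical, and `KNOWN_ACCURACY` contains only the 2-class accuracies that are printed numerically in Table 3. The accuracies of the other subsets appear only graphically in Figs. 2–5, so they are omitted here and candidates without a recorded accuracy are simply skipped, which a complete table would not require.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterator, List, Tuple

EWING_TESTS = ("LSHR", "DBHR", "VAHR", "HGBP", "LSBP")


def nonempty_subsets(tests=EWING_TESTS) -> Iterator[FrozenSet[str]]:
    """Yield every non-empty subset of the battery, smallest subsets first."""
    for size in range(1, len(tests) + 1):
        for combo in combinations(tests, size):
            yield frozenset(combo)


# 2-class accuracies copied from Table 3 (parts A and C); incomplete on purpose.
KNOWN_ACCURACY: Dict[FrozenSet[str], float] = {
    frozenset({"DBHR"}): 80.80,
    frozenset({"VAHR"}): 74.68,
    frozenset({"DBHR", "VAHR"}): 91.33,
    frozenset({"DBHR", "VAHR", "LSBP"}): 93.97,
    frozenset({"DBHR", "VAHR", "HGBP"}): 91.25,
    frozenset({"DBHR", "VAHR", "LSBP", "HGBP"}): 94.14,
    frozenset({"DBHR", "VAHR", "HGBP", "LSHR"}): 91.93,
    frozenset(EWING_TESTS): 100.0,
}


def greedy_sequence(accuracy: Dict[FrozenSet[str], float],
                    tests=EWING_TESTS,
                    target: float = float("inf")) -> List[Tuple[str, float]]:
    """Add, at each step, the test whose enlarged subset has the best accuracy.

    Stops early once `target` accuracy is reached (the 'Confident?' check).
    """
    done: FrozenSet[str] = frozenset()
    sequence: List[Tuple[str, float]] = []
    while len(done) < len(tests):
        candidates = [(accuracy[done | {t}], t) for t in tests
                      if t not in done and (done | {t}) in accuracy]
        if not candidates:  # no recorded accuracy for any extension
            break
        best_acc, best_test = max(candidates)
        sequence.append((best_test, best_acc))
        done = done | {best_test}
        if best_acc >= target:
            break
    return sequence


print(len(list(nonempty_subsets())))          # number of classifiers to train
print(greedy_sequence(KNOWN_ACCURACY))        # full ordering
print(greedy_sequence(KNOWN_ACCURACY, target=90.0))  # stop once 90 is reached
```

With the values shown, the ordering agrees with the 2-class sequence of Table 3(A). Other cost-functions would replace the `max` over accuracies with a criterion that also weighs the time or difficulty of each candidate test.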


Fig. 3. Accuracy of J48 for sets of two Ewing features. [Bar chart: accuracy (0–100) for each pair of Ewing features, with separate bars for 2, 3 and 4 classes.]

We carried out a systematic investigation of the Ewing battery of five tests for the diagnosis of CAN. The set of all five Ewing tests has 32 subsets. All of these subsets were included in our experiments. Figs. 2–5 present the outcomes of an exhaustive evaluation of performance of decision trees for the Ewing battery. Fig. 2 shows that the DBHR test is most effective among all Ewing tests. The accuracy achieved by this test alone may be sufficient, for example, for advice on lifestyle based strategies mitigating CAN. The DBHR is the best test that can be used as a single independent attribute for the identification of CAN. These outcomes are required for practical determination of such sequences following the ODPF methodology to minimize individual difficulties faced by each patient in performing the tests. We

LSHR HGBP LSBP

LSHR VAHR HGBP VAHR HGBP LSBP LSHR VAHR LSBP LSHR DBHR HGBP

4 classes

LSHR DBHR LSBP

3 classes

DBHR VAHR HGBP

2 classes

LSHR DBHR VAHR DBHR HGBP LSBP DBHR VAHR LSBP 0

20

40

60

80

100

Fig. 4. Accuracy of J48 for sets of three Ewing features.

85

90

95

100

Fig. 5. Accuracy of J48 for sets of four Ewing features.

Fig. 2. Accuracy of J48 for sets of one Ewing feature.

0

80

use these outcomes and the ODPF procedure to determine the best sequences of Ewing tests for the clinical assessment of CAN and cost-function equal to the number of tests (since the Ewing tests have approximately equal financial costs). These sequences are given in Table 3, which also contains the accuracies achieved after each initial segment of the optimal sequences for cost-function equal to the number of tests. Furthermore, we calculated the mathematical expectations, or expected values, of the accuracies that can be achieved by a randomly chosen sequence of tests, see Table 3. This is equivalent to running a large number of new tests for many sequences chosen uniformly at random and then averaging the outcomes. The mathematical expectation is equal to a weighted sum of the accuracies for all possible sequences of tests. Assuming that at each step the next test is chosen uniformly at random, the sum coincides with an average of the accuracies presented in Figs. 2–5. We include these mathematical expectations in Table 3. For comparison, it also contains the corresponding accuracies for the diagnosis of CAN with 2, 3 and 4 categories by each initial segment of the sequence recorded previously in [18]. Fig. 6 illustrates the optimal sequence of Ewing tests and predictive accuracies that can be achieved after each step for 2 classes of CAN and the number of tests as cost-function. Comparing parts (A), (B) and (C) of Table 3 it is easy to see that new sequences provide additional gain. 
Here we investigate not only the final predictive accuracy reached after performing all tests, Table 3 Comparing (A) accuracies achieved by all initial segments of the optimal sequences of Ewing tests obtained using ODPF procedure for the number of tests as costfunction; (B) the mathematical expectations of the accuracies achieved by initial segments of a random sequence; (C) accuracies achieved by all initial segments of the sequence of tests considered previously for the diagnosis of CAN with 2, 3 and 4 categories. (A) Optimal sequences of Ewing tests DBHR VAHR Sequence: 80.80 91.33 2 classes DBHR VAHR Sequence 79.86 89.29 3 classes Sequence DBHR VAHR 78.90 86.21 4 classes

LSBP 93.97 LSBP 91.16 LSBP 88.15

HGBP 94.14 LSHR 91.76 LSHR 88.93

(B) Expected accuracies of initial segments of a random sequence Sequence: 1 test 2 tests 3 tests 4 tests 69.18 81.51 88.60 91.52 2 classes 65.85 80.21 86.50 89.29 3 classes 63.15 77.04 83.61 86.14 4 classes

LSHR 100 HGBP 100 HGBP 100 5 tests 100.00 100.00 100.00

(C) Accuracies of initial segments of the sequence considered previously VAHR DBHR HGBP LSHR LSBP Sequence: 74.68 91.33 91.25 91.93 100 2 classes 70.18 89.29 89.72 89.80 100 3 classes 67.52 86.21 87.06 87.40 100 4 classes


ARTICLE IN PRESS



DBHR test Opmal sequence of Ewing tests for 2 classes of CAN 80.80

VAHR test

91.33

LSBP test

93.97

HGBP test

LSHR test

94.14

100

but are interested in the order of tests which provides the best possible accuracy after each number of tests performed during the process (for any value of the cost-function of the performing tests). Optimal sequences considered in our paper always remain optimal after each step: the very first test is chosen so that it provides the best accuracy among all single tests, the second test is then chosen so that the pair of the first two tests provides the best accuracy and so on. Therefore, after any number of steps the clinician can stop if the required level of accuracy has been reached and the preformed sequence of the first tests carried out up to this level will be an optimal one. This is the benefit of the new approach fine tuning previous work on tests for the diagnosis of CAN. Knowledge of such sequences allows the clinician to stop without completing the remaining tests when a satisfactory level of predictive accuracy has been reached. On the other hand, another important objective of a clinician is to minimize the time required to carry out the tests and the difficulties that individual patients may experience in determining the presence of CAN. Therefore there are alternative cost-functions to be taken into account on individual basis. This may lead to a different subset of tests that has to be used in a particular situation. The clinician has to make an assessment of the time required

BP LS

HG

BP

LS

H

R HR A V HR LS BP BP LS R LS HR H LS HR DB P VA HR SB L B D BP HR HR S A LS L V R HR BP SH P B L S D L SB R L R S H R L H LS LSH R AHR R H V H B D DB R R DBH DBH P LSB R VAHR H AHR V A V LSHR LSHR LSHR VAHR HR DBHR DB HR LS LSBP LSHR DBHR VAHR VAHR HGBP DBHR HGBP VAHR LSHR DBHR VAH H R GBP HGBP HG DBHR DBHR BP DB VAH VAHR LS HR R HR VAH DBH R VA R H GB HR VA HGB LS P HG P HR H R VAH HG BP R LS BP HG BP VA HR L H SH D R HG BHR VA R H LS BP LS R HR HG HR DB BP HG HR BP

Fig. 6. Optimal sequence of Ewing tests obtained using ODPF procedure and predictive accuracies after each step for two classes of CAN and the number of tests as cost-function.

5

R BH P D B HG HR DB BP BP LS P LS HR GB DB BP H BP HG BP LS BP HG BP HR HG BP LS LS R H BP LS LS BP HG R HR LS R LS LSH LSH P BP BP LSB HG HG R R P DBH DBH LSB HR BP S L LSBP DB LSHR LSHR LSHR LSBP DBHR DBHR HR LSHR VA HGBP LSBP LSHR DBHR DBHR HGBP LSHR HGBP DBHR LSBP DBHR LSHR HGBP HGBP HG B LSHR P LSHR LSH DB DB D R R B H H H R R VAH VA LSH R R HR D LS B B L LS HR SBP DB P B H P V LS R LS AH R H D BP B L R SB VA HR DB P VA HR H VA R L HR LS SBP DB HR H H R LS R VA BP LS HR BP

R BH D HR VA HR DB BP BP G LS P H HR B DB HR HG P B VA BP HR LS HG BP VA HR BP VA LS HR HG BP A V HG HR BP R VA LS R VAH AH BP V P HG BP LSB HG P P LSB GBP HGB H HR LS LSBP LSBP LSHR LSHR LSHR HGBP LSBP LSBP VAHR LSHR LSBP LSHR HGBP DBHR LSHR VAHR LSBP HG LS BP BP VAHR VAHR LSHR LSBP LSB LSBP P LSHR L LSHR S B LS P HR VAH VAHR R VA V A H L H S LS HR R HG R HG HR BP H G B L B SH P P D R VAH H B G HR R L BP H S GB V HR A LS P HG HR H VA R L BP H DB SB LS R H P H R LS R HG BP LS BP BP

R H LS HR DB HR LS HR BP G VA R H SHR H R L HR DB R H B R S D H AH L R V H DB HR HR DB A LS V HR BP HR S HG L DB HR HR VA R SHR LS L H LS R R P LSB DBH DBH P P B B R G H HG VAH BP BP LSBP S L HG VAHR VAHR VAHR LSBP HGBP HGBP R HR VA DBH VAHR LSBP HGBP HGBP LSBP DBHR LSBP LSHR HGBP VAHR HGBP DBHR LSBP LSBP LSB D BHR P D B HR HG DB BP HGB HGBP HR P VAH DBH VA R R HR LSB LS LSB BP LS DB P P HR BP VAH LS DB R DB BP HR LS BP VA HR D BH V HR H AH VA R HR DB GB R D B HR P HG HR VA BP HG HR BP

Fig. 7. Circle diagram.


ARTICLE IN PRESS



6

start

LSHR

DBHR

VAHR

HGBP

LSBP

LSHR DBHR

LSHR VAHR

LSHR HGBP

LSHR LSBP

DBHR VAHR

DBHR HGBP

DBHR LSBP

VAHR HGBP

VAHR LSBP

HGBP LSBP

LSHR DBHR VAHR

LSHR DBHR HGBP

LSHR DBHR LSBP

LSHR VAHR HGBP

LSHR VAHR LSBP

LSHR HGBP LSBP

DBHR VAHR HGBP

DBHR VAHR LSBP

DBHR HGBP LSBP

VAHR HGBP LSBP

LSHR DBHR VAHR HGBP

LSHR DBHR VAHR LSBP

LSHR DBHR HGBP LSBP

LSHR VAHR HGBP LSBP

DBHR VAHR HGBP LSBP

LSHR DBHR VAHR HGBP LSBP Fig. 8. Lattice diagram.

to undertake tests at the practice and the individual difficulties associated with each case. During the application of ODPF procedure explained in Fig. 1, a clinician could follow the resulting refinement of the protocol choosing tests so that every next test provides the greatest predictive accuracy for the resulting sequence according to the outcomes obtained in Figs. 2 to 5. These figures could be used to inform clinicians of the best remaining tests to perform for any cost-function and the ensuing predictive accuracies that can be achieved. Rather than embedding an optimal sequence of tests algorithm into a decision support system, we advocate the visualization of all possible Ewing test sequences in a diagram that depicts the accuracy gains in diverse test sequences so that a clinician can easily select the sequence of preference and be informed of the accuracy associated with the chosen sequence. Visualization is one of the most important methods in assisting clinical planning. Information visualization for medical applications has been considered, for example, in [37–39]. It has been applied to the design of plans [40]. Graph-based approaches were used for medical visualization, for instance, in [41]. Hierarchical visualization layouts were considered in [42]. Visualization for inference has been also investigated in [43,44]. For diabetes patients, it was treated in [45]. We include Figs. 7 and 8 representing compressed versions of two diagrams that illustrate visual aids and can be created to include the predictive accuracies of all test sequences facilitating the work of clinicians applying the ODPF procedure in practice. Complete versions for the use of practitioner would include the

prediction accuracies achieved at each step as well as average time required to perform the next test. The advantage of the circle diagram in Fig. 7 is that at every step of the ODPF process the current cell in the diagram keeps track not only of the final predictive accuracy achieved, but also of the whole sequence of previous tests in the order they were applied. The outcomes of our experiments can be used in conjunction with the visual representations or aids to apply the ODPF for the following three categories of cost-functions: (1) cost-function equal to the number of tests, (2) cost function equal to the time required to perform the tests and (3) cost-function expressing individual difficulties in performing the test for a particular patient or in a particular situation. For example, in applying the ODPF procedure with the diagnostic difficulty of tests as a costfunction, the clinician could use the diagrams represented in Fig. 7 and move from the centre of the diagram to the outside circle and choosing the easiest test with appropriate additional gain in predictive accuracy at each step of the process. This is similar to the application of ODPF procedure to determining optimal sequences for cost-function equal to the number of tests. When it is difficult for a patient to pass one of the standard Ewing battery tests, it may be possible to use the remaining tests to increase the combined predictive accuracy of classification. Determination of appropriate tests to be used for this has already been considered in the literature, see [46]. We used several feature selection methods to find a few most effective tests that can be combined with tests in Ewing battery. To rank features in the order of their significance we used three methods: gain ratio attribute evaluation, information gain attribute evaluation and classifier attribute evaluation. Gain ratio attribute


evaluation assesses the significance of each attribute by calculating its gain ratio using the formula

\[
\mathrm{GainR}(\mathrm{Class}, \mathrm{Attribute}) = \frac{H(\mathrm{Class}) - H(\mathrm{Class} \mid \mathrm{Attribute})}{H(\mathrm{Attribute})}, \tag{1}
\]

where H(X) stands for the entropy of X, see [47]. Information gain attribute evaluation assesses the significance of each attribute by calculating the information gain using the formula

\[
\mathrm{InfoGain}(\mathrm{Class}, \mathrm{Attribute}) = H(\mathrm{Class}) - H(\mathrm{Class} \mid \mathrm{Attribute}). \tag{2}
\]

Fig. 9. Accuracy of J48 for sets of one Ewing feature with QRS 10sec or Grade 10sec added. (Legend: 2 classes, 3 classes, 4 classes.)
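As an illustration of formulas (1) and (2), the following minimal Python sketch computes both quantities for a discrete attribute. It is not the implementation used in the experiments: the function names and the toy labels are hypothetical, and it assumes the attribute has already been discretised, whereas a data mining toolkit would handle numeric attributes with its own preprocessing.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy H(X), in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def conditional_entropy(classes, attribute):
    """H(Class | Attribute): class entropy within each attribute value,
    weighted by how often that attribute value occurs."""
    n = len(classes)
    groups = {}
    for c, a in zip(classes, attribute):
        groups.setdefault(a, []).append(c)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def info_gain(classes, attribute):
    """Formula (2): H(Class) - H(Class | Attribute)."""
    return entropy(classes) - conditional_entropy(classes, attribute)


def gain_ratio(classes, attribute):
    """Formula (1): information gain divided by H(Attribute).
    Returns 0.0 for a constant attribute to avoid division by zero
    (an assumption of this sketch, not a rule stated in the text)."""
    h_attr = entropy(attribute)
    return info_gain(classes, attribute) / h_attr if h_attr > 0 else 0.0


# Hypothetical toy data: four patients, two class labels.
classes = ["CAN", "CAN", "no CAN", "no CAN"]
informative = ["abnormal", "abnormal", "normal", "normal"]  # separates the classes
uninformative = ["x", "y", "x", "y"]                        # independent of the class

print(gain_ratio(classes, informative), gain_ratio(classes, uninformative))
```

Dividing by H(Attribute) penalises attributes that split the data into many small groups, which is why the two evaluators can rank the same attribute differently.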

Classifier attribute evaluation assesses the significance of each attribute by applying it with a user-specified classifier. We used classifier attribute evaluation with the J48 classifier. We then ordered all attributes according to the sum of their ranks in these three assessments. The three most significant features on this list are standard parameters associated with a 10-s clinical ECG recording: ECG interpretation, Grade 10sec and QRS 10sec; see [48] for more information on ECG. The ECG interpretation is determined by the cardiologist as a characterization of the ECG recording. Grade 10sec can take one of the following values: categories 1a and 1b are associated with normal ECG recordings, 2a is associated with an ECG that may indicate minor pathology but is not clinically relevant, 2b suggests that it is advisable for the patient to see a doctor, and category 3 means that the patient should have an immediate referral to a doctor. QRS 10sec refers to the time interval between the onset and termination of the QRS complex within the ECG and indicates ventricular depolarisation. Note that the use of ECG data in applications of AI methods has been considered recently, for example, in [49–53].

Further tests have shown that the J48 classifier could not use ECG interpretation efficiently, since it is a categorical variable with a very large range of values and J48 would have to construct a very large tree to handle it correctly. Demographic and various clinical data are also contained in the DiScRi database for each patient. Feature selection methods were applied to the whole database, including the demographic data, and showed that these data are less significant than Grade 10sec and QRS 10sec in their ability to improve classification accuracy for the diagnosis of CAN. This is why we did not include the demographic features in further tests.

Fig. 10. Accuracy of J48 for sets of two Ewing features with QRS 10sec or Grade 10sec added. (Legend: 2 classes, 3 classes, 4 classes.)

Fig. 11. Accuracy of J48 for sets of three Ewing features with QRS 10sec or Grade 10sec added. (Legend: 2 classes, 3 classes, 4 classes.)

Fig. 12. Accuracy of J48 for sets of four Ewing features with QRS 10sec or Grade 10sec added. (Legend: 2 classes, 3 classes, 4 classes.)
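The ordering of attributes by the sum of their ranks in the three assessments, described above, can be sketched as follows. This is only an illustration under stated assumptions: the attribute names and scores are hypothetical placeholders rather than values from the DiScRi database, higher scores are assumed to mean more significant attributes, and ties are broken alphabetically, which the text does not specify.

```python
def ranks(scores):
    """Map each attribute name to its rank (1 = highest score).
    Ties are broken alphabetically by name; this is an assumption."""
    ordered = sorted(scores, key=lambda name: (-scores[name], name))
    return {name: position for position, name in enumerate(ordered, start=1)}


def order_by_rank_sum(score_tables):
    """Order attributes by the sum of their ranks over all evaluators
    (smallest rank sum first)."""
    per_method = [ranks(table) for table in score_tables]
    totals = {name: sum(r[name] for r in per_method) for name in per_method[0]}
    return sorted(totals, key=lambda name: (totals[name], name))


# Hypothetical scores from three evaluators for three placeholder attributes.
gain_ratio_scores = {"attr_a": 0.30, "attr_b": 0.20, "attr_c": 0.10}
info_gain_scores = {"attr_a": 0.25, "attr_b": 0.40, "attr_c": 0.05}
classifier_scores = {"attr_a": 0.80, "attr_b": 0.70, "attr_c": 0.75}

order = order_by_rank_sum([gain_ratio_scores, info_gain_scores, classifier_scores])
print(order)
```

Summing ranks rather than raw scores keeps the three evaluators on a common scale, since gain ratio, information gain and classifier accuracy are not directly comparable in magnitude.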

We carried out a complete evaluation of the predictive accuracy of the J48 classifier for the Ewing battery supplemented with the QRS 10sec and Grade 10sec attributes. Figs. 9–12 include the experimental results of these tests. These outcomes show that Grade 10sec and QRS 10sec produce approximately equivalent improvement in the predictive accuracy of J48, with Grade 10sec slightly better than QRS 10sec. Figs. 9–12 can be used to determine the best sequence of Ewing tests, for example, for those patients who already have the values of the Grade 10sec and QRS 10sec attributes determined by the clinicians.

5. Conclusions

We have applied decision trees to the problem of supporting clinicians in finding optimal sequences of tests for each individual patient for the assessment of cardiac autonomic neuropathy. We have determined the best sequences of Ewing tests for the diagnosis of CAN with 2, 3 and 4 categories and included a table with the accuracies achieved after each initial segment of tests in these sequences. A comparison with the sequence considered previously in [18] and with the mathematical expectations of the accuracies for a random sequence of tests shows a significant improvement and demonstrates that our results can provide additional guidance to clinicians in selecting subsets of the whole Ewing battery and determining the order of performing the tests in each individual situation. Our tables with outcomes contain the predictive accuracies that can be achieved by diagnosing CAN on the basis of each subset of the Ewing features and can be used to determine the optimal sequences of tests for each individual cost-function by following the ODPF procedure. The results show that the best single Ewing test for diagnosing CAN is the deep breathing heart rate variation (DBHR) test.
Optimal sequences found for the cost-function equal to the number of tests guarantee that the best accuracy is achieved after any number of tests, and provide an improvement in comparison with the original ordering of tests. In situations where some of the Ewing tests cannot be performed, the best additional features that can be recommended to increase the accuracy are the Grade 10sec and QRS 10sec attributes. Our experiments show that Grade 10sec and QRS 10sec produce approximately equivalent improvement in the predictive accuracy of J48, with Grade 10sec slightly better than QRS 10sec.

Acknowledgements

This research was supported by a Deakin-Ballarat collaboration grant. The authors are grateful to two reviewers for corrections, comments and suggestions that have helped to improve this article.

References

[1] Chi CL, Street NK, Katz C. A decision support system for cost-effective diagnosis. Artificial Intelligence in Medicine 2010;50:149–61.
[2] Thompson ML. Assessing the diagnostic accuracy of a sequence of tests. Biostatistics 2003;4:341–51.
[3] Oddi A, Cesta A. Toward interactive scheduling systems for managing medical resources. Artificial Intelligence in Medicine 2000;20(2):113–38.
[4] Houshyar A, Khayyal FA. A mathematical model for scheduling screening tests for progressive diseases. Socio-Economic Planning Sciences 1990;24(3):187–97.
[5] Marinagi CC, Spyropoulos CD, Papatheodorou C, Kokkotos S. Continual planning and scheduling for managing patient tests in hospital laboratories. Artificial Intelligence in Medicine 2000;20(2):139–54.
[6] Spyropoulos CD. AI planning and scheduling in the medical hospital environment. Artificial Intelligence in Medicine 2000;20(2):101–11.
[7] Bartels J-H, Zimmermann J. Scheduling tests in automotive R&D projects. European Journal of Operational Research 2009;193(3):805–19.
[8] Liberatore MJ, Nydick RL. The analytic hierarchy process in medical and health care decision making: a literature review. European Journal of Operational Research 2008;189:194–207.
[9] Bellazzi R, Ferrazzi F. Predictive data mining in clinical medicine: a focus on selected methods and applications. Data Mining and Knowledge Discovery 2011;1:416–30.
[10] Escalante HJ, Montes-y-Gomez M, Gonzalez JA, Gomez-Gil P, Altamirano L, Reyes CA, et al. Acute leukemia classification by ensemble particle swarm model selection. Artificial Intelligence in Medicine 2012;55(3):163–75.
[11] Gagliardi F. Instance-based classifiers applied to medical databases: diagnosis and knowledge extraction. Artificial Intelligence in Medicine 2011;52(3):123–39.
[12] Kukar M, Kononenko I, Groselj C. Modern parameterization and explanation techniques in diagnostic decision support system: a case study in diagnostics of coronary artery disease. Artificial Intelligence in Medicine 2011;52(2):77–90.
[13] Liang G, Zhang C. Empirical study of bagging predictors on medical data. In: Vamplew P, Stranieri A, Ong K-L, Christen P, Kennedy PJ, editors. Australasian data mining conference, AusDM 2011, vol. 121 of CRPIT. Ballarat, Australia: ACS; 2011. p. 31–40.
[14] Shouman M, Turner T, Stocker R. Using decision tree for diagnosing heart disease patients. In: Vamplew P, Stranieri A, Ong K-L, Christen P, Kennedy PJ, editors. Australasian data mining conference, AusDM 2011, vol. 121 of CRPIT. Ballarat, Australia: ACS; 2011. p. 23–30.
[15] Van A, Gay VC, Kennedy PJ, Barin E, Leijdekkers P. Understanding risk factors in cardiac rehabilitation patients with random forests and decision trees. In: Vamplew P, Stranieri A, Ong K-L, Christen P, Kennedy PJ, editors. Australasian data mining conference, AusDM 2011, vol. 121 of CRPIT. Ballarat, Australia: ACS; 2011. p. 11–22.
[16] Jelinek HF, Wilding C, Tinley P. An innovative multi-disciplinary diabetes complications screening programme in a rural community: a description and preliminary results of the screening. Australian Journal of Primary Health 2006;12:14–20.
[17] Ewing DJ, Campbell JW, Clarke BF. The natural history of diabetic autonomic neuropathy. Quarterly Journal of Medicine 1980;49:95–100.
[18] Ewing DJ, Clarke BF. Diagnosis and management of diabetic autonomic neuropathy. British Medical Journal 1982;285:916–8.
[19] Ewing DJ, Martyn CN, Young RJ, Clarke BF. The value of cardiovascular autonomic function tests: 10 years experience in diabetes. Diabetes Care 1985;8:491–8.
[20] Ewing DJ, Campbell JW, Murray A, Neilson JMM, Clarke BF. Immediate heart-rate response to standing: simple test for autonomic neuropathy in diabetes. British Medical Journal 1978;1:145–7.
[21] Gonzalez-Clemente J-M, Vilardell C, Broch M, Megia A, Caixas A, Gimenez-Palop C, et al. Lower heart rate variability is associated with higher plasma concentrations of IL-6 in type 1 diabetes. European Journal of Endocrinology 2007;157:31–8.
[22] Ewing DJ. Which battery of cardiovascular autonomic function tests? Diabetologia 1990;33:180–1.
[23] Jelinek HF, Khandoker A, Palaniswami M, McDonald S. Heart rate variability and QT dispersion in a cohort of diabetes patients. Computing in Cardiology 2010;37:613–6.
[24] Jelinek HF, Rocha A, Carvalho T, Goldenstein S, Wainer J. Machine learning and pattern classification in identification of indigenous retinal pathology. In: 33rd annual international conference of the IEEE Engineering in Medicine and Biology Society. IEEE Press; 2011. p. 5951–4.
[25] Chen HT, Lin HD, Won JGS, Lee CH, Wu SC, Lin JD, et al. Cardiovascular autonomic neuropathy, autonomic symptoms and diabetic complications in 674 type 2 diabetes. Diabetes Research and Clinical Practice 2008;82:282–90.
[26] Khandoker AH, Jelinek HF, Palaniswami M. Identifying diabetic patients with cardiac autonomic neuropathy by heart rate complexity analysis. BioMedical Engineering OnLine 2009;8, http://www.biomedical-engineering-online.com/content/8/1/3.
[27] Stella P, Ellis D, Maser RE, Orchard TJ. Cardiac autonomic neuropathy (expiration and inspiration ratio) in type 1 diabetes incidence and predictors. Journal of Diabetes and Its Complications 2000;14:1–6.
[28] Cornforth D, Jelinek HF. Automated classification reveals morphological factors associated with dementia. Applied Soft Computing 2007;8:182–90.
[29] Huda S, Jelinek HF, Ray B, Stranieri A, Yearwood J. Exploring novel features and decision rules to identify cardiovascular autonomic neuropathy using a hybrid of wrapper-filter based feature selection. In: Sixth international conference on intelligent sensors, sensor networks and information processing, ISSNIP 2010. Sydney: IEEE Press; 2010. p. 297–302.
[30] Quinlan R. C4.5: programs for machine learning. San Mateo, CA: Morgan Kaufmann; 1993.
[31] Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH. The WEKA data mining software: an update. SIGKDD Explorations 2009;11:10–8.
[32] Williams GJ. Rattle: a data mining GUI for R. R Journal 2009;1:45–55.
[33] Williams G. Data Mining with Rattle and R: The Art of Excavating Data for Knowledge Discovery (Use R!). New York/Dordrecht/Heidelberg/London: Springer; 2011.
[34] Lamb SE, McCabe C, Becker C, Fried LP, Guralnik J. The optimal sequence and selection of screening test items to predict fall risk in older disabled women: the women’s health and aging study. Journal of Gerontology, Series A 2008;63:1082–8.
[35] Zuzek A, Biasizzo A, Novak F. Sequential diagnosis tool. Microprocessors and Microsystems 2000;24:191–7.
[37] Chittaro L. Information visualization and its application to medicine. Artificial Intelligence in Medicine 2001;22(2):81–8.




[38] Combi C, Oliboni B. Visually defining and querying consistent multigranular clinical temporal abstractions. Artificial Intelligence in Medicine 2012;54(2):75–101.
[39] Jelinek HF, Cornforth DJ, Blackmore K. Visualisation in biomedicine as a means of data evaluation. Journal of Visualization 2011;14:353–9.
[40] Kosara R, Miksch S. Metaphors of movement: a visualization and user interface for time-oriented, skeletal plans. Artificial Intelligence in Medicine 2001;22(2):111–31.
[41] Plaza L, Diaz A, Gervas P. A semantic graph-based approach to biomedical summarisation. Artificial Intelligence in Medicine 2011;53(1):1–14.
[42] Tsay J-J, Wu B-L, Jeng Y-S. Hierarchically organized layout for visualization of biochemical pathways. Artificial Intelligence in Medicine 2010;48(23):107–17.
[43] Park C, Godtliebsen F, Taqqu M, Stoev S, Marron JS. Visualization and inference based on wavelet coefficients, SiZer and SiNos. Computational Statistics & Data Analysis 2007;51(12):5994–6012.
[44] Park C, Huh J. Statistical inference and visualization in scale-space using local likelihood. Computational Statistics & Data Analysis 2013;57(1):336–48.
[45] Cho BH, Yu H, Kim K-W, Kim TH, Kim IY, Kim SI. Application of irregular and unbalanced data to predict diabetic nephropathy using visualization and feature selection methods. Artificial Intelligence in Medicine 2008;42(1):37–53.
[46] May O, Arildsen H. Assessing cardiovascular autonomic neuropathy in diabetes mellitus: how many tests to use? Journal of Diabetes and Its Complications 2000;14:7–12.
[47] Shannon CE. A mathematical theory of communication. Bell System Technical Journal 1948;27:379–423.
[48] O’Keefe JH, Hammill SC, Freed MS, Pogwizd SM. The complete guide to ECGs. Jones & Bartlett Publishers; 2008.
[49] Chao P-K, Wang C-L, Chan H-L. An intelligent classifier for prognosis of cardiac resynchronization therapy based on speckle-tracking echocardiograms. Artificial Intelligence in Medicine 2012;54(3):181–8.
[50] Chen Y-H, Yu S-N. Selection of effective features for ECG beat recognition based on nonlinear correlations. Artificial Intelligence in Medicine 2012;54(1):43–52.
[51] Chiarugi F, Colantonio S, Emmanouilidou D, Martinelli M, Moroni D, Salvetti O. Decision support in heart failure through processing of electro- and echocardiograms. Artificial Intelligence in Medicine 2010;50(2):95–104.
[52] Gacek A, Pedrycz W. A characterization of electrocardiogram signals through optimal allocation of information granularity. Artificial Intelligence in Medicine 2012;54:125–34.
[53] Jovic A, Bogunovic N. Electrocardiogram analysis using a combination of statistical, geometric, and nonlinear heart rate variability features. Artificial Intelligence in Medicine 2011;51(3):175–86.

