IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, VOL. 53, NO. 8, AUGUST 2006
An Association Rule Mining-Based Methodology for Automated Detection of Ischemic ECG Beats
Themis P. Exarchos, Costas Papaloukas, Dimitrios I. Fotiadis*, Member, IEEE, and Lampros K. Michalis
Abstract—In this paper, an automated methodology based on association rules is presented for the detection of ischemic beats in long duration electrocardiographic (ECG) recordings. The proposed approach consists of three stages. 1) Preprocessing: Noise is removed and all the necessary ECG features are extracted. 2) Discretization: The continuous valued features are transformed to categorical. 3) Classification: An association rule extraction algorithm is utilized and a rule-based classification model is created. According to the proposed methodology, ECG features extracted from the ST segment and the T-wave, as well as the patient's age, were used as inputs. The output was the classification of the beat as ischemic or not. Various algorithms were tested both for discretization and for classification using association rules. To evaluate the methodology, a cardiac beat dataset was constructed using several recordings of the European Society of Cardiology ST-T database. The obtained sensitivity (Se) and specificity (Sp) were 87% and 93%, respectively. The proposed methodology combines high accuracy with the ability to provide interpretation for the decisions made, since it is based on a set of association rules.
Index Terms—Association rules, automated ischemic beat detection, data mining, rule-based classification.
I. INTRODUCTION
Myocardial ischemia is the condition where oxygen deprivation to the heart muscle is accompanied by inadequate removal of metabolites due to reduced blood flow or perfusion. This reduced blood supply to the myocardium causes alterations in the electrocardiographic (ECG) signal, such as deviations in the ST segment and changes in the T wave [1]. The detection of ischemic episodes in the ECG recordings can be very supportive to the physicians in the diagnosis of myocardial
Manuscript received May 17, 2005; revised January 22, 2006. This work was supported in part by the European Commission within the NOESIS project: Platform for wide scale integration and visual representation of medical intelligence (IST-2002-507960). Asterisk indicates corresponding author. T. P. Exarchos is with the Unit of Medical Technology and Intelligent Information Systems, Department of Computer Science, University of Ioannina, GR 45110 Ioannina, Greece (e-mail:
[email protected]). C. Papaloukas is with the Department of Biological Applications and Technology, University of Ioannina, GR 45110 Ioannina, Greece and also with the Unit of Medical Technology and Intelligent Information Systems, Department of Computer Science, University of Ioannina, GR 45110 Ioannina, Greece (e-mail:
[email protected]). *D. I. Fotiadis is with the Unit of Medical Technology and Intelligent Information Systems, Department of Computer Science, University of Ioannina, P.O. Box 1186, GR 45110 Ioannina, Greece, the Biomedical Research Institute, FORTH, GR 45110 Ioannina, Greece, and the Michaelideion Cardiology Center, GR 45110 Ioannina, Greece (e-mail:
[email protected]). L. K. Michalis is with the Department of Cardiology, Medical School, University of Ioannina, GR 45110 Ioannina, Greece and also with the Michaelideion Cardiology Center, GR 45110 Ioannina, Greece (e-mail:
[email protected]). Digital Object Identifier 10.1109/TBME.2006.873753
ischemia. The accurate ischemic episode detection, where a sequence of cardiac beats is assessed [2], is based on the correct detection of ischemic beats [3]–[6]. Several techniques that evaluate the ST segment changes and the T-wave alterations by different methodologies have been proposed for ischemic beat detection. More specifically, the use of approaches like parametric modelling [7], [8], wavelet theory [9], set of rules [10], [11], artificial neural networks [2], [12]–[15], multicriteria decision analysis [16], and genetic algorithms [17] have been previously reported, with the most common being the neural and the rule-based ones. Neural-based approaches have resulted in high performance but they do not provide explanations for the classification decisions. Rule-based approaches exhibit the highly desirable feature of interpreting the decisions but their performance is reduced. Data mining and more precisely classification using association rules [18] is a methodology with high accuracy and interpretability. Association rules [19] have been utilized for the extraction of knowledge from medical history, laboratory and demographic data [20]–[28] and for the analysis of medical signals [29], [30]. Classification using association rules [18] has been used for medical image categorization [31], analysis of hospitalized patient flows [32] and electroencephalographic transient event detection and classification [33]. In this study, classification using association rules is proposed for the first time in the literature for the ischemic beat detection. In order to train and test the efficiency of the proposed methodology a specific dataset was constructed, which consisted of five representative ECG features. 
Our methodology is based on a three stage schema (feature extraction module, module for discretizing the continuous feature values, and module for classification using association rules) and introduces two novelties; employment of association rules for ECG analysis and utilization of sophisticated classification tree algorithms for the discretization of the continuous valued features. The employment of association rules for ECG analysis offers the potential of discovering new knowledge in the form of rules and also the ability to provide explanation for the decisions made. The utilisation of sophisticated classification tree algorithms for the discretization of the continuous valued features enables the study of the relevance of each feature in the ischemic beat detection problem. The classification results using data from a task specific cardiac beat database indicate that the proposed diagnostic methodology is very effective and performs well both in terms of sensitivity and specificity. In the following we describe briefly data mining as well as the process of discretization and classification using association rules. Several algorithms for discretization and classification using association rules are analysed. Next, the employed
0018-9294/$20.00 © 2006 IEEE
Fig. 1. The proposed three-stage methodology.
dataset, as well as the three stage schema of the ischemic beat detection methodology, are analysed. The experiments carried out for the evaluation of our classification system are presented. The performance of the methodology and the advantages and disadvantages are given in the discussion section. Possible further improvements are also discussed.
II. MATERIALS AND METHODS
A methodology based on a three-stage schema was developed for ischemic beat detection, as shown in Fig. 1. The three stages correspond to 1) noise handling and ECG feature extraction, 2) feature discretization, and 3) rule mining and beat classification. In the first stage, preprocessing of the ECG recording was performed to achieve noise removal and extraction of the signal features which were used for beat characterisation. In the second stage every continuous valued feature was discretized (it was transformed to categorical) in order to be utilized in the next stage. In the third stage, association rule mining algorithms were applied to generate association rules, which were used for ischemic beat detection.
A. Data Mining
Data mining is the process of discovering valuable information from large amounts of data stored in databases, data warehouses, or other information repositories. This valuable information can have the form of associations, patterns, changes, anomalies and significant structures [34], [35]. In brief, data mining tasks can be classified in two categories: descriptive data mining and predictive data mining. The first describes the dataset in a concise and brief manner and presents general properties of the data, whereas the second constructs one model, or a set of models, performs inference on the available dataset, and attempts to predict the behaviour of new datasets [36]. The application fields of data mining are class description, association analysis, classification, prediction, clustering, time-series analysis and outlier analysis [37].
Association rule mining [19] is a data mining technique and is a commonly used methodology for local-pattern discovery in unsupervised learning systems. Association rule mining is the discovery of association relationships or correlations among a set of items. They are often expressed in a rule form showing feature-value conditions occurring frequently in a given dataset.
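As a minimal illustration of how such rules are scored, the sketch below computes the two ratios of the support-confidence framework defined in the next paragraph, for a rule with a body and a head over a small database of tuples. The item strings and the four-tuple database are hypothetical and serve only as an example; they are not taken from the study's data.

```python
from typing import FrozenSet, List

Itemset = FrozenSet[str]


def support(db: List[Itemset], body: Itemset, head: Itemset) -> float:
    """Fraction of all tuples that contain both the body and the head."""
    both = body | head
    return sum(1 for t in db if both <= t) / len(db)


def confidence(db: List[Itemset], body: Itemset, head: Itemset) -> float:
    """Fraction of the tuples containing the body that also contain the head."""
    n_body = sum(1 for t in db if body <= t)
    n_both = sum(1 for t in db if (body | head) <= t)
    return n_both / n_body if n_body else 0.0


# Hypothetical toy database: each tuple is a set of feature=value items.
db = [
    frozenset({"ST_dev=2", "Age=2", "class=ischemic"}),
    frozenset({"ST_dev=2", "Age=2", "class=ischemic"}),
    frozenset({"ST_dev=2", "Age=0", "class=normal"}),
    frozenset({"ST_dev=6", "Age=0", "class=normal"}),
]
body = frozenset({"ST_dev=2"})
head = frozenset({"class=ischemic"})

rule_support = support(db, body, head)        # 2 of 4 tuples
rule_confidence = confidence(db, body, head)  # 2 of the 3 tuples with the body
```

In this toy case the rule holds in two of the four tuples, and in two of the three tuples that satisfy its body.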
An association rule in the form $X \Rightarrow Y$ is interpreted as 'database tuples that satisfy $X$ are likely to satisfy $Y$'. Tuples are composed of rows in relational databases, while $X$ and $Y$ are called itemsets. $X$ is called the antecedent part or body of the rule and $Y$ the consequent part or head of the rule. The most widely used framework for the evaluation of the association rules is the support-confidence framework. The support of an association rule $X \Rightarrow Y$ is the ratio of the tuples which contain the itemsets $X$ and $Y$ to the total number of tuples in the database. The confidence of an association rule $X \Rightarrow Y$ is the ratio of the tuples which contain the itemsets $X$ and $Y$ to the tuples which contain the itemset $X$. Many efficient algorithms have been proposed for association rule mining, with the most common being the Apriori [38] and the FPgrowth [39] algorithms.
B. Discretization
Datasets usually have attributes with numerical domains, which make them unsuitable for certain data mining algorithms dealing mainly with nominal attributes, such as naive Bayes classifiers and association rule mining algorithms. To use such algorithms, numerical attributes must be replaced with nominal attributes representing intervals of numerical domains with discrete values. This process is known as discretization and involves the transformation of a quantitative variable into a qualitative one [40]; it is described below. Considering $A$ to be a numerical attribute of a set of objects, the set of values of the components of these objects that correspond to the attribute $A$ is the active domain of $A$ and is denoted $\mathrm{adom}(A)$. To discretize $A$ we select a sequence of numbers $t_1 < t_2 < \cdots < t_s$ in $\mathrm{adom}(A)$. Then, the attribute $A$ is replaced by the nominal attribute $A'$ having $s + 1$ distinct values in its active domain, which is denoted as $\mathrm{adom}(A') = \{d_0, d_1, \ldots, d_s\}$. Each $A$-component $a$ of an object is replaced by the discretized $A'$-component $a'$ defined as
$$a' = \begin{cases} d_0, & a < t_1 \\ d_i, & t_i \le a < t_{i+1}, \quad 1 \le i \le s-1 \\ d_s, & a \ge t_s \end{cases} \tag{1}$$
The numbers $t_1, t_2, \ldots, t_s$ define the discretization process and are known as class separators. There are two types of discretization, based on whether or not the classes to which objects belong are used: unsupervised discretization, where the discretization takes place without any knowledge of the classes to which objects belong, and supervised discretization, which takes into account the classes of the objects [41]. In our work we tested two methods for the discretization of the continuous features: an unsupervised one, the equal depth binning [40], and a supervised one, which is based on classification trees [42]. Equal depth binning (or frequency partitioning) is perhaps one of the simplest methods for data discretization and it has been applied to produce nominal values from continuous ones. It involves division of a continuous variable into bins (sets). The number of bins $b$ is a parameter provided by the user. Given $n$ instances, each bin contains $n/b$ (possibly duplicated) adjacent values. The method is applied to each continuous feature
EXARCHOS et al.: ASSOCIATION RULE MINING-BASED METHODOLOGY FOR AUTOMATED DETECTION OF ISCHEMIC ECG BEATS
independently. Equal depth binning is an unsupervised method for discretization and it is efficient when data are skewed, as in our case, due to the presence of noise in ECG recordings. The second approach employs classification trees. Classification trees are used for classification problems but they can be used for discretization problems as well, after proper modification. In this way, the discretization intervals are based on the classification accuracy of each feature. Using this technique we studied the relevance of the features in the cardiac beat classification problem. This technique is called classification tree discretizer (CT-disc) and it is described below in detail. CT-disc is a purity-based method that constructs a classification tree for every continuous valued feature separately in order to determine the number of thresholds for discretization and their values. In this sense, the algorithm is first applied to each feature separately and builds a tree which contains binary splits testing only the single continuous feature. The algorithm uses a purity-based measure to determine the partitions for discrete intervals. For a dataset $S$ this measure is given as
(2)
where $k$ is the number of classes (two in our case, normal and ischemic), $|S|$ is the number of instances in the dataset (the number of beats in the training set), and $S_j$ is the set of instances of class $j$ in $S$. CT-disc computes a threshold between the feature values, so that if we split the feature at this threshold, we will have the minimum impurity. In order to find the threshold which minimizes the impurity, the cases in our dataset are sorted using the feature values to give ordered distinct values $v_1, v_2, \ldots, v_m$. Each pair of adjacent values suggests a potential threshold $t$ and a corresponding partition of $S$. The set of examples (i.e., the beats) at the node to be split contains instances belonging to one of $k$ categories. We assume that the example set is split into two non-overlapping subsets $S_1$ and $S_2$. $S_1$ contains the instances in which the attribute value is below $t$ and $S_2$ the instances in which the attribute value is at or above $t$. $n_{1j}$ and $n_{2j}$ are the number of instances of category $j$ in $S_1$ and $S_2$, respectively. In order to find the impurity of the split, we compute the measure for the two subsets $S_1$ and $S_2$
(3)
(4)
(5)
where the first quantity is the measure for subset $S_1$ and the second is the measure for subset $S_2$. After computing the impurity for every possible threshold $t$, the best one (i.e., the one which gives the minimum impurity) is used to split $S$. The procedure
is repeated in order to find other possible thresholds to split the two new subsets obtained by the first split. In other words, a complete tree is built for every feature. Splitting continues until the tree reaches a maximum specified number of nodes (the number of nodes is the number of class separators). After the tree for a single feature is built with CT-disc, we can simply use the threshold values at each node of the induced tree as the threshold values for the discretization of this feature.
C. Classification Using Association Rules
A new field of data mining, classification using association rules [18], applies concepts used in association rule mining to a classification problem. Classification based upon association rules utilizes a special subset of association rules with consequents restricted to the classification class attribute. In order to understand the problem of classification using association rules we need to define the semantics and relations among the entities of tuple, literal and rule. Let $T$ be a set of tuples. Each tuple $t$ in $T$ follows the scheme $(A_1, A_2, \ldots, A_m)$, where $A_1, A_2, \ldots, A_m$ are the features. If those features are continuous, they must first be transformed into discrete ones. A literal $p$ is a feature-value pair, taking the form $(A_i, v)$, where $A_i$ is an attribute and $v$ a value. A tuple $t$ satisfies a literal $p = (A_i, v)$ if and only if $t_i = v$, where $t_i$ is the value of the $i$th feature of $t$. A rule $r$, which takes the form $p_1 \wedge p_2 \wedge \cdots \wedge p_l \rightarrow c$, consists of a conjunction of literals $p_1, p_2, \ldots, p_l$ associated with a class label $c$. A tuple $t$ satisfies rule $r$'s body if and only if it satisfies every literal in the rule. If $t$ satisfies $r$'s body, $r$ predicts that $t$ belongs to class $c$. If a rule contains zero literals, its body is satisfied by any tuple. These rules are called class association rules (CARs) and, after their generation, they are used to form a classification model. Several studies have shown that classification using association rules has higher accuracy and efficiency than traditional classification systems (e.g.
decision trees, naive-Bayes classifiers) [18], [43], [44]. An effective algorithm applying association rules for classification is the classification based on predictive association rules (CPAR) [44]. According to this algorithm, the classification is realized in three steps: rule generation, rule evaluation and classification.
1) Rule Generation: CPAR inherits the basic idea from the first-order inductive learner (FOIL) algorithm [45]. FOIL is a greedy algorithm that learns rules able to distinguish positive examples from negative ones. FOIL repeatedly searches for the current best rule and removes all the positive examples covered by the rule until all the positive examples in the dataset are covered. For multi-class problems, FOIL is applied to each class: the examples of each class are used as positive examples and those of other classes as negative ones. The rules for all classes are merged together to form the resulting rule set. CPAR builds rules by adding literals one by one. At each stage of the rule generation process, CPAR computes the gain of each literal $p$ (the same quantity that is used in FOIL). $gain(p)$ is the number of bits (i.e., amount of information) saved to represent all the positive examples by adding the literal $p$ to rule $r$ and is given as
$$gain(p) = |P^*| \left( \log \frac{|P^*|}{|P^*| + |N^*|} - \log \frac{|P|}{|P| + |N|} \right) \tag{6}$$
where $|P|$ is the number of positive examples and $|N|$ the number of negative examples satisfying the current rule $r$'s body. Also, after a literal $p$ is added to $r$, there are $|P^*|$ positive and $|N^*|$ negative examples satisfying the new rule's body. The best literal (i.e., the one having the highest gain) and the literals having similar gain (i.e., similar by at least 99%, which is a usual value of the gain similarity ratio) are chosen to form the predictive association rules. For example, when a new rule is to be built, the queue of the rules is first checked. If it is not empty, a rule is extracted from it and is considered as the current rule $r$. After finding the best literal $p$, another literal $q$ having similar gain as $p$ is found. Besides appending literal $p$ to $r$, we also append $q$ to the current rule $r$ to create a new rule $r'$, which is pushed into the queue. This forms the depth-first-search in rule generation. Another property of the CPAR rule generation process is that after each example is covered by a rule, this example is used again (as opposed to FOIL) in the computation of the gain, but with a decreased weight (it is multiplied by a decay factor). The weight of an example is decreased until it reaches a minimum value, which is computed by multiplying the initial weight of all positive examples with a parameter $\delta$. The weight of each example at the beginning of the rule generation is set to 1.
2) Rule Evaluation: After the rule generation process, CPAR evaluates every rule to determine its prediction power. For a rule $r = p_1 \wedge p_2 \wedge \cdots \wedge p_l \rightarrow c$, CPAR defines its expected accuracy as the probability that an example satisfying $r$'s body belongs to class $c$. CPAR does not use the usual support-confidence framework for the association rule evaluation but makes use of the Laplace expected error estimate to estimate the accuracy of rules. The expected accuracy of a rule is given as
$$\mathrm{L.A.} = 1 - \mathrm{LaplaceError} \tag{7}$$
where the Laplace expected error estimate is defined as
$$\mathrm{LaplaceError} = \frac{n_{tot} - n_c + k - 1}{n_{tot} + k} \tag{8}$$
Thus, (7) becomes:
$$\mathrm{L.A.} = \frac{n_c + 1}{n_{tot} + k} \tag{9}$$
where $k$ is the number of classes, $n_{tot}$ is the total number of examples satisfying the rule's body, among which $n_c$ examples belong to $c$, which is the predicted class of the rule.
3) Classification: Given a rule set containing rules for each class, CPAR uses the best rules of each class for prediction, following the procedure:
a) selection of all rules whose bodies are satisfied by the example;
b) selection of the best rules for each class from the rules selected in step a); and
c) comparison of the average expected accuracy of the best rules of each class and selection of the class with the highest expected accuracy as the predicted class.
Besides CPAR, we tested several similar algorithms in the rule mining and classification stage of our methodology, namely the Classification Based on Associations (CBA) [18], the Classification based on Multiple-class Association Rules (CMAR) [43], and the Apriori-Total From Partial Classification (Apriori-TFPC) [46]. CBA [18] first generates as candidate rules all the class association rules exceeding the given support and confidence thresholds using the Apriori algorithm [38]. After the rule generation, CBA prunes the set of rules using the pessimistic error rate method [47]. More specifically, if rule $r$'s pessimistic error rate is higher than the pessimistic error rate of rule $r^-$ (the latter is obtained by deleting one condition from the conditions of $r$), then rule $r$ is pruned. This pruning procedure can cut down the generated number of rules substantially. In the testing phase, when predicting the class label for an example, the best rule (i.e., the one having the highest confidence) whose body is satisfied by the example is chosen for prediction. CMAR [43] generates rules in a similar way as CBA but uses the FPgrowth algorithm [39]. In the pruning phase, CMAR selects only positively correlated rules. This means that for a rule $R: P \rightarrow c$, the algorithm checks whether $P$ is positively correlated with $c$ by chi-square testing (the chi-square test method measures the significance of associations). Only the rules that are positively correlated, i.e., those having a $\chi^2$ value larger than a significance level threshold, are used for later classification. All the other rules are pruned. Also CMAR prunes rules based on database coverage.
That is, CMAR removes one data object from the training dataset after it is covered by at least a given number of rules (this number is the database coverage parameter). That allows more selected rules. In the testing phase, for a new sample, CMAR collects the subset of rules matching the sample from the total set of rules. If all the rules have the same class, CMAR assigns this class to the new sample. If the rules are not consistent in the class label, CMAR divides the rules into groups according to the class label and yields the label of the "strongest" group. The "strength" of a group of rules is computed using weighted chi-square. Apriori-TFPC [46] is a classification association rule mining algorithm based on the Apriori-Total From Partial (Apriori-TFP) algorithm [48]. Its difference from other algorithms is that it does not use the standard two stage approach: (1) generation of all CARs, and (2) pruning of CARs to produce a classifier. Instead Apriori-TFPC employs only a single stage where CARs are generated as part of the frequent set identification process. In the testing phase, the rule with the highest confidence, whose body is satisfied by a test sample, is chosen for the prediction of this sample.
D. Dataset
In order to construct the dataset for training and testing our classification methodology, 11 h of two-channel ECG recordings from the European Society of Cardiology (ESC) ST-T database [49] were used. These comprise the whole e0104 recording and the first hour of the e0103, e0105, e0108, e0113, e0114, e0147, e0159, e0162, and e0206 recordings. Each recording was sampled at 250 samples/s with 12-bit resolution over a nominal 20 mV input range. The sample values were rescaled after digitization with reference to calibration signals in the original analog recordings, in order to obtain a uniform scale of 200 analog-to-digital converter units/mV for all signals. These 10 recordings were selected because their ischemic ECG beats were characterized by significant waveform variability, which was observed by visual inspection of the above 10 recordings. This subset of the ESC ST-T Database has been previously used for ischemic beat detection [7], [16], [17]. Three medical experts independently annotated each beat as normal, ischemic or artefact. In case of disagreement the three medical experts reviewed the relevant beat and a decision was taken by consensus. This resulted in a dataset of 86 384 cardiac beats (half from every channel) annotated as normal, ischemic or artefact. After removing the artefacts and the misdetected beats, the final dataset contained 76 989 cardiac beats, diagnosed as normal or ischemic. From those, 1936 beats (982 normal beats and 954 ischemic) were used to find the discretization intervals and for rule mining (training), while the rest (38 344 normal beats and 36 709 ischemic) were used for testing the performance of the classification methodology. It must be noted that the training set was constructed by iteratively selecting the first beat out of every sequence of 40 beats.
Fig. 2. ECG features extracted from the recordings: (a) ST segment deviation, (b) ST segment slope, (c) ST segment area, and (d) T-wave amplitude.
E. Implementation
The classification using association rules methodology for the ischemic beat detection was implemented in a three-stage approach (Fig. 1). First, the preprocessing of the recorded ECG signal was performed (for both leads) in order to eliminate noise distortions (e.g., baseline wandering, A/C interference and electromyographic contamination). Noise elimination was achieved by filtering each recorded cardiac beat separately using ECG filtering [17]. In brief, baseline wandering was removed by subtracting from the recorded signal the first-order polynomial that best fits the cardiac beat. A/C interference and electromyographic contamination were not removed from the recorded signal but were handled properly for the detection of the J point. More specifically, for these two types of noise, a 20 ms averaging filter was applied around J. The exact location of the J point was detected using a technique based on an edge-detection algorithm [50]. After noise removal and J point detection, the following features were extracted from each cardiac beat (Fig. 2):
F1: ST segment deviation [Fig. 2(a)];
F2: ST segment slope [Fig. 2(b)];
F3: ST segment area [Fig. 2(c)];
F4: T-wave amplitude [Fig. 2(d)];
F5: T-wave normal amplitude.
In addition to these 5 features a sixth one (F6: age of the patient) was employed. This feature was given in the demographic data provided by the ESC ST-T database.
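The baseline-removal step described above (subtracting the best-fitting first-order polynomial from each beat) and the 20 ms averaging around the J point can be sketched as follows. This is an illustrative least-squares version and not the filtering code of [17]; the function names, the centered window, and the use of the sample index as the time axis are assumptions made for the example.

```python
from typing import List, Sequence


def remove_linear_baseline(beat: Sequence[float]) -> List[float]:
    """Subtract the least-squares straight line fitted to one beat."""
    n = len(beat)
    if n < 2:
        return list(beat)  # too short to fit a line; returned unchanged
    mean_x = (n - 1) / 2
    mean_y = sum(beat) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(beat))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (slope * x + intercept) for x, y in enumerate(beat)]


def mean_around(signal: Sequence[float], center: int,
                fs_hz: float = 250.0, window_ms: float = 20.0) -> float:
    """Average the samples in a window centered on `center`.

    At 250 samples/s a 20 ms window spans 5 samples (assumed centered).
    """
    half = int(round(window_ms * fs_hz / 1000.0)) // 2
    lo = max(0, center - half)
    hi = min(len(signal), center + half + 1)
    return sum(signal[lo:hi]) / (hi - lo)


# Hypothetical beat that is a pure linear drift: the residual should vanish.
drift = [1.0 + 0.5 * i for i in range(8)]
flattened = remove_linear_baseline(drift)
```

The 250 samples/s default matches the sampling rate stated for the recordings; everything else in the sketch is a simplification.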
Fig. 3. Flowchart of the overall ischemic beat classification methodology.
According to the ESC recommendations [51], the ST segment changes were measured either 80 ms after the J point (J80), or 60 ms after the J point (J60). The ST segment deviation refers to the amplitude deviation of the ST segment from the isoelectric line, which is the line defining the level of zero amplitude. (In the European ST-T Database the ST segment deviation is measured relative to a reference waveform for each subject, which is selected from the first 30 s of each database record. Here we followed a different but similar approach since beat to beat annotations were employed for training and testing.) The ST segment slope is the slope of the line connecting the J and J80 (or J60) points. The ST segment area is the area between the ECG trace, the isoelectric line and the points J and J80 (or J60). The T-wave amplitude is the amplitude deviation of the T-wave peak from the isoelectric line. The T-wave normal amplitude together with its respective polarity refers to the amplitude and polarity of normal beats for a specific ECG lead. It is calculated using the first 30 s of each recording and is computed as the mean value of the T-wave amplitudes in this interval. T-wave amplitude and T-wave normal amplitude are merged into a single new feature, which is the difference between features F4 and F5. It should be mentioned that the whole procedure described above is applied in each lead separately. The dataset was sampled and 2.5% of it (as mentioned before, 1936 beats) was used for training while the rest (75 053 beats) was used for testing. All the five features were continuous valued, so discretization was applied in the second stage of the methodology. The CT-disc and the equal depth binning algorithms were used. For the equal depth binning the number of bins was set to 10, so each bin had approximately 194 samples (number of beats in the training set/number of bins). The equal depth binning technique could not be applied properly to the age feature (F6). Having 10 patients, two of them with the same age, we created eight bins (the number of distinct age values), each one containing samples with the same age. In the case of CT-disc, after some experiments we set the maximum number of nodes to 9 for feature , to 12 nodes for feature , to 6 nodes for feature , to 11 nodes for feature and to 7 nodes for feature . After extracting the discretization threshold values for the five features, we transformed them to qualitative ones. It should be mentioned that the same transformation was performed in the test set, in order to use it in the classification step for evaluation. This means that the continuous values of the features in the test set must be discretized in the same intervals estimated in the training set. Finally, in the third stage of our methodology the rule mining and the classification took place using the following algorithms: CBA, CMAR, CPAR, and Apriori-TFPC, which worked in a similar fashion. In brief, they extracted class association rules having as antecedent a subset (or all) of the five ECG features with their respective discrete values and as consequent the class of the ECG beat. After testing several values for the parameters of the algorithms, the optimum ones were obtained. More specifically, for the CPAR algorithm we used for both discretization techniques the following parameters: the minimum weight parameter $\delta$ was set to 0.05, the minimum gain to 0.7, the gain similarity ratio to 0.99 and the weight decay factor to 0.667. The best five rules were used for prediction. In the CBA algorithm, when the equal depth binning was used, the minimum support was set to 0.3% and the minimum confidence to 50%. When CT-disc was used, the minimum support was set to 0.2% and the minimum confidence to 50%. For the CMAR algorithm and for both discretization techniques, the minimum support was set to 0.2%, the minimum confidence to 50%, the database coverage was set to 2.7055 (critical threshold for 5% "significance" level, assuming degree of freedom equivalent to 1). For the Apriori-TFPC algorithm, and the best results were obtained for for the equal depth binning and minimum support 0.2% and for the CT-disc. The flowchart of our methodology is shown in Fig. 3.
III. RESULTS
We ran several experiments in order to test the accuracy of our methodology.
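Accuracy is reported in this section as sensitivity (Se) and specificity (Sp). As a worked check, the short calculation below recomputes both from the test-set sizes given in Section II-D and the numbers of misclassified beats reported in this section for CT-disc combined with CPAR. The variable names are illustrative, and treating the ischemic class as the positive class is an assumption consistent with the detection task.

```python
# Test-set sizes (Section II-D) and misclassified beats for CT-disc + CPAR.
test_ischemic = 36709
test_normal = 38344
missed_ischemic = 4629   # ischemic beats classified as normal
missed_normal = 2588     # normal beats classified as ischemic

# Se: fraction of ischemic beats detected; Sp: fraction of normal beats kept.
sensitivity = (test_ischemic - missed_ischemic) / test_ischemic
specificity = (test_normal - missed_normal) / test_normal

print(f"Se = {sensitivity:.4f}, Sp = {specificity:.4f}")
```

Rounded to whole percentages, these ratios agree with the 87% and 93% quoted in the text.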
TABLE I CLASSIFICATION RESULTS OBTAINED USING THE TWO DISCRETIZATION TECHNIQUES AND VARIOUS ASSOCIATION RULE BASED CLASSIFICATION ALGORITHMS
TABLE II CARDIAC BEAT CLASSIFICATION RESULTS USING THE CT-DISC AND CPAR ALGORITHMS. THE ROWS DENOTE THE ACTUAL CATEGORY OF THE CARDIAC BEATS (AS ANNOTATED BY THE MEDICAL EXPERTS), WHILE THE COLUMNS DENOTE THE CLASSIFICATIONS OF THE METHODOLOGY
Table I displays the experimental results obtained from the use of various association rule-based classification algorithms and the two discretization techniques. All the tested algorithms performed comparably, but the CT-disc algorithm for discretization combined with the CPAR algorithm for rule generation and classification was found to be the most effective. The ischemic beat detection methodology was tested on a task specific database and demonstrated a sensitivity (Se) of 87% and a specificity (Sp) of 93%. As seen in Table II, the combination of CT-disc with CPAR misclassified only 4629 ischemic and 2588 normal beats. The discretization intervals extracted with the equal depth binning and the CT-disc are shown in Tables III and IV, respectively. CT-disc divided all the continuous valued features into intervals, based on the diagnostic property of each feature. The "strength" of each feature in the cardiac beat classification problem is shown in Table V. This was measured using the classification tree for every feature. This tree was created in the discretization stage from the training set, and was applied to classify the cases in the test set. It should be mentioned that the CPAR algorithm (after discretization with CT-disc) generated 394 association rules in the rule generation process. Of these, 198 were rules which predicted ischemic beats, while the remaining 196 predicted normal beats. Some indicative rules extracted with CPAR combined with the CT-disc are shown below. The rules can be interpreted using the discretization intervals from Table IV. Finally, the running times and the number of generated rules for the four algorithms after discretization with the CT-disc and equal depth binning (using an Intel Pentium IV 2.4-GHz and 256 MB RAM) are shown in Table VI. According to Table IV, the first rule is interpreted as follows: If the ST Segment Deviation of a cardiac beat is in the interval [ 0.1299, 0.0595) and the Age of the patient is in the interval [46.5,51), then the beat is classified as ischemic.
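The CPAR quantities of Section II-C (the FOIL gain, the Laplace expected accuracy, and prediction by the best rules of each class) can be sketched as below. This is a simplified illustration rather than the implementation used in the study: the `Rule` structure, the function names, the base-2 logarithm (chosen because the gain is described in bits), and the training counts attached to the two example rules are assumptions. The counts are hypothetical values chosen so that the formula gives the L.A. figures listed with the indicative rules.

```python
import math
from typing import Dict, List, NamedTuple, Optional


class Rule(NamedTuple):
    body: Dict[str, int]  # feature name -> required discrete value
    label: str            # predicted class
    n_tot: int            # training examples satisfying the body (assumed)
    n_c: int              # of those, examples of the predicted class (assumed)


def foil_gain(pos: int, neg: int, pos_new: int, neg_new: int) -> float:
    """Gain of adding a literal: pos/neg before, pos_new/neg_new after."""
    if pos_new == 0:
        return 0.0
    return pos_new * (math.log2(pos_new / (pos_new + neg_new))
                      - math.log2(pos / (pos + neg)))


def laplace_accuracy(n_c: int, n_tot: int, n_classes: int = 2) -> float:
    """Laplace expected accuracy (n_c + 1) / (n_tot + number of classes)."""
    return (n_c + 1) / (n_tot + n_classes)


def classify(example: Dict[str, int], rules: List[Rule],
             best_k: int = 5) -> Optional[str]:
    """Average the best_k accuracies per class among the matching rules."""
    by_class: Dict[str, List[float]] = {}
    for r in rules:
        if all(example.get(f) == v for f, v in r.body.items()):
            by_class.setdefault(r.label, []).append(
                laplace_accuracy(r.n_c, r.n_tot))
    if not by_class:
        return None  # no rule body is satisfied by this example

    def avg_best(accs: List[float]) -> float:
        top = sorted(accs, reverse=True)[:best_k]
        return sum(top) / len(top)

    return max(by_class, key=lambda c: avg_best(by_class[c]))


rules = [
    Rule({"ST segment deviation": 2, "Age": 2}, "ischemic", 128, 128),
    Rule({"ST segment area": 4, "ST segment deviation": 6, "Age": 0},
         "normal", 73, 73),
]
beat = {"ST segment deviation": 2, "Age": 2, "ST segment area": 1}
predicted = classify(beat, rules)
```

Here the example beat satisfies only the first rule, so it is assigned that rule's class; with several matching rules per class, the class whose best rules have the highest average expected accuracy is selected.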
IV. DISCUSSION
In this paper, classification using association rules was applied to the detection of ischemic beats in long duration ECGs. The methodology was implemented as a three-stage schema: 1) ECG feature extraction; 2) feature discretization; and 3) rule generation and beat classification. All stages of the proposed methodology were performed automatically. The equal depth binning and the modified classification tree algorithm (CT-disc) were tested for discretization, while the CBA, CMAR, CPAR, and Apriori-TFPC algorithms were tested for classification using association rules. Our methodology achieved high accuracy, combined with the ability to interpret the classification decisions.
The high performance of the proposed ischemic beat detection schema can be attributed to several factors. The employment of CT-disc for the discretization process converted all the quantitative features into qualitative ones, based on the ability of each feature to classify the cardiac beats (a similar approach can be found in [52], where the C4.5 decision tree induction algorithm was applied for discretization). CT-disc is a supervised discretization method and, according to previous reports [41], supervised methods are slightly better than unsupervised ones. Using this approach, we studied the relevance of each feature in the beat classification problem and found that the best “classifier”, with an accuracy of 76%, was the ST segment area, while the weakest one, with an accuracy of 59%, was the ST segment slope (Table V).
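For illustration, the unsupervised baseline mentioned above, equal depth (equal frequency) binning, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the function names and the age values are hypothetical, ties between equal values are not treated specially, and the supervised CT-disc procedure is not reproduced here.

```python
from bisect import bisect_right


def equal_depth_cut_points(values, n_bins):
    """Cut points that place roughly the same number of values in each bin."""
    if n_bins < 2 or len(values) < n_bins:
        raise ValueError("need n_bins >= 2 and at least n_bins values")
    ordered = sorted(values)
    n = len(ordered)
    # The i-th cut point is the value found at the i/n_bins position.
    return [ordered[(i * n) // n_bins] for i in range(1, n_bins)]


def discretize(value, cut_points):
    """Map a continuous value to a bin index; bins are half-open [low, high)."""
    return bisect_right(cut_points, value)


# Hypothetical patient ages, used only to show the mechanics.
ages = [34, 41, 45, 47, 50, 52, 58, 61, 66, 70, 73, 79]
cuts = equal_depth_cut_points(ages, n_bins=4)
print(cuts)                                   # [47, 58, 70]
print([discretize(a, cuts) for a in ages])    # three ages per bin
```

Because the cut points depend only on the sorted feature values, the class labels play no role, which is what distinguishes this baseline from a supervised method such as CT-disc.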
Rule 1: If ST segment deviation = 2 and Age = 2 then beat is ischemic (L.A. = 99.23%)
Rule 2: If ST segment area = 4 and ST segment deviation = 6 and Age = 0 then beat is normal (L.A. = 98.67%)
Rule 3: If ST segment area = 0 and ST segment deviation = 1 then beat is ischemic (L.A. = 98.44%)
Rule 4: If Age = 2 and T-wave - T-normal amplitude = 3 then beat is ischemic (L.A. = 97.37%)
Rule 5: If T-wave - T-normal amplitude = 5 and ST segment deviation = 8 and Age = 1 and ST segment slope = 0 then beat is normal (L.A. = 93.33%)
TABLE III DISCRETIZATION INTERVALS OF THE CONTINUOUS VALUED FEATURES WITH THE EQUAL DEPTH BINNING ALGORITHM. THE FIRST COLUMN SHOWS THE BIN WHERE A FEATURE BELONGS AFTER THE EQUAL DEPTH BINNING ALGORITHM IS APPLIED
TABLE IV DISCRETIZATION INTERVALS OF THE CONTINUOUS VALUED FEATURES USING THE CT-DISC ALGORITHM. THE FIRST COLUMN SHOWS THE QUALITATIVE VALUE OF EACH FEATURE AFTER THE DISCRETIZATION PROCEDURE
TABLE V ACCURACY OF THE FIVE FEATURES USED IN CARDIAC BEAT CLASSIFICATION
TABLE VI RUNNING TIMES AND NUMBER OF EXTRACTED RULES FOR THE FOUR ALGORITHMS TESTED IN THE RULE MINING AND CLASSIFICATION STAGE OF OUR METHODOLOGY AFTER DISCRETIZATION WITH THE EQUAL DEPTH BINNING AND THE CT-DISC
Rule 6: If T-wave - T-normal amplitude = 11 and ST segment area = 1 then beat is normal (L.A. = 80.77%)
(where L.A. denotes Laplace Accuracy). The above rules can be interpreted using the discretization intervals shown in Table IV.
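To make the role of the Laplace accuracy concrete, the sketch below assumes the definition used in the CPAR paper [44], L.A. = (n_c + 1)/(n_tot + k), where n_tot is the number of training examples satisfying the rule body, n_c is the number of those belonging to the predicted class, and k is the number of classes (here 2). It then applies Rules 1-6 to a discretized beat by averaging the accuracies of the best matching rules of each class. The function names, the key "T amplitude" (a shorthand for the T-wave amplitude feature of Rules 4-6), the choice k = 5 for the number of best rules, and the example beat and counts are assumptions made for illustration; the source does not report per-rule coverage counts.

```python
def laplace_accuracy(n_class, n_total, n_classes=2):
    """Laplace expected accuracy of a rule: (n_c + 1) / (n_tot + k)."""
    return (n_class + 1) / (n_total + n_classes)


# (rule body, predicted class, Laplace accuracy) for Rules 1-6 of the text.
# "T amplitude" is a shorthand key for the T-wave amplitude feature.
RULES = [
    ({"ST segment deviation": 2, "Age": 2}, "ischemic", 0.9923),
    ({"ST segment area": 4, "ST segment deviation": 6, "Age": 0}, "normal", 0.9867),
    ({"ST segment area": 0, "ST segment deviation": 1}, "ischemic", 0.9844),
    ({"Age": 2, "T amplitude": 3}, "ischemic", 0.9737),
    ({"T amplitude": 5, "ST segment deviation": 8, "Age": 1,
      "ST segment slope": 0}, "normal", 0.9333),
    ({"T amplitude": 11, "ST segment area": 1}, "normal", 0.8077),
]


def classify(beat, rules=RULES, k=5):
    """Pick the class whose k best matching rules have the highest mean accuracy.

    Returns None when no rule body is satisfied by the beat.
    """
    matched = {}
    for body, label, accuracy in rules:
        if all(beat.get(feature) == value for feature, value in body.items()):
            matched.setdefault(label, []).append(accuracy)
    if not matched:
        return None

    def mean_of_best(accuracies):
        best = sorted(accuracies, reverse=True)[:k]
        return sum(best) / len(best)

    return max(matched, key=lambda label: mean_of_best(matched[label]))


# Hypothetical counts: a rule covering 128 training beats, all of one class.
print(round(laplace_accuracy(128, 128), 4))

# Hypothetical discretized beat; it satisfies the bodies of Rules 1 and 4.
beat = {"ST segment deviation": 2, "Age": 2, "T amplitude": 3,
        "ST segment area": 1, "ST segment slope": 0}
print(classify(beat))
```

The matched rules returned for a beat are also the explanation of the decision, which is the interpretability property emphasized in this paper.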
In addition, the utilization of an effective rule-based classification algorithm provided high performance in terms of both accuracy and efficiency. CPAR achieved higher accuracy for both discretization methods and required less time for rule generation. This can possibly be attributed to the following features: 1) the use of a greedy approach in rule generation, which is more efficient than generating all candidate rules (as in CBA, CMAR, and Apriori-TFPC); 2) the utilization of a dynamic programming approach to avoid repeated calculation in rule generation; 3) the selection of all the literals with a gain similar to that of the best one, instead of only the best literal, so that important rules were not missed; and 4) the utilization of the expected accuracy for the evaluation of each rule and the use of the best rules for prediction by the CPAR algorithm, which helped to avoid overfitting. CPAR uses multiple rules for prediction because the accuracy of rules cannot be precisely estimated, and one cannot expect that a single rule can perfectly predict the class label of the examples satisfying its body.
Several methodologies for ischemic ECG beat detection have been reported in the past and different datasets have been utilized for their evaluation. As a result, the comparison between
these methodologies is difficult. The proposed ischemic beat detection methodology, however, compares favorably with those that used the same dataset for evaluation (Table VII) [7], [10], [12], [16], [17]. All methods shown in Table VII were tested using the same subset of the ESC ST-T database as the current work. In [17], genetic algorithms were utilized for ischemic beat detection. This method performed slightly better than our approach in terms of sensitivity, but it required high computational effort and processing time in order to tune the parameters of the genetic algorithm. The performance of other ischemic beat detection systems that have been proposed previously [4], [8], [9], [13]–[15] cannot be compared with that of our methodology, since they have either been evaluated with other test sets [8], [9] or with different subsets of the ESC ST-T database [4], [13]–[15], or they have employed different performance measures [8], [13]–[15].
TABLE VII COMPARISON OF THE PERFORMANCE OF SEVERAL METHODS FOR ISCHEMIC BEAT DETECTION EVALUATED USING THE SAME SUBSET OF THE ESC ST-T DATABASE
It should be mentioned that most of the previously reported techniques are based on signal processing and neural approaches. Such methods exhibit a serious drawback compared with our association rule approach, due to their inability to provide explanations for their classification decisions. In contrast, due to its rule-based nature, the proposed approach satisfies this important requirement and is able to provide, for each beat, the reason (rule) leading to the decision.
A limitation of our methodology is the requirement of a representative training set in order to extract reliable rules. The validity of this training in different settings is unknown. In addition, the utilization of association rules for classification, besides finding valid, causal relationships in the clinical data, will also find all of the spurious and particular relationships among the data in the specific dataset. For this reason, the results of any association rule mining procedure should be considered as exploratory and hypothesis-generating.
Further improvement might focus on the utilization of more features extracted from the ECG that better describe each beat, and also on the employment of other clinical data of the patient. In addition, coherent information could be used in order to diagnose a beat based on the previously diagnosed beats. Moreover, our system could be adapted to address other cardiac abnormalities, such as arrhythmias (arrhythmic beat detection).
V. CONCLUSION
We presented a novel methodology for the automated detection of ischemic beats that employed classification using association rules. The main advantage of the proposed methodology is the combination of high accuracy with the ability to provide interpretation for the decisions made, due to the employment of association rules for the classification. The performance of our approach compares well with previously reported results using the same subset from the ESC ST-T database and indicates that it could be part of a system for the detection of ischemic episodes in long duration ECGs. Clinical testing, however, is needed in order for the methodology to be fully evaluated.
REFERENCES
[1] M. J. Goldman, Principles of Clinical Electrocardiography, 11th ed. Los Altos, CA: LANGE Medical, 1982.
[2] A. Taddei, G. Distante, M. Emdin, P. Pisani, G. B. Moody, C. Zeelenberg, and C. Marchesi, “The European ST-T database: standard for evaluating systems for the analysis of ST-T changes in ambulatory electrocardiography,” Eur. Heart J., vol. 13, pp. 1164–1172, 1992.
[3] R. Silipo, P. Laguna, C. Marchesi, and R. G. Mark, “ST-T segment change recognition using artificial neural networks and principal component analysis,” Comput. Cardiol., pp. 213–216, 1995.
[4] T. Stamkopoulos, K. Diamantaras, N. Maglaveras, and M. Strintzis, “ECG analysis using nonlinear PCA neural networks for ischemia detection,” IEEE Trans. Signal Process., vol. 46, no. 11, pp. 3058–3067, Nov. 1998.
[5] F. Jager, R. G. Mark, G. B. Moody, and S. Divjak, “Analysis of transient ST segment changes during ambulatory monitoring using the Karhunen-Loeve transform,” Comput. Cardiol., pp. 691–694, 1992.
[6] F. Jager, G. B. Moody, A. Taddei, and R. G. Mark, “Performance measures for algorithms to detect transient ischemic ST segment changes,” Comput. Cardiol., pp. 369–372, 1991.
[7] C. Papaloukas, D. I. Fotiadis, A. Likas, and L. K. Michalis, “An expert system for ischemia detection based on parametric modeling and artificial neural networks,” in Proc. Eur. Med. Biol. Eng. Conf., 2002, pp. 742–743.
[8] I. Pitas, M. G. Strintzis, S. Grippas, and C. Xerostylides, “Machine classification of ischemic electrocardiograms,” in Proc. IEEE Mediterranean Electrotechnical Conf. (MELECON), Athens, Greece, 1983.
[9] L. Senhadji, G. Carrault, J. J. Bellanger, and G. Passariello, “Comparing wavelet transforms for recognizing cardiac patterns,” IEEE Eng. Med. Biol. Mag., vol. 14, no. 2, pp. 167–173, Mar./Apr. 1995.
[10] C. Papaloukas, D. I. Fotiadis, A. P. Liavas, A. Likas, and L. K. Michalis, “A knowledge-based technique for automated detection of ischemic episodes in long duration electrocardiograms,” Med. Biol. Eng. Comput., vol. 39, pp. 105–112, 2001.
[11] C. Papaloukas, D. I. Fotiadis, A. Likas, C. S. Stroumbis, and L. K. Michalis, “Use of a novel rule-based expert system in the detection of changes in the ST segment and the T wave in long duration ECGs,” J. Electrocardiol., vol. 35, pp. 27–34, Jan. 2002.
[12] C. Papaloukas, D. I. Fotiadis, A. Likas, and L. K. Michalis, “An ischemia detection method based on artificial neural networks,” Artif. Intell. Med., vol. 24, pp. 167–178, 2002.
[13] N. Maglaveras, T. Stamkopoulos, C. Pappas, and M. Strintzis, “ECG processing techniques based on neural networks and bidirectional associative memories,” J. Med. Eng. Technol., vol. 22, pp. 106–111, 1998.
[14] S. Papadimitriou, S. Mavroudi, L. Vladutu, and A. Bezerianos, “Ischemia detection with a self-organizing map supplemented by supervised learning,” IEEE Trans. Neural Netw., vol. 12, no. 3, pp. 503–515, May 2001.
[15] N. Maglaveras, T. Stamkopoulos, C. Pappas, and M. G. Strintzis, “An adaptive backpropagation neural network for real-time ischemia episodes detection: development and performance analysis using the European ST-T database,” IEEE Trans. Biomed. Eng., vol. 45, no. 7, pp. 805–813, Jul. 1998.
[16] Y. Goletsis, C. Papaloukas, D. I. Fotiadis, A. Likas, and L. K. Michalis, “A multicriteria decision based approach for ischemia detection in long duration ECGs,” in Proc. IEEE EMBS 4th Int. Conf. Information Technology Applications in Biomedicine (ITAB 2003), 2003, pp. 230–233.
[17] Y. Goletsis, C. Papaloukas, D. I. Fotiadis, A. Likas, and L. K. Michalis, “Automatic ischemic beat classification using genetic algorithms and multicriteria decision analysis,” IEEE Trans. Biomed. Eng., vol. 51, no. 10, pp. 1717–1725, Oct. 2004.
[18] B. Liu, W. Hsu, and Y. Ma, “Integrating classification and association rule mining,” in Proc. 4th Int. Conf. Knowledge Discovery and Data Mining (KDD-98), New York, 1998, pp. 80–86.
[19] R. Agrawal, T. Imielinski, and A. Swami, “Mining association rules between sets of items in large databases,” in Proc. 1993 ACM-SIGMOD Int. Conf. Management of Data, May 1993, pp. 207–216.
[20] J. Balter, A. Labarre-Vila, D. Ziebelin, and C. Garbay, “A knowledge-driven agent-centered framework for data mining in EMG,” Crit. Rev. Biol., vol. 325, pp. 375–382, 2002.
[21] P. C. Pendharkar, J. A. Rodger, G. J. Yaverbaum, N. Herman, and M. Benner, “Association, statistical, mathematical and neural approaches for mining breast cancer patterns,” Exper. Syst. with Applicat., vol. 17, pp. 223–232, 1999.
[22] S. P. Imberman, B. Domanski, and H. W. Thompson, “Using dependency/association rules to find indications for computed tomography in a head trauma dataset,” Artif. Intell. Med., vol. 26, pp. 55–68, Sep./Oct. 2002.
[23] L. Ma, F. C. Tsui, W. R. Hogan, M. M. Wagner, and H. Ma, “A framework for infection control surveillance using association rules,” in Proc. Am. Medical Information Association, 2003 Annu. Symp., 2003, pp. 410–414.
[24] S. Doddi, A. Marathe, S. S. Ravi, and D. C. Torney, “Discovery of association rules in medical data,” Med. Inf. Internet. Med., vol. 26, pp. 25–33, Jan.-Mar. 2001.
[25] S. M. Downs and M. Y. Wallace, “Mining association rules from a pediatric primary care decision support system,” in Proc. Am. Medical Information Association, 2000 Ann. Symp., 2000, pp. 200–204.
[26] D. Gamberger, N. Lavrac, and V. Jovanoski, “High confidence association rules for medical diagnosis,” in Proc. 4th Workshop Int. Data Analysis in Medicine and Pharmacology (IDAMAP 99), 1999, pp. 42–51.
[27] S. Stilou, P. D. Bamidis, N. Maglaveras, and C. Pappas, “Mining association rules from clinical databases: an intelligent diagnostic process in healthcare,” Medinfo, vol. 10, pp. 1399–1403, 2001.
[28] C. Ordonez, E. Omiecinski, L. DeBraal, C. Santana, N. Ezquerra, J. Toboada, D. Cooke, E. Krawczynska, and E. Garcia, “Mining constrained association rules to predict heart disease,” in Proc. IEEE Int. Conf. Data Mining, ICDM, 2001, pp. 433–440.
[29] J. Bourien, J. J. Bellanger, F. Bartolomei, P. Chauvel, and F. Wendling, “Mining reproducible activation patterns in epileptic intracerebral EEG signals: application to interictal activity,” IEEE Trans. Biomed. Eng., vol. 51, no. 2, pp. 304–315, Feb. 2004.
[30] S. Konias and N. Maglaveras, “A rule discovery algorithm appropriate for ECG signals,” Comput. Cardiol., vol. 31, pp. 57–60, 2004.
[31] M. Antonie, O. R. Zaoane, and A. Coman, “Associative classifiers for medical images,” in Lecture Notes in Artificial Intelligence, 2003, vol. 2797, Mining Multimedia and Complex Data, pp. 68–83.
[32] T. Dart, Y. Cui, G. Chatellier, and P. Degoulet, “Analysis of hospitalised patient flows using data-mining,” Stud. Health Technol. Inf., vol. 95, pp. 263–268, 2003.
[33] T. P. Exarchos, A. T. Tzallas, D. I. Fotiadis, S. Konitsiotis, and S. Giannopoulos, “A data mining based approach for the EEG transient event detection and classification,” in Proc. 18th IEEE Int. Symp. Computer-Based Medical Systems, Dublin, Ireland, 2005, pp. 35–40.
[34] U. M. Fayyad, G. Piatetsky-Shapiro, and P. Smyth, “From data mining to knowledge discovery: an overview,” in Advances in Knowledge Discovery and Data Mining. Cambridge, MA: AAAI Press/MIT Press, 1996, pp. 1–36.
[35] W. Frawley, G. Piatetsky-Shapiro, and C. Matheus, “Knowledge discovery in databases: an overview,” Artif. Intell. Mag., vol. 13, no. 3, pp. 57–70, 1992.
[36] M. Chen, J. Han, and P. Yu, “Data mining: an overview from a database perspective,” IEEE Trans. Knowl. Data Eng., vol. 8, no. 6, pp. 866–881, Dec. 1996.
[37] J. Han and M. Kamber, Data Mining Concepts and Techniques. San Francisco, CA: Morgan Kaufmann, 2001.
[38] R. Agrawal and R. Srikant, “Fast algorithms for mining association rules,” in Proc. 20th Int. Conf. Very Large Data Bases, Santiago, Chile, Sep. 1994, pp. 487–499.
[39] J. Han, J. Pei, and Y. Yin, “Mining frequent patterns without candidate generation,” in Proc. 2000 ACM SIGMOD Int. Conf. Management of Data, 2000, vol. 29, no. 2, pp. 1–12.
[40] J. Catlett, “On changing continuous attributes into ordered discrete attributes,” in Proc. 5th Eur. Working Session on Learning, 1991, pp. 164–178.
[41] J. Dougherty, R. Kohavi, and M. Sahami, “Supervised and unsupervised discretization of continuous features,” in Proc. 12th Int. Conf. Machine Learning, 1995, pp. 194–202.
[42] L. Breiman, J. H. Friedman, R. A. Olsen, and C. J. Stone, Classification and Regression Trees. Monterey, CA: Wadsworth & Brooks, 1984.
[43] W. Li, J. Han, and J. Pei, “CMAR: accurate and efficient classification based on multiple class-association rules,” in Proc. 2001 IEEE Int. Conf. Data Mining, San Jose, CA, Nov. 2001, pp. 369–376.
[44] X. Yin and J. Han, “CPAR: classification based on predictive association rules,” in Proc. 3rd SIAM Int. Conf. Data Mining (SDM’03), San Francisco, CA, May 2003, pp. 331–335.
[45] J. R. Quinlan and R. M. Cameron-Jones, “FOIL: a midterm report,” in Proc. 6th Eur. Conf. Machine Learning, Vienna, Austria, 1993, pp. 3–20.
[46] F. Coenen, 2004, The LUCS-KDD Group, Department of Computer Science, The University of Liverpool, UK [Online]. Available: http://www.cSc.liv.ac.uk/~frans/KDD/
[47] J. R. Quinlan, C4.5: Programs for Machine Learning. Los Altos, CA: Morgan Kaufmann, 1993.
[48] F. Coenen, P. Leng, and S. Ahmed, “Data structures for association rule mining: T-trees and P-trees,” IEEE Trans. Knowl. Data Eng., vol. 16, no. 6, pp. 774–778, Jun. 2004.
[49] European Society of Cardiology, Pisa, Italy, European ST-T database directory 1991.
[50] K. Daskalov, I. A. Dotsinsky, and I. I. Christov, “Developments in ECG acquisition, preprocessing, parameter measurement, and recording,” IEEE Eng. Med. Biol., vol. 17, no. 2, pp. 50–58, Mar./Apr. 1998.
[51] A. Taddei, A. Benassi, M. G. Bongiorni, C. Contini, G. Distante, L. Landucci, M. G. Mazzei, P. Pisani, N. Roggero, M. Varanini, and C. Marchesi, “ST-T changes analysis in ECG ambulatory monitoring: a European standard for performance evaluation,” Comput. Cardiol., pp. 63–68, 1988.
[52] R. Kohavi and M. Sahami, “Error-based and entropy-based discretization of continuous features,” in Proc. 2nd Int. Conf. Knowl. Data Mining, 1996, pp. 114–119.

Themis P. Exarchos was born in Ioannina, Greece, in 1980. He received the Diploma degree in computer engineering and informatics from the University of Patras, Patras, Greece, in 2003. He is currently working toward the Ph.D. degree in medical physics at the University of Ioannina. His research interests include medical data mining, decision support systems in healthcare, and biomedical applications.
Costas Papaloukas was born in Ioannina, Greece, in 1974. He received the diploma degree in computer science and the Ph.D. degree in biomedical technology from the University of Ioannina in 1997 and 2001, respectively. He is a Lecturer of Bioinformatics with the Department of Biological Applications and Technology, University of Ioannina. His research interests include biomedical engineering and bioinformatics.
Dimitrios I. Fotiadis (M’01) was born in Ioannina, Greece, in 1961. He received the Diploma degree in chemical engineering from National Technical University of Athens, Athens, Greece, and the Ph.D. degree in chemical engineering from the University of Minnesota, Twin Cities. Since 1995, he has been with the Department of Computer Science, University of Ioannina, where he is currently an Associate Professor. He is Director of the Unit of Medical Technology and Intelligent Information Systems. His research interests include biomedical technology, biomechanics, scientific computing, and intelligent information systems.
Lampros K. Michalis was born in Arta, Greece, in 1960. He graduated with distinction from the Medical School, University of Athens, Athens, Greece, in 1984. He received the M.D. degree (with distinction) from Athens Medical School in 1989. Since 1995, he has been with the Medical School, University of Ioannina, Ioannina, Greece, where he is currently a Professor of Cardiology. He is in charge of the Coronary Care Unit and the Catheter Laboratory, University Hospital, Medical School, University of Ioannina. His research interests focus on bioengineering and interventional cardiology.