Biomedical Signal Processing and Control 5 (2010) 252–263
Classification of electrocardiogram signals with support vector machines and genetic algorithms using power spectral features
A. Khazaee∗, A. Ebrahimzadeh
Faculty of Electrical and Computer Engineering, Babol University of Technology, Iran
Article info
Article history: Received 22 February 2010; accepted 26 July 2010; available online 21 August 2010
Keywords: ECG beat classification; SVM; Genetic algorithm; Parameter optimization; Non-parametric PSD estimation methods; Multitaper method
Abstract
This paper proposes a new power spectral-based hybrid genetic algorithm-support vector machine (SVMGA) technique to classify five types of electrocardiogram (ECG) beats, namely normal beats and four manifestations of heart arrhythmia. The method employs three modules: a feature extraction module, a classification module and an optimization module. The feature extraction module extracts spectral features and three timing interval features from the ECG. Non-parametric power spectral density (PSD) estimation methods are used to extract the spectral features. A support vector machine (SVM) is employed as the classifier to recognize the ECG beats. We investigate and compare two approaches to setting the classifier parameters, namely the Gaussian radial basis function (GRBF) kernel parameter and the penalty parameter C of the SVM. In the first approach, the parameters are specified experimentally by trial and error. In the second, they are optimized by a genetic algorithm. The performance of both approaches in classifying ECG signals is evaluated on eight files obtained from the MIT–BIH arrhythmia database. The classification accuracy of the SVMGA approach proves superior to that of the SVM with constant, manually selected parameters. © 2010 Elsevier Ltd. All rights reserved.
∗ Corresponding author: A. Khazaee.
1746-8094/$ – see front matter © 2010 Elsevier Ltd. All rights reserved. doi:10.1016/j.bspc.2010.07.006
1. Introduction
An arrhythmia is any abnormal cardiac rhythm [1]. Heart arrhythmias result from any disturbance in the rate, regularity, and site of origin or conduction of the cardiac electric impulse [2]. Classification of arrhythmia is an important step in developing devices for monitoring the health of individuals. The sequence of electrical signals of the heart provides symptomatic information for classifying cardiac arrhythmias. Classification of normal and abnormal beats requires offline analysis of the ECG record data. This paper investigates the detection and classification of ECG arrhythmias.
In the literature, several methods have been proposed for the automatic classification of ECG signals. Among the most recently published works are those presented in [3–20]. These works are a clear indication of research maturation in the field of automatic ECG classification. However, there are still some open issues in the design of an ECG classification system, and addressing them may lead to more robust and efficient classifiers. One of these issues is related to the choice of the classification approach. In particular, the SVM approach does not seem to have received the attention it deserves in the ECG classification literature despite its great potential. Indeed, the SVM classifier exhibits a
promising generalization capability, thanks to the maximal margin principle (MMP) it is based upon [21]. Another important property is that it is less sensitive to the curse of dimensionality than traditional classification approaches. This is because the MMP makes it unnecessary to estimate explicitly the statistical distributions of the classes in the hyper-dimensional feature space in order to carry out the classification task. Thanks to these properties, the SVM classifier has proved successful in a number of different application fields. Turning back to ECG classification, another issue that needs to be addressed is that the best free parameters of the adopted classifier are generally selected empirically (the model selection issue).
In this paper, we propose an automated method for differentiating normal heartbeats (N) from left bundle branch block (LBBB or L), right bundle branch block (RBBB or R), atrial premature contraction (APC or A) and premature ventricular contraction (PVC or V) heartbeats [1]. Spectral feature extraction is used in combination with temporal features. As mentioned, the SVM classifier is used because of its popularity in various classification problems in recent years. One of the strengths of this study is the use of the search capability of genetic algorithms to find optimum values of the SVM parameters (model selection). Two values must be optimized: the soft margin penalty parameter C of the support vector machine, which is a positive integer, and the Gaussian radial basis function (GRBF) kernel parameter σ, which is a positive real number. In our proposed power spectral-based hybrid
genetic algorithm-support vector machine (SVMGA) method, the values of the C and σ parameters of the SVM classifier are specified by genetic algorithms.
The paper is organized as follows. Section 2 describes non-parametric power spectral density (PSD) estimation and the feature extraction module. Sections 3 and 4 explain support vector machines (SVM) and genetic algorithms (GA), respectively. Section 5 presents our proposed SVMGA method. Section 6 describes the database and performance metrics. Section 7 shows some simulation results. Section 8 discusses the results, and finally Section 9 concludes the paper.
2. Feature extraction
Power spectrum estimation is perhaps the most widely used method of signal analysis. The power spectrum is related to the correlation function through the Fourier transform: the power spectral density (PSD) of a stationary random process is the discrete-time Fourier transform of its correlation sequence. The goal of spectral estimation is to describe the distribution (over frequency) of the power contained in a signal, based on a finite set of data. In general, the more correlated or predictable a signal, the more concentrated its power spectrum, and conversely the more random or unpredictable a signal, the more spread its power spectrum. Therefore the power spectrum of a signal can be used to deduce the existence of repetitive structures or correlated patterns in the signal process. Such information is crucial in detection, estimation, data forecasting and decision-making problems, and in systems analysis. There are various methods of spectrum estimation, which are categorized as follows: non-parametric power spectrum estimation, model-based power spectrum estimation and high-resolution spectral estimation based on subspace eigen-analysis [22].
Some algorithms from each category were examined, and non-parametric power spectrum estimation methods were consequently chosen for this ECG beat classification problem because of their high classification performance.
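The relation between the PSD and the correlation sequence described in this section can be written out explicitly. The following is a sketch in standard notation; the symbols $r_{xx}(k)$ for the correlation sequence and $\sigma_x^2$ for the noise variance are introduced here for illustration and are not taken from the original text.

```latex
% PSD of a real-valued stationary process as the discrete-time Fourier
% transform of its correlation sequence r_xx(k) (assumed notation)
P_{XX}(f) = \sum_{k=-\infty}^{\infty} r_{xx}(k)\, e^{-j 2\pi f k},
\qquad r_{xx}(k) = E\left[ x(m)\, x(m+k) \right]

% Limiting case: white noise is uncorrelated, so its spectrum is flat
r_{xx}(k) = \sigma_x^{2}\, \delta(k) \;\Longrightarrow\; P_{XX}(f) = \sigma_x^{2}
```

The white-noise case illustrates the statement that an unpredictable signal has a spread-out spectrum, whereas a strongly correlated signal concentrates its power in a few frequencies.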
2.1. Non-parametric power spectrum estimation methods
Non-parametric methods are those in which the PSD is estimated directly from the signal itself. The simplest such method is the periodogram. An improved version of the periodogram is Welch's method [23]. A more modern non-parametric technique is the multitaper method (MTM).
The classic method for estimating the power spectral density of an N-sample record is the periodogram, introduced by Sir Arthur Schuster in 1891. The periodogram is defined as

$$\hat{P}_{XX}(f) = \frac{1}{N}\left|\sum_{m=0}^{N-1} x(m)\, e^{-j2\pi f m}\right|^{2} = \frac{1}{N}\left|X(f)\right|^{2} \tag{1}$$

where x(m) is a signal with N samples and X(f) is the Fourier transform of x(m). The power spectral density function, or power spectrum for short, defined in Eq. (1), is the basis of non-parametric methods of spectral estimation. Owing to the finite length and the random nature of most signals such as the ECG, the spectra obtained from different records of a signal vary randomly about an average spectrum. In statistical terms, the periodogram is not a consistent estimator of the PSD, and a number of methods have been developed to reduce its variance. Nevertheless, the periodogram can be a useful tool for spectral estimation in situations where the SNR is high, and especially if the data record is long.
The modified periodogram windows the time-domain signal prior to computing the FFT in order to smooth the edges of the signal. This reduces the height of the side-lobes, or spectral leakage. Side-lobes can be interpreted as spurious frequencies introduced into the signal by the abrupt truncation that occurs when a rectangular window is used. For nonrectangular windows, the end points of the truncated signal are attenuated smoothly, and hence the spurious frequencies introduced are much less severe. On the other hand, nonrectangular windows also broaden the main-lobe, which results in a net reduction of resolution.
Nonrectangular windowing affects the average power of a signal because some of the time samples are attenuated when multiplied by the window. To compensate for this, the periodogram PSD and the Welch PSD normalize the window to have an average power of unity. This ensures that the measured average power is generally independent of window choice. If the frequency components are not well resolved by the PSD estimators, however, the window choice does affect the average power.
Welch's method [23] averages modified periodograms computed from overlapped and windowed segments. In this method, a signal x(m) of length M samples is divided into K overlapping segments of length N, and each segment is windowed prior to computing the modified periodogram. The ith segment is defined as

$$x_i(m) = x(m + iD), \qquad m = 0, \ldots, N-1, \quad i = 0, \ldots, K-1 \tag{2}$$

where D is the offset between the starting points of successive segments. For half-overlap D = N/2, while D = N corresponds to no overlap. For the ith windowed segment, the modified periodogram is given by

$$\hat{P}^{(i)}_{XX}(f) = \frac{1}{NU}\left|\sum_{m=0}^{N-1} w(m)\, x_i(m)\, e^{-j2\pi f m}\right|^{2} \tag{3}$$

where w(m) is the window function and U is the power in the window function, given by

$$U = \frac{1}{N}\sum_{m=0}^{N-1} w^{2}(m) \tag{4}$$
The spectrum of a finite-length signal typically exhibits side-lobes due to discontinuities at the endpoints. The window function w(m) alleviates the discontinuities and reduces the spread of the spectral energy into the side-lobes of the spectrum. The Welch power spectrum is the average of K modified periodograms obtained from overlapped and windowed segments of a signal [22]:

$$\hat{P}^{W}_{XX}(f) = \frac{1}{K}\sum_{i=0}^{K-1} \hat{P}^{(i)}_{XX}(f) \tag{5}$$
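The modified periodogram and Welch estimates in Eqs. (3)–(5) can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: it evaluates the spectrum only on the DFT grid f = k/N with a naive O(N²) transform, and the Hann window, the function names and the `step` argument (the segment offset D) are assumptions made for the example.

```python
import cmath
import math


def modified_periodogram(x, w):
    """Eq. (3) on the DFT grid f = k/N: windowed periodogram of one segment."""
    n = len(x)
    u = sum(wm * wm for wm in w) / n          # window power U, Eq. (4)
    psd = []
    for k in range(n):
        acc = sum(w[m] * x[m] * cmath.exp(-2j * math.pi * k * m / n)
                  for m in range(n))
        psd.append(abs(acc) ** 2 / (n * u))
    return psd


def hann(n):
    """Periodic Hann window (an illustrative choice, not taken from the paper)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * m / n) for m in range(n)]


def welch(x, seg_len, step, window):
    """Eq. (5): average the modified periodograms of x_i(m) = x(m + i*step)."""
    w = window(seg_len)
    starts = range(0, len(x) - seg_len + 1, step)
    spectra = [modified_periodogram(x[s:s + seg_len], w) for s in starts]
    return [sum(col) / len(spectra) for col in zip(*spectra)]
```

With an all-ones window, U = 1 and `modified_periodogram` reduces to the plain periodogram of Eq. (1). Calling `welch(x, 64, 32, hann)` corresponds to the half-overlap case D = N/2 with N = 64.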
In 1982, Thomson published the seminal paper where the multitaper spectral analysis method was first presented. It has since become a classic approach, and has been applied in multiple fields in the physical sciences, medicine and economics. The multitaper method is in principle not any different to other non-parametric direct spectral estimates. The data sequence to be analyzed is multiplied by a series of weights called tapers, the result is then Fourier transformed (using an FFT) and squared to obtain the estimate of the PSD. In the multitaper method, as its name suggests, a set of orthogonal tapers are used to compute many independent estimates of the PSD. The approximately independent spectral estimates are then averaged to achieve maximum suppression of random variability. This is accomplished in traditional methods by a frequency domain smoothing.
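The taper-transform-square-average recipe described above can be sketched as follows. Note the substitutions: the paper uses Slepian tapers with adaptive weighting, whereas this sketch swaps in sine tapers, which are also orthonormal and have a closed form, and takes a plain unweighted average. The function names are hypothetical, and the spectrum is evaluated only on the DFT grid with a naive transform.

```python
import cmath
import math


def sine_tapers(n, k):
    """k orthonormal sine tapers of length n (stand-in for Slepian tapers)."""
    scale = math.sqrt(2.0 / (n + 1))
    return [[scale * math.sin(math.pi * (j + 1) * (m + 1) / (n + 1))
             for m in range(n)]
            for j in range(k)]


def multitaper_psd(x, k):
    """Unweighted average of k single-taper spectra |X_j(f)|^2 at f = i/n."""
    n = len(x)
    psd = [0.0] * n
    for v in sine_tapers(n, k):
        for i in range(n):
            xj = sum(x[m] * v[m] * cmath.exp(-2j * math.pi * i * m / n)
                     for m in range(n))
            psd[i] += abs(xj) ** 2 / k
    return psd
```

Because each taper yields an approximately independent spectrum estimate, averaging more of them lowers the variance at the cost of a wider effective main-lobe, which is the trade-off the time-bandwidth product controls in the Slepian case.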
As mentioned above, the data sequence is multiplied by a number of weights or tapers. These tapers (Slepian tapers) are selected to optimally minimize broad-band bias, the tendency for power from strong peaks to spread into neighboring frequency intervals of lower power (also known as spectral leakage). Each of the tapered copies of the data is Fourier transformed and a weighted average is computed to obtain a low-variance result while maintaining a high-resolution estimate. In practice only a few tapers need to be computed, depending on the resolution of the spectrum desired. The user chooses a bandwidth W over which the spectrum is smoothed, thus fixing, for an N-long sequence, the value NW known as the time-bandwidth product. The standard number of tapers K that need to be computed is K = 2NW − 1, although this is left for the user to decide and will depend on the particular study or type of data available.
The time-bandwidth product NW is thus the parameter with which the MTM balances variance and resolution, and it is directly related to the number of tapers used to compute the spectrum: 2NW − 1 tapers are used to form the estimate. This means that, as NW increases, there are more estimates of the power spectrum, and the variance of the estimate decreases. However, the bandwidth of each taper is also proportional to NW, so as NW increases, each estimate exhibits more spectral leakage (i.e. wider peaks) and the overall spectral estimate is more biased. For each data set, there is usually a value of NW that allows an optimal trade-off between bias and variance. For the kth Slepian taper $v_k$, we have

$$X_k(f) = \sum_{m=0}^{N-1} x(m)\, v_k(m)\, e^{-2\pi i f m} \tag{6}$$

the kth eigencomponent, which is the complex-valued Fourier transform of the N-sample data sequence x(m) after being multiplied by $v_k(m)$. Here we assume unit sampling. Note that $|X_k(f)|^2$ is a standard single-taper spectrum estimate. Following the adaptive weighting, we iteratively solve the following equations to obtain the weights $d_k$ and the multitaper spectrum estimate $P^{MTM}_{XX}(f)$:

$$d_k(f) = \frac{\sqrt{\lambda_k}\, P^{MTM}_{XX}(f)}{\lambda_k P^{MTM}_{XX}(f) + (1 - \lambda_k)\,\sigma^{2}} \tag{7}$$

$$P^{MTM}_{XX}(f) = \frac{\sum_{k=0}^{K-1} d_k^{2}\, \left|X_k(f)\right|^{2}}{\sum_{k=0}^{K-1} d_k^{2}} \tag{8}$$

where $\sigma^2$ is the variance of the signal x(m) and $\lambda_k$ is the kth eigenvalue associated with the Slepian taper $v_k$.
However, the MTM is more computationally expensive than the periodogram, modified periodogram and Welch methods. It is outside the scope of this paper to explain the weighting procedure. A good treatment of this topic can be found in [24] and references therein.
2.2. Computation of feature vectors using ECG spectral and temporal characteristics
Feature extraction plays an important role in any classification task. In this work, based on extensive research, we have used a balanced combination of spectral and timing features. The non-parametric PSD estimation methods provide sufficient resolution to estimate the sinusoids from the data. The periodogram, modified periodogram, Welch and multitaper methods were employed to obtain PSDs of the ECG signals. Using the frequency estimates provided by any one of these methods, the power levels of the signal can be determined from the power matrix. A rectangular window of 256 discrete data points was selected so that it contained a single ECG beat. For each beat, the 129 points of the logarithm of the power levels of the PSD were computed. The optimal value of the time-bandwidth parameter NW of the MTM was found experimentally to be 10. The sampling frequency of the ECG signals is 360 Hz.
The time duration between beats contains useful information about their types, so we define three features called nxtRR, prevRR and ratRR. These are the time interval to the next beat (nxtRR), the time interval from the previous beat (prevRR), and the ratio between the two (ratRR), respectively.
R-peaks and QRS complexes can be detected with high accuracy by existing methods, and the MIT–BIH database provides annotation files that specify the sample numbers of the R-peaks in the ECG records in order to speed up further developments in ECG studies. In this study, the ECG waveform and power spectral features were therefore extracted by selecting a window of −300 ms to 400 ms around the R wave as found in the database annotation. In online applications, however, an R-wave detector is required to initialize our computer-aided ECG classification process. To reduce the DC offset and eliminate the amplitude variance from file to file, the 252-sample vectors were normalized to a mean of zero and a standard deviation of unity.
As stated, in addition to the PSD-based spectral features, we extracted three local timing features that add to the discriminating power of the morphology-based features, especially in discriminating morphologically similar heartbeat patterns. They are an R–R time interval ratio (IR) and two R–R time intervals. The IR feature reflects the deviation from a constant beat rate and is defined as:

$$IR_i = \frac{T_i - T_{i-1}}{T_{i+1} - T_i} \tag{9}$$
where $T_i$ represents the time at which the R-wave for beat i occurs. The local RR-interval ratio provides a convenient differentiator between normal beats ($IR_i \approx 1$) and PVC beats ($IR_i < 1$), and is normalized by definition ($IR_i = 1$) at constant rate. The two other timing features are the next and previous R–R time intervals for each heartbeat. Each feature vector is then normalized to have zero mean and unit variance. The length of the final feature vectors is 132.

3. Support vector machine (SVM)
SVM is a supervised machine learning method. SVM uses the structural risk minimization (SRM) principle, whereas ANNs use empirical risk minimization (ERM) to minimize the training data error [25,26]. SVM performs classification tasks by constructing an optimal separating hyper-plane (OSH), which maximizes the margin between the two nearest data points belonging to two separate classes (Fig. 1).
Fig. 1. Separation of two classes by SVM.
Suppose the training set $(x_i, y_i)$, $i = 1, 2, \ldots, l$, $x \in R^d$, $y \in \{-1, +1\}$ can be separated by the hyper-plane $w^T x + b = 0$, where w is the weight vector and b is the bias. If this hyper-plane maximizes the margin, then the following inequality is valid for all input data:

$$y_i(w^T x_i + b) \ge 1, \qquad \text{for all } x_i,\ i = 1, 2, \ldots, l \tag{10}$$

The margin of the hyper-plane is $2/\|w\|$. Thus, the problem is to maximize the margin by minimizing $\|w\|^2$ subject to (10). This is a convex quadratic programming (QP) problem, and Lagrange multipliers ($\alpha_i$, $i = 1, \ldots, l$; $\alpha_i \ge 0$) are used to solve it:

$$L_P = \frac{1}{2}\|w\|^{2} - \sum_{i=1}^{l} \alpha_i \left[ y_i(w^T x_i + b) - 1 \right] \tag{11}$$

After minimizing $L_P$ with respect to w and b, the optimal weights are given by:

$$w^{*} = \sum_{i=1}^{l} \alpha_i^{*} y_i x_i \tag{12}$$

The dual of the problem is given by [25]:

$$L_d = \sum_{i=1}^{l} \alpha_i - \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j\, x_i^T x_j \tag{13}$$

To find the OSH, $L_d$ must be maximized under the constraint $\sum_{i=1}^{l} \alpha_i y_i = 0$. The Lagrange multipliers are non-zero ($\alpha_i \neq 0$) only when $y_i(w^T x_i + b) = 1$. The training points for which the equality in (10) holds, and which therefore satisfy $\alpha_i \neq 0$, are called support vectors (SV). The optimal bias is given by:

$$b^{*} = y_i - w^{*T} x_i \tag{14}$$

for any support vector $x_i$. The optimal decision function (ODF) is then given by:

$$f(x) = \operatorname{sgn}\left( \sum_{i=1}^{l} y_i \alpha_i^{*}\, x^T x_i + b^{*} \right) \tag{15}$$

where the $\alpha_i^{*}$ are the optimal Lagrange multipliers. For input data with a high noise level, SVM uses soft margins, which can be expressed as follows with the introduction of the non-negative slack variables $\xi_i$, $i = 1, \ldots, l$:

$$y_i(w^T x_i + b) \ge 1 - \xi_i, \qquad i = 1, 2, \ldots, l \tag{16}$$

To obtain the OSH, one minimizes $\Phi = \frac{1}{2}\|w\|^{2} + C\sum_{i=1}^{l} \xi_i^{k}$ subject to (16), where C is the penalty parameter, which controls the tradeoff between the complexity of the decision function and the number of misclassified training examples. In the nonlinearly separable cases, the SVM maps the training points nonlinearly to a high-dimensional feature space using a kernel function $K(x_i, x_j)$, where linear separation may be possible (see [27] for more information). The kernel functions of SVMs are as follows:

$$\text{linear:} \quad K(x_i, x_j) = x_i \cdot x_j \tag{17a}$$

$$\text{Gaussian radial basis function (GRBF):} \quad K(x_i, x_j) = \exp\left(-\|x_i - x_j\|^{2} / 2\sigma^{2}\right) \tag{17b}$$

$$\text{polynomial:} \quad K(x_i, x_j) = (x_i \cdot x_j + 1)^{d} \tag{17c}$$

$$\text{sigmoid:} \quad K(x_i, x_j) = \tanh(\kappa\, x_i \cdot x_j + \theta) \tag{17d}$$

where $\sigma$, d, $\kappa$ and $\theta$ are the parameters of the kernel functions [26]. After a kernel function is selected, the QP problem becomes:

$$L_d = \sum_{i=1}^{l} \alpha_i - \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j\, K(x_i, x_j) \tag{18}$$

and the $\alpha_i^{*}$ are derived by:

$$\alpha^{*} = \arg\max_{\alpha} L_d, \qquad 0 \le \alpha_i \le C,\ i = 1, 2, \ldots, l; \qquad \sum_{i=1}^{l} \alpha_i y_i = 0 \tag{19}$$

After training, the decision function becomes:

$$f(x) = \operatorname{sgn}\left( \sum_{i=1}^{l} y_i \alpha_i^{*}\, K(x, x_i) + b^{*} \right) \tag{20}$$

The performance of SVM can be controlled through the term C and the kernel parameter σ, which are called hyper-parameters. These parameters influence the number of support vectors and the margin of the SVM. In this paper we use an optimization algorithm, namely GA, to select the optimal values of these parameters. The next sections describe this optimization method.

4. Genetic algorithms
Fig. 2 illustrates the operation of a general genetic algorithm.
Fig. 2. The operation of a generic GA.
In GA, a candidate solution for a specific problem is called an individual or a chromosome and consists of a linear list of genes. Each individual represents a point in the search space, and hence a possible solution to the problem. A population consists of a finite number of individuals. Each individual is assessed by an evaluating mechanism to obtain its fitness value. Based on this fitness value and undergoing genetic operators, a new population is generated iteratively with each successive population referred to as a
generation. During each generation, three basic genetic operators are sequentially applied to each individual with certain probabilities, i.e. selection, crossover (recombination), and mutation. First, a number of best-fit individuals are selected based on a user-defined fitness function. The remaining individuals are discarded. Next, a number of individuals are selected and paired with each other. Each individual pair produces one offspring by partially exchanging their genes around one or more randomly selected crossing points. At the end, a certain number of individuals are selected and the mutation operations are applied, i.e. a randomly selected gene of an individual abruptly changes its value. The GA is called a population-based technique because instead of operating on a single potential solution, it uses a population of potential solutions. The larger the population, the greater the diversity of the members of the population, and the larger the area searched by the population.
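The generational loop described above can be sketched as follows. This is a generic illustration under stated assumptions rather than the configuration used in the paper: the operators are passed in as functions, selection is fitness-proportional, one elite individual is carried over unchanged so that the best fitness never decreases, and the toy problem and all function names are hypothetical.

```python
import random


def run_ga(fitness, init, crossover, mutate, pop_size=15, generations=40, seed=0):
    """Generic GA loop; returns (best individual, best fitness per generation)."""
    rng = random.Random(seed)
    population = [init(rng) for _ in range(pop_size)]
    best_history = []
    elite = None
    for _ in range(generations):
        scores = [fitness(ind) for ind in population]
        elite = population[scores.index(max(scores))]
        best_history.append(max(scores))
        # Selection: parents drawn with probability proportional to fitness.
        parents = rng.choices(population, weights=scores, k=pop_size)
        # Crossover followed by mutation yields pop_size - 1 offspring.
        offspring = [mutate(crossover(parents[i], parents[i + 1], rng), rng)
                     for i in range(pop_size - 1)]
        # Elitism (an assumption of this sketch): keep the best individual.
        population = offspring + [elite]
    return elite, best_history


# Toy problem (hypothetical): maximise a positive fitness peaked at x = 3.
def toy_fitness(x):
    return 1.0 / (1.0 + (x - 3.0) ** 2)


def toy_init(rng):
    return rng.uniform(-10.0, 10.0)


def toy_crossover(a, b, rng):
    return 0.5 * (a + b)


def toy_mutate(x, rng):
    return x + rng.gauss(0.0, 0.5) if rng.random() < 0.05 else x
```

Calling `run_ga(toy_fitness, toy_init, toy_crossover, toy_mutate)` returns the best individual of the last evaluated generation together with the per-generation best fitness. Fitness-proportional selection requires strictly positive fitness values, which the toy fitness guarantees.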
5. Proposed method
The free parameters C and σ greatly affect the classification accuracy of the SVM model. However, it is not known beforehand which values of the parameters are appropriate. Therefore, GA is used to search for better combinations of the parameters of the SVM. Based on the Darwinian principle of 'survival of the fittest', GA searches for the optimal solution over a series of iterative computations. Fig. 3 presents the whole process of the ECG beat classification method implemented in the paper. The blocks of Fig. 3 that obtain the ECG records from the database, extract beats from the records using annotation files and windowing, normalize the beats and extract the features were explained in previous sections. The following describes the structure and algorithm of the SVMGA approach.
A chromosome of SVMGA contains a value of the kernel parameter and a value of the C parameter. Therefore, a chromosome of SVMGA consists of 2 genes, one for each parameter. The initial population of the genetic algorithm consists of 15 chromosomes. The first segment of a chromosome represents the value of the C parameter, which is an integer between 1 and 1,000,000. The second segment represents the GRBF kernel scaling factor σ, which is a positive number. When a chromosome is selected, the values of σ and C that it represents are sent to the SVM classifiers, which are trained and tested with these parameters. The classification accuracy is then computed and taken as the fitness value of the chromosome.
5.1. Selection
In this study, the roulette wheel selection scheme, which is a stochastic algorithm, is used. This technique is analogous to a roulette wheel with each slice proportional in size to the fitness. The algorithm is repeated until the desired number of individuals (called the mating population) is obtained. In this step we select 15 individuals (chromosomes).
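Roulette wheel selection can be sketched as follows. This is a minimal illustration, not the authors' code: the function name and the use of a cumulative-sum lookup are assumptions, and fitness values are taken to be non-negative with a positive total.

```python
import bisect
import itertools
import random


def roulette_wheel_select(fitnesses, n_select, rng):
    """Return indices of n_select individuals, each drawn with probability
    proportional to its fitness (non-negative fitnesses, positive total)."""
    cumulative = list(itertools.accumulate(fitnesses))
    total = cumulative[-1]
    picks = []
    for _ in range(n_select):
        spin = rng.random() * total                    # pointer position on the wheel
        idx = bisect.bisect_right(cumulative, spin)    # slice containing the pointer
        picks.append(min(idx, len(fitnesses) - 1))     # guard against rounding at the edge
    return picks
```

For example, `roulette_wheel_select(accuracies, 15, random.Random(0))` would draw a mating population of 15 indices, with repeats allowed, so chromosomes with higher classification accuracy tend to appear more often.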
5.2. Recombination (crossover) operation In this study, the intermediate recombination scheme is used [28]. Intermediate recombination is a method only applicable to real variables (and not binary variables). Here the variable values of the offspring are chosen somewhere around and between the variable values of the parents.
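Intermediate recombination can be sketched as follows, assuming the usual rule in which each offspring variable is a·p1 + (1 − a)·p2 with a fresh scaling factor a drawn uniformly from [−d, 1 + d] for every variable. The function name and the list-based chromosome layout [C, σ] are illustrative assumptions.

```python
import random


def intermediate_recombination(parent1, parent2, d=0.25, rng=random):
    """Offspring variable = a*p1 + (1 - a)*p2, with a new a ~ U[-d, 1 + d]
    drawn for each variable."""
    child = []
    for p1, p2 in zip(parent1, parent2):
        a = rng.uniform(-d, 1.0 + d)
        child.append(p1 * a + p2 * (1.0 - a))
    return child
```

With parents `[100.0, 0.5]` and `[1000.0, 2.0]`, each offspring value falls between the parents' values extended by 25% of their distance on either side. An offspring can therefore leave the valid range, which is why the paper corrects C and σ to be positive, and C to be an integer, at the end of each generation.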
Fig. 3. The algorithm of SVMGA approach used for the problem of ECG beat classification.
Offspring are produced according to the rule:

$$Var_i^{O} = Var_i^{P_1}\, a_i + Var_i^{P_2}\,(1 - a_i), \qquad i \in (1, 2, \ldots, N_{var}) \tag{21}$$

with $a_i \in [-d, 1 + d]$ uniform at random, d = 0.25, and a new $a_i$ for each i. Here a is a scaling factor chosen uniformly at random over the interval [−d, 1 + d] for each variable anew. The value of the parameter d defines the size of the area for possible offspring. With d = 0 the area for offspring is the same as the area spanned by the parents; this method is called (standard) intermediate recombination. Because most variables of the offspring are not generated on the border of the possible area, the area for the variables shrinks over the generations. This shrinkage occurs just by using (standard) intermediate recombination, and it can be prevented by using a larger value for d. A value of d = 0.25 ensures (statistically) that the variable area of the offspring is the same as the variable area spanned by the variables of the parents [28]. In this study d = 0.25 is used. Fig. 4 shows the area of the variable range of the offspring defined by the variables of the parents.
Fig. 4. Area for variable value of offspring compared to parents in intermediate recombination [28].
A portion of 85% of the 15 chromosomes obtained in the previous step is randomly selected and subjected to the crossover operator; that is, 12 chromosomes undergo crossover (0.85 × 15 = 12.75, rounded down). At the end of the crossover operations, 12 new chromosomes are obtained.
5.3. Mutation operation
Each gene undergoes mutation with a fixed probability of 5%. For a binary representation of chromosomes, a bit position (or gene) is mutated by simply flipping its value. Since we are considering a real-valued representation in this article, we use the following mutation. A number δ in the range [0,1] is generated with uniform distribution. If the value at a gene position is v, after mutation it becomes

$$v \pm \delta \cdot v, \qquad v \neq 0, \tag{22}$$

$$v + \delta, \qquad v = 0. \tag{23}$$

The '+' or '−' sign occurs with equal probability. The mutation is constrained so that its result is not a negative value [29].
At the end of the mutation operation, 2 new chromosomes are obtained. After the crossover step and this step, the population of the next generation contains a total of 14 chromosomes.
5.4. Elitism
We have implemented elitism at each generation by preserving the best string seen up to that generation in the next population. This prevents the loss of good chromosomes and keeps the diversity of the population. Thus, on termination, the fifteenth chromosome is guaranteed to contain the best solution encountered over all generations of the genetic algorithm. We also implemented a second form of elitism that does not contribute to the next generation: an external population of the same size is constructed that keeps the best chromosomes of all previous generations. This external population is compared with the current population in each generation and is updated with better chromosomes. Since the C and σ parameters are positive and the C parameter is an integer, we corrected the population at the end of each generation for the next one.
6. Database and performance metrics
6.1. MIT–BIH arrhythmia database
The MIT–BIH arrhythmia database [30] was used as the data source in this study. The database contains 48 recordings. Each has a duration of 30 min and includes two leads: the modified limb lead II and one of the modified leads V1, V2, V4 or V5. The sampling frequency is 360 Hz, the data are bandpass filtered at 0.1–100 Hz and the resolution is 200 samples per mV. Twenty-three of the recordings are intended to serve as a representative sample of routine clinical recordings and 25 recordings contain complex ventricular, junctional, and supraventricular arrhythmias. There are over 109,000 labeled ventricular beats from 15 different heartbeat types. The number of examples differs greatly between heartbeat types. The largest class is "Normal beat" with about 75,000 examples and the smallest class is "Supraventricular premature beat" (SP) with just two examples. The database is indexed both in timing information and beat classification. For more details about the MIT–BIH arrhythmia database see [31]. We used a total of 8 records (see Table 1 for the list) from the database.
We used the database index files to locate beats in the ECG signals.

Table 1 Record identification number and details of selected beats from each record and each beat type.

| Record | Normal (all) | Normal (selected) | LBBB (all) | LBBB (selected) | RBBB (all) | RBBB (selected) | APC (all) | APC (selected) | PVC (all) | PVC (selected) | All |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 118 | 0 | 0 | 0 | 0 | 2166 | 458 | 96 | 88 | 16 | 8 | 2278 |
| 124 | 0 | 0 | 0 | 0 | 1531 | 324 | 2 | 2 | 47 | 20 | 1580 |
| 207 | 0 | 0 | 1457 | 337 | 86 | 18 | 107 | 99 | 105 | 44 | 1755 |
| 208 | 1586 | 153 | 0 | 0 | 0 | 0 | 0 | 0 | 992 | 420 | 2578 |
| 209 | 2621 | 252 | 0 | 0 | 0 | 0 | 383 | 353 | 1 | 0 | 3005 |
| 214 | 0 | 0 | 2003 | 463 | 0 | 0 | 0 | 0 | 256 | 108 | 2259 |
| 222 | 2062 | 199 | 0 | 0 | 0 | 0 | 208 | 192 | 0 | 0 | 2270 |
| 223 | 2029 | 196 | 0 | 0 | 0 | 0 | 72 | 66 | 473 | 200 | 2574 |
| All | 8298 | 800 | 3460 | 800 | 3783 | 800 | 868 | 800 | 1890 | 800 | 18,299 |

LBBB: left bundle branch block; RBBB: right bundle branch block; APC: atrial premature contraction; PVC: premature ventricular contraction.

6.2. Performance metrics
Various approaches are adopted in the literature to evaluate classifier performance. In this study, we have considered four statistical indices: accuracy (Acc), sensitivity (Se), specificity (Sp),
88.93 86.93 88.67 91.33 96.33 96.43 96.57 97.53 99.57 99.13 99.50 99.53 97.73 99.33 99.20 97.87 94.48 96.45 91.61 95.33 98.57 99.10 97.80 98.80 98.13 97.73 96.13 98.00 88.03 89.79 90.62 89.49
Sp LBBB
Se
92.64 92.93 92.51 93.97 Periodogram Modified periodogram Welch Multitaper
Sp
Pp Normal
Se
Acc
and positive predictivity (Pp), which are defined in Eqs. (24)–(27), respectively. Accuracy is usually the most important metric for judging overall system performance. We define the overall accuracy of the classifier for each file as

\[ \mathrm{Acc} = \frac{N_T - N_E}{N_T} \times 100 \tag{24} \]

where Acc is the accuracy, and N_E and N_T are the total number of classification errors and the total number of beats in the file, respectively. Sensitivity, Se, is the ratio of the number of correctly detected events, TP (true positives), to the total number of events:

\[ \mathrm{Se} = \frac{TP}{TP + FN} \times 100 \tag{25} \]

where FN (false negatives) is the number of missed events. Specificity, Sp, is the ratio of the number of correctly rejected nonevents, TN (true negatives), to the total number of nonevents:

\[ \mathrm{Sp} = \frac{TN}{TN + FP} \times 100 \tag{26} \]

where FP (false positives) is the number of falsely detected events. Positive predictivity, Pp, is the ratio of the number of correctly detected events, TP, to the total number of events detected by the analyzer:

\[ \mathrm{Pp} = \frac{TP}{TP + FP} \times 100 \tag{27} \]

7. Results

We randomly selected 100 beats from each class and used these 500 beats to train the classifiers. The database contained 18,290 beats in total, so the training set amounts to less than 3% of all beats; this small training fraction is a test of how well the method generalizes. We conducted six experiments to evaluate the algorithm.

In the first experiment we looked for the best feature extraction method among the four non-parametric PSD estimation methods introduced in Section 2.1. Because extracting features for all 18,290 beats is very time consuming, 4000 beats, 800 of each beat type, were selected for this experiment from the records according to the number of beats of each type in the record. Table 1 shows the number of beats selected for each beat type. Within each record, the specified number of beats was selected randomly from its group. Then 50 beats from each class were used to train the algorithm and the remaining beats were used to test the classifiers. The results of this experiment are shown in Table 2.

In the subsequent experiments all beats were used, and the multitaper method was chosen as the ECG PSD estimator because of its high performance in the first experiment. We tested four SVM kernels to find the most successful one for the ECG beat classification problem: linear, quadratic, polynomial of various orders, and Gaussian RBF (GRBF), with various values of C and of the kernel parameter. The GRBF kernel gave the best results and was therefore selected. This is consistent with many SVM classification applications, in which the GRBF kernel is commonly preferred to other kernel types. The detailed results are omitted here for reasons of space. However, finding the GRBF SVM parameter values that give the best accuracy is difficult and time consuming. The classification accuracies of the SVM classifiers are examined next.

This process involved the determination of the GRBF kernel function parameter and the C parameter. First, they were extracted

Table 2 Performance comparison of the different PSD estimation feature extraction methods, C = 10,000, GRBF kernel parameter = 0.01.
Recovered header labels: Methods; RBBB, APC, PVC; Se, Sp, Pp.
Recovered values (row and column assignment not recoverable):
97.09 96.17 95.82 97.03
99.33 99.13 99.03 99.30
88.27 92.67 87.60 92.00
86.01 86.05 86.88 90.19
90.13 88.00 90.93 90.67
98.26 96.63 98.02 98.13
97.00 97.37 97.73 97.30
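The four non-parametric estimators compared in Table 2 can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the beat length (200 samples), the FFT length (256, chosen because it yields 129 one-sided frequency samples), the Hamming window, the Welch segment length (64), the time-bandwidth product NW = 2.5 and all function names are hypothetical choices. The DPSS tapers are obtained from the standard symmetric tridiagonal eigenproblem, and scaling by the sampling rate is omitted.

```python
import numpy as np


def dpss_tapers(n, nw, k):
    """Return the first k Slepian (DPSS) tapers of length n, shape (k, n).

    They are eigenvectors of a symmetric tridiagonal matrix; the k largest
    eigenvalues give the tapers best concentrated in the band |f| <= nw / n.
    """
    idx = np.arange(n)
    diag = ((n - 1 - 2 * idx) / 2.0) ** 2 * np.cos(2 * np.pi * nw / n)
    off = idx[1:] * (n - idx[1:]) / 2.0
    mat = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)
    _, vecs = np.linalg.eigh(mat)  # eigenvalues come back in ascending order
    return vecs[:, ::-1][:, :k].T  # unit-energy tapers, most concentrated first


def periodogram(x, window=None, nfft=256):
    """One-sided periodogram of x; pass a window for the modified periodogram."""
    x = np.asarray(x, dtype=float)
    w = np.ones(len(x)) if window is None else np.asarray(window, dtype=float)
    return np.abs(np.fft.rfft(x * w, nfft)) ** 2 / np.sum(w ** 2)


def welch(x, seg_len=64, nfft=256):
    """Average of Hamming-windowed periodograms over 50%-overlapping segments."""
    x = np.asarray(x, dtype=float)
    win = np.hamming(seg_len)
    starts = range(0, len(x) - seg_len + 1, seg_len // 2)
    return np.mean([periodogram(x[s:s + seg_len], win, nfft) for s in starts], axis=0)


def multitaper(x, nw=2.5, nfft=256):
    """Average of the periodograms obtained with 2*NW - 1 DPSS tapers."""
    x = np.asarray(x, dtype=float)
    tapers = dpss_tapers(len(x), nw, int(2 * nw) - 1)
    return np.mean([periodogram(x, t, nfft) for t in tapers], axis=0)


# Toy "beat" (assumption): a sinusoid at 20/256 cycles per sample plus weak noise.
rng = np.random.default_rng(0)
beat = np.sin(2 * np.pi * (20 / 256) * np.arange(200)) + 0.1 * rng.standard_normal(200)

features = {
    "periodogram": periodogram(beat),
    "modified periodogram": periodogram(beat, np.hamming(len(beat))),
    "welch": welch(beat),
    "multitaper": multitaper(beat),
}
```

Each entry of `features` is a 129-point spectral estimate of the same beat. Raising `nw` averages more tapers, which lowers the variance of the multitaper estimate at the cost of frequency resolution.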
Table 3 The optimal values of SVM parameters C and GRBF SVM kernel parameters that are randomly selected and corresponding accuracies obtained in the problem of ECG beat classification.

Num | Optimal values of C parameters | Optimal values of kernel function parameter | Optimal classification accuracies
1 | 2837.1 | 0.8587 | 92.86
2 | 505,140 | 0.8915 | 92.62
3 | 4085.4 | 1.0402 | 92.29
4 | 858,230 | 1.0053 | 92.29
5 | 922.1829 | 1.2154 | 91.91
6 | 427.6669 | 0.2365 | 91.86
7 | 1377.2 | 1.3675 | 91.73
8 | 33,099 | 1.3594 | 91.44
9 | 61,015 | 1.6626 | 90.94
10 | 348,440 | 2.3567 | 90.76
11 | 878,070 | 1.9620 | 90.73
12 | 966,900 | 2.4255 | 90.72
13 | 386,550 | 2.2753 | 90.72
14 | 5484.1 | 2.9704 | 90.69
15 | 6061.9 | 2.9316 | 90.68
16 | 323,980 | 3.1919 | 90.62
17 | 17,884 | 4.0163 | 90.40
18 | 2007.6 | 4.5956 | 90.31
19 | 76,318 | 4.8579 | 90.15
20 | 8856.9 | 5.4849 | 90.08
21 | 66,392 | 5.7969 | 90.02
22 | 9038 | 6.3942 | 89.77
23 | 22,274 | 6.7414 | 89.64
24 | 509,760 | 6.8016 | 89.62
25 | 792,800 | 6.8288 | 89.61
26 | 63,329 | 7.0922 | 89.56
27 | 6194.4 | 7.4255 | 89.49
28 | 2935.2 | 8.2378 | 89.19
29 | 314,290 | 9.4979 | 88.73
30 | 915,730 | 9.6143 | 88.64
31 | 24,066 | 9.7742 | 88.63
32 | 1563.1 | 9.9233 | 88.61
33 | 800,600 | 9.9476 | 88.60
Table 4 The optimal values of SVM parameter C and GRBF SVM kernel parameter estimated by the SVMGA approach, and the corresponding accuracies obtained in the problem of ECG beat classification.

Num | Optimal values of C parameters | Optimal values of kernel function parameter | Optimal classification accuracies
1 | 448751.694334740 | 0.00430659988906543 | 96.00
2 | 436495.665956083 | 0.00451776110219134 | 95.98
3 | 364334.861961469 | 0.00542129830635645 | 95.98
4 | 352732.410760749 | 0.00505287072305816 | 95.97
5 | 381709.309765071 | 0.00542129830635645 | 95.97
6 | 460675.401593579 | 0.00431384813773417 | 95.97
7 | 330195.880376728 | 0.00545348191348891 | 95.96
8 | 331475.339704423 | 0.00521258278679493 | 95.95
9 | 586892.117887934 | 0.00363437533322219 | 95.95
10 | 485639.578486249 | 0.00434405653159478 | 95.95
11 | 447755.095168728 | 0.00425049957547429 | 95.94
12 | 447476.055732119 | 0.00392008109330740 | 95.93
13 | 451533.220992118 | 0.00450525173637588 | 95.93
14 | 417172.133159822 | 0.004068977295631 | 95.93
15 | 419008.434694655 | 0.004845850156967 | 95.93
16 | 451981.081442430 | 0.00468718692136479 | 95.93
17 | 422898.525762769 | 0.005338638434940 | 95.92
18 | 419461.833192202 | 0.004605658021415 | 95.92
19 | 475331.744589548 | 0.00413568090530507 | 95.92
20 | 997815.795364488 | 0.00334689423832194 | 95.91
21 | 739959.966526457 | 0.00253129794848464 | 95.89
22 | 533336.118963568 | 0.00361618325646285 | 95.88
23 | 374220.997307492 | 0.00599728349978074 | 95.88
24 | 546461.410977722 | 0.00339504436556074 | 95.88
25 | 584940.373737655 | 0.00366810017224628 | 95.88
26 | 564718.340396837 | 0.00420346440039298 | 95.88
27 | 546595.823095958 | 0.00376922597715315 | 95.87
28 | 734567.482924314 | 0.00345815687885082 | 95.87
29 | 867766.001638883 | 0.00392280166824557 | 95.86
30 | 451558.288423683 | 0.005509067021134 | 95.81
31 | 445635.336827739 | 0.005509067021134 | 95.79
32 | 471125.788469585 | 0.002949123688866 | 95.79
33 | 445635.336827739 | 0.003359068595951 | 95.74
Table 5 Comparison of results obtained with feature vectors with and without timing information.

Confusion matrix (rows: true class; columns: assigned class)

Our proposed method (OPM)
 | N | L | R | A | V
N | 7722 | 42 | 41 | 361 | 25
L | 18 | 3322 | 5 | 0 | 13
R | 30 | 8 | 3638 | 0 | 2
A | 67 | 1 | 0 | 683 | 16
V | 4 | 32 | 12 | 34 | 1708

OPM without timing features
 | N | L | R | A | V
N | 6970 | 131 | 93 | 972 | 25
L | 29 | 3279 | 7 | 11 | 32
R | 79 | 7 | 3550 | 26 | 16
A | 107 | 6 | 18 | 633 | 3
V | 14 | 30 | 47 | 40 | 1659

Performance metrics
Name | Acc | Normal Se | Normal Sp | Normal Pp | LBBB Se | LBBB Sp | LBBB Pp | RBBB Se | RBBB Sp | RBBB Pp | APC Se | APC Sp | APC Pp | PVC Se | PVC Sp | PVC Pp
Our proposed method (OPM) | 96.00 | 94.27 | 98.76 | 98.48 | 98.93 | 99.42 | 97.56 | 98.91 | 99.59 | 98.43 | 89.05 | 97.68 | 63.36 | 95.42 | 99.65 | 96.83
OPM without timing features | 90.48 | 85.09 | 97.61 | 96.82 | 97.65 | 98.79 | 94.96 | 96.52 | 98.83 | 95.56 | 82.53 | 93.84 | 37.64 | 92.68 | 99.52 | 95.62
randomly, and then the SVMGA approach was employed to determine them. The classification accuracies of the SVM classifiers whose parameters were randomly selected are given in Table 3. The values of the GRBF kernel function parameter and of the C parameter estimated by the SVMGA algorithm are given in Table 4.

The main features we used were spectral features. However, to better differentiate spectrally similar beats, three timing features were also used. The fourth experiment was designed to show the benefit of this feature combination. First, the best parameters obtained by the SVMGA method were used to classify the ECG beats into five classes with the full feature vectors; the classification was then repeated using the spectral coefficients only. The results are given in Table 5.

ECG signals usually contain noise and artifacts. The most troublesome noise sources are the electrical activity of muscles (EMG), instability of the electrode–skin contact (electrode movement) and baseline wander [21]. We added different types of noise at different levels to our records in a controlled manner and measured their effect on classification performance. The results are shown in Table 6, where SNR denotes the signal-to-noise ratio of the ECG records, and WN, CN, EM and BW denote white noise, colored noise, electrode movement and baseline wander, respectively. The EM, EMG and BW noises were generated from sample noise records obtained from the MIT–BIH noise stress test database.

Finally, comprehensive file-by-file results for the proposed method were computed and are shown in Table 7. The MTM method was used to extract the spectral features, which were combined with the temporal features to construct the feature vectors. The SVM classifiers were then trained on these feature vectors with the parameters C = 448751.694334740 and kernel parameter = 0.00430659988906543, as estimated by the SVMGA method. The trained SVM classifiers were used to classify the beats of each ECG record. Two training sets were used here. One is the general training set used in the previous experiments. The other was constructed from the first 1–5 min of each record. Consequently, a patient-adaptable ECG beat classification was performed in this experiment.
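Eqs. (24)–(27) can be evaluated directly from a confusion matrix. The sketch below does so for the five-class matrix reported for the proposed method in Table 5. It assumes that rows are true classes and columns are assigned classes, an orientation inferred from the beat counts quoted in the text rather than stated explicitly; the function names are illustrative only.

```python
CLASSES = ["N", "L", "R", "A", "V"]

# Confusion matrix of the proposed method as read from Table 5.
# Assumption: rows are true classes, columns are assigned classes (CLASSES order).
CONFUSION = [
    [7722, 42, 41, 361, 25],
    [18, 3322, 5, 0, 13],
    [30, 8, 3638, 0, 2],
    [67, 1, 0, 683, 16],
    [4, 32, 12, 34, 1708],
]


def percent(num, den):
    """Return 100 * num / den, or None when the ratio is undefined (den == 0)."""
    return 100.0 * num / den if den else None


def accuracy(cm):
    """Eq. (24): share of correctly classified beats, (N_T - N_E) / N_T * 100."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return percent(correct, total)


def class_metrics(cm, k):
    """Eqs. (25)-(27) for class index k, treating class k as the 'event'."""
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                  # beats of class k assigned elsewhere
    fp = sum(row[k] for row in cm) - tp   # other beats assigned to class k
    tn = total - tp - fn - fp
    return {
        "Se": percent(tp, tp + fn),
        "Sp": percent(tn, tn + fp),
        "Pp": percent(tp, tp + fp),
    }


for name in CLASSES:
    m = class_metrics(CONFUSION, CLASSES.index(name))
    print(name, {key: round(val, 2) for key, val in m.items()})
print("Acc", round(accuracy(CONFUSION), 2))
```

With this matrix the overall accuracy evaluates to about 96.00 and the Normal-class sensitivity to about 94.27, in line with the values listed in Table 5. Returning `None` for an empty denominator mirrors the dashes used in the per-record results.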
8. Discussion

As seen in Table 2, the multitaper PSD estimation method achieved the best classification accuracy, 93.97%, among the four non-parametric PSD estimation methods (shown in bold). The next best method is the modified periodogram, followed by the periodogram and Welch methods. The values of C and of the kernel parameter used in Table 2 were determined experimentally: repeated runs of the program with many parameter settings showed, by trial and error, that C = 10,000 and a kernel parameter of 0.1 gave the better outcome. One advantage of the MTM method is that it provides a time-bandwidth parameter, NW, with which to balance variance against resolution. By changing NW, the MTM method can be adapted to a specific classification problem. This appears to be why the MTM method performs best in extracting PSD features from ECG beats.

As Tables 3 and 4 show, the classification accuracies obtained with the SVMGA approach are higher than those of SVM classifiers whose kernel function parameter and C parameter are randomly selected. These results show that the SVMGA approach is effective in estimating appropriate values of the kernel function parameter and of the soft-margin penalty parameter C of the support vector machine (SVM) classifier for the ECG beat classification task.
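A genetic search over two real-valued SVM parameters can be sketched as below. This is a minimal, hypothetical illustration rather than the SVMGA implementation used in the paper. The operators (tournament selection, arithmetic crossover, Gaussian mutation, elitism), the population settings and the choice to search in log10 space are all assumptions. The fitness function is a synthetic stand-in with an arbitrary peak; in the actual setting it would be the classification accuracy of an SVM trained with the candidate parameters.

```python
import random


def real_coded_ga(fitness, bounds, pop_size=30, generations=60,
                  mutation_rate=0.2, elite=2, seed=0):
    """Maximise `fitness` over a box with a floating-point (real-coded) GA.

    Returns the best individual and the best fitness seen in each generation.
    """
    rng = random.Random(seed)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        children = [list(ind) for ind in ranked[:elite]]  # elitism: keep the best unchanged
        while len(children) < pop_size:
            # Tournament selection: best of three random individuals, done twice.
            p1 = max(rng.sample(ranked, 3), key=fitness)
            p2 = max(rng.sample(ranked, 3), key=fitness)
            # Arithmetic crossover: a random convex combination of the two parents.
            a = rng.random()
            child = [a * u + (1 - a) * v for u, v in zip(p1, p2)]
            for i, (lo, hi) in enumerate(bounds):
                # Gaussian mutation on some genes, then clip to the search box.
                if rng.random() < mutation_rate:
                    child[i] += rng.gauss(0, 0.1 * (hi - lo))
                child[i] = min(hi, max(lo, child[i]))
            children.append(child)
        population = children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


def toy_fitness(individual):
    """Synthetic stand-in for SVM accuracy; its peak location is arbitrary."""
    log_c, log_kernel = individual
    return -((log_c - 5.0) ** 2 + (log_kernel + 2.0) ** 2)


# Chromosome (assumption): [log10(C), log10(kernel parameter)].
BOUNDS = [(0.0, 6.0), (-4.0, 1.0)]
best, history = real_coded_ga(toy_fitness, BOUNDS)
C, kernel_param = 10 ** best[0], 10 ** best[1]
```

Because each gene is a floating-point number, candidates are not restricted to a predefined grid of values, and elitism guarantees that the best fitness never decreases from one generation to the next.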
Table 6 Results obtained for the classification of ECG beats in the presence of different noises.

ECG signals | SNR | Accuracy | Sensitivity Normal | Sensitivity LBBB | Sensitivity RBBB | Sensitivity APC | Sensitivity PVC
ECG + WN | 10 | 86.3023 | 76.8038 | 96.2775 | 99.8912 | 65.7106 | 91.9553
ECG + WN | 15 | 93.9440 | 90.3919 | 98.5110 | 99.5106 | 82.7901 | 94.9721
ECG + WN | 20 | 95.6478 | 93.4562 | 98.7790 | 99.3475 | 87.8748 | 95.5307
ECG + CN | 10 | 91.7117 | 88.4629 | 98.7493 | 99.0484 | 87.0926 | 80.2793
ECG + CN | 15 | 94.6919 | 92.1621 | 99.0173 | 99.0484 | 89.7001 | 91.3408
ECG + CN | 20 | 95.6422 | 93.5417 | 99.0173 | 99.0756 | 89.9609 | 94.3017
ECG + EMG | 10 | 70.3891 | 93.9079 | 84.3955 | 27.9228 | 92.0469 | 14.4693
ECG + EMG | 15 | 90.0304 | 93.9812 | 97.1709 | 83.5237 | 91.9166 | 71.1173
ECG + EMG | 20 | 95.3216 | 94.0667 | 98.6897 | 97.9065 | 91.5254 | 91.0615
ECG + EM | 10 | 81.9051 | 90.6483 | 97.6474 | 41.7074 | 84.7458 | 93.7430
ECG + EM | 15 | 92.5776 | 94.2742 | 98.1239 | 83.4693 | 87.6141 | 95.2514
ECG + EM | 20 | 95.4060 | 94.5794 | 98.5110 | 96.0305 | 88.0052 | 95.2514
ECG + BW | 10 | 95.4791 | 94.0178 | 98.4515 | 98.3415 | 86.4407 | 94.5810
ECG + BW | 15 | 95.8165 | 93.9934 | 98.7493 | 98.8309 | 89.8305 | 95.0279
ECG + BW | 20 | 95.9064 | 93.9812 | 98.8088 | 98.9668 | 90.6128 | 95.2514
ECG + EMG + EM + BW | 10 | 90.7951 | 95.5927 | 86.5992 | 98.3415 | 79.6610 | 65.9777
ECG + EMG + EM + BW | 15 | 94.8887 | 94.7625 | 98.6599 | 99.0212 | 85.1369 | 84.0782
ECG + EMG + EM + BW | 20 | 95.8165 | 94.3108 | 99.0471 | 99.0484 | 86.8318 | 93.8547
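Adding noise "in a controlled manner" amounts to scaling a noise record so that the mixture reaches a target SNR. The sketch below shows one way to do this, assuming that the SNR values in Table 6 are in decibels and are defined on mean signal power; the text does not state the exact definition used. The sinusoid standing in for a clean ECG segment and the function names are illustrative only.

```python
import math
import random


def mean_power(x):
    """Mean squared amplitude of a sequence."""
    return sum(v * v for v in x) / len(x)


def add_noise_at_snr(signal, noise, snr_db):
    """Scale `noise` so that 10*log10(P_signal / P_noise) == snr_db, then add it."""
    gain = math.sqrt(mean_power(signal) / (mean_power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(signal, noise)]


def measured_snr_db(clean, noisy):
    """SNR of `noisy` relative to `clean`, from the power of their difference."""
    residual = [b - a for a, b in zip(clean, noisy)]
    return 10 * math.log10(mean_power(clean) / mean_power(residual))


rng = random.Random(0)
# Stand-in for a clean ECG segment (assumption): a 1.2 Hz sinusoid sampled at 360 Hz.
clean = [math.sin(2 * math.pi * 1.2 * t / 360) for t in range(3600)]
white = [rng.gauss(0, 1) for _ in clean]

# One noisy copy per SNR level used in Table 6.
noisy = {snr: add_noise_at_snr(clean, white, snr) for snr in (10, 15, 20)}
```

The same scaling applies to any noise record of matching length, so colored noise or recorded EMG, electrode-movement and baseline-wander segments could be substituted for `white`.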
Table 7 Detailed file-by-file results, including both test and training data.

Record | Accuracy | Normal Se | Normal Sp | Normal Pp | LBBB Se | LBBB Sp | LBBB Pp | RBBB Se | RBBB Sp | RBBB Pp | APC Se | APC Sp | APC Pp | PVC Se | PVC Sp | PVC Pp
100 | 100 | 100 | 100 | 100 | – | 100 | – | – | 100 | – | 100 | 100 | 100 | 100 | 100 | 100
101 | 99.18 | 99.24 | 66.67 | 99.94 | – | 99.82 | 0 | – | 99.47 | 0 | 66.67 | 100 | 100 | – | 99.94 | 0
102 | 98.41 | 100 | 100 | 100 | – | 100 | – | – | 100 | – | – | 98.41 | 0 | 75 | 100 | 100
103 | 99.95 | 100 | 50 | 99.95 | – | 100 | – | – | 100 | – | 50 | 100 | 100 | – | 100 | –
104 | 96.30 | 96.99 | 100 | 100 | – | 97.78 | 0 | – | 100 | – | – | 99.26 | 0 | 50 | 99.25 | 50
105 | 90.53 | 90.54 | 100 | 100 | – | 94.60 | 0 | – | 99.01 | 0 | – | 99.80 | 0 | 90.24 | 97.07 | 33.64
106 | 94.61 | 94.36 | 98.63 | 99.49 | – | 98.78 | 0 | – | 97.10 | 0 | – | 99.13 | 0 | 95.29 | 99.93 | 99.79
107 | 75.44 | – | 100 | – | – | 82.46 | 0 | – | 100 | – | – | 92.98 | 0 | 75.44 | – | 100
108 | 95.12 | 95.77 | 80.95 | 99.73 | – | 97.75 | 0 | – | 99.42 | 0 | 0 | 99.87 | 0 | 58.82 | 98.31 | 27.78
109 | 99.47 | – | 99.80 | 0 | 99.71 | 100 | 100 | – | 99.92 | 0 | – | 99.80 | 0 | 84.21 | 99.96 | 96.97
111 | 99.61 | – | 99.86 | 0 | 99.61 | 100 | 100 | – | 100 | – | – | 100 | – | 100 | 99.76 | 16.67
112 | 100 | 100 | 100 | 100 | – | 100 | – | – | 100 | – | 100 | 100 | 100 | – | 100 | –
113 | 99.31 | 99.31 | – | 100 | – | 100 | – | – | 100 | – | – | 99.77 | 0 | – | 99.54 | 0
114 | 97.03 | 97.62 | 92.45 | 99.77 | – | 99.23 | 0 | – | 98.24 | 0 | 60 | 99.78 | 60 | 81.40 | 100 | 100
115 | 99.95 | 99.95 | – | 100 | – | 100 | – | – | 100 | – | – | 99.95 | 0 | – | 100 | –
116 | 99.38 | 99.39 | 100 | 100 | – | 99.75 | 0 | – | 100 | – | 100 | 100 | 100 | 99.08 | 99.61 | 92.31
117 | 99.93 | 99.93 | 100 | 100 | – | 99.93 | 0 | – | 100 | – | 100 | 100 | 100 | – | 100 | –
118 | 98.37 | – | 98.95 | 0 | – | 100 | – | 99.12 | 100 | 100 | 93.75 | 99.50 | 89.11 | 25 | 99.91 | 66.67
119 | 100 | 100 | 100 | 100 | – | 100 | – | – | 100 | – | – | 100 | – | 100 | 100 | 100
121 | 98.34 | 98.40 | 100 | 100 | – | 99.45 | 0 | – | 98.95 | 0 | 0 | 100 | – | 100 | 99.95 | 50
122 | 98.43 | 98.43 | – | 100 | – | 100 | – | – | 98.43 | 0 | – | 100 | – | – | 100 | –
123 | 99.25 | 99.25 | 100 | 100 | – | 100 | – | – | 100 | – | – | 99.80 | 0 | 100 | 99.45 | 27.27
124 | 97.08 | – | 99.43 | 0 | – | 98.16 | 0 | 98.95 | 89.80 | 99.67 | 0 | 100 | – | 40.43 | 99.80 | 86.36
200 | 95.76 | 97.58 | 97.19 | 98.57 | – | 99.45 | 0 | – | 98.59 | 0 | 50 | 98.89 | 34.88 | 93.70 | 99.65 | 99.23
202 | 73.07 | 74.02 | 83.78 | 99.20 | – | 96.35 | 0 | – | 99.14 | 0 | 30.91 | 83.38 | 4.80 | 94.74 | 94.28 | 13.24
203 | 92.03 | 93.46 | 85.28 | 97.35 | – | 97.98 | 0 | – | 99.51 | 0 | – | 98.99 | 0 | 83.76 | 97.32 | 84.40
205 | 99.58 | 99.92 | 97.30 | 99.92 | – | 99.74 | 0 | – | 100 | – | 66.67 | 99.92 | 50 | 88.73 | 100 | 100
207 | 98.58 | – | 99.77 | 0 | 99.11 | 99.32 | 99.86 | 94.12 | 99.82 | 96.39 | 96.23 | 99.82 | 97.14 | 97.14 | 99.21 | 88.70
208 | 98.29 | 97.73 | 100 | 100 | – | 99.88 | 0 | – | 100 | – | – | 98.84 | 0 | 99.19 | 99.31 | 98.89
209 | 97.60 | 98.01 | 94.79 | 99.23 | – | 100 | – | – | 100 | – | 94.78 | 98.02 | 87.47 | 100 | 100 | 100
210 | 96.38 | 98.41 | 80.41 | 98.36 | – | 99.36 | 0 | – | 99.52 | 0 | – | 99.32 | 0 | 72.16 | 99.66 | 94.59
212 | 97.22 | 98.48 | 96.67 | 93.90 | – | 99.93 | 0 | 96.56 | 98.48 | 99.19 | – | 100 | – | – | 100 | –
213 | 99.52 | 99.77 | 97.98 | 99.81 | – | 100 | – | – | 100 | – | 89.29 | 99.76 | 78.13 | 97.73 | 99.93 | 99.08
214 | 98.63 | – | 99.16 | 0 | 98.80 | 100 | 100 | – | 99.91 | 0 | – | 99.78 | 0 | 97.27 | 99.75 | 98.03
215 | 97.13 | 99.78 | 74.85 | 98.68 | – | 100 | – | – | 100 | – | 100 | 98.43 | 5.45 | 46.34 | 99.97 | 98.70
217 | 88.11 | 85.53 | 99.34 | 99.51 | – | 97.67 | 0 | – | 100 | – | – | 95.35 | 0 | 92.11 | 92.34 | 88.61
219 | 96.30 | 97.10 | 88.73 | 99.57 | – | 99.70 | 0 | – | 98.10 | 0 | 0 | 99.45 | 0 | 82.81 | 99.43 | 82.81
220 | 98.58 | 99.90 | 71.28 | 98.63 | – | 100 | – | – | 100 | – | 71.28 | 99.90 | 97.10 | – | 100 | –
221 | 96.77 | 96.42 | 99.75 | 99.95 | – | 99.44 | 0 | – | 98.92 | 0 | – | 99.01 | 0 | 98.49 | 99.33 | 96.77
222 | 82.85 | 82.72 | 86.54 | 98.38 | – | 99.03 | 0 | – | 98.46 | 0 | 84.13 | 85.97 | 37.72 | – | 99.34 | 0
223 | 96.66 | 98.52 | 96.52 | 99.06 | – | 98.80 | 0 | – | 99.42 | 0 | 65.75 | 99.56 | 81.36 | 93.45 | 99.52 | 97.79
228 | 97.89 | 97.58 | 99.73 | 99.93 | – | 99.73 | 0 | – | 98.49 | 0 | 33.33 | 99.89 | 33.33 | 99.72 | 99.80 | 99.18
230 | 99.78 | 99.78 | 100 | 100 | – | 100 | – | – | 99.78 | 0 | – | 100 | – | 100 | 100 | 100
231 | 79.67 | 0 | 99.92 | 0 | – | 100 | – | 99.92 | 0.63 | 79.76 | 0 | 100 | – | 50 | 99.94 | 50
233 | 96.27 | 97.80 | 94.89 | 98.07 | – | 98.73 | 0 | – | 99.97 | 0 | 57.14 | 99.33 | 16.67 | 92.52 | 99.50 | 98.56
234 | 99.96 | 99.96 | 100 | 100 | – | 99.96 | 0 | – | 100 | – | – | 100 | – | 100 | 100 | 100
According to Table 4, the best test-set accuracy obtained by SVMGA is 96.00%, whereas the best accuracy achieved with randomly selected parameters in Table 3 is 92.86%. Previous genetic algorithm–SVM studies have generally used a binary representation, which can only search among discrete, predetermined values of C and the kernel parameter. We used a floating-point representation, which searches the whole feasible range. The SVMGA approach proposed in this study is therefore more comprehensive than previous genetic algorithm–SVM studies.

Fig. 5 shows MTM PSD estimates for the five ECG classes. Only the first 45 of the 129 frequency samples are shown. Each PSD curve is the average of the PSDs of all beats in the class, which makes the comparison more robust and reliable. As illustrated, the power spectra of Normal and APC beats are very similar. This limits the ability of spectral features to differentiate these classes, so we used three timing features. The utility of these features is discussed below.

Fig. 5. PSD estimation of five ECG classes using MTM method.

As seen in Table 5, classification accuracy degraded in the absence of the temporal features. While PSD-based spectral features discriminate effectively between normal and some abnormal heartbeats (e.g. PVC beats), the three temporal features add discriminating power, especially for spectrally similar heartbeat patterns (e.g. normal and APC beats). The results in Table 5 support this claim. The sensitivities of the LBBB, RBBB and PVC classes changed little, but the sensitivities of the normal and APC classes decreased markedly. Sensitivity measures how successfully a classifier recognizes beats of a given class without missing them. Because these two classes are spectrally similar, the timing features play a large role in differentiating them, and their absence decreases the sensitivity of both classes. The degradation of positive predictivity is also considerable for the APC class, which means that more beats of other classes are misclassified as APC beats. According to the confusion matrices in Table 5, with the full feature set most misclassified normal beats are classified as APC (361 beats) and, conversely, most misclassified APC beats are classified as normal (67 beats). Without timing information, these values increase to 972 and 107 beats, respectively.

The results in Table 6 show that noise in the ECG can substantially decrease classification performance when the SNR is low; the performance metrics degraded as the SNR decreased. We added noise only to the test data, and the classifiers were trained with noise-free ECG beats. If the training and test data contain the same noise sources, the results in Table 6 generally improve. For example, in that case the accuracy for ECG + EMG at an SNR of 10, which is 70.39% (the lowest accuracy obtained in the presence of noise), increases to 94.34%.

9. Conclusion

In this study, an SVMGA approach is proposed for automatic ECG beat classification. The SVMGA approach simultaneously optimizes the GRBF kernel function parameter and the C parameter of the SVM classifier. In the first experiment, the periodogram, modified periodogram, Welch and multitaper non-parametric PSD estimation methods were compared for feature extraction. As a result, the MTM method was selected to obtain a compact set of spectral features. Three timing features were extracted as well. This combination of features captures the temporal and shape aspects of the beats for classifying ECG beats into five classes. The SVM was employed as the classifier, and the GRBF kernel was selected experimentally as the best kernel function. In the second experiment, the SVM parameter C and the kernel parameter were specified randomly; these were not optimum choices. In the next experiment, parameter selection was performed by the proposed SVMGA approach. The experimental results showed that the classification accuracy of SVMGA for ECG beat classification is superior to that of SVM classifiers whose parameters were randomly selected. The highest accuracy obtained by the SVMGA method was 96.00%. The utility of adding timing features to the PSD-based spectral features was demonstrated in the fourth experiment. The fifth experiment demonstrated the effect of various ECG noises on the performance metrics. Finally, the classification task was performed record by record for all beats in the database. Overall, the results suggest that the SVMGA approach could also be used for other classification tasks such as control chart pattern recognition, speech recognition and texture image classification, which may be the subjects of future studies.
References
[1] G.D. Clifford, F. Azuaje, P.E. McSharry, Advanced Methods and Tools for ECG Data Analysis, Artech House, Norwood, MA, 2006.
[2] M.S. Thaler, The Only EKG Book You’ll Ever Need, third ed., Lippincott/Williams & Wilkins, Philadelphia, PA/Baltimore, MD, 1999.
[3] L.Y. Shyu, Y.H. Wu, W.C. Hu, Using wavelet transform and fuzzy neural network for VPC detection from the Holter ECG, IEEE Trans. Biomed. Eng. 51 (2004) 1269–1273.
[4] O.T. Inan, L. Giovangrandi, G.T.A. Kovacs, Robust neural-network-based classification of premature ventricular contractions using wavelet transform and timing interval features, IEEE Trans. Biomed. Eng. 53 (2006) 2507–2515.
[5] A. Ebrahimzadeh, A. Khazaee, Detection of premature ventricular contractions using MLP neural networks: a comparative study, Measurement 43 (2010) 103–112.
[6] S.N. Yu, K.T. Chou, Selection of significant independent components for ECG beat classification, Expert Syst. Appl. 36 (2009) 2088–2096.
[7] T. Ince, S. Kiranyaz, M. Gabbouj, A generic and robust system for automated patient-specific classification of electrocardiogram signals, IEEE Trans. Biomed. Eng. 56 (2009) 1415–1426.
[8] U.R. Acharya, et al., Automatic identification of cardiac health using modeling techniques: a comparative study, Inform. Sci. 178 (2008) 4571–4582.
[9] C.H. Lin, Frequency-domain features for ECG beat discrimination using grey relational analysis-based classifier, Comput. Math. Appl. 55 (2008) 680–690.
[10] R.R. Sarvestani, R. Boostani, M. Roopaei, VT and VF classification using trajectory analysis, Nonlinear Anal. 71 (2009) e55–e61.
[11] S. Osowski, T.H. Linh, ECG beat recognition using fuzzy hybrid neural network, IEEE Trans. Biomed. Eng. 48 (2001) 1265–1271.
[12] E.D. Ubeyli, Recurrent neural networks employing Lyapunov exponents for analysis of ECG signals, Expert Syst. Appl. 37 (2010) 1192–1199.
[13] P. de Chazal, M. O’Dwyer, R.B. Reilly, Automatic classification of heartbeats using ECG morphology and heartbeat interval features, IEEE Trans. Biomed. Eng. 51 (2004) 1196–1206.
[14] M. Lagerholm, et al., Clustering ECG complexes using Hermite functions and self-organizing maps, IEEE Trans. Biomed. Eng. 47 (2000) 839–847.
[15] L. Khadra, A.S. Al-Fahoum, S. Binajjaj, A quantitative analysis approach for cardiac arrhythmia classification using higher order spectral techniques, IEEE Trans. Biomed. Eng. 52 (2005) 1840–1845.
[16] R.V. Andreao, B. Dorizzi, J. Boudy, ECG signal analysis through hidden Markov models, IEEE Trans. Biomed. Eng. 53 (2006) 1541–1549.
[17] R.J. Martis, C. Chakraborty, A.K. Ray, A two-stage mechanism for registration and classification of ECG using Gaussian mixture model, Pattern Recognit. 42 (2009) 2979–2988.
[18] S. Mitra, M. Mitra, B.B. Chaudhuri, A rough set-based inference engine for ECG classification, IEEE Trans. Instrum. Meas. 55 (2006) 2198–2206.
[19] P. de Chazal, R.B. Reilly, A patient adapting heart beat classifier using ECG morphology and heartbeat interval features, IEEE Trans. Biomed. Eng. 53 (2006) 2535–2543.
[20] S. Osowski, T. Markiewicz, L.T. Hoai, Recognition and classification system of arrhythmia using ensemble of neural networks, Measurement 41 (2008) 610–617.
[21] V. Vapnik, Statistical Learning Theory, Wiley, New York, 1998.
[22] S.V. Vaseghi, Advanced Digital Signal Processing and Noise Reduction, fourth ed., John Wiley & Sons, 2008.
[23] P.D. Welch, The use of fast Fourier transform for the estimation of power spectra: a method based on time averaging over short, modified periodograms, IEEE Trans. Audio Electroacoust. AU-15 (June) (1967) 70–73.
[24] D.B. Percival, A.T. Walden, Spectral Analysis for Physical Applications: Multitaper and Conventional Univariate Techniques, Cambridge University Press, Cambridge, UK, 1993, 611 pp.
[25] C. Burges, A tutorial on support vector machines for pattern recognition, Data Mining Knowl. Discov. 2 (1998) 121–167.
[26] C. Cortes, V. Vapnik, Support vector networks, Mach. Learn. 20 (3) (1995) 273–297.
[27] K.R. Muller, S. Mika, G. Ratsch, K. Tsuda, B. Scholkopf, An introduction to kernel-based learning algorithms, IEEE Trans. Neural Netw. 12 (March (2)) (2001) 181–202.
[28] S. Sumathi, T. Hamsapriya, P. Surekha, Evolutionary Intelligence: An Introduction to Theory and Applications with Matlab, Springer-Verlag Berlin Heidelberg, 2008.
[29] S. Bandyopadhyay, S.K. Pal, Classification and Learning Using Genetic Algorithms, Springer-Verlag Berlin Heidelberg, 2007.
[30] R.G. Mark, G.B. Moody, MIT–BIH Arrhythmia Database 1997 [Online]. Available: http://ecg.mit.edu/dbinfo.html.
[31] G.B. Moody, R.G. Mark, The impact of the MIT/BIH arrhythmia database, IEEE Eng. Med. Biol. Mag. 20 (May–June (3)) (2001) 45–50.