2008 International Conference on BioMedical Engineering and Informatics
Cardiac Arrhythmia Detection based on Signal Variation Characteristic
Chusak Thanawattano and Surapol Tan-a-ram
National Science and Technology Development Agency (NSTDA), National Electronics and Computer Technology Center (NECTEC), Klong Luang, Pathumthani, 12120, Thailand
E-mail: [email protected] and [email protected]
Abstract
This paper presents a classifier for cardiac arrhythmia based on the signal variation characteristic of each beat type. The beat types considered, namely the normal beat, the fusion beat and the premature ventricular contraction (PVC) beat, are differentiated to obtain feature sets. Using principal component analysis (PCA) estimation, the detector selects the class whose basis yields the error vector with the minimal norm. Without the help of timing interval information, the proposed classifier outperforms the algorithms presented in the literature. On the test beats used in this study, the proposed algorithm achieves perfect detection.

1. Introduction
The electrocardiogram (ECG) signal can be used to diagnose cardiac conditions. Work on ECG detection and classification gives the physiologist the option of pre-screening non-severe arrhythmias. Moreover, it can provide real-time detection of life-threatening conditions in 24-hour ECG monitoring. Recently, a number of research groups have worked on the detection and classification of arrhythmias that cause either severe or mild cardiac abnormalities. Some groups classify the ECG based on its characteristics or morphology [1]. Other groups use signal transformations to extract significant structure from the ECG signal [2]-[4] by decomposing it into a set of components in a feature subspace. Ge et al. [5] use autoregressive (AR) modeling and take the AR coefficients of the ECG signal as features. However, as stated in [1], the classification rates of the automatic beat classifiers presented in the literature to date have not been high enough for the classifiers to gain widespread clinical acceptance. This could be caused by an insufficient number of certain heart beat types in the data used to train the classifier. Hu et al. [6] suggest that the classification accuracy can be boosted by combining a patient-specific classifier with a global classifier. Chazal [1] suggests using timing intervals in combination with waveform characteristics to make the classification more robust. Inan et al. [2] combine timing interval features with the wavelet transform of the ECG signal to form the training and testing sets of a neural-network classifier, which provides more efficient classification.

ECG analysis can be used to detect a specific cardiac arrhythmia such as premature ventricular contraction (PVC) or ventricular fusion beats. The PVC results from irritated ectopic foci in the ventricular area of the heart [2]. These foci cause premature contractions of the ventricles that are independent of the pace set by the sinoatrial node. PVCs, when associated with myocardial infarction, can be linked to mortality [7]. Fusion beats occur when two separate pacemakers compete for control of the ventricles [8]. Most of the time, fusion beats look very similar to both PVC and normal beats. Chazal [1] and Hu [6] therefore report their classification performance on a data set of normal, fusion and PVC beats.
978-0-7695-3118-2/08 $25.00 © 2008 IEEE DOI 10.1109/BMEI.2008.294
2. Aim
The aim of this work is to investigate the performance of the proposed classifier on the same data set as in [1, 6]. ECG beats of three types, normal (N), fusion (F) and premature ventricular contraction (P), are used to train and test the classifier, which is based on the signal variation characteristic of the ECG signal centered at the QRS complex.
3. Material and methods

3.1 Data base selection
In this paper we use three categories of beats, the normal beat (N), the fusion beat (F) and the PVC beat (P), from the MIT-BIH arrhythmia database [9] to train and test the proposed classifier. From the database we select only six records: 119, 200, 208, 213, 223 and 233. Of these, only records 208 and 213 contain a sufficient number of fusion beats, even though these make up a small portion compared with the normal and PVC beats, as shown in Table 1. The ECG signal of each record is segmented into 50000-sample sessions. Since each record contains about 650000 samples, a single ECG record is divided into 13 sessions. We randomly select sessions 1, 3 and 5 of ECG records 200, 208 and 213 for the training phase of the classification. Records 208 and 213 contain sufficient numbers of fusion beats, while record 200, even though it has very few fusion beats, serves as a representative of PVC beats in the training phase.

Table 1. Training and testing beats from MIT-BIH database [9]

| Rec | Total N | Total F | Total P | Training N | Training F | Training P | Testing N | Testing F | Testing P |
|-----|---------|---------|---------|------------|------------|------------|-----------|-----------|-----------|
| 119 | 1542 | 0 | 444 | 0 | 0 | 0 | 1542 | 0 | 444 |
| 200 | 1742 | 2 | 826 | 416 | 1 | 196 | 1326 | 1 | 630 |
| 208 | 1450 | 350 | 936 | 237 | 89 | 170 | 1213 | 261 | 766 |
| 213 | 2640 | 362 | 220 | 666 | 56 | 30 | 1974 | 306 | 190 |
| 223 | 2028 | 14 | 473 | 0 | 0 | 0 | 2028 | 14 | 473 |
| 233 | 2229 | 11 | 831 | 0 | 0 | 0 | 2229 | 11 | 831 |

3.2 Signal preprocessing
The ECG records are filtered by the low-pass filter of Pan and Tompkins [10], which has the difference equation

$$ y(nT) = 2y(nT - T) - y(nT - 2T) + x(nT) - 2x(nT - 6T) + x(nT - 12T) \tag{1} $$

3.3 Feature Extraction
The low-passed ECG signals are then differentiated to extract the signal variation characteristic, using the derivative filter of [10], which has the difference equation

$$ y(nT) = \frac{1}{8T}\left[ -x(nT - 2T) - 2x(nT - T) + 2x(nT + T) + x(nT + 2T) \right] \tag{2} $$

Using the R-peak annotations provided with the MIT-BIH arrhythmia database, we segment each ECG beat from the 120th sample before the R-peak to the 119th sample after it, so that each beat contains 240 samples centered at the R-peak. The 240-sample differentiated ECG is then normalized so that its minimum is zero and its maximum is one. Each 240-sample signal is then down-sampled by a factor of eight, leaving 30 samples per beat. The 30-sample sequences of each type (N, F and P) are collected as the columns of the matrices XN, XF and XP, whose dimensions are 30×1319, 30×146 and 30×396, respectively. Principal component analysis (PCA) is then used to obtain the new bases BN, BF and BP of the training matrices XN, XF and XP, respectively. In this paper we choose the dimension of the new basis so that the projected sequence is 3-dimensional, which means that each basis matrix is 3×30. Each basis is then pseudo-inverted as in (3), where B+ is the pseudo-inverse of B and BH is the Hermitian transpose of B.

$$ B^{+} = B^{H} (B B^{H})^{-1} \tag{3} $$

We thus obtain the pseudo-inverse matrices BN+, BF+ and BP+ of the basis matrices BN, BF and BP, respectively. These pseudo-inverse matrices are used in the detection phase described in Section 3.4.

3.4 Signal Detection
In the signal detection phase, a single ECG beat of unknown type is segmented into a 240-sample sequence as in the training phase. The 240-sample sequence is low-passed by the filter with difference equation (1), differentiated by (2) and down-sampled by a factor of eight. We now obtain the 30-dimensional testing column
vector y that will then be classified into one of three classes. Let B be the basis matrix obtained by the PCA and let B+ be its pseudo-inverse. The error vector e of the PCA estimation can then be expressed as

$$ e = y - B^{+}(B y) \tag{4} $$

Therefore, the error vectors obtained with the basis matrices generated from the training matrices XN, XF and XP are

$$ e_N = y - B_N^{+}(B_N y), \qquad e_F = y - B_F^{+}(B_F y), \qquad e_P = y - B_P^{+}(B_P y), $$

respectively.

Figure 1. Process flow in testing phase. The basis and its pseudo-inverse matrix are obtained in the training phase.

The l1 norm is used to measure the error as in (5), where e_i is the ith element of the error vector e and n is the dimension of e.

$$ \| e \|_1 = \sum_{i=1}^{n} | e_i | \tag{5} $$

In this paper, the decision ŷ is based on the lowest l1 norm of the error vectors e_N, e_F and e_P. That is,

$$ \hat{y} = \begin{cases} \text{Normal beat} & \text{iff } \min(\|e_N\|_1, \|e_P\|_1, \|e_F\|_1) = \|e_N\|_1 \\ \text{Fusion beat} & \text{iff } \min(\|e_N\|_1, \|e_P\|_1, \|e_F\|_1) = \|e_F\|_1 \\ \text{PVC beat} & \text{iff } \min(\|e_N\|_1, \|e_P\|_1, \|e_F\|_1) = \|e_P\|_1 \end{cases} \tag{6} $$

Figure 2. Norm measured on error vectors obtained by basis BN (and BN+), BF (and BF+) and BP (and BP+), respectively. [Plot title: "Norm of each ECG in testing phase"; one row of panels per beat type N, F and P.]

The l2 norm could also be used to measure the error, but it requires more computation.
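The training and detection steps of Sections 3.3 and 3.4 can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names (`pca_basis`, `pseudo_inverse`, `l1_error`, `classify`) are hypothetical, and the basis is taken from an uncentered SVD of the training matrix, which is an assumption because the paper does not state how the PCA is computed.

```python
import numpy as np


def pca_basis(X, k=3):
    """Return a k x d basis for the training matrix X (d x m, one beat per column).

    Assumption: the rows are the top-k left singular vectors of X, i.e. PCA
    without mean-centering. The paper does not say whether it centers the data.
    """
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k].T


def pseudo_inverse(B):
    """Pseudo-inverse from equation (3): B+ = B^H (B B^H)^-1."""
    BH = B.conj().T
    return BH @ np.linalg.inv(B @ BH)


def l1_error(y, B, B_pinv):
    """l1 norm (5) of the estimation error e = y - B+ (B y) from equation (4)."""
    e = y - B_pinv @ (B @ y)
    return float(np.sum(np.abs(e)))


def classify(y, bases):
    """Pick the label whose basis gives the smallest l1 error, as in (6).

    bases maps a label such as "N", "F" or "P" to a pair (B, B+).
    Returns the chosen label and the dictionary of per-class errors.
    """
    errors = {label: l1_error(y, B, Bp) for label, (B, Bp) in bases.items()}
    return min(errors, key=errors.get), errors
```

In use, one would build `bases` from the three training matrices XN, XF and XP and call `classify` on each 30-dimensional test vector. When the rows of B are orthonormal, as they are for the SVD-based basis above, B B^H is the identity and B+ reduces to B^H, so the explicit inverse in (3) only matters for a non-orthonormal basis.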
4. Experiment and results
To investigate the performance of the detector, in the testing phase the remaining ECG signals of records 119, 200, 208, 213, 223 and 233 are pre-processed. The ECG signal is low-passed by the filter with difference equation (1) and then differentiated by the system with difference equation (2). The differentiated sequence is segmented into 240-sample vectors, which are then down-sampled by a factor of eight. The resulting 30-dimensional vector of unknown arrhythmia type is the input sequence of the testing process. Fig. 1 shows the flow of the testing process. The 30-dimensional testing sequence is projected to the new space by the bases BN, BF and BP. The projected sequence is then mapped back by the pseudo-inverses BN+, BF+ and BP+. The error vector obtained from each basis projection is measured using the l1 norm.

Figure 3. Result of beat type detection. The result shows the perfect detection with 100% accuracy. [Plot title: "Classification result"; one panel per beat type N, F and P.]
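The pre-processing chain used in both training and testing (low-pass filter (1), derivative (2), 240-sample segmentation around the R-peak, and down-sampling by eight) can be sketched as below. This is an illustrative NumPy version under stated assumptions, not the authors' code: the function names are hypothetical, the filters start from zero initial conditions, record edges are zero-padded, down-sampling is plain decimation without an extra anti-aliasing filter, and the scaling to [0, 1] follows the feature extraction description in Section 3.3.

```python
import numpy as np


def low_pass(x):
    """Equation (1): y[n] = 2y[n-1] - y[n-2] + x[n] - 2x[n-6] + x[n-12].

    Samples before the start of the record are treated as zero (assumption).
    """
    x = np.asarray(x, dtype=float)
    y = np.zeros_like(x)
    for n in range(len(x)):
        y[n] = x[n]
        if n >= 1:
            y[n] += 2.0 * y[n - 1]
        if n >= 2:
            y[n] -= y[n - 2]
        if n >= 6:
            y[n] -= 2.0 * x[n - 6]
        if n >= 12:
            y[n] += x[n - 12]
    return y


def derivative(x, T=1.0):
    """Equation (2): y[n] = (-x[n-2] - 2x[n-1] + 2x[n+1] + x[n+2]) / (8T).

    T is the sampling period; T=1.0 is a placeholder, since the later
    min-max scaling removes any constant gain. Edges are zero-padded.
    """
    x = np.asarray(x, dtype=float)
    xp = np.pad(x, 2)  # xp[i + 2] == x[i]
    return (-xp[:-4] - 2.0 * xp[1:-3] + 2.0 * xp[3:-1] + xp[4:]) / (8.0 * T)


def beat_feature(d, r_peak, before=120, after=119, step=8):
    """Cut 240 samples around an R-peak, scale to [0, 1], keep every 8th sample.

    Assumes the segment is not constant, otherwise the scaling divides by zero.
    """
    start, stop = r_peak - before, r_peak + after + 1
    if start < 0 or stop > len(d):
        raise ValueError("R-peak too close to the record boundary")
    seg = np.asarray(d[start:stop], dtype=float)
    seg = (seg - seg.min()) / (seg.max() - seg.min())
    return seg[::step]
```

Calling `beat_feature(derivative(low_pass(ecg)), r_peak)` for each annotated R-peak index would yield the 30-dimensional vectors that form the columns of XN, XF and XP in training, or the test vector y in detection.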
Fig. 2 shows the norms measured from the error vectors of each beat type. Each row shows the norms of the error vectors of one beat type (N, F or P) projected onto the estimated spaces with bases BN (and BN+), BF (and BF+) and BP (and BP+), respectively. Under the proposed criterion, the basis that yields the minimal norm indicates the result of the detection. Fig. 3 shows the result of the detection, where one indicates a correct detection and zero a misclassification. Our method provides perfect detection of beat types N, F and P.

5. Discussion
The result of the classification shows that the feature selection and signal detection proposed in this paper provide very high accuracy in the testing phase. Because the signal varies in the time domain, we extract the variation of each beat and then estimate it using the basis prepared in the training phase. The projected sequences are then projected back into the estimation space by their corresponding pseudo-inverse matrices.

Table 2. Percentage of beats of each type used in training and testing phase

|   | Training N | Training F | Training P | Testing N | Testing F | Testing P |
|---|------------|------------|------------|-----------|-----------|-----------|
| % | 11.3 | 19.8 | 10.6 | 88.7 | 80.2 | 89.4 |

Table 2 shows the percentage of beats used in the training and testing phases of the classification. The result of the classification is impressive even though the training beats make up only a small fraction of the total beats. We believe that the selection of the training beat segment is also important.

6. Conclusion
In this paper, we discussed the proposed cardiac beat classification of normal, fusion and premature ventricular contraction beats. The feature used is the signal variation in the time domain, which we extract by differentiating the 240-sample interval centered at the R-peak. Without using timing interval information as in [1, 2], the classification based on the PCA estimation achieves 100% accuracy while using fewer than 20% of the beats of each type for training.

7. References
[1] P. Chazal and R. B. Reilly, “Automatic classification of ECG beats using waveform shape and heart beat interval features”, International Conference on Acoustics, Speech and Signal Processing (ICASSP’03), vol. 2, pp. 269-272, 2003.
[2] O. T. Inan, L. Giovangrandi, and G. T. A. Kovacs, “Robust Neural-Network Based Classification of Premature Ventricular Contractions Using Wavelet Transform and Timing Interval Features”, IEEE Transactions on Biomedical Engineering, vol. 53, part 1, pp. 2507-2515, Dec. 2006.
[3] M. H. Kadbi, J. Hashemi, H. R. Mohseni, and A. Maghsoudi, “Classification of ECG Arrhythmias Based on Statistical and Time-Frequency Features”, Advances in Medical, Signal and Information Processing (MEDSIP 2006), IET 3rd International Conference, pp. 1-4, July 2006.
[4] Q. Zhao and L. Zhang, “ECG Feature Extraction and Classification Using Wavelet”, International Conference on Neural Networks and Brain (ICNN&B '05), vol. 2, pp. 1089-1092, 2005.
[5] D. Ge, N. Srinivasan, and S. M. Krishnan, “Cardiac Arrhythmia Classification Using Autoregressive Modeling”, BioMedical Engineering OnLine, 2002, http://www.biomedical-engineering-online.com/content/1/1/5
[6] Y. H. Hu, S. Palreddy, and W. J. Tompkins, “A Patient-Adaptable ECG Beat Classifier Using a Mixture of Experts Approach”, IEEE Transactions on Biomedical Engineering, vol. 44, pp. 891-900, 1997.
[7] I. Atsushi, M. Hwa, A. Hassankhani, T. Liu, and S. M. Narayan, “Abnormal Heart Rate Turbulence Predicts the Initiation of Ventricular Arrhythmias”, Pacing Clinical Electrophysiology, vol. 11, pp. 1189-97, Nov. 28, 2005.
[8] H. J. L. Marriott, N. L. Schwartz, and H. H. Bix, “Ventricular Fusion Beats”, Circulation, vol. 26, pp. 880-884, 1962.
[9] R. Mark and G. Moody, MIT-BIH Arrhythmia Database [Online], Available: http://ecg.mit.edu/dbinfo.html.
[10] J. Pan and W. J. Tompkins, “A real-time QRS detection algorithm”, IEEE Transactions on Biomedical Engineering, vol. 32, no. 3, pp. 230-236, 1985.