ECG based Biometric Recognition using Wavelets and RBF ... - wseas

Recent Advances in Information Science

ECG based Biometric Recognition using Wavelets and RBF Neural Network

MANAL TANTAWI#, KENNETH REVETT*, ABDEL-BADEEH SALEM# and MOHAMED F. TOLBA#
# Faculty of Computer and Information Sciences, Ain Shams University, Abbasia, CAIRO, EGYPT
* Faculty of Informatics and Computer Science, The British University in Egypt, El Sherouk City, CAIRO, EGYPT
[email protected]

ISBN: 978-960-474-304-9

Abstract: - In the last two decades, many studies have suggested utilizing the electrocardiogram (ECG) for biometrics. In this paper, an ECG based biometric system is proposed. The RR intervals are extracted and decomposed into wavelet coefficient structures using a discrete Daubechies wavelet. These structures are reduced by excluding the uninformative coefficients. Thereafter, they are fed into a Radial Basis Functions (RBF) neural network for classification. The conducted experiments were validated using Physionet databases. Critical issues like stability over time, ability to reject impostors and generalization to other datasets were also addressed.

Key-Words: - biometric; Daubechies wavelet; electrocardiogram; radial basis functions neural network.

1 Introduction
Recently, the electrocardiogram (ECG) has emerged as a new biometric trait. An ECG is a recording of the electrical activity of the heart. It has been used as a powerful diagnostic tool for decades. Its validity for biometrics is supported by the fact that the physiological and geometrical differences of the heart in different subjects reveal a certain uniqueness in their ECG signals [1, 2]. Its low cost, the difficulty of falsifying or spoofing it, and its ability to provide an aliveness indication are the main advantages of utilizing ECG in biometrics [1].

Each heartbeat is represented in an ECG trace as three consecutive waves, namely the P, QRS and T waves. ECG based identification/verification systems can be categorized as fiducial or non-fiducial systems according to the considered features [1, 2]. Fiducial systems utilize fiducial features that represent the temporal and amplitude distances between fiducial points, along with angle features. Hence, fiducial features require the detection of 11 fiducial points from the P, QRS and T waves of each heartbeat. These 11 fiducial points include three peak points (P, R and T), two valleys (Q and S) and the six onsets and offsets of the three waves [1, 2]. Consequently, fiducial features rely significantly on the accuracy of the fiducial detection process, which is a big challenge by itself, especially for the onsets and offsets of the three waves, which are susceptible to noise. Moreover, there is no universally acknowledged rule for defining exactly where the wave boundaries lie [1].

On the other hand, non-fiducial systems usually investigate the ECG frequency content. For example, non-fiducial features can be wavelet coefficients, discrete cosine transform coefficients, etc. A potential benefit of utilizing non-fiducial features is that they need only the detection of the R peak, which is considered the easiest point to detect due to its strong sharpness, and for some approaches no detection is needed at all [2].

In this paper, a non-fiducial identification system based on wavelets and a radial basis functions (RBF) neural network is proposed. A 5-level discrete wavelet decomposition is applied to the RR intervals using the Daubechies wavelet (db8). The uninformative coefficients are removed from the resulting wavelet coefficients to reduce their dimension. The reduced coefficients are fed into an RBF classifier for classification. Our experiments were carried out using four Physionet datasets [3-6], and the evaluation was based on measured quantities such as subject identification (SI), heartbeat recognition (HR), false acceptance/false rejection rate (FAR/FRR) and generalizability to other datasets.

2 Related Work
In this section, a brief survey of some key published results related to non-fiducial systems is presented.

Wan et al. [7] proposed a verification system that utilized a set of 40 heartbeats extracted for each subject. The 40 heartbeats (RR intervals) were reduced to a set of 10 heartbeats by averaging every 4 heartbeats. Each of them was decomposed into 256 biorthogonal wavelet coefficients. These coefficients were fed into a 3-layer feed-forward neural network. The system was trained on a database of 23 persons, and the experiments showed that all of them were successfully verified.

Another verification system was proposed by Belgacem et al. [8], where a 5-level discrete wavelet transform using the db3 wavelet from the Daubechies family was applied to averaged R-R cycles. Then the random forests method was used as the verification mechanism. A 100% verification rate was achieved with a dataset of 80 subjects.

Chiu et al. [9] constructed input signal segments by concatenating data points from the 43rd point before the R peak to the 84th point after it, taken from four heartbeats, yielding input segments of 512 points each. Subsequently, these segments were decomposed using a 9-level Haar wavelet. The system was trained on a database of 35 subjects and 10 arrhythmia patients. A 100% verification rate was obtained for normal subjects and 81% for arrhythmia patients, using Euclidean distance as the criterion for verification.

Wang et al. [1] utilized the non-zero coefficients of the discrete cosine transform (DCT) of the autocorrelated heartbeat signals after removing noise with a band-pass filter. The Euclidean distance was employed for discrimination. The system was tested on 13 subjects from the PTB database and generalized to 14 subjects from the MIT_BIH database. The subject identification accuracy was 100% for both, and the window recognition accuracy was 94.4% for PTB and 97.8% for MIT_BIH.

A polynomial distance measure (PDM) method for ECG based biometric authentication was introduced by Sufi et al. [10]. For each heartbeat, the three complexes P, QRS and T were approximated by a polynomial equation and the coefficients were stored. A match between two beats was achieved when the Euclidean distance between their feature vectors (the polynomial coefficients of the three waves) was below a certain threshold. The system was tested on a database of 15 subjects with 100% subject identification accuracy.

Except for the system proposed by Sufi et al. [10], the above systems have the advantage of relaxing the fiducial detection process to include only the R peak, and sometimes no fiducial detection is needed at all. However, non-fiducial approaches to feature extraction usually yield high-dimensional feature vectors, which may include redundant or irrelevant information for the task at hand. Moreover, crucial issues like stability over time, generalization to other datasets and response to intruders were not examined by most of the existing systems.


3 Methodology

The proposed system can be broken down into three main steps: 1) pre-processing; 2) feature extraction & reduction; 3) classification.

3.1 Preprocessing
For noise reduction and baseline removal, a second-order Butterworth band-pass filter with cutoff frequencies of 1 and 40 Hz was utilized. Thereafter, the Pan and Tompkins algorithm [11] was applied for R peak detection. Subsequently, R-R cycles were extracted from each record and interpolated to the same length of 128 points. The amplitude of all points in each R-R cycle was normalized by the value of the R peak into the range [0, 1]. Finally, from each training record, 40 cycles were randomly chosen. Every four cycles were averaged into one in order to reduce signal variations, yielding 10 averaged cycles per subject for training.

3.2 Feature extraction & reduction
A 5-level discrete wavelet decomposition was applied to the selected RR cycles using the Daubechies wavelet 'db8' [12]. At the first level, the spectrum of each RR cycle is decomposed into a low frequency region (approximation part) and a high frequency region (details part). Thereafter, at each level, the approximation of the previous level is further decomposed into new approximation and details regions, and so on until the last level. The resulting wavelet coefficient structure consists of six parts: five parts for the details coefficients derived at each level and one part for the coefficients of the remaining approximation region at the last level. Fig. 1 clarifies the discrete wavelet decomposition levels and the resulting wavelet coefficient structure. Meanwhile, fig. 2 (a and b) shows an original averaged R-R cycle and its corresponding structure.

As shown in fig. 1, the coefficients of the d1 and d2 parts account for approximately 57% of the wavelet coefficient structure (114 of 201 coefficients). However, most of them are zero or near zero. This can be explained by the fact that the frequency content of an ECG signal after filtering is concentrated in low frequencies [1-40 Hz]. Hence, excluding these coefficients was investigated in this study.
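The sizes quoted for the wavelet structure (201 coefficients in six parts, with d1 and d2 making up about 57%) can be checked with a little arithmetic. The sketch below is not the authors' code. It assumes a non-periodic boundary extension, for which each decomposition level maps n samples to floor((n + L - 1) / 2) coefficients, where L = 16 is the db8 filter length. The function name is hypothetical.

```python
def dwt_part_lengths(n_samples, filter_len, levels):
    """Lengths of d1..dL and of the final approximation for a multilevel DWT.

    Assumes a non-periodic (e.g. symmetric) boundary extension, where one
    level maps n samples to floor((n + filter_len - 1) / 2) coefficients.
    """
    details = []
    n = n_samples
    for _ in range(levels):
        n = (n + filter_len - 1) // 2
        details.append(n)  # detail and approximation have equal length
    return details, n


# 128-point RR cycle, db8 (16 taps), 5 levels, as described in the text.
details, approx = dwt_part_lengths(128, 16, 5)
total = sum(details) + approx
removed = details[0] + details[1]  # d1 and d2
reduced = total - removed

print("details d1..d5:", details, "approximation a5:", approx)
print("total:", total, "reduced:", reduced)
print("share of d1 + d2: %.1f%%" % (100.0 * removed / total))
```

Under these assumptions the reduced structure keeps d3, d4, d5 and a5 only, which matches the 87 coefficients used in the experiments.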

Fig. 1. The discrete wavelet decomposition levels and the construction of the resulting wavelet coefficient structure (201 coefficients from 6 parts).

3.3 Classification
The wavelet coefficients of each RR cycle of the training set are fed into the classifier in order to perform the classification task (subject identification). A Radial Basis Functions (RBF) neural network was employed here as the classifier. The RBF network is based on the simple idea that an arbitrary function y(x) can be approximated as the linear superposition of a set of localized basis functions φ(x) [13].

The RBF network is composed of three layers. The input layer has as many nodes as the dimension of the input vector. In the hidden layer, the input vector is transformed by a radial basis activation function (Gaussian function):

$$\varphi_j(x) = \exp\left(-\frac{\lVert x - c_j \rVert^2}{2\sigma_j^2}\right) \tag{1}$$

where ||x - c_j|| denotes the Euclidean distance between the input sample vector x and the center c_j of the Gaussian function of the jth hidden node, and σ_j is the width of that Gaussian. Finally, in the output layer, which has a linear activation function, the kth output is computed by

$$y_k(x) = \sum_{j=1}^{m} w_{kj}\,\varphi_j(x) \tag{2}$$

where w_kj is the weight connecting the jth hidden unit to the kth output unit and m is the number of hidden units [13].

The orthogonal least squares algorithm [14] was utilized for choosing the centres from the training set (500 averaged beats, 10 per subject). Centre selection is a crucial issue in RBF training due to its significant impact on network performance. This algorithm was chosen for its efficiency and because there are very few parameters to be set or randomly initialized [14].
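The hidden and output layers described above can be illustrated with a small forward pass. This is a minimal sketch, not the authors' implementation. It assumes a Gaussian of the form exp(-||x - c||^2 / (2 sigma^2)) with one width shared by all hidden nodes. The centres and weights are toy values, and centre selection by orthogonal least squares is not shown. All function names are hypothetical.

```python
import math


def gaussian_rbf(x, center, sigma):
    """Gaussian response of one hidden node to input vector x."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def rbf_forward(x, centers, sigma, weights):
    """Linear output layer: y_k = sum_j weights[k][j] * phi_j(x)."""
    hidden = [gaussian_rbf(x, c, sigma) for c in centers]
    return [sum(w * h for w, h in zip(row, hidden)) for row in weights]


def classify(x, centers, sigma, weights):
    """Return the index of the largest output and its value."""
    outputs = rbf_forward(x, centers, sigma, weights)
    best = max(range(len(outputs)), key=outputs.__getitem__)
    return best, outputs[best]


# Toy network: two hidden nodes, two output classes (illustrative values).
toy_centers = [[0.0, 0.0], [1.0, 1.0]]
toy_weights = [[1.0, 0.0],   # output 0 listens to hidden node 0
               [0.0, 1.0]]   # output 1 listens to hidden node 1
label, score = classify([0.9, 1.1], toy_centers, 1.0, toy_weights)
print(label, round(score, 3))
```

In the system described here the input would be a 201- or 87-dimensional coefficient vector and there would be one output node per enrolled subject.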

Fig. 2. (a) An averaged RR interval and (b) its wavelet coefficient structure (the bounds of each part are defined by dashed vertical lines).

4 Experiments and Results
This section provides a detailed discussion of the conducted experiments and the achieved results.

4.1 Data preparation
The published Physionet databases PTB [3], MIT_BIH [4, 5] and Fantasia [6] were utilized in this work. These databases provide only one record for each of their healthy subjects, except the PTB database, which has the advantage of providing more than one record for some of its subjects. PTB encompasses ECG records for 50 healthy subjects with durations that vary from 1.5 to 2 minutes (minimum 100 beats); 14 of them have more than one record. Hence, for more reliability and robustness, the PTB database was utilized for training and testing, while the Fantasia database (40 subjects) and MIT_BIH (24 subjects) were utilized for the impostor rejection and generalization tests.

The PTB database was partitioned into four sets: one set for training and three sets for testing. The training set includes 40 beats extracted from each of the 50 records belonging to the 50 considered subjects. Test set 1 includes the remaining beats from those 50 records which were not used for training. Test set 2 includes 14 other records recorded on the same day as the training ones but in different sessions. Finally, Test set 3 includes nine records recorded a few months (or years) after the training records.
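The construction of the training templates (40 randomly chosen cycles per record, resampled to 128 points and averaged in groups of four) can be sketched as follows. This is an illustrative sketch only. It assumes linear interpolation for the resampling, uses synthetic cycles in place of real ECG data, and omits filtering, R peak detection and amplitude normalization. All function names are hypothetical.

```python
import random


def resample_linear(cycle, n_points=128):
    """Linearly interpolate one R-R cycle to a fixed number of samples."""
    last = len(cycle) - 1
    out = []
    for i in range(n_points):
        pos = i * last / (n_points - 1)
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(cycle[lo] * (1.0 - frac) + cycle[hi] * frac)
    return out


def average_in_groups(cycles, group_size=4):
    """Average consecutive groups of equal-length cycles point by point."""
    templates = []
    for start in range(0, len(cycles) - group_size + 1, group_size):
        group = cycles[start:start + group_size]
        templates.append([sum(vals) / group_size for vals in zip(*group)])
    return templates


def build_templates(cycles, n_selected=40, group_size=4, n_points=128, seed=0):
    """Pick cycles at random, resample them, then average in groups."""
    rng = random.Random(seed)
    chosen = rng.sample(cycles, n_selected)
    resampled = [resample_linear(c, n_points) for c in chosen]
    return average_in_groups(resampled, group_size)


# Synthetic stand-in for one record: 100 cycles of varying length.
demo_cycles = [[float(i % 7) for i in range(90 + k % 30)] for k in range(100)]
templates = build_templates(demo_cycles)
print(len(templates), len(templates[0]))
```

With 50 enrolled subjects, ten templates per subject give the 500 averaged beats used to train the classifier.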

4.2 Testing the System using the PTB database
The PTB training and testing sets were processed in the way discussed in section 3. The wavelet coefficients of the training set were fed into an RBF classifier trained with a spread of 1 and a mean squared error (MSE) goal of 7e-5. The number of input nodes is 201, or 87 when the reduced set of coefficients is considered; the optimum number of hidden nodes is 270; and the number of nodes in the output layer is 50 (the number of considered persons). The parameter values and the number of hidden nodes were found empirically during the experiments.

The proposed system was evaluated in terms of Subject Identification accuracy (SI), Heartbeat Recognition accuracy (HR), false acceptance rate (FAR) and false rejection rate (FRR). SI accuracy is defined as the percentage of subjects correctly identified by the system, and HR accuracy is the percentage of heartbeats correctly recognized for each subject. A subject is considered correctly identified if more than half of his/her beats are correctly classified to him/her, and a heartbeat is recognized by majority voting of the classifier outputs. Table 1 shows the SI and HR achieved for test set 1, test set 2 and test set 3 using the full (201 coefficients) and reduced (87 coefficients) wavelet structures. The results revealed that the reduced wavelet structure not only decreases the computational load, but also improves the results.

Table 1: The SI and HR accuracies achieved on the three test sets using the full and reduced wavelet structures.

             Full wavelet structure    Reduced wavelet structure
             SI %      HR %            SI %      HR %
Test set 1   100       98.1            100       99
Test set 2   100       94.7            100       97.4
Test set 3   100       81.75           100       83

False acceptance rate (FAR) is the percentage of access attempts by unauthorized individuals which are nevertheless successful. Meanwhile, false rejection rate (FRR) is the percentage of access attempts by enrolled individuals which are nevertheless rejected. FAR and FRR are typically obtained by adjusting one or more acceptance thresholds: the thresholds are varied, and for each value the FAR and FRR are computed. In this work, two thresholds were employed. The first (call it Θ1) is the value above which a heartbeat is considered classified, while the second (call it Θ2) represents the minimum percentage of correctly classified beats needed for a subject to be considered identified. The set of impostors for computing FAR (64 subjects) was gathered from the MIT_BIH and Fantasia databases. The thresholds Θ1 and Θ2 were adjusted so as to obtain the minimum FAR while keeping the FRR at zero as far as possible. Our results revealed that the optimum values for Θ1 and Θ2 are 0.57 and 80% respectively. Table 2 shows the best values achieved for FAR, FRR and the average HR accuracy for the test sets after thresholding with the full (201 coefficients) and reduced (87 coefficients) wavelet structures. The best result was again achieved with the reduced set of 87 coefficients.

Table 2: The achieved values of HR after thresholding, FRR and FAR.

                            HR %      FRR %     FAR %
Full wavelet structure      95        0         7.8
Reduced wavelet structure   95.9      0         4.6
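The two-threshold decision and the resulting FAR and FRR can be sketched in a few lines. This is one possible reading of the description, not the authors' code. It assumes that each heartbeat yields a winning label and its output value, that a beat counts only if the value exceeds Θ1, and that a subject is accepted when the most voted label covers at least Θ2 of all beats. A genuine subject identified as someone else is counted as a false rejection here, which is also an assumption. Function names and toy data are hypothetical.

```python
from collections import Counter


def identify(beats, theta1=0.57, theta2=0.80):
    """beats: list of (predicted_label, winning_output), one per heartbeat.

    Returns the identified label, or None when the subject is rejected.
    """
    confident = [label for label, score in beats if score > theta1]
    if not confident:
        return None
    label, votes = Counter(confident).most_common(1)[0]
    return label if votes / len(beats) >= theta2 else None


def far_frr(genuine, impostors, theta1=0.57, theta2=0.80):
    """genuine: list of (true_label, beats); impostors: list of beats."""
    false_rejects = sum(
        identify(beats, theta1, theta2) != true for true, beats in genuine
    )
    false_accepts = sum(
        identify(beats, theta1, theta2) is not None for beats in impostors
    )
    return false_accepts / len(impostors), false_rejects / len(genuine)


# Toy data: two enrolled subjects and two impostors, ten beats each.
toy_genuine = [
    (3, [(3, 0.9)] * 9 + [(5, 0.6)]),   # 90% confident votes for label 3
    (7, [(7, 0.9)] * 7 + [(7, 0.4)] * 3),  # only 70% pass theta1
]
toy_impostors = [
    [(1, 0.5)] * 10,    # no beat passes theta1, so rejected
    [(2, 0.95)] * 10,   # confidently mapped to label 2, so accepted
]
far, frr = far_frr(toy_genuine, toy_impostors)
print("FAR:", far, "FRR:", frr)
```

Sweeping theta1 and theta2 over a grid and recording the pair with the lowest FAR at zero FRR would mirror the tuning procedure described in the text.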


4.3 Testing generalization capabilities of the system
Generalization is a crucial issue, since it gives evidence of the ability of the system to maintain its performance when it is trained and tested using other datasets without any change in the algorithms, network structure, or parameters. The parameters and thresholds were fixed to the optimum values achieved with the PTB database. Thereafter, the system was trained and tested using the Fantasia database. The average HR accuracy, FRR and FAR were computed and are summarized in Table 3. The set of impostors for the FAR test in this experiment encompassed subjects of the PTB and MIT_BIH databases.

Table 3: The generalization results with the Fantasia database.

                                                HR %      FRR %     FAR %
Fantasia database (reduced wavelet structure)   95.89     0         5

5 Conclusion
This study proposes an ECG based biometric system that utilizes discrete Daubechies wavelet 'db8' coefficients as features and an RBF neural network as the classifier. A 5-level discrete 'db8' decomposition was applied to RR intervals of 128 points; the resulting wavelet structure was fed into an RBF neural network for classification. Our system was examined using Physionet databases and assessed on the basis of measured quantities such as SI, HR and FAR/FRR. Our experiments revealed that only about 43% of the resulting wavelet structure (87 of 201 coefficients) is sufficient for the task at hand. Moreover, it is worth mentioning that the time needed for processing and classifying 10 averaged heartbeats by our system is 0.05 seconds, and this time can be reduced to 0.032 seconds when the reduced set of 87 coefficients is used (measured on an Intel Core i7 machine running Windows 7 and Matlab 2009b). Finally, our system showed stability over time (Table 1) and the ability to generalize to other datasets.

References:
[1] Y. Wang, F. Agrafioti, D. Hatzinakos, K. Plataniotis, Analysis of Human Electrocardiogram for Biometric Recognition, EURASIP Journal on Advances in Signal Processing, Article ID 148658, 2008. doi:10.1155/2008/148658.
[2] F. Agrafioti, J. Gao, D. Hatzinakos, Heart Biometrics: Theory, Methods and Applications, in Biometrics: Book 3, J. Yang, Ed., Intech, 2011, pp. 199-216.
[3] M. Oeff, H. Koch, R. Bousseljot, D. Kreiseler, The PTB Diagnostic ECG Database (National Metrology Institute of Germany), http://www.physionet.org/physiobank/database/ptbdb/, Accessed 22 Feb 2013.
[4] The MIT-BIH Normal Sinus Rhythm Database, http://www.physionet.org/physiobank/database/nsrdb/, Accessed 22 Feb 2013.
[5] The MIT_BIH Long Term Database, http://www.physionet.org/physiobank/database/ltdb/, Accessed 22 Feb 2013.
[6] The Fantasia Database, http://www.physionet.org/physiobank/database/fantasia/, Accessed 22 Feb 2013.
[7] Y. Wan, J. Yao, A Neural Network to Identify Human Subjects with Electrocardiogram Signals, in Proc. of the World Congress on Engineering and Computer Science 2008, San Francisco, USA, 2008. doi:10.1.1.148.5220.
[8] N. Belgacem, A. Ali, R. Fournier and F. Bereksi-Reguig, ECG based Human Authentication using Wavelets and Random Forests, International Journal on Cryptography and Information Security (IJCIS), Vol. 2, No. 2, 2012, pp. 1-11.
[9] C. Chiu, C. Chuang and C. Hsu, A novel personal identity verification approach using a discrete wavelet transform of the ECG signal, in MUE ’08: Proc. of the 2008 International Conference on Multimedia and Ubiquitous Engineering, Washington, DC, USA, 2008, pp. 201–206.
[10] F. Sufi, I. Khalil, I. Habib, Polynomial distance measurement for ECG based biometric authentication, Security and Communication Networks, Wiley Interscience, 2008. doi:10.1002/sec.76.
[11] J. Pan and W. Tompkins, A Real Time QRS Detection Algorithm, IEEE Transactions on Biomedical Engineering, Vol. 32, No. 3, 1985, pp. 230-236.
[12] M. Misiti, Y. Misiti, G. Oppenheim and J. Poggi, Wavelet Toolbox 4 User Guide, 4.1 ed., The MathWorks, Inc., 2007.
[13] S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd ed., Prentice Hall, 1999.
[14] S. Chen, E. Chng, Regularized Orthogonal Least Squares Algorithm for Constructing Radial Basis Function Networks, International Journal of Control, Vol. 64, No. 5, 1996, pp. 829-837.
