Feature extraction and selection algorithms in biomedical data classifiers based on time-frequency and principal component analysis

P. S. Kostka¹, E. J. Tkacz¹

¹ Institute of Electronics, Div. of Microelectronics and Biotechnology, Silesian University of Technology, Gliwice, Poland

Abstract— Methods for the feature extraction and selection stages of a biomedical pattern recognition system are presented. Time-frequency signal analysis based on the adaptive wavelet transform and Principal Component Analysis (PCA) is used to extract and select from the original data the input features that are most predictive for a given outcome. From the discrete fast wavelet transform coefficients, an optimal feature set based on the energy and entropy of the wavelet components is created. PCA is then used to reduce this feature group by creating the most representative parameter subset for the given problem, which forms the input to the final neural classifier stage. The system was verified on a set of clinically classified ECG signals of control and atrial fibrillation (AF) patients taken from the MIT-BIH database. Specificity and sensitivity, computed for a set of 20 AF patients and 20 control patients divided into learning and verification subsets, were used to evaluate the presented pattern recognition structure. Different types of basic wavelet function for the feature extraction stage, as well as supervised (multilayer perceptron) and unsupervised (self-organizing map) neural network classification units, were tested to find the best system structure.


Keywords— feature extraction, feature selection, principal component analysis, pattern recognition, wavelet transform.

I. INTRODUCTION

The pattern recognition system structure (fig.1), after the preliminary data preparation parts, consists of two major stages [1]:
• Feature extraction and selection
• Classification
The pattern to be recognized is first converted into a set of features believed to carry the class identity of the pattern, and this feature set is then classified as one of the possible classes. To achieve high recognition accuracy, the feature extractor is required to discover salient characteristics suited for classification, and the classifier is required to set class boundaries accurately in the feature space [1]. Progress in sensor technology and data management allows researchers to gather data sets of ever increasing size, particularly with respect to the number of variables. However, the incremental informative content of such variables is not always significant. This problem may undermine the success of machine learning, which is strongly affected by data quality: redundant, noisy or unreliable information may impair the learning process. The proposed feature extraction tools must almost always depend on the specificity of the classification task, so as to be sensitive to features able to distinguish between healthy and pathological cases. The application field of the presented multi-domain feature extraction and selection is the detection of atrial fibrillation (AF), a supraventricular tachyarrhythmia characterized by uncoordinated atrial activation with consequent deterioration of atrial mechanical function.
The purpose of feature or variable selection is to eliminate irrelevant variables in order to enhance the generalization performance of a given learning algorithm. The selection of relevant variables may also provide some insight into the concept to be learned. Other advantages of feature selection include a reduced cost of data gathering and storage and a computational speed-up [2]. In this paper we investigate the efficiency of criteria derived from principal component analysis for variable selection in application to classification problems.

Fig. 1 Pattern recognition system structure.



Fig. 2 Feature extraction stage based on ECG signal analysis in time (T), frequency (F) and mixed T-F domains.


II. METHODS

A. Feature extraction

Before the selection of the most representative features (Section II.B), a set of parameters characteristic for atrial fibrillation (AF) detection was composed from the time, frequency and mixed time-frequency domains (fig.2) [3]. This multi-domain feature set covers a wide spectrum of possible manifestations of AF activity:
1. Time domain: duration of atrial activation (P-wave time) (tP).
2. Frequency domain: frequency of oscillations after ventricular activity cancellation (FAF).
3. Mixed T-F domain: T-F analysis carried out by the fast Mallat wavelet decomposition was used to compute the following parameters, based on energy and entropy, which measure the information contained in every frequency sub-band of the j-th level of the Mallat decomposition (fig.3) [4],[5],[6]:


Fig. 3 Multilevel Mallat decomposition components (details d3-d7, together with the original ECG signal) of ECG lead II of a patient with AF (fig.1).

• Energy of the wavelet component:

$$E_{1,j}\{c_{i,j}\} = (c_{i,j})^2 \;\Rightarrow\; E_{1,j}\{s(n)\} = \sum_i (c_{i,j})^2$$

• The (non-normalized) Shannon entropy:

$$E_{2,j}\{c_{i,j}\} = -(c_{i,j})^2 \log (c_{i,j})^2 \;\Rightarrow\; E_{2,j}\{s(n)\} = -\sum_i \left[ (c_{i,j})^2 \log (c_{i,j})^2 \right]$$
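As an illustrative aside (not part of the original paper), the per-level energy and entropy features defined above could be computed along the following lines. The sketch assumes the PyWavelets library, a db5 basis (one of the wavelets listed in Table 1) and a 7-level decomposition; signal loading and preprocessing are omitted.

```python
# Hedged sketch: per-level energy and Shannon-entropy features from a DWT,
# following the E1,j / E2,j definitions above. Assumes PyWavelets (pywt) and
# numpy; the 'db5' basis and 7-level depth are illustrative choices only.
import numpy as np
import pywt

def wavelet_features(ecg, wavelet="db5", level=7):
    """Return per-level (energy, entropy) features for details d1..d<level>."""
    # wavedec returns [a_level, d_level, d_level-1, ..., d_1]
    coeffs = pywt.wavedec(ecg, wavelet, level=level)
    features = []
    for c in coeffs[1:]:                      # detail coefficients only
        c2 = np.square(c)
        energy = c2.sum()                     # E1,j = sum_i (c_ij)^2
        # E2,j = -sum_i (c_ij)^2 * log((c_ij)^2); small eps guards log(0)
        entropy = -np.sum(c2 * np.log(c2 + np.finfo(float).eps))
        features.extend([energy, entropy])
    return np.asarray(features)

# Example with a synthetic signal standing in for an ECG lead:
# x = np.random.randn(5000)
# f = wavelet_features(x)
```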

A full list of the proposed T-F parameters is included in [7].

B. Feature selection

There are two approaches to the feature selection problem:
I. Feature subset selection.
II. Feature projection, which seeks an optimal combination of the original features (a projection into a new domain) forming a smaller set of new features. Principal component analysis (PCA) [8] and projection pursuit [9] are frequently used feature projection methods [10].
In the presented algorithm, the most relevant features were obtained as an arbitrary number of the leading principal components computed for the multi-domain features (from Section II.A) characteristic for AF detection.

Principal Component Analysis: PCA realizes a linear mapping of the input data (an M-dimensional space) into a new feature space (of dimension L). In pattern recognition tasks it enables the elimination of uncorrelated noise and linear dependences in the data.


Every feature vector $v^{(j)} = [v_1^{(j)}, v_2^{(j)}, \ldots, v_M^{(j)}]^T$, where $v^{(j)} \in V \subset \mathbb{R}^M$, taken from a data set $\Gamma = [v^{(1)}, \ldots, v^{(P)}]$ consisting of $P$ feature vectors, is mapped onto a reduced, $L$-dimensional feature vector $z^{(j)} = [z_1^{(j)}, z_2^{(j)}, \ldots, z_L^{(j)}]^T$, where $z^{(j)} \in Z \subset \mathbb{R}^L$ and $L < M$, so as to fulfil the MMSE minimization criterion.

According to [11], each vector $v^{(j)}$ can be approximated by the following sum of reduced dimension:

$$\tilde{v}^{(j)} = \sum_{i=1}^{L} z_i^{(j)} u_i + \sum_{i=L+1}^{M} b_i u_i$$

To choose the orthonormal basis vectors $u_i$ and the set of coefficients $b_i$ that give the best approximation of every feature vector, the sum of squared errors over the whole data set is formed:

$$E_L = \frac{1}{2} \sum_{j=1}^{P} \left\| v^{(j)} - \tilde{v}^{(j)} \right\|^2 = \frac{1}{2} \sum_{j=1}^{P} \sum_{i=L+1}^{M} \left( z_i^{(j)} - b_i \right)^2$$

After some manipulations [11], the minimum of the measure $E_L$ is reached for basis vectors of the form

$$\Sigma_v u_i = \lambda_i u_i,$$

where $\Sigma_v$ is the covariance matrix of the learning set of feature vectors $v$, $u_i$ are its eigenvectors and $\lambda_i$ the corresponding eigenvalues.

PCA with the theoretical background outlined above was carried out in practice according to the algorithm presented in fig.4. Neural classification of the newly selected feature vector F2 is the last stage of the pattern classifier (fig.1), realized by:
• a supervised multilayer perceptron (MLP),
• an unsupervised structure of Kohonen self-organizing maps (SOMs).

Fig. 4 PCA algorithm structure.
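To make the projection step concrete, the following is a minimal numpy sketch (not the authors' implementation and not a reproduction of the fig.4 algorithm) of the eigen-decomposition and mapping described by the equations above; mean-centring and the choice of L are assumptions made for illustration.

```python
# Hedged sketch of the PCA feature projection described above:
# eigen-decompose the covariance matrix Sigma_v of the learning set and
# project each M-dimensional feature vector onto the L leading eigenvectors.
import numpy as np

def pca_project(V, L):
    """V: (P, M) matrix of P feature vectors; returns (P, L) reduced vectors."""
    mean = V.mean(axis=0)
    Vc = V - mean                              # centre the learning set
    sigma = np.cov(Vc, rowvar=False)           # covariance matrix Sigma_v
    eigvals, eigvecs = np.linalg.eigh(sigma)   # Sigma_v u_i = lambda_i u_i
    order = np.argsort(eigvals)[::-1]          # sort by decreasing lambda_i
    U = eigvecs[:, order[:L]]                  # L most principal directions
    Z = Vc @ U                                 # z^(j) = U^T (v^(j) - mean)
    return Z, U, mean
```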

III. RESULTS

To verify the presented method, ECG signals containing AF episodes, taken from the MIT-BIH database, were tested. The whole data set, consisting of 40 cases with long-term ECG recordings, was divided into a learning set and a verification set. The performance of the presented pattern recognition system was evaluated with the classical measures of classifier sensitivity and specificity. The first group of tests (Table 1) was carried out for different types of basic wavelet function in the feature extraction stage (T-F analysis) and for both supervised (multilayer perceptron) and unsupervised (self-organizing map) neural structures used for the final feature vector classification.
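For illustration only, a sketch of how such a verification run could be scripted, assuming scikit-learn; the MLP size, the 50/50 learning/verification split and the variable names are assumptions, and the SOM variant is not shown.

```python
# Hedged sketch of the verification step: train an MLP on the PCA-reduced
# features and report sensitivity/specificity. scikit-learn is assumed; the
# network size, data split and feature matrices are illustrative only.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import confusion_matrix

def evaluate(Z, y):
    """Z: (n_cases, L) reduced features; y: 1 for AF, 0 for control."""
    Z_train, Z_test, y_train, y_test = train_test_split(
        Z, y, test_size=0.5, stratify=y, random_state=0)
    clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000,
                        random_state=0).fit(Z_train, y_train)
    tn, fp, fn, tp = confusion_matrix(y_test, clf.predict(Z_test)).ravel()
    sensitivity = tp / (tp + fn)    # true positive rate
    specificity = tn / (tn + fp)    # true negative rate
    return sensitivity, specificity
```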



Fig. 5 Comparison of the feature selection stage realized with the PCA-based method and with a different approach using the feature Discriminity Measure (DM) [3], which expresses the separability of a given feature, for SOM and MLP classifier structures (sensitivity and specificity in %).


Table 1 Comparison of AF detection results for two structures of the neural classifier part (multilayer perceptron, MLP, and Kohonen self-organizing maps, SOMs) and two types of basic wavelet function in the feature extraction stage.

Neural network classifier part type                            Sensitivity [%]  Specificity [%]
MLP  + feature extraction using db5 wavelet + PCA                     92               90
SOMs + feature extraction using db5 wavelet + PCA                     81               80
MLP  + feature extraction using bior2.1 wavelet + PCA                 90               86
SOMs + feature extraction using bior2.1 wavelet + PCA                 80               80
MLP classifier, no preliminary feature extraction/selection           65               60
SOMs classifier, no preliminary feature extraction/selection          70               71

The second group of verification tests (fig.5) compared the use of PCA in the feature selection stage with the parameter selection algorithm presented in our previous work [3], which is based on computing the Discriminity Measure of the i-th feature, expressing its separability.

IV. CONCLUSIONS

After feature extraction from the time (T), frequency (F) and mixed T-F domains, Principal Component Analysis was used to transform the extracted features into a new space of reduced size. The presented article focuses on PCA used to reveal the features with the greatest weight in the classification process, which allowed an optimal feature subset to be selected from the different-domain T-F features. For the optimal structure, the atrial fibrillation detector tests gave a classifier sensitivity S = 92% and a specificity SP = 90% for AF with different degrees of organization (atrial flutter, AF1, AF2 and AF3). PCA used in the feature selection stage gave better results than the other type of feature selection, based on the per-feature Discriminity Measure.
To conclude, the obtained results showed that, before a pattern classifier can be properly designed and effectively used, it is necessary to consider the feature extraction and data reduction problems. Feature extraction should consist in choosing those features which are most effective at preserving class separability. Principal Component Analysis proved to be an effective tool for selecting the most representative features, improving the whole classification process. The presented classification procedure gave satisfactory results, so the described algorithm can be considered a contribution to atrial fibrillation detection at the preliminary screening examination stage.


REFERENCES

1. Duda R.O., Hart P.E. (1973) Pattern classification and scene analysis. John Wiley & Sons, New York
2. Rakotomamonjy A. (2003) Variable selection using SVM-based criteria. Journal of Machine Learning Research 3:1357-1370
3. Kostka P.S., Tkacz E.J. (2006) Hybrid feature vector creation for atrial fibrillation detection improvement. Proc. of World Congress on Medical Physics and Biomedical Engineering, Seoul
4. Mallat S. (1989) A theory for multiresolution signal decomposition: the wavelet representation. IEEE Trans. on Pattern Analysis and Machine Intelligence 11(7):674-693
5. Akay M. Time-frequency and wavelet analysis. IEEE EMB Magazine 14(2)
6. Thakor N.V., Sherman D.L. (1994) Biomedical problems in time-frequency-scale analysis - new challenges. Proceedings of the IEEE-SP, pp 536-539
7. Kostka P.S., Tkacz E.J. (2005) Feature extraction optimization in neural classifier of heart rate variability signals. Proceedings of the 4th International Conference on Computer Recognition Systems CORES 2005, Advances in Soft Computing, Springer-Verlag, pp 585-594
8. Karhunen J., Joutsensalo J. (1995) Generalizations of principal component analysis, optimization problems, and neural networks. Neural Networks 8(4):549-562
9. Friedman J.H., Tukey J.W. (1974) A projection pursuit algorithm for exploratory data analysis. IEEE Trans. on Computers 23:881-889
10. Fukunaga K. (1990) Introduction to statistical pattern recognition. 2nd edition. Academic Press, San Diego, CA
11. Bishop C.M. (1996) Neural Networks for Pattern Recognition. Oxford University Press, New York

Corresponding author:
Author: Pawel Kostka
Institute: Institute of Electronics, Silesian University of Technology
Street: Akademicka 16
City: Gliwice
Country: Poland
Email: [email protected]
