Comparison of Feature Extraction Methods for Brain-Computer Interfaces

P. Ofner1, G. R. Müller-Putz1, C. Neuper1,2, C. Brunner1

1 Institute for Knowledge Discovery, Graz University of Technology, Graz, Austria
2 Department of Psychology, University of Graz, Graz, Austria

Abstract

This paper compares classification accuracies of feature extraction methods (FEMs) as used in sensory motor rhythm (SMR) based Brain-Computer Interfaces (BCIs). Features were extracted offline from 9 subjects and classified with linear discriminant analysis. The following FEMs were compared: adaptive autoregressive parameters, band power, phase locking value, time domain parameters, and Hjorth parameters. FEM parameters were optimized individually with a genetic algorithm in advance. In summary, time domain parameters combined with a bipolar spatial filter yielded the best classification accuracies.

1 Introduction

In most sensory motor rhythm (SMR) based Brain-Computer Interfaces (BCIs), feature extraction methods (FEMs) extract features from preprocessed electroencephalographic (EEG) recordings. Afterwards, these features are classified. In general, different FEMs combined with a specific classifier lead to different classification accuracies; here, we used linear discriminant analysis (LDA) as a binary classifier. Usually, one wants to use the FEM yielding the highest classification accuracy. Therefore, this work compares the following popular FEMs with respect to the achieved classification accuracies: adaptive autoregressive (AAR) parameters [1], bilinear AAR (BAAR) parameters [2], multivariate AAR (MVAAR) parameters [2], band power (BP) [3], phase locking value (PLV) [4], time domain parameters (TDP) [5], and Hjorth parameters [6].

Most FEMs contain meta parameters that must be set before the method can be applied. It is crucial to tune these meta parameters carefully to tap the full potential of these methods. Therefore, all meta parameters were optimized in a subject-specific way with a genetic algorithm (GA) [7].

Previous work compared AAR, BP, and fractal dimension features without optimizing meta parameters [8]. Another study compared a slightly different method set (with and without subject-specific optimization): AAR, BP, Hjorth, TDP, Barlow, Wackermann, and Brain-Rate [9].
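To make one of the listed FEMs concrete, the following is a minimal sketch of Hjorth parameters [6] for a single signal segment. The paper itself gives no formulas; the sketch uses the commonly cited definitions (activity, mobility, complexity). The use of first differences as a derivative approximation, the population variance, and all function names are assumptions made for illustration.

```python
import math


def variance(x):
    """Population variance of a list of samples."""
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / len(x)


def diff(x):
    """First difference, used here as a discrete derivative (assumption)."""
    return [b - a for a, b in zip(x, x[1:])]


def hjorth(x):
    """Return (activity, mobility, complexity) of one signal segment."""
    dx = diff(x)
    ddx = diff(dx)
    activity = variance(x)                         # signal power
    mobility = math.sqrt(variance(dx) / activity)  # relates to mean frequency
    # Complexity is the mobility of the derivative divided by the mobility
    # of the signal; it is close to 1 for a pure sine wave.
    complexity = math.sqrt(variance(ddx) / variance(dx)) / mobility
    return activity, mobility, complexity


# Toy input: a 10 Hz sine sampled at 125 Hz for one second (synthetic data).
fs = 125
sine = [math.sin(2 * math.pi * 10 * n / fs) for n in range(fs)]
activity, mobility, complexity = hjorth(sine)
print(activity, mobility, complexity)
```

In a BCI setting such values would typically be computed per channel over a sliding window; the window length would then be one of the meta parameters to tune.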

2 Test Setup

2.1 EEG Data

Prerecorded data from the BCI competition IV (data set 2A) [10] were used for an offline analysis. In that data set, 22 Ag/AgCl electrodes with inter-electrode distances of 3.5 cm were used. Two sessions from each of 9 participants were recorded. One session comprised 6 runs, each with 48 trials. Originally, four equally distributed classes were recorded in a cue-based paradigm without feedback: motor imagery (MI) of the left hand (class 1), right hand (class 2), both feet (class 3), or tongue (class 4). However, for the analysis in this work, only classes 1 and 2 were used.

Cues were presented on a computer screen as follows: first, a fixation cross appeared together with a short signal tone indicating the start of a trial; at second 2, a cue appeared for 1.25 s (left arrow for class 1, right arrow for class 2) prompting participants to perform the required MI task; at second 6, the fixation cross disappeared and a short break followed.

The EEG signals were originally band-pass filtered from 0.5 Hz to 100 Hz and recorded monopolarly at 250 Hz (left mastoid served as reference, right mastoid as ground). In addition, a 50 Hz notch filter was applied. To reduce the amount of data, signals were low-pass filtered at 55 Hz and downsampled to 125 Hz. Laplace, common average reference (CAR), and bipolar derivations were calculated from the original monopolar data.
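The three derivations computed from the monopolar data can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the sample values are toy numbers, and the Laplacian neighbor set is left to the caller because the paper does not specify it.

```python
def bipolar(a, b):
    """Bipolar derivation: sample-wise difference of two monopolar channels."""
    return [x - y for x, y in zip(a, b)]


def car(channels):
    """Common average reference: subtract the mean over all channels
    from every channel, sample by sample.
    channels: dict mapping channel name -> list of samples."""
    names = list(channels)
    n = len(channels[names[0]])
    mean = [sum(channels[c][t] for c in names) / len(names) for t in range(n)]
    return {c: [channels[c][t] - mean[t] for t in range(n)] for c in names}


def laplace(center, neighbors):
    """Laplacian derivation: center channel minus the mean of its neighbors.
    Which neighbors to use is an assumption left to the caller."""
    return [
        center[t] - sum(nb[t] for nb in neighbors) / len(neighbors)
        for t in range(len(center))
    ]


# Toy monopolar data with two samples per channel (synthetic values).
eeg = {"FC3": [1.0, 2.0], "C3": [3.0, 5.0], "CP3": [2.0, 2.0]}

print(bipolar(eeg["FC3"], eeg["C3"]))              # FC3-C3 derivation
print(car(eeg)["C3"])                              # C3 after CAR
print(laplace(eeg["C3"], [eeg["FC3"], eeg["CP3"]]))  # two-neighbor toy Laplacian
```

All three derivations subtract activity shared between electrodes, which is why they differ from the monopolar recording mainly in how much common signal they remove.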

2.2 Comparison of Feature Extraction Methods

First, FEMs were optimized in the optimization step with a genetic algorithm, and afterwards tested in the evaluation step. Data from session 1 of each subject were used in the optimization step and for training an LDA classifier in the evaluation step. Data from session 2 were used solely for testing the LDA classifier in the evaluation step.

All FEMs used data from the C3 and C4 positions (10-20 system) with a specific spatial filter (monopolar, bipolar, Laplace, CAR). Three types of bipolar spatial filters were used: FC3/4-C3/4, C3/4-CP3/4, and FC3/4-CP3/4. For each subject, the type of bipolar spatial filter with the best fitness score (best classification accuracy) in the optimization step was used. In addition, the PLV FEM used four channels (two channels per hemisphere) in various arrangements, because inter-hemispheric coupling was expected to contain discriminative information [4]. The channel combination leading to the best fitness score in the optimization step was used for further analysis.

Optimization Step. A GA optimized the meta parameters of a given FEM. To this end, each individual of the GA represented one set of meta parameters, and features were extracted from session 1 according to these parameters. The classification accuracies over the course of the trial were calculated with a 10 × 5 cross-validation procedure using an LDA classifier. Finally, the 0.9 quantile of these classification accuracies between the cue and the end of the trial was used as the fitness score; the 0.9 quantile is more robust than, e.g., the maximum. The GA maximized this fitness score.

Evaluation Step. Features were extracted using the optimized meta parameters from the previous step. An LDA classifier was trained with features from session 1 of a subject and tested against features from session 2 of the same subject. The 0.9 quantile of the classification accuracy reached by the LDA classifier (between cue and end of trial) was used as the final classification accuracy reported for that FEM.

Thus, the classification accuracy was evaluated for each combination of method, spatial filter, and subject.
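The fitness score described above (the 0.9 quantile of the accuracy time course between cue and end of trial) can be sketched as below. The default values for the sampling rate (125 Hz), cue onset (second 2), and trial end (second 6) are taken from Section 2.1; the linear-interpolation quantile rule and the function names are assumptions, since the paper does not state which quantile estimator was used.

```python
def quantile(values, q):
    """Quantile with linear interpolation between order statistics
    (one common convention; the paper does not specify its estimator)."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def fitness(accuracy, fs=125, t_cue=2.0, t_end=6.0, q=0.9):
    """Fitness of one parameter set.
    accuracy: one cross-validated accuracy value per sample of the trial."""
    start = int(round(t_cue * fs))
    stop = int(round(t_end * fs))
    return quantile(accuracy[start:stop], q)


# Toy accuracy time course at 1 sample per second (synthetic values).
toy = [0.5, 0.5, 0.6, 0.7, 0.8, 0.9, 0.5]
print(fitness(toy, fs=1))
```

Using a high quantile instead of the maximum means that a single spuriously high accuracy value within the trial cannot dominate the score.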

3 Results

3.1 Descriptive Statistics

Figure 1 shows a box-and-whisker plot including mean values (dotted lines). Methods are sorted from left to right by their mean classification accuracies, and only spatial filters yielding the highest mean classification accuracy are shown. TDP with a bipolar spatial filter reaches the highest mean classification accuracy of 78 % (standard deviation 11 %, median 82 %). MVAAR with a bipolar spatial filter reaches the highest median classification accuracy of 83 % (standard deviation 13 %, mean 74 %). A bipolar spatial filter yields the highest mean (and median) classification accuracies for AAR, BP, Hjorth, MVAAR and TDP. Figure 2 shows mean values and standard deviations of all FEMs and spatial filters. Bipolar and Laplacian filters yield higher mean classification accuracies than monopolar and CAR spatial filters – except for PLV, where the opposite is true.

[Figures 1 and 2: y-axis, classification accuracy [%]; x-axis, feature extraction method (AAR, BAAR, BP, HJORTH, MVAAR, PLV, TDP); legend: bipolar, CAR, Laplace, monopolar.]

Figure 1: Box-and-whisker plot of classification accuracies when the spatial filter with the highest mean accuracy for each FEM is used. Additionally, the dotted line with square markers indicates the mean classification accuracies.

Figure 2: Mean values and standard deviations of classification accuracies for all FEMs and all spatial filters.

3.2 Inferential Statistics

A two-way repeated measures ANOVA was applied to test for significant effects of the factors SPATIALFILTER (F(3,24) = 8.63) and FEATURE-EXTRACTION-METHOD (F(6,48) = 8.20) on the classification accuracy (dependent variable). If the sphericity assumption was violated, p-values were corrected according to Huynh & Feldt. Both main effects and the interaction effect show significant p-values (p < 0.05). Tukey's test was used to test for significant differences in the group mean values of the groups shown in Figure 1 (FEMs with their best spatial filter) and of the spatial filters when using TDP. PLV (CAR) differs significantly from TDP (bipolar) and BP (bipolar). A monopolar spatial filter is significantly worse than bipolar and Laplacian filters when using the method with the highest mean classification accuracy (TDP).
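The degrees of freedom of the two reported F statistics follow from the design (4 spatial filters, 7 FEMs, 9 subjects). A short check, using the standard rule for a within-subject factor and a hypothetical helper name:

```python
def rm_anova_df(levels, n_subjects):
    """Degrees of freedom for a within-subject main effect:
    effect df = levels - 1, error df = (levels - 1) * (subjects - 1)."""
    df_effect = levels - 1
    df_error = df_effect * (n_subjects - 1)
    return df_effect, df_error


print(rm_anova_df(4, 9))  # spatial filter: (3, 24)
print(rm_anova_df(7, 9))  # feature extraction method: (6, 48)
```

Both results match the subscripts of the reported F values, which is consistent with all 9 subjects contributing to every combination of method and spatial filter.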

4 Discussion

No significant differences in the classification accuracies between AAR, BP, Hjorth, MVAAR, TDP, and BAAR were found when using the best (mean/median) spatial filter for each method. These results are corroborated by Vidaurre et al. [5], who also found no significant differences between AAR, BP, Hjorth, and TDP when using subject-specific optimization. However, PLV is significantly worse than BP and TDP. One possible reason is that neuronal assemblies synchronize with each other while they process information, and each assembly shows a decrease in power at the same time (event-related desynchronization, ERD [11]). This ERD makes PLV prone to noise.

Furthermore, a significant difference between monopolar and bipolar/Laplacian filters was found using the FEM with the highest mean classification accuracy (TDP). This is because bipolar and Laplacian filters, as opposed to a monopolar filter, eliminate noise common to the electrodes used. Figure 2 suggests that bipolar filters reach higher classification accuracies than Laplacian filters. Areas producing ERD patterns due to MI [11] are located anterior to C3/C4 and can be covered better by FC3/4-C3/4 bipolar filters than by Laplacian filters focused directly on C3/C4. Thus, in this setup, bipolar filters yielded higher classification accuracies.

The results are only valid when using an LDA classifier, because in general, the performance of a classifier depends on the distribution of the input data, which in turn depends on the FEM. However, LDA is widely used in the field of BCI research, and thus these results are useful for many BCI implementations. To get meaningful results, it is absolutely necessary that the final testing is performed on unseen data to avoid underestimating the error. Therefore, session 2 of a subject was only used in the evaluation step. Furthermore, one must keep in mind that a GA is a meta-heuristic optimization method and therefore likely finds a good solution, but not necessarily a global optimum.
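The train-on-session-1, test-on-session-2 protocol with a binary LDA classifier can be sketched as follows. This is a minimal two-feature illustration, not the authors' implementation: the function names are hypothetical, the feature values are synthetic toy numbers, and equal class priors are assumed (the classes in the data set are equally distributed).

```python
def mean_vec(rows):
    """Mean of a list of 2-D feature vectors."""
    return [sum(r[j] for r in rows) / len(rows) for j in range(2)]


def scatter(rows, mu):
    """2x2 scatter matrix of the rows around mu."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mu[0], r[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def train_lda(class0, class1):
    """Fit w, b so that w.x + b > 0 predicts class 1 (equal priors assumed)."""
    m0, m1 = mean_vec(class0), mean_vec(class1)
    s0, s1 = scatter(class0, m0), scatter(class1, m1)
    n = len(class0) + len(class1) - 2
    # Pooled covariance shared by both classes, then its 2x2 inverse.
    c = [[(s0[i][j] + s1[i][j]) / n for j in range(2)] for i in range(2)]
    det = c[0][0] * c[1][1] - c[0][1] * c[1][0]
    inv = [[c[1][1] / det, -c[0][1] / det], [-c[1][0] / det, c[0][0] / det]]
    d = [m1[0] - m0[0], m1[1] - m0[1]]
    w = [inv[0][0] * d[0] + inv[0][1] * d[1], inv[1][0] * d[0] + inv[1][1] * d[1]]
    mid = [(m0[0] + m1[0]) / 2, (m0[1] + m1[1]) / 2]
    b = -(w[0] * mid[0] + w[1] * mid[1])
    return w, b


def predict(model, x):
    w, b = model
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0


def accuracy(model, class0, class1):
    hits = sum(predict(model, x) == 0 for x in class0)
    hits += sum(predict(model, x) == 1 for x in class1)
    return hits / (len(class0) + len(class1))


# Synthetic "session 1" features used only for training.
train0 = [(1.0, 2.0), (2.0, 1.0), (1.5, 1.5), (2.0, 2.5)]
train1 = [(4.0, 5.0), (5.0, 4.0), (4.5, 5.5), (5.5, 4.5)]
# Synthetic "session 2" features, never seen during training.
test0 = [(1.0, 1.0), (2.0, 2.0)]
test1 = [(5.0, 5.0), (4.0, 6.0)]

model = train_lda(train0, train1)
print(accuracy(model, test0, test1))
```

Keeping the test features out of `train_lda` entirely is what makes the reported accuracy an estimate on unseen data.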

5 Conclusion

No significant differences were found between TDP, BP, Hjorth, MVAAR, AAR, and BAAR. However, TDP with a bipolar spatial filter yielded the highest mean classification accuracy and a high median classification accuracy, is computationally efficient, and has fewer parameters to set (less need for subject-specific optimization). It is therefore the preferable choice among the compared feature extraction methods. This conclusion is restricted to the use of an LDA classifier, a continuous MI task of left hand versus right hand, and a small number of electrodes.

References

[1] A. Isaksson, A. Wennberg, and L. H. Zetterberg. Computer analysis of EEG signals with parametric models. Proceedings of the IEEE, 69:451–461, 1981.
[2] Clemens Brunner, Martin Billinger, and Christa Neuper. A comparison of univariate, multivariate, bilinear autoregressive, and bandpower features for brain-computer interfaces. In Fourth International BCI Meeting, Asilomar, 2010.
[3] G. Pfurtscheller, D. Flotzinger, M. Pregenzer, J. R. Wolpaw, and D. J. McFarland. EEG-based brain computer interface (BCI) - search for optimal electrode positions and frequency components. Medical Progress Through Technology, 21:111–121, 1995.
[4] Clemens Brunner, Reinhold Scherer, Bernhard Graimann, Gernot Supp, and Gert Pfurtscheller. Online control of a brain-computer interface using phase synchronization. IEEE Transactions on Biomedical Engineering, 53(12):2501–2506, December 2006.
[5] Carmen Vidaurre, Nicole Krämer, Benjamin Blankertz, and Alois Schlögl. Time domain parameters as a feature for EEG-based brain–computer interfaces. Neural Networks, 22(9):1313–1319, November 2009.
[6] B. Hjorth. EEG analysis based on time domain properties. Electroencephalography and Clinical Neurophysiology, 29(3):306–310, September 1970.
[7] John Holland. Adaptation in Natural and Artificial Systems. University of Michigan Press (reprinted in 1992 by The MIT Press), 1975.
[8] R. Boostani, B. Graimann, M. H. Moradi, and G. Pfurtscheller. A comparison approach toward finding the best feature and classifier in cue-based BCI. Medical and Biological Engineering and Computing, 45:403–412, 2007.
[9] Carmen Vidaurre and Alois Schlögl. Comparison of adaptive features with linear discriminant classifier for brain computer interfaces. In 30th Annual International IEEE EMBS Conference, Vancouver, 2008.
[10] Clemens Brunner, Robert Leeb, Gernot Müller-Putz, Alois Schlögl, and Gert Pfurtscheller. BCI competition 2008 - Graz data set A. http://www.bbci.de/competition/iv/desc_2a.pdf, 2008.
[11] Gert Pfurtscheller and Fernando H. Lopes da Silva. Event-related EEG/MEG synchronization and desynchronization: basic principles. Clinical Neurophysiology, 110(11):1842–1857, 1999.
