IEEE TRANSACTIONS ON NEURAL SYSTEMS AND REHABILITATION ENGINEERING, VOL. 13, NO. 4, DECEMBER 2005


A Time-Series Prediction Approach for Feature Extraction in a Brain–Computer Interface

Damien Coyle, Member, IEEE, Girijesh Prasad, Member, IEEE, and Thomas Martin McGinnity, Member, IEEE

Abstract—This paper presents a feature extraction procedure (FEP) for a brain–computer interface (BCI) application where features are extracted from the electroencephalogram (EEG) recorded from subjects performing right and left motor imagery. Two neural networks (NNs) are trained to perform one-step-ahead predictions for the EEG time-series data, where one NN is trained on right motor imagery and the other on left motor imagery. Features are derived from the power (mean square) of the prediction error or the power of the predicted signals. All features are calculated from a window through which all predicted signals pass. Separability of features is achieved due to the morphological differences of the EEG signals and each NN's specialization to the type of data on which it is trained. Linear discriminant analysis (LDA) is used for classification. This FEP is tested on three subjects offline and classification accuracy (CA) rates range between 88% and 98%. The approach compares favorably to a well-known adaptive autoregressive (AAR) FEP and also to a linear AAR model-based prediction approach.

Index Terms—Alternative communication, brain–computer interface (BCI), electroencephalogram (EEG), time-series prediction.

I. INTRODUCTION

RESEARCH undertaken over the last 15 years indicates that humans can learn to control certain characteristics of their electroencephalogram (EEG), such as amplitude and frequency rhythms. A person's ability to control their EEG would enable him/her to communicate without the prerequisite of movement. As EEG-based communication does not require neuromuscular control, people with neuromuscular disorders who may have no control over any of their conventional communication channels may still be able to communicate via this approach. EEG-based communication can be realized through a direct brain–computer interface (BCI), which replaces the use of nerves and muscles, and the movements they produce, with electrophysiological signals in conjunction with the hardware and software that translate those signals into actions [1].

An important component of most BCIs is the feature extraction procedure (FEP). This work demonstrates a novel neural network (NN)-based time-series prediction (TSP) FEP. Linear adaptive autoregressive (AAR) prediction models are also tested in a similar framework. Features are derived from the power of the predicted signals or the

Manuscript received March 19, 2004; revised June 03, 2005; accepted July 20, 2005. The work of D. Coyle was supported by a William Flynn Scholarship. The authors are with the Intelligent Systems Engineering Laboratory, School of Computing and Intelligent Systems, Faculty of Engineering, University of Ulster, Derry, Northern Ireland, BT48 7JL, U.K. (e-mail: [email protected]). Digital Object Identifier 10.1109/TNSRE.2005.857690

prediction error signals. Features are extracted and classified after each new prediction is made. A comparison with the well-known AAR-coefficients-based FEP [2], [3] is presented. Results from an offline analysis on three subjects are presented and implications for parameter selection are discussed. It is shown that performance, enumerated based on results obtained from three performance quantifiers, compares favorably to other approaches. Performance is quantified by the classification accuracy (CA) (i.e., the percentage of correct classifications) with 90% confidence intervals (obtained using a t-statistic [4]) and by the information transfer (IT) rate [5]. The latter quantifier considers the CA and the time required to perform classification (CT) of each mental task. A third quantifier of performance for a BCI system is the mutual information (MI) [6]: a measure of the average amount of information the classifier output contains about the input signal when the classifier produces a value D, where the sign of D indicates the class (two-class system) and the magnitude of D expresses the distance to the separating hyperplane. D is referred to as the time-varying signed distance (TSD) [6].

NNs are widely used as classifiers in BCI applications [7]–[11] and have also been utilized to process raw EEG data or features. For example, Haselsteiner and Pfurtscheller [10] use a finite impulse response multilayer perceptron (FIR-MLP) NN to extract temporal information from AAR-based features before classifying. Muller et al. [11] describe an approach for reducing the dimensionality of the raw EEG before classification using auto-associative NNs. Muller et al. and Kohlmorgen et al. [12], [13] describe an approach for segmenting time-series data that exhibits nonstationary switching dynamics, such as sleep EEG data. The approach involves training different NNs to specialize on different subsets of the data, and subsequently these NNs compete for data.
Classification is carried out based on the NN with the lowest prediction error.[1] This approach exhibits some similarity to the approach described in this work, in that both are based on training separate NNs to predict data with different dynamics. In this work, each NN is trained separately, whereas in [12], [13] the NNs are trained simultaneously with a specialized objective function. Also, in this work, features are extracted, either from the error signals or the predicted signals, to form a set of features for each class, after which a classifier is trained to find a separating hyperplane between the classes, whereas in [12], [13] this is not the case. The approach in [12], [13] has not been applied in a BCI.

[1] In [13], hidden Markov models are also utilized to identify the transitions between successive subsets, e.g., the transition between sleep stages I and II.

Section II describes the data acquisition procedure and the data configuration. The TSP FEP based on nonlinear and linear

1534-4320/$20.00 © 2005 IEEE


prediction models is described, as is the AAR-coefficients approach. In Section III, the results are presented. Section IV is a discussion of the results and methods. Section V concludes the paper.

II. METHODS

A. Data Acquisition

The EEG data was recorded by the Graz BCI research group [2], [3], [6], [10], [14]. The data was recorded from three subjects in a timed experimental recording procedure. Each trial was 8–9 s in length. The first 2 s were quiet; at t = 2 s, an acoustic stimulus signified the beginning of a trial and a fixation cross was displayed for 1 s; then, at t = 3 s, an arrow (left or right) was displayed as a cue (the data recorded between 3 and 8 s is considered event related). At the same time, each subject was asked to move a bar in the direction of the cue by imagining moving the left or right hand. The data was recorded over two sessions, each session consisting of 140 trials (subject S1) or 160 trials (subjects S2 and S3).

The recordings were made using a g.tec amplifier[2] and Ag/AgCl electrodes. All signals were sampled at 128 Hz and filtered between 0.5 and 30 Hz. Two bipolar[3] EEG channels were measured using electrodes positioned 2.5 cm posterior and anterior to positions C3 and C4 according to the international standard (10/20 system) electrode positioning nomenclature [15]. For subjects S2 and S3, electromyogram (EMG) recordings from the extensor muscles of both hands and electrooculogram (EOG) recordings made during the last recording session were used to ensure that the BCI system was not controlled by muscle activity or eye movements [3].

B. Data Configuration

The recorded EEG time-series data is structured so that the signal measurements at sample indexes k−(Δ−1)τ through k are used to make a prediction of the signal at sample index k+1. The parameter Δ is the embedding dimension. Each NN training input exemplar contains measurements from both the C3 and C4 time-series and has the following format:

x(k) = [x_C3(k), x_C3(k−τ), …, x_C3(k−(Δ−1)τ), x_C4(k), x_C4(k−τ), …, x_C4(k−(Δ−1)τ)]   (1)

where τ is the time delay.
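The exemplar format of (1) can be sketched as follows; this is a minimal numpy version in which the array inputs, function name, and loop bounds are illustrative assumptions rather than the authors' implementation:

```python
import numpy as np

def make_exemplars(c3, c4, delta, tau):
    """Build one-step-ahead exemplars per (1): the last `delta`
    samples of C3 and C4, spaced `tau` apart, form the input;
    the next sample of each channel is the prediction target.
    c3 and c4 are 1-D numpy arrays holding one trial's samples."""
    span = (delta - 1) * tau
    X, targets = [], []
    for k in range(span, len(c3) - 1):
        idx = [k - i * tau for i in range(delta)]  # k, k-tau, ..., k-(delta-1)tau
        X.append(np.concatenate([c3[idx], c4[idx]]))
        targets.append([c3[k + 1], c4[k + 1]])
    return np.asarray(X), np.asarray(targets)
```

Each exemplar row therefore has 2Δ elements (Δ per channel), and one exemplar is produced per sample once the first (Δ−1)τ samples have elapsed.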
There are many time-series analysis techniques for selecting the best values for Δ and τ [2], [16]. An information-theoretic approach to choosing these parameters is described in [17]; however, an empirical approach is often the most reliable. In this work, the features are directly related to the NN predictions (cf. Section II-D); therefore, the quality of the features is a good indication of the best Δ and τ combination to utilize.[4] All values of Δ ranging from 2 to 8 with τ

[2] See http://www.gtec.at.
[3] In bipolar recording, the recorded voltage is the voltage difference between the anterior and posterior electrode at each recording site.
[4] The quality of the features, i.e., the attributes of the signal which convey the most discriminative information, is verified by the feature separability, which is quantified by the classification accuracy.

ranging from 1 to 2 were tested. For the purposes of this paper, results based on the best-performing combination for each subject are provided. Each trial consists of approximately 640 samples of event-related data (5 s at 128 Hz).

All parameter selection was performed using data from recording session 1. To obtain a general view of the efficacy of the approach, a five-fold cross-validation was performed. Session 1 for each subject was partitioned into a training set (60% of the data), a validation set (20% of the data), which was used for the early-stopping criterion during NN training, and a test set (20% of the data), which was used to calculate the CA rates. For each five-fold cross-validation, this procedure was performed five times using a different test partition (20%) each time. The mean CA calculated from the CA rates obtained on the five test partitions, with its 90% confidence interval, was estimated for each subject and used to select the best system parameters. The 20% validation partitions are only used to prevent the NNs from overfitting the predictions on the raw EEG data. Five-fold cross-validations were repeated until a good set of parameters was obtained, after which all session 1 data is utilized to train new predictor models and to obtain a feature set and classifier using the chosen parameters. The system is tested on session 2, which is considered the final and true test of the system. The AAR approaches are set up using a similar data partitioning strategy (cf. Section II-F).

C. Prediction NNs—Architecture and Training

Two feed-forward MLP-NNs are used to perform prediction. As these NNs perform prediction, they are referred to as pNNs and labeled L for left (LpNN) and R for right (RpNN), corresponding to the type of EEG data on which they are trained. For prediction, the optimum number of hidden layers, the number of neurons in each layer, and the types of transfer function are important parameters, and the best choice depends on the complexity of the data to be learned by the NN.
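The 60/20/20 partitioning per fold could be sketched as follows; the paper specifies only the percentages and that each fold uses a different test partition, so the fixed shuffle and seed here are assumptions:

```python
import numpy as np

def fold_partition(n_trials, fold, n_folds=5, seed=0):
    """Split trial indices into 60% train / 20% validation /
    20% test, with a different 20% test block per fold after one
    fixed shuffle. (The exact shuffling scheme is an assumption.)"""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_trials)
    fold_size = n_trials // n_folds
    test = idx[fold * fold_size:(fold + 1) * fold_size]
    rest = np.setdiff1d(idx, test)   # sorted remaining indices
    val, train = rest[:fold_size], rest[fold_size:]
    return train, val, test
```

For subject S1's 140 trials this yields 84 training, 28 validation, and 28 test trials per fold.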
In this investigation, many different pNN architectures were tested. Neuron numbers were varied between 1 and 12 for single-hidden-layer NNs. Purely linear and tan-sigmoidal transfer functions were tested in all cases, for all subjects. Although multi-hidden-layer NNs were investigated, it was noted that no advantage was gained from the increased architectural complexity. The pNN weights were updated using the Levenberg–Marquardt algorithm [18]. This method allowed fast convergence to a minimum-error plateau but has the disadvantage of being computationally expensive. The quality of the features produced (cf. Section II-D) determined the best architecture.

D. Feature Extraction Procedure

After each set of pNNs has been trained to perform one-step-ahead prediction, the training data and validation data are input to both pNNs, trial by trial. When a trial is input to both pNNs, features are obtained by continually calculating the power (mean square) of the predicted signals (MSY) or the error signals (MSE) as they pass through a feature extraction (FE) window. These calculations reduce the predictions within the FE window for each signal to a scalar (single) value. The FE window is illustrated by dashed lines in Fig. 1 and the window

COYLE et al.: A TIME-SERIES PREDICTION APPROACH FOR FEATURE EXTRACTION IN A BRAIN–COMPUTER INTERFACE

Fig. 1. Illustration of FEP and the complete system.

significance is outlined in the next paragraph. Equation (2) is used for obtaining the MSE or MSY type features:

f_c = (1/M) Σ_{j=k−M+1}^{k} (x(j) − ŷ_c(j))²   for MSE
f_c = (1/M) Σ_{j=k−M+1}^{k} ŷ_c(j)²            for MSY   (2)

where x(j) and ŷ_c(j) are the values of the actual and predicted signals (i.e., for either C3 or C4) at time j, respectively. The index c indicates whether the signal is the output from the LpNN or the RpNN (i.e., c can be L or R). M is the number of prediction samples [i.e., the FE window length (see next paragraph)]. Left data is fed into both the LpNN and the RpNN, and each pNN provides a prediction for two signals (C3 and C4); therefore, features can be extracted from four new signals. Similarly, right data is fed into both pNNs. Four features can be extracted after each new set of predictions (i.e., at the rate of the sampling interval). The feature vector is shown in Fig. 1. Normalizing the features is optional and can be advantageous, but this can also cause feature instability (cf. Section III).

The predictions made from EEG recorded at the initiation of a trial are fed into the window first. There must be M predictions made from each of the pNN output channels before feature extraction begins, after which, each time a new prediction is made, the oldest prediction is moved out of the window and a new feature vector is calculated from the data within the window. New predictions contribute to each feature calculation while predictions made over M samples earlier are removed from the calculation, so that predictions at the beginning of a trial, or non-event-related predictions, are forgotten as new predictions are made. In [19] it is suggested that, when μ rhythms have to be modified by motor imagery, the trial length cannot be less than 2 s, although a shorter decision time may be realized using β rhythms. This may have implications for selecting the optimal window size. If the window size is selected properly,

the segments of data that produce maximum feature separability are captured as they pass through the window. M is selected through an automated iterative search procedure. During each iteration, a five-fold cross-validation is performed on session 1, and after each iteration the window width M is incremented by 25 samples (i.e., M = 50:25:500).[5] The window width which produces the highest mean-CA rate is chosen as the best, after which fine-tuning M can produce minor improvements. Typically, the window width that produces the best features ranges between 2.3 and 3.2 s, i.e., M ranging between circa 300 and 400.

[5] M is varied from 50 to 500 in steps of 25 samples.

E. Classification

Linear discriminant analysis (LDA) was used for classification. Linear classifiers are generally more robust than their nonlinear counterparts, since they have only limited flexibility (fewer free parameters to tune) and are less prone to over-fitting [11]. Classification is performed at the rate of the sampling interval. A different LDA classifier is produced to classify each new set of features for session 1. The classification time point which produced the maximal mean-CA for session 1 is used to set up the classifier which is used for all classification performed on session 2.

F. Linear Prediction-Based FEPs

1) The AAR-Coefficients (AARc) Approach: To compare the proposed NN-TSP approach with an existing, well-founded approach, experimentation was performed using the AAR signal modeling approach [3], [6]. An AAR model describes a signal in the following form:

y(k) = a_1(k)y(k−1) + … + a_p(k)y(k−p) + e(k)   (3)

where, in an ideal case, e(k) is a purely random white-noise process with zero mean and variance σ². The model is of

order p, and the a_i(k) are the time-varying AR coefficients, which are estimated with the recursive least squares (RLS) algorithm using the Kalman filter update coefficient UC [2], [3]. The AAR coefficients are used as features for each signal at each time point. Again, a five-fold cross-validation was performed on session 1. The classification time point which produced the maximal mean-CA was used to set up the LDA classifier to be used for session 2.

2) The AAR-TSP Approach: A novel AAR-based TSP approach was implemented to determine how traditional linear predictors compare to the NN-TSP approach. Each pNN in the NN-TSP approach was replaced by two AAR models, where two AAR models are modeled on left data and two on right data (each on one trial's C3 and C4). The trials are selected randomly from the 80% partition (i.e., 60% training + 20% validation data). For each fold of the cross-validation, a different set of trials is randomly selected to set up the AAR models. The best window width selected for each subject (see Section III) was the same for both the AAR-TSP and NN-TSP approaches.
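The per-sample computations of Sections II-D and II-E, i.e., the MSY features of (2) followed by two-class LDA, can be sketched as follows. This is a minimal numpy version, not the authors' implementation; the function names are illustrative, and the MSE variant simply replaces the predicted signals with the error signals (actual minus predicted):

```python
import numpy as np

def msy_features(pred_lc3, pred_lc4, pred_rc3, pred_rc4, M):
    """MSY features per (2): mean power of the last M predictions
    from each of the four pNN output channels (LpNN and RpNN,
    each predicting C3 and C4). For MSE features, pass the four
    error signals instead of the predicted signals."""
    return np.array([np.mean(np.square(s[-M:]))
                     for s in (pred_lc3, pred_lc4, pred_rc3, pred_rc4)])

def lda_fit(X, labels):
    """Two-class LDA from class means and the pooled covariance;
    sign(w @ x + b) gives the class, and w @ x + b is the signed
    distance to the separating hyperplane (the TSD)."""
    X0, X1 = X[labels == 0], X[labels == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    pooled = (np.cov(X0.T) * (len(X0) - 1) +
              np.cov(X1.T) * (len(X1) - 1)) / (len(X) - 2)
    w = np.linalg.solve(pooled, m1 - m0)
    return w, -0.5 * w @ (m0 + m1)
```

Running `msy_features` once per new prediction, then scoring the four-element feature vector with the fitted (w, b), reproduces the classify-at-every-sample behavior described above.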

TABLE I
COMPARATIVE ANALYSIS OF THE TSP AND AARC APPROACHES AND FEATURE TYPES. NEURON NUMBERS FOR SINGLE-HIDDEN-LAYER NNS ARE SPECIFIED IN COLUMN 1 (L = LINEAR OR N = NONLINEAR TRANSFER FUNCTION). FEATURE TYPES ARE SPECIFIED IN COLUMN 2 (AN n INDICATES NORMALIZED FEATURES). COLUMN 3 SPECIFIES THE MEAN-CA ± 90% CONFIDENCE INTERVALS OBTAINED FOR SESSION 1. COLUMNS 4–7, RESPECTIVELY, SPECIFY THE CA, CLASSIFICATION TIME (CT), INFORMATION TRANSFER RATE (IT), AND MUTUAL INFORMATION (MI) OBTAINED FOR SESSION 2

III. RESULTS

All features for subjects S1, S2, and S3 were extracted from subject-specific windows of width M (350 and 300 samples for subjects S2 and S3, respectively; all widths lie in the circa 300–400 range noted in Section II-D). In general, maximum separability occurred when between 3 and 4 s of data had been processed; therefore, all pNNs were trained on data from each trial ranging from the initiation of the communicating signal to 4 s after. This meant that the NNs were trained on the most separable (distinct) segments of the data from each trial.

The results obtained using a number of the best NN architectures are presented in Table I. Different NN architectures and methods are compared based on mean-CA rates obtained using a t-statistic and 90% confidence intervals (level of significance α = 0.1). These performance measures allow better parameter selection and a comparative analysis of different approaches. To obtain information on the inter-session stability of each approach, and the extent of its potential to perform in an online situation, results from a one-pass test of session 2 data are provided for each subject.

It was observed that the results obtained using specific feature types, parameters, and approaches were not consistently stable across sessions. For example, in a number of cases, the MSE type features produced high mean-CA rates on session 1 for subjects S2 and S3 (95%–98%) for both TSP approaches. The 90% confidence intervals are also relatively tight, indicating good intra-session robustness. However, this can be misleading because the CA rates are not maintained when tested on data from session 2, indicating that MSE features are unstable across sessions. The MSY features did not match the mean-CA rates achieved by the MSE features on session 1 (circa 88%–93%) for all subjects but were much more stable across sessions. It was also observed that normalizing the features improved the CA in some cases whilst degrading it in others.
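The quantifiers used throughout these results can be computed as follows; this sketch assumes the standard Wolpaw definition of the IT rate [5] and the two-sided 90% t value for the five folds used here:

```python
import math

def mean_ca_ci(fold_accuracies, t_crit=2.132):
    """Mean CA and 90% confidence half-width over the folds.
    t_crit is the two-sided 90% t value for k-1 degrees of
    freedom (2.132 for five folds, i.e., 4 dof)."""
    k = len(fold_accuracies)
    mean = sum(fold_accuracies) / k
    var = sum((a - mean) ** 2 for a in fold_accuracies) / (k - 1)
    return mean, t_crit * math.sqrt(var / k)

def wolpaw_it_rate(p, n_classes=2, ct_seconds=4.0):
    """Wolpaw bits/trial for accuracy p over n_classes, scaled
    to bits/min by the classification time CT in seconds."""
    if p <= 1.0 / n_classes:
        return 0.0
    bits = math.log2(n_classes) + p * math.log2(p)
    if p < 1.0:
        bits += (1 - p) * math.log2((1 - p) / (n_classes - 1))
    return bits * (60.0 / ct_seconds)
```

For example, 88% CA at CT = 4 s gives circa 7 bits/min, consistent with the 7–10 bits/min range reported in Section IV-C.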
Feature normalization can reduce the intra-class variance, which is desirable, but can also reduce the inter-class variance, an adverse effect and a probable cause of the instability across sessions in some cases. It was observed that the CA rates for session 2 always fell within the 90% confidence intervals obtained for session 1 only when using nonnormalized MSY features; therefore, the TSP approaches are compared based on results obtained using nonnormalized MSY type features. Further discussion of the feature types is given in Section IV.

For subject S1, the AAR-TSP approach did not achieve the highest CA rates for either session 1 or session 2. Although its mean-CA is only slightly lower, its 90% confidence interval is larger, which indicates that the AAR-TSP approach may achieve poorer performance than the NN-TSP approach for this subject. The mean-CA for the AARc approach is slightly better than that of the NN-TSP approach, although the results for session 2 indicate that neither of the AAR approaches generalizes as well across sessions. It is concluded that NNs with nonlinear transfer functions may provide a marginal performance advantage for this subject.

For subject S2, NNs with linear transfer functions obtained the highest CA rates (circa 92%) for session 2, although the AAR-TSP approach achieved the highest mean-CA rate with the tightest confidence interval. Again, both TSP approaches outperform the AARc approach on session 2, although it is concluded that linear TSP methods may achieve the best inter-session stability for this subject.

For subject S3, the maximum mean-CA rates of circa 94%–98% were achieved using the linear AAR approaches. However, the maximum CA for session 2 is achieved by the NN-TSP



TABLE II
COMPARISON OF THE TSP APPROACHES FOR DIFFERENT VALUES OF Δ. ΔmE (×10⁻²) IS THE DIFFERENCE IN THE MEAN OF THE mRMSE AND ρ IS THE CORRELATION COEFFICIENT BETWEEN THE mRMSE PRODUCED BY EACH APPROACH FOR RIGHT AND LEFT DATA

Fig. 2. Time course of CA for session 2, all subjects, for each approach.

approach, which only achieved a moderate mean-CA (circa 88%) on session 1. An analysis of the mean-CA rates for session 1 indicates that the AARc approach is significantly better than the other approaches, obtaining the highest mean-CA of circa 98% with the tightest confidence interval. The results on session 2, however, are poor (equal to chance) and suggest that the AARc approach is unstable across sessions for this data set for this subject. These observations outline the difficulties in choosing parameters for all approaches for generalization on later sessions, and the importance of comparing BCI methods across sessions. Fig. 2 illustrates the time course of the CA rates obtained on session 2 for the three approaches (results were obtained with MSY features for the TSP approaches).
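For reference, the AAR coefficient tracking used by both AAR approaches (Section II-F) can be sketched with a standard RLS recursion; the forgetting factor here is an assumption standing in for the exact Kalman/UC update of [2], [3]:

```python
import numpy as np

def aar_rls(y, p=6, lam=0.99):
    """Track AAR coefficients of (3) with standard RLS and
    forgetting factor `lam` (a stand-in for the Kalman/UC update).
    Returns the coefficient trajectory and the one-step
    prediction errors e(k)."""
    a = np.zeros(p)
    P = np.eye(p) * 100.0            # large initial covariance
    coeffs, errs = [], []
    for k in range(p, len(y)):
        x = y[k - 1::-1][:p]          # y(k-1), ..., y(k-p)
        e = y[k] - a @ x              # one-step prediction error
        g = P @ x / (lam + x @ P @ x) # RLS gain
        a = a + g * e
        P = (P - np.outer(g, x @ P)) / lam
        coeffs.append(a.copy())
        errs.append(e)
    return np.array(coeffs), np.array(errs)
```

On a noiseless sinusoid, which obeys the exact AR(2) relation y(k) = 2cos(ω)y(k−1) − y(k−2), the recursion converges to those coefficients and the prediction error vanishes, which is a useful sanity check before applying it to EEG.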

IV. DISCUSSION

A. NN-TSP Versus AAR-TSP Approach

As can be seen from Fig. 2, the time courses of the CA for the NN-TSP and AAR-TSP approaches are very similar, although in all cases the best NN-TSP approach peaks slightly higher than the AAR-TSP approach, which may suggest that the NN-TSP approach is better across sessions. The tests performed on session 1, however, indicate that both TSP approaches are statistically similar due to overlapping confidence intervals. An appropriate test to compare both TSP approaches is McNemar's test, where differences in CA are determined only by those samples in the test set that were classified differently [4], [20]. Results from McNemar's tests carried out on the classifications on session 2 using nonnormalized MSY features indicate that the differences between the TSP approaches are not statistically significant for any subject, whereas the differences between the TSP approaches and the AARc approach are statistically significant for subjects S1 and S3 but not for S2.

Given the flexibility of the pNN architecture (more free parameters to tune), it may be possible to make improvements with a more intuitive training algorithm. For instance, results from an

analysis of the mean root-mean-squared error (mRMSE)[6] indicate that the predictors with the highest CA rates also had the lowest correlation between the mRMSE for the left data and the right data. An analysis was carried out on the mRMSE produced by both TSP approaches for Δ = 2 to 7. Typically, the mRMSE ranged from 0.03–0.035 for the NN-TSP approach and 0.039–0.045 for the AAR-TSP approach when tested under the same conditions. Table II shows the CA and the correlation coefficient ρ between the mRMSE for both data types produced by the TSP approaches. It can be seen that CA is maximum when ρ is minimum. The AAR-TSP errors are much more correlated than the NN-TSP errors, but the difference in the mean of the four mRMSEs (ΔmE) for each data type is greater. It is concluded that the best predictors for this FEP are those which produce the minimum correlation between the mRMSEs of each data type while concomitantly maximizing the difference ΔmE. This occurs naturally due to the morphology of the signals used to train each pNN. It may be possible to enhance those characteristics by utilizing a specialized objective function for simultaneously training the pNNs, such as that described in [12], [13]. The results in Table II indicate that an objective function based on minimizing the mRMSE correlations and maximizing the differentiation in ΔmE for both data types may produce pNNs which enhance the separability of the predicted signals and therefore of the features produced. A similar approach may be adopted for setting up the AAR-TSP approach.

As can be seen from Table I, there are implications and anomalies in selecting the best parameters to sustain stability across sessions. Selecting the best AAR models can be problematic; however, it is possible that, with a more rigorous selection process for the appropriate AAR models, the AAR-TSP approach could be improved.
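The McNemar comparison used above can be sketched as follows; b and c are the discordant counts (trials one classifier got right and the other wrong), and the continuity-corrected chi-square with one degree of freedom is assumed, its tail computed via erfc:

```python
import math

def mcnemar(b, c):
    """McNemar's test with continuity correction. b and c count
    the test trials the two classifiers labelled differently.
    Returns (statistic, p-value); for 1 dof, the chi-square tail
    probability equals erfc(sqrt(x / 2))."""
    if b + c == 0:
        return 0.0, 1.0
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    return stat, math.erfc(math.sqrt(stat / 2))
```

Balanced discordant counts (e.g., b = c) give a p-value near 1 (no significant difference), while heavily lopsided counts drive the p-value below conventional significance levels.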
Selecting the best number of neurons for the NN-TSP approach is also problematic, and this is exacerbated by the fact that the NN weights are randomly initialized. However, from this investigation, it has been shown that single-hidden-layer NNs with up to eight neurons in the hidden layer provide the best results. For subject S2, linear activation functions worked best, while for subjects S1 and S3, nonlinear activation functions worked best. Subject-specific parameter tuning in the ranges outlined above should provide further improvements to the results. Generally, M ranging from 300–400 provides the best results.

[6] Calculate the RMSE for each predictor for each trial, then calculate the mean RMSE over all trials (for each data type). This results in four mRMSE values for each data type.

The best pNN parameters can be chosen utilizing an iterative approach similar to that described for selecting the best FE window width, M (cf. the end of Section II-D), where, first, a good window width is chosen and then the pNN


architecture is adjusted and five-fold cross-validated through an iterative training and testing procedure. This approach was fully automated via strategic programming but has the disadvantage of being computationally expensive. However, it is less time consuming and a great deal more effective than manually choosing the parameters.
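The automated search described above amounts to a nested sweep over Δ, τ, and M with cross-validated scoring; a sketch follows, where `evaluate` is a hypothetical stand-in for the five-fold cross-validation on session 1 (the grid matches the ranges stated in Section II):

```python
from itertools import product

def search_parameters(evaluate, deltas=range(2, 9), taus=(1, 2),
                      widths=range(50, 501, 25)):
    """Sweep the parameter grid used in the paper (delta = 2-8,
    tau = 1-2, M = 50:25:500). `evaluate` is a hypothetical
    callable returning the mean-CA for (delta, tau, M), standing
    in for training the pNNs and cross-validating on session 1."""
    best, best_ca = None, -1.0
    for delta, tau, m in product(deltas, taus, widths):
        ca = evaluate(delta, tau, m)
        if ca > best_ca:
            best, best_ca = (delta, tau, m), ca
    return best, best_ca
```

In practice each call to `evaluate` is expensive (it trains and tests the pNNs five times), which is why the paper notes that the automated procedure, while effective, is computationally costly.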

smoothed and its affect on the MSY features is reduced while the power in the MSE features is less stable. However, the problem of outliers is not completely eliminated; a quandary which could be engaged by performing regularization or more powerful feature extraction techniques (e.g., frequency based FEP) on the predicted signals or error signals.

B. The AAR-Coefficients (AARc) Approach The AAR-coefficients (AARc) approach achieved good performance on data recorded within the same session. The results were less satisfactory when single trial tests were performed on session 2. The TSP approaches outperformed the AARc approach in this respect. A more rigorous procedure for selection of the Kalman update coefficient (UC) could possibly have yielded better results. The average intra-class variance of features produced by AARc approach is higher than that of the NN-TSP approach. It is probable that this is due to the fact that the AAR coefficients are more sensitive to the trial-to-trial differences due to noise as well as the coefficient initialization process whereas the MSE/MSY features are more stable as they are derived from average power, in the signals produced by the pNNs, spread over the FE window.

V. CONCLUSION The results in this investigation demonstrate the potential for the prediction-based FEP to be applied in a BCI. It has been illustrated that, with careful parameter selection, the FEP can be stable across sessions which is very important for a BCI system due to the inherent day to day variability in EEG recordings. A number of potential advantages which are unique to the proposed FEP have been identified and future work will involve evaluating those potentials. Also, to obtain a more practical depiction of the efficacy of the approach, experimentation on multiple subjects over multiple sessions online is essential and this is the next step in the development of this work. ACKNOWLEDGMENT

C. System Comparison Results show that the proposed FEP compares well to existing approaches. 70%–95% CA rates have been reported for experiments carried out on similar EEG recordings [3], [10]. The TSP approach shows significant potential for stable feature extraction across sessions. Current BCIs have IT rates ranging between 5 and 25 bits/min [21] although the suitability of many signals for BCI utilization by impaired patients is speculative [22]. In this work, the IT rates for all subjects range between 7–10 bits/min. Usually the minimum CT ranges from approximately 3–4 s. To improve IT rates, it would be necessary reduce CT and/or to improve the CA. The AAR-TSP approaches achieved the best MI rates for subjects S2 and S3 although, in some cases, even though CA rates were low, maximum MI was obtained, therefore, MI was not used to compare methods. Subject S1 data was used in the BCI2003 competition [14], [23] and the winning result achieved a maximum CA of circa 89% and 0.61 bits MI. The TSP approaches improved upon those results7 and have other potential advantages, such as the following. 1) Features can be extracted without knowledge of where communication is initiated thus the FEP has potential for online EEG processing. The TSD and thresholds could be used to accommodate online asynchronous classification. 2) Two new signals are produced for each original signal. Extracting power features from the raw EEG signals always resulted in poorer performance (results not shown), therefore, the TSP approach does improve the separability of the data. 3) The power based features utilized in this approach can be degraded by outliers. The pNNs do not predict irregular transients in the signals so the power in the artifact is 7 It is stated in [23] that there are implications when comparing results with the competition entries due to the current availability of the class labels for session 2.

The authors would like to acknowledge the Institute of Human–Computer Interfaces, University of Technology, and Guger Technologies, Graz, Austria, for providing the EEG. REFERENCES [1] J. R. Wolpaw, N. Birbaumer, D. J. McFarland, G. Pfurtscheller, and T. M. Vaughan, “Brain-Computer interfaces for communication and control,” J. Clin. Neurophys., vol. 113, pp. 767–791, 2002. [2] A. Schlogl, D. Flotzinger, and G. Pfurtscheller, “Adaptive autoregressive modeling used for single-trial EEG classification,” Biomedizinische Technik, vol. 42, pp. 162–167, 1997. [3] C. Guger, A. Schlogl, C. Neuper, T. Strein, D. Walterspacher, and G. Pfurtscheller, “Rapid proto-typing of an EEG based BCI,” IEEE Trans. Neural Syst. Rehabil. Eng., vol. 9, no. 1, pp. 49–57, Mar. 2001. [4] S. L. Salzberg, “Methodological note on comparing classifiers: Pitfalls to avoid and a recommended approach,” in Data Mining and Knowledge Discovery. Norwell, MA: Kluwer, 1997, vol. 1, pp. 317–328. [5] J. R. Wolpaw, H. Ramouser, D. J. McFarland, and G. Pfurtscheller, “EEG-based communication: Improved accuracy by response verification,” IEEE Trans. Rehabil. Eng., vol. 6, no. 3, pp. 326–333, Sept. 2000. [6] A. Schlogl, C. Keinrath, R. Scherer, and G. Pfurtscheller, “Estimating the mutual information of an EEG-based brain computer interface,” Biomedizinische Technik, vol. 47, pp. 03–08, 2002. [7] J. F. Borisoff, S. G. Mason, A. Bashashati, and G. E. Birch, “Brain–computer interface design for asynchronous control applications: Improvements to the LF-ASD asynchronous brain switch,” IEEE Trans. Biomed. Eng., vol. 51, no. 6, pp. 985–992, Jun. 2004. [8] T. Felzer and B. Freisleben, “Analyzing EEG signals using the probability estimated guarded neural classifier,” IEEE Trans. Neural Syst. Rehabil. Eng., vol. 11, no. 2, pp. 361–371, Jun. 2003. [9] C. Anderson and Z. Sijercic, “Classification of EEG signals from four subjects during five mental tasks,” in Proc Conf. Eng. Appl. Neural Networks, 1996, pp. 407–414. 
[10] E. Haselsteiner and G. Pfurtscheller, "Using time-dependent NNs for EEG classification," IEEE Trans. Rehabil. Eng., vol. 8, no. 4, pp. 457–462, Dec. 2000.
[11] K.-R. Muller, C. W. Anderson, and G. E. Birch, "Linear and nonlinear methods for brain-computer interfaces," IEEE Trans. Neural Syst. Rehabil. Eng., vol. 11, no. 2, pp. 165–169, Jun. 2003.
[12] K.-R. Muller, J. Kohlmorgen, and K. Pawelzik, "Analysis of switching dynamics with competing neural networks," IEICE Trans. Fundamentals Electron. Commun. Comp. Sci., vol. E78-A, no. 10, pp. 1306–1315, 1995.


[13] J. Kohlmorgen, K.-R. Müller, J. Rittweger, and K. Pawelzik, "Identification of nonstationary dynamics in physiological recordings," Biol. Cybern., vol. 83, no. 1, pp. 73–84, 2000.
[14] The Graz Data Set and description for the BCI 2003 competition. [Online]. Available: http://ida.first.fraunhofer.de/projects/bci/competition/
[15] B. J. Fisch, Fisch & Spellmann's EEG Primer: Basic Principles of Digital and Analogue EEG. New York: Elsevier, 1999.
[16] G. P. Williams, Chaos Theory Tamed. New York: Taylor and Francis, 1997.
[17] D. Coyle, G. Prasad, T. M. McGinnity, and P. Herman, "Estimating the predictability of EEG recorded over the motor cortex using information theoretic functionals," in Proc. 2nd Int. BCI Workshop, Biomed. Technik, vol. 49, Sep. 2004, pp. 43–44.
[18] M. T. Hagan and M. Menhaj, "Training feedforward networks with the Marquardt algorithm," IEEE Trans. Neural Netw., vol. 5, no. 6, pp. 989–993, Nov. 1994.
[19] G. Krausz, R. Scherer, G. Korisek, and G. Pfurtscheller, "Critical decision-speed and information transfer in the Graz brain-computer interface," Appl. Psychophys. Biofeedback, vol. 28, no. 3, Sep. 2003.
[20] P. Sykacek, S. Roberts, M. Stokes, E. Curran, M. Gibbs, and L. Pickup, "Probabilistic methods in BCI research," IEEE Trans. Neural Syst. Rehabil. Eng., vol. 11, no. 2, pp. 192–195, Jun. 2003.
[21] T. M. Vaughan et al., "Guest editorial: Brain–computer interface technology: A review of the second international meeting," IEEE Trans. Neural Syst. Rehabil. Eng., vol. 11, no. 2, pp. 94–109, Jun. 2003.
[22] A. Kubler, B. Kotchoubey, J. Kaiser, J. R. Wolpaw, and N. Birbaumer, "Brain–computer communication: Unlocking the locked-in," Psychol. Bull., vol. 127, no. 3, pp. 358–375, 2001.
[23] B. Blankertz et al., "The BCI competition 2003: Progress and perspectives in detection of EEG single trials," IEEE Trans. Biomed. Eng., vol. 51, no. 6, pp. 159–161, Jun. 2004.

Damien Coyle (S'04–M'05) was born in the Republic of Ireland, in 1980. He received the first class honors degree in electronics and computing engineering from the University of Ulster, Derry, U.K., in 2002, where he is currently working toward the Ph.D. degree in the Intelligent Systems Engineering Laboratory (ISEL). His research interests include nonlinear signal processing, biomedical signal processing, chaos theory, information theory, and neural and adaptive systems. Mr. Coyle is a Student Member of the IEE.


Girijesh Prasad (S'85–M'98) was born in India, in 1964. He received the first class honors degree in electrical engineering, the first class M.S. degree in computer science and technology, and the Ph.D. degree from the Queen's University of Belfast, Belfast, U.K., in 1997. Currently, he is a Lecturer and a member of the Intelligent Systems Engineering Laboratory (ISEL) research group in the School of Computing and Intelligent Systems, University of Ulster, Derry, U.K. His research interests include computational intelligence, predictive modeling and control of complex nonlinear systems, performance monitoring and optimization, thermal power plants, brain–computer interfaces, and medical packaging processes. Dr. Prasad is a Member of the IEE and the IEEE, and is a Chartered Engineer.

Thomas Martin McGinnity (M'82) received the first class honors degree in physics and the Ph.D. degree from the University of Durham, Durham, U.K. He has been a member of the University of Ulster academic staff since 1992, and holds the post of Professor of Intelligent Systems Engineering within the Faculty of Engineering. He has 25 years' experience in teaching and research in electronic engineering, leads the research activities of the Intelligent Systems Engineering Laboratory, Magee Campus, University of Ulster, Derry, U.K., and is Acting Associate Dean of the Faculty of Engineering. His current research interests relate to the creation of intelligent computational systems in general, particularly hardware and software implementations of neural networks, fuzzy systems, genetic algorithms, embedded intelligent systems utilizing reconfigurable logic devices, and bio-inspired intelligent systems. Dr. McGinnity is a Fellow of the IEE, a member of the IEEE, and a Chartered Engineer.
