30th Annual International IEEE EMBS Conference Vancouver, British Columbia, Canada, August 20-24, 2008

Robust Common Spatial Patterns for EEG Signal Preprocessing

Xinyi Yong, Rabab K. Ward and Gary E. Birch

Abstract— The Common Spatial Patterns (CSP) algorithm finds spatial filters that are useful in discriminating different classes of electroencephalogram (EEG) signals, such as those corresponding to different types of motor activities. This algorithm is, however, sensitive to outliers because it involves the estimation of covariance matrices. Classical sample covariance estimates are easily affected by even a single outlier. To improve the CSP algorithm's robustness against outliers, this paper first investigates how multivariate outliers affect the performance of the CSP algorithm. We then propose a modified version of the algorithm in which the classical covariance estimates are replaced by robust covariance estimates obtained using the Minimum Covariance Determinant (MCD) estimator. The Median Absolute Deviation (MAD) is also used to robustly estimate the variance of the projected EEG signals. The results show that the proposed algorithm is able to reduce the influence of the outliers. When an average of 2.5% outliers is introduced, the average drop in accuracy is 9.21% for the CSP algorithm and 0.72% for the proposed algorithm.

Xinyi Yong and Rabab K. Ward are with the Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, Canada. {yongy, rababw}@ece.ubc.ca

Gary E. Birch is with the Neil Squire Society, Burnaby, BC, Canada.

978-1-4244-1815-2/08/$25.00 ©2008 IEEE.

I. INTRODUCTION

Real-world data such as EEG signals often contain outliers, which are observations that deviate from the general pattern of the data [1]. Examples of outliers in EEG signals include spikes, ocular artifacts and muscle artifacts. Outliers can adversely affect the results obtained from conventional estimation methods such as the least squares estimator and principal component analysis. Detecting outliers in multivariate data such as EEG is not straightforward because a correlation structure may exist among the variables. A multivariate outlier may not be an outlier in the coordinates of any single variable and yet greatly affect the results.

Robust methods can be used to detect outliers and reduce their influence. Such methods should be robust against outliers while retaining good efficiency. The breakdown point is among the measures of robustness used in the literature. It gives the maximum fraction of outliers that an estimator can cope with [2]. For example, the breakdown point of the least squares estimator is 0, since a single outlier can wreak havoc on the estimates.

Common Spatial Patterns (CSP) is a method widely used in Brain-Computer Interface (BCI) systems to preprocess EEG signals [3], [4]. The CSP algorithm finds the directions onto which the EEG signals should be projected so that the differences between two classes of EEG signals are maximized (i.e., the variance of one class is minimized while, at the same time, the variance of the other class is maximized) [4]. The directions are given by a weight matrix whose rows give the weights of each EEG channel. Useful features can be extracted from the projected EEG signals (the variances of the projected signals are commonly used) and then used for classification.

Despite its usefulness in preprocessing EEG signals, the CSP algorithm is sensitive to outliers because it involves the estimation of covariance matrices. Classical sample covariance estimates are highly non-robust and have a breakdown point of 0. Outliers not only affect the variances and the correlation structure of the covariance matrices, but may also cause the condition number of the matrices to grow very large (the largest eigenvalue becomes very large) [1]. Outliers may change the order of the eigenvectors and drastically change the subspace spanned by the eigenvectors [5]. Such perturbation in the orientation of the eigenvectors will have an impact on the projection directions estimated by the CSP algorithm. In addition to the covariance estimates, the sample variance estimates used in extracting the features from the projected EEG signals are also easily affected by even a single outlier.

To the best of our knowledge, the non-robustness of the classical estimates in the CSP algorithm has not been addressed in the literature. In this paper, we investigate how sensitive the CSP algorithm is to outliers. We also propose a robust version of the CSP algorithm to safeguard the algorithm from outliers. This is achieved by using a robust estimator, the Minimum Covariance Determinant (MCD), to estimate the covariance matrices in the CSP algorithm. In addition, we propose the use of the Median Absolute Deviation (MAD) to estimate the variance of the projected EEG signals, because the variance estimate is also very sensitive to outliers.

II. METHODOLOGY

A. Data Description

The EEG data used in this study consisted of two classes: right hand and right foot motor imageries. They were provided by Fraunhofer FIRST (Intelligent Data Analysis Group) and Campus Benjamin Franklin of the Charité University Medicine Berlin (Neurophysics Group) [6]. The EEG signals were recorded from five subjects using 118 electrodes per subject. The electrodes were placed according to the extended International 10-20 system, and the signals were sampled at 1 kHz. During each experiment, the subject was given visual cues that indicated for 3.5 s which of three motor imageries should be performed: left hand, right hand or right foot. The resting interval between two trials was randomized from 1.75 to 2.25 seconds. Only EEG trials for right hand and right foot were provided. Each class of EEG signals consisted of a certain number of training (labelled) trials and test (unlabelled) trials for each subject. The total number of EEG trials was 280 for each subject.

In this study, we downsampled the EEG data to 100 Hz and band-pass filtered them to the 8–35 Hz frequency band. This band encompasses the mu and beta rhythms, which have been reported to desynchronize or attenuate during motor imagery [7], and it has been used successfully in BCI systems to classify EEG signals [3], [4], [8].

TABLE I
THE DETAILS OF THE OUTLIERS SIMULATED IN THIS STUDY

Study S0
  Outliers: $v_t = 0$
  Data: All subjects
  Objective: To investigate the performance of the algorithms when there are no outliers.

Study S1
  Outliers: $v_t \sim (1-\epsilon)\delta_0 + \epsilon N_p(\mu_y + 3\sigma_y, \Sigma_y)$, with $\epsilon = 0.00 : 0.05 : 0.50$
  Data: Subject al
  Objective: To investigate how the algorithms' performance changes with the increasing number of outliers.

Study S2
  Outliers: $v_t \sim (1-\epsilon)\delta_0 + \epsilon N_p(\mu_y + N_m\sigma_y, \Sigma_y)$, with $\epsilon = 0.05$ and $N_m = 0 : 2 : 50$
  Data: Subject al
  Objective: To investigate how the algorithms' performance changes with the increasing magnitude of the outliers.

Study S3
  Outliers: $v_t \sim F$, where $F$ is the distribution of the outliers obtained from the EEG segments related to jaw clenching and swallowing.
  Data: All subjects
  Objective: To study the performance of the algorithms when a small amount of simulated muscle artifacts is introduced into the EEG signals.

$\mu_y$ is the mean vector of the EEG signals, $\sigma_y$ is the standard deviation vector of the EEG signals, and $\Sigma_y$ is the diagonal matrix with $\sigma_y^2$ as its entries. $p$ is the number of variables (electrodes), which is 118 in this study. $N_m$ is a constant that controls the magnitude of the outliers.
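The contamination model summarized in Table I (studies S1 and S2) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `contaminate`, the use of NumPy, the synthetic 4-channel stand-in for a filtered trial, and the estimation of $\mu_y$ and $\sigma_y$ from the trial itself are all choices made for the example.

```python
import numpy as np

def contaminate(x, eps, n_m, rng):
    """Return y = x + v and a boolean mask marking the contaminated samples.

    x   : (p, N) array, one filtered EEG trial (channels x samples)
    eps : probability that an outlier occurs at any given sample
    n_m : outlier magnitude in units of the per-channel standard deviation
    """
    p, n = x.shape
    mu_y = x.mean(axis=1)            # mean vector (assumed: estimated from x)
    sigma_y = x.std(axis=1, ddof=1)  # standard deviation vector
    hit = rng.random(n) < eps        # outlier indicator, one draw per sample
    v = np.zeros_like(x)             # point mass at zero where no outlier occurs
    k = int(hit.sum())
    # Sigma_y is diagonal, so each channel is drawn independently.
    v[:, hit] = (mu_y + n_m * sigma_y)[:, None] \
        + sigma_y[:, None] * rng.standard_normal((p, k))
    return x + v, hit

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 1000))   # hypothetical stand-in for a filtered trial
y, hit = contaminate(x, eps=0.05, n_m=3.0, rng=rng)
print(f"{hit.sum()} of {x.shape[1]} samples contaminated")
```

Setting `n_m=3.0` mirrors study S1, while sweeping `n_m` at a fixed `eps=0.05` mirrors study S2. Samples where no outlier occurs are left unchanged, which is the role of the point mass $\delta_0$.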

B. Outlier Simulation

To investigate the sensitivity of the CSP algorithm to outliers, and to test the performance of both the CSP and the proposed algorithm in the presence of outliers, multivariate outliers are simulated and added to the EEG signals. The filtered EEG signals may still contain some outliers, which we assume are not significant. The multivariate EEG data contaminated by the simulated outliers can be expressed by Equation (1). The contaminated EEG signals $y_t$ are the sum of the filtered EEG data $x_t$ and the simulated outliers $v_t$, which are drawn from a multivariate normal distribution:

$$y_t = x_t + v_t \qquad (1)$$

The distribution of $v_t$, as simulated in this study, is a normal mixture [1]:

$$v_t \sim (1-\epsilon)\,\delta_0 + \epsilon\,N_p(\mu, \Sigma)$$

where $N_p$ is the $p$-variate normal distribution with mean $\mu$ and covariance matrix $\Sigma$, $\delta_0$ is a point mass distribution located at zero, and $\epsilon > 0$ is the probability of occurrence of the outliers. In this case, an outlier occurs at any fixed time $t$ with probability $\epsilon$.

Outliers from a variety of settings are simulated and listed in Table I. The outliers are added only to the training EEG trials of both classes. In studies S1 and S2, only Subject al's data are used because this subject has a larger number of training EEG trials and because good accuracy can be achieved using the CSP algorithm alone. This makes it easier to observe the effect on the performance of the algorithm when the outliers are introduced.

In study S3, we simulate the artifacts related to jaw clenching and swallowing (using the jaw clenching and swallowing EEG trials obtained from our lab as a model). The number of samples simulated is 93 for jaw clenching and 225 for swallowing. An example of the simulated outliers is shown in Fig. 1. The outlier segments are drawn from the multivariate normal distribution $N_p(N_{mt}\sigma_y, \Sigma_y)$ with a varying outlier magnitude $N_{mt}$. The position at which the outlier segments are added to the training EEG trials is randomly selected.

C. CSP Algorithm

The data consist of $p = 118$ EEG channels. There are two classes of EEG signals: Class 1 (right hand) and Class 2 (right foot). The trials are divided into training and testing trials. The CSP algorithm is used to project the EEG signals onto a space where the variance of one class is maximal and the variance of the other class is minimal. This can be done by solving an optimization problem. The criterion, or cost function, used in this optimization problem is the resulting variance of the projected Class 1 (or Class 2) signals. It is minimized while keeping the sum of the variances of both signal classes fixed [4]. Note that only the training trials are used to find the spatial filters in the CSP algorithm.

Let $S = \{S_1, S_2, \ldots, S_M\}$, where $S_i \in \mathbb{R}^{p \times N}$ denotes the $i$-th training trial EEG signal (filtered EEG signals in study S0 and contaminated EEG signals in the other studies), $M$ is the total number of training EEG trials and $N$ is the number of samples in the signal (350 in this study). The optimization problem is expressed as:

$$\min_{w} \sum_{i \in C_1} \mathrm{var}(w^T S_i) \quad \text{subject to} \quad \sum_{i=1}^{M} \mathrm{var}(w^T S_i) = 1 \qquad (2)$$

where $C_1$ represents all Class 1 EEG trials and $w \in \mathbb{R}^p$ is the unknown weight vector of the spatial filter. We can express the cost function in (2) using the definition of variance, i.e.,

$$\mathrm{var}(w^T S_i) = w^T E\{(S_i - E\{S_i\})(S_i - E\{S_i\})^T\}\,w = w^T \Sigma_i w$$

where $\Sigma_i$ is the covariance matrix of the $i$-th trial. In what follows, $\Sigma_1$ and $\Sigma_2$ are the mean covariance matrices for the concatenated signals belonging to sets $C_1$ and $C_2$, respectively. The CSP algorithm is summarised as follows:

1) Perform a whitening transformation on $(\Sigma_1 + \Sigma_2)$, i.e., find $P$ such that $P(\Sigma_1 + \Sigma_2)P^T = I$. Let $\hat{\Sigma}_1 = P\Sigma_1 P^T$ and $\hat{\Sigma}_2 = P\Sigma_2 P^T$. Then $\hat{\Sigma}_1 + \hat{\Sigma}_2 = I$. Here $P = \Lambda^{-1/2}\Phi^T$, where $\Lambda$ and $\Phi$ are the eigenvalue and eigenvector matrices of $(\Sigma_1 + \Sigma_2)$.
2) Calculate an orthogonal matrix $R$ and a diagonal matrix $D$ such that $\hat{\Sigma}_1 = RDR^T$. Then $\hat{\Sigma}_2 = R(I - D)R^T$.
3) Since $R^T\hat{\Sigma}_1 R = R^T P\Sigma_1 P^T R = D$, we have $W = R^T P$, and $w$ is the row of $W$ that corresponds to the largest diagonal element in $D$.

Two spatial filters are obtained from the CSP algorithm: the rows of $W$ that correspond to the largest and the smallest diagonal elements in $D$. The filters are used to project the signals. The variances of the two projected signals are the only two features used in the classification. Linear Discriminant Analysis (LDA) is used for classification. Since the goal is to classify the data, we are particularly interested in the influence of outliers on the classification accuracy. Hence, the classification accuracy (the percentage of correctly classified test trials) is used as the performance metric.

D. The Proposed Algorithm - RCSP

The CSP algorithm is strongly influenced by outliers because it uses the classical sample covariance estimates, which are highly non-robust and have a breakdown point of 0. The simplest way to deal with this problem is to replace $\Sigma_1$ and $\Sigma_2$ with robust estimates $\tilde{\Sigma}_1$ and $\tilde{\Sigma}_2$. Robust estimators of the covariance matrix such as the M-estimator, the Minimum Volume Ellipsoid (MVE), the MCD or the S-estimator can be used. The M-estimator has a low breakdown point when the dimension of the data is high. The MVE and MCD are computationally expensive [9]. The M- and S-estimators work well only with good starting points [10].

In this study, we propose to use the Fast-MCD estimator. The MCD estimator was introduced by Rousseeuw [11], and the Fast-MCD estimator is a newer MCD algorithm proposed to overcome the long computation time of the MCD algorithm [9]. The algorithm has a high breakdown point and a better statistical efficiency than the MVE.
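The three CSP steps of Section II-C, and the point at which robust covariance estimates would be substituted, can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the function name `csp_filters`, the NumPy calls, and the synthetic 4-channel data are assumptions made for the example, and the demonstration uses classical sample covariances only.

```python
import numpy as np

def csp_filters(cov1, cov2):
    """Steps 1-3 of the CSP algorithm for two class covariance matrices.

    Returns W (rows are spatial filters) and d (the diagonal of D),
    ordered so that d[0] is the largest and d[-1] the smallest element.
    """
    # Step 1: whitening, P = Lambda^{-1/2} Phi^T, so P (cov1 + cov2) P^T = I.
    lam, phi = np.linalg.eigh(cov1 + cov2)
    P = np.diag(lam ** -0.5) @ phi.T
    # Step 2: whitened class-1 covariance = R D R^T with R orthogonal.
    d, R = np.linalg.eigh(P @ cov1 @ P.T)
    order = np.argsort(d)[::-1]
    d, R = d[order], R[:, order]
    # Step 3: W = R^T P.
    return R.T @ P, d

rng = np.random.default_rng(0)
p, n = 4, 5000
scale1 = np.array([3.0, 1.0, 1.0, 0.5])   # class 1: channel 0 is strongest
scale2 = np.array([0.5, 1.0, 1.0, 3.0])   # class 2: channel 3 is strongest
X1 = scale1[:, None] * rng.standard_normal((p, n))
X2 = scale2[:, None] * rng.standard_normal((p, n))

# Classical sample covariances; robust estimates could be passed instead.
cov1, cov2 = np.cov(X1), np.cov(X2)
W, d = csp_filters(cov1, cov2)
w_max, w_min = W[0], W[-1]                # the two filters kept as features
features = [np.var(w_max @ X1), np.var(w_min @ X1)]
```

Because `csp_filters` only sees the two covariance matrices, replacing $\Sigma_1$ and $\Sigma_2$ by robust estimates $\tilde{\Sigma}_1$ and $\tilde{\Sigma}_2$ changes the inputs but not the decomposition itself.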
The Fast-MCD estimator can also be applied to high-dimensional data (such as the data used in this study, in which the number of electrodes $p$ is 118). The MCD estimator seeks the $h$ observations (out of a total of $n$) whose covariance matrix has the lowest possible determinant. The estimator can resist $(n - h)$ outliers; hence, $h$ determines the robustness of the estimator. Random subsets are drawn from the observations, and the 'C-steps' in the Fast-MCD algorithm ensure that the determinant of the covariance matrix does not increase. The subset is updated until the determinant no longer decreases [9]. To improve the efficiency of this high-breakdown-point estimator, the estimate is re-weighted. The details of the algorithm can be found in [9].

In this study, the robust statistics toolbox LIBRA [12] is used to obtain the robust estimates of the covariance matrices. $h$ is chosen to be $0.8n$. Therefore, the estimator can resist up to 20% of outliers in the signals.

The proposed robust version of the CSP algorithm generates a weight matrix $\tilde{W}$ as its output. Two spatial filters are obtained from the matrix and used to project the EEG signals onto different spaces. Since the scale estimate (variance) of the projected signals is highly non-robust, the presence of outliers will still affect the performance of the classifier. To deal with this problem, we replace the scale estimate with the Median Absolute Deviation (MAD) estimate. The robust scale estimate $\hat{s}$ of a sample $X = \{x_1, x_2, \ldots, x_N\}$ is the median of the absolute deviations of the sample from its median, divided by 0.6745, as shown in Equation (3) [13]. The division by 0.6745 makes the estimator consistent when the sample is normally distributed [14].

$$\hat{s} = \frac{1}{0.6745}\,\mathrm{med}\left(|X - \mathrm{med}(X)|\right) \qquad (3)$$

Fig. 1. In study S3, the EEG signals are contaminated by simulated outliers that are related to jaw clenching and swallowing.

III. RESULTS

The results show that outliers adversely affect the performance of the algorithms, especially the CSP algorithm. The results obtained from studies S1 and S2 are presented in Fig. 2. As the level of outlier contamination in the EEG signals increases, the performance of the algorithms deteriorates. The CSP algorithm shows a significant decrease in its performance (measured by the classification accuracy), whereas the proposed RCSP algorithm only starts to break down at $\epsilon > 0.45$ (when the number of outliers and the number of clean data points are almost the same).
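The two robust estimators behind these results, the C-step at the core of Fast-MCD and the MAD scale of Equation (3) (both from Section II-D), can be sketched as follows. This is a simplified illustration under stated assumptions rather than the Fast-MCD algorithm of [9] or the LIBRA implementation: it uses a single random starting subset of size $h$, omits the re-weighting step, and the names `mad_scale`, `c_step` and `log_det` as well as the synthetic 3-dimensional data are hypothetical.

```python
import numpy as np

def mad_scale(x):
    """Equation (3): median absolute deviation from the median, over 0.6745."""
    x = np.asarray(x, dtype=float)
    return np.median(np.abs(x - np.median(x))) / 0.6745

def c_step(X, subset):
    """One C-step on X (n x p): refit on `subset`, keep the h closest points."""
    mu = X[subset].mean(axis=0)
    cov = np.cov(X[subset], rowvar=False)
    diff = X - mu
    # Squared Mahalanobis distance of every observation to the subset fit.
    d2 = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
    return np.sort(np.argsort(d2)[:len(subset)])

def log_det(X, subset):
    """Log-determinant of the covariance matrix of the selected observations."""
    return np.linalg.slogdet(np.cov(X[subset], rowvar=False))[1]

rng = np.random.default_rng(0)
n, p = 400, 3
X = rng.standard_normal((n, p))
X[:20] += 10.0                 # 5% outliers, shifted by 10 standard deviations
h = int(0.8 * n)               # h = 0.8n, as chosen in the paper

subset = np.sort(rng.choice(n, size=h, replace=False))  # simplified random start
log_dets = [log_det(X, subset)]
for _ in range(50):
    new_subset = c_step(X, subset)
    if np.array_equal(new_subset, subset):
        break                  # converged: the subset no longer changes
    subset = new_subset
    log_dets.append(log_det(X, subset))

signal = X[:, 0]               # one contaminated coordinate
print("sample std:", signal.std(ddof=1), "MAD scale:", mad_scale(signal))
```

In this toy setting the log-determinant does not increase from one C-step to the next, and the sample standard deviation of the contaminated coordinate is inflated by the outliers while the MAD scale stays close to the scale of the clean data.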

Fig. 2. Comparison between the performance of CSP and RCSP in simulation study (a) S1 and (b) S2.

In real EEG signals, the percentage of outliers embedded in the signals is unknown and is subject-specific (one subject may generate artifacts more frequently than others). The use of $h = 0.8n$ assumes that the percentage of outliers in the signals is less than 20%. The shortcoming is that some information is lost when only $h$ observations are used in estimating the covariance matrices (if the number of outliers is less than 20%). If the percentage of outliers in the signals is known, the value of $h$ can be adjusted accordingly so that the performance of the estimator is optimal. Further studies are required to investigate the probability of outlier occurrence in EEG signals.

$N_m$ in study S2 controls the magnitude of the outliers. As shown in Fig. 2(b), when $\epsilon = 0.05$, the performance of the CSP algorithm deteriorates as $N_m$ increases. However, the proposed RCSP algorithm does not experience any decrease in its performance. This is because the Fast-MCD algorithm looks for a subset of $0.8n$ observations in the data that gives the smallest determinant of the covariance matrix. The outlying observations are not selected and hence do not influence the estimation of the covariance matrices. In real EEG data, the range of the outliers' magnitude is large (up to 40 times the standard deviation when the subject is swallowing or performing strong muscle contractions).

Table II compares the performance of the CSP and the proposed RCSP algorithm in simulation studies S0 and S3. The values in brackets next to the subject's identifier (e.g., aa (40%, 0.01)) indicate the percentage of test EEG trials and the probability of outlier occurrence in the training EEG trials. The results show that the proposed algorithm is able to reduce the influence of the outliers. The last row of the table shows that for an average of 2.5% outliers, the average drop in accuracy is 9.21% for the CSP algorithm and 0.72% for the proposed algorithm. The effect of the outliers is especially drastic in the case of Subject ay, for whom the accuracy of the CSP algorithm drops from 89.29% to 48.41%.
This is because only a small number of training trials is used in training the classifier and approximately 6% of the data are contaminated by the simulated outlier segments related to jaw clenching and swallowing, which is much higher than the percentage of outliers for the other subjects. As for Subject al, the number of training trials is large and the probability of outlier occurrence is small. Hence, neither algorithm experiences any deterioration in its performance.

TABLE II
ACCURACY (%) OF CSP AND RCSP IN STUDY S0 AND S3

Subject                   CSP S0   CSP S3   RCSP S0   RCSP S3
aa (40%, 0.01)            71.43    69.64    72.32     70.54
al (20%, 0.01)            94.64    94.64    94.64     94.64
av (70%, 0.02)            60.20    58.16    54.59     61.22
aw (80%, 0.03)            51.34    50.00    52.68     51.34
ay (90%, 0.06)            89.29    48.41    88.49     81.35
Average (60%, 0.025)      73.38    64.17    72.54     71.82
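The averages in the last row of Table II, and the average accuracy drops quoted in the text, can be checked with a few lines of arithmetic. The sketch below uses only the per-subject values from the table; the variable and function names are chosen for the example.

```python
# Per-subject accuracies (%) from Table II, in the order aa, al, av, aw, ay.
csp_s0 = [71.43, 94.64, 60.20, 51.34, 89.29]
csp_s3 = [69.64, 94.64, 58.16, 50.00, 48.41]
rcsp_s0 = [72.32, 94.64, 54.59, 52.68, 88.49]
rcsp_s3 = [70.54, 94.64, 61.22, 51.34, 81.35]

def mean2(values):
    """Average rounded to two decimals, as reported in Table II."""
    return round(sum(values) / len(values), 2)

# Drop in average accuracy from study S0 (no outliers) to study S3.
drop_csp = round(mean2(csp_s0) - mean2(csp_s3), 2)
drop_rcsp = round(mean2(rcsp_s0) - mean2(rcsp_s3), 2)
print(drop_csp, drop_rcsp)
```

This reproduces the reported drops of 9.21 and 0.72 percentage points. The RCSP figure is the difference of the rounded averages (72.54 and 71.82); the unrounded difference is about 0.73.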

Eye blinks do not have a significant effect on the algorithms' performance because their energy content lies mainly in the low frequency bands and the EEG signals used in this study are band-pass filtered to the 8–35 Hz frequency band. However, caution must be exercised when applying CSP to the low frequency band of the EEG signals, such as the lateralised readiness potential (LRP), because ocular artifacts will affect the performance. The proposed RCSP algorithm will be more robust in this case. For example, the EEG segments of Subject aa that correspond to the motor imageries contain approximately 11% outliers caused by eye-blink related artifacts. When the low frequency components (1–4 Hz) are included in extracting the features, the accuracy of the CSP algorithm drops from 71.43% to 62.5%. The accuracy drop for the proposed algorithm is considerably smaller, from 72.32% to 69.64%.

IV. CONCLUSIONS

In this study, we first demonstrate how outliers affect the performance of the CSP algorithm. To make the CSP algorithm more robust, we propose a modified version of it. In the modified version, the covariance estimates in the algorithm are obtained using the MCD covariance estimator. In addition, the use of the MAD to estimate the variance of the projected EEG signals further improves the robustness of the extracted features. The use of the robust estimates safeguards the proposed RCSP algorithm from outliers.

REFERENCES

[1] R. A. Maronna, R. D. Martin, and V. J. Yohai, Robust Statistics: Theory and Methods, 1st ed. England: Wiley, 2006.
[2] P. J. Huber, Robust Statistical Procedures, 2nd ed. Philadelphia, PA: SIAM (Society for Industrial and Applied Mathematics), 1996, vol. 27.
[3] H. Ramoser, J. Müller-Gerking, and G. Pfurtscheller, "Optimal spatial filtering of single trial EEG during imagined hand movement," IEEE Transactions on Rehabilitation Engineering, vol. 8, no. 4, pp. 441–447, 2000.
[4] G. Dornhege, B. Blankertz, M. Krauledat, F. Losch, G. Curio, and K.-R. Müller, "Optimizing spatio-temporal filters for improving Brain-Computer Interfacing," in Advances in Neural Inf. Proc. Systems (NIPS 05), J. Platt, Ed., vol. 18, Vancouver, Canada, December 2005.
[5] S. Visuri, V. Koivunen, and H. Oja, "Sign and rank covariance matrices," Journal of Statistical Planning and Inference, vol. 91, pp. 557–575, 2000.
[6] "Data Set IVa for the BCI Competition III," http://ida.first.fraunhofer.de/projects/bci/competition_iii/desc_IVa.html.
[7] G. Pfurtscheller and F. H. L. da Silva, "Event-related EEG/MEG synchronization and desynchronization: basic principles," Clinical Neurophysiology, vol. 110, pp. 1842–1857, 1999.
[8] Y. Wang, S. Gao, and X. Gao, "Common spatial pattern method for channel selection in motor imagery based Brain-computer Interface," in Engineering in Medicine and Biology Society, 2005. IEEE-EMBS 2005. 27th Annual International Conference of the IEEE, 2005, pp. 5392–5395.
[9] P. J. Rousseeuw and K. Van Driessen, "A fast algorithm for the minimum covariance determinant estimator," Technometrics, vol. 41, no. 3, pp. 212–223, 1999.
[10] D. L. Woodruff and D. M. Rocke, "Computable robust estimation of multivariate location and shape in high dimension using compound estimators," Journal of the American Statistical Association, vol. 89, no. 427, 1994.
[11] P. J. Rousseeuw, "Least median of squares regression," Journal of the American Statistical Association, vol. 79, pp. 871–880, 1984.
[12] S. Verboven and M. Hubert, "LIBRA: a MATLAB library for robust analysis," Chemometrics and Intelligent Laboratory Systems, vol. 75, pp. 127–136, 2005.
[13] R. V. Hogg, "An introduction to robust estimation," in Robustness in Statistics, R. L. Launer and G. N. Wilkinson, Eds. New York: Academic, 1979, pp. 1–17.
[14] P. J. Huber, Robust Statistics. New York: Wiley, 1981.
