Shiliang Sun · Changshui Zhang
Adaptive feature extraction for EEG signal classification
Abstract One challenge in the current research of Brain Computer Interfaces (BCIs) is how to classify time-varying electroencephalographic (EEG) signals as accurately as possible. In this paper, we address this problem from the aspect of updating feature extractors, and propose an adaptive feature extractor, namely Adaptive Common Spatial Patterns (ACSP). Through the weighted update of signal covariances, the most discriminative features related to the current brain states are extracted by the method of multi-class Common Spatial Patterns (CSP). Pseudo-online simulations of EEG signal classification with a Support Vector Machine (SVM) classifier for multi-class mental imagery tasks show the effectiveness of the proposed adaptive feature extractor.

Keywords Brain Computer Interface (BCI), Common Spatial Patterns (CSP), EEG signal classification, Feature extraction

Shiliang Sun and Changshui Zhang
State Key Laboratory of Intelligent Technology and Systems, Department of Automation, Tsinghua University, Beijing 100084, China
Tel.: +80-10-62796872
Fax: +80-10-62786911
E-mail: [email protected], [email protected]

1 Introduction

Research on Brain Computer Interfaces (BCIs), which aim to provide their users with communication and control capabilities that do not depend on the brain's normal output channels of peripheral nerves and muscles, has attracted growing interest in recent years [9,13–15]. Up to the present, study into BCI systems has mainly involved recording of electroencephalographic (EEG) signals using surface electrodes, since this means of recording is relatively convenient, harmless and inexpensive compared with other methods [2]. In this paper, we focus on the classification problem of EEG signals, a crucial component embodied in general EEG-based BCIs. For an EEG-based BCI, adaptive learning algorithms are necessary in principle, because the recorded EEG signals usually change over time due both to biological and to technical causes, such as subject attention, subject fatigue, disease progression, electrode impedances, amplifier noise, and environmental noise [13]. The high variability of EEG recordings makes it a difficult task to classify different EEG signals accurately, and necessitates adaptive learning to boost the performance of existing BCIs.

With respect to adaptive learning for EEG signal classification, one can choose to update classifiers, or alternatively choose to update feature extractors. However, up to the present, there is not much work addressing this problem. The adaptive update of a Bayesian statistical classifier with a Gaussian Mixture Model (GMM) has recently been studied in several papers [6,7,10,11], but the performance is still moderate. Wolpaw and McFarland use the least-mean-square (LMS) algorithm to adaptively adjust weights for a two-dimensional movement control, and find that people with severe motor disabilities could use scalp EEG signals to operate a robotic arm or a neuroprosthesis [16].

In this paper, we propose to address the adaptive learning problem in EEG signal classification via updating feature extractors. The basic feature extractor is named Common Spatial Patterns (CSP), whose essence is to project EEG signals to the most discriminative directions found after the simultaneous diagonalization of covariance matrices from different signal categories [8]. Because of the inherent variability of EEG patterns, the discriminative directions for classification tend to shift over time. In order to resolve this problem, the method of Adaptive Common Spatial Patterns (ACSP) is presented to improve the CSP method.

2 Adaptive feature extraction

2.1 Common Spatial Patterns (CSP) and its extension to the multi-class paradigm

The original feature extractor of CSP can be seen as linear spatial filters that lead to signals which discriminate optimally between two conditions. It is based on a decomposition of the raw multi-channel signals into spatial patterns that are extracted from the data of two populations of EEGs in a manner that maximizes their differences. These spatial patterns provide a weighting of the electrodes, which is derived directly from the data (for a detailed description and computational issues, please refer to [8]). Recently, some approaches have been presented to extend CSP to the multi-class paradigm, such as using CSP within the classifier, one versus the rest CSP (OVR), and approximate simultaneous diagonalization [3,8]. In this paper, the idea of OVR is adopted to carry out feature extraction as suggested by [3]. To be exact, given three sets of EEG trials or segments corresponding to three different conditions A, B and C (e.g. three classes of mental imagery tasks), we use each set to obtain covariance matrices $C_A$, $C_B$, and $C_C$ respectively. Then we can combine the samples belonging to conditions B and C to obtain a covariance matrix $C^A_{\mathrm{Rest}}$. Likewise, covariance matrices $C^B_{\mathrm{Rest}}$ and $C^C_{\mathrm{Rest}}$ could be obtained. Subsequently, we use each pair of covariance matrices (e.g. $C_A$ and $C^A_{\mathrm{Rest}}$) to carry out the standard CSP procedures. Thus the optimal spatial filters related to the corresponding conditions A, B and C are retained. The final feature extractor is the combination of the three sets of spatial filters.
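As an illustration of the one-versus-rest scheme described above, a minimal NumPy sketch follows. It assumes that each trial is an N × T array, that the composite covariance is full rank, and that CSP is computed by whitening followed by an eigendecomposition, which is one common formulation rather than necessarily the exact procedure of [8]. All function names are hypothetical.

```python
import numpy as np


def trial_covariance(trials):
    """Average the spatial covariance x x^T over a list of (N, T) trials."""
    return sum(x @ x.T for x in trials) / len(trials)


def csp_filters(c_target, c_rest, n_filters):
    """Two-condition CSP by whitening plus eigendecomposition (assumed variant).

    Returns n_filters spatial filters (as rows), ordered by decreasing share
    of variance attributed to the target condition.
    """
    # Whiten the composite covariance C_target + C_rest (assumed full rank).
    evals, evecs = np.linalg.eigh(c_target + c_rest)
    whitening = np.diag(evals ** -0.5) @ evecs.T
    # In the whitened space both covariances share the same eigenvectors.
    lam, u = np.linalg.eigh(whitening @ c_target @ whitening.T)
    order = np.argsort(lam)[::-1]  # largest target variance first
    return (u[:, order].T @ whitening)[:n_filters]


def ovr_csp(trials_by_class, n_filters):
    """One-versus-rest CSP: stack the filters obtained for every condition."""
    blocks = []
    for label, trials in trials_by_class.items():
        # Pool the trials of all other conditions to form the 'rest' set.
        rest = [x for other, xs in trials_by_class.items()
                if other != label for x in xs]
        blocks.append(csp_filters(trial_covariance(trials),
                                  trial_covariance(rest), n_filters))
    return np.vstack(blocks)
```

With three conditions and `n_filters` filters per condition, the stacked extractor has `3 * n_filters` rows, one block per condition.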
2.2 Adaptive Common Spatial Patterns (ACSP)

Given K EEG trials or segments x(k) (k = 1, ..., K), the covariance matrix can usually be estimated as

$$C(k) = \frac{1}{K}\sum_{k=1}^{K} x(k)\,x^{\top}(k), \tag{1}$$

where x(k) is an N × T EEG recording matrix, and N and T are respectively the number of recording electrodes and recording points. As we know, the CSP feature extractor adopts fixed covariances obtained from training sessions. But in the ACSP feature extractor, when encountering a new trial or EEG segment from test sessions, e.g., x(k), we update the corresponding covariance matrix as follows

$$C(k) = \mu\,C(k-1) + (1-\mu)\,x(k)\,x^{\top}(k), \tag{2}$$

where $\mu \in [0, 1]$ is defined as the variability coefficient in our paper. This kind of adaptive strategy embodies the idea of weighted average, that is, the current covariance matrix is described as the weighted sum of the historical covariance and the covariance of the current recording segment. In general, those two covariance components given in the right side of (2) are both necessary. $C(k-1)$ contains historical information, and can also benefit the robust computation of $C(k)$. $x(k)\,x^{\top}(k)$ contains the newly added information, and reflects the time-varying characteristic of EEG signals. Concretely, in the multi-class paradigm if the new EEG segment x(k) belongs to condition A, we would update the related variables as follows

$$\begin{aligned}
C_A(k) &= \mu\,C_A(k-1) + (1-\mu)\,x(k)\,x^{\top}(k),\\
C^B_{\mathrm{Rest}}(k) &= \mu\,C^B_{\mathrm{Rest}}(k-1) + (1-\mu)\,x(k)\,x^{\top}(k),\\
C^C_{\mathrm{Rest}}(k) &= \mu\,C^C_{\mathrm{Rest}}(k-1) + (1-\mu)\,x(k)\,x^{\top}(k).
\end{aligned} \tag{3}$$

Then, we follow the standard procedures of CSP to derive projection directions.

The ACSP method is a superset of the basic CSP method, since when the variability coefficient $\mu = 1$, it degenerates to the CSP method. The selection of $\mu$ is very flexible, which can reflect the discrepancies of different subjects. For subjects whose signal variability is very slow, $\mu$ should take large values to retain more historical information, and vice versa. When the ACSP feature extractor is followed by a classification task in BCI applications, we employ the last EEG segment to update the feature extractor, and then use the updated feature extractor to extract the features of the current segment. Based on the extracted features, the classifier trained on the previous training sessions could give the estimated label of the current segment. This process runs iteratively as new EEG segments are continually recorded and sent for classification.

3 Experiments

The data set used in this paper contains EEG recordings from 3 normal subjects (denoted by S1, S2, S3 respectively) during mental imagery tasks, which are imagination of repetitive self-paced left hand movements (class C1), imagination of repetitive self-paced right hand movements (class C2) and
generation of different words beginning with the same random letter (class C3). For every subject, there were 3 recording sessions acquired on the same day, each lasting about 4 minutes with breaks of 5-10 minutes in between [1]. Galán et al. show that subjects S1, S2 and S3 represent three different levels of mental consistency, which are respectively consistent, scarcely consistent and inconsistent [5]. Hence the data set is representative and in this paper it is employed to assess algorithms.

3.1 Signal preprocessing

As mental imagery tasks are mainly related to the activities of the brain's sensorimotor cortices, from the entire electrode cap we retain the central 15 electrodes covering this cortical region for signal analysis, which are F3, Fz, F4, Fc1, Fc2, C3, Cz, C4, Cp1, Cp2, Pz, P3, P4, Po3, Po4. Because the frequency content of spontaneous EEG recordings is usually below 50 Hz, we convert the original sampling rate to 128 Hz without fear of information loss. The signals are then re-referenced using the Common Average Reference method [15]. Further, the continuous recording sessions are partitioned into segments of 1 second with 0.5 second overlap to provide an output every 0.5 seconds using the last second of data, as in [6]. As a result, an EEG segment consists of 128 points, with a time length of 1 second. To emphasize the µ rhythm, which is highly discriminative in distinguishing different mental tasks, these segments are temporally filtered (forward and reverse filtering) with pass bands 8-13 Hz (for subjects S1 and S2) and 11-15 Hz (for subject S3). The filtering bands are determined by observing the first sessions of these three subjects, and the slight variation of filtering bands among subjects reflects individual differences. In addition, each EEG segment is normalized to have unit energy, and to avoid the imperfection of temporal filtering at the start and end parts of a segment, 81 points (with index from 20 to 100) are retained for analysis from the original 128 points.

3.2 Training the classifier

Throughout this paper, the Support Vector Machine (SVM) classifier with a Radial Basis Function (RBF) kernel is adopted to classify EEG segments [12]. The experimental paradigm for training a classifier is as follows. First we choose two recording sessions from the same subject, one serving as the training session, and the other as the test session. Then we use the stationary CSP method to extract features from the training session, and further use these features to train an SVM classifier. The number of sources (number of spatial patterns related to one condition, such as 'left hand movement') varies from 2 to 6 in our experiments, for this is empirically enough to describe the sources of mental imagery tasks. Therefore, the maximum dimension of one EEG segment after feature extraction would be 18 (6 dimensions for each kind of brain state). The optimal parameters, i.e., the penalty parameter for error terms in the SVM classifier, the variance parameter in the RBF kernel, and the number of sources related to each mental task, are selected through 20-fold cross validation [4] on each training set. Finally, the SVM is retrained using these optimal parameters and the whole training set.
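Once the classifier is trained it stays fixed, and only the covariance matrices change while a test session is processed. As an illustration of that test-time step, the weighted update of Eq. (2), applied per class as in Eq. (3), might be sketched with NumPy as below. The function names, the dictionary layout, and the symmetric treatment of conditions B and C are assumptions made for this sketch: the text writes out only the case of condition A, and it does not say how the class of the last segment is obtained, so the label is simply passed in. The default `mu=0.95` is the value reported for the experiments.

```python
import numpy as np


def acsp_update(cov, x, mu=0.95):
    """Eq. (2): C(k) = mu * C(k-1) + (1 - mu) * x(k) x(k)^T."""
    return mu * cov + (1.0 - mu) * (x @ x.T)


def update_ovr_covariances(cov_class, cov_rest, x, label, mu=0.95):
    """Eq. (3), extended symmetrically to every class (an assumption).

    A segment of class `label` refreshes that class's own covariance and
    the 'rest' covariance of every other class. Inputs are left untouched.
    """
    cov_class = dict(cov_class)  # shallow copies so callers keep the old state
    cov_rest = dict(cov_rest)
    cov_class[label] = acsp_update(cov_class[label], x, mu)
    for other in cov_rest:
        if other != label:
            cov_rest[other] = acsp_update(cov_rest[other], x, mu)
    return cov_class, cov_rest
```

After each update, the spatial filters would be recomputed from the refreshed covariance pairs by the standard CSP procedure before features are extracted from the current segment. Setting `mu=1.0` leaves every matrix unchanged, which corresponds to the stationary case.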
3.3 Experimental results

For the sake of objectively evaluating the ACSP feature extractor, two other feature extractors are employed for performance comparisons. One is Stationary CSP (SCSP), the standard CSP method which does not update spatial filters at all on new test sessions. The spatial filters computed from the training session are used to implement feature extraction of the test session. The other feature extractor, which we name Windowed CSP (WCSP), updates the signal covariance by adding a new EEG segment and removing the first segment from the original segment entries for calculating the covariance matrix. Like ACSP, WCSP uses updated covariances to construct a new feature extractor, and then extracts features of the current EEG segment. The variability coefficient $\mu$ in the ACSP feature extractor is set empirically to 0.95 in our experiments.

The experimental results of classification rates on all the available recording sessions using the above three feature extractors are given in Table 1. Through a paired t test, no significant differences are found between feature extractors SCSP and WCSP (p-value = 0.25), although the average accuracy of WCSP is slightly better than that of SCSP. However, with regard to SCSP and ACSP, sixteen out of eighteen classification results of using ACSP are superior to those using SCSP. Through a paired t test, significant differences are found between feature extractors SCSP and ACSP (p-value < 1e−3). From these results, we can draw the conclusion that the ACSP feature extractor is the best among SCSP, WCSP and ACSP, at least on the used data set. At the same time, this also manifests the necessity and applicability of our adaptive feature extractor.

Table 1 The classification accuracies (%) by using different feature extractors

Subject   Training session   Test session   SCSP    WCSP    ACSP
S1        1                  2              69.03   68.60   70.32
S1        1                  3              68.74   68.31   76.66
S1        2                  1              61.59   65.24   63.73
S1        2                  3              69.38   69.16   64.24
S1        3                  1              51.93   59.23   63.30
S1        3                  2              58.71   57.42   67.96
S2        1                  2              51.29   57.33   65.09
S2        1                  3              56.71   60.82   74.46
S2        2                  1              54.31   53.88   63.36
S2        2                  3              58.01   57.36   69.70
S2        3                  1              53.02   56.03   70.47
S2        3                  2              59.91   56.03   65.52
S3        1                  2              56.49   58.87   65.80
S3        1                  3              52.60   48.48   57.58
S3        2                  1              54.63   54.85   53.96
S3        2                  3              55.19   54.33   56.06
S3        3                  1              51.98   54.85   57.27
S3        3                  2              64.94   63.64   66.67
Average                                     58.25   59.14   65.12

4 Discussions and conclusions

In this paper, we propose the ACSP method for the feature extraction of EEG signals. Its efficacy and superiority over the SCSP and WCSP methods are validated through classification experiments on multiple recording sessions of three subjects. Theoretically, because EEG signals are time-varying, if we use SCSP to extract features on new EEG segments, the distribution of signal features would shift. In this sense, adaptive feature extraction has a sound basis. However, since WCSP treats all the EEG entries equally in computing the covariance matrix, it cannot capture the variability of EEG signals effectively. This is the reason why WCSP is inferior to ACSP, which weights the EEG entries appropriately.

In two of the eighteen cases, ACSP does not show improvements in the classification. We provide an explanation for this phenomenon. Though the subjects are doing mental imagery tasks, there exist some irrelevant activities from the brain. These activities also change over time, and the present study does not take this into account. A complete adaptive algorithm should also address the adaptive behaviour of the brain itself.

The computational complexity of ACSP is very low. Because the number of electrodes (data dimensions) in BCI utilities is usually small, such as 64, 128, and 256 (15 electrodes in our experiments), the simultaneous diagonalization of covariance matrices could almost be implemented in real time. This could completely meet the needs of online learning. Besides, as the subjects used in this article represent three different levels of mental consistency, which are respectively consistent, scarcely consistent and inconsistent [5], we are confident that ACSP would function well in a wide range of users. We believe that ACSP would show more merits in future developments of BCI technology.

Acknowledgements The authors would like to thank IDIAP Research Institute of Switzerland for providing the analyzed data. The authors are also grateful to the anonymous editor and reviewers for giving valuable comments.

References

1. Chiappa S, Millán JR (2005) Data set V [http://ida.first.fraunhofer.de/projects/bci/competition iii/ desc V.html]. IDIAP Research Institute, Switzerland
2. Curran EA, Stokes MJ (2003) Learning to control brain activity: a review of the production and control of EEG components for driving brain-computer interface (BCI) systems. Brain and Cognition 51:326–336
3. Dornhege G, Blankertz B, Curio G, Müller KR (2004) Boosting bit rates in noninvasive EEG single-trial classifications by feature combination and multiclass paradigms. IEEE Trans Biomed Eng 51:993–1002
4. Duda RO, Hart PE, Stork DG (2000) Pattern classification. John Wiley & Sons, New York, USA
5. Galán F, Oliva F, Guàrdia J (2005) BCI competition III, data set V: algorithm description [http://ida.first.fraunhofer.de/projects/bci/competition iii/results/martigny/FerranGalan desc.pdf]. Faculty of Psychology, University of Barcelona
6. Millán JR (2004) On the need for on-line learning in brain-computer interfaces. Proc Int Joint Conf on Neural Networks, Budapest, Hungary
7. Millán JR, Renkens F, Mouriño J, Gerstner W (2004) Brain-actuated interaction. Artif Intell 159:241–259
8. Müller-Gerking J, Pfurtscheller G, Flyvbjerg H (1999) Designing optimal spatial filters for single-trial EEG classification in a movement task. Clin Neurophysiol 110:787–798
9. Nicolelis MAL (2001) Actions from thoughts. Nature 409:403–407
10. Sun S, Zhang C (2005) Learning on-line classification via decorrelated LMS algorithm: application to brain-computer interfaces. Lect Notes Comput Sc 3735:215–226
11. Sun S, Zhang C, Lu N (2005) On the on-line learning algorithms for EEG signal classification in brain computer interfaces. Lect Notes Comput Sc 3614:638–647
12. Vapnik V (2000) The nature of statistical learning theory. Springer-Verlag, New York, USA
13. Vaughan TM (2003) Guest editorial brain-computer interface technology: a review of the second international meeting. IEEE Trans Neur Sys Reh 11:94–109
14. Wolpaw JR, McFarland DJ, Neat GW, Forneris C (1991) An EEG-based brain-computer interface for cursor control. Electroencephalogr Clin Neurophysiol 78:252–259
15. Wolpaw JR, Birbaumer N, McFarland DJ, Pfurtscheller G, Vaughan TM (2002) Brain-computer interfaces for communication and control. Clin Neurophysiol 113:767–791
16. Wolpaw JR, McFarland DJ (2004) Control of a two-dimensional movement signal by a non-invasive brain-computer interface in humans. Proc Natl Acad Sci 101:17849–17854