
Adaptive feature extraction for EEG signal classification

Shiliang Sun · Changshui Zhang


Abstract One challenge in the current research of Brain Computer Interfaces (BCIs) is how to classify time-varying electroencephalographic (EEG) signals as accurately as possible. In this paper, we address this problem from the aspect of updating feature extractors, and propose an adaptive feature extractor, namely Adaptive Common Spatial Patterns (ACSP). Through the weighted update of signal covariances, the most discriminative features related to the current brain states are extracted by the method of multi-class Common Spatial Patterns (CSP). Pseudo-online simulations of EEG signal classification with a Support Vector Machine (SVM) classifier for multi-class mental imagery tasks show the effectiveness of the proposed adaptive feature extractor.

Keywords Brain Computer Interface (BCI), Common Spatial Patterns (CSP), EEG signal classification, Feature extraction

1 Introduction

The research of Brain Computer Interfaces (BCIs), which aim to provide their users with communication and control capabilities that do not depend on the brain's normal output channels of peripheral nerves and muscles, has attracted increasing interest in recent years [9,13–15]. Up to the present, study into BCI systems has mainly involved recording electroencephalographic (EEG) signals with surface electrodes, since this kind of recording is relatively convenient, harmless and inexpensive compared with other methods [2]. In this paper, we focus on the classification problem of EEG signals, a crucial component of general EEG-based BCIs. For an EEG-based BCI, adaptive learning algorithms are necessary in principle, because the recorded EEG signals usually change over time due both to biological and to technical causes, such as subject attention, subject fatigue, disease progression, electrode impedances, amplifier noise, and environmental noise [13]. The high variability of EEG recordings makes it difficult to classify different EEG signals accurately, and necessitates adaptive learning to boost the performance of existing BCIs.

With respect to adaptive learning for EEG signal classification, one can choose to update classifiers, or alternatively to update feature extractors. However, up to the present, there has not been much work addressing this problem. The adaptive update of a Bayesian statistical classifier with a Gaussian Mixture Model (GMM) has recently been studied in several papers [6,7,10,11], although the performance achieved is still moderate. Wolpaw and McFarland use the least-mean-square (LMS) algorithm to adaptively adjust weights for two-dimensional movement control, and find that people with severe motor disabilities could use scalp EEG signals to operate a robotic arm or a neuroprosthesis [16].

In this paper, we propose to address the adaptive learning problem in EEG signal classification by updating feature extractors. The basic feature extractor is Common Spatial Patterns (CSP), whose essence is to project EEG signals onto the most discriminative directions found by the simultaneous diagonalization of covariance matrices from different signal categories [8]. Because of the inherent variability of EEG patterns, the discriminative directions for classification tend to shift over time. To resolve this problem, the method of Adaptive Common Spatial Patterns (ACSP) is presented as an improvement of the CSP method.

Shiliang Sun and Changshui Zhang
State Key Laboratory of Intelligent Technology and Systems, Department of Automation, Tsinghua University, Beijing 100084, China
Tel.: +80-10-62796872
Fax: +80-10-62786911

2 Adaptive feature extraction

2.1 Common Spatial Patterns (CSP) and its extension to the multi-class paradigm

The original CSP feature extractor can be seen as a set of linear spatial filters that produce signals which discriminate optimally between two conditions. It is based on a decomposition of the raw multi-channel signals into spatial patterns that are extracted from the data of two populations of EEGs in a manner that maximizes their differences. These spatial patterns provide a weighting of the electrodes, which is derived directly from the data (for a detailed description and computational issues, please refer to [8]). Recently, several approaches have been presented to extend CSP to the multi-class paradigm, such as using CSP within the classifier, one-versus-the-rest CSP (OVR), and approximate simultaneous diagonalization [3,8]. In this paper, the idea of OVR is adopted to carry out feature extraction, as suggested by [3]. To be exact, given three sets of EEG trials or segments corresponding to three different conditions A, B and C (e.g. three classes of mental imagery tasks), we use each set to obtain the covariance matrices $C_A$, $C_B$ and $C_C$, respectively. Then we combine the samples belonging to conditions B and C to obtain a covariance matrix $C_A^{\mathrm{Rest}}$. Likewise, the covariance matrices $C_B^{\mathrm{Rest}}$ and $C_C^{\mathrm{Rest}}$ can be obtained. Subsequently, we use each pair of covariance matrices (e.g. $C_A$ and $C_A^{\mathrm{Rest}}$) to carry out the standard CSP procedure, so that the optimal spatial filters related to each of the conditions A, B and C are retained. The final feature extractor is the combination of the three sets of spatial filters.

2.2 Adaptive Common Spatial Patterns (ACSP)

Given $K$ EEG trials or segments $x(k)$ ($k = 1, \dots, K$), the covariance matrix is usually estimated as

$$C(K) = \frac{1}{K}\sum_{k=1}^{K} x(k)\,x^{\top}(k), \tag{1}$$

where $x(k)$ is an $N \times T$ EEG recording matrix, and $N$ and $T$ are respectively the number of recording electrodes and the number of recording points. The CSP feature extractor adopts fixed covariances obtained from the training sessions. In the ACSP feature extractor, by contrast, when a new trial or EEG segment $x(k)$ from the test sessions is encountered, we update the corresponding covariance matrix as

$$C(k) = \mu\, C(k-1) + (1-\mu)\, x(k)\,x^{\top}(k), \tag{2}$$

where $\mu \in [0, 1]$ is defined as the variability coefficient in our paper. This adaptive strategy embodies the idea of a weighted average: the current covariance matrix is the weighted sum of the historical covariance and the covariance of the current recording segment. In general, both covariance components on the right side of (2) are necessary. $C(k-1)$ contains historical information, and also benefits the robust computation of $C(k)$. $x(k)\,x^{\top}(k)$ contains the newly added information, and reflects the time-varying characteristic of EEG signals. Concretely, in the multi-class paradigm, if the new EEG segment $x(k)$ belongs to condition A, we update the related variables as follows:

$$\begin{aligned} C_A(k) &= \mu\, C_A(k-1) + (1-\mu)\, x(k)\,x^{\top}(k),\\ C_B^{\mathrm{Rest}}(k) &= \mu\, C_B^{\mathrm{Rest}}(k-1) + (1-\mu)\, x(k)\,x^{\top}(k),\\ C_C^{\mathrm{Rest}}(k) &= \mu\, C_C^{\mathrm{Rest}}(k-1) + (1-\mu)\, x(k)\,x^{\top}(k). \end{aligned} \tag{3}$$

Then, we follow the standard procedures of CSP to derive the projection directions.

The ACSP method is a superset of the basic CSP method, since when the variability coefficient $\mu = 1$, it degenerates to the CSP method. The selection of $\mu$ is very flexible and can reflect the differences between subjects. For subjects whose signal variability is very slow, $\mu$ should take large values to retain more historical information, and vice versa.

When the ACSP feature extractor is followed by a classification task in BCI applications, we employ the last EEG segment to update the feature extractor, and then use the updated feature extractor to extract the features of the current segment. Based on the extracted features, the classifier trained on the previous training sessions gives the estimated label of the current segment. This process runs iteratively as new EEG segments are continually recorded and sent for classification.

3 Experiments

The data set used in this paper contains EEG recordings from 3 normal subjects (denoted S1, S2 and S3) during mental imagery tasks, namely imagination of repetitive self-paced left hand movements (class C1), imagination of repetitive self-paced right hand movements (class C2), and generation of different words beginning with the same random letter (class C3). For every subject, there were 3 recording sessions acquired on the same day, each lasting about 4 minutes, with breaks of 5–10 minutes in between [1]. Galán et al. show that subjects S1, S2 and S3 represent three different levels of mental consistency, which are respectively consistent, scarcely consistent and inconsistent [5]. Hence the data set is representative, and in this paper it is employed to assess the algorithms.

3.1 Signal preprocessing

As mental imagery tasks are mainly related to the activities of the brain's sensorimotor cortices, from the entire electrode cap we retain the central 15 electrodes covering this cortical region for signal analysis, namely F3, Fz, F4, Fc1, Fc2, C3, Cz, C4, Cp1, Cp2, Pz, P3, P4, Po3 and Po4. Because the frequency content of spontaneous EEG recordings usually lies below 50 Hz, we convert the original sampling rate to 128 Hz without loss of information, since a 128 Hz rate preserves components up to its 64 Hz Nyquist frequency. The signals are then re-referenced using the Common Average Reference method [15]. Further, each continuous recording session is partitioned into segments of 1 second with 0.5 second overlap, to provide an output every 0.5 seconds using the last second of data, as in [6]. As a result, an EEG segment consists of 128 points, with a time length of 1 second. To emphasize the µ rhythm, which is highly discriminative in distinguishing different mental tasks, these segments are temporally filtered (forward and reverse filtering) with pass bands of 8–13 Hz (for subjects S1 and S2) and 11–15 Hz (for subject S3). The filtering bands are determined by observing the first sessions of these three subjects, and the slight variation of filtering bands among subjects reflects individual differences. In addition, each EEG segment is normalized to have unit energy, and, to avoid the imperfections of temporal filtering at the start and end of a segment, 81 points (with indices from 20 to 100) are retained for analysis from the original 128 points.

3.2 Training the classifier

Throughout this paper, the Support Vector Machine (SVM) classifier with a Radial Basis Function (RBF) kernel is adopted to classify EEG segments [12]. The experimental paradigm for training a classifier is as follows. First we choose two recording sessions from the same subject, one serving as the training session and the other as the test session. Then we use the stationary CSP method to extract features from the training session, and use these features to train an SVM classifier. The number of sources (the number of spatial patterns related to one condition, such as 'left hand movement') varies from 2 to 6 in our experiments, since this is empirically enough to describe the sources of mental imagery tasks. Therefore, the maximum dimension of one EEG segment after feature extraction is 18 (at most 6 dimensions for each of the three brain states). The optimal parameters, i.e., the penalty parameter for error terms in the SVM classifier, the variance parameter of the RBF kernel, and the number of sources related to each mental task, are selected through 20-fold cross validation [4] on each training set. Finally, the SVM is retrained using these optimal parameters and the whole training set.
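The OVR feature extractor of Section 2.1 and the covariance updates (2) and (3) of Section 2.2 can be sketched in Python with NumPy as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the two-class CSP step uses the common whitening-plus-eigendecomposition formulation, each class keeps the `n_sources` filters with the largest target-class variance, the features are normalized log-variances of the filtered segment (the paper does not state which feature is computed from the projections), the true label of the previous segment is assumed to be available for the update, and the names `csp_filters` and `ACSPExtractor` are hypothetical.

```python
import numpy as np


def csp_filters(c_target, c_rest, n_sources):
    """Two-class CSP via whitening of the composite covariance (assumed formulation).

    Returns an (n_sources, N) array whose rows are spatial filters with the
    largest target-class variance after whitening.
    """
    evals, evecs = np.linalg.eigh(c_target + c_rest)
    whitening = np.diag(evals ** -0.5) @ evecs.T        # P (C_t + C_r) P^T = I
    t_evals, t_evecs = np.linalg.eigh(whitening @ c_target @ whitening.T)
    order = np.argsort(t_evals)[::-1][:n_sources]        # descending target variance
    return t_evecs[:, order].T @ whitening


class ACSPExtractor:
    """Hypothetical one-versus-rest ACSP feature extractor.

    segments: list of (N, T) arrays; labels: matching class labels.
    mu is the variability coefficient of Eq. (2); mu=1.0 keeps the
    training covariances fixed, i.e. stationary CSP.
    """

    def __init__(self, segments, labels, n_sources=2, mu=0.95):
        self.mu = mu
        self.n_sources = n_sources
        self.classes = sorted(set(labels))
        outer = [x @ x.T for x in segments]
        # Eq. (1) per class, plus the pooled "Rest" covariance of the other classes.
        self.c_class = {c: np.mean([o for o, y in zip(outer, labels) if y == c], axis=0)
                        for c in self.classes}
        self.c_rest = {c: np.mean([o for o, y in zip(outer, labels) if y != c], axis=0)
                       for c in self.classes}
        self._refresh()

    def _refresh(self):
        # The final extractor stacks the filter sets of all classes.
        self.filters = np.vstack([
            csp_filters(self.c_class[c], self.c_rest[c], self.n_sources)
            for c in self.classes
        ])

    def update(self, x, label):
        # Eq. (3): x updates its own class covariance and the Rest covariance
        # of every other class; the spatial filters are then recomputed.
        xxt = x @ x.T
        self.c_class[label] = self.mu * self.c_class[label] + (1 - self.mu) * xxt
        for c in self.classes:
            if c != label:
                self.c_rest[c] = self.mu * self.c_rest[c] + (1 - self.mu) * xxt
        self._refresh()

    def features(self, x):
        # Normalized log-variance of the spatially filtered segment (assumed feature).
        var = np.var(self.filters @ x, axis=1)
        return np.log(var / var.sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    def toy_segment(label):
        # Toy data for illustration only: class c has extra power on channel c.
        x = rng.standard_normal((4, 81))
        x[label] *= 3.0
        return x

    train_labels = [0, 1, 2] * 20
    extractor = ACSPExtractor([toy_segment(y) for y in train_labels], train_labels)
    previous = None
    for y in [0, 2, 1, 1]:                        # pseudo-online test stream
        x = toy_segment(y)
        if previous is not None:
            extractor.update(*previous)            # adapt with the last labelled segment
        print(y, np.round(extractor.features(x), 2))  # 3 classes x 2 sources = 6 features
        previous = (x, y)
```

Calling `update` with the previous segment before `features` on the current one mirrors the order described in Section 2.2. A WCSP-style baseline could be sketched by replacing the exponential average in `update` with an equally weighted mean over a fixed-length window of recent segments.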

3.3 Experimental results

For the sake of objectively evaluating the ACSP feature extractor, two other feature extractors are employed for performance comparisons. One is Stationary CSP (SCSP), the standard CSP method, which does not update the spatial filters at all on new test sessions: the spatial filters computed from the training session are used to implement feature extraction on the test session. The other feature extractor, which we name Windowed CSP (WCSP), updates the signal covariance by adding the new EEG segment and removing the first segment from the set of segments used for calculating the covariance matrix. Like ACSP, WCSP uses the updated covariances to construct a new feature extractor, and then extracts the features of the current EEG segment. The variability coefficient µ of the ACSP feature extractor is set empirically to 0.95 in our experiments.

The classification rates on all the available recording sessions using the above three feature extractors are given in Table 1. A paired t test finds no significant difference between the feature extractors SCSP and WCSP (p-value = 0.25), although the average accuracy of WCSP is slightly better than that of SCSP (59.14% versus 58.25%). However, with regard to SCSP and ACSP, sixteen out of eighteen classification results obtained with ACSP are superior to those obtained with SCSP, and a paired t test finds a significant difference between SCSP and ACSP (p-value < 1e-3). From these results, we conclude that ACSP is the best of the three feature extractors SCSP, WCSP and ACSP, at least on the data set used. At the same time, this demonstrates the necessity and applicability of our adaptive feature extractor.

Table 1 Classification accuracies (%) obtained with the different feature extractors

Subject  Training session  Test session  SCSP   WCSP   ACSP
S1       1                 2             69.03  68.60  70.32
S1       1                 3             68.74  68.31  76.66
S1       2                 1             61.59  65.24  63.73
S1       2                 3             69.38  69.16  64.24
S1       3                 1             51.93  59.23  63.30
S1       3                 2             58.71  57.42  67.96
S2       1                 2             51.29  57.33  65.09
S2       1                 3             56.71  60.82  74.46
S2       2                 1             54.31  53.88  63.36
S2       2                 3             58.01  57.36  69.70
S2       3                 1             53.02  56.03  70.47
S2       3                 2             59.91  56.03  65.52
S3       1                 2             56.49  58.87  65.80
S3       1                 3             52.60  48.48  57.58
S3       2                 1             54.63  54.85  53.96
S3       2                 3             55.19  54.33  56.06
S3       3                 1             51.98  54.85  57.27
S3       3                 2             64.94  63.64  66.67
Average                                  58.25  59.14  65.12

4 Discussions and conclusions

In this paper, we propose the ACSP method for the feature extraction of EEG signals. Its efficacy and superiority over the SCSP and WCSP methods are validated through classification experiments on multiple recording sessions of three subjects. Theoretically, because EEG signals are time-varying, if we use SCSP to extract features from new EEG segments, the distribution of the signal features will drift. In this sense, adaptive feature extraction has a sound basis. However, since WCSP treats all the EEG entries used in computing the covariance matrix equally, it cannot capture the variability of EEG signals effectively. This is why WCSP is inferior to ACSP, which weights the EEG entries appropriately.

In a few of the session pairs in Table 1, ACSP does not show improvements in the classification. We provide an explanation for this phenomenon. Though the subjects are performing mental imagery tasks, there also exist some irrelevant brain activities. These activities change over time as well, and the present study does not take this into account. A complete adaptive algorithm should also address the adaptive behaviour of the brain itself.

The computational cost of ACSP is low. Because the number of electrodes (data dimensions) in BCI systems is usually small, such as 64, 128 or 256 (15 electrodes in our experiments), the simultaneous diagonalization of covariance matrices can be implemented almost in real time. This fully meets the needs of on-line learning. Besides, as the subjects used in this article represent three different levels of mental consistency, which are respectively consistent, scarcely consistent and inconsistent [5], we are confident that ACSP would function well for a wide range of users. We believe that ACSP will show more merits in future developments of BCI technology.

Acknowledgements The authors would like to thank the IDIAP Research Institute of Switzerland for providing the analyzed data. The authors are also grateful to the anonymous editor and reviewers for their valuable comments.

References

1. Chiappa S, Millán JR (2005) Data set V [http://ida.first.fraunhofer.de/projects/bci/competition_iii/desc_V.html]. IDIAP Research Institute, Switzerland
2. Curran EA, Stokes MJ (2003) Learning to control brain activity: a review of the production and control of EEG components for driving brain-computer interface (BCI) systems. Brain and Cognition 51:326–336
3. Dornhege G, Blankertz B, Curio G, Müller KR (2004) Boosting bit rates in noninvasive EEG single-trial classifications by feature combination and multiclass paradigms. IEEE Trans Biomed Eng 51:993–1002
4. Duda RO, Hart PE, Stork DG (2000) Pattern classification. John Wiley & Sons, New York, USA
5. Galán F, Oliva F, Guàrdia J (2005) BCI competition III, data set V: algorithm description [http://ida.first.fraunhofer.de/projects/bci/competition_iii/results/martigny/FerranGalan_desc.pdf]. Faculty of Psychology, University of Barcelona
6. Millán JR (2004) On the need for on-line learning in brain-computer interfaces. Proc Int Joint Conf on Neural Networks, Budapest, Hungary
7. Millán JR, Renkens F, Mouriño J, Gerstner W (2004) Brain-actuated interaction. Artif Intell 159:241–259
8. Müller-Gerking J, Pfurtscheller G, Flyvbjerg H (1999) Designing optimal spatial filters for single-trial EEG classification in a movement task. Clin Neurophysiol 110:787–798
9. Nicolelis MAL (2001) Actions from thoughts. Nature 409:403–407
10. Sun S, Zhang C (2005) Learning on-line classification via decorrelated LMS algorithm: application to brain-computer interfaces. Lect Notes Comput Sc 3735:215–226
11. Sun S, Zhang C, Lu N (2005) On the on-line learning algorithms for EEG signal classification in brain computer interfaces. Lect Notes Comput Sc 3614:638–647
12. Vapnik V (2000) The nature of statistical learning theory. Springer-Verlag, New York, USA
13. Vaughan TM (2003) Guest editorial brain-computer interface technology: a review of the second international meeting. IEEE Trans Neur Sys Reh 11:94–109
14. Wolpaw JR, McFarland DJ, Neat GW, Forneris C (1991) An EEG-based brain-computer interface for cursor control. Electroencephalogr Clin Neurophysiol 78:252–259
15. Wolpaw JR, Birbaumer N, McFarland DJ, Pfurtscheller G, Vaughan TM (2002) Brain-computer interfaces for communication and control. Clin Neurophysiol 113:767–791
16. Wolpaw JR, McFarland DJ (2004) Control of a two-dimensional movement signal by a non-invasive brain-computer interface in humans. Proc Natl Acad Sci 101:17849–17854
