
Sparse Bayesian Classification of EEG for Brain–Computer Interface

Yu Zhang, Guoxu Zhou, Jing Jin, Qibin Zhao, Xingyu Wang, and Andrzej Cichocki, Fellow, IEEE

Abstract— Regularization has been one of the most popular approaches to prevent overfitting in electroencephalogram (EEG) classification for brain–computer interfaces (BCIs). The effectiveness of regularization is often highly dependent on the selection of regularization parameters, which are typically determined by cross-validation (CV). However, CV imposes two main limitations on BCIs: 1) a large amount of training data is required from the user and 2) it takes a relatively long time to calibrate the classifier. These limitations substantially deteriorate the system's practicability and may make a user reluctant to use BCIs. In this paper, we introduce a sparse Bayesian method exploiting Laplace priors, namely, SBLaplace, for EEG classification. A sparse discriminant vector is learned with a Laplace prior in a hierarchical fashion under a Bayesian evidence framework. All required model parameters are automatically estimated from training data without the need for CV. Extensive comparisons are carried out between the SBLaplace algorithm and several other competing methods on two EEG data sets. The experimental results demonstrate that the SBLaplace algorithm achieves better overall performance than the competing algorithms for EEG classification.

Index Terms— Brain–computer interface (BCI), electroencephalogram (EEG), event-related potential (ERP), Laplace prior, sparse Bayesian classification.

I. INTRODUCTION

A BRAIN–COMPUTER interface (BCI) is a new communication technique that aims to establish nonmuscular connections between a human brain and computer [1], [2]. BCIs can translate the intent of a user into computer commands by classifying (recognizing) a task-related brain activity

Manuscript received August 29, 2014; revised August 28, 2015; accepted August 29, 2015. This work was supported in part by the Grants-in-Aid for Scientific Research through the Japan Society for the Promotion of Science under Grant 15K15955 and Grant 26730125, in part by the National Natural Science Foundation of China under Grant 61201124, Grant 61202155, Grant 61203127, Grant 61305028, Grant 61573142, and Grant 91420302, in part by the Fundamental Research Funds for the Central Universities under Grant WG1414005, Grant WH1314023, and Grant WH1414022, and in part by the Guangdong Natural Science Foundation under Grant 2014A030308009. (Corresponding author: Yu Zhang.)
Y. Zhang, J. Jin, and X. Wang are with the Key Laboratory for Advanced Control and Optimization for Chemical Processes, Ministry of Education, East China University of Science and Technology, Shanghai 200237, China (e-mail: [email protected]; [email protected]; [email protected]).
G. Zhou and Q. Zhao are with the Laboratory for Advanced Brain Signal Processing, RIKEN Brain Science Institute, Saitama 351-0198, Japan (e-mail: [email protected]; [email protected]).
A. Cichocki is with the Laboratory for Advanced Brain Signal Processing, RIKEN Brain Science Institute, Saitama 351-0198, Japan, and also with the System Research Institute, Polish Academy of Sciences, Warsaw 00-901, Poland (e-mail: [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TNNLS.2015.2476656

typically measured by electroencephalography (EEG) [3]–[5]. With the help of BCIs, severely disabled people can potentially recover their ability to control their environment. Currently, one of the most widely adopted EEG patterns for BCI development is the event-related potential (ERP), which is time- and phase-locked to stimulus events of interest [6]–[8]. The typical ERP components P300, N170, and N200 have been successfully applied to the design of BCIs [9]–[11]. P300 is a positive deflection in the EEG occurring at ∼300 ms after a stimulus, while N170 and N200 are negative deflections at 170 and 200 ms, respectively. Accordingly, ERP-based BCIs have been developed to detect the desired commands from a user by classifying the ERP components corresponding to controlled stimuli [6], [12]. Accurate classification of EEG patterns can be considerably challenging due to their poor signal-to-noise ratio caused by volume conduction effects, various noises, and signal nonstationarity [6], [13]. In particular, for ERP classification, the number of samples or feature vectors is relatively small within a limited calibration time, while each sample is formed by the concatenation of multiple temporal points from multiple channels. Hence, the dimensionality of the feature space is high relative to the number of training samples [12]–[15]. As a result, overfitting is likely to occur, which deteriorates the classification performance when traditional classification methods, such as Fisher's discriminant analysis (FDA), are used. To date, regularization techniques have been the most widely applied remedy for overfitting in EEG classification [16]–[18]. Typically, the support vector machine (SVM) adopts a regularization term to control the classification error on the training data and to increase the generalization capability for test data [13]. Good SVM performance has been confirmed by many studies on EEG classification [19]–[21]. By imposing an L2-norm constraint, Müller et al. [18] proposed regularized FDA (RFDA) for BCI applications. The RFDA effectively reduces the influence of outliers and performs better than the FDA on motor-imagery (MI) EEG classification. Replacing the L2-norm by the L1-norm, Blankertz et al. [17] introduced a sparse variant of FDA for finger-movement-related EEG classification, which yields a lower classification error than both FDA and SVM. Zhang et al. [12] introduced a sparse FDA under the least-squares regression framework for ERP classification in BCI applications. Sparse FDA automatically implements feature selection for dimensionality reduction to alleviate the curse of dimensionality, and it outperforms the standard FDA. Another regularized version of FDA is stepwise



linear discriminant analysis (SWLDA) [14], which improves the classification performance by iteratively selecting useful features. SWLDA has been widely applied to BCIs [7], [22]. The regularization parameter plays a key role in regularization techniques and is typically determined by cross-validation (CV). However, CV requires a large amount of training data to construct an additional validation set; moreover, CV for classifier calibration is usually computationally expensive. These two factors greatly limit the system's practicability and may cause a user to be reluctant to use a BCI. Instead of CV, Bayesian inference provides an effective approach to automatically and quickly estimate the model parameters under the so-called evidence framework [23]–[26]. Through a Bayesian treatment of linear regression, Hoffmann et al. [16] introduced Bayesian LDA (BLDA) for ERP classification in BCI applications. BLDA is essentially a Bayesian version of RFDA and can efficiently determine the degree of regularization without time-intensive CV. The outstanding performance of BLDA has been confirmed by many studies on BCI development [11], [27]–[29]. Through an elegant Bayesian treatment, sparse Bayesian learning (SBL) [30] shows its strength in estimating discriminant vectors with automatically determined sparsity. SBL has been successfully applied to P300 feature selection [31]. In recent years, Bayesian methods have also been proposed for automatic spatial filtering of EEG [32]–[36]. Most of the related methods are summarized in [37]. In terms of the Bayesian rule, BLDA adopts a standard Gaussian prior for L2-norm regularization, while SBL uses a separate Gaussian prior for sparse learning. The standard Gaussian prior does not result in a sparse discriminative vector but a relatively smooth one. Hence, BLDA has difficulty in selecting the most significant features for dimensionality reduction and may not give the best classification accuracy with small sample sizes, as is typical in BCI applications [12]. On the other hand, although a separate Gaussian prior provides a sparse solution, it results in a multimodal posterior that gives different interpretations for a single observation [38], [39]. Consequently, increased feature dimensionality and a decreased number of samples make accurate inference more difficult. A Laplace prior is a more natural approach to sparse learning, since it is formally equivalent to L1-norm regularization [40]. Recently, Laplace priors have been successfully applied to Bayesian compressive sensing [38], [41], [42]. Accordingly, we introduce a sparse Bayesian method exploiting a Laplace prior, namely, SBLaplace, for EEG classification in BCI applications. In a hierarchical Bayes manner, a Laplace prior is used to learn a sparse discriminant vector with the Bayesian rule. All required model parameters are automatically estimated under the evidence framework without the need for additional validation data. Unlike SBL, the proposed SBLaplace algorithm provides a unimodal posterior with log-concavity that can capture the sparsity structure more accurately, rendering a more effective discriminant model for EEG classification. With two EEG data sets, extensive comparisons are carried out between the SBLaplace


algorithm and several other competing methods. The experimental results demonstrate that the SBLaplace achieves better overall performance for EEG classification than the competing methods, especially in small sample size scenarios. This suggests that the SBLaplace is advantageous for improving the practicability of BCIs.

II. SPARSE BAYESIAN CLASSIFICATION OF EEG

A. Maximum Likelihood and Regularized Least Squares

Let $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N \in \mathbb{R}^D$ be a set of EEG samples (feature vectors), where each sample is the concatenation of $P$ temporal points from $M$ channels, i.e., $D = MP$. The corresponding class labels are $y_1, y_2, \ldots, y_N \in \{1, -1\}$. Consider the following linear model:

$$\mathbf{y} = \mathbf{X}\mathbf{w} + \boldsymbol{\epsilon} \quad (1)$$

where $\mathbf{y}$ is the target vector containing the class labels, $\mathbf{X} = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N]^T \in \mathbb{R}^{N \times D}$, and $\boldsymbol{\epsilon}$ denotes the zero-mean Gaussian noise with variance $\sigma^2$. The likelihood function for the weight vector $\mathbf{w}$ can be written as

$$p(\mathbf{y}|\mathbf{w}, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{\frac{N}{2}} \exp\left(-\frac{1}{2\sigma^2}\|\mathbf{y} - \mathbf{X}\mathbf{w}\|_2^2\right). \quad (2)$$

Taking the logarithm of the likelihood function yields

$$\ln p(\mathbf{y}|\mathbf{w}, \sigma^2) = -\frac{N}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\|\mathbf{y} - \mathbf{X}\mathbf{w}\|_2^2. \quad (3)$$

The maximum likelihood estimation for $\mathbf{w}$ is therefore equivalent to minimizing the squared error $\|\mathbf{y} - \mathbf{X}\mathbf{w}\|_2^2$, which gives the solution

$$\tilde{\mathbf{w}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y} \quad (4)$$

which is the same as the least squares estimate [24], [43]. The least squares (maximum likelihood) estimate is likely to result in overfitting when the number of samples is small relative to the dimensionality of the feature space. Recently, regularization has become a very popular approach to prevent overfitting. The regularized least squares problem is formulated as

$$\tilde{\mathbf{w}} = \arg\min_{\mathbf{w}} \frac{1}{2}\|\mathbf{y} - \mathbf{X}\mathbf{w}\|_2^2 + \lambda \sum_{d=1}^{D} |w_d|^q \quad (5)$$

where $\lambda$ is a regularization parameter; $q = 2$ corresponds to $L_2$-norm regularization [44], which gives a smooth solution, while $q = 1$ corresponds to $L_1$-norm regularization, i.e., the least absolute shrinkage and selection operator [45], which results in a sparse solution. The optimal selection of the regularization parameter $\lambda$ plays a key role in regularization techniques and is typically implemented by CV. Although CV provides an accurate estimate of $\lambda$ when enough training samples are available for constructing training and validation sets, it usually requires expensive computations and is hardly applicable to small sample size scenarios [12], [46]. In the context of BCIs, these limitations substantially reduce the system's practicability and may cause a user to be reluctant to use a BCI.
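To make (4) and (5) concrete, the following NumPy sketch computes the least squares solution and the closed-form q = 2 (ridge) solution; the q = 1 case has no closed form and requires an iterative solver. The value of lam below is an arbitrary illustration of exactly the parameter that CV would otherwise have to select.

    import numpy as np

    rng = np.random.default_rng(0)
    N, D = 50, 200                       # fewer samples than features, as in ERP classification
    X = rng.standard_normal((N, D))      # stand-in for an EEG feature matrix
    y = rng.choice([-1.0, 1.0], size=N)  # binary class labels

    # Least squares / maximum likelihood estimate (4); X^T X is singular when
    # N < D, so the pseudoinverse is used in place of a plain inverse.
    w_ls = np.linalg.pinv(X.T @ X) @ X.T @ y

    # Regularized least squares (5) with q = 2: setting the gradient of
    # 0.5*||y - Xw||^2 + lam*||w||^2 to zero gives w = (X^T X + 2*lam*I)^{-1} X^T y.
    lam = 1.0
    w_ridge = np.linalg.solve(X.T @ X + 2 * lam * np.eye(D), X.T @ y)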


B. BLDA

Bayesian inference provides an effective approach for automatic and relatively quick estimation of the regularization parameter using only the training set [23]. This allows all available samples to be used for training, without the need of a validation set for CV [24]. With Bayesian linear regression, Hoffmann et al. [16] introduced BLDA to EEG classification for P300-based BCIs. Instead of the regularization term in (5), a zero-mean Gaussian prior in a probabilistic model is defined to express the penalty

$$p(\mathbf{w}|\alpha) = \prod_{d=1}^{D} \mathcal{N}(w_d|0, \alpha^{-1}) = \left(\frac{\alpha}{2\pi}\right)^{\frac{D}{2}} \exp\left(-\frac{\alpha}{2}\|\mathbf{w}\|_2^2\right) \quad (6)$$

where $\alpha$ is a shared inverse-variance hyperparameter similar to the regularization parameter $\lambda$ in (5). Given the likelihood function in (2), the posterior can be computed according to the Bayesian rule

$$p(\mathbf{w}|\alpha, \sigma^2, \mathbf{y}) = \frac{p(\mathbf{y}|\mathbf{w}, \sigma^2)\, p(\mathbf{w}|\alpha)}{p(\mathbf{y}|\alpha, \sigma^2)}. \quad (7)$$

As a result of combining a Gaussian prior and a Gaussian likelihood function, the posterior is also Gaussian, with covariance and mean

$$\boldsymbol{\Sigma} = (\sigma^{-2}\mathbf{X}^T\mathbf{X} + \alpha\mathbf{I})^{-1} \quad (8)$$

$$\boldsymbol{\mu} = \sigma^{-2}\boldsymbol{\Sigma}\mathbf{X}^T\mathbf{y}. \quad (9)$$

For a new test sample $\hat{\mathbf{x}}$, the predictive distribution can be computed by

$$p(\hat{y}|\alpha, \sigma^2, \hat{\mathbf{x}}, \mathbf{y}) = \int p(\hat{y}|\mathbf{w}, \sigma^2, \hat{\mathbf{x}})\, p(\mathbf{w}|\alpha, \sigma^2, \mathbf{y})\, d\mathbf{w} \quad (10)$$

which is again Gaussian, with mean and variance

$$\hat{\mu} = \boldsymbol{\mu}^T\hat{\mathbf{x}} \quad (11)$$

$$\hat{\sigma}^2 = \sigma^2 + \hat{\mathbf{x}}^T\boldsymbol{\Sigma}\hat{\mathbf{x}}. \quad (12)$$

A larger predictive mean more strongly represents the characteristics of ERPs as defined by the training set. Both the posterior in (7) and the predictive distribution in (10) depend on the hyperparameters $\alpha$ and $\sigma^2$. Using the evidence procedure in [23], the two hyperparameters can be automatically and iteratively estimated as follows:

$$\alpha \leftarrow \frac{\gamma}{\boldsymbol{\mu}^T\boldsymbol{\mu}} \quad (13)$$

$$\sigma^2 \leftarrow \frac{\|\mathbf{y} - \mathbf{X}\boldsymbol{\mu}\|_2^2}{N - \gamma} \quad (14)$$

where $\gamma = \sum_{d=1}^{D} \eta_d/(\alpha + \eta_d)$, and $\eta_d$ is the $d$th eigenvalue of the matrix $\mathbf{X}^T\mathbf{X}/\sigma^2$. Recently, it was confirmed that BLDA outperforms several other alternative algorithms, including FDA, SWLDA, and SVM, for P300 classification [27].
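The evidence updates (8), (9), (13), and (14) amount to a few lines of linear algebra. Below is a minimal NumPy sketch of this BLDA fitting loop; the initial values of alpha and sigma2 and the convergence test are illustrative assumptions, not prescribed by the method.

    import numpy as np

    def blda_fit(X, y, max_iter=100, tol=1e-5):
        """Evidence-based estimation of alpha and sigma^2 for BLDA (sketch)."""
        N, D = X.shape
        alpha, sigma2 = 1.0, 0.1 * np.var(y)      # heuristic initialization
        XtX, Xty = X.T @ X, X.T @ y
        eig = np.linalg.eigvalsh(XtX)             # eigenvalues of X^T X
        for _ in range(max_iter):
            Sigma = np.linalg.inv(XtX / sigma2 + alpha * np.eye(D))   # (8)
            mu = Sigma @ Xty / sigma2                                 # (9)
            gamma = np.sum((eig / sigma2) / (alpha + eig / sigma2))
            alpha_new = gamma / (mu @ mu)                             # (13)
            resid = y - X @ mu
            sigma2_new = resid @ resid / (N - gamma)                  # (14)
            converged = abs(alpha_new - alpha) < tol * alpha
            alpha, sigma2 = alpha_new, sigma2_new
            if converged:
                break
        return mu, Sigma, alpha, sigma2

    # The predictive mean (11) for a test sample x_hat is then simply mu @ x_hat.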

C. SBL

The standard Gaussian prior in (6) used by BLDA is similar in effect to the $L_2$-norm regularization in (5) with $q = 2$, and does not result in a sparse solution. However, a sparse projection vector has been suggested to be promising for automatic feature reduction, and hence for improving the classification of EEG [12], [31]. SBL, also known as the relevance vector machine [30], provides an elegant way to obtain a sparse solution under the probabilistic framework by defining the following separate Gaussian prior:

$$p(\mathbf{w}|\boldsymbol{\alpha}) = \prod_{d=1}^{D} \mathcal{N}\left(w_d|0, \alpha_d^{-1}\right) = \prod_{d=1}^{D} \left(\frac{\alpha_d}{2\pi}\right)^{\frac{1}{2}} \exp\left(-\frac{1}{2}\alpha_d w_d^2\right). \quad (15)$$

Instead of a shared hyperparameter $\alpha$ as in BLDA, (15) adopts $D$ hyperparameters $\boldsymbol{\alpha} = [\alpha_1, \ldots, \alpha_D]$ to separately control the inverse variances of the weights $w_1, \ldots, w_D$, respectively. Following the Bayesian rule with the likelihood function given in (2), we can estimate the mean of the posterior $p(\mathbf{w}|\boldsymbol{\alpha}, \sigma^2, \mathbf{y})$ using (9) and the covariance by

$$\boldsymbol{\Sigma} = (\sigma^{-2}\mathbf{X}^T\mathbf{X} + \boldsymbol{\Lambda})^{-1} \quad (16)$$

where $\boldsymbol{\Lambda} = \mathrm{diag}([\alpha_1, \ldots, \alpha_D])$. To estimate the hyperparameters $\boldsymbol{\alpha}$ and $\sigma^2$, the marginal likelihood $p(\mathbf{y}|\boldsymbol{\alpha}, \sigma^2)$ [23] is maximized to obtain the following iterative estimation formulas:

$$\alpha_d \leftarrow \frac{\gamma_d}{\mu_d^2} \quad (17)$$

$$\sigma^2 \leftarrow \frac{\|\mathbf{y} - \mathbf{X}\boldsymbol{\mu}\|_2^2}{N - \sum_{d=1}^{D}\gamma_d} \quad (18)$$

where $\mu_d$ is the $d$th entry of the posterior mean computed by (9), and $\gamma_d = 1 - \alpha_d\Sigma_{dd}$, where $\Sigma_{dd}$ is the $d$th diagonal entry of the posterior covariance computed in (16) with the current $\boldsymbol{\alpha}$ and $\sigma^2$ values. With the estimated posterior mean and covariance, the predictive distribution of a new test sample can be characterized by (11) and (12). In recent years, SBL has been successfully applied to EEG compressed sensing and feature selection [31], [47].
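The SBL loop has the same shape as the BLDA sketch above, with the scalar alpha replaced by a vector; weights whose alpha_d grows very large are effectively pruned, which is where the sparsity comes from. A minimal sketch, with an assumed cap on alpha_d for numerical stability:

    import numpy as np

    def sbl_fit(X, y, max_iter=200, alpha_max=1e6):
        """Relevance-vector-style updates (9), (16)-(18) (sketch)."""
        N, D = X.shape
        alpha = np.ones(D)                    # one inverse variance per weight
        sigma2 = 0.1 * np.var(y)
        XtX, Xty = X.T @ X, X.T @ y
        for _ in range(max_iter):
            Sigma = np.linalg.inv(XtX / sigma2 + np.diag(alpha))    # (16)
            mu = Sigma @ Xty / sigma2                               # (9)
            gamma = 1.0 - alpha * np.diag(Sigma)
            alpha = np.minimum(gamma / (mu**2 + 1e-12), alpha_max)  # (17)
            resid = y - X @ mu
            sigma2 = resid @ resid / (N - gamma.sum())              # (18)
        return mu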

D. SBLaplace

Compared with the separate Gaussian prior in (15), the Laplace prior is considered to be more natural for expressing the $L_1$-norm regularization in (5) with $q = 1$, and it has been confirmed to capture the sparsity structure more accurately [38], [41]. In fact, the Laplace prior has shown its strength in compressive sensing under the probabilistic framework [39]. Accordingly, we introduce a sparse Bayesian classification method exploiting Laplace priors, namely, SBLaplace, for ERP-based BCIs. First, we define the following Laplace prior:

$$p(\mathbf{w}|\tau) = \prod_{d=1}^{D} \frac{\tau}{2}\exp(-\tau|w_d|) = \left(\frac{\tau}{2}\right)^{D} \exp(-\tau\|\mathbf{w}\|_1) \quad (19)$$

where $\tau$ is a hyperparameter similar to the regularization parameter $\lambda$ in (5). However, (19) is intractable for Bayesian analysis, since it is not conjugate to the likelihood function in (2). To implement Bayesian inference, we define a separate Gaussian prior $p(\mathbf{w}|\boldsymbol{\delta}) = \prod_{d=1}^{D}\mathcal{N}(w_d|0, \delta_d)$ and the hyperprior

$$p(\boldsymbol{\delta}|\theta) = \prod_{d=1}^{D} \Gamma\left(\delta_d \,\middle|\, 1, \frac{\theta}{2}\right) = \prod_{d=1}^{D} \frac{\theta}{2}\exp\left(-\frac{\theta\delta_d}{2}\right) \quad (20)$$

where $\Gamma(a|b, c) = a^{b-1}\exp(-ac)/(c^{-b}\Gamma(b))$ denotes the Gamma distribution, and $\theta$ is a hyperparameter. The Laplace prior can then be remodeled in a hierarchical Bayes manner [38]:

$$p(\mathbf{w}|\theta) = \int p(\mathbf{w}|\boldsymbol{\delta})\, p(\boldsymbol{\delta}|\theta)\, d\boldsymbol{\delta} = \left(\frac{\sqrt{\theta}}{2}\right)^{D} \exp\left(-\sqrt{\theta}\sum_{d=1}^{D}|w_d|\right) \quad (21)$$

where $\theta = \tau^2$. Thus, with the likelihood function $p(\mathbf{y}|\mathbf{w}, \sigma^2)$, the prior $p(\mathbf{w}|\boldsymbol{\delta})$, and the hyperprior $p(\boldsymbol{\delta}|\theta)$, we can characterize the posterior $p(\mathbf{w}|\boldsymbol{\delta}, \sigma^2, \theta, \mathbf{y})$ using the covariance in (16) with $\alpha_d = \delta_d^{-1}$ and the mean given by (9). Following the evidence procedure [23], we estimate the hyperparameters by maximizing the marginal likelihood:

$$p(\mathbf{y}|\boldsymbol{\delta}, \sigma^2, \theta) = \int p(\mathbf{y}|\mathbf{w}, \sigma^2)\, p(\mathbf{w}|\boldsymbol{\delta})\, p(\boldsymbol{\delta}|\theta)\, p(\theta)\, p(\sigma^2)\, d\mathbf{w} = \left(\frac{1}{2\pi}\right)^{\frac{N}{2}} |\mathbf{C}|^{-\frac{1}{2}} \exp\left(-\frac{1}{2}\mathbf{y}^T\mathbf{C}^{-1}\mathbf{y}\right) p(\boldsymbol{\delta}|\theta)\, p(\theta)\, p(\sigma^2) \quad (22)$$

where $\mathbf{C} = \sigma^2\mathbf{I} + \mathbf{X}\boldsymbol{\Lambda}^{-1}\mathbf{X}^T$, $p(\theta) = \theta^{-1}$, and $p(\sigma^2) = \sigma^2$. For convenience, we maximize the logarithm of the marginal likelihood $p(\mathbf{y}|\boldsymbol{\delta}, \sigma^2, \theta)$ by considering the following cost function:

$$\mathcal{L} = -\frac{1}{2}\log|\mathbf{C}| - \frac{1}{2}\mathbf{y}^T\mathbf{C}^{-1}\mathbf{y} + (D-1)\log\theta - \frac{\theta}{2}\sum_{d=1}^{D}\delta_d - \log\sigma^{-2} \quad (23)$$

where the constant terms independent of the hyperparameters $\boldsymbol{\delta}$, $\sigma^2$, and $\theta$ are ignored. Exploiting the determinant identity [48] yields

$$\log|\mathbf{C}| = -\log|\boldsymbol{\Lambda}| - N\log\sigma^{-2} - \log|\boldsymbol{\Sigma}|. \quad (24)$$

The Woodbury inversion identity [49] implies

$$\mathbf{C}^{-1} = \sigma^{-2}\mathbf{I} - \sigma^{-2}\mathbf{X}(\boldsymbol{\Lambda} + \sigma^{-2}\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\sigma^{-2} \quad (25)$$

and hence

$$\mathbf{y}^T\mathbf{C}^{-1}\mathbf{y} = \sigma^{-2}\mathbf{y}^T\mathbf{y} - \sigma^{-2}\mathbf{y}^T\mathbf{X}\boldsymbol{\Sigma}\mathbf{X}^T\sigma^{-2}\mathbf{y} = \sigma^{-2}\mathbf{y}^T(\mathbf{y} - \mathbf{X}\boldsymbol{\mu}) = \sigma^{-2}\|\mathbf{y} - \mathbf{X}\boldsymbol{\mu}\|_2^2 + \boldsymbol{\mu}^T\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu} - \sigma^{-2}\boldsymbol{\mu}^T\mathbf{X}^T\mathbf{X}\boldsymbol{\mu} = \sigma^{-2}\|\mathbf{y} - \mathbf{X}\boldsymbol{\mu}\|_2^2 + \boldsymbol{\mu}^T\boldsymbol{\Lambda}\boldsymbol{\mu}. \quad (26)$$

With the equivalences in (24) and (26), we set the derivatives of $\mathcal{L}$ with respect to $\boldsymbol{\delta}$, $\theta$, and $\sigma^{-2}$ to zero and obtain the following iterative estimation formulas for the hyperparameters:

$$\frac{\partial\mathcal{L}}{\partial\delta_d} = -\frac{1}{2\delta_d} - \frac{\theta}{2} + \frac{\Sigma_{dd} + \mu_d^2}{2\delta_d^2} = 0 \;\Rightarrow\; \left(\delta_d + \frac{1}{2\theta}\right)^2 = \frac{\Sigma_{dd} + \mu_d^2}{\theta} + \frac{1}{4\theta^2} \;\Rightarrow\; \delta_d \leftarrow \sqrt{\frac{\Sigma_{dd} + \mu_d^2}{\theta} + \frac{1}{4\theta^2}} - \frac{1}{2\theta} \quad (27)$$

$$\frac{\partial\mathcal{L}}{\partial\theta} = \frac{D-1}{\theta} - \sum_{d=1}^{D}\delta_d/2 = 0 \;\Rightarrow\; \theta \leftarrow \frac{D-1}{\sum_{d=1}^{D}\delta_d/2} \quad (28)$$

$$\frac{\partial\mathcal{L}}{\partial\sigma^{-2}} = \frac{1}{2}\left[\left(N - \sum_{d=1}^{D}\gamma_d - 2\right)\sigma^2 - \|\mathbf{y} - \mathbf{X}\boldsymbol{\mu}\|_2^2\right] = 0 \;\Rightarrow\; \sigma^2 \leftarrow \frac{\|\mathbf{y} - \mathbf{X}\boldsymbol{\mu}\|_2^2}{N - \sum_{d=1}^{D}\gamma_d - 2} \quad (29)$$

where $\gamma_d = 1 - \Sigma_{dd}/\delta_d$. After the hyperparameter optimization, the posterior covariance and mean can be computed by (16) and (9), respectively. The predictive mean and variance of a new test sample are then estimated by (11) and (12). Fig. 1 shows the probabilistic graphical model of the SBLaplace method. The whole procedure of SBLaplace for EEG classification is summarized in Algorithm 1.

Fig. 1. Probabilistic graphical model of the SBLaplace method.
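The marginalization in (21) is the standard Gaussian scale-mixture identity; as a brief check of that step for a single weight, using the definitions in (20):

$$\int_0^\infty \mathcal{N}(w_d|0, \delta_d)\,\Gamma\left(\delta_d \,\middle|\, 1, \frac{\theta}{2}\right) d\delta_d = \frac{\theta}{2}\int_0^\infty \frac{1}{\sqrt{2\pi\delta_d}}\exp\left(-\frac{w_d^2}{2\delta_d} - \frac{\theta\delta_d}{2}\right) d\delta_d = \frac{\sqrt{\theta}}{2}\exp\left(-\sqrt{\theta}\,|w_d|\right)$$

that is, a zero-mean Gaussian whose variance is exponentially distributed is marginally Laplace; taking the product over $d$ recovers (21) with $\tau = \sqrt{\theta}$.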


Algorithm 1 SBLaplace Algorithm for EEG Classification

Input: EEG training set X = [x1, ..., xN]^T ∈ R^{N×D}, label set y = [y1, ..., yN]^T ∈ {1, −1}^N, and test EEG sample x̂ ∈ R^D.
Output: The predictive mean μ̂ as a classification score.

Initialize the hyperparameters δ, θ, and σ².
repeat
    Update the posterior covariance Σ by (16);
    Update the posterior mean μ by (9);
    for d = 1 to D do
        Update the hyperparameter δd by (27);
    end
    Update the hyperparameter θ by (28);
    Update the hyperparameter σ² by (29);
until convergence;
Compute the predictive mean μ̂ by (11).
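A direct NumPy transcription of Algorithm 1 follows. The initialization (δ = 1, θ = 1, σ² from the label variance), the stopping rule, and the small numerical floors are illustrative assumptions; the update equations themselves are (9), (16), and (27)–(29).

    import numpy as np

    def sblaplace_fit(X, y, max_iter=200, tol=1e-4):
        """Sparse Bayesian classifier with a hierarchical Laplace prior (sketch)."""
        N, D = X.shape
        delta = np.ones(D)            # per-weight prior variances
        theta = 1.0                   # Laplace hyperparameter (theta = tau^2)
        sigma2 = 0.1 * np.var(y)      # noise variance, heuristic initialization
        XtX, Xty = X.T @ X, X.T @ y
        mu = np.zeros(D)
        for _ in range(max_iter):
            # Posterior covariance (16), with alpha_d = 1/delta_d, and mean (9)
            Sigma = np.linalg.inv(XtX / sigma2 + np.diag(1.0 / delta))
            mu_new = Sigma @ Xty / sigma2
            # Hyperparameter updates (27)-(29)
            s = np.diag(Sigma) + mu_new**2
            delta = np.sqrt(s / theta + 1.0 / (4 * theta**2)) - 1.0 / (2 * theta)
            delta = np.maximum(delta, 1e-12)          # numerical floor
            theta = (D - 1) / (delta.sum() / 2.0)
            gamma = 1.0 - np.diag(Sigma) / delta
            resid = y - X @ mu_new
            sigma2 = resid @ resid / max(N - gamma.sum() - 2.0, 1e-10)
            if np.linalg.norm(mu_new - mu) < tol:     # assumed stopping rule
                mu = mu_new
                break
            mu = mu_new
        return mu

    def sblaplace_score(mu, X_test):
        """Predictive mean (11), used directly as the classification score."""
        return X_test @ mu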

Fig. 2. Experimental paradigms for EEG recordings of (a) Data Set A and (b) Data Set B.

III. EXPERIMENTAL STUDY

A. EEG Recordings

Two EEG data sets recorded from our own experiments were used to validate the performance of the proposed SBLaplace method for ERP classification.

1) Data Set A: Data Set A was recorded from eight healthy subjects (A1–A8, from 22 to 27 years of age). The subjects were seated in a comfortable chair 60 cm from a laptop with a 15.6-in screen. Fig. 2(a) shows the experimental layout for the Data Set A recordings. Six circles were used as the stimuli to elicit ERPs, which corresponded to six different commands. Each subject completed 15 runs. In each run, a randomly cued circle was first intensified for 1 s, followed by a 1-s black screen period. The six circles were then intensified in a block-randomized sequence 16 times. The intensification duration and interstimulus interval (ISI) were 200 and 100 ms, respectively. During the experiment, the subjects were asked to focus their attention on the cued circles and silently count the number of times they were intensified. EEG signals were recorded using the g.USBamp amplifier (g.tec, Austria) at a 256-Hz sampling rate with a 64-channel cap and were bandpass filtered between 0.5 and 30 Hz. The average of the two mastoid electrodes was used as a reference, and the electrode Fpz as a ground. The following 16 electrodes were used for analysis: F3, Fz, F4, C3, Cz, C4, P7, P3, Pz, P4, P8, PO7, PO8, O1, Oz, and O2.


A 1000-ms data segment was extracted from each circle stimulus intensification and downsampled with a decimation factor of eight. A sample (feature vector) was then formed by the concatenation of 32 temporal points from each of the 16 channels. A total of 1440 samples (240 targets and 1200 nontargets) with a feature dimensionality of 512 were obtained from each subject. Every 96 samples (16 targets and 80 nontargets) corresponded to one command selection (15 command selections in total).

2) Data Set B: Data Set B was recorded from seven healthy subjects (S1–S7, from 24 to 49 years of age). The subjects were seated in a comfortable chair 60 cm from a 17-in LCD monitor. Fig. 2(b) shows the experimental layout, which consisted of eight arrow commands to simulate a navigation control. Randomly specified objects (car, ship, bicycle, or house) were used as the visual stimuli to elicit ERPs. Each subject completed 16 experimental runs. In each run, a randomly cued target arrow was first presented in the middle of the screen for 1 s, followed by a black screen period of 1 s. A block-randomized sequence of object presentations was subsequently presented, and the presentation block was repeated five times. In each block, an object was presented once on each arrow position, with a duration of 100 ms and an ISI of 80 ms. During the experiment, the subjects were asked to focus their attention on the cued arrow positions and silently count the number of times the object stimuli appeared. EEG signals were recorded using the g.USBamp amplifier (g.tec, Austria) at a 256-Hz sampling rate with high-pass and low-pass filters of 0.1 and 30 Hz, respectively. The following 16 electrodes (arranged in accordance with the international 10–20 system) were used for signal recording and analysis: F3, Fz, F4, T7, C3, Cz, C4, T8, P7, P3, Pz, P4, P8, PO7, PO8, and Oz, referenced to the average of the two mastoids and grounded to the electrode Fpz. A 700-ms data segment was extracted from each object stimulus presentation and downsampled by a 12-point moving average. A sample (feature vector) was then constructed by the concatenation of 15 temporal points from each of the 16 channels; hence, the feature dimensionality was 240. A total of 640 samples consisting of 80 targets and 560 nontargets were derived for each subject, and every 40 samples (5 targets and 35 nontargets) corresponded to one command selection (16 command selections in total).

B. Experimental Objectives

The objective of both experiments was to realize ERP-based BCIs. For each experiment, the task was to detect the target command (the intensified circle or the presented object for Data Set A and Data Set B, respectively) gazed at by the subject, by classifying the evoked ERPs from the EEG. Thus, a classifier model was learned using a specific algorithm with N EEG training samples. The classifier was subsequently applied to compute the classification scores of K test trials for each command. The classification scores were averaged over the K trials, and the command corresponding to the maximum of the averaged classification scores was then detected as the target command.
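As a concrete illustration of the preprocessing and the detection rule for the Data Set A settings (256-Hz sampling, 1000-ms epochs, decimation by eight, 16 channels), the sketch below uses hypothetical placeholder arrays (eeg for the continuous band-pass-filtered recording and stim_onsets for the stimulus sample indices) and plain subsampling as a simple reading of "decimation", given that the signals are already low-pass filtered at 30 Hz.

    import numpy as np

    FS = 256                  # sampling rate (Hz)
    EPOCH_LEN = FS            # 1000-ms segment = 256 samples
    DECIM = 8                 # keep every 8th point -> 32 points per channel

    def extract_feature(eeg, onset):
        """eeg: (16, n_samples) filtered EEG; onset: stimulus sample index.
        Returns one feature vector of length 16 * 32 = 512."""
        segment = eeg[:, onset:onset + EPOCH_LEN]   # (16, 256) epoch
        return segment[:, ::DECIM].reshape(-1)      # decimate, concatenate channels

    def detect_command(scores):
        """scores: (n_commands, K) classification scores for K repeated trials.
        Averages over trials and returns the index of the detected command."""
        return int(np.argmax(scores.mean(axis=1)))

    # e.g., X = np.vstack([extract_feature(eeg, t) for t in stim_onsets])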


Fig. 3. Classification accuracies obtained by the SVM, SWLDA, BLDA, SBL, and SBLaplace algorithms, averaged for the subjects of Data Set A, for different numbers of training data. Six training runs: N = 576. Eight training runs: N = 768. Ten training runs: N = 960. K denotes the number of trials averaged. The accuracies shown in the left figure (six training runs) correspond to the average results of Fig. 4.

The classification accuracy was computed as the ratio of the number of correctly detected commands to the total number of command selections. Theoretically, higher classification accuracy can be achieved by using more training data for classifier calibration or a larger number of trials averaged for command detection. However, larger values of N and K require a longer time for recording EEG and for deciding a command, respectively, which consequently reduces the practicability of the BCI system. Hence, an effective classification algorithm for ERP-based BCIs should be able to obtain high classification accuracy with the fewest possible training data and the smallest number of averaged trials.

C. Performance Evaluation

In this paper, we compared the performances of BLDA, SBL, and SBLaplace for ERP classification. For the three methods, a larger predictive mean more strongly represented the characteristics of ERPs as defined by the training set. Thus, the mean of the predictive distribution was used as the classification score for direct classification. For each command selection, the classification scores were averaged over the trials, and the command with the maximum average score was determined to be the target command. The class probability can also be computed with the predictive distribution for classification [16]. As two benchmarks, both SVM with a linear kernel and SWLDA [13], [14] were included in the comparison. The regularization parameter C in SVM was determined by leave-one-run-out CV on the training data. The parameter settings of SWLDA were the same as those in [14] (i.e., add feature: p-value < 0.1; remove feature: p-value > 0.15; maximal number of features: 60). We investigated the classification accuracy of each method using different numbers of training data with different numbers of trials averaged. For Data Set A, six, eight, and ten command selections were randomly chosen from the 15 command selections for classifier training. The remaining command selections were used to evaluate the classification performance. For Data Set B, four, six, and eight command selections were used for classifier training, while the remaining half were used

for testing. Both evaluation procedures were repeated 100 times, and the average classification accuracies were calculated for the two data sets.

D. Results

1) Classification Results for Data Set A: Fig. 3 shows the classification accuracies obtained by SVM, SWLDA, BLDA, SBL, and SBLaplace, averaged over the subjects of Data Set A (subjects A1–A8), for different numbers of training data. For all of six, eight, and ten training runs, the proposed SBLaplace method yielded higher average accuracy than the other four methods. As the number of training data decreased, SBLaplace achieved a more significant superiority over the other methods. Fig. 4 further shows the classification accuracies of all eight subjects when using six training runs for classifier training. The paired t-test was used to investigate the statistical significance of the accuracy differences between SBLaplace and the other methods. SBLaplace yielded significantly higher accuracy than SVM (for K = 1, . . . , 16, p < 0.05), SWLDA (for K = 1, . . . , 12, p < 0.05), BLDA (for K = 1, 2, 3, p < 0.05), and SBL (for K = 2, 3, 6, . . . , 10, p < 0.05).

2) Classification Results for Data Set B: Fig. 5 shows the classification accuracies obtained by the five methods, averaged over the subjects of Data Set B (subjects S1–S7), for different numbers of training data. As the number of training data decreased, the SBLaplace method yielded a more obvious superiority over the other four methods. While SBLaplace and SBL obtained similar performance for eight training runs, the former outperformed the latter when four training runs were used. Fig. 6 further presents the classification accuracies of all seven subjects when using four training runs for classifier training. The paired t-test revealed that the SBLaplace method yielded significantly higher accuracy than SWLDA and BLDA (for K = 1, . . . , 5, p < 0.05), SVM (for K = 1, . . . , 4, p < 0.05), and SBL (for K = 1, 2, 3, p < 0.05).

In summary, the proposed SBLaplace method achieved better overall performance than the other four methods for ERP classification. The superiority of SBLaplace became


Fig. 4. Classification accuracies for Data Set A derived by the SVM, SWLDA, BLDA, SBL, and SBLaplace algorithms, using 1 to 16 trials averaged for target command detection, with the data from six runs (N = 576) for classifier training. K denotes the number of trials averaged.

Fig. 5. Classification accuracies obtained by the SVM, SWLDA, BLDA, SBL, and SBLaplace algorithms, averaged for the subjects of Data Set B, for different numbers of training data. Four training runs: N = 160. Six training runs: N = 240. Eight training runs: N = 320. K denotes the number of trials averaged. The accuracies shown in the left figure (four training runs) correspond to the average results of Fig. 6.

more significant as the number of training data used for classifier calibration decreased and the number of trials averaged for command detection decreased. This indicates that the SBLaplace method is a powerful tool for improving the practicability of ERP-based BCIs.

IV. DISCUSSION

Regularization techniques have recently been applied to prevent overfitting in EEG classification for BCI applications [6], [12], [17]. However, the efficiency of regularization-based methods is often highly dependent on the selection of regularization parameters, which are typically determined by a CV procedure. CV is usually time-consuming and difficult to implement efficiently in small sample size scenarios, since it requires additional data to validate the selection of regularization parameters [46]. Instead of regularization terms, BLDA, SBL, and the proposed SBLaplace method control the model complexity


Fig. 6. Classification accuracies for Data Set B obtained by the SVM, SWLDA, BLDA, SBL, and SBLaplace algorithms, using one to five trials averaged for target command detection, with the data from four runs (N = 160) for classifier training. K denotes the number of trials averaged.

automatically to prevent overfitting by exploiting different priors under the Bayesian framework. More specifically, BLDA implements an automatic L2-norm regularization by adopting a standard Gaussian prior, while SBL and SBLaplace learn sparse solutions using a separate Gaussian prior and a Laplace prior, respectively. Fig. 7 shows the differences among the three types of priors. The standard Gaussian prior is a spherical distribution resulting in a posterior with low mass in the areas close to the axes, whereas the separate Gaussian prior and the Laplace prior shift the posterior probability mass toward the axes, which encourages sparse solutions for automatic feature selection and dimensionality reduction. The posterior resulting from the separate Gaussian prior is shrunk toward the axes more heavily and hence enforces sparsity more strongly. However, it presents a bimodal characteristic that gives two different interpretations for a single observation and makes accurate inference more difficult with increasing feature dimensionality and a decreasing number of samples [50]. In contrast, the Laplace prior leads to a unimodal posterior with log-concavity that provides an effective approach to avoid local minima [39], [41]. Thus, the Laplace prior more accurately captures the sparsity structure when compared with the separate Gaussian prior [38], [41]. Since a specific type of EEG response usually presents spatially localized and time- or phase-locked characteristics, sparse regularization has been suggested for EEG analysis [51]–[55]. Fig. 8 shows the scalp topographies of discriminative information (r² values [6]) at different temporal points, evaluated from subject S3 of Data Set B.
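The r² measure referenced above is commonly defined in the ERP literature as the squared point-biserial correlation between each feature and the binary target/nontarget label; a minimal sketch under that assumption (the exact variant used in [6] may differ in normalization):

    import numpy as np

    def r_squared(X_target, X_nontarget):
        """Per-feature discriminability: squared point-biserial correlation
        between feature values and the binary class membership (sketch).
        X_target: (n1, D) target samples; X_nontarget: (n2, D) nontargets."""
        n1, n2 = len(X_target), len(X_nontarget)
        m1, m2 = X_target.mean(axis=0), X_nontarget.mean(axis=0)
        s = np.concatenate([X_target, X_nontarget]).std(axis=0)
        r = (m1 - m2) / s * np.sqrt(n1 * n2) / (n1 + n2)
        return r**2   # reshape to (channels, time points) for a topography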

Fig. 7. Contour plots of the Gaussian prior, Student-t prior, Laplace prior, and the corresponding posterior distributions in two dimensions. The Student-t prior is derived by integrating out the hyperparameter α of the separate Gaussian prior. Uniform hyperpriors are specified on α, obtained from p(αd) = Γ(αd|0, 0). Compared with the standard Gaussian prior, the separate Gaussian prior and Laplace prior shift the posterior probability mass toward the axes, which encourages sparse solutions. Although the separate Gaussian prior more heavily shrinks the posterior toward the axes and enforces sparsity, it presents a bimodal characteristic that gives two different interpretations for a single observation and makes accurate inference more difficult as the feature dimensionality increases. The Laplace prior leads to a unimodal posterior with log-concavity that provides an effective approach for avoiding local minima.

Fig. 8 indicates that these discriminative features mainly result from the ERP components across 200–600 ms, and they are indeed relatively sparse in the spatial-temporal distribution.


Fig. 8. Scalp topographies of the discriminative information (r² values) at different temporal points, obtained from subject S3 of Data Set B.

Fig. 9. Discriminant vectors derived by the SWLDA, BLDA, SBL, and SBLaplace algorithms for subject S3 of Data Set B, which were presented in the form of channels × temporal points.

Fig. 9 shows the learned discriminant vectors (presented in the form of channels × temporal points) for SWLDA, BLDA, SBL, and SBLaplace. Compared with BLDA, SBL and SBLaplace learned sparser discriminative vectors. Through sparse learning, both SBL and SBLaplace captured the most discriminative features more accurately than BLDA. Although SWLDA also derived sparse discriminative vectors, it failed to capture the significant features located at P3 at the fifth and seventh temporal points and instead put a large weight on the insignificant features located at C3 at the fourth temporal point. These results are evidence of the superiority of both SBL and SBLaplace over BLDA and SWLDA. Although SBL learned a sparser discriminative vector than SBLaplace, it overlooked some important features located at PO7 at the seventh temporal point, at P4 and PO8 at the ninth temporal point, and at Oz at the 12th temporal point. These features were efficiently captured by SBLaplace. The experimental results demonstrate that the Laplace prior is more effective for sparse feature extraction of EEG than the separate Gaussian prior. Furthermore, we compared the computational times of SWLDA, SVM, BLDA, SBL, and SBLaplace (see Table I). The algorithms were implemented under MATLAB R2012a

TABLE I. Computational time for classifier training required by the SWLDA, SVM, BLDA, SBL, and SBLaplace algorithms for one subject from Data Set A and Data Set B. SVM uses a leave-one-run-out CV for regularization parameter selection.

on a laptop with a 2.5-GHz CPU (10-GB RAM). SWLDA, BLDA, SBL, and SBLaplace performed the classifier training at comparable speeds. However, the SVM took a significantly longer time to train the classifier due to CV for regularization parameter selection. This indicates that Bayesian treatment can greatly increase the speed of estimating the model parameters. These results also show that the SBLaplace efficiently prevents overfitting. Note that the Bayesian-based methods have been successfully introduced in the automatic extraction of


EEG sources [32]–[35]. However, the optimal model parameters are difficult to determine when the number of variables is much larger than the number of measurements. A combination of the sparse Bayesian method with other more advanced techniques [56]–[60] is potentially helpful for determining the model parameters in underdetermined scenarios. We plan to study this conjecture in the future. The proposed SBLaplace method may also be directly applied to other types of BCIs, such as the BCI based on MI [61]. Common spatial pattern (CSP) has been widely applied to MI feature extraction [62]. However, the effectiveness of CSP largely depends on the selection of filter bands. A well-chosen filter band may greatly improve the classification accuracy, while a poor selection may result in low effectiveness of the CSP [63]–[65]. Accordingly, we plan to exploit SBLaplace to learn a sparse discriminant vector for subband (or subfeature) optimization to further enhance the MI classification performance in the future.

V. CONCLUSION

This paper introduced a sparse Bayesian method based on a Laplace prior, called SBLaplace, to classify EEG for BCI applications. The SBLaplace algorithm learns a sparse discriminant vector through hierarchical Bayes modeling of a Laplace prior. The sparsity degree is automatically and quickly estimated from training data under a Bayesian evidence framework. Extensive experimental comparisons were performed between the SBLaplace algorithm and several other competing methods on two EEG data sets. The results demonstrate that the SBLaplace algorithm yields better overall classification performance for ERP-based BCIs, especially in small sample size scenarios. This indicates that the proposed method is promising for improving the practicability of BCI systems.

REFERENCES

[1] J. R. Wolpaw, N. Birbaumer, D. J. McFarland, G. Pfurtscheller, and T. M. Vaughan, “Brain–computer interfaces for communication and control,” Clin. Neurophysiol., vol. 113, no. 6, pp. 767–791, 2002.
[2] J. Jin, E. W. Sellers, S. Zhou, Y. Zhang, X. Wang, and A. Cichocki, “A P300 brain–computer interface based on a modification of the mismatch negativity paradigm,” Int. J. Neural Syst., vol. 25, no. 3, pp. 1550011-1–1550011-12, 2015.
[3] M. Arvaneh, C. Guan, K. K. Ang, and C. Quek, “Optimizing spatial filters by minimizing within-class dissimilarities in electroencephalogram-based brain–computer interface,” IEEE Trans. Neural Netw. Learn. Syst., vol. 24, no. 4, pp. 610–619, Apr. 2013.
[4] J. del R. Millán et al., “A local neural classifier for the recognition of EEG patterns associated to mental tasks,” IEEE Trans. Neural Netw., vol. 13, no. 3, pp. 678–686, May 2002.
[5] Y. Zhang, G. Zhou, J. Jin, X. Wang, and A. Cichocki, “Frequency recognition in SSVEP-based BCI using multiset canonical correlation analysis,” Int. J. Neural Syst., vol. 24, no. 4, pp. 1450013-1–1450013-14, 2014.
[6] B. Blankertz, S. Lemm, M. Treder, S. Haufe, and K.-R. Müller, “Single-trial analysis and classification of ERP components—A tutorial,” NeuroImage, vol. 56, no. 2, pp. 814–825, 2011.
[7] E. W. Sellers and E. Donchin, “A P300-based brain–computer interface: Initial tests by ALS patients,” Clin. Neurophysiol., vol. 117, no. 3, pp. 538–548, 2006.
[8] J. Jin, B. Z. Allison, Y. Zhang, X. Wang, and A. Cichocki, “An ERP-based BCI using an oddball paradigm with different faces and reduced errors in critical functions,” Int. J. Neural Syst., vol. 24, no. 8, pp. 1450027-1–1450027-14, 2014.

[9] Y. Zhang, Q. Zhao, J. Jin, X. Wang, and A. Cichocki, “A novel BCI based on ERP components sensitive to configural processing of human faces,” J. Neural Eng., vol. 9, no. 2, p. 026018, 2012. [10] B. Hong, F. Guo, T. Liu, X. Gao, and S. Gao, “N200-speller using motion-onset visual response,” Clin. Neurophysiol., vol. 120, no. 9, pp. 1658–1666, 2009. [11] J. Jin, I. Daly, Y. Zhang, X. Wang, and A. Cichocki, “An optimized ERP brain–computer interface based on facial expression changes,” J. Neural Eng., vol. 11, no. 3, p. 036004, 2014. [12] Y. Zhang, G. Zhou, J. Jin, Q. Zhao, X. Wang, and A. Cichocki, “Aggregation of sparse linear discriminant analyses for event-related potential classification in brain–computer interface,” Int. J. Neural Syst., vol. 24, no. 1, pp. 1450003-1–1450003-15, 2014. [13] F. Lotte, M. Congedo, A. Lécuyer, F. Lamarche, and B. Arnaldi, “A review of classification algorithms for EEG-based brain–computer interfaces,” J. Neural Eng., vol. 4, no. 2, pp. R1–R13, 2007. [14] D. J. Krusienski et al., “A comparison of classification techniques for the P300 speller,” J. Neural Eng., vol. 3, no. 4, pp. 299–305, 2006. [15] Y. Zhang, G. Zhou, Q. Zhao, J. Jin, X. Wang, and A. Cichocki, “Spatial-temporal discriminant analysis for ERP-based brain–computer interface,” IEEE Trans. Neural Syst. Rehabil. Eng., vol. 21, no. 2, pp. 233–243, Mar. 2013. [16] U. Hoffmann, J.-M. Vesin, T. Ebrahimi, and K. Diserens, “An efficient P300-based brain–computer interface for disabled subjects,” J. Neurosci. Methods, vol. 167, no. 1, pp. 115–125, 2008. [17] B. Blankertz, G. Curio, and K. R. Müller, “Classifying single trial EEG: Towards brain computer interfacing,” in Advances in Neural Information Processing Systems, vol. 14, T. G. Dietterich, S. Becker, and Z. Ghahramani, Eds. Cambridge, MA, USA: MIT Press, 2002, pp. 157–164. [18] K.-R. Müller, M. Krauledat, D. Dornhege, G. Curio, and B. Blankertz, “Machine learning techniques for brain–computer interfaces,” Biomedizinische Technik, vol. 49, no. 1, pp. 11–22, 2004. [19] M. Kaper, P. Meinicke, U. Grossekathoefer, T. Lingner, and H. Ritter, “BCI competition 2003—Data set IIb: Support vector machines for the P300 speller paradigm,” IEEE Trans. Biomed. Eng., vol. 51, no. 6, pp. 1073–1076, Jun. 2004. [20] Y. Li, C. Guan, H. Li, and Z. Chin, “A self-training semi-supervised SVM algorithm and its application in an EEG-based brain computer interface speller system,” Pattern Recognit. Lett., vol. 29, no. 9, pp. 1285–1294, 2008. [21] Y. Li and C. Guan, “Joint feature re-extraction and classification using an iterative semi-supervised support vector machine algorithm,” Mach. Learn., vol. 71, no. 1, pp. 33–53, 2008. [22] G. Townsend et al., “A novel P300-based brain–computer interface stimulus presentation paradigm: Moving beyond rows and columns,” Clin. Neurophysiol., vol. 121, no. 7, pp. 1109–1120, 2010. [23] D. J. C. MacKay, “Bayesian interpolation,” Neural Comput., vol. 4, no. 3, pp. 415–447, 1992. [24] C. Bishop, Pattern Recognition and Machine Learning. New York, NY, USA: Springer-Verlag, 2006. [25] Q. Zhao, L. Zhang, and A. Cichocki, “Bayesian CP factorization of incomplete tensors with automatic rank determination,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 37, no. 9, pp. 1751–1763, Sep. 2015. [26] Q. Zhao, G. Zhou, L. Zhang, A. Cichocki, and S.-I. Amari, “Bayesian robust tensor factorization for incomplete multiway data,” IEEE Trans. Neural Netw. Learn. Syst., to be published. [27] N. V. Manyakov, N. Chumerin, A. Combaz, and M. M. 
Van Hulle, “Comparison of classification methods for P300 brain–computer interface on disabled subjects,” Comput. Intell. Neurosci., vol. 2011, Jul. 2011, Art. ID 519868. [28] X. Lei, P. Yang, and D. Yao, “An empirical Bayesian framework for brain–computer interfaces,” IEEE Trans. Neural Syst. Rehabil. Eng., vol. 17, no. 6, pp. 521–529, Dec. 2009. [29] P. Xu, P. Yang, X. Lei, and D. Yao, “An enhanced probabilistic LDA for multi-class brain computer interface,” PLoS One, vol. 6, no. 1, p. e14634, 2011. [30] M. E. Tipping, “Sparse Bayesian learning and the relevance vector machine,” J. Mach. Learn. Res., vol. 1, pp. 211–244, Sep. 2001. [31] U. Hoffmann, A. Yazdani, J.-M. Vesin, and T. Ebrahimi, “Bayesian feature selection applied in a P300 brain–computer interface,” in Proc. 16th Eur. Signal Process. Conf., Lausanne, Switzerland, Aug. 2008, pp. 1–5. [32] W. Wu, Z. Chen, S. Gao, and E. N. Brown, “A hierarchical Bayesian approach for learning sparse spatio-temporal decompositions of multichannel EEG,” NeuroImage, vol. 56, no. 4, pp. 1929–1945, 2011.

This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination. ZHANG et al.: SPARSE BAYESIAN CLASSIFICATION OF EEG FOR BCI

[33] W. Wu, C. Wu, S. Gao, B. Liu, Y. Li, and X. Gao, “Bayesian estimation of ERP components from multicondition and multichannel EEG,” NeuroImage, vol. 88, pp. 319–339, Mar. 2014. [34] W. Wu, Z. Chen, X. Gao, Y. Li, E. N. Brown, and S. Gao, “Probabilistic common spatial patterns for multichannel EEG analysis,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 37, no. 3, pp. 639–653, Mar. 2015. [35] H. Zhang, H. Yang, and C. Guan, “Bayesian learning for spatial filtering in an EEG-based brain–computer interface,” IEEE Trans. Neural Netw. Learn. Syst., vol. 24, no. 7, pp. 1049–1060, Jul. 2013. [36] H. Kang and S. Choi, “Bayesian common spatial patterns for multisubject EEG classification,” Neural Netw., vol. 57, pp. 39–50, Sep. 2014. [37] S. Sanei, Adaptive Processing of Brain Signals. Chichester, U.K.: Wiley, 2013. [38] M. A. T. Figueiredo, “Adaptive sparseness for supervised learning,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 9, pp. 1150–1159, Sep. 2013. [39] S. D. Babacan, R. Molina, and A. K. Katsaggelos, “Bayesian compressive sensing using Laplace priors,” IEEE Trans. Image Process., vol. 19, no. 1, pp. 53–63, Jan. 2010. [40] D. Wipf, J. Palmer, B. Rao, and K. Kreutz-Delgado, “Performance evaluation of latent variable models with sparse priors,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Honolulu, HI, USA, Apr. 2007, pp. II-453–II-456. [41] M. W. Seeger and H. Nickisch, “Compressed sensing and Bayesian experimental design,” in Proc. 25th Int. Conf. Mach. Learn. (ICML), Helsinki, Finland, Jul. 2008, pp. 912–919. [42] S. Ji, Y. Xue, and L. Carin, “Bayesian compressive sensing,” IEEE Trans. Signal Process., vol. 56, no. 6, pp. 2346–2356, Jun. 2008. [43] M. E. Tipping, “Bayesian inference: An introduction to principles and practice in machine learning,” in Advanced Lectures on Machine Learning (Lecture Notes in Computer Science), vol. 3176, O. Bousquet, U. von Luxburg, and G. Rätsch, Eds. Tübingen, Germany: Springer-Verlag, 2004, pp. 41–62. [44] A. Tikhonov and V. Y. Arsenin, Solutions of Ill-Posed Problems. Washington, DC, USA: V. H. Winston, 1977. [45] R. Tibshirani, “Regression shrinkage and selection via the lasso,” J. Roy. Statist. Soc., B (Methodological), vol. 58, no. 1, pp. 267–288, 1996. [46] H. Lu, H.-W. Eng, C. Guan, K. N. Plataniotis, and A. N. Venetsanopoulos, “Regularized common spatial pattern with aggregation for EEG classification in small-sample setting,” IEEE Trans. Biomed. Eng., vol. 57, no. 12, pp. 2936–2946, Dec. 2010. [47] Z. Zhang and B. D. Rao, “Sparse signal recovery with temporally correlated source vectors using sparse Bayesian learning,” IEEE J. Sel. Topics Signal Process., vol. 5, no. 5, pp. 912–926, Sep. 2011. [48] K. V. Mardia, J. T. Kent, and J. M. Bibby, Multivariate Analysis. New York, NY, USA: Academic, 1979. [49] G. H. Golub and C. F. Van Loan, Matrix Computations. Baltimore, MD, USA: The Johns Hopkins Univ. Press, 2012. [50] F. Steinke, M. Seeger, and K. Tsuda, “Experimental design for efficient identification of gene regulatory networks using sparse Bayesian models,” BMC Syst. Biol., vol. 1, no. 1, p. 51, 2007. [51] M. Arvaneh, C. Guan, K. K. Ang, and C. Quek, “Optimizing the channel selection and classification accuracy in EEG-based BCI,” IEEE Trans. Biomed. Eng., vol. 58, no. 6, pp. 1865–1873, Jun. 2011. [52] Y. Zhang, G. Zhou, J. Jin, M. Wang, X. Wang, and A. Cichocki, “L1-regularized multiway canonical correlation analysis for SSVEP-based BCI,” IEEE Trans. Neural Syst. Rehabil. Eng., vol. 21, no. 6, pp. 
887–896, Nov. 2013. [53] Y. Li, A. Cichocki, and S.-I. Amari, “Blind estimation of channel parameters and source components for EEG signals: A sparse factorization approach,” IEEE Trans. Neural Netw., vol. 17, no. 2, pp. 419–431, Mar. 2006. [54] Y. Zhang, J. Jin, X. Qing, B. Wang, and X. Wang, “LASSO based stimulus frequency recognition model for SSVEP BCIs,” Biomed. Signal Process. Control, vol. 7, no. 2, pp. 104–111, 2012. [55] Y. Zhang, G. Zhou, J. Jin, X. Wang, and A. Cichocki, “Optimizing spatial patterns with sparse filter bands for motor-imagery based brain–computer interface,” J. Neurosci. Methods, no. 255, pp. 85–91, Nov. 2015. [56] S. Xie, L. Yang, J.-M. Yang, G. Zhou, and Y. Xiang, “Time-frequency approach to underdetermined blind source separation,” IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 2, pp. 306–316, Feb. 2012. [57] Z. Yang, Y. Xiang, Y. Rong, and S. Xie, “Projection-pursuit-based method for blind separation of nonnegative sources,” IEEE Trans. Neural Netw. Learn. Syst., vol. 24, no. 1, pp. 47–57, Jan. 2013.


[58] G. Zhou, S. Xie, Z. Yang, J.-M. Yang, and Z. He, “Minimum-volumeconstrained nonnegative matrix factorization: Enhanced ability of learning parts,” IEEE Trans. Neural Netw., vol. 22, no. 10, pp. 1626–1637, Oct. 2011. [59] G. Zhou, A. Cichocki, and S. Xie, “Accelerated canonical polyadic decomposition using mode reduction,” IEEE Trans. Neural Netw. Learn. Syst., vol. 24, no. 12, pp. 2051–2062, Dec. 2013. [60] Y. Zhang, G. Zhou, Q. Zhao, A. Cichocki, and X. Wang, “Fast nonnegative tensor factorization based on accelerated proximal gradient and low-rank approximation,” Neurocomputing, to be published. [61] G. Pfurtscheller, C. Brunner, A. Schlögl, and F. H. Lopes da Silva, “Mu rhythm (de)synchronization and EEG single-trial classification of different motor imagery tasks,” NeuroImage, vol. 31, no. 1, pp. 153–159, 2006. [62] B. Blankertz, R. Tomioka, S. Lemm, M. Kawanabe, and K.-R. Müller, “Optimizing spatial filters for robust EEG single-trial analysis,” IEEE Signal Process. Mag., vol. 25, no. 1, pp. 41–56, Jan. 2008. [63] K. K. Ang, Z. Y. Chin, H. Zhang, and C. Guan, “Filter bank common spatial pattern (FBCSP) in brain–computer interface,” in Proc. IEEE Int. Joint Conf. Neural Netw. (IJCNN), Hong Kong, Jun. 2008, pp. 2390–2397. [64] K. P. Thomas, C. Guan, C. T. Lau, A. P. Vinod, and K. K. Ang, “A new discriminative common spatial pattern method for motor imagery brain–computer interfaces,” IEEE Trans. Biomed. Eng., vol. 56, no. 11, pp. 2730–2733, Nov. 2009. [65] F. Qi, Y. Li, and W. Wu, “RSTFC: A novel algorithm for spatio-temporal filtering and classification of single-trial EEG,” IEEE Trans. Neural Netw. Learn. Syst., to be published.

Yu Zhang received the Ph.D. degree in control science and engineering from the School of Information Science and Engineering, East China University of Science and Technology, Shanghai, China, in 2013. He was an International Program Associate with the Laboratory for Advanced Brain Signal Processing, RIKEN Brain Science Institute, Wako, Japan, from 2010 to 2012. He is currently an Assistant Professor with the School of Information Science and Engineering, East China University of Science and Technology. His current research interests include brain– computer interface, signal processing, tensor analysis, machine learning, and pattern recognition.

Guoxu Zhou received the Ph.D. degree in intelligent signal and information processing from the South China University of Technology, Guangzhou, China, in 2010. He is currently a Research Scientist with the Laboratory for Advanced Brain Signal Processing, RIKEN Brain Science Institute, Saitama, Japan, and a Full Professor with the School of Automation, Guangdong University of Technology, Guangzhou. His current research interests include statistical signal processing, tensor analysis, intelligent information processing, and machine learning.

Jing Jin received the Ph.D. degree in control theory and control engineering from the East China University of Science and Technology, Shanghai, China, in 2010. His Ph.D. advisors were Prof. G. Pfurtscheller with the Graz University of Technology, Graz, Austria, from 2008 to 2010, and Prof. X. Wang with the East China University of Science and Technology from 2006 to 2008, where he is currently a Professor. His current research interests include brain–computer interface, signal processing, and pattern recognition.


Qibin Zhao received the Ph.D. degree from the Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China, in 2009. He is currently a Research Scientist with the Laboratory for Advanced Brain Signal Processing, RIKEN Brain Science Institute, Saitama, Japan, and also a Visiting Professor with the Saitama Institute of Technology, Fukaya, Japan. He has authored over 50 papers in international journals and conferences. His current research interests include machine learning, tensor factorization, computer vision, and brain–computer interface.

Xingyu Wang was born in Sichuan, China, in 1944. He received the B.S. degree in mathematics from Fudan University, Shanghai, China, in 1967, the M.S. degree in control theory from East China Normal University, Shanghai, in 1982, and the Ph.D. degree in industrial automation from the East China University of Science and Technology, Shanghai, in 1984. He is currently a Professor with the School of Information Science and Engineering, East China University of Science and Technology. His current research interests include control theory, control techniques, their application to biomedical systems, and brain control.


Andrzej Cichocki (M’96–SM’07–F’13) received the M.Sc. (Hons.), Ph.D., and D.Sc. (Habilitation) degrees from the Warsaw University of Technology, Warsaw, Poland, all in electrical engineering. He spent several years with the University of Erlangen–Nuremberg, Erlangen, Germany, as an Alexander von Humboldt Research Fellow and a Guest Professor. From 1995 to 1997, he was a Team Leader of the Laboratory for Artificial Brain Systems with the Frontier Research Program, RIKEN Brain Science Institute, Saitama, Japan. He is currently a Senior Team Leader and the Head of the Laboratory for Advanced Brain Signal Processing with the RIKEN Brain Science Institute. He has authored over 300 technical journal papers and four monographs in English (two of them translated into Chinese). His publications currently report over 25 000 citations, according to Google Scholar. His current research interests include tensor decompositions, multiway blind source separation, brain–machine interface, and EEG hyper-scanning. Prof. Cichocki has given keynote and tutorial talks at international conferences on computational intelligence and signal processing and has served as a member of program and technical committees of conferences such as the European Signal Processing Conference, the International Joint Conference on Neural Networks, the International Communication Association Conference, the International Society of Nutrigenetics/Nutrigenomics, the International Conference on Neural Information Processing, and the International Conference on Artificial Intelligence and Soft Computing. He has served as an Associate Editor of the IEEE TRANSACTIONS ON NEURAL NETWORKS, the IEEE TRANSACTIONS ON SIGNAL PROCESSING, the IEEE TRANSACTIONS ON CYBERNETICS, and the Journal of Neuroscience Methods, and as the Founding Editor-in-Chief of the journal Computational Intelligence and Neuroscience.
