IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, VOL. 57, NO. 10, OCTOBER 2010


Variational Bayes for Spatiotemporal Identification of Event-Related Potential Subcomponents

Hamid Reza Mohseni*, Student Member, IEEE, Foad Ghaderi, Student Member, IEEE, Edward L. Wilding, and Saeid Sanei, Senior Member, IEEE

Abstract—We propose a novel method for detection and tracking of event-related potential (ERP) subcomponents. The ERP subcomponent sources are assumed to be electric current dipoles (ECDs), and their locations and parameters (amplitude, latency, and width) are estimated and tracked from trial to trial. Variational Bayes implies that the parameters can be estimated separately using the likelihood function of each parameter. Estimations of ECD locations, which have nonlinear relations to the measurement, are obtained by particle filtering. Estimations of the amplitude and noise covariance matrix of the measurement are optimally given by the maximum likelihood (ML) approach, while estimations of the latency and the width are obtained by the Newton–Raphson technique. New recursive methods are introduced for both the ML and Newton–Raphson approaches to prevent divergence in the filtering procedure where there is a very low SNR. The main advantage of the method is the ability to track varying ECD locations. The proposed method is assessed using simulated as well as real data, and the results emphasize the potential of this new approach for the analysis of real-time measures of neural activity.

Index Terms—Event-related potentials (ERPs), maximum likelihood (ML) estimation, Newton–Raphson technique, particle filtering (PF), variational Bayes.

I. INTRODUCTION

EVENT-RELATED potentials (ERPs) are one of a number of physiological measures of brain activity with excellent temporal resolution [1]. Conventional methods for analyzing ERPs involve time-locked averaging over many trials. This approach assumes that the ERP signal of interest is constant over trials, while the background EEG is a random process that will consequently be attenuated by averaging. While this procedure is widely employed in the psychological community, there is evidence that ERP subcomponents vary over time in their amplitude, latency, and scalp distributions due to factors, including
Manuscript received July 24, 2009; revised December 18, 2009 and March 31, 2010; accepted April 12, 2010. Date of publication May 24, 2010; date of current version September 15, 2010. This work was supported by the Schools of Engineering and Psychology, Cardiff University, Cardiff, Wales, U.K. Asterisk indicates corresponding author. ∗ H. R. Mohseni is with the Schools of Engineering and Psychology, Cardiff University, Cardiff, CF24 3AA, U.K. (e-mail: [email protected]). F. Ghaderi and S. Sanei are with the School of Engineering, Cardiff University, Cardiff, CF24 3AA, U.K. (e-mail: [email protected]; [email protected]). E. L. Wilding is with the Cardiff University Brain Research Imaging Centre, School of Psychology, Cardiff University, Cardiff, CF10 3AT, U.K. (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TBME.2010.2050318

fatigue, habituation, and levels of attention [2]. Our principal interest in this study is in developing a spatiotemporal method to reliably localize and detect the variability of ERP subcomponent parameters during the course of a recording session.

Proposed approaches for detection of ERP parameters (primarily amplitude and latency) can generally be categorized into single- and multichannel-based methods. There are numerous studies reporting approaches for ERP detection and tracking using only one channel. These include Wiener filtering [3], Kalman filtering [4], [5], matching pursuit [6], wavelet transform [7], [8], maximum likelihood (ML) [9], [10], and maximum a posteriori solutions [11]. These approaches focus solely on characterizing temporal trial-to-trial signal variability and ignore spatial information. Approaches to multichannel estimations of ERP parameters include principal [12] and independent [13], [14] component analysis methods. The performances of these methods are somewhat limited by specific signal source properties such as independence of sources [15].

Recent methods have also been proposed to exploit spatiotemporal information by modeling ERP sources using electric current dipoles (ECDs). In [16] and [17], the ECDs are assumed to have fixed locations and orientations in a spherical head model, whereas their amplitudes are allowed to vary in time according to either a parametric [16] or nonparametric model [17]. In [18], only the ECD location is fixed, and the orientation and amplitude are allowed to vary in time according to a parametric model. A related method accommodates the spatially correlated noise between sensors with unknown covariances [19]. The main drawback of these spatiotemporal methods is that the ECD locations are assumed to be fixed during the course of a recording session.
In addition to the earlier approaches, variational Bayes has been used effectively for spatiotemporal modeling of EEG and magnetoencephalography sensors for distributed and for dipole source models. The distributed source models assume that the potentials are generated by a large number of distributed ECDs, while the dipole source models assume that the potentials are generated by a constrained number of ECDs. Examples of the methods for distributed source models using variational Bayes can be found in [20]–[23]. Fewer methods have been developed using variational Bayes for dipole source models. One approach is the use of variational Bayes for fast computation of ECD parameters [24]. The main shortcoming for this approach is that only specific priors (normally Gamma distributions) are used for the model parameters. This constraint adversely influences the performance of the method in practical applications. A spatiotemporal variational Bayes using a dipole source model has




also been formulated in [25], [26]. In these methods, the Markov chain Monte Carlo technique was used to sample the unknown high-dimensional parameters from the marginalized posterior distribution. One potential problem for this approach is that the solutions are likely to become trapped in local maxima. Another approach is to use nonstationary Bayesian filtering for dipole source localization [27]. For other relevant studies, see the reviews in [28], the comparison studies in [29], and the references therein. These methods focus only on source localization, however, and temporal information within the data (e.g., shapes and trial-to-trial variability of the ERPs) is not incorporated.

In this study, we propose a novel method for tracking the location, amplitude, latency, and width of ERP subcomponents. In this approach, the locations of ECDs can vary from trial to trial in a realistic head model. ERPs are assumed to be the superposition of a small number of ECDs, and their temporal bases are modeled by Gaussian waves. The amplitudes, means, and variances of the Gaussian waves can be interpreted as the amplitudes, latencies, and widths of ERP subcomponents, respectively. Variational Bayes shows that, when the prior distribution is unknown, maximizing the likelihood of each parameter (via separate estimation of each) is equivalent to minimizing the Kullback–Leibler distance between the estimated and the true posterior distributions. The locations are estimated using particle filtering (PF). Many studies have shown that PF is one of the best methods when the relation between the desired parameters (states) and the measurement is nonlinear [30]. A closed-form solution for the amplitude is also given by the ML approach. The solutions for the latency and width are given recursively by the Newton–Raphson technique, which has rapid convergence.
One challenge for this approach is that very low SNR in some trials can result in divergence of the filtering, which impacts negatively on the estimation of amplitudes, latencies, and widths. To compensate for this failure, recursive methods are introduced to improve the stability of the filtering from trial to trial.

The rest of the paper is organized as follows. In Section II-A, the spatiotemporal ERP modeling is presented. Section II-B describes variational Bayes for estimation of the ERP model parameters, together with the PF, ML, and Newton–Raphson methods used to estimate the ECD locations and parameters. Several simulations for estimation of the ERP subcomponents are provided in Section III-A, and the effectiveness of the method in an oddball paradigm is demonstrated in Section III-B. Finally, a detailed discussion of the proposed method is presented.

II. METHODS

A. Problem Formulation

Let the measured ERP data Y_k ∈ R^{L×M} be a matrix composed of the potentials acquired from L electrodes and M time samples at the kth trial. Also, suppose that the ERP is generated from q ECDs whose 3-D locations are specified by {ρ_{k,i}; i = 1, ..., q}. The potential at the scalp Y_k is assumed to be the superposition of the potentials from q ECDs. Based on these assumptions, we may write

    Y_k = \sum_{i=1}^{q} H(\rho_{k,i})\, a_{k,i}\, \psi_{k,i} + N_k    (1)

where H ∈ R^{L×3} is the forward matrix and is a nonlinear function of the ECD location. H can be calculated in a spherical head model with three layers: skull, scalp, and skin (the medium within each layer is assumed to be homogeneous), or can be obtained using a realistic head model. In the realistic head model, after dividing the brain into sufficiently small grid cells, a precalculated forward matrix H is given for each grid cell. N_k represents the additive Gaussian zero-mean noise with unknown positive definite spatial covariance Q_k and known temporal covariance matrix I (the identity matrix). Hence, the covariance matrix of the noise can be written as I ⊗ Q_k, where ⊗ represents the Kronecker product. Noise is assumed to be independent from the source activities and distributed identically across time, but not necessarily across sensors. These assumptions provide a fast and simple estimation of the noise covariance matrix from trial to trial.

In (1), a_{k,i} ∈ R^{3×1} is the amplitude of the ECD moment in the x, y, and z directions, and ψ_{k,i} = [ψ_{k,i}(1) ... ψ_{k,i}(M)] ∈ R^{1×M} represents the temporal basis of the ith ECD moment. Each ψ_{k,i}(t) is given by a Gaussian wave as

    \psi_{k,i}(t) = \frac{1}{\sigma_{k,i}\sqrt{2\pi}} \exp\left( -\frac{(t - \mu_{k,i})^2}{2\sigma_{k,i}^2} \right)    (2)

Note that the ECD amplitudes are different in the x, y, and z directions, but have the same temporal bases in all three directions (i.e., the same σ_{k,i} and µ_{k,i} in each case). For simplicity, and without loss of generality, we ignore the normalizing factor 1/(σ_{k,i}√(2π)) and assume that it has been embedded in the amplitude vector a_{k,i}. Modeling the temporal bases of ERP subcomponents using parametric functions has been exploited in many studies (see, e.g., [16], [19]), and in these studies Gaussian waveform modeling is the most common approach [31].
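As a concrete illustration of the generative model in (1) and (2), the following sketch builds one simulated trial. It is not the authors' implementation: the forward matrices here are arbitrary random placeholders rather than head-model solutions, and all names are illustrative.

```python
import numpy as np

def gaussian_basis(t, mu, sigma):
    """Temporal basis psi_{k,i}(t) of (2); the 1/(sigma*sqrt(2*pi))
    factor is absorbed into the amplitude vector, as in the text."""
    return np.exp(-(t - mu) ** 2 / (2.0 * sigma ** 2))

def simulate_trial(H_list, a_list, mus, sigmas, M, noise_cov, rng):
    """Generate one trial Y_k (L x M) from model (1).
    H_list: list of L x 3 forward matrices (one per ECD, placeholders here).
    a_list: list of 3-vectors of dipole moments."""
    L = H_list[0].shape[0]
    t = np.arange(1, M + 1)
    Y = np.zeros((L, M))
    for H, a, mu, sigma in zip(H_list, a_list, mus, sigmas):
        psi = gaussian_basis(t, mu, sigma)       # 1 x M temporal basis
        Y += (H @ a)[:, None] * psi[None, :]     # rank-one spatiotemporal term
    # additive noise with spatial covariance Q_k, temporally white (I (x) Q_k)
    N = rng.multivariate_normal(np.zeros(L), noise_cov, size=M).T
    return Y + N
```

Each dipole contributes a rank-one space-time term (fixed scalp pattern H(ρ)a times a Gaussian time course), which is what makes the closed-form amplitude and covariance estimates later in the paper tractable.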
Although real ERP subcomponents do not have the exact shape of Gaussian waveforms, this modeling approach allows a robust and fast estimation of the principal parameters (latency and amplitude) with which neurophysiologists and cognitive scientists are primarily concerned.

Our primary aim is to recursively estimate the model parameters θ_{k,i} = {ρ_{k,i}, a_{k,i}, Q_k, µ_{k,i}, σ_{k,i}} based on their previous estimations θ̂_{k−1,i} and the available measurements Y_k. Therefore, the evolution of the model parameters θ_{k,i} is assumed to be a Markovian process and does not vary extensively across trials. This assumption has been exploited in many ERP analysis approaches (e.g., [2], [4], [5], [32]). It can be explicitly justified by the observation [33] that consecutive responses to repeated stimuli vary slowly since brain states change gradually over time, although responses throughout the experiment can differ significantly. This assumption, however, may limit the deployment of the method in some applications where there is extensive electrophysiological variability from trial to trial. In this case (as will be shown in the simulation results section), the method may at least reveal trends for changes in parameters during the course


of a recording session. For instance, we can observe whether ERP parameters increase or decrease with time on task.

B. Parameter Estimation by Variational Bayes

Instead of estimating θ, we may estimate the posterior distribution p(θ|Y), which fully describes our knowledge regarding the model parameters θ.¹ The posterior distribution is the central quantity of interest in Bayesian estimation and is typically expanded using Bayes' rule as

    p(\theta | Y) \propto p(Y | \theta)\, p(\theta)    (3)

where the dependence upon the model is implicitly assumed. p(Y|θ) is calculated from the model, and p(θ) incorporates prior knowledge of the parameter values and their variability. In nonlinear models, however, the posterior distribution is often difficult to estimate analytically using (3). In this case, it might be approximated with a simpler form r(θ), which can be determined using the variational method. A criterion for the fitness of r(θ) to the true posterior distribution p(θ|Y) is given by the free energy, defined as

    F = \int r(\theta) \log \left( \frac{p(Y|\theta)\, p(\theta)}{r(\theta)} \right) d\theta    (4)

Inferring the posterior distribution p(θ|Y) depends on correct estimation of r(θ), which is achieved by maximizing the free energy over r(θ). By such maximization, the best approximation to the true posterior distribution is found. This also establishes the tightest lower bound on the true marginal likelihood. Moreover, maximization of F is equivalent to minimizing the Kullback–Leibler distance between r(θ) and the true posterior distribution [34]. In variational Bayes, it is also assumed that

    r(\theta) = \prod_i r(\theta_i)    (5)

where the parameters in θ have been factorized into different groups θ_i, each with its own approximate posterior distribution r(θ_i). This is the key restriction in the variational Bayes method. The groupings are typically made logically according to their appearance in the model. For example, in our model, we place the locations in one group and the latencies and widths in another group. Using the calculus of variations, r(θ_i) is obtained by maximization of (4) as [34]

    \log r(\theta_i) \propto \int r(\theta_{/i}) \log \left[ p(Y|\theta)\, p(\theta) \right] d\theta_{/i}    (6)

where θ_{/i} refers to all the parameters of θ except those of the ith group. If the prior is assumed to be uniform, (6) implies that r(θ_i) is the likelihood function

    r(\theta_i) \propto p(Y | \theta_i)    (7)
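The factorization in (5) and the KL interpretation of (4) can be illustrated on a toy example that is not from the paper: for a zero-mean correlated Gaussian "posterior", the mean-field factorized approximation that minimizes the Kullback–Leibler distance is known in closed form, with the variance of each factor equal to the reciprocal of the corresponding diagonal entry of the precision matrix.

```python
import numpy as np

def kl_q_p(d_diag, Sigma):
    """KL( N(0, diag(d)) || N(0, Sigma) ) between zero-mean Gaussians."""
    Lam = np.linalg.inv(Sigma)
    k = Sigma.shape[0]
    M = Lam @ np.diag(d_diag)
    return 0.5 * (np.trace(M) - k - np.log(np.linalg.det(M)))

Sigma = np.array([[1.0, 0.6],
                  [0.6, 1.0]])        # correlated toy "posterior"
Lam = np.linalg.inv(Sigma)
d_star = 1.0 / np.diag(Lam)           # mean-field optimum: var_i = 1 / Lambda_ii
```

Because the factorized family cannot represent the correlation, the minimal KL stays strictly positive; this is the price paid for the tractability that (5) buys.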

This equation is key for the estimation of the parameters in the rest of the paper. We partition θ into ρ, a, Q, µ, and σ, and different methods can be employed for estimation of the posterior distribution of each subparameter r(θ_i) according to (7).

¹ For the sake of convenience, the variable indexes are omitted in this section.

1) Estimation of ECD locations: To estimate the dipole locations, all the ECD locations are augmented in a matrix R_k = [ρ_{k,1} ... ρ_{k,q}] ∈ R^{3×q}. The dipole locations R_k have a nonlinear relation, through the forward matrix H, to the measurements, and if a realistic head model is used, no exact closed-form solution for H exists. Therefore, nonlinear filtering is required to estimate the locations. PF is one estimator for p(R_k|Y_{1:k}), which has recently attracted interest [35]. PF is an emerging methodology that can deal with the nonlinearity of systems as well as the non-Gaussian nature of posterior distributions. In PF, the posterior distribution is approximated by discrete random measures defined by particles {R_k^{(n)}; n = 1, ..., N} and their associated weights {w_k^{(n)}; n = 1, ..., N}. The posterior distribution based on these particles and weights is approximated as

    p(R_k | Y_{1:k}) \approx \sum_{n=1}^{N} w_k^{(n)} \delta(R_k - R_k^{(n)})    (8)

where δ(·) is the Dirac delta function. Suppose in trial k, we want to approximate the posterior distribution p(R_k|Y_{1:k}) subject to having p(R_{k−1}|Y_{1:k−1}). This means that, given the discrete random measure {R_{k−1}^{(n)}, w_{k−1}^{(n)}; n = 1, ..., N} and the observation Y_k, the approximation of {w_k^{(n)}; n = 1, ..., N} is desired. Using Bayes' rule and the concept of importance sampling, the new weights are updated as follows [35]:

    w_k^{(n)} \propto w_{k-1}^{(n)} \frac{ p(Y_k | R_k^{(n)})\, p(R_k^{(n)} | R_{k-1}^{(n)}) }{ \pi(R_k^{(n)} | R_{k-1}^{(n)}, Y_{1:k}) }    (9)

where π(·) is the importance density. The choice of importance density is one of the crucial issues in designing the PF. In general, the closer the importance density to the actual posterior distribution, the better the approximation. The most popular choice of importance density is π(R_k | R_{k−1}^{(n)}, Y_{1:k}) = p(R_k | R_{k−1}^{(n)}). This implies that (9) reduces to

    w_k^{(n)} \propto w_{k-1}^{(n)}\, p(Y_k | R_k^{(n)})    (10)
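A minimal numerical sketch of the weight update (10), computed in the log domain for stability, together with a systematic variant of the cumulative-sum resampling scheme discussed later in this section. The likelihood function, particle representation, and dimensions are illustrative assumptions, not the paper's code.

```python
import numpy as np

def update_weights(w_prev, particles, Y, loglik):
    """Weight update (10): w_k ∝ w_{k-1} * p(Y_k | R_k^{(n)}),
    with loglik a user-supplied log-likelihood (e.g., Gaussian in Q_k)."""
    logw = np.log(w_prev) + np.array([loglik(Y, p) for p in particles])
    logw -= logw.max()                  # avoid underflow before exponentiation
    w = np.exp(logw)
    return w / w.sum()

def systematic_resample(w, rng):
    """Replicate high-weight particles and drop negligible ones using one
    uniform draw and deterministic strides through the cumulative sum."""
    N = len(w)
    positions = (rng.uniform() + np.arange(N)) / N
    return np.searchsorted(np.cumsum(w), positions)
```

The returned index array selects which particles survive; after resampling, all weights are typically reset to 1/N.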

By this choice of importance density, which is independent of the measurement, the state space is explored without any knowledge of the observations. Hence, this filter can be inefficient and sensitive to outliers. This choice, however, does have the advantages that the weights are easily evaluated and that the importance density can be easily sampled. In (10), p(Y_k | R_k^{(n)}) is the likelihood function and has an equivalent distribution to the noise distribution p(N_k), which has already been assumed to be zero-mean Gaussian with covariance matrix Q_k. Note that, to estimate the ECD locations, the other parameters are assumed to be known, and therefore they are omitted here.

A major problem with PF is that the discrete random measure degenerates quickly. In other words, after some iterations, all except a few particles are assigned negligible weights. An increase in the number of particles with negligible weights leads to a decrease in the number of particles contributing to the estimation of the posterior distribution, and consequently the performance of the filtering procedure degrades. This can be compensated for by eliminating particles with small weights and replicating those with large weights. This resampling procedure can be conducted dynamically or within a fixed number of iterations. A direct implementation of resampling would consist of generating N independent and identically distributed (i.i.d.) random variables from a uniform distribution and computing the normalized cumulative sum. In each step of the resampling procedure, if the cumulative sum of the normalized weights is bigger than the cumulative sum of i.i.d. random variables, a particle is produced [36].

2) Estimation of ECD amplitudes and noise covariance: Here, ML estimators for the ECD amplitudes a_{k,i} and noise covariance matrix Q_k are derived. It follows from (1) that the negative log-likelihood function of the observed data samples is

    f(\theta; Y_k) = \mathrm{tr}\left\{ \left( Y_k - \sum_{i=1}^{q} H(\rho_{k,i})\, a_{k,i}\, \psi_{k,i} \right)^{T} Q_k^{-1} \left( Y_k - \sum_{i=1}^{q} H(\rho_{k,i})\, a_{k,i}\, \psi_{k,i} \right) \right\} + M \ln |Q_k| + \mathrm{const}    (11)

T

where (·) and tr{·} denote transpose and trace operations, respectively. By equating the gradient of f (θ; Yk ) with respect to the parameter of interest ak ,i to zero, the estimation of amplitude is given by (see Appendix A) −1 T −1 ak ,i = (HT (ρk ,i )Q−1 k H(ρk ,i )) H (ρk ,i )Qk [Yk − qj =1,j = i H(ρk ,j )ak ,j ψ k ,j ]ψ Tk ,i ×  ψ k ,i 2F

(12)

where ‖·‖_F denotes the Frobenius norm. The noise covariance matrix is also estimated by minimizing the negative log-likelihood function (11) with respect to Q_k, which yields (see Appendix B)

    Q_k = \frac{1}{M} \left( Y_k - \sum_{i=1}^{q} H(\rho_{k,i})\, a_{k,i}\, \psi_{k,i} \right) \left( Y_k - \sum_{i=1}^{q} H(\rho_{k,i})\, a_{k,i}\, \psi_{k,i} \right)^{T}    (13)
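The closed forms (12) and (13) can be sketched as follows for a single dipole (a hypothetical minimal implementation for illustration; with multiple dipoles, `resid` would be Y_k minus the other dipoles' contributions, as in the bracketed term of (12)):

```python
import numpy as np

def ml_amplitude(H, Qinv, resid, psi):
    """Closed-form ML amplitude (12) for one ECD.
    H: L x 3 forward matrix; Qinv: inverse spatial noise covariance;
    resid: L x M data with other dipoles removed; psi: length-M basis."""
    G = H.T @ Qinv                        # 3 x L
    lhs = G @ H                           # 3 x 3 normal matrix
    rhs = G @ resid @ psi / (psi @ psi)   # ||psi||_F^2 = psi . psi
    return np.linalg.solve(lhs, rhs)

def ml_noise_cov(Y, model):
    """ML spatial noise covariance (13): sample covariance of the residual."""
    E = Y - model
    return E @ E.T / Y.shape[1]
```

On noiseless rank-one data the amplitude is recovered exactly and the residual covariance vanishes, which is a convenient unit check for any implementation of these estimators.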

If all other parameters are estimated correctly, the likelihood monotonically increases with each iteration; hence, convergence of the aforementioned algorithm to a local maximum is guaranteed. However, inaccurate estimation of the other parameters and/or the presence of large noise power in individual trials can influence the amplitude a_{k,i} and noise covariance matrix Q_k estimates, and the filtering may diverge. To prevent these possibilities, it is assumed that the evolution of the parameters is Markovian, and then one can write

    \tilde{a}_{k,i} = \tilde{a}_{k-1,i} + \lambda_a \left( a_{k,i} - \tilde{a}_{k-1,i} \right)
    \tilde{Q}_k = \tilde{Q}_{k-1} + \lambda_Q \left( Q_k - \tilde{Q}_{k-1} \right)    (14)

where 0 < λ_a, λ_Q ≤ 1 are constant forgetting factors, and ã_{k,i} and Q̃_k are the final estimations, which are updated by a_{k,i} and Q_k. It is noteworthy that, in the earlier equations, we implicitly assumed that p(ã_{k,i} | ã_{k−1,i}) and p(Q̃_k | Q̃_{k−1}) are zero-mean Gaussian distributions. These recursive equations prevent sudden changes of the amplitude and noise covariance because of some highly noisy individual trials and guarantee the stability of the filtering for sufficiently small values of λ_a and λ_Q.

3) Estimation of temporal basis parameters: The ECD temporal basis parameters are the mean µ_{k,i} and the variance σ_{k,i} of the Gaussian waveforms. They have nonlinear relations to the measurements. In general, the optimization problem in (11) does not appear to admit a closed-form solution for µ_{k,i} and σ_{k,i}. Iterative methods, therefore, may be employed to estimate the temporal basis parameters. The Newton–Raphson technique is a well-established method that can be used to solve this problem approximately [37]. The main benefit of the method is its fast convergence, especially if the iteration begins sufficiently near the true point. Hence, this method is suitable for coupling with the other methods described here for the simultaneous estimation of parameters. In the Newton–Raphson technique, the following formulation is used to estimate the temporal basis parameters θ_k based on θ_{k−1} at each iteration [37]:

    \theta_k = \theta_{k-1} - \lambda_\theta \left[ \frac{\partial^2 f(\theta; Y_k)}{\partial \theta_k^2} \right]^{-1} \frac{\partial f(\theta; Y_k)}{\partial \theta_k}    (15)

For the same reason given in the previous section, the forgetting factor 0 < λ_θ ≤ 1 is added to the original Newton–Raphson equation. This guarantees stability of the filtering in the case of very low SNRs. Equation (15) needs the first- and second-order gradients of the log-likelihood function f(θ; Y_k) with respect to µ_{k,i} and σ_{k,i}. By defining ξ(t) = y_k(t) − \sum_{i=1}^{q} H(ρ_{k,i}) a_{k,i} ψ_{k,i}(t) and α_{k,i} = H(ρ_{k,i}) a_{k,i}, the gradients with respect to µ_{k,i} can be calculated and simplified as (see Appendix C)

    \frac{\partial f(\theta; Y_k)}{\partial \mu_{k,i}} = \frac{2}{\sigma_{k,i}^2} \sum_{t=1}^{M} (\mu_{k,i} - t)\, \psi_{k,i}(t)\, \alpha_{k,i}^{T} Q_k^{-1} \xi(t)    (16)

    \frac{\partial^2 f(\theta; Y_k)}{\partial \mu_{k,i}^2} = \frac{2}{\sigma_{k,i}^2} \sum_{t=1}^{M} \left\{ \left[ 1 - \frac{(\mu_{k,i} - t)^2}{\sigma_{k,i}^2} \right] \psi_{k,i}(t)\, \alpha_{k,i}^{T} Q_k^{-1} \xi(t) + \frac{1}{\sigma_{k,i}^2} (\mu_{k,i} - t)^2 \psi_{k,i}(t)^2\, \alpha_{k,i}^{T} Q_k^{-1} \alpha_{k,i} \right\}    (17)

Similarly, the first- and second-order gradients of the likelihood function with respect to σ_{k,i} are as follows:

    \frac{\partial f(\theta; Y_k)}{\partial \sigma_{k,i}} = -\frac{2}{\sigma_{k,i}^3} \sum_{t=1}^{M} (\mu_{k,i} - t)^2\, \psi_{k,i}(t)\, \alpha_{k,i}^{T} Q_k^{-1} \xi(t)    (18)

    \frac{\partial^2 f(\theta; Y_k)}{\partial \sigma_{k,i}^2} = \frac{2}{\sigma_{k,i}^4} \sum_{t=1}^{M} \left\{ \left[ 3 - \frac{(\mu_{k,i} - t)^2}{\sigma_{k,i}^2} \right] (\mu_{k,i} - t)^2\, \psi_{k,i}(t)\, \alpha_{k,i}^{T} Q_k^{-1} \xi(t) + \frac{1}{\sigma_{k,i}^2} (\mu_{k,i} - t)^4\, \psi_{k,i}(t)^2\, \alpha_{k,i}^{T} Q_k^{-1} \alpha_{k,i} \right\}    (19)
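For the single-dipole case, one damped Newton–Raphson latency update built from (16) and (17) can be sketched as follows; the width update from (18) and (19) is analogous. This is an illustrative reimplementation under simplifying assumptions (one dipole, known α and Q_k), not the paper's code.

```python
import numpy as np

def psi_fun(t, mu, sigma):
    """Gaussian basis of (2), normalization absorbed into the amplitude."""
    return np.exp(-(t - mu) ** 2 / (2 * sigma ** 2))

def newton_step_mu(Y, alpha, Qinv, mu, sigma, lam=1.0):
    """One damped Newton-Raphson update of the latency mu using (16)-(17).
    Y: L x M data; alpha = H(rho) a (length L); lam: forgetting factor."""
    M = Y.shape[1]
    t = np.arange(1, M + 1)
    psi = psi_fun(t, mu, sigma)
    xi = Y - np.outer(alpha, psi)          # residual xi(t), L x M
    aQxi = alpha @ Qinv @ xi               # alpha^T Qinv xi(t), length M
    aQa = alpha @ Qinv @ alpha
    d = mu - t
    g = (2 / sigma**2) * np.sum(d * psi * aQxi)                      # (16)
    h = (2 / sigma**2) * np.sum((1 - d**2 / sigma**2) * psi * aQxi
                                + d**2 * psi**2 * aQa / sigma**2)    # (17)
    return mu - lam * g / h                # assumes h > 0 near the optimum
```

Starting within roughly one basis width of the true latency, the damped iteration contracts toward the minimizer of the negative log-likelihood; far from it, the Hessian can lose positivity, which is one reason the text recommends initialization from the ensemble average.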


It is evident from the foregoing account that any parametric function that has first- and second-order continuous derivatives can be employed instead of a Gaussian function to model the ERP waveforms. The temporal basis parameters of ERP subcomponents that are estimated by the earlier formulation may not be the optimum values related to the global minimum of the algorithm, and they depend on the initial points. Due to the nonlinearity of the latency and width, the Newton–Raphson method is more sensitive to the initialization than are the PF and ML methods; both PF and ML can be considered global minimizers. As a result, (15) needs a true estimation of the initial points, which can be chosen according to the latency and width of the fitted Gaussian waveforms estimated from the ensemble average over all trials. The procedure for selecting the initial points is explained further in the real data section. Although PF can again be employed to estimate the nonlinear parameters µ_{k,i} and σ_{k,i}, it requires extensive memory and computational time, which limits its application, particularly under conditions where there are multiple ECDs.

4) Overall algorithm: The aim of the overall algorithm is to update the parameters recursively based on the available measurements. The pseudocode of the method is presented in Algorithm 1. In this method (similar to that described by Doucet et al. [30], Chapter 24), each particle not only holds a parameter for the location ρ_{k,i}^{(n)} but also holds parametric representations for the amplitude a_{k,i}^{(n)}, width σ_{k,i}^{(n)}, and latency µ_{k,i}^{(n)} of the ERP subcomponents; the superscript (n) denotes the nth particle. Note that, in this algorithm, the estimated noise covariance matrix Q_k is used to update the weights using the likelihood function in (10). Furthermore, the evolution of the locations (i.e., the states in PF) is assumed to be a first-order Markovian process as

    R_k = R_{k-1} + W_k    (20)


where W_k is Gaussian white noise with known covariance matrix λ_ρ I. Here, I is the identity matrix, and λ_ρ is a scalar that represents the noise power. Hence, the distribution p(R_k | R_{k−1}^{(n)}, Y_k) in Algorithm 1 is a zero-mean Gaussian distribution with covariance matrix λ_ρ I.

The proposed method is a grid-based method by which the brain is divided into sufficiently small 3-D grid cells, and the location of each ECD is restricted to one of these cells. Therefore, before updating the weights, if the location indicated by a particle is not one of these grid cells, it is replaced by the nearest cell. The grid-based approach facilitates the use of a realistic head model and any form of forward solution. Moreover, depending on the number of grid cells, the computational complexity decreases considerably compared to nongrid-based methods.

As described earlier, small forgetting factors should be employed for highly noise-contaminated data. A single update for each parameter, therefore, may not be sufficient to accurately estimate the parameters. To overcome this limitation, the parameters can simply be updated several times on each trial. In the following experiments, we updated the parameters seven times for each trial. In addition, in order to further increase the accuracy and decrease the dependency of the proposed method on the initial point, the following procedure was used for real data. The algorithm was first implemented using the initial points obtained from the average data. It was then implemented over the trials in reverse order, using the initial points that were obtained from the last trial of the previous implementation. This procedure was applied until the results at each iteration remained the same. This typically happened in fewer than four iterations.
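The random-walk evolution (20) combined with the grid projection described above can be sketched as follows (the grid, dimensions, and names are illustrative placeholders, not the actual head-model grid):

```python
import numpy as np

def propagate_and_snap(R, grid, lam_rho, rng):
    """Random-walk propagation (20) of particle locations, followed by the
    grid projection: each perturbed location is replaced by the nearest
    precomputed grid cell.
    R: N x 3 particle locations; grid: G x 3 grid-cell centres."""
    R_new = R + np.sqrt(lam_rho) * rng.standard_normal(R.shape)
    # nearest grid cell for every particle (brute force over the G cells)
    d2 = ((R_new[:, None, :] - grid[None, :, :]) ** 2).sum(axis=2)
    return grid[np.argmin(d2, axis=1)]
```

Restricting particles to grid cells is what allows a precalculated forward matrix to be looked up per cell instead of being recomputed for arbitrary continuous locations.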

III. RESULTS

A. Simulated Data Results

Several simulations were carried out to validate and demonstrate the application of the proposed method in empirical settings. The sensitivity of the results to the forgetting factors is presented first; then the methods for estimation of ECD locations and shape parameters are compared with their simpler forms. PF is compared with the extended Kalman filter (EKF), and the Newton–Raphson technique is compared with the gradient descent algorithm. EKF and gradient descent are two well-known methods for state-space and nonlinear parameter estimation that estimate the desired parameters by linearizing the measurement functions. In addition to these comparisons, the overall performance of the proposed method is demonstrated via two examples using different SNRs.

A three-shell homogeneous head model with realistic geometry was used to generate the EEG data. The conductivity ratio used for the forward solution computation was 1 : 0.0125 : 1 for scalp:skull:brain. The values of the estimators were scanned over discrete cubic grid cells with an intergrid distance of 2 mm. The EEG data contain ERP waves in the interval between 200 and 500 ms poststimulus. The P300 component is typically observed within this time period (for the definition of P300, see Section III-B). The sampling frequency was set to 250 Hz and the number of trials to 60. These characteristics were chosen in order to match



Fig. 1. Example of simulated EEG data with SNR = −5 dB, (a) original trial number 5 for multichannel EEG, (b) original trial number 55, (c) noisy trial number 5, and (d) noisy trial number 55.

those for the real EEG dataset to which the approach was applied subsequently. Two moving sources, one in frontal and one in parietal brain locations, were used for simulating the scalp ERP subcomponents. Gaussian waves were employed where amplitudes, latencies, and widths varied across trials. The amplitude profile of the frontal source (first source) was assumed to decrease linearly, but its latency and width were assumed to be approximately constant across trials. The amplitude profile of the parietal source (second source) was assumed to be approximately constant, but its latency and width were assumed to decrease linearly across trials. Gaussian white noise with different levels was added to the amplitudes, latencies, and widths of both sources. Spatially correlated Gaussian noise was also added to the simulated EEG signal to achieve a realistic SNR. The available noise power in the simulated multichannel EEG is measured by SNR in dB units, which is defined as 

    \mathrm{SNR} = 10 \log_{10} \left( \frac{P_{\mathrm{signal}}}{P_{\mathrm{noise}}} \right)    (21)

where P_signal and P_noise denote, respectively, the power of the simulated EEG and the noise. Fig. 1 shows a typical example of the noiseless and noisy simulated ERPs. Noiseless trials 5 and 55 are shown in Fig. 1(a) and (b), and the noisy trials are shown in Fig. 1(c) and (d), respectively (for SNR = −5 dB). The signal waveform is not visible at most individual channels. The initial noise covariance matrices in the following simulated experiments are assumed to be known. The number of particles N was set empirically to 1000 during all experiments. It is important to note that an accurate selection of the number of particles is needed: if too few particles are chosen, the performance of the method degrades, and if too many particles are chosen, the computational cost increases.

In the first experiment, we show the sensitivity of the method to the forgetting factors introduced in (14) and (15). The frontal and parietal ECD locations are assumed to be known and fixed.
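The SNR definition (21) amounts to the following trivial helper (a sketch, not from the paper):

```python
import numpy as np

def snr_db(signal, noise):
    """SNR of (21): ratio of signal power to noise power in decibels."""
    p_sig = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    return 10.0 * np.log10(p_sig / p_noise)
```

For multichannel data, the means would simply run over all channels and samples.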

SNR was set to −5 dB. Fig. 2 shows the estimated parameters using λ_θ (θ ∈ {a, Q, µ, σ}) equal to 0.2, 0.4, and 0.8, shown by red, blue, and green lines, respectively.² The original parameters are shown by solid black lines. This figure shows that, if large values of λ_θ are chosen, the estimated parameters will have large variability, and thus the error increases (see the green lines). On the other hand, if small values of λ_θ are chosen, the estimated parameters will vary smoothly and suffer from less noise, but the algorithm is unable to track the trends in the parameters (see the red lines). Therefore, to compromise between tracking the variability of the parameters and noise reduction, proper values of λ_θ are needed (blue lines). In practice, the best way to select these values is empirical selection according to the SNR of the available dataset. For instance, one can set all of the forgetting factors equal to 1, and if the results diverge, or the variability of the estimated parameters is not reasonable (based upon our knowledge about the dataset), some or all of the forgetting factors can be decreased. This procedure is continued until the desired results are obtained. It is noteworthy that the algorithm has moderate sensitivity to the λ_θ values, and in practice selecting proper values should be relatively straightforward.

² In this example, if we set λ_θ = 1, the algorithm diverges and it is not possible to estimate the parameters.

In the second experiment, the results of PF for source localization are compared with those of EKF. The EKF linearizes the measurement equation using the Jacobian matrix of the measurement function. The linear system is then recursively estimated using the Kalman filter. The measurement function (i.e., the forward model), therefore, should have a first-order derivative. A single-layer spherical head model was used for this example only, in order to be able to calculate the Jacobian matrix. The forward model and its Jacobian matrix are presented in Appendix D. The ECD parameters and the initial source locations are assumed to be known a priori. Fig. 3 presents the original (blue lines) and estimated ECD locations using PF (red lines) and EKF (black lines) for axial and sagittal views. PF clearly estimated and tracked the ECD locations better than EKF. It is notable that, if the SNR is reduced to below −7 dB, the EKF (unlike the PF) diverges and is unable to track the sources. In addition, EKF is sensitive to the initial locations, while PF is not. For instance, if the initial location of the ECDs is assumed to be in the center of the head, the EKF will not be able to reach the true points. Furthermore, only PF can be applied to the realistic head model when a closed form of the forward model is not available. The disadvantages of PF, however, are the computational time and the memory requirements, which are considerably larger than those for EKF.

In the next experiment, the results of the Newton–Raphson technique for estimation of the ECD shape parameters are compared with those of the gradient descent method. The gradient descent method linearizes the objective function using the gradient function. This can be written for our shape parameters as

    \theta_k = \theta_{k-1} - \mu_\theta \frac{\partial f(\theta; Y_k)}{\partial \theta_k}    (22)

MOHSENI et al.: VARIATIONAL BAYES FOR SPATIOTEMPORAL IDENTIFICATION OF EVENT-RELATED POTENTIAL SUBCOMPONENTS


Fig. 2. Investigation of effects of forgetting factors introduced in (14) and (15) for (a) amplitude, (b) latency, and (c) width. The original parameters are shown by solid black lines and estimated parameters using λθ (θ ∈ {a, Q, µ, σ}) equal to 0.2, 0.4, and 0.8 are shown using red, blue, and green lines, respectively.
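The trade-off shown in Fig. 2 can be reproduced with a toy recursive estimator. This is not the paper's exact update rule for (14) and (15), only an exponential-forgetting sketch in which λ plays the same role: a small λ smooths but lags the trend, and a large λ tracks the trend but inherits the observation noise.

```python
import numpy as np

def track(observations, lam):
    """Exponentially forgetting estimate: est_k = (1 - lam) * est_{k-1} + lam * z_k."""
    est = observations[0]
    out = []
    for z in observations:
        est = (1.0 - lam) * est + lam * z
        out.append(est)
    return np.array(out)

rng = np.random.default_rng(1)
trend = np.linspace(1.0, 2.0, 200)        # slowly drifting "true" parameter
obs = trend + 0.5 * rng.normal(size=200)  # noisy per-trial estimates

smooth = track(obs, lam=0.05)  # low variability, but lags the trend (cf. red lines)
noisy = track(obs, lam=0.9)    # follows the trend, but is noisy (cf. green lines)
```

Comparing the error variance of the two runs against the true trend reproduces the qualitative behavior described for Fig. 2.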

Fig. 3. Comparison between PF and EKF for source localization in a spherical head model: (a) axial view and (b) sagittal view.
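The PF used here for the location states can be illustrated with a minimal bootstrap (sequential importance resampling) filter on a scalar toy problem. The nonlinear function h below merely stands in for the dipole forward model, and all numbers are illustrative rather than taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)

def h(x):
    """Toy nonlinear 'forward model' (monotone, so the state is identifiable)."""
    return x + 0.1 * x ** 3

T, N = 60, 1000                      # trials and particles (N = 1000 as in the text)
truth = 1.0 + 0.02 * np.arange(T)    # slow drift, like a moving ECD coordinate
meas = h(truth) + 0.2 * rng.normal(size=T)

particles = rng.normal(1.0, 0.5, size=N)  # prior around the initial guess
estimates = []
for y in meas:
    particles += 0.05 * rng.normal(size=N)               # propagate (random walk)
    w = np.exp(-0.5 * ((y - h(particles)) / 0.2) ** 2)   # measurement likelihood
    w /= w.sum()
    estimates.append(np.sum(w * particles))              # posterior-mean estimate
    idx = rng.choice(N, size=N, p=w)                     # bootstrap resampling
    particles = particles[idx]

estimates = np.array(estimates)
```

Because the weights are recomputed from the full nonlinear h at every step, no Jacobian is needed, which is the property that lets the PF (unlike the EKF) work with realistic head models.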

where µ_θ is the step size. The gradients ∂f(θ; Y_k)/∂θ_k for the latency and width parameters are given by (16) and (18), respectively. The locations of the two dipoles are assumed to be fixed and known, and the initial points for latency and width are assumed to be known for both algorithms. Fig. 4 shows the estimated shape parameters using both algorithms. The values of λθ and µθ were chosen to give the best results in terms of rms error. In Fig. 4, the blue lines are the original parameters, and the red and black lines are the parameters estimated using the Newton–Raphson and gradient descent algorithms, respectively. Newton–Raphson estimates the parameters better than gradient descent. We have found in our experiments, however, that gradient descent is less sensitive to the initial points than Newton–Raphson. One possible approach is to use both methods, applying gradient descent before convergence of the filter and Newton–Raphson after it.

In the next experiments, the proposed method is applied to the simulated data to estimate all parameters for two different SNRs. Fig. 5(a)–(c) shows the simulated and estimated amplitude, latency, and width of the frontal (red lines) and parietal (blue lines) sources for the data with SNR = 5 dB. Fig. 6(a) and (b) depicts the simulated and estimated locations in axial and sagittal views for the same data. Because of the small amount of noise, we chose λρ = 0.001, λa = 1, λQ = 1, λµ = 1

Fig. 4. Comparison between Newton–Raphson and gradient descent algorithms for estimation of shape parameters of subcomponents, (a) latency of the first dipole, (b) latency of the second dipole, (c) width of the first dipole, and (d) width of the second dipole.
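The behavior summarized in Fig. 4 can be reproduced on a one-dimensional version of the shape-parameter fit: estimating the latency µ of a Gaussian wave by gradient descent, as in (22), versus Newton–Raphson. The step sizes and the synthetic waveform below are our choices:

```python
import numpy as np

t = np.arange(100, dtype=float)
a_true, mu_true, sig = 2.0, 40.0, 8.0
y = a_true * np.exp(-(t - mu_true) ** 2 / (2 * sig ** 2))  # noiseless target

def cost_grad(mu):
    """Least-squares cost in mu and its analytic first derivative."""
    g = a_true * np.exp(-(t - mu) ** 2 / (2 * sig ** 2))
    r = y - g
    dg = g * (t - mu) / sig ** 2
    return np.sum(r ** 2), -2.0 * np.sum(r * dg)

def second_deriv(mu, eps=1e-4):
    """Finite-difference second derivative of the cost."""
    return (cost_grad(mu + eps)[1] - cost_grad(mu - eps)[1]) / (2 * eps)

mu_gd = mu_nr = 35.0  # common starting point, inside the convex basin
for _ in range(50):
    mu_gd -= 0.05 * cost_grad(mu_gd)[1]                 # gradient descent, cf. (22)
for _ in range(10):
    mu_nr -= cost_grad(mu_nr)[1] / second_deriv(mu_nr)  # Newton-Raphson
```

Started close enough to the optimum, Newton–Raphson converges in a handful of iterations while gradient descent is still closing the gap, matching the comparison in the text; from a poor starting point the situation can reverse, which motivates the hybrid scheme described above.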

and λσ = 1. These values preserve the Markovian assumption for the parameters. Figs. 5 and 6 show that, in the presence of moderate noise power, the proposed method can detect the ERP subcomponent parameters and track their variability accurately. Similar results for the simulated data with SNR = −5 dB are shown in Figs. 5(d)–(f) and 6(c) and (d). The forgetting factors were set to λρ = 0.001, λa = 0.6, λQ = 0.6, λµ = 0.4, and λσ = 0.4. If these values were chosen to be the same as those employed for the previous simulation, the algorithm would diverge on some trials. Note that, although the error on some individual trials is large, the proposed approach at least reveals the trends in the temporal basis parameters and locations.

B. Real Data Results

In this section, the application of the method to an auditory oddball paradigm is presented. First, the procedure for selection


Fig. 5. Example of estimated amplitude, latency, and width of two ERP subcomponents from simulated data with SNR = 5 dB, (a) amplitude, (b) latency, and (c) width. For results with SNR = −5 dB, see (d) amplitude, (e) latency, and (f) width.

Fig. 6. Estimated locations using the proposed method from the simulated data with SNR = 5 dB in (a) axial view and (b) sagittal view. For results with SNR = −5 dB, see (c) axial view and (d) sagittal view.

of initial points, as well as the results of the method from a single subject, is explained. The stability of the method is then shown in a group of four subjects. The accuracy of the method is finally verified relative to the target-to-target interval (TTI).

Real data were obtained in an oddball paradigm at the Cardiff University Brain Research Imaging Center. The subjects were female, right-handed undergraduate students. They heard a total of 300 tones, 240 of which were presented at one frequency, while the remaining 60 (the oddballs) were presented at a different frequency. During acquisition, the bandwidth of the linear filter was 0.03–40 Hz, and the sampling rate was 250 Hz. An Fz (midline frontal) reference electrode was employed. EEG data were recorded using a standard 10–20 montage with 25 scalp electrodes. In addition, horizontal and vertical electrooculograms were recorded to identify eye blinks and eye movements. The data were re-referenced offline to the algebraic mean of the signals at the left and right mastoids. A 152-ms prestimulus interval was used for baseline correction. Eye blinks were rejected using independent component analysis [38]. The ensemble average over 60 trials in response to the oddball tones for the 25 channels is shown in Fig. 7. The signal waveform is visible in the filtered and averaged data but not in the single-trial data. Fig. 7 shows two dominant peaks: a negative peak at approximately 100 ms poststimulus, and a positive peak approximately 300 ms after stimulus onset (P300). Here, we are interested in the P300, which has two subcomponents, P3a and P3b. There is evidence that these subcomponents arise from interactions between frontal and parietal neural sources [39], and there is also localization work consistent with the view that the P300 data can be modeled accurately by sources placed in anterior and posterior–parietal cortex [40]. The number of sources included in the model has a major effect on the performance of the method, and increasing the number of sources exponentially increases the number of particles in the PF that are
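The offline preprocessing described above, mastoid re-referencing followed by prestimulus baseline correction, amounts to two array operations. The channel indices and array layout below are hypothetical (the comment notes that 38 samples at 250 Hz is the 152-ms prestimulus window):

```python
import numpy as np

def preprocess(epoch, left_mast, right_mast, n_baseline):
    """epoch: (channels, samples); the first n_baseline samples are prestimulus."""
    # Re-reference to the algebraic mean of the two mastoid channels.
    ref = 0.5 * (epoch[left_mast] + epoch[right_mast])
    epoch = epoch - ref
    # Baseline-correct each channel against its prestimulus mean.
    return epoch - epoch[:, :n_baseline].mean(axis=1, keepdims=True)

rng = np.random.default_rng(3)
epoch = rng.normal(size=(25, 200)) + 5.0  # 25 channels with a common DC offset
# 38 samples = 152 ms at 250 Hz; mastoid channel indices are made up here.
clean = preprocess(epoch, left_mast=23, right_mast=24, n_baseline=38)
```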


Fig. 7. Average ERP over 60 trials of real data for a female subject. The lower diagram shows the amplitude and time scales for all the plots. P300 amplitude can be seen clearly at central sensors.

needed to estimate the posterior distribution. For the sake of convenience, therefore, it is assumed that only two sources are responsible for the P300 generated by the oddball stimuli.

Choosing the initial points is a crucial step that influences the behavior and convergence of the filtering procedure. To determine the initial points for the amplitude, latency, and width of each subcomponent, the data were average-referenced and segmented around the P300 component; the initial location was assumed to be frontal for P3a and parietal for P3b. Epochs from 200 to 500 ms time-locked to stimulus onset were extracted for the oddball trials only. Assuming the locations to be fixed, Gaussian waves were fitted to the data. Since the averaged data suffer less from noise than single-trial data, all the forgetting factors were set to 1, and the algorithm was run until the results no longer changed. The real and fitted data for the 25 channels are shown in Fig. 8 by red and blue lines, respectively. The amplitude, latency, and width of the fitted data were used to initialize the algorithm, and the initial noise covariance matrix was obtained by applying (13) to the averaged data.

The estimated amplitudes, latencies, and widths of the P3a and P3b subcomponents for one subject are shown in Fig. 9. The amplitudes are the absolute values of the 3-D ECD amplitude moments. The figure shows that the amplitude of P3b is more stationary than that of P3a over most trials: P3a amplitude diminishes from trial to trial, while P3b amplitude is approximately constant. The latency of P3a is shorter than that of P3b, and P3b latency is less variable than P3a latency. The width of P3b is also larger and more stable than that of P3a. Fig. 10 depicts the estimated locations of P3a and P3b in axial and sagittal views; the red (frontal) markers denote the location of P3a, while the blue (parietal) markers denote the location of P3b. The location of P3b in this subject is again more consistent than that of P3a.

In the next experiment, the method was similarly applied to data from four subjects to further investigate its performance and stability. Fig. 11 shows the estimated parameters for both P3a (red points) and P3b (blue points) and their linear regressions. The rows from top to bottom are the estimated amplitude, latency, and width, respectively, and the columns depict the results for different subjects. There is a significant correlation between the estimated P3a parameters and the trial index across the four subjects (p < 0.01 in a two-tailed t-test for the average of amplitudes, latencies, and widths across the four subjects). The amplitude and width of P3a decrease and its latency increases with time. The correlation coefficients between amplitude, latency, and width and the trials are −0.816, 0.575, and


Fig. 8. Average re-referenced and segmented data around P300 (200–500 ms) shown by red lines and fitted data shown by blue lines. The lower diagram shows the amplitude and time scales for all plots. The parameters of the fitted waveforms were used for initializing the algorithm.
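The initialization step, fitting Gaussian waves to the averaged, segmented data, can be sketched with an ordinary least-squares fit on one channel. SciPy's `curve_fit` is our choice here, not necessarily the authors' implementation, and the synthetic segment is illustrative:

```python
import numpy as np
from scipy.optimize import curve_fit

def gauss(t, a, mu, sigma):
    """Gaussian temporal basis: amplitude a, latency mu, width sigma."""
    return a * np.exp(-(t - mu) ** 2 / (2 * sigma ** 2))

# Synthetic averaged single-channel segment (200-500 ms at 250 Hz -> 76 samples).
t = np.linspace(200, 500, 76)
rng = np.random.default_rng(4)
avg = gauss(t, 4.0, 310.0, 35.0) + 0.1 * rng.normal(size=t.size)

# Initial guesses: peak value, peak location, and a nominal width.
p0 = [avg.max(), t[avg.argmax()], 30.0]
(a0, mu0, sig0), _ = curve_fit(gauss, t, avg, p0=p0)
```

The fitted triple (a0, mu0, sig0) is what would seed the trial-to-trial filtering, as described in the text.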

Fig. 9. Estimated amplitude (a), latency (b), and width (c) of P3a (red line) and P3b (blue line) for real data.

−0.695, respectively. However, the correlations between P3b parameters and trials are not significant. The correlation coefficients for amplitude, latency, and width and the trials are 0.0875, 0.1975, and −0.2575, respectively. The estimated locations of the P300 subcomponents for four subjects were similar to those

shown in Fig. 10. This means that P3a is less stable than P3b. These observations can be considered as a verification of the accuracy of the method using real data. To further assess the accuracy of the method, results obtained from the same subjects are plotted in Fig. 12, in which the

trials were presented in reverse order to the algorithm, i.e., the filtering procedure processed the final trial first and the first trial last. The plots in Fig. 12 are approximately the same as those in Fig. 11. We therefore conclude that the results of the filtering procedure converged to the desired values.

Finally, the method was applied to the frequent as well as the oddball trials. The estimated amplitudes of P3a and P3b are shown in Fig. 13 by red and blue lines, respectively, and the occurrences of the oddball stimuli are marked by green dashed vertical lines. The figure shows that the earlier trials have larger amplitudes than the later trials; this is particularly true for P3a (red lines). Furthermore, the oddball trials generally have larger amplitudes than the frequent trials. Moreover, the amplitude is approximately proportional to the TTI, the time interval between two consecutive oddball trials: increasing the number of preceding frequent tones produced stronger effects on the oddball trials. This is entirely consistent with what is known about the P300, which increases in size as the local probability of encountering an oddball decreases [41].

Fig. 10. Estimated locations of P3a (red markers) and P3b (blue markers).

IV. DISCUSSION

Localizing and tracking ECDs can have many benefits, particularly in circumstances where the locations of the dipoles may vary during the course of a recording session. For instance, many studies report the deployment of ECD tracking to investigate ERPs in healthy adults [42], [43], as well as in other populations, including people suffering from Parkinson's disease [44], epilepsy [45], or the consequences of cerebral stroke [46]. The approach proposed here has the potential to be employed in these and similar applications to localize and track ECD locations.

The proposed method, which was formulated within the variational Bayes framework, can be considered an expectation-maximization (EM) algorithm [47]. EM is an iterative method that alternates between an expectation (E) step, which computes the expectation of the likelihood with respect to the current estimate of the parameters, and a maximization (M) step, which computes the parameters by maximizing the expected likelihood. At the kth iteration of the EM algorithm, the distribution of interest for the E-step is p(R_k | Y_k, θ_k^{\R}), where θ_k^{\R} denotes all the parameters except the locations. In our problem, the distribution of the locations is intractable; hence an approximation to this distribution, propagated from trial to trial by the PF, is used. By substituting the posterior distribution of the locations R_k obtained in the E-step, the required optimization for the M-step can be performed by maximizing (11) with respect to θ_k^{\R}. The method can also be considered a Rao–Blackwellized PF, a well-established approach in many applications of PF [30]. Using Rao–Blackwellization, the variables may be marginalized out as

p(R_k, a_k, Q_k, µ_k, σ_k | Y_k) = p(R_k | Y_k, a_k, Q_k, µ_k, σ_k) p(a_k, Q_k | Y_k, µ_k, σ_k) p(µ_k, σ_k | Y_k).    (23)

This factorization implies that each distribution can be estimated using a different method. Here, the parameters are assumed to be independent. Equation (1), which is the product of H, a_k, and ψ_k, entails that the parameters are indeed independent: H is a function of ρ_k, and ψ_k is a function of σ_k and µ_k.

In this study, the evolution of the parameters is assumed to follow a first-order Markov model with known transition probabilities. Hence, the method trades off reducing the noise power against following sudden changes in the parameters. In other words, if the signal has moderate noise power, or the trial-to-trial variability is small, the method can be applied successfully. If the SNR is very low, however, and the parameters change suddenly from trial to trial, the method fails to estimate the parameters accurately. In such cases, it may still reveal trends in the variation of the parameters across trials, which can be useful where it is reasonable to assume that the components or subcomponents of interest either decrease or increase. Psychological studies of the effects of habituation, learning, and fatigue are the kinds of contexts in which this form of information would be useful. It is also notable that this approach relies on weaker assumptions than other methods, which assume that the ERP parameters remain the same across all trials (e.g., [9], [10]).

The noise N_k was assumed to be zero-mean Gaussian, independent of the source activities, and identically distributed across time. These assumptions, however, may be violated in practical applications. For instance, the background activity (noise) is generally considered pink noise, so its autocorrelation is nonzero at different lags.
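The E/M alternation described above can be made concrete on a textbook problem unrelated to the ERP model: EM for a two-component Gaussian mixture with unit variances. The E-step computes responsibilities (the analog of the location posterior, which here is tractable in closed form rather than requiring a PF), and the M-step updates the parameters by weighted ML:

```python
import numpy as np

rng = np.random.default_rng(5)
# Two well-separated components, 300 samples each.
x = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(5.0, 1.0, 300)])

mu = np.array([1.0, 4.0])  # initial component means
pi = np.array([0.5, 0.5])  # initial mixing weights

for _ in range(50):
    # E-step: responsibilities under the current parameters.
    # The Gaussian normalizing constant cancels because both variances are equal.
    lik = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2)
    r = lik / lik.sum(axis=1, keepdims=True)
    # M-step: weighted ML updates of the means and mixing weights.
    mu = (r * x[:, None]).sum(axis=0) / r.sum(axis=0)
    pi = r.mean(axis=0)
```

In the paper's setting the E-step quantity is approximated by the PF rather than computed in closed form, but the alternation has the same structure.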
Furthermore, since the ERPs comprise desynchronization of macroscopic eigenrhythms, such as alpha or mu rhythms, the independence assumption for the background noise may not be correct. A recently introduced mechanism for generation of ERPs by Nikulin et al. suggests that background activity is not necessarily zero mean and independent from ERPs [48]. When the noise is temporally independent and identically distributed, signal quality is improved. Different results are likely, however, if the noise is strongly correlated with time, such as when it is due to large amplitude alpha rhythms. Unknown temporal noise correlations can thus adversely affect the result of the method. The method is most suitable for scenarios in which the noise is strongly correlated in the spatial dimension. The method thus offers significant advantages relative to other


Fig. 11. Estimated parameters in four subjects. The rows from top to bottom show the estimated amplitude, latency, and width, respectively, and the columns show the results for different subjects. The results for P3a and P3b are shown by red and blue dots, respectively, and the lines are their linear regressions.
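The trend statistics reported for Fig. 11, a correlation between an estimated parameter and the trial index plus a linear regression line, can be computed as follows. The arrays are illustrative stand-ins for the per-trial estimates (on the real data, significance was assessed with a two-tailed t-test):

```python
import numpy as np

rng = np.random.default_rng(6)
trials = np.arange(60)
p3a_amp = 5.0 - 0.04 * trials + 0.3 * rng.normal(size=60)  # habituating, like P3a
p3b_amp = 4.0 + 0.3 * rng.normal(size=60)                  # stationary, like P3b

def trend(param, trials):
    rho = np.corrcoef(trials, param)[0, 1]           # trial-parameter correlation
    slope, intercept = np.polyfit(trials, param, 1)  # linear regression line
    return rho, slope, intercept

rho_a, slope_a, _ = trend(p3a_amp, trials)
rho_b, slope_b, _ = trend(p3b_amp, trials)
```

A strongly negative rho_a with a near-zero rho_b reproduces the qualitative pattern reported for the P3a and P3b amplitudes.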

Fig. 12. Same results as those shown in Fig. 11, but the algorithms were implemented using the trials in reverse order. The results are similar to those shown in Fig. 11, verifying convergence and stability of the method.


Fig. 13. Estimated amplitude of P3a (red marker) and P3b (blue marker) for both frequent and oddball stimuli.

methods that assume the statistics of the noise remain the same during the course of the recording [32].

The ERPs acquired during oddball paradigms have been widely analyzed and discussed in neuroscience, clinical, and engineering studies (e.g., [15], [49], [50]). The paradigm typically elicits a robust P300 in response to the oddball (infrequent) stimuli [1]. These are the main reasons why the ERPs acquired in this paradigm were employed here to assess the proposed method. Nonetheless, a complete and accurate localization of all of the sources contributing to the P300 is not possible at the present time [51]. It is widely accepted, however, that frontoparietal interactions are one of the main contributors [40]. As a result, the method described here is likely to be useful when comparing the variability of frontal and parietal contributions during cognitive tasks across groups. Contrasts might be made between young and older participants, or between controls and individuals with organic or psychiatric disorders. Further validation of the method proposed here will involve datasets for which there is good evidence for the specific neural sources, with sensory evoked potentials being an obvious candidate.

Finally, although the PF algorithm has the ability to converge to the global minimum, the initial locations are assumed to be known a priori. This is because the number of trials was limited and the signal was contaminated with a high level of noise. The initial locations were chosen according to widely documented knowledge (e.g., [40]) about the ERP components observed in oddball experiments. Initial points could also be assigned on the basis of (or in combination with) other kinds of information, e.g., the outcomes of studies in which hemodynamic measures of neural activity were acquired in the same tasks for which the ERPs are subsequently analyzed [52].

V. CONCLUSION

In this paper, a novel method for identifying ERP subcomponents was introduced. The ERP subcomponent sources were modeled by ECDs, and their moments were modeled by Gaussian waves. The evolution of the model parameters was assumed to be a first-order Markovian process, and the unknown parameters were estimated from trial to trial using PF, ML, and Newton–Raphson techniques. The results obtained from several simulated and real datasets verify the potential of the method for ERP analysis.

The ability to reliably extract ERP subcomponents over trials would be of great benefit in several contexts. For example, for researchers interested in using ERPs to isolate cognitive processes, the reliance on averaging introduces an inevitable degree of caution when making inferences about the onset times of processes. Caution is also necessary when inferring whether peak amplitude differences between averaged ERPs for different conditions do in fact reflect consistent peak amplitude differences at the level of individual trials. The alternative is that the amplitude differences emerge because of greater intertrial variability in peak latencies for one condition in comparison to the other. Similar concerns apply when contrasting ERPs across different patient groups. In this context, the existence of a reliable means of extracting ERP subcomponents also offers a means of assessing whether factors such as habituation and fatigue influence ERPs differently according to variables such as disease state or severity, as well as the location and extent of focal brain damage.

APPENDIX A
ML ESTIMATION OF a_{k,i}

We define



f(θ; Y_k)|_t = [y_k(t) − Σ_{j=1}^{q} H(ρ_{k,j}) a_{k,j} ψ_{k,j}(t)]^T Q_k^{−1} [y_k(t) − Σ_{j=1}^{q} H(ρ_{k,j}) a_{k,j} ψ_{k,j}(t)]    (24)

where y_k(t) is the tth column of Y_k. Using [53]

∂/∂s [(x − As)^T W (x − As)] = −2 A^T W (x − As)    (25)

the derivative of (24) with respect to a_{k,i} becomes

∂f(θ; Y_k)|_t / ∂a_{k,i} = −2 H^T(ρ_{k,i}) ψ_{k,i}(t) Q_k^{−1} [y_k(t) − Σ_{j=1}^{q} H(ρ_{k,j}) a_{k,j} ψ_{k,j}(t)].    (26)

Because of the trace operator in (11), it is evident that ∂f(θ; Y_k)/∂a_{k,i} = Σ_{t=1}^{M} ∂f(θ; Y_k)|_t / ∂a_{k,i}. Therefore, by accumulating (26) for t = 1 to M and setting the result equal to zero, we have

Σ_{t=1}^{M} H^T(ρ_{k,i}) ψ_{k,i}(t) Q_k^{−1} H(ρ_{k,i}) a_{k,i} ψ_{k,i}(t) = Σ_{t=1}^{M} H^T(ρ_{k,i}) ψ_{k,i}(t) Q_k^{−1} [y_k(t) − Σ_{j=1, j≠i}^{q} H(ρ_{k,j}) a_{k,j} ψ_{k,j}(t)].    (27)

Using this equation and the following facts

Σ_{t=1}^{M} ψ_{k,i}^2(t) = ‖ψ_{k,i}‖_2^2    (28)

Σ_{t=1}^{M} ψ_{k,i}(t) [y_k(t) − Σ_{j=1, j≠i}^{q} H(ρ_{k,j}) a_{k,j} ψ_{k,j}(t)]
  = [y_k(1) ⋯ y_k(M)] ψ_{k,i}^T − Σ_{j=1, j≠i}^{q} H(ρ_{k,j}) a_{k,j} [ψ_{k,j}(1) ⋯ ψ_{k,j}(M)] ψ_{k,i}^T
  = [Y_k − Σ_{j=1, j≠i}^{q} H(ρ_{k,j}) a_{k,j} ψ_{k,j}] ψ_{k,i}^T    (29)

(12) is obtained.
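Appendix A's normal equations can be sanity-checked numerically for a single source (q = 1). Assuming (12) has the weighted least-squares form implied by (27)-(29), namely â = (HᵀQ⁻¹H)⁻¹ HᵀQ⁻¹ Y ψᵀ / ‖ψ‖², the estimator recovers the true moment exactly on noiseless data. Equation (12) itself is not reproduced in this excerpt, so that closed form is our reconstruction, and the dimensions below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(7)
L, M = 25, 76                    # channels, samples
H = rng.normal(size=(L, 3))      # lead field of one ECD (3 moment components)
a = np.array([1.0, -2.0, 0.5])   # true dipole moment
t = np.arange(M)
psi = np.exp(-(t - 30.0) ** 2 / (2 * 8.0 ** 2))  # Gaussian temporal basis
A = rng.normal(size=(L, L))
Q = A @ A.T + L * np.eye(L)      # SPD noise covariance
Y = np.outer(H @ a, psi)         # noiseless measurement, cf. (1)

# Weighted LS solution of the normal equations (27), specialized to q = 1.
Qi = np.linalg.inv(Q)
a_hat = np.linalg.solve(H.T @ Qi @ H, H.T @ Qi @ Y @ psi) / (psi @ psi)
```

With Y = H a ψ, the right-hand side HᵀQ⁻¹Yψᵀ reduces to (HᵀQ⁻¹H) a ‖ψ‖², so a_hat equals a term by term, which is exactly what (27) predicts.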

APPENDIX B
ML ESTIMATION OF Q_k

Using the following identities [53]

∂/∂W tr{A W^{−1} B} = −(W^{−1} B A W^{−1})^T    (30)

∂|W|/∂W = |W| W^{−T}    (31)

the derivative of (11) with respect to Q_k becomes

∂f(θ; Y_k)/∂Q_k = −(1/M) Q_k^{−1} [Y_k − Σ_{i=1}^{q} H(ρ_{k,i}) a_{k,i} ψ_{k,i}] [Y_k − Σ_{i=1}^{q} H(ρ_{k,i}) a_{k,i} ψ_{k,i}]^T Q_k^{−1} + Q_k^{−T}.    (32)

By postmultiplying (32) by Q_k^T and recognizing that Q_k = Q_k^T, we have

−(1/M) Q_k^{−1} [Y_k − Σ_{i=1}^{q} H(ρ_{k,i}) a_{k,i} ψ_{k,i}] [Y_k − Σ_{i=1}^{q} H(ρ_{k,i}) a_{k,i} ψ_{k,i}]^T + I = 0.    (33)

Equation (33) simply results in

Q_k = (1/M) [Y_k − Σ_{i=1}^{q} H(ρ_{k,i}) a_{k,i} ψ_{k,i}] [Y_k − Σ_{i=1}^{q} H(ρ_{k,i}) a_{k,i} ψ_{k,i}]^T.    (34)

APPENDIX C
DERIVATION OF (16) AND (17)

Equation (16) can be obtained in a way similar to the calculations in Appendix A. The derivative of (24) with respect to µ_{k,i} can be written as

∂f(θ; Y_k)|_t / ∂µ_{k,i} = −2 [H(ρ_{k,i}) a_{k,i} ∂ψ_{k,i}(t)/∂µ_{k,i}]^T Q_k^{−1} [y_k(t) − Σ_{j=1}^{q} H(ρ_{k,j}) a_{k,j} ψ_{k,j}(t)]    (35)

where again y_k(t) is the tth column of Y_k. By accumulating the above for t = 1 to M and using the fact that ∂ψ_{k,i}(t)/∂µ_{k,i} = ((t − µ_{k,i})/σ_{k,i}^2) ψ_{k,i}(t), (16) follows after some algebra. The second-order derivative of the negative log-likelihood with respect to µ_{k,i} is the derivative of (16)

∂²f(θ; Y_k)/∂µ_{k,i}² = (2/σ_{k,i}²) Σ_{t=1}^{M} [ ∂(µ_{k,i} − t)/∂µ_{k,i} ψ_{k,i}(t) α_{k,i}^T Q_k^{−1} ξ(t) + (µ_{k,i} − t) ∂ψ_{k,i}(t)/∂µ_{k,i} α_{k,i}^T Q_k^{−1} ξ(t) + (µ_{k,i} − t) ψ_{k,i}(t) α_{k,i}^T Q_k^{−1} ∂ξ(t)/∂µ_{k,i} ]    (36)

which after some algebra results in (17).

APPENDIX D
FORWARD MODEL AND ITS JACOBIAN MATRIX

Using the Biot–Savart law, the forward model for EEG can be obtained as follows [54]:

H(ρ) = (1/(4πσ_0)) [ (ρ − ρ_{e1})/‖ρ − ρ_{e1}‖³ ⋯ (ρ − ρ_{eL})/‖ρ − ρ_{eL}‖³ ]    (37)

where σ_0 is the conductivity profile of the head tissues and ρ_{ei} is the location of the ith electrode. The derivative of the forward model with respect to ρ is expressed as

∂H(ρ)/∂ρ = (1/(4πσ_0)) [ I/‖ρ − ρ_{e1}‖³ − 3 (ρ − ρ_{e1})(ρ − ρ_{e1})^T/‖ρ − ρ_{e1}‖⁵ ⋯ I/‖ρ − ρ_{eL}‖³ − 3 (ρ − ρ_{eL})(ρ − ρ_{eL})^T/‖ρ − ρ_{eL}‖⁵ ].    (38)

The EKF can be applied to a measurement in vector form. If there is only one ECD at location ρ_k = [ρ_{xk} ρ_{yk} ρ_{zk}]^T in trial k, then (1) can be rewritten as vec{Y_k} = (I ⊗ H(ρ_k)) vec{a_k ψ_k} + vec{N_k}, where vec{·} denotes the vectorization operator. Hence, the Jacobian matrix can be obtained using (I ⊗ ∂H(ρ_k)/∂ρ_{ζk}) vec{a_k ψ_k}, where ζ ∈ {x, y, z}.

REFERENCES

[1] M. D. Rugg and M. G. H. Coles, Electrophysiology of Mind. Oxford, U.K.: Oxford Science, 1995.
[2] M. L. A. Jongsma, T. Eichele, C. M. Van Rijn, A. M. L. Coenen, K. Hugdahl, H. Nordby, and R. Quian Quiroga, "Tracking pattern learning with single-trial event-related potentials," Clin. Neurophysiol., vol. 117, pp. 1957–1973, 2006.
[3] S. Cerutti, V. Bersani, A. Carrara, and D. Liberati, "Analysis of visual evoked potentials through Wiener filtering applied to a small number of sweeps," J. Biomed. Eng., vol. 9, pp. 3–12, 1983.
[4] M. V. Spreckelsen and B. Bromm, "Estimation of single-evoked cerebral potentials by means of parametric modeling and Kalman filtering," IEEE Trans. Biomed. Eng., vol. 35, no. 9, pp. 691–700, Sep. 1988.
[5] S. D. Georgiadis, P. O. Ranta-aho, M. P. Tarvainen, and P. A. Karjalainen, "Single-trial dynamical estimation of event-related potentials: A Kalman filter based approach," IEEE Trans. Biomed. Eng., vol. 52, no. 8, pp. 1397–1406, Aug. 2005.
[6] C. Sielużycki, R. König, A. Matysiak, R. Kuś, D. Ircha, and P. J. Durka, "Single-trial evoked brain responses modeled by multivariate matching pursuit," IEEE Trans. Biomed. Eng., vol. 56, no. 1, pp. 74–82, Jan. 2009.
[7] E. A. Bartnik, K. J. Blinowska, and P. J. Durka, "Single evoked-potential reconstruction by means of wavelet transform," Biol. Cybern., vol. 67, pp. 175–181, 1999.
[8] R. Quian Quiroga and H. Garcia, "Single-trial event-related potentials with wavelet denoising," Clin. Neurophysiol., vol. 114, no. 2, pp. 376–390, 2003.
[9] D. T. Pham, J. Möcks, W. Köhler, and T. Gasser, "Variable latencies of noisy signals: Estimation and testing in brain potential data," Biometrika, vol. 74, pp. 525–533, 1987.
[10] P. Jaśkowski and R. Verleger, "Amplitudes and latencies of single trial ERPs estimated by a maximum likelihood method," IEEE Trans. Biomed. Eng., vol. 46, pp. 987–993, Aug. 1999.
[11] W. A. Truccolo, K. H. Knuth, A. Shah, S. L. Bressler, C. E. Schroeder, and M. Ding, "Estimation of single-trial multicomponent ERPs: Differentially variable component analysis (dVCA)," Biol. Cybern., vol. 89, pp. 426–438, 2003.
[12] R. Chapman and J. McCrary, "EP component identification and measurement by principal components analysis," Brain Cogn., vol. 27, no. 3, pp. 288–310, 1995.
[13] T. Jung, S. Makeig, M. Westerfield, J. Townsend, E. Courchesne, and T. J. Sejnowski, "Analysis and visualization of single-trial event-related potentials," Human Brain Mapping, vol. 14, pp. 166–185, 2001.
[14] S. Lemm, G. Curio, Y. Hlushchuk, and K. R. Müller, "Enhancing the signal-to-noise ratio of ICA-based extracted ERPs," IEEE Trans. Biomed. Eng., vol. 53, no. 4, pp. 601–607, Apr. 2006.
[15] S. Sanei and J. Chambers, EEG Signal Processing. New York: Wiley, 2007.
[16] M. Scherg and D. Von Cramon, "Two bilateral sources of the late AEP as identified by a spatiotemporal dipole model," Electroencephalogr. Clin. Neurophysiol., vol. 62, pp. 32–44, 1985.
[17] J. C. de Munck, "The estimation of time varying dipoles on the basis of evoked potentials," Electroencephalogr. Clin. Neurophysiol., vol. 77, pp. 156–160, 1990.

[18] M. Scherg and D. Von Cramon, "Evoked dipole source potentials of the human auditory cortex," Electroencephalogr. Clin. Neurophysiol., vol. 65, pp. 344–360, 1986.
[19] A. Dogandžić and A. Nehorai, "Estimating evoked dipole responses in unknown spatially correlated noise with EEG/MEG arrays," IEEE Trans. Signal Process., vol. 48, no. 1, pp. 13–25, Jan. 2000.
[20] S. Baillet and L. Garnero, "A Bayesian approach to introducing anatomo-functional priors in the EEG/MEG inverse problem," IEEE Trans. Biomed. Eng., vol. 44, pp. 374–385, May 1997.
[21] N. J. Trujillo-Barreto, E. Aubert-Vazquez, and W. D. Penny, "Bayesian M/EEG source reconstruction with spatio-temporal priors," NeuroImage, vol. 39, pp. 318–335, 2008.
[22] J. Mattout, C. Phillips, W. D. Penny, M. D. Rugg, and K. J. Friston, "MEG source localization under multiple constraints: An extended Bayesian framework," NeuroImage, vol. 30, pp. 753–767, 2006.
[23] D. Wipf and S. Nagarajan, "A unified Bayesian framework for MEG/EEG source imaging," NeuroImage, vol. 44, pp. 947–966, 2009.
[24] S. J. Kiebel, J. Daunizeau, C. Phillips, and K. J. Friston, "Variational Bayesian inversion of the equivalent current dipole model in EEG/MEG," NeuroImage, vol. 39, pp. 728–741, 2008.
[25] S. C. Jun, J. S. George, J. Pare-Blagoev, S. M. Plis, D. M. Ranken, D. M. Schmidt, and C. C. Wood, "Spatiotemporal Bayesian inference dipole analysis for MEG neuroimaging data," NeuroImage, vol. 28, pp. 84–98, 2005.
[26] D. M. Schmidt, J. S. George, D. M. Ranken, and C. C. Wood, "Spatial-temporal Bayesian inference for MEG/EEG," in Biomag 2000: 12th Int. Conf. Biomagnetism, J. Nenonen, R. J. Ilmoniemi, and T. Katila, Eds., 2001, pp. 671–673.
[27] E. Somersalo, A. Voutilainen, and J. P. Kaipio, "Non-stationary magnetoencephalography by Bayesian filtering of dipole models," Inverse Problems, vol. 19, pp. 1047–1063, 2003.
[28] S. Baillet, J. C. Mosher, and R. M. Leahy, "Electromagnetic brain mapping," IEEE Signal Process. Mag., vol. 18, no. 6, pp. 14–30, Nov. 2001.
[29] C. Campi, A. Sorrentino, A. Pascarella, and M. Piana, "A comparative analysis of algorithms for the magnetoencephalography inverse problem," in Proc. 6th Int. Conf. Inverse Problems Eng., IOP, 2008, pp. 012094-1–012094-8.
[30] A. Doucet, N. de Freitas, and N. Gordon, Sequential Monte Carlo Methods in Practice. New York: Springer-Verlag, 2001.
[31] L. Spyrou and S. Sanei, "Source localization of event-related potentials incorporating spatial notch filters," IEEE Trans. Biomed. Eng., vol. 55, no. 9, pp. 2232–2239, Sep. 2008.
[32] T. Limpiti, B. D. Van Veen, H. T. Attias, and S. S. Nagarajan, "A spatiotemporal framework for estimating trial-to-trial amplitude variation in event-related MEG/EEG," IEEE Trans. Biomed. Eng., vol. 56, no. 3, pp. 633–645, Mar. 2009.
[33] J. Möcks, T. Gasser, and P. Tuan, "Variability of single visual evoked potentials evaluated by two new statistical tests," Electroencephalogr. Clin. Neurophysiol., vol. 57, pp. 571–580, 1984.
[34] V. Šmídl and A. Quinn, The Variational Bayes Method in Signal Processing. Berlin, Germany: Springer-Verlag, 2006.
[35] M. S. Arulampalam, S. Maskell, N. Gordon, and T. Clapp, "A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking," IEEE Trans. Signal Process., vol. 50, no. 2, pp. 174–188, Feb. 2002.
[36] J. S. Liu, "Metropolized independent sampling with comparison to rejection sampling and importance sampling," Stat. Comput., vol. 6, pp. 113–119, 1996.
[37] S. M. Kay, Fundamentals of Statistical Signal Processing, Volume I: Estimation Theory. Englewood Cliffs, NJ: Prentice-Hall, 1993.
[38] S. Delsanto, F. Lamberti, and B. Montrucchio, "Automatic ocular artifact rejection based on independent component analysis and eyeblink detection," in Proc. Int. IEEE EMBS Conf. Neural Eng., 2003, pp. 309–312.
[39] U. Hegerl and T. Frodl-Bauch, "Dipole source analysis of P300 component of the auditory evoked potential: A methodological advance?" Psychiatry Res., vol. 72, no. 2, pp. 109–118, 1997.
[40] T. Frodl, G. Juckel, J. Gallinat, R. Bottlender, M. Riedel, U. Preuss, H. J. Möller, and U. Hegerl, "Dipole localization of P300 and normal aging," Brain Topogr., vol. 13, no. 1, pp. 3–9, 2000.
[41] C. J. Gonsalvez and J. Polich, "P300 amplitude is determined by target-to-target interval," Psychophysiology, vol. 39, no. 3, pp. 388–396, 2002.
[42] Y. Nakajima, K. Miyamoto, and M. Kikuchi, "Estimation of neural generators of cognitive potential P300 by dipole tracing method," Brain Nerve, vol. 46, pp. 1059–1065, 1994.
[43] K. Kobayashi, K. Yasuda, J. Takanashi, K. Sugita, Y. Kohno, H. Iwasa, and Y. Nakajima, "Dipole tracing examination for the electric source of photoparoxysmal response provoked by half-field stimulation," Epilepsia, vol. 41, pp. 60–60, 2000.
[44] N. Yoshimura, M. Kawamura, Y. Masaoka, and I. Homma, "The amygdala of patients with Parkinson's disease is silent in response to fearful facial expressions," Neuroscience, vol. 131, no. 2, pp. 523–534, 2005.
[45] K. Shindo, A. Ikeda, T. Musha, K. Terada, H. Fukuyama, W. Taki, J. Kimura, and H. Shibasaki, "Clinical usefulness of the dipole tracing method for localizing interictal spikes in partial epilepsy," Epilepsia, vol. 39, pp. 371–379, 1998.
[46] Y. Nakajima, S. Homma, T. Musha, Y. Okamoto, R. H. Ackerman, J. A. Correia, and N. M. Alpert, "Dipole-tracing of abnormal slow brain potentials after cerebral stroke: EEG, PET, MRI correlations," Neurosci. Lett., vol. 112, pp. 59–64, 1990.
[47] A. Zia, T. Kirubarajan, J. P. Reilly, D. Yee, K. Punithakumar, and S. Shirani, "An EM algorithm for nonlinear state estimation with model uncertainties," IEEE Trans. Signal Process., vol. 56, no. 3, pp. 921–936, Mar. 2008.
[48] V. V. Nikulin, K. Linkenkaer-Hansen, G. Nolte, S. Lemm, K. R. Müller, R. J. Ilmoniemi, and G. Curio, "A novel mechanism for evoked responses in the human brain," Eur. J. Neurosci., vol. 25, pp. 3146–3154, 2007.
[49] J. Polich and K. Herbst, "P300 as a clinical assay: Rationale, evaluation, and findings," Int. J. Psychophysiol., vol. 38, pp. 3–19, 2000.
[50] J. Polich, "Updating P300: An integrative theory of P3a and P3b," Clin. Neurophysiol., vol. 118, pp. 2128–2148, 2007.
[51] E. Niedermeyer and F. Lopes da Silva, Electroencephalography: Basic Principles, Clinical Applications, and Related Fields. Baltimore, MD: Lippincott Williams and Wilkins, 1999.
[52] K. A. Kiehl, K. R. Laurens, T. L. Duty, B. B. Forster, and P. F. Liddle, "An event-related fMRI study of visual and auditory oddball tasks," Psychophysiology, vol. 21, pp. 221–240, 2001.
[53] K. B. Petersen and M. S. Pedersen, The Matrix Cookbook, 2007. [Online]. Available: http://www2.imm.dtu.dk/pubdb/p.php?3274
[54] M. Hämäläinen, R.
Hari, R. Ilmoniemi, J. Knuutila, and O. Lounasmaa, “Magnetoencephalography–theory, instrumentation and applications to the noninvasive study of human brain function,” Rev. Mod. Phys., vol. 65, pp. 413–497, 1993.

Hamid Reza Mohseni (S'07) was born in Iran, in 1982. He received the B.S. degree in biomedical engineering from Amirkabir University of Technology, Tehran, Iran, in 2004, and the M.Sc. degree in biomedical signal processing from Sharif University of Technology, Tehran, in 2006. In January 2007, he began his doctoral studies as a joint student in the Schools of Engineering and Psychology, Cardiff University, Cardiff, U.K. His current research interests include EEG and MEG signal detection and localization.

Foad Ghaderi (S'08) received the B.Sc. and M.Sc. degrees in computer engineering from Sharif University of Technology, Tehran, Iran, and the University of Tehran, Tehran, respectively. He is currently working toward the Ph.D. degree at the Centre of Digital Signal Processing, Cardiff University, Cardiff, U.K. His research interests include source separation and adaptive and blind signal processing.

Edward L. Wilding is a psychologist who acquires real-time measures of neural activity during cognitively demanding tasks to identify neural correlates of information-processing operations, with a view to using these correlates to understand human cognition. He is interested in maximizing the information extracted from real-time neural signals, with two recent emphases being single-trial estimation of event-related potentials (ERPs) and accurate assessment of differences between the scalp topographies of patterns of neural activity.

Saeid Sanei (M'97–SM'05) received the Ph.D. degree in biomedical signal and image processing from Imperial College of Science, Technology, and Medicine, London, U.K., in 1991. Since then, he has been a member of the academic staff in Iran, Singapore, and the U.K. He has developed many algorithms in adaptive and nonlinear signal processing, state-space systems, tensor factorization, and time-series analysis, and their applications to speech, biomedical, and communication data, particularly brain signals and images. He is currently the Acting Director of the Centre of Digital Signal Processing, School of Engineering, Cardiff University, Cardiff, U.K. He has authored the monograph EEG Signal Processing (Wiley, 2007), as well as a number of book chapters and 220 technical papers. Dr. Sanei has served as an Associate Editor for the IEEE SIGNAL PROCESSING LETTERS and the EURASIP Journal of Intelligence and Neuroscience, and as a Guest Editor for three other special issues. He has chaired or co-chaired a number of prestigious conferences and workshops, such as IEEE SSP 2009 and DSP 2007 in Cardiff, U.K.