
Comparison of Blind Source Separation Methods for Removal of Eye Blink Artifacts from EEG

Mumtaz Hussain Soomro, Nasreen Badruddin, Member IEEE and Mohd Zuki Yusoff
Centre for Intelligent Signal and Imaging Research (CISIR)
Department of Electrical and Electronics Engineering
Universiti Teknologi PETRONAS, Malaysia
Email: [email protected], {Nasreen.b, mzuki_yusoff}@petronas.com.my

Abstract: Electroencephalography (EEG) recordings are generally corrupted by eye blink artifacts. In this work, blind source separation (BSS) based methods for removal of eye blink artifacts are presented. Two techniques, namely Independent Component Analysis (ICA) and Canonical Correlation Analysis (CCA), are investigated. The performance and efficiency of the two methods were compared using simulated contaminated EEG data of three channels. ICA recovers the EEG signals with average correlation coefficients of 0.5185, 0.7906 and 0.8217 for EEG signals 1, 2 and 3, respectively, and improves the signal-to-artifact ratio (SAR) from -4.1395 to 5.9685. CCA recovers the EEG signals with average correlation coefficients of 0.5739, 0.8229 and 0.8427 for EEG signals 1, 2 and 3, respectively, and improves the SAR from -3.5709 to 7.6891. In addition, elapsed time is investigated for both methods. Average elapsed times of 0.0905 s and 0.0114 s were computed for ICA and CCA, respectively. These simulation results demonstrate that CCA is more accurate and faster than ICA.

Keywords: EEG, Eye blink artifacts, Blind source separation (BSS), ICA, CCA, Correlation coefficient, Signal-to-artifact ratio (SAR).

I. INTRODUCTION

Electroencephalography (EEG) is a non-invasive technique used to record electrical cerebral activity through electrodes placed on the scalp [1]. EEG is an important tool for investigating neurological disorders such as epilepsy, tumors and sleep disorders [2]. Interpreting and analyzing EEG signals is very challenging when they are corrupted by other physiological signals, known as EEG artifacts. The eye blink artifact is the most prominent artifact, with an amplitude 10 to 100 times greater than the potential of the actual brain signal [2-3]. This results in EEG with a very low signal-to-noise ratio (SNR). Several authors have employed different methods to remove eye blink artifacts. Among them, linear filtering [4], linear regression analysis [5], Principal Component Analysis (PCA) [6], Independent Component Analysis (ICA) [7] and wavelet transforms [8] are common approaches to remove artifacts from EEG recordings.

978-1-4799-4653-2/14/$31.00 © 2014 IEEE

In this work, BSS-based methods are used to remove eye blink artifacts from the EEG signal. ICA and CCA are found to be the most reliable and widely explored BSS approaches. The performance and efficiency of ICA and CCA in denoising multi-channel EEG data are compared. Both methods are validated on simulated data using the correlation coefficient, the signal-to-artifact ratio (SAR) and the computation time as performance metrics. The rest of the paper is organized as follows: Section II describes the problem formulation and gives a basic description of the methods, Section III presents the simulation of eye blink and EEG signals, and Section IV gives the simulation results and discussion. Finally, the conclusion is provided in Section V.

II. METHODS

A. Problem Formulation

The multi-channel EEG recording can be considered as an instantaneous linear mixture of underlying brain sources and other biological signals, known as artifacts [9], such that

$$x[n] = A\,s[n] + v \qquad (1)$$

where $x[n]$ is the observed random data of $M$ channels at instant $n$, $s[n]$ represents the $M$ unknown sources of interest, $v$ is noise and $A$ is an unknown $M \times M$ (square) linear mixing matrix. The main goal of BSS is to recover the source vector $s[n]$ from the mixed observed vector $x[n]$ without any a priori information. This is possible by finding a demixing matrix $W$ such that $W \cong A^{-1}$. This demixing matrix is applied to the observed mixed signals to estimate the sources, such that

$$s[n] = W^{T} x[n] \qquad (2)$$

B. Description of BSS Approaches

B.1 ICA

Independent Component Analysis (ICA) is a special type of BSS problem. The ICA separates the mixed observed random data $x[n]$ into non-gaussian, mutually statistically independent sources $s[n]$ [10]. Here, the main goal of the ICA is to estimate the demixing matrix $W$, which is the inverse of the mixing matrix $A$ (i.e. $W = A^{-1}$). The independent source components are then estimated by applying the demixing matrix to the observed data $x[n]$ as shown in (2). For successful implementation of ICA, the following assumptions should be satisfied.

i. The observed data is centered by subtracting the mean from each row of the observable variable $x_i$.
ii. The centered observed vector $X$ must be whitened to get $\tilde{X}$.
iii. The unknown sources are statistically independent and must have non-gaussian distributions.
iv. The unknown mixing matrix $A$ must be a square matrix.

The first two assumptions are simple preprocessing steps applied before the ICA algorithm; they simplify the process and make it more robust and faster [10]. These two steps are related to Principal Component Analysis (PCA). The main purpose of whitening the centered observed data is to make the observed random variables uncorrelated with unit variance. After whitening, the new mixing matrix becomes orthogonal, so the whitening process effectively reduces the number of parameters that must be estimated for the mixing matrix [10]. The third assumption is central to the ICA model, because ICA uses non-gaussianity as a contrast function to estimate the independent components [10]. The central limit theorem states that the distribution of a sum of independent random variables tends toward a Gaussian distribution, so a mixture of sources is closer to Gaussian than the sources themselves. Hence, ICA maximizes non-gaussianity to estimate the independent components. Two statistical measures have been used by ICA to maximize non-gaussianity: 1) kurtosis and 2) negentropy. Each has been employed in ICA algorithms in [10-11], where it is suggested that negentropy is more robust than kurtosis. In this work, FastICA is used with negentropy as a contrast function. In [10-11], the authors suggest that the FastICA algorithm is more efficient and robust than other ICA algorithms, as FastICA is parallel, distributed, computationally simple and requires less memory. The implementation of the FastICA algorithm is well explained in [2] and is summarized below.

1. Center the data by removing the mean from each row.
2. Whiten the centered observed data.
3. Choose $m$, the number of independent components.
4. Randomly initialize $w_i$, $i = 1, 2, \ldots, m$, each with unit norm, and orthogonalize the demixing matrix according to step 6.
5. Using the negentropy contrast function to maximize the non-gaussianity, for each $i = 1, 2, \ldots, m$, let $w_i \leftarrow E\{\tilde{X}\, g(w_i^{T}\tilde{X})\} - E\{g'(w_i^{T}\tilde{X})\}\, w_i$. This update renews each column of the demixing matrix from the previous iteration.
6. Let $W \leftarrow (W W^{T})^{-1/2}\, W$ to orthogonalize the demixing matrix.
7. If the demixing matrix $W$ has not converged, go back to step 5.
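The seven steps above can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' MATLAB implementation: the nonlinearity g = tanh (a common choice for approximating negentropy), the iteration limit, the tolerance and the function names `sym_orth` and `fastica` are all assumptions made for this example, and the vectors w_i are stored as rows of W.

```python
import numpy as np

def sym_orth(W):
    """Step 6: symmetric orthogonalization, W <- (W W^T)^(-1/2) W."""
    d, E = np.linalg.eigh(W @ W.T)
    return E @ np.diag(d ** -0.5) @ E.T @ W

def fastica(X, n_iter=200, tol=1e-6, seed=0):
    """Symmetric FastICA sketch for X of shape (channels, samples).

    Assumes g = tanh; returns (estimated sources, overall demixing matrix).
    """
    rng = np.random.default_rng(seed)
    # Step 1: center each row.
    Xc = X - X.mean(axis=1, keepdims=True)
    # Step 2: whiten using the eigendecomposition of the covariance matrix.
    d, E = np.linalg.eigh(np.cov(Xc))
    V = E @ np.diag(d ** -0.5) @ E.T
    Xw = V @ Xc
    # Step 3: as many components as channels (square mixing matrix).
    m, n = Xw.shape
    # Step 4: random initialization followed by orthogonalization.
    W = sym_orth(rng.standard_normal((m, m)))
    for _ in range(n_iter):
        WX = W @ Xw
        G = np.tanh(WX)        # g(w_i^T X)
        Gp = 1.0 - G ** 2      # g'(w_i^T X)
        # Step 5: w_i <- E{X g(w_i^T X)} - E{g'(w_i^T X)} w_i, for every row.
        W_new = (G @ Xw.T) / n - Gp.mean(axis=1)[:, None] * W
        W_new = sym_orth(W_new)
        # Step 7: converged when every row stays parallel to its previous value.
        delta = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0))
        W = W_new
        if delta < tol:
            break
    return W @ Xw, W @ V
```

The recovered components come back in arbitrary order and with arbitrary sign, so any comparison with known sources has to allow for permutation and sign flips.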

Convergence occurs only when the change from one iteration to the next is negligible and the orthogonality condition is met. After convergence, the demixing matrix $W$ is obtained.

B.2 CCA

Canonical Correlation Analysis (CCA) is a type of BSS approach that measures the linear association between two multi-dimensional random variables using their autocovariances and crosscovariances [12]. The CCA solves the BSS problem by finding two basis vectors, one for each of two sets of variables, such that the cross-correlation matrix between the projected data sets becomes diagonal and the correlations on the diagonal are maximized [12-13]. The first set is the observed multi-dimensional random variable $x[n]$, and the second set is a one-sample delayed version of it, $y[n] = x[n-1]$. By removing the mean from each row of $x[n]$ and $y[n]$, we obtain the vector $\hat{X}$ with zero-mean variables $x_1, x_2, \ldots, x_n$ and the vector $\hat{Y}$ with zero-mean variables $y_1, y_2, \ldots, y_n$. Linear combinations of the variables in $\hat{X}$ and $\hat{Y}$ are known as canonical variates, such that

$$u = a^{T}\hat{X} \qquad (3)$$

$$v = b^{T}\hat{Y} \qquad (4)$$

where $a$ and $b$ are called weight vectors and $u$ and $v$ are the canonical variates corresponding to $\hat{X}$ and $\hat{Y}$, respectively. The CCA maximizes the correlation between $u$ and $v$ to find the weight vectors $a$ and $b$, such that

$$\max_{a,b}\ \rho(u, v) = \max_{a,b}\ \frac{a^{T} C_{xy}\, b}{\sqrt{\left(a^{T} C_{xx}\, a\right)\left(b^{T} C_{yy}\, b\right)}} \qquad (5)$$

where $\rho$ represents the maximum canonical correlation, $C_{xx}$ and $C_{yy}$ are the autocovariance matrices of $\hat{X}$ and $\hat{Y}$, respectively, and $C_{xy}$ is the crosscovariance matrix between $\hat{X}$ and $\hat{Y}$. Finally, the optimization problem can be solved by using singular value decomposition (SVD) on (5), which gives

$$C_{xx}^{-1} C_{xy}\, C_{yy}^{-1} C_{yx}\, a = \rho^{2} a \qquad (6)$$

$$C_{yy}^{-1} C_{yx}\, C_{xx}^{-1} C_{xy}\, b = \rho^{2} b \qquad (7)$$

where $\rho^{2}$ equals the eigenvalues $\lambda$, arranged in descending order, i.e. $\lambda_1 > \lambda_2 > \cdots > \lambda_n$, and $a$ and $b$ are the corresponding eigenvectors. Since both $x[n]$ and $y[n]$ contain the same information [13], only one of (6) or (7) is needed to obtain the weight vectors. Next, the estimated weight vectors are applied to $\hat{X}$ as shown in (3), which gives the CCA components $U = [u_1, u_2, \ldots, u_n]$ that are uncorrelated with each other. The demixing matrix $W$ can be obtained by taking the inverse of the matrix $a$ formed from the estimated weight vectors (i.e. $W = a^{-1}$). Finally, the sources $s[n]$ can be estimated by applying $W$ to $U$, such that

$$s[n] = W^{T} U[n] \qquad (8)$$
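As a rough illustration of (3) and (6), the sketch below computes the canonical variates between a multi-channel signal and its one-sample delayed copy with NumPy. It is not the authors' implementation: the function name `cca_bss` is hypothetical, a general eigendecomposition is used in place of the SVD mentioned in the text, and the sketch stops at the components U of (3) without applying the back-projection in (8).

```python
import numpy as np

def cca_bss(X):
    """CCA between x[n] and y[n] = x[n-1] for X of shape (channels, samples).

    Returns (U, a, rho): canonical variates, weight vectors as columns of a,
    and canonical correlations sorted in descending order.
    """
    X = np.asarray(X, dtype=float)
    x = X[:, 1:]                       # x[n]
    y = X[:, :-1]                      # y[n] = x[n-1]
    x = x - x.mean(axis=1, keepdims=True)
    y = y - y.mean(axis=1, keepdims=True)
    n = x.shape[1]
    Cxx = x @ x.T / n                  # autocovariance of x
    Cyy = y @ y.T / n                  # autocovariance of y
    Cxy = x @ y.T / n                  # crosscovariance; Cyx is its transpose
    # Left-hand side of (6): Cxx^-1 Cxy Cyy^-1 Cyx.
    M = np.linalg.solve(Cxx, Cxy) @ np.linalg.solve(Cyy, Cxy.T)
    lam, vecs = np.linalg.eig(M)
    order = np.argsort(-lam.real)      # eigenvalues in descending order
    rho = np.sqrt(np.clip(lam.real[order], 0.0, None))
    a = vecs.real[:, order]
    # Equation (3): apply the weight vectors to the zero-mean data.
    U = a.T @ (X - X.mean(axis=1, keepdims=True))
    return U, a, rho
```

Because the components are ordered by their lag-one correlation, slowly varying sources tend to appear first and noise-like sources last.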

The ICA uses negentropy as the contrast function to make the signals as non-gaussian as possible. The ICA does not consider temporal correlation among signals when optimizing the contrast function. In other words, all the temporal information in a signal is discarded. In the real world, multivariate time series signals have a certain autocorrelation. The EEG signals have temporal and spatial structure that gives rise to autocorrelation. Unlike ICA, CCA uses autocorrelation as a contrast function, taking into account temporal correlation in the source signals [14]. ICA uses PCA as a preprocessing step to estimate the mixing matrix. PCA operates on the covariance matrix of one data set $x[n]$ only, i.e.

$$\begin{bmatrix} C_{xx} & 0 \\ 0 & I \end{bmatrix},$$

whereas CCA operates on the covariance matrix of two data sets $x[n]$ and $y[n] = x[n-1]$, i.e.

$$\begin{bmatrix} C_{xx} & C_{xy} \\ C_{yx} & C_{yy} \end{bmatrix},$$

which captures the linear correlation between $x[n]$ and $y[n]$, something PCA does not provide. This crosscovariance property makes CCA better suited to estimating the mixing matrix. After estimating the mixing matrix, ICA uses an iterative procedure (as described in Sub-section B.1) to obtain the demixing matrix $W$. Unlike ICA, CCA only takes the inverse of the estimated matrix to obtain $W$. This is the reason CCA takes less computational time than ICA.

III. SIMULATION OF EEG SIGNALS AND EYE BLINK

A. EEG Signals Simulation

The EEG signal, $s(k)$, of 2560 samples at a sampling frequency of 256 Hz (i.e. 10 seconds duration) was simulated by an autoregressive (AR) model [15], given by

$$s(k) = 1.5084\,s(k-1) - 0.1587\,s(k-2) - 0.3109\,s(k-3) - 0.0510\,s(k-4) + W(k) \qquad (9)$$

where $W(k)$ is the driving noise term of the AR model (not the demixing matrix $W$ of Section II).
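The AR recursion in (9) can be simulated directly. The sketch below is an illustration only: the paper does not state the distribution of W(k), so zero-mean Gaussian white noise with unit standard deviation is assumed here, and the names `AR_COEFFS` and `simulate_eeg` are hypothetical.

```python
import numpy as np

# AR(4) coefficients taken from equation (9).
AR_COEFFS = (1.5084, -0.1587, -0.3109, -0.0510)

def simulate_eeg(n_samples=2560, noise_std=1.0, seed=0):
    """Simulate one EEG-like signal with the AR(4) model of equation (9).

    Assumption: the driving term W(k) is Gaussian white noise.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, noise_std, n_samples)
    s = np.zeros(n_samples)
    for k in range(n_samples):
        acc = w[k]
        for lag, coeff in enumerate(AR_COEFFS, start=1):
            if k - lag >= 0:          # samples before k = 0 are taken as zero
                acc += coeff * s[k - lag]
        s[k] = acc
    return s

# 2560 samples correspond to 10 s at the 256 Hz sampling frequency.
eeg = simulate_eeg()
```

Calling `simulate_eeg` with three different seeds gives three independent realizations, one possible way to obtain the three clean EEG signals used later.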

B. Eye Blink Artifact Simulation

The eye blink artifact $e$ is taken from a real vertical EOG recording that contains only eye blink artifacts. The recording is low-pass filtered to remove high frequency noise without distorting the waveform.

C. Contaminated EEG Simulation

The artificially contaminated EEG signal, $x(k)$, was then generated by summing $s(k)$ and $e$. The filtered vertical EOG $e$ is scaled by transmission coefficients $\gamma$, which are related to the distance from the eyes to the electrodes placed on the scalp, as mentioned in [16]. The transmission coefficients from the vertical EOG to the three EEG signals were set to 0.5, 0.23 and 0.12, respectively, as shown in the equation below.

$$x(k) = s(k) + \gamma \times e \qquad (10)$$

where $\gamma$ = 0.5, 0.23, 0.12. Three different EEG signals are simulated. The first EEG signal corresponds to the frontal polar channels (Fp1 and Fp2) using $\gamma$ = 0.5, the second to the frontal channels (F3, F4, F7, F8 and Fz) using $\gamma$ = 0.23, and the third to the central channels (C3, C4, Cz) and two temporal channels (T3, T4) using $\gamma$ = 0.12. The effect of eye blinks is negligible over the rest of the EEG channels (P3, P4, Pz, O1, O2, T5, T6). Therefore, three EEG signals are enough to evaluate the performance of the proposed methods.
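Equation (10) amounts to adding a scaled copy of the same EOG trace to each clean channel. A minimal NumPy sketch follows; the function name `contaminate` and the array layout (one row per EEG signal) are assumptions for illustration, and the EOG trace is expected to be supplied by the caller.

```python
import numpy as np

# Transmission coefficients for EEG signals 1, 2 and 3, as given with (10).
GAMMAS = (0.5, 0.23, 0.12)

def contaminate(clean, eog, gammas=GAMMAS):
    """Apply x(k) = s(k) + gamma * e to each row of `clean`.

    clean: array of shape (3, n_samples), one clean EEG signal per row.
    eog:   array of shape (n_samples,), the filtered vertical EOG e.
    """
    clean = np.asarray(clean, dtype=float)
    eog = np.asarray(eog, dtype=float)
    # np.outer builds one scaled copy of the EOG per channel.
    return clean + np.outer(gammas, eog)
```

With this layout, row 0 receives the strongest artifact (γ = 0.5) and row 2 the weakest (γ = 0.12).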

IV. SIMULATION RESULTS

As an illustration, the three simulated clean EEG signals, the contaminated EEG signals and the EEG signals corrected (recovered) via CCA and FastICA are depicted in Fig.1(a), Fig.1(b), Fig.1(c) and Fig.1(d), respectively.

Fig. 1. Clean EEG, contaminated EEG and corrected EEG signals: (a) clean simulated EEG signals; (b) contaminated EEG signals; (c) corrected EEG signals via CCA v/s clean EEG signals; (d) corrected EEG signals via FastICA v/s clean EEG signals. Each panel plots EEG Signals 1 to 3, amplitude (microvolts) against time (samples).

Fig.1(c) and Fig.1(d) demonstrate that both CCA and FastICA remove the eye blink artifacts from the EEG signals reliably. However, a small eye blink effect remains in the first EEG signal, which could not be completely removed by either method. This is because EEG signal 1 behaves as a reference to correct EEG signals 2 and 3.

Performance Metrics for Comparison

A. Correlation Coefficient (CC)

The correlation coefficient is calculated to quantify the similarity between the original EEG signal and the recovered EEG signal. Its magnitude lies between 0 and 1, and a value close to 1 indicates that the two signals are highly similar. After removal of the eye blink artifacts with CCA, the average correlation coefficient over 100 trials is 0.5739 for EEG signal 1, 0.8229 for EEG signal 2 and 0.8427 for EEG signal 3. With FastICA, the average CC is 0.5185 for EEG signal 1, 0.7906 for EEG signal 2 and 0.8217 for EEG signal 3. The comparison of average CC is shown in Fig.2. It shows that CCA removes the eye blink effect better than FastICA.
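The paper does not give the formula used for the correlation coefficient. The sketch below assumes the usual Pearson correlation coefficient between the clean and the recovered signal; the function name `correlation_coefficient` is hypothetical.

```python
import numpy as np

def correlation_coefficient(clean, recovered):
    """Pearson correlation between two equal-length signals (assumed metric)."""
    x = np.asarray(clean, dtype=float)
    y = np.asarray(recovered, dtype=float)
    x = x - x.mean()
    y = y - y.mean()
    return float(np.sum(x * y) / np.sqrt(np.sum(x ** 2) * np.sum(y ** 2)))
```

Under this definition the value is insensitive to a constant offset or a positive rescaling of the recovered signal, and a sign-inverted signal yields a negative value, so the magnitude is the quantity to compare when a method recovers sources up to sign.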

Fig. 2. Comparison for average correlation coefficient.

B. Signal-to-Artifact Ratio (SAR)

The signal-to-artifact ratio (SAR) is defined as

$$\mathrm{SAR}_{before} = 10 \cdot \log\left(\frac{\mathrm{std}(x)}{\mathrm{std}(x - \hat{x}_{cont})}\right) \qquad (11)$$

$$\mathrm{SAR}_{after} = 10 \cdot \log\left(\frac{\mathrm{std}(x)}{\mathrm{std}(x - \hat{x}_{corr})}\right) \qquad (12)$$

where $x$ is the original clean EEG, $\hat{x}_{cont}$ is the contaminated EEG and $\hat{x}_{corr}$ is the corrected EEG. The std function represents the standard deviation. SAR is calculated before and after removal of the eye blink artifacts to quantify the degree of artifact removal. Fig.3 shows the SAR averaged over 100 trials. CCA gives a better SAR on average than FastICA, which means that, on average, CCA removes more of the artifact than FastICA.

Fig. 3. Comparison for average SAR.

C. Elapsed Time

Elapsed time is the computation time taken by an algorithm to perform the task. The comparison of elapsed time for CCA and FastICA over the 100 simulation trials is shown in Fig.4. The average computation time is 0.0114 s for CCA and 0.0905 s for FastICA over the same 100 trials. CCA is therefore the more time-efficient algorithm.

D. Power Spectrum Density (PSD)

Power Spectrum Density (PSD) gives the distribution of the average power of a random signal across frequency. The Welch method is used to compute the PSD of the clean EEG signals, the contaminated EEG signals and the EEG signals recovered via CCA and FastICA. Fig.5 shows that the distortion due to the eye blink artifact lies in the low frequency band. The comparison of the PSD plots in Fig.5 demonstrates that the power spectrum of the EEG signals recovered via CCA and FastICA matches that of the original clean EEG signals, while the distortion due to the eye blink artifact in the low frequency band is reduced.

Fig. 5. Comparison for PSD of CCA and FastICA: (a) PSD of clean EEG, contaminated EEG and corrected EEG by CCA; (b) PSD of clean EEG, contaminated EEG and corrected EEG by FastICA. Each panel plots EEG Signals 1 to 3 (dB) against frequency (Hz) from 0 to 40 Hz.
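Two of the metrics in this section, the SAR of (11) and (12) and the Welch PSD, can be sketched with NumPy as follows. Several details are assumptions, since the paper does not specify them: the logarithm is taken as base 10, and the Welch estimate uses a Hann window, 50% overlap and a segment length of 256 samples. The names `sar_db` and `welch_psd` are hypothetical.

```python
import numpy as np

def sar_db(clean, estimate):
    """SAR as in (11)/(12): 10 * log10(std(x) / std(x - x_hat)).

    Pass the contaminated signal for SAR_before and the corrected signal
    for SAR_after. Base-10 logarithm is an assumption.
    """
    clean = np.asarray(clean, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    return 10.0 * np.log10(np.std(clean) / np.std(clean - estimate))

def welch_psd(x, fs, nperseg=256):
    """One-sided Welch PSD: average of windowed, overlapping periodograms.

    Assumes an even nperseg, a Hann window and 50% overlap.
    """
    x = np.asarray(x, dtype=float)
    step = nperseg // 2
    window = np.hanning(nperseg)
    scale = fs * np.sum(window ** 2)
    periodograms = []
    for start in range(0, len(x) - nperseg + 1, step):
        seg = x[start:start + nperseg]
        seg = seg - seg.mean()                      # remove the segment mean
        spectrum = np.fft.rfft(window * seg)
        periodograms.append(np.abs(spectrum) ** 2 / scale)
    psd = np.mean(periodograms, axis=0)
    psd[1:-1] *= 2.0    # fold negative frequencies into the one-sided estimate
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    return freqs, psd
```

To compare with a plot in dB such as Fig.5, the PSD values would be converted with `10 * np.log10(psd)`; a library routine such as SciPy's `scipy.signal.welch` provides a more complete implementation of the same method.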

Fig. 4. Comparison for average elapsed time.

V. CONCLUSION

In this paper, two BSS-based methods, CCA and FastICA, are presented. The proposed methods were comprehensively simulated using MATLAB.

Based on the above comparative results, both methods can reliably remove eye blink artifacts from the EEG signals using simulated data. However, CCA is the better artifact removal method and requires less computation time. Nevertheless, some enhancements to the algorithms are still needed to improve their accuracy.

REFERENCES

[1] S. Sanei and J.A. Chambers, EEG Signal Processing. New York: Wiley, 2007.
[2] M.H. Soomro, N. Badruddin, M.Z. Yusoff, and A.S. Malik, "A Method for Automatic Removal of Eye Blink Artifacts from EEG Based on EMD-ICA," in Proc. of 9th IEEE Colloquium on Signal Processing and its Applications, pp. 129-134, 8-10 March 2013.
[3] K. Naraharisetti, "Removal of Ocular artefacts from EEG Signal using Joint Approximate Diagonalization of Eigen Matrices (JADE) and Wavelet Transform," Canadian Journal on Biomedical Engineering & Technology, vol. 1, no. 4, July 2010.
[4] G. Correa, "Artifact removal from EEG signals using adaptive filters in cascade," IOP publishing Ltd, 90, pp.1-10.5, 2007.
[5] R. J. Croft and R. J. Barry, "Removal of ocular artifact from the EEG: a review," Neurophysiologic Clin, vol. 30, pp. 5-19, 2000.
[6] T. D. Lagerlund, F. W. Sharbrough and N. E. Busacker, "Spatial filtering of multichannel electroencephalographic recordings through principal component analysis by singular valued decomposition," J. Clin. Neurophysiol., vol. 14, pp. 73-82, 1997.
[7] Yandong Li, "Automatic removal of the eye blink artifact from EEG using an ICA-based template matching approach," Physiological Measurement, vol. 27, pp. 425-436, 2006.
[8] P. Senthilkumar, "Removal of Ocular Artifacts in the EEG through Wavelet Transform without using an EOG Reference Channel," Int. J. Open Problems Compt. Math, vol. 1, no. 3, pp. 188-200, 2008.
[9] J. Sarvas, "Basic mathematical and electromagnetic concepts of the biomagnetic inverse problem," Phys. Med. Biol., vol. 32, pp. 11-22, 1987.
[10] A. Hyvärinen and E. Oja, "Independent component analysis: algorithms and applications," Neural Networks, vol. 13, issues 4-5, pp. 411-430, June 2000.
[11] D. Langlois, S. Chartier and D. Gosselin, "An Introduction to Independent Component Analysis: Infomax and FastICA Algorithms," Tutorials in Quantitative Methods for Psychology, vol. 6(1), pp. 31-38, 2010.
[12] M.H. Soomro, N. Badruddin, M.Z. Yusoff and M.A. Jatoi, "Automatic eye-blink artifact removal method based on EMD-CCA," in Proc. of IEEE International Conf. on Complex Medical Engineering (CME), pp. 186-190, 25-28 May 2013.
[13] M. Borga and H. Knutsson, "A canonical correlation approach to blind source separation," Linköping University, Linköping, Sweden, Technical Report LiU-IMT-EX-0062, 2001.
[14] B. S. Raghavendra and D. N. Dutt, "Wavelet Enhanced CCA for Minimization of Ocular and Muscle Artifacts in EEG," Int. Jr. of Biol. and Life Sciences, vol. 8:3, pp. 179-184, 2012.
[15] Mohd Zuki, "Generalized Subspace Approach for Measurement of Latencies in Visual Evoked Potentials," Thesis, IRC, UTP, Malaysia, 2010.
[16] T. Gasser, L. Sroka and J. Mocks, "The Transfer of EOG Activity into the EEG for Eyes Open and Closed," Electroencephalography and Clinical Neurophysiology, vol. 61, pp. 181-193, 1985.