Convolutional Neural Network with embedded Fourier Transform for EEG classification
Hubert Cecotti, Axel Graeser
Institute of Automation (IAT), University of Bremen, Germany
{cecotti;ag}@iat.uni-bremen.de
Abstract In BCI (Brain-Computer Interface) systems, brain signals must be processed to identify distinct activities that convey different mental states. We propose a new technique for the classification of electroencephalographic (EEG) Steady-State Visual Evoked Potential (SSVEP) activity for non-invasive BCI. The proposed method is based on a Convolutional Neural Network that includes a Fourier transform between two hidden layers in order to switch from time-domain to frequency-domain analysis within the network. The first step creates the different channels. The second step transforms the signal into the frequency domain. The last step is the classification. It uses a hybrid rejection strategy that combines a junk class for the mental transition states with thresholds on the confidence values. The presented results, obtained with offline processing, use 6 electrodes on 2 subjects with a time segment of 1 s. With the rejection criterion, the reliability of the system exceeds 95% for both subjects.
1
Introduction
Brain-computer interface (BCI) systems allow people to communicate through direct measures of brain activity [1, 3]. Unlike all other means of communication, BCIs require no movement [8]. BCIs have primarily been used to enable communication for persons with severe disabilities who are unable to communicate through any classical device. Usually, a BCI is decomposed into four parts: the signal acquisition, the signal processing/classification, the output device components and the operating protocol that links the three previous components. This work deals with the signal classification component. This part includes two main steps: the extraction of brain signal features and the translation of these signals into device commands. To classify different brain signals, the knowledge of the stimuli drives
978-1-4244-2175-6/08/$25.00 ©2008 IEEE
the solution to some specific pattern recognition methods. When a visual stimulus with a constant frequency is presented to the user, a continuous brain response at the same frequency is present in the visual cortical area. This paper deals with the classification of such responses, which are called steady-state visual evoked potentials (SSVEP). One of the current challenges in BCI is to improve the reliability of the commands, i.e. the EEG classification, within a short time period. In the following sections, we propose a neural network that uses a priori knowledge of the problem and that can adapt to each subject. The second section presents the EEG signals from SSVEP. The main system is described in the third section. The experiments and the results are detailed in the fourth section.
2
EEG signals
The classification of EEG signals is one of the current challenges for real BCI applications [4]. Different types of classifiers have been used for EEG classification, such as neural networks [2, 7] and Hidden Markov Models [12]. In the following, we consider $N_{elec}$ electrodes for the EEG signal acquisition. The signal corresponds to the voltage measured between a reference electrode and one of the $N_{elec}$ electrodes. Let $SF$ be the sampling frequency of the signal acquisition and $TS$ the time segment, in seconds, dedicated to the analysis of the signal. For SSVEP stimuli, we consider $N_{freq}$ visual stimuli on an LCD screen, with boxes flickering at different frequencies. Usually, different channels are created to perform the classification. A channel is a linear combination of the signals measured by the $N_{elec}$ electrodes. A channel $c$ is defined by:

$$c_j = \sum_{i=0}^{N_{elec}-1} w_i \, X_{i,j}$$

The creation of channels allows the analysis of a set of spatially independent vectors instead of a matrix. The information from the electrodes is summarized in one scalar at each time $j$. For the EEG signal pre-processing, the first step is to find an optimal set of weights $w_{i,k}$, $0 \le i < N_{elec}$, where $k$ ranges over a finite number of channels that is as small as possible. Several existing solutions create one or several channels: the average combination, the native combination [5], the bipolar combination [10], the Laplacian combination, the Minimum energy combination and the Maximum contrast combination [6]. We propose to create channels that are learned jointly, according to their discriminant power once they are combined. The goal is to determine the optimal set of weights for a finite number of channels, which can help the final classification. A solution for creating such channels is proposed in the next section.
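To make the channel definition of this section concrete, the following minimal sketch builds channels as weighted sums over electrodes with NumPy. The function name `make_channels`, the 128 samples per segment and the random data are assumptions made for the example, not values from the paper. The two weight vectors only illustrate the average and bipolar combinations listed above.

```python
import numpy as np

def make_channels(X, W):
    """Return channels c[k, j] = sum_i W[k, i] * X[i, j].

    X: (n_elec, n_samples) EEG segment; W: (n_channels, n_elec) weights.
    """
    X = np.asarray(X, dtype=float)
    W = np.atleast_2d(np.asarray(W, dtype=float))
    return W @ X

# 6 input electrodes as in the paper; 128 samples per segment is an assumption.
n_elec, n_samples = 6, 128
rng = np.random.default_rng(0)
X = rng.standard_normal((n_elec, n_samples))

w_average = np.full(n_elec, 1.0 / n_elec)    # average combination
w_bipolar = np.zeros(n_elec)                 # bipolar combination of electrodes 0 and 1
w_bipolar[0], w_bipolar[1] = 1.0, -1.0

channels = make_channels(X, np.stack([w_average, w_bipolar]))
print(channels.shape)
```

Each row of `W` yields one channel, so learning the channels amounts to learning the rows of `W`.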
3
System overview
The model is a convolutional neural network (CNN), i.e. a multi-layer perceptron with a special topology. This kind of model is widely used in handwritten character recognition [9]. Its advantage is that it classifies the raw signal directly and integrates the signal processing functions within the discriminant steps. Indeed, it is not always possible to know which kind of features to extract. However, some knowledge of the problem may be included in the network topology. This is the reason why we propose to add some signal processing methods between 2 hidden layers. The inputs are the EEG signal values from the electrodes during a time segment, $X_{i,j}$, $0 \le i < N_{elec}$, $0 \le j < SF * TS$. The output corresponds to the $N_{freq}$ SSVEP frequencies and a class for rejection. Thus, for the classification task, there are $N_{freq} + 1$ classes. The process within the neural network is composed of the signal normalization and denoising, the creation of the channels, the selection of a pool of frequencies and their harmonics, and then their classification. Before processing the signal, the data are normalized: $X_{i,j} \leftarrow (X_{i,j} - \bar{X}_{i,j}) / \sigma_{i,j}$, where $\bar{X}_{i,j}$ and $\sigma_{i,j}$ are respectively the average value and the standard deviation of electrode $i$ at time $j$ within a time segment $TS$.
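The normalization step can be sketched as follows. This is a minimal example that assumes the mean and standard deviation are computed per electrode over the samples of one time segment; the helper name `normalize_segment` and the small `eps` guard against a zero deviation are additions for the example.

```python
import numpy as np

def normalize_segment(X, eps=1e-12):
    """Center and scale each electrode of one segment.

    X: (n_elec, n_samples). Statistics are taken along the time axis
    (assumption: one mean and one deviation per electrode and segment).
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, keepdims=True)
    return (X - mean) / (std + eps)

rng = np.random.default_rng(1)
segment = 5.0 + 3.0 * rng.standard_normal((6, 128))  # synthetic data
normalized = normalize_segment(segment)
```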
3.1
Neural network topology
The network is composed of 4 layers, which are composed of one or several maps. We define a map as a layer entity that has a specific semantic: each map of the first hidden layer is a channel. The first hidden layer is dedicated to the denoising of the input data and the creation of the different channels. The network topology is described as follows:
• Layer 0 ($L_0$): the input layer. $X_{i,j}$ with $0 \le i < N_{elec}$ and $0 \le j < SF * TS$. $SF * TS$ corresponds to the number of samples in $TS$ seconds.
• Layer 1 ($L_1$): the first hidden layer. $L_1$ is composed of 10 maps. We denote by $L_1 M_m$ the map number $m$. Each map of $L_1$ has the size $1 \times (SF * TS - 4)$. (The reduction of 4 is due to the border effects of the kernel used for filtering in the time domain.)
• Layer 1' ($L'_1$): $L'_1$ is the result of $L_1$ after the Fourier Transform. Each map of $L'_1$ has the size $1 \times 22$, a selection of the frequencies that play the role of features.
• Layer 2 ($L_2$): the second hidden layer. $L_2$ is composed of 1 map of 100 neurons. This map is fully connected to its corresponding map on $L'_1$.
• Layer 3 ($L_3$): the output layer. This layer has only one map of 6 neurons, which represent the 5 frequencies to detect and the transition state. This layer is fully connected to each map of $L_2$.
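The layer sizes listed above can be checked with a small helper. This is only a sketch: the function name `layer_shapes` and the sampling frequency of 128 Hz are assumptions, since the paper does not state $SF$ in this section. It also shows where the reduction of 4 comes from: a kernel of length 5 applied without padding leaves $SF * TS - 5 + 1$ positions.

```python
def layer_shapes(n_elec, sf, ts, n_maps=10, kernel=5,
                 n_features=22, n_hidden=100, n_out=6):
    """Return the size of each layer for a segment of ts seconds sampled at sf Hz."""
    n_samples = int(sf * ts)
    return {
        "L0": (n_elec, n_samples),
        "L1": (n_maps, n_samples - kernel + 1),  # border effect of the time kernel
        "L1'": (n_maps, n_features),             # selected frequencies and harmonics
        "L2": (n_hidden,),
        "L3": (n_out,),                          # 5 frequencies + transition state
    }

shapes = layer_shapes(n_elec=6, sf=128, ts=1)    # sf=128 is an assumed value
for name, shape in shapes.items():
    print(name, shape)
```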
3.2
Propagation
The first hidden layer is dedicated to the automatic creation of the channels (their weights) and the automatic linear filtering of the signal in time. This step may be useful to cancel some artifacts due to phase differences in the signal between electrodes. Each map of $L_1$ is processed as follows. First, a convolution is applied to the input layer to create the channels. The value of a neuron $n$ of $L_1 M_m$ is defined by $L_1 M_m(n) = f(\sigma)$, where

$$\sigma = \sum_{i=0}^{N_{elec}-1} \sum_{j=0}^{4} X_{i,\,j+n} \, W_{m,i,j} + W_m$$

with $0 \le m < 10$, $0 \le i < N_{elec}$, $0 \le n < SF * TS - 4$, so that the input time index satisfies $0 \le j + n < SF * TS$. Here $f$ is a tanh function and $W$ is the set of weights, $W_m$ being the additive (bias) weight of map $m$. Note that each neuron of the map shares the same set of weights and is only connected to a window of size $5 * N_{elec}$. This window filters the signal in the space and time domains at the same time. Instead of learning one set of weights for each neuron, dependent on the neuron position, the weights are learned independently of the position of their corresponding output neuron. Once the channels are created, the Fourier Transform is applied to the neuron values to move to the frequency domain. Then the frequencies between 12.5 Hz and 17.5 Hz with a step of 0.5 Hz, and their first harmonics, are selected, as they represent the frequency band of the stimuli (11 frequencies and 11 harmonics). Finally, we consider the layer $L'_1$ and its maps $L'_1 M_m$, which all have 22 neurons. The part of the network between $L'_1$ and $L_3$ corresponds to a classical multi-layer perceptron: $L_2$ and $L_3$ are fully connected to $L'_1$ and $L_2$, respectively.
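The propagation through the first hidden layer and the Fourier step can be sketched with NumPy as below. Several choices are assumptions made for the example and are not given in the text: the sampling frequency of 128 Hz, zero-padding of the FFT to obtain 0.5 Hz bins, the reading of "first harmonics" as twice the base frequency, and the use of the spectrum magnitude as features. The function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def first_hidden_layer(X, W, b):
    """L1 maps: tanh of a convolution over a window of n_elec x 5 samples.

    X: (n_elec, n_samples), W: (n_maps, n_elec, 5), b: (n_maps,).
    """
    windows = sliding_window_view(X, W.shape[2], axis=1)  # (n_elec, n_samples - 4, 5)
    sigma = np.einsum("inj,mij->mn", windows, W) + b[:, None]
    return np.tanh(sigma)

def frequency_features(L1, sf, resolution=0.5):
    """Select 12.5..17.5 Hz (step 0.5 Hz) and their doubles from each map."""
    n_fft = int(round(sf / resolution))        # zero-padded length (assumption)
    spectrum = np.fft.rfft(L1, n=n_fft, axis=1)
    base = np.linspace(12.5, 17.5, 11)
    targets = np.concatenate([base, 2.0 * base])
    bins = np.rint(targets / resolution).astype(int)
    return np.abs(spectrum[:, bins]), targets  # magnitude is an assumption

sf, n_elec, n_maps = 128, 6, 10                # sf is an assumed value
rng = np.random.default_rng(2)
X = rng.standard_normal((n_elec, sf))          # one segment of 1 s
W = 0.1 * rng.standard_normal((n_maps, n_elec, 5))
b = np.zeros(n_maps)

L1 = first_hidden_layer(X, W, b)
features, targets = frequency_features(L1, sf)
print(L1.shape, features.shape)
```

Because the same `W[m]` is applied at every position `n`, each map has only $5 * N_{elec} + 1$ trainable weights, whatever the segment length.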
3.3
Backpropagation
The backpropagation for $L_3$ and $L_2$ is performed by gradient descent, minimizing the least mean square error. The error must then be transferred back to the time domain by using the Inverse Fourier Transform in order to correct the weights of the first hidden layer. As the errors in this layer are complex numbers, only the real part is used for updating the weights.
3.4
Rejection strategy
The system must be very reliable. Therefore, we combine 2 decisions for the rejection. First, we dedicate one class to the transition states, as described previously. Second, there are 2 thresholds for each class. They are determined on the validation database, which is also used to find the best epoch for generalization. For each class $C_i$, we define the 2 thresholds:

$$\zeta_i = \frac{2\,Max(P(x \in C_i)) + Ave(P(x \in C_i))}{3}$$

$$\psi_i = \frac{2\,Max(DP(x \in C_i)) + Ave(DP(x \in C_i))}{3}$$

where $Max(P(x \in C_i))$ and $Ave(P(x \in C_i))$ are respectively the maximum and average probability values for a pattern to be accepted as belonging to the class $C_i$, i.e. the signal at $(13 + i)$ Hz. $Max(DP(x \in C_i))$ and $Ave(DP(x \in C_i))$ are respectively the maximum and average distance between the first and second best answers for a pattern to be accepted as belonging to the class $C_i$. A signal $X$ is attributed to the class $C_i$ by the classifier $E$, $0 \le i < N_{freq}$, when:

• $i = \mathrm{argmax}_n \, L_3(n)$
• $P(x \in C_i) > \zeta_i$
• $DP(x \in C_i) > \psi_i$

Otherwise the pattern is rejected, $E(X) = R$.
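The hybrid rejection rule can be sketched as follows. This is a minimal example under stated assumptions: the statistics of each class are computed over the validation patterns whose best output is that class, the junk class is the last output, and rejection is encoded as `-1`. The function names and the toy values are hypothetical.

```python
import numpy as np

REJECT = -1  # code used here for E(X) = R (assumption)

def top_two_gap(outputs):
    """Distance between the best and second best output of each pattern."""
    ordered = np.sort(outputs, axis=1)
    return ordered[:, -1] - ordered[:, -2]

def fit_thresholds(val_outputs, n_freq):
    """Compute zeta_i and psi_i as (2 * max + average) / 3 on validation data."""
    winners = val_outputs.argmax(axis=1)
    best = val_outputs.max(axis=1)
    gap = top_two_gap(val_outputs)
    zeta = np.full(n_freq, np.inf)   # a class never seen in validation is always rejected
    psi = np.full(n_freq, np.inf)
    for i in range(n_freq):
        mask = winners == i
        if mask.any():
            zeta[i] = (2.0 * best[mask].max() + best[mask].mean()) / 3.0
            psi[i] = (2.0 * gap[mask].max() + gap[mask].mean()) / 3.0
    return zeta, psi

def classify(outputs, zeta, psi):
    """Return the class index, or REJECT for the junk class or a failed threshold."""
    n_freq = len(zeta)
    winners = outputs.argmax(axis=1)
    best = outputs.max(axis=1)
    gap = top_two_gap(outputs)
    decisions = np.full(len(outputs), REJECT)
    for k, i in enumerate(winners):
        if i < n_freq and best[k] > zeta[i] and gap[k] > psi[i]:
            decisions[k] = i
    return decisions

# Toy example with 2 frequency classes and 1 junk class (last column).
val = np.array([[0.9, 0.05, 0.05], [0.6, 0.3, 0.1], [0.1, 0.8, 0.1]])
zeta, psi = fit_thresholds(val, n_freq=2)
test = np.array([[0.95, 0.03, 0.02], [0.5, 0.4, 0.1],
                 [0.1, 0.1, 0.8], [0.05, 0.9, 0.05]])
decisions = classify(test, zeta, psi)
print(decisions.tolist())  # [0, -1, -1, 1]
```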
4
Experiments
The experiments are made on a particular BCI type: the system is non-invasive. It does not require surgery to implant electrodes. It only uses sensors in contact with the surface of the scalp, via 8 standard EEG electrodes. They are placed on AFZ for the ground, CZ for the reference and PO3, PO4, PZ, O1, O2, OZ for the input electrodes, following the international 10-20 system of measurement [11]. The stimuli are flickering lights and their responses should correspond to an SSVEP response. The system must reflect the user's attention to a fast oscillating stimulus. EEG signals were recorded on 2 subjects. For each subject, we have 5 different trials of about 3 minutes. Each trial is composed of a sequence of events. The sequence simulates the actions that can occur during online processing, where the user has to shift his gaze between the different visual stimuli. During Event(i), the subject has to look for 4 s at a box flickering at $(13 + i)$ Hz, $0 \le i < N_{freq}$, $N_{freq} = 5$. The sequence is defined as:

    for i = 0 to N_freq − 1
      for j = 0 to N_freq − 1 {
        Event(i)
        Event(j)
      }

Table 1. Results on the test database (in %).

    Subject                      A       B
    Recognition (learning)       93.44   76.39
    Recognition (validation)     56.57   63.52
    Recognition (test)           53.47   49.95
    Rejection (test)             43.77   48.59
    Reliability (test)           95.08   97.17

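The event sequence of one trial, as defined by the nested loop in this section, can be generated as in the following sketch. The function name `event_sequence` and the tuple layout are choices made for the example. The frequencies and the duration of 4 s come from the text.

```python
def event_sequence(n_freq=5, base_hz=13, duration_s=4):
    """Return the list of (frequency in Hz, duration in s) events of one trial."""
    events = []
    for i in range(n_freq):
        for j in range(n_freq):
            events.append((base_hz + i, duration_s))  # Event(i)
            events.append((base_hz + j, duration_s))  # Event(j)
    return events

sequence = event_sequence()
total_seconds = sum(duration for _, duration in sequence)
print(len(sequence), total_seconds)  # 50 200
```

The 50 events of 4 s give 200 s, which matches the stated trial length of about 3 minutes.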
The learning, validation and test databases are composed respectively of 6392, 2130 and 2134 patterns for each subject. The classes are equally distributed in the three databases. For each subject, the first three trials are dedicated to the training of the system, the fourth trial is used for the validation and the fifth is used for the test. The recognition, error, rejection and reliability rates ($\tau_{rec}$, $\tau_{err}$, $\tau_{rej}$, $\tau_{rel}$) are defined by:

$$\tau_{rec} = \frac{\sum_{X \in DB} \big((E(X) = C_i) \text{ and } (X \in C_i)\big)}{\sum_{X \in DB} (X \in C_i)}$$

$$\tau_{rej} = \frac{\sum_{X \in DB} \big((E(X) = R) \text{ and } (X \in C_i)\big)}{\sum_{X \in DB} (X \in C_i)}$$

where $DB$ is the considered database, $\tau_{err} = 1 - \tau_{rec} - \tau_{rej}$ and $\tau_{rel} = \tau_{rec} / (\tau_{rec} + \tau_{err})$. Table 1 presents the results (in %) obtained with the CNN for the 2 subjects. The results correspond to
the epoch giving the best recognition rate on the validation database. For other epochs, we observe that the recognition rate reaches about 97% on the learning database, which supports the choice of the topology. However, the generalization remains average and the system must reject many patterns to achieve a high reliability. Such generalization can be explained by the choice of the time segment and possibly by the difficulty for the subjects to focus on the stimuli. The proposed approach is well suited for the channel creation, as it creates channels that give efficient results in spite of the very noisy signals. These results support the idea of switching from the time domain to the frequency domain, and the need to adapt the BCI to the user. The two subjects give a good SSVEP response. The rejection strategy offers the same kind of results for both subjects. A time segment of 1 s is quite challenging for observing a peak at the expected frequency. Thus almost half of the patterns are rejected. Contrary to other pattern recognition applications where the ground truth can be checked easily, the expected brain signals are difficult to verify during the experiment. Such uncertainty naturally leads to rejecting many patterns. The subject can shift his gaze and produce unwanted signals during the experiment. Therefore, there is an inevitable risk that the data used for training contain some parts that do not correspond to an expected SSVEP response, even if we assign the transitions between events to a junk class. However, the proposed strategy provides a good reliability while keeping a short time segment, which is a key factor for obtaining a high information transfer rate during online processing.
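The four rates defined in this section can be computed from the classifier decisions as in the sketch below. It assumes that the rates are taken over all patterns of the database, that the true labels only contain frequency classes, and that a rejection is encoded as `-1`. The function name `rates` and the toy data are hypothetical.

```python
import numpy as np

REJECT = -1  # code used here for E(X) = R (assumption)

def rates(decisions, labels, reject=REJECT):
    """Return (recognition, error, rejection, reliability) as fractions."""
    decisions = np.asarray(decisions)
    labels = np.asarray(labels)
    rec = float(np.mean(decisions == labels))
    rej = float(np.mean(decisions == reject))
    err = 1.0 - rec - rej
    accepted = rec + err
    rel = rec / accepted if accepted > 0 else float("nan")
    return rec, err, rej, rel

# Toy data: 2 correct, 2 wrong and 2 rejected patterns.
decisions = [0, 1, REJECT, 2, REJECT, 1]
labels = [0, 1, 1, 3, 2, 2]
rec, err, rej, rel = rates(decisions, labels)
print(rec, err, rej, rel)
```

Applied to the test values of subject A in Table 1, $\tau_{rec} = 53.47$ and $\tau_{rej} = 43.77$ give $\tau_{err} = 2.76$ and $\tau_{rel} \approx 95.1$, in line with the reported 95.08.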
5
Conclusion
A new model based on a Convolutional Neural Network has been presented. The network integrates the Fourier transform between 2 hidden layers, which changes the layer semantic from time-domain analysis to frequency-domain analysis. This method also allows the automatic creation of channels and linear time filters that are adapted to the user and selected according to their discriminant power for the signal classification. The proposed rejection criteria are successful and allow a high reliability. The model has been validated for the classification of EEG signals in a BCI system based on SSVEP responses.
Acknowledgment This research was supported by a Marie Curie European Transfer of Knowledge grant Brainrobot, MTKD-CT-2004-014211, within the 6th European Community Framework Program.
References
[1] B. Z. Allison, E. W. Wolpaw, and A. Wolpaw. Brain-computer interface systems: progress and prospects. Expert Review of Medical Devices, vol. 4, no. 4, pages 463–474, 2007.
[2] C. Anderson, S. Devulapalli, and E. Stolz. Determining mental state from EEG signals using parallel implementations of neural networks. IEEE Workshop on Neural Networks for Signal Processing, Cambridge, MA, USA, pages 475–483, 1995.
[3] N. Birbaumer and L. G. Cohen. Brain-computer interfaces: communication and restoration of movement in paralysis. Journal of Physiology-London, vol. 579, no. 3, pages 621–636, 2007.
[4] B. Blankertz, G. Dornhege, S. Lemm, M. Krauledat, G. Curio, and K. Muller. The Berlin brain-computer interface: EEG-based communication without subject training. IEEE Trans. on Neural Systems and Rehabilitation Engineering, vol. 14, no. 2, pages 147–152, 2006.
[5] G. Burkitt, R. Silberstein, P. Cadush, and A. Wood. Steady-state visual evoked potentials and travelling waves. Clin. Neurophysiol., vol. 111, no. 2, pages 246–258, 2000.
[6] O. Friman, I. Volosyak, and A. Graser. Multiple channel detection of steady-state visual evoked potentials for brain-computer interfaces. IEEE Trans. on Biomedical Engineering, vol. 54, no. 4, pages 742–750, 2007.
[7] E. Haselsteiner and G. Pfurtscheller. Using time dependent neural networks for EEG classification. IEEE Trans. on Rehabilitation Engineering, vol. 8, no. 4, pages 457–463, 2000.
[8] A. Kostov and M. Polak. Parallel man-machine training in development of EEG-based cursor control. IEEE Trans. Rehabil. Eng., vol. 8, no. 2, pages 203–205, 2000.
[9] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, vol. 86, no. 11, pages 2278–2324, 1998.
[10] G. Muller-Putz, R. Scherer, C. Brauneis, and G. Pfurtscheller. Steady state visual evoked potential (SSVEP)-based communication: impact of harmonic frequency components. J. Neural Eng., vol. 2, no. 4, pages 123–130, 2005.
[11] G.-E. Sharbrough, R. Chatrian, H. Lesser, M. Lüders, T. Nuwer, and P. TW. American electroencephalographic society guidelines for standard electrode position nomenclature. J. Clin. Neurophysiol., vol. 8, pages 200–202, 1991.
[12] S. Zhong and G. Joydeep. HMMs and coupled HMMs for multi-channel EEG classification. In Proc. IEEE Int. Joint Conf. on Neural Networks, vol. 2, pages 1154–1159, 2002.