IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, VOL. 52, NO. 1, JANUARY 2005
Automatic Detection of Epileptiform Events in EEG by a Three-Stage Procedure Based on Artificial Neural Networks

Nurettin Acır*, İbrahim Öztura, Mehmet Kuntalp, Barış Baklan, and Cüneyt Güzeliş
Abstract—This paper introduces a three-stage procedure based on artificial neural networks for the automatic detection of epileptiform events (EVs) in a multichannel electroencephalogram (EEG) signal. In the first stage, two discrete perceptrons fed by six features are used to classify EEG peaks into three subgroups: 1) definite epileptiform transients (ETs); 2) definite non-ETs; and 3) possible ETs and possible non-ETs. The pre-classification done in the first stage not only reduces the computation time but also increases the overall detection performance of the procedure. In the second stage, the peaks falling into the third group are separated from each other by a nonlinear artificial neural network that functions as a postclassifier, whose input is a vector of 41 consecutive sample values obtained from each peak. Different networks, i.e., a backpropagation multilayer perceptron and two radial basis function networks trained by a hybrid method and a support vector method, respectively, are constructed as the postclassifier and then compared in terms of their classification performances. In the third stage, multichannel information is integrated into the system, contributing to the process of identifying an EV as electroencephalographers (EEGers) do. After the integration of multichannel information, the overall performance of the system is determined with respect to EVs. Visual evaluation, by two EEGers, of 19-channel EEG records of 10 epileptic patients showed that the best performance is obtained with a radial basis support vector machine, providing an average sensitivity of 89.1%, an average selectivity of 85.9%, and a false detection rate (per hour) of 7.5.

Index Terms—Automatic spike detection, EEG, neural networks, radial basis function networks, support vector machines.
I. INTRODUCTION

THE ELECTROENCEPHALOGRAM (EEG) has long been a valuable clinical tool in the diagnosis, monitoring, and management of neurological disorders related to epilepsy. Epilepsy may be defined as a symptom of paroxysmal and abnormal discharges in the brain that may be induced by a variety of pathological processes of genetic or acquired origin. This disorder is often characterized by sharp, recurrent, and transient disturbances of mental function and/or movements of different body parts that result from excessive discharges of groups of brain cells. The presence of epileptiform activity, which is distinct from the background EEG activity, confirms the diagnosis of epilepsy, although it can be confused with other disorders producing similar seizure-like activities. During seizures, the scalp EEG of patients with epilepsy is characterized by high-amplitude, synchronized, periodic EEG waveforms. Between seizures, epileptiform transient waveforms, which include spikes and sharp waves, are typically observed in the scalp EEG of such patients. An EEG epileptiform transient (ET), which is different from the background activity, has a pointed peak and a duration of 20 to 70 ms [1]. Although it may occur alone, an ET is usually followed by a slow wave lasting 150–350 ms, together forming what is known as a "spike and slow wave complex" [2].

The detection of epilepsy generally includes visual scanning of EEG recordings for these spikes and seizures by an experienced EEGer. This process, however, is very time consuming, especially in the case of long recordings. In addition, disagreement among EEGers on the same record is possible due to the subjective nature of the analysis [3]. Therefore, a need arises for an automatic spike detection system that makes decisions based on objective criteria. Furthermore, the use of ambulatory monitoring, which produces 24-hour or longer continuous EEG recordings, is becoming more common, further increasing the need for an efficient automated detection method.

Several attempts have been made to automate the spike detection process using computer-based methods. In most of these systems, measurements of electrographic parameters of EEG waveforms, such as sharpness, slope, duration, and amplitude, are compared with thresholds that are representative of typical true spikes [4]–[9].

Manuscript received December 18, 2002; revised May 13, 2004. The work of N. Acır was supported in part by the Turkish Scientific and Technical Research Council (TÜBİTAK) through the Münir Birsel Fund. Asterisk indicates corresponding author. *N. Acır is with the Neuro-Sensory Engineering Laboratory, University of Miami, Miami, FL 33124 USA, on leave from the Electrical and Electronics Engineering Department, Dokuz Eylül University, 35160 Buca, İzmir, Turkey (e-mail: [email protected]). İ. Öztura and B. Baklan are with the Neurology Department, Medical Faculty, Dokuz Eylül University, İzmir, Turkey. M. Kuntalp and C. Güzeliş are with the Electrical and Electronics Engineering Department, Dokuz Eylül University, 35160 İzmir, Turkey. Digital Object Identifier 10.1109/TBME.2004.839630
A different spike detection system has also been developed, which is sensitive to the different states of the EEG, such as active wakefulness, quiet wakefulness, desynchronized EEG, phasic EEG, and slow EEG [10]. Similarly, some filtering techniques have also been proposed for spike detection [11]. All these studies have tried to find some standards for detecting ETs using objective criteria. With an EEG signal free of artifacts, a reasonably accurate detection of spikes and sharp waves is possible; however, difficulties arise with artifacts, which increase the number of false detections that commonly plague all automatic detection systems [12], [13]. Many studies using an artificial neural network (ANN) approach have been reported in the literature [8], [14]–[22]. ANN-based detection systems basically use either of two different input representations: 1) the extracted EEG
features or 2) the raw EEG signal. In the former case, the extracted features of an ET, such as its slope, duration, amplitude, and sharpness, together with some context parameters, are presented to the ANN for training and testing purposes. This parameterized approach has been used successfully by a number of researchers [14], [18]. However, its success depends on the proper selection of the features based on the experience of the EEGer. Weber et al. [18] compared the parameterized approach with the approach using raw EEG data and reported that an ANN employing parameterized input performed better than one using raw EEG data. In the second case, the raw EEG signal is presented to the ANN after proper scaling and windowing [23], [24]. This approach does not need a precise definition of the spike morphology. Spike detection using raw EEG data has the potential advantage of avoiding false classifications arising from data omission in EEG parameterization [16], [24].

In this paper, a three-stage ANN-based spike detection system is presented. The implementations of the system using three different types of ANN are described and their classification performances are compared. The classification in the first stage, with six extracted features, is realized by using two discrete perceptrons, one of which captures definite non-ETs while the second captures definite ETs. In other words, the possible ETs and possible non-ETs are separated from definite ETs and definite non-ETs at this stage. For the second stage of classification, with raw EEG, three different ANNs, i.e., a backpropagation multilayer perceptron (MLP), a radial basis function network (RBFN) trained by the hybrid method [25], and a radial basis support vector machine (RB-SVM) [26], [27], are constructed to function as a postclassifier. The postclassifier aims to separate possible ETs and possible non-ETs from each other.
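The three-stage flow just described can be sketched at a high level as follows; the stage functions passed in (`pre_classify`, `post_classify`, `combine_channels`) are hypothetical placeholders standing in for the perceptron pair, the nonlinear postclassifier, and the multichannel integration, not the authors' implementation.

```python
# Illustrative skeleton of the three-stage detection procedure described above.
# The three stage functions are injected as placeholders, not the paper's code.

def detect_events(channel_peaks, pre_classify, post_classify, combine_channels):
    """channel_peaks: {channel: [peak, ...]} -> list of multichannel events."""
    per_channel_ets = {}
    for channel, peaks in channel_peaks.items():
        ets = []
        for peak in peaks:
            label = pre_classify(peak)           # stage 1: discrete perceptron pair
            if label == "definite_et":
                ets.append(peak)
            elif label == "possible":            # stage 2: nonlinear postclassifier
                if post_classify(peak):
                    ets.append(peak)
            # "definite_non_et" peaks are discarded at stage 1
        per_channel_ets[channel] = ets
    return combine_channels(per_channel_ets)     # stage 3: multichannel integration
```

Only the "possible" peaks ever reach the (more expensive) postclassifier, which is the source of the computation-time savings claimed for the pre-classification stage.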
A convention adopted throughout this paper is to use ET to refer to epileptiform activity on a single channel, and epileptiform event (EV) to refer to activity seen simultaneously across two or more channels. The classification performance of the system is determined by measuring the sensitivity and selectivity. Sensitivity is the ratio of true positives to the total number of EVs detected by the EEGer. Selectivity is the ratio of true positives to the total number of EVs detected by our system. Events are called "true positive" when both our system and the EEGer detect them as EVs [28]. All performances calculated throughout this paper are determined after the integration of multichannel information in terms of EVs. Among the three alternatives, the RB-SVM is found to be the best in terms of the overall detection performance, resulting in an average sensitivity of 89.1%, an average selectivity of 85.9%, and an average false detection rate (per hour) of 7.5 for the test set.

II. EEG DATA

A. Data Acquisition

The EEG data used in this study were acquired from 29 epileptic patients who had been under evaluation and treatment in the Neurology Department of Dokuz Eylül University Hospital, İzmir, Turkey. Nineteen of these EEG records were used in training, while the remaining 10 were used in testing. The data were obtained from a clinical EEG monitoring system, which stores continuous EEG data on its hard disk. The EEG data were acquired with Ag/AgCl disk electrodes placed using the 10–20
international electrode placement system. The recordings were obtained from 19 channels with a 256-Hz sampling frequency and band-pass filtered between 1 and 70 Hz. The data were then stored on both a hard disk and an optical disk.

B. Preparation of Training and Testing EEG Data

The EEG records of the 29 epileptic patients were obtained during restful wakefulness; two of them contained generalized epileptiform activity, while the remaining records contained focal epileptiform activity. All EEG records had been previously reviewed independently by two EEGers. The EEGers had labeled the ET candidates as single-channel epileptiform activity throughout the 19 channels. The ET candidates for which there was no agreement between the two EEGers were treated as background EEG. The multichannel EVs were also determined by the same EEGers with the same consensus. The total length of the EEG records was 11 hours and 6 min (average 22.1 min), and the ages of the patients varied from 2 to 69 years (average 28 years). Nineteen of the 29 records were chosen for the training procedures. These records had an average length of 22.7 min and a total length of 7 hours and 18 min; the patients had an average age of 29 years. The EEGers determined 216 EVs in the training set. The remaining 10 EEG records were used to test the performance of the trained system. These records had an average length of 20.8 min and a total length of 3 hours and 48 min. Ninety-three EVs were determined for the testing procedure. The patients of this group had an average age of 26 years. General features of the test records are as follows.
• Records 1 and 7: a few ETs.
• Records 2 and 10: ETs, a few muscle artifacts, electrode pops.
• Record 3: ETs, muscle artifacts, electrode pops, movement artifact, 50-Hz artifact.
• Record 4: ETs, muscle artifacts, a few eye blinks.
• Records 5 and 8: ETs, muscle artifacts.
• Records 6 and 9: ETs, a few muscle artifacts.

III. PEAK DETECTION AND FEATURE EXTRACTION

A. Peak Detection

Our approach is based on the fact that an ET appears as a peak in the EEG record. Therefore, the first step is the extraction of the peaks from the record. To detect peaks, we first took the average of the whole signal and subtracted it from the original signal; this is done not only for peak detection but also for the rest of the procedure. Then we calculated the time-derivative of the signal $x(n)$ and found its zero-crossings. For a discrete signal, this can be achieved by searching for the inflection points with $g(n) = x(n+1) - x(n)$, based on the following criteria.
• If $g(n-1) > 0$ and $g(n) < 0$, then this is a positive peak; we store it with its index.
• If $g(n-1) < 0$ and $g(n) > 0$, then this is a negative peak; we store it with its index.
• Otherwise, this is not a peak; we discard it.
The above procedure is indeed a numerical differentiation technique such that $g(n)$ represents the first-order forward approximation to the derivative of $x$ at $n$.
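A minimal sketch of this peak-detection step, assuming a plain Python list as the signal and the first-order forward difference described above:

```python
def find_peaks(x):
    """Detect positive/negative peaks via sign changes of the first-order
    forward difference g(n) = x(n+1) - x(n), after removing the signal mean.
    Returns (positive_peak_indices, negative_peak_indices).
    Illustrative sketch of the procedure described in the text."""
    mean = sum(x) / len(x)
    x = [v - mean for v in x]                   # subtract the average first
    g = [x[n + 1] - x[n] for n in range(len(x) - 1)]
    pos, neg = [], []
    for n in range(1, len(g)):
        if g[n - 1] > 0 and g[n] < 0:           # slope + -> - : positive peak
            pos.append(n)
        elif g[n - 1] < 0 and g[n] > 0:         # slope - -> + : negative peak
            neg.append(n)
        # otherwise: not a peak, discard
    return pos, neg
```

The stored indices are reused later: the pre-classifier works on triangularized features of these peaks, while the postclassifier recalls the original samples around them.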
Fig. 1. Features of an ET used in the pre-classification stage. Two of the features that are not shown in the figure are obtained as follows: First Half Wave Slope (FHWS) = First Half Wave Amplitude (FHWA) / First Half Wave Duration (FHWD), and Second Half Wave Slope (SHWS) = Second Half Wave Amplitude (SHWA) / Second Half Wave Duration (SHWD).
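The slope definitions in the caption above reduce the six pre-classifier features to simple ratios over a triangularized peak (its two breakpoints and apex). The function below is an illustrative sketch; the argument names and the example values are assumptions, not taken from the paper.

```python
def half_wave_features(prev_amp, peak_amp, next_amp, t_prev, t_peak, t_next):
    """Compute the six pre-classifier features of a triangularized peak:
    amplitudes, durations, and slopes of the two half waves (Fig. 1).
    Amplitudes are in signal units, times in ms; an illustrative sketch."""
    fhwa = abs(peak_amp - prev_amp)        # first half wave amplitude
    shwa = abs(next_amp - peak_amp)        # second half wave amplitude
    fhwd = t_peak - t_prev                 # first half wave duration
    shwd = t_next - t_peak                 # second half wave duration
    fhws = fhwa / fhwd                     # FHWS = FHWA / FHWD
    shws = shwa / shwd                     # SHWS = SHWA / SHWD
    return fhwa, shwa, fhwd, shwd, fhws, shws
```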
B. Feature Extraction

An ET is a peak having the following distinguishing features [1].
1) There is a relatively large and smooth slope followed by a relatively large and smooth slope of opposite polarity.
2) The apex of the ET is sharp.
3) Although the two sides may be of unequal length, the duration of an ET is always between 20 and 70 ms. This duration can be defined as the sum of the first and second half wave durations (Fig. 1).
A model of an ET that possesses the properties described above is a simple triangular waveform. The horizontal side of the triangle is referred to as the base, while the other two sides are the legs. The triangularization process eliminates breakpoints between peaks, which correspond to definite non-ETs, by a simple threshold mechanism and results in a piecewise linear representation connecting the remaining successive peaks. We did not use the triangle model in all stages of the spike detection procedure; we used it only for extracting the features that are fed to the pre-classifier, as explained in Section IV-B. In the postclassification stage, on the other hand, we used the original form of the detected peaks. The features used as input to the pre-classifier are: first half wave amplitude (FHWA), second half wave amplitude (SHWA), first half wave duration (FHWD), second half wave duration (SHWD), first half wave slope (FHWS), and second half wave slope (SHWS) (Fig. 1).

IV. PRECLASSIFICATION

The pre-classification procedure is performed to eliminate definite non-ETs and also to capture definite ETs. The remaining peaks, corresponding to possible ETs and possible non-ETs, are the only inputs of the postclassifier. In this way, not only is the computation time of the entire classification procedure reduced, but the overall detection performance is also increased.

A. Artifact Reduction

Artifacts due to patient movement and eye blinks are abrupt changes in the EEG record with a rapid upstroke. They usually
Fig. 2. A discrete perceptron fed by six features: first half wave amplitude (FHWA), first half wave duration (FHWD), second half wave amplitude (SHWA), second half wave duration (SHWD), first half wave slope (FHWS), and second half wave slope (SHWS).
have large amplitudes and very short durations, but movement artifacts can also have a long duration. A few 50-Hz and 50-Hz-like artifacts, which originate from electrical interference, are observed in one of the test records. Muscle artifacts can also be observed in some portions of the records. We also observe a few eye blink artifacts in one of the test records. In order to eliminate the artifacts and very small peaks that are not ETs, the peaks detected as explained in Section III-A are first located and marked, and then passed through a simple threshold mechanism. In this mechanism, if the length of a segment that connects two adjacent peaks is shorter than those of the next and previous segments, and if this segment also has an amplitude smaller than 2 and a duration shorter than 20 ms, then we label the peaks associated with that segment. The peaks located by this procedure are removed from the set of marked peaks while the original form of the signal is preserved. Although most of the artifacts in our data set are eliminated in this way, a more efficient artifact rejection method would be needed, especially in the presence of profuse artifacts in the records.

B. Discrete Perceptron as the Preclassifier

Two discrete perceptrons, fed by the six features mentioned in Section III-B, are trained to separate definite ETs and definite non-ETs from the others (Fig. 2). The discrete perceptron learning rule [29] is used for both perceptrons, but with different desired outputs. The weight update in the discrete perceptron learning rule is defined as

$$\mathbf{w}(k+1) = \mathbf{w}(k) + \eta\,(d - y)\,\mathbf{x} \qquad (1)$$

where $\mathbf{w}$ is the connection weight vector, $y$ is the actual output, $d$ is the desired output (labeled by experts), $\eta$ is the learning rate, and $\mathbf{x}$ is the input vector consisting of the six features. The actual output is calculated by the following input–output relation of the discrete perceptron:

$$y = \operatorname{sgn}(\mathbf{w}^{T}\mathbf{x} - \theta) \qquad (2)$$

where $\theta$ is the threshold value and

$$\operatorname{sgn}(v) = \begin{cases} +1, & v \ge 0 \\ -1, & v < 0. \end{cases} \qquad (3)$$
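The learning rule in (1)–(3) can be sketched as follows; the zero initial weights, learning rate, and epoch count are illustrative assumptions, not the authors' settings.

```python
def train_perceptron(samples, desired, eta=0.5, theta=0.0, epochs=50):
    """Discrete perceptron trained with the rule
    w <- w + eta * (d - y) * x,  y = sgn(w.x - theta),
    as in (1)-(3). samples: list of feature vectors; desired: +1/-1 labels.
    A minimal sketch of the stage-1 classifier; weights start at zero."""
    w = [0.0] * len(samples[0])
    sgn = lambda v: 1 if v >= 0 else -1
    for _ in range(epochs):
        for x, d in zip(samples, desired):
            y = sgn(sum(wi * xi for wi, xi in zip(w, x)) - theta)
            if y != d:                       # update only on misclassification
                w = [wi + eta * (d - y) * xi for wi, xi in zip(w, x)]
    return w

def classify(w, x, theta=0.0):
    """Evaluate the trained perceptron on a feature vector, as in (2)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) - theta >= 0 else -1
```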
Fig. 3. The peaks are classified into three subgroups: one marker represents ETs and the other non-ETs; Group I contains definite ETs, Group II contains definite non-ETs, and Group III contains possible ETs and possible non-ETs.
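The three-way grouping shown in Fig. 3 amounts to a small decision rule over the two perceptron outputs; the group names below, and the priority given to the ET perceptron when both fire, are illustrative assumptions.

```python
def assign_group(out_non_et, out_et):
    """Map the two stage-1 perceptron outputs to the three groups of Fig. 3.
    out_non_et is +1 for definite non-ETs, out_et is +1 for definite ETs;
    a peak producing -1 at both outputs falls into the 'possible' group.
    The ET output is checked first (an assumption; the paper does not
    discuss the case of both perceptrons firing)."""
    if out_et == 1:
        return "group_I_definite_et"
    if out_non_et == 1:
        return "group_II_definite_non_et"
    return "group_III_possible"
```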
Our aim in the first stage is, by using two discrete perceptrons, to map all peaks into three distinct groups: 1) definite ETs; 2) definite non-ETs; and 3) possible ETs and possible non-ETs (see Fig. 3). One of the discrete perceptrons is trained in such a way that its output is +1 for definite non-ETs and −1 otherwise. The second discrete perceptron, on the other hand, is trained to produce +1 for definite ETs and −1 otherwise. A peak that produces −1 at the output of both discrete perceptrons is assigned to the third subgroup (i.e., possible ETs and possible non-ETs). Since the peaks assigned to the first and second groups are precisely classified in the first stage as described above, they are not involved in the second stage. The peaks belonging to the third group, however, are not classified in the first stage. Therefore, they are further processed in the second stage by a nonlinear classifier designed to discriminate only the peaks in the third group. Since we had already stored the indices of all peaks for further use of the original signal, the original values of a certain number of samples around each peak in the third group constitute the input vector to the nonlinear classifier of the second stage. Nineteen EEG records, which contain 216 EVs, are used to train the discrete perceptrons, while a different set of 10 records with 93 EVs is reserved for testing purposes. In the pre-classification stage, 80% of the peaks fall into the first and second groups. The remaining 20% of the peaks, which belong to the third group, are given as input to the postclassifier. As a result of the pre-classification stage, the average sensitivity, selectivity, and false detection rate are found to be 100%, 9.6%, and 3371.9, respectively. In this way, both the training set and the training time of the postclassifier are drastically reduced.

V. POSTCLASSIFICATION

The function of the postclassifier is to separate the peaks in the third group, i.e., possible ETs and possible non-ETs, from each other. An RBFN with a hard-limiter output unit is used as the postclassifier. The RBFN is trained by the hybrid method [25], [30]: first, the centers of the hidden units are determined in a trial-and-error fashion, as explained in Section V-B; second, the output unit is trained with the block least mean squares (LMS) method [25]. Instead of the six features used in the training of the pre-classifier, windowed raw data associated with each peak are fed to the RBFN. This postclassification procedure is also performed for an autoregressive (AR)-based RBFN, an RB-SVM, and a backpropagation MLP, and the results are compared in terms of their classification performances.
Fig. 4. Architecture of the RBFN used for postclassification.
A. Radial Basis Function Network

The architecture of the RBFN is shown in Fig. 4. An RBFN with one output neuron implements the input–output relation in (4), which is an ensemble of the nonlinear mapping realized by the hidden layer and the linear mapping realized by the output layer [25]:

$$y(\mathbf{x}) = w_0 + \sum_{i=1}^{M} w_i\,\varphi(\lVert \mathbf{x} - \mathbf{c}_i \rVert) \qquad (4)$$

where $\varphi(r) = \exp(-r^2/2)$ is the radial basis function with unity variance, $\mathbf{x}$ is the input vector, $\lVert\cdot\rVert$ denotes the Euclidean norm, the $\mathbf{c}_i$ are the centers of the RBFN, and the $w_i$'s are the entries of the linear weight vector $\mathbf{w}$. After having fixed the radial basis functions, the weights of the output layer can then be computed. For this computation, we use the method of least squares with batch processing [31]. The training set is denoted by $\{(\mathbf{x}_j, d_j)\}_{j=1}^{N}$, where $\mathbf{x}_j$ denotes the input vector and $d_j$ the desired output of the $j$th example. Let us define the following matrix and vector:

$$\boldsymbol{\Phi} = \begin{bmatrix} 1 & \varphi(\lVert \mathbf{x}_1 - \mathbf{c}_1\rVert) & \cdots & \varphi(\lVert \mathbf{x}_1 - \mathbf{c}_M\rVert) \\ \vdots & \vdots & & \vdots \\ 1 & \varphi(\lVert \mathbf{x}_N - \mathbf{c}_1\rVert) & \cdots & \varphi(\lVert \mathbf{x}_N - \mathbf{c}_M\rVert) \end{bmatrix} \qquad (5)$$

$$\mathbf{d} = [d_1, d_2, \ldots, d_N]^{T}. \qquad (6)$$

The real-valued $\boldsymbol{\Phi}$, called the interpolation matrix, is of size $N \times (M+1)$, where $N$ is the number of training examples and $M$ is the number of centers. The first column of the matrix is included to account for the bias. The desired output vector $\mathbf{d}$ is of size $N \times 1$. Once the centers are determined in the way explained in Section V-B, the linear weight vector $\mathbf{w}$ can be found by solving

$$\boldsymbol{\Phi}\mathbf{w} = \mathbf{d}. \qquad (7)$$

With $N > M + 1$, (7) represents an over-determined system of equations in that we have more equations than unknowns. To solve (7) for the weight vector $\mathbf{w}$, we utilize the block LMS
method [25]. Alternatively, the same solution vector can be found by calculating the pseudoinverse:

$$\mathbf{w} = \boldsymbol{\Phi}^{+}\mathbf{d} \qquad (8)$$

where $\boldsymbol{\Phi}^{+} = (\boldsymbol{\Phi}^{T}\boldsymbol{\Phi})^{-1}\boldsymbol{\Phi}^{T}$ is the pseudoinverse of the interpolation matrix $\boldsymbol{\Phi}$.

TABLE I EFFECT OF THE NUMBER OF CENTERS, ASSIGNED RANDOMLY, ON THE PERFORMANCE OF THE RBFN

TABLE II EFFECT OF THE NUMBER OF CENTERS, ASSIGNED TO SOME EPILEPTIFORM TRANSIENT TYPES, ON THE PERFORMANCE OF THE RBFN

TABLE III EFFECT OF THE INPUT SIZE ON THE OVERALL CLASSIFICATION PERFORMANCE FOR THE RBFN

Fig. 5. A sketch of an ET extracted from the original signal and used as the input to the RBFN using different numbers of data points from the right of the center.
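Equations (5)–(8) translate directly into a few lines of NumPy; the Gaussian basis with unity variance and `numpy.linalg.pinv` for the pseudoinverse form a sketch of the batch least-squares solution (the paper reaches the same weights with block LMS):

```python
import numpy as np

def rbf_design_matrix(X, centers):
    """Interpolation matrix Phi of size N x (M+1) as in (5): a leading
    all-ones column for the bias, then unity-variance Gaussian basis values."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    phi = np.exp(-0.5 * d2)                       # exp(-||x - c||^2 / 2)
    return np.hstack([np.ones((X.shape[0], 1)), phi])

def rbf_weights(X, d, centers):
    """Solve Phi w = d in the least-squares sense via the pseudoinverse (8)."""
    Phi = rbf_design_matrix(np.asarray(X, float), np.asarray(centers, float))
    return np.linalg.pinv(Phi) @ np.asarray(d, float)
```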
B. Implementation and Training of the RBFN

After the pre-classification stage, a new data set is constructed by extracting new features of the ET candidates from the original signal by the procedure explained below. In this procedure, the time indices of the data have been stored. The procedure is as follows.
• The main peak of the ET candidate is taken as the center of the window.
• Ten data points from the left and ten data points from the right of the center are recalled from the original data by using the previously stored time indices.
• This procedure is repeated for 20, 25, 30, 40, 50, and 60 data points from the right of the center, while the ten data points from the left are kept fixed (Fig. 5).
Using the above procedure, the training set is prepared without any modification of the ET forms by extracting the ETs from the original signal over the determined intervals. The determination of the centers is the crucial step in the use of an RBFN. During the determination of the centers, we first randomly assign the centers among the input samples, which have been fixed to 41 data points. After the assignment of the centers, the learning procedure is repeated several times until we obtain the best and worst accuracy values for 6, 8, 10, 12, 14, 16, and 20 centers by using the batch-mode LMS learning rule (Table I). Second, we assign the centers among the input samples that correspond mostly to ETs, in a random manner. In other words, most centers are first chosen among different ET types, which correspond to +1 desired outputs in the training set. Then, the remaining centers are randomly chosen among the input samples, and these two center vectors are compounded in a random manner. In this way, the centers are determined. The training procedure is again repeated several times until we get the best and worst accuracy values for 6, 8, 10, 12, 14, 16, and 20 centers (Table II). The number of center vectors also corresponds to the number of neurons in the second layer of the RBFN. We also note that these implementations are repeated several times for the test set, including 10 different records, and we get the same results as presented in Table II. This means that the implementations reported in this training procedure are reproducible for different tasks. The results presented in Table II imply that 14 is the optimum number of centers. Ten of these centers are assigned to some ET types, while the remaining ones are assigned to other waveforms randomly chosen among the input samples, as explained before. When we compare the results in Tables I and II, we see that the second procedure for center determination is more effective than the first one. Hence, we use the second procedure in the testing stage. To see the effect of the input window size on the performance of the RBFN, the implementation is repeated for different input window sizes by using the determined centers (Table III). The results presented in Table III imply that an input size of 41 data points, which corresponds to a time duration of 160 ms, is the optimal choice. Spike morphology studies also show that the ET duration is, on average, between 20 and 70 ms [1]. Positioning such an ET waveform in a 160-ms window provides all the information to the RBFN. It is observed that the window size (which is equal to the input dimension of the RBFN), the number of centers, and the way of determining the centers all affect the performance of the RBFN as the postclassifier. After the centers and the input window size are determined, the optimal weights are calculated by using
TABLE IV SENSITIVITIES AND SELECTIVITIES OF THE SYSTEM FOR EACH STAGE IN TERMS OF EVS
TABLE V FALSE DETECTION RATES (PER HOUR) OF THE SYSTEM FOR EACH STAGE IN TERMS OF EVS
the block LMS rule [25], in which ETs and non-ETs are represented by +1 and −1, respectively, for both the training and testing phases. The detection performance of the trained system is tested by using 10 epileptic records for an input window size of 41 data points, as shown in Table IV. Testing the RBFN as a postclassifier in our system shows an average sensitivity of 87.7%, an average selectivity of 84.4%, and an average false detection rate (per hour) of 9.3, determined in terms of EVs (Tables IV and V). The computing time for a multichannel record 42.8 min long is about 2 min for the RBFN.

C. AR-Based Implementation of the RBFN

We also implement an AR process for feature extraction in the postclassification procedure. The AR model is very often used due to its simplicity and is given as

$$x(n) = \sum_{k=1}^{p} a_k\, x(n-k) + e(n) \qquad (9)$$

where $x(n)$ is the sampled wave signal, the $a_k$ are the coefficients of the AR model, $p$ is the order of the AR model, and $e(n)$ is white noise having zero mean and whose probability density function is assumed to be nearly Gaussian [32]. After the pre-classification stage, we calculate the AR coefficients for the 41 data points of each peak, with the model order determined by using the Akaike criterion [32]. The number of centers is chosen as 14, as in the RBFN. The AR coefficients of the centers are also calculated for the same order. After these coefficients are assigned to the RBFN, optimum weights are calculated by using the block LMS rule. All the remaining implementations are the same as in the RBFN. After training the network, test data are used to measure its overall performance. The performance is determined after the integration of multichannel information in terms of EVs. The AR-based RBFN achieves 87.5% average sensitivity, 84.0% average selectivity, and an average false detection rate of 10.1, which is approximately the same as the performance of the RBFN (Tables IV and V). The computing time for a multichannel record of 42.8 min duration is about 2 min 40 s for the AR-based approach.

D. Support Vector Machines

The support vector machine (SVM) is a relatively new approach for solving supervised classification problems and is very useful due to its generalization ability. In essence, the approach maximizes the margin between the training data and the decision boundary, which can be cast as a quadratic optimization problem. The subsets of the patterns that are closest to the decision boundary are called the support vectors.
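Before the formal derivation, the margin/penalty tradeoff can be illustrated with a tiny soft-margin linear SVM trained by subgradient descent on the primal objective. This is a conceptual sketch only (the paper solves the equivalent dual quadratic program with a kernel); the learning rate and epoch count are arbitrary assumptions.

```python
def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=200):
    """Soft-margin linear SVM trained by full-batch subgradient descent on
    0.5*||w||^2 + C * sum_i max(0, 1 - y_i * (w.x_i + b)).
    Larger C penalizes margin violations more heavily."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        gw = list(w)                          # gradient of the 0.5*||w||^2 term
        gb = 0.0
        for x, d in zip(X, y):
            if d * (sum(wi * xi for wi, xi in zip(w, x)) + b) < 1:
                gw = [gi - C * d * xi for gi, xi in zip(gw, x)]
                gb -= C * d                   # hinge-loss subgradient
        w = [wi - lr * gi for wi, gi in zip(w, gw)]
        b -= lr * gb
    return w, b
```

The points whose margins end up at or inside the boundary are the support vectors; all others contribute nothing to the final decision function.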
For a linearly separable binary classification problem, the construction of a hyperplane such that the margin between the hyperplane and the nearest point is maximized can be posed as the following quadratic optimization problem [25]:

$$\min_{\mathbf{w},b}\ \tfrac{1}{2}\lVert \mathbf{w} \rVert^{2} \qquad (10)$$

subject to

$$d_i(\mathbf{w}^{T}\mathbf{x}_i + b) \ge 1, \qquad i = 1,\ldots,N \qquad (11)$$

where $d_i \in \{-1,+1\}$ stands for the $i$th desired output and $\mathbf{x}_i$ stands for the $i$th input sample of the training data set $\{(\mathbf{x}_i, d_i)\}_{i=1}^{N}$. (10) forces a rescaling on $\mathbf{w}$ so that the point closest to the hyperplane has a distance of $1/\lVert\mathbf{w}\rVert$ [25]. Maximizing the margin thus corresponds to minimizing the Euclidean norm of the weight vector. Often, in practice, a separating hyperplane does not exist. Hence, the constraint (11) is relaxed by introducing slack variables $\xi_i \ge 0$, $i = 1,\ldots,N$. The optimization problem now becomes as follows (for a user-defined positive finite constant $C$):

$$\min_{\mathbf{w},b,\boldsymbol{\xi}}\ \tfrac{1}{2}\lVert\mathbf{w}\rVert^{2} + C\sum_{i=1}^{N}\xi_i \qquad (12)$$

subject to

$$d_i(\mathbf{w}^{T}\mathbf{x}_i + b) \ge 1 - \xi_i \qquad (13)$$

with

$$\xi_i \ge 0, \qquad i = 1,\ldots,N. \qquad (14)$$

The $C$ parameter controls the tradeoff between the complexity of the machine and the number of nonseparable points; it may, therefore, be viewed as a form of "regularization" parameter. By introducing the Lagrange multipliers $\alpha_i$ and using the Karush–Kuhn–Tucker theorem of optimization theory, we can pose the equivalent dual optimization problem [33]:

$$\max_{\boldsymbol{\alpha}}\ Q(\boldsymbol{\alpha}) = \sum_{i=1}^{N}\alpha_i - \tfrac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_j d_i d_j\, \mathbf{x}_i^{T}\mathbf{x}_j \qquad (15)$$

subject to

$$\sum_{i=1}^{N}\alpha_i d_i = 0 \qquad (16)$$

$$0 \le \alpha_i \le C, \qquad i = 1,\ldots,N. \qquad (17)$$

The solution is given by

$$\mathbf{w} = \sum_{i=1}^{N}\alpha_i d_i \mathbf{x}_i. \qquad (18)$$

The nonzero $\alpha_i$'s correspond to the so-called support vectors that help to define the boundary between the two classes. All the other training examples, with corresponding zero $\alpha_i$ values, are rendered irrelevant, and they automatically satisfy the constraint (13) with $\xi_i = 0$. The hyperplane decision function can be written for the vector $\mathbf{x}$ as follows:

$$f(\mathbf{x}) = \operatorname{sgn}\!\left(\sum_{i=1}^{N}\alpha_i d_i\, \mathbf{x}_i^{T}\mathbf{x} + b\right). \qquad (19)$$

To allow for more general decision surfaces, the inner product $\mathbf{x}_i^{T}\mathbf{x}_j$ can simply be replaced by a suitable kernel function $K(\mathbf{x}_i,\mathbf{x}_j)$. Hence, the objective function to be maximized can be written as

$$Q(\boldsymbol{\alpha}) = \sum_{i=1}^{N}\alpha_i - \tfrac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_j d_i d_j\, K(\mathbf{x}_i,\mathbf{x}_j) \qquad (20)$$

with the constraints (16) and (17) unchanged. The decision function for the vector $\mathbf{x}$ then becomes

$$f(\mathbf{x}) = \operatorname{sgn}\!\left(\sum_{i=1}^{N}\alpha_i d_i\, K(\mathbf{x}_i,\mathbf{x}) + b\right). \qquad (21)$$

The $\alpha_i$'s are determined from (16), (17), and (20). The bias parameter $b$ is determined from (19) by using two arbitrary support vectors from known but opposite classes. By replacing the inner products with kernel functions, the input data are mapped into a higher dimensional space; it is then in this higher dimensional space that a separating hyperplane is constructed to maximize the margin. In the lower dimensional data space, this hyperplane becomes a nonlinear separating function.

E. Implementation of the SVM

In this implementation, we construct an RB-SVM by using an RBF as the kernel function

$$K(\mathbf{x},\mathbf{x}_i) = \exp\!\left(-\frac{\lVert\mathbf{x}-\mathbf{x}_i\rVert^{2}}{2\sigma^{2}}\right) \qquad (22)$$

where $\sigma$ is a user-specified kernel width.

Fig. 6. Two examples of RB-SVM implementations showing the effect of C, a user-specified positive number, on the separation of ETs: the solid line represents the decision boundary, the dashed lines represent the maximized margins, (+) represents non-ET and the other marker represents ET. The circled plus signs, and the squares with a white circle inside, represent the support vectors. (a) C = 100. (b) C = 0.1.

First of all, in order to visualize the problem and to see the effect of the $C$ parameter, we restrict ourselves to two features that contain the most important information about the class, namely the duration and the amplitude. As can be seen from Fig. 6, the separation of a group of ETs from a group of non-ETs is not trivial. In the RB-SVM classification, support vectors are represented by circles as shown in Fig. 6, which
shows the results of an RB-SVM classification for various degrees of misclassification tolerance. Fig. 6 thus visualizes the effect of the tolerance of misclassification errors on the topology of the classifier boundary. One of these settings of C appears to offer good solutions. The same data used in the implementation of the RBFN are then used for demonstrating the performance of the RB-SVM. The data set contains 41 data points for each peak selected after the pre-classification stage. ET and non-ET activities are represented by +1 and −1, respectively, for both training and testing procedures. We have trained the machine for different parameter values until finding the best result in the testing procedure. The number of support vectors is 18, which corresponds to 3% of the training data. Ten epileptic records, with which the RBFN is also tested, are used for the measurement of the performance of the network. The overall classification performance of the RB-SVM after the integration of multichannel information is calculated by measuring its sensitivity and selectivity in terms of EVs. Testing the RB-SVM as the postclassifier shows an average sensitivity of 89.1%, an average selectivity of 85.9%, and an average false detection rate (per hour) of 7.5 (Tables IV and V). The computing time for a multichannel record 42.8 min long is about 2 min 50 s for the RB-SVM.

F. Multilayer Perceptron and Its Implementation

An MLP network, one of the most commonly used neural networks, is used for the postclassification and trained by the standard backpropagation algorithm [25]. It is designed to have three layers: an input layer, a hidden layer, and an output layer. The neurons in the input layer act as buffers distributing the input signal to the neurons in the hidden layer.
Neuron $j$ in the hidden layer first obtains the weighted sum of the input signal through the connection weights $w_{ji}$ and then computes its output $y_j$ as a function of this sum:

$y_j = f(\mathrm{net}_j), \qquad \mathrm{net}_j = \sum_i w_{ji} x_i \qquad (23)$

Here, $f(\cdot)$ is the bipolar sigmoid function. The backpropagation algorithm used for training the MLP network is a gradient-descent algorithm. When a momentum term is added to the algorithm, the change $\Delta w_{ji}$ in the connection weight at the $k$th step becomes

$\Delta w_{ji}(k) = \eta\,\delta_j x_i + \alpha\,\Delta w_{ji}(k-1) \qquad (24)$

where $\eta$ is the learning rate, $\alpha$ is the momentum coefficient, and $\delta_j$ is a factor defined for output neurons as

$\delta_j = (d_j - y_j)\, f'(\mathrm{net}_j) \qquad (25)$

where $d_j$ is the desired output for neuron $j$. For hidden neurons,

$\delta_j = f'(\mathrm{net}_j) \sum_m \delta_m w_{mj} \qquad (26)$

The $\delta$ factors calculated for the output neurons are substituted into (26) to obtain the $\delta$ factors for the hidden neurons. Thus, beginning with the output layer, the $\delta$ factors are computed in a backward manner, and all connection weights $w_{ji}$ are updated according to (24).
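To make the update rules concrete, the following is a minimal NumPy sketch of one-hidden-layer backpropagation with momentum, following (23)-(26). It is illustrative only: the layer sizes, learning rate, momentum coefficient, and the toy XOR-style data are our assumptions, not the settings used for the EEG data.

```python
import numpy as np

def bipolar_sigmoid(x):
    # f(x) = 2 / (1 + exp(-x)) - 1, with range (-1, 1)
    return 2.0 / (1.0 + np.exp(-x)) - 1.0

def bipolar_sigmoid_deriv(y):
    # Derivative expressed via the output y = f(x): f'(x) = 0.5 * (1 - y^2)
    return 0.5 * (1.0 - y ** 2)

def train_mlp(X, d, n_hidden=8, eta=0.1, alpha=0.8, epochs=3000, seed=0):
    """One-hidden-layer MLP trained by online backpropagation with momentum,
    i.e. delta_w(k) = eta * delta * x + alpha * delta_w(k-1), as in (24)."""
    rng = np.random.default_rng(seed)
    n_in = X.shape[1]
    W1 = rng.normal(0.0, 0.5, (n_hidden, n_in + 1))  # hidden weights (+bias)
    W2 = rng.normal(0.0, 0.5, (1, n_hidden + 1))     # output weights (+bias)
    dW1_prev = np.zeros_like(W1)
    dW2_prev = np.zeros_like(W2)
    for _ in range(epochs):
        for x, t in zip(X, d):
            xb = np.append(x, 1.0)                   # input with bias term
            h = bipolar_sigmoid(W1 @ xb)             # hidden outputs, (23)
            hb = np.append(h, 1.0)
            y = bipolar_sigmoid(W2 @ hb)             # network output
            # delta factor for the output neuron, (25)
            delta_out = (t - y) * bipolar_sigmoid_deriv(y)
            # delta factors for the hidden neurons, (26)
            delta_hid = bipolar_sigmoid_deriv(h) * (W2[:, :n_hidden].T @ delta_out)
            # weight updates with momentum, (24)
            dW2 = eta * np.outer(delta_out, hb) + alpha * dW2_prev
            dW1 = eta * np.outer(delta_hid, xb) + alpha * dW1_prev
            W2 += dW2
            W1 += dW1
            dW2_prev, dW1_prev = dW2, dW1
    return W1, W2

def predict(W1, W2, X):
    out = []
    for x in X:
        h = bipolar_sigmoid(W1 @ np.append(x, 1.0))
        out.append(bipolar_sigmoid(W2 @ np.append(h, 1.0))[0])
    return np.array(out)
```

The momentum term $\alpha\,\Delta w_{ji}(k-1)$ of (24) appears as `alpha * dW_prev`: each step retains a fraction of the previous update, smoothing the gradient-descent trajectory.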
We replace the RBFN with the MLP in the three-stage procedure without any further change. The MLP is trained several times with different numbers of hidden neurons until the best performance is reached. We first use the same input window size as with the RBFN, i.e., 41 data points. We also try windows with 31, 36, and 51 data points. The input window size of 41 data points is found to be the optimum. The effect of the number of hidden neurons on the performance is then evaluated. A performance improvement is observed when the number of hidden neurons is increased to 8; however, no further improvement is observed for higher numbers of hidden neurons. Thus, the number of hidden neurons is held constant at 8 in both the training and testing phases. Testing the MLP network with the same data set used for the RBFN shows an average sensitivity of 87.4%, an average selectivity of 84.0%, and an average false detection rate (per hour) of 10.2 (Tables IV and V). The computing time for a multichannel record 42.8 min long is about 3 min for the MLP. In some other studies [7], ANNs have been widely used for detecting EVs directly, using just a few features without pre-classification. To make a comparison, we also use the MLP without the pre-classification stage to detect EVs, using the six features (FHWA, FHWD, SHWA, SHWD, FHWS, SHWS) as input. The resultant overall performance shows an average sensitivity of 70.9%, an average selectivity of 55.4%, and an average false detection rate (per hour) of 196.1 (Tables IV and V). The computing time for a multichannel record 42.8 min long in this case is about 20 min.

G. Integration of Multichannel Information

In clinical practice, the EEGers also use spatial information in the process of identifying an EV. In a similar manner, the final stage of our spike detection system combines the outputs of the postclassifier in such a way as to confirm the presence of an EV across two or more channels of the EEG.
ETs in different channels generally have similar patterns, but their peaks may not appear at exactly the same time, i.e., a time delay could occur between two peaks. Therefore, the localization of the activity should also be taken into account in the integration of multichannel information. After the postclassification stage, 19-dimensional vectors are formed for integrating multichannel information. With each ET candidate in any channel, we associate a vector $v = [v_1\; v_2\; \cdots\; v_{19}]^T$. So, the total number of vectors is equal to the number of ET candidates in all channels. The set of vectors associated with the ET candidates is constructed by the following algorithm.

Step 1) Order the ET candidates in all channels by indexing them with $n = 1, \ldots, N$.
Step 2) Set $n = 1$.
Step 3) Choose the ET candidate indexed by $n$.
Step 4) If the chosen ET candidate is in the $k$th channel, then set $v_k = 1$.
Step 5) Take the time index (in the original EEG signal) corresponding to the peak of the chosen ET candidate as the center of a time window.
Step 6) Specify the window as taking 25 data points (100 ms) to the left and 25 data points (100 ms) to the right of that center.
Step 7) Set $v_j = 1$ if an ET candidate exists in channel $j$ within the window (for all $j$ with $j \neq k$). Set $v_j = 0$ otherwise.
Step 8) Stop if $n = N$. Otherwise, increase $n$ by 1 and go to Step 4).

Each vector $v$ obtained in this way reflects the ET activities in all channels related to the ET candidate considered at Step 4). After integrating multichannel information by forming the vectors $v$, a simple threshold method is used for determining the presence of an EV:

$E = \mathrm{sgn}\!\left(\sum_{j=1}^{19} v_j - \theta\right) \qquad (27)$

where $\theta$ is the chosen threshold and $\mathrm{sgn}(\cdot)$ is as defined in (3). $E = 1$ indicates the presence of an EV, whereas the absence of an EV is represented by $E = -1$. This integration procedure is repeated for all detected ETs. As an example, assume that our system detects two ETs, on Channel 3 and Channel 5, in the same window. Then the corresponding vector will be constructed as $v = [0\;0\;1\;0\;1\;0\;\cdots\;0]^T$, so $E = 1$ confirms the presence of an EV in that part of the EEG.

VI. RESULTS

Each implementation of the proposed detection system was evaluated using 19-channel clinical EEG records of 29 epileptic subjects. Ten of them were used for testing purposes; the rest were used for training. The proposed system was developed using MATLAB 6.0, and the tests were performed on a Pentium Celeron 400-MHz PC. The detection procedure was performed off-line on data stored on hard disk. In the evaluation process, the false detection rate per hour was also calculated as a measure of the performance of the system (Table V). By definition, the false detection rate is the number of false detections per hour. It is an important measure of the performance of a detection system, as it indicates the usefulness of the system in routine clinical applications. In addition, "the measured false detection rate per hour can be used to place the reported performance of the system into context when considering the length of EEG records used in the test sets" [21].
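The multichannel integration algorithm (Steps 1-8) together with the threshold rule (27) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function and variable names are ours, the 25-sample/100-ms window follows the text, and the default of two co-occurring channels follows the worked example rather than any exact threshold setting. The second helper simply encodes the per-hour definition of the false detection rate given above.

```python
import numpy as np

def integrate_multichannel(candidates, n_channels=19, half_window=25, threshold=2):
    """Sketch of the multichannel integration stage.

    `candidates` is a list of (channel, peak_sample) pairs for the ET
    candidates that survived postclassification. For each candidate, a
    19-dimensional vector v is built over a +/-25-sample (+/-100 ms) window
    centred on its peak (Steps 4-7), and the threshold rule (27) decides
    whether an epileptiform event (EV) is present: +1 = EV, -1 = no EV.
    """
    decisions = []
    for ch, peak in candidates:                          # Steps 2-3: visit candidates in order
        v = np.zeros(n_channels)
        v[ch] = 1                                        # Step 4: the candidate's own channel
        lo, hi = peak - half_window, peak + half_window  # Steps 5-6: centre the time window
        for other_ch, other_peak in candidates:          # Step 7: mark co-occurring channels
            if other_ch != ch and lo <= other_peak <= hi:
                v[other_ch] = 1
        # (27): confirm an EV when at least `threshold` channels agree
        decisions.append(1 if v.sum() >= threshold else -1)
    return decisions

def false_detection_rate(n_false_detections, record_minutes):
    """False detections per hour, as defined in the text."""
    return n_false_detections / (record_minutes / 60.0)
```

For example, ETs detected on Channels 3 and 5 within the same window yield a vector $v$ with two nonzero entries, so the rule confirms an EV, matching the worked example in the text.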
Table IV gives the sensitivity and selectivity values of the system at each stage for each patient with the different ANN models. The false detection rate (per hour) of the system is given in Table V. At each stage, the sensitivity, selectivity, and false detection rate are indicated for each EEG record separately. It can be seen that the pre-classifier stage resulted in the highest sensitivity, 100%, for all EEG records. On the other hand, it also resulted in the lowest selectivity, 9.6%, and a false detection rate per hour of 3371.9. This shows that, in the pre-classification stage, all epileptic activities were detected, but with very low selectivity and an excessively high false detection rate. Thus, the pre-classification procedure eliminated definite non-ETs and captured definite ETs. At the output of the postclassification stage, multichannel information was integrated and the performances were determined with respect to the multichannel EVs. The RB-SVM gave the
best results in terms of sensitivity, selectivity, and false detection rate. At this stage, the selectivity was increased to 85.9% for the RB-SVM; the sensitivity, however, was reduced to 89.1%. The false detection rate was also reduced, to 7.5. The RBFN, the AR-based RBFN, and the MLP were found to display similar performances in terms of sensitivity, selectivity, and false detection rate per hour (Tables IV and V). Testing the RBFN as the postclassifier in the system showed an average sensitivity of 87.7%, an average selectivity of 84.4%, and an average false detection rate (per hour) of 9.3. The AR-based RBFN resulted in 87.5% average sensitivity, 84.0% average selectivity, and an average false detection rate of 10.1. Similarly, the MLP network showed an average sensitivity of 87.4%, an average selectivity of 84.0%, and an average false detection rate (per hour) of 10.2. As can be seen from Tables IV and V, the MLP with no pre-classification gave an average sensitivity of 70.9%, an average selectivity of 55.4%, and an average false detection rate per hour of 196.1. These results are much lower than those of the other postclassifiers.

VII. DISCUSSION

In this paper, we introduce a three-stage procedure based on ANNs for the automatic detection of EVs in a multichannel EEG signal. In the first stage, a pre-classification unit trained by the well-known discrete perceptron learning rule successfully separates possible non-ETs and possible ETs from definite ETs and definite non-ETs. Thus, the pre-classification not only reduces the computation time but also increases the overall detection performance of the system. In the second stage, the possible ETs and possible non-ETs are separated from each other by an ANN that functions as a nonlinear postclassifier.
Three different neural networks, i.e., a backpropagation multilayer perceptron and two radial basis function networks trained by a hybrid method and a support vector method, respectively, are used as the postclassifier and compared in terms of their classification performances. In the third stage, multichannel information is integrated for identifying EVs. In this manner, the final stage of our spike detection system combines the outcomes of the postclassifier to make a decision about the presence of an EV. The overall performance of the system is determined with respect to the EVs confirmed after the integration of multichannel information. As can be seen from Tables IV and V, visual evaluation by two EEGers of 19-channel EEG records of 10 epileptic patients shows that the best performance is obtained with the RB-SVM, which provides an average sensitivity of 89.1%, an average selectivity of 85.9%, and an average false detection rate (per hour) of 7.5. Previous studies have demonstrated that ANNs can be successfully trained to recognize EVs [14]-[22]. ANN-based detection systems generally follow two main approaches, differing essentially in their input representations. In the first approach, preselected parameters are computed from the EEG records and fed into an ANN as the input. In the second approach, the raw EEG data are used directly as the input to the ANN. While the parameterized input approach has the advantage of using fewer input features, it requires the precise definition of the input features to be selected for the detection of spikes; in essence, an expert is needed to define or select these parameters. Consequently, this approach does not make full use of the power
of an ANN to find the best features for optimum detection. Instead, the ANN is forced to use the preselected features, as is done in traditional rule-based systems. Nevertheless, the parameterized approach has been used by a number of researchers with fairly successful results [14], [18]. The detection of spikes using raw EEG data, on the other hand, has the potential advantage of avoiding the false classifications that could arise from data loss in the parameterization of the EEG data [16], [24]. In this paper, parameterized and raw input data are used in the pre-classification and postclassification stages, respectively; thus, the present system combines the advantages of the two methods. Webber et al. [18] tested their system on parameterized EEG records obtained from 10 patients and reported satisfactory sensitivity and selectivity values (both 74%) using mimetic and ANN methods. When using raw EEG instead of parameters, they obtained low sensitivity and selectivity values (both 46%). Özdamar et al. [34] reported similarly good results for sensitivity (90%), but relatively low selectivity (69%). Dingle et al. [9] gave very good results for the false detection rate per hour (0) and selectivity (100%), although the sensitivity was relatively low (58%). James et al. [21] also reported very good results for selectivity (82%) and false detection rate per hour (7), but a relatively low sensitivity (55%). Comparing our system with other detection systems in the literature is difficult because of the variety of network types, architectures, and data sources (e.g., channel numbers, display montages, degrees of artifact presence, recording type, status of subject). For instance, our data contain muscle artifacts and a few other types of artifacts, such as electrode pops, 50-Hz artifacts, eye-blink artifacts, and movement artifacts.
However, the data used in other studies may contain fewer or more artifacts than ours, and the artifact types also change according to the type of record (e.g., awake or sleep EEG); for these reasons, a direct comparison is difficult. Although most of the specified artifacts in our data set are eliminated as mentioned in Section IV-A, a more efficient artifact rejection procedure would be needed, especially in the presence of profuse artifacts in the records. Nevertheless, evaluating the performance of our system shows that it achieves very good sensitivity, selectivity, and false detection rate values. In conclusion, this paper introduces a novel three-stage classification procedure based on different ANNs for the detection of EVs. The proposed approach accomplishes peak detection, feature extraction, pre-classification, postclassification, and integration of multichannel information while preserving the original form of the ETs. Comparison with other successful methods shows that the RB-SVM, which is found to be very useful due to its generalization ability in the three-stage classification procedure, achieves a significant improvement in terms of sensitivity, selectivity, and false detection rate.
ACKNOWLEDGMENT The authors would like to thank the reviewers of this manuscript for their helpful comments and suggestions.
REFERENCES

[1] E. Chatrian, L. Bergamini, M. Dondey, D. W. Klass, M. Lennox-Buchthal, and I. Petersen, "A glossary of terms most commonly used by clinical electroencephalographers," Electroenceph. Clin. Neurophysiol., vol. 37, pp. 538–548, 1974.
[2] T. Kalayci and Ö. Özdamar, "Wavelet preprocessing for automated neural network detection of EEG spikes," IEEE Eng. Med. Biol. Mag., vol. 14, no. 2, pp. 160–166, 1995.
[3] P. Y. Ktonas, "Automated spike and sharp wave (SSW) detection," in Methods of Analysis of Brain Electrical and Magnetic Signals, EEG Handbook, A. S. Gevins and A. Remond, Eds. Amsterdam, The Netherlands: Elsevier, 1987, vol. 1, pp. 211–241.
[4] D. O. Walter, H. F. Muller, and R. M. Jell, "Semiautomatic quantification of sharpness of EEG phenomenon," IEEE Trans. Biomed. Eng., vol. BME-20, pp. 53–54, 1973.
[5] P. Y. Ktonas and J. R. Smith, "Quantification of abnormal EEG spike characteristics," Comput. Biol. Med., vol. 4, pp. 157–163, 1974.
[6] J. Gotman and P. Gloor, "Automatic recognition and quantification of interictal epileptic activity in the human scalp EEG," Electroenceph. Clin. Neurophysiol., vol. 41, pp. 513–529, 1976.
[7] G. Hellmann, "Multifold features determine linear equation for automatic spike detection applying neural network in interictal ECoG," Clin. Neurophysiol., vol. 110, pp. 887–894, 1999.
[8] C. W. Ko, Y. D. Lin, H. W. Chang, and G. J. Jan, "An EEG spike detection algorithm using ANN with multichannel correlation," in Proc. 20th Annu. Int. Conf. IEEE Engineering in Med. and Biol. Soc., vol. 20, 1998, pp. 2070–2073.
[9] A. A. Dingle, R. D. Jones, G. J. Carroll, and W. R. Fright, "A multistage system to detect epileptiform activity in the EEG," IEEE Trans. Biomed. Eng., vol. 40, no. 12, pp. 1260–1268, Dec. 1993.
[10] J. Gotman and L. Y. Wang, "State-dependent spike detection: concepts and preliminary results," Electroenceph. Clin. Neurophysiol., vol. 79, pp. 11–19, 1991.
[11] G. Pfurtscheller and G. Fischer, "A new approach to spike detection using a combination of inverse and matched filter techniques," Electroenceph. Clin. Neurophysiol., vol. 44, pp. 243–247, 1977.
[12] D. Frost, "Automatic recognition and characterization of epileptiform discharges in human EEG," J. Clin. Neurophysiol., vol. 2, no. 3, pp. 231–249, 1985.
[13] J. R. Glover, N. Raghavan, P. Y. Ktonas, and J. D. Frost, "Context-based automated detection of epileptogenic sharp transients in the EEG: elimination of false positives," IEEE Trans. Biomed. Eng., vol. 36, no. 5, pp. 519–527, May 1989.
[14] A. J. Gabor and M. Seyal, "Automated interictal EEG spike detection using artificial neural networks," Electroenceph. Clin. Neurophysiol., vol. 83, pp. 271–280, 1992.
[15] T. Shimada, T. Shiina, and Y. Saito, "Detection of characteristic waves of sleep EEG by neural network analysis," IEEE Trans. Biomed. Eng., vol. 47, no. 3, pp. 369–379, Mar. 2000.
[16] C. W. Ko and H. W. Chung, "Automatic spike detection via an artificial neural network using raw EEG data: effect of data preparation and implications in the limitations of online recognition," Clin. Neurophysiol., vol. 111, pp. 477–481, 2000.
[17] R. Eberhart, R. Dobbins, and W. R. S. Webber, "Neural network design considerations for EEG spike detection," in Proc. 15th Northeast Bioeng. Conf., Boston, MA, 1989, pp. 97–98.
[18] W. R. S. Webber, B. Litt, K. Wilson, and R. Lesser, "Practical detection of epileptiform discharges (ED's) in the EEG using an artificial neural network: a comparison of raw and parameterized data," Electroenceph. Clin. Neurophysiol., vol. 91, pp. 194–204, 1994.
[19] G. Jando, R. M. Siegel, Z. Horvath, and G. Buzsaki, "Pattern recognition of the electroencephalogram by artificial neural networks," Electroenceph. Clin. Neurophysiol., vol. 86, no. 2, pp. 100–109, 1993.
[20] N. Pradhan, P. K. Sadasivan, and G. R. Arunodaya, "Detection of seizure activity in EEG by an artificial neural network: a preliminary study," Comput. Biomed. Res., vol. 29, no. 4, pp. 303–313, 1996.
[21] C. J. James, R. D. Jones, P. J. Bones, and G. J. Carroll, "Detection of epileptiform discharges in the EEG by a hybrid system comprising mimetic, self-organized artificial neural network, and fuzzy logic stages," Clin. Neurophysiol., vol. 110, pp. 2049–2063, 1999.
[22] A. J. Gabor, "Seizure detection using a self-organizing neural network: validation and comparison with other detection strategies," Electroenceph. Clin. Neurophysiol., vol. 107, pp. 27–32, 1998.
[23] F. M. C. Besag, M. Mills, F. Wardale, C. M. Andrew, and M. D. Craggs, "The validation of a new ambulatory spike and wave monitor," Electroenceph. Clin. Neurophysiol., vol. 73, pp. 157–164, 1989.
[24] Ö. Özdamar and T. Kalayci, "Detection of spikes with artificial neural networks using raw EEG," Comput. Biomed. Res., vol. 31, pp. 122–142, 1998.
[25] S. Haykin, Neural Networks: A Comprehensive Foundation. Upper Saddle River, NJ: Prentice-Hall, 1999.
[26] V. Vapnik, S. Golowich, and A. Smola, "Support vector method for function approximation, regression estimation, and signal processing," in Advances in Neural Information Processing Systems 9. Cambridge, MA: MIT Press, 1997, pp. 281–287.
[27] B. Boser, I. Guyon, and V. Vapnik, "A training algorithm for optimal margin classifiers," in Proc. 5th Annu. Workshop Computational Learning Theory, 1992, pp. 144–152.
[28] W. R. S. Webber, B. Litt, R. P. Lesser, R. S. Fisher, and I. Bankman, "Automatic EEG spike detection: what should the computer imitate," Electroenceph. Clin. Neurophysiol., vol. 87, pp. 364–373, 1993.
[29] J. M. Zurada, Introduction to Artificial Neural Systems. Boston, MA: PWS, 1992.
[30] Z. Uykan, C. Güzeliş, M. E. Çelebi, and H. N. Koivo, "Analysis of input-output clustering for determining centers of RBFN," IEEE Trans. Neural Networks, vol. 11, no. 4, pp. 851–858, Apr. 2000.
[31] S. Haykin, Adaptive Filter Theory. Upper Saddle River, NJ: Prentice-Hall, 1996.
[32] M. Akay, Biomedical Signal Processing. San Diego, CA: Academic, 1994.
[33] D. P. Bertsekas, Nonlinear Programming. Belmont, MA: Athena Scientific, 1995.
[34] Ö. Özdamar, C. Lopez, and I. Yaylali, "Multilevel neural network system for EEG spike detection," in Computer Based Medical Systems. Los Alamitos, CA: IEEE Comput. Soc. Press, 1991, pp. 272–279.
Nurettin Acır received the B.Sc. degree in electronics engineering from Erciyes University, Kayseri, Turkey, in 1995 and the M.Sc. degree in electrical and electronics engineering from Niğde University, Niğde, Turkey, in 1998. He is now working towards the Ph.D. degree in electrical and electronics engineering at Dokuz Eylül University, İzmir, Turkey. He worked in the Neuro-Sensory Engineering Laboratory at the University of Miami, Coral Gables, FL, as a Visiting Researcher for one semester in 2003. His interest areas include intelligent systems, biomedical signal processing, artificial neural networks, linear and nonlinear systems, and adaptive filter theory.
İbrahim Öztura graduated from the Medical School of Aegean University, İzmir, Turkey, in 1987. He completed a residency in Neurology at İzmir SSK Education and Research Hospital, Department of Neurology, İzmir, Turkey, in 1993. He was with the Stanford University Sleep Disorders Clinic, Stanford, CA, in 2003, as a Visiting Fellow. He is currently an Assistant Professor with the Department of Neurology at Dokuz Eylül University, İzmir, Turkey. His research interests are neurophysiology, EEG, EMG, epilepsy, and sleep disorders.
Mehmet Kuntalp received the B.Sc. and M.Sc. degrees in electrical and electronics engineering from Bosphorus University, İstanbul, Turkey, and Dokuz Eylül University, İzmir, Turkey, in 1988 and 1992, respectively. He received the Ph.D. degree in biomedical engineering from Northwestern University, Evanston, IL, in 1998. He is currently an Assistant Professor of Electrical and Electronics Engineering at Dokuz Eylül University. His research interests are in the application of biomedical signal analysis and classification techniques to EEG and ECG signals, investigation of the role of the cerebellum in the generation of EEG/ERP signals, and telemedicine applications.
Barış Baklan graduated from the Medical School of Aegean University, İzmir, Turkey, in 1980. He finished his residency in Neurology at the Medical School of Dokuz Eylül University, İzmir, Turkey, in 1987, and he became an Associate Professor of Neurology in 1995 and a Professor of Neurology in 2001. He is currently a Professor of Neurology and director of the epilepsy clinic and the EEG, EEG-monitoring, and sleep laboratory at Dokuz Eylül University Medical School. His research interests are in clinical electrophysiology: EEG, digital EEG, EEG mapping, video-EEG monitoring, polysomnography, epileptic disorders, and sleep disorders.
Cüneyt Güzeliş received the B.Sc., M.Sc., and Ph.D. degrees in electrical engineering from İstanbul Technical University, İstanbul, Turkey, in 1981, 1984, and 1988, respectively. Between 1989 and 1991, he worked in the Department of Electrical and Computer Engineering at the University of California, Berkeley, CA, as a Visiting Researcher and Lecturer. He is now a Professor in the Electrical and Electronics Engineering Department, Dokuz Eylül University, İzmir, Turkey. His interest areas include neural networks, signal processing, and nonlinear circuits and systems.