
Neural Information Processing - Letters and Reviews

Vol.10, No.1, January 2006

LETTER

Detection of Epileptiform Activity in Human EEG Signals Using Bayesian Neural Networks

Nadim Mohamed, David M. Rubin, and Tshilidzi Marwala
School of Electrical and Information Engineering
University of the Witwatersrand, Johannesburg
Private Bag 3, Wits 2050, South Africa
E-mail: [email protected], [email protected], [email protected]

(Submitted on September 15, 2005; Accepted on January 4, 2006)

Abstract – We investigate the application of neural networks to the problem of detecting interictal epileptiform activity in the electroencephalogram (EEG). The proposed detector consists of segmentation, feature extraction and classification stages. For the feature extraction stage, coefficients of the Discrete Wavelet Transform (DWT), real and imaginary parts of the Fast Fourier Transform and raw EEG data were all found to be well suited to EEG classification. Principal Component Analysis was used to reduce the dimensionality of the features. For the classification stage, Multi-Layer Perceptron neural networks were implemented according to Maximum Likelihood and Bayesian Learning formulations. The latter was found to make better use of training data and consequently produced better trained neural networks. Rejection thresholds of 0.9 were applied to the network output as a doubt level in order to ensure that only reliable classification decisions are made. A maximum classifier accuracy of 95.1% was achieved with 25.0% of patterns not being classified. Bayesian moderated outputs could not improve on these classification predictions significantly enough to warrant their added computational overhead.

Keywords – Epilepsy, Artificial neural network, Multi-layer Perceptron, Maximum Likelihood

1. Introduction

Epilepsy is a common neurological disorder that is primarily diagnosed and monitored using the electroencephalogram (EEG). The EEG is a temporal recording of the variations in the weak electric potentials that develop across the scalp. Epileptiform activity is manifested in the EEG as transient waveforms known as spikes and sharp waves (SSWs). Figure 1 shows examples of some of these waveforms. Despite advances in technology, analysis of EEG records is still usually performed manually by highly trained professionals who visually screen the EEG for the presence of spike waveforms. This is a laborious task, as the EEG with its multiple recording channels can produce large volumes of data. Partially or fully automating this task with a computer-based pattern recognition system will result in a significant time saving.

Research into techniques for automatic classification of the EEG began as early as the 1960s and has produced a variety of different approaches to solving this problem. Earlier systems employed mimetic, rule-based [1, 2] or arithmetic [3] detectors. These have the advantage of being relatively simple to implement, but suffer from high false detection rates. This is mostly caused by the variability between the EEGs of different patients due to factors such as age, level of consciousness and the inability to control all experimental conditions.

Adaptive techniques, which are more tolerant of variations in input signals, are expected to be better suited to EEG classification than classical methods because of this wide inter-individual variation. This expectation has led to numerous applications of adaptive techniques, in particular the Artificial Neural Network (ANN), to EEG classification [4, 5, 6]. These studies generally report better classification results than studies that use alternative methods.


Figure 1. Examples of (A) epileptic spikes, (B) sharp waves and (C) spike and wave complex in the EEG.

The ANN is a powerful data processing system consisting of a large number of highly interconnected, smaller processing units called neurons. Their structure is loosely based on the structure of biological neurons in the human brain and gives them the ability to learn a classification function from example training data. The learning attribute allows for the implementation of a detector based on example data instead of specifying a complete set of detection rules or logic. This is an attractive property when classifying complex signals such as EEG signals, whose variability between patients and even within the same patient makes it infeasible to specify a single, comprehensive set of rules for classification.

In this study, the application of ANNs to the problem of epileptic SSW detection is studied further. In particular, we apply state-of-the-art, principled techniques for neural network training in order to optimise results. The problem of neural network training is posed in a Maximum Likelihood and a Bayesian Learning formulation. Specifically, the error function and output activation function of the ANN are chosen so that the output of the ANN approximates the posterior probability of class membership, as derived by Bishop [7]. Rejection thresholds or doubt levels can then be applied to this output so that classification decisions are made only when there is a high degree of certainty in the class membership of an input pattern. It has been argued [8] that this will lead to classifiers that are perceived as being more reliable, as there will be fewer false detections and less need for manual correction.

Applying Bayesian inference techniques to neural network training and prediction offers principled methods for determining optimal weight decay coefficients and for model selection while making efficient use of training data. This is useful for problems where only a limited set of data is available for training and testing of a neural network. The suitability of both Bayesian and Maximum Likelihood ANNs for EEG SSW detection is assessed in this study.

2. EEG Waveforms

2.1. Epileptic and Non-Epileptic Waveforms

Abnormal EEG activity associated with epilepsy may be ictal (during seizures) or inter-ictal (between seizures) [2]. In this study, only inter-ictal activity is considered, as this is more commonly used for clinical diagnosis of epilepsy. The following types of waveforms fall under this category:

1) Spikes and Sharp Waves (SSWs): These are paroxysmal, transient waveforms. They have high amplitudes relative to background activity and pointed peaks. Spikes have a duration of 20 – 70 ms and sharp waves have a duration of 70 – 200 ms. Polyspikes is the term given to multiple spikes that are repeated at a frequency of approximately 20 Hz.

2) Spike and Wave Complex: SSWs are sometimes followed by a slow wave of frequency less than 4 Hz.

The EEG also often contains numerous artifacts that are not generated by cerebral activity. These include eye-blink artifacts, eye movement artifacts, and electromyographic (EMG) muscular artifacts, as well as electrical interference. Some of these signals, such as eye-blink artifacts, bear a close resemblance to SSWs. The design of a detection system should facilitate a distinction between SSWs and artifacts.

3. Neural Networks for Classification

For the purposes of this study, ANNs are viewed as a general framework for representing non-linear mappings between multi-dimensional spaces, where the form of the mapping is governed by a set of adjustable parameters [7]. With supervised learning, the training data consists of both the input to the ANN and an associated target output. In this study, the input is a set of features extracted from the EEG and the target output is 1 or 0, where 1 represents the epileptiform activity pattern class and 0 represents the non-epileptiform activity pattern class.

3.1. Architecture

The Multi-Layer Perceptron (MLP) ANN architecture was selected for this study. This architecture has been applied successfully to the epileptic SSW detection problem in the past [4, 5, 6], which suggests that it is suitable for classifying the EEG. Every connection between inputs and neurons is weighted by an adjustable weight parameter. Theoretically, this type of two-layer MLP model is able to approximate any continuous function with arbitrary accuracy, provided the number of hidden neurons is sufficiently large [7]. If x is the input to the MLP and y is the output of the MLP, the mapping from the input to the output may be written as

$$ y = f_{\mathrm{output}}\left( \sum_{j=1}^{M} w_j \, f_{\mathrm{hidden}}\left( \sum_{i=1}^{N} w_{ij} x_i + b_0 \right) + w_0 \right) \tag{1} $$

where N is the number of input units, M is the number of hidden neurons, x_i is the i-th input unit, w_ij is the weight parameter between input i and hidden neuron j, w_j is the weight parameter between hidden neuron j and the output neuron, and b_0 and w_0 are bias terms. The output activation function f_output(·) is the sigmoid

$$ f_{\mathrm{output}}(a) = \frac{1}{1 + e^{-a}} \tag{2} $$

and the hidden activation function f_hidden(·) is the hyperbolic tangent

$$ f_{\mathrm{hidden}}(a) = \tanh(a) \tag{3} $$
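The forward mapping of Eqs.(1) to (3) can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in the study: the function name `mlp_forward` and its argument layout are hypothetical, and each hidden neuron is given its own bias value, which is an assumption (Eq.(1) writes a single symbol b_0).

```python
import math

def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Two-layer MLP of Eq.(1): tanh hidden units, sigmoid output.

    x        : list of N input features
    w_hidden : M rows of N weights, one row per hidden neuron (w_ij)
    b_hidden : M hidden biases (assumed per-neuron; Eq.(1) writes b_0)
    w_out    : M hidden-to-output weights (w_j)
    b_out    : output bias (w_0)
    """
    # Hidden layer, Eq.(3): tanh of the weighted input sum plus bias.
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    # Output activation a, then the sigmoid of Eq.(2).
    a = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return 1.0 / (1.0 + math.exp(-a))
```

With all output weights and the output bias set to zero, the activation a is zero and the sketch returns 0.5, the point of maximum class uncertainty.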

3.2. Maximum Likelihood Formulation

For a two-class classification problem, Bishop [7] showed that the error between the network output y in Eq.(1) and the target output t, taken over all P training patterns, is the cross-entropy error function given by

$$ E_{CEE} = -\sum_{p=1}^{P} \left[ t_p \ln(y_p) + (1 - t_p) \ln(1 - y_p) \right] \tag{4} $$
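As a small illustration of Eq.(4), the cross-entropy error can be computed as follows. The function name and the clipping constant `eps` are assumptions added here to keep the logarithm finite; they are not part of the original formulation.

```python
import math

def cross_entropy_error(targets, outputs, eps=1e-12):
    """Cross-entropy error of Eq.(4) for targets in {0, 1} and outputs in (0, 1)."""
    total = 0.0
    for t, y in zip(targets, outputs):
        # Clip y away from 0 and 1 so that log() stays finite (assumption).
        y = min(max(y, eps), 1.0 - eps)
        total += t * math.log(y) + (1 - t) * math.log(1.0 - y)
    return -total
```

For two patterns with targets 1 and 0 that both receive an output of 0.5, the error is 2 ln 2; outputs closer to the targets give a smaller value.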

During training, the ANN weight parameters are iteratively adjusted in order to minimise the error function in Eq.(4). Initially the weights are set to random values. An optimisation method known as the scaled conjugate gradient (SCG) method [9] was used to determine the magnitude of the parameter updates. The SCG method was selected instead of other optimisation methods because of its fast convergence properties.

Bishop [7] proved that if the cross-entropy error function is minimised during ANN training and the output activation function of an ANN is given by Eq.(2), the output of an ANN approximates the posterior probability of membership to a pattern class given the vector of inputs x. In the case of EEG SSW detection, the output will approximate the posterior probability of the input EEG pattern containing epileptic activity. If this class is represented by C_1 and the pattern class for not containing epileptic activity is represented by C_2, the relations for the posterior probability of class membership can be written as

$$ P(C_1 \mid \mathbf{x}) = y \tag{5} $$

$$ P(C_2 \mid \mathbf{x}) = 1 - y \tag{6} $$

The above relations give a probabilistic interpretation to the ANN output. Based on these relations, it is clear that an input vector has a high probability of belonging to class C_1 when y is close to 1, and to C_2 when y is close to 0. If y is close to 0.5, there is uncertainty in the class membership of the input vector. A simple method to ensure that the classifier makes classification decisions only where there is a high degree of certainty is to apply an upper and lower rejection threshold to the ANN output, as proposed by Bishop [7]. This classification decision rule is defined as

$$ \text{decide } C_1 \text{ if } y > \theta; \quad \text{decide } C_2 \text{ if } y < (1 - \theta); \quad \text{otherwise do not classify } \mathbf{x} \tag{7} $$

The parameter θ sets the level of the rejection threshold.
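The decision rule of Eq.(7) is short enough to state directly in code. The sketch below is illustrative only; the function name and the use of `None` to mark an unclassified pattern are choices made here, and the default of 0.9 is the threshold quoted in the abstract.

```python
def classify_with_rejection(y, theta=0.9):
    """Decision rule of Eq.(7): return "C1", "C2", or None when in doubt."""
    if y > theta:
        return "C1"          # confident: epileptiform activity
    if y < 1.0 - theta:
        return "C2"          # confident: no epileptiform activity
    return None              # doubtful: leave the pattern unclassified
```

An output of 0.95 is assigned to C1, an output of 0.05 to C2, and an output of 0.5 is rejected and left for visual interpretation.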

3.3. Bayesian Formulation

The maximum likelihood approach to neural network training assumes that weight parameters have fixed unknown values and estimates these values by minimising a suitable error function. However, as the vector of weight parameters w is estimated from a finite set of data, there is always some degree of uncertainty associated with the value of this weight vector. For example, if a single training pattern is removed from the training set, the weight vector that is learnt during training will differ from the weight vector that is learnt when the same training pattern is present in the training set.

Bayesian learning takes into account the uncertainty in the weight vector by assigning a probability distribution to the weight vector that represents the relative degrees of belief in the values of the weight vector. An MLP model structure reflecting the number of hidden layer neurons is first defined as H, and a prior distribution p(w | H) is assigned to the weight vector to reflect any initial beliefs regarding the distribution of the weight parameters. Once a set of training data, D, is observed, the prior distribution can be converted to a posterior distribution of network weights using Bayes' theorem as

$$ p(\mathbf{w} \mid D, H) = \frac{p(D \mid \mathbf{w}, H) \, p(\mathbf{w} \mid H)}{p(D \mid H)} \tag{8} $$

Here p(D | w, H) is the likelihood function and p(D | H) is the evidence of model H. MacKay [10] uses a Gaussian prior

$$ p(\mathbf{w} \mid H) = \frac{1}{(2\pi / \alpha)^{W/2}} \exp\left( -\frac{\alpha}{2} \sum_{i} w_i^2 \right) \tag{9} $$

where α is the regularization coefficient and W is the number of weight parameters in the ANN. The distribution in Eq.(9) is Gaussian with variance 1/α. When the magnitudes of the weight parameters are large, p(w | H) will be small. In this way, the prior distribution favors smaller values for weight parameters. Using the prior in Eq.(9), the posterior probability of the weight vector given the training data in Eq.(8) may be derived as [10]

$$ p(\mathbf{w} \mid D, H) = \frac{1}{Z_S} \exp\left( -S(\mathbf{w}) \right) \tag{10} $$

where

$$ S(\mathbf{w}) = E_{CEE} + \frac{\alpha}{2} \sum_{i} w_i^2 \tag{11} $$

and the normalisation constant is

$$ Z_S(\alpha) = \int \exp\left( -S(\mathbf{w}) \right) d\mathbf{w} \tag{12} $$

According to Eq.(10), the weight vector that corresponds to the maximum of the posterior distribution, w_MP, can be found by minimising the negative logarithm of p(w | D, H). This is equivalent to minimising the error function in Eq.(11), which consists of the cross-entropy error in Eq.(4) added to a weight penalty term called the weight decay regularizer.

Evaluating the posterior probability in Eq.(10) is analytically intractable, as it usually requires integration over a high dimensional weight space. One method of making these computations more tractable is to introduce approximations that simplify their computation. MacKay's [10] evidence framework and Buntine and Weigend [11] approximate the posterior distribution, which is usually canonical, with a Gaussian distribution centered around one of its modes at w_MP. Neal [12] proposes a more general and exact approach where Markov Chain Monte Carlo methods are used to compute integrations over weight space. The method of MacKay [10] is used in this study because it is less computationally expensive than Neal's method [12] and because of observations by MacKay [13] that the evidence approximation should produce superior results over Buntine and Weigend's method when estimating regularization coefficients.

MacKay's evidence framework provides principled methods for model selection, prediction and estimating regularization coefficients without the need for a validation set of data. This allows more data to be used for training the ANN, which is useful when only limited amounts of data are available for training. The evidence framework specifies the following methods:

1) Estimating Regularization Coefficients: The evidence of α is evaluated by integrating over network weight parameters as

$$ p(D \mid \alpha, H) = \int p(D \mid \mathbf{w}, H) \, p(\mathbf{w} \mid \alpha) \, d\mathbf{w} \tag{13} $$

Using Eq.(13), MacKay [10] derives an expression for the most probable regularization coefficient α_MP that maximises p(D | α, H). This has the form

$$ \alpha_{MP} = \frac{\gamma}{\sum_{i} w_i^2} \tag{14} $$

where

$$ \gamma = \sum_{i=1}^{W} \frac{\lambda_i}{\lambda_i + \alpha} \tag{15} $$

The λ_i are the eigenvalues of the Hessian matrix A, the second derivative of the error function in Eq.(11) with respect to the weight parameters. The Hessian is computed using the Pearlmutter method [14]. During training, α is first initialized to a small random value. Training takes place by adjusting weight parameters using the scaled conjugate gradient algorithm in order to minimise Eq.(11). Once a minimum is reached, α_MP is determined using Eq.(14), keeping w constant. This procedure is then repeated until a self-consistent solution for both α and w is reached.
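The re-estimation step of Eqs.(14) and (15) can be sketched as below. This is a minimal sketch under stated assumptions: the eigenvalues of the Hessian are taken as already computed and passed in as a list, the function names are hypothetical, and the surrounding loop that alternates this update with scaled conjugate gradient training is omitted.

```python
def effective_parameters(eigenvalues, alpha):
    """gamma of Eq.(15): sum of lambda_i / (lambda_i + alpha)."""
    return sum(lam / (lam + alpha) for lam in eigenvalues)

def update_alpha(weights, eigenvalues, alpha):
    """One re-estimation of the regularization coefficient, Eq.(14).

    weights     : current weight vector w (held constant during the update)
    eigenvalues : Hessian eigenvalues lambda_i (assumed precomputed)
    alpha       : current value of the regularization coefficient
    """
    gamma = effective_parameters(eigenvalues, alpha)
    return gamma / sum(w * w for w in weights)
```

For example, with two weights equal to 1, two eigenvalues equal to 1 and a current α of 1, γ evaluates to 1 and the updated α is 0.5. When every eigenvalue is much larger than α, γ approaches the number of weights W.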

2) Model Order Selection: The evidence framework compares ANN models of different complexity by evaluating the evidence of the model. For an ANN model H_i, the evidence is given by

$$ p(D \mid H_i) = \int p(D \mid \alpha, H_i) \, p(\alpha \mid H_i) \, d\alpha \tag{16} $$

MacKay [10] derives an expression for the logarithm of Eq.(16) as

$$ \ln p(D \mid H_i) = -S(\mathbf{w}) - \frac{1}{2} \ln \det(A) + \frac{W}{2} \ln \alpha + \ln\left( 2^{N} N! \right) + \frac{1}{2} \ln\left( \frac{4\pi}{\gamma} \right) \tag{17} $$

Studies have reported a correlation between the model evidence and the generalization error of a neural network [10]. The expression in Eq.(17) allows the model with the highest evidence to be selected without having to separate the available data into a training and a validation set.

3) Moderated Outputs: ANNs estimate a mapping function by interpolating in a region of function space generated by a finite set of training data. The confidence of predictions within this region will be higher than that of predictions outside it. Moderated outputs were proposed by MacKay [10] to adjust the ANN output by an amount that reflects the uncertainty of the weight parameters. MacKay [10] derives an expression for the moderated output by assuming that the activation of the output neuron, a, is a locally linear function of the ANN weights. The expression can be written as

$$ P(C_1 \mid \mathbf{x}) = g\left( K(s) \, a_{MP} \right) \tag{18} $$

where

$$ K(s) = \left( 1 + \frac{\pi s^2}{8} \right)^{-1/2} \tag{19} $$

and

$$ s^2(\mathbf{x}) = \mathbf{g}^{T} A^{-1} \mathbf{g} \tag{20} $$

Here, g(·) is the sigmoid function of Eq.(2), a_MP is the activation of the output neuron when w = w_MP, and the vector g is the derivative of a with respect to w.
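Eqs.(18) to (20) can be illustrated with the following sketch. It assumes that the inverse Hessian is already available as a nested list and that the outer function in Eq.(18) is the logistic sigmoid of Eq.(2); the function name and argument names are hypothetical.

```python
import math

def moderated_output(a_mp, grad, hessian_inv):
    """Moderated output of Eqs.(18)-(20).

    a_mp        : output-neuron activation at w = w_MP
    grad        : derivative of the activation a with respect to each weight
    hessian_inv : inverse Hessian A^-1 as a list of rows (assumed precomputed)
    """
    n = len(grad)
    # Eq.(20): s^2 = g^T A^-1 g, the variance of the activation.
    s2 = sum(grad[i] * hessian_inv[i][j] * grad[j]
             for i in range(n) for j in range(n))
    # Eq.(19): the factor K(s) shrinks the activation toward zero.
    kappa = (1.0 + math.pi * s2 / 8.0) ** -0.5
    # Eq.(18): sigmoid of the moderated activation.
    return 1.0 / (1.0 + math.exp(-kappa * a_mp))
```

When s² is zero the result equals the unmoderated sigmoid output; as s² grows, the output is pulled toward 0.5, which is why more patterns fall inside the rejection band of Eq.(7).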




Figure 2. Main components of an automatic detection system. A sliding window segments the raw EEG data. Features are then extracted and presented to a classifier for classification.

4. SSW Detector Implementation

The complete EEG SSW detection system is divided into three sequential components as shown in Fig. 2. The first stage is a data segmentation stage, where the EEG is segmented into short, 320 ms segments of data using a sliding window. Features are then extracted from the segments to provide a representation of the raw data more suitable for classification. A neural network classifier then classifies these features as containing epileptic activity or not.
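The segmentation stage can be sketched as follows. At the 200 Hz sampling frequency reported in Section 4.5, a 320 ms window spans 64 samples. The step between successive windows is not stated in the text, so the value of 16 samples used here is purely an assumption, as is the function name.

```python
def sliding_windows(signal, fs_hz=200, window_ms=320, step_samples=16):
    """Split a single-channel signal into fixed-length, overlapping segments.

    fs_hz and window_ms follow the text (200 Hz, 320 ms -> 64 samples);
    step_samples is an assumed value, not taken from the paper.
    """
    width = int(round(fs_hz * window_ms / 1000.0))
    last_start = len(signal) - width
    return [signal[i:i + width] for i in range(0, last_start + 1, step_samples)]
```

A 128-sample signal yields five 64-sample segments with this step, starting at samples 0, 16, 32, 48 and 64.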

4.1. Feature Extraction

Four different types of features were extracted from the EEG. These were based on methods used in previous studies.

1) Fast Fourier Transform (FFT): The real and imaginary parts of the FFT of an EEG data segment were used by Jándo [4] as inputs to a neural network classifier and were reported to produce good detection performance. These features convey a spectral representation of an EEG data segment. A Gaussian window is applied to the data segment prior to the FFT in order to reduce spectral leakage. This window was chosen because of its good localization in time and frequency.

2) Autoregressive Modeling: The Autoregressive (AR) model has been used to extract features from the EEG in several EEG classification systems in the past [15]. This technique fits an EEG data segment to a parametric model in the form of a linear predictor. The parameters and prediction error of this model convey spectral information and are a low-dimensional representation of the raw data segment. A fourth order model was found to be sufficient to model the EEG. The Burg method was used to estimate the coefficients, as it guarantees a stable model [16].

3) Discrete Wavelet Transform (DWT): The above two feature extraction methods rely on an assumption of weak stationarity for the EEG data segment. In reality, however, the EEG is a highly non-stationary signal. The DWT is suited to non-stationary signals and performs a multi-resolution analysis of a signal. Coefficients of the DWT are used as features that convey both temporal and spectral information. The method used by Kalayci and Özdamar [5] was adopted in this study. This method uses coefficients from the fourth level wavelet decomposition with a Daubechies-4 wavelet [17] as the mother wavelet.

4) Raw EEG: Using the raw EEG data itself as input to a classifier has been shown to produce good results by Kalayci and Özdamar [18]. The benefit of using the raw data instead of extracted features is that the raw data will contain all information that is relevant for distinguishing between pattern classes, whereas extracted features may not represent all this information.
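As an illustration of the first feature type, the sketch below applies a Gaussian window to a segment and returns the real and imaginary parts of its spectrum. Several details are assumptions rather than facts from the text: the window width parameter `sigma`, the decision to keep the non-negative frequency bins only, and the function names. For brevity the sketch uses a direct discrete Fourier transform, which yields the same coefficients as an FFT at higher computational cost.

```python
import cmath
import math

def gaussian_window(n, sigma=0.4):
    """Gaussian window of length n (n > 1); sigma is an assumed width parameter."""
    centre = (n - 1) / 2.0
    return [math.exp(-0.5 * ((k - centre) / (sigma * centre)) ** 2)
            for k in range(n)]

def fft_features(segment, sigma=0.4):
    """Real and imaginary spectral parts of a Gaussian-windowed segment.

    Returns [Re X_0, Im X_0, Re X_1, Im X_1, ...] for bins 0 .. n/2.
    A direct DFT stands in for the FFT here.
    """
    n = len(segment)
    windowed = [s * w for s, w in zip(segment, gaussian_window(n, sigma))]
    feats = []
    for k in range(n // 2 + 1):
        coeff = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        feats.extend([coeff.real, coeff.imag])
    return feats
```

A 64-sample segment produces 33 frequency bins and therefore 66 values under these assumptions, before any dimensionality reduction is applied.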

4.2. Dimensionality Reduction

The dimensionality of the features was reduced using Principal Component Analysis (PCA) before presenting them to neural network classifiers for classification. Bishop [7] showed that a lower dimensional feature space will result in a less complex ANN that will require less data for training. The amount of data required to train an ANN increases exponentially with dimensionality, the so-called “curse of dimensionality” [7]. PCA, or the Karhunen-Loève transform, is a linear unsupervised dimensionality reduction technique that aims to find a set of orthogonal vectors in the input data space that have the greatest contribution to the variance of the data. The eigenvectors and eigenvalues of the covariance matrix of the input features are calculated and only the eigenvectors corresponding to the largest eigenvalues are retained. Dimensionality reduction is achieved by projecting the input data onto these eigenvectors. PCA was found to reduce the dimensionality of raw EEG and FFT features by 75% and 80% while retaining 99% and 95% of the variance of the original data, respectively.
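The PCA procedure described above (covariance matrix, leading eigenvectors, projection) can be sketched in plain Python as follows. To stay self-contained, the sketch finds eigenvectors by power iteration with deflation, which is a substitute chosen here and not the method named in the paper; the function names, iteration count and random seed are likewise assumptions. A numerical library eigensolver would normally be preferred.

```python
import math
import random

def covariance_matrix(rows):
    """Return the column means and sample covariance matrix of the data rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    return means, cov

def leading_eigenpair(mat, iters=500, seed=0):
    """Largest eigenvalue and eigenvector of a symmetric matrix (power iteration)."""
    d = len(mat)
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:      # nothing left to extract
            break
        v = [x / norm for x in w]
    lam = sum(v[i] * mat[i][j] * v[j] for i in range(d) for j in range(d))
    return lam, v

def pca(rows, n_components):
    """Project rows onto the n_components directions of largest variance."""
    means, cov = covariance_matrix(rows)
    d = len(means)
    components, variances = [], []
    for _ in range(n_components):
        lam, v = leading_eigenpair(cov)
        components.append(v)
        variances.append(lam)
        # Deflation: remove the component just found from the covariance.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    projected = [[sum((r[j] - means[j]) * v[j] for j in range(d))
                  for v in components] for r in rows]
    return projected, variances
```

The returned variances are the retained eigenvalues; dividing their sum by the trace of the covariance matrix gives the fraction of variance retained, which is the quantity quoted in the text (99% and 95%).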

4.3. Input Standardization

All input features are standardized to have zero mean and unit standard deviation. Features extracted from the EEG differ significantly in magnitude. Larger inputs will be given higher importance by the ANN, which may not reflect their relative importance in making a classification decision. Standardization ensures that this does not occur and also assists with neural network training, as smaller magnitude inputs are less likely to cause saturation of neural network activation functions.
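Standardization can be sketched as below. The text does not say which data the mean and standard deviation are estimated from; the sketch assumes they are fitted on the training patterns and then reused for test patterns. The function names and the use of the population standard deviation are also assumptions.

```python
import statistics

def fit_standardizer(rows):
    """Estimate per-feature mean and standard deviation from training rows."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    # A constant feature has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    return means, stds

def standardize(rows, means, stds):
    """Shift and scale each feature to zero mean and unit standard deviation."""
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]
```

For two training patterns [1, 10] and [3, 30], both features map to -1 and +1 after standardization, so the second feature no longer dominates because of its larger magnitude.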

4.4. Neural Network Validation

Neural networks were implemented according to both the Maximum Likelihood and Bayesian Learning formulations. The generalization performance of each network was determined according to a four-fold cross validation strategy. The generalization error between ANN output and target output is quantified using a mean square error (MSE) criterion given by

$$ E_{MSE} = \frac{1}{P} \sum_{p=1}^{P} \left( t_p - y_p \right)^2 \tag{21} $$

where P is the total number of test patterns, t_p is the target output for input pattern p and y_p is the actual ANN output when presented with pattern p. Classification performance is assessed in terms of Sensitivity, Specificity and Accuracy as defined below.

1) Sensitivity: Percentage of input patterns containing epileptic activity that are correctly classified as such.

2) Specificity: Percentage of input patterns not containing epileptic activity that are correctly classified as such.

3) Accuracy: Percentage of correctly classified input patterns of either class.
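The three measures, together with the percentage of patterns left unclassified by a rejection threshold, can be computed as in the sketch below. It assumes that sensitivity, specificity and accuracy are evaluated over the classified patterns only, which the text does not state explicitly; the function name and the use of `None` for a rejected pattern are hypothetical.

```python
def classification_metrics(targets, decisions):
    """Sensitivity, specificity, accuracy and rejection rate, in percent.

    targets   : true labels, 1 (epileptiform) or 0 (non-epileptiform)
    decisions : predicted labels 1 or 0, or None for rejected patterns
    Metrics other than the rejection rate use classified patterns only
    (an assumption made for this sketch).
    """
    tp = tn = fp = fn = rejected = 0
    for t, d in zip(targets, decisions):
        if d is None:
            rejected += 1
        elif t == 1 and d == 1:
            tp += 1
        elif t == 0 and d == 0:
            tn += 1
        elif t == 0 and d == 1:
            fp += 1
        else:
            fn += 1

    def pct(num, den):
        # Undefined ratios are reported as NaN rather than raising an error.
        return 100.0 * num / den if den else float("nan")

    return {
        "sensitivity": pct(tp, tp + fn),
        "specificity": pct(tn, tn + fp),
        "accuracy": pct(tp + tn, tp + tn + fp + fn),
        "not_classified": pct(rejected, len(targets)),
    }
```

For targets [1, 1, 0, 0, 1] and decisions [1, 0, 0, None, 1], three of the four classified patterns are correct (75% accuracy) and one of five patterns is left unclassified (20%).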

4.5. Experimental Data

The experimental data used to validate SSW detectors in this study was recorded from eleven patients with 12-bit accuracy at a sampling frequency of 200 Hz using a Nihon-Kohden 2100 digital EEG system. The EEGs of nine of the patients contained epileptiform activity and the remaining two exhibited normal brain activity. As the data is taken from several different patients, the ability of detectors to tolerate inter-patient variations in the EEG can be assessed. Data was marked as containing either epileptic activity, artifact activity or normal background activity by two EEG technologists, each having more than four years' experience in the field. A total of 1678 data segments and their associated markings were extracted from the dataset, of which half contained epileptiform activity and the remainder contained artifact or background activity. The markings were used to determine targets for neural network training and validation.

5. Results and Discussion

Three different experiments were conducted in order to assess whether the methods described in this paper are useful in the context of epileptic SSW detection. In the first experiment, the mean square error on test data was determined for neural networks trained using both the Maximum Likelihood and Bayesian formulations. Note that this does not include the Bayesian moderated outputs, as they are evaluated in a separate experiment. The results are shown in Table 1. For all four detectors, the Bayesian neural network consistently performs better regardless of the type of input feature. This can probably be attributed to its more efficient use of training data during training.



Table 1. Mean Square Error on Test Dataset

Type of Feature   Network Configuration   MSE Maximum Likelihood   MSE Bayesian
Raw               15-3-1                  0.0906                   0.0875
FFT               25-6-1                  0.0969                   0.0876
DWT               8-3-1                   0.0947                   0.0923
AR                5-4-1                   0.1456                   0.1338

Table 2. Classification Performance with Rejection Threshold of 0.9

Type of Feature   Sensitivity (%)   Specificity (%)   Accuracy (%)   Not Classified (%)
Raw               95.61             94.25             94.85          25.56
FFT               96.49             93.79             95.10          24.97
DWT               95.16             94.39             95.03          27.21
AR                94.70             92.77             93.71          48.34

Table 3. Classification Performance with Rejection Threshold of 0.9 – Moderated Outputs

Type of Feature   Sensitivity (%)   Specificity (%)   Accuracy (%)   Not Classified (%)
Raw               95.10             94.67             94.86          29.61
FFT               97.74             96.09             96.92          39.46
DWT               95.00             95.63             95.00          31.52
AR                95.49             92.21             93.84          49.89

The second experiment involves determining the classification performance of the Bayesian neural networks on test data. Classification decisions are made by applying a rejection threshold of 0.9 to the neural network outputs. This classification performance is computed using the test dataset and provides an estimate of a detector’s true generalization performance. Results are shown in Table 2. The rejection thresholds produce very high values for sensitivity, specificity, and accuracy for all four detectors. As a result, the decisions made by the detectors appear very reliable. The detector with DWT input features appears to produce slightly superior results compared to other detectors. It is important to also study the amount of data that was not classified due to uncertainty of the neural network. The detector that uses parameters from the Autoregressive model as a feature is clearly inferior to all other detectors, because it does not classify 48.34% of features compared to between 24.97% and 27.21% for the other detectors. Lastly, classification performance when using Bayesian Moderated Outputs was evaluated in order to assess whether their application could improve on the results in Table 2. These results are shown in Table 3. There is a negligible improvement in specificity, sensitivity and accuracy, and an increase in the number of unclassified features in comparison to the results in Table 2. We found that the moderated outputs required approximately ten times more computation time. This is mainly due to the computation of a Hessian matrix which is a computationally intensive task. As the Moderated Outputs do not improve classification results significantly and require additional computational overhead, we do not recommend their use for epileptic SSW detection.

6. Conclusion

In this paper, we have implemented neural networks for epileptic SSW detection according to a Maximum Likelihood and a Bayesian formulation. The latter was found to produce better trained networks and make better use of a finite set of training data than the former. Four different detector configurations were implemented, with each detector using a different feature as input to a neural network. The DWT features appear to be a slightly better representation of the EEG compared to other features and also produce the least complex neural networks. Detectors that use AR features are significantly inferior to detectors that use other features, and are therefore not recommended for SSW detection.

The use of rejection thresholds ensures that classification decisions are made with accuracies in excess of 90%. Doubtful input patterns are not classified and would have to be visually interpreted by a user of the detection tool. This is a desirable property for a detector, as the perceived reliability of its classification decisions is high while the amount of EEG data that needs to be visually screened is still significantly reduced. Bayesian moderated outputs do not improve on neural network predictions and add a significant computational overhead. Therefore, they are not recommended for use in SSW detection systems. The conclusion that can be drawn from this study is that neural networks are a powerful tool that is very well suited to EEG classification. It is important to interpret neural networks in a systematic manner within the wider framework of statistical pattern recognition. We have shown that reliable detection and a high degree of accuracy can be achieved when neural networks are implemented in this way.

Acknowledgments

The EEG data used in this study was collected by the Pattern Recognition Group of the Department of Electrical, Electronic and Computer Engineering in conjunction with the Department of Neurology at the University of Pretoria, South Africa. We gratefully acknowledge Professor E. C. Botha, Karl Geggus and Professor P. Bartel for making the data available for this work.

References

[1] J. R. Glover, P. Y. Ktonas, N. Raghaven, J. M. Urunela, S. V. Velamuri and E. L. Reilly, “A multichannel signal processor for the detection of epileptogenic sharp transients in the EEG,” IEEE Trans. Biomed. Engng, vol. BME-33(12), pp. 1121-1128, 1986.
[2] J. Gotman, P. Gloor and N. Schaul, “Comparison of traditional reading of the EEG and automatic recognition of interictal epileptic activity,” Electroenceph. Clinical Neurophysiol., vol. 44, pp. 48-60, 1978.
[3] J. Qian, J. S. Barlow and M. P. Beddoes, “A simplified arithmetic detector for EEG sharp transients – preliminary results,” IEEE Trans. Biomed. Engng, vol. BME-35(1), pp. 11-17, 1988.
[4] G. Jando, R. M. Siegel, Z. Horvath and G. Buzsaki, “Pattern recognition of the electroencephalogram by artificial neural networks,” Electroenceph. Clinical Neurophysiol., vol. 86, pp. 100-109, 1993.
[5] T. Kalayci and O. Ozdamar, “Wavelet preprocessing for automated neural network detection of EEG spikes,” IEEE Eng. Med. Biol. Mag., pp. 160-166, Mar/Apr 1995.
[6] A. J. Gabor and M. Seyal, “Automated interictal EEG spike detection using artificial neural networks,” Electroenceph. Clinical Neurophysiol., vol. 83, pp. 271-280, 1992.
[7] C. M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, Oxford, 1995.
[8] P. Sykacek, G. Dorffner, P. Rappelsberger and J. Zeitlhofer, “Evaluating confidence measures in a neural network based sleep stager,” Technical Report TR-97-21, Austrian Research Institute for Artificial Intelligence (OFAI), 1997.
[9] M. F. Moller, “A scaled conjugate gradient algorithm for fast supervised learning,” Neural Networks, vol. 6, pp. 525-533, 1993.
[10] D. J. C. MacKay, Bayesian Methods for Adaptive Models, PhD thesis, California Institute of Technology, 1992.
[11] W. L. Buntine and A. S. Weigend, “Bayesian back-propagation,” Complex Systems, vol. 5(6), pp. 603-643, 1991.
[12] R. M. Neal, “Bayesian training of backpropagation networks by the Hybrid Monte Carlo method,” Technical Report CRG-TR-92-1, Connectionist Research Group, Department of Computer Science, University of Toronto, 1992.
[13] D. J. C. MacKay, “Comparison of approximate methods for handling hyperparameters,” Neural Computation, vol. 11(5), pp. 1035-1068, 1999.
[14] B. A. Pearlmutter, “Fast exact multiplication by the Hessian,” Neural Computation, vol. 6(1), pp. 147-160, 1994.
[15] A. Sharma and R. J. Roy, “Design of a recognition system to predict movement during anaesthesia,” IEEE Trans. Biomed. Engng, vol. 44, pp. 505-511, 1997.
[16] J. Pardey, S. Roberts and L. Tarassenko, “A review of parametric modelling techniques for EEG analysis,” Med. Eng. Physics, vol. 18, pp. 2-11, 1996.
[17] I. Daubechies, Ten Lectures on Wavelets, Society for Industrial and Applied Mathematics, Philadelphia, 1992.
[18] T. Kalayci and O. Ozdamar, “Detection of spikes with artificial neural networks using raw EEG,” Comput. Biomedical Research, vol. 31, pp. 122-142, 1998.

Nadim Mohamed graduated from the University of the Witwatersrand with a BSc(Eng) and MSc(Eng) in Electrical Engineering in 2003, and is currently a consultant at Accenture. His interests are in Biomedical Engineering, Bayesian learning and Pattern Recognition.

David Rubin is a medical doctor and specialist in Nuclear Medicine. He received his medical degree, MBChB, from the University of Pretoria in 1986. His other academic qualifications include a Diploma in Anaesthetics and Fellow of the College of Nuclear Physicians, both from the South African College of Medicine, a Master of Biomedical Engineering from the University of New South Wales, and the MMed degree from the University of the Witwatersrand, Johannesburg. He currently leads the Biomedical Engineering Research Group in the School of Electrical and Information Engineering at the University of the Witwatersrand, Johannesburg. His research interests include quantitative physiology, medical imaging, as well as identifying medical problems with potential engineering solutions, and working with engineers toward the realization of these solutions.

Tshilidzi Marwala graduated with a BS in Mechanical Engineering, magna cum laude, from Case Western Reserve University in 1995, a Masters of Engineering from the University of Pretoria, and a PhD in Computational Intelligence from the University of Cambridge in 2001. He was a postdoctoral research fellow at Imperial College of Science, Technology and Medicine. His interests are in System Identification and Pattern Recognition.
