3
EEG Signal Processing for Epilepsy
Carlos Guerrero-Mosquera1, Armando Malanda Trigueros2 and Angel Navia-Vazquez1
1 University Carlos III of Madrid, Signal Theory and Communications Department, Avda. Universidad 30, 28911 Leganes
2 Public University of Navarre, Electrical and Electronic Engineering Department, Campus Arrosadia, 31006 Pamplona
Spain
1. Introduction
Neural activity in the human brain starts from the early stages of prenatal development. The signals generated by this activity are electrical in nature and represent not only brain function but also the status of the whole body. At present, three methods can record functional and physiological changes within the brain and the neuronal interactions at the network level: the electroencephalogram (EEG), the magnetoencephalogram (MEG), and functional magnetic resonance imaging (fMRI); each of these has advantages and shortcomings. MEG is not practical for experimental work in which subjects may move freely, because of the large size of the magnetic sensors. fMRI has a very low time resolution for image sequences, and many types of EEG activity, brain disorders and neurodegenerative diseases cannot be recorded with it. On the other hand, the spatial resolution of the EEG is limited by the number of electrodes, as described in Ebersole & Pedley (2003); Sanei & Chambers (2007). Much effort has been made to integrate information from multiple modalities recorded during the same task, in an attempt to establish an alternative high-resolution spatiotemporal imaging technique. The EEG provides an excellent tool for exploring the network activity in the brain associated with synchronous changes of the membrane potential of neighboring neurons. Understanding the neuronal functions and neurophysiological properties of the brain, together with the mechanisms underlying the generation of biosignals and their recording, is important in the detection, diagnosis, and treatment of brain disorders. Cerebral sources of electroencephalographic potentials are three-dimensional volumes of cortex. These sources produce three-dimensional potential fields within the brain. From the surface of the scalp, these can be recorded as two-dimensional fields of time-varying voltage.
The physical and functional factors that determine the voltage fields produced by these sources must be understood in order to locate and characterize the cortical generators of the EEG. Electroencephalography enables clinicians to study and analyze the electrical fields of brain activity recorded with electrodes placed on the scalp, directly on the cortex (e.g., with subdural electrodes), or within the brain (with depth electrodes). For each type of recording, the
Epilepsy – Histological, Electroencephalographic and Psychological Aspects Will-be-set-by-IN-TECH
specialist attempts to determine the nature and location of EEG patterns and whether they correspond to normal or abnormal neural activity. This chapter introduces several typical methods for the pre-processing and processing of EEG signals in epilepsy. The chapter is organized as follows: Section 2 presents a brief outline of electroencephalography, Section 3 introduces EEG waveform analysis, Section 4 gives an overview of different alternatives in EEG signal modeling and feature extraction, Section 5 presents the state of the art in EEG epilepsy detection and classification, Section 6 shows different methods for dimensionality reduction of EEG signals, and Section 7 gives a summary and the conclusions of this chapter.
2. Outline of electroencephalography
The nervous system is an organ system containing a network of specialized cells called neurons, which gather, communicate, and process information from the body and send out both internal and external instructions that are handled rapidly and accurately. In most animals the nervous system is divided into two parts, the central nervous system (CNS) and the peripheral nervous system (PNS). The CNS contains the brain and the spinal cord, and the PNS consists of sensory neurons, groups of neurons called ganglia, and nerve cells that are interconnected and also connect to the CNS. The two systems are closely integrated because sensory input from the PNS is processed by the CNS, and responses are sent by the PNS to the organs of the body. Neurons transmit electrical potentials to other cells along thin fibers called axons, which cause chemicals called neurotransmitters to be released at junctions called synapses. These electrical potentials, called "action potentials", are the information transmitted by a nerve; at the synapse, the action potentials of one cell cause the production of action potentials in another cell. A potential of 60-70 mV with negative polarity may be recorded under the membrane of the cell body. This potential changes with variations in the synaptic process. In this sequence, the first cell to produce action potentials is called the presynaptic cell, and the second cell, which responds to the first cell across the synapse, is called the postsynaptic cell. Presynaptic cells are typically neurons, and postsynaptic cells are typically other neurons, muscle cells, or gland cells. A cell that receives a synaptic signal may be inhibited, excited or otherwise modulated. Fig. 1 shows the synaptic activities schematically. The CNS is a major site for processing information, initiating responses, and integrating mental processes.
It is analogous to a highly sophisticated computer with the ability to receive inputs, process and store information, and generate responses. Additionally, it can produce ideas, emotions, and other mental processes that are not automatic consequences of the information input.
2.1 Neural activities
Cells of the nervous system include neurons and nonneural cells. Neurons, or nerve cells, communicate information to and from the brain. They are organized to form complex networks that perform the functions of the nervous system. All nerve cells are collectively referred to as neurons, although their size, shape, and functionality may differ widely. Neurons can be classified with reference to morphology or functionality. Using the latter classification scheme, three types of neurons can be defined: sensory neurons, connected to sensory receptors; motor neurons, connected to muscles; and interneurons, connected to other neurons.
Fig. 1. Presynaptic and postsynaptic activities in the neurons. An action potential that travels along the fibre ends in an excitatory synapse. This process causes an excitatory postsynaptic potential in the following neuron.
The cell body is called the soma, from which two types of structures extend: the dendrites and the axon. Dendrites are short and consist of as many as several thousand branches, with each branch receiving a signal from another neuron. The axon is usually a single branch which transmits the output signal of the neuron to various parts of the nervous system. Each axon has a constant diameter and can vary in length from a few millimeters to more than 1 m; the longest axons are those which run from the spinal cord to the feet. Dendrites are rarely longer than 2 mm and are connected to either the axons or the dendrites of other cells. These connections receive impulses from other nerves or relay the signals to other nerves. Each nerve in the human brain has approximately 10,000 connections to other nerves, mostly through dendritic connections. Neurons are, of course, not working in splendid isolation, but are interconnected into different circuits ("neural networks"), and each circuit is tailored to process a specific type of information.
2.2 Cerebral cortex
The cerebral cortex constitutes the outermost layer of the cerebrum. It is a structure within the brain that plays an important role in memory, perceptual awareness, attention, thought, consciousness and language. It is commonly called "grey matter" because of its grey color, and it is formed by neurons and unmyelinated ("grey") fibers, that is, fibers not covered by the dielectric called myelin.
Myelinated axons are white in appearance; this characteristic is the origin of the name "white matter," which is located below the grey matter of the cortex. White matter is formed predominantly by myelinated axons interconnecting different regions of the central nervous system.
Fig. 2. Cerebral cortex and its four lobes.
The human cerebral cortex is 2-4 mm thick. The cortical surface is highly convoluted by ridges and valleys of varying sizes, which increases the neuronal area; the total area is as large as 2.5 m^2 and includes more than 10^10 neurons. The cortex consists of two symmetrical hemispheres, left and right, which are separated by the deep longitudinal fissure. Each cerebral hemisphere is divided into lobes, which are named for the skull bones overlying each one: the frontal lobe, involved with decision-making, motor speech, problem solving, and planning; the temporal lobe, involved with memory, sensory speech, emotion, hearing, and language; the parietal lobe, involved in reading comprehension and in the reception and processing of sensory information from the body; and the occipital lobe, involved with vision (see Fig. 2).
3. Introduction to the EEG waveform analysis
Most brain disorders are diagnosed by visual inspection of EEG signals. The analysis is a rational and systematic process requiring a series of orderly steps that characterize the recorded electrical activity in terms of specific descriptors or features and measurements, as listed in Table 1.
1. Frequency or wavelength
2. Voltage
3. Waveform
4. Regulation
   a. Frequency
   b. Voltage
5. Manner of occurrence (random, serial, continuous)
6. Locus
7. Reactivity (eye opening, mental calculation, sensory stimulation, movement, affective state)
8. Interhemispheric coherence (homologous areas)
   a. Symmetry
      i. Frequency
      ii. Voltage
   b. Synchrony
      i. Wave
      ii. Burst
Table 1. Essential features of EEG analysis described in Ebersole & Pedley (2003).
For example, suppose that some 2 Hz waves are identified in the awake EEG of an 8-year-old child. This activity must then be characterized according to its location, voltage, waveform, manner of occurrence, frequency, amplitude modulation, synchrony and symmetry. A change in any of these features might entirely change the significance of the 2 Hz waves and cause the finding to be regarded as abnormal. Some clinical information is required before the EEG analysis is begun, for example the patient's age and state. Both age and birth date should be part of the EEG record. For example, there are clearly defined differences in the EEG of a premature infant with a conceptional age of 36 weeks, but there are no important or sharply delineated differences between the EEG of a 3-year-old child and that of a 4-year-old child, as described in Ebersole & Pedley (2003). Clinical experts in the field are familiar with the manifestation of brain rhythms in EEG signals, and it is important to recognize that the identification of a particular activity or phenomenon may depend on its "reactivity" (see Table 1). An important element of the recording and its analysis is the testing of the reactions, or responses, of the various components of the EEG to certain physiological changes. Specifying the reactivity of a given activity, rhythm or pattern is essential for the identification and subsequent analysis of the activity, and may clearly differentiate it from another activity with similar characteristics.
For example, in healthy adults, the amplitudes and frequencies of brain rhythms change from one state to another, such as between wakefulness and sleep. Similarly, a series of rhythmic, high-voltage 3 to 4 Hz waves in the prefrontal leads (just over the eyes) occurring in association with arousal in a young child may be normal, but a similar burst occurring spontaneously and not associated with arousal may be abnormal.
3.1 Brain rhythms and waveforms
The electrical activity of the cerebral cortex is often called a rhythm because the recorded signals exhibit oscillatory, repetitive behavior. The diversity of EEG rhythms is enormous and
depends, among many other things, on the mental state of the subject, such as the degree of attentiveness, waking, and sleeping. The rhythms are conventionally characterized by their frequency range and relative amplitude. There are five major brain waves, distinguished by their frequency bands. Listed from low to high frequency, these are delta (δ), theta (θ), alpha (α), beta (β), and gamma (γ). The alpha and beta waves were introduced in 1929 by Berger. In 1938, Jasper and Andrews found waves above 30 Hz, which they labeled "gamma" waves. A couple of years earlier, in 1936, Walter introduced the delta rhythm to designate all frequencies below the alpha range, and he also introduced theta waves as those frequencies within the range of 4-7.5 Hz. In 1944 the definition of a theta wave was introduced by Wolter and Dovey, as reported in Sanei & Chambers (2007). Alpha waves are found over the occipital region of the brain and appear in the posterior half of the head. The normal range for the frequency of the occipital alpha rhythm in adults is usually given as 8 to 13 Hz, and it commonly appears as a sinusoidal-shaped signal. However, sometimes it may manifest itself as sharp waves. In such cases, the alpha wave consists of a negative component that appears sharp and a positive component that appears sinusoidal. In fact, this wave is very similar in morphology to the brain wave called the rolandic mu (μ) rhythm. Delta waves lie within the range of 0.5-4 Hz. These waves appear during deep sleep and have a large amplitude. They are usually not encountered in the awake, normal adult, and in that case they are indicative of, e.g., cerebral damage or brain disease. Theta waves are the electrical activity of the brain within the range of 4-7.5 Hz, and the name may have been chosen to allude to their presumed origin in the thalamic region. The theta rhythm occurs during drowsiness, in certain stages of sleep, and as consciousness slips towards drowsiness.
Theta waves are related to access to unconscious material and creative inspiration, and are associated with deep meditation. Beta waves lie within the range of 14-26 Hz and constitute a fast rhythm with low amplitude, associated with an activated cortex and observed during certain sleep stages. This rhythm is mainly present in the frontal and central regions of the scalp. Gamma waves (sometimes called fast beta waves) are those frequencies above 30 Hz (mainly up to 45 Hz), related to a state of active information processing in the cortex. The gamma rhythm during finger movement can be observed simply by using an electrode located over the sensorimotor area and connected to a high-sensitivity recording system. Other waves, with frequencies much higher than the normal activity range of the EEG, have been found in the range of 200-300 Hz. These frequencies are localized in cerebellar structures of animals, but they do not play any role in clinical neurophysiology. Most of the above rhythms may persist for up to several minutes, while others, such as the gamma rhythm, occur only for a few seconds. Fig. 3 shows typical normal brain waves. There are also less common rhythms introduced by researchers, such as Phi (ϕ), Kappa (κ), Sigma (σ), Tau (τ), Chi (χ) and Lambda (λ), and transient waveforms associated with two sleep states, commonly referred to as non-REM and REM (Rapid Eye Movement) sleep: vertex waves, sleep spindles, and K complexes, described in Sanei & Chambers (2007); Sörnmo & Laguna (2005).
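The frequency bands quoted above lend themselves to a simple numerical illustration. The following is a minimal sketch, not a method taken from this chapter: it sums a plain FFT periodogram inside each band and reports the band with the largest power. The function names `band_powers` and `dominant_band`, the sampling rate and the synthetic test signal are all assumptions made for the example, and the 45 Hz upper limit for gamma follows the "mainly up to 45 Hz" remark.

```python
import numpy as np

# Band edges in Hz, copied from the ranges quoted in the text.
# The 45 Hz upper gamma limit is an assumption based on "mainly up to 45 Hz".
BANDS = {
    "delta": (0.5, 4.0),
    "theta": (4.0, 7.5),
    "alpha": (8.0, 13.0),
    "beta": (14.0, 26.0),
    "gamma": (30.0, 45.0),
}


def band_powers(x, fs):
    """Return the periodogram power of x (sampled at fs Hz) in each band."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()  # remove the DC offset before the FFT
    spectrum = np.abs(np.fft.rfft(x)) ** 2 / (fs * x.size)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    powers = {}
    for name, (low, high) in BANDS.items():
        mask = (freqs >= low) & (freqs < high)
        powers[name] = float(spectrum[mask].sum())
    return powers


def dominant_band(x, fs):
    """Return the name of the band that carries the most power."""
    powers = band_powers(x, fs)
    return max(powers, key=powers.get)


# Synthetic check: a pure 10 Hz sinusoid falls in the alpha band.
fs = 256.0                       # hypothetical sampling rate in Hz
t = np.arange(0, 4.0, 1.0 / fs)  # 4 s of data
signal = np.sin(2 * np.pi * 10.0 * t)
print(dominant_band(signal, fs))
```

A raw periodogram is used only to keep the sketch short; real EEG would normally call for an averaged or windowed spectral estimate and for artifact handling before any band is interpreted.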
Fig. 3. Typical normal brain waves in the EEG
It is often difficult to understand and detect the brain rhythms and waves from scalp EEGs, even with trained eyes. New applications of advanced signal processing tools, however, should enable the analysis and separation of the desired waveforms from the EEGs. Definitions such as foreground and background EEG are very subjective and depend entirely on the abnormalities and applications under study. It is possibly more useful to divide the EEG signal into two general categories: the spontaneous brain activity (the "background EEG"), and brain potentials which are evoked by various sensory and cognitive stimuli (evoked potentials, EPs).
3.2 Artifacts
Analysis of EEG activity usually raises the problem of differentiating between genuine EEG activity and activity introduced through a variety of external influences. These artifacts may affect the outcome of the EEG recording. Artifacts originate from a variety of sources such as eye movements, the heart, muscles and line power. Their recognition, identification, and eventual elimination are a primary responsibility of the EEG expert. Even the most experienced neurophysiologist cannot always eliminate all artifacts in EEG records. However, it is always a major goal to identify the artifactual activity, to be sure that it is not of cerebral origin, and to avoid misinterpreting it as such. Following Ebersole & Pedley (2003); Fisch (1999), artifacts are generally divided into two groups: physiological and non-physiological. Physiological artifacts usually arise from generator sources within the body but not necessarily the brain, for example eye movements, electrocardiographic and electromyographic artifacts, galvanic skin response and so on. Biological generators present in the body may produce artifacts when an EEG recording is made directly from the surface of the brain. Non-physiological artifacts come from a variety of sources such as instrumental and digital artifacts (electronic components, line power, inductance, etc.), electrode artifacts, the environment, etc. As technology expands and additional equipment is developed and put into clinical use, novel artifacts will appear. A correct artifact filtering strategy should therefore, on the one hand, minimize the amount of information that has to be eliminated and, on the other hand, ensure that the resulting information is not affected by undetected artifacts. Visual inspection can be a good alternative in cases where the artifacts are relatively easily detected by EEG experts. However, there is the possibility that, during the analysis of EEG databases, patterns caused by artifacts lead to serious misinterpretation and thus reduce the clinical usefulness of the EEG recordings.
3.3 Abnormal EEG patterns
Any variation in the EEG patterns expected for a certain state of the subject indicates abnormality. This may be due to many causes, such as distortion and loss of normal patterns, increased occurrence of abnormal patterns, or disappearance of all patterns. In most abnormal EEGs, the abnormal patterns do not entirely replace normal activity: they appear only intermittently, only in certain head regions, or only superimposed on a normal background. An EEG is considered abnormal if it contains (a) generalized intermittent slow wave abnormalities, commonly in the delta wave range and associated with brain dysfunctions, (b) bilateral persistent EEG, often associated with impaired conscious cerebral reactions, and (c) focal persistent EEG, usually associated with focal cerebral disturbance. Classifying an EEG into these three categories is not easy, and the classification needs to be extended to take into account several neurological diseases and any other available information. A precise characterization of the abnormal patterns leads to a clearer insight into some specific neurodegenerative diseases such as epilepsy, Parkinson's disease, Alzheimer's disease, dementia and sleep disorders, or into specific disease processes, for example Creutzfeldt-Jakob disease (CJD), as described in Sanei & Chambers (2007). However, following Fisch (1999), recent studies have demonstrated that there is a correlation between abnormal EEG patterns, general cerebral pathology and specific neurological diseases.
4. Modelling and segmentation
4.1 Modelling the EEG signals
Modelling brain activity is a harder task than modelling the activity of any other organ. The early literature on EEG signal generation includes physical models, such as the model proposed by Hodgkin and Huxley, and linear models, such as autoregressive (AR) modelling, AR moving average (ARMA), multivariate AR (MVAR), Prony methods and so on. There are also methods based on nonlinear models, such as generalized autoregressive conditional heteroskedasticity (GARCH), Wiener modeling and the local EEG model method (LEM). More details about the methods described above can be found in Celka & Colditz (2002); Sanei & Chambers (2007). Following Senhadji & Wendling (2002), another model relates a sampled EEG signal X(n) to relevant activities such as elementary waves, background activity, noise and artifacts:

$$X(n) = F(n) + \sum_{i=1}^{n_p} P_i(n - t_{p_i}) + \sum_{j=1}^{n_a} R_j(n - t_{a_j}) + B(n) \qquad (1)$$
where F(n) is the background activity; the P_i terms represent brief-duration potentials corresponding to abnormal neural discharges; the R_j terms are related to artifacts (discussed in Section 3.2); and B(n) is the measurement noise, which is modeled as a stationary
process. This model represents all the EEG information, including the abnormal EEG signal. It is a mathematical model rather than an EEG signal generation model, but it facilitates the manipulation of the concepts introduced in the next sections.
4.2 Signal segmentation
Signal segmentation is a process that divides the EEG signal into segments of similar characteristics that are particularly meaningful for EEG analysis. Traditional techniques of signal analysis, for example spectrum estimation techniques, assume time-invariant signals, but in practice this assumption does not hold because the signals are time-varying and parameters such as amplitude, frequency and phase change over time. Furthermore, the presence of short-time events in the signal causes a nonstationarity effect. Non-stationary phenomena are present in the EEG usually in the form of transient events, such as sharp waves, spikes or spike-wave discharges, which are characteristic of the epileptic EEG, or as an alternation of relatively homogeneous intervals (segments) with different statistical features (e.g., with different amplitude or variance). The transient phenomena have specific patterns which are relatively easy to identify by visual inspection in most cases, whereas the identification of the homogeneous segments of EEG, known as quasi-stationary segments, requires a certain theoretical basis. Usually each quasi-stationary segment is considered statistically stationary, with similar time and frequency statistics. This eventually leads to a dissimilarity measure, denoted d(m), between adjacent EEG frames, where m is an integer indexing the frame and the difference is calculated between the mth and (m − 1)th (consecutive) signal frames. There are different dissimilarity measures, based on autocorrelation, high-order statistics, spectral error, autoregressive (AR) modelling and so on, presented in Sanei & Chambers (2007). These methods are effective in EEG analysis but may not be efficient for the detection of certain abnormalities, because it is impossible to obtain completely stationary segments.
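As an illustration of a dissimilarity measure d(m) between consecutive frames, the sketch below uses a simple spectral-error variant. It is an assumption made for the example, not the specific definition given in Sanei & Chambers (2007): each frame's power spectrum is normalised to unit sum and d(m) is the summed absolute difference between the spectra of frames m and m − 1, so that d(m) ranges from 0 (identical spectral shape) to 2 (no spectral overlap). The names `frame_signal` and `spectral_dissimilarity`, the frame length and the test signal are hypothetical.

```python
import numpy as np


def frame_signal(x, frame_len):
    """Split x into consecutive, non-overlapping frames of frame_len samples."""
    x = np.asarray(x, dtype=float)
    n_frames = x.size // frame_len
    return x[: n_frames * frame_len].reshape(n_frames, frame_len)


def spectral_dissimilarity(x, frame_len):
    """Return d(m) for m = 1..M-1, comparing frame m with frame m-1.

    Assumed definition: L1 distance between unit-sum power spectra.
    """
    frames = frame_signal(x, frame_len)
    window = np.hanning(frame_len)  # taper to limit spectral leakage
    spectra = np.abs(np.fft.rfft(frames * window, axis=1)) ** 2
    # Normalise each spectrum; the small constant guards against silent frames.
    spectra = spectra / (spectra.sum(axis=1, keepdims=True) + 1e-12)
    return np.abs(spectra[1:] - spectra[:-1]).sum(axis=1)


# Synthetic signal: 4 s of a 10 Hz rhythm followed by 4 s of a 3 Hz rhythm.
fs = 128                                 # hypothetical sampling rate in Hz
t = np.arange(8 * fs) / fs
x = np.where(t < 4.0, np.sin(2 * np.pi * 10.0 * t), np.sin(2 * np.pi * 3.0 * t))

d = spectral_dissimilarity(x, frame_len=fs)   # one-second frames
boundary_frame = int(np.argmax(d)) + 1        # d[0] corresponds to m = 1
print(boundary_frame)
```

In this toy case the largest d(m) marks the frame at which the rhythm changes. On real recordings a threshold on d(m) would have to be chosen, and the result depends on the frame length, which is one reason segmentation-free approaches are also considered.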
It is then necessary to take into account a different group of methods, such as the time-frequency distributions (TFDs), which are potentially useful for detecting and analyzing non-stationary EEG signals and in which segmentation does not play a fundamental role.
4.3 Denoising and filtering
Biomedical signals in general, and EEG signals in particular, are subject to noise and artifacts which are introduced through a variety of external influences. These undesired signals may affect the outcome of the recording procedure, so a method is needed that appropriately eliminates them without altering the original brain waves. EEG denoising methods try to reject artifacts originating in the brain or body, such as ocular movements, muscle artifacts, ECG, etc. Filtering is a signal processing operation whose objective is to process a signal in order to manipulate the information contained in it. In other words, a filter is a device that maps an input signal to an output signal, facilitating the extraction (or elimination) of the information (or noise) contained in the input signal. In our context, the filtering process is oriented towards eliminating the electrical noise generated by the power line or extracting certain frequency bands.
4.3.1 Lowpass filtering
EEG signals most frequently contain neuronal information below 100 Hz; epileptic waves, for example, lie below 30 Hz. It is therefore possible to remove the frequency components above these values simply by using lowpass filters. In cases where the EEG data acquisition system is unable to remove electrical noise such as the 50 or 60 Hz line frequency, it is necessary to use a notch filter to remove it. Although digital filters could introduce nonlinearities or distortions into the signal in both amplitude and phase, there are digital EEG processing techniques that allow these distortions to be corrected using commercial hardware devices. It would be preferable to know the characteristics of the internal and external noise sources that affect the EEG signals, but this information is usually not available.
4.4 Independent component analysis (ICA)
ICA is of interest to scientists and engineers because it is a mathematical tool able to reveal the driving forces which underlie a set of observed phenomena. These phenomena may well be the firing of a set of neurons, mobile phone signals, brain images such as fMRI, stock prices, voices, etc. In each case, a set of complex signals is measured, and it is known that each measured signal depends on several distinct underlying factors, which provide the driving forces behind the changes in the measured signals. These factors or source signals (which are of primary interest) are buried within a large set of measured signals or signal mixtures. Following Stone (2004), ICA can be used to extract the source signals underlying a set of measured signal mixtures. ICA belongs to a class of blind source separation (BSS) methods for estimating or separating data into underlying informational components. The term "blind" is intended to imply that such methods can separate data into source signals using only the information in their mixtures observed at the recording channels. BSS in acoustics is well explained by the "cocktail party problem," which aims to separate individual sounds from a number of recordings made in an uncontrolled environment such as a cocktail party. Simply knowing that each voice is statistically unrelated to the others suggests a strategy for separating individual voices from mixtures of voices. The property of being unrelated is of fundamental importance, because it can be generalized to separate not only mixtures of sounds but also mixtures of other kinds of signals, such as biomedical signals, images, radio signals and so on. The informal notion of unrelated signals can be associated with the more precise concept of statistical independence. If two or more signals are statistically independent of each other, then the value of one signal provides no information regarding the value of the other signals.
ICA works under this assumption and this concept plays a crucial role in separating and denoising the signals.
4.4.1 ICA fundamentals
The basic BSS problem that ICA attempts to solve assumes that a set of m measured data points at time instant t, x(t) = [x1(t), x2(t), ..., xm(t)]^T, is a combination of n unknown underlying sources s(t) = [s1(t), s2(t), ..., sn(t)]^T. The combination of the sources is generally assumed to be linear and fixed, and the mixing matrix describing the linear combination of s(t) is given by the full rank m × n matrix A such that

$$x(t) = A s(t) \qquad (2)$$
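The mixing model of Eq. 2 can be made concrete with a few lines of code. The sketch below is only an illustration under stated assumptions: the three toy sources, the random matrix `A`, the 250 Hz sampling rate and all variable names are invented for the example. `A` has one row per channel and one column per source so that the product A s(t) is defined. The last step inverts the mixing with the pseudoinverse of the known `A`; this is not ICA, which has to estimate the de-mixing matrix from the observations alone, but it shows what a perfect de-mixing matrix would achieve when there are at least as many channels as sources and no noise.

```python
import numpy as np

rng = np.random.default_rng(0)

n_sources, n_channels, n_samples = 3, 5, 1000
t = np.arange(n_samples) / 250.0   # hypothetical 250 Hz sampling rate

# Toy sources s(t), one per row: a sinusoid, a square wave and uniform noise.
s = np.vstack([
    np.sin(2 * np.pi * 10.0 * t),
    np.sign(np.sin(2 * np.pi * 3.0 * t)),
    rng.uniform(-1.0, 1.0, n_samples),
])

# Assumed mixing matrix: one row per channel, one column per source.
A = rng.normal(size=(n_channels, n_sources))

# Eq. 2 applied to every time instant at once: each column of x is x(t).
x = A @ s

# Idealised de-mixing with A known (NOT an ICA algorithm): the pseudoinverse
# of a full-column-rank A recovers the sources exactly in the noiseless case.
W_ideal = np.linalg.pinv(A)
s_hat = W_ideal @ x

print(x.shape, np.allclose(s_hat, s))
```

An actual ICA algorithm would start from `x` only and could recover the sources at best up to permutation and scaling, which is why the pseudoinverse step above should be read as a reference point rather than as a separation method.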
Fig. 4. General ICA process applied to EEG signals: the brain signal sources s1, ..., s5 are mixed into the electrode signals x1, ..., x5, from which ICA estimates the separated sources ŝ1, ..., ŝ5.
It is also generally assumed that the number of underlying sources is less than or equal to the number of measurement channels (n ≤ m). The task of the ICA algorithms is to recover the original sources s(t) from the observations x(t), and this is generally equivalent to finding a separating (de-mixing) matrix W such that

$$\hat{s}(t) = W x(t) \qquad (3)$$

given the set of observed values in x(t), where ŝ(t) are the resulting estimates of the underlying sources. This idealized representation of the ICA problem is depicted in Fig. 4. In reality, the basic mixing model of Eq. 2 is simplistic and is assumed for ease of implementation. In fact, a perfect separation of the signals requires taking into account some assumptions about the structure of the mixing process:
• Linear mixing: The first traditional assumption for ICA algorithms is that of linear mixing. A realistic model can be formulated as

$$x(t) = A s(t) + n(t) \qquad (4)$$
where A is the linear mixing matrix described earlier and n(t) is additive sensor noise corrupting the measurements x(t) (generally assumed to be i.i.d. spatially and temporally white noise, or possibly temporally colored noise), as described in James & Hesse (2005). In a biomedical signal context, linear mixing assumes (generally instantaneous) mixing of the sources using simple linear superposition of the attenuated sources at the measurement channel.
• Noiseless mixing: If the observations x(t) are noiseless (or at least the noise term n(t) is negligible), then Eq. 4 reduces to Eq. 2. Whilst this is probably less realistic in practical terms, it allows ICA algorithms to separate sources of interest even if the separated sources themselves remain contaminated by the measurement noise.
• Square mixing matrix: So far it has been assumed that the mixing matrix A may be non-square (m × n); in fact most classical ICA algorithms assume a square mixing matrix, i.e. m = n, which makes the BSS problem more tractable. From a biomedical signal analysis perspective the square-mixing assumption is sometimes less than desirable, particularly in situations where high-density measurements are made over relatively short periods of time, such as in most MEG recordings or fMRI.
• Stationary mixing: Another common assumption is that the statistics of the mixing matrix A do not change with time. In terms of biomedical signals this means that the physics of the mixing of the sources as measured by the sensors is not changing.
• Statistical independence of the sources: The most important assumption in ICA is that the sources are mutually independent. Two random variables X and Y are statistically independent if their joint probability density function is the product of their marginal density functions, p(x, y) = pX(x) pY(y). This means, for example, that independent variables are uncorrelated and have no higher order correlations. In the case of time-series data, it is assumed that each source is generated by a random process which is independent of the random processes generating the other sources.
4.5 Feature extraction
Feature extraction consists in finding a set of measurements or a block of information that clearly describes the data or an event present in a signal. These measurements, or features, are the fundamental basis for detection, classification or regression tasks in biomedical signal processing, and extracting them is one of the key steps in the data analysis process. Features constitute a new form of expressing the data; they can be binary, categorical or continuous, and can represent attributes or direct measurements of the signal. For example, features may be age, health status of the patient, family history, electrode position or EEG signal descriptors (amplitude, voltage, phase, frequency, etc.). More formally, feature extraction assumes that for N samples and D features we have an N × D feature matrix, where D is the dimension of the feature space. That is, the n-th row of the feature matrix is a one-dimensional vector x = [x1, x2, ..., xD] called the “pattern vector.” Several methods for EEG feature extraction can be found in the literature, see Guyon et al. (2006). More specifically, in EEG detection and classification scenarios, features based on power spectral density are introduced in Lehmanna et al. (2007); Lyapunov exponents are introduced in Güler & Übeyli (2007); wavelet transforms are described in Hasan (2008); Lima et al. (2009); Subasi (2007) and Xu et al. (2009); sampling techniques are used in Siuly & Wen (2009); and time-frequency analysis is presented in Boashash (2003); Guerrero-Mosquera, Trigueros, Franco & Navia-Vazquez (2010); Tzallas et al. (2009) and Boashash & Mesbah (2001). Another approach to feature extraction, based on the fractional Fourier transform, is described in Guerrero-Mosquera, Verleysen & Navia-Vazquez (2010).
It is important to add that the extracted features depend directly on the application, and that several properties of these features have to be taken into account, such as noise, dimensionality, time information, nonstationarity, set size and so on (Lotte et al. (2007)).
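To make the N × D feature matrix concrete, the following is a minimal sketch that maps a few synthetic signal segments to relative band-power features. The band edges, the 128 Hz sampling rate, the synthetic sinusoidal "epochs" and the function name `band_power_features` are all assumptions chosen for illustration; they are not taken from the studies cited above.

```python
import numpy as np

# Conventional EEG band edges in Hz (an assumption of this sketch).
BANDS = {"delta": (0.5, 4.0), "theta": (4.0, 8.0),
         "alpha": (8.0, 13.0), "beta": (13.0, 30.0)}

def band_power_features(epochs, fs):
    """Map N epochs (one per row) to an N x D matrix of relative band powers."""
    epochs = np.asarray(epochs, dtype=float)
    freqs = np.fft.rfftfreq(epochs.shape[1], d=1.0 / fs)
    power = np.abs(np.fft.rfft(epochs, axis=1)) ** 2
    # Normalise by the power inside the overall 0.5-30 Hz range.
    in_range = (freqs >= 0.5) & (freqs < 30.0)
    total = power[:, in_range].sum(axis=1)
    columns = []
    for low, high in BANDS.values():
        mask = (freqs >= low) & (freqs < high)
        columns.append(power[:, mask].sum(axis=1) / total)
    return np.column_stack(columns)

fs = 128.0                                  # hypothetical sampling rate
t = np.arange(0, 2.0, 1.0 / fs)             # 2-second epochs
rng = np.random.default_rng(0)
epochs = np.stack([
    np.sin(2 * np.pi * 10.0 * t) + 0.1 * rng.standard_normal(t.size),
    np.sin(2 * np.pi * 2.0 * t) + 0.1 * rng.standard_normal(t.size),
    np.sin(2 * np.pi * 20.0 * t) + 0.1 * rng.standard_normal(t.size),
])

features = band_power_features(epochs, fs)  # N = 3 samples, D = 4 features
pattern_vector = features[0]                # one row is one pattern vector
```

Each row of `features` is a pattern vector x = [x1, ..., xD] with D = 4; in practice further descriptors (time-frequency, wavelet or fractional Fourier features) would be appended as extra columns.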
This section emphasizes methods oriented to frequency analysis, without excluding the time domain, which helps to justify the importance of frequency analysis and to show its shortcomings when dealing with nonstationary signals like the EEG.

4.5.1 Classical signal analysis tools

A signal can be represented in different forms, for example in time and in frequency. While the time domain indicates how a signal changes over time, the frequency domain indicates how often such changes take place. For example, let us consider a signal with a linear frequency modulation varying from 0 to 0.5 Hz and with constant amplitude (see Fig.5). Looking at the time domain representation (Fig.5 upper) it is not easy to say what kind of modulation is contained in the signal; and from the frequency domain representation (Fig.5 bottom), nothing can be said about the evolution in time of the frequency domain characteristics of the signal.
[Figure 5: upper panel "Signal in time" (amplitude versus time [s]); lower panel "Signal in frequency" (magnitude versus normalized frequency, −0.5 to 0.5).]
Fig. 5. Chirp signal using time domain (upper) and frequency domain (bottom).

The two representations are related by the Fourier transform (FT) as:

$$X(\omega) = \int_{-\infty}^{\infty} x(t)\, e^{-j\omega t}\, dt \tag{5}$$

or by the inverse Fourier transform (IFT) as:

$$x(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} X(\omega)\, e^{j\omega t}\, d\omega \tag{6}$$
Eq.6 indicates that signal x (t) can be expressed as the sum of complex exponentials of different frequencies, whose amplitudes are the complex quantities X (ω ) defined by Eq.5.
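The limitation mentioned for the chirp can be checked numerically. The sketch below uses the discrete Fourier transform as a sampled stand-in for Eq.5 and Eq.6; the signal length and the variable names are assumptions made for illustration. It shows that an up-chirp and its time-reversed version (a down-chirp) share exactly the same magnitude spectrum, so the magnitude spectrum alone cannot tell in which order the frequencies occur.

```python
import numpy as np

n_samples = 256
n = np.arange(n_samples)
sweep_rate = 0.5 / n_samples                 # frequency increase per sample
# Linear chirp whose instantaneous frequency sweep_rate * n goes from 0 to 0.5
# (in cycles per sample).
chirp = np.cos(np.pi * sweep_rate * n ** 2)

spectrum = np.fft.fft(chirp)                 # discrete counterpart of Eq.5
reconstructed = np.fft.ifft(spectrum).real   # discrete counterpart of Eq.6

# Energy is the same in both domains (Parseval's relation for the DFT).
energy_time = np.sum(chirp ** 2)
energy_freq = np.sum(np.abs(spectrum) ** 2) / n_samples

# Time-reversing the chirp changes its time evolution but not |X|.
down_chirp = chirp[::-1]
same_magnitude = np.allclose(np.abs(np.fft.fft(down_chirp)), np.abs(spectrum))
```

Here `same_magnitude` evaluates to true even though the two signals evolve in opposite directions in time, which is the motivation for the joint time-frequency representations discussed next.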
The squared magnitude of the Fourier transform, |X(ω)|², is often taken as the frequency representation of the signal x(t), which in some sense allows an easier interpretation of the nature of the signal than its time representation. A better interpretation is obtained using a domain that directly represents frequency content while still keeping time as a description parameter. This is the aim of time-frequency analysis. To illustrate this, let us represent the chirp signal described above using the spectrogram (more details are given in the following). Note how it is possible to see the linear progression with time of the frequency components, from 0 to 0.5 (Fig.6).
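A spectrogram of this kind can be sketched as the squared magnitude of a short-time Fourier transform. The code below is a minimal NumPy illustration under assumed settings (a 32-sample Hann window, a hop of 4 samples and 64 frequency bins, loosely inspired by the Lh and Nf values shown in Fig.6); it is not the routine used to produce the figure, and the function name `spectrogram` is hypothetical.

```python
import numpy as np

def spectrogram(x, window_length=32, hop=4, n_freq=64):
    """Squared magnitude of a short-time Fourier transform (Hann window)."""
    window = np.hanning(window_length)
    starts = np.arange(0, len(x) - window_length + 1, hop)
    frames = np.stack([x[s:s + window_length] * window for s in starts])
    power = np.abs(np.fft.rfft(frames, n=n_freq, axis=1)) ** 2
    freqs = np.fft.rfftfreq(n_freq)           # cycles per sample, 0 to 0.5
    times = starts + window_length / 2.0      # frame centres, in samples
    return power.T, freqs, times              # rows: frequency, columns: time

n_samples = 256
n = np.arange(n_samples)
sweep_rate = 0.5 / n_samples
chirp = np.cos(np.pi * sweep_rate * n ** 2)   # frequency sweeps from 0 to 0.5

power, freqs, times = spectrogram(chirp)
# Frequency of maximum power in each frame: follows the linear sweep.
ridge = freqs[np.argmax(power, axis=0)]
```

The `ridge` array grows approximately linearly with `times`, which is the diagonal line visible in a spectrogram of the chirp. The window length trades time resolution against frequency resolution, so its value should be treated as a tunable assumption.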
[Figure 6: upper panel "Signal in time" (real part); lower panel spectrogram "SP, Lh=16, Nf=64, lin. scale, imagesc, Threshold=5%" (frequency [Hz] versus time [s]).]

Fig. 6. Spectrogram representation of the chirp

4.5.2 Time-frequency distributions (TFD)
In a series of papers (Akay (1996); Cohen (1995)), Cohen generalized the definition of time-frequency distributions (TFDs) in such a way that a wide variety of distributions could be included in the same framework. Specifically, the TFD of a real signal x(t) is computed as:

$$P(t,\omega) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} A(\theta,\tau)\,\Phi(\theta,\tau)\, e^{-j\theta t - j\omega\tau}\, d\theta\, d\tau \tag{7}$$

where

$$A(\theta,\tau) = \frac{1}{2\pi}\int_{-\infty}^{\infty} x\!\left(u+\frac{\tau}{2}\right) x^{*}\!\left(u-\frac{\tau}{2}\right) e^{j\theta u}\, du \tag{8}$$

is the so-called ambiguity function and the weighting function Φ(θ, τ) is the kernel of the distribution that, in general, may depend on time and frequency.
If Φ(θ, τ) = 1 in Eq.(7), we have

$$P(t,\omega) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x\!\left(u+\frac{\tau}{2}\right) x^{*}\!\left(u-\frac{\tau}{2}\right) e^{-j\omega\tau} \left[\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-j\theta(t-u)}\, d\theta\right] du\, d\tau \tag{9}$$

where

$$\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-j\theta(t-u)}\, d\theta = \delta(t-u) \tag{10}$$

and we know that

$$\int_{-\infty}^{\infty} x\!\left(u+\frac{\tau}{2}\right) x^{*}\!\left(u-\frac{\tau}{2}\right) \delta(t-u)\, du = x\!\left(t+\frac{\tau}{2}\right) x^{*}\!\left(t-\frac{\tau}{2}\right) \tag{11}$$

If we substitute Eq.(10) and Eq.(11) in Eq.(9), then we have the Wigner-Ville distribution (WV), defined as:

$$WV(t,\omega) = \frac{1}{2\pi}\int_{-\infty}^{\infty} x\!\left(t+\frac{\tau}{2}\right) x^{*}\!\left(t-\frac{\tau}{2}\right) e^{-j\omega\tau}\, d\tau \tag{12}$$

Following Hammond & White (1996), the recurrent problem of the WV is the so-called cross-term interference, due to the bilinear nature of its definition. These cross terms tend to be located mid-way between the two auto terms and are oscillatory in nature. The Smoothed Pseudo Wigner-Ville (SPWV) distribution is obtained by convolving WV(t, ω) with a two-dimensional filter in t and ω. This transform incorporates smoothing by independent windows in time and frequency, namely Ww(τ) and Wt(t):

$$SPWV(t,\omega) = \int_{-\infty}^{\infty} W_w(\tau) \left[\int_{-\infty}^{\infty} W_t(u-t)\, x\!\left(u+\frac{\tau}{2}\right) x^{*}\!\left(u-\frac{\tau}{2}\right) du\right] e^{-j\omega\tau}\, d\tau \tag{13}$$
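As an illustration of how Eq.(12) and Eq.(13) can be evaluated on sampled data, the following is a minimal sketch of a discrete pseudo Wigner-Ville distribution. It makes several assumptions that are not part of the text above: only a lag window (playing the role of Ww(τ)) is applied, with no time smoothing (Wt reduced to a delta); normalization constants are ignored; the lag window is a Hamming window of arbitrary length; and the signal is first converted to its analytic form, a step the text recommends later in this section. The function names are hypothetical.

```python
import numpy as np

def analytic_signal(x):
    """Return x + j*HT{x} by zeroing the negative-frequency half of the FFT."""
    x = np.asarray(x, dtype=float)
    n = x.size
    weights = np.zeros(n)
    weights[0] = 1.0                          # keep the DC term once
    if n % 2 == 0:
        weights[n // 2] = 1.0                 # keep the Nyquist term once
        weights[1:n // 2] = 2.0               # double the positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * weights)

def pseudo_wigner_ville(x, n_freq=128, half_window=31):
    """Discrete pseudo Wigner-Ville distribution (lag-windowed Eq.(12))."""
    assert n_freq > 2 * half_window, "n_freq must exceed the lag window length"
    z = analytic_signal(x)
    n = z.size
    lag_window = np.hamming(2 * half_window + 1)
    tfd = np.zeros((n_freq, n))
    for t in range(n):
        max_lag = min(t, n - 1 - t, half_window)
        m = np.arange(-max_lag, max_lag + 1)
        kernel = np.zeros(n_freq, dtype=complex)
        # Local autocorrelation z(t+m) z*(t-m), tapered by the lag window.
        kernel[m % n_freq] = (lag_window[m + half_window]
                              * z[t + m] * np.conj(z[t - m]))
        tfd[:, t] = np.fft.fft(kernel).real   # real by Hermitian symmetry in m
    # A lag step m corresponds to tau = 2m, hence the factor 2 on the axis.
    freqs = np.arange(n_freq) / (2.0 * n_freq)
    return tfd, freqs

n_samples = 256
n = np.arange(n_samples)
chirp = np.cos(np.pi * (0.5 / n_samples) * n ** 2)   # sweep from 0 to 0.5

tfd, freqs = pseudo_wigner_ville(chirp)
ridge_mid = freqs[np.argmax(tfd[:, n_samples // 2])]  # close to 0.25
```

For a single linear chirp the distribution concentrates along the instantaneous frequency; with multicomponent signals such as the EEG, cross terms appear between components, which is why the additional time smoothing of Eq.(13) is normally added.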
Eq.(13) provides great flexibility in the choice of time and frequency smoothing, but the length of the windows should be determined empirically according to the type of signal analyzed and the required cross-term suppression, as discussed in Afonso & Tompkins (1995). As proved in Hlawatsch & Boudreaux-Bartels (1992), the SPWV in Eq.(13) does not satisfy the marginal properties, that is, the frequency and time integrals of the distribution do not correspond to the instantaneous signal power and the spectral energy density, respectively. However, it is still possible for a distribution to give the correct value for the total energy without satisfying the marginals, as described in Cohen (1989; 1995). Therefore the total energy can be a good feature to detect signal events in the SPWV representation, because the energy of the EEG during a seizure is usually larger than during normal activity. TFDs offer the possibility of analyzing relatively long continuous segments of EEG data even when the dynamics of the signal are rapidly changing. Taking advantage of this, features such as ridge energy, frequency band values and so on can be extracted from the time-frequency plane. However, three considerations have to be taken into account, as presented in Cohen (1989; 1995) and Durka (1996):
- A TFD will need signals as clean as possible for good results.
- Good resolution in both time and frequency is necessary but, as the “uncertainty principle” states, it is not possible to have good resolution in both variables simultaneously.
- It is also required to eliminate the spurious information (i.e. cross-term artifacts) inherent in the TFDs.
The first consideration implies a good pre-processing stage to eliminate artifacts and noise. The second and third considerations have motivated the selection or design of TFDs: it is important and necessary to choose a suitable TFD for seizure detection in EEG signals, as well as for a correct estimation of frequencies on the time-frequency plane. Indeed, it is desirable that the TFD has both low cross terms and high resolution. Choosing a distribution depends on the information to be extracted and demands a good balance between good performance, low execution time, good resolution and few, low-amplitude cross terms. One consideration before using the TFD is to convert each EEG segment into its analytic signal for a better time-frequency analysis. The analytic signal is defined to give an identical spectrum for positive frequencies and zero for negative frequencies, and it shows an improved resolution in the time-frequency plane, as discussed in Cohen (1989). It associates a given signal x(n) with a complex-valued signal y(n) defined as y(n) = x(n) + jHT{x(n)}, where y(n) is the analytic signal and HT{.} is the Hilbert transform.

4.5.3 Wavelet coefficients
The EEG signals can be considered as a superposition of different structures occurring on different time scales at different times. As presented in Latka & Was (2003), the Wavelet Transform (WT) provides a more flexible time-frequency representation of a signal by allowing the use of variable-size windows, and can constitute the foundation of a relatively simple yet effective detection algorithm. Selection of appropriate wavelets and of the number of decomposition levels is very important in the analysis of signals using the WT. The number of decomposition levels is chosen based on the dominant frequency components of the signal. Long windows are used to obtain fine frequency resolution at low frequencies, and short windows are used to obtain fine time resolution at high frequencies. Thus, the WT gives precise frequency information at low frequencies and precise time information at high frequencies. This makes the WT suitable for EEG analysis of spike patterns or epileptic seizures.

Wavelets overcome the drawback of the fixed time-frequency resolution of the short-time Fourier transform. The WT performs a multiresolution analysis WΨx(b, a) of a signal x(n) by convolution of the mother function Ψ(n) with the signal, as given in Latka & Was (2003) and Mallat (2009):

$$W_{\Psi}x(b,a) = \sum_{n=0}^{N-1} x(n)\,\Psi^{*}\!\left(\frac{n-b}{a}\right) \tag{14}$$

where Ψ* denotes the complex conjugate of the basis function Ψ, a is the scale coefficient, b is the shift coefficient, and a, b ∈ ℝ, a ≠ 0.

In the procedure of multiresolution decomposition of a signal x(n), each stage consists of two digital filters and two downsamplers by 2. The bandwidth of each filter output is half the bandwidth of the original signal, which allows the output signals to be downsampled by two without losing any information, according to the Nyquist theorem. The downsampled signals provide the detail D1 and the approximation A1 of the signal; this procedure is described in Hasan (2008). Once the mother wavelet is fixed, it is possible to analyze the signal at every possible scale a and translation b. If the basis function Ψ(n) is orthogonal, then the original signal can be reconstructed from the resulting wavelet coefficients accurately and efficiently, without any loss of information. The Daubechies family is one of the most commonly used families of orthogonal wavelets for non-stationary EEG signals; it has good properties and allows reconstruction of the original signal from the wavelet coefficients, as described in Mallat (2009).

4.5.4 Fractional Fourier transform
Fourier analysis is undoubtedly one of the most used tools in signal processing and other scientific disciplines; this technique uses harmonics for the decomposition of signals with time-varying periodicity. Similarly, TFDs are very frequently used in signal analysis, especially when it is necessary to eliminate the windowing dependence on non-stationary signals. In 1980, Namias employed the fractional Fourier transform (FrFT) to solve partial differential equations in quantum mechanics arising from classical quadratic Hamiltonians¹. The results were later improved by McBride and Kerr, who developed an operational calculus to define the FrFT, as discussed in Tao et al. (2008). The FrFT is a new change in the representation of the signal, which is an extension of the classical Fourier transform. When the fractional order gradually increases, the FrFT of a signal can offer much more information, in a unified representation, than the classical Fourier transform, and it provides a higher concentration than TFDs, avoiding the cross-term components produced by quadratic TFDs. The FrFT has established itself as a potential tool for analyzing dynamic or time-varying signals with changes in very short time, and it can be interpreted as the representation of a signal in a neutral domain by means of a counter-clockwise rotation of the signal about the origin, with rotation angle α, in the time-frequency plane, as shown in Fig.7.

¹ A development based on a concept called fractional operations. For example, the n-th derivative of f(x) can be expressed as dⁿf(x)/dxⁿ for any positive integer n. If a derivative of non-integer order is required, e.g. the 0.5-th derivative, it is necessary to define the operator d^a f(x)/dx^a, where a can be any real value. The function [f(x)]^0.5 is the square root of the function f(x), whereas d^0.5 f(x)/dx^0.5 is the 0.5-th derivative of f(x) (a = 0.5), with (d/dx)^0.5 being the square root of the derivative operator d/dx. As can be seen, fractional operations is a concept that goes from the whole of an entity to its fractions.

[Figure 7: time-frequency plane with axes Time (t) and Frequency (w), and fractional-domain axes (u, v) rotated by the angle α: u = t cos α + w sin α, v = −t sin α + w cos α.]

Fig. 7. The relation of fractional domain (u, v) with traditional time-frequency plane (t, w) rotated by an angle α.

The FrFT with angle α of a signal x(t), denoted as Xα(u), is defined in Almeida (1994) as:

$$X_{\alpha}(u) = \int_{-\infty}^{\infty} x(t)\, K_{\alpha}(t,u)\, dt \tag{16}$$
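where Kα(t, u) is a linear kernel function, continuous in the angle α, which satisfies the basic conditions for being interpretable as a rotation in the time-frequency plane. The kernel has the following properties:

$$K_{\alpha}(t,u) = K_{\alpha}(u,t) \tag{17}$$

$$K_{-\alpha}(t,u) = K_{\alpha}^{*}(t,u) \tag{18}$$

$$K_{\alpha}(-t,u) = K_{\alpha}(t,-u) \tag{19}$$

$$\int_{-\infty}^{\infty} K_{\alpha}(t,u)\, K_{\beta}(u,z)\, du = K_{\alpha+\beta}(t,z) \tag{20}$$

$$\int_{-\infty}^{\infty} K_{\alpha}(t,u)\, K_{\alpha}^{*}(t,u')\, dt = \delta(u-u') \tag{21}$$

The FrFT is given by

$$X_{\alpha}(u) = \begin{cases} \sqrt{\dfrac{1-j\cot\alpha}{2\pi}}\; e^{j\frac{u^{2}}{2}\cot\alpha} \displaystyle\int_{-\infty}^{\infty} x(t)\, e^{j\frac{t^{2}}{2}\cot\alpha}\, e^{-jut\csc\alpha}\, dt, & \text{if } \alpha \text{ is not a multiple of } \pi \\[2ex] x(u), & \text{if } \alpha \text{ is a multiple of } 2\pi \\[1ex] x(-u), & \text{if } \alpha+\pi \text{ is a multiple of } 2\pi \end{cases}$$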
More detailed definitions, proof and further properties of the kernel can be found in Almeida (1994).
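The definition can be checked numerically with a direct quadrature of Eq.(16). The sketch below is only an illustration under stated assumptions: it uses the kernel of Almeida (1994) for angles that are not multiples of π, approximates the integral by a Riemann sum on a truncated grid (adequate only for rapidly decaying signals), and is not a fast discrete FrFT algorithm. The function names and grid settings are hypothetical. It uses the known fact that the Gaussian exp(−t²/2) is left unchanged by the FrFT for any angle, and that α = π/2 reduces to the ordinary (unitary) Fourier transform.

```python
import numpy as np

def frft_kernel(t, u, alpha):
    """FrFT kernel K_alpha(t, u) for alpha not a multiple of pi."""
    cot = 1.0 / np.tan(alpha)
    csc = 1.0 / np.sin(alpha)
    amplitude = np.sqrt((1 - 1j * cot) / (2 * np.pi))
    phase = 0.5 * (t ** 2 + u ** 2) * cot - u * t * csc
    return amplitude * np.exp(1j * phase)

def frft_quadrature(x_func, u, alpha, t_max=10.0, dt=0.01):
    """Approximate Eq.(16) by a Riemann sum over t in [-t_max, t_max]."""
    t = np.arange(-t_max, t_max + dt, dt)
    kernel = frft_kernel(t[None, :], np.asarray(u)[:, None], alpha)
    return kernel @ x_func(t) * dt

def gaussian(t):
    return np.exp(-t ** 2 / 2.0)

def shifted_gaussian(t):
    return np.exp(-(t - 1.0) ** 2 / 2.0)

u = np.linspace(-3.0, 3.0, 13)

# The Gaussian is invariant under the FrFT for any rotation angle.
gaussian_frft = frft_quadrature(gaussian, u, alpha=np.pi / 4)

# For alpha = pi/2 the FrFT is the ordinary Fourier transform, so a time
# shift of the Gaussian appears as a linear phase exp(-j*u).
shifted_ft = frft_quadrature(shifted_gaussian, u, alpha=np.pi / 2)
```

Intermediate angles between 0 and π/2 give representations that lie between the time and frequency descriptions of the signal, which is what makes the FrFT coefficients usable as features.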
In summary, the FrFT is a linear transform, continuous in the angle α, which satisfies the basic conditions for being interpretable as a rotation in the time-frequency plane.
5. The detection problem in EEG signals

Epilepsy is considered the disease with the highest prevalence among disorders of neurological origin. The recurrent and sudden incidence of seizures can lead to dangerous and possibly life-threatening situations. Since disturbance of consciousness and sudden loss of motor control often occur without any warning, the ability to predict epileptic seizures would reduce patients’ anxiety, thus considerably improving quality of life and safety. Intractable epilepsy is one of the most physically and emotionally destructive neurological disorders, affecting people of all ages. It is generally accepted that surgical resection of epileptic foci is the best solution. However, before conducting neurosurgery, it is necessary to study the presence of epileptiform activity, which is distinct from background EEG activity.

The analysis of EEG data and the extraction of information is not an easy task. EEG recordings may be contaminated by extraneous signals, both biologically generated (human body) and externally generated (power line, electrode movement, etc.). The presence of this kind of noise or “artifacts” makes it difficult to discriminate between original brain waves and noise. This problem motivates a preprocessing step to obtain clean signals before the detection task. Another important problem in EEG processing is to figure out which kind of information or “patterns” we want to extract from the signal. This procedure is known as feature extraction. The extracted features depend considerably on the method used, which is usually a transformation to another domain that permits the extraction of hidden information in the signal. Care has to be taken not to extract similar or irrelevant features that could reduce the detector performance or increase the computational load. Therefore, a feature selection procedure is also necessary to complement the feature extraction procedure.

Another important task in the medical environment, in order to diagnose, classify or detect abnormalities, is to obtain ictal and interictal patterns. This usually involves monitoring the patient for several weeks. Continuous observation or patient monitoring is a time-consuming and expensive care activity that requires specialized personnel to raise alerts about possible changes in the patient's condition. Once the information is stored, there is another equally important activity: the analysis of the EEG records. The specialists have to analyze waveforms, spectra and peaks and, based on this analysis, try to determine the pathology that the patient suffers from. Usually they use a video unit. In many instances, there are disagreements among specialists about the same record due to the subjective nature of the analysis. The introduction of new techniques and mathematical algorithms in EEG analysis can be helpful in designing new methods to support medical decision and diagnosis, thus avoiding tedious analysis of long-term records and doubts about the brain pathology that a patient suffers from. Nowadays there are many published studies on the detection of neurological diseases, but their results are very focused on private institutional databases or rely on impractical numerical methods which are difficult to implement in a hospital environment. Therefore, the design and implementation of practical and reliable detection systems are very important in hospitals. This work tries to narrow the gap that exists between EEG signal theory and practical implementation for medical practice.
5.1 Summary of previous work in epileptic detection on EEG signals
Some methods of seizure detection were based on detecting strong rhythmic movements of the patient, but these methods had a limitation: seizures do not always present strong movements. This limitation moved the detection problem towards methods based on EEG signal analysis. For example, detection of large seizure discharges in several EEG channels by amplitude discrimination was described by J.R. Ives & Woods (1974); T.L. Babb & Crandall (1974) designed an electronic circuit for seizure detection from intracranial electrodes. However, some seizures do not present EEG changes; therefore seizure detection based only on EEG analysis was not entirely reliable and it was necessary to combine it with other methods. For example, P.F. Prior & Maynard (1973) identified in the EEG signal a large increase followed by a clear decrease in amplitude, accompanied by large electromyogram (EMG) activity; A.M. Murro & Meador (1991) described a method based on spectral parameters and discriminant analysis. New alternatives for this detection problem are addressed from the point of view of pattern recognition. Gotman (1982) presented an automatic detection system based on seizure patterns. The drawback of this method is that it still needs traditional visual inspection of the patterns, which requires careful examination by a specialist. Presently, EEG epileptic detectors have evolved to include new techniques such as neural networks, non-linear models, independent component analysis (ICA), Bayesian methods, support vector machines and variance-based methods, as described in Guerrero-Mosquera, Trigueros, Franco & Navia-Vazquez (2010). Another group of methods potentially useful for detecting and analyzing non-stationary signals are time-frequency distributions (TFDs), see Cohen (1995).
These methods allow us to visualize the evolution of the frequency behavior during a non-stationary event by mapping a one-dimensional (1-D) time signal into a two-dimensional (2-D) function of time and frequency. Therefore, from the time-frequency (TF) plane it is possible to extract relevant information using methods such as peak matching, filter banks, energy estimation, etc. On the other hand, most of the detection methods proposed in the literature assume a clean EEG signal free of artifacts or noise, leaving the preprocessing problem open to any denoising algorithm such as digital filters, independent component analysis (ICA) or adaptive schemes using the electrooculogram (EOG) as reference signal, as in Guerrero-Mosquera & Navia-Vazquez (2009).

5.2 Classification algorithms for EEG signals
Unlike many theoretical approaches that solve certain problems using some model or formula, many classifiers are based on statistical learning. In such cases the system should be trained to obtain a good classifier, taking into account that, as described in Sanei & Chambers (2007), classification algorithms do not perform efficiently when:
• the number of features is high,
• there is limited execution time for a classification task,
• the classes or labels of the feature matrix are unbalanced,
• there are nonlinearities between inputs and outputs,
• the data distribution is unknown,
• there is no guarantee of convergence to the best solution (the problem is not convex or monotonic).

To date, several algorithms for EEG signal classification and detection have been proposed in the literature. For example, Multiple Signal Classification (MUSIC), combining EEG and MEG for EEG source localization, is described in Mosher & Leahy (1998); classification of patients with Alzheimer's disease using Support Vector Machines (SVM) and neural networks (NNs) is described in Lehmanna et al. (2007); Güler & Übeyli (2007) introduced the multiclass SVM for EEG. Lotte et al. (2007) describes several applications for BCI using methods such as Hidden Markov Modelling (HMM), Linear Discriminant Analysis (LDA) and fuzzy logic; Chiappa & Barber (2006) used Bayes' rule to discriminate mental tasks; detection of ERPs using SVM is described in Thusalidas et al. (2006); Fuzzy SVM (FSVM) is used in Xu et al. (2009); Fisher's discriminant is introduced in Müller et al. (2003). Applications in epilepsy classification include Artificial Neural Networks (ANN), described in Subasi (2007); the k-NN classifier and logistic regression with TFDs, used in Tzallas et al. (2009); Least Squares SVM (LS-SVM) in Siuly & Wen (2009); Learning Vector Quantization with NN (LVQ-NN), described in Hasan (2008); Mixture of Experts (ME) and Multilayer Perceptron (MLP) in Subasi (2007); and an automatic EEG signal classification using the Relevance Vector Machine (RVM), proposed by Lima et al. (2009). As shown in Guyon et al. (2006), SVM and its variants have been applied in many different classification scenarios and are a powerful approach for pattern recognition. They have proved to be a good alternative for EEG signal classification due to their high performance, good generalization and adaptability in stationary and nonstationary environments compared to other methods such as NNs.
6. Dimensionality reduction for EEG signals

After feature extraction, it is necessary to select the subset of features that presents the best performance or is most useful for the problem at hand, such as regression, classification or detection. Data acquisition in environments such as biomedical signal processing means that each problem is defined by hundreds or thousands of measurements, which leads to high dimensional data with a high computational cost. As discussed in Guyon & Elisseeff (2003), feature selection is based on the principle that choosing a smaller number of variables among the original ones leads to an easier interpretation. In fact, under the assumption that reducing the training data² might improve the performance of the task, feature selection methods also allow a better understanding and visualization of the data, together with a reduction in data measurement and storage. Feature selection can be summarized in two main tasks: choosing the relevant features and searching for the best feature subset. The first one tries to answer the question: is a feature (or subset of features) relevant for the problem? The second one tries to find the best feature subset among all the possible subsets extracted in the initial task³. The application of these two tasks to high dimensional data causes a reduction in the data dimension, a process known as dimensionality reduction. Besides feature selection, there is another set of methods known as projection methods that perform the same task but in practice could retain the problems suffered by high dimensional data, as presented in Rossi et al. (2007; 2006). Typical projection algorithms are Principal Component Analysis (PCA), Sammon’s Mapping, Kohonen maps, Linear Discriminant Analysis (LDA), Partial Least Squares (PLS) or Projection Pursuit, amongst others (see Duda et al. (2009)).

² Concept related to the fact of using a data set (also called data points, samples, patterns or observations) in order to gain knowledge or learn a task associated with desired outcomes.
³ Feature extraction and feature selection are different aspects of the pattern recognition process, and it is important to distinguish between them. The first one aims at building a good feature representation based on several measurements, and the second one tries to reduce the feature matrix by selecting the subsets of features that are most useful in given tasks.

6.1 Subset relevance assessment
This step is mainly based on a relevance criterion that has to be chosen by some measurement. The best choice for the criterion is certainly to estimate the performance of the model itself; for instance, an individual feature ranking can be appropriate in scenarios where each feature provides good performance by itself and it is possible to choose the features associated with high ranks. The idea of the “individual relevance ranking” can be clarified by the following example: Fig.8 shows a situation where the feature X2 is individually more relevant for predicting the output Y than the feature X1. Notice the importance of choosing the right features to improve the performance of a task, which in this example is the prediction of Y.

[Figure 8: two scatter plots of the output Y (vertical axis, 0 to 1) against the features X1 and X2 (horizontal axes, 0 to 1).]

Fig. 8. Simple prediction problem. The horizontal axis represents the feature and the vertical axis the output. It can be seen that feature X2 (left) is more relevant individually than feature X1 (right) in this simple prediction problem.

There are different alternatives for the relevance criterion, such as the Pearson correlation coefficient, mutual information (MI) and the wrapper methodology. Although each method has its advantages and disadvantages, mutual information has proven to be an appropriate measure in several applications such as selection of spectral variables, spectrometric nonlinear modelling and functional data classification, see Gomez-Verdejo et al. (2009); Rossi et al. (2007; 2006). Moreover, as discussed in Cover & Thomas (1991), correlation does not measure nonlinear relations among features, and the wrapper approach presents a high computational load. Furthermore, MI can be seen as a generalized correlation measure that also captures nonlinear dependence among features. The next section focuses on the well-known concept of MI and shows why this relevance criterion is applicable to feature selection.

6.2 Mutual information (MI)
Mutual information (MI) measures the relevance between a group of features X and the variable or output Y. This relationship is not necessarily linear. As described in Cover & Thomas (1991), the mutual information between two variables is the amount of uncertainty (or entropy) that is lost on one variable when the other is known, and vice versa. The variables X and Y can be multidimensional, which overcomes the drawback of correlation measurements, which are based on individual variables. Let pX(x) and pY(y) be the marginal probability density functions (pdf) of X and Y respectively, and let pX,Y(x, y) be the joint probability density function of X and Y. If X takes values in the alphabet 𝒳, the entropy of X is defined as

$$H(X) = -\sum_{x\in\mathcal{X}} p_{X}(x)\log p_{X}(x) \tag{22}$$
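The base of the logarithm determines the units in which information is measured. In particular, if the logarithm is base 2 the entropy is expressed in bits. The joint entropy H(X, Y) of a pair of discrete random variables (X, Y) with joint distribution pX,Y(x, y) is defined as

$$H(X,Y) = -\sum_{x\in\mathcal{X}}\sum_{y\in\mathcal{Y}} p_{X,Y}(x,y)\log p_{X,Y}(x,y) \tag{23}$$

and the conditional entropy of Y given X, which uses the conditional distribution pY|X(y|x) = pX,Y(x, y)/pX(x), is

$$H(Y|X) = -\sum_{x\in\mathcal{X}}\sum_{y\in\mathcal{Y}} p_{X,Y}(x,y)\log p_{Y|X}(y|x)$$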
The MI between two variables is calculated as

$$I(X,Y) = \sum_{x\in\mathcal{X}}\sum_{y\in\mathcal{Y}} p_{X,Y}(x,y)\log\frac{p_{X,Y}(x,y)}{p_{X}(x)\,p_{Y}(y)} \tag{24}$$
Eq.24 gives the relation between X and Y: when I(X, Y) is large (small), the variables are closely (not closely) related. The MI and entropy have the following relation, see Cover & Thomas (1991):

$$I(X,Y) = H(Y) - H(Y|X) \tag{25}$$

For continuous variables, the entropy and MI are defined as

$$H(X) = -\int_{-\infty}^{\infty} p_{X}(x)\log p_{X}(x)\, dx \tag{26}$$

$$I(X,Y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} p_{X,Y}(x,y)\log\frac{p_{X,Y}(x,y)}{p_{X}(x)\,p_{Y}(y)}\, dx\, dy \tag{27}$$
Note in Eq.24 and Eq.27 that it is necessary to know the exact pdf’s for estimating the MI and this is the most sensitive part in the MI estimation. Several methods have been proposed in the literature to estimate such joint densities, see Duda et al. (2009); Lotte et al. (2007).
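The quantities in Eq.22 to Eq.25 can be illustrated with a short numerical sketch. The first part evaluates the entropy and MI exactly for a small, made-up joint probability table and checks Eq.25, using the chain rule H(Y|X) = H(X, Y) − H(X). The second part uses a plain two-dimensional histogram as a crude plug-in estimate of the joint density; this is only one of many possible estimators, the bin count is an arbitrary assumption, and the estimate is biased for small samples. The synthetic features x1 and x2 and all function names are hypothetical.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability table (Eq.22, log base 2)."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]                              # 0 * log 0 is taken as 0
    return float(-np.sum(p * np.log2(p)))

def mutual_information(joint):
    """MI in bits from a joint probability table p(x, y) (Eq.24)."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1, keepdims=True)     # marginal of X (rows)
    py = joint.sum(axis=0, keepdims=True)     # marginal of Y (columns)
    product = px @ py                         # p(x) p(y) for every cell
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] / product[mask])))

def histogram_mi(x, y, bins=16):
    """Plug-in MI estimate from a 2-D histogram (a simple, biased estimator)."""
    counts, _, _ = np.histogram2d(x, y, bins=bins)
    return mutual_information(counts / counts.sum())

# Exact check of Eq.25 on a made-up 2 x 2 joint distribution.
joint = np.array([[0.4, 0.1],
                  [0.1, 0.4]])
h_y = entropy(joint.sum(axis=0))
h_y_given_x = entropy(joint) - entropy(joint.sum(axis=1))  # H(X,Y) - H(X)
mi_exact = mutual_information(joint)

# Relevance ranking on synthetic data: y depends nonlinearly on x1 only.
rng = np.random.default_rng(0)
x1 = rng.uniform(-1.0, 1.0, 5000)
x2 = rng.uniform(-1.0, 1.0, 5000)
y = x1 ** 2 + 0.05 * rng.standard_normal(5000)

pearson_x1 = abs(np.corrcoef(x1, y)[0, 1])    # near zero: relation is nonlinear
mi_x1 = histogram_mi(x1, y)                   # clearly larger than mi_x2
mi_x2 = histogram_mi(x2, y)                   # small: x2 is irrelevant
```

In this example the Pearson coefficient would rank x1 as irrelevant, whereas the MI estimate separates it clearly from x2, which illustrates why MI is preferred as a relevance criterion when the dependence may be nonlinear.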
7. Summary and conclusions
This chapter has briefly explained the fundamental concepts of the nervous system and several tools for EEG signal processing. It has reviewed visual analysis of the EEG, brain rhythms, artifacts and abnormal EEG patterns, together with EEG applications such as epilepsy detection, EEG modelling, feature extraction, and classification with methods oriented to dimensionality reduction. The chapter also provides key references for further reading in the field of EEG signal processing. Although every method has been described only briefly, the descriptions are intended to give a sound theoretical grounding in EEG processing and to make the proposed methods and their performance easier to understand.

Signal processing algorithms for EEG applications have specific requirements in filtering, feature extraction and feature selection. EEG is widely used as a diagnostic tool in clinical routine, and both analytical and practical methods continue to develop. Its simplicity, low cost and high temporal resolution keep it in use for applications such as epileptic seizure detection, sleep disorders and BCI. Future work includes the design of new methods for EEG artifact elimination, feature extraction aimed at uncovering possible hidden information, and dimensionality reduction.

In a medical environment, the steps that we follow in a classification problem are: (i) denoising and artifact removal; (ii) extraction of LFE features; (iii) detection/classification using these features and their combinations; (iv) if the results are not conclusive, addition of features from wavelets and the fractional Fourier transform; (v) detection/classification using the extended set of features and their combinations; (vi) dimensionality reduction, if necessary.

An additional potential field of research is the integration of EEG with other techniques such as fMRI. The principal drawback of the EEG is its low spatial resolution, which depends on the number of electrodes. MEG offers a temporal resolution comparable to that of EEG, and its spatial resolution is likewise limited by the number of sensors. fMRI overcomes this limitation, with a spatial resolution on the order of millimeters. The integration of these techniques is of vital importance in neuroscience studies because it could improve the detection of other brain disorders such as Alzheimer's disease, Parkinson's disease, depression or dementia.
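The staged classification procedure (i) to (v) listed in the summary can be sketched in code. The Python example below is only a minimal illustration under stated assumptions, not the implementation used in this chapter: mean removal stands in for denoising and artifact removal, mean energy and line length stand in for the LFE features, the zero-crossing rate stands in for the wavelet and fractional Fourier features, and a nearest-centroid rule with a relative distance margin stands in for both the classifier and the notion of a "conclusive" result. All function names, the margin parameter and the synthetic signals are hypothetical. Step (vi), dimensionality reduction, is omitted for brevity.

```python
import math
import random


def remove_baseline(x):
    # (i) Placeholder for denoising/artifact removal: subtract the mean.
    m = sum(x) / len(x)
    return [v - m for v in x]


def basic_features(x):
    # (ii) Placeholder first feature set: mean energy and mean line length.
    energy = sum(v * v for v in x) / len(x)
    line_length = sum(abs(b - a) for a, b in zip(x, x[1:])) / (len(x) - 1)
    return [energy, line_length]


def extra_features(x):
    # (iv) Placeholder second feature set: zero-crossing rate.
    crossings = sum(1 for a, b in zip(x, x[1:]) if a * b < 0)
    return [crossings / (len(x) - 1)]


def class_centroids(rows, labels):
    # Mean feature vector of each class.
    cents = {}
    for lab in sorted(set(labels)):
        members = [r for r, l in zip(rows, labels) if l == lab]
        cents[lab] = [sum(col) / len(members) for col in zip(*members)]
    return cents


def classify(f, cents, margin):
    # (iii)/(v) Nearest centroid; the result counts as conclusive when the
    # runner-up centroid is farther away by more than `margin` (relative).
    ranked = sorted((math.dist(f, c), lab) for lab, c in cents.items())
    (d0, best), (d1, _) = ranked[0], ranked[1]
    return best, (d1 - d0) > margin * d1


def staged_detect(epoch, train_epochs, train_labels, margin=0.2):
    clean = remove_baseline(epoch)
    train = [remove_baseline(e) for e in train_epochs]
    # Stage 1: classify with the basic features only.
    f = basic_features(clean)
    rows = [basic_features(e) for e in train]
    label, conclusive = classify(f, class_centroids(rows, train_labels), margin)
    if conclusive:
        return label, "basic"
    # Stage 2: append the extra features and classify again.
    f = f + extra_features(clean)
    rows = [r + extra_features(e) for r, e in zip(rows, train)]
    label, _ = classify(f, class_centroids(rows, train_labels), margin)
    return label, "extended"


def synthetic_epoch(kind, rng, n=256):
    # Toy signal: a 3-cycle sinusoid (larger for "seizure") plus Gaussian noise.
    amp = 5.0 if kind == "seizure" else 1.0
    return [amp * math.sin(2 * math.pi * 3 * t / n) + rng.gauss(0.0, 0.5)
            for t in range(n)]


rng = random.Random(0)
labels = ["background", "seizure"] * 10
train = [synthetic_epoch(lab, rng) for lab in labels]
print(staged_detect(synthetic_epoch("seizure", rng), train, labels))
```

The function returns the predicted label together with the stage at which the decision was taken. Because the features are not rescaled in this sketch, energy dominates the distance; a practical system would standardise the features and replace each placeholder with the corresponding method from the chapter.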
8. References
Afonso, V. & Tompkins, W. (1995). Detecting ventricular fibrillation, IEEE Engineering in Medicine and Biology 14: 152–159.
Akay, M. (1996). Detection and estimation methods for biomedical signals, Vol. 1, 1st ed., Academic Press.
Almeida, L. (1994). The fractional Fourier transform and time-frequency representation, IEEE Trans. on Signal Proc. 42: 3084–3091.
A.M. Murro, D.W. King, J. S. B. G. H. F. & Meador, K. (1991). Computerized seizure detection of complex partial seizures, Electroencephalography and Clinical Neurophysiol. 79: 330–333.
Boashash, B. (2003). Time Frequency Signal Analysis and Processing: A comprehensive reference, Vol. 1, Elsevier.
Boashash, B. & Mesbah, M. (2001). A time-frequency approach for newborn seizure detection, IEEE Eng. in Med. and Biol. Magazine 20: 54–64.
Celka, P. & Colditz, P. (2002). Nonlinear nonstationary Wiener model of infant EEG seizures, IEEE Trans. on Biomed. Eng. 49: 556–564.
Chiappa, S. & Barber, D. (2006). EEG classification using generative independent component analysis, Neurocomputing 69: 769–777.
Cohen, L. (1989). Time-frequency distributions-a review, Proceedings of the IEEE 77: 941–981.
Cohen, L. (1995). Time-Frequency Analysis, Vol. 1, Prentice Hall.
Cover, T. & Thomas, J. (1991). Elements of Information Theory, Vol. 2, Wiley.
Duda, R., Hart, P. & Stork, D. (2009). Pattern classification, Vol. 2, Elsevier.
Durka, P. (1996). Time-Frequency analysis of EEG, Thesis, Institute of Experimental Physics, Warsaw University.
Ebersole, J. & Pedley, T. (2003). Current Practice of Clinical Electroencephalography, Vol. 3, Lippincott Williams & Wilkins.
Fisch, B. J. (1999). Fisch and Spehlmann’s EEG primer: Basic principles of digital and analog EEG, Vol. 3, Springer.
Güler, I. & Übeyli, E. D. (2007). Multiclass support vector machines for EEG-signals classification, IEEE Trans. on Inf. Tech. in Biomed. 11: 117–126.
Gomez-Verdejo, V., Verleysen, M. & Fleury, J. (2009). Information-theoretic feature selection for functional data classification, Neurocomputing 72: 3580–3589.
Gotman, J. (1982). Automatic recognition of epileptic seizures in the EEG, Electroencephalography and Clinical Neurophysiol. 54: 530–540.
Guerrero-Mosquera, C. & Navia-Vazquez, A. (2009). Automatic removal of ocular artifacts from EEG data using adaptive filtering and independent component analysis, Proceedings of the 17th European Signal Processing Conference (EUSIPCO) pp. 2317–2321.
Guerrero-Mosquera, C., Trigueros, A. M., Franco, J. I. & Navia-Vazquez, A. (2010). New feature extraction approach for epileptic EEG signal detection using time-frequency distributions, Med. Biol. Eng. Comput. 48: 321–330.
Guerrero-Mosquera, C., Verleysen, M. & Navia-Vazquez, A. (2010). EEG feature selection using mutual information and support vector machine: A comparative analysis, Proceedings of the 32nd Annual EMBS International Conference pp. 4946–4949.
Guyon, I. & Elisseeff, A. (2003). An introduction to variable and feature selection, Journal of Machine Learning Research 3: 1157–1182.
Guyon, I., Gunn, S., Nikravesh, M. & Zadeh, L. (2006). Feature Extraction, Foundations and Applications, Vol. 1, Springer.
Hammond, J. & White, P. (1996). The analysis of non-stationary signals using time-frequency methods, Journal of Sound and Vibration 3: 419–447.
Hasan, O. (2008). Optimal classification of epileptic seizures in EEG using wavelet analysis and genetic algorithm, Signal Processing 88: 1858–1867.
Hlawatsch, F. & Boudreaux-Bartels, G. (1992). Linear and quadratic time-frequency signal representation, IEEE SP Magazine 9: 21–67.
James, C. & Hesse, C. (2005). Independent component analysis for biomedical signals, Physiol. Meas. 26: R15–R39.
J.R. Ives, C.J. Thompson, P. G. A. O. & Woods, J. (1974). The on-line computer detection and recording of spontaneous temporal lobe epileptic seizures from patients with implanted depth electrodes via radio telemetry link, Electroencephalography and Clinical Neurophysiol. 37: 205.
Latka, M. & Was, Z. (2003). Wavelet analysis of epileptic spikes, Physical Rev. E 67: 052902.
Lehmann, C., Koenig, T., Jelic, V., Prichep, L., John, R., Wahlund, L., Dodge, Y. & Dierks, T. (2007). Application and comparison of classification algorithms for recognition of Alzheimer's disease in electrical brain activity (EEG), Journal of Neuroscience Methods 161: 342–350.
Lima, C., Coelho, A. & Chagas, S. (2009). Automatic EEG signal classification for epilepsy diagnosis with relevance vector machines, Expert Systems with Applications 36: 10054–10059.
Lotte, F., Congedo, M., Lécuyer, A., Lamarche, F. & Arnaldi, B. (2007). A review of classification algorithms for EEG based brain-computer-interface, Journal of Neural Eng. 4: R1–R13.
Mallat, S. (2009). A wavelet tour of signal processing, Third edition: The sparse way, Vol. 3, Elsevier.
Müller, K.-R., Anderson, C. & Birch, G. (2003). Linear and nonlinear methods for brain-computer interfaces, IEEE Trans. on Neural Systems and Rehabilitation Eng. 11: 165–168.
Mosher, J. & Leahy, R. (1998). Recursive MUSIC: A framework for EEG and MEG source localization, IEEE Trans. on Biomed. Eng. 45: 1342–1354.
P.F. Prior, R. V. & Maynard, D. (1973). An EEG device for monitoring seizure discharges, Epilepsia 14: 367–372.
Rossi, F., Françoise, D., Wertz, W., Meurens, M. & Verleysen, M. (2007). Fast selection of spectral variables with B-spline compression, Chemometrics and Intelligent Laboratory Systems 86: 208–218.
Rossi, F., Lendasse, A., Françoise, D., Wertz, W. & Verleysen, M. (2006). Mutual information for the selection of relevant variables spectrometric nonlinear modelling, Chemometrics and Intelligent Laboratory Systems 80: 215–226.
Sanei, S. & Chambers, J. (2007). EEG signal processing, Vol. 1, Wiley.
Senhadji, L. & Wendling, F. (2002). Epileptic transient detection: wavelets and time-frequency approaches, Neurophysiol Clin. 32: 175–192.
Siuly, Y. & Wen, P. (2009). Classification of EEG signals using sampling techniques and least square support vector machine, Lecture Notes in Computer Science 5589: 375–382.
Sörnmo, L. & Laguna, P. (2005). Bioelectrical signal processing in cardiac and neurological applications, Vol. 1, Elsevier Academic Press.
Stone, J. (2004). Independent Component Analysis: A tutorial introduction, Vol. 1, The MIT Press.
Subasi, A. (2007). EEG signal classification using wavelet feature extraction and a mixture of expert model, Expert Systems with Applications 32: 1084–1093.
Tao, R., Deng, B., Zhang, W. & Wuang, Y. (2008). Sampling and sampling rate conversion of band limited signals in the fractional Fourier transform domain, IEEE Trans. on Signal Proc. 56: 158–171.
Thusalidas, M., Guan, C. & Wu, J. (2006). Robust classification of EEG signal for brain-computer interface, IEEE Trans. on Neural Systems and Rehabilitation Eng. 14: 24–29.
T.L. Babb, E. M. & Crandall, P. (1974). An electronic circuit for detection of EEG seizures recorded with implanted electrodes, Electroencephalography and Clinical Neurophysiol. 37: 305–308.
Tzallas, A., Tsipouras, M. & Fotiadis, D. (2009). Epileptic seizure detection in EEGs using time-frequency analysis, IEEE Trans. on Inf. Tech. in Biomed. 13: 703–710.
Xu, Q., Zhou, H., Wang, Y. & Huang, J. (2009). Fuzzy support vector machine for classification of EEG signals using wavelet-based features, Medical Eng. & Physics 31: 858–865.