A new complexity-based algorithmic procedure for electroencephalogram (EEG) segmentation

Boris Darkhovsky
Institute for Systems Analysis, Russian Academy of Sciences, Moscow, Russia
Email: [email protected]

Alexandra Piryatinska
Department of Mathematics, San Francisco State University, San Francisco, CA, USA
Email: [email protected]
Abstract—Electroencephalogram (EEG) signals are complex and non-stationary. To analyze and model EEG records, segmentation of the signal into (quasi-)stationary intervals is needed. In this paper we propose a novel approach to the segmentation problem that occurs in "long" EEG records. This approach utilizes the concept of the complexity of a continuous function. The complexity of a continuous function is defined via the fraction of the function values necessary to recover the original function by a certain fixed family of approximation methods without exceeding a given error. We applied this approach to the EEG signals of neonates to identify sleep stages. Our results showed 82-86% average agreement with the manual scoring provided by an expert pediatric neurologist.

Keywords—EEG, complexity, time series, segmentation.
I. INTRODUCTION

It is well known that EEG signals are among the most complex physical signals encountered in science. This is due to their strong non-stationarity and to the absence of modern phenomenological models for them. In medical practice one studies relatively "long" (hours, tens of minutes) EEG records as well as relatively "short" (minutes, seconds) EEG records. Due to the non-stationarity of the signals, methods based on fitting different models to a fixed segment of a "long" EEG cannot, in principle, be used to model the signal in each new segment. Therefore, segmentation of the "long" EEG into (quasi-)stationary intervals is needed. Such a segmentation allows one to find a mathematical model for each (quasi-)stationary segment. In this paper, we propose a new approach to the segmentation problem for "long" EEG records. This
approach is based on the concept of the complexity of a continuous function, which was employed previously by the authors in another context ([1]). We apply this approach to an EEG-sleep study of neonates. The level of brain dysmaturity of a neonate is difficult to assess by direct physical or cognitive examination, but dysmaturity is known to be directly related to the structure of neonatal sleep ([2]) as reflected in the non-stationary time series produced by EEG signals, which, importantly, can be collected through a noninvasive procedure. In the past, the assessment of sleep EEG structure has often been done manually by experienced clinicians. The goal of this study is to find an automated procedure for separating the sleep stages, active and quiet. The manual scoring of the active and quiet sleep stages was provided by an experienced physician. For neonates, the structure of sleep is very different from that of adults, and the sleep-stage classifications used by neurologists are also different. The EEG signal is highly variable between subjects, and the study of such signals is a complicated problem. In our previous studies ([3], [4]) we developed an algorithm for neonatal sleep stage separation, which separated the active and quiet sleep stages. A review of the literature on this problem can be found in those papers. In the present paper, we use the new complexity characteristics of a continuous function to construct an automated procedure for sleep stage separation.

II. COMPLEXITY OF A CONTINUOUS FUNCTION ON A BOUNDED SEGMENT
At the beginning of the 1980s, Kolmogorov ([5]) suggested an algorithmic approach to the notion of object "complexity". The main idea of this approach is as follows: a "complex" object requires a lot of information for its reconstruction, while for a "simple" object little information is needed. This idea is the closest one to our approach to the complexity of a continuous function defined on a compact set in a finite-dimensional space.

Without loss of generality, we assume that a continuous function $x(t)$, $t \in I$, is defined on the unit cube $I$ in the space $\mathbb{R}^k$. On the set of such functions we introduce a norm $\|\cdot\|$. To be able to compare the complexity of different functions, it is reasonable to assume that $\|x(t)\| = 1$, i.e., to consider $x(t)/\|x(t)\|$ instead of $x(t)$.

Let $Z_h$ be a $k$-dimensional grid with spacing $h$ and $I_h = I \cap Z_h$. Assume that we know only the values of $x(t)$ at the points of the set $I_h$. With what precision can we reconstruct the function $x(t)$ utilizing only this information? Suppose we are given a set $F$ of methods for approximating the function from a finite set of its values at the points of $I_h$. Consider the approximation error

$$\delta(h) = \inf_{F} \|x(t) - \hat{x}(t)\|,$$

where $\hat{x}(t)$ is the approximation of the function constructed by one of the allowable methods of approximation, and the infimum is taken over the whole set of allowable methods. It is clear that the function $\delta(h)$ must increase monotonically with increasing $h$: an increase of the grid spacing means that we discard more and more function values. Therefore, the error function $\delta(h)$ is a monotone non-decreasing positive function of its argument. If we fix a certain "acceptable" (user-specified) level of approximation uncertainty $\epsilon \geq 0$, then we can determine the fraction of the function values that can be discarded while still reconstructing the original function, via a certain fixed family of approximation methods, without exceeding the given error. Note that the error of approximation should be related to the norm of the function, but since we assume that the function is pre-normalized, $\delta(h)$ is in fact the relative error. Let

$$h^*(\epsilon) = \begin{cases} \inf\{h \leq 1 : \delta(h) > \epsilon\}, & \text{if } \{h : \delta(h) > \epsilon\} \neq \emptyset, \\ 1, & \text{if the set is empty.} \end{cases} \qquad (1)$$

Thus the value $h^*(\epsilon)$ is the minimum grid spacing such that the error of reconstructing the function from its values on the grid exceeds the given $\epsilon$. The value $(h^*(\epsilon))^k$ is the fraction (in relation to the volume of the unit cube) of the discarded function values, and $(1/h^*(\epsilon))^k$ estimates the number of points in the set $I_{h^*(\epsilon)}$. Therefore it is natural to use $1/h^*(\epsilon)$ as a measure of function complexity.

Definition 1. The number

$$S(\epsilon, F, \|\cdot\|) \stackrel{\mathrm{def}}{=} S(\epsilon) = \log \frac{1}{h^*(\epsilon)}$$

is called the $(\epsilon, F, \|\cdot\|)$-complexity of an individual function $x(t)$ (or, briefly, the $\epsilon$-complexity). In other words, the complexity of a continuous function on a segment is the (logarithmic) fraction of the function values that should be retained to reconstruct the function via a certain fixed family of approximation methods with a given margin of error.
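To make the definition concrete, here is a small worked example of our own (an illustration under stated assumptions, not part of the construction above). Take $k = 1$, the sup-norm, and let the family $F$ contain piecewise-linear interpolation over the grid nodes. For a normalized function with Lipschitz constant $L$, the interpolation error obeys $\delta(h) \leq Lh/2$, and therefore

$$h^*(\epsilon) \geq \frac{2\epsilon}{L}, \qquad S(\epsilon) = \log\frac{1}{h^*(\epsilon)} \leq \log\frac{L}{2\epsilon},$$

i.e., the $\epsilon$-complexity of such a function grows at most logarithmically as the admissible error $\epsilon$ decreases.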
III. ESTIMATION OF THE FUNCTION COMPLEXITY

Suppose we are given an array of N values of a function. Let us choose a number 0 < S < 1 and discard [SN] values from the array (here [a] denotes the integer part of a). In the next step we use the remaining [(1 − S)N] values to approximate the values of the function at all discarded points using a collection F of approximation methods, and find the best approximation (that is, the approximation with the smallest error). In our study the class F consists of piecewise-polynomial functions of different degrees. In this process, two factors have to be taken into account. First, the remaining points should be distributed relatively uniformly on I. Second, since the approximation error depends on the location of the remaining points, for the sake of the stability of the method it is expedient, for a given percentage of removed points, to choose several different selection patterns and average the corresponding minimal approximation errors over them. This averaging allows us to smooth out the unavoidable random errors in the calculations. Thus, for a given value of S we determine the value of the minimal error ϵ of the function's reconstruction.
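The procedure above can be sketched in code. The following is a minimal illustration under our own assumptions: a random, roughly uniform discard pattern and a simple local-polynomial fit standing in for the piecewise-polynomial class F; the helper name min_reconstruction_error and all parameter defaults are ours, not the paper's.

```python
import numpy as np

def min_reconstruction_error(x, S, degrees=(1, 2, 3), n_patterns=10, seed=0):
    """Sketch of the error estimate described above: discard a fraction S of
    the N sampled values, reconstruct the discarded values from the retained
    ones by local polynomial fits of several degrees, keep the best (smallest)
    error, and average over several random discard patterns."""
    rng = np.random.default_rng(seed)
    N = len(x)
    t = np.arange(N)
    n_discard = int(S * N)                      # [SN], the integer part
    errors = []
    for _ in range(n_patterns):
        discarded = rng.choice(N, size=n_discard, replace=False)
        kept = np.setdiff1d(t, discarded)       # roughly uniform over the segment
        best = np.inf
        for deg in degrees:
            pred = np.empty(n_discard)
            for j, td in enumerate(discarded):
                # fit a degree-deg polynomial on the nearest kept samples,
                # centered at the discarded point for numerical stability
                idx = kept[np.argsort(np.abs(kept - td))[: 2 * (deg + 1)]]
                coef = np.polyfit(idx - td, x[idx], deg)
                pred[j] = np.polyval(coef, 0.0)
            # relative (sup-norm) error, since the function is pre-normalized
            err = np.max(np.abs(pred - x[discarded])) / np.max(np.abs(x))
            best = min(best, err)
        errors.append(best)
    return float(np.mean(errors))
```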
Our main conjecture (see [6]) is that for functions satisfying Hölder's property, these values should be related as follows:

$$\log \epsilon = A + B \log S. \qquad (2)$$
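Given such an error estimate, conjecture (2) suggests fitting a line to the (log S, log ϵ) pairs, which is exactly the least-squares estimation described in the next paragraph. A hedged sketch, reusing the hypothetical min_reconstruction_error helper above:

```python
import numpy as np

def complexity_coefficients(x, S_grid=(0.30, 0.40, 0.50, 0.60, 0.70)):
    """Estimate (A, B) of relationship (2) by ordinary least squares on the
    pairs (log S, log eps), where eps is the minimal reconstruction error
    for the discard fraction S (previous sketch). The grid of S values is
    our illustrative choice."""
    logS = np.log(np.asarray(S_grid))
    logE = np.log([min_reconstruction_error(x, S) for S in S_grid])
    # np.polyfit returns [slope, intercept] for degree 1: log eps = A + B log S
    B, A = np.polyfit(logS, logE, 1)
    return A, B
```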
The parameters (A, B) can be estimated by the least-squares method using several values of the pairs (S, ϵ). Our simulations ([6]) with functions of one variable confirmed that relationship (2) holds reasonably well; therefore, the least-squares method is appropriate in this context. We will call the parameters (A, B) the complexity coefficients and will utilize them as a characterization of function complexity.

IV. APPLICATION OF THE FUNCTION COMPLEXITY TO EEG SEGMENTATION

Now we can apply this concept to EEG recordings. EEG recordings are collections of discrete samples. We assume that each collection of samples is the projection of a continuous function onto a grid of equidistant points embedded in some bounded segment. If the interval of observations in the EEG record is large enough (hours, or a large fraction thereof), then it can be assumed that the data are generated by different mechanisms (stochastic as well as deterministic) in different subintervals. We assume that a change of the EEG generating mechanism implies a change of the complexity of the EEG record. We further assume that the complexity of the EEG record does not change in mean within any quasi-stationary segment. Taking this into account, we can estimate the complexity coefficients using a sliding window, or split the record into disjoint segments. Then the sequences (A(t), B(t)) (see (2)), t = 1, 2, . . . , m (here m is the number of sliding windows or segments), can be used as diagnostic sequences for the change-point detection algorithm.

To detect change-points in the diagnostic sequences, in our experiments we use the non-parametric method described in [7]. This methodology is based upon two main ideas. The first relies on the observation that the detection of changes in any probabilistic characteristic can be reduced (with an arbitrary degree of accuracy) to the detection of changes in the mean value of some new diagnostic sequence constructed from the original one. The second idea of the non-parametric approach employs the following family of statistics for detection
of change-points in the mean:

$$Y_N(n, \delta) = \left[\frac{n}{N}\left(1 - \frac{n}{N}\right)\right]^{\delta} \left(\frac{1}{n}\sum_{k=1}^{n} x_k - \frac{1}{N-n}\sum_{k=n+1}^{N} x_k\right), \qquad (3)$$

where $0 \leq \delta \leq 1$, $1 \leq n \leq N-1$, and $X^N := \{x_k\}_{k=1}^{N}$ is a diagnostic sequence (see details in [7]).
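A sketch of how the family (3) might be evaluated in practice; the reduction of the decision rule to a simple argmax is our simplification (the method in [7] also tests the maximum against a threshold before declaring a change):

```python
import numpy as np

def change_point(x, delta=0.5):
    """Evaluate Y_N(n, delta) of (3) for every admissible n and return the n
    maximizing |Y_N| as the change-point estimate, together with the whole
    statistic sequence."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    n = np.arange(1, N)                 # 1 <= n <= N - 1
    c = np.cumsum(x)
    left = c[:-1] / n                   # mean of x_1 .. x_n
    right = (c[-1] - c[:-1]) / (N - n)  # mean of x_{n+1} .. x_N
    Y = ((n / N) * (1 - n / N)) ** delta * (left - right)
    n_hat = int(np.argmax(np.abs(Y))) + 1
    return n_hat, Y
```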
It is known (see [7]) that these statistics are asymptotically (as N → ∞) optimal.

Application to the EEG-sleep study. As an example of "long" records we used the data for 20 fullterm and 16 preterm healthy neonates at the 40-week post-conceptional age. Most of the fullterm neonate recordings were made on the 3rd day after birth; the preterm neonate recordings were obtained at the same post-conceptional age. Each recording collected signals on 14 EEG channels at a sampling rate of 64 Hz. Minute-by-minute manual scoring of the sleep stages by a clinician (Dr. M. Scher) was also made available. The portions of the signals corresponding to the awake state and those caused by artifacts (such as rapid physical movements) were removed from our data prior to analysis. Below we illustrate our methodology of sleep stage separation using just the single channel 11 (C4−Cz). We performed the analysis for all channels, and the channel C4−Cz gave the best results.

We carried out the analysis in three steps. In the First Step, the signal for each of the 36 neonates in our study is divided into 30-second segments, and the coefficients (A(t), B(t)) are estimated for each segment, thus providing us with what we call the diagnostic sequences. Figure 1 shows an example of the box-plots of the estimated coefficients (A(t), B(t)) over the 30-second segments of the active and quiet sleep stages, respectively, for a fullterm neonate. We can see that the complexity coefficients can be used to separate the sleep stages by their means. Notice the existence of some outliers.

In the Second Step, the change-point detection algorithm is used to find the change-points in the means of the above two parameters for each neonate. We use the sequences (A(t), B(t)) (see (2)), t = 1, 2, . . ., as the diagnostic sequences, and the non-parametric change-point detection algorithm is applied to each of them. Then the original diagnostic sequence is replaced by its local means over the intervals where the mean remains constant. We call the latter the de-noised diagnostic sequences.
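A sketch of the First and Second Steps under the stated assumptions (64 Hz sampling, 30-second disjoint segments); the helpers complexity_coefficients and change_point come from the earlier sketches, and the single-pass use of change_point is our simplification of the full multiple-change-point procedure:

```python
import numpy as np

FS, SEG_SEC = 64, 30                                # 64 Hz, 30-second segments

def diagnostic_sequences(eeg):
    """Step 1: estimate (A(t), B(t)) on disjoint 30-second segments."""
    seg_len = FS * SEG_SEC                          # 1920 samples per segment
    n_seg = len(eeg) // seg_len
    AB = [complexity_coefficients(eeg[i * seg_len:(i + 1) * seg_len])
          for i in range(n_seg)]
    return np.array(AB)                             # shape (m, 2)

def denoise(seq, change_points):
    """Step 2: replace the sequence by its local means over the intervals
    between consecutive detected change points."""
    out = np.empty_like(seq, dtype=float)
    bounds = [0, *sorted(change_points), len(seq)]
    for a, b in zip(bounds[:-1], bounds[1:]):
        out[a:b] = seq[a:b].mean()
    return out
```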
Fig. 1. Example of the box-plots of the estimated complexity coefficients (A(t), B(t)) for a fullterm neonate: coefficient A (left) and B (right) in the active and quiet sleep stages.
Figure 2 shows an example of the diagnostic sequences (dashed line) and the de-noised diagnostic sequences (solid line) for a fullterm neonate: coefficient A (top) and B (bottom).
Fig. 2. Example of the diagnostic sequences (dashed line) and the de-noised diagnostic sequences (solid line) for a fullterm neonate (EEG 88): coefficient A (top) and B (bottom).
In the Third Step, the sleep stage separation is carried out. We apply k-means cluster analysis with 2 clusters to the 2-dimensional de-noised diagnostic sequences for each neonate (for details see [3], [4]). We assign the clusters to the active and quiet sleep stages depending on whether mean(A(t_active)) < mean(A(t_quiet)) or mean(B(t_active)) < mean(B(t_quiet)). Then the results are compared with the manual scores of a clinician. Figure 3 shows the degree of agreement of the automated and manual scores for a fullterm (top) and a preterm (bottom) neonate.
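A hedged sketch of this step using scikit-learn's k-means; the cluster-to-stage assignment encodes our reading of the rule above (the cluster with the smaller mean of A is labeled active):

```python
import numpy as np
from sklearn.cluster import KMeans

def separate_stages(A_dn, B_dn):
    """Step 3: 2-means clustering of the de-noised 2-D diagnostic sequence,
    followed by a mean-based assignment of clusters to sleep stages."""
    X = np.column_stack([A_dn, B_dn])
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    mean_A = [A_dn[labels == c].mean() for c in (0, 1)]
    active_cluster = int(np.argmin(mean_A))     # smaller mean A -> active
    return np.where(labels == active_cluster, "active", "quiet")
```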
Figure 4 shows the performance (agreement percentage, AP) of our automated sleep stage separation algorithm (using just one channel) for all babies involved in our study. We observe that the mean agreement percentage (MAP) is 82.4% (standard deviation (sd) 13.5%) for fullterm neonates and 86.5% (sd = 14.5%) for preterm neonates. The median AP is 83.9% for fullterm and 92.2% for preterm neonates. We observe that 9 out of 20 fullterm neonates and 12 out of 16 preterm neonates have an AP > 85%. Notice that the average AP between manual scores provided by different experienced physicians is 85% ([8]).

V. CONCLUSIONS AND DISCUSSION

In this paper, we proposed a novel approach to the segmentation of EEG records. This approach is based on the concept of the complexity of a continuous function. We demonstrated good performance of our approach on "long" EEG records. These records came from an EEG-sleep study of neonates. The goal of the study was to automatically identify the sleep stages, active and quiet. The manual scoring of the active and quiet sleep stages was provided by an experienced physician.

In our previous studies ([3], [4]), we used delta-wave power for fullterm neonates, and delta-wave power, spectral entropy, and Hausdorff fractional dimension for preterm neonates, as diagnostic sequences in the segmentation procedure. Generally speaking, the use of these characteristics has some shortcomings. The main assumption behind their estimation is stationarity, but an EEG signal is highly non-stationary.
Fig. 3. Example of the agreement of the automated and manual scores for a fullterm neonate (EEG 88, top) and a preterm neonate (EEG 82, bottom).
Therefore we propose new characteristics, the complexity coefficients, whose estimation does not require such an assumption. Moreover, the use of these characteristics does not require any knowledge about the data-generating mechanism. Therefore, we believe that utilization of the function complexity characteristics is promising and useful. We use the complexity coefficients as diagnostic sequences to detect changes in the EEG-sleep signal; the rest of the algorithm is similar to the one utilized in our previous studies. Here only one channel was used to illustrate the method; a study utilizing a multi-channel approach will be published later. In the previous studies the best one-channel results were as follows: the mean agreement percentage (MAP) with the doctor's score was 84% (sd = 12%) for fullterm neonates and 85% (sd = 12%) for preterm neonates. Using the new complexity-based characteristics, the MAP results are as follows: 82.4% (sd = 13.5%) for fullterm neonates and 86.5% (sd = 14.5%) for preterm neonates; the median AP is 83.9% for fullterm and 92.2% for preterm neonates. These results are comparable. Specifically, the algorithm did a better job classifying the EEG signals of the preterm neonates, but this effect was not as pronounced for the fullterm neonates. We conclude that the complexity coefficients are attractive characteristics for use in neonatal EEG-sleep separation. Our results of the EEG data analysis suggest that the proposed methodology can be widely used.
Fig. 4. Agreement percentage (AP, channel 11) by neonate: fullterm (top) and preterm (bottom).
REFERENCES

[1] B. Darkhovsky, A. Kaplan, and M. Kosinov, "The estimation of complexity for the electroencephalogram in humans," in Computer Aided Control System Design, IEEE International Conference on Control Applications, Munich, 2006, pp. 301-306.

[2] M. S. Scher, B. L. Jones, D. A. Steppe, D. L. Cork, H. J. Seltman, and D. L. Banks, "Functional brain maturation in neonates as measured by EEG-sleep analyses," Clinical Neurophysiology, vol. 114, pp. 875-882, 2003.

[3] A. Piryatinska, G. Terdik, W. A. Woyczynski, K. A. Loparo, M. S. Scher, and A. Zlotnik, "Automated detection of neonate EEG sleep stages," Computer Methods and Programs in Biomedicine, vol. 95, pp. 31-46, 2009.

[4] A. Piryatinska, W. A. Woyczynski, K. A. Loparo, and M. S. Scher, "Optimal channel selection for analysis of EEG-sleep patterns of neonates," Computer Methods and Programs in Biomedicine, vol. 106, pp. 14-26, 2012.

[5] A. N. Kolmogorov, "Combinatorial foundations of information theory and probability calculations," Uspekhi Mat. Nauk, vol. 38, pp. 27-36, 1983.

[6] B. S. Darkhovsky and A. Piryatinska, "Complexity of continuous functions and segmentation of time series," in JSM Proceedings, Statistical Computing Section, Alexandria, VA: American Statistical Association, 2012 (to appear).

[7] B. E. Brodsky and B. S. Darkhovsky, Non-parametric Statistical Diagnosis: Problems and Methods. Dordrecht: Kluwer, 2000.

[8] H. Danker-Hopfe, G. Gruber, G. Klösch, J. L. Lorenzo, S. L. Himanen, B. Kemp, T. Penzel, J. Röschke, H. Dorn, A. Schlögl, E. Trenker, and G. Dorffner, "Interrater reliability between scorers from eight European sleep laboratories in subjects with different sleep disorders," Journal of Sleep Research, vol. 13, pp. 63-69, 2004.