IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, VOL. 47, NO. 6, JUNE 2000
A New and Fast Nonlinear Method for Association Analysis of Biosignals João Paulo Silva Cunha*, Member, IEEE, and Pedro Guedes de Oliveira, Senior Member, IEEE
Abstract—In this paper, we present some original theoretical aspects of a fast nonlinear association measure based on the work of Cramér. The features of this new measure (the $V$ measure) when applied to biosignals are also shown using simulated time series. A comparative study with other well-known association measures available in the literature of biosignals is presented. $V$ was found to be twice as fast and more robust to nonlinearities than the classical cross-correlation ratio ($r^2$) and more than 100 times faster than the nonlinear regression coefficient ($h^2$), presenting similar behavior in simulated nonlinear situations. This new measure is very fast and versatile. It is appropriate for nonlinear relations and usually presents a sharp peak in the association function, enabling a high degree of selectivity for maxima detection. It seems to constitute an improvement over linear methods of association, being faster and more robust to the existing nonlinearities. It can be used as an alternative to more complex nonlinear association measures when computational speed is an important feature.
Index Terms—Biosignal, fast algorithm, nonlinear association measure.
Manuscript received October 7, 1998; revised February 2, 2000. This work was supported by the Portuguese Research and Development Agency (JNICT) under Grants PMCT/C/SAU/805/90 and BD/1656/91-IA and by Instituto de Engenharia de Sistemas e Computadores (INESC). Asterisk indicates corresponding author.
*J. P. S. Cunha is with the Departamento de Electrónica e Telecomunicações/IEETA, Campus Universitário de Santiago, 3810-193 Aveiro, Portugal. He is also with the Neurophysiology Department, S. António General Hospital, 4050 Porto, Portugal (e-mail: [email protected]).
P. G. de Oliveira is with INESC Porto/Faculty of Engineering, University of Porto, 4050 Porto, Portugal.
Publisher Item Identifier S 0018-9294(00)04411-6.
I. INTRODUCTION
A handful of methods for measuring association between biosignals are available in the literature. The classical cross-correlation ratio and the coherence and phase measure [1]–[5] are perhaps the most widely used. Although linear measures are usually very fast, they can only yield unambiguous results if linear associations are present, which makes them unsuitable for many biosignal applications.
Shannon’s information theory has been the basis for several nonlinear association measures. Shannon himself proposed the “rate of transmission” coefficient in his 1948 landmark publications on this subject [6], [7], which was generalized by the Russian mathematicians Gel’fand and Yaglom in 1959 [8]. Another measure based on this theory is the average amount of mutual information (AAMI) [9], [10], which was applied to EEG signals from dogs [11]. One widely used nonlinear measure, called the nonlinear regression coefficient ($h^2$ for short), introduced by Winer in his book on statistical principles [12], was adapted
by Pijn and Lopes da Silva for biosignal analysis [13], [14] and applied to study coupling between different brain structures during epileptic seizures in kindled rats [15] and in epileptic patients [16]. In both papers, strong nonlinear relationships were reported. Other authors have presented similar results [17], indicating that nonlinearity is present in many (if not all) biosignal generating systems.
To approach these situations we can use nonlinear association measures, but they are usually complex and demand a lot of computer resources. This issue is particularly relevant when one wants to study large populations and/or long biosignal epochs. In EEG, where several channels of biosignal are acquired for long epochs (from several minutes to hours or even days), computational speed becomes a major concern. As an illustration, analyzing 1 min of 16 channels of EEG sampled at 512 Hz (all possible pairs, for a delay of $-50$ to $50$ samples between signals) takes more than four days on a Pentium-based PC.
The context just described was the trigger for our investigation into developing an association measure sensitive to nonlinearities and faster than the ones available in the literature.
II. METHOD
A. Statistical Association Measures
Statistical association measures were an important scientific issue from the beginning of the twentieth century until the mid-1970s, when the landmark book by Bishop et al. [18] was published, presenting a complete review of this theme. These measures are usually applied to contingency tables. Let us consider a generic two-dimensional $I \times J$ table with row totals $x_{i+}$, column totals $x_{+j}$, category frequencies $x_{ij}$, and a total $N$ as follows:

$$
\begin{array}{c|cccc|c}
 & 1 & 2 & \cdots & J & \text{Total} \\ \hline
1 & x_{11} & x_{12} & \cdots & x_{1J} & x_{1+} \\
2 & x_{21} & x_{22} & \cdots & x_{2J} & x_{2+} \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
I & x_{I1} & x_{I2} & \cdots & x_{IJ} & x_{I+} \\ \hline
\text{Total} & x_{+1} & x_{+2} & \cdots & x_{+J} & N
\end{array}
$$
One class of these measures is the “general association measures,” which are known to be sensitive to general associations, while the other classes are devoted to detecting specific types of association (predictive or agreement) or can only be applied over $2 \times 2$ contingency tables. They have two important and appealing characteristics that led us to investigate their application in biosignals. First, they demand very little computing power and, second, they follow well-known statistical distributions, which makes it possible to know the significance level of the results.
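The significance property mentioned above can be illustrated with a minimal sketch. It relies on the standard result that, under independence, the Pearson chi-squared statistic of an $I \times J$ table is asymptotically chi-squared distributed with $(I-1)(J-1)$ degrees of freedom. The function names `chi2_sf` and `pearson_chi2` and the example table are hypothetical and are not taken from the paper.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X2 > x) for a chi-squared law with df degrees of freedom.

    Uses the recurrence Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / gamma(a + 1)
    of the regularized upper incomplete gamma function, with h = x / 2.
    """
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)              # Q(1, h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))   # Q(1/2, h)
    while a < df / 2.0 - 1e-9:
        q += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return q


def pearson_chi2(table):
    """Return (chi-squared statistic, degrees of freedom) for a 2-D frequency table."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    total = sum(row_tot)
    rows = [i for i, t in enumerate(row_tot) if t > 0]   # skip empty rows
    cols = [j for j, t in enumerate(col_tot) if t > 0]   # skip empty columns
    chi2 = 0.0
    for i in rows:
        for j in cols:
            expected = row_tot[i] * col_tot[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2, (len(rows) - 1) * (len(cols) - 1)


# Hypothetical 2 x 2 table with a clear association between rows and columns.
example = [[30, 10], [10, 30]]
chi2, df = pearson_chi2(example)
p_value = chi2_sf(chi2, df)
print(f"X2 = {chi2:.2f}, df = {df}, p = {p_value:.2e}")
```

A small `p_value` indicates that the observed table is unlikely under independence. The closed-form recurrence is used only to keep the sketch free of external libraries.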
1) General Association Measures: All the measures of this class are based on Pearson’s coefficient of mean squared contingency $\phi^2$ [19], which is given by

$$\phi^2 = \sum_{i=1}^{I} \sum_{j=1}^{J} \frac{(p_{ij} - p_{i+}\,p_{+j})^2}{p_{i+}\,p_{+j}} \tag{1}$$

where $p_{ij}$ is the probability of category $(i, j)$ and $p_{i+}$ and $p_{+j}$ are the row and column marginal probabilities. For $I > 2$ and $J > 2$, this coefficient is not limited to the interval $[0, 1]$. In the literature, there are three different coefficients derived from $\phi^2$ that overcome this limitation, the most general being the one proposed by Cramér in 1946 [20]. It provides a simple measure that is normalized to the $[0, 1]$ interval for any $I \times J$ table and can be written as

$$V^2 = \frac{\phi^2}{\min(I - 1,\, J - 1)} \tag{2}$$

Given that the maximum likelihood estimate (MLE) of $\phi^2$ under multinomial sampling is $X^2/N$ ($X^2$ being the chi-squared value and $N$ the total of a contingency table) [18], the corresponding MLE of $V^2$ will result in

$$\hat{V}^2 = \frac{X^2}{N \min(I - 1,\, J - 1)} \tag{3}$$

This measure should be interpreted as translating the squared departure from independence onto a scale between zero and one, and it is only suited to comparing several contingency tables. It has nice properties such as symmetry, and it provides a built-in test of significance.
Algorithm: To generate the contingency tables on which we apply our measure, we developed a modified version of the algorithm used by Callaway and Harris [21] in which delay computation was introduced. Polarity and slope are used to build the classes of the contingency tables, which are generated for each delay (typically between $-50$ and $50$ samples) introduced between the time series under evaluation. $V$ is then computed over each table, and the array generated constitutes the association function, i.e., the values of $V$ as a function of the time delay between the time series.
III. RESULTS
During our theoretical study of the $V$ measure, we tried to derive relations with other association measures available in the literature. This can give us deeper insight into the properties of this “new” measure which, to our knowledge, has never been applied in biosignal processing and for which little literature is available. We derived two original relations of $V$ with other well-known measures of association.
1) Relation Between $V$ and Information Transmission Coefficient $T$: Kapur and Kesavan [22] derived a result which establishes a relationship between the $X^2$ and the concept of mutual information (MI) proposed by Shannon and Weaver in 1949 [23], which was extended and generalized by Gel’fand and Yaglom [8]. These authors used Jaynes’ maximum entropy principle to show that, for a contingency table,

$$X^2 \approx 2N\,(H_R + H_C - H_{RC}) \tag{4}$$

where $H_R$, $H_C$, and $H_{RC}$ are the entropies given by

$$H_R = -\sum_{i} \frac{x_{i+}}{N} \ln \frac{x_{i+}}{N}, \qquad H_C = -\sum_{j} \frac{x_{+j}}{N} \ln \frac{x_{+j}}{N}, \qquad H_{RC} = -\sum_{i} \sum_{j} \frac{x_{ij}}{N} \ln \frac{x_{ij}}{N}$$

with $x_{ij}$ the category frequencies, $x_{i+}$ and $x_{+j}$ the row and column totals, and $N$ the table total. This result gives an alternate interpretation to the $X^2$, where it represents approximately $2N$ times the additional information given by the knowledge of the observed frequencies over the information already provided by the row and column totals of the contingency table being analyzed.
If we consider the information transmission coefficient $T$, also proposed by Shannon, defined as

(5)

we can relate it with $V$, using (4), by

(6)

The importance of this new result resides in the indication that $V$ can also be interpreted as a mutual information measure, overcoming one of the problems of the chi-squared-based statistical measures: their lack of clear interpretation in terms of information theory.
2) Relation Between $V$ and $r$: Gel’fand and Yaglom [8] presented a result that, if we assume only the presence of linear associations, shows that the relation between $T$ and the cross-correlation $r$ is given by

(7)

Based on this result and on (6), we can write, after some simplification,

(8)

This original result can be used to compare $V$ and $r$ when linear relations are considered.
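The delay-scanning procedure described under “Algorithm” in Section II can be sketched as follows. This is a minimal illustration and not the authors’ implementation: the four-class coding (sign of the sample combined with sign of the first difference) is an assumed reading of “polarity and slope,” the function names are hypothetical, and the value returned is Cramér’s $V$ in its common square-root form, $\sqrt{X^2 / (N \min(I-1, J-1))}$.

```python
import math
import random


def classify(series):
    """Code each sample (from the second on) into one of four classes.

    Assumed coding: class = 2 * polarity + slope, where polarity is 1 for a
    non-negative sample and slope is 1 for a non-negative first difference.
    """
    classes = []
    for prev, cur in zip(series, series[1:]):
        polarity = 1 if cur >= 0 else 0
        slope = 1 if cur - prev >= 0 else 0
        classes.append(2 * polarity + slope)
    return classes


def cramers_v(pairs, n_classes=4):
    """Cramér's V of the contingency table built from (row, column) class pairs."""
    table = [[0] * n_classes for _ in range(n_classes)]
    for r, c in pairs:
        table[r][c] += 1
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    total = sum(row_tot)
    rows = [i for i, t in enumerate(row_tot) if t > 0]   # non-empty rows only
    cols = [j for j, t in enumerate(col_tot) if t > 0]   # non-empty columns only
    q = min(len(rows), len(cols)) - 1
    if total == 0 or q < 1:
        return 0.0
    chi2 = 0.0
    for i in rows:
        for j in cols:
            expected = row_tot[i] * col_tot[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    return math.sqrt(chi2 / (total * q))


def association_function(x, y, max_delay):
    """Return {delay: V}, pairing the class of x[n] with the class of y[n + delay]."""
    cx, cy = classify(x), classify(y)
    result = {}
    for delay in range(-max_delay, max_delay + 1):
        pairs = [(cx[n], cy[n + delay])
                 for n in range(len(cx))
                 if 0 <= n + delay < len(cy)]
        result[delay] = cramers_v(pairs)
    return result


# Usage: y is x delayed by 12 samples, so V should peak at delay 12.
rng = random.Random(0)
base = [rng.gauss(0.0, 1.0) for _ in range(524)]
true_delay = 12
x, y = base[true_delay:], base[:-true_delay]
assoc = association_function(x, y, max_delay=50)
best_delay = max(assoc, key=assoc.get)
print(best_delay, round(assoc[best_delay], 3))
```

Each table needs only one pass over the paired class codes plus a fixed-size sum over at most 16 cells, which is consistent with the low computational cost claimed for this family of measures.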
A. “V” Study Using Simulated Signals
To evaluate the applicability of $V$ in association studies, we performed a battery of tests using a large set of simulated signals, from white noise to sine waves, subjected to linear and nonlinear transformations such as simple delays, square root, or double rectifier. More than 20 different pairs of time series reflecting different transformations over different types of signals were used. We dedicated special attention to simulations
CUNHA AND OLIVEIRA: NEW AND FAST NONLINEAR METHOD FOR ASSOCIATION ANALYSIS OF BIOSIGNALS
Fig. 1. Examples of the behavior of V for different simulated situations.
of biosignals such as EEG. For this reason we generated two statistically independent time series of 512 samples each, SEEG1 and SEEG2. The first (SEEG1) was generated from random noise with Gaussian distribution and filtered with a low-pass fifth-order Butterworth filter with 35-Hz cutoff frequency. The second was generated by a similar procedure with uniform distribution and a different seed in the random sequence generation algorithm. A 2-ms sampling period was considered.
In Fig. 1(A), we estimated $V$ for an output equal to the input and in Fig. 1(B) for two independent signals as input and output. Values of $V$ were 100% and 23.4%, respectively. In Fig. 1(C), SEEG1 is delayed by 24 ms and $V$ is able to match the maximum peak (95.2%) for the simulated delay. In Fig. 1(D), we present the measure applied to the sum of the result of Fig. 1(C) and the independent simulated signal (SEEG2). In this case, we are in the presence of a 50% signal-to-noise ratio (SNR), which is reflected in the value obtained. In this situation, the characteristic sharp peak of $V$ is not present, giving us an indication of the measure’s behavior in the presence of this level of noise.
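The kinds of test pairs used in these simulations can be sketched as below. This is an illustrative setup under stated assumptions, not the authors’ code: a short moving average stands in for the fifth-order Butterworth low-pass filter, the noise parameters are arbitrary, and all function and variable names are hypothetical. The only figures taken from the text are the 512-sample length, the 2-ms sampling period, and the 24-ms delay.

```python
import math
import random


def smoothed_noise(n, seed, draw, width=5):
    """Band-limited noise stand-in: moving average of `width` random draws.

    The moving average is only a placeholder for the low-pass filter.
    """
    rng = random.Random(seed)
    raw = [draw(rng) for _ in range(n + width - 1)]
    return [sum(raw[i:i + width]) / width for i in range(n)]


def delayed(x, d):
    """Delay x by d samples, padding the start with zeros."""
    return [0.0] * d + x[:len(x) - d]


def signed_square(x):
    """Square each sample while keeping its sign."""
    return [v * abs(v) for v in x]


def sqrt_of_modulus(x):
    """Square root of the absolute value of each sample."""
    return [math.sqrt(abs(v)) for v in x]


def mix(x, noise):
    """Sample-wise sum of a signal and an independent signal."""
    return [a + b for a, b in zip(x, noise)]


SAMPLE_PERIOD_MS = 2
delay_samples = 24 // SAMPLE_PERIOD_MS   # a 24-ms delay is 12 samples at 2 ms

seeg1 = smoothed_noise(512, seed=1, draw=lambda r: r.gauss(0.0, 1.0))
seeg2 = smoothed_noise(512, seed=2, draw=lambda r: r.uniform(-1.0, 1.0))

delayed_seeg1 = delayed(seeg1, delay_samples)     # linear transformation
noisy = mix(delayed_seeg1, seeg2)                 # delayed signal plus independent signal
light_nonlinear = signed_square(delayed_seeg1)    # small distortion
strong_nonlinear = sqrt_of_modulus(delayed_seeg1) # sign information is lost
```

Note that the simple sum in `mix` does not control the resulting SNR; scaling one of the two signals would be needed to reproduce a specific noise level.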
Fig. 2. Association functions of T and V for a delay transformation. We can observe the similarity of both curves.
For a signed square transformation presented in Fig. 1(E), considered a “light” nonlinear transformation due to the small distortion it introduces in the original signal, and for a “strong” nonlinear one composed of the square root of the modulus of the input, presented in Fig. 1(F), $V$ presented sharp peaks and
Fig. 3. Results for simulations involving a 24-ms delay: (A) applied to time series SEEG1 and (B) applied to the sum of the result of situation A and the independent time series SEEG2.
maxima values of 83.7% and 54.3%, respectively. Apparently, the more nonlinearity involved, the lower the $V$ maximum becomes.
1) Verification of the Relation Between $V$ and $T$: To study our original result of (6) we also used simulated signals. In Fig. 2, association functions of $T$ and $V$ for time series produced by a delay [same signals presented in Fig. 1(C)] are shown. The resemblance of both curves is striking. Their maxima at the correct delay are high (87.6% and 90.6%) and differ by 3%, which we consider acceptable given the approximate nature of (6). This result is an additional indication that the signal characteristics we use to build our contingency table classes (polarity and slope) seem to be adequate.
B. Comparative Study Using Simulated Signals
After studying the behavior of the $V$ measure of association using different simulated signals, we decided to compare it with other measures from the literature using the same set of simulated signals. For this comparative test we selected a linear measure ($r^2$) and a nonlinear one ($h^2$), both presented before in this paper.
For linear transformations, such as simple delay, all three measures performed well. In most cases, one of the measures was very near 100% and always slightly greater than the other two. An example is given in Fig. 3(A), where the simulated signal SEEG1, already used before, is delayed by 24 ms. In Fig. 3(B), the 50% SNR situation of Fig. 1(D) is now used in our comparative
study. All three measures correctly reflect the simulated SNR.
In some nonlinear transformations (so-called “light nonlinear”), such as a delayed signed square [Fig. 4(A)] or a delayed signed square root of the absolute value of the input signal [Fig. 4(B)], we also obtained good behavior for all the measures, with one measure showing larger maxima than the other two. For other (more “strongly”) nonlinear transformations, $r^2$ gave poor results, $h^2$ performed well, and $V$ also presented good performance, maintaining its sharp form but with lower peak values, as can be seen in Fig. 5(A) and (B).
1) Verification of the Relation Between $V$ and $r$: To test the relation between $V$ and $r$ presented before in (8), we used the results of Fig. 3(A) and (B), which were obtained for linear transformations. In this case, the size of the contingency table is fixed by the combinations of polarity and slope from the algorithm used. Comparison was performed for the delay imposed by the transformation used (24 ms), and the results presented small difference errors (Table I). These results appear to support the approximation between $V$ and $r$ derived in (8).
2) Computational Speed Tests: As already stated, we consider computational speed an important characteristic to evaluate in our new measure. We performed a first computational speed test based on the analysis of the number and types of operations involved in each algorithm core loop as a function of the number of samples ($N$) and, in the case of $h^2$, also of an internal parameter (the number of “bins” chosen, $L$), which we present in Table II. Based
Fig. 4. Results of simulations involving “light” nonlinear transformations.
Fig. 5. Results of simulations involving “strong” nonlinear transformations.
TABLE I
ERROR BETWEEN V AND THE VALUE OF THIS MEASURE COMPUTED FROM r USING (8) ON LINEAR TRANSFORMATIONS OF FIG. 3
TABLE II
NUMBER OF OPERATIONS INVOLVED IN EACH STUDIED ALGORITHM
N: number of samples; L: number of bins for the h algorithm.
Fig. 6. Results of a comparative CPU time test for the three measures studied. A logarithmic scale is used for a better graphical display.
on these results, and assuming typical values for $N$ and $L$ and a typical number of CPU clock cycles per type of instruction involved, we can conclude that $V$ is algorithmically more than 100 times faster than $h^2$ and more than three times faster than $r^2$.
To confirm this result, we performed a second computational speed test by computing each measure over ten pairs of 512-sample segments for a range of time delays. The general structure of the programs (written in C language) was the same, the only difference being the class responsible for the measure computation. This avoided having different file streams and memory array manipulation algorithms that could introduce errors in our speed measurements. These tests were performed on a PC, and a simple CPU clock-based chronometer routine was used. The results are presented in Fig. 6. The computational efficiency of $V$ when compared to $h^2$ and $r^2$ can be clearly observed: $V$ was here about 140 times faster than $h^2$ and twice as fast as $r^2$.
IV. CONCLUSION
In this paper, we present a new nonlinear association measure and investigate its applicability to biosignals. This was done using simulated signals which emulate different relevant situations. We also studied the behavior of this measure in comparison with other methods available in the literature.
A. “V” Measure Study
From the present study we can conclude that the $V$ measure presents, in most cases (linear and nonlinear transformations), very sharp peaks at the correct simulated delay, this peak being higher for linear transformations than for nonlinear ones. We could observe that the more nonlinearity involved, the lower the value of $V$ obtained. This behavior indicates that $V$ is sensitive to nonlinearity, and the sharp peaks reveal a high degree of selectivity for maxima detection in most situations studied.
An important property of our measure is its versatility: since it is applied over contingency tables, we can measure associations between many different sets of characteristics of the signals. It may have applications for many different problems, even outside the area of biological signals.
Some theoretical study was also performed, which enabled us to find relations between $V$ and other association measures already used in biological signals. We could establish a relation between $V$ and Shannon’s concept of mutual information and, consequently, with the information transmission coefficient $T$. Using simulated signals, we could find indications that confirm this relation. It is important to point out that this measure was first introduced by the studies of Cramér from 1946 on statistical association measures [20] and that Shannon published his first results on information theory two years later, in 1948 [6], [7]. Apparently, both researchers reached a similar measure of association based on different motivations. A second relation, between $V$ and $r$, valid for linear transformations, could also be
derived. We obtained simulated results that also seem to confirm this relationship.
B. Comparative Study of Measures
Three measures were compared: the cross-correlation coefficient $r^2$, the nonlinear regression coefficient $h^2$, and $V$. A fourth measure could be indirectly studied. Pijn compared $h^2$ with AAMI extensively, concluding that $h^2$ was more robust and faster to compute than AAMI [13].
From this comparative study we found that our measure handles nonlinearities better than $r^2$ and similarly to $h^2$, although presenting smaller absolute values. Our measure was found to be more than 100 times faster than $h^2$ and twice as fast as $r^2$.
This measure appears to constitute an improvement over linear methods of association, being faster and more robust to the existing nonlinearities. It can be used as an alternative to more complex nonlinear association measures when computational speed is an important feature.
ACKNOWLEDGMENT
The authors would like to dedicate this work to Dr. Jan Pieter Pijn. They would like to thank Prof. A. Martins da Silva and Prof. F. Lopes da Silva for their continuous support and W. Blanes and Prof. D. Mendonça for their invaluable contributions in different phases of this work.
REFERENCES
[1] M. A. B. Brazier, “Interactions of deep structures during seizures in man,” in Synchronization of EEG Activity in Epilepsies, P. P. Petche and M. A. B. Brazier, Eds. Berlin, Germany: Springer-Verlag, 1972.
[2] M. A. B. Brazier, “Electrical seizures discharges within the human brain; the problem of spread,” in Epilepsy: Its Phenomena in Man, M. A. B. Brazier, Ed. San Diego, CA: Academic, 1973.
[3] J. Gotman, “Interhemispherical relations during bilateral spike and wave activity,” Epilepsia, vol. 22, pp. 453–466, 1981.
[4] J. Gotman, “Measurement of small time differences between EEG channels: Method and applications to epileptic seizure propagation,” Electroenceph. Clin. Neurophysiol., vol. 56, pp. 501–514, 1983.
[5] J. Gotman, “Interhemispheric interactions in seizures of focal onset: Data from human intracranial recordings,” Electroenceph. Clin. Neurophysiol., vol. 67, pp. 120–133, 1987.
[6] C. E. Shannon, “A mathematical theory of communication,” Bell Syst. Tech. J., vol. 27, pp. 379–423, 1948a.
[7] C. E. Shannon, “A mathematical theory of communication,” Bell Syst. Tech. J., vol. 27, pp. 623–656, 1948b.
[8] I. M. Gel’fand and A. M. Yaglom, “Calculation of the amount of information about a random function contained in another such function,” Amer. Mathematical Soc. Translations, vol. 12, pp. 199–246, 1959.
[9] N. J. I. Mars, “Time delay estimator for EEG analysis based on information theory,” in Proc. ICASSP’82, vol. 2, 1982, pp. 733–735.
[10] N. J. I. Mars and G. W. Van Arragon, “Time delay estimation in nonlinear systems using average amount of mutual information analysis,” Signal Processing, vol. 4, pp. 139–153, 1982.
[11] N. J. I. Mars and F. H. L. da Silva, “Propagation of seizure activity in kindled dogs,” Electroenceph. Clin. Neurophysiol., vol. 56, pp. 194–209, 1983.
[12] B. J. Winer, Statistical Principles in Experimental Design, 2nd ed. New York: McGraw Hill, 1971.
[13] J. P. M. Pijn, “Quantitative evaluation of EEG signals in epilepsy; nonlinear associations, time delays and nonlinear dynamics,” Ph.D. thesis, Univ. Twente, Enschede, The Netherlands, 1990.
[14] J. M. P. Pijn and F. H. L. da Silva, “Propagation of electrical activity: Non-linear associations and time delays between EEG signals,” in Basic Mechanisms of the EEG, S. Zschocke and E.-J. Speckmann, Eds. Boston, MA: Birkhäuser, 1993.
[15] V. M. F. De Lima, J. M. P. Pijn, C. N. Filipe, and F. H. Lopes da Silva, “The role of hippocampal commissures in the interhemispheric transfer of epileptiform after-discharges in the rat: A study using linear and nonlinear regression analysis,” Electroenceph. Clin. Neurophysiol., vol. 76, pp. 520–539, 1990.
[16] J. P. M. Pijn, P. C. M. Vijn, F. H. L. da Silva, and V. M. F. De Lima, “Evolution of interactions between brain structures during an epileptic seizure in a kindled rat,” in Advances in Epileptology, J. Manelis, E. Bental, J. N. Loeber, and F. E. Dreifus, Eds. New York: Raven, 1989.
[17] M. C. Casdagli, L. D. Iasemidis, R. S. Savit, R. L. Gilmore, S. N. Roper, and J. C. Sackellares, “Non-linearity in invasive EEG recordings from patients with temporal lobe epilepsy,” Electroenceph. Clin. Neurophysiol., vol. 102, pp. 98–105, 1997.
[18] Y. M. Bishop, S. E. Fienberg, and P. W. Holland, Discrete Multivariate Analysis: Theory and Practice. Cambridge, MA: MIT Press, 1975.
[19] K. Pearson, “On the theory of contingency and its relation to association and normal correlation,” in Draper’s Co. Res. Mem. Biometric Ser. I. Cambridge, U.K.: Cambridge Univ. Press, 1904. Reprint (1948) in Karl Pearson’s Early Papers.
[20] H. Cramér, Mathematical Methods of Statistics. Princeton, NJ: Princeton Univ. Press, 1946.
[21] E. Callaway and P. R. Harris, “Coupling between cortical potentials from different areas,” Science, vol. 183, pp. 873–875, 1974.
[22] J. N. Kapur and H. K. Kesavan, Entropy Optimization Principles with Applications. San Diego, CA: Academic, 1992.
[23] C. E. Shannon and W. Weaver, The Mathematical Theory of Communication. Urbana, IL: Univ. Illinois Press, 1949.
João Paulo Silva Cunha (S’87–M’90) was born in Porto, Portugal. He received the Electronics and Telecommunications Engineering Diploma in 1989 and the Ph.D. degree in electrical engineering in 1996, both from the University of Aveiro, Aveiro, Portugal. In 1996, he joined the Electronics and Telecommunications Department of the University of Aveiro, where he is currently a Professor. Previously, he had been a Research Assistant at the “Abel Salazar” Institute of Biomedical Sciences, University of Porto, Porto, Portugal. He was a Visiting Scientist at the Computational Neuroengineering Laboratory, University of Florida, Gainesville, and at the “Instituut voor Epilepsiebestrijding,” Heemstede, The Netherlands. His research has been devoted to the study of computer-based systems to support diagnosis in clinical neurophysiology. His current research interests are epileptogenic focus localization in epilepsy, chaos in biological systems, telemedicine systems, and multimedia healthcare information systems. Dr. Cunha is vice-president of the Portuguese Association for Medical Informatics and a member of the IEEE EMBS Society, the Portuguese Biomedical Engineering Society, and the Portuguese League Against Epilepsy.
Pedro Guedes de Oliveira (M’87–SM’97) was born in Porto, Portugal, where he graduated in electrical engineering. He received the Ph.D. degree from the University of Aveiro, Aveiro, Portugal, in 1981. He was a Guest Researcher in the Medical Physics Institute—TNO, in Utrecht, The Netherlands. Until 1992, he was a Professor in the Electronics and Telecommunication Department at the University of Aveiro. In the beginning of 1993, he moved to the Faculty of Engineering of the University of Porto, Porto, Portugal, where he is currently a Professor in the Electrical and Computer Engineering Department and presides over INESC Porto, a private nonprofit research and development Institute for System and Computer Engineering. His research interests include biomedical engineering, signal processing, and analog and digital electronic circuits.