Denoising, separation and localization of EEG sources in the context of epilepsy Hanna Becker
To cite this version: Hanna Becker. Denoising, separation and localization of EEG sources in the context of epilepsy. Université Nice Sophia Antipolis, 2014. English.
HAL Id: tel-01077875 (https://hal.archives-ouvertes.fr/tel-01077875), submitted on 27 Oct 2014.
UNIVERSITÉ DE NICE-SOPHIA ANTIPOLIS
DOCTORAL SCHOOL STIC – INFORMATION AND COMMUNICATION SCIENCES AND TECHNOLOGIES

THESIS submitted for the degree of Doctor of Science of the Université de Nice-Sophia Antipolis
Specialization: Control Systems, Signal and Image Processing
presented and defended by
Hanna BECKER

DENOISING, SEPARATION AND LOCALIZATION OF EEG SOURCES IN THE CONTEXT OF EPILEPSY – DÉBRUITAGE, SÉPARATION ET LOCALISATION DE SOURCES EEG DANS LE CONTEXTE DE L'ÉPILEPSIE

Thesis supervised by Pierre COMON and Laurent ALBERA, defended on October 24, 2014

Jury:
Gérard FAVIER, CNRS Research Director – President
Sabine VAN HUFFEL, Professor – Reviewer
David BRIE, Professor – Reviewer
Christian G. BÉNAR, INSERM Research Scientist – Examiner
Pierre COMON, CNRS Research Director – Thesis co-advisor
Laurent ALBERA, Associate Professor – Thesis co-advisor
Denoising, separation and localization of EEG sources in the context of epilepsy
PhD thesis
submitted by Hanna Becker in partial fulfillment of the requirements for the degree of Doctor of Science specialized in Control Systems, Signal and Image Processing of the University of Nice - Sophia Antipolis
October 24, 2014
Advisors: Pierre Comon and Laurent Albera
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Acknowledgments

I wouldn't have been able to complete my PhD without a large amount of help, and there are many people who have contributed (whether directly or indirectly) to this work.

First of all, a big thank you goes to my advisors Pierre Comon and Laurent Albera, who have accompanied me through both my Bachelor thesis and my PhD thesis. I have really appreciated their support, feedback, and helpful advice over the last years. In particular, I am grateful to Pierre for leaving me a good amount of independence and for trusting me to go my own way, thus helping me to develop my autonomy. He has also enabled me to work in two different laboratories, I3S and GIPSA-Lab, which gave me the opportunity to discover two different environments and to meet a lot of nice and interesting people. Furthermore, I am much obliged to Laurent for his time for detailed discussions, his patience and encouragement, as well as his ability to make a glass half empty look half full again. I also thank Laurent for welcoming me at the LTSI in Rennes a number of times and for his vital help in obtaining a teaching position at IUT Saint-Malo for my third PhD year.

My thanks also go to Isabelle Merlet, who has always been there to help me with application-related issues, to provide me with data required for simulations as well as some clinical data, and to get my publications and this thesis right from an application point of view. Moreover, I thank Christian Bénar, Martine Gavaret, and Jean-Michel Badier for providing me with the real data used in Section 4.4.3 as well as an MEG head model, and for helping me with the data analysis and the interpretation of the results. I am also grateful to Amar Kachenoura for his help with the work on EEG denoising and to Rémi Gribonval for his helpful feedback concerning the part of the thesis that is devoted to distributed source localization.

Furthermore, I would like to thank the members of the jury for taking the time to evaluate my thesis and for their participation in my PhD defense. I am also thankful to Cédric Marion for his help with the French summary of my thesis. Then I thank Mohammad Niknazar, Bertrand Rivet, and Christian Jutten for our collaboration on ECG source separation, which also took place during my PhD, even if this topic is not addressed in this manuscript. Moreover, I thank Jean-Claude Nunes for his help concerning our work on automatic thresholding of source imaging solutions, which is still on-going.

I am also grateful to Martin Haardt, who first introduced me to the fascinating field of signal processing and gave me the opportunity to work in his lab during my Bachelor and Master studies to get an early start in research. He also established the first contact with Pierre Comon. So without Martin, I would probably have ended up somewhere else entirely...

By dividing the time of my thesis over three different laboratories, I3S, GIPSA-Lab,
and LTSI, I have met a very large number of colleagues, whom I would like to thank for welcoming me and for the nice moments spent together. Last but not least, many thanks go to my friends, without whom the last three years just wouldn't have been the same, and to my parents, who have been a continuous source of support, always encouraging me to keep going at difficult times. Finally, I would like to acknowledge the financial support of my work, which was provided by CNRS and by the Conseil Régional PACA.
Contents

1 Introduction
  1.1 Context and motivation
  1.2 Proposed approach and outline of this thesis
  1.3 Associated publications
  1.4 Notations

2 EEG signals: Physiological origin and modeling
  2.1 A brief introduction to brain anatomy
  2.2 Physiological origin of electromagnetic brain signals
  2.3 Epilepsy
  2.4 Electroencephalography (EEG)
  2.5 Modeling of EEG signals
    2.5.1 Head model, source space, and lead field matrix
    2.5.2 Generation of physiologically plausible signals

3 Preprocessing
  3.1 Artifact removal and separation of independent patches using ICA
    3.1.1 Problem formulation
    3.1.2 Principles of ICA
      3.1.2.1 Contrast function
      3.1.2.2 Prewhitening
      3.1.2.3 Parameterization of the mixing vectors
    3.1.3 State-of-the-art methods for blind source separation
      3.1.3.1 SO methods
      3.1.3.2 ICA methods
    3.1.4 Penalized semi-algebraic unitary deflation (P-SAUD) algorithm
    3.1.5 Analysis of the computational complexity
    3.1.6 Computer results
      3.1.6.1 Simulation setup
      3.1.6.2 Performance analysis for different patch distances
      3.1.6.3 Computational complexity
    3.1.7 Real data analysis
    3.1.8 Conclusions
  3.2 Separation of correlated sources based on tensor decomposition
    3.2.1 Problem formulation
    3.2.2 CP decomposition
      3.2.2.1 Number of components
      3.2.2.2 Essential uniqueness
      3.2.2.3 Algorithms
    3.2.3 Transform-based tensor methods
      3.2.3.1 Space-Time-Frequency (STF) analysis
      3.2.3.2 Space-Time-Wave-Vector (STWV) analysis
    3.2.4 Analysis of the trilinear approximation
      3.2.4.1 Sufficient conditions for perfect recovery
      3.2.4.2 Discrepancies from the above conditions
      3.2.4.3 Interpretation of the mathematical conditions with respect to the STF and STWV analyses
    3.2.5 Analysis of the computational complexity
      3.2.5.1 Tensor construction
      3.2.5.2 Tensor decomposition
    3.2.6 Computer results
      3.2.6.1 Simulation setup
      3.2.6.2 Influence of the SNR
      3.2.6.3 Influence of the number of time samples
      3.2.6.4 Influence of the number of sensors
      3.2.6.5 Computational complexity
    3.2.7 Conclusions
  3.3 Statistical vs. deterministic preprocessing: when to use ICA, tensor decomposition, or both
    3.3.1 ICA vs. tensor decomposition
      3.3.1.1 Simulation setup
      3.3.1.2 Separation of correlated sources
      3.3.1.3 Robustness to artifacts
    3.3.2 Combination of ICA and tensor decomposition
    3.3.3 Discussion

4 Distributed source localization
  4.1 Problem formulation
  4.2 Hypotheses
    4.2.1 Hypotheses on the spatial distribution of the sources
    4.2.2 Hypotheses on the temporal distribution of the sources
    4.2.3 Hypotheses on the spatio-temporal distribution of the sources
    4.2.4 Hypotheses on the noise
  4.3 Classification and description of state-of-the-art methods
    4.3.1 Regularized least squares methods
      4.3.1.1 Minimum norm estimates
      4.3.1.2 Minimum current estimates
      4.3.1.3 Combination of smoothness and sparsity of the spatial distribution
      4.3.1.4 Sparsity in a transformed domain
      4.3.1.5 Mixed norm estimates
      4.3.1.6 Sparsity over space and in the transformed time domain
    4.3.2 Bayesian approaches
      4.3.2.1 Variational Bayesian approaches
      4.3.2.2 Empirical Bayesian approaches
    4.3.3 Extended source scanning methods
      4.3.3.1 Beamforming approaches
      4.3.3.2 Subspace-based approaches
  4.4 Tensor-based source localization
    4.4.1 Disk algorithm (DA)
    4.4.2 Computer results
      4.4.2.1 Simulation setup
      4.4.2.2 Influence of the patch distance
      4.4.2.3 Influence of the patch depth
      4.4.2.4 Theoretical analysis of selected two patch scenarios
      4.4.2.5 Influence of the number of CP components
    4.4.3 Real data analysis
      4.4.3.1 Data acquisition and processing
      4.4.3.2 Results
    4.4.4 Discussion and conclusions
  4.5 Sparse, variation-based source imaging approaches
    4.5.1 The SVB-SCCD algorithm
    4.5.2 Exploitation of temporal structure
    4.5.3 Optimization using ADMM
    4.5.4 Simulations
      4.5.4.1 Simulation setup
      4.5.4.2 Results
    4.5.5 Conclusions
  4.6 Combination of tensor decomposition and variation-based source localization
    4.6.1 Simulation setup
    4.6.2 Results
    4.6.3 Conclusions
  4.7 Comparative performance study
    4.7.1 Analysis of the computational complexity
    4.7.2 Evaluation of the source imaging results
      4.7.2.1 Simulation setup
      4.7.2.2 Influence of the patch position
      4.7.2.3 Influence of the patch size
      4.7.2.4 Influence of the patch number
    4.7.3 Discussion

5 Summary and conclusions
  5.1 Summary and illustration of the complete data analysis process
  5.2 Conclusions
  5.3 Perspectives

Appendices
A Higher order statistics
  A.1 Cumulants
  A.2 Properties of cumulants
B Semi-algebraic contrast optimization
  B.1 Parameterized contrast functions
  B.2 Optimization of the COM2 cost function
  B.3 Optimization of the P-SAUD cost function
C Tensor-based preprocessing of combined EEG/MEG data
  C.1 EEG/MEG data model
  C.2 STF and STWV analyses for EEG and MEG
  C.3 Joint CP decomposition
    C.3.1 Two common loading matrices
    C.3.2 One common loading matrix
      C.3.2.1 ALS
      C.3.2.2 DIAG
  C.4 Results
  C.5 Conclusions
D Construction of the STWV tensor
E Theoretical analysis of the trilinear tensor approximation
F Convex optimization algorithms for source imaging
  F.1 FISTA: Optimization of the MCE and MxNE cost functions
  F.2 ADMM: Optimization of the SVB-SCCD cost function

Bibliography
Index
French extended summary of the thesis
List of abbreviations

ADMM – Alternating direction method of multipliers
ALS – Alternating least squares
BEM – Boundary element method
BSS – Blind source separation
cICA – Constrained independent component analysis
cLORETA – Cortical low resolution electromagnetic tomography
CCA – Canonical correlation analysis
COM2 – Contrast maximization algorithm
Corcondia – Core consistency diagnostic
CP – Canonical polyadic decomposition
CSF – Cerebrospinal fluid
DA – Disk algorithm
DelL – Deflation algorithm by Delfosse and Loubaton
DelL-DG – Deterministic gradient deflation algorithm by Delfosse and Loubaton
DelL-SG – Stochastic gradient deflation algorithm by Delfosse and Loubaton
DFT – Discrete Fourier transform
DIAG – Direct algorithm for canonical polyadic decomposition
DLE – Dipole localization error
ECG – Electrocardiography
ECoG – Electrocorticography
EEG – Electroencephalography
EM – Expectation maximization algorithm
EMG – Electromyography
EOG – Electrooculography
ERP – Event-related potential
ESP – Event sparse penalty algorithm
EVD – Eigenvalue decomposition
ExSo-MUSIC – Extended source multiple signal classification
FEM – Finite element method
FISTA – Fast iterative shrinkage thresholding algorithm
FLOP – Floating point operation
fMRI – functional magnetic resonance imaging
FOCUSS – Focal underdetermined system solution
FO – Fourth order
FPF – False positive fraction
FWHM – Full width at half maximum
GD – Gradient descent
GOF – Goodness of fit
HO – Higher order
HOSVD – Higher order singular value decomposition
ICA – Independent component analysis
ICA-R – Independent component analysis with reference
ISTA – Iterative shrinkage thresholding algorithm
JADE – Joint approximate diagonalization of eigenmatrices
JCP – Joint canonical polyadic decomposition
JET – Joint eigenvalue decomposition based on triangular matrices
JEVD – Joint eigenvalue decomposition
LASSO – Least absolute shrinkage and selection operator
LCMV – Linearly constrained minimum variance beamformer
LM – Levenberg-Marquardt algorithm
LORETA – Low resolution electromagnetic tomography
MAP – Maximum a posteriori
MCE – Minimum current estimate
MEG – Magnetoencephalography
MI – Mutual information
MNE – Minimum norm estimate
MRI – Magnetic resonance imaging
MUSIC – Multiple signal classification
MxNE – Mixed norm estimate
NCS – Natural cubic splines
PARAFAC – Parallel factor analysis
PCA – Principal component analysis
P-SAUD – Penalized semi-algebraic unitary deflation algorithm
ROC – Receiver operating characteristic
SAUD – Semi-algebraic unitary deflation algorithm
SEEG – Stereotactic electroencephalography
sLORETA – Standardized low resolution electromagnetic tomography
SNR – Signal-to-noise ratio
SO – Second order
SOBI – Second order blind identification
SOCP – Second order cone programming
STF – Space-time-frequency
STFT – Short term Fourier transform
STR – Space-time-realization
STWV – Space-time-wave-vector
SVB-SCCD – Sparse variation-based sparse cortical current density
SVD – Singular value decomposition
TF-MxNE – Time-frequency mixed norm estimate
TPF – True positive fraction
TV – Total variation
VB-SCCD – Variation-based sparse cortical current density
VESTAL – Vector-based spatial-temporal analysis using a L1-minimum-norm
WMNE – Weighted minimum norm estimate
List of symbols

Scalars
β – SVB-SCCD regularization parameter
C4,sp – FO cumulant of the signal sp
D – number of grid dipoles
f – frequency variable
E – number of edges of the triangular grid
F – number of frequency samples
I – number of iterations
I1 – first tensor dimension
I2 – second tensor dimension
I3 – third tensor dimension
K – number of wave vector samples
λ – regularization parameter
M – number of components extracted by P-SAUD
M4,sp – FO moment of the signal sp
N – number of sensors
N0 – number of sensors for which the local spatial Fourier transform is computed
N̄ – average number of sensors that are considered for the computation of the local spatial Fourier transform for one sensor
Nsw – number of sweeps of the DIAG algorithm
P – number of sources
Pb – number of components associated with background activity
Pe – number of components associated with epileptic sources
Pm – number of components associated with muscle activity
φk – Givens rotation angle
ρh – correlation coefficient of the spatial mixing vector
ρs – correlation coefficient of the signal vector
t – time variable
T – number of time samples
τ – time lag
θp,k – Givens rotation parameter

Vectors
a – loading vector
aF – STWV loading vector of space characteristics
aW – STF loading vector of space characteristics
b – loading vector
bF – STWV loading vector of time characteristics
bW – STF loading vector of time characteristics
c – loading vector
cF – STWV loading vector of wave vector characteristics
cW – STF loading vector of frequency characteristics
g̃ – lead field vector for a dipole with fixed orientation
γ – hyperparameter vector
h – mixing vector / distributed source lead field vector
h(e) – spatial mixing vector of epileptic sources
ĥ – estimated spatial mixing vector
k – wave vector
ω – demixing vector
φ – vector of Givens rotation angles
ψ – distributed source coefficient vector
q – unitary mixing vector
q(p) – temporary unitary mixing vector for the p-th source
r – position vector
s – signal vector
ŝ – estimated signal vector
s̃ – signal vector for a dipole with fixed orientation
s(b) – background activity vector
s(e) – epileptic signal vector
s(m) – muscle activity vector
sx – signal vector of x-component for a dipole with free orientation
sy – signal vector of y-component for a dipole with free orientation
sz – signal vector of z-component for a dipole with free orientation
τ – vector of time lags
w – weight vector
x – data vector
xi – instrumentation noise vector
z – prewhitened data vector
z(p) – temporary prewhitened data vector for the p-th source

Matrices
0K,Q – K × Q matrix of zeros
A – loading matrix
AF – STWV loading matrix of space characteristics
AW – STF loading matrix of space characteristics
B – loading matrix
BF – STWV loading matrix of time characteristics
BW – STF loading matrix of time characteristics
C – loading matrix
CF – STWV loading matrix of wave vector characteristics
CW – STF loading matrix of frequency characteristics
C2q,x – FO cumulant matrix of the data
C2q,s – FO cumulant matrix of the signals
Cn – noise covariance matrix
Cx – data covariance matrix
F+ – ICA prewhitening matrix
G – lead field matrix for free orientation dipoles
G̃ – lead field matrix for fixed orientation dipoles
H – mixing matrix / distributed source lead field matrix
Ĥ – estimated mixing matrix
H(b) – mixing matrix of background activity
H(e) – spatial mixing matrix of epileptic signals
Ĥ(e) – estimated spatial mixing matrix of epileptic signals
H(m) – mixing matrix of muscle activity
IN – identity matrix of size N × N
K – Tikhonov regularized inverse matrix
L – matrix implementing the Laplacian operator
N – noise matrix
Ω – demixing matrix
P – spatial prewhitening matrix for source imaging
Φ – dictionary of time-frequency basis functions
Π – permutation matrix
Q – unitary mixing matrix
Q̂ – estimated unitary mixing matrix
Q(p) – temporary unitary mixing matrix for the p-th source
Q̂(p) – estimated temporary unitary mixing matrix for the p-th source
Qg – Givens rotation matrix
S – signal matrix
S̃ – signal matrix for fixed orientation dipoles
S̄ – distributed source signal matrix
S(b) – signal matrix of background activity
S̃(b) – signal matrix of background activity for fixed orientation dipoles
S(e) – epileptic signal matrix
Ŝ(e) – estimated epileptic signal matrix
S̃(e) – estimated epileptic signal matrix for fixed orientation dipoles
Σn – matrix of singular values associated with the noise subspace
Σs – matrix of singular values associated with the signal subspace
T – transform matrix
Θ – matrix of Givens rotation parameters
Un – matrix of left noise subspace vectors
Us – matrix of left signal subspace vectors
Υ – matrix of dipole orientations
Vs – matrix of right signal subspace vectors
W – weight matrix
X – data matrix
Xa – artifact data
Xb – background activity
Xe – epileptic data
Xi – instrumentation noise

Tensors
F – STWV tensor
W – STF tensor

Sets
I – set of dipoles belonging to a distributed source
Î – set of estimated dipoles belonging to a distributed source
J – set of all dipoles of the source space
Ωp – set of dipoles belonging to the p-th patch

Functions
µ(X) – mutual coherence of X
ψ – contrast function
w – window function
Chapter 1
Introduction

1.1 Context and motivation
ElectroEncephaloGraphy (EEG) is a non-invasive technique that records brain activity with a high temporal resolution using an array of sensors placed on the scalp. The measurements contain valuable information about the electromagnetic brain sources that are at the origin of the observed cerebral activity. This information is crucial for the diagnosis and management of some diseases or for the understanding of brain functions in neuroscience research. In this thesis, we focus on the application of EEG in the context of epilepsy. More specifically, we are concerned with the localization of the epileptic regions, which are involved in epileptic activity between seizures. The precise delineation of these areas is essential for the evaluation of patients with drug-resistant partial epilepsy, for whom a surgical intervention can be considered to remove the epileptogenic brain regions that are responsible for the occurrence of seizures.
Figure 1.1: Illustration of the EEG forward and inverse problems.
The objective thus consists in identifying the positions (and spatial extents) of brain sources from the noisy mixture of signals recorded at the surface of the head by the EEG sensors. This is known as the inverse problem. On the other hand, deriving the EEG signals for a known source configuration is referred to as the forward problem (cf. Figure 1.1). Thanks to refined models of the head geometry and advanced mathematical tools that permit the computation of the so-called lead field matrix, which characterizes the propagation within the head volume conductor, solving the forward problem has become straightforward. By contrast, finding a solution to the inverse problem is still a challenging task. This is especially the case in the context of multiple sources with correlated time signals, which can be involved in the propagation of epileptic phenomena. This problem is the key issue of this thesis and motivates the development of algorithms that are robust to source correlation. Another difficulty encountered in EEG data analysis lies in the fact that the recorded data do not only contain the cerebral activity of interest, but also comprise contributions from sources outside the brain, such as ElectroCardioGraphic (ECG) signals, muscular ElectroMyoGraphic (EMG) activities, or ElectroOculoGraphic (EOG) activities (eye blinks). These "non-brain" signals are generally referred to as artifacts. The artifacts may be of high amplitude, masking the signals of interest, which correspond, in our case, to the activity of one or several epileptic sources. Therefore, to prevent the artifacts from compromising the interpretation of the EEG measurements, it is desirable to remove them prior to the application of further EEG analysis techniques.
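As a rough numerical illustration of this asymmetry (not taken from the thesis), the following Python sketch builds a toy forward model: computing scalp data from known sources is a single matrix product, while inverting it runs into a system with far more unknowns than equations. The random stand-in lead field and all dimensions are assumptions made for the example; a real lead field would come from a BEM/FEM head model as described in Chapter 2.

```python
import numpy as np

# Toy sizes (assumed): N = 64 electrodes, D = 2000 source dipoles, T samples.
rng = np.random.default_rng(0)
N, D, T = 64, 2000, 100

G = rng.standard_normal((N, D))    # random stand-in for a lead field matrix
S = np.zeros((D, T))
S[42] = np.sin(np.linspace(0.0, 4.0 * np.pi, T))   # one active dipole

# Forward problem: direct and well-posed once G is known.
X = G @ S + 0.1 * rng.standard_normal((N, T))

# Inverse problem: D >> N, so G has a large null space; the pseudo-inverse
# returns the minimum-norm solution, smeared over many dipoles instead of
# recovering the single active one -- hence the need for regularization.
S_hat = np.linalg.pinv(G) @ X
print(np.linalg.matrix_rank(G), "equations vs", D, "unknowns")
```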
1.2 Proposed approach and outline of this thesis
For EEG recordings containing multiple sources and artifacts, we propose to consider the following data processing steps to solve the inverse problem:
1. extraction of the cerebral activity of interest, i.e., the epileptic activity (removal of artifacts),
2. separation of simultaneously active, potentially correlated sources to facilitate their localization,
3. distributed source localization.
The first two steps are optional preprocessing operations that may considerably simplify the source localization procedure, depending on the characteristics of the data set at hand, whereas the actual solution of the inverse problem is carried out in the third step. In this thesis, we develop computationally efficient, robust techniques for the three data processing steps described above. The performance of the proposed algorithms in comparison to conventional methods is analyzed both in terms of accuracy and computational complexity. As precise knowledge about the epileptogenic zones is generally not available for real data, the performance evaluations are mostly based on realistic computer simulations, which permit us to compare the obtained results with the ground truth. Nevertheless, some examples with real EEG measurements are also presented to validate the proposed methods.
This thesis is organized as follows: In Chapter 2, we provide some background information on the origin of electromagnetic brain signals, the characteristics of EEG systems, as well as epilepsy. Furthermore, we describe the mathematical model of EEG data that is employed for the simulations conducted in this thesis. In Chapter 3, we consider two types of preprocessing methods: statistical approaches for the removal of artifacts based on Independent Component Analysis (ICA) and deterministic tensor decomposition methods for source separation. Chapter 4 is devoted to the localization of distributed sources. After providing a taxonomy of current state-of-the-art methods, we present several contributions to the development of new source localization methods. We conclude the chapter with a comprehensive performance study of eight different source localization algorithms. Finally, in Chapter 5, we illustrate the combination of the three analyzed data processing steps on a simulation example before summarizing our findings and discussing perspectives for future work.
1.3 Associated publications
Parts of the work presented in this thesis can be associated with the following publications:

International conference papers

• H. Becker, P. Comon, L. Albera, M. Haardt, and I. Merlet, "Multiway space-time-wave-vector analysis for source localization and extraction," Proc. of European Signal Processing Conference (EUSIPCO), Aalborg, Denmark, August 2010.
Introduction of the STWV analysis, described in Section 3.2.3.2 of this thesis.

• H. Becker, P. Comon, and L. Albera, "Tensor-based preprocessing of combined EEG/MEG data," Proc. of European Signal Processing Conference (EUSIPCO), Bucharest, Romania, August 2012.
Extension of the STF and STWV analyses to the combination of EEG and MEG data, presented in Appendix C.

• H. Becker, L. Albera, P. Comon, R. Gribonval, F. Wendling, and I. Merlet, "A performance study of various brain source imaging approaches," IEEE Proc. of Internat. Conf. on Acoustics Speech and Signal Processing (ICASSP), Florence, Italy, May 2014.
Comparative performance study of seven brain source imaging algorithms, similar to the simulations conducted in Section 4.7.2 of this thesis.

• H. Becker, L. Albera, P. Comon, R. Gribonval, and I. Merlet, "Fast, variation-based methods for the analysis of extended brain sources," Proc. of European Signal Processing Conference (EUSIPCO), Lisbon, Portugal, September 2014.
Presentation of the SVB-SCCD algorithm, described in Section 4.5 of this thesis.

International journal papers

• H. Becker, P. Comon, L. Albera, M. Haardt, and I. Merlet, "Multiway space-time-wave-vector analysis for EEG source separation," Signal Processing, vol. 92, pp. 1021–1031, 2012.
Introduction of the STWV analysis, described in Section 3.2.3.2 of this thesis, and application to EEG data for extended source localization in the context of a spherical head model.

• H. Becker, L. Albera, P. Comon, M. Haardt, G. Birot, F. Wendling, M. Gavaret, C. G. Bénar, and I. Merlet, "EEG extended source localization: tensor-based vs. conventional methods," NeuroImage, vol. 96, pp. 143–157, August 2014.
Presentation of the STF-DA and STWV-DA methods for distributed source localization and evaluation on realistic simulations and actual data, similar to Section 4.4.1 of this thesis.

• H. Becker, L. Albera, P. Comon, R. Gribonval, F. Wendling, and I. Merlet, "Brain source imaging: from sparse to tensor models," submitted to IEEE Signal Processing Magazine, 2014.
Review and classification of different brain source imaging approaches (cf. Sections 4.2 and 4.3) as well as performance comparison of representative methods based on realistic simulations, similar to Section 4.7.2 of this thesis.

• H. Becker, L. Albera, P. Comon, A. Kachenoura, and I. Merlet, "A penalized semi-algebraic deflation ICA algorithm for the efficient extraction of interictal epileptic signals," submitted to IEEE Transactions on Biomedical Engineering, 2014.
Introduction of the P-SAUD algorithm described in Section 3.1 of this thesis.
1.4 Notations
To facilitate the reading, the following notation is used throughout this thesis: vectors, matrices, and three-way arrays are denoted with bold lowercase (a, b, ...), bold uppercase (A, B, ...), and bold italic uppercase letters (A, B, ...), respectively, while sets are denoted by calligraphic letters (A, B, ...). The matrix Â denotes the estimate of A. The Kronecker product between two matrices A and B is written as A ⊗ B. Moreover, (·)T and (·)H denote transposition and Hermitian transposition, and (·)+ stands for the Moore-Penrose pseudo-inverse. The N × N identity matrix is denoted by IN and 0K,Q denotes a K × Q matrix of zeros. Furthermore, |·|, ||·||p, and ||·||F stand for the absolute value, the Lp-norm, and the Frobenius norm, respectively. The cardinality of the set S is denoted by #S and the determinant of the matrix A is written as det(A). Finally, ⌈·⌉ denotes the ceiling function.
Chapter 2
EEG signals: Physiological origin and modeling

In this chapter, we provide some background information on the origin of the electromagnetic brain signals that are analyzed in this thesis. In Section 2.1, we start with a description of the brain structure and the neurons, which can be regarded as the basic processing units of the brain, before explaining the physiological mechanisms that lead to the generation of brain electromagnetic fields in Section 2.2. Then, in Section 2.3, we give a short introduction to epilepsy and epileptic signals, on which we focus in this thesis. Furthermore, the principle of EEG recordings is presented in Section 2.4. Finally, we describe in Section 2.5 how EEG data can be modeled based on the anatomical and physiological knowledge about the brain. This model serves as a basis for the simulations conducted in subsequent chapters of this thesis.
2.1 A brief introduction to brain anatomy
The brain consists of two hemispheres, each of which corresponds to a thick layer of folded neuronal tissue. Each hemisphere of the brain is divided into four lobes by two deep fissures. One distinguishes the frontal, parietal, temporal, and occipital lobes, as depicted in Figure 2.1. Furthermore, one can distinguish between a large number of cortical areas that are associated with different brain functions, such as processing visual information or controlling muscular activity.
Figure 2.1: The four lobes of the brain.
To process large amounts of information, the brain contains a high number of nerve cells, the neurons. The neurons are located in the gray matter, which forms the cerebral neocortex at the surface of the brain but also deeper nuclei and structures such as the hippocampus. The white matter, which is located underneath, contains nerve fibers that establish connections between different cortical areas and between the cortex and other brain structures. A neuron basically consists of three parts (see Figure 2.2):
• the dendrites, which receive stimuli from thousands of other cells,
• the soma or cell body, which contains the nucleus, and
• the axon, which transmits nerve impulses from the neuron to other cells.
The connection points at which stimuli are transmitted between the axon of one neuron and the dendrites of neighboring neurons are called synapses.
Figure 2.2: Schematic representation of a neuron.
2.2 Physiological origin of electromagnetic brain signals
The transmission of information between neurons is based on an electrochemical process (cf. Figure 2.3) [1, 2, 3]. Once the electric potential at the base of a neuron's axon reaches a certain threshold value due to stimulations from other cells, a so-called action potential is generated by the neuron and an electric impulse is sent along the axon. This impulse leads to the release of special molecules, the neurotransmitters, which cross the synapses with other neurons and bind to the receptors at the dendrites of the neighboring neurons, in the following referred to as postsynaptic cells. This enables certain ions to cross the cell membranes through specific ion channels of the postsynaptic cells, leading to a change of the electric potentials at the cell membranes, the so-called postsynaptic potentials. As a result, a current flow is generated within the interior of each postsynaptic cell. This current is called the intra-cellular or primary current and, depending on its direction, leads to an increase or decrease of the electric potential at the base of the axon of the postsynaptic cell. If the potential is increased, this favors the generation of a new action potential in the postsynaptic cell, causing the transmission of the stimulus to other neurons once the electric potential attains a certain threshold value. In this case, one speaks of an excitatory postsynaptic potential. On the other hand, if the postsynaptic potential hinders the generation of a new action potential by decreasing the potential at the base
of the axon, it is called inhibitory. The intra-cellular current is counterbalanced by an extra-cellular (or secondary) current that flows in the opposite direction at the outside of the cell membrane. The electric potential at the surface of the head is mostly generated by the extra-cellular currents, whereas the intra-cellular currents are at the origin of the magnetic field. An excitatory postsynaptic potential renders the extracellular milieu at the synapse more negative, forming a current sink, whereas the potential at the axonal end of the postsynaptic neuron becomes more positive, forming a current source. This can be modeled by a current dipole, which is oriented along the dendrite of the cell (cf. Figure 2.4), and constitutes the basis for mathematical models of brain activity. Since the current amplitude is very small for a single neuron, a current dipole is generally used to model the synchronized activity of a group of neurons within a small area (cf. Section 2.5). It is commonly admitted that the electrical activity recorded at the surface of the scalp originates mostly from the postsynaptic potentials of pyramidal cells, which are located in the gray matter with an orientation that is perpendicular to the cortical surface. Due to the parallel arrangement of these cells, the small current flows of the individual neurons generally add up to generate electromagnetic activity of sufficiently large amplitude to be measurable at the surface of the head.

Figure 2.3: Illustration of the electrochemical process of information transmission between neurons for an excitatory potential.
2.3 Epilepsy
Epilepsy is one of the most common neurological diseases and affects about one percent of the population. It leads to temporary dysfunctions of the electrical brain activity, the epileptic seizures, as a result of sudden abnormal electric discharges, called paroxysmal discharges. These discharges occur repeatedly in one or several brain regions. Depending
on the involved cortical areas, which differ from one patient to another, the clinical symptoms associated with the epileptic discharges vary and can lead to physical and mental impairments. One distinguishes two types of epilepsy: generalized epilepsy, which involves the whole cortex, and partial or focal epilepsy, which is provoked by limited brain regions, the so-called epileptogenic zone. Furthermore, two different types of epileptic paroxysms can be recorded. During seizures, so-called ictal discharges, characterized by rhythmic activity, last several seconds to a few minutes. Between seizures, brief paroxysms, called interictal spikes, occur at irregular intervals.

The majority of epileptic patients can be successfully treated with drugs, which prevent the occurrence of epileptic seizures or reduce their frequency. However, some patients are drug-resistant. In some of these cases, a surgical intervention can be considered to remove the epileptogenic zone and thus stop the occurrence of seizures. Nevertheless, this is only possible if the epileptogenic zone involves a limited area and is located in brain regions that can be removed without leading to important functional deficiencies. Moreover, this surgical procedure requires precise knowledge of the location and the spatial extent of the epileptogenic zone. To delineate the regions from which epileptic paroxysms arise, patients usually undergo extensive pre-surgical evaluation, including EEG sessions as well as intracranial stereotactic EEG (SEEG) recordings.

In order to identify the epileptogenic zones from scalp EEG measurements, source localization algorithms have been applied to interictal spikes, which can frequently be observed on the scalp with a high Signal-to-Noise Ratio (SNR) compared to ictal discharges [4, 5, 6, 7, 8]. Even though the relationship between the brain regions that are at the origin of the epileptic spikes (the irritative zone) and the epileptogenic zone, which provokes the seizures, is patient-dependent and not completely understood, it has been shown that source imaging methods applied to interictal spikes can provide useful information during the pre-surgical evaluation of patients with drug-resistant partial epilepsy. Following this approach, the objective of this thesis consists in developing methods that allow for an accurate estimation of the positions and spatial extents of brain regions involved in interictal epileptic activity, based on the sole analysis of surface EEG recordings containing interictal spikes.

Figure 2.4: Neuron modelled as a current dipole. The red arrow also indicates the direction of the dipole moment vector. (This figure is adapted from Figure 1.4 in [9].)
2.4 Electroencephalography (EEG)
Electroencephalography is a multi-channel system that records the electromagnetic brain activity over a certain time interval with a number of sensors positioned on the surface of the scalp. More precisely, it measures the difference of the electric potential between each sensor and a reference electrode. Sometimes, one also employs the so-called common average reference, an artificial reference that is obtained by subtracting the time signal averaged over all sensors from the data of each channel. Standard medical EEG systems comprise between 19 and 32 electrodes, whereas high-resolution EEG caps include up to 256 sensors. The electrodes are positioned on the scalp according to a standardized placement system. For 21 electrodes, the original 10-20 system [10] is employed. For higher numbers of sensors, extensions of this system such as the 10-10 and the 10-5 electrode systems have been put forward [11, 12]. An important advantage of EEG (and of magnetoencephalography (MEG)) compared to other techniques for the analysis of cerebral activity, such as functional Magnetic Resonance Imaging (fMRI), lies in its high temporal resolution at a millisecond scale, which permits the observation of brain dynamics. Furthermore, the EEG recordings are directly related to the electrophysiological brain mechanisms, which is not the case for fMRI recordings. Finally, EEG systems are much more affordable than fMRI or MEG systems, which require more sophisticated technical equipment. For these reasons, EEG is a routinely used technique for brain signal analysis, in particular for epileptic patients.
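As a small illustration (not taken from the thesis), the common average reference described above amounts to one line of array arithmetic; the sketch below assumes data arranged as channels × samples and uses random numbers standing in for recordings.

```python
import numpy as np

def common_average_reference(x: np.ndarray) -> np.ndarray:
    """Re-reference EEG data of shape (n_channels, n_samples) by subtracting,
    at every time sample, the mean over all channels from each channel."""
    return x - x.mean(axis=0, keepdims=True)

# Toy usage: 32 channels, 1000 samples of placeholder data.
x = np.random.randn(32, 1000)
x_car = common_average_reference(x)
assert np.allclose(x_car.mean(axis=0), 0.0)  # channel average is now zero
```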
2.5 Modeling of EEG signals
As described in Section 2.2, the brain electric and magnetic fields are generated by the current flows that are associated with the transmission of information between neurons. In order to obtain a signal of sufficient amplitude to be measurable at the surface of the scalp, a certain number of simultaneously active neuronal populations is required. These populations can be modeled by a number of current dipoles belonging to a pre-defined source space, which is generally derived from structural information about the brain (see Section 2.5.1). Furthermore, different hypotheses on the location and orientation of the sources can be incorporated by considering either a volume or a surface grid of source dipoles and using either fixed or free dipole orientations. This basically corresponds to two philosophies of brain source modeling: the reconstruction of brain activity in a tomographic way by dipoles with free orientations in a volume grid, and the concentration on dipoles located on the cortical surface with fixed orientations perpendicular to this surface. The latter corresponds to a physiologically plausible source model because the surface EEG measurements are mostly generated by pyramidal cells located in the gray matter with an orientation perpendicular to the cortex (cf. Section 2.2, [13]). Assuming a source space with free orientation dipoles, the electric potential data that is recorded at the N electrodes of an EEG sensor array for T time samples then constitutes the superposition of all dipole signals contained in the signal matrix S ∈ R^(3D×T) that are transmitted to the surface of the scalp. The propagation of the signals in the head volume conductor is characterized by the lead field matrix G ∈ R^(N×3D), which depends on spatial parameters of the head, such as the geometry of the brain, skull, and scalp as well as their conductivities, and the positions of the D source dipoles. Furthermore, the EEG recordings may contain electromagnetic signals from other physiological origins such
as muscular activity or eye blinks. The contributions of these artifacts are subsequently denoted by Xa. Finally, the EEG measurements are generally corrupted by instrumentation noise Xi due to the measurement process. This leads to the following model for the EEG data:

X = GS + Xa + Xi.    (2.1)

In the case of dipoles with fixed orientations, the brain activity is described by D dipole signals contained in the matrix S̃ ∈ R^(D×T), leading to the data model

X = G̃S̃ + Xa + Xi,    (2.2)

where the lead field matrix G̃ ∈ R^(N×D) is related to the lead field G by G̃ = GΥ and Υ ∈ R^(3D×D) contains the fixed orientations of the dipoles. Subsequently, we focus on model (2.2), which is used for the generation of EEG data in this thesis, considering a source space composed of dipoles located on the cortical surface with an orientation perpendicular to this surface.

In the context of epilepsy, the regions of interest, i.e., the epileptic regions, can be modeled by extended sources that can be described as the union of (one or) several non-necessarily contiguous areas of the cortex (so-called patches) with highly correlated source activities [14, 15]. All dipoles that do not belong to an extended source can be considered to emit normal background activity. Consequently, in order to distinguish between the extended sources, which we want to retrieve, and the noisy background activity, we can rewrite the data model (2.2) in the following way:

X = Σ_{p=1..P} Σ_{k_p ∈ Ω_p} g̃_{k_p} s̃_{k_p}ᵀ + Σ_{l ∉ ∪_{p=1..P} Ω_p} g̃_l s̃_lᵀ + Xa + Xi    (2.3)
  = Xe + Xb + Xa + Xi = Xe + N.    (2.4)

Here, Ω_p is the set of indices of the dipoles that belong to the p-th extended source, g̃_k is the lead field vector of the k-th dipole, and s̃_k is the associated signal vector that corresponds to the k-th row vector of S̃. The matrix Xe comprises the data generated by the signals of interest, whereas the noise matrix N summarizes background activity Xb, artifacts Xa, and instrumentation noise Xi. The instrumentation noise could be modeled by random variables drawn from a Gaussian distribution, but for simplicity, it is generally neglected in the rest of this thesis. Moreover, rather than modeling the artifacts, we will consider real measurements recorded during an EEG session of a patient. Therefore, in the following, we concentrate on the modeling of the EEG data of the epileptic signals of interest Xe and the EEG data due to the background activity Xb. On the one hand, in Section 2.5.1, we consider the modeling of the head and the source space, whose influences are reflected in the lead field matrix, and on the other hand, in Section 2.5.2, we are concerned with the generation of the source time signals. Please note that model (2.3) is employed for EEG data generation, whereas different models may be considered for the data analysis. The choice of the data analysis model depends on the objective of a given analysis step and the assumptions on the data that are made by the method that is used to achieve this objective. In Sections 3.1.1 and 3.2.1 of this thesis, we introduce two different data analysis models that are used for source separation purposes.
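To make models (2.2)-(2.3) concrete, here is a minimal simulation sketch in Python. The random lead field, the Gaussian-bell "spike" waveform, and all dimensions are placeholder assumptions; in the thesis, G̃ is computed with a BEM head model and the spike signals come from the neural-population model of Section 2.5.2.

```python
import numpy as np

# Minimal sketch of the fixed-orientation models (2.2)-(2.3); all numbers
# below are illustrative assumptions, not the thesis configuration.
rng = np.random.default_rng(1)
N, D, T = 64, 5000, 512                  # sensors, grid dipoles, time samples

G_tilde = rng.standard_normal((N, D))    # stand-in for the lead field matrix
omega_1 = np.arange(100)                 # indices Omega_1 of one 100-dipole patch

S_tilde = 0.05 * rng.standard_normal((D, T))             # background activity
spike = np.exp(-0.5 * ((np.arange(T) - 256) / 10.0) ** 2)  # crude spike shape
S_tilde[omega_1] += spike                # highly correlated activity on the patch

X = G_tilde @ S_tilde                    # X = X_e + X_b (artifacts/noise omitted)
print(X.shape)                           # (64, 512)
```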
2.5.1 Head model, source space, and lead field matrix
The simplest model of the head consists of several nested spheres which represent the brain, the skull, and the scalp. Possibly, an additional sphere, which is located between the brain and the skull and which corresponds to the cerebro-spinal fluid (CSF), can be included. The considered layers are assumed to be homogeneous and of different conductivities due to their morphological differences. Besides its simplicity, the main advantage of the spherical head model consists in the fact that for given source and sensor positions, the lead field matrix can be computed analytically (see, e.g., [16, 3]). While the spherical head model is generally sufficient to obtain a good approximation of the magnetic field, which is insensitive to changes of conductivity, it leads to significant modeling errors in the computation of the electric potential at the surface of the head. To enhance the accuracy of the EEG lead field matrix, a realistic head model should be employed. In this case, the boundaries of the different layers are derived from structural Magnetic Resonance Imaging (MRI). More precisely, the MRI images are segmented to obtain triangular meshes for the cortical surface (gray matter/white matter interface), the brain, the skull, and the scalp (cf. Figure 2.5). This can be achieved using software such as BrainVISA [17, 18]. To obtain accurate source imaging results in the context of epilepsy, a realistic head model should be derived from the MRI of each patient because pathological variations in brain geometry cannot be ruled out.
Figure 2.5: Illustration of a realistic head model with three compartments representing the brain, the skull, and the scalp, and a source space that consists of a large number of dipoles (represented by black dots) located on the gray matter/white matter interface.

Based on the triangular mesh that represents the cortical surface, there are two approaches for the definition of the source space: the elementary dipoles can be positioned at the vertices of the mesh (this is the case in the Brainstorm Matlab toolbox [19]) or at the centroids of the triangles (as in the ASA software (ASA, ANT, Enschede, Netherlands)). Once the head model and the source space have been defined, the lead field matrix can be computed numerically using Boundary Element Methods (BEM) or Finite Element Methods (FEM), which are based on the quasi-static approximation of Maxwell's equations (see [2, 3] for more details). BEM are, for instance, implemented in the ASA software or in OpenMEEG [20, 21]. In this thesis, for all simulations, we employ the realistic head model shown in Figure 2.5 and a source space that is defined by the triangularized inner cortical surface (gray matter/white matter interface). These meshes are obtained from the segmentation of a single subject MRI. A grid dipole is placed at the centroid of each of the triangles with an
orientation perpendicular to the triangle's surface. The grid consists of 19626 triangles (9698 for the left hemisphere and 9928 for the right hemisphere) and, on average, each triangle covers 5 mm² of the cortical surface. The lead field vectors contained in the matrix G̃ are then computed numerically for all grid dipoles using the BEM implemented in the ASA software. For the generation of distributed sources, we consider 11 different patches, each of which consists of 100 adjacent grid dipoles corresponding to a cortical area of approximately 5 cm². The patches are located on the left hemisphere and are shown in Figure 2.6. For convenience, in subsequent chapters, we refer to these patches using the following names that indicate the patch positions:

SupFr – superior frontal gyrus
InfFr – inferior frontal gyrus
PreC – precentral gyrus
SupTe – superior temporal gyrus
MidTe – middle temporal gyrus
BasTe – basal aspect of the temporal lobe
OccTe – occipital temporal junction
InfPa – inferior parietal lobule
SupOcc – superior occipital gyrus
Cing – cingulate gyrus
Hipp – para-hippocampal gyrus
Figure 2.6: Location of the 11 patches that are considered for the simulations in this thesis. We provide 6 different views of the cortical surface such that all patches can be seen.
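The placement of fixed-orientation dipoles at triangle centroids described above can be sketched in a few lines. The helper below is a hypothetical illustration (not taken from any named toolbox) and assumes a mesh given as vertex coordinates plus triangle index lists; note that the sign of each normal depends on the vertex ordering, and a real pipeline would orient them consistently.

```python
import numpy as np

def centroid_dipoles(vertices: np.ndarray, triangles: np.ndarray):
    """Given a triangular cortical mesh (vertices: V x 3 coordinates,
    triangles: D x 3 vertex indices), place one dipole at each triangle
    centroid with an orientation perpendicular to that triangle."""
    tri = vertices[triangles]                  # (D, 3, 3) triangle corners
    positions = tri.mean(axis=1)               # centroids
    normals = np.cross(tri[:, 1] - tri[:, 0],  # unnormalized face normals
                       tri[:, 2] - tri[:, 0])
    normals /= np.linalg.norm(normals, axis=1, keepdims=True)
    return positions, normals

# Toy usage: a single triangle in the z = 0 plane -> dipole oriented along z.
v = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
t = np.array([[0, 1, 2]])
pos, ori = centroid_dipoles(v, t)
print(pos, ori)   # centroid (1/3, 1/3, 0), orientation (0, 0, 1)
```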
2.5.2 Generation of physiologically plausible signals
To generate physiologically plausible brain signals, a model based on coupled neuronal populations [14, 22] is employed. In this model, each current dipole of the source space is associated with one neuronal population. Moreover, a model that reflects the physiological mechanisms of information transmission between neurons based on action potentials and
postsynaptic potentials (see Section 2.2) is used to characterize the interactions of the neurons within the neuronal population (for more details, see [14]). In this model, three different types of neurons are considered: excitatory neurons (pyramidal cells), inhibitory neurons with a fast action potential profile (GABAA fast interneurons), and inhibitory neurons with a slow action potential profile (GABAA slow interneurons). The dipole signal is obtained as the sum of the postsynaptic activity of all neuron types belonging to the neuronal population. Furthermore, the dynamics of different neuronal populations, i.e., of different dipoles, also interact. These interactions are characterized by coupling coefficients, which regulate the synchronization between different neuronal populations. By changing the parameters related to the generation of postsynaptic potentials of the neurons and the coupling coefficients between neuronal populations, the model can be employed to simulate both epileptic spikes and normal background activity of the brain. Figure 2.7 shows an example of 100 simulated interictal spike signals for the dipoles of an epileptic patch.
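The population model used here [14, 22] extends the classical Jansen-Rit neural mass model with separate fast and slow inhibitory interneurons. As a hedged illustration of the general mechanism (not the exact model of the thesis), the following sketch integrates the standard Jansen-Rit equations with their usual parameter values; the output, the summed postsynaptic potential of the pyramidal population, plays the role of the dipole signal.

```python
import numpy as np

# Standard Jansen-Rit constants (assumed from the classical literature).
A, a = 3.25, 100.0       # excitatory gain (mV) and rate constant (1/s)
B, b = 22.0, 50.0        # inhibitory gain (mV) and rate constant (1/s)
C = 135.0
C1, C2, C3, C4 = C, 0.8 * C, 0.25 * C, 0.25 * C
e0, v0, r = 2.5, 6.0, 0.56

def sigmoid(v):
    """Average membrane potential to firing rate conversion."""
    return 2 * e0 / (1 + np.exp(r * (v0 - v)))

def step(y, p, dt):
    """One Euler step of the 6-dimensional Jansen-Rit ODE; p is the
    external (cortical) input firing rate in pulses per second."""
    y0, y1, y2, y3, y4, y5 = y
    dy = np.array([
        y3,
        y4,
        y5,
        A * a * sigmoid(y1 - y2) - 2 * a * y3 - a**2 * y0,
        A * a * (p + C2 * sigmoid(C1 * y0)) - 2 * a * y4 - a**2 * y1,
        B * b * C4 * sigmoid(C3 * y0) - 2 * b * y5 - b**2 * y2,
    ])
    return y + dt * dy

rng = np.random.default_rng(2)
dt, T = 1e-3, 5000       # 1 ms steps, 5 s of activity
y = np.zeros(6)
out = np.empty(T)
for t in range(T):
    y = step(y, p=rng.uniform(120, 320), dt=dt)  # stochastic input
    out[t] = y[1] - y[2]  # summed postsynaptic potentials = dipole signal
```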
Figure 2.7: Example of 100 simulated interictal spike signals.
Chapter 3
Preprocessing

In this chapter, we are concerned with the preprocessing steps that are applied to the raw EEG measurements before the actual source localization. The first step of the EEG data analysis consists in removing the artifacts or, equivalently, in extracting the activity of interest, i.e., the epileptic spikes. As little information on the underlying sources is available a priori, this is a typical application for Blind Source Separation (BSS) methods [23, 24]. Due to their different physiological origins, it is reasonable to assume that the artifacts are statistically independent of the epileptic activity, which motivates the application of statistical methods based on Independent Component Analysis (ICA). This approach is treated in Section 3.1. Furthermore, if several source regions are involved in the epileptic activity, it is desirable to separate these sources to facilitate the source localization process. For statistically independent sources, this may also be accomplished by ICA. However, in the context of propagation phenomena, the signals of the different epileptic source regions are (highly) correlated and their separation demands a different type of approach. In this thesis, we explore the use of deterministic tensor-based methods, described in Section 3.2. Depending on the amount and amplitude of artifacts and the nature of the epileptic activity (number of foci and correlation) of the EEG data at hand, one may decide to employ only one of the described preprocessing methods (ICA or tensor decomposition) or both. To provide some guidelines for the choice of the appropriate preprocessing approach, a comparison of the ICA-based and tensor-based preprocessing methods is conducted in Section 3.3, revealing the limitations and strengths of both approaches.
3.1 Artifact removal and separation of independent patches using ICA
In the past, a large number of BSS techniques have been successfully applied to separate the artifacts from the neuronal activities of interest (see, e.g., [25, 26, 27, 28, 29]). Among the employed methods one can find algorithms that are based on Second Order (SO) statistics only, such as SOBI [30] and CCA [28], as well as various ICA algorithms, which involve the exploitation of higher order (HO) statistics. In this section, we are mainly interested in ICA-based techniques for artifact removal. The idea and the principles of this approach are summarized in Sections 3.1.1 and 3.1.2. Furthermore, a description of several state-of-the-art methods, including SO algorithms, is provided in Section 3.1.3.
Recent studies [31, 32] have compared the performance of a number of popular ICA algorithms for EEG denoising. The authors of [31] concluded that the COM2 algorithm [33] is the method that yields the best compromise between performance and computational complexity. However, this method extracts not only the epileptic components of interest, but also inherently identifies a large number of other components of the mixture. Considering a sufficiently large number of sources, this can easily lead to the separation of a hundred signals in the context of high-resolution EEG. Since we are only interested in the epileptic activity, the computational complexity of the algorithm could be further reduced by extracting only the epileptic signals of interest. To extract a reduced number of ICA components, deflation methods can be employed. However, a remaining difficulty in deflationary approaches consists in ensuring that the signals of interest are extracted first such that the algorithm can be stopped after the separation of a small number of components from the mixture. This requires the exploitation of prior knowledge about the signals of interest. In [34], the constrained ICA (cICA) framework has been developed to this end, and ICA methods that work with a reference signal, generally referred to as ICA-R, have been put forward [35, 36, 34, 37, 38, 39] to extract the signals with the highest resemblance to the references. However, in practice, reference signals are not always available. Therefore, in Section 3.1.4 of this thesis, we pursue a different approach that is based on a penalized contrast function, leading to the development of a new, deflationary algorithm, called P-SAUD. The performance of this method for the extraction of the epileptic activity is studied in Sections 3.1.6 and 3.1.7 on simulated and real data in comparison to several popular ICA algorithms.
3.1.1 Problem formulation
In this section, we assume that the measurements x[t] ∈ R^N of the electric potential recorded by N sensors placed on the scalp constitute a linear mixture of epileptic activity, muscular activity, and background activity of the brain (cf. Figure 3.1) in the presence of instrumentation noise x_i[t]. The mixture is characterized by the matrix H^{(e)} ∈ R^{N×P_e} for the epileptic components s^{(e)}[t], the matrix H^{(m)} ∈ R^{N×P_m} for the muscle artifact components s^{(m)}[t], and the matrix H^{(b)} ∈ R^{N×P_b} for the background activity components s^{(b)}[t]. The three matrices H^{(e)}, H^{(m)}, and H^{(b)} can be combined in the mixing matrix H = [H^{(e)}, H^{(m)}, H^{(b)}] ∈ R^{N×P} where P = P_e + P_m + P_b denotes the number of components. In the following, we assume that P ≤ N. This leads to the following data analysis model for the EEG data:
$$x[t] = H^{(e)} s^{(e)}[t] + H^{(m)} s^{(m)}[t] + H^{(b)} s^{(b)}[t] + x_i[t] \qquad (3.1)$$
$$x[t] = H\, s[t] + x_i[t] \qquad (3.2)$$
where s[t] = [s^{(e)}[t]^T, s^{(m)}[t]^T, s^{(b)}[t]^T]^T. In this section, {x[t]}, {x_i[t]}, and {s[t]} are considered as stochastic random vector processes. The measurements X correspond to one realization of length T of the process {x[t]} and are assumed to be centered. As epileptic, muscle, and background activity have different physiological origins, their signals can be assumed to be statistically independent. This property can be exploited to recover the associated random process {s[t]} and the mixing matrix H (or, equivalently, the demixing matrix Ω = H^+) from the measurements such that the extracted signals are maximally statistically independent, which is the objective of ICA. The P components of the extracted signal vector s[t] can be divided into three subgroups that form bases for
the signal subspaces of the epileptic, muscle, and background activities. Although the different sources of the same type of activity might not be independent, in this case, ICA still permits us to separate the subspaces of the three types of activity and therefore to extract the EEG data containing epileptic activity, which constitutes our main objective. To distinguish between the potentially correlated signals that are emitted by the electric current sources for each type of activity and the statistically independent signals which are extracted by ICA and form bases for the epileptic, muscle, and background signal subspaces, we subsequently refer to the latter as “components” instead of “sources”. Please note that the independent components can only be extracted up to a scale and permutation indeterminacy, which means that any random vector s′[t] = D P s[t] and mixing matrix H′ = H P^{-1} D^{-1} that are obtained from the solution vector s[t] and matrix H, where D is a diagonal matrix and P is a permutation matrix, are also solutions to (3.2).
Figure 3.1: Mixture of epileptic signals, muscular activity, and background activity recorded by the EEG sensors.
3.1.2 Principles of ICA
In this section, we describe several building blocks of common ICA algorithms. First of all, finding a solution to the ICA problem described above requires a mathematical formulation. This is achieved by employing a so-called contrast function, which is based on a measure of statistical independence. This subject is addressed in Section 3.1.2.1. Furthermore, to simplify the problem, many ICA algorithms employ prewhitening. This step is described in Section 3.1.2.2. Finally, in Section 3.1.2.3, we also introduce a parameterization of the mixing matrix, which is employed by several of the ICA algorithms considered in this thesis in conjunction with prewhitening.

3.1.2.1 Contrast function
To identify independent components by linear transformation, an optimization problem which is based on some measure of statistical independence is solved. In the context of ICA, the cost function that is maximized is referred to as a contrast function and has the following properties [33, 24]:
• it is invariant to a permutation of the components,
• it is invariant to a change of scale,
• it is maximal only for independent components.
Different contrast functions can be derived based on different measures of independence, such as mutual information or negentropy. These two measures are closely related. Indeed, they differ only in a constant and the sign in the case of uncorrelated signals, i.e., for prewhitened data (see Section 3.1.2.2). Therefore, we subsequently concentrate on one of these measures, negentropy, which is described in more detail below. The definition of negentropy is based on the differential entropy, which is given by
$$h(p_x) = -\int_{-\infty}^{\infty} p_x(u) \log(p_x(u))\, du \qquad (3.3)$$
where p_x is the probability density function (pdf) of the random variable x. It is well known that the differential entropy is maximal for a Gaussian distribution. The negentropy J(p_x) is then defined as the distance between the differential entropy h(p_x) of a given random variable x and the differential entropy h(φ_x) of a Gaussian random variable with the same mean and variance as x:
$$J(p_x) = h(\phi_x) - h(p_x). \qquad (3.4)$$
Therefore, the negentropy constitutes a measure of non-Gaussianity. It is equal to zero for a Gaussian random variable and greater than zero otherwise. An intuitive explanation for the use of negentropy as a contrast function can be found in [33, 24]: according to the central limit theorem, the distribution of a linear mixture of random variables tends towards a Gaussian distribution. To recover the original, independent signals, it is therefore reasonable to search for the signals whose distributions are as far from a Gaussian distribution as possible, i.e., for which the negentropy is maximal, because even the mixture of two independent signals would result in a distribution that is closer to that of a Gaussian random variable than the distributions of the original signals. As the negentropy is difficult to compute in practice, the contrasts used in ICA algorithms are usually approximations of this measure. A number of popular contrast functions, including those employed by the COM2 and DelL algorithms described in Section 3.1.3.2, are derived using cumulant-based approximations of negentropy. Another approach, pursued in the FastICA algorithm, consists in approximating negentropy by appropriate non-linear functions (see Section 3.1.3.2 and [40] for more details).
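To illustrate the idea, the following NumPy sketch evaluates the classical cumulant-based negentropy approximation J(y) ≈ E{y³}²/12 + C_{4,y}²/48 for a standardized signal; the function name and the test distributions are illustrative choices, not part of any algorithm discussed in this chapter.

import numpy as np

def negentropy_approx(y):
    # Cumulant-based negentropy approximation for a zero-mean,
    # unit-variance signal y: J(y) ~ E{y^3}^2/12 + kurt(y)^2/48.
    y = (y - y.mean()) / y.std()          # enforce zero mean, unit variance
    skew = np.mean(y**3)                  # third-order cumulant
    kurt = np.mean(y**4) - 3.0            # fourth-order cumulant (kurtosis)
    return skew**2 / 12.0 + kurt**2 / 48.0

rng = np.random.default_rng(0)
print(negentropy_approx(rng.normal(size=100_000)))   # ~0 for a Gaussian signal
print(negentropy_approx(rng.laplace(size=100_000)))  # > 0 for a super-Gaussian signal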
3.1.2.2 Prewhitening
In order to facilitate the separation of the components, many ICA methods employ a prewhitening step that precedes the actual component extraction. The goal of prewhitening is to decorrelate the source signals, leading to a covariance matrix of the prewhitened data that is equal to the identity matrix. One possibility to achieve this consists in computing an EigenValue Decomposition (EVD) of the data covariance matrix C_x = E{x[t] x^T[t]}:
$$C_x = \begin{bmatrix} U_s & U_n \end{bmatrix} \begin{bmatrix} \Sigma_s^2 & 0 \\ 0 & \Sigma_n^2 \end{bmatrix} \begin{bmatrix} U_s^T \\ U_n^T \end{bmatrix} \qquad (3.5)$$
where the columns of U_s ∈ R^{N×P} and U_n ∈ R^{N×(N−P)} span the signal and noise subspaces, respectively. The matrices Σ_s² ∈ R^{P×P} and Σ_n² ∈ R^{(N−P)×(N−P)} contain the eigenvalues
corresponding to the signal and noise parts. In practice, the covariance matrix C_x is unknown and is estimated based on sample statistics. The prewhitened data z[t] ∈ R^P are then obtained as
$$z[t] = F^{+} x[t] \qquad (3.6)$$
where F^+ is the Moore-Penrose pseudo-inverse of F = U_s Σ_s. Note that an alternative way to obtain z[t] is to compute the Singular Value Decomposition (SVD) of the data matrix X ∈ R^{N×T}, which corresponds to one realization of the random vector process x[t] for t = 1, …, T time samples. Prewhitened data are then directly given by the right singular vectors. As statistical independence of the signals requires uncorrelatedness, identifying the matrix F, which permits to obtain uncorrelated signals, partially solves the ICA problem. However, a multiplication of F by any unitary matrix Q would also result in uncorrelated signals. The objective of all ICA algorithms that are based on the prewhitened data z[t] then reduces to identifying the unitary mixing matrix Q ∈ R^{P×P} which leads to statistically independent signals. Once an estimate Q̂ of this matrix has been determined, an estimate of the original mixing matrix can be obtained as Ĥ = F Q̂.
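As an illustration, here is a minimal NumPy sketch of EVD-based prewhitening following (3.5) and (3.6); the function name and the truncation to P dominant components are our own illustrative choices.

import numpy as np

def prewhiten(X, P):
    # EVD-based prewhitening of centered data X (N sensors x T samples).
    # Returns Z (P x T) with identity sample covariance, and F = Us @ Sigma_s.
    N, T = X.shape
    C = X @ X.T / T                       # sample covariance, cf. (3.5)
    w, U = np.linalg.eigh(C)              # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:P]         # keep the P dominant components
    Us, sig = U[:, idx], np.sqrt(w[idx])
    F = Us * sig                          # mixing-side factor F = Us Sigma_s
    Z = (Us / sig).T @ X                  # z[t] = F^+ x[t], cf. (3.6)
    return Z, F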
3.1.2.3 Parameterization of the mixing vectors
To simplify the estimation of the unitary mixing matrix Q in the case of prewhitened data, we introduce a parameterization of the mixing vectors based on Givens rotations. As originally introduced in [41] and used in [33, 42], any unit-norm vector of dimension K whose last element is non-negative can be parameterized by K − 1 Givens rotation angles, such that the vector corresponds to the last row of the orthonormal matrix
$$Q^{(p)T}(\phi_p) = Q_g^{(p,K-1)}(\phi_{p,K-1}) \cdots Q_g^{(p,1)}(\phi_{p,1}) \qquad (3.7)$$
which is composed of the Givens rotation matrices
$$Q_g^{(p,k)}(\phi_{p,k}) = \begin{bmatrix} I_{k-1} & 0_{k-1,1} & 0_{k-1,K-1-k} & 0_{k-1,1} \\ 0_{1,k-1} & \cos(\phi_{p,k}) & 0_{1,K-1-k} & -\sin(\phi_{p,k}) \\ 0_{K-1-k,k-1} & 0_{K-1-k,1} & I_{K-1-k} & 0_{K-1-k,1} \\ 0_{1,k-1} & \sin(\phi_{p,k}) & 0_{1,K-1-k} & \cos(\phi_{p,k}) \end{bmatrix}.$$
Here, φ_p = [φ_{p,1}, …, φ_{p,K−1}]^T and p = 1, …, P denotes the index of the extracted ICA component. After prewhitening, each vector of the unitary mixing matrix Q can thus be characterized by a sequence of Givens rotations. To identify the mixing vectors q_p, it is sufficient to search for the parameters φ_{p,k} of these Givens rotations that maximize the statistical independence of the components.
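The following NumPy sketch builds such a product of Givens rotations and reads off its last row; it is a toy illustration of the parameterization (indices are 0-based, and the rotation-plane convention mirrors, but may not exactly match, the matrices in (3.7)).

import numpy as np

def givens_matrix(K, k, phi):
    # K x K Givens rotation by angle phi acting in the plane spanned by
    # coordinates k and K-1 (0-based), as in the block structure above.
    G = np.eye(K)
    c, s = np.cos(phi), np.sin(phi)
    G[k, k], G[k, -1] = c, -s
    G[-1, k], G[-1, -1] = s, c
    return G

def last_row_from_angles(phis):
    # Unit-norm vector of dimension K = len(phis) + 1, obtained as the
    # last row of the product of K-1 Givens rotations.
    K = len(phis) + 1
    Q = np.eye(K)
    for k, phi in enumerate(phis):
        Q = givens_matrix(K, k, phi) @ Q
    return Q[-1]

print(np.linalg.norm(last_row_from_angles([0.3, -0.7, 1.1])))  # 1.0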
3.1.3 State-of-the-art methods for blind source separation
In the literature, a large variety of algorithms have been proposed to solve the BSS problem (see, e.g., [24] and references therein). In this section, we discuss SO methods, which exploit the uncorrelatedness of the signals, and ICA methods, which rely on the statistical independence of the components.
3.1.3.1 SO methods
Methods based on SO statistics have been developed to separate uncorrelated signals. In particular, some techniques assume that the signals are stationary and that the temporal correlation profile is different for each component. Based on these assumptions, SO methods also allow for the separation of Gaussian signals. Subsequently, we review two SO methods, CCA and SOBI, that are commonly used in the context of EEG denoising. Together with the ICA algorithms described in Section 3.1.3.2, these methods will serve as benchmark algorithms in Section 3.1.6.

CCA The objective of Canonical Correlation Analysis (CCA) (see, e.g., [28]) consists in identifying linear filters α and β for two datasets {x[t]} and {y[t]} such that the correlation coefficient
$$\rho(\alpha^T x, \beta^T y) = \frac{E\{(\alpha^T x[t])(\beta^T y[t])\}}{\sqrt{E\{(\alpha^T x[t])^2\}\, E\{(\beta^T y[t])^2\}}} = \frac{\alpha^T C_{xy}\, \beta}{\sqrt{(\alpha^T C_x \alpha)(\beta^T C_y \beta)}} \qquad (3.8)$$
between the filter outputs α^T x and β^T y is maximized. Here, C_{xy} = E{x[t] y^T[t]} denotes the cross-covariance matrix of the random processes {x[t]} and {y[t]}, and C_x = E{x[t] x^T[t]} and C_y = E{y[t] y^T[t]} correspond to the covariance matrices of the processes {x[t]} and {y[t]}. Note that we here assume that the datasets {x[t]} and {y[t]} are zero-mean, which can easily be achieved by centering the data if this is not the case beforehand. It can be shown that maximizing (3.8) with respect to α and β is equivalent to solving the following two eigenvalue problems:
$$C_x^{-1} C_{xy} C_y^{-1} C_{yx}\, \alpha = \lambda^2 \alpha \qquad (3.9)$$
$$C_y^{-1} C_{yx} C_x^{-1} C_{xy}\, \beta = \lambda^2 \beta, \qquad (3.10)$$
which can be done efficiently using QR-decompositions. In the context of blind source separation, this approach has been adapted to extract uncorrelated signals with maximal autocorrelation. To this end, we consider the EEG data {x[t]} and the measurements at a time lag τ, {y[t]} = {x[t + τ]}, and maximize the correlation coefficient between the extracted signal {ω^T x[t]} and its delayed version {ω^T x[t + τ]}. In this case, the two filters α and β are thus identical to the demixing vector ω (the vector ω^T corresponds to one row of the demixing matrix Ω). This leads to the CCA contrast function
$$\psi(\omega) = \frac{\omega^T C_x(\tau)\, \omega}{\omega^T C_x(0)\, \omega} \qquad (3.11)$$
where C_x(τ) = E{x[t] x[t + τ]^T} denotes the autocorrelation matrix at time lag τ. As α = β = ω, in [28, 43], approximate solutions to the optimization problem based on the contrast (3.11) are obtained by solving only one of the eigenvalue problems (3.9) and (3.10) with C_{xy} = C_{yx}^T = C_x(τ) and C_x = C_y = C_x(0). Due to matrix symmetries, the P identified eigenvectors ω_p are such that the extracted signals {ω_p^T x[t]} are uncorrelated. The CCA algorithm is summarized in Figure 3.2. Note that, contrary to most ICA methods, CCA does not require prewhitening.
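A minimal NumPy sketch of this delayed-copy variant of CCA follows; the eigenvalue problem (3.9) is solved directly rather than through the QR/SVD route of Figure 3.2, and the symmetrization of the lagged covariance is our own simplification.

import numpy as np

def cca_bss(X, tau=1):
    # BSS by canonical correlation between the data and a delayed copy:
    # eigenvectors of Cx(0)^-1 Cx(tau) Cx(0)^-1 Cx(tau)^T give the
    # demixing rows, cf. (3.9).
    X = X - X.mean(axis=1, keepdims=True)
    T = X.shape[1] - tau
    C0 = X[:, :T] @ X[:, :T].T / T
    Ct = X[:, :T] @ X[:, tau:tau + T].T / T
    Ct = (Ct + Ct.T) / 2                  # symmetrize the lagged covariance
    M = np.linalg.solve(C0, Ct) @ np.linalg.solve(C0, Ct.T)
    vals, vecs = np.linalg.eig(M)
    order = np.argsort(-vals.real)        # sort by squared correlation
    Omega = vecs[:, order].real.T         # demixing matrix (rows = filters)
    return Omega @ X, Omega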
1. Estimation of the spatial covariance matrices based on sample SO moments: C_x(0) = (1/T) Σ_{t=1}^{T} x[t] x[t]^T, C_x(1) = (1/T) Σ_{t=1}^{T} x[t] x[t+1]^T
2. Determination of the eigenmatrix Ω^T of C_x(0)^{-1} C_x(1) C_x(0)^{-1} C_x(1)^T by QR decomposition and SVD of the data matrices X and Y: X^T = Q_x R_x, Y^T = Q_y R_y, Q_x^T Q_y = U Σ V^T, Ω^T = R_x^{-1} U
3. Extraction of the mixing matrix Ĥ = Ω^{-1} and the signal matrix Ŝ = Ω X

Figure 3.2: Description of the CCA algorithm.

SOBI The Second Order Blind Identification (SOBI) algorithm [30] is applied to prewhitened data, which reduces the BSS problem to the identification of the unitary matrix Q. The algorithm is based on the spatial covariance matrices C(τ_r) = E{z[t] z^T[t + τ_r]} of the prewhitened data {z[t]} for a fixed set of considered delays τ_r, r = 1, …, R. These covariance matrices are given by
$$C(\tau_r) = Q\, C_s(\tau_r)\, Q^T \qquad (3.12)$$
where C_s(τ_r) = E{s[t] s^T[t + τ_r]} are the covariance matrices of the signals, which are diagonal because the signals are uncorrelated. The idea of SOBI consists in identifying the matrix Q^T which jointly (approximately) diagonalizes the matrices C(τ_r), r = 1, …, R: Q^T C(τ_r) Q = diag(λ_1, …, λ_P). This is achieved by minimizing the sum of the squares of the off-diagonal elements of the matrices Q^T C(τ_r) Q, giving rise to the following cost function:
$$\psi(Q) = \sum_{r=1}^{R} \mathrm{off}(Q^T C(\tau_r)\, Q) \quad \text{with} \quad \mathrm{off}(X) = \sum_{1 \le i \ne j \le P} |X_{i,j}|^2 \ \text{for } X \in \mathbb{R}^{P \times P}, \qquad (3.13)$$
or, equivalently, by maximizing the squares of the diagonal elements, leading to the cost function:
$$\psi(Q) = \sum_{r=1}^{R} \sum_{p=1}^{P} \left[ Q^T C(\tau_r)\, Q \right]_{p,p}^2. \qquad (3.14)$$
A number of techniques (see [44] or [24, Chapter 7] and references therein) have been proposed to solve this type of optimization problem, including, for example, the Jacobi algorithm, which is based on Givens rotations. The matrix Q, which is identified in this way, is unique up to scale and permutation indeterminacies if the signals have different normalized spectra (see [30] for more details). The steps of the SOBI algorithm are summarized in Figure 3.3.
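A compact NumPy sketch of SOBI is given below: the data are whitened and a set of symmetrized lagged covariances is jointly diagonalized by Jacobi-style Givens sweeps. The closed-form angle follows the classical joint-diagonalization scheme of Cardoso and Souloumiac; the lag set and sweep count are illustrative defaults.

import numpy as np

def sobi(X, lags=(1, 2, 3, 4, 5), sweeps=20):
    # Sketch of SOBI: prewhiten, then jointly diagonalize lagged
    # covariance matrices by Givens sweeps (cf. Figure 3.3).
    X = X - X.mean(axis=1, keepdims=True)
    N, T = X.shape
    C0 = X @ X.T / T
    w, U = np.linalg.eigh(C0)
    W = (U / np.sqrt(w)).T                # whitening matrix
    Z = W @ X
    C = [Z[:, :-l] @ Z[:, l:].T / (T - l) for l in lags]
    C = [(M + M.T) / 2 for M in C]        # symmetrized lagged covariances
    Q = np.eye(N)
    for _ in range(sweeps):
        for i in range(N - 1):
            for j in range(i + 1, N):
                # closed-form Givens angle jointly reducing off-diagonals
                h = np.array([[M[i, i] - M[j, j], M[i, j] + M[j, i]] for M in C])
                G = h.T @ h
                ton, toff = G[0, 0] - G[1, 1], G[0, 1] + G[1, 0]
                theta = 0.5 * np.arctan2(toff, ton + np.hypot(ton, toff))
                c, s = np.cos(theta), np.sin(theta)
                R = np.eye(N); R[i, i] = R[j, j] = c; R[i, j], R[j, i] = -s, s
                C = [R.T @ M @ R for M in C]
                Q = Q @ R
    S_hat = Q.T @ Z                       # estimated source signals
    H_hat = np.linalg.pinv(W) @ Q         # estimated mixing matrix
    return S_hat, H_hat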
1. Prewhitening: C_x = (1/T) Σ_{t=1}^{T} x[t] x[t]^T, C_x = U Σ² U^T, z[t] = (U_s Σ_s)^+ x[t]
2. Estimation of the spatial covariance matrices based on sample SO moments: C(τ_r) = (1/T) Σ_{t=1}^{T} z[t] z[t + τ_r]^T
3. Determination of the unitary matrix Q by joint diagonalization of the matrices C(τ_r), r = 1, …, R
4. Extraction of the mixing matrix Ĥ = U_s Σ_s Q and the signals ŝ[t] = Ĥ^+ x[t]

Figure 3.3: Description of the SOBI algorithm.

3.1.3.2 ICA methods
In this section, we focus on ICA algorithms, which resort to HO statistics of the data (see Appendix A for a short introduction to HO statistics). As a consequence, these methods are not suited for the separation of Gaussian signals, whose cumulants of order greater than two are null. Only mixtures with at most one Gaussian signal can be identified by ICA methods, which constitutes an important difference compared to SO methods. Over the last two decades, a large number of ICA methods have been proposed (see [24] and references therein). These algorithms can be divided into two types of approaches: joint methods, which separate all components simultaneously, and deflation methods, which extract the components sequentially. Among the joint methods, one can find popular algorithms such as Infomax [45], JADE [46], and COM2 [33], whereas deflation methods include, for example, the FastICA algorithm¹ [47], RobustICA [48], and the adaptive method by Delfosse and Loubaton [42], in the following referred to as DelL. Another aspect in which the above-mentioned ICA algorithms differ consists in the procedure employed for the maximization of the contrast function, leading to the distinction of iterative algorithms and semi-algebraic methods. Iterative algorithms are based on an update rule whose repeated application permits to obtain an estimate of the mixing matrix. This type of approach comprises classical optimization procedures such as gradient-based algorithms and Newton search methods. Semi-algebraic approaches, on the other hand, include methods based on joint diagonalizations [46, 49, 50] and polynomial rooting [33, 51]. For the positioning of the techniques discussed in this thesis, the above-described differences between ICA algorithms are the most relevant. Note though that other classifications of ICA algorithms also exist. For example, the ICA methods can be classified according to the employed contrast function, leading to the distinction between approaches that are based on differential entropy, such as Infomax and FastICA, and cumulant-based techniques like JADE, COM2, and DelL. Furthermore, one can distinguish between batch ICA methods, which process a block of observed data samples, and adaptive algorithms, which work on a sample-by-sample basis. Finally, ICA techniques for underdetermined mixtures [49, 52, 50, 53] can also be considered as opposed to complete and overdetermined ICA methods, which are able to extract at most as many components as sensors. Here, we do not consider underdetermined methods and focus on cumulant-based approaches that are applied to blocks of EEG data. In the following, we describe three popular ICA algorithms: FastICA, DelL, and COM2. These algorithms are all based on prewhitened data and aim at identifying the unitary mixing matrix Q.
¹ Please note that a version of the FastICA algorithm which jointly extracts all components also exists.
FastICA The FastICA algorithm [40, 24] is an iterative method which extracts the components sequentially. For the extraction of the p-th component, it employs the contrast function
$$\psi(q_p) = E\{f(q_p^T z[t])\} \qquad (3.15)$$
which has been derived from an approximation of negentropy for suitably chosen non-quadratic functions f. For example, f(u) = u³ leads to a contrast that constitutes a simplified form of kurtosis. In this case, an approximation of negentropy has previously been derived in [33]. Please note that other choices for f lead to contrast functions which are not based on cumulants. The contrast is optimized iteratively with respect to the vector q_p using an approximate Newton method. Based on an initial estimate q_p^{(1)}, at iteration i, this leads to the following update rule for the vector q_p:
$$q_p^{(i+1)} = E\{f(q_p^{(i)T} z[t])\, z[t]\} - E\{f'(q_p^{(i)T} z[t])\}\, q_p^{(i)}. \qquad (3.16)$$
In practice, the expectations are replaced by sample means. In order to ensure that the p-th extracted component is decorrelated with all previously identified components, a deflation scheme based on Gram-Schmidt orthogonalization is used. More precisely, after each iteration, the current estimate q_p^{(i+1)} of the mixing vector is projected onto the subspace that is orthogonal to all previously extracted mixing vectors q_q with q = 1, …, p − 1:
$$q_p^{(i+1)} \leftarrow q_p^{(i+1)} - \sum_{q=1}^{p-1} (q_p^{(i+1)T} q_q)\, q_q. \qquad (3.17)$$
Finally, the vector q_p^{(i+1)} is normalized according to:
$$q_p^{(i+1)} \leftarrow \frac{q_p^{(i+1)}}{\|q_p^{(i+1)}\|_2}. \qquad (3.18)$$
The steps of the FastICA algorithm are summarized in Figure 3.4 for f(u) = u³. Note that in this case, the algorithm is not of Newton type, but deflates to a mere fixed-step gradient, as pointed out in [54, 48].

Prewhitening: C_x = (1/T) Σ_{t=1}^{T} x[t] x[t]^T, C_x = U Σ² U^T, z[t] = (U_s Σ_s)^+ x[t]
for p = 1 to P do
  initialization of q_p^{(1)}
  for i = 1 to I do
    update of the mixing vector: q_p^{(i+1)} = (1/T) Σ_{t=1}^{T} (q_p^{(i)T} z[t])³ z[t] − 3 q_p^{(i)}
    projection: q_p^{(i+1)} ← q_p^{(i+1)} − Σ_{q=1}^{p−1} (q_p^{(i+1)T} q_q) q_q
    normalization: q_p^{(i+1)} ← q_p^{(i+1)} / ‖q_p^{(i+1)}‖₂
  end for
  extraction of the mixing vector ĥ_p = U_s Σ_s q_p^{(i+1)}
end for
Figure 3.4: Description of the FastICA algorithm.
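For concreteness, the following NumPy sketch implements the deflationary FastICA loop of Figure 3.4 with f(u) = u³; the random initialization, iteration count, and convergence test are illustrative choices.

import numpy as np

def fastica_deflation(Z, P, iters=100):
    # Deflationary FastICA with f(u) = u^3 on prewhitened data Z:
    # fixed-point update (3.16), Gram-Schmidt projection (3.17),
    # and normalization (3.18).
    rng = np.random.default_rng(0)
    Tn = Z.shape[1]
    Q = np.zeros((Z.shape[0], P))
    for p in range(P):
        q = rng.normal(size=Z.shape[0]); q /= np.linalg.norm(q)
        for _ in range(iters):
            y = q @ Z                                 # current component
            q_new = (y**3) @ Z.T / Tn - 3 * q         # update rule (3.16)
            q_new -= Q[:, :p] @ (Q[:, :p].T @ q_new)  # projection (3.17)
            q_new /= np.linalg.norm(q_new)            # normalization (3.18)
            if abs(q_new @ q) > 1 - 1e-9:             # convergence check
                q = q_new; break
            q = q_new
        Q[:, p] = q
    return Q.T @ Z, Q                                 # signals and rotation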
DelL The authors of [42] have proposed an adaptive ICA algorithm which extracts the components sequentially. This method, subsequently referred to as DelL, is based on the following contrast function:
$$\psi(s_p) = \frac{C_{4,s_p}^2}{4} \qquad (3.19)$$
where {s_p[t]} corresponds to the extracted signal for the p-th component and C_{4,y} denotes the fourth order (FO) cumulant of the random variable y. The signal {s_p[t]} is obtained by applying an appropriate filter q^{(p)} to the prewhitened data {z^{(p)}[t]}: s_p[t] = q^{(p)T} z^{(p)}[t]. To facilitate the determination of the filter, the parameterization based on Givens rotations, which has been described in Section 3.1.2.3, is used. More particularly, the vector q^{(p)} is defined such that
$$Q^{(p)T}(\phi_p) = \begin{bmatrix} \tilde{Q}^{(p)T}(\phi_p) \\ q^{(p)T}(\phi_p) \end{bmatrix},$$
where Q^{(p)T}(φ_p) can be written according to (3.7) and is entirely described by the vector φ_p of Givens rotation angles. The identification of q^{(p)}(φ_p) thus reduces to the estimation of the vector φ_p, which is performed using a gradient ascent algorithm. Based on an initial estimate φ_p^{(1)}, at iteration i, the vector of rotation angles is updated according to
$$\phi_p^{(i+1)} = \phi_p^{(i)} + \mu \frac{\partial \psi}{\partial \phi_p} \qquad (3.20)$$
where µ is a stepsize parameter. The derivative of the contrast with respect to the parameter vector φ_p is given by
$$\frac{\partial \psi}{\partial \phi_p} = 2\, C_{4,s_p} \frac{\partial M_{4,s_p}}{\partial \phi_p} \qquad (3.21)$$
where M4,sp denotes the FO moment of the extracted signal {sp [t]}. Once the p-th component has been extracted, the remaining components can be identified from the data in the subspace that is orthogonal to the extracted component. This leads to a decrease in dimension of the analyzed data with increasing number of extracted components. To formulate the complete algorithm, we introduce the data vector z(p+1) [t] for the extraction of the (p + 1)-th component. This vector is obtained as: ˜ (p)T z(p) [t] with z(1) [t] = z[t]. z(p+1) [t] = Q Depending on the way the FO cumulant and moment, C4,sp and M4,sp , are estimated, we distinguish between the following two versions of the DelL algorithm: Stochastic gradient DelL (DelL-SG) The original DelL algorithm proposed in [42] is adaptive and estimates the moment M4,sp at iteration i based on the data sample that is available at time sample t = i, while the cumulant C4,sp is estimated adaptively: (i)
(p) 4 M4,sp = s4p [i] = (qT (φ(i) p )z [i]) (i+1) C4,sp
=
(i) C4,sp
s4 [i] − 3 (i) − C4,sp . +µ p 4 !
This approach leads to a stochastic gradient algorithm with the following update rule for the parameter vector φ_p:
$$\phi_p^{(i+1)} = \phi_p^{(i)} + 2\mu\, C_{4,s_p}^{(i)}\, s_p^3[i]\, \Gamma(\phi_p^{(i)})\, z^{(p+1)}[i] \qquad (3.22)$$
where Γ(φ_p^{(i)}) = diag(δ(φ_p^{(i)})) with
$$\delta(\phi_{p,\ell}^{(i)}) = \begin{cases} 1 & \text{for } \ell = 1 \\ \prod_{i=1}^{\ell-1} \cos(\phi_{p,i}^{(i)}) & \text{for } \ell = 2, \ldots, P - p - 1. \end{cases}$$
Deterministic gradient DelL (DelL-DG) As EEG data are generally not processed in real time, here we consider a deterministic gradient algorithm, which exploits all available time samples to estimate the FO cumulant and moment, C_{4,s_p} and M_{4,s_p}, required for the updates of the parameter vector φ_p:
$$M_{4,s_p}^{(i)} = \frac{1}{T} \sum_{t=1}^{T} (q^{(p)T}(\phi_p^{(i)})\, z^{(p)}[t])^4$$
$$C_{4,s_p}^{(i)} = \frac{1}{T} \sum_{t=1}^{T} (q^{(p)T}(\phi_p^{(i)})\, z^{(p)}[t])^4 - 3.$$
This leads to the following update rule:
$$\phi_p^{(i+1)} = \phi_p^{(i)} + 2\mu\, C_{4,s_p}^{(i)} \frac{1}{T} \sum_{t=1}^{T} (q^{(p)T}(\phi_p^{(i)})\, z^{(p)}[t])^3\, \Gamma(\phi_p^{(i)})\, z^{(p+1)}[t]. \qquad (3.23)$$
The steps of the DelL-DG algorithm are summarized in Figure 3.5.

Prewhitening: C_x = (1/T) Σ_{t=1}^{T} x[t] x[t]^T, C_x = U Σ² U^T, z[t] = (U_s Σ_s)^+ x[t]
initialization: z^{(1)}[t] = z[t]
for p = 1 to P {loop over components} do
  initialization: φ_p^{(1)}
  for i = 1 to I {loop over iterations} do
    estimation of the cumulant C_{4,s_p}^{(i)} = (1/T) Σ_{t=1}^{T} (q^{(p)T}(φ_p^{(i)}) z^{(p)}[t])⁴ − 3
    update of the parameter vector:
      φ_p^{(i+1)} = φ_p^{(i)} + 2µ C_{4,s_p}^{(i)} (1/T) Σ_{t=1}^{T} (q^{(p)T}(φ_p^{(i)}) z^{(p)}[t])³ Γ(φ_p^{(i)}) z^{(p+1)}[t]
  end for
  computation of the mixing matrix Q^{(p)}(φ_p^{(i+1)})
  extraction of the mixing vector ĥ_p = U_s Σ_s (∏_{k=1}^{p−1} Q̃^{(k)}) q^{(p)} and the signal ŝ_p[t] = q^{(p)T} z^{(p)}[t]
  z^{(p+1)}[t] = Q̃^{(p)T} z^{(p)}[t]
end for
Figure 3.5: Description of the DelL-DG algorithm.
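As a sketch of the deterministic-gradient idea behind (3.23), the following NumPy function performs one update; for simplicity it parameterizes the extraction vector directly on the unit sphere rather than through the Givens angles φ_p, so it illustrates the gradient of the squared-kurtosis contrast, not the full DelL-DG algorithm.

import numpy as np

def dell_dg_step(Z, q, mu=0.01):
    # One deterministic-gradient update in the spirit of DelL-DG:
    # gradient ascent on psi = C4^2/4 for the signal y = q^T z.
    Tn = Z.shape[1]
    y = q @ Z
    c4 = np.mean(y**4) - 3.0              # FO cumulant of current estimate
    grad = 2.0 * c4 * (y**3) @ Z.T / Tn   # d(C4^2/4)/dq = 2 C4 E{y^3 z}
    q = q + mu * grad
    return q / np.linalg.norm(q)          # re-project onto the unit sphere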
COM2 The COM2 algorithm [33, 24] employs the parameterization based on Givens rotations (see Section 3.1.2.3) to identify the unitary mixing matrix Q and resembles the DelL algorithm in this regard. However, contrary to DelL, COM2 employs a semi-algebraic optimization method and extracts all components at once. Furthermore, the COM2 method exploits the fact that it is sufficient to impose pairwise independence to solve the BSS problem described in Section 3.1.1 (cf. [33]). In order to achieve statistical independence of a pair of signals {s_p[t]} and {s_k[t]}, the following contrast is used:
$$\psi(s_p, s_k) = C_{4,s_p}^2 + C_{4,s_k}^2. \qquad (3.24)$$
Due to the employed parameterization, the signals {s_p[t]} and {s_k[t]} that are extracted from the observations depend on the angle φ_{p,k} that characterizes the Givens rotation. More particularly, setting θ_{p,k} = tan(φ_{p,k}), the signals are obtained from the elements of the prewhitened data as
$$\begin{bmatrix} s_k[t] \\ s_p[t] \end{bmatrix} = \sqrt{\frac{1}{1+\theta_{p,k}^2}} \begin{bmatrix} 1 & -\theta_{p,k} \\ \theta_{p,k} & 1 \end{bmatrix} \begin{bmatrix} z_k[t] \\ z_{P-p+1}[t] \end{bmatrix}. \qquad (3.25)$$
For each signal pair (p, k), COM2 determines the optimal parameter θ_{p,k} that maximizes the contrast (3.24). This can be achieved using the algebraic optimization method described in Appendix B. To obtain an estimate of the mixing matrix Q^T(Θ), the procedure is repeated for all signal pairs over I iterations, assembling the matrices Q_g^{(p,k)}(θ_{p,k}). The steps of the algorithm are outlined in Figure 3.6. Note that there also exists an adaptive (i.e. stochastic) COM2 algorithm, along the lines of [55, 56].

Prewhitening: C_x = (1/T) Σ_{t=1}^{T} x[t] x[t]^T, C_x = U Σ² U^T, z[t] = (U_s Σ_s)^+ x[t]
initialization: Q = I_P
for i = 1 to I {loop over iterations} do
  for p = 1 to P {loop over components} do
    for k = 1 to P − p {loop over all signal pairs} do
      optimization of (3.24) with respect to θ_{p,k} and construction of the Givens rotation matrix Q_g^{(p,k)}(θ_{p,k})
      Q^T ← Q_g^{(p,k)T} Q^T
      z[t] ← Q_g^{(p,k)} z[t]
    end for
  end for
end for
extraction of the mixing matrix Ĥ = U_s Σ_s Q
Figure 3.6: Description of the COM2 algorithm.
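The following NumPy sketch mimics the sweep structure of Figure 3.6; for illustration, the algebraic maximization of (3.24) by polynomial rooting (Appendix B) is replaced by a simple grid search over the rotation angle, which is far less efficient but easy to follow.

import numpy as np

def com2_sweeps(Z, n_sweeps=3, n_angles=180):
    # Jacobi-like pairwise sweeps in the spirit of COM2: for each signal
    # pair, pick the Givens angle maximizing the sum of squared kurtoses.
    P, T = Z.shape
    Z = Z.copy(); Q = np.eye(P)
    angles = np.linspace(-np.pi / 4, np.pi / 4, n_angles)
    kurt2 = lambda y: (np.mean(y**4, axis=-1) - 3.0) ** 2
    for _ in range(n_sweeps):
        for p in range(P - 1):
            for k in range(p + 1, P):
                zp, zk = Z[p], Z[k]
                c, s = np.cos(angles)[:, None], np.sin(angles)[:, None]
                crit = kurt2(c * zp - s * zk) + kurt2(s * zp + c * zk)
                best = angles[np.argmax(crit)]        # angle maximizing (3.24)
                c, s = np.cos(best), np.sin(best)
                Z[p], Z[k] = c * zp - s * zk, s * zp + c * zk
                G = np.eye(P); G[p, p] = G[k, k] = c; G[p, k], G[k, p] = -s, s
                Q = G @ Q                             # accumulate the rotation
    return Z, Q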
3.1.4 Penalized semi-algebraic unitary deflation (P-SAUD) algorithm
To reduce the computational complexity of the ICA preprocessing step by extracting only the epileptic signals of interest, we subsequently develop a new algorithm, which is based
on a penalized deflation scheme. The proposed method builds on the SAUD algorithm, which was first presented in [57] and which is inspired by both COM2 and DelL. SAUD extracts the components sequentially, exploiting ideas from DelL, but is based on the contrast function for pairwise independence and the efficient optimization procedure of the COM2 algorithm, thereby combining the strengths of both methods. However, as for COM2, the order of the components extracted with SAUD is arbitrary. In order to reduce the computational complexity, we would like to extract only the epileptic activity of interest. This would also lead to the additional benefit of avoiding high perturbations of the epileptic signal components due to an accumulation of errors during previous deflation steps, which constitutes a common problem of deflation algorithms. To ensure that the epileptic activity is identified during the first deflation steps, the idea of the proposed P-SAUD algorithm consists in exploiting the fact that the autocorrelation of the epileptic components is higher than that of muscular artifacts. More particularly, to extract, at each step of the deflation procedure, the signal with the highest autocorrelation, we add a penalization term to the COM2 contrast function. This gives rise to the P-SAUD contrast function, which is given by
$$\psi_c(s_p, s_k) = C_{4,s_p}^2 + C_{4,s_k}^2 + \sum_{r=1}^{R} \lambda_r\, \mathrm{cov}(s_p[t], s_p[t+\tau_r])^2. \qquad (3.26)$$
Here, cov(x, y) denotes the covariance of the random variables x and y. Furthermore, λ_r, r = 1, …, R, are penalization parameters that determine the influence of each of the R penalty terms, and τ_r denotes the signal delay included in the r-th covariance penalty. In the following, we consider that all covariance terms are equally important and therefore set λ_r = λ. In practice, the value of the penalization parameter λ needs to be adjusted depending on the kurtosis and the autocorrelation of the signal to extract. As the magnitudes of these factors are generally unknown, we propose to estimate them based on the signal {s_p[t]} retrieved at the previous iteration. In order to ensure that the epileptic activity is extracted first, we use a high value of λ during the first iterations and reduce the penalization parameter with an increasing number of iterations until it reaches a final value that manages a balance between the COM2 contrast (3.24) and the penalization term. As proposed in [57], the P-SAUD algorithm employs the same deflation procedure as DelL. For the extraction of the p-th component, it identifies the matrix Q^{(p)} and in particular the vector q^{(p)T} from the temporary data vector z^{(p)}[t] (cf. Section 3.1.3.2). Using the parameterization based on Givens rotations, this reduces to the estimation of the rotation angles φ_{p,k} contained in the vector φ_p. Contrary to DelL, which estimates the vector φ_p using gradient ascent, the P-SAUD algorithm determines the optimal rotation angle for all signal pairs with reference signal p in an alternating fashion, following the COM2 approach. To this end, for each considered signal pair, the P-SAUD contrast (3.26) is maximized with respect to the parameter θ_{p,k}, which characterizes the signals {s_p[t]} and {s_k[t]} as described in Section 3.1.3.2 (cf. equation (3.25)). The optimization can be carried out algebraically and is described in Appendix B. Finally, we find the matrix Q̃^{(p)} and the vector z̃^{(p)} such that:
$$Q^{(p)T} = \begin{bmatrix} \tilde{Q}^{(p)T} \\ q^{(p)T} \end{bmatrix} \quad \text{and} \quad z^{(p)} = \begin{bmatrix} \tilde{z}^{(p)} \\ z_1^{(p)} \end{bmatrix}$$
and initialize the temporary mixing matrix and data vector for the (p+1)-th source by Q^{(p+1)} = Q̃^{(p)} and z^{(p+1)} = z̃^{(p)}.
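A direct NumPy transcription of the penalized contrast (3.26), with λ_r = λ for all r as in the text, might look as follows; sample statistics replace the expectations, and the function name is ours.

import numpy as np

def psaud_contrast(sp, sk, lam, taus=(1, 2, 3, 4, 5)):
    # Penalized P-SAUD contrast for a candidate signal pair (sp, sk):
    # squared kurtoses plus autocorrelation penalties on sp (zero-mean,
    # unit-variance signals assumed).
    kurt2 = lambda y: (np.mean(y**4) - 3.0) ** 2
    penalty = sum(np.mean(sp[:-t] * sp[t:]) ** 2 for t in taus)
    return kurt2(sp) + kurt2(sk) + lam * penalty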
The steps of the P-SAUD algorithm are summarized in Figure 3.7. The algorithm is stopped after M components including the epileptic signals of interest have been extracted.

Remark: To automatically determine the number of components to extract, one could employ a spike detection method that is run on the extracted components and stops the algorithm after several components without epileptic activity have been identified.

Prewhitening: C_x = (1/T) Σ_{t=1}^{T} x[t] x[t]^T, C_x = U Σ² U^T, z[t] = (U_s Σ_s)^+ x[t]
initializations: z^{(1)}[t] = z[t], Q^{(1)} = I_P, α = α_max
for p = 1 to M do
  for i = 1 to I do
    α ← α − (1/I)(α_max − α_min)
    for k = 1 to P − p do
      estimation of the kurtosis and penalty of the current signal estimate to adjust the penalization parameter:
        C₁ = kurt(z_1^{(p)}[t])²
        P₁ = Σ_{r=1}^{R} cov(z_1^{(p)}[t], z_1^{(p)}[t+τ_r])²
        λ = α C₁ / P₁
      optimization of (3.26) with respect to θ_{p,k} and construction of the Givens rotation matrix Q_g^{(p,k)}
      Q^{(p)T} ← Q_g^{(p,k)T} Q^{(p)T}
      z^{(p)}[t] ← Q_g^{(p,k)} z^{(p)}[t]
    end for
  end for
  extraction of the mixing vector ĥ_p = U_s Σ_s q^{(p)}, ŝ_p[t] = q^{(p)T} z[t]
  Q^{(p+1)} = Q̃^{(p)}, z^{(p+1)}[t] = z̃^{(p)}[t]
end for
Figure 3.7: Description of the P-SAUD algorithm.
3.1.5 Analysis of the computational complexity
An important aspect in the evaluation of different algorithms is their computational complexity. This point has been addressed in [31] for a number of popular ICA methods, including the SO and ICA methods presented in Section 3.1.3. Subsequently, we therefore concentrate on the calculation of the computational complexity of P-SAUD. The computational complexity is generally assessed as the number of real-valued floating point operations (FLOPs) that are required for the completion of an algorithm. As the number of additions is usually of the same order as the number of multiplications, the analysis of the computational cost of an operation is often limited to the determination of the number of multiplications. Therefore, in the following, we compute the number of real-valued multiplications involved in the P-SAUD algorithm. The first step of P-SAUD consists in prewhitening the data, which can be accomplished either by an EVD or by an SVD (cf. Section 3.1.2.2), leading to a computational complexity of min(TN²/2 + 4N³/3 + PMT, 2TN²) FLOPs [31].
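For instance, the prewhitening cost quoted above can be evaluated directly; the parameter values in the usage line are arbitrary examples, not values used in the simulations.

def prewhitening_flops(N, T, P, M):
    # FLOP count for the prewhitening step as given in the text:
    # min(T*N^2/2 + 4*N^3/3 + P*M*T, 2*T*N^2), i.e. EVD- vs. SVD-based.
    return min(T * N**2 / 2 + 4 * N**3 / 3 + P * M * T, 2 * T * N**2)

print(prewhitening_flops(N=32, T=8192, P=32, M=2))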
The highest computational cost of P-SAUD can be associated with the estimation of the FO cumulants and the covariances that are exploited in the penalized contrast (3.26). These can be obtained in two different ways:

1. Estimation of the cumulants and covariances for each pair of analyzed signals from the temporary data vectors at each iteration. The estimation of the cumulants for one pair of signals that is associated with two rows of {z^{(p)}[t]} from T time samples requires O(8T) FLOPs if the Leonov-Shiryaev formula for zero-mean data with unit variance is used (cf. Appendix A.1). Furthermore, the estimation of the covariances requires O(4T) FLOPs for each of the R penalty terms. On the whole, considering that the number of epileptic components is small compared to the number of sensors and that these components are extracted first, this leads to O(8TIMP) and O(4RTIMP) FLOPs for cumulant and covariance estimations for the extraction of the first M components.

2. Estimation of the complete cumulant and covariance matrices in a first step and derivation of the statistics required for the analysis of a certain pair of components using the orthogonal transformation matrix Q^{(p)} in a second step. For real-valued data, the FO cumulant matrix contains O(P⁴/24) different cumulants that are estimated using O(3T) FLOPs for each cumulant. Exploiting the multilinearity property of cumulants (cf. Appendix A.2), the cumulants of the data after one Givens rotation can be derived from the quadricovariance matrix with 16 FLOPs per cumulant, i.e., O(2P⁴/3) FLOPs in total. On the whole, this corresponds to O(TP⁴/8 + 2P⁵MI/3) FLOPs. Similar considerations reveal a computational cost of O(R(P²T + 4MP²I)) for the estimation and transformations of R time-delayed covariance matrices.

The first method is called “computation on demand” in [56] and generally involves less computation than the second method, as pointed out in this reference. For both ways of estimating the statistics, the computational cost for all Σ_{n=1}^{M}(P − n) considered pairs of components has to be summed up over all iterations I. Furthermore, at each iteration and for each pair of components, the optimization of (3.26) requires the rooting of an 8th-degree polynomial (cf. Appendix B), which can be accomplished with IMP Q₈ FLOPs, where Q_n denotes the computational complexity associated with the rooting of a polynomial of degree n. The update of the temporary demixing matrix and the data vector z^{(p)}[t] necessitates O(4T + 4P) FLOPs for each pair of components, i.e., O((4T + 4P)IMP) FLOPs for the extraction of M components. Finally, the computation of the signal and mixing vectors of the extracted components adds up to O(4P²IM + PTM) FLOPs. Table 3.1 shows the summarized computational complexity of P-SAUD in comparison to the computational costs of CCA, SOBI, FastICA, COM2, and DelL-DG.
3.1.6 Computer results
In this section, we evaluate the performance of the proposed P-SAUD algorithm in comparison to the performances achieved by the state-of-the-art methods described in Section 3.1.3 by means of computer simulations.
Algorithm | Number of FLOPs
P-SAUD | min(TN²/2 + 4N³/3 + PMT, 2TN²) + IP²Q₈/2 + 4P²IM + R min(4TIMP, P²T + 2IP³) + PTM + min(2IP⁵M/3 + P⁴T/8, 8ITPM)
CCA | T(3N² + 7P) + 32P³/3 + NP²
SOBI | min(TN²/2 + 4N³/3 + PMT, 2TN²) + 4N³/3 + IP(P−1)(17(N_τ−1) + 75 + 4P + 4P(N_τ−1))/2 + (N_τ−1)N³/2
FastICA | min(TN²/2 + 4N³/3 + PMT, 2TN²) + (2(P−1)(P+T) + 5TP(P+1)/2)I
COM2 | min(TN²/2 + 4N³/3 + PMT, 2TN²) + IP²Q₄/2 + min(IP⁶/6 + 2IP³ + P⁴T/8 + TP², 6ITP²)
DelL-DG | min(TN²/2 + 4N³/3 + PMT, 2TN²) + 3ITN²

Table 3.1: Computational complexity in terms of real-valued multiplications for P-SAUD, CCA, SOBI, FastICA, COM2, and DelL-DG. The results for SOBI, FastICA, and COM2 are reproduced from [31]. Note that the number of iterations, denoted by I, may differ for each algorithm in order to reach convergence.
3.1.6.1 Simulation setup
Data generation We simulate 32 s of realistic measurement data for 32 electrodes and a sampling rate of 256 Hz originating from two patches with independent epileptic activities. Each patch consists of 100 adjacent grid dipoles emitting highly correlated signals. We consider three scenarios with different patch distances: patches SupFr and SupOcc (large distance), patches InfFr and InfPa (medium distance), and patches SupFr and InfFr (small distance). For an illustration of the patches, the reader is referred to Figure 2.6. The data are synthesized based on a realistic head model and signals obtained using a neuronal population model as described in Section 2.5.2. Finally, muscular activity recorded during an EEG session is added to the epileptic data according to a given SNR, which is computed as SNR = ‖X_e‖_F / ‖N‖_F where X_e denotes the epileptic data of interest and N corresponds to the noise, i.e., in this case, to the muscular artifacts.
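A small helper for this noise scaling might look as follows; the conversion to decibels via 20 log₁₀ of the amplitude ratio is our assumption, since the text specifies only the Frobenius-norm ratio.

import numpy as np

def frobenius_snr_db(Xe, noise):
    # SNR based on the Frobenius-norm ratio ||Xe||_F / ||N||_F,
    # expressed in dB (20 log10 of the amplitude ratio, assumed here).
    return 20 * np.log10(np.linalg.norm(Xe) / np.linalg.norm(noise))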
Tested algorithms We apply the proposed P-SAUD algorithm to 50 realizations of simulated data with different epileptic spikes and artifacts and compare its results to those of SAUD, SOBI, COM2, CCA, FastICA, and DelL-DG. The maximal delay for the autocorrelation matrices considered in SOBI is fixed to 15 time samples. Note that DelL-DG is implemented with an adaptive stepsize of the gradient algorithm, contrary to the original method described in [42]. For P-SAUD, we compare the results obtained for τ = 1 and τ = [1, 2, 3, 4, 5] with τ = [τ1 , . . . , τR ]. The parameters αmax and αmin (cf. Figure 3.7) are set to 4 and 0, respectively.
Evaluation In order to analyze the performance of the different methods, we compute the correlation coefficients of the original and estimated signals and of the original and
estimated mixing vectors of the epileptic components, which are given by
$$\rho_s = \frac{(s - \mu_s)^T (\hat{s} - \mu_{\hat{s}})}{\|s - \mu_s\|_2 \cdot \|\hat{s} - \mu_{\hat{s}}\|_2} \qquad (3.27)$$
$$\rho_h = \frac{(h - \mu_h)^T (\hat{h} - \mu_{\hat{h}})}{\|h - \mu_h\|_2 \cdot \|\hat{h} - \mu_{\hat{h}}\|_2} \qquad (3.28)$$
where µ_x corresponds to the mean of the elements of vector x. To determine the worst-case performance of the methods, we focus on the smallest correlation coefficient obtained for the two patches for each of the 50 realizations. To obtain a robust measure of this correlation coefficient, we then take the median value observed over all realizations. In addition, we determine how many components have to be extracted to identify the epileptic activity using the P-SAUD algorithm compared to SAUD and the other considered deflation algorithms, i.e., FastICA and DelL-DG. To this end, we identify the indices of the extracted components whose signals show the highest correlation with the original epileptic signals and average the maximal index for each method over all realizations. Even though CCA extracts the components simultaneously, we show the corresponding indices after ordering the extracted components according to their autocorrelation to demonstrate the interest in using the autocorrelation as a penalization term for P-SAUD.
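The two evaluation criteria (3.27) and (3.28) reduce to the same centered correlation coefficient, which can be computed as follows (a trivial sketch; the function name is ours).

import numpy as np

def corrcoef_centered(u, v):
    # Correlation coefficient between a true and an estimated signal or
    # mixing vector, cf. (3.27)/(3.28): remove each mean, then normalize.
    u = u - u.mean(); v = v - v.mean()
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))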
3.1.6.2 Performance analysis for different patch distances
Figure 3.8: Correlation coefficients of recovered signal vectors and mixing vectors for the two distant patches SupFr & SupOcc (top), patches InfFr & InfPa with medium distance (center), and the two close patches SupFr & InfFr (bottom) for N = 32 sensors.

Figure 3.9: Index of recovered components for the two distant patches SupFr & SupOcc (top left), patches InfFr & InfPa with medium distance (top right), and the two close patches SupFr & InfFr (bottom) for N = 32 sensors.

The mixing vector and signal correlation coefficients are displayed in Figure 3.8 and the maximal index of the extracted epileptic components is plotted in Figure 3.9 as a function of the SNR for the three considered scenarios. Figure 3.8 shows that P-SAUD, for both τ = 1 and τ = [1, 2, 3, 4, 5], and COM2 achieve approximately the same performance. FastICA leads to similar results for distant patches and slightly worse results in the other cases. Furthermore, SAUD generally exhibits slightly reduced correlation coefficients compared to P-SAUD. Finally, the epileptic components extracted by the SO methods CCA and SOBI are not as accurate as those of higher order algorithms such as P-SAUD and FastICA, in particular concerning the spatial mixing vectors. This is due to the fact that the epileptiform signals of the two patches exhibit identical temporal correlation profiles, which prevents an accurate separation based on SO methods. DelL also exhibits a clearly reduced performance in comparison to the other FO methods for SNRs less than -5 dB. Comparing the results obtained for different patch distances, we note that the closer the patches, the faster the correlation coefficients diminish with decreasing SNR. Nevertheless, for all considered scenarios, P-SAUD and COM2 make it possible to extract spatial mixing vectors with correlation coefficients close to 0.9 for SNRs as small as -20 dB. If we consider that, as a rule of thumb, a correlation coefficient ρ_h of at least 0.9 is required to obtain reasonable source localization results based on the estimated spatial mixing vectors, this means that P-SAUD and COM2 accurately separate the epileptic activity from the artifacts if the SNR is higher than or equal to -20 dB.

As shown in Figure 3.9, for SAUD, the index of epileptic signals increases with diminishing SNR and is very high for SNRs below -15 dB. This means that SAUD extracts the signals of interest rather late, which favors the accumulation of errors in the deflation algorithm. On the contrary, due to the selection of the extracted components by the penalization term, P-SAUD ensures that the epileptic activity is extracted first, as confirmed by low indices less than or equal to 5 for τ = [1, 2, 3, 4, 5] and less than or equal to 10 for τ = 1. This explains the reduced performance of SAUD in comparison to P-SAUD. Comparing the indices of the components extracted with P-SAUD for τ = 1 and τ = [1, 2, 3, 4, 5], one can observe that for medium and small patch distances, the indices are smaller if several penalties are used. This means that the use of several autocorrelation terms leads to a more robust extraction of the epileptic components compared to the case where only one penalty is used. However, this gain in robustness comes at an increased computational cost (cf. Figure 3.10). For FastICA and DelL, in this simulation, the average numbers of components that need to be extracted do not exceed 10 and 20, respectively. However, in practice, for these two methods the deflation algorithm cannot be stopped after the extraction of a reduced number of components because these methods extract the components in an arbitrary order and there is no guarantee that all epileptic components have been extracted after a given number of deflation steps.
3.1.6.3 Computational complexity
Figure 3.10: Performance as a function of computational complexity for the scenario SupFr & SupOcc, an SNR of -15 dB, and N = 32 sensors.

In Figure 3.10, we plot the performance achieved with the different tested algorithms as a function of the number of FLOPs computed according to Table 3.1 for a fixed SNR of -15 dB and the scenario with two distant patches. Except for CCA, which requires a fixed number of FLOPs, the computational complexity is varied by changing the number of iterations performed by the different algorithms. We assume that P-SAUD extracted the epileptic activity after M = P_e = 2 sweeps. It can be seen that the computational complexity of the SO methods CCA and SOBI (at the point of convergence) is smaller than that of conventional ICA methods such as FastICA and COM2 by a factor of approximately 100. However, FastICA, COM2, and SAUD extract the spatial mixing vectors with a higher accuracy than the SO methods. At an order of FLOPs that is comparable to that of the other ICA methods, DelL exhibits a very poor performance. Due to the deflation scheme, P-SAUD extracts the epileptic signals of interest at a reduced computational complexity compared to the other ICA methods while attaining the same accuracy as FastICA and COM2. While the gain in computational complexity of P-SAUD is significant if only one autocorrelation term is employed for penalization, the computational complexity of P-SAUD increases as the number of autocorrelation terms grows. Therefore, it is generally more efficient to consider only one autocorrelation penalty in the P-SAUD contrast function and to extract a slightly increased number of components if necessary.
3.1.7 Real data analysis
To demonstrate the good performance of P-SAUD on real data, we illustrate the results obtained for 32-channel EEG recordings of a patient suffering from temporal lobe epilepsy. The measurements were acquired with a sampling frequency of 256 Hz and we considered an interval of about 20 s of interictal epileptic spikes corrupted by muscle artifacts and noise (cf. Figure 3.11 (left), which shows a segment of the considered data). To remove the artifacts and noise, we applied both P-SAUD (with τ = 5) and COM2 to the data, extracting 32 independent signal components, shown in Figure 3.11 (right). Contrary to COM2, which extracts the signals in an arbitrary order, in the case of P-SAUD, the order of the extracted signals depends on their autocorrelation. The components that characterize the epileptic spikes are thus extracted first, whereas muscle activity, which has a low autocorrelation, is extracted last. The data were reconstructed using the first two P-SAUD components, which were selected by an EEG expert (see Figure 3.11 (center left)). A comparison with the original data shows that the muscle activity, which corrupted in particular the recordings of electrodes FC6 and T4, has been removed in the reconstructed data and the noise has been reduced. Comparable results have been obtained for several other data sets. As this example shows, to reconstruct the epileptic spike data using P-SAUD, it would have been sufficient to extract only the first few components, which leads to a reduction of the computational complexity compared to COM2.
3.1.8 Conclusions
In this section, we have presented a new deflation algorithm that efficiently extracts the epileptic activity from EEG data corrupted by noise and artifacts. The proposed method exploits ideas from the COM2 and DelL algorithms, combining the efficient, semi-algebraic optimization procedure of COM2 with the deflation method by orthogonal projection of DelL. As shown by simulations and demonstrated on a real data example, the use of a contrast function that is penalized by autocorrelation terms ensures that the P-SAUD algorithm extracts the epileptic activity in the first deflation steps. By uniting ideas of COM2, DelL, and CCA to extract only a small number of components of interest, the proposed P-SAUD algorithm therefore succeeds in denoising the EEG recordings of epileptic signals with the same performance as COM2, but at a considerably reduced computational cost.

Figure 3.11: Real EEG recordings: noisy data, data reconstructed using the first two components extracted by P-SAUD, and signal components extracted with COM2 and P-SAUD.
3.2 Separation of correlated sources based on tensor decomposition
Let us assume that artifacts have already been removed from the EEG recordings, for example by applying one of the methods described in Section 3.1, or that the perturbations of the data due to artifacts are insignificant. The measurements can thus be considered to contain only the signals of interest, which may yet originate from several source regions, and background activity of the brain. In this context, to simplify the problem of localizing several potentially correlated patches, it is desirable to apply a preprocessing technique to the EEG data that separates simultaneously active source regions into different components and reduces the noise.

In the past, a number of researchers have explored the application of deterministic tensor-based methods for EEG source separation. These techniques exploit multidimensional data (at least one dimension in addition to space and time) and assume a certain structure underlying the measurements. This structure is then exploited to identify a number of components that can be associated with the sources using tensor decomposition methods such as the Canonical Polyadic (CP) decomposition [58], described in Section 3.2.2, which imposes a multilinear structure on the data. A few references can also be found where less restrictive tensor models are employed, including the PARAFAC2 model [59, 60, 61, 62], the Shift-invariant CP decomposition [63, 64] and an extension of the latter [65]. To obtain multidimensional data, one can either collect an additional diversity directly from the measurements, for instance, by taking different realizations of a repetitive event (see [66, 67, 68]), or create a third dimension by applying a transform which preserves the two original dimensions, such as the Short Term Fourier Transform (STFT) or the wavelet transform. Several authors have studied the application of the CP decomposition to Space-Time-Frequency (STF)-transformed EEG data, which is obtained by computing a wavelet transform [69, 70, 71, 72, 73] or the Wigner-Ville distribution [74] over the time dimension of the measurements. Under certain conditions on the signals, this method provides separate space, time, and frequency characteristics for each source region and therefore allows localizing each patch individually in a second step. This method is detailed in Section 3.2.3.1.

In Section 3.2.3.2, we present an alternative method which is based on a local spatial Fourier transform of the EEG measurements. This leads to a Space-Time-Wave-Vector (STWV) tensor, which can also be decomposed using the CP model. An advantage of this approach compared to the STF analysis consists in its robustness to correlated source activities. This is of particular interest when patches with identical, but shortly delayed source activities have to be identified. This problem is, for instance, encountered in the context of interictal spikes when spreading of epileptic spikes is suspected between two regions. To understand the underlying mechanisms and conditions that are necessary for the STF and STWV techniques to work, in Section 3.2.4, we conduct a theoretical analysis of these approaches and derive sufficient conditions under which these methods yield exact results. To our knowledge, this has not been studied before. Finally, we analyze the source separation performance achieved by these methods on simulated data.
3.2.1 Problem formulation
In this section, we assume that the EEG data are generated by P (distributed) sources in the presence of background activity. The spatial distribution of each source is characterized by the spatial mixing vector h_p^{(e)} whereas the temporal activity is described by the signal vector s_p^{(e)}. This leads to the following data analysis model for the EEG recordings:
$$X = H^{(e)} S^{(e)} + X_b \qquad (3.29)$$
where the spatial mixing matrix H^{(e)} = [h_1^{(e)}, …, h_P^{(e)}] contains the spatial mixing vectors for all sources and the signal matrix S^{(e)} = [s_1^{(e)}, …, s_P^{(e)}]^T characterizes the associated temporal activities. The objective then consists in estimating the matrices H^{(e)} and S^{(e)} from the data X, which permits us to separate several simultaneously active patches. Please note that contrary to Section 3.1, where the measurements and the signals are regarded as stochastic random vector processes, in this section, they are treated as deterministic matrices.
The model (3.29) is a bilinear model in space and time. However, there is no unique solution for such a matrix decomposition unless one imposes additional constraints like orthogonality or statistical independence as incorporated in Principal Component Analysis (PCA) or ICA. Since such constraints may be physiologically difficult to justify, especially in the context of propagation phenomena, which lead to correlated source signals, another solution to the problem of non-uniqueness is desirable. This is what motivates the use of tensor decomposition methods. The idea of tensor-based approaches consists in exploiting the structure of multi-way data, which can, for example, be obtained by applying a transform to the two-dimensional measurements. Under the hypothesis that the resulting data, which depend on three variables, are multilinear, the tensor can be decomposed in a unique way (under mild conditions) up to scale and permutation ambiguities into separate characteristics for each variable with the help of the CP decomposition (also sometimes referred to as Parallel Factor Analysis (PARAFAC)). It is thus possible to get an accurate estimate of the spatial mixing matrix or the signal matrix. Furthermore, this procedure leads to a reduction of the background activity because the latter does not match the assumed multilinear structure. In the following, we describe the CP decomposition and methods for the construction of a data tensor in more detail.
3.2.2 CP decomposition
We start this section with an introduction of terminology, definitions, and notations:

• A tensor of order d corresponds to a d-dimensional data array for which fixed bases have been chosen for the representation. In the following, we concentrate on third order tensors, i.e., d = 3.

• A third order tensor $\mathcal{X} \in \mathbb{C}^{I_1 \times I_2 \times I_3}$ has rank 1 if each element of the tensor can be written as the product of three functions $a_k$, $b_\ell$, $d_m$, each of which depends on a distinct index:
$$X_{k,\ell,m} = a_k\, b_\ell\, d_m \qquad (3.30)$$
with $k = 1, \ldots, I_1$, $\ell = 1, \ldots, I_2$, $m = 1, \ldots, I_3$. Equivalently, a rank-1 tensor can be written in the following form, based on the outer product of the three vectors a, b, and d:
$$\mathcal{X} = a \circ b \circ d. \qquad (3.31)$$

• A mode-n vector of the tensor is obtained by varying the n-th index of the tensor elements from 1 to $I_n$, where $I_n$ corresponds to the number of elements in the n-th dimension, and by fixing all other indices. This leads to a column vector of size $I_n$.

• The mode-n unfolding matrix of the tensor $\mathcal{X}$, denoted by $[\mathcal{X}]_{(n)}$, contains all mode-n column vectors. Different definitions of the unfolding matrices can be found in the literature depending on the order in which the mode-n vectors are arranged into the matrix $[\mathcal{X}]_{(n)}$. Here, we consider an ordering where higher indices are varied faster than lower indices (see [75] for more details).

• The mode-n product of a tensor $\mathcal{X}$ of size $I_1 \times I_2 \times I_3$ with a matrix A of size $I_n \times J_n$ is denoted by $\mathcal{X} \bullet_n A$ and corresponds to the multiplication of all mode-n vectors by the matrix $A^{\mathrm{T}}$ such that $[\mathcal{X} \bullet_n A]_{(n)} = A^{\mathrm{T}}[\mathcal{X}]_{(n)}$. The size of the n-th dimension of the tensor thus changes from $I_n$ to $J_n$.

• The Khatri-Rao product of two matrices $A \in \mathbb{C}^{I \times P}$ and $B \in \mathbb{C}^{J \times P}$ is defined as $A \odot B = [a_1 \otimes b_1, \ldots, a_P \otimes b_P] \in \mathbb{C}^{IJ \times P}$ and corresponds to the column-wise Kronecker product.

• The Kruskal rank (or k-rank) of a matrix X is the maximal number σ such that every subset of σ columns of X is linearly independent.

• The mutual coherence of a matrix $X = [x_1, \ldots, x_D]$ is the maximal correlation coefficient between any two columns of X and is defined as $\mu(X) = \max_{i \neq j} \frac{|x_i^{\mathrm{H}} x_j|}{\|x_i\|_2\, \|x_j\|_2}$ with $i, j = 1, \ldots, D$.
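To make these definitions concrete, the following numpy sketch (a hypothetical illustration, not code from this thesis) builds a rank-1 tensor as in (3.31), verifies the mode-1 unfolding under the above ordering convention, and implements the Khatri-Rao product:

```python
import numpy as np

# Hypothetical illustration of the above definitions for the third order case.
I1, I2, I3 = 4, 6, 5
rng = np.random.default_rng(0)
a, b, d = rng.standard_normal(I1), rng.standard_normal(I2), rng.standard_normal(I3)

# Rank-1 tensor as the outer product a o b o d, cf. equation (3.31).
X = np.einsum('i,j,k->ijk', a, b, d)

# Mode-1 unfolding: with C-ordered reshaping, higher indices vary faster,
# which matches the ordering convention stated above.
X1 = X.reshape(I1, I2 * I3)
assert np.allclose(X1, np.outer(a, np.kron(b, d)))  # rank-1 structure of [X](1)

def khatri_rao(A, B):
    """Column-wise Kronecker product of A (I x P) and B (J x P)."""
    return np.einsum('ip,jp->ijp', A, B).reshape(A.shape[0] * B.shape[0], -1)
```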
Now, based on these definitions, we can introduce the CP tensor decomposition. Since we only consider methods based on third order tensors in this thesis, the presentation of the CP decomposition is limited to this case.

Exact CP decomposition
Let us consider an arbitrary third order tensor $\mathcal{X}$ of size $I_1 \times I_2 \times I_3$ and indices $k = 1, \ldots, I_1$, $\ell = 1, \ldots, I_2$, $m = 1, \ldots, I_3$. Each element of the tensor can be written in the form
$$X_{k,\ell,m} = \sum_{q=1}^{Q} a_k[q]\, b_\ell[q]\, d_m[q] \qquad (3.32)$$
which is generally called a polyadic decomposition of the tensor $\mathcal{X}$. If Q is the smallest integer for which equality (3.32) holds, Q corresponds to the rank of the tensor. In this case, the model (3.32) is referred to as the Canonical Polyadic (CP) decomposition of the tensor $\mathcal{X}$ [58, 75]. The CP model comprises a trilinear structure. This means that according to equation (3.32), each element $X_{k,\ell,m}$ of the tensor corresponds to the sum of Q components which can be factorized into the product of three functions $a_k[q]$, $b_\ell[q]$, and $d_m[q]$, each of which depends on only one variable. For this reason, the variables on which the tensor depends are said to be separable. The elements $a_k[q]$, $b_\ell[q]$, and $d_m[q]$ of the three functions can be stored in the matrices $A \in \mathbb{C}^{I_1 \times Q}$, $B \in \mathbb{C}^{I_2 \times Q}$, and $D \in \mathbb{C}^{I_3 \times Q}$, respectively, called loading matrices. A graphical representation of the CP decomposition can be found in Figure 3.12.

Approximate CP decomposition
As measurements are in practice always corrupted by noise, which generally leads to an increase of the tensor rank, we are interested in approximating the tensor $\mathcal{X}$ of rank Q by a tensor of given lower rank P, for which each of the P rank-1 terms corresponds to a meaningful component:
$$\mathcal{X} \approx \sum_{p=1}^{P} a_p \circ b_p \circ d_p. \qquad (3.33)$$
Model (3.33) is subsequently referred to as the approximate CP decomposition. To find the best rank-P approximation of the tensor, in practice, one generally solves an optimization problem of the form
$$\inf_{a_p, b_p, d_p}\ \left\| \mathcal{X} - \sum_{p=1}^{P} a_p \circ b_p \circ d_p \right\|_F. \qquad (3.34)$$

Figure 3.12: Graphical representation of the CP decomposition as a sum of rank-1 tensors, which are obtained by the outer product of three vectors. The resulting tensor is of size 4 × 6 × 5 and rank 3.
3.2.2.1 Number of components
In general, the number of sources, which corresponds to the number of components in the approximate CP decomposition, is not known a priori and needs to be estimated from the data. A popular approach for the estimation of an appropriate number of CP components consists in employing the Core Consistency Diagnostic (Corcondia) [76]. However, in our experience, this method tends to overestimate the number of sources in the context of EEG (distributed) source separation. In this thesis, we do not address this difficult problem, but assume that the number of (distributed) sources is known.

3.2.2.2 Essential uniqueness
In practice, it is not possible to recover the exact order of the CP components and the scaling of the loading matrices $A \in \mathbb{C}^{I_1 \times P}$, $B \in \mathbb{C}^{I_2 \times P}$, and $D \in \mathbb{C}^{I_3 \times P}$. Therefore, the CP tensor decomposition is said to be essentially unique if the loading matrices can be recovered up to multiplicative diagonal matrices $\Lambda_A$, $\Lambda_B$, $\Lambda_D$ such that $\Lambda_A \Lambda_B \Lambda_D = I_P$ and a permutation matrix $\Pi \in \mathbb{R}^{P \times P}$, i.e., if $\hat{A} = A\Pi\Lambda_A$, $\hat{B} = B\Pi\Lambda_B$, $\hat{D} = D\Pi\Lambda_D$. In the past, several conditions for essential uniqueness of the CP decomposition have been put forward. In this context, one has to distinguish between uniqueness conditions for the exact CP decomposition on the one hand, and existence and uniqueness conditions for the best low-rank approximation on the other hand. The former include the well-known Kruskal condition [77, 78], which states that the CP decomposition is essentially unique if
$$2P + 2 \leq \text{k-rank}(A) + \text{k-rank}(B) + \text{k-rank}(D) \qquad (3.35)$$
as well as almost-sure conditions based on a dimension count, such as (see, e.g., [79] and references therein)
$$P < \frac{I_1 I_2 I_3}{I_1 + I_2 + I_3 - 2} \qquad (3.36)$$
that are often used as a substitute for the Kruskal condition because they are easy to verify in practical situations. Note that these conditions are sufficient but not necessary.
In practice, we generally employ the approximate CP decomposition to cope with noise and inaccuracies in the data model. Therefore, we consider a recently established condition (see [79]) which is based on the coherences of the loading matrices. This condition ensures that a solution to (3.34) exists and that the approximate CP decomposition is essentially unique if
$$(\mu_A \mu_B \mu_D)^{1/3} \leq \frac{3}{2P + 2}. \qquad (3.37)$$
Here, $\mu_A$, $\mu_B$, and $\mu_D$ denote the coherences of the matrices A, B, and D, respectively. The sufficient condition (3.37) is more restrictive than the Kruskal condition, but it is easy to verify. Furthermore, the Kruskal condition has not been shown to apply to the approximate CP decomposition, even though it is often used to check uniqueness in this case. The maximal number of components which can be separated by the approximate CP decomposition is actually larger than the bound defined by (3.37), but in most applications this bound is not restrictive. In fact, in the context of EEG source separation, the rank P of the noiseless tensor corresponds to the number of distributed sources, which is usually small (less than 10) compared to the tensor dimensions. The limitations of the tensor decomposition approach thus arise from the approximations that are made when imposing a certain structure on the data (cf. Section 3.2.4) and not from the identifiability conditions.
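The coherence condition (3.37) is easy to verify numerically. A minimal numpy sketch (hypothetical, not code from this thesis) that checks it for given loading matrices:

```python
import numpy as np

def coherence(A):
    """Mutual coherence: maximal correlation coefficient between two columns."""
    An = A / np.linalg.norm(A, axis=0)        # normalize columns
    C = np.abs(An.conj().T @ An)              # all pairwise correlations
    np.fill_diagonal(C, 0.0)                  # exclude the trivial i == j terms
    return C.max()

def condition_337_holds(A, B, D):
    """Check the sufficient uniqueness condition (3.37) for a rank-P model."""
    P = A.shape[1]
    return (coherence(A) * coherence(B) * coherence(D)) ** (1 / 3) <= 3 / (2 * P + 2)
```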
3.2.2.3 Algorithms
The objective of the approximate CP decomposition consists in finding a solution to the optimization problem (3.34). To this end, a wide range of algorithms has been used, including alternating methods like Alternating Least Squares (ALS) [76], derivative-based techniques such as Gradient Descent (GD), conjugate gradient, and Levenberg-Marquardt (LM) [58], or the efficient algorithms presented in [80, 81], and direct techniques (see, e.g., [82, 83, 44] and references therein). Subsequently, we briefly review the basic principles of ALS and of the DIAG (Direct Algorithm for canonical polyadic decomposition) algorithm [84, 44].

ALS
The idea of the ALS algorithm consists in alternatingly updating the three loading matrices A, B, and D. This is accomplished by resorting to the unfolding matrices of the tensor, which can be used to derive the following update rules for the loading matrices:
$$A = [\mathcal{X}]_{(1)}\left((D \odot B)^{\mathrm{T}}\right)^{+} \qquad (3.38)$$
$$B = [\mathcal{X}]_{(2)}\left((D \odot A)^{\mathrm{T}}\right)^{+} \qquad (3.39)$$
$$D = [\mathcal{X}]_{(3)}\left((B \odot A)^{\mathrm{T}}\right)^{+} \qquad (3.40)$$
Based on initial estimates, the loading matrices are then determined by iterating over equations (3.38) to (3.40) until convergence or a maximal number of iterations is reached. Due to its simplicity, the ALS algorithm is very popular. It also allows for an easy incorporation of constraints such as real-valued loading matrices. Note though that this method may converge very slowly and that there is no guarantee that it will converge toward the best low-rank approximation at all.
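For illustration, a minimal numpy sketch of the ALS iteration (3.38) to (3.40) could look as follows; note that the order of the factors in the Khatri-Rao products depends on the unfolding convention (here, higher indices vary faster), and the stopping rule is simplified to a fixed number of iterations:

```python
import numpy as np

def unfold(X, mode):
    """Mode-n unfolding; with C ordering, higher indices vary faster."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def khatri_rao(A, B):
    """Column-wise Kronecker product."""
    return np.einsum('ip,jp->ijp', A, B).reshape(A.shape[0] * B.shape[0], -1)

def cp_als(X, P, n_iter=200):
    """Minimal ALS for a rank-P approximate CP decomposition (a sketch)."""
    rng = np.random.default_rng(0)
    A = rng.standard_normal((X.shape[0], P))
    B = rng.standard_normal((X.shape[1], P))
    D = rng.standard_normal((X.shape[2], P))
    for _ in range(n_iter):  # alternating least squares updates
        A = unfold(X, 0) @ np.linalg.pinv(khatri_rao(B, D).T)
        B = unfold(X, 1) @ np.linalg.pinv(khatri_rao(A, D).T)
        D = unfold(X, 2) @ np.linalg.pinv(khatri_rao(A, B).T)
    return A, B, D
```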
DIAG
The DIAG algorithm exploits the fact that the mode-n loading matrix of the tensor can be obtained by multiplying the left signal subspace of the mode-n unfolding matrix of the tensor by a transform matrix. In the following, we assume, without loss of generality, that n = 1. In this case, $A = U_1^{[s]} T_1$ where $U_1^{[s]}$ is the left signal subspace of the mode-1 unfolding matrix $[\mathcal{X}]_{(1)}$ and $T_1$ is a transform matrix. The determination of the loading matrix A can therefore be reduced to finding the transform matrix $T_1$. As has been shown in [84, 85], this can be achieved by searching for the matrix $T_1$ that jointly diagonalizes by equivalence a set of $I_3(I_3 - 1)$ matrices $\Psi^{(k,l)} = \Gamma_k \Gamma_l^{+}$, where $\Gamma_k \in \mathbb{R}^{P \times I_2}$ is obtained from $Y = [\Gamma_1, \ldots, \Gamma_{I_3}] = (U_1^{[s]})^{+}[\mathcal{X}]_{(1)}$. In our implementation, we choose only one value for l, corresponding to the index of the best conditioned matrix $\Gamma_l$, in order to limit the computational complexity. The joint diagonalization of the matrices $\Psi^{(k,l)} = T_1 \Lambda^{(k,l)} T_1^{-1}$ is then performed using the JET (Joint Eigenvalue decomposition based on Triangular matrices) algorithm presented in [84, 85]. Here, the diagonal matrix $\Lambda^{(k,l)} = \mathrm{diag}\{d^{(k)}\}(\mathrm{diag}\{d^{(l)}\})^{-1}$, with $d^{(k)}$ the k-th row of the loading matrix D, allows recovering the elements of D. The loading matrix B can then be computed in a least squares sense based on the mode-2 unfolding $[\mathcal{X}]_{(2)}$ of the tensor and the two already determined loading matrices A and D: $B = [\mathcal{X}]_{(2)}\left[(D \odot A)^{\mathrm{T}}\right]^{+}$. By permuting the dimensions of the tensor, six different estimates for the three loading matrices can be obtained. Because of its good performance and robustness to collinear factors, overestimation of the number of CP components, and initialization, we employ the DIAG algorithm for the tensor decompositions that are computed in this thesis.
3.2.3 Transform-based tensor methods
In this thesis, we focus on approaches that construct a 3-dimensional data tensor, which can be treated using the CP decomposition, by applying a transform to the two-dimensional EEG recordings. To this end, one can either compute a transform over time of the electric potential measurements, which leads to the STF analysis, or a transform over space, yielding STWV data. These methods are described in the subsequent sections in the context of EEG data. An extension of these tensor methods to combined EEG/MEG data is presented in Appendix C.

3.2.3.1 Space-Time-Frequency (STF) analysis
An often used technique for the time-frequency analysis of EEG data consists in applying a wavelet transform to the time signals {x(r, t)} of the different channels [69, 70, 71, 72, 73]. The resulting three-way data can then be stored into the data tensor
$$W(r, t, f) = \int_{-\infty}^{\infty} x(r, \tau)\, \psi(a, \tau, t)\, \mathrm{d}\tau. \qquad (3.41)$$
The frequency f can be estimated from the scale a of the wavelet ψ(a, τ, t) by f = fc /(aTs ) where fc is the center frequency of the wavelet and Ts is the interval between time samples. In order to decompose the tensor W using the CP decomposition, we assume that for each extended source, the time and frequency variables separate, leading to a trilinear
tensor. This is approximately the case under the hypothesis of oscillatory signals. The tensor can then be decomposed as
$$W[r_k, t_\ell, f_m] \approx \sum_{p=1}^{P} a_W[r_k; p]\, b_W[t_\ell; p]\, d_W[f_m; p] \qquad (3.42)$$
where $r_k$, $t_\ell$, and $f_m$ represent the sampled space, time, and frequency variables and $a_W[r_k; p]$, $b_W[t_\ell; p]$, and $d_W[f_m; p]$ denote elements of the loading matrices $A_W$, $B_W$, and $D_W$ indicating the space, time, and frequency characteristics, respectively. The number of components P corresponds to the number of extended sources. The loading matrix $A_W$ containing the spatial characteristics generally constitutes a good estimate of the spatial mixing matrix $H^{(e)}$. By contrast, an exact separation of the wavelet-transformed data into time and frequency characteristics can only be obtained if the frequency content of the signal is constant over time. In practice, this is not the case, and the bilinear approximation of the time-frequency data limits the accuracy of the time signals estimated by the temporal characteristics. This is why we use the pseudoinverse of the estimated spatial mixing matrix $\hat{H}^{(e)}$ to obtain an improved estimate of the signal matrix $\hat{S}^{(e)}$:
$$\hat{S}^{(e)} = \hat{H}^{(e)+} X. \qquad (3.43)$$
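As an illustration of the tensor construction in (3.41), the following sketch computes a Morlet wavelet transform over the time dimension of each channel; the wavelet parameter w and the use of scipy.signal.morlet2 with scipy.signal.cwt (deprecated in recent SciPy releases) are assumptions made for this example:

```python
import numpy as np
from scipy import signal

def stf_tensor(X, fs, freqs, w=5.0):
    """Hypothetical sketch of the STF tensor W(r, t, f) of equation (3.41).

    X (N x T) EEG data, fs sampling rate in Hz, freqs frequencies of interest.
    """
    fc = w / (2 * np.pi)                    # center frequency of morlet2
    scales = fc * fs / np.asarray(freqs)    # f = fc / (a Ts)  =>  a = fc fs / f
    N, T = X.shape
    W = np.empty((N, T, len(freqs)), dtype=complex)
    for n in range(N):                      # one wavelet transform per channel
        W[n] = signal.cwt(X[n], signal.morlet2, scales, w=w).T
    return W
```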
3.2.3.2 Space-Time-Wave-Vector (STWV) analysis
If a local spatial Fourier transform is calculated within a certain region on the scalp, selected by the spherical window function $w(r' - r)$ centered at sensor position r (see Appendix D for more details), the STWV tensor
$$F(r, t, k) = \int_{-\infty}^{\infty} w(r' - r)\, x(r', t)\, e^{j k^{\mathrm{T}} r'}\, \mathrm{d}r' \qquad (3.44)$$
is obtained. Here, the third variable k is the wave vector. Under the assumption that the space and wave vector variables separate for each extended source, which is approximately the case for superficial sources, the tensor F can be approximated by the CP model and be decomposed into space, time, and wave vector characteristics $a_F[r_k; p]$, $b_F[t_\ell; p]$, and $d_F[k_m; p]$:
$$F[r_k, t_\ell, k_m] \approx \sum_{p=1}^{P} a_F[r_k; p]\, b_F[t_\ell; p]\, d_F[k_m; p]. \qquad (3.45)$$
In the case of the STWV analysis, the temporal characteristics $\hat{S}^{(e)} = B_F^{\mathrm{T}}$ constitute a good approximation of the signal matrix $S^{(e)}$. An estimate $\hat{H}^{(e)}$ of the lead field matrix $H^{(e)}$ can thus be obtained from the pseudo-inverse $\hat{S}^{(e)+}$ of the estimated signal matrix $\hat{S}^{(e)}$ and the data matrix X:
$$\hat{H}^{(e)} = X \hat{S}^{(e)+}. \qquad (3.46)$$
This permits obtaining better results than employing the space characteristics identified by the CP decomposition of F, because the local spatial Fourier transform does not lead to a bilinear model with clearly separated space and wave vector characteristics, leading to perturbations in the loading matrix $A_F$ compared to the spatial mixing matrix $H^{(e)}$.
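A discrete counterpart of (3.44) can be sketched as follows, approximating the integral by a sum over the sensors that fall inside the window; the hard spherical window and its radius are assumptions made for this illustration (Appendix D specifies the actual window):

```python
import numpy as np

def stwv_tensor(X, pos, kvecs, radius=0.05):
    """Hypothetical sketch of the STWV tensor F(r, t, k) of equation (3.44).

    X      : (N, T) EEG data (channels x time)
    pos    : (N, 3) sensor positions in meters
    kvecs  : (K, 3) wave vectors at which the transform is evaluated
    radius : radius of the spherical window around each sensor (assumed value)
    """
    N, T = X.shape
    K = kvecs.shape[0]
    F = np.empty((N, T, K), dtype=complex)
    phase = np.exp(1j * pos @ kvecs.T)               # (N, K), e^{j k^T r'}
    for n in range(N):
        # Spherical window: only sensors within `radius` of sensor n contribute.
        w = (np.linalg.norm(pos - pos[n], axis=1) <= radius).astype(float)
        F[n] = X.T @ (w[:, None] * phase)            # (T, K) slice of the tensor
    return F
```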
Figure 3.13: Estimation of the spatial mixing matrix and the signal matrix based on the results of the STF and STWV tensor decompositions.

The procedure for estimating the spatial mixing matrix and the signal matrix from the results of the CP decomposition of the STF and STWV tensors is illustrated in Figure 3.13.
3.2.4 Analysis of the trilinear approximation
Even though the STF analysis has been widely used, up to now only intuitive conditions, such as oscillatory signals that presumably lead to trilinear data, have been provided. But no theoretical validation that justifies the application of the CP decomposition to the STF data tensor has been performed, and the mechanisms underlying the STF method are still insufficiently explored. The same is true for the STWV technique. Therefore, in this section, we analyze what happens when applying the DIAG algorithm to STF or STWV data that are not exactly trilinear, and clarify under which conditions this procedure yields exact results. In order to treat both tensor methods simultaneously, we use a different notation for the matrices than in the rest of this thesis to avoid confusion that may be caused by discrepancies from the data model of equation (4.3). Please note that in the following, for the STF method, the matrix Z replaces the data matrix X, the matrix U corresponds to the spatial mixing matrix $H^{(e)}$, which we want to extract, and the matrix $\mathbf{M}$ corresponds to the signal matrix $S^{(e)}$. For the STWV method, Z replaces the transpose of the data matrix, $X^{\mathrm{T}}$, U corresponds to the transpose of the signal matrix, $S^{(e)\mathrm{T}}$, which is to be identified, and $\mathbf{M}$ corresponds to the transpose of the spatial mixing matrix, $H^{(e)\mathrm{T}}$. If a time-frequency or space-wave-vector transform is applied to the second dimension of the matrix $Z = U\mathbf{M}$, where $U = [u_1, \ldots, u_P] \in \mathbb{R}^{N \times P}$ is the matrix of interest and $\mathbf{M} \in \mathbb{R}^{P \times M}$, one obtains a tensor with the following structure:
$$\mathcal{T} = \sum_{p=1}^{P} u_p \circ M_p \qquad (3.47)$$
where $M_p \in \mathbb{C}^{M \times J}$, $p = 1, \ldots, P$, are matrices of rank $L_p$. The objective of the STF or STWV analysis consists in recovering the vectors $u_p$ from the tensor $\mathcal{T}$ using the CP decomposition.
3.2.4.1 Sufficient conditions for perfect recovery
In practice, the matrices $M_p$ generally have full rank and the approximate rank-P CP decomposition of the tensor $\mathcal{T}$ does not lead to the correct identification of the vectors $u_p$ in general. However, for P = 2, denoting by
$$M_1 = \sum_{l=1}^{L_1} \sigma_l v_l w_l^{\mathrm{T}} = [v_1\ V_2] \begin{bmatrix} \sigma_1 & 0 \\ 0 & \Sigma_2 \end{bmatrix} [w_1\ W_2]^{\mathrm{T}} \qquad (3.48)$$
$$M_2 = \sum_{l=1}^{L_2} \lambda_l x_l y_l^{\mathrm{T}} = [x_1\ X_2] \begin{bmatrix} \lambda_1 & 0 \\ 0 & \Lambda_2 \end{bmatrix} [y_1\ Y_2]^{\mathrm{T}} \qquad (3.49)$$
the SVDs of $M_1$ and $M_2$ with $\sigma_1 > \sigma_2 > \ldots > \sigma_{L_1}$, $\lambda_1 > \lambda_2 > \ldots > \lambda_{L_2}$ and assuming that $\|u_1\|_2 = \|u_2\|_2 = 1$, the vectors $u_1$ and $u_2$ can be perfectly recovered using the DIAG algorithm (see Section 3.2.2.3, [84, 44]) based on the mode-2 unfolding of $\mathcal{T}$ if one of the following conditions holds:

C1) $v_1^{\mathrm{T}} X_2 = 0^{\mathrm{T}}$, $x_1^{\mathrm{T}} V_2 = 0^{\mathrm{T}}$, $w_1^{\mathrm{T}} Y_2 = 0^{\mathrm{T}}$, $y_1^{\mathrm{T}} W_2 = 0^{\mathrm{T}}$, and $\mu_2 > \epsilon_1$, or

C2) $v_1^{\mathrm{T}} X_2 = 0^{\mathrm{T}}$, $x_1^{\mathrm{T}} V_2 = 0^{\mathrm{T}}$, $u_1^{\mathrm{T}} u_2 = 0$, and $\mu_2 > \epsilon_1$.

Furthermore, perfect recovery of $u_1$ and $u_2$ using the DIAG algorithm based on the mode-3 unfolding of $\mathcal{T}$ is possible under the conditions

C3) $v_1^{\mathrm{T}} X_2 = 0^{\mathrm{T}}$, $x_1^{\mathrm{T}} V_2 = 0^{\mathrm{T}}$, $w_1^{\mathrm{T}} Y_2 = 0^{\mathrm{T}}$, $y_1^{\mathrm{T}} W_2 = 0^{\mathrm{T}}$, and $\nu_2 > \varphi_1$, or

C4) $w_1^{\mathrm{T}} Y_2 = 0^{\mathrm{T}}$, $y_1^{\mathrm{T}} W_2 = 0^{\mathrm{T}}$, $u_1^{\mathrm{T}} u_2 = 0$, and $\nu_2 > \varphi_1$.

For the derivation of these conditions as well as the definition of the singular values $\mu_2$, $\nu_2$, $\epsilon_1$, and $\varphi_1$, the reader is referred to Appendix E.

3.2.4.2 Discrepancies from the above conditions
If the conditions on orthogonality are not fulfilled, which is usually the case in practice, the vectors $u_1$ and $u_2$ cannot be correctly recovered, leading to errors in the estimated vectors $\hat{u}_1$ and $\hat{u}_2$. For small correlation coefficients between $v_1$ and $X_2$, $w_1$ and $Y_2$, $x_1$ and $V_2$, and $y_1$ and $W_2$, or correlation of $v_1$, $w_1$, $x_1$, and $y_1$ with vectors that are associated with very small singular values, the errors on the estimated vectors $\hat{u}_1$ and $\hat{u}_2$ can be regarded as negligible. In this case, the STF and STWV methods yield good results for the space or time characteristics of each patch. On the other hand, for large correlation coefficients between the singular vectors of $M_1$ and $M_2$, and especially in the case where the condition on the singular values ($\mu_2 > \epsilon_1$ or $\nu_2 > \varphi_1$) is not fulfilled (which occurs, for example, if the singular values of $M_1$ and $M_2$ do not decrease quickly or if one source is much stronger than the other source), the result of the CP decomposition can be seriously perturbed (up to containing only information about one of the sources) and does not permit obtaining an adequate estimate of the vectors $u_1$ and $u_2$. In this case, the STF or STWV analysis fails.
3.2.4.3 Interpretation of the mathematical conditions with respect to the STF and STWV analyses
In the following, we consider the three types of conditions that are involved in C1) to C4) and point out how they intervene in the STF and STWV analyses of EEG data.

$\mu_2 > \epsilon_1$, $\nu_2 > \varphi_1$: The validity of this condition depends on the one hand on the singular value profiles of the time-frequency or space-wave-vector matrices of the patches (matrices $M_1$ and $M_2$) and on the other hand on the source strengths. For slowly decreasing singular values, it requires the source strengths to be approximately equal, whereas quickly decreasing singular value profiles enable the STF and STWV techniques to tolerate a certain difference in source strength, which may be due to different patch sizes, different patch locations, or different signal amplitudes. This is the case for the STF analysis for oscillatory signals, where one can assume that there is one dominant frequency characteristic for each source, yielding time-frequency matrices $M_p$ with only one large singular value. In a similar way, superficial patches generate focused spatial distributions that can be described by one dominant spatial component per patch, leading to a quickly decreasing singular value profile of the space-wave-vector matrix.

$u_1^{\mathrm{T}} u_2 = 0$: In the case of the STF analysis, the vectors $u_1$ and $u_2$ correspond to the spatial mixing vectors of the patches. This condition thus requires the spatial mixing vectors to be uncorrelated. The correlation of the spatial mixing vectors is related to the patch distance and is generally small for distant patches and high for close patches. For the STWV method, the source time signals are required to be uncorrelated, as the vectors $u_1$ and $u_2$ characterize the time courses of the patch amplitudes. In practice, small correlation coefficients are usually sufficient to obtain reasonably good results (cf. Section 3.2.4.2).

$v_1^{\mathrm{T}} X_2 = 0^{\mathrm{T}}$, $x_1^{\mathrm{T}} V_2 = 0^{\mathrm{T}}$, $w_1^{\mathrm{T}} Y_2 = 0^{\mathrm{T}}$, $y_1^{\mathrm{T}} W_2 = 0^{\mathrm{T}}$: These orthogonality conditions concern correlations of the time-frequency or space-wave-vector profiles of the two patches and are difficult to interpret in practice. For the STF analysis, they are approximately fulfilled for sufficiently different time and frequency characteristics of two sources (for example, sources with uncorrelated time signals involving different frequency bands), whereas for the STWV analysis, this is achieved for sufficiently distant patches giving rise to different dominant spatial components. The influence of each of these correlation coefficients also depends on the associated singular values. Quickly decreasing singular value profiles of the time-frequency or space-wave-vector matrix considerably reduce the importance of a large number of correlation coefficients.

In Section 4.4.2.4 of this thesis, the findings of the theoretical analysis are discussed with respect to the simulation results obtained for two scenarios with the STWV method.
3.2.5 Analysis of the computational complexity
In this section, we compare the STF and STWV methods relative to their computational costs. To this end, we subsequently determine the number of FLOPs in terms of real-valued multiplications that are required for the tensor construction and decomposition
steps. Our analysis builds on the computational complexities of several basic operations, which are shown in Table 3.2. The computational complexities of the STF and STWV techniques are summarized in Table 3.3.

Table 3.2: Number of FLOPs of basic operations.

  Matrix multiplication X = AB, A ∈ R^{N×M}, B ∈ R^{M×L}:  NML
  X = AA^T, A ∈ R^{N×M}:                                   O(N^2 M / 2)
  EVD of X ∈ R^{N×N}:                                      O(4 N^3 / 3)
  Economy-size SVD of X ∈ R^{N×M}, rank(X) = R, N ...

The advantage of approaches exploiting statistics of order 2q with q > 1 over other source imaging algorithms lies in their asymptotic robustness to Gaussian noise, because cumulants of order higher than 2 of a Gaussian random variable are null (see also Appendix A.2).

2q-ExSo-MUSIC
The 2q-ExSo-MUSIC algorithm exploits the 2q-th order cumulants of the data arranged in a matrix $C_{2q,x} \in \mathbb{R}^{N^q \times N^q}$. The latter is generally estimated using the Leonov-Shiryaev formula and sample statistics (see Appendix A.1 and [154]). The 2q-ExSo-MUSIC algorithm is based on data model (4.3) and assumes that the measurements are generated by a small number of distributed sources, each characterized by a number of adjacent grid dipoles with identical amplitudes: $h = \tilde{G}\psi$ with $\psi_d \in \{0, 1\}$. The distributed source lead field vector h is thus parameterized by the coefficient vector ψ. The restriction of coefficients to 0 or 1 imposes a piece-wise constant source distribution (corresponding to hypothesis S3) similar to VB-SCCD. The higher order cumulant matrix is then given by
$$C_{2q,x} = H^{\otimes q}\, C_{2q,s}\, \left(H^{\otimes q}\right)^{\mathrm{T}} \qquad (4.24)$$
where $C_{2q,s} \in \mathbb{R}^{P^q \times P^q}$ is the 2q-th order cumulant matrix of the distributed sources. The vectors $h_p^{\otimes q}$ are thus contained in the signal subspace of $C_{2q,x}$. In analogy to the classical MUSIC algorithm, the 2q-ExSo-MUSIC spectrum
$$F_{\mathrm{MUSIC}}(\psi) = \frac{(h^{\otimes q})^{\mathrm{T}}\, U_s U_s^{\mathrm{T}}\, h^{\otimes q}}{(h^{\otimes q})^{\mathrm{T}}\, h^{\otimes q}} = \frac{(\psi^{\otimes q})^{\mathrm{T}} (\tilde{G}^{\otimes q})^{\mathrm{T}}\, U_s U_s^{\mathrm{T}}\, \tilde{G}^{\otimes q}\, \psi^{\otimes q}}{(\psi^{\otimes q})^{\mathrm{T}} (\tilde{G}^{\otimes q})^{\mathrm{T}}\, \tilde{G}^{\otimes q}\, \psi^{\otimes q}} \qquad (4.25)$$
is then computed for a number of pre-defined parameter vectors ψ. To this end, a dictionary of potential elementary distributed sources is defined by a number of circular-shaped cortical areas of different centers and sizes, subsequently called disks. Each disk is composed of a number of adjacent grid dipoles and characterized by a coefficient vector ψ with $\psi_d = 1$ for all dipoles belonging to the disk and 0 otherwise. For the true spatial mixing vectors $h_p^{\otimes q}$, p = 1, ..., P, which are contained in the signal subspace of $C_{2q,x}$, the spectrum is equal to 1. In practice, the 2q-ExSo-MUSIC spectrum does not exactly reach 1 because of inaccurate modeling of the distributed source lead field vectors. The spectrum is hence thresholded and all coefficient vectors ψ for which the spectrum exceeds a fixed threshold are retained and united to model distributed sources.
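For concreteness, the spectrum (4.25) can be evaluated for a single dictionary element as in the following sketch, where Us is assumed to hold orthonormal basis vectors of the signal subspace and q = 2 corresponds to 4-ExSo-MUSIC:

```python
import numpy as np

def exso_music_spectrum(psi, G, Us, q=2):
    """Sketch of the 2q-ExSo-MUSIC spectrum (4.25) for one coefficient
    vector psi (D,), lead field G (N x D), and signal subspace basis Us."""
    h = G @ psi                       # distributed-source lead field vector
    hq = h.copy()
    for _ in range(q - 1):            # q-fold Kronecker power of h
        hq = np.kron(hq, h)
    proj = Us @ (Us.T @ hq)           # projection onto the signal subspace
    return (hq @ proj) / (hq @ hq)    # equals 1 for a perfectly modeled source
```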
4.4 Tensor-based source localization
While previous studies of tensor-based methods for EEG analysis [69, 70, 71, 72, 73] have concentrated on source separation and equivalent current dipole estimation, we here extend these techniques to the localization of distributed sources. This leads to a new family of source imaging algorithms: the tensor-based approaches. These methods proceed in two steps:
1. the separation of different distributed sources using tensor decomposition, and

2. the identification of grid dipoles characterizing each distributed source.

The first step has already been addressed in Section 3.2 where we have described two tensor-based methods for the separation of EEG sources using the CP decomposition. Depending on the dimensions of the employed tensor, the CP decomposition involves different multilinearity assumptions: for Space-Time-Realization (STR) data, hypothesis T3) is required, for STF data, hypothesis T4) is involved, and for STWV data, we resort to hypothesis S5). This gives rise to different sub-classes of tensor-based source localization methods. In this section, we focus on the second step, the distributed source localization, which is based on an estimate $\hat{H} = [\hat{h}_1, \ldots, \hat{h}_P]$ of the spatial mixing matrix H, allowing for a separate identification of the grid dipoles for each patch. In principle, any localization algorithm that acts on a vector of spatial measurements can be employed to this end. Here, we introduce the Disk Algorithm (DA) that uses an optimization strategy inspired by the 2q-ExSo-MUSIC approach (see Section 4.3.3.2, [15]) but with a different metric built from the spatial mixing matrix estimated by the STF or STWV analysis. To this end, we employ the extended source data model (see Section 4.2.3) and assume a piece-wise constant spatial source distribution, associated with hypotheses ST) and S5). After presenting the disk algorithm in Section 4.4.1, we analyze the performance of the STWV-based and STF-based disk algorithms, referred to as STWV-DA and STF-DA in the following, based on computer simulations in Section 4.4.2. Finally, in Section 4.4.3, these methods are validated on real EEG recordings of an epileptic patient. Our findings are summarized and discussed in Section 4.4.4.
4.4.1 Disk algorithm (DA)
Similar to 2q-ExSo-MUSIC (cf. Section 4.3.3.2), the concept underlying the disk algorithm consists in recovering the extended source from a dictionary of potential distributed source regions corresponding to small, circular-shaped patches of grid dipoles, the disks. For each grid dipole, several disks composed of the 0 to $D_{\max} - 1$ nearest dipoles and the current grid dipole as central point are determined. The disks are characterized by coefficient vectors $\psi_k$, $k = 1, \ldots, D_{\max} D$, with $\psi_{k,d} \in \{0, 1\}$ as described in Section 4.3.3.2. According to equation (4.4), the reconstructed spatial mixing vector of the k-th disk can then be obtained as $h_k = \tilde{G}\psi_k$.

To determine which disks of the parameter space best describe the spatial mixing vector $\hat{h}_p$ of the p-th source, which has previously been estimated using the STWV or STF method, the vector $\hat{h}_p$ is compared to the spatial mixing vectors $h_k$ of all disks using the following metric, which is based on the normalized inner product:
$$F_{\mathrm{iprod}}(\hat{h}_p, \psi) = \frac{\left(\hat{h}_p^{\mathrm{T}} \tilde{G}\psi\right)^2}{\psi^{\mathrm{T}} \tilde{G}^{\mathrm{T}} \tilde{G}\psi}. \qquad (4.26)$$
All grid dipoles belonging to disks for which (4.26) exceeds a certain threshold are then merged to form the p-th distributed source. On the other hand, one could also think of identifying the disks by minimizing the difference between estimated and reconstructed spatial mixing vectors, which is done by
minimizing the function
$$F_{\mathrm{diff}}(\hat{h}_p, \psi) = \left\|\hat{h}_p - m(\hat{h}_p, \psi) \cdot \tilde{G}\psi\right\|_2^2. \qquad (4.27)$$
Here,
$$m(\hat{h}_p, \psi) = \frac{\hat{h}_p^{\mathrm{T}} \tilde{G}\psi}{\psi^{\mathrm{T}} \tilde{G}^{\mathrm{T}} \tilde{G}\psi} \qquad (4.28)$$
is a normalization factor which is introduced due to the scaling ambiguity of the estimated spatial mixing vector inherent to the CP decomposition. The normalization factor is computed in such a way that it minimizes the metric for given estimated and reconstructed spatial mixing vectors. Interestingly, inserting (4.28) into (4.27), the difference metric can be reduced to
$$F_{\mathrm{diff}}(\hat{h}_p, \psi) = \hat{h}_p^{\mathrm{T}}\hat{h}_p - \frac{\left(\hat{h}_p^{\mathrm{T}} \tilde{G}\psi\right)^2}{\psi^{\mathrm{T}} \tilde{G}^{\mathrm{T}} \tilde{G}\psi} \qquad (4.29)$$
$$= \hat{h}_p^{\mathrm{T}}\hat{h}_p - F_{\mathrm{iprod}}(\hat{h}_p, \psi), \qquad (4.30)$$
and minimizing (4.30) is equivalent to maximizing the inner product metric (4.26). Therefore, we only consider the inner product metric in the following. The steps of the tensor-based disk algorithm are schematically illustrated in Figure 4.2.
Figure 4.2: Schematic illustration of the tensor-based disk algorithm.
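A minimal sketch of this procedure, with a hypothetical relative threshold as selection rule, could read:

```python
import numpy as np

def disk_algorithm(h_hat, G, disks, rel_threshold=0.95):
    """Hypothetical sketch of the disk algorithm for one estimated spatial
    mixing vector h_hat (N,), lead field matrix G (N x D), and a dictionary
    `disks` of index arrays of grid dipoles; the relative threshold on the
    metric (4.26) is an assumed choice, not a value from this thesis."""
    scores = np.empty(len(disks))
    for i, disk in enumerate(disks):
        psi = np.zeros(G.shape[1])
        psi[disk] = 1.0                          # piece-wise constant disk source
        g = G @ psi                              # reconstructed mixing vector
        scores[i] = (h_hat @ g) ** 2 / (g @ g)   # inner product metric (4.26)
    # Merge the grid dipoles of all disks whose metric exceeds the threshold.
    keep = np.flatnonzero(scores >= rel_threshold * scores.max())
    return np.unique(np.concatenate([disks[i] for i in keep])), scores
```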
4.4.2 Computer results
In this section, the performance of the STF-DA and STWV-DA methods is analyzed by means of realistic computer simulations. To this end, we employ a simulation setup that is inspired by [155].

4.4.2.1 Simulation setup
Data generation
EEG data are generated as described in Section 2.5 for N = 91 electrodes and a source space that is composed of D = 19626 dipoles with fixed orientations
located on the cortical surface. For the generation of extended sources, we consider a number of patches each of which consists of 100 adjacent grid dipoles corresponding to a cortical area of approximately 5 cm² (see Figure 2.6). Using the neuronal population model described in Section 2.5.2, highly-correlated epileptiform spike-like signals comprising T = 200 time samples with a sampling rate of 256 Hz are created for all dipoles of one patch (see Figure 2.7 for an example). We analyze several scenarios composed of two patches, for which the time courses of the dipoles in the first patch are delayed by several time samples according to the distance between the two patches and attributed to the dipoles in the second patch. This corresponds to the case where the epileptic activity in the first patch spreads to a second patch. For small distances, a random delay of 1 or 2 time samples (4-8 ms) is used for each signal. For medium distances, the signals are shifted by 3 or 4 time samples (12-16 ms), and for large distances, a signal delay of 5 or 6 samples (20-24 ms) is employed. Finally, using the same model of neuronal populations, we generate normalized physiological background activity and add it to the simulated measurement data. The normalization is carried out such that the amplitude of the background activity for dipoles outside the patch corresponds to the amplitude of background activity between spikes in the patch.

Tested source localization methods
The patches are localized using STF-DA and STWV-DA. Furthermore, we employ sLORETA, cLORETA, and 4-ExSo-MUSIC for distributed source localization to compare the results of the tensor-based methods to other approaches. All algorithms are tested both on the raw EEG data and on spatially prewhitened data (see Section 4.2.4) in order to evaluate the impact of prewhitening on the source localization results. To estimate the noise covariance matrix, we use 25000 time samples of data generated for the case where all dipoles emit background activity. For STWV-DA, we construct the tensor from the raw data, as prewhitening would change the space-wave-vector characteristics, but we estimate the spatial mixing matrix H from the prewhitened data matrix and the temporal characteristics identified by the CP decomposition. To improve the SNR, in particular for deep sources, whose measurable signals at the surface are otherwise completely submerged by the noisy background activity, we consider data that are averaged over 10 spikes, synchronized on the maximum of the spike. Since sLORETA and cLORETA do not take into account the temporal information, they are applied to the time sample that exhibits the highest variance over all EEG channels, corresponding to the maximum of the epileptiform spike. For 4-ExSo-MUSIC, in order to have a sufficient number of time samples to accurately estimate the FO statistics, we concatenate the T = 200 time samples that are selected for each of the 10 spikes, leading to a total of 2000 time samples. The parameters of the tested algorithms are chosen as follows: the tensor-based preprocessing with the STF and STWV methods is performed as described in Section 3.2.6.1 and, if not stated otherwise, the number of CP components is chosen to be equal to the number of patches. For STF-DA and STWV-DA, the localization is thus performed for each patch separately. For both DA and 4-ExSo-MUSIC, we construct a dictionary of potential sources comprising circular-shaped source regions that are composed of up to $D_{\max}$ = 100 grid dipoles.
The dimension of the signal subspace of the FO cumulant matrix used in 4-ExSo-MUSIC is chosen according to the number of distributed sources following the rule $R = \mathrm{rank}(U_s) = \frac{P(P+1)}{2}$. For sLORETA and cLORETA,
80
CHAPTER 4. DISTRIBUTED SOURCE LOCALIZATION
we fix the regularization parameter λ such that it approximately balances between the data fit term, whose expected value is equal to the number of sensors N in the case of prewhitening, and the regularization term. Evaluation criteria To quantitatively evaluate the performance of the different algorithms, we use two measures: the Distance of Localization Error (DLE) [156], which characterizes the difference between the original and the estimated source configurations, and the Receiver Operating Characteristic (ROC) [155], which reflects the ability of a source imaging algorithm to recover the extent and the form of the distributed sources. Mathematically, the DLE is defined as follows:
$$\mathrm{DLE} = \frac{1}{2}\left(\frac{1}{Q}\sum_{k \in I}\ \min_{\ell \in \hat{I}} \|r_k - r_\ell\|_2 \ +\ \frac{1}{\hat{Q}}\sum_{\ell \in \hat{I}}\ \min_{k \in I} \|r_k - r_\ell\|_2\right) \qquad (4.31)$$
where I denotes the set of indices of all grid dipoles belonging to an active patch, Q is the number of active grid dipoles, i.e., $Q = \#I$, $\hat{I}$ denotes the set of indices of all estimated active source dipoles and $\hat{Q} = \#\hat{I}$ corresponds to the number of the latter. Furthermore, $r_k$ denotes the position of the k-th source dipole, which corresponds to the centroid of the k-th triangle. To compare the estimated source configuration to the original source configuration characterized by the dipoles belonging to the active patches, we consider a number of active estimated dipoles that is equal to the true number of patch dipoles or as close to this number as possible. To achieve this, we threshold the absolute value of the sLORETA and cLORETA solutions and the STF-DA, STWV-DA, and 4-ExSo-MUSIC metrics by a suitable value. The ROC displays the True Positive Fraction (TPF) of correctly identified source dipoles as a function of the False Positive Fraction (FPF), which represents the number of source dipoles erroneously associated with the distributed sources:
$$\mathrm{TPF} = \frac{\#(I \cap \hat{I})}{\#I} \qquad (4.32)$$
$$\mathrm{FPF} = \frac{\#\hat{I} - \#(I \cap \hat{I})}{\#J - \#I}. \qquad (4.33)$$
Here, J denotes the set of all dipoles belonging to the source space. Different TPF and FPF values are achieved by varying the threshold values for the extended source localization algorithms. The ROC curves are plotted for an FPF ranging from 0 % (no dipoles that are falsely associated to the patch) to 6 %, which corresponds to approximately 60 cm² of cortex that is erroneously associated to the patch. Since each patch comprises an area of about 5 cm², we are mostly interested in the ROC curves for an FPF below 1 %. The results are averaged over 50 realizations, obtained with different spike-like signals and varying background activity.
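Both evaluation criteria are straightforward to compute; a small sketch with hypothetical helper functions (not code from this thesis):

```python
import numpy as np

def dle(pos_true, pos_est):
    """DLE of equation (4.31); pos_true (Q x 3) and pos_est (Q_hat x 3) hold
    the positions of the true and estimated active dipoles."""
    d = np.linalg.norm(pos_true[:, None, :] - pos_est[None, :, :], axis=2)
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())

def roc_point(I_true, I_est, n_dipoles):
    """TPF and FPF of equations (4.32) and (4.33) for sets of dipole indices."""
    tp = len(I_true & I_est)
    tpf = tp / len(I_true)
    fpf = (len(I_est) - tp) / (n_dipoles - len(I_true))
    return tpf, fpf
```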
4.4.2.2 Influence of the patch distance
An important factor for the distinction of two patches is their distance, especially for the STWV analysis, which exploits the difference between spatial distributions of the electric potential for each patch. To determine the influence of the patch distance on the source
localization results, we consider in the following three configurations of two superficial patches with large, medium, and small distances amounting to approximately 13.5 cm, 9 cm, and 5 cm, respectively. The analyzed scenarios are composed of patches SupFr and SupOcc (large distance), patches InfFr and InfPa (medium distance), and patches SupFr and InfFr (small distance). The ROC curves obtained for the tested source localization methods are plotted in Figure 4.3 for both raw and prewhitened EEG data and the corresponding DLE values are presented in Table 4.2. Furthermore, for the raw EEG data, Figure 4.4 shows the original and recovered patches for the STWV-DA algorithm.
Figure 4.3: ROC obtained for STF-DA and STWV-DA in comparison to 4-ExSo-MUSIC, cLORETA, and sLORETA applied to raw EEG data (left) and to spatially prewhitened EEG data (right) for three different scenarios composed of patches SupFr & SupOcc with large distance (top), patches InfFr & InfPa with medium distance (center), and patches SupFr & InfFr with small distance (bottom).
                    raw data                      prewhitened data
                large   medium  small         large   medium  small
  sLORETA        2.95    2.97    2.58          2.77    2.92    2.51
  cLORETA       52.8     1.86   22.2           2.36    1.97    1.77
  STF-DA        31.1    21.5    18.4          31.6    20.7     5.14
  STWV-DA        0.66    0.51    0.91          0.72    0.59    0.67
  4-ExSo-MUSIC   3.66   20.3    16.8           0.67    0.62    0.84
Table 4.2: Performance of source imaging algorithms in terms of DLE (in cm) for the considered scenarios with large patch distance (patches SupFr & SupOcc), medium patch distance (patches InfFr & InfPa), and small patch distance (patches SupFr & InfFr). The smallest DLE obtained for each scenario is marked in red.
Figure 4.4: Illustration of recovered patches for the three considered scenarios with large (left), medium (center), and small (right) patch distance for STWV-DA (as the method that yields the best results among the tested source imaging algorithms) applied to the raw EEG data. Triangles belonging to the original patches are marked in red, correctly identified triangles are dark red and erroneously identified triangles are yellow.

In the case where the source imaging algorithms are applied to the raw EEG recordings, according to the ROC curves and the DLE, STWV-DA clearly outperforms all other approaches for the three considered patch configurations, and in particular for the scenario with two close patches. This is also reflected by the good agreement of original and recovered patches (cf. Figure 4.4). The second best method is 4-ExSo-MUSIC, which, in terms of ROC, comes close to the performance of STWV-DA for distant patches, but exhibits a reduced performance for patches with medium and small distances. STF-DA localizes only one of the two patches, reaching a TPF of only 50 % and exhibiting high DLEs. Finally, cLORETA works only for the scenario with medium distance between the patches, whereas sLORETA achieves comparable performances for all three source configurations but does not recover the patches as accurately as STWV-DA.

Applying the source imaging algorithms to the prewhitened data leads to an improved performance for all methods. In particular, the 4-ExSo-MUSIC algorithm attains almost the same performance as STWV-DA, even leading to a smaller DLE for the large patch distance. STF-DA manages to recover both patches if the FPF is increased sufficiently after the identification of the first patch. For prewhitened data, cLORETA outperforms sLORETA, both in terms of ROC and DLE. Furthermore, the ROC curves obtained for each method for the three tested scenarios are comparable, which means that the patch distance does not influence the source localization performance in the case of prewhitened data.
4.4.2.3 Influence of the patch depth
In the previous simulations for two patches, we considered only superficial patches. However, the depth of a patch plays an important role in the outcome of the source localization process. To determine its impact on the source separation and localization of two patches, we conduct a simulation study with the following three patch configurations: patches InfFr (superficial) and Cing (deep), patches MidTe (superficial) and Hipp (deep), and patches BasTe (deep) and Hipp (deep). The resulting ROC curves and DLE values are displayed in Figure 4.5 and Table 4.3, respectively. In Figure 4.6, the patches that are recovered by the source localization method that leads to the highest TPF for an FPF of 0.2% for the raw EEG data are shown in comparison to the original patches.

                    raw data                   prewhitened data
                 1       2       3           1       2       3
  sLORETA      12.9     2.64    7.56       12.8     2.59    7.97
  cLORETA       8.23    6.30   21.4        10.2     3.51   15.8
  STF-DA       11.8     3.55   10.3        11.6     4.49    9.11
  STWV-DA      15.8    14.5    18.6         2.68    5.97    8.73
  4-ExSo-MUSIC 11.4     3.57    9.08       11.4     3.96    8.75
Table 4.3: Performance of source imaging algorithms in terms of DLE (in cm) for the following three scenarios: patches InfFr & Cing (scenario 1), patches MidTe & Hipp (scenario 2), and patches BasTe & Hipp (scenario 3). The smallest DLE obtained for each scenario is marked in red.

For source localization based on the raw EEG data, these results show that all tested source localization algorithms have great difficulties in identifying both patches. Both the STF and the STWV analyses fail to accurately separate the sources. The 4-ExSo-MUSIC algorithm features the best performance in terms of ROC for the first scenario. However, it only permits recovering the superficial patch (cf. Figure 4.6). In the second and third scenario, sLORETA yields the best source localization result, both in terms of ROC and DLE, closely followed by 4-ExSo-MUSIC. For the scenario MidTe & Hipp, sLORETA identifies only part of the patch MidTe, localizing more dipoles on a gyrus close to the patch MidTe. For the scenario BasTe & Hipp, sLORETA recovers parts of both patches, but does not permit identifying the true patch forms and extents. For prewhitened data, in most cases, only slight improvements of the ROC curves can be observed. The most noteworthy improvement concerns the STWV-DA algorithm, especially for the scenario InfFr & Cing, where it outperforms the other source imaging algorithms when prewhitening is employed.

Figure 4.5: ROC obtained for STF-DA and STWV-DA in comparison to 4-ExSo-MUSIC, cLORETA, and sLORETA applied to raw EEG data (left) and to spatially prewhitened EEG data (right) for three different scenarios composed of patches InfFr & Cing (top), patches MidTe & Hipp (center), and patches BasTe & Hipp (bottom).

Figure 4.6: Illustration of recovered patches for the three considered scenarios involving deep patches using the method yielding the highest TPF for an FPF of 0.2% when applied to the raw EEG data. Triangles belonging to the original patches are marked in red, correctly identified triangles are dark red and erroneously identified triangles are yellow.

4.4.2.4 Theoretical analysis of selected two patch scenarios
In this section, we establish a link between the theoretical findings of Section 3.2.4 and the simulation results of the STWV-DA algorithm presented in Sections 4.4.2.2 and 4.4.2.3. To this end, we analyze what happens when the STWV analysis is applied to two examples of two-patch scenarios and explain the consequences for the source localization results. More particularly, we are interested in the impact that the application of the DIAG algorithm for the CP decomposition has on the STWV tensor when the model is not exactly trilinear. As explained in Section 3.2.4, the first step of DIAG consists in truncating the SVD of, e.g., the mode-2 unfolding matrix, which ideally leads to a trilinear model where each component corresponds to one source (cf. equations (E.4) to (E.6)). In the following, we examine whether this step is successful for the STWV data of our simulation examples. This determines whether the patches are correctly separated and thus has a high impact on the performance of the source localization.

In order to avoid perturbations that are not directly related to the STWV preprocessing and would complicate the evaluation of the results, we generate realistic simulation data as described in Section 4.4.2.1, but without background activity or noise. Furthermore, we attribute the same signal to all dipoles that belong to the same patch. In a first step, we then compute the STWV tensors F1 and F2 separately for each of the two patches. For each of these tensors, we determine the two dominant left singular vectors of the space-wave-vector matrices (vectors v1 and v2 for tensor F1, and x1 and x2 for tensor F2), which contain information about the spatial distribution. In a second step, we calculate the SVD of the mode-1 unfolding matrix of the combined data tensor F = F1 + F2 and truncate it to obtain a rank-2 matrix (for P = 2 patches). If condition C1) or C2) of Section 3.2.4.1 is fulfilled, the resulting two left singular vectors z1 and z2 should (at least approximately) span the same subspace as the vectors v1 and x1. Otherwise, the separation of the two patches using the STWV analysis fails.
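This subspace comparison can be sketched numerically via principal angles; the helper below and the usage comments are hypothetical illustrations of the procedure described above:

```python
import numpy as np

def principal_angles(U, V):
    """Principal angles (in degrees) between the column spaces of U and V."""
    Qu, _ = np.linalg.qr(U)
    Qv, _ = np.linalg.qr(V)
    s = np.linalg.svd(Qu.T @ Qv, compute_uv=False)  # cosines of the angles
    return np.degrees(np.arccos(np.clip(s, -1.0, 1.0)))

# Hypothetical usage: Z holds z1, z2 (truncated SVD of the combined tensor's
# mode-1 unfolding) and V holds v1, x1 (dominant single-patch components);
# small principal angles indicate that the separation step can succeed.
# print(principal_angles(Z, V))
```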
Figure 4.7: Dominant components of the patch SupFr (left), dominant components of the patch SupOcc (middle) and components recovered with the truncated SVD (right).

Figure 4.7 corresponds to the scenario of two distant sources and shows the absolute value of the interpolated spatial distributions at the surface of the scalp described by the two dominant singular vectors v1 and v2 of the patch SupFr and the two dominant singular vectors x1 and x2 of the patch SupOcc, as well as the left singular vectors z1 and z2 recovered from the truncated SVD of the mode-1 unfolding matrix of the tensor F. Obviously, the first singular vector z1 corresponds to the dominant vector x1 of the patch SupOcc, while the second singular vector z2 corresponds to the dominant vector v1 of the patch SupFr. Therefore, the STWV analysis leads to a separation of the two patches and allows for an accurate localization (see Section 4.4.2.2).

Figure 4.8 shows the corresponding interpolated spatial distributions for the scenario composed of the patches MidTe and Hipp. In this case, the left singular vectors z1 and z2 look like slightly perturbed versions of the two dominant vectors v1 and v2 of the patch MidTe, which leads to the conclusion that the patch MidTe yields observations with higher amplitudes than the patch Hipp. This means that the condition on the singular values $\mu_2$ and $\epsilon_1$ is not fulfilled. The slight perturbation of the vectors v1 and v2 could be explained by an additional violation of the orthogonality conditions. In short, the STWV analysis fails in this case because it loses the information about the patch Hipp. This explains the poor performance of STWV-DA for this scenario (cf. Section 4.4.2.3).
Figure 4.8: Dominant components of the patch MidTe (left), dominant components of the patch Hipp (middle) and components recovered with the truncated SVD (right).
4.4.2.5 Influence of the number of CP components
The number of CP components identified in the decomposition of the STF and STWV tensors should be chosen according to the number of extended sources. However, in practice, the number of sources is unknown and has to be estimated from the measurements
(see also Section 3.2.2.1). While the estimation of the number of sources is out of the scope of this thesis, we analyze in this section the sensitivity of the STF and STWV based source localization methods to the number of CP components used in the tensor decomposition. To this end, we consider a scenario with a single patch, InfPa, and a scenario consisting of the two patches InfFr and InfPa. Then, we decompose the STF and STWV tensors using one CP component and using two CP components. For both cases, we perform source localization using STF-DA and STWV-DA. The resulting ROC curves are shown in Figure 4.9. For STF-DA, the results that are achieved with one or two CP components are the same, indicating that the spatial mixing vectors obtained for both components must be almost identical. With a TPF close to 100% for an FPF of about 1%, the results obtained by STF-DA are good for the single patch scenario, but poor for two patches where the TPF does not exceed 50% for an FPF smaller than or equal to 6%, which suggests that only one patch is localized. For STWV-DA, with a 1-component CP decomposition, one obtains the same results as with STF-DA for both scenarios. With a 2-component CP decomposition, on the other hand, the results of STWV-DA are worse than those obtained for one component in the single patch case, but considerably better than those obtained with one component in the two patch case. This shows that the correct choice of the number of CP components is important to achieve accurate results with STWV-DA.
Figure 4.9: ROC curves obtained for STF-DA and STWV-DA based on a CP decomposition with one or two components for a single patch and a two-patch scenario.
4.4.3 Real data analysis
To validate the tensor-based source localization methods and in particular the STWV analysis, we report in this section the results that we have obtained by applying the STF-DA and STWV-DA algorithms to real EEG measurements that were recorded for a patient suffering from epilepsy. For comparison, we also analyzed the data using 4-ExSo-MUSIC, sLORETA, and cLORETA.

4.4.3.1 Data acquisition and processing
Real EEG data were acquired with a 62-channel measurement system using the common average reference with a sampling rate of 1000 Hz. Our analysis is based on 9 interictal
spikes that were selected from the recordings. We have considered data segments comprising the ascending and descending parts of the spikes as well as parts of the following wave. The selected time intervals are marked in Figure 4.10. A realistic head model was built by segmenting the patient's MRI using the BrainVISA software [18]. The lead field matrix was then computed for a cortical mesh with 20003 vertices using Brainstorm [19] and OpenMEEG [20, 21]. In this case, each vertex of the mesh corresponded to one grid dipole. The sources were localized using STWV-DA, STF-DA, and 4-ExSo-MUSIC as well as using sLORETA and cLORETA. Contrary to STWV-DA, STF-DA, and 4-ExSo-MUSIC, which exploit the data of the whole time interval, sLORETA and cLORETA were applied to three time points corresponding to the first (negative) peak, the second (positive) peak, and the wave. The STWV and STF tensors were constructed and decomposed as described in Sections 3.2.3 and 3.2.6.1. We analyzed the results obtained for P = 1, P = 2, and P = 3 CP components in accordance with the number of sources that could be expected according to the SEEG recordings that were available for the same patient. For the source localization using STWV-DA, STF-DA, and 4-ExSo-MUSIC, we employed a maximal disk size of 200 dipoles. The number of disks or grid dipoles to consider, which determines the size of the identified patch, was chosen such that the goodness-of-fit (GOF) value
$$\mathrm{GOF} = \frac{\|X - X_{\mathrm{rec}}\|_F}{\|X\|_F} \qquad (4.34)$$
was minimal. Here, $X_{\mathrm{rec}}$ corresponds to the data matrix that is reconstructed from the estimated source configuration. In case of STF-DA, STWV-DA, and 4-ExSo-MUSIC, $X_{\mathrm{rec}} = \sum_{p=1}^{P} \hat{h}_p \hat{\bar{s}}_p^{\mathrm{T}}$ where $\hat{h}_p$ denotes the reconstructed spatial mixing vector for the combination of a certain number of disks for the p-th component and $\hat{\bar{s}}_p$ denotes the corresponding patch signal that can be computed as $\hat{\bar{S}} = \hat{H}^{+} X$ with $\hat{H} = [\hat{h}_1, \ldots, \hat{h}_P]$. For sLORETA and cLORETA, $X_{\mathrm{rec}} = \hat{h}\, \hat{\bar{s}}^{\mathrm{T}}$ where $\hat{h} = \sum_{d \in \hat{I}} \tilde{g}_d$ corresponds to the sum of the lead field vectors of the considered grid dipoles, which are characterized by the set $\hat{I}$ of dipole indices and which are identified by thresholding the coefficient vector $\hat{\psi}$. The corresponding patch signal is computed as $\hat{\bar{s}} = \hat{h}^{+} X$.

The source localization results can be evaluated based on the findings of the SEEG, which give a strong hypothesis on the actual source regions of the epileptic activity. To this end, in Figure 4.10, we marked by small spheres the positions of the three SEEG electrodes for which the highest amount of epileptic spikes were automatically detected [157] during SEEG recordings. Note that the automatic detection was based on an independent evaluation of the recordings of each SEEG electrode, which means that the epileptic activity at the three identified sites could be independent or concomitant. A more detailed analysis of the SEEG recordings showed that in some cases, an epileptic spike at the anterior SEEG electrode may be associated with an epileptic spike at the central SEEG electrode, delayed by about 20 ms, and a spike at the posterior SEEG site delayed by 70 ms with respect to the spike at the anterior site. This suggests that epileptic activity propagates from the anterior SEEG electrode to the posterior SEEG electrode.
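As a concrete illustration of the GOF-based choice of the patch size described above, a minimal sketch for the tensor-based methods (function and variable names are hypothetical):

```python
import numpy as np

def gof(X, H_hat):
    """Sketch of the goodness-of-fit criterion (4.34) for a candidate source
    configuration with reconstructed mixing vectors H_hat (N x P); the patch
    signals are obtained as S_bar = H_hat^+ X."""
    S_bar = np.linalg.pinv(H_hat) @ X
    X_rec = H_hat @ S_bar
    return np.linalg.norm(X - X_rec, 'fro') / np.linalg.norm(X, 'fro')
```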
Figure 4.10: Results of STWV-DA and STF-DA for P = 1, P = 2, and P = 3 and of 4-ExSo-MUSIC for R = 1, R = 3, and R = 6 (corresponding to 1, 2, and 3 correlated sources) as well as of sLORETA and cLORETA for the different spikes. The patch dipoles are colored according to the number of spikes (from 0 to 9) for which they were identified. Small blue spheres indicate the positions of the three SEEG electrodes for which the highest amount of epileptic spikes was automatically detected during SEEG recordings. We also illustrated the spike intervals considered for STWV-DA, STF-DA, and 4-ExSoMUSIC and the time points considered for cLORETA and sLORETA (displayed on the time signal recorded by electrode AF7).
4.4.3.2 Results
In Figure 4.10, the patch dipoles that were identified with STWV-DA, STF-DA, 4-ExSo-MUSIC, sLORETA, and cLORETA for the 9 different spikes are marked in color. The color scale varies from 0 to 9 depending on the number of spikes for which a dipole was determined to be active. STWV-DA, STF-DA, 4-ExSo-MUSIC, and sLORETA identified patches mostly on the left hemisphere, at locations that were close to the two or three SEEG contacts that recorded the highest amount of spikes. However, all tested source localization methods also identified patches on the right hemisphere. This was particularly the case for cLORETA.

In case of STWV-DA, the patches identified for most of the spikes were located in between the two posterior or the two frontal SEEG contacts, in the superior frontal gyrus or in the mesial areas of the frontal lobe. Some isolated patches were located in the posterior part of the right hemisphere. Comparing the results of STWV-DA obtained for P = 1, P = 2, and P = 3 CP components, we observe that for P = 1, only one small patch, located in between the two frontal SEEG contacts, was identified. For P = 2, the recovered patches were larger and located close to the three SEEG contacts, whereas for P = 3, the majority of patches were localized between the two posterior SEEG contacts. The patches localized by STF-DA were mostly located between the two frontal SEEG contacts and were comparable for the three tested tensor ranks. However, for P = 3, in some cases, patches were also identified at the equivalent position on the right hemisphere. Moreover, a small number of patches were localized in the vicinity of the third SEEG contact in the pre-central gyrus. The 4-ExSo-MUSIC algorithm identified patches in between the frontal and the posterior SEEG contacts, with most patches located close to the central SEEG contact. The patch size slightly increased for larger ranks of the signal subspace. Otherwise, the different ranks of the signal subspace led to similar results. As for STF-DA, patches were also localized at the equivalent positions on the right hemisphere. The patches localized by sLORETA are globally more anterior, while cLORETA identified patches all over the frontal parts of the left and right hemispheres.
4.4.4
Discussion and conclusions
We have conducted realistic computer simulations which have shown that STWV-DA exhibits the best performance for scenarios with two superficial patches. In particular, this method has proven to be robust when applied to the raw EEG data, contrary to the other tested source imaging algorithms, which, for patches with medium to small distances, only led to good results in the case of prewhitened data. Since it is difficult to obtain an accurate estimate of the noise covariance matrix in practice, robustness to spatial noise coherence is an important advantage of the STWV-DA source imaging method. The good performance of STWV-DA can be explained by the fact that the STWV analysis correctly separates the spatial mixing vectors of the two patches, as demonstrated in Section 4.4.2.4, and therefore makes it possible to localize each patch individually. Due to the highly correlated signals of the two patches, which differ only by a small time delay, the STF analysis fails to separate the patches, thereby impeding source localization. This explains the poor performance that has been observed for STF-DA. 4-ExSo-MUSIC needs to localize both patches simultaneously, which does not work as well as the localization of a single patch and thus does not yield results as accurate as those of STWV-DA for the raw EEG data. Nevertheless, employing prewhitening improves the source localization results obtained by 4-ExSo-MUSIC, leading to a performance that is similar to STWV-DA in this
case. cLORETA and sLORETA generally do not achieve results as accurate as those of STWV-DA. However, if we consider scenarios with one deep patch and one superficial patch, or with two deep patches, the tensor-based techniques perform poorly because they do not accurately separate the two patches. In the presence of one superficial and one deep patch, this is mainly due to the different strengths of the patch signals recorded at the surface, as shown by the analysis in Section 4.4.2.4. Furthermore, the STF technique fails because of the highly correlated signals of the patches, and the STWV method struggles with the wide-spread distribution of the electric potential of deep sources. In summary, the performance of STWV-DA and STF-DA depends greatly on the validity of the trilinear approximation that is made in the tensor-based preprocessing step, leading to superior source localization results compared to other methods if this approximation is accurate, but failing to recover the patches if it is not.

A remaining problem of the tensor methods consists in the estimation of the number of active patches, which we assumed to be known in this thesis. As the analysis of STF-DA and STWV-DA with respect to the number of identified CP components (see Section 4.4.2.5) has shown, this parameter has a high impact on the results of the STWV analysis. An inappropriate number of components may cause the STWV analysis to fail by separating patches into several components or by mixing different patches in one component. For the STF analysis, the number of CP components seemingly did not have an impact on the results. But this insensitivity may be explained by the inability of the STF analysis to identify components that can be associated with different patches because the time-frequency content of the simulated patch activities is nearly identical. In practice, the number of patches has to be determined from the measurements, which is a difficult task, especially in the context of delayed signals for simultaneously active patches. Similarly, for 4-ExSo-MUSIC, one has to estimate the dimension of the FO signal subspace, which raises the same difficulties.

Another source of errors for the tensor-based techniques stems from the imperfect synchronization of the signals that are emitted by the dipoles of one patch. Both the STF and the STWV analysis are based on the model (4.3), which approximates model (4.2) by assuming the same signal for all dipoles within a patch. If the activities of the patch dipoles are not sufficiently synchronous, this model is incorrect and leads to perturbations of the estimated spatial mixing vectors and thereby of the source localization results. The same problem also applies to the 4-ExSo-MUSIC algorithm.

The application of the STF-DA and STWV-DA algorithms to the actual EEG measurements of an epileptic patient has led to the localization of patches that show a good correspondence to the positions of the SEEG electrodes detecting frequent interictal epileptic activity. More precisely, for all employed tensor decompositions with P = 1, P = 2, and P = 3 components, we identified patches that are close to two or three of the marked SEEG contacts for most spikes. In some cases, the determined patches are farther away from the marked SEEG electrodes and include regions on the right hemisphere. This could be due to lower SNRs for the single spikes or to propagation phenomena, which occur during the spike and wave complex of the analyzed epileptic spikes.
Nevertheless, it is difficult to consider these results as a “false” localization since, in the absence of simultaneous SEEG/EEG recordings, the involvement of these remote regions cannot be ruled out. For STF-DA, we did not observe significant discrepancies between the results obtained for different tensor ranks. Due to propagation effects, the source signals can be expected to be highly correlated and the STF analysis is therefore unlikely to separate
the sources, rather distinguishing different components of a single source. This would explain the insensitivity of the results to the employed number of CP components, which has also been observed in the simulation study. For STWV-DA, the best results were achieved for P = 2, in which case patches were localized in proximity to the three marked SEEG contacts. For P = 3, the source localization results are slightly less concordant with the sites identified from the SEEG recordings, and for P = 1, patches were only identified close to the two frontal SEEG contacts. This suggests that the STWV-DA method is able to separate only two sources. If there is a higher number of active patches, their signals are probably too correlated or their amplitudes too different to enable their separation using the STWV analysis, leading to worse results for higher tensor ranks. Altogether, we deduce that the tensor-based methods are also well suited for the analysis of real data, as is the 4-ExSo-MUSIC algorithm, for which we obtained similar results. The results of these three methods are more concordant with the identified SEEG contacts than those of sLORETA and cLORETA, which frequently identified patches on the contralateral hemisphere.
4.5
Sparse, variation-based source imaging approaches
Even though the STWV-DA and 4-ExSo-MUSIC algorithms lead to good source imaging results in a number of cases, as shown in Section 4.4, these methods have some difficulties with the simultaneous localization of several active source regions because the employed distributed source parameterization based on disks does not take into account that the distributed source may be composed of several spatially distant patches. For STF-DA and STWV-DA, this problem arises especially if the patches cannot be separated into distinct CP components due to their highly correlated time signals. In order to overcome this problem, we explore a different source imaging approach in this section. This approach is based on the VB-SCCD algorithm (see Section 4.3.1.4, [116]), which showed a good performance for the localization of extended sources in a recent comparison of different source imaging algorithms [158]. In particular, this method makes it possible to simultaneously localize several highly correlated patches and is therefore one of the most promising approaches for the identification of multiple active brain regions in the context of propagation phenomena. However, the VB-SCCD algorithm shows some difficulties in separating close sources and tends to combine them into one large source. Furthermore, the implementation of VB-SCCD using SOCP [123, 124], as proposed in [116], leads to a high computational complexity, which makes the method impractical for large numbers of time samples. In Section 4.5.1 of this thesis, we improve on these points by proposing a new source imaging algorithm, subsequently referred to as sparse VB-SCCD (SVB-SCCD), which includes an additional L1-norm regularization term. Such an approach, also known as sparse TV regularization [159], TV-L1 regularization [160] or fused LASSO [161], has previously been used in image processing [162] and fMRI prediction [159, 160], where it has been shown to lead to robust solutions, but is new in the field of brain source imaging. Note though that the combination of sparsity in the original source domain and in a transformed domain that is different from the total variation has been explored in [114] for MEG source imaging. As shown in this section, the SVB-SCCD method yields
more focal source estimates than VB-SCCD and achieves the separation of even close sources. Furthermore, in Section 4.5.3, we illustrate the use of a different optimization technique, called the Alternating Direction Method of Multipliers (ADMM) [163, 164] (see also [165]), which is much faster than SOCP. This gain in computational efficiency enables us to apply the algorithm to large time intervals and to reconstruct the source signals. It also makes it possible to take into account the temporal structure of the data by employing an L1,2-norm regularization, as first suggested in [126] in the context of MxNE (see also Section 4.3), leading to more robust source estimates. This approach is described in Section 4.5.2. Finally, in Section 4.5.4.2, we demonstrate the superior performance of the resulting L1,2-SVB-SCCD algorithm in comparison to VB-SCCD, SVB-SCCD, and L1,2-VB-SCCD by means of computer simulations.
4.5.1
The SVB-SCCD algorithm
The SVB-SCCD algorithm builds on the VB-SCCD method, described in Section 4.3.1.4, which imposes sparsity on the variational map of the sources. However, since the regularization term does not influence the source amplitudes, VB-SCCD frequently leads to amplitude-biased solutions and shows some difficulties in separating close sources. To overcome these problems, we introduce an additional regularization term that imposes sparsity in the original source domain, corresponding to hypothesis S3). This leads to the following optimization problem:

min_{S̃} (1/2) ||X − G̃S̃||²_F + λ (||TS̃||_1 + β ||S̃||_1)    (4.35)
where T is the transform matrix that permits to obtain the variational map and which is defined in (4.11). This approach permits us to adjust the size of the reconstructed source regions by varying the new regularization parameter β. Setting β = 1 leads to very focal source estimates, whereas small values of β only avoid the amplitude bias but do not influence the size of the reconstructed source regions. In our experience, reasonable results can be achieved for 0.01 ≤ β ≤ 1. For the special case where β = 0, (4.35) reduces to an alternative form of the VB-SCCD optimization problem (4.10). The regularization parameter λ regulates the impact of the source priors and may be adjusted according to the acceptable upper limit for the reconstruction error, which depends on the noise level, as suggested in [116].
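To make the structure of problem (4.35) concrete, the following minimal sketch formulates it for a single time sample with the generic convex modeling tool cvxpy. The dimensions, random matrices, and names are purely illustrative assumptions of ours (in particular, the true T is the sparse transform defined in (4.11), not a random matrix), and the thesis solves the problem with SOCP or ADMM rather than a generic solver.

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
N, D, E = 50, 300, 450                # hypothetical small dimensions
G = rng.standard_normal((N, D))       # (prewhitened) lead field matrix
T_mat = rng.standard_normal((E, D))   # stand-in for the sparse transform of (4.11)
x = rng.standard_normal(N)            # one (prewhitened) EEG time sample
lam, beta = 0.1, 0.67

s = cp.Variable(D)
data_fit = 0.5 * cp.sum_squares(x - G @ s)
penalty = lam * (cp.sum(cp.abs(T_mat @ s)) + beta * cp.sum(cp.abs(s)))
cp.Problem(cp.Minimize(data_fit + penalty)).solve()
s_hat = s.value  # sparse both in the variational map and in the source domain
```

Setting beta = 0 recovers the VB-SCCD problem (4.10) in this formulation.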
4.5.2
Exploitation of temporal structure
The SVB-SCCD algorithm as described in the previous section considers each time sample independently and thus does not take into account the temporal structure of the data. However, it can be expected that in the considered time interval, the active source regions stay the same. This hypothesis can be enforced by replacing the L1-norm in equation (4.35) by the L1,2-norm, leading to the following optimization problem:

min_{S̃} (1/2) ||X − G̃S̃||²_F + λ (||TS̃||_{1,2} + β ||S̃||_{1,2})    (4.36)
This yields more robust source estimates. The resulting source localization approach is subsequently called L1,2-SVB-SCCD.
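As a reminder of how this mixed norm couples the time samples, the small helper below (our own, not from the thesis) evaluates ||S||_{1,2} for a source-by-time matrix; because the L2 norm is taken along each row before summing, shrinking this penalty sets entire rows, i.e., entire time courses, to zero.

```python
import numpy as np

def norm_1_2(S):
    # ||S||_{1,2}: L2 norm of each row (one dipole's time course), summed over
    # rows; penalizing it keeps the set of active dipoles fixed over time.
    return float(np.sum(np.linalg.norm(S, axis=1)))
```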
4.5.3
Optimization using ADMM
The optimization problems of the three different algorithms, VB-SCCD, SVB-SCCD, and L1,2-SVB-SCCD, can be rewritten in a generalized, constrained optimization framework with latent variables Y and Z:

min_{S̃} (1/2) ||X − G̃S̃||²_F + λ (f(Y) + β f(Z))   s.t.  Y = TS̃,  Z = S̃    (4.37)

Here, f represents the regularizing function that is either the L1-norm (for SVB-SCCD and VB-SCCD) or the L1,2-norm (for L1,2-SVB-SCCD and L1,2-VB-SCCD). Problem (4.37) can be solved using ADMM [163, 164] (see also [165]), which is a simple and efficient algorithm for constrained convex optimization. It is based on the idea of alternately updating the variables S̃ ∈ R^{D×T}, Y ∈ R^{E×T}, and Z ∈ R^{D×T} in the augmented Lagrangian of (4.37), as well as computing alternating updates of the scaled Lagrangian multipliers U ∈ R^{E×T} and W ∈ R^{D×T}. After initialization (for example, by setting all variables to zero), at the k-th iteration, the following update rules can be derived (see Appendix F.2):

S̃^(k+1) = [G̃^T G̃ + ρ(T^T T + I_D)]^{-1} [G̃^T X + ρ T^T (Y^(k) − U^(k)) + ρ (Z^(k) − W^(k))]
Y^(k+1) = prox_{f,λ/ρ}(T S̃^(k+1) + U^(k))
Z^(k+1) = prox_{f,λβ/ρ}(S̃^(k+1) + W^(k))
U^(k+1) = U^(k) + T S̃^(k+1) − Y^(k+1)
W^(k+1) = W^(k) + S̃^(k+1) − Z^(k+1)

where ρ > 0 denotes the penalty parameter introduced in the augmented Lagrangian (see [165]). Please note that in practice, the computation of the inverse of the large matrix G̃^T G̃ + ρ(T^T T + I_D) ∈ R^{D×D} should be avoided, for example, by resorting to the inversion lemma and matrix decompositions (such as the QR decomposition), which can be computed efficiently (see also Section 4.7.1). The algorithm is stopped after convergence or once a maximal number of iterations is reached.
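The following NumPy sketch spells out these update rules under our own naming conventions; it is an illustration, not the thesis implementation. For readability, it forms the large D x D inverse explicitly, which Section 4.7.1 explains how to avoid; the proximal operators are elementwise soft-thresholding for the L1-norm and row-wise group shrinkage for the L1,2-norm.

```python
import numpy as np

def prox_l1(V, t):
    # Soft-thresholding: proximal operator of t * ||.||_1 (elementwise)
    return np.sign(V) * np.maximum(np.abs(V) - t, 0.0)

def prox_l12(V, t):
    # Row-wise group soft-thresholding: proximal operator of t * ||.||_{1,2}
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    return V * np.maximum(1.0 - t / np.maximum(norms, 1e-12), 0.0)

def svb_sccd_admm(X, G, T_mat, lam, beta, rho=1.0, n_iter=300, prox=prox_l1):
    # ADMM iterations for (4.37); prox=prox_l12 gives the L1,2 variants
    D, T_samp, E = G.shape[1], X.shape[1], T_mat.shape[0]
    S = np.zeros((D, T_samp)); Z = S.copy(); W = S.copy()
    Y = np.zeros((E, T_samp)); U = Y.copy()
    # System matrix of the S-update; in practice, avoid this D x D inverse
    # via the inversion lemma and a Cholesky factorization (cf. Section 4.7.1)
    A_inv = np.linalg.inv(G.T @ G + rho * (T_mat.T @ T_mat + np.eye(D)))
    GtX = G.T @ X
    for _ in range(n_iter):
        S = A_inv @ (GtX + rho * T_mat.T @ (Y - U) + rho * (Z - W))
        Y = prox(T_mat @ S + U, lam / rho)
        Z = prox(S + W, lam * beta / rho)
        U += T_mat @ S - Y   # scaled dual updates
        W += S - Z
    return S
```

Passing prox=prox_l1 yields SVB-SCCD (or VB-SCCD for beta = 0), while prox=prox_l12 yields the L1,2 variants.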
4.5.4
Simulations
In this section, we compare the performance of SVB-SCCD, VB-SCCD, L1,2-SVB-SCCD, and L1,2-VB-SCCD based on computer simulations.

4.5.4.1
Simulation setup
Data generation EEG data is generated for N = 91 electrodes and a source space consisting of D = 19626 dipoles. We simulate three different scenarios with patches of different distances. Each patch is composed of 100 adjacent dipoles. The first scenario comprises three patches of medium to large distance: patches SupFr, InfFr, and SupOcc. The second scenario includes two close patches, SupOcc and InfPa, and the patch InfFr. In the third scenario, we consider three close patches: InfPa, MidTe, and OccTe. The first patch is attributed an epileptic spike signal comprising T = 200 time samples (at 256 Hz sampling frequency) that has been segmented from SEEG recordings of a
patient suffering from epilepsy. We then generate 100 different realizations of this signal, one for each patch dipole, by introducing small variations in amplitude and delay (see also Section 3.3.1.1). Assuming that the other patches are activated due to a propagation of the epileptic activity of the first patch, we use the same signals for the dipoles of the second and third patch, but introduce a delay of 4 to 24 ms depending on the distance to the first patch. All source dipoles that do not belong to a patch are attributed Gaussian background activity with an amplitude that is adjusted to the amplitude of the SEEG signals between epileptic spikes, thus leading to realistic SNRs such that ||G̃S̃||²_F / ||N||²_F ≈ 1.

Tested source localization methods  The EEG data are spatially prewhitened before applying the source localization algorithms. For both VB-SCCD and SVB-SCCD, the regularization parameter λ is adjusted such that the reconstruction error lies within a confidence interval of 95 to 99% of the noise power. In the case of SVB-SCCD, we use a fixed parameter β = 0.67 because we found that this leads to reasonable results for the considered scenarios. For VB-SCCD and SVB-SCCD, which provide one source estimate per time sample, we determine the active patches by thresholding the source estimates at the data sample of maximal power, corresponding to the maximum of the epileptic spike. For each identified source region, comprised of adjacent dipoles, we then compute the average of the time signals of all involved source dipoles in order to obtain one estimated time signal per patch.

Evaluation criteria  The performance of the source localization is assessed using the DLE and the ROC curves, which have been introduced in Section 4.4.2.1. The quality of the extracted signals is evaluated by calculating the correlation coefficients between the estimated patch signal and the averaged signal of all dipoles belonging to a patch. We then compute the mean of the correlation coefficients for all patches. The results are averaged over 30 realizations with different patch signals and background activity.
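As a simple illustration of this signal-quality criterion, the following helper (our own; names hypothetical) computes the mean correlation coefficient over patches:

```python
import numpy as np

def mean_patch_correlation(S_est, S_true):
    # S_est, S_true: (P x T) arrays of estimated and ground-truth average
    # patch signals; returns the mean absolute correlation over the P patches
    r = [np.corrcoef(S_est[p], S_true[p])[0, 1] for p in range(S_est.shape[0])]
    return float(np.mean(np.abs(r)))
```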
4.5.4.2
Results
The performance achieved with the different source imaging algorithms in terms of DLE and signal correlation coefficient for the three considered scenarios is summarized in Table 4.4. The corresponding ROC curves are shown in Figure 4.11.
                   DLE in cm             corr. coeff. in %
scenario           1      2      3       1      2      3
VB-SCCD            0.99   2.57   9.73    94.9   92.5   78.5
SVB-SCCD           0.94   1.05   3.81    95.5   94.9   89.3
L1,2-VB-SCCD       0.97   1.09   10.8    97.9   97.5   77.9
L1,2-SVB-SCCD      1.03   1.06   2.24    98.5   98.3   96.6
Table 4.4: Performance of source imaging algorithms in terms of DLE and signal correlation for scenario 1 (SupFr & InfFr & SupOcc), scenario 2 (SupOcc & InfPa & InfFr), and scenario 3 (SupOcc & MidTe & OccTe). The best result for each scenario is marked in red.
Figure 4.11: ROC curves for (top left) scenario 1 (SupFr & InfFr & SupOcc), (top right) scenario 2 (SupOcc & InfPa & InfFr), and (bottom) scenario 3 (SupOcc & MidTe & OccTe).
Both the ROC curves and the DLE values show that the use of the additional L1-norm regularization term in the SVB-SCCD approach turns out to be insignificant in the case of three patches with medium distance (scenario SupFr & InfFr & SupOcc), as SVB-SCCD and VB-SCCD exhibit a comparable performance in this case. However, for two close patches (scenario SupOcc & InfPa & InfFr), one can observe a slight improvement of the DLE obtained with SVB-SCCD compared to VB-SCCD, and for three close patches, the SVB-SCCD approach clearly leads to better results than VB-SCCD. This can also be seen in Figure 4.12, where we illustrate an example of the source imaging results obtained with the different methods for the scenario with the three close patches InfPa, MidTe, and OccTe. Obviously, the SVB-SCCD approach provides a better separation of the sources than the VB-SCCD approach.

The exploitation of the temporal structure of the data in the VB-SCCD and SVB-SCCD algorithms hardly has an impact on the source localization results of scenarios 1 and 2, but for SVB-SCCD, it yields more robust solutions in the case of three close patches (cf. DLE for scenario 3). Furthermore, L1,2-SVB-SCCD and L1,2-VB-SCCD lead to a better performance in terms of source extraction than SVB-SCCD and VB-SCCD, as demonstrated by the signal correlation coefficients shown in Table 4.4.
Figure 4.12: Source imaging results obtained with the different tested algorithms.
4.5.5
Conclusions
In this section, we have analyzed two extensions of the VB-SCCD algorithm. Following the fused LASSO approach, we have included an additional, sparsity-inducing regularization term, which yields a better separation of close sources. Furthermore, we have taken into account the temporal structure of the data, which leads to an increased performance in terms of signal extraction. Finally, we have illustrated the use of an efficient algorithm, ADMM, to solve the L1,2-SVB-SCCD optimization problem in a much faster way than the previously employed SOCP algorithm. The superior performance of the proposed approach in comparison to the classic VB-SCCD algorithm has been demonstrated by means of realistic computer simulations.
4.6
Combination of tensor decomposition and variation-based source localization
In Sections 4.4 and 4.5, we have considered two different source imaging approaches. The STWV-DA and STF-DA algorithms rely on a tensor decomposition step to separate the sources and to facilitate their localization. However, to achieve a good performance, the DA technique requires all patches to be correctly separated. On the other hand, the SVB-SCCD approach allows for the simultaneous localization of all active source regions. Nevertheless, even for this method, the correct identification of several patches is more difficult than the localization of a single patch. This gives rise to the question of whether separating the patches prior to the actual localization using the tensor-based preprocessing methods could lead to improved results of the SVB-SCCD algorithm. Such an approach would be of particular interest in the case where some sources can be separated into different components, which partly facilitates the source localization, but other patches are mixed in the same component, which requires a source imaging method that can deal with multiple patches, such as SVB-SCCD. This issue is addressed in this section, where
we analyze the combination of the STWV tensor method with the SVB-SCCD algorithm, subsequently called STWV-SVB-SCCD, in comparison to STWV-DA and SVB-SCCD by means of realistic computer simulations.
4.6.1
Simulation setup
Data generation  EEG data are simulated for N = 91 electrodes, T = 200 time samples (at a sampling rate of 256 Hz), and a source space comprising D = 19626 dipoles. We consider two scenarios with three patches (see Figure 4.13), each of which is composed of 100 adjacent grid dipoles. For the source dynamics, we compare the following two cases:

1. The three patches emit epileptiform spike-like signals with different morphologies.

2. Two patches (marked in red in Figure 4.13) are attributed the same spike signals except for a small delay, corresponding to the propagation of the spike from one patch to another patch, whereas the third patch (marked in green in Figure 4.13) is associated with a signal of different morphology.

The patches with different signal morphologies are subsequently referred to as patches with different signals, whereas we speak of two delayed signals and one different signal or mixed signals in the second case.
Figure 4.13: Illustration of the considered scenarios, where the patches that are associated with propagated spikes (for the second type of considered source dynamics) are shown in red and the patch with a different source activity is shown in green. (Left) scenario SupFr & InfPa & InfFr and (right) scenario OccTe & MidTe & InfFr.

The epileptiform signals are obtained using the neuronal population model described in Section 2.5.2. To obtain highly correlated signals for the dipoles within each patch, which cannot be achieved for all signal morphologies using the neuronal population model, we artificially introduce small variations in amplitude and signal delay, which are drawn from a Gaussian distribution and a lognormal distribution, respectively (see also Section 3.3.1.1; a small illustrative sketch of this jittering step is given at the end of this subsection). An example of the resulting patch signals is plotted in Figure 4.14 for both types of considered source dynamics.

Figure 4.14: Illustration of the average patch signals for three patches with epileptic spikes of different morphologies (left) and two patches with delayed signals and a third patch with a signal of different morphology (right).

Tested source imaging methods  We perform distributed source localization on the spatially prewhitened data using the STWV-DA, SVB-SCCD, and STWV-SVB-SCCD algorithms. The parameters for the STWV tensor method, DA, and SVB-SCCD are chosen as previously described in Sections 4.4.2.1 and 4.5.4.1. In the case of three patches with different signals, the STWV tensor is decomposed into P = 3 components and the SVB-SCCD algorithm is applied to three time samples corresponding to the maxima of the epileptiform spike signals associated with the considered patches. For two patches with mixed signals, we employ a tensor rank of P = 2 and apply the SVB-SCCD method to only two time samples, corresponding to the maxima of the two spike signals with different morphologies. To evaluate the source localization results obtained with SVB-SCCD, the source distributions at the analyzed time points are combined by retaining the maximal amplitude observed for each grid dipole. For STWV-DA and STWV-SVB-SCCD, the source regions identified from the different estimated spatial mixing vectors are combined such that the sizes of the estimated patches are comparable.
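To illustrate the signal generation described in the data generation paragraph above, the sketch below produces correlated per-dipole copies of a reference spike with Gaussian amplitude jitter and lognormal delay jitter. It is entirely our own illustration; the parameter values are hypothetical and not those used in the thesis.

```python
import numpy as np

def jitter_patch_signals(s, n_dipoles=100, amp_std=0.05, delay_sigma=0.5, seed=0):
    # s: reference spike time course of length T. Returns (n_dipoles x T) copies
    # with Gaussian amplitude jitter and lognormal delay jitter (in samples),
    # mimicking imperfectly synchronized dipoles within one patch.
    rng = np.random.default_rng(seed)
    T = s.shape[0]
    out = np.empty((n_dipoles, T))
    for d in range(n_dipoles):
        amp = 1.0 + amp_std * rng.standard_normal()
        delay = int(round(rng.lognormal(mean=0.0, sigma=delay_sigma)))
        out[d] = amp * np.roll(s, delay % T)
    return out
```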
Evaluation criteria  The source localization results are evaluated using the DLE and the ROC curves as defined in Section 4.4.2.1, which are determined for 27 realizations with different spike signals and background activity. Due to the number of outlying DLE values observed in particular in the case of patches with mixed signals, we distinguish in this section between non-aberrant and aberrant source localization results. If the DLE associated with an estimated distributed source is small and comparable to the values obtained for the majority of tested realizations for all algorithms, the source estimate is said to be non-aberrant. If, by contrast, the DLE is considerably higher than for most other source estimates, the source localization result is considered aberrant. For each scenario, the threshold value DLE_a between aberrant and non-aberrant results is fixed after visual analysis of all obtained DLE scores. For each algorithm, the DLE and ROC curves are evaluated by taking only the realizations for which the method yields non-aberrant results into account. Furthermore, we analyze the probability of non-aberrant results, which is computed as

p(DLE < DLE_a) = L̃ / L    (4.38)

where L̃ corresponds to the number of realizations for which the distributed source estimate is associated with a DLE that is smaller than the threshold value DLE_a, and L = 27 corresponds to the total number of tested realizations.
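In code, this probability is just the empirical fraction below (a trivial helper of our own):

```python
import numpy as np

def prob_non_aberrant(dle_values, dle_a):
    # Empirical estimate of p(DLE < DLE_a) in Eq. (4.38): the fraction of the
    # L tested realizations whose DLE falls below the aberrance threshold
    return float(np.mean(np.asarray(dle_values, dtype=float) < dle_a))

# Example: prob_non_aberrant([0.8, 1.1, 6.5, 0.9], dle_a=3.0) returns 0.75
```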
4.6.2
Results
The DLE values and probabilities of non-aberrant results obtained with the tested methods for the two considered scenarios are displayed in Tables 4.5 and 4.6, respectively. Figures 4.15 and 4.16 show the corresponding ROC curves and an example of the recovered source distributions for one non-aberrant realization.
Figure 4.15: Performance of STWV-SVB-SCCD in comparison to STWV-DA and SVB-SCCD in terms of ROC curves and illustration of source localization results for one realization for the scenario SupOcc & InfPa & InfFr.

Figure 4.16: Performance of STWV-SVB-SCCD in comparison to STWV-DA and SVB-SCCD in terms of ROC curves and illustration of source localization results for one realization for the scenario OccTe & MidTe & InfFr.

DLE                SupOcc & InfPa & InfFr    OccTe & MidTe & InfFr
patch signals      different    mixed        different    mixed
STWV-DA            0.70         4.75         1.07         2.53
SVB-SCCD           0.79         1.41         2.15         2.30
STWV-SVB-SCCD      0.78         3.32         2.19         1.58

Table 4.5: Performance of source imaging algorithms in terms of DLE (in cm) for the two considered scenarios for three patches emitting signals with different morphologies ("different patch signals") and two patches associated with propagated spike signals and a third patch attributed a signal of different morphology ("mixed patch signals"). The smallest DLE obtained for each scenario is marked in red.

p(DLE < DLE_a)     SupOcc & InfPa & InfFr    OccTe & MidTe & InfFr
patch signals      different    mixed        different    mixed
STWV-DA            0.96         0.14         0.81         0.96
SVB-SCCD           0.96         0.78         1            0.48
STWV-SVB-SCCD      0.96         0.67         0.81         0.96

Table 4.6: Performance of source imaging algorithms in terms of the probability of non-aberrant results p(DLE < DLE_a) for the two considered scenarios for three patches emitting signals with different morphologies ("different patch signals") and two patches associated with propagated spike signals and a third patch attributed a signal of different morphology ("mixed patch signals"). The highest probability of non-aberrant results obtained for each scenario is marked in red.

In both scenarios, considering the DLE values and ROC curves, STWV-DA outperforms SVB-SCCD and STWV-SVB-SCCD in the case of three patches with different signal morphologies, but displays the worst performance for two patches with delayed signals and one patch with a different signal. In the latter case, STWV-DA localizes only two patches at the correct position while identifying a third, spurious patch for scenario SupOcc & InfPa & InfFr and no third patch at all for scenario OccTe & MidTe & InfFr. Furthermore, the probability of non-aberrant results is very small for STWV-DA in the case of scenario SupOcc & InfPa & InfFr because it often identifies only one patch. By contrast, SVB-SCCD and STWV-SVB-SCCD permit the localization of all three patches even in the presence of spike propagation and therefore attain smaller DLEs in this case. For scenario SupOcc & InfPa & InfFr, STWV-SVB-SCCD yields an intermediate performance compared to STWV-DA and SVB-SCCD in terms of DLE. The ROC curves are comparable to SVB-SCCD, but slightly worse. For scenario OccTe & MidTe & InfFr, considering the ROC curves, SVB-SCCD clearly outperforms STWV-SVB-SCCD. Nevertheless, STWV-SVB-SCCD leads to the smallest DLE for this scenario when considering mixed signals. This might be explained by the fact that SVB-SCCD leads to a high number of aberrant results and also achieves high DLE values for several other realizations (not considered as aberrant), showing that this method does not yield stable results in this case. In general, the source regions localized by STWV-SVB-SCCD are very similar to the patches identified by SVB-SCCD, and the probability of non-aberrant results is identical to that of STWV-DA except for scenario SupOcc & InfPa & InfFr with mixed signals. In most cases, SVB-SCCD displays the highest probability of non-aberrant results. Considering the overall performance, STWV-SVB-SCCD generally does not yield results as accurate as those of SVB-SCCD because of perturbations in the spatial mixing vectors that are estimated by the STWV tensor method.
4.6.3
Conclusions
In this section, we have analyzed the combination of tensor decomposition and variation-based source localization to overcome the difficulties encountered with the STWV-DA and SVB-SCCD source localization methods. However, we have observed that, contrary to our expectations, the STWV-SVB-SCCD algorithm generally does not yield improved source estimates. In fact, this method is affected by inaccuracies in the spatial mixing vector estimates due to the approximations made by the STWV tensor analysis, which deteriorate the source localization results. If the STWV analysis works well, it has only a slight influence on the source regions determined by STWV-SVB-SCCD, which are comparable to those obtained by SVB-SCCD in this case, but the STWV preprocessing step may seriously compromise the source localization in cases where the tensor-based source separation scheme fails. These results imply that the tensor-based preprocessing is
only useful if the algorithm employed for distributed source localization requires separated patches, i.e., if it cannot localize several patches simultaneously, as is the case for DA.
4.7
Comparative performance study
Over the last two decades, the brain source imaging problem has been widely studied [166, 6, 89, 167, 132], giving rise to an impressive number of methods using different priors and methodological approaches (see Section 4.3). Even though several studies [168, 155, 89, 169] have aimed at comparing different source imaging algorithms, a comprehensive, in-depth study of these methods that takes into account recent advances in this field is still missing. The objective of this section thus consists in conducting a thorough comparison of different brain source imaging approaches, including the new algorithms that have been presented in Sections 4.4 and 4.5. To this end, we evaluate the performance of eight representative methods, namely sLORETA, cLORETA, MCE, MxNE, Champagne, 4-ExSo-MUSIC, STWV-DA, and SVB-SCCD, with respect to their computational complexity and the accuracy of their distributed source estimates based on realistic computer simulations.
4.7.1
Analysis of the computational complexity
In order to determine the computational complexity of the selected source imaging methods, we subsequently compute the number of FLOPs (in terms of real-valued multiplications) that are required for the completion of each algorithm. The obtained results are summarized in Table 4.7. For descriptions of the analyzed algorithms, the reader is referred to Sections 4.3, 4.4, and 4.5. Moreover, a summary of the computational complexities associated with several basic operations can be found in Table 3.2.

sLORETA  The first step of the sLORETA algorithm consists in computing the generalized inverse matrix K, which requires O((3/2)N²D + (7/6)N³) real-valued multiplications. Once the matrix K has been computed, it can be applied to a data vector to compute the sLORETA coefficients, which leads to a computational complexity of ND operations per application. As the sLORETA algorithm works on a sample-by-sample basis, it is applied T times for a data interval of length T to reconstruct the source activity at each time point. Alternatively, if T < D, one could first multiply the second term of the generalized inverse matrix, (G̃G̃^T + λI_N)^{-1}, with the data matrix (in practice, this is accomplished by linear system solving based on the Cholesky decomposition to avoid the computation of the inverse matrix) and then execute the multiplication with the first term, G̃^T, leading to an overall computational cost of O((1/2)N²D + (1/6)N³ + N²T + NDT). If one is only interested in source localization and not in the source dynamics, sLORETA can be applied to a small number of time points corresponding to the maxima of the signal within the considered time interval. For example, for the identification of the epileptogenic zones, sLORETA can be applied to the maxima of the interictal spikes.

cLORETA  As for sLORETA, the most important operation of the cLORETA algorithm is the computation of the generalized inverse matrix. Exploiting the sparsity of the Laplacian matrix, the computation of the matrix (WL^T LW)^{-1} and multiplications with this matrix can be neglected. Therefore, cLORETA has approximately the same computational complexity as sLORETA.
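The Cholesky-based shortcut mentioned above can be sketched as follows (our own illustration, omitting sLORETA's standardization step; only standard NumPy/SciPy routines are used):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def apply_regularized_inverse(G, X, lam):
    # Evaluates G^T (G G^T + lam*I_N)^{-1} X without forming the inverse:
    # factorize the small N x N matrix once (Cholesky), solve for the data
    # matrix, then multiply by G^T, as in the cost analysis above.
    N = G.shape[0]
    c, low = cho_factor(G @ G.T + lam * np.eye(N))
    return G.T @ cho_solve((c, low), X)
```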
MCE  The computational complexity of the MCE algorithm implemented with FISTA corresponds to the number of real-valued multiplications that are required for solving an L1-norm regularized minimum norm problem with FISTA and amounts to O(2NDTI) FLOPs. Here, I denotes the number of iterations that are performed by FISTA.

MxNE  The MxNE algorithm employs a set of temporal basis functions that are computed from an SVD. As the number of time samples is generally larger than the number of sensors, it is computationally more efficient to determine the R dominant right singular vectors based on the left singular vectors derived from an EVD of the matrix XX^T. In this case, the computational complexity associated with the computation of the temporal basis functions is equal to O((1/2)N²T + (3/4)N³ + NRT) FLOPs. Moreover, NRT real-valued multiplications are required to obtain the modified data matrix X′. The L1,2-norm regularized MxNE cost function is then minimized using the FISTA algorithm, which leads to O(2NDRI) real-valued multiplications. Finally, the estimation of the signal matrix S̃ necessitates DTR FLOPs.

Champagne  The first step of the Champagne algorithm consists in computing a temporary data matrix, which corresponds to a square root of the data covariance matrix. Assuming that the data matrix has full rank N, this is associated with a computational cost of O((1/2)N²T + (4/3)N³) real-valued multiplications. The update of the three matrices Y_d, Z_d, and C_{s,d}, which are at the heart of the algorithm, requires O(2N²DI + (7/6)N³I) FLOPs, where I denotes the number of performed iterations. Eventually, O((1/6)N³ + N²(D + T) + DNT) real-valued multiplications are needed for the determination of the signal matrix based on the estimated source covariance matrix.

4-ExSo-MUSIC  The 4-ExSo-MUSIC algorithm is based on the quadricovariance matrix of the data, which contains N(N+1)(N+2)(N+3)/4! = O((1/24)N⁴) different FO cumulants. If the Leonov-Shiryaev formula for stationary signals is applied, the cumulants are estimated from the estimates of the SO and FO moments (cf. Appendix A.1). For zero-mean signals, the latter are efficiently obtained by first computing the products x_{k,t} x_{l,t} for all (1/2)N(N+1) possible pairs (k, l) of sensor indices, k, l = 1, ..., N, and all time samples, t = 1, ..., T, as suggested in [33]. This corresponds to (1/2)N(N+1)T multiplications. The SO moments are then simply approximated by averaging these products over the time samples. Moreover, the estimation of each cumulant necessitates the estimation of a FO moment, which requires T additional multiplications of two previously computed products, (x_{k,t} x_{l,t})(x_{m,t} x_{n,t}), k, l, m, n = 1, ..., N. As the computational complexity of the estimation of the SO moments is negligible compared to the estimation of the FO moments, this procedure leads to an overall cost of approximately O((1/24)N⁴T) real-valued multiplications to estimate the quadricovariance matrix. The second step of the 4-ExSo-MUSIC algorithm consists in computing an EVD of the estimated quadricovariance matrix. This requires O((1/6)N⁶) multiplications. The source localization method employed in the 4-ExSo-MUSIC algorithm is based on a dictionary that contains all considered disks and their associated spatial mixing vectors, which needs to be determined. This may require the computation of the distances between all grid dipoles, which is associated with a computational cost of (3/2)(D² − D) multiplications. However, if additional information provided by a mesh is available, the disks can be determined by searching for adjacent dipoles within the grid. In this case,
the main effort is accomplished based on sorting algorithms and only a small number of, e.g., (9/2)DDmax multiplications is required to ultimately determine the distances of a few neighboring dipoles for each grid point. Then, for each disk, the FO spatial mixing vector h ⊗ h is computed, which leads to a computational complexity of O((1/2)N²DDmax) multiplications for Dmax considered disks with different sizes varying from 1 to Dmax dipoles per disk for each of the D grid dipoles. Finally, the computation of the 4-ExSo-MUSIC spectrum amounts to O((1/4)N³DDmax + (3/4)N²DDmax) multiplications.

Disk algorithm  Similarly to 4-ExSo-MUSIC, the first step of the disk algorithm consists in generating a dictionary of disks (cf. the previous paragraph for the associated computational complexity). Once the ensemble of disks is identified, the application of the disk algorithm necessitates 2PNDDmax multiplications (for P separately treated source spatial mixing vectors) to compute the normalized inner product metric between the estimated spatial mixing vector and the spatial mixing vector of each of the DDmax disks.

SVB-SCCD  The SVB-SCCD algorithm is based on alternating updates of five matrices, among which the update of the signal matrix is the most complex. This step involves the multiplication of a temporary data matrix with the matrix [G̃^T G̃ + ρ(T^T T + I_D)]^{-1}. As this matrix is very large, the following inversion lemma is employed:

(G̃^T G̃ + M)^{-1} = M^{-1} − M^{-1} G̃^T (I_N + G̃ M^{-1} G̃^T)^{-1} G̃ M^{-1}

where M = ρ(T^T T + I_D). Since the matrix M is sparse, it can be computed at a low computational cost of O((3/2)D²) and inverted using only O(4D²) real-valued multiplications. Compared to the multiplication G̃M^{-1}G̃^T, which necessitates O(N²D) FLOPs, the computation and inversion of the matrix M can be neglected. The computation of the inverse (I_N + G̃M^{-1}G̃^T)^{-1} is avoided by resorting to the Cholesky decomposition, which requires O((1/6)N³) real-valued multiplications. These operations are performed only once at the beginning of the algorithm. Furthermore, at each iteration, the update of the signal matrix requires O((3ND + N²)TI) FLOPs. Compared to this high computational cost, the number of real-valued multiplications that are performed for the updates of the four other matrices is negligible; a small code sketch of this inversion-lemma strategy is given below.

The formulas which have been derived above for the computational complexity of the various source imaging algorithms are difficult to interpret due to their dependence on a large number of parameters. To give an idea of the computational complexity that can be encountered in practice for the compared methods, we compute the number of FLOPs for fixed values of the parameters that are also employed for the computer simulations in Section 4.7.2 and vary only the parameters associated with the data, namely the number of sensors and the number of time samples. For the analysis of the computational complexity as a function of the number of sensors, we consider that the algorithms are applied for distributed source localization only, which means that only one time sample, corresponding to the maximum of the epileptic spike, is employed for the source imaging methods that work on a sample-by-sample basis, i.e., for sLORETA, cLORETA, MCE, and SVB-SCCD, whereas T = 200 time samples are exploited for the other methods. By contrast, when determining the influence of the number of time samples on the computational cost of the algorithms, we change this parameter for all methods.
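Returning to the SVB-SCCD update, the inversion-lemma strategy sketched above can be written as follows (our own illustration; T_mat is assumed to be a scipy.sparse matrix and G a dense array):

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla
from scipy.linalg import cho_factor, cho_solve

def solve_s_update(G, T_mat, rho, B):
    # Solves (G^T G + M) S = B with M = rho*(T^T T + I_D) via the inversion
    # lemma: only the sparse M and a small N x N matrix are factorized,
    # never the dense D x D matrix.
    N, D = G.shape
    M = (rho * (T_mat.T @ T_mat + sp.eye(D))).tocsc()
    lu = spla.splu(M)                      # sparse factorization of M
    Minv_Gt = lu.solve(G.T)                # M^{-1} G^T  (D x N; M is symmetric)
    c, low = cho_factor(np.eye(N) + G @ Minv_Gt)   # Cholesky of I_N + G M^{-1} G^T
    Minv_B = lu.solve(B)
    return Minv_B - Minv_Gt @ cho_solve((c, low), G @ Minv_B)
```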
Method          Number of real-valued multiplications
sLORETA         (1/2)N²D + (1/6)N³ + NDT + min(N²D, N²T)
cLORETA         (1/2)N²D + (1/6)N³ + NDT + min(N²D, N²T)
MCE             2NDTI
MxNE            (1/2)N²T + (3/4)N³ + 2NRT + 2NDRI + DTR
Champagne       (3/2)(N²T + N³) + N²D + DNT + (2N²D + (7/6)N³)I
4-ExSo-MUSIC    (1/24)N⁴T + (1/6)N⁶ + (1/4)DDmaxN³ + (5/4)N²DDmax
STWV-DA: STWV   2N′N̄TK + 56R(R − 1)Nsw(N′ + T + K) + 9IPN′TK
STWV-DA: DA     2PDDmaxN
SVB-SCCD        DN² + (1/6)N³ + (3ND + N²)TI

Table 4.7: Computational complexity in terms of real-valued multiplications for different distributed source localization algorithms.

As Figure 4.17 (left) shows, the 4-ExSo-MUSIC algorithm clearly exhibits the highest computational cost, which increases rapidly with the number of sensors because the cost of the EVD of the cumulant matrix depends only on the number of sensors and dominates the cost of all other operations. The second highest computational complexity is attained by the Champagne algorithm. For large numbers of sensors, this method also requires a considerably increased number of FLOPs compared to all other source imaging algorithms except for 4-ExSo-MUSIC. SVB-SCCD requires approximately the same number of FLOPs as Champagne for small numbers of sensors, but contrary to Champagne, the computational complexity of SVB-SCCD stays approximately constant for all considered numbers of sensors. MCE, MxNE, and STWV-DA exhibit smaller computational costs for small numbers of sensors and approach the SVB-SCCD curve for large numbers of sensors. The cLORETA and sLORETA algorithms have the lowest computational complexities.

In Figure 4.17 (right), it can be seen that the computational complexities of 4-ExSo-MUSIC, Champagne, and MxNE hardly change with increasing numbers of time samples. 4-ExSo-MUSIC requires about 100 times as many FLOPs as MxNE, whereas Champagne necessitates 10 times as many real-valued multiplications as MxNE. For small numbers of time samples, cLORETA, sLORETA, and STWV-DA have a smaller computational complexity than MxNE, the minimal computational cost being associated with cLORETA and sLORETA and corresponding to only one tenth of the number of FLOPs required by MxNE. For more than 1000 time samples, the computational complexities of sLORETA, cLORETA, and STWV-DA become comparable, increasing linearly with the number of time samples and exceeding the computational cost of MxNE. The number of FLOPs that have to be executed for SVB-SCCD and MCE also increases linearly with the number of time samples, but is about 100 times higher than for sLORETA, cLORETA, and STWV-DA, surpassing the computational costs of Champagne and 4-ExSo-MUSIC for more than 100 and 1000 time samples, respectively.
100 and 1000 time samples, respectively. 14
13
10
10
12
10 12
10
complexity
complexity
11
10
10
10
10
10
9
10
8
10
8
10
7
6
10
10 50
100
150
200
250
0
10
number of sensors
1
10
2
10
3
10
4
10
number of time samples
Figure 4.17: Computational complexity in terms of real-valued multiplications of eight source imaging algorithms as a function of the number of sensors for T = 200 time samples (left) and as a function of the number of time samples for N = 91 electrodes (right).
Finally, we also examine the CPU runtimes that are required for the application of the different source imaging methods. The algorithms are implemented in Matlab and run on a machine with a 2.7 GHz processor and 8 GB of RAM. The average runtimes recorded for the considered three-patch scenarios (cf. Section 4.7.2.4) are listed in Table 4.8. Note that the runtime of 4-ExSo-MUSIC cannot be compared to that of the other algorithms because this method is partly implemented in C. Note also that we have determined the CPU times under the assumption that all parameters have been fixed previously and we have not considered operations that need to be performed only once before the actual application of the methods such as the determination of the Laplacian matrix for cLORETA or the construction of the dictionary of disks for STWV-DA and 4-ExSo-MUSIC.
Method          CPU runtime in s
sLORETA         0.18
cLORETA         0.03
SVB-SCCD        120
MxNE            5.9
MCE             2.2
Champagne       233
STWV-DA         156
4-ExSo-MUSIC    58
Table 4.8: Average CPU runtime of the different source imaging algorithms for the considered three-patch scenarios.
The CPU runtimes confirm our observations with respect to the computational complexities of the methods. Champagne displays the longest CPU time, followed by STWV-DA and SVB-SCCD, whereas sLORETA and especially cLORETA are faster by a factor of about 1000.
4.7.2
Evaluation of the source imaging results
In this section, we evaluate the source localization results obtained with the different source imaging algorithms for a number of realistic simulation scenarios.

4.7.2.1
Simulation setup
Data generation  We simulate EEG recordings for an electrode cap comprising N = 91 electrodes and data intervals with a length of T = 200 time samples (at a sampling frequency of 256 Hz). The data are generated by D = 19626 source dipoles emitting either interictal epileptiform spike-like signals, originating from the considered patches, or background activity. Both types of signals are obtained using the neuronal population model described in Section 2.5.2.

Tested methods  All source imaging algorithms are applied to spatially prewhitened data, except for Champagne, which exploits the knowledge of the noise covariance instead. To improve the SNR, we average the data of 10 spikes. The STWV-DA, 4-ExSo-MUSIC, sLORETA, cLORETA, and SVB-SCCD algorithms are implemented as described in Sections 4.4.2.1 and 4.5.4.1. Note, however, that we increase the size of the disks employed for STWV-DA and 4-ExSo-MUSIC to Dmax = 400 dipoles in the case of the large patch considered in Section 4.7.2.3. Like STWV-DA, MxNE and Champagne are applied to 200 time samples of epileptic activity, whereas MCE, which does not exploit temporal information, is applied to the time sample which corresponds to the maximum of the spike. For MCE, as the orientations of the dipoles of the source space are fixed, we do not need to determine them using a preliminary MNE solution. Instead, we directly minimize the cost function L(s̃) = ||x − G̃s̃||²_2 + λ||s̃||_1 using FISTA, which is more efficient than linear programming and also permits to retrieve the sign of the source signals. For MxNE, we also use FISTA to solve the optimization problem. In both cases, the regularization parameter is chosen according to the level of sparsity that we aim to achieve.

Evaluation criteria  The obtained source localization results are evaluated using the ROC curves and the DLE (see Section 4.4.2.1), which are averaged over 50 realizations of EEG data with different epileptiform signals and background activity. Furthermore, we plot the source distributions that are estimated by the tested source imaging methods for one realization in comparison to the original patches.
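For reference, a compact FISTA iteration for this L1-regularized cost function looks as follows (our own illustrative implementation, not the thesis code):

```python
import numpy as np

def fista_l1(G, x, lam, n_iter=200):
    # FISTA for min_s 0.5*||x - G s||_2^2 + lam*||s||_1 (MCE with fixed orientations)
    L = np.linalg.norm(G, 2) ** 2          # Lipschitz constant of the gradient
    s = np.zeros(G.shape[1]); y = s.copy(); t = 1.0
    for _ in range(n_iter):
        grad = G.T @ (G @ y - x)           # gradient of the data-fit term at y
        s_new = y - grad / L
        s_new = np.sign(s_new) * np.maximum(np.abs(s_new) - lam / L, 0.0)  # soft-threshold
        t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2
        y = s_new + ((t - 1) / t_new) * (s_new - s)   # momentum step
        s, t = s_new, t_new
    return s
```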
4.7.2.2
Influence of the patch position
Since superficial sources exhibit more focal distributions of the electric potential than deep sources, this may favor the source localization procedure. Furthermore, the signals emanating from deep sources lead to smaller amplitudes at the sensor level than those of superficial sources and therefore correspond to smaller SNRs for the same background activity. It is thus important to determine the influence of the patch position on the localization accuracy of the different source localization methods. To this end, in the first simulation, we consider 8 different patches: InfFr, InfPa, Cing, SupOcc, PreC, BasTe, MidTe, and Hipp (see Figure 2.6 for an illustration of the patches). We evaluate the performance of the distributed source localization algorithms based on the ROC curves, which are plotted in Figure 4.18, and the DLE, displayed in Table 4.9. The patches that
are recovered by the algorithm which yields the highest TPF at an FPF of 0.2 % are shown in Figure 4.19 in comparison to the original patches.

patch           InfFr   InfPa   Cing    SupOcc  PreC    BasTe   MidTe   Hipp
sLORETA         3.61    2.41    3.28    2.05    4.07    2.26    6.28    3.11
cLORETA         1.43    2.72    24.8    1.28    3.17    3.56    7.43    11.1
SVB-SCCD        1.06    0.13    6.44    0.37    3.33    1.41    0.74    6.22
MxNE            8.18    6.18    27.9    4.53    10.6    9.89    8.53    18.4
MCE             3.99    3.19    22.5    3.07    4.86    6.45    5.20    14.7
Champagne       4.60    4.03    2.95    2.47    2.95    2.29    4.37    4.09
STWV-DA         0.44    0.95    1.70    0.57    1.84    2.29    1.85    1.86
4-ExSo-MUSIC    0.44    0.96    1.70    0.58    1.84    2.39    1.83    1.83
Table 4.9: Performance of source imaging algorithms in terms of DLE (in cm) for the 8 different single-patch scenarios. The smallest obtained DLE value for each patch is marked in red.

Comparing the results achieved by the different source imaging algorithms, we note that for 4 of the 8 examined patches (InfPa, SupOcc, BasTe, and MidTe), the best results in terms of both DLE and ROC are achieved by SVB-SCCD, whereas STWV-DA and 4-ExSo-MUSIC yield the best performance for the 4 other scenarios (InfFr, Cing, PreC, and Hipp). These two methods lead to almost identical source localization results because in the case of single patches, there is no source separation to be performed and the patch estimates are mostly influenced by the employed dictionary of disks, which is the same for STWV-DA and 4-ExSo-MUSIC. Among the focal source localization methods, sLORETA and cLORETA generally yield better results in terms of both DLE and ROC than MCE and MxNE. For all tested scenarios, MCE outperformed MxNE. This is surprising, as one could have expected that the exploitation of temporal information would lead to more robust source estimates. However, the reconstructed source distributions for MCE and MxNE are very similar, and this result could be related to our somewhat abusive use of the sparse source estimation techniques to recover patches of larger extent. Finally, the Champagne algorithm leads to intermediate DLEs because it always identifies a small number of dipoles at the correct source position. However, as the ROC curves show, this method is the least suited to recover the spatial extent of the patches since the TPF remains below 40% even for high FPFs of up to 6%. This is due to the fact that Champagne recovers very sparse source estimates.

Concerning the patch location, the simulation results show that the examined source localization methods generally yield good results for superficial patches (InfFr, InfPa, SupOcc, and MidTe), for which the DLE of the best method is below 1 cm. This is also reflected by the good correspondence of original and estimated patches (see Figure 4.19), in particular for patches InfFr, InfPa, and SupOcc. The only exception is patch PreC, for which the obtained DLE and ROC curves are slightly worse and for which the source region recovered by SVB-SCCD is much larger than the original patch. Furthermore, the source imaging algorithms exhibit some difficulties in accurately recovering deep patches such as BasTe and Hipp, as well as the patch Cing, located between the two hemispheres. In this case, the ROC curves are less steep than for superficial patches such as InfPa. Nevertheless, the DLE obtained for the best method is still smaller than 2 cm.
Figure 4.18: Illustration of the ROC curves for the different single patch scenarios.
Figure 4.19: Illustration of the recovered patches for the different single-patch scenarios. Triangles belonging to the original patches are marked in red, correctly identified triangles are dark red, and erroneously identified triangles are yellow.

4.7.2.3
Influence of the patch size
In order to determine the influence of the source extent on the performance of the different source imaging algorithms, we vary the size of the patch SupFr and consider three patches composed of 10, 100, and 400 grid dipoles, corresponding to areas of approximately 50, 500, and 2000 mm². The ROC curves and the estimated source reconstructions for these three scenarios are shown in Figures 4.20 and 4.21, respectively, and Table 4.10 summarizes the obtained DLE values.

patch size      10 dipoles  100 dipoles  400 dipoles
sLORETA         3.33        1.87         5.71
cLORETA         4.72        1.36         0.71
SVB-SCCD        3.54        0.33         1.00
MxNE            9.99        6.17         6.83
MCE             5.04        3.11         4.80
Champagne       4.01        2.52         7.06
STWV-DA         0.48        0.41         1.14
4-ExSo-MUSIC    0.79        0.45         1.15
Table 4.10: Performance of source imaging algorithms in terms of DLE (in cm) depending on the size of the patch SupFr. The smallest obtained DLE value for each patch size is marked in red.

As can be seen in Figure 4.20, the performance of sLORETA, Champagne, MCE, and MxNE in terms of ROC decreases as the patch area increases. This is due to the fact that these methods, and in particular the algorithms exploiting sparsity, are conceived for focal sources and are not well suited to recover the spatial extent of the sources.
Figure 4.20: ROC curves obtained for the different source imaging methods for different sizes of the patch SupFr: (top left) 10 dipoles, (top right) 100 dipoles, (bottom) 400 dipoles.

4-ExSo-MUSIC, STWV-DA, and SVB-SCCD, on the other hand, have been developed to localize distributed sources and can therefore accurately recover sources of larger size. The hypothesis of spatial smoothness exploited by cLORETA also favors the reconstruction of extended sources, leading to the best performance among all tested methods for the largest considered patch. In terms of source reconstruction quality based on the DLE, all methods except for cLORETA yield the highest reconstruction accuracy for a patch of medium size. For cLORETA, on the other hand, we observe a continually decreasing DLE with increasing patch size. Comparing the original and estimated source configurations for the smallest patch, we notice that the best source reconstructions are achieved by STWV-DA and 4-ExSo-MUSIC, which is also reflected by the DLE scores. The source dipoles of maximal amplitude identified by Champagne, MCE, and MxNE are close to the true patch, but not exactly at the correct position. Furthermore, SVB-SCCD overestimates the size of the patch, recovering a much larger source region, as is the case for sLORETA and cLORETA. In fact, these three methods yield nearly the same estimated source distributions for the patches of small and medium sizes. These source reconstructions are better suited for the medium-sized patch, which leads to smaller DLEs in this case. In particular, SVB-SCCD achieves the best DLE of all methods for the medium-sized patch. Note that sLORETA, cLORETA, MCE, and MxNE also identify several dipoles on the right hemisphere, unlike the distributed source localization algorithms SVB-SCCD, STWV-DA, and 4-ExSo-MUSIC. Nevertheless, for cLORETA, the amplitudes of the contralateral source dipoles are rather small. For the largest patch, cLORETA and SVB-SCCD outperform STWV-DA and 4-ExSo-MUSIC as well as the other tested methods. While the source distribution recovered by sLORETA also indicates a patch of larger extent, the sparse estimates obtained by MCE, MxNE, and, in particular, Champagne are not suited to determine the shape of the patch and would rather suggest the existence of several distinct point sources.

Figure 4.21: Illustration of the recovered source distributions for the different patch sizes.

4.7.2.4
Influence of the patch number
In practice, one is often confronted with measurements that originate from several quasisimultaneous active source regions within the brain. In this section, we analyze the ability of the different source imaging algorithms to identify two or three patches that are involved in epileptic spike propagation. To this end, we first consider two scenarios with two patches of medium distance, composed of the patch InfFr combined once with the patch InfPa and once with the patch MidTe. In both cases, we use the same signals for the dipoles of both patches except for a 3 to 4 sample delay from one patch to another. The obtained source localization performance in terms of ROC and DLE is shown in Figure 4.22 and Table 4.11, respectively, and the reconstructed sources are displayed in Figure 4.23. 100
100
80
80
60
20
1
2
3
FPF
4
5
TPF
TPF 40
0 0
60
sLORETA cLORETA SVB−SCCD MxNE−FISTA Champagne MCE−FISTA STWV−DA 4−ExSo−MUSIC
40
20
6
0 0
1
2
3
4
5
6
FPF
Figure 4.22: ROC curves obtained for the different source imaging methods for the scenario InfFr & InfPa (left) and the scenario InfFr & MidTe (right). The patches are all located on the lateral aspect of the left hemisphere, but the patch MidTe is partly located in a sulcus, leading to weaker surface signals than the patches InfFr and InfPa, which are mostly on a gyral convexity. This has an immediate influence on the performance of all source imaging algorithms, except for SVB-SCCD and Champagne, leading to a decrease in TPF for a given FPF and an increase in DLE for the scenario InfFr & MidTe compared to the scenario InfFr & InfPa. When comparing the estimated source distributions, this difference in performance becomes apparent through the fact that for the scenario InfFr & InfPa, all source imaging algorithms exhibit high dipole amplitudes for dipoles belonging to each of the true patches. For the scenario InfFr & MidTe on the other hand, the weak patch is less visible on the estimated source distributions of cLORETA, MCE, and MxNE, slightly better visible on the sLORETA solution, but completely missing for 4-ExSo-MUSIC. SVB-SCCD and STWV-DA both recover the patch MidTe, but with smaller amplitude in case of SVB-SCCD and smaller size for STWV-DA.
Comparative performance study
115
Figure 4.23: Illustration of the original patches and the recovered source distributions for the two considered two-patch scenarios.
116
CHAPTER 4. DISTRIBUTED SOURCE LOCALIZATION
scenario sLORETA cLORETA SVB-SCCD MxNE MCE Champagne STWV-DA 4-ExSo-MUSIC
InfFr & InfPa 2.92 1.97 0.59 5.19 3.63 4.03 0.59 0.62
InfFr & MidTe 6.11 2.86 0.93 6.26 4.52 4.34 1.17 14.9
InfFr & InfPa & SupOcc 15.3 2.84 0.65 4.59 4.01 3.92 1.30 5.94
InfFr & MidTe & OccTe 5.45 3.43 1.42 5.61 4.54 4.83 7.68 4.33
Table 4.11: Performance of source imaging algorithms in terms of DLE for the considered scenarios with two and three patches. The smallest obtained DLE value for each scenario is marked in red. Next, we determine the performance of the source imaging algorithms when adding the patch SupOcc to the previously considered scenario InfFr & InfPa and the patch OccTe to the scenario InfFr & MidTe, thus further complicating the correct recovery of the active grid dipoles. In both cases, we consider a delay of 3 to 4 samples between the signals of the first two patches and a delay of 5 to 6 samples between the signals of the first and third patch. For the resulting ROC curves and source reconstructions, see Figures 4.24 and 4.25. The DLEs are tabulated in Table 4.11. 100
100
80
80
60
20
1
2
3
FPF
4
5
TPF
TPF 40
0 0
60
sLORETA cLORETA SVB−SCCD MxNE−FISTA Champagne MCE−FISTA STWV−DA 4−ExSo−MUSIC
40
20
6
0 0
1
2
3
4
5
6
FPF
Figure 4.24: ROC curves obtained for the different source imaging methods for the scenario InfFr & InfPa & SupOcc (left) and the scenario InfFr & MidTe & OccTe (right). The best results are achieved by SVB-SCCD, followed by STWV-DA and cLORETA. STWV-DA leads to good results for the scenario InfFr & InfPa & SupOcc, but underestimates the extent of the identified active source region that can be associated to the patch MidTe for the scenario InfFr & MidTe & OccTe. Furthermore, for this scenario, STWV-DA also identifies several spurious source regions of small size located between the patches MidTe and InfFr for some realizations. The cLORETA solution exhibits dipoles with high amplitudes at all three patch positions, but makes it difficult to determine the patch shape. The same problem is exacerbated for MxNE, MCE, and Champagne, which additionally identify a number of scattered dipoles around the three foci of brain activity. This explains the reduced performance compared to SVB-SCCD. 4-ExSo-MUSIC yields
Comparative performance study
117
Figure 4.25: Illustration of the original patches and the recovered source distributions for the two considered three-patch scenarios.
118
CHAPTER 4. DISTRIBUTED SOURCE LOCALIZATION
similar source localization results to STWV-DA for the scenario InfFr & InfPa & SupOcc, but overestimates the size of the patch InfFr while determining a smaller source region for the patch SupOcc. Furthermore, the estimated source region that can be associated with patch InfPa is not at the correct position, leading to a higher DLE and worse ROC curve than STWV-DA. In the case of scenario InfFr & MidTe & OccTe, 4-ExSo-MUSIC recovers only one of the two patches located in the temporal lobe, namely the patch OccTe. Finally, sLORETA leads to blurred source reconstructions and does not permit to distinguish between the close patches InfPa and SupOcc or MidTe and OccTe.
4.7.3
Discussion
To summarize the findings of the simulation study, we can say that sLORETA, Champagne, MCE, and MxNE recover well the source positions, though not their spatial extents as they are conceived for focal sources, while 4-ExSo-MUSIC, STWV-DA, and SVBSCCD also permit to obtain an accurate estimate of the source size. cLORETA leads to intermediate results as the identified dipoles with high amplitudes and smooth source distributions correspond well to the patches, but make it difficult to delineate the source regions. Nevertheless, the spatial smoothness prior has proven to be especially effective for large patch sizes where cLORETA achieves the best performance in terms of DLE. Furthermore, from a computational point of view, cLORETA is the most efficient of all tested source imaging methods. Among the methods for focal source reconstruction, Champagne leads to the sparsest source distributions, identifying only a very small number of dipoles. This result might be explained by the fact that Champagne is based on the assumption that all grid dipoles emit independent source activities, which is violated by the patches. Moreover, in the Champagne algorithm, there is no parameter that can be adjusted to obtain different levels of sparsity, contrary to MCE and MxNE where this is achieved by varying the regularization parameter. At the same time, this is also an important advantage of Champagne because in practice, the tuning of parameters is tedious and time-consuming, and even though Champagne does not identify the source extents, it still permits to accurately localize the foci of the source activity. Combined with an adequate scheme for distributed source localization, Champagne could thus become a powerful tool for source imaging. Nevertheless, the self-reliance of Champagne comes at an increased computational cost compared to most other source imaging approaches. While STWV-DA and 4-ExSo-MUSIC lead to similar patch estimates for the single patch scenarios and in some cases also for multipatch scenarios, all in all, STWV-DA outperforms 4-ExSo-MUSIC as it leads to better source estimates in the presence of multiple patches, where 4-ExSo-MUSIC does not localize all patches or leads to erroneous patch estimates. Furthermore, STWV-DA has a lower computational complexity than 4-ExSo-MUSIC, in particular for large numbers of sensors. Compared to SVB-SCCD, STWV-DA only leads to better results for the smallest considered patch. Otherwise, SVB-SCCD yields slightly better source estimates than STWV-DA, which is due to its greater flexibility in recovering the patch shape. Because of the use of a dictionary of disks, STWV-DA tends to recover circular-shaped source regions. Even though this has not been explicitly shown in the above simulations, we noticed that most of the methods except for STWV-DA require prewhitening of the data or a good estimate of the noise covariance matrix (in case of Champagne) in order to yield accurate
Comparative performance study
119
results. On the one hand, this can be explained by the assumption of spatially white Gaussian noise made by some approaches, while on the other hand, the prewhitening also leads to a decorrelation of the lead field vectors and therefore to a better conditioning of the lead field matrix, which consequently facilitates the correct identification of active grid dipoles. Finally, we have observed that the source imaging methods generally work well for superficial patches, but have difficulties in identifying deep patches and mesial sources, located close to the midline. On the whole, for the situations adressed in our simulation study, STWV-DA and SVB-SCCD seem to be the most promising algorithms for distributed source localization, both in terms of robustness and source reconstruction quality.
Chapter 5 Summary and conclusions In this thesis, we have aimed at identifying the positions and spatial extents of epileptogenic zones based on EEG recordings. In particular, we have addressed the issue of localizing simultaneously active brain regions with highly correlated time signals arising from the propagation of epileptic phenomena. To deal with this challenging problem, we have proposed a multistage approach consisting of three steps: the extraction of the epileptic signals from the noisy data, the separation of correlated sources, and the localization of distributed sources. In Section 5.1, we summarize the techniques that have been developed in this thesis for the individual steps and illustrate their combination on a simulation example. Our conclusions are then summed up in Section 5.2. Finally, we suggest some directions for future work in Section 5.3.
5.1
Summary and illustration of the complete data analysis process
To extract the epileptic spikes from the EEG data which are corrupted by artifacts, we have exploited the different physiological origins of the sources, assuming that the latter are statistically independent, and have considered methods based on ICA. More particularly, we have developed a new, semi-algebraic deflation algorithm, called P-SAUD, that relies on the autocorrelation of the signals to extract the epileptic components using a small number of deflation steps. The denoised EEG recordings are then reconstructed from the identified epileptic components. As several distributed source localization techniques show difficulties in localizing simultaneously active patches, in particular in the case of correlated activities, we have then explored the use of tensor methods based on the CP decomposition to separate several potentially correlated sources. Here, we have concentrated on transform-based tensor methods, which construct a data tensor using a time-frequency transform, leading to the classical STF method, or a space-wave-vector transform, resulting in the new STWV approach. These techniques aim at extracting a spatial mixing vector and a signal vector for each distributed source. Contrary to previous studies of tensor-based approaches [69, 70, 71, 72, 73], which have mainly focused on source separation and equivalent dipole localization, we have then gone a step further and have employed the results of the tensor-based preprocessing step for the localization of distributed sources. In this context, an important contribution of this thesis 121
122
CHAPTER 5. SUMMARY AND CONCLUSIONS
consists in the proposition of the DA algorithm, which permits us to accurately localize distributed sources based on the estimated spatial mixing vectors. This method is based on a parameterization of the distributed source, similar to [152, 15], but utilizes a different metric to identify the elements of the source space that best describe the measurements. Furthermore, we have examined other distributed source localization approaches that have been proposed in the literature, providing an overview of the different types of a priori information that have been exploited to solve the ill-posed linear inverse problem of the brain. These hypotheses can be broadly distinguished into constraints that are imposed on the spatial distribution of the sources and constraints that concern their temporal distribution. We have then presented a taxonomy of brain source imaging methods by classifying existing algorithms based on methodological considerations and exploited a priori information. Finally, we have proposed several improvements of the VB-SCCD source imaging method, which is based on structured sparsity, to develop an efficient algorithm for the simultaneous localization of multiple patches. The resulting algorithm is termed SVB-SCCD. To illustrate the combination of the three data processing steps using the P-SAUD, STWV, and DA methods, we consider a simulation example. As for all simulations conducted in this thesis, we employ a realistic head model, distributed sources that are characterized by patches with highly-correlated, physiologically plausible epileptiform spike signals, and artifacts recorded during an EEG session, resulting in a realistic setting for the performance evaluation of the tested methods. We simulate 32 s of EEG data for a 91 channel system using two patches, located in the inferior frontal and the inferior parietal lobes of the left hemisphere and emitting propagated interictal epileptiform spike signals with a delay of about 16–18 ms between the two patches. The data are corrupted by muscle artifacts according to an SNR of -15 dB. Figure 5.1 (left) shows an excerpt of the noisy EEG recordings for 32 of the 91 electrodes and the original patches. To extract the epileptic activity from the artifacts, we first apply the P-SAUD algorithm to the raw EEG measurements, followed by the STWV analysis and DA for source separation and localization, respectively. The EEG data containing only the epileptic spikes that have been reconstructed from the P-SAUD results and the patches localized by STWV-DA based on the denoised data are shown in Figure 5.1 (right). Comparing the original and estimated source regions, we note the good performance of the proposed three-step procedure.
5.2
Conclusions
The P-SAUD algorithm has been shown to extract the epileptic signals with the same accuracy as conventional ICA methods, but at a considerably reduced computational complexity. To achieve this, we have combined the strengths of three classical BSS methods to derive a new, efficient, semi-algebraic deflation procedure that resorts to the autocorrelation of the signals to determine the order of the identified ICA components. The exploitation of the temporal structure of the data obviates the need of reference signals, which have previously been used in the ICA-R approach [35, 36, 34, 37, 38, 39] to extract only the signals of interest, but can be difficult to determine in practice. For the considered source dynamics of propagated epileptic spike signals, the STF tensor method displayed very poor results. Indeed, this technique does not permit to separate the highly correlated sources because it exploits discrepancies in the time-frequency
Conclusions
123
reconstructed data
POZ P10 P9 FT10 FT9 CP6 CP5 FC5 FC6 CP2 CP1 FC2 FC1 PZ CZ FZ T6 T5 T4 T3 F8 F7 O2 O1 P4 P3 C4 C3 F4 F3 FP2 FP1 0
channels
channels
noisy data
1
2
3
4
time in s
5
6
7
8
POZ P10 P9 FT10 FT9 CP6 CP5 FC5 FC6 CP2 CP1 FC2 FC1 PZ CZ FZ T6 T5 T4 T3 F8 F7 O2 O1 P4 P3 C4 C3 F4 F3 FP2 FP1 0
1
2
3
4
5
6
7
8
time in s
Figure 5.1: Left: Excerpt of the noisy EEG data for 32 out of 91 channels (top) generated for two patches (bottom) with epileptiform spike-like signals, a sampling rate of 256 Hz and an SNR of -15 dB. Right: Extract of EEG data reconstructed from epileptic signals identified by P-SAUD (top) and patches localized by STWV-DA (bottom) based on the denoised data.
domain to discriminate between the sources, and the differences between the time and frequency characteristics of the different patches are negligible in the context of propagation phenomena. By contrast, the STWV analysis makes use of the space-wave-vector contents of the sources to separate them, which enables the method to cope with highly correlated source signals as long as the sources are not completely coherent and sufficiently distant to yield different space and wave vector characteristics. This explains the good performance of the STWV-based source localization approach which has been observed for a number
124
CHAPTER 5. SUMMARY AND CONCLUSIONS
of scenarios and especially in the case of several simultaneously active superficial patches. Nevertheless, the STWV analysis fails in certain cases such as for deep sources even if there is neither noise nor artifacts and all signals within a patch are identical. This is due to discrepancies between the structure of the EEG data and the employed trilinear tensor model. To clarify in which cases the tensor-based methods can be applied successfully for the separation of the sources, we have conducted a theoretical analysis, revealing that among the correlation between space, time, and wave vector characteristics of different sources, the source strength is also decisive for the separability of the sources. Even though the EEG preprocessing can sometimes be accomplished by ICA or tensor methods only, these two approaches are mostly complementary. Inherently, ICA is not suited for the separation of correlated sources and leads to poor results for overlapping epileptic spike signals, whereas tensor decomposition methods are not as robust to artifacts as ICA. Therefore, to achieve good preprocessing results for EEG data that frequently contain correlated sources due to propagation phenomena and are generally corrupted by artifacts of high amplitudes, the two preprocessing methods should be combined. Nevertheless, the preprocessed data should be analyzed carefully as errors in the denoising and source separation steps may accumulate. The performance study of different source imaging methods has shown that the tested algorithms can be distinguished into methods that permit only to determine the source positions, and algorithms that provide also an indication of the spatial extents of the sources. In the context of epileptic source localization, we are particularly interested in the second type of approach and have proposed two new algorithms. The STWV-DA algorithm, which localizes distributed sources based on the results of the tensor decomposition, has been shown to yield good results if the sources have been accurately separated in the preprocessing step, outperforming other source imaging methods in this case. In particular, even though prewhitening improves the source localization results obtained by STWV-DA, this method also leads to good results when applied to the raw EEG data, which is not always the case for other techniques such as 4-ExSo-MUSIC. This is of great interest because prewhitening requires knowledge of the noise covariance matrix, which is unknown and generally difficult to estimate in practice. Another advantage of STWV-DA over 4-ExSo-MUSIC consists in its reduced computational complexity. With an increasingly widespread use of high-resolution EEG, this is an important point to avoid unacceptably high runtimes. However, STWV-DA is not suitable for the simultaneous localization of several patches, i.e., for the localization of distributed sources that cannot be separated due to their almost coherent source signals. In this case, the SVB-SCCD algorithm should be used. This approach has the additional advantage that it is flexible with respect to the shape of the patch, whereas methods such as STWV-DA and 4-ExSo-MUSIC that employ a dictionary of potential distributed source regions tend to recover patches with similar shapes as the dictionary elements. The SVB-SCCD method also allows for an exploitation of temporal structure, which leads to more robust source estimates in difficult cases comprising several close patches. 
Finally, we have analyzed the combination of the STWV tensor analysis and SVB-SCCD for multipatch scenarios where only some of the patches can be separated, but the resulting STWV-SVB-SCCD algorithm performed slightly worse than SVB-SCCD. This shows that the tensor-based preprocessing approach for source separation is only effective when combined with source localization methods that have difficulties in localizing several patches simultaneously, such as DA.
Perspectives
5.3
125
Perspectives
Based on the obtained results, we can identify several promising directions for future research. First of all, it would be interesting to apply the P-SAUD algorithm for artifact removal from high-resolution EEG measurements. Due to the large numbers of electrodes, in this case, one can expect a particularly high gain on computational complexity compared to conventional ICA methods for joint source extraction. Furthermore, one could consider employing the P-SAUD algorithm for the denoising of epileptic seizure data where finding a compromise between the COM2 and CCA solutions based on the penalized contrast function might lead to improved performances. Concerning the separation of correlated sources based on the CP decomposition, we have only treated STF and STWV tensors in this thesis. As discussed above, the trilinear approximation is not always justified in this case. To overcome this problem, on the one hand, one could explore the use of tensors with different dimensions, such as STR data, which might better fit the structure of the CP model. On the other hand, one could also employ other tensor decompositions with a less rigid structure that better reflect the actual structure of the data. Another track for future research in brain source imaging consists in further exploring different combinations of a priori information, for example by merging successful strategies of different recently established distributed source localization approaches, such as tensorbased techniques, extended source scanning methods, or Bayesian approaches and sparsity. Additionally, one could try to further improve the results of the current source imaging algorithms by applying depth bias compensation techniques. It would also be desirable to develop methods for the automatic thresholding of the reconstructed source distributions in order to infer the spatial extent of the source regions from continuous source imaging solutions as obtained by regularized least squares approaches. The methods discussed in this thesis could also be applied to MEG data and one could further pursue the exploitation of combined EEG/MEG recordings, which we have discussed only briefly for the tensor-based preprocessing. Finally, it would be important to perform more evaluations on clinical EEG data for which a strong hypothesis on the epileptogenic source regions is available to confirm the good functioning of the proposed algorithms in real-world settings.
Appendices
127
Appendix A Higher order statistics In this appendix, we provide some background information on higher order cumulants, which are exploited by several ICA algorithms discussed in Section 3.1 and by the 2qExSo-MUSIC algorithm for distributed source localization (see Section 4.3.3.2).
A.1
Cumulants
The definition of cumulants is based on the second characteristic function [170] T ∗ wH x + wΨ x = ln E exp j Ψ 2
(
∗ Ψ(wΨ , wΨ )
!)!
(A.1)
of the random vector x. The cumulants then correspond to the coefficients of the Taylor ∗ ). For zero-mean stationary processes, one can exploit the relationship series of Ψ(wΨ , wΨ between cumulants and moments, given by the Leonov-Shiryaev formula [154], which permits to compute the cumulant of order n = 2q from the moments of order less than or equal to 2q. For a real-valued random vector x ∈ RN , this relationship is given by: cum(xi1 , . . . , xin ) =
n X
(−1)p−1 (p − 1)! E
xij
p=1
E
Y
j∈S2
xij
...E
Y
j∈Sp
xij
·
j∈S1
Y
.
(A.2)
Here, S1 , S2 , . . . Sp describe all possible partitions of (xi1 , . . . , xin ) into p sets and i1 , . . . , in ∈ {1, . . . , N }. The moments are computed for a zero lag between the arguments. In practice, the moments are estimated from a finite number of T time samples and the expected values are replaced by the estimates T 1X E{xi1 , xi2 , . . . , xin } ≈ xi [t]xi2 [t] . . . xin [t]. T t=1 1
(A.3)
In particular, at orders 2 and 4, the following equalities hold: cum(xi1 , xi2 ) = E{xi1 xi2 } cum(xi1 , xi2 , xi3 , xi4 ) = E{xi1 xi2 xi3 xi4 } − E{xi1 xi2 }E{xi3 xi4 } − E{xi1 xi3 }E{xi2 xi4 } − E{xi1 xi4 }E{xi2 xi3 }. 129
(A.4) (A.5)
130
A.2
APPENDIX A. HIGHER ORDER STATISTICS
Properties of cumulants
In the following, the most important properties of cumulants [171] are summarized: 1. Non-Gaussianity: The cumulants of order greater than 2 are 0 for Gaussian variables. 2. Sum of independent variables: The cumulant of the sum of two independent random vectors x = [x1 , . . . , xn ]T and y = [y1 , . . . , yn ]T equals the sum of the two cumulants: cum(x1 + y1 , . . . , xn + yn ) = cum(x1 , . . . , xn ) + cum(y1 , . . . , yn ). 3. Even distribution: For symmetrically distributed random variables (even probability density function), the cumulants of odd order n ≥ 3 are 0. 4. Symmetry: Cumulants are symmetric in their arguments, which means that they are invariant to a permutation of their arguments. 5. Subset of independent variables: If a subset of the arguments is independent of the rest, the cumulant is 0. 6. Multilinearity: If the random vector y is obtained through a linear transformation of the random vector x such that y = Qx, the cumulants of y are given by: P P cum(yk1 , . . . , ykn ) = i1 · · · in Qk1 ,i1 · · · Qkn ,in cum(xi1 , . . . , xin ). In consequence, higher order cumulants are insensitive to Gaussian noise, which constitutes an advantage of the use of cumulants over the use of higher order moments. It also means that BSS techniques based on higher order cumulants cannot be applied to Gaussian sources. Since the signals are often symmetrically distributed and due to property 3, many algorithms exploit only higher order statistics of even order.
Appendix B Semi-algebraic contrast optimization In this appendix, we describe how the COM2 and P-SAUD contrast functions, corresponding to equations (3.24) and (3.26), respectively, can be optimized with respect to the parameter θ that characterizes the Givens rotation.
B.1
Parameterized contrast functions
Both contrasts make use of the FO cumulants C4,sp and C4,sk of the signals {sp [t]} and {sk [t]}. Based on equation (3.25) and the multilinearity property of cumulants (cf. Appendix A.2), these cumulants can be computed from the cumulants of the data {zP −p+1 [t]} and {zk [t]} as follows: α4 θ4 + α3 θ3 + α2 θ2 + α1 θ + α0 (1 + θ2 )2 α0 θ4 − α1 θ3 + α2 θ2 − α3 θ + α4 = (1 + θ2 )2
C4,sp =
(B.1)
C4,sk
(B.2)
where we write θ for θp,k to simplify the notation and where α0 α1 α2 α3 α4
= cum(zP −p+1 , zP −p+1 , zP −p+1 , zP −p+1 ) = C4,zP −p+1 = 4 · cum(zP −p+1 , zP −p+1 , zP −p+1 , zk ) = 6 · cum(zP −p+1 , zP −p+1 , zk , zk ) = 4 · cum(zP −p+1 , zk , zk , zk ) = cum(zk , zk , zk , zk ) = C4,zk .
Here, cum(x1 , x2 , x3 , x4 ) denotes the FO cross-cumulant of the four random variables x1 , x2 , x3 , and x4 . The cumulants can be estimated from an estimate of the moments according to the Leonov-Shiryaev formula (see Appendix A.1, [154]). The contrasts are thus based on rational functions in θ. Due to the addition of the covariance-based penalization term in the P-SAUD algorithm, the optimization of the concrete contrast functions of COM2 and P-SAUD will be discussed separately in the following. 131
132
B.2
APPENDIX B. SEMI-ALGEBRAIC CONTRAST OPTIMIZATION
Optimization of the COM2 cost function
Due to the symmetry of the coefficients of the rational function that corresponds to the COM2 contrast, the degree of the numerator and denominator polynomials can be reduced from 8 to 4 by introducing the variable ξ = θ − 1θ . The identification of the parameter θ that maximizes the COM2 contrast function then reduces to the following steps (see [33] for more details): 1. determination of the optimal parameter ξ by rooting the fourth degree polynomial with coefficients b0 b1 b2 b3 b4
= 4(3(α3 α4 − α0 α1 ) + α1 α4 − α0 α3 + α2 α3 − α1 α2 ) = 12(α02 + α42 ) − 8(α0 α4 + α1 α3 ) − 4α22 = 3(α3 α4 − α0 α1 − α1 α4 + α0 α3 − α2 α3 + α1 α2 ) = 4(α02 + α42 ) − α12 − α32 − 2(α0 α2 + α2 α4 ) = α0 α1 − α3 α4
and identifying the root ξ for which the contrast is maximal, 2. identification of the parameter θ ∈] − 1, 1] for which θ2 − ξθ − 1 = 0.
B.3
Optimization of the P-SAUD cost function
For the P-SAUD contrast, we need to consider the penalization term in addition to the FO cumulants. The r-th covariance penalty that is included in the penalization term is given by β2,r θ2 + β1,r θ + β0,r cov(sp [t], sp [t + τr ]) = (B.3) 1 + θ2 with β0,r = cov(zP −p+1 [t], zP −p+1 [t + τr ]) β1,r = cov(zP −p+1 [t], zk [t + τr ]) + cov(zk [t], zP −p+1 [t + τr ]) β2,r = cov(zk [t], zk [t + τr ]). The penalization term in the contrast (3.26) can then be written as a fourth degree polynomial in θ with coefficients γ0 =
R X
2 λr β0,r
r=1 R X
γ1 = 2 γ2 =
r=1 R X
2 λr 2β0,r β2,r + β1,r
r=1 R X
γ3 = 2 γ4 =
λr β0,r β1,r
r=1 R X
λr β1,r β2,r
2 λr β2,r .
r=1
Optimization of the P-SAUD cost function
133
Inserting (B.1), (B.2), and (B.3) into the contrast (3.26), one obtains a rational function with polynomials of degree 8 in the numerator and denominator. Contrary to the COM2 contrast, we cannot reduce the degree of these polynomials because the coefficients are not symmetric. Therefore, in order to find the optima of the P-SAUD contrast, we have to determine the zeros of an 8-th degree polynomial with coefficients δ0 =2(α0 α1 − α3 α4 ) + γ1 δ1 =2(α32 + α12 + 2α2 α4 + 2α0 α2 − 2γ0 + γ2 ) − 8(α42 + α02 ) δ2 =6(α0 α3 + α1 α2 − α1 α4 − α2 α3 ) − 14(α0 α1 − α3 α4 ) − γ1 + 3γ3 δ3 = − 6(α12 + α32 ) + 16(α0 α4 + α1 α3 ) − 12(α0 α2 + α2 α4 ) + 8α22 + 2γ2 − 8γ0 + 4γ4 δ4 =20(α1 α4 + α2 α3 − α0 α3 − α1 α2 ) + 5(γ3 − γ1 ) δ5 =6(α12 + α32 ) − 16(α0 α4 + α1 α3 ) + 12(α0 α2 + α2 α4 ) − 2γ2 − 4γ0 + 8γ4 δ6 =6(α0 α3 + α1 α2 − α1 α4 − α2 α3 ) − 14(α0 α1 − α3 α4 ) + γ3 − 3γ1 δ7 = − 2(α32 + α12 + 2α2 α4 + 2α0 α2 − 2γ4 + γ2 ) + 8(α42 + α02 ) δ8 =2(α0 α1 − α3 α4 ) − γ3 . The optimal rotation angle θ corresponds to the real-valued zero for which the rational function attains its highest maximum.
Appendix C Tensor-based preprocessing of combined EEG/MEG data In Section 3.2 of this thesis, we analyze tensor-based methods for the separation of potentially correlated epileptic sources as a preprocessing step for source localization in the context of EEG data analysis. Nevertheless, the proposed methods can also be applied to MEG recordings. Since EEG and MEG measurements yield complementary information about the underlying sources and can be acquired simultaneously, several authors have examined the combination of EEG and MEG data in brain source localization algorithms [172, 173, 174, 175, 176, 177, 178], reporting a gain on accuracy of the source position estimates compared to the results of each modality alone. This raises the question whether combining EEG and MEG data in the preprocessing step also permits us to achieve an enhanced performance. This issue is addressed in this appendix, where we analyze the combination of EEG and MEG in tensor-based preprocessing using the STF and STWV tensor analyses. This leads us to the problem of computing CP decompositions of third order tensors that have one or two loading matrices in common. In order to improve the estimates of the loading matrices of each of these tensors, we propose to apply a joint CP (JCP) decomposition that simultaneously computes the loading matrices that are identical for all tensors. This approach is comparable to the JCP decompositions proposed in the context of symmetric [179] or hermitian [180] tensors. We then present the modifications of the ALS and DIAG algorithms (see Section 3.2.2.3, [76, 84]) that have to be carried out to this end. Finally, we examine the accuracy of the EEG and MEG spatial mixing matrix estimates that are obtained by applying the JCP decomposition to STF and STWV data, in comparison to the results achieved for a separate treatment of both modalities by means of simulations.
C.1
EEG/MEG data model
Both EEG and MEG data are measured as a function of sensor position and time and can be stored into two real-valued data matrices, Xeeg and Xmeg of sizes Neeg × Teeg and Nmeg × Tmeg , respectively, where Neeg and Nmeg denote the number of EEG and MEG sensors and Teeg and Tmeg indicate the number of time samples recorded with the EEG and MEG systems. Since the EEG and MEG measurements are generated by the same sources and are generally sampled synchronously, the data can be stored into the larger EEG/MEG data 135
136
APPENDIX C. TENSOR-BASED PREPROCESSING OF EEG/MEG DATA
matrix, which can, according to [91], be modelled as: "
Xmeeg
#
"
#
"
#
H(e) Neeg Xeeg eeg S(e) + . = = N Xmeg H(e) meg meg
(C.1)
Nmeg ×P Neeg ×P are the EEG and MEG spatial mixing and H(e) Here, H(e) meg ∈ R eeg ∈ R matrices, which are specific to the head model and the source positions, S(e) ∈ RP ×T with T = Teeg = Tmeg denotes the signal matrix that contains the temporal activities of P sources, and Neeg and Nmeg are the EEG and MEG noise matrices.
C.2
STF and STWV analyses for EEG and MEG
Since the signal matrices of EEG and MEG are identical, in case of the STF analysis, the Wavelet transform can be computed simultaneously for both EEG and MEG by applying it to the extended data matrix Xmeeg , yielding the tensor W = [Weeg t1 Wmeg ], where t1 denotes a concatenation along the first dimension. The tensors Weeg and Wmeg can be decomposed using the CP model and exhibit two different loading matrices Aeeg and Ameg for the spatial characteristics of EEG and MEG. However, the two loading matrices B and D that contain the time and frequency characteristics are the same for EEG and MEG due to the identical signal matrices. Therefore, in order to improve the results of the CP decomposition, we propose to exploit this property by jointly decomposing the tensors using the JCP decomposition for two common loading matrices asqdescribed in Weeg 0 = w Neeg ||W Section C.3.1. To this end, the tensors should be normalized to Weeg , eeg ||F q
Wmeg 0 = Nmeg ||W , where w is a weighting factor to account for different separability Wmeg meg ||F and SNR of EEG and MEG. On the other hand, the STWV tensors need to be constructed separately for both modalities because EEG and MEG yield physically different measurements and their spatial mixing matrices differ. In the next step of the STWV analysis, the resulting tensors Feeg and Fmeg can be decomposed individually using the CP model. However, in this case, we do not exploit the fact that both modalities are generated by the same sources. In fact, due to the identical EEG and MEG signal matrices, the loading matrices Beeg and Bmeg containing the temporal characteristics of the tensors Feeg and Fmeg should be equal, whereas the loading matrices associated to the space and wave vector characteristics generally differ. To achieve q this, we propose to apply q a JCP decomposition to the Feeg meg 0 0 normalized tensors Feeg = w Neeg ||Feeg ||F and Fmeg = Nmeg ||FFmeg that enforces one ||F loading matrix (in this case the matrix B) to be the same for both tensor decompositions while allowing different loading matrices A and D for the two tensors. This technique is described in detail in Section C.3.2.
C.3
Joint CP decomposition
In this section, we describe some algorithms for the Joint CP (JCP) decomposition of third order tensors that have one or two loading matrices in common.
Joint CP decomposition
C.3.1
137
Two common loading matrices
Consider M tensors Wm ∈ CIm ×J×K , m = 1, . . . , M , (M = 2 for the STF analysis of EEG/MEG data), with common loading matrices B and D in the second and third mode and M different loading matrices Am in the first mode. These tensors can be stacked into a larger tensor W = W1 t1 . . .t1 WM of size (I1 +. . .+IM )×J ×K. The JCP decomposition of the M tensors can then be achieved by solely decomposing the tensor W using any existing algorithm to fit the CP model (see, e.g., Section 3.2.2.3). This yields common T T loading matrices B and D for all tensors and the loading matrix A = [AT 1 , . . . , AM ] that contains all individual mode-1 loading matrices (cf. Figure C.1).
Figure C.1: JCP of M tensors with two common loading matrices.
C.3.2
One common loading matrix
In the following, we consider M tensors Fm ∈ CIm ×J×Km , m = 1, . . . , M , (M = 2 for the STWV analysis of EEG/MEG data). We assume that these tensors have one common loading matrix B in the second mode and different loading matrices Am and Dm in the first and third mode, respectively. The objective consists in decomposing the tensors simultaneously such that the loading matrix B is computed jointly for all tensors while allowing different loading matrices for each tensor in the first and third mode (cf. Figure C.2). Subsequently, we present modified versions of the ALS and DIAG algorithms described in Section 3.2.2.3 that meet these specifications. C.3.2.1
ALS
Starting from an initial setting, the classical ALS algorithm [76] iteratively updates the three loading matrices Am , Bm , and Dm of the tensor Fm , m = 1, . . . , M , until convergence or a certain number of iterations is reached:
+
+
+
Am = [Fm ](1) (Dm Bm )T
Bm = [Fm ](2) (Dm Am )T Dm = [Fm ](3) (Bm Am )T
(C.2) (C.3) .
(C.4)
138
APPENDIX C. TENSOR-BASED PREPROCESSING OF EEG/MEG DATA
Figure C.2: JCP of the MEG and EEG STWV tensors with one common loading matrix that corresponds to the signal matrix. A joint update of the loading matrix B = Bm , m = 1, . . . , M , of the M tensors Fm can hence be incorporated by replacing equation (C.3) by B = [F ](2) D+
(C.5) h
i
where D = [D1 A1 , . . . , DM AM ]T and [F ](2) = [F1 ](2) , . . . , [FM ](2) . The other loading matrices are updated separately according to equations (C.2) and (C.4). C.3.2.2
DIAG
The DIAG algorithm is described in Section 3.2.2.3 and can be used for the individual decomposition of all tensors Fm based on a JEVD and several SVDs. To enforce an identical loading matrix B for all tensors, we can only consider joint diagonalization problems for the mode-2 projection matrix T2 , which has to be equal for all tensors. Consequently, the mode-2 subspace U2 , which is identical for all tensors, needs to be computed jointly to prevent different representations: h
i
[F1 ](2) , . . . , [FM ](2) = U2 Σ2 V2H .
[s]
(C.6)
The matrix U2 then corresponds to the columns of U2 that are associated with the P largest singular values. We then extend the joint diagonalization problem for T2 by m ,lm ) simultaneously diagonalizing all matrices Ψ(k , m = 1, . . . , M , in the following way to m combine all tensors: m ,lm ) m ,lm ) Ψ(k = T2 · Λ(k · T−1 m m 2 .
Results
139
Once an estimate of the matrix T2 has been obtained, the matrix B can be computed. The other loading matrices can be obtained either from rank-1 decompositions of the [s] H matrices T−1 2 (U2 ) [Fm ][2] according to the DIAG algorithm [84] or by recovering, for m ,lm ) each tensor Fm , one loading matrix from the entries of the diagonal matrices Λ(k m and computing the third loading matrix by ALS from the tensor Fm and the two already known loading matrices. In fact, both latter strategies could be combined in order to jointly use the different estimates of the same loading matrix [181].
C.4
Results
In order to analyze the gain in accuracy of the spatial mixing matrix estimates that can be achieved by combining EEG and MEG data in the tensor-based preprocessing using the JCP decomposition, we perform some computer simulations. Contrary to the simulations conducted in the main part of this thesis, where we employ a realistic head model, distributed sources, and physiologically plausible interictal spike signals as well as background activity generated by a neuronal population model, EEG and MEG measurements are here generated using a simplified model. More particularly, we simulate EEG and MEG data for two dipoles sources located at [6.33, −1.35, 4.70] cm and [6.33, 1.35, 4.70] cm with dipole moment vectors [0.98, −0.21, −0.07] cm and [0.98, 0.21, −0.07] cm and Neeg = 64 EEG electrodes as well as Nmeg = 148 MEG sensors (magnetometers) in a 3-shell spherical head model. The radii of the three shells representing the brain, the skull, and the scalp are 8 cm, 8.5 cm, and 9.2 cm with conductivities 3.3 · 10−3 S/cm, 8.25 · 10−5 S/cm, and 3.3 · 10−3 S/cm, respectively. The MEG sensors are positioned on a sphere with radius 10.5 cm. Epileptogenic signals are obtained using the Jansen model [182] with parameters v0 = [7, 6], Br = [0, 100, 50], Aa = [7, 6], Bb = [46.6, 40], and Cc = 135 for two sources and T = 100 time samples that are acquired at a sample rate of 125 Hz. White Gaussian noise is added to the EEG and MEG data according to a given SNR, which is assumed to be equal for EEG and MEG. The STF tensors are built by computing a wavelet transform of the EEG and MEG data using a real-valued Morlet wavelet with a center frequency of 35 Hz and F = 100 frequency samples. The STWV tensors are constructed separately for EEG and MEG by calculating a discrete local Fourier transform over space of data selected by a spherical Blackman window function. For both modalities, we consider K = 63 wave vector samples. Each of the resulting tensors is then decomposed individually using a slightly modified version of the DIAG algorithm, yielding the spatial mixing matrices of the separately treated data. Moreover, we compute the JCP decompositions of the EEG and MEG tensors using the same modified DIAG algorithm. For the present source configuration, we use a weighting factor of w = 4 for the EEG tensor, which is chosen because of the high associated core consistency (cf. Section 3.2.2.1, [76]) of the decomposed tensors. To ensure a real-valued loading matrix for the temporal characteristics of the STWV tensors, one iteration of ALS is applied after the DIAG decomposition. For all cases, we assume that the number of sources and thereby the number of CP components is known. In Figure C.3, we plot the average correlation coefficient of the original and estimated EEG (left) and MEG (right) spatial mixing vectors depending on the SNR. It can be seen that for the STWV analysis, the JCP decomposition of the EEG and MEG tensors generally results in better estimates for the spatial mixing matrices of both modalities,
APPENDIX C. TENSOR-BASED PREPROCESSING OF EEG/MEG DATA
1
spatial mixing vector correlation
spatial mixing vector correlation
140
0.95 0.9 0.85 0.8 0.75
STF meeg STF sep STWV meeg STWV sep
0.7 0.65 −8
−6
−4
−2
SNR in dB
0
2
4
1 0.9 0.8 0.7 0.6 0.5 −8
−6
−4
−2
0
2
4
SNR in dB
Figure C.3: Correlation coefficient of estimated and original EEG (top) and MEG (bottom) spatial mixing vectors depending on the SNR for separate and JCP decomposition of the STF and STWV tensors for two dipoles and 200 realizations. whereas in case of the STF technique, we observe only a small improvement of the MEG spatial mixing matrix estimate. This can be explained by the fact that for STWV preprocessing the JCP decomposition improves the temporal characteristics and therefore the signal matrix estimate and the spatial mixing matrix estimate whereas even though the JCP decomposition of the STF tensor improves the time and frequency characteristics, the spatial characteristics, which provide an estimate of the spatial mixing matrix, are only slightly amended. In case of the STWV analysis, the combination of EEG and MEG improves especially the MEG spatial mixing matrix because the electric potential is more focused than the magnetic field, which facilitates the source separation based on EEG measurements.
C.5
Conclusions
We have shown that, due to the approximately identical signal matrices for EEG and MEG, the two modalities can be combined in tensor-based STF and STWV preprocessing. This can be accomplished by simultaneously decomposing EEG and MEG data tensors using the JCP decomposition introduced in Section C.3, and described for the ALS and DIAG algorithms. As we have demonstrated by simulations, the application of the JCP decomposition to STWV EEG/MEG data leads to clearly improved spatial mixing matrix estimates, whereas in case of the STF analysis, only a slight amendment of the MEG spatial mixing matrix can be achieved.
Appendix D Construction of the STWV tensor In this appendix, we describe the construction of the STWV tensor, which is based on a local spatial Fourier transform of the EEG measurements, in more detail.1 As the space, time, and wave vector variables, r, t, and k, respectively, are sampled, in practice, one has to compute a discrete local Fourier transform. However, this leads to the difficulty that the electrode array used for EEG applications is highly non-uniform and the classical Discrete Fourier Transform (DFT) can thus not be applied. Instead, a non-uniform algorithm needs to be employed, which can – for one dimension – be derived from the continuous Fourier transform U (k) =
Z
∞
u(x) · e−j2πkx dx
(D.1)
−∞
as follows: In the first step, the transformed spatial frequency variable k is discretized as 1 is the distance between the N samples usual: k = µ·k0 where µ ∈ {0, . . . , N −1}, k0 = ∆x of the transformed variable k and ∆x is the interval which corresponds to the length of one period of the periodic continuation of the signal used for the DFT (see also Figure D.1). This means that the samples of k are equidistant. In a second step, the position variable x is discretized, transforming the integral into a sum. The difference to the DFT consists in the fact that samples of x cannot be written as a multiple of some stepsize x0 because these samples are not equidistant. Thus, the actual sampling points xi have to be maintained, leading to the following expression for the 1D non-uniform discrete Fourier transform: U (µk0 ) =
N −1 X
xi
u(xi ) · e−j2πµ ∆x .
(D.2)
i=0
Remark: Note that many definitions of the DFT also include a normalization factor 1 where N is the number of samples in (D.2) which is left out here. Due to the scaling N ambiguity of the CP model, the normalization factor is of no importance. In 3-dimensional space, adding a window function w with spatial intervals ∆x, ∆y, and ∆z in x-, y-, and z-direction, respectively, equation (D.2) takes the following form: U (µ1 k1 , µ2 k2 , µ3 k3 ) =
XXX m
n
xm
yn
zl
w(ri , xm , yn , zl ) · u(xm , yn , zl ) · e−j2π(µ1 ∆x +µ2 ∆y +µ3 ∆z )
l
(D.3) 1
Please note that the material presented in this appendix has previously been published in [183].
141
142
APPENDIX D. CONSTRUCTION OF THE STWV TENSOR
Figure D.1: Non-uniformly sampled data used for the NUDFT.
1 1 1 where k1 = ∆x , k2 = ∆y , and k3 = ∆z stand for the distances between samples of the transformed spatial frequency variables and µ1 , µ2 , and µ3 are the spatial frequency indices. The variables x, y, and z denote the coordinates of the sampling points and ri is the point at which the non-uniform DFT is to be computed. In the case of the STWV analysis, the elements of the transformed function U correspond to the elements of the tensor F in (3.45) and the function u is replaced by the electric potential data. Furthermore, comparing equations (D.3) and (3.44) one can see 1 [µ1 k1 , µ2 k2 , µ3 k3 ]T constitute the discretized forms of the that the vectors [x, y, z]T and 2π position vector r0 and the wave vector k. The STWV method requires the computation of a local Fourier transform for a number of sensor positions ri (it is not possible to determine the local Fourier tranform at each electrode location because the data available at boundary sensors is not sufficient to compute the tranform). The window function is used to select data within the neighborhood of a certain electrode to obtain a local transform (compare Figures D.2 and D.3). If the window is a sphere as assumed in the following, ∆x = ∆y = ∆z and k1 = k2 = k3 . Moreover, w(ri , xm , yn , zl ) = w(||ri − [xm , yn , zl ]T ||2 ).
143
Figure D.2: 3D Non-uniform DFT: Selection of data along the x-axis using a Blackman window function. Window 1 chooses data for the local Fourier transform in the neighborhood of the sensor located at xi . Window 2 is centered at xj and selects data for the local Fourier transform computed at electrode position xj . The spatial interval of the window function is denoted by the variable ∆x.
Figure D.3: Selection of data in a 2-dimensional domain using a window function that is a circle. The black points mark the electrode positions, at which the electric potential is measured. The window is centered at the sensor position for which the local Fourier transform is to be computed. Data outside of the window is not considered for the transform.
Appendix E Theoretical analysis of the trilinear tensor approximation In this appendix, we analyze the trilinear approximation that is made by the STF and STWV analyses and derive sufficient conditions under which these methods yield exact results. The structure of the STF and STWV tensors, which is given by T =
P X
up ◦ Mp
(E.1)
p=1
with U = [u1 , . . . , uP ] ∈ RN ×P , Mp ∈ CM ×J , and rank(Mp ) = Lp (see Section 3.2.4), corresponds to a block-decomposition into rank(1, Lp , Lp )-components, which is unique up to scale and permutation indeterminacies for rank-deficient matrices Mp under certain conditions on N , M , J, Lp and P [184]. However, in practice, the matrices Mp generally have full rank. In this case, it is not possible to identify up and Mp from the given tensor T . In order to restore identifiability, the matrices Mp need to be approximated by ˜ p such that one obtains a model of the form: ˜ p of lower rank L matrices M T˜ =
P X
˜ p. up ◦ M
(E.2)
p=1
˜ p = 1, p = 1, . . . , P , the tensor T˜ can then be decomposed using the CP decomFor L position, which permits to uniquely identify the vectors of interest up up to scale and permutation ambiguities. The objective thus consists in transforming equation (E.1) into equation (E.2). This can, under certain conditions, be achieved by a truncated SVD in one or several modes of the tensor T .1 This procedure can be viewed as some kind of PCA applied to the data in the transformed (time-frequency or space-wave-vector) domain. Remark 1: Please note that a truncated SVD of the first mode does not change the data because the mode-1 unfolding matrix inherently has rank P . 1
The truncated SVD in mode n is obtained by calculating the SVD of the mode-n unfolding matrix and setting all but the P greatest singular values to 0.
145
146
APPENDIX E. TRILINEAR TENSOR APPROXIMATION
Remark 2: Please note that in any case, the transformation of (E.1) into (E.2) and thereby a perfect recovery of the matrix U is only possible if none of the matrices Mp has full rank. Due to a loss of information when passing from (E.1) to (E.2), the matrices Mp cannot be recovered from T˜.
Sufficient conditions for perfect recovery In the following, we determine the conditions under which the SVD permits to obtain the ˜ p = 1. For simplicity, we limit the considerations in the remainder model of (E.2) for L of this appendix to the case of P = 2 components. Nevertheless, we believe that it is possible to extend our analysis to cases where P > 2. We define the following notation for the SVD of M1 and M2 : M1 = M2 =
L1 X l=1 L2 X
σl vl wlT
h
= v 1 V2 h
λl xl ylT = x1 X2
l=1
" i σ 1
0 "
i λ 1
0
#
iT 0 h w 1 W2 Σ2 #
iT 0 h y 1 Y2 Λ2
where V2 = [v2 , . . . , vL ], W2 = [w2 , . . . , wL ], X2 = [x2 , . . . , xL ], Y2 = [y2 , . . . , yL ], σ1 > σ2 > . . . > σL1 , and λ1 > λ2 > . . . > λL2 . Moreover, without loss of generality, we assume that ||u1 ||2 = ||u2 ||2 = 1. For simplicity, we subsequently base our considerations on the mode-2 unfolding of the tensor T . The same considerations can be conducted for the mode-3 unfolding in an analogous way. With the above definitions, and ⊗ denoting the Kronecker product, the mode-2 unfolding of the tensor T can be written as [T ](2) = σ1 v1 (w1 ⊗ u1 )T + λ1 x1 (y1 ⊗ u2 )T + V2 Σ2 (W2 ⊗ u1 )T + X2 Λ2 (Y2 ⊗ u2 )T
(E.3)
= σ1 v1 (w1 ⊗ u1 )T + λ1 x1 (y1 ⊗ u2 )T + R.
(E.4)
We would like to obtain the matrix [T˜](2) = σ1 v1 (w1 ⊗ u1 )T + λ1 x1 (y1 ⊗ u2 )T ,
(E.5)
which corresponds to the CP model T˜ = σ1 u1 ◦ v1 ◦ w1 + λ1 u2 ◦ x1 ◦ y1
(E.6)
and would therefore permit us to recover the vectors u1 and u2 from the mode-2 unfolding matrix [T ](2) by means of a truncated SVD. This is possible if (E.4) corresponds to the SVD of [T ](2) , which is generally not the case. Our objective now consists in finding conditions under which the SVD of [T ](2) takes the form of (E.4) and under which truncation of (E.4) leads to (E.5). Let us consider the case that v1T X2 = 0T , x1T V2 = 0T , w1T Y2 = 0T , and y1T W2 = T 0 . The columns of the matrices σ1 v1 (w1 ⊗ u1 )T and λ1 x1 (y1 ⊗ u2 )T are then pairwise orthogonal to the columns of R and the columns of the matrices σ1 (w1 ⊗ u1 )v1T and λ1 (y1 ⊗ u2 )x1T are pairwise orthogonal to the columns of RT . Due to the correlation between the vectors v1 and x1 , the vectors u1 and u2 , and the vectors w1 and y1 , the
147
two associated mode-2 vectors v ˜1 and x ˜1 that are obtained by the SVD correspond to a linear combination of v1 and x1 . Furthermore, the vectors v ˜1 and x ˜1 are associated with two new singular values, µ1 ≥ max (σ1 , λ1 ) and µ2 ≤ min (σ1 , λ1 ). These singular values can be computed as the square roots of the eigenvalues of h
ih
σ1 v1 (w1 ⊗ u1 )T + λ1 x1 (y1 ⊗ u2 )T σ1 v1 (w1 ⊗ u1 )T + λ1 x1 (y1 ⊗ u2 )T
iT
and are given by µ1,2 =
v u u σ 2 + λ2 + 2σ λ c c c t 1 1 1 1 2 3 1
2
s
±
(σ12 + λ21 + 2σ1 λ1 c1 c2 c3 )2 − σ12 λ21 (1 − c21 c23 )(1 − c22 ) 4
with c1 = u1T u2 , c2 = v1T x1 , and c3 = w1T y1 . If µ2 > 1 , where 1 is the highest singular value of R (which can, depending on the correlation of vectors of X2 and V2 or W2 and Y2 be greater than max(λ2 , σ2 )), the truncation of the SVD of [T ](2) yields the matrix [T˜](2) of equation (E.6) and permits therefore to identify u1 and u2 using the CP decomposition. In an analogous way, one can assess that for ν2 > ϕ1 , where ν2 =
v u u σ 2 + λ2 + 2σ λ c c c t 1 1 1 1 2 3 1
2
s
−
(σ12 + λ21 + 2σ1 λ1 c1 c2 c3 )2 − σ12 λ21 (1 − c21 c22 )(1 − c23 ) 4
and ϕ1 is the greatest singular value of W2 Σ2 (V2 ⊗u1 )T +Y2 Λ2 (X2 ⊗u2 )T , the truncated SVD in the third mode also leads to the CP model (E.6) when reshaping the resulting matrix [T˜](3) . Please note that in the special case where u1 and u2 are orthogonal, the columns of the matrices σ1 v1 (w1 ⊗ u1 )T and λ1 x1 (y1 ⊗ u2 )T in equation (E.4) are also pairwise orthogonal to the columns of R if only v1T X2 = 0T and x1T V2 = 0T . In this case, the conditions w1T Y2 = 0T and y1T W2 = 0T are thus not needed for the mode-2 truncated SVD to obtain a CP model that allows to recover u1 and u2 . Accordingly, for the mode-3 truncated SVD, only the conditions w1T Y2 = 0T and y1T W2 = 0T are required whereas the conditions v1T X2 = 0T and x1T V2 = 0T are unnecessary. As a consequence, it is possible to perfectly recover u1 and u2 based on a truncated SVD of the mode-2 unfolding if the conditions C1) v1T X2 = 0T , x1T V2 = 0T , w1T Y2 = 0T , y1T W2 = 0T , and µ2 > 1 or C2) v1T X2 = 0T , x1T V2 = 0T , u1T u2 = 0, and µ2 > 1 are fulfilled. Furthermore, perfect recovery of u1 and u2 based on a truncated SVD of the mode-3 unfolding is possible under the conditions C3) v1T X2 = 0T , x1T V2 = 0T , w1T Y2 = 0T , y1T W2 = 0T , and ν2 > ϕ1 or C4) w1T Y2 = 0T , y1T W2 = 0T , u1T u2 = 0, and ν2 > ϕ1 . Since the DIAG algorithm (see Section 3.2.2.3, [84, 44]) is based on a truncated SVD in one mode of the tensor, it permits to perfectly recover u1 and u2 if one of the above conditions holds. If DIAG is based on the mode-1 unfolding, the truncated SVD does not
change the unfolding matrix, which is already of rank $P = 2$, and therefore does not lead to a loss of information. But contrary to the unfolding matrix of a tensor that follows the CP model, the right signal subspace of the mode-1 unfolding matrix does not have a Kronecker structure. However, this assumed structure is exploited in the following steps of the DIAG algorithm (more particularly during the JET), and its absence generally causes errors on the estimated vectors $\hat{u}_1$ and $\hat{u}_2$. These errors are difficult to quantify because they depend on the iterative optimization of the JEVD algorithm, and their analysis is not treated in this thesis.

Remark 3: For CP decomposition methods that are based on the truncated Higher Order Singular Value Decomposition (HOSVD), such as the algorithm described in [185, 82], the following condition can be derived: the vectors $u_1$ and $u_2$ can be perfectly recovered based on the HOSVD of the tensor if
C5) $v_1^T X_2 = 0^T$, $x_1^T V_2 = 0^T$, $w_1^T Y_2 = 0^T$, $y_1^T W_2 = 0^T$, $\mu_2 > \gamma_1$, and $\nu_2 > \varphi_1$.
This condition for perfect recovery is more restrictive than those for the DIAG algorithm described above because the HOSVD performs a truncated SVD in all modes. Therefore, it requires that the perfect recovery conditions for each mode are fulfilled. The combination of conditions C1) and C3) or C2) and C4) leads to condition C5).
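The closed-form expressions for $\mu_1$ and $\mu_2$ above can be verified numerically by constructing the rank-2 matrix $\sigma_1 v_1 (w_1 \otimes u_1)^T + \lambda_1 x_1 (y_1 \otimes u_2)^T$ from unit vectors with prescribed inner products and comparing its singular values with the formula. A minimal NumPy sketch of this check (the dimensions, weights, and correlation coefficients are arbitrary choices):

import numpy as np

def unit_pair(n, c, rng):
    """Two unit-norm vectors of length n with inner product c."""
    q, _ = np.linalg.qr(rng.standard_normal((n, 2)))   # orthonormal pair
    return q[:, 0], c * q[:, 0] + np.sqrt(1.0 - c ** 2) * q[:, 1]

rng = np.random.default_rng(0)
sigma1, lambda1 = 3.0, 2.0
c1, c2, c3 = 0.5, 0.3, 0.7                 # u1'u2, v1'x1, w1'y1

u1, u2 = unit_pair(4, c1, rng)
v1, x1 = unit_pair(5, c2, rng)
w1, y1 = unit_pair(6, c3, rng)

# rank-2 part of the mode-2 unfolding
M = sigma1 * np.outer(v1, np.kron(w1, u1)) + lambda1 * np.outer(x1, np.kron(y1, u2))

t = sigma1 ** 2 + lambda1 ** 2 + 2 * sigma1 * lambda1 * c1 * c2 * c3
d = sigma1 ** 2 * lambda1 ** 2 * (1 - c1 ** 2 * c3 ** 2) * (1 - c2 ** 2)
mu1 = np.sqrt(t / 2 + np.sqrt(t ** 2 / 4 - d))
mu2 = np.sqrt(t / 2 - np.sqrt(t ** 2 / 4 - d))

print("closed form:", mu1, mu2)
print("numerical  :", np.linalg.svd(M, compute_uv=False)[:2])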
Appendix F

Convex optimization algorithms for source imaging

In this appendix, we describe two efficient convex optimization algorithms that can be employed to minimize the cost functions of regularized least squares approaches: the Fast Iterative Shrinkage Thresholding Algorithm (FISTA) [128] and the Alternating Direction Method of Multipliers (ADMM) [163, 164] (see also [165] and references therein). These algorithms belong to the class of proximal splitting methods [186, 187], which aim at solving optimization problems of the form
$$\min_{\tilde{S}} \sum_{m=1}^{M} f_m(\tilde{S}) \qquad (F.1)$$
by splitting them into several subproblems, each of which involves only one of the convex functions $f_m$, and by resorting to proximity operators to deal with nonsmooth functions. The proximity operator was first introduced in [188] and is defined by
$$\operatorname{prox}_{f,\beta}(Y) = \arg\min_{X} \frac{1}{2}\|Y - X\|_F^2 + \beta f(X) \qquad (F.2)$$
where $f$ is a convex function and the matrices $X$ and $Y$ are of the same size. Solutions to (F.2) for $f$ corresponding to the $L_1$-norm or the $L_{1,2}$-norm of $X$ can, for example, be found in [189, 125]. For the $L_1$-norm, the elements of the matrix $\tilde{S} = \operatorname{prox}_{\|\cdot\|_1,\lambda}(Y) \in \mathbb{R}^{D \times T}$ are given by
$$\tilde{S}_{d,t} = \frac{Y_{d,t}}{|Y_{d,t}|}\left(|Y_{d,t}| - \lambda\right)_+ \qquad (F.3)$$
and for the $L_{1,2}$-norm, one obtains $\tilde{S} = \operatorname{prox}_{\|\cdot\|_{1,2},\lambda}(Y) \in \mathbb{R}^{D \times T}$ such that
$$\tilde{S}_{d,t} = Y_{d,t}\left(1 - \frac{\lambda}{\sqrt{\sum_{t=1}^{T} Y_{d,t}^2}}\right)_+ \qquad (F.4)$$
where $(y)_+ = \max(y, 0)$ and $d = 1, \ldots, D$, $t = 1, \ldots, T$.
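Both proximity operators have simple closed forms that translate directly into code. A minimal NumPy sketch of (F.3) and (F.4) (the function names and the small constant guarding the division by zero are implementation choices, not notation from the thesis):

import numpy as np

def prox_l1(Y, lam):
    """Entrywise soft thresholding, cf. equation (F.3)."""
    return np.sign(Y) * np.maximum(np.abs(Y) - lam, 0.0)

def prox_l12(Y, lam):
    """Row-wise shrinkage for the L1,2-norm, cf. equation (F.4)."""
    row_norms = np.sqrt((Y ** 2).sum(axis=1, keepdims=True))   # one norm per row d
    return Y * np.maximum(1.0 - lam / np.maximum(row_norms, 1e-12), 0.0)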
F.1 FISTA: Optimization of the MCE and MxNE cost functions
The FISTA algorithm corresponds to an accelerated version of the Iterative Shrinkage Thresholding Algorithm (ISTA) (see, e.g., [190]) and has been developed to solve optimization problems that are composed of one smooth, differentiable, convex function $f_1$ and one nonsmooth convex function $f_2$. In the context of the regularized least squares approach to source imaging, $f_1(\tilde{S}) = \frac{1}{2}\|X - \tilde{G}\tilde{S}\|_F^2$ and $f_2$ corresponds to the regularizing function. FISTA solves the optimization problem (F.1) for $M = 2$ by iterating over two steps: a gradient step based on the function $f_1$ and a proximal step involving the function $f_2$. The algorithm can be summarized as follows:

Initialization of $\tilde{S}^{(0)}$, $Z^{(1)} = \tilde{S}^{(0)}$, $\tau^{(1)} = 1$, $0 < \mu < L^{-1}$
for $i = 1$ to $I$ do
    $Y^{(i)} = Z^{(i)} + \mu\, \tilde{G}^T (X - \tilde{G} Z^{(i)})$
    $\tilde{S}^{(i)} = \operatorname{prox}_{f_2, \lambda}(Y^{(i)})$
    $\tau^{(i+1)} = \frac{1 + \sqrt{1 + 4\tau^{(i)2}}}{2}$
    $Z^{(i+1)} = \tilde{S}^{(i)} + \frac{\tau^{(i)} - 1}{\tau^{(i+1)}}\left(\tilde{S}^{(i)} - \tilde{S}^{(i-1)}\right)$
end for

where $I$ denotes the number of iterations. Furthermore, the constant $L$ that determines the maximal stepsize $\mu$ corresponds to the squared spectral norm of the matrix $\tilde{G}$, i.e., the Lipschitz constant of the gradient of $f_1$. For MCE and MxNE, the function $f_2$ corresponds to the $L_1$-norm and the $L_{1,2}$-norm, respectively.
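The loop above translates almost line by line into code. One possible NumPy implementation (a sketch; the signature is illustrative, and the proximal threshold is scaled by the stepsize $\mu$, as is standard for proximal gradient methods):

import numpy as np

def fista(G, X, prox, lam, n_iter=200):
    """A sketch of the FISTA iteration above for f1 = (1/2)||X - G S||_F^2."""
    D, n_t = G.shape[1], X.shape[1]
    L = np.linalg.norm(G, 2) ** 2          # squared spectral norm of G
    mu = 1.0 / L                           # stepsize, 0 < mu <= 1/L
    S_old = np.zeros((D, n_t))
    Z, tau = S_old.copy(), 1.0
    for _ in range(n_iter):
        Y = Z + mu * G.T @ (X - G @ Z)     # gradient step on f1
        S = prox(Y, mu * lam)              # proximal step on f2
        tau_new = (1.0 + np.sqrt(1.0 + 4.0 * tau ** 2)) / 2.0
        Z = S + (tau - 1.0) / tau_new * (S - S_old)
        S_old, tau = S, tau_new
    return S_old

# e.g. S_hat = fista(G, X, prox_l1, lam=0.1)   # MCE; use prox_l12 for MxNE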
F.2 ADMM: Optimization of the SVB-SCCD cost function
The SVB-SCCD optimization problem (see Section 4.5.3) involves a cost function with three terms: $f_1(\tilde{S}) = \frac{1}{2}\|X - \tilde{G}\tilde{S}\|_F^2$, $f_2(\tilde{S}) = f(\tilde{S})$, and $f_3(\tilde{S}) = f(T\tilde{S})$, where $f$ is a convex function corresponding to the $L_1$-norm or the $L_{1,2}$-norm. To employ the ADMM algorithm, we introduce the latent variables $Y \in \mathbb{R}^{E \times T}$ and $Z \in \mathbb{R}^{D \times T}$ and reformulate (F.1) as a constrained convex optimization problem:
$$\min_{\tilde{S}} \frac{1}{2}\|X - \tilde{G}\tilde{S}\|_F^2 + \lambda\left(f(Y) + \beta f(Z)\right) \quad \text{s.t.} \quad Y = T\tilde{S}, \; Z = \tilde{S}. \qquad (F.5)$$
The ADMM algorithm is based on the augmented Lagrangian associated with the optimization problem (F.5), which is given by
$$L(\tilde{S}, Y, Z, U, W) = \frac{1}{2}\|X - \tilde{G}\tilde{S}\|_F^2 + \lambda_1 f(Y) + \lambda_2 f(Z) + \operatorname{vec}(U)^T \operatorname{vec}(T\tilde{S} - Y) + \frac{\rho}{2}\|T\tilde{S} - Y\|_F^2 + \operatorname{vec}(W)^T \operatorname{vec}(\tilde{S} - Z) + \frac{\rho}{2}\|\tilde{S} - Z\|_F^2 \qquad (F.6)$$
where $\rho > 0$ is a penalty parameter, $U \in \mathbb{R}^{E \times T}$ and $W \in \mathbb{R}^{D \times T}$ are the Lagrange multipliers, $\lambda_1 = \lambda$, and $\lambda_2 = \lambda\beta$. The idea of ADMM consists in alternatingly updating the matrices $\tilde{S}$, $Y$, $Z$, $U$, and $W$ by minimizing the Lagrangian with respect to the matrices $\tilde{S}$, $Y$, and $Z$ and by resorting to the dual ascent method for the updates of the matrices $U$ and $W$. Subsequently, we derive an update rule for each of these matrices.
Update of the signal matrix $\tilde{S}$

To optimize the Lagrangian with respect to the signal matrix $\tilde{S}$, the derivative $\frac{\partial L}{\partial \tilde{S}}$ is set to 0:
$$\frac{\partial L}{\partial \tilde{S}} = -\tilde{G}^T X + \tilde{G}^T \tilde{G}\tilde{S} + T^T U - \rho T^T Y + \rho T^T T \tilde{S} + W - \rho Z + \rho \tilde{S} = 0.$$
Reordering the terms, one obtains
$$\left[\tilde{G}^T \tilde{G} + \rho T^T T + \rho I_D\right] \tilde{S} = \tilde{G}^T X - T^T U + \rho T^T Y - W + \rho Z,$$
which leads to the following update rule for the matrix $\tilde{S}$:
$$\tilde{S} = \left[\tilde{G}^T \tilde{G} + \rho\left(T^T T + I_D\right)\right]^{-1} \left[\tilde{G}^T X + \rho\, T^T \left(Y - \frac{1}{\rho} U\right) + \rho\left(Z - \frac{1}{\rho} W\right)\right]. \qquad (F.7)$$
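Since the system matrix on the left-hand side of (F.7) does not change across iterations, it can be factorized once and reused; as the added term $\rho I_D$ makes it symmetric positive definite, a Cholesky factorization applies. A NumPy/SciPy sketch of this idea (the function and variable names are illustrative):

import numpy as np
from scipy.linalg import cho_factor, cho_solve

def make_s_update(G, T_op, rho):
    """Prefactorize the SPD system matrix of (F.7) and return the update rule."""
    D = G.shape[1]
    A = G.T @ G + rho * (T_op.T @ T_op + np.eye(D))   # left-hand side of (F.7)
    factor = cho_factor(A)                            # Cholesky, computed once
    def update(X, Y, Z, U, W):
        rhs = G.T @ X + rho * T_op.T @ (Y - U / rho) + rho * (Z - W / rho)
        return cho_solve(factor, rhs)                 # right-hand side of (F.7)
    return update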
Update of the latent matrices $Y$ and $Z$

Minimizing the Lagrangian with respect to the matrix $Y$ while keeping the other matrices fixed corresponds to solving the following optimization problem:
$$\min_Y L(\tilde{S}, Y, Z, U, W) = \min_Y \; \lambda_1 f(Y) - \operatorname{vec}(U)^T \operatorname{vec}(Y) + \frac{\rho}{2}\operatorname{vec}(T\tilde{S})^T \operatorname{vec}(T\tilde{S}) - \rho\operatorname{vec}(T\tilde{S})^T \operatorname{vec}(Y) + \frac{\rho}{2}\operatorname{vec}(Y)^T \operatorname{vec}(Y). \qquad (F.8)$$
Adding the terms $\frac{\rho}{2}\operatorname{vec}\!\left(\frac{U}{\rho}\right)^T \operatorname{vec}\!\left(\frac{U}{\rho}\right) + \rho\operatorname{vec}\!\left(\frac{U}{\rho}\right)^T \operatorname{vec}(T\tilde{S})$, which are constant with respect to the matrix $Y$, problem (F.8) can be rewritten as
$$\min_Y \frac{\rho}{2}\left\|\left(T\tilde{S} + \frac{1}{\rho}U\right) - Y\right\|_F^2 + \lambda_1 f(Y). \qquad (F.9)$$
This optimization problem can be solved using the proximity operator:
$$Y = \operatorname{prox}_{f, \frac{\lambda_1}{\rho}}\left(T\tilde{S} + \frac{1}{\rho}U\right). \qquad (F.10)$$
In an analogous way, the following expression can be derived for the matrix $Z$:
$$Z = \operatorname{prox}_{f, \frac{\lambda_2}{\rho}}\left(\tilde{S} + \frac{1}{\rho}W\right). \qquad (F.11)$$
Update of the Lagrange multipliers $U$ and $W$

The matrices $U$ and $W$ are updated using the dual ascent method. In this approach, the convex optimization problem of minimizing the Lagrangian, called the primal problem, is replaced by an equivalent problem, the dual problem, which aims at maximizing the so-called dual function $g$ with respect to the matrix $U$. This new optimization problem is then solved using the gradient ascent algorithm. The dual function $g$ can be derived from the Lagrangian. Its gradient corresponds to the residual of the constraint associated with the Lagrange multiplier $U$ (see [165]):
$$\frac{\partial g}{\partial U} = T\tilde{S} - Y.$$
Therefore, using gradient ascent with a stepsize equal to the penalty parameter $\rho$, one obtains the following update rule for the matrix $U$:
$$U^{(k+1)} = U^{(k)} + \rho\left(T\tilde{S}^{(k+1)} - Y^{(k+1)}\right). \qquad (F.12)$$
Analogously, the matrix $W$ is updated according to
$$W^{(k+1)} = W^{(k)} + \rho\left(\tilde{S}^{(k+1)} - Z^{(k+1)}\right). \qquad (F.13)$$
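Combining (F.7) with (F.10)–(F.13) gives the complete ADMM iteration. A sketch of one possible implementation, reusing make_s_update and the proximity operators from the sketches above (the fixed iteration count is a simplification; in practice one would monitor the primal and dual residuals):

import numpy as np

def admm_svb_sccd(G, X, T_op, prox, lam, beta=1.0, rho=1.0, n_iter=100):
    """One possible ADMM loop for the SVB-SCCD cost function (a sketch)."""
    D, n_t = G.shape[1], X.shape[1]
    E = T_op.shape[0]
    S = np.zeros((D, n_t))
    Y, U = np.zeros((E, n_t)), np.zeros((E, n_t))   # latent variable / multiplier for T S
    Z, W = np.zeros((D, n_t)), np.zeros((D, n_t))   # latent variable / multiplier for S
    lam1, lam2 = lam, lam * beta
    s_update = make_s_update(G, T_op, rho)          # prefactorized solve of (F.7)
    for _ in range(n_iter):
        S = s_update(X, Y, Z, U, W)                 # (F.7)
        Y = prox(T_op @ S + U / rho, lam1 / rho)    # (F.10)
        Z = prox(S + W / rho, lam2 / rho)           # (F.11)
        U = U + rho * (T_op @ S - Y)                # (F.12)
        W = W + rho * (S - Z)                       # (F.13)
    return S

# e.g. S_hat = admm_svb_sccd(G, X, T_op, prox_l12, lam=0.1)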
Bibliography

[1] M. S. Hämäläinen, R. Hari, R. J. Ilmoniemi, J. Knuutila, and O. V. Lounasmaa, “Magnetoencephalography - theory, instrumentation, and applications to noninvasive studies of the working human brain,” Reviews of Modern Physics, vol. 65, no. 2, pp. 413–497, 1993.
[2] H. Hallez, B. Vanrumste, R. Grech, J. Muscat, W. De Clercq, A. Vergult, Y. D’Asseler, K. P. Camilleri, S. G. Fabri, S. Van Huffel, and I. Lemahieu, “Review on solving the forward problem in EEG source analysis,” Journal of NeuroEngineering and Rehabilitation, vol. 4, no. 46, 2007.
[3] A. Gramfort, Mapping, timing and tracking cortical activations with MEG and EEG: Methods and application to human vision, Ph.D. thesis, Telecom ParisTech, 2009.
[4] J. S. Ebersole, “Noninvasive localization of epileptogenic foci by EEG source modeling,” Epilepsia, vol. 41, pp. S24–33, 2000.
[5] I. Merlet, “Dipole modeling of interictal and ictal EEG and MEG,” Epileptic Disord Special Issue, pp. 11–36, July 2001.
[6] C. M. Michel, M. M. Murray, G. Lantz, S. Gonzalez, L. Spinelli, and R. Grave De Peralta, “EEG source imaging,” Clinical Neurophysiology, vol. 115, no. 10, pp. 2195–2222, Oct. 2004.
[7] M. Gavaret, J.-M. Badier, P. Marquis, A. McGonigal, F. Bartolomei, J. Regis, and P. Chauvel, “Electric source imaging in frontal lobe epilepsy,” Journal of Clinical Neurophysiology, vol. 23, no. 4, pp. 358–370, Aug. 2006.
[8] C. Plummer, A. S. Harvey, and M. Cook, “EEG source localization in focal epilepsy: Where are we now?,” Epilepsia, vol. 49, no. 2, pp. 201–218, Feb. 2008.
[9] F. Bijma, Mathematical modelling of magnetoencephalographic data, Ph.D. thesis, Vrije Universiteit Amsterdam, 2005.
[10] H. H. Jasper, “The ten-twenty electrode system of the International Federation,” Electroencephalography and Clinical Neurophysiology, vol. 10, pp. 371–375, 1958.
[11] G. E. Chatrian, E. Lettich, and P. L. Nelson, “Ten percent electrode system for topographic studies of spontaneous and evoked EEG activity,” American Journal of EEG Technology, vol. 25, pp. 83–92, 1985.
[12] R. Oostenveld and P. Praamstra, “The five percent electrode system for high-resolution EEG and ERP measurements,” Clinical Neurophysiology, vol. 112, pp. 713–719, 2001.
[13] A. M. Dale and M. I. Sereno, “Improved localization of cortical activity by combining EEG and MEG with MRI cortical surface reconstruction: a linear approach,” Journal of Cognitive Neuroscience, vol. 5, no. 2, pp. 162–176, 1993.
[14] D. Cosandier-Rimélé, J.-M. Badier, P. Chauvel, and F. Wendling, “A physiologically plausible spatio-temporal model for EEG signals recorded with intracerebral electrodes in human partial epilepsy,” IEEE Transactions on Biomedical Engineering, vol. 54, no. 3, pp. 380–388, Mar. 2007.
[15] G. Birot, L. Albera, F. Wendling, and I. Merlet, “Localisation of extended brain sources from EEG/MEG: the ExSo-MUSIC approach,” NeuroImage, vol. 56, pp. 102–113, 2011.
[16] L. Albera, A. Ferréol, D. Cosandier-Rimélé, I. Merlet, and F. Wendling, “Brain source localization using a fourth-order deflation scheme,” IEEE Transactions on Biomedical Engineering, vol. 55, no. 2, pp. 490–501, 2008.
[17] Y. Cointepas, D. Geoffrey, N. Souedet, I. Denghien, and D. Rivière, “The BrainVISA project: a shared software development infrastructure for biomedical imaging research,” in Proc. of 16th HBM, 2010.
[18] D. Rivière, J. Régis, Y. Cointepas, D. Papadopoulos-Orfanos, A. Cachia, and J. F. Mangin, “A freely available Anatomist/BrainVISA package for structural morphometry of the cortical sulci,” NeuroImage, vol. 19, no. 2, pp. 934, 2003.
[19] F. Tadel, S. Baillet, J. C. Mosher, D. Pantazis, and R. M. Leahy, “Brainstorm: A user-friendly application for MEG/EEG analysis,” Computational Intelligence and Neuroscience, vol. 2011, no. 8, 2011.
[20] A. Gramfort, T. Papadopoulo, E. Olivi, and M. Clerc, “OpenMEEG: opensource software for quasistatic bioelectromagnetics,” BioMedical Engineering OnLine, vol. 45, no. 9, 2010.
[21] J. Kybic, M. Clerc, T. Abboud, O. Faugeras, R. Keriven, and T. Papadopoulo, “A common formalism for the integral formulations of the forward EEG problem,” IEEE Transactions on Medical Imaging, vol. 24, no. 1, pp. 12–28, 2005.
[22] D. Cosandier-Rimélé, I. Merlet, J. Badier, P. Chauvel, and F. Wendling, “The neuronal sources of EEG: Modeling of simultaneous scalp and intracerebral recordings in epilepsy,” NeuroImage, vol. 42, no. 1, pp. 135–146, Apr. 2008.
[23] R. Vigario and E. Oja, “BSS and ICA in neuroinformatics: from current practices to open challenges,” IEEE Reviews in Biomedical Engineering, vol. 1, pp. 50–61, 2008.
[24] P. Comon and C. Jutten, Eds., Handbook of Blind Source Separation, Academic Press, New York, 2010.
[25] T.-P. Jung, S. Makeig, C. Humphries, T. W. Lee, M. J. McKeown, and V. Iragui, “Removing electroencephalographic artifacts by blind source separation,” Psychophysiology, vol. 37, no. 2, pp. 163–178, 2000.
[26] R. Vigario, J. Sarela, V. Jousmaki, M. Hämäläinen, and E. Oja, “Independent component approach to the analysis of EEG and MEG,” IEEE Transactions on Biomedical Engineering, vol. 47, no. 5, pp. 589–593, 2000.
[27] E. Urrestarazu, J. Iriarte, M. Alegre, M. Valencia, C. Viteri, and J. Artieda, “Independent component analysis removing artifacts in ictal recordings,” Epilepsia, vol. 45, no. 9, pp. 1071–1078, 2004.
[28] W. De Clercq, A. Vergult, B. Vanrumste, W. Van Paesschen, and S. Van Huffel, “Canonical correlation analysis applied to remove muscle artefacts from the electroencephalogram,” IEEE Transactions on Biomedical Engineering, pp. 2583–2587, 2006.
[29] B. W. McMenamin, A. J. Shackman, J. S. Maxwell, D. R. W. Bachhuber, A. M. Koppenhaver, L. L. Greischar, and R. J. Davidson, “Validation of ICA-based myogenic artifact correction for scalp and source-localized EEG,” NeuroImage, vol. 49, pp. 2416–2432, 2010.
[30] A. Belouchrani, K. Abed-Meraim, J.-F. Cardoso, and E. Moulines, “A blind source separation technique using second-order statistics,” IEEE Transactions on Signal Processing, vol. 45, no. 2, pp. 434–444, Feb. 1997.
[31] L. Albera, A. Kachenoura, P. Comon, A. Karfoul, F. Wendling, L. Senhadji, and I. Merlet, “ICA-based EEG denoising: a comparative analysis of fifteen methods,” Special Issue of the Bulletin of the Polish Academy of Sciences - Technical Sciences, vol. 60, no. 3, pp. 407–418, 2012.
[32] A. Delorme, J. Palmer, J. Onton, T. Oostenveld, and S. Makeig, “Independent EEG sources are dipolar,” PLOS ONE, vol. 7, no. 2, Feb. 2012.
[33] P. Comon, “Independent component analysis - a new concept?,” Signal Processing, vol. 36, pp. 287–314, 1994.
[34] W. Lu and J. C. Rajapakse, “Approach and applications of constrained ICA,” IEEE Transactions on Neural Networks, vol. 16, no. 1, pp. 203–212, Jan. 2005.
[35] W. Lu and J. C. Rajapakse, “Constrained independent component analysis,” in Advances in Neural Information Processing Systems 13 (NIPS2000). 2000, pp. 570–576, MIT Press.
[36] A. Adib, E. Moreau, and D. Aboutajdine, “Source separation contrasts using a reference signal,” IEEE Signal Processing Letters, vol. 11, no. 3, pp. 312–315, 2004.
[37] W. Lu and J. C. Rajapakse, “ICA with reference,” Neurocomputing, vol. 69, pp. 2244–2257, 2006.
[38] D. S. Huang and J.-X. Mi, “A new constrained independent component analysis method,” IEEE Transactions on Neural Networks, vol. 18, no. 5, pp. 1532–1535, 2007.
[39] J.-X. Mi, “A novel algorithm for independent component analysis with reference and methods for its applications,” PLOS ONE, vol. 9, no. 5, 2014.
[40] A. Hyvärinen and E. Oja, “Independent component analysis: Algorithms and applications,” Neural Networks, vol. 13, no. 4–5, pp. 411–430, 2000.
[41] P. A. Regalia, “An adaptive unit norm filter with applications to signal analysis and Karhunen-Loève transformations,” IEEE Transactions on Circuits and Systems, vol. 37, no. 5, 1990.
[42] N. Delfosse and P. Loubaton, “Adaptive blind separation of independent sources: a deflation approach,” Signal Processing, vol. 45, pp. 59–83, 1995.
[43] A. Vergult, W. De Clercq, A. Palmini, B. Vanrumste, P. Dupont, S. Van Huffel, and W. Van Paesschen, “Improving the interpretation of ictal scalp EEG: BSS-CCA algorithm for muscle artifact removal,” Epilepsia, vol. 48, no. 5, pp. 950–958, May 2007.
[44] X. Luciani and L. Albera, “Canonical polyadic decomposition based on joint eigenvalue decomposition,” Chemometrics and Intelligent Laboratory Systems, vol. 132, pp. 152–167, Mar. 2014.
[45] T. W. Lee, M. Girolami, and T. J. Sejnowski, “Independent component analysis using an extended Infomax algorithm for mixed sub-Gaussian and super-Gaussian sources,” Neural Computation, vol. 11, no. 2, pp. 417–441, 1999.
[46] J. F. Cardoso and A. Souloumiac, “Blind beamforming for non-Gaussian signals,” IEE Proceedings F, vol. 140, no. 6, pp. 362–370, 1993.
[47] A. Hyvärinen, “Fast and robust fixed-point algorithms for independent component analysis,” IEEE Transactions on Neural Networks, vol. 10, no. 3, pp. 626–634, May 1999.
[48] V. Zarzoso and P. Comon, “Robust independent component analysis by iterative maximization of the kurtosis contrast with algebraic optimal step size,” IEEE Transactions on Neural Networks, vol. 21, no. 2, pp. 248–261, 2010.
[49] A. Ferréol, L. Albera, and P. Chevalier, “Fourth order blind identification of underdetermined mixtures of sources (FOBIUM),” IEEE Transactions on Signal Processing, vol. 53, no. 3, pp. 1254–1271, 2005.
[50] L. De Lathauwer, J. Castaing, and J. F. Cardoso, “Fourth-order cumulant-based blind identification of underdetermined mixtures,” IEEE Transactions on Signal Processing, vol. 55, no. 6, pp. 2965–2973, 2007.
[51] V. Zarzoso, P. Comon, and R. Phlypo, “A contrast function for independent component analysis without permutation ambiguity,” IEEE Transactions on Neural Networks, vol. 21, no. 5, pp. 863–868, 2010.
[52] L. Albera, A. Ferréol, P. Chevalier, and P. Comon, “ICAR, a tool for blind source separation using fourth order statistics only,” IEEE Transactions on Signal Processing, vol. 53, pp. 3633–3643, 2005.
[53] A. Karfoul, L. Albera, and L. De Lathauwer, “Iterative methods for the canonical decomposition of multi-way arrays: Application to blind underdetermined mixture identification,” Signal Processing, vol. 91, no. 8, pp. 1789–1802, 2011.
[52] L. Albera, A. Ferréol, P. Chevalier, and P. Comon, “ICAR, a tool for blind source separation using fourth order statistics only,” IEEE Transactions on Signal Processing, vol. 53, pp. 3633–3643, 2005. [53] A. Karfoul, L. Albera, and L. De Lathauwer, “Iterative methods for the canonical decomposition of multi-way arrays: Application to blind underdetermined mixture identification,” Signal Processing, vol. 91, no. 8, pp. 1789–1802, 2011. [54] V. Zarzoso, P. Comon, and M. Kallel, “How fast is FastICA?,” in European Signal Processing Conference (EUSIPCO), Florence, Italy, Sep. 4-8 2006. [55] P. Comon, “Séparation de mélanges de signaux,” in XIIème Colloque Gretsi, Juan les Pins, 12 -16 juin 1989, pp. 137–140. [56] P. Comon, “Analyse en Composantes Indépendantes et identification aveugle,” Traitement du Signal, vol. 7, no. 3, pp. 435–450, Dec. 1990, Numero special non lineaire et non gaussien. [57] L. Albera, P. Comon, and H. Xu, “SAUD, un algorithme d’ICA par déflation semialgébrique,” in Colloque GRETSI, Troyes, France, Sept. 2007, pp. 1013–1016. [58] P. Comon, X. Luciani, and A. L. F. De Almeida, “Tensor decompositions, alternating least squares and other tales,” Journal of Chemometrics, vol. 23, pp. 393–405, 2009. [59] B. Makki Abadi, D. Jarchi, and S. Sanei, “Simultaneous localization and separation of biomedical signals by tensor factorization,” in IEEE Proc. of 15th Workshop on Statistical Signal Processing, Cardiff, UK, Aug. 31 - Sep. 3 2009, pp. 497–500. [60] B. Makkiabadi, D. Jarchi, and S. Sanei, “Blind separation and localization of correlation P300 subcomponents from single trial recordings using extended PARAFAC2 tensor model,” in Proc. of 33rd Annual International Conference of the IEEE EMBS, Boston, MA, Aug. 30 - Sep. 3 2011, pp. 6955–6958. [61] M. Weis, D. Jannek, F. Römer, T. Günther, M. Haardt, and P. Husar, “Multidimensional PARAFAC2 component analysis of multi-channel EEG data including temporal tracking,” in IEEE Proc. of EMBC, Buenos Aires, Argentina, 2010. [62] M. Weis, D. Jannek, T. Günther, P. Husar, F. Römer, and M. Haardt, “Temporally resolved multi-way component analysis of dynamic sources in event-related EEG data using PARAFAC2,” in IEEE Proc. of EUSIPCO, Aalborg, Denmark, 2010, pp. 696–700. [63] K. H. Knuth, A. S. Shah, W. A. Truccolo, M. Ding, S. L. Bressler, and C. E. Schroeder, “Differentially variable component analysis: identifying mutliple evoked components using trial-to-trial variability,” Journal of Neurophysiology, vol. 95, pp. 3257–3276, 2006. [64] M. Morup, L. K. Hansen, S. M. Arnfred, L.-H. Lim, and K. H. Madsen, “Shiftinvariant multilinear decomposition of neuroimaging data,” NeuroImage, vol. 42, no. 4, pp. 1439–1450, Aug. 2008.
[65] M. Morup, L. K. Hansen, and K. H. Madsen, “Modeling latency and shape changes in trial based neuroimaging data,” in Proc. of Asilomar-SCC, Monterey, CA, Nov. 2011.
[66] J. Möcks, “Decomposing event-related potentials: a new topographic components model,” Biological Psychology, vol. 26, pp. 199–215, 1988.
[67] J. Möcks, “Topographic components model for event-related potentials and some biophysical considerations,” IEEE Transactions on Biomedical Engineering, vol. 35, no. 6, pp. 482–484, June 1988.
[68] A. S. Field and D. Graupe, “Topographic component (parallel factor) analysis of multichannel evoked potentials: Practical issues in trilinear spatiotemporal decomposition,” Brain Topography, vol. 3, no. 4, pp. 407–423, 1991.
[69] F. Miwakeichi, E. Martinez-Montes, P. A. Valdes-Sosa, N. Nishiyama, H. Mizuhara, and Y. Yamaguchi, “Decomposing EEG data into space-time-frequency components using parallel factor analysis,” NeuroImage, vol. 22, pp. 1035–1045, 2004.
[70] M. Morup, L. K. Hansen, C. S. Herrmann, J. Parnas, and S. M. Arnfred, “Parallel factor analysis as an exploratory tool for wavelet transformed event-related EEG,” NeuroImage, vol. 29, pp. 938–947, 2006.
[71] M. De Vos, L. De Lathauwer, B. Vanrumste, S. Van Huffel, and W. Van Paesschen, “Canonical decomposition of ictal scalp EEG and accurate source localisation: Principles and simulation study,” Computational Intelligence and Neuroscience, 2007.
[72] M. De Vos, A. Vergult, L. De Lathauwer, W. De Clercq, S. Van Huffel, P. Dupont, A. Palmini, and W. Van Paesschen, “Canonical decomposition of ictal scalp EEG reliably detects the seizure onset zone,” NeuroImage, vol. 37, pp. 844–854, 2007.
[73] W. Deburchgraeve, P. J. Cherian, M. De Vos, R. M. Swarte, J. H. Blok, G. H. Visser, and P. Govaert, “Neonatal seizure localization using Parafac decomposition,” Clinical Neurophysiology, vol. 120, pp. 1787–1796, 2009.
[74] M. Weis, F. Römer, M. Haardt, D. Jannek, and P. Husar, “Multi-dimensional space-time-frequency component analysis of event related EEG data using closed-form PARAFAC,” in IEEE Proc. of ICASSP, Taipei, Taiwan, 2009, pp. 349–352.
[75] T. G. Kolda and B. W. Bader, “Tensor decompositions and applications,” SIAM Review, vol. 51, no. 3, pp. 455–500, 2009.
[76] R. Bro, Multi-way analysis in the food industry: Models, algorithms and applications, Ph.D. thesis, University of Amsterdam (NL), 1998.
[77] J. B. Kruskal, “Three-way arrays: Rank and uniqueness of trilinear decompositions,” Linear Algebra and its Applications, vol. 18, no. 2, pp. 95–138, 1977.
[78] N. D. Sidiropoulos and R. Bro, “On the uniqueness of multilinear decompositions of N-way arrays,” Journal of Chemometrics, vol. 14, no. 3, pp. 229–239, 2000.
[79] L.-H. Lim and P. Comon, “Blind multilinear identification,” IEEE Transactions on Information Theory, vol. 60, no. 2, pp. 1260–1280, 2013, arxiv:1212.6663.
[80] A. H. Phan, P. Tichavský, and A. Cichocki, “Low complexity damped Gauss-Newton algorithms for CANDECOMP/PARAFAC,” SIAM Journal on Matrix Analysis and Applications, vol. 34, no. 1, pp. 126–147, 2013.
[81] L. Sorber, M. Van Barel, and L. De Lathauwer, “Optimization-based algorithms for tensor decompositions: canonical polyadic decomposition, decomposition in rank-(Lr, Lr, 1) terms and a new generalization,” SIAM Journal on Optimization, vol. 23, no. 2, pp. 695–720, 2013.
[82] F. Römer and M. Haardt, “A semi-algebraic framework for approximate CP decompositions via simultaneous matrix diagonalization (SECSI),” Signal Processing, vol. 93, pp. 2462–2473, 2013.
[83] S. Hajipour Sardouie, L. Albera, M. Bagher Shamsollahi, and I. Merlet, “From simultaneous Schur decomposition to canonical polyadic decomposition of complex-valued multi-way arrays,” submitted to IEEE Transactions on Signal Processing, 2013.
[84] X. Luciani and L. Albera, “Semi-algebraic canonical decomposition of multi-way arrays and joint eigenvalue decomposition,” in IEEE Proc. of ICASSP, Prague, Czech Republic, 2011, pp. 4104–4107.
[85] X. Luciani and L. Albera, “Joint eigenvalue decomposition of non-defective matrices based on the LU factorization with application to ICA,” submitted to IEEE Transactions on Signal Processing, 2014.
[86] M. Unser, A. Aldroubi, and S. J. Schiff, “Fast implementation of the continuous wavelet transform with integer scales,” IEEE Transactions on Signal Processing, vol. 42, no. 12, Dec. 1994.
[87] O. Rioul and P. Duhamel, “Fast algorithms for discrete and continuous wavelet transforms,” IEEE Transactions on Information Theory, vol. 38, no. 2, pp. 569–586, Mar. 1992.
[88] P. Comon and G. H. Golub, “Tracking a few extreme singular values and vectors in signal processing,” Proceedings of the IEEE, vol. 78, no. 8, pp. 1327–1343, Aug. 1990.
[89] R. Grech, T. Cassar, J. Muscat, K. P. Camilleri, S. G. Fabri, M. Zervakis, P. Xanthopoulos, V. Sakkalis, and B. Vanrumste, “Review on solving the inverse problem in EEG source analysis,” Journal of NeuroEngineering and Rehabilitation, vol. 5, Nov. 2008.
[90] J. C. Mosher, P. S. Lewis, and R. M. Leahy, “Multiple dipole modeling and localization from spatio-temporal MEG data,” IEEE Transactions on Biomedical Engineering, vol. 39, pp. 541–557, June 1992.
[91] J. C. Mosher and R. M. Leahy, “Source localization using recursively applied and projected (RAP) MUSIC,” IEEE Transactions on Signal Processing, vol. 47, no. 2, pp. 332–340, Feb. 1999.
[92] K. Sekihara, M. Sahani, and S. S. Nagarajan, “Localization bias and spatial resolution of adaptive and non-adaptive spatial filters for MEG source reconstruction,” NeuroImage, vol. 25, pp. 1056–1067, 2005.
[93] J. Tao, S. Hawes-Ebersole, and J. Ebersole, “Intracranial EEG substrates of scalp EEG interictal spikes,” Epilepsia, vol. 46, no. 5, pp. 669–676, May 2005.
[94] I. Merlet and J. Gotman, “Reliability of dipole models of epileptic spikes,” Clinical Neurophysiology, vol. 110, no. 6, pp. 1013–1028, June 1999.
[95] J. S. Ebersole, “Magnetoencephalography/magnetic source imaging in the assessment of patients with epilepsy,” Epilepsia, vol. 38, pp. S1–5, 1997.
[96] N. Mikuni, T. Nagamine, A. Ikeda, K. Terada, W. Taki, J. Kimura, H. Kikuchi, and H. Shibasaki, “Simultaneous recording of epileptiform discharges by MEG and subdural electrodes in temporal lobe epilepsy,” NeuroImage, vol. 5, pp. 298–306, 1997.
[97] M. Oishi, H. Otsubo, S. Kameyama, N. Morota, H. Masuda, M. Kitayama, and R. Tanaka, “Epileptic spikes: magnetoencephalography versus simultaneous electrocorticography,” Epilepsia, vol. 43, no. 11, pp. 1390–1395, 2002.
[98] H. Shigeto, T. Morioka, K. Hisada, S. Nishio, H. Ishibashi, D. Kira, S. Tobimatsu, and M. Kato, “Feasibility and limitations of magnetoencephalographic detection of epileptic discharges: simultaneous recording of magnetic fields and electrocorticography,” Neurological Research, vol. 24, no. 6, pp. 531–536, 2002.
[99] M. S. Hämäläinen and R. J. Ilmoniemi, “Interpreting magnetic fields of the brain: Estimates of current distributions,” Technical Report TKK-F-A559, Helsinki University of Technology, Finland, 1984.
[105] P. Xu, Y. Tian, H. Chen, and D. Yao, “Lp norm iterative sparse solution for EEG source localization,” IEEE Transactions on Biomedical Engineering, vol. 54, no. 3, Mar. 2007. [106] I. F. Gorodnitsky, J. S. George, and B. D. Rao, “Neuromagnetic source imaging with FOCUSS: a recursive weighted minimum norm algorithm,” Electroencephalography and Clinical Neurophysiology, pp. 231–251, 1995. [107] I. F. Gorodnitsky and B. D. Rao, “Sparse signal reconstruction from limited data using FOCUSS: a re-weighted minimum norm algorithm,” IEEE Transactions on Signal Processing, vol. 45, no. 3, pp. 600–615, 1997. [108] K. Matsuura and Y. Okabe, “Selective minimum-norm solution of the biomagnetic inverse problem,” IEEE Transactions on Biomedical Engineering, vol. 42, pp. 608 – 615, 1995. [109] K. Matsuura and Y. Okabe, “A robust reconstruction of sparse biomagnetic sources,” IEEE Transactions on Biomedical Engineering, vol. 44, no. 8, pp. 720 – 726, Aug. 1997. [110] K. Uutela, M. Hämäläinen, and E. Somersalo, “Visualization of magnetoencephalographic data using minimum current estimates,” NeuroImage, vol. 10, pp. 173 – 180, 1999. [111] M.-X. Huang, A. M. Dale, T. Song, E. Halgren, D. L. Harrington, I. Podgorny, J. M. Canive, S. Lewis, and R. R. Lee, “Vector-based spatial-temporal minimum L1-norm solution for MEG,” NeuroImage, vol. 31, pp. 1025–1037, 2006. [112] L. Ding and B. He, “Sparse source imaging in EEG with accurate field modeling,” Human Brain Mapping, vol. 19, Sept. 2008. [113] M. Vega-Hernández, E. Martínes-Montes, J. M. Sánchez-Bornot, A. LageCastellanos, and P. A. Valdés-Sosa, “Penalized least squares methods for solving the EEG inverse problem,” Statistics Sinica, vol. 18, pp. 1535 – 1551, 2008. [114] W. Chang, A. Nummenmaa, J. Hsieh, and F. Lin, “Spatially sparse source cluster modeling by compressive neuromagnetic tomography,” NeuroImage, vol. 53, May 2010. [115] S. Haufe, V. Nikulin, A. Ziehe, K.-R. Mueller, and G. Nolte, “Combining sparsity and rotational invariance in EEG/MEG source reconstruction,” NeuroImage, vol. 42, 2008. [116] L. Ding, “Reconstructing cortical current density by exploring sparseness in the transform domain,” Physics in Medicine and Biology, vol. 54, pp. 2683 – 2697, 2009. [117] L. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,” Physica D, pp. 259–268, Jan. 1992. [118] A. Chambolle, “An algorithm for total variation minimization and applications,” Journal of Mathematical Imaging and Vision, vol. 20, no. 1–2, pp. 89–97, Jan. 2004.
[119] G. Adde, M. Clerc, and R. Keriven, “Imaging methods for MEG/EEG inverse problem,” in Proc. Joint Meeting of 5th International Conference on Bioelectromagnetism and 5th International Symposium on Noninvasive Functional Source Imaging, 2005. [120] K. Liao, M. Zhu, L. Ding, S. Valette, W. Zhang, and D. Dickens, “Sparse imaging of cortical electrical current densities using wavelet transforms,” Physics in Medicine and Biology, vol. 57, pp. 6881 – 6901, 2012. [121] K. Liao, M. Zhu, and L. Ding, “A new wavelet transform to sparsely represent cortical current densities for EEG/MEG inverse problems,” Computer Methods and Programs in Biomedicine, vol. 111, no. 2, pp. 376–388, 2013. [122] M. Zhu, W. Zhang, D. L. Dickens, and L. Ding, “Reconstructing spatially extended brain sources via enforcing multiple transform sparseness,” NeuroImage, vol. 86, pp. 280–293, 2014. [123] F. Alizadeh and D. Goldfarb, “Second-order cone programming,” Tech. Rep. 512001, Rutgers University, 2001. [124] S. Boyd and L. Vandenberghe, Convex optimization, Cambridge University Press, 2004. [125] A. Gramfort, M. Kowalski, and M. Hämäläinen, “Mixed-norm estimates for the M/EEG inverse problem using accelerated gradient methods,” Physics in Medicine and Biology, vol. 57, pp. 1937 – 1961, 2012. [126] E. Ou, M. Hämäläinen, and P. Golland, “A distributed spatio-temporal EEG/MEG inverse solver,” NeuroImage, vol. 44, 2009. [127] J. Montoya-Martínez, A. Artés-Rodríguez, M. Pontil, and L. K. Hansen, “A regularized matrix factorization approach to induce structured sparse-low rank solutions in the EEG inverse problem,” EURASIP Journal on Advances in Signal Processing, vol. 97, 2014. [128] A. Beck and M. Teboulle, “A fast iterative shrinkage-thresholding algorithm for linear inverse problems,” SIAM Journal of Imaging Sciences, vol. 2, no. 1, pp. 183–202, 2009. [129] T. S. Tian and Z. Li, “A spatio-temporal solution for the EEG/MEG inverse problem using group penalization methods,” Statistics and Its Interface, vol. 4, no. 4, pp. 521–533, 2011. [130] A. Gramfort, D. Strohmeier, J. Haueisen, M. Hämäläinen, and M. Kowalski, “Timefrequency mixed-norm estimates: Sparse M/EEG imaging with non-stationary source activations,” NeuroImage, vol. 70, pp. 410 – 422, 2013. [131] A. Bolstad, B. Van Veen, and R. Nowak, “Space-time event sparse penalization for magneto-/electroencephalography,” NeuroImage, vol. 46, pp. 1066 – 1081, 2009. [132] D. Wipf and S. Nagarajan, “A unified Bayesian framework for MEG/EEG source imaging,” NeuroImage, vol. 44, pp. 947 – 966, 2009.
[133] J. Daunizeau and K. Friston, “A mesostate-space model for EEG and MEG,” NeuroImage, vol. 38, pp. 67–81, 2007. [134] K. J. Friston, L. Harrison, J. Daunizeau, S. Kiebel, C. Phillips, N. Trujillo-Barreto, R. Henson, G. Flandin, and J. Mattout, “Multiple sparse priors for the M/EEG inverse problem,” NeuroImage, vol. 39, no. 1, pp. 1104–1120, 2008. [135] K. J. Friston, W. Penny, C. Phillips, S. Kiebel, G. Hinton, and J. Ashburner, “Classical and Bayesian inference in neuroimaging: theory,” NeuroImage, vol. 16, pp. 465–483, 2002. [136] C. Phillips, J. Mattout, M. Rugg, P. Maquet, and K. Friston, “An empirical Bayesian solution to the source reconstruction problem in EEG,” NeuroImage, vol. 24, pp. 997–1011, 2005. [137] J. Mattout, C. Phillips, W. Penny, M. Rugg, and K. Friston, “MEG source localization under multiple constraints: an extended Bayesian framework,” NeuroImage, vol. 30, pp. 753–767, 2006. [138] D. Wipf, J. Owen, H. Attias, K. Sekihara, and S. Nagarajan, “Robust Bayesian estimation of the location, orientation, and time course of multiple correlated neural sources using MEG,” NeuroImage, vol. 49, pp. 641 – 655, 2010. [139] J. P. Owen, D. Wipf, H. Attias, K. Sekihara, and S. Nagarajan, “Performance evaluation of the Champagne source reconstruction algorithm on simulated and real M/EEG data,” NeuroImage, vol. 60, pp. 305 – 323, 2012. [140] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” Journal of the Royal Statisitcal Society. Series B (Methodological), vol. 39, no. 1, pp. 1–38, 1977. [141] J. Gross, J. Kujala, M. Hämäläinen, L. Timmermann, A. Schnitzler, and R. Salmelin, “Dynamic imaging of coherent sources: Studying neural interactions in the human brain,” PNAS, vol. 98, no. 2, pp. 694 – 699, 2001. [142] B. C. Van Veen, W. Van Drongelen, M. Yuchtman, and A. Suzuki, “Localization of brain electrical activity via linearly constrained minimum variance spatial filtering,” IEEE Transactions on Biomedical Engineering, vol. 44, no. 9, Sept. 1997. [143] K. Sekihara, S. S. Nagarajan, D. Poeppel, A. Marantz, and Y. Miyashita, “Reconstructing spatio-temporal activities of neural sources using an MEG vector beamformer technique,” IEEE Transactions on Biomedical Engineering, vol. 48, no. 7, pp. 760–771, 2001. [144] K. Sekihara, S. S. Nagarajan, D. Poeppel, A. Marantz, and Y. Miyashita, “Application of an MEG eigenspace beamformer to reconstructing spatio-temporal activities of neural sources,” Human Brain Mapping, vol. 15, pp. 199–215, 2002. [145] K. Sekihara, S. S. Nagarajan, D. Poeppel, and A. Marantz, “Performance of an MEG adaptive-beamformer technique in the presence of correlated neural activities: Effects on signal intensity and time-course estimates,” IEEE Transactions on Biomedical Engineering, vol. 49, no. 12, pp. 1534–1546, 2002.
[146] S. S. Dalal, K. Sekihara, and S. S. Nagarajan, “Modified beamformers for coherent source region suppression,” IEEE Transactions on Biomedical Engineering, vol. 53, no. 7, pp. 1357–1363, July 2006. [147] M. J. Brookes, C. M. Stevenson, G. R. Barnes, A. Hillebrand, M. I. G. Simpson, S. T. Francis, and P. G. Morris, “Beamformer reconstruction of correlated sources using a modified source model,” NeuroImage, vol. 34, pp. 1454–1465, 2007. [148] M. Popescu, E. Popescu, T. Chan, S. Blunt, and J. Lewine, “Spatial-temporal reconstruction of bilateral auditory steady state responses using MEG beamformers,” IEEE Transactions on Biomedical Engineering, vol. 55, no. 3, pp. 1092–1102, 2008. [149] H. B. Hui, D. Pantazis, S. L. Bressler, and R. M. Leahy, “Identifying true cortical interactions in MEG using beamformes in electromagnetic source imaging,” NeuroImage, vol. 49, pp. 3161–3174, 2010. [150] M. A. Quraan and D. Cheyne, “Reconstruction of correlated brain activity with adaptive spatial filters in MEG,” NeuroImage, vol. 49, pp. 2387–2400, 2010. [151] A. Moiseev, J. M. Gaspar, J. A. Schneider, and A. T. Herdman, “Application of multi-source minimum variance beamformers for reconstruction of correlated neural activity,” NeuroImage, vol. 58, pp. 481–496, 2011. [152] T. Limpiti, B. D. Van Veen, and R. T. Wakai, “Cortical patch basis model for spatially extended neural activity,” IEEE Transactions on Biomedical Engineering, vol. 53, no. 9, pp. 1740 – 1754, 2006. [153] P. Chevalier, A. Ferréol, and L. Albera, “High-resolution direction finding from higher order statistics: the 2q-MUSIC algorithm,” IEEE Transactions on Signal Processing, vol. 54, no. 8, pp. 2986–2997, 2006. [154] P. Mc Cullagh, Tensor methods in statistics, chapter Monographs on statistics and applied probability, Chapman and Hall, London, U. K., 1987. [155] C. Grova, J. Daunizeau, J. M. Lina, C. G. Bénar, H. Benali, and J. Gotman, “Evaluation of EEG localization methods using realistic simulations of interictal spikes,” NeuroImage, vol. 29, no. 3, pp. 734 – 753, 2006. [156] J.-H. Cho, S. B. Hong, Y.-J. Jung, H.-C. Kang, H. D. Kim, M. Suh, K.-Y. Jung, and C.-H. Im, “Evaluation of algorithms for intracranial EEG (iEEG) source imaging of extended sources: feasibility of using (iEEG) source imaging for localizing epileptogenic zones in secondary generalized epilepsy,” Brain Topography, , no. 24, pp. 91–104, 2011. [157] J. Bourien, F. Bartolomei, J. J. Bellanger, M. Gavaret, P. Chauvel, and F. Wendling, “A method to identify reproducible subsets of co-activated structures during interictal spikes. application to intracerebral EEG in temporal lobe epilepsy.,” Clinical Neurophysiology, vol. 116, no. 2, pp. 443 – 455, Feb. 2005. [158] H. Becker, L. Albera, P. Comon, R. Gribonval, F. Wendling, and I. Merlet, “A performance study of various brain source imaging approaches,” in Proc. of ICASSP, Florence, Italy, 2014, pp. 5910–5914.
[159] L. Baldassarre, J. Mourao-Miranda, and M. Pontil, “Structured sparsity models for brain decoding from fmri data,” in Proc. of PRNI Conf., 2012, pp. 5–8. [160] A. Gramfort, B. Thirion, and G. Varoquaux, “Identifying predictive regions from fMRI with TV-`1 prior,” in Proc. of PRNI Conf., 2013. [161] R. Tibshirani and N. Saunders, “Sparsity and smoothness via the fused LASSO,” Journal of the Royal Statistical Society B, vol. 67, no. Part I, pp. 91 – 108, 2005. [162] S. Ma, W. Yin, Y. Zhang, and A. Chakraborty, “An efficient algorithm for compressed mr imaging using total variation and wavelets,” in IEEE Proc. of CVPR, 2008, pp. 1–8. [163] D. Gabay and B. Mercier, “A dual algorithm for the solution of nonlinear variational problems via finite elements approximations,” Computers and Mathematics with Applications, vol. 2, pp. 17–40, 1976. [164] R. Glowinski and A. Marrocco, “Sur l’approximation, par éléments finis d’ordre un, et la résolution, par pénalisation-dualité, d’une classe de problèmes de dirichlet non linéaires,” Revue Française d’Automatique, Informatique, et Recherche Opérationelle, vol. 9, pp. 41–76, 1975. [165] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statistical learning via alternating direction method of multipliers,” Foundations and Trends in Machine Learning, vol. 3, no. 1, pp. 1 – 122, 2010. [166] S. Baillet, J. C. Mosher, and R. M. Leahy, “Electromagnetic brain mapping,” IEEE Signal Processing Magazine, vol. 18, no. 6, pp. 14–30, Nov. 2001. [167] K. Wendel, O. Väisänen, J. Malmivuo, N. G. Gencer, B. Vanrumste, P. Durka, R. Magjarević, S. Supek, M. L. Pascu, H. Fontenelle, and R. Grave de Peralta Menendez, “EEG/MEG source imaging: methods, challenges, and open issues,” Computational Intelligence and Neuroscience, 2009. [168] R. D. Pascual-Marqui, “Review of methods for solving the EEG inverse problem,” International Journal of Bioelectromagnetism, vol. 1, no. 1, pp. 75 – 86, 1999. [169] R. A. Chowdhury, J. M. Lina, E. Kobayashi, and C. Grova, “MEG source localization of spatially extended generators for epileptic activity: comparing entropic and hierarchical Bayesian approaches,” PLOS ONE, vol. 8, no. 2, pp. 1–9, 2013. [170] M. Benidir, Higher-order statistical signal processing, chapter Theoretical foundations of higher-order statistical signal processing and polyspectra, Longman Cheshire, Australia, 1994. [171] J. M. Mendel, “Tutorial on higher-order-statistics (spectra) in signal processing and system theory: Theoretical results and some applications,” Proceedings of the IEEE, vol. 79, no. 3, Mar. 1991. [172] M. Fuchs, M. Wagner, H.-A. Wischmann, T. Köhler, A. Theissen, R. Drenckhahn, and H. Buchner, “Improving source reconstructions by combining bioelectric and biomagnetic data,” Electroencephalography and clinical Neurophysiology, vol. 107, pp. 93–111, 1998.
[173] F. Babiloni, F. Carducci, F. Cincotti, C. Del Gratta, V. Pizzella, G. L. Romani, P. M. Rossini, F. Recchio, and C. Babiloni, “Linear inverse source estimate of combined EEG and MEG data related to voluntary movements,” Human Brain Mapping, vol. 14, pp. 197–209, 2001. [174] A. K. Liu, A. M. Dale, and J. W. Belliveau, “Monte Carlo simulation studies of EEG and MEG localization accuracy,” Human Brain Mapping, vol. 16, pp. 47–62, 2002. [175] D. Sharon, M. S. Hämäläinen, R. BH Tootell, E. Halgren, and J. W. Belliveau, “The advantage of combining MEG and EEG: comparison to fMRI in focally-stimulated visual cortex,” NeuroImage, vol. 36, no. 4, pp. 1225–1235, 2007. [176] A. Molins, S. M. Stufflebeam, E. N. Brown, and M.S. Hämäläinen, “Quantification of the benefit from integrating MEG and EEG data in minimum l2 -norm estimation,” NeuroImage, vol. 42, pp. 1069–1077, 2008. [177] R. N. Henson, E. Mouchlianitis, and K. J. Friston, “MEG and EEG data fusion: Simultaneous localisation of face-evoked responses,” NeuroImage, vol. 47, no. 2, pp. 581–589, 2009. [178] L. Ding and H. Yuan, “Simultaneous EEG and MEG source reconstruction in sparse electromagnetic source imaging,” Human Brain Mapping, vol. 34, pp. 775 – 795, 2013. [179] P. Comon and M. Rajih, “Blind identification of under-determined mixtures based on the characteristic function,” Signal Processing, vol. 86, no. 9, pp. 2271–2281, 2006. [180] A. Karfoul, L. Albera, and B. Birot, “Blind underdetermined mixture identification by joint canonical decomposition of HO cumulants,” IEEE Transactions on Signal Processing, vol. 58, no. 2, pp. 638–649, 2010. [181] L. Albera, P. Comon, P. Chevalier, and A. Ferréol, “Blind identification of underdetermined mixtures based on the hexacovariance,” in IEEE Proc. of ICASSP, Montreal, Canada, May 2004, vol. II, pp. 29–32. [182] B. H. Jansen and V. G. Rit, “Electroencephalogram and visual evoked potential generation in a mathematical model of coupled cortical columns,” Biological Cybernatics, vol. 73, pp. 357–366, 1995. [183] H. Becker, “Tensor-based techniques for EEG source localization,” Bachelor thesis, TU Ilmenau, Germany, 2010. [184] L. De Lathauwer, “Decompositions of a higher-order tensor in block terms – part ii: Definitions and uniqueness,” SIAM J. Matrix Anal. Appl., vol. 30, no. 3, pp. 1033–1066, 2008. [185] F. Römer and M. Haardt, “A closed-form solution for parallel factor (PARAFAC) analysis,” in IEEE Proc. of ICASSP, Las Vegas, NV, 2008, pp. 2365 – 2368.
[186] P. L. Combettes and V. R. Wajs, “Signal recovery by proximal forward-backward splitting,” Multiscale Model. Simul., pp. 1168–1200, 2005. [187] P. L. Combettes and J.-C. Pesquet, “Proximal splitting methods in signal processing,” in Fixed-point algorithms for inverse problems in science and engineering, pp. 185–212. Springer, 2011. [188] J. J. Moreau, “Fonctions convexes duales et points proximaux dans un espace hilbertien,” CR Acad. Sci. Paris Sér. A Math, vol. 255, pp. 2897–2899, 1962. [189] P. L. Combettes and J.-C. Pesque, “A proximal decomposition method for solving convex variational inverse problems,” Inverse Problems, , no. 6, Dec. 2008. [190] I. Daubechies, M. Defrise, and C. De Mol, “An iterative thresholding algorithm for linear inverse problems with a sparsity constraint,” Communications on Pure and Applied Mathematics, vol. 57, no. 11, pp. 1431–1457, Nov. 2004.
French extended summary of the thesis
Summary of the thesis

Chapter 1: Introduction

Context and motivation

Electroencephalography (EEG) is a non-invasive technique that records brain activity with a high temporal resolution using an array of sensors placed on the surface of the head. The measurements contain valuable information about the electromagnetic sources in the brain that are at the origin of the observed brain activity. This information is essential for the diagnosis and monitoring of certain diseases such as epilepsy, and for understanding the functioning of the brain in neuroscience. In this thesis, we concentrate on the application of EEG in the context of epilepsy. More particularly, we are interested in the localization of the epileptic regions that are involved in epileptic activity between seizures. The delineation of these regions is essential for the evaluation of patients suffering from partial, drug-resistant epilepsy, for whom a surgical intervention may be considered in order to remove the epileptogenic zones that are responsible for the occurrence of seizures. The objective then consists in identifying the positions (and spatial extents) of the brain sources from the noisy mixture of signals that is recorded at the surface of the head by the EEG sensors. This is known as the inverse problem. Conversely, the derivation of the EEG signals for a known source configuration is called the forward problem (cf. Figure 1.1). Thanks to refined models of the head geometry and advanced mathematical tools for computing the lead field matrix, which characterizes the propagation in the head volume conductor, the forward problem is easy to solve. By contrast, finding a solution to the inverse problem remains a difficult task. This is particularly true in the context of multiple sources with correlated time signals, which may be involved in the propagation of epileptic phenomena. This subject is the central problem of this thesis and motivates the development of algorithms that are robust to the correlation of sources. Another difficulty encountered in the analysis of EEG data is the fact that the recorded data do not only reflect the brain activity of interest, but also the activity of non-cerebral sources such as cardiac activity, muscle activity of the jaw, or ocular activity. These "non-cerebral" signals are generally called artifacts. Artifacts can have large amplitudes, hiding the signals of interest, which in our case correspond to the activity of one or several brain sources. Thus, to prevent the artifacts from compromising the interpretation of the EEG measurements, it is preferable to remove them before applying other EEG analysis methods.
Proposed approach and outline of this thesis

For EEG data containing several sources and artifacts, we propose to consider the following data processing steps to solve the inverse problem:
1. extraction of the brain activity of interest, i.e., the epileptic activity (artifact removal),
2. separation of simultaneously active and potentially correlated sources to facilitate their localization,
3. localization of distributed sources.
The first two steps are optional preprocessing operations that can nevertheless considerably simplify the source localization, depending on the characteristics of the data set to be analyzed, whereas the solution of the inverse problem is accomplished in the third step. In this thesis, we develop robust and computationally efficient techniques for the three data processing steps described above. The performances of the proposed algorithms are analyzed in terms of accuracy and computational complexity in comparison to conventional methods. Since accurate information about the epileptogenic zones is generally not available for real data, the performance evaluation is mostly based on realistic simulations that allow us to compare the obtained results with the ground truth. Nevertheless, some examples with real EEG data are also presented to validate the proposed methods.

This thesis is organized as follows: in Chapter 2, we provide some background information on the origin of the electromagnetic signals of the brain, on the characteristics of EEG systems, and on epilepsy. In addition, we describe the mathematical model of the EEG data that is used for the simulations conducted in this thesis. In Chapter 3, we consider two types of preprocessing methods: statistical approaches for artifact removal based on Independent Component Analysis (ICA) and deterministic tensor decomposition methods for source separation. Chapter 4 is devoted to distributed source localization. After describing and classifying the state-of-the-art methods, we present some contributions to the development of new source localization methods. We conclude the chapter with a performance study of eight different source localization algorithms. Finally, in Chapter 5, we illustrate the combination of the three data processing steps with a simulation example before summarizing our results and discussing perspectives for future work.
Associated publications

Parts of the work presented in this thesis have appeared in the following publications:

International conference papers
• H. Becker, P. Comon, L. Albera, M. Haardt, and I. Merlet, "Multiway space-time-wave-vector analysis for source localization and extraction", Proc. of European Signal Processing Conference (EUSIPCO), Aalborg, Denmark, August 2010. Introduction of the STWV analysis described in Section 3.2.3.2 of this thesis.
• H. Becker, P. Comon, and L. Albera, "Tensor-based preprocessing of combined EEG/MEG data", Proc. of European Signal Processing Conference (EUSIPCO), Bucharest, Romania, August 2012. Extension of the STF and STWV analyses to the combination of EEG and MEG data, presented in Appendix C.
• H. Becker, L. Albera, P. Comon, R. Gribonval, F. Wendling, and I. Merlet, "A performance study of various brain source imaging approaches", IEEE Proc. of Internat. Conf. on Acoustics Speech and Signal Processing (ICASSP), Florence, Italy, May 2014. Comparative performance study of seven distributed source localization algorithms, similar to the simulations conducted in Section 4.7.2 of this thesis.
• H. Becker, L. Albera, P. Comon, R. Gribonval, and I. Merlet, "Fast, variation-based methods for the analysis of extended brain sources", Proc. of European Signal Processing Conference (EUSIPCO), Lisbon, Portugal, September 2014. Presentation of the SVB-SCCD algorithm described in Section 4.5 of this thesis.
International journal papers
• H. Becker, P. Comon, L. Albera, M. Haardt, and I. Merlet, "Multiway space-time-wave-vector analysis for EEG source separation", Signal Processing, vol. 92, pp. 1021–1031, 2012. Introduction of the STWV analysis described in Section 3.2.3.2 of this thesis and application to EEG data for the localization of extended sources in the context of a spherical head model.
• H. Becker, L. Albera, P. Comon, M. Haardt, G. Birot, F. Wendling, M. Gavaret, C. G. Bénar, and I. Merlet, "EEG extended source localization: tensor-based vs. conventional methods", NeuroImage, vol. 96, pp. 143–157, August 2014. Presentation of the STF-DA and STWV-DA methods for distributed source localization and evaluation on realistic simulated data and real data, similar to Section 4.4.1 of this thesis.
• H. Becker, L. Albera, P. Comon, R. Gribonval, F. Wendling, and I. Merlet, "Brain source imaging: from sparse to tensor models", submitted to IEEE Signal Processing Magazine, 2014. Survey and classification of different distributed source localization approaches (cf. Sections 4.2 and 4.3), as well as a performance comparison of representative methods based on realistic simulations, similar to Section 4.7.2 of this thesis.
• H. Becker, L. Albera, P. Comon, A. Kachenoura, and I. Merlet, "A penalized semi-algebraic deflation ICA algorithm for the efficient extraction of interictal epileptic signals", submitted to IEEE Transactions on Biomedical Engineering, 2014. Introduction of the P-SAUD algorithm described in Section 3.1 of this thesis.
Chapter 2: EEG signals: physiological origin and modeling

To process a large amount of information, the brain contains a huge number of nerve cells, the neurons, which are mostly located in the gray matter forming the cerebral cortex. The transmission of information between neurons is based on an electrochemical process (cf. Figure 2.3) [1, 2, 3] that causes an electric current inside the neurons as well as a counter-current outside the cells. This can be modeled by a current dipole, which is at the basis of mathematical models of brain activity.

Epilepsy is one of the most common neurological diseases. It leads to temporary malfunctions of the electrical activity of the brain, called seizures, which manifest themselves in the form of rhythmic electrical discharges lasting from a few seconds to a few minutes. The majority of patients can be treated with medication, but some patients are drug-resistant. In some of these cases, if the position and spatial extent of the epileptogenic zone are known, surgery may be considered to remove this zone and stop the occurrence of seizures. To assist the presurgical evaluation of these patients, in this thesis we apply source localization methods to epileptic spikes, which occur at irregular intervals between seizures, in order to identify the brain regions involved in these paroxysms.

EEG records the electromagnetic activity of the brain over a certain time interval using sensors placed on the surface of the head. EEG systems comprise between 19 and 256 electrodes and measure the electric potential difference between each sensor and a reference. The main advantages of EEG are its high temporal resolution and its low cost compared to magnetoencephalography (MEG) or functional magnetic resonance imaging (fMRI). For the brain activity to be measurable at the surface of the head, a certain number of simultaneously active neuronal populations is required. These populations can be modeled by a grid of dipoles forming the predefined source space. The EEG data can then be modeled as a linear, instantaneous mixture of the signals emitted by the dipoles of the source space, characterized by the lead field matrix, superposed with the contributions of other electromagnetic signals of different physiological origin, called artifacts, and with instrumentation noise. For a given head model and source space, the lead field matrix can be computed numerically [3]. In this thesis, we consider a realistic head model (cf. Figure 2.5) composed of three layers, segmented from an MRI and representing the brain, the skull, and the scalp, and a source space comprising dipoles located on the cortical surface with an orientation perpendicular to this surface. For the simulations, we define 11 distributed sources, also called patches, which are composed of a certain number of adjacent dipoles of the source space (cf. Figure 2.6). Furthermore, we generate brain signals (highly correlated epileptic spikes for the dipoles belonging to the distributed sources and background activity for the other dipoles) using a neuronal population model [14, 22] (see Figure 2.7 for an example of epileptic spikes generated with this model).
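A minimal numerical sketch of this data model follows; all quantities below are random stand-ins (in the thesis, the lead field comes from a three-layer realistic head model, the patch is a set of adjacent cortical dipoles, and the signals come from a neuronal population model):

import numpy as np

rng = np.random.default_rng(0)
n_sensors, n_dipoles, n_samples = 91, 5000, 1024

G = rng.standard_normal((n_sensors, n_dipoles))          # stand-in lead field matrix
S = 0.01 * rng.standard_normal((n_dipoles, n_samples))   # background activity on all dipoles

patch = rng.choice(n_dipoles, size=100, replace=False)   # one distributed source (adjacent dipoles in practice)
spike = np.exp(-0.5 * ((np.arange(n_samples) - 512) / 20.0) ** 2)  # crude spike waveform
S[patch] += spike                                        # highly correlated spikes within the patch

artifacts = 0.5 * rng.standard_normal((n_sensors, n_samples))  # muscle/ocular contributions
noise = 0.05 * rng.standard_normal((n_sensors, n_samples))     # instrumentation noise
X = G @ S + artifacts + noise                            # linear instantaneous mixture at the scalp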
Chapter 3: Preprocessing

Artifact removal and separation of independent patches using ICA

The first step of the EEG data analysis consists in removing the artifacts or, equivalently, in extracting the activity of interest, i.e., the epileptic spikes. Since little a priori information about the underlying sources is available, this is a typical application for blind source separation methods [23, 24]. Due to their different physiological origins, it is reasonable to assume that the artifacts are statistically independent of the epileptic activity, which motivates the application of statistical methods based on Independent Component Analysis (ICA). This approach treats the data and the signals as realizations of a random vector process and aims at separating as many independent components as there are sensors. These components can be divided into three groups forming bases for the subspaces of the epileptic activity, the muscle activity, and the background activity. Our objective here consists in identifying the vectors of the mixing matrix and the associated signals that span the subspace of the epileptic activity. To identify the independent components by means of a linear transformation, one solves an optimization problem based on a measure of statistical independence, such as the mutual information or the negentropy. In the context of ICA, the cost function that is maximized is called the contrast function. To facilitate the separation of the components, many ICA methods employ a prewhitening step that precedes the extraction of the components. The objective of this step consists in decorrelating the signals such that the covariance matrix of the prewhitened data is the identity matrix. In this case, the mixing matrix to be identified is a unitary matrix. The estimation of this unitary matrix can be further simplified by using a parameterization based on Givens rotations.

In the literature, a large number of blind source separation methods, including algorithms based on second-order statistics such as SOBI and CCA and ICA techniques based on higher-order statistics, have been successfully applied to separate artifacts from neuronal activity (see, e.g., [25, 26, 27, 28, 29]). A recent study [31] of the performances of several popular ICA methods showed that the COM2 algorithm [33] offers the best trade-off between performance and computational complexity. However, this method does not only extract the epileptic components of interest, but also identifies a large number of other components of the mixture. Since we are only interested in the epileptic activity, the computational complexity of the algorithm can be further reduced by extracting only the epileptic signals. To this end, deflation methods can be used. However, one has to make sure that the signals of interest are extracted first, so that the algorithm can be stopped after the separation of a small number of components of the mixture. In the literature, this is accomplished by resorting to a reference signal, which gives rise to ICA-with-reference (ICA-R) approaches [35, 36, 34, 37, 38, 39]. In practice, however, a reference signal is not always available. That is why we propose another method in this thesis, which is based on a penalized cost function. The new algorithm, called P-SAUD, builds on the two-source contrast function and the efficient optimization procedure used by COM2, but extracts the sources sequentially following the deflation-by-projection scheme of the DelL algorithm [42]. To extract the epileptic components first, we exploit the fact that the autocorrelation of these components is larger than that of the muscle artifacts. Hence, the autocorrelation is used as a penalization term in the contrast function. A penalization parameter adjusts the influence of the autocorrelation on the extraction of the components. At the beginning, a large value is assigned to this parameter, which is then decreased over the iterations of the algorithm. The final value determines the balance between the COM2 contrast function and the penalization term.

The performance of P-SAUD in terms of accuracy and computational complexity is evaluated by means of simulations, conducted with two patches with independent epileptic signals in the presence of muscle artifacts, and compared to two second-order blind source separation methods, SOBI [30] and CCA [28], as well as to three classical ICA methods, COM2 [33], FastICA [47], and DelL [42]. In terms of accuracy, the results of P-SAUD are better than those of DelL, SOBI, and CCA, and comparable to those of COM2 and FastICA. The simulations also confirm that P-SAUD extracts the epileptic components among the first ICA components. Hence, for equal accuracy, the computational complexity of P-SAUD is considerably reduced compared to COM2 and FastICA. This has also been confirmed on an example of real data.
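A minimal sketch of the prewhitening step described above (assuming full-rank data; a small eps guards against near-zero eigenvalues):

import numpy as np

def prewhiten(X, eps=1e-12):
    # X: channels x samples. Returns Xw with cov(Xw) ~ identity, plus the whitening matrix W.
    Xc = X - X.mean(axis=1, keepdims=True)
    C = (Xc @ Xc.T) / Xc.shape[1]                                  # sample covariance
    eigval, eigvec = np.linalg.eigh(C)
    W = eigvec @ np.diag(1.0 / np.sqrt(eigval + eps)) @ eigvec.T   # C^(-1/2)
    return W @ Xc, W

After this step, the remaining mixing matrix to be identified is unitary, which is what allows the Givens-rotation parameterization mentioned above.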
Separation of correlated sources by tensor decomposition

If several source regions are involved in the epileptic activity, it is preferable to separate these sources in order to facilitate the source localization process. For statistically independent sources, this can also be accomplished by ICA. However, in the context of propagation phenomena, the signals of the different epileptic regions are (highly) correlated and their separation requires a different type of approach. In this thesis, we explore the use of deterministic tensor-based methods. These methods exploit multidimensional data and assume a certain underlying structure. Here, we focus on the canonical decomposition [58], which imposes a multilinear structure on the data. This structure is then exploited to identify the components associated with the sources. The (approximate) canonical decomposition decomposes a tensor into a sum of a small number of rank-1 tensors that represent the significant components of the tensor. For a third-order tensor, each rank-1 tensor corresponds to the outer product of three vectors that depend on three different variables. The canonical decomposition thus allows the variables to be separated. The number of components, which corresponds to the number of sources, has to be estimated from the data, but in this thesis we assume that this number is known. The big advantage of the canonical decomposition compared to other tensor decompositions is that it is unique under certain conditions on the number of components. These conditions are generally fulfilled in the context of EEG source separation. The computation of the canonical decomposition is based on an optimization problem, which can be solved using a large number of algorithms that have been proposed in the literature, including alternating methods [76], gradient-based algorithms [58, 80, 81], and direct techniques [82, 83, 44].

To obtain multidimensional data, one can either collect an additional diversity directly from the data or create a third dimension by applying a transform that preserves the two original dimensions, such as the short-time Fourier transform or the wavelet transform. Several authors have studied the application of the canonical decomposition to EEG data transformed into space-time-frequency (STF) data, obtained by computing a wavelet transform along the time dimension of the measurements [69, 70, 71, 72, 73]. Under certain conditions on the signals, this method yields space, time, and frequency characteristics for each source region and thus makes it possible to localize each patch individually in a second step. In this thesis, we present an alternative method that is based on a local spatial Fourier transform. This yields a space-time-wave-vector (STWV) tensor that can also be decomposed using the canonical decomposition. The advantage of this approach lies in its robustness to the correlation of the source signals.

To understand the mechanisms underlying the STF and STWV techniques and the conditions required for them to work, we conduct a theoretical study of these approaches. More precisely, we derive sufficient conditions under which these methods yield exact results. Even though these mathematical conditions are very restrictive and it is difficult to deduce from them physiological conditions that can be verified in practice, we have determined that the following factors influence the functioning of the STF and STWV analyses:
• the power of the sources,
• the correlation of the source signals,
• the correlation of the spatial mixing vectors of the sources,
• the singular values of the time-frequency and space-wave-vector matrices of the sources.
To evaluate the ability of the STF and STWV analyses to separate correlated sources, we conduct several simulations in the context of propagated epileptic spikes and analyze the influence of several parameters on the estimation of the signals and the spatial mixing vectors. The simulations show that the STF analysis does not yield good results for correlated sources, whereas the STWV method correctly separates the sources as long as the SNR, the number of time samples, and the number of sensors are not too small. A computational complexity analysis of the two methods revealed that the computational cost of the two approaches is comparable for a moderate number of time samples.
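A minimal numerical sketch of this construction and decomposition follows, under several stated assumptions: a short-time Fourier transform stands in for the wavelet transform used here, magnitudes are taken for simplicity, the rank (number of sources) is assumed known, and normalization and convergence checks are omitted:

import numpy as np
from scipy.signal import stft

def unfold(T, mode):
    # mode-n unfolding: the chosen mode becomes the row index
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def khatri_rao(A, B):
    # column-wise Kronecker product; row index i*J + j for A (I x R), B (J x R)
    return (A[:, None, :] * B[None, :, :]).reshape(A.shape[0] * B.shape[0], -1)

def cp_als(T, rank, n_iter=50):
    # plain alternating least squares for the (approximate) canonical decomposition
    rng = np.random.default_rng(1)
    A, B, C = (rng.standard_normal((s, rank)) for s in T.shape)
    for _ in range(n_iter):
        A = unfold(T, 0) @ np.linalg.pinv(khatri_rao(B, C).T)
        B = unfold(T, 1) @ np.linalg.pinv(khatri_rao(A, C).T)
        C = unfold(T, 2) @ np.linalg.pinv(khatri_rao(A, B).T)
    return A, B, C    # spatial, temporal, and spectral signatures (one column per source)

rng = np.random.default_rng(0)
X = rng.standard_normal((64, 2048))          # stand-in for (denoised) EEG, channels x samples
f, t, Z = stft(X, fs=256.0, nperseg=128)     # wavelet transform in the thesis; STFT as stand-in
T_stf = np.transpose(np.abs(Z), (0, 2, 1))   # space x time x frequency tensor
A, B, C = cp_als(T_stf, rank=2)              # rank = assumed number of sources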
ICA vs. tensor decomposition

Depending on the amount and amplitude of the artifacts and on the nature of the epileptic activity (number of sources and correlation) of a given data set, one may decide to use only one of the two preprocessing methods described above (ICA or tensor decomposition) or both at once. To determine in which situation which of these preprocessing methods is appropriate, we compared the two approaches by means of simulations. On the one hand, we analyzed the robustness of ICA (represented by the P-SAUD algorithm) and of the tensor decomposition (represented by the STWV method) to the correlation of the sources. This analysis showed that STWV separates sources with signals of different morphologies without difficulty, and sources with identical but delayed signals as soon as there is a certain minimal delay between the activities of the two sources, whereas P-SAUD does not separate the sources correctly as long as their signals partially overlap. This shows that STWV is the method of choice in the context of propagation phenomena. On the other hand, we studied the robustness of P-SAUD and STWV to muscle artifacts. Here, we found that the limiting SNR for a good estimation of the mixing vectors of two uncorrelated sources is about 10 dB lower for P-SAUD than for STWV. Finally, we combined P-SAUD and STWV to extract propagated sources from EEG data corrupted by artifacts. This approach reduces the minimal SNR required for a correct separation of the two sources by 5 dB.
Chapter 4: Distributed source localization

The source localization methods that are currently available can be divided into two types of approaches: equivalent dipole localization, where each dipole describes the electromagnetic activity of an extended brain region, and distributed source localization, where sources are characterized by a certain number of dipoles with fixed positions [89]. In this thesis, we concentrate on distributed source localization because the epileptic paroxysms observed in EEG often involve large cortical regions, as shown by several studies [22, 7, 93, 94, 95]. Moreover, we are not only interested in the positions of the sources, but also in their spatial extents. Distributed source localization consists in estimating the amplitudes of all the dipoles of the source space from the data recorded at the surface. Since the number of dipoles of the source space (several thousand) largely exceeds the number of EEG sensors (about one hundred), this is an ill-posed inverse problem. To restore the identifiability of the problem, it is necessary to make additional assumptions about the sources. In this thesis, we distinguish three categories of assumptions (the simplest spatial prior is sketched just after this list):
• assumptions on the spatial distribution of the sources (minimal energy, minimal energy in a transformed domain, sparsity, sparsity in a transformed domain, separability in the space-wave-vector domain, Gaussian probability density with parameterized spatial covariance),
• assumptions on the temporal distribution of the sources (smooth signals, sparsity in a transformed domain, pseudo-periodicity with amplitude variations, separability in the time-frequency domain, non-zero higher-order marginal cumulants), and
• assumptions on the spatio-temporal distribution of the sources (synchronous dipoles, leading to a new data model for distributed sources).
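As a minimal sketch of the simplest of these priors (minimal energy, i.e., Tikhonov-regularized least squares, which underlies the minimum norm family refined by methods such as sLORETA; the regularization parameter lam is a placeholder):

import numpy as np

def minimum_norm(X, G, lam=1e-2):
    # argmin_S ||X - G S||_F^2 + lam ||S||_F^2, written in its kernel form,
    # which is efficient when there are far fewer sensors than dipoles
    n_sensors = G.shape[0]
    return G.T @ np.linalg.solve(G @ G.T + lam * np.eye(n_sensors), X)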
In the literature, a large number of distributed source localization methods have been proposed. Based on methodological considerations, one can distinguish three main families of techniques:
• regularized least squares approaches (cf. Figure 4.1); these include, among others, minimum norm algorithms, such as sLORETA [101] and cLORETA [103], and sparse approaches, such as MCE [110], VB-SCCD [116], and MxNE [126],
• (variational and empirical) Bayesian approaches; an example of such a method is the Champagne algorithm [138, 139],
• scanning approaches for extended sources (MUSIC-type approaches and spatial filters); an example of this family is 2q-ExSo-MUSIC [15].
Within each family, the methods can be further classified according to the exploited assumptions. A fourth family of distributed source localization techniques comprises the tensor-based approaches proposed in this thesis. These methods proceed in two steps:
1. the separation of the different distributed sources using the tensor decomposition (described in Chapter 3) and
2. the identification of the source space dipoles that characterize each distributed source.
To localize the distributed sources from the spatial mixing vectors identified by the tensor decomposition, we introduce a new algorithm, called DA. This method is based on a dictionary of potential distributed sources of circular shape, the disks, which are compared to the spatial mixing vectors by means of a metric (an illustrative sketch of this matching step is given below). This approach is inspired by the optimization strategy of the 2q-ExSo-MUSIC algorithm, but the metric used by DA is different. The performance of this new method, called STWV-DA when based on space-time-wave-vector data and STF-DA when based on space-time-frequency data, is evaluated on simulated and real data in the context of epileptic activity propagating from one patch to a second patch, in comparison with 4-ExSo-MUSIC, sLORETA, and cLORETA. The simulations show that STWV-DA yields good results for superficial patches in the case of spatially prewhitened data, comparable to the results of 4-ExSo-MUSIC and superior to those of sLORETA and cLORETA, but, contrary to the other methods, also on raw data. However, STWV-DA has difficulties localizing deep patches. STF-DA does not yield good results because the STF analysis does not correctly separate the correlated sources. On the real data, the tensor-based approach and 4-ExSo-MUSIC localize regions concordant with the positions of the SEEG electrodes for which frequent epileptic activity was detected. Even though STWV-DA and 4-ExSo-MUSIC yield good results in a number of cases, these methods have difficulties localizing several patches at the same time.
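For illustration, a sketch of such a dictionary-matching step follows; the normalized-correlation score below is a stand-in (the actual DA metric differs), and the `disks` dictionary would be built from the cortical geometry rather than given:

import numpy as np

def match_disk(a, G, disks):
    # a: estimated spatial mixing vector (n_sensors,); G: lead field (n_sensors x n_dipoles);
    # disks: list of arrays of dipole indices, one per candidate disk of the dictionary
    best, best_score = None, -np.inf
    for k, idx in enumerate(disks):
        g = G[:, idx].sum(axis=1)          # scalp topography of the candidate disk
        score = abs(a @ g) / (np.linalg.norm(a) * np.linalg.norm(g))
        if score > best_score:
            best, best_score = k, score
    return best, best_score                # index and score of the best-matching disk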
To overcome this problem, we explore another approach to distributed source localization that exploits the structured sparsity of the spatial distribution of the sources. This approach is based on the VB-SCCD algorithm [116], which showed good performance in a recent comparison of different distributed source localization methods [158]. The VB-SCCD algorithm imposes sparsity on the matrix characterizing the amplitude differences of adjacent dipoles of the source space and permits, in particular, the simultaneous localization of several highly correlated patches. However, this algorithm yields biased spatial distributions and has difficulties separating several sources that are close to each other. Furthermore, the implementation based on the SOCP optimization method [123, 124], as proposed in [116], leads to a high computational complexity that prevents the application of this technique to a large number of time samples. In this thesis, we improve on these points by proposing a new algorithm, called SVB-SCCD, which includes an additional L1-norm regularization term imposing sparsity on the dipole amplitudes. This term avoids the amplitude bias of the estimated spatial distribution and facilitates the separation of the sources. In addition, the size of the reconstructed sources can be adjusted by varying the new regularization parameter added in this approach. We employ the ADMM algorithm [163, 164] (see also [165]) instead of SOCP to solve the optimization problem with a considerably reduced computation time. Finally, we consider exploiting the temporal structure of the data by replacing the L1 norm in the regularization terms with the mixed L12 norm. The resulting algorithm, called L12-SVB-SCCD, yields more robust source estimation results. The better performance of SVB-SCCD and L12-SVB-SCCD compared to VB-SCCD and other distributed source localization methods (4-ExSo-MUSIC and cLORETA) is confirmed by simulations in the context of three patches with highly correlated epileptic signals.
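In regularized least squares form, a plausible writing of the criterion just described is (the notation is assumed here: $\mathbf{T}$ denotes the variational operator taking differences of adjacent dipole amplitudes, and $\lambda, \alpha$ are regularization parameters)

$$\hat{\mathbf{S}} = \arg\min_{\mathbf{S}} \;\|\mathbf{X} - \mathbf{G}\mathbf{S}\|_F^2 + \lambda\,\|\mathbf{T}\mathbf{S}\|_1 + \alpha\,\|\mathbf{S}\|_1,$$

where $\alpha = 0$ corresponds to VB-SCCD and the additional $\ell_1$ term on the amplitudes is the SVB-SCCD modification; in the $L_{1,2}$ variant, the $\ell_1$ norms are replaced by the mixed norm $\|\mathbf{S}\|_{1,2} = \sum_i \|\mathbf{s}_i\|_2$, where $\mathbf{s}_i$ is the temporal profile of the $i$-th row.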
To answer the question whether a (partial) separation of the sources prior to localization can improve the results of the SVB-SCCD algorithm, we consider an approach that combines the tensor decomposition with distributed source localization exploiting structured sparsity. To this end, we conduct a simulation study in which we compare the STWV-DA and SVB-SCCD algorithms with the combined approach STWV-SVB-SCCD. However, for the majority of the tested scenarios, we observe no performance improvement of STWV-SVB-SCCD over STWV-DA and SVB-SCCD. We therefore conclude that the tensor-based preprocessing is only useful for distributed source localization algorithms that cannot localize several patches simultaneously.

Finally, we conduct an in-depth performance study of various source localization methods. More precisely, we compare eight representative algorithms (sLORETA, cLORETA, MCE, MxNE, Champagne, 4-ExSo-MUSIC, STWV-DA, and SVB-SCCD) with respect to their computational complexities and the accuracy of the distributed source estimates on realistic simulated data in the context of propagated epileptic signals. In the simulations, we analyze the influence of the position and size of a distributed source as well as the influence of the number of patches. In summary, the results show that sLORETA, Champagne, MCE, and MxNE estimate the source positions well, but not their spatial extents, since these algorithms were designed for the localization of focal sources. By contrast, 4-ExSo-MUSIC, STWV-DA, and SVB-SCCD also provide a good estimate of the source size. cLORETA yields intermediate results: the dipoles with large amplitudes match the patches well, but it is difficult to delineate the source regions from the estimated spatial distribution. In terms of computational complexity, cLORETA and sLORETA are the most efficient of the tested methods, whereas 4-ExSo-MUSIC has the highest computational cost, followed by Champagne. Overall, STWV-DA and SVB-SCCD appear to be the most promising methods for the localization of (multiple) distributed sources, both in terms of robustness and in terms of the quality of the reconstructed sources.
Chapter 5: Summary and conclusions

In this thesis, we have sought to identify the positions and spatial extents of the epileptic zones from EEG measurements. In particular, we have addressed the problem of localizing simultaneously active brain regions with highly correlated time signals, which arise from the propagation of epileptic phenomena. To deal with this difficult problem, we have proposed an approach composed of three steps: the extraction of the epileptic signals from the noisy data, the separation of correlated sources, and the localization of distributed sources. First, we summarize the techniques that have been developed in this thesis for each of these steps and illustrate their combination on a simulation example. Then we recapitulate our conclusions and, finally, we suggest some directions for future work.
Summary and illustration of the complete data analysis process

To extract the epileptic spikes from EEG data corrupted by artifacts, we exploited the different physiological origins of the sources by assuming that they are statistically independent, and we considered ICA-based methods. More precisely, we developed a new semi-algebraic algorithm, called P-SAUD, which relies on the autocorrelation of the signals to extract the epileptic components using a small number of deflation steps. The denoised EEG data are then reconstructed from the identified epileptic components. Since several distributed source localization methods have difficulties localizing simultaneously active patches, in particular in the case of correlated activities, we then explored the use of tensor-based methods built on the canonical decomposition to separate several potentially correlated sources. Here, we concentrated on transform-based tensor methods that construct a data tensor using a time-frequency transform, leading to the classical STF method, or a space-wave-vector transform, resulting in the new STWV approach. These techniques aim at extracting a spatial mixing vector and a signal vector for each distributed source. Contrary to previous studies of tensor-based approaches [69, 70, 71, 72, 73], which mostly concentrated on source separation and equivalent dipole localization, we went one step further and used the results of the tensor-based preprocessing step for distributed source localization. In this context, an important contribution of this thesis consists in the proposal of the DA algorithm, which allows us to accurately localize distributed sources from the estimated spatial mixing vectors. This method is based on a parameterization of the distributed sources, similar to [152, 15], but uses a different metric to identify the source space elements that best describe the measurements. In addition, we analyzed other distributed source localization approaches that have been proposed in the literature and provided a survey of the different types of a priori information that have been exploited to solve the ill-posed brain inverse problem. These assumptions can roughly be divided into constraints imposed on the spatial distribution of the sources and constraints concerning the temporal distribution of the sources. We then classified the existing distributed source localization algorithms based on methodological considerations and on the exploited a priori information. Finally, we proposed several improvements of the VB-SCCD source localization algorithm, which exploits structured sparsity, to develop an efficient algorithm for the simultaneous localization of multiple patches. The new algorithm is called SVB-SCCD.

To illustrate the combination of the three data processing steps using the P-SAUD, STWV, and DA methods, we consider a simulation example. As for all simulations conducted in this thesis, we use a realistic head model, distributed sources characterized by patches with highly correlated, physiologically plausible epileptic spike signals, and artifacts recorded during an EEG session. This allows us to evaluate the performance of the tested methods in a realistic setting. We simulate 32 s of EEG data for a system with 91 sensors using two patches located in the inferior frontal lobe and the inferior parietal lobe of the left hemisphere, which emit propagated epileptiform spike signals with a delay of about 16 to 18 ms between the two patches. The data are corrupted by muscle artifacts at an SNR of -15 dB. Figure 5.1 (left) shows an excerpt of the noisy EEG measurements for 32 of the 91 electrodes together with the original patches. To separate the epileptic activity and the artifacts, we first apply the P-SAUD algorithm to the raw EEG data, followed by the STWV analysis and the DA algorithm for source separation and localization. The EEG data containing only the epileptic spikes, reconstructed from the P-SAUD results, and the patches localized by STWV-DA, applied to the denoised data, are shown in Figure 5.1 (right). Comparing the original and estimated source regions, we note the good performance of the proposed three-step procedure.
Conclusions

We have demonstrated that the P-SAUD algorithm extracts the epileptic signals with the same accuracy as conventional ICA methods, but at a considerably reduced computational complexity. To achieve this, we combined the strengths of three classical blind source separation methods to derive a new, efficient semi-algebraic deflation method that relies on the autocorrelation of the signals to determine the order of the identified ICA components. Exploiting the temporal structure of the data avoids resorting to reference signals, which were previously used in the ICA-R approach [35, 36, 34, 37, 38, 39] to extract the signals of interest, but which can be difficult to determine in practice.

For the propagated epileptic spike signals that we consider for the source dynamics, the STF tensor method yields poor results. Indeed, this technique cannot separate highly correlated sources because it exploits differences in the time-frequency domain to distinguish the sources, and the differences between the time and frequency characteristics of the different patches are negligible in the context of propagation phenomena. By contrast, the STWV analysis exploits the space-wave-vector content of the sources to separate them, which makes it possible to handle highly correlated source signals as long as the sources are not completely coherent and are sufficiently distant to yield different space and wave-vector characteristics. This explains the good performance of the STWV-based source localization approach observed for a number of scenarios, particularly in the case of several simultaneously active superficial patches. Nevertheless, the STWV analysis fails in certain cases, such as for deep sources, even if there is neither noise nor artifacts and all signals within a patch are identical. This is due to discrepancies between the structure of the EEG data and the trilinear tensor model employed. To clarify in which cases the tensor-based methods can be successfully applied for source separation, we conducted a theoretical analysis revealing that, besides the correlation between the space, time, and wave-vector characteristics of the different sources, the power of the sources is also decisive for their separability.

Even though the EEG preprocessing can sometimes be carried out by ICA or tensor-based methods alone, these two approaches are essentially complementary. Intrinsically, ICA is not suited to the separation of correlated sources and yields poor results for partially overlapping epileptic spike signals, whereas the tensor decomposition methods are not as robust to artifacts as ICA. Consequently, to obtain good preprocessing results for EEG data, which frequently contain correlated sources due to propagation phenomena and are generally corrupted by artifacts with large amplitudes, the two preprocessing methods should be combined. Nevertheless, the preprocessed data should be analyzed with care, since errors of the denoising and source separation steps may accumulate.

The performance study of the different distributed source localization methods showed that the tested algorithms can be divided into methods that only determine the positions of the sources and algorithms that also give indications about the spatial extents of the sources. In the context of epileptic source localization, we are particularly interested in the second type of approach, and we have proposed two new algorithms. The STWV-DA algorithm, which localizes distributed sources based on the results of the tensor decomposition, was shown to yield good results if the sources have been correctly separated in the preprocessing step. In this case, this method outperforms the other distributed source localization techniques.
In particular, even though prewhitening improves the localization results obtained by STWV-DA, this method also yields good results when applied to raw EEG data, which is not always the case for other techniques such as 4-ExSo-MUSIC. This is of great interest because prewhitening requires knowledge of the noise covariance matrix, which is not known and is generally difficult to estimate in practice. Another advantage of STWV-DA over 4-ExSo-MUSIC is its reduced computational complexity. With the growing use of high-resolution EEG, this is an important point for avoiding unacceptably long computation times. However, STWV-DA is not suited to the simultaneous localization of several patches, that is, to the localization of distributed sources that cannot be separated because of their nearly coherent signals. In this case, the SVB-SCCD algorithm should be used. This approach has the additional advantage of being flexible with respect to the patch shape, whereas methods that use a dictionary of potential distributed sources, such as STWV-DA and 4-ExSo-MUSIC, tend to identify patches with shapes comparable to those of the dictionary elements. The SVB-SCCD method also permits the exploitation of the temporal structure, which yields more robust source estimates in difficult cases. Finally, we analyzed the combination of the STWV tensor method and SVB-SCCD for multi-patch scenarios in which only some of the patches can be separated, but the STWV-SVB-SCCD algorithm exhibited a performance slightly inferior to that of SVB-SCCD. This shows that the tensor-based preprocessing approach for source separation is only effective when combined with source localization methods that have difficulties localizing several patches simultaneously, such as, for example, DA.
Perspectives

Based on the obtained results, we can identify several promising directions for future work. First of all, it would be interesting to apply the P-SAUD algorithm to artifact removal from high-resolution EEG data. Due to the large number of electrodes, a particularly high gain in computational complexity compared to conventional ICA methods for source extraction can be expected. Furthermore, one could consider using the P-SAUD algorithm for the denoising of epileptic seizure data, for which a compromise between the COM2 and CCA solutions, based on the penalized contrast function, may lead to better performance.

Concerning the separation of correlated sources based on the canonical decomposition, in this thesis we have only treated STF and STWV tensors. As discussed above, the trilinear approximation is not always justified in this case. To overcome this problem, on the one hand, one could explore the use of tensors with other dimensions, such as space-time-realization (STR) data, which might better fit the structure of the canonical decomposition. On the other hand, one could also employ different tensor decompositions with a more flexible structure that better reflects the true structure of the data.

Another direction for future work in the field of distributed source localization consists in further exploring different combinations of a priori information, for example by merging the strategies of recently established distributed source localization approaches that yield good results, such as the tensor-based techniques, the scanning approaches for extended sources, or the Bayesian approaches exploiting sparsity. In addition, one could try to improve the results of currently used distributed source localization methods by applying techniques that compensate for the systematic error in the amplitude estimation of deep sources. It would also be desirable to develop methods for the automatic thresholding of the reconstructed spatial source distributions. This would make it possible to determine the spatial extent of the source regions from continuous localization solutions such as those obtained with regularized least squares approaches. The methods discussed in this thesis could also be applied to MEG data, and the exploitation of joint EEG/MEG recordings, which we briefly discussed for the tensor-based preprocessing, could be pursued further. Finally, it would be important to carry out additional evaluations on clinical EEG data for which a strong hypothesis about the epileptogenic regions is available, in order to confirm the good functioning of the proposed algorithms in a realistic setting.
Abstract

Electroencephalography (EEG) is a routinely used technique for the diagnosis and management of epilepsy. In this context, the objective of this thesis consists in providing algorithms for the extraction, separation, and localization of epileptic sources from the EEG recordings. In the first part of the thesis, we consider two preprocessing steps applied to raw EEG data. The first step aims at removing muscle artifacts by means of Independent Component Analysis (ICA). In this context, we propose a new semi-algebraic deflation algorithm that extracts the epileptic sources more efficiently than conventional methods, as we demonstrate on simulated and real EEG data. The second step consists in separating correlated sources that can be involved in the propagation of epileptic phenomena. To this end, we explore deterministic tensor decomposition methods exploiting space-time-frequency or space-time-wave-vector data. We compare the two preprocessing methods using computer simulations to determine in which cases ICA, tensor decomposition, or a combination of both should be used. The second part of the thesis is devoted to distributed source localization techniques. After providing a survey and a classification of current state-of-the-art methods, we present an algorithm for distributed source localization that builds on the results of the tensor-based preprocessing methods. The algorithm is evaluated on simulated and real EEG data. Furthermore, we propose several improvements of a source imaging method based on structured sparsity. Finally, a comprehensive performance study of various brain source imaging methods is conducted on physiologically plausible, simulated EEG data.
Résumé

Electroencephalography (EEG) is a technique commonly used for the diagnosis and monitoring of epilepsy. The objective of this thesis consists in providing algorithms for the extraction, separation, and localization of epileptic sources from EEG data. First, we consider two preprocessing steps. The first step aims at removing muscle artifacts by means of Independent Component Analysis (ICA). In this context, we propose a new semi-algebraic deflation algorithm that extracts the epileptic sources more efficiently than conventional methods, as we demonstrate on simulated and real EEG data. The second step consists in separating correlated sources. To this end, we study deterministic tensor decomposition methods exploiting space-time-frequency or space-time-wave-vector data. We compare the two preprocessing methods by means of simulations to determine in which cases ICA, the tensor decomposition, or a combination of both should be used. We then address distributed source localization. After presenting and classifying the state-of-the-art methods, we propose a distributed source localization algorithm that builds on the results of the tensor-based preprocessing. The algorithm is evaluated on simulated and real EEG data. In addition, we propose several improvements of a source localization method based on structured sparsity. Finally, a performance study of various source localization methods is conducted on simulated EEG data.