Sensor array signal processing and the neuro-electromagnetic inverse problem in functional connectivity analysis of the brain

Raoul P. P. P. Grasman

The Netherlands Organization for Scientific Research (NWO) is gratefully acknowledged for funding this project. This research was conducted while Raoul Grasman (527-25-014) was supported by a grant of the NWO foundation for Behavioral and Educational Sciences awarded to Hilde M. Huizenga, Peter C. M. Molenaar, and Leon J. L. Kenemans.

ISBN 90-9018452-X
Printed by PrintPartners Ipskamp B.V., Enschede
Typesetting was done in LaTeX 2ε
Graphics were generated in R

Copyright © 2004 Raoul P. P. P. Grasman. All rights reserved.

SENSOR ARRAY SIGNAL PROCESSING AND THE NEURO-ELECTROMAGNETIC INVERSE PROBLEM IN FUNCTIONAL CONNECTIVITY ANALYSIS OF THE BRAIN

Academic Dissertation (Academisch Proefschrift)

to obtain the degree of doctor at the Universiteit van Amsterdam, under the authority of the Rector Magnificus, prof. mr. P. F. van der Heijden, to be defended in public before a committee appointed by the Doctorate Board (college voor promoties) in the Aula of the University on Tuesday, 21 September 2004, at 10:00

by Raoul Philippe Peter Paul Grasman, born in Haarlem

Doctoral committee:

Promotor:

prof. dr. P. C. M. Molenaar

Co-promotor:

dr. H. M. Huizenga

Other members:

prof. dr. A. Mooijaart
prof. dr. A. Nehorai
prof. dr. V. A. F. Lamme
dr. C. V. Dolan
dr. J. C. de Munck

Faculty of Social and Behavioural Sciences, Department of Psychology


Acknowledgements (dankwoord)

Since a preface has to be opened one way or another, let it be with this: the purpose of writing a dissertation is to learn to conduct scientific research independently, and to report on it through public publications and talks. Although I hope that this has succeeded to some extent (this dissertation should, in principle, be a proof of competence), I would not claim to be a fully trained scientist now—on the contrary, it is only just beginning. Whether I will ever reach that stage remains to be seen, but as with everything: you do not get there alone. A whole army of people stands around you, and so a word of thanks is certainly in order! First of all, I want to thank Peter Molenaar and Hilde Huizenga, my promotor and co-promotor respectively, for offering me the opportunity, after my studies in experimental psycho(physio)logy, to spend several years immersing myself in the inherently and literally multidisciplinary, and much more technical, field of biomedical engineering—it is no small thing to invest your hard-won grant money in someone and then simply have to hope that he makes something of it, the more so because the search was really for someone with a background in the exact sciences (lucky for me that applications were not exactly pouring in...!). Hilde in particular I want to thank for always being willing to read manuscripts yet again and provide them with mercilessly constructive criticism, and for her endless patience with my resistance, obstinacy, and dawdling. Furthermore, I would like to thank the members of the committee for reading and for their approval of the entire manuscript: Prof. Dr. Arye Nehorai of the department of Electrical and Computer Engineering of the University of Illinois at Chicago, Prof. Dr. Victor Lamme of the Psychonomics department of the University of Amsterdam, Prof. Dr. Ab Mooijaart of the Methodology of Psychology department of Leiden University, Dr. Conor Dolan of the Psychological Methodology department of the University of Amsterdam, and Dr. Jan de Munck of the MEG Center of the Free University Medical Center in Amsterdam. I particularly want to mention the people of the NWO priority program (aandachtsgebied): Koen Böcker, Jan de Munck, Leon Kenemans, Fetsje Bijma, and Jérôme Daltrozzo. It is to Koen especially that this NWO program owes its frequent progress meetings. For checking, reading, improving, and struggling through piles of drafts, I express my thanks to Hilde, Lourens, Peter, Koen, Conor, Tony, Mariëtte, Ingmar, Robert, and Marije.


Special thanks, of course, to Lourens Waldorp—fellow student, fellow project assistant, office mate, fellow job applicant, fellow PhD student, office mate again, NOT a fellow research-school member, fellow NWO-program member, above all fellow enthusiast of shared interests (mathematics, mathematical statistics), discussion mate, explaining-things-to-each-other mate, NWO-program mate, discussion mate (the news and such...), book-reading-club mate, fellow conference visitor and thereby also travel mate (Utrecht, Düsseldorf, Groningen, Helsinki/Espoo, Jena), moving mate, office mate once more (but now with a nice view over the city), discussion mate, 'forced' moving mate (that view is still quite nice too—it only promised to get a bit more crowded with the four of us in one room), discussion mate (or did I mention that already?), fellow co-author, paranymph, bachelor-party mate, still discussion mate, and now fellow postdoc as well—for checking my fumblings on paper, for the extensive discussions, and certainly also for always reminding me of this or that appointment or deadline; even contacting the printer on time is thanks to him.... (A tip for anyone looking for a good book: just ask him—almost all the books on mathematical statistics that I refer to, and more besides, I owe to his radar for good books.) Besides work there were hacky sack (Peter, Conor, little Peter, Maarten, Michiel, Denny, Ellen, Robert, Martijn B, Paul, Eric-Jan, Ingrid, Ingmar, Jasper, Lourens, Wery, Mark, Han, Maartje, Richard), the cinema (Paul, Petra, Robert, Marije, Lourens, Ingrid, Robert B, Martijn B, Wery), Friday afternoon drinks/dinners/parties (Lourens, Ingrid, Ellen, Wery, Ingmar, Eric-Jan, Denny, Dave, Hilde G, Verena, Romke, Jaap, René, Diane, Martijn M, Robert, Marije, Janneke, Steven, Jennifer, Heleen, Jasper, Steve, Klaartje, Mante, Marte, Johannes), office conviviality (Lourens, Bert, Margot, little Peter, Wery, Annemie, Eric-Jan), the always welcome daily interruptions/discussions/idle quarter-hours/consultations/divergences (Ellen, Lourens, Eric-Jan, Han, Ingmar, Annemie, Mark, Wery, little Peter, big Peter, Conor, Hilde, Richard, Robert (California!)), coffee-corner sessions (Mark, Brenda, Rena, Ingmar, Ingrid, Eveline, Mariëtte, Meindert, Riek, Margot, Han, Richard, Atie), the EPOS PhD days in Egmond aan Zee and Noordwijk (Peter F, Gea, Eric-Jan, Mark, Rena, René, Diane, Hilde G, Sander N, Ingrid, Robert, Martijn M, Pauline, Durk, Bjørn, Durk, Paul, Michiel de R., Tako, Letty, Eveline, Bram, Noortje, Hedderik, Mark R, Wery), and the colleagues of the 10th floor (Maurits, Han, Hilde, Riek, Maartje, Richard, Ad, Margot, Louis, Atie, Bert, Cor, Eveline, Mariëtte, Michel, Mirjana, Verena, Annemat, Jan, Meindert, Brenda, Ingmar, Jasper, Silvan, Guido, Eric-Jan, Annemie, Lourens, Rena, Mark, and Wery). Without all of that I would not have kept tormenting myself and would have quit long ago. Finally, I would like to express my gratitude towards the "R Development Core Team" and the Free Software Foundation for their contribution to the rise of mankind with their R project and their promotion of the open-source gospel, respectively, and of course also the people of MiKTeX at dante.de for their TeX/LaTeX implementation.

Table of Contents

1 Introduction  1
  1.1 Origins of EEG and MEG  2
  1.2 Forward and inverse problem: dipole model and localization  3
    1.2.1 Nonlinear regression: a limited number of equivalent current dipoles  5
    1.2.2 Linear search methods  7
    1.2.3 Distributed source models: Linear current density reconstruction  9
  1.3 Inverse methods for functional connectivity research  10
  1.4 This thesis  11
  1.5 Notation  13

2 Cortico-cortical interactions and their analysis  14
  2.1 Interactions Among Nerve Cells  14
  2.2 Point Process Description of Neuronal Interactions  15
  2.3 Noninvasive Methods for Determining Cortico-Cortical Interactions  19
    2.3.1 Terminology and Some Theoretical Considerations  19
    2.3.2 Cross-Correlation Analysis  21
    2.3.3 Coherence Analysis  22
    2.3.4 Phase-locking  27
    2.3.5 Event Related (De-)Synchronization  29
    2.3.6 PCA  30
    2.3.7 Path Analysis  31
    2.3.8 Parametric Modelling: Vector Auto-Regressive Models  32
    2.3.9 Parametric modelling: Dynamic Factor Analysis  33
  2.4 Discussion  34

3 Mean and Covariance Structures for Complex Random Variables  38
  3.1 Complex random variables  38
  3.2 Maximum likelihood mean and covariance structure analysis for complex normal random variables  39
    3.2.1 Concentrated likelihood methods  40
    3.2.2 Special structure: (pseudo) confirmatory factor model  41
  3.3 Generalized least squares in covariance structure analysis of complex random variables  49
    3.3.1 Best GLS estimators in covariance structure analysis of complex variables  50
    3.3.2 Special structure: (pseudo) confirmatory factor model structure  53
  3.4 Cross-spectrum structures in the analysis of time series  56

4 Frequency domain simultaneous source and source coherence estimation with an application to MEG  59
  4.1 Introduction  59
  4.2 Method  60
    4.2.1 Model specification  60
    4.2.2 Parameter estimation  62
  4.3 Simulations  64
    4.3.1 Data generation  64
    4.3.2 Parameter estimation  65
    4.3.3 Simulation results  66
    4.3.4 Starting values and algorithm convergence  67
  4.4 Application to MEG data  68
    4.4.1 Conclusions  69
  4.5 Discussion  70

5 Stochastic maximum likelihood mean and cross-spectrum structure estimation of EEG/MEG dipole sources  73
  5.1 Introduction  73
  5.2 Dipole model and measurements model  74
  5.3 Model specification  75
    5.3.1 Mean and cross-spectrum structure  75
    5.3.2 Linear filter model for interactions  76
    5.3.3 Structure of cross-spectrum Θk of the noise signals  77
  5.4 Parameter estimation  77
    5.4.1 The case that B = 0 (unparameterized Ψ)  78
    5.4.2 The case that B ≠ 0  80
    5.4.3 Assessment of model fit  80
  5.5 Simulations  81
  5.6 Concluding remarks  83

6 Optimizing interpretability of averaging kernels for the neuroelectromagnetic inverse problem  86
  6.1 Introduction  86
  6.2 Theory  87
    6.2.1 Estimable functions  87
    6.2.2 Relation with linear methods  88
    6.2.3 Optimized criteria of linear methods  89
  6.3 Method  89
    6.3.1 "Simpleness" and simple estimable functions  89
    6.3.2 Finding 'simple' linear combinations  90
  6.4 Numerical illustrations  91
    6.4.1 Rotation of the lead field: comparison of the criteria  91
    6.4.2 Comparison with linear techniques  94
  6.5 Discussion  95

7 Summary and discussion: requirements for accurate connectivity estimation  97
  7.1 Summary  97
    7.1.1 Justifiable attribution of cortical dynamics parameters to localized brain areas  97
    7.1.2 Multiple dipoles modelling of functional and effective connectivity  98
    7.1.3 Linear inverse methods in connectivity research  99
  7.2 General discussion  100
    7.2.1 Functional connectivity estimation with neuroelectromagnetic inverse techniques  100
    7.2.2 Requirements for unambiguous functional connectivity estimation: 'Signal' to 'noise' ratio  103
    7.2.3 Directions for improvement: Increasing SNR  105

A Some operations on complex matrices  109
  A.1 Algebra of certain operators on complex matrices  109
  A.2 Commutation and Conjugation matrices, and the vech{} operator  112
  A.3 Some results concerning the Moore-Penrose inverse  115
  A.4 Further results on the operator {}R  116
    A.4.1 Var(vec{V}) and Var([vech{S}]R)  117

B Nonlinear weighted least squares for complex random variables  120

C Generalization of the concentrated likelihood method  123

Nederlandse Samenvatting (Summary in Dutch)  127

References  133


1 Introduction

Cognitive neuroscience combines research objectives from cognitive psychology and the neural sciences. Its main objectives are to identify the brain systems involved in different types of information processing, and to establish how various brain systems cooperate to give rise to basic cognitive functioning [127]. It does so by studying the brain in human subjects as they are engaged in the elementary mental operations required by a cognitive task. Even the simplest cognitive task requires the collaboration of a large number of specialized brain systems. A simple button-press response to a sensory stimulus involves the coordination of sensory, association and other areas that analyze the stimulus, the motor systems that execute the response, and other systems that serve to allocate and direct attentional resources, match incoming signals to task requirements, and decide what action to take [77]. Intercortical neural networks that link different specialized cortical regions are thought to coordinate the neural assemblies involved in elementary cognitive operations [24, 230]. These assemblies have been reported to last from tens of milliseconds to several seconds [99]—long enough for the activity to propagate through the networks between cortical areas with transmission delays of tens of milliseconds (cf. [230]). It has therefore been suggested that the relevant measures describing these networks are not the activities of the local assemblies engaged in the intercortical network, but the parameters that describe the dynamics of their interactions [24, 230]. This thesis is concerned with the possibility of non-invasively determining cortical dynamics. In particular, it is concerned with techniques for the detection and modelling of cortico-cortical interactions on the basis of neuro-electromagnetic signals that are non-invasively obtained from human subjects. These electromagnetic signals, the electro- and magnetoencephalogram, originate from within the brain and reflect highly synchronized activity among large numbers of neurons. The electroencephalogram (EEG) measures electric potential differences between pairs of electrodes on the scalp. The magnetoencephalogram (MEG), the magnetic counterpart of the EEG, measures the magnetic induction near the scalp. Because EEG and MEG field patterns are mutually orthogonal, they complement each other, in that the spatial gradients of their field patterns provide more accurate spatial information in complementary directions [102, 114]. MEG is less sensitive to inhomogeneous conductive properties of the head, and has therefore been argued to provide scalp topographies with better spatial resolution than EEG does [102, 240]. EEG, on the other hand, is more sensitive to neural activity located deep inside the brain [114]. Event-related brain responses—brain responses evoked by the presentation of a sensory stimulus or the requirements of a cognitive task—are often buried in background brain activity. To improve the signal-to-noise ratio (that is, the ratio of event-related to background activity), the responses are recorded in multiple trials and then averaged across these trials. The resulting average waveform is referred to as the event-related potential/field (ERP/ERF).


The underlying assumption in this averaging is that the event-related response is the same in each trial, and that background activity, which is not phase-locked to the event, is independent of the event-related activity. By far the largest proportion of research involving electro- and magnetoencephalographic signals has been concerned with event-related activity. Extensive experimentation has shown that ERPs/ERFs consist of a series of components, whose waveforms can be manipulated more or less individually by varying stimulus characteristics or task requirements [24]. A significant problem in the investigation of ERPs/ERFs is that these components reflect a combination of potentials/magnetic induction, in unknown proportions, from multiple neural populations [24]. Hence, an important task in ERP/ERF studies is to decompose the signals into their underlying constituent neural sources [24]; more recently, therefore, the locations of the sources of these waveforms have attracted increasing interest. The assumption underlying the ERP/ERF that the responses are the same across trials is rather strong, and non-stationary effects, like habituation and fatigue of the subject, have been of some concern in the literature (e.g., [163]). Interest in trial-to-trial variations of the response has therefore increased, and single-trial analysis has become a challenge for methodology development [12]. Furthermore, the focus has shifted more and more towards analyzing the variance of, and covariance between, EEG/MEG sensor signals, and their relation to the dynamics of cortical interactions in basic cognitive processes [63, 69, 70, 77].
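To make the averaging logic concrete, the following minimal R sketch simulates exactly the assumed situation: an identical response in every trial plus independent background activity. The waveform, trial count, and noise level are invented for illustration.

```r
## Trial averaging under the ERP assumptions described above.
set.seed(1)
L <- 100; n <- 250                            # trials, samples per epoch
t <- seq(0, 1, length.out = n)
erp <- 5 * exp(-((t - 0.3) / 0.05)^2)         # hypothetical evoked component
trials <- matrix(erp, L, n, byrow = TRUE) +   # same response in every trial...
          matrix(rnorm(L * n, sd = 5), L, n)  # ...plus independent background
avg <- colMeans(trials)                       # noise s.d. shrinks by factor sqrt(L)
matplot(t, cbind(trials[1, ], avg), type = "l", lty = 1,
        col = c("grey", "black"), xlab = "time (s)", ylab = "amplitude")
```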

1.1 Origins of EEG and MEG

When an external stimulus impinges on the senses, neurons in the sense organs send afferent volleys of action potentials along their axons to the brain. In the brain, these axons terminate on the dendrites of other neural cells. The action potential volleys increase the likelihood of the release of neurotransmitter at the axon's terminal. Released neurotransmitter binds to the postsynaptic cell and locally changes the conductivity properties of the dendritic membrane of the postsynaptic cell. As a result of this change in conductivity, currents leak into (or out of) the dendrite, causing a deflection in the local dendritic transmembrane potential. This phasic change of the transmembrane potential is called the postsynaptic potential (PSP). The postsynaptic potential sets up an intracellular potential difference between the affected dendrite and the cell's soma, from which an intracellular current arises, called the primary current, that restores the potential equilibrium inside the cell. The local depletion of current at the synapse outside the cell, and the buildup of charge near the soma, induce extracellular return currents. These return currents close the current loop, so that there is no net buildup of charge [12, 102, 127]. Return currents are also called secondary or volume currents. Both primary and volume currents associated with postsynaptic potentials are believed to contribute to the neuroelectric and neuromagnetic fields measured with EEG and MEG [102, 240]. Action potentials in axons, on the other hand, are believed not to contribute to the externally measured fields: although, like PSPs, they are constituted by inwardly directed currents, these currents flow in two opposing directions, so that the field largely cancels and can barely be detected at (relatively) great distances. The superposition principle of electromagnetic theory implies that the fields associated with different PSPs contribute linearly to the externally measured fields—i.e., the externally measured field is a simple sum of the individual fields. The current-dipole moment associated with a PSP is very small, of the order of 20 fA·m, while typical current estimates on the basis of the magnetic field measured outside the head are much larger (∼10 nA·m). Because of this difference, it has been determined that about a million synapses need to be active simultaneously to give rise to the measured field [102, p. 424]. Due to partial cancellation of the fields, however, many more may need to be active, and—based on the observation that a typical neuron receives input through a few thousand synapses—it has been estimated that an externally detectable field requires several thousand neurons to be synchronously active [102]. Furthermore, these synchronized neurons should have their apical dendrites aligned in parallel, so that their fields do not cancel each other.


Most neural cells in the cortex are stellate cells and pyramidal neurons. Pyramidal neurons are particularly well arranged for their fields to be detected outside the skull: their apical dendrites are aligned in parallel with each other, and are arranged perpendicularly to the surface of the cortex [102, 127]. Furthermore, neighboring cells in the cortex (organized in macrocolumns [127, 171]) receive closely related input activity, and the duration of the PSPs in their dendrites is relatively long (approximately 10 msec). Therefore, these cells can easily synchronize their PSPs—that is, have their PSPs overlap in time to a large extent. This is another reason for believing that axonal action potentials do not contribute to the externally measured field: they have very short durations and, therefore, do not easily overlap.

1.2 Forward and inverse problem: dipole model and localization

Forward problem. If a current flows in a straight portion of a nerve fiber of uniform thickness, it can be viewed as a tiny element of current. This tiny current element has its 'source' and 'sink' very near one another (compared to their distance to the measurement sensor). The mathematical idealization of this situation, in which source and sink get arbitrarily close together, is called a current dipole. A current dipole is determined by its location in three-dimensional space and by its moment; the moment is determined by its orientation in space and its amplitude. From a distance (e.g., at the scalp), the primary current associated with a PSP looks like a (mathematical) current dipole, with its orientation along the dendrite in which the current flows [102]. If the primary current(s) and the conductivity distribution of the surrounding medium are known, the resulting electric (EEG) and magnetic (MEG) fields can be calculated from Maxwell's equations. Knowing the primary currents, and given the conductivity profile of the volume conductor containing them, therefore enables one to predict the measured electric and magnetic fields. The calculations for doing this are not always easy, and performing them is referred to as 'solving the forward problem'. One simplifying feature is that, because of the relatively long time course on which the neural processes take place, it is reasonable to adopt the quasi-static approximation to the Maxwell equations, in which time-dependent changes of the fields are neglected. Furthermore, adopting certain approximations to the volume conductor geometry of the head can lead to considerably simpler calculations in solving the forward problem.

As an example, we consider the simplest case of a conductor: an (infinite) homogeneous isotropic conductor of known conductivity σ, whose medium is assumed to be linear to avoid dielectric complexities. The quasi-static version of Maxwell's equations implies that the electric field E can be written as the gradient of a scalar potential field, E = −∇V, and that the curl of the magnetic field B at each point is proportional to the current density J at that point, ∇×B = µ0 J. Here µ0 is the magnetic permeability of free space. Furthermore, because the magnetic field is purely rotational, the divergence of the magnetic field is always zero, ∇·B = 0. For conciseness we have suppressed the dependence of the fields, the potential, and the current density on the location r. The current density J inside the conductor consists of the primary currents Jp, and return currents that result from the electric field E set up by Jp and from the conductivity of the conductor: J = Jp + σE = Jp − σ∇V. The interest lies in determining the potential V and the magnetic field B from knowledge of the primary current Jp. Because the divergence of a curl is always zero, we have ∇·∇×B = 0 = µ0 ∇·J = µ0 ∇·Jp − µ0 σ∇²V, where ∇² is the Laplace operator. Hence, for the potential, we find the (Poisson) equation σ∇²V = ∇·Jp. If V is required to vanish at infinity, the solution to this equation at the (sensor) location r_a is V(r_a) = −(4πσ)⁻¹ ∫_G [∇·Jp(r)]/‖r_a − r‖ d³r [95]. The region of integration, G, contains all primary currents.

The solution to the magnetic field equation ∇×B = µ0 J is known as the Biot–Savart law: B(r_a) = (µ0/4π) ∫_G J × (r_a − r)/‖r_a − r‖³ d³r = (µ0/4π) ∫_G (Jp − σ∇V) × (r_a − r)/‖r_a − r‖³ d³r. With the help of the equalities ∇(1/‖r_a − r‖) = (r_a − r)/‖r_a − r‖³ and ∇·(φA) = ∇φ·A + φ∇·A for scalar φ and vector A, together with Gauss' divergence theorem, and by taking G arbitrarily large, we find V(r_a) = (4πσ)⁻¹ ∫_G Jp·(r_a − r)/‖r_a − r‖³ d³r.


If the primary current consists of a single dipole with moment q, i.e., Jp = q δ(r − r0), where δ is Dirac's delta, substituting this in the above equation gives the solution to the forward problem for the potential: V(r_a) = (4πσ)⁻¹ q·(r_a − r0)/‖r_a − r0‖³. Substituting this solution into B(r_a) above yields the solution for the magnetic field, B(r_a) = (µ0/4π) q × (r_a − r0)/‖r_a − r0‖³, because ∇V = 0, as r does not appear in V for this primary current. The simplicity of these solutions can be entirely attributed to the assumption of a homogeneous volume conductor. For more complicated conductor geometries, the solutions for V and B become exceedingly complex. This can be seen from ∇·∇×B = 0 = ∇·J: if the assumption of homogeneous conductance does not hold, this equation implies ∇·Jp = ∇·(σ∇V) = ∇σ·∇V + σ∇²V. The latter shows that the change in conductance, ∇σ, has to be taken into account. Intuitively, it furthermore indicates that in a piecewise constant conductor, in which ∇σ is zero everywhere except at the boundaries of the constituent pieces, the complications in solving the forward problem are due to these boundaries. Piecewise constant conductors play an important role in general approaches to approximate solutions of the forward problem, in particular in numerical methods, like boundary and finite element methods, that cope with realistic conductor models [102, 195].

It follows from the superposition principle that, once we possess the solution to the forward problem for an elementary current dipole in a given conductor geometry, the fields of more complex sources (multiple PSPs) can readily be obtained by summing the individual fields [102].¹ Forward solutions for dipole sources in EEG and MEG have been developed for many different conductor geometries of the head [52, 168, 195]. In some special conductor geometries, the volume currents (due to the electric field set up by the primary currents) cause a field equal but opposite to that generated by the primary current; the net external field is then zero [102]. An important example of this is a spherically symmetric conductor: only currents that have a component tangential to the surface of a spherically symmetric conductor produce a magnetic field outside it; radial sources are externally silent [102, 195]. This is important, since, at least locally, the head is well approximated by a sphere at many locations. MEG is therefore mostly sensitive to tangentially oriented dipole sources. Such sources are generally found in the fissures of the cortex, where the folding of the cortical sheet is such that the apical dendrites of the pyramidal neurons are oriented approximately tangential to the surface of the nearby skull [102]. An important feature of forward solutions is that, due to the superposition principle, it is always possible to write the relation between the fields and the current dipoles in such a way that the dipole moment q_0 (a 3-vector) enters the equation linearly. This means that, given a dipole source at location r_0, the measurement y_a (either EEG or MEG) can be written in the form y_a = λ_a(r_0) q_0. The 1 × 3 row vector λ_a is called the lead field, and depends on the type of measurement (EEG or MEG), on the volume conductor geometry, on the location of the source, and on the location r_a of the sensor at which y_a is measured. The lead field is often used to simplify calculating solutions to the inverse problem discussed next. (The lead field for EEG in the previous simple example of an infinite homogeneous volume conductor, for example, is given by λ_a = (4πσ)⁻¹ (r_a − r_0)′/‖r_a − r_0‖³.)
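The closed-form solutions just derived are easy to evaluate numerically. The following R sketch computes the potential, the magnetic field, and the EEG lead field of a single dipole in the infinite homogeneous conductor; the conductivity, dipole, and sensor values are invented for illustration.

```r
## Forward solution for one current dipole in an infinite homogeneous conductor.
sigma <- 0.33                           # conductivity (S/m), a typical tissue value
mu0   <- 4e-7 * pi                      # magnetic permeability of free space
cross3 <- function(a, b)                # 3D cross product
  c(a[2]*b[3] - a[3]*b[2], a[3]*b[1] - a[1]*b[3], a[1]*b[2] - a[2]*b[1])

## EEG lead field row vector: lambda_a = (r_a - r_0)' / (4 pi sigma ||r_a - r_0||^3)
leadfield <- function(ra, r0) { d <- ra - r0; d / (4*pi*sigma * sum(d^2)^1.5) }

r0 <- c(0, 0, 0.07)                     # dipole location (m)
q  <- c(1e-8, 0, 0)                     # dipole moment (10 nA.m, x-oriented)
ra <- c(0, 0.05, 0.12)                  # a sensor location (m)
V  <- sum(leadfield(ra, r0) * q)        # potential: V(r_a) = lambda_a(r_0) q_0
B  <- mu0/(4*pi) * cross3(q, ra - r0) / sum((ra - r0)^2)^1.5  # magnetic field
```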

Inverse problem. When interpreting MEG and EEG data in terms of the origins of their sources, one is dealing with the neuro-electromagnetic inverse problem: the primary current distribution is unknown, while its aggregate field is known—or, more precisely, measured by the EEG and MEG. The question then is: what primary current could underlie the externally measured field? This problem has no unique solution [102]. One must therefore introduce extra information, not contained in the EEG or MEG signals, to obtain a reconstruction of the primary current sources. This information can be incorporated into source models. There are two basic starting points for doing this: one is to specify an assumed, limited number of current dipoles; the other is to favor reconstructions that optimize certain global properties, so as to yield an image of the current distribution in the entire brain volume, i.e., a tomography of the current density in the brain [93, 102, 178].

¹ In fact, a more general statement is true: any arbitrary primary current distribution, irrespective of its geometry, can be approximated arbitrarily closely by an infinite number of current dipoles.


Methods developed within the contexts of these two starting points are briefly introduced next. Before continuing, however, we first set down some notational conventions; these conventions will be maintained throughout this thesis, and are repeated at the end of this chapter for easy reference. Bold uppercase letters will be used for matrices, X = (x_ab), bold lowercase letters for vectors; X′ denotes the transpose of a matrix, i.e., X′ = (x′_ab) = (x_ba). The Frobenius norm of a matrix X, defined as the square root of the sum of the squared components of X, is denoted by ‖X‖_F. If the matrix is a vector x, this norm is also denoted ‖x‖. The notation X⁻¹ denotes the inverse of a nonsingular square matrix X, while for an arbitrarily shaped matrix Y of arbitrary rank, Y⁺ denotes its Moore–Penrose generalized inverse, a definition of which may be found in App. A (p. 110).

1.2.1 Nonlinear regression: a limited number of equivalent current dipoles

The current dipole is a popular source model for EEG and MEG. It is used to approximate the flow of electrical current in a small area of cortex [12]. This use is known as equivalent current dipole (ECD) modelling, because many thousands of tiny, weak dendritic current elements are approximated by one strong current dipole. For EEG measurements it has been shown in [57] that this approximation is rather adequate: the field of a single current dipole can approximate, with little error, the field of a circular patch of cortex whose surface is filled with tiny current dipoles, for patch radii up to a third of the radius of the head. This, however, also indicates the limited ability of ECD models to be very specific about the extent of the cortical source underlying the EEG. The objective of solving the inverse problem in this case is to determine the parameters that describe these dipoles—i.e., their locations and their moments (orientations and amplitudes). Given a set of (instantaneous) measurements y′ = (y_1, y_2, ..., y_m), obtained with a sensor array that covers the head, and given that d dipole sources are active, located at r_1, ..., r_d, with moments q_1, ..., q_d, the measurements y can be related to the sources by the equations y_a = [λ_a(r_1), ..., λ_a(r_d)] (q_1′, ..., q_d′)′, for a = 1, ..., m. Collecting the locations in θ_0′ = (r_1′, ..., r_d′) and the moments in q_0′ = (q_1′, ..., q_d′), in matrix form this is written y = Λ(θ_0) q_0, where the a-th row of the m × 3d lead field matrix Λ is [λ_a(r_1), ..., λ_a(r_d)]. In any measurement situation, measurement noise confounds the data; therefore, y = Λ(θ_0) q_0 + ε, where ε denotes the measurement noise. The most widely used approach to solving the inverse problem in the ECD modelling approach is to minimize the squared error, i.e., to obtain the (ordinary) least squares solution

(θ̆, q̆) = arg min_{θ,q} ‖y − Λ(θ)q‖².
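A minimal R sketch of this least squares fit, using the same infinite-conductor lead field as above; the sensor geometry, noise level, and starting values are all invented, and a single dipole is assumed. The moment is parameterized in units of 10 nA·m so that all six parameters are of comparable scale.

```r
## Single-dipole least squares fit in the infinite homogeneous conductor.
set.seed(2)
sigma <- 0.33
leadfield <- function(ra, r0) { d <- ra - r0; d / (4*pi*sigma * sum(d^2)^1.5) }
sensors <- matrix(rnorm(30 * 3), 30, 3)               # 30 hypothetical sensors...
sensors <- 0.12 * sensors / sqrt(rowSums(sensors^2))  # ...on a 12 cm sphere
Lambda  <- function(r0) t(apply(sensors, 1, leadfield, r0 = r0))  # m x 3 lead field
y <- Lambda(c(0, 0, 0.07)) %*% c(1, 0, 0) * 1e-8 + rnorm(30, sd = 1e-9)

rss <- function(p)  # squared error for location p[1:3], moment p[4:6] (x 10 nA.m)
  sum((y - Lambda(p[1:3]) %*% p[4:6] * 1e-8)^2)
fit <- optim(c(0, 0, 0.05, 1, 0, 0), rss,             # simplex search from a guess;
             control = list(maxit = 5000, reltol = 1e-14))
fit$par[1:3]                # estimated location (true value: 0, 0, 0.07)
```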

This problem is usually solved by general numerical methods for optimization, such as simplex, Gauss–Newton, or Newton–Raphson methods, and variants thereof [12, 79, 198]. These methods are iterative search algorithms, and involve the major computational burden in solving the inverse problem in this approach. The burden can be especially high when segments of the measured signals are to be modelled: when the sensor array measurements are obtained at multiple points in time, n say, we have a sequence of data vectors, {y_t}_{t=1}^n. It is possible to solve the inverse problem at each time point independently of the other time points, but more often {y_t}_{t=1}^n is taken to be due to a time-dependent sequence of dipole moments {q_{0,t}}_{t=1}^n of dipoles assuming fixed locations θ over time. This assumption constitutes the so-called spatiotemporal dipole model.


By arranging the sequence of measurements in an m × n matrix Y = (y_1, ..., y_n), and similarly arranging the moments in the 3d × n matrix Q_0 = (q_{0,1}, ..., q_{0,n}), the inverse problem can be stated as (θ̆, Q̆) = arg min_{θ,Q} ‖Y − Λ(θ)Q‖²_F. In this case, the necessary computations can be greatly reduced by solving the equivalent² problem θ̆ = arg min_θ ‖[I − Λ(θ)Λ⁺(θ)]Y‖²_F, which has a greatly reduced search space, as it depends only on θ [80, 169]. After θ̆ is found in this way, Q̆ may be solved for as Q̆ = Λ⁺(θ̆)Y. Because in this solution the dipole moments are left free to assume any orientation at each moment, it is sometimes called the spatiotemporal rotating dipole model [12, 102, 169]. Alternatively, a spatiotemporal model can be adopted in which the dipole orientations are also assumed fixed over time, while only the source amplitudes vary.

If the data were noiseless and there were any discrepancy between y and Λ(θ̆)q̆, this discrepancy would be due to modelling error—i.e., errors in Λ or in the assumed number of sources. Here we do not consider such error, which is mostly addressed by calculating the percentage of residual variance (%RV). Because the data are subject to noise, discrepancies arise that are not due to modelling error; a solution to the inverse problem in this case can only be approximate. Therefore, for a given least squares solution to the inverse problem, the question quickly arises how much confidence one should have in this solution, in view of the quality of the data. Furthermore, one may ask whether it is the most precise solution possible. To answer these questions, the nature of the noise has to be considered more carefully; the inverse problem is then treated as a nonlinear regression problem [108, 200]. Noise in the measurements may have various causes. In EEG, noise often results from interference of electric fields in the neighborhood of the measurement device, from badly attached electrodes, from high impedances, etc. Furthermore, the noise level may not be equal in all sensors, due to imbalanced impedances. In MEG, sensor noise also results from all sorts of fields, e.g., those induced by moving metal objects like braces. As indicated previously, in ERP/ERF studies the ongoing background EEG/MEG is also defined to be noise, while the ERP/ERF itself is considered to be 'the data'—that is, y is an average of measurements in repeated trials. Averaging the signals over multiple trials (L, say) effectively decreases the noise level in y, but it does not eliminate the noise. We let ε denote the noise that is left after averaging; it comprises both instrumental noise and background EEG/MEG. Because EEG/MEG signals of different sensors are highly correlated, the components of ε will be highly correlated. We shall write Υ/L for the variance–covariance matrix of ε, and Υ^{-1/2} for the inverse of the symmetric "square root" of Υ. For this type of noise, the best least squares solution to the instantaneous problem is obtained by solving [108, 112]

(θ̃, q̃) = arg min_{θ,q} ‖Υ^{-1/2}[y − Λ(θ)q]‖²,

and is called the best generalized least squares (BGLS) estimate.

It is best in the sense that it is the most precise least squares estimate that can be attained [5, 29]. Solving this nonlinear regression problem is equivalent to solving θ̃ = arg min_θ ‖{I − Υ^{-1/2}Λ(θ)[Υ^{-1/2}Λ(θ)]⁺}Υ^{-1/2}y‖², which, again, reduces the dimension of the parameter space for the numerical search method [108]. As in most regression problems, the matrix Υ is unknown, and in this case it is estimated from the trial-to-trial variation around y: Υ̂ = V, where V is the sample covariance matrix of the trial data. The estimate (θ̃, q̃) is then called the (feasible) generalized least squares (GLS) estimate [5]. In the case of trial-averaged data, for instantaneous y, more can be said about ε: because ε is an average of signals from multiple trials, often with L ≥ 30, the central limit theorem (CLT) allows ε to be treated as a multivariate normal random variable. This justifies the use of normal-theory maximum likelihood (ML) estimation of the dipole parameters. Furthermore, since each of the elements of the covariance matrix of the noise is usually freely estimated, and the ML estimator of Υ is essentially the same as the estimate used in the GLS method, the ML estimate is equal to the GLS estimate.

² The justification for this equivalence is discussed in chapter 3.


An important consequence of this is that, because consistent ML estimators attain the Cramér–Rao lower bound on the estimation error for unbiased estimators, the GLS estimators are asymptotically efficient. This means that these estimators are the most precise possible in view of the noisiness of the data (provided no additional information on Υ is available). GLS estimation in the spatiotemporal dipole model is a more complicated extension of the instantaneous model than was the case for the ordinary least squares method. This is due to the fact that the noise vectors at different time points, {ε_t}_{t=1}^n, are likely to be correlated. Since the covariances between the elements of ε_t, and between these elements at different time points, are generally unknown, they have to be estimated. The covariance matrix in that case, however, would be a square matrix of dimensions mn × mn. Since the number of sensors is usually large (m > 60, at least in dipole modelling studies), and the number of time points is often greater, this covariance matrix can be quite large (e.g., if m = 60 and n = 10, the noise covariance matrix is 600 × 600, with 600(600 + 1)/2 = 180,300 unique elements to be estimated). With the customary 100 to 400 trials (often even fewer), this matrix cannot be estimated in the usual (nonparametric) way with any degree of accuracy. Therefore, the (feasible) GLS approach is infeasible for the spatiotemporal dipole model. One method to circumvent this problem is to simply ignore the time correlations in the noise, thereby losing (asymptotic) efficiency of the estimators. A better method is to use a parametric model for the mn × mn noise covariance matrix [17, 55, 109]; the Kronecker product structure adopted in these latter references is particularly convenient for computation in the inverse problem. An alternative approach to resolve the issue is to transform the signal segments into the frequency domain by means of a discrete Fourier transform (DFT) [189, 191]. For wide-sense stationary noise signals, this transform asymptotically (for signal segments of long duration) removes the time correlations in the data, so that only the covariances between sensors need to be estimated. The latter can be done across trials, in the same manner as in the instantaneous GLS method. This approach will be used extensively in this thesis, and will be discussed in detail in later chapters.
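A small R sketch of this frequency-domain route: each trial segment is Fourier transformed, after which the sensor cross-spectrum at a given frequency bin is estimated across trials, just as the covariance matrix is estimated across trials in the instantaneous GLS method. The dimensions and the AR(1)-type noise are invented for illustration.

```r
## DFT decorrelation idea: per-bin cross-spectrum estimation across trials.
set.seed(3)
m <- 5; n <- 256; L <- 100                 # sensors, samples per segment, trials
X <- array(0, c(n, m, L))
for (l in 1:L)                             # temporally correlated noise per trial
  X[, , l] <- apply(matrix(rnorm(n * m), n, m), 2,
                    function(e) filter(e, 0.7, method = "recursive"))
dft <- apply(X, c(2, 3), fft)              # n x m x L array of Fourier coefficients
k  <- 11                                   # a single frequency bin
Yk <- t(dft[k, , ])                        # L x m trial coefficients at bin k
Yc <- sweep(Yk, 2, colMeans(Yk))           # center across trials
Sk <- t(Conj(Yc)) %*% Yc / (L - 1)         # Hermitian m x m cross-spectrum at bin k
## across bins the coefficients are asymptotically uncorrelated, so only these
## m x m matrices, rather than one mn x mn covariance, need to be estimated
```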
Three prominent objections against the nonlinear search methods have been voiced in the literature: i) the high-dimensional parameter space that must be searched, resulting in time-consuming computations; ii) the fact that most nonlinear search algorithms are not guaranteed to find the true least squares estimate, because of local minima; and iii) the fact that the number of dipoles in the model must be chosen a priori. With respect to i), reductions in the dimension of the parameter space were considered above. When multiple dipoles are estimated, however, this remains a problem, due in part to the 'strong' nonlinearity of the lead field function Λ [12] (some troubling effects of the nonlinearity may be alleviated by finding a suitable reparametrization of the model; general methods for finding these exist [200]). Regarding ii), local minima can be avoided by using multiple starting values or specialized algorithms (e.g., simulated annealing and genetic programming [59, 75, 103, 227]), so this does not have to be an urgent problem. The combination of i) and ii), however, can make multiple starting values impractical. Problem iii) is prominent because an inadequate number of sources leads to serious bias in the source estimates [108]. It can be remedied, however, by fitting multiple models with different numbers of dipoles and choosing the best model; several model selection procedures to rationalize the model choice are available [169, 221, 237].

1.2.2 Linear search methods

To circumvent the aforementioned objections against nonlinear searches, quicker alternatives have been sought. These all build on the idea of subdividing the head into a large number of small voxels, and scanning all voxels for their contribution to the externally measured field. The most important of these are the so-called beamforming approaches [202–204, 229, 233, 234] and the multiple signal classification (MUSIC) algorithm [169] (and its improvement RAP-MUSIC [64, 167]).


Beamforming approaches, sometimes called spatial filter or virtual sensor approaches, aim at constructing for each voxel an m-vector w such that the linear combination of the sensor measurements, w′y, represents as well as possible the activity at that voxel in the brain. The vector w is called a spatial filter, and is designed to maximize its sensitivity and selectivity for the particular target voxel. The simplest of these is the linearly constrained minimum variance (LCMV) beamformer [229]. In this case, w is constrained such that, for the target voxel location r and target orientation³ q/‖q‖, w′Λ(r)q/‖q‖ = 1 holds. This constraint has the effect that the amplitude of the target dipole is transmitted undistorted—that is, the amplitude is neither amplified nor attenuated [203]. To make the beamformer as selective as possible under the imposed constraint, the total output variance (signal power) of the beamformer is minimized; that is, w(r, q) = arg min_w w′Vw subject to w′Λ(r)q/‖q‖ = 1 is solved, where V = YY′/n [229]. By the definition of V, therefore, the beamformer only suits the spatiotemporal model. The optimal w, which may be obtained, e.g., by means of the Lagrange multiplier method, is given by w′ = [q′Λ(r)′V⁻¹Λ(r)q]⁻¹ q′Λ(r)′V⁻¹. With w(r, q) defined in this way, the entire brain volume is scanned on a voxel-by-voxel basis in search of 'distinctive' maxima in the normalized output power 'spatial spectrum', or 'neural activity index', [q′Λ(r)′V⁻¹Λ(r)q]⁻¹/[q′Λ(r)′Λ(r)q]⁻¹ [134, 229]; these maxima correspond to source locations (an R sketch of this scan is given after the footnotes below). The second-order moment estimate of the data, V = YY′/n, has to be nonsingular, which requires that more time points be included in the analysis than there are sensors (n > m); for stable estimates, n should be several times m [229]. Time correlations in the noise increase the number of required time points [229]. Furthermore, the noise signals are assumed to be wide-sense stationary [229]. Estimating the second-order moment across trials for the instantaneous source model is possible [202, 203, 229], but requires the source amplitudes to vary incoherently across trials⁴ [229]. The beamformer spectrum is known to have a low resolution, however, in the sense that peaks cannot always be clearly distinguished in the presence of closely spaced sources; this is especially true for coherent sources [134]. If only one dipolar source is active, the scan yields a least squares estimate. If multiple dipolar sources are simultaneously active, the method can give strongly biased source location estimates, because it essentially fits a single dipole model, which is an incorrect model for data generated by multiple dipoles [108].

MUSIC is suitable for finding multiple non-coherent dipole sources in a spatiotemporal dipole model—that is, dipole sources whose moment time courses are linearly independent. In radar signal processing, MUSIC was introduced as a substitute both for the very quick beamformer scanning methods and for the computationally expensive, but more accurate, standard least squares method. In EEG/MEG dipole source localization it was introduced before beamformers.
Conditional on the assumption that the columns of the sensor array lead field matrix for any set of d spatially distinct sources are linearly independent, and on the assumption that the source amplitudes have zero correlation, MUSIC has a more discriminative spatial spectrum than beamformers, allowing the resolution of even closely spaced sources in equal circumstances [134]. In MUSIC, the m × n data matrix Y is decomposed into a signal subspace and a noise subspace by means of a principal components analysis of YY′. The number of components, and hence the effective dimension r, say, of the signal subspace, is determined by means of a scree plot. An orthonormal basis for the signal subspace is then defined by the eigenvectors corresponding to the r greatest eigenvalues. MUSIC then essentially scans the whole brain volume for peaks in the squared (subspace) canonical correlation of the signal subspace with the linear span of the lead field matrix at each voxel. The matrix YY′ has to be nonsingular, and this requires a greater number of time samples than the number of sensors, as was the case for the LCMV beamformer.

³ The beamformer presented in [229] in fact does not require the specification of a target orientation, but assuming one here simplifies the discussion, while the line of reasoning is the same.

⁴ In principle, for the instantaneous source model, this requirement makes the beamformer inconsistent with the deterministic waveform model that underlies the ERP/ERF. However, source amplitudes are indeed thought to vary across trials.
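The R sketch of the LCMV scan referred to above is as follows; it is self-contained, with the same invented infinite-conductor geometry as in the earlier sketches, synthetic data, a single fixed target orientation per voxel, and a one-dimensional scan grid chosen for brevity.

```r
## LCMV beamformer scan: one dipole, neural activity index along the z-axis.
set.seed(4)
sigma <- 0.33
leadfield <- function(ra, r0) { d <- ra - r0; d / (4*pi*sigma * sum(d^2)^1.5) }
sensors <- matrix(rnorm(30 * 3), 30, 3)
sensors <- 0.12 * sensors / sqrt(rowSums(sensors^2))   # 30 sensors, 12 cm sphere
Lambda <- function(r0) t(apply(sensors, 1, leadfield, r0 = r0))

n <- 1000
src <- 1e-8 * sin(2*pi*(1:n)/25)                       # source amplitude time course
Y <- Lambda(c(0, 0, 0.07)) %*% rbind(src, 0, 0) +      # x-oriented dipole at z = 7 cm
     matrix(rnorm(30*n, sd = 5e-9), 30, n)
V  <- Y %*% t(Y) / n                                   # second-order moment (n >> m)
Vi <- solve(V)

## neural activity index [q'L'V^{-1}Lq]^{-1} / [q'L'Lq]^{-1}; peaks mark sources
nai <- function(r, q = c(1, 0, 0)) {
  Lq <- Lambda(r) %*% q
  drop(crossprod(Lq)) / drop(t(Lq) %*% Vi %*% Lq)
}
zs <- seq(0.03, 0.11, by = 0.005)                      # voxel grid along the z-axis
zs[which.max(sapply(zs, function(z) nai(c(0, 0, z))))] # maximum near z = 0.07
```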


The peaks in this spatial spectrum are then identified as sources [169]. To address dipole sources with linearly dependent source amplitude waveforms, RAP-MUSIC was developed; in RAP-MUSIC, the MUSIC scanning procedure is recursively applied to combinations of an (iteratively increasing) number of voxel locations [64, 167]. Apart from pre-whitening issues, a significant problem in MUSIC is the separation of the signal and noise subspaces, which is generally not very straightforward when applied to real data. It has been determined that MUSIC can yield asymptotically (i.e., for L → ∞) efficient estimates, but only if the sources are completely incoherent (i.e., have zero amplitude correlation). As the correlations between source amplitudes increase, the relative efficiency can rapidly decrease to zero [217, 218]. Currently, RAP-MUSIC has not been investigated with respect to these issues, but it may be compared to other multiple-source extensions of MUSIC, whose usefulness has been questioned because they eventually involve searches with as many parameters as the GLS estimators; hence, it is difficult to see why they should be preferred to the asymptotically efficient GLS estimates [180].

1.2.3 Distributed source models: Linear current density reconstruction

In the second approach to solving the inverse problem, the aim is to construct a tomography of the current distribution in the entire brain volume or, in some cases, in the cortex only. This is achieved by requiring that the obtained current density favors a certain global property. Such global properties might be based on known physiological characteristics of the current density relations between nearby neurons, but they are currently mainly based on ad hoc assumptions—such as the assumption that the brain has a tendency to minimize the total energy of the current distribution, which leads to the minimum norm (MN) current density 'estimate', or the assumption that nearby voxels have approximately equal current intensity because of related functionality, so that the primary current density changes very smoothly with position, which leads to the minimum Laplacian 'estimate' (LORETA). As with the linear scanning methods, in tomography approaches the brain is subdivided into a large number d of voxels. Each voxel is then equipped with a freely oriented dipole (or, in the case of reconstructions constrained to the cortical mantle, with one dipole oriented perpendicular to the surface of the cortex). The total m × 3d lead field matrix Λ is calculated, and the measurements are expressed (in the instantaneous model) as y = Λq_0, where q_0′ = (q_1′, q_2′, ..., q_d′) is the unknown 3d source moment vector as before. Since the number of voxels is much larger than the number of sensors, and hence 3d ≫ m, this system of equations is highly underdetermined, and no unique solution can be obtained from this equation alone. The general solution to this system of equations is the estimate q̆ = Λ⁺y + (I − Λ⁺Λ)z, where z is an arbitrary 3d-vector [150]. To obtain a unique estimate, the physiological assumption concerning the global properties of the current density is invoked: if it is assumed, for example, that the brain minimizes the total energy of the current density, this corresponds to the solution that minimizes the (squared) norm ‖q̆‖². Invoking this assumption on the general solution above, we seek the 'estimate' that minimizes ‖Λ⁺y + (I − Λ⁺Λ)z‖² = ‖Λ⁺y‖² + ‖(I − Λ⁺Λ)z‖², which is clearly minimal if and only if (I − Λ⁺Λ)z ≡ 0.
Hence the minimum norm solution is Λ⁺y. Often the inversion operator Λ⁺ is regularized (i.e., the singular values of Λ are raised by a small amount) to account for noise [212], or to obtain a more stable [49], but also more blurred [91], solution. If it is alternatively assumed that the primary current density is very smooth, one can specify a measure of the roughness of the current density, e.g., as the sum of the squared differences in current between neighboring voxels, which is equivalent to the squared norm ‖L(q_1′, q_2′, ..., q_d′)′‖², where L is a matrix that implements these differences. Minimizing the roughness yields the smoothest solution. In the low resolution tomography method (LORETA) [178] the matrix L is taken to be a discrete implementation of the Laplace operator ∇², which can be characterized as the operator whose resultant vanishes if it is applied to the smoothest possible surfaces [95]. In LORETA, L is taken such that L′L is nonsingular. We can then consider the equivalent problem of minimizing ‖(L′L)^{1/2} q̆‖² subject to y = Λq̆. Defining q̆_α = (L′L)^{1/2} q̆, so that q̆ = (L′L)^{−1/2} q̆_α, substituting this in the latter constrained minimization problem, and applying the same reasoning as before, shows that q̆_α = [Λ(L′L)^{−1/2}]⁺ y, or q̆ = (L′L)^{−1/2} [Λ(L′L)^{−1/2}]⁺ y = (L′L)^{−1} Λ′ [Λ(L′L)^{−1} Λ′]⁺ y.

Because linear inverse methods determine many more parameters than there are measurements, these methods can only provide very blurred 'images' of the actual current density distribution. This makes the estimated parameters highly dependent on each other, and they are therefore not accessible to statistical analysis. Additionally, in MN solutions activity typically shows at locations where none exists, and the estimated activity is biased towards the vertex [49]. Furthermore, if any solution other than the minimum norm solution is obtained, it can be shown that current density distributions exist that satisfy the global property that was optimized but cannot be reconstructed with the linear inversion method derived from it [91, 93]. The very smoothed images produced with these methods seem to be at odds with the more focused activity observed in trial averaged functional magnetic resonance imaging data, which has instigated some to interpret the maxima in the current density estimates as source locations [49] (especially in the evaluation of their performance [91]). If used for this type of dipole source localization, in which peaks are interpreted as locations of activity, they are known to be biased [93]. To deal with this bias, linear tomography methods have been modified to yield more focused solutions. These modifications can be characterized as finding a hybrid between a distributed source solution and a multiple current dipole solution. In general this is done through an iterative process of refinement of linear inverses, in which, after a tomography has been constructed, those voxels that show "significant" activity are retained or more strongly weighted in a second application of the inversion method. This process is iterated until a "sufficiently focused" solution is obtained [46, 49, 83, 155]. Roughly, these methods differ in the way they select "significantly active" voxels. Iterative refinement algorithms have been reported to be highly unstable with noisy data [12]. Unfortunately, in investigating these methods, important issues such as unbiasedness and efficiency of the estimators have been accessible to analysis only through Monte Carlo simulation studies.
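Concretely, the linear inverses above amount to a few lines of matrix algebra. The following minimal NumPy sketch, which is not code from this thesis, computes the minimum norm solution Λ⁺y, a regularized variant (singular values raised by a small amount), and a weighted minimum norm solution of the form (L′L)⁻¹Λ′[Λ(L′L)⁻¹Λ′]⁺y; the random 'lead field', the dimensions, and the regularization level are all arbitrary illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    m, d = 32, 500                             # sensors and voxels (illustrative sizes)
    Lam = rng.standard_normal((m, 3 * d))      # stand-in for the m x 3d lead field Lambda
    y = rng.standard_normal(m)                 # one time slice of measurements

    # Minimum norm solution q = Lambda^+ y (Moore-Penrose inverse)
    q_mn = np.linalg.pinv(Lam) @ y

    # Regularized variant: the singular values are 'raised by a small amount'
    U, s, Vt = np.linalg.svd(Lam, full_matrices=False)
    eps = 0.1 * s[0]                           # ad hoc regularization level
    q_reg = Vt.T @ ((s / (s**2 + eps**2)) * (U.T @ y))

    # Weighted minimum norm q = (L'L)^{-1} Lambda' [Lambda (L'L)^{-1} Lambda']^+ y;
    # a diagonal stand-in is used for L'L (LORETA uses a discrete Laplacian for L)
    W = np.diag(1.0 + rng.random(3 * d))       # must be nonsingular
    Wi = np.linalg.inv(W)
    q_w = Wi @ Lam.T @ np.linalg.pinv(Lam @ Wi @ Lam.T) @ y

    print(np.allclose(Lam @ q_mn, y), np.allclose(Lam @ q_w, y))  # both reproduce y

Both unregularized solutions reproduce the data exactly, which illustrates the point made above: with 3d ≫ m the data do not discriminate between the very many current distributions consistent with them; the global property alone selects the reported 'image'.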

1.3 Inverse methods for functional connectivity research

The methods discussed in the previous section have been widely used in ERP/ERF research. Application of these methods to unaveraged, single trial data has been mostly limited to the virtual sensor applications of beamformer methods [67, 97, 209, 234]. Nonlinear least squares estimation of dipole models applied to unaveraged narrow band filtered MEG data has been reported in [51, 54]. Iterative minimum norm solutions have been applied to unaveraged MEG in [47, 49]. Most applications of inverse methods to single trial data aim at identifying cortical interactions by analyzing, e.g., correlations in source amplitude reconstructions in single trial data, exceptions being [51, 54]. Beamformers are becoming a popular tool, used as if they were a 'zooming lens' or 'virtual sensor' measuring activity at a pre-specified target location (e.g., [209, 234]). By design, however, the LCMV beamformer is intended for scanning the set of possible source locations, and not for reconstructing the source amplitude waveform at an arbitrary location. This use can give very misleading results, because often these 'lenses' are far from 'zooming in'. For instance, the LCMV beamformer assures that activity at the specified target location is represented 'undistorted', meaning that the activity at the target location is not amplified or attenuated (provided no correlated sources exist). It is, however, not guaranteed that the target source has the maximum gain of all contributing sources in the output of the filter [94].


Beamformers have also been used to search specifically for correlated sources [97]: in this method, called DICS (dynamic imaging of coherent sources), first a "reference" source is chosen (e.g., from a peak in the spatial spectrum of the beamformer), and then a region of interest is scanned for sources that show the highest correlation with this reference source. This procedure limits the analysis to finding coherent sources, while lack of coherence can also reveal substantial information on how different functional cortical regions cooperate [47]. In sharp contrast with the latter use, the LCMV beamformer was developed under the assumption of incoherent sources [229], and its localization performance is known to deteriorate with coherent sources [134, 229]. Furthermore, it is well known that correlations between sources cause partial signal cancellation, resulting in distorted and attenuated amplitude time course reconstructions, and—in case of multiple correlated sources—in a correlation magnitude dependent bias of correlation estimates [202]. In addition, beamformers can be very sensitive to noise, and the signal to noise ratio varies from voxel to voxel [12]. Their use for single trial analysis and for connectivity research is therefore subject to interpretational pitfalls. Similarly, MUSIC explicitly assumes incoherent sources, and is therefore unsuited for estimating cortico-cortical connectivity (cf. [49]).

Connectivity estimates based on distributed source models, in terms of amplitude correlations between voxels, were developed in [210]. Basically the method reconstructs the voxel to voxel covariance from the minimum norm reconstructed current density 'estimate', and this can be done in one direct step: if Ψ denotes the voxel to voxel covariance matrix of the current density, then the minimum norm based estimate derived in [210] is given by Ψ̆ = Λ⁺(V − Θ)Λ⁺′, where Θ is the pure instrumental noise covariance matrix. An estimate of Θ could be obtained for MEG, e.g., from the measurements in the sensors when no sources (no subject) are present. An immediate problem of the method is that very many correlation estimates are obtained, and some means of distinguishing "significant" from "non-significant" correlations is necessary. Because the number of estimated correlations is much higher than the degrees of freedom in the data, no statistical analysis is available for this purpose. Iterative refinement algorithms for hybrid distributed/focused sources [49, 83, 212] have been used to restrict the number of voxel to voxel dependency estimates that have to be taken seriously [47, 49]. The methods in the latter references rely on data permutation tests on a voxel by voxel basis, and then select "statistically significant" voxels to be retained in a second iteration of reconstruction, in which the insignificant voxels are discarded.⁵ Although this method was shown to be more effective than minimum norm thresholding in simulations, the sources unfortunately turned out to be often "mislocalized" if many (i.e., 10 or 20) sources were active, even under favorable noise conditions [49]. Least squares and maximum likelihood approaches to the estimation of functional connectivity assuming a limited number of dipole sources have not been attempted previously; they are the aim of the methods considered in this thesis.

⁵ Clearly, these tests cannot be conceived as independent, as the reconstruction is a linear combination of the lead fields of m sensors (much fewer than the initial number of sources), and can therefore vary only in an m-dimensional subspace.
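For concreteness, the following is a minimal NumPy sketch, not code from this thesis, of a fixed orientation LCMV 'virtual sensor' with weights w = C⁻¹l/(l′C⁻¹l) (cf. [229]), applied to a simulated two-source scenario; the lead fields, noise level, and source correlation ρ are arbitrary illustrative choices. It reproduces the cancellation effect described above.

    import numpy as np

    rng = np.random.default_rng(1)
    m, n = 24, 2000                              # sensors, time samples (illustrative)
    l1 = rng.standard_normal(m); l1 /= np.linalg.norm(l1)  # target lead field
    l2 = rng.standard_normal(m); l2 /= np.linalg.norm(l2)  # second source's lead field

    rho = 0.9                                    # amplitude correlation between sources
    s1 = rng.standard_normal(n)
    s2 = rho * s1 + np.sqrt(1 - rho**2) * rng.standard_normal(n)
    Y = np.outer(l1, s1) + np.outer(l2, s2) + 0.2 * rng.standard_normal((m, n))

    C = Y @ Y.T / n                              # sample covariance of the measurements

    # Fixed orientation LCMV weights w = C^{-1} l1 / (l1' C^{-1} l1): unit gain for
    # the target lead field, minimum output variance for everything else
    Ci_l1 = np.linalg.solve(C, l1)
    w = Ci_l1 / (l1 @ Ci_l1)
    s1_hat = w @ Y                               # 'virtual sensor' time course

    # With strongly correlated sources, partial cancellation attenuates the output:
    print(s1_hat.std() / s1.std(), np.corrcoef(s1, s1_hat)[0, 1])

With ρ near zero the reconstructed gain is close to one; as ρ grows, the reported gain drops well below one, which is exactly the attenuation and distortion that makes naive single trial 'virtual sensor' correlations hazardous to interpret.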

1.4 This thesis

In this thesis statistical signal processing methods are considered for the neuro-electromagnetic inverse problem in the analysis of functionally related connectivity between different brain areas. In chapter two we review the most important methods that are currently used in the literature on non-invasive measurement of cortico-cortical interactions. Also, concepts concerning cortico-cortical interaction, such as functional and effective connectivity, will be touched upon. Many of these methods are compromised by confounding artifacts due to the physical origin of the signals. Caveats and some of the suggested solutions will be discussed in the chapter. The methods proposed in later chapters will be seen to be a natural outcome of this discussion.


They are based on the frequency domain dynamic factor analysis model as presented in [162], and on structural equation modelling with latent variables [123]. Chapter three presents the statistical basis for the methods presented in chapters four and five. It is more technical than the other chapters, and may not be of interest to all readers. This chapter focuses on different methods for estimating parameters in mean and covariance structure models. The sub-model of the dynamic factor model that will be used in chapters four and five transforms into a confirmatory factor model in the frequency domain. Therefore, in particular, the confirmatory factor model with structured means is considered. This model has been useful in psychology, as well as in science, engineering, and the social sciences. The applications in the engineering literature, as well as in this thesis, often involve complex random variables. Therefore, complex normal maximum likelihood (ML) estimation for mean and covariance structures of complex stochastic variables will be discussed. Furthermore, generalized least squares (GLS) estimation of covariance structures [29] is extended to a certain class of complex random variables. The focus on the confirmatory factor model allows the development of estimation algorithms with greatly increased computational speed. The ML algorithm derived in this chapter constitutes a substantial generalization of a method known in the electrical engineering literature as 'stochastic maximum likelihood'. Furthermore, these "special purpose" algorithms provide additional insight into the capabilities and limitations of the estimators.

In chapter four, dipole modelling of EEG/MEG is combined with the confirmatory factor model to simultaneously estimate dipole source locations, orientations, and cross-spectra between dipole amplitudes. In a simulation study, maximum likelihood and generalized least squares estimators are compared, and the maximum likelihood method is applied to data obtained from a visual stimulation paradigm. The results of the latter application are unfortunately not conclusive, and some obvious reasons for this lack of success are discussed, along with possibilities for improvement. One possibility is to include the associated mean structure, so that information that is present in the average of the EEG/MEG signals across repeated trials is used to improve the dipole source estimates. Chapter five then continues to include a mean structure, and further introduces a framework for linear response kernel modelling of effective connectivity between cortical sources. This framework has its roots in path analysis, structural equation modelling (SEM) and LISREL [123]. Because application of the method to available empirical data could not be interpreted reliably, no such applications are presented. In chapter six, the problem of detecting connectivity is approached from a different angle. This was partly instigated by the problematic results for the empirical data. In this chapter the linear inverse methods are considered as possible alternatives to the multiple dipole approaches of chapters four and five. This is in fact the approach that has been adopted elsewhere in determining cortico-cortical interactions, because it is often considered necessary that non stimulus-locked or single trial data be analyzed for connectivity assessment, and that as little as possible be assumed a priori about the underlying current distribution.
Linear inverse methods, especially spatial filters, are sometimes assumed to make this possible [40, 48, 49, 97, 116, 118]. In chapter six we will consider whether this assumption is warranted, and we will search for the boundaries of what is possible with linear methods without making assumptions about the current density distribution. Finally, in chapter seven some conclusions are drawn on the ability of EEG and MEG to provide conclusive evidence for the existence of interactions between different parts of the cortex. It presents a discussion of fundamental requirements for the establishment of functional and/or effective connectivity on the basis of EEG and MEG data. Furthermore, it provides some directions for improvement.


1.5 Notation

In this thesis, bold uppercase letters will be used for matrices X = (x_ab), bold lowercase for vectors. X′ denotes the transpose of a matrix, i.e. X′ = (x_ab)′ = (x_ba); X ⊙ Y = (x_ab y_ab) denotes the Hadamard product, and ⊗ denotes the Kronecker product, defined by the block structured matrix X ⊗ Y = (x_ab Y). The Frobenius norm of a matrix X, the square root of the sum of the squared components of X, is denoted ‖X‖_F. It is often useful to stack the columns of a matrix X on top of each other in a single vector: this is denoted vec{X}. For a complex variable Z = X + iY, where X and Y are real and i = √−1 is the imaginary unit, Z̄ = X − iY denotes the complex conjugate of Z. The real part of Z, X, is denoted ℜ(Z), while the imaginary part of Z, Y, is denoted ℑ(Z); hence Z = ℜ(Z) + iℑ(Z). A complex value can alternatively be expressed in polar form: Z = R exp(iφ), where R = √(X² + Y²) = |Z| = Mod(Z) is the modulus or absolute value of Z, and φ = arctan(Y/X) = Arg(Z) is the (principal) argument (0 ≤ φ < 2π), or phase angle, of Z. For a complex matrix Z, Z∗ denotes the complex conjugate and transpose of Z. Of any (real or complex) square matrix X, |X| denotes the determinant, and X⁻¹ the inverse of X if it exists. For any (real or complex) matrix X, X⁺ denotes the Moore-Penrose generalized inverse of X defined in App. A (p. 110). For the m × d matrix X ∈ K^{m×d}, R(X) = {x : x = Xβ for some β ∈ K^d}, where K is either ℝ or ℂ; i.e., R(X) denotes the range space, or column space, of X. For a real m × d matrix X, the matrix X(X′X)⁺X′ is well known to be the orthogonal projector onto the column space R(X) of X, which takes any vector x ∈ K^m and projects it orthogonally onto the space spanned by the columns of X, i.e., (Π_X x) ∈ R(X) ⊆ K^m; it will be denoted Π_X. The projector onto the orthogonal complement of R(X) is I − Π_X, and is denoted Π_X^⊥. Note that Π_X + Π_X^⊥ = I and Π_X Π_X^⊥ = Π_X^⊥ Π_X = 0, and hence the projector partitions ℝ^m into two perpendicular subspaces. By replacing ′ by ∗, similar definitions are obtained for the case that X is a complex valued m × d matrix.

On several occasions we will use the differential of a vector or matrix valued function as defined in [150, Chap. 5]: let f : Ω → ℝ^m be a function defined on a domain Ω ⊂ ℝ^p. If there exists a matrix valued function A : Ω → ℝ^{m×p}, depending on θ but not on u, such that f(θ + u) = f(θ) + A(θ)u + r_θ(u) for all u ∈ ℝ^p in a neighborhood of θ, and lim_{u→0} ‖r_θ(u)‖/‖u‖ = 0, then f is said to be differentiable, and A(θ)u as a function of u is called the differential of f at θ. The differential is sometimes symbolically denoted df = A(θ) dθ, where d is called the differential operator. It can be shown that the matrix A(θ) is unique, and corresponds to the matrix ((A)_ab) = (∂(f)_a/∂(θ)_b) = ∂f/∂θ′ [150, Chap. 5]. The differential is extended to matrix valued functions F : Ω → ℝ^{m×d} by considering the vector valued function vec{F}. For rules of calculation we refer to [150] and [199].

For a stochastic variable X, E(X) will denote its expected value and Var(X) its variance. For stochastic variables X and Y, Cov(X, Y) will denote the covariance between X and Y. For a sample of observations {x_1, ..., x_n} of a random variable X, we will often denote the sample mean ẋ = (1/n) Σ_{a=1}^{n} x_a. For a sequence of stochastic variables {X_a}_{a=1}^{∞}, convergence in probability to a stochastic variable X, defined by lim_{a→∞} P(|X_a − X| > ε) = 0 for all ε > 0, is denoted plim X_a = X, or X_a →p X. Almost sure convergence of this sequence to X, defined by P(lim_{a→∞} X_a = X) = 1, will be indicated by the qualification almost surely (a.s.), or with probability one (w.p.1). If X and Y are two stochastic variables that have identical distributions, this is denoted X =ᵈ Y. If X and Y are independent, this is denoted X ⊥ Y. If θ_0 is a vector of parameters that have to be estimated, θ̂ will in general denote the maximum likelihood (ML) estimator, while θ̆ will denote a least squares (LS) estimator or an alternative estimator that is generally suboptimal compared to ML.
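As an aside, the projector identities above are easily verified numerically; a small illustrative NumPy sketch (arbitrary dimensions, not code from this thesis):

    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.standard_normal((6, 3))                 # a real m x d matrix, m = 6, d = 3
    P = X @ np.linalg.pinv(X.T @ X) @ X.T           # orthogonal projector onto R(X)
    Pp = np.eye(6) - P                              # projector onto the complement

    print(np.allclose(P @ P, P))                    # idempotent
    print(np.allclose(P + Pp, np.eye(6)))           # partitions R^m
    print(np.allclose(P @ Pp, np.zeros((6, 6))))    # the subspaces are perpendicular
    print(np.allclose(P @ X, X))                    # columns of X are left unchanged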


2 Cortico-cortical interactions and their analysis

Understanding cortico-cortical interactions is deemed crucial for understanding the neurophysiological basis of cognitive functioning, and in recent years the focus of research in cognitive neuroscience has shifted towards studying these interactions. In this chapter we consider how interactions between different cortical regions may be described, and review some of the methods that are used in the literature to study cortico-cortical interactions. Many of these methods are hampered by confounding artifacts due to the nature of the signals; caveats and some of the suggested solutions will be discussed in this chapter. Because cortico-cortical interactions are constituted by neuronal interactions, we first briefly recall how neural cells communicate with each other. Furthermore, it will be suggested how interaction dynamics at the neural level may relate to the interaction dynamics between neural currents reflected in EEG and MEG signals. This requires a highly simplifying, abstract level of modelling of neuronal interactions, as was for instance introduced in [25].

2.1 Interactions Among Nerve Cells

Two different types of neural coupling prevail in the nervous system: electrical and chemical synapses. At electrical synapses, a current that results from a presynaptic action potential flows through gap-junction channels from the presynaptic to the postsynaptic cell, where it causes a postsynaptic depolarization of the transmembrane potential. These gap-junctions are usually bidirectional, in the sense that the roles of pre- and post-synaptic cell can be switched. Gap-junction synapses have virtually no signal transmission delays. Cells coupled through electrical synapses are particularly susceptible to synchronization of their electrical activity [128, p. 180]. In contrast, at chemical synapses neurons remain separated by the synaptic cleft, and communicate through the release of neurotransmitters. Axon terminals from which the transmitter is released most often terminate on dendritic spines that reside on the dendrites of the post-synaptic cells. Transmitter molecules that are released by the presynaptic cell at the arrival of action potentials cause ion channels to open up in the post-synaptic neuron. Inflow of ions through these channels into the postsynaptic cell produces membrane potential depolarizations (or hyperpolarizations) called excitatory (or inhibitory) postsynaptic potentials (EPSPs/IPSPs). The inward currents flow towards the cell's soma and axon hillock, where a build-up of charge causes the post-synaptic cell to depolarize to a threshold level and fire an action potential. Chemical neuronal interactions are characterized by delays of a third of a millisecond to several milliseconds [128, 231].

The detailed interaction mechanisms through which neurons communicate are complex and diverse. A great deal is known about these various pathways of communication and the accompanying electrical dynamics. But although detailed models exist, both at the biochemical [128] and at the neuroelectronic level [50], understanding the interaction dynamics at the level of neural assemblies and networks requires substantive abstraction and simplification [50]. A class of abstract models of neuronal interaction with relatively few assumptions was developed in [25, 27], in which only the relations between the spiking times of the pre- and post-synaptic neurons are modelled, in terms of a bivariate point process. At the larger scale of assemblies and networks, similar models can be postulated for local aggregate measures of activity. However, the relation between such models at these different levels—pre- and post-synaptic cell versus pre- and post-synaptic assemblies—is not immediate. This is particularly true for non-invasive measures of brain activity as provided by EEG, MEG, fMRI and PET¹, which are derived measures of the neural activity, and in which the spiking times of neurons cannot be observed directly. EEG, MEG, PET and fMRI measures of brain activity reflect mass firing bursts in entire populations of cells and cell assemblies: the currents that flow into the dendrites of postsynaptic cells in response to mass scale transmitter release are observed in the EEG/MEG [102, 231, 240], and increases in oxygenated blood supply and changes in metabolic activity are observed in fMRI and PET [71, 171]. For activity of large populations of cells it might be possible to relate the interactions at the neural level, in terms of point processes, to the relations observed between the derived measures of activity in EEG and MEG. In the following section we describe the model of [25, 27, 193], and suggest how it relates to average postsynaptic intracellular dendritic currents [50]. This average is then used as an abstract model for the interaction dynamics between the aggregate activity of populations of cells residing in different cortical areas.

¹ Non-invasive here refers to the fact that the individual undergoing the measurements remains structurally intact—in this sense PET may be considered "non-invasive".

2.2 Point Process Description of Neuronal Interactions

A finite stretch of point process data is a sequence {τ_a}_{a=0}^{n−1} such that τ_0 ≤ τ_1 ≤ ··· ≤ τ_{n−1}. In the context of neuronal dynamics, its values τ_a are conceived as the times at which action potential spikes occur. Here n is the number of spikes that occurred during the time span of the sequence. A descriptive measure of the process is the number of spikes in a time interval [0, T). A stochastic point process on the real line is a random process whose realizations are sequences {τ_a}_{a=−∞}^{∞}, ordered by τ_a ≤ τ_{a+1}, on the interval (−∞, ∞). Such a process is described by the joint distributions of the random variables N(I_1), ..., N(I_J), J = 1, 2, ..., where the I_j are sets consisting of (countable unions and intersections of) intervals of the real line, and N(I_j) counts the number of spikes in I_j. The process is stationary if the joint distributions are unaffected by translations in time [27]. We will identify a particular realization of the process by its counting function N(·).

Let M and N be stationary stochastic point processes, and let (M, N) represent a bivariate stochastic point process having M and N as its components. If we define dN(t) = N((t, t + dt]), then the mean intensity, p_N, of process N is defined by the relation [25] E{dN(t)} = p_N dt. Because the spikes occur in isolation (due to the refractory period immediately after a spike), for dt small enough, dN(t) is either equal to zero or equal to 1 (with probability 1), and we find that p_N dt may be interpreted as the probability that N spikes in (t, t + dt]. The second-order cross product density at lag u, p_MN(u), is defined for u ≠ 0 by [25] E{dM(t + u) dN(t)} = p_MN(u) dt du, and may be interpreted as the probability that M spikes in (t + u, t + u + du] and N spikes in (t, t + dt]. These parameters may then be used to define the conditional intensity, E{dM(t + u) | N spikes in (t, t + dt]} = p_MN(u) du / p_N, which is interpreted as the probability that M spikes in (t + u, t + u + du] given that N spikes in (t, t + dt]. Brillinger [25] proposes to consider the stochastic variable


    μ_{M|N}(t) = lim_{h↓0} Prob{M spikes in (t, t + h] | N}/h = E{dM(t)|N}/dt,    (2.1)

which is the limiting value of the time averaged conditional probability that M spikes in the time interval (t, t + h], given the entire sequence of spiking times of N. Following [26], we consider reasonable responses M of a time invariant system to which N is input. If N((−∞, ∞)) ≡ 0, then it may be reasonable to suppose that μ_{M|N}(t) = α_0, a constant rate of spontaneous activity. If N has a single spike at time τ, then it may be reasonable to consider μ_{M|N}(t) = α_0 + α_1(t − τ) as the system response, where α_1 represents the time dependent effect on the output of a single shock fed to the system. When N has two spikes at times τ_1 and τ_2, a reasonable model for μ_{M|N} may be μ_{M|N}(t) = α_0 + α_1(t − τ_1) + α_1(t − τ_2) + α_2(t − τ_1, t − τ_2), where α_2 gives the interaction effect on M of the spikes of N at τ_1 and τ_2. In a causal system, in which an input is a cause of the system's output, α_1(t − τ_1) = 0 if t < τ_1 and α_2(t − τ_1, t − τ_2) = 0 if t < τ_2, as causal effects cannot precede their cause. Here we will not bother to impose a causality restriction however. The last expression can be written in a more general form as

    μ_{M|N}(t) = α_0 + ∫_{−∞}^{∞} α_1(t − ν) dN(ν) + ∫_{−∞}^{∞} ∫_{−∞}^{∞} α_2(t − ν_1, t − ν_2) dN(ν_1) dN(ν_2).

Considering more and more spikes in N in this way leads to a Volterra-like expansion of the interaction between (the neurons generating) N and M [25]. It was observed in [25], [27] and [193] that the linear terms can elucidate even highly nonlinear situations as found in the interactions between different neurons, while at the same time simplifying the mathematical tractability of the model. We therefore limit ourselves to the linear terms in the Volterra expansion. This gives the linear model [193]

    E{dM(t)|N} = ( α_0 + ∫_{−∞}^{∞} α_1(t − ν) dN(ν) ) dt    (2.2)

for the interaction between the neurons represented by M and N, where α_1(τ), by analogy with the terminology used for linear systems operating on continuous processes, may be called the average impulse response [25, 193]. It should be noted that this model, which can be used for analyzing the interactions between neurons, does not inherently necessitate the existence of direct (mono-)synaptic connections. Relations between spike trains may also result from indirect pathways from one cell to another, or from common input. Even in multi-unit recordings it is often very difficult to establish a synaptic connection between individual neurons, so that often it is not known whether the cells from which the spike trains were recorded have direct synaptic connections. These observations adhere to the concept of effective connectivity between cells, on which we will elaborate in a later section. In a causal interpretation of effective connectivity the kernel α_1(τ) should be considered a causal filter, which is then defined to be nonzero only for τ > 0.

To make the transition from modelling neural spike train interactions to modelling interactions of aggregate (derived) measures of this spiking activity observable in EEG/MEG, the expected value of the interaction dynamics of a single pair of neurons may be used as an approximate model for the average interaction dynamics across multiple pairs of neurons: because many neurons within a cortical region are simultaneously active, synchronized to some extent in response to e.g. the presentation of a stimulus (otherwise their existence would go unnoticed in e.g. EEG/MEG signals [102]), interactions between different cortical regions are likely to involve many interacting pairs of neurons. We may use the expected value of the spiking dynamics of a single pair as a model for the dynamics of the aggregate measures of the activity of many pairs simultaneously. This is very similar to the way individual spike trains are often replaced by firing rates (i.e., the expected value of the spike train) in firing rate models [50, ch. 7]. For stationary point processes the expected value is only informative (by means of equation (2.2)) if the neural spiking processes are conditioned on an input process that is due to e.g. a stimulus presentation. This input process is assumed to be the same for all pairs of neurons across which the average is taken (for example a visual stimulus that excites many neurons at the same time could justify this assumption in certain cases). The presynaptic neuron will then have a time dependent (i.e., nonstationary) intensity. As discussed earlier, spikes generated in the neuron's axon induce EPSPs and associated excitatory postsynaptic currents (EPSCs) that flow into the post-synaptic cell's dendrites towards the soma [50, ch. 5]. The firing rates (the average number of spikes per unit time) of the neurons are only indirectly accessible by EEG/MEG measurements through these induced EPSCs: for two monosynaptically coupled neurons, the firing rate of the presynaptic neuron is functionally related to the EPSCs in the postsynaptic cell. Temporal summation of the EPSCs in the postsynaptic cell increases its firing rate (a positively rectified threshold linear function² is often considered as a model for the functional relation between the postsynaptic current and the firing rate [50, p. 234]). The firing rate of the postsynaptic cell is in turn functionally related to the EPSCs it induces in what we will coin "post-postsynaptic" cells. It will be assumed that the induced currents are (on average) rather stereotyped responses to presynaptic action potentials; this assumption seems to be warranted for real neurons [50, p. 181]. Using this scheme, illustrated in Fig. 2.1, as a model for two groups of several thousands of more or less synchronized neurons (that will be visible in the EEG/MEG), the relation between the expected values of the EPSCs η_1 and η_2 in the post- and post-postsynaptic cells respectively is determined. The expected value of this relation will then be used as an approximate model for interactions between the two groups of simultaneously interacting neurons.

² This function is rectified to zero where the linear function is negative.


Fig. 2.1. Interaction dynamics of post- and post-postsynaptic aggregate EPSC’s resulting from pre- and post-synaptic neuronal interactions. Each node represents a neuron with axon projecting to a postsynaptic cell. Vertical bars on axons indicate spikes. The continuous line above each spike train is the convolution in Eq. (2.3) of the spike train with the stereotyped response kernel which represents the induced EPSC signal in the cell that is post-synaptic to the cell of the axon. The resulting aggregate measures (in this case the sum) of the individual EPSC-signals within each neuron-group is plotted at the bottom next to the ‘+’ sign.

Consider two monosynaptically coupled cells, whose spiking signals are represented by dM(t) and dN(t). The EPSCs induced in the respective postsynaptic cells in response to a single action potential will be assumed to have the form h(τ), generated when a presynaptic cell fires. The total postsynaptic currents, η_1(t) and η_2(t), are then assumed to be the linear summation of the responses to the individual action potentials:

    η_1(t) = ∫_{−∞}^{∞} h(t − τ) dM(τ),    η_2(t) = ∫_{−∞}^{∞} h(t − τ) dN(τ).    (2.3)

Note that h(τ) represents the effect of a single spike at a single synapse. Obviously we could have postulated a similar Volterra-like expansion that includes interaction effects of multiple spikes on the postsynaptic currents, as discussed previously, but the assumption of linear summation of the individual effects is usually made [50, p. 233]. In the interaction between dM and dN we will assume that α_0 ≡ 0 in (2.2):

    E{dM(t)|N} = ( ∫_{−∞}^{∞} α_1(t − τ) dN(τ) ) dt.    (2.4)

This assumption is immaterial and will be repaired shortly. Because dN(t) is induced by the presentation of a stimulus, it will have a nonstationary intensity E{dN(t)} = p_N(t) dt. To determine the relation between E{η_1(t)} and E{η_2(t)}, we use the equality E{X} = E{E{X|Y}} for stochastic variables X and Y, and assume that the orders of expectation and integration can be interchanged:

    E{η_1(t)} = E{E{η_1(t)|N}} = E{E{∫ h(t − τ) dM(τ) | N}} = E{∫ h(t − τ) E{dM(τ)|N}}
              = E{∫ h(t − τ) ∫ α_1(τ − u) dN(u) dτ} = ∫ h(t − τ) ∫ α_1(τ − u) E{dN(u)} dτ.

By using E{dN(u)} = p_N(u) du, and by a change of variables,

    E{η_1(t)} = ∫ α_1(ν) ( ∫ h(t − ν − u) p_N(u) du ) dν = ∫ α_1(ν) E{η_2(t − ν)} dν.    (2.5)

Hence the linear filter kernel that relates the firing rates of monosynaptically coupled neurons also relates the average EPSCs induced by their firing, if it is assumed that the postsynaptic effects h of the spike trains in the presynaptic axon sum linearly. Under the assumptions made so far therefore, up to the linear approximation, the dynamics of the interaction at the neural level of multi-unit recordings and the dynamics at the level of aggregate measure recordings of EEG or MEG are the same. The assumption that h is the same for all neurons can be relaxed somewhat by assuming that h varies over different neurons (i.e., is a random variable), but that the average ḣ = E{h} of h across neurons in one cortical region is the same for both cortical regions of neurons, and that h ⊥ dM and h ⊥ dN, where the notation X ⊥ Y means that X and Y are independently distributed: then, with E{XY} = E{X}E{Y} if X ⊥ Y, it is found that E{∫ h(t − τ) E{dM(τ)|N}} = ∫∫ α_1(τ − u) E{h(t − τ) dN(u)} dτ = ∫∫ α_1(τ − u) ḣ(t − τ) E{dN(u)} dτ, which shows that (2.5) still holds with h replaced by ḣ. When α_0, the spontaneous firing rate of the postsynaptic neurons, is not assumed to be zero, similar arguments yield

    E{η_1(t)} = α_0 ∫_{−∞}^{∞} ḣ(τ) dτ + ∫_{−∞}^{∞} α_1(τ) E{η_2(t − τ)} dτ.    (2.6)

Note from this that α_0 cannot be determined from η_1 and η_2 unless ḣ is known. The kernels α_j, j = 2, 3, ..., of higher order Volterra expansions yield terms that also depend on ḣ, and can therefore not be determined uniquely from E{η_1(t)} and E{η_2(t)}. This provides an interpretation of the relation between the dynamics of the interactions at the neural level and the interactions at the level of the aggregate signal. The model in (2.6) for the dependency structure between the mean EPSCs of two assemblies residing in different cortical regions provides a basis for the interpretation of the structural equation model methods discussed later in this chapter.
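A small simulation illustrates the chain of assumptions leading to (2.5) and (2.6): presynaptic spikes are drawn with a nonstationary intensity, postsynaptic spikes follow the linear intensity model (2.2), both trains are convolved with a stereotyped EPSC kernel h as in (2.3), and the trial averaged η_1 is compared with the prediction of (2.6). All rates and kernel shapes below are arbitrary illustrative choices, not quantities from this thesis.

    import numpy as np

    rng = np.random.default_rng(3)
    dt, T, trials = 1e-3, 2.0, 400                 # resolution (s), duration, trials
    t = np.arange(0.0, T, dt); nt = t.size

    p_N = 5.0 + 4.0 * np.sin(2 * np.pi * 1.5 * t)  # nonstationary presynaptic rate (Hz)
    a0 = 2.0                                       # spontaneous rate alpha_0 (Hz)
    a1 = np.where(t < 0.05, 40.0 * np.exp(-t / 0.01), 0.0)        # causal kernel alpha_1
    h = np.where(t < 0.1, (t / 0.005) * np.exp(-t / 0.005), 0.0)  # stereotyped EPSC kernel

    eta1 = np.zeros(nt); eta2 = np.zeros(nt)
    for _ in range(trials):
        dN = (rng.random(nt) < p_N * dt).astype(float)   # presynaptic spikes
        mu = a0 + np.convolve(dN, a1)[:nt]               # conditional intensity, cf. (2.2)
        dM = (rng.random(nt) < mu * dt).astype(float)    # postsynaptic spikes
        eta2 += np.convolve(dN, h)[:nt]                  # EPSCs induced by N, cf. (2.3)
        eta1 += np.convolve(dM, h)[:nt]                  # EPSCs induced by M
    eta1 /= trials; eta2 /= trials

    # Prediction from (2.6): E{eta1} = a0 * integral(h) + convolution of alpha_1 with E{eta2}
    pred = a0 * h.sum() * dt + np.convolve(eta2, a1)[:nt] * dt
    print(np.corrcoef(eta1[200:], pred[200:])[0, 1])     # close to 1 for many trials

The agreement improves as the number of trials grows, which is the point of the derivation: the kernel coupling the spike trains survives, up to the linear approximation, in the aggregate EPSC signals that EEG/MEG can access.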


2.3 Noninvasive Methods for Determining Cortico-Cortical Interactions

A large number of measures have been developed in the literature that attempt to quantify the level of interactivity between different cortical regions using noninvasive measures of brain activity as obtained in EEG/MEG, fMRI and PET. In this section a number of these are discussed; it will be indicated how they are computed, what the underlying assumptions are, how they are interpreted, and what their limitations and caveats are. Before proceeding however, some central concepts and terminology are discussed.

2.3.1 Terminology and Some Theoretical Considerations

Two fundamental principles of cortical organization that emerged from empirical observations are functional segregation and functional integration [70, 174, 211]. Functional segregation (also called functional specialization) is consensus terminology for the idea that different cortical loci and cell assemblies are highly specialized to process rather specific types of information (e.g. area V5/MT is specialized to detect direction of motion in the visual field). Functional integration refers to the idea that functionally segregated neural assemblies need to interact in order to combine the processed information in a meaningful way for appropriate behavioral adjustments. In any type of cognitive or behavioral experiment then, multiple specialized areas are likely to be involved in supporting the task relevant functions. Functional segregation is considered to be meaningless without reference to functional integration. In particular, functional segregation relies on the assumption that different brain regions communicate, as, for instance, the neural correlates of semantic processing can be identified using written words by virtue of the assumption that visual regions interact with semantic regions [140]. Conversely, functional integration cannot be meaningfully interpreted without the context of the functionally segregated areas that are involved [71]. Functional segregation is the currently dominant approach in cognitive neuroscience, and focuses on the localization of regionally specific responses that can be attributed to differences in stimuli or task requirements [71]. Methods used in functional segregation studies mostly average the measurement data across time, trials of repeated stimulation, subjects, or any combination thereof. This is instigated by low signal to noise ratios and the desire to generalize conclusions drawn from the study to the population at large. This is e.g. the case in the ERP/ERF source localization approaches of the previous chapter, and in most PET and fMRI studies. The idea of functional segregation has been challenged [156], as it has been observed that specific regions are activated in a variety of tasks, a region's role depending on the activity in other areas—the specialization of a region, therefore, seems to depend on the neural context [156]. Functional segregation will not be considered further in this section.

Studies focusing on functional integration usually try to characterize networks of cortico-cortical interactivity. This interactivity is evidenced in a statistical dependence relation between the activities of different cortical regions that exists during the execution of a certain (cognitive) function.
Basically, to assess cortico-cortical interactivity, signals measured from different cortical regions are cross-correlated, and the correlation structure is analyzed using methods that highlight certain aspects of putative networks. The focus on functional segregation or functional integration at the conceptual level is paralleled at the methodological level by the focus on the analysis of first or second-order moments of the data, i.e., the mean structure or the covariance structure. The choice of method to analyze the second-order moment structure depends on the type of inference that is attempted. Earlier, reference was made to the distinction between functional and effective connectivity. Functional connectivity refers to correlations between neurons, or between spatially remote neural assemblies [99] or brain regions [9], and does not provide an insight into the source of their cooperativity [70]. Such correlations may be induced by a tonic state (e.g. sleep, or an emotional state induced by a movie [107]), or by sustained stimulation with a (dynamic) stimulus.


An example of the latter is a steady state ERP resulting from flickering photic stimulation [24]. Effective connectivity, on the other hand, refers to the effect that one neural system exerts over another. In this sense effective connectivity coincides with the question of cause and effect, or the direction of detected functional connections. It is a description of "the simplest possible neural-like system" that produces the same pattern of correlations, summarized in a matrix of effective (synaptic) weighting coefficients [76]. In particular, this description may describe only a subset of the actual anatomical connectivity [76]. It has been noted that, although the consensus of terminology has converged to the definitions of functional and effective connectivity given here, different interpretational nuances exist, and subsequently different methods of analysis are chosen (e.g. determining activity covariances between different locations across trials for a single subject, versus calculating covariances across subjects [157]) [106]. These choices may be enforced by the technical limitations of the chosen modality (e.g. in PET it may take minutes to complete a scan, and only a few trials can be obtained in a single subject—one therefore often resorts to averaging across trials as well as subjects). Furthermore, used in different data contexts, the interpretation of these concepts differs at a practical level because of the different time scales (electrophysiology vs. hemodynamics) and spatial scales (millimeter vs. centimeter resolution) associated with different modalities [70, 106]. It is, however, unclear how observed functional connectivity in, for example, EEG and fMRI relate to each other [106]. For instance, the alpha rhythm in EEG/MEG recordings is attenuated when the eyes are open during passive viewing of visual stimuli [42]. Alpha rhythms are generated throughout the visual cortex in dogs [183], and have been localized by means of current dipole modelling to areas in the visual cortex (calcarine fissure and parieto-occipital sulcus) in humans [42]. These rhythms have been associated with integration of visual information. But it seems implausible that the hemodynamic activity, which is thought to increase with neural firing, is reduced during visual processing of stimuli [171]. Concomitant with a discussion of the distinctive processes that generate the signals in fMRI vs. EEG/MEG, it is therefore argued in [171] that EEG/MEG and fMRI are highly selective and distinctive measures of different kinds of brain activity, which in some cases may reflect the activity of the same neural pools, whereas in others each modality provides a different view on the activity in the brain.

A further distinction in the evaluation of functional and effective connectivity needs to be made: whereas some studies use methods that construct measures of connectivity from the data covariance matrix across subjects or trial blocks (most notably in PET [157], but also in fMRI), other studies consider only measures constructed from dynamical properties of the data, usually in terms of the (lagged) cross-covariance function, as informative indicators of connectivity. To consider subject and trial based covariances, or trial and time based covariances, as equivalent would require an assumption of a form of ergodicity, i.e., an assumption that justifies the interchangeability of trials, subjects, and/or time.
The distinction made here is closely related to the difference between functional and effective connectivity, though not the same. Focusing on the dynamics can be partly motivated by the desire to establish the causality of effective connectivity: one signature of causality, though not conclusive, is the (consistent) temporal precedence of event related activity in one area over event related activity in another [140]. Both functional and effective connectivity can be considered in such a dynamical context however. The emphasis on the dynamical nature of the data leads to the use of time series analysis methods. These may either assume stationary or non-stationary dynamics. The former focuses on the time invariant stochastic properties of the signal, whereas in the latter the interest is especially directed towards change in the dynamics over time. In what follows we will discuss methods that are frequently used to analyze functional and/or effective connectivity. As indicated earlier, connectivity research focuses on the structure of the covariances of the data. All measures discussed below are derived from the covariance, of which the most basic are cross-correlation and the related measures coherence and phase stability. In the computations of these measures several choices have to be made (e.g. averaging across time or trials), requiring different assumptions and implying different interpretations.

2.3.2 Cross-Correlation Analysis

Method and Assumptions. The cross-covariance function between two zero-mean stochastic signals/time series/processes³, y_a(t) and y_b(t), −∞ < t < ∞, is defined by ν_ab(t, τ) = E{y_a(t) y_b(t + τ)}.⁴ For stationary signals the dynamics are independent of time, and hence the cross-covariance only depends on the time lag τ—in this case we will write ν_ab(τ) instead of ν_ab(t, τ). When a = b, ν_aa(t, τ) is called the autocovariance function. In practice only finite length sample paths are observed, in a duration interval [−T/2, T/2) say. Furthermore, usually only sample values y_a(t), y_b(t), t = 0, ..., n − 1, are recorded, at equally spaced points in the duration interval. If the time interval between samples is short enough to avoid aliasing, this is equivalent to continuous records of y_a(t) and y_b(t). Without loss of generality, in the discussion here it is assumed that E{y_a(t)} = E{y_b(t)} = 0, −∞ < t < ∞—when this is not the case, the given estimates and definitions can be readily adapted by conventional means. Construction of estimates of ν_ab proceeds along two lines, requiring different assumptions. The first line of construction requires that multiple, L say, realizations of y_a and y_b have been observed; the straightforward estimate is then

    v_ab(t, τ) = (1/L) Σ_{l=1}^{L} y_{a,l}(t) y_{b,l}(t + τ),    (2.7)

where the extra subscript l indexes the l-th observation. A law of large numbers (LLN)⁵ can then be used to guarantee that v_ab(t, τ) converges to ν_ab(t, τ) as L → ∞. This measure has been called the joint peri-stimulus-time histogram (J-PSTH) in the multi-unit spike analysis context [76, 99], and has recently been introduced in MEG functional integration analysis [74]. The second line of construction assumes that y_a and y_b are stationary and ergodic, the latter of which means that lim_{T→∞} (1/T) ∫_{−T/2}^{T/2} y_a(t) y_b(t + τ) dt → ν_ab(τ) [133]. The limit of the integral exists under weak conditions if both processes are (strictly) stationary [133]. General stochastic processes are ergodic if the stochastic dependence between parts of the process that are separated by an interval of time approaches zero "sufficiently rapidly" as the length of the time interval increases to infinity [133, p. 54].⁶ This condition, which is stronger than ergodicity itself, is also known as mixing [26]. Under these circumstances the estimator suggests itself as (setting y_b(t + τ) = 0 whenever |t + τ| > T)

    v_ab(τ) = (1/T) ∫_{−T/2}^{T/2} y_a(t) y_b(t + τ) dt,    (2.8)

or v_ab(τ) = (1/n) Σ_{t=0}^{n−1} y_a(t) y_b(t + τ) in the sampled case. Ergodic signals allow statistical inference from single sample paths of y_a and y_b. Both of these approaches are used in practice, but more commonly in bio-signal analysis a combination of both is used.

³ Throughout this thesis we will use the terms signals, time series and processes interchangeably, and will freely combine them with 'random' and/or 'stochastic'—hence 'stochastic process' refers to the same thing as 'random signal'.
⁴ We will not make a typographical distinction between a random process variable and a particular sample path observed from this process.
⁵ The specific LLN used affects the interpretation of the correlation. Possibly the simplest and most intuitive interpretation is obtained by assuming that the sample paths of different trials are independently and identically distributed (i.i.d.). We will adhere to this assumption in discussing the interpretation of the estimate in (2.7).
⁶ A further requirement is that the signal has no discrete components (like deterministic sinusoidal or almost periodic components), which is the case if its power spectrum distribution is continuous everywhere—see [133, ch. 2] for a discussion of this.


The cross-covariance is then estimated from (2.8) in multiple trials, and then averaged across trials. In EEG/MEG the stationarity assumption made in this latter procedure seems to be warranted for signal epoch interval lengths of one to two seconds [9, 226]. As the covariance function depends on the variance of the signals, the cross-correlation function v_ab(t, τ)/(v_aa(t, 0) v_bb(t + τ, 0))^{1/2} is considered more informative about the dependence between the signals.

Interpretation. These two estimates of correlation are readily seen to have quite distinct interpretations, although they coincide when the signals are stationary. Whereas the correlation estimate of (2.7) measures the amount of covariance of the amplitudes of two signals at specific time instances across trials (under the assumption of i.i.d. sample paths), the estimate of (2.8) measures the shape similarity of the two signals at specific lags across time. The former can be used to assess the dynamics of the correlation, whereas the latter is often used for single trial analysis of lead-lag patterns. Such lead-lag patterns are indicative of temporal precedence, and are therefore suggestive of effective connectivity. In case y_a and y_b are stationary and ergodic, both approaches estimate the same quantity. To obtain an interpretation of the cross-covariance function, consider the following signal model (a linear time invariant filter system) that will be of particular interest (compare equation (2.5)): the pair

    y_1(t),    y_2(t) = ∫_{−∞}^{∞} h_12(s) y_1(t − s) ds + ε(t),    (2.9)

where it is assumed that y_1(t) and ε(t) are independent processes. Assume that the autocovariance function of y_1(t) is ν_11(t, τ). The cross-covariance between these two signals is

    E{y_1(t) y_2(t + τ)} = E{ y_1(t) [ ∫_{−∞}^{∞} h_12(s) y_1(t + τ − s) ds + ε(t + τ) ] } = ∫_{−∞}^{∞} h_12(s) ν_11(t, τ − s) ds,

i.e., the cross-covariance is the autocovariance of y_1(t) at time t, filtered with the same kernel that determines the relation between y_1(t) and y_2(t). Under the linear filter model (2.9), knowledge of the autocovariance ν_11(t, τ) and the cross-covariance ν_12(t, τ) determines the nature of the relation represented by h_12(s).

Problems. We defer a discussion of the problems of correlation analysis to the subsequent section on coherence analysis, since they are very much alike.

2.3.3 Coherence Analysis

Method and Assumptions. The coherence between two stationary signals y_a(t) and y_b(t) is obtained from the coherency function. The coherency function is a frequency dependent, complex valued measure of "correlation" between the two signals, and is obtained from the cross-spectral density and the (auto-)spectral densities of the signals. Let ν_aa(τ) = E{y_a(t) y_a(t + τ)} be the autocovariance function of the previous section. Because y_a(t) is assumed to be stationary, ν_aa(τ) only depends on the lag τ and not on time t. If ∫_{−∞}^{∞} |ν_aa(τ)| dτ < ∞, the Fourier transform of ν_aa(τ),

    σ_aa(ω) = (2π)^{−1} ∫_{−∞}^{∞} ν_aa(τ) exp{−iωτ} dτ,    ω ∈ ℝ,

exists and is called the spectral density of the process y_a(t). Also, if ∫_{−∞}^{∞} |ν_bb(τ)| dτ < ∞ then ∫_{−∞}^{∞} |ν_ab(τ)| dτ < ∞, and σ_bb(ω) and σ_ab(ω) can be defined analogously to σ_aa(ω); they are called the spectral density (or simply spectrum) of y_b and the cross-spectral density (or simply cross-spectrum) between y_a and y_b respectively. It should be noted that, while the autospectra are real valued, the cross-spectrum is complex valued. By taking τ = 0 in the spectral representation ν_aa(τ) = ∫_{−∞}^{∞} σ_aa(ω) exp{iωτ} dω, it is seen that the total sum across frequencies of the spectral density equals the variance ν_aa(0) (power, in engineering terms) of y_a(t). Therefore, the quantities ∫_α^β [σ_aa(ω) + σ_aa(−ω)] dω, α, β ∈ ℝ, 0 ≤ α < β, reflect the portion of the variance ν_aa(0) of the signal that is accounted for by the variance introduced by harmonic components with frequencies α ≤ ω ≤ β (i.e., by "oscillations at these frequencies").


In effect, σ_aa(ω) and σ_bb(ω) decompose the total signal variances ν_aa(0), ν_bb(0) into (independent) variance contributions at different sets of frequencies. The quantity ∫_α^β [σ_ab(ω) + σ_ab(−ω)] dω reflects the portion of the covariance ν_ab(0) between the signals that is accounted for by the harmonic components at frequencies α ≤ ω ≤ β. The coherency function is now defined in a similar way as correlation is defined for continuous random variables: ρ_ab(ω) = σ_ab(ω)/(σ_aa(ω) σ_bb(ω))^{1/2}; its squared modulus |ρ_ab(ω)|² is defined to be the coherence at frequency ω. A straightforward estimate of σ_ab(ω) is suggested by the (complex valued) quantity

    I_ab^(T)(ω) = (2π)^{−1} ∫_{−T/2}^{T/2} v_ab(τ) exp{−iωτ} dτ,

where v_ab(τ) was obtained from (2.8), or I_ab^(T)(ω) = (2π)^{−1} Σ_{τ=−n}^{n} v_ab(τ) exp{−iωτ} for the sampled case. I_ab^(T)(ω) is called the cross-periodogram. This estimate of σ_ab(ω) is unbiased as T → ∞ (with fixed sampling frequency), but inconsistent, because its variance converges to a constant > 0 [26, 133]. Some frequencies are of special significance, as will be seen in the following chapter; these are the frequencies ω_k = 2πk/n, k = 1, ..., K < n/2. It may be noted that the spacing of these special frequencies is dictated by the record length n. As with the cross-correlation function, two lines of construction of consistent estimates exist. One line of construction assumes that the spectrum is relatively constant in a small band of frequencies, and then smooths the periodogram by averaging adjacent I_ab^(T)(ω) within this frequency band:

    s_ab(ω_k) = (2J + 1)^{−1} Σ_{j=−J}^{J} I_ab(2π(k + j)/n).

It can be shown that this estimate is asymptotically unbiased and consistent if J → ∞ as T → ∞ (for fixed sampling frequency), while the length of the averaging window goes to zero [26, ch. 7]. To reduce spectral leakage (the source of the bias), a weighted average may be used instead of the simple average, or the cross-covariance may be tapered with a lag window before applying the Fourier transform. This type of estimate is suitable for single trial analysis. The other line of construction assumes that multiple observations y_{a,l}(t) and y_{b,l}(t) are available, and estimates σ_ab(ω) by determining I_{ab,l}^(T)(ω) for each trial l and averaging across trials:

    s_ab(ω_k) = (1/L) Σ_{l=1}^{L} I_{ab,l}^(T)(ω_k).

This estimate stabilizes the erratic behavior of I_{ab,l}^(T)(ω_k) as L → ∞, but is biased for finite T. Bias may be reduced by using a taper. As T → ∞ and L → ∞, this estimate is asymptotically unbiased and consistent. The coherency function may be estimated by ρ̂_ab(ω) = s_ab(ω)/(s_aa(ω) s_bb(ω))^{1/2} [26, 133]. In what follows we will often collect s_aa(ω), s_bb(ω), and s_ab(ω) in the square m × m matrix S(ω) = (s_ab(ω)).

From the outset, coherence analysis assumes stationarity of the signals. This may be relaxed to some extent by assuming that the signals are only locally stationary; this implies that within short time windows the process behaves stationary, but the global characteristics vary throughout the trial (cf. [9]). A type of dynamic analysis of the coherence between signals was presented under these assumptions in [9], called event related coherence: these authors suggest computing the coherence at multiple points in time within a sliding window, chosen to justify the stationarity assumption and to retain a desired degree of frequency spacing. Coherence estimates may then be obtained across trials, or across adjacent frequencies. As an alternative for tracking dynamic changes in coherence it has been recommended to use wavelet transforms [135].
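A compact sketch of the across-trials construction, not code from this thesis: per-trial cross-periodograms are computed from FFTs (with a taper to reduce leakage) and averaged, after which the coherence follows as |s_ab|²/(s_aa s_bb); normalization constants cancel in this ratio. The simulated 10 Hz coupling with a fixed phase offset is an illustrative assumption.

    import numpy as np

    rng = np.random.default_rng(5)
    L, n, fs = 200, 256, 256.0                 # trials, samples per trial, sampling rate
    t = np.arange(n) / fs
    f = np.fft.rfftfreq(n, 1 / fs)
    Saa = np.zeros(f.size); Sbb = np.zeros(f.size); Sab = np.zeros(f.size, complex)

    taper = np.hanning(n)                      # taper against spectral leakage
    for _ in range(L):
        phase = rng.uniform(0, 2 * np.pi)      # the phase varies over trials
        ya = np.sin(2 * np.pi * 10 * t + phase) + rng.standard_normal(n)
        yb = 0.8 * np.sin(2 * np.pi * 10 * t + phase - 0.5) + rng.standard_normal(n)
        A, B = np.fft.rfft(ya * taper), np.fft.rfft(yb * taper)
        Saa += np.abs(A)**2; Sbb += np.abs(B)**2; Sab += A * np.conj(B)

    # Coherence |rho_ab|^2: constants cancel in the ratio of averaged spectra
    coh = np.abs(Sab / L)**2 / ((Saa / L) * (Sbb / L))
    print(f[np.argmax(coh)])                   # the peak lies at (or near) 10 Hz

Note that the coherence is high at 10 Hz because the phase difference between the two channels is stable across trials, not because the channels have the same phase; this anticipates the interpretation given next.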


Interpretation. Since sab(ω) is an average across multiple trials or frequencies, it may be reasoned that its value is small if the argument (phase angle) $\Delta\phi_{ab,l}(\omega) = \operatorname{Arg} I_{ab,l}(\omega)$ of $I_{ab,l}(\omega)$ varies greatly across these trials or frequencies, whereas its value will be large if the arguments $\Delta\phi_{ab,l}(\omega)$, l = 1, . . . , L, have a preferred value. The former is for instance the case if the arguments are uniformly distributed across trials (frequencies). Alternatively, the value of sab(ω) may also be low because of a lack of covariation across trials (frequencies) in the amplitudes of the oscillations in both signals, although this has a much smaller effect on coherence than great variability of the phase difference, because the covariation between the amplitudes can only be positive (see [98] for a simple demonstration of these effects). It may therefore be concluded that both the cross-spectrum sab(ω) and the coherency ρab(ω) are for the larger part measures of the stability of the phase relation between the signals at frequency ω. The squared modulus of ρab(ω), the coherence, quantifies the relation between signals at frequency ω in the same way as the squared correlation coefficient [26]; in particular, like the squared correlation coefficient, the coherence ranges between 0 and 1.

An alternative, more precise, interpretation of coherence is obtained from the model in (2.9): the cross-covariance between y1(t) and y2(t) in this model was found to be $\nu_{12}(\tau) = \int_{-\infty}^{\infty} h_{12}(s)\,\nu_{11}(\tau - s)\,ds$, i.e., the convolution of the autocovariance of y1(t) with the filter h12. The cross-spectrum was defined by $\sigma_{12}(\omega) = (2\pi)^{-1}\int_{-\infty}^{\infty}\nu_{12}(\tau)\exp\{-i\omega\tau\}\,d\tau$, so that by the convolution theorem of Fourier analysis we have $\sigma_{12}(\omega) = \beta_{12}(\omega)\sigma_{11}(\omega)$, or $\beta_{12}(\omega) = \sigma_{12}(\omega)/\sigma_{11}(\omega)$, where $\beta_{12}(\omega) = \int_{-\infty}^{\infty} h_{12}(\tau)\exp\{-i\omega\tau\}\,d\tau$. In a similar manner the spectrum of y2(t) is found to be $\sigma_{22}(\omega) = |\beta_{12}(\omega)|^2\sigma_{11}(\omega) + \sigma_{\varepsilon}(\omega)$. From this it is inferred that $\sigma_{\varepsilon}(\omega) = \sigma_{22}(\omega) - |\beta_{12}(\omega)|^2\sigma_{11}(\omega) = \sigma_{22}(\omega) - |\sigma_{12}(\omega)|^2/\sigma_{11}(\omega) = (1 - |\rho_{12}(\omega)|^2)\,\sigma_{22}(\omega)$. Consequently, $1 - |\rho_{12}(\omega)|^2$ is interpreted as the proportion of variance in y2 at frequency ω that is not explained by the linear filter model specified for the relation between y1 and y2 in (2.9), and $|\rho_{12}(\omega)|^2$ as the proportion of variance that is explained by the filter model.

Problems. Coherence has been characterized as an essential tool in EEG/MEG signal analysis of functional connectivity between various brain areas [63, 65]. A number of problems with the interpretation of coherences between EEG/MEG signals at different scalp sites have been pointed out, however. Most of the problems regarding the relation between signal coherence and functional connectivity between different brain regions concern the inverse problem. Because the head is a volume conductor, measurements taken with sensors at scalp locations aggregate activity over larger volumes. If sensors at different scalp locations have overlapping measurement volumes, then coherence will result because they (partly) measure the same activity, and not because different regions are functionally connected. Take the potential/field due to a single dipole as an example: the activity of this dipole will be measured at each sensor, although differently attenuated (due to the distance between a specific sensor and the dipole source, and due to the conductivity properties of the head).
If the noise at each sensor has approximately the same variance, then due to the different attenuation factors different sensors will experience different signal-to-noise ratios and hence correlate differently. The pattern of correlation, however, can be precisely predicted from the attenuation coefficients, and in fact simply reflects the iso-potential/field contours of the dipole measured without noise. For multiple, independently active dipoles the correlations will similarly follow the potential/field contours; for dipoles with correlated activity the picture becomes more complicated. The simple conclusion is, of course, that the sources of coherence are at least as difficult to localize by visual inspection of the coherence topographies as it is to localize sources from the potential/field topographies of trial-averaged data. To reduce the problem of localization ambiguity, it has been suggested for EEG to calculate an estimate of the surface Laplacian (the "total" second order spatial derivative) of the potential topography. The Laplacian of the potential is approximately proportional to the surface-normal component of the scalp current density (SCD) [182]. More generally, the Laplacian acts like a spatial filter that attenuates low spatial frequencies and emphasizes higher spatial frequencies.

[Figure 2.2 appears here: four topographic maps, labeled electric potential (top left), magnetic field (top right), scalp current density (bottom left), and negative Laplacian of magnetic field (bottom right).]

Fig. 2.2. Laplacian of EEG and MEG (radial component of the magnetic field measured with first order gradiometers): Two dipoles, one low amplitude shallow (eccentricity 0.8) source and one high amplitude deep (eccentricity 0.3) source, were placed inside a spherical head model. The locations and orientations are stylistically indicated by the black dot-with-bar. The upper panels depict the potential (left) and field (right) maps; the lower panels depict the respective negative Laplacians.


The result is that, because the field pattern varies more quickly across space near a current dipole than further away, contributions to the scalp topography of shallow currents close to the sensor array are emphasized, while contributions of deeper currents are attenuated. For MEG the Laplacian was suggested in [23], and a similar procedure, called a V3 transform, was suggested in [117] (cited in [74]) and [243]. In Fig. 2.2 the spatial high-pass filter effect of the Laplacian is illustrated for EEG and MEG. The effect is demonstrated by placing two dipoles in a spherical head model conductor, one shallow (eccentricity 0.8 from the center of the sphere) and one deep (eccentricity 0.3); both EEG and MEG are depicted. The deep dipole produces a field that changes relatively slowly across the surface, whereas the shallow dipole has a more quickly varying field. The negative of the Laplacian (which in EEG is proportional to the radial scalp current density) was obtained from a spherical spline interpolation algorithm [181, 182]. Clearly the Laplacian emphasizes the visibility of the shallower dipole, while at the same time the deep source is greatly diminished. The Laplacian can be estimated by various methods, including: Hjorth's method, in which essentially a local weighted average reference is computed (sketched below); spherical spline interpolation of the potential, followed by taking the analytical Laplacian of the interpolation splines [181, 182]; or spherical harmonics expansion [177] (cited in [137]). The Hjorth method can differ substantially from the other two methods, because it is a very crude approximation to the true Laplacian and depends strongly on the chosen sensor combinations [173, 174]. The latter two methods have been reported to yield inflated coherence estimates, because in the interpolation procedure signals at widely separated locations are linearly combined, and the weighting coefficients for the Laplacian estimate at each location do not necessarily decrease with the distance between sensors [15,16]. This result has been disputed, however, in [137, 174, 176, 181], because it was derived from random noise, for which the spatial derivative is not defined, and it is therefore deemed incorrect. The advantage of the Laplacian is that no assumptions have to be made about the distribution or the number of sources in the brain. The Laplacian is sensitive, however, to the (local) ratio of the conductivities of brain and skull: which spatial frequencies are emphasized to what extent varies with this ratio, resulting in less focal contour topographies when this ratio is lower [173,213]. Furthermore, the spline and spherical harmonics interpolation algorithms do not account for the noise in the sensor signals, which can greatly diminish their effectiveness [23]. The difference between the spatially high-pass filtered topographies and the underlying three-dimensional current density in Fig. 2.2 indicates that the Laplacian still leaves substantial ambiguity about the origin of the signal.
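As an illustration of Hjorth's approach, the following minimal R sketch computes a local-average-reference Laplacian estimate; the montage and the neighbor list are hypothetical.

    # Hjorth-style Laplacian: each electrode's potential minus the average of
    # its neighbors. V is an n x m matrix of potentials (time by electrodes);
    # `neighbors` is a list giving, for electrode j, the indices of its
    # nearest neighbors on the montage.
    hjorth <- function(V, neighbors) {
      sapply(seq_along(neighbors), function(j)
        V[, j] - rowMeans(V[, neighbors[[j]], drop = FALSE]))
    }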
An alternative procedure to the surface Laplacian calculates the potential on the smooth dura sheet surrounding the volume of the brain, without considering the cortical convolutions. To do so, the dura potential is expanded on a basis of spherical harmonics assuming radial dipoles [171]. To account for the smearing conductive properties of the head, mostly spherical head models are used [171, 213]. The resulting images are generally very similar to the surface Laplacian [174]. The procedure belongs to a whole class of similar methods that attempt to estimate a charge or dipole layer on the dura sheet, which come under various names ("cortical surface potential imaging", "spatial de-blurring", "high-precision EEG" [61]) and all give somewhat different results. All these techniques are suited for scalar fields blurred by a spatial low-pass filter, and have been found to give a slight improvement over the Laplacian in terms of correlation with the actual scalar field [61,173], although the spline Laplacian has been argued to be more robust because it does not depend on a head model.

Apart from conductivity effects on coherence, EEG coherence suffers from the infamous reference electrode: signals are obtained by measuring the electric potential at different scalp sites with respect to an electrode at a reference site. Ideally, the charge density at the reference electrode remains constant over time. To approach this ideal situation, the reference electrode is placed at a site on the head that is assumed to be relatively electrically inactive, usually an earlobe, a mastoid, the nose, or the chin. However, since the reference is always on the head,
it cannot be "far enough" from the sources within the head to justifiably call it "electrically inactive" [129]. The reference electrode is therefore far from ideal, and the charge density at the reference electrode varies over time, so that the measured potential signal actually represents the difference of two signals generated by charge density fluctuations at two different sites. Dramatic changes in coherence values were demonstrated in [66] when the difference in activity between the reference site and the active sites was varied in power as well as in phase: coherence could be either inflated or deflated, depending on the precise relations between the electrical activities at the 'active' sites and the reference site. Solutions that have been proposed for this problem include the use of an average reference [9,66], which can be motivated by the fact that the potential summed across the entire scalp is zero (the head as a whole is electrically neutral). In practice, however, only the upper half of the head can be sampled, and electrodes are mostly not equally spaced, which makes the average reference very dependent on the precise locations of the electrodes; it therefore may not be close to zero [3, 129]. Alternatively, it has been suggested to use bipolar potential derivations, which are approximately proportional to the first order spatial derivative; then no two signals in an analyzed pair share a common reference signal. The obtained gradient estimates are, however, highly dependent on the choice of electrode pairs, the orientation of the axis they define, and the spacing between the electrodes [174], and electrode pairs located above co-active brain areas can fail to detect significant voltage differences [131, p. 3096]. The use of multiple references has also been suggested, e.g. one on the contralateral side of the head for each hemisphere [131]. This will most likely complicate the interpretation in a similar way as the common reference; in this case, however, it will be unknown which subgroup of the four signals underlying each pair of derivations is coherent. The Laplacian discussed earlier has also been advocated as a "reference free" alternative to referenced signals. However, it has been argued that the Laplacian should not be considered reference free, since it is always the difference between the potential at the target electrode and a linear combination of the potentials at surrounding electrodes [174, 188]. Because no universally accepted "solution" to the reference problem exists, some researchers have even argued that different references should be seen as providing images (of coherence) at different "spatial scales" [173, 174].

From a more statistical point of view, there is the problem of the sheer number of coherence estimates. Coherences are often obtained for 60 to as many as 128 electrodes or 151 MEG sensors, in many different frequency bands (e.g. up to eighty [214]); this may easily result in over 8,000 to 11,000 coherences per frequency band. Although the significance level of a single coherence is easily calculated, there is no simple way to avoid major capitalization on chance without significant losses in statistical power (as is the case e.g. with Bonferroni correction). In so far as the signals of fMRI and PET are known to come from specific sites, no interpretational difficulties have been indicated; these may arise as soon as it is not entirely clear which locations the signals reflect.
Real problems for fMRI are the enormous amounts of data that are gathered (capitalization on chance) and, more importantly, the fact that sampling rates can be as low as once every 1.7 seconds [71], while the BOLD response in fMRI is slow, with a relatively large extent in time (approximately 10-15 s, with a maximum at 2-6 s post stimulus; see e.g. the estimated BOLD response kernel presented in [71]). Coherence analysis is therefore restricted to very low frequencies, and is likely to suffer from aliasing of spectral power from higher frequencies into these low frequencies. Avoiding this requires long sample paths, which are likely to violate the stationarity assumption. Coherence analysis should therefore be considered inappropriate for fMRI and PET. Time domain cross-correlations, on the other hand, estimated across trials, do seem appropriate for fMRI and PET.

2.3.4 Phase-locking

Methods and Assumptions. In phase-locking analysis the purpose is to detect a stable phase relation between signals, that is, whether the phase difference at particular frequencies remains
more or less constant, at least over a portion of the time in which the signals were obtained. A variety of methods have been proposed to analyze relations between signals in terms of phase locking. These methods have been designed to apply under the assumption of local (short time) stationarity of the signals, e.g. [223], or under the assumption of continuous change of the phase relationship, e.g. [136]. The first approach estimates the phase difference between signals from the sample cross-spectrum sab(ω), simply by calculating its argument (phase angle), i.e., the estimates are $\Delta\phi_{ab}(\omega) = \operatorname{Arg} s_{ab}(\omega)$. The second approach requires a technique that estimates the instantaneous phase difference at each time point in the analyzed time interval; the latter situation, in which the phase difference changes constantly over time, is sometimes considered to be of fundamental importance in neuro-cognitive brain function [230]. To this end, the sensor signals ya(t) and yb(t) are first band-pass filtered with a narrow-band filter centered at the frequency ω of interest. The band-pass filtered signals are then used to construct instantaneous phase estimates for each of the signals. Techniques for obtaining the instantaneous phases φa(ω, t) and φb(ω, t) are based on the Hilbert transforms of the signals, as described in [105,153,154,194,222], or on convolution with a complex valued (Gabor) wavelet [136,139]. From these instantaneous phases a time dependent phase difference $\Delta\phi_{ab}(\omega, t) = \phi_a(\omega, t) - \phi_b(\omega, t)$ is obtained, and time windows in which this phase difference is near constant are sought. As a measure of stability of the instantaneous phases it was suggested in [136] to use the estimates' variance across trials. Sometimes the generalized phase difference $\Delta\phi_{ab}(\omega, t) = p\,\phi_a(\omega, t) - q\,\phi_b(\omega, t)$ is considered, where p and q are fixed integers [139, 194, 230]. Generalized phase locking is then detected if $|\Delta\phi_{ab}(\omega, t) - C| < \epsilon$ in an appreciable interval of time, for some constant C and some small positive constant $\epsilon$. A comparison of the Hilbert transform based and wavelet based types of phase estimates in [139] suggests only minor differences.
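The second approach can be sketched in a few lines of R; the sketch assumes the single-trial signals have already been narrow-band filtered, and uses the across-trials stability of the phase difference as the locking index (in the spirit of [136]).

    # Instantaneous phases via the analytic signal, and an across-trials
    # phase-locking index. xa and xb are assumed to be L x n matrices of
    # narrow-band filtered single-trial signals (one row per trial).
    analytic <- function(x) {                 # analytic signal of one trial
      n <- length(x); X <- fft(x)
      h <- numeric(n); h[1] <- 1              # weights implementing the Hilbert transform
      if (n %% 2 == 0) { h[n/2 + 1] <- 1; h[2:(n/2)] <- 2 } else h[2:((n + 1)/2)] <- 2
      fft(X * h, inverse = TRUE) / n
    }
    plv <- function(xa, xb) {
      pa <- t(apply(xa, 1, function(x) Arg(analytic(x))))  # instantaneous phases
      pb <- t(apply(xb, 1, function(x) Arg(analytic(x))))
      Mod(colMeans(exp(1i * (pa - pb))))      # 1 = perfectly stable phase difference
    }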
Interpretation. Extensive time windows (typically a few hundred milliseconds or so [139]) of a stable phase difference, i.e. phase coupling, between two brain signals can be indicative of functional connectivity. Such phase locking has been implicated as a solution to the visual binding problem [208,230]. Under some circumstances the phase difference analysis can be turned into a technique that allows for the detection of effective connectivity in terms of temporal precedence. This technique results from the time-shift theorem of Fourier analysis, which has the corollary that the argument of the cross-spectrum at frequency ωk between a signal and a copy of itself, shifted in time by τ, is τωk. In other words, the phase of the cross-spectrum will be a linear function of frequency, with a slope that is proportional to the time lag between the signals. Therefore, if such a linear relation is observed in (a portion of) the phase spectrum, the lead-lag relation may be determined, and the precise time delay between the signals may be obtained from an estimate of the slope of the linear regression of the phases on the frequencies. This method hinges on the assumption that the signal of the leading sensor is transmitted relatively undistorted, in at least a portion of the spectrum, to the second sensor. This technique has been successfully applied to EEGs of spike activity in epileptic seizure propagation from one hemisphere to the other in [84]. Note that the time delays could also have been determined from the main peak of the cross-correlation function, but in some cases one would like to limit the analysis to the frequency band that displays a linear phase relation, which is most easily identified by the method described here.
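A minimal R sketch of this slope-based delay estimate follows; the cross-spectrum vector sab, the angular frequency grid omega, and the band selection are assumptions of the sketch, and the sign of the slope depends on the cross-spectrum convention used.

    # Regress the unwrapped cross-spectral phase on frequency within a band
    # showing an approximately linear phase relation.
    phi  <- Arg(sab[band])
    jump <- c(0, diff(phi))
    phi  <- phi - 2 * pi * cumsum(jump > pi) + 2 * pi * cumsum(jump < -pi)  # crude unwrapping
    tau  <- coef(lm(phi ~ omega[band]))[2]  # slope: time delay (in samples); sign gives the lead-lag direction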

Problems. In [74] it was argued that under some relatively general assumptions a stable phase difference is a necessary condition for non-zero cross-correlations; the same arguments lead to the same conclusion for coherence. A demonstration of non-zero coherence between signals may therefore be an implicit demonstration of phase locking (cf. [74]). On the other hand, phase differences between different channels are not meaningful without significant coherence [84]. There thus seems to be little advantage in analyzing phase difference stability at isolated frequencies over coherence analysis. Phase analysis in addition to coherence can be useful, however, because a phase difference unequal to 0 (mod π) cannot be due to volume conduction of the signal of a single source; phase analysis may therefore help to identify volume
conduction artifacts (cf. [223]). Because a phase analysis relies on the significance of the coherence, and coherence reflects the iso-potential and iso-field lines, phase analysis does not simplify localization of the origin of coherence any more than coherence itself does. Having identified a stable phase relation at a single frequency does not allow inference on effective connectivity in terms of temporal precedence, because the phase at a particular frequency can only be determined up to a value (mod 2π); this holds both for the instantaneous phase difference estimate and for the more traditional cross-spectrum based phase difference estimate. In contrast, phase difference estimates at consecutive frequencies do allow an assessment of effective connectivity. A caveat of this technique, however, is that it only allows detection of relatively undistorted transmission of activity from the leading signal to the lagging signal (in at least one band of frequencies). This requires filtering with minimal phase distortion [133], which may be difficult to realize for a biological system.

2.3.5 Event Related (De-)Synchronization

Method and Assumptions. It has long been observed that certain events, like opening the eyes, can "block" or desynchronize the ongoing alpha rhythm of the EEG. Furthermore, it has been shown that visual stimuli can reduce the amplitude of the background EEG [183, 241]. Whereas stimulus-locked ERPs are thought to reflect the cascade of postsynaptic potentials evoked by a stimulus or event, amplitude modulations and alpha blocking are thought to result from changes in the parameters that control the oscillations in cell assemblies [183]. Cell assemblies in the cortex were found to be able to generate different frequencies under different behavioral states (sleep vs. paying attention) in dogs [143,183]; the type of oscillation will therefore depend on the configuration of the activity in the cells involved. A key feature in this matter is that when cells are entrained in an assembly, the oscillation frequency that is enforced on their activity tends to diminish as the number of cells entrained grows, while at the same time the local field potential amplitude increases due to the greater number of cells acting in synchrony. This feature is due to the relative coupling between cells in a network, as was confirmed in simulations [142] (cited in [183]). Thus higher frequencies seem to be associated with greater spatial variation in oscillations (smaller, more independently acting assemblies), whereas the opposite seems to be true for lower frequencies (cf. [183]). It is therefore believed that changes over time in the amplitude variance in specific frequency bands reflect variations in local neuronal coupling [183]. In line with this interpretation, when this change is displayed as a percentage relative to a reference baseline level (determined e.g. a few seconds prior to stimulus presentation), a decrease of this percentage is called an event related desynchronization (ERD), whereas a rise of this percentage is termed an event related synchronization (ERS). From the outset, then, the signals are assumed to be non-stationary.
The basic steps in the computation involve: band-pass filtering of the entire signal set (e.g. in the alpha band, which is then often divided into two sub-bands, 7-10 Hz and 10-12 Hz), estimation of the second order moment of the amplitude at each time point, averaging across trials, and smoothing of the second order moment time functions over time (a sketch is given below). The way in which the second order moment is calculated varies from author to author: it may be done by squaring the signals [241], or by subtracting the trial-averaged ERP prior to squaring [183]. ERD/ERS extrema are often used as dependent variables (similar to traditional peak picking in ERP analysis).

Interpretation. The ERD/ERS is interpreted as an overall measure proportional to the strength of coupling between unidentified neurons in a presumably localized neural assembly. On the one hand ERD/ERS is indicative of functional coupling; on the other hand it is more reflective of functional segregation, as it presumes a strongly localized neural assembly whose local segregation and integration are reflected by ERD and ERS, respectively. It therefore serves a somewhat different purpose than the other measures discussed in this section.
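A minimal R sketch of the basic steps listed above; the band-pass filter object (here from the signal package) and the baseline window are assumptions of the sketch, and subtracting the trial-averaged ERP before squaring is one of the variants just mentioned.

    # ERD/ERS: band-pass filter, square, average over trials, smooth, and
    # express relative to a pre-stimulus baseline. X is an L x n matrix of
    # single-trial signals from one sensor; bf is a filter from signal::butter;
    # baseline is an index vector of pre-stimulus samples.
    erd <- function(X, bf, baseline, span = 11) {
      Xf  <- t(apply(X, 1, function(x) signal::filtfilt(bf, x)))  # band-pass filter
      pow <- colMeans(Xf^2)                                       # power averaged over trials
      pow <- as.numeric(stats::filter(pow, rep(1/span, span)))    # smooth over time
      ref <- mean(pow[baseline], na.rm = TRUE)                    # baseline level
      100 * (pow - ref) / ref     # < 0: ERD (desynchronization); > 0: ERS
    }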


Problems. ERD/ERS analysis suffers from the same problems as coherence analysis and correlation: both EEG and MEG have a low spatial resolution, for which it has been recommended to use (spline) Laplacian derivations, cortical images, or a distributed source model (the results of which are mostly the same) [183]. For EEG signals there is the reference problem, for which the same solutions have been suggested as for coherence (average reference, local average reference, and Laplacian). A problem specific to ERD/ERS analysis is the determination of the center frequency and the width of the pass band of the filter: large inter-individual variation exists with respect to the alpha frequency (i.e., the actual frequency band that should be considered real 'alpha' differs from subject to subject). Several methods intended to overcome this problem are discussed in [183]. A further problem for ERD/ERS arises if the assumption of an evoked response that is homogeneous and stereotyped across trials does not hold. In that case it is not unlikely that the variance of the trial-to-trial variation of the evoked response around the trial-averaged evoked response is functionally related to the amplitude of the trial-averaged evoked response (see [225] for an example of such an effect).

2.3.6 PCA

Method and Assumptions. It has been suggested to use principal components analysis (PCA) of the signal covariance matrix to obtain indications of the independent "functional networks" that exist throughout the brain. The decomposition rests on the assumption that the networks satisfy a spatial orthogonality constraint: two "networks" receive contributions from different regions, quantified by the network loadings, in such a way that these network loadings are orthogonal. This is for instance the case if each region contributes to only one network. The covariance used to this end is obtained by averaging across samples and/or trials (fMRI [70]) or across subjects (the only feasible method for PET [156]). To consider the latter as equivalent to the former requires an ergodicity assumption on subjects.

Interpretation. Principal components have been interpreted as functional networks based on PET and fMRI data [70, 159], as well as on EEG/MEG [70, 120, 149]. The associated component loadings indicate the regions that take part in the network represented by the component (i.e., a region with zero loading does not take part in the network). PCA is a data transformation technique, and is in fact the eigendecomposition of the data covariance matrix (see the sketch below). The resulting eigenvalues and associated component loadings (eigenvectors) are thought of as characterizations of underlying large scale networks. The spatial patterns reflected in the loadings, sometimes called eigenimages, are interpreted as being indicative of the involvement of an associated location (e.g. electrode location) in the activity of the "large scale network" represented by the extracted component.
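Computationally this amounts to no more than an eigendecomposition; a minimal R sketch, in which the data matrix Y is an assumption of the sketch:

    # Eigenimages as principal components of the zero-lag covariance matrix.
    # Y is an n x m data matrix (time points, trials or subjects by regions).
    C   <- cov(Y)
    eig <- eigen(C, symmetric = TRUE)
    loadings  <- eig$vectors                    # columns: orthogonal "eigenimages"
    explained <- eig$values / sum(eig$values)   # proportion of variance per component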
Problems. In EEG/MEG the lead field is a major determinant of the correlation pattern, as was discussed previously in relation to coherence analysis. It is therefore also the major determinant of the component structure and the component loading patterns; hence this method is inappropriate for EEG/MEG (see [171] for a similar argument). For PET and fMRI studies these "networks" should be considered highly tentative: the "networks" are forced to be uncorrelated and the network loadings are forced to be orthogonal. This is not only very restrictive, it should also clearly be an undesirable feature, since different cortical networks are not expected to work in complete isolation. PCA is known in psychometrics as an exploratory method, in contrast to confirmatory factor analysis, in which the hypothesized factors are the result of theory. The hypothesized factor model is then tested through appropriate statistical tests, to see whether the theory should be rejected on the basis of the data. The factor model, in particular the confirmatory factor model, which allows correlations between components, may be a viable alternative to PCA for network analysis. As a further cautionary note, it is not unlikely that lagged covariances exist in fMRI data. In such cases, looking only at the covariance matrix at lag zero may strongly mislead interpretation, since highly correlated but lagged components ("networks") can show up in a PCA
of covariance matrices at lag zero as completely independent components ("networks"). This precludes invocation of standard PCA in the analysis of multiple time series [7, 162]. We will return to these points when discussing the dynamic factor model below.

2.3.7 Path Analysis

Method and Assumptions. Path analysis, or structural equation modelling (SEM), has been applied in PET and fMRI to assess effective connectivity [33,34,158]. Structural equation modelling can be conceived of as a sequence of regressions of variables onto each other, in accordance with a pre-specified structure of (causal) relations. Thus a priori information (assumed to be available) is used to specify the paths along which the interactions take place; the path analysis then estimates the strengths of these paths. For example, the activity $\eta_a$ in region a may be known or hypothesized to depend (linearly) on the activities $\eta_b$ and $\eta_c$ in regions b and c, whereas the activity $\eta_b$ in region b depends on the activity in region c, and the activity $\eta_c$ in region c in turn depends on the activities in regions a and b. The resulting (linear) equations are then

$$\eta_a = \beta_{ab}\eta_b + \beta_{ac}\eta_c + \zeta_a, \qquad \eta_b = \beta_{bc}\eta_c + \zeta_b, \qquad \eta_c = \beta_{ca}\eta_a + \beta_{cb}\eta_b + \zeta_c,$$

or, alternatively, $\eta_a$ may depend on $\eta_c$ only indirectly, through its dependence on $\eta_b$; then $\eta_a = \beta_{ab}\eta_b + \zeta_a$. In fMRI the paths may be specified in accordance with known anatomical pathways [230]. The structural equations imply a certain structure for the covariances between $\eta_a$, $\eta_b$, and $\eta_c$, and the path coefficients may therefore be estimated by comparing the observed covariances with the predicted covariances and minimizing their difference (a sketch of such a model fit is given below). Subject to distributional assumptions on the data, the adequacy of the model (i.e., the confrontation of the model with the data) may be assessed using a chi-square goodness-of-fit test. The distributional assumption that has to be made depends on the type of estimator chosen: for ML estimators the usual assumption is multivariate normality; for other types of estimators, asymptotic normality of the sample variances and covariances is assumed [29], or only finite fourth order moments of the data [30]. Structural equation models are not confined to linear interactions [5, 20]; for example, nonlinear interactions were included in a path analysis conducted in [33] to assess changes in effective connectivity. The sample covariances used in the estimation procedure may be estimated over time, across trials (fMRI [34]), or across subjects (the only feasible option for PET [156, 157]). To regard any two of these sample covariances as equivalent requires an appropriate ergodicity assumption.

Interpretation. The path coefficients are interpreted as the change in one variable (one cortical region) due to a unit change in another, while the other variables are held constant [20,34]. It has been suggested that averaging across trials yields estimates of direct connections, while averaging across subjects is indicative of how well a connection generalizes in the population [156]. If subjects have different structures of cortical activation in a given task, however, e.g. due to different task strategies, the structural coefficients obtained from subject-averaged covariances have no straightforward physiological interpretation in terms of effective connectivity. Although time is usually disregarded, it is easy to accommodate time lag dependencies in SEM models. A Kalman filter method to characterize the change of path coefficients over time was introduced in [35].
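As an illustration, the three-region example above could be fitted as follows, using the lavaan R package as one possible choice of SEM software; the data frame and variable names are hypothetical.

    # Path model for the three-region example; `activity` is assumed to be a
    # data frame with columns eta_a, eta_b and eta_c of regional activity.
    library(lavaan)
    model <- '
      eta_a ~ eta_b + eta_c
      eta_b ~ eta_c
      eta_c ~ eta_a + eta_b
    '
    fit <- sem(model, data = activity)
    summary(fit, fit.measures = TRUE)  # path coefficients and goodness-of-fit test

Whether such a recurrent system identifies all of its path coefficients is a separate matter, discussed below.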

Problems. Very often a priori paths are unknown, and path models are constructed in an ad hoc way. First of all, the regions to include in a path model are not at all obvious and must be determined in some ad hoc way, e.g. by looking at eigenimages [156]. Secondly, relevant pathways have to be extracted from the neuroanatomy literature, which is not a trivial step and often depends on the convictions of the researcher [156]. For most models, furthermore, alternative models can be found, with entirely different interpretations, that fit the data equally well.
Commonly, a researcher starts with a very simple model, which often does not fit the data (as indicated by an appropriate test); the researcher then modifies the presumed connectivity pattern until a fitting model is obtained. It has been recommended to automate the selection of an appropriate path model by adding connections that maximize a fitting criterion [36]. Connectivity estimates obtained in this way are susceptible to capitalization on chance, because the use of multiple tests until nominal insignificance yields an unknown significance level [20, p. 61]. Models selected automatically in this way may therefore not generalize optimally to a more general population [20] (see [147,148] for demonstrations of these effects in simulations and real data). Warnings against the automated addition of structural parameters without substantive theoretical justification are ubiquitous in the covariance structure modelling literature [148]. Even substantively justified additions of paths are problematic, because post hoc "justifications" are often easily found [148]. Automated selection can be helpful, but should be applied with care: eventually the empirical cycle must select the appropriate model, not the individual data sets. This of course requires testing a priori hypotheses against the data. However, problems may arise in the testing phase. Sometimes no appropriate test exists, or is known to exist, and one has to rely on approximations, which in turn rely on asymptotic theory for large samples. At other times exact modelling is not appropriate or simply impossible and simplifications have to be made (see below). This results in "hypothesized" models that are known to be incorrect for the data, and little rationale has been developed for deciding whether a lack of model-data conformance is due to approximation or to incorrectness. An intrinsic limitation of SEM is that not all conceivable structural equation models can be estimated, as not all of them identify their parameters (the mathematical significance of "identifiability" of a model is discussed in App. B; for now it may be described as the requirement that all path coefficients, and other parameters, can be calculated as soon as the true covariance matrix is known). Furthermore, anatomically derived paths may result in a very complex system of equations, often highly recurrent. Such highly complex models easily become unidentifiable or difficult to interpret; compromises therefore have to be made between model complexity, anatomical accuracy, and interpretability [34]. As was the case with PCA, ignoring lagged covariances among the time series, which results from estimating the sample covariance across time, limits the validity of inferences made with respect to the dependencies between regions. To repair this, the lagged cross-covariance function in (2.7) or (2.8) should be computed and modelled. This is the natural domain of parametric models for time series, which are discussed next.

2.3.8 Parametric Modelling: Vector Auto-Regressive Models

Method and assumptions. The most fundamental parametric models for (discrete) stationary time series are autoregressive moving average (ARMA) models [22, 26]. For multiple simultaneous time series, as obtained in brain activity studies, the multivariate generalization to vector autoregressive moving average (VARMA) models is required. Vector autoregressive models (VAR, without the moving average part) have been applied in EEG signal analysis in [10,68,126,141,197]. Let y(t) represent an m-vector of measurements of the signals y1(t), . . . , ym(t). The VAR model of order p expresses the measurements at time t as a linear function of the measurements at lags τ = 1, . . . , p, i.e., a linear filter model:
$$y(t) = H_1 y(t-1) + H_2 y(t-2) + \cdots + H_p y(t-p) + \varepsilon(t),$$

where $\varepsilon(t)$ is a temporally white noise excitation process with covariance $\Upsilon_{\varepsilon}(\tau) = \delta_{0\tau}\Upsilon_{\varepsilon}$, which is usually assumed to have a multivariate normal (Gaussian) distribution. Here $\delta_{0\tau}$ is the Kronecker delta symbol, which equals one when τ = 0 and zero otherwise. The m × m coefficient matrices $H_1, \dots, H_p$ and $\Upsilon_{\varepsilon}$ are unknown and have to be determined from the observed data. Estimates of the coefficient matrices are obtained from the multivariate Yule-Walker equations
constructed from the block Toeplitz lagged sample covariance matrix of the signals, as described e.g. in [68]. For stationary time series, VAR models can completely characterize the (observable) dynamics of the system generating the signals, provided their (possibly infinite) order is selected appropriately [22, 133]. For non-stationary time series, dynamic updating of the estimates has been pursued by means of lattice recursive filters [197], Kalman filtering [10], and hidden Markov methods [39]. The coefficient matrices $H_1, \dots, H_p$ then become time dependent, $H_a \to H_a(t)$, $a = 1, \dots, p$, yielding $y(t) = \sum_a H_a(t)\,y(t-a) + \varepsilon(t)$.

Interpretation. The AR filter coefficients can be interpreted by inspecting the lead-lag patterns reflected in the relative contributions of one signal to another at specific lags. While this is feasible for relatively small sets of signals, it can be a daunting task for more than, say, five signals. Because different frequencies may be interpreted as independent components in stationary time series, the interpretation of a fitted VAR model can be simplified by transforming the model into the frequency domain. To deal with the larger numbers of signals in EEG measurements, a measure of directed frequency-dependent transfer, called the directed transfer function (DTF), was suggested in [126]. Let $B(k) = \sum_{a=1}^{p} H_a \exp(-i 2\pi a k / T)$; then the transfer function of the AR filter at frequency $2\pi k/T$ is given by $[I - B(k)]^{-1}$, and can be used to quantify the relative contribution of signal $y_a$ to $y_b$, compared to the total input to $y_b$ from the other signals, by comparing the squared absolute value of the transfer coefficient from $y_a$ to $y_b$ with the sum of the squared absolute values of the transfer coefficients of all signals to $y_b$. Alternatively, the estimated transfer function can be used to obtain the cross-spectral matrix $\Sigma_k = (2\pi)^{-1}[I - B(k)]^{-1}\Upsilon_{\varepsilon}[I - B(k)^{*}]^{-1}$, from which in turn the coherence and phase may be obtained; here $^{*}$ denotes complex conjugation and transposition. In combination with the recursive techniques in [10, 141, 197], $B(k)$ becomes time dependent, $B(k) \to B(k, t) = \sum_{a=1}^{p} H_a(t)\exp(-i 2\pi a k/T)$, which results in dynamic estimates of coherence and phase. The DTF coefficients and the transfer function itself provide estimates of effective connectivity in terms of causal relations; the obtained (dynamic) coherences furthermore provide an assessment of (the evolution of) functional connectivity. A computational sketch is given below.

Problems. We defer a discussion of the problems to the treatment of dynamic factor analysis, discussed next.
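A minimal R sketch of the Yule-Walker VAR fit and the DTF computation described above, using R's built-in multivariate ar(); the data matrix X and the frequency grid are assumptions of the sketch.

    # X is an n x m matrix of signals (columns are channels).
    fit <- ar(X, method = "yule-walker", order.max = 20)   # order selected by AIC
    p <- fit$order; m <- ncol(X)
    freqs <- seq(0, 0.5, length.out = 129)                 # in cycles per sample
    dtf <- array(0, c(length(freqs), m, m))
    for (k in seq_along(freqs)) {
      A <- diag(m) + 0i                                    # A(f) = I - B(f)
      for (a in seq_len(p)) A <- A - fit$ar[a, , ] * exp(-2i * pi * a * freqs[k])
      H <- solve(A)                                        # transfer function [I - B(f)]^{-1}
      dtf[k, , ] <- Mod(H)^2 / rowSums(Mod(H)^2)           # row-normalized squared transfer
    }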

2.3.9 Parametric modelling: Dynamic Factor Analysis

Method and assumptions. An alternative to VAR models is constituted by the dynamic factor model (DFM) [161, 162]. In this model it is assumed that the observed signals are generated by a smaller set of, say, d latent stationary stochastic processes, plus residual noise signals that are unique to a particular sensor. In its simplest form the model assumes the latent processes $\zeta_a(t)$, $a = 1, \dots, d$, to be mutually uncorrelated, all with a uniform spectrum (viz. "white noise"). The observed signals are then assumed to be the response of a lagged filter system excited by the latent processes, with additive and spatially uncorrelated stationary noise processes superimposed:

$$y(t) = \sum_{a=0}^{q} \Lambda(a)\,\zeta(t-a) + \varepsilon(t).$$

Here $\zeta'(t) = [\zeta_1(t), \dots, \zeta_d(t)]$, $\varepsilon(t)$ is an m-vector of mutually uncorrelated stationary noise processes $\varepsilon_a(t)$, $a = 1, \dots, m$, and $\{\Lambda(a)\}_{a=0}^{q}$ is a sequence of unknown m × d lagged filter coefficient matrices to be estimated. The (generally unknown) number q is called the order of the DFM. Examples of applications of the DFM to EEG are found in [162–164]. As a statistical and theoretical advantage, dynamic factor models can be more parsimonious: the m sensor signals are regressed on a smaller set of signals, requiring fewer regression coefficients than VAR models, provided that the lag order of the DFM does not exceed the order of the VAR by too much.


The model can be fitted in the time domain [161] and in the frequency domain [162, 163]. The frequency domain approach can have several advantages [162]; in particular, it can greatly simplify the estimation, because the cumbersome convolution inherent in the filter model turns into linear regression in the frequency domain (see the 'explained variance by a linear filter' interpretation of coherence in Sect. 2.3.3) [162]. As in the case of VAR models, the model assumes stationarity of the signals, although some advances for non-stationary time series have been made [165]. Although methods are available (Molenaar, personal communication), DFM models with dynamically changing coefficients for non-stationary signals have not been pursued in the analysis of brain signals. VAR models and DFM models are not mutually exclusive, as the DFM can be applied to the cross-spectral matrices obtained from a VAR model [163].

Interpretation. The latent processes obtained in a DFM are interpreted as units of highly coherent activity; such a unit may be a single source or a large scale network, a distinction that might possibly be made through inspection of the filter coefficients. Lead-lag patterns in the coefficients may be indicative of effective connectivity.

Problems. Since EEG and MEG signals are linear mixtures of the signals of interest, VAR and DFM coefficient estimates will reflect contributions from cross-covariance and autocovariance, as well as from lead field induced covariance among the signals. Therefore contributions from one signal to another, as reflected in the transfer coefficients, DTF, or DFM filter coefficients, cannot be unambiguously interpreted in terms of connectivity. The filter coefficients do not simplify localization in any way. For EEG signals, the reference problem has not been treated properly in any of these models, and it potentially greatly compromises their interpretability. The average reference and Laplacian derivation may be considered for this purpose, but as with coherence analysis, this still yields ambiguous interpretations. It is generally believed that EEG and MEG are non-stationary "in the long run", so that VAR and DFM models can only be applied to short segments of the time series (of up to about two seconds). Preprocessing procedures such as trend removal have been suggested to increase the time span of signal stationarity, but trend removal should be exercised with care in EEG/MEG signal analysis, because it is a high-pass filtering operation while these signals concentrate their spectral variance in the lowest frequency ranges [163]. A disadvantage of the DFM frequency domain method is that the estimated filter is not necessarily causal, which complicates its interpretation considerably. The DFM furthermore suffers from the problem of rotation indeterminacy, common to all factor models. A solution to these problems was found in [162], where it was proposed to seek a rotation of the initial solution such that the resulting filters have minimum phase; the advantage of doing so is that the obtained filter is a unique causal filter. For otherwise freely estimated filter coefficients (i.e., if the set of estimated coefficients is not required to satisfy constraints), it remains necessary, however, to constrain the underlying sources to be uncorrelated white noise generators [161]. An alternative rotation was proposed in [164], in which a characteristic root that is common to all the processes fitted in the DFM is in fact a characteristic root of the latent process.
Hence the white noise latent signal constraint is not a necessary restriction. A problem specific to parametric models is that of filter order selection: a priori information on the filter length is usually unavailable, and the order has to be estimated. A standard choice is to minimize Akaike's information criterion [68, 125, 162].

2.4 Discussion

In this chapter we have considered how neurons interact and communicate with each other, both (briefly) at the physiological level and at the more abstract conceptual level of how the spike trains of two neurons are statistically related. A mathematical model at this conceptual level
was put forward in [25] for the relation between "average" spike train signals involved in this interaction. This model specifies the relation "on average" in terms of a time-invariant filter system, obtained as the first order term in a Volterra-like expansion. Spike intensity is thought to vary with the level of local blood supply [71, 171], which in turn is thought to constitute the BOLD response. For many neurons in a group interacting approximately simultaneously with postsynaptic neurons in another group, it was considered how this model translates into a tentative linear filter model of the relations between average excitatory postsynaptic currents (EPSCs), which are thought to be the primary current sources measured with EEG and MEG. These models are mathematical statements about the relation between signals, in a way that precisely coincides with the notion of effective connectivity. In a causal interpretation of effective connectivity, the filter kernels in these models should be considered causal filters, which are defined to depend only on past values of the exciting process and not on future values. Causality of the filter may be imposed explicitly, or may be observed by fitting an unrestricted kernel (see [25] for an example).

[Figure 2.3 appears here: three sensor coherence maps plotted over the head; in each panel the axes run from inion to nasion and from left ear to right ear (cm).]

Fig. 2.3. Coherence due to noise alone, to noise and a single dipole source, and to the same dipole source with a slightly different orientation. Blacker and thicker lines indicate higher coherence. Noise was generated from a set of 30 randomly localized sources with random orientations and amplitudes at each time point [58]. The signal-to-noise ratio was approximately 2:1. Only coherences between sensors with a distance greater than 7 cm are depicted.

An overview was given of the most common methods used to assess functional and effective connectivity. The methods discussed should not be considered an exhaustive listing of all methods presented in the literature; in particular, we did not discuss methods derived from information theory [50, 135], methods designed for detecting [2, 215] and correlating [100] non-linear signals, or graphical Markov models for multivariate time series [43, 44]. The methods discussed may be considered basic, however, in the sense that alternative methods that rely solely on signal analysis to establish connectivity will be variations of these basic methods, or closely related to them. The picture that arises is that none of the current methods is without problems, and that their interpretation can be severely hampered. For PET and fMRI, problems arise because of their very limited time resolution. For EEG and MEG, the problems concern the spatial resolution of the sensor signals due to volume conduction and, for EEG, the choice of the reference electrode, which cannot be chosen to be neuro-electrically silent [129]. The search for functional connectivity hinges on the concept of functional integration, which was already concluded to be meaningless without reference to functional segregation [70]. In other words, the notions of functional and effective connectivity are strongly rooted in the presumption that the systems involved are accurately identified, e.g. which neurons
in multi-unit recordings, or which cortical regions in PET, fMRI, EEG and MEG. In fact, effective connectivity was defined as the simplest possible neural-like circuit that could explain the dynamic relations between the units involved. Segregation, or specialization, of differently localized cortical regions is now a well-established phenomenon, and it is therefore justifiable to refer to cortical locations as the units of functional segregation. At the minimum, then, their approximate locations should be known. Without this knowledge it is hard to determine what exactly has been discovered or confirmed in an experiment concerning connectivity, especially when mere "differences in connectivity" between different conditions have been found (e.g. [173]): in discussing the results of their experiment, authors usually allude to regions whose activity they presume to be reflected in the signals measured nearby. For multi-unit recordings, fMRI, and PET that seems a valid assumption, but for EEG/MEG such a statement can hardly be maintained. "Increased functional connectivity in the frontal regions" can be a highly misleading interpretation of activity measurements that are in fact aggregate measurements, like EEG and MEG. A small reconfiguration of activity in source space may result in completely different correlations in aggregate EEG/MEG space. An example is given in Fig. 2.3: there, MEG coherence due to a single dipole source and background noise is presented for two slightly different dipole orientations. The resulting coherence patterns are suggestive of interactivity involving completely different cortical regions, and might be interpreted as "increased coherence" between the frontal and occipital regions in the last panel of the figure compared to the second panel. Of course, no such "increase" is present at the source level. Furthermore, the first panel shows "spurious" coherence between widely separated sensors picking up completely uncorrelated signals from randomly located dipoles. It is therefore argued here that connectivity studies, both functional and effective, should be concerned with the origins of the signals, and should not rely on the simple sensor-region correspondence heuristic, which can result in misguided localizations and, subsequently, misguided interpretations. PET and fMRI have the major advantage of providing accurate localization, so that identification of the cortical regions involved in connectivity is greatly facilitated. Structural equation models can be used effectively to study effective connectivity between regions involved in specific types of information processing. The dynamics of the interactions, however, taking place at millisecond-level time intervals, are not accessible in these data, due to the inherently slow physiological processes reflected by the signals. Although EEG and MEG do have millisecond time resolution, these signals are linear mixtures of the signals produced by the underlying neural generators, so structural equation models for sensor signals do not lead to correct interpretations of effective connectivity in obvious ways. The usefulness of structural equations for neurophysiological data should be clear, however, although modifications might be necessary to accommodate dynamic relations as described in terms of the linear filter model in Eq. (2.9). This would lead to a more complete description of effective connectivity.
To be able to obtain interpretable connectivity measures from EEG or MEG, the multivariate structure of these signals has to be decomposed into its spatial and temporal parts, by accounting for the biophysical laws that govern their generation. The multivariate methods of PCA, VAR and DFM all aim at a type of spatial and temporal decomposition, but none of these methods takes into account the corresponding neuro-physical measurement theory. Of these methods, the dynamic factor model comes closest to a decomposition of the signal structure that assumes an appropriate measurement model, as it assumes that common (neural) sources underlie the observed EEG/MEG signals. This is clearly a desirable feature. Unrealistic features of the DFM as applied so far are the constraint of "white noise" sources and the lagged filter relation between sensor and source signals. Not only are the underlying sources correlated (which is in fact the purpose of the analysis), they are also not "white", but have a band-limited spectrum. (As mentioned before, the white noise source constraint is in fact unnecessary, as the indeterminacy can be resolved in several ways [164].)


Also, because the quasi-static approximation is appropriate for EEG/MEG signals [52,102], there is no (observable) lag between the source signals and the sensor signals. A good starting point for decomposing the multivariate signals into their spatial and temporal components is the multiple dipole source model of the previous chapter. The DFM is readily adapted to incorporate the biophysical knowledge built into these dipole models, by means of the lead field matrix that is fundamental to them. The immediate advantage of doing so is that the requirement of latent uncorrelated white noise signal sources in the original formulation of the DFM [161, 162] becomes unnecessary; the relations between these latent sources can then be estimated. As with the DFM, a frequency domain approach will simplify model fitting considerably [162, 163], although some temporal resolution then has to be sacrificed, as stationarity of the interaction dynamics will be required. Due to the nonlinearity of the dipole model parameters, estimating these parameters may be highly complicated. In the three chapters to follow, attempts to combine dipole and dynamic latent variable models are presented. In Chap. 3 some estimation theory underlying the methods developed in Chap. 4 and Chap. 5 is presented, and algorithms are developed that simplify the estimation procedure and greatly increase computational speed. In Chap. 4 the dynamic factor model is combined with the dipole model, including estimates of the sources' amplitude cross-spectra. In Chap. 5 the method of Chap. 4 is extended to include the information about the neural sources that is present in the trial-averaged EP/EF (or ERP/ERF), which allows interactions between these sources to be assessed and is of considerable interest [77]. Furthermore, a way to include dynamical path models between the estimated sources is developed.


3 Mean and Covariance Structures for Complex Random Variables

The sub-model of the dynamic factor model that will be used in the two subsequent chapters transforms into a confirmatory factor model in the frequency domain. In this chapter we will therefore focus on the confirmatory factor model. Models of this type are usually estimated by fitting the covariance structure that is implied by a model to the observed data covariances; a mean structure can be included as well. There are several ways of fitting, of which the two most prominent shall be discussed here. Originating in the psychometric literature, the model has disseminated into both the social science and engineering literatures. The applications in the engineering literature, as well as in this thesis, often involve complex stochastic variables, which are briefly introduced first in this chapter. Subsequently, (complex normal) maximum likelihood (ML) and generalized least squares (GLS) estimation procedures for mean and covariance structures of complex stochastic variables are discussed. The discussion focuses on specialized techniques for estimating the confirmatory factor model. This targeted development of estimators has the advantage of providing estimation algorithms with increased computational speed, as well as providing some insight into the capabilities and limitations of the estimators. The confirmatory factor model has been of considerable interest in the electrical engineering signal processing literature for direction of arrival (DOA) estimation, and some of the estimation techniques developed there are of interest for the current purposes. In particular, in our discussion of ML and GLS estimation, the estimators developed in the signal processing literature based on concentrated likelihood and separable least squares methods will be discussed and extended in several directions. The chapter closes with a brief discussion of the stochastic properties of the discrete Fourier transform (DFT) of stationary stochastic signals, concluding that the mean and covariance structure methods presented below are appropriate for application to Fourier transformed data. (Parts of this chapter and App. C underlie [90].)

3.1 Complex random variables For real random variables A and B, the unique variable Z = A + iB, with i2 = −1, is defined to be a complex random variable. Its expected value EZ is given by EA + iEB. Its variance, denoted Var(Z), is defined to be E(Z − EZ)(Z − EZ), where Z denotes the complex conjugate of Z. For two complex random variables Z and W , their covariance, denoted Cov(Z, W ), is defined to be E(Z − EZ)(W − EW ). If Z1 , Z2 , . . . , Zm are complex random variables, then Z  = (Z1 , Z2 , . . . , Zm ) is an m-dimensional complex random vector, with expected value E{Z  } = (EZ1 , EZ2 , . . . , EZm ) and variance-covariance matrix Var(Z) = E{(Z − E{Z})(Z − E{Z})∗ } = (E(Zi − EZi )(Zj − EZj )), where Z ∗ denotes the complex conjugate and transpose of Z [6]. In appendix A the operators [·]R and {·}R are introduced, and a number of properties are deduced. These operators facilitate some derivations, and will be used extensively in this chapter. If Z = A + iB, where A, B ∈ Rm d , they are defined by 1

Parts of this chapter and App. C underly [90].

3.2 ML in mean and covariance structure analysis for complex normal random variables

 A [Z]R = B

 {Z}R =

and

39



A −B . B A

Some basic properties include (Thm. A.4, Thm. A.5 and Thm. A.7, p. 109-110): For an m × d matrix Z, an m × d matrix W , and an d × n matrix U , 

[Z]R  [W ]R = (Z ∗ W ),

[Z + W ]R = [Z]R + [W ]R , ∗

{Z}R = {Z }R , −1

{Z}R = {Z

{Z + W }R = {Z}R + {W }R , −1

}R if m = d and Z

−1

(3.1)

{Z}R {U }R = {ZU }R ,

exists, and

+

+

{Z}R = {Z }R

(3.2) (3.3)

where Z + denotes the Moore-Penrose generalized inverse of Def. A.6 at p. 110. Furthermore {Z}R [W ]R = [ZW ]R , [Z]R = [W ]R iff Z = W , and {Z}R = {W }R iff Z = W .

3.2 Maximum likelihood mean and covariance structure analysis for complex normal random variables Definition 3.1. [26, ch. 4] A complex valued stochastic m-vector y is said to have a multivariate complex normal distribution denoted CNm (µ, Σ), if and only if [y]R ∼ N2m ([µ]R , 12 {Σ}R ). The density of the m-dimensional complex normal distribution CNm (µ, Σ) can be shown to be [6, 26] π −m |Σ|−1 exp {−(y − µ)∗ Σ−1 (y − µ)}, y ∈ Cm With help of this definition, maximum likelihood estimation is considered. For L = n + 1 independent observations from this distribution, the likelihood function is f (µ, Σ; y1 , . . . , yL ) =

L

π −m |Σ|−1 exp{−(yl − µ)∗ Σ−1 (yl − µ)}

l=1

The ML estimates of µ and Σ of such observations are y˙ and S respectively, where 1 yl L L

y˙ =

1 ˙ ˙ ∗ (yl − y)(y l − y) L L

S=

l=1

(3.4)

l=1

It is well known that these two variables are independently distributed: From E{yl } = µ, E{S} = E{yl y∗ l − y˙ y˙ ∗ } = Σ + µµ∗ − L1 Σ − µµ∗ = L−1 L Σ, and n = L − 1, y˙ ∼ CNm (µ,

1 Σ), L

L S ∼ CWm (nΣ, n),

and y˙ ⊥ S.

Here CW(nΣ, n) denotes the complex Wishart distribution (see e.g. [6, 26]). Furthermore, they are sufficient for the complex normal distribution [6, Thm. 4.1], and hence we may consider ˙ S) instead of the full set of observations (y1 , . . . , yL ). Throughout the rest of this chapter it (y, will be assumed that L > m. In this case S is almost surely nonsingular. If µ and Σ are known functions of a set of parameters ξ, but the true parameter values ξ0 are unknown, then the ML estimate ξ of ξ0 is obtained by solving ξ = arg max f (µ(ξ), Σ(ξ); y1 , . . . , yL ). ξ

(3.5)

The negative log-likelihood is often easier to work with (the dependence on the observations is dropped):

40

3 Mean and Covariance Structures for Complex Random Variables

− log f (µ(ξ), Σ(ξ)) =

L 

[m log π + log |Σ| + (yl − µ)∗ Σ−1 (yl − µ)]

(3.6)

l=1 −1

= Lm log π + L log |Σ(ξ)| + L tr{Σ



−1

S} + L(y˙ − µ) Σ

(y˙ − µ),

and maximizing f is equivalent to minimizing − log f . In what follows we shall write simply − log f (ξ) for − log f (µ(ξ), Σ(ξ)). In practice, µ and Σ are unknown functions of ξ0 . In such cases theoretical considerations predict the structural form of these functions—which will then be called the model—and one of the objectives of maximum likelihood estimation is to evaluate the validity of this model for the observed values y1 , . . . , yL . A general approach to this problem is to use the generalized likelihood ratio test (GLRT) criterion. For the complex normal distribution this criterion tests the null hypothesis H0 that (µ, Σ) ∈ {(µ, Σ) : µ = µ(ξ), Σ = Σ(ξ), ξ ∈ Rp } against the alternative hypothesis H1 that µ is any vector in Cm and Σ is any positive definite Hermitian m × m matrix. The test statistic is constructed from the negative log-likelihood ratio: ˙ S)) l(ξ) = (− log f (µ(ξ), Σ(ξ))) − (− log f (y, = L log |Σ| + L tr{Σ−1 S} + L(y˙ − µ)∗ Σ−1 (y˙ − µ) − L log |S| − Lm

(3.7)

The negative log-likelihood ratio is convenient for optimization with respect to ξ. Analytic and numerical methods of optimization use the first order differential2 : dl(ξ) = −L tr{Σ−1 ([S + (y˙ − µ)(y˙ − µ)∗ ] − Σ)Σ−1 dΣ} − 2L (y˙ − µ)∗ Σ−1 dµ

(3.8)

Both µ and Σ in l and dl depend on the parameter vector ξ, but for clarity this dependency  It can be shown is suppressed in the notation. The GLRT is now defined to be GLRT = 2l(ξ). d

that under H0 , as L → ∞, GLRT → χ2df with df = m2 + 2m − p degrees of freedom [166, and d

references therein]. Here → indicates convergence in distribution. A second objective of maximum likelihood estimation is to obtain confidence intervals for  Under fairly parameter estimates, which indicate how much confidence one should have in ξ. p p  general conditions the maximum likelihood estimators are consistent (i.e. ξ → ξ0 , where → denotes convergence in probability), and it can be shown that [5, Thm. 4.2.4] √

−1   2l ∂ 1 . L(ξ − ξ0 ) → Np 0, − lim E L ∂ξ∂ξ  ξ0 d

(3.9)

From the covariance matrix of this asymptotic normal distribution (the inverse of the limiting  or standard errors, expected Hessian matrix of l), the standard deviations of the estimator ξ, can be obtained. In practice, where ξ0 is unknown, an approximation to the covariance matrix  With these standard errors and the asymptotic of ξ is obtained by evaluating the Hessian in ξ. normal distribution, confidence intervals can be constructed in the conventional way (refer to Chap. 4 for an example). All of this is of course standard maximum likelihood theory. 3.2.1 Concentrated likelihood methods In optimization problems such as that in equation (3.5) it is generally beneficial to separate parameters that have estimators in closed form from parameters whose estimators can only be implicitly represented and have to be solved for numerically. These closed form expressions may or may not be conditional on other parameters. The resulting estimators may be substituted for their parameters in the model defining expressions, which are used in the likelihood function that is maximized [200]. The resulting function that is to be optimized with respect to the 2

Throughout this thesis the differential as defined in [150] is used (see also Chap. 1, p. 13).

3.2 ML in mean and covariance structure analysis for complex normal random variables

41

non-separable parameters is called the concentrated likelihood [5, 200]. The benefit of obtaining closed form estimators of separable parameters and concentrating the likelihood, is not only the substantial computational simplification that can be achieved (which can be quite substantial, as subsequent chapters will indicate), they can also provide more insight into properties of the resulting estimates. In connection with this we have the following theorem, a proof of which may be found in [200, p. 39]: Theorem 3.2 ( [200], Thm. 2.2). Partition ξ  = (θ , γ  ), where θ is a q-vector of parameters, and γ is a p − q-vector. Let l(θ, γ) = l(ξ), and let ξ = (θ , γ  ) obtained from (3.5). Suppose, for any fixed θ, that γ = γ (θ) solves ∂l(θ, γ) = 0, ∂γ and let (θ) = l(θ, γ (θ)). Then ∂(θ)/∂θ|θb = 0. Furthermore, the inverse of the negative of the 2 Hessian of , (−∂ /∂θ∂θ )−1 evaluated in θ equals the upper left q × q sub-matrix of the inverse  assuming the last Hessian in of the negative of the Hessian of l, (−∂ 2 l/∂ξ∂ξ  )−1 evaluated in ξ, positive definite. With Thm. C.1 in App. C (p. 123) we generalize this result, extending its applicability to situations in which it is required that ∂ 2 l/∂ξ∂ξ  is nonnegative definite, and ∂ 2 /∂θ∂θ is positive definite. The function  in this theorem is called the concentrated likelihood [5, 200]. Note that substituting γ (θ) for γ in l can greatly increase the complexity of its derivative. However, for optimization purposes (both analytical and numerical), the derivative is only evaluated at γ = γ (θ), and by the rules of differentiation ∂(θ) ∂l(θ, γ (θ)) ∂l(θ, γ) ∂ γ (θ) ∂l(θ, γ) ∂l(θ, γ) = = + = . ∂θ ∂θ ∂θ γb(θ) ∂θ ∂γ γb(θ) ∂θ γb(θ) (θ). A further property The last equality hold, because ∂l(θ, γ)/∂γ|γb(θ) = 0 by construction of γ of the concentrated likelihood is that at θ0 the inverse of the expected value of its Hessian matrix provides the asymptotic covariance matrix of θ [5, 200]. It should be mentioned that similar separations of parameters can be obtained in least squares estimation, in which case the procedure is called separable least squares [80, 200]. 3.2.2 Special structure: (pseudo) confirmatory factor model In this section the concentrated likelihood method is applied to the model that has a very similar structure as the confirmatory factor model Σ = ΛΨΛ∗ + Θ.

(3.10)

All matrix variables are functionally dependent on the parameter vector ξ. Here Λ is an m × d matrix with d < m, possibly real, and not necessarily of full column rank, Ψ is a d × d Hermitian positive definite matrix of which the structure is otherwise unknown, and Θ is a p × p Hermitian positive definite matrix of which the structure is partly known. The structure of Σ is motivated by the linear data model y = Λη + ε, where y is a vector of order m of observed variables, η is a vector of order d of unobserved random variables (factor scores in factor analysis terminology), ε is a random vector of order m of measurement error (unique scores), and it is assumed that η and ε are independent and have a complex normal distribution — η ∼ CNd (α, Ψ), ε ∼ CNm (0, Θ). Then the covariance

42

3 Mean and Covariance Structures for Complex Random Variables

matrix of y is Σ, its mean is Λη, and since y is a sum of complex normal random variables is complex normal, y ∼ CN (Λη, Σ). The covariance structure in (3.10) is known in psychometrics as the confirmatory factor model for real normal random variables [121], in which case Λ, Ψ and Θ are taken to be real, Ψ and Θ are assumed to be symmetric positive definite, and Λ is assumed to have full column rank. The same structure is used in the signal processing literature on direction of arrival (DOA) estimation in a method known as ‘stochastic maximum likelihood’ (SML), where Λ is assumed to have full column rank, Ψ is Hermitian but otherwise unstructured and completely unknown, and Θ = σI for unknown real σ (e.g. [134]). As DOA estimation is usually applied in online signal processing settings (e.g. radar and sonar), computational efficiency of the estimators is of great importance. Using the concentrated likelihood method of the previous subsection, a well known compact and computationally very efficient algorithm has been obtained for these purposes (although in most applications it is still too slow) [134,175,180,220]. Although EEG and MEG signals are usually analyzed off-line, direct implementation of the negative log-likelihood results in an algorithm that is still far too slow to work with in practice. Therefore, in this subsection we will generalize the SML algorithm in several directions, making it more suited for the purposes in this thesis. One direction in which the algorithm is extended, is the inclusion of a mean structure. Another extension is that the requirement that Λ has full column rank is relaxed to the requirement that Λ, as a function of ξ, has a constant rank in a neighborhood of the true value ξ0 of the parameters. This is relevant for example, in the context of MEG dipole localization when a spherical head model is adopted: in that case Λ, which is then the lead field matrix, is rank deficient. The algorithm not only drastically reduces computation times, the closed form expressions for estimators derived for concentrating the likelihood provide some insight into their finite sample properties. The covariance model in (3.10) may be augmented with a corresponding structure for the mean: µ = Λα (3.11) It is recommended to add a mean structure, since y˙ contains important information about Λ, if E{η} = α = 0. (Note that in general this model may be unwarranted for non-ratio scale measurements.) To concentrate the likelihood we shall obtain closed form expressions for Ψ, α and σ of the maximization problem in (3.5) if Θ = σU — where U is a known Hermitian positive definite matrix that may depend on other parameters — and find the concentrated negative log-likelihood of (3.7). We will take Λ and U to be functions of (only) the unknown q dimensional parameter vector θ, i.e., Λ(ξ) = Λ(θ) and Θ(ξ) = σU(θ), both continuous differentiable in a neighborhood of the true value θ0 .3 Hence, the to be estimated parameter p-vector ξ contains the q-vector θ, the elements of Ψ and α, and σ. Throughout, we will use the assumptions mentioned after (3.10), and the further assumption that Ψ is completely left unspecified, so that all its unique elements are to be estimated. 
Subject to regularity conditions (see [5, 200]), the resulting estimator θ of the true value θ0 obtained from numerical optimization of the concentrated likelihood turns out to be consistent if the additional assumptions are made that Λ(θ) and U(θ) are continuously differentiable and Λ(θ) has fixed rank for all θ in a neighborhood of θ0 in the parameter search space, and that Λ(θ) = Λ(θ0 ) and U(θ) = U(θ0 ) implies θ = θ0 (i.e., Λ and U identify θ0 ). Readers not interested in the technical details may skip to the end of this section on page 48 for a summary of the results that are derived in what follows. Let Σ and µ have the structures of (3.10) and (3.11). Then from (3.8), for given θ, Ψ, and σ, or more generally, for given Λ and Σ, the ML estimate of α, if it exists, is a solution to 3

By allowing nonzero noise covariances (parameterized by θ) the current model departs from the classical confirmatory factor model which assumes conditional independence of the measurements ( [104] for intricacies of this situation).

3.2 ML in mean and covariance structure analysis for complex normal random variables

−2 [(y˙ − µ)∗ Σ−1 Λ dα] = 0.

43

(3.12)

This yields the first order conditions for α: [y˙ ∗ Σ−1 Λ] = [µ∗ Σ−1 Λ] and [y˙ ∗ Σ−1 Λ] = [µ∗ Σ−1 Λ], which can be combinedly expressed as y˙ ∗ Σ−1 Λ = µ∗ Σ−1 Λ = α∗ [Λ∗ Σ−1 Λ].  if and only if Λ∗ Σ−1 Λ is non-singular, which in turn is This equation has a unique solution α the case if and only if Λ has full column rank d since Σ−1 is nonsingular by assumption. Then ˙  Ψ, σ) = α(Λ(θ),  α(θ, Σ(θ, Ψ, σ)) = (Λ∗ Σ−1 Λ)−1 Λ∗ Σ−1 y, which is of course a well known result. In accordance with the concentrated likelihood method  Ψ, σ), yielding the concenin the previous subsection, in this case we may substitute α for α(θ, tration of the likelihood with respect to α. In what follows, in order simplify the notation, we  on θ, Ψ and σ, and hence simply write α  for α(θ,  Ψ, σ); we will suppress the dependency of α    , Σ, Ψ, and σ shall similarly suppress the dependency of µ  on other parameters. In the case that Λ∗ Σ−1 Λ is singular, a unique ML estimate of α does not exist. To see this, assume Λ∗ Σ−1 Λ is singular. We then have rank(Λ) < d and we can find a nonzero vector z ∈ Cd such that Λz = 0. Therefore, if we have any particular α1 that satisfies the first order conditions, then (α1 + z) will also satisfy these conditions because Λ(α1 + z) = Λα1 , and hence is an alternative solution. Not all hope is lost however, because a unique ML estimator of Λα still exists: Proposition 3.3. Given any particular values of Λ and Σ, the maximum likelihood estimate of Λα exists, and is given by = Λ(Λ∗ Σ−1 Λ)+ Λ∗ Σ−1 y, ˙ Λα where the notation was taken from [150, Chap. 13]. We will denote the term (Λ∗ Σ−1 Λ)+ Λ∗ Σ−1 y˙  with the understanding that it is only interpreted as the ML estimate in a juxtaposition by α, with Λ. This is only a slight abuse of notation.4 is indeed the ML estimate, note from (3.7) that given any value for Λ Proof. To see that Λα and any value for Σ such that Σ is positive definite, Λα maximizes the likelihood if it minimizes the quadratic form (y˙ − Λα)∗ Σ−1 (y˙ − Λα) which has the general solution [150, Thm. 11.35] α = (Λ∗ Σ−1 Λ)+ Λ∗ Σ−1 y˙ + (Id − (Λ∗ Σ−1 Λ)+ Λ∗ Σ−1 Λ)z where z is any complex d-vector. From Lem. A.41, p. 116, (Λ∗ V Λ)+ Λ∗ V Λ = Λ∗ V Λ(Λ∗ V Λ)+ = Λ+ Λ for any nonsingular V and thus Λ(I − (Λ∗ Σ−1 Λ)+ Λ∗ Σ−1 Λ) = Λ − ΛΛ+ Λ = 0. Hence Λα = Λ(Λ∗ Σ−1 Λ)+ Λ∗ Σ−1 y˙ + Λ(I − (Λ∗ Σ−1 Λ)+ Λ∗ Σ−1 Λ)z = Λ(Λ∗ Σ−1 Λ)+ Λ∗ Σ−1 y˙ = Λα is the unique ML estimate of Λα given Λ and Σ.    which is justified by Next, we find an expression for Ψ after the replacement of α by α, ∗ −1  , Λ Σ (y˙ − µ  ) = 0. The the concentrated likelihood method, so that by construction of µ differential of the concentrated likelihood function with respect to Ψ is ˙ µ  )(y˙ − µ  )∗ ]−Σ)Σ−1 Λ dΨ} = − tr{Λ∗ Σ−1 (S−Σ)Σ−1 Λ dΨ}. (3.13) − tr{Λ∗ Σ−1 ([S+(y− In the case that Λ has full column rank a unique estimate of Ψ can be obtained. In the case that Λ is rank deficient, again no unique solution may be obtained for Ψ; a unique estimate of ΛΨΛ∗ exists however. We have 4

 is not (in general) the ML estimator of α as it does not exist—only in The reader is warned that α  the ML estimator of α. the special case where Λ has full column rank, is α

Grasman, R.P.P.P. (2004) "Sensor array signal processing and the neuro-electromagnetic inverse problem in functional connectivity analysis of the brain", PhD-thesis, University of Amsterdam

44

3 Mean and Covariance Structures for Complex Random Variables

Proposition 3.4. For given Λ and Θ, the unique ML estimate of ΛΨΛ∗ exists and is given by ∗ = Λ(Λ∗ Σ−1 Λ)+ Λ∗ Σ−1 (S − Θ)Σ−1 Λ(Λ∗ Σ−1 Λ)+ Λ∗ . ΛΨΛ

(3.14)

In the case that Λ∗ Σ−1 Λ is nonsingular the unique ML estimate of Ψ exists and is  = (Λ∗ Σ−1 Λ)−1 Λ∗ Σ−1 (S − Θ)Σ−1 Λ(Λ∗ Σ−1 Λ)−1 . Ψ

(3.15)

Again with slight abuse of notation we write  = (Λ∗ Σ−1 Λ)+ ΛΣ−1 (S − Θ)Σ−1 Λ(Λ∗ Σ−1 Λ)+ , Ψ

(3.16)

 is only interpreted as the ML estimator in the juxtaposition with the understanding that Ψ ∗  ΛΨΛ . Proof. From (3.13), the first order conditions are Λ∗ Σ−1 (S − Σ)Σ−1 Λ = 0 ⇐⇒ Λ∗ Σ−1 (S − Θ)Σ−1 Λ = Λ∗ Σ−1 ΛΨΛ∗ Σ−1 Λ

(3.17)

This system is consistent by [150, Thm. 2.11], and has the general solution [150, Thm. 2.13] Ψ = (Λ∗ Σ−1 Λ)+ Λ∗ Σ−1 (S − Θ)Σ−1 Λ(Λ∗ Σ−1 Λ)+ + Z − (Λ∗ Σ−1 Λ)+ (Λ∗ Σ−1 Λ)Z(Λ∗ Σ−1 Λ)(Λ∗ Σ−1 Λ)+ = (Λ∗ Σ−1 Λ)+ Λ∗ Σ−1 (S − Θ)Σ−1 Λ(Λ∗ Σ−1 Λ)+ + Z − Λ+ ΛZΛ+ Λ, where Z is any complex d×d matrix, and the last equality follows from (Λ∗ Σ−1 Λ)+ (Λ∗ Σ−1 Λ) = Λ+ Λ (Lem. A.41). Pre- and post-multiplying both sides by Λ and Λ∗ respectively, suggests the estimator ∗ = Λ(Λ∗ Σ−1 Λ)+ Λ∗ Σ−1 (S − Θ)Σ−1 Λ(Λ∗ Σ−1 Λ)+ Λ∗ . ΛΨΛ In the case that Λ∗ Σ−1 Λ is nonsingular, equation (3.15) yields the unique solution to (3.17). To verify that this is indeed a minimum, by the first derivative test [150, Thm. 5.3], it is  sufficient to show that the first order differential is nonnegative in a neighborhood of Ψ—i.e., it ∗ −1 −1  ≥ 0 for all Ψ near Ψ,  with Σ = is sufficient to show that − tr{Λ Σ (S − Σ)Σ Λ(Ψ − Ψ)}  ∗+Θ ΛΨΛ∗ + Θ nonsingular. Such a neighborhood exists almost surely by the fact that ΛΨΛ is nonsingular almost surely (Lem. 3.16, p. 58), and by continuity of the determinant of a matrix [199, Thm. 5.18]. Now with help of Λ = ΛΛ+ Λ = Λ(Λ∗ Σ−1 Λ)+ Λ∗ Σ−1 Λ (Lem. A.41), we find  − tr{Λ∗ Σ−1 (S − Σ)Σ−1 Λ(Ψ − Ψ)}  = − tr{Λ∗ Σ−1 (S − Σ)Σ−1 ΛΨ} + tr{Λ∗ Σ−1 (S − Σ)Σ−1 ΛΨ} = − tr{Σ−1 Λ(Λ∗ Σ−1 Λ)+ Λ∗ Σ−1 (S − Σ)Σ−1 Λ(Λ∗ Σ−1 Λ)+ Λ∗ Σ−1 ΛΨΛ∗ } + tr{Σ−1 Λ(Λ∗ Σ−1 Λ)+ Λ∗ Σ−1 (S − Σ)Σ−1 Λ(Λ∗ Σ−1 Λ)+ Λ∗ Σ−1 (S − Θ)} = tr{Σ−1 Λ(Λ∗ Σ−1 Λ)+ Λ∗ Σ−1 (S − Σ)Σ−1 Λ(Λ∗ Σ−1 Λ)+ Λ∗ Σ−1 (S − Σ)} ≥ 0. The inequality follows from the fact that Σ−1 Λ(Λ∗ Σ−1 Λ)+ Λ∗ Σ−1 is (Hermitian) nonnegative definite, and the equality tr{ABCD} = vec{D } (C  ⊗ A)vec{B} [150, Thm. 2.3]. Therefore the unique ML estimate of ΛΨΛ∗ exists and is given by (3.14).    depends on Σ which in turn depends on Ψ. Our No separation is observed in (3.16), as Ψ  does not depend on Ψ. next result however implies that Ψ Lemma 3.5. Let Σ = ΛΨΛ∗ + Θ such that Σ and Θ are both nonsingular complex m × m matrices, and Θ is Hermitian positive definite. Then (Λ∗ Σ−1 Λ)+ Λ∗ Σ−1 = (Λ∗ Θ−1 Λ)+ Λ∗ Θ−1 .

Grasman, R.P.P.P. (2004) "Sensor array signal processing and the neuro-electromagnetic inverse problem in functional connectivity analysis of the brain", PhD-thesis, University of Amsterdam

3.2 ML in mean and covariance structure analysis for complex normal random variables

45

Proof. First note that rank(Θ−1/2 Λ) = rank(Λ) = r, so that by [199, cor. 1.9.1] there exist an m × r matrix F and an r × d matrix G with rank(F ) = rank(G) = r such that Θ−1/2 Λ = F G. Next, the following equality may be verified X = Σ−1 Λ ⇐⇒ ΣX = ΛΨΛ∗ X + ΘX = Λ ⇐⇒ Θ−1/2 Λ(ΨΛ∗ X − Id ) = −Θ1/2 X ⇐⇒ F G(ΨΛ∗ X − Id ) = −Θ1/2 X. Now, from the equality |AB +Im | = |BA+Id | for conformable A and B (see e.g. [150, Thm. 1.9]), |ΨΛ∗ X − Id | = |ΨΛ∗ Σ−1 Λ − Id | = (−1)d−m |ΛΨΛ∗ Σ−1 − Im | = (−1)d−m |ΛΨΛ∗ − Σ||Σ−1 | = (−1)d |Θ||Σ−1 | = 0 by the assumptions on Σ and Θ, so that rank(G(ΨΛ∗ X − Id )) = rank(G) = r. From this, it must be that Θ1/2 X and F have the same range space. Accordingly, an r × d matrix H exists with rank(H) = r, such that Θ1/2 X = F H. Hence, Λ∗ Σ−1 Λ = X ∗ Θ1/2 F G = H ∗ (F ∗ F )G. Since G is r × d, H ∗ (F ∗ F ) is d × r, F ∗ F is r × r, and rank(G) = rank(H ∗ (F ∗ F )) = rank(F ∗ F ) = r, Lem. A.40 (p. 112) implies that (H ∗ (F ∗ F )G)+ = G+ (F ∗ F )+ H ∗ + . Therefore (Λ∗ Σ−1 Λ)+ Λ∗ Σ−1 = (H ∗ (F ∗ F )G)+ X ∗ = (H ∗ (F ∗ F )G)+ H ∗ F ∗ Θ−1/2 = G+ (F ∗ F )+ H ∗ + H ∗ F ∗ Θ−1/2 = G+ (F ∗ F )+ F ∗ Θ−1/2 = G+ F + Θ−1/2 = (F G)+ Θ−1/2 = (Θ−1/2 Λ)+ Θ−1/2 = (Λ∗ Θ−1 Λ)+ Λ∗ Θ−1 where the fact that H ∗ + H ∗ = Ir was used, and Lem. A.40 and Lem. A.42 (p. 116) were used twice.   The observation in the proof of lemma 3.5 that Θ−1/2 Λ(ΨΛ∗ X − I) = −Θ1/2 X, and it’s implication that R(Θ1/2 X) = R(Θ−1/2 Λ), were R(X) denotes the range (or column) space of the matrix X, is a generalization of an observation made in [216]. Lemma 3.5 now allows the separation of Ψ and α from ξ:  = (Λ∗ Θ−1 Λ)+ Λ∗ Θ−1 y, ˙ α

 = (Λ∗ Θ−1 Λ)+ Λ∗ Θ−1 (S−Θ)Θ−1 Λ(Λ∗ Θ−1 Λ)+ . (3.18) and Ψ

 ∗+ It can be shown (see Lem. 3.16 in the appendix at the end of this chapter, p. 58) that |ΛΨΛ Θ| = 0 almost surely, so that the negative log-likelihood remains bounded almost surely. This assures that the concentration of the likelihood with respect to α and Ψ can still be used to estimate the remaining parameters.

Further separation can be attained if Θ(ξ) has the structural form σU(θ). First the likelihood ˇ = ΛΨΛ   + Θ and is concentrated with respect to Ψ and α—i.e. Σ and µ are replaced by Σ  = Λα.  To find the concentrated negative log-likelihood ratio, first an expression is obtained µ −1 ˇ for Σ , which exists almost surely by Lem. 3.16 (p. 58). The case that U = I will be considered, as the more general case is obtained from this by factoring U−1 into Q2 where Q is an Hermitian “square root” of U−1 (this can be done because U is Hermitian positive definite by assumption), and transforming both Σ and S by QΣQ and QSQ.  ∗ +σI)−1 = [ΛΛ+ (S−σI)Λ∗ + Λ∗ +σI]−1 = [ΠΛ SΠΛ +σΠ⊥ ]−1 . ˇ −1 = (ΛΨΛ If Θ = σI, then Σ Λ −1 + But (A + B) = (A + B) = A+ + B + if AB = BA = 0, as may be verified from the defining −1 = (ΛΛ+ SΛΛ+ )+ +(σΠ⊥ )+ . properties of the Moore-Penrose inverse. Hence, [ΠΛ SΠΛ +σΠ⊥ Λ] Λ

46

3 Mean and Covariance Structures for Complex Random Variables

Now (ΠΛ SΠΛ )+ = Λ(Λ∗ SΛ)+ Λ∗ , as may be verified using (Λ∗ SΛ)+ (Λ∗ SΛ) = Λ+ Λ and ∗ + = 1 Π⊥ . We therefore find Σ ˇ −1 = the MP-inverse property (ΛΛ+ ) = ΛΛ+ , and (σΠ⊥ Λ) σ Λ Λ(Λ∗ SΛ)+ Λ∗ + σ1 Π⊥ Λ . In the case that Θ(ξ) has the form σU(θ), for positive definite U, the corresponding relation is ˇ −1 = 1 U−1/2 Π⊥ −1/2 U−1/2 + U−1 Λ(Λ∗ U−1 SU−1 Λ)+ Λ∗ U−1 . Σ U Λ σ

(3.19)

A consequence of this is −1

ˇ Corollary 3.6. Σ

[Im − Λ(Λ∗ Σ−1 Λ)+ Λ∗ Σ−1 ] = Θ−1 [Im − Λ(Λ∗ Θ−1 Λ)+ Λ∗ Θ−1 ].

ˇ −1 (Im − Λ(Λ∗ Θ−1 Λ)+ Λ∗ Θ−1 ). Now ˇ −1 (Im − Λ(Λ∗ Σ−1 Λ)+ Λ∗ Σ−1 ) = Σ Proof. By Lem. 3.5, Σ denote Q = Θ−1/2 , and note that Θ−1 = σ1 U−1 , hence (3.19) gives ˇ −1 [Im − Λ(Λ∗ Θ−1 Λ)+ Λ∗ Θ−1 ] = Σ ∗ −1 + ∗ −1 −1 ∗ −1 + ∗ −1 ∗ −1 + ∗ −1 QΠ⊥ QΛ Q[Im − Λ(Λ Θ Λ) Λ Θ ] + Θ Λ(Λ Θ Λ) Λ Θ [Im − Λ(Λ Θ Λ) Λ Θ ],

of which the last term is equal to zero, and the first term is readily shown to be equal to ∗ −1 ∗ −1 + ∗ + ∗ −1   QΠ⊥ QΛ Q = Q[Im − QΛ[(Λ Q)(QΛ)] Λ Q]Q = Θ [I − Λ(Λ Θ Λ) Λ Θ ]. The concentrated negative log-likelihood ratio function with respect to Ψ and α is obtained as a further corollary from (3.19). The ML estimator of σ is obtained from this in Prop. 3.8. Corollary 3.7. The negative log-likelihood ratio, concentrated with respect to Ψ and α, is given by ˇ + 1 tr{U−1/2 Π⊥ −1/2 U−1/2 [S + y˙ y˙ ∗ ]} + r − log |S| − m, 1 (θ, σ)/L = log |Σ| U Λ σ

(3.20)

where r = rank(Λ). Proof. The likelihood ratio was given in (3.7), p. 40. By the concentrated likelihood method, Σ ˇ = ΛΨΛ  ∗ + σU and Λα  respectively. Substitution of (3.19) and µ may be substituted for Σ gives −1

ˇ tr{Σ

1 tr{U−1/2 Π⊥ U−1/2 S} + tr{U−1 Λ(Λ∗ U−1 SU−1 Λ)+ Λ∗ U−1 S} U−1/2 Λ σ 1 U−1/2 S} + r, = tr{U−1/2 Π⊥ U−1/2 Λ σ

S} =

since (Λ∗ U−1 SU−1 Λ)+ Λ∗ U−1 SU−1 Λ is a projection matrix of rank r.  = Λα  = Λ(Λ∗ Σ−1 Λ)+ Λ∗ Σ−1 y˙ in Lem. 3.3 and from Cor. 3.6 we find Also, from µ ˇ −1 [Im − Λ(Λ∗ Σ−1 Λ)+ Λ∗ Σ−1 ]y, ˇ −1 (y˙ − µ ) = Σ ˙ = Θ−1 [Im − Λ(Λ∗ Θ−1 Λ)+ Λ∗ Θ−1 ]y. ˙ Σ Substitution of σU for Θ, leads to −1

ˇ  )∗ Σ (y˙ − µ

) = (y˙ − µ

1 1 ∗ −1/2 ⊥ y˙ U ΠU−1/2 Λ U−1/2 y˙ = tr{U−1/2 Π⊥ U−1/2 y˙ y˙ ∗ }. U−1/2 Λ σ σ

Combining traces then yields the result.

 

ˇ = ΛΨΛ  ∗ + Θ, with Θ = σIm , the ML estimate of σ exists, and is Proposition 3.8. For Σ given by 1 ˙ y˙ ∗ ]}, σ = tr{Π⊥ (3.21) Λ [S + y m−r where r = rank(Λ).

Grasman, R.P.P.P. (2004) "Sensor array signal processing and the neuro-electromagnetic inverse problem in functional connectivity analysis of the brain", PhD-thesis, University of Amsterdam

3.2 ML in mean and covariance structure analysis for complex normal random variables

47

ˇ = d[ΛΛ+ (S−σIm )Λ∗ + Λ∗ +σIm ] = d[ΠΛ S ΠΛ +σ(Im −ΠΛ )] = d(ΠΛ S ΠΛ )+ Proof. First dΣ ⊥ ˇ d(σΠΛ ), so that ∂ Σ/∂σ = Π⊥ Λ . Setting the derivative of (3.20) equal to zero gives the first order condition for σ: ˇ −1 Π⊥ } − 1 tr{ΠΛ [S + y˙ y˙ ∗ ]} = 0. tr{Σ (3.22) Λ σ2 ˇ −1 Π⊥ } = tr{( 1 Π⊥ + Λ(Λ∗ SΛ)+ Λ∗ )Π⊥ } = 1 tr{Π⊥ } = 1 (m − r). Therefore By (3.19) tr{Σ Λ Λ Λ σ Λ σ σ (3.22) is solved for σ by σ  in (3.21). The verification that a minimum is obtained is postponed until after the concentrated likelihood is obtained.   In the more general case that Θ has the structure σU for general positive definite U, the estimate becomes 1 tr{Π⊥ σ = U−1/2 [S + y˙ y˙ ∗ ]U−1/2 }. U−1/2 Λ m−r ˙ It may come as a counter intuition that an estimator of the noise variance should depend on y. However, y˙ as an estimate of µ, carries information on σ as well, and the estimator efficiently utilizes this information. The concentrated likelihood ratio 1 can now be further concentrated by substitution of σ  ∗    for σ in (3.20). Denoting ΛΨΛ + σ U by Σ, where Ψ of equation (3.18) is evaluated at σ , this leads to  + 1 tr{U−1/2 Π −1/2 U−1/2 [S + y˙ y˙ ∗ ]} + r − log |S| − m )/L = log |Σ| 1 (θ, σ U Λ σ   − log |S|.  + 1 (m − r) σ + r − log |S| − m = log |Σ| = log |Σ| σ 

(3.23)

We therefore define  ∗+σ (θ) = L log |ΛΨΛ U| − L log |S|, where  = (Λ∗ U−1 Λ)+ Λ∗ U−1 (S − σ Ψ U)U−1 Λ(Λ∗ U−1 Λ)+ 1 tr{Π⊥ U−1/2 [S + y˙ y˙ ∗ ]U−1/2 }. σ = U−1/2 Λ m−r

(3.24) (3.25) (3.26)

To verify that indeed a minimum is obtained at σ , we show that for any σ > 0, 1 (θ, σ) − 1 (θ, σ ) ≥ 0, with equality if and only if σ = σ . We consider the case that U = I, as the more general case may be obtained in a similar way:

   ∗+σ ˇ + (m − r) σ + r − m − log |ΛΨΛ ) /L = log |Σ| I| 1 (θ, σ) − 1 (θ, σ σ  σ  ˇ −1 (ΛΨΛ  ∗+σ − 1 − log |Σ I)|. = (m − r) σ

(3.27)

∗  ∗+σ ˇ −1 = 1 Π⊥ + Λ(Λ∗ SΛ)+ Λ∗ from (3.19), and ΛΨΛ I = ΛΛ+ (S − σ I)Λ+ Λ∗ + σ I = Now, Σ σ Λ ⊥  ΠΛ from (3.18) if Ψ is evaluated in σ = σ . Hence ΠΛ SΠΛ + σ

1  ∗+σ ˇ −1 (ΛΨΛ I) = [ Π⊥ + Λ(Λ∗ SΛ)+ Λ∗ ][ΠΛ SΠΛ + σ  Π⊥ Σ Λ] σ Λ σ  σ  ⊥ = Λ(Λ∗ SΛ)+ Λ∗ ΠΛ SΠΛ + Π⊥ Λ = ΠΛ + Π Λ , σ σ m−r by because (Λ∗ SΛ)+ (Λ∗ SΛ) = Λ+ Λ. It is straight forward to verify that |ΠΛ + ρΠ⊥ Λ| = ρ ⊥ ∗ observing that ΠΛ + ρΠΛ = Z diag(1, . r. ., 1, ρ, m−r . . . , ρ) Z , where Z is an unitary matrix of which the first r columns span the range space of Λ (it also follows directly from Lem. A.43, p. 116). Therefore (3.27) becomes

48

3 Mean and Covariance Structures for Complex Random Variables

 

σ σ   1 (θ, σ) − 1 (θ, σ − log − 1 ≥ 0, ) /L = (m − r) σ σ

as for any real ρ > 0, ρ − log ρ ≥ 1, with equality if and only if ρ = 1. Therefore, σ  is indeed the ML estimate of σ. The ML estimator θ of θ0 is found by optimizing (θ) of (3.24). For this purpose it is advantageous to have derivatives. In the case that U does not depend on θ (i.e. is a fixed matrix), a simplified expression of the derivatives of  is obtained as follows. For simplicity we derive it for the case that U = I. First note that for θa , the a-th element of θ, ∂ ∂1 ∂1 ∂1 ∂ σ = + = , σ=b σ σ=b σ ∂θa ∂θa ∂σ ∂θa ∂θa σ=bσ . Here 1 is given in (3.20). Therefore, since ∂1 /∂σ|σb ≡ 0 by construction of σ ˇ ∂ ∂Π⊥ ∂1 1 Λ ˇ −1 ∂ Σ } tr{ = = tr{ Σ + [S + y˙ y˙ ∗ ]} ∂θa ∂θa σ=bσ ∂θa σ=bσ σ  ∂θa

(3.28)

ˇ = ( dΠΛ )SΠΛ + ˇ = ΛΛ+ (S − σIm )Λ∗ + Λ∗ + σIm = ΠΛ SΠΛ + σΠ⊥ , we have dθ Σ From Σ Λ ⊥ ΠΛ S( dΠΛ ) + σ dΠΛ , where dθ indicates that the differential is taken with respect to θ. Using this and (3.19), it may be verified that −1

ˇ tr{Σ =

1 ⊥ ∗ + ∗ ˇ ˇ } Σ Π dθ Σ} = tr{( + Λ(Λ SΛ) Λ ) d θ Λ σ=b σ σ b σ 

1 ∗ + ∗ tr{Π⊥ ( dΠ⊥  tr{Λ(Λ∗ SΛ)+ Λ∗ ( dΠ⊥ Λσ Λ )} + 2 tr{Λ(Λ SΛ) Λ ΠΛ S( dΠΛ )} + σ Λ )}, σ 

We show that the first and last terms are zero: Deduce from dΠΛ = d(Π2Λ ) = ( dΠΛ )ΠΛ + ⊥ ⊥ ΠΛ ( dΠΛ ) that ΠΛ ( dΠΛ ) = ( dΠΛ )Π⊥ Λ and hence ΠΛ ( dΠΛ )ΠΛ = 0 = ΠΛ ( dΠΛ )ΠΛ . ⊥ ⊥ ⊥ 2 1 ⊥ ( dΠΛ )} = − tr{(ΠΛ ) ( dΠΛ )} = Now dΠΛ = d(Im − ΠΛ ) = − dΠΛ , hence tr{ σb ΠΛ σ ⊥ ⊥ ∗ tr{ΠΛ ( dΠΛ )ΠΛ } = 0. Also Λ = ΠΛ Λ, ΠΛ = ΠΛ , and therefore tr{Λ(Λ∗ SΛ)+ Λ∗ dΠ⊥ Λ} = − tr{ΠΛ Λ(Λ∗ SΛ)+ Λ∗ ΠΛ dΠΛ } = 0. + +∗ ∗ ⊥ It was shown in [80] that dΠΛ = Π⊥ Λ ( dΛ)Λ + Λ ( dΛ )ΠΛ , hence + ∗+ ∗ ⊥ tr{Λ(Λ∗ SΛ)+ Λ∗ S dΠΛ } = tr{Λ(Λ∗ SΛ)+ Λ∗ S[Π⊥ Λ ( dΛ)Λ + Λ ( dΛ )ΠΛ ]}, + ∗ ⊥ + ∗ which reduces to 2 tr{Λ(Λ∗ SΛ)+ Λ∗ S Π⊥ Λ ( dΛ)Λ } = 2 tr{(Λ SΛ) Λ SΠΛ ( dΛ)}, where + ∗ ∗ + + the equality Λ Λ(Λ SΛ) = (Λ SΛ) was used (Lem. A.41). Therefore

∂Λ } σ=bσ = 2 tr{(Λ∗ SΛ)+ Λ∗ SΠ⊥ }. Λ ∂θi ∂θi

ˇ −1 ∂ Σ

ˇ tr{Σ

˙ y˙ ∗ ]} = −2 tr{Λ+ [S + y˙ y˙ ∗ ]Π⊥ Finally, substitution into (3.28) and tr{ dΠ⊥ Λ [S + y Λ dΛ} yield 1 ∂Λ ∂Λ 1 ∂ = 2 tr{(Λ∗ SΛ)+ Λ∗ SΠ⊥ } − 2 tr{Λ+ [S + y˙ y˙ ∗ ]Π⊥ }. Λ Λ L ∂θi ∂θi σ  ∂θi

(3.29)

Summary: A compact algorithm for CFM estimation To summarize the results in this section: We derived closed form expressions for the parameters Ψ, α and σ, for estimating the covariance structure Σ = ΛΨΛ∗ + Θ = ΛΨΛ∗ + σU and the associated mean structure µ = Λα.

Grasman, R.P.P.P. (2004) "Sensor array signal processing and the neuro-electromagnetic inverse problem in functional connectivity analysis of the brain", PhD-thesis, University of Amsterdam

3.3 GLS in covariance structure analysis of complex random variables

49

We derived the useful identity (Λ∗ Σ−1 Λ)+ Λ∗ Σ−1 = (Λ∗ Θ−1 Λ)+ Λ∗ Θ−1 for any positive definite Θ and nonsingular Σ, from which the following compact concentrated likelihood algorithm for estimation could be obtained: ∗  θ = arg min log |Λ(θ)Ψ(θ)Λ (θ) + σ (θ)U(θ)| − log |S| θ ∗ −1

 U)U−1 Λ(Λ∗ U−1 Λ)+ Ψ(θ) = (Λ U Λ)+ Λ∗ U−1 (S − σ 1 tr{Π⊥ U−1/2 [S + y˙ y˙ ∗ ]U−1/2 } σ (θ) = U−1/2 Λ m−r  ˙ α(θ) = (Λ∗ U−1 Λ)+ Λ∗ U−1 y.

(3.30) (3.31) (3.32) (3.33)

Here r = rank(Λ). For optimization purposes it was indicated in (3.29) how the derivative of (3.30) could be simplified in the case that U is independent of θ.  as estimators of σ, α and Ψ given θ, we may take  and Ψ To evaluate the bias of σ , α expectations in (3.31)-(3.33)—for simplicity we limit ourselves to the case U = Im : Let 1 ∗ ∗ βL (θ) = tr{Π⊥ (3.34) Λ Λ0 [Ψ + αα ]Λ0 }, m−r where Λ = Λ(θ), Λ0 = Λ(θ0 ) and θ0 is the true value of θ, then E{ σ | θ} = σ + βL (θ)  | θ} = (Λ+ Λ0 )Ψ(Λ+ Λ0 )∗ − βL (θ)(Λ∗ Λ)+ E{Ψ +

 | θ} = Λ Λ0 α. E{α

(3.35) (3.36) (3.37)

p p  → Now βL (θ) 0 if θ → θ0 as L → ∞. Therefore σ  is consistent provided that θ is consistent, which is the case if Λ has constant rank within an open ball centered at θ0 , and θ0 is identifiable. +  and α  are consistent under the same conditions if and only if Λ+ Furthermore Ψ 0 Λ0 ΨΛ0 Λ0 = Ψ + and Λ0 Λ0 α = α. For Λ less than full rank, this is not generally the case, but in some instances  can be interpreted as the the resulting estimator has a clear physical interpretation (e.g. α transversal component of an orientation vector). It is clear that with finite sample sizes the estimators are biased, the severity of which will depend on the curvature properties of Λ, the sizes of both Ψ and α, and the ratio between the number of elements in θ and m.

3.3 Generalized least squares in covariance structure analysis of complex random variables Least squares estimation involves the simple idea to minimize the squared distance between the elements of Σ(ξ) and S by varying ξ. In generalized least squares this idea is generalized by associating with each squared distance a constant that weighs the importance of that particular distance in determining the estimate. These weighing coefficients can be chosen to yield a generalized least squares estimator with the minimum possible standard errors. The estimators that converge to the latter estimator were called best generalized least squares estimators by Browne [29]. The asymptotic theory of these estimators for complex random variables is outlined in App. B. Generalized least squares (GLS) estimation of covariance structures was introduced for real random variables in the seminal paper of Browne [29]: If V is the usual sample covariance of the real valued m-vector observations x1 , . . . , xL , and Υ(ξ) is a matrix function such that E{V} = Υ0 = Υ(ξ0 ), then, subject to regularity conditions, it is shown in [29] that the GLS estimator ξ˘ of ξ0 can be obtained from minimizing the objective function 1 tr{[W(V − Υ(ξ))]2 }, 2

(3.38)

50

3 Mean and Covariance Structures for Complex Random Variables

where W is a positive definite weighting matrix. The asymptotic statistical properties of these estimators were determined in [29]. One of the results in [29] is that under the fairly general condition that V is asymptotically multivariate normal, a best generalized least squares (BGLS) estimator ξ˘ of ξ0 exists that has the same limiting properties as the maximum likelihood estimator. For complex random vectors y1 , . . . , yL , in principle the methods developed in [29] could be applied to L 1 V= [yl ]R [yl ]R  , L l=1

which requires to specify Υ(ξ) in the appropriate form. However, in this section we will find the complex equivalent of (3.38), which is more natural for complex random vectors because it uses the complex sample covariance L 1 yl yl ∗ , S= L l=1

and as an additional advantage, reduces the amount of computational work. Here, and throughout this section, it is assumed that E{yl } = 0. The motivation for generalized least squares estimation in the analysis of covariance structures of complex random variables is twofold: First of all, it applies to a larger class of random variables than complex normal maximum likelihood estimation, namely those complex valued random vectors y with a sample covariance matrix S, that has the property Cov(Sij , Spq ) = (Σ0 )ip (Σ0 )jq /L, where Σ0 = E{S}. Note the difference with the corresponding real requirement, which is Cov(Vij , Vpq ) = [(Υ0 )ip (Υ0 )jq + (Υ0 )iq (Υ0 )jp ]/L [29, Eq. 3]. Second, it has been observed that in some cases the computational complexity is much lower than that of maximum likelihood estimators [29]. 3.3.1 Best GLS estimators in covariance structure analysis of complex variables. Let vech{Σ} be the operator that stacks the upper triangular elements of (Σ) on top of the above diagonal elements of (Σ) into a vector of length m2 , and let Dm be the vech{} duplication matrix of Def. A.32 (p. 114). By definition Dm has the property Dm vech{Σ} = [vec{Σ}]R . Let Km be the m2 × m2 commutation matrix of Def. A.22 (p. 112) with the essential property that Km vec{Z} = vec{Z  } for any m × m matrix Z, and let Lm2 be the conjugation matrix of Def. A.27 (p. 113). Several properties of Km and Lm2 were derived in App. A, and include Lm2 [vec{Z}]R = [vec{Z}]R

and

{Km }R Lm2 {Σ ⊗ Σ}R = {Σ ⊗ Σ}R {Km }R Lm2

(3.39)

Furthermore, Dm , Lm2 and {Km }R are related through 1 + 2 = (I2m2 + {Km }R Lm2 ) = Nm = Nm . Dm Dm 2

(3.40)

See App. A for additional properties. Let Σ0 be correctly parameterized by the p-vector ξ and matrix function Σ(ξ), such that Σ0 = Σ(ξ0 ) for a unique ξ0 . The best generalized least squares estimate of ξ0 which has minimum variance among all weighted least squares estimates is obtained from ξ˘ = arg min vech{S − Σ(ξ)} A vech{S − Σ(ξ)}, ξ

with A−1 = Var(vech{S}) as follows from theorem B.4 in App. B (p. 120).

(3.41)

Grasman, R.P.P.P. (2004) "Sensor array signal processing and the neuro-electromagnetic inverse problem in functional connectivity analysis of the brain", PhD-thesis, University of Amsterdam

3.3 GLS in covariance structure analysis of complex random variables

51

+ [vec{S − Σ(ξ)}] . In lemma To find A, note that by definition of Dm , vech{S − Σ(ξ)} = Dm R A.57 of App. A it is determined that the variance of [vec{S}]R is

Var [vec{S}]R =

1 (I 2 + {Km }R Lm2 ){Σ0 ⊗ Σ0 }R . 2L 2m

(Note the similarity of this expression with Var(vec{V}) = real case, e.g. [18].) Hence 

+ + (Var [vec{S}]R )Dm = Var(vech{S}) = Dm

1 L (Im2 +Km )(Υ⊗Υ)

of corresponding

1 +1 + Dm (I2m2 + Lm2 {Km }R ){Σ0 ⊗ Σ0 }R Dm , L 2

which by (3.40) may be written Var(vech{S}) =

1 + + D {Σ0 ⊗ Σ0 }R Dm . L m

(3.42)



+ { 1 Σ ⊗Σ } D + )−1 = LD  {Σ ⊗Σ }−1 D , where the last equality follows Therefore A = (Dm 0 R m 0 0 R m m L 0 from theorem A.34 by exchanging the roles of Dp and Dp+ . The quadratic form in equation (3.41) can now be written

L vech{S − Σ(ξ)} D m {Σ0 ⊗ Σ0 }R−1 Dm vech{S − Σ(ξ)}

= L[vec{S − Σ(ξ)}]R  {Σ0 ⊗ Σ0 }R−1 [vec{S − Σ(ξ)}]R

(3.43)

−1

= L [vec{S − Σ(ξ)}∗ (Σ0 ⊗ Σ−1 0 )vec{S − Σ(ξ)}]

2 = L tr{[Σ−1 0 (S − Σ(ξ))] }.

Here the relations (3.1)–(3.2) (viz. [W ]R  [Z]R = (W ∗ Z), etc.), vec{D } (C  ⊗ A)vec{B} = tr{ABCD} and (Z ⊗ W )−1 = (Z −1 ⊗ W −1 ) [199] were used. Therefore the best (i.e., minimum variance) GLS estimator is 2 ξ˘ = arg min L tr{[Σ−1 0 (S − Σ(ξ))] }.

(3.44)

ξ

Similar arguments were used in [29] to obtain the result for the real case in (3.38). To obtain actual estimates, Σ0 in Prop. 3.11 needs to be replaced with a known matrix. If WL is a matrix that converges in probability to a matrix proportional to Σ−1 0 as L → ∞, then the following result is obtained. p

Lemma 3.9 (analogous to [29, Proposition 1]). Let WL be such that WL → Σ−1 0 . Then, subject to the identifiability condition mentioned in Thm. B.1, the estimate obtained from (3.44) is consistent. With this choice of weight matrix, the estimate is called the best generalized least squares (BGLS) estimate. p

Proof. The proof is similar to that of Lem. B.1 (p. 120), making use of the fact that WL → Σ−1 0 .   An obvious example of such a WL is WL = S−1 . Lemma 3.10 (analogous to [29, Proposition√2]). Subject to the regularity conditions of Thm. B.2 (on p. 121), the limiting distribution of L (ξ˘− ξ0 ) is zero mean multivariate normal. √  √ ∗ Proof. First L vech{S} = (1/ L) L l=1 vech{yl yl }, is a sum of i.i.d. real random variables, so that application of a central limit theorem yields √ d + + L vech{S − Σ0 } → N (0, Dm {Σ0 ⊗ Σ0 }R Dm ). (The covariance matrix was obtained in (3.42).)

52

3 Mean and Covariance Structures for Complex Random Variables p

 {W ⊗ W } D , so that A /L → A/L where A = LD  {Σ ⊗ Σ }−1 D Let AL = LDm m 0 0 R m L L R m L as previously. Now use equation (B.6) to obtain the relation √ √ L (J˘h AL J˘h )−1 J˘h AL Jαh (ξ˘ − ξ0 ) = (J˘h AL J0h )−1 J˘h AL ( L vech{S − Σ0 })

where J˘h = ∂vech{Σ}/∂ξ  |ξ˘ and J˘αh = ∂vech{Σ}/∂ξ  |ξα for some ξ α between ξ˘ and ξ0 . Since p p p  ˘ ξα → ξ0 also J˘h , Jαh → J0 and (J˘h AL J˘h ), (J˘h AL Jαh ) → J0h AJ0h . Application of Slutsky’s ξ, theorem therefore proves the result.   √ Proposition 3.11. The BGLS estimator is obtained from (3.44) and L ξ˘ has asymptotic variance −1   ∂vec{Σ}∗ −1 −1 ∂vec{Σ}

, (3.45) (Σ0 ⊗ Σ0 ) ∂ξ ∂ξ  ξ0 ξ0 provided ξ˘ is consistent, vec{Σ} has continuous first order derivatives and its Jacobian matrix has full column rank in a convex region around ξ0 . 



Proof. Let J0h = ∂vech{Σ}/∂ξ  |ξ0 , and J0 = ∂vec{Σ}/∂ξ  |ξ0 . Denote Π(B) = (J0h BJ0h )−1 J0h B. From Lem. B.3 (p. 121)  {Σ0 ⊗ Σ0 }R−1 Dm ) Var ξ˘ = Π(Dm  {Σ0 ⊗ Σ0 }R−1 Dm ) (Var vech{S}) Π(Dm

(3.46)

Now J0h  Dm  {Σ0 ⊗ Σ0 }R−1 Dm J0h = [J0 ]R {Σ0 ⊗ Σ0 }R−1 [J0 ]R = [J0 ∗ (Σ0 ⊗ Σ0 )−1 J0 ] by (3.1), furthermore J0h  Dm  {Σ0 ⊗ Σ0 }R−1 Dm = [J0 ]R  {Σ0 ⊗ Σ0 }R−1 Dm , and pre- and post-multiplying Var(vech{S}) by the latter yields + + {Σ ⊗ Σ}R Dm Dm  {Σ0 ⊗ Σ0 }R−1 [J0 ]R = [J0 ]R {Σ0 ⊗ Σ0 }R−1 [J0 ]R [J0 ]R  {Σ0 ⊗ Σ0 }R−1 Dm Dm

from (3.39)–(3.40), and Nm [J0 ]R = Nm Dm J0h = Dm J0h = [J0 ]R . Combining these results according to (3.46) completes the proof. (Similar arguments were used as given in [29] to obtain this result for real symmetric matrices.)   With help of the Cram´er-Rao lower bound on the variance of ξ˘ it can be shown that the BGLS √ 

√ A + {Σ ⊗ Σ} D +  , estimator is asymptotically efficient: Since L vech{S} ∼ N L vech{Σ}, Dm m R the asymptotic likelihood function may be written 2 /2

Λ = (2πL)−m

L + +  − 21  |Dm {Σ ⊗ Σ}R Dm | exp(− vech{S − Σ}∗ Dm {Σ ⊗ Σ}R−1 Dm vech{S − Σ}). 2

To obtain the Cram´er-Rao lower bound, the first and second order differentials of log Λ are computed.   Using [Z]R [W ]R = (Z W ), and |Dp+ {Z ⊗ Z}R Dp+ | = 2−p(p−1) ( Mod |Z|)2p (Lem. A.48, p. 117), 1 L d log Λ = − d log(2−m(m−1) |Σ|2m ) − d[vec{S − Σ}]R {Σ ⊗ Σ}R−1 [vec{S − Σ}]R 2 2 = −m tr{Σ−1 dΣ} + L tr{[Σ−1 (S − Σ)]2 Σ−1 dΣ} + L tr{Σ−1 (S − Σ)Σ−1 dΣ} (3.47) The second order differentials are derived from the individual terms of the last left hand side of this equation. The differential of the first term in (3.47) is −m d tr{Σ−1 dΣ} = m tr{Σ−1 ( dΣ)Σ−1 dΣ} − m tr{Σ−1 d2 Σ}. After lengthy algebra the differential of the second term of (3.47) can be written

(3.48)

Grasman, R.P.P.P. (2004) "Sensor array signal processing and the neuro-electromagnetic inverse problem in functional connectivity analysis of the brain", PhD-thesis, University of Amsterdam

3.3 GLS in covariance structure analysis of complex random variables

53

 L tr{2( dΣ−1 )(S − Σ)Σ−1 (S − Σ)Σ−1 dΣ + 2Σ−1 (− dΣ)Σ−1 (S − Σ)Σ−1 dΣ

 (3.49) + Σ−1 (S − Σ)( dΣ−1 )(S − Σ)Σ−1 dΣ + Σ−1 (S − Σ)Σ−1 (S − Σ)Σ−1 d2 Σ} .

The differential of the third term of the last LHS in (3.47) is found to be L tr{2( dΣ−1 )(S − Σ)Σ−1 dΣ + ( dΣ−1 ) dΣ + Σ−1 (S − Σ)Σ−1 d2 Σ},

(3.50)

The following result is required. Let Q1 and Q2 be non-random m × m matrices. From tr{ABCD} = vec{D } (C  ⊗ A)vec{B}, Σ = Σ, S = S, tr{Q1 ⊗ Q2 } = tr{Q1 } tr{Q2 } if Q1 and Q2 are square matrices, and the fact that Var vec{S} = (Σ0 ⊗ Σ0 )/L by Lem. A.56 (p. 118) E tr{Q1 (S − Σ)Q2 (S − Σ)}|ξ0 = tr{(Q2 ⊗ Q1 )E{vec{(S − Σ0 )}vec{(S − Σ0 )}∗ }} 1 1 1 = tr{(Q2 ⊗ Q1 )(Σ ⊗ Σ)} = tr{Q2 Σ0 ⊗ Q1 Σ0 } = tr{Q1 Σ0 } tr{Q2 Σ0 } L L L Taking expectations in expressions (3.49) and (3.50) evaluated in Σ = Σ0 , and applying the above result, it can be found after extensive algebra that −1 −E d2 log Λ|ξ0 = (m + L) tr{Σ−1 0 ( dΣ)|ξ0 Σ0 ( dΣ)|ξ0 } −1 + tr{Σ−1 0 ( dΣ)|ξ0 } tr{Σ ( dΣ)|ξ0 }.

(3.51)

Hence, the expected value of the negative of the Hessian matrix of log Λ is    2 ∂vec{Σ}∗ ∂ log Λ −1 −1 ∂vec{Σ} = (m + L) −E (Σ0 ⊗ Σ0 )   ∂ξ∂ξ ξ0 ∂ξ ∂ξ ξ0 ξ0 ∂vec{Σ}∗ −1 −1 ∗ ∂vec{Σ} + vec{Σ0 }vec{Σ0 } .  ∂ξ ∂ξ ξ0

ξ0

Comparing this with the asymptotic theory in (3.9) at the beginning of this chapter (p. 40) and with the asymptotic variance of ξ˘ in Prop. 3.11, it can be shown that the difference between the resulting estimator covariances goes to zero faster than 1/L. Therefore the BGLS estimators are asymptotically efficient [29]. 3.3.2 Special structure: (pseudo) confirmatory factor model structure In section 3.2.2 we considered maximum likelihood estimation of the confirmatory factor model. Here we will consider (B)GLS estimation of the confirmatory factor model and use the separable least squares method to obtain closed form GLS estimators of the same subset of the parameters in ξ. The reader who is not interested in the technical details may wish to skip to the summary on page 56 at the end of this section. Let Σ have the structure ΛΨΛ∗ + Θ, (3.52) where Λ is m × d, d < m, does not necessarily have full column rank, Ψ and Θ are assumed to be Hermitian positive definite. Λ is correctly parameterized by the q-vector θ. No knowledge on Ψ is available and hence must be completely estimated. Some knowledge on Θ is available and has the structure σU, where U is a known matrix function that depends on the parameter vector θ, such that Λ and U identify θ0 (i.e., Λ(θ) = Λ(θ0 ) and U(θ) = U(θ0 ) implies θ = θ0 ). All unknown parameters—i.e., θ, the (non-duplicate) real and imaginary components in Ψ, and σ—are collected in the p-vector ξ. Hence Σ is correctly parameterized by ξ. Let S be a sample covariance estimate of a complex random variable, with E{S} = Σ. The estimator ξ˘ = (θ˘ , γ˘  ) is obtained from the GLS criterion (3.44). Let σ = vech{Σ} and ψ = vec{Ψ}, then (3.52) is equivalently expressed by

54

3 Mean and Covariance Structures for Complex Random Variables + σ = Dm [vec{Σ}]R + + = Dm [(Λ(θ) ⊗ Λ(θ))vec{Ψ}]R + Dm [vec{Θ}]R  [ψ]R + = H(θ)β, = Dm ({(Λ(θ) ⊗ Λ(θ))}R , [vec{U}]R ) σ

where Dm is the vech{} duplication matrix of Def. A.32. Note that H(θ) and β are real.5 Writing s = vech{S}, the quadratic minimization problem in (3.44) or equivalently (3.43) is now written min(s − σ) Dm  {WL ⊗ WL }R Dm (s − σ) θ,β

where Σ0 was replaced with a consistent estimate WL . We consider the separable least squares sub-problem of finding an estimate of β, i.e. ˘ = arg min(s − Hβ) Dm  {WL ⊗ WL } Dm (s − Hβ). β R β

(3.53)

In [150, sec. 11.30] it is shown that the quadratic form (3.53) attains the minimum with respect to β if Hβ = H(H  Dm  {WL ⊗ WL }R Dm H)+ H  Dm  {WL ⊗ WL }R Dm s

(3.54)

˘ Closed The term (H  Dm  {WL ⊗ WL }R Dm H)+ H  Dm  {WL ⊗ WL }R Dm s will be denoted by β. 2 form expressions for estimates of ψ and σ are found from this next. Before doing so, the following result is used in Lem. 3.13 to determine the MP-generalized inverse in (3.54): Lemma 3.12. [192] Let the Hermitian positive semidefinite matrix Z, be partitioned as  E F Z= , GM and suppose that rank(Z) = rank(E) + rank(M ) and |M | = 0. Then  + E + E + F X + GE + −E + F X + + Z = −X + GE + X+

(3.55)

where X = M − GE + F . Proof. The proof involves the verification of the defining properties of the MP-inverse (see A.6, p. 110) and is given in [192].   ˇ = W1/2 UW1/2 and Λ ˇ = W1/2 Λ. The MP-inverse (H  (WL ⊗ WL )H)+ Lemma 3.13. Let U L L L in (3.54) is given by  Q R   + (H Dm {Wm ⊗ Wm }R DM H) = , R X −1 where ˇ  Π⊥ X = [vec{U}] R ˇ

ˇ {Λ⊗Λ} R

ˇ , [vec{U}] R

ˇ ⊗ Λ} ˇ + [vec{U}] ˇ X −1 R = −{Λ R R ˇ ⊗ Λ} ˇ + (I + [vec{U}] ˇ ⊗ Λ} ˇ +  Nd . ˇ X −1 [vec{U}] ˇ  ){Λ Q = {Λ R R R R 5

The fact that some of the elements of β are zero is ignored; the constructed estimator will however automatically satisfy this constraint, as will be seen shortly.

Grasman, R.P.P.P. (2004) "Sensor array signal processing and the neuro-electromagnetic inverse problem in functional connectivity analysis of the brain", PhD-thesis, University of Amsterdam

3.3 GLS in covariance structure analysis of complex random variables

55

+ ({(Λ ⊗ Λ)} , [vec{U}] ), hence with N = D D + of (3.40), Proof. H = Dm m m m R R

H  Dm  {WL ⊗ WL }R Dm H =   {Λ ⊗ Λ∗ }R Nm {WL ⊗ WL }R Nm {Λ ⊗ Λ}R {Λ ⊗ Λ∗ }R Nm {WL ⊗ WL }R [vec{U}]R [vec{U}]R {WL ⊗ WL }R Nm {Λ ⊗ Λ}R [vec{U}]R Nm {WL ⊗ WL }R Nm [vec{U}]R which may be reduced to  ∗ {Λ WL Λ ⊗ Λ∗ WL Λ}R Nd [vec{Λ∗ WL UWL Λ}]R [vec{Λ∗ WL UWL Λ}]R

tr{WL UWL U}   ˇ ∗Λ ˇ ⊗Λ ˇ ∗ Λ} ˇ Nd {Λ ˇ ⊗ Λ ˇ ∗ } [vec{U}] ˇ {Λ R R R , = ˇ ⊗ Λ} ˇ ˇ  {Λ ˇ U} ˇ [vec{U}]

tr{ U R R where the equalities Nm {Z ⊗ Z}R = {Z ⊗ Z}R Nd for any m × d matrix Z, Nm [vec{U}]R = [vec{U}]R and [vec{U}]R {WL ⊗ WL }R [vec{U}]R = tr{WL UWL U} were used. Since this ˇ U} ˇ is a positive number, and its rank equals the sum ranks matrix is positive semidefinite, tr{U of its block-diagonal elements, the conditions of Lem. 3.12 are satisfied, and hence its MP-inverse may be obtained from (3.55). Similar arguments and corollary A.36 show that ˇ U} ˇ − [vec{U}] ˇ  {Λ ˇ ⊗ Λ ˇ ∗ } [vec{U}] ˇ ˇ ⊗ Λ} ˇ ({Λ ˇ ∗Λ ˇ ⊗Λ ˇ ∗ Λ} ˇ Nd )+ {Λ X = tr{U R R R R R ˇ  (I − {Λ ˇ , ˇ ⊗ Λ} ˇ {Λ ˇ ⊗ Λ} ˇ + )[vec{U}] = [vec{U}] R R R R where corollary A.36 was used. This is the first equation. Next, with similar arguments ˇ ⊗ Λ ˇ ∗ } [vec{U}] ˇ X −1 ˇ ∗Λ ˇ ⊗Λ ˇ ∗ Λ} ˇ Nd )+ {Λ R = −({Λ R R R ˇ ∗Λ ˇ ⊗Λ ˇ ∗ Λ} ˇ + {Λ ˇ ⊗ Λ ˇ ∗ } Nm [vec{U}] ˇ X −1 = −{Λ R R R ˇ X −1 . ˇ ⊗ Λ} ˇ + [vec{U}] = −{Λ R R ˇ R .  ˇ ∗Λ ˇ ⊗Λ ˇ ∗ Λ} ˇ Nd )+ + RXR = {Λ ˇ ∗Λ ˇ ⊗Λ ˇ ∗ Λ} ˇ + Nd − {Λ ˇ ⊗ Λ} ˇ + [vec{U}] Finally, Q = ({Λ  R R R R ˘  ,σ ˘  in (3.54) is given by ([ψ] ˘ and ˘ ), where ψ˘ = vec{Ψ}, Proposition 3.14. β R ˇ −σ ˇ Λ ˇ + ˘ =Λ ˇ + (S ˘ U) Ψ

and

ˇ − Π ˇ UΠ ˇ tr{U ˇU ˇ − Π ˇ UΠ ˇ ˇ )S}/ ˇ ˇ U}. ˇ σ ˘ = tr{(U Λ Λ Λ Λ

(3.56)

˘ = (H   Dm  {WL ⊗ WL } Dm H)+ H  Dm  {WL ⊗ WL } [vec{S}] which, Proof. From (3.54), β R R R with the partitioned results above, may be written    ˇ  ˇ ⊗ Λ} ˇ  + R[vec{U}] ˘ Q{ Λ [ψ] R R R ˇ , = [vec{S}] R ˇ  ˇ ⊗ Λ} ˇ  + X −1 [vec{U}] σ ˘ R {Λ R R because

   ˇ ˇ Λ ⊗ Λ} { ˇ R H  Dm  {WL ⊗ WL }R [vec{S}]R = ˇ  [vec{S}]R . [vec{U}] R

Now ˇ  = −X −1 [vec{U}] ˇ  {Λ ˇ  ˇ ⊗ Λ} ˇ  + X −1 [vec{U}] ˇ ⊗ Λ} ˇ +  {Λ ˇ ⊗ Λ} ˇ  + X −1 [vec{U}] R {Λ R R R R R R −1  ˇ = X [vec{U}] (I − Π ). R

ˇ Λ} ˇ {Λ⊗ R

Now Π{Λ⊗ = {ΠΛ ˇ ⊗ ΠΛ ˇ }R , so that multiplication of the latter displayed expression by ˇ Λ} ˇ R ˇ yields [vec{S}] R

Grasman, R.P.P.P. (2004) "Sensor array signal processing and the neuro-electromagnetic inverse problem in functional connectivity analysis of the brain", PhD-thesis, University of Amsterdam

56

3 Mean and Covariance Structures for Complex Random Variables −1 ˇ ˇ }. ˇ  (I − {Π ˇ ⊗ Π ˇ } )[vec{S}] ˇ ˇS ˇ − UΠ ˇ ˇ SΠ σ ˘ = X −1 [vec{U}]

tr{U R R = X Λ Λ R Λ Λ

(3.57)

Furthermore, −1 ˇ ˇ ˇ + ˇ ˇ  (I − {Π ˇ ⊗ Π ˇ } )), ˇ ⊗ Λ} ˇ + R[vec{U}] [vec{U}] Q{Λ R R = {Λ ⊗ Λ}R Nm (I − [vec{U}]R X R Λ Λ R

so that ˘ = {Λ ˇ ⊗ Λ} ˇ + Nm (I − [vec{U}] ˇ X −1 [vec{U}] ˇ  (I − {Π ˇ ⊗ Π ˇ } ))[vec{S}] ˇ [ψ] R R R R Λ Λ R R ˇ ⊗ Λ} ˇ + Nm ([vec{S}] ˇ ⊗ Λ} ˇ + [vec{S ˇ ˇ ˇ −σ ˇ = {Λ ˘ ) = {Λ ˘ U}] R − [vec{U}]R σ R R R ˇ −σ ˇ . ˇ ⊗ Λ) ˇ + vec{S = [(Λ ˘ U}] R ˇ ⊗ Λ) ˇ + vec{S ˇ −σ ˇ or, ˘ U}, where (3.57) was recognized in the first right hand side. Hence ψ˘ = (Λ + + ˘ =Λ ˇ (S ˇ −σ ˇ Λ ˇ )∗ . since ψ = vec{Ψ}, Ψ ˘ U)(   Summary: A separable least squares algorithm To summarize the results of this section we obtain a separable least squares algorithm. Substi˘ and σ tution of Ψ ˘ of (3.56) into (3.44) leads to the separable least squares algorithm ∗ ˘ (θ) − σ ˘ (θ)U(θ)])2 } θ˘ = arg min tr{(WL [S − Λ(θ)Ψ(θ)Λ

(3.58)

θ

˘ ˘ U)WL Λ(Λ∗ WL Λ)+ Ψ(θ) = (Λ∗ WL Λ)+ Λ∗ WL (S − σ ˇ ˇ )S}/ ˇ ˇ U}, ˇ ˇ tr{U ˇU ˇ − Π ˇ UΠ ˇ − Π ˇ UΠ σ ˘ (θ) = tr{(U ˇ = where Λ

Λ Λ 1/2 WL Λ,

ˇ = U

Λ 1/2 1/2 WL UWL ,

(3.59) (3.60)

Λ

ˇ= S

1/2 1/2 WL SWL .

˘ To evaluate the bias of the estimators given θ, let β(θ) be defined by ˇ 0U ˇ 0 Π ˇ UΠ ˇˇ ˇΛ ˇ0 − Λ ˇ ˇΛ ˇ ˇ ˇ U}, ˇ β˘L (θ) = tr{(Λ ˇ UΠΛ Λ Λ 0 )Ψ}/ tr{UU − ΠΛ

(3.61)

ˇ 0 = W1/2 Λ(θ0 ). If we take expectations in (3.59)-(3.60), we find that where Λ L E{˘ σ | θ} = σ + β˘L (θ) ˇ+ ˇ

ˇ+ˇ

(3.62) ∗

ˇ+

ˇ+ ∗

˘ | θ} = Λ Λ0 Ψ(Λ Λ0 ) + β˘L (θ)Λ U( ˇ Λ ) . E{Ψ

(3.63)

p p p ˇ ˇ ˇ ,σ Therefore, since β˘L (θ) → 0 if θ˘ → θ0 because in that case ΠΛ ˘ is ˇ → ΠΛ ˇ 0 and Λ0 = Λ0 PΛ 0 + + ˇ Λ ˘ will be consistent if θ˘ is, but only if Λ ˇ Λ ˇ 0 ΨΛ ˇ 0 = Ψ. consistent if θ˘ is. Furthermore, Ψ 0

0

3.4 Cross-spectrum structures in the analysis of time series The multivariate complex normal distribution of section 3.2 arises naturally in the context of multivariate time stationary stochastic signals. Let { y(t)}∞ y(t)} = 0, t∈Z be an ordered sequence of real normal random m-vectors with E{  (t + u)) = Υ(u) only depends on the time lag u. Then { ∀t ∈ Z, such that Cov( y(t), y y(t)} is called a stationary Gaussian stochastic signal. (For more general stochastic sequences, the requirement is that all higher order cumulants depend on time lags and not on time itself. The ∞ results discussed in this section apply to this case as well.) If the further condition that u=−∞ |Υ(u)| < ∞ is satisfied, the complex function Σ(λ) =

∞ 1  Υ(u) exp(−iλu) 2π u=−∞

λ ∈ [0, π]

3.4 Cross-spectrum structures in the analysis of time series

57

can be defined, and is called the (cross-)spectral density of the time series { y(t)}t∈Z , or simply the cross-spectrum. The methods of the previous sections are particularly well suited for the analysis of cross-spectral structures. Let { y(t)}n−1 t=0 be a finite set of observed values from a stationary stochastic signal. The finite  (t) is given by Discrete Fourier Transform (DFT) of y y(k) =

n−1 

 (t) exp(−i2πkt/n) y

k = 0, . . . , K < n/2.

t=0

For fixed time intervals between the observations, if we let n → ∞, the Fourier coefficients {y(k)}K k=1 are asymptotically independent distributed as y(k) ∼ CNm (0, 2πnΣ(2πk/n)), and y(0) ∼ N (0, 2πnΣ(0)) [26, Thm. 4.4.1]. Their outer product, 1 y(k)y∗ (k), 2πn

k = 1, . . . , K,

called the periodogram, is an asymptotically (for n → ∞) unbiased but inconsistent estimate of the cross-spectrum Σ(2πk/n), k = 1, . . . , K. Throughout this thesis we will use the shorthand  Σk = Σ(2πk/n). Consistent estimates can be obtained by smoothing the periodogram, i.e., averaging the periodogram at nearby frequencies. Alternatively, a consistent estimate6 can be obtained by averaging periodograms obtained from different stretches of data [26, Thm.’s 7.3.1 , l = 1, . . . , L be the observations of a set of L realizations of a stochastic and 7.3.4]. Let { yl (t)}n−1 t=0 ˙ signal. If we set y(k) = L l=1 yl (k), k = 1, . . . , K, then [26, p. 282] 1  ∗ ˙ ˙ [yl (k) − y(k)][y Sk = l (k) − y(k)] 2πn L

k = 1, . . . , K

l=1

are independent complex Wishart variables with E{Sk } = Σk . Furthermore, from [26, p. 282] we can extract that as n → ∞, Cov( (Sk )a1 b1 , (Sk )a2 b2 ) = [(Σk )a1 a2 (Σ)b1 b2 + (Σk )a1 b2 (Σk )b1 a2 ]

(3.64)

Cov( (Sk )a1 b1 , (Sk )a2 b2 ) = −[(Σk )a1 a2 (Σ)b1 b2 + (Σk )a1 b2 (Σk )b1 a2 ]

(3.65)

Cov((Sk )a1 b1 , (Sk )a2 b2 ) = [(Σk )a1 a2 (Σ)b1 b2 + (Σk )a1 b2 (Σk )b1 a2 ].

(3.66)

In connection with this we have Lemma 3.15. For k = 1, . . . , K, Var[vec{Sk }]R =

1 2L (I

+ {Km }R Lm2 ){Σk ⊗ Σk }R .

Proof. Equations (3.64) and (3.66) can be directly equated with equations (A.21) and (A.23) of corollary A.58 by identifying σ with (Σ) and using the equality (Z) = (Z). From equation (A.20) Cov( vec{S}, vec{S}) = −(I + Kp )(Σ ⊗ Σ). This may be written −((Σ ⊗ Σ) + Kp (Σ ⊗ Σ)) = −((Σ ⊗ Σ) + (Σ ⊗ Σ)Kp ) = −((Σ ⊗ Σ) − (Σ ⊗ Σ)Kp ) = −(Σ ⊗ Σ)(I − Kp ), where the equality (Z) = −(Z) was used. Therefore Cov( vec{S},  vec{S}) = −(Σ ⊗ Σ)(I − Kp ) which may be written element wise −(σig σ jh − σih σ gj ) by lemma A.26. Identifying σ with (Σ) in equation (3.65) yields the result.   As a result, the latter lemma makes the BGLS estimator directly applicable to the crossspectral estimates Sk . Analysis of cross-spectral structures was suggested in [26], and was developed as a frequency domain approach to dynamic factor analysis in [162] and [163]. These method provide the underlying theory of the methods introduced in following two chapters. 6

Here the ‘broad’ interpretation of consistency of an estimator tending in probability to its expected value is meant.


Appendix

Lemma 3.16. Let Σ̌ = ΛΨ̂Λ∗ + σU, with Ψ̂ as defined in (3.18). If σ ≠ 0 then |Σ̌| ≠ 0 w.p. 1.

Proof. Let U⁻¹ = Q², then

\[ |\check{\Sigma}| = |\Lambda\hat{\Psi}\Lambda^* + \sigma U| = \sigma^{m-d}\,|\hat{\Psi}\Lambda^* U^{-1}\Lambda + \sigma I_d|\,|U| \]
\[ = \sigma^{m-d}\,|(\Lambda^* U^{-1}\Lambda)^+ \Lambda^* U^{-1}(S - \sigma U)U^{-1}\Lambda(\Lambda^* U^{-1}\Lambda)^+ \Lambda^* U^{-1}\Lambda + \sigma I_d|\,|U| \]
\[ = \sigma^{m-d}\,|(Q\Lambda)^+(QSQ)(Q\Lambda) + \sigma\Pi^{\perp}_{\Lambda^* Q}|\,|U|. \]

First of all, if σ ≠ 0, then rank(σΠ⊥_{Λ∗Q}) = d − r. Now QΛ = FG, where F is m × r and G is r × d, and rank(F) = rank(G) = r. Therefore rank((QΛ)⁺(QSQ)(QΛ)) = rank(G⁺F⁺QSQFG) = rank(G⁺(F∗F)⁻¹F∗QSQFG). It is clear that F∗QSQF is nonsingular w.p. 1, as S is complex Wishart, and hence rank(F∗QSQFG) = rank(G) = r. Furthermore, rank(G⁺(F∗F)⁻¹) = rank(G⁺) = r. Theorem 2.12 in [199] shows that for any matrices A and B of dimensions m × n and n × p it holds that rank(AB) ≥ rank(A) + rank(B) − n. From this one obtains rank(G⁺(F∗F)⁻¹F∗QSQFG) ≥ rank(G⁺(F∗F)⁻¹) + rank(F∗QSQFG) − r = r + r − r = r. Now ((QΛ)⁺(QSQ)(QΛ))Π⊥_{Λ∗Q} = 0 = Π⊥_{Λ∗Q}((QΛ)⁺(QSQ)(QΛ)). Therefore rank((QΛ)⁺(QSQ)(QΛ) + σΠ⊥_{Λ∗Q}) = rank((QΛ)⁺(QSQ)(QΛ)) + rank(σΠ⊥_{Λ∗Q}) = r + (d − r) = d, and hence |(QΛ)⁺(QSQ)(QΛ) + σΠ⊥_{Λ∗Q}| ≠ 0. ∎


4 Frequency domain simultaneous source and source coherence estimation with an application to MEG

4.1 Introduction

As¹ indicated in previous chapters, an important goal of cognitive neuroimaging is to establish how different cortical areas cooperate and interact with each other to implement cognitive functions. As an example, in the last decade compelling evidence has become available on the role of the visual and parietal cortical areas and their interactions during the selective processing of visual stimuli (e.g. [72, 185]). Other phenomena consistently found are increased synchronization of electric activity while viewing stimuli with emotional content [2, 107], and while performing cognitive tasks [132]. Such synchronizations of cortical rhythms are hypothesized to solve the binding problem in the perception of objects [170, 208].

Current techniques for localizing active cortical areas are functional magnetic resonance imaging (fMRI), positron emission tomography (PET), high resolution cortical imaging [172], and equivalent current dipole (ECD) modelling of the magneto- (MEG) and electro-encephalogram (EEG) (e.g. [101]). Although advances have been made to establish cortical interactions with PET and fMRI (e.g. [73, 81, 158]), only an MEG/EEG based analysis has enough timing precision to be able to discern mutual influences between activities of different cortical areas directly.

Commonly applied methods to extract information on interactions between different brain regions from EEG/MEG rely on the analysis of raw sensor signals. Standard methods include coherence analysis [9, 126, 163, 226] and event related (de-)synchronization (ERS/ERD) [183]. Coherence is assumed to be indicative of interactions between spatially separated areas, whereas ERS/ERD is interpreted as a measure of small scale interactions within brain areas (see [183]). High coherence between sensors is often attributed to coherent activity of sources directly beneath these sensors. This interpretation is not generally valid. Problems with this interpretation of MEG/EEG coherence lie in the fact that all sensors are sensitive to sources everywhere in the head. Moreover, for EEG there are additional confounding influences: volume conduction effects [188, 214] and the reference effect. Therefore it is very difficult to localize the areas responsible for the observed coherence [129]. ERS and ERD suffer from similar problems [41].

As an illustration, Fig. 4.1a depicts a traditional sensor coherence analysis obtained from simulated data. The coherence pattern suggests that sources in the temporal and parietal cortical regions in the right hemisphere were correlated, and that possibly left and right hemispheric temporal cortical regions were correlated. In contrast, Fig. 4.1b shows the dipoles that were used to generate the coherence pattern in Fig. 4.1a: a dipole in the parietal lobe of each hemisphere. In general it will thus be as difficult to localize coherent cortical areas by mere inspection of coherences between sensors as it is to localize cortical areas by mere inspection of iso-field/potential contour plots.

ECD modelling allows sources to be localized from the MEG/EEG. Currently only source locations, orientations and trial mean amplitudes are estimated from trial averaged data. This, however, disregards the information that is present in the amplitude variation across trials (but see [97]).

¹ This chapter has been published as [87].

[Fig. 4.1 appears here: panel (a) "coherences (SNR 1:1)" and panel (b) "dipole model", both plotted on left–right versus inion–nasion axes.]

Fig. 4.1. (a) Coherence analysis of simulated data that were obtained as described in section 4.3. Coherences between sensors ranging between 0.1 and 0.52 are indicated by a line joining them. Thicker and blacker lines indicate higher coherence. (b) The actual dipole locations, orientations, amplitude coherence (indicated by the gray two-way arrow) and relative power (indicated by the length of the dipole moments; no units attached). The coherences were averaged across the included frequencies in Ω#. See the text for a definition of Ω#.

Suppose for instance that there are two sources with partly stochastic amplitudes. Then there will be variation of the source amplitudes across trials, and possibly covariation if the sources are interacting. Information about this covariation of sources is thus contained in single trial MEG/EEG signals. In the method presented here, this information is exploited to determine covariation between sources. This overcomes the problems associated with the aforementioned techniques by directly estimating source locations, source orientations, and source covariances. The estimation procedure can be carried out in the frequency domain ([144, 189, 191]), which, under the assumption of stationarity, can greatly reduce the computational burden, as will be argued in the next section.

4.2 Method

4.2.1 Model specification

Let ỹ_{m,l}(t) denote the signal measured at the m-th sensor at time t in the l-th trial, sampled with fixed sampling frequency, and define the vector ỹl(t) = [ỹ_{1,l}(t), . . . , ỹ_{m,l}(t)]′ as the array of measurements made by the m sensors at times t = 1, . . . , n in trials l = 1, . . . , L. Here ′ denotes transposition. (Throughout this chapter a tilde (˜) will indicate time domain, as opposed to frequency domain, quantities.) The matrix ỹl = [ỹl(1), . . . , ỹl(n)] will then denote the m × n data matrix measured in trial l. We use the spatiotemporal dipole analysis model in which we assume a total of d fixed dipoles generating the data in ỹl:

\[ \tilde{y}_l = \Lambda(\theta)\tilde{\Xi}_l + \tilde{E}_l. \tag{4.1} \]

Here Λ is the m × d gain matrix with unit activity of d dipoles, depending on the location and orientation parameters θ, which are fixed over time samples and trials; Ξ̃l is the d × n matrix containing the source amplitude time series in trial l; and Ẽl is the m × n matrix of noise signals in trial l. The parameter θ consists of locations θd^loc and orientations θd^or for d = 1, . . . , d. If we let vec{ỹl} denote the operator that stacks the columns of ỹl in a column vector, then with the

Kronecker product ⊗ and the equality vec{ABC} = (C′ ⊗ A)vec{B} [150, p. 30], the model can be written as

\[ \tilde{y}_l = [I_n \otimes \Lambda(\theta)]\tilde{\eta}_l + \tilde{\varepsilon}_l, \tag{4.2} \]

where ỹl = vec{ỹl} = [ỹl(1)′, . . . , ỹl(n)′]′ is the m·n vector containing the measurements in trial l, η̃l = vec{Ξ̃l} is the d·n vector that contains the source amplitudes, and ε̃l = vec{Ẽl} is the m·n vector with noise signals. We assume that the noise signals ε̃l have zero mean. Note that we use Greek symbols to denote unknown, c.q., unobserved quantities.

In the present article we will focus only on source coherences, and we will disregard the mean amplitude. Since we are interested in covariation between sources, and not in trial source amplitude time functions, we may consider only the covariance matrix implied by (4.2): let Υ̃ denote the covariance matrix of ỹl, Ψ̃ the covariance matrix of η̃l, and Θ̃ the covariance matrix of ε̃l. Then from (4.2) and the additional assumption that the source signals η̃l and noise signals ε̃l are uncorrelated, we have

\[ \tilde{\Upsilon} = \langle(\tilde{y}_l - \langle\tilde{y}_l\rangle)(\tilde{y}_l - \langle\tilde{y}_l\rangle)'\rangle = [I_n \otimes \Lambda(\theta)]\,\tilde{\Psi}\,[I_n \otimes \Lambda(\theta)]' + \tilde{\Theta}, \tag{4.3} \]

where ⟨·⟩ indicates the ensemble average. The matrix Υ̃ is prohibitively large to estimate by conventional means, containing m·n(m·n + 1)/2 unique elements, much more than the number of trials L in most situations. The situation can be significantly simplified, however, if we Fourier transform the data, making the further assumption that the variation of the source amplitudes around their ensemble averages ⟨η̃l⟩ and the noise signals are stationary and satisfy certain mixing conditions [26, 191]: define

\[ y_l(k) = \frac{1}{\sqrt{2\pi n}} \sum_{t=0}^{n-1} \tilde{y}_l(t)\, \exp(-i 2\pi t k / n) \]

to be the discrete Fourier transform coefficient at frequencies 2πk/n, k = 1, . . . , K < n/2 [191], and let yl = [yl(1)′, . . . , yl(K)′]′. Define ηl(k) and εl(k), and ηl and εl, similarly. One advantage of working in the frequency domain (cf. [191]) is that different frequencies have asymptotically uncorrelated Fourier coefficients: whereas ⟨ε̃l(t)ε̃l(s)′⟩ does not necessarily equal zero for t ≠ s, ⟨εl(j)εl(k)∗⟩ → 0 for j ≠ k as n → ∞ [26], where ∗ denotes complex conjugation and transposition. Similarly, ⟨[yl(k) − ⟨yl(k)⟩][yl(j) − ⟨yl(j)⟩]∗⟩ → 0 and ⟨[ηl(k) − ⟨ηl(k)⟩][ηl(j) − ⟨ηl(j)⟩]∗⟩ → 0 for k ≠ j as n → ∞. This implies that the covariance matrices Σ, Ψ and Θ of yl, ηl and εl, respectively, are approximately block diagonal for large n. Denoting the model in (4.2) in the frequency domain by yl = [I_K ⊗ Λ(θ)]ηl + εl (see also [144, 191]), we find its covariance matrix to be

\[ \Sigma = [I_K \otimes \Lambda(\theta)]\,\Psi\,[I_K \otimes \Lambda(\theta)]' + \Theta \approx \begin{pmatrix} \Sigma_1 & 0 & \cdots & 0 \\ 0 & \Sigma_2 & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & \Sigma_K \end{pmatrix}, \tag{4.4} \]

where Σk = ΛΨkΛ′ + Θk, k = 1, . . . , K. The d × d matrices Ψk and the m × m matrices Θk are the block diagonal elements of Ψ and Θ, respectively. The elements of Σk are the power- and cross-spectra of the MEG/EEG signals [191]. The elements of Ψk are the power- and cross-spectra of the source amplitude signals, and therefore provide a measure of the strength of interactions between the sources; the Θk similarly specify the power- and cross-spectra of the noise signals. Compared to conventional coherence analysis, an estimate of Ψk has the advantage of describing covariation of cortical areas in terms of amplitude cross-spectra of sources.
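As a concrete illustration of the block structure in (4.4), the following Python sketch (not part of the original text; the random gain matrix is a stand-in for a real forward model Λ(θ)) assembles Σk = ΛΨkΛ′ + Θk for each frequency.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d, K = 61, 2, 10          # sensors, dipoles, frequencies

# Stand-in gain matrix; in practice Lambda is computed from the
# dipole locations/orientations theta via the forward (head) model.
Lam = rng.standard_normal((m, d))

Sigma = np.empty((K, m, m), dtype=complex)
for k in range(K):
    # Hermitian positive definite source amplitude cross-spectrum Psi_k.
    A = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    Psi_k = A @ A.conj().T
    Theta_k = 0.5 * np.eye(m)            # noise cross-spectrum (sigma_k^2 I_m)
    Sigma[k] = Lam @ Psi_k @ Lam.T + Theta_k

# Each Sigma[k] is one m x m block of the block diagonal matrix in (4.4).
print(np.allclose(Sigma[0], Sigma[0].conj().T))  # True: Hermitian
```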


4.2.2 Parameter estimation

Let S denote the matrix

\[ S = \begin{pmatrix} S_1 & 0 & \cdots & 0 \\ 0 & S_2 & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & S_K \end{pmatrix}, \tag{4.5} \]

where

\[ S_k = \frac{1}{L-1}\sum_{l=1}^{L} [y_l(k) - \dot{y}(k)][y_l(k) - \dot{y}(k)]^* \]

and ẏ(k) = (1/L) Σ_l yl(k) is the mean across trials.² Sk is the observed cross-spectrum, and is an approximately unbiased estimate of the cross-spectrum Σk of (4.4) as n → ∞, for k = 1, . . . , K [26, ch. 7]. Collecting the unknown parameters of Ψ and Θ, together with the unknown parameters in θ, in the p-dimensional vector ξ, we obtain from (4.4)

\[ \Sigma(\xi) = [I_K \otimes \Lambda(\theta)]\,\Psi(\xi)\,[I_K \otimes \Lambda(\theta)]' + \Theta(\xi). \tag{4.6} \]

² For η̃l and ε̃l stationary, ẏ(k) should be close to zero for all k = 1, . . . , K [26] and is immaterial.

The true value of ξ will be denoted by ξ₀, i.e. Σ(ξ₀) ≡ Σ of (4.4). To estimate the model parameters ξ, a discrepancy measure between Σ(ξ) and S can be minimized. Standard discrepancy measures have the least squares form [122, 242]

\[ \mathcal{F}(\xi) = \tfrac{1}{2}\,\big\|W[S - \Sigma(\xi)]W^*\big\|_F^2, \tag{4.7} \]

where ‖·‖F denotes the Frobenius norm. W can be any weight matrix such that W∗W is positive definite, but we will only consider cases in which it has a similar block diagonal structure as S. Particular choices of W are I_{K·m} and S^{−1/2}, where I_{Km} is the K·m identity matrix and S^{−1/2} denotes the inverse of the Cholesky factor of S [186], resulting in unweighted (ULS) and generalized (GLS) least squares parameter estimates, respectively [122]. The loss function with these particular choices of W will be denoted F_ULS(ξ) and F_GLS(ξ), respectively, throughout this article. In [29, 30] it was shown for real random variables that GLS estimators asymptotically have the lowest variance compared to all other choices of W. These results were generalized to complex random variables in Chap. 3. Moreover, they are efficient if S follows a Wishart distribution law (in the sense that their error variance attains the Cramér–Rao lower bound for any unbiased estimator [207]). In fact, due to a central limit theorem for the Fourier coefficients, Sk is asymptotically distributed as a complex Wishart variable for all k = 1, . . . , K, even when the signals themselves are not Gaussian [26]. This is another advantage of working in the frequency domain (see also [191]). The approximate complex Wishart distribution of Sk also allows the parameters to be estimated using the (approximate) maximum likelihood (ML) method: the ML estimation function can be defined as (cf. [29, 206])

\[ \mathcal{F}_{ML}(\xi) = -p + \log|\Sigma(\xi)| + \mathrm{tr}\{S\,\Sigma(\xi)^{-1}\} - \log|S|. \tag{4.8} \]

Here tr{·} gives the trace of a matrix, and |·| the determinant. The constants p and log|S| are of course not essential for the optimization of (4.8). These maximum likelihood estimators are also asymptotically efficient [Sect. 3.3.1, p. 50, this thesis] [29]. Taking advantage of the block diagonal structure of S, Σ(ξ) and W, both (4.7) and (4.8) can be rewritten as sums over frequencies. As an additional advantage, merely a subset of frequencies can be used, effectively filtering out frequencies that are not of interest, and increasing the signal to noise ratio (SNR) if the source amplitudes have zero power at the omitted frequencies. Denoting this subset of K# included frequencies by Ω#, we have

\[ \mathcal{F}(\xi) = \frac{1}{2} \sum_{k \in \Omega^{\#}} \big\|W_k[S_k - \Sigma_k(\xi)]W_k^*\big\|_F^2 \]

and

\[ \mathcal{F}_{ML}(\xi) = -p + \sum_{k \in \Omega^{\#}} \left[\log|\Sigma_k(\xi)| + \mathrm{tr}\{S_k\,\Sigma_k(\xi)^{-1}\} - \log|S_k|\right], \]

where Wk are the m × m block diagonal elements of W. Minimization of either F_ML(ξ) or F(ξ) with some choice of W yields the desired estimates of θ, Ψk and Θk, for k ∈ Ω#.

Let each dipole be specified by r parameters. The number of unique parameters of ξ is then r·d dipole parameters, plus d²K# for the non-zero real and imaginary parts of the Ψk, plus m²K# parameters for the non-zero real and imaginary parts of the Θk, k ∈ Ω#. There are therefore many more parameters than the m²K# available non-duplicate values in the Sk, k ∈ Ω#. Hence, without further assumptions, i.e. without a reduction of the number of free parameters in the model, the parameter vector ξ is not identifiable. Therefore, assumptions on Ψ or Θ (or both) have to be made. The assumption that we will adopt here is that Θk = σk²I_m, that is, the noise signals are uncorrelated between sensors. If this is not a valid assumption, and the sensor noise covariance U is (approximately) known up to a multiplicative constant (e.g. from measurements prior to the experiment), then Θk = σk²U can be taken.

The parameters ξ may be required to satisfy q constraints

\[ r(\xi) = 0, \tag{4.9} \]

where r(ξ) is a real continuously differentiable q-vector valued function [111]. This occurs for example in symmetric dipole pair models (e.g. [54, 130]). It also occurs where sources are required to be tangential, for example in an MEG analysis with a spherical head model. More specifically, suppose both the location θd^loc and orientation θd^or of a dipole are specified in Euclidean coordinates, where θd^loc has its origin at the center of the sphere and θd^or has its origin at θd^loc. Then the constraint that the dipole orientation vector θd^or is orthogonal to the dipole location vector θd^loc has to be imposed: (θd^loc)′θd^or ≡ 0 [111, 114]. In such cases F(ξ) or F_ML(ξ) has to be minimized subject to (4.9).

The GLS and ML estimates asymptotically have a multivariate normal distribution [29]. An approximation to the covariance matrix of ξ̂ is B̂ = H(ξ̂)⁻¹/L, where

\[ H(\hat{\xi}) = \left.\frac{\partial^2 \mathcal{F}_{ML}(\xi)}{\partial\xi\,\partial\xi'}\right|_{\xi=\hat{\xi}} \quad\text{or}\quad H(\breve{\xi}) = \left.\frac{\partial^2 \mathcal{F}_{GLS}(\xi)}{\partial\xi\,\partial\xi'}\right|_{\xi=\breve{\xi}}. \tag{4.10} \]

The quality of this approximation, however, highly depends on the number L of trials and on the nonlinearity of the loss function, and has to be assessed by simulation. If there are any constraints on the parameters ξ, it was derived in [4, 207] that the approximate covariance matrix of ξ̂ may be obtained from the upper left p × p sub-matrix B̂ of

\[ \frac{1}{L}\begin{pmatrix} H(\hat{\xi}) + R(\hat{\xi})'R(\hat{\xi}) & R(\hat{\xi})' \\ R(\hat{\xi}) & 0 \end{pmatrix}^{-1}, \tag{4.11} \]

where R(ξ) = ∂r(ξ)/∂ξ′ is the q × p Jacobian matrix of the constraints r(ξ) [32, for a full discussion of the properties of the matrix in (4.11)]. The diagonal elements of the matrix B̂ = (b_{ik}) in (4.11) can be used to construct confidence intervals for the parameter estimates, by using the fact that the estimates ξ̂_i, i = 1, . . . , p, have an asymptotic normal distribution with standard deviation √b_{ii} [29, 32]. The (1 − α) × 100% confidence interval is then given by (ξ̂_i − Φ⁻¹(1 − α/2)√b_{ii}, ξ̂_i + Φ⁻¹(1 − α/2)√b_{ii}), where Φ⁻¹ denotes the inverse of the normal distribution function. A conventional choice of α is α = 0.05. Analogous results hold for ξ̆.
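The following Python sketch (an added illustration; `loss_grad` is a hypothetical user-supplied gradient of F_GLS or F_ML) implements this recipe: a finite-difference Hessian as described in section 4.3.2, the bordered matrix of (4.11), and the resulting confidence intervals.

```python
import numpy as np
from scipy.stats import norm

def approx_covariance(loss_grad, xi_hat, R, L, eps=1e-8):
    """Approximate covariance of constrained estimates, following (4.10)-(4.11).

    loss_grad : function returning the gradient of the loss at xi (p-vector)
    xi_hat    : estimated parameter vector, shape (p,)
    R         : q x p Jacobian of the constraints r(xi) at xi_hat
    L         : number of trials
    """
    p = xi_hat.size
    # Forward-difference approximation of the Hessian from the gradient.
    g0 = loss_grad(xi_hat)
    H = np.empty((p, p))
    for j in range(p):
        delta = eps * max(1.0, abs(xi_hat[j]))
        e = np.zeros(p); e[j] = delta
        H[:, j] = (loss_grad(xi_hat + e) - g0) / delta
    H = (H + H.T) / 2                       # symmetrize
    # Bordered matrix of (4.11); the inverse's upper-left p x p block is B.
    q = R.shape[0]
    M = np.block([[H + R.T @ R, R.T], [R, np.zeros((q, q))]])
    return np.linalg.inv(M)[:p, :p] / L

def conf_int(xi_hat, B, alpha=0.05):
    """(1 - alpha) confidence intervals from the diagonal of B."""
    z = norm.ppf(1 - alpha / 2)
    se = np.sqrt(np.diag(B))
    return xi_hat - z * se, xi_hat + z * se
```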


Misspecifications of the model in (4.6) may result in considerably biased estimators. Apart from errors in the head model, incorrect assumptions on the noise structure and an incorrect number of sources d are major sources of misspecification. To evaluate the fitted model we will use the chi-square goodness of fit test approach. In [29] it was shown that if the model is correct and L is large, then L·F_GLS(ξ̆) is approximately a χ² variable with m²K# − p + q degrees of freedom. L·F_ML(ξ̂) has the same approximate χ² distribution [166]. Because the ML statistic is known to perform poorly with small numbers of trials, the Bartlett corrected statistic γ(L − 1)F_ML(ξ̂) is used, where γ = 1 − [2m + 1 − 2/(m + 1)]/[6(L − 1)] is a factor that reduces the deviation from the asymptotic chi-square behavior of this statistic [166]. Both statistics can be used to test the null hypothesis that the model is correct, and can therefore be helpful in determining how many dipole sources are necessary to give an accurate description of the data. Other model selection procedures are discussed in [237].
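As an added illustration (not from the original text), the Bartlett-corrected test can be computed as follows; the numbers in the example call are hypothetical.

```python
import numpy as np
from scipy.stats import chi2

def bartlett_gof(F_ML, L, m, K_sharp, p, q, alpha=0.05):
    """Bartlett-corrected chi-square goodness-of-fit test.

    F_ML    : minimized ML discrepancy, summed over the K_sharp frequencies
    L       : number of trials; m : number of sensors
    p, q    : number of free parameters and number of constraints
    """
    gamma = 1 - (2 * m + 1 - 2 / (m + 1)) / (6 * (L - 1))
    stat = gamma * (L - 1) * F_ML
    dof = m**2 * K_sharp - p + q
    p_value = chi2.sf(stat, dof)
    return stat, dof, p_value, p_value < alpha   # last entry: reject model?

# Example with hypothetical numbers matching the first simulation set:
# 61 sensors, 10 frequencies, 62 parameters, 4 constraints, 400 trials.
print(bartlett_gof(F_ML=95.0, L=400, m=61, K_sharp=10, p=62, q=4))
```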

4.3 Simulations.

A first set of simulations was carried out to assess and compare the performance of both the GLS and the ML method when used with a realistic number of trials L and signal length n. The performance of the parameter estimators was assessed by testing whether they are unbiased, have good approximate standard errors √b_{ii}, and are approximately normally distributed. The combined effect of violations of these asymptotic properties is condensed in an evaluation of the coverage rates of the confidence intervals. The effect of different levels of noise was assessed by comparing two noise conditions. A second, separate set of simulations was conducted to assess to what extent the chi-square goodness of fit test was helpful in determining the number of sources. The purpose was to see if the chi-square statistic would reject an (incorrect) model with two dipoles in favor of a (correct) model with three dipoles, when in fact three dipoles were used to generate the data.

4.3.1 Data generation.

MEG data were generated using a unit spherical head model. In the first set of simulations two current dipoles were situated at the points (0.0, 0.5, 0.75) and (0.0, −0.5, 0.75) (origin at the center of the sphere), both having the orientation (1, 0, 0), where the origin of the orientation frame of reference is situated at the dipole's location. Coherence of the source amplitudes was induced by letting them follow a first order vector auto-regressive stochastic process (VAR(1) process) [26]: η₁(t) = 0.7η₁(t − 1) + a₁(t) and η₂(t) = 0.5η₁(t − 1) + 0.7η₂(t − 1) + a₂(t), where a₁(t), a₂(t) ∼ N(0, 1), independent across time and trials. For the simulations in which the performance of the chi-square goodness of fit test was determined, an additional source was located at (−0.5, 0.0, 0.75) with orientation (0, 1, 0) and amplitude η₃(t) = 0.3η₁(t − 1) + 0.7η₃(t − 1) + a₃(t), again with a₃(t) ∼ N(0, 1). MEG was calculated for first order gradiometers at 61 different locations evenly spread across the head (one gradiometer right above the vertex and 4 rings around it, spaced by 22.5°, containing 6, 12, 18 and 24 gradiometers respectively, all with the inner pickup coil at eccentricity 1.09). Auto-correlated noise (AR(1) process) was added to each gradiometer measurement. These noise processes were uncorrelated between MEG sensors. The signal to noise ratio (SNR), defined as the ratio between the largest mean standard deviation of the (noiseless) sensor signals and the mean standard deviation of the noise signals, was chosen to be³ 1:5 and 1:1. In total 400 trials were generated this way, each trial consisting of 128 samples, sampled at 128 Hz. The data were Fourier transformed using an FFT algorithm, and the sample cross-spectral matrices Sk, k ∈ Ω#, were calculated as indicated after (4.5). A sketch of this generation scheme is given below.

³ The SNR was chosen to be consistent with ERPs and background EEG bandwidths found in the literature: ERPs typically tend to have minimum to maximum amplitude bandwidths of about 5 to 15 µV, whereas background EEG tends to vary in bandwidths of 50 to 100 µV.
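The following Python sketch (added here for illustration) mimics this generation scheme under stated assumptions: the gain matrix is a random stand-in for the spherical-head gradiometer forward model, the AR(1) noise coefficient is set to 0.7 (the original text does not specify it), and the SNR scaling is simplified.

```python
import numpy as np

rng = np.random.default_rng(1)
L, n, d = 400, 128, 2         # trials, samples per trial, sources

# VAR(1) source amplitudes: eta(t) = A eta(t-1) + a(t), a(t) ~ N(0, I).
A = np.array([[0.7, 0.0],
              [0.5, 0.7]])
eta = np.zeros((L, d, n))
for t in range(1, n):
    eta[:, :, t] = eta[:, :, t - 1] @ A.T + rng.standard_normal((L, d))

# Project to sensors with a stand-in gain matrix (a real study would use
# the spherical-head gradiometer forward model for the 61 sensors).
m = 61
Lam = rng.standard_normal((m, d))
signal = np.einsum('md,ldt->lmt', Lam, eta)

# AR(1) sensor noise, uncorrelated between sensors; coefficient assumed 0.7.
noise = np.zeros((L, m, n))
for t in range(1, n):
    noise[:, :, t] = 0.7 * noise[:, :, t - 1] + rng.standard_normal((L, m))
noise *= signal.std() / noise.std()   # crude scaling to roughly SNR 1:1

trials = signal + noise               # (L, m, n), ready for FFT / cross-spectra
```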

4.3.2 Parameter estimation.

In the estimation problem, the dipole orientation parameters θ^or were specified in Cartesian coordinates. This requires the nonlinear constraint that the orientation vectors have unit norm, i.e. ‖θd^or‖ ≡ 1 for d = 1, . . . , d, where d indexes the dipole. In addition we have the aforementioned orthogonality constraints on the dipole location and orientation parameters, (θd^loc)′θd^or ≡ 0, d = 1, . . . , d. In the first set of simulations we therefore have 2 × 6 = 12 dipole parameters subject to four constraints. In the second set of simulations we have 3 × 6 = 18 dipole parameters subject to six constraints for the three dipole fit. For the first set of simulations, a subset of 10 frequencies, which contained most of the power in the generated signals, was selected for estimation: Ω# = {4, . . . , 13} (in Hz). For two dipoles, Ψk has four unknown parameters, and Θk = σk²I_m has one, for each k ∈ Ω#. This totals 10 × (4 + 1) = 50 parameters. We therefore have p = 12 + 50 = 62 parameters in the first set of simulations. In the second set of simulations the fit was limited to a subset of 5 frequencies, Ω# = {4, . . . , 8}. We therefore have p = 12 + 5 × (4 + 1) = 37 parameters for two sources, and p = 18 + 5 × (9 + 1) = 68 parameters for three sources in this simulation set.

Estimates ξ̆ and ξ̂ were obtained by minimizing F_GLS(ξ) or F_ML(ξ) under the indicated constraints, using a quasi-Newton algorithm [78]. Approximate standard errors were obtained from (4.11), where H(ξ̆) was approximated by a finite difference of the gradient of the loss function: let e_j be the vector with all entries equal to zero except the j-th entry, which is equal to 1, and let δ_j = ε max(1, |ξ̆_j|), where ε > 0 is some small number (e.g. ε = 10⁻⁸). An approximate Hessian matrix of F_GLS(ξ) was obtained from H(ξ̆) ≈ ((∇_i F_GLS(ξ̆ + δ_j e_j) − ∇_i F_GLS(ξ̆))/δ_j), where ∇_i = ∂/∂ξ_i [32]. A similar approximation was obtained for F_ML(ξ). It was found in [62] and [111] that such approximations are adequate. A minimal sketch of such a constrained fit is given below.
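Although the thesis used a quasi-Newton algorithm [78], a modern equivalent of such a constrained fit can be sketched with SciPy's SLSQP solver; this is an illustration with a stand-in loss, not the original implementation.

```python
import numpy as np
from scipy.optimize import minimize

# Stand-in loss: a real implementation would return, for parameter vector xi,
# sum over k in Omega# of log|Sigma_k(xi)| + tr(S_k Sigma_k(xi)^{-1}) - log|S_k|.
def F_ML(xi):
    return np.sum(xi ** 2)

# Equality constraints for one dipole: unit-norm orientation and
# orthogonality of location and orientation (tangential source).
def r(xi):
    loc, ori = xi[:3], xi[3:6]
    return np.array([ori @ ori - 1.0, loc @ ori])

x0 = np.concatenate([[0.0, 0.5, 0.75], [1.0, 0.0, 0.0], np.ones(5)])
res = minimize(F_ML, x0, method='SLSQP',
               constraints={'type': 'eq', 'fun': r})
print(res.success, np.round(res.x[3:6] @ res.x[3:6], 6))  # orientation stays unit norm
```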

In the second set of simulations first a two dipole model was fitted, and subsequently a three dipole model. The chi-square test was then used to see if the two dipole model was rejected and the three dipole model was accepted. The SNR in this set of simulations was 1:1. The simulations were repeated 300 times for each combination of SNR (1:1 and 1:5) and estimation method (GLS and ML). The second set of simulations was only carried out for ML and an SNR of 1:1. The simulations in this set were also repeated 300 times.

Table 4.1. Quality of the estimated standard errors (mean estimated s.e. / simulation s.e.), averaged for location and orientation parameters, and averaged across frequencies for σk² and the Ψ matrix elements. Values close to one indicate good quality; values larger than 1 indicate overestimation, and values smaller than 1 underestimation, of the true parameter standard errors.

                 θ^loc   θ^or   σk²    [Ψ]₁₁   Re[Ψ]₁₂  Im[Ψ]₁₂  [Ψ]₂₂
  SNR 1:1 qGLS   0.86    0.91   0.67   0.89    0.73     0.66     0.88
          qML    0.99    1.02   1.01   0.99    1.01     0.85     0.99
  SNR 1:5 qGLS   0.88    0.91   0.65   0.87    0.89     0.31     0.86
          qML    1.01    1.14   1.01   0.99    1.01     0.85     1.00

[Fig. 4.2 appears here: five panels showing the Ψ₁₁, Re(Ψ₁₂), Im(Ψ₁₂), Ψ₂₂ and Θ estimates plotted against frequency (5–13 Hz), for ML and GLS at SNR 1:1 and 1:5.]

Fig. 4.2. Parameter bias. Simulation means of the Ψ and Θ parameter estimates from 300 simulations. True spectral parameters are indicated by the continuous lines.

4.3.3 Simulation results.

Bias was assessed by testing the null hypothesis H₀: ξ̄ = ξ₀, where ξ̄ denotes the average of ξ̂ across the 300 simulations, against the alternative H_A: ξ̄ ≠ ξ₀, using Hotelling's T² test. Post hoc t-tests were used to locate the source of bias in specific (types of) parameters. The quality of the estimated standard errors was assessed with a chi-square test, comparing the variances of the ratios (ξ̂_i − ξ̄_i)/√b_{ii} with their asymptotic expected value 1. For each estimator in ξ̂ a Kolmogorov–Smirnov test was used to assess whether it had a normal distribution. As the orientation parameters satisfy constraints, their distribution is improper (i.e. their covariance matrix is singular). It should therefore be expected that the standard errors obtained for these parameters will be less than adequate.

The overall bias test indicated a clear difference between ξ̄ and ξ₀ in all simulations (ML 1:1, ML 1:5, GLS 1:1 and GLS 1:5). The source of this bias could be traced to the different types of parameters: in all conditions, the location and orientation estimators were essentially unbiased. The Θ and Ψ parameters, on the other hand, were statistically significantly biased. In Fig. 4.2 the mean parameter estimates are plotted together with their true values. For the GLS estimates the bias was rather large; the Ψ and Θ parameters were underestimated on average by well over 25% of their true value. For ML, the Ψ and Θ bias was small: on average within approximately 2% of their true values. Different signal to noise ratios did not seem to affect bias significantly.

In table 4.1 the ratio between the mean of the estimated standard errors over the 300 simulations (estimated s.e.) and the standard deviation of the parameter estimates determined from the 300 simulations (simulation s.e.) is given, averaged over parameters. Values close to one indicate good agreement between the estimated and simulation standard errors. As can be seen in table 4.1, the ML estimated standard errors are good. This was confirmed by the chi-square tests: none of the ML estimated standard errors deviated significantly from their asymptotic value, whereas all of the GLS estimated standard errors did. It is known that dipole estimation algorithms generally have more difficulty in determining the radial component of dipole locations than the tangential components [114]. This was verified by testing whether the variance of the location estimates along the radial direction was larger than along the tangential directions.

Table 4.2. Coverage rates of the 95% confidence intervals: percentage of simulations in which the 95% confidence intervals contained the true parameter values.

                  θ       Ψ       σk²
  SNR 1:1 % GLS   92.12   14.30    0.00
          % ML    95.71   93.79   71.42
  SNR 1:5 % GLS   84.50   51.10    0.00
          % ML    96.26   94.59   43.67

[Fig. 4.3 appears here: histogram (percentages, 0–10%) of the Im(Ψ̃₁₂) estimates for GLS at SNR 1:5, over the range 0.5–1.1, with the best fitting normal curve overlaid.]

Fig. 4.3. Percentages of simulations in which Ψ estimates were contained in the ranges indicated by the bars. The curved line indicates the percentage expected on the basis of the best fitting normal distribution. This figure shows the worst case deviation from normality. Even in this worst case the normal distribution seems a good approximation to the actual distribution.

This was indeed the case for all types of estimates; the variance of the location estimates was more than two times larger in the radial direction than in the tangential directions.

In all cases the θ^loc estimators were normally distributed, as indicated by the Kolmogorov–Smirnov tests. The θ^or estimators were not, due to the constraints. Both the GLS and ML Ψ estimators were not normally distributed as indicated by the tests, but the deviation from normality was very small, as can be seen in Fig. 4.3. The Θ estimators were normally distributed in all cases. These results may justify the use of the parameter confidence regions given in section 4.2.2 in the case of finite sample ML estimators, since their bias was small, their estimated standard errors had good quality, and they were approximately normally distributed. To get an impression of the confidence intervals, the coverage rates are given in table 4.2. The confidence regions for the ML source location and orientation parameters θ are reliable (their coverage rate approximates the nominal rate of 95%). For the ML Ψ confidence intervals the coverage rates are also close to the nominal rate of 95%. Coverage rates for the ML Θ intervals deviate substantially, especially at low SNR. This is due to relatively small standard errors. These parameters, however, are to be considered nuisance parameters, not of practical interest. Furthermore, they seem to have little effect on the other parameters. The coverage rates of the GLS θ confidence intervals were very close to the nominal level, but this was not the case for the Ψ and Θ confidence intervals, as should be expected from the magnitude of their bias and the underestimated parameter standard errors.

Because the GLS estimators did not perform well in the first set of simulations, we only examined the performance of the chi-square goodness of fit statistic with ML estimators. In 255 out of 300 simulations (87%) the two dipole model was rejected by the chi-square test of size α = 0.05, whereas the three dipole model was selected in 279 out of the same 300 simulations (92%), which is close to the expected nominal rate of 95%. A Kolmogorov–Smirnov test on the chi-square variate did not indicate deviation from the appropriate chi-square distribution when the true, three dipole model was used. These results indicate that the chi-square test statistic based on F_ML(ξ̂) can be helpful in determining the number of sources that are necessary, provided that the model is otherwise correctly specified.

4.3.4 Starting values and algorithm convergence.

Common problems with nonlinear optimization algorithms are, among others, obtaining good starting values and avoiding local minima. Even with starting values near ξ₀, the ML method


has a tendency to diverge to singular cross-spectral matrices Σk(ξ). This could be circumvented by using ULS estimates as starting values in the ML iterations. Local minima could be avoided when multiple starting values were used. GLS seemed to suffer considerably from local minima, and sometimes diverged to models with non-positive definite amplitude cross-spectral matrices Ψk, especially at low SNR. In the simulations, when the maximum number of iterations was set to 500, on average about 300 major iterations (evaluations of the loss function and its gradient) and two different starting values were necessary for convergence to an admissible solution ξ̂.

4.4 Application to MEG data.

As an illustration we analyzed data obtained in a visual evoked field experiment from a 20-year-old subject in good health.

Data collection. MEG data were obtained in a visual (right half field) stimulation experiment with the same checkerboard stimuli as used in [130]. A CTF gradiometer system with 151 gradiometers in a magnetically shielded room recorded the MEG signals. The subject sat silently while watching the stimuli. 392 trials were recorded.

Data processing. An interval of 300 milliseconds was selected, 82 milliseconds pre-stimulus and 211 milliseconds post-stimulus. The signals were sampled at a rate of 208.3 Hz, after low-pass filtering at 70 Hz. For each trial a pre-stimulus baseline of 48 milliseconds was used to center the trial data. After removing the trial average from the signals, the signals were Fourier transformed and an estimate of the cross-spectral matrix was obtained at each frequency, as discussed following (4.5). A band of frequencies was chosen in which the topographical contour plot of the sensors' signal power was approximately equal across frequencies (correlations between 0.94 and 0.99). The coherence between sensors separated by more than 9 cm is graphically displayed in Fig. 4.4a. The plot suggests the presence of two coherent sources in the left and right temporal lobe. Therefore a two dipole model was fitted on the cross-spectral matrices of the four frequency components in the chosen frequency band (viz. 23.5, 26.9, 30.2, and 33.6 Hz).

Results. The parameter estimates are presented in table 4.3 along with their estimated standard errors. The parameter standard errors of the two dipole model were very large. We therefore subsequently fitted a three dipole model. The estimated sources are located medially in the left and right hemisphere near the vertex. Two of the dipoles are located close to the sources in the two dipole model. It is seen in table 4.3 that the standard errors obtained from the estimation procedure are still very large (although substantially smaller than for the two dipole model). Also given in table 4.3 are the coherences between sources. These coherences were computed from the Ψ̂ estimates as ρ̂_{ab}(k) = |(Ψ̂k)_{ab}|²/((Ψ̂k)_{aa}(Ψ̂k)_{bb}), a, b = 1, . . . , d [133]. The coherence between sensors predicted by the model is displayed in Fig. 4.4b by the lines joining the sensors. Both the model and the data show high coherence between laterally, temporally located sensors. For both models the chi-square test indicated lack of fit, which can be due to an incorrect head model, an incorrect number of sources or inadequacy of the dipole model for these sources, or an incorrect noise model. In the current implementation the noise cross-spectrum matrix is assumed to be proportional to the identity matrix, which is an unrealistic assumption [58]. We therefore cannot draw definite conclusions about the number of sources from this test at present.
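To make the coherence computation explicit, here is a small Python sketch (illustrative only; the squared-coherence convention below is one common choice, and the 2 × 2 matrix in the example is hypothetical).

```python
import numpy as np

def source_coherence(Psi):
    """Squared coherence between source amplitudes at one frequency.

    Psi : d x d Hermitian source amplitude cross-spectral matrix (an
          estimate Psi_k from the fitted model).
    Returns the d x d matrix of rho_ab(k) = |Psi_ab|^2 / (Psi_aa * Psi_bb).
    """
    power = np.real(np.diag(Psi))                 # auto-spectra on diagonal
    return np.abs(Psi) ** 2 / np.outer(power, power)

# Example with a hypothetical 2 x 2 amplitude cross-spectrum:
Psi_k = np.array([[2.0, 0.8 + 0.4j],
                  [0.8 - 0.4j, 1.5]])
print(source_coherence(Psi_k))  # off-diagonal entries lie in [0, 1]
```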

[Fig. 4.4 appears here: (a) "coherences subject RM (26.9 Hz)", (b) "predicted coherences 26.9 Hz 3 dipole model", and (c) "3 dipole model" with sources θ₁, θ₂, θ₃ joined by arrows labeled 0.42, 0.27 and 0.02; all panels on left ear–right ear versus inion–nasion axes, with a coherence legend 0.4–0.5, 0.5–0.6, 0.6–0.7, 0.7–1.]

Fig. 4.4. (a) Coherence between sensors with distances greater than 9 cm. (b) Coherence between sensors predicted by the three dipole model. (c) Three dipole model; the arrows connecting sources indicate the coherence (averaged across the frequencies presented in table 4.3) between the source activities. See the text for details.

4.4.1 Conclusions.

In Fig. 4.4c the dipole model is depicted, along with curves that indicate the coherence between sources. The dipole fit indicates that medial areas, likely in parietal cortical areas, are interacting. Since the sources are in close proximity to each other, the large standard errors might indicate that the underlying source is extended and is therefore not adequately modelled by dipole sources; but the pattern of coherence does not completely support this interpretation: one of these areas (θ₃) seems to be coherent with the other areas (θ₁ and θ₂), whereas those other areas lack mutual coherence. In contrast, the sensor to sensor coherence estimates seem to indicate that areas much lower and more temporal are synchronizing activity. Based on the simulations, we should have more confidence in the dipole model than in the sensor coherence. However, the standard errors of the model parameters are high and the chi-square test indicates lack of fit. Furthermore, incorrect assumptions about the noise cross-spectrum Θk also lead to biased estimates of Ψk


and subsequently to over- or underestimated coherences between sources. The coherences should therefore be interpreted with caution. These somewhat disappointing results indicate that, in order to make more definite inferences on these issues, the subtleties of the data have to be appreciated to a greater extent. The major source of lack of fit is presumably the incorrectness of the structure of the noise cross-spectrum Θk. Earlier we indicated that this matrix might be more adequately modelled by a matrix σk²U, where U is a known matrix. The random dipole model developed in [58] may provide an appropriate choice of U. This model assumes that the background brain activity is generated by dipole-like sources that are randomly located in the head and have random orientations. This model predicts both the background EEG and the MEG quite well [58]. As the locations are random from time to time and from trial to trial, they do not take part in the gain matrix Λ, which only models the sources that are consistent across time and trials. Incorporating such a model for the noise covariance seems vital for successful application of the method.

Table 4.3. Two and three dipole model fits. The θ^loc parameter values are in centimeter units; θ^or parameters are direction cosines. ρ̂_{θk↔θm} is the estimated coherence between the sources θk and θm at the indicated frequencies.

Two dipole model:

  dipole     θx^loc      θy^loc      θz^loc     θx^or        θy^or        θz^or
  1 (s.e.)   -1.9 (9.5)   2.0 (7.0)  4.7 (6.2)   0.76 (1.1)  -0.42 (2.1)  0.49 (1.5)
  2 (s.e.)   -1.1 (7.6)  -1.9 (8.0)  4.8 (5.9)   0.66 (1.4)   0.63 (1.9)  0.40 (1.6)

  coh. (Hz)       23.5   26.9   30.2   33.6
  ρ̂_{θ1↔θ2}      0.34   0.44   0.55   0.46

Three dipole model:

  dipole     θx^loc      θy^loc      θz^loc     θx^or        θy^or        θz^or
  1 (s.e.)    0.4 (5.2)  -2.9 (3.7)  4.7 (3.1)  -0.97 (0.3)  -0.22 (1.4)  -0.05 (1.2)
  2 (s.e.)   -0.6 (4.5)   0.9 (7.4)  5.3 (4.1)  -0.13 (3.1)   0.98 (0.5)  -0.17 (1.4)
  3 (s.e.)   -1.1 (6.8)   2.5 (4.5)  5.5 (3.4)  -0.90 (1.2)  -0.43 (2.4)   0.02 (1.7)

  coh. (Hz)       23.5   26.9   30.2   33.6
  ρ̂_{θ1↔θ2}      0.00   0.03   0.05   0.02
  ρ̂_{θ1↔θ3}      0.44   0.46   0.37   0.42
  ρ̂_{θ2↔θ3}      0.21   0.31   0.31   0.24

4.5 Discussion

We conclude from the simulations that with the method presented it is possible to obtain accurate estimates of sources together with measures of interaction of these sources. In theory both GLS and ML should be adequate; however, the theory is based on large numbers of trials (L) and can only be an approximation to the finite number of trials situation of actual experiments. In our simulations with a moderate trial count (L = 400), the GLS estimation procedure turned out to give biased estimators and poor quality standard error estimates, and, as a consequence, poor quality confidence intervals. In contrast, the ML procedure provided (apart from the nuisance Θk


estimators) nearly unbiased estimators, with adequate standard errors and reliable confidence intervals.

In Fig. 4.1 the method was compared with conventional coherence analysis of the data in a typical simulation. Interpretation of the coherences displayed in the figure would probably lead to the conclusion that central sensorimotor areas are interacting with more lateral areas on the same side of the head, and possibly with contralateral temporal areas as well. This interpretation misses the two interacting sources in the parietal areas (one in each hemisphere). This shows that in general it will be very difficult to discern the amount of coherence due to cortical interactions from the amount of coherence due to overlapping sensor sensitivities. Although an additional phase analysis would help to some extent, this still does not provide a clear picture of the location of the sources, and limits the analysis to out-of-phase sources. The proposed method, on the other hand, does provide an interpretation both in terms of physiological interactions and in terms of anatomical origin.

A difficulty may be observed in two somewhat conflicting requirements of the method: on the one hand the time series segments have to be sufficiently long for the Fourier coefficients to become uncorrelated, while on the other hand the signals are assumed to be generated by a relatively small number of dipole-like sources. The number of sources incorporated in Λ(θ) will not be very high, since the cross-spectrum estimates are based on the periodograms of the signals averaged across trials. Therefore the power contribution of sources not consistently present at each trial will be diminished and smoothed out, as theorized in [58]. Selection of a relatively small window size, which may be necessary to ensure that only a few dipoles are active, will introduce error in the approximations of (4.7) and (4.8), which in turn introduces some inefficiency in the estimates. We plan to investigate this error of approximation in a future article.

We used the chi-square goodness of fit approach to assess the appropriateness of the model in the simulations. In the simulations this chi-square test turned out to be quite helpful in determining how many sources should be included in the ECD model. In real data, however, the chi-square test is also sensitive to a wrong model for the noise cross-spectrum or a wrong head model. A better noise model than Θk = σk²I and realistic head models may alleviate this problem. Possible candidates for Θk that accommodate noise correlations between neighboring sensors are given in [58], as discussed previously, and in [56] and [236]. However, some model misspecification will always be present because of necessary approximations to reality. The chi-square test will then reject a model because of the approximation and not because of an inadequate number of sources. Therefore alternative measures of fit have to be considered [237].

It was noted in the simulations that convergence can be rather slow, especially with large numbers of sensors (≥ 61). The rate of convergence of course depends in part on the starting values for the parameters, the number of frequencies included, and the number of model parameters. Good starting values can speed up convergence considerably, but are not generally easy to find. We obtained starting values from ULS estimates. This improved the convergence rate of the ML procedure considerably and, as an additional advantage, prevented divergence to improper solutions (i.e. non-positive definite cross-spectral matrices). Also, we first obtained estimates at each frequency independently, and then used the frequency average of these parameter estimates as starting values for the simultaneous analysis of all frequencies of interest. Further strategies to find starting values may be devised, which may include the following: first a small subset of sensors is selected covering the regions in which sources are expected (e.g. based on previous research); then, from this sensor subset, ULS estimates are obtained and used as starting values in a full ML optimization procedure with all sensors included. It is not always necessary to include all sensors; it is important to make a selection of those sensors that provide the most information about the sources that are expected to be found. Methods to make such optimal selections are given in [113].

The frequency domain approach to ECD modelling of MEG/EEG has been pursued earlier in [144], [189, 191] and [190]. The current work may be seen as an extension of the methods


presented there. In [144] and [189, 191] source amplitudes were assumed deterministic across trials. In [190] variations in magnitude and latency of the source amplitude waveforms were allowed for different experimental conditions. No attempts were made to estimate source covariation in these papers. In the current paper source amplitudes were assumed to be partly stochastic, and this stochastic nature was used to estimate their covariation across trials. It may be noted that the method presented here is close to what is known as "stochastic ML" (SML) in the signal processing literature [134].⁴ The method of this paper may be extended by further parameterizing Ψk to include structural regressions (linear or nonlinear) of activities of one source on another (i.e. estimates of transfer functions). The method may also be extended to include a model for the trial average signal. Furthermore, the method may be extendable to the time domain, where the assumption of stationarity is unnecessary. Some of these extensions are the subject of the following chapter. Computational simplifications may be found along the lines that are given in the discussion of SML in [134], and are pursued in chapters 3 and 5. Although the method was tested with MEG, it is not expected that the results will be drastically altered when using EEG. Also, in principle MEG and EEG could be combined in the analysis, thus profiting from their specific advantages [114].

⁴ We thank an anonymous reviewer for bringing this to our attention.


5 Stochastic maximum likelihood mean and cross-spectrum structure estimation of EEG/MEG dipole sources

5.1 Introduction

In¹ cognitive neuroscience the objective is to establish how structures of the brain cooperate to give rise to mental functions. Several brain imaging techniques are helpful in determining which parts of the cortex become active during certain mental processes [201]. These techniques include functional magnetic resonance imaging (fMRI) and equivalent current dipole (ECD) modelling of the electro- (EEG) and magneto- (MEG) encephalogram [102]. EEG and MEG measure, respectively, the scalp electric potential field and the magnetic field near the head of a human subject. These fields are generated by localized electric currents associated with neuronal activity. In ECD modelling, these currents are modelled by small current dipoles, and the objective is to estimate their unknown locations, orientations and amplitudes, given the EEG/MEG sensor outputs. Techniques such as fMRI provide great localization precision, whereas EEG and MEG provide great timing precision [201].

While function localization to different parts of the cortex has taken off to a great extent over the last decade, researchers are increasingly interested in testing hypotheses about the cooperativity between these different cortical structures, that is, in estimating the parameters that describe the dynamics of these interactions [24, 230]. Standard methods of investigating interactions include coherence analysis of EEG and MEG signals and so-called event related (de-)synchronization [183]. Problems with the interpretation of these measures of cortico-cortical interactions include volume conduction effects, reference electrode effects, and the lack of spatial resolution of the EEG/MEG [230]. Newer approaches consist of localization of activity by means of dipole source localization procedures and correlation of the source amplitude functions estimated for these locations [49, 60, 97, 105]. Still another approach, laid down in the previous chapter (see also [87]), is to simultaneously estimate dipole locations and their amplitude cross-spectra from the sample cross-spectra of the EEG/MEG signals. The advantage of this last approach is that it makes full use of the virtues of statistical estimation theory, which include high precision maximum likelihood estimators and straightforward model evaluation theory.

In this chapter we will extend the method of the previous chapter with a framework for modelling and testing source amplitude coherence. Furthermore, we will include the information that is present in the average of the EEG/MEG signals across repeated trials. The framework suggested has its roots in what is known in biometrics as path analysis, and in econometrics and psychometrics as structural equation modelling (SEM) [20]. The method employs maximum likelihood in a way that is very similar to stochastic maximum likelihood (SML) directions-of-arrival (DOA) estimation, as given in e.g. [134, 180, 220]. We modify the usual SML formulas to include the mean and a more general noise covariance.

This paper is organized as follows: in section 5.2 the source model is presented. In section 5.3 the mean and cross-spectrum model is given, and a framework for modelling source amplitude coherence is presented. In section 5.4 closed form expressions for the estimators of some of the

¹ This chapter has been submitted as [88].


parameters are derived, and an expression for the concentrated negative log-likelihood function is obtained. Also, the generalized likelihood ratio test (GLRT) statistic and approximate standard errors of the estimators are briefly discussed in connection with model evaluation. In section 5.5 the approximate standard errors and GLRT statistic are evaluated in a set of numerical experiments. Finally, in section 5.6 some closing remarks on the methods are made.

5.2 Dipole model and measurements model

Experimental EEG and MEG data usually consist of signal segments measured in different trials, during which stimuli are presented to subjects in order to evoke specific brain responses. The EEG/MEG signals reflect these responses in a highly entangled way, and the purpose of ECD modelling is to disentangle these signals into the underlying components of localized neuroelectric activity in the cortex. It has been widely recognized that these cortical responses, evoked by the presentation of stimuli, are characterized by a deterministic part and a stochastic part [184].² The deterministic part, the event related potential/field (ERP/ERF), can be estimated by averaging the signals across many repeated trials [24]. The stochastic part is only reflected in the variance of the signals across trials. It is generally accepted that these trials may be considered as statistically independent replications of an evoked brain response, provided that the time-interval separation between trials is not too small and unpredictable [184].

In relating the fields produced by these neural currents to the measurements, the head is often modelled as a spherically symmetric conductor that is locally fitted to the curvature of the skull [102]. The sources themselves are described by a parameter vector θ = [θ₁′, . . . , θd′]′, containing location and orientation parameters θa for each source, indexed a = 1, . . . , d. Here ′ denotes transposition. For EEG/MEG data in trials l = 1, . . . , L the m-dimensional array of measurements has the form³

\[ \tilde{y}_l(t) = \Lambda(\theta)\tilde{\eta}_l(t) + \tilde{\varepsilon}_l(t), \qquad t = 0, \ldots, n-1. \tag{5.1} \]

Here ỹl(t) is the vector of measurements from m channels in trial l at time t. Λ is the m × d matrix of which the columns contain the gains for the unit amplitude sources parameterized by θ, η̃l(t) is the d-vector of source amplitudes in trial l at time t, and ε̃l(t) is an m-vector of noise signals in trial l at time t, independent of η̃l(t). The gain matrix Λ is obtained from the quasi-static Maxwell equations [102]. For the MEG measurements from a spherical head model that we use in section 5.5, Λ was determined in [195]. Throughout this article it is assumed that the source parameters θ are fixed over time and trials. We will assume that the sources are sufficiently separated so that rank(Λ(θ)) = d throughout the source region.

An advantage that arises if it may be assumed that η̃l(t) is stochastic in nature is that, from the variation across trials, inferences about interdependency between different sources can be made, which may be interpreted as "functional coupling" of different cortical areas [73]. In [87] we also pointed out this fact, along with a discussion of the advantages of transforming the model into the frequency domain. To summarize the latter: assuming stationarity of the stochastic processes, the Fourier coefficients of different frequencies are asymptotically (i.e. for n → ∞) uncorrelated and have approximately a complex normal distribution [26]; the fitting function may therefore be factored into a set of fitting functions that are much more efficiently evaluated than their time domain equivalent. As a result the computational burden can be reduced drastically. This property has been previously exploited in the context of the analysis of brain signals in [184, 189–191].

² The debate on this issue has recently revived due to experimental findings in [152] and a mathematical analysis of data preprocessing effects verified in experimental data in [17], resulting in opposing views. In any case, the model can be maintained as a model for the ensemble average, where any trial to trial variation is absorbed into the stochastic part of the response.
³ Throughout this chapter a tilde (˜) will indicate time domain quantities.

75

To summarize, we make the following assumptions on the measurements: i ) Multiple segments of multichannel data are available, generated in accordance with the source signal plus noise model in (5.1), in which a precisely defined event occurs, to which the sources respond. ii ) Segments are statistically independent of each other. iii ) The sources’ parameters θa are fixed across time and segments. iv ) The source responses consist of a deterministic part and a stochastic part. v ) Noise and source signals are statistically independent, and the expected value of the noise is zero. vi ) Λ is a known matrix function of θ and the sources are sufficiently separated such that Λ has full column rank.

5.3 Model specification 5.3.1 Mean and cross-spectrum structure  l (t) exp (−i2πtk/n) to be the discrete Fourier transform coeffiDefine yl (k) = (2πn)−1/2 t y cient at frequencies 2πk/n, k = 1, . . . , K < n/2 [191]. Define η l (k) and εl (k) similarly. As indicated previously, subject to certain mixing conditions and stationarity of the stochastic part of the signals, the Fourier coefficients yl (k) have an asymptotically complex normal distribution, and are statistically independent for k = j [26, 191]. Their covariance matrix E{[yl (k) − E{yl (k)}][yl (k) − E{yl (k)}]∗ } approaches the cross-spectral density matrix Σk as n → ∞. Here (·)∗ denotes conjugation and transposition. For the Fourier transformed data the equivalent of (5.1) is yl (k) = Λ(θ)η l (k) + εl (k)

k = 1, . . . , K < n/2.

(5.2)

Using assumption v in the last paragraph of the previous section, the cross-spectrum of the l (t) at each frequency then has the structure4 stochastic part of y Σk = Λ(θ)Ψk Λ(θ)∗ + Θk

k = 1, . . . , K < n/2.

(5.3)

Here $\Psi_k$ is the cross-spectrum of the source amplitudes, and $\Theta_k$ is the cross-spectrum of the noise signals. These are the limiting values of the covariances of $\eta_l(k)$ and $\varepsilon_l(k)$, respectively. This is the model presented in [87]. In that paper, this model was fitted to the sample cross-spectrum $S_k$ that was computed from the observed data with the formula [26, p. 282]

$$S_k = \frac{1}{L-1}\sum_{l=1}^{L}[y_l(k) - \dot{y}_k][y_l(k) - \dot{y}_k]^*, \tag{5.4}$$

where $\dot{y}_k = L^{-1}\sum_l y_l(k)$. If an estimate of the matrix $\Psi_k$ is obtained in this way, coherences between source amplitudes can be obtained as a measure of functional connectivity (see [87]).

In the model thus far, the signal means, and hence the source amplitude means, are ignored. If the amplitude means are not equal to zero, the means contain important information, and are usually the object of interest to the researcher. Therefore the first extension of this model involves the incorporation of the trial average in the form of the expected value of the Fourier coefficients, by taking expectations in (5.2):

$$E\{y_l(k)\} = \mu_k = \Lambda(\theta)E\{\eta_l(k)\} = \Lambda(\theta)\eta_k, \tag{5.5}$$

because $E\{\varepsilon_l(k)\} = 0$ by assumption v) on $\tilde{\varepsilon}_l(t)$. Here $\eta_k$ is the $k$-th Fourier coefficient of the ensemble average waveform $\tilde{\eta}(t)$.

4. $\Lambda(\theta)$ is real valued for the biophysical model but can be complex in other applications. Henceforth we use $^*$ instead of $'$ in such cases.
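A sketch of these frequency-domain quantities follows: the Fourier coefficients $y_l(k)$, their trial mean $\dot{y}_k$, and the sample cross-spectra $S_k$ of (5.4). It assumes the hypothetical L x m x n trial array `y` from the earlier sketch; the normalization and number of retained frequencies are illustrative choices.

```python
# Fourier coefficients per trial, trial mean, and sample cross-spectra (5.4).
import numpy as np

K = 5                                                  # frequencies retained, k = 1..K
Y = np.fft.fft(y, axis=-1) / np.sqrt(2 * np.pi * n)    # (2*pi*n)^{-1/2} normalization
Y = Y[:, :, 1:K + 1]                                   # L x m x K

y_dot = Y.mean(axis=0)                                 # trial mean, m x K

S = np.empty((K, m, m), dtype=complex)
for k in range(K):
    centered = Y[:, :, k] - y_dot[:, k]                # rows are y_l(k) - y_dot_k
    S[k] = centered.T @ centered.conj() / (L - 1)      # sum of outer products, eq. (5.4)
```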


5.3.2 Linear filter model for interactions

In some situations a researcher may entertain substantive hypotheses about interactions between different sources. In the current model these hypotheses may be tested directly if $\Psi_k$ is further structured. The interaction equation between the amplitudes of two sources indexed $a$ and $b$ may be expressed in the time domain in terms of a Volterra expansion [26, 50]:

$$\tilde{\eta}_{a,l}(t) = \int h_{ab,1}(\tau)\,\tilde{\eta}_{b,l}(t-\tau)\,d\tau + \iint h_{ab,2}(\tau_1,\tau_2)\,\tilde{\eta}_{b,l}(t-\tau_1)\,\tilde{\eta}_{b,l}(t-\tau_2)\,d\tau_1\,d\tau_2 + \cdots, \tag{5.6}$$

where $\tilde{\eta}_{a,l}$ is the $a$-th component of $\tilde{\eta}_l$; $h_{ab,1}$ is sometimes called the Wiener kernel, and $h_{ab,j}$, $j = 2, \ldots$, are the kernels generating the nonlinear effects in the interaction [50]. For interactions between more than two sources this can be generalized to incorporate the effects of other source amplitudes. We will approximate such an expansion by its first order (Wiener) terms; the estimation of higher order terms would require higher order spectra. It is not expected that the relations will be perfect, since, apart from nonlinearities in the interactions, some intrinsic activity will exist and some external input activation is unaccounted for by the sources incorporated in the model. These effects will be incorporated through an additional zero mean stationary stochastic process term $\tilde{\zeta}_{a,l}(t)$. For different sources $a$ and $b$, $\tilde{\zeta}_a$ and $\tilde{\zeta}_b$ are assumed to be independent. In addition, a non-random portion of the response is included through an extra term $\tilde{h}_a(t)$. The resulting equation for the interactions between sources is

$$\tilde{\eta}_{a,l}(t) = \tilde{h}_a(t) + \sum_{b=1}^{d}\int h_{ab,1}(\tau)\,\tilde{\eta}_{b,l}(t-\tau)\,d\tau + \tilde{\zeta}_{a,l}(t) \tag{5.7}$$

for $a = 1, \ldots, d$. Of course neurophysiological hypotheses may imply that some of these kernels are equal to zero. By the convolution theorem of Fourier analysis, in the frequency domain this results in the relation

$$\eta_{a,l}(k) = \alpha_a(k) + \sum_{b=1}^{d}\beta_{ab}(k)\,\eta_{b,l}(k) + \zeta_{a,l}(k), \tag{5.8}$$

where $\alpha_a(k) = \frac{1}{2\pi}\int \tilde{h}_a(t)\exp\{-i2\pi kt\}\,dt$ and $\beta_{ab}(k) = \frac{1}{2\pi}\int h_{ab,1}(t)\exp\{-i2\pi kt\}\,dt$. Zero kernels in (5.7) correspond to zero coefficients in this equation. For $d$ sources these relations may be compactly represented in matrix form:

$$\eta_l(k) = \alpha_k + B_k\,\eta_l(k) + \zeta_l(k), \tag{5.9}$$

where $B_k = (\beta_{ab}(k))$ is a $d \times d$ complex valued matrix, and $\zeta_l(k) = (\zeta_{1,l}(k), \ldots, \zeta_{d,l}(k))'$. Some restrictions must be imposed in order to make $B_k$ identifiable [20, 122]. We will assume that $B_k$ is specified in such a way that $(I - B_k)^{-1}$ exists; this coincides with the condition that the system of filters in (5.7) is invertible [26, p. 30]. This ensures that $\tilde{\eta}(t)$ consists of a deterministic component superimposed on a stationary stochastic component. The vector of mean amplitude Fourier coefficients $E\{\eta_k\}$ is then obtained by rewriting (5.9) as $\eta_l(k) = (I - B_k)^{-1}[\alpha_k + \zeta_l(k)]$ and taking expectations: $E\{\eta_l(k)\} \equiv \eta_k = (I - B_k)^{-1}\alpha_k$, as $E\{\zeta_l(k)\} = 0$ by the assumption on the components of $\tilde{\zeta}_l(t)$.

Besides the invertibility condition on $I - B_k$, the cross-spectrum of $\tilde{\zeta}_l(t)$ will have to be restricted. A natural constraint is to restrict $E\{\zeta_l(k)\zeta_l(k)^*\} = \Phi_k = \mathrm{diag}(\phi_1(k), \ldots, \phi_d(k))$, as otherwise there could be correlations between dipole amplitudes not accounted for by the filter model that was invoked precisely to model these correlations. The amplitude cross-spectrum is now obtained from $E\{[\eta_l(k) - E\{\eta_l(k)\}][\eta_l(k) - E\{\eta_l(k)\}]^*\} = E\{(I - B_k)^{-1}\zeta_l(k)\zeta_l^*(k)(I - B_k^*)^{-1}\}$, or

$$\Psi_k = (I - B_k)^{-1}\Phi_k(I - B_k^*)^{-1}.$$
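The following sketch builds the structured amplitude cross-spectrum from given filter coefficients and intrinsic spectra; the numerical values of $B_k$ and $\Phi_k$ are arbitrary illustrations, not estimates from data.

```python
# Psi_k = (I - B_k)^{-1} Phi_k (I - B_k^*)^{-1} for a hypothetical B_k, Phi_k.
import numpy as np

d = 3
B_k = np.zeros((d, d), dtype=complex)
B_k[1, 0] = 0.5 - 0.2j            # source 1 drives source 2 (hypothetical)
B_k[2, 0] = 0.3 + 0.1j            # source 1 drives source 3 (hypothetical)

Phi_k = np.diag([1.0, 0.8, 1.2]).astype(complex)   # diagonal intrinsic spectrum

T = np.linalg.inv(np.eye(d) - B_k)                 # requires invertible filter system
Psi_k = T @ Phi_k @ T.conj().T                     # amplitude cross-spectrum

# source coherence then follows as |Psi_ab|^2 / (Psi_aa * Psi_bb)
```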


For any nonsingular diagonal matrix $D$ it is seen that $[D(I - B_k)]^{-1}D\Phi_k D^*[(I - B_k^*)D^*]^{-1} = (I - B_k)^{-1}\Phi_k(I - B_k^*)^{-1}$, while the patterns of zero and nonzero entries of $(I - B_k)$ and $D(I - B_k)$ are the same. Hence, without fixing the scale of either $B_k$ or $\Phi_k$, the two cannot be uniquely identified, and a further restriction has to be imposed. With no additional information, it is natural to require either $\Phi_k = I$ or $\mathrm{diag}(B_k) = 0$. The former has the interpretation that the intrinsic activity has a uniform spectrum (i.e. is pure white noise), which is somewhat unrealistic, especially for biological systems, and is therefore not desirable. The latter ensures that the diagonal elements of $I - B_k$ are equal to 1 and means that the kernels $h_{aa}(t)$, $a = 1, \ldots, d$, are identically zero; hence $B_k$ contains the Fourier coefficients of the linear filter that predicts the activity of one source from the activity of only the other sources (and not from its own activity). A numerical illustration of this scale indeterminacy is sketched after the assumption list below.

5.3.3 Structure of the cross-spectrum Θk of the noise signals

In Chap. 4 the noise cross-spectrum $\Theta_k$ was constrained to be proportional to an identity matrix: $\Theta_k = \sigma_k I$. Conceptually, this means that the amount of noise is the same for all sensors, and that noise at different sensors is mutually uncorrelated. A more realistic constraint on $\Theta_k$ is to assume that the background EEG/MEG consists of dipoles that are randomly located and randomly activated in different trials and at different times [58]. Here we will assume that $\Theta_k$ is any function of $KN$ parameters $\gamma = (\gamma_{jk})$, $j = 1, \ldots, N$, $k = 1, \ldots, K$, such that these parameters are identifiable.

In summary, then, in addition to the measurement assumptions (section 5.2), we assume:

vii) The Fourier coefficients of the Fourier transformed data have (asymptotically) a complex normal distribution, independent across frequencies.
viii) The dependencies between signals of different sources can be reasonably approximated by linear filter relations as in (5.7); this may be justified as a first order approximation of a Volterra functional expansion [26].
ix) The filter system in (5.7) is invertible (i.e. $(I - B_k)^{-1}$ exists for all $k$).
x) Restrictions have been introduced, in a way that is justifiable within the context of application, to make the matrices $B_k$, $\Phi_k$ and $\Theta_k$ identifiable.
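The promised illustration of the scale indeterminacy: rescaling by any nonsingular diagonal $D$ leaves the implied cross-spectrum unchanged, so $(B_k, \Phi_k)$ is not identified without a scale restriction. All matrices below are randomly generated examples.

```python
# Rescaling check: [D(I-B)]^{-1} (D Phi D*) [(I-B*)D*]^{-1} = (I-B)^{-1} Phi (I-B*)^{-1}.
import numpy as np

rng = np.random.default_rng(1)
d = 3
B = 0.2 * (rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d)))
np.fill_diagonal(B, 0.0)                              # diag(B) = 0 normalization
Phi = np.diag(rng.uniform(0.5, 1.5, d)).astype(complex)
D = np.diag(rng.uniform(0.5, 2.0, d)).astype(complex)

I = np.eye(d)
Psi1 = np.linalg.inv(I - B) @ Phi @ np.linalg.inv(I - B).conj().T
M = D @ (I - B)                                       # rescaled filter matrix
Psi2 = np.linalg.inv(M) @ (D @ Phi @ D.conj().T) @ np.linalg.inv(M).conj().T
assert np.allclose(Psi1, Psi2)                        # same Psi, different scale
```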

5.4 Parameter estimation

Since the interest of the analyst is usually restricted to a limited band of frequencies, not all frequencies have to be incorporated in the analysis; we will denote the subset of $K$ frequencies incorporated by $\Omega_\#$. Following [87], the unknown parameters in (5.3), (5.5) and section 5.3.3 (i.e. $\theta$, $\alpha_k$, the nonduplicated elements of $\Psi_k$, and $\gamma = (\gamma_{jk})$ for $j = 1, \ldots, N$, $k \in \Omega_\#$) are collected in the $p$-vector $\xi$, and are estimated by maximizing the "likelihood" [26, 191]

$$\ell(\{y_l(k) : l = 1, \ldots, L,\ k \in \Omega_\#\};\ \xi) \approx \prod_{l,k}\frac{\exp\{-[y_l(k) - \mu_k(\xi)]^*\Sigma_k(\xi)^{-1}[y_l(k) - \mu_k(\xi)]\}}{\pi^m\,|\Sigma_k(\xi)|}.$$

The factorization over $k$ is due to the aforementioned asymptotic independence of the Fourier coefficients at different frequencies [26, 191]. It will be more convenient to minimize the negative log-likelihood, which is proportional to (cf. [8, 32])

$$F(\xi) = \sum_{k\in\Omega_\#}\Big(\log|\Sigma_k(\xi)| + \mathrm{tr}\{\Sigma_k(\xi)^{-1}S_k\} + [\dot{y}_k - \mu_k(\xi)]^*\Sigma_k(\xi)^{-1}[\dot{y}_k - \mu_k(\xi)]\Big) + K\log\pi^m. \tag{5.10}$$

When it is known that $\mu_k(\xi) \equiv 0$ and a single frequency is considered, (5.10) reduces to $\log|\Sigma_k(\xi)| + \mathrm{tr}\{\Sigma_k(\xi)^{-1}S_k\}$, which is the "Stochastic ML" (SML) objective function described in e.g. [180], [134, 175, 219, 220]. In some cases, the latter negative log-likelihood function can


be separated, and for unparameterized $\Psi_k$ and $\Theta_k = \sigma_k I$ a well known compact equivalent concentrated problem was determined by Böhme (cited in [175, 220]), which greatly increases the computational efficiency. Here we derive a similar concentrated problem for this case when the mean is incorporated. Furthermore, we obtain estimators under more general noise conditions—i.e., we do not assume that $\Theta_k = \sigma_k I$, but allow $\Theta_k = \sigma_k U(\gamma)$, where $U(\gamma)$ is a Hermitian positive definite matrix such that $\gamma$ is identified. For simplicity, however, we will derive the results for $U(\gamma) = I$ and then indicate how they generalize.

5.4.1 The case that B = 0 (unparameterized Ψ)

Let $F^+$ denote the pseudo-inverse $(F^*F)^{-1}F^*$, and $\Pi^\perp_F$ the matrix $I - FF^+$, for any $F$ of full column rank. Then, for unparameterized $\Psi_k$ we obtain the expressions

$$\hat{\alpha}_k(\theta) = \Lambda(\theta)^+\dot{y}_k\,\big|_{\theta=\hat\theta}, \tag{5.11}$$

$$\hat{\sigma}_k(\theta) = \frac{1}{m-d}\,\mathrm{tr}\{\Pi^\perp_\Lambda[S_k + \dot{y}_k\dot{y}_k^*]\}\,\big|_{\theta=\hat\theta}, \tag{5.12}$$

$$\hat{\Psi}_k(\theta) = \Lambda(\theta)^+(S_k - \hat{\sigma}_k(\theta)I)\,\Lambda(\theta)^{+*}\,\big|_{\theta=\hat\theta}, \tag{5.13}$$

$$\hat{\theta} = \arg\min_\theta\,\sum_{k\in\Omega_\#}\log\big|\Lambda(\theta)\hat{\Psi}_k(\theta)\Lambda(\theta)^* + \hat{\sigma}_k(\theta)I\big|. \tag{5.14}$$

In the case that $\Theta_k(\gamma) = \sigma_k U(\gamma)$, $\Lambda^+$ is replaced by $(Q\Lambda)^+Q$, $\Pi^\perp_\Lambda$ by $Q\Pi^\perp_{Q\Lambda}Q$, and $\hat\sigma_k I$ by $\hat\sigma_k U(\gamma)$, where $Q = U^{-1/2}$, a Hermitian 'square root' of $U^{-1}$. In deriving the results, we will temporarily drop the index $k$ and suppress the dependence of $\Lambda$ on $\theta$. These estimators are obtained by equating partial derivatives of $F$ to zero and solving for the desired parameter. We first consider $\alpha$.
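Before turning to the derivations, here is a sketch of the separable estimators (5.11)-(5.13) and of the concentrated objective in (5.14), evaluated at one frequency for a fixed $\theta$ and $\Theta_k = \sigma_k I$. The inputs (`Lam`, `S`, `y_dot`) are assumed given; this is an illustration of the formulas above, not the thesis's optimization code.

```python
# Concentrated SML estimates at fixed theta, for Theta = sigma * I.
import numpy as np

def concentrated_estimates(Lam, S, y_dot):
    m, d = Lam.shape
    Lam_pinv = np.linalg.pinv(Lam)                     # (Lam* Lam)^{-1} Lam*
    P_perp = np.eye(m) - Lam @ Lam_pinv                # projector off the column space
    alpha = Lam_pinv @ y_dot                                                       # (5.11)
    sigma = np.trace(P_perp @ (S + np.outer(y_dot, y_dot.conj()))).real / (m - d)  # (5.12)
    Psi = Lam_pinv @ (S - sigma * np.eye(m)) @ Lam_pinv.conj().T                   # (5.13)
    Sigma = Lam @ Psi @ Lam.conj().T + sigma * np.eye(m)
    obj = np.linalg.slogdet(Sigma)[1]                  # log|.| term minimized in (5.14)
    return alpha, sigma, Psi, obj
```

Minimizing the returned objective over $\theta$ (summed over the frequencies in $\Omega_\#$) then reproduces the concentrated problem (5.14).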

Mean amplitude parameters α

Setting the derivatives of $F$ with respect to $\alpha$ equal to zero, the first order conditions $-2(\dot{y} - \Lambda\alpha)^*\Sigma^{-1}\Lambda = 0$ are obtained, which may be solved to yield the optimal estimator

$$\hat{\alpha} = [\Lambda^*\Sigma^{-1}\Lambda]^{-1}\Lambda^*\Sigma^{-1}\dot{y}. \tag{5.15}$$

In the appendix it is shown that for $\Sigma = \Lambda\Psi\Lambda^* + \Theta$, for any $\Lambda$ of full column rank and nonsingular $\Psi$ and $\Theta$,

$$(\Lambda^*\Sigma^{-1}\Lambda)^{-1}\Lambda^*\Sigma^{-1} = (\Lambda^*\Theta^{-1}\Lambda)^{-1}\Lambda^*\Theta^{-1}, \tag{5.16}$$

so that (5.11) is obtained for $\Theta = \sigma I$, and the more general result is obtained for $\Theta = \sigma U$.

Amplitude cross-spectral parameters Ψ

Substitution of $\hat\alpha$ into $F$ yields the concentrated negative log-likelihood $F|_{\hat\alpha}$. Let $\psi$ denote a real or imaginary part of an element of $\Psi$. The derivatives of $F|_{\hat\alpha}$ with respect to $\psi$ are the same as those of $F$, because $\hat\alpha$ does not depend on $\Psi$ by (5.16). Setting the derivatives equal to zero, we obtain the equations

$$-2\,\mathrm{tr}\{\Lambda^*\Sigma^{-1}(S_\# - \Sigma)\Sigma^{-1}\Lambda\,\partial\Psi/\partial\psi\}\big|_{\alpha=\hat\alpha} = 0, \qquad \psi = \Re(\Psi)_{ab}\ \text{or}\ \psi = \Im(\Psi)_{ab},\quad a, b = 1, \ldots, d,$$


where $S_\# = S - [\dot{y} - \hat\mu][\dot{y} - \hat\mu]^*$ and $\hat\mu = \Lambda\hat\alpha$. From (5.15), by construction of $\hat\alpha$, $\Lambda^*\Sigma^{-1}[\dot{y} - \hat\mu] = 0$, so that $\Lambda^*\Sigma^{-1}S_\# = \Lambda^*\Sigma^{-1}S$, and the estimation equations can be reduced to, in matrix form, $\Lambda^*\Sigma^{-1}(S - \Sigma)\Sigma^{-1}\Lambda = 0$. Substituting $\Sigma = \Lambda\Psi\Lambda^* + \Theta$, this can be written $\Lambda^*\Sigma^{-1}(S - \Theta)\Sigma^{-1}\Lambda = \Lambda^*\Sigma^{-1}\Lambda\Psi\Lambda^*\Sigma^{-1}\Lambda$. Therefore the ML estimate of $\Psi$ is given by

$$\hat\Psi = [\Lambda^*\Sigma^{-1}\Lambda]^{-1}\Lambda^*\Sigma^{-1}(S - \Theta)\Sigma^{-1}\Lambda[\Lambda^*\Sigma^{-1}\Lambda]^{-1}, \tag{5.17}$$

which with (5.16) and $\Theta = \sigma I$ or $\Theta = \sigma U$ yields (5.13) and the more general result indicated thereafter.

Noise spectrum σ and concentrated negative log-likelihood

We first derive some simplifying expressions, required to concentrate the likelihood with respect to $\alpha$ and $\Psi$. First note that with $U^{-1/2} = Q$, $\hat\Psi$ can be rewritten

$$\hat\Psi = (Q\Lambda)^+QSQ(Q\Lambda)^{+*} - \sigma(\Lambda^*U^{-1}\Lambda)^{-1}. \tag{5.18}$$

Furthermore, by using the matrix inversion formula (e.g. [199, p. 9]; see the appendix) twice, it can be shown that

$$\Sigma^{-1} = (\Lambda\Psi\Lambda^* + \sigma U)^{-1} = \frac{1}{\sigma}Q\Pi^\perp_{Q\Lambda}Q + Q(Q\Lambda)^{+*}[\Psi + \sigma(\Lambda^*U^{-1}\Lambda)^{-1}]^{-1}(Q\Lambda)^+Q.$$

Substitution of $\hat\Psi$ in (5.18), and a little algebra that cancels terms, yields the equation

$$\Sigma^{-1}|_{\hat\Psi} = (1/\sigma)\,Q\Pi^\perp_{Q\Lambda}Q + U^{-1}\Lambda(\Lambda^*U^{-1}SU^{-1}\Lambda)^{-1}\Lambda^*U^{-1}. \tag{5.19}$$

Substituting (5.19) in $\mathrm{tr}\{\Sigma^{-1}S\}$, we obtain

$$\mathrm{tr}\{\Sigma^{-1}|_{\hat\Psi}\,S\} = \mathrm{tr}\{(1/\sigma)Q\Pi^\perp_{Q\Lambda}QS\} + \mathrm{tr}\{U^{-1}\Lambda(\Lambda^*U^{-1}SU^{-1}\Lambda)^{-1}\Lambda^*U^{-1}S\} = \mathrm{tr}\{Q\Pi^\perp_{Q\Lambda}QS\}/\sigma + d,$$

where the equality $\mathrm{tr}\{AB\} = \mathrm{tr}\{BA\}$ was used. Furthermore, from (5.19) and (5.16) it can be shown that

$$\Sigma^{-1}|_{\hat\Psi}\,[I - \Lambda(\Lambda^*\Sigma^{-1}\Lambda)^{-1}\Lambda^*\Sigma^{-1}] = \Theta^{-1}[I - \Lambda(\Lambda^*\Theta^{-1}\Lambda)^{-1}\Lambda^*\Theta^{-1}].$$

From this and from (5.15), therefore, we find

$$(\dot{y} - \hat\mu)^*\,\Sigma^{-1}|_{\hat\Psi}\,(\dot{y} - \hat\mu) = (\dot{y} - \hat\mu)^*\,\Sigma^{-1}|_{\hat\Psi}\,(I - \Lambda(\Lambda^*\Sigma^{-1}\Lambda)^{-1}\Lambda^*\Sigma^{-1})\,\dot{y} = (\dot{y} - \hat\mu)^*\Theta^{-1}(I - \Lambda(\Lambda^*\Theta^{-1}\Lambda)^{-1}\Lambda^*\Theta^{-1})\,\dot{y},$$

which can be written $\frac{1}{\sigma}\dot{y}^*Q\Pi^\perp_{Q\Lambda}Q\dot{y}$, because $\Theta^{-1} = (1/\sigma)U^{-1} = (1/\sigma)Q^2$. Combining traces now yields the concentration of $F$ in (5.10) with respect to $\alpha$ and $\Psi$:

$$F|_{\hat\alpha,\hat\Psi} = \log|\Lambda\hat\Psi\Lambda^* + \sigma U| + \frac{1}{\sigma}\mathrm{tr}\{Q\Pi^\perp_{Q\Lambda}Q(S + \dot{y}\dot{y}^*)\} + d.$$


To find $\hat\sigma$ we must take the derivative of $F|_{\hat\alpha,\hat\Psi}$ with respect to $\sigma$. Before doing so, first note that together with (5.18), $\Sigma|_{\hat\Psi} = \Lambda\hat\Psi\Lambda^* + \sigma U$ can be written $\Lambda(Q\Lambda)^+QSQ(Q\Lambda)^{+*}\Lambda^* + \sigma Q^{-1}\Pi^\perp_{Q\Lambda}Q^{-1}$. Therefore $\partial\Sigma|_{\hat\Psi}/\partial\sigma = Q^{-1}\Pi^\perp_{Q\Lambda}Q^{-1}$. Setting the derivative of $F|_{\hat\alpha,\hat\Psi}$ with respect to $\sigma$ equal to zero gives the first order conditions

$$\mathrm{tr}\{\Sigma^{-1}|_{\hat\Psi}\,Q^{-1}\Pi^\perp_{Q\Lambda}Q^{-1}\} = \mathrm{tr}\{Q\Pi^\perp_{Q\Lambda}Q(S + \dot{y}\dot{y}^*)\}/\sigma^2.$$

With (5.19) we find $\mathrm{tr}\{\Sigma^{-1}|_{\hat\Psi}\,Q^{-1}\Pi^\perp_{Q\Lambda}Q^{-1}\} = (m - d)/\sigma$, so that $\hat\sigma = \mathrm{tr}\{Q\Pi^\perp_{Q\Lambda}Q(S + \dot{y}\dot{y}^*)\}/(m - d)$ is obtained, which is (5.12). Substitution of $\hat\sigma$ in $F|_{\hat\alpha,\hat\Psi}$ yields the concentrated negative log-likelihood in (5.14).

5.4.2 The case that B ≠ 0

Unfortunately, when $B \neq 0$ we cannot use the algorithm in (5.11)-(5.14). Some parameters can still be separated, however: next we find estimators for $\alpha$ and $\Phi$ when $B \neq 0$. If $B \neq 0$, $\Lambda$ in (5.15) must be substituted by $\Lambda(I - B)^{-1}$. The resulting estimator of $\alpha$ then has the simple form $(I - B)\hat\alpha$, where $\hat\alpha$ is given in (5.15).

We can obtain an estimate for $\Phi$ in a similar way as for $\hat\Psi$. Let $\phi$ be a (real) diagonal element of $\Phi$. Setting derivatives of $F|_{\hat\alpha}$ with respect to $\phi$ equal to zero, we find the first order conditions to be

$$\Re\,\mathrm{tr}\{\bar\Lambda^*\Sigma^{-1}(S - \Theta)\Sigma^{-1}\bar\Lambda\,\partial\Phi/\partial\phi\} = \Re\,\mathrm{tr}\{\bar\Lambda^*\Sigma^{-1}\bar\Lambda\,\Phi\,\bar\Lambda^*\Sigma^{-1}\bar\Lambda\,\partial\Phi/\partial\phi\},$$

where $\bar\Lambda = \Lambda(I - B)^{-1}$. Since the partial derivatives are taken only with respect to the real diagonal elements of $\Phi$, it is easy to see that $\Re$ may be dropped. The first order conditions therefore yield a system of equations with the solution

$$\hat\phi = \big[(\bar\Lambda^*\Sigma^{-1}\bar\Lambda)\odot\overline{(\bar\Lambda^*\Sigma^{-1}\bar\Lambda)}\big]^{-1}\,\mathrm{diag}\big[\bar\Lambda^*\Sigma^{-1}(S - \Theta)\Sigma^{-1}\bar\Lambda\big], \tag{5.20}$$

where $\phi = \mathrm{diag}(\Phi)$ contains the diagonal elements of $\Phi$, $\odot$ denotes the Hadamard product defined by $A \odot B = (a_{ij}b_{ij})$, and $\overline{(\cdot)}$ denotes element-wise conjugation.

5.4.3 Assessment of model fit

The appropriateness of a model can be assessed by means of various fit assessment techniques that are sometimes grouped under the term "model selection procedure" [237]. Some of these procedures indicate how well the model describes the data, while others provide a rationale for deciding which of several competing models should be preferred on the basis of the data. Hence, a model selection procedure can help to decide how many dipoles should be incorporated in the model, and whether cross-spectral parameters should be included. In [87] we assessed the usefulness of the generalized likelihood ratio test (GLRT) statistic $2L\cdot\big(F(\hat\xi) - \sum_{k\in\Omega_\#}\log|S_k|\big)$ in determining the number of dipole sources that should be incorporated in the model (i.e. the detection problem [239]). Here we assess its effectiveness in testing the lack of interaction between different sources. The GLRT has an asymptotic $\chi^2_{df}$ distribution with $df = K(m^2 + 2m) - p$ degrees of freedom, where $p$ is the number of free parameters. For moderate numbers of observations a Bartlett corrected statistic should be used [13], as was indicated in [166, 235]. Confidence regions of the estimates can also help to decide which parameters are necessary and which may be omitted: location estimates that are not contained in each other's confidence regions indicate separate sources, and confidence regions of cross-spectral parameter estimates indicate whether these differ from zero [110, 237].
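A sketch of the GLRT computation as stated above (without the Bartlett correction). `F_hat` is the minimized negative log-likelihood (5.10), `S` the K sample cross-spectra, and `p` the number of free parameters; all are assumed to come from a fitted model.

```python
# GLRT statistic 2L(F(xi_hat) - sum_k log|S_k|) with its asymptotic chi-square reference.
import numpy as np
from scipy import stats

def glrt_pvalue(F_hat, S, L, p):
    K, m, _ = S.shape
    log_det_S = sum(np.linalg.slogdet(S[k])[1] for k in range(K))
    glrt = 2 * L * (F_hat - log_det_S)
    df = K * (m ** 2 + 2 * m) - p              # degrees of freedom as given above
    return glrt, df, stats.chi2.sf(glrt, df)   # small p-value -> reject the model
```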


5.5 Simulations

In [87] we showed that confidence regions can be constructed quite reliably. Confidence regions of the estimated parameters can be computed from the Hessian matrix of the negative log-likelihood $F(\xi)$ evaluated at $\hat\xi$ [200, 207]. A finite difference approximation of the Hessian matrix was calculated from the gradients at the estimate $\hat\xi$. Note that in order to obtain standard errors for all parameters, $F$ in (5.10) must be implemented fully, including all analytic derivatives, but it only needs to be evaluated after the last iteration of the algorithm in (5.11)-(5.14). We used a quasi-Newton algorithm [79] to optimize the full negative log-likelihood (5.10) to obtain the estimates $\hat\xi$. We refer to Chap. 4 for the details of the calculation of the confidence regions.

[Figure 5.1 here. Upper panel: "Reconstructed mean amplitude of source 1" (amplitude versus time, t = 0, ..., 120). Lower panel: "Estimated source location parameters" (box and whisker plots for θ1, θ2, θ3 on an axis from −0.50 to 0.75).]

Fig. 5.1. To give an impression of the reconstruction, the source amplitude of the first source, as reconstructed in each simulation, is depicted (upper panel). The fat gray line is the true amplitude as reconstructed from the frequencies used in the estimation. In the lower panel, box and whisker plots of the source location parameter ($\theta_a = [\theta_a^{(x)}, \theta_a^{(y)}, \theta_a^{(z)}]'$) estimates are depicted. Black, gray and white box and whisker plots correspond to the x, y and z coordinates, respectively. Boxes show the estimates between the first and last quartiles, the central line indicates the median, whiskers indicate the estimators' range extremes, and dots indicate very extreme estimates.

To assess the performance and the stochastic behavior of the estimators in the current extensions, a number of simulations were carried out, in much the same way as in Chap. 4. Three dipoles were placed in a unit radius sphere at (0, 0.5, 0.75), (0, −0.5, 0.75) and (−0.5, 0, 0.75), the first two with orientation cosines (1, 0, 0) and the third with orientation cosines (0, 1, 0). The amplitudes of the dipoles consisted of damped sine waves added to a vector autoregressive stochastic process: $\tilde\eta_1(t) = \exp(-2t/n)\sin(2\pi\cdot 2t/n) + \tilde a_1(t)$ and $\tilde\eta_2(t) = \exp(-2t/n)\sin(2\pi\cdot 4t/n) + \tilde a_2(t)$, where $\tilde a_1(t)$ and $\tilde a_2(t)$ satisfied the equations $\tilde a_1(t) = 0.7\,\tilde a_1(t-1) + \tilde\zeta_1(t)$ and $\tilde a_2(t) = 0.5\,\tilde a_1(t-1) + 0.7\,\tilde a_2(t-1) + \tilde\zeta_2(t)$. The amplitude of the third dipole was generated from $\tilde\eta_3(t) = \exp(-2t/n)\sin(2\pi\cdot 6t/n) + \tilde a_3(t)$ with $\tilde a_3(t) = 0.3\,\tilde a_1(t-1) + 0.7\,\tilde a_3(t-1) + \tilde\zeta_3(t)$. For all three $i = 1, 2, 3$, $\tilde\zeta_i(t) \sim N(0, 1)$. With these dipole amplitudes, MEG data were simulated for a whole head 61 sensor array in accordance with (5.1), the components of $\tilde\varepsilon(t)$ each satisfying the autoregressive process $\tilde\varepsilon_a(t) = 0.7\,\tilde\varepsilon_a(t-1) + \tilde a_a(t)$, $a = 1, \ldots, m$, where $\tilde a_a(t) \sim N(0, \tilde\sigma)$. The matrix function $\Lambda$ was obtained from [195]. In all simulations $\tilde\sigma$ was twice as large as the largest of the noiseless sensor signal variances, i.e., the signal to noise ratio (SNR) was 1:2. For each trial $n = 128$ samples were generated. The generated data were (fast Fourier) transformed into the frequency domain, and the mean $\dot y_k$ and sample cross-spectra $S_k$ for the first five frequency components were calculated as indicated previously. The simulations were carried out with $L = 100$, 200 and 400 trials. Two models were fitted: one in which only the filter coefficients from dipole 1 to dipole 2 ($\beta_{12}$) and from dipole 1 to dipole 3 ($\beta_{13}$) were freely estimated while the others were forced to zero,^5 and one model in which no interactions were allowed (which is an incorrect model for the simulated data). In all simulations $\Theta_k = \sigma_k I$ was used, in accordance with the simulated data. The purpose of fitting the incorrect model was to assess the adequacy and usefulness of the Bartlett corrected GLRT in rejecting an incorrect a priori hypothesized model, while retaining a correct a priori hypothesized model.

5. This follows from the fact that the vector autoregressive (VAR) process can be written in the frequency domain in matrix terms as $a_k = A\exp(-i2\pi k)\,a_k + \zeta_k$, or $[I - A\exp(-i2\pi k)]\,a_k = \zeta_k$, where $A$ is the $3\times 3$ matrix that implements the equations for $\tilde a_1$, $\tilde a_2$ and $\tilde a_3$. This VAR is invertible. Multiplying both sides of this equation by $(\mathrm{diag}[I - A\exp(-i2\pi k)])^{-1}$ and equating $[I - B_k]$ with the resulting left hand side ensures that the diagonal elements of $I - B_k$ are equal to 1, and hence $\mathrm{diag}(B_k) = 0$. $\zeta_k$ is thereby turned into $\zeta_k^{\sim}$, which represents a colored forcing process instead of the original white input $\tilde\zeta(t)$.

Table 5.1. Coverage rates of the 95% confidence intervals: percentage of simulations in which the 95% confidence intervals contained the true parameter values when the correct model was fitted. Percentages are computed as proportions of 300 simulations in each case.

L     θ     α     Φk    Bk    σk
100   93.7  95.6  92.2  94.4  94.2
200   94.9  95.5  93.9  94.5  86.2
400   95.1  95.8  94.9  95.1  68.0

[Figure 5.2 here: "% accepted" (0-100) as a function of the number of trials (100-400).]

Fig. 5.2. Percentage of simulations in which the fitted model was accepted, as indicated by the significance of the GLRT. This should be 95% of the simulations in the case of the correct model (continuous line), and as few of the simulations as possible in the case of an incorrect model with too few parameters (dashed line).

In Table 5.1 coverage rates for different kinds of parameters are presented. These coverage rates represent in condensed form the accuracy of the estimators themselves, and the quality with which the confidence intervals are constructed. This is achieved by giving the percentage of simulations in which the true parameters were contained in the 95% confidence intervals constructed in each simulation. To illustrate, Fig. 5.1 depicts the reconstructed source amplitude of the first source (location (0.0, 0.5, 0.75)) in each simulation with 400 trials in which the correct model was fitted. Furthermore, it depicts the estimated source locations of all sources in each simulation. As can be seen from Table 5.1, the coverage rates of the confidence intervals are rather close to their theoretically expected level of 95%—even for relatively small numbers of trials (i.e. L = 100). The latter is somewhat surprising, because the theory was developed under the assumption of large numbers of trials. The departure from the theoretical value of the coverage rates of σk was anticipated from the results reported in [87] (Chap. 4). The remarkable feature of these coverage rates is that


they are near perfect when few trials are available, and the departure increases as the number of trials increases. This seemingly paradoxical result is due to a slight bias of the estimators in combination with oversized confidence intervals for relatively few trials (L = 100). At L = 400 the coverage rate of these parameters is about 68%, which falls neatly in between the rates reported in Chap. 4 for these estimators under SNRs of 1:1 and 1:5 and the same number of trials. Apparently, as the signal to noise ratio decreases, the bias increases, since at L = 400 the estimated standard errors were in fact quite good.

The acceptance rate of the GLRT is graphed in Fig. 5.2. As can be seen from the figure, the GLRT rejected both models when the trial count was low (L = 100), indicating that the asymptotic approximation is inadequate with low trial counts. Acceptance of the correct model was near the nominal rate of 95% for moderate L = 200 (89%) and at the nominal rate for relatively large numbers of trials, L = 400 (96%), indicating adequate approximation of the statistic by the asymptotic distribution in these cases.^6 The incorrect model was accepted too often with moderate trial counts (19%), indicating that the GLRT was too insensitive to modelling errors in such cases. For relatively large numbers of trials, the GLRT therefore seems to be helpful in detecting interactions (but see the discussion below).

6. This was confirmed by a Kolmogorov-Smirnov test on the distribution of the statistic.
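For reference, a sketch of the amplitude generation step described in this section: damped sine waves at 2, 4 and 6 cycles per epoch added to a first-order VAR process with the couplings given above. This is an illustrative reimplementation, not the original simulation code.

```python
# Dipole amplitudes: damped sines plus VAR(1) with couplings 1->2 and 1->3.
import numpy as np

rng = np.random.default_rng(2)
n = 128
A = np.array([[0.7, 0.0, 0.0],       # VAR(1) coefficient matrix:
              [0.5, 0.7, 0.0],       # a1 drives a2,
              [0.3, 0.0, 0.7]])      # a1 drives a3

a = np.zeros((3, n))
for t in range(1, n):
    a[:, t] = A @ a[:, t - 1] + rng.standard_normal(3)   # zeta_i(t) ~ N(0, 1)

t = np.arange(n)
damp = np.exp(-2 * t / n)
eta = np.vstack([damp * np.sin(2 * np.pi * f * t / n) for f in (2, 4, 6)]) + a
```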

5.6 Concluding remarks

We have formulated a framework for modelling coherence between sources in terms of linear transfer functions. This framework has its roots in the techniques known in the statistical literature as structural equation modelling (SEM) [20], confirmatory factor analysis [121], frequency domain dynamic factor analysis [163] and simultaneous equations [5]. For the latter, frequency domain-like variants were proposed in [28]. We have given closed form expressions for estimators of the separable parameters and an expression for the concentrated negative log-likelihood, which greatly simplify the numerical optimization procedure. The expressions obtained are very similar to the standard expressions found in the signal processing literature on SML DOA estimation, and extend the SML methods with the inclusion of the mean and a more general noise covariance matrix which may depend on unknown parameters. The results of the simulations show that parameter standard errors are reliably constructed, regardless of the number of trials. Furthermore, the results indicated that the GLRT statistic can be indicative of the presence of interactions between sources, provided that enough trials are available (L ≥ 200). In [87] we also considered least squares techniques. However, the generalized least squares method, although known to have the same asymptotic statistical properties as ML estimators (Chap. 3; [29]), yielded biased estimates of source coherence in finite samples, and was therefore not considered here.

Frequency domain dipole modelling of EEG/MEG data has been pursued before in [144, 189, 191], while the asymptotic statistical independence of Fourier coefficients has been exploited in the context of general EEG signal analysis in e.g. [184] and [190]. The method discussed here and in [87] can be considered as an extension of the methods in these references. Other approaches that use dipole localization techniques from the outset to study cortical synchrony have been presented in [105], [97] and [49]. In [105] synthetic aperture magnetometry (SAM) is used to derive time series of activity in regions of interest, which are then subjected to phase analysis. In [97] a beamforming technique is used that searches for sources with maximum coherence. In [49] an interesting adaptation of iteratively refined minimum norm estimation [212] is presented, which uses a bootstrapping technique on surrogate data. As argued in [49], problems with the first two approaches are that the linearly constrained minimum variance beamformers were developed under the assumption of incoherent sources, and their performance is

known to deteriorate with coherent sources. Furthermore, the method in [97] only finds coherent sources, while neurophysiological research indicates that desynchronization of sources may play an important role in several cognitive processes [24, 183, 230] (cf. [49]). A similar argument would hold against the use of MUSIC for estimating coherence between sources [49, 221]. The minimum norm estimate is known to suffer from bias in its location estimates, but was improved with bootstrapping methods in [49]. Once the regions of activity have been localized in this manner, these authors suggest performing a phase analysis on reconstructed time series [105]. Although this method seems promising, it is as yet difficult to see a principled framework in which the adequacy of the resulting source model can be assessed. In contrast, maximum likelihood estimation directly provides measures for assessing modelling adequacy in the form of the GLRT statistic.

As an alternative to SML estimation, subspace fitting (SF) methods, in which Λ(θ) is fitted by least squares to subspace vectors obtained from e.g. principal components analysis (PCA) or independent components analysis (ICA [115]), have been investigated [175]. SF methods are corrected versions of methods that fit individual columns of Λ(θ) to individual subspace components, as is done in [151] (PCA) and [60, 152] (ICA), which are known to be suboptimal [1, 53]. Weighted SF (WSF), which is based on PCA, was shown to yield asymptotically efficient estimates of θ in [175]. Furthermore, WSF and SML were shown to be asymptotically robust against violations of the distributional assumptions on the source signals. Currently, general (asymptotic) distributional properties of other subspace estimates, e.g. obtained from ICA, are unknown, and it is therefore unclear whether such estimates are efficient. As indicated earlier, we also investigated generalized least squares estimation of cross-spectrum structures, which is also known to be asymptotically efficient [29, 87, chapter 3], and concluded that it yields strongly biased coherence estimates in finite samples—this in contrast with the SML estimates. We plan to investigate this for the simpler (W)SF estimates in future work.

With respect to the GLRT statistic a word of caution is in order. As indicated in Chap. 4, the GLRT statistic is distributed as χ² only asymptotically, that is, for large L. At the same time, as L grows larger, the sensitivity to modelling error increases, and the test is likely to become significant because of the necessary approximations in the head model, the source model (a dipole approximation to extended sources) and the noise model. Therefore the GLRT may not be very appropriate as a rigid rejection criterion, and it has been recommended to use it more as a descriptive index of overall fit rather than as a statistical test [123]. A large number of alternative measures have been presented in the literature, an overview of which may be found in [20]. In [237] a number of fit indices for selecting the number of dipole sources have been assessed, both with respect to certain theoretical requirements and in numerical experiments with dipole localization with MEG. It was found that information theoretic criteria on the one hand, and Wald tests on source amplitudes on the other, were quite effective under various circumstances. In the current setup, if the mean is modelled, then the confidence intervals of the αk parameters are akin to the Wald amplitude test discussed in [237].

The difference between estimates of unparameterized Ψ and estimates of Ψ parameterized by B and Φ is precisely the distinction made in the neuroimaging community between "functional" and "effective" connectivity [230]. However, it should be emphasized that in modelling the coherence between sources, several equivalent models may exist, which can have very different neurophysiological interpretations. For example, in the case of two sources, the interaction may be modelled as the first source being input to the second, or vice versa. Both models would fit equally well, so no distinction can be made on the basis of the fit. Therefore, in applications, a priori information should be available on which interaction patterns are considered to be more valid than other, mathematically equivalent ones [20].


Appendix

Equation (5.16). Assume $\Psi^{-1}$ and $\Theta^{-1}$ exist, and that $\Lambda$ has full column rank. From the matrix inversion lemma $(A + CBD)^{-1} = A^{-1} - A^{-1}C(B^{-1} + DA^{-1}C)^{-1}DA^{-1}$ (e.g., [199, p. 9]), we have

$$\Sigma^{-1} = (\Lambda\Psi\Lambda^* + \Theta)^{-1} = \Theta^{-1} - \Theta^{-1}\Lambda[\Psi^{-1} + \Gamma]^{-1}\Lambda^*\Theta^{-1},$$

where $\Gamma = \Lambda^*\Theta^{-1}\Lambda$. From this, and from $(I + B)^{-1} = I - (B^{-1} + I)^{-1}$, we find that

$$\Lambda^*\Sigma^{-1} = (I - \Gamma[\Psi^{-1} + \Gamma]^{-1})\Lambda^*\Theta^{-1} = (I - [\Psi^{-1}\Gamma^{-1} + I]^{-1})\Lambda^*\Theta^{-1} = [I + \Gamma\Psi]^{-1}\Lambda^*\Theta^{-1} = [\Gamma^{-1} + \Psi]^{-1}\Gamma^{-1}\Lambda^*\Theta^{-1}.$$

Therefore, postmultiplying $\Lambda^*\Sigma^{-1}$ by $\Lambda$, we find $\Lambda^*\Sigma^{-1}\Lambda = [\Gamma^{-1} + \Psi]^{-1}$, which yields $(\Lambda^*\Sigma^{-1}\Lambda)^{-1}\Lambda^*\Sigma^{-1} = (\Lambda^*\Theta^{-1}\Lambda)^{-1}\Lambda^*\Theta^{-1}$.
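A quick numerical verification of this identity; the matrices below are randomly generated Hermitian positive definite examples, not quantities from the thesis.

```python
# Check (Lam* Sigma^{-1} Lam)^{-1} Lam* Sigma^{-1} = (Lam* Theta^{-1} Lam)^{-1} Lam* Theta^{-1}.
import numpy as np

rng = np.random.default_rng(3)
m, d = 8, 3
Lam = rng.standard_normal((m, d)) + 1j * rng.standard_normal((m, d))
R = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
Psi = R @ R.conj().T + np.eye(d)              # nonsingular Psi
T = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))
Theta = T @ T.conj().T + np.eye(m)            # nonsingular Theta

Sigma = Lam @ Psi @ Lam.conj().T + Theta

def weights(M):
    Mi = np.linalg.inv(M)
    return np.linalg.inv(Lam.conj().T @ Mi @ Lam) @ Lam.conj().T @ Mi

assert np.allclose(weights(Sigma), weights(Theta))   # identity (5.16)
```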


6 Optimizing interpretability of averaging kernels for the neuro-electromagnetic inverse problem

6.1 Introduction

The^1 neuro-electromagnetic inverse problem involves the determination of the current distribution in the brain, given a set of electro- (EEG) and magneto- (MEG) encephalographic signals [91, 102]. To be able to solve the inverse problem, it is necessary to impose restrictions on the reconstructed current distribution. This is due to sources that do not produce fields measurable with EEG/MEG, to the limited number of sensors, and to the noisiness of the measurements [91, 102]. Restricting the number of sources, as is done with equivalent current dipole (ECD) models, is appropriate for relatively simple configurations of current sources [57, 102]. Successful application of ECD models has therefore been confined to data obtained in simple tasks. If little is known about the source configuration and complex current density distributions are expected (e.g. in complex cognitive tasks or single trial data), feasible methods that impose less restrictive assumptions than ECD models are desirable. This is especially true when ongoing cortical dynamics is of interest, for which trial averaging is not possible [205].

Linear inverse solutions, sometimes called distributed source models, seem to be an attractive alternative to ECD models. These impose only an assumption on a global property of the current distribution (e.g. minimum norm, minimum Laplacian) [102, 178, 212]. Other alternatives that make few assumptions about the current distribution are "optimal spatial filters", and include the beamformer approaches originating from radar signal processing [21, 229, 233] and the Backus-Gilbert method originating from geophysics [92, 94, 160]. Furthermore, scalp Laplacian methods have been suggested as assumption free spatial filters [23, 182, 214]. All these methods essentially form linear combinations of the sensor array data, in an attempt to construct a new measurement that reflects activity at a target location in the brain. However, due to the inherent ill-posedness of the EEG/MEG inverse problem and the limited number of measurements, these linear techniques can only provide very blurred images of the underlying current distribution. This strongly limits their interpretation [91].

Despite these limitations, there may exist linear combinations of the sensor measurements that correspond to weighted averages of the underlying current densities, so called averaging kernels, that do have a useful and clear interpretation. If they exist, it is of interest to find these linear combinations. Each linear method proposed in the literature focuses on optimizing the linear combination to reflect "as well as possible" the activity in a preselected region or activity of interest (ROI/AOI) [94, 96, 146]. By the physical nature of the measurements, not all possible choices of AOIs can be approximated equally well. Often the approximation of a chosen AOI is far from ideal [91]. Consequently, the resulting estimate includes activity from regions outside the intended AOI, resulting in ambiguous interpretation of the estimates. Although the relative contribution of each (potentially active) location to the estimate can be determined, the pattern of

1. This chapter constitutes [89].


these contributions is often so widely spread and complex that the estimates cannot be easily interpreted [92, 146]. Usually the preselected AOIs consist of point sources in small voxels (in distributed source models, many such AOIs). This specific type of AOI may not be the type that can be best targeted by an averaging kernel that exists within the physical constraints of the origins of the data. Instead of focussing on approximating the average current in a preselected AOI as well as possible, one may focus on finding those AOIs for which averaging kernels exist that do have a simple interpretation (for example in terms of the brain regions that contribute to the activity estimate—e.g., an extended, but tightly circumscribed cortical region). It is of course not at all guaranteed that such averaging kernels exist, and in fact it is not easy to determine whether they do. The best one can do, then, is to try to find such averaging kernels within the class of all kernels that satisfy the constraints imposed by the biophysical origin of the data. That is, one may focus on optimizing the interpretability of the resulting current average, and let the AOI be determined by the constraints of the physical nature of the inverse problem. This shift of focus has two advantages. First, in this way it may be possible to find those linear combinations of the sensor measurements that can be easily interpreted (in terms of the brain regions that contribute), without necessitating a priori assumptions on the current density distribution. Second, it may be possible to find the "simplest interpretable" AOIs that exist—hence providing a lower bound on the ambiguity of any weighted current average estimate. Such a bound is of interest because it indicates fundamental limitations on the possibility of obtaining activity estimates of localized brain regions without making a priori assumptions on the current density distribution (such as a limited number of dipoles). In this paper we explore the possibility of optimizing measures of "simpleness" of averaging kernels for enhanced interpretability of current density estimates, and present a general strategy for doing so.

Throughout this paper we shall use the following notation: bold lower case letters are vectors, bold upper case letters are matrices, $\mathbf{1}_k$ is the $k$-vector whose elements are all equal to one, $'$ denotes matrix transpose, $X^+$ is the Moore-Penrose inverse of $X$, $\odot$ and $\otimes$ denote the Hadamard and Kronecker products, $I_k$ is the $k \times k$ identity matrix, and $\|\cdot\|_F$ denotes the Frobenius norm.

6.2 Theory

All linear techniques are tantamount to determining a set of weighting coefficients for linearly combining the sensors, such that a certain desirable property of the estimate is optimized [96]. The theory of linear inverse methods, in particular the concept of the resolution matrix, has been suggested as a framework in which these methods can be compared and in which the interpretability of the resulting current average estimates can be evaluated [91, 92, 94, 146]. In this paper we will use the related framework of estimable functions, which is the standard framework in statistical regression theory for dealing with underdetermined regression models [166, 199].

6.2.1 Estimable functions

In the discrete form of the neuromagnetic forward problem, the brain is subdivided into a large number, say $d$, of voxels, each of which contains 3 dipoles, one for each of the x, y and z Euclidean orientation components. The relation between the primary current density $\eta$ and the measurements $y$ can be expressed in the linear form [102]

$$y = \Lambda\eta + \varepsilon. \tag{6.1}$$

Here the $3d$-vector $\eta$ contains the x, y, and z components of the primary current in each voxel, $y$ is the $m$-vector containing the sensor measurements at one instant in time, and $\varepsilon$ is the $m$-vector of measurement noise. The lead field matrix $\Lambda$ is an $m \times 3d$ matrix, of which each block


of 3 consecutive columns corresponds to the sensor measurements due to unit amplitude dipoles with x-, y- and z-axis orientations respectively, located in the voxel associated with the block.

The number of voxels is usually much larger than the number of sensors, i.e. $d \gg m$. In such cases no unbiased estimator $\hat\eta$ of $\eta$ exists [166, 199]. The standard way of dealing with these cases in classical statistical regression analysis is to find a class of linear combinations of the form $p'\eta$ for which a unique unbiased estimator exists; i.e. the class of $p$'s for which there exists an $m$-vector $h$ such that $p'\eta = E\{h'y\} = h'\Lambda\eta$. Here $p$ is a $3d$-vector, and $E\{\cdot\}$ denotes expectation. This class of linear functionals is known as the estimable functions [199]. Obviously, the linear span of this class of functionals is identical to the row space of $\Lambda$, and it may be determined that the requirement that $p$ is estimable is equivalent to the condition [199]

$$p'\Lambda^+\Lambda = p'. \tag{6.2}$$

Postmultiplying both sides by $\eta$ and identifying $E\{y\} = \Lambda\eta$, it is seen that $h' = p'\Lambda^+$ and $p' = h'\Lambda$. From the class of estimable functions, those functions are selected that have interpretational significance. What this means exactly depends on the field of application. In the case of current densities it may be argued that they should have compact support—i.e. the elements of $p$ should almost all be (near) zero, except for a few elements that delimit a circumscribed region in the brain. More generally, $p$ may be chosen to optimize a measure $Q$ of "simpleness" within the class of estimable functions, which is a mathematical specification of what kind of current density averages (indexed by $h$) are considered to be "easily interpreted".

6.2.2 Relation with linear methods

In this section we indicate the significance of the estimable function concept for distributed source model estimates and spatial filter estimates.

Distributed source models

All linear inverse distributed solutions to (6.1) can be written as

$$\hat\eta = \Gamma y = R\eta + \Gamma\varepsilon, \tag{6.3}$$

where $\Gamma$ is a $3d \times m$ matrix, and the $3d \times 3d$ matrix $R = \Gamma\Lambda$ is called the resolution matrix [91, 94, 160]. From this equation, each component of $\hat\eta$ may be seen to be the inner product of the true current density $\eta$ with a row of the resolution matrix, and hence is a weighted average of the underlying $\eta$. Each row of $R$ is therefore called the averaging kernel associated with a voxel for a given inverse solution [94]. If an averaging kernel has narrow compact support, the kernel is described as being localized. In the ideal case $R = I_{3d}$. One strategy for choosing $\Gamma$ is therefore to minimize a distance measure of $R$ from $I_{3d}$. Different distance measures lead to inverse solutions with different properties optimized. For example, minimizing $\|R - I\|_F^2 = \sum_{a,b}(R_{ab} - \delta_{ab})^2$ leads to the minimum norm (MN) inverse $\Gamma \equiv \Lambda^+$ [160]. To account for the noise term in (6.3), the MN operator may be regularized [160].

Each row of the resolution matrix $R$ is in fact an estimable function $p'$, optimized to target a point source. Its weight $h'$ is the corresponding row of the chosen inversion operator $\Gamma$. For general $h'$, not necessarily a row of $\Gamma$, the corresponding estimable function $p'$ was called the resolution field in [146]. The fact that each conceivable linear combination of sensor measurements is associated with an estimable function, c.q. resolution field, gives these concepts a central role in the evaluation of linear inverse methods. In what follows, we use the term resolution field specifically to refer to the spatial sensitivity pattern of a given estimable function. Where more convenient, we will use the term averaging kernel interchangeably with estimable function.
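The following sketch illustrates the estimability condition (6.2) and the resolution-field idea for the minimum norm inverse. The lead field here is a random placeholder for a real m x 3d matrix; the target index is an arbitrary example.

```python
# Estimability check p'Lam^+ Lam = p', and a minimum-norm averaging kernel.
import numpy as np

rng = np.random.default_rng(4)
m, n_src = 64, 3000                        # m sensors, 3d source components
Lam = rng.standard_normal((m, n_src))
Lam_pinv = np.linalg.pinv(Lam)             # minimum norm inverse Gamma = Lam^+

h = rng.standard_normal(m)                 # arbitrary sensor combination
p = h @ Lam                                # its resolution field (estimable function)
assert np.allclose(p @ Lam_pinv @ Lam, p)  # condition (6.2) holds automatically

target = 1234                              # voxel-component index of interest
h_mn = Lam_pinv[target]                    # MN weights targeting a point source
kernel = h_mn @ Lam                        # corresponding row of R = Gamma Lambda
```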


Beamformers, virtual sensors and Laplacians

Beamformers used as virtual sensors also choose $h$ so as to optimally let $h'y$ reflect the activity in a region of interest (ROI) [21, 96, 209]. In contrast with distributed source methods, this is done on a voxel by voxel basis. For example, the linearly constrained minimum variance (LCMV) beamformer chooses $h$ to ensure that the target voxel has unit weight in the current average estimate, while interference from other sources is minimized [203, 229]. Laplacians have been suggested as spatial filters to enhance the resolution of scalp topographies [23, 182, 214]. For each point of the scalp, a Laplacian method (e.g., Hjorth, spline interpolated) finds an $h$ such that $h'y$ gives the (interpolated) value of the Laplacian at that point. In each of these methods, the weight vector $h$ is associated with an estimable function $p' = h'\Lambda$, whose resolution field indicates the brain regions that (potentially) contribute to the weighted average. Comparison of the resolution fields makes it clear that voxel by voxel current estimates represent largely overlapping averages, and therefore cannot be interpreted on a voxel by voxel basis in a simple manner.

6.2.3 Optimized criteria of linear methods

Results obtained with the various linear methods depend highly on the properties they were chosen to optimize [91–94, 96, 146]. The key idea in choosing point sources as AOIs is of course that a localized averaging kernel, preferably a point location, has the simplest possible interpretation. Point sources in general, and point sources at a priori chosen target locations in particular, may not be the type of AOI for which sufficiently selective estimable functions exist. Other AOIs, e.g. sharply circumscribed regions, can also be considered to have a "simple interpretation". The types of AOI with simple interpretation for which sufficiently selective estimable functions exist depend highly on the nature of the lead field matrix Λ. In the next section we describe a general method for finding those estimable functions in the row space of Λ that provide the "simplest" possible (which remains to be defined precisely) interpretation.
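A hedged sketch of the unit-gain idea behind the LCMV virtual sensor mentioned above: choose $h$ to minimize the output variance $h'Ch$ subject to $h'l = 1$, where $l$ is the lead field of the target (known orientation assumed) and $C$ the data covariance. The closed form below is the textbook solution, not code from this thesis.

```python
# LCMV weights: h = C^{-1} l / (l' C^{-1} l), giving unit gain on the target.
import numpy as np

def lcmv_weights(C, l):
    Ci_l = np.linalg.solve(C, l)       # C^{-1} l without forming the inverse
    return Ci_l / (l @ Ci_l)           # minimum variance subject to h'l = 1
```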

6.3 Method

6.3.1 "Simpleness" and simple estimable functions

Finding matrices with a "simple" structure by linearly combining the rows or columns of another matrix has been a long standing problem in factor analysis, and has led to a variety of so called analytic rotation procedures [31, 160]. For matrices of the size of Λ in the current application (easily more than ten thousand columns and typically more than one hundred rows), full rotation of Λ is computationally somewhat impractical. We will therefore focus on finding a single simple estimable function.

A measure of the complexity of a vector $\lambda = (\lambda_1, \ldots, \lambda_d)'$ with all nonnegative elements, suggested by Carroll [38] (cited in [31]), is

$$Q_{\mathrm{Carroll}}(\lambda) = \lambda_1\lambda_2 + \lambda_1\lambda_3 + \lambda_1\lambda_4 + \cdots + \lambda_2\lambda_1 + \lambda_2\lambda_3 + \lambda_2\lambda_4 + \cdots = \lambda'(\mathbf{1}_d\mathbf{1}_d' - I_d)\lambda. \tag{6.4}$$

If $\lambda \neq 0$, $Q_{\mathrm{Carroll}}$ is zero if and only if all but one of the elements of $\lambda$ are zero. Furthermore, $Q_{\mathrm{Carroll}}$ increases as more elements become nonzero [31]. Therefore the "least complex" vector has a single positive element (compare with a point source). Note that $Q_{\mathrm{Carroll}}$ can be interpreted as the total sum of the auto-covariance function of the vector $\lambda$, minus the auto-covariance 'at lag zero'. If the elements of $\lambda$ have a natural ordering, e.g. on a time- or space-axis, this observation


leads to a natural extension of $Q_{\mathrm{Carroll}}$ to an extended region, by subtracting lagged autocovariances within a chosen range of lags from the total sum of the auto-covariance function. The width of the chosen range of lags determines the width of the extent that is still considered to be simple. In a more flexible criterion, one can define a weighting function $w(a, b) = w(a - b)$, with $w(a, a) = 1$, and define the weighted auto-covariance complexity

$$Q_{\mathrm{acfw}}(\lambda) = \lambda'\mathbf{1}_d\mathbf{1}_d'\lambda - \sum_{a=1}^{d}\sum_{b=1}^{d} w(a - b)\,\lambda_a\lambda_b. \tag{6.5}$$

Let $\lambda = (\|p_{(1)}\|^2, \ldots, \|p_{(d)}\|^2)'$, with $p_{(a)} = (p_{a,1}, p_{a,2}, p_{a,3})'$, where $p_{a,b}$ is the $(3(a-1)+b)$-th element of $p' = h'\Lambda$. Note that each element of $\lambda$ indicates the squared total contribution of the corresponding voxel to the weighted current average $p'\eta$. In the neuroelectromagnetic inverse problem, Carroll's complexity can be applied to $\lambda$. This is interpreted as the 3D spatial autocovariance function of $\lambda$ at 'zero distance lag' subtracted from its total sum. To apply $Q_{\mathrm{acfw}}$ to $\lambda$, $w$ is chosen to depend on the voxel locations: $w(a, b) = w(r_a - r_b)$. One particular weight function would be the squared distance $\|r_a - r_b\|^2$, resulting in a Backus-Gilbert type measure of the peak-width of the autocovariance function [11]. For this type of $w$ it can be advantageous to compute the spatial auto-covariance function using a 3D Fast Fourier Transform. For $w$ with compact support, it is often more efficient to use (6.5).

Another well known complexity criterion is Kaiser's varimax [124, 160], which can be written as

$$Q_{\mathrm{varimax}}(\lambda) = -\frac{1}{d}\sum_{a=1}^{d}\Big(\lambda_a - \frac{1}{d}\sum_{b=1}^{d}\lambda_b\Big)^2. \tag{6.6}$$

Minimizing $Q_{\mathrm{varimax}}$ is equivalent to maximizing the variance of the entries of $\lambda$. Therefore these will either become close to zero or become as large as possible, resulting in high specificity about the source locations that contribute. A third complexity measure, called oblimax [196], may be stated as

$$Q_{\mathrm{oblimax}}(\lambda) = -\sum_{a}^{d}\lambda_a^2\,\Big/\,\Big(\sum_{a}^{d}\lambda_a\Big)^2, \tag{6.7}$$

and seeks the $\lambda$ that is most orthogonal to the 'least selective possible' sensitivity pattern $\mathbf{1}_d$, irrespective of its norm. This compels $\lambda$ to align as well as possible with one of the columns of $I_d$. The last complexity measure that we consider here is based on the entropy function from information theory. If $s = \lambda/(\mathbf{1}_d'\lambda)$, so that $\mathbf{1}_d's \equiv 1$, it can be expressed as

$$Q_{\mathrm{entropy}}(\lambda) = \sum_{a}^{d} s_a\log s_a, \tag{6.8}$$
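A small sketch computing the four criteria (6.4)-(6.8) for a nonnegative pattern vector; the example vector is arbitrary, and zero entries are excluded from the entropy sum as in the convention below.

```python
# Complexity criteria of section 6.3.1 for a nonnegative lambda.
import numpy as np

def q_carroll(lam):
    return lam.sum() ** 2 - (lam ** 2).sum()      # lam'(11' - I)lam

def q_varimax(lam):
    return -np.var(lam)                           # minus the variance of the entries

def q_oblimax(lam):
    return -(lam ** 2).sum() / lam.sum() ** 2     # -lam'lam / (1'lam)^2

def q_entropy(lam):
    s = lam / lam.sum()
    s = s[s > 0]                                  # 0 log 0 taken as 0
    return (s * np.log(s)).sum()

lam = np.array([0.0, 0.1, 0.9, 0.0])
print(q_carroll(lam), q_varimax(lam), q_oblimax(lam), q_entropy(lam))
```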

where $\log$ denotes the natural logarithm. $Q_{\mathrm{entropy}}$ is minimal if one of the elements is equal to one and the others all tend to zero. This suggests that minimization of the entropy finds vectors with many small elements [31].

6.3.2 Finding 'simple' linear combinations

For a given complexity criterion $Q(\lambda)$, define $f(h) = Q(\lambda(h))$. We seek a solution to the problem

$$\breve{h} = \arg\min_h f(h) \quad\text{subject to}\quad \|h\| = 1, \tag{6.9}$$

where the constraint ensures that $h \neq 0$. An algorithm for rotation, i.e. for minimizing $f(h)$ subject to $\|h\| = 1$, was provided in [119]: given some $\beta > 0$ and an initial unit norm vector $h$, it consists of repeating the steps

1. Compute $\tilde{h} \leftarrow h - \beta\nabla f$
2. Reassign $h \leftarrow \tilde{h}/\|\tilde{h}\|$
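A sketch of this normalized gradient iteration with the step halving safeguard described below; `f` and `grad_f` are left abstract, since they depend on the chosen criterion and lead field.

```python
# Gradient iteration on the unit sphere with step halving, after [119].
import numpy as np

def rotate(h, f, grad_f, beta0=1.0, eps=1e-3, max_iter=1000):
    h = h / np.linalg.norm(h)
    for _ in range(max_iter):
        g = grad_f(h)
        if np.linalg.norm(g - (h @ g) * h) ** 2 < eps:   # stopping criterion of [119]
            return h
        beta, f_old = beta0, f(h)
        while True:                                      # halve beta until f decreases
            h_new = h - beta * g
            h_new = h_new / np.linalg.norm(h_new)        # renormalize (Step 2)
            if f(h_new) < f_old:
                break
            beta /= 2
            if beta < 1e-12:                             # cannot decrease further
                return h
        h = h_new
    return h
```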

∇Q

Carroll acfw varimax

λ (11 − I)λ λ (11 − W)λ − d1 λ (I − d1 11 )λ

2 (11 − I)λ 2 (11 − W)λ 1 1  d (I − d 11 )λ

oblimax

−λ λ/(1 λ)2

2 [λ λ1 − (1 λ)λ]/(1 λ)3

entropy

s slog

[(1 λ)I − 1λ ](slog + 1)/(1 λ)2

In this table, for clarity 1d and Id were denoted 1 and I respectively. The weight matrix W in the expression for Qafcw is given by W = (wab ), with wab = w(a, b). Furthermore, s = (sa ) = λ/(1d  λ), and (slog )a equals log sa if sa > 0, or 0 if sa = 0.

Usually, the interest lies in multiple current density averages that have a simple interpretation. A second estimable function, p∗ , may be found by simultaneously optimizing a non-interference condition, e.g. minimizing p p∗ 2 , which would minimize sensitivity of p∗ to sources to which p is sensitive. Generalization to multiple estimable functions is then straightforward.

6.4 Numerical illustrations In this section we demonstrate the method for a MEG lead field, and obtain the estimable functions that are least complex according to the suggested criteria. We consider the QCarroll , Qvarimax , Qoblimax , and Qentropy criteria; we do not consider Qacfw (with Backus-Gilbert weighting), because of its heavy computational burden. To obtain the lead field matrix, a sphere of radius 9.5 cm was subdivided into 4945 equally sized (0.9 × 0.9 × 0.9 cm) voxels, each containing three sources oriented along the x, y and z axis. The lead field was calculated as given in [195]. The sensor array consisted of 151 first order gradiometers, positioned in accordance with the helmet of whole-head MEG system (CTF Inc., Port Coquitlam, BC, Canada). 6.4.1 Rotation of the lead field: comparison of the criteria Fig. 6.1 depicts the resolution fieldsresulting from optimizing Carroll’s, varimax, oblimax and entropy criterion respectively. Fatter arrows indicate the sources that h is more sensitive to. In table 6.2 each h is measured in terms of the criterion they optimize, as well as the other criteria

92

6 Optimizing interpretability of averaging kernels for the neuro-electromagnetic inverse problem

z

[Figure 6.1 here: four 3D source plots (Carroll, Varimax, Oblimax, Entropy), each with x, y and z axes.]

Fig. 6.1. Resolution fields of Carroll's, the varimax, the oblimax and the entropy criteria. For each voxel a source is plotted as an arrow. Fatter arrows correspond to voxels that have a relatively greater contribution to the spatial filter output, i.e., voxels with a fatter arrow weigh more heavily in the average current density estimate than voxels with thinner arrows.

Table 6.2. Comparison of the rotation criteria

                   QCarroll   Qvarimax   Qoblimax   Qentropy   peak-width
Carroll            0.0019     −0.2015    46.210     4.6130     24170.19
Varimax            77585.0    −11·10⁶    35.021     5.0517     67382.78
Oblimax            4396.6     −1·10⁶     12.199     3.3293     8760.69
Entropy            9489.0     −4·10⁶     12.732     0.0005     1864.86
Best sensor        73342.3    −10·10⁶    38.629     4.9504     41329.80
LCMV beamformer    67788.8    −10·10⁶    32.655     5.0461     59308.95
MN                 5504.6     −2·10⁶     13.122     3.5570     14975.35
Laplace-Perrin     8042.7     −7·10⁵     55.653     5.1023     51018.11
Laplace-Hjorth     27292.4    −6·10⁶     21.356     3.7611     4069.33

Upper half (first four rows): complexity of λ(h), for each of the h's obtained from optimizing Carroll's, the varimax, the oblimax, and the entropy criteria in (6.4)-(6.8), as measured by each of these complexity measures. Lower half: complexity measures determined for the estimable functions associated with the weight vectors obtained from the linear inverse methods by targeting the 'entropy source'—the source that has the greatest contribution to the estimable function obtained from the entropy criterion. Laplace-Perrin is the Perrin interpolated Laplacian for the best sensor; Laplace-Hjorth is the Hjorth Laplacian for the best sensor. Rightmost column: Backus-Gilbert type peak-width of the autocorrelation functions (refer to the text for details).

[Figure 6.2 here: rows I-IX of xy-plane cross-sections of the three dimensional spatial autocorrelations, at spatial lags 0 to 9.]

Fig. 6.2. Rows I-IV: comparison of different rotation criteria in terms of the spatial autocorrelation of λ(h) at spatial "lags" 0 to 9. Each row displays xy-plane cross-sections of the three dimensional autocorrelation. The leftmost column depicts the spatial autocorrelation in the x (inion-nasion) and y (left ear-right ear) directions (x, y "lags" −9, ..., 9) at z "lag" zero. The last column depicts the autocorrelation at z lag nine. (I) Carroll (II) varimax (III) oblimax (IV) entropy. Rows V-IX: comparison with other linear filters. These rows display the spatial autocorrelation of λ(h), where h was obtained from conventional techniques by targeting the entropy source—the source with the largest 'amplitude' in λ(h_entropy). (V) 'best sensor' (VI) LCMV beamformer (VII) minimum norm (VIII) spherical spline interpolated Laplacian (IX) Hjorth Laplacian.

for comparison. As seen in Fig. 6.1, three of the four rotation criteria converged to an estimable function that focuses on the same area near the vertex. This is not very surprising, since this is a superficial source near the vertex whose field is probably best covered by the sensor array that we used. Only Carroll's criterion converged to an alternative point of focus. Note that the resolution fields of these least complex averaging kernels (in terms of the chosen criteria) all comprise the entire sphere.

It is difficult to compare the localizedness of the resolution fields of the resulting estimable functions from Fig. 6.1. To get a better impression of this localizedness, the spatial autocorrelation of λ(h) was calculated. In the ideal case, in which λ(h) is strongly localized, the autocorrelation will quickly drop to zero as a function of spatial lag. The autocorrelation function therefore gives an indication of the localizedness of the estimable function. This allows comparison of different estimable functions exclusively in terms of their localizedness, without reference to a particular target AOI. For each complexity criterion, Fig. 6.2 (rows I-IV) depicts slices of this 3 dimensional autocorrelation function for distances of 0 to 9 (corresponding to shifts of 0.9 cm in 3D space, from 0 to 8.1 cm), taken parallel to the xy-plane along the z axis. Because the spatial autocorrelation is depicted, the maximum in these pictures is equal to 1 and occurs at x, y and z lags of zero, i.e. the center of the slices in the first column of Fig. 6.2. The pictures indicate visually that the varimax criterion yields the least localized estimable function (the iso-contours are relatively widely separated), while the entropy criterion yields the most localized estimable function (iso-contours are closely packed together). To make a more quantitative comparison we calculated Qacfw with Backus-Gilbert weighting, as this is a more widely accepted measure of the peak-width of a unimodal function [11]. It was rescaled such that it measures the peak-width of the autocorrelation of λ(h). This peak-width is tabulated in Table 6.2 for all rotation methods used. The peak-width of the autocorrelation function is not to be interpreted as indicating the "best" estimable function, because in that case we could better have minimized this measure directly (which is computationally much more expensive). It should be kept in mind that the complexity measures in (6.4)-(6.8) were chosen to emphasize other properties than this Backus-Gilbert peak-width. It does indicate, however, which of the estimable functions is more localized. The peak-width values in Table 6.2 confirm the earlier observations from Fig. 6.2: of the complexity criteria, varimax yields the least localized, and entropy the most localized, estimable function.

6.4.2 Comparison with linear techniques

Rows V-IX in Fig. 6.2 similarly display autocorrelations for spatial filters obtained from the linear methods discussed in section 6.2 that most closely correspond with the entropy result, as follows. For h obtained from the entropy criterion, the source was determined to which this spatial filter is most sensitive. This source, which we shall coin the entropy source, has the greatest contribution to the estimable function p' = h_entropy' Λ, and was taken as the AOI for the linear methods. In row (V) of Fig. 6.2, h = (0, ..., 1, ..., 0)' and selects the "best sensor": following [146], this is the sensor with the highest amplitude if only the entropy source is active. In row (VI), h is set equal to the LCMV beamformer that targets the entropy source. In row (VII), h is set equal to the row of the minimum norm operator Λ+ corresponding to the location of the entropy source. In row (VIII), h was obtained from the spherical spline Laplacian method at the best sensor in (V), while in row (IX), h implements the corresponding Hjorth Laplacian. The Backus-Gilbert peak-widths were also computed for these autocorrelation functions, and are given in Table 6.2.

Comparing all autocorrelation functions in Fig. 6.2 and their peak-widths in Table 6.2, it may be concluded that, of the proposed criteria, two (oblimax and entropy) yield more compact estimable functions than the standard techniques, with an exception for the Hjorth Laplacian method. The averaging kernel with the smallest autocorrelation peak-width is obtained with the entropy method, followed by the Hjorth Laplacian. The largest peak-widths


The largest peak-widths result from the varimax criterion, the LCMV beamformer, the spherical spline Laplacian, and the best-sensor method. A weakness in this comparison is that the Backus-Gilbert peak-width is not entirely appropriate for all methods. For example, the varimax criterion explicitly does not aim at localized estimable functions, but rather at estimable functions that are as specific as possible about the source locations that do or do not contribute. The autocorrelation of the varimax kernel depicted in Fig. 6.2 shows, however, that it is unimodal and diminishes monotonically with increasing distance. Hence, the peak-width comparison seems warranted in this case. A surprising result in table 6.2 is that, of the standard methods, the Hjorth Laplacian—which, ironically, is probably the oldest spatial filter proposed in the literature to increase the resolution of EEG—has the narrowest autocorrelation function. In contrast, the spherical spline Laplacian has the second largest width. Inspection of its spatial autocorrelation shows that it does not decrease monotonically with increasing distance in all directions. Instead, it increases both up and down the y-axis at z-axis lags greater than six, resulting in a bimodal autocorrelation along the y-axis at x-axis lag zero and z-axis lags five to nine (row VIII, columns six to ten in Fig. 6.2). This indicates that source clusters located at some distance (4.5-9 cm) from each other contribute significantly to the averaging kernel. This was confirmed by the resolution field plot of the spline Laplacian (not depicted). This sheds new light on a discussion in the literature of 'artifactually high' coherences due to spherical spline Laplacians [16, 176, 181].
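As an aside, the peak-width summary used above is easy to prototype. The following is a minimal R sketch of the autocorrelation of a kernel profile and a Backus-Gilbert-type width, for a hypothetical one-dimensional analogue of the 3-D computation; the profile, lag range and weighting are illustrative assumptions, not the computation actually used for table 6.2.

# Sketch (hypothetical 1-D analogue): autocorrelation of a kernel profile
# lambda at integer voxel lags, summarized by a Backus-Gilbert-type width
# that weights squared autocorrelation by squared lag.
lambda <- exp(-((1:100 - 50)^2) / 50)       # toy kernel profile (assumed)
acf_lag <- function(x, h) {                 # autocorrelation at lag h
  n <- length(x); xc <- x - mean(x)
  sum(xc[1:(n - h)] * xc[(1 + h):n]) / sum(xc^2)
}
lags <- 0:9
rho  <- sapply(lags, function(h) acf_lag(lambda, h))
sqrt(sum(lags^2 * rho^2) / sum(rho^2))      # small width = localized kernel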

6.5 Discussion

We have discussed a general strategy for finding linear spatial filters for EEG and MEG inverse solutions, alternative to those currently available. One advantage of this strategy is that it focusses on optimizing the interpretability (e.g., the selectiveness) of the resulting current average estimates. This contrasts with conventional methods, which try to estimate the activity in a preselected AOI as well as possible—which does not guarantee that the estimate will be good, or even interpretable. Even a single spatial filter that is highly specific about the regions from which the estimated activity arises would be of great interest to researchers, as they can then at least be confident about the origins of the activity estimated with this filter. As was indicated in [146], most spatial filters reflect activity from vast portions of the brain, defying such confidence. An additional advantage of the proposed approach is that, given a definition of 'simplicity in interpretation', it provides the most easily interpreted averaging kernel that is possible within the biophysical constraints of the neuro-electromagnetic inverse problem—thus providing fundamental limits on the interpretability of assumptionless current average estimates. This is especially of interest in estimating localized current densities from single trial data, for which the assumption of a limited number of strongly focalized dipolar sources is generally believed to be unwarranted [205]. Furthermore, the complexity criteria suggested here allow comparison of different spatial filters without reference to a specific AOI. This is different from other measures, such as the source identifiability measure, which measures the fraction of current density that is visible of a given current distribution [91]. These measures focus on the goodness of a reconstruction of a given source. The current criteria instead focus on the interpretability of the current average estimate, without worrying about the specific locations. They therefore allow comparison of very different types of filters, some of which are not even designed to target point locations inside the brain (e.g., both Laplacian methods). The current method finds estimable functions that are 'simplest' (least complex) with respect to a chosen understanding of complexity, among those that can be constructed within the row space of the lead field matrix. In terms of the autocorrelation peak-width, the entropy result, besides being least complex according to $Q_{\mathrm{entropy}}$, was also the most localized estimable function. In addition, it is second best in terms of both $Q_{\mathrm{varimax}}$ and $Q_{\mathrm{oblimax}}$, indicating high specificity and selectiveness compared to the other estimable functions.


From the display of its resolution field it may be seen, however, that even this most localized averaging kernel still has substantial contributions from widely separated regions throughout the head. Furthermore, the pattern of these contributions is rather complex, defying simple interpretation of a resulting current average estimate. This suggests that assumption-free current estimation in circumscribed areas of the brain is fundamentally and strongly limited by the physical nature of the inverse problem. We do not wish to suggest, however, that we have provided the best possible complexity criteria. In theory, the complexity criteria $Q_{\mathrm{Carroll}}$ and $Q_{\mathrm{acfw}}$ aim at point source and spherically symmetric AOIs, respectively. The varimax criterion, on the other hand, does not put an a priori constraint on the shape of the AOI, and only compels the contribution of a voxel to be either high or low. Other, more flexible criteria may be developed.

Appendix

The Jacobian 'matrix' of $f$ is
$$ \frac{\partial f}{\partial h^\top} = \frac{\partial Q}{\partial \lambda^\top}\,\frac{\partial \lambda}{\partial h^\top}. $$
To find $\partial\lambda/\partial h^\top$, let $p_x = (p_{1x}, p_{2x}, \ldots, p_{dx})^\top$, and let $p_y$ and $p_z$ be defined similarly. In the same way, let $\Lambda_x$, $\Lambda_y$ and $\Lambda_z$ denote the $x$, $y$ and $z$ components of the lead field $\Lambda$, i.e., $\Lambda_x = \Lambda(I_d \otimes e_x)$, where $e_x^\top = (1, 0, 0)$, and similarly for the $y$ and $z$ components. Then we may write $\lambda = p_x \circ p_x + p_y \circ p_y + p_z \circ p_z$, where $\circ$ denotes the element-wise product. Hence, from $d(A \circ A) = 2A \circ dA$, the differential of $\lambda$ is $2p_x \circ (\Lambda_x^\top dh) + 2p_y \circ (\Lambda_y^\top dh) + 2p_z \circ (\Lambda_z^\top dh)$, so that $\partial\lambda/\partial h^\top$ can be written
$$ \frac{1}{2}\frac{\partial\lambda}{\partial h^\top} = \frac{1}{2}\left[\frac{\partial\lambda}{\partial h_1}, \cdots, \frac{\partial\lambda}{\partial h_m}\right] = \left[p_x \circ (\Lambda_x^\top)_{.1} \cdots p_x \circ (\Lambda_x^\top)_{.m}\right] + \left[p_y \circ (\Lambda_y^\top)_{.1} \cdots p_y \circ (\Lambda_y^\top)_{.m}\right] + \left[p_z \circ (\Lambda_z^\top)_{.1} \cdots p_z \circ (\Lambda_z^\top)_{.m}\right], $$
where $(A)_{.a}$ denotes the $a$-th column of $A$. Now note that for each $a$, $\frac{\partial Q}{\partial\lambda^\top}\frac{\partial\lambda}{\partial h_a}$ is a sum of inner products of the form $2\left(\frac{\partial Q}{\partial\lambda}\right)^\top\!\left(p_w \circ (\Lambda_w^\top)_{.a}\right)$, with $w$ assuming the indices $x$, $y$, and $z$. With the equality $s^\top(u \circ v) = (s \circ u)^\top v$, for conformable vectors $s$, $u$, $v$, these inner products can be written
$$ \left(\frac{\partial Q}{\partial\lambda}\right)^\top\!\left(p_w \circ (\Lambda_w^\top)_{.a}\right) = \left(\frac{\partial Q}{\partial\lambda} \circ p_w\right)^\top (\Lambda_w^\top)_{.a}, $$
from which the gradient of $f$, which is the transpose of the Jacobian matrix, is determined to be
$$ \frac{1}{2}\frac{\partial f}{\partial h} = \Lambda_x\left(\frac{\partial Q}{\partial\lambda} \circ p_x\right) + \Lambda_y\left(\frac{\partial Q}{\partial\lambda} \circ p_y\right) + \Lambda_z\left(\frac{\partial Q}{\partial\lambda} \circ p_z\right). \tag{6.10} $$
In view of the column structure of $\Lambda$ and the element structure of $\lambda$, the gradient can be concisely written
$$ \frac{\partial f}{\partial h} = 2\Lambda\left(p \circ \left(\frac{\partial Q}{\partial\lambda} \otimes 1_3\right)\right), \tag{6.11} $$
where $A \circ B = B \circ A$ was used.


7 Summary and discussion: requirements for accurate connectivity estimation

In this chapter we summarize the previous chapters, discuss some general conclusions on detecting functional connectivity from EEG/MEG data, and indicate some directions for improvement.

7.1 Summary

7.1.1 Justifiable attribution of cortical dynamics parameters to localized brain areas

In chapter 1, electric current dipoles were introduced as models for the current sources underlying the neuro-electromagnetic activity that is reflected in the EEG and MEG. These current dipoles are fundamental in solving the neuro-electromagnetic inverse problem: the determination of locations, orientations and/or amplitudes of brain electric sources given the EEG and/or MEG signals. Various procedures that are used to solve the neuro-electromagnetic inverse problem were discussed, and it was indicated that these methods have been applied with considerable success to trial-averaged data—the Event Related Potentials/Fields. As the interest of psychologists and other neuroscientists is shifting from localization of neural sources towards the assessment of functional connectivity between different regions of the brain, attempts are being made to use these inverse methods for inferences on these issues.

In Chap. 2, the mechanisms by which neurons communicate with one another were recalled. Then, methods were discussed that have been used in the literature to investigate cortico-cortical interactions on the basis of EEG and MEG data, as well as on the basis of the more recent functional MRI and PET techniques. The distinction between functional and effective connectivity was briefly discussed and related to the methods reviewed. These methods included cross-correlation, coherence, phase and event related (de-)synchronization analysis of sensor signals, as well as non-parametric network analysis by means of principal component analysis (PCA) and parametric covariance and time series models (SEM, VAR, DFM). None of these methods provide interactivity measures that are free of interpretational problems. This is mainly because they do not take the biophysical nature of the origins of the signals into account (EEG/MEG), or have too limited a time resolution (fMRI/PET). At the end of the chapter it was indicated that, for EEG and MEG, the dynamic factor model [162, 163] adopts a structure that most closely corresponds to the multivariate covariance structure of the signals that should be expected on the basis of their neurophysical origin. It therefore formed the basis for the techniques investigated in Chap. 3-5. The distinction between functional and effective connectivity was the motivation for extending the method of Chap. 4—which identifies functional connectivity—with the dynamic structural equation modelling framework in Chap. 5, which models effective connectivity in terms of response kernels of a linear time invariant (LTI) system.


7.1.2 Multiple dipole modelling of functional and effective connectivity

In Chap. 3 the statistical foundations for the methods discussed in Chap. 4-5 were developed. Both generalized least squares (GLS) and maximum likelihood (ML) methods were considered. For a class of real random variables, the (best) generalized least squares method for fitting covariance structures was first developed in [29]. It was extended in Chap. 3 to a class of complex random variables, of which the complex normal variable is a prime example. It was shown that the resulting estimators are consistent, asymptotically normally distributed, and efficient in terms of attaining the Cramér-Rao lower bound on the estimation error. The maximum likelihood method is of course standard and has well known statistical properties: if these estimators are consistent, they are known to be asymptotically unbiased and efficient [5, 232]. Furthermore, a computationally very efficient algorithm was developed for ML confirmatory factor analysis with an added mean structure; it is based on a concentrated likelihood method known in the signal processing literature as 'stochastic maximum likelihood' (SML). The advantages of the extensions to the standard SML method are that the algorithm allows a more general noise covariance matrix, that it includes a mean structure, and that it allows the array manifold (or 'factor loading matrix' in its original psychometric context) to have reduced column rank—this without loss of the computational efficiency of the original formulation of SML by Böhme ([19] cited in [134, 175, 220]). A similarly computationally efficient algorithm was developed for GLS estimation of the confirmatory factor model, using the separable least squares methods of [80, 200]. These algorithms make routine application to data more feasible than the direct methods used in Chap. 4 and 5. The closed form expressions for some of the estimators, derived in concentrating the likelihood and the least squares function, can also be used to assess their asymptotic and finite sample bias. At the end of Chap. 3 it was argued that, as the signal length tends to infinity, the presented methods are feasible for the Discrete Fourier Transform (DFT) coefficients of (wide sense) stationary time series, because these coefficients have an asymptotically multivariate complex normal distribution [26].

From the simulation study reported in Chap. 4 it was concluded that, under circumstances with realistic numbers of samples, sensors and trials, the ML method outperformed the GLS method, because the latter yielded biased estimates of the source amplitude cross-spectra, while the corresponding ML estimators were nearly unbiased. The generalized likelihood ratio χ² test (GLRT) statistic was evaluated on its helpfulness in determining the number of sources in the data. In the simulations it turned out to be quite useful, but in general one should be cautious with its use, since any model that is fitted to empirically obtained data is only approximate, and increasing the amount of data may lead to GLRT rejection solely because of necessary approximations [14, 32]. Unfortunately, in applying the method to MEG data obtained from a visual stimulation experiment, the method yielded inconclusive results. A number of reasons may be brought to bear for these problematic results. They include, in order of significance: a too low signal to noise ratio (i.e., the contribution of the modelled dipole sources to the total variance-covariance structure is small compared to the contribution of the noise); the lack of an appropriate noise model (in the application the sensor noise was assumed to be homoscedastic and uncorrelated); non-zero covariance between noise and source signals; an inappropriate number of sources¹; the approximation of the head with an isotropic conductor consisting of concentric homogeneous spheres; and correlations between the DFT coefficients at different frequencies due to non-stationarity or finite signal lengths (the latter, however, should only affect the efficiency of the estimators).

¹ It is difficult to discern these latter two sources of error in the modelling, since the idea of stimulus locked brain responses and background activity is not very helpful in this matter; 'the' sources in this application were taken to be 'important sources of the background activity', although it was hoped to find the sources that were found in the ERF corresponding to these data, which are thought to have some variation across trials [184].
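For concreteness, the sample moment to which these covariance structure methods are fitted—the cross-spectral matrix of DFT coefficients across trials—can be formed as in the following R sketch. The AR(1) channels, trial count and frequency bin are illustrative assumptions only.

set.seed(2)
L <- 200; n <- 256; m <- 4; k <- 11     # trials, samples, channels, DFT bin
dft_k <- replicate(L, {
  x <- sapply(1:m, function(j) as.numeric(arima.sim(list(ar = 0.6), n)))
  mvfft(x)[k, ]                         # m complex DFT coefficients at bin k
})
S <- dft_k %*% t(Conj(dft_k)) / L       # sample cross-spectral matrix (m x m)
round(Mod(S), 1)                        # independent channels: near-diagonal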


In Chap. 5 the method was extended to incorporate the mean structure that is associated with the confirmatory factor model structure used in Chap. 4. The additional information contained in the mean should improve the accuracy of the source estimates. Furthermore, in chapter 5 ideas taken from linear structural equation modelling of latent variables [20, 122, 123] and from linear systems theory were combined into a framework for estimating dynamic structural relations between sources. In this framework, time lagged dependencies between sources are modelled in terms of linear filter responses. The resulting modelling framework allows, in principle, direct testing of hypotheses concerning the effective connectivity (as opposed to functional connectivity) between underlying neural sources. The generality of this approach lies in the fact that linear response kernels may be interpreted as first order terms of Volterra expansions of nonlinear interaction dynamics. The simulations reported in the chapter showed that conventional asymptotic theory was sufficiently accurate in experiments with 200 trials or more. The GLRT χ² fit statistic was investigated in its ability to detect departures from independence of the source amplitudes, versus an a priori model of the effective connectivity between the sources. Although it was successful in the presented simulations, the limitations of the GLRT χ² test indicated previously still hold. An application to empirical data was not reported, as this resulted in interpretational problems similar to those in Chap. 4.

7.1.3 Linear inverse methods in connectivity research

In Chap. 6 the linear inverse methods approach to detecting functional connectivity was explored. This was partly instigated by the problematic application of the methods of chapters 4 and 5 to empirical data sets. Theoretical considerations were elaborated upon, which concerned the fundamental limitations on the possibilities of linear current density reconstructions. The problems are most easily assessed in the theoretical frameworks of estimable functions [166, 199], the resolution matrix (for distributed source models [160]), and the related resolution field concept [146]. In accordance with the theory of estimable functions for underdetermined regression models [199], it was argued that in the neuroelectromagnetic inverse problem one should concentrate on weighted averages of the current density in the brain that can be estimated unbiasedly. These averages should be constructed in such a way that they lend themselves to easy interpretation. This means that one should find current density averaging kernels that are estimable and are, for example, as selective as possible to a localized region in the brain. In Chap. 6 it was indicated that averaging kernels that are estimable must lie in the linear space that is spanned by the rows of the lead field matrix, i.e., it should be possible to express the averaging kernel as a linear combination of the rows of the lead field matrix. Because the rows of the lead field constitute a linear basis for only an m-dimensional subspace, it will not be possible to construct an estimable function for every possible desired target region in the brain, as the dimension of the source space—the number of potentially contributing source locations (voxels in the discretized problem)—is vastly larger than the number of sensors (and infinite in the continuous formulation). It was therefore considered in the chapter that, for instance, a current dipole source in a single voxel—the target of most available linear techniques (an exception being SOFIA [21, 96])—may not be a type or shape of localized target region in voxel space for which a sufficiently selective and specific estimable averaging kernel exists. The shapes and locations of source regions that can be resolved with sufficient selectivity depend highly on the properties of the lead fields.

To search for averaging kernels in the row space of the lead field matrix that are most localized requires a precise specification of what it means for an averaging kernel to be 'localized'. More generally, one may search for estimable functions that have the 'simplest possible interpretation'. Criteria characterizing what it means to have a 'simple interpretation' were borrowed from the analytic rotation procedures developed in the factor analysis literature. One of these criteria, the entropy criterion, yielded an averaging kernel that was more localized (in the sense of the Backus-Gilbert peak-width of its spatial autocorrelation function) than the averaging kernels resulting from conventional linear methods.


However, even this more localized averaging kernel was seen to be sensitive to activity in widespread cortical and subcortical areas of the brain. This indicates that, unless these widespread areas are electrically silent (or at least their total power is comparatively low), the resulting activity estimate cannot be considered to arise from a localized portion of the cortex. Thus, for unambiguous inferences on functional connectivity in the cortex, it is required that the activity in the cortex itself is strongly focussed.
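The row-space condition underlying this summary is mechanical to check numerically: $p^\top\eta$ is unbiasedly estimable iff $p^\top\Lambda^+\Lambda = p^\top$. A toy R sketch under assumed sizes, with a random lead field and MASS::ginv as MP-inverse:

library(MASS)
set.seed(3)
m <- 10; d3 <- 60                      # 10 sensors, 20 voxels x 3 components
Lambda <- matrix(rnorm(m * d3), m, d3)
is_estimable <- function(p, tol = 1e-8)
  max(abs(drop(p %*% ginv(Lambda) %*% Lambda) - p)) < tol
p_ok  <- drop(rnorm(m) %*% Lambda)     # in the row space by construction
p_bad <- rnorm(d3)                     # generic kernel: almost surely not
c(is_estimable(p_ok), is_estimable(p_bad))   # TRUE FALSE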

7.2 General discussion

7.2.1 Functional connectivity estimation with neuroelectromagnetic inverse techniques

Two general approaches can be distinguished among neuroelectromagnetic inverse techniques developed for functional connectivity detection: those based on single trial reconstruction of current density dynamics, and those based on direct statistical modelling of the moments of the data. Techniques adopting the first approach generally consist of two steps: in the first step, for each trial, dipole source amplitude time courses are reconstructed by solving the inverse problem for the EEG/MEG segments of that trial; in the second step, the resulting amplitude time series are subjected to one of the connectivity analysis methods discussed in Chap. 2. The inverse problem methodologies used in this approach include beamformer techniques [97, 209], beamformer techniques targeting multiple (a priori selected) source locations simultaneously [67], Laplacians [174], and iterative refinement techniques for distributed source inverse solutions [47-49]. In the second approach, a model for the data, including the statistical parameters describing the connectivity dynamics, is specified in terms of moments of the data (in this case, the means, variances and covariances), and is directly fitted to sample moments of the observed EEG/MEG data. This approach includes the distributed source covariance matrix estimate of [210], and the multiple dipole model covariance structure fitting techniques developed in chapters 3-5.

The methodologies of the first approach are very direct and intuitively appealing, and the availability of the reconstructed amplitude time series in each trial makes it appear that the possibilities for alternative and single trial directed analyses of connectivity (e.g., coherence, phase, time-frequency analysis, wavelets, non-linear analysis, etc.) are virtually unlimited. Furthermore, pursuing these alternative techniques is very straightforward, as they are simply carried out on the reconstructed source signals [48, 49, 209]. A number of cautionary remarks are in order, however. First of all, very many (viz., the number of samples n times the number of voxels d times the number of trials L) amplitude parameters have to be estimated; sometimes even more amplitude parameters are estimated than there are data points to estimate them from (the number of samples times the number of sensors m times the number of trials; e.g., [209]). The reconstructed source amplitude time series should not be interpreted as accurate estimates of source amplitude signals, as it can be inferred from Chap. 6 that activity from vast portions of cortical and subcortical tissues, other than at the intended target source locations, potentially contributes to the estimated activity. Furthermore,² if the number of estimated amplitudes exceeds the number of data points, then these estimates are necessarily dependent³, and therefore any connectivity analysis based on dependency measures (e.g., correlation) that has sufficient statistical power is certain to find "evidence" of functional connectivity—even if no actual functional connectivity underlies the observed EEG/MEG signals.

² This can be seen as follows. In Chap. 6 it was indicated that, given the array of sensor measurements $y$ (a non-degenerate $m$-dimensional random vector) at one instant in time, every linear distributed source inverse solution can be written $\breve\eta = \Gamma y$. Here, $\Gamma$ is a $3d \times m$ matrix, assumed to have full column rank $m$. Let $\{\Gamma_{1\cdot}, \ldots, \Gamma_{m\cdot}\}$ denote a set of $m$ linearly independent rows of $\Gamma$, corresponding to the amplitude estimates $\{\breve\eta_a\}_{a=1}^m = \{\Gamma_{a\cdot}\,y\}_{a=1}^m$. Note that, because $\Gamma$ has rank $m$, this set spans the complete row space of $\Gamma$, and hence any other non-zero row of $\Gamma$, $\Gamma_{(m+1)\cdot}$ say, can be written as a linear combination of this set: $\Gamma_{(m+1)\cdot} = \sum_{a=1}^m \beta_a\Gamma_{a\cdot}$, where $\beta_1, \ldots, \beta_m$ are not all equal to zero. Now assume the set $\{\Gamma_{1\cdot}, \ldots, \Gamma_{m\cdot}\}$ is such that the source amplitude estimates $\{\Gamma_{1\cdot}y, \ldots, \Gamma_{m\cdot}y\}$ are mutually independently distributed (if such a set of rows of $\Gamma$ does not exist, the amplitude estimates are obviously necessarily dependent). Then $\Gamma_{(m+1)\cdot}y$ cannot be distributed independently from $\{\Gamma_{1\cdot}y, \ldots, \Gamma_{m\cdot}y\}$, because for $a = 1, \ldots, m$, $\mathrm{Cov}(\Gamma_{(m+1)\cdot}y, \Gamma_{a\cdot}y) = \mathrm{Cov}(\sum_{b=1}^m \beta_b\Gamma_{b\cdot}y, \Gamma_{a\cdot}y) = \sum_{b=1}^m \beta_b\,\mathrm{Cov}(\Gamma_{b\cdot}y, \Gamma_{a\cdot}y) = \beta_a\,\mathrm{Var}(\Gamma_{a\cdot}y)$, which is zero if and only if $\beta_a = 0$, as $y$ is non-degenerate. By assumption on $\Gamma_{(m+1)\cdot}$ there exists a $\beta_a \neq 0$, and hence $\Gamma_{(m+1)\cdot}y$ has nonzero correlation with at least one of $\{\Gamma_{1\cdot}y, \ldots, \Gamma_{m\cdot}y\}$.

³ As a small excursion from the present discussion, note that this (rather obvious, but apparently often overlooked) fact has serious consequences for the 'statistical' justification of the iteratively refined minimum norm method proposed in [49], which selects active regions on the basis of voxel-wise permutation tests, assuming these tests are independent for different voxels. In a similar fashion, it limits the use of t and F statistics in the statistical parametric map proposed in [45] as statistical indicators for active voxels. (Note that the proposed F statistic in general does not have the 3 numerator degrees of freedom indicated there, as it does not account for correlations—an appropriate alternative would be to define a Hotelling's T² type statistic for each voxel, which may then be transformed to obtain an F statistic with 2 numerator degrees of freedom for MEG in a spherical head model. The resulting statistics remain dependent, however.)
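The dependence argument of footnote 2 is easily reproduced numerically. A toy R sketch (minimum-norm-style operator, all sizes assumed): even for sensor data that are pure independent noise, the reconstructed voxel amplitudes show sizeable mutual correlations.

library(MASS)
set.seed(4)
m <- 8; d3 <- 30; L <- 5000
Lambda <- matrix(rnorm(m * d3), m, d3)
Gamma  <- ginv(Lambda)                          # 3d x m reconstruction operator
eta    <- Gamma %*% matrix(rnorm(m * L), m, L)  # estimates from pure noise
R      <- cor(t(eta))
summary(abs(R[upper.tri(R)]))                   # spurious 'connectivity'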


Second, even if the source amplitudes at only a limited number of source locations are reconstructed (e.g., by means of a multiple source beamformer reconstruction targeting a priori selected locations), the amplitude estimates are not consistent for unboundedly increasing numbers of samples (n → ∞) or trials (L → ∞), since adding time points or trials will only increase the number of parameters to be estimated. Therefore, to obtain more stable estimates of the connectivity measures, these estimates will have to be aggregated in some way—for example, by averaging across trials (e.g., [47])—thereby losing the apparent advantage of single trial analysis.⁴ Third, the results are not easily assessed in their accuracy, since no model evaluation theory is available for these types of reconstructions, in which source locations are not consistently estimated or more parameters are estimated than there are data points. Fourth, the estimates are subject to noise, and even in the absence of any source amplitude signal they provide 'amplitude signal estimates', which are then solely due to this noise. When estimating correlations between cortical sources, this noise should be accounted for, and this requires an appropriate noise model. This fact is commonly ignored in this approach.⁵ A further problem, specifically concerning beamformers, is the well known effect of partial signal cancellation due to correlations between source amplitudes. This results in distorted and attenuated amplitude signal reconstructions, and, if there are multiple correlated sources, it causes a correlation-magnitude dependent bias of the correlation estimates [202].

The methodologies of the second approach assume homogeneity across replications in multiple trials (which, of course, requires justification), and involve a single stage estimation procedure, without necessitating cumbersome single trial reconstructions. Within this second approach, the distributed source model method in [210] attempts to reconstruct the entire covariance structure in the current density variation, but can give only strongly blurred images of this structure.

⁴ Assuming homogeneity of the data across trials, the estimates can be made consistent as L → ∞ by averaging across trials, provided that the source locations are correctly specified or estimated consistently. By averaging across a time window, an estimate of a connectivity measure could, in principle, be made consistent, subject to the same conditions on the source locations. It is necessary to distinguish stationary from non-stationary functional connectivity: in the stationary case, the time span of the averaging window can simply be increased, and the average estimates are consistent as n → ∞; in the non-stationary case, the time average estimate can be made consistent if the time span of the averaging window can be decreased while the information obtained per unit time in this decreasing window is made to increase, e.g., by employing higher sampling rates. This is ignored, for instance, in the methods proposed in [47-49, 97], in the analysis of beamformer performance presented in [202], and in the application of synthetic aperture magnetometry (SAM) in [209].

⁵ To reiterate: if the model $y_l = \Lambda\eta_l + \varepsilon_l$, $l = 1, \ldots, L$, holds for the trial data, with $\eta_l$ and $\varepsilon_l$ independent, $\mathrm{Cov}(\varepsilon_l, \varepsilon_l) = \Upsilon_\varepsilon$, and $\Lambda$ of full column rank, then the optimal GLS estimate of $\eta_l$, $\breve\eta_l = (\Lambda^\top\Upsilon_\varepsilon^{-1}\Lambda)^{-1}\Lambda^\top\Upsilon_\varepsilon^{-1}y_l$, has the covariance structure $\mathrm{Cov}(\breve\eta_l, \breve\eta_l) = \mathrm{Cov}(\eta_l, \eta_l) + (\Lambda^\top\Upsilon_\varepsilon^{-1}\Lambda)^{-1}$, whereas the interest lies in $\mathrm{Cov}(\eta_l, \eta_l)$. Therefore, the noise covariance structure $\Upsilon_\varepsilon$ has to be taken into account in the estimation of $\mathrm{Cov}(\eta_l, \eta_l)$.
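Footnote 5 can likewise be illustrated by simulation: the sample covariance of the GLS amplitude estimates overshoots the source covariance by exactly $(\Lambda^\top\Upsilon_\varepsilon^{-1}\Lambda)^{-1}$. A toy R sketch with an assumed known (identity) noise covariance:

set.seed(8)
m <- 12; d <- 2; L <- 2000
Lambda <- matrix(rnorm(m * d), m, d)
Ups  <- diag(m)                          # noise covariance (assumed known)
Ceta <- matrix(c(1, .5, .5, 1), d, d)    # true source covariance
eta  <- t(chol(Ceta)) %*% matrix(rnorm(d * L), d, L)
y    <- Lambda %*% eta + matrix(rnorm(m * L), m, L)
W    <- solve(crossprod(Lambda, solve(Ups, Lambda)), t(Lambda) %*% solve(Ups))
eta_h <- W %*% y                         # GLS amplitude estimates per trial
round(cov(t(eta_h)), 2)                  # ~ Ceta + (Lambda' Ups^-1 Lambda)^-1
round(cov(t(eta_h)) - solve(crossprod(Lambda, solve(Ups, Lambda))), 2)  # ~ Ceta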


Furthermore, some means of making a distinction between "significant" and "non-significant" correlations is necessary. Because the number of estimated correlations is much higher than the number of degrees of freedom in the covariance matrix of the data, no statistical analysis is available for this purpose. The covariance structure modelling methods in this approach, developed in Chap. 3-5, circumvent this problem by considering only situations in which a multiple dipole model with a limited number of sources is warranted. This has a number of important advantages. First, a very limited number of parameters needs to be estimated, and this number does not increase as the number of trials increases. Therefore, both the best generalized least squares and the maximum likelihood methods are able to yield consistent estimates as the number of trials and samples tend to infinity (L → ∞ and n → ∞).⁶ Second, these estimates were shown to have asymptotically the smallest possible estimation error, i.e., they are asymptotically efficient, and hence generalize best to new data sets. Third, straightforward model fit assessment theory is available, by means of generalized likelihood ratio fit statistics, confidence intervals, and hypothesis tests on the estimated parameters—the latter of which were exemplified and evaluated in [108, 236, 238].

⁶ For fixed Nyquist sampling intervals, if n < ∞, the cross-spectrum estimates are biased, and therefore the estimates cannot be consistent; we therefore let n → ∞. In the more limited interpretation of consistency, defined as 'tending to their expected value in probability', the estimates would still be consistent as L → ∞.

Covariance (in this case, cross-spectrum) fitting, like least squares methods for solving the inverse problem in ERP/ERF dipole source modelling, involves a non-linear search in a high dimensional parameter space. The same objections as discussed in Chap. 1 for ERP/ERF dipole localization may therefore be invoked here. These include the associated high computational costs, the problem of multiple local minima, and the number of sources that must be specified a priori. The latter two problems can be handled in the same way as indicated in Chap. 1—i.e., multiple starting values can be used to avoid local minima, while the model selection procedures developed in [237, 238] can be used to determine the number of dipoles necessary. With respect to the first objection, the computational costs have been greatly reduced by the concentrated likelihood methods developed in Chap. 3 and Chap. 5. In resolving the combined problem of high computational cost and local minima, the concentrated likelihood methods may be deemed essential. A further limitation is that, currently, the methods only allow coherence and phase analysis, and are appropriate only for wide sense stationary signals. Alternative connectivity measures allowing for non-stationary data (such as wavelets, non-linear analysis, and dynamic correlations) are non-straightforward generalizations of the current methods. Some suggestions for alleviating these limitations will be indicated at the end of this chapter.

More significant problems of the multiple dipole model covariance structure fitting techniques are the strong assumptions that are placed on the source configuration, viz., a limited number of strongly focussed dipole-like sources (the methods quickly become infeasible as the number of dipoles in the model becomes large), and the requirement of an accurate noise model. This last problem was already seen to be non-specific to the methods of the second approach—as indicated earlier, the methodologies of the first approach often simply ignore this problem. The former problem turns out to be non-specific to the covariance fitting techniques as well: as was seen in Chap. 6, even for the most localized estimable function, targeting the most optimally resolved localized source, the extent of the brain region that potentially contributes to the estimated average current density was still quite substantial. As the weighting vectors constructed there were 'best' in the sense of the criteria they optimized, without a priori fixing the location of their focus, other weighting vectors, targeting other locations, are likely to have only greater spread of their corresponding resolution fields. Connectivity measures based on the current density estimates obtained with these sensor weighting vectors may therefore suffer from significant bias due to shared sensitivity of the associated averaging kernels. One might propose to correct the correlation estimates obtained in this way.


However, this requires substantial knowledge of the correlation between voxels throughout the brain, which seems to be at odds with the purpose of the analysis. Therefore, to justify conclusions drawn from connectivity estimates on the basis of these estimable functions, one is practically forced to assume the presence of only a few focused source regions, with relatively little activity in other regions. The latter assumption would imply, however, that the methodology investigated in this thesis is feasibly applicable, and statistically optimal if the assumptions on stationarity and independence of source and noise signals are satisfied. The next section evaluates these arguments in more detail.

7.2.2 Requirements for unambiguous functional connectivity estimation: 'Signal' to 'noise' ratio

To obtain more explicit requirements for accurate functional connectivity estimation, here we consider effects of the signal to noise ratio (SNR) of estimable functions designed to reconstruct activity in a localized region of the brain. Here, the meanings of 'signal' and 'noise' are restricted to signal variance coming from a region of interest versus signal variance coming from a region not of interest. For clarity of the discussion, we will not consider noise that may be classified as environmental or instrumental noise.

Let $p_1^\top$ denote an estimable function, i.e., let $p_1$ be such that there exists an $h_1$ such that $E\{h_1^\top y\} = p_1^\top\eta$, where $y$ is the $m$-vector of sensor measurements, and $\eta$ is the $3d$ ($\gg m$) vector of source amplitudes, both at one instant in time. Partition the estimable function into two parts, $p_1^\top = (p_1^{(1)\top} : p_1^{(2)\top})$, the first part, $p_1^{(1)}$, being sensitive to activity in the region of interest (the signal region), the second, $p_1^{(2)}$, to the regions not of interest (the noise region).⁷ Ideally, $p_1^{(2)} = 0$, but this may not be possible given the properties of the lead field matrix. Denote the covariance matrix⁸ of the current density $\eta$ as $\Psi$, and partition it in correspondence with the partitioning of $p_1$:
$$ \Psi = \begin{pmatrix} \Psi_{11} & \Psi_{12} \\ \Psi_{21} & \Psi_{22} \end{pmatrix}. $$
Note that the total output variance of $p_1$ is $p_1^\top\Psi p_1 = p_1^{(1)\top}\Psi_{11}p_1^{(1)} + 2\,p_1^{(1)\top}\Psi_{12}p_1^{(2)} + p_1^{(2)\top}\Psi_{22}p_1^{(2)}$. The 'signal to noise ratio' (SNR) of the estimable function $p_1$ may be defined as
$$ SNR(p_1) = \sqrt{\frac{p_1^{(1)\top}\Psi_{11}p_1^{(1)}}{p_1^{(2)\top}\Psi_{22}p_1^{(2)}}}. $$
Suppose now that we have a second estimable function, $p_2$, whose target signal region lies in the noise region of $p_1$, and the interest lies in determining the correlation between the activity in the two signal regions. For simplicity, we will assume that $p_2^\top = (p_2^{(1)\top} : p_2^{(2)\top})$ is partitioned in the same way as $p_1$ (i.e., $p_1^{(1)}$ and $p_2^{(1)}$, and $p_1^{(2)}$ and $p_2^{(2)}$, have the same dimensions) and that the signal region of $p_2$ matches the noise region of $p_1$ and vice versa. Since only $p_1^\top\eta$ and $p_2^\top\eta$ are accessible to estimation (by means of $h_1^\top y$ and $h_2^\top y$), these estimates are used to approximate the correlation of interest. The resulting correlation is
$$ \breve\rho_{p_1p_2} = \frac{p_1^\top\Psi p_2}{\sqrt{(p_1^\top\Psi p_1)(p_2^\top\Psi p_2)}}, $$
as obtained from the covariance matrix of the vector of activity estimates $(p_1^\top\eta, p_2^\top\eta)^\top$,
$$ \begin{pmatrix} p_1^\top \\ p_2^\top \end{pmatrix} \Psi \begin{pmatrix} p_1 & p_2 \end{pmatrix} = \begin{pmatrix} p_1^\top\Psi p_1 & p_1^\top\Psi p_2 \\ p_2^\top\Psi p_1 & p_2^\top\Psi p_2 \end{pmatrix}. $$

⁷ It may be necessary to renumber the voxels for this partitioning.
⁸ This covariance matrix may be defined across time, trials or subjects.


The latter covariance matrix may be written
$$ \begin{pmatrix} 1_2^\top & 0^\top \\ 0^\top & 1_2^\top \end{pmatrix} \kappa \begin{pmatrix} 1_2 & 0 \\ 0 & 1_2 \end{pmatrix}, \qquad \kappa = \begin{pmatrix} \kappa_{11} & \kappa_{12} & \kappa_{13} & \kappa_{14} \\ \kappa_{21} & \kappa_{22} & \kappa_{23} & \kappa_{24} \\ \kappa_{31} & \kappa_{32} & \kappa_{33} & \kappa_{34} \\ \kappa_{41} & \kappa_{42} & \kappa_{43} & \kappa_{44} \end{pmatrix} = \begin{pmatrix} p_1^{(1)\top}\Psi_{11}p_1^{(1)} & p_1^{(1)\top}\Psi_{12}p_1^{(2)} & p_1^{(1)\top}\Psi_{11}p_2^{(1)} & p_1^{(1)\top}\Psi_{12}p_2^{(2)} \\ p_1^{(2)\top}\Psi_{21}p_1^{(1)} & p_1^{(2)\top}\Psi_{22}p_1^{(2)} & p_1^{(2)\top}\Psi_{21}p_2^{(1)} & p_1^{(2)\top}\Psi_{22}p_2^{(2)} \\ p_2^{(1)\top}\Psi_{11}p_1^{(1)} & p_2^{(1)\top}\Psi_{12}p_1^{(2)} & p_2^{(1)\top}\Psi_{11}p_2^{(1)} & p_2^{(1)\top}\Psi_{12}p_2^{(2)} \\ p_2^{(2)\top}\Psi_{21}p_1^{(1)} & p_2^{(2)\top}\Psi_{22}p_1^{(2)} & p_2^{(2)\top}\Psi_{21}p_2^{(1)} & p_2^{(2)\top}\Psi_{22}p_2^{(2)} \end{pmatrix}. $$
With this notation, the problem can be reformulated as the question of how well $\breve\rho_{p_1p_2}$ approximates $\rho_{p_1^{(1)}p_2^{(2)}} = \kappa_{14}/\sqrt{\kappa_{11}\kappa_{44}} = \rho_{14}$ as a function of $SNR(p_1) = \sqrt{\kappa_{11}/\kappa_{22}} = \varsigma_1$ and $SNR(p_2) = \sqrt{\kappa_{44}/\kappa_{33}} = \varsigma_2$. Now, $\breve\rho_{p_1p_2}$ becomes
$$ \breve\rho_{p_1p_2} = \frac{\kappa_{13} + \kappa_{14} + \kappa_{23} + \kappa_{24}}{\sqrt{\kappa_{11} + \kappa_{12} + \kappa_{21} + \kappa_{22}}\,\sqrt{\kappa_{33} + \kappa_{34} + \kappa_{43} + \kappa_{44}}}. $$
Dividing both numerator and denominator by $\sqrt{\kappa_{11}\kappa_{44}}$, using $\kappa_{13}/(\kappa_{11}\kappa_{44})^{1/2} = \rho_{13}(\kappa_{33}/\kappa_{44})^{1/2}$, $\kappa_{23}/(\kappa_{11}\kappa_{44})^{1/2} = \rho_{23}(\kappa_{22}/\kappa_{11})^{1/2}(\kappa_{33}/\kappa_{44})^{1/2}$, and $\kappa_{24}/(\kappa_{11}\kappa_{44})^{1/2} = \rho_{24}(\kappa_{22}/\kappa_{11})^{1/2}$, and making the substitutions $(\kappa_{22}/\kappa_{11})^{1/2} = 1/SNR(p_1) = 1/\varsigma_1$ and $(\kappa_{33}/\kappa_{44})^{1/2} = 1/SNR(p_2) = 1/\varsigma_2$, this is written
$$ \breve\rho_{p_1p_2} = \left(\rho_{14} + \frac{\rho_{13}}{\varsigma_2} + \frac{\rho_{24}}{\varsigma_1} + \frac{\rho_{23}}{\varsigma_1\varsigma_2}\right)\left(1 + \frac{2\rho_{12}}{\varsigma_1} + \frac{1}{\varsigma_1^2}\right)^{-1/2}\left(1 + \frac{2\rho_{34}}{\varsigma_2} + \frac{1}{\varsigma_2^2}\right)^{-1/2}. \tag{7.1} $$
Hence, if both $\varsigma_1 \to \infty$ and $\varsigma_2 \to \infty$, then $\breve\rho_{p_1p_2} \to \rho_{14}$, as anticipated. Therefore, the approximation can be accurate if the SNRs are high, or if the correlations $\rho_{13}$, $\rho_{23}$ and $\rho_{24}$ are zero. In particular, if both SNRs are "infinitely high", the approximation will be exact, regardless of these correlations.

Since $SNR(p_1) = \varsigma_1 = \sqrt{\kappa_{11}/\kappa_{22}} = (p_1^{(1)\top}\Psi_{11}p_1^{(1)}/p_1^{(2)\top}\Psi_{22}p_1^{(2)})^{1/2}$ and $SNR(p_2) = \varsigma_2 = (p_2^{(2)\top}\Psi_{22}p_2^{(2)}/p_2^{(1)\top}\Psi_{11}p_2^{(1)})^{1/2}$, the requirement that the SNRs are high is fulfilled if either (i) the estimable functions are (relatively) insensitive to activity in their noise regions—i.e., the regions for which they are (highly) sensitive are non-overlapping—or (ii) the sources in their noise regions are (relatively) silent compared to the sources in their signal regions, or (iii) a combination of both. The requirement that both SNRs should be high rules out the second possibility (ii), because the sources in the noise region of $p_1$ lie in the signal region of $p_2$. Therefore, the estimable functions should be relatively insensitive to the activity in their noise regions. The possibility of constructing such estimable functions strongly depends on the nature of the lead field matrix, and it may not be achievable to a sufficient degree (cf. the considerations in Chap. 6). As indicated before, even the most localized estimable function that was obtained in Chap. 6 was sensitive to widely distributed brain regions. Hence, we have to resort to the third possibility: a combination of low sensitivity of the estimable functions to sources in their respective noise regions and the requirement that sources in the noise regions are relatively silent. This can be attained if, apart from a few sources with high amplitude variance, most sources in the noise regions are relatively silent (have low amplitude variance), and the estimable functions are chosen such that they are particularly insensitive to the high variance sources in their noise regions.
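Equation (7.1) is cheap to explore numerically. The R sketch below evaluates it for a few SNR settings, with all nuisance correlations set to a common illustrative value r; that common value is an assumption made only for this sketch.

rho_breve <- function(rho14, s1, s2, r = 0) {
  (rho14 + r / s2 + r / s1 + r / (s1 * s2)) /
    sqrt((1 + 2 * r / s1 + 1 / s1^2) * (1 + 2 * r / s2 + 1 / s2^2))
}
rho_breve(0.5, s1 = Inf, s2 = Inf)         # 0.5: exact at infinite SNR
rho_breve(0.5, s1 = 2,   s2 = 2)           # 0.4: attenuated
rho_breve(0.0, s1 = 2,   s2 = 2, r = 0.3)  # ~0.24: spurious 'connectivity'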


The latter can be achieved as follows. By definition, for estimable $p$ there exists a corresponding sensor weight vector $h$ such that $E\{h^\top y\} = p^\top\eta$; the relation between $h$ and $p$ is given by $p^\top = h^\top\Lambda$ and $h^\top = p^\top\Lambda^+$, where $\Lambda$ is the $m \times 3d$ ($3d \gg m$) lead field matrix (see Chap. 6). From this relation it can be directly seen that $p$ can be chosen to be insensitive to any particular dipolar source desired, by choosing $h$ to be orthogonal to the column in $\Lambda$ that corresponds to this source. Obviously, there are exactly $m - 1$ such vectors $h$, $\{h_1^\perp, \ldots, h_{m-1}^\perp\}$, that are linearly independent (tacitly assuming that $\Lambda$ has full row rank). Furthermore, any linear combination of $\{h_1^\perp, \ldots, h_{m-1}^\perp\}$ will be orthogonal to the source as well. The corresponding estimable functions will all be (completely) insensitive to the signal variance of this a priori given source. We are therefore free to choose $h$ from an $m - 1$ dimensional subspace of $\mathbb{R}^m$. If $h$ is required to be insensitive to two a priori defined sources, then there are $m - 2$ linearly independent such weight vectors, and $h$ can be chosen from an $m - 2$ dimensional subspace, etc. A given estimable function can thus be 'desensitized' for at most $m - 1$ a priori given source locations. From the continuity of the lead field as a function of location, it may be anticipated that the estimable functions will also be relatively insensitive to somewhat extended regions near the sources to which they were desensitized, which somewhat relaxes the requirement that these sources are dipolar.

Thus, if one focal (dipolar) source is known to have (relatively) great amplitude variance in the noise region of $p_1$ (presumably the source that accounts for most of the variance in the signal region of $p_2$), $p_1$ has to be adapted in such a way that it becomes insensitive to that source. Similarly, $p_2$ has to be adapted such that it will be insensitive to high variance sources in its noise region (which includes the high variance source in the signal region of $p_1$).⁹ We are therefore forced to conclude that, for the approximation of $\rho_{14}$ by $\breve\rho_{p_1p_2}$ in (7.1) to be accurate, there must be a limited number ($\leq m$) of strongly focalized high variance sources, while the amplitude variance in other regions is relatively low. If this is not the case, the approximation will be inaccurate, and may lead to unjustifiable conclusions on connectivity. Furthermore, accurate estimates of the locations of these high variance sources must be available. In most cases, this information is not (accurately) available, and must be estimated from the data. This necessitates accurate source localization procedures.

⁹ One way of doing this, for example, is to orthogonally project the weight vector $h_1^\top = p_1^\top\Lambda^+$ associated with $p_1$ onto the subspace spanned by $\{h_1^\perp, \ldots, h_{m-1}^\perp\}$. The resulting estimable function will differ from $p_1$, and all we can hope for is that the difference will not be very large. Alternatively, since each signal region contains one focal source, whose location was assumed to be known in the former desensitization procedure, one might simply construct the ordinary least squares estimation operator for the source amplitudes at these a priori specified source locations; this operator is associated with a set of estimable functions (Chap. 6) that are precisely desensitized to the sources in each other's signal regions, while remaining optimally focussed on the sources in their own signal regions.

In conclusion, due to the ill-posedness of the inverse problem, strongly focussed source generators of the data are a requirement for any accurate connectivity inference based on EEG/MEG. Furthermore, their locations should be accurately determined, and this requires a high precision source estimation procedure. Beamformers and minimum norm estimation procedures seem less well suited to this than multiple dipole modelling procedures, as they are known to yield biased estimates of the source locations [49, 91, 160, 202, 229, 232], resulting in insufficient desensitization. It is interesting to note that multiple dipole model fitting procedures can be regarded as methods that simultaneously seek multiple estimable functions, of which the signal regions contain sources to which they are optimally sensitive, while being completely desensitized to the sources in the signal regions of each other. They do so while estimating the source locations from the data.
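The desensitization just described, and the projection suggested in footnote 9, amount to removing from $h$ its component along the lead field column(s) of the offending source. A minimal R sketch for a single column, with a toy lead field and an assumed column index:

set.seed(5)
m <- 10; d3 <- 45
Lambda <- matrix(rnorm(m * d3), m, d3)
h  <- rnorm(m)                        # initial sensor weight vector
l  <- Lambda[, 7]                     # lead field column of the source (assumed)
h2 <- h - l * sum(l * h) / sum(l^2)   # project h off l, so t(l) %*% h2 == 0
p2 <- drop(crossprod(Lambda, h2))     # resulting averaging kernel
p2[7]                                 # ~0: kernel is blind to source column 7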
7.2.3 Directions for improvement: increasing SNR

Following the previous discussion, for accurate connectivity estimation the EEG/MEG data analyzed must be generated by current density distributions that can be characterized by strongly focussed (dipolar) sources. Background activity should contribute relatively little to the measured signals, compared to the contribution of these focal sources.


The ideal situation of completely absent background activity seems physiologically infeasible, however, and therefore any method that is to yield reliable estimates of cortico-cortical interactivity will have to take the covariance structure of the background neuronal activity into account. Models for this background activity have been proposed in [58] and [145], but they need to be refined if they are to be used in connectivity research, because they do not incorporate the general local (sub-centimeter) correlative structure that is observed in intracranial measurements [99, 170]. More extensive incorporation of this knowledge seems therefore necessary, in addition to more precise descriptions of short range and long range neuronal dynamics. A significant problem that has not been considered at all is the correctness of the common assumption of independence of source and noise signals: if the noise consists of both instrumental noise and background neuronal activity that is not of particular interest (since it is of low amplitude), then it seems justifiable to assume that the instrumental noise is independent of the brain signals.¹⁰ For background neuronal activity this assumption is likely to be violated, however. Covariances between background activity and activity of major sources due to such dependencies should be taken into account.

¹⁰ This assumption does not hold, for example, if the instrumental noise depends on the amplitude of the measurements. But this may be of minor importance.

It might be possible to increase the signal to noise ratio by means of stimulus manipulation. For example, it is well known that background alpha activity becomes synchronized to light flash stimulations repeated at an appropriately chosen frequency. The phase locking in the alpha band is likely to entrain many neurons to fire in synchrony, resulting in sources with relatively high amplitude variance across time. Alternatively, it may be possible to experimentally induce high amplitude variance in sources of interest across trials. For example, it is known that the amplitude of the P300 ERP component to the deviant stimulus in an oddball paradigm is modulated as a function of the expectation of the subject on the probability of the appearance of the deviant stimulus. It may be possible to manipulate the expectation of the subject by cuing this probability, thereby presumably modulating the amplitude of the underlying source(s). Similarly, somatosensory evoked potentials are amplitude modulated by the intensity of the stimulus [224]. It may also be possible to use source amplitude variance across different subjects to increase the signal to noise ratio. This is complicated by the requirement of anatomical homogeneity of subjects, while neuro-anatomical inhomogeneity of subjects is widely appreciated. One directive for a solution to this complication is to make the subject's anatomy an integral part of the forward problem. This can be done by using MRI scans of each subject, and warping the anatomy of each to a canonical anatomy space in which the dipole localization takes place. Inverse warping is then used to align the canonical source parameters with the individual anatomies of the subjects. Procedures for warping subject anatomy to a canonical (Talairach) anatomy are well developed in fMRI research [46]. Such a development would be of significant interest to electromagnetic source analysis of ERPs/ERFs as well.

To evaluate the accuracy of the connectivity estimates obtained from linearly combining EEG/MEG sensor measurements, regardless of the estimation procedure used (be it a beamformer, minimum norm, or the methods developed in this thesis), it is recommendable to analyze the resolution fields of the estimable functions that are associated with these linear combinations. Graphical displays of the resolution fields of the sensor weighting vectors in multiple dipole models give an impression of the extent of the desensitized regions, while displays of, e.g., beamformer resolution fields help prevent drawing too strong conclusions on connectivity when these resolution fields indicate sensitivity to the same sources. Because for functional connectivity estimation the sources must be (dipolarly) focussed in nature, procedures are necessary to determine whether this requirement is fulfilled, in order to evaluate the accuracy of connectivity estimates. Currently, no procedures seem to be available that are specifically designed for this purpose. Directions for development may be found in the goodness of fit statistics and model selection procedures developed in [237, 238].


It may be possible to fit covariance structures for connectivity estimation in the time domain, to relieve the requirement that the signals are wide sense stationary for a long duration, which is needed to justify the assumption of independence of Fourier coefficients at different frequencies. Without a parametric model for the data covariance, such as the Kronecker product structure models proposed in [55, 109], not only would the computational demands of such an undertaking be very high, but the number of trials required would be prohibitive. For short time segments, it may be realistic to assume that the connectivity between a number $d$ ($< m$) of cortical regions remains approximately stable, while the temporal dynamics are approximately equal in these areas. Let these areas be modelled by dipolar sources, with amplitude time functions $\eta_1(t), \ldots, \eta_d(t)$. Collect the amplitudes of $n$ samples of the $a$-th source in the vector $\eta_a^\top = [\eta_a(1), \ldots, \eta_a(n)]$, and collect the $d$ amplitude vectors in the vector $\eta = \mathrm{vec}\{[\eta_1, \ldots, \eta_d]^\top\}$. The connectivity between the sources may then be described by a spatial connectivity matrix $C$, and the temporal dynamics may be described by the autocorrelation matrix $\mathrm{Cor}(\eta_a, \eta_a) = R$, $a = 1, \ldots, d$. The covariance of $\eta$ can then be expressed as $\mathrm{Cov}(\eta, \eta) = R \otimes C$. Invoking the data model $y = [y(1)^\top, \ldots, y(n)^\top]^\top = [I_n \otimes \Lambda(\theta)]\eta + [\varepsilon(1)^\top, \ldots, \varepsilon(n)^\top]^\top$, the covariance of $y$ can be expressed as $\mathrm{Cov}(y, y) = [I_n \otimes \Lambda(\theta)](R \otimes C)[I_n \otimes \Lambda(\theta)]^\top + \Upsilon_\varepsilon$, where $\Upsilon_\varepsilon$ is the covariance matrix of the vector $[\varepsilon(1)^\top, \ldots, \varepsilon(n)^\top]^\top$, and $\theta$ represents the source parameters. Following [55], if we can express $\Upsilon_\varepsilon$ as a Kronecker product of a spatial covariance matrix $X$ and a temporal correlation matrix $T$, i.e., $\Upsilon_\varepsilon = T \otimes X$, we may write $\mathrm{Cov}(y, y) = R \otimes [\Lambda(\theta)C\Lambda(\theta)^\top] + T \otimes X$. If we furthermore make the greatly simplifying assumption that $R \approx T$ for the short time segment under investigation, the covariance structure reduces to $\mathrm{Cov}(y, y) = R \otimes [\Lambda(\theta)C\Lambda(\theta)^\top + X]$, which allows the development of an algorithm similar to that given in [55]. Concentrated likelihood methods may be used to increase computational efficiency.

A further way to make time domain covariance structure models feasible is to reduce the number of sensors that is incorporated in the analysis, thereby reducing the number of covariances that have to be estimated, and hence stabilizing the sample covariance matrix. If approximate locations are known a priori, the best possible estimates with a limited fixed number of sensors can be obtained by optimally positioning the sensors with respect to these hypothesized locations. In the case of MEG, the sensor positions are fixed, and an optimal subset of the sensors has to be selected. Methods for this optimal design problem and for optimal sensor subset selection were developed in [110]. Based on the discussion of bias in Chap. 3, it should be noted that the finite sample bias of the source amplitude covariances will increase as the number of sensors decreases, and hence an optimal tradeoff between limiting the number of sensors (hence, the stability of the covariance matrix) and the bias in the amplitude covariance estimates has to be found.
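Returning to the separable covariance structure derived above, the following R sketch assembles $\mathrm{Cov}(y, y) = R \otimes [\Lambda(\theta)C\Lambda(\theta)^\top + X]$ under the simplifying assumption $R = T$; all matrices and sizes here are toy assumptions.

set.seed(6)
m <- 6; d <- 2; n <- 4
Lambda <- matrix(rnorm(m * d), m, d)        # fixed-orientation leads (assumed)
C <- matrix(c(1, 0.4, 0.4, 1), d, d)        # spatial connectivity matrix
X <- diag(m)                                # spatial noise covariance
R <- 0.7^abs(outer(1:n, 1:n, "-"))          # temporal autocorrelation matrix
Sigma_y <- R %x% (Lambda %*% C %*% t(Lambda) + X)   # Cov(y, y), mn x mn
dim(Sigma_y)                                # 24 24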
5 this is alike the principle components analysis method in which dipole sources were fitted to the component loadings [151], which was shown to yield strongly biased results [1, 53]. Similar biased results are to be expected for the ICA dipole localization method, since, if two separate localized cortical regions are driven by the same ‘independent component’ input, ICA yields a single vector of component loadings that is a mixture of the lead field gains of the two regions. More complex situations are easily imagined. The method in [37] corrects for this effect in an ad hoc way, but is not very principled in its statistical foundations. A class of methods in which the same idea is exploited, but treats it in a statistically systematic fashion, is known as signal subspace fitting. Basically, these methods obtain a matrix from the data, of which linear span of the columns forms a consistent estimate of the signal subspace. The methods differ in the way in which this matrix is determined. One prominent example is the matrix formed by the first d principal vectors obtained in a PCA. To this matrix, the best linearly transformed array manifold of a multiple source model is fitted, thus aligning the estimated signal subspace and column space of the array manifold [180]. Both the best linear transformation and the source

108

7 Summary and discussion: requirements for accurate connectivity estimation

parameters are determined in this fit. A lot of methods can be brought under the umbrella of signal subspace fitting, including various types of beamformers and MUSIC [134, 180]. A well developed optimal signal subspace fitting method exist only for PCA however, called weighted subspace fitting (WSF), but may be extendible to ICA. Connectivity indices would then be provided by the estimated best linear transformation of the sensor array manifold. This requires a general asymptotic theory for ICA, which is currently not well developed.

Grasman, R.P.P.P. (2004) "Sensor array signal processing and the neuro-electromagnetic inverse problem in functional connectivity analysis of the brain", PhD-thesis, University of Amsterdam

A Some operations on complex matrices

Some of the results and their proofs presented here and elsewhere in this thesis rely on those presented in other references—this is always explicitly stated, and a reference is provided. To save space, only proofs are given if a proof of the result could not be found in the literature. In most other cases only a reference is given.

A.1 Algebra of certain operators on complex matrices Definition A.1. [6, ch. 1] Let Z = A + iB, with A, B ∈ Rpq . Then [Z]R is defined by  A . [Z]R = B Definition A.2. [26, sect. 3.7] Let Z = A + iB, with A, B ∈ Rpq . Then {Z}R is defined by  A −B . {Z}R = B A

Lemma A.3. Let Z, W ∈ Cpq and Z = A + iB, W = C + iD, then {Z}R = {W }R ⇐⇒ Z = W . Proof. By definition Z = W =⇒ {Z}R = {W }R . Conversely, {Z}R = {W }R implies that A = C and B = D, and hence that Z = A + iB = C + iD = W .   Theorem A.4. Let Z, W ∈ Cpq and Z = A + iB, W = C + iD, A, B, C, D ∈ Rpq . Let U ∈ Cpr , U = E + iF , E, F ∈ Rpr , and α, β ∈ R. Then [W + Z]R = [W ]R + [Z]R ,

(A.1)

[αZ]R = α[Z]R if α ∈ R,

(A.2)

[(α + iβ)Z]R = α[Z]R + β[iZ]R ,

(A.3)









[Z]R [U ]R = (Z U ),

(A.4) 

[iZ]R [U ]R = (Z U ) = −[Z]R [iU ]R .

(A.5)

Proof. Equations (A.1)-(A.3)

  are checked by writing out. To prove (A.4), on the one hand [Z]R [W ]R = A , B  C  , D  = A C + B  D, on the other hand Z ∗ W = (A + iB) (C + iD) = ∗ (A − iB) (C + iD) = (A C + B  D) + i(A D − B  C). Identifying A C + B  D

), the con

=  (Z W   clusion follows. Similarly for (A.5), [iZ]R [W ]R = [−B + iA]R [C + iD]R = −B , A C  , D =

  −B  C + A D = −(Z ∗ W ), and [Z]R [iW ]R = [A + iB]R [−D + iC]R = A , B  −D , C  = −A D + B  C = −[iZ]R [W ]R . (Equation (A.4) was given in [26, exc. 3.10.32])  

110

A Some operations on complex matrices

Theorem A.5. Let Z, W ∈ Cpq , Z = A+iB, W = C+iD, A, B, C, D ∈ Rpq , U ∈ Cqr , U = E+iF , E, F ∈ Rqr , and α, β ∈ R. Then it may be verified that {Z + W }R = {Z}R + {W }R , 

{αZ}R = α{Z}R , 

{(α + iβ)Z}R = α{Z}R + β{iZ}R ,  

{Z }R = {Z}R ,



{Z}R = {Z }R ,

{ZU }R = {Z}R {U }R ,

{Z

−1

−1



{Z }R = {Z}R ,

}R = {Z}R if p = q and Z

−1

(A.6) (A.7)

exist

(A.8)

Proof. To verify (A.8), write down in matrix form: {ZU }R = {(AE − BF ) + i(BE + AF )}R    AE − BF −BE − AF A −B E −F = = BE + AF AE − BF B A F E = {Z}R {U }R . The last equality follows from I2p = {Ip }R = {ZZ −1 }R = {Z}R {Z −1 }R , and I2p = {Z}R {Z −1 }R iff {Z}R−1 = {Z −1 }R . Some of these equalities are given in [26, lemma 3.7.1].   Definition A.6 (MP-inverse). [6, 26] For the matrix Z ∈ Cpq and W ∈ Cqp , if W satisfies ZW Z = Z

W ZW = W





(ZW ) = ZW

i+ii)

(W Z) = W Z

iii+iv)

then W is called the Moore-Penrose generalized matrix inverse (MP-inverse) of Z and is denoted Z + . It can be shown that the MP-inverse exists for any Z ∈ Cpq and is unique. Theorem A.7. {Z + }R = {Z}R+ Proof. By verifying the real valued counterparts of properties i)-iv) for {Z + }R using (A.8) and (A.7): i) {Z}R {Z + }R {Z}R = {ZZ + Z}R = {Z}R , ii) {Z + }R {Z}R {Z + }R = {Z + ZZ + }R = {Z + }R , ∗ ∗ ∗ iii) ({Z}R {Z + }R ) = {Z + }R {Z}R = {Z + }R {Z ∗ }R = {Z + Z ∗ }R = {(ZZ + ) }R = {ZZ + }R = ∗ ∗ ∗ {Z}R {Z + }R , iv) ({Z + }R {Z}R ) = {Z}R {Z + }R = {Z ∗ }R {Z + }R = {Z ∗ Z + }R = {(Z + Z) }R = + + + {Z Z}R = {Z }R {Z}R . Hence {Z }R satisfies all the properties of the MP-inverse of {Z}R . Therefore, by uniqueness of the MP-inverse, it must be that {Z + }R = {Z}R+ .   Lemma A.8. Let Z = A + iB where A, B ∈ Rpq , and let W = C + iD with C, D ∈ Rqr . Then {Z}R [W ]R = [ZW ]R

and [Z ∗ ]R {W }R = [(ZW )∗ ]R , ∗ 

∗ 

{ZW }R = [Z]R [W ]R + [iZ]R [iW ]R .

(A.9) (A.10)



  A −B C AC − BD Proof. {Z}R [W ]R = = = [(A + iB)(C + iD)]R = [ZW ]R (see B A D BC + AD also [6]). From this and from equation (A.7), [Z ∗ ]R {W }R = ({W }R [Z ∗ ]R ) = ({W ∗ }R [Z ∗ ]R ) = [(ZW )∗ ]R .

 To prove (A.10) , rewrite in component form and use [iW ∗ ]R = D , C : 

AC − BD −AD − BC {ZW }R = {(AC − BD) + i(BC + AD)}R = BC + AD −BD + AC       AC −AD −BD −BC A −B C , −D + D,C = + = BC −BD AD AC B A = [Z]R [W ∗ ]R + [iZ]R [iW ∗ ]R .  

Grasman, R.P.P.P. (2004) "Sensor array signal processing and the neuro-electromagnetic inverse problem in functional connectivity analysis of the brain", PhD-thesis, University of Amsterdam

A.1 Algebra of certain operators on complex matrices

111

Before obtaining results for the vec{} operator and the Kronecker product, consider two special matrices: 

 0 −In Hn = √12 In , iIn Definition A.9. Jn = {iIn }R = In 0 These matrices have the following properties: Lemma A.10. Let Z be p × q. Jn = −Jn , Jp {Z}R = {iZ}R = {Z}R Jq , Jp {Z}R Jq = {Z}R , Jn Jn = Jn Jn = I2n , Jn Jn = −I2n , (Jn ⊗ Jn ) = Jn ⊗ Jn , 12 (I4N 2 ± Jn ⊗ Jn ) is idempotent and symmetric. Proof. Jn = {iIn }R = {iIn }R = {−iIn }R = −{iIn }R = −Jn , Jp {Z}R = {iIp }R {Z}R = {iZ}R = {Z}R Jq , Jp {Z}R Jq = {iIp }R {Z}R {−iIq }R = {−i2 Z}R = {Z}R , the next equation is a special case of the previous, the next equation follows from i2 = −1. (Jn ⊗ Jn ) = Jn ⊗ Jn = (−Jn ) ⊗ (−Jn ) = (−1)2 (Jn ⊗Jn ). The last equality follows from (I4n2 ±Jn ⊗Jn )2 = I4n2 ±2Jn ⊗Jn +(Jn ⊗Jn )(Jn ⊗ Jn ) = I4N 2 + (Jn Jn ⊗ Jn Jn ) ± 2Jn ⊗ Jn = I4n2 + (−I2n ) ⊗ (−I2n ) ± 2Jn ⊗ Jn = 2(I4n2 ± Jn ⊗ Jn ),   symmetry follows from (Jn ⊗ Jn ) = (Jn ⊗ Jn ). Lemma A.11. Hn Hn ∗ = H n Hn  = In , Hn Hn = H n H n  = 0, and Hn+ = Hn ∗ .

  Proof. First Hn Hn ∗ = 12 In , iIn In , −iIn = 12 (In + (−i2 )In ) = In , H n Hn = Hn Hn ∗ = In =

  In . Next Hn Hn = 12 In , iIn In , iIn = 12 (In + i2 In ) = 0 and H n H ∗ n = Hn Hn = 0. Finally Hn+ = Hn ∗ (Hn Hn ∗ )+ = Hn ∗ (In )+ = Hn ∗ .   Next relations are obtained between vec{Z} and Z ⊗ W , and {Z}R and [Z]R . Sometimes for conciseness the subscript from In , Hn and Jn is dropped (i.e. they are denoted I, H and J) when the dimensions are clear from the context. Lemma A.12. Z = Hn {Z}R Hm ∗ , where Z is n × m. Proof. Let Z = A + iB, then Hn {Z}R

H∗

m

=

1 2

A) = A + iB = Z.

   A −B Im In iIn = 12 (A + iB + iB + −iIm B A  

Corollary A.13. vec{Z} = (H m ⊗ Hn )vec{{Z}R }. Proof. This follows from Lem. A.12 and the equality vec{ABC} = (C  ⊗ A)vec{B}.

 

The matrix H m ⊗ Hm shares many of the properties of Hm , as the following lemma shows. ∗

Lemma A.14. (H m ⊗ Hm )(H m ⊗ Hm ) = Im2 , (H m ⊗ Hm )(H m ⊗ Hm ) = 0, (H m ⊗ Hm )+ = ∗ (H m ⊗ Hm ) . Proof. Direct from Lem. A.11.

 

Lemma A.15. Hn ∗ ZHm = 12 {Z}R + 12 i{−iZ}R . Proof. Write Z = A + iB, A, B ∈ Rnm ,  

 1 A + iB −B + iA 1 In ∗ Z Im iIm = Hn ZHm = 2 −iIn 2 B − iA A + iB   1 1 1 1 A −B B A + i = {Z}R + i{−iZ}R . = 2 B A 2 −A B 2 2   Corollary A.16. {Z}R = Hn ∗ ZHm + Hn ZH m .

112

A Some operations on complex matrices

Proof. This follows from Lem. A.15 and adding conjugates.

 

Lemma A.17. Let Z be n × m, then 1 (H  ZH) ⊗ (H ∗ ZH) + (H ∗ ZH) ⊗ (H  ZH) = ({Z}R ⊗ {Z}R + {iZ}R ⊗ {iZ}R ). 2 Proof. This follows from Lem. A.15, and adding conjugates.

 

Note from Lem. A.17 that 12 ({Z}R ⊗{Z}R +{iZ}R ⊗{iZ}R ) = 12 (I4m2 +Jn ⊗Jn )({Z}R ⊗{Z}R ) may also be written (H  ⊗ H ∗ )(Z ⊗ Z)(H ⊗ H) + (H ∗ ⊗ H  )(Z ⊗ Z)(H ⊗ H), therefore lemma A.17 relates (Z ⊗ Z) to ({Z}R ⊗ {Z}R ). Lemma A.18. HJ = iH, HJ = −iH, (H ⊗ H)(J ⊗ J) = H ⊗ H. Corollary A.19. (I + J ⊗ J)(H ∗ ⊗ H  ) = 2(H ∗ ⊗ H  ) and (I − J ⊗ J)(H ∗ ⊗ H  ) = 0. √ Lemma A.20. 2[Z]R = H ∗ Z + H  Z.



 Proof. This follows from H ∗ Z = √12 I −iI (A + iB) = √12 A + iB  B  − iA and adding conjugates.   Lemma A.21. J[Z]R = [iZ]R . Proof. J[Z]R = {iI}R [Z]R = [iZ]R by lemma A.9.

 

A.2 Commutation and Conjugation matrices, and the vech{} operator Definition A.22 (Commutation matrix). [150, ch. 3] Since vec{A} and vec{A } for a p × p matrix consist of the same elements, the difference being the order of arrangement, it is obvious that there must exist a matrix Kp such that Kp vec{A} = vec{A } . This matrix is called the commutation matrix of order p. Lemma A.23. Some properties of Kp are listed. 1. Kp−1 = Kp  , therefore Kp  Kp = Ip . 2. For any p × q matrices A and B, Kp (A ⊗ B) = (B ⊗ A)Kq . 3. Kp = Kp  . Proof. Proofs are found in [150, ch. 3].

 

Since Kp vec{A} = vec{A }, where A = (aij ) is an arbitrary p × p matrix, it should be clear that Kp consists of only zeros and ones, and (only) those elements of vec{A} that correspond to the diagonal elements of A remain unchanged since transposition of A alters the locations of all but the diagonal elements. Therefore the following result is obtained. Lemma A.24. Assume p > 1, and A is p × p. Then Kp has elements equal only to 0 or 1 and has precisely p diagonal elements equal to 1. Proof. Note that (vec{A})i+(j−1)p = aij . Also, (Kp vec{A})i+(j−1)p = (vec{A })i+(j−1)p = a ij = aji = (vec{A})j+(i−1)p . Because the aij are arbitrary, the j +(i−1)p-th element of the i+(j −1)pth row of Kp must equal 1 while the other elements of this row must equal 0. Furthermore (vec{A})i+(j−1)p = (Kp vec{A})i+(j−1)p if and only if i+(j−1)p = j+(i−1)p ⇐⇒ (j−i)(p−1) = 0. Since p > 1 this is the case if and only if i = j. This occurs precisely p times because i, j = 1, . . . , p.  

Grasman, R.P.P.P. (2004) "Sensor array signal processing and the neuro-electromagnetic inverse problem in functional connectivity analysis of the brain", PhD-thesis, University of Amsterdam

A.2 Commutation and Conjugation matrices, and the vech{} operator

113

Before obtaining the elements of Kp (A⊗B) for p×p matrices A and B, the following definition obtains the elements of A ⊗ B: Definition A.25. [29] Let A and B be p × p matrices, then def

(A ⊗ B)ij,gh = (A ⊗ B)i+(j−1)p,g+(h−1)p = ajh big . Lemma A.26. Let A and B be p × p matrices, then (Kp (A ⊗ B))ij,gh = aih bjg . Proof. The g + (h − 1)p-th column of K(A ⊗ B) is Kp times the g + (h − 1)p-th column of A ⊗ B. Denote this column (A ⊗ B)·,gh , then Kp (A ⊗ B)·,gh = Kp (A.h ⊗ B·g ) = Kp (A·h ⊗ B·g ) = Kp vec{B·g (A·h ) } = vec{A·h (B·g ) } = B·g ⊗ A·h . The i + (j − 1)p-th element of this column is bjg aih and the result follows.   Definition A.27 (conjugation matrix). Let  Ip 0 . Lp = 0 −Ip This will be called the conjugation matrix for reasons that become clear in what follows. This matrix has the properties Lemma A.28. Let Lp be the conjugation matrix of Def. A.27, and let Z = A + iB, for real p × q matrices A and B. Then Lp [Z]R = [Z]R , Lp {Z}R Lq = {Z}R , L2p = Lp Lp = Ip , Lp {Z}R = {Z}R Lq , and {Kp }R Lp2 = Lp2 {Kp }R .  

Proof. This is straightforwardly verified. The following theorem is analogues to Thm. 11 in [150, ch. 3]. Theorem A.29. Let Np = 12 (I2p2 + {Kp }R Lp2 ). The matrix Np has the properties Np = Np = Np2 rank(Np ) = tr{Np } = p

(A.11) 2

Np ({K}R Lp2 ) = Np = ({K}R Lp2 )Np

(A.12) (A.13)

Proof. Subscripts are omitted if unambiguous. First Np = ( 12 (I + {Kp }R Lp2 )) = 12 (I +  Lp2 {Kp }R ) = 12 (I + Lp2 {K p }R ) = 12 (I + {Kp }R Lp2 ) = Np . Also, Np2 = 14 (I 2 + 2{K}R Lp2 + {Kp }R Lp2 {K}R Lp2 ) = 12 (I+{Kp }R Lp2 ) = Np , because {Kp }R Lp2 {Kp }R Lp2 = {Kp }R L2p2 {Kp }R = {Kp }R {Kp }R = {Kp2 }R = I. To prove the second equation, obtain the structure of Np : 1 Np = 2

 0 Ip2 + Kp . 0 Ip2 − Kp

Then, from lemma A.24 and the fact that rank(A) = tr{A} for any idempotent matrix [150, th. 1.21], rank(Np ) = tr{Np } = tr{I + Kp }/2 + tr{I − Kp }/2 = (p2 + p)/2 + (p2 − p)/2 = p2 . The third equation follows again from {Kp }R Lp2 {Kp }R Lp2 = I.   Lemma A.30. Let Z ∈ Cp×q respectively. Then Np {Z ⊗ Z}R = {Z ⊗ Z}R Nq . Proof. Np {Z⊗Z}R = 12 (I+{Kp }R Lp2 ){Z⊗Z}R = 12 ({Z⊗Z}R +{Kp (Z⊗Z)}R Lq2 ) = {Z⊗Z}R Nq .  

114

A Some operations on complex matrices

Lemma A.31. Let Z ∈ Cp×p be Hermitian, i.e. Z = Z ∗ . Then Np [vec{Z}]R = [vec{Z}]R . Proof. Np [vec{Z}]R = 12 (I + {Kp }R Lp2 )[vec{Z}]R . Now,   

vec{Z  }

vec{Z}

Kp vec{Z} = = . {Kp }R Lp2 [vec{Z}]R = {Kp }R [vec{Z}]R =  vec{Z}  Kp vec{Z}  vec{Z  }   The matrix Np is very similar to the matrix defined in Thm. 11 in [150, ch. 3]. Definition A.32 (complex structure duplication matrix). Suppose Σ is p × p Hermitian. Define vech{Σ} to be the operator that stacks the upper triangular elements of (Σ) on top of the above diagonal elements of (Σ) into a vector of length p2 . Obviously there must exist a matrix Dp such that [vec{Σ}]R = Dp vech{Σ}. This matrix will be called the vech{} duplication matrix. The matrix Dp is an extension to the vech{} operator of the duplication matrix for the vec{} operator defined in [29] and [150, sect. 3.8]. In particular, as for the symmetric case, the Hermitian structure of Σ does not restrict vech{Σ}. Therefore Dp must have full column rank p2 and its MP-inverse is given by Dp+ = (Dp Dp )−1 Dp . The following theorem and its proof are entirely analogues to Thm. 12 in [150, ch. 3] and have been adapted therefrom. Theorem A.33. Let Dp be the complex structure duplication matrix, then ({Kp }R Lp2 )Dp = Dp 1 Dp Dp+ = (I2p2 + {Kp }R Lp2 ) = Np . 2

(A.14) (A.15)

Proof. Let Σ be Hermitian. Then {Kp }R Lp2 Dp vech{Σ} = {Kp }R Lp2 [vec{Σ}]R = [vec{Σ}]R . The last equality follows from the facts that vec{ (Σ)} = vec{ (Σ) } = Kp vec{ (Σ)} and vec{(Σ)} = vec{−(Σ) } = −vec{(Σ) } = −Kp vec{(Σ)}. From the definition [vec{Σ}]R = 2 Dp vech{Σ}, and since vech{Σ} may freely vary in Rp , the first equation follows. From the first equation it follows that Np Dp = Dp . From theorem A.29 rankNp = p2 = rank(Dp ). Then by theorem 2.8 in [150, ch. 2] Np = Dp Dp+ . (This proof is modelled after the proof of theorem 3.12 in [150, ch. 3].)   The next theorem is the analogue of Thm. 13 in [150, ch. 3]. Theorem A.34. Let Z be a p × p matrix. Then Dp Dp+ {Z ⊗ Z}R Dp = {Z ⊗ Z}R Dp Dp Dp+ {Z



 Z}R Dp+

= {Z ⊗

(a)

 Z}R Dp+

(b)

⊗ Z −1 }R Dp

(c)

and if Z is non-singular, (Dp+ {Z ⊗ Z}R Dp )−1 = Dp+ {Z (Dp {Z ⊗ Z}R Dp )−1 = Dp+ {Z

−1

−1



⊗ Z −1 }R Dp+ .

(d)

Grasman, R.P.P.P. (2004) "Sensor array signal processing and the neuro-electromagnetic inverse problem in functional connectivity analysis of the brain", PhD-thesis, University of Amsterdam

A.3 Some results concerning the Moore-Penrose inverse

115

Proof. The proof is the almost same as that of theorem 13 in [150, ch. 3]: (a) and (b) follow from N p Dp = Dp ,

Dp Dp+ = Np ,

(A.16)

Np Dp+  = Dp+  ,

Np {Z ⊗ Z}R = {Z ⊗ Z}R Np ,

(A.17)

where the equality on the lower-right follows from Lem. A.30. Result (c) is proved my multiplication: Dp+ {Z ⊗ Z}R Dp Dp+ {Z

−1

⊗ Z −1 }R Dp

= Dp+ {Z ⊗ Z}R Np {(Z ⊗ Z)−1 }R Dp = Dp+ {Z ⊗ Z}R {Z ⊗ Z}R−1 Np Dp = Dp+ Dp = Ip2 . The last result (d) is obtained from (c) and Dp+ = (Dp Dp )−1 Dp : (Dp {Z ⊗ Z}R Dp )−1 = (Dp Dp Dp+ {Z ⊗ Z}R Dp )−1 = (Dp+ {Z ⊗ Z}R Dp )−1 (Dp Dp )−1 = Dp+ {Z

−1

⊗ Z −1 }R Dp (Dp Dp )−1 = Dp+ {Z

−1

⊗ Z −1 }R Dp+ .  

A.3 Some results concerning the Moore-Penrose inverse Lemma A.35. Suppose P 2 = P = P ∗ , P A = AP and A+ P = P A+ . Then (AP )+ = P A+ . Proof. Verify the defining properties of the Moore-Penrose inverse (p.110): (P A+ )(AP )(P A+ ) = ∗ ∗ P 3 A+ AA+ = P A+ and ((AP )(P A+ )) = P ∗ (AA+ ) P ∗ = P AA+ P = (AP )(P A+ ) and similarly ∗ (AP )(P A+ )(AP ) = AP and ((P A+ )(AP )) = (P A+ )(AP ).   Corollary A.36. Let A be a real p × p matrix. Then ((A ⊗ A)Np )+ = Np (A ⊗ A)+ . Proof. Note that Np = Np = Np2 , Np (A ⊗ A) = (A ⊗ A)Np , and Np (A ⊗ A)+ = Np (A+ ⊗ A+ ) = (A ⊗ A)+ Np , so that lemma A.35 implies that ((A ⊗ A)Np )+ = (A ⊗ A)+ Np .   Corollary A.37. Let Z be a complex p × p matrix. Then ({Z ⊗ Z}R Np )+ = Np {Z ⊗ Z}R+ . Proof. The proof is the same as the in the corollary above, with help of Lem. A.30.

 

Lemma A.38. Let Z be a complex n × p matrix, p < n, and V an positive definite Hermitian n × n matrix. Then (Z ∗ V Z)(Z ∗ V Z)+ = Z + Z. Proof. Let R∗ R = V be the Cholesky factorization of V , and let B = {Z ∗ }R {R∗ }R = {(RZ)∗ }R = {RZ}R , then BB  = {RZ}R {RZ}R = {Z ∗ R∗ RZ}R = {Z ∗ V Z}R . Using the fact that {Z + }R = {Z}R+ and B  (BB  )+ = B + [150, Thm. 2.3], {(Z ∗ V Z)(Z ∗ V Z)+ }R = BB  (BB  )+ = BB + = {RZ}R {RZ}R+ = ({Z}R {R}R )({Z}R {R}R )+ . Since |{R}R {R}R | = |{R}R ||{R}R | = 0, the latter equals {Z}R {Z}R+ by [150, Thm. 2.7]. Finally {Z}R {Z}R+ = ({Z}R+ {Z}R ) = {Z}R+ {Z}R = {Z + }R {Z}R = {Z + Z}R , hence lemma A.3 implies that (Z ∗ V Z)(Z ∗ V Z)+ = Z + Z.   Lemma A.39. Let Z ∈ Cm×n . Then rank(Z) = r ⇐⇒ rank({Z}R ) = 2r. Proof. First, rank({Z}R ) = rank({Z}R {Z}R ) and rank(Z) = rank(Z ∗ Z). From [26, lem. 3.7.1] {Z}R {Z}R = {Z ∗ Z}R has the same eigenvalues as Z ∗ Z with multiplicity 2. Since Z ∗ Z is Hermitian, the number of nonzero eigenvalues of Z ∗ Z equals its rank [199, Thm. 3.11].   Lemma A.40. [199, Thm. 5.9] Let A and B be complex m × r and r × n matrices respectively, with rank(A) = rank(B) = r, then (AB)+ = B + A+ .

116

A Some operations on complex matrices

Proof. A proof is given in [199] for the real case, which readily translates into the complex case through Thm. A.7 (p. 110)by {(AB)+ }R = {AB}R+ = ({A}R {B}R )+ , lemma A.39 then implies rank({A}R ) = rank({B}R ) = 2r. Since {A}R is 2m × 2r and {B}R is 2r × 2n therefore, ({A}R {B}R )+ = {B}R+ {A}R+ = {B + }R {A+ }R = {B + A+ }R . Lemma A.3 (p. 109) now yields the result.   Lemma A.41. Let Z be a complex n × p matrix, p < n, and M any nonsingular n × n matrix. Suppose rank(Z) = r. Then (Z ∗ M Z)(Z ∗ M Z)+ = Z + Z. Proof. Note that {Z ∗ M Z}R = {Z}R  {M }R {Z}R . By [199, cor. 1.9.1] there exists an 2n × 2r matrix F and an 2r × 2p matrix G, both of rank 2r, such that F G = {Z}R . Therefore, by Lem. A.40, ({Z}R  {M }R {Z}R )+ = (G F  {M }R F G)+ = G+ (F  {M }R F )−1 G + . Hence, we have {(Z ∗ M Z)(Z ∗ M Z)+ }R = {(Z ∗ M Z)}R {(Z ∗ M Z)}R+ = G (F  {M }R F )GG+ (F  {M }R F )−1 G + = G G + = (G+ G) = G+ G, where the fact that GG+ = Ir was used. Now G+ G = G+ F + F G, because F + F = Ir as well. Hence G+ G = (F G)+ F G = {Z}R+ {Z}R = {Z + Z}R , from which the result follows by Lem. A.3 (p. 109).   Lemma A.42. Let Z ∈ Cm×n . Then Z + = (Z ∗ Z)+ Z ∗ . Proof. A proof for the real case is given in [150, th. 2.5]. The complex case follows from {Z + }R = {Z}R+ = ({Z}R {Z}R )+ {Z}R = {Z ∗ Z}R+ {Z ∗ }R = {(Z ∗ Z)+ Z ∗ }R .   Lemma A.43. Let A and B be m × m matrices, such that AB = BA = 0, then the nonzero eigenvalues of A and B and their associated eigenvalues are eigenvalues*and associated * eigenvectors of A + B. Furthermore, if A + B is nonsingular, then |A + B| = ra=1 λa (A) m−r a=1 λa (B), where r = rank(A), and λa (A) is the a-th eigenvalue of A such that λa (A) ≥ λa+1 (A) and λa (B) defined similarly. Proof. From Ax = λx, (A + B)x = Ax + Bx = λx + B(Ax/λ) = λx. The same argument applies to Bx =*λx. The second part of the lemma follows directly from this and the facts that |A + B| = m a=1 λa (A + B) and m = rank(A + B) = rank(A) + rank(B) by the indicated assumptions on A, B, and A + B.  

A.4 Further results on the operator {}R In this section some results for the trace and determinant of {Z}R and the trace and determinant of Z are found, and relations with the conjugation, commutation and vech{} duplication matrix are established. Let d denote the differential operator defined in [150, ch. 5]. Lemma A.44. Let Z = A + iB ∈ Cpq , then d{Z}R = { dZ}R . Lemma A.45. Let Z ∈ Cpp , then tr{{Z}R } = 2 tr{Z}. 

Z −Z Proof. tr{{Z}R } = tr{ } = 2 tr{ Z} = 2 tr{Z}. Z Z Lemma A.46. Let Z ∈ Cpp and let Np be the matrix of Thm. A.29. Then 1 2 tr{{Z}R } = tr{Z}.

  tr{Np {Z}R } =

Proof. From Thm. A.29 (p. 113), 2 tr{Np {Z}R } = tr{(I + Lp2 {Kp }R ){Z}R } = tr{{Z}R } + tr{Lp2 {Kp Z}R }   = tr{{Z}R } − ( tr{ Kp Z} − tr{ Kp Z}) = tr{{Z}R }. Theorem A.47. Let Z ∈ Cpp such that |Z| = 0, then |{Z}R | = ( Mod |Z|)2 .

Grasman, R.P.P.P. (2004) "Sensor array signal processing and the neuro-electromagnetic inverse problem in functional connectivity analysis of the brain", PhD-thesis, University of Amsterdam

A.4 Further results on the operator {}R

117

Proof. From d|W | = |W | tr{W −1 dW } for any square matrix whose determinant = 0 [150, ch. 8], we find d|{Z}R | = |{Z}R | tr{{Z}R−1 d{Z}R } = |{Z}R |2 tr{Z −1 dZ} = |{Z}R |( tr{Z −1 dZ}+ tr{Z

−1

dZ}). By multiplying both sides with |Z||Z| the relation (|Z||Z|) d|{Z}R | = |{Z}R |(|Z||Z| tr{Z −1 dZ} + |Z||Z| tr{Z

−1

dZ})

= |{Z}R |(|Z| d(|Z|) + |Z| d(|Z|)) = |{Z}R | d(|Z||Z|) is obtained. This is a differential equation of the form xdy = ydx which has the (unique) general solution y = Cx. Therefore |{Z}R | = C|Z||Z|. This holds for all matrices Z, and in particular it holds for Z = I. Therefore 1 = |{I ⊗ I}R | = C|I||I| = C.   Theorem A.48. Let Dp be the vech{} duplication matrix of Def. A.32, and let Z ∈ Cpp . Then |Dp {Z ⊗ Z}R Dp | = |(Dp Dp )−1 |(|{Z ⊗ Z}R |)1/2 = |(Dp Dp )|−1 ( Mod |Z|)2p . Proof. The same strategy is used as in theorem A.47. As before 







d|(Dp+ {Z ⊗ Z}R Dp+ )| = |(Dp+ {Z ⊗ Z}R Dp+ )| tr{(Dp+ {Z ⊗ Z}R Dp+ )−1 Dp+ d{Z ⊗ Z}R Dp+ }. 

From Dp Dp+ = Np = Np2 , Np (Z ⊗ Z) = (Z ⊗ Z)Np and (Dp+ {Z ⊗ Z}R Dp+ )−1 = Dp {Z ⊗ Z}R−1 Dp (theorems A.29–A.34) the relation 



d|(Dp+ {Z ⊗ Z}R Dp+ )| = |(Dp+ {Z ⊗ Z}R Dp+ )| tr{Np {Z ⊗ Z}R−1 d{Z ⊗ Z}R } 

= |(Dp+ {Z ⊗ Z}R Dp+ )| tr{Np {(Z ⊗ Z)−1 d(Z ⊗ Z)}R } is found. Using lemma A.46 and multiplying both sides by |{Z ⊗ Z}R | the equality   1 |{Z ⊗ Z}R | d|(Dp+ {Z ⊗ Z}R Dp+ )| = |(Dp+ {Z ⊗ Z}R Dp+ )| d |{Z ⊗ Z}R | 2

is obtained. This is a differential equation of the form xdy = 12 ydx with general solution y =  Cx1/2 , hence |(Dp+ {Z ⊗ Z}R Dp+ )| = C(|{Z ⊗ Z}R |)1/2 . As this must hold for any Z, it also holds    for Z = I, and hence C = |(Dp+ Dp+ )| = 1/|(Dp Dp )|. Lemma A.49. |(Dp Dp )| = 2p(p−1) . Proof. Since Dp vech{Z} = [vec{Z}]R for all Z, each row of Dp has at most one element not equal to zero. Therefore Dp Dp must be diagonal. Also, because Dp duplicates certain elements of vech{Z} and merely copies others, the columns of Dp have either one or two elements unequal to zero. The elements unequal zero must equal either 1 or −1, hence (Dp ).i (Dp ).i = 1 or = 2 for i = 1, . . . , p2 . The former occurs only for the p diagonal elements of (Z). Therefore |(Dp Dp )| = 2   2p −p = 2p(p−1) . 

To summarize the latter two results: |(Dp+ {Z⊗Z}R Dp+ )| = 2−p(p−1) ( Mod |Z|)2p . The result of Thm. A.47 was stated in [26, ch. 3] and [6, app. B], and proved by construction of the eigenvalues of {Z}R . A.4.1 Var(vec{V}) and Var([vech{S}]R ) We restate the definition of the complex normal distribution of Chap. 3. Definition A.50. A complex valued stochastic variable z is said to have a multivariate complex normal distribution denoted CN (µ, Σ), if and only if [z]R ∼ N ([µ]R , 12 {Σ}R ).

118

A Some operations on complex matrices

An immediate consequence of this definition is that Var (Z) = Var(Z)

and

Cov (Z)(Z) = −Cov(Z) (Z).

Let zl , l = 1, . . . , L be observations from the m-dimensional random vector z ∼ CN (0, Σ). Define L L 1 1 ∗  V= [zl ]R [zl ]R S= zz . L L l=1

l=1

Using the results of the previous sections, we now determine the relation between V and S. In particular the relation between Varvec{V} and Var[vech{S}]R is found. First the relations between V and S are considered. Lemma A.51. {S}R = V + Jm VJm  , where Jm = {iIm }R .   1 L   ∗} = 1 ∗} = 1 Proof. By Lem. A.8, {S} = { z z {z z l l l l R R R l l ([zl ]R [zl ]R + [izl ]R [izl ]R ) = l=1 L L L    1 1 1       l [zl ]R [zl ]R + L l [izl ]R [izl ]R = V + L l Jm [zl ]R [zl ]R Jm = V + Jm VJm , by Lem. A.21.  L Corollary A.52. vec{{S}R } = (I + Jm ⊗ Jm )vec{V}. Proof. The result follows from the equality vec{ABC} = (C  ⊗ A)vec{B}. Lemma A.53. Var vec{V} = of Def. A.22.

1 4L (I + K2n )({Σ}R

⊗ {Σ}R ), where Kp is the commutation matrix  

Proof. A proof can be found for the general real case in [18, ch. 6]. Corollary A.54. Var vec{{S}R } =

1 2L (I

 

+ K)({Σ}R ⊗ {Σ}R + {iΣ}R ⊗ {iΣ}R ).

Proof. The equality Km (A ⊗ B) = (B ⊗ A)Kn holds for any n × n matrix A and m × m matrix B (see lemma A.23). Furthermore (J ⊗ J)({Σ}R ⊗ {Σ}R ) = ({Σ}R ⊗ {Σ}R )(J ⊗ J). Therefore, using Lem. A.10, Var vec{{S}R } = (I + J ⊗ J)Var vec{V}(I + J ⊗ J) 1 = (I + J ⊗ J) (I + K)(I + J ⊗ J)({Σ}R ⊗ {Σ}R )(I + J ⊗ J) 4L 1 (I + K)(I + J ⊗ J)2 ({Σ}R ⊗ {Σ}R ) = 4L 1 (I + K)(I + J ⊗ J)({Σ}R ⊗ {Σ}R ) = 2L The result follows from the last expression since (J ⊗ J)({Σ}R ⊗ {Σ}R ) = J{Σ}R ⊗ J{Σ}R = {iΣ}R ⊗ {iΣ}R .   Corollary A.55. By lemma A.17 (p. 112) it follows that Var vec{{S}R } =

1 (I + K)((H  ⊗ H ∗ )(Σ ⊗ Σ)(H ⊗ H) + (H ∗ ⊗ H  )(Σ ⊗ Σ)(H ⊗ H)). (A.18) L

Lemma A.56. Var vec{S} =

1 L (Σ

⊗ Σ).

Proof. vec{Σ} = (H ⊗ H)vec{{Σ}R } from Cor. A.13. Also (H ⊗ H)(H ⊗ H) = 0 (Lem. A.14). Furthermore (H ⊗ H)(I + K2n )(H  ⊗ H ∗ ) = (H ⊗ H)((H  ⊗ H ∗ ) + (H ∗ ⊗ H  )Kn ) = (H ⊗ H)(H  ⊗ H ∗ ) = I (again Lem. A.14). Therefore, pre- and post-multiplying (A.18) of Cor. A.55 

by (H ⊗ H) and (H ⊗ H) yields (H ⊗ H)Varvec{{S}R }(H ⊗ H) = Lemma A.57. Var[vec{S}]R =

1 2L (I4m2

+ {K}R Lm2 ){Σ ⊗ Σ}R .

1 L (Σ

⊗ Σ).

 

Grasman, R.P.P.P. (2004) "Sensor array signal processing and the neuro-electromagnetic inverse problem in functional connectivity analysis of the brain", PhD-thesis, University of Amsterdam

A.4 Further results on the operator {}R

119

√ Proof. 2[vec{S}]R = H ∗ vec{S} + H  vec{S} by Lem. A.20 on page 112. The last expression  equals H ∗ vec{S} + H  vec{S} = H n2 vec{S} + Hn 2 Kn vec{S} = (H n2 + Hn 2 Kn )vec{S} because S is Hermitian. Therefore, using the equalities Kp = Kp , Kp Kp = Ip and Kp (A ⊗ B) = (B ⊗ A)Kp from Lem. A.23 (p. 112),  √   Var 2L[vec{S}]R = (H n2 + Hn 2 Kn )(Σ ⊗ Σ)(H n2 + Hn 2 Kn ) 

= H n2 (Σ ⊗ Σ)Hn2 + Hn 2 Kn (Σ ⊗ Σ)Kn H n2 + Hn 2 Kn (Σ ⊗ Σ)Hn2 + H n2 (Σ ⊗ Σ)Kn H n2

(A.19)



= H n2 (Σ ⊗ Σ)Hn2 + Hn 2 (Σ ⊗ Σ)H n2 + Hn 2 Kn (Σ ⊗ Σ)Hn2 + H n2 Kn (Σ ⊗ Σ)H n2 = {Σ ⊗ Σ}R + Hn 2 Kn (Σ ⊗ Σ)Hn2 + H n2 Kn (Σ ⊗ Σ)H n2 

where the last equality follows from corollary A.16. It may be verified that H n2 Kn = {Kn }R H n2 .   Furthermore Hn 2 = Ln2 H n2 and H n2 = Ln2 Hn 2 . Hence Hn 2 Kn (Σ ⊗ Σ)Hn2 + H n2 Kn (Σ ⊗  Σ)H n2 = {K}R Ln2 H n2 (Σ ⊗ Σ)Hn2 + {K}R Ln2 Hn 2 (Σ ⊗ Σ)H n2 = {Kn }R Ln {Σ ⊗ Σ}R , again 1 (I2n2 + {Kn }R Ln2 ){Σ ⊗ Σ}R .   from corollary A.16. Therefore Var[vec{S}]R = 2L The matrix Var[vec{S}]R may also be written {Σ ⊗ Σ}R + Ln2 {Kn (Σ ⊗ Σ)}R . Therefore it has the block structure   1 ((I + Kn )(Σ ⊗ Σ)) −((I + Kn )(Σ ⊗ Σ))

(vec{S}) Var (A.20) = (vec{S}) 2L ((I − Kn )(Σ ⊗ Σ)) ((I − Kn )(Σ ⊗ Σ)) This has the following corollary: Corollary A.58. The elements of Var[vec{S}]R may be written componentwise as Cov (Sij ) (Sgh ) = (σ jh σig + σ ih σjg )/2L,

(A.21)

Cov (Sij )(Sgh ) = −(σ jh σig + σ ih σjg )/2L,

(A.22)

Cov(Sij )(Sgh ) = (σ jh σig − σ ih σjg )/2L.

(A.23)

Proof. The result follows from lemma A.26 (p. 113).

 

Grasman, R.P.P.P. (2004) "Sensor array signal processing and the neuro-electromagnetic inverse problem in functional connectivity analysis of the brain", PhD-thesis, University of Amsterdam

B Nonlinear weighted least squares for complex random variables

Let f (θ) be a complex valued function of a real parameter p-vector θ. Let yL be an observation of an m-dimensional complex valued random vector YL satisfying the relation YL = f (θ0 ) + εL

(B.1)

where εL is a random vector having E{εL } = 0 and Var{εL } = Σ/L where Σ is Hermitian and positive definite. The objective is to obtain an estimate of θ0 by minimizing G(θ; yL ) = [yL − f (θ)]R A[yL − f (θ)]R .

(B.2)

where A is a real positive definite weight matrix. The estimate of θ0 obtained from G is the weighted least squares estimate. Theorem B.1. Suppose f is continuous on a compact set Θ ⊂ Rp containing θ0 , and f (θ) = f (θ0 ) =⇒ θ = θ0 for θ ∈ Θ. Then the estimate θ˘ = arg minθ G(θ; yL ) of θ0 is consistent. p

Proof. The quadratic form G is continuous in yL for all θ. Since yL → f (θ0 ) uniformly as L → ∞, p therefore, by theorem 3.2.6 in [5, ch. 3] G(θ; yL ) → [f (θ0 ) − f (θ)]R A[f (θ0 ) − f (θ)]R uniformly. The latter attains its minimum zero if and only if f (θ) ≡ f (θ0 ). From f (θ) = f (θ0 ) =⇒ θ = θ0 and continuity of G in both yL and θ, the point where G has its minimum converges to θ0 .   Suppose there is a initial estimate θ1 . Define  ∂f + ∂f , ∂

∂θ = , ∆1 =  [f (θ)]R = ∂f ∂θ ∂θ R θ=θ1 θ1  ∂θ  the Jacobian matrix of [f (θ)]R , assuming that it exists. If the elements of [∂f /∂θ ]R are continuous and θ1 is sufficiently close to θ0 then [f (θ0 )]R ≈ [f (θ1 )]R + ∆1 δ1 ,

(B.3)

where δ1 = θ0 − θ1 . From (B.1) it is found that [yL ]R ≈ [f (θ1 )]R + ∆1 δ1 + [εL ]R or [yL − f (θ1 )]R ≈ ∆1 δ1 + [εL ]R .

(B.4)

An estimate of δ1 that minimizes ([yL − f (θ1 )]R − ∆1 δ1 ) A([yL − f (θ1 )]R − ∆1 δ1 ) is δ˘1 = (∆1 A∆1 )−1 ∆1 A[yL − f (θ1 )]R ,

(B.5)

if ∆1 has full column rank. An improved estimate of θ0 is then the update θ2 = θ1 + δ˘1 . Recursive application of the steps in (B.3)-(B.5) in which the current estimate of θ0 replaced by it’s update, yields a sequence of estimates θk . Here it must be assumed that [∂f /∂θ ]R exists, is bounded and has full column rank at each updated estimate θk , k = 1, 2, . . .. A final estimate θ˘ is obtained

B Nonlinear weighted least squares for complex random variables

121

if no further improvements is made, i.e. if δ˘k = (∆k A∆k )−1 ∆k A[yL − f (θk )]R = 0 for some k ∈ {1, 2, 3, . . .}. The variance of θ˘ is obtained from the first order Taylor expansion of [f ]R around θ0 : By the ˘ = [f (θ0 )] + ∆∗ (θ˘ − θ0 ) where ∆∗ = [∂f /∂θ ] |θ=θ for some θ∗ in mean value theorem [f (θ)] ∗ R R R ˘ Here it must be assumed that the parameter search space is the line segment joining θ0 and θ. convex. Then ˘ = [f (θ0 )] − [f (θ)] ˘ + [εL ] = ∆∗ (θ˘ − θ0 ) + [εL ] . [yL − f (θ)] R R R R R ˘ −1 ∆ ˘  A∆) ˘  A yields Pre-multiplying both sides of this equation by (∆ ˘ = 0 = (∆ ˘ −1 ∆ ˘  A∆) ˘ −1 ∆ ˘  A∆) ˘  A[yL − f (θ)] ˘  A(∆∗ (θ˘ − θ0 ) + [εL ] ). (∆ R R

(B.6)

˘ = ∆k from the above sequence of iterates. From this we find that Here ∆ ˘ −1 ∆ ˘ ∆ ˘  A∆) ˘ −1 ˘  A∆) ˘  A∆∗ (θ˘ − θ0 )(θ˘ − θ0 ) ∆∗ A∆( (∆ ˘  A∆) ˘ −1 ∆ ˘ ∆ ˘  A∆) ˘ −1 ˘  A[εL ] [εL ] A∆( = (∆ R R ˘  A∆∗ (θ˘ − θ0 )(θ˘ − θ0 ) ∆∗ A∆ ˘ =∆ ˘  A[εL ] [εL ] A∆ ˘ ⇐⇒ ∆ R R or,

˘  A∆∗ )−1 ∆ ˘ ∆ ˘  A∆∗ )−1 , ˘  A[εL ] [εL ] A∆( (θ˘ − θ0 )(θ˘ − θ0 ) = (∆ R R

(B.7)

p p ˘ → ˘  A∆∗ )−1 exists. If θ˘ → provided (∆ θ0 then θ∗ − θ 0 and, if [∂f /∂θ ]R exists and is continuous, p p p ˘ − ∆∗ → 0. Furthermore since ∆ ˘ → ∆0 , ∆∗ → ∆0 where ∆0 = [∂f /∂θ |θ ] . Therefore ∆ 0 R p E(θ˘ − θ0 )(θ˘ − θ0 ) → (∆0 A∆0 )−1 ∆ 0 AE{[εL ]R [εL ]R  }A∆0 (∆ 0 A∆0 )−1 .

(B.8)

Denote E{[εL ]R [εL ]R } = V /L. These considerations prove the following theorem: Theorem B.2. Let yL be an observation from a complex m-vector random variable Y , with E{YL } = f (θ0 ) and positive definite covariance matrix Σ. Suppose that E{[YL − f (θ0 )]R [YL − f (θ0 )]R } = V /L is positive definite. Furthermore, assume the following set of regularity conditions hold (R1) f is continuous on a convex and compact set Θ ⊂ Rp and θ0 is an interior point of Θ. (Compactness assures that G has a minimum in Θ.) (R2) f (θ) = f (θ0 ) =⇒ θ = θ0 (i.e., f identifies θ0 ). ∂f (R3) The Jacobian matrix J = ∂θ  exists, is bounded, continuous, and ∆ = [J|θ ]R has full column rank for all θ ∈ Θ. √ Then the estimate θ˘ of θ0 obtained from minimizing G(θ) in (B.2) is consistent and L(θ˘ − θ0 ) has asymptotic variance (B.9) (∆0 A∆0 )−1 (∆0 AV A∆0 )(∆0 A∆0 )−1 . Proof. From the regularity conditions theorem B.1 implies consistency. Therefore all conditions in the considerations above are satisfied.   Lemma B.3. If A = {W }R for some complex Hermitian W , then θ˘ is obtained from minimizing (yL − f (θ))∗ W (yL − f (θ)), and



L θ˘ has asymptotic variance ( J ∗ 0 W J0 )−1 [W ∗ J0 ]R V [W J0 ]R ( J ∗ 0 W J0 )−1 .

Grasman, R.P.P.P. (2004) "Sensor array signal processing and the neuro-electromagnetic inverse problem in functional connectivity analysis of the brain", PhD-thesis, University of Amsterdam

122

B Nonlinear weighted least squares for complex random variables

Proof. By lemmas A.9 and A.10 of App. A and equation (A.4), since A = {W }R , G(θ; yL ) = [yL − f (θ)]R {W }R [yL − f (θ)]R = (yL − f (θ))∗ W (yL − f (θ)), and the latter inner product is real in any case because W is Hermitian. Also from equation (A.4)   and lemma A.9 (App. A), ∆0 A∆0 = [J0 ]R {W }R [J0 ]R = J0 ∗ W J0 . √ A lower bound on the variance of L θ˘ is obtained in the following theorem. √ Theorem B.4. [29, prop. 3] The asymptotic variance of L θ˘ of theorem B.2 is bounded from below by (∆0 V −1 ∆0 )−1 . The bound is attained if A = V −1 . If V −1 = {W }R for some Hermitian complex matrix W , this reduces to ( [J0 ∗ W J0 ])−1 . Proof. Let Π(B) = (∆0 B∆0 )−1 ∆0 B for and appropriately dimensioned B, then (∆0 A∆0 )−1 ∆0 AV A∆0 (∆0 A∆0 )−1 − (∆0 V −1 ∆0 )−1 = Π(A)V Π(A) − (∆0 V −1 ∆0 )−1 ∆0 V −1 V V −1 ∆0 (∆0 V −1 ∆0 )−1 = Π(A)V Π(A) − Π(V −1 )V Π(V −1 ) = (Π(A) − Π(V −1 ))V (Π(A) − Π(V −1 ) (The latter equality can be verified by expansion of the product.) The latter term is at least positive semi-definite since V is positive definite. Therefore symbolically (∆0 A∆0 )−1 (∆0 AV A∆0 )(∆0 A∆0 )−1 ≥ (∆0 A∆0 )−1 , with equality if and only if A = V −1 . The result in case that V −1 = {W }R is due to ∆0 A∆0 =

[J0 ∗ W J0 ].  

Grasman, R.P.P.P. (2004) "Sensor array signal processing and the neuro-electromagnetic inverse problem in functional connectivity analysis of the brain", PhD-thesis, University of Amsterdam

C Generalization of the concentrated likelihood method

Unfortunately, the standard theory of the concentrated likelihood as put forward in [179] and [5], assumes that the Hessian matrix of the likelihood function is nonsingular. In the SML method developed in Chap. 3, the Hessian of the likelihood is singular. In this appendix we therefore generalize the concentrated likelihood, and show that it can be made valid for the singular Hessian case.1 The following theorem generalizes Thm. 2.2 in [200], originating from [179], which is basic to the concentrated likelihood method, to a case that does not require the Hessian matrix of the log-likelihood function to be nonsingular. It does require the Hessian of L to be positive semi-definite at a root of the score vector ∂L/∂δ, and requires the p × p submatrix of the Hessian corresponding to a subset of p parameters to be positive definite. As indicated in [200], the theorem applies to any function L that satisfies the conditions indicated below, not just log-likelihood functions. Theorem C.1. Let L be a function of θ ∈ Rp and τ ∈ Rq . Assume L is twice differentiable. ˆ  ) solve ∂L/∂δ = 0, i.e. Let δˆ = (ˆ τ , θ ∂L(τ , θ) =0 δˆ ∂τ Define

and

 ∂2L − ∂τ ∂τ  |δˆ ∂ 2 L J =− ˆ =  δ ∂2L ∂δ∂δ − ∂θ∂τ  |δ ˆ

∂L(τ , θ) = 0. δˆ ∂θ 2

∂ L − ∂τ |ˆ ∂θ δ 2

∂ L − ∂θ∂θ  |δ ˆ

 =

  Jτ τ J τ θ , Jθτ Jθθ

(C.1)

and assume that J is positive semi-definite, Jθθ is positive definite, and rank(J ) = rank(Jθθ)+ rank(Jτ τ ). ˆ → Rq , defined on a neighborhood Suppose there exists a differentiable function γ : N (θ) ˆ ˆ ˆ N (θ) of θ, such that for every fixed θ ∈ N (θ), γ(θ) solves ∂L(τ , θ)/∂τ = 0. That is, ∂L(τ , θ) = 0. τ =γ(θ) ∂τ

(C.2)

Define M (θ) = L(γ(θ), θ). Then (i) ∂M/∂θ|θˆ = 0, and (ii) The p × p submatrix of J + that corresponds to the elements of θ equals [−∂ 2 M/∂θ∂θ  ]+ . Here, X + denotes the Moore-Penrose generalized inverse of X. Proof. Most of the proof closely follows the proof for the nonsingular Hessian case given in [200]. The first order derivative of M is 1

We have not been able to find the treatment of the concentrated likelihood method in this situation in the literature, but the results presented here may not be new.

Grasman, R.P.P.P. (2004) "Sensor array signal processing and the neuro-electromagnetic inverse problem in functional connectivity analysis of the brain", PhD-thesis, University of Amsterdam

124

C Generalization of the concentrated likelihood method

∂L ∂M = + ∂θ ∂θ τ =γ(θ)



∂γ ∂θ 



∂L ∂L = , ∂τ τ =γ(θ) ∂θ τ =γ(θ)

because ∂L/∂θ|τ =γ(θ) ≡ 0 by definition of γ. Therefore ∂L ∂L ∂M = 0, ˆ = ˆ θ ˆ = θ τ =γ( θ), ∂θ ∂θ ∂θ τˆ ,θˆ which proves (i). To prove (ii): The matrix of second order partial derivatives of M is  2  . ∂ L ∂2M ∂ - ∂L ∂ 2 L ∂γ = = + . ∂θ∂θ  ∂θ  ∂θ τ =γ(θ) ∂θ∂θ  ∂θ∂τ  ∂θ  τ =γ(θ) From (C.2) on the other hand we find that    2  ∂ L ∂ ∂L ∂ 2 L ∂γ = 0= + , ∂θ  ∂τ τ =γ(θ) ∂τ ∂θ  ∂τ ∂τ  ∂θ  τ =γ(θ) or − Hence

∂ 2 L ∂ 2 L ∂γ .  τ =γ(θ) = ∂τ ∂τ  τ =γ(θ) ∂θ  ∂τ ∂θ

∂ 2 L(θ, τ ) ∂2M −  = ∂θ∂θ ∂θ∂θ  τ =γ(θ)



∂γ ∂θ 



 ∂γ ∂ 2 L(θ, τ ) . ∂τ ∂τ  τ =γ(θ) ∂θ 

(C.3)

(C.4)

However, by pre-multiplying both sides of (C.3) by  2 + ∂ L ∂ 2 L − , ∂θ∂τ  τ =γ(θ) ∂τ ∂τ  τ =γ(θ) and using (C.3) again together with the property XX + X = X, we obtain the equality  2 + 2    ∂ L ∂γ ∂γ  ∂ 2 L ∂ L ∂ 2 L . = ∂θ∂τ  τ =γ(θ) ∂τ ∂τ  τ =γ(θ) ∂τ ∂θ  τ =γ(θ) ∂τ ∂τ  τ =γ(θ) ∂θ  ∂θ 

(C.5)

Therefore, (C.4) implies  + 2 ∂ 2 L ∂2M ∂ 2 L ∂ 2 L ∂ L − − = − − . ∂τ ∂τ  τ =γ(θ) ∂τ ∂θ  τ =γ(θ) ∂θ∂θ  ∂θ∂θ  τ =γ(θ) ∂θ∂τ  τ =γ(θ) ˆ then yields Evaluation in θ = θ ∂ 2 L ∂ 2 L ∂ 2 M = − − − ˆ ˆ ∂θ∂θ  θ ∂θ∂θ  δ ∂θ∂τ  δˆ



∂ 2 L − ∂τ ∂τ  δˆ

+

∂ 2 L = Jθθ − Jθτ Jτ+τ Jτ θ, ∂τ ∂θ  δˆ

(C.6)

ˆ since τˆ = γ(θ). Now, by assumptions on J and Jθθ, and Lem. 3.12 (p. 54) taken from [192], we have  + Jτ τ + Jτ+τ Jτ θX + Jθτ Jτ τ −Jτ+τ Jθτ X + + , (C.7) J = −X + Jτ+θJτ+τ X+ where X = Jθθ − Jθτ Jτ+τ Jτ θ. But from (C.6)   ∂ 2 M + = (Jθθ − Jθτ Jτ+τ Jτ θ)+ , − ∂θ∂θ  θˆ

(C.8)

ˆ which is precisely the p × p submatrix of the generalized inverse Hessian of L evaluated at δ, corresponding to the elements of θ.  

Grasman, R.P.P.P. (2004) "Sensor array signal processing and the neuro-electromagnetic inverse problem in functional connectivity analysis of the brain", PhD-thesis, University of Amsterdam

C Generalization of the concentrated likelihood method

125

The essential observation in the proof is (C.5) and this is also the most important difference between the proof here and the proof given in [179,200]. Note furthermore, that only the property AA+ A = A of the Moore-Penrose inverse of a matrix A was used, and therefore may be replaced by any generalized inverse (g-inverse) A− for which AA− A = A. Then the right hand side of (C.7) is a g-inverse of J even if the assumption rank(J ) = rank(Jθθ) + rank(Jτ τ ) does not hold (cf. [192]). If the theorem is applied to a likelihood function , it turns out that the distribution of the estimator obtained with the concentrated likelihood c is asymptotically normal, and its covariance matrix can be obtained from the Hessian of the concentrated likelihood. To show √ ˆ − θ 0 ), θ ˆ obtained from optimizing c (θ) = (τ , θ)|τ =γ(θ), is asymptotically normal, that N (θ let  = log LN , where LN denotes the likelihood if a set of N independent identically distributed variables. Then, under general conditions 0 / 1 ∂2 1 ∂ d √ |δ0 → N (0, − plim  |δ0 ), N ∂δ∂δ N ∂δ

as N → ∞,

(C.9)

which is well known (see e.g. [5, 207, 228]). Assuming continuous third order derivatives of  exist in a convex neighborhood of δ 0 , from a Taylor expansion, given θ 0 , 0=

∂ ∂ 2  ∂ = + [γ(θ 0 ) − τ 0 ] θ0 ,γ(θ0 ), ∂τ  ∂τ δ0 ∂τ ∂τ θ0 ,γ(θ0 ) ∂ 3  1  [γ(θ 0 ) − τ 0 ] [γ(θ 0 ) − τ 0 ] + 2 ∂τi ∂τ ∂τ  θ0 ,¨γ

(C.10)

¨ is on the line element between γ(θ 0 ) and τ 0 . (Note that the last term of (C.10) is a where γ vector whose elements are indexed by i.) If N1 ∂ 3 /∂τi ∂τ ∂τ  converges stochastically to a constant p as N → ∞, and (θ  , γ(θ) ) → (θ 0 , τ 0 ), then   √ 1 ∂ 2  1 ∂ √ = − + o (1) N [γ(θ 0 ) − τ 0 ], (C.11) p N ∂τ ∂τ θ0 ,γ(θ0 ) N ∂τ δ0 where op (1) denotes a random variable that converges stochastically to zero as N → ∞. In a similar manner,   √ 1 ∂ 1 ∂ 2  1 ∂ √ √ = + + o (1) N [γ(θ 0 ) − τ 0 ], p θ ,γ(θ ) θ ,τ θ ,γ(θ )  0 0 N ∂θ∂τ 0 N ∂θ 0 N ∂θ 0 0 which with (C.3) can be written 1 ∂ 1 ∂ √ √ = + N ∂θ θ0 ,γ(θ0 ) N ∂θ θ0 ,τ 0



 √ 1 ∂γ  ∂ 2  − + o (1) N [γ(θ 0 ) − τ 0 ], p N ∂θ θ0 ∂τ ∂τ  θ0 ,γ(θ0 )

Combining (C.11) with the latter result, assuming ∂γ/∂θ  |θ0 converges in probability to a constant, we have   1 ∂ 1 ∂ 1 ∂c d ∂γ . √ √ √ = → I , plim p ∂θ θ0 N ∂θ θ0 N ∂θ θ0 ,γ(θ0 ) N ∂δ δ0 √ Therefore by (C.9), (1/ N )∂c /∂θ|θ0 is asymptotically normal, with covariance matrix

Grasman, R.P.P.P. (2004) "Sensor array signal processing and the neuro-electromagnetic inverse problem in functional connectivity analysis of the brain", PhD-thesis, University of Amsterdam

126

C Generalization of the concentrated likelihood method



   ∂2 ∂γ  ∂γ   Ip plim Ip plim − plim ∂θ ∂θ ∂δ∂δ   2  2 2 ∂  ∂γ ∂  ∂  ∂γ ∂γ  ∂ 2  ∂γ = − plim + + + ∂θ ∂τ ∂θ  ∂θ∂τ  ∂θ  ∂θ ∂τ ∂τ  ∂θ  ∂θ∂θ   2 ∂γ  ∂ 2  ∂γ ∂  − = plim ∂θ ∂τ ∂τ  ∂θ  ∂θ∂θ     ∂2 + ∂2 ∂2 ∂2 − = plim − − ∂τ ∂τ  ∂τ ∂θ  ∂θ∂θ  ∂θ∂τ  = Iθθ − Iθτ Iτ+τ Iτ θ,

(C.12) (C.13)

where the second and third equalities follow from (C.3) and (C.4) respectively, all partials are evaluated in δ = δ 0 , and Iθτ = − plim ∂ 2 /∂θ∂τ  |δ0 , etc. But then, from  2 c  ∂c ∂  ∂c ˆ − θ 0 ), |ˆ = |θ + |θ + op (1) (θ 0= ∂θ θ ∂θ 0 ∂θ∂θ  0 it follows that −

√ 1 ∂ 2 c d + ˆ  |θ0 N [θ − θ 0 ] → N (0, Iθθ − Iθτ Iτ τ Iτ θ). N ∂θ∂θ

Hence, if Iθθ − Iθτ Iτ+τ Iτ θ is nonsingular, then by (C.6) √ / 0−1 2 c d N (θˆ − θ0 ) → N (0, − plim 1 ∂  |θ ). N ∂θ∂θ

0

∂ 2 c | ∂θ∂θ θ0

is nonsingular, and therefore

Nederlandse Samenvatting

Dit proefschrift gaat over de mogelijkheid om de samenwerking tussen verschillende hersengebieden te onderzoeken bij mensen, zonder hen daarbij bloot te stellen aan een mogelijk schadelijke medische ingreep.

Inleiding Experimenteel psychologen, cognitief psychofysiologen en andere cognitie (neuro-)wetenschappers zijn zeer geinteresseerd in de vraag hoe het menselijk brein alledaagse cognitieve functies als waarnemen, geheugen, aandacht, afremmen van automatische reacties, denken, lezen, beslissen, etc., succesvol tot stand brengt. Psychologen onderzoeken deze functies meestal door een zo eenvoudig mogelijke taak te bedenken waarin, de functie waarin men is geinteresseerd noodzakelijkerwijs een rol speelt. Door te kiezen voor hele eenvoudige taken, zorgt de experimentator ervoor dat de betreffende functie zo goed mogelijk geisoleerd onderzocht kan worden van andere functies. In een experiment krijgen proefpersonen vervolgens de opdracht de taak in een groot aantal trials herhaaldelijk uit te voeren, waarbij de onderzoeker dan kijkt naar allerlei maten van prestatie—zoals de reactiesnelheid en het aantal fouten dat gemaakt wordt. Aan de hand van de prestatie probeert men af te leiden hoe de onderzochte cognitieve functie in elkaar steekt, en hoe de informatieverwerkingsstroom daarbij georganiseerd is. Fysiologische meetvariabelen, als hartritme, zweetklieractiviteit, en hersenactiviteit, die tijdens de taak gemeten kunnen worden, geven daarnaast informatie over allerlei niet direct observeerbare activiteit in het zenuwstelsel die gepaard gaat met het uitvoeren van de taak. Omdat vanuit neuropsychologisch en dierexperimenteel onderzoek het er steeds meer op lijkt dat de hogere cognitieve functies met name tot stand komen in de hersenschors—de buitenste laag van de grote hersenen, verschaft vooral de hersenactiviteit zelf belangrijke informatie over deze functies. Neuroanatomisch en dierfysiologisch onderzoek wijst er bovendien op dat de hersenschors sterk georganiseerd is in functionele deelgebieden, die gespecialiseerd zijn in het uitvoeren van specifieke taken. Daarom zou het onderzoek naar de organisatie van de informatieverwerkingsstroom sterk vereenvoudigd worden wanneer bij een psychologisch experiment bepaald kan worden in welke delen van de hersenen activiteit optreedt die specifiek gerelateerd is aan de taak in het experiment. Met de komst van de zogenaamde “neuroimagingtechnieken” wordt dit steeds beter mogelijk, en toepassing van deze technieken in het cognitieonderzoek is de laatste jaren dan ook explosief gegroeid. Naast het identificeren van de hersenschorsdelen die betrokken zijn bij een cognitieve functie, zou het ook wenselijk zijn om vast te kunnen stellen welke delen van de hersenen met elkaar interacteren. Omdat het hierbij gaat over wederzijdse beinvloeding van de verschillende hersendelen, speelt dynamiek van de hersenactiviteit over de tijd hierin een belangrijke rol. Het is de vraag in hoeverre de neuroimagingtechnieken gebruikt kunnen worden om interacties tussen verschillende hersendelen vast te stellen.

128

Nederlandse Samenvatting

Er bestaat een groot aantal technieken waarmee men de activiteit van de hersenen kan meten. Een aantal daarvan zijn geschikt voor psychologisch onderzoek omdat ze niet invasief zijn—dat wil zeggen, ze behoeven geen medische ingreep: Dit zijn het elektroencefalogram, het magnetoencefalogram en het functionele magnetisch resonantie beeld (magnetic resonance image—MRI). Met behulp van functionele MRI (f MRI) kan de mate van doorbloeding in de hersenen worden gemeten met millimeter nauwkeurigheid. In combinatie met structurele MRI (die in tegenstelling tot f MRI een nauwkeurige foto van de structuur van de hersenen maakt) levert dit een duidelijk beeld op van de variatie in de doorbloeding in verschillende hersengebieden. Hoewel het duidelijk is dat verschillende gebieden meer doorbloed raken in de ene taak dan in de andere, strekt de verandering over tijd van de doorbloeding op een bepaalde locatie zich uit over meerdere seconden (6 to 20 seconden), terwijl de meeste in psychologisch onderzoek gebruikte taken binnen 300 to 700 milliseconden kunnen worden uitgevoerd door de gemiddelde proefpersoon. Daardoor is f MRI minder geschikt voor het onderzoeken van de interactiedynamiek in de hersenen—f MRI heeft zogezegd een lage temporele resolutie. Met behulp van het elektroencefalogram (EEG) en het magnetoencefalogram (MEG) kunnen de elektromagnetische velden die het gevolg zijn van de elektrische activiteit in de hersenen worden geregistreerd. In tegenstelling tot fMRI zijn EEG en MEG een tamelijk directe afspiegeling van de elektrische activiteit van de zenuwcellen waaruit de hersenen zijn opgebouwd. Het EEG en MEG wordt gemeten met behulp van een groot aantal (zo’n 30 tot meer dan 150) sensoren— de sensor array—die evenredig verspreid zijn over de oppervlakte van het hoofd. Zodoende meet EEG de elektrische potentiaalverschillen tussen verschillende plekken op de hoofdhuid die ontstaan door verschillen in activiteit van neuronen in verschillende hersengebieden, en meet MEG op verschillende plekken vlak boven het hoofd de variatie in het magnetische veld dat worden opgewekt door de stromen die gepaard gaan met neurale activiteit. Met milliseconden precisie worden zo de elektrische potentialen en de magnetische velden geregistreerd in de vorm van een reeks metingen over de tijd, of signalen—voor iedere sensor ´e´en signaal. Hoewel, dus in tegenstelling tot f MRI deze signalen met milliseconden precisie gemeten worden (EEG en MEG hebben een hoge temporele resolutie), en ze daarom zeer geschikt zijn om iets te vertellen over de dynamiek in de hersenactiviteit, is het weer veel moeilijker om aan de hand van deze signalen te bepalen waar de activiteit vandaan komt. Met behulp van zogenaamd stroomdipoolmodellen, die uit de natuurkunde van de elektromagnetische veldtheorie komen, kan dit op een verantwoorde wijze vergemakkelijkt worden.

Samenvatting Valide conclusies over interacties tussen hersengebieden In hoofdstuk 1 wordt kort een inleiding gegeven tot elektrische stroomdipoolmodellen voor de elektrofysiologische bronnen die ten grondslag liggen aan de neuro-elektromagnetische activiteit, zoals die weerslag vind in EEG en MEG signalen. Deze stroomdipolen zijn van fundamenteel belang voor het oplossen van het zogenaamde neuro-elektromagnetische inverseprobleem: het bepalen (schatten) van de locaties, ori¨entaties, ´en amplitudes (bronsterkte) van de bronnen van elektrofysiologisch activiteit in de hersenen, op basis van de EEG en MEG signalen. Verschillende procedures worden in dit hoofdstuk besproken om het inverseprobleem op te lossen met behulp van stroomdipoolmodellen. Zowel niet-lineaire zoekalgoritmen voor enkel- en meervoudige dipool-bronmodellen als lineaire zoekmethoden en zogenaamde gedistribueerde bronmodellen komen aan de orde. Omdat de meetgegevens (de data) die geregistreerd worden met EEG en MEG onderhevig zijn aan storende invloeden, zijn deze data feitelijk kansvariabelen—dat wil zeggen, ze vari¨eren met een zekere mate van onvoorspelbaarheid. Met name de statistische aspecten van de methoden worden daarom benadrukt, en er wordt een aantal problematische

Nederlandse Samenvatting

129

aspecten van de methoden besproken, die vanuit de statistische aanpak als vanzelfsprekend naar voren komen, terwijl ze vanuit de traditionele puur technische aanpak soms verborgen blijven. Een aantal van de besproken problemen kan bij benadering worden omzeild door de signalen in het frequentiedomein te bekijken, waarvoor ze getransformeerd moeten worden met behulp van een Fourier transformatie. In het frequentiedomein worden de signalen dan gerepresenteerd met behulp van complexwaardige getallen—de Fourier co¨effici¨enten. Aan het einde van het hoofdstuk wordt geconcludeerd, dat deze methoden met redelijk succes zijn toegepast op de trial-gemiddelde signalen—de zogenaamde Event Related Potentials/Fields (ERP/ERF; signalen met een karakteristieke vorm die de hersenen produceren naar aanleiding van een specifieke gebeurtenis tijdens het experiment). Maar de vraag of deze dipoolmodellen ook ingezet kunnen worden voor het onderzoeken van interactiviteit tussen hersengebieden is nog maar zeer ten dele bekeken, en de voorgestelde methoden hiervoor hebben zo hun haken en ogen. In hoofdstuk 2, wordt kort besproken hoe zenuwcellen in de hersenen met elkaar communiceren en elkaar wederzijds beinvloeden. Daarnaast wordt een overzicht gegeven van de methoden die in de literatuur veel gebruikt zijn voor het onderzoeken van cortico-corticale interacties (wederzijdse beinvloeding tussen verschillende delen van de schors)—zowel op basis van EEG en MEG signalen, als op basis van f MRI en PET meetgegevens. Het onderscheid tussen functionele en effectieve connectiviteit (c.q. interactie) tussen hersendelen dat bij dergelijke analyses gemaakt moet worden, is kort besproken en gerelateerd aan de verschillende signaalanalyse methoden die in dit hoofdstuk worden besproken. Deze methoden, die ontwikkeld zijn in de zeer algemene theorie over systemen, signalen en signaalanalyse, omvatten kruis-correlatie, coherentie- en faseanalyse, en event related (de-)synchronisatie analyse van de signalen van meerdere sensoren tegelijkertijd; maar ook netwerk- of concectiviteitsanalyse op basis van principale componenten analyse (niet-parametrisch) of netwerkanalyse op basis van parametrische covariantie en/of tijdreeksmodellen2 waaronder structurele vergelijkingsmodellen (SEM), vector autoregressie modellen (VAR) en dynamische factoranalyse (DFA). Geen van deze methoden leidt echter tot een eenduidige interpretatie voor wat betreft de betrokken hersengebieden en hun interacties. Dit komt doordat deze methoden te algemeen zijn, en geen rekening houden met de specifieke biofysische oorsprong van de signalen (i.c. EEG en MEG), of doordat de meetgegevens een te beperkte tijdsresolutie hebben (i.c. f MRI/PET). Voor EEG en MEG signalen, komt de dynamische factor analyse een signaalstructuur aanneemt die het dichtst bij de structuur komt die is te verwachten op basis van de neurofysische oorsprong van de signalen. Het signaalmodel achter dynamische factoranalyse vormt daarom de basis voor de technieken die zijn uitgewerkt in de hoofdstukken 3, 4 en 5. Het onderscheid tussen functionele en effectieve connectiviteit kan worden gezien als een motivatie voor het uitbreiden van de methode uit hoofdstuk 4—die functionele connectiviteit identificeert—met het raamwerk voor dynamische structurele vergelijkingsmodellen in hoofdstuk 5, waarbinnen effectieve connectiviteit gemodelleerd wordt in termen van zogenaamde respons kernels uit de algemene theorie van lineaire tijdsinvariante systemen. 
Meervoudige dipoolmodellen voor functionele en effectieve connectiviteitsanalyse In hoofdstuk 3 worden de statistische fundamenten besproken voor de methoden die zijn uitgewerkt in hoofdstukken 4 en 5. Zowel gegeneraliseerd kleinste kwadraten (generalized least squares—GLS: een veralgemenisering van het gebruikelijke minimaliseren van het gekwadrateerde verschil tussen gemeten waarden en voorspelde waarden), als de methode van de grootste aannemelijkheid (maximum likelihood—ML) worden besproken. Voor een klasse van gewone (re¨eelwaardige) kansvariabelen is de beste GLS methode voor samenhangstructuur analyse, of covariantiestructuur analyse, voor het eerst ontwikkeld door Browne [29]. Deze methode is in hoofstuk 3 uitgebreid naar een klasse van complexwaardige kansvariabelen, waarvan de complexwaardige normale of Gaussische verdeling een karakteristiek voorbeeld is. In het hoofdstuk 2

modellen voor samenhang tussen signalen, die beschreven kunnen worden aan de hand van een zeer beperkt aantal parameters (karakteriserende grootheden)

130

Nederlandse Samenvatting

wordt aangetoond dat de met behulp van deze GLS methode verkregen schattingen van de modelparameters (zoals bijvoorbeeld locaties en ori¨entaties) voldoen aan de statistisch wenselijke eigenschappen van consistentie (op den duur, met meer en meer meetgegevens, zal de methode de correcte parameterwaarden vinden), asymptotisch Gaussisch verdeeld zijn, en statistische effici¨entie (op den duur hebben ze van alle zuivere schatters de theoretisch kleinst mogelijke schattingsfout). De ML-methode geeft schattingen (meest aannemelijke schatters) die van nature vaak deze eigenschappen hebben, maar die in veel gevallen moeilijker te berekenen zijn. Omdat deze methoden het gebruik van rekenintensieve niet-lineaire zoekalgoritmen vereisen, is een computationeel tijdseffici¨ent algoritme afgeleid voor de ML-methode voor het schatten van het confirmatieve factormodel dat de basis is voor de methoden in volgende hoofdstukken. Dit algoritme is gebaseerd op een geconcentreerde likelihood methode die in de signaalverwerkingsliteratuur bekend staat als ‘stochastic maximum likelihood’ (SML). De voordelen van de in dit proefschrift ontwikkelde uitbreiding op het standaard SML algoritme, relevant voor met name MEG signalen, zijn het toestaan van een meer algemene covariantie structuur van de ruissignalen, het opnemen van een gemiddelde structuur, en het toestaan van een ‘array manifold’ (of ‘factor ladingen matrix’ zoals ze wordt genoemd in haar oorspronkelijk psychometrische context) met een gereduceerde kolommen rang—zonder daarbij de oorspronkelijke computationele voordelen te verliezen van de oorspronkelijke formulering van SML door B¨ohme (geciteerd in [134, 220]). Een gelijksoortig computationeel voordelig algoritme is ook ontwikkeld voor GLS schatten van het confirmatieve factormodel in het hoofdstuk. Deze algoritmen maken routinematige toepassing op EEG en MEG beter mogelijk dan met de directe methoden die gebruikt zijn in hoofdstukken 4 en 5. De analytische uitdrukkingen voor de schatters van sommige parameters die werden bepaald bij het concentreren van de likelihood en het GLS criterium, kunnen ook worden gebruikt voor het evalueren van asymptotische en eindige steekproefonzuiverheid zoals die werd geobserveerd in hoofdstuk 4. Aan het einde van het hoofdstuk is beargumenteerd waarom de gepresenteerde GLS en ML methoden geschikt zijn voor de Fourier getransformeerde signalen. Het voornaamste argument daarbij is dat de Fourier co¨effici¨enten bij benadering complex Gaussisch verdeeld zijn, en bovendien asymptotisch onafhankelijk zijn voor verschillende frequenties—dat wil zeggen, de getransformeerde representatie van de signalen zijn meer Gaussisch verdeeld dan de signalen zelf, en hoewel de signaalwaarden op opeenvolgende tijdstippen een sterke samenhang kunnen vertonen, kunnen de Fourier co¨effici¨enten van opeenvolgende frequenties bij benadering als statistisch onafhankelijk worden behandeld. Deze twee aspecten vereenvoudigen de statistische aspecten van het schatten aanzienlijk. In hoofdstuk 4 is een methode ontwikkeld om tegelijk met het schatten van de bronparameters (te weten, de locaties en ori¨entaties) van meerdere actieve hersengebieden, ook de mate van hun interactie te schatten. De methode maakt gebruik van het confirmatieve factormodel, en een vergelijking is gemaakt tussen het gebruik van de GLS methode en de ML methode. 
From the simulation study reported in Chapter 4 it is concluded that, under conditions with a realistic signal length, a realistic number of sensors, and a realistic number of trials, the ML method performs better than the GLS method: the latter gave rather biased estimates of the source amplitude cross-spectra, whereas the corresponding ML estimates were nearly unbiased. Because the number of active brain regions is usually unknown in advance, the simulation also examined whether the correct number of sources could be determined on the basis of a test statistic that is computed automatically during estimation in both methods. For the ML method this is the generalized likelihood ratio test (GLRT), which has a chi-square (χ²) distribution. In the simulations this GLRT proved very helpful in determining the number of sources, but in general one must be careful when using it on empirical data: the model being estimated is necessarily an approximation of reality, and with an increasing amount of data the GLRT will tend more and more to reject the model because of this unavoidable approximation.
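
This caveat can be made concrete with a small numerical sketch in R (the degrees of freedom and the misfit value below are hypothetical and serve only to illustrate the scaling):

    ## Asymptotically the GLRT statistic behaves as N times the minimized
    ## discrepancy; a fixed approximation-induced misfit therefore makes the
    ## statistic grow linearly in N, while the critical value stays fixed.
    df     <- 10                 # hypothetical degrees of freedom of the test
    misfit <- 0.005              # hypothetical discrepancy left at the best fit
    crit   <- qchisq(0.95, df)   # 5% critical value of the chi-square distribution
    for (N in c(100, 1000, 10000)) {
      T.glrt <- N * misfit       # approximate magnitude of the GLRT statistic
      cat(sprintf("N = %5d: T = %5.1f, model rejected: %s\n",
                  N, T.glrt, T.glrt > crit))
    }

With these numbers the model is retained at N = 100 and N = 1000 but rejected at N = 10000, even though the misfit itself never changed.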

When applied to MEG data obtained in a visual stimulation experiment, the method unfortunately did not yield convincingly interpretable results. Several causes for this problematic application have been named: a low signal-to-noise ratio (that is, the contribution of the modeled sources to the total covariance structure is small compared to the contribution of the noise), the lack of an adequate noise model (in the application it was assumed that the noise level was the same for all sensors and that there was no dependence between the noise signals at different sensors), dependence between the source signals and the noise signals, an incorrect number of sources, the approximation of the head by an isotropic conductor consisting of concentric homogeneous spheres, and dependence between the Fourier coefficients due to the finite signal length (although this last factor should really affect only the statistical efficiency of the estimators).

In Chapter 5 the method of Chapter 4 is extended with a mean structure associated with the signal model underlying the confirmatory factor structure used there. The additional information contained in the mean should increase the accuracy of the source parameter estimates. In addition, Chapter 5 combines ideas from linear structural equation modeling of latent variables and from linear systems theory into a general framework for estimating dynamic structural relations between sources. In this framework, dependence between signals is modeled in terms of linear filter responses. The resulting modeling framework offers, in principle, the possibility of testing a priori hypotheses about presumed effective connectivity between different neural sources. The generality of this approach lies in the fact that the linear response kernels may be interpreted as the first-order terms of a Volterra expansion of the generally nonlinear interaction dynamics. The simulations reported in the chapter indicate that the usual asymptotic approximation of the properties of the ML estimators is sufficiently accurate in experiments with more than 200 trials. The GLRT statistic was examined for its ability to detect interactions between the sources. Although it did detect them in the simulations, the same reservations mentioned earlier apply to the use of this test. An application of the method to MEG signals obtained from an experiment is not reported, because it led to the same kind of interpretation problems as the application reported in Chapter 4.
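
As a sketch of this framework (again in schematic notation chosen for illustration), the source signals s_j(t) are related through linear filter responses,

    s_j(t) = \sum_{k \neq j} \int h_{jk}(\tau)\, s_k(t - \tau)\, d\tau + e_j(t),

which after Fourier transformation becomes a linear relation between the Fourier coefficients, s_j(\lambda) = \sum_{k \neq j} H_{jk}(\lambda)\, s_k(\lambda) + e_j(\lambda), with transfer functions H_{jk}(\lambda). A hypothesis of absent effective connectivity from source k to source j then corresponds to the testable restriction H_{jk}(\lambda) = 0, and the kernels h_{jk} play the role of the first-order Volterra terms mentioned above.
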
A linear approach to connectivity research

In Chapter 6, linear approaches to the inverse problem are examined more closely for their potential to detect functional connectivity. This was largely motivated by the problematic results of the methods of Chapters 4 and 5 on empirical data. A number of theoretical considerations are worked out concerning fundamental limitations on the ability of linear methods to reconstruct the current density (as an indication of the degree of activity) at a given location in the head.

These problems are most easily studied within the theoretical frameworks of estimable functions, the resolution matrix (for distributed source models [160]), and the related resolution field concept [146]. In accordance with the theory of estimable functions for overparameterized regression models [199], Chapter 6 argues that in the neuro-electromagnetic inverse problem one should focus on estimating those weighted averages of the current density that can be estimated without bias. In these weighted averages every possible source location receives a weight that determines the relative contribution of a source at that location to the weighted average. The weights must be constructed in such a way that the resulting weighted average permits a straightforward interpretation. This means that so-called averaging kernels must be found that can be estimated without bias and that are, for example, as selective as possible for a particular location or brain region; that is, kernels that selectively assign as much weight as possible to source locations in one region and as little as possible to all other regions. A fundamental limitation here is that the number of freely choosable parameters for specifying such a weighting equals the number of sensors, which is infinitely fewer than the number of possible source locations. As a consequence, it is not possible to construct a selective averaging kernel for every desired location. The locations and shapes of the regions for which selective averaging kernels do exist depend to a large extent on the properties of the biophysical relation between the neural sources and the MEG/EEG measurements.
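
A minimal sketch in R of how such averaging kernels arise for a linear inverse method (the lead field below is a random stand-in, not a biophysical model):

    ## The rows of the resolution matrix R = G L of a linear inverse G are the
    ## averaging kernels: the estimate at location i is the weighted average
    ## R[i, ] %*% j of the true current density j over all candidate locations.
    set.seed(1)
    m <- 32                          # number of sensors
    n <- 500                         # number of candidate source locations
    L <- matrix(rnorm(m * n), m, n)  # stand-in for the lead field matrix
    reg <- 0.1                       # regularization parameter
    G <- t(L) %*% solve(L %*% t(L) + reg * diag(m))  # regularized minimum-norm inverse
    R <- G %*% L                     # n x n resolution matrix
    ## A crude selectivity measure for the kernel of location 1: its weight at
    ## location 1 relative to the total absolute weight over all locations.
    R[1, 1] / sum(abs(R[1, ]))

Because each kernel (row of R) is a linear combination of the m rows of L, only m parameters per kernel are free, which is the fundamental limitation referred to above.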

To be able to search, within these restrictions, for averaging kernels that are highly selective for narrowly focused regions, a criterion must first be formulated that expresses the degree to which an averaging kernel is 'selective for a narrowly focused region'. More generally, one may search for estimable functions that have 'as simple an interpretation as possible'. The criteria used in Chapter 6 as measures of the 'simplicity of interpretation' of an averaging kernel are borrowed from the analytic rotation procedures in the psychometric factor analysis literature. With one of these criteria, the so-called 'entropy' criterion, an averaging kernel was found that was more focused (in terms of the Backus-Gilbert peak width of the spatial autocorrelation function) than the averaging kernels obtained with conventional linear methods. However, even this most focused averaging kernel was sensitive to sources spread throughout the entire head. This indicates that, unless these widespread brain regions are electrically inactive (or at least their signal is relatively weak), the resulting weighted average cannot be regarded as an average of the activity in a strongly localized part of the brain. Put differently, it appears that unambiguous conclusions about functional connectivity in the brain require that the activity in the cortex itself be strongly concentrated around a limited number of locations.

Conclusion

In Chapter 7 this last conclusion is elaborated further in terms of estimated correlations between the activations of different brain regions. From the discussion presented there it can be concluded that a necessary condition for unambiguous inferences about interactions between different parts of the brain is that the activity in the cortex itself concentrates around a limited number of strongly focused regions. Moreover, the locations of these foci of activity must be determined accurately. In the case that the activity does originate mainly from such foci but their locations are not (accurately) known, the ML methods proposed in Chapters 4 and 5 are in theory optimal; the SML algorithm of Chapter 3 then makes it possible to estimate the locations and orientations of these foci efficiently and stably. Because from a physiological point of view it is not plausible that these requirements are met in general, experimental task manipulations will have to be sought that create situations in which these requirements are in fact met.

References

1. A. Achim, F. Richer, and J. M. Saint-Hilaire. Methods for separating temporally overlapping sources of neuroelectric data. Brain Topography, 1(1):22–28, 1988.
2. L. I. Aftanas, A. A. Varlamov, S. V. Pavlov, V. P. Makhnev, and N. V. Reva. Affective picture processing: event-related synchronization within individually defined human theta band is modulated by valence dimension. Neuroscience Letters, 303:115–118, 2001.
3. D. Aharoni and H. Pratt. A note on the common reference debate. Electroenceph. and clin. Neurophysiol., 91:488–490, 1994.
4. J. Aitchison and S. D. Silvey. Maximum likelihood estimation of parameters subject to restraints. Annals of Mathematical Statistics, 29:813–828, 1958.
5. T. Amemiya. Advanced Econometrics. Harvard University Press, Cambridge, MA, 1986.
6. H. H. Andersen, M. Højbjerre, D. Sørensen, and P. S. Eriksen. Linear and Graphical Models for the Multivariate Complex Normal Distribution. Springer-Verlag, New York, 1995.
7. T. W. Anderson. The use of factor analysis in the statistical analysis of multiple time series. Psychometrika, 28:1–25, 1963.
8. T. W. Anderson. An Introduction to Multivariate Statistical Analysis. Wiley, New York, 1971.
9. C. Andrew and G. Pfurtscheller. Event-related coherence as a tool for studying dynamic interaction of brain regions. Electroencephalography and clinical Neurophysiology, 98:144–148, 1996.
10. M. Arnold, W. H. R. Miltner, H. Witte, R. Bauer, and C. Braun. Adaptive AR modeling of nonstationary time series by means of Kalman filtering. IEEE Trans. BME, 45(5):553–562, 1998.
11. G. Backus and F. Gilbert. Uniqueness in the inversion of inaccurate gross earth data. Phil. Trans. Roy. Soc. Lond., A, 266(1173):123–192, 1970.
12. S. Baillet, J. C. Mosher, and R. M. Leahy. Electromagnetic brain mapping. IEEE Signal Processing Magazine, pages 14–30, Nov 2001.
13. M. S. Bartlett. A note on the multiplying factors for various χ² approximations. J. Roy. Stat. Soc., 16:296–298, 1954.
14. P. M. Bentler and P. Dudgeon. Covariance structure analysis: statistical practice, theory, and directions. Annual Review of Psychology, 47:562–592, 1996.
15. C. A. Biggins, F. Ezekiel, and G. Fein. Spline computation of scalp current density and coherence: a reply to Perrin. Electroenceph. and clin. Neurophysiol., 83:172–174, 1992.
16. C. A. Biggins, G. Fein, J. Raz, and A. Amir. Artifactually high coherences result from using spherical spline computation of scalp current density. Electroenceph. and clin. Neurophysiol., 79:413–419, 1991.
17. F. Bijma, J. C. de Munck, H. M. Huizenga, and R. M. Heethaar. A mathematical approach to the temporal stationarity of background noise in MEG/EEG measurements. NeuroImage, 20:233–243, 2003.
18. M. Bilodeau and D. Brenner. Theory of Multivariate Statistics. Springer-Verlag, New York, 1999.
19. J. F. Böhme. Estimation of spectral parameters of correlated signals in wavefields. Signal Processing, 10:329–337, 1986.
20. K. A. Bollen. Structural Equations with Latent Variables. Wiley series in probability and mathematical statistics. Wiley, New York, 1st edition, 1989.
21. J. P. R. Bolton, J. Gross, L. C. Liu, and A. A. Ioannides. SOFIA: spatially optimal fast initial analysis of biomagnetic signals. Phys. Med. Biol., 44:87–103, 1999.
22. G. E. P. Box and G. M. Jenkins. Time Series Analysis: Forecasting and Control. Holden-Day, San Francisco, CA, 2nd edition, 1976.
23. L. A. Bradshaw and J. P. Wikswo Jr. A spatial filter approach for evaluation of the surface Laplacian of EEG and MEG. Ann. Biomed. Eng., 29:202–213, 2001.
24. S. L. Bressler. Event-related potentials. In M. A. Arbib, editor, The Handbook of Brain Theory and Neural Networks, pages 412–415. MIT Press, Cambridge, MA, 2002.
25. D. R. Brillinger. The identification of point process systems. The Annals of Probability, 3(6):909–929, Dec 1975.
26. D. R. Brillinger. Time Series: Data Analysis and Theory. International series in decision processes. Holt, Rinehart and Winston, New York, 1975.
27. D. R. Brillinger. Nerve cell spike train data analysis: a progression of technique. J. Am. Stat. Ass., 87(418):260–271, Jun 1992.
28. D. R. Brillinger and M. Hatanaka. An harmonic analysis of nonstationary multivariate economic processes. Econometrica, 37(1):131–141, Jan 1969.
29. M. W. Browne. Generalized least squares estimators in the analysis of covariance structures. South African Statistical Journal, 8:1–24, 1974.
30. M. W. Browne. Asymptotically distribution-free methods for the analysis of covariance structures. British Journal of Mathematical and Statistical Psychology, 37:62–83, 1984.
31. M. W. Browne. An overview of analytic rotation in exploratory factor analysis. Multivariate Behavioral Research, 36(1):111–150, 2001.
32. M. W. Browne and S. H. C. du Toit. Automated fitting of nonstandard models. Multivariate Behavioral Research, 27:269–300, 1992.
33. C. Büchel, J. T. Coull, and K. J. Friston. The predictive value of changes in effective connectivity for human learning. Science, 283:1538–1541, Mar 1999.
34. C. Büchel and K. J. Friston. Modulation of connectivity in visual pathways by attention: cortical interactions evaluated with structural equation modelling and fMRI. Cereb. Cortex, 7(8):768–778, 1997.
35. C. Büchel and K. J. Friston. Dynamic changes in effective connectivity characterized by variable parameter regression and Kalman filtering. Human Brain Mapping, 6:403–408, 1998.
36. E. Bullmore, B. Horwitz, G. Honey, M. Brammer, S. Williams, and T. Sharma. How good is good enough in path analysis of fMRI data? NeuroImage, 11:289–301, 2000.
37. J. Cao, N. Murata, S.-I. Amari, A. Cichocki, and T. Takeda. Independent component analysis for unaveraged single-trial MEG data decomposition and single-dipole source localization. Neurocomputing, 49:255–277, 2002.
38. J. B. Carroll. An analytic solution for approximating simple structure in factor analysis. Psychometrika, 18:23–28, 1953.
39. M. J. Cassidy and P. Brown. Hidden Markov based autoregressive analysis of stationary and nonstationary electrophysiological signals for functional coupling studies. J. Neuroscience Methods, 116:35–53, 2002.
40. D. Cheyne, G. R. Barnes, I. E. Holliday, and P. L. Furlong. Localization of brain activity associated with non-time-locked tactile stimulation using synthetic aperture magnetometry (SAM). In J. Nenonen, R. J. Ilmoniemi, and T. Katila, editors, BioMag2000, Proceedings of the 12th Int. Conf. on Biomagnetism, pages 681–685, Espoo, Finland, Aug 2000. Helsinki University of Technology.
41. I. A. Cook, R. O'Hara, S. H. J. Uijtdehaage, M. Mandelkern, and A. F. Leuchter. Assessing the accuracy of topographic EEG mapping for determining local brain function. Electroencephalography and clinical Neurophysiology, 107(6):408–414, 1998.
42. C. Ciulla, T. Takeda, and H. Endo. MEG characterization of spontaneous alpha rhythm in the human brain. Brain Topography, 11(3):211–222, 1999.
43. R. Dahlhaus. Graphical interaction models for multivariate time series. Preprint, 1995.
44. R. Dahlhaus, M. Eichler, and J. Sandkühler. Identification of synaptic connections in neural ensembles by graphical models. J. Neuroscience Methods, 77:93–107, 1997.
45. A. M. Dale, A. K. Liu, B. R. Fischl, R. L. Buckner, J. W. Belliveau, J. D. Lewine, and E. Halgren. Dynamic statistical parametric mapping: combining fMRI and MEG for high-resolution imaging of cortical activity. Neuron, 26:55–67, Apr 2000.
46. A. M. Dale and M. I. Sereno. Improved localization of cortical activity by combining EEG and MEG with MRI cortical surface reconstruction: a linear approach. J. Cogn. Neuroscience, 5(2):162–176, 1993.
47. O. David, D. Cosmelli, D. Hasboun, and L. Garnero. A multitrial analysis for revealing significant cortico-cortical networks in magnetoencephalography and electroencephalography. NeuroImage, 20(1):186–201, Sep 2003.
48. O. David, D. Cosmelli, J. P. Lachaux, S. Baillet, L. Garnero, and J. Martinerie. A theoretical and experimental introduction to the non-invasive study of large-scale neural phase synchronization in human beings (invited paper). International Journal of Computational Cognition, 1(4):53–77, Dec 2003.
49. O. David, L. Garnero, D. Cosmelli, and F. J. Varela. Estimation of neural dynamics from MEG/EEG cortical current density maps: application to the reconstruction of large-scale cortical synchrony. IEEE Trans. BME, 49(9):975–987, Sep 2002.
50. P. Dayan and L. F. Abbott. Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems. MIT Press, Boston, 1st edition, 2001.
51. A. de Jongh, J. C. de Munck, J. C. Baayen, E. J. Jonkman, R. H. Heethaar, and B. W. van Dijk. The localization of spontaneous brain activity: first results in patients with cerebral tumors. Clinical Neurophysiology, 112(2):378–385, 2001.
52. J. C. de Munck. A mathematical and physical interpretation of the electromagnetic fields of the brain. PhD thesis, University of Amsterdam, Amsterdam, 1989.
53. J. C. de Munck. The estimation of time varying dipoles on the basis of evoked potentials. Electroencephalography and Clinical Neurophysiology, 77(2):156–160, March–April 1990.
54. J. C. de Munck, A. de Jongh, and B. W. van Dijk. The localization of spontaneous brain activity: an efficient way to analyze large data sets. IEEE Transactions on Biomedical Engineering, 48:1221–1228, 2001.
55. J. C. de Munck, H. M. Huizenga, L. J. Waldorp, and R. M. Heethaar. Estimating stationary dipoles from MEG/EEG data contaminated with spatially and temporally correlated background noise. IEEE Transactions on Biomedical Engineering, 50(7):1565–1572, 2002.
56. J. C. de Munck, H. M. Huizenga, L. J. Waldorp, and R. M. Heethaar. Estimating stationary dipoles from MEG/EEG data contaminated with spatially and temporally correlated background noise. IEEE Transactions on Signal Processing, 50(7):1565–1572, 2002.
57. J. C. de Munck, B. W. van Dijk, and H. Spekreijse. Mathematical dipoles are adequate to describe realistic generators of human brain activity. IEEE Trans. Biomed. Eng., 35(11):960–966, Nov 1988.
58. J. C. de Munck, P. C. M. Vijn, and F. H. Lopes da Silva. A random dipole model for spontaneous brain activity. IEEE Transactions on Biomedical Engineering, 39(8):986–990, Aug 1992.
59. D. Khosla, M. Singh, and M. Don. Spatio-temporal EEG source localization using simulated annealing. IEEE Trans. Biomed. Eng., 44(11):1075–1091, Nov 1997.
60. A. Delorme, S. Makeig, M. Fabre-Thorpe, and T. Sejnowski. From single-trial EEG to brain area dynamics. Neurocomputing, 44–46:1057–1064, 2002.
61. Y. Dezhong. High-resolution EEG mappings: a spherical harmonic spectra theory and simulation results. Clinical Neurophysiol., 111:81–92, 2000.
62. C. V. Dolan and P. C. M. Molenaar. A comparison of four methods of calculating standard errors of maximum likelihood estimates in the analysis of covariance structure. British Journal of Mathematical and Statistical Psychology, 44:359–369, 1991.
63. G. Dumermuth and L. Molinari. Relationships among signals: cross-spectral analysis of the EEG. In R. Weitkunat, editor, Digital Biosignal Processing, pages 361–398. Elsevier Science, Amsterdam, 1991.
64. J. J. Ermer, J. C. Mosher, M. Huang, and R. M. Leahy. Paired MEG data set source localization using recursively applied and projected (RAP) MUSIC. IEEE Trans. on Biomedical Engineering, 47(9):1248–1260, 2000.
65. M. Essl and P. Rappelsberger. EEG coherence and reference signals: experimental results and mathematical explanations. Med. & Biol. Eng. & Comp., 36(4):399–406, 1998.
66. G. Fein, J. Raz, F. F. Brown, and E. L. Merrin. Common reference coherence data are confounded by power and phase effects. Electroenceph. and clin. Neurophysiol., 69:581–584, 1988.
67. J. Fell, O. Hauk, and H. Hinrichs. Linear inverse filtering improves spatial separation of nonlinear brain dynamics: a simulation study. Journal of Neuroscience Methods, 98:49–56, 2000.
68. P. J. Franaszczuk, K. J. Blinowska, and M. Kowalczyk. The application of parametric multichannel spectral estimates in the study of electrical brain activity. Biological Cybernetics, 51:239–247, 1985.
69. C. C. French and J. G. Beaumont. A critical review of EEG coherence studies of hemisphere function. Int. J. Psychophysiol., 1(3):241–254, 1984.
70. K. J. Friston. Functional specialization and integration in the brain: an example from schizophrenia research. In R. W. Thatcher, G. Reid Lyon, J. Rumsey, and N. Krasnegor, editors, Developmental Neuroimaging: Mapping the Development of Brain and Behavior, chapter 19, pages 281–295. Academic Press, San Diego, USA, 1996.
71. K. J. Friston. Imaging cognitive anatomy. Trends in Cogn. Sci., 1(1):21–27, 1997.
72. K. J. Friston and C. Büchel. Attentional modulation of effective connectivity from V2 to V5/MT in humans. Neuroimage, 14:1353–1360, 2001.
73. K. J. Friston, C. Buechel, G. R. Fink, J. Morris, E. Rolls, and R. J. Dolan. Psychophysiological and modulatory interactions in neuroimaging. Neuroimage, 6:218–229, 1997.
74. K. J. Friston, K. M. Stephan, and R. S. J. Frackowiak. Transient phase-locking and dynamic correlations: are they the same thing? Human Brain Mapping, 5:218–229, 1997.
75. J. Gerson, V. A. Cardenas, and G. Fein. Equivalent dipole parameter estimation using simulated annealing. Electroenceph. and clin. Neurophysiol., 92:161–168, 1994.
76. G. L. Gerstein, P. Bedenbaugh, and M. H. J. Aertsen. Neuronal assemblies. IEEE Trans. Biomed. Eng., 36(1):4–14, Jan 1989.
77. A. Gevins. The future of electroencephalography in assessing neurocognitive functioning. Electroencephalog. and Clin. Neurophysiol., 106:165–172, 1998.
78. P. E. Gill, W. Murray, M. A. Saunders, and M. H. Wright. User's Guide for NPSOL (Version 4.0): A FORTRAN Package for Nonlinear Programming. SOL, Stanford, California, 4.0 edition, 1998.
79. P. E. Gill, M. H. Wright, and W. Murray. Nonlinear Programming. Stanford University Press, Stanford, 1986.
80. G. H. Golub and V. Pereyra. The differentiation of pseudo-inverses and nonlinear least squares problems whose variables separate. SIAM J. Numer. Anal., 10(2):413–432, April 1973.
81. M. S. Gonçalves, D. A. Hall, I. S. Johnsrude, and M. P. Haggard. Can meaningful effective connectivities be obtained between auditory cortical regions? Neuroimage, 14:1353–1360, 2001.
82. M. Goossens, F. Mittelbach, and A. Samarin. The LaTeX Companion. Addison-Wesley, Boston, 1994.
83. I. F. Gorodnitsky and B. D. Rao. Sparse signal reconstruction from limited data using FOCUSS: a re-weighted minimum norm algorithm. IEEE Trans. Sign. Proc., 45(3):600–616, 1997.
84. J. Gotman. Measurement of small time differences between EEG channels: method and application to epileptic seizure propagation. Electroenceph. and Clin. Neurophys., 56:501–514, 1983.
85. R. P. P. P. Grasman, H. M. Huizenga, P. C. M. Molenaar, and L. J. Waldorp. Electromagnetic source localization and interactions between neural generators. In J. Nenonen, R. J. Ilmoniemi, and T. Katila, editors, BioMag2000, Proceedings of the 12th Int. Conf. on Biomagnetism, pages 738–740, Espoo, Finland, Aug 2000. Helsinki University of Technology.
86. R. P. P. P. Grasman, H. M. Huizenga, L. J. Waldorp, K. B. E. Böcker, and P. C. M. Molenaar. Frequency domain source and source coherence estimation. In H. Nowak, J. Haueisen, F. Gießler, and R. Huonker, editors, BioMag2002, Proceedings of the 13th Int. Conf. on Biomagnetism, pages 751–753, Jena, Germany, Aug 2002. VDE Verlag.
87. R. P. P. P. Grasman, H. M. Huizenga, L. J. Waldorp, K. B. E. Böcker, and P. C. M. Molenaar. Frequency domain simultaneous source and source coherence estimation with an application to MEG. IEEE Trans. on Biomedical Engineering, 51(1):45–55, Jan 2004.
88. R. P. P. P. Grasman, H. M. Huizenga, L. J. Waldorp, K. B. E. Böcker, and P. C. M. Molenaar. Stochastic maximum likelihood mean and cross-spectrum structure estimation of EEG/MEG dipole sources. IEEE Trans. Signal Processing, under revision.
89. R. P. P. P. Grasman, H. M. Huizenga, L. J. Waldorp, and P. C. M. Molenaar. Optimizing interpretability of averaging kernels for the neuro-electromagnetic inverse problem. IEEE Trans. Biomedical Engineering, in preparation.
90. R. P. P. P. Grasman, L. J. Waldorp, P. C. M. Molenaar, and H. M. Huizenga. Stochastic maximum likelihood array processing for rank deficient array manifolds. IEEE Trans. on Signal Processing, in preparation.
91. R. Grave de Peralta-Menendez and S. L. Gonzalez-Andino. A critical analysis of linear inverse solutions to the neuroelectromagnetic inverse problem. IEEE Trans. BME, 45(4):440–448, Apr 1998.
92. R. Grave de Peralta-Menendez and S. L. Gonzalez-Andino. Backus and Gilbert method for vector fields. Human Brain Mapping, 7:161–165, 1999.
93. R. Grave de Peralta-Menendez and S. L. Gonzalez-Andino. Discussing the capabilities of Laplacian minimization. Brain Topography, 13:97–104, 2000.
94. R. Grave de Peralta-Menendez, O. Hauk, S. L. Gonzalez-Andino, H. Vogt, and C. Michel. Linear inverse solutions with optimal resolution kernels applied to electromagnetic tomography. Human Brain Mapping, 5:454–467, 1997.
95. D. J. Griffiths. Introduction to Electrodynamics. Prentice Hall, New Jersey, 2nd edition, 1999.
96. J. Gross and A. A. Ioannides. Linear transformations of data space in MEG. Phys. Med. Biol., 44:2081–2097, 1999.
97. J. Gross, J. Kujala, M. Hämäläinen, L. Timmermann, A. Schnitzler, and R. Salmelin. Dynamic imaging of coherent sources: studying neural interactions in the human brain. Proc. Nat. Ac. Sci. USA, 98(2):694–699, 2001.
98. M. A. Guevara and M. Corsi-Cabrera. EEG coherence or EEG correlation? Int. J. of Psychophysiol., 23:145–153, 1996.
99. I. Haalman and E. Vaadia. Dynamics of neuronal interactions: relation to behavior, firing rates, and distance between neurons. Human Brain Mapping, 5(4):249–253, 1997.
100. Chen Hai-Wen, L. D. Jacobson, J. P. Gaska, and D. A. Pollen. Cross-correlation analyses of nonlinear systems with spatiotemporal inputs. IEEE Trans. BME, 40(11):1102–1113, Nov 1993.
101. M. Hämäläinen, R. Hari, R. Ilmoniemi, J. Knuutila, and O. V. Lounasmaa. Magnetoencephalography: theory, instrumentation, and applications to noninvasive studies of the working human brain. Reviews of Modern Physics, 65:413–497, 1993.
102. M. Hämäläinen, R. Hari, R. J. Ilmoniemi, J. Knuutila, and O. V. Lounasmaa. Magnetoencephalography: theory, instrumentation, and applications to noninvasive studies of the working human brain. Reviews of Modern Physics, 65:413–497, 1993.
103. H. Haneishi, N. Ohyama, K. Sekihara, and T. Honda. Multiple current dipole estimations using simulated annealing. IEEE Trans. Biomed. Eng., 41(11):1004–1009, 1994.
104. E. J. Hannan and M. Deistler. The Statistical Theory of Linear Systems. Wiley series in probability and mathematical statistics. Wiley, New York, 1988.
105. T. Holroyd, M. Nielsen, S. Miyauchi, and T. Yanagida. Broad-band magnetic brain activity during rhythmic tapping tasks. In J. Nenonen, R. J. Ilmoniemi, and T. Katila, editors, BioMag2000, Proceedings of the 12th Int. Conf. on Biomagnetism, pages 307–310, Espoo, Finland, Aug 2000. Helsinki University of Technology.
106. B. Horwitz. The elusive concept of brain connectivity. Neuroimage, 19:466–470, 2003.
107. J. H. Houtveen, B. Bermond, and M. R. Elton. Alexithymia: a disruption in a cortical network? An EEG power and coherence analysis. J. Psychophysiology, 11(2):147–157, 1997.
108. H. M. Huizenga. The statistical approach to electromagnetic source localization in the brain. PhD thesis, University of Amsterdam, Amsterdam, 1995.
109. H. M. Huizenga, J. C. de Munck, L. J. Waldorp, and R. P. P. P. Grasman. Spatiotemporal EEG/MEG source analysis based on a parametric noise covariance model. IEEE Transactions on Biomedical Engineering, 2002.
110. H. M. Huizenga, D. J. Heslenfeld, and P. C. M. Molenaar. Optimal measurement conditions for spatiotemporal EEG/MEG source analysis. Psychometrika, 67(2):299–313, Jun 2002.
111. H. M. Huizenga and P. C. M. Molenaar. Estimating and testing the sources of evoked potentials in the brain. Multivariate Behavioral Research, 28:237–262, 1994.
112. H. M. Huizenga and P. C. M. Molenaar. Equivalent source estimation of scalp potential fields contaminated by heteroscedastic and correlated noise. Brain Topography, 8(1):13–33, 1995.
113. H. M. Huizenga and P. C. M. Molenaar. Optimal measurement conditions for spatiotemporal EEG/MEG source analysis. Psychometrika, 67(2):299–313, 2002.
114. H. M. Huizenga, T. L. Van Zuijen, D. J. Heslenfeld, and P. C. M. Molenaar. Simultaneous MEG and EEG source analysis. Physics in Medicine and Biology, 46(7):1737–1751, 2001.
115. A. Hyvärinen and E. Oja. Independent component analysis: algorithms and applications. Neural Networks, 13:411–430, 2000.
116. A. A. Ioannides. Real time human brain function: observations and inferences from single trial analysis of magnetoencephalographic signals. Clinical Electroencephalography, 32(3):98–111, 2001.
117. A. A. Ioannides, R. Hasson, and G. J. Miseldine. Model-dependent noise elimination and distributed source solutions for the biomagnetic inverse problem. In A. F. Gmitro, P. S. Idell, and I. J. Lahaie, editors, Digital Image Synthesis and Inverse Optics, volume 1351 of SPIE Conf., pages 471–481. SPIE, 1990.
118. A. A. Ioannides, G. K. Kostopoulos, N. A. Laskaris, L. Liu, T. Shibata, M. Schellens, V. Poghosyan, and A. Khurshudyan. Timing and connectivity in the human somatosensory cortex from single trial mass electrical activity. Hum. Brain Mapping, 15:231–246, 2002.
119. R. I. Jennrich. A simple general method for oblique rotation. Psychometrika, 67(1):7–20, 2002.
120. E. R. John. The neurophysics of consciousness. Brain Research Reviews, 39:1–28, 2002.
121. K. G. Jöreskog. A general approach to confirmatory maximum likelihood factor analysis. Psychometrika, 34:182–202, 1969.
122. K. G. Jöreskog. Analysis of covariance structures. Scandinavian Journal of Statistics, 8:65–92, 1981.
123. K. G. Jöreskog and D. Sörbom. LISREL 7, a Guide to the Program and Applications. Jöreskog and Sörbom/SPSS Inc., Chicago, Illinois, 2nd edition, 1989.
124. H. F. Kaiser. The varimax criterion for analytic rotation in factor analysis. Psychometrika, 23(3):187–200, 1958.
125. M. Kaminski and K. Blinowska. A new method of the description of the information flow in the brain structures. Biol. Cybern., 65:203–210, 1991.
126. M. Kaminski, K. Blinowska, and W. Szelenberger. Topographic analysis of coherence and propagation of EEG activity during sleep and wakefulness. Electroenceph. and clin. Neurophysiol., 102:216–227, 1997.
127. E. R. Kandel, J. H. Schwartz, and T. M. Jessell. Principles of Neural Science. McGraw-Hill, New York, USA, 4th edition, 2000.
128. E. R. Kandel and S. A. Siegelbaum. Overview of synaptic transmission. In E. R. Kandel, J. H. Schwartz, and T. M. Jessell, editors, Principles of Neural Science, chapter 10, pages 175–186. McGraw-Hill, New York, USA, 4th edition, 2000.
129. R. D. Katznelson. EEG recording, electrode placement, and aspects of generator localization. In P. L. Nunez, editor, Electric Fields of the Brain: The Neurophysics of EEG, pages 176–213. Oxford University Press, New York, 1981.
130. J. L. Kenemans, J. M. P. Baas, G. R. Mangun, M. Lijffijt, and M. N. Verbaten. On the processing of spatial frequencies as revealed by evoked-potential source modeling. Clinical Neurophysiology, 111(6):1113–1123, 2000.
131. M. G. Knyazeva, D. C. Kiper, V. Y. Vildavski, P. A. Despland, M. Maeder-Ingvar, and G. M. Innocenti. Visual stimulus-dependent changes in interhemispheric EEG coherence in humans. J. Neurophysiol., 82(6):3095–3107, Dec 1999.
132. Z. J. Koles and P. Flor-Henry. The effect of brain function on coherence patterns in the bipolar EEG. Int. J. of Psychophysiol., 5:63–71, 1987.
133. L. H. Koopmans. The Spectral Analysis of Time Series, volume 22 of Probability and Mathematical Statistics. Academic Press, San Diego, 2nd edition, 1995.
134. H. Krim and M. Viberg. Two decades of array signal processing research: the parametric approach. IEEE Signal Processing Mag., 13(4):67–95, July 1996.
135. J. P. Lachaux, A. Lutz, D. Rudrauf, D. Cosmelli, M. Le van Quyen, J. Martinerie, and F. J. Varela. Estimating the time-course of coherence between single-trial brain signals: an introduction to wavelet coherence. Neurophysiologie Clinique, 32(3):157–174, 2002.
136. J. P. Lachaux, E. Rodriguez, J. Martinerie, and F. J. Varela. Measuring phase synchrony in brain signals. Human Brain Mapping, 8:194–208, 1999.
137. T. D. Lagerlund, F. W. Sharbrough, N. E. Busacker, and K. M. Cicora. Interelectrode coherences from nearest-neighbor and spherical harmonic expansion computation of Laplacian of scalp potential. Electroenceph. and clin. Neurophysiol., 95:178–188, 1995.
138. L. Lamport. LaTeX: A Document Preparation System. User's Guide and Reference Manual. Addison-Wesley, Reading, Massachusetts, 2nd edition, 1994.
139. M. Le van Quyen, J. Foucher, J. P. Lachaux, E. Rodriguez, A. Lutz, J. Martinerie, and F. J. Varela. Comparison of Hilbert transform and wavelet methods for the analysis of neuronal synchrony. J. Neurosci. Meth., 111:83–98, 2001.
140. L. Lee, M. Harrison, and A. Mechelli. A report of the functional connectivity workshop, Düsseldorf 2002. NeuroImage, 19:457–465, 2003.
141. D. Liberati, M. Cursi, T. Locatelli, G. Comi, and S. Cerutti. Total and partial coherence analysis of spontaneous and evoked EEG by means of multi-variable autoregressive processing. Med. & Biol. Eng. & Comput., 35:124–130, 1997.
142. F. H. Lopes da Silva, A. van Rotterdam, P. Barts, E. van Heusden, and W. Burr. Models of neuronal populations: the basic mechanisms of rhythmicity. In M. A. Corner and D. F. Swaab, editors, Perspectives of Brain Research, volume 45 of Prog. Brain Res., pages 281–308. 1976.
143. F. H. Lopes da Silva, J. E. Vos, J. Mooibroek, and A. van Rotterdam. Relative contributions of intracortical and thalamo-cortical processes in the generation of alpha rhythms, revealed by partial coherence analysis. Electroenceph. and clin. Neurophysiol., 50:449–456, 1980.
144. B. Lütkenhöner. Frequency-domain localization of intracerebral dipolar sources. Electroencephalography and clinical Neurophysiology, 82(2):112–118, 1992.
145. B. Lütkenhöner. Magnetic field arising from current dipoles randomly distributed in a homogeneous spherical volume conductor. J. Appl. Physics, 75(11):7204–7210, 1994.
146. B. Lütkenhöner and R. Grave de Peralta-Menendez. The resolution-field concept. Electroencephalography and clinical Neurophysiology, 102:326–334, 1997.
147. R. C. MacCallum. Specification searches in covariance structure modeling. Psychological Bulletin, 100:107–120, 1986.
148. R. C. MacCallum, M. Roznowski, and L. B. Necowitz. Model modifications in covariance structure analysis: the problem of capitalization on chance. Psychological Bulletin, 111(3):490–504, 1992.
149. B. Maess, A. D. Friederici, M. Damian, A. S. Meyer, and W. J. M. Levelt. Semantic category interference in overt picture naming: sharpening current density localization by PCA. IEEE Trans. Sign. Proc., 14(3):455–462, 2002.
150. J. R. Magnus and H. Neudecker. Matrix Differential Calculus: With Applications in Statistics and Econometrics. Wiley series in probability and statistics. Wiley, Chichester, revised edition, 1999.
151. J. Maier, G. Dagnelie, H. Spekreijse, and B. W. van Dijk. Principal components analysis for source localization of VEPs in man. Vision Research, 27:165–177, 1987.
152. S. Makeig, M. Westerfield, T.-P. Jung, S. Enghoff, J. Townsend, E. Courchesne, and T. J. Sejnowski. Dynamic brain sources of visual evoked responses. Science, 295(5555):690–694, 2002.
153. S. L. Marple Jr. Computing the discrete-time "analytic" signal via FFT. IEEE Trans. Sign. Proc., 47(9):2601–2603, Sep 1999.
154. S. L. Marple Jr. Estimating group delay and phase delay via discrete-time "analytic" cross-correlation. IEEE Trans. Sign. Proc., 47(9):2604–2607, Sep 1999.
155. K. Matsuura and Y. Okabe. Selective minimum-norm solution of the biomagnetic inverse problem. IEEE Trans. Biomed. Engin., 42(6):608–615, Jun 1995.
156. A. R. McIntosh. Mapping cognition to the brain through neural interactions. Memory, 7(5/6):523–548, 1999.
157. A. R. McIntosh, C. L. Grady, L. G. Ungerleider, J. V. Haxby, S. I. Rapoport, and B. Horwitz. Network analysis of cortical visual pathways mapped with PET. Journal of Neuroscience, 14(2):655–666, 1994.
158. A. R. McIntosh, C. L. Grady, L. G. Ungerleider, J. V. Haxby, S. I. Rapoport, and B. Horwitz. Network analysis of cortical visual pathways mapped with PET. Journal of Neuroscience, 14(2):655–666, 1994.
159. A. R. McIntosh, F. L. Bookstein, J. V. Haxby, and C. L. Grady. Spatial pattern analysis of functional brain images using partial least squares. Neuroimage, 3(3):143–157, 1996.
160. W. Menke. Geophysical Data Analysis: Discrete Inverse Theory. Academic Press, San Diego, USA, revised edition, 1989.
161. P. C. M. Molenaar. A dynamic factor model for the analysis of multivariate time series. Psychometrika, 50(2):181–202, 1985.
162. P. C. M. Molenaar. Dynamic factor analysis in the frequency domain: causal modeling of multivariate psychophysiological time series. Multiv. Behav. Res., 22(3):329–353, 1987.
163. P. C. M. Molenaar. Dynamic factor analysis of psychophysiological signals. In J. R. Jennings, P. Ackles, and M. G. H. Coles, editors, Advances in Psychophysiology, volume 5, pages 229–302. Jessica Kingsley Publishers, London, 1993.
164. P. C. M. Molenaar and J. R. Nesselroade. Rotation in the dynamic factor modeling of multivariate stationary time series. Psychometrika, 66(1):99–107, Mar 2001.
165. P. C. M. Molenaar, J. G. de Gooijer, and B. Schmitz. Dynamic factor analysis of nonstationary multivariate time series. Psychometrika, 57:333–349, 1992.
166. D. F. Morrison. Multivariate Statistical Methods. McGraw-Hill, New York, 2nd edition, 1989.
167. J. C. Mosher and R. M. Leahy. Recursive MUSIC: a framework for EEG and MEG source localization. IEEE Transactions on Biomedical Engineering, 45(11):1342–1354, Nov 1998.
168. J. C. Mosher, R. M. Leahy, and P. S. Lewis. EEG and MEG: forward solutions for inverse methods. IEEE Transactions on Biomedical Engineering, 46(3):245–259, March 1999.
169. J. C. Mosher, P. S. Lewis, and R. M. Leahy. Multiple dipole modeling and localization from spatio-temporal MEG data. IEEE Transactions on Biomedical Engineering, 39(6):543–557, June 1992.
170. J. I. Nelson. Binding in the visual system. In M. A. Arbib, editor, Handbook of Brain Research and Neural Networks, pages 157–159. MIT Press, Cambridge, MA, 1995.
171. P. L. Nunez and R. B. Silberstein. On the relationship of synaptic activity to macroscopic measurements: does co-registration of EEG with fMRI make sense? Brain Topography, 13(2):79–96, 2000.
172. P. L. Nunez, R. B. Silberstein, P. J. Cadusch, R. S. Wijesinghe, A. F. Westdorp, and R. Srinivasan. A theoretical and experimental study of high resolution EEG based on surface Laplacian and cortical imaging. Electroenceph. and clin. Neurophysiol., 90:40–57, 1994.
173. P. L. Nunez, R. B. Silberstein, Z. Shi, M. R. Carpenter, R. Srinivasan, D. M. Tucker, S. M. Doran, P. J. Cadusch, and R. S. Wijesinghe. EEG coherency II: experimental comparisons of multiple measures. Clin. Neurophysiol., 110:469–486, 1999.
174. P. L. Nunez, R. Srinivasan, A. F. Westdorp, R. S. Wijesinghe, D. M. Tucker, R. B. Silberstein, and P. J. Cadusch. EEG coherency I: statistics, reference electrode, volume conduction, Laplacians, cortical imaging, and interpretation at multiple scales. Electroenceph. and clin. Neurophys., 103:499–515, 1997.
175. B. Ottersten, M. Viberg, and T. Kailath. Analysis of subspace fitting and ML techniques for parameter estimation from sensor array data. IEEE Trans. Signal Processing, 40(3):590–600, March 1992.
176. R. D. Pascual-Marqui. The spherical spline Laplacian does not produce artifactually high coherences: comments on two articles by Biggins et al. Electroenceph. and clin. Neurophysiol., 87:62–64, 1993.
177. R. D. Pascual-Marqui, S. L. Gonzalez-Andino, P. A. Valdes-Sosa, and R. Biscay-Lirio. Current source density estimation and interpolation based on the spherical harmonic Fourier expansion. Int. J. Neurosci., 43:237–249, 1988.
178. R. D. Pascual-Marqui, C. M. Michel, and D. Lehmann. Low resolution electromagnetic tomography: a new method for localizing electrical activity in the brain. Int. J. Psychophysiol., 18:49–65, 1994.
179. W. M. Patefield. On the maximized likelihood function. Sankhya, Series B, 39:92–96, 1977.
180. A. Paulraj, B. Ottersten, R. Roy, A. Swindlehurst, G. Xu, and T. Kailath. Subspace methods for directions-of-arrival estimation. In N. K. Bose and C. R. Rao, editors, Handbook of Statistics, chapter 16, pages 639–739. Elsevier Science Publishers B.V., Amsterdam, Netherlands, 1993.
181. F. Perrin. Comments on article by Biggins et al. Electroenceph. and clin. Neurophysiol., 83:171–172, 1992.
182. F. Perrin, J. Pernier, O. Bertrand, and J. F. Echallier. Spherical splines for scalp potential and current density mapping. Electroencephalogr. Clin. Neurophysiol., 72:184–187, 1989.
183. G. Pfurtscheller and F. H. Lopes da Silva. Event-related EEG/MEG synchronization and desynchronization: basic principles. Clinical Neurophysiology, 110(11):1842–1857, 1999.
184. D. T. Pham, J. Möcks, W. Köhler, and T. Gasser. Variable latencies of noisy signals: estimation and testing in brain potential data. Biometrika, 74(3):525–533, 1987.
185. M. I. Posner and M. E. Raichle. Networks of attention. In Images of Mind, pages 153–179. W. H. Freeman and Company, New York, 1994.
186. W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, Cambridge, 2nd edition, 1993.
187. R Development Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2003. ISBN 3-900051-00-3.
188. P. Rappelsberger. The reference problem and mapping of coherence: a simulation study. Brain Topography, pages 63–72, 1989.
189. J. Raz, C. A. Biggins, B. Turetsky, and G. Fein. Frequency-domain dipole localization: extensions of the method and applications to auditory and visual-evoked potentials. IEEE Transactions on Biomedical Engineering, 40(9):909–918, 1993.
190. J. Raz, V. Cardenas, and D. Fletcher. Frequency-domain estimation of covariate effects in multichannel brain evoked-potential data. Biometrics, 51(2):448–460, 1995.
191. J. Raz, B. Turetsky, and G. Fein. Frequency-domain estimation of the parameters of human brain electrical dipoles. Journal of the American Statistical Association, 87(417):69–77, 1992.
192. C. A. Rohde. Generalized inverses of partitioned matrices. J. Soc. Indust. Appl. Math., 13(4):1033–1035, 1965.
193. J. R. Rosenberg, D. M. Halliday, P. Breeze, and B. A. Conway. Identification of patterns of neuronal connectivity: partial spectra, partial coherence, and neuronal interactions. J. Neurosci. Meth., 83:57–72, 1998.
194. M. G. Rosenblum, A. S. Pikovsky, and J. Kurths. Phase synchronization of chaotic oscillators. Phys. Rev. Let., 76(11):1804–1807, 1996.
195. J. Sarvas. Basic mathematical and electromagnetic concepts of the biomagnetic inverse problem. Phys. Med. Biol., 32:11–22, 1987.
196. D. R. Saunders. The rationale for an "oblimax" method of transformation in factor analysis. Psychometrika, 26:317–324, 1961.
197. B. Schack and W. Krause. Dynamic power and coherence analysis of ultra short-term cognitive processes: a methodological study. Brain Topography, 8(2):127–135, 1995.
198. M. Scherg. Fundamentals of dipole source potential analysis. In F. Grandori, M. Hoke, and G. L. Romani, editors, Auditory Evoked Magnetic Fields and Electric Potentials, volume 6 of Advances in Audiology, pages 40–69. Karger, Basel, 1990.
199. J. R. Schott. Matrix Analysis for Statistics. Wiley Series in Probability and Statistics. Wiley, New York, 1997.
200. G. A. F. Seber and C. J. Wild. Nonlinear Regression. Wiley series in probability and mathematical statistics: applied probability and statistics. Wiley, New York, 1989.
201. T. J. Sejnowski and P. Smith Churchland. Brain and cognition. In M. I. Posner, editor, Foundations of Cognitive Science, chapter 8, pages 301–358. MIT Press, Cambridge, MA, 1989.
202. K. Sekihara, S. S. Nagarajan, D. Poeppel, and A. Marantz. Performance of an MEG adaptive-beamformer technique in the presence of correlated neural activities: effects on signal intensity and time-course estimates. IEEE Trans. Biomed. Eng., 49(12):1534–1546, 2002.
203. K. Sekihara, S. S. Nagarajan, D. Poeppel, A. Marantz, and Y. Miyashita. Reconstructing spatio-temporal activities of neural sources using a MEG vector beamformer technique. IEEE Trans. Biomed. Eng., 48(7):760–771, 2001.
204. K. Sekihara, S. S. Nagarajan, D. Poeppel, A. Marantz, and Y. Miyashita. Application of an MEG eigenspace beamformer to reconstructing spatio-temporal activities of neural sources. Human Brain Mapping, 15:199–215, 2002.
205. K. Sekihara and B. Scholz. Average-intensity reconstruction and Wiener reconstruction of bioelectric current distribution based on its estimated covariance. IEEE Trans. Biomed. Eng., 42(2):149–157, 1995.
206. A. Shapiro and M. W. Browne. Analysis of covariance structures under elliptical distributions. Journal of the American Statistical Association, 82(400):1092–1097, 1987.
207. S. D. Silvey. Statistical Inference. Penguin, Harmondsworth, 1970.
208. W. Singer. Synchronization of neural responses as a putative binding mechanism. In M. A. Arbib, editor, Handbook of Brain Research and Neural Networks, pages 960–964. MIT Press, Cambridge, MA, 1995.
209. K. D. Singh, G. R. Barnes, A. Hillebrand, E. M. E. Forde, and A. L. Williams. Task-related changes in cortical synchronization are spatially coincident with the hemodynamic response. NeuroImage, 16:103–114, 2002.
210. W. E. Smith. Estimation of the spatio-temporal correlations of biological electrical sources from their magnetic fields. IEEE Trans. BME, 39(10):997–1004, Oct 1992.
211. O. Sporns. Network analysis, complexity, and brain function. Complexity, 8(1):56–60, 2003.
212. R. Srebro. Iterative refinement of the minimum norm solution of the bioelectric inverse problem. IEEE Trans. BME, 43(5):547–552, May 1996.
213. R. Srinivasan. Methods to improve the spatial resolution of EEG. Int. J. Bioelectromagn., 1(1):102–111, 1999.
214. R. Srinivasan, P. L. Nunez, and R. B. Silberstein. Spatial filtering and neocortical dynamics: estimates of EEG coherence. IEEE Transactions on Biomedical Engineering, 45(7):814–826, 1998.
215. C. J. Stam, T. C. A. M. van Woerkom, and W. S. Pritchard. Use of non-linear EEG measures to characterize EEG changes during mental activity. Electroencephalogr. Clin. Neurophysiol., 99:214–224, 1996.
216. P. Stoica, E. G. Larsson, and A. B. Gershman. The stochastic CRB for array processing: a textbook derivation. IEEE Sign. Proc. Lett., 8(5):148–150, May 2001.
217. P. Stoica and A. Nehorai. MUSIC, maximum likelihood and the Cramér-Rao bound. IEEE Trans. Acoust., Speech, and Sign. Proc., 37(5):720–741, May 1989.
218. P. Stoica and A. Nehorai. MUSIC, maximum likelihood and the Cramér-Rao bound: further results and comparison. IEEE Trans. Acoust., Speech, and Sign. Proc., 38(12):2140–2150, Dec 1990.
219. P. Stoica, B. Ottersten, and M. Viberg. Optimal array signal processing in the presence of coherent wavefronts. In Proc. ICASSP, volume 5, pages 2904–2907, New York, NY, USA, 1996. IEEE.
220. P. Stoica, B. Ottersten, M. Viberg, and R. Moses. Maximum likelihood array processing for stochastic coherent sources. IEEE Trans. on Signal Processing, 44(1):96–105, January 1996.
221. S. Supek and C. J. Aine. Spatio-temporal modeling of neuromagnetic data: II. Multi-source resolvability of a MUSIC-based location estimator. Human Brain Mapping, 5(3):154–167, 1997.
222. P. Tass, M. G. Rosenblum, J. Weule, J. Kurths, A. Pikovsky, J. Volkmann, A. Schnitzler, and H. J. Freund. Detection of n:m phase locking from noisy data: application to magnetoencephalography. Phys. Rev. Let., 81(15):3291–3294, 1998.
223. R. W. Thatcher, P. J. Krause, and M. Hrybyk. Cortico-cortical associations and EEG coherence: a two-compartmental model. Electroenceph. and Clin. Neurophysiol., 64:123–143, 1986.
224. K. Torquati, V. Pizzella, S. Della Penna, R. Franciotti, C. Babiloni, P. M. Rossini, and G. L. Romani. Comparison between SI and SII responses as a function of stimulus intensity. Neuroreport, 13(6):813–819, 2002.
225. W. A. Truccolo, M. Ding, K. H. Knuth, R. Nakamura, and S. L. Bressler. Trial-to-trial variability of cortical evoked responses: implications for the analysis of functional connectivity. Clin. Neurophysiol., 113:206–226, 2002.
226. D. M. Tucker, D. L. Roth, and T. B. Bair. Functional connections among cortical regions: topography of EEG coherence. Electroenceph. and clin. Neurophysiol., 63:242–250, 1986.
227. K. Uutela, M. Hämäläinen, and R. Salmelin. Global optimization in the localization of neuromagnetic sources. IEEE Transactions on Biomed. Eng., 45(6):716–723, Jun 1998.
228. A. W. van der Vaart. Asymptotic Statistics. Cambridge series in statistical and probabilistic mathematics. Cambridge University Press, Cambridge, United Kingdom, 1998.
229. B. D. Van Veen, W. van Drongelen, M. Yuchtman, and A. Suzuki. Localization of brain electrical activity via linearly constrained minimum variance spatial filtering. IEEE Trans. BME, 44(9):867–880, Sept 1997.
230. F. Varela, J. P. Lachaux, E. Rodriguez, and J. Martinerie. The brainweb: phase synchronization and large-scale integration. Nature Reviews Neuroscience, 2(4):229–239, Apr 2001.
231. H. G. Vaughan and J. C. Arezzo. The neural basis of event-related potentials. In T. W. Picton, editor, Human Event-Related Potentials, volume 3 of EEG Handbook (revised series), pages 45–96. Elsevier Science Publishers, Amsterdam, 1988.
232. M. Viberg, B. Ottersten, and A. Nehorai. Performance analysis of direction finding with large arrays and finite data. IEEE Transactions on Signal Processing, 43(2):469–477, Feb 1995.
233. J. Vrba and S. E. Robinson. Differences between Synthetic Aperture Magnetometry (SAM) and linear beamformers. In J. Nenonen, R. J. Ilmoniemi, and T. Katila, editors, BioMag2000, Proceedings of the 12th Int. Conf. on Biomagnetism, pages 681–685, Espoo, Finland, Aug 2000. Helsinki University of Technology.
234. J. Vrba and S. E. Robinson. Signal processing in magnetoencephalography. Methods, 25:249–271, 2001.
235. L. J. Waldorp, H. M. Huizenga, C. V. Dolan, and P. C. M. Molenaar. Estimated generalized least squares electromagnetic source analysis based on a parametric noise covariance model. IEEE Transactions on Biomedical Engineering, 48:737–741, 2001.
236. L. J. Waldorp, H. M. Huizenga, C. V. Dolan, and P. C. M. Molenaar. Estimated generalized least squares electromagnetic source analysis based on a parametric noise covariance model. IEEE Transactions on Biomedical Engineering, 48(6):737–741, 2001.
237. L. J. Waldorp, H. M. Huizenga, R. P. P. P. Grasman, K. B. E. Böcker, J. C. de Munck, and P. C. M. Molenaar. Model selection in electromagnetic source analysis with an application to VEFs. IEEE Transactions on Biomedical Engineering, 49(10):1121–1129, 2002.
238. L. J. Waldorp, H. M. Huizenga, R. P. P. P. Grasman, K. B. E. Böcker, J. C. de Munck, and P. C. M. Molenaar. Model selection procedures in spatiotemporal electromagnetic source analysis. In preparation.
239. M. Wax and T. Kailath. Detection of signals by information theoretic criteria. IEEE Trans. ASSP, 33(2):387–392, Apr 1985.
240. S. J. Williamson and L. Kaufman. Theory of neuroelectric and neuromagnetic fields. In F. Grandori, M. Hoke, and G. L. Romani, editors, Auditory Evoked Magnetic Fields and Electric Potentials, volume 6 of Advances in Audiology, pages 1–39. Karger, Basel, 1990.
241. J. Yordanova, V. Kolev, and J. Polich. P300 and alpha event-related desynchronization (ERD). Psychophysiology, 38:143–152, 2001.
242. K. H. Yuan and P. M. Bentler. Mean and covariance structure analysis: theoretical and practical improvements. Journal of the American Statistical Association, 92(438):767–774, 1997.
243. S. R. Zhao, H. Heer, A. A. Ioannides, M. Wagener, H. Halling, and H. W. Müller-Gärtner. Interpolation of magnetic fields and their gradients for MEG data with 3D spline functions.
