The Co-occurrence of Multisensory Facilitation and Competition in the Human Brain and its Impact on Aging

by

Andreea Diaconescu

A thesis submitted in conformity with the requirements
for the degree of Doctor of Philosophy
Graduate Department of Psychology
University of Toronto

© Copyright by Andreea Diaconescu 2011

The Co-occurrence of Multisensory Facilitation and Competition in the Human Brain and its Impact on Aging

Andreea Diaconescu

Doctor of Philosophy
Graduate Department of Psychology
University of Toronto
2011

Abstract

Perceptual objects often comprise a visual and an auditory signature, which arrive simultaneously through distinct sensory channels, and multisensory features are linked by virtue of being attributed to a specific object. The binding of familiar auditory and visual signatures can be referred to as semantic audiovisual (AV) integration because it involves higher level representations of naturalistic multisensory objects. While integration of semantically related multisensory features is behaviourally advantageous, multisensory competition, or situations of sensory dominance of one modality at the expense of another, impairs performance. Multisensory facilitation and competition effects on performance are exacerbated with age: older adults show a significantly larger performance gain from bimodal presentations compared to unimodal ones. In the present thesis project, magnetoencephalography (MEG) recordings obtained during the presentation of semantically related bimodal and unimodal stimuli captured the spatiotemporal patterns underlying both multisensory facilitation and competition in young and older adults. We first demonstrate that multisensory processes unfold in multiple stages: first, posterior parietal neurons respond preferentially to bimodal stimuli; second, regions in superior temporal and posterior cingulate cortices detect the semantic category of the stimuli; and finally, at later processing stages, orbitofrontal regions process crossmodal conflicts when complex sounds and pictures are semantically incongruent. Older adults, in contrast to young adults, are more efficient at integrating semantically congruent multisensory information across auditory and visual channels. Moreover, in these multisensory facilitation conditions, increased neural activity in medial fronto-parietal brain regions predicts faster motor performance in response to bimodal stimuli in older compared to younger adults. Finally, by examining the variability of the MEG signal, we show that an increase in local entropy with age is behaviourally adaptive in the older group, as it significantly correlates with more stable and more accurate performance in older compared to young adults.


Acknowledgments

I would like to thank my supervisor, Dr. Randy McIntosh, for his continuing support, guidance, and professional mentoring. My research these last six years has been incredibly rewarding and will serve as a source of inspiration for the rest of my career. The other members of my thesis committee, Drs. Lynn Hasher and Claude Alain, also contributed to my work with most valuable advice. I would also like to thank Dr. Natasa Kovacevic and the members of the MEG lab, including Dr. Bernhard Ross, for their invaluable assistance in data collection and analysis. The members of the McIntosh lab have also provided indispensable critique and feedback on my work. Finally, I would like to acknowledge the continued patience, love and support provided by my family and friends. Thank you for encouraging me every step of the way.


Table of Contents

Chapter 1: General Introduction
1. Multisensory facilitation
1.1 Overview
1.2 Electrophysiological recordings of basic multisensory facilitation
1.3 Neuroimaging studies of basic multisensory facilitation
2. Audiovisual semantic congruence: A special case of multisensory facilitation
2.1 Multisensory competition
2.2 Electrophysiological studies of multisensory interactions
2.3 Functional neuroimaging studies of multisensory interactions
2.4 Audiovisual speech: A special case of multimodal congruence
3. Perceptual and cognitive changes with increasing age
3.1 Age and inhibitory control: Behavioural findings
3.2 Age and inhibitory control: Electrophysiological changes
3.3 Age and inhibitory control: Evidence from neuroimaging studies
3.4 Age and behavioural variability
3.5 Audiovisual multisensory integration and aging
4. Aging and compensatory neural mechanisms
4.1 Age-related structural changes
4.2 Age-related functional changes
5. Purpose of the study

Chapter 2: Methodology and Experimental Design
1. Experimental design
1.1 Stimuli
1.2 Procedure
1.3 Cognitive and neuropsychological testing
2. MEG recordings
2.1 Overview of MEG recordings
2.2 Beamforming approaches to source analysis
3. Multivariate statistical analysis: Partial least squares
3.1 Overview of multivariate statistical methods for neuroimaging studies
3.2 Partial least squares (PLS)
3.2.1 Data organization
3.2.2 Singular value decomposition
3.2.3 Behaviour PLS analysis approach
3.3 Statistical assessment


Chapter 3: Aging Effects of Multisensory Facilitation and Competition
1. Introduction
2. Materials and methods
2.1 Participants
2.2 Stimuli
2.3 Procedure
2.3.1 Experimental conditions
2.3.2 Neuropsychological screening
3. Results
3.1 Neuropsychological screening
3.2 Accuracy
3.3 Response latencies
3.4 Multisensory facilitation and competition
3.5 Response time variability
4. Discussion

Chapter 4: Neural Correlates of Multisensory Facilitation and Competition
1. Introduction
2. Materials and methods
2.1 Participants
2.2 Stimuli
2.3 Procedure
2.4 MEG recording and analysis
2.4.1 MEG pre-processing
2.4.2 MEG data analysis
2.4.2.1 Event-related SAM analysis
2.4.2.2 PLS analysis
2.4.2.3 Statistical assessment
3. Results
3.1 Behavioural results
3.2 MEG results
3.2.1 Event-related field potentials (ERFs) and source activation maps
3.2.2 PLS results
3.2.2.1 Bimodal versus unimodal
3.2.2.2 Animacy effects
3.2.2.3 Multisensory facilitation versus competition
4. Discussion
4.1 Behaviour results
4.2 The distinct effects of multisensory facilitation and competition on neuromagnetic activity
4.3 Conclusion


Chapter 5: Age-related Modulations of Multisensory Spatiotemporal Patterns
1. Introduction
2. Materials and methods
2.1 Participants
2.2 Stimuli
2.3 Procedure
2.3.1 Experimental conditions
2.4 MEG recordings
2.5 MEG pre-processing
2.6 MEG data analysis
2.6.1 Event-related SAM analysis
2.6.2 PLS analysis
2.6.4 Statistical assessment
3. Results
3.1 Behavioural results: Accuracy and RT trends
3.1.1 Condition-specific analysis
3.1.2 Multisensory facilitation versus multisensory competition
3.2 MEG source activations
3.2.1 Detection condition
3.2.2 Animacy condition
3.2.3 Congruency condition
4. Discussion

Chapter 6: Age Sculpts Brain Signal Complexity in Human Auditory, Visual and Multimodal Systems
1. Introduction
2. Materials and methods
2.1 Stimuli
2.2 Procedure
2.3 Overview of analysis methodology
2.3.1 MEG source analysis
2.3.2 Multi-scale entropy and power spectral density (PSD)
2.3.3 PLS analysis
2.3.3.1 Behaviour PLS analysis
2.3.4 Statistical assessment
3. Results
3.1 Age and signal complexity
3.2 Condition and trial type differences
3.2.1 Auditory versus visual processing
3.2.2 Bimodal versus unimodal processing
3.2.3 Effects of crossmodal conflict
3.3 Relationship to behaviour
3.3.1 Response times
3.3.2 Response time variability
3.3.3 Performance accuracy
4. Discussion


Chapter 7: General Conclusion
1. Semantic multisensory integration
2. Semantic multisensory integration in aging
2.1 Brain signal complexity and age-related differences in multisensory integration
3. Evaluation of research findings
3.1 Generalizability
3.2 Ecological validity
3.3 Future directions

References
Tables
Figures
Appendices


List of Tables

Table 1: List of auditory and visual stimuli used in the experiment
Table 2: Exclusion criteria applied to subject selection
Table 3: Questionnaire results: Descriptive statistics
Table 4: Accuracy Values: Descriptive Statistics
Table 5: RT analysis: Descriptive statistics for age group, trial type, and condition
Table 6: Response Times in Condition 1: Unimodal Auditory, Visual, Bimodal Trial Types
Table 7: Response Times in Condition 2: Unimodal Auditory, Visual, Bimodal Trial Types
Table 8: Response Times in Condition 3: Unimodal Auditory, Visual, Bimodal Trial Types
Table 9: Response Times in Condition 4: Unimodal Auditory, Visual, Bimodal Trial Types
Table 10: Response Times for Condition 5: Unimodal Auditory, Visual, Bimodal Congruent and Incongruent Trial Types
Table 11: Multisensory facilitation and competition: Descriptive statistics for age group and condition type
Table 12: Regional Map Coordinates with Reference to Talairach-Tournoux
Table 13: Task Contrasts for Non-rotated PLS Analysis
Table 14: Descriptive statistics for accuracy, RTs and coefficient of variation of RTs in detection, animacy, and congruency conditions
Table 15: Descriptive statistics for the multisensory integration index (RT unimodal - RT bimodal) in detection, animacy, and congruency conditions
Table 16: Task Contrasts for Non-rotated PLS Analysis (MSE/Spectral Power)


List of Figures

Figure 1: Experimental design
Figure 2: Response Time Trends
Figure 3: RTs across Trial Types and Conditions
Figure 4: Multisensory Competition and Facilitation: Young, Middle, and Older Adults
Figure 5: Response Time Variability across Age Groups and Condition Types
Figure 6: Accuracy Values in Animacy and Congruency Conditions
Figure 7: RTs in Detection, Animacy and Congruency Conditions
Figure 8: Event-related field potentials in the detection condition
Figure 9: Group averages of event-related SAM images in the detection condition
Figure 10: Event-related field potentials in the animacy condition
Figure 11: Group averages of event-related SAM images in the animacy condition
Figure 12: Event-related field potentials in the congruency condition
Figure 13: Group averages of event-related SAM images in the congruency condition
Figure 14: Non-rotated PLS results: Bimodal versus unimodal trial type differences across all conditions
Figure 15: Non-rotated PLS results: Effects of animacy
Figure 16: Non-rotated PLS results: Effects of crossmodal conflict across all trial types
Figure 17: Congruency effects in AV+ and AV- bimodal trial types
Figure 18: Mean RTs in the detection condition in young and older groups
Figure 19: Mean RTs and performance accuracy values in the animacy condition in young and older groups
Figure 20: Mean RTs and performance accuracy values in the congruency condition in young and older groups
Figure 21: Multisensory competition and facilitation in young and older groups
Figure 22: Group averages of event-related SAM images in the detection condition (older group)

Figure 23: Group averages of event-related SAM images in the animacy condition (older group)
Figure 24: Group averages of event-related SAM images in the congruency condition (older group)
Figure 25: Group mean-centered PLS results in the detection condition across AV+, AV- versus A trial types (LV1)
Figure 26: Group mean-centered PLS results in the detection condition across AV+, AV- versus A trial types (LV2)
Figure 27: Group mean-centered PLS results in the detection condition across AV+, AV- versus V trial types (LV1)
Figure 28: Brain-behaviour correlations with RT measures in the detection condition: Bimodal and unimodal trial types in LV1
Figure 29: Group mean-centered PLS results in the animacy condition across AV+ versus A trial types (LV1)
Figure 30: Group mean-centered PLS results in the animacy condition across AV+ versus A trial types (LV2)
Figure 31: Group mean-centered PLS results in the animacy condition across AV+ versus V trial types (LV1)
Figure 32: Brain-behaviour correlations with RT measures in the animacy condition: Bimodal and unimodal trial types in LV1
Figure 33: Brain-behaviour correlations with performance accuracy measures in the animacy condition: Bimodal and unimodal trial types in LV1
Figure 34: Brain-behaviour correlations with performance accuracy measures in the animacy condition: Bimodal and unimodal trial types in LV2
Figure 35: Group mean-centered PLS results in the congruency condition across AV+, AV- versus A trial types (LV1)
Figure 36: Group mean-centered PLS results in the congruency condition across AV+, AV- versus A trial types (LV2)
Figure 37: Group mean-centered PLS results in the congruency condition across AV+, AV- versus V trial types (LV1)
Figure 38: Group mean-centered PLS results in the congruency condition across AV+ versus AV- trial types (LV2)
Figure 39: Group mean-centered PLS results in detection and congruency conditions (LV1)

Figure 40: Correlations between age, multi-scale entropy and power spectral density in a representative brain region
Figure 41: Non-rotated PLS results: Auditory versus visual processing in young and older adults
Figure 42: Non-rotated PLS results: Bimodal versus unimodal processing in young and older adults (MSE measure)
Figure 43: Non-rotated PLS results: Effect of congruency in young and older adults (MSE measure)
Figure 44: Brain-behaviour correlations between RT, cvRT (coefficient of variation of RT) and multi-scale entropy in the detection condition
Figure 45: Brain-behaviour correlations between accuracy and signal complexity in the detection condition
Figure 46: Brain-behaviour correlations between RT, cvRT (coefficient of variation of RT) and signal complexity in the congruency condition
Figure 47: Brain-behaviour correlations between accuracy and signal complexity in the congruency condition


List of Appendices

Appendix 1: Rotman Research Institute Screening Questionnaire
Appendix 2: Short Blessed Test
Appendix 3: Shipley Institute for Living Scale Test
Appendix 4: Folstein Mini-Mental Status Examination
Appendix 5: Subject-specific MEG recording summary sheet
Appendix 6: Procedure for MEG data pre-processing
Appendix 7: Subject-specific MEG event marker summary sheet
Appendix 8: Procedure for MEG data analysis
Appendix 9: Information and Consent Form for Behavioural Study
Appendix 10: Information and Consent Form for MEG Study
Appendix 11: MEG Study Recruitment Script
Appendix 12: MEG Description Script
Appendix 13: MEG Subject Instruction Sheet


Chapter 1
General Introduction

Previous research has solidified our understanding of the behavioural and neural changes that take place in aging. Episodic and working memory functions decline with increasing age, while crystallized knowledge and procedural skills remain well preserved up to the late 60s (Grady & Craik, 2000; Salthouse, 2007). Although they retain a more global awareness of the external environment, older adults also become increasingly vulnerable to distraction. This can be observed in settings ranging from basic perceptual speed tasks (Lustig, Hasher, & Tonev, 2006) to more complex controlled memory search tasks (Hartman & Hasher, 1991). On the other hand, older adults are also more efficient at binding multimodal information (Laurienti, Burdette, Maldjian, & Wallace, 2006).

The present thesis project focuses on skills that improve with age. Throughout the lifespan, sensory information arrives simultaneously through distinct sensory channels, most commonly auditory and visual ones. As such, multisensory features are linked by virtue of being attributed to a specific object. Continued exposure to such multisensory events sets up expectations about what a given object most likely "sounds" like, and vice versa. For example, a dog is expected to bark, not chirp, and an approaching car is unlikely to sound like a train engine. The binding of complex auditory and visual stimuli can be referred to as semantic audiovisual (AV) integration because it simulates naturalistic multisensory contexts. Semantic auditory stimuli include words and naturalistic animate or inanimate sounds; the corresponding semantic visual stimuli include articulatory gestures (i.e., facial expressions or lip movements) corresponding to the given words, and visual animate or inanimate objects. A recent behavioural study that used colour stimuli and spoken words naming those colours in a simple detection task found that older adults benefited more from semantically congruent AV stimuli than young adults (Laurienti et al., 2006). In contrast to young adults, older adults showed a larger gain in performance from bimodal, AV presentations compared to unimodal (auditory only or visual only) ones. While integration of semantically congruent AV stimuli speeds up object detection and recognition, multisensory competition impairs performance.

Multisensory competition refers to situations in which one sensory modality dominates at the expense of another when participants are required to process and identify a target embedded within a bimodal event. Previous behavioural studies demonstrated visual dominance over audition: ambiguous visual inputs impaired the recognition of complex sounds more than ambiguous auditory inputs impaired visual object recognition (Chen & Spence, 2010; Laurienti, Kraft, Maldjian, Burdette, & Wallace, 2004; Yuval-Greenberg & Deouell, 2009).

To date, most studies have focused on either multisensory facilitation or multisensory competition. Given the extant literature on the behavioural advantages of AV semantic integration and on the performance costs of crossmodal conflict, more research is required to determine whether a common network of brain regions is responsible for both multisensory facilitation and multisensory competition. Thus, in the present study, we used magnetoencephalography (MEG) recordings to investigate the neural correlates of both multisensory facilitation and competition following the presentation of complex sounds and semantically related black-and-white line drawings of animate and inanimate objects. Although critical advancements have been made in elucidating the contributions of sensory-specific and multimodal neural systems to multisensory integration, this process is not well understood in aging. The present thesis project therefore also examines how the neural processes supporting multisensory integration change with age.
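Behaviourally, the facilitation and competition effects examined throughout the thesis are summarized with a multisensory integration index, defined as unimodal RT minus bimodal RT (see Table 15). A minimal sketch of how such an index might be computed for a single participant is given below; the RT values are invented for illustration, and the use of the faster unimodal median as the unimodal reference is an assumption about one common convention, not necessarily the exact definition used in the later chapters.

```python
import numpy as np

# Hypothetical per-trial RTs (ms) for one participant; values are illustrative only.
rts = {
    "auditory": np.array([512.0, 498.0, 530.0, 505.0, 541.0]),
    "visual": np.array([478.0, 465.0, 490.0, 472.0, 485.0]),
    "bimodal": np.array([435.0, 442.0, 428.0, 450.0, 439.0]),
}

def multisensory_integration_index(rt_by_trial_type):
    """Unimodal RT minus bimodal RT (cf. Table 15): positive values indicate
    multisensory facilitation, negative values indicate competition.
    The faster of the two unimodal medians is used as the unimodal reference,
    which is an assumed convention, not a definition taken from the thesis."""
    unimodal = min(np.median(rt_by_trial_type["auditory"]),
                   np.median(rt_by_trial_type["visual"]))
    return unimodal - np.median(rt_by_trial_type["bimodal"])

print(f"Multisensory integration index: {multisensory_integration_index(rts):.1f} ms")
```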

1. Multisensory Facilitation

1.1 Overview

Perception by means of distinct sensory channels contributes to the richness and vividness of each sensory experience. While it seems natural to the perceiver that information from different sensory modalities is unified into a coherent whole, the central question of the mechanisms mediating this integration continues to puzzle researchers. Several factors, such as temporal correspondence and spatial congruence, seem to mediate multisensory integration in the auditory and visual modalities. Examples of such integration can be observed using simple stimuli, such as pure tones and monochromatic lights, in stimulus detection tasks.

Todd (1912) reported the first account of multisensory facilitation, that is, faster response latencies following bimodal (auditory and visual) and trimodal (auditory, visual, and somatosensory) presentations. The author applied a focused attention task in which participants were instructed to respond to monochromatic lights upon detection and to ignore all other stimuli, such as tones or tactile stimulation, which accompanied the visual target in a subset of the trials (Todd, 1912). Response times (RTs) to bimodal and trimodal stimuli were significantly faster than those to unimodal visual stimuli. Over the last 20 years of behavioural and neuroimaging research in this field, the phenomenon of multisensory facilitation, or the improved detectability of multimodal stimuli, has been replicated in many variations of the original procedure. Behavioural studies manipulated the degree of spatial correspondence between auditory and visual stimuli (McDonald et al., 2000) and the stimulus onset asynchronies (SOAs) between auditory, visual, and somatosensory targets (Diederich & Colonius, 2004), and demonstrated significantly faster and more accurate performance following multimodal presentations compared to unimodal ones.

A stimulus presented in one sensory modality can facilitate the detection of a spatially coincident stimulus occurring in a distinct sensory modality. McDonald and colleagues (2000) used an auditory crossmodal cueing paradigm to demonstrate that an irrelevant sound speeds the detection of a subsequent light when the light appears in the same location as the sound (McDonald, Teder-Salejarvi, & Hillyard, 2000). Similar crossmodal response facilitation effects were demonstrated between tactile and auditory modalities (Gillmeister & Eimer, 2007), and between olfactory and visual modalities (Gottfried & Dolan, 2003).

As observed originally by Todd (1912), RT facilitation following bimodal, audiovisual stimuli can be further enhanced through trimodal presentations in which auditory, visual, and somatosensory stimuli occur simultaneously. Diederich and Colonius (2004) studied behavioural responses to unimodal stimuli and to visual, auditory, and somatosensory stimuli presented simultaneously or at short intervals. The authors found that trimodal stimulus combinations yielded the largest multisensory facilitation effects. Specifically, trimodal RTs were shorter than RTs to any unimodal or bimodal combination, and the trimodal advantage increased monotonically as SOA values decreased. In other words, the trimodal RT advantage was largest when auditory, visual, and somatosensory stimuli were presented simultaneously. This multisensory facilitation effect was modeled on the assumption that it reflects the summation of signals from each sensory channel.

Computational models of the multisensory facilitation effect, or the improved detectability of multimodal stimuli, describe the mechanisms underlying the behavioural evidence pioneered by Todd (1912). The independent race model, for example, assumes that the shorter RTs during multimodal trials reflect the triggering of a motor response on the basis of the first detected stimulus (Miller & Ulrich, 2003; Raab, 1962). The independent race model explains the multisensory facilitation effect in terms of probability summation, or statistical facilitation: it predicts that if the RT distributions for the two single targets overlap, the mean RT for redundant-target trials will be faster than either of the two single-target means. On this account, responses to a visual stimulus appear facilitated because, on some trials, the co-occurring auditory stimulus is detected first and triggers the response (a simulation illustrating statistical facilitation and the race model bound is sketched at the end of this subsection).

The weakness of the independent race model is that it can only explain multisensory facilitation effects when bimodal or trimodal stimuli are temporally or spatially congruent, and there is substantial evidence that multisensory facilitation occurs over both temporal and spatial gaps (cf., Diederich & Colonius, 2004; McDonald et al., 2000). Furthermore, in a series of experiments employing simple auditory and visual stimuli, Miller (1982) found that the presence of distractors in distinct sensory modalities increases target RTs as long as the two sets of stimuli are not spatially coincident. In other words, when crossmodal, non-target stimuli are presented in locations distinct from the target stimulus, they fail to facilitate the detection of the target. Indeed, situations of divided attention violate the independent race model because non-target stimuli presented in distinct modalities capture attention away from the target modality, thereby increasing RTs. Miller (1982) posited that information from multiple sensory systems needs to be integrated prior to the final decision stage, suggesting that stimuli from distinct sensory modalities do not race with each other but are in fact pooled across channels into a single, integrated value. As such, multisensory co-activation models assume that there is some form of activation-strength summation, with information being pooled across sensory channels prior to the execution of the motor response. The co-activation model, however, emphasizes integration at higher levels without any intermediate crossmodal interactions. A third model, the interactive race model, combines the race and co-activation models, allowing for both low-level crossmodal interactions and higher level integration (Eriksen, Goettl, St James, & Fournier, 1989; Eriksen, Webb, & Fournier, 1990; Fournier & Eriksen, 1990; Mordkoff & Yantis, 1991).

The interactive race model can also account for the slowed RTs in cases of divided attention, but it holds that the presence of a stimulus in one modality influences the processing of the second stimulus in the other modality at lower, sensory-specific levels. In other words, low-level auditory processing can influence visual processing, and vice versa; however, audiovisual activation strength is still summed across sensory channels prior to the execution of a motor response.

Neurophysiological examinations of the multisensory facilitation effect have used the interactive race model to describe the neural mechanisms underlying the performance benefits of multisensory presentations, that is, the faster RTs and improved accuracy during bimodal (auditory and visual) compared to unimodal (auditory only or visual only) trial types. There is evidence from electrophysiological recordings and neuroimaging studies that multisensory interactions occur in both sensory-specific and multimodal brain regions (cf., Meredith & Stein, 1986; Molholm et al., 2002; Molholm et al., 2006).
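To make the contrast between statistical facilitation and co-activation concrete, the short simulation below (a sketch in Python with invented RT parameters, not data from any study cited here) draws hypothetical unimodal RT distributions, treats each bimodal RT as the faster of the two channel finishing times, as the independent race model assumes, and evaluates Miller's (1982) race model inequality, the upper bound that any pure race account places on the bimodal RT distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials = 100_000

# Hypothetical unimodal RT distributions (ms); parameters are illustrative only.
rt_auditory = rng.normal(loc=350.0, scale=60.0, size=n_trials)
rt_visual = rng.normal(loc=330.0, scale=60.0, size=n_trials)

# Independent race model: on a redundant (bimodal) trial, the response is
# triggered by whichever channel finishes first, so the bimodal RT is the
# minimum of the two unimodal finishing times (statistical facilitation).
rt_bimodal = np.minimum(rt_auditory, rt_visual)

print(f"Mean RT, auditory alone: {rt_auditory.mean():.1f} ms")
print(f"Mean RT, visual alone:   {rt_visual.mean():.1f} ms")
print(f"Mean RT, bimodal (race): {rt_bimodal.mean():.1f} ms")

def empirical_cdf(samples, t_grid):
    """Proportion of RTs at or below each latency in t_grid."""
    return (samples[:, None] <= t_grid).mean(axis=0)

# Miller's (1982) race model inequality: for every latency t,
#   P(RT_bimodal <= t) <= P(RT_auditory <= t) + P(RT_visual <= t).
# Bimodal RT distributions that exceed this bound cannot be explained by
# statistical facilitation alone and instead suggest co-activation.
t_grid = np.linspace(150.0, 550.0, 81)
race_bound = np.minimum(empirical_cdf(rt_auditory, t_grid)
                        + empirical_cdf(rt_visual, t_grid), 1.0)
violations = empirical_cdf(rt_bimodal, t_grid) > race_bound + 1e-9
print("Race model inequality violated anywhere:", bool(violations.any()))
```

Because these bimodal RTs are generated by an independent race, the bound is never exceeded; observed bimodal RT distributions that do exceed it at some latency are the standard evidence for co-activation.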

1.2 Electrophysiological recordings of basic multisensory facilitation

The mechanisms of multisensory integration have been investigated at the neural level using single-cell recordings in animals, and also using electrophysiological recordings in human subjects undergoing brain surgery. Pioneering work by Meredith and Stein (1986) demonstrated that neurons in the superior colliculus (SC) respond to stimuli of different sensory modalities: visual, auditory, and somatosensory (Meredith & Stein, 1986). In general, the SC plays a key role in orienting behaviours, whether overt orienting (moving the eyes to best capture a target stimulus) or covert orienting (the allocation of spatial attention to a region of interest). Meredith and Stein (1992) showed that the receptive fields of SC cells were arranged to provide a functional map of external space, and were particularly sensitive to spatially congruent stimuli. Cells in the deep layers of the SC were found to be bimodal, responding primarily to audio-visual and visual-somatosensory stimuli. Indeed, receptive fields in the SC overlap, exhibiting a multiplicative response enhancement to multimodal stimuli from the same region of space (Meredith, Wallace, & Stein, 1992).
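In this single-unit literature, such response enhancement is typically quantified by comparing the combined-modality response with the most effective single-modality response. The snippet below sketches that enhancement index in Python; the firing rates are invented for illustration and are not taken from the studies cited.

```python
def multisensory_enhancement(combined_response, unimodal_responses):
    """Percent change of the combined-modality response relative to the most
    effective single-modality response: 100 * (CM - SMmax) / SMmax.
    Positive values indicate response enhancement; negative values indicate
    response depression. This is a common index in the single-unit literature
    (cf. Meredith & Stein, 1986), not a quantity computed in this thesis."""
    sm_max = max(unimodal_responses)
    return 100.0 * (combined_response - sm_max) / sm_max

# Illustrative mean firing rates (spikes/s) for a hypothetical SC neuron.
visual_alone = 8.0
auditory_alone = 5.0
audiovisual = 22.0

print(f"Enhancement: {multisensory_enhancement(audiovisual, [visual_alone, auditory_alone]):.0f}%")
```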

Multiple-cell recordings in human participants suggested that regions in the posterior parietal cortex, such as the superior parietal lobule (SPL), respond preferentially to spatially congruent bimodal stimuli, as neurons in these regions also contain multimodal (auditory, visual, and tactile) receptive fields (Molholm et al., 2006). The influence of temporal and spatial congruence on multisensory integration can thus be explained by the receptive field properties of multisensory neurons (cf., Meredith & Stein, 1986; Meredith et al., 1992; Molholm et al., 2006) in the SC or the SPL, respectively. In other words, multimodal presentations increase the firing rate of SC or SPL neurons because both stimuli, by virtue of their temporal or spatial coincidence, fall within the overlapping receptive fields of multimodal neurons. Mechanisms of audiovisual integration that do not rely on such low-level properties of crossmodal stimuli, such as spatial or temporal correspondence, but instead depend on the degree of semantic relatedness between crossmodal stimuli, operate at a higher level of multisensory integration. As such, those processes are more difficult to accommodate within similar theoretical and modeling frameworks.

1.3 Neuroimaging studies of basic multisensory facilitation

Non-invasive electroencephalography (EEG) studies in humans examined the time course of concurrent multimodal presentations using temporally and spatially coincident abstract auditory and visual stimuli (Teder-Salejarvi, Di Russo, McDonald, & Hillyard, 2005; Teder-Salejarvi, McDonald, Di Russo, & Hillyard, 2002). Furthermore, the studies presented below were performed on healthy, young adults. Early modulations of event-related potentials (ERPs) were observed at sensory-specific temporal and occipital electrodes, and late amplitude deflections were also captured at temporal and parietal electrode sites (Teder-Salejarvi et al., 2002, 2005). Studies from Molholm and colleagues (Molholm et al., 2002) showed early auditory-visual interactions at parieto-occipital electrodes using simple stimulus detection tasks with bimodal and unimodal auditory and visual stimuli.

Functional magnetic resonance imaging (fMRI) studies using spatially and temporally congruent AV presentations showed that several regions in the posterior parietal cortex responded preferentially to bimodal compared to unimodal stimuli (for a review, see Calvert et al., 2001). Neuroimaging studies of multisensory integration suggest that it is not simply a linear, supra-additive process. In addition to eliciting increased sensory-specific cortical activity, bimodal stimuli also activated multimodal brain regions, including the intraparietal sulcus (IPS), the inferior parietal lobule (IPL, BA 40), and the SPL (BA 39), during simple stimulus detection tasks (Baumann & Greenlee, 2007; Bishop & Miller, 2008; Calvert, Hansen, Iversen, & Brammer, 2001; Grefkes, Weiss, Zilles, & Fink, 2002; Macaluso, George, Dolan, Spence, & Driver, 2004; Moran, Molholm, Reilly, & Foxe, 2008).

In the present thesis project, we propose that, rather than merely varying the degree of spatial and temporal audiovisual correspondence, multisensory facilitation can also be examined using bimodal stimuli that share features in common. Such mechanisms have frequently been called "semantic" (cf., Calvert et al., 2001; Laurienti et al., 2003), and they provide improved ecological validity because they simulate naturalistic contexts of multisensory interactions.

2. Audiovisual Semantic Congruence: A Special Case of Multisensory Facilitation

Typical multisensory events within a semantic context are those in which a complex visual stimulus is paired with its auditory counterpart, for instance a picture of a dog paired with a corresponding barking sound. Studies that examine multisensory facilitation based on the congruency of complex sounds and visual features measure multisensory associations previously established in the course of an individual's learning history. Thus, an individual's familiarity with and previous exposure to a particular stimulus category (e.g., animals, musical instruments, household objects) is assumed by the experimenter and not directly manipulated. In many cases, these associations can be regarded as arbitrary, as they emerge from continued exposure to semantic information conveyed through distinct sensory channels. This is in contrast to features that are intrinsically supramodal, such as rhythm and motion, which could in principle be perceived in more than one sensory modality (see Bahrick & Lickliter, 2004, for a developmental account of the emergence of supramodal properties). Once arbitrarily related stimuli across sensory modalities become associated, they contribute to a holistic representation of a particular object. This sets up expectations about what a given object most likely "sounds" like, and vice versa. Such multisensory congruence facilitates object detection and recognition (Laurienti et al., 2004).

Behavioural (e.g., Laurienti et al., 2004) and neuroimaging studies (e.g., Hein et al., 2007; van Atteveldt, Formisano, Goebel, & Blomert, 2004) sought to capture the effects of AV congruence on object detection and recognition by contrasting "congruent" and "incongruent" experimental conditions (for a review, see Doehrmann & Naumer, 2008). The effects of AV congruence on multisensory processes can be measured using various experimental paradigms. For instance, congruent and incongruent pairings can be generated using various types of semantic content, such as visual articulatory gestures and voice stimuli (e.g., Dolan, Morris, & de Gelder, 2001) or complex images and sounds of common objects (e.g., Noppeney, Josephs, Hocking, Price, & Friston, 2008; Noppeney, Ostwald, & Werner, 2010). Behavioural studies examined the impact of semantic congruency by contrasting matching (congruent) and non-matching (incongruent) AV presentations with unimodal ones. While a large number of behavioural studies examined the influence of spatial and temporal factors on multisensory integration (cf., Fournier & Eriksen, 1990; Mordkoff & Miller, 1993), the impact of semantics on AV interactions has been explored in only a limited number of studies.

Laurienti and colleagues (2004) developed a behavioural paradigm that examined the effects of crossmodal audiovisual pairings and intramodal visual pairings on stimulus detection speed (Laurienti et al., 2004). In crossmodal AV pairings, blue coloured circles were presented along with the spoken word "blue"; in intramodal (visual-visual) trials, the written word "blue" was presented on top of the blue circle. Significantly faster RTs were observed specifically for semantically congruent AV pairs, but not for congruent, visual intramodal pairs. In contrast to the multisensory facilitation effects of congruent AV pairs, significantly longer RTs were observed in response to incongruent AV combinations. The study highlights the impact of crossmodal, rather than intramodal, semantic congruency on response latencies.

Congruency effects on the perception of size were demonstrated using simple AV stimuli that were paired according to size. Gallace and Spence (2006) demonstrated that the degree of congruency between the size of visual disks and the frequency of concurrent tones can have a significant impact on size judgments of the disk stimuli, even when participants were explicitly instructed to ignore the tones and respond to the visual stimuli only (Gallace & Spence, 2006).

Recently, Chen and Spence (2010) assessed the effects of audiovisual semantic congruency on participants' ability to identify briefly presented and then rapidly masked pictures.

A semantically congruent sound was presented concurrently with the visual targets in a percentage of the trials, at an SOA that varied between 0 and 533 ms (auditory lagging). The results suggested that, when the pictures and the sounds were presented simultaneously, a semantically congruent sound improved, while a semantically incongruent sound impaired, participants' ability to identify the pictures. Visual stimulus identification was unaffected in the white-noise control condition. Significant multisensory facilitation effects were still observed at an SOA of 300 ms, whereas no semantic congruency effects were detected at the longest interval of 533 ms. These results suggest that AV integration of semantically related stimuli is not constrained by strict temporal coincidence of the constituent auditory and visual stimuli at short time lags (≤ 300 ms). Thus, AV integration might occur in a short-term, working memory buffer, which also accesses the semantic representations of visual and auditory objects.

Expanding on the relationship between AV integration and long-term semantic representations, Murray et al. (2004) showed that recognition performance for repeated presentations of line drawings representing common objects was significantly improved when the line drawings were paired with task-irrelevant, but semantically congruent, sounds (Murray et al., 2004). Most notably, subsequent memory performance was significantly improved by prior exposure to semantically congruent, and not incongruent, AV presentations. Thus, presentations of semantically congruent AV stimuli do not only affect immediate performance, but can also influence subsequent memory for the AV objects used.

Sensory-specific auditory and visual influences on multisensory integration also depend on the extent to which sounds and pictures capture attention. For example, psychophysical experiments by Kubovy and van Valkenburg (2001) suggested that the human auditory system is in the service of the visual-motor system, because it facilitates visual spatial attention (Kubovy & Van Valkenburg, 2001). Thus, the extent of AV integration might be modulated by the attentional capture of one of the two sensory modalities that form the coherent AV stimulus. Sensory-specific influences on AV integration were demonstrated in a name verification task using complex, naturalistic objects (Yuval-Greenberg & Deouell, 2009). Stronger effects of AV congruency on performance were observed when participants were required to attend to the auditory modality: irrelevant visual information affected auditory recognition more than irrelevant auditory information affected visual recognition.

These results suggest a more pronounced influence of concurrent visual stimulation on auditory perception than vice versa. This perceptual effect might occur because the visual modality provides more reliable and unambiguous information for object recognition. This interpretation is consistent with documented situations of visual dominance over auditory processing, as in the Colavita effect (Colavita, 1974). The Colavita effect refers to the increased incidence of misses of auditory and tactile targets embedded within bimodal events, reflecting dominance of the visual modality when participants are required to detect an auditory or tactile target within a bimodal presentation.

2.1 Multisensory competition

While integration of semantically congruent AV stimuli speeds up object detection and recognition, crossmodal conflicts impair performance. Several behavioural studies demonstrated the presence of visual dominance during crossmodal conflicts, with ambiguous visual inputs impairing the recognition of complex sounds more than ambiguous auditory information impaired visual object recognition (Chen & Spence, 2010; Yuval-Greenberg & Deouell, 2009).

In an example of AV presentations leading to visual interference, Colavita and colleagues (Colavita, 1974; Colavita & Weisberg, 1979) presented sequences of intermixed light flashes and simple tones, and instructed participants to respond as rapidly as possible to auditory targets with one response key and to visual targets with another. Bimodal trials, consisting of the simultaneous presentation of visual flashes and auditory beeps, were presented occasionally (i.e., in 15% of the trials). Surprisingly, despite equally fast and accurate responses to unimodal visual and auditory presentations, participants almost exclusively pressed the visual key on the rare bimodal trials. This tendency to respond as if only the visual stimulus had occurred on bimodal trials suggests that the visual modality was more salient than the auditory one, leading to interference from the visual modality on bimodal trials. Following these earlier experiments on the Colavita effect, several recent studies from Spence and colleagues (Hartcher-O'Brien, Gallace, Krings, Koppen, & Spence, 2008; Koppen, Alsius, & Spence, 2008; Koppen & Spence, 2007a, 2007b; Sinnett, Soto-Faraco, & Spence, 2008; Sinnett, Spence, & Soto-Faraco, 2007) found an overall slowing of RTs on bimodal trials when participants were required to detect occasional auditory targets embedded in bimodal presentations. The slowing of RTs under these conditions reflects the interference of the more dominant visual modality with the processing of auditory targets.

Additionally, Sinnett et al. (2008) reported another significant aspect of multisensory competition when employing familiar, naturalistic sounds and line drawings. When participants were required to identify auditory stimuli that were either embedded in bimodal, audiovisual pairs or presented in isolation, they were faster and more accurate in responding to unimodal auditory trials than to bimodal ones, suggesting that the line drawings masked the detection of the complex sounds. Conversely, when required to respond to visual targets only, participants were significantly faster and more accurate in responding to bimodal trials than to unimodal visual ones. The former behavioural effect reflects visual dominance over audition, as reported previously in studies that employed simpler AV stimuli. The latter behavioural result, on the other hand, reflects the classical multisensory facilitation effect following temporally coincident bimodal presentations. Thus, in the latter case, the auditory stimulus did not interfere with the processing of the visual target; on the contrary, its presence sped up the detection of the visual target.

It should also be noted that simple detection tasks such as those used by Sinnett and colleagues or Laurienti et al. (2004) might benefit less from AV semantic congruency than target identification or categorization tasks. For example, the categorization of a complex auditory or visual stimulus into animate or inanimate classes might require a higher level of object processing. Object categorization may therefore be enhanced by congruent semantic input provided by a second sensory modality. In contrast, the detection of a given target can also be achieved using stimuli that lack semantic content.

Even though vision can bias auditory object perception, vision does not always dominate audition. The perceived time of occurrence of a flash of light can be influenced by the presentation of an asynchronous simple beep sound, a phenomenon termed "temporal ventriloquism" (Bertelson & Aschersleben, 2003; Recanzone, 2003): the asynchronous beep biases the perceived onset and duration of the light stimulus. Another example of audition dominating vision is the sound-induced illusory flash effect, which reflects the compulsory integration of visual and auditory information when multisensory events are presented in rapid succession. When a single flash of light is accompanied by more than one beep, it appears to flash twice, although the extra flash that is perceived is illusory (Shams et al., 2000).

In summary, a number of behavioural studies demonstrated significant influences of semantically related AV stimuli on perception. While congruent, bimodal auditory and visual information facilitates object detection and recognition, AV incongruence impairs performance, as in cases of crossmodal conflict, or even distorts perception, as in temporal ventriloquism. Given the beneficial and detrimental effects of binding auditory and visual information into a single object representation, more research is required to understand the neural processes supporting both kinds of multisensory interactions.

2.2 Electrophysiological studies of multisensory interactions

A growing body of electrophysiological studies extended the study of multisensory facilitation to the domain of semantically related AV stimuli. Several studies found both early and late modulations of ERPs associated with concurrent presentations of semantic AV objects (Giard & Peronnet, 1999; Molholm, Ritter, Javitt, & Foxe, 2004; Yuval-Greenberg & Deouell, 2007).

The first study to demonstrate both early and late multisensory interactions across semantically related AV stimuli was performed by Giard and Peronnet (1999). The group used circles and pure tones to examine whether, under the same object manipulations (i.e., the pitch of the pure tones was incrementally increased and the diameter of the circle was enlarged to become more ellipsoid), two stimuli from distinct modalities could be linked into bimodal pairs. The participants' task was to identify which category of unimodal or bimodal object was presented by pressing one of two keys. The two objects were defined either by auditory or visual features alone, or by the combination of their respective unimodal features. Participants were significantly faster when the AV stimuli shared congruent features; they were slowest to respond to bimodal, incongruent pairs. Giard and Peronnet (1999) also detected significant crossmodal interactions at early stages of sensory processing, at the visual N1 and auditory N1 components across modality-specific electrode sites, and at later processing stages, between 140 and 165 ms, across right fronto-temporal electrode sites. Furthermore, at even later latencies, multimodal processes were captured by positive P3 waves peaking at parietal electrodes.

Using an object detection task with natural bimodal and unimodal auditory and visual objects, Molholm and colleagues (2004) showed that significant modulations of the visual N1 component accompanied bimodal presentations of familiar objects.

Congruency effects were observed for target stimuli only, between 150 and 280 ms, an effect that was localized to the right occipitotemporal cortex. However, because the degree of congruence between the auditory and visual features of the bimodal objects had no effect on non-target stimuli, the findings from Molholm et al. might not reflect multisensory facilitation following semantically related AV presentations. The amplitude modulations of the visual N1 component were likely influenced by attentional processes, as participants were required to keep track of the target stimuli.

In contrast to the study by Molholm et al. (2004), Yuval-Greenberg and Deouell (2007) examined oscillatory gamma-band activity (30-70 Hz) associated with AV integration using a name verification task with naturalistic objects (Yuval-Greenberg & Deouell, 2007). Early and late gamma-band activity, at 90 ms and 260 ms respectively, was associated with low-level feature integration and higher level object representation.

There is increasing evidence that meaning and semantic relatedness play an important role in multisensory associations. For instance, Senkowski et al. (2007) compared the effects of natural and abstract stimuli reflecting motion. Participants were presented with a random stream of naturalistic and abstract video clips, and static target stimuli. Each stimulus class consisted of unimodal auditory, unimodal visual, and bimodal presentations. The participants' task was to count the target stimuli embedded within the natural and abstract stimulus classes. Only the former class showed early multisensory integration effects, after approximately 120 ms. Naturalistic motion represented by AV presentations was associated with increased amplitude modulations over occipital, temporal, and frontal regions. In contrast, multimodal effects of abstract motion stimuli were observed 210 ms after stimulus onset across inferior parietal electrode sites (Senkowski, Saint-Amour, Kelly, & Foxe, 2007). These findings suggest that natural motion is processed as early as ~100 ms after stimulus onset. Furthermore, the study also showed that natural motion presented via congruent AV stimuli modulated cortical activity across various primary and secondary sensory areas, including temporal and occipital cortices. Abstract motion, on the other hand, was associated with late amplitude deflections (~210 ms) and recruited posterior parietal areas that are sensitive to multimodal presentations of any combination of auditory, visual, or tactile stimuli. The findings suggest that familiar AV content is processed at earlier stages of stimulus processing than novel, unfamiliar AV information.

With respect to the incongruence of multisensory stimuli (i.e., auditory and visual pairs that were semantically unmatched), more negative amplitude deflections were reported 390 ms after stimulus onset (Molholm et al., 2004). This ERP pattern resembled the N400 component, which was previously associated with semantic mismatch processing of objects in linguistic contexts (Kutas & Federmeier, 2000). A likely source of the N400 component was localized to the temporal cortex, although there is evidence from fMRI studies that inferior frontal regions also contribute to this effect (Van Petten & Luka, 2006). The N400 component has also been reported in studies examining Stroop-like interference processes with both intramodal and crossmodal stimulus combinations, yielding stronger amplitude modulations for incongruent compared to congruent conditions in the anterior cingulate cortex (ACC) and, at later latencies, in the inferior frontal cortex. This suggests that cingulate and inferior frontal activation patterns reflect conflict monitoring mechanisms that are not necessarily specific to semantically related AV presentations.

To summarize, previous ERP studies demonstrated that multisensory binding is more than just a supra-additive process. Multimodal posterior parietal areas respond preferentially to bimodal presentations, while temporal and frontal regions are sensitive to the information conveyed across auditory and visual sensory channels. However, the involvement of specific networks of brain regions during the processing of semantically related AV content needs to be examined with neuroimaging techniques that provide superior spatial resolution, such as fMRI or magnetoencephalography (MEG) recordings.

2.3 Functional neuroimaging studies of multisensory interactions

Functional neuroimaging studies investigated the neural correlates of multisensory integration by comparing unimodal presentations with bimodal ones. One of the first fMRI studies to investigate the impact of semantic congruence between sounds and pictures on multisensory integration comes from Laurienti et al. (2003). Participants were presented with semantically congruent and incongruent AV stimuli of common, inanimate objects. Although visual stimuli were paired with semantically matching or non-matching tones, participants were required to respond to the visual stimuli only.

Congruent, compared to incongruent, bimodal presentations activated the ACC and adjacent medial prefrontal cortices. Furthermore, these activation patterns were also modulated by task difficulty. These results, however, must be interpreted cautiously. First, because of the paradigm used, attention was directed to the visual stimuli and could only affect the degree of integration with the semantically related complex sounds, which had to be ignored; therefore, the effects of multisensory integration cannot be disentangled from those of attention. Second, the authors did not employ any unimodal control conditions, which would have enabled the assessment of multisensory, compared to unimodal, facilitation effects on performance.

To overcome the limitation of not employing adequate control conditions, other fMRI studies contrasted bimodal, AV presentations with unimodal ones (Beauchamp, Lee, Argall, & Martin, 2004). Beauchamp et al. (2004) conducted a series of experiments using AV stimuli taken from two categories: animate (animals) and inanimate (tools) objects. Black-and-white photographs of tools (man-made manipulable objects) and animals were presented alongside sounds produced by animals and tool manipulations. Participants made a two-alternative forced choice between congruent and incongruent presentations. The posterior superior temporal sulcus (pSTS) and the posterior middle temporal gyrus (pMTG) showed increased blood oxygen level dependent (BOLD) activity when auditory and visual object features were presented together relative to unimodal stimuli presented alone. In terms of semantic relatedness, Beauchamp and colleagues observed a trend towards enhanced activation in the STS and MTG regions in response to congruent compared to incongruent AV stimuli. Thus, the superior and middle temporal cortices may be sensitive to the degree of semantic relatedness between auditory and visual stimuli.

Furthermore, van Atteveldt et al. (2010) used an fMRI adaptation paradigm in which AV stimuli (letter-speech sound pairs) were presented repeatedly (van Atteveldt, Blau, Blomert, & Goebel, 2010). The authors manipulated the degree of congruency between the auditory and visual components, and showed that occipital and temporal regions adapted independently of the degree of congruency between bimodal presentations. Clusters in the superior temporal gyrus, however, adapted more strongly to congruent compared to incongruent AV stimuli, indicating increased sensitivity to the degree of semantic relatedness between auditory and visual features.

Manipulating the semantic congruency of AV stimuli, Belardinelli et al. (2004) demonstrated significant effects of semantically congruent compared to incongruent AV stimuli in sensory association cortices, including the left ventral and medial temporal cortex as well as the bilateral lingual gyrus. The left inferior frontal gyrus, on the other hand, responded more strongly to incongruent AV combinations (Belardinelli et al., 2004).

In spite of previous research on the effects of semantic congruence, there is not yet a consensus regarding the network of brain regions that is sensitive to AV semantic incongruence. Recently, Taylor et al. (2006) demonstrated that the parahippocampal gyrus in the medial temporal lobes responded preferentially to incongruent, relative to congruent, AV combinations (Taylor, Moss, Stamatakis, & Tyler, 2006). Consistent with previous reports from Laurienti et al. (2004), Taylor et al. (2006) also found increased activity in the medial frontal cortex (MFC) and the ACC in response to incongruent compared to unimodal stimuli. Both cortical structures have also been implicated in the detection of conflicts between simultaneously active, competing representations of stimuli both within and between sensory modalities (cf., Carter & van Veen, 2007; Jessup, Busemeyer, & Brown, 2010; Nakao et al., 2010). Others (e.g., Naghavi, Eriksson, Larsson, & Nyberg, 2007), however, failed to show effects in frontocingulate regions and reported only right claustrum and insula activations in response to semantic conflicts, suggesting that the detection of semantic incongruence is a complex process that may strongly depend on the nature of the tasks employed. These results further support the need for more studies examining the impact of AV incongruence on multisensory integration.

Using a different approach to the problem of semantic AV binding, Noppeney and colleagues (Noppeney et al., 2008) used priming paradigms to determine the sensitivity of various cortical regions to AV incongruence. The authors presented participants with pictures of animate (animals) and inanimate (man-made) objects along with matching and non-matching natural sounds or spoken words. Consistent with previous fMRI studies, Noppeney et al. (2008) demonstrated that several regions in the temporal, parietal, and frontal lobes showed incongruency effects; however, these regions did not respond equally to the different types of stimuli. Only the medial and inferior frontal cortices showed incongruence effects for both spoken words and natural sounds when paired with the animate or inanimate pictures. These results are consistent with previous reports of conflict monitoring (cf., Van Petten & Luka, 2006).

Stronger incongruence effects for words were observed in the posterior STS and MTG, and, conversely, larger effects for non-linguistic sounds were captured in the IPL, extending into the IPS. In a subsequent effective connectivity analysis that examined causal interactions among functionally connected brain regions, Noppeney et al. (2010) suggested that multisensory incongruence was mediated by bottom-up processes in the auditory cortex, including primary and secondary auditory areas (Noppeney et al., 2010). Participants performed a visual selective attention paradigm in which they were presented with AV movies of hand actions from two categories: tools and musical instruments. In a two-alternative forced-choice task, participants categorized the videos as tools or musical instruments while ignoring the concurrent congruent or incongruent tones. Both the auditory and the visual information could be intact (reliable) or degraded (unreliable). Behaviourally, semantic incongruence effects increased as the visual input became more unreliable and the auditory input increasingly determined AV object categorization. The left inferior frontal sulcus (IFS) was the only region that showed an interaction between AV incongruence and auditory and visual reliability, exhibiting increased activity with increases in the variance of the task-relevant visual input. Surprisingly, the ACC and MFC showed very small task-related effects. The right fusiform gyrus was the only region to display increased activation for incongruent compared to congruent AV stimuli, an activation pattern that was modulated only to a very limited degree by auditory and visual reliabilities.

Recently, Hein et al. (2007) examined whether inferior frontal sensitivity to incongruent AV material depends on the observer's familiarity with the bimodal stimuli (Hein et al., 2007). Participants were presented with familiar, semantically related AV stimuli of animals, and with artificial, unfamiliar AV stimuli. By contrasting BOLD activity related to unimodal stimulation with activity associated with AV presentations, the authors showed that the left inferior frontal cortex (IFC) responded more strongly to incongruent familiar stimuli, whereas unfamiliar AV stimuli activated the right IFC only. Conversely, activity in the posterior STS and superior temporal cortex was specifically associated with congruent familiar stimuli. The authors concluded that the left IFC monitors the degree of mismatch between over-learned (familiar) stimuli presented through distinct sensory channels. The function of the right IFC, on the other hand, is to compare ambiguous auditory and visual inputs to determine their degree of semantic relatedness in the absence of prior semantic knowledge of these objects.

18 The activation patterns reported in the aforementioned fMRI studies depended greatly on the paradigm and types of bimodal stimuli utilized. Participants performed diverse behavioural tasks ranging from simple fixation (e.g., Hein et al., 2007; van Atteveldt et al., 2004) to explicit semantic decisions on congruent and incongruent AV stimuli (e.g., Noppeney et al., 2008; Noppeney et al., 2010; Taylor et al., 2006). While the former may be suitable for determining the network of regions sensitive to AV stimulation, the latter paradigms operate on a higher cognitive level involving decisions about semantic relatedness. The fMRI studies described thus far used semantically-related AV stimuli to identify a set of superior temporal and inferior frontal cortical regions that were significantly modulated by the degree of semantic congruence between auditory and visual targets embedded within multisensory presentations. Moreover, a subset of studies also suggested a further differentiation in inferior frontal regions with increased preference for incongruent compared to congruent AV stimuli.

2.4 Audiovisual speech: A special case of multimodal congruence Audiovisual speech is a special class of familiar, bimodal stimuli that can be regarded as a specialized system, because it transmits meaning in a supramodal fashion, independent of the sensory modality used for communication. Sumby and Pollack (1954) pioneered the field of audiovisual speech showing evidence of multisensory facilitation of speech comprehension under adverse conditions (Sumby & Pollack, 1954). Participants' ability to perceive speech in a noisy environment was primarily influenced by the perceived articulatory gestures (i.e. facial expressions or lip movements). A subcategory of feature congruency paradigms includes those which manipulate the correspondence between phonemes and articulatory gestures. The most documented is the McGurk effect (McGurk & MacDonald, 1976) in which seeing incongruent or temporally asynchronous acoustic and articulatory events can modify the percept phonetically (McGurk & MacDonald, 1976). For example, the McGurk effect can be observed following the presentation of video clips, which induce illusory percepts of hearing the syllable /da/ although participants actually see lips mouthing /ba/. In an fMRI study of the McGurk effect, Jones and Callan (2003) showed that illusory perceptions pertaining to incongruent acoustic and articulatory events

19 activated posterior parietal, inferior frontal and superior temporal regions. Several MEG studies (e.g., Kaiser, Hertrich, Ackermann, Mathiak, & Lutzenberger, 2005; Nishitani & Hari, 2002) demonstrated effects in the oscillatory gamma-band activity (~30 Hz and up) induced by McGurk-like stimuli beginning in the posterior parietal cortex at 160ms, and extending over occipital, temporal to inferior frontal regions between 200 and 320ms. Neuroimaging studies showed that silent lip reading activated primary auditory regions (Calvert et al., 1997; Molholm & Foxe, 2005). Moreover, synchronous meaningful AV speech increased blood flow in the posterior left STS in comparison to single presentations of either articulatory gestures or acoustic speech (Allison, Puce, & McCarthy, 2000; Puce, Allison, Bentin, Gore, & McCarthy, 1998), suggesting that the posterior STS is also involved in crossmodal binding between auditory and visual components of speech. As outlined in the previous section, recent neuroimaging studies demonstrated that semantically related, multisensory stimuli activated primary temporal and frontal areas in the brain. Cingulate, middle and superior temporal regions seemed to respond more strongly to congruent features of familiar AV objects. In contrast, left inferior frontal cortices participated more in the processing of incongruent AV stimuli potentially reflecting the detection of crossmodal conflict and the increased cognitive control demands when comparing complex sounds to visual targets and vice versa. Finally, right inferior frontal cortices processed unfamiliar AV stimuli. This region may be involved in comparing ambiguous auditory to visual information to determine degree of mismatch between the two sensory modalities. Given the behavioural advantages of multisensory semantic integration and the performance decreases during crossmodal conflicts, several critical questions can be posed: (1) is there a common network of brain regions responsible for both multisensory facilitation and competition processes, and (2) to what extent are multisensory facilitation and competition effects modulated by the degree of correspondence between semantic visual and auditory targets? In the present study, we investigated the effects of both multisensory facilitation and competition in sensoryspecific and multimodal brain areas during semantic, multisensory presentations using MEG recordings. We were particularly interested in contrasting multisensory processes during conditions of semantic congruence and performance facilitation to those containing crossmodal conflicts.

20 Although significant strides have been made to determine the performance advantages and neural processes underlying multisensory integration, few studies have examined changes in multisensory processing with age. To date, most studies examined multisensory processes in healthy, young adults. Given the age-related changes in sensory acuity (Cerella, 1985; Schneider et al., 1998; Alain, Dyson, & Snyder, 2006), we predict that multisensory facilitation and competition processes change with age. Recent behavioural examinations of age-related changes in multisensory integration suggested that older adults benefited more from semantically congruent AV stimuli than younger adults (Laurienti et al., 2006; Peiffer, Mozolic, Hugenschmidt, & Laurienti, 2007). In contrast to young adults, older adults exhibit a larger performance gain in response to bimodal, AV presentations relative to unimodal (auditory only or visual only) ones. No studies to date have examined how older adults integrate complex sounds with semantically related or unrelated visual stimuli. Given the behavioural facilitation effects of multisensory integration in the older group, the present study aims to capture group differences in multisensory integration at the neural level by measuring multisensory facilitation and competition effects in both young and older participants using MEG recordings.

3 Perceptual and Cognitive Changes with Increasing Age

Changes in basic auditory and visual processing occur with age. In addition to age-related reductions in visual acuity (Cerella, 1985; Spear, 1993), peripheral auditory processes also change with age. Aside from general age-related hearing loss (Schneider, Speranza, & Pichora-Fuller, 1998), increased age is also accompanied by deficits in central auditory processing, including the reduced efficiency of temporal and spectral resolution (cf., Alain, Ogawa, & Woods, 1996; Alain et al., 2006; Pichora-Fuller, Schneider, & Daneman, 1995; Schneider, Daneman, & Pichora-Fuller, 2002). In addition to perceptual changes, a recent qualitative examination of age-related changes in cognition suggested that while semantic knowledge remains stable with increasing age (Craik & Bialystok, 2006), fluid intelligence, demonstrated in the form of attentional regulation, decreases in efficiency, speed, and complexity (cf., Park et al., 2002; Salthouse, 2006). Attentional regulation is influenced by many factors including reductions in the ability to suppress irrelevant information in a task-relevant context (Hasher, Stoltzfus, Zacks, & Rypma, 1991; Hasher, Zacks, & May, 1999). In general, inhibitory attentional-control mechanisms limit the number of items that enter the focus of attention, suppressing information that is irrelevant to the task context. Various studies from Hasher and colleagues (Chiappe, Hasher, & Siegel, 2000; Darowski, Helder, Zacks, Hasher, & Hambrick, 2008; Hasher et al., 1991; May, Hasher, & Kane, 1999; May, Zacks, Hasher, & Multhaup, 1999; Stoltzfus, Hasher, Zacks, Ulivi, & Goldstein, 1993) suggested that inhibitory control is reduced in older compared to younger adults.

3.1 Age and inhibitory control: Behavioural findings

Previous research has shown that older adults are more vulnerable than young adults to the disruptive effects of concurrent distraction, as can be seen in basic perceptual speed (Lustig et al., 2006), selective attention (Stoltzfus et al., 1993), controlled memory search (Hartman & Hasher, 1991), visual search (Scialfa, Esau, & Joffe, 1998), sustained attention (Bunce, Warr, & Cochrane, 1993), Stroop (Spieler, Balota, & Faust, 1996; West & Alain, 1999, 2000), and flanker tasks (Zeef, Sonke, Kok, Buiten, & Kenemans, 1996). Because they are not efficiently dampened, task-irrelevant stimuli may receive a richer representation, and become strongly associated with target stimuli (Campbell, Hasher, & Thomas, 2010b). Reduced suppression of distractors, however, confers an advantage, facilitating performance on subsequent implicit memory tasks (Kim, Hasher, & Zacks, 2007; Rowe, Valderrama, Hasher, & Lenartowicz, 2006), and cued-recall memory tasks (Campbell et al., 2010b). Older adults seem to encode irrelevant stimulus information without awareness, showing superior performance on memory tasks in which stimuli that were distracting in previous tasks became relevant on the given task. These results jointly suggest the existence of a hyper-binding phenomenon with increasing age (Campbell et al., 2010b).

Inefficient inhibitory control can also influence crossmodal interactions (Tun, O'Kane, & Wingfield, 2002). Tun et al. (2002) found age differences in crossmodal interference during a visual attention task when distracting information was presented in the auditory modality. Recently, however, studies measuring saccadic RTs to audiovisual targets demonstrated that older adults benefit more from crossmodal cueing than younger adults (Campbell, Al-Aidroos, Fatt, Pratt, & Hasher, 2010a). In contrast to unimodal targets, visual targets that were preceded by pure tones were associated with greater saccadic trajectory deviations away from task-irrelevant visual distractors in older adults. No differences between unimodal and crossmodal conditions were observed in the younger group.

Recently, Guerreiro et al. (2010) suggested that age-related distractibility is modality dependent, and more sensitive to interference from task-irrelevant visual stimuli than auditory ones. By performing a meta-analysis of previous studies that examined inhibitory control, the group suggested that task-irrelevant auditory suppression during visual attention was more successful than task-irrelevant visual suppression during auditory attention tasks (Guerreiro, Murphy, & Van Gerven, 2010). Furthermore, the group suggested that the modality-specificity of distraction was dependent on working memory load. In other words, they predicted that during a low working memory load imposed by a primary visual task, older participants would successfully suppress irrelevant auditory information. Conversely, when the primary visual task involves higher cognitive demands, older participants would likely fail to suppress auditory distractors. This proposal, however, was based on relatively few studies reporting auditory distraction during visual attention tasks, and remains untested. Recent findings of enhanced multisensory facilitation with age (cf., Laurienti et al., 2006) may be explained partly by age-related changes in inhibitory control. Older adults may associate complex sounds with semantically congruent visual stimuli more strongly than young or middle-aged groups simply by virtue of their temporal and spatial correspondence. The present thesis project further explored the behavioural differences in multisensory facilitation in addition to the neural mechanisms underlying these processes in young and older subjects.

3.2 Age and inhibitory control: Electrophysiological changes

Electrophysiological studies have reported age-related changes in the amplitude, latency and topographic distribution of brain electrical responses, which have been associated with a decreased ability to filter incoming information (for a review, see Polich, 1996). Increased activation and altered scalp topography may reflect a reduction in brain functional specificity with age (for a review, see Dustman, Emmerson, & Shearer, 1996 and Park & Reuter-Lorenz, 2009).

Previous electrophysiological studies that used auditory distractors embedded amidst visual targets (and vice-versa) demonstrated enhanced early and late sensory evoked potentials in response to distracting auditory (Alain & Woods, 1999; Chao & Knight, 1997; Pekkonen et al., 1995), and visual stimuli (Ceponiene, Westerfield, Torki, & Townsend, 2008; Gazzaley et al., 2008). In selective attention tasks, participants responded to infrequent targets embedded in a sequence of standard stimuli in the visual or auditory modality while ignoring competing stimuli in the alternate modality (Ceponiene et al., 2008). They were instructed to attend to the visual sensory modality while ignoring incidental auditory stimuli. Reduced inhibition of auditory stimuli was captured across late P2 and N2 components in older compared to younger adults. This age-related increase in the amplitude of the auditory P2 component reflected increased processing of the auditory distractor during the focused visual attention task. Conversely, the amplitude decrease and latency increase of the N2 component reflected the decreased inhibition of the auditory stimuli. Alain and Woods (1999) also found enhanced sensory responses to irrelevant auditory stimuli in healthy older adults during visual attention tasks, contributing to the original inhibitory deficit model of aging (cf., Hasher et al., 1991). These studies, therefore, proposed an age-related loss of suppression over task-irrelevant inputs to the primary auditory cortex, a result that is comparable to what has been observed in the visual modality (Gazzaley et al., 2008). Using complex stimuli such as faces and houses, Gazzaley and colleagues (2008) found that older adults fail to inhibit face stimuli amidst task-relevant pictures of natural scenes, and vice versa. Reduced inhibitory control over task-irrelevant material was associated with an increase in the P1 component for task-irrelevant face stimuli when the targets were scenes (and vice versa), with a more pronounced effect in lower-performing older adults (Gazzaley et al., 2008). Furthermore, older adults showed a generalized increase in midline theta power, a frontal rhythm associated with increased working memory load (Klimesch, 1999), in response to both task-relevant and irrelevant items, suggesting that, in addition to increased overall effort with aging, excessive attention is directed toward processing distractors early in the time course of viewing the stimuli.

The phenomenon of inefficient inhibitory control with increasing age was documented by Fabiani and colleagues in terms of reduced suppression of stimulus-evoked potentials following repetition (Fabiani, Low, Wee, Sable, & Gratton, 2006; Golob, Miranda, Johnson, & Starr, 2001). Fabiani et al. (2006) found that older adults, in comparison to young, exhibit reduced N1 attenuation to repeated tones. Reduced N1 attenuation after repetition of a second tone pair for older relative to young adults was demonstrated in another ERP study by Golob et al. (2001). However, in this study, the N1 component associated with the repeated tone was more pronounced in crossmodal pairs, in which the tone was preceded by a visual stimulus. Taken together, these data suggest that older adults, in contrast to young adults, utilize their attention resources to process no-longer-relevant items. Furthermore, they fail to suppress repeated presentations of task-irrelevant information, particularly when such information is conveyed across distinct sensory modalities.

3.3 Age and inhibitory control: Evidence from neuroimaging studies

Functional neuroimaging studies also explored the link between age-related reduction in inhibitory control and cortical activations in response to task-irrelevant information. Townsend, Adamo and Haist (2006) found that older adults activated cortical regions associated with auditory as well as visual processing during visual focused attention tasks. This activation pattern was distinct from that of younger adults, in which activation was circumscribed to visual areas. In addition, older compared to younger adults activated additional regions within the left insula and frontal cortex. These findings suggest that older adults are less efficient at inhibiting task-irrelevant auditory information (Townsend et al., 2006). Additionally, in another focused visual attention task with complex visual stimuli, Gazzaley et al. (2005) showed that older adults, relative to young, also failed to suppress BOLD activity associated with task-irrelevant faces or houses. Across both groups, however, focused attention led to an enhancement of task-related BOLD activity (Gazzaley, Cooney, McEvoy, Knight, & D'Esposito, 2005). Using complex stimuli such as visual scenes and faces and an fMRI adaptation paradigm, Schmitz et al. (2010) found that older adults process target stimuli differently. In contrast to younger adults, older adults processed task-irrelevant stimuli along with the attended targets. BOLD activity in the fusiform gyrus and parahippocampal place area showed significant adaptation to faces and houses, respectively, even when visual scenes were task-irrelevant. Hyper-binding of visual targets and distractors was also associated with improved place recognition memory scores in older compared to young adults (Schmitz, Cheng, & De Rosa, 2010).

25 Increased BOLD activity in response to task-irrelevant content has also been reported in the auditory system. Older adults showed increased primary auditory cortical activity in response to the acoustic noise of the fMRI scanner (Stevens, Hasher, Chiew, & Grady, 2008). This serendipitous finding contributes to the proposal of reduced inhibitory control and increased distraction from external, task-irrelevant auditory stimuli with age (cf., Alain & Woods, 1999). Increased distraction from scanner noise was also supported by quantitative perfusion imaging during rest and steady-state visual presentations (Hugenschmidt, Mozolic, & Laurienti, 2009).

3.4 Aging and behavioural variability

In addition to perceptual slowing and reduced inhibitory control with age, older adults also exhibit more variable response latencies in various tasks ranging from simple detection tasks to more demanding tasks that require increased executive control (West, Murphy, Armilio, Craik, & Stuss, 2002). Recent studies suggested that RT variability measures offer an insight into the cognitive changes that occur with age and may also serve as early indicators of mild cognitive impairment (Balota et al., 2010). Recent studies demonstrated a strong link between perceptual speed and RT variability in older adults compared to middle-aged and young participants. Hultsch, MacDonald, & Dixon (2002) employed simple reaction time tasks in a group of young adults (ages 17-36 years) and older adults (ages 54-94 years) (Hultsch et al., 2002). Age differences were observed across two types of performance variability measures: variability between participants (diversity of RTs) and intra-individual variability (intra-individual dispersion of RTs). Other behavioural studies demonstrated a positive correlation between an overall slowing down of perceptual speed and intra-individual variability in older compared to young adults. Such increased intra-individual dispersion of RTs was reported using various perceptual tasks ranging from simple target detection to more complex visual pattern discrimination tasks (Ratcliff, Thapar, & McKoon, 2001, 2004).

Furthermore, recent studies suggested that RT variability is greatly influenced by cognitive control demands. While performance variability was similar for younger and older adults in simple target detection tasks, RT variability increased in older, but not in younger adults, in working memory tasks that incorporated a greater number of distractors (West, Murphy, Armilio, Craik, & Stuss, 2002). Although RT variability was correlated with increased cognitive demands and slower perceptual speed, West and colleagues did not examine the aspects of the RT distribution that were most associated with poorer performance in the older group. In other words, no previous studies investigated the extent to which the RT distribution in the older group differed from a normal, Gaussian distribution. Recently, Balota and colleagues examined this question using attentional selection and executive control tasks, such as Stroop and Simon tasks, and showed that the dispersion of RTs that indicated increased interference in older, and not young, adults was due to an increase in the tail of the distribution (Duchek et al., 2009; Spieler et al., 1996). Moreover, the group found that the measure tau, which indexes the tail of the distribution reflecting the longest RTs, was also negatively correlated with psychometric measures of episodic memory, working memory, and processing speed (Tse, Balota, Moynan, Duchek, & Jacoby, 2010). In other words, the larger the tau, the poorer the measured memory and processing speed scores. In a longitudinal study of older adults at risk of developing dementia of the Alzheimer's type, Balota and colleagues showed that the measure of tau across time also predicted future cognitive impairment (Balota et al., 2010).
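To make these variability measures concrete, the sketch below illustrates how intra-individual dispersion and the ex-Gaussian parameter tau can be estimated from a single participant's RTs in Python. The simulated RT values, sample size, and the use of scipy's exponnorm distribution (whose shape parameter K corresponds to tau/sigma) are illustrative choices, not the analysis pipeline used in the studies cited above.

```python
# Minimal sketch (not the analysis code from the cited studies): estimating
# intra-individual RT dispersion and the ex-Gaussian parameters mu, sigma,
# and tau from one participant's RTs. scipy parameterizes the ex-Gaussian
# (exponentially modified normal) as exponnorm(K, loc, scale) with
# K = tau / sigma, loc = mu, scale = sigma.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical single-subject RTs (ms): Gaussian core plus exponential tail.
rts = rng.normal(450, 50, size=300) + rng.exponential(120, size=300)

# Intra-individual variability: SD and coefficient of variation of the RTs.
isd = rts.std(ddof=1)
icv = isd / rts.mean()

# Fit the ex-Gaussian; tau indexes the slow tail of the RT distribution.
K, mu, sigma = stats.exponnorm.fit(rts)
tau = K * sigma

print(f"ISD = {isd:.1f} ms, ICV = {icv:.3f}")
print(f"mu = {mu:.1f} ms, sigma = {sigma:.1f} ms, tau = {tau:.1f} ms")
```

In this framework, a selective increase in tau with age (with mu and sigma relatively preserved) is what indicates that the older group's slowing is concentrated in the longest responses rather than distributed uniformly across the RT distribution.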

3.5 Audiovisual multisensory integration and aging

In spite of reduced behavioural stability and increased cognitive decline with age, several functions remain fairly unchanged. Procedural learning, sensory priming, conditioning and semantic knowledge hold up well in older individuals up to the late 60s (e.g., Grady & Craik, 2000; Salthouse, 2007). In the present thesis project, we asked whether certain perceptual processes can actually improve with healthy aging. We posited that multisensory integration is a perceptual phenomenon that may offer insight into the compensatory neural mechanisms that emerge as a result of sensory-specific declines with age. Very few studies have examined how multisensory integration changes with age. Preliminary work from Laurienti and colleagues suggests that multisensory integration improves significantly with age (Laurienti et al., 2006; Peiffer et al., 2007). Recently, there has been an increased interest in investigating how multisensory integration changes with age, particularly the extent to which older adults benefit from binding similar information conveyed across distinct sensory channels. As sensory-specific processes change with age, older adults may potentially benefit more from redundant, multimodal presentations compared to younger adults. In other words

27 complex sounds may receive a richer representation when presented alongside semantically congruent pictures (and vice versa). Laurienti and colleagues examined this question at the behavioural level using simple reaction time tasks, in which young and elderly participants responded to semantic AV stimuli, such as coloured discs paired with the verbalization of the given colour (Laurienti et al., 2006), and abstract AV stimuli including green light-emitting diodes paired with white noise (Peiffer et al., 2007). Both studies demonstrated larger multisensory facilitation, faster RTs and improved accuracy for bimodal compared to unimodal presentations, in the elderly group after controlling for a general sensorimotor slowing with age. Similarly, Diederich and colleagues (2008) measured saccadic RTs to bimodal stimuli and showed that the elderly participants exhibited significantly greater multisensory enhancement for bimodal presentations, relative to unimodal ones, compared to younger participants. The authors also examined the mechanisms underlying enhanced multisensory facilitation with age by modeling the contributions of the sensory and cognitive processes responsible for the behavioural differences using a time-window-of-integration (TWIN) model (Colonius & Diederich, 2004; Diederich, Colonius, & Schomburg, 2008). This model retains the classic notion of the race mechanism as an explanation of crossmodal interactions (cf., Colonius, Ozyurt, & Arndt, 2001; Miller & Ulrich, 2003; Raab, 1962), but restricts it to the very first stage of stimulus processing. The window of integration acts as a filter for determining whether the bottom-up processes summed across sensory channels are registered close enough in time for crossmodal integration to take place. The TWIN model allows for interactions to occur even between distant stimuli of different modalities as long as they fall within a given temporal window (Colonius & Arndt, 2001; Colonius & Diederich, 2004). The results of the TWIN model on the saccadic RT data suggested that peripheral, sensoryspecific processing takes substantially longer for older participants than for younger by an average of 36ms for visual and 80ms for auditory stimuli. Moreover, the group found that the temporal window of integration was wider for older than for young adults, averaging at 450ms and 275ms, respectively. In conclusion, the results suggest that, as peripheral processing slows down with age, the probability of integrating information from different sensory modalities

28 declines as well. The probability of processing visual and auditory information becomes more variable with age and is less likely to terminate within the crossmodal integration window. Thus, in a parsimonious fashion, the authors suggest that the RT facilitation effect is a direct indication of perceptual slowing in older subjects. Many of the peripheral processes that account for the race model, such as the early inhibitory function of neurons in the superior colliculus driven by redundant stimuli (Wurtz, Basso, Pare, & Sommer, 2000), might be reduced in the elderly. However, the multisensory gain is still greater in this group compared to the younger group. Diederich et al. (2008) interpret this multisensory facilitation effect in the elderly as analogous to the principle of “inverse effectiveness” (cf., Stein, Meredith, & Wallace, 1993), according to which multisensory enhancement for weaker stimuli tends to be larger than for more intense stimuli. According to this interpretation, increasing stimulus intensity should lead to a pattern of results in the elderly similar to those observed in younger adults. However, this seems to not be the case here since Laurienti et al. (2006) equated the perception of stimulus intensity across the two groups, and still observed multisensory facilitation effects in the elderly group. In contrast to the interpretation proposed by Diederich and colleagues (2008), previous studies have shown that decreased auditory or visual input leads to reductions in multisensory integration. For example, Musacchia and colleagues (2009) showed that multisensory integration effects were absent or less consistent in participants with hearing loss (Musacchia, Arum, Nicol, Garstecki, & Kraus, 2009). Specifically, decreased auditory input to the central nervous system, in the form of persistent hearing loss over time, also impaired multisensory integration. Furthermore, Gordon and Allen (2009) recently found that noise-modulated visual speech reduced performance in response to AV stimuli in older compared to young adults. Although both age groups demonstrated similar levels of visual enhancement of speech in bimodal compared to unimodal conditions, older adults showed no such enhancement after the visual speech was blurred, whereas young adults' enhancement was unaffected (Gordon & Allen, 2009). The group may have observed this because, in the older group, the speech representation is enhanced with adequate crossmodal stimulation. In this case, visual input was blurred, reducing perception of speech.
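The two-stage logic of the TWIN model described above can be illustrated with a minimal simulation. The peripheral slowing (36ms visual, 80ms auditory) and window widths (275ms young, 450ms older) are taken from the values quoted above; everything else, including the exponential first-stage processing times, the 250ms second stage, and the 50ms facilitation on integrated trials, is an assumed, illustrative parameterization rather than the model actually fitted by Diederich et al. (2008).

```python
# Illustrative sketch of the two-stage TWIN logic (parameter values partly
# hypothetical). Stage 1: the auditory and visual peripheral processing times
# race; crossmodal interaction occurs only if the non-target (auditory) stage
# finishes first and within a window "omega" of the target (visual) stage.
# Stage 2: integration shortens second-stage processing by "delta" ms.
import numpy as np

rng = np.random.default_rng(1)


def simulate_twin(mean_vis, mean_aud, omega, delta, second_stage=250.0, n=100_000):
    """Return mean RT (ms) to bimodal stimuli and probability of integration."""
    t_vis = rng.exponential(mean_vis, n)          # peripheral visual times
    t_aud = rng.exponential(mean_aud, n)          # peripheral auditory times
    integrate = (t_aud < t_vis) & (t_vis - t_aud < omega)
    rt = t_vis + second_stage - delta * integrate
    return rt.mean(), integrate.mean()


# Hypothetical young baselines (60, 40 ms) plus the age-related slowing and
# window widths quoted above for the older group.
params = {"young": (60, 40, 275), "older": (96, 120, 450)}
for group, (mv, ma, om) in params.items():
    mean_rt, p_int = simulate_twin(mv, ma, om, delta=50)
    print(f"{group}: mean bimodal RT = {mean_rt:.0f} ms, P(integration) = {p_int:.2f}")
```

Varying the window width and peripheral means in such a simulation shows the trade-off at the heart of the TWIN account: slower, more variable peripheral processing lowers the probability that both modalities terminate within the window, while a wider window partially restores it.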

29 To summarize, the available behavioural studies investigating age-related changes in multisensory integration propose that older adults benefit more than young adults from congruent multisensory information. However, recent observations of reduced multisensory integration following impoverished auditory or visual input suggest that increased multisensory facilitation in aging may not be the result of reduced peripheral, sensory-specific input as the TWIN model would suggest (cf., Diederich et al., 2008). Multisensory facilitation in aging may exhibit an inverted U effectiveness curve. In other words, noisier or more variable multisensory input facilitates object detection or recognition, as in the inverse effectiveness hypothesis proposed by Stein et al., 1993. However, when bimodal, AV input is significantly reduced, as in cases of hearing loss or vision impairment, then multisensory integration declines as well. An interesting question that arises from these recent studies is whether the multisensory RT advantage observed in the elderly involves more central, higher-level neural mechanisms. One possibility is that the multisensory facilitation advantage in the elderly reflects a form of compensation beyond sensory-specific channels comprising multimodal brain regions in parietal and frontal lobes. Whereas young adults selectively recruit auditory and visual systems during multisensory integration tasks, older adults may show reduced functional specialization and engage multimodal parietal and prefrontal regions during the exact same tasks. This prediction is supported by the cortical dedifferentiation hypothesis that proposes that age is accompanied by decreased neural specificity or increased cortical dedifferentiation (for a review, see Park & Reuter-Lorenz, 2009). A second possibility is that older adults are more efficient at binding information across auditory and visual channels, as the reports of reduced inhibitory control in aging would predict (cf., Hasher et al., 1991; 1999). Thus, as a result of an age-related loss of frontal suppression over task-irrelevant sensory inputs (cf., Gazzaley et al., 2008), older adults may show significantly larger sensory-specific activation patterns in response to redundant, bimodal presentations. No studies thus far have examined the neural processes associated with multisensory facilitation in both young and elderly participants. The present study will investigate the extent to which older adults differ from young adults in terms of the source activation patterns in response to multisensory stimuli.

We also predicted that older adults, in contrast to young, would show reduced functional specialization and engage multimodal parietal and prefrontal cortices in response to multisensory presentations. Additionally, we hypothesized that such age-related changes in cortical activity would be behaviourally relevant and would also correlate with faster and more accurate performance during multisensory contexts in older compared to younger participants. To explore this hypothesis further, we will first examine the existing literature on age and compensatory functional changes.

4 Aging and Compensatory Neural Mechanisms

Changes in functional specialization with age are determined by the anatomical or structural connectivity of the brain, as structural connectivity determines the possible network configurations that distributed brain regions can form (cf., Honey, Thivierge, & Sporns, 2010). In other words, it determines that a certain brain region A will communicate with other brain regions B and C by virtue of their synaptic connectivity. Furthermore, structural connectivity will also determine how neural activity between brain regions A through C will change over time. First, we will outline the current state-of-the-art regarding the structural changes that occur with age, and, afterwards, we will examine the literature that supports age-related functional network reorganization in various perceptual and cognitive tasks when performance is equivalent across young and older groups. Finally, we will advance and explore the neural compensation hypothesis, which can potentially offer insights into the neural mechanisms that support the enhanced multisensory facilitation effects in aging.

4.1 Age-related structural changes

The structural changes that accompany aging include overall gray and white matter volume declines along with ventricular enlargement (Gunning-Dixon & Raz, 2000; Jernigan et al., 2001; Liu et al., 2003). The white matter abnormalities are usually observed later on in life (e.g., late 60s), and have been associated with attenuated performance on tasks that examine processing speed, immediate and delayed memory, executive functions, and indices of global cognitive functioning (Gunning-Dixon & Raz, 2000).

31 In addition to age-related changes in gray and white matter volumes, the strength and shape of age and volume associations vary greatly among brain regions. Age-related decreases in prefrontal volume have consistently been reported in the literature (Allen, Damasio, Grabowski, Bruss, & Zhang, 2003; Raz, Gunning-Dixon et al., 2004; Sowell et al., 2003; Tisserand et al., 2004). Furthermore, Raz and colleagues (e.g., Raz et al., 2005; Raz, Rodrigue, Head, Kennedy, & Acker, 2004) demonstrated significant gray matter volume declines in brain regions associated with cognitive processes including the hippocampus, entorhinal cortices, inferior temporal and the prefrontal cortices. Across basal ganglia structures, the largest gray matter declines were found in the caudate and putamen (Gur, Gunning-Dixon, Turetsky, Bilker, & Gur, 2002). Gray matter density declines were also observed longitudinally in a group of 92 healthy older adults between the ages of 59-85 years across the cingulate and insular cortices (Resnick, Pham, Kraut, Zonderman, & Davatzikos, 2003). With respect to sensory brain regions, the superior temporal lobes, including primary and secondary auditory cortices, showed gray matter volume declines, while the occipital lobes evidenced the least amount of change with age. Multimodal posterior parietal cortices including the IPS and SPL seem to also show negligible white and gray matter changes with age (Sowell et al., 2003).

4.2 Age-related functional changes Advances in neuroimaging technology have enabled researchers to examine the behavioural consequences of age-related changes in both structural and functional connectivity. Several theories that emerged from such research posit that the cognitive deficits that emerge with healthy aging arise from alterations in functional properties of coordinated brain systems. In other words, due to white matter loss or degeneration, brain regions that ordinarily activate together, become functionally disconnected with age (O'Sullivan et al., 2001; Salat et al., 2001; Head et al., 2004). Studies that examined age-related changes in functional connectivity found marked reductions in normally present functional correlations between brain regions in the "default-mode network" (Andrews-Hanna et al., 2007; Damoiseaux et al., 2008). This system referred to as the "default-mode network" can be observed at rest in the absence of any external stimulation (for a review, see Fox & Raichle, 2007). This default-mode or restingstate network has been associated with internally directed mental states and may reflect intrinsic

32 properties of brain functional organization (Greicius et al., 2003; Buckner & Vincent, 2007). Recent studies showed that healthy aging was accompanied by reduced functional connectivity between anterior and posterior regions within the default-mode network, namely the medial prefrontal cortex and the posterior cingulate cortex (Andrews-Hanna et al., 2007; Damoiseaux et al., 2008). These functional changes have been related to decreases in white matter integrity. These findings suggest that long-range functional connectivity or coupling between distal brain regions declines with age. Alternate, altered, or additional neural recruitment, likely resulting from the presence of white and gray matter reductions, was reported in elderly subjects in various fMRI studies (see ReuterLorenz & Cappell, 2008 for a review). The proposal of increased functional network reorganization with age was initially motivated by neuroimaging studies of healthy aging that demonstrated distinct activation patterns in older adults compared to young when the two groups performed similarly on the same tasks (for reviews see Park & Reuter-Lorenz, 2009; ReuterLorenz, 2002; Reuter-Lorenz & Cappell, 2008). Age-related underactivation was typically interpreted as a sign of impairment due to underutilized strategies or due to age-related structural atrophy. However, the mechanisms underlying the behavioural advantages that stemmed from increased and distinct cortical activation patterns with age remained more ambiguous. Behavioural stability across age groups in spite of cortical activation differences raised the hypothesis that the additional cortical activation patterns in the elderly served a beneficial, compensatory function. Neuroimaging studies that found evidence of reduced specialization with age proposed that such functional changes were adaptive because they predicted improved performance on perceptual and memory tasks in older compared to the younger participants (for a review, see Park & Reuter-Lorenz, 2009). For example, Cabeza and colleagues (2002) showed that bilateral recruitment of medial prefrontal cortices was associated with enhanced performance on a source memory task in older compared to younger participants (Cabeza, Anderson, Locantore, & McIntosh, 2002). The authors proposed that this alternate activation pattern reflected the additional recruitment of neural resources to meet the cognitive demands of the task in older participants whose memory scores matched those of young subjects.

33 Early neuroimaging studies found that older adults exhibit decreased functional specificity in parietal and prefrontal brain regions during face- and location-matching tasks (Grady et al., 1994). This finding was evidence of a greater reliance on parietal and prefrontal regions as a result of reduced focused visual attention with age. In addition to increased parietal and prefrontal involvement in visual attention tasks, there is growing evidence that more specialized sensory cortices, including the ventral visual cortex, exhibit reduced neural specificity with age. For example, Park et al. (2004) found that, in comparison to young adults, older adults showed markedly reduced neural specificity in the fusiform face area, a brain region specialized for face perception, and the parahippocampal place area, specialized for perception of scenes. These studies suggest that functional networks that are highly specialized in young adulthood become less specialized with age. To test the extent to which age-related functional reorganization predicted equivalent performance across different age groups, certain studies applied transcranial magnetic stimulation (TMS) to temporally perturb neural activity. This technique applies focally-directed magnetic pulses to the surface of the scalp to stimulate or deactivate underlying neural tissue. The prefrontal cortex was a good candidate structure for such manipulations because this region shows considerable changes in functional reorganization with age (see Park & Reuter-Lorenz, 2009) and because the sub-regions in the prefrontal cortex can be accessed with TMS. Rossi et al. (2004) confirmed the functional importance of increased prefrontal recruitment in the elderly by showing that TMS applied to either hemisphere of the prefrontal cortex disrupted performance on recognition memory tasks in a group of older adults (Rossi et al., 2004). TMS can also be applied to stimulate particular parts of the cortex that are under-recruited in certain subjects. Sole-Padulles and colleagues (2006) examined whether stimulation of prefrontal areas using TMS would increase memory performance in older adults. Indeed, in a group of lowperforming older adults, TMS activation pulses to each hemisphere of the prefrontal lobes increased activity in bilateral prefrontal areas. These functional activation changes also led to improved performance on recognition memory tests (Sole-Padulles et al., 2006). The aforementioned studies suggest that increased prefrontal activation was adaptive in older adults as it was directly associated with performance improvements in lower-performing older participants.

34 The observation of age-equivalent performance on various tasks, and, thus, the emergence of compensatory neural mechanisms may be the result of distinct lifestyle choices in a given population of healthy older adults. Memory improvements were previously observed in older adults whose IQ levels exceeded their education level, and who showed greater coping strategies with existing stressors (Garrett, Grady, & Hasher, 2010a). Epidemiological research suggests that in old age, a lifestyle rich in mental, physical and social stimulation exerts benefits on cognitive performance (cf., Backman & Dixon, 1992; Kramer, Bherer, Colcombe, Dong, & Greenough, 2004). Reaching a consensus about which cortical regions contribute to successful performance in older adults is an important step towards understanding the emergence of distinct functional specialization in healthy aging.

5 Purpose of the Study

Given the evidence of age-related changes in functional specialization, it is important to further examine the context in which healthy, older adults’ performance exceeds that of younger adults. If age-related behavioural advantages exist, it is also crucial to examine whether they are supported by changes in cortical activity or changes in cortical specialization. Recent behavioural evidence suggested that older adults exhibit larger RT facilitation effects following concurrent bimodal stimulation compared to auditory-only or visual-only presentations (Laurienti et al., 2006; Peiffer et al., 2007), and therefore, may be more efficient at binding congruent, AV information. To our knowledge, no studies have examined age-related differences in brain activity in response to semantically congruent AV stimuli. In the present study, we propose that multisensory integration may provide a window into the performance benefits that accompany age-related changes in functional network specialization. The multisensory facilitation advantage in the elderly may represent a form of functional reorganization beyond sensory-specific channels comprising higher-order, multimodal brain regions in parietal and prefrontal lobes. The purpose of the present study is to examine whether the multisensory facilitation advantage observed in the elderly groups can be explained by altered cortical activity or functional reorganization in multimodal brain areas. The brain regions of interest include posterior parietal and prefrontal cortices that have been shown to be associated with multimodal processes (for a review, see Calvert et al., 2001). Moreover, evidence from functional neuroimaging studies

35 suggested that these regions exhibit decreased functional specificity with age (Grady et al., 1994; Park & Reuter-Lorenz, 2009). To simulate familiar contexts of multisensory integration, we designed a series of conditions that manipulated the degree of semantic relatedness between complex sounds and familiar visual stimuli. In addition to testing the behavioural advantages of multisensory congruence, we also explore the reverse perceptual phenomenon, namely multisensory competition, by measuring the performance and neural consequences of crossmodal conflicts. The materials used in this study along with a detailed account of the experimental procedures are included in the second chapter of the thesis. The purpose of the methodological chapter (Chapter 2) is to give a detailed overview of the design and analysis methodology common to all experimental chapters of the thesis (Chapters 3-6). With the exception of participant information, which is specific to each experimental chapter, the general methods chapter provides a detailed description of (1) the stimuli and the experimental procedure, (2) the basics of MEG recordings, pre-processing and analysis, and (3) the rationale and background of the analytical tools employed in the study, namely synthetic aperture magnetometry (SAM) and partial least squares (PLS). In the first experimental chapter (Chapter 3), we assess both multisensory facilitation (faster responses and superior performance) and multisensory competition (slower responses and less accurate performance) in response to bimodal presentations in young, middle-aged, and older participants. We demonstrate that older adults respond faster and more accurately to semantically congruent, bimodal presentations when required to simply detect unimodal and bimodal stimuli. Conversely, they also show more errors and slower RTs as they judge the degree of semantic congruence between auditory and visual targets within semanticallyincongruent AV presentations. Secondly, we investigate the spatiotemporal patterns underlying both multisensory facilitation and competition at multiple stages of processing (Chapter 4). At early stages, 100ms after stimulus onset, multimodal posterior parietal regions respond preferentially to bimodal presentations, irrespective of the degree of semantic relatedness between the auditory and visual stimuli. At later stages of processing, 200 and 400ms after stimulus onset, neural activity in cingulate and superior temporal cortices is modulated by the semantic content of bimodal and

unimodal stimuli as participants judged whether they were animate or inanimate. Furthermore, the processing of crossmodal conflicts extends between 400-600 and 600-1000ms after stimulus onset, and is captured in orbitofrontal cortices. In the third experimental chapter (Chapter 5), we establish an unprecedented link between the neural processes underlying multisensory integration and the behavioural advantage observed in the elderly by demonstrating that older adults recruit an alternate network composed of frontoparietal brain regions in response to semantically congruent, multisensory stimuli. Activity in parietal and medial prefrontal cortices predicts faster RTs in bimodal compared to unimodal trials in the older group between 100 and 300ms after stimulus onset. Older adults further differentiate between congruent and incongruent presentations between 150 and 300ms, an effect captured primarily in left medial temporal cortices. In the fourth and final experimental chapter (Chapter 6), we examine age differences in MEG signal variability by quantifying brain signal complexity with the multi-scale entropy measure. Indices of brain signal complexity in combination with power spectral density measures reveal age differences in signal variability across both low (delta and theta) and high (gamma) frequency bands. Our findings suggest that there is an increase in short-distance communication (as supported by higher frequencies) and a decrease in long-range communication (as supported by lower frequencies) with increasing age. These age-related differences in brain signal complexity also predict improved accuracy, more stable performance, and faster RTs in both young and older participants. The final chapter (Chapter 7) is a summary of the main findings and an evaluation of the study in the context of the current framework of multisensory integration in aging.
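For readers unfamiliar with the complexity measure used in Chapter 6, the sketch below shows the standard multi-scale entropy recipe: coarse-graining a single-channel signal at successive time scales and computing sample entropy at each scale. The pattern length (m = 2), tolerance (r = 0.5 SD), and number of scales are illustrative defaults and are not necessarily the parameters used in the thesis analyses.

```python
# Minimal sketch of multi-scale entropy (MSE) for a single MEG channel:
# coarse-grain the signal at each time scale, then compute sample entropy.
import numpy as np


def sample_entropy(x, m=2, r=0.5):
    """Sample entropy of a 1-D signal x, with tolerance r in units of its SD."""
    x = np.asarray(x, float)
    tol = r * x.std()

    def matching_pairs(mm):
        templates = np.array([x[i:i + mm] for i in range(len(x) - mm)])
        # Chebyshev distance between all template pairs.
        d = np.max(np.abs(templates[:, None, :] - templates[None, :, :]), axis=2)
        n = len(templates)
        return (np.sum(d <= tol) - n) / 2.0      # pairs within tolerance, no self-matches

    b, a = matching_pairs(m), matching_pairs(m + 1)
    return -np.log(a / b) if a > 0 and b > 0 else np.nan


def multiscale_entropy(signal, scales=range(1, 21), m=2, r=0.5):
    """Coarse-grain the signal at each scale and return sample entropy per scale."""
    mse = []
    for s in scales:
        n = len(signal) // s
        coarse = signal[:n * s].reshape(n, s).mean(axis=1)
        mse.append(sample_entropy(coarse, m, r))
    return np.array(mse)


# Example: white noise loses entropy rapidly across scales, whereas signals
# with long-range temporal structure maintain it at coarser scales.
rng = np.random.default_rng(2)
print(multiscale_entropy(rng.standard_normal(1500), scales=range(1, 6)))
```

The scale-dependence of this curve is what allows entropy at fine versus coarse time scales to be related to the higher- and lower-frequency contributions described above.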

Chapter 2
Methodology and Experimental Design

The critical materials used in this study were complex sounds and semantically related visual targets, chosen to simulate naturalistic multisensory contexts. Bimodal (auditory and visual) presentations were compared to unimodal (auditory only or visual only) presentations in various experimental manipulations. The purpose of the study was to investigate the behavioural and neural aspects of multisensory integration that change with age. To capture both spatial and temporal aspects of multisensory integration, we used magnetoencephalography (MEG) recordings and beamforming approaches to source modeling. We wished to distinguish the neural signatures of processes related to multisensory facilitation from those related to multisensory competition without having to select particular time points or brain regions of interest. Accordingly, we used a multivariate analysis approach, namely partial least squares (PLS), to analyze task- and trial type-related differences.
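As a schematic illustration of the mean-centred task PLS decomposition named above, the sketch below applies a singular value decomposition to the deviations of condition means from the grand mean of hypothetical source data. The array sizes and variable names are invented for the example, and the permutation tests and bootstrap resampling used to assess the latent variables, described later in this chapter, are omitted.

```python
# Schematic mean-centred task PLS on hypothetical MEG source data.
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical input: data[subject, condition] is source activity concatenated
# over brain locations and time points (sizes invented for the example).
n_subj, n_cond, n_features = 16, 4, 500          # e.g., A, V, AV+, AV- conditions
data = rng.standard_normal((n_subj, n_cond, n_features))

# Condition means across subjects, centred on the grand mean across conditions.
cond_means = data.mean(axis=0)                   # shape (n_cond, n_features)
deviations = cond_means - cond_means.mean(axis=0)

# SVD yields latent variables (LVs): U holds the design saliences (condition
# weights), Vt the brain saliences (feature weights), S the singular values.
U, S, Vt = np.linalg.svd(deviations, full_matrices=False)

explained = S**2 / np.sum(S**2)
brain_scores = data @ Vt[0]                      # subject-by-condition scores on LV1

print(f"LV1 accounts for {explained[0]:.1%} of the total squared singular values")
print("LV1 design saliences per condition:", np.round(U[:, 0], 2))
print("Brain score matrix shape:", brain_scores.shape)
```

The appeal of this approach for the present design is that the latent variables express whole patterns of condition differences across all sources and time points at once, so no individual time window or region has to be selected in advance.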

1 Experimental Design

1.1 Stimuli

Two types of stimuli, animate and inanimate, were used in the study. Items were selected from 4 distinct categories: (1) animals, (2) musical instruments, (3) automobiles, and (4) household objects. The first category of stimuli was labeled as "animate", while the remaining 3 categories were considered "inanimate" objects. To ensure that the complex sounds were easily nameable and identifiable, we assessed accuracy values and response times (RTs) for each stimulus exemplar in an initial behavioural pilot. Five young and older adults (mean ages 26 and 64, respectively) participated in this initial pilot. Complex sounds were excluded if detection accuracy levels fell below 75% and RTs exceeded two standard deviations above the mean RT values for each individual subject. Furthermore, after behavioural testing, we also asked participants to rate the complex sounds based on their recognizability. Based on the behavioural findings and the post-experiment questionnaire results, we excluded several complex sounds along with their visual counterparts. Thus, for each animate or inanimate category, 30 different exemplars from each sensory modality (auditory and visual) were selected because they could be unambiguously categorized. Please refer to Table 1 for a complete list of the visual and auditory stimuli used in the experiment.

Black-and-white line drawings selected from the Snodgrass and Vanderwart (1980) database of visual stimuli were used for visual presentations. All visual stimuli were matched according to size (in pixels), brightness, and contrast. Auditory stimuli were selected from the Rotman Research Institute database of non-speech, complex sounds. Complex sounds were matched according to amplitude. Root mean square (RMS) amplitudes were calculated for each sound, and the overall mean was computed. For complex waveforms such as those used in the experiment, the RMS amplitude is typically used because it provides an unambiguous quantification of overall sound level. As such, each sound was scaled to the mean amplitude; thus, the louder sounds were attenuated, while the softer ones were amplified. Complex sounds were delivered binaurally at an intensity level of 60dB hearing level (HL) based on the audiometric mean across both ears. There were two types of bimodal trials: congruent presentations, in which the auditory and visual stimuli matched semantically (e.g., a picture of a lion paired with the sound of a roar, or a picture of an ambulance paired with a siren), and incongruent presentations, in which the auditory and visual stimuli were semantically mismatched (e.g., a picture of a lion paired with the ambulance siren).
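The RMS-equating step can be summarized in a few lines of code. The sketch below is illustrative rather than the stimulus-preparation script actually used: it assumes the sounds have already been loaded into numpy arrays and simply rescales each one to the mean RMS amplitude of the set.

```python
# Illustrative sketch of RMS equating: rescale every sound so that its RMS
# amplitude equals the mean RMS across the whole stimulus set.
import numpy as np


def rms(x):
    return np.sqrt(np.mean(np.square(x, dtype=float)))


def equate_rms(sounds):
    rms_values = np.array([rms(s) for s in sounds])
    target = rms_values.mean()                    # overall mean RMS amplitude
    # Louder sounds are attenuated and softer ones amplified toward the target.
    return [s * (target / r) for s, r in zip(sounds, rms_values)]


# Example with two hypothetical sounds of very different levels:
rng = np.random.default_rng(4)
loud, soft = 0.8 * rng.standard_normal(44100), 0.1 * rng.standard_normal(44100)
equated = equate_rms([loud, soft])
print([round(rms(s), 3) for s in equated])        # both now share the same RMS
```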

1.2 Procedure

Presentation software (version 10.3; Neurobehavioural Systems, Inc.; http://www.neurobs.com/) was used to control visual and auditory stimulus delivery and to record participants' response latency and accuracy. Four possible stimulus combinations were used in this study: (1) unimodal auditory (A), (2) unimodal visual (V), (3) bimodal congruent or simultaneous auditory and visual stimuli that matched semantically (AV+), and (4) bimodal incongruent or simultaneous auditory and visual stimuli that did not match semantically (AV-). Each stimulus or stimulus pair was presented for 400ms; for the auditory stimulus, the 400ms interval also included a 5ms fall and rise time. The time interval between the end of the stimulus presentation and the beginning of the next trial was either 2, 3, or 4 seconds (equiprobable). See Figure 1 for an illustration of the paradigm.
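A minimal sketch of the timing parameters just described is given below: a 400ms stimulus with 5ms onset and offset ramps, and an inter-trial interval drawn with equal probability from 2, 3, or 4 seconds. The 44.1 kHz sampling rate, the linear ramp shape, and the noise carrier are assumptions for illustration; stimulus delivery in the experiment itself was handled by Presentation.

```python
# Illustrative sketch of the trial timing parameters (not the Presentation
# scripts): 400 ms sound with 5 ms rise/fall ramps and a jittered ITI.
import numpy as np

fs = 44100                                        # assumed sampling rate (Hz)
duration, ramp_ms = 0.400, 5

n_total = int(fs * duration)
n_ramp = int(fs * ramp_ms / 1000)

envelope = np.ones(n_total)
envelope[:n_ramp] = np.linspace(0.0, 1.0, n_ramp)   # 5 ms rise
envelope[-n_ramp:] = np.linspace(1.0, 0.0, n_ramp)  # 5 ms fall

rng = np.random.default_rng(5)
sound = rng.standard_normal(n_total) * envelope      # hypothetical stimulus waveform

iti = rng.choice([2.0, 3.0, 4.0])                    # equiprobable 2, 3, or 4 s
print(f"stimulus: {n_total} samples, next trial in {iti:.0f} s")
```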

39 Response instructions varied in 3 conditions. Multisensory facilitation effects (i.e., faster response latencies and improved accuracy) were examined using conditions 1 and 2, and multisensory competition effects (i.e., slower response latencies and reduced accuracy) were examined using condition 3. Each condition contained four distinct trial types: (1) unimodal auditory (A), (2) unimodal visual (V), (3) bimodal, semantically matched (AV+), and (4) bimodal semantically unmatched (AV-). All four trial types were randomly presented in each condition in a total of 120 trials (30 presentations per trial type). The set of stimuli and categories used are included in Table 1. The set of stimuli listed in Table 1 were used in all 3 conditions, and were presented in a random order in each condition. The first condition was a simple stimulus detection task, and served as a baseline for multisensory facilitation. Participants were instructed to detect an upcoming "event" (i.e., A, V, AV+ or AV-) and respond to it as quickly as possible irrespective of the particular trial type. Previous studies of multisensory integration demonstrated enhanced RT facilitation to bimodal stimuli using a similar experimental procedure (cf., Laurienti et al., 2007). To enhance the effects of multisensory congruence on RT patterns and accuracy levels, we also included an animacy task in which participants evaluated the semantic content of bimodal and unimodal stimuli by dividing each stimulus into animate or inanimate categories. This condition was adapted from other functional neuroimaging studies of semantic AV integration (cf., Beauchamp et al., 2004; Noppeney et al., 2010). Participants were required to make living versus non-living judgments for 3 of the 4 trial types (i.e., only A, V, AV+ were included). No incongruent presentations were used in this condition. Forty presentations of each trial type were used in a total of 120 trials. As such, participants labeled "animate" any animal complex sound, pictures or sounds and pictures presented simultaneously. Conversely, they categorized musical instruments, automobiles, and household objects as "inanimate" (see Table 1 for the stimuli and categories used). Finally, to investigate the effects of crossmodal conflicts on performance in both young and older adults, we instructed participants to identify the degree of congruence or mismatch between complex sounds and line drawings in response to simultaneous bimodal presentations. Similarly to the simple detection task, this task also included random presentations of all four trial types (i.e., A, V, AV+ and AV-) in a total of 120 trials (30 presentations per trial type).
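For concreteness, the sketch below assembles one hypothetical 120-trial block for the detection or congruence-judgement conditions, with 30 presentations of each of the four trial types shuffled into a random order; the assignment of specific sound and picture exemplars to trials is omitted.

```python
# Illustrative sketch of one randomized 120-trial block (30 per trial type).
import numpy as np

rng = np.random.default_rng(6)

trial_types = np.repeat(["A", "V", "AV+", "AV-"], 30)   # 120 trials in total
rng.shuffle(trial_types)

counts = dict(zip(*np.unique(trial_types, return_counts=True)))
print(len(trial_types), counts)                          # 120 {'A': 30, 'AV+': 30, ...}
```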


1.3 Cognitive and neuropsychological screening

To ensure that the individuals that we tested were physically and neurologically healthy, we took extensive participant screening measures; the subject selection criteria are outlined in Table 2. Furthermore, to assess neuropsychological function, four neuropsychological tests were administered (see Appendices 1-4 for the questionnaires that were administered during in-person interviews). The Rotman Research Institute Screening Questionnaire assessed medical, psychiatric, or neurological impairments. Concentration and attention were measured using the Short Blessed Test (Katzman et al., 1983), vocabulary using the Shipley Institute for Living Scale Test (Shipley, 1991), and memory and motor function using Folstein's Mini-Mental Status examination (Folstein, Folstein, & McHugh, 1975).

2.1 Overview of MEG recordings

The magnetoencephalography (MEG) method is based on the superconducting quantum interference device or SQUID, a sensitive detector of magnetic fields. MEG was introduced in the early 1970s by David Cohen, using the SQUID sensor developed by James Zimmerman (Cohen, 1972). SQUIDs can be used to detect and quantify millisecond changes in the magnetic flux through magnetometer coils in a superconducting environment. Neuronal currents produce weak magnetic fields that, when arising in concert from tens of thousands of neurons, can be detected from the surface of the scalp (Hamalainen, Hari, Ilmoniemi, Knuutila, & Lounasmaa, 1993). Macrocolumns of tens of thousands of synchronously activated large pyramidal cortical neurons are believed to be the main MEG generators because of the coherent distribution of their large dendritic trunks, locally oriented in parallel and pointing perpendicularly to the cortical surface (Nunez, 2000). Excitatory postsynaptic potentials (EPSPs) are generated at apical dendrites of individual neurons. The apical dendritic membrane becomes transiently depolarized with respect to the cell soma and the basal dendrites. This potential difference causes electrical current to flow through a volume conductor from the non-excited membrane of the soma and basal dendrites to the apical dendritic tree sustaining the EPSPs. The currents associated with the EPSPs generated among the dendrites are believed to be at the source of most of the signals detected in MEG because they typically last longer than the rapidly firing action potentials traveling along the axons of excited neurons (Hamalainen et al., 1993).

MEG and EEG are closely related: both reflect changes in neuronal currents and exhibit millisecond-range temporal resolution. However, they also have important differences. MEG is reference-free in the sense that the measured signal does not need to be compared with a signal from another location, whereas EEG recordings are made with respect to a reference voltage. MEG is especially sensitive to current flowing tangentially to the surface of the skull. Since dendrites are mostly orthogonal to the cortical surface, MEG is mostly sensitive to sources located in the fissures of the cortex. EEG, however, reflects currents in all orientations. In contrast to EEG, MEG also allows monitoring of the topography of cortical activation sequences without severe distortion by the skull and other extra-cerebral tissues (Baillet, Mosher, & Leahy, 2001; Hari, Parkkonen, & Nangini, 2010). In the present study, the MEG was recorded in a magnetically shielded room at the Rotman Research Institute, Baycrest Centre, using a 151-channel whole-head neuromagnetometer (OMEGA, VSM Medtech Inc., Vancouver, Canada). Participants sat in an upright position and viewed the visual stimuli on a back-projection screen that subtended approximately 30 degrees of visual angle when seated 70 cm from the screen. With respect to the visual presentations, the MEG collection was synchronized with the onset of each stimulus by recording the luminance change of the screen with a photodiode. Binaural auditory stimuli were presented at 60 dB HL via an OB 822 Clinical Audiometer through ER30 transducers (Etymotic Research, Elk Grove, USA), connected to the participant's ears with 1.5 m of length-matched plastic tubing and foam earplugs. Sound transmission over this length was required because the sound transducers had to be placed at a sufficient distance from the MEG sensors to avoid interference between the sound signal and the recorded brain activity. Sound transmission through the plastic tube caused a delay of approximately 10 ms. To control for this timing confound, the MEG data collection was synchronized to the onset of the auditory sound envelope. Neuromagnetic activity was co-registered to each participant's individual structural magnetic resonance image (MRI). In order to constrain the sources of activation as measured by MEG, structural MRI scans were acquired using a 3.0 T Siemens Tim MAGNETOM Trio MRI scanner (Software level Syngo MR, Siemens Medical, Germany) with a 12-channel head coil. All participants' structural MRIs and MEG source data were spatially normalized to the Talairach standard brain using Analysis of Functional NeuroImages software (AFNI; Cox, 1996).

Participants' head positions within the MEG were determined at the start and end of each recording block using indicator coils placed on the nasion and bilateral pre-auricular points. These fiducial points established a head-based Cartesian coordinate system for pre-processing and analysis of the MEG data. A subject-specific MEG recording summary sheet was created for each participant, and used to track hearing/vision threshold levels, fiducial point coordinates, and head motion during the MEG scan (see Appendix 5 for a template of the subject-specific MEG recording summary sheet). Neuromagnetic activity was sampled at a rate of 1250 Hz, and was recorded continuously in 6 experimental blocks of 15 minutes recording time each. Third-order gradient noise correction was applied to the continuous MEG data. Afterwards, the MEG data were parsed into epochs including a 200 ms pre-stimulus and a 1000 ms post-stimulus activity period, and DC offsets were removed from the entire epoch. A detailed summary of the pre-processing steps used following MEG recording is included in Appendix 6. Finally, MEG data were grand averaged across all stimulus types and band-pass filtered between 0.1 and 55 Hz. A principal components analysis (PCA) was performed on each epoch, and components larger than 2.0 pT (picoTesla) at any time point were subtracted from the data. This preprocessing step effectively removed large artifacts caused by eye blinks (Okada, Jung, & Kobayashi, 2007). Subject-specific event marker summary sheets were used to keep track of (1) the frequency of trial type presentations, (2) correct/incorrect responses, and (3) the number of trials for each subject (see Appendix 7).
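The epoching and artifact-rejection steps just described can be illustrated with the following schematic MATLAB sketch. The variable names (continuousData, eventSamples) are hypothetical and the SVD-based component subtraction is a simplified stand-in for the actual pipeline (CTF tools and the in-house scripts listed in Appendix 8); only the epoch window, DC-offset removal, and 2.0 pT component threshold follow the text above.

```matlab
% Schematic sketch of the epoching, DC-offset removal, and PCA-based artifact
% rejection described above. Assumed inputs (hypothetical names):
%   continuousData : [nChannels x nSamples] MEG recording, in Tesla
%   eventSamples   : vector of stimulus-onset sample indices
fs    = 1250;                 % sampling rate (Hz)
nPre  = round(0.200 * fs);    % 200 ms pre-stimulus
nPost = round(1.000 * fs);    % 1000 ms post-stimulus
nChan   = size(continuousData, 1);
nTrials = numel(eventSamples);
epochs  = zeros(nTrials, nChan, nPre + nPost + 1);

for k = 1:nTrials
    idx = (eventSamples(k) - nPre):(eventSamples(k) + nPost);
    ep  = continuousData(:, idx);
    ep  = ep - repmat(mean(ep, 2), 1, size(ep, 2));   % remove DC offset over the epoch
    epochs(k, :, :) = ep;
end

% PCA on each epoch: subtract any component exceeding 2.0 pT at any time
% point (this removes large eye-blink artifacts).
for k = 1:nTrials
    D = squeeze(epochs(k, :, :));        % channels x time
    [U, S, V] = svd(D, 'econ');
    comps = S * V';                      % component time courses
    big   = any(abs(comps) > 2e-12, 2);  % 2 pT threshold, expressed in Tesla
    epochs(k, :, :) = D - U(:, big) * comps(big, :);
end
```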

The major challenge of MEG analysis methodologies is spatially mapping the neuronal current distribution that underlies the measured magnetic fields. The MEG inverse problem states the following: given a known magnetic field distribution measured from the surface of the scalp, source modeling algorithms must reconstruct an accurate representation of the original neuronal current distribution in the brain (Baillet et al., 2001). This problem is ill-posed and gives rise to multiple solutions, because any single magnetic field distribution could be caused by an infinite number of current distributions. For these reasons, while the temporal resolution of MEG is unparalleled, its spatial resolution depends on the approach used to solve the inverse problem. In recent years, advanced data analysis methods have significantly improved the spatial accuracy of MEG source estimates. These include minimum norm (Hauk, 2004), weighted minimum norm solutions (Liang & Wang, 2009), and standardized low resolution brain electromagnetic tomography (sLORETA; Pascual-Marqui, 2002). Each of these source analysis techniques relies on specific assumptions set in order to solve the inverse problem. One class of source analysis techniques, which overcomes the non-uniqueness of generalized inverse solutions, is collectively known as the MEG beamforming approach (Robinson & Vrba, 1998; Sekihara, Nagarajan, Poeppel, Marantz, & Miyashita, 2001).

2.2 Beamforming approaches to source analysis
A beamformer performs spatial filtering on data from a sensor array to discriminate between signals arriving from different spatial locations. Beamforming was originally utilized in radar and sonar signal processing but has since found applications in diverse fields ranging from astronomy to biomedical signal processing (cf., Brookes et al., 2007; Robinson & Vrba, 1999; Sekihara et al., 2001). The beamformer methodology applied to source localization of MEG data uses a spatial filtering approach as well, and can be thought of as analogous to the application of a traditional frequency filter. Similarly to a frequency filter, the beamformer can be used to extract the signal within a certain frequency band, contrasted between specified "control" and "active" time windows. Unlike modeling the MEG data with a small number of current dipoles, the beamformer analysis does not require a priori assumptions about the number of active sources. Through the sequential application of the beamformer to a number of locations placed on a regular grid spanning the whole brain, a volumetric image can be extracted (Robinson & Vrba, 1998; Van Veen, Van Drongelen, Yuchtman, & Suzuki, 1997). The implementation of the beamforming technique to MEG data assumes that the time courses of two or more spatially distinct sources are uncorrelated (orthogonal). This gives rise to certain limitations of the technique, namely that if two spatially distinct sources become correlated over time, the source power estimate will be attenuated. Because it is unlikely that a high degree of source correlation would persist for longer than a few seconds (Hillebrand, Singh, Holliday, Furlong, & Barnes, 2005), the effect of correlated sources can be minimized by ensuring that analyses are performed over large temporal windows (e.g., an average of 10 seconds). To address the limitations of previous source modeling techniques, the "nonlinear beamformer" or synthetic aperture magnetometry (SAM) technique was proposed (Robinson & Vrba, 1998; Sekihara et al., 2001). This approach to MEG data analysis is a two-step procedure: the first step uses the beamformer as a spatial filter for reconstructing source activity, and the second step involves the computation of a signal statistic that is derived from source activity and mapped volumetrically (Brookes et al., 2007).

The SAM spatial filtering technique is based on a nonlinear constrained minimum variance beamformer (Robinson & Vrba, 1998), which overcomes the non-uniqueness of generalized inverse solutions such as minimum norm, because it does not attempt to solve a parametric source model to explain the data. As mentioned above, inverse solutions are not unique for three-dimensional source current distributions because there are an infinite number of source models that could completely explain the measurements. SAM examines the converse, namely whether there is a unique forward solution for any current dipole in a given brain region of interest. As such, SAM minimizes power, or the variance of the measured MEG signals, such that signals emitted from sources outside each specified voxel are suppressed (Brookes et al., 2007; Cheyne, Bostan, Gaetz, & Pang, 2007). This enables one to display simultaneously active sources at multiple sites, provided that they are not perfectly synchronized. With SAM, each brain region is defined by a three-dimensional position vector and consists of a unique set of sensor coefficients that constitute a weighting matrix. The MEG data are then projected through this spatial filter to give a measure of current density, as a function of time, in the target brain region. Because this source time series is calculated using a weighted sum of the MEG sensors, it has the same millisecond time resolution as the original MEG sensor time series (Brookes et al., 2007). The weighting parameters are derived based on power minimization. Sources that are primarily due to current flow tangential to the cortical surface can be calculated at a number of locations to produce a whole-head volumetric image of source activity. However, signal-to-noise ratio estimates decrease for sub-cortical sources that are close to the centre of the brain, and noise-related power is likely to dominate over any genuine physiological sources of power (Papadelis, Poghosyan, Fenwick, & Ioannides, 2009). For this reason, source power estimates are normalized by an estimate of the noise power, yielding the pseudo-Z-statistic or neural activity index, which is obtained by dividing source power by noise power (Robinson & Vrba, 1998). This step is necessary for removing any spatial distortion due to uncorrelated noise. When measuring the change in cortical power related to performance or to a specific task, a pseudo-T-statistic can be calculated.
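The weight computation and pseudo-Z normalization described above can be sketched for a single voxel as follows; the variable names (C, N, L, sensorData) and the use of a pseudo-inverse are illustrative assumptions, not the CTF SAM implementation itself.

```matlab
% Single-voxel sketch of a minimum-variance (SAM-style) spatial filter and the
% pseudo-Z normalization. Assumed inputs (hypothetical names):
%   C : [nSensors x nSensors] data covariance over the analysis window
%   N : [nSensors x nSensors] noise covariance estimate
%   L : [nSensors x 1] forward solution (lead field) for the voxel's optimal
%       current orientation
%   sensorData : [nSensors x nSamples] epoch of MEG sensor data
Cinv = pinv(C);
w    = (Cinv * L) / (L' * Cinv * L);   % beamformer weights for this voxel

srcPower   = w' * C * w;               % projected source power
noisePower = w' * N * w;               % projected noise power
pseudoZ    = srcPower / noisePower;    % neural activity index

% The source waveform keeps the millisecond resolution of the sensor data.
srcTimeSeries = w' * sensorData;
```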

The SAM spatial filtering technique has been used to localize brain activity in motor tasks (Herdman et al., 2004), tasks utilizing auditory (Herdman et al., 2003), visual (Fawcett, Barnes, Hillebrand, & Singh, 2004), somatosensory (Gaetz & Cheyne, 2003; Schulz et al., 2004), and audiovisual stimuli (Herdman et al., 2004), and even semantic and phonological task-set priming tasks (McNab, Rippon, Hillebrand, Singh, & Swithenby, 2007). However, these studies sacrificed temporal resolution by integrating power over relatively long temporal windows (e.g., an average of 10 seconds). Recently, Cheyne and colleagues (2006) proposed a new event-related SAM beamformer technique to identify instantaneous (event-related or evoked) brain responses from un-averaged single-trial data (Cheyne, Bakhtazad, & Gaetz, 2006). Similar to previous beamforming approaches, the event-related SAM analysis uses the individual trials of each condition and the forward solution for optimal current direction to calculate a spatial filter for each voxel using the minimum-variance beamforming algorithm. Event-related beamforming algorithms have the additional advantage of being able to image evoked brain activity even in the presence of large amounts of both environmental noise and other unwanted, subject-specific artifacts like eye movements (cf., Cheyne et al., 2007). To enhance the spatial precision of this technique, the participants' structural MRIs are used to constrain the event-related SAM images to each participant's individual anatomy and to allow for spatial normalization and group averaging in stereotaxic space. Event-related beamforming approaches have been successfully employed to image voluntary movements (Cheyne et al., 2006), and to measure higher order cognitive processes associated with face perception (Itier, Herdman, George, Cheyne, & Taylor, 2006), verb generation (Herdman & Ryan, 2007), relational learning (Moses et al., 2009), or recognition memory (Riggs et al., 2009). A common finding in these studies, besides the successful detection of time-locked, task-relevant brain activity, was an overall suppression of eye-movement artifacts and low-level environmental noise. This is an advantage of the technique, because, unlike traditional beamformer approaches, which integrate power over long temporal windows, event-related SAM estimates power based on stimulus or event onsets. Because eye-movement artifacts and other environmental sources of noise are not stimulus-locked, they do not contribute to the power estimations. In the present study, we employed the event-related SAM analysis technique because we wanted to measure event-related source activity while preserving a high degree of temporal resolution. Because we were not required to integrate power over large temporal windows, we were able to use a larger number of trials (i.e., an average of 40 trials per stimulus type) in each scanning block.

Furthermore, by employing the event-related beamforming technique, we were also able to design the experiment in an event-related fashion, without the use of specific "active" and "control" time windows. This allowed us to directly apply the original event-related paradigm used in the behavioural study (Chapter 3) to MEG recordings in young (Chapter 4) and older adults (Chapter 5).

3.1 Overview of multivariate statistical methods for neuroimaging studies
Contemporary theories of human brain function agree that behavioural and cognitive operations emerge from dynamic interactions across distributed brain areas (McIntosh, 1999). In the last decade, there has been a major drive to develop multivariate statistical models that investigate whole brain activity, measuring activity in more than one brain region at a time. Distributed network analysis approaches, such as interregional correlation (Horwitz, Duara, & Rapoport, 1984), structural equation modeling (SEM; McIntosh & Gonzalez-Lima, 1991), probabilistic independent components analysis (Beckmann & Smith, 2004), and dynamic causal modeling (DCM; Stephan et al., 2007), have been applied to neuroimaging data. Greater emphasis has been placed on statistical tools that examine the spatial and temporal patterns of brain function simultaneously. Partial least squares (PLS), which was introduced to the neuroimaging community in 1996 (McIntosh, Bookstein, Haxby, & Grady, 1996), is a multivariate statistical technique that, in contrast to previous multivariate approaches, examines whole brain activity as a function of task demands or as related to an outcome measure. It does so without placing constraints on the data, and it utilizes a singular value decomposition (SVD) approach to determine brain-task and brain-behaviour relationships. Additional advantages of PLS over other multivariate approaches include (1) the non-parametric assessment of the statistical significance of task effects, (2) robust statistical protection against outlier influences at the sensor or source level through bootstrap resampling, along with (3) flexible configurations that allow in-depth examination of activation differences and brain-behaviour correlations (McIntosh & Lobaugh, 2004).

3.2 Partial least squares
The term "partial least squares" refers to the computation of the optimal least squares fit to part of a covariance or correlation matrix (Wold, 1982; Wold et al., 1987).

The "part" is the "cross-block" correlation between some set of exogenous measures (e.g., the experimental design or behavioural measures) and the dependent measures (e.g., brain activity). PLS is similar to PCA or to canonical correlation, with the exception of one important feature: PLS solutions are constrained to the part of the covariance structure that is attributable to the experimental manipulations or that relates to a given dependent measure. Moreover, PLS is ideal for data sets in which the dependent measures within a block are highly inter-correlated or not of full rank, as is the case for neuroimaging data. Thus, items within a block are not adjusted for these correlations as they are in the canonical correlation approach. PLS has been applied to many neuroimaging techniques including positron emission tomography (PET), fMRI, structural MRI, EEG, and recently to MEG data as well (cf., Boonstra, Daffertshofer, Breakspear, & Beek, 2007; Duzel et al., 2003; Moses et al., 2009). PLS applied to MEG data is conceptually analogous to the analysis of MEG source difference waveforms in that it identifies task-related differences in amplitude across all MEG sources by deriving the optimal least squares contrasts that code for trial type differences. Because PLS performs this derivation across the entire dataset in time and space simultaneously, there is no need to specify a priori MEG sources or time intervals.

3.2.1

Data organization

Similar to other multivariate techniques, PLS operates on the entire data structure at once, and thus, requires the data to be in matrix format. Every row of the matrix contains data for one subject in one condition. The rows are arranged such that subjects are nested within condition blocks. The columns of each data matrix contain the MEG signal measured for each source at each time point. The first column has intensity for the first source at the first time point, and the second column has intensity for the first source at the second time point. The present application of PLS primarily employed the within-task mean-centering approach. Here, trials within each experimental condition are averaged and then expressed as a source-by-source deviation from the grand mean across the entire experiment. This mean-centered matrix is simply an orthogonal rotation of the cross-block covariance matrix using orthonormal contrasts (i.e., they are orthogonal and of unit length).
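As an illustration of this organization, the following sketch builds the subjects-within-conditions data matrix and applies the within-task mean-centering described above; the input array trialAvg and its dimensions are hypothetical.

```matlab
% Sketch of the PLS data organization and within-task mean-centering.
% Assumed input (hypothetical name):
%   trialAvg : [nSubjects x nConditions x nSources x nTimepoints]
%              condition-averaged source waveforms for each subject
[nSubj, nCond, nSrc, nTime] = size(trialAvg);

% Rows: subjects nested within condition blocks.
% Columns: time points nested within sources (source 1 time 1, source 1 time 2, ...).
X = zeros(nSubj * nCond, nSrc * nTime);
row = 0;
for c = 1:nCond
    for s = 1:nSubj
        row = row + 1;
        X(row, :) = reshape(squeeze(trialAvg(s, c, :, :))', 1, []);
    end
end

% Within-task mean-centering: average rows within each condition and express
% each condition mean as a deviation from the grand mean of the experiment.
condMeans = zeros(nCond, nSrc * nTime);
for c = 1:nCond
    rows = (c - 1) * nSubj + (1:nSubj);
    condMeans(c, :) = mean(X(rows, :), 1);
end
grandMean    = mean(condMeans, 1);
meanCentered = condMeans - repmat(grandMean, nCond, 1);
```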


3.2.2

Singular value decomposition (SVD)

SVD is applied to the mean-centered deviation matrix. Mathematically, SVD simply re-expresses this matrix as a set of orthogonal singular vectors or latent variables (LVs), the number of which is equivalent to the total number of tasks. Each LV contains a pair of vectors relating brain activity to the experimental design. The LVs are analogous to eigenvectors in PCA and account for the covariance of the original mean-centered matrix in decreasing order of magnitude. For each LV, the two vectors are linked by a singular value (equivalent to the square root of the corresponding eigenvalue). The singular value indicates the proportion of cross-block covariance (i.e., covariance between the two blocks of data: brain activity and experimental design) that is accounted for by each LV. The two vectors mentioned above reflect a symmetrical relationship between the components of the experimental design most related to the differing signals in the MEG sources on one hand, and the optimal, in the least-squares sense, spatiotemporal pattern of MEG sources related to the identified experimental design components on the other. The numerical weights at each MEG source and time point are called source saliences. Those saliences identify the collection of MEG sources that, as a group, are most related to the condition differences expressed in the given LV. The task saliences, on the other hand, indicate the degree to which each condition is related to the identified pattern of source waveform differences expressed in the given LV.
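Continuing from the sketch above, a minimal illustration of the decomposition and its outputs is:

```matlab
% SVD of the mean-centered matrix from the sketch in section 3.2.1.
% U : task saliences (one column per LV), S : singular values,
% V : source saliences across all sources and time points.
[U, S, V] = svd(meanCentered, 'econ');

singVals      = diag(S);
pctCrossBlock = 100 * singVals.^2 ./ sum(singVals.^2);   % covariance accounted for per LV

taskSaliences   = U;    % [nConditions x nLVs]
sourceSaliences = V;    % [(nSources*nTimepoints) x nLVs]

% Brain scores: projection of each subject/condition row of the original data
% matrix X onto the source saliences of each LV.
brainScores = X * V;
```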

3.2.3

Behaviour PLS analysis approach

PLS can also be used to examine the relationship between outcome measures (i.e., accuracy, response times) and whole brain neuromagnetic activity. The configuration of the data matrix is the same as for the mean-centered, task PLS approach. However, instead of mean-centering the matrix, the correlation between the behaviour measures and the MEG signal is computed across subjects within each condition. The resulting matrix represents a within-task brain-behaviour correlation matrix. SVD applied to this brain-behaviour correlation matrix likewise produces three new matrices. Similar to the mean-centered approach, the output matrices include the (1) source saliences, (2) singular values, and (3) task saliences.

The variations across task saliences, however, indicate in this case whether a given LV represents a similarity or difference in brain-behaviour correlations across conditions. The source saliences reflect the corresponding brain-behaviour correlation pattern across space (expressed across a collection of MEG sources) and time (expressed across all time points included in the analysis).
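A minimal sketch of the behaviour PLS correlation matrix, under the same hypothetical data organization as above (the behaviour vector is assumed to hold one outcome measure per subject and condition):

```matlab
% Behaviour PLS: within-condition correlations between an outcome measure and
% the MEG signal at every source and time point, followed by SVD.
% Assumed inputs (hypothetical names):
%   X         : [nSubjects*nConditions x nSources*nTimepoints] data matrix
%   behaviour : [nSubjects*nConditions x 1] outcome measure (e.g., mean RT)
corrMat = zeros(nCond, size(X, 2));
for c = 1:nCond
    rows = (c - 1) * nSubj + (1:nSubj);
    for j = 1:size(X, 2)
        r = corrcoef(behaviour(rows), X(rows, j));
        corrMat(c, j) = r(1, 2);      % brain-behaviour correlation for this condition
    end
end
[U, S, V] = svd(corrMat, 'econ');     % task saliences, singular values, source saliences
```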

3.3 Statistical assessment
Arbitrary decisions regarding the optimal number of LVs to retain, and which of the task or source weights to consider relevant, are minimized by using two complementary re-sampling techniques that provide statistical assessment of the LVs. First, permutation tests assess whether the task weights represented by a given LV are significantly different from random noise. This is accomplished using sampling without replacement, reassigning the order of the conditions for each subject (Nichols & Holmes, 2002). Thus, task condition labels are shuffled across all participants, resulting in the random assignment of conditions and group membership (in the case of a group PLS analysis). PLS is recalculated for each new re-ordering, and the number of times the permuted singular values exceed the original values is calculated and assigned a probability. This results in a distribution of singular values from the shuffled datasets, from which the cumulative 95th percentile is taken as the significance threshold. With permutation tests, the statistical significance of the identified task weights is determined without relying on distributional assumptions common to conventional parametric statistical methods (McIntosh et al., 1996). Second, the reliability of each source's contribution to the LV is assessed by a bootstrap estimation of standard errors for the MEG source saliences, re-sampling participants with replacement and preserving the total amount of data in each bootstrap set. The primary purpose of the bootstrap estimation is to determine the time points of the source waveforms that show reliable experimental effects across participants. Bootstrap estimation of standard errors involves (i) randomly sampling subjects with replacement while keeping the assignment of experimental conditions fixed for all observations, (ii) performing singular value decomposition on the re-sampled matrix, and (iii) computing the standard error of the sources contributing to the task effects expressed at each time point (Efron & Tibshirani, 1985; Efron & Tibshirani, 1993).

Importantly, by using bootstrap estimation of standard errors, no correction for multiple comparisons is necessary because the source saliences are calculated in a single mathematical step, on the whole brain at once. The results of the bootstrap estimation can also be used to derive standard error bars around brain-behaviour correlation scores. This facilitates the identification of the sources and time points that reflect a stable relationship between behaviour and brain activity. The time points where the salience is greater than 3 times the standard error (i.e., a bootstrap ratio ≥ 3) are indicated above or below the plots of the grand-averaged source waveforms appearing in the figures of this manuscript. The bootstrap estimation is also used to derive confidence intervals for the task weights and the latent variable correlations in the behaviour PLS analysis. Thus, the upper and lower percentiles of the bootstrap distribution are used to determine the confidence interval limits; those are represented as error bars over the brain-behaviour correlation patterns included in the figures pertaining to the behaviour PLS results. To summarize, permutation tests are used to determine the significance of the task effects represented by each LV, and the bootstrap estimates of standard error evaluate the reliability of the contribution of each source salience. These two re-sampling techniques provide complementary information about the statistical strength of the task effects and its reliability across participants. Statistical evaluation of task effects was performed using 500 permutations (cf., Nichols & Holmes, 2002) and 300 bootstrap iterations (cf., Efron & Tibshirani, 1986; McIntosh et al., 1996). Appendix 8 includes the steps in the analysis and a listing of the MATLAB scripts (version 7.6; MathWorks, Inc.; http://www.mathworks.com/) used to compute each step in the MEG data analysis.
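The two re-sampling procedures can be sketched as follows; runPLS is a hypothetical helper that re-computes the mean-centered PLS solution (singular values and source saliences) from a re-ordered or re-sampled data matrix, and the 500/300 iteration counts follow the text above.

```matlab
% Permutation and bootstrap assessment of the PLS solution (schematic sketch).
nPerm = 500;  nBoot = 300;
[obsSingVals, obsSaliences] = runPLS(X, nSubj, nCond);   % observed solution

% Permutation test: shuffle rows so that condition labels are randomly
% reassigned, then recompute the singular values.
permSingVals = zeros(nPerm, numel(obsSingVals));
for p = 1:nPerm
    permX = X(randperm(size(X, 1)), :);
    permSingVals(p, :) = runPLS(permX, nSubj, nCond);
end
pValues = mean(permSingVals >= repmat(obsSingVals(:)', nPerm, 1), 1);

% Bootstrap: resample subjects with replacement, keeping condition assignment
% fixed; here the saliences of the first LV are tracked for illustration.
bootSal = zeros(nBoot, size(obsSaliences, 1));
for b = 1:nBoot
    subjSample = randi(nSubj, 1, nSubj);          % subjects drawn with replacement
    rows = zeros(1, nSubj * nCond);
    for c = 1:nCond
        rows((c - 1) * nSubj + (1:nSubj)) = (c - 1) * nSubj + subjSample;
    end
    [bootVals, bootSaliences] = runPLS(X(rows, :), nSubj, nCond);
    bootSal(b, :) = bootSaliences(:, 1)';
end
bootstrapRatio = obsSaliences(:, 1)' ./ std(bootSal, 0, 1);   % salience / bootstrap SE
```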

Chapter 3 Aging Effects of Multisensory Facilitation and Competition

1

Introduction

Integration of object features across auditory and visual sensory channels contributes to the richness and vividness of each sensory experience. While it seems natural to the observer that information from different sensory modalities is unified into a coherent whole, the behavioural consequences of this integration have been examined experimentally. Several factors, such as temporal correspondence (Diederich & Colonius, 2004; Frassinetti, Bolognini, & Ladavas, 2002) and spatial congruence (McDonald et al., 2000; Thorne & Debener, 2008), seem to mediate multisensory integration of auditory and visual information. Other studies of AV integration, rather than relying on low-level properties of AV stimuli such as temporal or spatial correspondence, have assessed the impact of semantic congruence across auditory and visual channels on performance. The binding of familiar, complex sounds with semantically-related visual stimuli can be referred to as semantic AV integration because it simulates naturalistic multisensory contexts. Several behavioural studies showed that semantic AV integration facilitates both object detection and recognition (Chen & Spence, 2010; Laurienti et al., 2004; Yuval-Greenberg & Deouell, 2009). Laurienti and colleagues (2004), for example, examined the effects of both crossmodal pairings and intramodal visual pairings on stimulus detection speed. In crossmodal AV pairings, blue coloured circles were presented along with the verbalization of the word "blue". In intramodal (visual-visual) trials, the written word "blue" was presented on top of the blue circle. Significantly faster RTs were detected specifically for semantically congruent AV pairs, but not for congruent, visual intramodal pairs. In contrast to the multisensory facilitation effects of congruent AV pairs, significantly longer RTs were observed in response to incongruent AV combinations. This study, therefore, highlighted the impact of crossmodal, and not intramodal, semantic congruency on response latencies. Recent behavioural studies suggested the presence of sensory-specific effects on multisensory integration. Chen and Spence (2010) assessed the effects of AV semantic congruency on participants' ability to identify briefly presented, and then rapidly masked, pictures (Chen & Spence, 2010).

A semantically congruent sound was presented concurrently with visual targets on a percentage of the trials. When the pictures and the complex sounds were presented simultaneously, a semantically congruent sound improved, while a semantically incongruent sound impaired, participants' ability to identify the pictures. Visual stimulus identification was unaffected in the white-noise control condition. These results suggest that the auditory system functions in the service of the visual system during stimulus identification tasks, particularly when complex sounds and pictures share semantic features. In a series of psychophysical tests, Kubovy and van Valkenburg (2001) also showed that simple tones can facilitate visual spatial attention, but not vice versa. Thus, the extent of AV integration might be affected by the degree of attentional capture by one of the two stimuli within the bimodal presentation. While AV congruence facilitates performance, situations of crossmodal conflict achieve the converse, predicting slower RTs and lower accuracy values. Behavioural effects of visual sensory dominance over audition have also been reported previously. Sensory-specific influences on AV integration were demonstrated in a name verification task on complex, naturalistic objects (Yuval-Greenberg & Deouell, 2009). Stronger effects of AV congruency on performance were observed when participants were required to attend to the auditory sensory modality. Irrelevant visual information affected auditory recognition more than irrelevant auditory information affected visual recognition. These results suggest a more pronounced influence of concurrent visual stimulation on auditory perception than vice versa. This effect might occur because the visual modality provides more salient and unambiguous information for object recognition. This interpretation is consistent with documented situations of visual dominance over auditory processing, as in the Colavita effect (Colavita, 1974). The Colavita effect refers to the increased incidence of misses in response to AV presentations due to visual modality dominance over auditory and tactile targets when participants are required to detect an auditory or tactile target embedded within a bimodal event. Although significant advances have been made in establishing the behavioural consequences of both multisensory facilitation and multisensory competition, very few studies have examined how multisensory integration changes with age. As auditory and visual acuity decreases with age (cf., Alain et al., 2006; Spear, 1993), task-irrelevant stimuli may not be sufficiently dampened, thereby receiving a richer representation (Campbell et al., 2010b). As such, older adults may be

more efficient at binding semantically related auditory and visual information when exemplars from the two sensory modalities are presented simultaneously. Laurienti and colleagues used simple reaction time tasks in which young and elderly participants responded to semantic AV stimuli, such as coloured discs paired with the verbalization of the given colour (Laurienti et al., 2006), and abstract AV stimuli including green light-emitting diodes paired with white noise (Peiffer et al., 2007). Both studies demonstrated larger multisensory facilitation effects in the elderly group, compared to the younger group, after controlling for general sensorimotor slowing with age. Similarly, Diederich et al. (2008) measured saccadic RTs to bimodal stimuli and found that elderly participants exhibited a significantly larger multisensory RT gain in response to bimodal presentations compared to younger participants. Bimodal presentations also reduced task-irrelevant distraction in older adults in focused visual attention tasks. In contrast to unimodal targets, bimodal targets were associated with greater saccadic trajectory deviations away from visual distractors in older compared to young adults (Campbell et al., 2010a). Although previous studies demonstrated that older adults' performance was enhanced by bimodal presentations, no studies so far have examined changes in both multisensory facilitation and competition with age. Multisensory facilitation refers to the performance gain (i.e., faster response latencies and improved accuracy) in response to bimodal, semantically congruent stimuli compared to unimodal ones. Given the behavioural advantages of multisensory congruence, we also investigated the converse perceptual phenomenon, namely multisensory competition, by measuring the performance consequences of crossmodal conflicts. Multisensory competition refers to situations of sensory dominance of one modality over another when observers are required to process individual target stimuli embedded within a bimodal event. For example, as shown previously, concurrent visual stimulation has a more pronounced effect on auditory perception than vice versa (cf., Yuval-Greenberg & Deouell, 2009). The aim of the behavioural study was to develop a paradigm that could test both multisensory facilitation and competition effects in younger, middle-aged, and older participants. Bimodal congruent and incongruent presentations were compared to their unimodal auditory and visual counterparts as task instructions varied to induce, in some cases, multisensory facilitation (i.e., faster response times, more accurate detection of bimodal presentations), and in others, multisensory competition (i.e., slower response times and lowered accuracy for bimodal presentations).

As this was a within-subject design, questionnaires that measured concentration, working memory and vocabulary were administered in between each condition to minimize task interference.

2

Materials and Methods

The critical materials used in the study along with the experimental design were described in great detail in Chapter 2. This section will contain methodological details pertaining only to the behavioural study.

2.1 Participants
Nine healthy young adults (19-25 years, mean age ± s.d., 21 ± 2.3), ten middle-aged adults (48-57 years, mean age ± s.d., 51 ± 3.8), and nine older adults (60-80 years, mean age ± s.d., 68 ± 8) participated in this study. All volunteers were right-handed and had normal or corrected-to-normal vision. All volunteers were audiometrically screened to determine hearing thresholds for each ear separately; volunteers whose hearing thresholds exceeded 15 dB hearing level (HL) were excluded from participation, as that was considered below normal levels. All participants gave formal informed consent in accordance with the joint Baycrest Centre-University of Toronto Research Ethics Committee. The study description and the informed consent form that was administered to participants prior to the start of the experiment are included in Appendix 9. The subject selection criteria are outlined in Table 2. It is important to note that the subject sample size was relatively small, as this behavioural study in young, middle-aged, and older adults served as a precursor for the MEG study in young and older adults (Chapters 4 and 5). The behavioural study allowed us to develop an event-related paradigm that showed robust differences across age groups, which could then be applied to MEG recordings in young (Chapter 4) and older adults (Chapter 5).

2.2 Stimuli
Complex sounds and black-and-white line drawings (Snodgrass & Vanderwart, 1980) were selected from 4 distinct categories: (1) animals, (2) musical instruments, (3) automobiles, and (4) household objects. The first category of stimuli was labeled as "animate", while the remaining 3 categories were considered "inanimate" objects.

For each animacy type, 30 different exemplars from each sensory modality (auditory and visual) were chosen from a larger database based on pilot testing that confirmed that the exemplars selected could be unambiguously categorized. All visual stimuli were matched according to size (in pixels), brightness, and contrast, and all complex sounds were matched according to amplitude.

2.3 Procedure
Presentation software (version 10.3, Neurobehavioural Systems, Inc.; http://www.neurobs.com/) was used to control visual and auditory stimulus delivery and to record participants' response latency and accuracy. All experimental conditions were administered in a sound-attenuated testing room. Participants were seated comfortably with their chin fixed 60 cm from the computer monitor that was used to display the visual stimuli. Auditory stimuli were delivered through speakers situated on the left and right sides of the computer monitor. Four possible stimulus combinations were used in this study: (1) unimodal auditory (A), (2) unimodal visual (V), (3) bimodal congruent, or simultaneous auditory and visual stimuli that matched semantically (AV+), and (4) bimodal incongruent, or simultaneous auditory and visual stimuli that did not match semantically (AV-). Each stimulus or stimulus pair was presented for 400 ms; for the auditory stimulus, the 400 ms interval also included a 5 ms rise and fall time. The time interval between the end of the stimulus presentation and the beginning of the next trial was either 2, 3, or 4 seconds (equiprobable). See Figure 1 for an illustration of the paradigm. All four trial types were randomly presented in each condition. A total of 120 trials were used in each condition (30 presentations of each trial type). The set of stimuli listed in Table 1 was used in all 5 conditions, and was presented in random order in each condition. Response instructions varied in 5 separate conditions. Multisensory facilitation (i.e., faster response latencies and improved accuracy) was examined in conditions 1-2, and multisensory competition effects (i.e., slower response latencies and reduced accuracy) were assessed in conditions 3-5.
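For illustration, the sketch below generates one condition's randomized trial list with the stimulus duration and equiprobable inter-trial intervals described above; the actual experiment was run in Presentation, so the variable names here are hypothetical.

```matlab
% One condition's randomized trial sequence (hypothetical sketch).
trialTypes = {'A', 'V', 'AV+', 'AV-'};
nPerType   = 30;                         % 30 presentations per trial type
isiChoices = [2 3 4];                    % inter-trial intervals in seconds, equiprobable
stimDurMs  = 400;                        % each stimulus or stimulus pair lasts 400 ms

labels = repmat(1:numel(trialTypes), 1, nPerType);   % 120 trials in total
order  = labels(randperm(numel(labels)));            % random presentation order

for t = 1:numel(order)
    isi = isiChoices(randi(numel(isiChoices)));      % 2, 3, or 4 s with equal probability
    fprintf('Trial %3d: %-4s stimulus %d ms, ISI %d s\n', ...
            t, trialTypes{order(t)}, stimDurMs, isi);
end
```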

2.3.1

Experimental conditions

The first two conditions included simple stimulus detection tasks, in which participants were instructed to detect an upcoming "event" (i.e., A, V, AV+ or AV-). In the first condition,

participants responded to any of the 4 stimulus events as quickly as possible, irrespective of the particular trial type (keyboard press A). Previous studies of multisensory integration demonstrated enhanced RT facilitation to bimodal stimuli using a similar experimental set-up (cf., Laurienti et al., 2007). In the second condition, however, participants were instructed to differentiate between unimodal and bimodal stimuli, responding with one of three key presses (keyboard presses A, S, and D) to indicate auditory, visual, or both (bimodal) stimuli - A, V, or AV (which included both AV+ and AV- trials) - as rapidly as possible. The third and the fourth conditions were meant to investigate the effects of visual dominance over auditory processing, analogous to the Colavita effect (Colavita, 1974). The Colavita effect refers to the increased incidence of misses in response to AV presentations due to visual modality dominance over auditory and tactile targets in situations in which participants are required to detect an auditory or tactile target embedded within a bimodal event. In the third condition, "Detect the auditory target: Two-button press", participants were required to respond only to auditory targets embedded in A, V, AV+ and AV- presentations. As such, participants needed to press the response key "A" during auditory-only, AV+ or AV- presentations and "S" during visual-only presentations. In the fourth condition, entitled "Detect the visual target: Two-button press", participants detected the visual target embedded in various types of stimulus presentations. Thus, participants needed to press the response key "A" during visual-only, AV+ or AV- presentations and "S" during auditory-only presentations. Finally, in the fifth condition, in order to investigate the effects of crossmodal conflicts on performance in both young and older adults, we instructed participants to identify the degree of congruence or mismatch between complex sounds and line drawings in response to simultaneous multisensory presentations. Three key press responses (keyboard responses A, S and D) corresponded to (i) unimodal presentations (A or V), (ii) bimodal matched (i.e., AV+, for example, the picture of a bird paired with a bird 'chirp'), and (iii) bimodal unmatched (i.e., AV-, for example, the picture of a violin paired with the sound of a bird 'chirp'). In conditions 2 through 4, the correspondence between response keys and the type of stimulus presentation was counterbalanced between participants to control for any handedness effects on response latencies.


2.3.2

Neuropsychological screening

As this was a within-subject design, in order to minimize task interference, three questionnaires were administered in between each run (see Appendices 2-4). Concentration and working memory were assessed using the Short Blessed Test (Katzman et al., 1983), vocabulary using the Shipley Institute for Living Scale Test (Shipley, 1991), and memory and motor function with Folstein’s Mini-Mental Status examination (Folstein et al., 1975).

3

Results

Statistical Package for the Social Sciences (SPSS) software (version 16) was used to analyze the behavioural data: questionnaire results, percent accuracy, and response time (RT) measures. The behavioural data were analyzed using a mixed design analysis of variance (ANOVA) in which questionnaire type (i.e., SBT, vocabulary, MMSE) and condition (i.e., Detect event, Detect event (specific), Detect auditory target, Detect visual target, and Detect congruency) were the within-subject factors, and age was the between-subject factor.

3.1 Neuropsychological screening
All participants showed MMSE scores of 28 and above. Middle-aged and older participants scored higher than younger adults on all the questionnaires administered: SBT, vocabulary, and the MMSE tests (see Table 3 for descriptive statistics). An interaction between age group and questionnaire type was observed [F (4, 54) = 6.9, p < 0.0001], with young adults exhibiting poorer performance on the SBT and vocabulary tests compared to the middle-aged and older groups, and with the largest group differences between the young and the older participants.

3.2 Accuracy
Accuracy was high across all three groups, averaging (mean ± s.d.) 0.96 ± 0.008 (Table 4). A main effect of condition type was observed [F (4, 21) = 13.62, p < 0.0001], reflecting that all three groups had the largest proportion of errors in the fifth condition, in which they were required to detect the degree of mismatch between auditory and visual targets within AV pairings. Only correct responses were used in the reaction time (RT) data analysis.

Prior to RT data analysis, participants' response latencies that were 2 or more standard deviations above or below each participant's mean RT for that trial type and condition were removed. Less than 1% of the data was eliminated following this RT trimming procedure. Next, a missing value analysis was computed to estimate the missing RTs for outlier participants who exhibited RTs that were 2 or more standard deviations above the mean. The criterion used in this procedure was to omit any variables that were missing more than 5% of the cases (i.e., omit trials in which more than 5% of RTs were missing); therefore, the 30th trial was discounted for all conditions and trial types. The missing value analysis step was important in this analysis because we wanted to examine RT trends and practice effects across the three groups. We employed multiple linear regression to estimate missing values, because this procedure provides unbiased estimates for missing data (Schafer & Olsen, 1998). The multiple regression analysis used the parameters of the non-missing data (means, standard deviations, and covariances) to predict the values of the missing data. Regression residuals from randomly selected cases of the non-missing values were used to estimate the level of noise in the missing data.
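The trimming rule and the regression-based imputation can be illustrated with the simplified sketch below; the rt matrix is hypothetical, and the single-predictor regression on trial number is a deliberately reduced stand-in for the multivariate missing value analysis performed in SPSS.

```matlab
% RT trimming (2 SD rule) and a simplified regression-based imputation.
% Assumed input (hypothetical name):
%   rt : [nSubjects x nTrials] response latencies for one trial type and condition
trimmed = rt;
for s = 1:size(rt, 1)
    m  = mean(rt(s, :));
    sd = std(rt(s, :));
    out = abs(rt(s, :) - m) >= 2 * sd;     % 2 or more SDs from the subject's mean
    trimmed(s, out) = NaN;                 % removed, to be imputed below
end

% Single-predictor stand-in for the missing value analysis: predict a missing
% trial from the subject's remaining trials (regression on trial number), then
% add a randomly sampled regression residual as noise.
for s = 1:size(trimmed, 1)
    good = find(~isnan(trimmed(s, :)));
    miss = find(isnan(trimmed(s, :)));
    b     = polyfit(good, trimmed(s, good), 1);
    resid = trimmed(s, good) - polyval(b, good);
    for j = miss
        trimmed(s, j) = polyval(b, j) + resid(randi(numel(resid)));
    end
end
```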

3.3 Response latencies
Investigation of RT trends revealed that all participants showed a small learning curve in each condition, with RTs stabilizing after the 7th trial (from a total of 29 trials per trial type - i.e., A, V, AV+ and AV-; Figure 2a). This effect was partly skewed by participants in the older group. In contrast to young adults, who began to show stable RTs as early as the 2nd trial, participants in the middle-aged group showed stable RTs after approximately 10 trials. In contrast to the young and middle-aged groups, older adults exhibited the slowest RTs, which did not reach those of the other two groups and stabilized after the 12th trial (Figure 2b). Following the missing value analysis, RTs were averaged across trials for each subject, each trial type and each condition, and analyzed using a mixed design ANOVA in a 3 x 5 x 4 factorial design with 3 age groups, 5 conditions, and 4 trial types (i.e. A, V, AV+ and AV-). Trial types and conditions were the within-subject factors, and age was the between-subject factor. First, a main effect of age [F (2, 25) = 6.25, p < 0.006] was observed, with older adults showing the overall longest RTs (mean ± s.d., 924 ± 46), followed by middle-aged (765 ± 43), and young adults (700 ± 46). Second, a main effect of trial type [F (3, 75) = 20.30, p < 0.0001] was identified, with RTs to V-only and AV+ trials significantly faster than A-only and AV- ones. Third, a main effect of condition type was also found [F (4, 100) = 203.75, p < 0.0001]. In comparison to the rest, the

first condition, in which participants were required to simply detect any of the stimulus types, exhibited the shortest RTs. The slowest condition was the fifth one, in which participants were required to identify whether exemplar bimodal stimulus presentations matched or did not match semantically (see Table 5). For each condition separately, a mixed design ANOVA was performed in a 3 x 4 factorial design with 3 age groups and 4 trial types (i.e. A, V, AV+ and AV-). In the first simple detection condition, bimodal RTs were significantly faster than unimodal ones [F (2, 50) = 59.86, p < 0.0001; Table 6]. Furthermore, in the second condition, which required participants to identify unimodal auditory, visual, and bimodal (AV+ and AV-) conditions, bimodal RTs were also significantly faster than unimodal ones across all age groups [F (2, 50) = 25.81, p < 0.0001; Table 7]. In the third and fourth conditions, which tested the effects of visual dominance on auditory processing, RTs to bimodal trials were significantly faster than unimodal ones [F (2, 50) = 17.55, p < 0.0001; Table 8]. However, in the fourth condition, where participants detected the visual target embedded within bimodal presentations, RTs in auditory-only trials were significantly slower than those in both bimodal and unimodal visual trials [F (2, 50) = 29.00, p

Processing Parameters set:
- Choice of Noise Reduction = 3rd gradient
- Offset and Trend Removal = remove DC Offset based on whole trial (this does not baseline the data here because the data are continuous; it only removes a DC offset)
- check SAVE PROCESSING PARAMETERS
- optionally set up band-pass filtering (band-pass filter set between 0.1 Hz and 55 Hz); band-pass filtering, especially high-pass, is *very* important - otherwise MSE curves look like noise curves
- check SAVE PROCESSING PARAMETERS and, from the menu, Save Dataset (Edit Info): this saves the preprocessing configuration file s1_abstract_cont.ds/processing.cfg


Appendix 7: Subject-specific MEG event marker summary sheet

Subject ID: ________________ Date: ________________

Block Detect AV (part 1 and 2):
AVmatch: ______________________________
AVunmatch: ______________________________
V: ______________________________
A: ______________________________
TOTAL:
Tr24 (Presentation button code #1): ______________________________
L1 (response): ______________________________
Notes:

Block Detect Specific Event (part 1 and 2):
AVmatch: ______________________________
AVunmatch: ______________________________
V: ______________________________
A: ______________________________
TOTAL:
inanimate (Presentation button code #1): ______________________________
Tr23 (Presentation button code #2): ______________________________
Tr24 (Presentation button code #3): ______________________________
TOTAL:
L1 (response #1): ______________________________
L2 (response #2): ______________________________
L3 (response #3): ______________________________
TOTAL:
Notes:

Block Detect Animate (part 1 and 2):
AVmatch: ______________________________
V: ______________________________
A: ______________________________
TOTAL:
animate: ______________________________
inanimate: ______________________________
TOTAL:
Tr23 (Presentation button code #1): ______________________________
Tr24 (Presentation button code #2): ______________________________
TOTAL:
L1 (response #1): ______________________________
L2 (response #2): ______________________________
TOTAL:
Notes:

Appendix 8: Procedure for MEG data analysis

Required data:
1. raw MEG data as subj.ds files (raw_data directory)
2. shape files subj-o.shape, subj-o.shape_info in shape_files directory
3. anatomical scans subj-o.mri
4. Talairach coordinates as Excel sheet (MRIs/area_coords.xls)
5. cp raw_data/sub03_DetectAV_01.ds/processing.cfg allsubj_processing.cfg

Required Files:
1. ctf32
2. ctf_mri2afni
3. ctf_SAMgrid
4. Talairach2MEG_coordinates_ad.pl (converting from radiological into neurological convention)
5. alltime
6. alltime_animacy
7. whole_head_3cm
8. load_randy_regions.m
9. area_coords.xls (spatial filter: 72 ROI coordinates)

Required code for obtaining behavioural data from the MEG data (batch_get_behaviour.m):
1. find_nearest.m
2. get_markers.m
3. get_corr_RTs.m
4. find_timing.m
5. saveData.m
6. detectevent_behavioural_data.m
7. detectanimate_behavioural_data.m
8. detectspecific_behavioural_data.m
9. batch_get_behaviour.m
10. junk.m (used to check for any hidden bugs)

Required code for plotting behavioural data:
11. batch_rt_detectav.m
12. batch_rt_detecteventspecific.m
13. batch_rt_detectanimate.m
14. barweb.m (put error bars on the plots)

Required code for analyzing MEG data:
1. parse_MEG_data.m
2. batch_make_weights_ad.m
   % IMPORTANT: the Talairach2MEG_coordinates_ad.pl needs to be run from kingpin because it requires certain matlab scripts
3. batch_read_weights_ad.m (event-related SAM analysis for detection condition)
4. batch_read_weights_animacy.m (event-related SAM analysis for animacy condition)
5. batch_read_weights_specificevent.m (event-related SAM analysis for congruency condition)


Appendix 9: Information and Consent Form for Behavioural Study Title: Spatiotemporal modeling of human cognitive function Investigators: Andreea Diaconescu (416) 785-2500 ext. 3139 Anthony R. McIntosh, Ph.D. (416) 785-2500 ext. 3522 Rotman Research Institute Baycrest Centre Purpose of the Research: The goal of this study is to compare response times of simultaneously presented auditory and visual stimuli to unimodal presentations. Description of the Study: You will be presented with meaningful sounds and pictures (i.e. pictures of animals, musical instruments, household products), and your task is to respond to the stimulus event as quickly as possible. In some trials, pictures and sounds are presented simultaneously, and in others, separately. The different runs will have distinct task instructions asking you to differentiate between matching and un-matching audiovisual combinations, or auditory-only and visual-only presentations Potential Benefits: You will not benefit directly from participating in this study. The information gained from this research may be used in the future to help people with diseases or damage to the brain. Confidentiality: No subject names will be recorded on subject databases, data sheets, or computer files. Any data resulting from your participation that will be published in scientific journals, texts, or other media will not reveal your identity. Neither your identity nor any personal information will be available to anyone other than the investigators. Your decision whether or not to participate will not prejudice you or your future interactions with researchers performing the study, nor will it affect care provided to you or your family members at Baycrest. If you decide to participate, you are free to withdraw your consent and to discontinue your participation at any time. A copy of the consent form will be given to you and the other copy will be retained by the principal investigators – Andreea Diaconescu and Randy McIntosh. Reimbursement: You will be reimbursed a total of $20 for the session. NAME OF PROJECT : Spatiotemporal modeling of human cognitive function INVESTIGATORS : Andreea Diaconescu, Anthony R. McIntosh, Ph. D. _______________________________________

I have been provided with a description of the experimental procedures and any possible risks or benefits that might be associated with these procedures.

I have also been given an opportunity to ask questions concerning these procedures and any questions that I have asked have been adequately answered.
I have been told that I can withdraw my consent and stop my participation in this experimental study at any time and for any reason.
I have been informed that confidentiality will be maintained in this study.
I have been told that I can keep a copy of the informed consent form.
I understand the information that I have been provided and I voluntarily consent to participate in this experimental study.

Name_________________________ Signature______________________ Date__________________________


Appendix 10: Information and Consent Form for MEG Study Title: Spatiotemporal modeling of human cognitive function Investigators: Andreea Diaconescu (416) 785-2500 ext. 3139 Anthony R. McIntosh, Ph.D. (416) 785-2500 ext. 3522 Rotman Research Institute Baycrest Centre Description of the Study: You will be presented with meaningful sounds and pictures (i.e. pictures of animals, musical instruments, household objects), and your task is to respond to the stimulus event as quickly as possible. In some trials, pictures and sounds are presented simultaneously, and in others, separately. The different runs will have distinct task instructions asking you to differentiate between matching and un-matching audiovisual combinations, or auditory-only and visual-only presentations Potential Benefits: You will not benefit directly from participating in this study. The information gained from this research may be used in the future to help people with diseases or damage to the brain. Confidentiality: No subject names will be recorded on subject databases, data sheets, or computer files. Any data resulting from your participation that will be published in scientific journals, texts, or other media will not reveal your identity. Neither your identity nor any personal information will be available to anyone other than the investigators. Your decision whether or not to participate will not prejudice you or your future interactions with researchers performing the study, nor will it affect care provided to you or your family members at Baycrest. If you decide to participate, you are free to withdraw your consent and to discontinue your participation at any time. A copy of the consent form will be given to you and the other copy will be retained by the principal investigators – Andreea Diaconescu and Randy McIntosh. Reimbursement: This experiment lasts a total of 2 hours. You will be reimbursed $15.00 per hour, a total of $30 for the session. NAME OF PROJECT : Spatiotemporal modeling of human cognitive function INVESTIGATORS : Andreea Diaconescu, Anthony R. McIntosh, Ph. D. _______________________________________

I have been provided with a description of the experimental procedures and any possible risks or benefits that might be associated with these procedures. I have also been given an opportunity to ask questions concerning these procedures and any questions that I have asked have been adequately answered.


I have been told that I can withdraw my consent and stop my participation in this experimental study at any time and for any reason.
I have been informed that confidentiality will be maintained in this study.
I have been told that I can keep a copy of the informed consent form.
I understand the information that I have been provided and I voluntarily consent to participate in this experimental study.
I do not have a cardiac pacemaker.
I do not have any other metal in my body (implants, body piercing, metal stitches from surgery).
□ I give the investigator permission to access my structural brain image, recorded in the fMRI suite at the Rotman Research Institute, for the purposes of MEG data analysis.

Name_________________________ Signature______________________ Date__________________________


Appendix 11: MEG Study Recruitment Script Telephone Script Hi, my name is Andreea Diaconescu and I am a graduate student at the Rotman Research Institute of the Baycrest Hospital. I received your name and number from our volunteer subject database. We currently have a study you may be interested in. If you have some time, I would like to explain the study to you. Do you have a few moments right now? If no: Okay, is there a day and time when I can call you back? Date:_______ Time:_________ Thank you for your time. If yes: Great! Thank you for your time, this won’t take long. This study is part of our research about multisensory integration. This study involves recording your magnetic brain activity while you respond to common visual objects and sounds. You will be taken to a magnetically shielded and soundproof room and asked to sit comfortably. Your head will be resting inside the helmet shaped MEG scanner. The task is not very difficult and we will explain it in detail. For an actual recording block of 7 min duration you have to perform the task and not move your head at all. We will perform 6 such recording blocks with short breaks for rest and relaxation in between. You have to sit for a total time of about 1 hour in the MEG scanner. There are no known risks involved with this procedure. Prior to the MEG portion, I will be performing several hearing tests which requires about 5 min time. We are seeking volunteers who have no metal in their body, because any, even very tiny metal part will distort the recorded brain signals. Do you have any questions at this moment? We will require your participation in 1 session of approximately 1 hour. You will also be paid $15.00/hour for your time. In addition, we are happy to reimburse your travel expenses, either by paying for parking here at Baycrest, or by reimbursing you $5 for TTC travel. Is this something you may be interested in? Great! There are a few questions that I need to ask to ensure that you are eligible to participate. Do you have metal in your body? (implant, pacemaker, screws, braces, etc?) YES: Thank you for your time. Unfortunately, this means you are not eligible for my study. NO: Continue Are you Right Handed? NO: Thank you for your time. Unfortunately this means that you are not eligible for my study.

YES: Continue

Do you have any questions?

Wonderful! Can we schedule a day and time when you can participate in our study? Date:_______ Time:_________

(Before hanging up, give them instructions for their visit.)

Wear as little metal as possible. If your clothes have too much metal on them, you will be offered a hospital gown to change into. Wearing metal during the experiment will not cause you any danger, but it will cause magnetic interference with the brain signal we are interested in. We will ask you to remove metal jewelry, dental appliances, glasses, and any clothing that has metal on it (for example, underwire bras). Also, if you have them, please avoid wearing coloured contact lenses, as they may also interfere with the recording.

If they are driving: Get their license plate number. Tell them you will meet them downstairs (entrance off Bathurst) when they get here and that you will have a parking pass ready for them so they can park in the back lot.

Do you have any questions at this moment? If you'd like, I can email you a breakdown of what we will be doing the day of the study. Would you be interested in that?

Thank you for your time today as well as your willingness to be a part of our study. Please do not hesitate to contact me if you have any further questions or concerns. Again, my name is Andreea Diaconescu and I can be reached at 416-785-2500, extension 3139. I will follow up with you the day before your appointment to confirm that you are still able to make it. Thank you for your time and I look forward to meeting you.


Appendix 12: MEG Description Script

What is MEG?

When you perform any activity, different areas of your brain communicate with each other through electrical impulses. These impulses can also be thought of as electric currents. Any current will generate electric and magnetic fields. Thus, when your brain is active, it creates tiny magnetic fields inside and around your head. Our scientists are very interested in determining which areas of your brain are working while you perform different tasks. This is where magnetoencephalography (MEG) comes in. As explained, we can think of patterns of brain activity as a number of sources of electromagnetic fields. MEG allows us to measure the magnetic fields outside of your head. MEG is a non-invasive and passive way to measure the magnetic fields that your brain generates naturally. This method is non-invasive because it does not require making measurements inside your body. It is passive because it simply measures a signal that your brain generates, and does not act upon your body in any way. This method is used in hospitals and medical centers, and has no known short- or long-term side effects.

Is MEG the Same as MRI?

In many ways, MEG is the opposite of MRI. Rather than applying a magnetic field to the brain and observing its effects, MEG measures naturally occurring magnetic fields generated by the brain. An MRI image gives us information about the structure of brain matter, and sometimes how the blood-oxygen level of brain tissue changes over time. From these changes in blood-oxygen level, we can determine where blood flow changes are occurring in the brain. MEG, however, allows us to localise the electrical impulses of brain activity directly. Thus, in many ways, MEG and MRI are complementary forms of neuroimaging. The information about the structure of an individual's brain (obtained by MRI) can be combined with information about the location of brain activity (obtained by MEG) to generate a picture of which part of the brain is responsible for a given task.
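For readers of this appendix, the "tiny magnetic fields" the script describes can be made concrete with a short numerical sketch. The Python snippet below is illustrative only and is not part of the study materials or analysis pipeline: it evaluates the standard primary-current dipole field formula, B(r) = (mu0 / 4*pi) * q x (r - r_q) / |r - r_q|^3, for a single equivalent current dipole in an infinite homogeneous medium, ignoring volume currents. The dipole moment (10 nA*m) and the 7 cm source-to-sensor distance are assumed, made-up values chosen simply to show that physiological sources yield fields on the order of hundreds of femtotesla, many orders of magnitude weaker than the Earth's magnetic field.

import numpy as np

MU0 = 4 * np.pi * 1e-7  # vacuum permeability, T*m/A


def dipole_field(q, r_q, r):
    """Magnetic field (tesla) at sensor position r (metres) produced by a
    current dipole with moment q (ampere-metres) located at r_q, keeping
    only the primary-current term (volume currents are ignored)."""
    d = r - r_q
    return MU0 / (4 * np.pi) * np.cross(q, d) / np.linalg.norm(d) ** 3


# Toy numbers (not from the study): a 10 nA*m dipole about 7 cm from a sensor.
q = np.array([10e-9, 0.0, 0.0])      # dipole moment along x, A*m
r_q = np.array([0.0, 0.0, 0.0])      # dipole location
sensor = np.array([0.0, 0.07, 0.0])  # sensor 7 cm away along y

b = dipole_field(q, r_q, sensor)
print(f"|B| = {np.linalg.norm(b):.2e} T")  # ~2e-13 T, i.e. a few hundred femtotesla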


Appendix 13: MEG Subject Instruction Sheet

Before I describe the experiment, we would like to do a brief scan to ensure that there are no objects on your person that will interfere with the scanner. I'm going to raise you up into the helmet. Just try to remain still for the next five minutes. We will be watching you through the camera and will be able to hear you at all times. Let us know immediately if you are uncomfortable, and we'll come and get you out.

The entire session will take about 2 hours. This time will be broken up into shorter blocks lasting about 15 minutes. During each block, it is very important that you remain as still as possible. We will tell you when each block starts. Please try your best not to move until we tell you that the block is over. Between blocks, you can relax and move around as much as necessary.

Sitting position only: It's easiest to remain still if you rest your head against the back or the front of the helmet.

DetectAV: In this task, you are required to respond to any stimulus that you hear or see on the screen, regardless of type or sensory modality. Press button 1 (left index finger) as soon as you hear or see the stimulus.

Detect_EventTypeSpecific: In this task, you are required to distinguish between (1) matching (or congruent) audiovisual pairs (e.g., picture of a plane and the sound of a plane; picture of a dog and a bark), (2) non-matching (or incongruent) audiovisual pairs (e.g., picture of a plane and a bark; picture of a dog and the sound of a plane), and (3) unimodal, auditory-only or visual-only stimuli. Press button 1 (left index finger) for a matching pair, press button 2 (left middle finger) for a non-matching pair, and press button 3 (left ring finger) for an auditory-only or visual-only stimulus.

DetectAnimateEvent: In this task, you are required to distinguish between "living" stimuli (i.e., animals) and "non-living" stimuli (i.e., musical instruments, household objects, automobiles). Press button 1 (left index finger) for a stimulus that is living, and press button 2 (left middle finger) for a stimulus that is non-living.

One more thing: When you blink your eyes, there is a big signal generated on the MEG that interferes with the measurements we are doing. If at all possible, please try to avoid blinking during the presentation of the sounds or pictures. We would prefer that you blink during the interval between the stimuli instead.

Okay, let's get started. Remember that we will be watching you through the camera and will be able to hear you at all times. Let us know immediately if you are uncomfortable, and we'll come and get you out.