NeuroImage 59 (2012) 788–799

Phonological manipulation between speech perception and production activates a parieto-frontal circuit

Claudia Peschke a,b, Wolfram Ziegler c, Juliane Eisenberger c, Annette Baumgaertner a,d,⁎

a University Medical Center Hamburg-Eppendorf, Department of Systems Neuroscience, Martinistraße 52, 20246 Hamburg, Germany
b Jacobs University Bremen, Campus Ring 1, 28759 Bremen, Germany
c City Hospital Bogenhausen, Clinic for Neuropsychology, Clinical Neuropsychology Research Group, Dachauer Str. 164, D-80992 München, Germany
d Fresenius University of Applied Sciences, Department of Speech and Language Pathology, Hamburg, Germany

Article history: Received 4 October 2010; Revised 5 July 2011; Accepted 10 July 2011; Available online 21 July 2011.

Keywords: Auditory motor mapping; Segmental processing; Prosodic processing; Area Spt; Repetition; Dorsal stream

Abstract

Repetition has been shown to activate the so-called ‘dorsal stream’, a network of temporo-parieto-frontal areas subserving the mapping of acoustic speech input onto articulatory-motor representations. Among these areas, a region in the posterior Sylvian fissure at the temporo-parietal boundary (also called ‘area Spt’) has been suggested to play a central role particularly with increasing computational demands on phonological processing. Most of the relevant evidence stems from tasks requiring metalinguistic processing. To date, the relevance of area Spt in natural phonological operations based on implicit linguistic knowledge has not yet been investigated. We examined two types of phonological processes assumed to be lateralized differently, i.e., the processing of syllabic stress versus subsyllabic segmental processing. In two ways, subjects modified an auditorily presented pseudoword before reproducing it overtly: (a) by a prosodic manipulation involving a stress shift across syllable boundaries, (b) by a segmental manipulation involving a vowel substitution. Manipulation per se was expected to engage area Spt. Segmental compared to prosodic processing was expected to reveal predominantly left lateralized activation, while prosodic compared to segmental processing was expected to result in bilateral or right-lateralized activation. Contrary to expectation, activation in area Spt did not vary with increased phonological processing demand. Instead, area Spt was engaged regardless of whether subjects simply repeated a pseudoword or performed a phonological manipulation before reproduction. However, for both segmental and prosodic stimuli, reproduction after manipulation (compared to repetition) activated the left intraparietal sulcus and left inferior frontal cortex. We propose that these parieto-frontal regions are recruited when the task requires phonological manipulation over and above the more automated transfer of auditory into articulatory verbal codes, which appears to involve area Spt. When directly contrasted with prosodic manipulation, segmental manipulation resulted in increased activation predominantly in left inferior frontal areas. This may be due to an increased demand on phonological sequencing operations at the subsyllabic phoneme level. Contrasted with segmental manipulations, prosodic manipulation did not result in increased activation, which may be due to a lower degree of morphosyntactic and syllable-level processing.

⁎ Corresponding author at: Hochschule Fresenius University of Applied Sciences, Fachbereich Gesundheit, Studiengang Logopädie, Alte Rabenstrasse 2, 20148 Hamburg, Germany. Fax: +49 40 2263259 91. E-mail addresses: [email protected] (C. Peschke), [email protected] (W. Ziegler), [email protected] (J. Eisenberger), [email protected] (A. Baumgaertner). doi:10.1016/j.neuroimage.2011.07.025

Introduction

For decades, the link between acoustic speech information and the conceptual system has been central to research on the processing of auditory–verbal information. In contrast, the interface between

auditory speech perception and the speech motor system has only recently become the focus of intensive research. Recent studies show that an auditory–motor link develops and is strengthened during a period of speech acquisition in childhood, when knowledge is acquired about how sound translates into articulation (Kuhl, 2000). There is strong evidence that the link between speech perception and production persists into adulthood. Experiments using auditory feedback perturbations have shown that adults subconsciously modify their own speech productions to counteract artificially induced shifts in pitch or formant frequency (Houde and Jordan, 2002; Larson et al., 2000; Purcell and Munhall, 2006; Tourville et al., 2008), suggesting that perceptual input may play a crucial role in guiding speech motor programming (Guenther, 2006). In support of


this notion, adult speakers have been shown to unintentionally imitate incidental acoustic properties of linguistic stimuli during repetition (Kappes et al., 2009). The auditory–motor integration of speech is thought to involve temporo-parietal and frontal regions, usually referred to as the “dorsal stream” (Hickok and Poeppel, 2004). Within the dorsal stream, a special role is ascribed to a region in the posterior Sylvian fissure at the temporo-parietal boundary (i.e., area Spt). As part of the left posterior planum temporale, area Spt has been shown to activate both during speech perception and production, suggesting that its specific role is to map acoustic speech signals onto articulatory representations (Buchsbaum et al., 2001; Hickok and Poeppel, 2004, 2007; Okada and Hickok, 2006; Papathanassiou et al., 2000). Likewise, posterior superior temporal cortex has been suggested to map incoming auditory information onto stored templates derived from auditory experience, which are then used to generate a program for a motor response (Warren et al., 2005). The strong link between sensory and motor functions implicated in speech processing may be licensed by distinct temporo-frontal fiber pathways, as suggested by several structural imaging studies. The arcuate fascicle, a fiber bundle connecting posterior temporal with inferior frontal structures, shows stronger structural maturation in the left compared to the right hemisphere between childhood and adolescence (Paus et al., 1999). According to the authors, this structural superiority may facilitate the fast bidirectional transfer of information between auditory and motor regions in the left hemisphere. A strongly left lateralized temporo-frontal white matter pathway was also shown by several diffusion tensor imaging (DTI) studies in adults (Barrick et al., 2007; Buchel et al., 2004; Parker et al., 2005). One of the DTI studies tracking the arcuate fascicle is consistent with the assumption that Spt is part of this dorsal stream system (Catani et al., 2005). In addition to the direct temporo-frontal tract, this study reported a second indirect temporo-frontal pathway, consisting of two tracts with a connecting relay in inferior parietal cortex (i.e. BA39/BA40). According to the authors, the indirect pathway is used whenever an intervening stage, such as phonological recoding, occurs between auditory input and articulatory output. Other fiber tracking studies have independently shown structural connectivity between the supramarginal gyrus (Parker et al., 2005) or inferior parietal lobule/intraparietal sulcus (Frey et al., 2008), and inferior frontal as well as superior temporal regions, respectively. The left temporo-parietal junction and specifically area Spt (e.g. Hickok, 2009) have been proposed to be involved in tasks which require temporary storage of phonological information (Buchsbaum and D'Esposito, 2008; Jacquemot and Scott, 2006). This is because area Spt is assumed to “act as an auditory–motor interface that serves to bind acoustic representations of speech with articulatory counterparts” (Buchsbaum and D'Esposito, 2008, p. 13). In fact, several functional imaging studies have revealed an increased involvement of Spt in tasks with increased phonological processing demands. 
Activation in the dorsal posterior temporal plane is influenced by word length, showing greater BOLD signal changes for multisyllabic compared to monosyllabic words in a covert object naming task (Okada et al., 2003), during the silent rehearsal phase of word pairs in a verbal working memory task (Buchsbaum et al., 2005a), and during covert rehearsal of nonsense sentences in which verbs and nouns had been replaced by pseudowords (Hickok et al., 2003). Sensory–motor integration for phonological information has recently been proposed to be a prominent function of area Spt (Hickok, 2009). Thus, area Spt may play a central role in the mapping of acoustic speech input onto articulatory-motor representations, particularly when there is an increased computational demand on phonological processing. In prior studies, activation of components of the dorsal stream was shown predominantly in phonological tasks with a relatively high verbal working memory load (Buchsbaum et al., 2001, 2005a; Burton et al., 2000; Heim et al., 2003). Some of these studies


used relatively artificial tasks requiring metalinguistic processing, such as subvocal rehearsal (Buchsbaum et al., 2005a) or explicit phoneme discrimination (Ashtari et al., 2004; Burton et al., 2000; Jacquemot et al., 2003; Zaehle et al., 2008). In addition, some of the studies selectively examined either receptive (Burton et al., 2000; Heim et al., 2003) or expressive phonological processes (Okada et al., 2003). The question arises, therefore, whether Spt is involved in natural phonological operations such as stress shifts across syllable boundaries (as in China versus Chinese) or vowel changes (as in woman versus women), operations which are usually performed according to implicit linguistic knowledge. Thus, the primary goal of the present study was to examine the involvement of area Spt in performing phonological manipulations, using naturalistic phonological processes requiring relatively low verbal working memory resources.

Auditory–motor integration generally is assumed to rely on left-hemispheric dorsal stream structures (e.g. Hickok and Poeppel, 2007). While prelexical auditory processing involves the superior temporal gyri of both hemispheres (Hickok, 2009; Hickok and Poeppel, 2004; Hickok et al., 2008), later auditory–verbal processing stages are presumed to involve left lateralized pathways (e.g. Hickok and Poeppel, 2000). However, the lateralization of dorsal stream activation during auditory–motor integration may depend on the nature of the phonological process. Available evidence suggests differential lateralization of segmental versus prosodic processing. Sublexical processes requiring the sequencing of phonemic information primarily involve left hemispheric regions (Ashtari et al., 2004; Burton and Small, 2006; Burton et al., 2000; Gelfand and Bookheimer, 2003; Heim et al., 2003; Jacquemot et al., 2003; Zaehle et al., 2008). In contrast, findings regarding the lateralization of linguistic–prosodic processes have been less consistent. Some lesion studies provide evidence consistent with the processing of prosodic units in the right hemisphere (Bradvik et al., 1991; Weintraub et al., 1981), whereas other studies suggest a left-hemispheric dominance for linguistic prosody (Arciuli and Slowiaczek, 2007; Emmorey, 1987; Van Lancker, 1980). Neuroimaging evidence, comparing the production of rhythmic to isochronous syllable sequences, points to an involvement of right hemisphere regions in linguistic–prosodic processing at the supra-syllabic level (Riecker et al., 2002). Therefore, a secondary goal was to examine the lateralization of dorsal stream activation during segmental versus prosodic phonological manipulation.

To investigate the involvement of area Spt in phonological processing and to determine the role of left versus right hemisphere structures, we used a segmental and a prosodic manipulation task, both based on naturalistic phonological regularities. Subjects were asked to manipulate auditory pseudoword stimuli phonologically according to implicit linguistic knowledge (i.e. from ‘woman’ to ‘women’ or from ‘China’ to ‘Chinese’) before overt reproduction. Verbatim repetition of the stimulus served as the control condition.
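To make the two kinds of manipulation concrete, the purely illustrative sketch below pairs example items of the two stimulus types that are introduced in the Materials and methods section (a stress shift for the prosodic items, an umlaut plus diminutive ending for the segmental items). The lookup-table approach and the simplified ASCII transcriptions are assumptions for illustration only, not the authors' procedure.

```python
# Illustrative sketch only: toy versions of the two phonological manipulations.
# PROS: pseudocountry <-> pseudolanguage with a stress shift (stress marked by ').
# SEGM: pseudonoun <-> pseudodiminutive with fronting (umlaut) of the stressed vowel.
PROS_PAIRS = {
    "'ku:.ba": "ku.'ba:.nish",   # real-word model: Kuba -> kubanisch (simplified spelling)
    "'do:.ga": "do.'ga:.nish",   # pseudoword example from the materials
}
SEGM_PAIRS = {
    "der 'bax": "das 'baech.lain",   # model: der Bach -> das Baechlein (simplified)
    "der 'tu:m": "das 'ty:m.lain",   # pseudoword example from the materials
}

def manipulate(item: str, process: str) -> str:
    """Return the manipulated counterpart of a stimulus (works in both directions)."""
    table = PROS_PAIRS if process == "PROS" else SEGM_PAIRS
    mapping = {**table, **{v: k for k, v in table.items()}}  # make the mapping bidirectional
    return mapping[item]

if __name__ == "__main__":
    print(manipulate("'do:.ga", "PROS"))    # -> do.'ga:.nish  (stress shift)
    print(manipulate("der 'tu:m", "SEGM"))  # -> das 'ty:m.lain (umlaut + diminutive)
```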
In task construction we proceeded from the assumption that in order to arrive at the correct reproduction after phonological manipulation, a stimulus first has to be analyzed sequentially, then the phonological information has to be manipulated at the prosodic or the segmental level, respectively, and finally the stimulus has to be reassembled in order to generate the target utterance. In contrast, in the verbatim repetition condition a stimulus only has to be analyzed sequentially and then reassembled for production. Our main question was whether additional phonological manipulations between speech perception and speech production would preferentially engage area Spt when compared to verbatim repetition. Our secondary question was whether the nature of the employed phonological operation, namely segmental versus prosodic manipulation, would result in preferential engagement of the left and/or the right hemisphere. We hypothesized that repetition as well as reproduction after manipulation involves structures associated with the dorsal stream,


predominantly superior temporal and inferior frontal areas. Reproduction after manipulation, as compared to verbatim repetition, was expected to reveal increased activation of Spt due to the additional phonological manipulation required. The comparison between the two phonological processes was expected to show predominantly left lateralized activation for segmental and more bilateral or right lateralized activation for prosodic manipulation.

Materials and methods

Participants

Twenty-three healthy right-handed subjects participated in the study. One of the subjects had to be excluded from further analysis, as he misunderstood the task of one of the sessions. All subjects were native speakers of German without any history of serious medical, neurological or psychiatric illness, or hearing loss (mean age 26.8 years, range 21–36 years, eleven females). Hand preference was tested with the 10-item version of the Edinburgh Handedness Inventory (Oldfield, 1971). Subjects had an average laterality quotient of 0.85 (range 0.5–1.0). The study was performed according to the guidelines of the Declaration of Helsinki and approved by the Ethics Committee of the University Medical Center Hamburg-Eppendorf. All subjects gave written informed consent prior to the experiment.

Stimuli

Pseudowords were used to discourage semantic processing, since we were primarily interested in phonological processing. The materials included a total of 144 pseudowords, and consisted of two different types of speech stimuli for the two phonological processes to be examined. The first type of stimuli, designed to examine prosodic processing (hereafter called ‘PROS’), consisted of 72 two- and three-syllabic pseudowords. The items were modeled after German bisyllabic names of countries (e.g., /'ku:.ba/) and their three-syllabic language counterparts (/ku.'ba:.nɪʃ/) (English: Cuba — Cuban). Thus, the materials consisted of 36 bisyllabic pseudocountries and 36 associated trisyllabic pseudolanguages. Notably, these pairs differed by the location of word stress relative to segmental content, i.e., stress on /ku:/ versus /ba:/ in the above example. A pseudoword example occurring in the experimental materials is /'do:.ga/ (pseudo-country) and /do.'ga:.nɪʃ/ (pseudo-language). The second type of stimuli was designed to predominantly require segmental processing (hereafter called ‘SEGM’). The materials consisted of 36 monosyllabic pseudonouns preceded by the article “der” (as in /de:ɐ 'bax/, English: the stream), and 36 associated bisyllabic pseudodiminutives preceded by the article “das” (as in /das 'bæç.lain/, English: the streamlet). These pairs differed by the frontness of the stressed vowel (umlaut). A pseudoword example of the SEGM materials is /de:ɐ 'tu:m/ versus /das 'ty:m.lain/.

The speech materials were spoken by an experienced male speaker and recorded in a sound-treated booth. Mean durations of the stimuli are listed in Table 1. There was a significant difference between the two types of speech stimuli, with longer durations for the SEGM materials as compared to the PROS materials (t = 4.49; DF = 142; p < 0.001). This difference is explainable by the fact that the model speaker made short pauses between the prefixed article and the noun in the SEGM stimuli. The intensity of all stimuli was set to 80 dB.

Table 1. Stimulus durations (in ms).

Stimulus group                  Mean   SD    Range
All stimuli (n = 144)           889    185   544–1288
PROS stimuli (n = 72)           824    177   544–1136
  Pseudocountries (n = 36)      660    71    544–810
  Pseudolanguages (n = 36)      988    59    859–1136
SEGM stimuli (n = 72)           954    170   639–1288
  Pseudonouns (n = 36)          802    76    639–962
  Pseudodiminutives (n = 36)    1105   73    970–1288

Tasks and procedures

We chose to use overt speech production, since a recording of overt verbal responses enabled us to monitor and analyze subjects' productions with respect to accuracy and response latency. Subjects performed two different tasks: (1) In the repetition task (hereafter called ‘REPEAT’) participants had to repeat the speech stimuli as accurately as possible; (2) in the manipulation task (hereafter called ‘MANIP’), items had to be manipulated phonologically prior to overt reproduction. For the PROS stimuli, a pseudocountry (e.g. /'do:.ga/) had to be converted into its respective pseudolanguage (and vice versa) with the concomitant stress shift (i.e., /do.'ga:.nɪʃ/). For the SEGM stimuli, a pseudonoun (e.g., /de:ɐ 'tu:m/) had to be converted into its pseudodiminutive counterpart (and vice versa) with the concomitant vowel change in the stressed syllable (i.e., /das 'ty:m.lain/). Collapsed across items, input and output were identical in the experimental and the control condition.

During practice, the manipulation tasks were introduced through implicit learning procedures. Specifically, for the manipulation condition no explicit description was given of the rule underlying the manipulation. Instead, subjects were required to infer the task through implicit learning. The learning procedure started with German real word examples presented by the examiner (e.g., /'ku:.ba/ — /ku.'ba:.nɪʃ/), and subjects were then asked to complete a pair when only one stimulus was given. When a participant had given several correct answers in a row, learning was extended to pseudoword examples.

To alert subjects to the beginning of an auditory stimulus, each trial started with the presentation of a visual cue (i.e., a symbol of a loudspeaker) which appeared 500 ms before the onset of each auditory stimulus, and stayed on until its offset. Five hundred ms after the offset of the auditory stimulus, a green dot was presented for 300 ms. Subjects were instructed to begin speaking as soon as the green dot had disappeared. During the rest of the trial the screen remained dark. The SOA was jittered between 5.5 and 10.5 s (see Fig. 1).

Stimulus presentation

The visual cues (a gray symbol of a loudspeaker, a green dot) were presented centrally on a dark background. They were projected onto a translucent screen located behind the head coil, which was viewed by the subjects via a mirror mounted on top of the coil. The auditory stimuli were presented via an MR-compatible electrodynamic headphone with a built-in dual-channel microphone (MR ConFon GmbH, Magdeburg, Germany, http://www.mr-confon.de) for the combined presentation and recording of speech. Subjects' responses were recorded with the sound recording software PhonOr implemented in the ConFon system, which automatically preprocesses the dual channel recordings by reducing the whole frequency spectrum of the scanner noise by 20 dB in relation to the speech signal. Vocal response recordings had to be synchronized with the recordings of scanner pulses and stimulus onsets.
To this end, scanner pulses were recorded in two different channels: one was routed to the standard presentation software which also recorded stimulus onset times, and the other was fed into the vocal response recording channel as an additional signal. This procedure ensured perfect synchronization of the general presentation parameters with the recording of vocal responses. Vocal responses were recorded by ConFon software running on a Samsung R65-T5500 Canspiro laptop situated outside the scanner room, and were saved as wav files. The volume for the auditory presentation was set to a comfortable loudness level individually for each subject in a preceding test scan. The task sequence was controlled by a PC running “Presentation” software (Neurobehavioral Systems, http://www.neurobs.com/).

Fig. 1. Example of a trial sequence (in s).
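The trial timing shown in Fig. 1 can be summarized as a small timing generator. The sketch below follows the intervals given in the text (cue 500 ms before the auditory stimulus and lasting until its offset, a 500 ms gap, a 300 ms go dot, response after dot offset, SOA jittered between 5.5 and 10.5 s); the uniform sampling of the jitter and the function itself are illustrative assumptions, not the authors' Presentation script.

```python
import random

def trial_timeline(stim_dur_s: float, t0: float = 0.0):
    """Nominal event onsets/offsets (in s) for one trial, following the timing in the text.

    stim_dur_s: duration of the auditory pseudoword (mean ~0.89 s, cf. Table 1).
    """
    cue_on   = t0                        # loudspeaker symbol appears
    stim_on  = cue_on + 0.5              # auditory stimulus starts 500 ms later
    stim_off = stim_on + stim_dur_s      # cue stays on until stimulus offset
    dot_on   = stim_off + 0.5            # green dot 500 ms after stimulus offset
    dot_off  = dot_on + 0.3              # dot shown for 300 ms; subjects speak after it disappears
    soa      = random.uniform(5.5, 10.5) # jittered SOA (uniform sampling is an assumption)
    return {"cue_on": cue_on, "stim_on": stim_on, "stim_off": stim_off,
            "dot_on": dot_on, "dot_off": dot_off, "next_trial_onset": t0 + soa}

print(trial_timeline(stim_dur_s=0.889))
```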

Experimental design

The study had a one-factorial within-subject design with the factor PHONOLOGICAL PROCESS (PROS versus SEGM). Verbatim repetition (REPEAT) served as the baseline condition. The experiment contained eight runs, four for each task. There were two runs for the manipulation of the prosodic and two runs for the manipulation of the segmental materials; analogously, to collect corresponding baseline data, there were four runs for the REPEAT task. Task and phonological process remained the same within a run. Instructions regarding task and process were visually presented for three seconds at the beginning of each run. Each run consisted of 36 experimental trials and took about 5.5 min to complete. Half of the 36 trials contained bisyllabic and the other half trisyllabic pseudowords. The sequence of runs was pseudorandomized, with the restriction that a task or phonological process could occur maximally two times in a row and that, to the degree possible, each run appeared in all positions of potential run sequences across subjects. The order of pseudowords within a run was pseudorandomized in a way that maximally three bi- or trisyllabic stimuli, respectively, occurred in a row. The trial sequence was fixed within runs for all subjects. The experiment took about 50 min to complete.

Data acquisition and image preprocessing

Imaging was conducted using a 3 T Siemens magnetic resonance imaging system, acquiring a total of around 1520 volumes for each subject. Functional T2-weighted gradient-echo echo-planar images were obtained from 26 axial slices (voxel size 3 × 3 × 3 mm, no gap, TR 1720 ms, TE 30 ms, flip angle 80°, field of view 216 × 216 mm², matrix 72 × 72) aligned to the anterior and posterior commissure. Additionally, a high-resolution (1 × 1 × 1 mm voxel size) T1-weighted structural MRI (MPRAGE) was acquired for each subject. Head movement was restrained within the head coil by circumaural headphones fitting tightly into the head coil and by tight foam padding. Additionally, subjects were instructed to minimize head movement. The microphone was placed centrally to the mouth as closely as possible without touching the lips.

Processing and analysis of imaging data was performed with SPM5 (http://www.fil.ion.ucl.ac.uk/spm/). Preprocessing included slice timing (correction for differences in slice acquisition time), realignment and unwarping (motion correction), coregistration between the individual structural T1-weighted image and the EPI images, and segmentation of the structural images. The resulting estimated spatial normalization parameters were then applied to the series of functional images, which were subsequently resampled to a voxel size of 3 × 3 × 3 mm and finally smoothed using a 6 mm full-width at half-maximum isotropic Gaussian kernel. Speech movement artifacts were controlled by using an unwarp mechanism in the preprocessing of the data, thereby correcting for geometric distortions due to magnetic field inhomogeneity (Andersson et al., 2001; Hutton et al., 2002). This preprocessing step removes variance of the movement-by-susceptibility–distortion interaction and is therefore useful for correcting artifacts due to stimulus-correlated movements, which are unavoidable in tasks with overt speech production.

Statistical analyses

Behavioral data

Response accuracy was rated for two different parameters. First, phonemic accuracy was examined across tasks to verify that the experimental tasks could be accomplished in the scanner environment, and that subjects were able to repeat the stimuli accurately despite the scanner noise. Secondly, and more importantly, manipulation accuracy was evaluated as an index of level of performance. Thus, the first step in vocal response analysis was to determine the phonemic accuracy of responses in the REPEAT and MANIP tasks, using the sound editing software ‘Audacity’ (http://audacity.sourceforge.net/). Accuracy of verbal responses was evaluated independently by two raters (CP and a second rater who was not familiar with the goal of the study). Mean concordance between raters was 0.8 (Cohen's Kappa), which constitutes substantial interrater agreement by convention (Landis and Koch, 1977). Whenever the raters differed, the respective tokens were rated by a third person (instructed by CP). The inclusion criterion was set to a minimum of 80% correct responses overall, across tasks and phonological processes. In a second step, the accuracy of the manipulation process per se was analyzed for the MANIP tasks. Specifically, the manipulation for PROS stimuli was considered accurate whenever word stress had been shifted from the first to the second syllable or vice versa, depending on the model stimulus. The manipulation for SEGM stimuli was considered accurate whenever a vowel replacement had occurred, independently of the accuracy of the vowel per se.

Next, duration (in ms) of verbal responses was measured in two ways. The first analysis included only responses not containing syllable iterations, self-corrections, or dysfluencies and, for the MANIP task, only accurate responses. This analysis was expected to provide an unbiased estimate of response duration differences between conditions. On- and offset of verbal responses were determined by visual and auditory inspection of the speech wave. The second analysis addressed the duration of the complete process of repetition and manipulation, respectively, measured for each event from model stimulus onset to response offset, for all stimuli regardless of dysfluencies or manipulation accuracy. The latter analysis was used to model onset and duration of events with respect to the functional imaging data. Furthermore, response latencies were computed by calculating, for each item, the time interval between offset of the green dot to vocal response onset. In a final step, the behavioral parameters (accuracy, response duration, response latency) were correlated with each other in order to test for trade-off effects between speed and accuracy or between response duration and response latency.
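Inter-rater agreement on phonemic accuracy was summarized above as Cohen's Kappa = 0.8. For reference, a minimal sketch of that statistic for two raters' categorical judgments is given below; the example ratings are invented and serve only to show the computation.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters judging the same items (categorical labels)."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n   # observed agreement
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(rater_a) | set(rater_b)
    expected = sum((counts_a[l] / n) * (counts_b[l] / n) for l in labels)  # chance agreement
    return (observed - expected) / (1 - expected)

# Invented example: 1 = phonemically correct, 0 = incorrect
a = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]
b = [1, 1, 0, 1, 0, 0, 1, 1, 1, 1]
print(round(cohens_kappa(a, b), 2))   # -> 0.52 for these invented ratings
```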

Functional imaging data

Statistical analyses of the functional imaging data were performed in two steps. In a first-level analysis, a statistical model was computed for each subject. To this end, the processes of repetition and manipulation, respectively, were defined as one single event lasting from stimulus onset to vocal response offset, and entered as one variable into the model (since the perception and production of the pseudowords were temporally correlated, they could not meaningfully be modeled separately as two regressors). This regressor, containing the whole process of repetition or reproduction after manipulation, was then convolved with a canonical hemodynamic response function (HRF) as implemented in SPM5. Since intrasubject stimulus-correlated motion was corrected by using the SPM5 function ‘unwarp’ during preprocessing of the images, the realignment parameters were not inserted into the model. Voxel-wise regression coefficients for the variable of interest were estimated using the least-squares method within SPM5, and statistical parametric maps of the t statistic (SPM{t}) were generated. Next, we computed the simple main effects of each of the four conditions in our design (REPEAT_PROS, REPEAT_SEGM, MANIP_PROS, MANIP_SEGM).

In a second-level analysis, the four contrast images of the first-level analysis for each subject were used to perform a group analysis. To this end, the contrast images of the four conditions were entered into a within-subject ANOVA model, including a correction for non-sphericity. All of the following analyses were computed within this model. To identify activation due to manipulation of pseudowords across phonological processes, the contrast images of the main effects of the MANIP conditions (i.e., MANIP_PROS, MANIP_SEGM) were submitted to a conjunction analysis under a conservative conjunction null hypothesis (Nichols et al., 2005). The same analysis was performed for the REPEAT conditions. The next step aimed to identify areas sensitive to the specific demands of segmental versus prosodic manipulation. To this end, MANIP was contrasted with REPEAT separately for both phonological processes (i.e., MANIP_SEGM > REPEAT_SEGM and MANIP_PROS > REPEAT_PROS). The reverse contrasts (i.e., REPEAT > MANIP for segmental and prosodic processing, respectively) were computed analogously.

To examine whether segmental and prosodic manipulations preferentially engage the left and/or the right hemisphere, we computed a lateralization index (LI) for each manipulation using the LI Toolbox (Wilke and Lidzba, 2007; Wilke and Schmithorst, 2006). We applied the bootstrapping method using weighted means for the whole brain except a sagittal section of 5 mm to the left and right of the midline. This analysis results in values between 1 (completely left lateralized) and −1 (completely right lateralized). Finally, we computed an interaction between PHONOLOGICAL PROCESS and the two tasks (reproduction with manipulation, repetition) in order to examine the impact of manipulation while controlling for differences in stimulus durations (and, subsequently, trial duration) between the segmental and the prosodic stimuli (see Table 1). With this analysis we intended to examine whether the two phonological processes differed with respect to amount or localization of the expected additional increase in activation in the MANIP relative to the REPEAT condition. The analyses described above were computed for the entire volume of 26 slices.
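The first-level regressor described above (one event per trial, lasting from stimulus onset to vocal response offset, convolved with a canonical HRF) can be sketched as follows. The double-gamma HRF parameters are common defaults rather than SPM5's exact implementation, and the onsets and durations are invented; this is an illustration, not the authors' code.

```python
import numpy as np
from scipy.stats import gamma

TR = 1.72   # repetition time from the acquisition parameters (s)
dt = 0.1    # high-resolution grid for building the regressor (s)

def double_gamma_hrf(t):
    """Canonical-style double-gamma HRF (standard default parameters; an
    approximation, not necessarily identical to SPM5's canonical HRF)."""
    peak = gamma.pdf(t, a=6, scale=1.0)
    undershoot = gamma.pdf(t, a=16, scale=1.0)
    h = peak - undershoot / 6.0
    return h / h.max()

def trial_regressor(onsets, durations, n_scans):
    """Boxcar from stimulus onset to response offset, convolved with the HRF,
    then sampled at scan times."""
    n_hi = int(n_scans * TR / dt)
    box = np.zeros(n_hi)
    for on, dur in zip(onsets, durations):
        box[int(on / dt):int((on + dur) / dt)] = 1.0
    hrf = double_gamma_hrf(np.arange(0, 32, dt))
    reg = np.convolve(box, hrf)[:n_hi]
    scan_idx = (np.arange(n_scans) * TR / dt).astype(int)
    return reg[scan_idx]

# Invented example: three trials of roughly the mean trial duration (~4 s)
onsets, durations = [10.0, 18.5, 27.0], [4.0, 4.2, 3.9]
X = trial_regressor(onsets, durations, n_scans=30)
print(X.shape, round(float(X.max()), 3))
```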
Since the location of area Spt varies considerably across individuals (Westbury et al., 1999; see also Hickok, 2009), we additionally established a region of interest (ROI) for area Spt whose extent was defined by the coordinates derived from nine previous studies which reported Spt activation during auditory–motor integration (Buchsbaum et al., 2005a,b, 2011; Callan et al., 2006; Hickok et al., 2003; Okada and Hickok, 2006; Okada et al., 2003; Pa and Hickok, 2008; Wilson and Iacoboni, 2006). To determine the center of this ROI, we first transformed the Spt coordinates wherever necessary into MNI space using the function provided by Matthew Brett at http://imaging.mrc-cbu.cam.ac.uk/imaging/MniTalairach, and then computed the average of the respective minimum and maximum x-, y- and z-coordinates given in the nine studies. These averages yielded the central ROI coordinate (i.e., [−54 −39 15]). Around this central

coordinate we then defined a sphere with a radius large enough (i.e., 9 mm) to cover all of the peak coordinates of the nine studies contributing to the ROI definition.

We wanted to ensure that the observed activations did not result from deactivation in the comparison condition. Therefore, all differential contrasts were inclusively masked by the corresponding main effect of the minuend, as stated in the figure legends (voxelwise p < 0.05). The statistical threshold was set to p < 0.05 (FWE-corrected) for simple main effects and to p < 0.05 (FDR-corrected) for differential contrasts and the interaction analyses. Anatomical localization of activation peaks was done by using the SPM Anatomy Toolbox Version 1.6 (Eickhoff et al., 2005). Spatial references are given in Montreal Neurological Institute (MNI) coordinates.

Results

Behavioral data

Phonemic accuracy across tasks and phonological processes

All participants reached a satisfactory level of phonemic accuracy, with a group mean of 86.8% correct responses across tasks and phonological processes (range 80.6–93.1%, SD 3.7%, see Table 2). Thus, none of the subjects had to be excluded from further analysis. Apart from the primary factor PHONOLOGICAL PROCESS (i.e., segmental or prosodic processing), TASK was included as a second within-subject factor in a repeated-measures MANOVA. This analysis revealed significant main effects of TASK (F(1,21) = 51.24; p < 0.001) and of PHONOLOGICAL PROCESS (F(1,21) = 9.91; p < 0.01), and a significant TASK × PROCESS interaction (F(1,21) = 10.48; p < 0.01). Responses in the REPEAT task were more accurate (mean: 91.8% correct) than in the MANIP task (mean: 81.8% correct). Across tasks, the SEGM stimuli showed higher phonemic accuracy (mean 88.9% correct) than the PROS stimuli (mean: 84.7%). The interaction is explained by a larger difference in accuracy between MANIP and REPEAT tasks in the SEGM compared to the PROS stimuli.

Manipulation accuracy across phonological processes

This analysis exclusively considered the success of the manipulation, regardless of phonemic errors. Manipulation per se revealed a high mean accuracy of 96.2% across phonological processes (range of individual averages across tasks and phonological processes 80.6–100%, SD 5.3%). For the PROS and SEGM stimuli, the means were 95.9% (range 66.7–100%) and 96.5% (range 72.0–100%), respectively. A paired-samples t-test failed to reveal a significant difference in manipulation accuracy between the two phonological processes (t(21) = −0.30; p > 0.05).

Response latency

Average response latency, computed from the offset of the green dot, was 149 ms (range 36–325 ms; SD 88 ms). A repeated-measures ANOVA with TASK and PHONOLOGICAL PROCESS as factors revealed no significant main effects of TASK (F(1, 21) = 2.16; p > 0.05) or of PHONOLOGICAL PROCESS (F(1, 21) = 2.58; p > 0.05) and no TASK × PHONOLOGICAL PROCESS interaction (F(1, 21) = 0.64; p > 0.05).
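As an illustration of the 2 × 2 within-subject analyses reported in this section (factors TASK and PHONOLOGICAL PROCESS), the sketch below runs a repeated-measures ANOVA on an invented long-format table with statsmodels; the variable names and values are hypothetical and chosen only to mirror the design (22 subjects, one value per condition cell).

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(0)
rows = []
for subj in range(1, 23):                      # 22 analyzed subjects
    for task in ("REPEAT", "MANIP"):
        for process in ("SEGM", "PROS"):
            # invented response latencies (ms), roughly in the reported range
            rows.append({"subject": subj, "task": task, "process": process,
                         "latency_ms": rng.normal(150, 40)})
df = pd.DataFrame(rows)

# 2 x 2 repeated-measures ANOVA with TASK and PHONOLOGICAL PROCESS as within-subject factors
res = AnovaRM(df, depvar="latency_ms", subject="subject",
              within=["task", "process"]).fit()
print(res.anova_table)   # F and p for task, process, and task:process
```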

Table 2. Phonemic accuracy in %, averages (ranges) of individual means.

Task/phonological process   MANIP              REPEAT             Across tasks
SEGM                        81.8 (68.1–97.2)   95.9 (88.9–100)    88.9 (81.3–97.2)
PROS                        81.7 (66.7–91.7)   87.8 (76.4–94.4)   84.7 (75.0–91.7)
Across processes            81.8 (70.8–93.1)   91.8 (86.8–95.8)   86.8 (80.6–93.1)


Response and trial duration

In the analysis of response durations we excluded responses which contained iterations, self-corrections, delays or manipulation errors (3% of all responses). The mean response duration, computed from vocal response onset to offset across both tasks and phonological processes, was 913 ms (range 743–1248 ms, SD 113 ms). A repeated-measures ANOVA with TASK and PHONOLOGICAL PROCESS as within-subject factors revealed a significant main effect of PHONOLOGICAL PROCESS (F(1,21) = 43.63; p < 0.001). A post-hoc analysis showed a significantly longer mean response duration for the SEGM stimuli (966 ms) than for the PROS stimuli (881 ms; paired-samples t-test: t(21) = 7.16; p < 0.001). There was no significant main effect of TASK (F(1,21) = 2.80; p > 0.05) or TASK × PHONOLOGICAL PROCESS interaction (F(1,21) = 0.27; p > 0.05).

Next, trial duration, computed from stimulus onset to response offset, was determined for all responses. Across stimuli, the analysis revealed a mean trial duration of 4010 ms (range 3570–4770 ms, SD 260 ms). A repeated-measures ANOVA with TASK and PHONOLOGICAL PROCESS as within-subject factors revealed a significant main effect of PHONOLOGICAL PROCESS (F(1,21) = 210.72; p < 0.001). A post-hoc analysis showed a significantly longer trial duration for the SEGM stimuli (4190 ms) than for the PROS stimuli (3830 ms; paired-samples t-test: t(21) = 6.84; p < 0.001). There was no significant main effect of TASK (F(1,21) = 0.53; p > 0.05) and no TASK × PROCESS interaction (F(1,21) = 0.02; p > 0.05).

Correlations between behavioral variables

None of the behavioral parameters (accuracy, response duration, response latency) correlated significantly with any other parameter (all r between −0.19 and 0.35; p > 0.05).

Functional imaging data

Areas involved in repetition and reproduction with manipulation

The imaging data of all participants (n = 22) were valid and could be used for functional data analyses. First, we performed a conjunction analysis, examining conjoint BOLD signal changes across phonological processes for the simple main effects of MANIP and REPEAT, respectively. Computed across PROS and SEGM stimuli, repetition resulted in


an extensive pattern of perisylvian activation with bilateral changes in the post- and precentral gyri, the thalami, the middle cingulate gyri, primary motor cortices, superior temporal gyri, and the cunei. There were also changes in right middle temporal cortex, and in the left middle frontal gyrus (Figs. 2a/d). At the very conservative level of a Bonferroni correction across the whole brain, we observed relatively little activation in inferior frontal gyrus (IFG). However, IFG activation was observed when the statistical correction threshold was lowered to false discovery rate (across the whole brain). For reproduction with manipulation, the conjunction analysis revealed a similar activation pattern (Figs. 2b/e). However, in addition to the pattern observed for repetition, there were larger clusters of activation in IFG and left middle frontal gyrus (MFG) as well as in left intraparietal sulcus (IPS). In left posterior superior temporal gyrus (pSTG), the conjunction analyses for repetition as well as reproduction with manipulation resulted in strong and circumscribed activation in area Spt. In order to examine the strength of Spt activation for all conditions (i.e., MANIP_PROS, MANIP_SEGM, REPEAT_PROS, REPEAT_SEGM) we first extracted the respective parameter estimates from each subject's individual peak voxel within the Spt-ROI (constructed as described above in Materials and methods). We then averaged the estimates across subjects and plotted the BOLD signal changes in area Spt separately for each condition. Parameter estimates computed within this ROI show comparable changes of activation in all four conditions (see Fig. 2c).

Areas involved in phonological manipulation

To explore the areas involved in segmental and prosodic manipulation we contrasted the MANIP with the REPEAT task separately for both phonological processes. In the segmental condition, this contrast revealed a large connected cluster of activation covering most of left inferior frontal cortex, including Broca's area and extending into BA47 and the middle frontal gyrus (peak MNI coordinate: [−48 12 27]), and a large cluster of activation in the left IPS (peak MNI coordinate: [−42 −42 42]; Table 3, Fig. 3a). Activation in IFG and the inferior parietal lobule was also found in the right hemisphere. Additionally, there was activation in the left middle

Fig. 2. Main effects of (a/d) repetition and (b/e) reproduction with manipulation, across phonological processes. Note: a) Conjunction of REPEAT_PROS > rest ∩ REPEAT_SEGM > rest. b) Conjunction of MANIP_PROS > rest ∩ MANIP_SEGM > rest. Contrasts entering conjunctions are based on t-tests, p < 0.05, FWE corrected for the entire volume of images; color scale represents t-values. c) BOLD signal changes in area Spt (ROI with a radius of 9 mm around the MNI coordinate [−54 −39 15]), for each of the four conditions. d) Transversal sections for the conjunction of REPEAT_PROS > rest ∩ REPEAT_SEGM > rest (corresponding to a). e) Transversal sections for the conjunction of MANIP_PROS > rest ∩ MANIP_SEGM > rest (corresponding to b).
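The Spt-ROI analysis behind Fig. 2c (a 9 mm sphere around MNI [−54 −39 15], with each subject's peak-voxel parameter estimate read out and then averaged) could be implemented along the following lines. The file name and nibabel-based approach are hypothetical illustrations; the original analysis was performed in SPM5.

```python
import numpy as np
import nibabel as nib

CENTER_MNI = np.array([-54.0, -39.0, 15.0])   # ROI center derived from the nine prior studies
RADIUS_MM = 9.0

def spt_peak_estimate(beta_path):
    """Peak parameter estimate within the spherical Spt ROI for one subject's
    first-level beta/contrast image (hypothetical file; illustration only)."""
    img = nib.load(beta_path)
    data = img.get_fdata()
    affine = img.affine

    # MNI (world) coordinates of every voxel, via the image affine
    i, j, k = np.indices(data.shape[:3])
    vox = np.stack([i, j, k, np.ones_like(i)], axis=-1)
    mni = vox @ affine.T
    dist = np.linalg.norm(mni[..., :3] - CENTER_MNI, axis=-1)

    roi = (dist <= RADIUS_MM) & np.isfinite(data)
    return float(np.max(np.where(roi, data, -np.inf)))

# Hypothetical usage: average the per-subject peak estimates, as plotted in Fig. 2c
# estimates = [spt_peak_estimate(f"sub-{s:02d}_con_manip_pros.nii") for s in range(1, 23)]
# print(np.mean(estimates))
```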


Table 3. Activation maxima for MANIP > REPEAT for segmental and prosodic processing (threshold p < 0.05, FDR-corrected for the whole brain, cluster level > 10 voxels).

Region (peak activation)                Side   Voxels in cluster   MNI coordinates (x y z)   T-value

MANIP > REPEAT for SEGM (a)
Inferior parietal lobule (IPS)          L      1217                −42 −42 42                9.12
Inferior frontal gyrus (BA44/BA45)      L      1698                −48 12 27                 9.11
Medial superior frontal gyrus (BA32)    L      2076                −3 21 42                  8.63
Inferior temporal gyrus                 L      24                  −45 −54 −6                5.13
Superior frontal gyrus                  R      13                  27 3 45                   4.44
Supramarginal gyrus                     R      21                  60 −15 24                 3.75
Calcarine gyrus                         R      85                  21 −57 6                  3.57
Lingual gyrus                           L      84                  −15 −48 −3                3.49
Superior temporal gyrus                 R      15                  48 −45 15                 3.48
Insular lobe                            L      23                  −42 −3 0                  3.40
Middle temporal gyrus                   L      23                  −48 −42 3                 3.26
Cuneus                                  R      17                  0 −75 21                  3.05
Calcarine gyrus                         R      22                  12 −72 15                 2.83

MANIP > REPEAT for PROS (b)
Inferior parietal lobule (IPS)          L      219                 −39 −42 39                5.66
Inferior frontal gyrus (BA45)           L      13                  −45 12 27                 4.11

Note: IPS — intraparietal sulcus. (a) Cluster level extent > 10; inclusively masked with main effect of MANIP_SEGM. (b) Cluster level extent > 10; inclusively masked with main effect of MANIP_PROS.

and inferior temporal gyri, the superior medial frontal gyrus and the caudate nucleus. The weighted mean LI (lateralization index) was 0.69, showing a moderate left lateralization of activation for segmental processing. In the prosodic condition, the contrast between the manipulation and repetition task revealed considerably less activation compared to the segmental condition, but a comparable activation pattern. That is, activation was predominantly found in the left IFG (peak MNI

coordinates: [−45 12 27]) and left IPS (peak MNI coordinates: [−39 −42 39]; Table 3, Fig. 3b). Similarly to the segmental condition, the activation was moderately left lateralized (weighted mean LI = 0.65). Because we had expected reproduction after manipulation (compared to repetition) to lead to an increase of activation in area Spt due to the additional phonological manipulation required, we performed an ancillary analysis exploring the effects of manipulation within the Spt-ROI. This analysis was performed in two ways, by (1) examining the two phonological processes separately (i.e., MANIP > REPEAT for segmental and for prosodic processing, respectively) and by (2) combining the two phonological processes in a single analysis (i.e., MANIP > REPEAT for segmental and prosodic processing). Neither analysis resulted in any significantly activated voxels, even when performed without masking and when the statistical threshold was lowered to an uncorrected level of p < 0.001. Because an analysis at the group level may have obliterated significant findings for individual subjects due to individual variability in the location of area Spt, we also examined activation within the Spt-ROI for each of the 22 subjects individually (using a threshold of p < 0.001 uncorrected for the Spt-ROI). This analysis revealed that 7 out of 22 subjects showed activation within the Spt-ROI. Of those 7 subjects, 3 showed clusters of between 1 and 5 voxels, and 4 showed clusters of 8, 9, 18 and 32 voxels, respectively, when manipulation was contrasted with verbatim repetition across phonological processes. For SEGM as well as PROS, the reverse contrast (REPEAT > MANIP) showed activation changes predominantly in the angular gyrus bilaterally, the posterior cingulate cortex, and the medial prefrontal cortex. Finally, we computed the interaction of PHONOLOGICAL PROCESS with the two tasks (repetition, reproduction after manipulation) in order to examine the differential effects of an additional task (reproduction

Fig. 3. Effect of reproduction with manipulation (compared to verbatim repetition) for (a/c) segmental and (b/d) prosodic processing. Note: a) MANIP > REPEAT for SEGM, inclusively masked with main effect of MANIP_SEGM. b) MANIP > REPEAT for PROS, inclusively masked with main effect of MANIP_PROS. Contrasts in (a) and (b) are based on t-tests (p < 0.05, FDR corrected for the entire volume of images, cluster level extent > 10 voxels). c) Transversal sections for the contrast MANIP > REPEAT for SEGM (corresponding to a). d) Transversal sections for the contrast MANIP > REPEAT for PROS (corresponding to b). e) BOLD signal changes in left IFG (MNI coordinate [−45 12 27]) for each of the four conditions. f) BOLD signal changes in left IPS (MNI coordinate [−39 −42 39]) for each of the four conditions. MNI coordinates in (e) and (f) represent the peak activation in IFG and IPS in a conjunction analysis of MANIP > REPEAT for SEGM ∩ PROS.


Fig. 4. Interaction of PHONOLOGICAL PROCESS × TASK (higher activation for segmental than prosodic processing during reproduction with manipulation, relative to repetition). Note: a) Difference of MANIP > REPEAT greater for SEGM than for PROS stimuli (p < 0.05, FDR corrected for the entire volume of images, cluster level extent > 10). b) Transversal sections, corresponding to a.

with manipulation versus verbatim repetition) independently of stimulus-inherent differences between phonological processes such as stimulus duration (and, subsequently, response duration). This analysis revealed a stronger effect of reproduction with manipulation (relative to repetition) for SEGM compared to PROS particularly in the left postcentral gyrus extending along the postcentral sulcus, in medial superior frontal gyri, and in the inferior frontal gyrus bilaterally as well as in the left insula and the right caudate nucleus (Fig. 4 and Table 4). The reverse interaction (i.e., the test for a stronger effect of prosodic than segmental processing for manipulation relative to repetition) did not yield any significant results at this threshold.

Discussion

The present fMRI study aimed to examine whether areas associated with the dorsal stream are involved in perception–production tasks requiring relatively few verbal working memory resources. Additionally, the present study sought to examine the relative contributions of the left and right hemispheres, depending on the segmental versus prosodic nature of a phonological manipulation. Subjects performed two different phonological tasks which required either a segmental or a prosodic manipulation of heard pseudowords. A verbatim repetition task served as control task. Based on previous findings, both reproduction with manipulation and repetition were expected to activate a temporo-frontal network. Additional activation in area Spt, defined as a region of interest in the posterior Sylvian fissure at the temporo-parietal boundary, was expected specifically for the manipulation task, which – unlike verbatim repetition – required segmental or prosodic manipulation

Table 4. Activation maxima for the interaction of TASK × PHONOLOGICAL PROCESS (difference of MANIP > REPEAT greater for SEGM than for PROS stimuli; threshold p < 0.05, FDR corrected for the entire volume of images, cluster level extent > 10).

Region                                  Side   Voxels in cluster   MNI coordinates (x y z)   T-value
Postcentral gyrus                       L      94                  −66 −21 24                5.38
Inferior frontal gyrus (BA45)           L      943                 −45 33 15                 4.99
Medial superior frontal gyrus (BA32)    L      284                 −3 21 42                  4.87
Inferior frontal gyrus (BA47)           R      301                 33 24 −9                  4.86
Caudate nucleus                         R      384                 9 6 0                     4.80
Insular lobe                            L      49                  −39 −3 6                  4.12
Lingual gyrus                           R      18                  9 −60 −3                  3.91
Calcarine gyrus                         R      31                  12 −75 15                 3.83
Lingual gyrus                           L      13                  −15 −48 −6                3.79
Inferior frontal gyrus (BA44)           R      98                  42 9 30                   3.72
Calcarine gyrus                         R      24                  18 −54 6                  3.66
Midbrain                                L      17                  −3 −18 −12                3.64
Superior frontal gyrus                  L      21                  −15 45 27                 3.62
Premotor cortex                         R      11                  30 −15 48                 3.50
Insular lobe                            R      17                  42 3 0                    3.44
Middle occipital gyrus                  L      17                  −39 −75 21                3.41
Hippocampus                             L      11                  −21 −18 −9                3.30
Calcarine gyrus                         L      11                  −9 −72 15                 3.14

of the perceptual input. We expected left-lateralized activation for segmental, and bilateral or right-lateralized activation for prosodic processing. Behavioral analyses demonstrated that subjects successfully performed the tasks. All participants reached a satisfactory level of phonemic accuracy. Manipulation accuracy was high and did not differ between the segmental and the prosodic task.

Cortical areas involved in auditory–motor integration

Both repetition and reproduction with manipulation evoked cortical activation predominantly in the postcentral gyri bilaterally, the STG bilaterally, and the left MFG. Furthermore, left IFG was strongly activated in the manipulation task and less so also in repetition. This perisylvian network resembles the activation pattern found in a shadowing paradigm which examined the relatively automated, immediate repetition of pseudowords (Peschke et al., 2009). Bilateral STG activation corroborates existing evidence regarding the involvement of these regions in speech perception. The strong activation of the postcentral gyri was somewhat surprising, but replicates the activation pattern found in our previous study (Peschke et al., 2009) as well as in prior reports of sensorimotor involvement during speech production (e.g. Dhanjal et al., 2008). In tasks requiring overt articulation, the postcentral gyrus appears to represent a significant component of the speech motor network (Dogil et al., 2002; Lotze et al., 2000; Riecker et al., 2005); this area may also be involved in tasks requiring articulation without phonation (Pulvermuller et al., 2006). Somatosensory cortex has been proposed to play a crucial role in speech motor control as part of a somatosensory feedback system (Guenther et al., 2006). In our study, subjects may have relied on the tactile-sensory feedback control system in order to compensate for an impoverished auditory feedback due to ambient scanner noise and tight-fitting headphones. Activation of left MFG in the present study was unexpected and had not been detected in our former shadowing task (Peschke et al., 2009). One explanation is that despite the use of pseudowords our paradigms may have triggered word retrieval mechanisms which are known to involve left MFG (review by Price, 2010, and references therein). As an alternative explanation, MFG has been argued to be part of an inhibitory neurocognitive network (Liddle et al., 2001; Pedersen et al., 1998; Rubia et al., 2001). In the present study, MFG activation may have resulted from the fact that subjects had to hold off their response until cued by the offset of a visual stimulus. Activation in the left IFG likely reflects speech production processes. Remarkably, this region was more active in the manipulation condition than in repetition, suggesting that the phonological manipulation examined here recruited this area in a specific way. This will be discussed in greater detail in the next section.

Areas involved in phonological manipulation

The major objective of the present study was to determine the neuronal correlates of naturally occurring phonological manipulations. Our tasks required relatively few verbal working memory


resources, since only two to three syllables had to be processed and responses occurred at relatively short delays. Contrary to our predictions, reproduction with manipulation compared to verbatim repetition did not result in predominant activation of area Spt. Instead, phonological manipulation revealed large clusters of activation in the left dorsal IFG and in IPS. Despite our efforts to keep working memory demands low our manipulation tasks activated regions which have previously been associated with verbal working memory functions. Specifically, the inferior parietal region has been argued to be associated with the phonological store, whereas the IFG has been argued to be involved in articulatory rehearsal (Awh et al., 1996; Paulesu et al., 1993; Smith et al., 1996; Strand et al., 2008). Additionally, both regions have been reported to be sensitive to verbal working memory load, showing increasingly more activation as the number of items to be rehearsed increases (Martin et al., 2003; Ravizza et al., 2004; Rypma et al., 1999). On this background our results may suggest that the task of manipulating a heard verbal stimulus prior to reproduction required working memory resources to a significant extent, even though stimulus lengths and response delays did not place high demands on temporal storage capacities. Conceivably, there is a higher demand on verbal working memory when part of an auditory stimulus must be modified before reproduction than when the stimulus is simply repeated. In verbatim repetition, the incoming auditory stimulus may be transferred to the speech programming and articulation network without any further modifications and with only minor verbal working memory expenses. In the manipulation task, however, the auditory information has to be maintained in a more abstract form until the manipulation process has been completed, thereby delaying the start of speech motor programming. However, the parieto-frontal activation found in the present study may not solely be explainable by verbal working memory processes, since the phonological manipulation required more than the temporary storage of a stimulus. Rather, this task called for an explicit phonological manipulation of the perceived pseudowords. We propose that left IPS as well as left IFG is involved in these manipulation processes, and that activation in these areas is independent of whether a segmental or a prosodic manipulation is performed. Comparable to our finding, sequential manipulation of syllabic units compared to a simple matching task resulted in activation of Broca's area and the left supramarginal gyrus as well as the superior parietal lobule around the IPS (Gelfand and Bookheimer, 2003). Gelfand and Bookheimer's (2003) study may be argued to involve a high degree of verbal working memory compared to the task used here, since a string of three syllables had to be maintained until a second syllable sequence appeared. However, involvement of frontal and parietal areas has also been reported in phonological tasks with relatively low verbal working memory demands, e.g., during phoneme monitoring (IFG and inferior parietal lobe; Newman and Twieg, 2001) or in phoneme detection (IFG; Heim et al., 2003). Recently, IPS activation has been associated with the manipulation, and dorsolateral frontal activation with the monitoring of verbal information (Champod and Petrides, 2010), an interpretation which is perfectly consistent with our findings. 
Furthermore, IPS involvement in the present phonological manipulation task is consistent with the notion that IPS may be an inherently multisensory region which recodes “one representation into another frame of reference to permit action or comparison” (Foster and Zatorre, 2009, p. 8). In the present study, IPS involvement may indicate ‘recoding’ within the phonologic levels of ‘stress’ and ‘phonemic structure’, while left frontal involvement may partly reflect monitoring processes. Our finding is consistent with previous reports suggesting that the left dorsal IFG and the left IPS are part of the dorsal stream supporting auditory–motor integration. Recent MRI tractography studies of areas associated with the dorsal stream have suggested that the inferior parietal lobule is part of the dorsal stream (Catani et al., 2005; Parker et al., 2005). Frey et al. (2008) showed that the left IPS connects with

the posterior superior temporal region. Furthermore, the authors reported connections of the dorsolateral IFG with inferior parietal cortex.

Our prediction that phonological manipulation, relative to verbatim repetition, would result in increased activation in area Spt clearly was not borne out. At a group level, there was no evidence suggesting that, compared to verbatim repetition, the added requirement of performing a stress shift across syllable boundaries or altering the quality of a vowel led to an additional increase of activation in area Spt. Even at the individual subject level, only 7 (out of 22) subjects showed significant increases in BOLD signal change in this area as a result of manipulation, and only 4 out of 22 subjects showed clusters of 8 or more contiguous voxels. Testing against the binomial distribution (with the probability of non-activation set to 5%), the finding that 15/22 subjects did not show an increase in activation in Spt at a rather liberal threshold (see Results) is clearly incompatible with the a priori hypothesis that phonological manipulation results in an increase of activation in Spt (p < .001). This is not to say that the phonological task does not rely on area Spt, since activation in this region is evident in the conjunction of both segmental and prosodic manipulation (Fig. 2b). However, to a similar degree, area Spt also is involved in verbatim repetition (Fig. 2a). In the present study, the local maximum in area Spt of repetition and reproduction with manipulation (i.e., MNI coordinates [−60 −36 18]) is in close proximity to activation peaks in studies reporting area Spt as a sensory–motor integration area (Buchsbaum et al., 2005a; Okada and Hickok, 2006; Pa and Hickok, 2008). Therefore, our results suggest that both manipulation and repetition significantly rely on area Spt. This may not be surprising since both tasks require the transfer of auditory into speech motor information, confirming previous findings which suggest a role of area Spt in efficiently transferring verbal information into articulatory representations (Peschke et al., 2009). The additional demand of phonological conversion in the manipulation task may have led to a flexible co-recruitment of multimodal areas in posterior parietal cortex (and specifically, IPS) involved in recoding and manipulating verbal as well as nonverbal material.

In addition to dorsolateral prefrontal and parietal activation, we observed activation in the left inferior temporal gyrus and in ventrolateral prefrontal cortex (BA45) when subjects performed the segmental manipulation. These regions recently have been associated with a ‘ventral stream’ in language processing. A recent fiber tracking study demonstrated a ventral connection of middle and inferior temporal regions with pars triangularis and pars orbitalis of IFG via the extreme capsule (Saur et al., 2008). The authors of this study proposed that the dorsal stream is involved in sublexical processing, whereas the ventral stream may support higher-level language comprehension. This is compatible with the assumption that the ventral stream is involved in the lexical-semantic processing of speech (Hickok and Poeppel, 2004; Rilling et al., 2008; Specht et al., 2008). Also, the left posterior inferotemporal and the inferior prefrontal cortex have been reported to mediate semantic working memory (Fiebach et al., 2007).
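For reference, the binomial argument made above (15 of 22 subjects showing no Spt increase, with the probability of a spurious non-activation per subject set to 5%) amounts to a simple tail probability; the scipy call below is purely an illustration of that calculation, not the authors' code.

```python
from scipy.stats import binom

n_subjects = 22
n_nonactivated = 15
p_nonactivation = 0.05   # assumed chance of observing no activation if Spt truly responds

# P(X >= 15) under Binomial(n = 22, p = 0.05): survival function evaluated at 14
p_value = binom.sf(n_nonactivated - 1, n_subjects, p_nonactivation)
print(p_value)   # astronomically small, hence p < .001 as reported
```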
In the present study, it may appear somewhat unexpected that segmental manipulations should operate via both dorsal and ventral connections. However, the ventral activation may have been a consequence of stimulus construction. In the segmental condition, stimuli were composed of a pseudoword and a real German definite article (e.g., "der"), thereby constituting a syntactic phrase and indicating that the pseudoword items are, in a grammatical sense, nouns. Additionally, the pseudodiminutives in this condition contained a real German suffix (e.g., "-chen"), perhaps allowing unwanted associative imagery of larger versus smaller objects to infiltrate segmental processing. Hence, the definite article as well as the suffixes may have added implicit 'wordiness' to the pseudowords in the segmental condition, thereby engaging structures of a ventral lexical-semantic processing stream.
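The subject-count argument reported above rests on a simple binomial computation. The following minimal sketch (Python with SciPy; the language and library are our choice for illustration and not part of the original analysis pipeline) reproduces the reported p-value under the stated assumption that a subject who truly activates area Spt fails to reach the liberal threshold in at most 5% of cases.

```python
from scipy.stats import binom

# Assumption taken from the text: if manipulation reliably engaged area Spt,
# each subject would show no supra-threshold activation with probability
# at most 0.05 (the per-subject "non-activation" rate).
n_subjects = 22
n_non_activating = 15
p_non_activation = 0.05

# Probability of observing 15 or more non-activating subjects out of 22
# under that hypothesis: P(X >= 15) = survival function evaluated at 14.
p_value = binom.sf(n_non_activating - 1, n_subjects, p_non_activation)
print(f"P(X >= {n_non_activating}) = {p_value:.2e}")  # roughly 4e-15, far below .001
```

The result is far below the .001 level quoted above, so the conclusion does not hinge on the exact per-subject false-negative rate assumed; even a far more forgiving rate (e.g., 20%) leaves the probability of 15 or more non-activating subjects well below conventional significance thresholds.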


The direct comparison of repetition with manipulation, computed separately for segmental and prosodic processing, showed higher BOLD signal changes in the angular gyrus bilaterally, the posterior cingulate cortex and the medial prefrontal cortex. These signal changes resulted from a stronger deactivation of these areas during manipulation than during repetition. This set of areas corresponds to the 'default mode network' (Raichle et al., 2001), which has been proposed to be active during resting state or during tasks with fewer cognitive requirements, and to be deactivated or suppressed in tasks with higher cognitive demands (Damoiseaux et al., 2006; Greicius and Menon, 2004; McKiernan et al., 2003; Persson et al., 2007; Singh and Fawcett, 2008). In the present study, manipulation likely required more cognitive capacity than repetition and therefore may have led to a stronger deactivation of the default mode network.

Lateralization of segmental versus prosodic processing

In our secondary research question we asked whether specific phonological processes preferentially engage the left and/or the right hemisphere, depending on whether segmental or prosodic operations are involved. We hypothesized that segmental processing is lateralized to the left hemisphere, whereas prosodic processing may be mediated bilaterally or preferentially by the right hemisphere. Both phonological processes evoked a predominantly left-lateralized activation pattern. A left lateralization for segmental processing (as required when altering the quality of a vowel at a subsyllabic level) is fully consistent with previous studies examining segmental phonological processing (e.g., Burton and Small, 2006; Heim et al., 2003). Our expectation of a bilateral or right-lateralized distribution of activation for prosodic processing, however, was not confirmed, at least for the given task, which required a stress shift across a syllable boundary. Instead, we observed activation increases predominantly in left parietal and inferior frontal regions.

The relative involvement of the left hemisphere in processing the prosodic stimuli in the present study might be explained by the specific requirements of our prosodic manipulation task: subjects had to identify and modify the temporal characteristics of the (stressed and unstressed) vowels of the stimuli in order to achieve a shift of stress across the syllable boundary. In addition, reproducing a stimulus after implementing a stress shift may be argued to impose a higher demand on auditory–motor integration, thereby leading to an increased involvement of left hemispheric areas. Cortical involvement has been reported to shift from a right- to a more left-lateralized pattern when subjects are asked to evaluate prosodic features such as pitch range or vowel duration while repeating the stimuli using inner speech, as opposed to a prosodic evaluation without any explicit articulatory demands (Pihan, 2006).

We further examined specific effects of segmental versus prosodic processing by computing a PHONOLOGIC PROCESS by TASK interaction. This interaction controls for the longer trial durations in the segmental compared to the prosodic condition; thus, differences in stimulus length should not contribute to the observed activation (the logic of the contrast is sketched schematically below).
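To make this control explicit, the sketch below writes the interaction out as a contrast over the four condition regressors. The condition labels and their ordering are illustrative assumptions (the actual first-level design-matrix layout is not specified here); the point is only that each manipulation condition is referenced against its own repetition baseline, so condition-specific properties such as stimulus duration cancel out of the comparison.

```python
import numpy as np

# Hypothetical ordering of the four condition regressors in a first-level
# design matrix (labels are ours, chosen for illustration only).
conditions = ["segmental_manipulation", "segmental_repetition",
              "prosodic_manipulation", "prosodic_repetition"]

# PROCESS x TASK interaction:
#   (SegManip - SegRep) - (ProsManip - ProsRep)
# Each process is corrected by its own repetition baseline, so per-condition
# differences in trial or stimulus duration drop out of the contrast.
segmental_gt_prosodic = np.array([1, -1, -1, 1])

# The reverse test (prosodic > segmental manipulation, again controlling for
# repetition) is simply the sign-flipped contrast.
prosodic_gt_segmental = -segmental_gt_prosodic

for name, c in [("segmental > prosodic", segmental_gt_prosodic),
                ("prosodic > segmental", prosodic_gt_segmental)]:
    print(name, dict(zip(conditions, c)))
```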
Nevertheless, segmental versus prosodic processing resulted in a large and highly significant cluster of activation in the dorsolateral IFG, predominantly in the left hemisphere, as well as activation in the left postcentral gyrus (extending to the postcentral sulcus). Among other functions, inferior frontal cortex has been associated with articulatory processes (Bonilha et al., 2006; Hillis et al., 2004). Furthermore, increased dorsolateral IFG activation during segmental processing may be related to morphosyntactic computations: in German, when transforming a noun into its diminutive, the article has to be adapted and the appropriate suffix has to be appended to form a grammatical (diminutive) noun phrase. Such morphosyntactic processes have been associated with Broca's area (Embick et al., 2000; Indefrey et al., 2001). In particular, several functional imaging studies have shown that grammatical gender processing activates part of the IFG (Heim et al., 2002; Hernandez et al., 2004; Padovani et al., 2005). Thus, the segmental manipulation task may have induced more complex linguistic operations. Nearly all subjects took longer to become acquainted with the segmental than with the prosodic manipulation during the practice trials preceding the experiment.

Different linguistic units probably serve as the basis for performing the two types of manipulation. A stress shift, as required for the prosodic manipulation, is based on the whole prosodic word (Mary and Yegnanarayana, 2008) and leaves the segmental make-up of each syllable invariant, whereas a vowel replacement, as required for the segmental manipulation, affects the nucleus of a syllable and requires a re-programming of syllabic units. Since syllables are considered the basic units of speech and are represented as pre-compiled chunks including all relevant phonetic features (Levelt and Wheeldon, 1994), the re-organization of such units in the prosodic manipulation task may have occurred at more superficial processing levels than the re-organization of syllabic plans in the segmental task. This suggestion is substantiated by an earlier observation that the processing of phonemes led to higher BOLD signal changes in the left IFG than the processing of syllables (Siok et al., 2003), and may explain our finding of more inferior frontal activation for the segmental as compared to the prosodic manipulation.

As in our previous study (Peschke et al., 2009), the present tasks revealed strong involvement of the postcentral gyri. Involvement of the left postcentral gyrus (and sulcus) was even stronger when subjects had to replace a vowel during reproduction, compared to when they had to shift the word stress from the first to the second syllable (and vice versa). We previously suggested that postcentral activation may indicate increased reliance on a tactile-sensory feedback control system in order to compensate for the impoverished auditory feedback in the noisy scanner environment (Peschke et al., 2009). Reliance on tactile-sensory feedback may be especially important in the segmental manipulation condition, because segmental auditory information is probably more susceptible to masking by scanner noise than prosodic information.

Finally, testing for stronger effects of prosodic versus segmental manipulation (while controlling for potential differences between the two processes during repetition) revealed no specific effects of prosodic (compared to segmental) processing. As discussed in the preceding section, this may reflect the fact that during prosodic manipulation, re-programming could proceed at the syllabic level, without the need for further sequencing operations at the sub-syllabic level. This suggests that prosodic manipulation, at least as required by the present task of stress shifts across syllable boundaries, is mediated by inferior parietal and inferior frontal (pars triangularis) areas, without necessary recourse to right hemispheric regions.

Summary

The present study was designed to examine the areas involved in phonological manipulations during auditory–motor integration. We demonstrated that the left IPS and the left dorsal IFG are involved in the manipulation of phonological representations independent of the linguistic level (i.e., segmental versus prosodic) at which the manipulation is performed.
We propose that these parieto-frontal regions are co-recruited when the task requires explicit (phonological) manipulation, in addition to the more automated transfer of auditory into articulatory verbal codes, which appears to involve area Spt. As predicted, segmental manipulation resulted in predominantly left hemispheric activation, involving left parietal cortex and large portions of left inferior frontal cortex. Prosodic manipulation did not evoke bilateral or right-lateralized activation as expected, but a clearly left-lateralized pattern, perhaps due to a task-dependent focus on temporal stimulus characteristics. Segmental compared to prosodic operations resulted in a left-lateralized activation pattern, probably because segmental processes are performed at the subsyllabic phoneme level, which may require a higher number of sequencing operations than prosodic processing. Such sequencing operations have been primarily associated with inferior frontal areas. Contrary to our predictions, a direct comparison of prosodic with segmental manipulation did not result in increased activation, which may be due to the specific requirements of the prosodic condition, namely a lower degree of morphosyntactic operations and processing at the syllabic level.

Acknowledgments

This work was supported by a grant from the German Federal Ministry of Education and Research (BMBF-01GW0572) to WZ and AB and was carried out as part of the collaborative BMBF research project "From dynamic sensorimotor interaction to conceptual representation: Deconstructing apraxia". We thank two anonymous reviewers for their constructive comments.

Appendix A. Supplementary data

Supplementary data to this article can be found online at doi:10.1016/j.neuroimage.2011.07.025.

References

Andersson, J.L.R., Hutton, C., Ashburner, J., Turner, R., Friston, K., 2001. Modelling geometric deformations in EPI time series. Neuroimage 13, 903–919.
Arciuli, J., Slowiaczek, L.M., 2007. The where and when of linguistic word-level prosody. Neuropsychologia 45, 2638–2642.
Ashtari, M., Lencz, T., Zuffante, P., Bilder, R., Clarke, T., Diamond, A., Kane, J., Szeszko, P., 2004. Left middle temporal gyrus activation during a phonemic discrimination task. Neuroreport 15, 389–393.
Awh, E., Jonides, J., Smith, E.E., Schumacher, E.H., Koeppe, R.A., Katz, S., 1996. Dissociation of storage and rehearsal in verbal working memory: evidence from positron emission tomography. Psychol. Sci. 7, 25–31.
Barrick, T.R., Lawes, I.N., Mackay, C.E., Clark, C.A., 2007. White matter pathway asymmetry underlies functional lateralization. Cereb. Cortex 17, 591–598.
Bonilha, L., Moser, D., Rorden, C., Baylis, G.C., Fridriksson, J., 2006. Speech apraxia without oral apraxia: can normal brain function explain the physiopathology? Neuroreport 17, 1027–1031.
Bradvik, B., Dravins, C., Holtas, S., Rosen, I., Ryding, E., Ingvar, D.H., 1991. Disturbances of speech prosody following right hemisphere infarcts. Acta Neurol. Scand. 84, 114–126.
Buchel, C., Raedler, T., Sommer, M., Sach, M., Weiller, C., Koch, M.A., 2004. White matter asymmetry in the human brain: a diffusion tensor MRI study. Cereb. Cortex 14, 945–951.
Buchsbaum, B.R., D'Esposito, M., 2008. The search for the phonological store: from loop to convolution. J. Cogn. Neurosci. 20, 762–778.
Buchsbaum, B.R., Hickok, G., Humphries, C., 2001. Role of left superior temporal gyrus in phonological processing for speech perception and production. Cogn. Sci. 25, 663–678.
Buchsbaum, B.R., Olsen, R.K., Koch, P., Berman, K.F., 2005a. Human dorsal and ventral auditory streams subserve rehearsal-based and echoic processes during verbal working memory. Neuron 48, 687–697.
Buchsbaum, B.R., Olsen, R.K., Koch, P.F., Kohn, P., Kippenhan, J.S., Berman, K.F., 2005b. Reading, hearing, and the planum temporale. Neuroimage 24, 444–454.
Buchsbaum, B.R., Baldo, J., Okada, K., Berman, K.F., Dronkers, N., D'Esposito, M., Hickok, G., 2011. Conduction aphasia, sensory–motor integration, and phonological short-term memory — an aggregate analysis of lesion and fMRI data. Brain Lang. doi:10.1016/j.bandl.2010.12.001.
Burton, M.W., Small, S.L., 2006. Functional neuroanatomy of segmenting speech and nonspeech. Cortex 42, 644–651.
Burton, M.W., Small, S.L., Blumstein, S.E., 2000. The role of segmentation in phonological processing: an fMRI investigation. J. Cogn. Neurosci. 12, 679–690.
Callan, D.E., Tsytsarev, V., Hanakawa, T., Callan, A.M., Katsuhara, M., Fukuyama, H., Turner, R., 2006. Song and speech: brain regions involved with perception and covert production. Neuroimage 31, 1327–1342.
Catani, M., Jones, D.K., ffytche, D.H., 2005. Perisylvian language networks of the human brain. Ann. Neurol. 57, 8–16.
Champod, A.S., Petrides, M., 2010. Dissociation within the frontoparietal network in verbal working memory: a parametric functional magnetic resonance imaging study. J. Neurosci. 30, 3849–3856.
Damoiseaux, J.S., Rombouts, S.A., Barkhof, F., Scheltens, P., Stam, C.J., Smith, S.M., Beckmann, C.F., 2006. Consistent resting-state networks across healthy subjects. Proc. Natl. Acad. Sci. U.S.A. 103, 13848–13853.
Dhanjal, N.S., Handunnetthi, L., Patel, M.C., Wise, R.J.S., 2008. Perceptual systems controlling speech production. J. Neurosci. 28, 9969–9975.

Dogil, G., Ackermann, H., Grodd, W., Haider, H., Kamp, H., Mayer, J., Riecker, A., Wildgruber, D., 2002. The speaking brain: a tutorial introduction to fMRI experiments in the production of speech, prosody and syntax. J. Neurolinguist. 15, 59–90.
Eickhoff, S.B., Stephan, K.E., Mohlberg, H., Grefkes, C., Fink, G.R., Amunts, K., Zilles, K., 2005. A new SPM toolbox for combining probabilistic cytoarchitectonic maps and functional imaging data. Neuroimage 25, 1325–1335.
Embick, D., Marantz, A., Miyashita, Y., O'Neil, W., Sakai, K.L., 2000. A syntactic specialization for Broca's area. Proc. Natl. Acad. Sci. U.S.A. 97, 6150–6154.
Emmorey, K.D., 1987. The neurological substrates for prosodic aspects of speech. Brain Lang. 30, 305–320.
Fiebach, C.J., Friederici, A.D., Smith, E.E., Swinney, D., 2007. Lateral inferotemporal cortex maintains conceptual-semantic representations in verbal working memory. J. Cogn. Neurosci. 19, 2035–2049.
Foster, N.E.V., Zatorre, R.J., 2009. A role for the intraparietal sulcus in transforming musical pitch information. Cereb. Cortex 20, 1350–1359.
Frey, S., Campbell, J.S., Pike, G.B., Petrides, M., 2008. Dissociating the human language pathways with high angular resolution diffusion fiber tractography. J. Neurosci. 28, 11435–11444.
Gelfand, J.R., Bookheimer, S.Y., 2003. Dissociating neural mechanisms of temporal sequencing and processing phonemes. Neuron 38, 831–842.
Greicius, M.D., Menon, V., 2004. Default-mode activity during a passive sensory task: uncoupled from deactivation but impacting activation. J. Cogn. Neurosci. 16, 1484–1492.
Guenther, F.H., 2006. Cortical interactions underlying the production of speech sounds. J. Commun. Disord. 39, 350–365.
Guenther, F.H., Ghosh, S.S., Tourville, J.A., 2006. Neural modeling and imaging of the cortical interactions underlying syllable production. Brain Lang. 96, 280–301.
Heim, S., Opitz, B., Friederici, A.D., 2002. Broca's area in the human brain is involved in the selection of grammatical gender for language production: evidence from event-related functional magnetic resonance imaging. Neurosci. Lett. 328, 101–104.
Heim, S., Opitz, B., Muller, K., Friederici, A.D., 2003. Phonological processing during language production: fMRI evidence for a shared production–comprehension network. Brain Res. Cogn. Brain Res. 16, 285–296.
Hernandez, A.E., Kotz, S.A., Hofmann, J., Valentin, V.V., Dapretto, M., Bookheimer, S.Y., 2004. The neural correlates of grammatical gender decisions in Spanish. Neuroreport 15, 863–866.
Hickok, G., 2009. The functional neuroanatomy of language. Phys. Life Rev. 6, 121–143.
Hickok, G., Poeppel, D., 2000. Towards a functional neuroanatomy of speech perception. Trends Cogn. Sci. 4, 131–138.
Hickok, G., Poeppel, D., 2004. Dorsal and ventral streams: a framework for understanding aspects of the functional anatomy of language. Cognition 92, 67–99.
Hickok, G., Poeppel, D., 2007. The cortical organization of speech processing. Nat. Rev. Neurosci. 8, 393–402.
Hickok, G., Buchsbaum, B.R., Humphries, C., Muftuler, T., 2003. Auditory–motor interaction revealed by fMRI: speech, music and working memory in area Spt. J. Cogn. Neurosci. 15, 673–682.
Hickok, G., Okada, K., Barr, W., Pa, J., Rogalsky, C., Donnelly, K., Barde, L., Grant, A., 2008. Bilateral capacity for speech sound processing in auditory comprehension: evidence from Wada procedures. Brain Lang. 107, 179–184.
Hillis, A.E., Work, M., Barker, P.B., Jacobs, M.A., Breese, E.L., Maurer, K., 2004. Reexamining the brain regions crucial for orchestrating speech articulation. Brain 127, 1479–1487.
Houde, J.F., Jordan, M.I., 2002. Sensorimotor adaptation of speech I: compensation and adaptation. J. Speech Lang. Hear. Res. 45, 295–310.
Hutton, C., Bork, A., Josephs, O., Deichmann, R., Ashburner, J., Turner, R., 2002. Image distortion correction in fMRI: a quantitative evaluation. Neuroimage 16, 217–240.
Indefrey, P., Hagoort, P., Herzog, H., Seitz, R.J., Brown, C.M., 2001. Syntactic processing in left prefrontal cortex is independent of lexical meaning. Neuroimage 14, 546–555.
Jacquemot, C., Scott, S.K., 2006. What is the relationship between phonological short-term memory and speech processing? Trends Cogn. Sci. 10, 480–486.
Jacquemot, C., Pallier, C., LeBihan, D., Dehaene, S., Dupoux, E., 2003. Phonological grammar shapes the auditory cortex: a functional magnetic resonance imaging study. J. Neurosci. 23, 9541–9546.
Kappes, J., Baumgaertner, A., Peschke, C., Ziegler, W., 2009. Unintended imitation in nonword repetition. Brain Lang. 111, 140–151.
Kuhl, P.K., 2000. A new view of language acquisition. Proc. Natl. Acad. Sci. U.S.A. 97, 11850–11857.
Landis, J.R., Koch, G.G., 1977. The measurement of observer agreement for categorical data. Biometrics 33, 159–174.
Larson, C.R., Burnett, T.A., Kiran, S., Hain, T.C., 2000. Effects of pitch-shift velocity on voice F0 responses. J. Acoust. Soc. Am. 107, 559–564.
Levelt, W.J.M., Wheeldon, L., 1994. Do speakers have access to a mental syllabary? Cognition 50, 239–269.
Liddle, P.F., Kiehl, K.A., Smith, A.M., 2001. Event-related fMRI study of response inhibition. Hum. Brain Mapp. 12, 100–109.
Lotze, M., Seggewies, G., Erb, M., Grodd, W., Birbaumer, N., 2000. The representation of articulation in the primary sensorimotor cortex. Neuroreport 11, 2985–2989.
Martin, R.C., Wu, D., Freedman, M., Jackson, E.F., Lesch, M., 2003. An event-related fMRI investigation of phonological versus semantic short-term memory. J. Neurolinguist. 16, 341–360.
Mary, L., Yegnanarayana, B., 2008. Extraction and representation of prosodic features for language and speaker recognition. Speech Comm. 50, 782–796.
McKiernan, K.A., Kaufman, J.N., Kucera-Thompson, J., Binder, J.R., 2003. A parametric manipulation of factors affecting task-induced deactivation in functional neuroimaging. J. Cogn. Neurosci. 15, 394–408.

Newman, S.D., Twieg, D., 2001. Differences in auditory processing of words and pseudowords: an fMRI study. Hum. Brain Mapp. 14, 39–47.
Nichols, T., Brett, M., Andersson, J., Wager, T., Poline, J.B., 2005. Valid conjunction inference with the minimum statistic. Neuroimage 25, 653–660.
Okada, K., Hickok, G., 2006. Left posterior auditory-related cortices participate both in speech perception and speech production: neural overlap revealed by fMRI. Brain Lang. 98, 112–117.
Okada, K., Smith, K.R., Humphries, C., Hickok, G., 2003. Word length modulates neural activity in auditory cortex during covert object naming. Neuroreport 14, 2323–2326.
Oldfield, R.C., 1971. The assessment and analysis of handedness: the Edinburgh inventory. Neuropsychologia 9, 97–113.
Pa, J., Hickok, G., 2008. A parietal–temporal sensory–motor integration area for the human vocal tract: evidence from an fMRI study of skilled musicians. Neuropsychologia 46, 362–368.
Padovani, R., Calandra-Buonaura, G., Cacciari, C., Benuzzi, F., Nichelli, P., 2005. Grammatical gender in the brain: evidence from an fMRI study on Italian. Brain Res. Bull. 65, 301–308.
Papathanassiou, D., Etard, O., Mellet, E., Zago, L., Mazoyer, B., Tzourio-Mazoyer, N., 2000. A common language network for comprehension and production: a contribution to the definition of language epicenters with PET. Neuroimage 11, 347–357.
Parker, G.J., Luzzi, S., Alexander, D.C., Wheeler-Kingshott, C.A., Ciccarelli, O., Lambon Ralph, M.A., 2005. Lateralization of ventral and dorsal auditory–language pathways in the human brain. Neuroimage 24, 656–666.
Paulesu, E., Frith, C.D., Frackowiak, R.S., 1993. The neural correlates of the verbal component of working memory. Nature 362, 342–345.
Paus, T., Zijdenbos, A., Worsley, K., Collins, D.L., Blumenthal, J., Giedd, J.N., Rapoport, J.L., Evans, A.C., 1999. Structural maturation of neural pathways in children and adolescents: in vivo study. Science 283, 1908–1911.
Pedersen, J.R., Johannsen, P., Bak, C.K., Kofoed, B., Saermark, K., Gjedde, A., 1998. Origin of human motor readiness field linked to left middle frontal gyrus by MEG and PET. Neuroimage 8, 214–220.
Persson, J., Lustig, C., Nelson, J.K., Reuter-Lorenz, P.A., 2007. Age differences in deactivation: a link to cognitive control? J. Cogn. Neurosci. 19, 1021–1032.
Peschke, C., Ziegler, W., Kappes, J., Baumgaertner, A., 2009. Auditory–motor integration during fast repetition: the neuronal correlates of shadowing. Neuroimage 47, 392–402.
Pihan, H., 2006. Affective and linguistic processing of speech prosody: DC potential studies. Prog. Brain Res. 156, 269–284.
Price, C.J., 2010. The anatomy of language: a review of 100 fMRI studies published in 2009. Ann. N. Y. Acad. Sci. 1191, 62–88.
Pulvermuller, F., Huss, M., Kherif, F., Moscoso del Prado Martin, F., Hauk, O., Shtyrov, Y., 2006. Motor cortex maps articulatory features of speech sounds. Proc. Natl. Acad. Sci. U.S.A. 103, 7865–7870.
Purcell, D.W., Munhall, K.G., 2006. Adaptive control of vowel formant frequency: evidence from real-time formant manipulation. J. Acoust. Soc. Am. 120, 966–977.
Raichle, M.E., MacLeod, A.M., Snyder, A.Z., Powers, W.J., Gusnard, D.A., Shulman, G.L., 2001. A default mode of brain function. Proc. Natl. Acad. Sci. U.S.A. 98, 676–682.
Ravizza, S.M., Delgado, M.R., Chein, J.M., Becker, J.T., Fiez, J.A., 2004. Functional dissociations within the inferior parietal cortex in verbal working memory. Neuroimage 22, 562–573.


Riecker, A., Wildgruber, D., Dogil, G., Grodd, W., Ackermann, H., 2002. Hemispheric lateralization effects of rhythm implementation during syllable repetitions: an fMRI study. Neuroimage 16, 169–176.
Riecker, A., Mathiak, K., Wildgruber, D., Erb, M., Hertrich, I., Grodd, W., Ackermann, H., 2005. fMRI reveals two distinct cerebral networks subserving speech motor control. Neurology 64, 700–706.
Rilling, J.K., Glasser, M.F., Preuss, T.M., Ma, X., Zhao, T., Hu, X., Behrens, T.E., 2008. The evolution of the arcuate fasciculus revealed with comparative DTI. Nat. Neurosci. 11, 426–428.
Rubia, K., Russell, T., Bullmore, E.T., Soni, W., Brammer, M.J., Simmons, A., Taylor, E., Andrew, C., Giampietro, V., Sharma, T., 2001. An fMRI study of reduced left prefrontal activation in schizophrenia during normal inhibitory function. Schizophr. Res. 52, 47–55.
Rypma, B., Prabhakaran, V., Desmond, J.E., Glover, G.H., Gabrieli, J.D., 1999. Load-dependent roles of frontal brain regions in the maintenance of working memory. Neuroimage 9, 216–226.
Saur, D., Kreher, B.W., Schnell, S., Kummerer, D., Kellmeyer, P., Vry, M.S., Umarova, R., Musso, M., Glauche, V., Abel, S., Huber, W., Rijntjes, M., Hennig, J., Weiller, C., 2008. Ventral and dorsal pathways for language. Proc. Natl. Acad. Sci. U.S.A. 105, 18035–18040.
Singh, K.D., Fawcett, I.P., 2008. Transient and linearly graded deactivation of the human default-mode network by a visual detection task. Neuroimage 41, 100–112.
Siok, W.T., Jin, Z., Fletcher, P., Tan, L.H., 2003. Distinct brain regions associated with syllable and phoneme. Hum. Brain Mapp. 18, 201–207.
Smith, E.E., Jonides, J., Koeppe, R.A., 1996. Dissociating verbal and spatial working memory using PET. Cereb. Cortex 6, 11–20.
Specht, K., Huber, W., Willmes, K., Shah, N.J., Jäncke, L., 2008. Tracing the ventral stream for auditory speech processing in the temporal lobe by using a combined time series and independent component analysis. Neurosci. Lett. 442, 180–185.
Strand, F., Forssberg, H., Klingberg, T., Norrelgen, F., 2008. Phonological working memory with auditory presentation of pseudo-words — an event related fMRI study. Brain Res. 1212, 48–54.
Tourville, J.A., Reilly, K.J., Guenther, F.H., 2008. Neural mechanisms underlying auditory feedback control of speech. Neuroimage 39, 1429–1443.
Van Lancker, D., 1980. Cerebral lateralization of pitch cues in the linguistic signal. Paper Ling. Int. J. Hum. Comm. 13, 201–277.
Warren, J.E., Wise, R.J., Warren, J.D., 2005. Sounds do-able: auditory–motor transformations and the posterior temporal plane. Trends Neurosci. 28, 636–643.
Weintraub, S., Mesulam, M.M., Kramer, L., 1981. Disturbances in prosody. A right-hemisphere contribution to language. Arch. Neurol. 38, 742–744.
Westbury, C.F., Zatorre, R.J., Evans, A.C., 1999. Quantifying variability in the planum temporale: a probability map. Cereb. Cortex 9, 392–405.
Wilke, M., Lidzba, K., 2007. LI-tool: a new toolbox to assess lateralization in functional MR-data. J. Neurosci. Methods 163, 128–136.
Wilke, M., Schmithorst, V.J., 2006. A combined bootstrap/histogram analysis approach for computing a lateralization index from neuroimaging data. Neuroimage 33, 522–530.
Wilson, S.M., Iacoboni, M., 2006. Neural responses to non-native phonemes varying in producibility: evidence for the sensorimotor nature of speech perception. Neuroimage 33, 316–325.
Zaehle, T., Geiser, E., Alter, K., Jancke, L., Meyer, M., 2008. Segmental processing in the human auditory dorsal stream. Brain Res. 1220, 179–190.