Neuropsychologia 42 (2004) 183–200
Auditory lexical decision, categorical perception, and FM direction discrimination differentially engage left and right auditory cortex

David Poeppel a,∗, Andre Guillemin b, Jennifer Thompson b, Jonathan Fritz c, Daphne Bavelier d, Allen R. Braun b

a Cognitive Neuroscience of Language Laboratory, Departments of Linguistics and Biology, University of Maryland, 1401 Marie Mount Hall, College Park, MD 20742, USA
b Language Section, Voice, Speech, and Language Branch, National Institute of Deafness and Other Communication Disorders, Bethesda, MD 20892, USA
c Institute for Systems Research, University of Maryland, College Park, MD 20742, USA
d Department of Brain and Cognitive Sciences, University of Rochester, Rochester, NY 14627, USA

Received 1 June 2002; received in revised form 22 November 2002; accepted 22 July 2003
Abstract

Recent neuroimaging and neuropsychological data suggest that speech perception is supported bilaterally in auditory areas. We evaluate this issue by building on well-known behavioral effects. While undergoing positron emission tomography (PET), subjects performed standard auditory tasks: direction discrimination of frequency-modulated (FM) tones, categorical perception (CP) of consonant–vowel (CV) syllables, and word/non-word judgments (lexical decision, LD). Compared to rest, all three conditions led to bilateral activation of the auditory cortices. However, lateralization patterns differed as a function of stimulus type: the LD task generated stronger responses in the left hemisphere, the FM task a stronger response in the right. Contrasts of either words or syllables versus FM were associated with significantly greater activity bilaterally in the superior temporal gyrus (STG) ventro-lateral to Heschl's gyrus. These activations extended into the superior temporal sulcus (STS) and the middle temporal gyrus (MTG) and were greater on the left. The same areas were more active in the LD than in the CP task. In contrast, the FM task was associated with significantly greater activity in the right lateral–posterior STG and lateral MTG. The findings argue that speech perception is mediated bilaterally in the auditory cortices and that the well-documented lateralization is likely associated with processes subsequent to the auditory analysis of speech.
© 2003 Elsevier Ltd. All rights reserved.

Keywords: Hemispheric asymmetry; Speech perception; Word recognition; Spectral; Temporal
1. Introduction

Despite much recent work on the functional architecture of speech perception, some basic issues remain unresolved, including coarse functional anatomic questions such as hemispheric lateralization. One point of debate concerns the degree to which there is a significant bilateral contribution to speech perception (construed as the process of analyzing and transforming the continuous waveform input into representations suitable to interface with the mental lexicon), notwithstanding the fact that language processing beyond the input interface of speech perception is highly lateralized (Binder et al., 1997, 2000; Giraud & Price, 2001; Hickok & Poeppel, 2000; Mummery, Ashburner, Scott, & Wise, 1999; Norris & Wise, 2000; Scott et al., 2000).
∗ Corresponding author. Tel.: +1-301-405-1016; fax: +1-301-405-7104. E-mail address: [email protected] (D. Poeppel).
0028-3932/$ – see front matter © 2003 Elsevier Ltd. All rights reserved. doi:10.1016/j.neuropsychologia.2003.07.010
Previous work investigating the neural basis of speech has used behavioral tasks such as phoneme monitoring (Démonet et al., 1992) or listening to rotated speech (Scott et al., 2000). To complement these studies, we investigate the cortical architecture of speech perception building on canonical psychophysical phenomena, contrasting three standard auditory paradigms: (i) direction discrimination (up/down) of frequency-modulated (FM) signals; (ii) categorical perception (CP) of consonant–vowel (CV) syllables (ba/pa) varying along an acoustic voice-onset time (VOT) continuum; and (iii) lexical decision on phonologically permissible targets (word/non-word). FM sweeps (adjusted to have the same frequency range as speech without eliciting speech-like percepts) are used to evaluate elementary auditory processing of dynamic signals (e.g. Gordon & Poeppel, 2002). CP of CV syllables varying in one acoustic parameter is a phenomenon that has been used extensively to probe mechanisms of speech perception. The successful execution of a
CP task requires precise analysis of the speech sound but does not entail any (obvious) lexical–semantic processing (Liberman, Harris, Hoffman, & Griffith, 1957). In an auditory lexical decision (LD) task, subjects must judge whether or not an auditory target (e.g. "blicket") is a word. Execution of this task requires lexical access, or lexical search, in addition to the analysis of the speech signal (for review, see e.g. Goldinger, 1996). We attempted to minimize task effects: in all three paradigms subjects execute a single-trial two-alternative forced choice on signals presented at the same rate. By hypothesis, in all cases there is an initial processing stage that constructs spectro-temporal representations of the signals. Subsequently, these representations interface with different systems. Words and non-words elicit lexical access (requiring speech analysis plus lexical search), syllables elicit processing of the speech signal but not (the same degree of) lexical analysis, and FM processing requires spectral analysis but neither speech nor lexical processing. These differences should be reflected in distinct functional anatomic substrates.

Three questions are investigated. First, we evaluate whether the processing of speech is mediated bilaterally in the superior temporal gyri. Second, we test whether words additionally activate extra-temporal areas and reflect the lateralization that is typical of language processing beyond the analysis of the input signal. Third, we explore the hypothesis that left and right non-primary auditory areas differentially contribute to the analysis of speech and other complex signals by comparing which signals and tasks drive the (most likely non-primary) areas more effectively.

2. Materials and methods

2.1. Subjects
Participants were five males and five females (mean age, 26 years; range, 19–34 years). All participants had graduated from or were attending a 4-year college. All were right-handed according to the Oldfield handedness inventory (Oldfield, 1971), were native English speakers, and had normal physical, audiometric, and neurological examinations. All participants gave written consent after the nature and possible consequences of the study were explained, and each was paid for participating.

2.2. Stimulus materials and apparatus

All materials were recorded on an Apple Power Macintosh computer using Macromedia SoundEdit 16 (Macromedia, Inc., 1990–1996). SoundEdit files were converted to Macintosh System 7 sound resource files using SoundApp (Franke, 1999). All materials were presented auditorily to participants via the program RSVP (courtesy of Michael Tarr, Brown University) using an Apple Power Macintosh computer (Apple Computer, Inc., Cupertino, CA) playing through an AIWA LCX-800M stereo system (AIWA America, Inc., Mahwah, NJ). The output from the stereo was presented through two audio speakers situated on the sides of the positron emission tomography (PET) scanner gantry. The speakers were equidistant from, dorsal and inferior to, either side of the subjects' ears. Subjects indicated their responses by pressing a response button held in each hand. Responses in the two-alternative forced-choice experiments were recorded by the computer and program used to present the stimuli.

Materials for the FM sweeps condition (FM) consisted of eight frequency-modulated signals of 380 ms duration (sinusoidal carrier): four linearly rising FM sweeps (200–3200, 200–1600, 200–800 and 200–400 Hz) and four linearly falling FM sweeps (3200–200, 1600–200, 800–200 and 400–200 Hz). The mean presentation amplitude of the sweeps was 76 dB. The stimuli for the CP condition consisted of seven synthesized CV syllables of 386 ms total duration with VOTs of 5, 10, 15, 25, 30, 35 and 45 ms. The syllables were synthesized using the SenSyn implementation of the Klatt synthesizer (Sensimetrics, Cambridge, MA) with the synthesis parameters reported previously (Poeppel et al., 1996). The presentation amplitude of the CV syllables was 85 dB (range, 84–86). Materials for the LD condition consisted of 200 single-syllable words (e.g. lease, fruit, herb, lead) and 200 single-syllable phonologically permissible non-words (e.g. tice, treek, jide, zumb), spoken by an adult male speaker. Mean word duration was 537 ms (range, 528–550). The presentation amplitude of the words was 85 dB (range, 77–93). The difference in duration between the words and the other two stimulus types is problematic; however, to maintain natural word stimuli, we had to compromise on this issue.
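For illustration, a linearly rising or falling sweep of this kind can be synthesized by integrating the instantaneous frequency to obtain the phase. The following is a minimal sketch, not the authors' synthesis code; the sample rate and the onset/offset ramps are our assumptions, as they are not reported in the text.

```python
import numpy as np

def linear_fm_sweep(f_start, f_end, dur=0.380, fs=44100, ramp=0.010):
    """Linearly frequency-modulated tone with a sinusoidal carrier.

    f_start, f_end : sweep endpoints in Hz (rising if f_end > f_start).
    dur            : duration in seconds (380 ms in the study).
    fs, ramp       : sample rate and onset/offset ramp duration; these
                     are assumptions, not values reported in the paper.
    """
    t = np.arange(int(dur * fs)) / fs
    # Instantaneous frequency f(t) = f_start + k*t with k = (f_end - f_start)/dur;
    # integrating gives the phase: phi(t) = 2*pi*(f_start*t + 0.5*k*t^2).
    k = (f_end - f_start) / dur
    x = np.sin(2 * np.pi * (f_start * t + 0.5 * k * t ** 2))
    # Raised-cosine ramps to avoid onset/offset clicks.
    n = int(ramp * fs)
    env = np.ones_like(x)
    env[:n] = 0.5 * (1 - np.cos(np.pi * np.arange(n) / n))
    env[-n:] = env[:n][::-1]
    return x * env

# The eight stimuli: four rising and four falling sweeps.
edges = [(200, 3200), (200, 1600), (200, 800), (200, 400)]
stimuli = [linear_fm_sweep(a, b) for a, b in edges]    # rising
stimuli += [linear_fm_sweep(b, a) for a, b in edges]   # falling
```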
2.3. Scanning methods

PET scans were performed on a GE Advance tomograph (Milwaukee, WI). The scanner has an axial field of view of 15.3 cm and an axial and in-plane resolution of 5.5 mm FWHM; 35 contiguous axial planes, offset by 4.25 mm (center to center), were acquired simultaneously. Subjects' eyes were patched, and head motion was restricted during the scans by a thermoplastic mask. For each scan, 10 mCi of H2 15O were injected as an intravenous bolus in 6–8 cm3 of normal saline. Auditory tasks were begun 30 s prior to the injection of radiotracer and continued throughout the scanning period. Scans were initiated automatically when the count rate in the brain reached a threshold value of 8000 s−1, approximately 20 s after injection, and data acquisition continued for 1 min. Studies were separated by 5-min intervals, with background scans acquired for count correction beginning 1 min prior to each H2 15O injection. A transmission scan using a rotating Ge-68/Ga-68 pin source was performed for attenuation correction before the rCBF scans.
2.4. Procedure

Each participant underwent five scans in each of the three experimental conditions, as well as five scans in a resting control condition. Participants were given response instructions for each condition before being placed in the scanner. For the FM sweeps condition, participants were instructed to indicate whether a stimulus was rising or falling by pressing the right or left response button, respectively. For the categorical perception condition, participants were instructed to indicate whether a stimulus sounded like /ba/ or /pa/ by pressing the right or left response button, respectively. For the lexical decision condition, participants were told to indicate whether a stimulus was a word or a non-word by pressing the right or left response button, respectively. For the resting control condition, participants alternately pressed the left and right response buttons; no sounds were played in this condition. All participants wore opaque eye patches to block out ambient light.

Each trial started with the stimulus being played, followed by a response period during which the participant could indicate his or her response. Reaction time was measured from the start of stimulus playback. All trials in all conditions were 1500 ms in length. The presentation order of all stimuli in all conditions was determined by a pseudo-random block design. In each scan in the FM sweeps condition, participants were played 10 blocks of stimuli, each block containing the four rising and four falling sweeps, with each sweep appearing once per block. In each scan in the categorical perception condition, participants were played 11 blocks of stimuli, each block containing the seven syllables, with each syllable appearing once per block. In each scan in the lexical decision condition, participants were played 10 blocks of stimuli, each block containing four words and four non-words; no word or non-word was repeated during the experiment.

2.5. Analysis

Calculations and image processing were carried out on a SUN Ultra 60 workstation using Matlab (MathWorks, Natick, MA) and SPM96 software (Wellcome Department of Cognitive Neurology, London, UK). To correct for head movement between scans, images were aligned on a voxel-by-voxel basis using a 3-D automated image registration algorithm (Woods, Cherry, & Mazziotta, 1992). Images were stereotaxically normalized into a canonical space (Talairach & Tournoux, 1988) and smoothed using a Gaussian filter of 15 mm × 15 mm × 9 mm in the x, y and z axes. The SPM analysis, which used the "multi-subject: with replications" design, is an implementation of the General Linear Model (Friston, 1995), equivalent to an ANOVA applied on a voxel-by-voxel basis in which the task effect is the parameter of interest and global activity and inter- and intra-subject variability are confounding effects. Images are scaled to a global mean rCBF of 50 ml/100 g/min.
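As an aside for readers less familiar with this style of analysis, the following is a minimal, self-contained sketch of a voxel-wise General Linear Model of the kind just described, applied to synthetic data for a single voxel. It is not SPM96 code; the dimensions, effect size, and contrast weights are ours, for illustration only.

```python
import numpy as np
from scipy import stats

# Toy data: 10 subjects x 4 conditions (rest, LD, CP, FM) x 5 replications.
n_subj, n_cond, n_rep = 10, 4, 5
n_scans = n_subj * n_cond * n_rep
rng = np.random.default_rng(0)

subj = np.repeat(np.arange(n_subj), n_cond * n_rep)
cond = np.tile(np.repeat(np.arange(n_cond), n_rep), n_subj)
# Normalized rCBF for one voxel (global mean scaled to 50 ml/100 g/min),
# with a small simulated task effect in condition 1 (say, LD).
y = 50 + 0.6 * (cond == 1) + rng.normal(0, 1, n_scans)

# Design matrix: condition indicators (of interest) plus subject
# indicators (confounds; subject 0 dropped to keep full column rank).
X = np.column_stack([(cond == c).astype(float) for c in range(n_cond)] +
                    [(subj == s).astype(float) for s in range(1, n_subj)])

beta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
dof = n_scans - rank
sigma2 = resid @ resid / dof

# Contrast 'LD - rest' and its t-statistic, then t -> standard normal z.
c = np.zeros(X.shape[1]); c[0], c[1] = -1.0, 1.0
t = (c @ beta) / np.sqrt(sigma2 * c @ np.linalg.pinv(X.T @ X) @ c)
z = stats.norm.isf(stats.t.sf(t, dof))
print(f"t({dof}) = {t:.2f}, z = {z:.2f}")
```

Treating subjects as confound columns is what makes the task effect the parameter of interest, mirroring the ANOVA formulation above; SPM applies such a fit independently at every voxel.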
Two sets of within-group contrasts were performed: (1) each auditory task was compared to rest (R): LD–R, CP–R and FM–R; (2) contrasts between the auditory tasks were then carried out: LD–FM and LD–CP (to evaluate lexical processing); FM–LD and FM–CP (to evaluate FM processing); and CP–LD and CP–FM (to evaluate syllable processing). The resulting set of voxel values for each contrast constitutes a statistical parametric map of the t-statistic (SPM{t}), which is then transformed to standard normal (SPM{z}) scores. Tests of significance based on the size of the activated region (Friston, Worsley, Frackowiak, Mazziotta, & Evans, 1994) were performed for each contrast. For the auditory task versus rest contrasts, mean differences in normalized rCBF at selected voxels of interest were extracted from the SPM output for purposes of illustration.

3. Results

3.1. Behavioral results

Fig. 1. Behavioral data for the three tasks. (a and b) Lexical decision data. Note that although there is no accuracy difference, the longer reaction time for pronounceable non-words (b) reflects that these items require additional time for an accurate lexical decision. (c and d) Response profile for the CV-syllable categorical perception task. Note the sharp (categorical) drop-off in judgments between 20 and 30 ms VOT, a typical judgment profile for native English speakers. The reaction time data in (d) likewise reflect the increased processing time at the point of uncertainty between 20 and 30 ms VOT. (e) Proportion correct and (f) reaction time for the up/down FM discrimination task at each FM rate. FM rate increases from left to right.

3.1.1. Lexical decision
Fig. 1a and b summarize the LD behavioral data. Participants correctly made word/non-word judgments on 87% of trials; the mean time to make these judgments was 980 ms, and participants timed out on 1% of trials. There was no significant difference in accuracy between judgments of words and non-words (89% versus 85%, respectively; F(1, 9) = 1.14, P > 0.10). However, Fig. 1b shows the typical finding for lexical decision studies contrasting words and pronounceable non-words: participants were faster to judge words than non-words (951 ms versus 1009 ms, respectively; F(1, 9) = 16.42, P < 0.01).

3.1.2. Categorical perception
The response profile generated by subjects in the syllable categorization task is shown in Fig. 1c. The judgments matched those made by participants in previous studies of CV categorical perception (Liberman et al., 1957). Specifically, the continuously varying variable VOT is treated discontinuously in perception: equal acoustic steps (10 ms VOT) are classified into discrete bins, with syllables with VOTs of 5, 10, or 15 ms all categorized as the voiced stop /ba/, whereas syllables with VOTs of 25 ms or longer are classified as the voiceless stop /pa/. Participants' mean times to make category judgments are shown in Fig. 1d. Overall, participants took 716 ms to make these judgments and timed out on less than 1% of trials. Performance was analyzed in a one-way within-subjects ANOVA. Replicating the well-known categorical perception reaction time profile, there was an effect of VOT that was due to the increased time taken to classify the 25 and 30 ms (boundary) voice-onset stimuli [F(6, 54) = 20.11, P < 0.001].

3.1.3. FM sweeps
Fig. 1e and f summarize the behavioral performance for the FM direction discrimination task. Participants'
accuracy in judging the frequency direction of the FM sweep stimuli is shown in Fig. 1e. Overall, participants made accurate judgments on 92% of trials. Performance was analyzed in a 2 (direction: rising, falling) × 4 (range: 200–400, 200–800, 200–1600, 200–3200 Hz) within-subjects ANOVA. There was no difference between participants' accuracy in judging rising and falling sweeps [92% versus 93%, respectively; F(1, 9) = 2.61, P > 0.10]. However, there was an interaction between range and direction [F(3, 27) = 13.38, P < 0.001]. As shown in Fig. 1e, as the range of the sweep (and therefore the FM rate) increased, participants became more accurate with rising sweeps [F(3, 27) = 14.35, P < 0.001] and less accurate with falling sweeps [F(3, 27) = 8.65, P < 0.001]. Further, there was a main effect of range [F(3, 27) = 9.90, P < 0.001], which appeared to be due to the fall-off in accuracy in judging the frequency direction of the rising 200–400 Hz tone and the falling 3200–200 Hz tone. Participants' reaction times in judging the frequency direction of the sweep stimuli are shown in Fig. 1f. Excluding time-outs (1% of trials), the mean time to make judgments was 821 ms. Participants' reaction times were analyzed in
the same design used to analyze the accuracy of their judgments. Participants were faster to judge rising FM sweeps than falling FM sweeps (845 ms versus 979 ms, respectively; F(1, 9) = 39.46, P < 0.001). A direction × range interaction for reaction time was found that was similar to that for accuracy [F(3, 27) = 21.49, P < 0.001]. As shown in Fig. 1f, as the frequency range of the sweep increased, participants became faster with rising sweeps [F(3, 27) = 31.97, P < 0.001] and slower with falling sweeps [F(3, 27) = 7.04, P = 0.001]. Finally, there was a main effect of frequency range on reaction times [F(3, 27) = 8.46, P < 0.001], partially due to the relatively long time participants took to classify the rising 200–400 Hz tones.

3.2. PET results

3.2.1. Tasks versus rest comparison
Table 1 summarizes the analysis for the comparison of each of the three tasks against rest, listing the local activation maxima for each contrast. Fig. 2 shows activations versus the resting baseline as standard normal (SPM{z}) scores in canonical planar Talairach views. The comparison of LD versus rest yielded eight significant clusters, encompassing activations in both hemispheres, with 9744 voxels above threshold, lateralized to the left (L:R ratio 1.40). CP versus rest yielded eight significant clusters, with 5386 voxels above threshold, evenly distributed across the two hemispheres (L:R ratio 1.05). FM versus rest showed 11 significant clusters, with 7734 voxels above threshold, lateralized to the right (L:R ratio 0.89). In addition to the evident trend (leftward lateralization for words, bilateral activation for syllables, and slight rightward lateralization for sweeps), activations for sweeps also extended more posteriorly in the right hemisphere. Note that these lateralization patterns are qualitative; we are not making quantitative claims about absolute lateralization outcomes, but merely reporting the overall patterns as reflected by clusters and maximal Z-scores.
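For concreteness, the qualitative L:R index used here is simply a ratio of suprathreshold voxel counts in the two hemispheres. The following is a toy sketch on a synthetic map, assuming a Talairach-like convention in which x < 0 is the left hemisphere; the threshold value is illustrative, not the one used in the study.

```python
import numpy as np

def lr_voxel_ratio(zmap, x_coords, z_thresh=3.09):
    """Qualitative L:R lateralization index: ratio of suprathreshold
    voxel counts in the left (x < 0) vs. right (x > 0) hemisphere."""
    above = zmap > z_thresh
    left = np.sum(above & (x_coords < 0))
    right = np.sum(above & (x_coords > 0))
    return left / right if right else np.inf

# Toy z-map on a grid spanning x = -60..60 mm, with a slight leftward bias.
rng = np.random.default_rng(1)
x = np.tile(np.arange(-60, 61), 50)       # x coordinate of each voxel
zmap = rng.normal(0, 1, x.size) + 0.5 * (x < 0)
print(f"L:R ratio = {lr_voxel_ratio(zmap, x):.2f}")
```

Because such counts depend on the threshold and on smoothing, they are reported here only as qualitative patterns, as noted above.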
3.2.1.1. Activations common to all tasks. Temporal areas. For all tasks, activation includes the entire antero-posterior extent of the lateral temporal cortex in both hemispheres: anterior, middle and posterior superior temporal gyrus (STG); anterior, middle and posterior middle temporal gyrus (MTG); and all portions of the intervening superior temporal sulcus (STS). The areas of activation extended medially to include the areas in and around the transverse temporal gyrus. The activation should include AI and contiguous portions of AII or, in a more contemporary nomenclature, the core and belt auditory areas (Kaas & Hackett, 2000; Rademacher & Caviness, 1993; Rademacher & Morosan, 2001; Rauschecker, 1998). The resolution of our measurement technique, however, does not permit us to localize the activation precisely enough to apply this nomenclature for the auditory cortex with confidence. Importantly, the local maxima (indexing the greatest activations) are located in the STG and STS. For all tasks, the largest Z-scores are found in the middle portion of the STS, at the ventral bank of the STG, in both hemispheres (mid-STG/STS, Table 1). As the data in Table 1 show, throughout the STG/STS, activations in the left hemisphere exceeded those in the right for LD and CP (left hemisphere lateralization is most pronounced in the STS). In the central part of the STG, encompassing the most anterior portion of Heschl's gyrus (putative primary auditory cortex; Rademacher & Morosan, 2001), activation is lateralized (qualitatively) to the left for all tasks. For FM, on the other hand, STS activations in the right hemisphere always exceed those in the left, and the degree of rightward lateralization is most pronounced in the posterior portions of the STS.

Extra-temporal areas. All tasks were associated with increased activity in premotor and motor structures, including the precentral gyrus (primary motor cortex), SMA, putamen, ventral thalamus and cerebellum (elements of the corticostriatal–thalamocortical motor circuit). All tasks were also associated with increased activity, compared to rest, in the anterior cingulate cortex, parahippocampal gyri, and midbrain.
Fig. 2. Map of the three tasks compared to rest. Maps of brain areas activated during the lexical decision (a), categorical perception (b) and FM sweeps discrimination (c) tasks. Statistical parametric maps in three projections display pixels in which normalized regional cerebral blood flow differed between task and rest conditions. Values are Z-scores representing the significance level of changes in normalized rCBF in each voxel; the range of scores for each contrast is coded in the accompanying color tables. The grid is the standard stereotaxic (Talairach) grid into which subjects' scans were normalized. The anterior commissural–posterior commissural line is set at zero on the sagittal and coronal projections. Vertical projections of the anterior commissure (VAC) and posterior commissure (VPC) are depicted on the transverse and sagittal projections. See Table 1 for detail.

Table 1. Three tasks vs. rest. Local maxima for the lexical decision, categorical perception, and FM sweeps contrasts against rest: Z-score and Talairach coordinates (x, y, z) in the left and right hemispheres, by region and Brodmann area. Regions: superior/middle temporal (STG core; anterior, middle and posterior STS); inferior/basal temporal (ITG; posterior and anterior fusiform); parieto-occipital (superior parietal lobule; supramarginal gyrus); prefrontal (inferior DLPFC/operculum; superior DLPFC; inferior, middle and superior frontal operculum); frontal motor (precentral; SMA; LPM); insula/cingulate/PHPC (anterior insula; anterior cingulate/MPF; anterior PHPC); subcortical (lateral and posterior CB; caudate; putamen; ventral thalamus; pulvinar; MB tegmentum; PAG).
To compare activation across conditions, Figs. 3 and 4 illustrate increases in normalized rCBF versus rest at coordinates of interest representing maxima in the temporal lobe (Table 1, STG and STS coordinates) derived from the LD minus rest contrast (Fig. 3) and the FM sweeps minus rest contrast (Fig. 4). Fig. 3 shows that the response to words was significantly greater than that in the other conditions in all regions except the right posterior STS. The magnitude of the differences reached 76% (LD versus FM, mid-STS) in the left hemisphere and 68% (LD versus CP, STG) in the right hemisphere. Fig. 4 shows that rCBF responses to FM sweeps exceeded those to the other stimuli only in the right hemisphere. Particularly robust are the FM-elicited responses observed in the right posterior regions, STG/PT and posterior MTG.

Fig. 3. Bar graph illustrating changes in normalized rCBF for each of the auditory tasks vs. rest. For each contrast, the magnitudes of the rCBF increases at specified voxels of interest, representing maxima in the temporal lobe derived from the LD minus rest contrast (Table 1), were extracted from the SPM output matrices. Values represent mean differences in normalized rCBF (ml/100 g/min ± S.D.) at regions in the left (A) and right (B) hemispheres, between resting baseline rCBF values and the lexical decision (solid), categorical perception (heavily stippled) and FM sweeps (lightly stippled) tasks: (A) a: greater than (P < 0.01) CP and FM; b: greater than (P < 0.0001) CP and FM; c: greater than (P < 0.01) FM; (B) a: greater than (P < 0.0001) CP and (P < 0.01) FM; b: greater than (P < 0.05) CP; c: greater than (P < 0.0001) CP and FM; d: greater than (P < 0.001) CP and (P < 0.05) FM.

Fig. 4. Bar graph illustrating changes in normalized rCBF for each of the auditory tasks vs. rest. For each contrast, the magnitudes of the rCBF increases at specified voxels of interest, representing maxima in the temporal lobe (Table 1, STG and STS coordinates) derived from the FM sweeps minus rest contrast, were extracted from the SPM output matrices. Values represent mean differences in normalized rCBF (ml/100 g/min ± S.D.) at regions in the left (A) and right (B) hemispheres, between resting baseline rCBF values and the lexical decision (solid), categorical perception (heavily stippled) and FM sweeps (lightly stippled) tasks: (A) a: significantly greater (P < 0.0001) than CP and FM; (B) a: significantly greater (P < 0.05) than CP and (P < 0.0001) FM; b: significantly greater (P < 0.0001) than CP; c: significantly greater (P < 0.0001) than LD and CP; d: significantly greater (P < 0.0001) than LD and (P < 0.001) CP; e: significantly greater than LD (P < 0.05).

3.2.1.2. Task-specific activations. The LD task was associated with left-lateralized activation within extra-temporal regions, including the inferior and mid portions of the frontal operculum bilaterally (left greater than right, qualitatively, e.g. by assessing Z-scores), the left anterior and posterior fusiform gyrus, the pulvinar (left greater than right), and the left anterior insula. CP and FM, but not LD, were associated with significant activation of the right dorso-lateral prefrontal cortex; CP alone was associated with activity in the right inferior temporal gyrus; and FM sweeps alone showed activation of the left superior parietal lobule and supramarginal gyrus (SMG).

3.2.2. Task versus task comparisons
Fig. 5 and Tables 2–4 summarize the analysis for the comparison of each of the tasks (LD, CP, FM) against the two other tasks (i.e. LD–CP and LD–FM; CP–LD and CP–FM; FM–LD and FM–CP). Tables 2–4 list the peak activation maxima for each of those comparisons. Fig. 5 shows the PET activations overlaid on axial MR images at five different levels. Insofar as we consider commonalities of activation across subtraction conditions, these are assessed by visual inspection and by comparing scores, not by masking.

Fig. 5. Maps illustrating contrasts between responses to auditory stimuli in the three conditions. The statistical parametric (SPM{z}) map illustrating changes in rCBF is displayed on a standardized MRI scan. The MR image was transformed linearly into the same stereotaxic (Talairach) space as the SPM{z} data. Using Voxel View Ultra (Vital Images, Fairfield, Iowa), SPM and MR data were volume-rendered into a single three-dimensional image for each contrast. The volume sets are resliced and displayed at selected planes of interest relative to the anterior commissural–posterior commissural line as indicated. Values are Z-scores representing the significance level of differences in normalized rCBF in each voxel; the range of scores is coded in the accompanying color table, scaled to the maximal value for each contrast. The images in (a) illustrate the contrasts between lexical decision and FM sweeps as baseline (upper row) and syllable categorization as baseline (lower row). The images in (b) illustrate the contrasts between FM sweeps and LD as baseline (upper row) and syllable categorization as baseline (lower row). The images in (c) illustrate the contrasts between syllable categorization and FM sweeps as baseline (upper row) and LD as baseline (lower row).

3.2.2.1. Lexical decision versus FM and CP (Fig. 5a and Table 2). The comparison between LD and FM revealed four significant clusters, very strongly lateralized to the left (8598 voxels above threshold; L:R ratio 3.58). The comparison between LD and CP revealed four significant clusters, also strongly lateralized to the left (7681 voxels above threshold; L:R ratio 2.96). Again, lateralization is assessed qualitatively as a ratio of the activation patterns and is not further quantified.

Greater activity for LD versus both CP and FM. Significant increases in activity for the LD task versus both CP and FM are seen throughout the entire antero-posterior extent of the STS bilaterally, as well as in the left posterior MTG. Bilateral increases in activity of the ITG are also seen. Outside of the temporal cortices, significantly greater activation for words versus either syllables or FM sweeps was detected in the left frontal operculum, the left anterior insula, the left DLPFC, the left anterior and posterior fusiform gyri, and the left anterior parahippocampal gyrus. In summary, greater activity for the lexical decision task than for either FM or CP is found bilaterally in the auditory cortices; the differences in basal temporal and extra-temporal regions are, on the other hand, lateralized to the left hemisphere.

Other contrast-specific differences. Compared to CP only, lexical decision showed greater activation of the central portion of the STG (BA 42/22) in both hemispheres. Greater activity was also observed in the right MTG. Compared to FM only, LD was associated with greater activation of the left temporal pole, the right cerebellum, and the left ACC.

3.2.2.2. Categorical perception versus LD and FM (Fig. 5c and Table 3). The CP versus FM comparison showed three significant clusters, strongly lateralized to the left (1286 voxels above threshold; L:R ratio 3.85). CP versus LD showed four significant clusters, lateralized to the right (4188 voxels above threshold; L:R ratio 0.49). The two contrasts were markedly different, i.e. there were essentially no differences common to both (no regions in which activations during the CP task were significantly greater than both LD and FM).

When compared with FM, the following patterns were observed: greater activity for CP in the lateral temporal cortices, bilaterally in the mid portion of the STS; in other temporal regions, significant increases in activation for CP versus FM (including the posterior STS and the anterior and posterior MTG) are lateralized to the left hemisphere. Extra-temporal foci were similarly lateralized; greater activation for CP versus FM was seen in the operculum, the orbital and dorsolateral prefrontal cortices, the inferior parietal lobule, the cingulate and the parahippocampal gyrus. Broadly speaking, the pattern is similar to that seen in the LD versus FM contrast (which ostensibly reflects the speech/non-speech difference): greater activity bilaterally for CP in the mid portion of the STS, with other differences in lateral temporal and extra-temporal areas lateralized to the left.

When compared with LD, no activations associated with CP were significantly greater in the STG, MTG or STS in either hemisphere. Relative elevations in activity in extra-temporal regions were found principally in the right hemisphere: in dorsolateral prefrontal, insular, parietal and occipital cortices. Bilateral elevations in activity are seen in midline cortices: medial prefrontal, anterior and posterior cingulate gyri. In summary, in all temporal areas generally associated with auditory processing, activation associated with lexical decision exceeded activation in the categorical perception task.

3.2.2.3. FM sweeps versus LD and CP (Fig. 5b and Table 4). The FM sweeps versus LD contrast showed 11 significant clusters, lateralized to the right (4799 voxels above threshold; L:R ratio 0.71). The FM versus CP comparison showed six significant clusters, lateralized to the left (2747 voxels above threshold; L:R ratio 1.24).

Greater activity for FM versus both LD and CP. In the lateral temporal cortices, local maxima representing greater activation for FM versus either CP or LD are located entirely within the right hemisphere. All differences common to both contrasts (FM versus both LD and CP) are found in the posterior temporal regions (greater than 45 mm posterior to the anterior commissure). Activity for FM exceeds both CP and LD in the most caudal portions of the right MTG and in the right posterior STG in the region of the planum temporale. Greater activity for FM versus both tasks was also seen in the right anterior cingulate cortex and the left parietal cortex (superior parietal lobule and SMG) (Fig. 5b). Significantly larger activations for FM were also found in motor areas: precentral gyri, premotor cortices, and cerebellum. In summary, in the temporal lobe, the greater activations for the FM discrimination are confined to the right posterior areas; portions of the parietal cortex appear to be more active for the FM task as well.

Other contrast-specific differences. Differences specific to the FM versus CP or the FM versus LD contrasts are also principally located within the right hemisphere. Compared to CP only, the right anterior STG and the right insula (as well as scattered motor areas) show foci of increased activation for FM sweeps. Compared to LD, the FM task was also associated with greater activity in the right SMG; more widespread activations seen for FM versus LD alone include the right SMG, DLPFC, cerebellum, and occipital areas bilaterally.
Table 2. LD vs. CP and LD vs. FM. Peak activation maxima: Z-score and Talairach coordinates (x, y, z) in the left and right hemispheres, by region and Brodmann area. Regions: superior/middle temporal (temporal pole; STG core; anterior, middle and posterior STS/STG; posterior MTG); inferior/basal temporal (ITG; posterior and anterior fusiform); prefrontal (inferior and superior DLPFC; orbital, triangular and opercular portions of the operculum); frontal motor (SMA); insula/cingulate/PHPC (anterior insula; anterior cingulate; anterior PHPC); cerebellum (lateral CB).

Table 3. CP vs. FM and CP vs. LD. Peak activation maxima: Z-score and Talairach coordinates (x, y, z) in the left and right hemispheres, by region and Brodmann area. Regions: superior/middle temporal (anterior MTG; middle STS/MTG; posterior STS/STG; posterior MTG); inferior/basal temporal (ITG); parietal (supramarginal gyrus; angular gyrus; lingual gyrus); prefrontal (medial orbital; medial prefrontal; lateral orbital; superior DLPFC; inferior frontal operculum); frontal motor (CD); insula/cingulate/PHPC (anterior insula; anterior and posterior cingulate; anterior PHPC); cerebellum (posterior CB).

Table 4. FM vs. LD and FM vs. CP. Peak activation maxima: Z-score and Talairach coordinates (x, y, z) in the left and right hemispheres, by region and Brodmann area. Regions: superior/middle temporal (anterior STG; posterior STG/PT; posterior MTG); parieto-occipital (superior parietal lobule/PCC; supramarginal gyrus; lingual gyrus; lateral occipital cortex); prefrontal (inferior DLPFC); frontal motor (inferior precentral; SMA; LPM; CD); insula/cingulate/PHPC (anterior insula; anterior cingulate/MPF); cerebellum (posterior CB).
4. Discussion

PET was used to measure the response patterns associated with auditory lexical decision, categorical perception of syllables, and FM direction identification. Because the acoustic and psycholinguistic characteristics of these signals are well understood and the behavioral profiles generated by the tasks are stable, we suggest that the neuronal activation data connect readily to the psycholinguistic literature. Crucially, the behavioral data we collected replicated the typical response profiles (Fig. 1). To ensure that the activation patterns observed were robust, and to characterize the features of the response patterns more explicitly, task-related activations were compared against three baselines: rest and the two other tasks. By using three different subtractions, we aimed to isolate those activations that survive comparisons with different types of controls; it is the sites activated across all three comparisons that appear to us most likely to merit interpretation in the context of our experiment. We used the same experimental procedure for each stimulus type (single-trial two-alternative forced choice, same rate of stimulus presentation) to minimize effects due to the execution of the experimental task itself (Poeppel et al., 1996; Norris & Wise, 2000). The main findings were: (1) all tasks activated auditory cortex bilaterally (Fig. 2). When the non-speech condition (FM) was used as a baseline for both speech tasks (acoustic control), the bilateral nature of the response remained. Given that the activation for the sweeps was highly significant bilaterally, it is compelling that further bilateral (most likely non-primary) activation in STG and STS, perhaps related to complex acoustic aspects of the speech signal that we did not control for, persisted; (2) the left and
right areas were, however, differentially modulated (Fig. 5). Words engaged left temporal areas more strongly and also activated left-lateralized extra-temporal areas, including the frontal operculum and fusiform gyrus. The FM activation strongly lateralized to the right, particularly in non-primary temporal areas (STS and MTG). The categorical perception task was associated with bilateral activation of auditory fields and a leftward bias. For CP versus FM, the pattern was similar to that seen for LD versus FM.

Two methodological issues require comment: (i) the nature of the acoustic contrasts we used and their inherent limitations, and (ii) the utility of PET for mapping the functional anatomy of the superior temporal cortex. The acoustic matching of our stimuli across the three experimental tasks is a difficult issue. For this experiment, we opted to use acoustic stimuli that are well understood in psycholinguistics, particularly research on speech processing. We chose the lexical, syllabic, and non-speech (FM) stimuli because this choice allowed us to connect with the literature exploring these materials in behavioral research. In other words, given how widely used lexical decision and categorical perception tasks are in cognitive science research, we wanted to explore the neural basis associated with these tasks as they are executed with typical stimuli, and we focused on matching the behavioral requirements (i.e. same response type, stimulation rate, etc.). As a consequence, however, we had to compromise with regard to the acoustic matching. Specifically, there are acoustic complexities that differ across the three conditions, with speech being the most acoustically complex relative to FM sweeps. The interpretation of these data therefore requires caution insofar as we want to discuss the activations as forming the basis for speech processing. The second issue concerns whether PET can localize activation with high enough resolution to differentiate between primary and non-primary auditory areas. PET is quiet (and therefore well suited for auditory studies that require psychoacoustic judgments) and has excellent sensitivity, but limited spatial resolution compared to fMRI. Recent treatments, e.g. by Johnsrude, Giraud, & Frackowiak (2002) and Hall, Hart, & Johnsrude (2003), point out that a convincing analysis of the functional anatomy of human auditory cortex is very difficult to establish with a method whose spatial resolution is on the order of 10–15 mm. Fine-grained anatomic differences (and possibly subtle lateralization patterns in the responses) may be masked by the limited resolution of the technique. We therefore restrict ourselves to gross morphological landmarks and qualitative patterns of lateralization in the data.

4.1. Bilaterality/symmetry

The robust bilateral activation to words and syllables (even when FM sweeps were used as baseline) suggests that speech perception, construed as the set of procedures that take acoustic input and derive representations that make contact with the mental lexicon, is mediated in left and right auditory
areas. Precisely which computation in the speech perception process is subserved by left and right areas cannot be resolved here, because in the present study the acoustics were not controlled in a way that would permit that analysis. There are suggestions in the literature that processes optimized for spectral analysis are more rightward lateralized and processes optimized for temporal analysis more leftward lateralized (Zatorre, Belin, & Penhune, 2002; Poeppel, 2003). In the context of speech per se, it has been suggested that a longer temporal window of analysis associated with right non-primary auditory cortex (∼200–300 ms) can confer a slight rightward advantage for syllabic processing, and syllable-sized acoustic units, in turn, form an effective temporal unit for spectral analysis tasks. In contrast, the shorter temporal integration window associated with left non-primary areas (∼20–50 ms) forms the basis for processing at shorter time scales, which in speech would be advantageous for segmental and subsegmental processing and the establishment of intra-syllabic temporal order (Poeppel, 2003). In summary, in the context of the present results, the bilateral nature of the response need not be speech-specific, but the processes mediated by left and right auditory areas very probably play a core role in the analysis of the speech signal.

A common assumption deriving primarily from neuropsychological research is that speech perception lateralizes to the dominant hemisphere, presumably motivated by the fact that speech perception is closely linked to language processing, which is highly lateralized (Geschwind, 1970; Binder et al., 1997). In view of recent data, including the data from this study, this model has to be reconsidered. The data observed here converge with imaging results by Wise et al. (1991), Mummery et al. (1999), Belin, Zatorre, Lafaille, Ahad, & Pike (2000), and Binder et al. (2000), as well as neuropsychological arguments by Poeppel (2001). Recent reviews of imaging, lesion, and electrophysiological data by Hickok & Poeppel (2000) and Norris & Wise (2000) emphasize the emerging consensus that speech perception (or, better, the numerous processes underlying speech perception, including the temporal and spectral analysis of the signal, the analysis of periodic and aperiodic components, and other necessary signal-processing subroutines) is mediated bilaterally. By and large, the data we observe here are consistent with the hierarchical model of auditory word processing articulated by Binder et al. (2000). Their model suggests that spectro-temporally complex sounds in general, including speech sounds, are processed bilaterally in the dorsal temporal plane, including STG. Speech sounds per se appear to be processed bilaterally in areas more ventro-lateral than non-speech signals (e.g. along the STS). Finally, the processing of words beyond the analysis of the input signal (i.e. lexical search, word recognition) appears to be handled by additional areas outside of the superior temporal lobe (e.g. MTG) as well as extra-temporal areas. Using a different experimental design and a different imaging technique, the data we report are consistent with this model.
4.2. Laterality/asymmetry

4.2.1. Words
Temporal areas were differentially more active for words than for syllables. The extent and magnitude of activation in left areas exceeded that in right areas, although the response was still bilateral. While lateralization was observed in the response magnitude of temporal areas, it was even more apparent in the recruitment of extra-temporal areas. Words compared to either CP or FM showed left-lateralized extra-temporal activations including the operculum (BA 44, 45 and 47), the anterior insula, and the fusiform gyrus (BA 37). In general, single words have been used in a number of studies, and a marked leftward lateralization has been observed by many groups (Wise et al., 1991; Howard et al., 1992; Fiez, Raichle, Balota, Tallal, & Petersen, 1996; Price et al., 1996; Binder et al., 1997). Our results suggest that post-perceptual computations, perhaps aspects of lexical semantics, reflect the laterality that is characteristic of language processing. Recent convergent fMRI evidence also argues that it is the lexical–semantic level of processing that is lateralized (Zahn et al., 2000).

We selected durations of words, CVs and FMs that are used in standard psychophysical paradigms. As a consequence, the words were about 40% greater in duration than the syllables and sweeps, which were matched to each other. On this basis, differences in the duration of our stimuli may have accounted for a portion of the variation in the auditory cortical responses. However, the major effects we observe (differences exceeding 80% versus FM sweeps in the left anterior STS, Fig. 4A, and 73% versus CV syllables in the right anterior STS, Fig. 4B) were in non-primary auditory areas, and the previous literature indicates that left non-primary areas are less susceptible to rate changes (Price et al., 1992). Moreover, the finding that posterior right STG and MTG respond much more strongly to FM sweeps than to words argues against the view that signal duration entirely drives the response pattern.

What is the role of the left frontal opercular activation? One hypothesis is that the lexical decision task, in addition to the exhaustive lexical search it is known to require, engages different underlying phonetic/phonological computations (cf. Bokde, Tagamets, Friedman, & Horwitz, 2001). In particular, processing non-words might engage speech segmentation operations that have been shown to drive left frontal opercular cortex (Zatorre, Evans, Meyer, & Gjedde, 1992; Burton, Small, & Blumstein, 2000; Burton, 2001). If this interpretation is on the right track, then the activation may be primarily due to processing of the non-words, which more strongly engages sound segmentation.

The fusiform activation may reflect an interface with word or conceptual storage. Fusiform activation to words (visual and auditory) has been observed in other studies (e.g. Zatorre, Meyer, & Gjedde, 1996; Wagner et al., 1998; Chee et al., 1999). Based on neuropsychological and imaging
data we argue that inferior temporal and fusiform areas play a core role in processing lexical and conceptual information, possibly in a manner independent of the input modality (Büchel, Price, & Friston, 1998).

4.2.2. Syllables
The analysis of syllables with the type of consonantal onset we used is modulated by formant transitions, which are typically on the order of 20–50 ms in duration. Based on arguments that the left hemisphere is optimized to analyze 'rapid temporal transitions' (Fitch et al., 1993; Nicholls, 1996; Poeppel, 2001, 2003), one would expect the perception of CV syllables to be associated with left auditory cortex. Studies that have used auditory CV syllables have indeed typically observed activation on the left (Zatorre et al., 1992; Fiez et al., 1995; Celsis et al., 1999; Burton et al., 2000). The present results do not clearly support this view: the activation in the syllable task, although much more diffuse, does not appear to be markedly lateralized. One reason for the absence of lateralization in CP might be that subjects execute a CP task by attending to the entire spectrum of a syllable (i.e. envelope and temporal fine structure) rather than just the spectro-temporal fine structure necessary for the segmentation tasks used in previous work.

4.2.3. FM sweeps
FM stimuli, while eliciting a bilateral response, more strongly activated right temporal areas, where responses were also less variable than on the left. What drives this response pattern is not clear. Recently there has been increasing emphasis on using non-speech auditory stimuli with more complex spectral structure (Johnsrude, Zatorre, Milner, & Evans, 1997; Belin et al., 1998; Scheich et al., 1998; Schlosser, Aoyagi, Fulbright, Gore, & McCarthy, 1998; Baumgart, Gaschler-Markefski, Woldorff, Heinze, & Scheich, 1999; Belin et al., 2000; Binder et al., 2000; Thivard, Belin, Zilbovicius, Poline, & Samson, 2000; Hall et al., 2002). One of the emerging themes from these studies, consistent with findings in the animal literature by Scheich and colleagues, is that FM sounds, particularly FMs with slow rates of change or long durations, lateralize to the right, as observed in the present study. For example, Schlosser et al. (1998) used FM sounds in fMRI recordings (5.9 s duration, 200–5000 Hz bandwidth) and found activation in right STG. The emerging consensus is that 'slow' FMs drive right temporal cortical fields. This is complemented by observations by others (Fiez et al., 1995; Johnsrude et al., 1997; Belin et al., 1998) that left areas appear to be driven by rapid FMs or other rapidly changing sounds. The relevant contrast appears to lie in the duration of the FM or the FM rate (these are confounded, making direct comparisons difficult). Crucially, because duration, bandwidth, and FM rate interact, parametric studies will be necessary to assess how these stimuli drive lateralization. One possibility is that the lateralization in auditory
processing is driven by differing temporal and spectral sensitivities, as mentioned above. Zatorre & Belin (2001), in particular, argue that left cortical areas are specialized for temporal processing and right areas for spectral processing. Signals that require an analysis emphasizing either type of processing will therefore differentially engage left versus right areas. An alternative possibility, also discussed above, is that the temporal integration windows over which sounds are analyzed in non-primary auditory areas differ between the left and right hemispheres: left areas favor temporal information because they analyze shorter integration windows, and right areas favor spectral information because they consider longer integration windows (Poeppel, 2001, 2003).

There are two novel aspects to this work. First, tasks such as categorical perception and lexical decision derive from a rich experimental literature. Because these stimuli and tasks are well understood, it seems reasonable to use them to aid the interpretation of the anatomic findings. For example, because we know that lexical decision requires lexical access, we can argue that the identified areas must play a role in subparts of lexical access. The network of areas identified can subsequently be investigated in a more parametric manner. Second, the observations we report confirm and extend a new perspective on the neural basis of speech. The data are consistent with a model in which all sound-based representations are constructed bilaterally in auditory cortex (Binder et al., 2000; Hickok & Poeppel, 2000). These representations interface with computational systems in different ways. Auditory representations that are subject to linguistic interpretation lateralize primarily to the left; other representations presumably lateralize based on their functional role. One parameter that conditions the lateralization is the apparent differential temporal sensitivity of left and right auditory areas (Zatorre & Belin, 2001; Zatorre et al., 2002; Poeppel, 2001, 2003). On this view, the analysis of the speech signal is mediated bilaterally because speech contains 'fast' components (e.g. formant transitions) and 'slow' components (syllable envelope, intonation contour). In contrast, post-perceptual linguistic computation is lateralized.
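The integration-window account can be made concrete with the standard time–frequency trade-off. The following is a minimal sketch using a synthetic test signal containing a fast formant-scale glide and a slow syllable-rate envelope; the 25 and 250 ms windows are chosen to echo the short and long windows cited above, and the sketch illustrates only the trade-off itself, not auditory cortical processing.

```python
import numpy as np
from scipy.signal import spectrogram

fs = 16000
t = np.arange(int(0.5 * fs)) / fs

# Test signal: a 40-ms formant-like glide (500 -> 2500 Hz) that then
# holds at 2500 Hz, modulated by a 4-Hz 'syllable-rate' envelope.
k = 2000 / 0.04                     # glide rate in Hz/s
tg = np.clip(t, 0, 0.04)           # time within the glide segment
sig = np.sin(2 * np.pi * (500 * t + 0.5 * k * tg ** 2 + k * tg * (t - tg)))
sig *= 1 + 0.8 * np.sin(2 * np.pi * 4 * t)

for win_ms in (25, 250):           # 'left-like' vs. 'right-like' window
    nper = int(win_ms / 1000 * fs)
    f, frames, S = spectrogram(sig, fs, nperseg=nper, noverlap=nper // 2)
    print(f"{win_ms} ms window: {f[1] - f[0]:.1f} Hz frequency bins, "
          f"{(frames[1] - frames[0]) * 1e3:.1f} ms frame step")
```

The short window resolves the 40-ms glide in time but smears frequency into coarse bins; the long window does the reverse. This is the sense in which 'temporal' and 'spectral' sensitivities trade off against each other.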
Acknowledgements

This work was supported by the James S. McDonnell Foundation Program in Cognitive Neuroscience, NIH DC04638 and NIH DC05660 (DP), and the National Institute of Deafness and other Communication Disorders Intramural Research Program (AB). We thank Barry Horwitz for comments on the manuscript, Anna Salajegheh for help with figure preparation, and Charles Wharton and Lucila San Jose for help with stimulus creation. Correspondence to David Poeppel, Cognitive Neuroscience of Language Laboratory, University of Maryland, 1401 Marie Mount Hall, College Park, MD 20742, USA ([email protected]).
References

Baumgart, F., Gaschler-Markefski, B., Woldorff, M. G., Heinze, H., & Scheich, H. (1999). A movement-sensitive area in auditory cortex. Nature, 400, 724–725.
Belin, P., Zatorre, R. J., Lafaille, P., Ahad, P., & Pike, B. (2000). Voice-selective areas in human auditory cortex. Nature, 403, 309–312.
Belin, P., Zilbovicius, M., Crozier, S., Thivard, L., Fontaine, A., Masure, M. C., & Samson, Y. (1998). Lateralization of speech and auditory temporal processing. Journal of Cognitive Neuroscience, 10, 536–540.
Binder, J. R., Frost, J. A., Hammeke, T. A., Bellgowan, P. S., Springer, J. A., Kaufman, J. N., & Possing, E. T. (2000). Human temporal lobe activation by speech and nonspeech sounds. Cerebral Cortex, 10, 512–528.
Binder, J. R., Frost, J. A., Hammeke, T. A., Cox, R. W., Rao, S. M., & Prieto, T. (1997). Human brain language areas identified by functional magnetic resonance imaging. Journal of Neuroscience, 17, 353–362.
Bokde, A., Tagamets, M., Friedman, R., & Horwitz, B. (2001). Functional interactions of the inferior frontal cortex during the processing of words and word-like stimuli. Neuron, 30, 609–617.
Büchel, C., Price, C., & Friston, K. (1998). A multimodal language region in the ventral visual pathway. Nature, 394, 274–277.
Burton, M. W. (2001). The role of inferior frontal cortex in phonological processing. Cognitive Science, 25, 695–709.
Burton, M. W., Small, S., & Blumstein, S. E. (2000). The role of segmentation in phonological processing: An fMRI investigation. Journal of Cognitive Neuroscience, 12, 679–690.
Celsis, P., Boulanouar, K., Doyon, B., Ranjeva, J. P., Berry, I., Nespoulous, J. L., & Chollet, F. (1999). Differential fMRI responses in the left posterior superior temporal gyrus and left supramarginal gyrus to habituation and change detection in syllables and tones. NeuroImage, 9, 135–144.
Chee, M. W., O'Craven, K. M., Bergida, R., Rosen, B. R., & Savoy, R. L. (1999). Auditory and visual word processing studied with fMRI. Human Brain Mapping, 7, 15–28.
Démonet, J.-F., Chollet, F., Ramsay, S., Cardebat, D., Nespoulous, J.-L., Wise, R., Rascol, A., & Frackowiak, R. (1992). The anatomy of phonological and semantic processing in normal subjects. Brain, 115, 1753–1768.
Fiez, J. A., Raichle, M. E., Balota, D. A., Tallal, P., & Petersen, S. E. (1996). PET activation of posterior temporal regions during auditory word presentation and verb generation. Cerebral Cortex, 6, 1–10.
Fiez, J., Raichle, M. E., Miezin, F. M., Petersen, S. E., Tallal, P., & Katz, W. F. (1995). Studies of auditory and phonological processing: Effects of stimulus characteristics and task demands. Journal of Cognitive Neuroscience, 7, 357–375.
Fitch, R. H., Brown, C. P., & Tallal, P. (1993). Left hemisphere specialization for auditory temporal processing in rats. Annals of the New York Academy of Sciences, 682, 346–347.
Franke, N. (1999). SoundApp (Version 2.5.1) [Computer software].
Friston, K. J. (1995). Commentary and opinion. II. Statistical parametric mapping: Ontology and current issues. Journal of Cerebral Blood Flow and Metabolism, 15, 361–370.
Friston, K., Worsley, K., Frackowiak, R., Mazziotta, J., & Evans, A. (1994). Assessing the significance of focal activations using their spatial extent. Human Brain Mapping, 1, 210–220.
Geschwind, N. (1970). The organization of language and the brain. Science, 170, 940–944.
Giraud, A. L., & Price, C. J. (2001). The constraints functional neuroimaging places on classical models of auditory word processing. Journal of Cognitive Neuroscience, 13, 754–765.
Goldinger, S. D. (1996). Auditory lexical decision. Language and Cognitive Processes, 11, 559–567.
Gordon, M., & Poeppel, D. (2002). Inequality in identification of direction of frequency change (up versus down) for rapid frequency-modulated sweeps. ARLO/Journal of the Acoustical Society of America, 3(1).
Hall, D. A., Hart, H. C., & Johnsrude, I. S. (2003). Relationships between human auditory cortical structure and function. Audiology & Neurootology, 8, 1–18.
Hall, D. A., Johnsrude, I. S., Haggard, M. P., Palmer, A. R., Akeroyd, M. A., & Summerfield, A. Q. (2002). Spectral and temporal processing in human auditory cortex. Cerebral Cortex, 12, 140–149.
Hickok, G., & Poeppel, D. (2000). Towards a functional neuroanatomy of speech perception. Trends in Cognitive Sciences, 4, 131–138.
Howard, D., Patterson, K., Wise, R., Brown, W., Friston, K., Weiller, C., & Frackowiak, R. (1992). The cortical localization of the lexicons: Positron emission tomography evidence. Brain, 115, 1769–1782.
Johnsrude, I. S., Giraud, A. L., & Frackowiak, R. (2002). Functional imaging of the auditory system: The use of positron emission tomography. Audiology & Neurootology, 7, 251–276.
Johnsrude, I. S., Zatorre, R. J., Milner, B. A., & Evans, A. C. (1997). Left-hemisphere specialization for the processing of acoustic transients. NeuroReport, 8, 1761–1765.
Kaas, J., & Hackett, T. (2000). Subdivisions of auditory cortex and processing streams in primates. Proceedings of the National Academy of Sciences of the United States of America, 97, 11793–11799.
Liberman, A., Harris, K. S., Hoffman, H. S., & Griffith, B. C. (1957). The discrimination of speech sounds within and across phoneme boundaries. Journal of Experimental Psychology, 54, 358–368.
Mummery, C. J., Ashburner, J., Scott, S. K., & Wise, R. J. (1999). Functional neuroimaging of speech perception in six normal and two aphasic subjects. Journal of the Acoustical Society of America, 106, 449–457.
Nicholls, M. (1996). Temporal processing asymmetries between the cerebral hemispheres: Evidence and implications. Laterality, 1, 97–137.
Norris, D., & Wise, R. (2000). The study of prelexical and lexical processes in comprehension: Psycholinguistics and functional neuroimaging. In M. Gazzaniga (Ed.), The new cognitive neurosciences. Cambridge, MA: MIT Press.
Oldfield, R. C. (1971). The assessment and analysis of handedness: The Edinburgh inventory. Neuropsychologia, 9, 97–113.
Poeppel, D. (2001). Pure word deafness and the bilateral processing of the speech code. Cognitive Science, 25, 679–693.
Poeppel, D. (2003). The analysis of speech in different temporal integration windows: Cerebral lateralization as 'asymmetric sampling in time'. Speech Communication, 41, 245–255.
Poeppel, D., Yellin, E., Phillips, C., Roberts, T. P. L., Rowley, H. A., Wexler, K., & Marantz, A. (1996). Task-induced asymmetry of the auditory evoked M100 neuromagnetic field elicited by speech sounds. Cognitive Brain Research, 4, 231–242.
Price, C. J., Wise, R. J. S., Ramsay, S., Friston, K. J., Howard, D., Patterson, K., & Frackowiak, R. S. J. (1992). Regional response differences within the human auditory cortex when listening to words. Neuroscience Letters, 146, 179–182.
Price, C. J., Wise, R. J. S., Warburton, E. A., Moore, C. J., Howard, D., Patterson, K., Frackowiak, R. S. J., & Friston, K. J. (1996). Hearing and saying: The functional neuro-anatomy of auditory word processing. Brain, 119, 919–931.
Rademacher, J., & Caviness, V. S. (1993). Topographical variation of the human primary cortices: Implications for neuroimaging, brain mapping, and neurobiology. Cerebral Cortex, 3, 313–329.
Rademacher, J., & Morosan, P. (2001). Probabilistic mapping and volume measurement of human primary auditory cortex. NeuroImage, 13, 669–683.
Rauschecker, J. (1998). Parallel processing in auditory cortex of primates. Audiology & Neurootology, 3, 86–103.
Scheich, H., Baumgart, F., Gaschler-Markefski, B., Tegeler, C., Tempelmann, C., Heinze, H. J., Schindler, F., & Stiller, D. (1998). Functional magnetic resonance imaging of a human auditory cortex area involved in foreground–background decomposition. European Journal of Neuroscience, 10, 803–809.
Schlosser, M. J., Aoyagi, N., Fulbright, R. K., Gore, J. C., & McCarthy, G. (1998). Functional MRI studies of auditory comprehension. Human Brain Mapping, 6, 1–13.
Scott, S. K., Blank, S. C., Rosen, S., & Wise, R. J. S. (2000). Identification of a pathway for intelligible speech in the left temporal lobe. Brain, 123, 2400–2406.
Talairach, J., & Tournoux, P. (1988). Co-planar stereotaxic atlas of the human brain. Stuttgart: Thieme Verlag.
Thivard, L., Belin, P., Zilbovicius, M., Poline, J. B., & Samson, Y. (2000). A cortical region sensitive to auditory spectral motion. NeuroReport, 11, 2969–2972.
Wagner, A. D., Schacter, D. L., Rotte, M., Koutstaal, W., Maril, A., Dale, A. M., Rosen, B. R., & Buckner, R. L. (1998). Building memories: Remembering and forgetting of verbal experiences as predicted by brain activity. Science, 281, 1188–1191.
Wise, R., Chollet, F., Hadar, U., Friston, K., Hoffner, E., & Frackowiak, R. (1991). Distribution of cortical neural networks involved in word comprehension and word retrieval. Brain, 114, 1803–1817.
Woods, R. P., Cherry, S. R., & Mazziotta, J. C. (1992). Rapid automated algorithm for aligning and reslicing PET images. Journal of Computer Assisted Tomography, 16, 620–633.
Zahn, R., Huber, W., Drews, E., Erberich, S., Krings, T., Willmes, K., & Schwarz, M. (2000). Hemispheric lateralization at different levels of human auditory word processing: A functional magnetic resonance imaging study. Neuroscience Letters, 287, 195–198.
Zatorre, R. J., & Belin, P. (2001). Spectral and temporal processing in human auditory cortex. Cerebral Cortex, 11, 946–953.
Zatorre, R. J., Belin, P., & Penhune, V. B. (2002). Structure and function of auditory cortex: Music and speech. Trends in Cognitive Sciences, 6, 37–46.
Zatorre, R., Evans, A., Meyer, E., & Gjedde, A. (1992). Lateralization of phonetic and pitch discrimination in speech processing. Science, 256, 846–849.
Zatorre, R. J., Meyer, E., Gjedde, A., & Evans, A. C. (1996). PET studies of phonetic processing of speech: Review, replication, and reanalysis. Cerebral Cortex, 6, 21–30.