Predictive Mechanisms in Idiom Comprehension

Francesco Vespignani¹, Paolo Canal², Nicola Molinaro³, Sergio Fonda², and Cristina Cacciari²

Abstract

■ Prediction is pervasive in human cognition and plays a central role in language comprehension. At an electrophysiological level, this cognitive function contributes substantially to determining the amplitude of the N400. In fact, the amplitude of the N400 to words within a sentence has been shown to depend on how predictable those words are: The more predictable a word, the smaller the N400 elicited. However, predictive processing can be based on different sources of information that allow anticipation of upcoming constituents and integration in context. In this study, we investigated the ERPs elicited during the comprehension of idioms, that is, prefabricated multiword strings stored in semantic memory. When a reader recognizes a string of words as an idiom before the idiom ends, she or he can develop expectations concerning the incoming idiomatic constituents. We hypothesized that the expectations driven by the activation of an idiom might differ from those driven by discourse-based constraints. To this aim, we compared the ERP waveforms elicited by idioms and two literal control conditions. The results showed that, in both cases, the literal conditions exhibited a more negative potential than the idiomatic condition. Our analyses suggest that before idiom recognition the effect is due to modulation of the N400 amplitude, whereas after idiom recognition a P300 for the idiomatic sentence has a fundamental role in the composition of the effect. These results suggest that two distinct predictive mechanisms are at work during language comprehension, based respectively on probabilistic information and on categorical template matching. ■

¹Università degli Studi di Trento, Italy; ²Università di Modena e Reggio Emilia, Italy; ³Universidad de la Laguna, Spain

© 2009 Massachusetts Institute of Technology

Journal of Cognitive Neuroscience 22:8, pp. 1682–1700

INTRODUCTION

An increasing number of studies attribute a crucial role to predictive mechanisms in language processing (e.g., Federmeier, 2007; Pickering & Garrod, 2007). These studies have generally manipulated the semantic and world knowledge (e.g., Kutas & Hillyard, 1984) provided by a linguistic fragment that builds up an expectation for a specific upcoming word (e.g., “The burglar had no trouble locating the secret family safe. Of course it was situated behind a …”; Van Berkum, Brown, Zwitserlood, Kooijman, & Hagoort, 2005). In our study, we explored the electrophysiological correlates of predictive forward-looking processing when the linguistic fragment contains a multiword expression (i.e., an idiom) whose canonical structure and meaning are stored in semantic memory.

Predictive Forward-looking Mechanisms in Language Comprehension

The comprehension of a linguistic message relies on a complex interplay between previously processed information, information currently being processed, and the predictions arising from the combination of these two sources of information (Roehm, Bornkessel-Schlesewsky, Rösler, & Schlesewsky, 2007). It has long been known that the longer a sentential fragment, the fewer the alternatives available in the language to continue and/or conclude it in a syntactically well-formed and semantically meaningful fashion (Miller & Selfridge, 1950). The notion of cloze-probability¹ capitalizes on the fact that as a sentential context becomes more informative, reducing the uncertainty about possible alternative completions, the range of elicited responses becomes smaller and the reader’s ability to predict the most probable response increases. Despite the bulk of evidence showing the presence of predictive forward-looking mechanisms in language, the exact role of these anticipatory mechanisms has not yet been established. Syntax-first serial models of sentence comprehension attribute an essential role to top–down predictive processing in the architecture of the parser, although this role is limited to the minimal structural nodes necessary to build a sentence. The meaning of a sentence is considered a function of the meaning of its constituents and of the syntactic rules that determine their combination. Accordingly, language interpretation is viewed as a two-step event in which, first, the context-free meaning of a sentence is computed compositionally in ways specified by the syntax. In a second step, the sentential meaning can be integrated with information coming from prior context, world knowledge, and pragmatics, which are crucial in establishing the final interpretation of the sentence (e.g., Cutler & Clifton, 1999). In contrast, proponents of one-step models of language processing (e.g., Hagoort & van Berkum, 2007) assume that every source of information relevant to the interpretation of a sentence interacts from the very beginning to form a coherent mental model of what the sentence is about (e.g., Tanenhaus & Trueswell, 1995). According to this bottom–up view, highly expected information can enter the language comprehension architecture as one of the sources of information that act as soon as possible, in parallel with the others. Despite these differences, both types of model agree that we can anticipate (part of) a message and develop word-level expectations, but they disagree as to when and how predictive forward-looking processing occurs. The time course of anticipatory predictions, and whether they are managed by a single system or by multiple systems, are thus fundamental topics for neurocognitive models of language comprehension.

The study of predictive language processing has frequently relied on scalp-recorded ERPs, exploiting the characteristics of the N400 component. The N400 is a broad negative deflection that begins 200–300 msec after a word has been presented and peaks at approximately 400 msec. Since its original discovery (Kutas & Hillyard, 1984), the processing nature of the N400 has been extensively investigated (for recent overviews, see Lau, Phillips, & Poeppel, 2008; Kutas, Van Petten, & Kluender, 2006). Several studies have shown that the amplitude of the N400 to words within a sentence depends on how predictable those words are (where predictability is measured by off-line cloze-probability tests): The more predictable a word, the smaller the N400 elicited. The N400 has been considered an index of message-level semantic integration and contextual facilitation. Although it has been associated with semantic anomaly or low predictability, the N400 can be elicited by a variety of meaningful stimuli (e.g., isolated words, pronounceable pseudowords, faces, pictures).
The functional interpretation of the word- and sentence-related N400 differs according to how the on-line impact of language predictability is conceived. According to the integration view (e.g., Holcomb, 1993), the N400 indexes the amount of search in a semantic space necessary to select the semantic value of a word and insert it into a partial interpretation of the sentence fragment; on this view, the ERP waveform, and the behavioral facilitation often observed, does not necessarily reflect the explicit prediction of upcoming constituents. In contrast, the prediction view (e.g., Federmeier, 2007) posits that the N400 indexes the mismatch between the predicted lexical entry and the actual value of the incoming word. A number of studies (e.g., Wicha, Moreno, & Kutas, 2004) have shown that the N400 is strongly influenced by predictive language processing. For instance, in Federmeier and Kutas (1999), participants read sentences such as, “They wanted to make the hotel look more like a tropical resort. So along the driveway, they planted rows of ….” The sentence could be completed with the most expected constituent (i.e., palms) or with an unexpected noun of the same or a different category (pines and tulips, respectively). The expected noun elicited a smaller N400 than both types of unexpected noun. Critically, and in contrast with what the integration view would predict, pines elicited a smaller N400 than tulips, despite their similarly low cloze-probability (Federmeier, 2007).

Most of the studies that have obtained predictive N400 effects have manipulated expectations deriving from sentential and discourse information. Recently, Roehm et al. (2007) investigated the comprehension of antonymous adjectives (e.g., black–white). According to Gross, Fischer, and Miller (1989), the meanings of predicative adjectives are organized in semantic memory by relations of antonymy and synonymy: Antonymous adjective pairs provide the basic semantic structure, with synonymous adjectives clustering around the two antonyms. Hence, antonymous word pairs can be considered units retrieved as such from semantic memory. Roehm et al. (2007, Experiment 1) presented participants with visual sentences that contained the first member of an antonym pair (e.g., “The opposite of black is…”) and ended with either the correct antonymous adjective (white) or an adjective of the same category (yellow) or of a different one (nice). An N400 emerged for the nonantonymous adjectives, whereas the antonymous adjective elicited a clear positive peak interpreted as a P300, a waveform commonly associated with general processes of context updating (Donchin & Coles, 1988). Roehm et al. argued that “the P300 occurs in the same time range as the N400 to index functionally distinct levels of predictive processing via distinct electrophysiological characteristics” (p. 1260). Specifically, “the P300 for the expected antonymous noun arises because the correct identification of the predicted word does not require a lexical search (there is a unique prediction that may either be fulfilled or not)” (p. 1272).
The authors acknowledge that this positivity might depend on the nature of the task (judging whether the sentence was right or wrong) and on individual processing strategies. Nonetheless, these results suggest that P300 effects can emerge within the N400 time range.

Multiword Expressions: A Neglected Source of Evidence for Neurocognitive Models of Language

Semantic memory is a repository for a variety of knowledge that includes word meanings, concepts, and also learned multiword strings (e.g., book titles, lines of poetry, clichés, and idioms). Idioms are strings of words whose meaning is generally not derived from that of their constituent parts. Idioms are well suited to investigating predictive mechanisms because their constituents are bound together in the string and have a typical, canonical structure and word order (Cacciari & Glucksberg, 1991). The principles that govern the syntactic and semantic variability of idioms have yet to be formalized. However, almost all idioms, like other types of multiword expressions, share the characteristic that their predictability depends on how much of the string must be processed before the expression comes to mind. For example, cry over… is completed by most speakers with spilt milk, whereas break the… is more often given a literal ending (e.g., cup, bottle, dish) than an idiomatic one (e.g., ice). According to Cacciari and Tabossi (1988), the former idiom (cry over spilt milk) is predictable: It is recognized as soon as spilt is processed, and its figurative meaning becomes available at that point. In contrast, break the ice is unpredictable, and its meaning is not retrieved until the whole string has been processed. The words that form an idiom string might be anticipated, during reading of or listening to an ongoing sentence, in a way that partly differs from what happens in literal sentences. The comprehension of literal sentences, in fact, proceeds incrementally and compositionally, integrating each piece of semantic information into a dynamic representation of the described state of affairs. The more semantic and syntactic information accumulates, the more semantic and structural expectations the reader can develop. However, although one can understand “Unfortunately Cristina spilt the milk” by applying the semantic and morphosyntactic compositional rules of the language, this does not suffice for comprehending the figurative meaning of “Unfortunately Cristina spilt the beans.” The solution to this problem was initially found by postulating that idioms are semantically empty “long words” retrieved as such from the mental lexicon (Swinney & Cutler, 1979). According to this view, the idiom’s meaning is directly retrieved from semantic memory and not elaborated via linguistic processing. However, consistent evidence has accumulated to show that (1) idioms undergo syntactic analysis just as literal sentences do and (2) the semantic structure of the constituent words can constrain the final interpretation assigned to an idiomatic sentence (Cacciari & Glucksberg, 1991).
In fact, current models of idiom processing (for an overview, see Cacciari, Padovani, & Corradini, 2007) assume that idiom meaning activation is not based on a mere retrieval of a word-like unit from the lexicon. Although idioms form highly constrained contexts, higher-order language processes are maintained at least until the idiom has been retrieved, with respect to the semantic contribution of the constituent words, and until the end of the sentence, with respect to its syntactic analysis (Peterson, Burgess, Dell, & Eberhard, 2001). An influential view of how idiom comprehension unfolds is the Configuration Hypothesis (Cacciari & Tabossi, 1988), according to which an idiom is processed word by word, just like any other piece of language, until enough information has accumulated to render the sequence of words identifiable as, or highly expected to be, a memorized idiom. Only at this point is the idiomatic meaning retrieved. This implies that, once the reader has enough information to realize that the unfolding sentence contains an idiom or an idiom fragment (e.g., “All of a sudden John realized that he was barking up the…”), she or he can retrieve the string from semantic memory and compare the expected constituents (wrong tree) with the actual input. Therefore, idiom recognition is a necessary prerequisite for the idiom meaning to be retrieved from semantic memory. The point at which the string is identified as a known idiom determines how early the idiomatic meaning is activated (for a related claim, see Sprenger, Levelt, & Kempen, 2006).

The aim of this study is to explore whether the electrophysiological correlates of processing highly expected words in idioms, where predictability is determined by knowledge of that specific expression stored in semantic memory, differ from those of processing highly expected words in sentences where predictability depends on sentence-level semantic–pragmatic constraints. Despite the pervasiveness of multiword strings in language, the electrophysiological correlates of their comprehension have scarcely been investigated, and none of the very few ERP studies of idiom comprehension has manipulated the predictability of the constituents of the idiom string. Strandburg et al. (1993) measured ERPs time-locked to the second word of idiomatic, literal, or nonsensical word pairs presented in an acceptability judgment task. The authors found that N400 amplitude increased from the idiomatic to the literal to the nonsense condition. In Laurent, Denhières, Passerieux, Iakimova, and Hardy-Baylé (2006), participants were visually presented with the first part of a French idiom string followed by the final constituent and were asked to perform a semantic relatedness task. The idiom strings had both a literal and an idiomatic meaning. The N400 was smaller for highly salient idioms than for weakly salient ones. However, the notion of saliency partially overlaps with idiomatic meaning dominance and with the predictability of the last constituent, neither of which was controlled for.
In Moreno, Federmeier, and Kutas (2002), English–Spanish bilinguals read literal or figurative English sentences that could end with three types of word: expected high cloze-probability words (i.e., literal completions or proverb/idiom completions); within-language lexical switches, namely, English synonyms of the expected completions; or code switches, namely, Spanish translations of the expected completions. Within-language lexical switches elicited larger N400s in literal and figurative contexts and a late positivity in figurative contexts, whereas code switches elicited a positivity that began at about 450 msec and continued into the 650–850 msec time window. In summary, the amplitude of the N400 was affected by the predictability of the lexical item regardless of the prior context, whereas the latency of the N400 to lexical switches was affected by English vocabulary proficiency.

The Present Study

Predictable idioms (i.e., those that are identified as idioms before the last constituent) are an ideal test case for investigating the ERP components associated with the comprehension of multiword strings. The Configuration Hypothesis posits that during on-line idiom processing a qualitative change occurs after the idiom’s recognition point (RP), in that only then can the idiomatic meaning be retrieved from semantic memory. Hence, idiom recognition is a prerequisite for idiom meaning activation (for an overview of the behavioral evidence, see Cacciari et al., 2007). If predictive sentence processing modulates only the N400, we might expect to find an N400 whose amplitude differs before and after the idiom RP. This N400 might index sensitivity to the co-occurrence of constituents even before enough perceptual input has accumulated to trigger recognition of the idiom. As the string unfolds, co-occurrence can create a “sense of familiarity” that increases incrementally as more constituents arrive, up to a “threshold” after which the idiom is recognized and then activated. After the idiom’s RP, the specific configuration is retrieved from semantic memory. The matching of the actual input (the idiom fragment) to the stored template (the idiomatic configuration) might be indexed by a different component: a P300, similar to that found by Roehm et al. (2007) for antonymous pairs.

To test these hypotheses, we compared an idiom-neutral sentential context containing a predictable Italian idiom (idiomatic condition; see 1a in Table 1 for an example) with two semantically well-formed literal sentences: one in which the constituent forming the idiom’s RP (i.e., the constituent after which the idiom is retrieved from semantic memory; henceforth RP, marked with a subscript in the example) was replaced with an idiom-unrelated word (1b, substitution condition), and one in which the constituent just after the RP (when we assume the idiom to have already been retrieved; henceforth RP+1, marked with a subscript in the example) was replaced with an idiom-unrelated word (1c, expectancy–violation² condition, henceforth the violation condition). The substitution condition was designed to track the development of idiom prediction as the sentence unfolds, while the fragment is still perceived as literal.
In fact, the idiom should become available to the reader only after it is recognized as a known configuration, namely, after the RP. In this condition, we replaced the constituent coinciding with the idiom’s RP with another constituent with similar lexical characteristics (see below). We expect this change to modulate the N400 amplitude as a function of cloze-probability differences, insofar as this component is sensitive to the distributional properties of language regardless of the literal or figurative nature of the linguistic input.

The violation condition was designed to test the response to the match versus mismatch between the idiomatic configuration (just retrieved from semantic memory) and the actual sentence fragment, which in this condition continued with a different constituent, again matched for lexical characteristics (the position of the changed constituent is indicated as RP+1). The perception of this mismatch might be associated with a larger N400, as typically observed for unexpected constituents or for constituents that are more difficult to integrate. However, if our hypothesis is correct, we might instead find a P300 indexing the match between the retrieved idiomatic configuration and the idiomatic fragment. Determining whether an effect is caused by a single component or by different components (e.g., Luck, 2005), specifically whether an effect reflects a diminished N400 or a larger P300, is well known to be problematic. According to the literature, both ERP effects should result in a more negative potential at centro-parietal sites for the substitution at the RP and for the violation at the RP+1, relative to the idiomatic condition. Despite this similarity, however, some crucial differences exist. Regarding latency, the N400 peaks around 400 msec and is usually evident between 300 and 500 msec for visually presented words in sentences, whereas the P300 peaks at around 300 msec, with an onset at around 250 msec. Regarding topography, the N400 is broadly distributed over the scalp with a maximum at the vertex, usually slightly right-lateralized (Cz, C4), whereas the P300 reported by Roehm et al. (2007) had a more posterior distribution, with a maximum at parietal sites (P3, Pz, P4). We thus expect an effect peaking at around 400 msec, with a maximum around Cz, for the comparison between the substitution and idiomatic conditions at the RP.
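The latency and topography criteria above can be made concrete with a small sketch. Assuming difference waves (literal minus idiomatic) sampled at 512 Hz in epochs that start 200 msec before word onset, it locates the most negative point of a difference wave within a window and the electrode with the most negative mean effect, then applies a rule of thumb. The 350-msec boundary, the electrode sets, the synthetic `gaussian_dip` waves, and all function names are illustrative assumptions, not the authors' procedure (which relied on ANOVAs, latency analyses, and PCA).

```python
import math
from typing import Dict, List, Tuple

FS = 512                 # sampling rate (Hz) reported in the Methods
EPOCH_START_MS = -200.0  # epochs start 200 msec before word onset

def sample_times(n_samples: int) -> List[float]:
    """Latency (ms) of each sample relative to the onset of the time-locking word."""
    return [EPOCH_START_MS + i * 1000.0 / FS for i in range(n_samples)]

def _in_window(times: List[float], lo_ms: float, hi_ms: float) -> List[int]:
    return [i for i, t in enumerate(times) if lo_ms <= t <= hi_ms]

def negative_peak(diff: List[float], times: List[float], lo_ms: float, hi_ms: float) -> Tuple[float, float]:
    """Latency and amplitude of the most negative point of a difference wave in [lo_ms, hi_ms]."""
    idx = min(_in_window(times, lo_ms, hi_ms), key=lambda i: diff[i])
    return times[idx], diff[idx]

def max_effect_site(diffs: Dict[str, List[float]], times: List[float], lo_ms: float, hi_ms: float) -> str:
    """Electrode whose mean difference in the window is most negative."""
    idx = _in_window(times, lo_ms, hi_ms)
    return min(diffs, key=lambda site: sum(diffs[site][i] for i in idx) / len(idx))

# Hypothetical electrode groupings based on the topographies described in the text
CENTRAL = {"Cz", "C4"}
POSTERIOR = {"P3", "Pz", "P4", "O1", "Oz", "O2"}

def classify_effect(latency_ms: float, site: str, boundary_ms: float = 350.0) -> str:
    """Rule of thumb (assumption): a later, central maximum looks N400-like;
    an earlier, parietal/occipital maximum looks P300-like."""
    if site in CENTRAL and latency_ms >= boundary_ms:
        return "N400-like"
    if site in POSTERIOR and latency_ms < boundary_ms:
        return "P300-like"
    return "ambiguous"

def gaussian_dip(times: List[float], peak_ms: float, amp_uv: float, width_ms: float = 50.0) -> List[float]:
    """Synthetic negative deflection, for illustration only."""
    return [-amp_uv * math.exp(-((t - peak_ms) / width_ms) ** 2) for t in times]

times = sample_times(768)  # 1500-msec epoch at 512 Hz
diffs = {"Cz": gaussian_dip(times, 400, 3.0), "Pz": gaussian_dip(times, 400, 1.5)}
lat, amp = negative_peak(diffs["Cz"], times, 250, 500)
site = max_effect_site(diffs, times, 300, 500)
print(round(lat), site, classify_effect(lat, site))  # 400 Cz N400-like
```

A mixture of an N400 and a P300 will often land in the "ambiguous" branch, which is exactly the difficulty the text raises; the sketch only formalizes the two diagnostic dimensions.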
If the match between the constituent at RP+1 and the retrieved idiom elicits a P300, we expect to find an earlier effect in the comparison between the idiomatic and violation conditions, peaking around Pz and also larger at occipital sites. To aid interpretation of the ERP data, we conducted a self-paced reading experiment using the same experimental materials. Consistent with the behavioral literature, we expected idiomatic sentences to be read faster than literal sentences after the idiom RP, namely, at the RP+1.

Table 1. An Example of the Experimental Materials in Italian, with Word-by-Word English Translations and a Free Translation of the Figurative Meaning for the Example 1a Condition

Idiomatic
1a. Giorgio aveva un buco_RP nello_RP+1 stomaco quella mattina.
(George had a hole in the stomach that morning, namely, George was hungry that morning)

Substitution
1b. Giorgio aveva un dolore_RP nello_RP+1 stomaco quella mattina.
(George had a pain in the stomach that morning)

Violation
1c. Giorgio aveva un buco_RP sulla_RP+1 camicia quella mattina.
(George had a hole on the shirt that morning)

The recognition point (RP) and the following word (RP+1) are marked as subscripts (_RP, _RP+1).


No effect is expected at the RP in the substitution versus idiomatic comparison, as the idiom should not yet have been retrieved from semantic memory. The expectancy violation at the RP+1 should produce longer reading times than the idiomatic condition.

METHODS

Participants

A group of undergraduates from the University of Modena participated in the study for course credit. All were native Italian speakers unaware of the aim of the experiment, with normal or corrected-to-normal vision and no history of neurological disease. Specifically, 303 students participated in the norming of the experimental materials. Fifty different students participated in the ERP experiment after giving informed consent (26 women, 24 men; mean age = 21.0 years). They were right-handed, as assessed with an Italian version of the Oldfield questionnaire (Oldfield, 1971). Seventy students participated in the self-paced reading experiment (40 women, 30 men; mean age = 20.9 years), none of whom had participated in the norming phase or in the ERP experiment.

Materials

One hundred seventy idioms formed by a verb plus at least two constituents were selected from various collections of Italian idioms. This initial set was presented to 62 participants, who were asked to rate each idiom for familiarity (on a 7-point scale ranging from 1 = never heard to 7 = heard very often) and to paraphrase it. We selected 124 idioms that were familiar (M = 4.96, SD = 0.68, range = 3.6–6.3) and correctly paraphrased (M = 0.90, SD = 0.08, range = 0.74–1.00). To test idiom predictability and to identify the RP, 10 written questionnaires were prepared containing idiom fragments of increasing length inserted in minimal neutral contexts. Literal fragments were intermixed with the idiom fragments so that the latter represented only one third of the materials in each list. Eighteen students per list were asked to complete each sentence with the first words that came to mind. The RP of each idiom was operationally defined as the constituent after which the idiomatic completion probability exceeded .65. The idiom’s RP fell at least one or two words before the offset of the idiom string.
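The operational RP definition lends itself to a short sketch. Assuming the norming responses are stored as lists of completions keyed by how many idiom words the fragment contained, the hypothetical `find_rp` below returns the index of the first constituent after which the proportion of idiomatic completions exceeds .65. The scoring rule (a completion counts as idiomatic if its first words are the remaining idiom words) and the example responses are invented for illustration; the paper does not specify how completions were scored.

```python
from typing import Dict, List, Optional

RP_THRESHOLD = 0.65  # criterion reported for the norming study

def normalise(text: str) -> List[str]:
    """Lower-cased word list, so trivially different completions compare equal."""
    return text.lower().split()

def idiomatic_completion_prob(completions: List[str], idiom_rest: str) -> float:
    """Share of completions that continue the fragment with the rest of the idiom.

    Assumption: a completion is idiomatic when its first words are exactly the
    remaining idiom words (further words are allowed).
    """
    if not completions:
        return 0.0
    target = normalise(idiom_rest)
    hits = sum(1 for c in completions if normalise(c)[: len(target)] == target)
    return hits / len(completions)

def find_rp(idiom_words: List[str], completions_by_length: Dict[int, List[str]]) -> Optional[int]:
    """Index of the RP constituent: the last word of the shortest fragment whose
    idiomatic-completion probability exceeds the threshold (None if never reached)."""
    for k in sorted(completions_by_length):
        if k >= len(idiom_words):
            continue  # a complete idiom leaves nothing to predict
        rest = " ".join(idiom_words[k:])
        if idiomatic_completion_prob(completions_by_length[k], rest) > RP_THRESHOLD:
            return k - 1  # a fragment of k idiom words ends at index k - 1
    return None

# Hypothetical responses (not the study's data), keyed by idiom words shown
words = ["aveva", "un", "buco", "nello", "stomaco"]
data = {
    2: ["cane", "buco nello stomaco", "libro", "dolore alla testa", "gatto"],  # "Giorgio aveva un…"
    3: ["nello stomaco", "Nello stomaco", "nella calza", "nello  stomaco", "nello stomaco"],  # "…un buco…"
}
rp = find_rp(words, data)
print(rp, words[rp])  # 2 buco
```

Because the criterion is a strict threshold on a proportion, the RP found this way depends on the fragment lengths that were actually normed; with only one fragment per length, the first crossing is taken as the RP.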
The mean cloze-probability of idiomatic completions after the RP was .85 (range = 0.66–1.00). For each of the 87 selected idioms,³ we constructed three sentences of similar syntactic structure and length. In the idiomatic condition (see Example 1a in Table 1; for further examples, see Appendix 1), the sentence contained the idiom string in its canonical form, embedded in as neutral a context as possible. The idiom string was always followed by two or three constituents (the same was done in the substitution and violation conditions). In the substitution condition (Example 1b, Table 1), the RP was replaced with a constituent unrelated to the idiomatic meaning and matched to the idiom constituent for number of characters, concreteness, grammatical class, age of acquisition (AoA; idiom: M = 2.58, SD = 0.93, range = 0.72–4.44; substitution: M = 2.58, SD = 0.85, range = 0.88–4.28; t < 1), and written frequency (idiom: M = 5.88, SD = 2.98, range = 0–11.84; substitution: M = 6.05, SD = 2.83, range = 0.39–11.71; t < 1). In the violation condition (Example 1c, Table 1), the constituent after the RP (RP+1) was replaced with a constituent unrelated to the idiomatic meaning and matched to the idiomatic constituent for number of characters, concreteness, grammatical class, AoA (idiom: M = 3.03, SD = 1.21, range = 0.6–5.45; violation: M = 2.78, SD = 0.85, range = 1.07–4.49; t < 1), and written frequency (idiom: M = 7.59, SD = 3.23, range = 1.13–14.05; violation: M = 7.24, SD = 2.92, range = 1.4–13.08; t < 1).⁴ The substitution and violation conditions consisted of semantically and syntactically well-formed literal sentences with a number of words comparable to that of the sentences in the idiomatic condition (idiom: M = 8.54, SD = 1.31; substitution: M = 8.53, SD = 1.32; violation: M = 8.69, SD = 1.42; t < 1).

We assessed the cloze-probability of the constituents at the RP and at the RP+1 in the idiomatic, substitution, and violation conditions. Idiom predictability and the cloze-probability of an idiom constituent can differ because predictability refers to the probability that an idiom fragment is completed idiomatically, whereas the cloze-probability of a constituent refers only to the probability of a given word appearing, irrespective of further continuations.
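The two measures just defined differ only in what counts as a hit. The sketch below, under stated assumptions, scores cloze-probability on the first word of each completion and idiom predictability on whether the completion continues with the remaining idiom words. The function names and the Italian example responses are invented for illustration.

```python
from typing import List

def tokens(text: str) -> List[str]:
    """Lower-cased word list of a completion."""
    return text.lower().split()

def cloze_probability(completions: List[str], word: str) -> float:
    """Share of completions whose first word is `word`, whatever follows it."""
    if not completions:
        return 0.0
    hits = sum(1 for c in completions if tokens(c)[:1] == [word.lower()])
    return hits / len(completions)

def idiom_predictability(completions: List[str], idiom_rest: str) -> float:
    """Share of completions that continue with the remaining idiom words."""
    if not completions:
        return 0.0
    target = tokens(idiom_rest)
    hits = sum(1 for c in completions if tokens(c)[: len(target)] == target)
    return hits / len(completions)

# Invented responses, for illustration only
after_a = ["buco nello stomaco", "buco nella calza", "cane", "libro"]              # "Giorgio aveva un…"
after_hole = ["nello stomaco", "nella calza", "nello stomaco", "nello stomaco"]  # "Giorgio aveva un buco…"
print(cloze_probability(after_a, "buco"))                 # 0.5
print(idiom_predictability(after_hole, "nello stomaco"))  # 0.75
```

Note that "buco nella calza" counts toward the cloze-probability of buco even though it does not continue the idiom, which is why a constituent can have a modest cloze-probability while the idiom itself is highly predictable once that constituent has been read.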
To illustrate the distinction: in “George had a hole_RP in_RP+1 the stomach that morning,” predictability after the RP (i.e., after hole) is defined as the proportion of participants who, when presented with the fragment George had a hole…, completed it with in the stomach. In contrast, the cloze-probability of the RP (hole) is the proportion of participants who, when presented with George had a…, continued it with hole, regardless of whether the following words were consistent with the idiom. The cloze-probability values for the two critical constituents (RP and RP+1) in the three experimental conditions are reported in Table 2. The cloze-probability of the constituent after which the idiom was recognized (i.e., the RP) showed large between-item variability; however, half of the items had a cloze-probability between .11 and .62, and the mean value was .37. The cloze-probability of the word that replaced the RP in the substitution condition was even lower (75% of the items had a cloze-probability below .05). The constituent following the RP had a high cloze-probability in the idiomatic condition, with all idioms showing a cloze-probability of at least .68 (mean value = .86). The constituent that violated the expectancy generated by the idiom fragment (i.e., in the RP+1 position of the violation condition) had a very low cloze-probability (for more than 75% of the items, none of the participants completed the sentence with that constituent). The word that followed the substituted constituent (i.e., the RP+1 in the substitution condition) had, on average, a cloze-probability of .18. Again, there was high variability in the cloze-probability of this constituent (50% of the items were below .10 vs. 25% above .32). These differences reflect semantic constraints deriving from the need to complete the sentences in a meaningful fashion (we preserved the vocabulary class of the items in almost all cases, but it was impossible to balance the other lexical characteristics).⁵

Table 2. Means and Quartile Values of the Cloze-probability Distribution at the RP and RP+1 in the Three Experimental Conditions

Position  Condition      Mean   Min    1st Quartile  2nd Quartile  3rd Quartile  Max
RP        Idiom          0.37   0      0.11          0.37          0.62          1
RP        Substitution   0.05   0      0             0             0.05          0.37
RP+1      Idiom          0.86   0.68   0.79          0.86          0.94          1
RP+1      Substitution   0.18   0      0             0.10          0.32          0.90
RP+1      Violation      0.02   0      0             0             0             0.48

Twenty-five percent of the items had a cloze-probability lower than the first quartile value, 50% lower than the second, and 75% lower than the third. A zero value for the third quartile means that for more than 75% of the items, none of the participants completed the fragment with that specific word.

Three lists were prepared using a Latin square design, and participants were randomly assigned to one of them. Each list contained 29 experimental sentences per condition (idiomatic vs. substitution vs. violation) intermixed with 120 literal filler sentences⁶ of similar length and structure in random order, the only constraint being that two experimental sentences in the same condition were never presented consecutively.
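One way to implement this counterbalancing and ordering constraint is sketched below. The rotation rule (item i is assigned to condition (i + k) mod 3 in list k) is a standard Latin square assignment assumed here, since the paper does not describe its randomization code; item and filler identifiers are hypothetical. For simplicity, the sketch enforces a slightly stronger constraint than the one stated: consecutive experimental trials differ in condition even when fillers separate them.

```python
import random
from typing import Dict, List, Tuple

CONDITIONS = ("idiomatic", "substitution", "violation")
Trial = Tuple[str, str]  # (item id, condition)

def latin_square_lists(n_items: int) -> List[List[Trial]]:
    """List k assigns item i to condition (i + k) mod 3: every item appears once
    per list, in a different condition in each list."""
    n = len(CONDITIONS)
    return [[(f"I{i}", CONDITIONS[(i + k) % n]) for i in range(n_items)] for k in range(n)]

def order_experimental(trials: List[Trial], rng: random.Random, max_attempts: int = 1000) -> List[Trial]:
    """Random order in which consecutive experimental trials never share a
    condition; restarts if the greedy draw runs into a dead end."""
    for _ in range(max_attempts):
        pool: Dict[str, List[Trial]] = {c: [t for t in trials if t[1] == c] for c in CONDITIONS}
        for items in pool.values():
            rng.shuffle(items)
        order: List[Trial] = []
        prev = None
        while any(pool.values()):
            allowed = [c for c in CONDITIONS if pool[c] and c != prev]
            if not allowed:
                break  # only same-condition trials left: start over
            # Favour the condition with more trials left, which keeps counts balanced
            cond = rng.choices(allowed, weights=[len(pool[c]) for c in allowed])[0]
            order.append(pool[cond].pop())
            prev = cond
        else:
            return order
    raise RuntimeError("ordering constraint could not be satisfied")

def build_session(trials: List[Trial], fillers: List[Trial], rng: random.Random) -> List[Trial]:
    """Randomly interleave shuffled fillers with the constrained experimental order.
    Experimental trials keep their relative order, so two same-condition
    experimental sentences can never end up adjacent."""
    exp_order = order_experimental(trials, rng)
    fillers = fillers[:]
    rng.shuffle(fillers)
    slots = ["E"] * len(exp_order) + ["F"] * len(fillers)
    rng.shuffle(slots)
    exp_iter, fil_iter = iter(exp_order), iter(fillers)
    return [next(exp_iter) if s == "E" else next(fil_iter) for s in slots]

rng = random.Random(2009)
lists = latin_square_lists(87)                    # 3 lists x 29 trials per condition
fillers = [(f"F{j}", "filler") for j in range(120)]
session = build_session(lists[0], fillers, rng)   # 207 trials for one participant
```

Shuffling "E"/"F" slot labels gives a uniformly random interleaving of the two sequences, so the fillers are distributed at random while the experimental order built under the constraint is preserved.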

Wooley, 1982). Each trial began with a button press at which the first word of the sentence was displayed on the screen with all nonspace characters of the rest of the sentence replaced by dashes. When the participant pressed the space bar, the following word was displayed replacing the corresponding dashes and the previous one reverted back to dashes. We measured the time between each button press and the accuracy of responses to the comprehension questions. The moving window paradigm, in which the reader knows in advance how many words are coming up, did not bias the reader toward the idiom as the idioms were always embedded in larger contexts. Practice, task, and instructions were the same as in the ERP experiment. EEG Acquisition and ERP Extraction

Procedure The sentences were visually presented word by word in the center of a computer screen. The participants were instructed to read the sentences for comprehension. The instructions were given in written form and then orally repeated after a brief training. Each trial began when a participant pressed a keyboard button. A fixation point (a cross) at the center of the screen was substituted by single words presented for 300 msec and separated by a 300-msec blank (ISI = 600 msec). The last word of a sentence was followed by a period. The presentation of each sentence was followed by a 1500-msec blank. Every 10 sentences on average, the participants were asked to answer a true– false question about the content of the sentence just read. After each response, feedback was given. The experiment lasted approximately 35 min. The experiment started with a practice session formed by 15 literal sentences similar in structure and length to the experimental ones. In the self-paced reading experiment, the same experimental and filler sentences were visually presented word by word in the center of a computer screen using a moving window self-paced reading procedure ( Just, Carpenter, &

The electroencephalogram (EEG) was amplified and recorded with the BioSemi Active-Two System from 30 active electrodes placed on the scalp (Fp1, Fp2, AF3, AF4, F3, F4, F7, F8, FC1, FC2, FC5, FC6, C3, C4, T7, T8, CP1, CP2, CP5, CP6, P3, P4, P7, P8, O1, O2, PO3, PO4, Fz, Cz, Pz, Oz), plus four electrodes placed around the eyes to monitor eye movements (two at the outer canthi and two below the eyes) and two electrodes placed over the left and right mastoids. Two additional electrodes placed close to Cz, the Common Mode Sense (CMS) active electrode and the Driven Right Leg (DRL) passive electrode, formed the feedback loop that drives the average potential of the participant as close as possible to the reference potential of the AD-box (Metting van Rijn, Peper, & Grimbergen, 1990). EEG and EOG signals were amplified and digitized continuously at a sampling rate of 512 Hz. Trigger signals were generated and recorded for synchronization. EEG signals were re-referenced off-line to the average activity of the two mastoids and then analyzed using Brainvision Analyzer. After band-pass filtering (0.2–30 Hz), 1500-msec epochs containing the ERPs elicited by the two target words (RP and RP+1) were extracted, starting 200 msec before the onset of the RP. Epochs containing artifacts exceeding ±100 μV on any channel were rejected, and the accepted epochs were averaged after a 200-msec prestimulus baseline correction. Six participants were excluded from the analyses because more than 25% of their epochs were rejected.

EEG Data Analyses

The average waveforms extracted for each participant and condition were used to compute the grand averages, to carry out ANOVAs in fixed time windows, and to conduct latency analyses and principal component analyses (PCAs). The analyses of single-subject mean voltages in fixed time windows were repeated measures ANOVAs, with the Greenhouse–Geisser correction applied whenever the numerator had more than one degree of freedom. Separate ANOVAs were carried out on different electrode groups in the 300–500 msec time window: one for the midline sites (Fz, Cz, Pz, Oz) and one for 24 lateralized sites, which were organized along three orthogonal topographical factors (lateralization, longitude, and mesiolateral position; see Table 3). The ANOVAs compared the idiomatic, substitution, and violation conditions at the RP and at the RP+1.

The ANOVAs were followed by t tests on the average voltage differences between conditions at separate topographic levels. These tests addressed two hypotheses: a maximal effect around Cz in the comparison between the idiomatic and substitution conditions at the RP (N400), and a more posterior effect, maximal around Pz, between the idiomatic and violation conditions at the RP+1 (given that the P300 in the idiomatic condition was expected to contribute to the global effect). Separate comparisons were conducted for each midline site, whereas for the 24 lateralized sites the differences were averaged within each cell defined by the longitude and mesiolateral factors of the ANOVA (i.e., collapsing over hemisphere), because the effects were not expected to be lateralized.
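The cell-wise t tests for the lateralized sites can be sketched in plain Python as follows. This is a hedged illustration, not the authors' code: it assumes per-subject mean amplitudes (300–500 msec) stored as `amps[condition][site]`, averages the left and right sites of each Table 3 cell, runs a paired t test per cell, and applies a Bonferroni factor equal to the number of cells (12), which is an assumption about the family of comparisons. The Student t p value is computed from the regularized incomplete beta function (Numerical Recipes-style continued fraction). All names are hypothetical. Passing two conditions as one side reproduces the pooled idiomatic + violation mean used at the RP.

```python
import math
from statistics import mean, stdev

# Table 3 cells (longitude, mesiolateral); left/right homologues share a cell.
SITE_CELLS = {
    "F3": ("F", "M"), "F4": ("F", "M"), "F7": ("F", "L"), "F8": ("F", "L"),
    "FC1": ("FC", "M"), "FC2": ("FC", "M"), "FC5": ("FC", "L"), "FC6": ("FC", "L"),
    "C3": ("C", "M"), "C4": ("C", "M"), "T7": ("C", "L"), "T8": ("C", "L"),
    "CP1": ("CP", "M"), "CP2": ("CP", "M"), "CP5": ("CP", "L"), "CP6": ("CP", "L"),
    "P3": ("P", "M"), "P4": ("P", "M"), "P7": ("P", "L"), "P8": ("P", "L"),
    "O1": ("O", "M"), "O2": ("O", "M"), "PO3": ("O", "L"), "PO4": ("O", "L"),
}
TINY = 1e-300


def _beta_cf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > TINY else TINY)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))  # even step
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > TINY else TINY)
        c = 1.0 + aa / c
        c = c if abs(c) > TINY else TINY
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))  # odd step
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > TINY else TINY)
        c = 1.0 + aa / c
        c = c if abs(c) > TINY else TINY
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _beta_inc(a, b, x):
    """Regularized incomplete beta I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def two_tailed_p(t, df):
    """Two-tailed p value of Student's t with df degrees of freedom."""
    return _beta_inc(df / 2.0, 0.5, df / (df + t * t))


def condition_mean(amps, conditions, site):
    """Per-subject mean over one or more conditions (e.g., pooled idiomatic + violation)."""
    return [mean(vals) for vals in zip(*(amps[c][site] for c in conditions))]


def cell_tests(amps, conds_a, conds_b):
    """Paired t tests of A minus B per Table 3 cell, Bonferroni-adjusted over the cells."""
    cells = {}
    for site, cell in SITE_CELLS.items():
        a = condition_mean(amps, conds_a, site)
        b = condition_mean(amps, conds_b, site)
        cells.setdefault(cell, []).append([x - y for x, y in zip(a, b)])
    results = {}
    for cell, site_diffs in cells.items():
        d = [mean(v) for v in zip(*site_diffs)]  # average the two hemispheres per subject
        t = mean(d) / (stdev(d) / math.sqrt(len(d)))
        df = len(d) - 1
        results[cell] = (t, df, min(1.0, len(cells) * two_tailed_p(t, df)))
    return results
```

At the RP, a call such as `cell_tests(amps, ["substitution"], ["idiomatic", "violation"])` would return one (t, df, adjusted p) triple for each of the 12 cells.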
All p values were adjusted with the Bonferroni correction. Because the idiomatic and violation conditions are identical up to the RP+1, we pooled the mean values of these two conditions for the analyses at the RP. Accordingly, the t tests at the RP were conducted on the differences between the ERP amplitude of the substitution condition and the pooled mean of the other two conditions. At the RP+1, the t tests were conducted on pairwise differences among the three conditions.

A latency analysis was carried out to evaluate the onset of the effects at both the RP and the RP+1. For this purpose, we considered the average activity over a large cluster of centro-parietal sites (C3, Cz, C4, CP1, CP2, P3, Pz, P4), where both the N400 and the P300 should be visible. Separate t tests were conducted on contiguous 10-msec intervals, comparing the substitution condition with the pooled mean of the idiomatic and violation conditions at the RP, and the idiomatic with the violation condition at the RP+1. The onset of an effect was defined as the first interval of a run of at least five consecutive statistically significant comparisons. This technique allows the latency of an effect to be evaluated (Rugg, Doyle, & Wells, 1995) and is appropriate when the relatively small number of trials per subject and condition (29 in our case) precludes reliable estimation of peak latencies or fractional area latencies at the single-subject level.

A temporal PCA was performed to better describe the ERP components underlying the effects at the RP and at the RP+1. This statistical decomposition technique (like spatial and spatio-temporal PCA and independent component analysis) can describe features of the ERP more objectively and more precisely than visual inspection alone (Dien & Frishkoff, 2005).
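The onset criterion of the latency analysis described above (the first of at least five consecutive significant 10-msec comparisons) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' implementation: each subject contributes one condition-difference waveform averaged into 10-msec bins, significance is an uncorrected two-tailed threshold `t_crit` supplied by the caller (for df = 42 this would be about 2.02), and all names are hypothetical. Because a 10-msec bin at 512 Hz holds either 5 or 6 samples, `bin_means` assigns samples to bins by time.

```python
import math
from statistics import mean, stdev


def bin_means(samples, fs_hz, bin_ms=10.0):
    """Average one waveform into contiguous bin_ms bins (sample 0 starts bin 0)."""
    bins = {}
    for i, value in enumerate(samples):
        bins.setdefault(int((i * 1000.0 / fs_hz) // bin_ms), []).append(value)
    return [mean(bins[k]) for k in sorted(bins)]


def onset_latency(diff_waves, t_crit, bin_ms=10.0, min_run=5, t0_ms=0.0):
    """Onset = start of the first run of >= min_run consecutive bins with |t| > t_crit.

    diff_waves: one binned difference waveform per subject (same length for all).
    Returns the onset in msec (bin 0 starts at t0_ms), or None if no run is found.
    """
    run = 0
    for k in range(len(diff_waves[0])):
        d = [wave[k] for wave in diff_waves]
        m, sd = mean(d), stdev(d)
        # One-sample t on the paired differences; guard against zero variance
        t = m / (sd / math.sqrt(len(d))) if sd > 0 else (math.inf if m else 0.0)
        run = run + 1 if abs(t) > t_crit else 0
        if run == min_run:
            return t0_ms + (k - min_run + 1) * bin_ms
    return None
```

A run shorter than `min_run` (for instance, four significant bins) resets the counter, so brief threshold crossings do not count as an onset.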
Table 3. Organization of 24 Measurement Sites into Three Orthogonal Topographical Factors for the ANOVAs

Site   Lateralization   Longitude   Mesiolateral
F3     L                F           M
F4     R                F           M
F7     L                F           L
F8     R                F           L
FC1    L                FC          M
FC2    R                FC          M
FC5    L                FC          L
FC6    R                FC          L
C3     L                C           M
C4     R                C           M
T7     L                C           L
T8     R                C           L
CP1    L                CP          M
CP2    R                CP          M
CP5    L                CP          L
CP6    R                CP          L
P3     L                P           M
P4     R                P           M
P7     L                P           L
P8     R                P           L
O1     L                O           M
O2     R                O           M
PO3    L                O           L
PO4    R                O           L

Lateralization (2 levels; L = Left; R = Right), Longitude (6 levels; F = Frontal; FC = Fronto-central; C = Central; CP = Centro-parietal; P = Parietal; O = Occipital), Mesiolateral (2 levels; M = Medial; L = Lateral).

Journal of Cognitive Neuroscience
Volume 22, Number 8

Recent simulations have shown that temporal PCA is well suited for distinguishing between components that partially overlap in time, such as the N400 and the P300 (Dien, Khoe, & Mangun, 2007). Moreover, because temporal PCA defines factors on the basis of their time course, it is particularly appropriate for analyzing long multiword epochs. In a temporal PCA, the time points are the variables: the factor loadings therefore represent the time course of each factor, and the factor scores quantify the contribution of each subject, condition, and electrode to each factor. The factor loadings are common to the entire dataset, and visual inspection of their time course is necessary to select the latent factors likely to explain the effects under study. Before running the temporal PCA, the data were low-pass filtered (15-Hz cutoff) and resampled at 256 Hz, because we were not interested in extracting factors accounting for fast signals. Temporal factors accounting for 90% of the variance were extracted and rotated with the Varimax procedure.
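The temporal PCA just described can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline, and dedicated ERP PCA implementations may differ in details such as correlation versus covariance matrices or Kaiser normalization. The sketch assumes one row per waveform (subject × condition × electrode) and one column per time point. It uses the covariance matrix, keeps the smallest number of factors reaching 90% cumulative variance, applies an unnormalized Varimax rotation, and computes least-squares factor scores. The function names and the synthetic test data are hypothetical.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-6):
    """Orthogonal Varimax rotation of a (time points x factors) loading matrix.

    No Kaiser normalization is applied.
    """
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient of the Varimax criterion, projected onto orthogonal matrices via SVD
        grad = loadings.T @ (rotated ** 3 - rotated @ np.diag(np.sum(rotated ** 2, axis=0)) / p)
        u, s, vt = np.linalg.svd(grad)
        rotation = u @ vt
        previous, criterion = criterion, np.sum(s)
        if previous != 0 and criterion < previous * (1 + tol):
            break
    return loadings @ rotation, rotation


def temporal_pca(waveforms, variance_target=0.90):
    """Temporal PCA with rows = waveforms (subject x condition x electrode), columns = time points."""
    centered = waveforms - waveforms.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
    order = np.argsort(eigvals)[::-1]  # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cumulative = np.cumsum(eigvals) / np.sum(eigvals)
    n_factors = int(np.argmax(cumulative >= variance_target)) + 1
    # Unrotated loadings: eigenvectors scaled by the square root of their variance
    loadings = eigvecs[:, :n_factors] * np.sqrt(eigvals[:n_factors])
    rotated, _ = varimax(loadings)
    # Least-squares factor scores: one value per waveform and factor
    scores = centered @ rotated @ np.linalg.inv(rotated.T @ rotated)
    return rotated, scores, cumulative[n_factors - 1]
```

Each column of the returned loading matrix is the time course of one factor. Inspecting where each column peaks corresponds to the visual-inspection step described above, and the matching column of `scores` can then be compared across conditions and electrodes.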

RESULTS

The ERP Study

The participants responded to the comprehension questions with an overall accuracy of 93%, indicating that they indeed read for comprehension. Figure 1 shows the grand-average ERPs for the three experimental conditions time-locked to the RP of the idiom. Figure 2 shows the grand-average waveforms over the cluster of centro-parietal sites used for the latency analysis. Visual inspection of the waveforms reveals that, relative to the idiomatic condition, the literal conditions elicited a more negative potential peaking at around 400 msec: at the RP in the substitution condition and at the RP+1 in the violation condition. However, these two effects differ in timing and topographical distribution. The effect observed at the RP in the idiomatic versus substitution comparison is compatible with a centrally distributed N400, slightly more pronounced over the right hemisphere. In contrast, the effect at the RP+1 in the idiomatic versus violation comparison begins earlier and has a different topographical distribution: It is more posterior and is also visible at the occipital sites. Moreover, the waveform elicited by the idiomatic condition at the RP+1 shows a clearly visible peak that is absent in the other conditions. This peak is also missing from the waveforms elicited by the idiomatic and substitution conditions at the RP. Its amplitude is similar to that of the preceding P200 at the occipital and parietal sites, and it is comparable to what Roehm et al. (2007) have classified as a P300.

Figure 1. Grand-average waveforms of the ERPs plotted in the negative-upward convention, for the three experimental conditions: idiomatic (thick line), substitution (thin continuous line), and violation (thin dashed line). The vertical lines correspond to the onset of the recognition point and of the two following words (SOA = 600 msec). Note that the first word (the RP) is the same in the idiomatic and violation conditions.

Figure 2. Grand-average waveforms of the ERPs over the cluster of centro-parietal sites (C3, Cz, C4, CP1, CP2, P3, Pz, P4) for the idiomatic (thick line), substitution (thin continuous line), and violation (thin dashed line) conditions. Thicker vertical lines correspond to the onset of the RP and of the RP+1; thin vertical lines are drawn at 300 and 500 msec after the onset of each word.

We performed the ANOVAs on the mean voltage in the 300–500 msec time window, an interval typically used for quantifying the N400. The results for the midline and the lateralized sites are reported in Tables 4 and 5, respectively. At the RP, we obtained main effects of condition and longitude for the midline electrodes (see Table 4). The t tests (see Table 6) showed that, despite the absence of a significant Condition × Longitude interaction in the ANOVA, the effect was maximal at Cz. The ANOVAs on the lateralized sites at the RP (see Table 5) showed two interactions involving condition: a Mesiolateral × Condition interaction and a Longitude × Mesiolateral × Condition interaction. The Bonferroni-corrected t tests showed only a marginally significant effect for the C3, C4 electrode pool (Longitude C, Mesiolateral M): mean difference = 0.983 μV, t(42) = 2.991, p < .1. Overall, the analyses indicate that the scalp distribution of the effect was broad and maximal around the vertex, as expected for a modulation of the N400 amplitude.

At the RP+1 (see Table 4), we observed a main effect of condition and a Site × Condition interaction for the midline sites, which quantitatively supports the more posterior distribution of the effect. In fact, the t tests (Table 7) showed significant differences between the violation and idiomatic conditions not only at Cz but also at Pz and Oz. We obtained similar results in the comparison between the idiomatic and substitution conditions, whereas no significant differences emerged when we compared the two literal conditions. The ANOVAs at the RP+1 for the lateralized sites (see Table 5) showed a main effect of condition and significant interactions of this factor with the longitude and mesiolateral factors. Two significant three-way interactions (Longitude × Condition × Mesiolateral; Longitude × Condition × Laterality) were also obtained. The t tests (Table 8) suggest that the Longitude × Condition interaction was due to significant differences between the violation and idiomatic conditions at posterior sites (centro-parietal, parietal, and occipital). The comparison between the idiomatic and substitution conditions was significant only at the mesial centro-parietal and parietal sites, which might account for the Longitude × Condition × Mesiolateral interaction. The three-way interaction with laterality was unexpected. It might reflect either a slight asymmetry of the posterior effect between the idiomatic and violation conditions (i.e., larger effects at left centro-parietal sites) or the onset of a negative deflection for the idiomatic condition at left anterior sites, evident in the grand average (see Figure 1).

Table 4. Summary of the ANOVA Analyses RP+1

RP Source

df

F

p

F

Long

(3, 129)

5.34

.013***

1.04

Cond

(2, 86)

3.69

.031**

11.11