Cognitive Science 38 (2014) 367–382 Copyright © 2013 Cognitive Science Society, Inc. All rights reserved. ISSN: 0364-0213 print / 1551-6709 online DOI: 10.1111/cogs.12096

Eye Movements Reveal the Dynamic Simulation of Speed in Language

Laura J. Speed, Gabriella Vigliocco
Department of Cognitive, Perceptual and Brain Sciences, University College London

Received 10 May 2012; received in revised form 23 March 2013; accepted 1 April 2013

Abstract

This study investigates how speed of motion is processed in language. In three eye-tracking experiments, participants were presented with visual scenes and spoken sentences describing fast or slow events (e.g., The lion ambled/dashed to the balloon). Results showed that looking time to relevant objects in the visual scene was affected by the speed of the verb in the sentence, the speaking rate, and the configuration of the supporting visual scene. The results provide novel evidence for the mental simulation of speed in language and show that internal dynamic simulations can be played out via eye movements toward a static visual scene.

Keywords: Simulation; Embodiment; Language comprehension; Eye movements

Correspondence should be sent to Laura J. Speed, Department of Cognitive, Perceptual and Brain Sciences, University College London, WC1H 0AP London, UK. E-mail: [email protected]

Time travels in divers paces with divers persons. I'll tell you who Time ambles withal, who Time trots withal, who Time gallops withal, and who he stands still withal. —William Shakespeare, As You Like It

How does the brain process motion words such as those used by Shakespeare in the quote above? How do we understand that some refer to slow movement and others to fast? One explanation for this ability comes from the embodied language approach (e.g., Barsalou, 1999). On this account, language understanding is the simulation of the events referred to, a simulation realized using the same systems as those used in actual perception and action (e.g., Glenberg & Kaschak, 2002; Kousta, Vigliocco, Vinson, Andrews, & Del Campo, 2011; Stanfield & Zwaan, 2001). In other words, the cognitive processes used in understanding meaning are "body based" (Wilson, 2002). For example, understanding the meaning of a sentence describing a motion event would involve an internal dynamic perceptual simulation of that event, with strong similarities to the processing of a comparable real-world event (Richardson & Matlock, 2007). According to this approach, the specific features of motion in the excerpt above would be simulated.

So far, research has shown that concrete dimensions of objects or events, such as orientation or size, are included in a simulation (e.g., Stanfield & Zwaan, 2001; Zwaan, Madden, Yaxley, & Aveyard, 2004; Zwaan, Stanfield, & Yaxley, 2002). For example, readers represent the fact that a nail has a different orientation if it is being hammered into the ground rather than into the wall (Stanfield & Zwaan, 2001). Similarly, in understanding descriptions of motion, dynamic simulations are generated, including features such as direction of motion (Kaschak et al., 2005; Meteyard, Bahrami, Vigliocco, & Zokaei, 2008) and change of object size during motion (Zwaan et al., 2004). These studies, however, go little beyond establishing that language interacts with perception and action. Moreover, and especially central to our goals here, we know little about whether aspects of an event that do not map directly onto a single concrete type of experience, like speed, are simulated.

To process the speed of a moving stimulus, one needs to take into account and integrate two pieces of information: temporal and spatial properties. Thus, understanding speed involves quite complex computations over space and time (Lingnau, Ashida, Wall, & Smith, 2009). Because of this complexity, it is possible that this level of motion information is not included within the internal simulation of sentence meaning. On the other hand, speed information is important to encode in order to function efficiently in real-world interactions, and hence it may be an integral part of a simulated event. For example, tracking the speed of a moving object allows one to anticipate when it will reach a certain destination, which is essential in daily activities such as crossing a road or catching a ball, so encoding this information in language may be particularly important. Looking at speed therefore allows us to investigate to what extent perceptual simulations generated during language comprehension capture the specific kinematic properties of real-world motion.

To investigate whether perceptual simulations are sensitive to such fine-grained real-world differences, here we examine how listeners direct their eye gaze to static scenes while listening to sentences describing slow or fast motion events. A similar approach was used by Richardson and Matlock (2007), who manipulated speed indirectly through descriptions of motion traversing easy (The desert is flat) versus difficult (The desert is hilly) terrain. Participants spent more time looking at the path region of a visual scene for fictive motion sentences that followed a description of difficult terrain than of easy terrain, suggesting that listeners developed a mental representation of motion along the path. Eye movements observed while attending to a visual scene and comprehending linguistic information are thought to reflect sensorimotor patterns learned from experience with events in the world. That is, the perceptual-motor content of sentences (such as motion information) can influence the direction of visual attention and its interaction with visual properties of the scene (Mishra & Marmolejo-Ramos, 2010).
Perceptual simulations generated during language comprehension and indexed by eye movements are consistent with the view that language develops in support of situated action (Barsalou, 1999; Glenberg & Gallese, 2012). Eye tracking is an important paradigm for investigating embodied claims because it allows us to go beyond existing research by assessing simulation without a dual-task design, in which the presence of mental simulation is inferred only from interference or facilitation effects that arise when a language task is combined with a task or stimulus sharing perceptual and action features (e.g., Glenberg & Kaschak, 2002; Zwaan & Taylor, 2006). By recording eye movements during the presentation of a spoken sentence, we can monitor simulation online during the process of language comprehension. Moreover, we can observe simulation without requiring an additional task or judgment, in a manner more natural to language use: simply listening and understanding (see Altmann, 2011, for discussion). Evidence of simulation in this type of situation is crucial to arguments for the necessary role of simulation in language understanding.

If listeners simulate the speed of events, as predicted by embodiment approaches, events described as having different speeds should differ with respect to the duration of the corresponding simulation. Because of the strong interactive processes between language comprehension, world knowledge, and visual attention (Altmann & Kamide, 2004; Crocker, Knoeferle, & Mayberry, 2010; Huettig, Rommers, & Meyer, 2011), this difference in duration of simulation should be reflected in the low-level visual processes engaged in eye-movement control, with looking times to objects in a supporting visual scene being longer for events described as slow than for events described as fast.
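This prediction can be stated compactly. The following is our gloss, a hedged formalization that does not appear in the original article: if the simulated event inherits real-world kinematics, then the simulated duration of traversing a path of length d at described speed v is d/v, and looking time is assumed to track simulated duration.

```latex
% Simulated duration of a motion event over path length d at described speed v
% (assumption: the simulation preserves real-world kinematics):
\[
  t_{\mathrm{sim}} = \frac{d}{v}, \qquad v_{\text{amble}} < v_{\text{dash}}
\]
% If looking time tracks simulated duration, then for the same scene:
\[
  t_{\mathrm{look}}(\text{slow verb}) > t_{\mathrm{look}}(\text{fast verb})
\]
```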

1. Experiment 1

The first experiment contrasted events described by "slow" verbs (e.g., amble) and "fast" verbs (e.g., dash) in spoken sentences, accompanied by matching visual scenes. Our interest was in whether listeners looked for longer or shorter durations at objects in the scene depending on the speed of the verb. In addition, we manipulated the speaking rate (fast vs. slow) of the sentence. This was done mainly as a manipulation check: We would expect differences in eye-movement behavior between fast and slow speech simply because there is more time to look around the scene for sentences spoken slowly than for sentences spoken quickly. In addition, any interaction between speaking rate and speed of the verb would be theoretically important, suggesting that processing the semantic features of speed words and processing differences in the physical speed of speech engage overlapping processes. That is, understanding the meaning of speed described in the sentences may involve systems similar to those involved in processing the auditory speed of the sentences (i.e., speed perception), and combining these two types of speed may affect processing differently when the speeds match than when they do not. There could also be differences in the extent to which simulation occurs for fast and slow speech.


Some researchers argue that simulation of meaning is a slow process and thus does not fully develop all of the time (Barsalou, Santos, Simmons, & Wilson, 2008). Comprehension that requires quick or shallow understanding (e.g., in noisy situations or under time pressure) may rely more heavily on processes such as statistical linguistic patterns (Louwerse & Jeuniaux, 2010). Thus, we might expect to see evidence of simulation for sentences that are spoken slowly but not for sentences that are spoken quickly.

1.1. Method

1.1.1. Participants
Forty-four native English speakers with normal or corrected-to-normal vision (29 females, mean age = 24.1 years) were recruited from the UCL Psychology Subject Pool and were paid for their participation. Four participants were removed for having an insufficient number of looks to any of the objects in the scene.

1.1.2. Materials
Experimental items were spoken sentences describing either a fast or a slow event, for example, "The lion ambled/dashed to the balloon." Sixteen fast verbs (e.g., dash) and 16 slow verbs (e.g., amble) were used; these had previously been rated for their implied speed by a separate group of participants. The speaking rate of each sentence was also manipulated: either slow (an average of 116 words per minute) or fast (an average of 222 words per minute). Each experimental item therefore had four versions: a slow event with a slow speaking rate, a fast event with a fast speaking rate, a slow event with a fast speaking rate, and a fast event with a slow speaking rate. Each participant heard only one version of each item. Thus, there were four versions of the experiment, with items allocated to each using a Latin square design (a sketch of this allocation scheme is given below). Forty-one filler sentences were created that described either motion of no specific speed or no motion. Sentences were recorded by a speaker with an English accent in a soundproof room and spliced so that sentences with the same speaking rate were identical except for the verb, with the resulting sentences sounding natural. Mean sentence duration was 3,667 ms (SD = 242 ms) at the slow speaking rate and 1,777 ms (SD = 145 ms) at the fast speaking rate.

Each experimental item was paired with a visual scene that included the agent and target destination of the sentence and a distractor destination (e.g., see Fig. 1A). The agent was located at the center of the scene, connected by a path to the target destination and to the distractor destination, which were located on the left or right side, counterbalanced across trials.

1.1.3. Procedure
Before beginning the experiment, participants completed a "Mouse Training" task in which they had to click on circles on the screen for approximately 2 min. This task was included so that participants would feel comfortable with the mouse and get practice at clicking on objects in an experimental setting.
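As promised above, here is a minimal sketch of the Latin-square allocation described in the Materials. It is a hypothetical reconstruction in Python, not the authors' actual experiment script, and the condition labels are invented for illustration; it assumes a simple rotation-based Latin square.

```python
from itertools import product

# The 2 x 2 design: verb speed x speaking rate (labels are illustrative).
CONDITIONS = [f"{verb}_{rate}"
              for verb, rate in product(["slow_verb", "fast_verb"],
                                        ["slow_rate", "fast_rate"])]

N_ITEMS = 32  # 16 items built around fast verbs + 16 built around slow verbs

def latin_square_versions(n_items, conditions):
    """Rotate conditions across items: each version contains every item once,
    each condition occurs equally often within a version, and each item
    appears in every condition across the four versions."""
    n_cond = len(conditions)
    return [
        [(item, conditions[(item + version) % n_cond])
         for item in range(n_items)]
        for version in range(n_cond)
    ]

for i, version in enumerate(latin_square_versions(N_ITEMS, CONDITIONS), start=1):
    counts = {c: sum(1 for _, cond in version if cond == c) for c in CONDITIONS}
    print(f"Version {i}: {counts}")  # each of the 4 conditions appears 8 times
```

Each participant is then assigned one of the four versions, so no participant hears two versions of the same item, while every item appears in every condition across participants.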


Fig. 1. Example scene with (A) and without (B) a distractor for the sentence “The man zoomed to the shop.”

In each trial, participants first had to look at the center of the screen as a drift-correction check and then had to click on a fixation cross to ensure that the mouse position was central at the beginning of the trial. The visual scene was then presented for 1,000 ms, after which the sound file was played while the scene remained onscreen. Participants had to use the mouse to click on the last object mentioned in the sentence. After 25% of filler trials, comprehension questions such as "Did the moose go to the box?" were presented on screen, and participants responded by pressing the left mouse button for "yes" and the right mouse button for "no." Six practice trials were completed first. Note that the mouse task and comprehension questions were not of primary interest to us as dependent variables; rather, they served as an incentive for participants to listen carefully to the sentences. Eye movements were recorded using an EyeLink II head-mounted eye-tracker (SR Research Ltd., Mississauga, Ontario, Canada). The experiment lasted around 25 min.

1.2. Results and discussion

Eye movements were recorded from the onset of the spoken sentence to the time at which participants clicked on the target destination. As part of a data-cleaning process, fixations shorter than 150 ms were removed from analysis.

We observed an effect of speed only in eye movements. The data provide evidence that simulations develop in a manner consistent with the kinematics of real-world motion. Moreover, the results suggest that there is a link between simulation in language understanding and the low-level motor mechanisms that control eye movements, and that these mechanisms are sensitive to the fine-grained, dynamic motion information contained in a sentence.

We propose that mental simulations are not fixed but flexible, built on the fly, interacting with relevant visual information as it becomes available. The simulation, as evidenced by eye movements, is the combination of linguistic information from the sentence, information extracted from the visual scene, and world knowledge, such as how motion events unfold. When this information is in conflict, or when one source is unavailable or ambiguous (as the destination is in Experiment 1), the simulation does not take that source into account and is limited to the other available information that is reliable. In the case of Experiment 1, since only the agent is known, the simulation focuses exclusively on the agent. The simulation is very different when, instead, the destination is known, in which case the destination is simulated.

We note that in Experiment 3 we did not replicate the finding from Experiment 1 of an effect of speed on the sentence agent. This difference could be explained by the difference in task: In Experiment 3, participants were no longer required to click on the target destination, so they had less incentive to pay attention to the sentence and scene than in Experiment 1. Since the scenes with a distractor are temporarily ambiguous, comprehension of the sentence and scene is likely to be more difficult here, and hence participants may process them less efficiently. This is supported by the accuracy results for the comprehension questions: Participants performed significantly worse in the "with distractor" condition than in the "without distractor" condition (F(1, 39) = 11.798, p = .001).
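As an aside on the eye-movement measure used throughout, here is a minimal sketch of the data-cleaning and looking-time computation described in the Results of Experiment 1 above. It is written in Python with pandas; the column names and toy data are our hypothetical illustrations, not the authors' analysis pipeline.

```python
import pandas as pd

# Hypothetical fixation report: one row per fixation, with its duration (ms),
# the object it landed on, and the trial's verb-speed condition.
fixations = pd.DataFrame({
    "participant":   [1, 1, 1, 1, 2, 2],
    "trial":         [1, 1, 2, 2, 1, 1],
    "interest_area": ["agent", "target", "agent", "distractor", "agent", "target"],
    "duration_ms":   [120, 430, 310, 95, 180, 520],
    "verb_speed":    ["slow", "slow", "fast", "fast", "slow", "slow"],
})

# Data cleaning as described in the text: drop fixations shorter than 150 ms.
clean = fixations[fixations["duration_ms"] >= 150]

# Total looking time to each object, by verb-speed condition; the prediction
# is longer looking times in the slow-verb condition.
looking_time = (clean
                .groupby(["verb_speed", "interest_area"])["duration_ms"]
                .sum()
                .rename("total_looking_ms"))
print(looking_time)
```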
Importantly, in two experiments we have shown that when there is no ambiguity between the sentence meaning and the visual scene, there is a clear effect of the perceptual simulation of speed of motion.

How are eye movements engaged during sentence comprehension? The observed pattern of eye movements can be explained as a mapping between one's mental representation of the world, which is dynamic and changeable, and incoming linguistic information (Altmann & Kamide, 2007).


Thus, increased activation of elements in the developing simulation, caused by the linguistic input, can lead to a shift in covert attention to the corresponding elements in a visual scene, increasing the likelihood of overt eye movements to their location (Altmann, 2011). In terms of our results, the speed of motion described in the sentences affected the speed of the internally simulated event; the duration of activation of the elements in the sentence therefore differed according to the speed of the verb, and thus the time spent looking at the corresponding objects also differed. Spivey and Geng (2000) propose that the oculomotor system should respond to an activated visual representation irrespective of how it is generated (e.g., by visual perception, through language, or from memory) because "After all, how could it know the difference?" (p. 240).

These results are also compatible with an action-based theory of language (Glenberg & Gallese, 2012), which proposes that when listening to sentences, speech controllers are activated as a form of covert imitation, which in turn activates corresponding action controllers for the meaning of the words. Forward models are then generated via predictors, and the perceptual or motor consequences of the action controllers are anticipated. Predictors in this model are thought to correspond to simulators in embodied approaches (Barsalou, 1999). In terms of the present data, activation of forward models for the heard sentences causes the eyes to move toward the anticipated objects, which are then looked at in a way consistent with the form of the predicted action (i.e., for longer for slow motion than for fast motion).

One implication of these results is that sentence comprehension in the conditions in which eye-movement simulation was not observed may differ from comprehension when simulation was observed. For example, when sentences are spoken quickly, or when the configuration of the visual scene makes the matching between sentence and objects ambiguous, is comprehension hindered? The present experiments do not allow us to properly test this suggestion. Additional research, such as directly manipulating eye movements, would be invaluable for testing these implications.

Although a strength of our results is that we observed simulation online during comprehension, without requiring an extra task, an open question is to what extent the simulation of speed can be observed when there is no supporting visual scene. That is, the present results may be the consequence of a mapping between sentence meaning and visual scene, and they may not reflect more general simulation processes that occur in other comprehension situations. To address this question, other types of behavioral experiments could be conducted using a variety of language tasks. For example, one could combine speed in language with real-world speed and assess whether language judgments are affected by the interaction between the two types of speed, along similar lines to experiments conducted by Glenberg and Kaschak (2002) and Kaschak et al. (2005).

One may also question the generalizability of our effect. Recent work in other labs has also shown an effect of speed on eye movements, using different scenes and sentences (Lindsay, Scheepers, & Kamide, 2013). We also believe that mental simulation can take place across natural language situations but, as our results highlight, it may depend on situational factors.
For example, the lack of an effect in our fast speaking rate condition suggests that simulations may not develop in cases where comprehenders are under time constraints.


It is possible that the observed eye-movement patterns could be explained by a non-simulation account in which post-semantic imagistic representations develop from an amodal representation of the sentence meaning. Although we do not at present have evidence to rule out this view, we consider our interpretation preferable on grounds of parsimony: It postulates fewer representations. Further investigations could test the functional role of the simulations evidenced by eye movements by manipulating the eye-movement patterns in some way and assessing any resulting difference in comprehension.

This research advances theories of embodied language processing by providing the first evidence for the simulation of a relatively abstract and less salient aspect of semantics. Importantly, it shows that simulations are not simply shallow re-enactments but operate at a fine grain, according to specific properties of real-world interactions. Moreover, we provide this evidence in an experimental paradigm that allows a clear and natural observation of simulation. In addition, we add to the theoretical understanding of the processes linking language understanding and low-level visual mechanisms by showing that eye movements are sensitive to subtle semantic differences such as speed.

Acknowledgments

We thank Daniel Richardson for comments on the experiment design and the manuscript, and Julia Florentine for recording the experimental stimuli. The work reported here was supported by an ESRC grant (RES-062-23-2012) to GV. Data are archived in the ESRC Data Store (oai:store.ac.uk:archive:1079).

Note

1. All error bars in this paper represent 1 SE.

References

Altmann, G. T. M. (2011). The mediation of eye movements by spoken language. In S. P. Liversedge, I. D. Gilchrist, & S. Everling (Eds.), The Oxford handbook of eye movements (pp. 979–1003). Oxford, England: Oxford University Press.
Altmann, G. T. M., & Kamide, Y. (2004). Now you see it, now you don't: Mediating the mapping between language and the visual world. In J. Henderson & F. Ferreira (Eds.), The interface of language, vision, and action: Eye movements and the visual world (pp. 347–386). New York: Psychology Press.
Altmann, G. T. M., & Kamide, Y. (2007). The real-time mediation of visual attention by language and world knowledge: Linking anticipatory (and other) eye movements to linguistic processing. Journal of Memory and Language, 57, 502–518.
Balota, D. A., Yap, M. J., Cortese, M. J., Hutchison, K. A., Kessler, B., Loftis, B., Neely, J. H., Nelson, D. L., Simpson, G. B., & Treiman, R. (2007). The English Lexicon Project. Behavior Research Methods, 39, 445–459.


Barsalou, L. W. (1999). Perceptual symbol systems. Behavioral and Brain Sciences, 22, 577–660.
Barsalou, L. W., Santos, A., Simmons, W. K., & Wilson, C. D. (2008). Language and simulation in conceptual processing. In M. De Vega, A. M. Glenberg, & A. Graesser (Eds.), Symbols and embodiment: Debates in meaning and cognition (pp. 245–283). Oxford, England: Oxford University Press.
Crocker, M., Knoeferle, P., & Mayberry, M. (2010). Situated sentence processing: The coordinated interplay account and a neurobehavioral model. Brain and Language, 112, 189–201.
Glenberg, A. M., & Gallese, V. (2012). Action-based language: A theory of language acquisition, production, and comprehension. Cortex, 48(7), 905–922.
Glenberg, A. M., & Kaschak, M. P. (2002). Grounding language in action. Psychonomic Bulletin & Review, 9, 558–565.
Huettig, F., Rommers, J., & Meyer, A. S. (2011). Using the visual world paradigm to study language processing: A review and critical evaluation. Acta Psychologica, 137, 151–171.
Kaschak, M. P., Madden, C. J., Therriault, D. J., Yaxley, R. H., Aveyard, M. E., Blanchard, A. A., & Zwaan, R. A. (2005). Perception of motion affects language processing. Cognition, 94, B79–B89.
Kousta, S. T., Vigliocco, G., Vinson, D. P., Andrews, M., & Del Campo, E. (2011). The representation of abstract words: Why emotion matters. Journal of Experimental Psychology: General, 140(1), 14–34.
Lindsay, S., Scheepers, C., & Kamide, Y. (2013). To dash or to dawdle: Verb-associated speed of motion influences eye movements during spoken sentence comprehension. PLoS ONE, 8(6), e67187.
Lingnau, A., Ashida, H., Wall, M. B., & Smith, A. T. (2009). Speed encoding in human visual cortex revealed by fMRI adaptation. Journal of Vision, 9(13), 1–14.
Louwerse, M. M., & Jeuniaux, P. (2010). The linguistic and embodied nature of conceptual processing. Cognition, 114, 96–104.
Meteyard, L., Bahrami, B., Vigliocco, G., & Zokaei, N. (2008). Now you see it: Visual motion interferes with lexical decision on motion words. Current Biology, 18, R732–R733.
Mishra, R., & Marmolejo-Ramos, F. (2010). On the mental representations originating during the interaction between language and vision. Cognitive Processing, 11(4), 295–305.
Richardson, D., & Matlock, T. (2007). The integration of figurative language and static depictions: An eye movement study of fictive motion. Cognition, 102(1), 129–138.
Sanford, A. J. (2008). Defining embodiment in understanding. In M. De Vega, A. M. Glenberg, & A. Graesser (Eds.), Symbols and embodiment: Debates in meaning and cognition. Oxford, England: Oxford University Press.
Spivey, M. J., & Geng, J. J. (2000). Oculomotor mechanisms activated by imagery and memory: Eye movements to absent objects. Psychological Research, 65, 235–241.
Stanfield, R. A., & Zwaan, R. A. (2001). The effect of implied orientation derived from verbal context on picture recognition. Psychological Science, 12, 153–156.
Wilson, M. (2002). Six views of embodied cognition. Psychonomic Bulletin & Review, 9(4), 625–636.
Zwaan, R. A., Madden, C. J., Yaxley, R. H., & Aveyard, M. E. (2004). Moving words: Dynamic representations in language comprehension. Cognitive Science, 28, 611–619.
Zwaan, R. A., Stanfield, R. A., & Yaxley, R. H. (2002). Language comprehenders mentally represent the shapes of objects. Psychological Science, 13, 168–171.
Zwaan, R. A., & Taylor, L. J. (2006). Seeing, acting, understanding: Motor resonance in language comprehension. Journal of Experimental Psychology: General, 135, 1–11.
