Spatial context learning
1
Makovski & Jiang
[February 2011; In press, Quarterly Journal of Experimental Psychology]
Investigating the role of response in spatial context learning

Tal Makovski
Yuhong V. Jiang
Department of Psychology and Center for Cognitive Sciences, University of Minnesota
Keywords: contextual cueing, perception and action, visual search, touch response, eye movement
Running title: Spatial context learning
Send correspondence to:
Tal Makovski
Department of Psychology, University of Minnesota
N218 Elliott Hall, Minneapolis, MN 55455
Email: [email protected]
Tel: 612-624-9483; Fax: 612-626-2079

Acknowledgements
This study was supported in part by NIH 071788. We thank Khena Swallow and Ming Bao for help with eye tracking; Ameante Lacoste, Birgit Fink, and Sarah Rudek for comments; and Eric Bressler, Jacqueline Caston, and Jen Decker for help with data collection. Correspondence should be sent to Tal Makovski, N218 Elliott Hall, Minneapolis, MN 55455. Email: [email protected].
Abstract
Recent research has shown that simple motor actions, such as pointing or grasping, can modulate the way we perceive and attend to our visual environment. Here we examine the role of action in spatial context learning. Previous studies using keyboard responses have revealed that people are faster at locating a target on repeated visual search displays (“contextual cueing”). However, this learning appears to depend on the task and response requirements. In Experiment 1, participants searched for a T-target among L-distractors, and responded either by pressing a key or by touching the screen. Comparable contextual cueing was found in both response modes. Moreover, learning transferred between keyboard and touch screen responses. Experiment 2 showed that learning occurred even for repeated displays that required no response, and this learning was as strong as learning for displays that required a response. Learning on no-response trials cannot be accounted for by oculomotor responses, as learning was observed when eye movements were discouraged (Experiment 3). We suggest that spatial context learning is abstracted from motor actions.

Embedded (or “grounded”) cognition theories postulate that cognition should not be studied independent of the body and environment that accommodates it (cf., Barsalou, 2008). Instead, body states, motor actions, and their interactions with the environment all take part in basic cognitive processes. This idea of a unified cognitive system, where an action is more than the mere outcome of cognition (Song & Nakayama, 2009), also underlies the vision-for-action hypothesis (Milner & Goodale, 2006). According to this hypothesis, the dorsal visual pathway, commonly described as carrying “where” information, should be described as carrying “how” information, as objects are represented in relation to the observer and to the actions directed to them (Goodale & Milner, 1992).
These theories suggest that vision is determined not only by the visual input, but also by the motor output directed to it. Along a similar theoretical vein, the premotor theory of attention (Rizzolatti, Riggio, Dascola, & Umilta, 1987) states that covert attention is driven by the need to overtly direct a motor action to an object. In his review on attention, Allport suggested that perceptual filtering of distractors is not the main function of attention. Rather, people selectively attend to targets to ensure that a successful response is made to them (Allport, 1989).

That perception and attention are affected by whether or not an action is carried out raises the possibility that learning and memory may be modulated by motor actions as well. Indeed, a few findings indicate that this may be the case. In research on the production effect, memory for words spoken out loud is enhanced relative to memory for words read silently (MacLeod, Gopie, Hourihan, Neary & Ozubko, 2010). In addition, it has been shown that negative adjectives were remembered better when participants shook their heads during encoding, and positive adjectives were remembered better when participants nodded their heads during encoding (Förster & Strack, 1996). Even a simple motor action can influence cognition, as shapes presented after a “go” response were considered more pleasant than shapes presented after a “no-go” response (Buttaccio & Hahn, 2010). In the present study we further examine the interaction between vision and action. In particular, we test how overt motor action made toward a search target modulates contextual learning between the target and its surrounding spatial context (Chun & Jiang, 1998).
The role of action in visual learning
People are highly efficient at extracting regularities embedded in the environment. Learning is revealed for many aspects of the environment, including repeated spatial layout (Chun & Jiang, 1998), motion trajectories (Chun & Jiang, 1999; Makovski, Vázquez, & Jiang, 2008), target locations (Miller, 1988; Umemoto, Scolari, Vogel, & Awh, 2010), visual or movement sequences (Nissen & Bullemer, 1987; Swallow & Zacks, 2008), and object-to-object associations (Fiser & Aslin, 2002; Turk-Browne, Junge, & Scholl, 2005). For at least some types of visual statistical learning, making an overt motor response is not necessary. For example, consistent association between objects can be learned when observers are simply asked to look at a series of shapes (Fiser & Aslin, 2002). In other types of learning, however, motor response appears to be an integral component, as shown in the Serial Reaction Time task (SRT, Nissen & Bullemer, 1987). In this task, participants press one of several keys to report the corresponding position of a stimulus on the screen. Unbeknown to them, the positions of the stimulus over several trials sometimes form a repeated sequence (e.g., positions 12432431…12432431…). Participants usually respond faster to repeated sequences than to unrepeated ones, even in situations where they are unaware of the repetition (Nissen & Bullemer, 1987). Although participants can learn the motor sequence of finger movements independent of the perceptual sequence of stimuli on the screen (Willingham, 1999), or the perceptual sequence on the screen independent of the motor sequence (Remillard, 2003), covarying the motor and perceptual sequences may facilitate learning of complex sequences (Dennis, Howard & Howard, 2006). Ziessler and Nattkemper (2001) suggest that response-to-stimulus association governs learning in the SRT task.
Specifically, in line with the notion of response-effect learning (e.g., Hommel, Müsseler, Aschersleben & Prinz, 2001), participants learn to predict the upcoming stimulus based on their current response. Unlike the SRT task, spatial context learning in visual search does not involve predictive association between a repeated display and a motor response, or between a motor response and the next display. In this task, participants are asked to press a key to report the orientation of a T-target embedded among L-distractors. Unbeknown to them, some search displays are repeated in the experiment, presenting consistent associations between the spatial layout of the search display and the target’s location (Chun & Jiang, 1998). However, the target’s orientation, and hence the associated key press, is randomized, preventing participants from associating a repeated display with a specific response. Nonetheless, participants are often faster searching through repeated displays than through new ones, revealing “contextual cueing” (Brady & Chun, 2007; Chun & Jiang, 1998; Kunar, Flusberg, Horowitz & Wolfe, 2007).

Although learning in contextual cueing cannot be attributed to the acquisition of consistent stimulus-response association, there is indirect evidence that this type of learning is not fully abstracted from the task and responses that participants made. First, when the same spatial displays are used for either a change-detection or a visual search task, learning does not fully transfer between the two tasks (Jiang & Song, 2005). Second, whereas young children (6-13 years old) fail to show contextual cueing when making a keyboard response to the target’s orientation (Vaidya, Huger, Howard, & Howard, 2007), they do show robust learning when they simply touch the target’s location on a touch screen (Dixon, Zelazo, & De Rosa, 2010).
Although these two studies differ in several aspects (e.g., whether the search stimuli were fish or letters), it is possible that some types of motor response promote greater learning than other types. Specifically, touching a target on the screen involves a task-relevant, viewer-centered spatial component that is absent in a keypress task. Moreover, touching a screen may facilitate learning
because vision is superior near the hands (Abrams, Davoli, Du, Knapp & Paull, 2008). Therefore visual learning may be enhanced when the hands are positioned near the screen rather than near the keyboard. Third, the representation of spatial context is viewpoint specific, as contextual cueing acquired from a virtual 3-D search display does not transfer after a 30° change in viewpoint (Chua & Chun, 2003). The viewpoint-dependent characteristic suggests that what is learned in contextual cueing may depend on the viewer’s perspective and goals toward the target. Finally, in adults, whereas successful detection of a target leads to learning of the search display, learning is absent when search is interrupted before the target is detected and a response is made (Shen & Jiang, 2006).

The studies reviewed above suggest that statistical regularity alone is insufficient to fully account for what is learned in contextual cueing. It is possible that making an action toward the target is an integral part of the learning. Furthermore, making a detection response, as opposed to no response, constitutes a change in the participant’s current activity, and this change may trigger a re-checking or updating process that enhances the learning and memory of concurrently presented visual information (Swallow & Jiang, 2010, 2011; Swallow, Zacks, & Abrams, 2009). The main purpose of this study is to directly evaluate the role of motor actions in spatial context learning. We test whether learning is specific to the kind of motor responses made to the targets (keyboard vs. touch-screen response), and whether withholding a response to targets hinders learning.

Experiment 1
Experiment 1 investigated the specificity of contextual cueing to different response modes toward the search target. The ideomotor principle (cf. Stock & Stock, 2004) states that movements are associated with their contingent sensory effects.
Consistent with this principle, Hommel and colleagues proposed that an action is automatically associated with all active event codes to form an associative structure (action-effect binding; e.g., Elsner & Hommel, 2001; Hommel, Alonso & Fuentes, 2006; Hommel et al., 2001). If action codes are integrated with perceptual representation in spatial context learning, then contextual cueing acquired in one response mode may not readily transfer to tasks involving a different response mode. Alternatively, contextual cueing may be largely abstracted from the action made to the targets. If this is the case, then it should transfer between different response modes.

To address this issue, in Experiment 1 we first tested whether spatial context learning was affected by response modes. During training, all participants completed a touch-screen version and a keyboard version of the standard contextual cueing task in separate sessions. We examined whether contextual cueing was greater in the touch task than in the keypress task. After the training session, we tested whether learning acquired from touch transferred to keypress and vice versa.
Method
Participants: All participants were students from the University of Minnesota. They were 18 to 35 years old and had normal color vision and normal or corrected-to-normal visual acuity. Twelve participants (mean age 19.4 years; 5 males) completed Experiment 1.
Equipment: Participants were tested individually in a normally lit room and sat unrestricted at about 55 cm from a 15” ELO touch screen monitor (resolution: 1024x768 pixels; refresh rate: 75Hz). The experiment was programmed with the Psychophysics Toolbox (Brainard, 1997; Pelli, 1997) implemented in MATLAB (www.mathworks.com).

Stimuli: The search items were a white T and white Ls (1.75° X 1.75°) presented against a black background. Participants searched for a T rotated 90° to the right or to the left. The orientation of the T was randomly selected provided that there were equal numbers of left and right Ts in each condition of each block. Eleven L-shaped stimuli served as distractors. They were rotated in one of the four cardinal orientations (the offset at the junction of the two L segments was 0.175°). All items were positioned in an imaginary 8 by 6 matrix (28° X 21°) and the position of each item was slightly off the center of a cell to reduce co-linearity between items.

Touch task: In the touch task, participants initiated each trial by touching a small white fixation point (0.42° X 0.42°) at the bottom of the screen. Then the search display appeared and remained until participants touched the target’s position (Figure 1 Left). Responses falling within an imaginary square (1.05° X 1.05°) surrounding the center of the target were considered correct. Each incorrect response was followed by a red minus sign for 2000 ms.

Press task: In the keypress task, each trial started with a small white fixation point (0.42° X 0.42°) appearing at the bottom of the screen for 500 ms, followed by the search array (Figure 1 Left). To make the keypress task similar to the touch task in terms of task requirement (detecting the T), participants were asked to press the spacebar as soon as they detected the target. The time from the search display onset to the spacebar response was taken as the search RT.
To make sure that participants had accurately located the target, we erased the display upon the spacebar response and asked participants to report whether the T was rotated to the left (by pressing the ‘N’ key) or to the right (by pressing the ‘M’ key). The second response was unspeeded and provided us with an accuracy measure. Each incorrect response was followed by a red minus sign for 2000 ms.
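For concreteness, the touch-scoring rule described above (a touch counts as correct only if it falls within a 1.05° square centred on the target) can be sketched in a few lines of Python. This is only an illustration; the function name and the degrees-of-visual-angle coordinate convention are our own, not taken from the experiment code.

```python
# Illustrative sketch of the touch-scoring rule: a touch is correct if it
# lands inside a 1.05 deg x 1.05 deg square centred on the target.
# Coordinates are in degrees of visual angle; all names are hypothetical.

TOLERANCE_DEG = 1.05  # side length of the acceptance square

def touch_is_correct(touch_x, touch_y, target_x, target_y,
                     tolerance=TOLERANCE_DEG):
    """Return True if the touch falls within the square acceptance
    region surrounding the target's centre."""
    half = tolerance / 2.0
    return (abs(touch_x - target_x) <= half and
            abs(touch_y - target_y) <= half)
```

Under this rule, a touch landing 0.4° from the target centre along each axis would be scored correct, while one 0.6° away along either axis would not.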
[Figure 1 appears here: panels labeled “Training: press (touch)” and “Testing: touch (press)”, showing old and new search displays across Blocks N, N+1, … M.]
Figure 1. Left: A schematic illustration of trial sequences used in the press and touch tasks of Experiment 1. Right: A schematic illustration of old and new conditions. Items are not drawn to scale.
Design: We used a within-subject design for all experimental factors. All participants completed two sessions of the experiment separated by one week. Each session started with a training phase in one task (e.g., touch), followed by a testing phase in the other task (e.g., press). Training and testing tasks for session 1 were reversed in session 2. The order of the two sessions (touch training first or press training first) was randomized across participants. The spatial configurations used for the two sessions were unrelated. Participants were familiarized with the tasks and procedure prior to each session with 10 practice trials in each task. This was followed by the training phase that consisted of 576 trials, divided into 24 blocks of 24 trials each. Half of the trials in each block involved old displays: these were 12 unique search displays that were repeated across blocks, once per block. The old displays were generated at the beginning of each session and were not the same across the two sessions. The other half of the trials involved new displays: they were 12 unique displays newly generated at the beginning of each block, with the constraint that the search target’s location was repeated across blocks (see Chun & Jiang, 1998). To avoid biases toward searching a particular quadrant, the target’s location was evenly distributed across the four visual quadrants in both the new and old conditions.

Within each session, after completing the training phase in one task (e.g., touch), participants were tested in the other task (e.g., keypress) using old displays shown earlier in the session. They were familiarized with 10 practice trials in the new task before the testing phase began. The testing phase consisted of four 24-trial blocks. Half of the trials within each block were the old displays used in the training phase and half were newly generated (new) displays.
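The display-generation scheme just described (12 fixed “old” layouts repeated every block; 12 “new” layouts regenerated each block with fixed, quadrant-balanced target locations) can be sketched as follows. This is a minimal illustration under assumed conventions (an 8 × 6 grid of integer cells, names of our choosing), not the authors' actual code.

```python
import random

GRID_W, GRID_H = 8, 6   # the imaginary 8 x 6 placement matrix
N_ITEMS = 12            # 1 target + 11 distractors

def quadrant_balanced_targets(n=12):
    """Pick n target cells, evenly spread over the four display quadrants."""
    quads = [(0, 0), (GRID_W // 2, 0), (0, GRID_H // 2),
             (GRID_W // 2, GRID_H // 2)]
    targets = []
    for qx, qy in quads:
        cells = [(qx + c, qy + r)
                 for c in range(GRID_W // 2) for r in range(GRID_H // 2)]
        targets += random.sample(cells, n // 4)
    return targets

def random_layout(target_loc):
    """Place the target at target_loc and distractors at random other cells."""
    cells = [(c, r) for c in range(GRID_W) for r in range(GRID_H)]
    cells.remove(target_loc)
    return {"target": target_loc,
            "distractors": random.sample(cells, N_ITEMS - 1)}

# Old displays: generated once per session, then repeated every block.
old_displays = [random_layout(t) for t in quadrant_balanced_targets()]

# New displays: fresh distractor layouts each block, target locations fixed.
new_targets = quadrant_balanced_targets()
def new_displays_for_block():
    return [random_layout(t) for t in new_targets]
```

The key design point captured here is that old and new displays are matched on target locations, so any old-display advantage must come from the repeated distractor context rather than from target-location learning.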
The test phase enabled us to assess whether there was any transfer of learning across touch and keypress tasks. A comparison between the amount of learning in the test phase and in the training phase allowed us to examine whether learning was significantly weakened by a change in task. Figure 1 (right) depicts a schematic illustration of display conditions.

Explicit recognition test: Memory of the search displays was tested at the end of session 2 in a surprise recognition test. Twelve new displays were randomly intermixed with 12 old displays from session 2. These 24 displays were presented one at a time for participants to make an unspeeded “old/new” response.

Results
Accuracy
Accuracy was relatively high in both training and testing phases of the experiment and for both touch and keypress tasks (Table 1). Accuracy was not affected by any of the experimental factors or their interactions (all p’s > .30) except for the main effect of task: touch responses were more accurate than keypress responses, F(1, 11) = 13.51, p < .01, ηp2 = .51. This difference was likely due to the fact that participants could touch the T while it was on the screen, but they had to report the T’s orientation after the display was cleared in the keypress task.

Table 1: Percent correct as a function of task (touch, press), experiment phase (training, testing) and condition (old, new). Standard errors of the mean are presented in parentheses.

        Trained with touch                    Trained with keypress
        Training (touch)   Testing (press)    Training (press)   Testing (touch)
Old     98.5 (0.5)         92.5 (2.6)         95.2 (1.9)         99.5 (0.3)
New     98.8 (0.3)         92.4 (2.9)         94.5 (1.9)         99.1 (0.5)
RT
In the RT analysis, we excluded incorrect trials as well as trials exceeding three standard deviations of each participant’s mean of each condition. This RT outlier cutoff removed 1.5% of the correct trials.

1. Training phase
To reduce statistical noise, the 24 training blocks were binned into 6 epochs (Figure 2 Left; 1 epoch = 4 blocks). A repeated-measures ANOVA on task (touch or press), condition (old or new), and epoch (1 to 6) revealed a significant main effect of epoch, F(5, 55) = 29.09, p < .01, ηp2 = .73, reflecting faster RTs as the experiment progressed. Touch responses were numerically but not statistically slower than keypress responses, F(1, 11) = 1.81, p > .20. Contextual cueing was revealed by the significantly faster responses in the old than new conditions, F(1, 11) = 5.4, p < .05, ηp2 = .33, and this effect did not interact with task, F < 1. There was no difference between the old and new conditions in epoch 1 (F < 1), but old trials were responded to faster than new trials in epochs 2-6, F(1, 11) = 7.9, p < .02, ηp2 = .42. The interaction between epoch and condition, however, failed to reach significance, F(5, 55) = 2.05, p = .09, ηp2 = .16. None of the other interactions were significant (all F’s < 1).

Whereas the old-no-response and old-response conditions did not differ significantly (p’s > .72), they were both significantly faster than the new condition, t(15) = 2.30, p < .05 for new vs. old-response; t(15) = 2.60, p < .05 for new vs. old-no-response. This finding suggests that the omission of a target detection response during training was not detrimental to spatial context learning.
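The RT preprocessing used throughout these analyses (per-participant, per-condition 3-SD outlier exclusion, and binning the 24 training blocks into 6 epochs of 4 blocks) can be sketched as follows. This is a hedged illustration with assumed names and data layout, not the authors' analysis code.

```python
import numpy as np

def exclude_rt_outliers(rts, n_sd=3.0):
    """Keep correct-trial RTs within n_sd standard deviations of the
    mean of a single participant-by-condition cell."""
    rts = np.asarray(rts, dtype=float)
    mu, sd = rts.mean(), rts.std()
    return rts[np.abs(rts - mu) <= n_sd * sd]

def block_to_epoch(block, blocks_per_epoch=4):
    """Map a 1-based block number (1-24) to a 1-based epoch (1-6)."""
    return (block - 1) // blocks_per_epoch + 1
```

Applied cell by cell, a rule of this kind removed 1.5% of correct trials in Experiment 1.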
Figure 4: Left: Mean RT for hit responses (top) and overall proportion correct (bottom) in the training phase of Experiment 2. Right: Mean RT in the testing phase of Experiment 2. Error bars show ±1 S.E.

Explicit recognition
In the recognition task, participants committed high false alarm rates for the new (59%) and novel (54.3%) conditions, which were similar to the hit rates for the old-no-response (53.1%) and the old-response conditions (57%), F < 1. These findings confirm that learning was largely implicit for old displays that received a response and for those that received no response.

Discussion
Experiment 2 showed that contextual cueing did not depend on making an overt motor response to the search targets. Search displays that did not receive a motor response were learned to the same extent as search displays that received a response. We found no evidence that refraining from responding to the target reduced learning or produced inhibition. This finding provides further evidence that spatial context learning is abstracted from responses. Not only is it invariant to two different response modes to the target (touch or press, Experiment 1), but it is also comparable for displays involving a detection response and ones where a response was omitted. The advantages afforded by target detection observed in previous behavioral and neuroscience studies (e.g., Aston-Jones et al., 1994; Swallow & Jiang, 2010, 2011) are likely triggered by detecting the target, rather than making an overt motor response to it.
Experiment 3
The results have clearly shown that neither the need for, nor the nature of, a manual response has an effect on spatial context learning. Before we can conclude that overt motor action is not critical for such learning, we need to consider the role of oculomotor responses. Previous research has shown that spatial context learning not only results in faster manual responses at detecting the target, but also in faster eye movements towards the target (Peterson & Kramer, 2001). Even though contextual cueing was observed when the search display was too brief for eye movements (Chun & Jiang, 1998), this finding was observed when subjects made a manual detection response to the target. Experiment 2 had revealed contextual cueing in the absence of a manual detection response; however, subjects may have made saccadic eye movements toward the target [footnote 3]. To address whether contextual cueing depends on any type of motor response, Experiment 3 repeated Experiment 2’s design under conditions where eye movements were prevented or discouraged. In Experiment 3a participants were discouraged from making any eye movements, and compliance with this requirement was checked with the use of an eye tracker. In Experiment 3b the search display was briefly presented for 150 ms, a duration long enough for shifting covert attention but shorter than a typical saccade latency (Nakayama & Mackeben, 1989). If overt motor response is a critical component of spatial context learning, then in the absence of eye movements or manual responses, contextual cueing should be eliminated.
Footnote 3: We thank Dr. Dominic Dwyer and an anonymous reviewer for raising this possibility.
Experiment 3a
Method
Participants: Performing a search task for a long time without moving their eyes turned out to be difficult for naïve subjects. Only eight (out of 14) participants (mean age 23.8 years; 6 males) were able to complete Experiment 3a.

Equipment, stimuli, procedure and design: The main novelty of this experiment was the use of an ISCAN ETL-300LR eye tracker to ensure fixation. A chinrest was used to stabilize head position at a viewing distance of 75 cm (thereby the perceived size of the stimuli was reduced by ~17%). Participants were told to fixate their gaze at the white fixation point (0.35° X 0.35°) that remained on the screen throughout a trial. A trial would not start until participants had fixated within a 1.04° X 1.04° imaginary box surrounding the center for at least 60% of the preceding 500 ms sliding time window. The same criterion – fixating within the 1.04° fixation box for at least 60% of a sliding 500 ms time window – was used to indicate whether fixation had been broken during a trial, and was based on the notion that subjects would not be able to execute a saccade towards the target without being detected. Violating this stringent criterion would trigger a warning message (“you moved your eyes in the last trial!”) 500 ms after the accuracy feedback. Compliance with the eye fixation requirement was checked during both training and testing phases, and a 15-30 minute practice session for maintaining fixation was given prior to the start of the experiment. We did not administer the explicit recognition test. In all other respects the experiment was identical to Experiment 2.

Results
Training phase
Except for one subject who broke fixation on 65% of the trials in the training phase, the average fixation-break rate was 28.9% (S.E. = 4.8). Fixation breaks were more frequent on no-response trials (38.4%) than on response trials (28.4%), probably because these trials lasted longer, providing participants more opportunities to break fixation.
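The sliding-window fixation criterion described above (at least 60% of gaze samples inside a 1.04° box over any 500 ms window) can be sketched as follows. The gaze sampling rate and all names here are illustrative assumptions, not ISCAN specifics.

```python
# Sketch of the fixation-break criterion: fixation counts as broken if,
# in any sliding 500 ms window of gaze samples, fewer than 60% of the
# samples fall inside a 1.04 deg box centred on fixation.

BOX_DEG = 1.04    # side of the fixation box, in degrees
CRITERION = 0.60  # minimum fraction of in-box samples per window

def in_box(x, y, half=BOX_DEG / 2.0):
    """Gaze sample (x, y) in degrees, relative to fixation at (0, 0)."""
    return abs(x) <= half and abs(y) <= half

def fixation_broken(samples, sample_rate_hz=120, window_ms=500,
                    criterion=CRITERION):
    """Return True if any sliding window fails the in-box criterion."""
    win = max(1, int(sample_rate_hz * window_ms / 1000))
    flags = [in_box(x, y) for x, y in samples]
    for start in range(len(flags) - win + 1):
        if sum(flags[start:start + win]) / win < criterion:
            return True
    return False
```

Because a saccade to a search item takes the eyes well outside a 1.04° box for many consecutive samples, a window-based rule of this kind would flag it reliably, which is the logic behind the criterion.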
The average false alarm rate was 11.3% (S.E. = 2.5) and participants correctly executed a response on 79.5% (S.E. = 2.0) of the response trials, with a mean RT of 792 ms (S.E. = 21.5).

Testing phase: Accuracy
Accuracy in the testing phase was not affected by display type: 94.3% for the old-no-response condition, 92.8% for the old-response condition, and 93.6% for the new condition, F < 1.

Testing phase: RT
Whereas the old-no-response and old-response conditions did not differ significantly (p’s > .79), they were both significantly faster than the new condition, p’s < .05. Contextual cueing was unrelated to fixation breaks (p’s > .33 for no-response displays, and r(126) = -.03, p > .7, for response displays). A recent conference report also demonstrated contextual cueing in participants whose eye movements were minimized with the use of an eye tracker (Le Dantec & Seitz, 2010).
Figure 5: Experiment 3a’s (top) and 3b’s (bottom) mean RT in the testing phase. Error bars show ±1 S.E.
Experiment 3b
Experiment 3a replicated the findings of Experiment 2 under conditions where eye movements were discouraged. Specifically, even when oculomotor responses were minimized, similar learning effects were found regardless of whether the task required a motor response. However, because of the high rate of fixation breaks we deemed it necessary to seek converging evidence for these findings. In Experiment 3b we repeated the previous design but restricted the presentation duration of the search display to 150 ms. Because this duration is shorter than saccade latency, subjects could not have moved their eyes to the target (Chun & Jiang, 1998).

Method
Participants: Sixteen participants (mean age 21.0 years; 2 males) completed Experiment 3b.
Equipment, stimuli, procedure and design: The main change in Experiment 3b was to reduce the presentation duration of the training phase from 500 ms to 150 ms. A pilot study showed that the target-detection task used in Experiment 2 was too difficult when the display was presented briefly. We therefore simplified the search task by lowering the set size from 12 to 8. We also increased the difference between the target T and the distractor Ls (the items were now elongated to 1.75° X 1.28°) and simplified the T and Ls (the T was rotated by 15° to the right or to the left, while the L-shaped distractors were randomly rotated in one of the four cardinal orientations). Finally, we asked participants to fixate their gaze at the white fixation point (0.42° X 0.42°) that remained on the screen throughout a trial. We did not use an eye tracker, nor did we administer the explicit recognition test. In all other respects the experiment was identical to Experiment 2.

Results
Training phase
The average false alarm rate on no-response trials was 8.7% (S.E. = 1.1). Accuracy on response trials was 89.1% (S.E. = 2.1) and mean RT was 624 ms (S.E. = 18.5).

Testing phase: Accuracy
Accuracy in the testing phase was not affected by display type: 95.1% for the old-no-response condition, 95.3% for the old-response condition, and 95.4% for the new condition, F < 1. In the RT analysis, we excluded incorrect trials as well as trials exceeding three standard deviations of each participant’s mean of each condition (1.7% of the correct trials).

Testing phase: RT
Figure 5b shows RT in the testing phase of the experiment. Planned contrasts showed that whereas the old-no-response and old-response conditions were not significantly different (p > .82), they were both significantly faster than the new condition, t(15) = 2.18, p < .05 for new vs. old-response; t(15) = 2.94, p < .01 for new vs. old-no-response.
Direct comparison between these data and Experiment 2’s findings revealed no effects of presentation duration, all F’s < 1. Although the other main effects were significant, F’s > 14, p’s < .01, ηp2 > .43, they did not interact, F