Psychological Research DOI 10.1007/s00426-011-0402-z
ORIGINAL ARTICLE
Response-mode shifts during sequence learning of macaque monkeys
Dennis Rünger · F. Gregory Ashby · Nathalie Picard · Peter L. Strick
Received: 29 August 2011 / Accepted: 29 November 2011 © Springer-Verlag 2011
Abstract Incidental sequence learning has been conceptualized as involving a shift from stimulus-based to plan-based performance (e.g., Tubau et al. in Journal of Experimental Psychology: General 136:43–63, 2007). We analyzed the response time (RT) data of two macaque monkeys that were trained for thousands of trials on a sequential reaching task in a study by Matsuzaka et al. in Journal of Neurophysiology 97, 1819–1832 (2007). The animals learned to respond predictively to a repeating 3-element sequence. During a transitional period, RT distributions were bimodal, indicating that the animals alternated between two processing modes. An analysis of trial-to-trial mode-shifting probabilities provided preliminary evidence for a strategic process.
The analysis of response-time data in this article is based on behavioral data reported by Matsuzaka, Picard, and Strick (J Neurophysiol 97:1819–1832, 2007).
D. Rünger (corresponding author) · F. G. Ashby, Department of Psychological and Brain Sciences, University of California, Santa Barbara, Santa Barbara, CA 93106-9660, USA
N. Picard, Department of Neurobiology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261, USA
P. L. Strick, Department of Neurobiology, Pittsburgh Veterans Affairs Medical Center, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261, USA
Introduction
Learning a perceptual-motor skill often begins with a set of verbal instructions that the novice then tries to translate into a coordinated series of movements. In many other instances, sequential regularities are learned incidentally. For example, a proficient tennis player becomes, over time, increasingly capable of reading an opponent’s serve, even when she makes no conscious attempt to analyze the other player’s strokes. In fact, she may not even be aware of the sequence of cues (e.g., the direction of the ball toss, the position of the racquet head) that she uses to anticipate the ball’s speed and trajectory.
This article tests some recent theoretical ideas about how action sequences are acquired in the absence of a conscious intention to learn a sequential regularity. Several authors have proposed that learning includes a shift from stimulus-based to internally generated, or plan-based, sequence production (Hoffmann & Koch, 1997; Koch, 2007; Tubau, Hommel, & López-Moliner, 2007; Tubau & López-Moliner, 2004). When an individual performs a task that is structured according to a sequential regularity, initial responses are stimulus-triggered, akin to a “prepared reflex” (Hommel, 2000). Over time, response control shifts from the stimulus environment to some internal representation of the sequential regularity. Specifically, in advanced stages of sequence learning, behavior is thought to be controlled by motor programs (Keele, 1968; Hoffmann & Koch, 1997), motor chunks (Verwey, 1999, 2003), or action plans (Tubau et al., 2007).
Support for this view of incidental sequence learning comes from studies with the serial reaction time (SRT) task (Nissen & Bullemer, 1987). In this task, participants monitor several screen locations for the appearance of a target stimulus. Upon presentation of the target, they quickly push the response button that is assigned to the target’s current location. The key feature of the task is that target locations on consecutive trials follow a repeating pattern. Participants learn about this pattern even when task instructions make no mention of an underlying regularity. Sequence learning can be demonstrated, for example, by contrasting participants’ RTs to systematic and randomly determined target locations. RT savings on predictable trials indicate that participants were able to anticipate upcoming trial events.
In previous studies with the SRT task, a shift from stimulus-based to plan-based responding was demonstrated indirectly by showing that learning immunized performance against experimental manipulations of stimulus–response (S-R) compatibility. The rationale is that as responses come to be controlled by internal sequence representations, the stimulus characteristics of the task are rendered irrelevant, and RT impairments due to incompatible S-R mappings should disappear. For example, some of Hoffmann and Koch’s (1997) participants responded to target locations with spatially compatible button presses, while others made the same responses to centrally presented symbols. Initially, the reduced S-R compatibility in the symbolic condition caused substantially higher RTs. However, at the end of training, participants who were subsequently able to report the sequence responded nearly as quickly to symbols as to locations. In a conceptually similar experiment by Tubau and López-Moliner (2004), participants made left–right responses to symbols that were presented randomly at left and right screen locations. Responses were determined by the identity of the symbols, which were presented in a repeating sequence. The authors observed the well-documented Simon effect: RTs were elevated when the irrelevant location of the symbol was incompatible with the spatial location of the required response (Simon & Small, 1969).
However, the Simon effect dissipated in the final training blocks for participants who subsequently reported the systematic response sequence correctly. Koch (2007) replicated this finding with four response alternatives and four stimulus locations. Finally, Haider, Eichler, and Lange (2010) found that the Stroop congruency effect (Stroop, 1935) vanished after participants became consciously aware of the regular response sequence that was built into the task.
The aim of the present research is to investigate the transition from stimulus-triggered to plan-based responding. In particular, we focused on the question of whether this transition occurs discretely, that is, as a shift between alternative processing modes, or gradually, through a parallel race process. The distinction between mode shifting and parallel processing was elaborated in Verwey’s (2003) framework of sequential-action control. In line with a growing body of empirical evidence, Verwey assumed that sequence information can be represented at multiple levels, for example, as a sequence of locations in perceptual space or as a series of movements in kinematic space (cf. Hikosaka, Nakamura, Sakai, & Nakahara, 2002). One possibility is that these representations function as independent processors that, on each trial, enter into a race for action control, with the fastest processor determining the response (cf. Logan, 1988). Alternatively, a general-purpose processor (GPP) could select a representation for action control and thus lock the system into a specific processing mode for a variable number of trials. For example, the GPP might shift from executing S-R rules to producing responses on the basis of a spatial memory representation.
Verwey (2003) argued that these two alternatives, parallel processing and mode change, can be distinguished on the basis of RT distributions. It has been shown that the distribution of the minima from two or more unimodal distributions is itself unimodal (Townsend & Ashby, 1983). Hence, parallel processing yields unimodal RT distributions. In contrast, shifts between mutually exclusive processing modes lead to bimodal or multimodal distributions, provided there is sufficient distance between the distributions associated with each mode (see Verwey, 2003, for details; cf. Rickard, 1999).
Verwey (2003) tested these predictions experimentally by training participants on two discrete 6-element sequences in a cued 8-choice SRT task. In a subsequent test phase, participants performed three to six trials with random sequences before they were transitioned to one of the familiar sequences. Verwey reasoned that processing of the random sequences would be purely stimulus-driven and that responses to a familiar sequence would be primarily plan-based. Thus, the switch from stimulus-based to plan-based processing must occur during the first few trials after the initial presentation of the familiar sequence.
To test whether this transition was discrete or the result of a parallel race process, Verwey asked whether the observed RT distributions for the first few trials with the familiar sequences were multimodal. The evidence favored multimodality, but only weakly. On the one hand, visual inspection of the RT frequency histograms for individual subjects seemed to support multimodality, and Monte Carlo simulations with four independent processors with normally distributed RTs provided a good qualitative fit to the observed RT distributions. On the other hand, no general statistical test of multimodality was attempted, and a plausible bimodal mixture model was rejected. In addition, a mixture of four normal distributions can mimic many qualitatively different probability distributions (e.g., Bishop, 1995), including unimodal but skewed distributions as well as multimodal ones.
It is also important to note that Verwey’s (2003) participants had already learned to produce plan-based responses to familiar sequences before the RT distributions were analyzed. This means that Verwey addressed the question of how transitions occur between previously established processing modes, not how this transition occurs during learning. It is logically possible that such transitions are discrete, even if the first transition between these modes during initial learning is continuous (e.g., as a result of a parallel race).
When incidental learning is studied in human participants, learning is often so fast that the transition from stimulus-triggered to plan-based performance occurs quickly, sometimes within a few experimental trials (cf. Haider & Frensch, 2009; Haider & Rose, 2007; Rose, Haider, & Büchel, 2010; Haider et al., 2010). For this reason, we focused our analysis on an extensive data set reported by Matsuzaka, Picard, and Strick (2007), who trained macaque monkeys on a visually guided reaching task that was structured according to a repeating sequence. As we will show, for the monkeys in this experiment the transition occurred slowly, in many cases over the course of thousands of trials. This slow transition provided a unique opportunity to examine learning-related changes in sequential-action control with rigorous statistical methods.
Another reason why the Matsuzaka et al. data are particularly well suited to studying the transition to plan-based sequence production is that their experimental design enabled the monkeys to respond before the next visual cue was presented. In contrast, the vast majority of sequence learning studies with human participants were designed in such a way that purely predictive responses could not be observed, either because predictive responses were not permitted or because the next cue was presented without a time delay that would allow for a predictive response. In the present context, predictive responses are advantageous for two reasons. First, they constitute direct evidence for plan-based responding.
Second, predictive responses are considerably faster than stimulus-triggered responses. Consequently, RT bimodality is easier to detect if there are discrete shifts between processing modes.
To summarize, we asked how the transition from stimulus-based to plan-based responding occurs during initial sequence learning. To this end, we applied Verwey’s (2003) framework of sequential-action control to the sequence learning data of Matsuzaka et al. (2007). Specifically, if there is a transition period during which performance is characterized by bimodal or multimodal RT distributions, then this would support the hypothesis that the animals switched between discrete processing modes. Alternatively, if skilled sequence production relied on a race between independent processors to trigger the next response, then we should find unimodal RT distributions that shift continuously toward faster RTs as training progresses. Critically, we relied on statistical model comparisons to determine the number of component distributions that contributed to the observed RT distributions. As we will see, the data strongly support a discrete transition between stimulus-based and plan-based responding.
Method
A detailed account of subjects, task, and experimental design is provided in Matsuzaka et al. (2007). We therefore limit our own description to those aspects that are most germane to the current RT analysis.
Subjects and task
Four macaque monkeys were trained on a visually guided reaching task. In this report, we focus on the RT results for two of these monkeys, FN and MA. The task required the animals to monitor five empty rectangles, referenced by the numbers 1–5 from left to right, that were arranged horizontally on a touch-sensitive computer screen. On each trial, one of the rectangles lit up, and the animal responded by touching the target with its right hand. On contact, the target was turned off, and the next target was presented after a delay of 400 ms for systematic target locations, or after 100 ms for random locations. RT was measured as the time difference between stimulus onset and response. Predictive responses were permitted and resulted in negative RTs. For example, if, in the systematic condition, the animal responded 250 ms before the next target was scheduled to appear, the RT for this trial was −250 ms. If a predictive response was correct, the animal advanced to the next target. Thus, for systematic target locations, the task was performed in the absence of visual cues as long as the animal responded correctly within the 400-ms response–stimulus interval (RSI).
Training comprised 380 near-daily sessions for FN and 205 for MA. The animals were first trained on randomly determined target locations until their performance reached asymptotic levels after about 70 sessions. At this point, the continuously repeating sequence 5-3-1 was introduced, and 15 (FN) or 23 (MA) sessions later, the sequence 2-3-4.¹ The animals then alternated between trial blocks with random target locations and blocks with systematic locations. Each block comprised between 500 and 1,000 trials.
The average number of trials per session was approximately 5,350 (SD = 2,350) for FN and 3,650 (SD = 1,150) for MA. By the time the first systematic sequence was introduced, the monkeys received a juice reward for every fourth correct response. This reinforcement rate was maintained until the end of the experiment.
¹ Matsuzaka et al.’s (2007) primary motivation for training the monkeys on two sequences was to obtain evidence for sequence-specific neuronal activity in primary motor cortex after extensive practice.
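The RT convention described under Subjects and task, including negative RTs for predictive responses, can be made concrete with a short sketch. It assumes, as the text states, that the next target is scheduled to appear a fixed delay (400 ms for systematic, 100 ms for random locations) after the touch on the previous target; the function and argument names are hypothetical and are not taken from Matsuzaka et al.’s software.

```python
SYSTEMATIC_DELAY_MS = 400  # delay before the next systematic target
RANDOM_DELAY_MS = 100      # delay before the next random target


def trial_rt(prev_touch_ms: float, touch_ms: float, delay_ms: float) -> float:
    """RT relative to the scheduled onset of the current target.

    Hypothetical helper: the onset is assumed to be the previous touch plus
    the fixed delay, so a touch before that moment (a predictive response)
    yields a negative RT.
    """
    scheduled_onset_ms = prev_touch_ms + delay_ms
    return touch_ms - scheduled_onset_ms


if __name__ == "__main__":
    # Touch 150 ms after the previous touch in the systematic condition,
    # i.e., 250 ms before the next target was due: RT = -250 ms.
    print(trial_rt(prev_touch_ms=10_000, touch_ms=10_150, delay_ms=SYSTEMATIC_DELAY_MS))
```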
Fig. 1 Top row: mean training RTs to Response Locations 2 (squares), 3 (circles), and 4 (triangles) for FN (left panel) and MA (right panel). Bottom row: corresponding RT standard deviations. R2 = 2-response, R3 = 3-response, R4 = 4-response
Results
In this section, we present the RT data of two monkeys, FN and MA, as they performed the continuously repeating response sequence 2-3-4. Unless noted otherwise, the results were qualitatively similar for the two remaining monkeys and for the sequence 5-3-1. The first 50 trials of each block, trials that contained an error, and trials following an error were excluded from the RT analysis. We also discarded trials following a reward because FN’s and MA’s responses on post-reward trials were slowed by 18 ms and 78 ms, respectively. Since the number of trials per session varied considerably both within and between animals, we subdivided training with the 2-3-4 sequence into bins of 1,500 consecutive trials, disregarding session boundaries. Consequently, each bin contained about 500 responses per target location (before exclusions).
Mean RTs and standard deviations
The top row of Fig. 1 displays FN’s and MA’s mean training RTs, plotted separately for the 2-response (R2), the 3-response (R3), and the 4-response (R4). While Bin 35 marked the end of the training period for MA, FN’s training continued until Bin 198. However, we do not present results beyond Bin 45 because FN had already transitioned to plan-based sequence production by the time it reached Bin 45 (see next section). Figure 1 shows that when the animals first practiced the 2-3-4 sequence, mean RTs to the three target locations were very similar in magnitude (Bins 1 and 2). Subsequently, large RT improvements occurred at different points in time for the three response locations. FN showed a large RT drop for R3 in Bin 3 and for R4 in Bins 4/5, but R2 latencies remained high until a drop occurred in Bin 19. Similarly, MA’s latencies for R2 and R3 decreased markedly in Bins 2/3, but for R4, an RT drop of similar size was not apparent until Bin 6. In short, the mean RTs of both animals suggest different learning trajectories for the three response locations.
RT standard deviations (STDs) are plotted separately for R2, R3, and R4 in the bottom panels of Fig. 1. For both monkeys, STDs were at their lowest level in Bin 1. As mean RTs decreased over the first 15 bins, STDs increased.
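The exclusion rules and the binning scheme at the start of the Results section can be summarized in code. This is a minimal sketch under stated assumptions: trials from all sessions with the 2-3-4 sequence are concatenated in chronological order, the `Trial` record and its fields are hypothetical, bins are formed over consecutive trials before exclusions (consistent with about 500 responses per location in a 1,500-trial bin), and bins are numbered from 1 as in Fig. 1.

```python
from collections import defaultdict
from dataclasses import dataclass

BIN_SIZE = 1500           # consecutive trials per bin, ignoring session boundaries
SKIP_FIRST_IN_BLOCK = 50  # trials dropped at the start of each block


@dataclass
class Trial:
    # Hypothetical record; field names are illustrative assumptions.
    trial_in_block: int  # 0-based position of the trial within its block
    target: int          # response location (2, 3, or 4)
    rt_ms: float         # may be negative for predictive responses
    correct: bool
    rewarded: bool       # True if this response earned a reward


def bin_rts(trials):
    """Group RTs by (bin, target) after applying the exclusion rules.

    Excluded: the first 50 trials of each block, error trials, trials that
    follow an error, and trials that follow a reward.
    """
    binned = defaultdict(list)
    for i, trial in enumerate(trials):
        prev = trials[i - 1] if i > 0 else None
        excluded = (
            trial.trial_in_block < SKIP_FIRST_IN_BLOCK
            or not trial.correct
            or (prev is not None and (not prev.correct or prev.rewarded))
        )
        if not excluded:
            bin_number = i // BIN_SIZE + 1  # bins numbered from 1
            binned[(bin_number, trial.target)].append(trial.rt_ms)
    return dict(binned)
```

Because binning uses the position of a trial in the concatenated sequence, an excluded trial still occupies a slot in its bin; this mirrors the statement that bins were defined by consecutive trials regardless of session boundaries.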
Fig. 2 Examples of RT frequency distributions for monkeys FN (left column) and MA (right column). Black curves represent best-fitting ex-Gaussian mixture models (see text for details). R2 = 2-response, R3 = 3-response, R4 = 4-response
More specifically, the large RT decrements described in the previous paragraph closely coincided with surges in RT variability. This finding, that STDs increased while mean RTs decreased, is remarkable, given that prominent theories of automaticity predict that STDs, much like mean RTs, should decrease monotonically as a power function of practice (e.g., Logan, 1988; Rickard, 1997). Mean RTs and STDs did decrease together after Bin 15. For example, as FN’s R4 latencies decreased by about 175 ms from Bin 18 to Bin 21, STDs decreased by 90 ms. Similarly, a 190-ms improvement in MA’s R4 latencies across Bins 15–22 was accompanied by a 60-ms reduction in STD. It will become clear in the next section that the changes in RT variability were directly related to the occurrence of multimodal RT distributions.
RT distributions
Figure 2 shows examples of RT frequency histograms produced by FN and MA at different points in the training phase. Deviations from unimodality are readily apparent in these examples. In order to determine formally the number of individual distributions that contributed to the observed RT distribution in each bin, we fitted an ex-Gaussian probability density function (pdf) and mixture models with two, three, and four ex-Gaussian components to the RT data. The ex-Gaussian pdf is the convolution of a Gaussian (normal) pdf and an exponential pdf. In psychology, it is a popular way of characterizing RT distributions because it can accommodate the positive skew typically found in RT distributions (e.g., Heathcote, Popiel, & Mewhort, 1991; Ratcliff & Murdock, 1976). In the present context, we opted for the ex-Gaussian rather than the Gaussian density because its positive skew made it less likely that our analysis would favor bi- or multimodality over unimodality. For example, a unimodal distribution that is skewed right might be best fit by a mixture of two or more Gaussian pdfs, because a mixture of Gaussians can only accommodate skew by adding extra components. Thus, the ex-Gaussian pdf provided a more conservative test of the notion that sequence learning entails discrete shifts from visually guided to plan-based performance.
The ex-Gaussian distribution has three parameters: the mean (μ) and variance (σ²) of the normal distribution, and the mean of the exponential distribution (τ). Consequently, a 2-ex-Gaussian mixture model has six parameters, plus one free parameter for the two mixture probabilities associated with the ex-Gaussian distributions.² More generally, an N-component ex-Gaussian mixture model requires 3N + (N − 1) = 4N − 1 parameters to be estimated. We used maximum likelihood (ML) estimation to fit the models to the RT data of each bin.³ By maximizing the log likelihood function, ML finds the parameter estimates under which the observed data are most probable. To obtain reasonable starting values for the location parameters in mixture models with N ∈ {2, 3, 4} components, we first estimated the RT pdfs using Parzen kernel estimators with a Gaussian kernel (Parzen, 1962). This is a generalization of the relative frequency histogram that uses graded, rather than fixed-width, bins. The only parameter is the kernel (or bin) width. For each value of N, we determined the smallest kernel width that yielded an estimated pdf with N modes. The locations of these modes were used as the starting values for fits of the ex-Gaussian mixture models. After the parameter values were estimated, we computed the Bayesian Information Criterion (BIC; Schwarz, 1978) score for each model. The BIC is derived from the log likelihood function, BIC = k ln(n) − 2 ln(L̂), where k is the number of free parameters, n the number of RTs, and L̂ the maximized likelihood; the k ln(n) term penalizes additional parameters, which allows a goodness-of-fit comparison of models with different numbers of parameters. For each bin, we selected the model with the lowest BIC score as the model that provided the best account of the RT data.
The results of the model fitting are depicted in Fig. 3. The vertical location of each dot in Fig. 3 is centered on the mean of an ex-Gaussian component (i.e., μ + τ) from the best-fitting mixture of ex-Gaussians model. Consequently, the number of dots per bin indicates the number of ex-Gaussian components that contributed to the observed RT distribution. The size of each dot represents the mixture probability for the component distribution. For example, the distribution of FN’s R3 latencies in Bin 3, shown in the top-left panel of Fig. 2, is a mixture of two distinct components.
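The model-comparison steps described above can be sketched in plain Python. This is an illustrative sketch, not the authors’ code: it evaluates the ex-Gaussian density with parameters μ, σ, and τ, the log likelihood of an N-component mixture, the 4N − 1 parameter count, and the BIC, and it locates Parzen-estimate modes at the smallest candidate kernel width that yields N modes. The numerical maximization of the log likelihood itself is omitted; one option would be to pass the negative log likelihood to a general-purpose optimizer such as `scipy.optimize.minimize`. All function names are hypothetical, and the closed-form density used here is not numerically robust when τ is very small relative to σ.

```python
import math


def exgauss_pdf(x, mu, sigma, tau):
    """Ex-Gaussian density: Normal(mu, sigma^2) convolved with an exponential of mean tau."""
    z = (x - mu) / sigma - sigma / tau
    normal_cdf = 0.5 * math.erfc(-z / math.sqrt(2.0))
    return math.exp((mu - x) / tau + sigma**2 / (2.0 * tau**2)) * normal_cdf / tau


def mixture_loglik(rts, weights, components):
    """Log likelihood of RTs under a mixture; components = [(mu, sigma, tau), ...]."""
    return sum(
        math.log(sum(w * exgauss_pdf(x, *c) for w, c in zip(weights, components)))
        for x in rts
    )


def n_free_params(n_components):
    """3 parameters per component plus N - 1 free mixture probabilities."""
    return 3 * n_components + (n_components - 1)


def bic(loglik, n_components, n_obs):
    """Bayesian Information Criterion; lower is better."""
    return n_free_params(n_components) * math.log(n_obs) - 2.0 * loglik


def select_n_components(max_logliks, n_obs):
    """max_logliks maps N to the maximized log likelihood of the N-component model."""
    return min(max_logliks, key=lambda n: bic(max_logliks[n], n, n_obs))


def kde(grid, data, width):
    """Parzen estimate with a Gaussian kernel of the given width, evaluated on grid."""
    norm = 1.0 / (len(data) * width * math.sqrt(2.0 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((g - d) / width) ** 2) for d in data) for g in grid]


def kde_modes(grid, density):
    """Grid locations of interior local maxima of an estimated density."""
    return [grid[i] for i in range(1, len(grid) - 1)
            if density[i - 1] < density[i] >= density[i + 1]]


def starting_locations(data, n_modes, grid, widths):
    """Modes of the estimate at the smallest candidate width yielding exactly n_modes modes."""
    for width in sorted(widths):
        modes = kde_modes(grid, kde(grid, data, width))
        if len(modes) == n_modes:
            return modes
    return None  # no candidate width produced n_modes modes
```

The component mean plotted in Fig. 3 corresponds to μ + τ of a fitted component. Searching only a finite list of candidate widths is a simplification; the text does not say how the smallest width was found.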
One component consists of predictive RTs with a mean of 66 ms and accounts for 75% of the responses. The second component has a mean of 346 ms and comprises 25% of the responses.
To organize the model-fitting results, we adopted a heuristic classification scheme. When the RT distributions were bimodal, we denote the mean of the slower component with a black dot and the mean of the faster component with a medium gray dot. A reasonable interpretation of these components is that the slower component reflects stimulus-based responding and the faster component reflects predictive or plan-based responding. For example, in most cases the slower component has a mean that is greater than 300 ms for FN and greater than 250 ms for MA, whereas the faster component typically has a mean that is less than 100 ms for FN and less than 50 ms for MA. Furthermore, for both animals, the faster component often has a negative mean (especially for response R3), which can only occur with plan-based responding. In several instances, the RT analysis suggested a further decomposition of stimulus-based or plan-based processing (e.g., FN-R4, Bins 26–37). To highlight these cases, one of the components was marked by a dot filled with a diagonal line pattern.
² Since the mixture probabilities are constrained to sum to 1, a model with N ex-Gaussian distributions has N − 1 free parameters for its N mixture probabilities. Our primary goal was to determine the number of ex-Gaussian components in each bin and their locations. Therefore, we do not report estimates for the parameters σ and τ.
³ We excluded trials with RTs > 600 ms for FN and trials with RTs > 500 ms for MA. These rare RTs were outside the critical RT range for the transition from stimulus-based to plan-based sequence production and adversely affected model fits.
A closer inspection of Fig. 3 provides a number of insights. The animals transitioned from stimulus-elicited to plan-based behavior during the first 15–20 bins. In Bins 1–3, RT distributions were predominantly unimodal for all response locations in both animals. After this initial unimodal phase, the animals entered a bimodal phase that lasted for about 10 bins. Typically, within one or two bins, performance changed from being exclusively stimulus-triggered to being predominantly plan-based. The animals continued to respond to the visual cue on a substantial number of trials, but the proportion of stimulus-triggered responses decreased steadily over the next ten bins.
The findings presented so far help to explain the rather puzzling pattern of mean RTs and STDs discussed earlier. Initially, the animals’ responses to the 2-3-4 sequence were stimulus-triggered. Large RT savings typically occurred within the first 5 bins, as sequence learning enabled the animals to perform a substantial proportion of trials in plan-based mode. During this time, the animals switched between plan-based and stimulus-based sequence production.
The ensuing bimodal RT distribution was responsible for the substantial increase in STD in Fig. 1 (see also Fig. 2, top-left panel). Further RT improvements occurred until about Bin 15, as the animals became increasingly less likely to switch back to stimulus-triggered responding. In effect, RT distributions became unimodal again, and overall RT variability decreased accordingly (cf. Figs. 1 and 2, bottom-left panel). In short, the mean RTs, the RT STDs, and the RT distributions are all consistent with trial-by-trial switching between stimulus-based and plan-based responding, and the transition from purely stimulus-based to purely plan-based responding was largely completed during the first 15 bins of training.
What happened after the animals had completed the shift to plan-based sequential action? Figure 3 suggests that further RT improvements occurred. Interestingly, for some response locations (FN-R4, MA-R3, MA-R4), the animals transitioned through another phase of bimodal RT distributions. This points to the possibility that plan-based processing is itself not a unitary process, but an aggregate of motor programs (cf. Verwey, 2003). However, in the other two monkeys, bimodality occurred only during the transition from stimulus-based to plan-based behavior, not afterward. That is, they did not exhibit bimodal RT distributions when they performed the 2-3-4 sequence predictively. In addition, there was no indication of bimodal distributions during plan-based responding with the 5-3-1 sequence in any of the four monkeys. Therefore, the significance of the RT bimodalities for predictive responses in the 2-3-4 condition is unclear.
Fig. 3 RT model-fitting results for FN (left column) and MA (right column). Each dot represents an ex-Gaussian distribution. The number of dots per bin corresponds to the number of ex-Gaussian components that contributed to the observed RT distribution. The y-coordinate of a dot indicates the mean of the ex-Gaussian distribution. The size of a dot denotes the mixture probability associated with the distribution. The relation between dot size and the magnitude of the mixture probability is illustrated in the legend in the middle-right panel. Components with mixture probabilities below 0.01 are not shown. Arrows in the figure point to the model-fitting results for the four RT frequency distributions displayed in Fig. 2. R2 = 2-response, R3 = 3-response, R4 = 4-response
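Why a bimodal phase inflates RT variability can be illustrated with the law of total variance. If a proportion p of trials comes from a plan-based component with mean μ_p and SD σ_p, and the remainder from a stimulus-based component with mean μ_s and SD σ_s, the mixture variance is p σ_p² + (1 − p) σ_s² + p(1 − p)(μ_s − μ_p)². The sketch below uses hypothetical round numbers (component means of 50 and 350 ms, SDs of 40 ms) rather than fitted values. As p rises from 0 to 1, the mean RT falls monotonically while the SD first rises and then falls, peaking at p = 0.5 for these equal component SDs, which is qualitatively the inverted-U pattern of the STDs in Fig. 1.

```python
import math


def mixture_mean(p_plan, mu_plan, mu_stim):
    """Mean RT of a two-mode mixture."""
    return p_plan * mu_plan + (1.0 - p_plan) * mu_stim


def mixture_sd(p_plan, mu_plan, sd_plan, mu_stim, sd_stim):
    """SD of a two-mode mixture via the law of total variance."""
    within = p_plan * sd_plan**2 + (1.0 - p_plan) * sd_stim**2
    between = p_plan * (1.0 - p_plan) * (mu_stim - mu_plan) ** 2
    return math.sqrt(within + between)


if __name__ == "__main__":
    # Hypothetical component parameters (ms), chosen only for illustration.
    MU_PLAN, SD_PLAN, MU_STIM, SD_STIM = 50.0, 40.0, 350.0, 40.0
    for k in range(11):
        p = k / 10
        print(f"P(plan) = {p:.1f}  mean = {mixture_mean(p, MU_PLAN, MU_STIM):6.1f} ms  "
              f"SD = {mixture_sd(p, MU_PLAN, SD_PLAN, MU_STIM, SD_STIM):6.1f} ms")
```

The between-component term p(1 − p)(μ_s − μ_p)² is what a single unimodal distribution that merely shifts toward faster RTs would lack.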
Characterizing the transition between processing modes
In the previous section, we presented evidence that the animals’ substantial RT improvements early in training (i.e., during the first 15–20 bins) were not mediated by the gradual speeding of a unitary process. Instead, they occurred because the animals switched between two discrete processing modes, stimulus-based and plan-based, and gradually increased the percentage of trials they spent in the faster plan-based mode. The finding of bimodal RT distributions indicates that the animals switched processing modes, but it can tell us nothing about how often these shifts took place from one trial to the next. Consider a hypothetical block with 70 plan-based and 30 stimulus-triggered responses. The animal might perform the first 30 trials in stimulus mode and then switch to plan-based responding for the remaining 70 trials. In this case, the probability that the animal switches between modes on any randomly selected trial is nearly zero. Alternatively, the animal might digress from plan-based responding for one trial at a time and switch back to plan-based (memory) mode immediately. In this case, the probability of switching back to plan-based responding after a stimulus-triggered trial is one. These two alternatives mark the endpoints of a behavioral continuum. In the former case, the animal is biased toward perseverating in the current response mode; in the latter, it is biased toward alternating between response modes. It is important to note that the absolute frequency of trial-to-trial shifts is constrained by the relative availability of the two processing modes. In our hypothetical example, plan-based responding is dominant, and the maximum number of mode shifts that can occur with 70% plan-based responses in the 100-trial block is 60 (each of the 30 stimulus-triggered trials flanked by plan-based trials). By contrast, if the two modes were equally likely, the animal could shift modes on as many as 99 out of 100 trials. This means that an index of the animal’s tendency to switch between modes needs to take into account the baseline probabilities of the two response modes.
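The ceiling on mode shifts in the hypothetical 100-trial block can be checked with a few lines of code. The sketch counts switches as changes between consecutive trials and computes the largest number of switches that a block with a given number of plan-based (p) and stimulus-triggered (s) trials permits; the function names are illustrative only.

```python
def count_switches(modes):
    """Number of trial-to-trial mode changes in a sequence such as 'ppsppp'."""
    return sum(prev != cur for prev, cur in zip(modes, modes[1:]))


def max_switches(n_plan, n_stim):
    """Largest possible number of mode changes in a block of n_plan + n_stim trials.

    Switches = runs - 1. If one mode is rarer, each of its trials can at best form
    its own run flanked by the other mode, giving 2 * min(n_plan, n_stim) switches.
    With equal counts, the modes can alternate on every trial: n - 1 switches.
    """
    if n_plan == 0 or n_stim == 0:
        return 0
    if n_plan == n_stim:
        return n_plan + n_stim - 1
    return 2 * min(n_plan, n_stim)


if __name__ == "__main__":
    blocked = "s" * 30 + "p" * 70             # 30 stimulus trials, then 70 plan-based
    interleaved = "p" + "sp" * 30 + "p" * 39  # every s-trial flanked by p-trials
    print(count_switches(blocked), count_switches(interleaved), max_switches(70, 30))  # 1 60 60
```

Dividing an observed switch count by this ceiling is one simple way to express switching relative to the baseline mode probabilities; the conditional-probability analysis in the text is another.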
One way to take these baseline probabilities into account is to evaluate trial-to-trial dependencies. The animal can be said to switch modes randomly if the probability of a plan-based response on trial t does not depend on the response mode on the previous trial, that is, if P(plan-based on trial t | plan-based on trial t−1) = P(plan-based on trial t | stimulus-triggered on trial t−1), or, in abbreviated form, P(p|p) = P(p|s). More specifically, the expectation for random mode switching is that the two conditional probabilities P(p|p) and P(p|s) equal the unconditional probability of plan-based responding, P(p). If P(p|p) and P(p|s) differ significantly, then the animal’s performance was biased, either toward perseverating [if P(p|p) > P(p|s)] or toward mode switching [if P(p|p) < P(p|s)]. The proposed analysis of conditional probabilities requires at least some response-mode variability on consecutive trials, or, in other words, that the RT distributions were bimodal. In FN’s case, this condition was met for R3 and R4 in Bins 4–7. MA showed bimodal RT distributions for R2 and R3 in Bins 3–7 (see Fig. 3). We classified responses in these training intervals as either plan-based or stimulus-based by determining the probability of a response belonging to either RT distribution and selecting the distribution for which the response was more likely. Although it is apparent in Fig. 3 that stimulus-based and plan-based responding yield distinct RT distributions, misclassifications were possible. Therefore, we restricted the analysis to trials that could be classified with sufficient certainty. Specifically, we selected trials for which classification accuracy was 0.8 or higher (i.e., for which the likelihood ratio was either greater than 4 or less than 0.25). This selection criterion was met on 89% and 87% of the trials for FN and MA, respectively. Pooling across bins, the probability of plan-based responding was about 0.8 for both animals. We then calculated the conditional probabilities of plan-based responding for each monkey. The results are shown in Fig. 4. Starting with MA, the overlapping confidence intervals for P(p|p) and P(p|s) indicate that the monkey’s performance was consistent with random mode switching. For the R2-to-R3 transition, MA was equally likely to continue responding plan-based and to switch from stimulus-triggered to plan-based responding. In contrast, FN was much less likely than expected to execute R4 plan-based when R3 was stimulus-based. This means that FN was prone to perseverate in stimulus mode.
Fig. 4 Trial-to-trial dependencies in switching between response modes for FN and MA. P(p) = unconditional probability of plan-based performance (dots); P(p|p) = conditional probability of a plan-based response following a plan-based response (filled diamonds); P(p|s) = conditional probability of a plan-based response following a stimulus-triggered response (empty diamonds). R2 = 2-response; R3 = 3-response; R4 = 4-response. Error bars represent the 95% confidence interval for the difference P(p|p) − P(p|s)
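The classification rule and the dependency analysis can be sketched as follows. Several details are assumptions not stated in the text: each trial’s likelihood ratio is taken to be the plan-based component density divided by the stimulus-based component density at the observed RT (a ratio of 4 corresponds to a classification probability of 0.8 only if both components are treated as equally likely), trial pairs involving an unclassified trial are dropped, and the 95% interval for P(p|p) − P(p|s) uses the normal (Wald) approximation for two independent proportions. The text does not say how its confidence intervals were computed, so this is one plausible choice. All names are hypothetical.

```python
import math


def classify(lik_plan, lik_stim, min_ratio=4.0):
    """Return 'p', 's', or None (too uncertain) from the two component likelihoods."""
    ratio = lik_plan / lik_stim
    if ratio > min_ratio:
        return "p"
    if ratio < 1.0 / min_ratio:
        return "s"
    return None


def _share_plan(modes):
    """Proportion of 'p' labels; NaN for an empty list."""
    return sum(m == "p" for m in modes) / len(modes) if modes else float("nan")


def dependency_stats(pairs):
    """pairs: (mode on trial t-1, mode on trial t). Returns P(p), P(p|p), P(p|s), n_p, n_s."""
    pairs = [(a, b) for a, b in pairs if a is not None and b is not None]
    after_p = [b for a, b in pairs if a == "p"]
    after_s = [b for a, b in pairs if a == "s"]
    return (_share_plan([b for _, b in pairs]), _share_plan(after_p),
            _share_plan(after_s), len(after_p), len(after_s))


def diff_ci(p1, n1, p2, n2, z=1.96):
    """Wald 95% interval for p1 - p2 from two independent proportions."""
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    d = p1 - p2
    return d - z * se, d + z * se


if __name__ == "__main__":
    # Hypothetical trial pairs, not data from the study.
    pairs = ([("p", "p")] * 8 + [("p", "s")] * 2 + [("s", "p")] * 3
             + [("s", "s")] * 7 + [(None, "p")])
    p, p_pp, p_ps, n_p, n_s = dependency_stats(pairs)
    lo, hi = diff_ci(p_pp, n_p, p_ps, n_s)
    # lo > 0 suggests perseveration; hi < 0 suggests a bias toward switching.
    print(f"P(p)={p:.2f} P(p|p)={p_pp:.2f} P(p|s)={p_ps:.2f} CI=({lo:.2f}, {hi:.2f})")
```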
Discussion

This article investigated the transition from stimulus-based to plan-based responding during initial sequence learning. Two monkeys received extensive training on a visually guided reaching task that was structured according to a continuously repeating 3-element sequence. At the outset of training, unimodal RT distributions for the three response locations indicated that initial responses were stimulus-triggered. The animals then entered a transition phase, characterized by bimodal RT distributions, during which responding changed from stimulus-driven to predominantly plan-based. The animals continued to respond to the visual cue on a substantial number of trials (hence the bimodality), but the proportion of stimulus-based trials decreased steadily until the RT distributions became unimodal again. At this point, the animals can be said to have completed the transition from stimulus-based to plan-based responding. RT distributions of predictive responses in the remainder of the training phase were typically unimodal, but some bimodality did occur, hinting that plan-based responding may comprise more than one processing mode.

The analysis of RT distributions sheds light on an atypical pattern of mean RTs and STDs. While mean RTs decreased steadily with training, STDs followed an inverse U-shaped trajectory. We found that the transitory increase in STD coincided with the occurrence of bimodal RT distributions. As a first step toward describing the process of mode shifting in greater detail, we analyzed trial-to-trial dependencies during the initial bimodal phase. At this point in training, responses were mostly plan-based, but the animals still responded to the visual cue on about 20% of the trials. Monkey MA’s behavior was consistent with random mode switching, but FN performed in stimulus mode more persistently than expected by chance.
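The combination of steadily decreasing mean RTs and an inverse U-shaped STD trajectory follows directly if the RTs in each training bin are treated as a mixture of a stimulus-triggered and a plan-based distribution. For a proportion p of plan-based trials, the mixture mean p·μp + (1 − p)·μs changes linearly with p, whereas the mixture variance p·σp² + (1 − p)·σs² + p(1 − p)·(μp − μs)² contains a between-mode term that is largest at intermediate p (exactly at p = 0.5 when the component SDs are equal). The sketch assumes Gaussian components with hypothetical parameters (plan-based 100 ± 40 ms, stimulus-triggered 300 ± 40 ms); it illustrates the logic and is not a fit to the monkeys' data. The names `PLAN`, `STIM`, `mixture_moments`, and `count_modes` are hypothetical.

```python
import math

# Hypothetical component parameters (mean, SD) in ms; not estimates from the study.
PLAN = (100.0, 40.0)   # plan-based (predictive) responses
STIM = (300.0, 40.0)   # stimulus-triggered responses


def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def mixture_moments(p_plan, plan=PLAN, stim=STIM):
    """Mean and SD of the RT mixture with plan-based proportion p_plan."""
    (mu_p, sd_p), (mu_s, sd_s) = plan, stim
    mean = p_plan * mu_p + (1 - p_plan) * mu_s
    var = (p_plan * sd_p ** 2 + (1 - p_plan) * sd_s ** 2
           + p_plan * (1 - p_plan) * (mu_p - mu_s) ** 2)  # between-mode term
    return mean, math.sqrt(var)


def count_modes(p_plan, plan=PLAN, stim=STIM, lo=-100, hi=500):
    """Count local maxima of the mixture density on a 1-ms grid from lo to hi."""
    dens = [p_plan * normal_pdf(x, *plan) + (1 - p_plan) * normal_pdf(x, *stim)
            for x in range(lo, hi + 1)]
    return sum(1 for i in range(1, len(dens) - 1)
               if dens[i] > dens[i - 1] and dens[i] > dens[i + 1])


for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    m, s = mixture_moments(p)
    print(f"p_plan={p:.2f}  mean={m:6.1f} ms  SD={s:5.1f} ms  modes={count_modes(p)}")
```

With these assumed values, the mean falls monotonically from 300 to 100 ms, while the SD rises from 40 ms at p = 0 to about 108 ms at p = 0.5 and returns to 40 ms at p = 1; the density is unimodal at the endpoints and bimodal when both response types occur in appreciable proportions, mirroring the coincidence of elevated STDs and bimodal RT distributions.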
This individual difference between the two monkeys suggests that mode switching may include a strategic component, and that it may involve two aspects: the general availability of alternative processing modes, and the consistency with which these modes are used. Our results support the notion that sequence learning involves shifts between distinct processing modes, but the cognitive processes that bring about these shifts have yet to be specified in greater detail. One possibility is to conceptualize a mode shift as the result of the animal’s decision to wait (or not to wait) for the visual cue. This would allow us to draw on the large decision-making literature that is pertinent to this issue. Throughout training, the animals received a food reward for every fourth correct button press. A decision to wait for the visual cue increased the probability of a correct response and, thus, the probability of reward, but at the cost of a longer delay. A large literature implicates
a variety of frontal cortical areas in decision-making tasks of this type. For example, many studies have reported evidence that regions of the anterior cingulate cortex (ACC) and prefrontal cortex respond differentially to changes in the probability of an expected reward (e.g., Amiez, Joseph, & Procyk, 2006; Matsumoto, Suzuki, & Tanaka, 2003; McClure, Laibson, Loewenstein, & Cohen, 2004; Tremblay & Schultz, 1999). These results might suggest that during the early periods of learning, the animals continuously monitored their uncertainty about which response should be emitted next. When uncertainty was high, the animal tended to wait for the next visual cue, which eliminated this uncertainty. When uncertainty was low, the animal tended to make an immediate predictive response. This interpretation is supported by the results of Beran, Smith, Redford, and Washburn (2006), who reported that macaques can learn to use an uncertainty response appropriately in a sensory discrimination task. Using fMRI, Paul, Ashby, and Smith (2009) identified a network that included a variety of frontal cortical regions (ACC, prefrontal cortex, insula) that were uniquely active on trials on which humans made uncertainty responses in this same task. If macaques are able to make a unique response when they are uncertain about which response is correct in a sensory discrimination task, then it seems plausible that they could also decide to wait for a visual cue on trials when they are uncertain about which response to make next in the sequential reaching task.

A distinguishing feature of the current study was that predictive responses provided direct and unequivocal evidence for plan-based performance. Compatible findings were reported in a recent study by Desmurget and Turner (2010). They trained two macaque monkeys on a discrete sequence production task that required the animals to make four consecutive reaching movements to four peripheral visual targets.
After extensive training with either one (1-2-3-4) or two (1-2-3-4, 4-2-1-3) regular sequences, the animals typically initiated their reaching movements before the target stimulus was presented on the second, third, and fourth trials of the sequence. Moreover, when the monkeys were transferred to a random sequence of target locations, they tended to erroneously reproduce the learned sequences if, by chance, the first one or more target locations corresponded to a learned sequence.

In previous research with human subjects, plan-based performance was inferred from the absence of interference due to incompatible S-R mappings (e.g., Hoffmann & Koch, 1997; Koch, 2007; Tubau & López-Moliner, 2004). Interestingly, this absence of interference was observed only in participants who were able to communicate the response sequence in a post-experimental interview. The strong empirical link between explicit knowledge and plan-based performance led Tubau et al. (2007) to propose that action plans are
represented phonetically, that is, by means of inner speech (cf. Vygotsky, 1986). This view is supported, for example, by the finding that visual-verbal and sound distractors impeded plan execution, whereas irrelevant spatial information did not (Tubau et al., 2007). That nonlinguistic animals are able to make plan-based reaching movements does not disprove the link between speech and the acquisition of action plans in humans, but it suggests that action plans can be formed in more than one way.

In the current experiment, the transition from stimulus-based to plan-based performance occurred at different points in time for the three response locations and often extended over several bins. For example, 50% of MA’s 2-responses were already predictive in Bin 2, but for the 4-response, clear evidence of predictive responding did not emerge until Bin 7. By contrast, human participants in incidental learning experiments often become aware of the sequential regularity within a single experimental session. For example, Rünger and Frensch (2008) found that after 1,200 trials with a 6-choice SRT task, 40% of the participants were able to report the 6-element response sequence of the training phase. Moreover, it has been shown with a variety of incidental learning tasks that participants with full explicit knowledge of the regularity exhibit a sudden and pronounced decrease in RT during the training phase, and there is accumulating evidence that this RT drop marks the point in time at which explicit knowledge emerges (e.g., Haider & Frensch, 2009; Haider & Rose, 2007; Rose et al., 2010; Haider et al., 2010). These findings suggest that, on the basis of their linguistic abilities, human participants gain insight into the structure of the task, which enables them to construct an action plan that can be implemented in a top-down manner.
In contrast, the monkeys in the current study seemed to develop action plans piecemeal, that is, over an extended period and at different points in time for the three response locations. It thus seems that without the capacity for language-based reasoning, the construction of action plans is a more data-driven, incremental process. An interesting challenge for future research is to explore commonalities and differences in the functional properties of plan-based sequence production in human and nonhuman primates.

In the “Introduction” section, we reviewed SRT task studies that assessed participants’ susceptibility to interference from incongruent S-R mappings in order to discriminate stimulus-based from plan-based response generation. An obvious question is whether plan-based sequence production in monkeys is similarly impervious to interference. In the current study, we focused on the initial transition from stimulus-based to plan-based response generation. However, as training continues, plan-based responding is likely to become automatized. This creates the possibility that the functional properties of plan-based
responding are, themselves, subject to change, a point that should be taken into consideration in future studies.

In closing, we return to the distinction between parallel processing and changes between processing modes. Our results support the notion that sequence learning entails a shift from stimulus-based to plan-based responding. However, mode shifting and parallel processing are not mutually exclusive. In the current study, for example, the monkeys’ RT performance improved further after they had switched to plan-based responding. In most instances, these improvements were characterized by a continuous shift of unimodal RT distributions toward shorter latencies. This result could be explained by a strengthening of existing sequence representations or by the formation of new representations that come to control behavior. Moreover, unimodal RT distributions are compatible both with a race between independent processors and with discrete shifts between processing modes, provided that the RT distributions of the two modes overlap substantially (Verwey, 2003). Consequently, the decomposition of RT distributions into their component distributions has limited explanatory power. Nevertheless, it has proven to be a useful tool in the analysis of the behavioral data reported by Matsuzaka et al. (2007).

Acknowledgments

This research was supported in part by Award Number P01NS044393 from the National Institute of Neurological Disorders and Stroke and in part by funds from the Office of Research and Development, Medical Research Service, Department of Veterans Affairs (PLS).
References

Amiez, C., Joseph, J. P., & Procyk, E. (2006). Reward encoding in the monkey anterior cingulate cortex. Cerebral Cortex, 16, 1040–1055. doi:10.1093/cercor/bhj046.
Beran, M. J., Smith, J. D., Redford, J. S., & Washburn, D. A. (2006). Rhesus macaques (Macaca mulatta) monitor uncertainty during numerosity judgments. Journal of Experimental Psychology: Animal Behavior Processes, 32, 111–119. doi:10.1037/0097-7403.32.2.111.
Bishop, C. (1995). Neural networks for pattern recognition. Oxford, UK: Clarendon.
Desmurget, M., & Turner, R. S. (2010). Motor sequences and the basal ganglia: kinematics, not habits. The Journal of Neuroscience, 30, 7685–7690. doi:10.1523/JNEUROSCI.0163-10.2010.
Haider, H., Eichler, A., & Lange, T. (2010). An old problem: how can we distinguish between conscious and unconscious knowledge acquired in an implicit learning task. Consciousness and Cognition, 20, 658–672. doi:10.1016/j.concog.2010.10.021.
Haider, H., & Frensch, P. A. (2009). Conflicts between expected and actually performed behavior lead to verbal report of incidentally acquired sequential knowledge. Psychological Research/Psychologische Forschung, 73, 817–834. doi:10.1007/s00426-008-0199-6.
Haider, H., & Rose, M. (2007). How to investigate insight: a proposal. Methods (San Diego, Calif), 42, 49–57. doi:10.1016/j.ymeth.2006.12.004.
Heathcote, A., Popiel, S., & Mewhort, D. J. K. (1991). Analysis of response time distributions: an example using the Stroop task. Psychological Bulletin, 109, 340–347. doi:10.1037/0033-2909.109.2.340.
Hikosaka, O., Nakamura, K., Sakai, K., & Nakahara, H. (2002). Central mechanisms of motor skill learning. Current Opinion in Neurobiology, 12, 217–222. doi:10.1016/S0959-4388(02)00307-0.
Hoffmann, J., & Koch, I. (1997). Stimulus-response compatibility and sequential learning in the serial reaction time task. Psychological Research/Psychologische Forschung, 60, 87–97. doi:10.1007/BF00419682.
Hommel, B. (2000). The prepared reflex: automaticity and control in stimulus-response translation. In S. Monsell & J. Driver (Eds.), Control of cognitive processes: attention and performance XVIII (pp. 247–273). Cambridge, MA: MIT Press.
Keele, S. W. (1968). Movement control in skilled motor performance. Psychological Bulletin, 70, 387–403.
Koch, I. (2007). Anticipatory response control in motor sequence learning: evidence from stimulus-response compatibility. Human Movement Science, 26, 257–274. doi:10.1016/j.humov.2007.01.004.
Logan, G. D. (1988). Toward an instance theory of automatization. Psychological Review, 95, 492–527. doi:10.1037/0033-295X.95.4.492.
Matsumoto, K., Suzuki, W., & Tanaka, K. (2003). Neuronal correlates of goal-based motor selection in the prefrontal cortex. Science, 301, 229–232. doi:10.1126/science.1084204.
Matsuzaka, Y., Picard, N., & Strick, P. L. (2007). Skill representation in the primary motor cortex after long-term practice. Journal of Neurophysiology, 97, 1819–1832. doi:10.1152/jn.00784.2006.
McClure, S. M., Laibson, D. I., Loewenstein, G., & Cohen, J. D. (2004). Separate neural systems value immediate and delayed monetary rewards. Science, 306, 503–507. doi:10.1126/science.1100907.
Nissen, M., & Bullemer, P. (1987). Attentional requirements of learning: evidence from performance measures. Cognitive Psychology, 19, 1–32. doi:10.1016/0010-0285(87)90002-8.
Parzen, E. (1962). On estimation of a probability density function and mode. Annals of Mathematical Statistics, 33, 1065–1076. Retrieved from http://www.jstor.org/stable/2237880.
Paul, E. J., Ashby, F. G., & Smith, J. D. (2009). Neural networks that monitor response uncertainty. Paper presented at the 2009 Meetings of the Cognitive Neuroscience Society, March 21–24, San Francisco, CA.
Ratcliff, R., & Murdock, B. (1976). Retrieval processes in recognition memory. Psychological Review, 83, 190–214. doi:10.1037/0033-295X.83.3.190.
Rickard, T. C. (1997). Bending the power law: a CMPL theory of strategy shifts and the automatization of cognitive skills. Journal of Experimental Psychology: General, 126, 288–311. doi:10.1037/0096-3445.126.3.288.
Rickard, T. C. (1999). A CMPL alternative account of practice effects in numerosity judgment tasks. Journal of Experimental Psychology: Learning, Memory, and Cognition, 25, 532–542. doi:10.1037/0278-7393.25.2.532.
Rose, M., Haider, H., & Büchel, C. (2010). The emergence of explicit memory during learning. Cerebral Cortex, 20, 2787–2797. doi:10.1093/cercor/bhq025.
Rünger, D., & Frensch, P. A. (2008). How incidental sequence learning creates reportable knowledge: the role of unexpected events. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34, 1011–1026. doi:10.1037/a0012942.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6, 461–464. Retrieved from http://www.jstor.org/stable/2958889.
Simon, J. R., & Small, A. M., Jr. (1969). Processing auditory information: interference from an irrelevant cue. Journal of Applied Psychology, 53, 433–435. doi:10.1037/h0028034.
Stroop, J. R. (1935). Studies of interference in serial verbal reactions. Journal of Experimental Psychology, 18, 643–662. doi:10.1037/h0054651.
Townsend, J. T., & Ashby, F. G. (1983). Stochastic modeling of elementary psychological processes. New York: Cambridge University Press.
Tremblay, L., & Schultz, W. (1999). Relative reward preference in primate orbitofrontal cortex. Nature, 398, 704–708. doi:10.1038/19525.
Tubau, E., Hommel, B., & López-Moliner, J. (2007). Modes of executive control in sequence learning: from stimulus-based to plan-based control. Journal of Experimental Psychology: General, 136, 43–63. doi:10.1037/0096-3445.136.1.43.
Tubau, E., & López-Moliner, J. (2004). Spatial interference and response control in sequence learning: the role of explicit knowledge. Psychological Research/Psychologische Forschung, 68, 55–63. doi:10.1007/s00426-003-0139-4.
Verwey, W. B. (1999). Evidence for a multistage model of practice in a sequential movement task. Journal of Experimental Psychology: Human Perception and Performance, 25, 1693–1708. doi:10.1037/0096-1523.25.6.1693.
Verwey, W. B. (2003). Processing modes and parallel processors in producing familiar keying sequences. Psychological Research/Psychologische Forschung, 67, 106–122. doi:10.1007/s00426-002-0120-7.
Vygotsky, L. (1986). Thought and language (A. Kozulin, Trans.). Cambridge, MA: MIT Press. (Original work published in 1934).