Neuronal Representation of Response–Outcome in the Primate Prefrontal Cortex

Satoshi Tsujimoto and Toshiyuki Sawaguchi

Laboratory of Cognitive Neurobiology, Hokkaido University Graduate School of Medicine, Sapporo 060-8638 and Core Research for Evolutional Science and Technology, Japan Science and Technology, Saitama 332-0012, Japan

For flexible control of behaviour, it is important to associate a preceding behavioural response with its outcome. Since the dorsolateral prefrontal cortex (dlPFC) plays a major role in such control, this area is likely to have a neuronal mechanism that codes response–outcome, such as reward/non-reward, based on the nature of the behavioural response made immediately before. To test this hypothesis, we examined neuronal activity in the dlPFC while monkeys performed a variant of the oculomotor delayed-response (ODR) task with two reward conditions. In this task, the correct response was rewarded immediately in only half of the trials, and the subject could not predict the outcome (reward/non-reward). The response was followed by a fixation of 2 s (F2-period). We also employed a fixation (FIX) task that required monkeys only to fixate on the peripheral target, with two reward conditions similar to those of the ODR task. Post-response activity of a subset of dlPFC neurons was modulated by both the direction of the preceding response and its outcome. None of these neurons showed directional F2-period activity in the FIX task. These results suggest that a subset of dlPFC neurons represents response–outcome (i.e. reward/non-reward associated with the directional saccade made immediately before).

Keywords: flexible control of behaviour, monkey, prefrontal cortex, response–outcome, reward, single neuron activity

Introduction

For primates, including humans, to react flexibly to a given environment, they must associate the nature of a behavioural response with its outcome. Since the dorsolateral prefrontal cortex (dlPFC) plays a major role in flexible/cognitive control of behaviour (Cohen et al., 1996; Miller and Cohen, 2001), this area may possess a neural mechanism that associates different behavioural responses with their outcomes, particularly reward/non-reward. Indeed, the dlPFC in humans is activated by delivery of monetary reward (Thut et al., 1997; Pochon et al., 2002; Ramnani and Miall, 2003), and dlPFC neurons in monkeys show activity related to reward and/or expectancy of reward (Niki and Watanabe, 1979; Watanabe, 1989, 1996; Leon and Shadlen, 1999; Kobayashi et al., 2002; Watanabe et al., 2002). In addition to reward prediction, which was examined in these previous studies, reward-based behavioural control also requires associating the immediately preceding response with its outcome, such as reward/non-reward (Balleine and Dickinson, 1998). Although a subset of dlPFC neurons appears to show differential reward-related activity depending on the behavioural response (Watanabe, 1989), systematic studies on this issue are lacking. In particular, the neuronal association between cognitive spatial behaviour, whose control is a major function of the PFC (see below), and its outcome remains unknown.

It is well known that a major role of the dlPFC is the cognitive control of spatial behaviour (Funahashi and Kubota, 1994; Goldman-Rakic, 1995; Fuster, 1997). The neuronal basis of this function has been extensively studied using the delayed-response paradigm. Such studies have demonstrated that the activity of dlPFC neurons is related to various phases of these tasks, such as the cue (Fuster, 1973; Niki, 1974; Kojima, 1980; Funahashi et al., 1990), the delay (Kubota et al., 1974; Kojima and Goldman-Rakic, 1982; Funahashi et al., 1989; Rao et al., 1998; Sawaguchi and Yamane, 1999; Takeda and Funahashi, 2002) and the response, i.e. saccades/reaching (Kojima, 1980; Boch and Goldberg, 1989; Funahashi et al., 1991). The magnitude of these task-related activities often differs according to the direction of the cue/response. Such ‘directional’ task-related activity has been considered a neural basis of visuospatial working memory processes that are critical for cognitive control of goal-directed spatial behaviour (Funahashi and Kubota, 1994; Goldman-Rakic, 1995; Fuster, 1997). Thus, the dlPFC is likely to have a neuronal mechanism that associates different (directional) spatial responses with their specific outcomes, such as reward/non-reward, for flexible cognitive control of spatial behaviour (Cohen et al., 1996; Miller and Cohen, 2001; Schultz, 2001). We therefore hypothesized that a subset of neurons in the PFC codes response–outcome (reward/non-reward) based on the nature of the spatial response made immediately before, particularly a spatial working memory-guided response. To address this hypothesis, we recorded neuronal activity in the dlPFC while the monkeys performed a variant of the oculomotor delayed-response (ODR) task (Funahashi et al., 1989) with two conditions of reward contingency. In this task, the correct motor response was rewarded immediately in only half of the trials, and the monkeys could not anticipate the outcome (reward/non-reward) until the end of the response.

We report here that post-response activity of a subset of dlPFC neurons was modulated by both the direction of the preceding response and its outcome. These neurons therefore coded the outcome associated with different (directional) spatial responses.

Cerebral Cortex V 14 N 1 © Oxford University Press 2004. All rights reserved.

Cerebral Cortex January 2004;14:47–55; DOI: 10.1093/cercor/bhg090

Materials and Methods

Subjects and Task Procedures

Two male macaque monkeys (Macaca fuscata, ∼4.5 and ∼5.5 kg) were used as subjects. Throughout this study, the subjects were treated in accordance with the ‘Guide for Care and Use of Laboratory Animals’ of both the National Institutes of Health and our institutes. Before training, the monkeys were habituated to a monkey chair. Preliminary surgery was then performed under deep pentobarbital sodium anaesthesia (∼25 mg/kg i.v.) and aseptic conditions. The skull was partly exposed, and two head-holding devices (stainless steel pipes, 8 mm internal diameter) were implanted on the anterior and posterior portions of the skull with dental acrylic. For grounding, small stainless-steel bolts (3 mm diameter) were anchored to the skull and fixed with dental acrylic. To prevent infection, antibiotics were injected i.m. on the day of surgery and daily for 1 week thereafter.

After recovery from the preliminary surgery, the subjects were trained to perform a variant of the ODR task with two reward conditions: immediate reward (IM-Rw) and delayed reward (DL-Rw; Fig. 1a). In both conditions, the trial began when the monkey fixated on a central spot (a white square, 0.5 × 0.5°) on the CRT monitor. After 1.5 s, a cue (a white square, 0.5 × 0.5°) appeared for 0.5 s at one of six symmetrically arranged peripheral locations (eccentricity 15°; Fig. 1b). After a delay period of 3 s, the fixation spot was turned off, which instructed the monkey to make a memory-guided saccade to the cued location. When the eye position fell inside a target window of 5° around the cue position, the cue reappeared as the target of a second fixation.

In the IM-Rw condition only, a drop of water (∼0.1 ml) was delivered into the monkey's mouth 500 ms later. In the DL-Rw condition, the click sound of the solenoid valve used for reward delivery was presented without delivery of water. After the monkey had fixated on the peripheral target for a further 2 s (i.e. 2.5 s from the end of the saccade), the same amount of reward (∼0.1 ml) was delivered in both conditions. The target remained on for another 1 s and the monkey continued to fixate on it, although this third fixation was not rewarded in either condition. Throughout the trial, the eye position was restricted to within 5° of the central fixation point or peripheral target, and if the monkey broke fixation, the trial was aborted. The two conditions were randomly intermixed so that the monkey could not anticipate whether a saccade would be rewarded immediately (i.e. 500 ms) after it was made.

The monkeys were first trained on a conventional ODR task, in which they were rewarded immediately after every correct saccade. The two reward conditions were then introduced simultaneously, and the duration of the F2-period was gradually extended to 2.5 s. The monkeys were extensively over-trained on the final version of the task, so the order of training is unlikely to account for differences in neuronal activity between the two conditions. At the end of training and throughout the recording sessions, the performance of both monkeys was almost perfect (>95% correct responses).

As a control task, we also employed a fixation (FIX) task with two reward conditions similar to those of the ODR task. After the subject had fixated on the peripheral target for 2 s, either the reward was delivered (IM-Rw condition) or the click tone of the solenoid valve was presented (DL-Rw condition). An additional 2 s fixation (‘F2-period’) was followed by reward delivery in both conditions. The reward sizes were the same as in the ODR task.

The tasks and the recordings were controlled by a system consisting of an infrared eye-camera system (R-21C-A; RMS, Hirosaki, Japan), two personal computers (PC9801 FA and BA39; NEC, Tokyo) and associated peripheral equipment. The eye-camera system was connected to the personal computers via A/D converters and was used for monitoring and sampling eye positions. The two personal computers were networked via RS-232C and parallel I/O. One computer controlled the tasks, while the other monitored and collected data on neuronal activity, eye positions and task events.
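The trial structure described above can be summarized as an event schedule. The sketch below is an illustration, not the authors' control software. Several details are assumptions: the event labels, the function names `schedule`, `f2_window` and `make_block`, treating saccade latency as a single number (zero by default), and drawing condition and direction independently and uniformly for each trial. The paper says only that the conditions were ‘randomly intermixed’.

```python
import random
from dataclasses import dataclass

# Durations (s) as stated in Materials and Methods.
FIX1 = 1.5      # central fixation before cue onset
CUE = 0.5       # cue duration
DELAY = 3.0     # memory delay
RW1_LAG = 0.5   # target onset -> water (IM-Rw) or valve click only (DL-Rw)
F2 = 2.0        # F2-period, ending with water in both conditions
FIX3 = 1.0      # final, unrewarded fixation

DIRECTIONS = (0, 60, 120, 180, 240, 300)  # six peripheral locations (deg)
CONDITIONS = ("IM-Rw", "DL-Rw")


@dataclass(frozen=True)
class Trial:
    condition: str  # "IM-Rw" or "DL-Rw"
    direction: int  # cue/target direction (deg)


def schedule(trial, saccade_latency=0.0):
    """Return (time_s, event) pairs for one correct trial (hypothetical helper).

    Saccade latency and duration are collapsed into one assumed number.
    """
    t_cue = FIX1
    t_go = t_cue + CUE + DELAY          # fixation spot off
    t_target = t_go + saccade_latency   # target reappears at end of saccade
    t_rw1 = t_target + RW1_LAG
    first = "water ~0.1 ml" if trial.condition == "IM-Rw" else "valve click only"
    t_rw2 = t_rw1 + F2
    return [
        (0.0, "central fixation"),
        (t_cue, f"cue on at {trial.direction} deg"),
        (t_cue + CUE, "cue off, delay"),
        (t_go, "fixation spot off: memory-guided saccade"),
        (t_target, "target on: second fixation"),
        (t_rw1, first),
        (t_rw2, "water ~0.1 ml (both conditions)"),
        (t_rw2 + FIX3, "trial end"),
    ]


def f2_window(trial, saccade_latency=0.0):
    """F2-period as (start, end): from the water/click event to the second water."""
    events = schedule(trial, saccade_latency)
    return events[5][0], events[6][0]


def make_block(n_trials, seed=None):
    """Randomly intermix conditions and directions (assumed: independent uniform draws)."""
    rng = random.Random(seed)
    return [Trial(rng.choice(CONDITIONS), rng.choice(DIRECTIONS)) for _ in range(n_trials)]
```

For example, `f2_window(Trial("DL-Rw", 300))` returns `(5.5, 7.5)`, i.e. 500 to 2500 ms after target onset. This matches the F2-period window analysed in the Results.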

Figure 1. Behavioural task and recording sites. (a) Temporal sequence of the oculomotor delayed-response task with two reward conditions. In the IM-Rw condition, but not in the DL-Rw condition, the reward was delivered 500 ms after the correct response (memory-guided saccade). In the DL-Rw condition, only the click tone of the solenoid valve used for reward delivery was presented. This event (reward or click) was followed by the second fixation period (2 s), which ended with reward delivery in both conditions. (b) A central fixation spot and six peripheral cue/target locations. (c) Schematic drawings of the recording sites. The recorded sites (shaded area) were located rostral to the frontal eye field. AS, arcuate sulcus; FP, fixation point; PS, principal sulcus.

48 Prefrontal Neurons Representing Response–Outcome • Tsujimoto and Sawaguchi

Recording Procedures

After training was completed, surgery for recording was performed. Under pentobarbital sodium anaesthesia (∼25 mg/kg, i.v.) and aseptic conditions, an oval opening was made in the skull to expose the dura over the frontal cortex, and a stainless steel cylinder (20 × 40 mm) was implanted with dental acrylic. Prophylactic antibiotics were injected i.m. on the day of surgery and daily for 1 week thereafter. The activity of single neurons was recorded with custom-made glass-insulated Elgiloy microelectrodes (0.3–1.8 MΩ), using conventional electrophysiological techniques similar to those described in our previous studies (Sawaguchi and Yamane, 1999; Iba and Sawaguchi, 2002). The microelectrode was positioned using a pulse motor-driven micromanipulator (MO-81; Narishige, Tokyo) and a plastic grid attached to the cylinder, with numerous small holes (0.7 mm internal diameter) spaced 1.5 mm apart. We did not pre-screen neurons for task-related responses. Rather, we advanced the electrode until the activity of one or more neurons was well isolated and then began data recording. Neuronal activity was digitized with a window discriminator (DDIS-1; BAK Electronics, Germantown, MD), and task events and eye positions were digitized with an A/D converter. These digitized data were stored in a data-collection computer and analysed off-line. In addition, all analogue data (i.e. neuronal activity, eye positions and task sequence) were recorded on digital audio tape (DAT) using an eight-channel DAT recorder (PC-208 M; Sony, Tokyo, Japan). We focused on neurons in the dlPFC rostral to the frontal eye field (FEF) (Fig. 1c). To identify the FEF physiologically, we applied intracortical microstimulation (ICMS; 22 cathodal pulses of 0.3 ms duration at 333 Hz, up to 100 µA) through the recording electrodes.
When eye movements were elicited by ICMS, the site was considered to lie within the FEF (Bruce et al., 1985), and data recorded from such sites were excluded from this study.

Data Analysis

Since we were interested in neuronal processes associated with response–outcome, we focused on F2-period activity. We applied a two-factor analysis of variance (ANOVA) to examine the effects of saccade direction and its outcome (reward/non-reward) on activity during the F2-period. The study focused on neurons with significant main effects of both factors (P < 0.05), because their activity is influenced by both the preceding response and its outcome. To examine the spatial tuning of F2-period activity, we fitted the following Gaussian function:

$$f(d) = B + \sum_{i=1}^{3} \frac{A}{\sqrt{2\pi}\,T_d} \exp\left\{-0.5\left[\frac{d - D + (i-2)T}{T_d}\right]^{2}\right\}$$

In this function:
- f(d) is the mean discharge rate as a function of saccade direction d.
- B is the ‘baseline’ discharge rate during the precue ‘control’ period (1 s before cue presentation).
- D is the best direction.
- T is 360°.
- A is the response strength.
- Td is an index of tuning width; a smaller Td indicates sharper tuning.

The three terms (i = 1, 2, 3) centre copies of the Gaussian at D + T, D and D − T, so that the fitted curve wraps around the circle of directions.
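The tuning function and one way to fit it can be sketched as below. The paper does not say how the fit was computed, so the fitting strategy here is a swapped-in assumption:
- B is held fixed at a separately measured baseline.
- A is solved by linear least squares for each candidate pair (D, Td).
- D and Td are chosen by brute-force grid search, in 1° steps with Td limited to 10–180°.

`tuning` and `fit_tuning` are hypothetical names.

```python
import numpy as np

T = 360.0  # period of the direction axis (deg)


def tuning(d, B, A, D, Td):
    """Wrapped Gaussian f(d) = B + sum_{i=1..3} A/(sqrt(2*pi)*Td) * exp(-0.5*((d - D + (i-2)*T)/Td)**2)."""
    d = np.asarray(d, dtype=float)
    # i = 1, 2, 3 shifts the Gaussian by -T, 0, +T so that it wraps around the circle
    wrapped = sum(np.exp(-0.5 * ((d - D + (i - 2) * T) / Td) ** 2) for i in (1, 2, 3))
    return B + A / (np.sqrt(2.0 * np.pi) * Td) * wrapped


def fit_tuning(directions, rates, B):
    """Fit A, D and Td with B fixed; returns the (A, D, Td) grid point with least squared error."""
    d = np.asarray(directions, dtype=float)[None, None, :]
    y = np.asarray(rates, dtype=float) - B               # response above baseline
    D = np.arange(0.0, 360.0, 1.0)[:, None, None]        # candidate best directions
    Td = np.arange(10.0, 181.0, 1.0)[None, :, None]      # candidate tuning widths
    g = tuning(d, 0.0, 1.0, D, Td)                       # unit-amplitude curves, shape (nD, nTd, n_dir)
    A = (g * y).sum(axis=-1) / (g * g).sum(axis=-1)      # closed-form least-squares amplitude
    sse = ((y - A[..., None] * g) ** 2).sum(axis=-1)     # residual error per candidate
    iD, iT = np.unravel_index(np.argmin(sse), sse.shape)
    return float(A[iD, iT]), float(D[iD, 0, 0]), float(Td[0, iT, 0])
```

A grid search is used only because it is short and deterministic and keeps D inside [0°, 360°). A gradient-based least-squares routine started near the grid optimum would refine D and Td below the 1° grid resolution.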

Results

Neuronal Database

We recorded the activity of a total of 735 neurons in the dlPFC while the monkeys performed the ODR task under the two conditions. We examined activity during the post-response F2-period (500–2500 ms after target onset). In total, 309 (42%) neurons (‘F2 neurons’) showed a significant main effect of at least one of the two factors, as revealed by two-way ANOVA (direction × reward condition, P < 0.05). Of these 309 F2 neurons, 156 (51%) showed significantly different activity between the reward conditions. Most of them (104/156, 67%) also showed significant differences according to the direction of the preceding response.

We focused here on these 104 neurons, whose post-response activity was affected by both response direction and reward contingency, since they are likely to code response–outcome based on the nature of the response made immediately before. This sample contained two distinct types:
- ‘Rw– type’ (n = 41), which showed significantly higher F2-period activity in the DL-Rw condition than in the IM-Rw condition.
- ‘Rw+ type’ (n = 37), which showed significantly higher F2-period activity in the IM-Rw condition.

The F2-period activity of the remaining 26 neurons (‘complex type’) was less straightforward, showing a significant interaction between direction and reward condition in addition to the main effects.

Most Rw– and Rw+ neurons did not show any clear change in activity before the onset of the response. Only 20% (8/41) of Rw– neurons showed directional cue and/or delay-period activity (one-way ANOVA, P < 0.05; n = 3 for cue only, n = 2 for delay only and n = 3 for both cue and delay). Likewise, only 19% (7/37) of Rw+ neurons did so (n = 2 for cue only, n = 3 for delay only and n = 2 for both cue and delay). The activity preceding the response was relatively weak and was not apparent in the population activity of either Rw– or Rw+ neurons, as is evident in the population histograms (see below).

F2-period Activity of Representative Neurons

An example of an Rw– neuron is shown in Figure 2a. In this figure, raster displays and averaged histograms of neuronal activity are illustrated separately for the two conditions and the six directions. This neuron showed an increase in activity during the F2-period in the DL-Rw condition, particularly in lower right (300°) trials. In contrast, it showed little change in activity during the F2-period in the IM-Rw condition. The two-factor ANOVA confirmed significant main effects of both direction and reward condition (P < 0.01 for both factors).
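The two-factor ANOVA and the classification into Rw–, Rw+ and complex neurons can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis code:
- It assumes a balanced design, with the same number of trials in every direction × condition cell.
- It uses SciPy's F distribution for P values.
- It reads the text as defining Rw– and Rw+ neurons as those with both main effects significant and no significant interaction; ‘complex’ neurons additionally show a significant interaction.

The function names and the ASCII labels "Rw-"/"Rw+" are hypothetical.

```python
import numpy as np
from scipy.stats import f as f_dist

CONDITIONS = ("IM-Rw", "DL-Rw")  # order of axis 1 of the data array


def two_way_anova(x):
    """Balanced two-factor ANOVA with interaction.

    x: F2-period rates, shape (n_directions, n_conditions, n_trials_per_cell).
    Returns {effect: (F, P)} for 'direction', 'condition' and 'interaction'.
    """
    x = np.asarray(x, dtype=float)
    a, b, n = x.shape
    grand = x.mean()
    cell = x.mean(axis=2)          # cell means, shape (a, b)
    m_dir = cell.mean(axis=1)      # direction marginal means
    m_cond = cell.mean(axis=0)     # condition marginal means
    ss = {
        "direction": b * n * np.sum((m_dir - grand) ** 2),
        "condition": a * n * np.sum((m_cond - grand) ** 2),
        "interaction": n * np.sum((cell - m_dir[:, None] - m_cond[None, :] + grand) ** 2),
    }
    df = {"direction": a - 1, "condition": b - 1, "interaction": (a - 1) * (b - 1)}
    df_err = a * b * (n - 1)
    ms_err = np.sum((x - cell[..., None]) ** 2) / df_err  # within-cell mean square
    out = {}
    for effect in ss:
        F = (ss[effect] / df[effect]) / ms_err
        out[effect] = (float(F), float(f_dist.sf(F, df[effect], df_err)))
    return out


def classify(x, alpha=0.05):
    """Return 'Rw-', 'Rw+', 'complex' or None (no significant effect of both factors)."""
    res = two_way_anova(x)
    if not (res["direction"][1] < alpha and res["condition"][1] < alpha):
        return None
    if res["interaction"][1] < alpha:
        return "complex"
    x = np.asarray(x, dtype=float)
    im, dl = x[:, 0].mean(), x[:, 1].mean()
    return "Rw-" if dl > im else "Rw+"
```

For unbalanced data (unequal trial counts per cell), the sums of squares above are no longer valid. A regression-based ANOVA would be needed instead.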
Figure 2b illustrates tuning curves of F2-period activity for the neuron shown in Figure 2a, estimated using the Gaussian function. F2-period activity in the DL-Rw condition showed clear spatial tuning (Td = 51.7°, best direction = 277°), which was not evident in the IM-Rw condition.

Figure 3a shows an example of an Rw+ neuron. This neuron showed an increase in activity during the F2-period in the IM-Rw condition, particularly in 60° and 120° trials, with clear spatial tuning (Fig. 3b; Td = 61.5°, best direction = 102°). In contrast, it showed neither directional F2-period activity nor spatial tuning in the DL-Rw condition. The ANOVA revealed significant main effects of both factors (direction and reward condition, P < 0.01 for both factors).

Figure 4 shows two examples of ‘complex’ neurons, whose F2-period activity showed a significant interaction in addition to the main effects of reward and direction. In this figure, the mean discharge rate during the F2-period for each reward condition is plotted separately for the six directions, with a Gaussian fitting curve applied to each data set. The neuron in Figure 4a shows directional F2-period activity, especially in the DL-Rw condition, with a best direction of 354°. In the IM-Rw condition, it showed spatial tuning for almost the opposite direction (best direction = 176°), although the tuning was less clear. The neuron shown in Figure 4b showed a similar pattern of interaction between direction and reward condition, although the magnitude of F2-period activity was larger and spatial tuning was sharper in the IM-Rw condition (best direction = 275° for DL-Rw and 61° for IM-Rw conditions). Although these complex neurons are likely to code response–outcome, we excluded them from further analyses because their relatively small number (n = 26) and complex activity precluded meaningful analysis.

Figure 2. Activity of an Rw– neuron. (a) Raster displays and averaged histograms for the two conditions, aligned on the onset of the target for the second fixation (Tg), the reward or click sound (Rw1/clk) and the reward delivered after the F2-period (Rw2). The activity associated with each of the six directions is illustrated separately. (b) Tuning curves of the F2-period activity shown in (a), for each reward condition. Plots show the mean discharge rate during the F2-period for the six saccade directions, with a Gaussian fitting curve applied to the data. Error bars show SD. (c) Averaged histograms of the activity of the same neuron as in (a) for trials in the preferred direction (i.e. 300°) in the DL-Rw condition, calculated separately according to the condition of the immediately preceding trial. Solid and dotted lines show F2-period activity when the immediately preceding trial was in the DL-Rw and IM-Rw condition, respectively.

Further Examinations of the Properties of F2-period Activity

To examine the properties of Rw– and Rw+ neurons further, we constructed a population histogram for each type (Rw– and Rw+, n = 41 and 37, respectively; Fig. 5). For each neuron, we summed activity for the direction with the greatest differential activity between the two conditions. At the population level, the two types of neurons showed distinct F2-period activity according to the reward condition, although the magnitude and time course were similar between the two types. Further, at the population level, neither type showed any clear activity change preceding the response. This indicates that these neurons form distinct groups, different from the neurons showing activity in other periods such as the cue and delay, as described above.
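The construction of the population histograms can be sketched as below. The paper does not say how the direction with the ‘greatest differential activity’ was measured. This sketch assumes it is the direction with the largest absolute difference in mean F2-period rate between the two conditions, and that each neuron's histograms are already binned on a common time axis. `population_histogram` and the condition indices `IM`/`DL` are hypothetical. Rw– and Rw+ neurons would be passed in separately.

```python
import numpy as np

IM, DL = 0, 1  # assumed index of each reward condition on the condition axis


def population_histogram(psth, f2_bins):
    """Sum each neuron's histograms at its most reward-differential direction.

    psth: array (n_neurons, n_directions, 2, n_bins) of firing rate per time bin.
    f2_bins: slice selecting the bins that fall in the F2-period.
    Returns (population, best_dir); population has shape (2, n_bins), one row per condition.
    """
    psth = np.asarray(psth, dtype=float)
    f2_rate = psth[..., f2_bins].mean(axis=-1)           # (n_neurons, n_directions, 2)
    diff = np.abs(f2_rate[..., DL] - f2_rate[..., IM])   # (n_neurons, n_directions)
    best_dir = diff.argmax(axis=1)                       # one direction per neuron
    chosen = psth[np.arange(psth.shape[0]), best_dir]    # (n_neurons, 2, n_bins)
    return chosen.sum(axis=0), best_dir
```

The sum follows the text (‘we summed neuronal activity’). Dividing the result by the number of neurons gives a mean-rate histogram instead.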


We next compared the spatial tuning of Rw– neurons in the DL-Rw condition with that of Rw+ neurons in the IM-Rw condition. For each neuron, we calculated the tuning-width index (Td) and the best direction using the Gaussian function. Td values were similarly distributed for the two types (mean ± SD, 76.59 ± 18.76° for Rw– neurons in the DL-Rw condition versus 81.89 ± 20.68° for Rw+ neurons in the IM-Rw condition; P > 0.05, Mann–Whitney U-test). The best directions were also similarly distributed between Rw– and Rw+ neurons (mean ± SD, 188 ± 107° versus 163 ± 109°, respectively; P > 0.05, Mann–Whitney U-test) and showed no bias in the visual field. Thus, the two groups of neurons had similar spatial tuning in terms of both width and direction.

Given the nature of the present behavioural paradigm, the F2-period activity of Rw– and Rw+ neurons might have been influenced by the reward condition of previous trials. To examine this possibility, we calculated the mean discharge rate separately according to the condition of the immediately preceding trial. Figures 2c and 3c show the results of this analysis for the neurons shown in Figures 2a and 3a, respectively. These figures show the recalculated F2-period activity for the direction and condition with maximum activity, separately for each condition of the previous trial. This was 300° in the DL-Rw condition for Figure 2 and 60° in the IM-Rw condition for Figure 3. As shown in Figures 2c and 3c, neither the Rw– nor the Rw+ neuron showed any clear difference in activity between the reward conditions of the immediately preceding trials. Similar results were obtained for all Rw– and Rw+ neurons. Thus, the activity of Rw– and Rw+ neurons did not appear to be influenced by the history of reward delivery in previous trials.

Figure 3. Activity of an Rw+ neuron. The format and abbreviations are as in Figure 2. In (c), 60° trials in the IM-Rw condition are shown.

F2-period Activity in the FIX Task

Although the F2-period activity of Rw– and Rw+ neurons showed clear spatial tuning, the directional selectivity might have been associated with the eye position on a specific peripheral target rather than with the direction of the saccade made immediately before. To test this possibility, some Rw– and Rw+ neurons were also examined in a fixation (FIX) task under two reward conditions similar to those of the ODR task (see Materials and Methods). In this task, the monkey was required only to fixate on a target that appeared at one of the six locations. After fixation on the target for 2 s, the reward was delivered in the IM-Rw condition only. An additional 2 s fixation (‘F2-period’) was followed by reward in both the DL-Rw and IM-Rw conditions.

Figure 6 illustrates the F2-period activity of two representative neurons examined in both tasks. For each neuron, the direction with the most differential activity between the two reward conditions in the ODR task is shown separately for the four conditions. The neuron in Figure 6a showed higher F2-period activity, especially in lower left (240°) trials (the illustrated direction), in the DL-Rw condition of the ODR task only (i.e. Rw– type). However, no such difference appeared in the FIX task, even though the eye position during the F2-period was the same. Further, this neuron showed spatial tuning in the DL-Rw condition of the ODR task only (Fig. 6c). Similarly, the Rw+ neuron in Figure 6b showed differential activity in the IM-Rw condition of the ODR task only. It had spatial tuning in only one of the four conditions, namely the IM-Rw condition of the ODR task (Fig. 6d).

Of the 11 neurons examined in all four conditions of the two tasks (DL-Rw and IM-Rw conditions of the ODR and FIX tasks), none showed significantly different F2-period activity between the reward conditions in the FIX task. To clarify this point, we constructed population histograms of the Rw– and Rw+ neurons that were examined in both the ODR and FIX tasks (n = 7 for Rw–, n = 4 for Rw+; Fig. 7). As expected, the neuronal population showed differential F2-period activity between reward conditions in the ODR task only (Fig. 7a and 7b for Rw– and Rw+ neurons, respectively). No significant difference in activity was observed in the FIX task (Fig. 7c and 7d for Rw– and Rw+ neurons, respectively).

Figure 4. Tuning curves of the F2-period activity of two ‘complex’ neurons, both of which show an interaction between direction and reward condition. Plots show the mean discharge rate during the F2-period for the six saccade directions, with a Gaussian fitting curve applied to the data.

Figure 5. Population histograms of (a) Rw– (n = 41) and (b) Rw+ (n = 37) neurons, aligned at the cue period (C), delay (D), target onset (Tg), first reward (Rw1)/click tone (clk) and second reward (Rw2). Solid and dotted lines show the activity in the DL-Rw and IM-Rw conditions, respectively. Error bars show SD.
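The previous-trial control described in this section (Figs 2c and 3c) amounts to splitting the trials of one direction and condition by the condition of the trial before. The following is a minimal sketch with these assumptions:
- Trials are listed in recording order.
- ‘Immediately preceding trial’ means the preceding completed trial, with aborted trials removed beforehand.
- No statistical test is implied, because the paper does not name one.

The function name is hypothetical.

```python
def split_by_previous_condition(conditions, directions, f2_rates, target_dir, target_cond):
    """Mean F2-period rate on (target_dir, target_cond) trials, grouped by the previous trial's condition.

    Returns {previous_condition: (mean_rate, n_trials)}; the mean is NaN for an empty group.
    """
    groups = {"IM-Rw": [], "DL-Rw": []}
    for k in range(1, len(conditions)):  # the first trial has no predecessor
        if directions[k] == target_dir and conditions[k] == target_cond:
            groups[conditions[k - 1]].append(f2_rates[k])
    return {
        prev: (sum(rates) / len(rates) if rates else float("nan"), len(rates))
        for prev, rates in groups.items()
    }
```

For the neuron of Figure 2, the call would use `target_dir=300` and `target_cond="DL-Rw"`. The two returned means then correspond to the two traces in Figure 2c.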

Discussion

The present study examined neuronal activity in the dlPFC of monkeys performing a variant of the ODR task under two reward conditions. We found that a subset of neurons showed differential post-response (‘F2-period’) activity depending on whether the response had been rewarded. The magnitude of this activity differed with the direction of the saccade made immediately before. The differences between directions did not appear to be associated with the position of gaze. Rather, they were associated with the behavioural response (i.e. the directional memory-guided saccade) made immediately before.

The difference in activity between the reward and non-reward conditions cannot be ascribed to any aspect of the saccadic response or to sensory events, for two reasons. First, the subject could not anticipate reward delivery/non-delivery until the end of the saccade. Second, the visual and auditory events (including the click sound of the reward valve) were exactly the same in both conditions. Further, the directional difference in activity cannot be explained by factors such as licking movements or gustatory stimuli, since these factors should be the same for all directions within each condition. In addition, the difference in activity between the two conditions did not seem to reflect differences in the predictability of reward. In our paradigm, the DL-Rw and IM-Rw conditions were randomly intermixed, and the same amount of reward was delivered in both conditions after the F2-period. Thus, our findings suggest that a subset of neurons in the dlPFC codes directional spatial response–outcome (i.e. reward/non-reward associated with the directional saccade made immediately before).

Several studies have suggested that single neurons in the PFC show directional delay-period activity that is modulated by predicted reward (Watanabe, 1996; Leon and Shadlen, 1999; Kobayashi et al., 2002). PFC neurons have also been shown to code associations between sensory stimuli and predicted reward (Watanabe, 1990, 1992). These studies have suggested that such reward prediction can contribute to reward-based control/learning of behaviour.
However, reward-based behavioural control also requires associating reward/non-reward with the response made immediately before (Balleine and Dickinson, 1998). We demonstrate here that dlPFC neurons show post-response activity that is modulated by both the direction of the prior response and its reward contingency. This suggests that such neurons encode a retrospective, rather than a prospective/expectational, response–outcome association. A few previous studies using tasks with GO/NO-GO responses reported that dlPFC neurons showed differential reward-related post-trial activity according to the preceding response, i.e. GO/NO-GO (Niki and Watanabe, 1979; Kubota and Komatsu, 1985; Watanabe, 1989). However, systematic analysis of such post-trial activity has been limited, particularly for cognitive spatial behaviour.

In contrast to these previous studies, we adopted a variant of the ODR task, which has been widely used to examine the involvement of the PFC in visuospatial working memory in monkeys (Funahashi et al., 1989; Sawaguchi and Iba, 2001) as well as humans (Sweeney et al., 1996). Studies with the ODR task have suggested that the dlPFC is involved in the cognitive control of spatial behaviour, particularly spatial working-memory-guided behaviour (for reviews, see Funahashi and Kubota, 1994; Goldman-Rakic, 1995; Fuster, 1997). In agreement, several lines of research have reported that the dlPFC is critical for performance of a spatial self-ordered task (Passingham, 1985; Owen et al., 1996; Levy and Goldman-Rakic, 1999). This suggests that the region is involved in monitoring and/or manipulating the outcome of a spatial response. Here, for the first time, we found that distinct dlPFC neuronal groups code the outcome (reward/non-reward) of different spatial responses made immediately before.

The PFC has been suggested to be involved in updating the status used to select a response (Owen et al., 1996; Funahashi and Inoue, 2000; Petrides, 2000), which is essential for context-dependent, flexible control of behaviour. The Rw– and Rw+ neurons described here may subserve such flexible/dynamic control of appropriate spatial behaviour by contributing updating signals. Several lines of research have provided evidence that the dlPFC is involved in learning (Petrides, 1986; Asaad et al., 1998; Fletcher et al., 2001), in which a response is reinforced according to the response–reward contingency and extinguished when it is not rewarded. Rw– and Rw+ neurons have characteristics typically associated with learning processes such as reinforcement and extinction, although our paradigm did not require any learning of outcome information.

Figure 6. Two examples of F2-period activity examined in both the ODR and FIX tasks. (a, b) Raster displays and averaged histograms of an Rw– neuron (a) and an Rw+ neuron (b), aligned on the first reward or click tone (Rw1/clk) and the second reward (Rw2). Only the activity associated with the direction showing the most differential activity between the two reward conditions in the ODR task is shown. (c, d) Tuning curves of the F2-period activity shown in (a) and (b), respectively.

Thus, for the first time, we provide evidence that the dlPFC contains distinct neuronal groups that code the outcome following different spatial responses, which should play a role in associating different (directional) responses with their outcomes. Such coding may be critical for the flexible control and/or learning of appropriate spatial behaviour according to a rule or situation that can change dynamically, which has long been considered a critical role of the PFC (Milner, 1963; Miller and Cohen, 2001; Wallis et al., 2001). However, it remains unclear whether the activation of F2 neurons is actually used for such behavioural control. Our paradigm did not explicitly require the subject to select the next response based on the response–outcome, so this question should be addressed in further studies using more appropriate behavioural paradigms. Nevertheless, our finding of distinct neuronal groups that code response–outcome is significant for understanding the neuronal mechanisms of flexible control/learning of cognitive spatial behaviours mediated by the dlPFC.

Notes

The authors thank Dr K. Amemori for valuable comments on this study. The authors also thank K. Watanabe-Sawaguchi and E. Ishida for their assistance with animal care and surgery. This work was supported by Grants-in-Aid for Scientific Research from the Ministry of Education, Culture, Sports, Science, and Technology. Address correspondence to Dr Toshiyuki Sawaguchi, Laboratory of Cognitive Neurobiology, Hokkaido University Graduate School of Medicine, N15W7, Kita-ku, Sapporo 060-8638, Japan. Email: [email protected].

Cerebral Cortex January 2004, V 14 N 1 53

Figure 7. Population histograms of Rw– (n = 7) and Rw+ (n = 4) neurons, which were examined in both the ODR and FIX tasks, aligned at first reward (Rw1)/click tone (clk) and second reward (Rw2). (a) Rw– neurons in the ODR task. (b) Rw+ neurons in the ODR task. (c) Rw– neurons in the FIX task. (d) Rw+ neurons in the FIX task. Solid and dotted lines show the activities in DL-Rw and IM-Rw conditions, respectively.
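Figure 7 pools single-neuron activity into population histograms aligned to task events. The exact procedure is not given in this section, so the following is only a minimal sketch of one common way to build such a histogram. It assumes per-trial spike times in seconds, a hypothetical 100-ms bin width, a hypothetical window from −1 to +2 s around the event, and equal weighting of neurons; the function name `population_histogram` is also hypothetical.

```python
import numpy as np


def population_histogram(spikes_per_neuron, events_per_neuron,
                         window=(-1.0, 2.0), bin_width=0.1):
    """Event-aligned population histogram (mean firing rate in spikes/s).

    spikes_per_neuron: one entry per neuron; each entry is a list of trials,
        and each trial is a sequence of spike times in seconds.
    events_per_neuron: matching per-trial alignment times in seconds
        (e.g. first reward or click tone). The default window and bin width
        are illustrative assumptions, not the study's parameters.
    """
    n_bins = int(round((window[1] - window[0]) / bin_width))
    edges = np.linspace(window[0], window[1], n_bins + 1)
    neuron_rates = []
    for trials, events in zip(spikes_per_neuron, events_per_neuron):
        counts = np.zeros(n_bins)
        for spikes, t_event in zip(trials, events):
            # Express spike times relative to this trial's alignment event.
            counts += np.histogram(np.asarray(spikes) - t_event, bins=edges)[0]
        # Mean spike count per trial in each bin, converted to spikes/s.
        neuron_rates.append(counts / (len(trials) * bin_width))
    # Average single-neuron rates so that each neuron is weighted equally.
    return edges[:-1], np.mean(neuron_rates, axis=0)
```

To mirror the solid and dotted lines of Figure 7, the function would be called once with DL-Rw trials and once with IM-Rw trials, and each call repeated with Rw2 times as the alignment events for the second panel segment.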

References

Asaad WF, Rainer G, Miller EK (1998) Neural activity in the primate prefrontal cortex during associative learning. Neuron 21:1399–1407.
Balleine BW, Dickinson A (1998) Goal-directed instrumental action: contingency and incentive learning and their cortical substrates. Neuropharmacology 37:407–419.
Boch RA, Goldberg ME (1989) Participation of prefrontal neurons in the preparation of visually guided eye movements in the rhesus monkey. J Neurophysiol 61:1064–1084.
Bruce CJ, Goldberg ME, Bushnell MC, Stanton GB (1985) Primate frontal eye fields. II. Physiological and anatomical correlates of electrically evoked eye movements. J Neurophysiol 53:606–635.
Cohen JD, Braver TS, O’Reilly RC (1996) A computational approach to prefrontal cortex, cognitive control and schizophrenia: recent developments and current challenges. Phil Trans R Soc Lond B 351:1515–1527.
Fletcher PC, Anderson JM, Shanks DR, Honey SR, Carpenter TA, Donovan T, Papadakis N, Bullmore ET (2001) Responses of human frontal cortex to surprising events are predicted by formal associative learning theory. Nat Neurosci 4:1043–1048.
Funahashi S, Inoue M (2000) Neuronal interactions related to working memory processes in the primate prefrontal cortex revealed by cross-correlation analysis. Cereb Cortex 10:535–551.
Funahashi S, Kubota K (1994) Working memory and prefrontal cortex. Neurosci Res 21:1–11.
Funahashi S, Bruce CJ, Goldman-Rakic PS (1989) Mnemonic coding of visual space in the monkey’s dorsolateral prefrontal cortex. J Neurophysiol 61:331–349.
Funahashi S, Bruce CJ, Goldman-Rakic PS (1990) Visuospatial coding in primate prefrontal neurons revealed by oculomotor paradigms. J Neurophysiol 63:814–831.

54 Prefrontal Neurons Representing Response–Outcome • Tsujimoto and Sawaguchi

Funahashi S, Bruce CJ, Goldman-Rakic PS (1991) Neuronal activity related to saccadic eye movements in the monkey’s dorsolateral prefrontal cortex. J Neurophysiol 65:1464–1483.
Fuster JM (1973) Unit activity in prefrontal cortex during delayed-response performance: neuronal correlates of transient memory. J Neurophysiol 36:61–78.
Fuster JM (1997) The prefrontal cortex. Philadelphia, PA: Lippincott-Raven.
Goldman-Rakic PS (1995) Cellular basis of working memory. Neuron 14:477–485.
Iba M, Sawaguchi T (2002) Neuronal activity representing visuospatial mnemonic processes associated with target selection in the monkey dorsolateral prefrontal cortex. Neurosci Res 43:9–22.
Kobayashi S, Lauwereyns J, Koizumi M, Sakagami M, Hikosaka O (2002) Influence of reward expectation on visuospatial processing in macaque lateral prefrontal cortex. J Neurophysiol 87:1488–1498.
Kojima S (1980) Prefrontal unit activity in the monkey: relation to visual stimuli and movements. Exp Neurol 69:110–123.
Kojima S, Goldman-Rakic PS (1982) Delay-related activity of prefrontal neurons in rhesus monkeys performing delayed response. Brain Res 248:43–49.
Kubota K, Komatsu H (1985) Neuron activities of monkey prefrontal cortex during the learning of visual discrimination tasks with GO/NO-GO performances. Neurosci Res 3:106–129.
Kubota K, Iwamoto T, Suzuki H (1974) Visuokinetic activities of primate prefrontal neurons during delayed-response performance. J Neurophysiol 37:1197–1212.
Leon MI, Shadlen MN (1999) Effect of expected reward magnitude on the response of neurons in the dorsolateral prefrontal cortex of the macaque. Neuron 24:415–425.
Levy R, Goldman-Rakic PS (1999) Association of storage and processing functions in the dorsolateral prefrontal cortex of the nonhuman primate. J Neurosci 19:5149–5158.
Miller EK, Cohen JD (2001) An integrative theory of prefrontal cortex function. Annu Rev Neurosci 24:167–202.
Milner B (1963) Effects of different brain lesions on card sorting. Arch Neurol 9:100–110.
Niki H (1974) Differential activity of prefrontal units during right and left delayed response trials. Brain Res 70:346–349.
Niki H, Watanabe M (1979) Prefrontal and cingulate unit activity during timing behavior in the monkey. Brain Res 171:213–224.
Owen AM, Evans AC, Petrides M (1996) Evidence for a two-stage model of spatial working memory processing within the lateral frontal cortex: a positron emission tomography study. Cereb Cortex 6:31–38.
Passingham RE (1985) Memory of monkeys (Macaca mulatta) with lesions in prefrontal cortex. Behav Neurosci 99:3–21.
Petrides M (1986) The effect of periarcuate lesions in the monkey on the performance of symmetrically and asymmetrically reinforced visual and auditory Go, No-Go tasks. J Neurosci 6:2054–2063.
Petrides M (2000) Dissociable roles of mid-dorsolateral prefrontal and anterior inferotemporal cortex in visual working memory. J Neurosci 20:7496–7503.
Pochon JB, Levy R, Fossati P, Lehericy S, Poline JB, Pillon B, Le Bihan D, Dubois B (2002) The neural system that bridges reward and cognition in humans: an fMRI study. Proc Natl Acad Sci USA 99:5669–5674.
Ramnani N, Miall RC (2003) Instructed delay activity in the human prefrontal cortex is modulated by monetary reward expectation. Cereb Cortex 13:318–327.
Rao SC, Rainer G, Miller EK (1998) Integration of what and where in the primate prefrontal cortex. Science 276:821–824.
Sawaguchi T, Yamane I (1999) Properties of delay-period neuronal activity in the monkey dorsolateral prefrontal cortex during a spatial delayed matching-to-sample task. J Neurophysiol 82:2070–2080.
Sawaguchi T, Iba M (2001) Prefrontal cortical representation of visuospatial working memory in monkeys examined by local inactivation with muscimol. J Neurophysiol 86:2041–2053.
Schultz W (2001) Multiple reward signals in the brain. Nat Rev Neurosci 1:199–207.

Sweeney JA, Mintun MA, Kwee S, Wiseman MB, Brown DL, Rosenberg DR, Carl JR (1996) Positron emission tomography study of voluntary saccadic eye movements and spatial working memory. J Neurophysiol 75:454–468.
Takeda K, Funahashi S (2002) Prefrontal task-related activity representing visual cue location or saccade direction in spatial working memory tasks. J Neurophysiol 87:567–588.
Thut G, Schultz W, Roelcke U, Nienhusmeier M, Missimer J, Maguire RP, Leenders KL (1997) Activation of the human brain by monetary reward. Neuroreport 8:1225–1228.
Wallis JD, Anderson KC, Miller EK (2001) Single neurons in prefrontal cortex encode abstract rules. Nature 411:953–956.

Watanabe M (1989) The appropriateness of behavioral responses coded in post-trial activity of primate prefrontal units. Neurosci Lett 101:113–117.
Watanabe M (1990) Prefrontal unit activity during associative learning in the monkey. Exp Brain Res 80:296–309.
Watanabe M (1992) Frontal units of the monkey coding the associative significance of visual and auditory stimuli. Exp Brain Res 89:233–247.
Watanabe M (1996) Reward expectancy in primate prefrontal neurons. Nature 382:629–632.
Watanabe M, Hikosaka K, Sakagami M, Shirakawa S (2002) Coding and monitoring of motivational context in the primate prefrontal cortex. J Neurosci 22:2391–2400.

