ternative sources of point deliveries were su- perimposed). An unsignaled changeover de- lay prevented points from being delivered for. 5 s following a change ...
2003, 79, 193–206
JOURNAL OF THE EXPERIMENTAL ANALYSIS OF BEHAVIOR
NUMBER
2 (MARCH)
EFFECTS OF ALTERNATIVE REINFORCEMENT ON HUMAN BEHAVIOR: THE SOURCE DOES MATTER G REGORY J. M ADDEN UNIVERSIT Y OF WISCONSIN–EAU CLAIRE
AND
M ICHAEL P ERONE WEST VIRGINIA UNIVERSIT Y
Competing theories regarding the effects of delivering periodic response-independent reinforcement (more accurately, response-independent points exchanged for money) on a baseline rate of behavior were evaluated in human subjects. Contiguity theory holds that these events decrease target responding because incompatible behavior is adventitiously strengthened when the point deliveries follow target behavior closely in time. Matching theory holds that response-independent points, like any other alternative reinforcer, should reduce target responding. On this view, temporal contiguity between target responding and response-independent point delivery is unimportant. In our experiment, four different responses (moving a joystick in four different directions) were reinforced with points exchangeable for money according to four independent variable-interval schedules. Different schedules of point delivery were then superimposed on these baselines. When all superimposed point deliveries occurred immediately after one of the four responses (the target response), time allocated to target responding increased. When the superimposed point deliveries could be delivered at any time, time allocated to target responding declined and other behavior increased. When superimposed points could never immediately follow target responses, time allocated to target responding decreased further and other behavior or pausing predominated. The findings underscore the contribution of temporal contiguity in the effects of response-independent deliveries of food, money, points, etc. Key words: contiguity theory, matching law, response-independent reinforcement, money, joystick response, humans
The effects of response-independent reinforcement (more accurately, response-independent stimuli such as deliveries of food, water, or some other commodity) on behavior have been of interest since Skinner’s (1948) early investigation of ‘‘superstition’’ in pigeons. When Skinner delivered food at regular intervals, most of the pigeons developed stereotyped response patterns (e.g., rooting in a corner or pecking the floor), instances of which often preceded food deliveries. Skinner explained the development of these patterns as a result of adventitious reinforcement: the food deliveries strengthened This research was supported by a Grant-In-Aid of Research from Sigma Xi and by a grant from the West Virginia University Department of Psychology Alumni Fund. Portions of this paper were presented at the 20th annual meeting of the Association for Behavior Analysis, Atlanta, May 1994. Correspondence concerning this paper and reprint requests should be sent to Gregory J. Madden, Department of Psychology, University of Wisconsin–Eau Claire, Eau Claire, Wisconsin 54702-4004 (e-mail: maddengj@uwec. edu).
whatever behavior happened to precede them. Skinner’s view—that temporal contiguity between response and reinforcer is sufficient to increase the probability of behavior—has not been met with universal acceptance (e.g., Staddon & Simmelhaag, 1971; Timberlake & Lucas, 1985). Several subsequent studies further examined the role of response-reinforcer contiguity on operant response rates (e.g., Edwards, Peek, & Wolfe, 1970; Henton & Iversen, 1978; Imam & Lattal, 1988; Kop & Van Haaren, 1982; Lachter, 1971; Lachter, Cole, & Schoenfeld, 1971; Lattal, 1974; Lattal & Abreu-Rodriguez, 1997; Lattal & Bryan, 1976; Rachlin & Baum, 1972). Baseline responding typically was established with a schedule of responsedependent reinforcement, and then a subsequent schedule of response-independent food delivery was superimposed. Thus, some reinforcers were obtained contingent upon a defined target response (e.g., key pecking) and other food deliveries occurred regardless of what the animal was doing. In general, su-
193
194
MADDEN and PERONE
perimposing a schedule of response-independent food deliveries decreased response rates. Henton and Iversen (1978) suggested that these procedures decrease target response rates because competing ‘‘superstitious’’ nontarget responding increases. In their Experiment 9, a variable-time (VT) 1-min schedule of food delivery was superimposed on a variable-interval (VI) 1-min baseline. In addition to measuring their rats’ lever pressing, Henton and Iversen measured four other responses (e.g., rearing). Consistent with contiguity theory, they reported decreased rates of lever pressing and increased rates of the other behaviors after the latter were immediately followed by response-independent food deliveries (comparable effects have been reported by Alleman & Zeiler, 1974; Lattal, 1972; Vollmer, Ringdahl, Roane, & Marcus, 1997; and Zeiler, 1971). According to Henton and Iversen’s account, as more time was spent making these superstitious responses, less time was allocated to target behavior. In an alternative account, Rachlin and Baum (1972) argued that response-independent food deliveries reduce target behavior because organisms come to discriminate them from response-dependent reinforcers (see also Baum, 1981; Staddon & Simmelhag, 1971). The discrimination derives from ‘‘the molar correlation between responding and reinforcement’’ (Rachlin & Baum, p. 238) and is not easily weakened by the occasional food delivery that immediately follows a target response. On the basis of this molar account, Rachlin and Baum predicted that response-independent food would produce comparable decreases in target behavior regardless of whether response-reinforcer contiguity was left to chance or was prevented from occurring. To test this prediction, Rachlin and Baum (1972) superimposed different configurations of response-independent food delivery on a VI schedule of food reinforcement that maintained keypecking. When a VT 3-min schedule was superimposed, the response-independent food deliveries could, by chance, closely follow keypecking (an event that should, according to contiguity theory, increase response rates above the baseline VI levels). In another condition, a tandem VT 3-min differential reinforcement of other be-
havior (DRO) 2-s schedule was superimposed on the VI baseline. Because response-independent food could not be delivered within 2 s of a key peck in the latter condition, response-independent food deliveries were likely to be delivered while other behavior was underway. According to contiguity theory, these other behaviors should be adventitiously reinforced, thereby leaving less time for target responding and reducing target behavior below levels observed in the VT condition. Molar theory, however, predicts no difference in response rates across the VT and VT DRO conditions because subjects should discriminate between response-dependent and response-independent food deliveries based on the dependency alone. Of the 2 pigeons exposed to both conditions, 1 responded at higher rates in the superimposed VT condition. But between-subject comparisons in 6 other pigeons revealed no consistent difference across conditions, supporting Rachlin and Baum’s contention that organisms can discriminate response-independent from response-dependent food deliveries despite occasional pairings of the former with target responding. Rachlin and Baum’s (1972) results also are consistent with Herrnstein’s (1970) single-alternative version of the matching law: B5
kr 1 . r 1 1 r 2 1 rO
(1)
According to this account, organisms come to discriminate that some reinforcers (r 1) are delivered contingent upon target responding (B), some food deliveries are response-independent (r 2), and some reinforcers are obtained contingent upon behavior other than target responding (r O)—k corresponds to the maximum response rate and is assumed to remain constant. According to Equation 1, target response rate declines as the frequency of response-independent food (r 2) increases relative to response-contingent reinforcement (r 1). Imam and Lattal (1988) noted that between-subject variability might have hampered Rachlin and Baum’s (1972) ability to detect differences across conditions in which response-independent food could or could not be delivered contiguously with responding. Therefore, they systematically replicated
195
ALTERNATIVE REINFORCEMENT Rachlin and Baum’s procedures while conducting more extensive within-subject comparisons. Response rates in their first superimposed VT and tandem VT DRO conditions were comparable in 2 pigeons while 1 pigeon responded at higher rates in the VT condition and the 4th did the opposite. In the second exposure to these conditions, however, all 4 pigeons responded at higher rates in the VT condition, and for 1 pigeon the DRO value was increased from 2 s to 6 s (see Zeiler, 1976, for comparable results with 10-s DRO values). Perhaps important for contiguity theory, Imam and Lattal reported a negative relation between response rate and the time interval between key pecks and response-independent food deliveries. These findings are difficult to reconcile with Equation 1. Burgess and Wearden (1986) proposed a more molecular model of response-independent reinforcement that renders Herrnstein’s (1970) equation consistent with the Imam and Lattal (1988) and other findings: B5
k(r 1 1 pr 2 ) . r 1 1 pr 2 1 (1 2 p)r 2 1 rO
(2)
According to this model, a proportion of response-independent food deliveries (p) function as response-dependent reinforcers; all other parameters are as in Equation 1 (for a similar account of concurrent schedule performance, see Davison & Jenkins, 1985). Rachlin and Baum’s (1972) account must predict that p is equal to zero if response rates are equivalent regardless of whether response-independent reinforcers occasionally or never immediately follow target responding: When p is equal to zero, Equation 2 reduces to Equation 1. Contiguity theory predicts that p equals the proportion of response-independent food delivered in close temporal proximity to target responding. If p were greater than zero, target response rates would be higher when response-independent food occasionally followed target responding than if this never happened. The purposes of the present experiment were (a) to gather further data regarding the role of temporal contiguity in the effects of response-independent stimuli on a baseline of behavior, (b) to examine if human operant behavior is systematically affected by shifting from response-dependent reinforcement to
response-independent schedules, and (c) to assess quantitative predictions of Equations 1 and 2 under these conditions. To this end, we systematically replicated the Imam and Lattal (1988) experiment with human subjects while measuring time spent engaged in target and other behavior. In the first condition, a VI schedule of points exchangeable for money was superimposed on a baseline VI schedule. In subsequent conditions, the superimposed schedule was changed to either a VT or a tandem VT DRO schedule. Thus, as in the studies by Rachlin and Baum (1972) and Imam and Lattal, a target response either immediately preceded the superimposed reinforcers (superimposed VI), sometimes preceded the point deliveries (superimposed VT), or never immediately preceded point deliveries (superimposed tandem VT DRO). Unlike the Rachlin and Baum experiment, we conducted within-subjects comparisons of behavior across these conditions and measured temporal intervals between target responding and response-independent point deliveries. METHOD Participants College students were recruited for voluntary participation by advertisements posted around West Virginia University and in the campus newspaper. Three women were selected from the applicants because they were experimentally naive, needed money, and had no more than an introductory course in psychology. Before giving informed consent, each subject participated in a trial session to illustrate the basic procedure and to ensure that she could move a joystick comfortably. Subjects were 19 (Subject S1), 24 (S2), and 25 (S3) years old. They were paid a base rate of $0.40 for each session plus the monetary reinforcers obtained during the session. Earnings averaged $4.16 per hour. Subjects forfeited any money earned during the session if they withdrew from the study before its completion. Apparatus Each subject sat alone in a quiet room at a table with a response console containing a spring loaded, self-centering joystick; two push buttons; and a 14-in color computer
196
MADDEN and PERONE
monitor positioned on top. The joystick protruded 3 cm through the 12 by 45 cm sloped (458) face of the console and could be deflected with a force of approximately 1 N. A cross-shaped opening in the center of the console face restricted joystick movements to 4 cm in any of four mutually exclusive directions (left, right, up, and down). A black ‘‘reinforcer collection’’ button was mounted on the face of the console 22 cm from the joystick and 3 cm from the right edge. A red button was mounted on top of a small aluminum box and attached to the left side of the console. The buttons closed separate circuits when pressed with a force of approximately 1 N. White noise was delivered through headphones to mask extraneous sounds. A microcomputer in an adjacent room controlled experimental events and recorded data. General Procedure Each subject came to the laboratory 5 days per week for five scheduled sessions per day, with each session lasting 21 minutes (not counting time during the delivery of reinforcers). Brief rest periods separated the sessions. Personal items were not allowed in the room. Subjects participated at nonoverlapping times so they would be less likely to meet each other and, perhaps, discuss the experiment between sessions. These instructions appeared on the monitor before every session: Move the joystick in any of the four directions to produce a white signal that tells you points are available. When the white signal in the middle of the screen is present, you have two seconds to press the black button to earn five cents. You must hold the red button down with your left hand throughout the session. Press and hold down the red button to begin.
Before the first session, the experimenter read these instructions while pointing to the joystick and buttons. Questions were answered by referring to the instructions or by saying ‘‘You will figure that out by working on the task.’’ When the red button was depressed, the session began and the instructions were replaced with a 1 cm by 1 cm green box located in the center of a blue background on the monitor. Four mutually exclusive responses were possible: moving the joystick to the left,
right, up, or down. Moving the joystick 2 cm or more in any direction was counted as a response and caused the green box to extend to an 8 cm by 1 cm rectangle in that direction. The box remained extended until the joystick was returned to within 2 cm of the center position (releasing the joystick automatically returned it to the center position). Moving the joystick less than 2 cm produced no stimulus change and was not recorded as a response. The joystick had to be returned to within 2 cm of the center position before another response could be recorded. When a reinforcement contingency had been fulfilled (see below) by deflecting the joystick at least 2 cm or a response-independent point delivery was available, a flashing white vertical line (2 mm by 1 cm) appeared in the center of the green box. The line continued to flash until the subject pressed the black button on the face of the console, or until 2 s had elapsed. If the button was pressed in time, the screen was blank except for a message indicating, ‘‘Five cents have been added to your total earnings.’’ If the button was not pressed in time, the message was ‘‘Too Late: No Money’’ (a consequence that subjects rarely contacted after the first session and never contacted in stable sessions). In either case, the message was presented for 3 s and then the green box was returned to the screen. Schedule timers and the session clock were suspended during these message-delivery periods. Throughout each session, the subject was required to hold down the red button on the left side of the console. Releasing the button initiated a 5-s timeout during which the schedule timers and session clock were suspended and the screen was black except for the message ‘‘Illegal Button Release.’’ If the red button was not pressed at the end of the 5 s, the timeout was restarted. The requirement to hold down the red button, in conjunction with the limited hold placed on reinforcers, was designed to occupy the subject’s eyes and hands and decrease the probability that subjects would fall asleep or walk away from the apparatus while data were being recorded (see Madden & Perone, 1999). Stability criteria. Except as noted below, conditions continued for at least 15 sessions and until behavior stabilized. Judgments about
197
ALTERNATIVE REINFORCEMENT stability were based on relative time allocated to each response (i.e., time spent moving the joystick in one direction divided by the sum of times spent in all four directions). Time allocation was selected over response allocation because (a) local response rates were approximately the same when subjects were responding (i.e., when not pausing), (b) presenting time data allows pausing to be quantified as a fifth behavior, (c) Equations 1 and 2 may be applied both to response and time allocation, and (d) Baum (1979) has suggested time allocation may provide a better measure of behavior under concurrent schedules of reinforcement. The stability criterion considered the most recent six sessions. For each relative time allocation, we calculated the difference between the mean of the first three sessions and the mean of the last three. The differences for three of the four relative time allocations were required to be less than 0.15. In addition, session-by-session graphs of time allocations had to be free of trend as judged by visual inspection. Experimental Conditions The order of conditions, the baseline and superimposed schedules, and the number of sessions in each condition are presented in Table 1. Baseline sessions. Four independent VI 3-min schedules were programmed, one for each direction the joystick could be moved. The sequence of 14 intervals comprising each schedule was generated using Fleshler and Hoffman’s (1962) procedure. The purpose of these sessions was to identify a response for each subject that was neither preferred nor avoided. The stability criteria were not applied to these sessions, and the data are not presented below. The identified response was used in subsequent conditions as the target response (i.e., the response upon which alternative sources of point deliveries were superimposed). An unsignaled changeover delay prevented points from being delivered for 5 s following a change in responding (i.e., joystick direction change). Changeover delays were initiated at the first response (i.e., deflection of the joystick by at least 2 cm) in a new direction. The changeover delay remained in effect for all subsequent conditions. Superimposed VI. In the superimposed VI
Table 1 Experimental conditions (in order of exposure) and the number of sessions in each. Target Subject response S1
Up
S2
Left
S3
Right
Superimposed schedule None (baseline) VI 15s VT 15s VI 15s tand VT 15s DRO None (baseline) VI 90s VT 90s VI 90s tand VT 90s DRO tand VT 90s DRO tand VT 90s DRO tand VT 90s DRO None (baseline) VI 90s VT 90s VI 90s tand VT 90s DRO tand VT 90s DRO tand VT 90s DRO
Sessions
10s
2s 5s 10s 20s
5s 10s 20s
4 15 15 15 21 7 26 25 39 32 17 26 24 10 22 22 42 8 33 30
Note. For S1, independent VI 5-min schedules were associated with the four joystick responses. For S2 and S3, VI 3-min schedules were used.
condition, a VI 90-s schedule was superimposed on the VI 3-min schedule correlated with the target response; the remaining three schedules were unchanged. Thus, a single target response made after either VI timer elapsed was followed by the flashing line signaling that a reinforcer could be collected (holding the joystick in one direction was not reinforced by either the baseline or superimposed VI schedule). In the event that both timers had elapsed, the baseline schedule was given priority and the superimposed reinforcer was available following the next target response. Subjects were given no instructions or other cues that a target response had been identified following the baseline sessions or that the reinforcement contingencies had changed. Because the time Subject S1 allocated to target responding was no higher than her time allocated to other responses under these conditions, the baseline schedule value was changed from VI 3 min to VI 5 min and the superimposed schedule value was changed from VI 90 s to VI 15 s (reinforcers were adjusted from 5 cents to 2.5 cents to keep her session earnings approximately constant).
198
MADDEN and PERONE
Data collected before reaching this configuration of schedules are not reported below. Superimposed VT. In the second experimental condition, the superimposed schedule was changed from a VI to a VT schedule of the same value. Points delivered by the VT schedule could be delivered at any time except during the changeover delay or if they became available while the joystick was already being held in one of the four different compass directions. Thus, VT points could be delivered while the joystick was centered, as the joystick was being deflected, at the moment the joystick was deflected by 2 cm or more, and as the joystick was being returned to the center position (but not if it was held in a deflected position when the VT timer elapsed). This was designed to mirror conditions in pigeon experiments in which individual instances of key pecks occupy little time and so VT food deliveries tend to occur at moments other than when the microswitch on the response key is already closed. Tandem VT DRO. Following a reversal to the superimposed VI condition, the superimposed schedule was changed to a tandem VT DRO schedule (the VT value was the same as that used in previous conditions). Under this contingency, superimposed VT point deliveries could not be delivered if the target response had occurred during the interval specified by the DRO but instead were held until this interval elapsed without a target response (see Lattal & Boyer, 1980). For Subject S1, the DRO value was set at 10 s, while it ranged from 2 s to 20 s for S2 and between 5 s and 20 s for S3. These values are comparable to those employed in Zeiler’s (1976) investigation of the effects of response-independent food delivery on pigeons’ responding maintained on fixed-interval schedules. Because of a scheduling error, S3 completed only eight sessions in the tandem VT DRO 5-s condition. Because relative time allocations met our quantitative stability criteria after these sessions, these data are presented below. Statistical methods. Statistical comparisons were made between times spent making target and nontarget responses across the VI, VT, and tandem VT DRO conditions. MannWhitney U tests were conducted for each subject, comparing the stable sessions from each condition. This test was chosen because sev-
eral of the distributions of difference scores between conditions were nonnormal. RESULTS Figure 1 shows the average amount of time per session allocated to the target response (filled bars), the average of the three nontarget responses (open bars), and pausing (hatched bars). The figure presents means based on the stable sessions of each condition; error bars show standard deviations. Time allocated to the four individual joystick responses and pausing is given in Table 2. The time values were obtained by summing the intervals between the first response (i.e., $ 2 cm deflection of the joystick) in one direction and the first response in a different direction (excluding time spent in the reinforcement cycle). Pausing was defined as an interresponse time of 5 s or more. For example, if 8 s elapsed between two successive upward movements of the joystick (and no other joystick movements occurred in this interval), this 8 s was added to the sum of the time spent pausing and subtracted from the time allocated to the upward response. As in typical animal experiments, no data were collected on the duration of individual responses (i.e., the time from which the joystick was deflected by at least 2 cm to the time at which it was returned to within 2 cm of the center position). Thus, if a subject held the joystick in a deflected position for more than 5 s (analogous to a rat holding down a lever), this could not be discerned from the data and so this time was added to the time spent pausing (casual observations of subjects revealed no instances of holding the joystick in this way and, as noted above, neither response-dependent nor response-independent point deliveries could occur while the joystick was held in a deflected position). Relatively more time was spent on the target response than the average nontarget response when the superimposed VI schedule was in effect; that is, when all superimposed reinforcers occurred contingent upon and immediately following target responding. Subjects spent approximately one third of the session making the target response and pausing was rarely observed. Subject S1 showed the most extreme pattern of time allocation during this condition (particularly in the sec-
ALTERNATIVE REINFORCEMENT
199
Fig. 1. Mean time allocated to the target joystick response, the average of the three remaining nontarget joystick directions, and pausing in each condition. Error bars show standard deviations.
200
MADDEN and PERONE Table 2 Time (in seconds) allocated to each of the four joystick responses and pausing (IRT . 5s). Results are means of the six stable sessions in each condition. Also shown is the average number of changeover responses per minute from the stable sessions. Standard deviations are shown in parentheses.
Subject S1 (Up)
Superimposed schedule VI 15s VT 15s VI 15s VT 15s DRO 10s
S2 (left)
VI 90s VT 90s VI 90s VT 90s DRO 2s VT 90s DRO 5s VT 90s DRO 10s VT 90s DRO 20s
S3 (Right)
VI 90s VT 90s VI 90s VT 90s DRO 5s VT 90s DRO 10s VT 90s DRO 20s
Left
Right
Up
Down
Pause
Changeover rate
253.3 (22.9) 261.3 (22.5) 244.5 (46.6) 221.7 (54.7) 384.2 (23.7) 317.9 (11.3) 377.7 (10.6) 292.4 (28.3) 283.2 (23.6) 251.9 (51.8) 241.8 (12.3) 262.9 (18.4) 278.5 (27.7) 291.5 (10.0) 146.3 (39.2) 211.1 (14.9) 124.7 (15.4)
325.6 (17.2) 365.1 (42.6) 223.3 (13.1) 340.7 (78.1) 291.5 (20.3) 380.8 (10.9) 328.9 (13.6) 276.7 (34.1) 304.2 (29.0) 280.5 (60.3) 277.5 (32.2) 398.7 (14.8) 333.1 (20.6) 410.0 (16.7) 207.4 (57.2) 215.7 (22.8) 177.8 (12.8)
433.3 (33.8) 216.7 (26.9) 549.5 (25.3) 70.65 (20.6) 284.7 (7.1) 302.5 (29.3) 266.3 (19.6) 254.9 (47.9) 315.8 (17.7) 313.4 (44.4) 293.3 (48.0) 299.8 (18.0) 316.5 (32.3) 281.2 (10.6) 146.7 (45.3) 143.7 (28.5) 98.0 (13.9)
239.2 (18.9) 315.3 (42.0) 234.0 (25.9) 400.3 (85.0) 286.9 (15.0) 317.0 (20.0) 250.9 (17.5) 264.4 (36.2) 308.9 (26.2) 294.1 (50.2) 298.1 (39.3) 296.8 (17.4) 315.5 (9.0) 272.4 (12.2) 146.2 (43.3) 155.5 (17.1) 154.5 (18.1)
6.1 (8.1) 98.2 (52.6) 4.1 (4.1) 197.9 (94.2) 8.6 (6.8) 3.1 (3.1) 34.7 (19.4) 167.2 (102.7) 44.2 (19.7) 117.9 (145.5) 162.3 (57.9) 1.0 (1.7) 11.6 (15.3) 2.9 (4.8) 610.4 (165.4) 527.0 (28.7) 698.1 (44.5)
3.2 (0.2) 0.9 (0.2) 3.2 (0.5) 0.7 (0.1) 5.6 (0.4) 3.4 (0.4) 4.5 (0.4) 3.29 (0.3) 4.0 (0.2) 3.7 (0.3) 3.8 (0.3) 8.0 (0.9) 9.3 (0.8) 7.9 (0.6) 5.9 (0.7) 9.5 (4.1) 4.6 (0.4)
ond exposure), perhaps because of the large difference separating baseline and superimposed schedule values. When the superimposed schedule was changed from VI to VT, all subjects decreased time spent on the target response (S1: U 5 0, p 5 .004; S2: U 5 2, p 5 .02; S3: U 5 0, p 5 .004) while time spent making the average nontarget response increased (S1: U 5 0, p 5 .004; S2: U 5 1, p 5 .008; S3: U 5 1, p 5 .008). There were no tie scores in these or any subsequent comparisons. Time spent pausing increased in this condition for S1 alone. These trends were reversed in the next condition when superimposed reinforcers were again delivered according to a VI sched-
ule (target response times increased, U 5 0, p 5 .004 for all subjects; while nontarget response times decreased, U 5 0, p 5 .004 for all subjects). For S1, target response times in the tandem VT DRO 10-s condition decreased below levels maintained in the average of the two VI conditions (U 5 0, p , .004). The same was true for S2 and S3 for all DRO values explored (the least significant difference was in S2’s DRO 2-s condition: U 5 0, p 5 .004). For S1, the reduction in time allocated to target responding was accompanied by an increase in time allocated to the average nontarget response (U 5 5, p 5 .04) and an increase in pausing (U 5 0, p 5 .004). Reductions in S2’s
ALTERNATIVE REINFORCEMENT
201
Fig. 2. Frequency distribution of VT point deliveries obtained following a range of temporal intervals since the last target response (left column) and the last nontarget response (right column). Data are expressed as the proportion of all points arranged by the VT schedule that were obtained in the stable sessions of the superimposed VT condition.
target response times in the tandem VT DRO conditions were accompanied by significant increases in the average nontarget time in the tandem VT DRO 5-s condition (U 5 0, p 5 .004) and increases in pausing in the other tandem VT DRO conditions (all conditions: U 5 0, p 5 .004). For S3, pausing dramatically increased with the introduction of the DRO contingency (all conditions: U 5 0, p 5 .004), while time spent making target and nontarget responses decreased (in all cases: U 5 0, p 5 .004). Target response times in S1’s tandem VT DRO 10-s condition were below levels observed in the VT condition (U 5 0, p 5 .004). For S2, the decrease in target response times from the VT to the tandem VT-DRO 2-s condition was not statistically significant (U 5 9, p 5 .15). At DRO values of 5 s and greater,
however, the decline from VT levels was statistically significant (DRO 5 s: U 5 2, p 5 .01; DRO 10 s: U 5 5, p 5 .04; DRO 20 s: U 5 0, p 5 .004). For S3, statistically significant decreases were observed at all DRO values (DRO 5 s: U 5 6, p 5 .05; DRO 10 s: U 5 0, p 5 .004; DRO 20 s: U 5 0, p 5 .004). Because time allocated to the target response was higher in the VT than in the tandem VT DRO conditions in which the DRO value exceeded 2 s, we examined how many point deliveries in the VT condition were contiguous with target responding and how many were delayed (response-independent points were always delayed in the tandem VT DRO conditions). The left column of graphs in Figure 2 shows these numbers partitioned into 0.5-s bins as a proportion of all responseindependent points obtained in the stable
202
MADDEN and PERONE
sessions. For S1 and S3, the majority of all points delivered according to the VT schedules fell into two bins: those delivered within 0.5 s of a target response and those occurring more than 9.5 s after the last target response. For S2, less than 4% of the points arranged by the VT schedule fell within 0.5 s of a target response (this subject moved the joystick more slowly, and response-independent points were usually delivered as the joystick was returned to its center position), but 25% of all points delivered by the VT schedule fell within 1.5 s of such a response. The right column of graphs in Figure 2 reveals, from this same condition, that the majority of points delivered by the VT schedule closely followed nontarget responding. Next, we compared the ability of Equations 1 and 2 to predict the change in time allocated to the target response across the VT and tandem VT DRO 10-s conditions (the latter was the only DRO value common to all subjects and all subjects’ target response times significantly declined at this DRO value). The target reinforcement rate (r 1 in Equations 1 and 2) was simply the number of reinforcers in each session obtained from the one baseline VI schedule associated with the target response. The nontarget reinforcement rate (r 2) was calculated by summing the total number of point deliveries obtained from the other three baseline VI schedules and the one response-independent point delivery schedule in each of the final six sessions of the VT and tandem VT DRO 10-s conditions. We estimated p (from Equation 2, the proportion of response-independent point deliveries that function as response-dependent point deliveries) for each session in the VT condition by taking the proportion of point deliveries arranged by the VT schedule that followed target responding by 0.5 s or less. Results averaged across the stable sessions are presented in Table 3 (the p values for each subject also are shown graphically in the first bin of the left column of Figure 2). We used values in Table 3 and Equations 1 and 2 to predict percentage changes in the time allocated to the target response across the VT (Equation 2) and tandem VT DRO 10-s (Equation 1) conditions. The calculations set k equal to 1, and r o to 0. The obtained percentage decrease from the VT to the VT DRO 10-s condition is shown for each
subject in Figure 3 along with the decreases predicted by Equations 1 and 2. For all three subjects, the decrease predicted by Equation 2 more closely approximated the obtained decrease than did the prediction based on Equation 1. Table 2 shows the average number of changeover responses (i.e., switching joystick directions) per minute in the stable sessions. In general, changeover rates were high, averaging 4.7 changeovers per minute (range 0.7 to 9.5). High changeover rates resulted in the subjects spending a good deal of time in the changeover delay. Figure 4 shows the average percentage of each session that was spent in the changeover delay in the final six sessions of each condition. With the exception of S3 in the VT condition, changeover rates tended to increase when superimposed reinforcers were delivered contingent upon the target response. Thus, time spent responding consistently in one direction (visit duration) tended to increase when responseindependent points could be delivered at virtually any time. DISCUSSION The behavior of our human subjects was sensitive to changes from response-dependent reinforcement to response-independent delivery of points exchangeable for money, and from the latter to conditions in which response-independent points never followed target responding. When a VI schedule was superimposed on the baseline VI that maintained target behavior, response-reinforcer temporal contiguity was assured and target behavior dominated. When temporal contiguity was left uncontrolled by superimposing a VT schedule, fewer superimposed point deliveries were obtained shortly after emission of the target response, and time allocated to this behavior declined from the levels maintained in the superimposed VI condition. The majority of point deliveries arranged by the VT schedules closely followed nontarget responding, and time allocated to these responses increased. When superimposed point deliveries never immediately followed target responses (tandem VT DRO conditions), target response time decreased below the levels maintained in the VI and VT conditions and nontarget behavior (the other joystick re-
203
ALTERNATIVE REINFORCEMENT Table 3 Number of target reinforcers (r1) and nontarget point deliveries (r2) obtained per session in the VT and tandem VT DRO 10-s conditions. Also shown is the proportion of responseindependent point deliveries in the VT condition obtained within 0.5 s of a target response (p). Results are means of the stable six sessions of each condition, with standard deviations in parentheses. r1
r2
Subject
VT
VT-DRO
VT
VT-DRO
p
S1 S2 S3
2.83 (0.7) 5.50 (1.3) 5.33 (1.7)
2.50 (0.8) 4.67 (2.0) 4.83 (1.2)
85.67 (2.1) 28.00 (2.3) 27.33 (2.5)
79.83 (3.8) 26.83 (2.5) 26.67 (0.9)
0.185 (.048) 0.039 (.039) 0.190 (.074)
sponses, pausing, or a combination) increased over levels observed in the VI and VT conditions. These findings are consistent with those reported by Henton and Iversen (1978) who also reported increases in rats’ nontarget behavior when response-independent food deliveries followed several nontarget behavior. Our findings are also consistent with those
Fig. 3. Percentage decreases in time allocated to target behavior from the VT to the VT DRO condition (see text for details about these conditions). Separate bars are presented for the decreases predicted by Equations 1 and 2 and the obtained decreases for each subject.
reported by Imam and Lattal (1988) who reported lower response rates under tandem VT DRO contingencies than under VT contingencies alone. Like the Imam and Lattal study, our across-condition comparisons were made within subjects. Our findings, when combined with those of Imam and Lattal, suggest that Rachlin and Baum’s (1972) failure
Fig. 4. Mean percentage of the stable sessions in each condition spent in the changeover delay (COD). Error bars show standard deviations.
204
MADDEN and PERONE
to detect differences in responding across these conditions may be the product of between-subject comparisons. The present findings offer little support for the molar account of response-independent reinforcement provided by Equation 1, which assumes that response-independent stimuli (food, points, etc.) decrease target behavior regardless of the temporal relation between the stimuli and the target and nontarget responses. The test in the present experiment involves changes in target response times across the VT and tandem VT DRO conditions. Although Equation 1 correctly predicted the direction of the change, it consistently underestimated the magnitude of the decrease in all subjects. Adding the DRO contingency to the VT schedule decreased target time allocations that were more than double those predicted by Equation 1. One account of these additional decrements is provided by contiguity theory (Henton & Iversen, 1978; Skinner, 1948). Contiguity theory holds that the critical relation between a response and a reinforcer is the temporal interval between them, not the contingency. When response-independent stimuli closely follow target responses, they function as if they were response-dependent reinforcers (which, in this experiment, were always delivered immediately after a response). Because VT schedules allow response-independent stimuli to at least occasionally occur immediately after responding and tandem VT DRO schedules do not, target response allocation should be higher in the VT condition. Equation 2 provides a quantitative model of contiguity theory when one assumes, as we did here, that response-independent stimuli delivered within 0.5 s of a target response functioned as response-dependent reinforcers. Using Equation 2 with this assumption predicted percentage decrements in target responding that more closely approximated obtained values than did Equation 1 (see Figure 3). Our findings support the position that response-independent point deliveries (and presumably other stimuli such as food and water) function as response-dependent reinforcers when they occur in close temporal association with target responding. Additional support comes from Killeen (1978) and Killeen and Smith (1984), who found that pigeons were
incapable of discriminating response-independent from response-dependent stimulus changes when the stimulus change occurred within 200 ms of a response. In their studies (in which accurately reporting the source of the stimulus change was reinforced) correct discriminations improved as the interval between response and stimulus change increased. These findings suggest that responsestimulus temporal contiguity is an important factor in determining whether stimuli such as food, water, and points exchangeable for money will reinforce target or other behavior. If a response-independent stimulus closely follows target behavior, Killeen’s findings suggest the organism will be incapable of discriminating this event from a response-produced reinforcer. This observation, when combined with the data presented here and those presented by Imam and Lattal (1988) and Henton and Iversen (1978), suggest that these response-independent stimuli will function as response-dependent reinforcers. Although the present findings support contiguity theory, the evidence they provide should be recognized as correlational. While immediate response-independent points may have increased the future probability of target and nontarget behaviors, it also is possible that changes in the time allocated to these behaviors increased the probability that response-independent points would closely follow their occurrence. The latter, however, appears unlikely given that comparable relative distributions of target and nontarget behavior were observed across conditions in all subjects, and these distributions were systematically related to the proportion of responseindependent points obtained in close temporal proximity to these responses. Implications for Applied Behavior Analysis Despite the lack of a clear laboratory understanding of response-independent reinforcement (Vollmer & Hackenberg, 2001), applied behavior analysts have for some time been employing response-independent stimuli therapeutically. Generalizing from the matching law, McDowell (1981, 1982, 1988) suggested that Equation 1 predicted that adding response-independent stimuli could decrease problem behaviors. According to McDowell (1988), this effect ‘‘cannot be understood in terms of traditional behavior
ALTERNATIVE REINFORCEMENT analytic principles’’ (p. 105) because it differs from procedures designed to differentially reinforce more appropriate behaviors (e.g., DRO). Although ample inter vention outcomes have qualitatively supported the predictions of Equation 1 (e.g., Carr, Bailey, Ecott, Lucker, & Weil, 1998; Fischer, Iwata, & Mazaleski, 1997; Hagopian, Fisher, & Legacy, 1994; Lalli, Casey, & Kates, 1997; Mace & Lalli, 1991; Marcus & Vollmer, 1995; Vollmer, Iwata, Zarcone, Smith, & Mazaleski, 1993; Vollmer, Marcus, Ringdahl, & Roane, 1995), we believe these outcomes are also predicted by contiguity theory which holds that less time is allocated to problem behavior when alternative, incompatible behaviors are adventitiously reinforced. In addition, contiguity theory goes beyond Equation 1 to predict that if response-independent stimuli occasionally follow instances of a problem behavior, the probability of this behavior will be increased above levels that would be maintained if these events never immediately followed these behaviors. An apparent instance of this side effect of response-independent stimuli led Vollmer et al. (1997) to add a DRO criterion to an FT schedule designed to reduce aggressive behaviors (Vollmer & Hackenberg [2001] reported a preliminary description of an animal replication of this effect). Some researchers (e.g., Vollmer et al., 1998) have suggested that response-independent schedules may be superior to DRO schedules in therapeutic settings because they are more easily administered or because they reduce the disruptive effects of withholding reinforcement when the target behavior is emitted. Britton, Carr, Kellum, Dozier, and Weil (2000), however, have proposed that applied behavior analysts use tandem FT DRO contingencies as a means of reducing problem behavior while concurrently preventing adventitious reinforcement of the behavior. The reduction in target behavior from the VT to the tandem VT DRO conditions in our experiment support the precautionary use of such tandem contingencies. REFERENCES Alleman, H. D., & Zeiler, M. D. (1974). Patterning with fixed-time schedules of response-independent rein-
205
forcement. Journal of the Experimental Analysis of Behavior, 22, 135–141. Baum, W. M. (1979). Matching, undermatching, and overmatching in studies of choice. Journal of the Experimental Analysis of Behavior, 32, 269–281. Baum, W. M. (1981). Discrimination of correlation. In M. L. Commons & J. A. Nevin (Eds.), Quantitative Analyses of Behavior (Vol. 1, pp. 247–256). Cambridge, MA: Ballinger. Britton, L. N., Carr, J. E., Kellum, K. K., Dozier, C. L., & Weil, T. M. (2000). A variation of noncontingent reinforcement in the treatment of aberrant behavior. Research in Developmental Disabilities, 21, 425–435. Burgess, I. S., & Wearden, J. H. (1986). Superimposition of response-independent reinforcement. Journal of the Experimental Analysis of Behavior, 45, 75–82. Carr, J. E., Bailey, J. S., Ecott, C. L., Lucker, K. D., & Weil, T. M. (1998). On the effects of noncontingent delivery of differing magnitudes of reinforcement. Journal of Applied Behavior Analysis, 31, 313–321. Davison, M. C., & Jenkins, P. E. (1985). Stimulus discriminability, contingency discriminability, and schedule performance. Animal Learning and Behavior, 13, 77–84. Edwards, D. D., Peek, V., & Wolfe, F. (1970). Independently delivered food decelerates fixed-ratio rates. Journal of the Experimental Analysis of Behavior, 14, 301– 307. Fischer, S. M., Iwata, B. A., & Mazalewski, J. L. (1997). Noncontingent delivery of arbitrary reinforcers as treatment for self-injurious behavior. Journal of Applied Behavior Analysis, 30, 239–249. Fleshler, M., & Hoffman, H. S. (1962). A progression for generating variable-interval schedules. Journal of the Experimental Analysis of Behavior, 5, 529–530. Hagopian, L. P., Fisher, W. W., & Legacy, S. M. (1994). Schedule effects of noncontingent reinforcement on attention-maintained destructive behavior in identical quadruplets. Journal of Applied Behavior Analysis, 27, 317–325. Henton, W. W., & Iversen, I. H. (1978). Classical conditioning and operant conditioning. New York: SpringerVerlag. Herrnstein, R. J. (1970). On the law of effect. Journal of the Experimental Analysis of Behavior, 13, 243–266. Imam, A. A., & Lattal, K. A. (1988). Effects of alternative reinforcement sources: A reevaluation. Journal of the Experimental Analysis of Behavior, 50, 261–271. Killeen, P. R. (1978). Superstition: A matter of bias, not detectability. Science, 199, 88–90. Killeen, P. R., & Smith, J. P. (1984). Perception of contingency in conditioning: Scalar timing, response bias, and erasure of memory by reinforcement. Journal of Experimental Psychology: Animal Behavior Processes, 10, 333–345. Kop, P. F. M., & Van Haaren, F. (1982). Effects of direct and indirect variables on behaviour maintained by fixed- and variable-time schedules of reinforcement. Behaviour Analysis Letters, 2, 1–9. Lachter, G. D. (1971). Some temporal parameters of noncontingent reinforcement. Journal of the Experimental Analysis of Behavior, 16, 207–217. Lachter, G. D., Cole, B. K., & Schoenfeld, W. N. (1971). Response rate under varying frequency of non-contingent reinforcement. Journal of the Experimental Analysis of Behavior, 15, 233–236. Lalli, J. S., Casey, S. D., & Kates, K. (1997). Noncontin-
206
MADDEN and PERONE
gent reinforcement as treatment for severe problem behavior: Some procedural variations. Journal of Applied Behavior Analysis, 30, 127–137. Lattal, K. A. (1972). Response-reinforcer independence and conventional extinction after fixed-interval and variable-interval schedules. Journal of the Experimental Analysis of Behavior, 18, 133–140. Lattal, K. A. (1974). Combinations of response-reinforcer dependence and independence. Journal of the Experimental Analysis of Behavior, 22, 357–362. Lattal, K. A., & Abreu-Rodrigues, J. (1997). Response-independent events in the behavior stream. Journal of Applied Behavior Analysis, 68, 375–398. Lattal, K. A., & Boyer, S. S. (1980). Alternative reinforcement effects on fixed-interval performance. Journal of the Experimental Analysis of Behavior, 34, 285–296. Lattal, K. A., & Bryan, A. J. (1976). Effects of concurrent response-independent reinforcement on fixed-interval schedule performance. Journal of Applied Behavior Analysis, 26, 495–504. Mace, F. C., & Lalli, J. S. (1991). Linking descriptive and experimental analysis in the treatment of bizarre speech. Journal of Applied Behavior Analysis, 24, 553– 562. Madden, G. J., & Perone, M. (1999). Human sensitivity to concurrent schedules of reinforcement: Effects of observing schedule-correlated stimuli. Journal of the Experimental Analysis of Behavior, 71, 303–318. Marcus, B. A., & Vollmer, T. R. (1995). Effects of differential negative reinforcement on disruption and compliance. Journal of Applied Behavior Analysis, 28, 229– 230. McDowell, J. J. (1981). On the validity and utility of Herrnstein’s hyperbola in applied behavior analysis. In C. M. Bradshaw, E. Szabadi, & C. F. Lowe (Eds.), Quantification of steady-state operant behaviour (pp. 311– 324). Amsterdam: Elsevier. McDowell, J. J. (1982). The importance of Herrstein’s mathematical statement of the law of effect. American Psychologist, 37, 771–779. McDowell, J. J. (1988). Matching theory in natural human environments. The Behavior Analyst, 11, 95–109. Rachlin, H., & Baum, W. M. (1972). Effects of alternative
reinforcement: Does the source matter? Journal of the Experimental Analysis of Behavior, 18, 231–241. Skinner, B. F. (1948). ‘‘Superstition’’ in the pigeon. Journal of Experimental Psychology, 38, 168–172. Staddon, J. E. R., & Simmelhag, V. L. (1971). The ‘‘superstition’’ experiment: A reexamination of its implications for the principles of adaptive behavior. Psychological Review, 8, 3–43. Timberlake, W., & Lucas, G. A. (1985). The basis of superstitious behavior: Chance contingency, stimulus substitution, or appetitive behavior? Journal of the Experimental Analysis of Behavior, 44, 279–299. Vollmer, T. R., & Hackenberg, T. D. (2001). Reinforcement contingencies and social reinforcement: Some reciprocal relations between basic and applied research. Journal of Applied Behavior Analysis, 34, 241– 253. Vollmer, T. R., Iwata, B. A., Zarcone, J. R., Smith, R. G., & Mazaleski, J. L. (1993). The role of attention in the treatment of attention-maintained self-injurious behavior: Noncontingent reinforcement and differential reinforcement of other behavior. Journal of Applied Behavior Analysis, 26, 9–21. Vollmer, T. R., Marcus, B. A., Ringdahl, J. E., & Roane, H. S. (1995). Progressing from brief assessments to extended experimental analyses in the evaluation of aberrant behavior. Journal of Applied Behavior Analysis, 28, 561–576. Vollmer, T. R., Progar, P. R., Lalli, J. S., Van Camp, C. M., Sierp, B. J., Wright, C. S., et al. (1998). Fixed-time schedules attenuate extinction-induced phenomena in the treatment of severe aberrant behavior. Journal of Applied Behavior Analysis, 31, 529–542. Vollmer, T. R., Ringdahl, J. E., Roane, H. S., & Marcus, B. A. (1997). Negative side effects of noncontingent reinforcement. Journal of Applied Behavior Analysis, 30, 161–164. Zeiler, M. D. (1971). Eliminating behavior with reinforcement. Journal of the Experimental Analysis of Behavior, 16, 401–405. Zeiler, M. D. (1976). Positive reinforcement and the elimination of reinforced responses. Journal of the Experimental Analysis of Behavior, 26, 37–44. Received August 29, 2001 Final acceptance January 7, 2003