positive reinforcement can act like an aversive stimulus. For example, its threatened presenta- tion can suppress variable-interval behavior in the presence of a ...
JOURNAL OF THE EXPERIMENTAL ANALYSIS OF BEHAVIOR
VOLUME
6,
NUMBER
AVERSIVE ASPECTS OF A SCHEDULE OF REINFORCEMENT1 JAMES B. APPEL2
3
JULY, 1963
POSITIVE
YALE UNIVERSITY SCHOOL OF MEDICINE
Six male White Carneaux pigeons were trained to peck at one of two keys to obtain food on several fixed-ratio schedules of reinforcement. Concurrently, the first response on a second key could, I-change the conditions of visual stimulation and remove the food reinforcement contingency, II-change the conditions of stimulation and have no effect upon the reinforcement contingency, or III-do nothing. The second response on the stimulus change key always restored baseline conditions. When second-key responses produced a stimulus change, the number of such responses was a function of the ratio value on the first key. Typically, second-key responses occurred before the start of fixed-ratio runs. The duration of stimulus change periods was an exponential function of the number of responses required for reinforcement when the possibility for reinforcement was not disturbed by periods of stimulus change (Condition II).
A given set of stimulus conditions is reinforcing if an organism will normally work to obtain it or is aversive if an organism will work to remove it. Conditioned stimuli can, however, be either rewarding or aversive, depending upon the manner in which they are related to primary reinforcement contingencies. Conditions associated with a timeout (TO) or extinction period apparently function in much the same manner as other conditioned stimuli. Morse and Herrnstein (1956), and Ferster (1957, 1958), have shown that a TO from positive reinforcement can act like an aversive stimulus. For example, its threatened presentation can suppress variable-interval behavior in the presence of a pre-TO stimulus (Ferster, 1957) and its occurrence can inhibit SA responding in matching to sample (Ferster and Appel, 1961). On the other hand, the TO will sometimes act like a positive reinforcer. This is true when baseline conditions are themselves aversive. Any situation involving escape from electric shock is an obvious example.
Verhave (1962) has, in addition, shown that rats prefer to work for the removal of conditions associated with an avoidance schedule (i.e., a TO) rather than simply avoid shock. All of these studies lead to the conclusion that a TIO from positive reinforcement is aversive while a TO from aversive conditions is reinforcing. Unfortunately, it appears that the situation is not this simple, for an organism will sometimes acquire and maintain a response which changes or even removes
certain conditions associated with positive reinforcement. For example, Azrin (1961) describes an experiment in which pigeons are trained to peck a key for food reward on several fixed-ratio schedules. The first response on a second key changes the stimuli in the box and simultaneously imposes a TO from positive reinforcement. The second response on the second key restores the baseline conditions. It was found that birds will peck at the second key, and that the duration of the self-imposed TO is a function of the number of responses required for reinforcement. On the basis primarily, but not exclusively, of Azrin's work some questions can be raised 'This research was supported in part by Public Health Service Research Grant MY-3363, from the regarding (1) what aspects of the TO or exNational Institute of Mental Health. tinction period are controlling operant be2The author would like to thank D. X. Freedman, havior, and (2) how these are related to onM.D., for supplying laboratory facilities. Miss C. V. going reinforcement contingencies. At least Perotti and Miss A. R. West provided technical as- two features of the TO can be considered. sistance which is hereby gratefully acknowledged. Reprints may be obtained from the author, Dept. of First, by definition, no reinforcement can be Psychiatry, Yale University, School of Medicine, 333 obtained when TO conditions are in effect. It does not seem reasonable to argue that an Cedar St., New Haven 11, Conn. 423
JAMES B. APPEL
424
animal will find an extinction contingency reinforcing when it can get food simply by remaining under the control of baseline conditions and running off a fixed-ratio requirement. But the TO also has another feature; it simply introduces a change in stimulation. Perhaps this change is, in itself, reinforcing. If so, a change in illumination rather than a change with an added TO should maintain responding on the right key. But the problem remains as to whether or not the relative reinforcing value of the stimulus change is related to other experimental conditions and, if so, what is this relationship? The present experiment was designed to try to answer some of these questions. METHOD
Subjects Six male White Carneaux pigeons, nine months old at the start of the investigation, served as subjects. They had previously been used in an experiment involving different colored lights and variable-interval schedules of reinforcement. The birds were maintained at 80-90% of their free-feeding body weights by feeding supplementary grain of a composition recommended by Ferster and Skinner (1957). Water was available at all times in both home and experimental cages. Apparatus A pigeon box of the kind described in detail elsewhere (Ferster and Skinner, 1957) was modified to contain two keys spaced 2 in apart and equidistant from the food magazine. Each key could be illuminated by either a green or a yellow 110 v, 7 w bulb. One 110 v, 7 w white house light was located at the upper right corner and another at the upper left corner of the working area of the experi-
mental chamber. All experimental events were programmed automatically by relay and timing circuits in an adjoining room. Running-time meters measured the durations of all stimulus conditions; responses on each key were recorded on electro-magnetic counters and on a cumulative recorder. Procedure The six birds were run seven days per week. Sessions were either of 1 hr duration or
terminated when an animal had obtained 70 reinforcements-whichever came first. Since the ratio was normally somewhat strained even at FR 80, sessions almost always terminated after 1 hr. Each reinforcement consisted of 5-sec access to grain. Because the birds were not experimentally naive, little preliminary training was necessary. They were always placed into the apparatus with both key lights green and the house lights on. One of the following conditions prevailed on the right key: I-The first peck turned both key lights yellow, extinguished the house lights and eliminated the food reinforcement contingency on the left key. The second peck restored the original (green) conditions of illumination and the reinforcement contingency. 11-The same as I except that the animal could obtain reinforcement (by responding on the left key) during the period when the yellow light was on. III-Responses on the right key produced no changes in the experimental conditions. Each bird received all three conditions in the following order: Bird 1: 1, III, II; Bird 2: III, II, I; Bird 3: 1, I, III; Bird 4: III, II, I; Bird 5: 1, I, III; Bird 6: 1, III, II. Contingencies on the right key were never changed until performances were stable. Responses on the left key were rewarded with food on a fixed-ratio schedule. The ratio was gradually increased for each animal from FR 1 to FR 240 during preliminary training on the first set of conditions. The birds were then run on values of FR 80, 100, 120, 150, 180, 210, and 240 in random order for all three conditions of the experiment. The possibility of the adventitious reinforcement of right-key responses was eliminated by imposing a 4-sec period after each right-key peck during which no left-key response could be followed by food, even if the ratio requirement had been met. RESULTS Figure 1 shows the average for all six birds of the median number of right-key responses during the last five sessions of each ratio. The number of responses is directly related to rate since the sessions were of a relatively constant duration (approximately 1 hr). When no change in illumination was associated with a
A VERSIVE and POSITIVE REINFORCEMENT
hi030 0
7= 0
I-I
25 *.
--Xm M
/
20
0
0)
10-
0n o
/
/5
/
.Os....
*4
4
o 100 120 150 ISO 210 240 NO. OF RESPONSES REQUIRED FOR REINFORCEMENT (FR VALUE) Fig. 1. The number of responses on the right key as a function of the number of responses required for reinforcement on the left key. In Condition I, the first response on the right key is followed by a change in stimulation and by a TO; the second right-key peck restores the baseline conditions. In II, only the stimuli are affected by right-key responses and in III, right-key pecks have no effects whatever. 80
response on the right key (III), the birds rarely pecked at that key and the FR value had little effect upon the number of rightkey responses. When, however, responses on the right key produced a stimulus change, the number of right-key pecks increased rapidly over control values and was an increasing function of the ratio requirement on the left key (I, II). The presence or absence of reinforcement during the periods of stimulus change had little apparent effect upon the frequency of pecking the right key. Individual performances are not shown but each bird followed the same general pattern as the group data. Another measure of the reinforcing value of a stimulus is the amount of time an organism chooses to remain in its presence. Figure 2 shows the median percentage of the total session time the birds spent in the yellow light (and no house lights) during the last five days on each fixed-ratio. When an extinction condition on the left key was associated with stimulus change (I), the animals rarely spent more than five per cent of the session (3 min) in yellow at any FR value. On the other hand, when reinforcements could be obtained in either set of stimulus conditions (II), the time spent in yellow was an exponential function
425
of the number of responses required for reinforcement. The characteristic pattern of behavior under the concurrent schedules is illustrated in the cumulative record of Fig. 3. An entire session on FR 240 during which extinction was associated with stimulus change (condition I) is presented. At least four aspects of the individual performances are general enough to merit comment. 1) Fixed-ratio responding on the left key, when both lights were green, was often strained at relatively low FR values. It was difficult, if not impossible, to maintain any behavior at FR 280 and higher. There was a considerable amount of pausing after reinforcement and the length of the pauses increased as each daily session progressed (Fig. 3). 2) Right-key responses almost invariably occurred before ratio runs and during pauses after reinforcement (e.g., at a, e, i) although they could sometimes be observed during an early portion of a run (e.g., at c). Few rightkey responses occurred when there was little pausing and the rate on the left key was relatively high (e.g., before a). 3) The rate of ratio responding was usually at or near zero during stimulus-change periods (e.g., between a and b, c and d, i and 1). The birds usually pecked at the right key a second time and thereby restored the baseline conZ z 030
~-'
ZZ
IL.a 0=
25
.J2,a20
-.I -
e/
zt
I-., ILZ 00
10-
hi0
5
-
/
.0
WL Z
°
" 0 100 120 ISO ISO 210 240 NO. OF RESPONSES REQUIRED FOR REINFORCEMENT ( FR VALUE) Fig. 2. The percentage of the total session time spent in the stimulus change period as a function of the number of responses required for reinforcement on the left key. In Condition I, an extinction or TO
period accompanies stimulus change and in Condition II, it does not.
426)
JAMES B. APPEL
/RsSEC z
10 MINUTES
Fig. 3. Cumulative record of a 1-hr session for Bird 1 on FR 240 during Condition I of the experiment. The records are compressed before a and after j, when no stimulus changes occur, to save space. Left-key responses and reinforcements are shown on the stepping pen. Right-key pecks are indicated by the small letters and by resets of the pen. The event marker (bottom line) shows the periods of stimulus change and extinction.
ditions of illumination before responding once more on the left key (e.g., at b, d, h, 1). Occasionally, however, the birds ran off ratios during periods of stimulus change and these runs occurred more frequently early in training than they did when performances were stable. 4) Neither the distribution of right-key responding nor the duration of stimulus change periods was systematically related to session time or to experimental conditions except that, other things being equal, durations were longer when no TO contingency was prescribed.
Figure 4 shows the behavior of a bird which received seven reinforcements during a single period when the lights were yellow (between a and b) and emitted at least 1680 left-key responses.
DISCUSSION The results of the present experiment agree only partially with those of Azrin (1961). It is apparent both that the pigeon will peck at a key which changes the conditions of stimulation and that the frequency of responding on that key is a function of the value of a fixed-
Fig. 4. Cumulative record of a 1-hr session for Bird 5 on FR 240 during Condition II of the experiment. The records are compressed before a and after b. The small letters and resets indicate right-key responses; the event pen shows the duration of a period of stimulus change but no extinction contingency.
AVERSIVE and POSITIVE REINFORCEMENT ratio maintaining responding on another key. Moreover, the pattern of responding on the stimulus change key is similar to that observed by Azrin; most responses occur before the bird begins its run on the ratio key. The relative influence upon behavior of the TO or extinction period associated with right-key responses is less clear. Azrin states (1961, p383): "The change in stimuli was not itself reinforcing, since the pigeon imposed extinction periods regardless of whether an increase or a decrease in illumination was associated with time-out." This may be, but the results of having no change in illumination accompany right-key responses were not reported until the present experiment. When this control is run, the stimulus change, regardless of the presence or absence of the extinction contingency, can be seen to induce a substantial increase in the frequency of right-key responses and that this increase is most apparent at high ratios (Fig. 1). Azrin reported an exponential relationship between the minutes spent in TO and ratio value. My animals did not reveal any such function when the extinction condition was present; when there was a stimulus change but no extinction, a relationship similar to that of Azrin was obtained but the ratio values were generally higher in the present experiment. In addition, my birds tended to spend less time in the stimulus-change condition than did those of Azrin. At present, the stimulus change appears to be a sufficient explanation for the occurrence of the initial right-key response and the frequency data of Fig. 1. The question remains as to why such a change can be reinforcing and on this issue I am in basic agreement with Azrin. Because the disposition to respond on the right key is directly related on the one hand to the value of the ratio or frequency of reinforcement maintaining responding on the left key and is also inversely related to the animal's FR rate, it can be assumed that some condition controlling behavior on the left key is also controlling behavior on the right. Accidental contingencies can be ruled out because of the imposition of the 4-sec delay between right-key responses and the possibility of reinforcement by rapidly changing over to the left key. It does not seem unreasonable to hypothesize that an organism will impose a stimulus change when the original stimulating
427
conditions become aversive. A right-key peck can be viewed as an escape response from some noxious aspect of the positively reinforcing FR schedule, e.g., the conditions after reinforcement are aversive in the sense that a relatively long time and large amount of work are required before another reinforcemnent can be obtained, particularly at high ratios. Exactly what motivates the birds to peck the right key a second time and restore the baseline conditions is considerably less clear unless it is argued that any stimulus change can be reinforcing. The presence or absence of the possibility of obtaining reinforcement when the stimulus change conditions are present may explain the differences in time spent in yellow between the extinction and no-extinction conditions (Fig. 2). However, since Azrin's data on TO do not correspond to my own, the hypothesis that the absence of reinforcement during stimulus change periods is necessary to induce a second right-key response in a relatively short time is, at best, tenuously supported by the available evidence. The low frequency at which birds usually respond on the left key during the stimulus change also argues against the adequacy of this explanation for, if few ratios are run off, few reinforcements can be missed and the extinction contingency cannot be expected to have a great deal of effect. Some factor in the ratio schedule, training, or early reinforcement history of the animals is probably relevant but, until more data have been collected, the identity of this factor is mere speculation. In conclusion, it seems that at least some of the TO data of Azrin and others can be explained simply by positing that a change in stimulation can be either reinforcing or aversive. Many investigators have, of course, been saying this for years. But the relative reinforcing value of a stimulus change is probably related to experimental conditions which prevail at the time the change is imposed or to the degree of "aversiveness" of the baseline schedule of reinforcement. REFERENCES Azrin, N. H. Time-out from positive reinforcement. Science, 1961, 133, 382-383. Ferster, C. B. Withdrawal of positive reinforcement as punishment. Science, 1957, 126, 509.
428
JAMES B. APPEL
Ferster, C. B. Control of behavior in chimpanzees and pigeons by time out from positive reinforcement. Psychol. Monogr., 1958, 72, No. 8 (Whole No. 461). Ferster, C. B. and Appel, J. B. Punishment of SA responding in matching to sample by time out from positive reinforcement. J. exp. Anal. Behav., 1961, 4, 45-56. Ferster, C. B. and Skinner, B. F. Schedules of reinforcement. New York: Appleton-Century-Crofts, 1957.
Morse, W. H. and Herrnstein, R. J. The maintenance of avoidance behavior using the removal of a conditioned positive reinforcer as the aversive stimulus. Amer. Psychol., 1956, 11, 430. (Abstract) Verhave, T. The functional properties of a time out from an avoidance schedule. J. exp. Anal. Behav., 1962, 5, 391-422.
Received October 11, 1962