Psychological Review 1978, Vol. 85, No. 5, 436-444
On Matching and Maximizing in Operant Choice Experiments

J. E. R. Staddon and Susan Motheral
Duke University

Animals match relative response rate to relative reinforcement rate in two-choice situations where each alternative provides reinforcement according to a variable-interval schedule. We show that matching, and a model proposed for it by Herrnstein, can both be derived from reinforcement maximization under a linear response constraint. Empirical results are consistent with the constraint assumption, but they fail to support an extension of the approach to choice situations in which one alternative dispenses reinforcement according to a ratio schedule. Neither matching nor maximizing may be a fundamental principle.

The research was supported by grants from the National Science Foundation to Duke University. We are grateful to Howard Rachlin, who drew our attention to this problem and has analyzed it in a different way. Requests for reprints should be sent to J. E. R. Staddon, Department of Psychology, Duke University, Durham, North Carolina 27706.

As part of the effort to understand how reward and punishment affect behavior, increasing attention has been devoted in recent years to experiments on choice in animals. So-called concurrent schedules, in which two or more alternatives are simultaneously available, are a particularly convenient situation for choice studies: Frequency and type of reward (or punishment) can be varied over a wide range, and the rate of instrumental responding to each alternative provides a graded dependent variable.

In an influential article, Herrnstein (1961) studied the choice behavior of hungry pigeons confronted with two alternatives. Pecks on each of two keys occasionally produced brief access to a feeder at rates determined by two independently programmed variable-interval (VI) schedules. Herrnstein varied the VI value (i.e., the average minimum interfood interval) for each alternative in such a way that the overall frequency of access to food was approximately constant. The pigeons' distribution of pecks between the two alternatives followed a particularly simple rule: In the steady state, relative frequency of pecking an alternative matched the relative frequency of access to food for pecks on that alternative. This finding has come to be known as the matching law. It has provoked much discussion and experiment and has been extended in numerous ways since Herrnstein's original study (e.g., Baum, 1974; de Villiers, 1977; Herrnstein, 1970; Rachlin, 1973). If x and y are the rates of responding to the two choice alternatives, and $R_x$ and $R_y$ are the rates of access to food actually obtained, then the matching law corresponds to the following relation:

$$ \frac{x}{x+y} = \frac{R_x}{R_x+R_y} \tag{1a} $$

or

$$ \frac{x}{y} = \frac{R_x}{R_y}. \tag{1b} $$
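As a minimal illustration of Equations 1a and 1b, the sketch below compares the relative rate of responding with the relative rate of obtained reinforcement. The function names and the sample numbers are hypothetical, chosen only to show the arithmetic; they are not data from Herrnstein (1961). Because both sides of Equation 1a are ratios, the response and reinforcement rates may be expressed in different units.

```python
def relative_rate(u: float, v: float) -> float:
    """Relative rate u / (u + v), the form used on both sides of Equation 1a."""
    total = u + v
    if total <= 0:
        raise ValueError("rates must sum to a positive value")
    return u / total


def matching_deviation(x: float, y: float, r_x: float, r_y: float) -> float:
    """Relative response rate minus relative reinforcement rate (0 under strict matching)."""
    return relative_rate(x, y) - relative_rate(r_x, r_y)


# Hypothetical session averages: responses/min on each key, reinforcers/hour on each key.
x, y = 60.0, 20.0
r_x, r_y = 30.0, 10.0

print(relative_rate(x, y), relative_rate(r_x, r_y))  # 0.75 0.75
print(matching_deviation(x, y, r_x, r_y))            # 0.0
# Equation 1b: the ratio form carries the same information as Equation 1a.
print(x / y, r_x / r_y)                              # 3.0 3.0
```

A positive deviation means the animal over-responds to X relative to strict matching; a negative one means it under-responds.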
Copyright 1978 by the American Psychological Association, Inc. 0033-295X/78/8505-0436$00.75

Three approaches have been taken to the matching finding. One involves experimental exploration of its limiting conditions. As part of this work, alternative formulations have been proposed. For example, a power form of Equation 1b, that is,

$$ \frac{x}{y} = a\left(\frac{R_x}{R_y}\right)^{\beta}, \tag{2} $$

where a and β are constants, provides a better fit to the data under many conditions (e.g., Baum, 1974; Staddon, 1968). We will not be directly concerned with this line of work in the present article.

A different line has been taken by Herrnstein, who in his original article was concerned with the independence of operants. He reasoned that if responding to the two alternatives is truly independent, then x should depend
solely on the rate of reinforcement for X, and similarly for Y. Stated more formally, functions $x = f(R_x)$ and $y = f(R_y)$ should exist such that Equation 1 is true. In his original article, Herrnstein suggested the linear form $f(R_x) = kR_x$. However, this failed to describe the single-choice case, and subsequent work has favored the alternative (Herrnstein, 1970, 1974):

$$ x = \frac{kR_x}{R_x + R_{\bar{x}}}, \tag{3} $$

where k is a constant representing the maximum level of responding, and $R_{\bar{x}}$ and $R_{\bar{y}}$ are the rates of reinforcement for responses other than x and y, respectively. Obviously, $R_{\bar{x}} = R_0 + R_y$ and $R_{\bar{y}} = R_0 + R_x$, where $R_0$ is the rate of reinforcement for responses other than x or y. Hence, Equation 3 can be rewritten as follows:

$$ x = \frac{kR_x}{R_0 + R_x + R_y}, \tag{4} $$

or, more generally, for any number of alternatives,

$$ x = \frac{kR_x}{\sum R}, \tag{5} $$

where $\sum R$ is the total rate of reinforcement for all responses.
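A minimal sketch of Equation 4, assuming hypothetical values for k and $R_0$ chosen only for illustration. It evaluates x and y for several pairs ($R_x$, $R_y$) that share the same total, so two properties of the equation can be checked directly: x/y equals $R_x/R_y$ (matching), and x + y stays fixed whenever $R_x + R_y$ and $R_0$ stay fixed.

```python
def herrnstein_rate(k: float, r_own: float, r_other: float, r_0: float) -> float:
    """Equation 4: rate of one response given own, competing, and background reinforcement."""
    return k * r_own / (r_0 + r_own + r_other)


# Hypothetical parameter values, chosen only for illustration.
K = 100.0      # maximum response rate k (responses/min)
R0 = 10.0      # background reinforcement rate R_0 (reinforcers/hour)
TOTAL = 60.0   # R_x + R_y, held constant across conditions

for r_x in (10.0, 20.0, 30.0, 45.0):
    r_y = TOTAL - r_x
    x = herrnstein_rate(K, r_x, r_y, R0)
    y = herrnstein_rate(K, r_y, r_x, R0)
    print(f"R_x={r_x:4.0f} R_y={r_y:4.0f}  x={x:6.2f}  y={y:6.2f}  "
          f"x+y={x + y:6.2f}  x/y={x / y:.3f}  R_x/R_y={r_x / r_y:.3f}")
```

Every row prints x+y = 85.71 (that is, 100 × 60/70), and the last two columns agree. Setting `R0 = 0.0` makes x + y equal k exactly.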
Thus, the original assumption of response independence was given up, since in Equations 3-5 the rate of each response depends on the rate of reinforcement for each relative to the rate of reinforcement for all. If $R_x + R_y$ = constant (with $R_0$ fixed), then Equation 4 implies that x + y = constant. This constraint will become important in later discussion.

Matching and Maximizing

A third line has been taken by Shimp (1966, 1969) and subsequently by Rachlin, Green, Kagel, and Battalio (1976). Shimp noted an equilibrating property of concurrent VI-VI schedules that works in the following way. Since the availability of reinforcement on interval schedules depends on time, the longer the animal responds on one alternative, say X, the more likely it is that a switch to the other, Y, will be immediately reinforced. It is easy to show that if the availability of reinforcement is random with respect to time, then the probability that a switch will be immediately reinforced is given by

$$ P(t) = 1 - \exp(-at), $$
where t is the time since the last response to that alternative, and a is the programmed average (maximum) reinforcement rate (Staddon, 1977b). Shimp showed by simulation that if the animal samples the alternatives at discrete, brief time intervals, and always responds to the one with the highest momentary probability of reinforcement, matching will result. More recently, Staddon (1977b) has shown analytically that the assumption of a probability threshold, together with a limit on the maximum possible response rate, yields Equation 3 for the single-response case. Hence, both matching and Herrnstein's equation (Equations 3-5) can be derived from a molecular maximizing process. Matching can also be derived from molar maximizing without making any assumptions about switching. Rachlin et al. (1976), using computer simulation and an explicit maximizing strategy, were able to derive the matching result for concurrent VI-VI schedules. There are, therefore, a number of hints suggesting that matching and possibly Herrnstein's equation as well are a by-product of simple reinforcement maximization. Despite these demonstrations, the conclusion that matching is a by-product of maximizing has met with considerable resistance. Most of the attacks have been directed at Shimp's momentary maximizing idea. In addition to overall matching, momentary maximizing also predicts particular sequences of choices by the animal, and a clear dependence of switch probability on time since the last switch. Unfortunately, sequential data fit the model poorly, and there is no evidence that switch probability depends on time since the last switch (de Villiers, 1977; Heyman & Luce, Note 1). Moreover, Heyman and Luce (Note 1) have recently shown that a more realistic switching model, based on the observed pattern of random switching, fails to predict matching. Hence, momentary maximizing based on a switching model may not explain matching on concurrent VI-VI schedules. However, as Rachlin et al. 
showed, maximizing based on a molar model is not ruled out by these arguments. We now show that a very simple maximizing model, one that considers only the optimal allocation of behavior between two alternatives and takes no explicit account of switching, is in fact sufficient to derive both matching and Herrnstein's equation (Equation 3).

[Figure 1 (plot not reproduced): response rate on one key versus response rate on the other key, in resp/min; left panel, Herrnstein (1961); right panel, Bacotti (1977).]

Figure 1. Left panel shows response rate on one key plotted against response rate on the other key from Herrnstein (1961, Figure 2). (Each point represents a different pair of variable-interval [VI] values [data for the VI 3-minute VI 3-minute schedule are averaged across the two exposures]. Conditions in which one alternative received no reinforcement have been excluded.) Right panel shows response-response plots for bird S2K from the experiment by Bacotti (1977, Table 2, no changeover delay and 5-sec changeover delay). (Each point represents a VI 4-minute schedule on one key and one of four fixed-ratio [FR] values on the other.)

Schedule Functions

We will be considering choices between VI and VI, or VI and either fixed-ratio (FR) or variable-ratio (VR) schedules. Hence, the first step in the analysis is to derive the schedule functions (i.e., the form of the feedback relation between responses made and reinforcements obtained; cf. Baum, 1973; Staddon, 1975) for ratio and interval schedules. The ratio case is obvious. Since we are considering only molar (i.e., overall session average) measures, for any ratio schedule of value m,

$$ R_y = \frac{y}{m}, \tag{6} $$

where $R_y$ is the obtained reinforcement rate, and y is the response rate, as before. For interval schedules, the case is slightly more difficult, but the function can still be derived simply (see Appendix A for a more general treatment). Consider an interval
schedule of value $T_x$ (= 1/a), the minimum average interreinforcement interval. If responding is randomly distributed in time, then at any instant the expected time to the next response is equal to the mean interresponse time, that is, to 1/x, where x is the average response rate. The expected time between reinforcements actually obtained, $D_x$, is obviously equal to the expected time between one reinforcement and the "setup" of the next, which is $T_x$, plus 1/x, the expected delay between the setup and the next response. Hence,

$$ D_x = \frac{1}{R_x} = \frac{1}{x} + T_x = \frac{1}{x} + \frac{1}{a}, $$

which reduces to

$$ R_x = \frac{ax}{a+x}, \tag{7} $$
where a is the programmed (i.e., maximum) reinforcement rate.

Response Constraint

If there is no constraint on the total amount of responding, there is no optimal solution in the choice situation: To maximize reinforcement, the animal should simply respond as rapidly as possible to both alternatives. This solution is not realistic. For the symmetrical VI-VI situation, a reasonable constraint is of the form

$$ x + cy = k, \tag{8} $$
where x and y are the rates of responding to X and Y, and c = 1. This constraint is implied by Herrnstein's model for concurrent VI-VI schedules when the sum of reinforcements for the two alternatives is held constant. However, it also appears to hold quite well even when this condition is not met (e.g., in concurrent stimulus-generalization experiments; cf. Staddon, 1977a). The left-hand panel of Figure 1 is a plot of x versus y for two pigeons from the experiment by Herrnstein (1961, Figure 2). The fits to a straight line are adequate ($r^2$ = .87 in one case and .93 in the other), deviations are not systematic, and the slopes are close to -1 (-1.04 and -.77). The right-hand panel of Figure 1 shows similar data for two conditions from a single pigeon in the concurrent VI-FR experiment by Bacotti (1977). The linear fit again is quite good, but the slope in this asymmetric situation is clearly different from -1. For a total of 15 bird conditions in Bacotti's experiment, the median $r^2$ = .92, with a range of .14-1.0. The mean slope corresponds to c = .40, favoring the ratio response. Thus, $c \neq 1$ for the concurrent ratio-interval case.

Optimal Solutions

The solution that maximizes total reinforcement rate can be found by forming an expression for the overall frequency of reinforcement as a function of response rates x and y. The maximum of this function, subject to the linear response constraint (Equation 8), is then the optimal solution. For two choices, the total reinforcement rate, $R_T$, is given by $R_T = R_x + R_y$; substituting for $R_x$ and $R_y$ for the VI-VI case from Equation 7 yields the following:

$$ R_T = \frac{ax}{a+x} + \frac{by}{b+y}, \tag{9} $$

where a and b are the rates of reinforcement programmed by the VI schedules for responses X and Y, respectively.¹

¹ Note that this approach assumes only that the distribution of X and Y responses, considered independently of each other, is exponential (i.e., random in time). It is quite indifferent to the pattern of switching between keys, provided that the overall response distributions are random. Since the two VI schedules are independently programmed, neither the procedure nor our analysis takes explicit account of switching. The situation becomes more complicated when a changeover delay is added, since this feature has implications for the optimal switching strategy. However, as an empirical matter, the results with which we are concerned are relatively insensitive to the presence or absence of a changeover delay and to variations in its duration over a substantial range (Bacotti, 1977; de Villiers, 1977; Heyman & Luce, Note 1).

There are a number of standard methods for finding the maximum of Equation 9 under the constraint (cf. Lancaster, 1968; see Appendix B for a general treatment). Perhaps the most intuitively obvious is the marginality approach: Total rate of reinforcement, $R_T$, is maximized if response rates x and y are such that the marginal changes in reinforcement associated with each are equal, that is, when

$$ \frac{dR_x}{dx} = \frac{dR_y}{dy}. $$

Incorporating the constraint (with c = 1, so that $y = k - x$), differentiating, and rearranging yields the quadratic in x:

$$ x^2(b^2 - a^2) + 2x(ab^2 + a^2b + a^2k) - a^2k(2b + k) = 0. \tag{10} $$

This quadratic has two roots, only one of which is in the allowable range ($0 \le x \le k$), namely,

$$ x = \frac{ka}{a+b}. \tag{11} $$

Taking the second derivative of $R_T$ and substituting Equation 11 for x shows this to be a maximum. Since x + y = k, y = kb/(a + b). The programmed reinforcement rates, a and b, can be eliminated using the schedule function, Equation 7, to derive an expression for x (or y) in terms of obtained rates of reinforcement. It turns out to be of the same form as Equation 11, that is,

$$ x = \frac{kR_x}{R_x + R_y} \tag{12a} $$

and

$$ y = \frac{kR_y}{R_x + R_y}, \tag{12b} $$

which is Herrnstein's equation (Equation 4) with $R_0 = 0$. It is obvious that this result also predicts matching, since $x/y = a/b = R_x/R_y$. Thus, reinforcement maximization, subject to a symmetrical linear constraint, yields both Herrnstein's equation for responding to each alternative considered separately and matching when both are considered together. We show in Appendix B that the matching result does not depend on the response constraint and is the optimal solution even if each response entails a fixed cost, provided the cost is the same for both responses. If the cost is different, biased matching of the form x/y =