ANIMAL BEHAVIOUR, 1999, 57, 233–241
Article No. anbe.1998.0944, available online at http://www.idealibrary.com

Interruptions to foraging and learning in a changing environment

SASHA R. X. DALL*, JOHN M. MCNAMARA† & INNES C. CUTHILL*
*School of Biological Sciences, University of Bristol
†School of Mathematics, University of Bristol

(Received 15 December 1997; initial acceptance 27 January 1998; final acceptance 28 July 1998; MS. number: 5730)
Many resources are both stochastic and variable in their average profitability. Animals have to sample them to track their current states, but whether it is economic to attempt this depends on many factors. Furthermore, there are many interruptions and distractions from foraging (e.g. escape from predators, bad weather, displacement by competitors) which interfere with the acquisition of information. We present a dynamic model of foraging in a stochastic and varying environment, under the constant threat of interruption, to investigate this very general problem. A forager faces two foraging options, one of which provides a known and constant reward, the other providing a reward that is not only stochastic, but whose mean payoff varies in time. The forager has to learn which option has the highest current payoff by sampling. However, interruptions to foraging can occur at any time, the timing and duration of which are beyond the animal’s control. When there is a small probability of foraging being interrupted, the forager should forage extensively on the unknown option, but as the probability of interruptions is increased, there is a sudden transition to foraging only on the known option. This occurs because interruptions affect both the level of information required to make exploitation of the unknown option profitable, and the ability to acquire and maintain that information. At what probability of being interrupted this threshold emerges is affected by the value of learning about the unknown option and the duration of interruptions. We discuss the generality of our results with reference to the pervasive problem of updating information in the face of different types of interruption.
Correspondence: I. C. Cuthill, School of Biological Sciences, Woodland Road, Bristol BS8 1UG, U.K. (email: [email protected]).
0003–3472/99/010233+09 $30.00/0 © 1999 The Association for the Study of Animal Behaviour

In an unpredictable world, the quality of prey or patch types may be changing continuously because of changes in weather, the behaviour of other animals and/or other such factors that are out of the direct control of a foraging organism. Because of this, the problem of tracking a changing environment has captured the interest of optimal foraging theorists (Stephens & Krebs 1986; Krebs & Inman 1992). On the one hand, by sampling each of its options (e.g. prey or patch types) regularly, a forager gains from being able to exploit them when they are productive, and avoid them otherwise. However, such sampling can be costly in terms of wasting time and energy on unproductive resources when there are more productive alternatives. In addition, where the states of the different options are changing continuously and independently of each other, the quality of the forager’s estimates of these states will degrade with the time since the options were last sampled. By formalizing the above, one can show that the optimal level of sampling depends on the relative costs and benefits associated with sampling and the rate at which options change state (see Stephens & Krebs 1986; Krebs & Inman 1992 for reviews).

However, there is an additional complication facing foragers tracking uncertain and changing resources: animals are rarely isolated from other demands when they are foraging. Some of these demands are under the animal’s control (e.g. provisioning young, displaying to potential mates, scanning for predators) whereas others may be unpredictable in timing or duration (e.g. escape from a predator, sheltering from bad weather, displacement by a competitor). Such demands will force foragers to take time out from foraging on and tracking their continuously changing resources (Devenport & Devenport 1994). At one extreme, foraging can be terminated for the remainder of the period being considered (e.g. the day in diurnal foragers). This may happen where, for instance, heavy snowfall in winter prevents foraging in ground-feeding birds, or the periodic covering of mud flats by tidal waters interrupts the foraging of intertidal shore birds. This is the type of interruption problem that has been considered most often from an information
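Figure 1. Schematic diagram of the situation characterized by the model. See Table 1 for definitions of variables. [Diagram not reproduced. While foraging, the animal remains uninterrupted with probability u and is interrupted with probability 1−u. While interrupted (no foraging), it remains interrupted with probability v and resumes with probability 1−v, and its information about Option 1 (X) degrades towards the long-term average (probability of Option 1 being in the good state = 0.5). At each decision the forager chooses Option 0 (gain µ0 with certainty; X degrades to 0.5) or Option 1, which keeps its current state with probability α and changes state with probability 1−α; in the good state it yields e with high probability, in the bad state with low probability.]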
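The interruption process in Figure 1 can be read as a two-state Markov chain (foraging, interrupted). The following Python sketch is illustrative only and is not taken from the paper: the function names, the default number of steps and the example values of u and v are assumptions. It compares a simulated fraction of time spent foraging with the stationary value (1−v)/((1−u)+(1−v)) that follows from the transition probabilities.

```python
import random


def simulate_foraging_fraction(u, v, steps=100_000, seed=0):
    """Simulate the foraging/interrupted chain; return the fraction of steps spent foraging.

    u: probability of remaining uninterrupted for another time step
    v: probability of remaining interrupted for another time step
    (steps and seed are illustrative choices, not values from the paper)
    """
    rng = random.Random(seed)
    foraging = True  # assumption: the animal starts out foraging
    foraging_steps = 0
    for _ in range(steps):
        if foraging:
            foraging_steps += 1
            # Stay uninterrupted with probability u, otherwise be interrupted.
            foraging = rng.random() < u
        else:
            # Stay interrupted with probability v, otherwise resume foraging.
            foraging = rng.random() >= v
    return foraging_steps / steps


def stationary_foraging_fraction(u, v):
    """Long-run fraction of time foraging for the two-state chain (needs u < 1 or v < 1)."""
    return (1 - v) / ((1 - u) + (1 - v))


if __name__ == "__main__":
    u, v = 0.9, 0.5  # hypothetical values for illustration
    print(simulate_foraging_fraction(u, v), stationary_foraging_fraction(u, v))
```

Under these assumptions, short interruptions (small v) keep the forager at the resource most of the time even when interruptions are fairly frequent, whereas long interruptions (large v) reduce foraging time sharply for the same u.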
perspective, since it has important implications for the timescale of foraging decision making; the longer the expected time-horizon available to it, the more a forager should invest in learning since it will have more opportunity to exploit the information gained (reviewed by Lucas 1990). However, such terminal interruptions represent only an extreme of the range of possible distractions to foraging that animals can face. More generally, foragers will be able to return to a particular site after spending some period of time away. How much time is spent away will depend on the context of the interruption: seeing a conspecific eaten by a predator is likely to cause foragers to stay away for longer than if they are startled by a sudden movement or if they have been displaced by a competitor. How the opportunity to return to a site, and the length of time spent away, influence decision making with incomplete information remains to be explored.

Here we present a framework for investigating the effects of interruption on learning and foraging, allowing a wide range of interruption types to be modelled. We investigate interruptions to foraging and their effects on information about resources that vary and for which there is never complete information. To do this, we use a paradigm designed to capture the effects of incomplete information and a changing environment on ‘risk sensitivity’ under rate maximization (McNamara 1996). Our model forager has to choose between an option of known, and constant, payoff and one of uncertain, and varying, payoff. Because of its similarity to choosing a fruit machine, based on an assessment of its likely payout, this sort of problem is known as a ‘one-armed bandit’ in the probability theory and economics literature (see e.g. Krebs et al. 1978). Here, one option/arm is known and the other is unknown and variable in its average payoff. The unknown option may or may not be better than the known option, but the animal can
assess this only by direct sampling. As both good and bad states of the unknown option are stochastic, even when the unknown side is ‘good’ a sampling attempt may result in no reward. So repeated sampling is necessary to gain accurate information about whether the current probability of reward on the unknown arm is high or low. Even an animal maximizing long-term rate of energy gain, as in our model, can show risk- (i.e. variance-) sensitive behaviour under these circumstances (McNamara 1996). In the current model, at any time the animal can be interrupted from foraging and resume after a certain period, during which its certainty about the state of the unknown option decays. We model its information about the unknown option as a state variable, and find the policy that maximizes the animal’s mean rate of energetic gain.

THE MODEL

Figure 1 is a schematic diagram of the model, with terms defined in Table 1. At each time step in the model, the animal can either be foraging, or not foraging owing to an interruption. The probability of interruption whilst foraging is 1−u (u=probability that the forager will remain uninterrupted for another time unit) and interruptions can happen when foraging at either option/arm. Interruptions mean that any foraging decision must wait until the next time step (t+1), when the animal can resume foraging with probability 1−v (v=probability of remaining interrupted for another time unit). Note that the probability of resumption may differ from the probability of interruption. Whilst foraging, at each of the decision times t=0, 1, 2, . . . the animal must choose between one of two foraging options. Option 0 gives the same amount of food, µ0, whenever it is sampled, so the animal always has complete information about the rate of energy gain
Table 1. One-armed time-varying bandit

Variable    Definition
u           The probability, per time step, that a forager will remain uninterrupted while foraging (so 1−u is the probability of interruption)
v           The probability, per time step, that a forager remains interrupted, having been interrupted (so 1−v is the probability of resumption)
µ0          A constant (=1). The amount of food per unit time available from Option (arm) 0
e           The quantity of food delivered, if delivered, at Option 1
p           The probability that food is delivered at Option 1
µ1          The expected amount of food per unit time from Option (arm) 1 (=pe)
µ1good      The expected amount of food per unit time from Option (arm) 1 when in its good state (p is high)
µ1bad       The expected amount of food per unit time from Option (arm) 1 when in its bad state (p is low)
d           The difference between µ1good and µ1bad (used only in figures)
α           The probability that Option 1 remains in its current state (so 1−α is the probability of change from µ1good to µ1bad or vice versa)
X           The state variable representing information about the unknown arm. It is the probability that µ1 on Option 1 is µ1good
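The quantities in Table 1 suggest how the information state X might be updated from one time step to the next. The Python sketch below is a plausible Bayesian reading under stated assumptions, not the authors' own updating functions (those are in their Appendix): it assumes that a reward on Option 1 is observed first, that Option 1 may then change state with probability 1−α, and that the change is symmetric. The function names and the parameters p_good and p_bad (the high and low values of p) are hypothetical labels introduced here.

```python
def drift(x, alpha):
    """Belief that Option 1 is good after one step with no observation of it.

    Applies the symmetric state change: good stays good with probability alpha,
    bad becomes good with probability 1 - alpha. For 0 < alpha < 1 this moves
    x closer to 0.5.
    """
    return alpha * x + (1 - alpha) * (1 - x)


def update_after_sampling(x, alpha, p_good, p_bad, food_found):
    """Belief at the next decision point after choosing Option 1.

    Assumed order of events: Bayes' rule on the observed outcome, then one
    step of the symmetric state change. Assumes 0 < p_bad < p_good < 1.
    """
    likelihood_good = p_good if food_found else 1 - p_good
    likelihood_bad = p_bad if food_found else 1 - p_bad
    posterior = (x * likelihood_good) / (x * likelihood_good + (1 - x) * likelihood_bad)
    return drift(posterior, alpha)


def expected_reward_option1(x, e, p_good, p_bad):
    """Expected reward from Option 1: mu1bad * (1 - X) + mu1good * X, with mu = p * e."""
    mu1_good = p_good * e
    mu1_bad = p_bad * e
    return mu1_bad * (1 - x) + mu1_good * x


if __name__ == "__main__":
    # Hypothetical values chosen so that mu1bad < mu0 = 1 < mu1good.
    alpha, p_good, p_bad, e = 0.9, 0.8, 0.2, 2.0
    x = 0.5
    print(update_after_sampling(x, alpha, p_good, p_bad, food_found=True))
    print(update_after_sampling(x, alpha, p_good, p_bad, food_found=False))
    print(drift(0.9, alpha))
    print(expected_reward_option1(x, e, p_good, p_bad))
```

With these illustrative values, a reward from X=0.5 raises the belief to about 0.74, a failure lowers it to about 0.26, and an unsampled step moves a belief of 0.9 back to about 0.82, which is the qualitative pattern the model describes.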
Option 0 provides. Conversely, Option 1 is stochastic, giving an amount of food e with probability p and no food with probability 1−p. The expected payoff from Option 1 is thus pe=µ1. If p did not change, with continued sampling the animal could gain a precise estimate of p and hence make a choice, once and for all, between foraging exclusively on arm 0 or 1, whichever gave the highest average rate of energy gain. Unless µ1=µ0, once sampling had given the animal sufficient information about p (or µ1), the rate-maximizing solution would be to choose one option exclusively. However, in our model, the probability p varies over time (the amount of food e is constant). Because of this, we can investigate the optimal behaviour when the value of Option 1 is sometimes better than Option 0, but sometimes worse. Owing to the stochasticity of reward, the state of Option 1 (or its expected rate of energy gain, µ1) is uncertain and a single visit can never provide complete information. µ1 (=pe) varies between two states µ1good>µ1bad such that if µ1=µ1good at time t then µ1=µ1good at t+1 with probability α and µ1(t+1)=µ1bad with probability 1−α. If µ1=µ1bad at t then µ1(t+1)=µ1bad with probability α and µ1(t+1)=µ1good with probability 1−α. That is, for the results presented here, the probabilities of change from good to bad are the same as from bad to good (but need not equal 0.5). In other models these probabilities of change need not be symmetrical. At any time the animal
has incomplete information about µ1, and can gain information only by choosing Option 1 and noting whether food is obtained. We find the strategy that maximizes the animal’s mean long-term rate of food gain (the optimality equations are given in the Appendix). A strategy is a rule for choosing between the two foraging options based on experience, where experience determines the animal’s information about the unknown arm. We represent the animal’s current information on the system by a value X which is the probability that µ1=µ1good. As µ0 is known with certainty, and µ1 can only be µ1good or µ1bad, then X is a complete representation of the information on the system. If a long time has elapsed since Option 1 was chosen, X will be close to 0.5, provided that the probability of change is symmetric between the states of µ1, as it is for the results described here. This is true whether the animal did not choose Option 1 or was interrupted from foraging. Similarly, if Option 1 is not chosen at time t, X will be closer to 0.5 by time t+1. If Option 1 is chosen and no food is found, then X will be less at the next decision point than if Option 1 had not been chosen. Conversely, if Option 1 is chosen and a food item is found, X will be greater at the next decision point (the information updating functions are given in the Appendix). Choosing Option 1 gives an expected reward of µ1bad(1−X)+µ1goodX, while choosing Option 0 always gives a reward of µ0. If µ0<µ1bad or µ0>µ1good, the optimal policy is trivial. In the former case the animal should always exploit the unknown option; in the latter case it should never exploit the unknown option. Thus in this paper we restrict attention to the case µ1bad