bioRxiv preprint first posted online Jun. 11, 2018; doi: http://dx.doi.org/10.1101/344028. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license.
Optimizing sequential decisions in the drift-diffusion model

Khanh P. Nguyen^a, Krešimir Josić^{a,b,c,f}, Zachary P. Kilpatrick^{d,e,f}

a Department of Mathematics, University of Houston, Houston TX 77204 ([email protected], [email protected])
b Department of Biology and Biochemistry, University of Houston, Houston TX 77204
c Department of BioSciences, Rice University, Houston TX 77005
d Department of Applied Mathematics, University of Colorado, Boulder, CO 80309, USA ([email protected])
e Department of Physiology and Biophysics, University of Colorado School of Medicine, Aurora, CO 80045
f Equal authorship
Abstract

To make decisions, organisms often accumulate information across multiple timescales. However, most experimental and modeling studies of decision-making focus on sequences of independent trials. On the other hand, natural environments are characterized by long temporal correlations, and evidence used to make a present choice is often relevant to future decisions. To understand decision-making under these conditions we analyze how a model ideal observer accumulates evidence to freely make choices across a sequence of correlated trials. We use principles of probabilistic inference to show that an ideal observer incorporates information obtained on one trial as an initial bias on the next. This bias decreases the time, but not the accuracy, of the next decision. Furthermore, in finite sequences of trials the rate of reward is maximized when the observer deliberates longer for early decisions, but responds more quickly towards the end of the sequence. Our model also explains experimentally observed patterns in decision times and choices, thus providing a mathematically principled foundation for evidence-accumulation models of sequential decisions.

Keywords: decision-making, drift-diffusion model, reward rate, sequential correlations
1. Introduction

Organismal behavior is often driven by decisions that are the result of evidence accumulated to determine the best among available options (Gold and Shadlen, 2007; Brody and Hanks, 2016). For instance, honeybee swarms use a democratic process in which each bee's opinion is communicated to the group to decide which nectar source to forage (Seeley et al., 1991). Competitive animals evaluate their opponents' attributes to decide whether to fight or flee (Stevenson and Rillich, 2012), and humans decide which stocks to buy or sell, based on individual research and social information (Moat et al., 2013). Importantly, the observations of these agents are frequently uncertain (Hsu et al., 2005; Brunton et al., 2013), so accurate decision-making requires robust evidence integration that accounts for the reliability and variety of evidence sources (Raposo et al., 2012).

The two alternative forced choice (TAFC) task paradigm has been successful in probing the behavioral trends and neural mechanisms underlying decision-making (Ratcliff, 1978). In a TAFC task subjects decide which one of two hypotheses is more likely based on noisy evidence (Gold and Shadlen, 2002; Bogacz et al., 2006). For instance, in the random dot motion discrimination task, subjects decide whether a cloud of noisy dots predominantly moves in one of two directions. Such stimuli evoke strong responses in primate motion-detecting areas, motivating their use in the experimental study of neural mechanisms underlying decision-making (Shadlen and Newsome, 2001; Gold and Shadlen, 2007). The response trends and underlying neural activity are well described by the drift-diffusion model (DDM), which associates a subject's belief with a particle drifting and diffusing between two boundaries, with decisions determined by the first boundary the particle encounters (Stone, 1960; Bogacz et al., 2006; Ratcliff and McKoon, 2008). The DDM is popular because (a) it can be derived as the continuum limit of the statistically-optimal sequential probability ratio test (Wald and Wolfowitz, 1948; Bogacz et al., 2006); (b) it is an analytically tractable Wiener diffusion process whose summary statistics can be computed explicitly (Ratcliff and Smith, 2004; Bogacz et al., 2006); and (c) it can be fit remarkably well to behavioral responses and neural activity in TAFC tasks with independent trials (Gold and Shadlen, 2002, 2007) (although see Latimer et al. (2015)).

However, the classical DDM does not describe many aspects of decision-making in natural environments. For instance, the DDM is typically used to model a series of independent trials where evidence accumulated during one trial is not informative about the correct choice on other trials (Ratcliff and McKoon, 2008). Organisms in nature often make a sequence of related decisions based on overlapping evidence (Chittka et al., 2009). Consider an animal deciding which way to turn while fleeing a pursuing predator: To maximize its chances of escape its decisions should depend on both its own and the predator's earlier movements (Corcoran and Conner, 2016). Animals foraging over multiple days are biased towards food sites with consistently high yields (Gordon, 1991). Thus even in a variable environment, organisms use previously gathered information to make future decisions (Dehaene and Sigman, 2012). We need to extend previous experimental designs and corresponding models to understand if and how they do so.

Even in a sequence of independent trials, previous choices influence subsequent decisions (Fernberger, 1920). Such serial response dependencies have been observed in TAFC tasks which do (Cho et al., 2002; Fründ et al., 2014) and do not (Bertelson, 1961) require accumulation of evidence across trials. For instance, a subject's response time may decrease when the current state is the same as the previous state (Pashler and Baylis, 1991). Thus, trends in response time and accuracy suggest subjects use trial history to predict the current state of the environment, albeit suboptimally (Kirby, 1976). Goldfarb et al. (2012) examined responses in a series of dependent trials with the correct choice across trials evolving according to a two-state Markov process. The transition probabilities affected the response time and accuracy of subjects in ways well described by a DDM with biased initial conditions and thresholds. For instance, when repeated states were more likely, response times decreased in the second of two repeated trials. History-dependent biases also increase the probability of repeat responses when subjects view sequences with repetition probabilities above chance (Abrahamyan et al., 2016; Braun et al., 2018). These results suggest that an adaptive DDM with an initial condition biased towards the previous decision is a good model of human decision-making across correlated trials (Goldfarb et al., 2012).

Most earlier models proposed to explain these observations are intuitive and recapitulate behavioral data, but are at least partly heuristic (Goldfarb et al., 2012; Fründ et al., 2014). Yu and Cohen (2008) have proposed a normative model that assumes the subjects are learning non-stationary transition rates in a dependent sequence of trials. However, they did not examine the normative model for known transition rates. Such a model provides a standard for experimental subject performance, and a basis for approximate models, allowing us to better understand the heuristics subjects use to make decisions (Ratcliff and McKoon, 2008; Brunton et al., 2013). Here, we extend previous DDMs to provide a normative model of evidence accumulation in serial trials evolving according to a two-state Markov process whose transition rate is known to the observer.
We use sequential analysis to derive the posterior for the environmental states H± given a stream of noisy observations (Bogacz et al., 2006; Veliz-Cuba et al., 2016). Under this model, ideal observers incorporate information from previous trials to bias their initial belief on subsequent trials. This decreases the average time to the next decision, but not necessarily its accuracy. Furthermore, in finite sequences of trials the rate of reward is maximized when the observer deliberates longer for early decisions, but responds more quickly towards the end of the sequence. We also show that the model agrees with experimentally observed trends in decisions.

2. A model of sequential decisions

We model a repeated TAFC task with the environmental state in each trial (H+ or H−) chosen according to a two-state Markov process: In a sequence of n trials, the correct choices (environmental states or hypotheses) H^{1:n} = (H^1, H^2, ..., H^n) are generated so that P(H^1 = H±) = 1/2 and P(H^i = H∓ | H^{i−1} = H±) = ε for i = 2, ..., n (see Fig. 1). When ε = 0.5 the states of subsequent trials are independent, and ideally the decision on one trial should not bias future decisions (Ratcliff, 1978; Shadlen and Newsome, 2001; Gold and Shadlen, 2007, 2002; Bogacz et al., 2006; Ratcliff and McKoon, 2008). When 0 ≤ ε < 0.5, repetitions are more likely and trials are dependent, and evidence obtained during one trial can inform decisions on the next [1]. We will show that ideal observers use their decision on the previous trial to adjust their prior over the states at the beginning of the following trial.

[1] If 0.5 < ε ≤ 1, states are more likely to alternate. The analysis is similar to the one we present here, so we do not discuss this case separately.
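As a concrete illustration of this generative model, the following minimal Python sketch (our own code and naming, not from the original paper) draws a sequence of trial states from the two-state Markov process with switching probability ε:

```python
import numpy as np

def sample_states(n_trials, eps, rng=None):
    """Draw trial states H^{1:n} in {+1, -1} from the two-state Markov process:
    P(H^1 = +1) = 1/2 and P(H^i != H^{i-1}) = eps."""
    rng = np.random.default_rng(rng)
    states = np.empty(n_trials, dtype=int)
    states[0] = rng.choice([1, -1])          # both states equally likely on trial 1
    for i in range(1, n_trials):
        switch = rng.random() < eps          # switch with probability eps
        states[i] = -states[i - 1] if switch else states[i - 1]
    return states

# Example: eps = 0.25 favors repetitions, so runs of the same state are common.
print(sample_states(10, eps=0.25, rng=1))
```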
Figure 1: Sequence of TAFC trials with states H^n determined by a two-state Markov process with switching probability ε := P(H^n ≠ H^{n−1}). For instance, H^{1:4} = (H+, H+, H−, H+) could stand for the drift direction of each drift-diffusion process y^n(t) corresponding to the decision variable of an ideal observer making noisy measurements, ξ^n(t). The decision threshold (±θ_{n−1}) crossed in trial n−1 determines the sign of the initial condition in trial n: y^n(0).
Importantly, we assume that all rewards are given at the end of the trial sequence (Braun et al., 2018). Rewarding the correct choice on trial i would instead provide unambiguous information about the state H^i, superseding the noisy information gathered over that trial. Our results can be easily extended to this case, as well as to cases when rewards are only given with some probability.

Our main contribution is to derive and analyze the ideal observer model for this sequential version of the TAFC task. We use principles of probabilistic inference to derive the sequence of DDMs, and optimization to determine the decision thresholds that maximize the reward rate (RR). Due to the tractability of the DDM, the correct probability and decision times that constitute the RR can be computed analytically. We also demonstrate that response biases, such as repetitions, commonly observed in sequential decision tasks, follow naturally from this model.

3. Optimizing decisions in a sequence of uncorrelated trials

We first derive in Appendix A and summarize here the optimal evidence-accumulation model for a sequence of independent TAFC trials (P(H^i = H^{i−1}) = 0.5). Although these results can be found in previous work (Bogacz et al., 2006), they are crucial for the subsequent discussion, and we thus provide them for completeness. A reader familiar with these classical results can skip to Section 4, and refer back to results in this section as needed. The drift-diffusion model (DDM) can be derived as the continuum limit of a recursive equation for the log-likelihood ratio (LLR) (Wald and Wolfowitz, 1948; Gold and Shadlen, 2002; Bogacz et al., 2006). When the environmental states are uncorrelated and unbiased, an ideal observer has no bias at the start of each trial.

3.1. Drift-diffusion model for a single trial

We assume that on a trial the observer integrates a stream of noisy measurements of the true state, H^1. If these measurements are conditionally independent, the functional central limit theorem yields the DDM for the scaled LLR, $y^1(t) = D \log \frac{P(H^1 = H_+ \mid \text{observations})}{P(H^1 = H_- \mid \text{observations})}$, after observation time t,
\[
dy^1 = g^1\,dt + \sqrt{2D}\,dW. \tag{1}
\]
Here W is a Wiener process, the drift g^1 ∈ g_± depends on the environmental state, and D is the variance, which depends on the noisiness of each observation.
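To make the dynamics in Eq. (1) concrete, here is a minimal Euler–Maruyama simulation with absorbing thresholds at ±θ, as used in the free-response protocol described below. This is an illustrative sketch (function and parameter names are ours, not the paper's), not the analytical treatment that follows:

```python
import numpy as np

def simulate_ddm_trial(g, D, theta, dt=1e-3, y0=0.0, t_max=50.0, rng=None):
    """Euler-Maruyama simulation of dy = g*dt + sqrt(2D)*dW with absorbing
    thresholds at +/- theta (free-response protocol). Returns (decision, time)."""
    rng = np.random.default_rng(rng)
    y, t = y0, 0.0
    while abs(y) < theta and t < t_max:
        y += g * dt + np.sqrt(2 * D * dt) * rng.standard_normal()
        t += dt
    return np.sign(y), t

# Example: true state H_+ (g = +1), moderate noise, threshold theta = 1.
trials = [simulate_ddm_trial(g=1.0, D=1.0, theta=1.0, rng=k) for k in range(200)]
accuracy = np.mean([d == 1 for d, _ in trials])
mean_dt = np.mean([t for _, t in trials])
print(f"empirical accuracy ~ {accuracy:.2f}, mean decision time ~ {mean_dt:.2f}")
```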
For simplicity and consistency with typical random dot kinetogram tasks (Schall, 2001; Gold and Shadlen, 2007), we assume each drift direction is of equal unit strength: g_+ = −g_− = 1, and task difficulty is controlled by scaling the variance through D (see footnote 2 below). The initial condition is determined by the observer's prior bias, y^1(0) = D ln[P(H^1 = H_+)/P(H^1 = H_−)]. Thus for an unbiased observer, y^1(0) = 0.

There are two primary ways of obtaining a response from the DDM given by Eq. (1) that mirror common experimental protocols (Shadlen and Newsome, 2001; Gold and Shadlen, 2002, 2007): An ideal observer interrogated at a set time, t = T, responds with sign(y(T)) = ±1, indicating the more likely of the two states, H^1 = H_±, given the accumulated evidence. On the other hand, an observer free to choose their response time can trade speed for accuracy in making a decision. This is typically modeled in the DDM by defining a decision threshold, θ_1, and assuming that at the first time, T_1, at which |y^1(T_1)| ≥ θ_1, the evidence accumulation process terminates, and the observer chooses H_± if sign(y^1(T_1)) = ±1.

The probability, c_1, of making a correct choice in the free response paradigm can be obtained using the Fokker–Planck (FP) equation corresponding to Eq. (1) (Gardiner, 2009). Given the initial condition y^1(0) = y_0^1 = D ln[P(H^1 = H_+)/P(H^1 = H_−)], threshold θ_1, and state H^1 = H_+, the probability of an exit through either boundary ±θ_1 is
\[
\pi_{\theta_1}(y_0^1) = \frac{1 - e^{-(y_0^1+\theta_1)/D}}{1 - e^{-2\theta_1/D}}, \qquad
\pi_{-\theta_1}(y_0^1) = \frac{e^{-(y_0^1+\theta_1)/D} - e^{-2\theta_1/D}}{1 - e^{-2\theta_1/D}}, \tag{2}
\]
simplifying at y_0^1 = 0 to
\[
\pi_{\theta_1}(0) = \frac{1}{1 + e^{-\theta_1/D}} =: c_1, \qquad
\pi_{-\theta_1}(0) = \frac{e^{-\theta_1/D}}{1 + e^{-\theta_1/D}} = 1 - c_1. \tag{3}
\]
An exit through the threshold θ_1 results in a correct choice of H^1 = H_+, so c_1 = π_{θ_1}(0). The correct probability c_1 increases with θ_1, since more evidence is required to reach a larger threshold. Defining the decision in trial 1 as d_1 = ±1 if y^1(T_1) = ±θ_1, Bayes' rule implies
\[
\frac{1}{1 + e^{-\theta_1/D}} = c_1 = P(d_1 = \pm 1 \mid H^1 = H_\pm) = P(H^1 = H_\pm \mid d_1 = \pm 1), \tag{4a}
\]
\[
\frac{e^{-\theta_1/D}}{1 + e^{-\theta_1/D}} = 1 - c_1 = P(d_1 = \pm 1 \mid H^1 = H_\mp) = P(H^1 = H_\mp \mid d_1 = \pm 1), \tag{4b}
\]
since P(H^1 = H_±) = P(d_1 = ±1) = 1/2. Rearranging the expressions in Eq. (4) and isolating θ_1 relates the threshold θ_1 to the LLR given a decision d_1 = ±1:
\[
\pm\theta_1 = D \ln \frac{P(H^1 = H_+ \mid d_1 = \pm 1)}{P(H^1 = H_- \mid d_1 = \pm 1)}. \tag{5}
\]
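The exit probabilities in Eqs. (2)–(3) are easy to evaluate directly. The short sketch below (our own helper functions, transcribed from the equations above) computes them and can be checked against the Monte Carlo simulation sketched earlier:

```python
import numpy as np

def exit_probs(y0, theta, D):
    """Eq. (2): probability of first exit through +theta and -theta,
    given drift g = +1 (state H_+) and initial condition y0."""
    p_plus = (1 - np.exp(-(y0 + theta) / D)) / (1 - np.exp(-2 * theta / D))
    return p_plus, 1 - p_plus

def correct_prob(theta, D):
    """Eq. (3): probability c_1 of a correct choice for an unbiased start (y0 = 0)."""
    return 1.0 / (1.0 + np.exp(-theta / D))

theta, D = 1.0, 1.0
print(exit_probs(0.0, theta, D))   # (c_1, 1 - c_1)
print(correct_prob(theta, D))      # ~0.731 for theta = D = 1
```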
3.2. Tuning performance via speed-accuracy tradeoff

Increasing θ_1 increases the probability of a correct decision, and the average time to make a decision, DT_1. Humans and other animals balance speed and accuracy to maximize the rate of correct decisions (Chittka et al., 2009; Bogacz et al., 2010). This is typically quantified using the reward rate (RR) (Gold and Shadlen, 2002):
\[
RR_1(\theta_1) = \frac{c_1}{DT_1 + T_D}, \tag{6}
\]
where c_1 is the probability of a correct decision, DT_1 is the mean time required for y^1(t) to reach either threshold ±θ_1, and T_D is the prescribed time delay to the start of the next trial (Gold and Shadlen, 2002; Bogacz et al., 2006). Eq. (6) increases with c_1 (accuracy) and with inverse time 1/(DT_1 + T_D) (speed). This usually leads to a nonmonotonic dependence of RR_1 on the threshold, θ_1, since increasing θ_1 increases accuracy, but decreases speed (Gold and Shadlen, 2002; Bogacz et al., 2006).

Footnote 2: An arbitrary drift amplitude g can also be scaled out via a change of variables y = g·ỹ, so the resulting DDM for ỹ has unit drift, and D̃ = D/g².
The average response time, DT_1, can be obtained as the solution of a mean exit time problem for the Fokker–Planck (FP) equation for p(y^1, t) with absorbing boundaries, p(±θ_1, t) = 0 (Bogacz et al., 2006; Gardiner, 2009). Since the RR is determined by the average decision time over all trials, we compute the unconditional mean exit time,
\[
T(y_0^1; \theta_1) = \theta_1 \frac{e^{\theta_1/D} + e^{-\theta_1/D} - 2 e^{-y_0^1/D}}{e^{\theta_1/D} - e^{-\theta_1/D}} - y_0^1, \tag{7}
\]
which for y_0^1 = 0 simplifies to
\[
T(0; \theta_1) = \theta_1 \left[ \frac{1 - e^{-\theta_1/D}}{1 + e^{-\theta_1/D}} \right] = DT_1. \tag{8}
\]
Plugging this expression into the RR as defined in Eq. (6), assuming y_0^1 = 0, we have
\[
RR_1(\theta_1) = \left[ \theta_1 \left( 1 - e^{-\theta_1/D} \right) + T_D \left( 1 + e^{-\theta_1/D} \right) \right]^{-1}.
\]
We can identify the maximum of RR_1(θ_1) > 0 by finding the minimum of its reciprocal, 1/RR_1(θ_1) (Bogacz et al., 2006),
\[
\theta_1^{\mathrm{opt}} = T_D + D - D\, W\!\left( \exp\left[ (T_D + D)/D \right] \right). \tag{9}
\]
Here W(f) is the Lambert W function (the inverse of f(W) = W e^W). In the limit T_D → 0, we have θ_1^{opt} → 0, and Eq. (9) defines nontrivial optimal thresholds at T_D > 0.
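As a quick numerical check of Eq. (9), one can evaluate the optimal threshold with scipy's Lambert W implementation and compare it with a brute-force maximization of RR_1 (a sketch with our own parameter choices):

```python
import numpy as np
from scipy.special import lambertw
from scipy.optimize import minimize_scalar

def rr1(theta, D, TD):
    """Reward rate for a single trial, RR_1(theta) = c_1 / (DT_1 + T_D)."""
    c1 = 1.0 / (1.0 + np.exp(-theta / D))
    dt1 = theta * (1 - np.exp(-theta / D)) / (1 + np.exp(-theta / D))
    return c1 / (dt1 + TD)

D, TD = 1.0, 2.0
theta_opt = TD + D - D * lambertw(np.exp((TD + D) / D)).real   # Eq. (9)
numerical = minimize_scalar(lambda th: -rr1(th, D, TD), bounds=(1e-6, 20), method="bounded")
print(theta_opt, numerical.x)   # the two estimates should agree closely
```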
Having established a policy for optimizing performance in a sequence of independent TAFC trials, in the next section we consider dependent trials. We can again explicitly compute the RR function for sequential TAFC trials with states, H^{1:n} = (H^1, H^2, H^3, ..., H^n), evolving according to a two-state Markov process with parameter ε := P(H^{n+1} ≠ H^n). Unlike above, where ε = 0.5, we will see that when ε ∈ [0, 0.5), an ideal observer starts each trial H^n for n ≥ 2 by using information obtained over the previous trial to bias their initial belief, y_0^n ≠ 0.

4. Integrating information across two correlated trials

We first focus on the case of two sequential TAFC trials. Both states are equally likely at the first trial, P(H^1 = H_±) = 1/2, but the state at the second trial can depend on the state of the first, ε := P(H^2 = H_∓ | H^1 = H_±). On both trials an observer makes observations, ξ_s^n, with conditional density f_±(ξ) if H^n = H_±, to infer the state H^n. The observer also uses information from trial 1 to infer the state at trial 2. All information about H^1 can be obtained from the decision variable, d_1 = ±1, and ideal observers use the decision variable to set their initial belief, y_0^2 ≠ 0, at the start of trial 2. We later show that the results for two trials can be extended to trial sequences of arbitrary length with states generated according to the same two-state Markov process.

4.1. Optimal observer model for two correlated trials

The first trial is equivalent to the single trial case discussed in Section 3.1. Assuming each choice is equally likely, P(H^1 = H_±) = 1/2, no prior information exists. Therefore, the decision variable, y^1(t), satisfies the DDM given by Eq. (1) with y^1(0) = 0. This generates a decision d_1 ∈ ±1 if y^1(T_1) = ±θ_1, so that the probability of a correct decision c_1 is related to the threshold θ_1 by Eq. (4). Furthermore, since ε = P(H^2 = H_∓ | H^1 = H_±), it follows that 1 − ε = P(H^2 = H_± | H^1 = H_±), which means
\[
\begin{aligned}
P(H^2 = H_\pm \mid d_1 = \pm 1) &= P(H^2 = H_\pm \mid H^1 = H_\pm) P(H^1 = H_\pm \mid d_1 = \pm 1) + P(H^2 = H_\pm \mid H^1 = H_\mp) P(H^1 = H_\mp \mid d_1 = \pm 1) \\
&= (1 - \epsilon) c_1 + \epsilon (1 - c_1),
\end{aligned} \tag{10}
\]
and, similarly,
\[
P(H^2 = H_\mp \mid d_1 = \pm 1) = (1 - \epsilon)(1 - c_1) + \epsilon\, c_1. \tag{11}
\]
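For concreteness, the prior that the ideal observer carries into trial 2, Eqs. (10)–(11), amounts to a one-line update (a sketch with our own naming):

```python
import numpy as np

def prior_next_state(c1, eps):
    """Eqs. (10)-(11): probability that trial 2's state matches (or opposes) decision d_1."""
    p_match = (1 - eps) * c1 + eps * (1 - c1)
    return p_match, 1 - p_match

c1 = 1.0 / (1.0 + np.exp(-1.0))      # accuracy on trial 1 for theta_1 = D = 1, Eq. (3)
print(prior_next_state(c1, eps=0.25))
```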
As the decision, d_1 = ±1, determines the probability of each state at the end of trial 1, individual observations, ξ_{1:s}^1, are not needed to define the belief of the ideal observer at the outset of trial 2. The ideal observer uses a sequence of observations, ξ_{1:s}^2, and their decision on the previous trial, d_1, to arrive at the probability ratio
\[
R_s^2 = \frac{P(H^2 = H_+ \mid \xi_{1:s}^2, d_1)}{P(H^2 = H_- \mid \xi_{1:s}^2, d_1)}
      = \frac{P(\xi_{1:s}^2 \mid H^2 = H_+)\, P(H^2 = H_+ \mid d_1)}{P(\xi_{1:s}^2 \mid H^2 = H_-)\, P(H^2 = H_- \mid d_1)}. \tag{12}
\]
Taking the logarithm, and applying conditional independence of the measurements ξ_{1:s}^2, we have that
\[
L_s^2 = \ln R_s^2 = \sum_{j=1}^{s} \ln \frac{f_+(\xi_j^2)}{f_-(\xi_j^2)} + \ln \frac{P(H^2 = H_+ \mid d_1)}{P(H^2 = H_- \mid d_1)},
\]
indicating that $L_0^2 = \ln\left[ P(H^2 = H_+ \mid d_1) / P(H^2 = H_- \mid d_1) \right]$. Taking the temporal continuum limit as in Appendix A, we find
\[
dy^2 = g^2\,dt + \sqrt{2D}\,dW, \tag{13}
\]
with the Wiener process, W, and variance defined as in Eq. (1). The drift g^2 ∈ ±1 is determined by H^2 = H_±. Furthermore, the initial belief is biased in the direction of the previous decision, as the observer knows that states are correlated across trials. The initial condition for Eq. (13) is therefore
\[
y^2(0; d_1 = \pm 1) = D \ln \frac{P(H^2 = H_+ \mid d_1 = \pm 1)}{P(H^2 = H_- \mid d_1 = \pm 1)}
= \pm D \ln \frac{(1 - \epsilon)\, e^{\theta_1/D} + \epsilon}{\epsilon\, e^{\theta_1/D} + (1 - \epsilon)} = \pm y_0^2, \tag{14}
\]
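Eq. (14) is simple to evaluate; the sketch below (our own helper, following the formula above) computes the biased initial condition carried into trial 2 and illustrates its limiting behavior:

```python
import numpy as np

def initial_bias(theta1, eps, D):
    """Eq. (14): magnitude y_0^2 of the initial condition on trial 2,
    given threshold theta1 on trial 1 and switching probability eps."""
    return D * np.log(((1 - eps) * np.exp(theta1 / D) + eps)
                      / (eps * np.exp(theta1 / D) + (1 - eps)))

theta1, D = 1.0, 1.0
for eps in (0.5, 0.25, 0.1, 1e-6):
    print(eps, initial_bias(theta1, eps, D))
# y_0^2 -> 0 as eps -> 0.5 (no information carried forward),
# and y_0^2 -> theta1 as eps -> 0 (full carry-over of the trial-1 endpoint).
```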
where we have used Eqs. (4), (10), and (11). Note that y_0^2 → 0 in the limit ε → 0.5, so no information is carried forward when states are uncorrelated across trials. In the limit ε → 0, we find y_0^2 → θ_1, so the ending value of the decision variable y^1(T_1) = ±θ_1 in trial 1 is carried forward to trial 2, since there is no change in environmental state from trial 1 to 2. For ε ∈ (0, 1/2), the information gathered on the previous trial provides partial information about the next state, and we have y_0^2 ∈ (0, θ_1).

We assume that the decision variable, y^2(t), evolves according to Eq. (13), until it reaches a threshold ±θ_2, at which point the decision d_2 = ±1 is registered. The full model is thus specified by
\[
dy^j = g^j\,dt + \sqrt{2D}\,dW, \quad \text{until } |y^j(T_j)| \geq \theta_j \mapsto d_j = \mathrm{sign}(y^j(T_j)), \tag{15a}
\]
\[
y^1(0) = 0, \quad y^2(0) = \pm y_0^2 \text{ if } d_1 = \pm 1, \quad \text{and } g^j = \pm 1 \text{ if } H^j = H_\pm, \; j = 1, 2. \tag{15b}
\]
The above analysis is easily extended to the case of arbitrary n ≥ 2, but before we do so, we analyze the impact of state correlations on the reward rate (RR) and identify the decision threshold values θ_{1:2} that maximize the RR.

Remark: As noted earlier, we assume the total reward is given at the end of the experiment, and not after each trial (Braun et al., 2018). A reward for a correct choice provides the observer with complete information about the state H^n, so that
\[
y^2(0; H^1 = \pm 1) = D \ln \frac{P(H^2 = H_+ \mid H^1 = \pm 1)}{P(H^2 = H_- \mid H^1 = \pm 1)} = \pm D \ln \frac{1 - \epsilon}{\epsilon}.
\]
A similar result holds in the case that the reward is only given with some probability.

4.2. Performance for constant decision thresholds

We next show how varying the decision thresholds impacts the performance of the observer. For simplicity, we first assume the same threshold is used in both trials, θ_{1:2} = θ. We quantify performance using a reward rate (RR) function across both trials:
\[
RR_{1:2} = \frac{c_1 + c_2}{DT_1 + DT_2 + 2 T_D}, \tag{16}
\]
Figure 3: A,B. Reward rate (RR), given by Eq. (20), depends on the change rate ε and thresholds θ_{1:2}. For ε = 0.25 (panel A), the pair θ_{1:2}^{max} that maximizes RR (red dot) satisfies θ_1^{max} > θ_2^{max}. Compare to the pair (red circle) for the constant threshold θ = θ_{1:2} as in Fig. 2D. The boundary θ_2 = y_0^2(θ_1), obtained from Eq. (14), is shown as a dashed blue line. For ε = 0.1 (panel B), the optimal choice of thresholds (red dot) is on the boundary θ_2 = y_0^2(θ_1). C. Decreasing ε from 0.5 initially results in an increase in θ_1^{max} and a decrease in θ_2^{max}. At sufficiently small values of ε, the second threshold collides with the initial condition, θ_2 = y_0^2(θ_1) (vertical black line). After this point, both θ_1^{max} and θ_2^{max} increase with ε, and meet at ε = 0. The optimal constant threshold, θ^{max}, is represented by a thin blue line. D. RR_{1,2}^{max}(ε) decreases as ε is increased, since the observer carries less information from trial 1 to trial 2. The thin blue line shows RR_{1,2}^{max} for the constant threshold case.
average, closer to the threshold θ. This shortens the average decision time, and increases the RR (Fig. 2C, D). Lastly, note that the threshold value, θ, that maximizes the RR is higher for lower values of ε, as the observer can afford to require more evidence for a decision when starting trial 2 with more information.

4.3. Performance for dynamic decision thresholds

We next ask whether the RR given by Eq. (16) can be increased by allowing unequal decision thresholds between the trials, θ_1 ≠ θ_2. As before, the probability of a correct response, c_1 = π_{θ_1}(0), and mean decision time, DT_1, in trial 1 are given by Eqs. (3) and (8). The correct probability c_2 in trial 2 is again determined by the decision threshold, and noise variance, D, as long as θ_2 > y_0^2. However, when θ_2 < y_0^2, the response in trial 2 is instantaneous (DT_2 = 0) and c_2 = (1 − ε)c_1 + ε(1 − c_1). Therefore, the probability of a correct response on trial 2 is defined piecewise
\[
c_2 = \begin{cases} \dfrac{1}{1 + e^{-\theta_2/D}} & : \theta_2 > y_0^2, \\[2ex] \dfrac{(1 - \epsilon) + \epsilon\, e^{-\theta_1/D}}{1 + e^{-\theta_1/D}} & : \theta_2 \leq y_0^2. \end{cases} \tag{18}
\]
We will show in the next section that this result extends to an arbitrary number of trials, with an arbitrary sequence of thresholds, θ_{1:n}. In the case θ_2 > y_0^2, the average time until a decision in trial 2 is (see Appendix C)
\[
DT_2 = \theta_2 \frac{1 - e^{-\theta_2/D}}{1 + e^{-\theta_2/D}} - \frac{(1 - 2\epsilon)(1 - e^{-\theta_1/D})}{1 + e^{-\theta_1/D}}\, D \ln \frac{(1 - \epsilon)\, e^{\theta_1/D} + \epsilon}{\epsilon\, e^{\theta_1/D} + (1 - \epsilon)}, \tag{19}
\]
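A direct transcription of Eqs. (18)–(19) into code (a sketch with our own function names, reusing the formula of Eq. (14) for the initial bias) makes the piecewise structure explicit:

```python
import numpy as np

def y0_2(theta1, eps, D):
    """Eq. (14): initial bias carried into trial 2."""
    return D * np.log(((1 - eps) * np.exp(theta1 / D) + eps)
                      / (eps * np.exp(theta1 / D) + (1 - eps)))

def c2_and_dt2(theta1, theta2, eps, D):
    """Eqs. (18)-(19): correct probability and mean decision time on trial 2."""
    c1 = 1.0 / (1.0 + np.exp(-theta1 / D))
    y0 = y0_2(theta1, eps, D)
    if theta2 <= y0:                       # instantaneous repeat of decision 1
        return (1 - eps) * c1 + eps * (1 - c1), 0.0
    c2 = 1.0 / (1.0 + np.exp(-theta2 / D))
    dt2 = (theta2 * (1 - np.exp(-theta2 / D)) / (1 + np.exp(-theta2 / D))
           - (1 - 2 * eps) * (1 - np.exp(-theta1 / D)) / (1 + np.exp(-theta1 / D)) * y0)
    return c2, dt2

print(c2_and_dt2(theta1=1.0, theta2=0.8, eps=0.25, D=1.0))
```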
which allows us to compute the reward rate function RR_{1:2}(ε) = RR_{1:2}(θ_{1:2}; ε):
\[
RR_{1:2}(\epsilon) = \begin{cases}
\dfrac{E_+^1 + E_+^2}{E_+^2 E_-^1 \left[ \theta_1 - (1 - 2\epsilon)\, y_0^2 \right] + E_+^1 \left[ \theta_2 E_-^2 + 2 T_D E_+^2 \right]} & : \theta_2 > y_0^2, \\[3ex]
\dfrac{2(1 - \epsilon) + \epsilon\, E_+^1}{\theta_1 E_-^1 + 2 T_D E_+^1} & : \theta_2 \leq y_0^2,
\end{cases} \tag{20}
\]
where E_±^j = 1 ± e^{−θ_j/D}. This reward rate is convex as a function of the thresholds, θ_{1:2}, for all examples we examined. The pair of thresholds, θ_{1:2}^{max}, that maximize RR satisfy θ_1^{max} ≥ θ_2^{max}, so that the optimal observer typically decreases their decision threshold from trial 1 to trial 2 (Fig. 3A,B). This shortens the mean time to a decision in trial 2, since the initial belief y_0^2 is closer to the decision threshold more likely to be correct. As the change probability ε decreases from 0.5, initially the optimal θ_2^{max} decreases and θ_1^{max} increases (Fig. 3C), and θ_2^{max} eventually collides with the boundary y_0^2 = θ_2. For smaller values of ε, RR is thus maximized when the observer accumulates information on trial 1, and then makes the same decision instantaneously on trial 2, so that c_2 = (1 − ε)c_1 + ε(1 − c_1) and DT_2 = 0. As in the case of a fixed threshold, the maximal reward rate, RR_{1,2}^{max}(ε), increases as ε is decreased (Fig. 3D), since the observer starts trial 2 with more information. Surprisingly, despite the different strategies, there is no large increase in RR_{1,2}^{max} compared to constant thresholds, with the largest gain at low values of ε.

Remark: In the results displayed for our RR maximization analysis (Figs. 2 and 3), we have fixed T_D = 2 and D = 1. We have also examined these trends in the RR at lower and higher values of T_D and D (plots not shown), and the results are qualitatively similar. The impact of dynamic thresholds θ_{1:2} on the RR is slightly more pronounced when T_D is small or when D is large.

5. Multiple correlated trials

To understand optimal decisions across multiple correlated trials, we again assume that the states evolve according to a two-state Markov chain with ε := P(H^{j+1} = H_∓ | H^j = H_±). Information gathered on one trial again determines the initial belief on the next. We first discuss optimizing the RR for a constant threshold for a fixed number n of trials known to the observer. When decision thresholds are allowed to vary between trials, optimal thresholds typically decrease across trials, θ_j^{max} ≤ θ_{j−1}^{max}.

5.1. Optimal observer model for multiple trials

As in the case of two trials, we must consider the caveats associated with instantaneous decisions in order to define initial conditions y_0^j, probabilities of correct responses c_j, and mean decision times DT_j. The same argument leading to Eq. (13) shows that the decision variable evolves according to a linear drift-diffusion process on each trial. The probability of a correct choice is again defined by the threshold, c_j = 1/(1 + e^{−θ_j/D}) when θ_j > y_0^j, and c_j = (1 − ε)c_{j−1} + ε(1 − c_{j−1}) when θ_j ≤ y_0^j, as in Eq. (18). Therefore, c_j may be defined iteratively for j ≥ 2, as
\[
c_j = \begin{cases} \dfrac{1}{1 + e^{-\theta_j/D}} & : \theta_j > y_0^j, \\[2ex] (1 - \epsilon)\, c_{j-1} + \epsilon (1 - c_{j-1}) & : \theta_j \leq y_0^j, \end{cases} \tag{21}
\]
with c_1 = 1/(1 + e^{−θ_1/D}) since y_0^1 = 0. The probability c_j quantifies the belief at the end of trial j. Hence, as in Eq. (14), the initial belief of an ideal observer on trial j + 1 has the form
\[
y^{j+1}(0; d_j = \pm 1) = \pm D \ln \frac{(1 - \epsilon)\, c_j + \epsilon (1 - c_j)}{\epsilon\, c_j + (1 - \epsilon)(1 - c_j)} = \pm y_0^{j+1}, \tag{22}
\]
and the decision variable y^j(t) within trials obeys a DDM with threshold θ_j as in Eq. (13). A decision d_j = ±1 is then registered when y^j(T_j) = ±θ_j, and if θ_j ≤ y_0^j, then d_j = d_{j−1} and T_j = 0.
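The recursion in Eqs. (21)–(22) is straightforward to iterate numerically; the following sketch (our own implementation of those two equations, for an arbitrary threshold sequence) returns the per-trial correct probabilities and initial biases:

```python
import numpy as np

def correct_probs_and_biases(thetas, eps, D):
    """Iterate Eqs. (21)-(22) for a sequence of thresholds theta_1, ..., theta_n.
    Returns arrays of correct probabilities c_j and initial biases y_0^j."""
    n = len(thetas)
    c = np.zeros(n)
    y0 = np.zeros(n)                      # y_0^1 = 0 (unbiased first trial)
    c[0] = 1.0 / (1.0 + np.exp(-thetas[0] / D))
    for j in range(1, n):
        y0[j] = D * np.log(((1 - eps) * c[j - 1] + eps * (1 - c[j - 1]))
                           / (eps * c[j - 1] + (1 - eps) * (1 - c[j - 1])))   # Eq. (22)
        if thetas[j] > y0[j]:
            c[j] = 1.0 / (1.0 + np.exp(-thetas[j] / D))                       # Eq. (21), top
        else:
            c[j] = (1 - eps) * c[j - 1] + eps * (1 - c[j - 1])                # Eq. (21), bottom
    return c, y0

c, y0 = correct_probs_and_biases([1.2, 1.0, 0.8], eps=0.1, D=1.0)
print(np.round(c, 3), np.round(y0, 3))
```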
Figure 4: Reward rate (RR_{1:n}) defined by Eq. (24) increases with n, and as ε is decreased, as shown for ε = 0.25 (panel A); ε = 0.1 (panel B); and ε = 0 (panel C). The optimal decision threshold, θ^{max}, increases with n. The limiting reward rates RR_{1:n}(θ; ε) → RR_∞(θ; ε) as n → ∞, defined by Eq. (25), are shown in light blue.
5.2. Performance for constant decision thresholds

With n trials, the RR is the sum of correct probabilities divided by the total decision and delay time:
\[
RR_{1:n} = \frac{\sum_{j=1}^{n} c_j}{\sum_{j=1}^{n} DT_j + n T_D}. \tag{23}
\]
The probability of a correct choice, c_j = 1/(1 + e^{−θ/D}), is constant across trials, and determined by the fixed threshold θ. It follows that y_0^2 = y_0^3 = · · · = y_0^n = y_0. Eq. (22) then implies that θ > y_0^j for ε > 0 and y_0^j = θ if ε = 0. As y_0^j is constant on the second trial and beyond, the analysis of the two-trial case implies that the mean decision time is given by an expression equivalent to Eq. (17) for j ≥ 2. Using these expressions for c_j and DT_j in Eq. (23), we find
\[
RR_{1:n}(\theta; \epsilon) = n \left[ \left( 1 - e^{-\theta/D} \right) \left( n\theta - (n - 1)(1 - 2\epsilon)\, y_0 \right) + n T_D \left( 1 + e^{-\theta/D} \right) \right]^{-1}. \tag{24}
\]
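As a sketch (our own code, directly transcribing Eq. (24) with y_0 given by Eq. (14) evaluated at θ_1 = θ), one can maximize RR_{1:n} over the constant threshold θ to reproduce the qualitative trends in Fig. 4:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def rr_const(theta, n, eps, D, TD):
    """Eq. (24): reward rate for n correlated trials with a constant threshold."""
    y0 = D * np.log(((1 - eps) * np.exp(theta / D) + eps)
                    / (eps * np.exp(theta / D) + (1 - eps)))   # Eq. (14) with theta_1 = theta
    denom = ((1 - np.exp(-theta / D)) * (n * theta - (n - 1) * (1 - 2 * eps) * y0)
             + n * TD * (1 + np.exp(-theta / D)))
    return n / denom

D, TD, eps = 1.0, 2.0, 0.25
for n in (1, 3, 6):
    opt = minimize_scalar(lambda th: -rr_const(th, n, eps, D, TD),
                          bounds=(1e-6, 20), method="bounded")
    print(n, round(opt.x, 3), round(-opt.fun, 4))   # theta_max grows with n
```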
As n is increased, the threshold value, θ^{max}, that maximizes the RR increases across the full range of ε values (Fig. 4): As the number of trials increases, trial 1 has less impact on the RR. Longer decision times on trial 1 impact the RR less, allowing for longer accumulation times (higher thresholds) in later trials. In the limit of an infinite number of trials (n → ∞), Eq. (24) simplifies to
\[
RR_\infty(\theta; \epsilon) = \left[ \left( 1 - e^{-\theta/D} \right) \left( \theta - (1 - 2\epsilon)\, y_0 \right) + T_D \left( 1 + e^{-\theta/D} \right) \right]^{-1}, \tag{25}
\]
and we find that the θ^{max} of RR_∞(θ; ε) sets an upper limit on the optimal θ for n < ∞. Interestingly, in a static environment (ε → 0),
\[
\lim_{\epsilon \to 0} RR_\infty(\theta; \epsilon) = \frac{1}{T_D \left( 1 + e^{-\theta/D} \right)}, \tag{26}
\]
so the decision threshold value that maximizes RR diverges, θ^{max} → ∞ (Fig. 4C). Intuitively, when there are many trials (n ≫ 1), the price of a long wait for a high accuracy decision in the first trial (c_1 ≈ 1 but DT_1 ≫ 1 for θ ≫ 1) is offset by the reward from a large number (n − 1 ≫ 1) of subsequent, instantaneous, high accuracy decisions (c_j ≈ 1 but DT_2 = 0 for θ ≫ 1).

5.3. Dynamic decision thresholds

In the case of dynamic thresholds, the RR function, Eq. (23), can be maximized for an arbitrary number of trials, n. The probability of a correct decision, c_j, is given by the more general, iterative version of Eq. (21). Therefore, while analytical results can be obtained for constant thresholds, numerical methods are needed to find the sequence of optimal dynamic thresholds.
Figure 5: Optimal sets of thresholds, θ_{1:n}^max, for n trials with correlated states. A. For n = 3, the set of optimal thresholds θ_{1:3}^max obeys the ordering θ_1^max ≥ θ_2^max ≥ θ_3^max for all values of ε, converging as ε → 0.5^− and as ε → 0^+. Vertical lines denote points below which θ_j^max = y_0^j successively for j = 3, 2, 1. B,C. Ordering of optimal thresholds is consistent in the case of n = 5 (panel B) and n = 10 trials (panel C).
The expression for the mean decision time, DT_j, on trial j is determined by marginalizing over whether or not the initial state y_0^j aligns with the true state H^j, yielding

DT_j = \begin{cases} \theta_j \frac{1 - e^{-\theta_j/D}}{1 + e^{-\theta_j/D}} - (1 - 2\epsilon)(2 c_{j-1} - 1)\, y_0^j, & \theta_j > y_0^j, \\ 0, & \theta_j \le y_0^j. \end{cases}    (27)
Thus, Eq. (21), the probability of a correct decision on trial j − 1, is needed to determine the mean decision time, DT_j, on trial j. The resulting values for c_j and DT_j can be used in Eq. (23) to determine the RR, which again achieves a maximum for some choice of thresholds, θ_{1:n} := (θ_1, θ_2, ..., θ_n).

In Fig. 5 we show that the sequence of decision thresholds that maximizes RR, θ_{1:n}^max, is decreasing (θ_1^max ≥ θ_2^max ≥ ... ≥ θ_n^max) across trials, consistent with our observation for two trials. Again, the increased accuracy in earlier trials improves accuracy in later trials, so there is value in gathering more information early in the trial sequence. Conversely, an observer has less to gain from waiting for more information in a late trial, as the additional evidence can only be used on a few more trials. For intermediate to large values of ε, the observer uses a near constant threshold on the first n − 1 trials, and then a lower threshold on the last trial. As ε is decreased, the last threshold, θ_n^max, collides with the boundary y_0^n, so the last decision is made instantaneously. The previous thresholds, θ_{n−1}^max, θ_{n−2}^max, ..., collide with the corresponding boundaries y_0^{n−1}, y_0^{n−2}, ... in reverse order as ε is decreased further. Thus, for lower values of ε, an increasing number of decisions toward the end of the sequence are made instantaneously.

6. Comparison to experimental results on repetition bias

We next discuss the experimental predictions of our model, motivated by previous observations of repetition bias. It has long been known that trends appear in the response sequences of subjects in series of TAFC trials (Fernberger, 1920), even when the trials are uncorrelated (Cho et al., 2002; Gao et al., 2009). For instance, when the state is the same on two adjacent trials, the probability of a correct response on the second trial increases (Fründ et al., 2014), and the response time decreases (Jones et al., 2013). In our model, if the assumed change rate of the environment does not match the true change rate, ε_assumed ≠ ε_true, performance is decreased compared to ε_assumed = ε_true. However, this decrease may not be severe. Thus the observed response trends may arise partly from misestimation of the transition rate ε, and the assumption that trials are dependent, even if they are not (Cho et al., 2002; Yu and Cohen, 2008).

Motivated by our findings on RR_{1:n}^max in the previous sections, we focus on the model of an observer who fixes their decision threshold across all trials, θ_j = θ for all j. Since allowing for dynamic thresholds does not impact RR_{1:n}^max considerably, we expect the qualitative trends we observe here to generalize.
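To make the threshold-optimization procedure concrete, the following sketch implements the computation summarized above in Python (the paper's own simulations used MATLAB's fminsearch; see Appendix E). The initial-bias formula, the reward-rate definition (Eq. (23)), and the handling of thresholds that fall below the initial bias are assumptions noted in the comments, so the snippet should be read as an illustration rather than the authors' code.

```python
# A minimal sketch of the threshold optimization behind Fig. 5. The bias y0
# uses the form of Eq. (14) implied by Appendix B, and the reward rate is
# total expected accuracy over total expected time (Eq. (23)); parameter
# values and the instantaneous-decision case are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize

D, TD, eps, n = 1.0, 2.0, 0.25, 3          # diffusion, delay, change rate, trials

def reward_rate(thetas):
    thetas = np.abs(thetas)
    correct = time = 0.0
    c_prev, th_prev = 0.5, None
    for th in thetas:
        y0 = 0.0 if th_prev is None else D * np.log(
            ((1 - eps) * np.exp(th_prev / D) + eps) /
            (eps * np.exp(th_prev / D) + (1 - eps)))     # assumed Eq. (14) form
        if th > y0:
            c = 1.0 / (1.0 + np.exp(-th / D))            # accuracy (Appendix B)
            dt = th * (1 - np.exp(-th / D)) / (1 + np.exp(-th / D)) \
                 - (1 - 2 * eps) * (2 * c_prev - 1) * y0  # mean decision time, Eq. (27)
        else:                                            # instantaneous decision:
            c = (1 - eps) * c_prev + eps * (1 - c_prev)  # assume the bias is simply kept
            dt = 0.0
        correct += c
        time += dt + TD
        c_prev, th_prev = c, th
    return correct / time                                # RR as in Eq. (23)

res = minimize(lambda th: -reward_rate(th), x0=np.ones(n), method="Nelder-Mead")
print("optimal thresholds:", np.round(np.abs(res.x), 2), " RR:", round(-res.fun, 4))
```

If the reconstruction is faithful, the recovered thresholds should decrease across trials, mirroring panel A of Fig. 5.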
Figure 6: Repetition bias trends in the adaptive DDM. A. The psychometric functions in trial j conditioned on d_{j−1} show that when d_{j−1} = +1, the fraction of d_j = +1 responses increases (dark blue). When d_{j−1} = −1, the fraction of d_j = +1 responses decreases (light blue). The weighted average of these two curves gives the unconditioned psychometric function (black curve). Coherence g/D is defined for drift g ∈ ±1, noise diffusion coefficient D, and assumed change rate ε = 0.25. Inset: When ε_true = 0.5 is the true change rate, the psychometric function is shallower (grey curve). B. Correct probability c_XY in the current trial, plotted as a function of the assumed change rate ε, conditioned on the pairwise relations of the last three trials (R: repetitions; A: alternations). Note the ordering of c_AA and c_AR is exchanged as ε is increased, since the effects of the state two trials back are weakened. Compare colored dots at ε = 0.05 and ε = 0.25. C. Decision time T_XY conditioned on pairwise relations of the last three trials. D. The initial condition, y_0, increases as a function of the noise diffusion coefficient D for ε = 0.1, 0.25, 0.4. In all cases, we set T_D = 2 and θ = θ_∞^max for all trials, and D = 1 if not mentioned.
6.1. History-dependent psychometric functions

To provide a normative explanation of repetition bias, we compute the degree to which the optimal observer's choice on trial j − 1 biases their choice on trial j. For comparison with experimental work, we present the probability that d_j = +1, conditioned both on the choice on the previous trial and on the coherence³ (see Fig. 6A). In this case, P(d_j = +1 | g_j/D, d_{j−1} = ±1) = π_θ(±y_0^j), where the exit probability π_θ(·) is defined by Eq. (2), and the initial decision variable, y_0^j, is given by Eq. (14) using the threshold θ and the assumed transition rate, ε = ε_assumed. The unconditioned psychometric function is obtained by marginalizing over the true environmental change rate, ε_true,

P(d_j = +1 | g_j/D) = (1 − ε_true) P(d_j = +1 | g_j/D, d_{j−1} = +1) + ε_true P(d_j = +1 | g_j/D, d_{j−1} = −1),

which equals π_θ(0) if ε = ε_true. Fig. 6A shows that the probability of decision d_j = +1 is increased (decreased) if the same (opposite) decision was made on the previous trial. When ε_assumed ≠ ε_true, the psychometric function is shallower and overall performance worsens (inset), although only slightly. Note that we have set θ = θ_∞^max, so the decision threshold maximizes RR_∞ in Eq. (25). The increased probability of a repetition arises from the fact that the initial condition y^j(0) is biased in the direction of the previous decision.

³We define coherence as the drift, g_j ∈ ±1, in trial j divided by the noise diffusion coefficient D (Gold and Shadlen, 2002; Bogacz et al., 2006).
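A small illustration of the history-conditioned psychometric functions discussed above is sketched below. Since Eqs. (2) and (14) are not reproduced in this excerpt, the exit probability and the initial-bias formula used here are standard DDM expressions consistent with Appendix D and should be treated as assumptions.

```python
# Sketch of the history-conditioned psychometric curves of Fig. 6A;
# formulas for the exit probability and y0 are assumed, not quoted verbatim.
import numpy as np

D, theta, eps = 1.0, 1.5, 0.25

def exit_prob(k, y0):
    # P(choose +1) for coherence k = g/D, thresholds +-theta, starting bias y0 (k != 0)
    return (1 - np.exp(-k * (y0 + theta))) / (1 - np.exp(-2 * k * theta))

# bias carried over from the previous decision (assumed form of Eq. (14))
y0 = D * np.log(((1 - eps) * np.exp(theta / D) + eps) /
                (eps * np.exp(theta / D) + (1 - eps)))

for k in (0.5, 2.0, 5.0):
    p_rep = exit_prob(k, +y0)                 # previous choice was d_{j-1} = +1
    p_alt = exit_prob(k, -y0)                 # previous choice was d_{j-1} = -1
    p_unc = (1 - eps) * p_rep + eps * p_alt   # weighted average (Section 6.1)
    print(f"g/D={k}: P(+|rep)={p_rep:.3f}  P(+|alt)={p_alt:.3f}  P(+)={p_unc:.3f}")
```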
6.2. Error rates and decision times

The adaptive DDM also exhibits repetition bias in decision times (Goldfarb et al., 2012). If the state in the current trial is the same as that on the previous trial (repetition), the decision time is shorter than average, while the decision time is increased if the state is different (alternation; see Appendix D). This occurs for optimal (ε_assumed = ε_true) and suboptimal (ε_assumed ≠ ε_true) models as long as ε ∈ [0, 0.5). Furthermore, the rate of correct responses increases for repetitions and decreases for alternations (see Appendix D). However, the impact of the states of two or more previous trials on the choice in the present trial is more complicated. As in previous studies (Cho et al., 2002; Goldfarb et al., 2012; Jones et al., 2013), we propose that such repetition biases in the case of uncorrelated trials could be the result of the subject assuming ε = ε_assumed ≠ 0.5, even when ε_true = 0.5.

Following Goldfarb et al. (2012), we describe three adjacent trial states in terms of repetitions (R) and alternations (A) (Cho et al., 2002; Goldfarb et al., 2012). Repetition-repetition (RR)⁴ corresponds to three identical states in a sequence, H^j = H^{j−1} = H^{j−2}; RA corresponds to H^j ≠ H^{j−1} = H^{j−2}; AR to H^j = H^{j−1} ≠ H^{j−2}; and AA to H^j ≠ H^{j−1} ≠ H^{j−2}. The correct probabilities c_XY associated with all four of these cases can be computed explicitly, as shown in Appendix D. Here we assume that ε_assumed = ε_true for simplicity, although it is likely that subjects often use ε_assumed ≠ ε_true, as shown in Cho et al. (2002); Goldfarb et al. (2012). The probability of a correct response as a function of ε is shown for all four conditions in Fig. 6B. Clearly, c_RR is always largest, since the ideal observer correctly assumes repetitions are more likely than alternations (when ε ∈ [0, 0.5)). Note that when ε = 0.05, c_RR > c_AA > c_AR > c_RA (colored dots), so a sequence of two alternations yields a higher correct probability on the current trial than a repetition preceded by an alternation. This may seem counterintuitive, since we might expect repetitions to always yield higher correct probabilities. However, two alternations yield the same state on the first and last trial of the three-trial sequence (H^j = H^{j−2}). For small ε, initial conditions begin close to threshold, so it is likely that d_j = d_{j−1} = d_{j−2}; hence if d_{j−2} is correct, then d_j will be too. At small ε, and at the parameters we chose here, the increased likelihood of the first and last response being correct outweighs the cost of the middle response likely being incorrect. Note that AA occurs with probability ε², and is thus rarely observed when ε is small. As ε is increased, the ordering switches to c_RR > c_AR > c_AA > c_RA (e.g., at ε = 0.25), so a repetition in trial j leads to a higher probability of a correct response. In this case, initial conditions are less biased, so the effects of the state two trials back are weaker. As before, we have fixed θ = θ_∞^max.

These trends extend to decision times. Following a calculation similar to that used for the correct response probabilities, c_XY, we can compute the mean decision times, T_XY, conditioned on all possible three-state sequences (see Appendix D). Fig. 6C shows that, as a function of ε, decision times following two repetitions, T_RR, are always the smallest. This is because the initial condition in trial j is most likely to start closest to the correct threshold: decisions do not take as long when the decision variable starts close to the boundary. Also, note that when ε = 0.05, T_RR < T_AA < T_AR < T_RA, and the explanation is similar to that for the probabilities of correct responses.
When the environment changes slowly, decisions after two alternations will be quicker because a bias in the direction of the correct decision is more likely. The ordering is different at intermediate ε, where T_RR < T_AR < T_AA < T_RA. Notably, both orderings of the correct probability and decision times, as they depend on the three-trial history, have been observed in previous studies of serial bias (Cho et al., 2002; Gao et al., 2009; Goldfarb et al., 2012): sometimes c_AA > c_AR and T_AA < T_AR, whereas sometimes c_AA < c_AR and T_AA > T_AR. However, it appears to be more common that AA sequences lead to lower decision times, T_AA < T_AR (e.g., see Fig. 3 in Goldfarb et al. (2012)). This suggests that subjects assume state switches are rare, corresponding to a smaller ε_assumed.

6.3. Initial condition dependence on noise

Lastly, we explored how the initial condition, y_0, which captures the information gathered on the previous trial, changes with signal coherence, g/D. We only consider the case θ = θ_∞^max, and note that the threshold will vary with D. We find that the initial bias, y_0, always tends to increase with D (Fig. 6D). The intuition for this result is that the optimal θ = θ_∞^max will tend to increase with D, since uncertainty increases with D, necessitating a higher threshold to maintain constant accuracy (see Eq. (4)). As the optimal threshold increases with D, the initial bias increases with it. This is consistent with experimental observations showing that bias tends to decrease with coherence, i.e., to increase with noise amplitude (Fründ et al., 2014; Olianezhad et al., 2016; Braun et al., 2018).

⁴Note that RR here refers to repetition-repetition, as opposed to earlier, where we used RR in Roman font to denote the reward rate.
We thus found that experimentally observed history-dependent perceptual biases can be explained using a model of an ideal observer accumulating evidence across correlated TAFC trials. Biases are stronger when the observer
assumes a more slowly changing environment – smaller ε – where trials can have an impact on the belief of the ideal observer far into the future. Several previous studies have proposed this idea (Cho et al., 2002; Goldfarb et al., 2012; Braun et al., 2018), but to our knowledge, none have used a normative model to explain and quantify these effects.

7. Discussion

It has been known for nearly a century that observers take into account previous choices when making decisions (Fernberger, 1920). While a number of descriptive models have been used to explain experimental data (Fründ et al., 2014; Goldfarb et al., 2012), there have been no normative models that quantify how information accumulated on one trial impacts future decisions. Here we have shown that a straightforward, tractable extension of previous evidence-accumulation models can be used to do so.

To account for information obtained on one trial, observers could adjust their accumulation speed (Ratcliff, 1985; Diederich and Busemeyer, 2006; Urai et al., 2018), their threshold (Bogacz et al., 2006; Goldfarb et al., 2012; Diederich and Busemeyer, 2006), or their initial belief on the subsequent trial (Bogacz et al., 2006; Braun et al., 2018). We have shown that an ideal observer adjusts their initial belief and decision threshold, but not the accumulation speed, in a sequence of dependent, statistically identical trials. Adjusting the initial belief increases the reward rate by decreasing response time; it does not change the probability of a correct response. Changes in initial bias have a larger effect than adjustments in threshold, and there is evidence that human subjects do adjust their initial beliefs across trials (Fründ et al., 2014; Olianezhad et al., 2016; Braun et al., 2018). On the other hand, the effect of a dynamic threshold across trials diminishes as the number of trials increases: RR_{1:n}^max is nearly the same for a constant threshold θ as when allowing dynamic thresholds θ_{1:n}. Given the cognitive cost of trial-to-trial threshold adjustments, it is thus unclear whether human subjects would implement such a strategy (Balci et al., 2011).

To isolate the effect of previously gathered information, we have assumed that the observers do not receive feedback or reward until the end of the trial. Observers can thus use only information gathered on previous trials to adjust their bias on the next. However, our results can be easily extended to the case when feedback is provided on each trial. The expression for the initial condition y_0 is simpler in this case, since feedback provides certainty concerning the state on the previous trial.

We also assumed that an observer knows the length of the trial sequence, and uses this number to optimize the reward rate. It is rare that an organism would have such information in a natural situation. However, our model can be extended to random sequence durations if we allow observers to marginalize over possible sequence lengths. If decision thresholds are fixed to be constant across all trials, θ_j = θ, the projected RR is the average number of correct decisions, c⟨n⟩, divided by the average time of the sequence. For instance, if the trial length follows a geometric distribution, n ∼ Geom(p), then

\overline{\mathrm{RR}}(\theta; \epsilon) = \frac{1/p}{\left(1 - e^{-\theta/D}\right)\left[\frac{\theta}{p} - \left(\frac{1}{p} - 1\right)(1 - 2\epsilon)\, y_0\right] + \frac{T_D}{p}\left(1 + e^{-\theta/D}\right)}.
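A brief numerical illustration of the expression above, as reconstructed here, is sketched below; the bias formula for y_0 and the bounded search interval are assumptions, and the optimum is located with scipy rather than the MATLAB routine mentioned in Appendix E.

```python
# Sketch: evaluate the average reward rate for geometrically distributed
# sequence lengths and locate the maximizing threshold (illustrative values).
import numpy as np
from scipy.optimize import minimize_scalar

D, TD, eps, p = 1.0, 2.0, 0.25, 0.2         # so <n> = 1/p = 5 trials on average

def rr_bar(theta):
    y0 = D * np.log(((1 - eps) * np.exp(theta / D) + eps) /
                    (eps * np.exp(theta / D) + (1 - eps)))   # assumed Eq. (14) form
    denom = ((1 - np.exp(-theta / D)) *
             (theta / p - (1 / p - 1) * (1 - 2 * eps) * y0)
             + (TD / p) * (1 + np.exp(-theta / D)))
    return (1 / p) / denom

res = minimize_scalar(lambda th: -rr_bar(th), bounds=(0.05, 5.0), method="bounded")
print("theta_max ~", round(res.x, 3), " RRbar ~", round(rr_bar(res.x), 4))
```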
This equation has the same form as Eq. (24), the RR in the case of a fixed trial number n, with n replaced by the average ⟨n⟩ = 1/p. Thus, we expect that as p decreases and the average trial number, 1/p, increases, the RR increases and the optimal decision thresholds θ^max become larger (as in Fig. 4). The case of dynamic thresholds is somewhat more involved, but can be treated similarly. However, as the number of trials is generated by a memoryless process, observers gain no further information about how many trials remain after the present one, and will thus not adjust their thresholds as in Section 5.3.

Our conclusions also depend on the assumptions the observer makes about the environment. For instance, an observer may not know the exact probability of change between trials, so that ε_assumed ≠ ε_true. Indeed, there is evidence that human subjects adjust their estimate of ε across a sequence of trials, and assume trials are correlated, even when they are not (Cho et al., 2002; Gao et al., 2009; Yu and Cohen, 2008). Our model can be expanded to a hierarchical model in which both the states H^{1:n} and the change rate, ε, are inferred across a sequence of trials (Radillo et al., 2017). This would result in model ideal observers that use the current estimate of ε to set the initial bias on a trial (Yu and Cohen, 2008). A similar approach could also allow us to model observers that incorrectly learn, or make wrong assumptions about, the rate of change.

Modeling approaches can also point to the neural computations that underlie the decision-making process. Cortical representations of previous trials have been identified (Nogueira et al., 2017; Akrami et al., 2018), but it is unclear how
the brain learns and combines latent properties of the environment (the change rate ε) with sensory information to make inferences. Recent work on repetition bias in working memory suggests short-term plasticity serves to encode priors in neural circuits, which bias neural activity patterns during memory retention periods (Papadimitriou et al., 2015; Kilpatrick, 2018). Such synaptic mechanisms may also allow for the encoding of previously accumulated information in sequential decision-making tasks, even in cases where state sequences are not correlated. Indeed, evolution may imprint characteristics of natural environments onto neural circuits, shaping whether and how ε is learned in experiments and influencing trends in human response statistics (Fawcett et al., 2014; Glaze et al., 2018). Ecologically adapted decision rules may be difficult to train away, especially if they do not impact performance too adversely (Todd and Gigerenzer, 2007).

The normative model we have derived can be used to identify task parameter ranges that will help tease apart the assumptions made by subjects (Cho et al., 2002; Gao et al., 2009; Goldfarb et al., 2012). Since our model describes the behavior of an ideal observer, it can be used to determine whether experimental subjects use near-optimal or suboptimal evidence-accumulation strategies. Furthermore, our adaptive DDM opens a window to further instantiations of the DDM that consider the influence of more complex state histories on the evidence-accumulation strategies adopted within a trial. Uncovering common assumptions about state histories will help guide future experiments, and help us better quantify the biases and core mechanisms of human decision-making.

Acknowledgements

This work was supported by an NSF/NIH CRCNS grant (R01MH115557). KN was supported by an NSF Graduate Research Fellowship. ZPK was also supported by an NSF grant (DMS-1615737). KJ was also supported by NSF grants DBI-1707400 and DMS-1517629.

Appendix A. Derivation of the drift-diffusion model for a single trial

We assume that an optimal observer integrates a stream of noisy measurements ξ_{1:s}^1 = (ξ_1^1, ξ_2^1, ..., ξ_s^1) at equally spaced times t_{1:s} = (t_1, t_2, ..., t_s) (Wald and Wolfowitz, 1948; Bogacz et al., 2006). The likelihood functions, f_±(ξ) := P(ξ|H_±), define the probability of each measurement, ξ, conditioned on the environmental state, H_±. Observations in each trial are combined with the prior probabilities P(H_±). We assume a symmetric prior, P(H_±) = 1/2. The probability ratio is then
R_s^1 = \frac{P(H^1 = H_+ | \xi_{1:s}^1)}{P(H^1 = H_- | \xi_{1:s}^1)} = \frac{f_+(\xi_1^1) f_+(\xi_2^1) \cdots f_+(\xi_s^1)\, P(H^1 = H_+)}{f_-(\xi_1^1) f_-(\xi_2^1) \cdots f_-(\xi_s^1)\, P(H^1 = H_-)},    (A.1)
due to the independence of each measurement ξ_s^1. Thus, if R_s^1 > 1 (R_s^1 < 1), then H^1 = H_+ (H^1 = H_−) is the more likely state. Eq. (A.1) can be written recursively,

R_s^1 = \left( \frac{f_+(\xi_s^1)}{f_-(\xi_s^1)} \right) R_{s-1}^1,    (A.2)

where R_0^1 = P(H_+)/P(H_−) = 1, due to the assumed prior at the beginning of trial 1. Taking the logarithm of Eq. (A.2) allows us to express the recursive relation for the log-likelihood ratio (LLR) L_s^1 = ln R_s^1 as an iterative sum,

L_s^1 = L_{s-1}^1 + \ln \frac{f_+(\xi_s^1)}{f_-(\xi_s^1)},    (A.3)

where L_0^1 = ln [P(H_+)/P(H_−)], so if L_s^1 ≷ 0 then H^1 = H_± is the more likely state. Taking the continuum limit ∆t → 0 of the timestep ∆t := t_s − t_{s−1}, we can use the functional central limit theorem to yield the DDM (see Bogacz et al. (2006); Veliz-Cuba et al. (2016) for details):

dy^1 = g^1 dt + \sqrt{2D}\, dW,    (A.4)
where W is a Wiener process, the drift g^1 ∈ g_± = \frac{1}{\Delta t} \mathrm{E}_\xi\left[ \ln \frac{f_+(\xi)}{f_-(\xi)} \,\middle|\, H_\pm \right] depends on the state, and 2D = \frac{1}{\Delta t} \mathrm{Var}_\xi\left[ \ln \frac{f_+(\xi)}{f_-(\xi)} \,\middle|\, H_\pm \right] is the variance, which depends only on the noisiness of each observation, but not on the state. With Eq. (1) in hand, we can relate y^1(t) to the probability of either state H_± by noting its Gaussian statistics in the case of free boundaries,

P(y^1(t) | H^1 = H_\pm) = \frac{1}{\sqrt{4\pi D t}}\, e^{-(y^1(t) \mp t)^2/(4Dt)},

so that

\frac{P(H^1 = H_+ | y^1(t))}{P(H^1 = H_- | y^1(t))} = \frac{P(y^1(t) | H^1 = H_+)}{P(y^1(t) | H^1 = H_-)} = e^{y^1(t)/D}.    (A.5)
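As a sanity check on the continuum limit, the short sketch below accumulates the discrete LLR of Eq. (A.3) with Gaussian likelihoods and confirms that y = D · LLR has the mean and variance of the drift-diffusion process in Eq. (A.4). The Gaussian choice of f_± is an illustrative assumption, not the specific task statistics of the paper.

```python
# Sketch: discrete LLR accumulation (Eq. (A.3)) with Gaussian likelihoods
# reduces to the drift-diffusion model (Eq. (A.4)) under H_+ with g = +1.
import numpy as np

rng = np.random.default_rng(1)
D, dt, T = 1.0, 0.005, 2.0
steps, trials = int(T / dt), 2000

# observations under H_+ : xi ~ N(+dt, 2*D*dt); then ln f_+/f_- = xi / D
xi = rng.normal(dt, np.sqrt(2 * D * dt), size=(trials, steps))
y = D * np.cumsum(xi / D, axis=1)            # y = D * LLR, cf. Appendix A

print("mean y(T) ~", round(y[:, -1].mean(), 3), "(drift * T =", T, ")")
print("var  y(T) ~", round(y[:, -1].var(), 3), "(2 D T =", 2 * D * T, ")")
```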
Note, an identical relation is obtained in Bogacz et al. (2006) in the case of absorbing boundaries. Either way, it is clear that y^1 = D · LLR^1. This means that before any observations have been made, e^{y^1(0)/D} = P(H^1 = H_+)/P(H^1 = H_−), so we scale the log ratio of the prior by D to yield the initial condition y_0 in Eq. (1).

Appendix B. Threshold determines probability of being correct

We note that Eq. (A.5) extends to any trial j in the case of an ideal observer. In this case, when the decision variable y^2(T^2) = θ, we can write

\frac{P(H^2 = H_+ | y^2(T^2) = \theta)}{P(H^2 = H_- | y^2(T^2) = \theta)} = e^{\theta/D},    (B.1)

so that a rearrangement of Eq. (B.1) yields

c_2 := \frac{P(H^2 = H_+ | y^2(T^2) = \theta)\, P(y^2(T^2) = \theta)}{P(d_2 = +1)} = \frac{1}{1 + e^{-\theta/D}} = c_1.    (B.2)
Therefore the probabilities of correct decisions on trials 1 and 2 are equal, and determined by θ = θ_{1:2} and D. A similar computation shows that the threshold, θ_j = θ, and diffusion rate, D, but not the initial belief, determine the probability of a correct response on any trial j.

To demonstrate self-consistency in the general case of decision thresholds that may differ between trials, θ_{1:2}, we can also derive the formula for c_2 in the case y_0^2 ≤ θ_2. Our alternative derivation of Eq. (B.2) begins by computing the total probability of a correct choice on trial 2 by combining the conditional probabilities of a correct choice given decision d_1 = +1, leading to initial belief y_0^2, and decision d_1 = −1, leading to initial belief −y_0^2:

c_2 = [(1 - \epsilon) c_1 + \epsilon (1 - c_1)] \cdot \pi_{\theta_2}(y_0^2) + [\epsilon c_1 + (1 - \epsilon)(1 - c_1)] \cdot \pi_{\theta_2}(-y_0^2)

= \frac{(1 - \epsilon) e^{\theta_1/D} + \epsilon}{e^{\theta_1/D} + 1} \cdot \frac{(1 - \epsilon)(e^{\theta_1/D} - e^{-\theta_2/D}) + \epsilon (1 - e^{(\theta_1 - \theta_2)/D})}{((1 - \epsilon) e^{\theta_1/D} + \epsilon)(1 - e^{-2\theta_2/D})} + \frac{\epsilon e^{\theta_1/D} + (1 - \epsilon)}{e^{\theta_1/D} + 1} \cdot \frac{\epsilon (e^{\theta_1/D} - e^{-\theta_2/D}) + (1 - \epsilon)(1 - e^{(\theta_1 - \theta_2)/D})}{(\epsilon e^{\theta_1/D} + (1 - \epsilon))(1 - e^{-2\theta_2/D})}

= \frac{(\epsilon e^{\theta_1/D} + (1 - \epsilon))((1 - \epsilon) e^{\theta_1/D} + \epsilon)(e^{\theta_1/D} + 1)(1 - e^{-\theta_2/D})}{(\epsilon e^{\theta_1/D} + (1 - \epsilon))((1 - \epsilon) e^{\theta_1/D} + \epsilon)(e^{\theta_1/D} + 1)(1 - e^{-2\theta_2/D})}

c_2 = \frac{1}{1 + e^{-\theta_2/D}},    (B.3)

showing again that the correct probability c_2 on trial 2 is solely determined by θ_2 and D, assuming y_0^2 ≤ θ_2.
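The claim of Appendix B can also be checked by direct simulation. The sketch below uses the Euler-Maruyama scheme of Appendix E; the key assumption, reflecting the self-consistency argument above, is that the true state is drawn from the prior implied by the initial belief y_0.

```python
# Monte Carlo check of Appendix B: accuracy stays near 1/(1 + exp(-theta/D))
# for any |y0| < theta, while a bias mainly shortens decision times.
import numpy as np

rng = np.random.default_rng(0)
D, theta, dt, trials = 1.0, 1.0, 0.005, 20000

def simulate(y0):
    # the ideal observer's y0 encodes prior odds, so the true state is drawn
    # from that prior: P(H = +1) = e^{y0/D} / (1 + e^{y0/D})  (assumption above)
    prior = np.exp(y0 / D) / (1 + np.exp(y0 / D))
    state = np.where(rng.random(trials) < prior, 1.0, -1.0)
    y = np.full(trials, y0)
    t = np.zeros(trials)
    decision = np.zeros(trials)
    active = np.ones(trials, dtype=bool)
    while active.any():
        n = active.sum()
        y[active] += state[active] * dt + np.sqrt(2 * D * dt) * rng.standard_normal(n)
        up, dn = y >= theta, y <= -theta
        newly = active & (up | dn)
        decision[newly] = np.where(up[newly], 1.0, -1.0)
        t[active] += dt
        active &= ~(up | dn)
    return (decision == state).mean(), t.mean()

for y0 in (0.0, 0.4):
    acc, rt = simulate(y0)
    print(f"y0={y0}: accuracy={acc:.3f} (theory {1/(1+np.exp(-theta/D)):.3f}), mean DT={rt:.3f}")
```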
Appendix C. Mean response time on the second trial

We can compute the average time until a decision on trial 2 by marginalizing over d_1 = ±1,

DT_2 | (H^2 = H_\pm) = \sum_{s = \pm 1} T(s \cdot y_0^2; \theta_2) \cdot P(d_1 = s | H^2 = H_\pm),    (C.1)
since d_1 = ±1 will lead to the initial condition ±y_0^2 in trial 2. We can compute T(±y_0^2; θ_2) using Eq. (7) as

T(\pm y_0^2; \theta_2) = \theta_2 \coth\frac{\theta_2}{D} - \left[ \frac{\epsilon e^{\theta_1/D} + (1 - \epsilon)}{(1 - \epsilon) e^{\theta_1/D} + \epsilon} \right]^{\pm 1} \theta_2\, \mathrm{csch}\,\frac{\theta_2}{D} \mp D \ln \frac{(1 - \epsilon) e^{\theta_1/D} + \epsilon}{\epsilon e^{\theta_1/D} + (1 - \epsilon)}.    (C.2)
The second terms in the summands of Eq. (C.1) are given by

P(d_1 = \pm 1 | H^2 = H_\pm) = \frac{P(H^2 = H_\pm | d_1 = \pm 1) P(d_1 = \pm 1)}{P(H^2 = H_\pm)} = (1 - \epsilon) c_1 + \epsilon (1 - c_1),

P(d_1 = \mp 1 | H^2 = H_\pm) = \frac{P(H^2 = H_\pm | d_1 = \mp 1) P(d_1 = \mp 1)}{P(H^2 = H_\pm)} = \epsilon c_1 + (1 - \epsilon)(1 - c_1),

and note DT_2 = [DT_2 | H^2 = H_+] P(H^2 = H_+) + [DT_2 | H^2 = H_−] P(H^2 = H_−) = DT_2 | (H^2 = H_±). Using Eqs. (C.1)-(C.2) and simplifying, assuming θ_{1:2} = θ, we obtain Eq. (17). For unequal thresholds, θ_1 ≠ θ_2, a similar computation yields Eq. (19).
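The following short sketch checks numerically that Eqs. (C.1)-(C.2) reduce to the mean decision time of Eq. (27) when θ_1 = θ_2 = θ; the closed form of T(±y_0^2; θ) and the bias y_0 are taken from the reconstruction above and from the assumed form of Eq. (14), respectively, so the snippet is illustrative rather than definitive.

```python
# Consistency check between Eqs. (C.1)-(C.2) and Eq. (27) for equal thresholds.
import numpy as np

D, theta, eps = 1.0, 1.2, 0.2
A = (1 - eps) * np.exp(theta / D) + eps
B = eps * np.exp(theta / D) + (1 - eps)
y0 = D * np.log(A / B)                        # initial bias on trial 2 (assumed Eq. (14))
c1 = 1 / (1 + np.exp(-theta / D))             # accuracy on trial 1 (Appendix B)

def T(pm):                                     # Eq. (C.2) with theta_1 = theta_2 = theta
    return (theta / np.tanh(theta / D)
            - (B / A) ** pm * theta / np.sinh(theta / D)
            - pm * y0)

w_plus = (1 - eps) * c1 + eps * (1 - c1)       # P(d_1 aligned with H^2), cf. Appendix C
w_minus = eps * c1 + (1 - eps) * (1 - c1)
DT2_appC = w_plus * T(+1) + w_minus * T(-1)    # Eq. (C.1)
DT2_eq27 = theta * np.tanh(theta / (2 * D)) - (1 - 2 * eps) * (2 * c1 - 1) * y0
print(round(DT2_appC, 6), round(DT2_eq27, 6))  # the two values should agree
```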
Appendix D. Repetition-dependence of correct probabilities and decision times

Here we derive formulas for correct probabilities c_j and average decision times DT_j in trial j, conditioned on whether the current trial is a repetition (R: H^j = H^{j−1}) or alternation (A: H^j ≠ H^{j−1}) compared with the previous trial. To begin, we first determine the correct probability, conditioned on repetitions (R), c_j|R := c_j|(H^j = H^{j−1} = H_±):

c_j|R = \sum_{s = \pm 1} \pi_{\pm\theta}(\pm s \cdot y_0^j) \cdot P(d_{j-1} = \pm s | H^j = H^{j-1} = H_\pm)

= \frac{1 - e^{-\theta/D} e^{-y_0^j/D}}{1 - e^{-2\theta/D}} \cdot \frac{1}{1 + e^{-\theta/D}} + \frac{1 - e^{-\theta/D} e^{+y_0^j/D}}{1 - e^{-2\theta/D}} \cdot \frac{e^{-\theta/D}}{1 + e^{-\theta/D}}

= \frac{(1 - \epsilon) e^{\theta/D}}{(1 + e^{-\theta/D})\left[(1 - \epsilon) e^{\theta/D} + \epsilon\right]} + \frac{\epsilon}{(1 + e^{-\theta/D})\left[\epsilon e^{\theta/D} + (1 - \epsilon)\right]}

= \frac{\epsilon (1 - \epsilon) e^{2\theta/D} + (1 - \epsilon) e^{\theta/D} + \epsilon^2}{(1 + e^{-\theta/D})\left[(1 - \epsilon) e^{\theta/D} + \epsilon\right]\left[\epsilon e^{\theta/D} + (1 - \epsilon)\right]}.

In the case of alternations, c_j|A := c_j|(H^j ≠ H^{j−1}) = c_j|(H^j ≠ H^{j−1} = H_±):

c_j|A = \sum_{s = \pm 1} \pi_{\mp\theta}(\mp s \cdot y_0^j) \cdot P(d_{j-1} = \mp s | H^j \ne H^{j-1} = H_\pm)

= \frac{1 - e^{-\theta/D} e^{+y_0^j/D}}{1 - e^{-2\theta/D}} \cdot \frac{1}{1 + e^{-\theta/D}} + \frac{1 - e^{-\theta/D} e^{-y_0^j/D}}{1 - e^{-2\theta/D}} \cdot \frac{e^{-\theta/D}}{1 + e^{-\theta/D}}

= \frac{\epsilon e^{\theta/D}}{(1 + e^{-\theta/D})\left[\epsilon e^{\theta/D} + (1 - \epsilon)\right]} + \frac{1 - \epsilon}{(1 + e^{-\theta/D})\left[(1 - \epsilon) e^{\theta/D} + \epsilon\right]}

= \frac{\epsilon (1 - \epsilon) e^{2\theta/D} + \epsilon e^{\theta/D} + (1 - \epsilon)^2}{(1 + e^{-\theta/D})\left[(1 - \epsilon) e^{\theta/D} + \epsilon\right]\left[\epsilon e^{\theta/D} + (1 - \epsilon)\right]}.

Thus, we can show that the correct probability in the case of repetitions is higher than that for alternations, c_j|R > c_j|A, for any θ > 0 and ε ∈ [0, 0.5), since

c_j|R - c_j|A = \frac{(1 - 2\epsilon)\left[e^{\theta/D} - 1\right]}{(1 + e^{-\theta/D})\left[(1 - \epsilon) e^{\theta/D} + \epsilon\right]\left[\epsilon e^{\theta/D} + (1 - \epsilon)\right]} > 0.
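A quick numerical check of the closed forms just derived (as reconstructed here) is given below; it confirms that c_j|R exceeds c_j|A by exactly the displayed difference for several values of ε.

```python
# Evaluate c_j|R and c_j|A and verify the displayed difference formula;
# a short sketch assuming the reconstructed expressions are transcribed correctly.
import numpy as np

D, theta = 1.0, 1.0
eps = np.array([0.05, 0.25, 0.45])
E = np.exp(theta / D)
den = (1 + np.exp(-theta / D)) * ((1 - eps) * E + eps) * (eps * E + (1 - eps))
cR = (eps * (1 - eps) * E**2 + (1 - eps) * E + eps**2) / den
cA = (eps * (1 - eps) * E**2 + eps * E + (1 - eps)**2) / den
diff = (1 - 2 * eps) * (E - 1) / den
print(np.round(cR, 4), np.round(cA, 4))
print(np.allclose(cR - cA, diff))              # True: matches the difference above
```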
Now, we also demonstrate that decision times are shorter for repetitions than for alternations. First, note that when the current trial is a repetition, we can again marginalize to compute DT_j|R := DT_j|(H^j = H^{j−1} = H_±):

DT_j|R = T(+y_0^j; \theta)\, \frac{1}{1 + e^{-\theta/D}} + T(-y_0^j; \theta)\, \frac{e^{-\theta/D}}{1 + e^{-\theta/D}},

whereas in the case of alternations,

DT_j|A = T(+y_0^j; \theta)\, \frac{e^{-\theta/D}}{1 + e^{-\theta/D}} + T(-y_0^j; \theta)\, \frac{1}{1 + e^{-\theta/D}}.

By subtracting DT_j|A from DT_j|R, we can show that DT_j|R < DT_j|A for all θ > 0 and ε ∈ [0, 0.5). To begin, note that

DT_j|R - DT_j|A = \frac{1 - e^{-\theta/D}}{1 + e^{-\theta/D}} \left[ T(+y_0^j; \theta) - T(-y_0^j; \theta) \right].
Since it is clear that the first factor in the product above is positive for θ, D > 0, we can show DT_j|R − DT_j|A < 0, and thus DT_j|R < DT_j|A, if T(+y_0^j; θ) < T(−y_0^j; θ). We verify this by first noting that the function F(y) = (e^{y/D} − e^{−y/D})/y is nondecreasing for y ≥ 0, since F′(y) = 2\left[ \frac{y}{D} \cosh\frac{y}{D} - \sinh\frac{y}{D} \right]/y^2 ≥ 0, which follows from y/D ≥ tanh(y/D) for y ≥ 0. In fact, F(y) is strictly increasing for y > 0. This means that as long as 0 < y_0 < θ, F(y_0) < F(θ), which implies

T(+y_0^j; \theta) - T(-y_0^j; \theta) = 2\theta\, \frac{e^{y_0/D} - e^{-y_0/D}}{e^{\theta/D} - e^{-\theta/D}} - 2 y_0 < 0,
so T(+y_0^j; θ) < T(−y_0^j; θ), as we wished to show, and thus DT_j|R < DT_j|A.

Moving to three-state sequences, we note that we can use our formulas for c_j|R and c_j|A in Appendix D, as well as π_θ(y) described by Eq. (2), to compute

c_RR := c_j|RR = π_{±θ}(±y_0^j) · c_j|R + π_{±θ}(∓y_0^j) · (1 − c_j)|R,
c_RA := c_j|RA = π_{±θ}(∓y_0^j) · c_j|R + π_{±θ}(±y_0^j) · (1 − c_j)|R,
c_AR := c_j|AR = π_{±θ}(±y_0^j) · c_j|A + π_{±θ}(∓y_0^j) · (1 − c_j)|A,
c_AA := c_j|AA = π_{±θ}(∓y_0^j) · c_j|A + π_{±θ}(±y_0^j) · (1 − c_j)|A.

Applying similar logic to the calculation of the decision times as they depend on the three-state sequences, we find that

T_RR := DT_j|RR = T(+y_0^j; θ) · c_j|R + T(−y_0^j; θ) · (1 − c_j)|R,
T_RA := DT_j|RA = T(−y_0^j; θ) · c_j|R + T(+y_0^j; θ) · (1 − c_j)|R,
T_AR := DT_j|AR = T(+y_0^j; θ) · c_j|A + T(−y_0^j; θ) · (1 − c_j)|A,
T_AA := DT_j|AA = T(−y_0^j; θ) · c_j|A + T(+y_0^j; θ) · (1 − c_j)|A.

Appendix E. Numerical simulations

All numerical simulations of the DDM were performed using the Euler-Maruyama method with a timestep dt = 0.005, computing each point from 10^5 realizations. To optimize reward rates, we used the Nelder-Mead simplex method (fminsearch in MATLAB). All code used to generate figures will be available on github.

References

Abrahamyan, A., Silva, L. L., Dakin, S. C., Carandini, M., Gardner, J. L., 2016. Adaptable history biases in human perceptual decisions. Proceedings of the National Academy of Sciences 113 (25), E3548–E3557.
Akrami, A., Kopec, C. D., Diamond, M. E., Brody, C. D., 2018. Posterior parietal cortex represents sensory history and mediates its effects on behaviour. Nature 554 (7692), 368.
Balci, F., Simen, P., Niyogi, R., Saxe, A., Hughes, J. A., Holmes, P., Cohen, J. D., 2011. Acquisition of decision making criteria: reward rate ultimately beats accuracy. Attention, Perception, & Psychophysics 73 (2), 640–657.
Bertelson, P., 1961. Sequential redundancy and speed in a serial two-choice responding task. Quarterly Journal of Experimental Psychology 13 (2), 90–102. Bogacz, R., Brown, E., Moehlis, J., Holmes, P., Cohen, J. D., 2006. The physics of optimal decision making: a formal analysis of models of performance in two-alternative forced-choice tasks. Psychological review 113 (4), 700. Bogacz, R., Hu, P. T., Holmes, P. J., Cohen, J. D., 2010. Do humans produce the speed–accuracy trade-off that maximizes reward rate? Quarterly journal of experimental psychology 63 (5), 863–891. Braun, A., Urai, A. E., Donner, T. H., Jan. 2018. Adaptive History Biases Result from Confidence-weighted Accumulation of Past Choices. Journal of Neuroscience. Brody, C. D., Hanks, T. D., 2016. Neural underpinnings of the evidence accumulator. Current opinion in neurobiology 37, 149–157. Brunton, B. W., Botvinick, M. M., Brody, C. D., 2013. Rats and humans can optimally accumulate evidence for decision-making. Science 340 (6128), 95–98. Chittka, L., Skorupski, P., Raine, N. E., 2009. Speed–accuracy tradeoffs in animal decision making. Trends in ecology & evolution 24 (7), 400–407. Cho, R. Y., Nystrom, L. E., Brown, E. T., Jones, A. D., Braver, T. S., Holmes, P. J., Cohen, J. D., 2002. Mechanisms underlying dependencies of performance on stimulus history in a two-alternative forced-choice task. Cognitive, Affective, & Behavioral Neuroscience 2 (4), 283–299. Corcoran, A. J., Conner, W. E., 2016. How moths escape bats: predicting outcomes of predator–prey interactions. Journal of Experimental Biology 219 (17), 2704–2715. Dehaene, S., Sigman, M., 2012. From a single decision to a multi-step algorithm. Current opinion in neurobiology 22 (6), 937–945. Diederich, A., Busemeyer, J. R., 2006. Modeling the effects of payoff on response bias in a perceptual discrimination task: Bound-change, driftrate-change, or two-stage-processing hypothesis. Perception & Psychophysics 68 (2), 194–207. Fawcett, T. W., Fallenstein, B., Higginson, A. D., Houston, A. I., Mallpress, D. E., Trimmer, P. C., McNamara, J. M., et al., 2014. The evolution of decision rules in complex environments. Trends in cognitive sciences 18 (3), 153–161. Fernberger, S. W., 1920. Interdependence of judgments within the series for the method of constant stimuli. Journal of Experimental Psychology 3 (2), 126. Fr¨und, I., Wichmann, F. A., Macke, J. H., Jun. 2014. Quantifying the effect of intertrial dependence on perceptual decisions. Journal of Vision 14 (7). Gao, J., Wong-Lin, K., Holmes, P., Simen, P., Cohen, J. D., 2009. Sequential effects in two-choice reaction time tasks: decomposition and synthesis of mechanisms. Neural Computation 21 (9), 2407–2436. Gardiner, C., 2009. Stochastic methods. Vol. 4. springer Berlin. Glaze, C. M., Filipowicz, A. L., Kable, J. W., Balasubramanian, V., Gold, J. I., 2018. A bias–variance trade-off governs individual differences in on-line learning in an unpredictable environment. Nature Human Behaviour 2 (3), 213. Gold, J. I., Shadlen, M. N., 2002. Banburismus and the brain: decoding the relationship between sensory stimuli, decisions, and reward. Neuron 36 (2), 299–308. Gold, J. I., Shadlen, M. N., 2007. The neural basis of decision making. Annual review of neuroscience 30. Goldfarb, S., Wong-Lin, K., Schwemmer, M., Leonard, N. E., Holmes, P., 2012. Can post-error dynamics explain sequential reaction time patterns? Frontiers in Psychology 3, 213. Gordon, D. M., 1991. Behavioral flexibility and the foraging ecology of seed-eating ants. 
The American Naturalist 138 (2), 379–411. Hsu, M., Bhatt, M., Adolphs, R., Tranel, D., Camerer, C. F., 2005. Neural systems responding to degrees of uncertainty in human decision-making. Science 310 (5754), 1680–1683. Jones, M., Curran, T., Mozer, M. C., Wilder, M. H., 2013. Sequential effects in response time reveal learning mechanisms and event representations. Psychological review 120 (3), 628. Kilpatrick, Z. P., 2018. Synaptic mechanisms of interference in working memory. Scientific Reports 8 (1), 7879. URL https://doi.org/10.1038/s41598-018-25958-9 Kirby, N. H., 1976. Sequential effects in two-choice reaction time: Automatic facilitation or subjective expectancy? Journal of Experimental Psychology: Human perception and performance 2 (4), 567. Latimer, K. W., Yates, J. L., Meister, M. L., Huk, A. C., Pillow, J. W., 2015. Single-trial spike trains in parietal cortex reveal discrete steps during decision-making. Science 349 (6244), 184–187. Moat, H. S., Curme, C., Avakian, A., Kenett, D. Y., Stanley, H. E., Preis, T., 2013. Quantifying wikipedia usage patterns before stock market moves. Scientific reports 3, 1801. Nogueira, R., Abolafia, J. M., Drugowitsch, J., Balaguer-Ballester, E., Sanchez-Vives, M. V., Moreno-Bote, R., 2017. Lateral orbitofrontal cortex anticipates choices and integrates prior with current information. Nature Communications 8, 14823. Olianezhad, F., Tohidi-Moghaddam, M., Zabbah, S., Ebrahimpour, R., Nov. 2016. Residual Information of Previous Decision Affects Evidence Accumulation in Current Decision. arXiv.org. Papadimitriou, C., Ferdoash, A., Snyder, L. H., 2015. Ghosts in the machine: memory interference from the previous trial. Journal of neurophysiology 113 (2), 567–577. Pashler, H., Baylis, G. C., 1991. Procedural learning: Ii. intertrial repetition effects in speeded-choice tasks. Journal of Experimental Psychology: Learning, Memory, and Cognition 17 (1), 33. Radillo, A. E., Veliz-Cuba, A., Josi´c, K., Kilpatrick, Z. P., 2017. Evidence accumulation and change rate inference in dynamic environments. Neural computation 29 (6), 1561–1610. Raposo, D., Sheppard, J. P., Schrater, P. R., Churchland, A. K., 2012. Multisensory decision-making in rats and humans. Journal of neuroscience 32 (11), 3726–3735. Ratcliff, R., 1978. A theory of memory retrieval. Psychological review 85 (2), 59. Ratcliff, R., 1985. Theoretical interpretations of the speed and accuracy of positive and negative responses. Psychological review 92 (2), 212. Ratcliff, R., McKoon, G., 2008. The diffusion decision model: theory and data for two-choice decision tasks. Neural computation 20 (4), 873–922. Ratcliff, R., Smith, P. L., 2004. A comparison of sequential sampling models for two-choice reaction time. Psychological review 111 (2), 333. Schall, J. D., 2001. Neural basis of deciding, choosing and acting. Nature Reviews Neuroscience 2 (1), 33.
Seeley, T. D., Camazine, S., Sneyd, J., 1991. Collective decision-making in honey bees: how colonies choose among nectar sources. Behavioral Ecology and Sociobiology 28 (4), 277–290. Shadlen, M. N., Newsome, W. T., 2001. Neural basis of a perceptual decision in the parietal cortex (area lip) of the rhesus monkey. Journal of neurophysiology 86 (4), 1916–1936. Stevenson, P. A., Rillich, J., 2012. The decision to fight or flee–insights into underlying mechanism in crickets. Frontiers in neuroscience 6, 118. Stone, M., 1960. Models for choice-reaction time. Psychometrika 25 (3), 251–260. Todd, P. M., Gigerenzer, G., 2007. Environments that make us smart: Ecological rationality. Current directions in psychological science 16 (3), 167–171. Urai, A. E., de Gee, J. W., Donner, T. H., 2018. Choice history biases subsequent evidence accumulation. bioRxiv, 251595. Veliz-Cuba, A., Kilpatrick, Z. P., Josi´c, K., 2016. Stochastic models of evidence accumulation in changing environments. SIAM Review 58 (2), 264–289. Wald, A., Wolfowitz, J., 1948. Optimum character of the sequential probability ratio test. The Annals of Mathematical Statistics, 326–339. Yu, A. J., Cohen, J. D., 2008. Sequential effects: Superstition or rational behavior? Advances in Neural Information Processing Systems 21, 1873–1880.