Computational models as statistical tools


Daniel Durstewitz, Georgia Koppe¹ and Hazem Toutounji¹

Traditionally, models in statistics are relatively simple ‘general purpose’ quantitative inference tools, while models in computational neuroscience aim more at mechanistically explaining specific observations. Research on methods for inferring behavioral and neural models from data, however, has shown that a lot could be gained by merging these approaches, augmenting computational models with distributional assumptions. This enables estimation of parameters of such models in a principled way, comes with confidence regions that quantify uncertainty in estimates, and allows for quantitative assessment of prediction quality of computational models and tests of specific hypotheses about underlying mechanisms. Thus, unlike in conventional statistics, inferences about the latent dynamical mechanisms that generated the observed data can be drawn. Future directions and challenges of this approach are discussed.

Address: Department of Theoretical Neuroscience, Bernstein Center for Computational Neuroscience, Central Institute of Mental Health, Medical Faculty Mannheim of Heidelberg University, Mannheim, Germany

Corresponding author: Durstewitz, Daniel ([email protected])
¹ Equal contribution.

Current Opinion in Behavioral Sciences 2016, 11:93–99
This review comes from a themed issue on Computational modeling
Edited by Peter Dayan and Daniel Durstewitz

http://dx.doi.org/10.1016/j.cobeha.2016.07.004
2352-1546/© 2016 Elsevier Ltd. All rights reserved.

Introduction

In traditional statistics, models are general-purpose devices in the sense that they could be applied to a large class of experimental situations, originating in various fields and disciplines, where inference about a set of observed data is sought. A General Linear Model (GLM), for instance, relies on assumptions about the distribution of the data (or error terms), and the functional form of the relationship between predictors and outcomes (linearity), but otherwise makes no claims about the specific processes or mechanisms that underlie the data at hand. Parameters in the model (like the ‘beta weights’ in the GLM) obtain their meaning only within the specific experimental context investigated. Statistical models are usually simple (often linear), with relatively few or strongly constrained (penalized) parameters, to render the inference process well-defined and tractable.

Models in computational neuroscience, on the other hand, are traditionally tools for gaining insight into the possible processes and mechanisms that underlie experimental observations. They are put forward to advance an explanation for a pattern of experimental results, not necessarily at a quantitative but, at least in the past, often at a rather qualitative level (but see [1,2]). For instance, a classical observation in prefrontal cortex neurophysiology is that single cells recorded in vivo appear to hop from a low-firing into a high-firing rate state during the delay period of a working memory task, when a specific item has to be retained in short-term memory to guide subsequent responding [3]. A ‘classical’ account for this observation is that the underlying network is a multi-stable dynamical system where the single neuron ‘hopping’ is a consequence of the network switching between different stimulus-selective attractor states (e.g. [4]). Although these models are often loosely adapted to capture key aspects (or moments) of the data, like the mean spiking rate and its coefficient of variation, their parameters are not estimated in a principled or systematic manner to capture the full data distribution (although fitting by least squares, without explicitly specifying probability distributions, is sometimes used, e.g. [1,5]). They serve to provide an explanation for a key observation, not necessarily to explain all variation in a specific data set. Computational models are often complicated, highly nonlinear, and have a large number of parameters.

Both approaches are obviously justified in their own right, and both – statistics in particular – are anchored in their own long-standing research traditions. Here we will argue that a lot could be gained by merging them (see also [6]).
It is emphasized that this is not, per se, a new idea: Statistical estimation of computational process models has indeed a longer history in various fields of the life sciences, like ecology (e.g. [7]) or biochemistry [8], and, somewhat more recently, also in some areas of the neuro- and behavioral sciences (see below). In neuroscience, however, it is not yet a widespread idea, and still one associated with many open issues.

Integrating computational models into a statistical framework

As with comparatively simple statistical models, computational models can be augmented with probability assumptions that allow for principled inference by maximum likelihood or Bayesian approaches. Some of these


may follow naturally from the type of data, as for instance if the model produces as its output binary behavioral choices (e.g., correct vs. incorrect) or spike counts, which follow a Bernoulli process and may be captured by a binomial or a Poisson distribution. In other cases, the Gaussian distribution might be a reasonable choice. A more challenging aspect with computational models, often referred to as generative models in this context [9], is that these commonly comprise latent (hidden) states not directly instantiated through observed data, often embedded within nonlinear functional relationships. To make the discussion more concrete, consider process (generative) models of the general form (Figure 1, [6,10,11]):

p(y_t | z_t) = g_θ(h(z_t)),
z_t = f_λ(z_{t−1}, u_t, ε_t),     (1)

where Y = {y_t} is a (vector) time series of observed outputs (like behavioral responses) with probability distribution g_θ, which depends solely on the underlying unobserved states z_t at time t and parameters θ. The link function h⁻¹ connects the hidden process to natural parameters of the distribution g (usually its mean). The hidden states z_t themselves form a dynamical system (which may be given through difference, as here, or differential equations²), where f_λ is the (potentially nonlinear) transition function, parameterized through λ and affected by known (fixed) inputs u_t (e.g. experimental conditions or stimuli), and a stochastic factor ε_t, known as process noise (Figure 1). Process noise may either reflect unknowns in the specific form, parameter space, inputs, or other factors of the underlying dynamical system, or it may capture known biological noise sources (such as probabilistic synaptic release [12]) that might come with a computational purpose (e.g., escaping from local optima [13] or sampling from probability distributions for inference [14]). The goal of statistical estimation would be to obtain, in the general case, both estimates for the unknown parameters {θ, λ} and the posterior distribution over the unobserved latent states Z = {z_t}, given the observed data series {y_t} and regressors {u_t}. A simple statistical example is factor analysis, where the latent states (factors) Z give rise to the observations Y through a linear-Gaussian model. Embedding computational models in such a framework has a number of profound advantages (see also [6]):

1) Model parameters are not hand-tuned or guessed in some arbitrary or rough fashion, but obtained through a principled optimization approach using a well-defined criterion function (e.g., the probability or density of the data given the parameters, as in maximum likelihood).
² In continuous time, the latent process is described by a stochastic differential equation, ż(t) = f_λ(z(t), u(t), ε(t)), where ε(t) specifies the process noise.

2) The estimation comes with confidence intervals (or credibility intervals in a Bayesian approach) by virtue of the probability assumptions connected with the model. Thus we gain a quantitative sense of how much confidence we can put into the estimated model parameters.

3) We can directly test hypotheses about (the relevance of) specific model parameters and the computational processes associated with them, e.g. through likelihood ratio statistics [15]. Approaches like hierarchical Bayesian (mixed-effects) modeling additionally give us insight into the structure of parameter space itself and the form of the (prior) distribution of parameters across individuals [16].

4) Vice versa, through the tight formal link to the experimental observations, the models can more directly inform the experimental design, such as to optimally de-correlate or disentangle specific model parameters.

5) We can also compare quite different computational models with respect to how well they account for the observed data, using principled means like likelihood-based information criteria [17] (e.g., the Akaike or the Bayesian Information Criterion), sampling- or cross-validation-based estimates of prediction error [18,19], or Bayesian model comparison [20,21,22] (see [23] in this issue, and [19,20,24] for more in-depth reviews of model comparison). In these approaches, finding the model which minimizes out-of-sample prediction error may be seen as the ultimate target ([19]; see also below).

6) Procedures to obtain estimates of prediction error will also give us a specific idea of how much we might be over-fitting the data.

7) Perhaps most excitingly, the fact that computational models, different from a pure statistical approach, include process assumptions about underlying latent states implies that we have a means to look beyond the ‘data surface’, to gain insight into the mechanisms that may have produced the observed data, rather than just establishing the statistical significance of a pattern observed in the data.
That is, we may be able to infer, in a maximum likelihood or Bayesian sense, the dynamical process underlying the observed data. It thus seems we can only win by placing computational models into a statistical context. But, of course, there are also caveats and limitations. To obtain the likelihood p(Y | θ, λ), we need to integrate the joint probability p(Y, Z | θ, λ) across the usually very high-dimensional hidden state path Z = {z_t}, using efficient algorithms like Expectation-Maximization [25] and the Kalman filter-smoother recursions [26,27] (see [28,29] for alternative approaches). Originally, these approaches have been developed for linear models (i.e., with h and f being linear functions) and Gaussian assumptions on both the outputs (g) and the process noise (ε), a statistical framework often referred to as ‘state space models’ [10]. However, linear models are very restricted in the types of dynamics they can produce, exhibiting either fixed point behavior or simple harmonic oscillations which are highly sensitive to noise (e.g. [6]). With nonlinear (dynamical) transition equations and/or non-Gaussian observations, on the other hand, only approximate analytical or comparatively time-consuming,


Figure 1

Graphical representation of state-space (generative) models. (a) Latent variable (state-space, generative) model for sequential data. Open white circles refer to the generating latent process and parameters, open gray circles to observables, and the black node to known inputs. (b) In this state space example, the input {u_t} is a sequence of either rewarding (cherry) or aversive (lime) stimuli, the latent variable {z_t} represents stimulus values that are learned over time, and the observed variable {y_t} corresponds to neural spike trains that encode the latent values.

sampling-based numerical solutions are usually feasible (e.g. [17,30]). The complexity of computational models, both in terms of their state space dimensionality and their numbers of parameters, thus becomes much more of a burden than in conventional computational modeling. Problems with parameter identifiability frequently ensue [31,32], which may require additional measures like including penalty terms and constraints in the optimization process [33]. Hence, considerable resources in terms of computing time, and expertise for setting up, running, and evaluating these models, are currently often required.
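As a concrete illustration of the generative framework in Eq. (1), the following sketch simulates a nonlinear latent process with Gaussian process noise and Poisson count observations through a log link. All parameter values, and the tanh transition function, are illustrative assumptions of ours, not taken from any specific study:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200

# Latent nonlinear dynamics: z_t = f_lambda(z_{t-1}, u_t, eps_t)
# (here a tanh-saturated autoregressive process; choices are illustrative)
lam_a, lam_b = 0.9, 1.5                    # transition parameters (lambda)
u = (rng.random(T) < 0.5).astype(float)    # binary "stimulus" inputs u_t
z = np.zeros(T)
for t in range(1, T):
    eps = 0.1 * rng.standard_normal()      # process noise eps_t
    z[t] = lam_a * np.tanh(z[t - 1]) + lam_b * u[t] + eps

# Observation model p(y_t | z_t) = g_theta(h(z_t)):
# Poisson counts with a log link, rate_t = exp(theta0 + theta1 * z_t)
theta0, theta1 = 0.5, 1.0                  # observation parameters (theta)
rate = np.exp(theta0 + theta1 * z)
y = rng.poisson(rate)                      # observed count series y_t
print(y[:10])
```

Statistical estimation would then run in the opposite direction: given only {y_t} and {u_t}, infer {θ, λ} and the posterior over the latent path {z_t}.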

Behavioral computational-statistical models

For behavioral computational models, statistical estimation has received growing interest especially within the past decade, due to rapid advancements and the increasing availability of model estimation and selection techniques [20,21,22,34]. We focus here on examples from arguably the two most influential classes of models: reinforcement and belief learning models on the one hand, and sequential sampling models for decision making on the other. Reinforcement learning (RL) models learn values for state-action pairs from repeated experience based on reward prediction errors, that is, the differences between expected and actually received rewards [35]. Based on these learnt (iteratively updated) (state, action)-values, they generally select among two or more available behavioral options according to a probabilistic (e.g. Boltzmann-type) choice function. The choice function therefore constitutes a natural link (h in Eq. (1)) from the RL process to the probability parameters of a bi- or multinomial output distribution (g in Eq. (1)). The (state, action)-values may be seen as the underlying latent variables {z_t} in Eq. (1),

which — in the simplest case — follow a linear deterministic updating process (i.e., no process noise in Eq. (1)). Belief-learning models are similar (in fact, they may encompass RL models as a special case [36,37]), but assume in addition that learning also occurs for non-chosen actions and fictive rewards. They have been applied mainly to study learning in the context of social decision making and game theory, i.e. when personal values and actions depend on the observed actions and inferred values of others. Since the latent process is usually deterministic in these models (but see [38–40]), parameter recovery through maximization of a closed-form likelihood under the assumption of a Markov Decision Process is relatively straightforward. The benefit that this type of approach has brought to neuroscience is the ability to infer, and formally and quantitatively characterize, a variety of underlying psychological processes from observed behavioral responses, including computational parameters that control the rate of learning [41], the exploration-exploitation tradeoff [42], reward sensitivity [43], or memory [44], to name but a few. In combination with model selection techniques [20,21,22], they provide the means to explicitly test and disentangle hypotheses on specific alterations of these processes among subject groups in mechanistic terms, e.g. with respect to mental disorders [43–45], or to study the neural implementation of different learning algorithms (e.g. [46–49]). The drift diffusion model (DDM) [50,51], as an instance of sequential sampling models, is another example of a highly successful cognitive process model which has been employed as a ‘statistical tool’ to elucidate basic cognitive processes underlying (2-choice) decision making under temporal constraints [52–54]. DDMs, usually formulated in continuous time, perform a noisy integration of relative

96 Computational modeling

evidence through a latent state variable z(t) (see Eq. (1) and footnote 2), driven by a constant drift term (which embodies the relative strength of evidence) and a (usually Gaussian/Wiener) diffusion noise process ε(t). A binary choice is emitted once z(t) crosses one of two decision boundaries. The drift rate, a non-decision time related to all non-decision (like perceptual or motor) processes, and an a priori bias, all modeled as random variables, along with the parameter setting the distance between decision boundaries, determine the pattern of choices and trial-by-trial reaction times {y_t}. DDMs, unlike conventional statistical approaches, thereby take into account trial-to-trial variations and the full (typically non-Gaussian) reaction time distributions for correct vs. incorrect choices, to infer underlying information processing components. As discussed in the previous section, the process noise in the evidence accumulation aggravates statistical estimation of these models, but a variety of solutions exist [55–57], together with publicly available code [16,57,58].
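The forward (generative) direction of the DDM can be sketched with a simple Euler-Maruyama simulation. All parameter values below are arbitrary illustrations, and, unlike in the full model described above, drift, non-decision time and bias are held fixed across trials for simplicity; real applications would estimate these quantities from choice and RT data, e.g. with the toolboxes cited above:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_ddm(drift=0.25, bound=1.0, z0=0.0, ndt=0.3,
                 sigma=1.0, dt=1e-3, max_t=5.0):
    """Simulate one DDM trial by Euler-Maruyama integration.
    Returns (choice, rt): choice is +1 (upper bound), -1 (lower bound),
    or 0 if no bound was reached by max_t; rt includes the
    non-decision time ndt."""
    z, t = z0, 0.0
    while abs(z) < bound and t < max_t:
        z += drift * dt + sigma * np.sqrt(dt) * rng.standard_normal()
        t += dt
    choice = 1 if z >= bound else (-1 if z <= -bound else 0)
    return choice, t + ndt

trials = [simulate_ddm() for _ in range(500)]
choices, rts = zip(*trials)
p_upper = np.mean(np.array(choices) == 1)
print(f"P(upper) = {p_upper:.2f}, mean RT = {np.mean(rts):.2f} s")
```

Note how a positive drift skews choices toward the upper boundary while producing the right-skewed, non-Gaussian RT distributions the model is known for.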

Neural computational-statistical models

For neural systems, models have broadly been formulated at two levels: Either 1) neural recordings in the form of spike trains or neuroimaging data are used to estimate an abstract (network-level) representation of the underlying latent dynamics [17,59,60], connectivity or biophysical parameters [61], or for decoding stimulus features [62,63]; or 2) biophysically more detailed spiking single neuron models, such as integrate-and-fire-like [64–67] or Hodgkin-Huxley-like [30,68] models, are estimated from spike train or membrane potential recordings. Spike train observations represent point process or count data (if binned), such that a Poisson distribution for g in Eq. (1) is a natural assumption, often coupled to a spike intensity or rate produced by the latent dynamics through a log-link function h⁻¹, while for the latent dynamics itself, additive Gaussian noise ε ∼ N(0, Σ) is commonly assumed [17,29,59,62,63]. One aim with these models may be, for instance, to find a low-dimensional, latent recurrent neural network dynamic representation of high-dimensional spike train observations [59]. Another example is biophysically-anchored latent models of human MEG data to elucidate properties of synaptic transmission, capture pharmacological manipulations, and predict behavioral responses [60]. With continuously-valued observations, like field potentials (EEG) or membrane potential recordings, Gaussian assumptions for the output distribution g (Eq. (1)) may be more appropriate. On this basis, biophysically more detailed models have been used, for instance, to systematically infer ion channel and synaptic parameters of single neurons [30,68,69]. However, the large number of parameters associated with biophysically detailed neuron models makes it difficult to extend this approach to biophysical networks beyond a handful of neurons only

[70] (sometimes the hurdles here may be more on the computational side [18], however, as for biophysical models the available data are usually also more numerous and precise, in terms of spatio-temporal resolution and noise levels, than, e.g., for behavioral models). An alternative approach for making a link to the network level is therefore to start from stochastic differential equations for biophysically realistic, yet still simplified, single neuron models (e.g. [5,67]), and to translate these into Fokker-Planck equations for describing the mean-field dynamics of populations of such neurons [71]. Fokker-Planck equations are partial differential equations which, put in this context, describe the evolution of the membrane potential’s probability density (and potentially that of other single neuron variables like adaptation currents) [64,66,72,73], and thus can be used to probabilistically characterize, in the mean-field sense, the latent state evolution, i.e. ∂p(z(t))/∂t, of an entire population derived from biophysical single neuron models. The underlying state distribution may then be converted into a spike density or rate [64,73] that provides a link to series of observed spike times or counts.
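The Poisson observation model with a log link can be written down directly. The sketch below evaluates the log-likelihood of simulated spike counts under an assumed latent state (a sinusoid here, purely for illustration; in practice z would itself be inferred), showing that the likelihood favors the generating parameters over a constant-rate alternative:

```python
import numpy as np
from math import lgamma

def poisson_loglik(y, z, theta0, theta1):
    """Log-likelihood of binned spike counts y given latent states z,
    with log link: rate_t = exp(theta0 + theta1 * z_t)."""
    rate = np.exp(theta0 + theta1 * np.asarray(z, dtype=float))
    y = np.asarray(y)
    # sum_t [ y_t * log(rate_t) - rate_t - log(y_t!) ]
    return float(np.sum(y * np.log(rate) - rate)
                 - sum(lgamma(int(k) + 1) for k in y))

rng = np.random.default_rng(2)
z = np.sin(np.linspace(0, 4 * np.pi, 300))   # illustrative latent state
y = rng.poisson(np.exp(0.5 + 2.0 * z))       # counts at true (0.5, 2.0)

ll_true = poisson_loglik(y, z, 0.5, 2.0)     # generating parameters
ll_bad = poisson_loglik(y, z, 0.0, 0.0)      # constant-rate alternative
print(ll_true > ll_bad)
```

Maximizing such a likelihood jointly over parameters and latent states is exactly the (much harder) estimation problem that the EM and sampling-based schemes discussed above address.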

Future directions

There are several areas in this field that need further attention. First, we still need to find efficient ways of dealing with larger-scale models comprising very many parameters and high-dimensional state spaces. One possibility is hierarchical, stepwise approaches. For instance, single neuron parameters of cells in a biophysical network model may first be estimated from in vitro electrophysiological recordings and then fixed [5], and similarly for the properties (conductances, time constants, etc.) of synaptic currents. At the next, network level, statistics derived from in vivo electrophysiological measurements and anatomical studies may then be used to infer connectivity parameters of the model (cf. [2]). Often we may have to combine data from qualitatively quite different sources (e.g., anatomical and physiological; [1]) to sufficiently constrain the model. More generally, joint estimation from data at different levels, specifically neural and behavioral [74], generated by the same underlying model, may be a powerful way forward for linking different scales. Sometimes it may be possible to lump parameters in a physiologically reasonable way (as with simplified spiking neuron models, e.g. [5,75,76]), or parameter distributions may be defined which are governed by a much smaller set of (meta-)parameters that one attempts to estimate. For instance, we may not need to estimate the strength of each synaptic connection in a network model, but just the parameters of their distribution. Another issue is the criteria for model selection and comparison (see also [23]). Good out-of-sample prediction in the statistical sense, as e.g. estimated by cross-validation, may not be enough to guarantee we are dealing with the right model mechanism [18]. This is, partly (see also


discussion in [23]), because it only probes prediction within the same data domain, i.e. with respect to new observations drawn from the same statistical distribution that also underlies model estimation. We may demand, however, that a good computational model should also predict new observations within regions of data space that were not assessed through the initial experiments, call it ‘out-of-domain’ rather than just ‘out-of-sample’ predictions (e.g. [5,18]). For instance, we may want a model inferred solely from physiological data to also make good behavioral predictions, or a model inferred from one cognitive task to also predict behavior on a different cognitive task. These are still very much underexplored theoretical issues that, we think, need to be addressed to take this whole approach to the next level.

Conflicts of interest
Nothing declared.

Acknowledgements
This work was funded through the German Science Foundation (DFG) within the SPP-1665 (Du 354/8-1) and SFB 1134, and through the German Ministry for Education and Research (BMBF) via the e:Med framework (01ZX1311A & 01ZX1314G).

References

1. Fisher D, Olasagasti I, Tank D, Aksay EF, Goldman M: A modeling framework for deriving the structural and functional architecture of a short-term memory microcircuit. Neuron 2013, 79:987-1000.
First study to demonstrate estimation (by regularized, constrained least-squared-error) of a full biophysical network model from a large variety of qualitatively different data at the anatomical, physiological, and behavioral levels.

2. Hass J, Hertäg L, Durstewitz D: A detailed data-driven network model of prefrontal cortex reproduces key features of in vivo activity. PLoS Comput Biol 2016, 12:e1004930.

3. Fuster JM, Alexander GE: Neuron activity related to short-term memory. Science 1971, 173:652-654.

4. Durstewitz D, Seamans JK, Sejnowski TJ: Neurocomputational models of working memory. Nat Neurosci 2000, 3:1184-1191.

5. Hertäg L, Hass J, Golovko T, Durstewitz D: An approximation to the adaptive exponential integrate-and-fire neuron model allows fast and predictive fitting to physiological data. Front Comput Neurosci 2012, 6:62.
Study that provides closed-form approximate expressions for initial and steady-state firing rates of a simple yet powerful single neuron model (the adaptive exponential leaky-integrate-&-fire model) through which fast and efficient estimation of neuron parameters (through least-squared-error) became feasible. This study also demonstrates generalization performance to novel (‘out-of-domain’) observations with completely different statistics than used for model estimation.

6. Durstewitz D: Advanced Statistical Models in Neuroscience. Heidelberg, Germany: Springer (in press).

7. Wood SN: Statistical inference for noisy nonlinear ecological dynamic systems. Nature 2010, 466:1102-1104.
Highlights the fundamental problem of statistical estimation in nonlinear state space models within the chaotic regime, using examples from population ecology. A synthetic likelihood based on summary statistics is introduced as an approach to circumvent this problem.

8. Raue A, Kreutz C, Maiwald T, Bachmann J, Schilling M, Klingmüller U, Timmer J: Structural and practical identifiability analysis of partially observed dynamical models by exploiting the profile likelihood. Bioinformatics 2009, 25:1923-1929.

9. Bishop CM: Pattern Recognition and Machine Learning (Information Science and Statistics). New York: Springer-Verlag; 2006.

10. Durbin J, Koopman SJ: Time Series Analysis by State Space Methods. Oxford, UK: Oxford University Press; 2012.

11. Chen Z: Advanced State Space Methods for Neural and Clinical Data. Cambridge, UK: Cambridge University Press; 2015.

12. Jahr CE, Stevens CF: A quantitative description of NMDA receptor-channel kinetic behavior. J Neurosci 1990, 10:1830-1837.

13. Aarts E, Korst J: Simulated Annealing and Boltzmann Machines: A Stochastic Approach to Combinatorial Optimization and Neural Computing. New York, NY: John Wiley & Sons; 1989.

14. Maass W: Searching for principles of brain computation. Curr Opin Behav Sci 2016. (this issue).

15. Wilks SS: The large-sample distribution of the likelihood ratio for testing composite hypotheses. Ann Math Stat 1938, 9:60-62.

16. Wiecki TV, Sofer I, Frank MJ: HDDM: hierarchical Bayesian estimation of the drift-diffusion model in Python. Front Neuroinform 2013, 7:14.

17. Latimer KW, Yates JL, Meister MLR, Huk AC, Pillow JW: Single-trial spike trains in parietal cortex reveal discrete steps during decision-making. Science 2015, 349:184-187.
Addresses a long-standing question about single unit behavior during decision making tasks via model-based statistical inference. The authors show that the observed trial-averaged ramping activity of LIP neurons during decision making can be better explained, for most neurons, by sudden step-like changes in the firing rate on single trials, with trial-to-trial variability in the stepping time.

18. Meliza CD, Kostuk M, Huang H, Nogaret A, Margoliash D, Abarbanel HD: Estimating parameters and predicting membrane voltages with conductance-based neuron models. Biol Cybern 2014, 108:495-516.
One of the few studies which attempted to estimate quite a large number of unknown channel states (12) and parameters (72) of biophysical (Hodgkin-Huxley-type) model neurons from just scalar membrane potential observations in a maximum likelihood framework, using variational approximations and Monte Carlo methods. Estimated models were shown to generalize well to (‘out-of-domain’) current input regimes not used in the estimation process, but limitations are discussed as well.

19. Hastie T, Tibshirani R, Friedman J: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer; 2003.

20. Stephan KE, Penny WD, Daunizeau J, Moran RJ, Friston KJ: Bayesian model selection for group studies. Neuroimage 2009, 46:1004-1017.
A hierarchical Bayesian method for model selection in group studies, where models are treated as random effects that may differ between subjects, with model probabilities at the top of the hierarchy following a Dirichlet distribution with parameters estimated across the group.

21. Penny WD, Stephan KE, Daunizeau J, Rosa MJ, Friston KJ, Schofield TM, Leff AP: Comparing families of dynamic causal models. PLoS Comput Biol 2010, 6:e1000709.

22. Rigoux L, Stephan KE, Friston KJ, Daunizeau J: Bayesian model selection for group studies - revisited. Neuroimage 2014, 84:971-985.

23. Churchland A, Kiani R: Three challenges for connecting model to mechanism in decision making. Curr Opin Behav Sci 2016. (this issue).

24. Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A, Rubin DB: Bayesian Data Analysis. 3rd ed. Boca Raton, FL: CRC Press; 2013.

25. Dempster AP, Laird NM, Rubin DB: Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Series B Stat Methodol 1977:1-38.

26. Kalman RE: A new approach to linear filtering and prediction problems. J Basic Eng 1960, 82:35-45.



27. Rauch A, La Camera G, Lüscher HR, Senn W, Fusi S: Neocortical pyramidal cells respond as integrate-and-fire neurons to in vivo-like input currents. J Neurophysiol 2003, 90:1598-1612.

28. Paninski L, Ahmadian Y, Ferreira DG, Koyama S, Rahnama Rad K, Vidne M, Vogelstein J, Wu W: A new look at state-space models for neural data. J Comput Neurosci 2010, 29:107-126.

29. Macke JH, Buesing L, Sahani M: Estimating state and parameters in state space models of spike trains. In Advanced State Space Methods for Neural and Clinical Data. Edited by Chen Z. Cambridge University Press; 2015:137-159.

30. Huys QJM, Paninski L: Smoothing of, and parameter estimation from, noisy biophysical recordings. PLoS Comput Biol 2009, 5:e1000379.

31. Roweis S, Ghahramani Z: Learning nonlinear dynamical systems using the Expectation-Maximization algorithm. In Kalman Filtering and Neural Networks. John Wiley & Sons; 2002:175-220.

32. Auger-Méthé M, Field C, Albertsen CM, Derocher AE, Lewis MA, Jonsen ID, Mills Flemming J: State-space models' dirty little secrets: even simple linear Gaussian models can have estimation problems. Sci Rep 2016, 6:26677.

33. Buesing L, Macke JH, Sahani M: Learning stable, regularised latent models of neural population dynamics. Network 2012, 23:24-47.

34. Daunizeau J, Adam V, Rigoux L: VBA: a probabilistic treatment of nonlinear models for neurobiological and behavioural data. PLoS Comput Biol 2014, 10:e1003441.

35. Sutton RS, Barto AG: Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press; 1998.

36. Camerer C, Hua Ho T: Experience-weighted attraction learning in normal form games. Econometrica 1999, 67:827-874.

37. Camerer C: Behavioral Game Theory: Experiments in Strategic Interaction. Princeton, NJ: Princeton University Press; 2003.

38. Gershman SJ: A unifying probabilistic view of associative learning. PLoS Comput Biol 2015, 11:e1004567.

39. Ez-Zizi A, Farrell S, Leslie D: Bayesian reinforcement learning in Markovian and non-Markovian tasks. In 2015 IEEE Symposium Series on Computational Intelligence. 2015:579-586.

40. Geist M, Pietquin O: Kalman temporal differences. J Artif Intell Res 2010:483-532.

41. Behrens TE, Woolrich MW, Walton ME, Rushworth MF: Learning the value of information in an uncertain world. Nat Neurosci 2007, 10:1214-1221.

42. Khamassi M, Quilodran R, Enel P, Dominey PF, Procyk E: Behavioral regulation and the modulation of information coding in the lateral prefrontal and cingulate cortex. Cereb Cortex 2015, 25:3197-3218.

43. Huys QJ, Pizzagalli DA, Bogdan R, Dayan P: Mapping anhedonia onto reinforcement learning: a behavioural meta-analysis. Biol Mood Anxiety Disord 2013, 3:12.

44. Collins AG, Brown JK, Gold JM, Waltz JA, Frank MJ: Working memory contributions to reinforcement learning impairments in schizophrenia. J Neurosci 2014, 34:13747-13756.

45. Chen C, Takahashi T, Nakagawa S, Inoue T, Kusumi I: Reinforcement learning in depression: a review of computational research. Neurosci Biobehav Rev 2015, 55:247-267.

46. Gläscher J, Daw N, Dayan P, O'Doherty JP: States versus rewards: dissociable neural prediction error signals underlying model-based and model-free reinforcement learning. Neuron 2010, 66:585-595.

47. Deserno L, Huys QJ, Boehme R, Buchert R, Heinze HJ, Grace AA, Dolan RJ, Heinz A, Schlagenhauf F: Ventral striatal dopamine reflects behavioral and neural signatures of model-based control during sequential decision making. Proc Natl Acad Sci U S A 2015, 112:1595-1600.

48. Zhu L, Mathewson KE, Hsu M: Dissociable neural representations of reinforcement and belief prediction errors underlie strategic learning. Proc Natl Acad Sci U S A 2012, 109:1419-1424.

49. Huys QJ, Eshel N, O'Nions E, Sheridan L, Dayan P, Roiser JP: Bonsai trees in your head: how the Pavlovian system sculpts goal-directed choices by pruning decision trees. PLoS Comput Biol 2012, 8:e1002410.

50. Ratcliff R: A theory of memory retrieval. Psychol Rev 1978, 85:59-108.

51. Ratcliff R, Rouder JN: Modeling response times for two-choice decisions. Psychol Sci 1998, 9:347-356.

52. Forstmann BU, Ratcliff R, Wagenmakers EJ: Sequential sampling models in cognitive neuroscience: advantages, applications, and extensions. Annu Rev Psychol 2016, 67:641-666.

53. Ratcliff R, Smith PL, Brown SD, McKoon G: Diffusion decision model: current issues and history. Trends Cogn Sci 2016, 20:260-281.

54. Brunton BW, Botvinick MM, Brody CD: Rats and humans can optimally accumulate evidence for decision-making. Science 2013, 340:95-98.

55. Ratcliff R, Tuerlinckx F: Estimating parameters of the diffusion model: approaches to dealing with contaminant reaction times and parameter variability. Psychon Bull Rev 2002, 9:438-481.

56. Navarro DJ, Fuss IG: Fast and accurate calculations for first-passage times in Wiener diffusion models. J Math Psychol 2009, 53:222-230.

57. Voss A, Voss J: A fast numerical algorithm for the estimation of diffusion model parameters. J Math Psychol 2008, 52:1-9.

58. Vandekerckhove J, Tuerlinckx F: Diffusion model analysis with MATLAB: a DMAT primer. Behav Res Methods 2008, 40:61-72.

59. Yu BM, Afshar A, Santhanam G, Ryu S, Shenoy K, Sahani M: Extracting dynamical structure embedded in neural activity. In Advances in Neural Information Processing Systems 18. Edited by Weiss Y, Schölkopf B, Platt JC. Cambridge, MA: MIT Press; 2006:1545-1552.
First paper (to our knowledge) which formulated an estimation scheme based on the extended Kalman-filter-smoother and Expectation-Maximization algorithm for a nonlinear recurrent neural network from spike train observations, applied to reconstruct neural trajectories from in vivo observations.

60. Moran RJ, Symmonds M, Stephan KE, Friston KJ, Dolan RJ: An in vivo assay of synaptic function mediating human cognition. Curr Biol 2011, 21:1320-1325.

61. Moran R, Pinotsis DA, Friston K: Neural masses and fields in dynamic causal modeling. Front Comput Neurosci 2013, 7:57.

62.
Smith AC, Brown EN: Estimating a state-space model from  point process observations. Neural Comput 2003, 15:965-991. First paper (to our knowledge) which formulated an extended Kalmanfilter-smoother framework for estimating a state space model from spike train observations via a Poisson intensity function, where the latent state was to reconstruct (decode) an underlying stimulus 63. Kulkarni JE, Paninski L: Common-input models for multiple neural spike-train data. Network 2007, 18:375-407. 64. Dong Y, Mihalas S, Niebur E: Improved integral equation solution for the first passage time of leaky integrate-and-fire neurons. Neural Comput 2011, 23:421-434. 65. Koyama S, Paninski L: Efficient computation of the maximum a posteriori path and parameter estimation in integrate-and-fire and more general state-space models. J Comput Neurosci 2010, 29:89-105. 66. Paninski L, Pillow JW, Simoncelli EP: Maximum likelihood estimation of a stochastic integrate-and-fire neural encoding model. Neural Comput 2004, 16:2533-2561. 67. Pozzorini C, Mensi S, Hagens O, Naud R, Koch C, Gerstner W: Automated high-throughput characterization of single neurons by means of simplified spiking models. PLoS Comput Biol 2015, 11:e1004275. www.sciencedirect.com

Computational models as statistical tools Durstewitz, Koppe and Toutounji 99

68. Toth BA, Kostuk M, Meliza CD, Margoliash D, Abarbanel HD: Dynamical estimation of neuron and network properties I: variational methods. Biol Cybern 2011, 105:217-237. 69. Kostuk M, Toth BA, Meliza CD, Margoliash D, Abarbanel HD: Dynamical estimation of neuron and network properties II: path integral Monte Carlo methods. Biol Cybern 2012, 106:155-167. 70. Knowlton C, Meliza CD, Margoliash D, Abarbanel HD: Dynamical estimation of neuron and network properties III: network analysis using neuron spike times. Biol Cybern 2014, 108:261273.

73. Herta¨g L, Durstewitz D, Brunel N: Analytical approximations of the firing rate of an adaptive exponential integrate-and-fire neuron in the presence of synaptic noise. Front Comput Neurosci 2014:8. 74. Rigoux L, Daunizeau J: Dynamic causal modelling of  brain-behaviour relationships. Neuroimage 2015, 117: 202-221. A framework for the joint estimation of brain-behavior relationships in the context of Dynamic Causal Modeling (DCM), where behavior emerges as a consequence of stimulus-driven latent neural network dynamics.

71. Brunel N: Dynamics of sparsely connected networks of excitatory and inhibitory spiking neurons. J Comput Neurosci 2000, 8:183-208.

75. Izhikevich EM: Simple model of spiking neurons. IEEE Trans Neural Netw 2003, 14:1569-1572.

72. Moran RJ, Stephan KE, Dolan RJ, Friston KJ: Consistent spectral predictors for dynamic causal models of steady-state responses. Neuroimage 2011, 55:1694-1708.

76. Brette R, Gerstner W: Adaptive exponential integrate-and-fire model as an effective description of neuronal activity. J Neurophysiol 2005, 94:3637-3642.
