Spike Coding from the Perspective of a Neurone

G. S. Bhumbra, R. E. J. Dyball∗

Department of Anatomy, University of Cambridge, Downing Street, Cambridge, CB2 3DY.

Running title: Spike Coding

Keywords: spikes, coding, information theory

Please send comments to [email protected] or [email protected]. This review will appear as: Bhumbra, G.S., and Dyball R.E.J. (2005), ‘Spike Coding from the Perspective of a Neurone’ in Cognitive Processing Vol. 6 No. 2. For copyright reasons, this version of the paper is taken from the final draft rather than from the publisher’s PDF version. The original publication is available at www.springerlink.com.

∗ Corresponding author: [email protected]



Abstract

In this paper, we compare existing methods for quantifying the coding capacity of a spike train, and review recent developments in the application of information theory to neural coding. We present novel methods for characterising single unit activity based on the perspective of a downstream neurone and propose a simple yet universally applicable framework to characterise the order of complexity of neural coding by single units. We establish four orders of complexity in the capacity for neural coding. First-order coding, quantified by firing rates, is conveyed by frequencies and is thus entirely described by first moment processes. Second-order coding, represented by the variability of inter-spike intervals, is quantified by the log interval entropy. Third-order coding is the result of spike motifs that associate adjacent inter-spike intervals beyond chance levels; it is described by the joint interval histogram, and is measured by the mutual information between adjacent log intervals. Finally, non-stationarities in activity represent coding of the fourth-order that arise from the effects of a known or unknown stimulus.

Introduction

The all-or-none properties of action potentials (Hodgkin and Huxley, 1952) require that the coding of information by neurones is conveyed by the timing of occurrence of spikes (Tuckwell, 1988b). A number of different encryption strategies for spike coding have been proposed (Rieke et al., 1999; Rao et al., 2002). One of the first proposals followed the observation that the firing rate of stretch receptors increased in response to increased load (Adrian, 1926), and attributes changes in mean spike frequency to variations in stimulus intensity. In the simplest form, rate coding, the average firing rate encodes all information while variability about the mean frequency represents noise (Shadlen and Newsome, 1994). This means that the output of a cell would be a function of the firing rates of its inputs, regardless of any temporal organisation of afferent spikes. A rate coding neurone may thus be perceived as a noisy integrator. By contrast, temporal coding treats a neurone as a coincidence detector (Abeles, 1982). This means that the activity of afferent cells is tightly coordinated, as predicted by 'synfire chain' theory (Abeles, 1991), so that the precise times of spikes convey specific information. A key requirement for temporal coding is a high degree of variability in firing (Softky and Koch, 1993).

The model thus considers both the mean and variability of activity as independent aspects of the neural code (Softky, 1995). Since temporal coding incorporates rate coding in addition to the variability of firing, the two models are not mutually exclusive. Arguments supporting rate or temporal coding are mostly based on motor and sensory systems. Many studies have tested the synfire chain theory, which proposes that the firing of functionally related neurones will show repeated motifs above chance levels in terms of both space and time. The interpretation of such observations (Dayhoff and Gerstein, 1983; Abeles et al., 1993) is difficult (Abeles and Gerstein, 1988) because frequency modulation can itself produce apparent patterning. Studies that accounted for changes in firing rate have suggested that repeated patterns occur at chance levels in the lateral geniculate nucleus (Oram et al., 1999) and motor cortex (Baker and Lemon, 2000). Precise spike timing is important in temporal coding (Borst and Theunissen, 1999), particularly for sound localisation (Knudsen and Konishi, 1979), echolocation (Simmons, 1979), and adaptation to a sustained stimulus (DeCharms and Merzenich, 1995). Studies of the fly visual system show that in specific situations, two spikes in close temporal proximity carry more than twice the information conveyed by a single spike (Brenner, Strong, Koberle, Bialek and de Ruyter Van Steveninck, 2000). In the lateral geniculate nucleus, the activity of cells reveals significantly more information about a stimulus if simultaneously occurring spikes are treated separately from isolated spikes (Dan et al., 1998). The coordinated activity of functionally related neurones might thus signal some features of a stimulus in the absence of overall changes in firing rates (Salinas and Sejnowski, 2001). The occurrence of collective oscillations of single units is a widespread phenomenon (Usrey and Reid, 1999; Wehr and Laurent, 1999), however the functional significance of such oscillations remains unclear (Shadlen and Movshon, 1999; Gray, 1999). In the locust olfactory system, artificial desynchronisation of oscillatory activity strongly impairs the specificity of odour discrimination (Stopfer et al., 1997; MacLeod et al., 1998). Synchrony encompasses a spectrum of neural activity over broad scales of both time and space (Aertsen and Arndt, 1993). On a diurnal time scale, the suprachiasmatic nucleus of the hypothalamus shows a circadian rhythm of spike irregularity that is more obvious than the diurnal changes in firing rates (Bhumbra et al., 2005). The traditional view of coincidence detection is based on temporal summation with a short membrane time constant (Softky and Koch, 1993).



However, local encryption by the spatiotemporal patterning of spikes may be enhanced through spatial summation as a result of nonlinearities imposed by the structural arrangement of synapses (Mel, 1993). Synapses sufficiently distant from each other might act as independent nonlinear units to increase the computational capacity of a dendritic tree (Poirazi and Mel, 2001). There is no universally accepted measure of the extent to which a neurone or group of neurones can encode a specific feature of a stimulus. Similarly, no single parameter has been adopted universally to measure the coding capacity of a spike train, without considering any known stimuli. Developments in information theory (Shannon and Weaver, 1949) allowed the quantification of the coding capacity of spike trains with respect to spike counts (Mackay and McCulloch, 1952). More recently, inter-spike intervals have formed the basis for quantifying the coding capacity of a spike train. Initial methods adopted empirical ordinal techniques (Sherry and Klemm, 1984), but numerical methods have been developed based on logarithmic (Bhumbra and Dyball, 2004) intervals and, most recently, linear intervals (Reeke and Coop, 2004). An advantage of temporal coding over rate coding is that the amount of information conveyed by a given number of spikes is potentially greater (Ferster and Spruston, 1995). Since efficient temporal coding requires an extensive range of inter-spike intervals among converging inputs (Deco and Schurmann, 1998), interval variability may reflect the extent of reliance on encryption through functional coincidence detection. From the perspective of a downstream neurone, we have argued that, for a constant level of activity, the coding capacity of a single input, and its repertoire of inter-spike intervals, are equivalent (Bhumbra and Dyball, 2004). We model the interpretation of a spike train by a post-synaptic neurone from a perspective based on the sequence of inter-spike intervals. The variability of the inter-spike intervals provides a 'fingerprint' of the distributional coding capacity of the spike train (Perkel and Bullock, 1968) that may optimise the extent of data compression. Correlations between inter-spike intervals are able to quantify the extent of patterning (Sherry and Klemm, 1981) that may optimise the extent of data transmission. The relationship between the sequence of spikes and the stimulus that evokes it represents the amount of data that may be of use to the post-synaptic cell. Characterising the neural code is difficult because the nature of neuronal responses is non-linear. The variability of non-linear activity cannot be quantified comprehensively using a second order statistic such

as variance. Parameters based on information theory (Shannon and Weaver, 1949) are however not constrained by second order statistics. The parameter we discuss in this review, the entropy of a probability distribution, is similar to a variance in that it quantifies the variability of the distribution, but also reflects the irregularities that are represented by the skewness, kurtosis, and higher moments. Covariances are linear statistics that quantify the extent of association between variables based on the amount of variance that can be explained by their correlation. Such approaches are inadequate for characterising non-linear neuronal responses because of their reliance on the assumptions of linear statistics, for example that the distributions are Gaussian. Information theory avoids such reliance by quantifying the extent of association between variables based on the amount of entropy that can be explained by their dependence. This measure is called the mutual information, the equivalent of covariance in information theory. In this paper, we compare existing methods for quantifying the coding capacity of a spike train, and review developments in the application of information theory to neural coding. Techniques to investigate the spontaneous activity of single units and their responses to putative stimuli are reviewed. We present novel methods for characterising single unit activity based on the perspective of a downstream neurone. Finally, we propose a universally applicable framework to characterise the complexity of neural coding by single units.
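As a simple illustration of this distinction (not part of the original paper), a strongly dependent but non-linear relationship between two variables can show a near-zero correlation coefficient while still carrying substantial mutual information; the synthetic data, bin count, and sample size below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(9)
x = rng.normal(0.0, 1.0, 100000)
y = x ** 2 + 0.1 * rng.normal(0.0, 1.0, 100000)   # dependent, but not linearly

# A linear statistic sees almost nothing of the relationship...
print(np.corrcoef(x, y)[0, 1])                    # close to zero

# ...whereas the mutual information from a joint histogram is clearly positive.
joint, _, _ = np.histogram2d(x, y, bins=50)
p_xy = joint / joint.sum()
p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)
nz = p_xy > 0
mi_nats = (p_xy[nz] * np.log(p_xy[nz] / np.outer(p_x, p_y)[nz])).sum()
print(mi_nats)                                    # well above zero, unlike the correlation
```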

The Coding Capacity of a Spike Train

The term 'entropy' used in a statistical sense has often been confused with its thermodynamic counterpart. In his authoritative text, Jaynes (2003) acknowledges with some resignation that many papers are flawed by the failure of authors to distinguish between these terms. It is thus of little surprise that many neurophysiologists have been reluctant to adopt entropy measures. However the term 'entropy' is now so firmly embedded in statistical literature that it cannot be avoided or replaced.

The Binary String Method

Initial methods of applying entropies to neural coding treated spike trains as binary strings by quantising the time axis (Mackay and McCulloch, 1952). If a train of r spikes occurring in time ∆t is quantised into bins of width ∆τ, the length of the binary string n is ∆t/∆τ. By selecting a temporal resolution ∆τ for counting spikes that is less than the absolute refractory period, the maximum spike count of any time bin cannot exceed one. A binary string can thus be constructed (Figure 1).

Figure 1: A spike train and a binary string that represents it.

The coding capacity of the spike train illustrated in Figure 1 may be represented by the number of different ways that five spikes can be arranged among thirteen bins. A probability model can thus be constructed by considering the number of possible combinations nCr, where n denotes the length of the binary string of which r digits are ones.

    nCr = n! / (r!(n − r)!)    (1)

In the case of the binary string in Figure 1, there are 1287 possible combinations. For simplicity it is assumed that any one possible binary string is equally as probable as any other. If each combination was denoted by every element of a vector c defined over the alphabet set AC, the probability mass function P(c|I) would be uniformly distributed. We adopt the notation of Sivia (1996) by expressing each probability 'given any relevant background information' as '|I'. The coding capacity can be expressed as the discrete entropy S[c] of the probability distribution of the combinations.

    S[c] = − Σ_{c∈AC} P(c|I) log2 P(c|I)    (2)
         = − Σ_{i=1}^{nCr} (1/nCr) log2 (1/nCr)    (3)
         = − nCr (1/nCr) log2 (1/nCr)    (4)
    ∴ S[c] = log2 (nCr)    (5)

For the binary string in Figure 1, the result is log2 1287. In practice, binary strings are too long for realistic calculation of the exact number of combinations but a number of approximation methods and simplifying assumptions (Rieke et al., 1999) allow the entropy per spike, given the mean spike frequency M and bin width ∆τ, to be expressed with respect to these terms.

    S[c|M, ∆τ] ≈ log2 (e / (M∆τ))    (6)

assuming (from Equation 5) that n, r, and (n − r) are large, and r ≪ n.

Whether Equation 5 or 6 is used to calculate the entropy per spike, the result depends only on M and ∆τ. The logarithmic scale to base two in Equation 6 means that the halving of either value will increase the entropy per spike by one bit, thus the binary string method of quantifying the coding capacity depends on an arbitrary selection of bin width ∆τ. This has been referred to as 'the binning problem' (Reeke and Coop, 2004). Another limitation of the method is the assumption that any combination of ones and zeroes is equally probable so long as the numbers of spikes and bins remain constant. The relative refractory properties of the neurones make this unlikely as they reduce the probability of the occurrence of binary strings with two spikes in consecutive bins. While the entropy estimate would provide an upper bound to the coding capacity for a given firing rate and bin width, it ignores relevant background information I that might be of considerable biological importance.

The 'Direct Method'

The 'direct method' of entropy estimation accounts for spike timing correlations observed during a particular epoch of a stimulus cycle (de Ruyter van Steveninck et al., 1997; Strong et al., 1998). Motifs of spike occurrences are characterised as binary strings using a small bin width. However, the range of binary strings during a particular 'window' in the stimulus cycle forms the basis of the 'direct method' of entropy estimation. By analogy, the approach can be likened to a modified morse code that adopts a constant length n for the total number of symbols (i.e. dots and dashes) for each letter. The presence or absence of a spike in each bin can be regarded as a symbol, like a dot or dash respectively, so that the jth combination of symbols of a given length n uniquely defines a 'letter' lj of the vector l. Using the range of observed binary strings, it is possible to construct an 'alphabet' set AL(n, ∆τ) containing each possible 'letter' that could be encoded by the observed data.
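As a minimal numerical sketch of Equations 1 to 6 (not from the original paper), the exact coding capacity log2(nCr) and the per-spike approximation can be computed directly; the train lengths, rates and bin widths below are arbitrary examples.

```python
import math

def binary_string_capacity_bits(n_bins: int, n_spikes: int) -> float:
    """Discrete entropy S[c] = log2(nCr) of Equation 5, in bits."""
    return math.log2(math.comb(n_bins, n_spikes))

def capacity_per_spike_approx(mean_rate_hz: float, bin_width_s: float) -> float:
    """Per-spike approximation of Equation 6: log2(e / (M * dtau))."""
    return math.log2(math.e / (mean_rate_hz * bin_width_s))

# Figure 1 example: 5 spikes among 13 bins gives 1287 equiprobable strings.
print(math.comb(13, 5))                          # 1287
print(binary_string_capacity_bits(13, 5))        # log2(1287), about 10.3 bits

# For a sparser train the approximation approaches the exact per-spike value.
n_bins, n_spikes, bin_width = 1000, 20, 0.001    # 1 s of data, 1 ms bins
mean_rate = n_spikes / (n_bins * bin_width)      # 20 Hz
exact_per_spike = binary_string_capacity_bits(n_bins, n_spikes) / n_spikes
approx_per_spike = capacity_per_spike_approx(mean_rate, bin_width)
print(exact_per_spike, approx_per_spike)         # both roughly 7 bits per spike
```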


If for each possible peri-stimulus time τi of the vector τ, the observed frequencies of every letter were represented as a histogram, a joint probability mass function P(l, τ|I) for every 'letter' for each peri-stimulus time can be constructed by normalisation over the entire set of peri-stimulus times AT. The entropy per spike given a constant symbol length per 'letter' and bin width can thus be expressed with respect to the probability mass function P(l, τ|I) using the discrete entropy in two dimensions.

    S[l, τ|n, ∆τ] = − Σ_{τ∈AT} Σ_{l∈AL(n,∆τ)} P(l, τ|I) log2 P(l, τ|I)    (7)

An advantage of using S[l, τ|n, ∆τ] to provide an upper bound to the coding capacity of a spike is that it makes no prior assumptions regarding the distribution of symbols or about the nature of any stimulus. However, again a limitation of the method is the binning problem due to its dependence on bin width. Experimental considerations also limit application since a large number of trials is required to construct a precise frequency histogram for the observed letters. The impracticality of collecting sufficiently large data sets to characterise P(l, τ|I) accurately has been referred to as 'the sampling problem' (Reeke and Coop, 2004).
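A minimal sketch of the word-counting step of the 'direct method' is given below, assuming simulated Poisson spike trains; it computes only the word-entropy estimate at a single peri-stimulus window, whereas the full method also estimates a noise entropy across windows. All function names and parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def words_at_window(trials, t_start, n_bins, dt):
    """Binary 'letters' of length n_bins observed at one peri-stimulus window."""
    edges = t_start + dt * np.arange(n_bins + 1)
    out = []
    for spikes in trials:
        counts, _ = np.histogram(spikes, bins=edges)
        out.append(tuple((counts > 0).astype(int)))   # clip each bin to 0 or 1
    return out

def word_entropy_bits(words):
    """Discrete entropy of the observed word distribution, in bits."""
    _, counts = np.unique(np.asarray(words), axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

# Synthetic example: 200 'trials' of Poisson spiking at 40 Hz over 0.5 s.
trials = [np.sort(rng.uniform(0.0, 0.5, rng.poisson(20))) for _ in range(200)]
print(word_entropy_bits(words_at_window(trials, t_start=0.1, n_bins=8, dt=0.003)))
```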

The Asymptotically Unbiased Estimator Method

To avoid the selection of an arbitrary resolution for entropy estimation, a 'binless strategy' has been proposed (Victor, 2002) that relies on the 'asymptotically unbiased estimator' (Kozachenko and Leonenko, 1987); this avoids discretising the spike train but requires absolute continuity in the observed interval distribution. A limitation of the method (Reeke and Coop, 2004) is its sensitivity to the requirement for absolute continuity in the observed interval distribution, and consequently the temporal resolution for digitising the spike train. Even at a resolution of 10 µs, as both firing rate and discharge regularity increase, more intervals are classified as identical. Consequently, the application of the method results in a negative entropy estimate that could only be corrected by adding noise to increase the measurement resolution artificially. Limitations of the asymptotically unbiased estimator method might explain why it has not been commonly adopted by neurophysiologists. First, there is no universally accepted method to 'embed' the spike train to provide input data for the entropy estimate. Second, the requirement for absolute continuity in the observed interval distribution means that the method precludes resampling to obtain confidence intervals of the entropy estimate. Third, estimation of the entropy requires extensive mathematical manipulation of the spike data. The interpretations are thus so remote from any physiological mechanism that it is difficult to envisage any obvious biological applications.

The Interval Method

Interval techniques focus on intervals between spikes rather than on the spikes themselves. Theoretically (Mackay and McCulloch, 1952), interval modulation is more efficient than rate modulation. An inter-spike interval histogram (ISIH) (Gerstein and Mandelbrot, 1964) can be constructed to represent the distribution of intervals of width w in seconds. A probability density function of the intervals f(w|I) can be modelled by fitting it to the histogram. The continuous nature of f(w|I) is an attraction, as it avoids an arbitrary resolution of spike quantisation. Interval methods represent the perspective of the neurone that receives the input more convincingly than methods based on spike counts. Frequency approaches model the probability of observing N spikes in a time window ∆τ. The probability mass function thus depends on ∆τ. The binning problem is at least as much a limitation for large bins as it is for small bins since a post-synaptic cell does not necessarily 'count' spikes over the time duration specified by ∆τ. From the perspective of the post-synaptic cell, the most relevant information available is the occurrence time of the previous spike tk−1 conveyed by an excitatory or inhibitory input. Perhaps the best representation is thus the probability density function for the time of the kth spike given the time of its predecessor, f(tk|tk−1, I). For a spontaneously firing neurone in the absence of any known stimulus, the absolute value of tk has little meaning except in the context of the time of the preceding spike tk−1. In this situation, f(tk|tk−1, I) can be expressed with respect to the difference between the occurrence times of adjacent spikes f(tk − tk−1|I). Since the inter-spike interval w is exactly this difference, the perspective of a spontaneously firing neurone can be represented by the probability density function f(w|I). Using the hazard or survival functions of the ISIH, it is possible to model the post-spike membrane potential. The technique has been applied to a number of different cell types, including motor neurones (Matthews, 1996), cortical units (Wetmore and Baker, 2004), and hypothalamic neurosecretory cells (Leng et al., 2001). While the method is attractive, it has yet to be validated by parallel recordings using intracellular electrodes to record the true value of the mem-


brane potential. A limitation of the ISIH to model the profile of the post-spike membrane potential is that it is based not on interval permutations but only combinations which ignores the order in which they occurred. Methods based on ISIHs are thus limited because they neglect the non-random ordering of intervals that might result from the deflections in the membrane potential which such techniques are intended to model. The exponential probability density function describes the inter-event intervals of Poisson processes, such as the depolarising events occurring at motor end-plates (Fatt and Katz, 1952) but the refractory properties of the neural cell membrane limit its direct application to spikes (Gerstein and Mandelbrot, 1964). The relative refractory period is accommodated by a gamma probability density function (Stein, 1965). A gamma process can be simulated by a simple Poissionian integrate-and-fire model so that only every αth event elicits a spike (Tuckwell, 1988b), where α is the ‘gamma order’. As the gamma order α increases, the distribution becomes increasingly gaussian. While the gaussian probability density function has been used to model the ISIH (Rodieck et al., 1962), its application is limited as matching of fits to the observed data is seldom successful (Berry et al., 1997). Using a Wiener process to describe an integrate-and-fire model based on modelling synaptic events as a diffusion process with gaussian noise, the theoretical interval distribution is an inverse gaussian (Gerstein and Mandelbrot, 1964). There is no universally accepted distribution to model ISIHs. Except for the normal distribution, the examples described above are asymmetric distributions due to their positive skewness, reflecting the tendency of slower firing cells to show greater variability in the interval distribution (Tuckwell, 1988b). For an exponential distribution, the mean and standard deviation are always equal, reflecting a scaling effect of the first moment (location parameter, or mean) on the second moment (scaling parameter, or standard deviation). This is not surprising as a Poisson process is a ‘first-moment’ stochastic (random) process and can thus be described entirely by its mean. The ISIH can be characterised using moments. Whatever the distribution, the mean interval is the reciprocal of the firing rate. Previous studies (Softky and Koch, 1993; Saeb-Parsy and Dyball, 2003) have quantified irregularity of firing using the coefficient of variation (Cv = standard deviation / mean) rather than simply the variance. The use of Cv allows more valid comparisons to be made between spike activities at different firing rates. For an exponential distribution, Cv is always exactly one because it scales the

standard deviation to the mean. However the first two moments cannot on their own describe an inter-spike interval distribution completely. No single probability density function can be fitted successfully to the interval distribution for every cell type. In practice, it is often necessary to accommodate for more than one mode to describe the profile of the interval distribution adequately (Tuckwell, 1988b). Since the coding capacity of a spike train may be reflected in both the profile and the dispersion of the ISIH, an information approach can be used to overcome the limitations of traditional ‘second moment’ statistics. Estimating the coding capacity of the spike train using the interval method is based on quantifying the variability of intervals regardless of the dependence of the cell activity on any particular stimulus (Reeke and Coop, 2004). The technique assumes that a spike train consists of symbols represented by the intervals w between adjacent spikes. It also assumes that sufficient intervals have been sampled to describe the interval distribution adequately using a continuous probability density function. Another assumption is that the intervals sampled are from a single probability distribution. In practice, the last two assumptions may conflict. While an increased number of intervals improves the accuracy of the model distribution, the spike train needs to be sufficiently short for it to be valid to assume that the activity is stationary. A further assumption made by a recently developed technique of estimating the coding capacity of a spike train (Reeke and Coop, 2004) is that the interval distribution can be modelled by a suitable probability density function fw (w|I). This method relies on obtaining parameters that describe fw (w|I) by maximum likelihood estimation. Once the parameters have been estimated, the continuous form (Karbowiak, 1969) of the ‘interval entropy’ (Reeke and Coop, 2004), or differential entropy s[w], can be calculated in ‘nats’ by using a logarithm of e instead of 2. s[w] = −

∫_{SW} fw(w|I) loge fw(w|I) dw    (8)

where SW is the support set for fw(w|I). Initially the interval method was applied using gamma and gaussian probability density functions. The gamma distribution is characteristic of simple models of spike firing (Usher et al., 1994), and close fits to recorded spike data have been reported (Stein, 1965; Tiesinga et al., 2000). Its probability density function G(α, β) can be expressed with respect to w,


using α and β as shaping and scaling parameters respectively.

    G(α, β) = (1 / (β^α Γ(α))) w^(α−1) e^(−w/β)    (9)

By substitution of the right hand expression for fw(w|I) into Equation 8, the interval entropy can be expressed as an integral. The expansion of the integral given by Reeke and Coop (2004) differs from that of Cover and Thomas (1991), but that provided by Reeke and Coop (2004) is correct (Reeke, 2005).

    s[w] = loge[βΓ(α)] + (1 − α)Ψ(α) + α    (10)

Our own version of the proof is provided in the Appendix. Where the gamma order α is 1, G(α, β) becomes an exponential probability density function M(m) = (1/m) e^(−w/m), where the mean m = β. The entropy estimate is thus given by loge(me) (Cover and Thomas, 1991). In this situation, increases in overall event rate tend to reduce the mean interval and thus decrease the entropy estimate. The interval method (Reeke and Coop, 2004) has also been applied by modelling f(w|I) using the normal probability density function N(m, ς²), where m is the mean and ς² is the variance.

    N(m, ς²) = (1 / √(2πς²)) e^(−(w−m)²/(2ς²))    (11)

By substituting the normal probability density function for fw(w|I) in Equation 8, the differential entropy s[w] can be expressed in 'nats' (Cover and Thomas, 1991).

    s[w] = − ∫_{SW} N(m, ς²) loge N(m, ς²) dw    (12)
    ∴ s[w] = (1/2) loge(2πeς²)    (13)

However the analytical solution is not applicable to real ISIHs. Normal distributions span the range [−∞, ∞] and inter-spike intervals cannot be negative. It is thus necessary to truncate the probability density function at w = 0 and renormalise the remaining distribution (Reeke and Coop, 2004). In practice, the interval distributions of many neurones cannot be described by a single distribution. Bimodal (Bhumbra and Dyball, 2004) and multimodal (Reeke and Coop, 2004) probability density functions have been proposed, but the expanded integral for the differential entropy cannot always be expressed in a closed form. However, an empirical result can be evaluated by plotting the model multimodal probability function and calculating the area. By discretising the intervals axis with time bins wi of a small width δw, fw(w|I) can be modelled by its probability mass function P(w|I). Using a logarithmic base of e, the discrete entropy S[w] can be expressed in 'nats' by summation.

    S[w] = − Σ_{w∈AW} P(w|I) loge P(w|I)    (14)

The differential entropy s[w] can then be expressed in 'nats'.

    s[w] = S[w] + lim_{δw→0} loge δw    (15)

Algebraic manipulation allows a similar estimation of the discrete entropy at different temporal resolutions for δw; the accuracy depends on small values for the original choice of δw. Reeke and Coop (2004) used a bin width of 0.5 ms, but such choices are influenced by cell type and the memory capacity of the computers used for analysis. Such small bin widths may be appropriate for the fast firing cells of sensory systems (Reinagel and Reid, 2000) but are impractical for slower firing neurones, such as hypothalamic cells with an ISIH that may span over many seconds or even minutes (Dyball and Leng, 1986). The method of maximum likelihood estimation avoids discretisation of data when estimating the differential entropy. Even if discretising the probability mass function is necessary to estimate the entropy of a multimodal function, the maximum likelihood estimates are independent of δw (Reeke and Coop, 2004). However, the entropy estimate depends on the choice of the probability density function. While goodness-of-fit statistics and confidence intervals can compare the one density function with another, the entropy estimate remains a function of a fitted model rather than of the original spike data. A potential problem of the interval method is its dependence on the units used to measure time. For example, the differential entropy of a normal distribution is (1/2) loge(2πeς²) and it is thus a function of the interval variance ς². A change in units from seconds to milliseconds would thus result in an arbitrary increase in entropy. The differential entropy of a gamma process is similarly a function of the scaling parameter β since it is given by loge[βΓ(α)] + (1 − α)Ψ(α) + α. If the choice of units influences the entropy estimate, the absolute value is no longer dimensionless and is


thus ambiguous. However, for a constant quantisation resolution, the difference between entropies tends to be constant, provided the resolution is sufficient to model the probability functions accurately (Bhumbra and Dyball, 2004). An advantage of the interval method is that it represents the coding capacity of a spike train from the point of view of a neurone receiving that input. For example, a single unit that fires as regularly as a metronome cannot convey any information other than its mean frequency. From the perspective of a downstream neurone, any spike from such an input would be of little ‘surprise’ and contribute no more information than the maintainance of regular firing rate. Since entropies measure ‘uncertainty’, such interpretations of afferent information are neglected by entropy estimates based on spike counts, such as the binary string method (Mackay and McCulloch, 1952). The ‘direct method’ of estimation (de Ruyter van Steveninck et al., 1997; Strong et al., 1998) can detect local motifs related to stimuli, but cannot apply to many cell types, particularly those that fire slowly, because of the combination of the binning and sampling problems (Reeke and Coop, 2004). The interval method can be used for a wide range of levels of activity so long as there are sufficient intervals to model the ISIH adequately. A disadvantage of the interval method is that the entropy estimate does not scale naturally to the spike data. For a constant frequency, entropies based on linear time scales are maximised by a Poisson process, in which the ISIH is described by an exponential probability density function M(m). The differential entropy of M(m) is loge (me), where m is the mean interval. Since the mean interval is the reciprocal of the firing rate, the interval method results in an entropy estimate that depends on frequency. Ideally, an entropy estimate would directly reflect the physiological mechanisms of a neurone that govern the coding capacity of its spike trains. We have thus proposed the adoption of the entropy of the log interval distribution because it is constant for a given stochastic process and is independent of its frequency (Bhumbra and Dyball, 2004).
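The rate dependence of the linear interval entropy, and the rate independence of the log interval entropy, can be checked numerically; the sketch below (not from the original paper) uses simulated Poisson-process intervals and the discrete estimator of Equations 14 and 15, with arbitrary bin widths and sample sizes.

```python
import numpy as np

rng = np.random.default_rng(8)

def entropy_nats(samples, dv=0.001):
    """Differential entropy via a binned histogram (Equations 14 and 15)."""
    edges = np.arange(samples.min(), samples.max() + dv, dv)
    counts, _ = np.histogram(samples, bins=edges)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum() + np.log(dv))

for rate_hz in (1.0, 5.0, 25.0):
    w = rng.exponential(1.0 / rate_hz, size=100000)   # Poisson-process intervals
    x = np.log(w)
    # The linear interval entropy log_e(m e) falls as the rate rises...
    print(rate_hz, entropy_nats(w), np.log(np.e / rate_hz))
    # ...whereas the log interval entropy stays near 1.5772 nats at any rate.
    print(rate_hz, entropy_nats(x, dv=0.02))
```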

The Log Interval Method

A lognormal probability density function has been used (Burns and Webb, 1976) to model interval distributions. In practice it is simplest to use the logarithm xk of the intervals wk, where xk = loge wk, and obtain maximum likelihood estimates of the mean and variance to construct a gaussian model N(µ, σ²) with time on a logarithmic scale (Tuckwell, 1988b). Recent work on the fly visual system has shown the importance of short inter-spike intervals by adopting a log10 time scale to describe changes in the ISIH in the context of a rapidly changing stimulus (Fairhall et al., 2001). The choice of base is not important, but a base of e was chosen here because it is more convenient for the application of information theory. The lognormal probability density function Ne(µ, σ²) is similar to its gaussian counterpart.

    Ne(µ, σ²) = (1 / (w√(2πσ²))) e^(−(loge w − µ)²/(2σ²))    (16)

The mean m and standard deviation ς of the linear intervals wk do not correspond to the antilog of the mean µ and standard deviation σ of the log intervals xk. The expectation E(w) and expected variance V(w) of the lognormal probability density function fw(w|I) are each affected by both the mean and variance of the log intervals (Aitchison and Brown, 1963).

    E(w) = e^(µ+σ²/2)    (17)
    V(w) = e^(2µ+σ²) (e^(σ²) − 1)    (18)

The coefficient of variation Cv scales the dispersion of the ISIH to the firing rate of a cell to quantify some aspect of coding capacity (Softky and Koch, 1993).

    Cv = √V(w) / E(w)    (19)

In Equations 17 and 18, E(w) and V(w) have been expressed with respect to the mean and variance of the corresponding logarithmic intervals.

    Cv = √(e^(2µ+σ²)(e^(σ²) − 1)) / e^(µ+σ²/2)    (20)
       = e^(µ+σ²/2) √(e^(σ²) − 1) / e^(µ+σ²/2)    (21)
       = √(e^(σ²) − 1)    (22)

The value of Cv is uniquely defined only by the standard deviation of the log intervals, thus it is simply an indirect measure of the variance of the log ISIH. A major limitation of Cv as a measure of irregularity of firing is that it strongly weights long intervals at the expense of the shorter intervals that are of more physiological interest (Bhumbra and Dyball, 2004). A unimodal lognormal model is limited because of the tendency of ISIHs to display more than one mode in real spike data.
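Equation 22 can be verified against simulated data; the sketch below is illustrative only, with arbitrary values for the mean and standard deviation of the log intervals.

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma = np.log(0.2), 0.6              # mean and s.d. of the log intervals
w = rng.lognormal(mu, sigma, size=50000)  # simulated inter-spike intervals

print(w.mean(), np.exp(mu + sigma**2 / 2))    # Equation 17: expectation of w
cv_sample = w.std() / w.mean()                # Equation 19 from the data
cv_model = np.sqrt(np.exp(sigma**2) - 1)      # Equation 22: depends on sigma only
print(cv_sample, cv_model)                    # both close to 0.66
```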


Figure 2: The activity of a peri-nuclear zone cell dorsolateral to the supraoptic nucleus recorded from a rat in vivo. Trace a) shows excerpts of recording showing the doublet spikes and panel b) shows a ratemeter record with a bin width of 1 s. Panel c) shows an ISIH with a bin width of 1 ms truncated on both axes to show the short intervals and an untruncated inset to illustrate the 'L-shape' distribution. In panel d), the log ISIH is plotted using a bin width of 0.02 loge(ms) showing a bimodal gaussian distribution. Panel e) is a scatter of each interval plotted against its predecessor on a log scale and panel f) represents the JISIH. In this example, no two short intervals occur adjacent to one another.

Consequently, an advantage of entropies over variance measures is their dependence on the profile of an interval distribution beyond second order statistics (Bhumbra and Dyball, 2004). Cells that give occasional bursts of spikes show an ISIH that is often described as 'L-shaped' (Tuckwell, 1988b), with most of the short intervals clustered in the first few bins (Figure 2c). Logarithmic transformation of the intervals for this type of cell may be useful to characterise the left tail of the histogram which represents the short intervals that may be of particular interest. An 'L-shaped' ISIH of the cell activity seen in the peri-nuclear zone of the supraoptic nucleus can be transformed (Bhumbra and Dyball, 2004) to a bimodal gaussian distribution on a log scale (Figure 2d). The distributions can be so separate that they may represent two distinct subpopulations of intervals to enrich the repertoire of coded messages that result from separate and independent physiological mechanisms. For example, the first mode that represents short intervals may reflect deterministic effects of membrane channels on the post-spike membrane potential whereas the second mode may reflect the stochastic nature of the inputs (Bhumbra et al., 2004). Logarithmic transformation increases the relative contribution of short intervals to the histogram. An emphasis on short intervals may be useful to characterise codes that depend on a small number of spikes, such as that of the auditory cortex of the bat (Dear et al., 1993) for echolocation. A single-echo pair is sufficient to influence the behaviour of a bat over a timescale of only a few milliseconds (Simmons, 1989). So there must be aspects of coding represented in a time window that could accommodate at most one spike. Behavioural responses to visual stimuli on a 25-50 ms timescale that allows only the passage of 1-5 spikes have been reported in the visual systems of flies (Land and Collett, 1974), cats (Reid et al., 1991), and monkeys (Knierim and van Essen, 1992). Brief tactile stimulation to individual whiskers of the rat elicits an average response of a single spike in the most sensitive cells of the somatosensory cortex (Fee and Kleinfeld, 1994). In the rat hippocampus, cells selective for spatial location show a maximum firing rate of ≈ 30 Hz (Wilson and McNaughton, 1993; O'Keefe and Recce, 1993). It has thus been argued (Rieke et al., 1999) that if a rat is aware of its position within an accuracy of 1 cm and can travel up to 20 cm per second, hippocampal coding of location must depend on 1-2 spikes per cell. Temporal hyperacuity in the echolocating bat can discriminate echo delay differences of 10-50 ns (Simmons et al., 1990), and the electric fish can respond to 100 ns phase shifts in an oscillating field (Rose and Heiligenberg, 1985). In such cases, behavioural responses are likely to depend on the integration of sensory information dispersed among many cells, and may thus require synergetic approaches of assessing neural activity (Haken, 1996). However, the existence of large neural ensembles does not preclude a neural code that depends on a few spikes occurring on a short time scale. Invertebrates have no large neural ensembles to integrate inputs, but the latency of their behavioural responses can be within only a few milliseconds (Rieke et al., 1999). Under certain conditions, neurones adopt a sparse temporal coding strategy. Averaging is not effective for coding over a period that permits only a single spike. Adaptation to a prolonged stimulus (Adrian, 1926) constitutes a physiological mechanism for a neural response to scale to a baseline stimulus


intensity. Such modulation optimises the rapid detection of changes in stimuli by rescaling the neural responses to maximise the transmission of useful information (Brenner, Bialek and de Ruyter van Steveninck, 2000). Changes in neural activity, that adapts to multiple timescales, have thus been described by shifts in the mode of the ISIH on a logarithmic timescale (Fairhall et al., 2001). In support of previous reports (Brenner, Bialek and de Ruyter van Steveninck, 2000), the observations indicate that the log interval distribution of a neurone represents its ‘fingerprint’ (Perkel and Bullock, 1968) of distributional coding. The perspective of a post-synaptic cell to a single input on a logarithmic time scale or with respect to time constants. The two interpretations however are not mutually exclusive. Synaptic activity and its effects on firing can be modelled using a Lapicque model (Lapicque, 1907), which has more recently (Tuckwell, 1988a) been called the ‘leaky integrator’ or the ‘forgetful integrate-and-fire model’. Realistic integrate-andfire models incorporate the decay of the membrane potential between synaptic events (Stein, 1965). Logarithmic transformation of an exponential decay results in a function with a profile that is independent of the time constant although the position is offset by its logarithm. Thus the membrane time constant moves the position of the post-synaptic potential on a log scale, but does not alter its width or shape. The profile of the log ISIH thus represents the interpretation of the neural code independent of any time constants. By contrast, the first moment of the log ISIH fixes the spike train in ‘real time’. The mean of the log intervals thus represents an aspect of the neural code that may be affected by the properties of the post-synaptic membrane. For a single projection, different post-synaptic targets are likely to have different membrane time constants. An advantage of quantifying coding capacity in a way that is independent of membrane time constants is that it does not require assumptions that confine its applicability to a few cell types. We have thus proposed that the log interval entropy quantifies coding capacity in a way that is universally applicable (Bhumbra and Dyball, 2004). The log interval method is similar to the interval method except that the entropy s[x] is based on the probability distribution of the logarithmic intervals fx (x|I). s[x] = −

∫_{SX} fx(x|I) loge fx(x|I) dx    (23)

where SX is the support set for fx(x|I), defined over the interval [−∞, ∞].

It is possible to inter-relate the two entropies for the linear and logarithmic intervals.

    s[x] = s[w] − E(x)    (24)

Thus s[x] 'scales' s[w] to E(x). In practice, it is simplest to consider the observed log intervals xk directly. For example, the maximum likelihood estimates for the mean and standard deviation of the log intervals can be used to construct a gaussian model N(µ, σ²).

    N(µ, σ²) = (1 / √(2πσ²)) e^(−(x−µ)²/(2σ²))    (25)

In contrast to the gaussian model for linear intervals, it is valid to use Equation 13 as the support set SX is defined over the range [−∞, ∞].

    s[x] = − ∫_{SX} N(µ, σ²) loge N(µ, σ²) dx    (26)
         = (1/2) loge(2πeσ²)    (27)

Equation 27 emphasises one key advantage of the log interval method. Since the variance σ² of logarithmic intervals is dimensionless, s[x] is independent of the units used to measure time. A similar adaptive rescaling property of the log interval entropy can also be illustrated for a gamma model G(α, β). The full derivation of the log interval entropy using a gamma model is provided in the Appendix.

    s[x] = loge Γ(α) − αΨ(α) + α    (28)

Equation 28 expresses the differential entropy of the log intervals of a gamma process and is independent of the scaling parameter β. Thus s[x] is exclusively a function of the shaping parameter α, which defines the gamma order. As the gamma order α increases, s[x] decreases. For a Poisson process, the gamma order α is 1. Since Γ(1) is 1 and Ψ(1) is the negative of Euler's constant γ (≈ 0.5772), s[x] is ≈ 1.5772 nats. Thus the entropy per event is independent of the rate.
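The closed forms in Equations 27 and 28, and the relation of Equation 24, can be checked numerically; in the sketch below (not from the original paper) the gamma order, scale and sample size are arbitrary.

```python
import numpy as np
from scipy import special

def log_interval_entropy_gamma(alpha):
    """Equation 28: s[x] for a gamma process, independent of the scale beta."""
    return special.gammaln(alpha) - alpha * special.digamma(alpha) + alpha

def log_interval_entropy_gaussian(sigma):
    """Equation 27: s[x] for a gaussian model of the log intervals."""
    return 0.5 * np.log(2 * np.pi * np.e * sigma**2)

# A Poisson process (gamma order 1) gives about 1.5772 nats per spike at any rate.
print(log_interval_entropy_gamma(1.0))

# Empirical check of Equation 24, s[x] = s[w] - E(x), for simulated intervals.
rng = np.random.default_rng(3)
alpha, beta = 3.0, 0.02
w = rng.gamma(alpha, beta, size=50000)
x = np.log(w)
s_w = np.log(beta) + special.gammaln(alpha) \
      + (1 - alpha) * special.digamma(alpha) + alpha        # Equation 10
print(log_interval_entropy_gamma(alpha), s_w - x.mean())    # approximately equal
```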


An entropy estimate that approaches that of a Poisson process, regardless of the spike frequency, suggests that the physiological mechanisms underlying the neuronal firing are likely to be dictated predominantly by the stochastic nature of synaptic events. This is consistent with a reduced log interval entropy in vitro compared to that seen in vivo as the reduction is likely to result from the severing of inputs during the brain slice preparation (Bhumbra and Dyball, 2004; Bhumbra et al., 2005). Equation 28 gives an exact value of the differential entropy of the gamma model. The gamma and psi functions impose non-linearities between the entropy and the gamma order, but for a large gamma order α, both functions can be approximated using linear expressions. The derivation of the approximation for s[x] is provided in the Appendix as the result is of particular biological importance.

    s[x] ≈ (1/2) loge(2πe/α)    (29)

The result is analogous to the differential entropy of a gaussian model expressed in Equation 27, and shows that a large gamma order α is described by a gaussian model N(µ, σ²) as represented in Equation 25, where σ² = 1/α. In practice, the negative relationship between the entropy and the log of the gamma order α expressed in Equation 29 only slightly underestimates the entropy for small values of α, as shown in Figure 3. The term 'approximate entropy' should not be confused with the unrelated 'ApEn' of Pincus (1991).

Figure 3: Relationship between the gamma order and differential entropy using the exact (Equation 28) and approximate (Equation 29) values.

By changing the base of the logarithm to two in Equation 29, s[x] can be expressed in 'bits'.

    s[x] ≈ (1/2) log2(2πe) − (1/2) log2 α    (30)

The term (1/2) log2 α in Equation 30 represents the loss of information due to an increasing gamma order. A simple integrate-and-fire model, based on step increases in the post-synaptic potential from Poissonian synaptic events, would predict a gamma interval probability density G(α, β) (Tuckwell, 1988b). The gamma order α represents the number of excitatory synaptic events required to elicit an action potential (Baker and Gerstein, 2000). Since α denotes the 'decimation' order of a Poisson process (Baker and Lemon, 2000), the difference between the entropy of a Poisson process and s[x] reflects the amount of information lost by the relative refractory properties of the cell membrane. The information lost by 'decimation' calculated using the interval entropy (Reeke and Coop, 2004) and log interval entropy (Bhumbra and Dyball, 2004) would equate as the β terms cancel. However, an advantage of the log interval entropy is that it does not depend on a constant value for β. For a given firing rate, the linear interval entropy is maximised by a Poisson process as the exponential probability density function represents the maximum entropy given the constraints of a fixed first moment and the boundary condition [0, ∞] (Cover and Thomas, 1991). However, this is not true for the log interval entropy, as the effects on E(w) of increasing σ² can be countered by decreases in µ (see Equation 17) so that there is no unique maximum entropy distribution. The independence of the log interval entropy and firing rate is as might be expected. If intervals of different lengths are regarded as distinct coding symbols, it makes little sense to quantify the coding capacity per symbol by the number of symbols observed in a second. By analogy, it would seem absurd to quantify the coding capacity of each word in English by the speed the words are spoken or read. To quantify the information rate, the log interval entropy per spike can be multiplied by the firing rate to give a product in units of bits per second or 'baud' (Bhumbra et al., 2004). The gaussian approximation of the gamma probability density function on a logarithmic time-scale can simplify the characterisation of the ISIH. For example, gaussian functions of more than one mode can be used to describe the logarithmic ISIH (Bhumbra and Dyball, 2004). The multimodality of many interval distributions highlights the limitations of simple integrate and fire models that would predict only one mode. While maximum likelihood estimates can model distributions with more than one mode (Reeke and Coop, 2004; Bhumbra and Dyball, 2004), our original implementation of the log interval entropy made no attempt to fit the ISIH. The difficulty with distributions with more than one mode is that the differential entropy integral cannot always be expressed in a


closed form (Reeke and Coop, 2004). Instead of using a model probability mass function, we convolved the data with a gaussian kernel to construct a smoothed probability mass function; the use of a gaussian kernel was justified by the tendency of log ISIHs to consist of a combination of gaussian distributions (Bhumbra and Dyball, 2004). In the Appendix, we have approximated the gamma model on a logarithmic scale to a gaussian model, further validating the use of a gaussian kernel. The use of the gaussian kernel is also useful to construct the joint inter-spike interval histogram (JISIH) to quantify the shared information between adjacent intervals (Bhumbra and Dyball, 2004). While the JISIH is likely to reflect spike motifs arising from the profile of the post-spike membrane potential and the influences of any non-stationarities (Tuckwell, 1988b), it has received little attention from neurophysiologists.
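A minimal sketch of this kernel-based estimate is given below; the kernel width, bin size, and the simulated bimodal intervals are illustrative assumptions rather than the values used in the original analyses.

```python
import numpy as np

def log_isih_entropy(intervals_s, kernel_sd=0.05, dx=0.01):
    """Differential entropy (nats) of the log ISIH, after convolving the
    histogram of log intervals with a gaussian kernel (widths in log_e units)."""
    x = np.log(intervals_s)
    edges = np.arange(x.min() - 4 * kernel_sd, x.max() + 4 * kernel_sd, dx)
    counts, _ = np.histogram(x, bins=edges)
    k = np.arange(-4 * kernel_sd, 4 * kernel_sd + dx, dx)
    kernel = np.exp(-0.5 * (k / kernel_sd) ** 2)
    smoothed = np.convolve(counts, kernel, mode="same")
    p = smoothed / smoothed.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum() + np.log(dx))   # Equation 15 on a log axis

rng = np.random.default_rng(4)
# A bimodal log ISIH: a 'doublet' mode near 5 ms and a slower mode near 200 ms.
w = np.concatenate([rng.lognormal(np.log(0.005), 0.2, 3000),
                    rng.lognormal(np.log(0.2), 0.5, 7000)])
print(log_isih_entropy(w))
```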

Spike Motifs

An assumption that underlies the use of the ISIH to quantify coding capacity is that successive intervals are independent of one another (Reeke and Coop, 2004). While the assumption may not hold for real spike trains, it does not invalidate the estimate of the upper bound of the entropy measure. Serial correlation of intervals reflects motif patterning (Perkel et al., 1967) that might reduce the coding capacity of a spike train (Sherry and Klemm, 1981). However, a negative correlation between adjacent intervals might increase the capacity for coding of a time-dependent stimulus (Chacron et al., 2001). There have even been suggestions of statistical associations between intervals separated by up to five intervening spikes (Klemm and Sherry, 1981). While patterning reduces data compression, maximal redundancy is the optimal strategy for data transmission since noise is less likely to corrupt any code that contains repeated instances of the same information (Cover and Thomas, 1991). In English, for example, 'q' is almost invariably followed by 'u' and such redundancy constitutes a degree of robustness to noise. For any given word, corruption of a letter that follows 'q' is unlikely to deny an English speaker any aspect of its meaning. Autocorrelation coefficients have been used in the thalamus to test for association between adjacent intervals in lemniscal (Nakahama et al., 1966) and somatic (Poggio and Viernstein, 1964) cells. Linear statistics can test for the independence of adjacent intervals (Rodieck et al., 1962; Yang and Chen, 1978), but their application is limited by the non-gaussian distribution of intervals (Bhumbra and Dyball, 2004). A logarithmic transformation of the interval scattergram improves the validity of linear methods for hypothalamic cells (Sabatier et al., 2004), but the valid use of correlation coefficients remains restricted to those cells with no more than one mode in their interval distribution (Bhumbra and Dyball, 2004). Linear coefficients quantify the correlation between random variables; by contrast, an information theoretical approach measures their dependence. The 'shared' information between two intervals either side of a spike can be quantified by the reduction in entropy of one interval by knowing the other. If x and y represent the preceding and succeeding log intervals respectively, their 'mutual information' I[x; y] is given by this reduction (Cover and Thomas, 1991).

    I[x; y] = s[y] − s[y|x]    (31)

where s[y] is the log interval entropy for y and s[y|x] is the conditional entropy for y given x.

The mutual information is not affected by interchanging x for y. While s[y] is given by the log interval entropy, s[y|x] cannot be obtained from the ISIH. The mutual information I[x; y] between the x and y can also be expressed with respect to the joint entropy s(x, y). I[x; y] = s[x] + s[y] − s[x, y]

(32)

The marginal entropies s[x] and s[y] are calculated by applying Equation 23 to the 'marginal' probability densities f(x|I) and f(y|I) that can be obtained from the joint probability density f(x, y|I) by marginalisation of the opposite variable by integration.

    f(x|I) = ∫_{SY} f(x, y|I) dy,    f(y|I) = ∫_{SX} f(x, y|I) dx    (33)

s[x, y] can be obtained from f (x, y|I) by applying Equation 23 in two dimensions.

    s[x, y] = − ∫_{SY} ∫_{SX} f(x, y|I) loge f(x, y|I) dx dy    (34)
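Equations 32 to 34 can be estimated empirically by normalising a joint histogram of adjacent log intervals, anticipating the probabilistic approach discussed below; the sketch also checks the symmetry of the JISIH about the diagonal, a property returned to later. The bin counts and simulated data are arbitrary, and the histogram estimate carries the positive bias noted below.

```python
import numpy as np

def jisih_mutual_information(intervals_s, bins=40):
    """Mutual information (nats) between adjacent log intervals (Equation 32),
    estimated from a normalised joint histogram (JISIH)."""
    x, y = np.log(intervals_s[:-1]), np.log(intervals_s[1:])
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()
    p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)
    nz = p_xy > 0
    return float((p_xy[nz] * np.log(p_xy[nz] / np.outer(p_x, p_y)[nz])).sum())

def jisih_asymmetry(intervals_s, bins=40):
    """Departure of the JISIH from symmetry about the diagonal a = b."""
    x, y = np.log(intervals_s[:-1]), np.log(intervals_s[1:])
    lo, hi = min(x.min(), y.min()), max(x.max(), y.max())
    joint, _, _ = np.histogram2d(x, y, bins=bins, range=[[lo, hi], [lo, hi]])
    p = joint / joint.sum()
    return float(np.abs(p - p.T).sum() / 2)

rng = np.random.default_rng(5)
w = rng.gamma(3.0, 0.02, size=20000)          # independent simulated intervals
print(jisih_mutual_information(w), jisih_asymmetry(w))   # both small
```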

No method has been proposed to characterise f(x, y|I) using analytical methods. The use of linear techniques, such as principal component analysis (Hastie et al., 2001), is limited due to the non-gaussian distribution of the intervals. Probabilistic



approaches can be employed empirically by normalising the JISIH on a linear (Rodieck et al., 1962) or logarithmic (Bhumbra and Dyball, 2004) timescale, to construct a probability mass function P (x, y|I). For a finite data set, empirical methods over-estimate the mutual information (Bhumbra and Dyball, 2004) and do not provide confidence limits for the estimates. Monte Carlo methods (Madras, 2002; MacKay, 2003) can be useful in such instances to test whether the mutual information is significantly different from zero (Bhumbra and Dyball, 2004). Since the profile of the post-spike membrane potential will affect spike patterning (Tuckwell, 1988b), it would be useful to characterise the JISIH analytically. The mutual information between adjacent intervals does not quantify the nature of their relationship but only the strength of their association (Bhumbra and Dyball, 2004). There may be a complexity of coding of a higher order than implied by the distribution of the interval scattergram. Many studies have assumed that the spontaneous activity of cells is sufficiently stationary for valid characterisation of their activity using overall frequencies or ISIHs. We illustrate below how the profile of the JISIH can be used to test this. ‘Stationary’ activity can be defined as the steadystate condition in which all the intervals are drawn from a single population with a constant probability density (Reeke and Coop, 2004). We extend this definition to include the requirement that the statistical conditioning effects of any one interval on the next never changes. If an interval is solely influenced by its predecessor, the result is a steady-state Markov chain. A Markov chain is a stochastic process in which a random variable explicitly depends on its immediate predecessor but is conditionally independent of all other preceding random variables (Cover and Thomas, 1991). Consider a sequence of intervals a, b, c, d, . . . of a stationary Markov chain. For a constant JISIH, the joint probabilities f (a, b|I), f (b, c|I), f (c, d|I) are equal. If the ISIH is stationary, the marginal probability densities f (a|I), f (b|I), f (c|), . . . are also equal. The consequences of these equalities can be explained using Bayes’ rule. Bayes’ rule can be derived from the product rule: f (a, b|I) = f (a|b, I)×f (b|I). The terms a and b can be interchanged to express the counterpart product rule: f (b, a|I) = f (b|a, I) × f (a|I). Since f (a, b|I) = f (b, a|I), the right hand expressions can be equated and rearranged to express Bayes’ rule.

    f(a|b, I) = f(a|I) f(b|a, I) / f(b|I)    (35)

For the steady state conditions specified above, the marginal probabilities f(a|I) and f(b|I) are equal, and therefore f(a|b, I) = f(b|a, I). This means that the direction of time cannot affect the profile of f(a, b|I). There must therefore be a line of symmetry along the diagonal a = b in the JISIH (Figure 2f). It is possible to envisage a spike train in which the intervals become progressively shorter. However, this violates our steady-state assumptions as neither the conditional probability between adjacent intervals nor their marginal probabilities are stationary. 'Phasic' vasopressin cells of the hypothalamus fire in bursts with intermittent periods of relative silence (Poulain and Wakerley, 1982). Although this implies such a nonstationarity, f(a, b|I) routinely shows a line of symmetry along the main diagonal (Bhumbra and Dyball, 2004; Sabatier et al., 2004), suggesting that such spike patterning might require the interaction of dynamic and static mechanisms that affect the firing. The disturbance of such firing patterns in vitro suggests that such interactions may coordinate the stochastic influences of inputs with the deterministic properties of the cell membrane (Sabatier et al., 2004). The inter-neurones of the perinuclear zone dorsolateral to the supraoptic nucleus often show a bimodal log interval distribution (Bhumbra and Dyball, 2004), reflecting the occurrence of spike doublets (Figure 2a). There are examples in which the interval scattergram shows that no two short intervals can occur adjacent to one another (Figure 2e). Intracellular recordings in vitro have shown that perinuclear zone cells show a low threshold depolarising 'hump' when depolarised from a hyperpolarised membrane potential (Armstrong and Stern, 1997). Similar humps have been described in other neurones (Tasker and Dudek, 1991). To relate the profile of the JISIH to the post-spike membrane potential, it is simplest to characterise the conditioning effects of one interval on the next, f(b|a, I). The marginal probability f(a|I) can be obtained by 'random walking' a spike train across the conditional probability f(b|a, I), with which it can be multiplied to construct the joint probability function f(a, b|I) = f(a|I) f(b|a, I). Alternatively, f(a|I) can be described analytically by marginalisation of the Markov chain. The conditional probability f(c|a, I) can be obtained by marginalisation, as f(c|b, I) = f(b|a, I) for a stationary chain.

To relate the profile of the JISIH to the post-spike membrane potential, it is simplest to characterise the conditioning effects of one interval on the next, f(b|a, I). The marginal probability f(a|I) can be obtained by 'random walking' a spike train across the conditional probability f(b|a, I), with which it can be multiplied to construct the joint probability function f(a, b|I) = f(a|I)f(b|a, I). Alternatively, f(a|I) can be described analytically by marginalisation of the Markov chain. The conditional probability f(c|a, I) can be obtained by marginalisation, since for a stationary Markov chain f(c|b, I) = f(b|a, I).

f(c|a, I) = ∫_{S_b} f(c|b, I) f(b|a, I) db    (36)

The procedure can be repeated indefinitely.

f(d|a, I) = ∫_{S_c} f(d|c, I) f(c|a, I) dc    (37)

Sequential marginalisation can characterise f(c|a, I), f(d|a, I), . . . , f(z|a, I), where z is negligibly dependent on a. If there are no conditioning effects of a on z, then f(z|a, I) equals f(z|I), which for a stationary Markov process equals f(a|I). This means that the mutual information between intervals of increasing separation will asymptote to zero. An additional test for stationarity could thus be to compare the hypothetical reduction in mutual information for the Markov process with that of the observed data (a numerical sketch of this comparison follows below).

Asymmetry in the JISIH precludes steady-state activity, but it does not necessarily follow that a symmetrical histogram confirms stationarity. For example, a neurone that is gradually increasing its firing rate might show clustering of the points in the interval scattergram that shifts along the line of symmetry. Complete descriptions of non-stationary activity require the characterisation of the dynamics of an unknown stimulus influencing the activity of the cell. While blind methods exist (Bell and Sejnowski, 1995; Lee, 1998; Roberts and Everson, 2001), they are beyond the scope of this review. Instead, we apply information theory to the more familiar situation in which the stimulus is known to us.
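The comparison suggested above can be sketched numerically as follows (again an illustrative Python fragment with hypothetical names, not the original code): the log intervals are binned, a first-order transition matrix is estimated, sequential marginalisation is applied by repeated matrix multiplication, and the mutual information predicted at each separation is compared with that measured directly from the data.

    import numpy as np

    def _mutual_information(p):
        # Mutual information (bits) of a joint probability mass function.
        p_row = p.sum(axis=1, keepdims=True)
        p_col = p.sum(axis=0, keepdims=True)
        nz = p > 0
        return float((p[nz] * np.log2(p[nz] / (p_row * p_col)[nz])).sum())

    def markov_versus_data(spike_times, bins=15, max_separation=5):
        # Compare MI between intervals k apart: Markov prediction vs observation.
        x = np.log(np.diff(np.asarray(spike_times)))
        edges = np.histogram_bin_edges(x, bins)
        idx = np.clip(np.digitize(x, edges) - 1, 0, bins - 1)
        counts = np.zeros((bins, bins))
        np.add.at(counts, (idx[:-1], idx[1:]), 1)           # adjacent-pair counts
        from_totals = counts.sum(axis=1)
        T = counts / np.maximum(from_totals[:, None], 1)    # transition matrix f(b|a)
        pi = from_totals / from_totals.sum()                 # empirical marginal f(a)
        Tk = np.eye(bins)
        comparison = []
        for k in range(1, max_separation + 1):
            Tk = Tk @ T                                      # sequential marginalisation
            joint_markov = pi[:, None] * Tk                  # joint pmf after k steps
            joint_data = np.zeros((bins, bins))
            np.add.at(joint_data, (idx[:-k], idx[k:]), 1)
            joint_data /= joint_data.sum()
            comparison.append((k, _mutual_information(joint_markov),
                               _mutual_information(joint_data)))
        return comparison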

Stimulus-Evoked Activity

Stimuli include some that are elicited directly and others that occur 'naturally'. The latter evoke a more 'physiological' response, but the experimental logistics are more difficult to control. For simplicity, we confine the discussion to the more conventional framework in which the stimulus is controlled experimentally.

The Peri-Stimulus Time Histogram

Modulation of spike rates over a stimulus cycle can be represented by a peri- (or post-) stimulus time histogram (PSTH) (Gerstein and Kiang, 1960), and changes in the PSTH can be visualised using cumulative sum techniques (Ellaway, 1978). A PSTH represents the likelihood of a spike occurring in the ith time bin τ_i throughout the peri-stimulus cycle (Figure 4a). For a bin width ∆τ, the peri-stimulus times τ, over the entire stimulus cycle set A_T, can be used to represent a PSTH as a frequency distribution function freq(τ). By normalising the frequency distribution function, it is possible to construct the conditional probability P(Λ|τ, I) of any one particular spike Λ occurring in the peri-stimulus time bins τ, by dividing the frequency distribution function freq(τ) by its sum.

P(Λ|τ, I) = freq(τ) / Σ_{τ∈A_T} freq(τ)    (38)

P(Λ|τ, I) does not represent the probability of a spike occurring in time bin τ_i, but refers to the conditioning effects of the ith peri-stimulus time τ_i on a particular spike. Since the spike must occur at some peri-stimulus time, the sum of the conditional probabilities over the set of peri-stimulus times is one. The summation is an example of a marginalisation and can be used to express P(Λ|I).

P(Λ|I) = Σ_{τ∈A_T} P(Λ|τ, I) = 1    (39)

P(Λ|τ, I) is not representative of the perspective of the neurone, as a cell is never 'given' the peri-stimulus time (Rieke et al., 1999); from the point of view of a neurone, τ is the unknown of interest. In Bayesian terms (Sivia, 1996; Leonard and Hsu, 1999), P(Λ|τ, I) is the likelihood probability mass function that can be used to express the posterior probability P(τ|Λ, I), which is the probability of a peri-stimulus time given a particular spike. The terms can be interrelated using Bayes' rule (Equation 35).

P(τ|Λ, I) = P(τ|I) P(Λ|τ, I) / P(Λ|I)    (40)

The denominator P(Λ|I) can be omitted as its value equals one (Equation 39). Equation 40 is an example of the generalised 'decoding' problem faced by a neural ensemble, where 'decoding' can be defined as the construction of a 'dictionary' that defines the relationship between each stimulus and response.

P(Stimulus|Response, I) = P(Stimulus|I) P(Response|Stimulus, I) / P(Response|I)    (41)

Bayes' rule thus inter-relates the two parts of a 'bilingual dictionary' and represents the 'decoding' problem from the perspectives of both the scientist, P(Response|Stimulus, I), and the neurone, P(Stimulus|Response, I).
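A minimal numerical counterpart of Equations 38 to 40 might look as follows (an illustrative sketch; the 2 ms bin width and the uniform prior over time bins are arbitrary assumptions rather than prescriptions from the text):

    import numpy as np

    def psth_decoder(peri_stimulus_times, cycle_length, bin_width=0.002):
        # Normalised PSTH as the likelihood P(spike | time bin), Equation 38,
        # and the posterior P(time bin | spike) of Equation 40 under a uniform
        # (maximum-entropy) prior over peri-stimulus time bins.
        n_bins = int(round(cycle_length / bin_width))
        freq, edges = np.histogram(peri_stimulus_times, bins=n_bins,
                                   range=(0.0, cycle_length))
        likelihood = freq / freq.sum()              # sums to one over bins (Equation 39)
        prior = np.full(n_bins, 1.0 / n_bins)
        posterior = prior * likelihood
        posterior /= posterior.sum()                # evidence acts as a normalisation
        bin_centres = 0.5 * (edges[:-1] + edges[1:])
        return bin_centres, likelihood, posterior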


Much of the difficulty in constructing the dictionary is that there is no unambiguous definition of 'Response' and 'Stimulus' for real data. Equation 40 is perhaps the simplest method of illustrating the decoding problem. However, the PSTH is likely to be an oversimplification of the neuronal response (Rieke et al., 1999). For example, there may be a cost in information transmission if correlations in the activity of inputs are neglected (Nirenberg and Latham, 2003). The PSTH also tends to distort interpretations of the amplitude and shape of the synaptic current underlying motoneurone reflexes (Awiszus et al., 1991). It is possible to average frequency responses to represent a population response by summation or using combined recordings of action potentials. However, averaging means (Waelti et al., 2001), or averaging the mean of averages (Cutler et al., 2003), may result in the loss of physiologically important information (Bhumbra et al., 2005).

Here, we focus on the perspective of a neurone receiving a single input and consider its background information I_neurone. The PSTH ignores any structuring of the spike train during each given sweep. If the probability P(Stimulus|Response, I_neurone) is to reflect the perspective of the cell, I_neurone must be considered closely. The background information available to the neurone is the previous history of the spike train, and the simplest way to represent what a new spike occurring at time t_k means to the neurone is the time of the previous spike t_{k−1}. As discussed under 'The Interval Method', the relationship between adjacent spikes can be modelled with respect to the interval w. Hence Equation 40 can be reconsidered from the perspective of a neurone by substituting w for Λ. Since both τ and w are continuous variables, their probabilities can be represented using density functions.

f(τ|w, I_neurone) = f(τ|I_neurone) f(w|τ, I_neurone) / f(w|I_neurone)    (42)

where f(τ|w, I_neurone) is the probability density function of the peri-stimulus time τ given a particular interval w. For real data, the relationship between w and τ can be represented by the scatter distribution obtained by plotting each interval w_k against the peri-stimulus time τ_k of the spike that terminated the interval. Since the introduction of this scatter plot (Awiszus, 1988), called the inter-spike interval superposition plot (IISP), its use has been limited. Inter-spike intervals have also been characterised with respect to their reciprocal, which represents instantaneous spike frequency (Bessou et al., 1968). Plots of instantaneous frequency indicate the time course of the net activation of a motoneurone more accurately than a PSTH (Turker and Cheng, 1994). The peri-stimulus frequencygram (PSF), which plots 1/w_k against τ_k (Figure 4c), has been used more recently to characterise stimulus-evoked responses (Turker et al., 1997; Turker and Powers, 1999). Before concluding this paper, we introduce a modification of the IISP or PSF that can be applied to every cell type. The technique combines information and probabilistic approaches, using the principle of maximum entropy, in an attempt to represent the perspective of the post-synaptic neurone.

The Phase Interval Stimulus Histogram

From the perspective of a neurone, the optimal strategy for modelling f(τ|w, I_neurone) is to maximise the entropy of the probability density function of the peri-stimulus times f(τ|I_neurone). For our neurone to be as open-minded as possible, it would also have to consider the possibility that we remove or change some aspect of the stimulus. If, for a moment, the neurone is willing to believe our promises that we are playing fair, it would know that a cyclic stimulus time of period T is bounded by the interval [−T/2, T/2]. By applying the principle of maximum entropy (Jaynes, 1957), using Lagrange multipliers or Kullback-Leibler divergences (Lee, 1998), our neurone would assign a uniform probability density function for its prior. The time limits [−T/2, T/2] imply two extremes at −T/2 and T/2, when in fact they correspond to exactly the same moment half-way through the stimulus cycle. Since the perspective of a neurone does not recognise such a discontinuity, the peri-stimulus times are better represented using polar rather than Cartesian coordinates. A polar transformation can be used to express the phase angles θ with respect to the peri-stimulus times τ.

θ = 2πτ / T    (43)

It is thus straightforward to convert f(τ|w, I_neurone) into its corresponding polar counterpart f(θ|w, I_neurone). We can fix the zero point of θ to some stimulus reference to rotate f(θ|w, I_neurone); a neurone does not 'know' where the reference moment is in the stimulus cycle, and the zero point may not be the phase of most 'interest' to a cell.
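Converting peri-stimulus times into phase angles (Equation 43) is a one-line operation; the short sketch below (illustrative Python, which assumes that every spike is preceded by at least one stimulus onset) also wraps the result onto (−π, π] so that the two ends of the cycle meet:

    import numpy as np

    def peristimulus_phase(spike_times, stimulus_onsets):
        # Phase angle of each spike relative to the most recent stimulus onset.
        spike_times = np.asarray(spike_times)
        stimulus_onsets = np.asarray(stimulus_onsets)
        period = np.mean(np.diff(stimulus_onsets))           # stimulus cycle length T
        last = np.searchsorted(stimulus_onsets, spike_times, side='right') - 1
        tau = spike_times - stimulus_onsets[last]            # peri-stimulus time
        theta = 2.0 * np.pi * tau / period                   # Equation 43
        return np.angle(np.exp(1j * theta))                  # wrapped onto (-pi, pi]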


Ultimately, models can only be obtained from experimental data rather than directly from the perspective of a neurone. Consequently, we must appeal to our neurone to see our point of view, and express f(w|θ, I_neurone) by rearranging Equation 42, replacing τ with θ.

f(w|θ, I_neurone) = f(w|I_neurone) f(θ|w, I_neurone) / f(θ|I_neurone)    (44)

where f(w|I_neurone) is the prior, f(θ|w, I_neurone) is the likelihood function, and f(θ|I_neurone) is the evidence of a Bayesian model of the posterior f(w|θ, I_neurone). By applying the principle of maximum entropy, a neurone would optimise the model of f(w|θ, I_neurone) by maximising the entropy of the prior f(w|I_neurone). Since, from I_neurone, w is a time interval, it is bounded by the range [0, ∞]. The maximum state of ignorance of a quantity known only to be positive (Jaynes, 2003) is represented by the Jeffreys prior (Jeffreys, 1939). An optimal model of the response of a neurone to a stimulus is thus given by modelling the log interval x, since its corresponding prior is a uniform probability density function.
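For completeness, the change of variables behind this statement can be written out explicitly (a standard identity rather than an equation from the original derivation): with the Jeffreys prior f(w|I) ∝ 1/w and the substitution x = loge w, so that dw/dx = e^x,

    f(x|I) = f(w|I) dw/dx ∝ (1/w) e^x = (1/e^x) e^x = 1,

which is uniform over the support of x.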

f(x|θ, I_neurone) = f(x|I_neurone) f(θ|x, I_neurone) / f(θ|I_neurone)    (45)

Since both the prior f(x|I_neurone) and the evidence f(θ|I_neurone) become normalisation constants, the posterior f(x|θ, I_neurone) is proportional to the likelihood function f(θ|x, I_neurone).

f(x|θ, I_neurone) ∝ f(θ|x, I_neurone)    (46)

Hence modelling log intervals maximises the information that can be obtained from an interval distribution, knowing that intervals cannot be negative. Equation 46 thus suggests that a neurone's interpretation of its own perspective, f(θ|x, I_neurone), and of ours, f(x|θ, I_neurone), are directly related if and only if the inter-spike intervals are logged.

Figure 4: Stimulus-evoked response of a suprachiasmatic cell recorded in vivo during stimulation of the rat arcuate nucleus at 1 Hz. Panel a) is the PSTH with a bin width of 2 ms showing a complex response of orthodromic inhibition and excitation. In panel b), an ISIH for the period is shown with a bin width of 1 ms. Panel c) shows a PSF in which the instantaneous frequency is plotted against the peri-stimulus time. In panel d), the corresponding PhISH (Phase Interval Stimulus Histogram) is illustrated (see text).

For real data, the relationship between x and θ can be represented by the scatter distribution obtained by plotting each log interval x_k against the peri-stimulus phase angle θ_k of the spike that terminated the interval. As for the JISIH, analytical descriptions of f(θ, x|I) are conceptually possible but not straightforward. Again, we can resort to empirical models based on normalising a histogram to construct P(θ, x|I).

We call the histogram that represents the joint distribution of the log intervals and the peri-stimulus phase angles the 'phase interval stimulus histogram' (PhISH). The PhISH models f(θ, x|I), and resembles the IISP plotted using logarithmic intervals, with the phase angle substituted for the peri-stimulus time (Figure 4d). Since the PSF uses the reciprocal of the interval, it corresponds to the PhISH with the sign of the logarithmic interval changed. The probability densities f(x|I) and f(θ|I) can be expressed with respect to f(θ, x|I) by marginalisation.

f(x|I) = ∫_{S_Θ} f(θ, x|I) dθ,    f(θ|I) = ∫_{S_X} f(θ, x|I) dx    (47)

where S_Θ and S_X are the support sets of θ and x respectively.
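An empirical counterpart of Equation 47, together with the mutual information I[θ; x] defined below in Equation 48, can be computed directly from a normalised two-dimensional histogram. The sketch below is illustrative (the bin numbers are arbitrary) rather than a description of the original analysis software:

    import numpy as np

    def phish(theta, log_intervals, phase_bins=18, interval_bins=20):
        # Empirical phase interval stimulus histogram P(theta, x | I),
        # its marginals, and the mutual information I[theta; x] in bits.
        counts, _, _ = np.histogram2d(
            theta, log_intervals,
            bins=[np.linspace(-np.pi, np.pi, phase_bins + 1),
                  np.histogram_bin_edges(log_intervals, interval_bins)])
        p = counts / counts.sum()
        p_theta = p.sum(axis=1)                    # marginal PSTH over phase angles
        p_x = p.sum(axis=0)                        # marginal log-interval histogram
        nz = p > 0
        mi = float((p[nz] * np.log2(p[nz] / np.outer(p_theta, p_x)[nz])).sum())
        return p, p_theta, p_x, mi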

f(x|I) represents the ISIH using log intervals, and f(θ|I) represents the PSTH using peri-stimulus phase angles. Equation 23 can be used to express their respective entropies, s[x] and s[θ], and the joint entropy s[θ, x] is given by Equation 34. The mutual information between spike and stimulus, I[θ; x], can thus be expressed from the perspective of a neurone.

I[θ; x] = s[x] + s[θ] − s[θ, x]    (48)

Using empirical probability mass functions for real data, Monte Carlo methods (Madras, 2002) can test whether the mutual information I[θ; x] is significantly greater than zero. Such randomisation methods represent the cross-multiplication of f(x|I) and f(θ|I), so that x and θ are modelled as independent variables. The profile of the PSTH represented by f(θ|I) would not by itself be enough to constitute mutual information. While we might observe an excitation or inhibition in the PSTH, the effect is lost on the post-synaptic cell if the resulting spike train from the single input does not help it infer the stimulus time. In such cases, the post-synaptic cell would have to rely on other inputs, either independently or in conjunction.

The mutual information represents the average shared information between the log interval and the peri-stimulus phase angle. While the zero point in the stimulus cycle may not reflect the phase angle of most 'interest' to the cell, it is possible to determine this from the point at which the shared information is greatest. The mutual information I[θ; x] can be expressed as the reduction in entropy gained by knowing the stimulus phase.

I[θ; x] = s[x] − s[x|θ]    (49)

The information capacity C[x|θ] represents the maximum reduction in entropy, when the neurone can estimate the stimulus phase with greatest certainty.

C[x|θ] = max_{f(θ|I)} (s[x] − s[x|θ])    (50)

However, a cell is not 'given' the peri-stimulus phase angle θ. Its perception is best represented by the information capacity C[θ|x], which represents the maximum reduction in entropy that predicts the stimulus phase with the greatest confidence.

C[θ|x] = max_{f(x|I)} (s[θ] − s[θ|x])    (51)

The information capacity thus represents the reduction in uncertainty in predicting the peri-stimulus phase angle after the most informative of intervals, which represent the most 'unexpected' code messages for the neurone. To establish a change in the stimulus-evoked response in different physiological situations, Fourier methods could be applied to compare the different probability density functions for f(θ, x|I).

Conclusions

Using inter-spike intervals, an information approach has been used to characterise spontaneous firing in a way that can distinguish between the different cell types in the supraoptic nucleus (Bhumbra and Dyball, 2004). Similar techniques have been applied to the cells of the suprachiasmatic nucleus (Bhumbra et al., 2005) to describe diurnal oscillations in firing that are not evident using frequency methods alone. By musical analogy, if the mean spike frequency defines the key of the scale of a piece played on an instrument, the timbre of the instrument is represented by the repertoire of inter-spike intervals quantified by the log interval entropy. Extending the analogy, the extent of the usage of motifs in the tune is reflected by the mutual information between adjacent log intervals. In this paper we introduce the PhISH to determine the mutual information between the most recent log interval and the stimulus phase, as a measure of the degree to which a neurone is in tune with the world around it.

The bias of this review has been towards the point of view of a neurone. As a framework for describing single unit activity, we have established four orders of complexity in the capacity for neural coding, in which each order can operate independently. First-order coding, quantified by firing rates, is conveyed completely by frequencies, which thus entirely describe first-moment processes. Second-order coding, represented by the variability of the inter-spike intervals reflected in the interval histogram, is quantified by the log interval entropy. Third-order coding is the result of spike motifs that associate adjacent inter-spike intervals beyond chance levels, as observed in the joint interval histogram, and is thus measured by their mutual information. Finally, non-stationarities in activity represent fourth-order coding, which arises from the effects of a known or unknown stimulus.



Appendix

The Entropy of a Gamma Probability Density Function

Consider a gamma probability density function G(α, β) expressed with respect to time w, using α and β as shaping and scaling parameters respectively.

G(α, β) = (1 / (β^α Γ(α))) w^(α−1) e^(−w/β)    (52)

where the gamma function Γ(α) is given by a definite integral expressed with respect to z over its support set S_Z defined over the interval [0, ∞].

Γ(α) = ∫_{S_Z} z^(α−1) e^(−z) dz    (53)

It is possible to estimate α and β from the first two moments (Wadsworth, 1990), but precise values can only be obtained using computational algorithms such as maximum likelihood estimation (Mann et al., 1974; Gross and Clark, 1975). By substituting the gamma probability density function for f_w(w|I) into Equation 8, the differential entropy s[w] can be expressed in 'nats'.

s[w] = −∫_{S_W} G(α, β) loge G(α, β) dw    (54)
     = −∫_{S_W} G(α, β) loge ((1 / (β^α Γ(α))) w^(α−1) e^(−w/β)) dw    (55)
     = −(−α loge β − loge Γ(α) + (α − 1)E(loge w) − E(w)/β)    (56)

For a gamma probability density function G(α, β), the expectation E(w) is given by αβ (Papoulis and Pillia, 2002).

s[w] = α loge β + loge Γ(α) − (α − 1)E(loge w) + α    (57)

The expectation term E(loge w) can be expressed with respect to G(α, β).

E(loge w) = ∫_{S_W} G(α, β) loge w dw    (58)
          = ∫_{S_W} (1 / (β^α Γ(α))) w^(α−1) e^(−w/β) loge w dw    (59)

To expand the integral, it is useful to change the variable w for z, where z = w/β, thus w = βz, therefore dw = β dz, and S_Z is defined over the interval [0, ∞].

E(loge w) = (1 / (β^α Γ(α))) ∫_{S_Z} (βz)^(α−1) e^(−z) loge(βz) β dz    (60)
          = (β^α / (β^α Γ(α))) ∫_{S_Z} z^(α−1) e^(−z) (loge β + loge z) dz    (61)
          = (loge β / Γ(α)) ∫_{S_Z} z^(α−1) e^(−z) dz    (62)
            + (1 / Γ(α)) ∫_{S_Z} z^(α−1) e^(−z) loge z dz    (63)

The first integral expression can be substituted with the gamma function expressed in Equation 53.

E(loge w) = loge β + (1 / Γ(α)) ∫_{S_Z} z^(α−1) e^(−z) loge z dz    (64)

The remaining integral expression corresponds to the differential of the gamma function, where:

dΓ(α)/dα = ∫_{S_Z} z^(α−1) e^(−z) loge z dz    (65)

Equation 64 can thus be expressed with respect to the differential term:

E(loge w) = loge β + (1 / Γ(α)) dΓ(α)/dα    (66)
          = loge β + Ψ(α)    (67)

where the psi function Ψ(α) is the derivative of the log gamma function:

Ψ(α) = d loge Γ(α) / dα    (68)
∴ Ψ(α) = (1 / Γ(α)) dΓ(α)/dα    (69)

The expression for E(loge w) in Equation 67 can thus be substituted into Equation 57.



s[w] = α loge β + loge Γ(α) − (α − 1)(loge β + Ψ(α)) + α    (70)
     = α loge β + loge Γ(α) − α loge β + loge β − (α − 1)Ψ(α) + α    (71)
∴ s[w] = loge[βΓ(α)] + (1 − α)Ψ(α) + α    (72)

The Log Interval Entropy of a Gamma Process

First, the probability density function f_x(x|I) of the logarithmic intervals x is expressed with respect to the probability density function f_w(w|I) of the linear intervals w, where w = e^x = dw/dx.

f_x(x|I) = f_w(w|I) dw/dx    (73)
         = f_w(w|I) e^x    (74)

The probability density function G(α, β) can be used to represent f_w(w|I) for a gamma process.

f_x(x|I) = G(α, β) e^x    (75)
∴ f_x(x|I) = (1 / (β^α Γ(α))) w^α e^(−w/β)    (76)

The entropy s[x] can be expressed by substitution into Equation 23.

s[x] = −∫_{S_X} G(α, β) e^x loge ((1 / (β^α Γ(α))) w^α e^(−w/β)) dx    (77)
     = −(−α loge β − loge Γ(α) + αE(loge w) − E(w)/β)    (78)

For a gamma probability density function G(α, β), the expectation E(w) is given by αβ (Papoulis and Pillia, 2002).

s[x] = α loge β + loge Γ(α) − αE(loge w) + α    (79)

To express the expectation term E(loge w), it is first useful to introduce a new variable y, where y = x − loge β. Since dy/dx = 1, the probability density function f_y(y|I) is identical to f_x(x|I) as shown in Equation 76, and may be expressed with respect to y by the substitution x = y + loge β.

f_y(y|I) = (1 / (β^α Γ(α))) w^α e^(−w/β)    (80)
         = (1 / (β^α Γ(α))) e^(αx) e^(−e^x / β)    (81)
         = (1 / (β^α Γ(α))) e^(α(y + loge β)) e^(−e^(y + loge β) / β)    (82)
         = (1 / (β^α Γ(α))) e^(αy) e^(α loge β) e^(−e^y e^(loge β) / β)    (83)
         = (1 / (β^α Γ(α))) e^(αy) β^α e^(−e^y β / β)    (84)
         = (1 / Γ(α)) e^(αy) e^(−e^y)    (85)

Since the probability density function f_y(y|I) is independent of β, and identical to f_x(x|I) except shifted negatively by loge β, the entropy of f_x(x|I) is also independent of β. This can be proved by expressing the expectation E(y) as the integral of the product of f_y(y|I) and y over the support set S_Y that is defined over the interval [−∞, ∞].

E(y) = ∫_{S_Y} f_y(y|I) y dy    (86)
     = ∫_{S_Y} (1 / Γ(α)) e^(αy) e^(−e^y) y dy    (87)
     = (1 / Γ(α)) ∫_{S_Y} e^(αy) e^(−e^y) y dy    (88)

Equation 65 can be expressed with respect to y, where z = e^y and dz = e^y dy.

dΓ(α)/dα = ∫_{S_Z} z^(α−1) e^(−z) loge z dz    (89)
∴ dΓ(α)/dα = ∫_{S_Y} e^(αy) e^(−e^y) y dy    (90)

Equation 90 expresses the integral term of Equation 88 as the derivative of the gamma function. It can be divided by the gamma function to express the psi function Ψ(α) as defined in Equation 69.

E(y) = (1 / Γ(α)) dΓ(α)/dα = Ψ(α)    (91)

Since x is y + loge β, its expectation E(x) can be expressed with respect to E(y).

E(x) = E(y) + loge β    (92)
     = Ψ(α) + loge β    (93)

The expression for E(x) can thus be substituted into Equation 79.

s[x] = α loge β + loge Γ(α) − α(Ψ(α) + loge β) + α    (94)
     = α loge β + loge Γ(α) − αΨ(α) − α loge β + α    (95)
∴ s[x] = loge Γ(α) − αΨ(α) + α    (96)

Approximation of the Log Interval Entropy of a Gamma Process

If α is a positive integer, the gamma function Γ(α) can be expressed by the factorial (α − 1)!. Using Stirling's formula (Harris and Stocker, 1998), Γ(α + 1) can be approximated for large values of α.

Γ(α + 1) = α! ≈ √(2πα) α^α e^(−α)    (97)

Since Γ(α + 1) equals αΓ(α) (Press et al., 2002), it is possible to approximate loge Γ(α).

loge Γ(α) ≈ loge ((1/α) √(2πα) α^α e^(−α))    (98)
          = loge ((2πα)^(1/2) α^(α−1) e^(−α))    (99)
          = (1/2) loge(2πα) + (α − 1) loge α − α    (100)
          = (1/2) loge(2π) + (1/2) loge α + α loge α − loge α − α    (101)
∴ loge Γ(α) ≈ (1/2) loge(2π) − (1/2) loge α + α loge α − α    (102)

The function Ψ(α) can be expressed with respect to Euler's constant γ by summation.

Ψ(α) = −γ + Σ_{i=1}^{α−1} (1/i)    (103)

where γ is Euler's constant that can be expressed with respect to an infinite summation.

γ = lim_{j→∞} (−loge j + 1/1 + 1/2 + . . . + 1/j)    (104)
  = lim_{j→∞} (−loge j + Σ_{i=1}^{j} (1/i))    (105)

Although the summation of reciprocals is divergent (Blakey, 1949), the negative logarithmic term converges the value of the expression to a single quantity (≈ 0.5772). If α is large, it can be used to provide an approximation for γ; the 1/(2α) term below is the leading correction for truncating the limit at a finite α.

γ ≈ −loge α + Σ_{i=1}^{α} (1/i) − 1/(2α)    (106)

The approximation of γ can be substituted into Equation 103.

Ψ(α) ≈ loge α + 1/(2α) − Σ_{i=1}^{α} (1/i) + Σ_{i=1}^{α−1} (1/i)    (107)
     = loge α + 1/(2α) − 1/α    (108)
∴ Ψ(α) ≈ loge α − 1/(2α)    (109)

Equations 102 and 109 can thus be used to approximate s[x] expressed in Equation 28.

s[x] ≈ ((1/2) loge(2π) − (1/2) loge α + α loge α − α) − α(loge α − 1/(2α)) + α    (110)
     = (1/2) loge(2π) − (1/2) loge α + α loge α − α − α loge α + 1/2 + α    (111)
     = (1/2) loge(2π) − (1/2) loge α + 1/2    (112)
     = (1/2) (loge(2π/α) + loge e)    (113)
∴ s[x] ≈ (1/2) loge(2πe/α)    (114)
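The closed form of Equation 96 and the approximation of Equation 114 are easy to check numerically. The fragment below (an illustrative sketch assuming NumPy and SciPy are available; the shape and scale values are arbitrary) evaluates both, and also verifies by sampling that the mean log interval of gamma-distributed intervals approaches Ψ(α) + loge β (Equation 93):

    import numpy as np
    from scipy.special import gammaln, digamma

    def log_interval_entropy(alpha):
        # Exact entropy (nats) of the log interval of a gamma process, Equation 96.
        return gammaln(alpha) - alpha * digamma(alpha) + alpha

    def log_interval_entropy_approx(alpha):
        # Large-alpha approximation of Equation 114.
        return 0.5 * np.log(2.0 * np.pi * np.e / alpha)

    if __name__ == '__main__':
        rng = np.random.default_rng(0)
        alpha, beta = 10.0, 0.05                 # illustrative shape and scale values
        samples = rng.gamma(alpha, beta, size=100000)
        print(np.log(samples).mean(), digamma(alpha) + np.log(beta))   # Equation 93
        for a in (2.0, 10.0, 50.0, 250.0):
            print(a, log_interval_entropy(a), log_interval_entropy_approx(a))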

References

Abeles, M. (1982), ‘Role of the cortical neuron: integrator or coincidence detector?’, Israel Journal of Medical Sciences 18, 83–92.

Abeles, M. (1991), Corticonics, Cambridge University Press, Cambridge, UK.

Abeles, M., Bergman, H., Margalit, E. and Vaadia, E. (1993), ‘Spatiotemporal firing patterns in the frontal-cortex of behaving monkeys’, Journal of Neurophysiology 70, 1629–1638.


Abeles, M. and Gerstein, G. (1988), ‘Detecting spatiotemporal firing patterns among simultaneously recorded single neurons’, Journal of Neurophysiology 60, 909–924.

Bhumbra, G. and Dyball, R. (2004), ‘Measuring spike coding in the supraoptic nucleus’, Journal of Physiology 555, 281–296. Bhumbra, G., Inyushkin, A. and Dyball, R. (2004), ‘Assessment of spike activity in the supraoptic nucleus’, Journal of Neuroendocrinology 16, 390– 397.

Adrian, E. (1926), ‘The impulses produced by sensory nerve endings: Part i.’, Journal of Physiology 1, 151–171. Aertsen, A. and Arndt, M. (1993), ‘Response synchronization in the visual cortex’, Current Opinions in Neurobiology 3, 586–594.

Bhumbra, G., Inyushkin, A., Saeb-Parsy, K., Hon, A. and Dyball, R. (2005), ‘Rhythmic changes in spike coding in the rat suprachiasmatic nucleus’, Journal of Physiology 563, 291–307.

Aitchison, J. and Brown, J. (1963), The Lognormal Distribution, Cambridge University Press, Cambridge.

Blakey, W. (1949), University Mathematics, Blackie and Son Limited, London, UK.

Armstrong, W. and Stern, J. (1997), ‘Electrophysiological and morphological characteristics of neurons in perinuclear zone of supraoptic nucleus’, Journal of Neurophysiology 78, 2427–2437.

Borst, A. and Theunissen, A. (1999), ‘Information theory and neural coding’, Nature Neuroscience 2, 947–957. Brenner, N., Bialek, W. and de Ruyter van Steveninck, R. (2000), ‘Adaptive rescaling maximizes information transmission’, Neuron 26, 695–702.

Awiszus, F. (1988), ‘Continuous functions determined by spike trains of a neuron subject to stimulation’, Biological Cybernetics 58, 321–327. Awiszus, F., Feistner, H. and Schafer, S. (1991), ‘On a method to detect long-latency excitations and inhibitions of single hand muscle motoneurones in man’, Experimental Brain Research 86, 440– 446.

Brenner, N., Strong, S., Koberle, K., Bialek, W. and de Ruyter Van Steveninck, R. (2000), ‘Synergy in a neural code’, Neural Computation 12, 1231– 1552. Burns, B. and Webb, A. (1976), ‘The spontaneous activity of neurones in the cat’s cerebral cortex’, Proceedings of the Royal Society of London. B. 194, 211–223.

Baker, S. and Gerstein, G. (2000), ‘Improvements to the sensitivity of gravitational clustering for multiple neuron recordings’, Neural Computation 12, 2597–2620.

Chacron, M., Longtin, A. and Maler, L. (2001), ‘Negative interspike interval correlations increase the neuronal capacity for encoding time-dependent stimuli’, Journal of Neuroscience 21, 5328–5343.

Baker, S. and Lemon, R. (2000), ‘Precise spatiotemporal repeating patterns in monkey primary and supplementary motor areas occur at chance levels’, Journal of Neurophysiology 84, 1770–1780.

Cover, T. and Thomas, J. (1991), Elements of Information Theory, John Wiley.

Bell, A. and Sejnowski, T. (1995), ‘An information maximisation approach to blind separation and blind deconvolution’, Neural Computation 7, 1129–1159. Berry, M., Warland, D. and Meister, M. (1997), ‘The structure and precision of retinal spike trains’, Proceedings of the National Academy of Sciences USA 94, 5411–5416.

Cutler, D., Haraura, M., Reed, H., Shen, S., Sheward, W., Morrison, C., Marston, H., Harmar, A. and Piggins, H. (2003), ‘The mouse VPAC2 receptor confers suprachiasmatic nuclei cellular rhythmicity and responsiveness to vasoactive intestinal polypeptide in vitro’, European Journal of Neuroscience 17, 197–204.

Bessou, P., Laporte, Y. and Pages, B. (1968), ‘A method of analysing the responses of spindle primary endings to fusimotor stimulation’, Journal of Physiology 196, 37–45.

Dan, Y., Alonso, J., Ursery, W. and Reid, R. (1998), ‘Coding of visual information by precisely correlated spikes in the lateral geniculate nucleus’, Nature Neuroscience 1, 501–507.


Dayhoff, J. and Gerstein, G. (1983), ‘Favored patterns in spike trains. 2. Applications.’, Journal of Neurophysiology 49, 1349–1363.

Gray, C. (1999), ‘The temporal correlation hypothesis of visual feature integration: still alive and well’, Neuron 24, 31–47.

de Ruyter van Steveninck, R., Lewen, G., Strong, S., Koberle, R. and Bialek, B. (1997), ‘Reproducibility and variability in neural spike trains’, Science 275, 1805–1808.

Gross, A. and Clark, V. (1975), Survival distributions: Reliability applications in the biomedical sciences, Wiley, New York. Haken, H. (1996), Principles of Brain Functioning: A Synergetic Approach to Brain Activity, Behaviour and Cognition, Springer-Verlag, New York, USA.

Dear, S., Simmons, J. and Fritz, J. (1993), ‘A possible neuronal basis for representation of acoustic scenes in the auditory cortex of the big brown bat’, Nature 364, 620–623.

Harris, J. and Stocker, H. (1998), Handbook of Mathematics and Computational Science, Springer.

DeCharms, R. and Merzenich, M. (1995), ‘Primary cortical representation of sounds by the coordination of action potential timing’, Nature 381, 610– 613.

Hastie, T., Tibshirani, R. and Friedman, J. (2001), The Elements of Statistical Learning, SpringerVerlag, New York, USA.

Deco, G. and Schurmann, B. (1998), ‘The coding of information by spiking neurons: an analytical study’, Network: Computational Neural Systems 9, 303–317.

Hodgkin, A. and Huxley, A. (1952), ‘A quantitative description of membrane current and its application to conduction and excitation in the nerve’, Journal of Physiology 117, 500–544.

Dyball, R. and Leng, G. (1986), ‘Regulation of the milk ejection reflex in the rat’, Journal of Physiology 380, 239–256.

Jaynes, E. (1957), ‘Information theory and statistical mechanics’, Physics Reviews 106, 171–190. Jaynes, E. (2003), Probability Theory: The Logic of Science, Cambridge University Press, Cambridge, UK.

Ellaway, P. (1978), ‘Cumulative sum technique and its application to the analysis of peristimulus time histogram’, Electroencepholography and Clinical Neurophysiology 45, 302–304.

Jeffreys, H. (1939), Theory of Probability, Clarendon Press, Oxford.

Fairhall, A., Lewen, G., Bialek, W. and de Ruyter Van Steveninck, R. (2001), ‘Efficiency and ambiguity in an adaptive neural code’, Nature 412, 787–792.

Karbowiak, A. (1969), Theory of Communication, Oliver and Boyd, Edinburgh, UK. Klemm, W. and Sherry, C. (1981), ‘Serial ordering in spike trains: What’s it ”trying to tell us”?’, International Journal of Neuroscience 14, 15–33.

Fatt, P. and Katz, B. (1952), ‘Spontaneous subthreshold activity at motor nerve endings’, Journal of Physiology 117, 109–128.

Knierem, J. and van Essen, D. (1992), ‘Neuronal responses to static textures in area V1 of the alert Macaque monkey’, Journal of Neurophysiology 67, 961–980.

Fee, M. and Kleinfeld, D. (1994), ‘Neuronal responses in rat vibrissa cortex during behaviour’, Society of Neuroscience Abstracts. Ferster, D. and Spruston, N. (1995), ‘Cracking the Neuronal Code’, Science 270, 756–757.

Knudsen, E. and Konishi, M. (1979), ‘Mechanisms of sound localization in the barn owl (Tyto alba)’, Journal of Comparative Physiology 133, 13–21.

Gerstein, G. and Kiang, N.-S. (1960), ‘An approach to the quantitative analysis of electrophysiological data from single neurons’, Biophysical Journal 1, 15–28.

Kozachenko, L. and Leonenko L (1987), ‘Sample estimate of the entropy of a random vector’, Problems of Information Transmission 23, 95–101.

Gerstein, G. and Mandelbrot, B. (1964), ‘Random walk models for the spike activity of a single neuron’, Biophysical Journal 4, 41–68.

Land, M. and Collett, T. (1974), ‘Chasing behavior of houseflies (Fannia canicularis): A description and analysis’, Journal of Comparative Physiology 89, 331–357.

Lapicque, L. (1907), ‘Recherches quantitatives sur l’excitation electrique des nerfs traitee comme une polarization’, Journal of General Physiology and Pathology 9, 620–635.

Nirenberg, S. and Latham, P. (2003), ‘Decoding neuronal spike trains: How important are correlations’, Proceedings of the National Academcy of Sciences USA 100, 7348–7353.

Lee, T.-W. (1998), Independent Component Analysis, Kluwer Academic Publishers.

O’Keefe, J. and Recce, M. (1993), ‘Phase relationship between hippocampal place units and the EEG theta rhythm’, Hippocampus 3, 317–330.

Leng, G., Brown, C., Bull, P., Brown, D., Scullion, S., Currie, J., Blackburn-Munro, R., Feng, J., Onaka, T., Verbalis, J., Russell, J. and Ludwig, M. (2001), ‘Responses of magnocellular neurons to osmotic stimulation involves coactivation of excitatory and inhibitory input: An experimental and theoretical analysis’, Journal of Neuroscience 21(17), 6967–6977.

Oram, M., Wiener, M., Lestienne, R. and Richmond BJ (1999), ‘Stochastic nature of precisely timed spike patterns in visual system neuronal responses’, Journal of Neurophysiology 81, 3021– 3033. Papoulis, A. and Pillia, S. (2002), Probability, Random Variables, and Stochastic Processes, McGraw-Hill, New York, USA.

Leonard, T. and Hsu, J. (1999), Bayesian Methods, Cambridge University Press, Cambridge, UK.

Perkel, D. and Bullock, T. (1968), ‘Neural coding: a report based on a Neuroscience Research Progress work session’, Neuroscience Research Progress Bulletin 6, 3.

MacKay, D. (2003), Information Theory, Inference, and Learning Algorithms, Cambridge University Press, Cambridge, UK. Mackay, D. and McCulloch, W. (1952), ‘The limiting information capacity of a neuronal link’, Bulletin of Mathematical Biophysics 14, 127–135.

Perkel, D., Gerstein, G. and Moore, G. (1967), ‘Neuronal spike trains and stochastic point processes: I. the single spike train’, Biophysical Journal 7, 391–418.

MacLeod, K., Baecker, A. and Laurent, G. (1998), ‘Who reads temporal information contained across synchronized and oscillatory spike trains?’, Nature 395, 693–698.

Pincus, S. (1991), ‘Approximate entropy as a measure of system complexity’, Proceedings of the National Academy of Sciences USA 88, 2297– 2301.

Madras, N. (2002), Lectures on Monte Carlo Methods, American Mathematical Society, Rhode Island, USA.

Poggio, G. and Viernstein, L. (1964), ‘Time series analysis of impulse sequences of thalamic somatic sensory neurons’, Journal of Neurophysiology 27, 517–545.

Mann, N., Schafer, R. and Singpurwalla, N. (1974), Methods for statistical analysis of reliability and life data, Wiley, New York.

Poizari, P. and Mel, B. (2001), ‘Impact of active dendrites and structural plasticity on the memory capacity of neural tissue’, Neuron 29, 779–796.

Matthews, P. (1996), ‘Relationship of firing intervals of human motor units to the trajectory of postspike after-hyperpolarization and synaptic noise’, Journal of Physiology 492, 597–628.

Poulain, D. and Wakerley, J. (1982), ‘Electrophysiology of hypothalamic magnocellular neurones secreting oxytocin and vasopressin’, Neuroscience 7, 773–808.

Mel, B. (1993), ‘Synaptic integration in an excitable dendritic tree’, Journal of Neurophysiology 70, 1086–1101.

Press, W., Teukolsky, S., Vetterling, W. and Flannery, B. (2002), Numerical Recipes in C++: The Art of Scientific Computing, Cambridge University Press, Cambridge, UK.

Nakahama, H., Nishioka, S., Otsuka, T. and Aikawa, S. (1966), ‘Statistical dependency between interspike intervals of spontaneous activity in thalamic lemniscal neurons’, Journal of Neurophysiology 29, 921–934.

Rao, R., Olshausen, B. and Lewicki, M. (2002), Probabilistic Models of the Brain, The Massachusetts Institute of Technology Press, Cambridge, Massachusetts.


Reeke, G. (2005), Personal Communication.

Sherry, C. and Klemm, W. (1981), ‘Entropy as an index of the informational state of neurons’, International Journal of Neuroscience 15, 171–178.

Reeke, G. and Coop, A. (2004), ‘Estimating the temporal interval entropy of neuronal discharge’, Neural Computation 16, 941–970.

Sherry, C. and Klemm, W. (1984), ‘What is the meaningful measure of neuronal spike train activity?’, Journal of Neuroscience 10, 205–213.

Reid, R., Soodak, R. and Shapley, R. (1991), ‘Directional selectivity and spatiotemporal structure of receptive fields of simple cells in cat striate cortex’, Journal of Neurophysiology 66, 505–529.

Simmons, J. (1979), ‘Perception of echo phase information in bat sonar’, Science 204, 1336–1338.

Reinagel, P. and Reid, R. (2000), ‘Temporal coding of visual information in the thalamus’, Journal of Neuroscience 20, 5392–5400.

Simmons, J. (1989), ‘A view of the world through the bat's ear: The formation of acoustic images in echolocation’, Cognition 33, 155–199.

Rieke, F., Warland, D., de Ruyter Van Steveninck, R. and Bialek, W. (1999), Spikes: Exploring the Neural Code, The Massachusetts Institute of Technology Press.

Simmons, J., Ferragam, M., Moss, C., Stevenson, S. and Altes, R. (1990), ‘Discrimination of jittered sonar echoes by the echolocating bat, Eptesicus fuscus: The shape of target images in echolocation’, Journal of Comparative Physiology 167, 589–616.

Roberts, S. and Everson, R. (2001), Independent Component Analysis: Principles and Practice, Cambridge University Press, Cambridge, UK.

Sivia, D. (1996), Data Analysis: A Bayesian Tutorial, Oxford University Press Inc., New York, USA.

Rodieck, R., Kiang, N.-S. and Gerstein, G. (1962), ‘Some quantitative methods for the study of spontaneous activity of single neurons’, Biophysical Journal 2, 351–367.

Softky, W. (1995), ‘Simple codes versus efficient codes’, Current Opinions in Neurobiology 5, 239–247.

Rose, G. and Heilenberg, W. (1985), ‘Temporal hyperacuity in the electrical sense of fish’, Nature 318, 178–180.

Softky, W. and Koch, C. (1993), ‘The highly irregular firing of cortical cells is consistent with temporal integration of random epsps’, Journal of Neuroscience 13, 334–350.

Sabatier, N., Brown, C., Ludwig, M. and Leng, G. (2004), ‘Phasic spike patterning in rat supraoptic neurones in vivo and in vitro’, Journal of Physiology 558, 161–180.

Stein, R. (1965), ‘A theoretical analysis of neuronal variability’, Biophysiological Journal 5, 173–194. Stopfer, M., Bhagavan, S., Smith, B. and Laurent, G. (1997), ‘Impaired odour discrimination on desynchronization of odour-encoding neural assemblies’, Nature 390, 70–74.

Saeb-Parsy, K. and Dyball, R. (2003), ‘Defined cell groups in the rat suprachiasmatic nucleus have different day / night rhythms of single-unit activity in vivo’, Journal of Biological Rhythms pp. 26–42.

Strong, S., Koberle, R., de Ruyter van Steveninck, R. and Bialek, W. (1998), ‘Entropy and information in neural spike trains’, Physical Review Letters 80, 197–200.

Salinos, E. and Sejnowski, T. (2001), ‘Correlated neuronal activity and the flow of neural information’, Nature Reviews 2, 539–550.

Tasker, J. and Dudek, F. (1991), ‘Electrophysiological properties of neurones in the region of the paraventricular nucleus in slices of the rat hypothalamus’, Journal of Physiology 434, 271–293.

Shadlen, M. and Moveshon, J. (1999), ‘Synchrony unbound: a critical evaluation of the temporal binding hypothesis’, Neuron 24, 67–77. Shadlen, M. and Newsome, W. (1994), ‘Noise, neural codes and cortical organization’, Current Opinions in Neurobiology 4, 569–579.

Tiesenga, P., Jose, J. and Sejnowski, T. (2000), ‘Comparison of current-driven and conductancedriven neocortical model neurons with HodgkinHuxley voltage-gated channels’, Physical Review E 62, 8413–8419.

Shannon, C. and Weaver, W. (1949), The Mathematical Theory of Communication, University of Illinois Press, Urbana.

Tuckwell, H. (1988a), Introduction to Theoretical Neurobiology: volume 1 -linear cable theory and dendritic structure, Cambridge University Press, Cambridge.

Yang, G. and Chen, T. (1978), ‘On statistical methods in neural spike train analysis’, Mathematical Biosciences 38, 1–34.

Tuckwell, H. (1988b), Introduction to Theoretical Neurobiology: Volume 2 - Nonlinear and Stochastic Theories, Cambridge University Press.

Acknowledgements

This work was supported by the Medical Research Council (U.K.), Engineering and Physical Sciences Research Council (U.K.), Merck, Sharp, and Dohme (U.K.), and the James Baird Fund. We are extremely grateful to G. Leng from the University of Edinburgh for his assistance in the preparation of this manuscript.

Turker, K. and Cheng, H. (1994), ‘Motor-unit firing frequency can be used for the estimation of synaptic potentials in human motoneurones’, Journal of Neuroscience 53, 225–234. Turker, K. and Powers, R. (1999), ‘Effects of large excitatory and inhibitory inputs on motoneuron discharge rate and probability’, Journal of Neurophysiology 82, 829–840. Turker, K., Yang, J. and Scutter, S. (1997), ‘Tendon tap induces a single long lasting excitatory reflex in the motoneurons of human soleus muscle’, Experimental Brain Research 115, 169–173. Uscher, M., Stemmler, M. and Koch, C. (1994), ‘Network amplification of local fluctuations causes high spike rate variability, fractal firing patterns and oscillatory field potentials’, Neural Computation 6, 795–836. Usrey, W. and Reid, R. (1999), ‘Synchronous activity in the nervous system’, Annual Reviews in Neuroscience 61, 435–456. Victor, J. (2002), ‘Binless strategies for estimation of information from neural data’, Physical Review E 66, 051903 1–051903 15. Wadsworth, H. (1990), Handbook of Statistical Methods for Engineers and Scientists, McGraw-Hill. Waelti, P., Dickinson, A. and Shultz, W. (2001), ‘Dopamine responses comply with basic assumptions of formal learning theory’, Nature 412, 43– 48. Wehr, M. and Laurent, G. (1999), ‘Relationship between afferent and central temporal patterns in the locust olfactory system’, 19, 381–390. Wetmore, D. and Baker, S. (2004), ‘Post-spike distance-to-threshold trajectories of neurones in monkey motor cortex’, Journal of Physiology. Wilson, M. and McNaughton, B. (1993), ‘Dynamics of the hippocampal code for space’, Science 261, 1055–1058.

