Network: Comput. Neural Syst. 14 (2003) 303–319
Optimal neural rate coding leads to bimodal firing rate distributions

M Bethge, D Rotermund and K Pawelzik

Institute of Theoretical Physics, University of Bremen, Otto-Hahn-Allee, D-28334 Bremen, Germany

E-mail:
[email protected]
Received 23 July 2002, in final form 4 February 2003
Published 8 April 2003
Online at stacks.iop.org/Network/14/303

Abstract

Many experimental studies concerning the neuronal code are based on graded responses of neurons, given by the emitted number of spikes measured in a certain time window. Correspondingly, a large body of neural network theory deals with analogue neuron models and discusses their potential use for computation or function approximation. All physical signals, however, are of limited precision, and neuronal firing rates in cortex are relatively low. Here, we investigate the relevance of analogue signal processing with spikes in terms of optimal stimulus reconstruction and information theory. In particular, we derive optimal tuning functions taking the biological constraint of limited firing rates into account. It turns out that, depending on the available decoding time T, optimal encoding undergoes a phase transition from discrete binary coding for small T towards analogue or quasi-analogue encoding for large T. The corresponding firing rate distributions are bimodal for all relevant T, in particular in the case of population coding.

(Some figures in this article are in colour only in the electronic version.)
1. Introduction

Since the discovery by Adrian (1926) that action potentials are generated by sensory neurons with a frequency that is substantially determined by the stimulus, the idea of rate coding has become the prevalent paradigm in neuroscience (Perkel and Bullock 1968). In particular, today the coding properties of many neurons from various areas in the cortex have been characterized by tuning curves, which describe the average firing rate response as a function of certain stimulus parameters. Remarkably, almost all tuning functions measured in the mammalian cortex have a smooth bell-shaped form, which suggests an analogue neural code. On the other hand, it is obvious that the maximum number of spikes kmax that can be taken into account by subsequent neurons is limited by their integration time, so that a rate code actually constitutes a discrete code with kmax + 1 different symbols.
Figure 1. The presynaptic neuron on the left computes some analogue value x from its synaptic inputs. In order to signal this quantity to another postsynaptic neuron, spikes are generated with a firing rate f(x) and propagated to the postsynaptic neuron. The postsynaptic neuron integrates over all incoming spikes within a time window of length T. The resulting spike count k then serves as the basis for any computation of the postsynaptic neuron for which an estimate x̂ of x is required.
Therefore, the precision with which an analogue signal can be transmitted to other neurons in a rate code is limited, because the mutual information I between the received number of spikes k and the underlying firing rate f cannot be larger than the logarithm of the number of symbols (i.e. I ≤ log₂(kmax + 1); Cover and Thomas 1991). Another measure for the coding accuracy that is particularly well suited to the issue of analogue signalling is given by the minimum mean squared error (MMSE) with which the original signal can be reconstructed from the received signal. Similarly to the mutual information, the MMSE χ² is bounded in terms of kmax, too. For example, for any uniformly distributed signal with variance v it holds that χ² ≥ v/(kmax + 1)², where the right-hand side corresponds to the mean squared error of the uniform quantizer (Gersho and Gray 1992). For both measures, the bounds can only be attained in the case of noiseless signal transmission.

Here we study the question of optimal neural rate coding in cases where the rate signal is subjected to Poisson noise. In the presence of noise, the bounds on mutual information and the MMSE still hold true if kmax is replaced by the maximum mean spike count µmax. The high degree of irregularity in spike timing of cortical cells in vivo implies that the effective integration time of cortical neurons is small, such that for the generation of a postsynaptic action potential essentially only the last spike emitted by each presynaptic neuron is relevant (Softky and Koch 1993). This fact suggests that small values of µmax (say 1 ≤ µmax ≤ 5) are particularly relevant. Taken together, the biophysical properties of cortical neurons impose strong constraints on rate signalling, in particular short integration times, limited maximum firing rates, energy constraints, anatomical constraints etc (in Bethge et al (2002) we investigated the differential effects of various constraints on the shape of optimal tuning functions). Here, we analyse properties of optimal encoding under the first two constraints listed above with respect to the MMSE.

The motivating picture we have in mind refers to the communication process from one neuron to another, where we assume that the presynaptic neuron has computed an analogue number x from its inputs and is now faced with the problem of signalling it over some distance along its axon to other neurons by the use of spikes (figure 1). In other words, x is assumed to represent exactly the 'relevant information' encoded by the neuron, which need not match the stimulus parameters typically investigated experimentally. For this situation we seek to determine optimal tuning functions such that the MMSE with which x can be inferred by a postsynaptic neuron is minimized. Note that a tuning function with respect to x constitutes a neuronal response function very similar to an f–I curve as known from experimental studies. The theoretical analysis presented in this paper, however, does not rely on assumptions about particular physical signals corresponding to x.

With respect to the issue of optimal rate signalling between neurons, as motivated above, the MMSE appears to be a well suited objective function and it is rather natural to consider each neuron individually. Nevertheless, in the literature several papers are devoted to the issue
Figure 2. All neurons of the presynaptic population on the left together compute some analogue value x from their synaptic inputs. In order to signal this quantity to another postsynaptic population, all neurons generate spikes with a firing rate f(x) that are propagated to the postsynaptic population. The postsynaptic neurons integrate over all incoming spikes within a time window of length T. The resulting total spike count k then serves as the basis for any computation of the postsynaptic neurons for which an estimate x̂ is required.
of population coding with respect to the mean squared error loss, so we will take this case into account, too. In particular, this study might serve to take a different view on neuronal populations as they are defined in many large-scale neural network models. The latter often consist of a homogeneous population, for which the responses of all neurons depend identically on the external input. In this highly redundant case the signal has to be encoded in graded differences of the population rate, which has been called 'intensity coding' or 'rate-gradient coding' (figure 2). It is, however, also possible to encode the signal in different spatial patterns rather than in the graded total activity, which is often termed 'place coding' or 'labelled-line coding'. In its most characteristic form, each neuron then responds only in a binary manner, contributing at most one bit to the whole signal representation. According to Snippe (1996) any population code can be considered as a combination of these two extreme encoding strategies. In previous studies optimal population codes have been derived on the basis of Fisher information only, which generally cannot account for discrete encodings and is meaningful merely as an asymptotic quantity (Bethge et al 2002). By comparison of some selected coding strategies, we will show strong evidence that optimal population codes have binary tuning functions for all relevant µmax, so that rate-gradient coding appears to be of minor relevance.

The bottom line of this paper is to ask to what extent the interpretation of rate coding as an analogue code actually makes sense. To this end, we make use of the very basic constraint of a limited decoding time only, while additional constraints, e.g. energy consumption, would imply a further restriction on the possibility of analogue rate coding. After a brief introduction of the main methods in section 2, we analyse optimal encoding strategies of a single neuron in section 3 and finally we investigate the case of population coding in section 4.

2. Methods

The issue is to optimally encode a real number x in the number of pulses emitted by a neuron within a certain time window. Thereby, x stands for the intended analogue output of the neuron that shall be signalled to subsequent neurons. We assume, however, that the neuronal output
actually read out by subsequent neurons is given by the number of spikes k integrated within a time interval of length T. The statistical dependence between x and k is specified by the assumption of Poisson noise

    p(k|\mu(x)) = \frac{(\mu(x))^k}{k!} \exp\{-\mu(x)\},    (1)

and the choice of the tuning function f(x), which together with T determines the mean spike count µ(x) = T f(x). An important additional constraint is the limited range of the neuronal firing rate, which can be included by the requirement of a bounded tuning function (fmin ≤ f(x) ≤ fmax, ∀x). Since inhibition can reliably prevent a neuron from firing, we will consider the case fmin = 0 most of the time. Instead of specifying fmax it makes sense to impose a bound on the mean spike count directly (i.e. µ(x) ≤ µmax), because fmax constitutes a meaningful constraint only with respect to a fixed time window of length T. Since µmax has a crucial effect on the signal-to-noise ratio, we will analyse the coding properties as a function of µmax. As objective function we consider the MMSE¹ with respect to x ∈ [0, 1],

    \chi^2[\mu(x)] = E[x^2] - E[\hat{x}^2] = \frac{1}{3} - \sum_{k=0}^{\infty} \frac{\left( \int_0^1 x\, p(k|\mu(x))\, dx \right)^2}{\int_0^1 p(k|\mu(x))\, dx},    (2)
where x̂(k) = E[x|k] denotes the mean square estimator, which is the conditional expectation (see e.g. Lehmann and Casella 1999).

In the second part of the paper, we consider the case where N > 1 neurons together encode the same quantity x. Then we have to deal with N tuning functions {f_j(x)}, j = 1, ..., N, and N spike counts k_j, which we put together in a single spike count vector k = (k_1, ..., k_N). Provided the noise between different neurons is statistically independent, we then have

    p(\mathbf{k}|x) = \prod_{j=1}^{N} p(k_j|\mu_j(x)) = \prod_{j=1}^{N} \frac{(\mu_j(x))^{k_j}}{k_j!} \exp\{-\mu_j(x)\}    (3)

and the objective function becomes

    \chi^2[\mu_1(x), \ldots, \mu_N(x)] = E[x^2] - E[\hat{x}^2]
        = \frac{1}{3} - \sum_{k_1=0}^{\infty} \cdots \sum_{k_N=0}^{\infty} \frac{\left( \int_0^1 x\, p(\mathbf{k}|\mu_1(x), \ldots, \mu_N(x))\, dx \right)^2}{\int_0^1 p(\mathbf{k}|\mu_1(x), \ldots, \mu_N(x))\, dx}.    (4)
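Equation (2) is straightforward to evaluate numerically for any candidate tuning function. The following sketch is our own illustration (not part of the original paper): it discretizes the integral over x on a midpoint grid and truncates the sum over spike counts; the function name mmse and all numerical parameters are our choices.

```python
import numpy as np
from scipy.stats import poisson

def mmse(mu, n_x=2000, k_max=200):
    """Approximate the MMSE of equation (2) for a mean-spike-count profile
    mu(x), x in [0, 1], under Poisson noise."""
    x = (np.arange(n_x) + 0.5) / n_x           # midpoint grid on [0, 1]
    m = mu(x)                                  # mean spike counts mu(x) = T f(x)
    k = np.arange(k_max + 1)[:, None]          # spike counts 0 .. k_max
    p = poisson.pmf(k, m[None, :])             # p(k | mu(x)) for every (k, x)
    num = (x[None, :] * p).mean(axis=1) ** 2   # (integral of x p(k|mu(x)) dx)^2
    den = p.mean(axis=1)                       # integral of p(k|mu(x)) dx
    ok = den > 0
    return 1.0 / 3.0 - np.sum(num[ok] / den[ok])

# A constant rate carries no information, so the MMSE equals the prior
# variance 1/12; a high-rate step code approaches the quantizer bound 1/48.
print(mmse(lambda x: np.full_like(x, 3.0)))    # ~ 1/12
print(mmse(lambda x: 50.0 * (x > 0.5)))        # ~ 1/48
```

The population objective of equation (4) can be approximated in the same way by summing over spike-count vectors, at correspondingly higher cost.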
3. Optimal encoding with a single neuron

As derived in Bethge et al (2002) on the basis of Fisher information, the optimal tuning function for a single neuron in the asymptotic limit T → ∞ has a parabolic shape (figure 3, left):

    f_{\rm asymp}(x) = \left( \left( \sqrt{f_{\max}} - \sqrt{f_{\min}} \right) x + \sqrt{f_{\min}} \right)^2.    (5)

For any finite µmax, however, this tuning function is not necessarily optimal. In particular, in the limit µmax → 0, the Poisson distribution converges to a Bernoulli distribution with p(k|µ) = µ^k (1 − µ)^(1−k) for k ∈ {0, 1}, and for the latter it is straightforward to show that the optimal tuning curve is a step function

    f_{\rm binary}(x) = f_{\min} + (f_{\max} - f_{\min})\, \Theta(x - \vartheta_{f_{\min}}(\mu_{\max})),    (6)

¹ For a discussion of this choice see Bethge et al (2002).
Figure 3. While the parabolic tuning function f_asymp (left) is asymptotically optimal in the limit µmax → ∞, the step function f_binary (right) is advantageous for small µmax. The optimal threshold of the step function moves from 2/3 to 1/2 with increasing µmax.
Figure 4. Comparison of the MMSE for the parabolic tuning function and for the step function. The χ²-axis has a logarithmic scale. In the relevant region 1 ≤ µmax ≤ 5 the step function f_binary is clearly advantageous.
where Θ(z) is the Heaviside function, that equals one if z > 0 and zero if z < 0 (figure 3, right). In the case of fmin = 0 the optimal threshold ϑ_fmin(µmax) ∈ [1/2, 2/3] as a function of µmax can be determined analytically²,

    \vartheta_0(\mu_{\max}) = 1 - \frac{3 - \sqrt{8 e^{-\mu_{\max}} + 1}}{4 (1 - e^{-\mu_{\max}})},    (7)

as well as the corresponding MMSE (see appendix):

    \chi^2[f_{\rm binary}] = \frac{1}{12} \left( 1 - \frac{3 \vartheta_0^2(\mu_{\max})}{[(1 - \vartheta_0(\mu_{\max}))(1 - e^{-\mu_{\max}})]^{-1} - 1} \right).    (8)
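For illustration (ours, not from the paper), the closed forms (7) and (8) can be evaluated directly; the function names are our own. The limiting values reproduce the behaviour quoted above: ϑ0 → 2/3 for µmax → 0, and ϑ0 → 1/2 with χ² → 1/48 for µmax → ∞.

```python
import numpy as np

def theta0(mu_max):
    """Optimal threshold of the binary tuning curve, equation (7)."""
    e = np.exp(-mu_max)
    return 1.0 - (3.0 - np.sqrt(8.0 * e + 1.0)) / (4.0 * (1.0 - e))

def chi2_binary(mu_max):
    """MMSE of the optimal step tuning function, equation (8)."""
    th = theta0(mu_max)
    denom = 1.0 / ((1.0 - th) * (1.0 - np.exp(-mu_max))) - 1.0
    return (1.0 - 3.0 * th ** 2 / denom) / 12.0

print(theta0(1e-6), theta0(50.0))        # ~ 2/3 and ~ 1/2
print(chi2_binary(50.0), 1.0 / 48.0)     # both ~ 0.0208
```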
The MMSE of the asymptotically optimal tuning function is given by

    \chi^2[f_{\rm asymp}] = \frac{1}{3} - \frac{1}{2 (\sqrt{\mu_{\max}})^3} \sum_{k=0}^{\infty} \frac{\Gamma^2_{0,\mu_{\max}}(k+1)}{k!\, \Gamma_{0,\mu_{\max}}(k + \frac{1}{2})},    (9)

where Γ_{r,s} denotes the truncated gamma function

    \Gamma_{r,s}(k) = \int_r^s t^{k-1} e^{-t}\, dt.    (10)
A comparison of χ²[f_asymp] with χ²[f_binary] shows that the step function in fact leads to a smaller average reconstruction error than the parabolic tuning function (see figure 4) if µmax < 8.2.

² Equations (7) and (8) refer to Poisson noise.
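The following sketch (ours; the function names and the spike-count truncation are our choices) evaluates equation (9) with SciPy's regularized incomplete gamma function and compares it with chi2_binary from the previous sketch; the crossover should appear near µmax ≈ 8.2, in line with figure 4.

```python
import numpy as np
from scipy.special import gammainc, gamma

def trunc_gamma(a, lo, hi):
    """Gamma_{lo,hi}(a) = integral_lo^hi t^(a-1) e^(-t) dt, equation (10)."""
    return (gammainc(a, hi) - gammainc(a, lo)) * gamma(a)

def chi2_parabolic(mu_max, k_max=80):
    """MMSE of the parabolic tuning function, equation (9); k_max = 80 is
    ample for the range of mu_max considered here."""
    k = np.arange(k_max + 1)
    terms = (trunc_gamma(k + 1.0, 0.0, mu_max) ** 2
             / (gamma(k + 1.0) * trunc_gamma(k + 0.5, 0.0, mu_max)))
    return 1.0 / 3.0 - terms.sum() / (2.0 * mu_max ** 1.5)

# Locate the crossover with the binary code (chi2_binary is defined in the
# sketch after equation (8)).
grid = np.linspace(1.0, 15.0, 281)
diff = np.array([chi2_binary(m) - chi2_parabolic(m) for m in grid])
print(grid[np.argmax(diff > 0)])               # expected near 8.2
```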
Figure 5. Bifurcation diagram with logarithmic µmax-axis that shows the parameters a1, ..., a4 of the optimal tuning function within the class S5. A clear phase transition from the binary step function to a staircase function that uses all available quantization levels occurs at µmax ≈ 3. Up to the phase transition the graphs of a1, ..., a4 are in precise agreement with equation (7).
At this point the question arises of whether other tuning functions would outperform f_binary and f_asymp for 0 < µmax < ∞. Since it is not possible to compare the step function with all other tuning functions³, we have to select an appropriate function class F such that it becomes unlikely to find a better tuning function than the step function if no such function exists in F. Since rate distortion theory tells us that optimal source encoding is discrete (Rose 1994), we first studied the following classes Sλ of piecewise constant staircase functions with λ ≥ 2 quantization levels:
    S_\lambda \equiv \Big\{ f^{\lambda\text{-stair}}_{a_1,\ldots,a_{\lambda-1},b_1,\ldots,b_{\lambda-2}}(x) : f^{\lambda\text{-stair}}_{a_1,\ldots,a_{\lambda-1},b_1,\ldots,b_{\lambda-2}}(x) = b_0 + \sum_{l=1}^{\lambda-1} (b_l - b_{l-1})\, \Theta(x - a_l),
        \ a_1, \ldots, a_{\lambda-1} \in [0, 1] \ \text{and} \ b_1, \ldots, b_{\lambda-2} \in [f_{\min}, f_{\max}] \Big\},    (11)

where b0 = fmin and bλ−1 = fmax. All these classes together build up a hierarchy of genuine subsets,

    S_2 \subset S_3 \subset S_4 \subset \cdots,    (12)

and contain the optimal binary step functions given by equation (6) as a special case. Note that for λ = 2 the general notation above might be misleading, because then b1 no longer constitutes a free parameter, so that we have f^{2-stair}_{a1}(x) = b0 + (b1 − b0) Θ(x − a1). The MMSE for all of these tuning functions reads

    \chi^2[f^{\lambda\text{-const}}_{a_1,\ldots,a_{\lambda-1},b_1,\ldots,b_{\lambda-2}}] = \frac{1}{3} - \frac{1}{4} \sum_{k=0}^{\infty} \frac{1}{k!} \frac{\left[ \sum_{i=1}^{\lambda} (a_i^2 - a_{i-1}^2)\, b_{i-1}^k\, e^{-b_{i-1}} \right]^2}{\sum_{j=1}^{\lambda} (a_j - a_{j-1})\, b_{j-1}^k\, e^{-b_{j-1}}},    (13)

where a0 = 0 and aλ = 1 are the left- and right-hand boundaries of the interval, respectively. In the case of fmin = 0 and λ = 3, 4, 5 we evaluated the optimal parameters a1, ..., aλ−1, b1, ..., bλ−2 as a function of µmax, finding a phase transition at µ^c_max ≈ 2.95 (figure 5). For µmax < µ^c_max the optimal tuning function within S5 is equal to the optimal step function defined by equations (6) and (7). For µmax > µ^c_max, however, it makes use of all available quantization levels.

³ A comparison with all possible tuning functions would require us to solve the optimization problem analytically, which appears to be difficult, if not impossible, because e.g. a variational approach at some step requires us to solve a certain functional equation F[µ(x)] = 0, which seems to be not feasible. However, we were able to prove the existence of a phase transition analytically for a restricted parametrization (Bethge et al 2003).
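For reference, equation (13) can be evaluated directly; the sketch below is ours (names and the spike-count truncation are our choices). Minimizing it over the breakpoints and levels for a grid of µmax values with a generic optimizer is one way to reproduce a bifurcation diagram like figure 5, although the paper does not state which optimization procedure was used.

```python
import numpy as np
from scipy.special import factorial

def chi2_staircase(a, b, k_max=100):
    """MMSE of a staircase code, equation (13). `a` holds the interior
    breakpoints a_1 < ... < a_{lambda-1}; `b` holds the mean spike counts
    b_0, ..., b_{lambda-1} on the successive intervals (here b_0 = 0 and
    b_{lambda-1} = mu_max)."""
    a = np.concatenate(([0.0], np.asarray(a, float), [1.0]))   # a_0 = 0, a_lambda = 1
    b = np.asarray(b, float)
    k = np.arange(k_max + 1)[:, None]
    pk = b[None, :] ** k * np.exp(-b[None, :]) / factorial(k)  # b_{i-1}^k e^{-b_{i-1}} / k!
    num = ((a[1:] ** 2 - a[:-1] ** 2)[None, :] * pk).sum(axis=1) ** 2
    den = ((a[1:] - a[:-1])[None, :] * pk).sum(axis=1)
    ok = den > 0
    return 1.0 / 3.0 - 0.25 * np.sum(num[ok] / den[ok])

# The binary step function is the special case lambda = 2; with the optimal
# threshold of equation (7) this matches chi2_binary (theta0 and chi2_binary
# are from the sketch after equation (8)).
mu = 2.0
print(chi2_staircase([theta0(mu)], [0.0, mu]), chi2_binary(mu))
```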
Figure 6. Bifurcation diagram with logarithmic µmax-axis that shows the parameters a1, ..., a3 of the optimal tuning function within the class L3. A clear phase transition from the binary step function to a piecewise linear function occurs at µmax ≈ 3. Up to the phase transition the graphs of a1, ..., a3 are in precise agreement with equation (7).
In order to check whether the binary coding for µmax < µ^c_max is generically optimal or whether this is rather due to the specific parametrization of S5, we also considered another function class Lλ that consists of piecewise linear tuning functions:

    f^{\lambda\text{-linear}}_{a_1,\ldots,a_\lambda,b_2,\ldots,b_{\lambda-1}}(x) =
    \begin{cases}
      b_1, & 0 < x < a_1 \\
      b_1 + (b_2 - b_1)\,\dfrac{x - a_1}{a_2 - a_1}, & a_1 < x < a_2 \\
      b_2 + (b_3 - b_2)\,\dfrac{x - a_2}{a_3 - a_2}, & a_2 < x < a_3 \\
      \quad\vdots & \quad\vdots \\
      b_{\lambda-1} + (b_\lambda - b_{\lambda-1})\,\dfrac{x - a_{\lambda-1}}{a_\lambda - a_{\lambda-1}}, & a_{\lambda-1} < x < a_\lambda \\
      b_\lambda, & a_\lambda < x < 1,
    \end{cases}    (14)

where b1 = fmin and bλ = fmax and Sλ ⊂ Lλ+1 holds. We determined the optimal tuning function within L3, for which the MMSE is given by

    \chi^2[f^{\lambda\text{-linear}}_{a_1,\ldots,a_\lambda,b_2,\ldots,b_{\lambda-1}}] = \frac{1}{3} - \sum_{k=0}^{\infty} \frac{A^2(k)}{B(k)},    (15)

where

    A(k) = \frac{a_1^2}{2}\,\delta_{k,0} + A_{a_1,a_2,0,b_2} + A_{a_2,a_3,b_2,\mu_{\max}} + \frac{(1 - a_3^2)}{2}\,\frac{\mu_{\max}^k e^{-\mu_{\max}}}{k!},    (16)

    B(k) = a_1\,\delta_{k,0} + B_{a_1,a_2,0,b_2} + B_{a_2,a_3,b_2,\mu_{\max}} + (1 - a_3)\,\frac{\mu_{\max}^k e^{-\mu_{\max}}}{k!},    (17)

    A_{\alpha,\beta,\gamma,\zeta} = B_{\alpha,\beta,\gamma,\zeta}\left( \alpha - \frac{\gamma (\beta - \alpha)}{\zeta - \gamma} \right) + \frac{(\beta - \alpha)^2}{k!\,(\zeta - \gamma)^2}\,\Gamma_{\gamma,\zeta}(k + 2),    (18)

    B_{\alpha,\beta,\gamma,\zeta} = \frac{1}{k!}\,\frac{(\beta - \alpha)}{(\zeta - \gamma)}\,\Gamma_{\gamma,\zeta}(k + 1).    (19)
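Again purely as an illustration (ours; the names, the spike-count truncation and the assumption 0 < a1 < a2 < a3 < 1 with 0 < b2 < µmax are our choices), equations (15)–(19) can be evaluated with the truncated gamma function of equation (10):

```python
import numpy as np
from scipy.special import gammainc, gamma, factorial

def trunc_gamma(a, lo, hi):
    """Gamma_{lo,hi}(a) = integral_lo^hi t^(a-1) e^(-t) dt, equation (10)."""
    return (gammainc(a, hi) - gammainc(a, lo)) * gamma(a)

def chi2_piecewise_linear3(a1, a2, a3, b2, mu_max, k_max=80):
    """MMSE of the three-piece linear code of equation (14) with f_min = 0,
    evaluated via equations (15)-(19)."""
    k = np.arange(k_max + 1)
    def B(al, be, ga, ze):                                     # equation (19)
        return (be - al) / (ze - ga) * trunc_gamma(k + 1.0, ga, ze) / factorial(k)
    def A(al, be, ga, ze):                                     # equation (18)
        return (B(al, be, ga, ze) * (al - ga * (be - al) / (ze - ga))
                + (be - al) ** 2 / (factorial(k) * (ze - ga) ** 2) * trunc_gamma(k + 2.0, ga, ze))
    pk = mu_max ** k * np.exp(-mu_max) / factorial(k)          # p(k | mu_max)
    d0 = (k == 0).astype(float)                                # Kronecker delta
    Ak = 0.5 * a1 ** 2 * d0 + A(a1, a2, 0.0, b2) + A(a2, a3, b2, mu_max) + 0.5 * (1 - a3 ** 2) * pk
    Bk = a1 * d0 + B(a1, a2, 0.0, b2) + B(a2, a3, b2, mu_max) + (1 - a3) * pk
    return 1.0 / 3.0 - np.sum(Ak ** 2 / Bk)

# Example call with arbitrary (hypothetical) parameters:
# chi2_piecewise_linear3(0.3, 0.5, 0.7, b2=1.5, mu_max=3.0)
```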
The corresponding bifurcation diagram for the optimal parameters as a function of µmax is shown in figure 6, which again exhibits a phase transition at µ^c_max ≈ 3. Finally, we consider the function class R2, which has only two free parameters α ≤ β ∈ [0, 1] and contains S2 as well as the asymptotically optimal parabolic function as special cases.
Figure 7. Bifurcation diagram with logarithmic µmax-axis that shows the parameters α and β of the optimal tuning function within the class R2. A clear phase transition from the binary step function to a parabolic ramp function occurs at µmax ≈ 3. After the phase transition the width of the parabolic region increases steadily. The continuation of the graph of the optimal threshold ϑ0 is indicated by the dashed curve.
The parametrization is

    f^{\rm ramp}_{\alpha,\beta}(x) =
    \begin{cases}
      f_{\min}, & 0 < x < \alpha \\
      (f_{\max} - f_{\min}) \left( \dfrac{x - \alpha}{\beta - \alpha} \right)^2 + f_{\min}, & \alpha < x < \beta \\
      f_{\max}, & \beta < x < 1,
    \end{cases}
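The ramp class can be evaluated numerically as well; a minimal sketch (ours), assuming fmin = 0 and α < β, and reusing the generic mmse() helper from the sketch in section 2 with µmax playing the role of T·fmax:

```python
import numpy as np

def ramp_mu(alpha, beta, mu_max):
    """Mean spike count mu(x) for the ramp code R2 with f_min = 0: zero below
    alpha, mu_max * ((x - alpha)/(beta - alpha))^2 between alpha and beta,
    and mu_max above beta (alpha < beta assumed)."""
    def mu(x):
        x = np.asarray(x, float)
        return mu_max * np.clip((x - alpha) / (beta - alpha), 0.0, 1.0) ** 2
    return mu

# Example: mmse(ramp_mu(0.3, 0.8, 5.0)); (alpha, beta) = (0, 1) recovers the
# parabolic code of equation (5), and alpha -> beta approaches the step code.
```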