Wyss, R., Verschure, P. F. M. J., & König, P. (2003). Properties of a Temporal Population Code. Reviews in the Neurosciences, 14(1-2), 21-34.
Properties of a temporal population code

Reto Wyss*, Paul F. M. J. Verschure, Peter König
Institute of Neuroinformatics, University/ETH Zürich, Switzerland
February 5, 2003

*Corresponding author. E-mail: [email protected]; tel: +41 1 635 3072; fax: +41 1 635 3053; address: Institute of Neuroinformatics, University/ETH Zürich, Winterthurerstr. 190, CH-8057 Zürich, Switzerland
Keywords: temporal coding, visual system, pattern recognition, invariant representation
Manuscript information: 33 pages 177 words in the abstract 7220 words in the total manuscript 1 table, 8 figures
Abstract

The temporal patterning of neuronal activity may play a substantial role in the representation of sensory stimuli. One particular hypothesis suggests that visual stimuli are represented by the temporal evolution of the instantaneous firing rate averaged over a whole population of neurons. Using an implementation in a cortical-type network with lateral interactions, we previously showed that this scheme can be successfully applied to a pattern recognition task. Here, we use a large set of artificially generated stimuli to investigate the coding properties of the network in detail. The temporal population code generated by the network is intrinsically invariant to stimulus translations. We show that the encoding is invariant to small deformations of the stimuli and robust with respect to static and dynamic variations in the synaptic strength of the lateral connections in the network. Furthermore, we present several measures which indicate that the encoding maps the stimuli into a high-dimensional space. These results show that a temporal population code is a promising approach for encoding relevant stimulus properties while simultaneously discarding irrelevant information.
1 Introduction
The importance of the temporal dynamics of neuronal activity in representing visual stimuli has gained increasing attention. Several experimental studies indicate that the temporal structure of neuronal discharges participates in coding visual information. In a series of studies [Richmond and Optican, 1987, Richmond and Optican, 1990] the responses of single cells in different cortical areas to a complete basis set of visual patterns are investigated. The neuronal discharge patterns are decomposed using a principal component analysis. Both the first component, which corresponds to the response strength, and higher order components, which correspond to different temporal activity patterns, are significantly related to the visual stimuli. Thus, the stimuli drive both the strength and the temporal pattern of the activity of single neurons. In contrast, other experimental studies suggest that synchronization of oscillatory activity of many neurons might play a crucial role in feature binding and scene segmentation [Singer, 1999a, Gray et al., 1989, Gail et al., 2000]. In this
view, the average firing rates represent local stimulus features. The correlation structure on a millisecond time-scale provides information on the grouping of these features into representations of whole objects. In this framework, the presence or absence of correlations supplies a binary signal on the global interpretation of a stimulus. An alternative view on temporal coding is put forward in a theoretical study by Buonomano and Merzenich [Buonomano and Merzenich, 1999]. They simulate a network of orientation-selective feature detectors receiving strong feed-forward inhibition, and apply this approach to pattern recognition. The latencies between stimulus onset and the first spike of the neurons in the network constitute an encoding of the presented stimuli. This representation is naturally invariant to translations. Building on this approach, in a previous study we present a model of a cortical network which encodes visual stimuli in a temporal population code [Wyss et al., 2003]. It places an emphasis on an adequate incorporation of known anatomical properties of cortex. By virtue of the lateral connections in the network, the topology of visual stimuli gives rise to specific temporal activity patterns. Thus, the visual stimuli are represented by the temporal evolution of the instantaneous firing rate averaged over a whole population of neurons. Due to the symmetries of the lateral connectivity in the network (e.g. homogeneity), the temporal pattern of the population activity inherits the corresponding invariances (e.g. translation invariance). Furthermore, we demonstrate that the stimulus encoding is invariant to small deformations and preserves an intuitive notion of visual similarity. In a widely used benchmark (MNIST [LeCun et al., 1998]), a database containing many handwritten samples of the 10 digits, it achieves nearly 95% correct classifications.
Thus, the encoding process performed by the network discards part of the information for the generation of invariant representations, while preserving the relevant information for the classification of stimuli. Here, in order to have access to hundreds of stimulus classes, we generate a set of synthetic stimuli. These are used to investigate the scaling performance of the network, as well as the robustness and dimensionality of the sensory representation.
2 Methods

2.1 Network
Our model of primary visual cortex (V1) consists of a retinotopic map of laterally interacting columns of spiking neurons. Each column is approximated by a single leaky integrate-and-fire unit with graded output. The time course of its membrane voltage $V(t)$ is described by the differential equation:

$$C_m \frac{dV}{dt} = -\bigl(I_{exc}(t) + I_K(t) + I_{leak}(t)\bigr)$$
where $C_m$ is the membrane capacitance ($C_m = 0.2$ nF), and $I$ represents the transmembrane currents, i.e. excitatory input ($I_{exc}$), spike-triggered potassium current ($I_K$) and leak current ($I_{leak}$). These currents are computed by multiplying a conductance $g$ with the driving force: $I(t) = g(t)\bigl(V(t) - V^{rev}\bigr)$, where $V^{rev}$ is the reversal potential of the conductance ($V^{rev}_{exc} = 60$ mV, $V^{rev}_{K} = -90$ mV, $V^{rev}_{leak} = -70$ mV). The column's activity at time $t$ is given by

$$A(t) = a \cdot H\bigl(V(t) - \theta\bigr)$$

where $a \in [0, 1]$ is the activation of the column as determined by the applied input stimulus and the column's orientation and spatial frequency selectivity, $H$ is the Heaviside function and $\theta$ is the firing threshold ($\theta = -55$ mV). Each time a column emits a spike, the potential is reset to $V_{rest} = V^{rev}_{leak}$. The constant leak conductance is $g_{leak} = 20$ nS. The time course of the potassium conductance is given by

$$\tau_K \frac{dg_K}{dt} = -\bigl(g_K(t) - g_K^{peak}\hat{A}(t)\bigr)$$

where $\hat{A}(t) = H\bigl(V(t) - \theta\bigr)$, with a time constant $\tau_K = 40$ ms and a peak conductance $g_K^{peak} = 200$ nS. The excitatory input to
a column consists of two components. First, a constant driving conductance of 5 nS which, in conjunction with the time course of the membrane potential as given above, yields a firing rate of approximately 42 Hz after frequency adaptation. Second, the synaptic conductances of the lateral connections between different columns. Each column is characterized by its orientation and spatial frequency selectivity. Thus the columns can be parameterized by a triplet $(\vec{x}, \varphi, \nu)$, where $\vec{x} \in [0, 1]^2$ is a two-dimensional vector specifying the center of the column's receptive field within the visual space, $\varphi \in \{0^\circ, 45^\circ, 90^\circ, 135^\circ\}$
is the column's preferred orientation and $\nu \in \{\text{high}, \text{medium}, \text{low}\}$ its preferred spatial frequency. The total number of columns is 8400, with an equal number selective for each of the four orientations. The ratio of columns selective for the three different spatial frequencies is given by high : medium : low = 16 : 4 : 1, i.e. per orientation 1600, 400 and 100 columns, respectively. The columns selective for a particular spatial frequency are arranged in a regular grid spanning the $[0, 1]^2$ plane, where in turn four columns representing the different orientations share the same receptive field center. The input to this network consists of a retinal grey-scale image of $80 \times 80$ pixels corresponding to the $[0, 1]^2$ plane. This image passes through an edge detection stage by convolving it with a difference of Gaussians (DOG) kernel $k_{ij}$ given by

$$k_{ij} = e^{-16 r^2} - \tfrac{1}{4} e^{-4 r^2} \quad \text{with} \quad r = \frac{\sqrt{i^2 + j^2}}{3} \quad \text{for } i, j \in \{-3, \dots, 3\}.$$
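As a concrete illustration, the edge detection stage can be sketched in a few lines of NumPy. The sketch below builds the 7×7 DOG kernel from the formula above and applies a same-size convolution; the function names are ours, not the authors', and the edge-padding of the image border is an assumption:

```python
import numpy as np

# Sketch of the edge-detection stage: a 7x7 difference-of-Gaussians kernel
# k_ij = exp(-16 r^2) - (1/4) exp(-4 r^2), r = sqrt(i^2 + j^2)/3,
# for i, j in {-3, ..., 3}, as given in the text.

def dog_kernel(half_width=3):
    """Build the DOG kernel on the integer grid {-half_width, ..., half_width}^2."""
    idx = np.arange(-half_width, half_width + 1)
    i, j = np.meshgrid(idx, idx, indexing="ij")
    r2 = (i ** 2 + j ** 2) / 9.0  # r^2 with r = sqrt(i^2 + j^2) / 3
    return np.exp(-16.0 * r2) - 0.25 * np.exp(-4.0 * r2)

def lgn_response(image, kernel):
    """Convolve and crop to the original image size ('same'-size output)."""
    h, w = kernel.shape
    padded = np.pad(image, ((h // 2, h // 2), (w // 2, w // 2)), mode="edge")
    out = np.zeros(image.shape, dtype=float)
    for di in range(h):  # the kernel is symmetric, so correlation == convolution
        for dj in range(w):
            out += kernel[di, dj] * padded[di:di + image.shape[0],
                                           dj:dj + image.shape[1]]
    return out
```

A uniform region yields a constant response proportional to the kernel sum, while luminance edges produce strong spatially varying responses, mimicking the contour-like LGN activity that serves as input to V1.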
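The single-column membrane dynamics from the beginning of this section can likewise be sketched numerically. The following is our own minimal reimplementation, not the authors' code: the forward-Euler stepping and the one-time-step pulse for $\hat{A}(t)$ at each spike are our simplifications, so it illustrates the equations rather than guaranteeing the exact 42 Hz adapted rate quoted in the text:

```python
# Minimal forward-Euler sketch of one column:
#   C_m dV/dt = -(I_exc + I_K + I_leak),  with each current I = g (V - V_rev),
# a spike-triggered potassium conductance
#   tau_K dg_K/dt = -(g_K - g_K_peak * A_hat),  A_hat = H(V - theta),
# and a reset to the leak reversal potential on spiking.

CM = 0.2e-9                                  # membrane capacitance [F]
V_EXC, V_K, V_LEAK = 60e-3, -90e-3, -70e-3   # reversal potentials [V]
THETA = -55e-3                               # firing threshold [V]
G_LEAK = 20e-9                               # leak conductance [S]
G_K_PEAK = 200e-9                            # peak potassium conductance [S]
TAU_K = 40e-3                                # potassium time constant [s]

def simulate_column(g_exc=5e-9, t_max=1.0, dt=1e-4):
    """Simulate one column under constant excitatory drive; return spike times [s]."""
    v, g_k, spikes = V_LEAK, 0.0, []
    for step in range(int(t_max / dt)):
        spiking = v >= THETA
        if spiking:
            spikes.append(step * dt)
            v = V_LEAK  # reset to the resting potential V_rest = V_leak_rev
        # spike-triggered potassium conductance (frequency adaptation)
        g_k += dt / TAU_K * (G_K_PEAK * (1.0 if spiking else 0.0) - g_k)
        # total transmembrane current, each term of the form g * (V - V_rev)
        i_total = (g_exc * (v - V_EXC)
                   + g_k * (v - V_K)
                   + G_LEAK * (v - V_LEAK))
        v += dt / CM * (-i_total)
    return spikes
```

With the constant 5 nS driving conductance the column fires tonically, and the accumulating potassium conductance lengthens successive inter-spike intervals, which is the frequency adaptation referred to in the text.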
The resulting contour is cropped to the original size of the image and represents the activity in the LGN, which serves as the input to V1 (figure 1). Given the position $\vec{y} \in [0, 1]^2$ of the thalamic neurons within the visual space, the activation of a V1 column $a_{(\vec{x},\varphi,\nu)}$ is defined by the absolute value of a complex sum

$$a_{(\vec{x},\varphi,\nu)} = \Bigl|\sum \nu \, \|\vec{y} - \vec{x}\|$$