Finding Independent Components using spikes: a natural result of learning in a sparse spike coding scheme

Laurent Perrinet ([email protected])
INPC-CNRS, 31, chemin Joseph Aiguier, 13402 Marseille, France

Submitted December 1, 2003

Abstract. As an alternative to the classical representations used in machine learning algorithms, we explore coding strategies using events, as is observed for spiking neurons in the central nervous system. Focusing on visual processing, we have previously shown that a sparse spike coding scheme may be defined by implementing suitable lateral interactions [Perrinet et al., 2002]. This class of algorithms is compatible with both biological constraints and neurophysiological observations, and provides a possible explanation for the speed of visual processing despite the relatively slow response time of single neurons. We explore here learning mechanisms to derive, in an unsupervised manner, an overcomplete set of filters which provides a sparse representation of the input. This work builds on that of Olshausen et al. (1998) and leads to similar results on the same architecture, suggesting that it provides a neural implementation of Blind Source Separation. However, this neuromimetic algorithm may be more easily extended to realistic architectures of cortical columns in the primary visual cortex. We show results for simple models of cortical columns but also for a strategy of higher-order representation.

Keywords: Vision, parallel asynchronous processing, wavelet transform, natural image statistics, over-complete representation, sparse spike coding, Hebbian learning, emergence.

1. Introduction: Toward dynamical models of neural coding

Based on the results of psychological experiments which showed the rapidity of categorization in primates [Thorpe et al., 1996], we studied, in collaboration with the team of Simon Thorpe in Toulouse (France), new paradigms of neural coding. These studies try to find solutions for real-time processing tasks, such as video processing, by exploring the strategies chosen in the highly parallel neural code. This exploration leads to alternatives to the paradigm of analog spike frequency coding inspired by the pioneering studies of Adrian (1928), such as temporal coding or rank-order coding [Thorpe and Gautrais, 1998]. In particular, we analyzed the performance of an artificial neural network whose neurons fire only once and where the information is carried by the relative latency of the spikes. This model proved to be very efficient at visual tasks such as pattern recognition, since it is highly parallel and exhibits intrinsic properties which mimic the performance of biological processing.


In the hierarchical architecture of the visual processing streams, we are especially concerned with the low-level processing in the primary visual areas, and we first applied these spiking neuron models to the transmission of information from the retina, along the optic nerve, to the primary visual area (V1). The retinal model used a wavelet architecture to transform the intensities captured by the photoreceptors into a multi-scale representation of contrasts, and used the relative regularity of the ranked coefficients over a database of natural images (we define natural images as the images that are sensed by the visual system under consideration and that are customary to its environment; this yields different definitions for animals, robots or imaging satellites) to transmit the coefficients' analog values solely by their rank [Van Rullen and Thorpe, 2001]. However, the constraints on the architecture present limits (orthogonality of the filters, sensitivity of the representation to small perturbations) which are incompatible with realistic models. We addressed these drawbacks by exploring over-complete representations, i.e. representations where the dictionary of filters is much larger than the dimension of the input space. In this paper, I will first present this algorithm and show how it forms a sparse spike code of the image [Perrinet et al., 2002]. A main challenge for this algorithm is to define the dictionary to be used to represent the image in an "optimal" way. In the second part, we will present how we derived from a Hebbian learning scheme a strategy which leads to a sparse spiking representation of the input. We will then show how this scheme is biologically relevant and provides a neuromimetic implementation of Blind Source Separation (BSS). In fact, BSS finds the independent components of a collection of signals [Zibulevsky and Pearlmutter, 2001], and this neuromimetic algorithm may therefore provide a hint to the reverse engineering processes involved in the brain [Thorpe et al., 2000, Thorpe and Fabre-Thorpe, 2001]. We will finally present results of experiments using natural images. When compared with the architecture of Olshausen et al. (1998), this algorithm yields similar outcomes, consistent with the neurophysiological observation of orientation-selective filters in the primary visual area [van Hateren and van der Schaaf, 1998].

2. Toward sparse spike coding

2.1. Analog to spike coding in the retina

As described in Van Rullen and Thorpe (2001), let us first define a model retina with a set of fixed neurons modeling the ganglion cells (GCs), sensitive at different spatial scales to the local contrast of the image intensity detected at the photo-receptors.


The neurons are defined by their position and scale as dilated, translated and sampled Mexican Hat (or Difference Of Gaussians, DOG) filters (see [Mallat, 1998, pp. 77]). They are placed uniformly over dyadic grids of scales, i.e. growing over both axes as powers of 2. Approximating the response of ganglion cells by a linear response [Rodieck, 1965], the dendrite of a neuron $i$ may be characterized by its weight vector $\phi_i$ over its receptive field, and the activity $C_i$ adding up at the soma of the neuron is given by the dot product:

$$C_i := \langle I, \phi_i \rangle = \sum_{\vec{l} \in R_i} I(\vec{l}) \, \phi_i(\vec{l})$$

where $I(\vec{l})$ is the output of the photoreceptors at position $\vec{l}$ (for image processing, the luminance of a pixel; the computer images used in this work were gamma-corrected and normalized between 0 and 1 with a resolution of 8 bits) and $R_i$ is the receptive field of neuron $i$, yielding a contrast $C_i$. Moreover, instead of differentiating ON and OFF cells, we will consider for simplicity (if not stated otherwise) that each neuron $i$ is assigned a polarity $p_i$ which is either $+1$ or $-1$, so that the coefficients are rectified (i.e. $|C_i| = p_i \cdot C_i$). Finally, this architecture is tuned to form an orthogonal wavelet-like transform [Mallat, 1998, pp. 77] of the image, so that the responses of different filters are uncorrelated. The analog image of luminances is therefore transformed into a stack of images of decreasing size (hence the name pyramid) representing the image at different scales. The resulting measures of multi-scale contrast drive the output of the retina, the Ganglion Cells' layer, their driving currents thus forming a wavelet representation of the input. When an image is presented at an initial time, each neuron integrates this analog information at its soma until it reaches a threshold: it then emits a spike; the more the neuron is activated, the earlier the spike is fired. This spike propagates along the axon and the neuron's activity is then shunted. The analog image is therefore transformed into a spatio-temporal pattern of spikes whose instantaneous frequency may constitute the image's code. Alternatively, the code may equivalently be carried by the exact spiking time (or latency) of the first spike. Even if this constitutes an extreme case, we will study it to show how transient information may be carried in the neural code. We will consider solely the first spike of each neuron, so that the studied code consists exactly of this latency for each of the different fibers $i$ (a latency which is inversely proportional to the neuron's excitation current, i.e. to the rectified activity). This algorithm therefore defines a coding scheme from the analog pyramid of wavelet coefficients to a spike "wave front" which will travel along the optic nerve. But how may this information be decoded back?
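As a minimal sketch of this analog-to-latency conversion (the time constant and the exact inverse mapping are illustrative assumptions, not specifications from the text):

```python
import numpy as np

def first_spike_latencies(C, tau=1.0):
    """Latency of the first spike of each neuron, taken inversely
    proportional to its rectified driving current |C_i|; tau is an
    arbitrary time constant. Strongly activated neurons fire first."""
    return tau / np.maximum(np.abs(C), 1e-12)
```

Reading the spikes in order of increasing latency then yields exactly the rank order exploited below.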


As a case study for a visual code, and even if this situation is biologically highly unrealistic, we will first study the quality of the image reconstruction from the wave front as an upper bound on the quality of information transmission. We will therefore record the spike wave generated by this model retina and, in our framework, use a property of the orthogonal wavelet transform which states that the image may be reconstructed from the coefficients' values by

$$I_{rec} \sim \sum_i C_i \, \phi_i$$

But how may the coefficients' absolute values be transmitted along the optic nerve? In fact, Van Rullen and Thorpe (2001) have shown that these values exhibit regularities across natural images when ordered from the largest to the lowest rectified value. A solution is therefore to use the mean analog value at each rank to form a Look-Up Table (LUT) and decode the analog values back from their rank alone (in practice, we have to add a hypothesis about the mean contrast of images so that the LUT is sufficiently regular; we will therefore normalize images to their mean absolute contrast value in the following). This regularity is due to the regular distribution in natural images of edges of different orders (i.e. dots, lines, ramps, gradients; mathematically, of different Lipschitz coefficients) which are extracted and ordered by this wavelet transform [Perrinet et al., 2003]. It is therefore enhanced if each scale is tuned according to the statistics of natural images. Finally, this strategy builds a complete and efficient code along the optic nerve from the retina (analog-to-spike coding) to the brain (spike-to-analog coding) using solely the rank of the spikes in the wave front, i.e. a rank-order coding scheme [Thorpe and Gautrais, 1998].
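The rank-order encode/decode cycle just described may be sketched as follows. This is an illustrative reading of the scheme, assuming orthonormal filters, coefficient sets of equal length, and polarities carried separately by the ON/OFF identity of each fiber; all function names are hypothetical:

```python
import numpy as np

def build_lut(coeff_sets):
    """LUT: mean rectified coefficient value at each rank, averaged
    over a database of (normalized) natural images."""
    ranked = [np.sort(np.abs(c))[::-1] for c in coeff_sets]
    return np.mean(ranked, axis=0)

def encode(coeffs):
    """Spike wave front: filter indices ordered by decreasing
    rectified activation (first spike = strongest coefficient)."""
    return np.argsort(-np.abs(coeffs))

def decode(spike_list, polarities, filters, lut):
    """Reconstruct using only ranks: the analog value of the spike at
    a given rank is read back from the LUT."""
    rec = np.zeros_like(filters[0])
    for rank, i in enumerate(spike_list):
        rec += polarities[i] * lut[rank] * filters[i]
    return rec
```

Note that only the order of the indices crosses the "optic nerve"; the analog values are restored on the receiving side from statistics learned beforehand.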


2.2. Overcomplete representation using Matching Pursuit

The condition on the filters for a correct compact reconstruction, i.e. the orthogonality of the dictionary used to represent the image, is a strong constraint on the architecture and is achieved only approximately in the model presented in Van Rullen and Thorpe (2001), resulting in a small information loss. Moreover, in the biological retina the architecture is not orthogonal and neurons have correlated responses. This condition is therefore too restrictive for a biologically inspired model of the retina. Furthermore, applying the algorithm to further models in the primary visual system, where the interdependence is even stronger, would yield highly redundant spiking codes, which is incompatible with neurophysiological observations [Baddeley et al., 1997] showing that V1 simple cells do not form an orthogonal representation (in wavelet terminology, a tight frame) [Salinas and Abbott, 2000]. In the previous model, the number of coefficients was of the order of 4/3 of the number of pixels; in order to represent the visual information in a more efficient way, we may want to use an overcomplete representation of the image, i.e. one for which the number of filters is far greater than the dimension of the input space. Also, to mimic biology's efficiency, we need the code to be sparse, i.e. the absolute values of the model's coefficients should decrease rapidly (only relatively few neurons are active) [Olshausen and Field, 1998], while also ensuring that the responses of different neurons are uncorrelated, as is observed in the primary visual areas [Vinje and Gallant, 2000]. Following the framework presented in Olshausen and Field (1998), the image is modeled as the probabilistic realization of a linear generative model (LGM), i.e. the input image is approximated as a linear sum of neuronal filters. The goal is therefore to choose the best representation as a trade-off between precision and the sparsity of the chosen filters' coefficients. This leads theoretically to a combinatorial explosion (it is an NP-hard problem [Mallat, 1998]) which is approached in [Olshausen and Field, 1998] using a conjugate gradient descent. In the framework of event-based computation inspired by spike coding, another strategy is to use a Matching Pursuit (MP) algorithm [Mallat and Zhang, 1993]. Under the same assumption of Gaussian noise as in [Olshausen and Field, 1998], we optimize the representation one filter after the other by choosing first the most probable filter over an arbitrary dictionary $\mathcal{D}$ to represent the image (hence the name "greedy pursuit"). The associated coefficient minimizing the error is defined as the projection of the image onto the chosen filter, and the algorithm then iterates on the successive image residuals [Mallat, 1998, pp. 412-9] (i.e. using the probability of the image knowing which filters and coefficients were already chosen). Formally, as described in [Perrinet et al., 2002], the algorithm reads, at discrete times $t \in \{1, \dots, N\}$ (which may be translated to the exact spiking latencies):

$$\begin{cases} i_t = \mathrm{ArgMax}_{i \in \mathcal{D}} \left( |C_i^t| \right) \\ C_i^{t+1} = C_i^t - \dfrac{C_{i_t}^t}{\|\phi_{i_t}\|^2} \, \langle \phi_{i_t}, \phi_i \rangle \end{cases}$$

In particular, the current of the winning neuron becomes $C_{i_t}^{t+1} = 0$. In comparison with [Olshausen and Field, 1998], we do not define an objective sparsity function a priori, but minimize the value of the coefficients. A less "greedy" option is to cancel only a fraction of the projection or to follow a given sparsity function. This option gives a realistic model of firing in the primary visual cortex that compares to multi-unit recordings [Celebrini et al., 1993].
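A compact sketch of this greedy pursuit, with the lateral interactions implemented as the rows of a precomputed Gram matrix (the dictionary layout and function names are illustrative assumptions):

```python
import numpy as np

def matching_pursuit(image, phi, n_spikes, threshold=0.0):
    """Greedy Matching Pursuit over a dictionary `phi` (one flattened
    filter per row). When a neuron fires, an activity proportional to
    <phi_it, phi_i> is subtracted from every other neuron, and the
    winner's own current drops to zero. Returns [(index, coeff), ...]."""
    norm2 = np.sum(phi**2, axis=1)        # ||phi_i||^2
    gram = phi @ phi.T                    # lateral weights <phi_i, phi_j>
    C = phi @ image.ravel()               # initial driving currents C_i
    spikes = []
    for _ in range(n_spikes):
        i = int(np.argmax(np.abs(C)))     # first neuron to fire
        if np.abs(C[i]) <= threshold:     # stop below a given threshold
            break
        m = C[i] / norm2[i]               # signed coefficient p_t * m_t
        spikes.append((i, m))
        C = C - m * gram[i]               # lateral inhibition; C[i] -> 0
    return spikes

def reconstruct(spikes, phi, shape):
    """Linear read-out of the spike list: I_rec = sum_t p_t m_t phi_it."""
    rec = np.zeros(phi.shape[1])
    for i, m in spikes:
        rec += m * phi[i]
    return rec.reshape(shape)
```

The winner's residual current is cancelled exactly because $\langle \phi_{i_t}, \phi_{i_t} \rangle = \|\phi_{i_t}\|^2$.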


However, to fully explore this "transient" alternative, we will stick to the greedy approach. Transposing the algorithm to neurons, the best match is given by the first neuron that fires, and the correlations between filters are implemented by a fast propagation along lateral interactions (as may be the case with fast-spiking inter-neurons) or by synaptic adaptation [Chance et al., 1998], so as to cancel the correlation whenever a neuron is selected (see Fig. 1). Moreover, the weights of the lateral interactions may be learned by a simple Hebbian rule over the joint activity of the neurons [Oja, 1982]. Finally, the algorithm reads simply (details may be found in Perrinet and Samuelides (2002a)), at the discrete times $t \in \{1, \dots, N\}$ defined by the exact spiking latencies:

$$\begin{cases} i_t = \mathrm{ArgMax}_{i \in \mathcal{D}} \left( |C_i^t| \right) \\ C_i^{t+1} = C_i^t - p_t \, m_t \, \langle \phi_{i_t}, \phi_i \rangle \end{cases}$$

with $m_t = \dfrac{|C_{i_t}^t|}{\|\phi_{i_t}\|^2}$ and where $p_t$ is the sign of $C_{i_t}^t$ (i.e. its ON or OFF polarity).

In particular, it should be noticed that this process is non-linear (while doubling the signal gives the same spike list, the sum of two signals will not usually give the sum of the two spike lists) and that the activity of anti-correlated neurons may be enhanced. The reconstruction is then simply

$$I_{rec}(T) = \sum_{t=0,\dots,T} p_t \, m_t \, \phi_{i_t}$$

At a first stage, we implemented a model retina as in Van Rullen and Thorpe (2001) and observed the behavior of the absolute coefficient values as a function of the rank of propagation for different natural images drawn from a database of outdoor scenes (as described in [van Hateren and van der Schaaf, 1998]). As in the previous model, we observed regularities across natural images, and this behavior proved stable, likewise allowing the use of a Look-Up Table (LUT) to decode an analog value from its rank. Moreover, we used an incremental adaptive rule which has the advantage of being more biologically plausible and of enabling on-line learning. This rule takes the form of a stochastic algorithm, so that after coding the $n$-th image using $m^{(n)}$ as a modulation function,

$$m^{(n+1)}(t) = (1 - \mu^{(n)}) \, m^{(n)}(t) + \mu^{(n)} \, \frac{|C_{i_t}^t|}{\|\phi_{i_t}\|^2}$$

where $t$ is, as before, the discrete time corresponding to the decomposition and $\mu^{(n)}$ (typically, $\mu^{(n)} = 1/n$) is the stochastic learning gain.
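A one-line reading of this stochastic update (the array layout, one modulation value per rank, is an illustrative assumption):

```python
import numpy as np

def update_modulation(m, observed, mu):
    """One adaptive step after coding image n: blend the current
    modulation function m^(n) with the values |C_it| / ||phi_it||^2
    observed at each rank t during this coding sweep, with gain
    mu, typically 1/n."""
    return (1.0 - mu) * np.asarray(m) + mu * np.asarray(observed)
```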



Figure 1. Principle of spike coding using a matching pursuit scheme. We represent a layer of neurons $i$ (grey triangles) sharing some similar inputs on their synapses (black dots) according to given synaptic weights $\vec{w}_i$, which add up to create a driving current at the soma. The idea is to define the initial time step as the latency of the first spike to fire, i.e. corresponding to the neuron $i_0$ with maximal activity. This "winning neuron" elicits a spike along its axon (red impulse). According to the matching pursuit scheme, we may then directly account for the correlation between filters by subtracting from the other neurons an activity proportional to the correlation $\langle \vec{w}_{i_0}, \vec{w}_i \rangle$ between their weight vector and that of the chosen neuron (blue rounded lines). This scheme is then repeated recursively for the neuron with the next maximal residual activity, until this activity falls below a given threshold. The spatio-temporal spike pattern therefore represents the input signal in a sparse representation, and the signal may be reconstructed by a simple linear rule.

Practically, since the correlation is low, the algorithm is similar to the previous retinal model and it shows similar convergence as the LUT in the retina and leads to the same reconstruction error, though slightly better due to the adaptive nature of the algorithm. 2.3. Sparse spike coding This algorithm may be related in some way to the divisive normalization scheme developped by Schwartz and Simoncelli (2001), since their algorithm tries to cancel out activities when Now that we defined a spike coding scheme, let’s apply it to a model of the primary visual cortex (V1) where in comparison with the retina, the over-completeness of the representation is far greater (in humans the number of ganglion cells is of the order of one million as for V1 the number of neurons is approximately 300 million). In fact, as is observed for the simple cells of V1, we first simulated the single layer sparse spike coding scheme with a set of weight vectors φi defined as dilated, translated and sampled Gabor filters ([Daugman and Downing, 1995], for a definition,


Following neurophysiological observations, the scale grows geometrically with a factor $\rho = \sqrt[5]{2}$ (i.e. 5 layers per octave), the Gabor filters come in symmetric and antisymmetric pairs, and the orientations are $0$, $\pi/4$, $\pi/2$ and $3\pi/4$. The resulting distribution of the coefficients is highly kurtotic, and the LUT was tabulated in the same manner as for the model retina, so that the information rate, i.e. the information needed to code the address of one spike, is in this layer $\log_2(n_{\mathrm{filters}})$ bits/spike, where $n_{\mathrm{filters}}$ denotes the total number of neurons. Convergence is quicker than for the retinal model (each spike carries more information), so that this particular architecture may be compared to JPEG at high compression gains [Perrinet and Samuelides, 2002b]. In comparison to the model retina, this architecture with a set of Gabor filters shows that, by building lateral interactions, we may obtain a sparse representation of the image, i.e. one in which the relative number of neurons active to represent the information is low [Perrinet et al., 2003]. This therefore defines a sparse spike coding scheme in V1.
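As an illustration, a minimal Gabor dictionary along these lines might be generated as follows; the envelope width, the spatial frequency tied to the scale, and the omission of translated copies are simplifying assumptions of this sketch, not specifications from the text:

```python
import numpy as np

def gabor(size, scale, theta, phase):
    """One sampled Gabor filter: a grating at orientation `theta` and
    `phase` (0: symmetric, pi/2: antisymmetric) under a Gaussian
    envelope; both widths are tied to `scale` here for simplicity."""
    r = np.arange(size) - size // 2
    x, y = np.meshgrid(r, r)
    u = x * np.cos(theta) + y * np.sin(theta)
    g = np.exp(-(x**2 + y**2) / (2 * scale**2)) \
        * np.cos(np.pi * u / scale + phase)
    g -= g.mean()                       # remove the DC component
    return g / np.linalg.norm(g)        # unit norm

def make_dictionary(size=16, n_scales=5, rho=2**(1 / 5.0)):
    """5 layers per octave (factor rho = 2^(1/5)), 4 orientations,
    symmetric/antisymmetric pairs; translations are omitted here."""
    filters = [gabor(size, 2.0 * rho**s, theta, phase)
               for s in range(n_scales)
               for theta in (0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)
               for phase in (0.0, np.pi / 2)]
    return np.array([f.ravel() for f in filters])
```

Such a dictionary (40 filters per position here) can be fed directly to the `matching_pursuit` sketch above.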

3. Learning in a sparse spike coding scheme

The choice of the dictionary of Gabor filters was made based on neurophysiological data and because such filters are optimal for multiple criteria in image processing (such as their joint localization in the wavelet scale-space). But it is useful to derive an adaptive scheme in our framework, both to be able to construct other dictionaries and to adapt to data sets for which "optimal" filters are not known.

3.1. Unsupervised learning and sparse spike coding

First, to define an "optimality", we will consider that the goal of low-level sensory coding is to decompose an ensemble of signals (in our case, natural images) using an over-complete representation that is as sparse as possible. In fact, under the hypothesis that natural images are generated by a sparse realization of a Linear Generative Model (LGM), the goal of this strategy is not only to keep the mean overall activity of the system low but, more importantly, to ensure that the corresponding activities correspond to the sources that generated the sensory signal [Zibulevsky and Pearlmutter, 2001]. We saw previously that during the matching pursuit algorithm, the inhibition is equal to the projection of the chosen neuron's filter onto the residual image.


If we assume that the energy of the natural images is normalized to 1, then during the coding the relative energy taken from the residual image will be proportional to the correlation of the filter with the residual image: the closer the filter is to the image, the more rapidly the energy decreases. In an extreme case, a dictionary consisting of all the images of the database would decompose each of these images in a single sweep. A more general algorithm consists of choosing a reasonable number of filters and progressively adapting them so that they match, on average, the residual images encountered during the sparse spike codings. An important parameter of the system is the over-completeness of the dictionary, since it determines the complexity of the algorithm [Perrinet et al., 2003]. This relates to the theory of the VC-dimension [Vapnik, 1995]; we will consider that the number of neurons is fixed and tune it to achieve an acceptable compromise between precision and generality. To optimize sparsity, the learning algorithm should therefore adapt the filters so that they progressively optimize the representation. Given a certain number of initial random filters and a randomly chosen image from the database, a simple strategy consists of including the learning step in the sparse spike coding algorithm, but with a much slower time constant. As before, we recursively choose the best match on the residual activity: this corresponds to the winning filter. To maximize the correlation of this filter with the residual image, we slightly orient the filter in the direction of the residual image. The Matching Pursuit coding scheme is then resumed, activities are updated, and we proceed to the next step. This coding sweep therefore provides a newly modified filter base which is used with the next image drawn from the database.

3.2. Method: Hebbian learning

More formally, we may derive a learning rule for the sparse spike coding scheme. Due to the properties of Matching Pursuit, the Squared Error (SE) at time $T$ is simply given by:

$$\|I - I_{rec}(T)\|^2 = \|I^{T+1}\|^2 = \|I\|^2 - \sum_{t=0,\dots,T} \|C_{i_t}\|^2 = \|I\|^2 - \sum_{t=0,\dots,T} \|\langle I^t, \phi_{i_t} \rangle\|^2$$

where $I^t$ is the residual image at time $t$, recursively constructed during the MP by $I^0 = I$ and

$$I^{t+1} = I^t - \frac{C_{i_t}^t}{\|\phi_{i_t}\|^2} \, \phi_{i_t}$$

It should be noted that the MP algorithm assures that the SE is monotonically decreasing.
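This energy bookkeeping is easy to check numerically. The following sketch assumes unit-norm random filters (so that the coefficient equals the projection), which is the case where the sum above takes its simplest form:

```python
import numpy as np

rng = np.random.default_rng(0)
phi = rng.standard_normal((64, 256))
phi /= np.linalg.norm(phi, axis=1, keepdims=True)  # unit-norm filters
image = rng.standard_normal(256)

residual, removed = image.copy(), 0.0
for _ in range(20):
    C = phi @ residual                  # currents on the residual I^t
    i = int(np.argmax(np.abs(C)))       # winning neuron i_t
    residual -= C[i] * phi[i]           # I^{t+1} = I^t - C_it phi_it
    removed += C[i] ** 2                # accumulate sum of |C_it|^2
    # ||I - I_rec(t)||^2 == ||I||^2 - sum |C_it|^2, and it only decreases
    assert np.isclose(np.sum(residual**2), np.sum(image**2) - removed)
```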


Moreover, if the dictionary is at least complete (i.e. $\mathcal{D}$ covers the input space), then the convergence toward 0 is exponential. In our framework, we may formalize the learning scheme by adding, at each coding sweep, a learning step after each selection of a best match $i_t$ (i.e. after each spike), as a gradient ascent maximizing the squared coefficient $|C_{i_t}^t|^2 = |\langle I^t, \phi_{i_t} \rangle|^2$. As in Eq. 17 of [Olshausen et al., 1998], this takes the form of a Hebbian learning rule:

$$\phi_{i_t} \leftarrow \phi_{i_t} + \gamma \, |C_{i_t}^t| \, I^t$$

where $\gamma$ is the learning factor, taken proportionally to the activity (e.g. $\gamma = \frac{|C_{i_t}|}{|C_{i_0}|} \gamma_0$ with $\gamma_0 = 1/n$, $n$ being the learning step) to enhance the beginning of the coding sweep. As in [Olshausen et al., 1998], a separate mechanism assures that the competition between filters is "fair", i.e. that there is no prior bias toward any filter.

3.3. Link to Independent Component Analysis

Our algorithm differs from classical Hebbian learning rules since the signal is progressively decomposed into residuals by the coding scheme. It also differs from Sparsenet in that the learning step is associated with each spike coding instead of being averaged over a batch of samples, and in that it does not assume a desired sparsity function. However, for small $\gamma$ the learning algorithm is similar to the previous scheme, since it provides a gradient descent on the set of filters $\mathcal{D} = \{\phi_i\}$ over the cost defined by the squared sum of the coefficients. It is also similar to other optimization strategies, such as the maximization of the kurtosis or information-based criteria. Let us assume that there exists an over-complete collection of filters $\mathcal{D}_s = \{\psi_i\}$ such that for every signal indexed by $d$ from the database $\mathcal{M}$ (of cardinality much larger than that of $\mathcal{D}$), there exists a sparse decomposition $f^d = \sum_{j \geq 1} b_j^d \, \psi_{o^d(j)}$ (the parameters of every decomposition being the coefficients $b^d$ and the spike list $o^d$). Let us assume that we defined a random dictionary of filters $\mathcal{D} = \{\phi_i\}$ of the same over-completeness. Then, restricting ab initio to the very first coefficient, one may prove the convergence of the algorithm; this setting should be compared to the problem of Blind Source Separation. In fact, the distribution of the data for natural images is highly non-uniform and is characterized by components with high kurtosis along some axes of the representation space [Field, 1994]. By finding the filters, i.e. the axes in the representation space, which yield a sparse representation of the signal, Olshausen et al. (1998) proved that this representation relates to Independent Component Analysis [Comon, 1994], since it uses the same informational optimizing principle. Our results therefore provide a simple algorithm which could be an underlying principle of the reverse engineering processes that may take place in the brain.


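A sketch of one such coding-plus-learning sweep, under the assumptions above (the renormalization stands in for the unspecified "fair competition" mechanism, and all names are illustrative):

```python
import numpy as np

def learning_sweep(phi, image, n_spikes, gamma0):
    """One sweep on one image: greedy pursuit with, after each spike,
    the Hebbian step phi_it <- phi_it + gamma |C_it| I^t applied to the
    winning filter, followed by a renormalization keeping all norms
    equal so that the competition between filters stays fair."""
    residual = image.ravel().astype(float)
    c0 = None
    for _ in range(n_spikes):
        C = phi @ residual                  # currents on the residual I^t
        i = int(np.argmax(np.abs(C)))       # winning neuron i_t
        if c0 is None:
            c0 = abs(C[i])                  # reference activity |C_{i_0}|
        dphi = gamma0 * (abs(C[i]) / c0) * abs(C[i]) * residual
        residual -= (C[i] / np.sum(phi[i]**2)) * phi[i]  # MP inhibition
        phi[i] += dphi                      # orient filter toward I^t
        phi[i] /= np.linalg.norm(phi[i])    # fair competition
    return phi
```

This sweep would be repeated over images drawn from the database, with $\gamma_0 = 1/n$ at the $n$-th step.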

4. Results: emergence of feature-selective neurons

4.1. Application to the architecture of Sparsenet

To compare the convergence of our algorithm with that proposed by Olshausen and Field (1996), we simulated it using the same whitened images and the same protocol. Only the optimization by conjugate gradient was replaced by the matching pursuit scheme, with the Hebbian learning included in the coding. The same parameters were kept for the size and number of filters and for the learning rates, and the competition between filters was implemented using the same algorithm. After approximately 100 learning steps, the filters converge to orientation-selective filters (see Fig. 2), similarly to the Sparsenet algorithm. This confirms the analogy between our algorithm and Sparsenet, and supports the view that the filters of simple cells in the primary visual cortex constitute the independent components of static images [Bell and Sejnowski, 1997]. These results are very similar to the precursor work of Linsker (1986), to the results of van Hateren and van der Schaaf (1998), and to other algorithms learning optimal sparse representations at this stage of processing. However, the Hebbian sparse spike coding algorithm differs from these methods in that it is computationally simpler and more tractable, and in that it is based on a neuromimetic approach.

4.2. Extension to models of cortical columns

We applied this algorithm to different natural image databases, first from Olshausen and Field (1996) but also to the calibrated set of natural images provided by van Hateren and van der Schaaf (1998). We used large image patches of 128 × 128 pixels from these databases and, following the protocol of Olshausen and Field (1996), whitened them using the same whitening filter. The norm of every filter was controlled by a similar momentum term to provide the best competition between filters. We initiated a dictionary of 15 similar 9 × 9 filters that were duplicated for every position in the image under the assumption of translational ergodicity (receptive fields may thus be processed in parallel). Since filters have a limited receptive field $R$, each selection corresponds to the particular patch of the image that best matches a feature of this dictionary. This learning algorithm differs from classical algorithms and fits the spike coding scheme.


Figure 2. Emergence of filters using the same protocol as Sparsenet. Using the same parameters as Olshausen (1998), I simulated the propagation of the visual information through a neuronal layer whose filters are initially set at random. Using the learning rule defined in the text, neurons selective to orientation emerge, as may be observed for V1 simple cells.

In fact, it is parallel, since we feed whole images and the patch used to learn a filter is chosen in the image by the algorithm rather than at random. We may therefore predict that if two filters are similar up to a small translation, they will be in parallel competition and only the best one will be chosen.

4.3. Lateral cooperation

Also, there is a priori no relationship between filters and, similarly to the Lloyd algorithm, the learning scheme may be enhanced by implementing lateral interactions between filters. In this case, we may modify the sensitivity of the neurons (by lowering their threshold), and hence their chance to fire and be selected as the best match, according to the previously selected filters.


Figure 3. Emergence of filters in a model of a cortical column. We simulated the propagation of the visual information through the retina and then through a second layer whose filters are initially set at random. By moving the receptive fields of neurons toward the patch that elicited their firing, we progressively extract primitives of the image which compete through the MP algorithm. Using a protocol similar to Sparsenet, V1-like orientation-selective filters emerge, but the parallel propagation avoids duplicates.

For instance, when a filter is selected near an area of high contrast, we may facilitate, for neighboring neurons (in the dictionary), the selection of a neighboring patch of high contrast in the image. Since smooth borders (lines and edges of solids) are more probable in natural images, filters will associate by similarity. This scheme therefore provides an interdependence similar to that found in Kohonen maps, which may be particularly efficient for image representation. An extension of this model was to introduce lateral connectivity into this learning scheme. This introduced topological relations between neurons, associating neighboring spatial responses with neighboring neurons. This strategy led to a better convergence, since learned features in filters (which therefore appeared more frequently than random filters) cooperated with neighboring filters. This scheme led to the emergence of a "pin-wheel" by associating the learning of neighboring neurons (see Fig. 4) and could be at the origin of the emergence of hyper-columns as a basic unit in V1 for detecting the orientation of edges.
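The selection step of such a cooperative scheme might look as follows; the Gaussian neighborhood on a cyclically indexed dictionary and all parameter values are illustrative assumptions, as the text does not specify the facilitation profile:

```python
import numpy as np

def facilitated_winner(C, last_winner, sigma=1.0, boost=0.2):
    """Pick the next winning neuron with its effective threshold lowered
    (activity boosted) according to its distance, in the dictionary, to
    the previously selected filter: a Kohonen-like neighborhood on a
    cyclically numbered set of filters (as in Fig. 4)."""
    n = len(C)
    if last_winner is None:
        return int(np.argmax(np.abs(C)))
    k = np.abs(np.arange(n) - last_winner)
    d = np.minimum(k, n - k)                       # cyclic distance
    gain = 1.0 + boost * np.exp(-d**2 / (2 * sigma**2))
    return int(np.argmax(np.abs(C) * gain))
```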

Conclusion: toward a model of neural representation

Extending the sparse spike coding algorithm, this strategy builds an efficient dictionary of spatial filters which are particularly adapted to the processing of images. For a given size of the dictionary, it leads to the emergence of independent filters, in this case orientation-selective filters, as is observed in the primary visual cortex (V1).



Figure 4. Introduction of lateral cooperation. Introducing lateral connections between the filters of a model cortical column (see text) in favor of neighboring activity, we observe the emergence of topological relations between neighboring filters (cyclically numbered from 1 to 16), similar to the neurophysiologically observed pinwheels.

This strategy is improved by introducing topological constraints, and further investigations lead to the formation of spatio-temporal filters in the visual flow which may fit the responses of neurons in V1. Further investigations with larger dictionaries yield more complex structures, such as rounded borders (the so-called bananalets) or end-stops, and could provide a basis for secondary visual areas. As simple visual illusions prove, a challenge is then to link the local features given by these neurons, so as to model the neuronal integration of local information which gives rise to the single, global perception that we experience.

References

Adrian, E.: 1928, The Basis of Sensation: The Action of Sense Organs. London: Christophers.


Baddeley, R., L. F. Abbott, M. Booth, F. Sengpiel, T. Freeman, E. Wakeman, and E. Rolls: 1997, 'Responses of neurons in primary and inferior temporal visual cortices to natural scenes'. In: Proc. Roy. Soc. (Lond.).
Bell, A. J. and T. J. Sejnowski: 1997, 'The "Independent Components" of Natural Scenes are Edge Filters'. Vision Research 37(23), 3327–38.
Celebrini, S., S. J. Thorpe, Y. Trotter, and M. Imbert: 1993, 'Dynamics of orientation coding in area V1 of the awake primate'. Vis Neurosci 5(10), 811–25.
Chance, F., S. B. Nelson, and L. F. Abbott: 1998, 'Synaptic depression and the temporal response characteristics of V1 simple cells'. The Journal of Neuroscience 18, 4785–99.
Comon, P.: 1994, 'Independent Component Analysis, A new concept?'. Signal Processing 36(3), 287–314.
Daugman, J. and C. Downing: 1995, 'Gabor Wavelets for Statistical Pattern Recognition'. The MIT Press, Cambridge, MA, pp. 414–9.
Field, D. J.: 1994, 'What Is the Goal of Sensory Coding?'. Neural Computation 6(4), 559–601.
Linsker, R.: 1986, 'From Basic Network Principles to Neural Architecture: Emergence of Spatial-Opponent Cells / Orientation-Selective Cells / Orientation Columns'. Proceedings of the National Academy of Sciences 83, 7508–7512, 8390–8394, 8779–8783.
Mallat, S.: 1998, A Wavelet Tour of Signal Processing. Academic Press.
Mallat, S. and Z. Zhang: 1993, 'Matching Pursuit with Time-Frequency Dictionaries'. IEEE Transactions on Signal Processing 41(12), 3397–3414.
Oja, E.: 1982, 'A simplified neuron model as a principal component analyzer'. J. Math. Biol. 15, 267–73.
Olshausen, B. and D. J. Field: 1996, 'Natural image statistics and efficient coding'. Network 7, 333–339.
Olshausen, B. and D. J. Field: 1998, 'Sparse Coding with an Overcomplete Basis Set: A Strategy Employed by V1?'. Vision Research 37, 3311–25.
Olshausen, B., P. Sallee, and M. S. Lewicki: 1998, 'Learning Sparse Wavelet Codes for Natural Images'. In: M. I. Jordan, M. J. Kearns, and S. A. Solla (eds.): Advances in Neural Information Processing Systems, Vol. 12. The MIT Press, Cambridge, MA.
Perrinet, L. and M. Samuelides: 2002a, 'Coherence detection in a spiking neuron via Hebbian learning'. Neurocomputing 44–6(C), 133–9.
Perrinet, L. and M. Samuelides: 2002b, 'Sparse Image Coding Using an Asynchronous Spiking Neural Network'. In: Proceedings of ESANN. pp. 313–8.
Perrinet, L., M. Samuelides, and S. Thorpe: 2002, 'Sparse spike coding in an asynchronous feed-forward multi-layer neural network using Matching Pursuit'. Neurocomputing, in press.
Perrinet, L., M. Samuelides, and S. Thorpe: 2003, 'Coding static natural images using spiking event times: do neurons cooperate?'. Submitted.
Rodieck, R. W.: 1965, 'Quantitative analysis of cat retinal ganglion cell response to visual stimuli'. Vision Research 5, 583–601.
Salinas, E. and L. Abbott: 2000, 'Do Simple Cells in Primary Visual Cortex Form a Tight Frame?'. Neural Computation 12(2), 313–335.
Schwartz, O. and E. Simoncelli: 2001, 'Natural Signal Statistics and Sensory Gain Control'. Nature Neuroscience 4(8), 819–25.
Thorpe, S. J., A. Delorme, R. Van Rullen, and W. Paquier: 2000, 'Reverse engineering of the visual system using networks of spiking neurons'. Vol. 4. pp. 405–8.


Thorpe, S. J. and M. Fabre-Thorpe: 2001, 'Neuroscience. Seeking categories in the brain'. Science 291(5502), 260–263.
Thorpe, S. J., D. Fize, and C. Marlot: 1996, 'Speed of processing in the human visual system'. Nature 381, 520–2.
Thorpe, S. J. and J. Gautrais: 1998, 'Rank Order Coding'. In: J. Bower (ed.): Computational Neuroscience: Trends in Research 1998. Plenum Press: New York, pp. 113–8.
van Hateren, J. and A. van der Schaaf: 1998, 'Independent Component Filters of Natural Images Compared with Simple Cells in Primary Visual Cortex'. Proc. R. Soc. Lond. B 265, 359–66.
Van Rullen, R. and S. J. Thorpe: 2001, 'Rate Coding Versus Temporal Order Coding: What the Retinal Ganglion Cells Tell the Visual Cortex'. Neural Computation 13(6), 1255–83.
Vapnik, V.: 1995, The Nature of Statistical Learning Theory. New York: Springer.
Vinje, W. E. and J. L. Gallant: 2000, 'Sparse Coding and Decorrelation in Primary Visual Cortex During Natural Vision'. Science 287, 1273–1276.
Zibulevsky, M. and B. A. Pearlmutter: 2001, 'Blind Source Separation by Sparse Decomposition in a Signal Dictionary'. Neural Computation 13(4), 863–882.
