[This document will be published as a CWI Technical Report, please consult the authors for proper referencing]
Dynamic Binding in Sparse Spike-time Vectors

Sander M. Bohte, Joost N. Kok
[email protected], [email protected]

CWI, P.O. Box 94079, 1090 GB Amsterdam, The Netherlands
ABSTRACT

The neural code enabling dynamic binding in the brain has so far remained an unresolved issue in neural modeling. We present a possible solution by developing a neural vector code based on spiking neurons. As such vectors have the ability to simultaneously represent multiple items, this significantly alleviates the so-called superposition catastrophe underlying the dynamic binding problem. We show how this property enables dynamic binding and symbolic processing in properly configured neural networks. Based on interconnected microcircuits, we demonstrate how sparse vectors can preserve an “origin” tag, allowing for a relatively straightforward computation of a solution to the problem of dynamic feature integration. Extending on this observation, we show how the ability to simultaneously encode multiple “tags” in a single neural vector can be employed to generate a representation implementing the Gestalt-criterion of “good-continuation” by separating the “where” stream from the “what” stream. In effect, this allows one stream to compute the “brackets” and the other the “content” of a symbolic representation, implying that the algorithm can be generalized to symbolic processing. The ability to hold multiple items in a single vector then also translates naturally into the capability to deal with ambiguity, as multiple interpretations can coexist until sufficient information for resolving the issue is available. Importantly, the neural vector approach explored here enables dynamic binding by conjunction detection. We note that such conjunctions can easily be learned by correlation-based unsupervised learning algorithms, such as Hebbian learning. Given the observation that sparse vectors can contain multiple items simultaneously, we derive a lower bound for the item capacity, as well as a lower bound on the benefit of using sparse spike-time vectors instead of binary vectors.
For sparse spike-time vectors, we also argue that relative vector latency can implement an intensity measure as an alternative to rate-code.

1991 Mathematics Subject Classification: 82C32, 68T05, 68T10, 68T30, 92J40, 92B20.
1998 ACM Computing Classification System: C.1.3, F.1.1, I.2.6, I.5.1.
Keywords and Phrases: spiking neurons, sparse coding, dynamic binding, dynamic feature integration, sparse spike-time vectors, symbolic processing.
Note: Work carried out under theme SEN4 “Evolutionary Systems and Applied Algorithmics”. A short version of this paper has been submitted for publication.
1. Introduction

Many consider artificial neural networks as the edifice of universal computation: limited only by time and memory, any function can be approximated by a multi-layer perceptron (MLP). However, as pointed out already by Minsky & Papert [Minsky and Papert, 1969], the lower bounds on memory (or number of required neurons) can be a serious obstacle. In fact, it has been argued that MLP-type neural networks, or in fact any current neural network model, cannot solve the problem of “dynamic binding” (as described in Section 1.1). As such, this
problem has been a focus of research in both cognitive neuroscience, in particular with respect to elucidating visual perception (starting as early as Donald Hebb, [Hebb, 1949]), as well as in computer vision, where the binding problem for instance arises in image segmentation (e.g. [Pal and Pal, 1993; Mozer et al., 1992]). The rest of this introduction will motivate the architecture and give the basic ideas, and the remainder is organized as follows: we develop a paradigm for artificial neural networks, based on sparse neural vectors. We first examine the dynamic binding problem (Section 1.1) and currently proposed solutions (Sections 1.2 and 1.3), which either suffer from an unbounded number of required neurons, or merely offer a representation instead of a solution. Given these solutions, we examine other current trends in neural networks, namely temporal coding (Section 1.4) and sparse neural vectors (Section 1.5). In particular, we discuss recent results which demonstrate that such sparse vectors can represent multiple items simultaneously (Section 1.6). As this alleviates the superposition catastrophe underlying the binding problem (see Section 1.1), we then argue that, given suitable network structures, this enables dynamic feature binding with sparse vectors in hierarchical neural networks (Section 1.7). Crucial is the construction of sparse vectors consisting of single firing times of individual neurons (Section 1.8). The general idea is demonstrated in a model that performs dynamic feature integration, but can easily be extended to more complex binding tasks such as dynamic symbolic processing (introduced in Section 1.9).

1.1 The Dynamic Binding Problem

The code of classical artificial neural networks is usually considered to be the analog rate-code of the single neuron: a high activation corresponding to a near-maximal spike discharge rate, and a low activation corresponding to a low rate.
It has been argued that this code is very poor and too narrow in its possibilities to expand to a class of combinatory problems generally requiring what is characterized as “dynamic binding” [von der Malsburg and Schneider, 1986; Fodor and Pylyshyn, 1988]. Von der Malsburg [von der Malsburg, 1999] illustrates this problem by a classical example due to Frank Rosenblatt [Rosenblatt, 1961]: Imagine a specific neural network for visual recognition, which is internally structured such that it can derive four features and represent them by output neurons. Two neurons recognize objects, a triangle or a square, both generalizing over position. The other two indicate the position of objects in the image: in the upper half or in the lower half, both generalizing over the nature of the objects (see Figure 1). When showing single objects to the network it responds adequately, e.g., with (triangle, top) or (square, bottom). A problem arises when two objects are present simultaneously. If the output reads (triangle, square, top, bottom) it is not clear whether the triangle or the square is in the upper position. This is the binding problem: the neural data structure does not provide for a means of binding the proposition top to the proposition triangle, or bottom to square, if that is the correct description. In a typographical system, this could easily be done by rearranging symbols and adding brackets: [(triangle, top), (square, bottom)]. The problem with the code of classical neural networks is that it provides neither for the equivalent of brackets nor for the rearrangement of symbols. This is a fundamental problem with the classical neural network code: it has no flexible means of constructing higher-level symbols by combining more elementary symbols. The difficulty is that simply coactivating the elementary symbols leads to binding ambiguity when more than one composite symbol is to be expressed. 
Simple coactivation of the attributes of multiple objects could thus result in the activation of incorrect descriptions of composite symbols: e.g. in the previous example, if there was
a downstream neuron coding for the combination of (bottom, triangle), it would be activated, even though the particular constellation is not actually present, thus causing a “ghost” activation.
Figure 1: Rosenblatt Example, adapted from Von der Malsburg, 1999. See text for explanation.
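Rosenblatt’s example can be made concrete in a few lines of code (a sketch with hypothetical feature names, not a model from this paper): two different scenes produce one and the same set of coactivated detectors, so the composite state cannot be decomposed.

```python
# Rosenblatt's example (hypothetical encoding, for illustration only):
# each object activates one shape detector and one position detector,
# each generalizing over the other attribute.

def coactivation(scene):
    """Set of active output neurons for a list of (shape, position) objects."""
    active = set()
    for shape, position in scene:
        active.add(shape)     # shape detectors generalize over position
        active.add(position)  # position detectors generalize over shape
    return active

scene_a = [("triangle", "top"), ("square", "bottom")]
scene_b = [("triangle", "bottom"), ("square", "top")]

# Two different scenes, identical neural output: the binding is lost.
print(coactivation(scene_a) == coactivation(scene_b))  # → True
```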
The problem is a general one: by superimposing the activity responses belonging to different entities, coactivation leads to what is referred to as the “superposition catastrophe”: the neural code will not express the information required to subdivide the composite state into its correct constituent components. In [Treisman, 1996], Treisman summarizes a (non-exhaustive) list of the various instances of the binding problem encountered in the brain:

• Property binding, in which different properties (e.g. shape, color, and motion) must be bound to the objects that they characterize. The ‘binding problem’ arises here because different aspects of the scene are coded in specialized visual areas, with the size of visual receptive fields increasing at later levels of the coding hierarchy.

• Part binding, in which parts of an object must be segregated from the background and bound together, sometimes across discontinuities resulting from partial occlusion.

• Range binding, in which particular values on a dimension (e.g. purple on the color dimension, or tilted 20° for orientation) are signaled by ratios of activity in a few distinct populations of neurons (e.g. those sensitive to blue and to red to signal purple). Any system using ‘coarse coding’ with just a few distinct populations of broadly tuned detectors to represent fine distinctions along a full range of properties must combine, and therefore bind, different levels of firing in cells with overlapping sensitivities to represent a particular point on the dimension in question.

• Hierarchical binding, in which the features of shape-defining boundaries (e.g. orientation, curvature, and closure) are bound to the surface-defining properties that carry them (e.g. luminance, color, texture, motion, and stereoscopic depth).
• Conditional binding, in which the interpretation of one property (e.g. direction of motion) often depends on another (e.g. depth, occlusion or transparency).

• Temporal binding, in which successive states of the same object must be integrated across temporal intervals, in real and apparent motion, and other transformations.

• Location binding, in which objects are bound to their current locations. Objects and locations appear to be separately coded in ventral and dorsal pathways, respectively, raising what may be the most basic binding problem: ‘what’ to ‘where’.

Note that several such bindings have to be performed in parallel and appear to be accessible simultaneously in the brain, as is for instance the case for inter- and intra-object binding: two complex objects are not just perceived as two blobs, but the correct relative binding of the parts is also available.

1.2 Conjunction coding

Rosenblatt’s example has a very simple solution in terms of conjunction coding: one could have a neuron respond to the conjunction of a triangle in the top, and another neuron to a square in the bottom. This could be realized with direct connections to the input-image where no generalization has taken place. The main gripe with this scheme is the sheer number of conjunction-detecting neurons that would be required: considering M possible positions and N shapes, the detection of the possible (position, shape) pairs would require M × N neurons. Also, evidence exists that suggests that the brain does not simply use extensive conjunction coding, as testified by incorrect bindings caused by “illusory conjunctions” (see, e.g., [Wolfe and Cave, 1999]).

1.3 Binding by Temporal Synchrony

With the recognition that rate-based neural codes have a limited scope, it has been suggested that the timing of individual spikes could convey additional information.
Most notably here, it has been theorized [von der Malsburg, 1981; Milner, 1974] that neurons coding for components that belong to the same object could synchronize their spike discharge. Within a specified time-interval, alternation of synchronized neurons coding for different object-representations could thus in effect signal the presence of multiple objects. In spite of physiological evidence which can be interpreted as support for such a theory (reviewed in [Singer and Gray, 1995]), this theory has been heavily criticized (e.g. comprehensively summarized in [Shadlen and Movshon, 1999]). We would like to add that, as presented, temporal synchrony would constitute a representation of a solution. It is, however, not at all clear how such a representation would be computed. For hard problems, this distinction is not trivial: compare it to the traveling salesman problem (notably NP-complete). A representation that would enumerate all paths starting with the shortest would be very convenient, but hardly useful for finding the minimal path solution. Moreover, with respect to the computation of visual objects, a number of the algorithms used are quite well known (notably Gestalt-criteria [Kanizsa, 1979]). It is not at all clear how such algorithms would translate their results into temporal synchrony, or, for that matter, why (as argued in [Shadlen and Movshon, 1999]).

1.4 Temporal Coding

The possibility of exploiting the time-domain remains intriguing though, especially as experiments have demonstrated that the brain is able to process information very quickly: one spike might be all that is available from a cortical neuron [Thorpe and Gautrais, 1997]. As such, it has been demonstrated that an alternative neural code can be constructed where the relative timing of individual spikes conveys intensity information [Maass, 1997]. In theory, temporal coding is computationally more powerful than
rate-code [Maass, 1996]. In practice, networks of spiking neurons have been demonstrated to perform well, both for unsupervised clustering [Natschläger and Ruf, 1998; Bohte et al., 2000b] as well as for error-backpropagation-based supervised learning [Bohte et al., 2000a]. We note however that in itself, temporal coding does not solve our problem, as long as we use an atomic representation where the activity of a single neuron has a specific semantic interpretation.

1.5 Neural Codes: Sparse Coding

This brings us back to the issue of how items (be it objects or features) can be encoded. This has been the subject of much study and debate, where, in contrast to the semantic, atomic neuron approach, most efforts have focused on the encoding possibilities in a finite set of neurons. Considering the activation of such a set of neurons as a binary vector, the maximal number of different representations that can be encoded by N neurons equals 2^N, in which each binary pattern is associated with an item. The atomic view, where each neuron encodes an item, would only be capable of representing N items. In the coding debate, a recurring theme has been the possibility of sparse binary codes where a low ratio of active neurons encodes an item, as it is considered efficient both from an information-theoretical as well as from a metabolic point of view (e.g. see [Földiák and Young, 1995; Földiák, 1990; Olshausen and Field, 1996; Meunier and Nadal, 1995]). As noted, when discussing the binding problem, single neurons are generally considered in the atomic representation. To be more precise, the instantaneous response of a pool of neurons is taken as the average activation for a particular input [Shadlen and Newsome, 1994], and as such fulfills the role of semantic atom. Given the theoretical reasoning above, it seems worthwhile to investigate the benefits that can be derived from taking the activity in such a pool of neurons as a neural vector.
As we will show, it is not too difficult to derive a rapid and efficient solution to the binding problem once we turn to sparse vectors.

1.6 Multi-item vector coding

When considering a sparsely populated neural activity vector, recent work has demonstrated that multiple items can be represented simultaneously in such a sparse binary vector [Rachkovskij and Kussul, 1999]. These multi-item vectors can be created by merging multiple sparse vectors, each representing an individual item, into a new vector of the same dimensionality. When sparseness is maintained by the merging operation (by eliminating part of the activity in the input vectors), an element-by-element comparison shows that this new vector is still much like its constituents. For other, relatively randomly populated sparse vectors such overlap will be very small, as it occurs only by chance. The merged vector can thus be considered to contain the merged input vectors. In the geometrical sense, the resultant vector still lies in the (average) direction of its input vectors, whereas other vectors will be relatively orthogonal. The general idea is shown in Figure 2.
Figure 2: A) Merging binary sparse vectors: S1 (as well as S2) and the resultant vector overlap in 4 out of 8 active neurons. For a random vector, chance dictates an average overlap of only 1/9 of the active neurons, or less than one “1”. B) Geometrical interpretation.
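The merging of sparse binary vectors and the resulting overlaps can be sketched as a short simulation (illustrative parameters N = 200 and K = 16 are assumptions of ours, not values from the paper):

```python
import random

random.seed(0)
N = 200   # vector dimensionality (illustrative)
K = 16    # active neurons per item: sparse, K << N

def sparse_vector():
    """A random sparse binary vector, stored as its set of active indices."""
    return set(random.sample(range(N), K))

def merge(s1, s2, k=K):
    """Merge two sparse vectors; sparseness is maintained by keeping only
    k of the active elements (here chosen at random)."""
    return set(random.sample(sorted(s1 | s2), k))

s1, s2 = sparse_vector(), sparse_vector()
merged = merge(s1, s2)

# The merged vector still shares roughly K/2 active neurons with each
# constituent, but only ~K*K/N (about 1.3) with an unrelated random vector.
print(len(merged & s1), len(merged & s2), len(merged & sparse_vector()))
```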
1.7 Dynamic Feature Integration with Sparse Spike-time Vectors

What multi-item vectors in effect allow is to alleviate the superposition catastrophe: the vector representation prevents mixed sources from becoming indistinguishable. In this paper, we develop this idea and demonstrate a neural implementation of computations with sparse neural vectors capable of both computing and learning dynamically bound features. To this end, we will describe a modular architecture capable of dynamic feature integration when processing sparse vectors containing multiple items. We derive neural vectors from small local neural networks, loosely modeled after cortical microcircuits of several hundreds to thousands of neurons [Douglas and Martin, 1991], see also [Maass, 2000]. From these microcircuits we construct invariant feature-processors and show how they can preserve and merge sparse vectors that match the particular feature and subsequently propagate them as output activity. Our key observation here is that if a vector containing many extractable features is thus processed in parallel by different invariant feature-processors, the unique activity pattern will be the resultant output of the feature-processors selective for the features incorporated in that vector. Given suitable connectivity, this allows simple coincidence detection to be sufficient for object recognition by feature conjunctions. When sparse vectors belonging to different objects are orthogonal, this ability is relatively independent of the presence of other objects. When multiple objects share the same feature, the vector output of the feature-processor can represent both items simultaneously.
1.8 Sparse Spike-time Vectors

For rapid dynamic binding, we consider sparse coding within single spike time-windows: instead of the binary activation vector, we construct sparse spike-time vectors from neural microcircuits by taking as elements the relative firing times of active neurons and “null” or “x” for those that did not fire (fig. 3). If the neurons within microcircuits have highly variable responses to a stimulus, this will promote spike-time vectors originating from different sources to be virtually orthogonal. The spike pattern can thus be considered a unique “tag” labeling the source from which it originated. To make use of this tag, the sparse vector has to be preserved when processed, and hence propagated with a fair degree of precision through specific connectivity. Also needed is circuitry that is capable of recognizing meaningful conjunctions of tags. We will detail the inter-circuit connectivity required. As we consider the case where the output of feature-processors is a sparse vector, detecting conjunctions originating from the same source corresponds to dynamic feature integration, where a feature-processor can exhibit a (limited) number of tags simultaneously (thus overcoming the superposition catastrophe).
Figure 3: Spike time vectors. Each element corresponds to the spike time of a neuron relative to the first to fire. Non-firing neurons are denoted by an “x”. The creation of a new vector out of two existing ones is discussed in Section 2.
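A minimal sketch of such a spike-time vector, with firing times expressed relative to the first spike and silent neurons marked “x” (here `None`); both the representation and the similarity helper below are our illustrative assumptions, not the paper’s implementation:

```python
# A sparse spike-time vector (cf. Figure 3): element i is the firing time
# of neuron i relative to the first spike, or None for a silent neuron.

def spike_time_vector(times):
    """Normalize absolute firing times relative to the first spike."""
    first = min(t for t in times if t is not None)
    return [None if t is None else t - first for t in times]

def similarity(x, y):
    """Proportion of corresponding non-null elements (as in Section 3),
    normalized here by the number of active elements of x."""
    shared = sum(1 for a, b in zip(x, y) if a is not None and b is not None)
    active = sum(1 for a in x if a is not None)
    return shared / active

v = spike_time_vector([None, 3.0, None, 1.0, 4.0, None, 2.0, None])
print(v)  # → [None, 2.0, None, 0.0, 3.0, None, 1.0, None]
```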
1.9 Beyond Dynamic Feature Integration

Being able to represent a number of items simultaneously in a single vector enables a form of dynamic feature integration where the intrinsic orthogonality of the different sparse vectors is exploited. This does not yet solve the more complex forms of dynamic binding such as binding
across space. Rather, starting from the observation that the initial vectors originating from different places are relatively randomly populated, we are faced with the problem that two random vectors are hard to relate to each other (based on activation pattern alone). Here, we make another observation: having four items in a single vector representing four “features” can just as easily be interpreted as a vector representing a single item containing four “relative relationships”: it is not prohibited to obtain the random spike-pattern in a more clever way, possibly by separating the feature-detection from the pattern generation for vectors. Inspired by the separate “what” and “where” pathways in the brain ([Grossberg, 1998; Mishkin et al., 1983; Goodale and Milner, 1992]), we show in Section 7 how such separate pathways can implement the Gestalt-criterion of “good continuation” by giving an algorithm for binding together parts into wholes, which then represents both parts and whole simultaneously. The particular network we describe serves as an example, but in general, separating the description of an object into “what” and “where” in this framework amounts to separate pathways for calculating the “brackets” and the “content” of a symbolic description. In that sense, the neural vector approach enables a (limited) number of brackets to be inserted into a symbolic description. This suggests that such structures are suitable for general symbolic processing; in fact, the dual processing of complementary information is observed in a number of cortical information-processing systems [Grossberg, 1998; Lamme et al., 1999]. It has been noted that the pervasive ability of the cortex to handle symbolic information is one of the key differences with the abilities of current classical neural networks [von der Malsburg, 1999].

1.10 Overview

We give an overview of the rest of this paper and its main contributions.
• In Section 2, we show how changing sparse binary vectors to sparse spike-time vectors allows for the deterministic combination of multiple items into a single sparse vector.

• In Section 3, we derive a lower bound for the number of distinct items that can be simultaneously represented by a sparse spike-time vector.

• In Section 4, we argue that neuronal microcircuits implementing k-Winner-Takes-All (kWTA) can be used for creating spike-time vectors that contain multiple items while preserving sparseness, as only the spikes of the k winners are then present in the new vector. Multi-item sparse spike-time vectors created by kWTA enable the detection of feature-conjunctions by coincidence detection, as opposed to binary vectors.

• In Section 5, we demonstrate that feature-processors that associate latencies with sparse vectors before merging them can thus encode a graded feature match by influencing the relative proportion of an input vector within the k winning spikes. Such relative latencies could provide a substrate for object enhancement through (attentional) excitatory feedback.

• Using kWTA networks we can construct invariant feature-processors whose outgoing activity can be identified with (merged) sparse input vectors that match the feature (Section 6). Correct feature-conjunctions can then be detected by element-wise coincidence-detection, as only vectors originating from the same object will not be orthogonal to each other. In the case of multi-item output-vectors, part of the vectors will still be coincident, enabling (diminished) detection. False “ghost” conjunctions are avoided since feature-vectors belonging to different objects are orthogonal.

• In Section 7, we suggest how the ability to encode multiple tags into a single vector could allow for binding across space by using Gestalt-criteria as determinants for vector merging in a “where” stream, whose results are then imposed on the “what” stream location-wise.
This constitutes, at the least, an actually working algorithm that performs both inter- and intra-object whole-part binding simultaneously.
• Separating the calculation of relationships between parts from the detection allows for what amounts to a symbolic representation where one stream provides the correct “brackets” and the other the “content”.

• Finally, in Section 8, we discuss the implications of moving to a neural vector code and assess the available evidence.

2. Merging sparse spike-time vectors

A network of directly interconnected excitatory neurons, reciprocally connected to local inhibitory interneurons, is loosely modeled on neural microcircuits [Douglas and Martin, 1991]. Such a circuit has been argued to in effect perform k-Winner-Takes-All selection by transmitting only the k first-arriving spikes [Maass, 2000].
Figure 4: A) Three microcircuits. Inhibitory interneurons are depicted as filled circles, excitatory neurons as open circles (lower circuit) or as diamonds and squares respectively (upper circuits). The two upper circuits are connected to the lower circuit by excitatory connections only; within a circuit, the excitatory neurons are reciprocally connected to the interneurons (connections depicted in the lower circuit). B) By connecting the circuits neuron-by-neuron, in effect a neural vector is transmitted and processed.

For spike-time vectors, this microcircuit can implement a vector merging operation when we take the activity of the excitatory neurons as the outgoing sparse vector, and input from other such circuits is exclusively directed to these neurons (fig. 4A, B). Input signals evoke spikes in the circuit until local inhibitory feedback becomes too strong (dotted line in fig. 5A), roughly transferring some k spikes. When connecting several such circuits neuron-by-neuron (fig. 4B), the resultant activity is a mix of the input vectors and can be interpreted as a multi-item sparse vector. For sparse spike-time vectors, the merging of two vectors is graphically depicted in Figure 5B. In several aspects, such circuits deviate from a simple kWTA network: as the activity limitation is achieved through indirect inhibitory feedback, the temporal structure of the input can alter the activity-ratio drastically; in particular, perfectly synchronous input will not be subjected to kWTA, simply because the negative feedback takes a small but finite time (e.g. [Gupta et al., 2000]). For now, we will consider only Gaussian temporal spike distributions.

3. Item capacity

When merging multiple vectors, the number of spikes from each vector present in the resulting vector decreases as a function of the number of vectors (or items) merged into it.
Identifying the k winning spikes with activity-ratio (or sparseness) P, we establish a lower bound on the number of items that can be represented in a single sparse vector as a function of P. We define the similarity between two vectors X and Y as the proportion of corresponding non-null elements (X(i) ≠ x ∧ Y(i) ≠ x).
Figure 5: A) Combination of two spike-time vectors into one output vector. The two input vectors originating from their respective circuits are represented by diamonds and squares, respectively. Input vector elements projecting onto the same output neuron are depicted on the same input-line. The outgoing resultant activity is a mix of those spikes in the two input vectors that arrive before feedback inhibition becomes too strong. In effect, the first k arriving spikes are selected (filled circles). Note that coincident input onto an output neuron has a higher survival chance, as the inhibitory feedback required to remove it is larger. B) Merging of spike-time vectors in a microcircuit.
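The kWTA merge depicted in Figure 5 can be sketched as follows (an idealized model of ours: coincident input counts once with the earliest arrival time, and only the k first-arriving spikes survive; the gradual build-up of feedback inhibition is ignored):

```python
# Idealized kWTA merge of sparse spike-time vectors (cf. Figure 5).

def kwta_merge(vectors, k):
    """Merge spike-time vectors element-wise, then keep only the k
    first-arriving spikes; None marks a silent neuron."""
    n = len(vectors[0])
    merged = []
    for i in range(n):
        times = [v[i] for v in vectors if v[i] is not None]
        merged.append(min(times) if times else None)  # earliest arrival wins
    winners = sorted(t for t in merged if t is not None)[:k]
    cutoff = winners[-1] if winners else None
    return [t if t is not None and t <= cutoff else None for t in merged]

s1 = [0.0, None, 1.0, None, 2.0, None, 3.0, None]
s2 = [None, 0.5, None, 1.5, None, 2.5, None, 3.5]
print(kwta_merge([s1, s2], k=5))
# → [0.0, 0.5, 1.0, 1.5, 2.0, None, None, None]
```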
When we assume that the number of input spikes preserved from each input vector is inversely proportional to the number of items n, the overlap between the original vector and the merged vector equals P/n. This corresponds to the number of elements that are active in both the source vector (S1 in Figure 6A) and the part of this vector preserved in the multi-item vector (S1′). However, the number of coincidences between the multi-item vector and the n − 1 other source vectors by chance equals (n − 1)P^2 (Figure 6B). For the sake of argument, we assume here that none of the n − 1 other objects is present in the multi-item vector; this merely calculates the portion of S1 required to be significantly above noise levels. A lower bound on the capacity can then be derived by noting that a merged proportion P/n is no longer informative when the overlap between the merged vector and the n − 1 other object vectors occurring by chance is equally large: P/n = (n − 1)P^2, i.e. n(n − 1) = 1/P. For an average firing probability of 0.08 (corresponding to the sparseness), the capacity of a sparse vector would be about 4 items. The number of neurons that participate in a single vector would in this interpretation just serve as a means of approximating this sparseness.

4. Merging Vectors

To maintain sparseness in the output vector, the kWTA process removes all spikes from the sparse input vectors after the arrival of the first k winners. Compared to the random elimination of spikes, as would be the case for binary sparse vectors, the benefit of this deterministic elimination becomes clear when we consider m parallel cases of this sparsening. For an initial input vector consisting of k spikes (or density P1) reduced to l spikes in the output vector (the sparsened density P2), the part of this vector preserved in all instances of m randomly sparsened vectors is roughly P1(1 − (P1 − P2)/P1)^m (the overlap), which for small probabilities approximates to P1 − m(P1 − P2) (Figure 7A).
For temporal-order based sparsening, the overlap simply coincides with P2, as the same spikes are eliminated each time (Figure 7B). If P2 equals P1/n for n merged items, random sparsening reduces the overlap rapidly as a function of the number of sparsened instances m, whereas for temporal-order sparsening the signal only degrades with the item-load n. The proposed feature-binding scheme thus allows the detection of conjunctions of many features by coincidence detection with sparse spike-time
Figure 6: Item capacity. A) The overlap between a merged vector and the original vector S1 equals P/n. The overlap corresponds to the number of neurons that become activated in a vector of coincidence-detecting neurons with threshold θ = 2 spikes. B) The overlap due to coincidences by chance with the n − 1 other vectors in this situation equals P^2(n − 1), and this would constitute the “noise” level.
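The capacity bound P/n = (n − 1)P^2 of Section 3 can be checked numerically (a sketch; the function name is ours, not the paper’s):

```python
# Lower bound on item capacity: the merged proportion P/n stays
# informative only while it exceeds the chance overlap (n-1)*P^2,
# i.e. while n(n-1) <= 1/P.

def item_capacity(P):
    """Largest n for which the signal P/n still exceeds the chance level."""
    n = 1
    while P / (n + 1) > n * P ** 2:  # signal for n+1 items vs. its noise
        n += 1
    return n

print(item_capacity(0.08))  # → 4, the ~4 items quoted for sparseness 0.08
```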
Figure 7: Sparsening multiple instances. A) Binary vectors: random sparsening flips all but three “1”s to “0”s. The overlap between two sparsened instances diminishes rapidly. B) Sparse spike-time vectors: sparsening after the first three spikes reduces the overlap only as a function of the sparsening (i.e. the number of items in the vector), and not of the number of sparsened instances.
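The contrast of Figure 7 between random and temporal-order sparsening can be simulated directly (illustrative parameters of ours, not values from the paper):

```python
import random

random.seed(1)
N, k, l = 200, 12, 6   # dimensionality, input spikes k, surviving spikes l
m = 5                  # number of parallel sparsened instances

spikes = random.sample(range(N), k)             # positions of the k spikes
arrival = {i: random.random() for i in spikes}  # fixed spike arrival times

def random_sparsen():
    """Binary-vector case: keep l of the k spikes at random."""
    return set(random.sample(spikes, l))

def temporal_sparsen():
    """Spike-time case: keep the l earliest spikes (kWTA order)."""
    return set(sorted(spikes, key=arrival.get)[:l])

rand_overlap = set.intersection(*[random_sparsen() for _ in range(m)])
temp_overlap = set.intersection(*[temporal_sparsen() for _ in range(m)])

print(len(temp_overlap))  # → 6: always l, the same spikes survive each time
print(len(rand_overlap))  # typically near 0 already for m = 5
```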
vectors, as opposed to binary vectors, where this ability decreases inversely with the number of feature conjunctions determining an object.

5. Latency and intensity coding

Added latency before merging sparse spike-time vectors can encode a measure of intensity or relative feature match. An alternative for rate-coding is needed, as sparseness is fixed to k winners and spike-density is thus not available as an intensity measure. Even when allowing for a changing sparseness, it would be an ambiguous measure, as it could arise from a changing relative feature match or from a changing number of merged items. We note that when merging vectors, kWTA removes all spikes that arrive after the first k winners. Manipulating the average arrival time of the spikes of an input vector thus influences how many spikes from this
vector are incorporated into the output vector. This is illustrated in Figure 8. In Figure 8A, two vectors with zero relative delay are merged: in the resultant sparse vector, where only 11 neurons can become active, each is represented by six spikes (one coincident). In Figure 8B, the vector S1 is delayed relative to S2: if again only 11 output neurons are allowed to become active, the ratio between S1 and S2 becomes 4/8, a dramatic decrease.
Figure 8: The proportion of a sparse-vector present in the resultant merged vector depends on the relative delay between vectors: in A), zero relative delay between S1 and S2; in B), a significant delay. The relative representation of S1 in the outgoing vector decreases from 6/6 to 4/8.
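The effect of Figure 8 can be sketched in a few lines. The Gaussian spike-time distributions follow Section 5, while the spike counts, the delay value, and the simplified merge rule (keep the k earliest spikes of the combined volley) are assumptions of this sketch:

```python
import numpy as np

rng = np.random.default_rng(1)

def kwta_merge(times_a, times_b, k):
    """Keep only the k earliest spikes of the combined volley (kWTA)
    and report how many survivors came from each source vector."""
    times = np.concatenate([times_a, times_b])
    from_a = np.concatenate([np.ones(len(times_a), bool),
                             np.zeros(len(times_b), bool)])
    winners = np.argsort(times)[:k]
    n_a = int(np.sum(from_a[winners]))
    return n_a, k - n_a

# two spike-time vectors with Gaussian spike-time distributions
s1 = rng.normal(0.0, 1.0, size=200)
s2 = rng.normal(0.0, 1.0, size=200)
k = 200  # output sparseness

print(kwta_merge(s1, s2, k))        # zero delay: roughly k/2 from each
print(kwta_merge(s1 + 1.5, s2, k))  # S1 delayed: S1 under-represented
```

Delaying S1 by 1.5 temporal widths shifts most of its spikes past the kWTA cutoff, reproducing the qualitative drop in representation shown in the figure.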
Say a fraction A_i of vector i arrives before the k-th spike (corresponding to time τ):

    A_i = \int_{-\infty}^{\tau} S_i(t) \, dt,    (5.1)

with S_j the spike-time distribution of vector j. The relative representation T_i amounts to:

    T_i = \frac{A_i}{\sum_j A_j},    (5.2)

for j sparse spike-time vectors with fraction A_j in the resultant output vector. For n Gaussian spike-time distributions, the effect of added latency to one vector (Figure 9A, ∆t in temporal width σ) relative to the n − 1 others (with zero latency) then equals:

    A_i = \int_{-\infty}^{\tau} \left( S_i(t - \Delta t) + \sum_{j=1}^{n-1} S_j(t) \right) dt.    (5.3)
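Equations (5.1)–(5.3) can be evaluated numerically. In the sketch below, the kWTA cutoff τ is fixed, as in Figure 9A, at the point where half of the total normalized activity has arrived; the bisection search is merely one convenient way to find it:

```python
from math import erf, sqrt

def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def relative_representation(dt, n):
    """T_i of Eq. (5.2): one Gaussian spike-time distribution delayed by
    dt (in units of the temporal width sigma) competes with n-1 others.
    tau is set so that half of the total normalized activity has arrived
    (the kWTA cutoff of Figure 9A), found here by bisection."""
    lo, hi = -10.0, 10.0
    for _ in range(100):
        tau = 0.5 * (lo + hi)
        total = Phi(tau - dt) + (n - 1) * Phi(tau)   # arrived activity, Eq. (5.3)
        if total < n / 2.0:
            lo = tau
        else:
            hi = tau
    return Phi(tau - dt) / total                      # A_i / sum_j A_j

for n in (2, 3, 4, 6):
    print(n, [round(relative_representation(dt, n), 3)
              for dt in (0.0, 0.5, 1.0, 2.0)])
```

At ∆t = 0 each vector obtains its fair share 1/n; increasing ∆t gradually suppresses the delayed vector, as in Figure 9B.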
The result is plotted in Figure 9B for n = 2, 3, 4 and 6. As can be seen, latency has a gradual effect on the relative representation of an item in the multi-item vector, and can hence serve as an intensity measure when added to individual vectors by feature-processors before merging. If one were to provide excitatory feedback for a specific location, the effect would be a reduction of the latency of the associated vectors. The components or features constituting the object would then be enhanced in what could be regarded as a “cocktail-party-effect”.

6. A solution for the dynamic binding problem

We can recast the binding problem from the introduction in a more abstract setting: consider the case of two arbitrary objects S1 and S2, each containing two features. To detect which features constitute the objects, we could use invariant feature detectors A, B and C. Subsequent detection of feature conjunctions would take place in “object detectors” AB,
Figure 9: Proportion of a sparse-vector in a merged vector as a function of vector latency ∆t relative to n − 1 other included vectors. A) Shaded area: a normalized activity of magnitude 0.5, corresponding to k spikes, is allowed to pass before feedback inhibits all other activity. For different relative latencies, the relative proportion of the early vector (dark shaded) vs. the n − 1 other vectors (light shaded) is calculated. B) Relative proportion of the delayed vector present in the merged vector for n = 2, 3, 4 and 6. A relatively early vector is over-represented, whereas a late vector is under-represented.
AC and BC. If S1 were to contain features A and B, and S2 were to contain B and C, hierarchical activation-based detection would first activate detectors A, B and C, and subsequently AB, BC and also AC, even though the latter object is not present but its constituent features are: an example of the aforementioned “ghosting”. Without feature invariance, however, the number of isomorphic feature conjunctions suffers from a combinatorial explosion and object detection by feature conjunctions becomes infeasible. The dynamic binding problem as described can be solved by constructing more involved feature-processors (boxes in Figure 10A, see also Section 6.1). For objects K, each generating a spike-time vector Sk, these structures can be designed to yield as their output those input vectors that sufficiently match the feature. In the case that multiple input vectors contain the same feature, these vectors are merged into the sparse output vector. Downstream of these feature-processors it is then sufficient to perform element-wise coincidence-detection on the output-vectors of those feature-processors making up a particular conjunction to determine the presence of an object exhibiting these features. To this end, we extend conjunction detectors to a population of neurons where each neuron is only connected to its counterpart in the output vectors of the feature-processors that make up the conjunction. The threshold for these neurons can be set such that they are triggered only by coincidences from all required features in their input (threshold ≈ n spikes for a conjunction of n features). In the example case of two objects generating sparse random spike-time vectors S1 and S2 (fig. 10A), where the vector S1 contains features A and B, and S2 features B and C, no neurons in conjunction detector AC are activated (fig. 10C), whereas in the correct conjunction (object) detectors many are (fig. 10B, D).
Note that activity is diminished by the vector-merging in feature-processor B. The element-wise conjunction detection is depicted in detail in fig. 10B-D. Although this example of dynamic feature binding is highly simplified, and all conjunctions of features are detectable as objects, the encoding shares the same approach to the combinatorial explosion of feature-combinations as the synchrony-hypothesis (e.g. [von der Malsburg, 1981]): a subset of familiar objects is activated by virtue of the coincidences of its constituent features, and unfamiliar feature-combinations are discernible as they are temporally congruent, and can hence easily be learned. However, here we can also give an algorithm (i.e., the network design) that uses this encoding to process multiple objects simultaneously.
Figure 10: A) Each input stream is connected to all invariant feature-processors, which in turn propagate the matching input-vector(s) as output signal. The input to the coincidence detecting object-processors is shown in panels B-D. For a threshold of two spikes, only the neurons with filled circles are activated; input from the respective feature-processors is denoted by dark/light coloring. B) Part of the S1 vector overlaps with the combined S1 ∧ S2 vector and activates coincidence detecting neurons. The same applies to the BC detector (D). In C), the orthogonality of the two vectors in AC ensures that for coincidence detecting neurons with a threshold of approximately 2 spikes, no activity will ensue.
Note that as corresponding vectors are temporally and spatially congruent, simple conjunction detection suffices for recognition. This has the added benefit that such temporal conjunctions can be learned by (biologically plausible, [Markram and Tsodyks, 1996]) unsupervised Hebbian learning [Natschläger and Ruf, 1998; Bohte et al., 2000b].

6.1 Feedforward detection and merging

At the risk of drawing criticism with regard to the existence of biological counterparts, we present algorithms that compute with sparse neural vectors and that can also be embedded relatively straightforwardly into neural networks. We present a design capable of selecting the proper (feature-carrying) input vectors and subsequently merging these vectors into an outgoing vector of the same sparseness. This feature-processor is depicted in fig. 11. Every input vector is connected to (mostly) sub-threshold neurons in the first layer. The middle layer constitutes the invariant feature detector. Assuming that the presence of a feature in a vector consists of a certain level of activity within a designated subset of feature-neurons, one option consists of having the feature-detector continually “prime” these feature-neurons in layer I, raising their potential sufficiently to enable them to reach threshold when they receive input from outside. Excitatory lateral connections then provide the feedback sufficient for those sub-threshold (non-feature) neurons that also receive external input to reach threshold, and
Figure 11: A feature processor capable of rapid feedforward feature-detection and subsequent (delayed) vector merging. Small circles denote subthreshold neurons, double circles feature neurons. See text for explanation.
in effect reconstruct the sparse vector. Since the lateral excitation depends on the number of feature neurons that become activated, this translates into a delay of the remaining neurons relative to the “feature-match”, effectively implementing the latency code discussed in Section 5. The resultant activity is then merged in the output kWTA network (layer III), possibly modulated by the activity in the feature-detector. This proposal is essentially feed-forward, but depends critically on priming of feature-specific input neurons by the feature-detector, and hence provides a poor substrate for invariant learning in the feature-detector. An alternative (circumventing this problem) can be implemented with some modifications to the original circuit: we consider two subsequent spike-vectors impinging on the input-layer at times t0 and t1, some time apart. The two volleys have to have roughly the same set of active neurons (though not necessarily the same temporal pattern). If the threshold of the input layer allows the first volley to be simply transmitted, the composite activation of all input-vectors can be constructed in the first layer of the feature-detector. When a sufficient feature-match is present, the second layer in the feature detector becomes active, “enables” the output-circuit, and also projects activity back to the feature-specific neurons in layer I. In the case of suitable synaptic depression [Abbott et al., 1997; Markram and Tsodyks, 1996], neurons receiving the second incoming spike volley will require this additional feedback to reach threshold, and the same vector-completion process as described earlier can take place.
Figure 12 describes the response of the feature processor to the two volleys: in response to the volley at t0, the presence of the feature in the composite input is detected, and excitatory input is sent to the output-circuit, in time to enable the just-arriving (e.g., delayed) spikes from the first volley to be transmitted. As every single active input vector is mixed into the output, this signal does not convey any information regarding its origin. At the same time, activity is projected back from the second layer in the feature-detector to the neurons specific for the feature in the input-circuits. Given a suitable delay, this back-projection will coincide with the arrival of the second volley. As it is well known that the synapses required for spike-transmission tend to have lesser efficacy with
the number of spikes transmitted (synaptic depression), the threshold for the input neurons can be set such that after the first volley, their effective threshold becomes 2 spikes, and they hence require the folded back-projection from the feature-detection in layer II to reach threshold. Hence the second volley will only be transmitted when it contains a minimum number of active neurons specific to the feature.
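This input-layer rule can be caricatured in a few lines; the weights, threshold, and depression factor below are illustrative assumptions chosen only to reproduce the described logic:

```python
def transmits(volley, external, feedback,
              threshold=1.0, w_feedback=0.5, depression=0.5):
    """Input-neuron rule sketched in Section 6.1: the first volley
    (volley 0) passes on external input alone; synaptic depression then
    halves the external efficacy, so a spike in the second volley needs
    the coincident back-projection from the feature detector to fire."""
    w_external = 1.0 if volley == 0 else depression
    drive = w_external * external + w_feedback * feedback
    return drive >= threshold

assert transmits(0, external=1, feedback=0)       # first volley passes
assert not transmits(1, external=1, feedback=0)   # depressed: blocked
assert not transmits(1, external=0, feedback=1)   # feedback alone: blocked
assert transmits(1, external=1, feedback=1)       # coincidence: passes
```

Only the conjunction of a second-volley spike and the folded-back feature signal crosses threshold, which is exactly the gating the text describes.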
Figure 12: A feature processor capable of rapid folded-back feature-detection and subsequent (delayed) vector merging. In B), small circles denote subthreshold neurons, double circles feature neurons.

Learning in the feature detector can take place because of the specific loop-structure: given a composite signal in the first feature layer, the second layer can learn repeating patterns in an unsupervised fashion [Natschläger and Ruf, 1998; Bohte et al., 2000b]. When the third feature layer projects element-by-element to the input circuits, and is connected one-to-one to the first and second feature layers, it can by simple conjunction detection learn which neurons
are to be given feedback when the feature detecting neurons in the second layer become active (Figure 13).
Figure 13: Invariant learning in the feature processor.

This design of the “feature-processor” is mostly ad hoc, and given the large number of potentially equally (or more) functional designs, we do not claim it directly corresponds to any biological structure. Only detailed neurophysiological studies could further illuminate possible biological implementations. However, our computer simulations work well and are capable of rapidly and dynamically binding features to multiple objects with invariant feature detectors. Also note that beyond the level of the feature-processors, signals are effectively invariant, allowing for invariant feature conjunction learning by temporal correlation.

6.2 Including Latency

The two designs for feature-detection and merging based on neural vectors implement relative vector-latency to encode feature-match. Also at the level of invariant conjunction detection (bottom circuits in Figure 10), latency acts as an intensity measure: the optimal presence of the features constituting an object will result in the earliest possible arrival of the respective feature-vectors. Any configuration where one (or more) of the features is further from optimal will result in a relatively later activation of the integrate-and-fire neurons in the conjunction detector (Figure 14).
Figure 14: Binding with delays: an intensity code. Left panel: if all components of an object are optimally detected, the respective vectors will all arrive at the earliest possible time (t0), and hence the conjunction detector will fire early. Right panel: a less perfect match will delay the arrival of a number of neural vectors, delaying the time when the neurons in the conjunction detector reach threshold.
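The latency code of Figure 14 can be illustrated with a non-leaky integrate-and-fire caricature; the threshold and weight are assumptions of the sketch:

```python
import numpy as np

def fire_time(spike_times, threshold=3, weight=1.0):
    """Non-leaky integrate-and-fire sketch: every input spike adds
    `weight` to the potential; return the time the potential first
    reaches `threshold`, or None if it never does."""
    t = np.sort(np.asarray(spike_times, dtype=float))
    potential = weight * np.arange(1, len(t) + 1)
    idx = int(np.searchsorted(potential, threshold))
    return float(t[idx]) if idx < len(t) else None

# optimal match: all three feature vectors arrive at the earliest time t0 = 0
print(fire_time([0.0, 0.0, 0.0]))   # fires at 0.0
# imperfect match: one delayed feature vector delays the conjunction detector
print(fire_time([0.0, 0.0, 0.8]))   # fires at 0.8
```

The detector's firing time is set by the latest of the required feature-vectors, so relative latency directly translates into response time.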
7. Binding across space

In the previous section, we have demonstrated that two main ideas can be integrated into one neural network architecture to solve the problem of dynamic feature integration: we can overcome the superposition catastrophe by using multi-item sparse vectors, and we can build
a network in such a manner that the detection of (correct) feature configurations can be performed by simple conjunction detection. This second (implicit) idea is important, because it allows Hebbian-style correlations to be sufficient for learning these conjunctions, allowing us to maintain the essentially hierarchical picture of neural processing, where in every successive stage increasingly complex constellations in the input are detected from the local results obtained in previous stages. One problem with the model presented, however, results from the requirement of extracting a neural vector from the source and preserving this vector as a “tag” to distinguish the constituent features from other sources. We note that it will be hard to learn meaningful conjunctions of multiple sources, as the vectors representing these sources are supposed to be both random and orthogonal. However, integrating multiple sources is exactly what is required in tasks such as binding across space. Here, we are going to deviate from the “practical” solution to the feature-integration problem we started with. It is easy to see that sparse spike-time vectors allow for the simultaneous addition of a number of “tags” to a feature. So far, we have assumed that a relatively random process generates these “tags”. However, if we were to have some means of deciding which “tags” are applied to which feature(s), we would have (at least in principle) the tools for symbolic manipulation. Consider the example given in Section 1.1. In the case of the simultaneous presence of two objects, simple feature detection yielded the feature vector {top, bottom, square, triangle}, without any internal structure. By arguing that top and triangle share the same original neural vector (and the same for bottom and square), we in effect use symbolic brackets: {[top, triangle], [bottom, square]}. In this interpretation, sparse codes allow for a (limited) number of brackets to be inserted into the feature-vector.
However, for vectors generated randomly at the origin, such tagging will not be sufficient for more intricate operations involving the correct combination of multiple (spatially distant, hence orthogonal) vectors. We argue that separating the “what” from the “where” can enable just that: dynamic symbolic processing in connectionist networks. It has been well established that in the brain, the “what” is processed separately from the “where”, although extensive cross-connections are present (e.g.: [Grossberg, 1998]). Without arguing that this is necessarily the way the brain is organized, vector combination and generation in the “where” stream, subsequently used as input vectors for the “what” stream, enables the same sort of conjunction-detection as mentioned in Section 6 to be used for the detection of (multiple) spatial configurations (objects, or objects consisting of configurations of objects, etc.). Consider the detection of two separate lines which would qualify for the Gestalt criterion of “good continuation” (Figure 15). From this drawing, three objects can be constructed: two separate lines, and the composite “whole”: a dotted line. As “good continuation” constitutes an important feature for binding parts into wholes, we assume composite detectors are present for every topographical position, and the same for the basic line detector. The activity at the first “where” stage would suffice for generating the original object “tags” of the two line segments A and B: [A] and [B]. To enable the dynamic construction of the “whole” out of the basic line-segments, we basically want to add a “tag” to the representation of the two line segments A and B: {[A],[B]}, that is: A and B are bound by “good continuation”. When the local “what” detectors receive input from the corresponding “where” location, spike patterns (the vector-“tags”) generated in the “where” stream will be incorporated in the spike patterns of the “what” A and B.
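This tagging scheme can be sketched numerically: two “what” representations that each merge their own position tag with a shared good-continuation tag become coincident far above chance, which a downstream conjunction detector can exploit. The vector size and the omission of kWTA re-sparsening are assumptions of the sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
N, P = 5000, 0.08       # neurons per vector, sparseness (illustrative)

def tag():
    """A random sparse vector serving as a 'where'-stream tag."""
    return rng.random(N) < P

tag_A, tag_B = tag(), tag()   # position tags: the square brackets [A], [B]
tag_cont = tag()              # shared good-continuation tag: the curly brackets

# 'what' representations of the two line segments: each merges its own
# position tag with the shared tag (kWTA re-sparsening omitted here)
what_A = tag_A | tag_cont
what_B = tag_B | tag_cont

def overlap(u, v):
    return int(np.sum(u & v))

# the shared tag makes A and B coincide far above the chance overlap
# of two unrelated position tags
print(overlap(what_A, what_B), overlap(tag_A, tag_B))
```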
Thus the first layer establishes the distinguishable spike-vectors [A] and [B], as is shown in the left side of Figure 15. The lateral connections between the two streams correspond to a direct topographic mapping of spike-vectors to (in general a multitude of) feature-maps. The “where” stream further consists of local circuits detecting local “good continuation”, which detect the conjunction of activity in two conjoining
Figure 15: Detection of good-continuation and inclusion of the tag into the line representation. The initial response is treated separately in two pathways: the detection of line-segments in the “what” stream, and the generation of sparse vectors for the respective positions in the “where” stream. The vector generators in the “where” stream are triggered by any activity at the particular position, and are topographically connected to the corresponding input in the “what” stream. Hence the vector [A] is generated on the left-hand side, and is used as complementary input on the right-hand side (next to the direct feature-specific connections that characterize the detection of a line-piece). In the “where” stream, the local detector of good-continuation is activated and generates a vector that is projected back to the corresponding locations of both sources, hence providing a common vector-signature to the “what” locations of both A and B. Or, in a symbolic interpretation: in the top “where” layer, the square brackets for the input are generated, and subsequently the curly brackets, yielding the symbolic {[A],[B]} representation.
locations along a preferred axis, regardless of the identity of the objects generating the activity (converging connections on the “where” side of Figure 15). The thus generated “good continuation” spike-vectors are again laterally connected to the corresponding topographical location(s) in possibly all feature-maps in the “what” stream (lateral projection in Figure 15, depicted for one location in Figure 16). For A and B, both locations are covered by the “good continuation” detector: the topographically mapped spike vector will be directed to both locations and provide a third spike-vector merged into the representation of both A and B, in effect establishing the curly brackets in the description of the scene: {[A],[B]}. Hence, we can have the “where” stream provide topographically wired spike-vectors as (additional) input into the “what” stream to provide for a hierarchy of symbolic “tags”. In this construction potentially many more “tags” will be merged into the “what” stream than can be carried by sparse coding. Here, we note that back-projection from activated feature-conjunctions can engage latency competition to effectively suppress all but the best matches, implementing a means of figure enhancement and selection by object-based attention. Several remarks can be made with respect to this construction: firstly, relating any two random signals to each other, without reference, is difficult. Hence the need for a point of reference, for which, in the spatial binding problem, we chose location. Note that psychophysical evidence seems to agree [Treisman, 1998]. Secondly, the goal is not to provide the algorithm by which the brain performs binding according to Gestalt criteria, but rather to show that it is possible to create efficient neural algorithms that implement such criteria and that are also compatible with the sparse-vector
Figure 16: Generating vectors for a local feature. Vectors corresponding to different local relationships are generated in the “where” stream and projected to the corresponding “what” location. The “what” location is sensitive to a (simple) local feature (detected by feature-detecting neurons, filled circles), and if these are present, the remainder of the neural vector consists of competing (kWTA) contributions from the “where” stream.
representation.

8. Discussion

In this section, we discuss the neural algorithm and the biological evidence supporting it.

8.1 The neural algorithm

We have demonstrated how the temporal dimension of individual spikes can be employed both to support the simultaneous representation of multiple objects or features and to provide a means of coding intensity. Together, this allows for a solution to the problem of representing an object as a conjunction of invariant feature detectors in the presence of other objects. In the spirit of Treisman’s Feature Integration Theory [Treisman, 1998], this is accomplished by preserving a naturally assigned spatial “origin”-tag. The ability to simultaneously represent multiple items also allows simple conjunction detection to suffice for object recognition. This has the added advantage that conjunctions can easily be learned by unsupervised correlation-detecting learning mechanisms, such as biologically plausible (temporal) Hebbian learning [Markram and Tsodyks, 1996; Gerstner et al., 1996; Kempter et al., 1999]. This is only part of a solution to the dynamic binding problem, as it is not sufficient for, for instance, binding across space. Interpreting the item capacity of a sparse vector in terms of spatial relationships determined in a “where” stream, and then merging appropriate relational vector-tags into the “what” stream, seems a reasonable extension to solve such more complex problems. In effect the “where” stream then provides the brackets and the “what” stream the content for a symbolic description of a scene. Obviously such a solution can also be considered in a general scheme for symbolic reasoning, and in fact such dual streams are observed extensively in the brain [Grossberg, 1998]. Having multiple items in the same output vector does jeopardize the notion of firing-rate as a means of intensity coding. As we have shown, vector-latency can be a viable alternative.
However, a precise temporal spike code as explored in [Maass, 1996; Natschläger and Ruf, 1998; Bohte et al., 2000b] is not explicitly required in our construction. The feature-processors presented merely rely on the reasonable preservation of a vector of spike-times that are, in themselves, potentially meaningless.
As a solution to the problem of how to represent multiple objects simultaneously, the “synchrony hypothesis” has so far been the main theory available, but doubts with regard to its usefulness as well as its plausibility have been mounting, most notably as recently spelled out by Shadlen & Movshon [Shadlen and Movshon, 1999]. We have presented an alternative strategy based on sparse neural vectors to overcome the superposition catastrophe. This strategy has several advantages: an essentially orthogonal code is easily generated, kWTA vector merging can be implemented in small neural microcircuits, feature-integration can be performed quickly, and there is no need to establish (the slow process of) synchronization. Moreover, in our construction, we retain the ability to learn conjunctions via Hebbian-style correlations. The strategy developed bears a strong resemblance to the Code-Division Multiple Access (CDMA) protocol as employed, for instance, in cell-phones to allow multiple users to share the same limited bandwidth. Intriguingly, a drawback of CDMA, the susceptibility to the “cocktail-party-effect” [Kohno et al., 1995; Lupas and Verdu, 1990], can be interpreted as a substrate for object enhancement through feedback in our implementation.

8.2 Biological Evidence

There exists direct physiological evidence that demonstrates that the relative timing of large numbers of neurons is indeed important on a fine temporal scale. From [Singer, 1999]: [In a micro-stimulation experiment in the optic tectum of a cat, Brecht et al. (1997, Soc. Neurosci., abstract, [Brecht et al., 1998]) report that, SMB] if two spatially distant sites are stimulated with two synchronous train stimuli, the vector of the resulting eye movement is the average of the vectors corresponding to the two sites. However, if the two trains are phase shifted by >5 ms, the vector of the resulting eye movement switches from the average to the sum of the individual vectors.
Hence, the relative latency on a very fine temporal scale does make a coding difference in the cat’s brain. Moreover, many physiological studies have found that the onset of (first) spikes after a stimulus presentation can be very precise [Abeles et al., 1993; Buracas et al., 1998; Heller et al., 2000], and such high temporal precision can be maintained when propagating spike-volleys through (modeled) cortical circuits [Diesmann et al., 1999]. It is thus quite possible that the spike-times themselves carry additional information as well, and this could be incorporated into the model. Also, in the presence of a fixed number of competing vectors, latency determines the respective input-vector sparsening, and thus activity downstream in conjunction detecting circuits. Hence at this level a rate-code is available, and the debate about which neural code is used is not yet decided, although temporal coding seems to be far more efficient in terms of the number of neurons required and individual computational power. Physiological evidence also supports a number of key issues needed for neural vector coding: small neural microcircuits that can implement kWTA abound in the cortex [Douglas and Martin, 1991], the neural availability of an essentially orthogonal code has been readily observed (as it has been extensively noted that within a neural microcircuit neurons have highly diverse sensitivities [Richmond and Gawne, 1998]), and the proposed strategy does not need to establish synchronization after solving the combinatorial problem. It has been observed extensively that object-based attention is related to excitatory feedback [Roelfsema et al., 1998; Lamme and Spekreijse, 2000]. As noted, feedback would reduce the latency of the vectors associated with these objects, and result in what amounts to the “cocktail-party-effect”. This would amount to a substrate for object enhancement through attention.
To establish the actual use of such a strategy, however, the specific connectivity
between circuits that preserves the sparse-vector nature of activity in our model has to be identified. Finally, we note that the firing-probability of 0.08 in Section 3 would correspond to an average firing rate of 4 Hz over a 20 ms time-interval. A spike-time vector taken from this interval can accommodate four items, which seems to be sufficient [Luck and Vogel, 1997]. The example value of 4 Hz is chosen as the average firing rate to reflect the diverse selectivity of neurons within microcircuits. The trade-off between average firing-rate and time-window is linear, hence smaller time-windows would allow for larger average firing-rates. However, the item capacity is clearly constrained by its quadratic dependence on sparseness: the penalty for a higher capacity is a much lower sparseness, and consequently the need for more neurons in a vector to accurately represent such sparseness.

9. Conclusions

We presented a neural code based on spiking neurons suitable for dynamic binding. This is achieved by moving from a single-neuron activity code to a population code based on sparse activity vectors. We have shown that sparse neural vectors enable us to overcome the superposition catastrophe that is at the heart of the dynamic binding problem. To demonstrate that such a representational solution can also be computed relatively straightforwardly in neural networks, we developed a neural architecture that handles such vectors, while drawing on elements from neuroscience. Adapting the idea of sparse activity vectors to sparse spike-time vectors has several advantages:

• Including neuronal spike-times in the sparse vectors makes element-wise conjunction detection independent of the number of feature conjunctions.
• Sparse spike-time vectors enable relative vector-latency to encode intensity.
• An architecture based on single-spike vectors is capable of processing information while using sparse neural vectors.
As indicated, having a workable algorithm is a critical step beyond finding merely a good representation for a solution. Although in a sense sparse-vector coding is an alternative to the synchrony hypothesis, many elements are borrowed, in particular with respect to the temporally congruent representation of unfamiliar object configurations. As such it can also be considered a significant refinement and extension of the underlying idea of employing the temporal domain for more powerful neural processing. To fully appreciate the power of the solution we presented, it is useful to note that most points on the list of binding problems presented in the introduction can be addressed (with the notable exception of binding across time, but with the added observation that general symbolic processing can be implemented and represented with sparse neural vectors in separate streams). Regardless of possible physiological implications of the proposed neural coding, the issue of dynamic binding is of importance in the fields of applied neural networks and Artificial Intelligence. The ability to simultaneously assign multiple “tags” of various strengths in neural networks is a significant step towards symbolic reasoning in Artificial Neural Networks. We believe that the framework developed should enable new ways of dealing with these issues and as such has its own merits.
References

[Abbott et al., 1997] L. Abbott, J. Varela, K. Sen, and S. Nelson. Synaptic depression and cortical gain control. Science, pages 220–223, 1997.
[Abeles et al., 1993] M. Abeles, H. Bergman, E. Margalit, and E. Vaadia. Spatiotemporal firing patterns in the frontal cortex of behaving monkeys. J. Neurophysiol., 70:1629–1658, 1993.
[Bohte et al., 2000a] S.M. Bohte, J.N. Kok, and H. La Poutré. Spike-prop: error-backpropagation in multi-layer networks of spiking neurons. In M. Verleysen, editor, Proceedings of the European Symposium on Artificial Neural Networks ESANN’2000, pages 419–425. D-Facto, 2000.
[Bohte et al., 2000b] S.M. Bohte, J.N. Kok, and H. La Poutré. Unsupervised classification in a layered network of spiking neurons. In Proceedings of IJCNN’2000, page 211, 2000.
[Brecht et al., 1998] M. Brecht, W. Singer, and A.K. Engel. Role of temporal codes for sensorimotor integration in the superior colliculus. In Proceedings of ENA 98, 1998.
[Buracas et al., 1998] G.T. Buracas, A. Zador, M.R. DeWeese, and T.D. Albright. Efficient discrimination of temporal patterns by motion-sensitive neurons in primate visual cortex. Neuron, 20:959–969, 1998.
[Diesmann et al., 1999] M. Diesmann, M.-O. Gewaltig, and A. Aertsen. Stable propagation of synchronous spiking in cortical neural networks. Nature, 402:529–533, 1999.
[Douglas and Martin, 1991] R.J. Douglas and K.A.C. Martin. Opening the grey box. Trends in Neurosciences, 14:286–293, 1991.
[Fodor and Pylyshyn, 1988] J.A. Fodor and Z.W. Pylyshyn. Connectionism and cognitive architecture: a critical analysis. Cognition, 28:3–71, 1988.
[Földiák and Young, 1995] P. Földiák and P. Young. Sparse coding in the primate cortex. In M.A. Arbib, editor, The Handbook of Brain Theory and Neural Networks. MIT Press, Cambridge, MA, 1995.
[Földiák, 1990] P. Földiák. Forming sparse representations by local anti-Hebbian learning. Biological Cybernetics, 64:165–170, 1990.
[Gerstner et al., 1996] W. Gerstner, R. Kempter, J.L. van Hemmen, and H. Wagner. A neuronal learning rule for sub-millisecond temporal coding. Nature, 383:76–78, 1996.
[Goodale and Milner, 1992] M.A. Goodale and D. Milner. Separate visual pathways for perception and action. 1992.
[Grossberg, 1998] S. Grossberg. The complementary brain: A unifying view of brain specialization and modularity. Technical Report CAS/CNS-TR-98-003, Boston University, pages 1–21, 1998.
[Gupta et al., 2000] A. Gupta, Y. Wang, and H. Markram. Organizing principles for a diversity of GABAergic interneurons and synapses in the neocortex. Science, 287:273–278, 2000.
[Hebb, 1949] D.O. Hebb. The Organization of Behaviour. Wiley, New York, 1949.
[Heller et al., 2000] J. Heller, J.A. Hertz, T.W. Kjaer, and B.J. Richmond. Information flow and temporal coding in primate pattern vision. J. Comp. Neurosci., in press:1–17, 2000.
[Kanisza, 1979] G. Kanisza. The Organization of Vision. New York: Praeger, 1979.
[Kempter et al., 1999] R. Kempter, W. Gerstner, and J.L. van Hemmen. Hebbian learning and spiking neurons. Phys. Rev. E, 59(4):4498–4514, 1999.
[Kohno et al., 1995] R. Kohno, R. Meidan, and L.B. Milstein. Spread spectrum access methods for wireless communications. IEEE Communications Magazine, page 1, January 1995. http://www.cs.berkeley.edu/~gribble/cs294-7 wireless/summaries/spread spectrum.html.
[Lamme and Spekreijse, 2000] V.A. Lamme and H. Spekreijse. Modulations of primary visual cortex activity representing attentive and conscious scene perception. Front Biosci., 5:232–243, 2000.
[Lamme et al., 1999] V.A.F. Lamme, V. Rodriguez, and H. Spekreijse. Separate processing dynamics for texture elements, boundaries and surfaces in primary visual cortex of the macaque monkey. 1999.
[Luck and Vogel, 1997] S.J. Luck and E.K. Vogel. The capacity of visual working memory for features and conjunctions. Nature, 390:279–281, 1997.
[Lupas and Verdu, 1990] R. Lupas and S. Verdu. Near-far resistance of multiuser detectors in asynchronous channels. IEEE Trans. on Comm., 38:496–502, 1990.
[Maass, 1996] W. Maass. Lower bounds for the computational power of networks of spiking neurons. Neural Computation, 8(1):1–40, 1996.
[Maass, 1997] W. Maass. Networks of spiking neurons: The third generation of neural network models. Neural Networks, 10(9):1659–1671, 1997.
[Maass, 2000] W. Maass. On the computational power of winner-take-all. Neural Comp., in press, 2000.
[Markram and Tsodyks, 1996] H. Markram and M. Tsodyks. Redistribution of synaptic efficacy between neocortical pyramidal neurons. Nature, 382:807–810, 1996.
[Meunier and Nadal, 1995] C. Meunier and J.-P. Nadal. Sparsely coded neural networks. In M.A. Arbib, editor, The Handbook of Brain Theory and Neural Networks. MIT Press, Cambridge, MA, 1995.
[Milner, 1974] P. Milner. A model for visual shape recognition. Psychol. Rev., 81:521–535, 1974.
[Minsky and Papert, 1969] M. Minsky and S. Papert. Perceptrons: An Introduction to Computational Geometry. The MIT Press, 1969.
[Mishkin et al., 1983] M. Mishkin, L.G. Ungerleider, and K.A. Macko. Object vision and spatial vision: two cortical pathways. 1983.
[Mozer et al., 1992] M. Mozer, R.S. Zemel, M. Behrmann, and C.K.I. Williams. Learning to segment images using dynamic feature binding. Neural Computation, 4(5):650–665, 1992.
[Natschläger and Ruf, 1998] T. Natschläger and B. Ruf. Spatial and temporal pattern analysis via spiking neurons. Network: Computation in Neural Systems, 9(3):319–332, 1998.
[Olshausen and Field, 1996] B.A. Olshausen and D.J. Field. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381:607–609, 1996.
[Pal and Pal, 1993] N.R. Pal and S.K. Pal. A review of image segmentation techniques. Pattern Recog. Lett., 26:1277–1294, 1993.
[Rachkovskij and Kussul, 1999] D.A. Rachkovskij and E.M. Kussul. Binding and normalization of binary sparse distributed representations by context-dependent thinning. Cogn. Sci. E-print Archive, http://cogprints.soton.ac.uk/abs/comp/199904008, 1999-04-008:1–41, 1999.
[Richmond and Gawne, 1998] B.J. Richmond and T.J. Gawne. The relationship between neuronal codes and cortical organization. In H.B. Eichenbaum and J.L. Davis, editors, Neuronal Ensembles: Strategies for Recording and Decoding, chapter 3. Wiley-Liss, New York, NY, 1998.
[Roelfsema et al., 1998] P.R. Roelfsema, V.A. Lamme, and H. Spekreijse. Object-based attention in the primary visual cortex of the macaque monkey. Nature, 395:376–381, 1998.
[Rosenblatt, 1961] F. Rosenblatt. Principles of Neurodynamics: Perception and the Theory of Brain Mechanisms. Spartan Books, Washington, DC, 1961.
[Shadlen and Movshon, 1999] M.N. Shadlen and J.A. Movshon. Synchrony unbound: A critical evaluation of the temporal binding hypothesis. Neuron, 24:67–77, 1999.
[Shadlen and Newsome, 1994] M.N. Shadlen and W.T. Newsome. Noise, neural codes and cortical organization. Curr. Opin. Neurobiol., 4:569–579, 1994.
[Singer and Gray, 1995] W. Singer and C.M. Gray. Visual feature integration and the temporal correlation hypothesis. Annu. Rev. Neurosci., 18:555–586, 1995.
[Singer, 1999] W. Singer. Neuronal synchrony: A versatile code for the definition of relations. Neuron, 24:49–65, 1999.
[Thorpe and Gautrais, 1997] S.J. Thorpe and J. Gautrais. Rapid visual processing using spike asynchrony. In M.C. Mozer, M.I. Jordan, and T. Petsche, editors, Advances in Neural Information Processing Systems, volume 9, page 901. The MIT Press, 1997.
[Treisman, 1996] A. Treisman. The binding problem. Current Opinion in Neurobiology, 6:171–178, 1996.
[Treisman, 1998] A. Treisman. The perception of features and objects. In R.D. Wright, editor, Visual Attention, chapter 2. Oxford Univ. Press, Oxford, UK, 1998.
[von der Malsburg and Schneider, 1986] Ch. von der Malsburg and W. Schneider. A neural cocktail-party processor. Biological Cybernetics, 54:29–40, 1986.
[von der Malsburg, 1981] Ch. von der Malsburg. The correlation theory of brain function. Internal Report 81-2, Max-Planck-Institute for Biophysical Chemistry, Göttingen, Germany, 1981.
[von der Malsburg, 1999] Ch. von der Malsburg. The what and why of binding: The modeler’s perspective. Neuron, 24:95–104, 1999.
[Wolfe and Cave, 1999] J.M. Wolfe and K.R. Cave. The psychophysical evidence for a binding problem in human vision. Neuron, 24:11–17, 1999.