d’après: IEEE Trans. On Neural Networks, Special Issue on Temporal Coding for Neural Information Processing, D. Wang, W.J.Freeman, R.Kozma, A.G.Lozowski, and A.A.Minai (Eds.), Vol. 15, N°5, pp. 963-979, September 2004.
An instance of Coincidence Detection Architecture relying on Temporal Coding Dominique Béroule
Abstract—Although time and space are interrelated in every occurrence of real-world events, only spatial codes are used at the basic level of most computational architectures. Inspired by neurobiological facts and hypotheses that assign a primordial coding role to the temporal dimension, and developed to address both cognitive and engineering applications, Guided Propagation Networks (GPNs) are aimed at a generic realtime machine, based on time-space coincidence testing. The involved temporal parameters are gradually introduced, in relation with complementary applications in the field of Human-Machine Communication: sensorimotor modeling, pattern recognition and natural language processing. Index terms—Coincidence Detection, Learning, Pattern Recognition, Sensori-motor Modeling, Natural Language Processing I. INTRODUCTION Among the dimensions within which our physical world exists, time plays a critical role. In contrast to its spatial partners, this abstract dimension flows continuously, independently of the processing time required by any computational architecture. Therefore, this is not surprising that time has not been viewed as a tractable coding comp onent in the design of our information processing machines. In computation, the coding elements are a spatial pattern of bits stored in a memory for a given piece of information; the relative location of the registers can also participate in the representation of some knowledge. In contrast, the precise moment at which a register is accessed, whether for writing or reading purpose, cannot be considered as meaningful, for it depends on several factors, including the varying system load. In modern sequential architectures, differences in synchrony between time-dependent processing tasks is considered an obstacle rather than an asset. Previous generations of machines “took their time” without qualitatively impeding processing. From a general point of view, information processing devices can be examined with respect to the way they exploit the basic physical dimensions, namely: local or distributed space and time [1]. The issue is to exhibit missing dimensions, those that might be used to complete the representational span of our current processing architectures.
Manuscript received June 1, 2003 ; revised December 31, 2003. T he author is with the Architectures and Models for Interaction group (AMI), LIMSI-CNRS B.P.133 91403 Orsay, France (e-mail:
[email protected])
Through the manipulation of pointers, symbolic programming languages provide computers with distributed spatial representations (i.e.: list structures) that complement the local space codes contained in memory registers. Despite this symbolic processing capacity, computers are not yet seen as natural conduits for dealing with real-world patterns and natural languages: a statement that might be linked with their inadequacy in exploiting temporal coding at their most elementary functioning level. Founding spatial representations in the generic ANNs Artificial Neural Networks (ANNs) inherently implement a spatial coding of the distributed type. At the local level, a formal neuron also contains spatial codes (i.e.: weights) that hold a prime role in the full process. ANNs can, in principle, associate some meaning to the instant at which a processing element is active, relatively to other elements (distributed time coding). Furthermore, the local time dimension can also be brought into play, through the management of dynamic pulse patterns. Research in this field is currently guided by recent neurobiological discoveries towards the use of temporal codes [2]. It thus appears that, despite its spatial foundation, connectionist modeling holds the potential to cover the full representational spectrum, which contributes to making the general approach promising. The spatial basis of ANNs may be attributed to the absence of temporal dimension in the original neuronal model [3]; time has been neglected before being gradually incorporated into the network representations, whether they are used for learning to generate sequences of patterns [4], providing more meaning and independence to internal pulse signals [5], dealing with the recognition of sequential patterns [6][7], or for modeling perceptual skills with a very large population of oscillating neurons [8] (for a more complete account of related recent developments, see the other contributions to this Special Issue). Surprisingly, the temporal properties of Post-Synaptic Potentials integration [9] were known long before the connectionist renewal that arose in the 80’s, still based on the same time-free formal neuron. Coincidence detection and delay lines in specific devices Temporal coding has rarely been explored for computational purpose. This relatively minor research interest can be traced to the late 40’s, with a correlation-based auditory model of the localization of sound sources, involving both delay lines and coincidence detectors [10]. The same temporal operators participated in other devices carrying out specific tasks such as: pitch perception [11], motion detection in insects [12], cerebellum function [13], and comb filtering [14]. Coincidence detection has also more recently been put forward by several authors [15][16][17]. Time-delays and internal
1
d’après: IEEE Trans. On Neural Networks, Special Issue on Temporal Coding for Neural Information Processing, D. Wang, W.J.Freeman, R.Kozma, A.G.Lozowski, and A.A.Minai (Eds.), Vol. 15, N°5, pp. 963-979, September 2004.
signal shapes were used, together with the “time-of-arrival” coding, in the “coincidence detection”-based memory model from which the system addressed in this paper originates [18].
A. Neurobiological motivations and inspirations GPNs have mainly been inspired by the hypothetical topological structure of the brain, involving neurons that respond selectively to environmental events. Together with its spatial location relative to other neurons (spatial topology), we assumed that a given neuron could contain a relative characteristic response time for participating in a spatiotemporal topological coding of information. The established integrative properties of neurons [8] have then been considered as a biological implementation of a coincidence detection mechanism, thanks to which characteristic memory locations could respond. Instead of being directly activated by environmental stimuli, memory units would primarily undergo an internal flow of activation that would be enhanced along specific directions (memory pathways) by both external and internal signals, towards event detectors. This is the “guided propagation” assumption (H1) in this memory model. The third piece of psycho-physiological knowledge taken into account has been the long-lasting interaction between datastorage and data-retrieval. Beside the Hebbian-like dynamic reinforcement of useful pathways [24], we assumed that “coincidence detectors” could be created in the course of processing for retaining new combinations of contextual and factual information (H2). Although considered as irrelevant from a neurobiological point of view when it was proposed [25][26], H2 received neurobiological pieces of evidence, initially through experiments concerning vocal learning in birds before the breeding season1 [27]. ANNs that grow new nodes now constitute a research path (see [28] for a review).
Temporal parameters in the generic GPNs Whereas the role of temporal coding may be thought of as bringing an extra degree of freedom to spatial network representations, this article demonstrates that a generic information processing system can make use of time as a basic dimension, on equal terms with spatial representation. In the perspective of a future all-purpose machine that would constitute an alternative to the computer family in the field of Human-Machine Communication (HMC), finding timedependent methods that reach the same performance level as existing computational algorithms will be a minimum prerequisite. The significant representational enhancements offered by temporal coding has driven our development of these new methodologies which are referred to as Guided Propagation Networks (GPNs), introduced in section II. Later sections focus on the detailed temporal codes, in relationship with their respective functions in engineering applications and cognitive modeling. The presentation of these different topics is here organized in a way that fits in with the complementary components of a HMC system: pattern generation, pattern recognition, and Natural Language Processing (NLP). The relative time-of-arrival information is addressed in the sensorimotor models of Section III, with an emphasis on its real-time treatment through coincidence detection. Section IV concentrates on three temporal parameters that participate in the integration of the system internal signals for pattern recognition purpose, namely: time-delays, durations and repetitions. In Section V which is concerned with NLP, it is shown that the precise timing of phase-locked pulses can for the first time be used for learning, parsing and generating recursive sentences of the center-embedded type.
B. Generic architecture of GPNs Although there are a number of GPN variants, depending on the application, they share the following features, that distinguish them from the main stream of ANNs: 1. They have representational modules, each being responsible for coding one type of knowledge. The deeper the module within the global architecture, the more abstract its internal representations, the longer its associated time-scale.
II. GUIDED PROPAGATION NETWORKS (GPNS) GPNs have been studied in a diverse set of applications in the field of Human Sciences and Human-Machine Communication, from which complements to the initial architecture have resulted. Most recent developments have included Knowledge Management and Data Mining concerning large databases. The diversity of these applications gives indication of the flexibility of GPNs temporal resolution as previously reported for pattern recognition [19], Psychology [20], Neurobiology [21], Artificial Intelligence [22] and Linguistics [23]. This paper will focus on the temporal aspects and their associated functional capabilities. Contrary to ANNs, the GPNs behavior does not refer to an abstract representational space controlled by global measures (i.e.: error function to optimize a preliminary training session). The representational space is the network itself; learning mechanisms are not supervised, and strongly interact with other system functions (data retrieval and generation). Once having sketched the system origin, and visualized the way a GPN module identifies (Fig. 2) and generates (Fig. 3) events, and how it grows (Fig. 4), more formal aspects will gradually be introduced.
2. Each module is primarily activated by its own spontaneous, so-called “contextual”, flow. The module behavior is regulated by a central modulator. 3.
Within a module, memory pathways are dynamically grown, along which the contextual flow propagates and possibly drives the sprouting of next pathways. Memory pathways represent timespace distributions of events that are generated by more peripheral modules, including input feature detectors. A pathway can be used either for recognition or generation purpose (see Fig. 2 and Fig. 3).
1 Given that the breeding season lasts somewhat longer in human beings than in birds, we may be confident about the human brain ability to bring new neurons into play whenever needed in the course of its life.
2
d’après: IEEE Trans. On Neural Networks, Special Issue on Temporal Coding for Neural Information Processing, D. Wang, W.J.Freeman, R.Kozma, A.G.Lozowski, and A.A.Minai (Eds.), Vol. 15, N°5, pp. 963-979, September 2004.
4.
The contextual flow is guided across time towards characteristic output memory locations, under the combined action of: • Time-space discrete Stimuli S(t) (activation signals) issued from more peripheral modules; • Facilitation F(t) (threshold-decrease signals) issued from (deeper) modules, and possibly from the central modulator. 5.
The Transfer Function of an e.p.u. is linear above its threshold, with a null output below, and a max output Amax for the maximum input. Given
Sn(t ) the stimuli input integrated by a CD unit, and
Cn −1(t ) its contextual input. In order to regulate the contributions of Sn(t ) and Cn −1(t ) , a contextual weight Rn −1
is used, that may vary slowly under the reinforcement mechanism. This is why it can be approximated as constant across a short period of time. The basic threshold θn(t ) is
Elementary processing units (e.p.u.s) are distributed along the memory pathways, so as to implement the guided propagation of the contextual flow. E.p.u.s are responsible for detecting the possible matching between a contextual input C(t) and stimuli S(t), according to a criteria (response threshold θ( t ) ) that can be lowered under the facilitation F(t) triggered by deeper modules (see Fig. 1). Most of the e.p.u.s are Context Dependent (CD) units, whereas those that form the output of a module are named Event Detectors (EDs) or Event Effectors (Ees).
defined as a ratio
θn(t) = =
En
Amax stands for the saturation value of every input, and En > 1 . A CD unit is « data-driven » when its maximum stimuli input is strong enough to overcome its threshold (« forced
( ) propagation » mode) : Amax > 1 + Rn − 1 × Amax En
of an e.p.u : time-delays and internal signal temporal features are aimed at grouping C(t) and S(t) together in time. The ratio R between the contextual weight and the stimulus weight allows one of the two flows to be dynamically favored. The e.p.u. Excitability determines the mode in which its host module is set (recognition, learning/generation).
which associates
En and Rn −1 in the relation : En > Rn −1 + 1
The CD is « knowledge-driven » when its maximum contextual input can cross alone its threshold (« extended propagation mode ») :
Rn − 1 × Amax >
(1 + Rn −1 ) × Amax , that is : En
En >
1 +1 Rn −1
In its basic « restricted propagation » mode, a CD unit requires both context and stimuli for being activated above its threshold (see Fig. 10).
Event-Detectors (Effectors)
Stimuli module
En
(1 + Rn −1 ) × Amax
in which
6. A few parameters determine the behavior
GPN
En of the maximum input to the CD unit: max(Sn(t ) + Rn −1 × Cn −1(t ))
Contextual flow memory pathways
EventDetector
Ci −1(t )
Fi(t ) Θi(t )
ContextDependent unit
central modulator
Ci(t )
Event-Detectors (Effectors) Fig. 1. GPN architecture : from a macroscopic to a microscopic view (from left to right). In the global architecture instance at the left-hand side, banks of feature detectors (to the top) feed the most peripheral modules. The modules form a hierarchy from the top to the bottom, and stay under the regulation of a central modulator. In the central image, an « educated » GPN module contains tree-like pathways conveying an internal contextual flow that streams from the top-left root towards potential target e.p.u.s at the bottom. The branch preferably taken by the main stream is the one that supports the best match between the arrival time of the stimuli and the internal signals time course. In the right-hand side view, a Context-Dependent unit intersects the two flows of activity. It receives a contextual input Ci-1 (t) from its predecessor in a pathway, and stimuli S i(t) from a more peripheral pathway output, towards which a facilitation signal Fi(t) is generated. When the response threshold q(t) is reached by the total input, an output signal Ci(t) spreads towards other similar CD units.
3
d’après: IEEE Trans. On Neural Networks, Special Issue on Temporal Coding for Neural Information Processing, D. Wang, W.J.Freeman, R.Kozma, A.G.Lozowski, and A.A.Minai (Eds.), Vol. 15, N°5, pp. 963-979, September 2004.
Fig. 2. Four stages of the recognition of a discrete space-time pattern by a GPN module. The same bank of Event-Detectors (EDs), represented by square cells, is visible along the vertical x axis beside, and at the top of every image below. The set of gray dots to the right-hand side stands for the pattern that has previously raised up one of the memory pathways composed of chained Context-Dependent (CD) un its (round cells). A new instance of the same pattern, drawn beside by a close space-time distribution of black dots, is the one the recognition of which is shown in the series of images below. The units and links brightness corresponds to their level of activity. E.p.u. with a white contour are subjected to backward facilitation F(t), displayed by upward gray arrows. Pattern-Detectors constitute targets that stand at the end of the memory pathways (bottom of the module). As the pattern develops over time, the spontaneous internal flow generated by the module is guided towards a specific output detector by both incoming stimuli (activation exerted by more peripheral modules) and expectations (facilitation by deeper modules).
x 15 8 7 5 14
ta
ta
tb
tc
td
t
tb 5
0
0
1
7
8
1 6
2 3
3
2 11 4 12 13
ta / Before t he occurrence of stimuli generated by the EDs at the top, a spontaneous flow of activity is initiated within the module by the rootunit 0, which preactivates the CD units 1, 2 and 3. The module is thus prepared to integrate 3 beginnings of patterns. Patterns may be anticipated by other modules; here, 3 pathways are temporarily “enlarged” by a backward facilitation flow issued from deeper modules (not represented). The response thresholds of the facilitated e.p.u. (white contour) are temporarily decreased.
tb / When the first stimulus occurs at time tb in channel 5, the corresponding ED n°5 fires. the activity of the latter spreads towards its associated CDs in the module below (n°1 and n°4). Whereas CD unit n°4 gets only activated below its response threshold, the primed CD1 receives an extra activation that crosses its threshold. The resulting activity then propagates towards the CD n°6, in order to anticipate the next stimuli. Backward facilitation generated by CD6 lowers the thresholds of ED7 and ED8
tc
td
7
8
14
6
15
6
9 3
11 12 10
13
13
16 tc / The ED7 sprinkles three CDs (6, 9, 10), among which only the n°6 has previously been fed by the contextual flow. At the CD6 input, the stimulus issued from ED7 undergoes a time-delay settled when the pattern was learnt, in order to favour its coincidence with the late stimulus issued from ED8. Thanks to the duration of the corresponding internal signals integrated by CD6, a slight temporal distortion between the learnt and the current pattern does not impede recognition (see section IV.B).
td / Among three possible directions activated by CD6, the internal flow is eventually guided by ED14 towards one of the target output of the module (ED n°16). Although the expected ED15 remains inactive (missing event compared with the learnt pattern), the CD13 has been facilitated enough for firing. The activation of ED16 may similarly participate in guiding the internal flows of deeper modules towards compound event detectors.
4
d’après: IEEE Trans. On Neural Networks, Special Issue on Temporal Coding for Neural Information Processing, D. Wang, W.J.Freeman, R.Kozma, A.G.Lozowski, and A.A.Minai (Eds.), Vol. 15, N°5, pp. 963-979, September 2004.
Fig. 3. Four stages of the production of a discrete space-time pattern by a GPN module. Graphical conventions are the same as in Fig.2. The horizontal axes stand for the time dimension in the diagrams beside, where the e.p.u. response thresholds are represented by dotted lines. The backward facilitation generated by a deeper module (long upward gray arrow) results in strongly decreasing the e.p.u. thresholds. This phenomenon does not yet affect the inactive e.p.u. n°5 and n°6, but induces the response of the first CD unit of the pathway (n°3), because of its initial pre-activation by the internal flow (top diagram). CD3 then sends an activation signal to the next CD (n°5), and a facilitating signal to the EventEffector n°7. Once actuated the event associated with EE7, the latter stimulates CD3 with a “proprioceptive” feedback, which allows the internal flow to move forward. The 3-levels shape of activity undergone by CD5 is typical of the CD units: 1/ a first plateau below the threshold, 2/ a second plateau that crosses the threshold, so as to generate both facilitation and activation, and 3/ a pick of activity caused by the stimulus feedback, followed by the CD reset, and that induces the next CD activation.
7
3 9
reset
5
tb
2/
ta tb
tc
3/
td
t
7
0
1
1
2
2
3
4
1/ 6
ta
0
proprioceptive stimulus
0
3
4 5
5
6
6
ta / Before being aspirated along a specific direction by a facilitation signal, the spontaneous activation issuing from the root unit (n°0) can potentially feed in parallel every pathway of the module. If EE6 is facilitated by a CD unit contained in a deeper module (not represented), the corresponding threshold decrease rows up its pathway (gray upward arrows) towards the root, touching the constituent CD units (n°3, 5).
tb / The response threshold of CD3 has fallen below the level of preactivation sent by the spontaneous flow. CD3 sends a slight activation signal towards CD4 and CD5, and a facilitation signal up to the Event Effector n°7. The pathway that leads to EE7 is now going to produce its associated pattern of activity.
tc
td 7
0
9
9
0
1
2
2
5
10 0
3
3
4
1
4
8
5
6
6 tc / Once the action represented by the EE7 pathway has been carried out, the latter gets fully activated by the contextual flow of its module, and spreads activation towards CD3 and CD8. CD3 was waiting for this « acknowledgement receipt » before feeding again its offspring (CD4 and CD5). Given that CD5 has been already facilitated at time ta, the same production schema occurs again, with the facilitation of the EE9 pathway.
td / At the end of this generation process, EE6 gets fully activated, and therefore indicates to deeper modules that its associated action is completed. The next action represented in this module can be triggered in the same way by a deeper module (white arrow at the bottom-right).
5
d’après: IEEE Trans. On Neural Networks, Special Issue on Temporal Coding for Neural Information Processing, D. Wang, W.J.Freeman, R.Kozma, A.G.Lozowski, and A.A.Minai (Eds.), Vol. 15, N°5, pp. 963-979, September 2004.
τ ij = d ∗ (1 − A (ts ) / A max
Fig. 4. Learning mechanisms in a GPN module. Graphical conventions are the same as in Figs. 2 and 3. Every time a CD unit fully responds, it sends an acknowledgement of effective receipt to the detectors that contributed to its activation, under the form of a facilitation signal. The corresponding decrease of threshold resets the detector (circled in the diagrams to the right). The subsequent facilitation of all Detectors reveals those that have not successfully activated a CD unit, and therefore need to be “learnt”. The time-delay to be associated with the correspondin g new connection can be deduced from the signal parameters and its instantaneous amplitude at learning time tL (top diagram to the right). Learning by differentiation is displayed in the first three images below, from a newborn module. Learning by generalization occurs at a later stage (td), when the knowledge contained in deeper module can guide the interpretation of an unexpected event, or when the latter occurs simultaneously to an expected event.
)
Amax 9
A(ts) d
2
6
td
ta
tb
t
tL
2
0
0
1 1
3
ta / Before having received stim uli generated by the output Event Detectors of a more peripheral module (at the top of the image), a newborn module merely contains a root-unit (0) that feeds a CD-cell with a spontaneous flow of activity.
tb / A first ED (n°2) gets activated but cannot guide the internal flow, since it does not yet own output links. When a test facilitation signal is sent to all the EDs by the Control Unit at time t1 L, ED2 fires again. The coincidence of ED2 and CD1 activities leads to a new connection (dotted arrow) and the sprouting of the pathway towards a new-CD (n°3).
tc
td 22
0
4
5
9
6
0
1
4
5
6
1 3
3
7
7
F 8
8
td / Once the module has grown several pathways, and that deeper modules have also developed through the same sprouting mechanism, the unexpected activation of an ED may trigger a different learning mechanism, referred to as Generalization. At this learning time, ED4 and ED5 have just been reset by their associated CD3, and ED6 has fully activated CD7 thanks to the facilitation F of deeper modules (upward arrow). The simultaneous occurrence of ED9 leads to a new connection between ED9 and CD7 (dotted curved arrow), the response of which is thus generalized.
tc / Between t1 L and a second learning time t2 L, two other EDs got active (n°4 and n°5). The CD n°3 became therefore connected to ED4 and ED5, according to the same Differentiation mechanism. The activation of ED6 has then been followed by an “end of pattern” signal (silence, space…), which triggers the assignment of an ED part to the last created unit (8)
6
d’après: IEEE Trans. On Neural Networks, Special Issue on Temporal Coding for Neural Information Processing, D. Wang, W.J.Freeman, R.Kozma, A.G.Lozowski, and A.A.Minai (Eds.), Vol. 15, N°5, pp. 963-979, September 2004.
C. Learning in GPNs
process”, the easiness of a given pathway activation is linked with the familiarity of the events pattern it codes for. b/ heavy transient changes. The response thresholds of CD units are then decreased on a short time-scale, for temporarily facilitating the activation of the pathway they belong to.
The existence of a preliminary learning phase is only related to artificial information processing systems. In reality, the acquisition of knowledge is not separated from other functions. Ideally the acquisition and development of the learnt model would be somewhat shaped by its context and previously learnt information. On the assumption of a space-time coding, context should allow the system to determine at once a spatio-temporal memory location for a new or unexpected event, which goes with the assignment of a “meaning”. As shown in Fig. 2 and Fig. 4, the front of this contextual flow pre-activates (or primes) a portion of the module, during a time-interval, so as to anticipate all the possible next events. If an unexpected event occurs, its future memory location can however be found in the neighborhood of the portion of memory currently fed by the contextual flow. The smaller the space-time interval activated by this flow, the more precise the coincidence on which learning is based. This is why, for learning purpose, the module must be set in the “restricted propagation” mode by a central modulator, the behavior of which has been inspired by the neuromodulation system in the brain [29]. If a CD unit gets activated at the front of the contextual flow, at the very time unexpected events have occurred, this temporal coincidence can be formally preserved in the form of a new link binding the CD and the currently active EDs. This structural change corresponds to the enlargement of the set of stimuli that may feed an existing pathway: the pathways response is generalized. In contrast, if the EDs activity does not match (coincide) at all with any CD, a new branch is sprouted from the last activated CD unit: the module behavior is differentiated. The generalization and differentiation unsupervised learning mechanism are triggered in the course of processing, and allow every memory module to be entirely built from a single root unit [26] (see Fig. 4). The network is therefore possibly enlarged when, during a certain time-interval, active EDs have not participated in the activation of CDs. The activity diagrams in Fig. 4 show how EDs that are ineffective can be detected by using a facilitation signal generated by the modulator [30]; time-delays can then be calculated according to [4]. At initialization time, before any learning stage, a GPN is thus equipped with a few components: - At the periphery: banks of feature detectors; - At more central levels: a preset of connected modules, each containing a root e.p.u. aimed at generating a spontaneous flow, only feeding a first CD unit (germ of the first pathway to be grown); - At the center: a modulating device for setting the functioning modes of the modules. It is notable that every module growth simply relies on temporal information, namely the instants at which the Event Detectors of more peripheral modules fire. Once created, memory pathways are regulated under two time scales : a/ slight long-lasting changes: frequently used memory pathways are gradually “enlarged” by successive reinforcements. If balanced by an opposite “forgetting
III. GPNS FOR SENSORI -MOTOR MODELING As emphasized in the previous section, the behavior of a GPN module relies on the relative time-of-arrival of its input stimuli in relation to its internal contextual flow of activity; these two flows must be well synchronized for their host module to work properly. Viewed from the internal flows, the latter must follow the rate of stimuli extraction, and therefore propagate in “real-time” within their respective modules. This basic temporal constraint provides explanations to psychological phenomena that involve either a single perceptual modality (section B), a movement generation modality (section C), or a combination of both (section D). Before illustrating the GPN architectures that model these three types of sensori-motor functions, the realtime question is first addressed, together with formal considerations. A. Real-time and formal considerations For natural information processing systems, working in real-time is not only a matter of convenience but also a matter of survival. Behavioral decisions must be taken as soon as enough evidence is brought in our brain by significant environmental and internal stimuli that may represent the context. This relies on our ability to quickly combine the past experience with current events, anticipate future possible events, but also coordinate our movements with our perceptions. At each GPN processing instant, such as the ones displayed in Fig. 2 and Fig. 3, a few e.p.u.s spread activity or facilitation signals towards a subset of other e.p.u.s. If a CD unit stands at the intersection of these different subsets, it may fire. Otherwise, when the intersection is empty, this indicates a mismatch between expected and actual events, which may trigger a dynamic learning phase. As the network grows, the size of the aforementioned subsets tends to increase; this is at the expense of the “internal energy” consumption, since more signals are propagated in parallel towards target CD units. However, the time required for a target to respond remains constant. Provided with highly parallel hardware architecture, Coincidence Detection becomes a real-time processing principle. With respect to the software simu lations of GPNs already implemented on sequential machines, the easiest way to improve the response time consists in considering at each time-step only the e.p.u. subset which is busy integrating at least one active input. In the models presented in the next sections III.C and III.D, the generation of movement is triggered by a facilitation signal that temporarily changes the basic
7
d’après: IEEE Trans. On Neural Networks, Special Issue on Temporal Coding for Neural Information Processing, D. Wang, W.J.Freeman, R.Kozma, A.G.Lozowski, and A.A.Minai (Eds.), Vol. 15, N°5, pp. 963-979, September 2004.
threshold becomes:
θn of every CDn along a pathway. The threshold
temporal coding is a
Θn( t ) = θ n − ∆θ n × Η( t )
features
where H is the Heavyside step function. ∆θ n is calculated so that any of the CD unit that belong to a facilitated pathway be activated according to the diagram of Fig. 3, namely : (1) when its predecessor in the pathway is Context driven, the CD unit should only be pre-activated ; (2) when the predecessor is fully activated, the CD threshold must be crossed.
letters
Management of the windows saccades [position of letters and words]
words
Given n the index of the CD :
previewing [Shape of words]
syntactic expectations
Condition (1) gives :
selective facilitation
Rn −1 × Amax < Amax × (1 + Rn − 1 ) − ∆θ n 1 + Rn − 1
(
phrases
)
Amax × R + Rn − 1 + 1 Rn − 1 + 1 Condition (2) gives : Amax > Amax × (1 + Rn −1 ) − ∆θ n that is : ∆θ n > Amax × Rn − 1 an adequate value for ∆θ n is the middle of its allowed that is :
∆θ n
conjugated verb -> verb in participe -> preposition -> proper name (series of 5 classes ). From the top to the bottom of the vertical axis may be found the activity over time of the 5 corresponding CD units, and the final detector. The horizontal dashed lines indicate the cells thresholds. The histograms displayed to the right-hand side correspond to the input sentence "Ils partis sont pour Paris" in which S2 and S3 have been inversed. There are two problems in this case. Firstly, S3 occurs earlier than expected and thus cannot well coincide with t he context signal sent by CD2. Secondly, S2 (Stimulus) occurs later than expected at the time when the value of the context signal (Context) sent by CD1 starts decreasing (offset). CD3 is stimulated too early by S3 (A), then by the context delivered by CD2 (B), and in the end by the combination "Context+Stimulus" of CD2 (C). Here, the ED is activated 70% of the maximum activation.
Above the response threshold, the output Ci (t) of a CD unit can be expressed by the following linear function of its S i (t) and Ci-1 (t) input, in which Ri-1 represents the contextual weight:
Ci (t ) =
With a maximum level of spontaneous activity, not modified by the initial contextual weight R0 = 1 , the last term of the equality becomes:
Rn −1 × ... × R1 ( S (t ) + Amax ) × 1 (1 + Rn −1 ) × ...(1 + R1 ) 2 The pathway output Cn( t ) should be maximum when it
S i(t ) + Ri −1 × Ci −1(t − 1 ) 1 + Ri −1
the output Cn of a pathway comprising n units can be expressed by a weighted sum of its stimuli:
is fed by all its n fully-active stimuli
Sn(t ) R × Cn −1(t − 1) Cn( t ) = + n −1 1 + Rn −1 1 + Rn −1 1 Rn−1 = × Sn(t ) + × S (t ) 1 + Rn −1 (1 + Rn −1 ) × (1 + Rn − 2 ) n−1 Rn−1 × Rn− 2 + × S (t ) (1 + Rn−1 ) × (1 + Rn − 2 ) × (1 + Rn − 3 ) n − 2 ⋅⋅⋅ Rn−1 × ... × R1 × R0 + × C (t ) (1 + Rn−1 ) × ... × (1 + R0 ) 0
∀i ∈ [1, n ], Si = Amax ⇒ Cn(t) = Amax 1 Rn −1 + + 1 + Rn−1 (1 + Rn −1 ) × (1 + Rn− 2 ) Cn( t ) = × Amax Rn−1 × Rn− 2 × ... × R1 ⋅ ⋅ ⋅ + (1 + Rn −1 ) × ... × (1 + R1 ) This can be demonstrated by recurrence. If the above property is true for the lowest value of n ; if furthermore it is inherited from Cn −1 t to Cn t , then it is true whatever n.
()
12
()
d’après: IEEE Trans. On Neural Networks, Special Issue on Temporal Coding for Neural Information Processing, D. Wang, W.J.Freeman, R.Kozma, A.G.Lozowski, and A.A.Minai (Eds.), Vol. 15, N°5, pp. 963-979, September 2004.
1/ C1( t ) = S1( t ) + R0 × C0(t − 1) = Amax + R0 × Amax = Amax
1 + R0
2/ if
En
1 + R0
Cn −1( t ) = Amax in ( ), then Cn( t ) = Amax
4
When the initial spontaneous flow and one stimulus among n both activate the pathway, the same output value Amax / n + 1 should be obtained whatever the stimulus location in time :
(
)
3
Amax 1 = × Amax n + 1 1 + Rn −1 Rn −1 = ×A (1 + Rn −1 ) × (1 + Rn −2 ) max ... =
4 En = 1 / Rn − 1 + 1
1 1
1
2
3
4
Rn −1
Fig. 10. Functioning modes of CDs that are chained to form a pathway (at the top), depending on their parameters: the contextual weight Rn −1 , and the Excitability En . the pathways units are initially set in the “Restricted propagation” area (1), where both Sn t and Cn −1 t input are required for Cn t to be triggered (diamond-shaped dots). The subsequent increase of Excitability may be caused whether on a long-term scale by a weak selective reinforcement (not displayed), or by the strong facilitation issued from a deeper module (square dots). The units functioning point are driven up to areas (2: “extended propagation”) and even (3: “free propagation”) where only Cn t can elicit a response. The area (4: “Forced propagation”) corresponds to a “pathological” functioning mode in which a CD unit would respond to Sn t only. The response thresholds can never be reached in the hachured area.
Rn − 1 = n , from which the = n − 1 , and so on until R0 = 1 .
The first equality gives
second one gives Rn − 2 The distribution of contextual weights thus follows an arithmetical progression, which means that the internal flow grows as its propagates along a pathway, with a trend to drive as much the CD units as they are located further from the root. In order to slightly compensate for this, the Excitability of the CD units can decrease accordingly. Fig. 10 displays the corresponding values of the two parameters on which the behavior of a CD unit relies.
()
()
()
()
()
( ai1, ai2, ai3,... aiN )
N
×
×
N-1
×
× i
2
2
Rn −1 × ... × R1 × R0 × 2 × Amax (1 + Rn −1 ) × ... × (1 + R0 )
f
En = Rn + 1
3
× × ×
t
Array of N-dimensional vectors
t
i
Banks of feature detectors
f f
(t,i xi )
offset detectors
(tj, xj )
onset detectors
t tk
sliding multi-channel window
f
t tk t' j
t t
scanning one-channel window
t'i
offset detectors onset detectors convergence unit dissyllable detectors Scanned banks of feature detectors, convergence node
Fig. 11. Three possible transformations of a spectral image (dissyllable /ari/), with their respective pre-processing front-end. To the top, the usual series of vectors that takes place inside a multi-dimensional abstract space, and feed Markov or Dynamic Time Warping recognition algorithms. In the middle images, several time-space characteristic locations are detected in the spectral plan (only 2-Dimensional). The resulting discrete events feed banks of onsets (white dots) and offsets (black dots with a white contour) that form a GPN input. At the bottom, the same 2 distributions of events are scanned so as to obtain a fully temporal coding. The associated architecture consists of banks of feature detectors distributed along the frequency axis, that feed ConVergence (CV) units (diamond-shaped, only one being represented here). The CV unit feeds disyllables detectors. There are as many links between the CV and a detector as number of dots in the pattern. Each link is characterized by its time-delay. The same temporal signal issued from the CV propagates with different time-delays that are aimed at synchronization with the characteristic time of the detectors (see Fig. 13).
13
d’après: IEEE Trans. On Neural Networks, Special Issue on Temporal Coding for Neural Information Processing, D. Wang, W.J.Freeman, R.Kozma, A.G.Lozowski, and A.A.Minai (Eds.), Vol. 15, N°5, pp. 963-979, September 2004.
C. Patterns viewed as a time-series of pulses
V. GPNS FOR NATURAL LANGUAGE PROCESSING
The aforementioned temporal matching deals with variations undergone by the time dimension of musical score-like representations. However, the content of a musical score may vary along its two dimensions at once; a melody remains identifiable whatever its pitch and despite tempo variations. In order to fully exploit the temporal coincidence principle, we are led to the view that our timespace distribution of events should be transformed in a purely temporal pattern (see Fig. 11). Inspired by the time-delays imposed by the inner ear basilar membrane on acoustic signals, a scanning of the frequency axis has been proposed for dispatching frequency channel events along the temporal dimension: the scanner behaves like a periodic video beam that scans repetitively the frequency axis, with a temporal shift between each period T. The extracted pulse events trigger a periodic signal of period T, where the repetition is aimed at anticipating the possible shift in time of the event, whereas the shape of the repeated signal deals with frequency shifts (Fig.13). To sum up, the formal expression of the integrated Stimuli can be expressed by
Si (t ) =
∑a
j
This section concentrates on the use of shiftable phaselocked signals for learning and parsing non-regular grammars, and thus questions the assumed innatist nature of syntactic rules. The type of symbolic treatment involved in natural language processing or reasoning implies temporary combinations of variables to be kept in short-term memory. Taking advantage of the GPN capacity to grow, a possible solution was proposed by P. Blanchet, which associates the creation of pathways to new combinations of variables, together with the reinforcement of the combinations that led to a successful behavior [43][44]. An alternative to this dynamic management of a spatial coding brings in an extra representational dimension. Whereas in computers a variable consists of a given location that may contain different possible values, a connectionist variable would be a given value (or signal) occupying different possible locations. Instead of standard signals, such a system could then propagate specific “colored” signals [23]. Several variables should then be able to share the same value (location) without interfering. This is where phase-locked pulses may supply their ability to cohabit without interference. The synchrony temporal coding has been proposed for dynamically binding units [45] a method used for instance between slots and values in frames symbolic structures [46]. Synchrony between pulse signals has been incorporated with the GPN approach by J.C. Martin for dealing with multi-modal commands [47][48].
× Ρ(t − τij ) were P is the
j
function represented in Fig. 12
1
τij
Tij
t
A. Innate versus acquired mechanisms for syntactic processing
Fig.12. generic shape of the signals integrated at the level of an e.p.u. input, and triggered by an incoming pulse, occurring at the origin of the horizontal time axis. If the left -most signal is not repeated, one retrieves the signal shape used in the recognition experiments and cognitive models presented in the previous sections of this paper. The repetition of a wide signal, tuned to the double-scanning period can be used for dealing with variability across two pattern dimensions.
According to Chomsky’s innatist theory, our grammatical skill is given at birth in the form of a “language organ” that differs from the other modules of the brain. Syntactic rules would then not result from the general and incremental learning stages advocated by developmentalists such as Piaget, but assumed to be pre-programmed instead [49]. Driven by an opposite assumption, the Artificial Intelligence sub-field of grammatical inference is concerned with learning syntactic rules from data. So far, related algorithms can handle regular grammars represented by finite automata, which cannot adequately represent the nonregular, context -free, Natural Languages (NL). Advances in ANNs concern the modeling of grammatical competence, including recursive graphs [28] and a subset of context -free rules (a n b n ones) [50]. In contradiction with the innatist view, the same GPN generic mechanisms that are involved in pattern recognition and generation task can be brought into a syntactic play. In principle, the parallel processing inherent to connectionist systems is well adapted to handling the ambiguity of natural language, by keeping several possible interpretations of the same structure simultaneously active in memory; this notably replaces the usual backtracking of the algorithmic approach. In GPNs, all the pathways that can possibly represent an input structure remain active until the ambiguity resolves, possibly under the influence of semantic modules [23]. From the central modulator point of
In the GPN architecture adapted to this double-scanning strategy, spectral feature detectors converge towards a few ConVergence (CV) units that feed dissyllable detectors. Every pulse spatial origin is thus forgotten within the receptive field of a given CV unit, the generated information of which is purely temp oral. Four types of features have been extracted : channel onsets, offsets and maxima of energy from a narrow band spectral analysis, and onsets from a broad band analyzer, assuming two complementary spectral analyses in the peripheral auditory system [18]. The double-scanning has been tested in experiments where both GPN recognizer and human listener had to identify 696 stop consonant segments of gated duration (from 1 to 30 ms), uttered by four different speakers. As visible in Fig. 14, the GPN performance follows the perceptive data. Furthermore, the repetition effect is reproduced, as well as the performance variation due to the voiced/voiceless distinction [42].
14
d’après: IEEE Trans. On Neural Networks, Special Issue on Temporal Coding for Neural Information Processing, D. Wang, W.J.Freeman, R.Kozma, A.G.Lozowski, and A.A.Minai (Eds.), Vol. 15, N°5, pp. 963-979, September 2004.
x and y translation
y distortion
y
x distortion
x and y distortions y f x t T
x
ED1
scanning period
T
CV ED2
CV
T
τ11
ED1 y tolerance
τ12 τ13
x tolerance Response threshold
τ 21
Σ
ED2
τ 22 Σ Fig. 13. Input activity histograms of two elementary pattern detectors (ED1, ED2), while distorted versions of the learnt patterns are recognized by using a doublescanning technique. The gray dots in the matrixes at the top correspond to the learnt patterns; the black dots represent the instances that are currently recognized, each column of a matrix being scanned from the bottom to the top, one after the other from left to right (gray arrows in the top-right frame).
analysis has been cancelled too early during the sentence parsing; furthermore, the generated sentence does not obey a valid rule. Ambiguity can also be generated by noise. The worst case occurs when the misspelled version of a word exists in the source language (homonymy); this is known to rule out translators, because of their sensitivity to noise and their bottom-up strategy. A phrase like “il vint vers moi” (“he came to me”) may undergo the following transformations (underlined) without changing its pronunciation. “il vin_ (wine) vert (green) mois (month)” is correctly translated by the GPN system, whereas the standard translator generates “he green wine month”. Apart from ambiguity, the other feature observed by Chomsky for evidencing a predetermined NL parser was the existence of center-embedded constructions [51]. Contrary to right-embedded relative clauses such as: [This is the translator [that made errors [that customers noticed]]], center-embedded clauses require dynamic links to be operated between sentence pieces that may be separated by long expressions. For instance, the bold boundary words of the following sentence should be associated: [This translator [that customers [that I know] appreciate] is broken]. If a recursive rule that would account for such embedded structures can be stored directly in the computer knowledge base, its possible learning did not seem accessible, so far. The syntactic modules that are presented belong to a larger system (translator prototype) compatible with the modularity feature assumed by the innatist theory. Where
% Correct
90 80 70 60 50
Human listener
40 30 20 10 0
GPN recognizer
1
3
5
7
9
11 13 15 17 19 21 23 25 27 29 ms Gate Duration
Fig.14 (after [42] ). Recognition performance for 10 listeners one the onehand, a GPN on the other hand, responding to stop consonants, as a function of the segment duration.
view, the instantaneous spread of a module internal flow constitutes an ambiguity measure, helpful for regulating the module parameters. For instance, the French sentence beginning “la belle porte...” is ambiguous, since both “detadj-noun” and “det-noun-verb” are valid. “la belle porte une robe rouge” is accurately translated «the beauty wears a red dress » by the GPN translator (Fig. 15), whereas a reference translation system (Systran) gives “ the beautiful door a red dress”, showing that the correct
15
d’après: IEEE Trans. On Neural Networks, Special Issue on Temporal Coding for Neural Information Processing, D. Wang, W.J.Freeman, R.Kozma, A.G.Lozowski, and A.A.Minai (Eds.), Vol. 15, N°5, pp. 963-979, September 2004.
previous instance, tuned to phase Y-1, should continue. For participating in the previous parsing, the output pulse is shifted backward. It may be noticed that this forward/backward temporal shift of characteristic pulses in GPN e.p.u.s is analogous to the push/pull spatial management of parameters in the computer stack. A temporal counterpart to a space-oriented method is thus proposed. Apart from its compatibility with the rest of the functional architecture, this parallel method may appear more adapted to natural data than the all-or-none stack serial management:
our approach deviates from this view lies in the system homogeneity: the syntactic modules grow syntactic structures from examples, by using the same learning mechanisms as any other module of the full system, except that syntactic pathways are allowed to feed each other through internal loops. French
English
letters
letters
words
words
compound words
compound words
-
syn. clas ses sem. classes phrases structures syntactic structures
-
semantic frames syntactic structures
central modulator
Fig. 15. Architecture of the GPN French/English translator currently under study. There are 17 components, including a central modulator, and, inside each language modality: 5 modules and 3 banks of detectors. Syntactic classes and semantic frames pathways are built from the data contained in a French/English dictionary. Syntactic rules and their cross-language associations (not displayed) are learnt from examples chosen by a teacher. Thanks to the system homogeneity, each language modality can be used either in the recognition or production mode. The target -language word which is generated at a given time by the system is the one that receives the greatest facilitation from several origins : the source-language word module, semantic frames and syntactic modules of the target -language, knowing that each origin spreads facilitation towards a specific subset of several words of the target -language.
B. Temporal processing of syntactic structures Assuming the uniqueness of internal representations, this is the same memory pathway that will be activated every time its associated phrase structure (Noun Group, relative clause) occurs in a given sentence. The aforementioned synchrony coding is aimed at preventing interference between successive activations of the same pathway. In order to maintain the activation as long as parsing is not completed, the corresponding impulsive activity should be maintained on a periodic basis. A characteristic time is then a discrete, phase-locked, position of the pulse within the shortterm memory period. By shifting one step forward the phase to which e.p.u.s are sensitive after having responded, their possible next stimulation at this new phase will not interfere with their on-going activity. Furthermore, this method codes for the chronology of several instances of the same structure: the successive activations of a pathway are dynamically represented in the series of pulse signals it conveys, at several characteristic phases. Once an embedded instance of phrase is successfully parsed at phase Y, the parsing of the
on return from the recursive analysis of several embedded phrases, the end of a sentence is not always clearly linked with a single sentence beginning to be completed. Finding the correct reference could be facilitated by a semantic module.
This approach supports the learning of recursive sentence from chosen examples: A dictionary is used for building connections between class-detectors and every word when the pathway of the latter is built. The learning algorithm is similar to the one used with standard signals, except that the temporal criteria is more precise: this is the synchrony between pulses that will trigger a connection between a pathway output and possibly a CD unit of the same module. The training session starts with words, and then the system is given noun-phrases, and finally more complex structures. A relative clause is only taught by using the following series of sentences of the kind: “the eggs are big”, “the hen lays”, “that the hen lays”, “the eggs that the hen lays are big”. The last sentence only contains one level of recursion, but the parser is then capable of parsing in real-time several embeddings of this type, with a maximum depth that depends on the number of available phases (short-term memory span). Fig. 16 shows the activity of the CD units involved in the 5-phases parsing of a 3-embedded structure (a N-embedded structure would require 2N-1 phases). The generation of embedded sentences by the same network makes use of facilitating pulses that follow the same shifting rules as activation pulses. VII. CONCLUSION AND FUTURE PROSPECTS It was twenty years ago today, student Béroule visualized a play: a wide beam of rays was traveling in a dark space and was hit by a beam of a short-duration. This caused the direction of the large beam to change. the beam was hit for a second time, which caused its direction to change again.. This visualization may not seem relevant if considered out-of-context, but its meaning was clear for me at the time that I was a student in computer science. In the context of the associative memory model which was being developed at that time, it provided a solution to the retrieval of the right memory location under the influence of external stimuli (short duration beams), provided an internal flow of activation (the “wide beam”). When these initial concepts had to be formalized in order to address real-life tasks, it was recognized that a specific methodology for introducing temporal parameters had to be designed. Following the implementation of the initial “same time-of-arrival”
16
d’après: IEEE Trans. On Neural Networks, Special Issue on Temporal Coding for Neural Information Processing, D. Wang, W.J.Freeman, R.Kozma, A.G.Lozowski, and A.A.Minai (Eds.), Vol. 15, N°5, pp. 963-979, September 2004.
detectors, temporal parameters have been gradually added to the model whenever required. Time-delays allowed coincidence detectors to integrate speech time-space patterns. The duration of integrated signals provided robustness to variability along one pattern dimension, an alternative to the usual algorithmic time warping in the case of speech processing. The signals repetition factor induced robustness along a second dimension, provided a specific scanning of the input. Phase coding permitted the binding of multi-modal variables. The advantage of using these parameters in a coincidence detection architecture has, however, gone beyond the questions they were initially aimed at solving.
processing thus permits structures to be learnt.
recursive
(center-embedded)
- Thanks to the time-of-arrival coding, a GPN hardware implementation would inherently work in real-time, independently of the amount of memorized events. This feature is of prime necessity in the perspective of a machine that would be asked to take instantaneous decisions based on different pieces of knowledge, and produce movements accordingly. It is now recognized that pattern recognition, natural language understanding, perception-dependent generation, and ultimately, dialog handling, all require the mobilization of several representation modules at once. Designing VLSI coincidence detectors is now feasible. Unfortunately, the sprouting learning mechanism does not yet hold a hardware counterpart.
- Temporal coding can underlie psychological and neurobiological models. This paper has addressed : the saccades of the eyes in reading, the fast auditory perception of consonants, the impairment of movement production and motor learning through imitation. Following the neuromodulation study on Parkinson disease, a work currently in progress concerns the influence of emotion on memory retrieval. Future prospects could re-examine the learning by imitation strategy, with respect to the function of mirror neurons.
Although the research results reported in this paper tend to promote the temporal coding of real-world events as an alternative to the usual spatial coding of information, it also reveals that designing a complete machine based on coincidence detection is a work of time.
A CKNOWLEDGEMENTS
- In Pattern Recognition, the discrete time-space coding of continuous environmental signals is a prerequisite to coincidence detection, and can be exploited by using timedelays and long-duration signals. Once memorized by a perception module, a significant time-space configuration of events can be corrupted by noise (causing extra, missing and shifted events) without reducing recognition, a challenging feature in natural environments. The value of the repetition factor of the internal signal remains to be thoroughly investigated as a way to deal with pattern variations that occur in two dimensions at once.
I am very grateful to Dr. Claire T.-N., who kindly forwarded this Special Issue Call for Papers, to the anonymous reviewers for their helpful comments on the presentation of this article, and to my colleague and friend Dr. Mark Hoser, of Geneform Technologies, for his English expertise.
REFERENCES
[1] D. Béroule, "A propos de la dimension spatiale et de la dimension temporelle dans le support des représentations", 5èmes Journées Neurosciences et Sciences de l'Ingénieur, Aussois, 7-10 May 1990.
[2] P. A. Cariani, "Temporal coding of sensory information in the brain", Acoust. Sci. & Tech., 22(2), pp. 77-84, 2001.
[3] D. Béroule, "La dimension temporelle dans les systèmes connexionnistes", Intellectica, N°9-10, pp. 299-305, 1990.
[4] G. Willwacher, "Storage of a temporal pattern sequence in a network", Biological Cybernetics, N°43, pp. 115-126, 1981.
[5] C. von der Malsburg, "The Correlation Theory of Brain Function", Max-Planck Institute for Biophysical Chemistry, Internal Report 81-2, 1981.
[6] A. Waibel, T. Hanazawa, G. Hinton, K. Shikano, K. Lang, "Phoneme recognition using time-delay neural networks", IEEE Transactions on Acoustics, Speech and Signal Processing, 1988.
[7] K. P. Unnikrishnan, J. J. Hopfield, D. W. Tank, "Connected-Digit Speaker-Dependent Speech Recognition Using a Neural Network with Time-Delayed Connections", IEEE Transactions on Signal Processing, 39, pp. 698-713, 1991.
[8] W. J. Freeman, "Chaotic Oscillations and the Genesis of Meaning in Cerebral Cortex", in Temporal Coding in the Brain, G. Buzsáki et al. (Eds.), Springer-Verlag, Berlin Heidelberg, 1994.
[9] P. Laget, "Relations synaptiques et non-synaptiques entre les éléments nerveux", Masson, 1970.
[10] L. A. Jeffress, "A place theory of sound localization", J. Comp. Physiol. Psychol., 41, pp. 35-39, 1948.
[11] J. C. R. Licklider, "A duplex theory of pitch perception", Experientia, 7, pp. 128-133, 1951.
[12] W. Reichardt, "Autocorrelation, a principle for the evaluation of sensory information by the central nervous system", in Principles of Sensory Communication, W. Rosenblith (Ed.), John Wiley and Sons, New York, 1961.
[13] V. Braitenberg, "Is the cerebellar cortex a biological clock in the millisecond range?", Prog. Brain Res., 25, pp. 334-346, 1967.
[14] D. M. MacKay, "Self-organization in the time domain", in Self-Organizing Systems, M. C. Yovits, G. T. Jacobi, G. D. Goldstein (Eds.), Washington, D.C.: Spartan Books, pp. 37-48, 1962.
[15] M. Abeles, "Local Cortical Circuits: an Electrophysiological Study", Studies of Brain Function, Vol. 6, Springer-Verlag, 1982.
[16] Y. Burnod, "Cerebral Cortex and Behavioral Adaptation: a possible mechanism", Masson, 1988.
[17] C. Jacquemin, "A coincidence detection network for spatio-temporal coding: Application to nominal composition", in Proceedings of the 13th International Joint Conference on Artificial Intelligence (IJCAI'93), Chambéry, pp. 1346-1351, San Mateo: Morgan Kaufmann, 1993.
[18] D. Béroule, "Un Modèle de Mémoire Adaptative, Dynamique et Associative pour le Traitement Automatique de la Parole", Thèse de Doctorat, Orsay University, May 1985.
[19] D. Béroule, "Management of time distortions through rough coincidence detection", in Proceedings of the 1st European Conference on Speech Communication and Technology, Paris, 1989.
[20] D. Béroule, "Vers un Connexionnisme Cognitiviste ?", in Modèles et Concepts pour la Science Cognitive, M. Denis, G. Sabah (Eds.), Presses Universitaires de Grenoble, pp. 109-124, 1993.
[21] C. Toffano-Nioche, D. Béroule, J.-P. Tassin, "A Functional Model of some Parkinson's Disease Symptoms using a Guided Propagation Network", Artificial Intelligence in Medicine, N°14, pp. 237-258, 1998.
[22] D. Béroule, "The Adaptive, Dynamic and Associative Memory Model: a possible future tool for vocal Human-Computer Communication", in The Structure of Multimodal Dialogue, M. Taylor, D. G. Bouwhuis, F. Néel (Eds.), North-Holland, pp. 189-202, 1989.
[23] D. Béroule, "Traitement Connexionniste du Langage", in Histoire, Epistémologie et Langage, tome 11, fascicule 1: Sciences du Langage et Recherches Cognitives, F. Rastier (Ed.), 1991.
[24] V. Bloch, S. Laroche, "Facts and hypotheses related to the search of the engram", in Neurobiology of Learning and Memory, Guilford, New York, 1984.
[25] D. Béroule, "A model of dynamic, associative, distributed and adaptive memory" (in French), in AFCET conference "Hardware and software components and architectures for the 5th generation", Paris, 5-7 March, pp. 201-211, 1985.
[26] D. Béroule, "The Never-Ending Learning", in Neural Computers, R. Eckmiller, C. von der Malsburg (Eds.), Springer-Verlag, pp. 219-230, 1988.
[27] F. Nottebohm, "Reassessing the mechanisms and origins of vocal learning in birds", TINS, Vol. 14, N°5, pp. 206-211, 1994.
[28] P. Fletcher, "Connectionist learning of regular graph grammars", Connection Science, 13, N°2, pp. 127-188, 2001.
[29] J.-P. Tassin, "Norepinephrine/dopamine interactions in the prefrontal cortex and their possible roles as neuromodulators in schizophrenia", J. Neural Transm., suppl. 36, pp. 135-162, 1992.
[30] C. Toffano-Nioche, H. Ruellan, C. Ménigault, J.-C. Martin, A. Lainé, D. Béroule, "The Never-Ending Learning II", LIMSI Internal Report N°96-05, 1996.
[31] K. Rayner, S. Duffy, "Lexical complexity and fixation times in reading: Effects of word frequency, verb complexity and lexical ambiguity", Memory and Cognition, 14, pp. 191-201, 1986.
[32] H. Blanchard, A. Pollatsek, K. Rayner, "The acquisition of parafoveal word information in reading", Perception and Psychophysics, 46, pp. 85-94, 1989.
[33] D. Béroule, H. Ruellan, R. von Hoe, "A Guided Propagation Model of Reading", Instituut voor Perceptie Onderzoek (IPO) Annual Progress Report 28, Eindhoven, 1994.
[34] H. Ruellan, "The role of reinforcement in a reading model", in Cybernetics and Systems '96, Vol. 2, R. Trappl (Ed.), pp. 1090-1095, 1996.
[35] C. Toffano-Nioche, "Un Organe de Contrôle inspiré de la Neuromodulation pour les Réseaux à Propagation Guidée", Thèse de Doctorat, Orsay University, 1996.
[36] J. H. G. Williams, A. Whiten, T. Suddendorf, D. I. Perrett, "Imitation, mirror neurons and autism", Neuroscience & Biobehavioral Reviews, 25(4), pp. 287-295, 2001.
[37] J.-L. Schwartz, D. Béroule, "Essai de formalisation de faits et hypothèses de physiologie concernant le traitement de l'information pour la reconnaissance automatique de la parole", Proceedings of the 15th JEP, Aix-en-Provence, 1986.
[38] D. Béroule, "Guided Propagation inside a topographic memory", 1st Conference on Neural Networks, San Diego, 1987.
[39] J. Leboeuf, D. Béroule, "Un système connexionniste appliqué au traitement automatique de la parole", European Conference on Speech Technology, Edinburgh, 1987.
[40] P. Escande, D. Béroule, P. Blanchet, "Speech recognition experiments with Guided Propagation", Proceedings of the International Joint Conference on Neural Networks, Singapore, 1991.
[41] P. Westerlund, D. Béroule, M. Roques, "Experiments in Robust Parsing with a Guided Propagation Network", in New Methods in Language Processing, D. Jones (Ed.), UCL Press, London, pp. 96-112, 1996.
[42] A. Lainé, D. Béroule, P. Dermody, "Representation issues in spoken word recognition: an account for the repetition effect", Proceedings of the International Conference on Computational Linguistics, Speech and Document Processing, Calcutta, February 18-20, 1998.
[43] P. Blanchet, "Une architecture connexionniste pour l'apprentissage par l'expérience et la représentation des connaissances", Thèse de Doctorat, Orsay, 1992.
[44] P. Blanchet, "An architecture for representing and learning behaviour by trial and error", in From Animals to Animats 3, 3rd International Conference on Simulation of Adaptive Behaviour, Brighton, 1994.
[45] C. von der Malsburg, "How are nervous structures organised?", in Synergetics of the Brain, E. Basar, H. Flohr, H. Haken, A. J. Mandell (Eds.), Berlin: Springer-Verlag, pp. 238-249, 1983.
[46] V. Ajjanagadde, L. Shastri, "Rules and variables in neural nets", Neural Computation, 3, pp. 121-134, 1991.
[47] J.-C. Martin, D. Béroule, "Temporal Codes within a Typology of Cooperation Between Modalities", Artificial Intelligence Review, 9, pp. 95-102, 1995.
[48] J.-C. Martin, R. Veldman, D. Béroule, "Developing Multimodal Interfaces: A Theoretical Framework and Guided Propagation Networks", in Multimodal Human-Computer Communication, H. Bunt, R.-J. Beun, T. Borghuis (Eds.), Lecture Notes in Artificial Intelligence, Vol. 1374, Springer-Verlag, pp. 158-187, 1998.
[49] "Théories du langage, Théories de l'apprentissage: le débat entre Jean Piaget et Noam Chomsky", Paris: Éditions du Seuil, 1979.
[50] S. Levy, O. Melnik, J. B. Pollack, "Infinite RAAM: A Principled Connectionist Basis for Grammatical Competence", COGSCI 2000, IEEE Press, 2000.
[51] N. Chomsky, "Three models for the description of language", IRE Transactions on Information Theory, 2, pp. 113-124, 1956.
Dominique Béroule received his PhD degree in computer science from the University of Orsay, France, in 1985. He was a Research Fellow with the Eindhoven Institute for Perception Research (IPO), the Netherlands, in 1987, and a Visiting Scientist at the University of Sydney in 1997. Since 1988, he has held a Researcher position with the French National Center for Scientific Research (CNRS), where he is conducting the development of an associative memory model (Guided Propagation Networks) in the framework of Human-Machine Communication. His research activities range from long-term projects, including cognitive modeling, to industrial applications, with an emphasis on pattern recognition and natural language processing. The ultimate purpose is to design a generic parallel machine with human-like skills.