Predicting non-linear dynamics: a stable local learning scheme for recurrent spiking neural networks

Aditya Gilra¹,*, Wulfram Gerstner¹

¹ School of Computer and Communication Sciences, and Brain-Mind Institute, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne 1015, Switzerland.
* Correspondence: [email protected]

arXiv:1702.06463v1 [q-bio.NC] 21 Feb 2017
Abstract

Brains need to predict how our muscles and body react to motor commands. How networks of spiking neurons can learn to reproduce these non-linear dynamics, using local, online and stable learning rules, is an important, open question. Here, we present a supervised learning scheme for the feedforward and recurrent connections in a network of heterogeneous spiking neurons. The error in the output is fed back through fixed random connections with a negative gain, causing the network to follow the desired dynamics, while an online and local rule changes the weights; hence we call the scheme FOLLOW (Feedback-based Online Local Learning Of Weights). The rule is local in the sense that weight changes depend on the presynaptic activity and the error signal projected onto the post-synaptic neuron. We provide examples of learning linear, non-linear and chaotic dynamics, as well as the dynamics of a two-link arm. Using the Lyapunov method, and under reasonable assumptions and approximations, we show that FOLLOW learning is uniformly stable, with the error going to zero asymptotically.
1 Introduction
How do we learn the motor tasks required in life, from taking our first step, to holding a pen, to chopping vegetables, to skiing, or even the one-handed tennis backhand? These tasks require learning dynamical models of our muscles, our body, and the outside world (see for example reviews [Pouget and Snyder, 2000, Wolpert and Ghahramani, 2000, Lalazar and Vaadia, 2008]). During pre-natal [Khazipov et al., 2004] and post-natal development [Petersson et al., 2003], and even in adulthood [Wong et al., 2012, Hilber and Caston, 2001], humans and other animals learn how their muscles and body move in response to neural motor commands (sometimes randomly generated as twitches, especially during development), by receiving proprioceptive and multi-sensory feedback [Lalazar and Vaadia, 2008, Wong et al., 2012, Sarlegna and Sainburg, 2009, Dadarlat et al., 2015]. Similarly, humans also learn models of the external world [Davidson and Wolpert, 2005, Zago et al., 2005, Zago et al., 2009, Friston, 2008]. Here, we ask how a non-linear dynamical system, like the muscles-body system, may be learned by a network of heterogeneous spiking neurons, in an online and synaptically local manner. In particular, we explore how to learn the forward predictive model of an arm [Pouget and Snyder, 2000] shown in Figure 1.
[Figure 1 schematic: a motor command generator sends motor commands to the muscles+body dynamics (true state) and, in parallel, to the forward predictive model (predicted state); visual and proprioceptive feedback undergo multi-sensory integration, and the error in the predicted state feeds back to modify the forward predictive model.]
Figure 1: Motor learning illustrated in a two-link arm. A motor learning configuration (adapted from [Pouget and Snyder, 2000]). The forward predictive model (in bold) is to be learned. During learning, random motor commands (motor babbling) from a motor command generator cause movements in the muscles and body. The same motor commands are also sent to the forward predictive model in the brain, which predicts the positions and velocities (state variables) of the limbs. Multi-sensory feedback relays back the true positions and velocities of the limbs. The deviation of the predicted positions and velocities from the true positions and velocities is used as error feedback, to learn and refine the forward predictive model via the FOLLOW scheme presented in this article. Learning to generate the motor commands (control input) that yield a desired trajectory, i.e. the inverse model, is left for future work.
Ideally, in order to be biologically plausible, a learning rule must be online, i.e. constantly incorporating new data, as opposed to requiring the data all at once or in batches; and local, i.e. the quantities that modify the weight of a synapse must be available locally at the synapse. Stability of the learning dynamics and a guarantee of convergence to the requisite system would also be desirable. However, supervised learning of recurrent weights to predict or generate nonlinear dynamics, given command input, is known to be difficult in networks of rate units, and even more so in networks of spiking neurons (see [Abbott et al., 2016] for a review). Learning the recurrent weights via gradient descent on the mean squared error, for example in artificial neural networks via backpropagation through time (BPTT) [Rumelhart et al., 1986] or via real-time recurrent learning (RTRL) [Williams and Zipser, 1989], has problems of stability due to bifurcations [Doya, 1992]. Moreover, BPTT is non-local in time (off-line), while RTRL is non-local in space. For reviews, see [Pearlmutter, 1995, Jaeger, 2005]. Further, these algorithms suffer from vanishing gradients and other issues [Bengio et al., 1994, Hochreiter et al., 2001]. These have been alleviated to some extent by using Long Short-Term Memory (LSTM) units [Hochreiter and Schmidhuber, 1997], but these units have no simple analogue among realistic neurons. Another approach has been to use a reservoir of recurrently connected, non-linear, rate units or spiking neurons, which receives feedforward input [Jaeger, 2001, Maass et al., 2002, Legenstein et al., 2003, Maass and Markram, 2004, Jaeger and Haas, 2004, Joshi and Maass, 2005, Legenstein and Maass, 2007]. Due to random recurrent connections satisfying certain conditions, the units have sufficiently rich activity dynamics, while maintaining a decaying memory of the input, so that any low-dimensional functional of the input can be decoded linearly from their activities. The initial proposal of the cited works was to compute only the read-out weights, not the recurrent weights, minimizing the mean squared error. Thus, they avoided stability issues arising with recurrent learning. The FORCE algorithm was developed for a network of continuous-variable rate neurons that also learned the recurrent weights [Sussillo and Abbott, 2009, Sussillo and Abbott, 2012]. These rules are online, and a (slower) local variant exists, but they were designed for rate units, whose rates, unlike those of spiking neurons, can go negative, even though implementing FORCE learning with spiking neurons has seen recent progress [DePasquale et al., 2016, Thalmeier et al., 2016, Nicola and Clopath, 2016]. At the start of FORCE learning, large weight changes are required at a rate faster than the dynamics being learned, which raises the question of biological plausibility. In addition, for multi-dimensional outputs, the error for each dimension must be fed back to a separate sub-network. A local, online learning rule, in a balanced spiking network, has been shown to learn linear dynamics [Bourdoukan and Denève, 2015]. For a non-linear dynamical system, the recurrent weights in a heterogeneous, spiking network have typically been computed offline [Eliasmith, 2005]; or, if online, the learning has been limited to simple linear systems [MacNeil and Eliasmith, 2011].
Oculomotor integrators and compensators have been learned using rules with convergence proof via the Lyapunov method, albeit using linear rate units not spiking neurons [Porrill et al., 2004, Turaga et al., 2006]. None of the above learning schemes to our knowledge incorporates all of the following: learning non-linear dynamics, implementation with spiking neurons, synaptic locality, online updating, bio-plausible time scale of weight evolution, and stability or convergence guarantee.
Here, we demonstrate a Lyapunov stable scheme for how a recurrently connected network of heterogeneous deterministic spiking neurons may learn to mimic a low-dimensional non-linear dynamical system, with a local and online learning rule. We show that feedforward and recurrent weights in a network of spiking neurons can be trained to mimic a non-linear dynamical system, while keeping the readout weights constant. Our learning rule is supervised, and requires access to the error in the observable outputs. The output errors are fed back with random, but fixed feedback weights. Given a set of fixed error-feedback weights, it is synaptically local, by combining presynaptic activity with the local postsynaptic error variable. Thus, it would be easy to implement in a biological substrate. We implement the scheme in a recurrently connected network of heterogeneous spiking neurons and show examples of learning linear, non-linear and chaotic dynamical systems, as well as a two-link model of vertical arm dynamics. We further prove uniform global stability of the learning system, using the Lyapunov method, with the error tending to zero over time, in the limit of approximating the target dynamics by a large-enough spiking network. See Discussion for comparisons of our scheme with others.
2 Results
2.1 Task and overview
We want a spiking neuronal network to learn a forward predictive model (Fig. 1) of a nonlinear dynamical system (muscles-body) of the form:
$$\frac{dx_\alpha(t)}{dt} = f_\alpha(\vec{x}(t)) + g_\alpha(\vec{u}(t)), \qquad (1)$$
where $\vec{x}$, indexed by $\alpha = 1, \ldots, N_d$, is a vector of observable state variables, $\vec{u}(t) \in \mathbb{R}^{N_d}$ is the input, and $\vec{f}$ and $\vec{g}$ are vectors whose components are arbitrary non-linear functions $f_\alpha$ and $g_\alpha$ respectively. In view of the motor learning task of Figure 1, we also refer to $\vec{u}$ as the motor command, and to $\vec{x}$ as the positions and velocities. We also consider a more general form of the dynamical system:
$$\frac{d\vec{y}(t)}{dt} = \vec{h}(\vec{y}(t), \vec{u}(t)), \qquad \vec{x}(t) = \vec{k}(\vec{y}(t)). \qquad (2)$$
This allows us to include mixed terms involving both the state variables and the input, such as terms involving the state-dependent inertia matrix of a two-link arm (see Methods), which cannot be put into the form of (1). $\vec{h}$ and $\vec{k}$ can be arbitrary non-linear functions of different dimensionalities. In Supplementary subsection 7.3, we will show that learning the general system (2) can be reduced, via a specific network architecture, to a form partly similar to (1), which then permits our scheme to be applied. There are two important insights to our learning scheme. Firstly, non-linear low-dimensional functions can be approximated by linear decoding from a basis of non-linear tuning curves of a large number of heterogeneous neurons (Fig. 2B; see Supplementary subsection 7.1). These heterogeneous neurons formed two layers in our network (Fig. 2A). Feedforward weights from the first layer of neurons to the second, and recurrent weights amongst the second layer of neurons, learned to perform approximate transformations corresponding to the functions $\vec{g}$ and $\vec{f}$ in the reference dynamics of equation (1). The output $\hat{\vec{x}}$ was linearly decoded from the filtered neural spike trains of the second layer.
Secondly, during the learning phase, random time-dependent vector input $\vec{u}$, corresponding to motor babbling, was given to both the reference system (muscles-body) and the spiking network. Borrowing from adaptive control theory [Narendra and Annaswamy, 1989, Ioannou and Sun, 2012], the neurons of the second layer received, through fixed feedback weights with a large negative gain, the error defined as the difference between the output $\hat{\vec{x}}$ and the desired state variables $\vec{x}$ (positions and velocities). This negative feedback drove the neurons to produce firing close to ideal from the start. The feedforward and recurrent weights of the second layer were adjusted online with a local delta rule, involving the product of the error projected onto the post-synaptic neuron and the presynaptic firing. Since the error feedback causes the network output to follow the desired dynamics at all times, enabling online and local learning, we call our scheme FOLLOW (Feedback-based Online Local Learning Of Weights). At the end of the learning phase, which typically lasted 10000 s, different test inputs were provided to both the reference and the network, but without any error feedback. If the network reproduced the state variables of the reference system without any supervision, we say that it was performing as a forward predictive model. Specific dynamical systems, namely a linear, a non-linear, a chaotic, and a two-link arm model, were learned separately. For the two-link arm, the network was tested on a reaching task and a swinging task.
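For concreteness, the reference systems used below are of the form of equation (1). The following sketch integrates one such reference (a van der Pol oscillator with a purely additive input term) by forward Euler; the particular choice of $f$ and $g$, the constants, and the function names are illustrative assumptions, not the exact reference or parameters used in our simulations (see Methods).

```python
import numpy as np

def f(x, mu=2.0):
    """Drift f(x) of eq. (1); here a van der Pol oscillator (illustrative choice)."""
    return np.array([x[1], mu * (1.0 - x[0] ** 2) * x[1] - x[0]])

def g(u):
    """Input term g(u) of eq. (1); here simply additive (identity, illustrative)."""
    return u

def simulate_reference(u_of_t, T=10.0, dt=1e-3, x0=(0.0, 0.0)):
    """Forward-Euler integration of dx/dt = f(x) + g(u(t))."""
    steps = int(T / dt)
    x = np.array(x0, dtype=float)
    traj = np.empty((steps, x.size))
    for n in range(steps):
        x = x + dt * (f(x) + g(u_of_t(n * dt)))
        traj[n] = x
    return traj

# Different constant drives give different trajectories of the same reference system.
traj = simulate_reference(lambda t: np.array([0.0, 0.5]))
```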
2.2 FOLLOW learning scheme
We learned a dynamical system of the form (1) by learning feedforward and recurrent weights in the two-layer network architecture depicted in Figure 2A. Only the weights onto and within the second layer were learned. The first layer of heterogeneous leaky integrate-and-fire (LIF) neurons (see Methods) was provided the low-dimensional motor command input $\vec{u}$, with $N_d$ components $u_\alpha$. The current input to a neuron with index $k$ in the feedforward layer was
$$J_k^{\text{ff}} = \nu_k^{\text{ff}} \sum_\alpha e^{\text{ff}}_{k\alpha} u_\alpha + b_k^{\text{ff}}, \qquad (3)$$
where $e^{\text{ff}}_{k\alpha}$ were constant weights, and $b_k^{\text{ff}}$ and $\nu_k^{\text{ff}}$ were neuron-specific constants for bias and gain respectively (see Methods). We use Greek letters for the low-dimensional variable indices and Latin letters for neuronal indices, with summations going over the full range of the indices. The number of neurons $N$ in each layer was much larger than the dimensionality of the input to be represented, i.e. $N \gg N_d$. The second layer of neurons received feedforward connections from the first layer, recurrent connections from within the second layer, and error feedback. The input current to a neuron with index $i$ in the recurrent layer was
$$J_i = \nu_i \left( \sum_k w^{\text{ff}}_{ik} (S^{\text{ff}}_k \ast \kappa)(t) + \sum_j w_{ij} (S_j \ast \kappa)(t) + k \sum_\alpha e_{i\alpha} (\epsilon_\alpha \ast \kappa)(t) \right) + b_i, \qquad (4)$$
where $w^{\text{ff}}_{ik}$ and $w_{ij}$ were the feedforward and recurrent weights, respectively, which were both subject to our synaptic learning rule, whereas $e_{i\alpha}$ were fixed error-encoding weights. The spike trains travelling along the feedforward path, $S^{\text{ff}}_k$, and those within the recurrent network, $S_j$, were both low-pass filtered (denoted by $\ast$) at the synapses with an exponential filter $\kappa$ with a time constant of 20 ms.
Figure 2: FOLLOW learning architecture. A. Learning of feedforward and recurrent weights to mimic a non-linear dynamical system. Neurons of the feedforward layer (no recurrent connections) received the continuous command input $\vec{u}$ as a current weighted by random weights $e^{\text{ff}}_{k\alpha}$. The spiking output $S^{\text{ff}}_k$ of this layer was fed as a synaptic current via feedforward weights $w^{\text{ff}}_{ik}$ into the neurons of the recurrent network / layer, which had recurrent connections with weights $w_{ij}$. Decoding weights $d^{(x)}_{\beta i}$ decoded the filtered spiking activity of the recurrent network as the estimated state $\hat{x}_\beta(t)$. The error in the predicted state from the true state was fed back with a large negative gain $k$ into the recurrent network with encoding weights $e_{i\beta}$ (dashed lines). These error encoders and the output decoders formed an auto-encoder. The fed-back error current was used to update the feedforward and recurrent weights. B. The neurons in the feedforward and recurrent layers were heterogeneous, with different tuning curves (random gains / encoders / biases). Illustrative tuning curves of a few neurons to a one-dimensional input $u$ are plotted. C. A cartoon depiction of feedforward, recurrent and error currents entering a neuron $i$ in the recurrent network. The error current and the neural spike train $S_i$, after slow (200 ms) and fast (20 ms) filtering respectively, were used to update the feedforward and recurrent weights. Here, in a possible configuration, the error current enters the primary (vertical) dendrite so that it is available for synaptic weight updates, isolated from the somatic current (here shown via basal dendrites) (see Discussion for another scheme).
The parameters $b_i$ and $\nu_i$ were neuron-specific constants for the bias and gain, respectively. The constant $k$ will be called the feedback gain. The errors $\epsilon_\alpha$ will be introduced now. Since we are interested in a supervised learning scheme with several output variables, we have several error components, with component index $\alpha$. Let us denote the target output of component $\alpha$ of the ideal dynamical system by $x_\alpha$ and the actual output of our recurrent network for this component by $\hat{x}_\alpha$. The error in component $\alpha$ is defined as the difference $\epsilon_\alpha = \hat{x}_\alpha - x_\alpha$ and can be positive or negative. The output $\hat{x}_\alpha$ of the recurrent network is
$$\hat{x}_\alpha = \sum_j d^{(x)}_{\alpha j} (S_j \ast \kappa)(t), \qquad (5)$$
where the sum runs over all neurons in the recurrent second layer. The decoding weights $d^{(x)}_{\alpha j}$ are assumed to be matched to the error-encoding weights $e_{i\alpha}$ so as to implement an auto-encoder loop: an arbitrary value $\epsilon_\alpha$ sent via the error-encoding weights to the recurrent network and read out, from its $N$ neurons, via the decoding weights gives (approximately) back $\epsilon_\alpha$. We assume that such an auto-encoder has been pre-trained by a suitable (and biologically plausible) algorithm; see Discussion. Further, in subsection 2.6.2, we show that learning is robust to deviations from the auto-encoder. Our aim is to learn the feedforward and recurrent weights using a learning scheme that: (i) is synaptically local; (ii) is online; (iii) uses an error in the observable state rather than its derivative; (iv) learns non-linear dynamics; and (v) is Lyapunov stable with error tending to zero. Thus we overcome some of the drawbacks of previous learning schemes. Our derivation and proofs can be found in Methods sections 4.1 and 4.2. The learning rule on the feedforward and recurrent weights, for learning with error feedback, is derived in Methods subsection 4.1, simultaneously demonstrating Lyapunov stability, to be:
$$\dot{w}^{\text{ff}}_{ik} = -\eta \left( \sum_\alpha e_{i\alpha}\, \epsilon_\alpha \ast \kappa \right) (S^{\text{ff}}_k \ast \kappa)(t),$$
$$\dot{w}_{ij} = -\eta \left( \sum_\alpha e_{i\alpha}\, \epsilon_\alpha \ast \kappa \right) (S_j \ast \kappa)(t), \qquad (6)$$
where $\eta$ is the learning rate. For a post-synaptic neuron $i$, the same error term appears for all its synapses, while the pre-synaptic factor is synapse-specific. This learning rule is online and uses an error in the observable state rather than its derivative. Furthermore, it is synaptically local, since the required quantities are available at the synapse in the post-synaptic neuron. First, we imagine that the post-synaptic error current (third term in eqn (4)) in the apical dendrite stimulates a messenger molecule that quickly diffuses or is actively transported into the short basal dendrites where the synapses from feedforward and recurrent input are located, as depicted in Figure 2C. Consistent with the picture of a messenger molecule, we low-pass filter the error current with a 200 ms time constant, much longer than the synaptic time constant of 20 ms. Second, filtered information about pre-synaptic spike arrival, $S_j \ast \kappa$, is available at each synapse, possibly localized in individual spines. Hence the learning rule is Hebbian-like, as it involves the product of pre- and post-synaptic terms. Thus, we have addressed goals (i)-(iii) above. For a critical evaluation of the notion of a 'local rule', we refer to the Discussion.
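A minimal vectorized sketch of the update in equation (6): per time step, both weight matrices change by an outer product of the (slowly filtered) error projected onto each post-synaptic neuron and the filtered presynaptic spike trains. The array shapes, learning rate and time step below are illustrative assumptions.

```python
import numpy as np

def follow_update(w_ff, w_rec, e_enc, eps_filt, S_ff_filt, S_rec_filt,
                  eta=1e-4, dt=1e-3):
    """One Euler step of the FOLLOW rule, eq. (6) (shapes and rates are illustrative).

    e_enc      : (N, Nd) fixed error-encoding weights e_{i,alpha}
    eps_filt   : (Nd,)   slowly (200 ms) filtered error components
    S_ff_filt  : (N,)    filtered feedforward presynaptic spike trains
    S_rec_filt : (N,)    filtered recurrent presynaptic spike trains
    """
    post_err = e_enc @ eps_filt                         # error projected onto neuron i
    w_ff -= eta * dt * np.outer(post_err, S_ff_filt)    # feedforward weights w^ff_{ik}
    w_rec -= eta * dt * np.outer(post_err, S_rec_filt)  # recurrent weights w_{ij}
    return w_ff, w_rec
```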
Intuitively, the FOLLOW learning scheme stands on two principles. First, arbitrary transformations corresponding to $\vec{g}(\vec{u})$ and $\vec{f}(\hat{\vec{x}})$ of the encoded variables $\vec{u}$ and $\hat{\vec{x}}$ can be approximated by linearly decoding the spike trains of heterogeneous LIF neurons via the feedforward and recurrent weights, respectively. Second, the error feedback with large negative gain forces the second layer neurons to fire such that they approximate the desired output $\vec{x}$ at all times.
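The effect of the second principle can be seen in a one-dimensional caricature, stripped of spiking neurons: the 'network' below has none of the correct internal dynamics (as with zero initial weights), yet its output tracks the reference once the feedback gain is made large and negative. All constants here are illustrative assumptions.

```python
import numpy as np

tau_s, dt, k = 20e-3, 1e-4, -100.0        # synaptic time constant, step, feedback gain (k << 0)
t = np.arange(0.0, 2.0, dt)
u = 0.5 * np.sign(np.sin(2 * np.pi * t))  # a square-wave command input

x, xhat, err = 0.0, 0.0, np.empty_like(t)
for n in range(t.size):
    x += dt * (-x + u[n])                          # reference: a leaky integrator of u
    xhat += dt / tau_s * (-xhat + k * (xhat - x))  # 'network' output with zero learned terms
    err[n] = xhat - x

print("max |error| after transient:", np.abs(err[t > 0.2]).max())
```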
2.3 Spiking networks learn target dynamics via FOLLOW learning
We wondered whether the FOLLOW scheme would enable the network to learn various dynamical systems of increasing complexity, from linear to chaotic. For each dynamical system, we had a learning phase and a testing phase. During each phase, we provided time-varying input to both the network (Fig. 2A) and the reference system. The spiking output of the network was decoded as $\hat{\vec{x}}$ in (5) using pre-learned decoders. The error signal $\vec{\epsilon}$ was generated by subtracting the reference $\vec{x}$ given by (1) from the decoded $\hat{\vec{x}}$. The error was fed back as currents to the neurons of the recurrent layer with negative gain $k$ and via fixed encoding weights $e_{i\beta}$, as shown in Figure 2A. The reference $\vec{x}$ was filtered before computing the error, with an exponentially decaying filter with time constant $\tau_s = 20$ ms, to match the synaptic filtering of the feedforward layer. During the learning phase, the feedforward and recurrent weights were initially set to zero, and then updated according to the FOLLOW learning rule (6). During learning, the mean squared error, where the mean was taken over the number of dynamical dimensions $N_d$ and over a duration of a few seconds depending on the system, decreased. We chose to stop the learning phase, i.e. weight updating, once the error had approximately plateaued and the network performed satisfactorily by eye, typically after approximately 10000 s. In the testing phase, the weights were held at the values reached at the end of the learning phase, and time-varying input was again provided to both the network and the reference. Also, the feedback of the error current to neurons of the second layer was terminated. Even without error feedback, the network output now approximately matched the reference for a short duration, of the order of a few seconds. This duration would determine the typical time horizon for planning a task for which the network is used as a forward predictive model, without visual or proprioceptive feedback. For ongoing tasks, in contrast to task planning, we expect the error feedback to continue even after learning is stopped; otherwise the state variables integrate noise and start to diverge from the reference, as seen here. Thus, a biological network with visual or proprioceptive feedback will perform better in an ongoing task than what we show in our testing without feedback.
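The exponential filtering applied to the reference (and implicitly by every synapse to its spike train) can be sketched in discrete time as follows; the time step is an illustrative assumption.

```python
import numpy as np

def exp_filter(signal, tau=20e-3, dt=1e-3):
    """Causal exponential filter with time constant tau, as applied to the reference
    before computing the error (and by each synapse to its spike train)."""
    signal = np.asarray(signal, dtype=float)
    out = np.zeros_like(signal)
    for n in range(1, len(signal)):
        out[n] = out[n - 1] + dt / tau * (signal[n] - out[n - 1])
    return out
```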
2.3.1 Linear system – decaying oscillations around varied operating points
We first implemented our FOLLOW learning scheme on a spiking network with 2000 neurons in the feedforward layer and 2000 neurons in the recurrent layer, to learn a 2-dimensional decaying oscillator (see Methods), with oscillatory and decay time scales much larger than the membrane or synaptic time scales of individual neurons (Fig. 3). Random time-varying input was provided to the reference and the network. The time-varying input comprised 2-dimensional step functions at two time scales, one changing step value every 50 ms and a base step-height changing every 2 s, with their amplitudes chosen uniformly from within a fixed range (see Methods).
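A sketch of such a two-time-scale random step input, with a pedestal redrawn every 2 s and a faster step redrawn every 50 ms; the amplitude ranges and dimensionality below are illustrative assumptions (the exact values are in Methods).

```python
import numpy as np

def two_timescale_steps(T=20.0, dt=1e-3, n_dim=2, fast=0.05, slow=2.0,
                        amp_fast=0.1, amp_slow=0.1, rng=None):
    """Random step input: a slow pedestal (redrawn every `slow` s) plus fast steps
    (redrawn every `fast` s), each uniform within an illustrative range."""
    rng = rng if rng is not None else np.random.default_rng(0)
    steps = int(T / dt)
    u = np.empty((steps, n_dim))
    pedestal = np.zeros(n_dim)
    step = np.zeros(n_dim)
    for n in range(steps):
        if n % int(slow / dt) == 0:
            pedestal = rng.uniform(-amp_slow, amp_slow, n_dim)
        if n % int(fast / dt) == 0:
            step = rng.uniform(-amp_fast, amp_fast, n_dim)
        u[n] = pedestal + step
    return u

u_train = two_timescale_steps()
```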
Figure 3: Learning linear dynamics via FOLLOW: 2D decaying oscillator. A-C. We plot network variables versus time: without and with error feedback at the start of learning (left panels); in the last 2 s of learning (first half of middle panels); and during testing without error feedback (second half of middle panels, and right panels). Weight updating and error current feedback were both turned on after the vertical red line on the left at the start of learning, and turned off after the vertical red line in the middle at the end of learning. A. The first component of the input, $u_1$. B. First component of the learned dynamical variable $\hat{x}_1$ (red) decoded from the network, and the filtered reference $x_1$ (blue). To the left of the red vertical line in the left panel, feedback was off, resulting in no output. To the right, feedback was turned on, resulting in the output tracking the reference, and learning of weights. At the end of the learning phase (middle panel), feedback and learning were turned off, while the output continued to track the reference. The output also tracked the reference with an input different from the training input (right panel). C. First component of the error $\epsilon_1 = \hat{x}_1 - x_1$ between the output and the reference. D. Mean squared error per dimension, averaged over 2 s blocks, on a log scale, evolving during learning with feedback on, from 2 s to 9992 s. E. A few randomly selected weights are shown evolving during learning. F. Histogram of weights after learning. A few strong weights $|w_{ij}| > 1$ are out of bounds and not shown here (cf. panel E).
Figure 4: Learning non-linear dynamics via FOLLOW: the van der Pol oscillator. Layout and legend of panels A-F are analogous to Figure 3A-F.

In Figure 3, at the start of the learning phase, we blocked error feedback for 2 s, hence the output remained zero, due to the zero feedforward and recurrent weights. Once the error feedback with large negative gain was turned on, the feedback forced the network to follow the reference. Thus, with feedback, the error was low even at the start of learning (Fig. 3C-D). With learning, the error dropped over time (Fig. 3D). After having stopped learning at 10000 s, we found the weight distribution to be uni-modal with a few very large weights (Fig. 3E-F). During the testing phase without error feedback, the network reproduced the reference trajectory of the dynamical system for a few seconds, in response to the same kind of input as during learning. We also tested the network on a different input, viz. a ramp for 2 s followed by a step to a constant non-zero value, i.e. a different operating point. The ramp can also be construed as a preparatory input before initiating an oscillatory movement, in a similar spirit to that observed in (pre-)motor cortex [Churchland et al., 2012]. For such input too, the network tracked the reference for a few seconds (Fig. 3A-C).
2.3.2 Non-linear – van der Pol oscillator with varied periods
Our FOLLOW learning scheme also enabled a network with 3000 neurons in the feedforward layer and 3000 neurons in the recurrent layer to reproduce the non-linear 2-dimensional van der Pol oscillator (Fig. 4). Here too, we used a two-time-scale random step input, with amplitudes changing every 50 ms and every 4 s, drawn uniformly from an interval (see Methods). We wanted our network to learn not just the dynamics around the oscillator's limit cycle, to which nearby trajectories are attracted, but also the transient starting from rest. Thus, after every 4 s, we reset the reference and network outputs and the error to zero, so that the oscillator could start anew. As for the linear case, the system learned to track the reference system for a few seconds for different inputs (Fig. 4A-C). Error evolution and weight distributions were similar to the linear case (Fig. 4D-F), though the mean squared error at the end of learning for 10000 s was much larger here than in the linear case (Fig. 3D), despite a slightly larger number of neurons, probably due to the non-linear dynamics. In particular, a small phase shift of the non-linear oscillator gives rise to large errors, even though the qualitative behaviour is correct. In the testing phase, a sharp square pulse as initial input caused the network to track the reference, as shown in Figure 4A-C, right panels. More importantly, given different constant inputs, the network approximately reproduced the different periods of the van der Pol oscillator, for example a short and a long period in Figure 4A-C, right panels. This is an advantage of learning the dynamical system itself, as here (albeit in a limited region of the configuration space), rather than learning just a stereotypical trajectory.
2.3.3 Non-linear – chaotic Lorenz system
Our FOLLOW scheme also enabled a network, with 5000 neurons each in the feedforward and recurrent layers, to learn the 3-dimensional non-linear chaotic Lorenz system with its characteristic strange attractor (Fig. 5). Here, we provided a control input consisting of a start pulse in a random direction at the very beginning of the learning phase, and zero afterwards. Thus, the network learned just the dynamics of the autonomous attractor with zero input. During the testing phase without error feedback, since the system is chaotic, minor differences led to differing trajectories of the network and the reference (Fig. 5A-C). However, the network obeyed the same underlying strange-attractor dynamics, as seen from the butterfly shape of the attractor in configuration space, and from the tent map of each successive maximum versus the previous maximum (Fig. 5D,E). The tent map generated from our network dynamics (Fig. 5E) has lower values for the larger maxima compared to the reference tent map, due to the filtering of the reference by an exponentially decaying filter. However, very large outliers like those seen in a network trained by FORCE [Thalmeier et al., 2016] are absent. We repeated the learning without filtering the Lorenz reference signal (Supplementary Fig. S1), and found that the mismatch was reduced, but a doubling appeared in the tent map which had been almost imperceptible with filtering (Fig. 5E).
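The tent map of Figure 5E is obtained by collecting successive local maxima of the third state variable and plotting each maximum against the previous one. A minimal sketch is below; the Lorenz parameters, integration step and initial condition are illustrative assumptions.

```python
import numpy as np

def lorenz_traj(T=100.0, dt=1e-3, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Forward-Euler trajectory of the Lorenz system (standard parameters assumed)."""
    steps = int(T / dt)
    xyz = np.array([1.0, 1.0, 20.0])
    traj = np.empty((steps, 3))
    for n in range(steps):
        x, y, z = xyz
        xyz = xyz + dt * np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])
        traj[n] = xyz
    return traj

def local_maxima(z):
    """Values of z at its local maxima (strictly greater than both neighbours)."""
    idx = np.where((z[1:-1] > z[:-2]) & (z[1:-1] > z[2:]))[0] + 1
    return z[idx]

m = local_maxima(lorenz_traj()[:, 2])
tent_map_points = np.stack([m[:-1], m[1:]], axis=1)   # (max_n, max_{n+1}) pairs
```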
2.4 FOLLOW enables learning a two-link planar arm model under gravity
To consider a task closer to real life, we next wondered whether a spiking network could also learn the dynamics of a two-link arm via the FOLLOW scheme. We used a two-link arm model adapted from [Li, 2006] as our reference. The dynamics of the two-link arm belong to the general system given by equation (2), rather than equation (1).
Figure 5: Learning chaotic dynamics via FOLLOW: the Lorenz system. Layout and legend of panels A-C are analogous to Figure 3A-C. The network was started off with a random pulse, but a later 40 s period with zero input is plotted in the rightmost panels of A-C. D. The trajectories of the reference (left panel) and the learned network (right panel) are shown in state / configuration space for 80 s with zero input during the testing phase, forming the well-known Lorenz attractor. E. Tent map, i.e. each local maximum of the third component of the reference signal (blue) / network output (red) plotted versus the previous local maximum, for 800 s of testing with zero input. The reference is plotted with filtering in panels A-C, but unfiltered for the strange attractor (panel D, left) and the tent map (panel E, blue).
The general system of equation (2) cannot be approximated by the feedforward and recurrent network architecture used until now, as the input term is not added to, but mixed with, the recurrent term in the time derivative. Thus, we used a modified network architecture with only one layer and only recurrent weights, as described in Supplementary Section 7.3 and schematized in Supplementary Figure S4. The two links in the model correspond to the upper arm and the forearm. The arm was constrained to move in the vertical plane under gravity, while torques were applied directly at the two joints, as a proxy for the action of muscles. We limited the motion of each link by applying a counter-torque to a joint if it rotated more than 90 degrees to either side of the rest position, so as to avoid full rotations around the elbow or shoulder. The dynamical system representing the arm is four-dimensional, with the state variables being the two joint angles and the corresponding two angular velocities (see Methods section). The network must integrate the torques to obtain the angular velocities, which in turn must be integrated to obtain the angles. Learning these dynamics is difficult due to these sequential integrations involving complicated functions of the state variables and the input. Similar to the previous examples, random input torque, with amplitudes of pulse and pedestal changing at 50 ms and 1 s time intervals, was provided to each joint during the learning phase. The pulse and pedestal were linearly interpolated between consecutive values for smoother dynamics. In Figure 6 and Supplementary Videos 1 and 2, we show that the general FOLLOW scheme learns to reproduce the arm dynamics without error feedback for a few seconds during the test phase. To assess the generalization capacity of the network, we tested it on a reaching task and an acrobot-inspired swinging task [Sutton, 1996]. In the reaching task, torque was provided to both joints to enable the arm tip to reach beyond a specific (x, y) position from rest. The arm dynamics of the reference model and the network are illustrated in Figure 6D and animated in Supplementary Movie 1. We also tested the learned network model of the two-link arm on an acrobot-like task, i.e. a gymnast swinging on a high-bar [Sutton, 1996], with the shoulder joint analogous to the hands on the bar, and the elbow joint to the hips. The gymnast can apply only small torques at the hip and none at the hands, and must reach beyond a specified (x, y) position by swinging. Thus, during the test, we provided input only at the elbow joint, pre-computed for the reference to reach beyond a specific (x, y) position from rest by swinging. The control input and the dynamics are plotted in Figure 6A-C, right panels, and depicted in Figure 6E and Supplementary Movie 2, showing that the network has also learned the inertial properties of the arm model, at least for this limited acrobot-inspired task.
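For orientation, the sketch below integrates a planar two-link arm under gravity with torques at both joints and the soft joint-limit counter-torque described above. It assumes point masses at the link ends for brevity, whereas the actual reference followed [Li, 2006] and includes link inertias and further terms; all parameter values here are illustrative assumptions.

```python
import numpy as np

# Illustrative parameters: link masses, lengths, gravity, joint-limit stiffness.
m1, m2, l1, l2, grav, k_lim = 1.0, 1.0, 0.3, 0.3, 9.81, 50.0

def arm_accel(q, dq, torque):
    """Angular accelerations of a two-link arm with point masses at the link ends,
    angles measured from the downward vertical: M(q) ddq + C(q, dq) + G(q) = torque."""
    q1, q2 = q
    m12 = m2 * l2 ** 2 + m2 * l1 * l2 * np.cos(q2)
    M = np.array([[(m1 + m2) * l1 ** 2 + m2 * l2 ** 2 + 2 * m2 * l1 * l2 * np.cos(q2), m12],
                  [m12, m2 * l2 ** 2]])
    h = m2 * l1 * l2 * np.sin(q2)
    C = np.array([-h * (2 * dq[0] * dq[1] + dq[1] ** 2), h * dq[0] ** 2])
    G = np.array([(m1 + m2) * grav * l1 * np.sin(q1) + m2 * grav * l2 * np.sin(q1 + q2),
                  m2 * grav * l2 * np.sin(q1 + q2)])
    # Soft joint limits: counter-torque beyond +/- 90 degrees from the rest position.
    limit = -k_lim * (np.abs(q) > np.pi / 2) * (q - np.sign(q) * np.pi / 2)
    return np.linalg.solve(M, torque + limit - C - G)

def step(q, dq, torque, dt=1e-4):
    """One forward-Euler step of the four-dimensional state (angles, angular velocities)."""
    ddq = arm_accel(q, dq, torque)
    return q + dt * dq, dq + dt * ddq

q, dq = np.zeros(2), np.zeros(2)
for _ in range(10000):                      # 1 s with a small constant elbow torque
    q, dq = step(q, dq, np.array([0.0, 0.2]))
```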
2.5 Error tending to zero
In Methods subsection 4.1, we show that the FOLLOW learning scheme is Lyapunov stable under certain reasonable assumptions and approximations. The most important assumption is that of a realizable network, i.e. that there exist feedforward and recurrent weights, given the specific heterogeneous LIF neurons, that enable the network to mimic the reference dynamics perfectly. However, any finite network, with its tuning curves as basis functions, can only approximate a reference that is defined by continuous differential equations. Thus the network will have frozen noise as a function of the state variables, even apart from spiking noise. This frozen noise causes drift of the weights, as explained in Supplementary subsection 7.4, which further violates an assumption of bounded weights. This drift of weights may increase the error with increased learning time, though in our simulations, any instability due to drift was not noticeable for a large enough network and a learning time limited to 10000 s.
Figure 6: Learning arm / acrobot-like dynamics via FOLLOW. Layout and legend of panels A-C are analogous to Figure 3A-C, except that: in panel A, the control input (torque) on the elbow joint is plotted; in panel B, the reference and decoded angle $\theta, \hat{\theta}$ (solid) and angular velocity $\omega, \hat{\omega}$ (dotted) are plotted, for the elbow joint; in panel C, the error $\hat{\theta} - \theta$ in the elbow angle is plotted. For the rightmost panels, the control input was chosen to perform a swinging acrobot-like task, such that the bottom end of the reference arm reaches the cyan target area by applying a small fixed on-off (interpolated) torque only on the elbow joint. D. Reaching task. Time snapshots of the configuration of the arm, reference in blue (top panels) and network in red (bottom panels), are shown as the arm starts from rest at t = 0 s and, subject to torques in the directions shown by the circular arrows, reaches the cyan target over the time displayed in the middle. Gravity acts downwards in the direction of the arrow. E. Acrobot-inspired swinging task (visualization of the right panels of A-C). Interpreted as in D, except that the torque is applied only at the elbow, and to reach the target, the arm swings forward, back, and forward again.
[Figure 7 panel annotations: A, mean squared error versus learning time up to 30000 s; B, learned feedforward and recurrent weights versus realizable-network weights, with R² values of 0.891 and 0.981 displayed on the two panels.]
Figure 7: Error tends to zero for a realizable reference network. We ran our FOLLOW scheme on a network learning one of two different implementations of the reference van der Pol oscillator: (1) direct integration of the differential equations, versus (2) a network realized using FOLLOW learning for 10000 s starting from zero weights. A. We plot the evolution of the mean squared error (mean over the number of dimensions $N_d$ and over 4 s time blocks) from the start to 30000 s of learning. With the weights starting from zero, the mean squared error for the direct-integration reference (1) is shown in black, while that for the realizable network reference (2) is in red. B. The feedforward weights (top panel) and the recurrent weights (bottom panel) at the end of 30000 s of learning are plotted versus the corresponding weights of the realizable target network. The coefficient of determination, i.e. the R² value of the fit to the identity line (y = x), is also displayed for each panel. A value of R² = 1 denotes perfect equality of the weights to those of the realizable network. Some weights fall outside the plot limits.
We can keep the weights bounded and simultaneously counteract the drift due to the frozen approximation error by introducing a linear weight decay term in the learning rule (similar to a weight regularization term in a loss function) (see also Supplementary subsection 7.4). In practice, we did not need this term, as the error continued to decrease even when we increased the learning time from 10000 to 30000 s (Fig. 7). In Methods subsection 4.2, we show that the error goes to zero asymptotically with FOLLOW learning, i.e. the network output matches the reference output over time, under the same assumptions as the stability proof. This is a stronger result than Lyapunov stability, which just ensures that bounded orbits remain bounded, i.e. that the error remains bounded. To confirm that the error indeed tends to zero for an exactly realizable network, we used a spiking network as the reference. The feedforward and recurrent weights of this reference network were set to those reached, starting from zero, after 10000 s of learning against the differential-equations-based reference of the van der Pol oscillator (as in subsection 2.3.2). The neurons of the learning network had the same parameters as those of the reference network, but its weights were initialized to zero. We ran FOLLOW learning against the reference network for 30000 s, three times longer than the usual learning duration of 10000 s. In Figure 7A we see that the error kept dropping over 30000 s. The error decreased faster when learning against this exactly realizable reference, compared to the differential equations of the van der Pol oscillator as reference, supporting our claim that the error would eventually tend to zero (up to the spiking noise) for a realizable system. The adaptive control literature [Ioannou and Sun, 2012, Narendra and Annaswamy, 1989] shows that for adaptive systems such as this, under the condition of persistent excitation, loosely defined as input that excites all the modes of the system, the parameters also converge to those of the system. Although we have not proven this for the spiking network with FOLLOW learning, we still tested whether the network weights converged to the weights of the realizable network. We see in Figure 7B that the feedforward and recurrent weights started to match those of the realized network. The results for this example are consistent with the claim that the error tends to zero over time for a realizable or closely-approximated system (Methods subsection 4.2). For cases where the realizable network approximation to the reference is not as good, and bounds on the weights need to be enforced to avoid drift and instability, the performance of our learning scheme remains to be tested. We have not explored this issue further, as the literature in adaptive control theory offers multiple solutions to the issues of drift, bounded parameters and convergence [Ioannou and Sun, 2012, Narendra and Annaswamy, 1989, Ioannou and Fidan, 2006].
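The linear weight-decay term mentioned at the start of this subsection amounts to adding a leak to the update of equation (6); a minimal sketch, with the decay constant an illustrative assumption:

```python
import numpy as np

def follow_update_with_decay(w, post_err, pre_filt, eta=1e-4, gamma=1e-6, dt=1e-3):
    """FOLLOW update (eq. 6) plus a linear weight decay that keeps the weights bounded
    and counteracts slow drift due to the frozen approximation error."""
    return w + dt * (-eta * np.outer(post_err, pre_filt) - gamma * w)
```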
2.6 Learning is robust to sparse connectivity and noisy decoding weights

2.6.1 Sparsity
The networks we used until now had all-to-all connectivity. We next tested whether sparse connectivity in our network, consistent with that observed experimentally in cortical areas (see for example [Markram et al., 2015, Brown and Hestrin, 2009]), was sufficient for learning these low-dimensional dynamics.
Figure 8: Learning error versus sparse connectivity and noisy decoding weights. We ran the van der Pol oscillator learning protocol for 10000 s for different parameter values and measured the mean squared error over the last 400 s before the end of learning (mean over the number of dimensions $N_d$ and over time). A. We evolved only a fraction of the feedforward and recurrent connections, randomly chosen according to a specific sparsity, using FOLLOW learning, while keeping the rest at zero. The round dots show the mean squared error for different sparsities after a 10000 s learning protocol (the default sparsity = 1 is starred); the square dots show the same after a 20000 s protocol. B. We multiplied the original decoding weights (that form an auto-encoder with the error encoders) by a random factor $(1+\mathrm{uniform}(-\chi, \chi))$ drawn for each weight. The mean squared error at the end of a 10000 s learning protocol for increasing values of $\chi$ is plotted (the default $\chi = 0$ is starred). C. We multiplied the original decoding weights by a random factor $(1+\mathrm{uniform}(-\chi + \xi, \chi + \xi))$, fixing $\chi = 2$, drawn for each weight. The mean squared error at the end of a 10000 s learning protocol, for a few values of $\xi$ on either side of zero, is plotted.

We ran the van der Pol oscillator learning protocol with the sparsity of connections varying from 0.1 to 1 (full). Connections that were absent after the sparse initialization could not appear during learning, while the existing sparse connections were allowed to evolve according to FOLLOW learning. As shown in Figure 8A, we found that learning was slower with sparser connectivity; but with twice the learning time, the sparse network with a biologically plausible connectivity of around 25% reached similar low error levels as the fully connected network.
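Sparse connectivity of this kind can be sketched by drawing a fixed binary mask once and applying it to every weight update, so that absent connections stay absent; the network size and sparsity below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N, sparsity = 3000, 0.25                  # illustrative network size and connectivity
mask = rng.random((N, N)) < sparsity      # fixed mask: True where a synapse exists

def masked_update(w, dw):
    """Apply a FOLLOW weight change only where connections exist; absent synapses stay absent."""
    return w + np.where(mask, dw, 0.0)
```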
2.6.2 Noisy decoding weights
For our FOLLOW scheme, the decoders were pre-learned for auto-encoding with respect to the error-encoder weights, in the absence of recurrent connections. We sought to relax this requirement. Simulations show that with completely random decoders, the system does not learn to reproduce the target dynamical system. However, if the random decoders have some overlap with the auto-encoder, so that for a feedback error $\vec{z}$, the loop through error encoding followed by decoding of the network state yields $m\vec{z} + \vec{n}(\vec{z})$, with $m$ a constant around 1, then with sufficiently high error-feedback gain $k$, the term linear in the error can still drive the output close to the desired one (see Methods). In our simulations, when we multiplied each decoding weight of the auto-encoder by one plus a uniform factor between $-\chi + \xi$ and $\chi + \xi$, i.e. $\tilde{d}_{\alpha i} = d_{\alpha i}\,(1 + \mathrm{uniform}(-\chi + \xi, \chi + \xi))$, then the
system was still able to learn the van der Pol oscillator up to χ ∼ 5 and ξ = 0, or χ = 2 and ξ variable, as seen in Figure 8B,C. Negative ξ results in a lower overlap with the auto-encoder (effectively m = 1 + ξ) compared to the noise, leading to the asymmetry seen in Figure 8C. In conclusion, the FOLLOW learning scheme is robust to this form of multiplicative noise on the decoding weights. Other forms of noise will require the auto-encoder to be re-learned. Decoder noise can also be studied as frozen noise in approximating the reference, causing a drift of the learned weights, possibly controlled by weight decay (Supplementary subsection 7.4).
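The multiplicative decoder perturbation used for Figure 8B,C can be sketched as below, with $\chi$ and $\xi$ as in the text; the array shape and random seed are illustrative assumptions.

```python
import numpy as np

def perturb_decoders(d, chi, xi, rng=None):
    """Multiply each auto-encoder decoding weight by 1 + uniform(-chi + xi, chi + xi)."""
    rng = rng if rng is not None else np.random.default_rng(2)
    return d * (1.0 + rng.uniform(-chi + xi, chi + xi, size=d.shape))

# Example: chi = 2, xi = 0 keeps an average overlap of 1 with the auto-encoder.
d_noisy = perturb_decoders(np.ones((2, 3000)), chi=2.0, xi=0.0)
```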
3 Discussion

3.1 Comparison with previous work
Our FOLLOW learning scheme is the first, to our knowledge, to incorporate all of the following together: a proof of Lyapunov stability with the error tending to zero; learning of non-linear dynamics; and a synaptically local and online learning rule on the feedforward and recurrent weights. A more specific comparison is provided below. Eliasmith and colleagues used a learning rule derived from stochastic gradient descent on a loss function, in a similar network structure comprising heterogeneous spiking neurons with error feedback, to learn an integrator [MacNeil and Eliasmith, 2011]. However, they did not demonstrate learning of non-linear dynamics, nor provide a proof of convergence. For further discussion, see Supplementary subsection 7.2. Denève and colleagues have developed a framework for learning linear dynamics with error feedback in a spiking network [Bourdoukan and Denève, 2015]. They used a network of homogeneous neurons with fast and slow recurrent connections. They learned an auto-encoder with the fast recurrent connections, and linear dynamics with the slow connections. We are able to learn non-linear dynamics by using a larger number of heterogeneous neurons, in combination with linear decoding of spike trains. Our FOLLOW learning rule, though similar in form to theirs, is derived in a principled way from Lyapunov stability. It will be interesting to see whether their approach to learning the auto-encoder and maintaining balance could be used in our heterogeneous network. Reservoir computing models use recurrently connected non-linear units, operating close to chaos, to generate rich dynamics from which any trajectory can be linearly decoded [Jaeger, 2001, Maass et al., 2002, Legenstein et al., 2003, Maass and Markram, 2004, Jaeger and Haas, 2004, Joshi and Maass, 2005, Legenstein and Maass, 2007]. The rate or spiking units are typically homogeneous, and the near-chaotic state arises from the recurrent connections. In our model, the neurons are heterogeneous, and their non-linear rectifying transfer functions with different gains and biases provide the basis functions to linearly decode any transformation. Our learning enables the network to mimic a dynamical system over a range of inputs, giving rise to a family of different trajectories. In initial versions of reservoir computing, only the readout, i.e. output, weights were learned. Later efforts using FORCE learning in rate networks [Sussillo and Abbott, 2009, Sussillo and Abbott, 2012] learned both the readout and the recurrent weights. In the case of a multi-dimensional target, multi-dimensional errors were typically fed to distinct parts of the network, as opposed to the distributed encoding used in our network. The time scale of plasticity in
FORCE learning is faster than the time scale of the dynamical system, whereas our rule can handle arbitrary learning rates up to a maximum depending on the noise and feedback. FORCE learning performs a large initial weight change to keep error magnitudes small at all times. Due to our large-gain negative feedback, the error is always small, even if weight changes are small. [DePasquale et al., 2016] use a rate-based (continuous-variable) reservoir model to obtain auxiliary functions that the recurrent connections of spiking neural networks must approximate. The learning essentially minimizes a loss function involving these auxiliary functions, but it is difficult to interpret the learning rule in terms of a local observable error (see Supplementary subsection 7.2). [Thalmeier et al., 2016] build on the Denève framework, incorporating dendritic non-linearities to approximate a non-linear rate unit. They use the FORCE learning rule for training these networks, so the same comparison as above applies. [Nicola and Clopath, 2016] are able to learn various dynamical systems in spiking neural networks trained with FORCE. However, the weights need to be initialized to achieve near-chaotic dynamics. The learning rules used are recursive least squares or batch least squares, which are difficult to interpret as local, online rules. Moving in a less supervised direction, reward-modulated Hebbian rules on the output weights of a reservoir have enabled learning of periodic pattern generation and other computations [Hoerzer et al., 2014, Legenstein et al., 2010]. It will be interesting to see whether we can use weaker supervisory signals or three-factor learning rules in our scheme. The control community has a strong tradition of learning parameters in system identification and control. They also use recurrent networks, trained either via backpropagation through time, which is offline, or via real-time recurrent learning or the related extended Kalman filter, which are non-local [Pearlmutter, 1995, Zerkaoui et al., 2009]. Further, these typically use rate / continuous-output units. Optimal control methods [Hennequin et al., 2014] or stochastic gradient descent [Song et al., 2016] have also been applied in recurrent networks of neurons, but without plausibility of the learning rule or stability guarantees.
3.2 FOLLOW learning scheme: highlights, shortcomings and extensions
We derived a learning scheme for a spiking neural network to function as a forward predictive model that mimics a non-linear dynamical system activated by one or several time-varying inputs. Our FOLLOW learning scheme is local, online and Lyapunov stable, i.e. uniformly globally stable, with a guarantee that the error will reduce to zero asymptotically over time, subject to the reference dynamics being realizable by the recurrent spiking network, among other conditions (Methods subsection 4.1). Our scheme uses error feedback, as in adaptive control theory, which forces the output of the system to mimic the reference and allows the error current to be used as a factor in the learning rule. The scheme is supervised and requires an error which is the difference between the network output and the observable reference (cf. Supplementary subsection 7.2). Our error-feedback-based learning rule is not local in the traditional sense: it is not Hebbian pre-firing-rate times post-firing-rate, nor a three-factor rule with a global error; rather, it is similar in spirit to [Roelfsema and van Ooyen, 2005], [Lillicrap et al., 2016] and local FORCE [Sussillo and Abbott, 2009].
Only the error current is used in the learning rule, not the total current comprising the feedforward, recurrent and error currents. One way to separate the error current from the feedforward and recurrent currents could be to deliver it at the apical dendrite while the others are delivered at basal dendrites close to the soma (Fig. 2C), or to deliver it at a different type of synapse compared to the others. The error current would set off a biochemical cascade, corresponding to slow filtering of the error current, which may then be used as a signal for learning by all synapses in the entire cell. We now discuss the key requirements for the stability of learning and for the error to tend to zero asymptotically. Assumption (1), that the error encoders and output decoders form an auto-encoder, can be met if they are learned beforehand, via say mirrored STDP [Burbank, 2015]. Assumption (2), of a realizable reference, is approximated by having a large number N of heterogeneous neurons (different encoders and biases) so as to closely decode any function of the state variables in the recurrent weights with low noise, whose variance goes as 1/N [Eliasmith and Anderson, 2004]. Regarding assumption (3), the observability of the state variables $\vec{x}(t)$: currently we used them directly for computing the error; however, higher-dimensional signals that contain full information about $\vec{x}$ can also be used in our general learning architecture (Supplementary section 7.3). Assumption (4) requires that the system dynamics be slower than the synaptic dynamics. With faster AMPA synapses, we expect that even system dynamics with time constants on the scale of tens of milliseconds can be learned. Assumption (5) requires bounding the weights, which can be achieved by a weight decay term (see also Supplementary subsection 7.4), though we did not require it in practice. We found that our FOLLOW learning scheme also works under a biologically plausible sparsity of connections. Furthermore, it is robust to multiplicative noise in the output decoders, relaxing the auto-encoder assumption. It would be interesting to compare this with the weakened conditions on the error backpropagation weights in artificial neural networks [Lillicrap et al., 2016]. Our FOLLOW scheme is compatible with deterministic spiking neurons and sparse connectivity. However, the weights change freely between positive and negative values, which violates Dale's law. An algorithmic scheme to incorporate Dale's law has been suggested before [Parisien et al., 2008]. Neurons in cortical networks are also seen to maintain a balance of excitatory and inhibitory incoming currents [Denève and Machens, 2016]. It would be interesting to investigate a more biologically plausible extension of FOLLOW learning that maintains Dale's law and excitatory-inhibitory balance, possibly using inhibitory plasticity [Vogels et al., 2011]; pre-learns the auto-encoder, via say mirrored STDP [Burbank, 2015]; and perhaps computes the projected error within each neuron in the same network, inspired by [Urbanczik and Senn, 2014]. Further directions worth pursuing include learning multiple different dynamical transforms within one recurrent network, without interference; hierarchical learning with stacked recurrent layers; and learning to generate the control input given a desired state trajectory, i.e. the inverse model, for motor control.
4 Methods

4.1 Derivation and proof of stability of the FOLLOW learning scheme
We derive the FOLLOW learning rules while simultaneously proving the stability of the scheme. We assume that: (1) the error-encoding and output-decoding weights form an auto-encoder; (2) given the gains and biases of the spiking LIF neurons, there exist feedforward and
recurrent weights that make the network follow the reference dynamics perfectly (in practice, the dynamics is only approximately realizable by our network; see Supplementary subsection 7.4 for a discussion); (3) the state $\vec{x}$ of the dynamical system is observable; (4) the intrinsic time scales of the reference dynamics are much larger than both the synaptic time scale and the time scale of the error feedback loop; and (5) the input $\vec{u}$ and the feedforward and recurrent weights remain bounded throughout learning, and the reference dynamics is stable. Our strategy is to obtain the evolution equation for the error and use it in the time derivative of a Lyapunov function $V$, to show that $\dot{V} \le 0$ for uniform stability, similar to proofs in adaptive control theory [Narendra and Annaswamy, 1989, Ioannou and Sun, 2012]. The filtered low-dimensional output of the recurrent network is given by equation (5), reproduced here:
$$\hat{x}_\alpha = \sum_j d^{(x)}_{\alpha j} (S_j \ast \kappa)(t).$$
The decoders $\{d^{(x)}_{\alpha j}\}$ are assumed pre-learnt to decode the variable encoded using the error encoders $\{e_{i\alpha}\}$ in the recurrent network (auto-encoder assumption (1)). However, even in the presence of further feedforward and recurrent currents apart from the error current, the decoding acts just on the spiking output and is agnostic to how the currents were generated. Hence, the decoded variable $\hat{\vec{x}}$ can be approximated, from an effective encoded variable represented in the currents, as:
$$\hat{x}_\alpha = \sum_j d^{(x)}_{\alpha j} (S_j \ast \kappa)(t) \approx \sum_i (E^{-1})_{\alpha i} J'_i, \qquad (7)$$
defining the effective encoded currents $J'_i \equiv (J_i - b_i)/\nu_i$ in the recurrently connected neurons (cf. the encoded variable in the feedforward equation (3)), with $E^{-1}$ being the right pseudo-inverse of the matrix of encoders $\{e_{j\alpha}\}$, such that $\sum_\alpha e_{j\alpha} (E^{-1})_{\alpha i} = \delta_{ji}$. Now, equation (5) can also be written as:
$$\tau_s \dot{\hat{x}}_\alpha(t) = -\hat{x}_\alpha(t) + \sum_j d^{(x)}_{\alpha j} S_j(t). \qquad (8)$$
We convolve this equation with the kernel $\kappa$, and multiply throughout by the encoders and sum, to get:
$$\tau_s \sum_\alpha e_{i\alpha} (\dot{\hat{x}}_\alpha \ast \kappa)(t) = -\sum_\alpha e_{i\alpha} (\hat{x}_\alpha \ast \kappa)(t) + \sum_\alpha e_{i\alpha} \sum_j d^{(x)}_{\alpha j} (S_j \ast \kappa)(t). \qquad (9)$$
Using the approximate equation (7) in equation (9), and then substituting the neural currents of the recurrent network from equation (4), we obtain
$$\tau_s \sum_\alpha e_{j\alpha} (\dot{\hat{x}}_\alpha \ast \kappa)(t) \approx -\sum_\alpha e_{j\alpha} (\hat{x}_\alpha \ast \kappa)(t) + J'_j(t)$$
$$= -\sum_\alpha e_{j\alpha} (\hat{x}_\alpha \ast \kappa)(t) + \sum_k w^{\text{ff}}_{jk} (S^{\text{ff}}_k \ast \kappa)(t) + \sum_i w_{ji} (S_i \ast \kappa)(t) + \sum_\alpha k\, e_{j\alpha} (\epsilon_\alpha \ast \kappa)(t). \qquad (10)$$
[Aside: Effectively, the dynamical system given by equation (1) is to be mimicked by our spiking network implementing a different dynamical system with an extra error feedback term as below (cf. equation (10)):
$$\tau_s \dot{\hat{x}}_\alpha = -\hat{x}_\alpha + \breve{f}_\alpha(\hat{\vec{x}}) + \breve{g}_\alpha(\vec{u}) + k\,\epsilon_\alpha, \qquad (11)$$
where $k \ll 0$ and the error $\vec{\epsilon} \equiv \hat{\vec{x}} - \vec{x}$. After learning, $\breve{f}_\alpha(\hat{\vec{x}})$ and $\breve{g}_\alpha(\hat{\vec{u}})$ should approximate $\tilde{f}_\alpha(\hat{\vec{x}}) = \tau_s f_\alpha(\hat{\vec{x}}) + \hat{x}_\alpha$ and $\tilde{g}_\alpha(\hat{\vec{u}}) = \tau_s g_\alpha(\hat{\vec{u}})$ respectively (up to a constant term distributed between the two functions). In our simulations, we will usually start with zero feedforward and recurrent weights, so that initially $\breve{f}(\hat{\vec{x}}) = 0 = \breve{g}_\alpha(\hat{\vec{u}})$.]

Using assumption (2), we define ideal feedforward weights $w^{ff*}_{ik}$ and ideal recurrent weights $w^*_{ij}$. The output $\hat{\vec{x}}^*$ of this ideal realizable network is assumed to match $\vec{x}(t)$ in (1), even without any error feedback. We write an equation similar to equation (10) for this ideal output, without the error-feedback term:
$$\tau_s \sum_\alpha e_{i\alpha}(\dot{\hat{x}}^*_\alpha * \kappa)(t) = -\sum_\alpha e_{i\alpha}(\hat{x}^*_\alpha * \kappa)(t) + \sum_j w^*_{ij}(S^*_j * \kappa)(t) + \sum_k w^{ff*}_{ik}(S^{ff*}_k * \kappa)(t), \qquad (12)$$
where $(S^{ff*}_k * \kappa)(t)$ and $(S^*_j * \kappa)(t)$ are the filtered spike trains in the ideal feedforward and recurrent layers of this realizable network. The ideal weights are constants and their actual values are unimportant; they are merely a device to demonstrate the stability of the system, as below. The system is approximately realizable due to the expansion in a high-dimensional non-linear basis (see Supplementary subsection 7.1). Weights close to the ideal ones can be calculated algorithmically as in Supplementary subsection 7.2.

The error can be computed since the true output $\vec{x}$ is observable (assumption (3)), or possibly deduced by multi-sensory integration in the brain. The observable output need not be the state variables themselves, but can even be a higher-dimensional function of them, as shown for the general scheme in Supplementary section 7.3. The error encoders and the output decoders form an auto-encoder (assumption (1)), and so any error in the output, when fed back with a negative gain ($k$) times the error encoders, tends to push the output towards the desired value.

Now, we set the learning time scales, given by $1/\eta_1$ and $1/\eta_2$, much larger than the error feedback and synaptic time scales. Further, we use assumption (4) that the reference dynamics is slower than the synaptic and feedback-loop time scales, and assumption (5) that the input and weights are bounded and the reference dynamics is stable. Thus, in equation (10), at the synaptic or error-feedback time scale, $\vec{x}$ in the error is approximately constant, and the terms involving the weights are bounded. Choosing the negative feedback gain for the error large enough ($k \ll 0$), the difference between $\hat{\vec{x}}(t)$ and $\vec{x}(t)$ becomes small. Thus, at the time scale of the reference dynamics, the recurrent network output follows the ideal output at all times, $\hat{\vec{x}}(t) \approx \vec{x}(t)$. Since the network effectively encodes $\vec{x}$, the actual spiking rates will be approximately the same as the ideal ones, so that $(S^*_j * \kappa)(t)$ in (12) can be replaced by $(S_j * \kappa)(t)$. The filtered spike trains $(S^{ff}_k * \kappa)(t)$ of the feedforward network are already the same as the ideal ones, since they are driven by $\vec{u}$ and are not affected by the weights being learned.
We use equations (10) and (12) in the time-evolution equation of the error, approximating first the true system by the ideal realized one, i.e. $x_\alpha \approx \hat{x}^*_\alpha$ (assumption (2)), and then the ideal filtered spike trains by the actual ones as above:
$$\tau_s \sum_\alpha e_{i\alpha}(\dot{\epsilon}_\alpha * \kappa)(t) = \tau_s \sum_\alpha e_{i\alpha}\big((\dot{\hat{x}}_\alpha - \dot{x}_\alpha) * \kappa\big)(t) \approx \tau_s \sum_\alpha e_{i\alpha}\big((\dot{\hat{x}}_\alpha - \dot{\hat{x}}^*_\alpha) * \kappa\big)(t)$$
$$= \sum_j w_{ij}(S_j * \kappa)(t) - \sum_j w^*_{ij}(S^*_j * \kappa)(t) + \sum_k w^{ff}_{ik}(S^{ff}_k * \kappa)(t) - \sum_k w^{ff*}_{ik}(S^{ff*}_k * \kappa)(t) + (k-1)\sum_\alpha e_{i\alpha}(\epsilon_\alpha * \kappa)(t)$$
$$\approx \sum_j \big(w_{ij} - w^*_{ij}\big)(S_j * \kappa)(t) + \sum_k \big(w^{ff}_{ik} - w^{ff*}_{ik}\big)(S^{ff}_k * \kappa)(t) + (k-1)\sum_\alpha e_{i\alpha}(\epsilon_\alpha * \kappa)(t)$$
$$\equiv \sum_j \psi_{ij}(S_j * \kappa)(t) + \sum_k \phi_{ik}(S^{ff}_k * \kappa)(t) + (k-1)\sum_\alpha e_{i\alpha}(\epsilon_\alpha * \kappa)(t), \qquad (13)$$
where $\psi_{ij} \equiv w_{ij} - w^*_{ij}$ and $\phi_{ik} \equiv w^{ff}_{ik} - w^{ff*}_{ik}$.
Consider the candidate Lyapunov function:
$$V(\tilde{\epsilon}, \psi, \phi) = \frac{1}{2}\sum_i \tilde{\epsilon}_i^2 + \frac{1}{2}\frac{1}{\tilde{\eta}_1}\sum_{i,j}\psi_{ij}^2 + \frac{1}{2}\frac{1}{\tilde{\eta}_2}\sum_{i,k}\phi_{ik}^2, \qquad (14)$$
where $\tilde{\epsilon}_i \equiv \tau_s \sum_\alpha e_{i\alpha}(\epsilon_\alpha * \kappa)$. Choosing $\tilde{\eta}_1, \tilde{\eta}_2 > 0$ (which will translate to positive learning rates later), the Lyapunov function is positive semi-definite, $V(\tilde{\epsilon}, \psi, \phi) \geq 0$, with equality to zero only at $(\tilde{\epsilon}, \psi, \phi) = (0, 0, 0)$. It has continuous first-order partial derivatives. Further, $V$ is radially unbounded, since $V(\tilde{\epsilon}, \psi, \phi) > |(\tilde{\epsilon}, \psi, \phi)|^2 / (4\max(1, \tilde{\eta}_1, \tilde{\eta}_2))$, and decrescent, since $|V(\tilde{\epsilon}, \psi, \phi)| < |(\tilde{\epsilon}, \psi, \phi)|^2 / \min(1, \tilde{\eta}_1, \tilde{\eta}_2)$, where $|(\tilde{\epsilon}, \psi, \phi)|^2 \equiv \sum_i \tilde{\epsilon}_i^2 + \sum_{i,j}\psi_{ij}^2 + \sum_{i,k}\phi_{ik}^2$, and $\min/\max$ take the minimum/maximum of their respective arguments.

Using Lyapunov's direct method, we need to show, apart from the above conditions, that $\dot{V} \leq 0$ for uniform global stability (roughly, bounded orbits remain bounded, so the error remains bounded), or $\dot{V} < 0$ for asymptotic global stability (see for example [Narendra and Annaswamy, 1989, Ioannou and Sun, 2012]).
Taking the time derivative of $V$, and replacing $\dot{\tilde{\epsilon}}_i$, i.e. $\tau_s \sum_\alpha e_{i\alpha}(\dot{\epsilon}_\alpha * \kappa)$, from (13), we have:
$$\dot{V} = \sum_i \tilde{\epsilon}_i \dot{\tilde{\epsilon}}_i + \frac{1}{\tilde{\eta}_1}\sum_{i,j}\psi_{ij}\dot{\psi}_{ij} + \frac{1}{\tilde{\eta}_2}\sum_{i,k}\phi_{ik}\dot{\phi}_{ik}$$
$$\approx \sum_i \tilde{\epsilon}_i \left(\sum_j \psi_{ij}(S_j * \kappa)(t) + \sum_k \phi_{ik}(S^{ff}_k * \kappa)(t) + (k-1)\sum_\alpha e_{i\alpha}(\epsilon_\alpha * \kappa)\right) + \frac{1}{\tilde{\eta}_1}\sum_{i,j}\psi_{ij}\dot{\psi}_{ij} + \frac{1}{\tilde{\eta}_2}\sum_{i,k}\phi_{ik}\dot{\phi}_{ik}$$
$$= \sum_{i,j}\psi_{ij}\left(\tilde{\epsilon}_i(S_j * \kappa)(t) + \frac{1}{\tilde{\eta}_1}\dot{\psi}_{ij}\right) + \sum_{i,k}\phi_{ik}\left(\tilde{\epsilon}_i(S^{ff}_k * \kappa)(t) + \frac{1}{\tilde{\eta}_2}\dot{\phi}_{ik}\right) + (k-1)\sum_i \tilde{\epsilon}_i^2/\tau_s. \qquad (15)$$
If we ensure that
$$\dot{\psi}_{ij} = -\tilde{\eta}_1\, \tilde{\epsilon}_i (S_j * \kappa)(t), \qquad \dot{\phi}_{ik} = -\tilde{\eta}_2\, \tilde{\epsilon}_i (S^{ff}_k * \kappa)(t), \qquad (16)$$
then
$$\dot{V} = (k-1)\sum_i \tilde{\epsilon}_i^2 / \tau_s \leq 0,$$
choosing $k < 1$, which is subsumed under $k \ll 0$ for negative error feedback. The condition (16), with $\eta_1 \equiv \tilde{\eta}_1\tau_s$ and $\eta_2 \equiv \tilde{\eta}_2\tau_s$, is the promised learning rule (6). Thus, we have proven the global uniform stability of the fixed point $(\tilde{\epsilon}, \psi, \phi) = (0, 0, 0)$, which is effectively $(\epsilon, \psi, \phi) = (0, 0, 0)$, in the $(\tilde{\epsilon}, \psi, \phi)$-system given by equations (13) and (16), choosing $\eta_1, \eta_2 > 0$ and $k \ll 0$, under assumptions (1)-(5). Here we have shown that the system is Lyapunov stable, i.e. bounded orbits remain bounded, not that it is asymptotically stable. We further show in Methods subsection 4.2 that the error $\vec{\epsilon}$ goes to zero asymptotically, so that with learning $\hat{\vec{x}}$ reproduces the dynamics of $\vec{x}$ as per (1).

A major caveat of this proof is that, under assumption (2), approximation errors are currently ignored. These errors in approximating the reference dynamics appear as frozen noise causing a runaway drift of the parameters, as studied in the adaptive control literature [Ioannou and Sun, 2012]. In our simulations with a sufficient number of neurons, the approximations were good, and thus this drift was possibly slow and did not cause the error to rise during typical time scales of learning. Also, to maintain our assumption (5), while the input is under our control, some form of bounding is needed to stop the weights from drifting. Various techniques to address such model-approximation noise and to bound the weights have been studied in the robust adaptive control literature. We discuss this issue and briefly mention some of these ameliorative techniques in Supplementary subsection 7.4.

The FOLLOW learning rule (16) on the feedforward or recurrent weights has two terms: (i) a projected filtered error $\sum_\alpha e_{i\alpha}(\epsilon_\alpha * \kappa)$, used for all synapses of neuron $i$, which is available as a current in the post-synaptic neuron $i$ due to the error feedback, see equation (4); and (ii) a filtered pre-synaptic firing trace, $(S^{ff}_k * \kappa)(t)$ or $(S_j * \kappa)(t)$, which is available locally at each synapse.
Thus, the learning rule: (i) is local; (ii) learns arbitrary dynamical transforms $\vec{f}(\vec{x}) + \vec{g}(\vec{u})$; (iii) uses an error in the observable $\vec{x}$, not in its time-derivative; (iv) is uniformly globally stable, i.e. Lyapunov stable, with the error tending to zero over time (Methods subsection 4.2); and (v) does not involve averaging over $\vec{x}$ or stochastic gradient descent. This learning scheme can easily be used for non-linear rate units by replacing the filtered spike trains $(S_i * \kappa)(t)$ by the outputs $r(t)$ of the rate units. The rate units must also low-pass filter their input with the kernel $\kappa(t)$.
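As an illustration of the rule in this rate-unit form, the following minimal Python sketch performs one Euler step of the FOLLOW weight update. It is not the simulation code used in this article; the function and array names, shapes and the time step are illustrative assumptions.

import numpy as np

def follow_step(w_ff, w_rec, r_ff_filt, r_rec_filt, x_hat, x_ref, E, eta, dt):
    # One Euler step of the FOLLOW update, rate-unit variant (sketch).
    # w_ff: (N, Nff) feedforward weights; w_rec: (N, N) recurrent weights.
    # r_ff_filt, r_rec_filt: low-pass filtered presynaptic activities,
    #   playing the role of (S^ff_k * kappa)(t) and (S_j * kappa)(t).
    # x_hat, x_ref: decoded network output and reference observable (length Nd).
    # E: (N, Nd) error encoders e_{i alpha}; eta: learning rate.
    err = x_hat - x_ref                 # epsilon_alpha = xhat_alpha - x_alpha
    err_proj = E @ err                  # projected error, available post-synaptically via the feedback k*e_{i alpha}
    # rule (6)/(16): weight change = -(learning rate) x (projected error) x (filtered presynaptic activity)
    w_ff += -eta * dt * np.outer(err_proj, r_ff_filt)
    w_rec += -eta * dt * np.outer(err_proj, r_rec_filt)
    return w_ff, w_rec

In the spiking simulations the error entering the update is additionally low-pass filtered (see Methods subsection 4.5); the sketch leaves that filtering to the caller.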
4.2 Proof of error going to zero
In section 4.1 we showed uniform global stability using $\dot{V} = (k-1)\sum_i \tilde{\epsilon}_i^2/\tau_s \leq 0$, with $k < 1$ and $\tilde{\epsilon}_i \equiv \tau_s \sum_\alpha e_{i\alpha}(\epsilon_\alpha * \kappa)$. However, this only means that a bounded error remains bounded. Here, we show that the error tends to zero asymptotically with time. We adapt the proof in section 4.2 of [Ioannou and Sun, 2012] to our spiking network.

Here, we want to invoke a special case of Barbălat's lemma: if $f, \dot{f} \in \mathcal{L}_\infty$ and $f \in \mathcal{L}_p$ for some $p \in [1, \infty)$, then $f(t) \to 0$ as $t \to \infty$. Recall the definitions: a function $f \in \mathcal{L}_p$ when $\|f\|_p \equiv \big(\int_0^\infty |f(\tau)|^p d\tau\big)^{1/p}$ exists; and similarly $f \in \mathcal{L}_\infty$ when $\|f\|_\infty \equiv \sup_{t \geq 0}|f(t)|$ exists.

Since $V$ is positive semi-definite ($V \geq 0$) and a non-increasing function of time ($\dot{V} \leq 0$), its limit $\lim_{t\to\infty} V = V_\infty$ exists and is finite. Using this, the following limit exists and is finite:
$$\sum_i \int_0^\infty (\tilde{\epsilon}_i)^2\, d\tau = \frac{\tau_s}{k-1}\int_0^\infty \dot{V}(\tau)\, d\tau = \frac{\tau_s}{k-1}\big(V(0) - V_\infty\big).$$
Since each term in the above sum is positive semi-definite, $\int_0^\infty (\tilde{\epsilon}_i)^2 d\tau$ also exists and is finite $\forall i$, and thus $\tilde{\epsilon}_i \in \mathcal{L}_2\ \forall i$.

To show that $\tilde{\epsilon}_i, \dot{\tilde{\epsilon}}_i \in \mathcal{L}_\infty\ \forall i$, consider equation (13). First, using the assumptions that the input $\vec{u}(t)$ is bounded and that the reference dynamics is stable, we have that the reference output $\vec{x}(t)$ is bounded. Since the network output $\hat{\vec{x}}$ is also bounded due to the saturation of firing rates (as are the filtered spike trains), each component of the error is bounded, i.e. $\tilde{\epsilon}_i \in \mathcal{L}_\infty\ \forall i$. If we also bound the weights from diverging during learning, then $\psi_{ij}, \phi_{ik} \in \mathcal{L}_\infty\ \forall i, j, k$. With these reasonable assumptions, all terms on the right hand side of equation (13) for $\dot{\tilde{\epsilon}}_i$ are bounded, hence $\dot{\tilde{\epsilon}}_i \in \mathcal{L}_\infty\ \forall i$. Since $\tilde{\epsilon}_i \in \mathcal{L}_2\ \forall i$ and $\tilde{\epsilon}_i, \dot{\tilde{\epsilon}}_i \in \mathcal{L}_\infty\ \forall i$, invoking Barbălat's lemma as above, we have $\tilde{\epsilon}_i \to 0\ \forall i$ as $t \to \infty$.

We do not require the convergence of the parameters to the ideal ones for our purposes, since the error tending to zero, i.e. the network output matching the reference, is functionally sufficient for the forward predictive model. In the adaptive control literature [Ioannou and Sun, 2012, Narendra and Annaswamy, 1989], the parameters are shown to converge to the ideal ones if the input excitation is "persistent", loosely meaning that it excites all modes of the system. It may be possible to adapt that proof to our spiking network, but this is not pursued here.
4.3 Random / noisy decoders
We assumed that the output decoders form an auto-encoder with the error encoders. If instead the decoders are random, the output of the network is an arbitrary function $\vec{h}(\vec{z})$ of the effective encoded variable represented in the second layer, denoted $z_\alpha \equiv \sum_i (E^{-1})_{\alpha i} J'_i$, where $E^{-1}$ is the right pseudo-inverse of the matrix of error encoders $\{e_{i\alpha}\}$. Thus, equation (7) becomes:
$$\hat{x}_\alpha = \sum_j d^{(x)}_{\alpha j}(S_j * \kappa)(t) \approx h_\alpha\!\left(\vec{z}\right), \qquad z_\alpha \equiv \sum_i (E^{-1})_{\alpha i} J'_i.$$
If this is used in equation (9) to get an analogue of (10) as in the derivation and proof subsection 4.1, we can see how ~h(~z) = m~z + n(~z), with m a constant around 1, will still enable sufficient negative feedback of the error, and learning of the system, as long as the extra term involving n(·) acts as a zero-mean noise. Simulations showing robustness to noisy decoders with m ≡ 1 + ξ are in subsection 2.6.2.
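For example, the multiplicative perturbation of the decoders used in those simulations can be generated along the following lines (a Python sketch; the noise level and sizes are illustrative, not the values used in the paper).

import numpy as np

# Sketch: multiplicative noise on the output decoders, d -> (1 + xi) * d,
# with xi drawn independently per weight; xi_scale is an illustrative value.
rng = np.random.default_rng(0)
N, Nd = 2000, 2
d_auto = rng.standard_normal((N, Nd)) / N    # stand-in for pre-learned auto-encoding decoders
xi_scale = 0.1
d_noisy = d_auto * (1.0 + xi_scale * rng.standard_normal((N, Nd)))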
4.4 Simulation software
All simulation scripts were written in Python (https://www.python.org/) for the Nengo 2 simulator [Stewart et al., 2009] (http://www.nengo.ca/), with minor custom modifications to support sparse weights. We ran the model using the Nengo GPU backend (https://github.com/nengo/nengo_ocl) for speed. The script for plotting the figures was written in Python using the matplotlib module (http://matplotlib.org/). These simulation and plotting scripts will be made available online at https://github.com/adityagilra, once this article is peer-reviewed and published.
4.5 Network parameters
For all our example implementations, we used $N$ heterogeneous, deterministic, spiking leaky integrate-and-fire (LIF) neurons [Eliasmith and Anderson, 2004] in each layer shown in Figure 2A. The voltage $V_k$ of each LIF neuron, indexed by $k$, was a low-pass filter of its current $J_k$:
$$\tau_m \frac{dV_k}{dt} = -V_k + J_k,$$
with $\tau_m$ the membrane time constant, set at 20 ms. The neuron fired when the voltage $V_k$ crossed a threshold $\theta = 1$ from below, after which the voltage was reset to zero for a refractory period of 2 ms. If the voltage went below zero, it was clipped to zero. The output spike train of neuron $k$ is denoted $S^{ff}_k(t)$ in the first layer and $S_k(t)$ in the second layer, and can be considered a sum of delta functions at the times of the spikes. Spike trains were synaptically filtered when injected as a current, by convolving with the kernel $\kappa(t) \equiv \exp(-t/\tau_s)/\tau_s$ with a time constant of $\tau_s = 20$ ms.

Fixed, random encoding weights $e^{ff}_{k\alpha}$ and $e_{i\alpha}$ for the input $u_\alpha$ and the error $\epsilon_\alpha$ were chosen uniformly on an $N_d$-dimensional hypersphere (Nengo 2 default) of radius $1/R_1$ and $1/R_2$ respectively. Thus, the encoded variable in layer 1 or 2 could range over $(-R_{1,2}, R_{1,2})$ along any direction, where $R_1$ and $R_2$ (Table 1) will be called the representation radii of layers 1 and 2. Typically $R_1$ was smaller than $R_2$ (around 0.2 of it), as the input from layer 1 was integrated in layer 2. The intercept of the transfer function (firing rate versus normalized input curve) of each neuron (Fig. 2B), along the direction of its encoder, was chosen uniformly from $(-1, 1)$; and its maximum firing rate for an input in $(-1, 1)$, reached when $\sum_\alpha e_{i\alpha}\epsilon_\alpha = 1$, was chosen uniformly from (200, 400) Hz.
Parameter                    | Linear | van der Pol | Lorenz | Arm    | Non-linear feedforward
Number of neurons / layer    | 2000   | 3000        | 5000   | 5000   | 2000
Tperiod (s)                  | 2      | 4           | 20     | 2      | 2
Representation radius R1     | 0.2    | 0.2         | 6      | 0.2    | 1
Representation radius R2     | 1      | 5           | 30     | 1      | 1
Learning pulse ζ1            | R1/6   | R1/6, R1/2  | R2/10  | R2/0.3 | R1/0.6
Learning pedestal ζ2         | R2/16  | R1/6, R1/2  | 0      | R2/0.3 | R2/1.6
Table 1: Network parameters for the example systems.

From this desired intercept and maximum rate for the transfer function of each LIF neuron, with input $z_i = \gamma_i \sum_\alpha e_{i\alpha}\epsilon_\alpha + b_i$, its bias $b_i$ and gain $\gamma_i$ were calculated from the static rate equation for the LIF neuron,
$$\nu_i = \frac{1}{\tau_r - \tau_m \ln(1 - 1/z_i)}. \qquad (17)$$
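Concretely, writing the desired intercept of neuron $i$ as $x^{\rm int}_i$ (the normalized input at which firing starts) and its desired maximum rate at normalized input 1 as $\nu^{\max}_i$, inverting equation (17) gives (a short derivation under the stated threshold $\theta = 1$; the symbols $x^{\rm int}_i$ and $\nu^{\max}_i$ are our notation here):
$$z^{\max}_i = \frac{1}{1 - \exp\!\big((\tau_r - 1/\nu^{\max}_i)/\tau_m\big)}, \qquad \gamma_i = \frac{z^{\max}_i - 1}{1 - x^{\rm int}_i}, \qquad b_i = 1 - \gamma_i\, x^{\rm int}_i,$$
so that $z_i = 1$ at the intercept and $z_i = z^{\max}_i$ at normalized input 1.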
Ideally, these encoders, biases and gains would be learnt during development from the input statistics, but here we set them randomly. The input vector $\vec{u}(t)$ to the network was $N_d$-dimensional and time-varying. During the learning phase, the input was switched every 50 ms to an $N_d$-dimensional vector with components chosen uniformly between $(-\zeta_1, \zeta_1)$, added to a pedestal given by a randomly chosen direction vector on an $N_d$-dimensional hypersphere of radius $\zeta_2$, switched every Tperiod. Parameter values for the network and input for each dynamical system are provided in Table 1. Further details are noted in subsection 4.6.

Decoders for the output $\hat{\vec{x}}$, i.e. the linear readout weights $d^{(x)}_{\beta i}$ from the recurrently connected network, were computed algorithmically so as to form an auto-encoder with respect to the error encoders $e_{i\alpha}$. $P$ ($= N$) random points $\vec{\epsilon}^{(p)}$ within the $N_d$-dimensional hypersphere of radius $R_1$ were chosen and encoded in the neurons of layer 2 with the encoders $e_{i\alpha}$, gains and biases as above. Neural activities $a^{(p)}_i$ were computed with the static rate equation (17) for a LIF neuron. The decoders $d^{(x)}_{\beta i}$ acting on these activities should yield back the encoded points, thus forming an auto-encoder. A squared-error function $\sum_p \sum_\beta \big(\sum_i^N d^{(x)}_{\beta i} a^{(p)}_i - \epsilon^{(p)}_\beta\big)^2$ with L2 regularization was used for this linear regression (default in Nengo 2). Biologically plausible learning rules exist for auto-encoders [Burbank, 2015, Voegtlin, 2006], and so we set these decoders as if they had already been learned.
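The decoder computation just described can be sketched in Python as below. This is an illustration of the regularized least-squares step, not the Nengo code used for the simulations; the sizes, sampling radius and regularization strength are illustrative assumptions.

import numpy as np

# Sketch: compute output decoders d^(x) that form an auto-encoder with the error
# encoders e_{i alpha}, by L2-regularized least squares on static LIF activities.
rng = np.random.default_rng(1)
N, Nd, P = 2000, 2, 2000                    # neurons, dimensions, sample points (P = N)
R = 1.0                                      # sampling radius for the regression points (illustrative)
tau_m, tau_ref = 0.020, 0.002

E = rng.standard_normal((N, Nd))
E /= R * np.linalg.norm(E, axis=1, keepdims=True)       # encoders of norm 1/R

intercepts = rng.uniform(-1.0, 1.0, N)
max_rates = rng.uniform(200.0, 400.0, N)
z_max = 1.0 / (1.0 - np.exp((tau_ref - 1.0 / max_rates) / tau_m))
gains = (z_max - 1.0) / (1.0 - intercepts)
biases = 1.0 - gains * intercepts

pts = rng.standard_normal((P, Nd))                       # random points within the radius-R ball
pts *= R * rng.random((P, 1)) ** (1.0 / Nd) / np.linalg.norm(pts, axis=1, keepdims=True)

Z = gains * (pts @ E.T) + biases                         # input z_i of each neuron at each sample point
A = np.zeros_like(Z)
above = Z > 1.0
A[above] = 1.0 / (tau_ref - tau_m * np.log(1.0 - 1.0 / Z[above]))   # static LIF rates, eq. (17)

sigma = 0.1 * A.max()                                    # illustrative L2 regularization
D = np.linalg.solve(A.T @ A + P * sigma**2 * np.eye(N), A.T @ pts)  # decoders d^(x), shape (N, Nd)
x_hat = A @ D                                            # should approximately reconstruct pts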
The reference output $\vec{x}$ was computed by integrating the dynamical equations of each example system, given the same input as the network. This reference $\vec{x}$ was subtracted from the $\hat{\vec{x}}$ decoded from the network, to compute the error directly. The FOLLOW learning rule, i.e. equation (6), was applied to the feedforward and recurrent weights, namely $w^{ff}_{ji}$ and $w_{ji}$. The error for our learning rule was the error $\epsilon_\beta = \hat{x}_\beta - x_\beta$ in the observable output $\vec{x}$, not the error in the desired function $\tilde{\vec{f}}(\vec{x})$. Further, as per the FOLLOW scheme, this error was projected via the fixed encoding weights $k e_{i\beta}$ with a large negative gain $k = -10$, making it available as a post-synaptic current. In a biological setting, we expect the error current to be delivered into the dendrites, keeping it isolated from the feedforward and recurrent currents delivered at the soma (Fig. 2C) (see also Discussion).
The synaptic time constant $\tau_s$ was 20 ms for all synapses, including those used for computing the error and for feeding the error back to the neurons. The error used for the weight update was filtered by a 200 ms decaying exponential.
4.6 Equations and parameters for the example dynamical systems
The equations and input modifications for each dynamical system are detailed below. Time derivatives are in units of s$^{-1}$.

The equations for the linear decaying-oscillators system were
$$\dot{x}_1 = u_1/0.02 + (-0.2 x_1 - x_2)/0.05,$$
$$\dot{x}_2 = u_2/0.02 + (x_1 - 0.2 x_2)/0.05.$$

The equations for the van der Pol oscillator system were
$$\dot{x}_1 = u_1/0.02 + x_2/0.125,$$
$$\dot{x}_2 = u_2/0.02 + \big(2(1 - x_1^2)x_2 - x_1\big)/0.125.$$
Each component of the input $\vec{u}(t)$ was scaled differently for the van der Pol oscillator, as reported in Table 1. The equations for the chaotic Lorenz system were
$$\dot{x}_1 = u_1/0.02 + 10(x_2 - x_1),$$
$$\dot{x}_2 = u_2/0.02 - x_1 x_3 - x_2,$$
$$\dot{x}_3 = u_3/0.02 + x_1 x_2 - 8(x_3 + 28)/3.$$
In our equations above, $x_3$ was translated as $x_3 \to x_3 - 28$ to vary around zero, compared to the usual Lorenz system, as the network represented a symmetric range around zero for each component. This does not change the system dynamics, just its representation in the network. For the Lorenz system, only a pulse at the start, lasting 250 ms and chosen from a random direction with norm $\zeta_1$, was provided to set off the system, after which the system followed its autonomous dynamics.

Our FOLLOW scheme also learned non-linear feedforward transforms, as demonstrated in Supplementary Figures S2 and S3. For the non-linear feedforward case, we used the linear system as the reference, but the input to it was transformed to a non-linear $g_\alpha(\vec{u}) = 10\big((u_\alpha/0.1)^3 - u_\alpha/0.4\big)$, while the input to the network remained $\vec{u}$. Thus, the feedforward weights had to learn the non-linear transform $\vec{g}(\vec{u})$ while the recurrent weights learned the linear system. Together, the equations of the reference were:
$$\dot{x}_1 = 10\big((u_1/0.1)^3 - u_1/0.4\big) + (-0.2 x_1 - x_2)/0.05,$$
$$\dot{x}_2 = 10\big((u_2/0.1)^3 - u_2/0.4\big) + (x_1 - 0.2 x_2)/0.05.$$
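For instance, the translated Lorenz reference above can be generated with a simple Euler integration, as in the Python sketch below; the step size and the initial pulse vector are illustrative (in the simulations the pulse direction was random with norm ζ1).

import numpy as np

# Sketch: Euler integration of the translated Lorenz reference (x3 shifted by -28),
# driven only by a 250 ms pulse at the start, as described above.
dt, T = 1e-3, 20.0
x = np.zeros(3)
traj = []
for step in range(int(T / dt)):
    t = step * dt
    u = np.array([0.2, 0.0, 0.0]) if t < 0.25 else np.zeros(3)   # illustrative start pulse
    dx = np.array([
        u[0] / 0.02 + 10.0 * (x[1] - x[0]),
        u[1] / 0.02 - x[0] * x[2] - x[1],
        u[2] / 0.02 + x[0] * x[1] - 8.0 * (x[2] + 28.0) / 3.0,
    ])
    x = x + dt * dx
    traj.append(x.copy())
traj = np.array(traj)                                            # reference x(t) for the network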
In the example of learning arm dynamics, we used a two-link model of an arm moving in the vertical plane with damping under gravity (see for example http://www.gribblelab.org/compneuro/5_Computational_Motor_Control_Dynamics.html and https://github.com/studywolf/control/tree/master/studywolf_control/arms/two_link), with parameters from [Li, 2006]. The differential equations for the four state variables, namely the shoulder and elbow angles $\vec{\theta} = (\theta_1, \theta_2)^T$ and the angular velocities $\vec{\omega} = (\omega_1, \omega_2)^T$, given input torques $\vec{\tau} = (\tau_1, \tau_2)^T$, were:
$$\dot{\vec{\theta}} = \vec{\omega}, \qquad (18)$$
$$\dot{\vec{\omega}} = M(\vec{\theta})^{-1}\left(\vec{\tau} - C(\vec{\theta}, \vec{\omega}) - B\vec{\omega} - g D(\vec{\theta})\right), \qquad (19)$$
with
$$M(\vec{\theta}) = \begin{pmatrix} d_1 + 2 d_2 \cos\theta_2 + m_1 s_1^2 + m_2 s_2^2 & d_3 + d_2\cos\theta_2 + m_2 s_2^2 \\ d_3 + d_2\cos\theta_2 + m_2 s_2^2 & d_3 + m_2 s_2^2 \end{pmatrix},$$
$$C(\vec{\theta}, \vec{\omega}) = d_2 \sin\theta_2 \begin{pmatrix} -\dot{\theta}_2 (2\dot{\theta}_1 + \dot{\theta}_2) \\ \dot{\theta}_1^2 \end{pmatrix}, \qquad B = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix},$$
$$D(\vec{\theta}) = \begin{pmatrix} (m_1 s_1 + m_2 l_1)\sin\theta_1 + m_2 s_2 \sin(\theta_1 + \theta_2) \\ m_2 s_2 \sin(\theta_1 + \theta_2) \end{pmatrix},$$
$$d_1 = I_1 + I_2 + m_2 l_1^2, \qquad d_2 = m_2 l_1 s_2, \qquad d_3 = I_2,$$
where $m_i$ is the mass, $l_i$ the length, $s_i$ the distance from the joint center to the center of mass, and $I_i$ the moment of inertia of link $i$; $M$ is the moment-of-inertia matrix; $C$ contains the centripetal and Coriolis terms; $B$ accounts for joint damping; and $D$ contains the gravitational terms. Here, the state variable vector is $\vec{x} = [\theta_1, \theta_2, \omega_1, \omega_2]$, but the effective torque $\vec{\tau}$ was obtained from the input torque $\vec{u}$ as below. To avoid any link rotating through a full 360 degrees, we provided an effective torque $\tau_\alpha$ to the arm, obtained by subtracting a term proportional to the input torque $u_\alpha$ if the angle crossed $\pm 90$ degrees while $u_\alpha$ was in the same direction:
$$\tau_\alpha = u_\alpha - \begin{cases} u_\alpha\,\tilde{\sigma}(\theta_\alpha) & u_\alpha > 0 \\ 0 & u_\alpha = 0 \\ u_\alpha\,\tilde{\sigma}(-\theta_\alpha) & u_\alpha < 0 \end{cases}$$
where $\tilde{\sigma}(\theta)$ increases linearly from 0 to 1 as $\theta$ goes from $\pi/2$ to $3\pi/4$:
$$\tilde{\sigma}(\theta) = \begin{cases} 0 & \theta \leq \pi/2 \\ (\theta - \pi/2)/(\pi/4) & \pi/2 < \theta < 3\pi/4 \\ 1 & \theta \geq 3\pi/4 \end{cases}$$
The parameter values were as per Model 1 of the human arm in section 3.1.1 of the PhD thesis of Li [Li, 2006] from the Todorov lab, namely $m_1 = 1.4$ kg, $m_2 = 1.1$ kg, $l_1 = 0.3$ m, $l_2 = 0.33$ m, $s_1 = 0.11$ m, $s_2 = 0.16$ m, $I_1 = 0.025$ kg m$^2$, $I_2 = 0.045$ kg m$^2$, and $b_{11} = b_{22} = 0.05$, $b_{12} = b_{21} = 0.025$. Acceleration due to gravity was set at $g = 9.81$ m/s$^2$. For the arm, we did not filter the reference variables for calculating the error. The input torque $\vec{u}(t)$ for learning the two-link arm was generated not by switching the pulse and pedestal values sharply every 50 ms and Tperiod, as for the other systems, but by linearly interpolating in between, to avoid oscillations from sharp transitions due to the feedback loop in the input in the general network (Supplementary Fig. S4).
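A Python sketch of one integration step of this arm model, using the parameter values above, is given below; the torque-limiting term σ̃ and the scaling of the variables are omitted for brevity, and the time step is illustrative.

import numpy as np

# Sketch: one Euler step of the two-link arm, eqs. (18)-(19), with the parameters above.
m1, m2, l1, l2 = 1.4, 1.1, 0.3, 0.33      # l2 does not appear in the equations above
s1, s2, I1, I2 = 0.11, 0.16, 0.025, 0.045
b11, b12, b21, b22 = 0.05, 0.025, 0.025, 0.05
g = 9.81
d1, d2, d3 = I1 + I2 + m2 * l1**2, m2 * l1 * s2, I2
B = np.array([[b11, b12], [b21, b22]])

def arm_step(theta, omega, tau, dt=1e-3):
    # theta, omega, tau: length-2 numpy arrays (angles, angular velocities, torques)
    c2, sin2 = np.cos(theta[1]), np.sin(theta[1])
    M = np.array([
        [d1 + 2 * d2 * c2 + m1 * s1**2 + m2 * s2**2, d3 + d2 * c2 + m2 * s2**2],
        [d3 + d2 * c2 + m2 * s2**2,                  d3 + m2 * s2**2],
    ])
    C = d2 * sin2 * np.array([-omega[1] * (2 * omega[0] + omega[1]), omega[0]**2])
    D = np.array([
        (m1 * s1 + m2 * l1) * np.sin(theta[0]) + m2 * s2 * np.sin(theta[0] + theta[1]),
        m2 * s2 * np.sin(theta[0] + theta[1]),
    ])
    omega_dot = np.linalg.solve(M, tau - C - B @ omega - g * D)
    return theta + dt * omega, omega + dt * omega_dot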
The input torque $\vec{u}$ and the variables $\vec{\omega}$, $\vec{\theta}$ obtained on integrating the arm model above were scaled by 0.02, 0.05 and 1/2.5 respectively, and then used as the reference for the spiking network. Effectively, we scaled the input torques to cover one-fifth of the representation radius, the angular velocities one-half, and the angles the full radius, as each successive variable was the integral of the previous one.
5 Acknowledgements
We thank Johanni Brea and Samuel Muscinelli for helpful discussions and comments on the manuscript. Financial support was provided by the European Research Council (Multirules, grant agreement no. 268 689) and by the Swiss National Science Foundation (grant agreement no. CRSII 147636).
6 References
[Abbott et al., 2016] Abbott, L. F., DePasquale, B., and Memmesheimer, R.-M. (2016). Building functional networks of spiking model neurons. Nature Neuroscience, 19(3):350–355. [Bengio et al., 1994] Bengio, Y., Simard, P., and Frasconi, P. (1994). Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2):157– 166. [Bourdoukan and Den`eve, 2015] Bourdoukan, R. and Den`eve, S. (2015). Enforcing balance allows local supervised learning in spiking recurrent networks. In Cortes, C., Lawrence, N. D., Lee, D. D., Sugiyama, M., Garnett, R., and Garnett, R., editors, Advances in Neural Information Processing Systems 28, pages 982–990. Curran Associates, Inc. [Brown and Hestrin, 2009] Brown, S. P. and Hestrin, S. (2009). Intracortical circuits of pyramidal neurons reflect their long-range axonal targets. Nature, 457(7233):1133–1136. [Burbank, 2015] Burbank, K. S. (2015). Mirrored STDP Implements Autoencoder Learning in a Network of Spiking Neurons. PLoS Computational Biology, 11(12). [Burnod et al., 1992] Burnod, Y., Grandguillaume, P., Otto, I., Ferraina, S., Johnson, P. B., and Caminiti, R. (1992). Visuomotor transformations underlying arm movements toward visual targets: A neural network model of cerebral cortical operations. Journal of Neuroscience, 12(4):1435–1453. [Churchland et al., 2012] Churchland, M. M., Cunningham, J. P., Kaufman, M. T., Foster, J. D., Nuyujukian, P., Ryu, S. I., and Shenoy, K. V. (2012). Neural population dynamics during reaching. Nature, 487(7405):51–56. [Dadarlat et al., 2015] Dadarlat, M. C., O’Doherty, J. E., and Sabes, P. N. (2015). A learningbased approach to artificial sensory feedback leads to optimal integration. Nature Neuroscience, 18(1):138–144. [Davidson and Wolpert, 2005] Davidson, P. R. and Wolpert, D. M. (2005). Widespread access to predictive models in the motor system: A short review. Journal of Neural Engineering, 2(3):S313. [Den`eve and Machens, 2016] Den`eve, S. and Machens, C. K. (2016). Efficient codes and balanced networks. Nature Neuroscience, 19(3):375–382. 30
[DePasquale et al., 2016] DePasquale, B., Churchland, M. M., and Abbott, L. F. (2016). Using Firing-Rate Dynamics to Train Recurrent Networks of Spiking Model Neurons. arXiv:1601.07620 [q-bio]. [Doya, 1992] Doya, K. (1992). Bifurcations in the learning of recurrent neural networks. In , 1992 IEEE International Symposium on Circuits and Systems, 1992. ISCAS ’92. Proceedings, volume 6, pages 2777–2780 vol.6. [Eliasmith, 2005] Eliasmith, C. (2005). A Unified Approach to Building and Controlling Spiking Attractor Networks. Neural Computation, 17(6):1276–1314. [Eliasmith and Anderson, 2004] Eliasmith, C. and Anderson, C. H. (2004). Neural Engineering: Computation, Representation, and Dynamics in Neurobiological Systems. MIT Press. [Friston, 2008] Friston, K. (2008). Hierarchical Models in the Brain. PLOS Computational Biology, 4(11):e1000211. [Funahashi, 1989] Funahashi, K.-I. (1989). On the approximate realization of continuous mappings by neural networks. Neural Networks, 2(3):183–192. [Gerstner et al., 2014] Gerstner, W., Kistler, W. M., Naud, R., and Paninski, L. (2014). Neuronal Dynamics: From Single Neurons to Networks and Models of Cognition. Cambridge University Press, Cambridge, United Kingdom, 1 edition edition. [Girosi and Poggio, 1990] Girosi, F. and Poggio, T. (1990). Networks and the best approximation property. Biological Cybernetics, 63(3):169–176. [Hennequin et al., 2014] Hennequin, G., Vogels, T. P., and Gerstner, W. (2014). Optimal Control of Transient Dynamics in Balanced Networks Supports Generation of Complex Movements. Neuron, 82(6):1394–1406. [Hilber and Caston, 2001] Hilber, P. and Caston, J. (2001). Motor skills and motor learning in Lurcher mutant mice during aging. Neuroscience, 102(3):615–623. [Hochreiter et al., 2001] Hochreiter, S., Bengio, Y., Frasconi, P., and Schmidhuber, J. (2001). Gradient Flow in Recurrent Nets: The Difficulty of Learning Long-Term Dependencies. [Hochreiter and Schmidhuber, 1997] Hochreiter, S. and Schmidhuber, J. (1997). Long ShortTerm Memory. Neural Computation, 9(8):1735–1780. [Hoerzer et al., 2014] Hoerzer, G. M., Legenstein, R., and Maass, W. (2014). Emergence of Complex Computational Structures From Chaotic Neural Networks Through Reward-Modulated Hebbian Learning. Cerebral Cortex, 24(3):677–690. [Ioannou and Fidan, 2006] Ioannou, P. and Fidan, B. (2006). Adaptive Control Tutorial. SIAM, Society for Industrial and Applied Mathematics, Philadelphia, PA. [Ioannou and Sun, 2012] Ioannou, P. and Sun, J. (2012). Robust Adaptive Control. Dover Publications, Mineola, New York, first edition, first edition edition. [Jaeger, 2001] Jaeger, H. (2001). The ”echo state” approach to analysing and training recurrent neural networks. Technical report. [Jaeger, 2005] Jaeger, H. (2005). A Tutorial on Training Recurrent Neural Networks, Covering BPPT, RTRL, EKF and the ”Echo State Network” Approach.
[Jaeger and Haas, 2004] Jaeger, H. and Haas, H. (2004). Harnessing Nonlinearity: Predicting Chaotic Systems and Saving Energy in Wireless Communication. Science, 304(5667):78–80. [Joshi and Maass, 2005] Joshi, P. and Maass, W. (2005). Movement Generation with Circuits of Spiking Neurons. Neural Computation, 17(8):1715–1738. [Khazipov et al., 2004] Khazipov, R., Sirota, A., Leinekugel, X., Holmes, G. L., Ben-Ari, Y., and Buzs´aki, G. (2004). Early motor activity drives spindle bursts in the developing somatosensory cortex. Nature, 432(7018):758–761. [Lalazar and Vaadia, 2008] Lalazar, H. and Vaadia, E. (2008). Neural basis of sensorimotor learning: Modifying internal models. Current Opinion in Neurobiology, 18(6):573–581. [Legenstein et al., 2010] Legenstein, R., Chase, S. M., Schwartz, A. B., and Maass, W. (2010). A Reward-Modulated Hebbian Learning Rule Can Explain Experimentally Observed Network Reorganization in a Brain Control Task. Journal of Neuroscience, 30(25):8400–8410. [Legenstein and Maass, 2007] Legenstein, R. and Maass, W. (2007). Edge of chaos and prediction of computational performance for neural circuit models. Neural Networks, 20(3):323–334. [Legenstein et al., 2003] Legenstein, R., Markram, H., and Maass, W. (2003). Input prediction and autonomous movement analysis in recurrent circuits of spiking neurons. Reviews in the Neurosciences, 14(1-2):5–19. [Li, 2006] Li, W. (2006). Optimal Control for Biological Movement Systems. PhD thesis, University of California, San Diego. [Lillicrap et al., 2016] Lillicrap, T. P., Cownden, D., Tweed, D. B., and Akerman, C. J. (2016). Random synaptic feedback weights support error backpropagation for deep learning. Nature Communications, 7:13276. [Maass and Markram, 2004] Maass, W. and Markram, H. (2004). On the computational power of circuits of spiking neurons. Journal of Computer and System Sciences, 69(4):593–616. [Maass et al., 2002] Maass, W., Natschl¨ager, T., and Markram, H. (2002). Real-Time Computing Without Stable States: A New Framework for Neural Computation Based on Perturbations. Neural Computation, 14(11):2531–2560. [MacNeil and Eliasmith, 2011] MacNeil, D. and Eliasmith, C. (2011). Fine-Tuning and the Stability of Recurrent Neural Networks. PLoS ONE, 6(9):e22885. [Markram et al., 2015] Markram, H., Muller, E., Ramaswamy, S., Reimann, M. W., Abdellah, M., Sanchez, C. A., Ailamaki, A., Alonso-Nanclares, L., Antille, N., Arsever, S., Kahou, G. A. A., Berger, T. K., Bilgili, A., Buncic, N., Chalimourda, A., Chindemi, G., Courcol, J.D., Delalondre, F., Delattre, V., Druckmann, S., Dumusc, R., Dynes, J., Eilemann, S., Gal, E., Gevaert, M. E., Ghobril, J.-P., Gidon, A., Graham, J. W., Gupta, A., Haenel, V., Hay, E., Heinis, T., Hernando, J. B., Hines, M., Kanari, L., Keller, D., Kenyon, J., Khazen, G., Kim, Y., King, J. G., Kisvarday, Z., Kumbhar, P., Lasserre, S., Le B´e, J.-V., Magalh˜aes, B. R. C., Merch´an-P´erez, A., Meystre, J., Morrice, B. R., Muller, J., Mu˜ noz-C´espedes, A., Muralidhar, S., Muthurasa, K., Nachbaur, D., Newton, T. H., Nolte, M., Ovcharenko, A., Palacios, J., Pastor, L., Perin, R., Ranjan, R., Riachi, I., Rodr´ıguez, J.-R., Riquelme, J. L., R¨ossert, C., Sfyrakis, K., Shi, Y., Shillcock, J. C., Silberberg, G., Silva, R., Tauheed, F., Telefont, M., Toledo-Rodriguez, M., Tr¨ankler, T., Van Geit, W., D´ıaz, J. V., Walker, R., Wang, Y., Zaninetta, S. M., DeFelipe, J., Hill, S. L., Segev, I., and Sch¨ urmann, F. (2015). Reconstruction and Simulation of Neocortical Microcircuitry. Cell, 163(2):456–492. 32
[Narendra and Annaswamy, 1989] Narendra, K. S. and Annaswamy, A. M. (1989). Stable Adaptive Systems. Prentice-Hall, Inc. [Nicola and Clopath, 2016] Nicola, W. and Clopath, C. (2016). Supervised Learning in Spiking Neural Networks with FORCE Training. arXiv:1609.02545 [q-bio]. [Parisien et al., 2008] Parisien, C., Anderson, C. H., and Eliasmith, C. (2008). Solving the Problem of Negative Synaptic Weights in Cortical Models. Neural Computation, 20(6):1473– 1494. [Pearlmutter, 1995] Pearlmutter, B. A. (1995). Gradient calculations for dynamic recurrent neural networks: A survey. IEEE Transactions on Neural Networks, 6(5):1212–1228. [Petersson et al., 2003] Petersson, P., Waldenstr¨om, A., F˚ ahraeus, C., and Schouenborg, J. (2003). Spontaneous muscle twitches during sleep guide spinal self-organization. Nature, 424(6944):72–75. [Poggio, 1990] Poggio, T. (1990). A theory of how the brain might work. Cold Spring Harbor Symposia on Quantitative Biology, 55:899–910. [Porrill et al., 2004] Porrill, J., Dean, P., and Stone, J. V. (2004). Recurrent cerebellar architecture solves the motor-error problem. Proceedings of the Royal Society B: Biological Sciences, 271(1541):789–796. [Pouget and Sejnowski, 1997] Pouget, A. and Sejnowski, T. J. (1997). Spatial Transformations in the Parietal Cortex Using Basis Functions. Journal of Cognitive Neuroscience, 9(2):222– 237. [Pouget and Snyder, 2000] Pouget, A. and Snyder, L. H. (2000). Computational approaches to sensorimotor transformations. Nature Neuroscience, 3:1192–1198. [Roelfsema and van Ooyen, 2005] Roelfsema, P. R. and van Ooyen, A. (2005). Attention-gated reinforcement learning of internal representations for classification. Neural Computation, 17(10):2176–2214. [Rumelhart et al., 1986] Rumelhart, D., Hinton, G., and Williams, R. (1986). Learning Internal Representations by Error Propagation. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol 1, volume 1. MIT Press Cambridge, MA, USA. [Sarlegna and Sainburg, 2009] Sarlegna, F. R. and Sainburg, R. L. (2009). The Roles of Vision and Proprioception in the Planning of Reaching Movements. Advances in experimental medicine and biology, 629:317–335. [Seung et al., 2000] Seung, H. S., Lee, D. D., Reis, B. Y., and Tank, D. W. (2000). Stability of the Memory of Eye Position in a Recurrent Network of Conductance-Based Model Neurons. Neuron, 26(1):259–271. [Song et al., 2016] Song, H. F., Yang, G. R., and Wang, X.-J. (2016). Training ExcitatoryInhibitory Recurrent Neural Networks for Cognitive Tasks: A Simple and Flexible Framework. PLOS Comput Biol, 12(2):e1004792. [Stewart et al., 2009] Stewart, T. C., Tripp, B., and Eliasmith, C. (2009). Python scripting in the Nengo simulator. Frontiers in Neuroinformatics, 3.
[Sussillo and Abbott, 2012] Sussillo, D. and Abbott, L. (2012). Transferring Learning from External to Internal Weights in Echo-State Networks with Sparse Connectivity. PLoS ONE, 7(5):e37372. [Sussillo and Abbott, 2009] Sussillo, D. and Abbott, L. F. (2009). Generating Coherent Patterns of Activity from Chaotic Neural Networks. Neuron, 63(4):544–557. [Sutton, 1996] Sutton, R. S. (1996). Generalization in reinforcement learning: Successful examples using sparse coarse coding. Advances in Neural Information Processing Systems 8, pages 138–1044. [Thalmeier et al., 2016] Thalmeier, D., Uhlmann, M., Kappen, H. J., and Memmesheimer, R.-M. (2016). Learning Universal Computations with Spikes. PLOS Comput Biol, 12(6):e1004895. [Turaga et al., 2006] Turaga, S. C., Sompolinsky, H., and Seung, H. S. (2006). Online learning in a model neural integrator. [Urbanczik and Senn, 2014] Urbanczik, R. and Senn, W. (2014). Learning by the Dendritic Prediction of Somatic Spiking. Neuron, 81(3):521–528. [Voegtlin, 2006] Voegtlin, T. (2006). Temporal Coding using the Response Properties of Spiking Neurons. pages 1457–1464. [Vogels et al., 2011] Vogels, T. P., Sprekeler, H., Zenke, F., Clopath, C., and Gerstner, W. (2011). Inhibitory Plasticity Balances Excitation and Inhibition in Sensory Pathways and Memory Networks. Science, 334(6062):1569 –1573. [Williams and Zipser, 1989] Williams, R. J. and Zipser, D. (1989). A Learning Algorithm for Continually Running Fully Recurrent Neural Networks. Neural Computation, 1(2):270–280. [Wolpert and Ghahramani, 2000] Wolpert, D. M. and Ghahramani, Z. (2000). Computational principles of movement neuroscience. Nature Neuroscience, 3:1212–1217. [Wong et al., 2012] Wong, J. D., Kistemaker, D. A., Chin, A., and Gribble, P. L. (2012). Can proprioceptive training improve motor learning? Journal of Neurophysiology, 108(12):3313– 3321. [Zago et al., 2005] Zago, M., Bosco, G., Maffei, V., Iosa, M., Ivanenko, Y. P., and Lacquaniti, F. (2005). Fast Adaptation of the Internal Model of Gravity for Manual Interceptions: Evidence for Event-Dependent Learning. Journal of Neurophysiology, 93(2):1055–1068. [Zago et al., 2009] Zago, M., McIntyre, J., Senot, P., and Lacquaniti, F. (2009). Visuo-motor coordination and internal models for object interception. Experimental Brain Research, 192(4):571–604. [Zerkaoui et al., 2009] Zerkaoui, S., Druaux, F., Leclercq, E., and Lefebvre, D. (2009). Stable adaptive control with recurrent neural networks for square MIMO non-linear systems. Engineering Applications of Artificial Intelligence, 22(4–5):702–717.
7 Supplementary Material
7.1 Decoding
Consider the problem of decoding an arbitrary output $\vec{v}(\vec{u})$, corresponding to the $\vec{u}$ encoded in the feedforward first layer, from the spike trains $S^{ff}_k(t)$ of the neurons, by synaptically filtering and linearly weighting the trains with decoding weights $d^{(\vec{v})}_{\alpha k}$:
$$\hat{v}_\alpha(\vec{u}) = \sum_k d^{(\vec{v})}_{\alpha k}(S^{ff}_k * \kappa)(t), \qquad (20)$$
where $*$ denotes convolution, $(S^{ff}_k * \kappa)(t) \equiv \int_{-\infty}^{t} S^{ff}_k(t')\kappa(t - t')dt' = \int_0^\infty S^{ff}_k(t - t')\kappa(t')dt'$, and $\kappa(t) \equiv \exp(-t/\tau_s)/\tau_s$ is a normalized filtering kernel.

We can obtain the decoders $d^{(\vec{v})}_{\alpha k}$ by minimizing the loss function
$$L = \left\langle \sum_\alpha \left(v_\alpha(\vec{u}) - \sum_k d^{(\vec{v})}_{\alpha k}\langle S^{ff}_k * \kappa\rangle_t\right)^2 \right\rangle_{\vec{u}} \qquad (21)$$
with respect to the decoders. The average $\langle\cdot\rangle_{\vec{u}}$ over $\vec{u}$ guarantees that the same constant decoders are used over the whole range of constant inputs $\vec{u}$. The time average $\langle\cdot\rangle_t$ denotes an analytic rate computed for each constant input for a LIF neuron. Linear regression with a finite set of constant inputs $\vec{u}$ was used to obtain the decoders (see Methods). With these decoders, if the input $\vec{u}$ varies slowly compared to the synaptic time constant $\tau_s$, we have $\hat{v}_\alpha = \sum_k d^{(\vec{v})}_{\alpha k}(S^{ff}_k * \kappa)(t) \approx v_\alpha(\vec{u})$.
Any function $\vec{v}(\vec{u})$ of the input can be approximated with appropriate linear decoding weights $d^{(\vec{v})}_{\alpha k}$ acting on the high-dimensional basis of non-linear tuning curves of heterogeneous neurons having different biases, encoding weights and gains, as schematized in Figure 2B. With a large enough number of such neurons, the function is expected to be approximated to arbitrary accuracy. While this has not been proven rigorously for spiking neurons, it has theoretical underpinnings in theorems on universal function approximation using non-linear basis functions [Funahashi, 1989, Girosi and Poggio, 1990], successful usage in spiking neural network models by various groups [Eliasmith and Anderson, 2004, Eliasmith, 2005, Pouget and Sejnowski, 1997, Seung et al., 2000], and biological plausibility [Poggio, 1990, Burnod et al., 1992, Pouget and Sejnowski, 1997].
Here, the neurons that are active at any given time operate in the mean-driven regime, i.e. the instantaneous firing rate increases with the input current [Gerstner et al., 2014]. The dynamics is dominated by synaptic filtering, and the membrane time constant does not play a significant role [Eliasmith and Anderson, 2004, Eliasmith, 2005, Seung et al., 2000, Abbott et al., 2016]. Thus, the decoding weights derived from eq. (21) with stationary input are good approximations even in the time-dependent case, as long as the input varies on a time scale slower than the synaptic time constant.
7.2 Online learning based on a loss function and its shortcomings
As mentioned in the 'Aside' in the derivation and stability proof of the learning scheme in Methods subsection 4.1, the feedforward connections must learn to decode $\tilde{g}_\alpha(\vec{u}) = \tau_s g_\alpha(\hat{\vec{u}})$, while the recurrent connections must decode $\tilde{f}_\alpha(\vec{x}) = \tau_s f_\alpha(\hat{\vec{x}}) + \hat{x}_\alpha$ [Eliasmith and Anderson, 2004].
Assuming that the time scales of the dynamics are slower than the synaptic time scale $\tau_s$, we can approximate the requisite feedforward and recurrent weights by minimizing the following loss functions, respectively, with respect to the weights:
$$L^{ff} = \left\langle \sum_j \left( \sum_\alpha e_{j\alpha}\,\tilde{g}_\alpha(\vec{u}) - \sum_k w^{ff}_{jk}\langle S^{ff}_k * \kappa\rangle_t \right)^2 \right\rangle_{\vec{u}}, \qquad (22)$$
$$L^{rec} = \left\langle \sum_j \left( \sum_\alpha e_{j\alpha}\,\tilde{f}_\alpha(\vec{x}) - \sum_i w_{ji}\langle S_i * \kappa\rangle_t \right)^2 \right\rangle_{\vec{x}}. \qquad (23)$$
Using these loss functions, we can pre-calculate the weights required for any dynamical system numerically, similarly to the calculation of decoders in Supplementary subsection 7.1. We now derive rules for learning the weights online, based on stochastic gradient descent on these loss functions, similar to [MacNeil and Eliasmith, 2011], and point out their lacunae. The learning rule for the recurrent weights by gradient descent on the loss function given by equation (23) is
$$\frac{dw_{ji}}{dt} = -\frac{\eta}{2}\frac{\partial L^{rec}}{\partial w_{ji}}$$
$$\approx \eta\left\langle\left(\sum_\beta e_{j\beta}\,\tilde{f}_\beta(\vec{x}) - \sum_i w_{ji}(S_i * \kappa)(t)\right)(S_i * \kappa)(t)\right\rangle_{\vec{x}}$$
$$\equiv -\eta\left\langle \epsilon^{(\tilde{f})}_j (S_i * \kappa)(t)\right\rangle_{\vec{x}}. \qquad (24)$$
In the second line, the effect of the weight change on the filtered spike trains is assumed small and is neglected, using a small learning rate $\eta$. With the requisite dynamics slower than the synaptic $\tau_s$, and with a large enough number of neurons, we have approximated a time average over a synaptic duration $\tau_s$ before each time point, such that $\sum_i w_{ji}\langle S_i * \kappa\rangle_t(t) \approx \sum_i w_{ji}(S_i * \kappa)(t)$. The third line defines an error in the projected $\tilde{\vec{f}}(\vec{x})$, which is the supervisory signal. If we assume that the learning rate is slow, and that the input samples the range of $\vec{x}$ uniformly, then we can remove the averaging over $\vec{x}$, similar to stochastic gradient descent:
$$\frac{dw_{ji}}{dt} \approx -\eta\, \epsilon^{(\tilde{f})}_j (S_i * \kappa)(t), \qquad (25)$$
where $\epsilon^{(\tilde{f})}_j \equiv \sum_i w_{ji}(S_i * \kappa)(t) - \sum_\beta e_{j\beta}\,\tilde{f}_\beta(\vec{x})$. This learning rule is the product of a projected multi-dimensional error $\epsilon^{(\tilde{f})}_j$ and the filtered pre-synaptic spike train $(S_i * \kappa)(t)$. However, this projected error in the unobservable $\tilde{\vec{f}}$ is not available to the post-synaptic neuron, making the learning rule non-local. A similar issue arises in the feedforward case.
In mimicking a dynamical system, we want only the observable output of the dynamical system, i.e. $\vec{x}$, to be used in the supervisory signal, not a term involving the unknown $f(\vec{x})$ appearing in the derivative $\dot{\vec{x}}$. Even if this derivative is computed from the observable $\vec{x}$, it will be noisy. Furthermore, this derivative cannot be obtained by differentiating the observable with respect to time if the observable is not directly the output but an unknown non-linear function of it, a case which our FOLLOW learning can nevertheless handle (see Supplementary subsection 7.3). Thus, using the observable error, this online rule can only learn an integrator as in [MacNeil and Eliasmith, 2011], for which $f(x) \sim x$. Indeed, learning both the feedforward and recurrent weights simultaneously using gradient descent on these loss functions requires two different and unavailable error currents to be projected into the post-synaptic neuron to make the rule local.
7.3 General dynamical system
General dynamical systems of the form (2), namely
$$\frac{d\vec{y}(t)}{dt} = \vec{h}(\vec{y}(t), \vec{u}(t)), \qquad \vec{x}(t) = \vec{k}(\vec{y}(t)),$$
can be learned using a different network configuration than the Figure 2A configuration used for systems of the form (1). Here, the state variable is $\vec{y}$, but the observable that serves as the reference to the network is $\vec{x}$. The dimensionalities of the relevant variables, namely (1) the state variables (say joint angles and velocities) $\vec{y}$; (2) the observables represented in the brain (say sensory representations of the joint angles and velocities) $\vec{x}$; and (3) the control input (motor command) $\vec{u}$, can differ from each other, but must be small compared to the number of neurons. Furthermore, we require the observable $\vec{x}$ not to lose information compared to $\vec{y}$, i.e. $\vec{k}$ must be invertible, so $\vec{x}$ will have at least the same dimension as $\vec{y}$. The time evolution of the observable is
$$\dot{x}_\beta = \sum_\alpha \frac{\partial k_\beta(\vec{y})}{\partial y_\alpha}\dot{y}_\alpha = \sum_\alpha \frac{\partial k_\beta(\vec{y})}{\partial y_\alpha} h_\alpha(\vec{y}, \vec{u}) \equiv p_\beta(\vec{x}, \vec{u}).$$
So we essentially need to learn $\dot{x}_\beta = p_\beta(\vec{x}, \vec{u})$.
Consider an augmented vector of the state and input variables, $\tilde{x}_\gamma \equiv [x_\alpha, u_\beta]$, where the index $\gamma$ runs serially over the values of the indices $\alpha$ and $\beta$. Its time derivative is $\dot{\tilde{x}}_\gamma = [\dot{x}_\alpha, \dot{u}_\beta] = [p_\alpha(\tilde{\vec{x}}), \dot{u}_\beta]$. The $\alpha$ components are of the same form as (1), but the $\beta$ components involve $\dot{u}_\beta$, which is not specified as a function $p_\beta(\tilde{\vec{x}})$. However, we do not need $\dot{u}_\beta$, as only the state variables need to be predicted, not the input. Our network for the general dynamical system, shown in Supplementary Figure S4, decodes the augmented $\hat{\tilde{\vec{x}}}$, i.e. both $\hat{\vec{x}}$ and $\hat{\vec{u}}$, as output. The error in the augmented $\tilde{\vec{x}}$ is fed back to the network. Since $p_\alpha(\tilde{\vec{x}})$ is a mixed function of the state and input variables, $\vec{u}$ is no longer fed into the network via a feedforward layer; rather, it enters only via the error in the augmented variable. Once learning is complete, the error feedback in $\hat{\vec{x}}$ can be stopped, but the error in $\hat{\vec{u}}$ must still be fed back, so that $\vec{u}$ functions as a motor command.
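A minimal sketch of how the augmented error enters the neurons in this configuration is given below (Python; the names, shapes and the gain value are illustrative assumptions, and the rest of the network is as in the main scheme).

import numpy as np

# Sketch: feedback current from the augmented error, epsilon_tilde = (x_hat - x, u_hat - u).
# E_aug: (N, Nx + Nu) encoders for the augmented variable; k << 0 is the feedback gain.
def augmented_error_current(x_hat, u_hat, x_true, u_true, E_aug, k=-10.0):
    err_aug = np.concatenate([x_hat - x_true, u_hat - u_true])
    return k * (E_aug @ err_aug)        # injected current per neuron; there is no feedforward layer

# After learning, feeding back only the input part keeps u acting as a motor command:
def error_current_after_learning(u_hat, u_true, x_dim, E_aug, k=-10.0):
    err_aug = np.concatenate([np.zeros(x_dim), u_hat - u_true])
    return k * (E_aug @ err_aug)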
7.4 Approximation error causes drift in weights
A frozen-noise term $\xi(\vec{x}(t))$, due to the approximate decoding from the non-linear tuning curves of neurons by the feedforward weights, recurrent weights and output decoders, will appear additionally in equation (13). If this frozen noise has a non-zero mean over time as $\vec{x}(t)$ varies, leading to a non-zero mean error, then it causes a drift in the weights due to the error-based learning rules in equations (6), and possibly a consequent increase in error. Note that the proofs of stability and of the error tending to zero assume that this frozen noise is negligible.
Multiple strategies, with contrasting pros and cons, have been proposed to counteract this parameter drift in the robust adaptive control literature [Ioannou and Sun, 2012, Narendra and Annaswamy, 1989, Ioannou and Fidan, 2006]. These include a dead-zone strategy, with no updating of the weights once the error is lower than a set value, or a weight leakage / regularizer term switched on slowly when a weight crosses a threshold. In our simulations, the error continued to drop even over longer than typical learning time scales (Figure 7), and so we did not implement these strategies. In practice, the learning can be stopped once the error is low enough, while the error feedback can be continued, so that the system does not deviate too much from the observed one.
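For illustration, both safeguards could be wrapped around the FOLLOW update roughly as below (a Python sketch; the thresholds and rates are arbitrary placeholders, and neither safeguard was used in the simulations reported here).

import numpy as np

# Sketch: dead-zone and weight-leakage safeguards around the FOLLOW weight update.
def follow_update_robust(w, err_proj, presyn_filt, eta, dt,
                         dead_zone=1e-3, w_bound=1.0, leak_rate=1e-4):
    dw = -eta * dt * np.outer(err_proj, presyn_filt)
    dw[np.abs(err_proj) < dead_zone, :] = 0.0       # dead zone: no update when the projected error is small
    w = np.where(np.abs(w) > w_bound, w * (1.0 - leak_rate * dt), w)   # leak weights that exceed a bound
    return w + dw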
7.5 Supplementary Figures
Figure S1: Learning the Lorenz system without filtering the reference variables. Panels A-E are interpreted similarly to Figure 5A-E, except that the reference signal was not filtered in computing the error, and is also plotted without filtering. Without filtering, the tent map for the network output (panel E, red) shows a doubling, but the mismatch for large maxima is reduced compared to the filtered case in Figure 5.
Figure S2: Learning a non-linear feedforward transformation with linear recurrent dynamics via FOLLOW. Panels A-F are interpreted similarly to Figure 3A-F, except that in panel A, along with the input u2(t) (blue) to layer 1, the required non-linear transform g2(u(t)) (divided by 20) is also plotted (cyan); and in panels E and F, the evolution and histogram of weights are those of the feedforward weights, which perform the non-linear transform on the input.
[Figure S3, panels A (feedforward weights, R² = -13.354) and B (recurrent weights, R² = 0.942); axes: non-linear-system weights versus linear-system weights (arb × 10⁻⁴).]
Figure S3: Feedforward weights are uncorrelated, while recurrent ones are correlated, when learning the same recurrent dynamics with different feedforward transforms. The linear decaying-oscillators system was learned for 10000 s with either a non-linear or a linear feedforward transform. A. The learned feedforward weights $w^{ff}_{ij}$ for the system with the non-linear feedforward transform are plotted versus those for the system with the linear feedforward transform. The feedforward weights of the two systems do not fit the identity line (the coefficient of determination R² is negative; R² is not the square of a number and can be negative), showing that the learned feedforward transform is different in the two systems, as expected. B. Same as A, but for the recurrent weights of the two systems. The weights fit the identity line with an R² close to 1, showing that the learned recurrent transform is similar in the two systems, as expected. Some weights fall outside the plot limits.
Figure S4: Network configuration for learning a general dynamical system. The recurrent network decodes the augmented vector $\hat{\tilde{x}}_\gamma = (\hat{\vec{x}}, \hat{\vec{u}})$. Compare this configuration with that in Figure 2A. The input is no longer provided directly to the network, but only via the augmented error signal. The augmented error signal is fed back into the recurrent layer. Only the recurrent weights need to be learned, as there is no feedforward layer here. After learning is turned off, the error in the input must continue to be fed back to the neurons, to serve as a motor command; but the error in the state variables need not be.