Implementing Learning on the SpiNNaker Universal Neural Chip Multiprocessor

Xin Jin, Alexander Rast, Francesco Galluppi, Mukaram Khan, and Steve Furber

School of Computer Science, University of Manchester, Manchester, UK, M13 9PL
{jinxa,rasta,francesco.galluppi,khanm}@cs.man.ac.uk, steve.furber@manchester.ac.uk
http://www.cs.manchester.ac.uk/apt

Abstract. Large-scale neural simulation virtually necessitates dedicated hardware with on-chip learning. Using SpiNNaker, a universal neural network chip multiprocessor, we demonstrate an STDP implementation as an example of programmable on-chip learning for dedicated neural hardware. By using a pre-synaptic sensitive scheme, we optimize both the data representation and processing for efficiency of implementation. The deferred-event model we developed provides a reconfigurable length of timing records to meet different requirements of accuracy. Results demonstrate successful STDP within a multi-chip simulation containing 60 neurons and 240 synapses. This optimisable learning model illustrates scalable general-purpose techniques essential for developing functional learning rules on general-purpose, parallel neural hardware.

Key words: Neural, Spiking, SpiNNaker, Learning, Event-Driven, STDP

1 Introduction

Neural networks are intrinsically learning systems; it follows that hardware designed to support neural networks ought to support learning. Nonetheless, many neural network chips have opted not to support any learning on-chip, partly because of scalability concerns revolving around complex update circuitry [1]. Partly this is a result of a well-known and even more fundamental limitation of most neural hardware: a fixed model, where the device can support one or at most a few selected families of neural network. A universal neural network device is necessary to develop large networks without prior commitment to the model. Such a device must have general-purpose support for on-chip learning. Furthermore, by not being "hard-wired", it can mitigate the scalability concerns involved, exchanging expensive update circuits for simpler general-purpose synaptic logic. For such an architecture, there are three principal requirements: 1) that the device have specific dedicated programmable hardware that the model can use to implement learning, 2) that the learning rule itself be purely software or configuration commands, and 3) that the learning implementation be efficient enough to realise the gains of hardware in a scalable way.

In the SpiNNaker chip, an example of a universal neural network chip, the previously-introduced virtual synaptic channel circuitry [2] provides generalised support for on-chip learning without constraining the learning rule. Using deferred-event processing to reorder events makes possible an efficient software-based, event-driven implementation of the well-known STDP learning rule. The methodology has the further advantage of being efficient in both memory utilisation and processing overhead, since it only requires an update on receipt of a pre-synaptic spike. The methods we show here translate theoretical learning rules into efficient actual implementations, and provide a path for future development of (possibly as yet undiscovered) learning rules in hardware.

2 Architecture and Models

2.1 The STDP model

Fig. 1. STDP update rule: theoretical update curves and conceptual implementation. (a) STDP rule. (b) Update methods: on a new spike event the last synapse data (Wij, t) are retrieved from the shared synapse memory into local memory; the local weights are then updated (Wij + δWij(t)) and processed, and the updated weights and times (Wij', t') are written back to the shared synapse memory before the next event.

The model we consider is the Gerstner [3] spike-timing-dependent plasticity (STDP) learning rule, a Hebbian model for spiking neural networks. We use a simplified implementation (equation 1 and Figure 1(a)) as described in [4]:

\[
F(\Delta t) = \begin{cases} A_{+}\, e^{\Delta t/\tau_{+}} & \Delta t < 0, \\ -A_{-}\, e^{-\Delta t/\tau_{-}} & \Delta t \geq 0, \end{cases} \tag{1}
\]

\[
\Delta W = \varepsilon \Big[ \gamma + \sum_{\mathrm{pre}} \sum_{\mathrm{post}} F(\Delta t) \Big] \tag{2}
\]

where Δt is the time difference between the pre- and post-synaptic spike timing, A+ and A− are the maximum amounts of synaptic modification, and τ+ and τ− are the time windows determining the range of spike intervals over which STDP occurs. If the pre-synaptic spike arrives at the post-synaptic neuron before the post-synaptic neuron fires (i.e. t > 0), it causes long-term potentiation (LTP) and the synaptic weight is strengthened according to A+ e^{-t/τ+}. If the pre-synaptic spike arrives after the post-synaptic neuron fires (i.e. t < 0), it causes long-term depression (LTD) and the synaptic weight is weakened according to A− e^{t/τ−}. The modification is accumulated and the weight is updated according to equation 2.
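To make the rule concrete, the following is a minimal C sketch of the weight-change function F(Δt) from equation (1), assuming Δt = t_pre − t_post in milliseconds. It uses floating-point arithmetic for clarity (the on-chip implementation uses 16-bit fixed point [7]); the parameter values are those used in the test network of Section 4, and the names are illustrative.

#include <math.h>

/* Illustrative STDP parameters (Section 4 uses A+ = A- = 0.1,
 * tau+ = tau- = 32 ms). */
#define A_PLUS    0.1
#define A_MINUS   0.1
#define TAU_PLUS  32.0   /* ms */
#define TAU_MINUS 32.0   /* ms */

/* F(dt) from equation (1), with dt = t_pre - t_post:
 * dt < 0 (pre before post) gives potentiation, dt >= 0 gives depression. */
double stdp_f(double dt)
{
    if (dt < 0.0)
        return  A_PLUS  * exp( dt / TAU_PLUS);
    else
        return -A_MINUS * exp(-dt / TAU_MINUS);
}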

2.2 SpiNNaker and the Event-Driven Model

As previously introduced [5], SpiNNaker is a universal neural network chip for massively parallel real-time large-scale simulation. Without attempting to describe all the features that have been the subject of previous publication, three design aspects are critical to the on-chip learning model.

First, the mapping of neurons to processors is many-to-one. Each ARM968 processor (called a fascicle) is capable of modelling up to 1000 Izhikevich neurons [6] with 1 ms time resolution in 16-bit fixed-point arithmetic [7]. Second, local memory resources are limited: a 64KB private data Tightly-Coupled Memory (TCM) is available to each processor; global memory resources, however, are large: a 1Gb external shared SDRAM is available to all 20 processors on a given chip. A dedicated DMA controller makes global memory "virtually local" to each processor by swapping data between SDRAM and TCM [2]. Most synaptic data therefore usually resides off-chip (and off-processor), the synaptic channel providing "just-in-time" local access. Third, and most importantly, SpiNNaker uses an event-driven processing model with annotated real-time model delays [8]. There are two important events from the point of view of the model. A Timer event, occurring nominally every millisecond, drives the neural state update. A spike event, occurring (asynchronously) whenever an input Address-Event-Representation (AER) spike packet arrives at a neuron, triggers a synaptic state update. This event model makes it possible, by exploiting the difference between model "real" time and electronic "system" time, to reorder processing and redistribute synaptic memory resources in order to achieve efficient, yet accurate, on-chip learning [8].

The earlier work [8] outlines the basic method of the deferred-event model. Key details of the actual implementation optimise learning for the hardware. Neurons are mapped so as to cluster groups of post-synaptic target neurons connecting to the same pre-synaptic source onto a single fascicle. Not only does this improve routability, but it allows a single contiguous memory area (called the synapse block) to contain all the synaptic information for the group. A synapse block is a compressed line of 32-bit words, each containing a 4-bit synaptic delay, a 12-bit post-synaptic index and a 16-bit synaptic weight. A single event therefore retrieves the entire group using a DMA operation and makes the entire block of synapses locally available to its respective fascicle. This permits the characteristic feature and most significant optimisation of the method: synaptic update only upon pre-synaptic input events.
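As an illustration of the synapse block format just described, the following C macros sketch one 32-bit synaptic word with the stated field widths (4-bit delay, 12-bit post-synaptic index, 16-bit weight). The bit ordering within the word is an assumption for illustration; the paper specifies only the widths.

#include <stdint.h>

typedef uint32_t synapse_word_t;

/* Assumed packing: weight in the low half-word, index and delay above it. */
#define SYN_WEIGHT(w)  ((uint16_t)((w) & 0xFFFFu))         /* bits  0-15 */
#define SYN_INDEX(w)   ((uint16_t)(((w) >> 16) & 0xFFFu))  /* bits 16-27 */
#define SYN_DELAY(w)   ((uint8_t) (((w) >> 28) & 0xFu))    /* bits 28-31 */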

3 Methodology

3.1 Mapping STDP to SpiNNaker

Most STDP implementations trigger weight updates on both pre- and post-synaptic spikes [9], [10]. In this approach, calculating Δt is simply a matter of comparing the history records of spike timings. This corresponds to examining the past spike history (as in Figure 2(a)), at least within the STDP sensitivity window. However, in SpiNNaker, since the synapse block is a neuron-associative memory array, it can only be indexed either by the pre- or the post-synaptic neuron. If synapses are stored in pre-synaptic order, LTD will be very efficient while LTP will be inefficient, and vice versa, because one or the other lookup would require a scattered traverse of discontiguous areas of the synapse block. Furthermore, because of the virtual synaptic channel memory model, a given pre-synaptic-indexed synapse block will only appear in the TCM when an associated pre-synaptic spike arrives. As a result, a pre-post sensitive scheme would double the number of SDRAM accesses and be only partially able to take advantage of block-oriented contiguous burst transfers.

Fig. 2. STDP implementation methods: (a) pre-post sensitive; (b) pre sensitive.

To solve this problem, we developed an alternative scheme: pre-synaptic sensitive update. The pre-synaptic sensitive scheme only triggers STDP on the arrival of pre-synaptic spikes (Figure 2(b)). This guarantees that the synapse block is always in the TCM when STDP is triggered, and makes accessing individual synapses possible by efficient iteration through the array elements when the synapse block is in pre-synaptic order. However, this requires examining not only the past spike history records, but also future records. Naturally, future spike timing information is not available at the time the pre-synaptic spike arrives, since it has not happened yet. The deferred-event model solves this problem by reordering the spike timing in a time stamp and performing STDP in the future (at the current time plus the maximum delay and the time window). This ensures accurate recording and incorporation of future spike timings in the update.

3.2 Synaptic Delay and Timing Records

Synaptic delays (axonal conduction delays) play an important role in the simulation of spiking neural networks with plasticity. In SpiNNaker, delays are annotated as a post-process upon receipt of a spike, the individual delay values being a synaptic parameter. This makes the delay itself entirely programmable; the reference model uses delays from 1-16 ms for each connection [7]. STDP requires both pre-synaptic and post-synaptic spike timings. The SDRAM stores a pre-synaptic time stamp with 2 ms resolution at the beginning of each synapse block (Figure 3), which is updated when an associated spike arrives. The time stamp has two parts, a coarse time and a fine time. The coarse time is a 32-bit value representing the last time the neuron fired. The fine time is a bitmapped field of 24 bits representing the spike history over the last 48 ms. Post-synaptic time stamps reside in local TCM (Figure 3) and have a similar format to pre-synaptic time stamps except that they are 64 bits long (representing 128 ms), allowing longer history records to account for input delays. Post-synaptic time stamps are updated when their corresponding neurons fire.

Fig. 3. Time stamps for STDP
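A minimal C sketch of these time-stamp records (Fig. 3), assuming the field widths above. The struct and helper names are illustrative, and keeping a separate coarse time for the post-synaptic record is an assumption based on the "similar format" description.

#include <stdint.h>

typedef struct {
    uint32_t coarse;   /* time (2 ms units) of the most recent pre-synaptic spike */
    uint32_t fine;     /* 24-bit bitmap: spike history over the last 48 ms        */
} pre_stamp_t;

typedef struct {
    uint32_t coarse;   /* time (2 ms units) of the most recent post-synaptic spike */
    uint64_t fine;     /* 64-bit bitmap: spike history over the last 128 ms        */
} post_stamp_t;

/* Did the post-synaptic neuron fire at time t, according to its history?
 * Bit k of 'fine' is taken to represent time (coarse - k). */
static int post_fired_at(const post_stamp_t *ps, uint32_t t)
{
    if (t > ps->coarse || ps->coarse - t >= 64)
        return 0;                       /* outside the recorded 128 ms window */
    return (int)((ps->fine >> (ps->coarse - t)) & 1u);
}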

3.3 Method and Model

Input pre-synaptic spikes trigger the learning rule following an algorithm that proceeds in three steps: update the pre-synaptic time stamp, traverse the post-synaptic connections, and update the synaptic weights, as shown in Figure 1(b).

Step 1: Update the pre-synaptic time stamp. The fine time stamp is shifted left until bit 0 equals the time of the current spike. If any '1' is shifted out (goes to bit 25), STDP starts. Bit 25 then represents the pre-synaptic spike time used to compute the update.

Step 2: Traverse post-synaptic connections. This step checks the post-synaptic connections one by one. First, the time of bit 25 is incremented by the synaptic delay to convert the electronic timing to the neural timing T. Second, the neuron's ID is used as an index to retrieve the post-synaptic spike time stamp from the TCM.

Step 3: Update synaptic weights. Next, the processor calculates the LTD window [T − T−, T] and the LTP window [T, T + T+]. If any bit in the post-synaptic time stamp is '1' within the LTD window or the LTP window, the synaptic weight is depressed or potentiated respectively according to the STDP rule.
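The listing below is a minimal C sketch of these three steps, reusing the hypothetical pre_stamp_t, post_stamp_t, synapse_word_t and stdp_f() sketches from the earlier sections. It is an illustration of the control flow, not the chip's fixed-point ARM968 code: times are in the 2 ms units of the fine bitmaps, weights are held as doubles, and the sketch assumes the simulation has run long enough that the time arithmetic cannot underflow.

/* Pre-synaptic-sensitive STDP update, run when a pre-synaptic spike
 * arrives and its synapse block is already in the TCM. */
void stdp_on_pre_spike(pre_stamp_t *pre,
                       const synapse_word_t *block, int n_syn,
                       const post_stamp_t post[], double weights[],
                       uint32_t now, uint32_t t_window)
{
    /* Step 1: shift the fine history until bit 0 corresponds to 'now'.
     * Every '1' pushed out of the 24-bit field is a pre-synaptic spike
     * old enough for its whole STDP window to be known. */
    uint32_t shift = now - pre->coarse;
    for (uint32_t s = 0; s < shift; s++) {
        int expired = (int)((pre->fine >> 23) & 1u);
        pre->fine = (pre->fine << 1) & 0x00FFFFFFu;
        if (!expired)
            continue;
        uint32_t t_pre = pre->coarse + s - 23;     /* time of the expired spike */

        /* Step 2: walk every post-synaptic connection in the block and
         * convert electronic time to neural time T by adding the delay. */
        for (int i = 0; i < n_syn; i++) {
            uint32_t T = t_pre + SYN_DELAY(block[i]);
            const post_stamp_t *ps = &post[SYN_INDEX(block[i])];

            /* Step 3: scan the LTD window [T - t_window, T] and the LTP
             * window [T, T + t_window], accumulating F(dt) per spike. */
            for (uint32_t dt = 1; dt <= t_window; dt++) {
                double dt_ms = 2.0 * (double)dt;   /* slots -> ms */
                if (post_fired_at(ps, T - dt))     /* post before pre: LTD */
                    weights[i] += stdp_f(dt_ms);
                if (post_fired_at(ps, T + dt))     /* post after pre: LTP  */
                    weights[i] += stdp_f(-dt_ms);
            }
        }
    }
    pre->coarse = now;
    pre->fine |= 1u;       /* record the current pre-synaptic spike */
}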

Each of the three steps may run through several iterations. If there are n '1's shifted to bit 25 in Step 1, m connections in the synapse block in Step 2, and l bits within the time window in Step 3, the computational complexity is dominated by Step 3 at O(nml). For the sake of performance, Step 3 updates should therefore be as efficient as possible.

3.4 Length of Time Stamps

The length of the time stamp affects both the performance and the precision of the STDP rule. Longer history records permit better precision at the cost of significantly increased computation time. Determining the optimal history length therefore depends on the model's required precision and performance. The test model assumes peak firing rates of ∼10 Hz. TCM memory limitations lead to the choice of a 64-bit post-synaptic time stamp, able to record a maximum of 128 ms. A 24-bit pre-synaptic time stamp with 2 ms resolution and a maximum delay of 16 ms guarantees a 24 × 2 − 16 = 32 ms LTP window for any delay. This in turn permits a 1000/(128 − 32) ≈ 10.4 Hz firing rate to guarantee the same 32 ms time window for LTD. These lengths are reconfigurable (dynamically if necessary) to any other value to meet different requirements.
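As a quick sanity check of this arithmetic, a sketch with the reference configuration values above (the macro names are illustrative):

/* Window arithmetic for the reference configuration. */
#define RES_MS        2     /* fine-time resolution, ms         */
#define PRE_BITS      24    /* pre-synaptic history slots       */
#define MAX_DELAY_MS  16    /* maximum synaptic delay, ms       */
#define POST_MS       128   /* post-synaptic history length, ms */

/* Guaranteed LTP window: 24 * 2 - 16 = 32 ms. */
#define LTP_WINDOW_MS (PRE_BITS * RES_MS - MAX_DELAY_MS)

/* Firing rate that still guarantees a 32 ms LTD window:
 * 1000 / (128 - 32) = 10.4 Hz. */
#define MAX_RATE_HZ   (1000.0 / (POST_MS - LTP_WINDOW_MS))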

4 Results

Fig. 4. STDP results. (a) Spike raster plot. (b) Weight curves of connections from pre-synaptic neuron 6; the synaptic weight going rapidly to 0 is a self-connection. At the beginning of the simulation the input neurons fire synchronously, exciting the network, which exhibits high-amplitude synchronised rhythmic activity at around 5/6 Hz. As synaptic connections evolve according to STDP, uncorrelated synapses are depressed while correlated synapses are potentiated. Since the network is small and the firing rate is low, most synapses are depressed (as in panel b), leading to a lower firing rate. The synaptic weight going rapidly to zero is the self-connection of neuron 6: since each spike arrives shortly after the previous spike, the synapse is quickly depressed.

We implemented a neural network on a cycle-accurate four-chip SpiNNaker simulator based on ARM's SoC Designer [11] to test our model. The network is largely based on the code published in [10], which was also used to test the consistency of our results. It has 48 Regular Spiking excitatory neurons (a = 0.02, b = 0.2, c = −65, d = 8) and 12 Fast Spiking inhibitory neurons (a = 0.1, b = 0.2, c = −65, d = 2). Each neuron connects randomly to 40 neurons (self-synapses are possible) with a random 1-16 ms delay; inhibitory neurons only connect to excitatory neurons. Initial weights are 8 and −4 for excitatory and inhibitory connections respectively. We used τ+ = τ− = 32 ms and A+ = A− = 0.1 for STDP. Inhibitory connections are not plastic [12]. There are 6 excitatory and 1 inhibitory input neurons, receiving a constant input current I = 20 to maintain a high firing rate. We ran the simulation for 10 s of biological time.

Fig. 5. Weight modification caused by the correlation of the pre- and post-synaptic spike times. Modification is triggered by pre-synaptic spikes. The weight curve between two pre-synaptic spikes is first depressed because of the LTD window and then potentiated because of the LTP window.

Figure 4 gives the results: the left part shows the raster plot and the right part the evolution of the synaptic weights of connections from pre-synaptic neuron 6 (an input neuron). The detailed weight modification of the self-connection, in correlation with pre- and post-synaptic timing, is shown in Figure 5.
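For reference, the following is a floating-point C sketch of the standard Izhikevich neuron update [6] with the regular-spiking excitatory parameters listed above. It is an illustration of the neuron model driving the network, not the 16-bit fixed-point implementation actually run on the fascicle processors [7].

/* Izhikevich neuron state; for the regular-spiking excitatory neurons
 * of the test network, a = 0.02, b = 0.2, c = -65, d = 8. */
typedef struct { double v, u, a, b, c, d; } izh_neuron_t;

/* One 1 ms update with input current I; returns 1 if the neuron spikes.
 * The two 0.5 ms half-steps for v follow the standard formulation [6]. */
int izh_step(izh_neuron_t *n, double I)
{
    n->v += 0.5 * (0.04 * n->v * n->v + 5.0 * n->v + 140.0 - n->u + I);
    n->v += 0.5 * (0.04 * n->v * n->v + 5.0 * n->v + 140.0 - n->u + I);
    n->u += n->a * (n->b * n->v - n->u);
    if (n->v >= 30.0) {     /* spike: reset v and bump u */
        n->v = n->c;
        n->u += n->d;
        return 1;
    }
    return 0;
}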

5 Discussion and Conclusion

Implementing STDP on SpiNNaker indicates that general-purpose neural hardware with on-chip, real-time learning support is feasible. The pre-synaptic sensitive scheme and the deferred-event model provide the core of the solution, but nonetheless, as we have seen, require careful optimisation and an efficient implementation if they are to be effective. Implementing learning on any hardware neural system is a tradeoff between performance and functionality. With SpiNNaker, the user can choose that tradeoff according to the needs of his model. There is considerable work remaining to develop both additional rules and additional extensions to the rule above. Besides maximising performance and accuracy with parameter adjustments, we are also investigating methods to implement chemical-dependent LTP and LTD (as well as methods for long-distance chemical transmission). The long-term goal is to have a "library" of learning rules that the user can instantiate on-chip or use as templates to modify in order to fit his model.

Acknowledgements. We would like to thank the Engineering and Physical Sciences Research Council (EPSRC), Silistix, and ARM for support of this research. S.B. Furber is the recipient of a Royal Society Wolfson Merit Award.

References

1. Maguire, L., McGinnity, T.M., Glackin, B., Ghani, A., Belatreche, A., Harkin, J.: Challenges for Large-Scale Implementations of Spiking Neural Networks on FPGAs. Neurocomputing 71(1-3) (December 2007) 13-29
2. Rast, A., Yang, S., Khan, M.M., Furber, S.: Virtual Synaptic Interconnect Using an Asynchronous Network-on-Chip. In: Proc. 2008 Int'l Joint Conf. Neural Networks (IJCNN2008). (2008) 2727-2734
3. Gerstner, W., Kempter, R., van Hemmen, J.L., Wagner, H.: A Neuronal Learning Rule for Sub-millisecond Temporal Coding. Nature 383(6595) (Sep. 1996) 76-78
4. Song, S., Miller, K.D., Abbott, L.F.: Competitive Hebbian Learning through Spike-Timing-Dependent Synaptic Plasticity. Nature Neuroscience 3 (2000) 919-926
5. Plana, L.A., Furber, S.B., Temple, S., Khan, M.M., Shi, Y., Wu, J., Yang, S.: A GALS Infrastructure for a Massively Parallel Multiprocessor. IEEE Design & Test of Computers 24(5) (Sep.-Oct. 2007) 454-463
6. Izhikevich, E.: Simple Model of Spiking Neurons. IEEE Trans. Neural Networks 14 (November 2003) 1569-1572
7. Jin, X., Furber, S., Woods, J.: Efficient Modelling of Spiking Neural Networks on a Scalable Chip Multiprocessor. In: Proc. 2008 Int'l Joint Conf. Neural Networks (IJCNN2008). (2008) 2812-2819
8. Rast, A., Jin, X., Khan, M.M., Furber, S.: The Deferred Event Model for Hardware-Oriented Spiking Neural Networks. In: Proc. 2008 Int'l Conf. Neural Information Processing (ICONIP08). (2009) 000-000
9. Masquelier, T., Guyonneau, R., Thorpe, S.J.: Competitive STDP-Based Spike Pattern Learning. Neural Computation 21(5) (2009) 1259-1276
10. Izhikevich, E.: Polychronization: Computation with Spikes. Neural Computation 18(2) (February 2006) 245-282
11. Khan, M., Painkras, E., Jin, X., Plana, L., Woods, J., Furber, S.: System Level Modelling for SpiNNaker CMP System. In: Proc. 1st International Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools (RAPIDO'09). (2009)
12. Bi, G., Poo, M.: Synaptic Modifications in Cultured Hippocampal Neurons: Dependence on Spike Timing, Synaptic Strength, and Postsynaptic Cell Type. J. Neuroscience 18(24) (1998) 10464-10472
