A Novel Approach for the Implementation of Large Scale Spiking Neural Networks on FPGA Hardware

B. Glackin, T.M. McGinnity, L.P. Maguire, Q.X. Wu, and A. Belatreche

Intelligent Systems Engineering Laboratory, Magee Campus, University of Ulster, Derry, Northern Ireland, BT48 7JL, UK
Phone: +44-28-7137-5591, Fax: +44-28-7137-5570
[email protected]

Abstract. This paper presents a strategy for the implementation of large scale spiking neural network topologies on FPGA devices based on the I&F conductance model. Analysis of the logic requirements demonstrates that large scale implementations are not viable if a fully parallel implementation strategy is utilised. The paper therefore presents an alternative approach in which a speed/area trade-off is made and the neuron model implemented on the FPGA is time multiplexed to generate large network topologies. FPGA implementation results demonstrate a performance increase over a PC based simulation.

Keywords: SNN, FPGA, Hardware, I&F.

1 Introduction

Recent trends in computational intelligence have indicated a strong tendency towards forming a better understanding of biological systems and the details of neuronal signal processing [1]. Such research is motivated by the desire to form a more comprehensive understanding of information processing in biological networks and to investigate how this understanding could be used to improve traditional information processing techniques [2]. Spiking neurons differ from conventional artificial neural network models in that information is transmitted by means of spikes rather than by firing rates [3-6]. It is believed that this gives spiking neurons richer dynamics, as they can exploit the temporal domain to encode or decode data in the form of spike trains [7]. Software simulation of network topologies and connection strategies provides a platform for investigating how arrays of these spiking neurons can be used to solve computational tasks. Such simulations face the problem of scalability in that biological systems are inherently parallel in their architecture, whereas commercial PCs are based on the sequential von Neumann processing architecture. Thus, it is difficult to assess the efficiency of these models in solving complex problems [8]. When implemented in hardware, neural networks can take full advantage of their inherent parallelism and run orders of magnitude faster than software simulations, thus becoming adequate for real-time applications. Developing custom ASIC devices

for neural networks, however, is both time consuming and expensive. These devices are also inflexible in that a modification of the basic neuron model would require a new development cycle to be undertaken. Field Programmable Gate Arrays (FPGAs) are devices that permit the implementation of digital systems, providing an array of logic components that can be configured in a desired way by a configuration bitstream [3]. The device is reconfigurable such that a change to the system is easily achieved and an updated configuration bitstream can be downloaded to the device. Previous work has indicated that these devices provide a suitable platform for the implementation of conventional artificial neural networks [9, 10]. It is the aim of the authors' current research to devise a strategy for the implementation of large scale SNNs on FPGA hardware. A primary objective of the research is to investigate how biological systems process information, and as a result the system employs biologically plausible neuron models. Section two of this paper provides details of the SNN model used for the hardware implementation, while the approach used to enable large scale networks to be implemented is described in section three. Results from using this approach to implement a 1D co-ordinate transform application are detailed in section four, while conclusions and future work are presented in section five.

2 SNN Model Implementation

Considerable research has been undertaken in developing an understanding of the behaviour of a biological neuron; however, fewer research results are available on how large arrays of these interconnected neurons combine to form powerful processing arrays. The Hodgkin-Huxley spiking neuron model [11] is representative of the characteristics of a real biological neuron. The model consists of four coupled nonlinear differential equations, which lead to time consuming software simulations and would incur high "real-estate" costs when a hardware implementation is targeted [12]. Thus hardware implementations demand the application of a simpler neuron model. One such model is the Integrate-and-Fire (I&F) model, which has been widely investigated by other researchers [12-14]. The model consists of a first order differential equation in which the rate of change of the neuron membrane voltage, v, is related to the membrane currents. The I&F model used is

$$c_m \frac{dv(t)}{dt} = g_l\,(E_l - v(t)) + \sum_j \frac{w_j\, g_s^j(t)}{A}\,(E_s^j - v(t)) \qquad (1)$$

For a given synapse j, an action potential (AP) event at the presynaptic neuron at time $t_{ap}^j$ triggers a synaptic release event at time $t_{ap}^j + t_{delay}^j$, a discontinuous increase in the synaptic conductance

$$g_s^j(t_{ap}^j + t_{delay}^j + dt) = g_s^j(t_{ap}^j + t_{delay}^j) + q_s^j \qquad (2)$$


otherwise $g_s^j(t)$ is governed by

$$\frac{dg_s^j(t)}{dt} = -\frac{1}{\tau_s^j}\, g_s^j(t) \qquad (3)$$

The forward Euler integration scheme with a time step of dt = 0.125 ms was used to solve the I&F model equations. Using this method, differential equation (1) becomes

$$v(t+dt) = v(t) + \frac{1}{c_m}\left( g_l\,(E_l - v(t)) + \sum_j \frac{w_j\, g_s^j(t)}{A}\,(E_s^j - v(t)) \right) dt \qquad (4)$$

and differential equation (3) becomes

$$g_s^j(t+dt) = g_s^j(t) - \frac{1}{\tau_s^j}\, g_s^j(t)\, dt \qquad (5)$$

A description of the parameters from the above equations and the values used can be found in Table 1.

Table 1. I&F Model Parameters

| Param       | Common     | Pyramidal     | Inhibitory     | Description                  |
|-------------|------------|---------------|----------------|------------------------------|
| c_m         | 8 nF/mm^2  |               |                | membrane capacitance         |
| E_l         | -70 mV     |               |                | membrane reversal potential  |
| v_th        | -59 mV     |               |                | threshold voltage            |
| v_reset     | -70 mV     |               |                | reset voltage                |
| τ_ref       | 6.5 ms     |               |                | refractory period            |
| t_delay^j   | 0.5 ms     |               |                | propagation delay            |
| g_l         |            | 1 µS          | 1 µS           | membrane leak conductance    |
| A           |            | 0.03125 mm^2  | 0.015625 mm^2  | membrane surface area        |
| E_s^j       |            | 0 mV          | -75 mV         | reverse potential of synapse |
| τ_s^j       |            | 1 ms          | 4 ms           | synapse decay time           |
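To make the discretised model concrete, the following is a minimal floating-point C sketch of one neuron with a single excitatory synapse stepped with equations (4) and (5). It is a software illustration rather than the FPGA hardware description; the presynaptic spike times and the synaptic weight are assumed values, and the refractory period from Table 1 is omitted for brevity.

```c
#include <stdio.h>

/* Illustrative single-neuron, single-synapse Euler step for eqs. (4)-(5).
   Parameter values taken from Table 1 (pyramidal cell, excitatory synapse). */
#define DT      0.125e-3   /* integration time step, s        */
#define CM      8e-9       /* membrane capacitance, F/mm^2    */
#define GL      1e-6       /* leak conductance, S             */
#define EL      (-70e-3)   /* leak/reversal potential, V      */
#define VTH     (-59e-3)   /* threshold voltage, V            */
#define VRESET  (-70e-3)   /* reset voltage, V                */
#define AREA    0.03125    /* membrane surface area, mm^2     */
#define ES      0.0        /* excitatory synapse reversal, V  */
#define TAU_S   1e-3       /* synapse decay time constant, s  */

int main(void)
{
    double v  = EL;         /* membrane voltage                        */
    double gs = 0.0;        /* synaptic conductance                    */
    double w  = 0.0117;     /* synaptic weight (illustrative value)    */

    for (int step = 0; step < 8000; step++) {        /* 1 s of simulated time      */
        if (step % 8 == 0)                           /* assumed presynaptic spike
                                                        arriving every 1 ms        */
            gs += 1e-6;                              /* eq. (2): add quantal q_s   */

        /* eq. (4): membrane update */
        v += ((GL * (EL - v) + (w * gs / AREA) * (ES - v)) / CM) * DT;
        /* eq. (5): exponential synaptic decay */
        gs += (-gs / TAU_S) * DT;

        if (v >= VTH) {                              /* threshold crossing -> spike */
            printf("spike at t = %.3f ms\n", step * DT * 1e3);
            v = VRESET;                              /* refractory period omitted   */
        }
    }
    return 0;
}
```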

The hardware implementation employed a fixed point coding scheme. 18 bits were used to represent the membrane voltage (4), 10 of which correspond to the fractional component. The synaptic conductance (5) was implemented using 12 bit precision for the fractional component. Multiplicand and divisor parameters were chosen as powers of 2 so that they could be implemented using binary shifts, thus utilising less logic than would be required by a full multiplier or divider.
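The following hypothetical C fragment illustrates the flavour of this fixed-point, shift-based arithmetic. The 10-bit and 12-bit fractional precisions are those stated above, but the particular shift amounts, the split of the 18-bit word, and the example weight value are assumptions made purely for illustration (an arithmetic right shift is also assumed).

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical fixed-point model of the update arithmetic (not the actual
   FPGA logic).  Voltages use an 18-bit word with 10 fractional bits,
   conductances use 12 fractional bits, as described above. */
#define V_FRAC_BITS  10
#define G_FRAC_BITS  12

typedef int32_t fix_t;   /* wide enough to hold 18-bit signed values */

static fix_t  to_v_fix(double mv)  { return (fix_t)(mv * (1 << V_FRAC_BITS)); }
static double from_v_fix(fix_t v)  { return (double)v / (1 << V_FRAC_BITS); }

int main(void)
{
    fix_t v  = to_v_fix(-70.0);     /* membrane voltage, starts at E_l          */
    fix_t el = to_v_fix(-70.0);
    fix_t es = to_v_fix(0.0);
    fix_t gs = 48;                  /* weighted conductance, 12 fractional bits
                                       (48 / 2^12 = 0.0117, cf. section 4)      */

    for (int step = 0; step < 16; step++) {
        /* Leak term: (E_l - v) scaled by an assumed power-of-2 factor 2^-6,
           implemented as a right shift instead of a multiply.                  */
        fix_t leak = (el - v) >> 6;

        /* Synaptic term: (E_s - v) * gs, then drop the 12 conductance
           fraction bits plus the same assumed 2^-6 scaling in one shift.       */
        fix_t syn = (fix_t)(((int64_t)(es - v) * gs) >> (G_FRAC_BITS + 6));

        v += leak + syn;

        /* Conductance decay gs -= gs / 2^3: divisor chosen as a power of two.  */
        gs -= gs >> 3;

        printf("step %2d  v = %8.3f mV  gs = %d\n", step, from_v_fix(v), gs);
    }
    return 0;
}
```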

A biologically plausible Spike Timing Dependent Plasticity (STDP) algorithm, based on the Song and Abbott approach, was implemented in hardware to train the network [15, 16]. Each synapse in an SNN is characterized by a peak conductance q (the peak value of the synaptic conductance following a single pre-synaptic action potential) that is constrained to lie between 0 and a maximum value q_max. Every pair of pre- and post-synaptic spikes can potentially modify the value of q, and the changes due to each spike pair are continually summed to determine how q changes over time. The simplifying assumption is that the modifications produced by individual spike pairs combine linearly. A pre-synaptic spike occurring at time t_pre and a post-synaptic spike at time t_post modify the corresponding synaptic conductance by q → q + q_max F(∆t), where ∆t = t_pre − t_post and F(∆t) is defined by

$$F(\Delta t) = \begin{cases} A_+ \exp(\Delta t / \tau_+), & \text{if } \Delta t < 0 \\ -A_- \exp(-\Delta t / \tau_-), & \text{if } \Delta t \ge 0 \end{cases} \qquad (6)$$

The time constants τ+ and τ− determine the ranges of pre- to post-synaptic spike intervals over which synaptic strengthening and weakening are significant, and A+ and A− determine the maximum amount of synaptic modification in each case. The modification function for the STDP algorithm is shown in Fig. 1.

Fig. 1. STDP Modification Function
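A minimal software sketch of this update rule is given below for reference; the numerical values of A+, A−, τ+ and τ− are illustrative assumptions, as the paper does not list them, and the clamping of q to [0, q_max] follows the constraint described above.

```c
#include <math.h>
#include <stdio.h>

/* Song-and-Abbott style STDP update, eq. (6).  Constants are illustrative,
   not taken from the paper. */
#define A_PLUS    0.005
#define A_MINUS   0.00525
#define TAU_PLUS  0.020      /* s */
#define TAU_MINUS 0.020      /* s */
#define Q_MAX     1.0

/* F(dt) with dt = t_pre - t_post: negative dt (pre before post) strengthens. */
static double stdp_f(double dt)
{
    return (dt < 0.0) ?  A_PLUS  * exp(dt / TAU_PLUS)
                      : -A_MINUS * exp(-dt / TAU_MINUS);
}

/* Apply one pre/post spike pair to a peak conductance q, clamped to [0, Q_MAX]. */
static double stdp_update(double q, double t_pre, double t_post)
{
    q += Q_MAX * stdp_f(t_pre - t_post);
    if (q < 0.0)   q = 0.0;
    if (q > Q_MAX) q = Q_MAX;
    return q;
}

int main(void)
{
    double q = 0.5;
    q = stdp_update(q, 0.010, 0.015);   /* pre 5 ms before post -> potentiation */
    printf("after causal pair:      q = %f\n", q);
    q = stdp_update(q, 0.030, 0.025);   /* pre 5 ms after post  -> depression   */
    printf("after anti-causal pair: q = %f\n", q);
    return 0;
}
```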

3 Large Scale Implementation

Despite the computational simplicity of the I&F model, there is still a limit to the size of network that can be implemented on an FPGA. Consider the logic requirements to incorporate each SNN component on an FPGA, as shown in Table 2. Using a fully parallel implementation, the number of neurons that can be implemented on an FPGA is generally limited to the number of embedded multipliers provided. Further multipliers can be generated from the logic; however, this is area intensive and rapidly consumes the remainder of the resources, leaving little logic to implement STDP synapses. The largest device in the Xilinx Virtex II series of FPGAs is the XC2V8000, which consists of 46,592 slices and 168 embedded multipliers.


Table 2. SNN Logic Requirements

| SNN Component | Slices | Embedded Multipliers |
|---------------|--------|----------------------|
| Synapse       | 33     | 0                    |
| STDP Synapse  | 106    | 0                    |
| Neuron        | 63     | 1                    |

Therefore, using an STDP synapse to neuron connection ratio of 1:1, a total of 168 neurons and 168 STDP synapses could be implemented on an XC2V8000 device (using approximately 60% of the available resources for functional circuitry). Increasing the connection ratio of STDP synapses to neurons to a more realistic ratio of 10:1 enables a network consisting of 41 neurons and 410 synapses to be implemented. Finally, using a more biologically plausible ratio of 100:1 indicates that the largest network that could be implemented would be 4 neurons and 400 STDP synapses. Utilising this fully parallel implementation with a clock speed of 100 MHz and a Euler integration time step of 0.125 ms per clock period, a 1 second real time period of operation could be completed in 0.08 ms (8,000 time steps at 10 ns each). This provides a speed up factor of 12,500 compared to real time processing. This compares very favourably with a software implementation, where the data is processed serially and it can take hours to run a several second real time simulation. The speed up factor indicates that there are possibilities for examining speed/area trade-offs and determining alternative solutions that will lead to larger network capabilities.

By partitioning the design between hardware and software, a compromise can be made between the size of network that can be implemented and the computing speed. One approach is to use a controller that time-multiplexes a single neuron or synapse block, which reduces the amount of logic required but increases the computation time. A flow diagram for this process is shown in Fig. 2. The program running on the processor loops through the number of neurons/synapses to be computed. It reads the previous time step's values for the neuron/synapse and sends them to the hardware component as inputs. The result of the hardware computation is read back and then written to RAM so that it can be used for the next time step's computation. This approach has been adopted and a system has been developed which utilises the MicroBlaze soft processor core. MicroBlaze is a 32-bit soft processor core featuring a RISC architecture with Harvard-style separate 32-bit instruction and data busses. Synapse and neuron blocks are implemented in the logic along with the processor cores and can be accessed to perform custom SNN instructions for the processor. An overview of the system developed is shown in Fig. 3.

Fig. 2. Flow Diagram
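The loop of Fig. 2 can be summarised by the following C sketch; the array, the placeholder update routine and the network size are illustrative stand-ins for the external SRAM, the hardware synapse/neuron blocks and the processor-to-hardware interface described in the following paragraphs.

```c
#include <stdint.h>
#include <stdio.h>

#define NUM_SYNAPSES 400     /* illustrative network size */
#define NUM_NEURONS  4

/* Stand-in for the external SRAM holding network state; in the real system
   this would be accessed through a memory controller. */
static uint32_t ram[NUM_SYNAPSES + NUM_NEURONS];

/* Stand-in for the shared synapse/neuron block in the FPGA logic: the
   hardware would apply eq. (4) or eq. (5) and return the updated value. */
static uint32_t hw_update(uint32_t previous)
{
    return previous;   /* placeholder only */
}

/* One Euler time step: every synapse and neuron is updated in turn by a
   single shared hardware block, trading computation time for logic area. */
static void snn_time_step(void)
{
    for (uint32_t i = 0; i < NUM_SYNAPSES + NUM_NEURONS; i++) {
        uint32_t state = ram[i];             /* read previous-step value        */
        uint32_t next  = hw_update(state);   /* send to hardware, read result   */
        ram[i] = next;                       /* write back for next time step   */
    }
}

int main(void)
{
    for (int step = 0; step < 8000; step++)  /* 8000 x 0.125 ms = 1 s simulated */
        snn_time_step();
    printf("done\n");
    return 0;
}
```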


Fig. 3. MicroBlaze Overview

Fig. 4. Multiple Processor System

The main component in the system is the MicroBlaze soft processor core. The program running on this processor is contained in the Local Memory Bus (LMB) Block RAM (BRAM) and is accessed through the LMB BRAM controller. The neuron/synapse/STDP blocks are implemented in the FPGA logic and accessed by the processor using the Fast Simplex Link (FSL) interface, which consists of uni-directional


point-to-point communication channels. The data for the neural network is held in SRAM external to the FPGA device. This memory is accessed over the On-chip Peripheral Bus (OPB) via the OPB External Memory Controller (EMC). As the memory is synchronous, a clock-deskew configuration using two Digital Clock Manager (DCM) blocks with a feedback loop ensures that the clock used internally by the FPGA and the clock driving the SRAM chips are consistent. Additional memory is made available to the processor in the form of BRAM, which is also accessed over the OPB bus. The internal status of the network, such as the synapse weights or output spike trains, can be displayed on a host PC via the serial port provided by the OPB Universal Asynchronous Receiver Transmitter (UART). Finally, the OPB MDM provides an interface to the JTAG debugger, allowing the system to be debugged if problems arise. Multiple instances of the soft processor core can be instantiated on a single FPGA, so a degree of parallelism can be maintained in the network, while a number of FPGAs can also be designated to operate in parallel. Thus the system can be extended by increasing the number of processors, as shown in Fig. 4.

4 Co-ordinate Transform

In order to test the system described in the previous section, a spiking network designed to perform one dimensional co-ordinate transformation was implemented. Co-ordinate transformation is used in the SenseMaker system [17] to convert arm angles of a haptic sensor to (x, y) co-ordinates. These co-ordinates are then used to generate a haptic image representation of the search space. A biologically plausible strategy was developed based on the principle of the basis function network proposed by Deneve et al. [18] and was further extended by incorporating the STDP algorithm described in section 2 to train the network to perform the function y = x − c. An input layer with 100 neurons is fully connected to the output layer with synapses whose weights are determined by STDP. The training layer with 100 neurons is connected to the output layer with fixed weights within a receptive field. The output layer receives spikes from both the input and the training layers during the training stage. Only the input layer is used during the testing stage. Using this approach a network was trained to perform the function y = x − 72°. A Gaussian distributed spike train was presented at each input neuron while a similarly distributed spike train was applied at the appropriate training neuron. For example, if a spike train centred at neuron 50, i.e. 0°, was applied to the input layer, a spike train centred at neuron 30, i.e. −72°, would be applied at the training layer. Using this approach for the complete network produced the weight distribution matrix shown in Fig. 6. This figure shows, in a graphical manner, the values to which the synaptic weights connecting the input and output layers converged after training was completed. It can be seen that the weight distribution reflects the function itself, i.e. a shift y = x − 72°. The figure shows the fixed point interpretation of the synapse weights. As a 12 bit coding scheme was used, the actual weight can be determined by dividing by 2^12, e.g. 48 ÷ 2^12 = 0.01171875.
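As an illustration of this training protocol, a Gaussian firing-rate profile centred on the stimulus can be used to drive the input layer, with a second profile shifted by 72° driving the training layer. The peak rate, Gaussian width and function names in the sketch below are assumptions, not values taken from the paper.

```c
#include <math.h>
#include <stdio.h>

#define N_NEURONS      100
#define DEG_PER_NEURON 3.6      /* 100 neurons spanning 360 degrees     */
#define SHIFT_DEG      72.0     /* target transform: y = x - 72 degrees */
#define PEAK_RATE_HZ   100.0    /* assumed peak firing rate             */
#define SIGMA_NEURONS  5.0      /* assumed width of the Gaussian profile */

/* Gaussian firing-rate profile centred on a given neuron index. */
static double rate_for_neuron(int neuron, double centre)
{
    double d = neuron - centre;
    return PEAK_RATE_HZ * exp(-(d * d) / (2.0 * SIGMA_NEURONS * SIGMA_NEURONS));
}

int main(void)
{
    double input_centre = 50.0;                                        /* 0 degrees */
    double train_centre = input_centre - SHIFT_DEG / DEG_PER_NEURON;   /* neuron 30 */

    for (int n = 0; n < N_NEURONS; n++) {
        double in_rate = rate_for_neuron(n, input_centre);   /* drives input layer    */
        double tr_rate = rate_for_neuron(n, train_centre);   /* drives training layer */
        if (in_rate > 1.0 || tr_rate > 1.0)
            printf("neuron %3d: input %6.1f Hz, training %6.1f Hz\n",
                   n, in_rate, tr_rate);
    }
    return 0;
}
```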


Fig. 5. SNN Trained by STDP for y = x − c

Fig. 6. Weight Distribution Matrix


After training, the network was tested by applying input spike trains and observing the output spike trains produced by the network implemented on the FPGA. These tests indicated that the network possessed good generalisation capabilities: with an input spike train centred on neuron x, an output spike train could be observed centred on the neuron corresponding to x − 72°. Fig. 7 and Fig. 8 display the input and output spike trains for one such test, where an input centred at neuron 50 (0°) was applied and an output centred at neuron 30 (−72°) can be observed. These results were obtained from a single processor system as depicted in Fig. 3. Experiments have also been undertaken to evaluate the performance of the multiprocessor system when applied to the co-ordinate transformation application. For this experiment the network structure was divided over the 4 processors, as depicted in Fig. 4, such that each processor was responsible for 25 of the 100 output neurons. Using this approach the weight matrices for the individual processors were obtained and are shown in Fig. 9. These individual matrices can then be combined to form the single matrix for the overall network depicted in Fig. 10.

Fig. 7. Input Layer Spike Activity

Fig. 8. Output Layer Spike Activity

Fig. 9. Multi-Processor Weight Matrices

Fig. 10. Combined Weight Matrix

To test the maximum capability of this system, the largest network possible was also trained on the FPGA system. The network structure consisted of 1400 input


neurons, 1400 training neurons and 1400 output neurons, with each processor responsible for 350 of the output neurons. In total there are 1,964,200 synapses and 4200 neurons in this network. The results obtained were the same as for the smaller versions of the network. The FPGA implementation for the experiments detailed in this section took place on the BenNuey system developed by Nallatech, which provides a multi-FPGA platform. Two of the FPGAs in this platform, an XC2V8000 and an XC2V4000 device, were utilised for the 4 MicroBlaze system. The platform can accommodate up to 7 FPGAs, so this system could be extended further by configuring additional FPGAs with additional processors. The times taken to perform a 1 second real time simulation using the approaches above are shown in Table 3.

Table 3. Simulation Performance

| Network Size    | PC (Matlab) | 1 µBlaze | 4 µBlaze |
|-----------------|-------------|----------|----------|
| 100*100*100     | 2,917.2s    | 115.7s   | 27.2s    |
| 1400*1400*1400  | 454,418s    | N/A      | 4,237s   |

The multi-processor FPGA approach achieves a performance improvement factor of 107.25 over the Matlab simulation, which was run on a 2 GHz Pentium 4 PC.

5 Conclusions and Further Development

The results presented in this paper demonstrate that the existing system is a viable platform for the implementation of large scale biologically plausible neural networks on FPGA devices. The current system outperforms a Matlab software simulation by over a factor of 100. Further improvements could be gained by increasing the number of processors in the system, thus distributing the processing load and introducing a higher degree of parallelism. Further enhancements have also been identified which may improve the performance of the system. It is a goal of this work to benchmark the MicroBlaze soft processor core against the PowerPC core integrated on Xilinx Virtex II Pro devices, to determine possible gains from moving to this technology. The work to date has identified that the combination of multiple processor instances with dedicated SNN blocks interfaced to the processor can give performance gains over software simulations. A further target of this work, however, is to improve the performance of the system such that a large number of neurons and synapses can be simulated on the FPGA system in real time. Analysis of the implementations to date has identified that large portions of the network are regularly inactive for long periods of time. With the current approach, however, each synapse and neuron value is computed at every time step of the simulation regardless of activity. A more efficient approach would be to compute a neuron or synapse only when necessary, i.e. when input activity for that component is present. This event based simulation is the next stage of this work, which aims to further improve the computational performance of the hardware implementations.
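A minimal sketch of the proposed event-based scheme, assuming a simple per-component activity flag, is shown below. This illustrates the idea only and is not the planned hardware design; in particular, passive membrane decay between events is ignored, and all names and sizes are hypothetical.

```c
#include <stdint.h>

#define NUM_COMPONENTS 4600   /* illustrative: neurons plus synapses */

/* Activity flags: set when a component receives an input spike, cleared
   once the component has been updated. */
static uint8_t  active[NUM_COMPONENTS];
static uint32_t state[NUM_COMPONENTS];

static uint32_t update_component(uint32_t s) { return s; }  /* placeholder update */

/* Mark a component as having pending input activity (e.g. a spike arrival). */
static void post_event(uint32_t component) { active[component] = 1; }

/* Event-driven time step: only components with pending activity are computed,
   instead of sweeping the whole network on every 0.125 ms step. */
static void snn_event_step(void)
{
    for (uint32_t i = 0; i < NUM_COMPONENTS; i++) {
        if (!active[i])
            continue;                      /* skip inactive components */
        state[i] = update_component(state[i]);
        active[i] = 0;
    }
}

int main(void)
{
    post_event(7);                         /* e.g. a spike arriving at synapse 7 */
    post_event(4203);                      /* and at one neuron                  */
    snn_event_step();
    return 0;
}
```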


Future work is also investigating the applicability of this platform to replicating the perceptual capabilities of humans, which may demand networks of the order of 10^5 neurons and 10^8 synapses.

Acknowledgements

The authors acknowledge the financial and technical contribution of the SenseMaker project (IST-2001-34712), which is funded by the EC under the FET Life-like Perception initiative.

References

[1] U. Roth, A. Jahnke and H. Klar, "Hardware Requirements for Spike-Processing Neural Networks", in From Natural to Artificial Neural Computation (IWANN), J. Mira and F. Sandoval (Eds.), Springer-Verlag, Berlin, 1995, pp. 720-727.
[2] W. Maass and C. M. Bishop (Eds.), Pulsed Neural Networks, MIT Press.
[3] A. Upegui, C. A. Peña-Reyes and E. Sanchez, "A methodology for evolving spiking neural-network topologies on line using partial dynamic reconfiguration", ICCI International Conference on Computational Intelligence, Medellin, Colombia, November 2003.
[4] A. Perez-Uribe, Structure-Adaptable Digital Neural Networks, PhD thesis, EPFL, 1999.
[5] E. Ros, R. Agis, R. R. Carrillo and E. M. Ortigosa, "Post-synaptic Time-Dependent Conductances in Spiking Neurons: FPGA Implementation of a Flexible Cell Model", Proceedings of IWANN'03, LNCS 2687, Springer, Berlin, 2003, pp. 145-152.
[6] S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd ed., Prentice-Hall, New Jersey, 1999.
[7] D. Roggen, S. Hofmann, Y. Thoma and D. Floreano, "Hardware spiking neural network with run-time reconfigurable connectivity in an autonomous robot", NASA/DoD Conference on Evolvable Hardware, July 2003.
[8] A. Delorme, J. Gautrais, R. VanRullen and S. J. Thorpe, "SpikeNET: A simulator for modeling large networks of integrate and fire neurons", Neurocomputing, 26-27, pp. 989-996, 1999.
[9] J. J. Blake, T. M. McGinnity and L. P. Maguire, "The Implementation of Fuzzy Systems, Neural Networks and Fuzzy Neural Networks Using FPGAs", Information Sciences, Vol. 112, No. 1-4, pp. 151-168, 1998.
[10] B. Glackin, L. P. Maguire and T. M. McGinnity, "Intrinsic and extrinsic implementation of a bio-inspired hardware system", Information Sciences, Vol. 161, No. 1-2, pp. 1-19, April 2004.
[11] A. L. Hodgkin and A. F. Huxley, "A Quantitative Description of Membrane Current and its Application to Conduction and Excitation in Nerve", Journal of Physiology, 117, pp. 500-544, 1952.
[12] W. Gerstner and W. Kistler, Spiking Neuron Models: Single Neurons, Populations, Plasticity, Cambridge University Press, 2002.
[13] C. Koch, Biophysics of Computation: Information Processing in Single Neurons, Oxford University Press, 1999.


[14] P. Dayan and L. F. Abbott, Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems, MIT Press, Cambridge, 2001.
[15] S. Song, K. D. Miller and L. F. Abbott, "Competitive Hebbian learning through spike-timing dependent synaptic plasticity", Nature Neuroscience, 3, pp. 919-926, 2000.
[16] S. Song and L. F. Abbott, "Column and Map Development and Cortical Re-Mapping Through Spike-Timing Dependent Plasticity", Neuron, 32, pp. 339-350, 2001.
[17] The SenseMaker project, http://sensemaker.infm.ulst.ac.uk/
[18] S. Deneve, P. E. Latham and A. Pouget, "Efficient computation and cue integration with noisy population codes", Nature Neuroscience, 4, pp. 826-831, 2001.
