Microprocessors and Microsystems 29 (2005) 211–223 www.elsevier.com/locate/micpro

An FPGA platform for on-line topology exploration of spiking neural networks
Andres Upegui*, Carlos Andrés Peña-Reyes, Eduardo Sanchez
Logic Systems Laboratory, Swiss Federal Institute of Technology, IN-Ecublens, 1015 Lausanne, Switzerland
Received 20 October 2003; revised 9 July 2004; accepted 19 August 2004. Available online 15 September 2004

Abstract

In this paper we present a platform for evolving spiking neural networks on FPGAs. Embedded intelligent applications require both high performance, so as to exhibit real-time behavior, and flexibility, to cope with adaptivity requirements. While hardware solutions offer performance and software solutions offer flexibility, reconfigurable computing lies between these two types of solutions, providing a tradeoff between flexibility and performance. Our platform is described as a combination of three parts: a hardware substrate, a computing engine, and an adaptation mechanism. We also present results on the performance and synthesis of the neural-network implementation on an FPGA. © 2004 Elsevier B.V. All rights reserved.

Keywords: Neural hardware; Spiking neuron; Evolvable hardware; Topology evolution; Dynamic reconfiguration; FPGA

1. Introduction

Living organisms, from microscopic bacteria to giant sequoias, including animals such as butterflies and humans, have successfully survived on earth for millions of years. If one had to propose a single key to explain such success, it would certainly be adaptation. Two types of adaptation can be identified in living organisms: at the species level and at the individual level. Adaptation at the species level, also known as evolution [1,2], refers to the capability of a given species to adapt to an environment by means of natural selection and reproduction. Adaptation at the individual level, also known as learning [3], refers to the behavioural changes of an individual, produced by interacting with an environment. Although several artificial approaches have been extensively explored by researchers, in contrast with nature, adaptation has remained very elusive to human technology. Among other

* Corresponding author. Tel.: +41 21 693 67 14; fax: +41 21 693 37 05. E-mail addresses: [email protected] (A. Upegui), [email protected] (C.A. Peña-Reyes), [email protected] (E. Sanchez).
0141-9331/$ - see front matter © 2004 Elsevier B.V. All rights reserved. doi:10.1016/j.micpro.2004.08.012

properties, adaptivity makes artificial neural networks (ANNs) one of the most common techniques for machine learning. Adaptivity refers to the modification performed on an ANN in order to allow it to execute a given task. Several types of adaptive methods can be identified according to the modification done. The most common methods modify the synaptic weights [4] and/or the topology [5–7]. Synaptic-weight modification is the most widely used approach, as it provides a relatively smooth search space. On the other hand, topology modification alone yields a highly rugged search-space landscape (i.e. small changes in the network may result in very different performances), and, even though such adaptation techniques thoroughly explore the space of computational capabilities of the network, it is very difficult to converge to a solution. A hybrid of both methods can achieve better performance, because the weight-adaptation method contributes to smoothing the search space, making it easier to find a solution. Growing [5], pruning [6], and evolutionary algorithms (EAs) [7] are adaptive methods widely used to modify an ANN topology that, in association with weight


modification, may converge to a solution. We thus propose a hybrid method in which structural adaptation is done by modifying the network topology, allowing the exploration of different computational capabilities. The evaluation of these capabilities is done by weight learning, thereby finding a solution for the problem at hand. However, topology modification has a high computational cost: weight learning is already time-consuming, and this cost is multiplied by the number of topologies being explored. Under these conditions, on-line applications would be unfeasible, unless enough knowledge of the problem is available to restrict the search space to small topology modifications. Part of the problem can be solved with a hardware implementation, which greatly reduces the execution time, as the network is evaluated with the neurons running in parallel. However, a complexity problem remains: while in software additional neurons and connections imply just some extra loops, in hardware there is a finite area (resources) that limits the number of neurons that can be placed in a network. This is because each neuron has a physical existence that occupies a given area, and each connection implies a physical wire that must connect two neurons. Moreover, if an exploration of topologies is done, the physical resources (connections and neurons) for the most complex possible network must be allocated in advance, even if the final solution is less complex. This fact makes connectivity a very important issue, since a connection matrix for a large number of neurons is considerably resource-consuming. Current Field Programmable Gate Arrays (FPGAs) allow tackling this resource-availability problem thanks to their dynamic partial reconfiguration (DPR) feature [8], which allows reusing internal logic resources.
This feature permits the dynamic reconfiguration of some physical logic units while the circuit remains operational, reducing the hardware requirements and optimizing the number of neurons and the connectivity resources. In this paper we propose a reconfigurable hardware platform using DPR, which tackles the ANN topology-search problem. Section 2 presents an introduction to the bio-inspired techniques used in our platform. In Section 3 we present a brief description of FPGAs and, more precisely, of dynamic partial reconfiguration. Section 4 presents a description of the full platform. In Section 5 we describe the hardware substrate necessary to support our platform. In Section 6 we discuss the implementation of a GA on our hardware platform. In Section 7 we present a spiking neuron model exhibiting a reduced connectivity schema and low hardware resource requirements. In Section 8 we present some preliminary results: a simulation of a network solving a frequency-discrimination problem, and its respective FPGA implementation as a validation of the network. Section 9 contains a discussion about the possibilities and limitations of the platform, and gives some directions for further work. Finally, Section 10 concludes.

2. Background: bio-inspired techniques

Nature has always stimulated the imagination of humans, but it is only very recently that technology has begun to allow the physical implementation of bio-inspired systems. These are man-made systems whose architectures and emergent behaviours resemble the structure and behaviour of biological organisms [9]. Artificial neural networks (ANNs), evolutionary algorithms (EAs), and fuzzy logic (FL) are the main representatives of a new, different approach to artificial intelligence. Names like 'computational intelligence', 'soft computing', 'bio-inspired systems', or 'natural computing', among others, are used to denominate the domain involving these and other related techniques. Whatever the name, these techniques exhibit the following features: (1) their role models, to different extents, are natural processes such as evolution, learning, or reasoning; (2) they are intended to be tolerant of imprecision, uncertainty, partial truth, and approximation; (3) they deal mainly with numerical information processing, using little or no explicit knowledge representation. We present below a brief description of ANNs and EAs, and the hybrid between them: evolutionary ANNs (EANNs).

2.1. Artificial neural networks

As stated by Haykin: 'A neural network is a massively parallel distributed processor made up of simple processing units, which has a natural propensity for storing experiential knowledge and making it available for use. It resembles the human brain in two respects: (1) Knowledge is acquired through a learning process. (2) Synaptic weights are used to store the knowledge.' [4].
Among other features, ANNs provide nonlinearity (an ANN made up of nonlinear neurons has a natural ability to realize nonlinear input–output functions), are universal approximators (ANNs can approximate input–output functions to any desired degree of accuracy, given adequate computational complexity), are adaptable (with adjustable synaptic weights and network topology, an ANN can adapt to its operating environment and track statistical variations), are fault tolerant (an ANN has the potential to be fault-tolerant, or capable of robust performance, in the sense that its performance degrades gradually under adverse operating conditions), and are intended to be neurobiologically plausible (neurobiologists look to neural networks as a research tool for the interpretation of neurobiological phenomena; by the same token, engineers look to the human brain for new ideas to solve difficult problems) [4]. In other terms, an artificial neural network is a system that learns to map a function from an input vector to an output vector. It consists of a set of simple units called artificial neurons. Each neuron has an internal state which depends on its own input vector. From this state the neuron maps an output that is sent to other units through parallel connections. Each connection has a synaptic weight


that multiplies the signal travelling through it. Thus, the final output of the network is a function of the inputs and the synaptic weights of the ANN. In general, learning deals with adjusting these synaptic weights, but some algorithms also modify the network architecture, i.e. the network connectivity or the neuron model. Three main types of learning algorithms are identified: supervised, unsupervised, and reinforcement learning. In supervised learning, the desired output of the network is known in advance, so modifications are done in order to reduce the resulting error. It is often used for data classification and non-linear control. In unsupervised learning, modifications depend on correlations among the input data, so the network is intended to identify these correlations without knowing them in advance. It is used for clustering, pattern recognition, and reconstruction of corrupted data, among others. Finally, in reinforcement learning, modifications are done based on a critic's score, which indicates how well the ANN performs, but there is no explicit knowledge of the desired solution. It is often used in systems that interact with an environment, such as robot navigation and games (e.g. backgammon, chess).

2.2. Evolutionary algorithms

Evolutionary computation makes use of a metaphor of natural evolution, according to which a problem plays the role of an environment wherein lives a population of individuals, each representing a possible solution to the problem. The degree of adaptation of each individual to its environment is expressed by an adequacy measure known as the fitness function. The phenotype of each individual, i.e. the candidate solution itself, is generally encoded in some manner into its genome (genotype). Evolutionary algorithms potentially produce progressively better solutions to the problem.
This is possible thanks to the constant introduction of new 'genetic' material into the population, by applying so-called genetic operators, which are the computational equivalents of natural evolutionary mechanisms. The archetypal evolutionary algorithm proceeds as follows: an initial population of individuals, P(0), is generated at random or heuristically. At every evolutionary step t, known as a generation, the individuals in the current population, P(t), are decoded and evaluated according to some predefined quality criterion, referred to as the fitness. Then, a subset of individuals, P'(t), known as the mating pool, is selected to reproduce according to their fitness. Thus, high-fitness ('good') individuals stand a better chance of 'reproducing', while low-fitness ones are more likely to disappear. As they combine elements of directed and stochastic search, evolutionary techniques exhibit a number of advantages over other search methods. First, they usually need a smaller amount of knowledge and fewer assumptions about the characteristics of the search space. Second, they are less prone to getting stuck in local optima. Finally, they


strike a good balance between exploitation of the best solutions and exploration of the search space. Among their applications we find topics as diverse as molecular biology, analogue and digital circuit synthesis, and robot control.

2.3. Evolutionary artificial neural networks

Adaptation refers to a system's ability to undergo modifications according to changing circumstances, thus ensuring its continued functionality. In this context, learning and evolution are two fundamental forms of adaptation. Evolutionary artificial neural networks are a special class of artificial neural networks in which evolution is applied as another form of adaptation, in substitution of, or in addition to, learning. Evolutionary algorithms are used to perform various tasks, such as connection-weight training or initialization, architecture design, learning-rule adaptation, and input feature selection. Some of these approaches are examined below.

– Evolution of connection weights. In this strategy evolution replaces learning algorithms in the task of minimizing the neural-network error function. The global search conducted by evolution allows overcoming the main drawback of gradient-descent-based algorithms, which often get trapped in local minima. It is also useful for problems in which an error gradient is difficult to compute or estimate. This approach has been widely used, as reflected by the numerous references presented by Yao [7].
– Evolution of architectures. The architecture of an artificial neural network refers to its topological structure. Architecture design is crucial, since an undersized network may not be able to perform a given task due to its limited capability, while an oversized one may overlearn noise in the training data and exhibit poor generalization ability. Constructive and destructive algorithms for the automatic design of architectures are susceptible to becoming trapped at structural local optima.
Research on the architectural evolution of neural networks has concentrated mainly on the design of connectivity [10–12].
– Evolution of learning rules. The design of the training algorithms used to adjust connection weights depends on the type of architecture under investigation. It is desirable to develop an automatic and systematic way to adapt the learning rule to an architecture and to the task to be performed. Research into the evolution of learning rules is important not only for providing an automatic way of optimizing learning rules and for modelling the relationship between learning and evolution, but also for modelling the creative process, since newly evolved learning rules can deal with a complex and dynamic environment. Representative advances of this research are [13,14].


Fig. 1. Design layout with two reconfigurable modules. (From Ref. [8]).

3. Dynamic partial reconfiguration on FPGAs

FPGAs are programmable logic devices that permit the implementation of digital systems. They provide an array of logic cells that can be configured to perform a given function by means of a configuration bitstream. This bitstream is generated by a software tool, and usually contains the configuration information for all the internal components. Some FPGAs allow partial reconfiguration (PR), where a reduced bitstream reconfigures only a given subset of internal components. Dynamic partial reconfiguration (DPR) is done while the device is active: certain areas of the device can be reconfigured while other areas remain operational and unaffected by the reprogramming [8]. For the Xilinx FPGA families Virtex, Virtex-E, Virtex-II, Spartan-II, and Spartan-IIE there are three documented styles for performing DPR: small-bit manipulation (SBM); multi-column PR with independent designs (ID); and multi-column PR with communication between designs (CBD). Under the SBM style the designer manually edits low-level changes. Using the FPGA Editor, the designer can change the configuration of several kinds of components, such as look-up-table equations, internal RAM contents, I/O standards, multiplexers, and flip-flop initialization and reset values. After editing the changes, a bitstream can be generated containing only the differences between the before and after designs. For complex designs, SBM is unsuitable due to the low-level editing and the lack of automation in the generation of the bitstreams. ID and CBD allow the designer to split the whole system into modules. For each module, the designer must generate the configuration bitstream starting from an HDL description and going through the synthesis, mapping, placement, and routing procedures, independently of other modules. Placement and timing constraints are set separately for each

module and for the whole system. Some of these modules may be reconfigurable and others fixed (see Fig. 1). A complete initial bitstream is generated for the fixed and initial reconfigurable modules, and partial bitstreams are generated for each reconfigurable module. The difference between these two styles of reconfiguration is that CBD allows the inter-connection of modules through a special bus macro, while ID does not. This bus macro guarantees that, each time partial reconfiguration is performed, the routing channels between modules remain unchanged, avoiding contentions inside the FPGA and keeping correct connections between modules. While ID is of limited use for neural-network implementation because it does not support communication among modules, CBD is well suited for implementing layered network topologies where each layer maps to a module. CBD imposes some placement constraints, among which: (1) the size and the position of a module cannot be changed; (2) input–output blocks (IOBs) are exclusively accessible by contiguous modules; (3) reconfigurable modules can communicate only with neighbouring modules, and this must be done through bus macros (see Fig. 1); and (4) no global signals are allowed (e.g. a global reset), with the exception of clocks, which use a different bitstream and routing channels [8].

4. Description of the platform

The proposed platform consists of three parts: a hardware substrate, a computation engine, and an adaptation mechanism. Each of them can be addressed separately; however, they are tightly correlated. The hardware substrate supports the computation engine. It must also provide the flexibility to allow the adaptation mechanism to modify the engine. Maximum


flexibility could be reached with a software specification of the full system; however, computation with neural networks is an inherently parallel task, and microprocessor-based solutions perform poorly compared to their hardware counterparts. FPGAs provide high performance for parallel computation and enhanced flexibility compared to application-specific integrated circuits (ASICs), constituting the best candidate for our hardware substrate. The computation engine constitutes the problem solver of the platform. We have chosen spiking neurons given their low implementation cost on FPGA architectures [15–18], but other neuron models could also be considered. Other computational techniques are not excluded, such as fuzzy systems, filters, or simple polynomial functions. The adaptation mechanism provides the possibility of modifying the function described by the computational part. Two types of adaptation are allowed: structural adaptation and local learning. The first type is very intuitive given the hardware substrate that we present, and consists of a modular structural exploration, where different module combinations are tested, as described in Section 6. This principle applies to any kind of computational technique and can be implemented using different search algorithms, such as swarm optimization [19]. The second type of adaptation depends directly on the computational technique used, as it is specific to each of them, and refers to the type of adaptation that does not modify the physical topology. For our system, implemented with neural networks, it refers to synaptic-weight learning, which implies modifying only the contents of a memory. For neural-network implementations it could also refer to module-restricted growing and pruning techniques, where neurons may be enabled or disabled. In the same way, for other computation methods, it refers to adaptation techniques specific to the given method.

5. Hardware substrate

A hardware substrate is required to support our platform. It must provide good performance for real-time applications and enough flexibility to allow topology exploration. The substrate must provide a mechanism to test different possible topologies in a dynamic way, to change connectivity, and to allow a wide-enough search space. Application-specific integrated circuits (ASICs) provide very high performance, but their flexibility for topology exploration risks being reduced to a connection matrix, given the complexity of an ASIC design. Microprocessors offer high degrees of flexibility, but in networks with a large number of neurons computed sequentially, execution time can be very long, making them unsuitable for real-time applications. Programmable logic devices appear to be the best solution, providing high performance thanks to their hardware specificity, and a high degree of flexibility given their dynamic partial reconfigurability.


Under the constraints presented in Section 3 for DPR, we propose a hardware substrate that contains two fixed modules and one or more reconfigurable modules. The fixed modules are the codification and decodification modules. The codification module, placed at the left side of the FPGA (referring to the schema in Fig. 1), receives signals from the real world and codes them as inputs for the neural network. This codification may be a frequency or phase coding for spiking neurons, or a discrete or continuous coding for perceptron neurons. In the same way, the decodification module is positioned at the right side of the FPGA and interprets the outputs of the network to provide output signals. The reconfigurable modules contain the neural network; each of them can contain any component or set of components of the network, such as neurons, layers, connection matrices, and arrays thereof. Different configurations must be available for each module, allowing different possible combinations of configurations for the network. A search algorithm is responsible for finding the best combination of these configurations: specifically, a GA in our case, as presented in Section 6.
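As a concrete illustration of the frequency coding mentioned above, the following sketch (in Python, for readability; the hardware module would implement the equivalent logic) turns a normalized analog value into a spike train whose firing rate is proportional to the value. The accumulator scheme and the names `frequency_encode` and `max_rate` are our own illustrative choices, not a description of the actual codification module.

```python
def frequency_encode(value, n_steps, max_rate=0.5):
    """Frequency coding sketch: a value in [0, 1] becomes a spike train
    whose firing rate is value * max_rate spikes per time step.
    An accumulator spreads the spikes evenly over the train."""
    rate = value * max_rate
    acc, train = 0.0, []
    for _ in range(n_steps):
        acc += rate
        if acc >= 1.0:          # enough "charge" accumulated: emit a spike
            train.append(1)
            acc -= 1.0
        else:
            train.append(0)
    return train

# Maximum input value with max_rate = 0.5 yields a spike every second step
print(frequency_encode(1.0, 8))   # [0, 1, 0, 1, 0, 1, 0, 1]
```

A decodification module would perform the inverse operation, e.g. estimating a value from the spike count over a time window.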

6. Our proposed on-line evolving ANN

Three main types of evolutionary ANN approaches can be identified: evolution of synaptic weights, evolution of learning rules, and evolution of topologies, as summarized in the exhaustive review by Yao [7]. Evolution of synaptic weights (learning by evolution) is far more time-consuming than other learning algorithms. Evolution of learning rules (learning to learn), where one searches for an optimal learning rule, could be of further interest for our methodology. Topology evolution is the most interesting, as it allows the exploration of a wider search space and, combined with weight learning, is a powerful problem solver. The flexibility of DPR fits topology evolution well. The main consequence of the aforementioned features of DPR is a modular structure, where each module communicates solely with its neighbouring modules through a bus macro (Fig. 1). This structure matches well with a layered neural-network topology, where each reconfigurable module contains a network layer. The inputs and outputs of the full network must be fixed in advance, as well as the number of layers and the connectivity among them (number and direction of connections). While each layer can have any kind of internal connectivity, connections among layers are fixed and restricted to neighbouring layers. For each module there exists a pool of different possible configurations. Each configuration may contain a layer topology (i.e. a certain number of neurons with a given connectivity). As illustrated in Fig. 2, each module can be configured with different layer topologies, provided that they offer the same external view (i.e. the same inputs and


Fig. 2. Layout of the reconfigurable network topology.

outputs). Several generic layer configurations are generated to obtain a library of layers, which may be used for different applications. A GA [20,21] is responsible for determining which configuration bitstream is downloaded to the FPGA. The GA considers a full network as an individual (Fig. 3). For each application the GA may find the combination of layers that best solves the problem. The fixed input and output modules contain the logic required to code and decode external signals and to evaluate the fitness of the individual depending on the application (the fitness could also be evaluated off-chip). As in any GA, the phenotype is mapped from the genome; in this case, the combination of layers for a network. Each module has a set of possible configurations, an index is assigned to each configuration, and the genome is composed of a vector of these indexes. The genome length for a network with n modules, and c(i) possible configurations for the ith module (with i = 1, 2, ..., n), is given by L = sum_{i} l(i). For a binary genome encoding l(i) = log2(c(i)), while for a positive-integer encoding l(i) = 1.
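The genome layout described above can be sketched as follows. The helper names are hypothetical, and the ceiling in the binary case is our addition, to handle configuration counts that are not powers of two; the paper's formula l(i) = log2(c(i)) is exact when every c(i) is a power of two.

```python
import math

def genome_length(configs_per_module, encoding="binary"):
    """Genome length L = sum of l(i) over the n modules:
    binary encoding:  l(i) = ceil(log2(c(i))) bits per module;
    integer encoding: l(i) = 1 index per module."""
    if encoding == "binary":
        return sum(math.ceil(math.log2(c)) for c in configs_per_module)
    return len(configs_per_module)

def decode_binary_genome(bits, configs_per_module):
    """Map a binary genome to the configuration index chosen for each
    module; each module i consumes l(i) bits of the genome."""
    indexes, pos = [], 0
    for c in configs_per_module:
        l = math.ceil(math.log2(c))
        idx = int("".join(map(str, bits[pos:pos + l])), 2) % c  # keep index valid
        indexes.append(idx)
        pos += l
    return indexes

# Example: three reconfigurable modules with 4, 8 and 4 layer configurations
c = [4, 8, 4]
print(genome_length(c))                                   # 2 + 3 + 2 = 7 bits
print(decode_binary_genome([1, 0, 1, 1, 0, 0, 1], c))     # [2, 6, 1]
```

Each decoded index selects the partial bitstream to download for the corresponding module, so evaluating an individual amounts to reconfiguring the modules and measuring the resulting network's fitness.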

7. Neural model

Most neuron models, such as the perceptron or radial basis functions, use continuous values as inputs and outputs, processed using logistic, Gaussian, or other continuous functions [4,5]. In contrast, biological neurons process spikes: as a neuron receives input spikes through its dendrites, its membrane potential increases following a post-synaptic response. When the membrane potential reaches a certain threshold value, the neuron fires and generates an output pulse through the axon. The best-known biological model is the Hodgkin and Huxley model (H&H) [22], which is based on ion-current activities through the neuron membrane. The most biologically plausible models are not the best suited for computational implementations, which is why other, simplified approaches are needed [23]. The leaky integrate-and-fire (LI&F) model [24,25] is based on a current integrator, modelled as a resistance and a capacitor in parallel. Differential equations describe the voltage given by the capacitor charge, and when a certain voltage is reached the neuron fires. The spike response

Fig. 3. Evolution of a layered neural network. The genome uses a binary codification. The genome maps to an individual, in this case a neural network. When a measure of its fitness is obtained, a new individual can be tested, and so on. When the full population has been tested, the genetic operators can be applied and the fitness calculations restarted.


model of order 0 (SRM0) [24,25] offers a response that resembles that of the LI&F model, with the difference that the membrane potential is expressed in terms of kernel functions [24] instead of differential equations. Spiking-neuron models process discrete values representing the presence or absence of spikes; this fact allows a simple connectivity structure at the network level and a striking simplicity at the neuron level. However, implementing models like SRM0 and LI&F on digital hardware is largely inefficient, wasting many hardware resources and exhibiting a large latency due to the implementation of kernels and numeric integrations. This is why a functional hardware-oriented model is necessary to achieve fast architectures at a reasonable chip-area cost.

7.1. The proposed neuron model

Our simplified integrate-and-fire model [26], like standard spiking models, uses the following five concepts: (1) membrane potential; (2) resting potential; (3) threshold potential; (4) post-synaptic response; and (5) after-spike response (see Fig. 4). A spike is represented by a pulse. The model is implemented as a Moore finite state machine with two states: operational and refractory. During the operational state, the membrane potential is increased (or decreased) each time a pulse is received by an excitatory (or inhibitory) synapse, and then it decreases (or increases) with a constant slope until it reaches the resting value. If a pulse arrives when a previous post-synaptic potential is still active, its action is added to the previous one. The membrane potential dynamics is


described by

u(t) = u(t-1) - K(u(t-1)) + sum_{i=1}^{n} W_i s_i(t-1),
with K(u(t)) = k1 if u(t) > U_rest, and K(u(t)) = -k2 otherwise   (1)

where u(t) is the membrane potential at time t, U_rest is the constant resting potential, n is the number of inputs to the neuron, W_i is the synaptic weight of input i, s_i(t) is the spike at input i at time t, and k1 and k2 are positive constants that determine, respectively, the decreasing and increasing slopes. When the firing condition is fulfilled (i.e. potential >= threshold) the neuron fires, the potential takes on a hyperpolarization value called the after-spike potential, and the neuron passes to the refractory state. After firing, the neuron enters a refractory period in which it recovers from the after-spike potential to the resting potential. Two kinds of refractoriness are allowed: absolute and partial. Under absolute refractoriness, input spikes are ignored, and the membrane potential is given by

u(t) = u(t-1) + k2   (2)

Under partial refractoriness, the effect of input spikes is attenuated by a constant factor. The membrane potential in this case is expressed as

u(t) = u(t-1) + k2 + (1/a) sum_{i=1}^{n} W_i s_i(t-1)   (3)

where a is a constant positive integer that determines the attenuation factor. The refractory state determines the time

Fig. 4. Response of the model to a train of input spikes, and the Moore state machine that describes such response.


Fig. 5. Hebbian learning windows. When neuron n3 fires at tf3, the learning windows of neurons n1 and n2 are disabled and enabled, respectively. At time tf3 the synaptic weight W13 is decreased by the learning algorithm, while W23 is increased.

needed by a neuron to recover from firing. This time is over when the membrane potential reaches the resting potential, and the neuron comes back to the operational state. Our model simplifies some features with respect to SRM0 and LI&F, in particular the post-synaptic response. The way in which several input spikes are processed affects the system dynamics: in the presence of two simultaneous input spikes, SRM0 performs a linear superposition of post-synaptic responses, while our model, in a similar way to LI&F, adds the synaptic weights to the membrane potential. Even though our model is less biologically plausible than SRM0 and LI&F, it is still functionally similar.
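The dynamics of Eqs. (1)-(2) with absolute refractoriness can be sketched as a short discrete-time simulation. This is our illustrative reading of the model, not the hardware implementation: in particular, clamping the leak at the resting potential (so the decay does not overshoot U_rest) is our implementation choice, consistent with the statement that the potential decreases "until it reaches the resting value".

```python
def simulate_neuron(spike_trains, weights, u_rest=0, u_thresh=10,
                    u_after=-4, k1=1, k2=1):
    """Simplified integrate-and-fire neuron, absolute refractoriness.

    Each step: the leak K(u) pulls u toward u_rest (slope k1 above rest,
    k2 below, Eq. 1), then weighted input spikes are added. On
    u >= u_thresh the neuron fires, jumps to the after-spike potential
    and ignores inputs while recovering with slope k2 (Eq. 2).
    Returns (membrane trace, output spike train)."""
    u, refractory = u_rest, False
    trace, out = [], []
    for s in spike_trains:
        fired = 0
        if refractory:
            u += k2                              # Eq. (2): recover
            if u >= u_rest:
                u, refractory = u_rest, False
        else:
            if u > u_rest:                       # K(u) = k1 above rest
                u = max(u - k1, u_rest)
            elif u < u_rest:                     # K(u) = -k2 below rest
                u = min(u + k2, u_rest)
            u += sum(w * si for w, si in zip(weights, s))
            if u >= u_thresh:                    # firing condition
                fired, u, refractory = 1, u_after, True
        trace.append(u)
        out.append(fired)
    return trace, out

# One excitatory input of weight 6: two consecutive spikes make the
# neuron fire, after which it recovers from the after-spike potential
trace, out = simulate_neuron([[1], [1], [0], [0]], [6])
print(out)     # [0, 1, 0, 0]
print(trace)   # [6, -4, -3, -2]
```

Partial refractoriness (Eq. 3) would differ only in the refractory branch, adding the attenuated input sum instead of ignoring it.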

7.2. Learning

Weight learning is an issue that has not been fully solved for spiking neuron models. Several learning rules have been explored by researchers, Spike-Timing-Dependent Plasticity (STDP), a type of hebbian learning, being one of the most studied [24,25]. In general, hebbian learning modifies the synaptic weight W_ij, considering the simultaneity of the firing times of the pre- and post-synaptic neurons i and j. Herein we describe a simplified implementation of hebbian learning oriented to digital hardware. Two functions are added to the neuron model: active-window and learning. The active-window function determines whether the learning window of a given neuron is active or not (Fig. 5), maintaining a value of 1 during a certain time after the generation of a spike by a neuron n_i. The aw_i function is given by

aw_i(t) = step(t − t_i^f) − step(t − t_i^f − w) (4)

where t_i^f is the firing time of n_i and w is the size of the learning window. This window allows the receptor neuron n_j to determine the synaptic weight modification ΔW_ij that must be applied. The learning function modifies the synaptic weights of the neuron, performing the hebbian learning (Fig. 5). Given a neuron n_i with k inputs, when n_i fires, the learning function modifies the synaptic weights W_ij (with j = 1, 2, …, k) as follows:

W_ij(t) = W_ij(t − 1) + ΔW_ij(t), with ΔW_ij(t) = a·aw_j(t) − b (5)

where a is the learning rate and b is the decay rate. Both a and b are positive constants such that a > b. These two functions, active-window and learning, increase the amount of interneuron connectivity, as they imply, respectively, one extra output and k extra inputs per neuron (Fig. 6a).
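The rule of Eqs. (4) and (5) can be sketched in a few lines of software. The names below are ours; the constants follow Table 1 (learning rate 6, decay rate 4, weight bounds [−32, 127], window size 16), and weights are clamped to the hardware bounds as described later:

```python
A, B = 6, 4                    # learning rate a and decay rate b (a > b)
W_MIN, W_MAX, WIN = -32, 127, 16

def active_window(t, t_fire, win=WIN):
    """Eq. (4): 1 during `win` time slices after the presynaptic spike."""
    return 1 if t_fire is not None and 0 <= t - t_fire < win else 0

def learn(t, weights, pre_fire_times):
    """Eq. (5), applied only when the postsynaptic neuron fires at time t:
    reinforce synapses whose presynaptic window is open, decay the rest."""
    return [min(W_MAX, max(W_MIN, w + A * active_window(t, tf) - B))
            for w, tf in zip(weights, pre_fire_times)]
```

A synapse whose presynaptic neuron fired recently gains a net a − b = 2 per postsynaptic spike, while an inactive synapse loses b = 4, which is the intended competitive effect of the window.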

Fig. 6. Proposed hardware neuron (a) External view. (b) Architecture.



7.3. The proposed neuron on hardware

Several hardware implementations of spiking neurons have been developed on analog and digital circuits [15–18,25]. Analog electronic neurons achieve post-synaptic responses very similar to those of their biological counterparts; however, analog circuits are difficult to set up and debug. Digital spiking neurons, on the other hand, tend to be less biologically plausible, but easier to set up, debug, scale, and train, among other features. Additionally, these models can be rapidly prototyped and tested thanks to configurable logic devices such as FPGAs.

The hardware implementation of our neuron model is illustrated in Fig. 6. The neuron is basically composed of: (1) a control unit; (2) a memory containing the parameters; (3) logic resources to compute the membrane potential; (4) two modules performing the learning; and (5) logic resources to interface input and output spikes. The control unit is a finite state machine with two states: operational and refractory. Absolute refractoriness is implemented in our neuron. The computation of a time slice (iteration) is triggered by a pulse at the input clk_div, and takes a certain number of clock cycles depending on the number of inputs to the neuron.

The synaptic weights are stored in a memory, which is swept by a counter. In the presence of an input spike, the respective weight is enabled to be added to the membrane potential. Likewise, the decreasing and increasing slopes (for the post-synaptic and after-spike responses, respectively) are stored in the memory. Although the number of inputs to the neuron is parameterizable, increasing it raises both the area cost and the latency of the system. Indeed, the area cost depends highly on the memory size, which itself depends on the number of inputs to the neuron (e.g. the 32×9 neuron of Fig. 4 has a memory size of 32×9 bits, where the 32 positions correspond to 30 input weights plus the increasing and decreasing slopes; 9 bits is the arbitrarily chosen data-bus width). The time required to compute a time slice is the number of inputs plus one, i.e. 30 inputs plus either the increasing or the decreasing slope.

The dark blocks in Fig. 6, the active-window and learning modules, perform the learning in the neuron. The active-window block consists of a counter that is triggered when an output spike is generated and stops when a certain value, the learning window, is reached. The output aw_out takes the value logic-1 while the counter is active and logic-0 otherwise. The learning module performs the synaptic-weight learning described above. It computes the change to be applied to the weights (ΔW), keeping them bounded. At each clock cycle the module computes the new weight for the synapse pointed to by the COUNTER signal; however, these new weights are stored only if an output spike is generated by the current neuron.
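As a rough behavioural sketch of this datapath (all names are illustrative, and the layout of n weights followed by the two slope values is our reading of the memory description above), the counter sweep makes the latency of one time slice equal to n_inputs + 1 cycles:

```python
def time_slice(u, spikes, memory, u_rest=32):
    """One iteration of the neuron datapath: the counter sweeps the weight
    memory, accumulating a weight only when its input spike line is active,
    then spends one extra cycle applying either slope.
    `memory` holds n weights followed by [decreasing, increasing] slopes."""
    cycles = 0
    n = len(spikes)
    for addr in range(n):                 # COUNTER sweeps the weight memory
        if spikes[addr]:                  # weight enabled by its input spike
            u += memory[addr]
        cycles += 1
    # one extra cycle: decreasing slope above rest, increasing slope below
    u = u - memory[n] if u > u_rest else u + memory[n + 1]
    cycles += 1
    return u, cycles                      # cycles == n + 1
```

With 30 inputs this gives 31 cycles per slice, which matches the "number of inputs plus one" figure in the text.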

Fig. 7. Layout of the network implemented on hardware.

8. Experimental setup and results

The experimental setup consists of two parts: a Matlab simulation of a spiking neural network (Section 8.1) and its validation on an FPGA (Section 8.2).

8.1. Network description and simulation

A frequency discriminator is implemented in order to test the capability of the learning network to solve, in an unsupervised way, a problem with dynamic characteristics. Using the 30-input neuron described in Section 7.3, we implement a layered neural network with three layers, fulfilling the constraints required by the on-line evolution implementation described in Section 6. Each layer contains 10 neurons and is fully connected internally. Additionally, layers provide outputs to the preceding and following layers, and receive outputs from them, as described in Fig. 7. For the sake of modularity, each neuron has 30 inputs: 10 from its own layer, 10 from the preceding one, and 10 from the next one.

To present the patterns to the network, the encoding module takes into account the following considerations: (1) we use nine inputs of layer 1 to introduce the pattern; (2) the patterns consist of two sinusoidal waveforms with different periods; (3) the waveforms are normalized and discretized to nine levels; (4) every three time slices (iterations), a spike is generated at the input corresponding to the value of the discretized signal (Fig. 8).

The simulation setup takes into account the constraints imposed by the hardware implementation. Table 1 presents the parameter setup for the neuron model and for the learning modules. Initial weights are integers generated randomly between 0 and 127. Different combinations of two signals are presented, as shown in Fig. 8. In order to help the unsupervised learning (described in Section 7.2) to separate the signals, they are presented as follows: during the first 6000 time slices the signal is swapped every 500 time slices, leaving between swaps an interval of 100 time slices where no input spike is presented. Then this interval between the signals is removed, and the signals are swapped every 500 time slices.

Several combinations of signals with different periods are presented to the network, with five tries allowed for each combination. Some of the signals are correctly separated at least once, while others are not, as shown in Table 2. It must be noticed that the range of separable periods depends highly on the way the data are presented to the network (the encoding module). In our case, we generate a spike every three time slices; if higher (or lower) frequencies are to be processed, spikes must be generated at higher (or lower) rates. The period range is also affected by the dynamic characteristics of the neuron, i.e. the after-spike potential and the increasing and decreasing slopes. These determine the membrane-potential response after input and output spikes, playing a fundamental role in the dynamic response of the full network.

Fig. 8. Neural activity in a learned frequency discriminator. The lowest nine lines are the input spikes to the network: two waveforms with periods of 43 and 133 time slices are presented. The next 10 lines show the neuron activity of layer 1, and the following lines show the activity of layers 2 and 3. After 10,000 time slices a clear separation can be observed at the output layer, where neurons fire in the presence of only one of the waveforms.

Table 1
Set-up parameters for each neuron and for the hebbian learning

Neuron parameters        |      | Learning parameters      |
Resting potential        | 32   | Learning rate            | 6
Threshold potential      | 128  | Decay rate               | 4
After-spike potential    | 18   | Weight upper bound       | 127
Increasing slope         | 1    | Weight lower bound       | -32
Decreasing slope         | 1    | Learning window size w   | 16
Potential lower bound    | -128 |
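The encoding scheme described above (nine levels, one spike every three time slices) can be sketched as follows; the `encode` helper and its defaults are illustrative, and the exact discretization used in the paper may differ in detail:

```python
import math

def encode(period, n_slices, levels=9, spike_every=3):
    """Map a sinusoid of the given period onto `levels` spike trains:
    normalize to [0, 1], discretize to nine levels, and every third time
    slice emit one spike on the line matching the current level."""
    trains = [[0] * n_slices for _ in range(levels)]
    for t in range(n_slices):
        x = math.sin(2 * math.pi * t / period)           # raw waveform
        level = min(levels - 1, int((x + 1) / 2 * levels))  # 0 .. levels-1
        if t % spike_every == 0:
            trains[level][t] = 1
    return trains
```

Each encoded time slice thus carries at most one spike across the nine input lines, so the input firing rate is fixed and only the spatial pattern over the lines carries the frequency information.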

8.2. The network on hardware

The same neural network described above is implemented on a relatively small FPGA to validate the network execution. We work with a Spartan II xc2s200 FPGA from Xilinx Corp., with a maximum capacity of 200,000 logic gates. This FPGA has a matrix of 28×42 CLBs (configurable logic blocks), each composed of two slices containing the logic where the functions are implemented, for a total of 2352 slices. The xc2s200 is the largest device of the low-cost Spartan II FPGA family; other FPGA families, such as Virtex II, offer up to 40 times more logic resources.

Table 2
Signal periods presented to the network. Period units are time slices

Period 1 | Period 2 | Separation
40       | 100      | Yes
43       | 133      | Yes
47       | 73       | No
47       | 91       | Yes
50       | 100      | Yes
73       | 150      | No
73       | 190      | Yes
101      | 133      | Yes
115      | 190      | No
133      | 170      | No
133      | 190      | No

Table 3
Synthesis results for a neuron, a layer, and a network

Unit synthesized                          | Number of CLB slices | FPGA percentage
A neuron (30 inputs)                      | 53                   | 2.25
A layer (10 neurons)                      | 500                  | 21.26
A network (three layers, without modules) | 1273                 | 54.21
A network (three layers, modular design)  | 1500                 | 63.78

We implemented the 30-input neuron described in Section 7.3 with a data bus of 11 bits; the memory, however, keeps its width of 9 bits. The data bus is wider than the memory in order to prevent transitory overflow in arithmetic operations. Synthesis results for different implementations of these neurons can be found in [15]. The area requirement is very low compared to other, more biologically plausible implementations (e.g. Ros et al. [18] use 7331 Virtex-E CLB slices for two neurons). However, a quantitative performance comparison with this or other implementations is not feasible, given the absence of a standard criterion for measuring performance. Several criteria beyond the minimum error achieved on different possible problems might be taken into account, such as execution speed, learning speed, size of the neuron, generalization ability, possibility of learning on-chip or off-chip, possibility of learning on-line or off-line, biological plausibility, etc.

Table 3 presents the synthesis results for a neuron, a layer, and the whole network with and without modular design. Note that a layer of 10 neurons takes fewer slices than 10 independent neurons, thanks to synthesis optimization; the same should apply to the whole network. However, when the network is modular this simplification is not possible, given that each layer has clearly defined boundaries on the circuit and cannot be merged with neighbouring modules.

To test the design, the sequence of input spikes (i.e. after the encoding stage) is stored in a memory block. The hardware network, both in simulation and on-chip, exhibits behaviour similar to that of its Matlab counterpart: clear frequency discrimination is obtained at the output of the network, as some outputs generate spikes only in response to a given input frequency. The system achieves a speed of up to 54.4 MHz. The latency of a time slice is 64 clock cycles, which means that the duration of a time slice can go down to about 1.18 µs. The neuron was implemented with a latency of 64 clock cycles to allow it to interact with larger neurons with up to 62 inputs, guaranteeing uniformity in the spike duration. However, given that this particular network uses only 30-input neurons, the latency could be reduced to 32 clock cycles. This latency reduction may also slightly increase the operation frequency of the system, since it implies some reduction in the logic resources.
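The quoted timing figures can be checked with a line of arithmetic: at 54.4 MHz, 64 cycles last about 1.18 µs (microseconds), and halving the latency to 32 cycles halves the slice duration.

```python
# Quick check of the timing figures quoted above (values from the text).
f_clk = 54.4e6                    # maximum system frequency, Hz
latency = 64                      # clock cycles per time slice
slice_us = latency / f_clk * 1e6  # duration of one time slice, microseconds
reduced_us = 32 / f_clk * 1e6     # with the reduced 32-cycle latency
```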

9. Further work

Our promising results have encouraged us to investigate this approach further. We are currently pursuing five lines of research: (1) improvement of adaptivity techniques, introducing in particular novel and more effective learning algorithms; (2) enhancement of the computation engine, providing other options for the post-synaptic potential response; (3) refinement of the platform implementation to allow on-chip evolution; (4) introduction of interpretability options using fuzzy logic; and (5) implementation of more challenging applications. Below we briefly develop each issue.

9.1. Adaptivity

Hebbian learning has proven not to be the best learning technique for embedded applications, given its unsupervised nature. However, as in nature, it can be the basis for other, more advanced learning strategies. Reinforcement learning tends to be the most suitable type of learning for systems adapting to real-world environments. A hybrid between hebbian and reinforcement learning could be implemented in our system by adding dopaminergic signals. In the same way, a hybrid between supervised and hebbian learning could prove adequate for some applications.

There are also other adaptivity approaches to be explored at the evolutionary level. Coevolutionary algorithms [27] model the interaction between several species, each of which evolves separately but whose fitness is affected by the interaction with the other species. In our system, each module could be considered a separately evolving species.

9.2. On-chip evolution

Current trends in systems-on-chip have led to the commercialization of FPGAs containing powerful hardwired microprocessors, as is the case of the Virtex II Pro FPGA, which contains a PowerPC. Given this availability, why run a GA on a PC instead of executing it on a high-performance processor inside the very device being reconfigured?
If such high performance is not required, other lower-cost solutions are available, such as soft processor cores (not hardwired, but implemented in the FPGA logic cells).

9.3. Post-synaptic potential response

Other types of post-synaptic responses may be considered, such as those presented by Maass [23]. A type-A neuron uses a post-synaptic response like that of Fig. 9(a),



Fig. 9. Post-synaptic potential responses for neurons (a) type-A and (b) type-B.

which could provide lower computing capabilities and lower resource requirements for the FPGA implementation. In the same way, the post-synaptic response of a type-B neuron (Fig. 9(b)) could improve computation while requiring more logic resources.

9.4. Interpretability

Many human tasks may benefit from, and sometimes require, decision-explanation systems. Among them one can cite diagnosis, prognosis, and planning. However, neural networks produce outputs without providing any insight into the underlying reasoning mechanism. Fuzzy inference systems provide a formalism to represent knowledge in a way that resembles human communication and reasoning. Moreover, their layered structure, somewhat similar to that of neural networks, makes them well suited to lie on our modular architecture.

9.5. Application

Frequency discrimination is just the first step toward a more general field: signal processing. Challenging applications such as electroencephalography (EEG) and electrocardiography (ECG) signal analysis, and speech recognition, could benefit from embedded smart artefacts that adapt by themselves to different users. These are the type of problems where, given their complexity, it is not easy to determine the best architecture, and the evolution of a neural network could provide the flexibility required to search for a correct solution.

10. Conclusions

We have presented a platform defined by three parts: a hardware substrate, a computation engine, and an adaptation mechanism. We presented each of these parts and showed how they can be merged. We described the platform design, simulation, and validation, and proposed options for applying different computation and adaptation techniques on our platform. The present work proposes a trade-off between flexibility and performance by means of reconfigurable computing; much future work remains. The validation of the proposed architecture is presented, for clarity, as a software simulation (Matlab). However, it must be noticed that we have implemented the full platform on an FPGA board as a stand-alone system.

As computation engine we presented a functional spiking neuron model suitable for hardware implementation. The proposed model neglects several characteristics of biological and software-oriented models. Nevertheless, it keeps its functionality and is able to solve a relatively complex task such as temporal pattern recognition. Since the neuron model is highly simplified, the limited representational power of single neurons must be compensated by a higher number of neurons, which in terms of hardware resources could be a reasonable trade-off considering the architectural simplicity allowed by the model.

In the case of the frequency-discriminator implementation, the use of hebbian learning alone, given its unsupervised nature, proves effective but not efficient. This is due to the nature of the problem (a classification problem, with the desired output known in advance), for which a supervised algorithm would certainly perform better. Although solutions were found for a given set of frequencies, we consider that better solutions could be found with an adequate number of neurons. While hebbian learning remains useful for some classification tasks, it proves inaccurate for other applications.

Spiking-neuron models seem to be the best choice for this kind of implementation, given their low hardware and connectivity requirements [15–18], while keeping good computational capabilities compared to other neuron models [25]. Likewise, layered topologies, which are among the most commonly used, seem to be the most suitable for our implementation method. However, other types of topologies are still to be explored. A simple GA is proposed as the adaptation mechanism; however, different search techniques could be applied with our methodology. GAs constitute one of the most generic, simple, and well-known techniques; we are nevertheless convinced that a GA is not the best option, as it does not take into account information that could be useful to optimize the network, such as the direction of the error.

References

[1] S.J. Gould, The Structure of Evolutionary Theory, Belknap Press of Harvard University Press, Cambridge, MA, 2002.
[2] M. Ridley, Evolution, 3rd ed., Blackwell Publishers, Oxford, 2004.
[3] T.M. Mitchell, Machine Learning, McGraw-Hill, New York, 1997.
[4] S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd ed., Prentice-Hall, New Jersey, 1999.
[5] A. Perez-Uribe, Structure-adaptable digital neural networks, PhD Thesis, EPFL, Lausanne, 1999.
[6] R. Reed, Pruning algorithms – a survey, IEEE Transactions on Neural Networks 4 (1993) 740–747.
[7] X. Yao, Evolving artificial neural networks, Proceedings of the IEEE 87 (1999) 1423–1447.
[8] Xilinx Corp., XAPP 290: Two Flows for Partial Reconfiguration: Module Based or Small Bits Manipulations, 2002.

[9] C.G. Langton, Artificial Life: An Overview, MIT Press, Cambridge, MA, 1995.
[10] H.A. Abbass, Speeding up backpropagation using multiobjective evolutionary algorithms, Neural Computation 15 (2003) 2705–2726.
[11] M. Hüsken, C. Igel, M. Toussaint, Task-dependent evolution of modularity in neural networks, Connection Science 14 (2002) 219–229.
[12] C. Igel, M. Kreutz, Operator adaptation in evolutionary computation and its application to structure optimization of neural networks, Neurocomputing 55 (2003) 347–361.
[13] D. Floreano, J. Urzelai, Evolutionary robots with on-line self-organization and behavioral fitness, Neural Networks 13 (2000) 431–443.
[14] Y. Niv, D. Joel, I. Meilijson, E. Ruppin, Evolution of reinforcement learning in uncertain environments: a simple explanation for complex foraging behaviors, Adaptive Behavior 10 (2002) 5–24.
[15] A. Upegui, C.A. Peña-Reyes, E. Sanchez, A hardware implementation of a network of functional spiking neurons with hebbian learning, presented at BioADIT – International Workshop on Biologically Inspired Approaches to Advanced Information Technology, Lausanne, 2004.
[16] D. Roggen, S. Hofmann, Y. Thoma, D. Floreano, Hardware spiking neural network with run-time reconfigurable connectivity, presented at the Fifth NASA/DoD Workshop on Evolvable Hardware (EH 2003), 2003.
[17] O. Torres, J. Eriksson, J.M. Moreno, A. Villa, Hardware optimization of a novel spiking neuron model for the POEtic tissue, Artificial Neural Nets Problem Solving Methods, Pt. II 2687 (2003) 113–120.
[18] E. Ros, R. Agis, R.R. Carrillo, E.M. Ortigosa, Post-synaptic time-dependent conductances in spiking neurons: FPGA implementation of a flexible cell model, Artificial Neural Nets Problem Solving Methods, Pt. II 2687 (2003) 145–152.
[19] J.F. Kennedy, R.C. Eberhart, Y. Shi, Swarm Intelligence, Morgan Kaufmann Publishers, San Francisco, 2001.
[20] M.D. Vose, The Simple Genetic Algorithm: Foundations and Theory, MIT Press, Cambridge, MA, 1999.
[21] D.E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley, Reading, MA, 1989.
[22] A.L. Hodgkin, A.F. Huxley, A quantitative description of membrane current and its application to conduction and excitation in nerve, Journal of Physiology (London) 117 (1952) 500–544.
[23] W. Maass, Networks of spiking neurons: the third generation of neural network models, Neural Networks 10 (1997) 1659–1671.
[24] W. Gerstner, W. Kistler, Spiking Neuron Models: Single Neurons, Populations, Plasticity, Cambridge University Press, Cambridge, 2002.
[25] W. Maass, C. Bishop, Pulsed Neural Networks, MIT Press, Cambridge, MA, 1999.
[26] A. Upegui, C.A. Peña-Reyes, E. Sanchez, A functional spiking neuron hardware oriented model, Computational Methods in Neural Modeling, Pt. I 2686 (2003) 136–143.


[27] C.A. Peña-Reyes, Coevolutionary fuzzy modeling, PhD Thesis, EPFL, Lausanne, 2002.

Andres Upegui is a PhD student at the Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland. He obtained a diploma in Electronic Engineering in 2000 from the Universidad Pontificia Bolivariana (UPB), Medellín, Colombia. He was a member of the UPB microelectronics research group from 2000 to 2001. From 2001 to 2002 he attended the Graduate School in Computer Science at EPFL, and then joined the Logic Systems Laboratory (LSL) as a PhD student. His research interests include reconfigurable computing, bio-inspired techniques, and processor architectures.

Carlos Andrés Peña-Reyes received a diploma in Electronic Engineering from the Universidad Distrital 'Francisco José de Caldas', Bogotá, Colombia, in 1992. He completed postgraduate studies in Industrial Automation at the Universidad del Valle, Cali, Colombia, in 1997, and in Computer Science at the Swiss Federal Institute of Technology at Lausanne (EPFL) in 1998. His PhD thesis from EPFL was nominated for the EPFL 2002 best-thesis prize. He was an assistant instructor at the Javeriana and Autónoma universities in Cali, Colombia, in 1995, and a lecturer at the University of Lausanne, Switzerland, in 2003. His research interests include computational-intelligence-based modelling techniques, in particular hybrid approaches.

Eduardo Sanchez received a diploma in Electrical Engineering from the Universidad del Valle, Cali, Colombia, in 1975, and a PhD from the Swiss Federal Institute of Technology in 1985. Since 1977, he has been with the Department of Computer Science at the Swiss Federal Institute of Technology, Lausanne, where he is currently a Professor in the Logic Systems Laboratory, engaged in teaching and research. He also holds a professorship at the Ecole d’Ingenieurs du Canton de Vaud, University of Applied Sciences of Western Switzerland. His chief interests include computer architecture, VLIW processors, reconfigurable logic, and evolvable hardware. Dr Sanchez was co-organizer of the inaugural workshop in the field of bio-inspired hardware systems, the proceedings of which are titled Towards Evolvable Hardware (Springer-Verlag, 1996).
