The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-11179-7_103
Development of a Dynamically Extendable SpiNNaker Chip Computing Module
Rui Araújo, Nicolai Waniek, and Jörg Conradt
Technische Universität München, Neuroscientific System Theory, Karlstraße 45, 80333 München, Germany
{rui.araujo,nicolai.waniek,conradt}@tum.de
http://www.nst.ei.tum.de
Abstract. The SpiNNaker neural computing project has created a hardware architecture capable of scaling up to a system with more than a million embedded cores, in order to simulate more than one billion spiking neurons in biological real time. The heart of this system is the SpiNNaker chip, a multi-processor System-on-Chip with a high level of interconnectivity between its processing units. Here we present a Dynamically Extendable SpiNNaker Chip Computing Module that allows a SpiNNaker machine to be deployed on small mobile robots. A non-neural application, the simulation of the movement of a flock of birds, was developed to demonstrate the general-purpose capabilities of this new platform. The developed SpiNNaker machine allows the simulation of up to one million spiking neurons in real time with a single SpiNNaker chip and is scalable up to 256 computing nodes in its current state.
Keywords: mobile robotics, parallel computing module, spiking neural network simulations, SpiNNaker
1 Introduction
Abstract processes carried out in the biological brain are still one of the great challenges for computational neuroscience. Despite an increasing amount of experimental data and knowledge of individual components, we have only an insufficient understanding of the operations at intermediate levels. However, those are believed to be the key elements in constructing thoughts and in processing information [5]. Large-scale neural network simulations will help to increase our knowledge about these intermediate levels. However, the huge number of neurons in the human brain [6] and their connectivity [2] are difficult, if not impossible, to simulate. General-purpose digital hardware is ill suited to perform these simulations, as it is unable to cope with the massive parallelism found in the brain. A possible approach is the use of neuromorphic systems such as the BrainScaleS [7] or the Neurogrid hardware [3], which emulate the neural network with a physical implementation of the individual neurons.
Another possible approach is the SpiNNaker system, a massively parallel computer architecture based on multi-processor System-on-Chip (MPSoC) technology that can scale up to a million cores. It is capable of simulating up to a billion spiking neurons in biological real time with realistic levels of interconnectivity between the neurons. The SpiNNaker system was motivated by the attempt to understand and study biological computing structures. Their high level of parallelism at a frugal energy budget contrasts with traditional electronic designs, which until recently were mostly driven by serial throughput. The biological approach to the design of such many-core architectures also brings new concerns in terms of fault-tolerant computation and efficiency. The SpiNNaker chip, the basic building block of a SpiNNaker machine, relies on smaller processors than other machines but in larger numbers: it has 18 highly efficient embedded ARM processors, which allow the SpiNNaker system to be competitive according to two metrics, MIPS/mm² and MIPS/W.
Fig. 1. The SpiNNaker Chip Computing Module Dataflow Architecture.
The currently available machines with SpiNNaker chips are relatively large: the minimum size at the moment is 105 × 95 mm. This limits their deployment on systems with limited size such as small mobile robots, especially flying ones, which have very strict weight and space constraints. Additionally, SpiNNaker systems currently require a workstation, usually a desktop or a laptop, connected
through an Ethernet connection to bootstrap the system every time it powers on and to feed the processing data into the system. This requirement seriously limits the independence and deployment capabilities of systems with embedded SpiNNaker chips. At present, it is necessary to add a wireless router in order to build a mobile system with a SpiNNaker machine [4]. The drawbacks of this approach are increased power consumption and space requirements: a typical wireless router consumes around 4 to 5 watts, and even the fairly small models available on the market still take up some space. Furthermore, the extensibility provided by standard SpiNNaker machines is very limited, since computing power can only be increased in fixed amounts. The current single-board SpiNNaker machines are available in two versions, one with four chips and another with forty-eight. These are wildly different amounts of processing capability, which makes it difficult to create intermediate solutions. It would be desirable to select how many SpiNNaker chips to deploy without having to design new hardware. The current requirements of the SpiNNaker architecture make it unsuitable for many applications where its processing power and capabilities would be helpful. It is therefore necessary to design a new solution that can overcome the limitations of the present options.
2 SpiNNaker Chip Computing Module
Traditional SpiNNaker systems rely on the host system to provide the input data. This limits the possibilities of interfacing the SpiNNaker chip with the external environment. In order to overcome this restriction, we added a microcontroller which is responsible for communicating with the outside world. Figure 1 presents the general architecture of the developed solution, which involves: 1) the design of a PCB with a single SpiNNaker chip and a microcontroller, 2) the software to drive the microcontroller, and 3) an application that runs on the host system to communicate with the board. We strove to keep the hardware as simple and small as possible to achieve one of our main goals: portability. Therefore, our solution consists only of the necessary components. Several requirements shaped the design of our system. For instance, backwards compatibility with the standard tools used for the bigger SpiNNaker machines, ybug and tubotron, was an important aspect that led to the creation of the host system application. Backwards compatibility allows users of our novel system to simply apply their already established know-how about SpiNNaker and its tools. Adding the microcontroller had various benefits. We could replace the Ethernet connection with a simpler, albeit slower, universal asynchronous receiver/transmitter (UART) running at 12 Mbaud, which roughly translates into 1.2 Mbyte/s in each direction. Additionally, the microcontroller can boot the SpiNNaker chip without a host system, as it is capable of storing the previously transmitted image. The
Fig. 2. The SpiNNaker Chip Computing Module. Labels in the figure: Microcontroller, Prototype Debug I/O, SpiNNLink Port (optional), SpiNNaker Chip.
microcontroller and the application on the workstation jointly act as if they were another SpiNNaker chip connected to the Ethernet, with their own point-to-point (P2P) address and position in the grid. A SpiNNaker machine is a two-dimensional grid of nodes, where each node is a SpiNNaker chip with its own P2P address, a 16-bit number.
2.1 Evaluation
We took measurements to characterize our computing module. The main objective was to determine the input and output capacity in terms of SpiNNaker packets, since higher-level protocols rely on these. Both cores ran at 180 MHz during all tests. The methodology used for these procedures was as follows:
– change the source code to set and clear a GPIO pin around the action to be measured.
– use the switch present on the prototype board to trigger the transmission of a packet.
– using an oscilloscope, measure the time between the changes in state of the previously selected GPIO pin.
The results are compiled in Table 1. The first two rows give the time taken to transmit a packet, including the calculation of the parity bit and its subsequent addition to the header, while the following two rows give the time taken by the symbol transmission alone. There is no such distinction on the input side, since receiving only involves reading symbols and placing them in the queue at the correct position. It is possible to calculate the symbol transmission capacity from these results. Each symbol takes around 90 ns to be transmitted, which translates into a transmission rate of 11 MSymbols/s. The input capacity, around 43 % of the transmission rate, lies at 4.78 MSymbols/s.
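The symbol rates quoted above can be reproduced from the Table 1 timings for a packet without payload. The following is only a minimal sketch: it assumes that such a packet is sent as 11 link symbols (ten 4-bit data symbols plus one end-of-packet symbol), a count that the text does not state, and all constant and function names are hypothetical.

```python
# Timings from Table 1 for a packet without payload, in microseconds.
SYMBOL_TX_TIME_US = 1.00  # symbol transmission only
PACKET_RX_TIME_US = 2.30  # packet reception

# Assumption (not stated in the text): a packet without payload is sent as
# ten 4-bit data symbols followed by one end-of-packet symbol.
SYMBOLS_PER_PACKET = 11


def symbol_time_ns(packet_time_us, symbols=SYMBOLS_PER_PACKET):
    """Average time spent per symbol, in nanoseconds."""
    return packet_time_us * 1000.0 / symbols


def symbol_rate_msym(packet_time_us, symbols=SYMBOLS_PER_PACKET):
    """Symbol rate in millions of symbols per second."""
    # Symbols per microsecond equals millions of symbols per second.
    return symbols / packet_time_us


if __name__ == "__main__":
    tx_rate = symbol_rate_msym(SYMBOL_TX_TIME_US)
    rx_rate = symbol_rate_msym(PACKET_RX_TIME_US)
    print(f"time per transmitted symbol: {symbol_time_ns(SYMBOL_TX_TIME_US):.1f} ns")
    print(f"transmit rate: {tx_rate:.2f} MSymbols/s")
    print(f"receive rate:  {rx_rate:.2f} MSymbols/s")
    print(f"receive/transmit ratio: {100 * rx_rate / tx_rate:.0f} %")
```

Under this assumption the sketch gives about 90.9 ns per transmitted symbol, 11 MSymbols/s on the output side and 4.78 MSymbols/s on the input side, in line with the figures in the text.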
These numbers are fairly small when compared to the capacity on the SpiNNaker side, which is around 62.4 MSymbols/s in both directions. This large difference is explained by the fact that the microcontroller implementation is entirely software based, whereas the SpiNNaker link is implemented in dedicated hardware. Furthermore, the difference between the transmission and the reception rate is due to the increased complexity of receiving symbols: receiving includes a step to identify symbols, whereas no such step is required when transmitting packets. Another analysis showed that the transmission and reception capacity exceeds the UART's 12 Mbit/s by far. This means that this communication channel is currently the bottleneck. The packet transmission rate is 455K packets per second for packets with payload and 770K packets per second for packets without payload. As for the packet reception rate, 435K packets per second can be received if they do not include a payload, whereas the rate for packets with payload is around 238K packets per second.

Action                                            Time taken (µs)
Packet transmission with payload                  2.20
Packet transmission without payload               1.30
Symbol transmission for packet with payload       1.80
Symbol transmission for packet without payload    1.00
Packet reception with payload                     4.20
Packet reception without payload                  2.30

Table 1. Performance measurements for the transmission and reception of SpiNNaker packets.
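The packet rates and the bottleneck argument above follow from Table 1 by simple arithmetic. The sketch below is one way to check them, under two assumptions that the text does not state: packets are 40 bits long without payload and 72 bits with payload, and the UART uses 8N1 framing (10 line bits per transferred byte). All names are illustrative.

```python
# Packet times from Table 1, in microseconds, keyed by (direction, has_payload).
TABLE_1_US = {
    ("tx", True): 2.20,   # transmission, with payload
    ("tx", False): 1.30,  # transmission, without payload
    ("rx", True): 4.20,   # reception, with payload
    ("rx", False): 2.30,  # reception, without payload
}

# Assumptions (not stated in the text).
PACKET_BITS = {True: 72, False: 40}  # packet length with / without payload
UART_BAUD = 12_000_000               # UART line rate in bits per second
UART_BITS_PER_BYTE = 10              # 8N1: start bit + 8 data bits + stop bit


def packets_per_second(time_us):
    """Packet rate when packets are handled back to back."""
    return 1e6 / time_us


def link_mbit_per_s(direction, payload):
    """Packet data rate the software-driven link could sustain, in Mbit/s."""
    rate = packets_per_second(TABLE_1_US[(direction, payload)])
    return rate * PACKET_BITS[payload] / 1e6


def uart_mbyte_per_s():
    """UART throughput per direction, in Mbyte/s."""
    return UART_BAUD / UART_BITS_PER_BYTE / 1e6


if __name__ == "__main__":
    for (direction, payload), time_us in TABLE_1_US.items():
        label = "with payload" if payload else "without payload"
        print(f"{direction} {label}: "
              f"{packets_per_second(time_us) / 1e3:.0f}K packets/s, "
              f"{link_mbit_per_s(direction, payload):.1f} Mbit/s")
    print(f"UART: {uart_mbyte_per_s():.1f} Mbyte/s per direction")
```

With these assumptions every entry of Table 1 corresponds to more than 17 Mbit/s of packet data, which is above the 12 Mbit/s UART line rate. The rate for transmission without payload comes out at roughly 769K packets per second, which the text rounds to 770K.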
3 Boids Model
We developed a case study for the SpiNNaker Chip Computing Module to demonstrate possible applications. SpiNNaker was designed to simulate spiking neural networks. However, we opted for a more general example and simulated the aggregate motion of a flock of birds using distributed rules for each bird. The model implemented for this simulation is traditionally called the Boids model. The Boids model is a distributed behavioral model [8] for flocks of flying birds or schools of fish. It shares many characteristics with particle systems (large sets of individual items, each with its own behavior) but has several crucial differences. Traditional particle systems are usually used to model fire, smoke, or water. Each particle is created, ages, and finally dies, and is generally represented by a point-like structure. In a boids simulation, however, individual boids have a geometrical shape and, consequently, an orientation. Additionally, typical particles do not interact with each other, whereas birds must do so in order to flock appropriately.
Boids models are often referred to as a prime example of Artificial Life [1]. They illustrate a variety of principles which appear in natural systems, such as emergence, where complex behavior arises from the local interaction of simple rules. Another example is unpredictability. Although a bird usually does not behave chaotically in a temporally local context, it is difficult or nearly impossible to predict its behavior on a larger time scale. A natural flock has certain behavioral rules that allow it to exist and survive. For instance, birds tend to avoid collisions with other birds but still stay close to the flock in order to protect the whole flock against predators. We impose such possibly contradictory rules on our simulated boids to induce seemingly natural flocking behaviors. The basic rules are:
– Collision Avoidance: avoid collisions with nearby flock members
– Velocity Matching: attempt to match velocity with nearby flock members
– Flock Centering: attempt to stay close to nearby flock members
Each behavior rule produces an acceleration, which contributes to a tunable weighted average. The relative strength of each rule dictates the general behavior of the flock. For example, if the flock centering behavior has a very low impact, the flock will be very sparse while still following a common direction. Since the model attempts to simulate the movement of birds, it must be based on a semi-realistic model of flight. It does not need to take into consideration all physical forces, such as aerodynamic drag or even gravity, but it must limit the velocity and instantaneous accelerations to realistic values. These restrictions help to model creatures with finite amounts of energy.
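The three rules, their weighted average, and the limits on acceleration and velocity can be sketched as a single update step. The code below is a minimal two-dimensional illustration and not the implementation that runs on the module: the neighbourhood radius, the weights, the limits, the exact form of each steering term, and all names are assumptions made for this example. The weights are assumed to have a positive sum.

```python
import math
from dataclasses import dataclass


@dataclass
class Boid:
    pos: tuple  # (x, y) position
    vel: tuple  # (vx, vy) velocity


def _clamp(v, limit):
    """Scale vector v down so that its length does not exceed limit."""
    length = math.hypot(v[0], v[1])
    if length <= limit:
        return v
    return (v[0] * limit / length, v[1] * limit / length)


def step(boids, dt=1.0, radius=10.0, weights=(1.5, 1.0, 1.0),
         max_accel=0.5, max_speed=4.0):
    """Advance every boid by one time step and return the new list."""
    w_avoid, w_match, w_center = weights
    total = w_avoid + w_match + w_center  # assumed to be positive
    updated = []
    for b in boids:
        # All-pairs neighbour search: quadratic in the number of boids.
        near = [o for o in boids
                if o is not b and math.dist(o.pos, b.pos) < radius]
        ax = ay = 0.0
        if near:
            n = len(near)
            # Collision avoidance: push away from neighbours, harder when close.
            avoid_x = avoid_y = 0.0
            for o in near:
                d2 = max(math.dist(o.pos, b.pos) ** 2, 1e-9)
                avoid_x += (b.pos[0] - o.pos[0]) / d2
                avoid_y += (b.pos[1] - o.pos[1]) / d2
            # Velocity matching: steer towards the mean neighbour velocity.
            match_x = sum(o.vel[0] for o in near) / n - b.vel[0]
            match_y = sum(o.vel[1] for o in near) / n - b.vel[1]
            # Flock centering: steer towards the neighbours' centre.
            center_x = sum(o.pos[0] for o in near) / n - b.pos[0]
            center_y = sum(o.pos[1] for o in near) / n - b.pos[1]
            # Weighted average of the three accelerations.
            ax = (w_avoid * avoid_x + w_match * match_x + w_center * center_x) / total
            ay = (w_avoid * avoid_y + w_match * match_y + w_center * center_y) / total
        # Limit acceleration and speed to keep the motion semi-realistic.
        ax, ay = _clamp((ax, ay), max_accel)
        vel = _clamp((b.vel[0] + ax * dt, b.vel[1] + ay * dt), max_speed)
        pos = (b.pos[0] + vel[0] * dt, b.pos[1] + vel[1] * dt)
        updated.append(Boid(pos, vel))
    return updated
```

Calling `step` repeatedly on a list of `Boid` objects yields one frame per call. Lowering the third weight relative to the others spreads the flock out, which matches the qualitative behavior described above.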
Fig. 3. A frame of the Boids Visualizer with 2176 birds.
3.1 Evaluation
We developed two distinct versions of the simulation. One incorporates the capabilities of our SpiNNaker Chip Computing Module to simulate the boids model
which is described above, the other does not and executes the simulation on a standard desktop computer. Both implementations use a desktop computer to visualize the simulation results. The differences between the two simulations can be directly inspected with this approach. A screenshot of the simulation is shown in Figure 3 and a video of the simulation can be found online at https://www.youtube.com/watch?v=KiyhVRgxugY. The method used for the evaluation is as follows:
– select a number of birds for the simulation.
– run the simulation on the SpiNNaker Chip Computing Module, and record the frames per second (FPS) of the visualization.
– run the simulation on the computer and, again, record the frames per second.
The results of this benchmark are compiled in Table 2. The central processing unit (CPU) of the computer used for running this simulation was an AMD Phenom II X4 945 running at 3 GHz. The O(N²) complexity of the algorithm is clearly visible in the results: increasing the number of birds leads to an ever larger reduction of the frame rate. The version that uses the SpiNNaker Chip Computing Module also suffers from a reduction in the frame rate due to the sheer number of objects it has to represent. The graphics code was not optimized to handle a large number of birds, but as it is the same code for both versions, the rendering time can be neglected in the comparison. The upper limit of 60 frames per second is due to the monitor's refresh rate setting. Clearly, the SpiNNaker Chip Computing Module is advantageous: the simulation delivers higher frame rates with the module than the desktop version does.

Number of birds    FPS on SCCM    FPS on Desktop Computer
2176               60             60
4352               59             30
6528               43             15

Table 2. Frames per second for the simulation with and without the SpiNNaker Chip Computing Module (SCCM).
4 Summary and Conclusions
We developed hardware and software for a novel module consisting of a single SpiNNaker chip and an auxiliary microcontroller. The system can simulate huge neural networks but still has a very low power consumption. In fact, its real-time I/O capabilities combined with limited power requirements make it highly suitable for mobile, power-efficient systems. This point is further emphasized by the module's small size. Furthermore, our module is designed to be easily extendable
by other SpiNNaker Chip Computing Modules or by other SpiNNaker systems. We demonstrated the capabilities of a single module using a boids simulation. The evaluation shows that our system is capable of simulating large numbers of distributed items. Hence, the module is well suited to simulating large networks of artificial neurons. We are currently working on a version which is even further reduced in size. It will thus be possible to use the module on tiny robots, in neuroprosthetics, or as multiple connected modules to simulate even larger artificial neural networks in very little space.
Acknowledgments. This project would not have been possible without the precious help from Steve Temple, Luis Plana and Francesco Galluppi from the University of Manchester, who supplied the SpiNNaker chips and provided essential guidance while we were studying the SpiNNaker system.
References
1. Bedau, M.A.: Artificial life: organization, adaptation and complexity from the bottom up. Trends in Cognitive Sciences 7(11), 505–512 (2003)
2. Brotherson, S.: Understanding Brain Development in Young Children (April 2009), http://www.ag.ndsu.edu/pubs/yf/famsci/fs609.pdf
3. Choudhary, S., Sloan, S., Fok, S., Neckar, A., Trautmann, E., Gao, P., Stewart, T., Eliasmith, C., Boahen, K.: Silicon neurons that compute. In: Villa, A., Duch, W., Érdi, P., Masulli, F., Palm, G. (eds.) Artificial Neural Networks and Machine Learning – ICANN 2012, Lecture Notes in Computer Science, vol. 7552, pp. 121–128. Springer Berlin Heidelberg (2012)
4. Denk, C., Llobet-Blandino, F., Galluppi, F., Plana, L., Furber, S., Conradt, J.: Real-time interface board for closed-loop robotic tasks on the SpiNNaker neural computing system. In: Mladenov, V., Koprinkova-Hristova, P., Palm, G., Villa, A.E., Appollini, B., Kasabov, N. (eds.) Artificial Neural Networks and Machine Learning – ICANN 2013, Lecture Notes in Computer Science, vol. 8131, pp. 467–474. Springer Berlin Heidelberg (2013)
5. Furber, S., Brown, A.: Biologically-inspired massively-parallel architectures - computing beyond a million processors. In: Application of Concurrency to System Design, 2009. ACSD '09. Ninth International Conference on. pp. 3–12 (2009)
6. Nguyen, T.: Total number of synapses in the adult human neocortex. Journal of Mathematical Modeling: One + Two 3(1) (2010)
7. Pfeil, T., Grübl, A., Jeltsch, S., Müller, E., Müller, P., Petrovici, M.A., Schmuker, M., Brüderle, D., Schemmel, J., Meier, K.: Six networks on a universal neuromorphic computing substrate. ArXiv e-prints (Oct 2012)
8. Reynolds, C.W.: Flocks, herds and schools: A distributed behavioral model. SIGGRAPH Comput. Graph. 21(4), 25–34 (Aug 1987)