Command and Control of a Massively Parallel Biologically Inspired Computer System
James Cameron Patterson. Supervisor: Prof. Stephen B. Furber, Advisor: Dr. Aphrodite Galata.
Abstract - A novel approach is proposed to the problem of managing a very large scale computer system, SpiNNaker. SpiNNaker has been developed to undertake real-time parallel neural simulations, using many hundreds of thousands of interconnected processing cores. Key to the successful operation of such a design is the functionality provided to operators to manage the machine’s hardware system. It is also envisaged that the ‘customers’ of the machine will have similar monitoring requirements, needing real-time visibility of the operational state of the neural software system in-flight.
I. INTRODUCTION TO SPINNAKER
SpiNNaker is a computing architecture optimised to run massively parallel spiking neural network models in real time [9]. It has the additional design goals of frugal power use and resilience to component failures within the system [5]. These goals mimic characteristics of the biological brain, which in humans achieves incredibly high performance using billions of simple, fundamental processing components (neurons) working together in parallel [4], [11]. Each SpiNNaker chip is a custom-designed Chip MultiProcessor (CMP) including up to 20 independent ARM968 processing cores, together with supporting components, controllers and memories [6]. Each of the processors is loaded with neural software simulating around 1,000 biological neurons in real time [8]. In a SpiNNaker system the chips are interconnected in a meshed 6-way torus topology (Figure 1), with configurations of up to 1 million processor cores and a billion neurons.
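As an illustration of the 6-way torus, the following minimal sketch computes the six neighbours of a chip on a wrap-around grid. The link names and coordinate offsets (E, NE, N, W, SW, S) are an assumption made for the example; this report does not specify them, and the function name `torus_neighbours` is hypothetical.

```python
from typing import Dict, Tuple

# ASSUMED link directions and (dx, dy) offsets for a 6-way mesh.
# These are illustrative; the report itself does not define them.
LINK_OFFSETS: Dict[str, Tuple[int, int]] = {
    "E": (1, 0),
    "NE": (1, 1),
    "N": (0, 1),
    "W": (-1, 0),
    "SW": (-1, -1),
    "S": (0, -1),
}


def torus_neighbours(x: int, y: int, width: int, height: int) -> Dict[str, Tuple[int, int]]:
    """Return the six neighbouring chip coordinates of chip (x, y).

    Coordinates wrap around at the edges of the width x height grid,
    which is what turns the flat mesh into a torus.
    """
    return {
        name: ((x + dx) % width, (y + dy) % height)
        for name, (dx, dy) in LINK_OFFSETS.items()
    }


if __name__ == "__main__":
    # A corner chip still has six neighbours because the edges wrap.
    for link, coord in torus_neighbours(0, 0, 4, 4).items():
        print(link, coord)
```

Because every chip has exactly six neighbours regardless of its position, a management system can treat all chips uniformly when reasoning about link faults.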
II. RESEARCH AIMS
In order to manage large systems with many thousands of components, a command and control system is deployed with responsibility for fault alerting, capacity management, performance analysis, accounting of resources, and security [7]. The first research problem is to identify the system components to be monitored, and at what depth, and to create an efficient, scalable model solution.
A second research issue surrounding the SpiNNaker system is a consequence of the real-time goals of the machine. Users who create neural network simulations will require an interface to view the simulation functioning in real time and to interact with it, rather than it operating as a ‘black-box’ system. The aim of the research contribution is to combine these two distinct real-time ‘system’ and ‘neural application’ management functions into a single proven framework, scalable to a massively sized SpiNNaker system.
The third research area is based around the SpiNNaker resiliency aim. Components suffering hard faults, including link/core/chip failures, are automatically disabled by hardware; however, the software consequences are not dealt with. The same is true of soft faults such as high error rates on links, or persistent ‘hot-spots’ where some shared resource is being swamped. The contribution to this area of research is an approach to automating decisions about data path re-routing and neuron/chip mappings, so that the system heals itself rapidly and consistently without the simulation run having to be abandoned.
III. RESEARCH AND CONTRIBUTIONS TO DATE
A. Research Areas
Fig. 1. Multi-chip SpiNNaker CMP System shown as a torus.
This is a 1000 word report submitted for examination for continuation beyond 1st year of the PhD programme at the University of Manchester
Research has been carried out into management systems, control protocols and management information structures, and a decision has been taken to use the Simple Network Management Protocol (SNMP) [3] standard to control the SpiNNaker system. The data sent and retrieved is concentrated around areas of contended resource, and is organised in a tree-like structure known as the Management Information Base (MIB) [10]. Research has also been carried out into imaging of neural networks, particularly that performed on living subjects. Functional imaging of the brain highlights areas which are stimulated, so that they ‘light up’ in the scan (e.g. fMRI [2], Figure 2). It is therefore proposed that a similar imaging technique is used for the real-time software visualisation of the neural network simulation. A paper on the Command and Control topic has been accepted for ‘The 21st UK Asynchronous Forum’, and a talk will be given there in September 2009 [1].
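The tree-like MIB structure mentioned above can be sketched as a table of object identifiers (OIDs) ordered lexicographically, so that a subtree can be walked by prefix. This is a minimal sketch, not an SNMP implementation: the class `MibTree`, the object names beginning `spinn`, and the enterprise number 99999 are all hypothetical placeholders. Only the `sysUpTime` OID is taken from the standard MIB-II.

```python
from typing import Callable, Dict, Iterator, Tuple

Oid = Tuple[int, ...]


def parse_oid(text: str) -> Oid:
    """Convert a dotted OID string such as '1.3.6.1' into a tuple of ints."""
    return tuple(int(part) for part in text.strip(".").split("."))


class MibTree:
    """A toy MIB: leaves are (name, getter) pairs keyed by OID."""

    def __init__(self) -> None:
        self._leaves: Dict[Oid, Tuple[str, Callable[[], object]]] = {}

    def register(self, oid: str, name: str, getter: Callable[[], object]) -> None:
        self._leaves[parse_oid(oid)] = (name, getter)

    def get(self, oid: str) -> Tuple[str, object]:
        name, getter = self._leaves[parse_oid(oid)]
        return name, getter()

    def walk(self, prefix: str) -> Iterator[Tuple[str, str, object]]:
        """Yield every leaf under `prefix` in lexicographic OID order."""
        root = parse_oid(prefix)
        for oid in sorted(self._leaves):
            if oid[: len(root)] == root:
                name, getter = self._leaves[oid]
                yield ".".join(map(str, oid)), name, getter()


if __name__ == "__main__":
    mib = MibTree()
    # Standard MIB-II object (instance 0 of sysUpTime).
    mib.register("1.3.6.1.2.1.1.3.0", "sysUpTime", lambda: 4200)
    # HYPOTHETICAL SpiNNaker objects under a placeholder enterprise number.
    mib.register("1.3.6.1.4.1.99999.1.1", "spinnRouterDroppedPackets", lambda: 17)
    mib.register("1.3.6.1.4.1.99999.1.2", "spinnLinkErrorCount", lambda: 3)
    for oid, name, value in mib.walk("1.3.6.1.4.1.99999"):
        print(oid, name, value)
```

Sorting OIDs as integer tuples gives the same ordering that an SNMP walk relies on, which is why the subtree for contended resources can be read out with a single prefix.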
Fig. 2. An image from an fMRI scan detailing regions of activation, including primary visual cortex.
B. Contributions integrated into the SpiNNaker chip
A novel ‘Spoofed IP’ packet format has been designed, validated and implemented in the SpiNNaker ROM. It provides external packet routing functionality at low processing cost, permitting remote collaboration. A full test suite was created for the communication aspects of the chip, looking specifically at the router diagnostic function. Feedback from this testing led to a redesign of the diagnostic registers and to bug-fixes in the on-chip router. Subsequently, a full set of diagnostic requirements has been produced for the next iteration of the SpiNNaker chip, refining the diagnostics and increasing their flexibility and detail.
IV. FUTURE PLANS
The following descriptions correspond with the five colour-matched work sections on Page 3’s Gantt chart, which also gives the time-line. After each of items 2 to 5 it is proposed to create a conference/journal paper and write the relevant thesis sections (boxed items).
1) The first item for attention is creation of the host system software to support the test chip (2009Q4), which will provide a base for all other research.
2) The second item is to provide improvements to the existing protocols and diagnostics, again in readiness for the following research items.
3) Following host system implementation, an SNMP management framework and MIB shall be created that covers both hardware and neural software monitoring. However, due to the limited memory and processing resources in SpiNNaker, it is impossible to support all the SNMP functionality on the SpiNNaker chips themselves. Therefore a novel external Protocol Converter is proposed, translating between the lightweight SpiNNaker protocol and the standardised SNMP management stations (Figure 3).
4) Next, a ‘functional’ neural monitoring software application will be created using the SNMP framework and Protocol Converter.
5) Finally, it is proposed that the output of the monitoring platforms is analysed in real time by an in-flight system reconfiguration function. This will reconfigure the system automatically to route around faults or re-map neurons if ‘hot-spots’ are detected, alerting the operators to the same.
Fig. 3. Proposed SpiNNaker management systems.
V. PROPOSED THESIS STRUCTURE
∙ Chapter 1 Introduction - discussion of the problems, motivation for approaching them, research methodology and notable publications list.
∙ Chapter 2 Background Research - background materials such as research into network management, SNMP, MIB, neural imaging and the SpiNNaker project.
∙ Chapter 3 Research Aims and Contribution - full approach to what the PhD is aiming to achieve and its contributions to the research area.
∙ Chapter 4 Protocol Converter - the research and implementation, the network protocols devised, the experimentation and the results.
∙ Chapter 5 Management and Imaging - tackles both the hardware and neural network software imaging. May need to be split into two separate chapters.
∙ Chapter 6 Automated Reconfiguration - actions automated from management alerting; research and results from experiments on in-flight system reconfiguration.
∙ Chapter 7 Conclusions and Future Research - discussion of conclusions drawn from the PhD Command and Control research work, its applicability beyond just SpiNNaker, and further research areas provoked by the PhD.
VI. CONCLUSIONS
Although it is still under development, it is believed that the combined management framework as outlined, with its Protocol Converter, could provide a viable low-overhead method of managing and controlling SpiNNaker multi-processor systems of arbitrary size, particularly due to its approach of conserving resources. It is possible that this method could be applied to other multi-processor systems, particularly those with restricted resources, and to embedded systems. There has already been some success with the ‘IP spoofing’ protocol, simulating IP packets in the SpiNNaker test environment, and this appears to be a good approach to significantly increasing the network functionality of the system at low processing cost. Additional improvements to chip diagnostics, over and above those of the test chip, have been proposed for the production chip so that the management function can enjoy greater power and flexibility in future.
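The resource-conserving idea behind the Protocol Converter can be sketched as follows: chips emit small fixed-size binary records, and an external converter caches them and answers OID-keyed requests on their behalf. Everything specific here is a hypothetical illustration: the record layout (`RECORD`), the OID prefix (`BASE_OID`, using a placeholder enterprise number), and the class and method names. The sketch does not encode or decode real SNMP messages.

```python
import struct
from typing import Dict

# HYPOTHETICAL lightweight record from a chip: chip x (1 byte), chip y
# (1 byte), counter id (2 bytes), counter value (4 bytes), little-endian.
RECORD = struct.Struct("<BBHI")

# HYPOTHETICAL OID prefix; 99999 is a placeholder enterprise number.
BASE_OID = "1.3.6.1.4.1.99999"


class ProtocolConverter:
    """Caches compact chip records and serves them by OID.

    The chips only ever send 8-byte records; all OID handling lives in
    the external converter, so no SNMP code runs on the chips.
    """

    def __init__(self) -> None:
        self._cache: Dict[str, int] = {}

    def on_chip_record(self, payload: bytes) -> None:
        """Decode one record and store its value under a derived OID."""
        x, y, counter_id, value = RECORD.unpack(payload)
        self._cache[f"{BASE_OID}.{x}.{y}.{counter_id}"] = value

    def snmp_get(self, oid: str) -> int:
        """Answer a management-station style GET from the cache.

        Raises KeyError if no record has been received for this OID.
        """
        return self._cache[oid]


if __name__ == "__main__":
    converter = ProtocolConverter()
    # Chip (2, 3) reports counter 7 with value 1234.
    converter.on_chip_record(RECORD.pack(2, 3, 7, 1234))
    print(converter.snmp_get(f"{BASE_OID}.2.3.7"))
```

Answering from a cache is one possible design choice: it keeps management traffic off the chips entirely, at the cost of values being only as fresh as the last record received.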
Gantt chart work items, grouped by work section number:
(1) HOST SYSTEM + test on SOC Sim; Host System on SpiNNaker Test Chip
(2) Diagnostic Register update + testing; ROM Update - Ethernet Framing + ARP; Write Up + Paper: Low Cost IP compatibility
(3) DESIGN MIB structures for MP + SpiNNaker; Implement PROTOCOL CONVERTER; P.C. Performance/Topology experiments; Write Up + Paper: Protocol Converter
2nd year report, research symposium etc.
(4) NEURAL IMAGING of A.N.N. on host system; Neural Imaging data for offline via P.C.; Write Up + Paper: Neural Imaging of A.N.N.
(5) IN-FLIGHT SYSTEM RECONFIGURATION spec; I.F.S.R. implementation + experiments; Paper: Large Scale Neural System R.T. Reconfig.
Final PhD Write Up
REFERENCES
[1] The 21st UK Asynchronous Forum. Retrieved from the internet, 30th August 2009. URL: http://www.cs.bris.ac.uk/home/simon/async_forum/Async_Forum_09-Bristol.html.
[2] J. W. Belliveau, D. N. Kennedy, R. C. McKinstry, B. R. Buchbinder, R. M. Weisskoff, M. S. Cohen, J. M. Vevea, T. J. Brady, and B. R. Rosen. Functional mapping of the human visual cortex by magnetic resonance imaging. Science, 254:716–719, Nov 1991.
[3] J. Case, M. Fedor, M. Schoffstall, and C. Davin. RFC 1098: A Simple Network Management Protocol (SNMP). Retrieved from the internet, 1989. URL: http://tools.ietf.org/html/rfc1098.
[4] J. E. Dowling. Neurons and Networks: An Introduction to Behavioral Neuroscience. Harvard University Press, ISBN: 0674004620, 2nd revised edition, 2001.
[5] S. Furber and A. Brown. Biologically-inspired massively-parallel architectures - computing beyond a million processors. In Proc. 9th International Conference on Application of Concurrency to System Design (ACSD), 2009.
[6] S. Furber, S. Temple, and A. Brown. On-chip and inter-chip networks for modeling large-scale neural systems. In Proc. 2006 IEEE International Symposium on Circuits and Systems (ISCAS 2006), 4 pp., 2006.
[7] ISO. Information processing systems - Open Systems Interconnection - Basic Reference Model - Part 4: Management framework. Retrieved from the internet, 29th August 2009. URL: http://www.iso.org/iso/catalogue_detail.htm?csnumber=14258.
[8] Xin Jin, S. B. Furber, and J. V. Woods. Efficient modelling of spiking neural networks on a scalable chip multiprocessor. In Proc. IEEE International Joint Conference on Neural Networks (IJCNN 2008, IEEE World Congress on Computational Intelligence), pages 2812–2819, June 2008.
[9] Luis A. Plana, Steve B. Furber, Steve Temple, Mukaram Khan, Yebin Shi, Jian Wu, and Shufan Yang. A GALS infrastructure for a massively parallel multiprocessor. IEEE Design and Test of Computers, 24(5):454–463, 2007.
[10] M. Rose. Management Information Base for Network Management of TCP/IP-based internets: MIB-II. Retrieved from the internet, 1990. URL: http://tools.ietf.org/html/rfc1158.
[11] R. W. Williams and K. Herrup. The control of neuron number. Annual Review of Neuroscience, 11(1):423–453, 1988.