A FPGA-Based, Granularity-Variable Neuromorphic Processor and Its Application in a MIMO Real-Time Control System

Zhen Zhang, Cheng Ma * and Rong Zhu

Department of Precision Instrument, Tsinghua University, Beijing 100084, China; [email protected] (Z.Z.); [email protected] (R.Z.)
* Correspondence: [email protected]; Tel.: +86-010-6277-1694

Received: 27 June 2017; Accepted: 21 August 2017; Published: 23 August 2017
Abstract: Artificial Neural Networks (ANNs), including Deep Neural Networks (DNNs), have become the state-of-the-art methods in machine learning and achieved amazing success in speech recognition, visual object recognition, and many other domains. There are several hardware platforms for developing accelerated implementations of ANN models. Since Field Programmable Gate Array (FPGA) architectures are flexible and can provide high performance per watt of power consumption, they have drawn a number of applications from scientists. In this paper, we propose an FPGA-based, granularity-variable neuromorphic processor (FBGVNP). The traits of FBGVNP can be summarized as granularity variability, scalability, and integrated computing and addressing ability: first, the number of neurons is variable rather than constant in one core; second, the multi-core network scale can be extended in various forms; third, the neuron addressing and computing processes are executed simultaneously. These make the processor more flexible and better suited for different applications. Moreover, a neural network-based controller is mapped to FBGVNP and applied in a multi-input, multi-output (MIMO), real-time, temperature-sensing and control system. Experiments validate the effectiveness of the neuromorphic processor. The FBGVNP provides a new scheme for building ANNs, which is flexible, highly energy-efficient, and can be applied in many areas.

Keywords: artificial neural networks; FPGA; neuromorphic processor; granularity variable; MIMO control
1. Introduction

In recent years, machine learning has entered into our daily life. When we communicate with smart phones using natural language or take pictures on digital cameras using face detection, artificial intelligence plays a key role in the process [1]. Over the past decade, Artificial Neural Networks (ANNs), including Deep Neural Networks (DNNs), have become the state-of-the-art methods and achieved amazing success in machine learning, especially in visual recognition, speech recognition, and other domains [2–7]. With significantly higher accuracy than traditional algorithms in various tasks like face recognition and image processing [8,9], DNNs have attracted the enthusiastic interest of internet giants such as Google [10,11], Microsoft [12], Facebook [13], and Baidu [14].

There are several hardware platforms for developing accelerated implementations of DNN models, including multicore CPUs [15], General Purpose Graphics Processing Units (GPGPUs) [16], Application Specific Integrated Circuits (ASICs) [17], and Field Programmable Gate Arrays (FPGAs) [18]. CPUs and GPUs are parts of General Purpose Processors (GPPs). The classic platforms based on the CPU and GPU are SpiNNaker and Carlsim, respectively. The SpiNNaker machine is a specifically designed computer for supporting the sorts of communication found in the brain. It is based on the connection of processing nodes, which have eighteen ARM processor cores in one node.
Over one hundred neurons can be modelled in each processor core, and there are one thousand input synapses connected to each neuron [15]. The Carlsim is a GPU-accelerated simulator which is capable of simulating the neural model [19]. GPPs can provide a high degree of flexibility and tend to be more readily accessible. However, the hardware performs with less energy efficiency, which is of particular importance in embedded, resource-limited applications or server-based large scale deployments [1].

Recently, the development of the neuromorphic processor has received increasing attention. For GPPs, application level execution relies on the traditional von Neumann architecture. It stores instructions and data in external memory to be fetched. The von Neumann architecture is non-scalable and inefficient in executing massive neural networks, and the von Neumann bottleneck can be mitigated by colocated computation and memory [17].

As seen in Figure 1, the centralized sequential von Neumann architecture computer is different from the brain's distributed parallel architecture. The processor's increasing clock frequencies and power densities are headed away from the operating point of the brain. As to implementing neural networks in a von Neumann architecture computer, a central processor has to simulate communication infrastructure and a great number of neurons. The bottleneck which serves as the communication channel between the processor and external memory causes power-hungry data movement while retrieving synapse states and updating neuron states [17]. A single processor is not suitable for simulating highly interconnected networks, which will cause interprocessor messaging explosions [17].
Figure 1. Von Neumann architecture chip features. (a) Comparison of power densities and clock frequencies between the brain and von Neumann architecture processors. Reprinted with permission from AAAS [17]. (b) Schematic diagram of implementing neural networks in von Neumann architecture computers.
In comparison, the neuromorphic processor has a different architecture. The special computation structure of neural networks implies that hardware suited to exploiting pipeline parallelism has the advantage. While GPPs execute in parallel based on multiple cores, specially designed ASICs and FPGAs can support inherently pipelined and multithreaded applications, which are not based on the von Neumann architecture. They have the ability to exploit the large extent of pipeline parallelism and distributed on-chip memory. Similar to the brain, the neuromorphic processor has distributed and integrated computation and memory, and operates in parallel [1,20]. Developing the neuromorphic processor via the ASIC-based or FPGA-based approach shows their different advantages.

ASICs are dedicated to a specific application. In recent years, the TrueNorth chip, which is developed by IBM, has attracted considerable attention. It is a low-power, highly parallel chip with 4096 neurosynaptic cores. The core is the basic block, which has a crossbar array for synaptic connections and neurons for calculation. Each core contains 256 input axons, a 64k synaptic crossbar, and 256
neurons. The TrueNorth chip is a neurosynaptic chip produced via a standard-CMOS manufacturing process [17,20]. Generally, ASICs can provide high performance. At the same time, they are expensive and time-consuming to produce, and their architectures are relatively fixed and inflexible [1].

Traditionally, we must weigh flexibility, performance, and energy efficiency when evaluating hardware platforms. On the one hand, GPPs can be highly flexible and easy to use, but perform relatively inefficiently. On the other hand, ASICs work with high efficiency at the cost of being inflexible and difficult to produce [1]. As a compromise, the FPGA-based approach has drawn a significant number of applications from scientists and become one of the most promising alternatives, due to its low power usage, high performance, reprogrammability, and fast development round [21–26]. FPGAs often provide better performance per watt than GPPs and naturally fit with neural network execution [1]. Table 1 shows a comparison of the neuromorphic processors and GPPs mentioned above for implementing neural networks. The Microcontroller Unit (MCU), which serves as a kind of low-cost GPP, is also included for a full comparison.

Table 1. Comparison of neuromorphic processors and General Purpose Processors (GPPs) for implementing neural networks.

| Features | multicore CPU (GPP) | GPGPU (GPP) | MCU (GPP) | ASIC-based | FPGA-based |
| :--- | :---: | :---: | :---: | :---: | :---: |
| computing process | sequential | sequential | sequential | parallel | parallel |
| computing structure | centralized | centralized | centralized | distributed | distributed |
| energy-efficiency | low | low | low | best | better |
| development round | short | short | short | longest | longer |
| cost | high | high | low | highest | moderate |
In this paper we propose an FPGA-based, granularity-variable neuromorphic processor (FBGVNP) with integrated computing and addressing cells. The presented neuromorphic processor consists of neuromorphic cores structured by a router, a neural computing unit, and a data-transmission controller. The router is used to build connections and keep communications between different cores. Meanwhile, the neural computing unit is composed of neuron computing cells, which can process computing and addressing simultaneously. Moreover, the data-transmission controller is responsible for the computing result transmissions. The traits of the neuromorphic processor can be summarized as follows.

(1) Granularity variability: The number of the cells in one neural computing unit can vary, which will enhance the flexibility of the neuromorphic core compared with fixed architectures. The neuron computing cells perform as the basic elements in this architecture. One can expand the size of the cells as required. That will make the core better suited for different applications.
(2) Scalability: The scalability is achieved by connecting different cores with routers and extending inner neural computing units. The data interaction through routers links the neuromorphic cores so they become a whole. Generally, routers serve as the communication nodes in the multi-core network, and the network scale can be extended as needed.
(3) Integrated computing and addressing ability: The neuron computing cell combines computing and addressing abilities. The data transmission uses a broadcast mechanism. On this basis, a neuron computing cell serves as a data receiving and processing terminal, and the two processes are executed simultaneously, which makes the computations proceed in parallel.
The paper is organized as follows. Section 2 introduces the architecture of the FBGVNP. In Section 3, a neural network is mapped to the presented neuromorphic processor and applied in a multi-input multi-output (MIMO) temperature sensing and control test platform. Then, the experiment results and discussions are given. Finally, the conclusions are drawn in Section 4.
2. Neuromorphic Processor Architecture

2.1. Architecture of FBGVNP
The architecture of the processor is designed to be granularity variable, scalable, and parallel, with a router, a neural computing unit, and a data-transmission controller as the basic blocks. The neural computing unit consists of neuron computing cells, which can conduct computing and addressing simultaneously. The comparison of TrueNorth and FBGVNP is shown in Figure 2.
Figure 2. Conceptual blueprint comparison of TrueNorth and the FPGA-based, granularity-variable neuromorphic processor (FBGVNP). (a) Conceptual blueprint of TrueNorth. (b) Conceptual blueprint of FBGVNP.
From a structural view, the link relationship of TrueNorth is shown in Figure 2a. The structure of a TrueNorth core can be divided into three parts. The communication module is used to keep the data interaction via a connected network. The synapses module is responsible for connecting different neurons, while the neuron module serves as a neural computation center. The synapses module adopts a crossbar architecture, which fixes the number of neurons in one core at 256 [20]. Its basic block is a core which contains a neural network with 256 inputs and 256 outputs connected through 256-by-256 synaptic connections [17]. As shown in Figure 3a, a TrueNorth neurosynaptic core is composed of input buffers, axons, dendrites, and neurons. The horizontal and vertical lines represent axons and dendrites, respectively. Also, the triangles represent neurons. The black dots, which serve as synapses, represent the connections between axons and dendrites [20]. A synaptic crossbar organizes the synapses in each core. The neuron's output is connected to the axon's input located in the same core or in a different core over the routing network.

Figure 2b shows the connecting architecture of FBGVNP. In FBGVNP, the neuron and its corresponding synapses are mixed in one computing cell, which serves as the basic element. The contents of the synapses vary in different computing cells. A communication module connects several neural computing units in one core, while the number of neurons can change in one core as needed. Building on the local connectivity, one can link multiple cores together by distributed global connectivity and construct more complex networks. Unlike TrueNorth, the basic structure of FBGVNP is the neuron computing cell, and the number of the cells can change in a core as needed. In one cell, the neuron and its corresponding synapses are mixed together.

The structure of a neuromorphic core in FBGVNP can be seen in Figure 3b. The neuron computing cell serves as the basic element. It consists of two parts: a decoder and a neuron. The data transmission uses a broadcast mechanism. The neuron outputs (axons) are passed to the inputs of the neuron computing cells in a sequence via the routing network.
Each neuron can send out an axon packet to a target in the same local core or another external core. Then, the decoders in the target core receive the passed axon packet and start to decode. There are different pre-stored dendrite records in each decoder, and the decoded axon information will be compared with the local dendrite records. After that, the valid axon information will be transferred to the neuron for computing. Therefore, axonal branching is executed hierarchically in two steps. Firstly, a connection passes through a long distance between the cores. Secondly, via a short distance within a core, it is broadcasted to multiple neuron computing cells after arriving at its target core.
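The granularity-variable organization can be pictured in software. The following is a minimal Python sketch of how a core holds an application-dependent number of neuron computing cells, each bundling its own dendrite records (address-weight pairs) with its accumulator. The class and field names are illustrative only and are not taken from the actual HDL design:

```python
from dataclasses import dataclass, field

@dataclass
class NeuronCell:
    """One neuron computing cell: the decoder's pre-stored address-weight
    pairs (dendrite records) plus the neuron's accumulator."""
    dendrites: dict[int, float] = field(default_factory=dict)  # axon address -> weight
    acc: float = 0.0

@dataclass
class Core:
    """A FBGVNP core: the cell count is a build-time parameter,
    not fixed at 256 as in a crossbar-based core."""
    cells: list[NeuronCell]

# Cores of different granularity, sized to the layer they implement:
input_core = Core([NeuronCell() for _ in range(300)])
output_core = Core([NeuronCell() for _ in range(6)])
```

In hardware, the same choice is made when instantiating the neural computing unit, which is what makes the core granularity variable.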
Figure 3. Structure diagram comparison of TrueNorth and FBGVNP. (a) Bipartite graph (left) and logical representation (right) of a TrueNorth core. © [2015] IEEE. Reprinted, with permission, from [20]. (b) Logical representation of a FBGVNP core.
2.2. FBGVNP Internals
The FBGVNP core is composed of a router, a neural computing unit, and a data-transmission controller. The details of the neuromorphic core can be seen in Figure 4. The router is used for communicating with other cores, and the neural computing unit is where the neuron connection and computing take place. In addition, the data-transmission controller is used for transmitting neuron outputs.
Figure 4. Internal structure of the FBGVNP core composed of the router, neural computing unit, and data-transmission controller. clk: clock. gclk: global clock.
The FBGVNP architecture supports interconnected neural networks by building a network of the presented neuromorphic cores. Connections between cores are implemented by sending data through the router network. As shown in Figure 4, in a one-router, one-neural-computing-unit structure, each router is decomposed into five separate data transfer directions: (1) left; (2) right; (3) up; (4) down; and (5) local. Five input ports receive the routing packets from the local core or the nearest-neighbor routers in the left, right, up, and down directions. In addition, there are also five corresponding output ports. The data packet transmitted between routers has the format shown in Table 2, and the data length of each part can be changed as needed.

Table 2. The format of the transmission data packet.

| Dx | Dy | Destination Unit Index | Destination Cell Index | Neuron Output |
| :---: | :---: | :---: | :---: | :---: |
| 8 bits | 8 bits | 4 bits | 8 bits | 8 bits |
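To make the field widths concrete, the following Python sketch packs and unpacks the 36-bit packet of Table 2. The msb-first field order and unsigned encoding are assumptions made for illustration; the paper does not specify the on-wire bit layout:

```python
def pack(dx: int, dy: int, unit: int, cell: int, output: int) -> int:
    """Assemble a 36-bit packet: Dx(8) | Dy(8) | Unit(4) | Cell(8) | Output(8)."""
    for value, width in ((dx, 8), (dy, 8), (unit, 4), (cell, 8), (output, 8)):
        assert 0 <= value < (1 << width), "field exceeds its bit width"
    return (dx << 28) | (dy << 20) | (unit << 16) | (cell << 8) | output

def unpack(word: int) -> tuple[int, int, int, int, int]:
    """Split a packet word back into its five fields."""
    return ((word >> 28) & 0xFF, (word >> 20) & 0xFF, (word >> 16) & 0xF,
            (word >> 8) & 0xFF, word & 0xFF)

# Stripping dx, dy, and the unit index leaves the low 16 bits
# (cell index + neuron output) that are broadcast inside the unit.
assert unpack(pack(2, 1, 3, 0x5A, 0x7F)) == (2, 1, 3, 0x5A, 0x7F)
```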
Upon receiving a packet, the information in the dx or dy fields (the number of hops in the left/right or up/down directions, respectively) is used to pass the packet out to one of five destinations: the local neural computing unit or the nearest-neighbor routers in the four different directions. Using the X-Y dimension-order routing strategy, the dx field is decremented for a right hop or incremented for a left hop, and packets are routed in the horizontal direction first. When dx becomes 0, the dy field is decremented for an upward hop or incremented for a downward hop. When dx and dy both become 0, the destination unit index field, which stores the target neural computing unit address, will be resolved. Then, after stripping the associated bits, the remaining 16 bits are sent to the target neural computing unit. The destination cell index field denotes the address of the outputting neuron, which can be matched by the neurons in the target core. Thus, within a routing window, a packet from any core can be sent to a neuron at any destination core, and the axon value is stored in the neuron output field.

The neural computing unit is where the neuron connection and the computing take place. The packet transferred from the router will be broadcasted to all the cells in the neural computing unit. There are different address-weight pairs pre-stored in the decoders. After the decoding of the packet in each cell, the neuron output field data (axon value) of the address-matched packet will be sent to the neuron, together with the corresponding pre-stored weight. In the neuron, the axon value and weight will be multiplied and then sent to an accumulator. The result of the accumulator will become the neuron output.

The data-transmission controller serves as the scheduler of the computing result transmissions. It is used to control the process of outputting the neuron computing results stored in the sending buffer. Comparably, the number of the neural computing units connected to a router can be extended in a similar way.

The data transmission of FBGVNP proceeds according to the following steps; an illustrative software sketch follows the list.

(1) A neuromorphic core receives a packet from the network and resolves it. If not equal to zero, the dx field will be decremented or incremented and the packet will be sent to the corresponding right or left routing hop. Then, when dx becomes 0, the dy field will also be decremented or incremented in a vertical-direction transmission until it turns to 0. The destination unit index field will be resolved to get the target neural computing unit address.
(2) The dx, dy, and destination unit index field bits are stripped, and the remaining 16 bits are broadcasted to the cells in the target neural computing unit. The destination cell index field bits will be compared to the address entries pre-stored in each decoder. If they match, the corresponding weight in the decoder will be sent to the neuron for computing, together with the neuron output field bits in the packet. If not, the cell stays idle until the next packet arrives.
(3) The neuron receives the weight and the neuron output field bits from the decoder and multiplies them. The result will be sent to an accumulator.
(4) When a synchronization trigger signal called the global clock arrives, each neuron outputs its accumulator result to the sending buffer, and the core steps into the data sending process.
(5) After the global clock arrives, the core is in the data transmission period. The data-transmission controller initializes the N_period signal, a number denoting which neuron's computing result is selected to be transferred out. At the same time, the N_data_comp signal pulses once and drives a process verifying the validity of the selected neuron's computing result stored in the sending buffer. If invalid, the N_data_null signal pulses once, and the core goes to another data transmission period. On the other hand, if valid, the N_R_data_en signal turns to the enabling state until the end of this period's data sending.
(6) When the N_R_data_en signal is enabled, the selected neuron's computing result will be transferred to the router, if the inner serial peripheral interface (SPI) port's buffer is not full. After the transfer, the router controller will send the packet targeting the destination core and return a R_N_data_done signal to the data-transmission controller. Then, the N_period and N_data_comp signals will be updated, and the core steps into another data transmission period.
(7) After the transfer of the last neuron's computing result, the core will keep waiting for the next arrival of the global clock.
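As announced above, the following Python sketch mimics steps (1)–(3): one X-Y dimension-order routing decision, followed by the broadcast, decode, and multiply-accumulate inside the target unit. It assumes signed dx/dy hop counts and re-declares the illustrative NeuronCell from the earlier sketch for self-containment; none of the names come from the actual design:

```python
from dataclasses import dataclass, field

@dataclass
class NeuronCell:
    dendrites: dict[int, float] = field(default_factory=dict)  # axon address -> weight
    acc: float = 0.0

@dataclass
class Packet:
    dx: int        # remaining left/right hops (signed interpretation assumed)
    dy: int        # remaining up/down hops
    unit: int      # destination neural computing unit index
    cell: int      # address of the outputting neuron
    output: float  # axon value

def route_one_hop(pkt: Packet) -> str:
    """Step (1): route horizontally first, then vertically, then deliver locally."""
    if pkt.dx > 0:
        pkt.dx -= 1
        return "right"
    if pkt.dx < 0:
        pkt.dx += 1
        return "left"
    if pkt.dy > 0:
        pkt.dy -= 1
        return "up"
    if pkt.dy < 0:
        pkt.dy += 1
        return "down"
    return "local"  # dx == dy == 0: resolve the unit index and deliver

def broadcast(pkt: Packet, cells: list[NeuronCell]) -> None:
    """Steps (2)-(3): every cell decodes the broadcast; only cells whose dendrite
    records hold the sender's address multiply weight by axon and accumulate."""
    for c in cells:
        weight = c.dendrites.get(pkt.cell)
        if weight is not None:
            c.acc += weight * pkt.output
```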
2.3. Features Comparison of FBGVNP and TrueNorth
FBGVNP and TrueNorth both serve as neuromorphic processors but have their respective features. The comparison of FBGVNP and TrueNorth can be seen in Table 3.
Table 3. Comparison of FBGVNP and TrueNorth.

| Processors | Neuron Number | Computing and Addressing | Scalability | Power Consumption |
| :---: | :---: | :---: | :---: | :---: |
| FBGVNP | variable | integrated | internal & external | high |
| TrueNorth | fixed | separated | external | low |
The main difference between FBGVNP and TrueNorth comes from their corresponding structures, since FBGVNP is developed based on a reprogrammable FPGA, while TrueNorth uses a relatively fixed crossbar structure and is developed via an ASIC-based approach. As mentioned above, the ASIC-based TrueNorth has the highest energy efficiency.

Further, an execution comparison between the two processors based on a fully-connected, proportional–integral–derivative (PID) neural network (FCPIDNN) is presented as follows. The structure of the reference FCPIDNN is shown in Figure 5.
Figure 5. Structure diagram of the reference fully-connected, proportional–integral–derivative neural network (FCPIDNN).
The FCPIDNN is a structure suitable for resolving multi-input, multi-output (MIMO), real-time control problems [27,28]. It inputs reference and sampled real signals, and outputs control signals to the actuators. The scale of the FCPIDNN depends on the number of input and output channels. The presented FCPIDNN in Figure 5 adopts a 300-input and 6-output configuration. The execution time comparison of FBGVNP and TrueNorth, based on the reference FCPIDNN, is presented in Table 4.
Table 4. Execution time comparison of FBGVNP and TrueNorth based on FCPIDNN.

| Processors | 1st Layer (Gclk Period) | 2nd Layer (Gclk Period) | 3rd Layer (Gclk Period) | 4th Layer (Gclk Period) |
| :---: | :---: | :---: | :---: | :---: |
| FBGVNP | 1 | 1 | 1 | 1 |
| TrueNorth | 1 | 1 | 1 | 2 |
It is obvious that, based on the FCPIDNN, the execution time of TrueNorth lasts longer than that of FBGVNP. The main influencing factor lies in the dendrite extension operation. Because the neuron number is fixed in one TrueNorth core, when more synapses need to be accumulated, dendrite extension has to be applied, which takes additional time and hardware resources. On the other hand, when the number of the neurons connected in one TrueNorth core is less than 256, it will create waste in the unconnected crossbar (memory resources for saving synaptic weights). For instance, Figure 6 shows the connection condition of the second layer of the reference FCPIDNN. The valid connection is less than 1% (1/128). That is, over 99% of the crossbar resource is wasted.
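The wasted fraction can be checked with a quick back-of-envelope calculation; the 1/128 valid fraction below is the figure quoted in the text:

```python
crossbar = 256 * 256              # synapses in one 256-by-256 crossbar core
valid = crossbar // 128           # = 512 connections actually used (1/128)
print(f"utilization = {valid / crossbar:.2%}")  # 0.78%, i.e., >99% wasted
```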
Figure 6. Connection diagram in TrueNorth of the second layer of the reference FCPIDNN.
Relatively, Relatively, FBGVNP FBGVNP employs employs the the methodologies methodologies of of granularity granularity variability variability and and integrated integrated computing and addressing. That makes the number of neurons in one core capably vary computing and addressing. That makes the number of neurons in one core capably vary as as needed needed without which helps helps to to improve of the without additional additional hardware hardware resources, resources, which improve the the utilization utilization of the hardware hardware resources resources and and reduce reduce time time delays. delays. 3. Experiments and Discussion 3. Experiments and Discussion 3.1. Experiment Setup 3.1. Experiment Setup In this section, a FCPIDNN is mapped to the neuromorphic processor and applied in a temperature In this section, a FCPIDNN is mapped to the neuromorphic processor and applied in a sensing and control system. The control system includes a mockup, a power supply, temperature temperature sensing and control system. The control system includes a mockup, a power supply, sensors distributed in the mockup, a data monitoring computer, and a data processing circuit. temperature sensors distributed in the mockup, a data monitoring computer, and a data processing The presented neuromorphic processor is realized in a FPGA placed on the data processing circuit. circuit. The presented neuromorphic processor is realized in a FPGA placed on the data processing The structure of the temperature control system is shown in Figure 7. circuit. The structure of the temperature control system is shown in Figure 7.
Figure 7. Structure of the temperature control system.
The configuration of the mockup contains six fans indicated as Fan1 to Fan6 and six inside modules indicated as module 1 to module 6. The modules inside the mockup have different volumes and shapes. A heater for generating heat is placed inside each module. The six heaters have different levels of power. The temperature control actuators inside the mockup are cooling fans. The local temperatures around the six modules are detected by distributed temperature sensors. The data processing circuit acquires the temperatures from the sensors through the universal asynchronous receiver-transmitter (UART) port, runs the temperature controller, and outputs the pulse width modulation (PWM) control signals to the actuators. The data monitoring computer is used to collect the temperature information via the UART port from the data processing circuit [28]. The prototypes of the data processing printed circuit board (PCB) and the temperature sensing and control mockup are shown in Figure 8.
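The data path just described amounts to a simple sense-compute-actuate loop. The following Python pseudocode is only an illustration of that flow; the real controller runs inside the FPGA, and the uart/pwm helpers and method names here are hypothetical stand-ins:

```python
def control_loop(uart, pwm, fcpidnn, targets):
    """One illustrative cycle per iteration: read six temperatures over UART,
    run the FCPIDNN-based controller, and drive six cooling fans via PWM."""
    while True:
        temps = uart.read_temperatures()        # six distributed sensors
        duties = fcpidnn.step(targets, temps)   # controller forward pass
        pwm.write_duty_cycles(duties)           # six fan control channels
```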
Figure 8. (a) Data processing printed circuit board (PCB); (b) Temperature sensing and control mockup.
3.2. Experiment Results and Discussions

We conduct experiments to validate the effectiveness of the FBGVNP-based temperature controller. Figure 9 shows the response process when the temperature targets for all modules are set to constant. Figure 10 shows the response process with multiple targets or a varying target.
Figure 9. Experimental results of temperature control response when the target temperature is set to constant. (a,c) Temperature responses of different modules (target temperature is 30 °C and 32 °C, respectively); (b,d) changes of cost function values corresponding to the temperature responses shown in (a,c).
Figure 9 indicates that the different modules' temperatures approach the set target, and the differences between the actual and target temperatures decrease. The cost function value is formulated as the following Equation (1):

$$J(n) = \sqrt{\frac{1}{2}\sum_{m=1}^{6} e_m^2(n)} \qquad (1)$$

where the difference between the target and actual temperature is represented by e, the number of the module is represented by m, and the sample number is represented by n.
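As a quick numerical check of Equation (1), a direct Python transcription (illustrative only) is:

```python
import math

def cost(errors):
    """J(n) = sqrt(0.5 * sum of squared temperature errors of the 6 modules)."""
    assert len(errors) == 6
    return math.sqrt(0.5 * sum(e * e for e in errors))

# If every module is 1 °C away from its target, J = sqrt(3) ≈ 1.73 °C.
print(cost([1.0] * 6))
```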
Figure 10 shows the results of using the FBGVNP-based controller to deal with temperature control with different targets, which further validates the effectiveness of the temperature control and exhibits its good robustness and adaptability to different scenarios. Figure 10a,b shows the control results when two different targets were set. In the process, the target temperature of module 3 was set to 33 °C and the others were 30 °C. It can be seen that the cost function gradually reached the minimum and the modules' temperatures approached their specific targets. Figure 10c,d shows the control results when the target temperature for modules 3 to 4 was set to 33 °C and the target temperature for the other modules was 30 °C. The modules approached their respective targets. Figure 10e,f shows the control results when the modules' targets varied, i.e., 33 °C at the beginning and then turned to 30 °C. The FBGVNP-based controller modulated the modules' temperatures to the set targets successfully, and the cost functions reached the minimum at last.
Figure 10. Temperature control scenarios. (a,b) Temperature target of module 3 is set to 33 °C and others are set to 30 °C. (c,d) Temperature target of modules 3 to 4 is set to 33 °C and others are set to 30 °C. (e,f) Temperature target is set to 33 °C for the first 1800 s and then turned to 30 °C; (b,d,f) are the cost function values corresponding to (a,c,e).
4. Conclusions

In this paper, we propose an FPGA-based, granularity-variable neuromorphic processor (FBGVNP). The traits of FBGVNP can be summarized as granularity variability, scalability, and integrated computing and addressing ability: first, the number of the neurons is variable rather than constant in one core; second, the number of internal neural computing units and the scale of the multi-core network can be extended as needed; third, the neuron addressing and computing processes are executed simultaneously. Additionally, a comparison between the FBGVNP and an existing neurosynaptic chip, TrueNorth, is conducted. Moreover, a neural network-based controller is mapped to FBGVNP and applied to a multi-input, multi-output (MIMO), temperature-sensing and control system. Experiments validate the effectiveness of the presented neuromorphic processor. The FBGVNP provides a new scheme for building ANNs, which is flexible and highly energy-efficient, and can be widely applied in many areas with the support of the state-of-the-art algorithms in machine learning.

Acknowledgments: This study was supported by the National Key Scientific Instrument and Equipment Development Projects, China (Grant No. 2013YQ19046705).
Author Contributions: Zhen Zhang and Cheng Ma conceived and designed the experiments; Zhen Zhang performed the experiments; Zhen Zhang and Cheng Ma analyzed the data; Cheng Ma and Rong Zhu contributed materials and analysis tools; Zhen Zhang wrote the paper; Cheng Ma and Rong Zhu revised the manuscript.

Conflicts of Interest: The authors declare no conflict of interest.
References

1. Lacey, G.; Taylor, G.W.; Areibi, S. Deep learning on FPGAs: Past, present, and future. arXiv 2016, arXiv:1602.04283.
2. Hasan, R.; Taha, T. A reconfigurable low power high throughput architecture for deep network training. arXiv 2016, arXiv:1603.07400.
3. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [CrossRef] [PubMed]
4. Schmidhuber, J. Deep learning in neural networks: An overview. Neural Netw. 2015, 61, 85–117. [CrossRef] [PubMed]
5. Gao, F.; Huang, T.; Wang, J.; Sun, J.; Hussain, A.; Yang, E. Dual-branch deep convolution neural network for polarimetric SAR image classification. Appl. Sci. 2017, 7, 447. [CrossRef]
6. Anibal, P.; Gloria, B.; Oscar, D.; Gabriel, C.; Saúl, B.; María, B.R. Automated Diatom Classification (Part B): A Deep Learning Approach. Appl. Sci. 2017, 7, 460.
7. Pastur-Romay, L.A.; Cedron, F.; Pazos, A.; Porto-Pazos, A.B. Deep Artificial Neural Networks and Neuromorphic Chips for Big Data Analysis: Pharmaceutical and Bioinformatics Applications. Int. J. Mol. Sci. 2016, 17, 1313. [CrossRef] [PubMed]
8. Taigman, Y.; Yang, M.; Ranzato, M.; Wolf, L. DeepFace: Closing the Gap to Human-Level Performance in Face Verification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 1701–1708.
9. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
10. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533. [CrossRef] [PubMed]
11. Silver, D.; Huang, A.; Maddison, C.J.; Guez, A.; Sifre, L.; van den Driessche, G.; Schrittwieser, J.; Antonoglou, I.; Panneershelvam, V.; Lanctot, M.; et al. Mastering the game of Go with deep neural networks and tree search. Nature 2016, 529, 484. [CrossRef] [PubMed]
12. He, K.; Zhang, X.; Ren, S.; Sun, J. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. In Proceedings of the IEEE International Conference on Computer Vision, Washington, DC, USA, 7–13 December 2015; pp. 1026–1034.
13. Yadan, O.; Adams, K.; Taigman, Y.; Ranzato, M. Multi-gpu training of convnets. arXiv 2016, arXiv:1312.5853.
14. Yu, K. Large-scale deep learning at Baidu. In Proceedings of the ACM International Conference on Information and Knowledge Management, San Francisco, CA, USA, 27 October–1 November 2013; pp. 2211–2212.
15. Furber, S.B.; Galluppi, F.; Temple, S.; Plana, L.A. The SpiNNaker Project. Proc. IEEE 2014, 102, 652–665. [CrossRef]
16. Beyeler, M.; Carlson, K.D.; Chou, T.S.; Dutt, N. CARLsim 3: A user-friendly and highly optimized library for the creation of neurobiologically detailed spiking neural networks. In Proceedings of the International Joint Conference on Neural Networks, Killarney, Ireland, 12–17 July 2015; pp. 1–8.
17. Merolla, P.A.; Arthur, J.V.; Alvarez-Icaza, R.; Cassidy, A.S.; Sawada, J.; Akopyan, F.; Jackson, B.L.; Imam, N.; Guo, C.; Nakamura, Y.; et al. A million spiking-neuron integrated circuit with a scalable communication network and interface. Science 2014, 345, 668–673. [CrossRef] [PubMed]
18. Farabet, C.; Poulet, C.; Han, J.Y.; LeCun, Y. CNP: An FPGA-based processor for Convolutional Networks. In Proceedings of the International Conference on Field Programmable Logic and Applications, Prague, Czech Republic, 31 August–2 September 2009; pp. 32–37.
19. Nageswaran, J.M.; Dutt, N.; Krichmar, J.L.; Nicolau, A.; Veidenbaum, A.V. A configurable simulation environment for the efficient simulation of large-scale spiking neural networks on graphics processors. Neural Netw. 2009, 22, 791–800. [CrossRef] [PubMed]
20. Akopyan, F.; Sawada, J.; Cassidy, A.; Alvarez-Icaza, R.; Arthur, J.; Merolla, P.; Imam, N.; Nakamura, Y.; Datta, P.; Nam, G.; et al. TrueNorth: Design and Tool Flow of a 65 mW 1 Million Neuron Programmable Neurosynaptic Chip. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2015, 34, 1537–1557. [CrossRef]
21. Aydonat, U.; O'Connell, S.; Capalija, D.; Ling, A.C.; Chiu, G.R. An OpenCL(TM) Deep Learning Accelerator on Arria 10. arXiv 2017, arXiv:1701.03534.
22. Park, J.; Sung, W. FPGA based implementation of deep neural networks using on-chip memory only. In Proceedings of the International Conference on Acoustics Speech and Signal Processing, Shanghai, China, 20–25 March 2016; pp. 1011–1015.
23. Zhang, C.; Li, P.; Sun, G.; Guan, Y.; Xiao, B.; Cong, J. Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks. In Proceedings of the ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, CA, USA, 22–24 February 2015; pp. 161–170.
24. Peemen, M.; Setio, A.A.A.; Mesman, B.; Corporaal, H. Memory-Centric Accelerator Design for Convolutional Neural Networks. In Proceedings of the IEEE International Conference on Computer Design, Asheville, NC, USA, 6–9 October 2013; pp. 13–19.
25. Qiu, J.; Wang, J.; Yao, S.; Guo, K.; Li, B.; Zhou, E.; Yu, J.; Tang, T.; Xu, N.; Song, S.; et al. Going Deeper with Embedded FPGA Platform for Convolutional Neural Network. In Proceedings of the ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, CA, USA, 21–23 February 2016; pp. 26–35.
26. Zhan, C.; Fang, Z.; Zhou, P.; Pan, P.; Cong, J. Caffeine: Towards Uniformed Representation and Acceleration for Deep Convolutional Neural Networks. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, Austin, TX, USA, 7–10 November 2016.
27. Lee, C.; Chen, R. Optimal Self-Tuning PID Controller Based on Low Power Consumption for a Server Fan Cooling System. Sensors 2015, 15, 11685–11700. [CrossRef] [PubMed]
28. Zhang, Z.; Ma, C.; Zhu, R. Self-Tuning Fully-Connected PID Neural Network System for Distributed Temperature Sensing and Control of Instrument with Multi-Modules. Sensors 2016, 16, 1709. [CrossRef] [PubMed]

© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).