FPGA-TARGETED NEURAL ARCHITECTURE FOR EMBEDDED ALERTNESS DETECTION

Bernard Girau
LORIA INRIA-Lorraine & Université Nancy 2, Nancy, France
email: [email protected]

Khaled Ben Khalifa
TIM Project - Biophysics Laboratory - Medical Faculty of Monastir, Monastir, Tunisia
email: [email protected]
Abstract
Several recent works have used neural networks to discriminate vigilance states in humans from electroencephalographic (EEG) signals. Our study aims at being more exhaustive: it takes into account various connectionist models, and it precisely studies their features and their performance. Physicians have been associated with the project, especially when tuning our models. Above all, our work has been oriented towards a light, low-power, easy-to-wear system. First implementation works have focused on the use of a Self-Organizing Map architecture, since the most efficient neural model of our study, a multilayer perceptron (MLP), was too large for a straightforward FPGA implementation. In this paper, we describe how the theory of FPNAs (Field Programmable Neural Arrays) has been applied to this model, so as to simplify the topology of the MLP of our application. Thanks to this simplification, a fully parallel FPGA implementation has been made possible and efficient, without any significant performance loss.
Keywords: alertness, neural networks, FPGA, FPNA, low-power
1 Introduction
Through a wide collaboration between connectionists, biophysicists, physicians and electronics engineers, we have developed an embedded neural system for vigilance state discrimination and alertness decision from electroencephalographic parameters. This work handles different kinds of automatic classification algorithms (connectionist models with unsupervised and supervised training), using Kohonen self-organizing maps (SOM) or multilayer perceptrons (MLP). These models are tuned to perform different tasks: separation and classification of the vigilance states, or alertness decision. The principles and software results of this work are presented in [1]. The aim is to obtain a portable electronic device able to detect alertness from a minimal EEG derivation. Since this device is dedicated to ambulatory use, real-time computation and above all power consumption are key factors. Therefore, the technological choices of our work have to take into account the performance of a model for the tasks mentioned above, as well as the need for a very low-power solution. As shown in [2], the use of a SOM-based decision process has led to a very satisfactory implementation on an FPGA circuit (Field Programmable Gate Array). We have preferred this approach until now mainly because it can be implemented in a rather straightforward way: the simple neural architecture (a 5x5 Kohonen map) may be mapped as a parallel architecture on a large FPGA, provided that well-chosen arithmetical and technological choices are made. The use of an MLP-based decision process requires a rather large neural architecture (23 inputs, at least 10 hidden neurons and 8 output neurons), whose direct parallel implementation on an FPGA is not possible. Therefore, this solution was first considered unsuitable for our implementation requirements. As mentioned in [5], FPGA circuits offer high-performance, high-speed and high-capacity programmable logic solutions that enhance design flexibility while reducing time-to-market. Despite the advantages of FPGAs for neural implementations, such devices do not solve the problem of parallel implementations of large neural architectures. The amount of logic resources in an FPGA is still below the needs of the neural networks used in many applications, and the underlying 2D structure of FPGAs does not fit the complex topologies of standard neural networks. The FPNA framework (Field Programmable Neural Arrays, see [5, 4]) has been developed so as to provide simplified neural topologies without any significant loss of computation power. The aim is to derive FPGA-adapted neural networks from efficient standard neural networks such as MLPs. This article shows how an FPNA neural network has been derived from our rather large MLP, by means of the principles of [4].
The obtained FPNA achieves highly satisfactory alertness decision performance, while its simplified topology may be easily mapped onto an FPGA device, using the pipeline implementation method described in [4]. Section 2 describes the spectral processing carried out on the EEG signal, the connectionist methods that have been selected (SOM and MLP) and their performance for alertness decision. Section 3 describes the implementation work of the SOM-based method. Section 4 briefly describes the main principles of the FPNA framework. Section 5 describes the FPNA architecture that has been developed from an MLP for our alertness detection application, and then describes our pipelined implementation of this FPNA, based on serial arithmetic, on a Xilinx Virtex FPGA.
2 Materials and methods

2.1 Subjects and recordings
This study was concerned with a control group of five healthy male medical students, aged 18 to 23. Each subject had three 24-hour recordings with an interval of 15 days. The equipment was an ambulatory long-duration recording system with 8 channels (OXFORD MEDILOG 9000 model). The analog recording was made on a magnetic tape, and then digitized and visualized by a second reading system. Each recording contains five EEG channels (F3-F4, C3-P3, C3-O1, C4-P4 and P4-O2). The sampling frequency of all registered signals is 128 Hz. Four noisy recordings have been eliminated, and a 24-hour recording has been selected for each subject.

2.2 Discrimination of the states of vigilance by the expert
An expert in EEG and polysomnography interpretation has labeled the different vigilance levels. His analysis of the zones of awakening-drowsiness transition enabled him to distinguish five levels of vigilance: wide awakening (Wa), calm awakening with wide open eyes (Cawoe), calm awakening with closed eyes (Cace), drowsiness (Drow) and stage 1 of sleep (Stg1). To take into account artefacted parts and to ensure the continuity of the visual analysis, the expert had to define three other states: artefacted calm awakening with wide open eyes (Art-Cawoe), artefacted calm awakening with closed eyes (Art-Cace) and artefacts due to movements (Mv).

2.3 Pre-processing
In order to get a portable, easy-to-wear system, we have tried to minimize the number of electrodes. Our study has shown that acceptable performance is still reachable with only one electrode, which is a significant improvement with respect to existing approaches. The choice of the parieto-occipital EEG derivation P4-O2 avoids the ocular frontal derivation artefacts. This choice also leads to an alpha activity that is characteristic of the calm awakening with closed eyes. The spectral pre-processing applied to this derivation (P4-O2) consists of a short-term fast Fourier transform (STFFT) with 4-second portions and a 512-point Hamming window weighting. For our study, 23 bands of 1 Hz, normalized from 1 to 23 Hz, are used. After this processing and the choice of a signal band coding, connectionist methods are applied, as described below. Their inputs are the 23 percentages of the power spectrum of each band.

2.4 Connectionist tools
In this work, two kinds of connectionist models have been used for the separation and the classification of the vigilance states:
• An automatic extraction of categories has been performed using self-organizing models (Kohonen maps) with unsupervised training.
• A multilayer perceptron with supervised training has achieved the best discrimination with respect to the categories that were identified by the expert.

2.4.1 Self-organizing map
The SOM principle models the mechanism of the spatial self-organization of perceptions operated by the cortex, in the form of a topographic classification process. The N-dimensional input vectors are converted into classes that self-organize according to a two-dimensional structure of neurons on which neighborhood relations are preset. The SOM training algorithm is detailed in [6]. In our study, we have finally used a 5x5 Kohonen map. As shown in figure 1, this model has identified states whose regularity is strongly linked to the classification performed by the expert: neurons/classes with the same color correspond to the same vigilance state, and the drawn path corresponds to natural transitions between these states.

Figure 1. Kohonen map for vigilance states classification (input layer X1 to X23, competitive/output layer Y with weights Wp,n, and decision layer with weights Wcn,k)
We have also adapted this model for an alertness decision process (decision layer), optimized by means of the supervised version of the Kohonen maps, LVQ (learning vector quantization). After unsupervised training, the learning examples are classified: if the most activated neuron belongs to the right class, it is brought closer to the example; otherwise it is pushed away from it.
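As a rough illustration, the decision process described above can be sketched as an LVQ1-style update (a minimal sketch; the array shapes, learning rate and function names are illustrative assumptions, not the tuned values of this study):

```python
import numpy as np

def lvq_update(prototypes, labels, x, y, lr=0.05):
    """One LVQ1-style step: find the most activated (closest) prototype,
    move it towards x if its class is right, away from x otherwise."""
    # best-matching unit = prototype closest to the input example
    bmu = np.argmin(np.linalg.norm(prototypes - x, axis=1))
    direction = 1.0 if labels[bmu] == y else -1.0
    prototypes[bmu] += direction * lr * (x - prototypes[bmu])
    return bmu
```

After repeated presentations of labeled examples, the prototypes of the decision layer settle near the centers of their classes.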
2.4.2 Multilayer perceptron
The MLP model is the best-known and most widely used neural architecture. It consists of several ordered layers of neurons, where two consecutive layers are fully connected. A neuron computes a weighted sum of the outputs of the previous layer; then it adds a threshold and applies a transfer function to the sum. The learnable parameters of such a neuron are the synaptic weights and the threshold. MLP learning is usually performed with the so-called back-propagation learning algorithm. This well-known algorithm is in fact the combination of the back-propagation method (for gradient computation) and of the gradient descent algorithm. Back-propagation allows an efficient computation of the gradient of an error function with respect to the parameters of a neural network. As soon as they are computed, the gradient values are used to modify the parameters so that the neural network performs the chosen task. In our study, several MLP architectures have been tested. Satisfactory results have been obtained with an MLP that uses 23 inputs, one hidden layer of 10 neurons, and 8 output neurons (standard coding for 8-class discrimination). Every neuron applies the same transfer function, a sigmoidal function: the tanh function has been chosen. This architecture is shown in figure 2 (see section 5 for the thick connection).
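The forward pass of such a 23-10-8 tanh network can be sketched as follows (the weight values are random placeholders, not the trained parameters of the study):

```python
import numpy as np

rng = np.random.default_rng(0)

# 23 inputs -> 10 hidden neurons -> 8 output neurons, as above
W1, b1 = rng.normal(size=(10, 23)), np.zeros(10)
W2, b2 = rng.normal(size=(8, 10)), np.zeros(8)

def mlp_forward(x):
    """Each neuron: weighted sum of the previous layer, plus a
    threshold, followed by the tanh transfer function."""
    h = np.tanh(W1 @ x + b1)     # hidden layer (10 neurons)
    return np.tanh(W2 @ h + b2)  # output layer (8 classes)
```

The 8-dimensional output vector implements the standard coding for 8-class discrimination: the predicted class is the index of the largest output.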
Figure 2. MLP for vigilance states classification

2.5 Results
Our connectionist tools have been applied to portions of EEG signals for various subjects. Different tasks have been studied: vigilance state classification or discrimination, recognition of artefacted states, and classification of sleep and awakening states (see [1]). We report here the results for alertness decision. For each subject, a training corpus and a test corpus (for the learning and generalization phases respectively) were built by grouping states Cace and Cawoe on one hand, and states Drow and Stg1 on the other hand. Therefore there are only 2 states of vigilance: Awakening and Sleep (artefacted states Art-Cawoe, Art-Cace and Mv are not taken into account). The network architecture includes 23 units in the input layer, which represent the 23 spectral bands. With the SOM-based approach (5x5 map), a global success rate of roughly 77 % is reached (correctly classified states in the generalization phase). With the MLP-based approach (23x10x8), this global success rate reaches 88 %. More detailed results may be found in [1].

3 Implementation of the neural models

3.1 Towards an embedded implementation
The results mentioned above were obtained from software simulations carried out on a PC. The aim of our work is to design an embedded system for vigilance detection. Therefore, our goal is to implement the decision process on FPGAs, using pre-trained parameters obtained from simulation. Such work has to take into account several parameters and useful characteristics, such as power consumption according to the external clock frequency, number of inputs/outputs, integration surface, or neural parallelism (neurons may work in a concurrent way). Some of these parameters are difficult to estimate before synthesis. Computation speed is sometimes another essential criterion; in our case, the decision (hypovigilance detection) may be made in real time without difficulty, so that speed does not stand as a real constraint for the technological choices that must be made. On the contrary, the assumption of an embedded circuit in an ambulatory system implies the need for a very low-power implementation. Among the parameters quoted above, the number of inputs/outputs and, above all, the level of neural parallelism have a direct influence on the consumption of the obtained implementation. A fully parallel implementation is a real challenge, given the sizes of the neural networks and of the FPGA.

3.2 Implementation environment
Since the appearance of programmable hardware devices such as field programmable gate arrays (FPGAs), algorithms may be implemented on very fast integrated circuits with software-like design principles. Usual VLSI designs lead to very high performance, but the time needed to realize a circuit is long, especially when different neural network configurations must be tested: chip production usually takes up to 6 months.
FPGAs, such as Xilinx FPGAs ([7]), are based on a matrix of configurable logic blocks (CLBs). Each CLB is able to implement small logical functions (4- or 5-input functions) with a few elementary memory devices (flip-flops or latches) and some multiplexers. Depending on the CLB capabilities, some operators can lead to small or large solutions in the FPGA. The Xilinx Virtex series is well-suited for the implementation of serial arithmetic operators. The CLBs can be connected using a configurable routing structure. An FPGA approach simply adapts to the handled application, whereas a usual VLSI implementation requires costly rebuildings of the whole circuit when some characteristics change. A design on FPGAs requires the description of several operating blocks. Then the control and communication schemes are added to the description, and an automatic "compiling" tool maps the described circuit onto the chip. This work uses a RC1000-PP hardware platform from Celoxica Ltd. It is a standard PCI bus board equipped with a Virtex XCV1000E-4BG560 FPGA from Xilinx, with up to 1,000,000 system gates. It must be pointed out that current FPGAs already outperform the capacity of the Virtex XCV1000E: this FPGA contains 27648 logic cells, to be compared with the 73008 logic cells of the largest Virtex-E, as well as with the 142128 logic cells of the largest current Virtex-4.
3.3 Technological choices
The main issues when a neural network is mapped onto an FPGA are the huge number of operators, and the routing problems due to the dense interconnections of these models. A first standard technological choice to solve these problems is to use serial arithmetic: smaller operators may be implemented, and they require fewer connection wires. Another essential technological choice is to estimate the minimum precision required to keep satisfactory results, so as to use operators that are as small as possible.

3.3.1 Serial arithmetic
Serial arithmetic corresponds to computation architectures where digits are provided serially, i.e. digit after digit. Serial arithmetic leads to operators that need small implementation areas and fewer inputs/outputs, and that easily handle different precisions without an excessive increase of the implementation area. Serial systems are characterized by their delay, i.e. the number δ such that p digits of the result are deduced from p+δ digits of the input values. Two main kinds of serial arithmetic are available: LSBF (least significant bit first) and MSBF (most significant bit first). The only existing radix-2 MSBF serial arithmetic is called on-line arithmetic. It uses a redundant number representation system, thanks to which any carry propagation issue may be avoided. For more information about this very specific arithmetic, see [3]. Our implementation uses on-line arithmetic, since an MLP uses sigmoidal functions that cannot be computed in LSBF mode.

3.3.2 Computation precision
Software simulation must be performed to study the precision required by a neural application before its hardware implementation. Precision must be studied with respect to the different kinds of data: weights, inputs, neuron outputs, internal computations, etc. It must be mentioned that implementations based on serial arithmetic may be more easily extended to larger precisions than implementations based on parallel arithmetic: this would mainly induce a linear increase of the implementation area of multipliers and elementary functions, and a modification of the control time intervals.

3.4 Implementation of connectionist models

3.4.1 Implementation of the self-organizing map
An FPGA implementation of the SOM used for alertness decision is described in [2]. It is a fully parallel implementation where all connections and neurons work simultaneously. It uses a standard LSBF serial arithmetic, with a precision of eight binary digits (the size of the fractional part depends on the type of data in the computations). This implementation of the whole SOM network requires 24572 logic cells, and its delay is equal to 34. With the minimal frequency of the board, 400 kHz, the computation time is 92.5 µs, which is largely sufficient for the real-time constraints of hypovigilance detection. Consumption is then 32 mW (1000 to 2000 times less than with current processors). Taking into account a decision process made every second in the daytime, consumption remains lower than 4 µW.

3.4.2 Implementation of the MLP
A straightforward parallel implementation of the 23x10x8 MLP is simply not possible. Each connection requires a multiplier, and each neuron requires a global adder and a sigmoidal transfer operator (such operators only exist in MSBF mode). More than 50000 logic cells would therefore be needed (see [4] for the implementation requirements of on-line operators). Moreover, the complete connection schemes between layers cannot be handled by the routing structure of FPGAs.
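As a back-of-the-envelope check of why the direct parallel MLP implementation is out of reach, the operator count can be sketched as follows (the per-operator logic-cell costs are illustrative assumptions, not the measured figures of [4]):

```python
# Fully parallel 23x10x8 MLP: one multiplier per connection,
# one global adder and one tanh operator per neuron.
inputs, hidden, outputs = 23, 10, 8

multipliers = inputs * hidden + hidden * outputs  # 310 connections
adders = hidden + outputs                         # 18 neurons
tanh_ops = hidden + outputs                       # 18 neurons

# Hypothetical per-operator costs in logic cells (placeholders):
CELLS_PER_MULT, CELLS_PER_ADD, CELLS_PER_TANH = 150, 30, 400
total_cells = (multipliers * CELLS_PER_MULT
               + adders * CELLS_PER_ADD
               + tanh_ops * CELLS_PER_TANH)
# With these placeholder costs, total_cells exceeds 50000,
# i.e. roughly twice the capacity of the Virtex XCV1000E.
```

Even under such rough assumptions, the 310 multipliers alone dominate the budget, which is why a simplified topology is needed.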
4 Field Programmable Neural Arrays
The theoretical and practical framework developed in [5, 4] aims at neural architectures that are easy to map onto FPGAs, thanks to a simplified topology and an original data exchange scheme. These FPNA models reconcile the high connection density of neural architectures with the limited interconnection resources of hardware implementations.
4.1 A set of autonomous neural resources
FPNAs are based on an FPGA-like approach: a set of resources whose interactions are freely configurable. These resources (links and activators) are defined so as to perform the computations of standard neurons, but they behave in an autonomous way. As a consequence, numerous virtual links may be obtained by applying a multicast data exchange protocol to the resources of a sparse neural network. Two kinds of autonomous neural resources appear in an FPNA: activators, which apply standard neural functions to a set of input values, and communication links, which behave as independent affine operators. In a standard neural model, each communication link is a connection between the output of a neuron and an input of another neuron, and the number of inputs of each neuron is its fan-in in the connection graph. On the contrary, communication links and neurons become autonomous in an FPNA: their dependencies are freely programmable. More precisely, the communication links connect the nodes of a directed graph, and each node contains one activator. The specificity of FPNAs is that the relations between the local resources of each node may be freely set: a link may be connected or not to the local activator and to the other local links. Therefore direct connections between affine links appear, so that the FPNA may compute numerous composite affine transforms. These compositions create numerous virtual neural links.
4.2 Computing in an FPNA
Several computation methods have been defined for FPNAs. In any such method, all resources behave independently: when a resource receives values, it applies its local operator(s) and sends the result to all neighboring resources to which it is locally connected (an activator waits for a predefined number of values before sending any result to its neighbors). Unlike standard neural computations, the FPNA paradigm allows a resource to be connected or not to a neighboring resource. Moreover, a communication link may handle several values, and it may directly send them to other links. It must be mentioned that FPNAs with rather simple topologies are functionally equivalent to more complex neural networks, thanks to the various virtual connections that correspond to directly connected consecutive links. But the weights associated with these virtual connections are in fact products of the weights of the available links, so that complex dependencies exist between the virtual weights.
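The virtual-connection idea above can be made concrete with composed affine links (a toy sketch of the principle, not the actual FPNA computation scheme of [5]):

```python
# Each FPNA link behaves as an affine operator x -> w*x + b.
# Directly connecting two links chains their operators, creating a
# virtual link whose weight is the product of the link weights.

def link(w, b=0.0):
    return lambda x: w * x + b

def compose(f, g):
    """Virtual link obtained by sending g's output through f."""
    return lambda x: f(g(x))

l1, l2 = link(0.5), link(-2.0)
virtual = compose(l2, l1)   # virtual weight is -2.0 * 0.5 = -1.0
assert virtual(3.0) == -3.0
```

Because one physical link may feed several such chains, its weight appears in several virtual weights at once, which is exactly the dependency mentioned above.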
4.3 Learning in an FPNA
When some conditions are fulfilled by an FPNA (see the synchronous FPNAs of [5]), a gradient descent algorithm may be applied to learn its weights, though the computation of the gradient is more complex than in standard multilayer networks. Despite the complex dependencies that exist between the virtual connections, experiments have shown that the performance of standard feedforward neural models can be reached by equivalent FPNAs that use far fewer resources (for example 10 times fewer links in [4]). The corresponding FPNAs are then rather easy to map onto configurable hardware with the help of the modular implementation methods described in [4].
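The extra complexity of the gradient can be illustrated on a single virtual weight: if it is the product w = a·b of two link weights, the chain rule spreads the error gradient over both links. A minimal sketch (illustrative only, not the actual FPNA gradient of [5]):

```python
# Virtual weight w = a * b; squared error E = 0.5 * (w*x - t)**2.
# The gradients w.r.t. the link weights a and b share the common
# factor dE/dw, scaled by the other link's weight.
def grads(a, b, x, t):
    y = a * b * x
    dE_dw = (y - t) * x           # gradient w.r.t. the virtual weight
    return dE_dw * b, dE_dw * a   # dE/da, dE/db

ga, gb = grads(a=2.0, b=0.5, x=1.0, t=0.0)
# here y = 1.0 and dE_dw = 1.0, so ga = 0.5 and gb = 2.0
```

In a real FPNA each link weight contributes to many virtual weights, so each of these per-product terms must be summed over all chains that traverse the link.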
5 FPNA for alertness decision

5.1 Architecture and learning
Based on the systematic approach of [4], an FPNA has been built to be functionally equivalent to the 23x10x8 MLP used in our application. This FPNA is shown in figure 3. It uses as many activators as there are neurons in the MLP. The local connections between neural resources are:
• inter-layer links are connected to their input and output activators, as well as to the intra-layer links of their output node (to broadcast inputs inside layers);
• intra-layer links are connected to their output activator, and to other intra-layer links so as to forward data inside the layer.
Virtual connections are illustrated in figure 3 by the successive links that appear thick: the combination of these connected resources creates a virtual connection that corresponds to the thick MLP connection in figure 2. The parameters of this FPNA have been tuned by means of a gradient descent algorithm. The global success rate for alertness decision is 86 %. It is already very close to the performance of the MLP, and far better than with the Kohonen map. Further learning improvements (for example using second-order learning algorithms such as BFGS, which might prove more efficient on complex models such as FPNAs) might lead to FPNAs with the performance of the initial MLP.
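The link structure described above can be sketched functionally for the hidden layer: each input reaches the layer through a single inter-layer link, and intra-layer links then forward it from node to node, so that far fewer physical links are needed than the full 23x10 connection scheme. This is a simplified sketch; the entry-node assignment, the chaining pattern and the weight values are illustrative assumptions, not the actual FPNA of figure 3:

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_hid = 23, 10

# One inter-layer link per input (23 links instead of 23*10 = 230):
inter_w = rng.normal(size=n_in)
# One intra-layer link per hidden node, chaining the nodes in a ring:
intra_w = rng.normal(size=n_hid)

def fpna_hidden(x):
    """Each input enters the layer at one node through its inter-layer
    link; intra-layer links forward the value around the ring, picking
    up one link weight per hop, so the virtual weight seen by each node
    is a product of link weights."""
    out = np.zeros(n_hid)
    for i, xi in enumerate(x):
        v = inter_w[i] * xi           # inter-layer link
        node = i % n_hid              # illustrative entry node
        for _ in range(n_hid):        # forward around the ring
            out[node] += v
            node = (node + 1) % n_hid
            v *= intra_w[node]        # intra-layer link weight
    return np.tanh(out)               # activator: tanh transfer
```

With 23 + 10 links instead of 230 connections, this kind of topology is what makes the fully parallel mapping of section 5.2 fit on the FPGA.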
Figure 3. FPNA for alertness decision (links, activators, and local resource connections)

5.2 Implementation
The pipeline implementation method of [4] applies to the synchronous FPNA of figure 3. On-line arithmetic is used. Each FPNA link is mapped onto a multiplier and a small adder whose fan-in is equal to the number of incoming local connections (always less than 3). Each activator is mapped onto an on-line tanh operator and a small adder (maximum fan-in equal to 3). Flip-flops are added so as to delay the different inputs of each adder: incoming data must be synchronized according to [4]. Figure 4 illustrates the implementation block of a single node in the hidden layer.

Figure 4. Detail of the pipeline implementation of the FPNA (ADD, MULT, delay and TANH blocks, with connections from the input layer, to/from the previous node of the layer, and to the output layer)

5.2.1 Implementation results
Our implementation uses 12-digit precision operators. The total number of logic cells (including the storage of the weights) is 16440. The global delay is 92. With the minimal frequency of the board, 400 kHz, the computation time is 250 µs, which is largely sufficient for the real-time constraints of our application. Consumption is then 82 mW. Taking into account a decision process made every second in the daytime, consumption remains lower than 10 µW. It must be pointed out that these results have been obtained through simulations of a modular VHDL description of the system. An on-chip validation is still required.

6 Conclusion
In this study, we have used connectionist methods with supervised and unsupervised training to discriminate the EEG signals that characterize a decrease of vigilance. The goal is to obtain an embedded system for ambulatory use, so that a low-power parallel implementation on FPGA was required. Our first implementation works focused on self-organizing maps, despite their lower performance for alertness detection compared to multilayer perceptrons. This choice was motivated by the size of a well-performing MLP, which is far too large for a parallel implementation on FPGA. In this paper, we have described how the FPNA paradigm has made possible the definition of a neural model that is functionally equivalent to the above MLP, despite a far simpler topology. This topology results in fewer operators to implement, and in simpler connection schemes that can be handled by FPGA routing tools. The performance of this FPNA for alertness detection is almost as satisfactory as that of the MLP it is derived from. Further work will focus on the problem of on-chip learning: the initial study shows that training must adapt to each individual, so that an autonomous embedded system must be able to learn the specificities of the user before reconfiguring the FPGA with the low-power implementation described in this paper.

References
[1] K. Ben Khalifa, M.H. Bedoui, L. Bougrain, R. Raychev, M. Dogui, and F. Alexandre. Analyse et classification des états de vigilance par réseaux de neurones. Technical Report RR-4714, INRIA, 2003.
[2] K. Ben Khalifa, B. Girau, F. Alexandre, and M.H. Bedoui. Parallel FPGA implementation of self-organizing maps. In International Conference on Microelectronics - ICM'04, 2004.
[3] M.D. Ercegovac and K.S. Trivedi. On-line algorithms for division and multiplication. IEEE Trans. Comp., C-26(7):681-687, 1977.
[4] B. Girau. FPNA: applications and implementations. In FPGA Implementations of Neural Networks. 2004.
[5] B. Girau. FPNA: concepts and properties. In FPGA Implementations of Neural Networks. 2004.
[6] T. Kohonen. Self-Organizing Maps. Springer, 2001.
[7] Xilinx, editor. The Programmable Logic Data Book. Xilinx, 2002.