Neural Networks Using Bit Stream Arithmetic: A Space Efficient Implementation

Valentina Salapura
Institut für Technische Informatik
Technische Universität Wien
Treitlstraße 3-182-2
A-1040 Wien, AUSTRIA
(+43) 1-58801-8151
[email protected]

ABSTRACT

In this paper, an expandable digital architecture that provides an efficient implementation base for large neural networks is presented. The architecture uses circuits for arithmetic operations on delta encoded signals to carry out the large number of required parallel synaptic calculations. All real-valued quantities are encoded as delta bit streams. The actual digital circuitry is simple and highly regular, thus allowing very efficient use of the space of fine-grained FPGAs.

INTRODUCTION

One of the major constraints on hardware implementations of neural networks [1], [4], [5] is the amount of circuitry required to perform the multiplication of each input by its corresponding weight and their subsequent addition. This problem is especially acute in digital designs, where parallel multipliers and adders are extremely expensive in terms of circuitry [1]. An equivalent bit-serial architecture reduces this complexity, but still tends to result in large and complex overall designs.

This paper describes an alternative neural network architecture which can be implemented extremely efficiently with Field Programmable Gate Arrays (FPGAs), yet offers improved performance. FPGAs [7] allow different design choices to be evaluated in a short time and keep system cost at a minimum. In the proposed design, training is done off-chip, and a fully trained configuration is downloaded into the hardware. The encoding of input signals is also performed off-chip and fed to the input neurons. In this way, less hardware is required. As training occurs only once in the lifetime of an application, this method does not reduce the network's functionality.

The central idea is to represent the real-valued signals passing between neurons as delta encoded binary sequences. A real value v in the range [-1, 1] is represented by a delta encoded sequence. By adding this sequence appropriately to itself or to a zero sequence, multiplication by any real value is obtained. The addition of sequences is realized using simple logic circuitry: a single one-bit full adder.

RELATED WORK

A pulse-stream encoding scheme for representing values is used in an analog implementation by Murray and Smith [4]. They perform space-efficient multiplication of the input signal by the synaptic weight by intersecting it with a high-frequency chopping signal.

van Daalen et al. [6] present a stochastic bit-stream approach. They represent values v in the range [-1, 1] by stochastic bit streams in which the probability that a bit is set is (v+1)/2. Their input representation and architecture restrict this approach to fully interconnected feed-forward nets, and its non-linear behaviour requires that new training methods be developed.

In [2], we propose another bit-stream approach. It uses digital chopping and encodes values v from the range [0, 1] as bit streams in which the probability that a bit is set is v. Using this encoding, an extremely space-efficient implementation of the multiplication is achieved, enabling the construction of various network architectures.

GANGLION [1] is a very fast implementation of a simple three-layer feed-forward net. The implementation is highly parallel, achieving a performance of 20 million decisions per second. However, this approach is extremely real-estate intensive.

BIT-STREAM ARITHMETIC

Delta modulation (d.m.) is a simple encoding technique used in signal processing. Every real value v in the range [-1, 1] is represented by a delta encoded binary sequence (d.b.s.). This d.b.s. has the property that the difference between the numbers of ones and zeros, normalized by the sequence length, equals the value v (see Fig. 1):

$$ g: \{Y_l\} \rightarrow v, \qquad g(\{Y_l\}) = \frac{n_1(Y_i) - n_0(Y_i)}{n_1(Y_i) + n_0(Y_i)}, \quad i = 1, 2, \ldots, l $$

where Y_i is an element of the binary sequence, having values Y_i = 1 or Y_i = 0; n_1(Y_i) and n_0(Y_i) are the numbers of ones and zeros in the sequence; {Y_l} is the sequence itself, with length l = n_1(Y_i) + n_0(Y_i); and the function g assigns to every sequence {Y_l} an appropriate real value v. For v = 0, the numbers of ones and zeros in the sequence are equal. This sequence is called the idling sequence, denoted {I_l}.

Figure 1: A few d.b.s. for different real values v (0.75, 0.25, 0.00, -0.50).
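To make the encoding concrete, the following Python sketch generates a d.b.s. for a constant value v and decodes it with the function g defined above. The encoder shown is only one possible first-order modulator, chosen for illustration; the paper performs encoding off-chip and does not prescribe a particular circuit, and the names delta_encode and g_decode are illustrative.

```python
def delta_encode(v, l):
    """Encode a constant v in [-1, 1] as a delta bit sequence of length l.
    First-order modulator: the accumulated error between v and the emitted
    +/-1 symbols stays bounded, so the normalized difference between ones
    and zeros approaches v."""
    bits = []
    acc = 0.0
    for _ in range(l):
        acc += v
        bit = 1 if acc >= 0 else 0
        acc -= 1 if bit else -1  # bit 1 represents +1, bit 0 represents -1
        bits.append(bit)
    return bits

def g_decode(bits):
    """The function g: (n1 - n0) / (n1 + n0)."""
    n1 = sum(bits)
    n0 = len(bits) - n1
    return (n1 - n0) / (n1 + n0)

print(delta_encode(0.0, 8))               # [1, 0, 1, 0, ...]: the idling sequence
print(g_decode(delta_encode(0.75, 512)))  # close to 0.75 (within 2/512)
```

Note that v = 0 yields exactly the alternating idling sequence {I_l} described above.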

Kouvaras [3] developed an arithmetic for d.b.s. which makes possible the addition of sequences and the multiplication of a sequence by a constant. The digital hardware necessary for the implementation of this arithmetic is simple and modular, employing only conventional one-bit full adders and D flip-flops. Consider a sequence {S_l} as the sum of two d.b.s. {X_l} and {Y_l}, so that g({X_l}) = x and g({Y_l}) = y:

$$ \{S_l\} = \{X_l\} + \{Y_l\} $$

The addition of two d.b.s. is defined by Kouvaras [3] through the half-sum

$$ S_i = X_i Y_i + X_i C_{i-1} + Y_i C_{i-1}, \qquad C_i = X_i \oplus Y_i \oplus C_{i-1}, \qquad C_{i-1} = 1 \text{ or } C_{i-1} = 0, $$

and gives g({S_l}) = 2^{-1}(x + y).

The adder is realized with a conventional one-bit full adder in which the roles of the sum and carry outputs are interchanged. A D flip-flop is necessary for storing the carry. The complete circuit is called a delta full adder (DFA) and is shown in Figure 2.

Figure 2: Delta full adder (a conventional one-bit full adder with the sum and carry outputs interchanged; the D flip-flop stores the carry {C_l}).
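The half-sum rule is easy to check in software. In this minimal Python sketch (names illustrative, initial carry chosen as 0), the full adder's carry function produces the sum bit S_i and its sum function produces the next carry C_i, exactly as in the equations above:

```python
def delta_full_adder(xs, ys, c=0):
    """Delta addition of two bit sequences: g(out) = (g(xs) + g(ys)) / 2."""
    out = []
    for x, y in zip(xs, ys):
        out.append((x & y) | (x & c) | (y & c))  # carry function -> S_i
        c = x ^ y ^ c                            # sum function -> next C_i
    return out

xs = [1] * 16    # g(xs) = 1.0
ys = [1, 0] * 8  # idling sequence, g(ys) = 0.0
s = delta_full_adder(xs, ys)
n1 = sum(s)
print((2 * n1 - len(s)) / len(s))  # (1.0 + 0.0) / 2 = 0.5
```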

As the factor 2^{-1} is introduced by each addition of the corresponding real values, it is possible with successive additions to realize the multiplication of a d.b.s. by a constant α, |α| < 1. Consider a sequence {P_l} as the product of a d.b.s. {X_l}, with g({X_l}) = x, and the constant α. The multiplication gives the product g({P_l}) = αx. The constant is considered as a rounded-off binary number with q significant bits:

$$ \alpha = \alpha_1 2^{-1} + \alpha_2 2^{-2} + \cdots + \alpha_q 2^{-q} $$

with α_i = 1 or 0 for i = 1, 2, ..., q-1, and α_q = 1. Then {P_l} can be written as follows:

$$ \{P_l\} = \{A_1\} + \left( \{A_2\} + \left( \cdots + \left( \{A_q\} + \{I_l\} \right) \cdots \right) \right) $$

with {A_i} = {X_l} if α_i = 1 and {A_i} = {I_l} if α_i = 0.

The multiplier is realized as a series of q DFAs, where q is the number of significant bits of α. The complete circuit is called a delta multiplier (DMP). The circuit for multiplying a d.b.s. {X_l} by the constant α = 0.10011₂ is shown in Figure 3.

{Xl }

{Il }

{Il }

{Xl }

{Xl }

{Pl } {Il } 1

0

0

1

Figure 3: Delta multiplier for α = 0,100112 .

1
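The nested-sum construction maps directly onto a software model: starting from the idling sequence, one delta addition per weight bit is applied from α_q up to α_1. A minimal sketch under the same assumptions as the earlier listings (delta_full_adder as defined above; names illustrative):

```python
def delta_multiply(xs, alpha_bits):
    """Multiply a d.b.s. by alpha = alpha_1/2 + alpha_2/4 + ... + alpha_q/2**q,
    with alpha_bits = [alpha_1, ..., alpha_q]. Implements
    {P} = {A_1} + ({A_2} + (... + ({A_q} + {I})...)), one DFA per stage."""
    idle = [i % 2 for i in range(len(xs))]  # idling sequence {I_l}, g = 0
    acc = idle
    for a in reversed(alpha_bits):          # innermost stage (alpha_q) first
        acc = delta_full_adder(xs if a else idle, acc)
    return acc

xs = [1] * 512                           # g(xs) = 1.0
p = delta_multiply(xs, [1, 0, 0, 1, 1])  # alpha = 0.10011 binary = 19/32
n1 = sum(p)
print((2 * n1 - len(p)) / len(p))        # close to 0.59375
```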

CONSTRUCTION OF THE NEURON

In the neuron, the following three functions are performed: the multiplication of the inputs by the synaptic weights, their addition, and the activation function of the neuron. As arithmetic operations on d.b.s. are hardware efficient, all operations are performed in parallel.

Figure 4 shows a neuron with eight inputs. The multiplication of each input by its synaptic weight is realized with an appropriate DMP. The multiplied signals are summed by passing through an adder tree constructed of DFAs. The threshold activation function of the neuron is implemented by a 9-bit counter, which counts the number of bits set in the sum sequence. The counter is initialized for each new bit stream fed to the neurons: the negated offset value θ (threshold) is loaded into the counter in biased representation. This offset value is chosen so that the counter overflows into its top bit if the chosen threshold is exceeded. The output of the neuron is given by the counter's most significant bit. After all input bit streams of length l have been fed to the neuron, a new computational cycle can be started. A global counter checks this condition and sends an initialization signal to all neurons. The neurons latch their output state, and the local counter is reloaded to restart a new computation.

Figure 4: The constructed neuron.
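Combining the earlier sketches gives a behavioural model of this neuron: one delta multiplier per input, a balanced DFA adder tree, and a count of set bits compared against a threshold. This is an illustration only, not the FPGA circuit: it assumes a power-of-two number of inputs, replaces the preloaded counter with a direct comparison, and the threshold must account for the factor 1/2 introduced at every tree level.

```python
def neuron(inputs, weight_bits, theta):
    """Behavioural model. inputs: list of equal-length d.b.s.;
    weight_bits: one alpha bit-vector per input (see delta_multiply);
    theta: threshold on the number of ones in the final sum sequence
    (the role played by the 9-bit counter in hardware)."""
    products = [delta_multiply(x, w) for x, w in zip(inputs, weight_bits)]
    while len(products) > 1:  # DFA adder tree; each level halves the sum
        products = [delta_full_adder(products[i], products[i + 1])
                    for i in range(0, len(products), 2)]
    return 1 if sum(products[0]) > theta else 0

# Hypothetical usage: eight all-ones inputs, all weights 0.10011 binary.
ins = [[1] * 512 for _ in range(8)]
ws = [[1, 0, 0, 1, 1]] * 8
print(neuron(ins, ws, theta=256))  # fires: the sum sequence has > l/2 ones
```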

THE OVERALL NETWORK ARCHITECTURE

From single neurons, various neural nets can be realized. The best-fitting interconnection scheme for a particular application can be chosen and implemented using FPGA cell routing. This neuron design can be used to realize a wide range of neural net models: architectures that employ neurons with binary or continuous inputs and a hard limiter as activation function are supported. Such nets include the Hopfield model, Kanerva's SDM net, feed-forward nets, recursive nets, etc. As an illustration, a feed-forward network with four neurons in the input layer, four neurons in the hidden layer, and two neurons in the output layer is shown in Figure 5.

Figure 5: Example architecture: a feed-forward net with ten neurons.

Following the proposed design, different performance and precision requirements can be met. These requirements are functions of the sequence length l (see Table 1). Here, precision is defined as the distance between two successive representable values in the range [-1, 1].

length of sequence    precision    firing rate
       512             0.0039        64000+
       256             0.0078       128000+
       128             0.0156       256000+

Table 1: Firing rate of a neuron (per second, at 33 MHz) as a function of the length of the sequence.
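Both columns of Table 1 follow directly from the sequence length: with values spaced uniformly over [-1, 1] and one decision per complete sequence at clock frequency $f_c$,

$$ \text{precision} = \frac{2}{l}, \qquad \text{firing rate} = \frac{f_c}{l}. $$

For the first row, $l = 512$: $2/512 \approx 0.0039$ and $33 \times 10^6 / 512 \approx 64{,}500$ decisions per second.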

It is possible to implement several neurons in a single FPGA. The exact number depends on the FPGA type, the number of inputs, and the number of significant bits used for multiplication; it typically ranges from four to 32 for a Xilinx XC4005. By replicating these basic building blocks, this expandable digital architecture provides a space-efficient real-time implementation platform for large neural networks.

CONCLUSION

A space-efficient, scalable neural network design is proposed. This design can meet a wide range of performance and precision requirements. Starting from an optimized, freely interconnectable neuron, various neural network models can be implemented. By using delta binary encoding of real values and digital circuits that perform arithmetic directly on these encodings, the hardware for the implementation of a single neuron can be reduced considerably. This allows the massive replication of neurons to build complex neural nets. FPGAs are used as the hardware platform, facilitating the implementation of arbitrary network architectures.

Acknowledgement

The author would like to express her gratitude to Michael Gschwind for many suggestions used in this work and to Oliver Maischberger for valuable help in the preparation of this paper.

REFERENCES

[1] Charles E. Cox and Ekkehard Blanz, "GANGLION - a fast field-programmable gate array implementation of a connectionist classifier", IEEE Journal of Solid-State Circuits, vol. 27, pp. 288-299, March 1992.
[2] Michael Gschwind, Valentina Salapura and Oliver Maischberger, "Space Efficient Neural Net Implementation", Proc. of the Second ACM Workshop on Field-Programmable Gate Arrays, February 1994.
[3] N. Kouvaras, "Operations on delta-modulated signals and their application in the realisation of digital filters", The Radio and Electronic Engineer, vol. 48, pp. 431-438, September 1978.
[4] Alan F. Murray and Anthony V. Smith, "Asynchronous VLSI neural networks using pulse-stream arithmetic", IEEE Journal of Solid-State Circuits, vol. 23, pp. 688-697, March 1988.
[5] Valentina Salapura, "A digital Hopfield neural network", Proc. of ETAN'90, pp. 3.66-3.70, June 1990.
[6] Max van Daalen, Peter Jeavons and John Shawe-Taylor, "A stochastic neural architecture that exploits dynamically reconfigurable FPGAs", Proc. of the IEEE Workshop on FPGAs for Custom Computing Machines, IEEE CS Press, April 1993.
[7] Xilinx, The XC4000 Data Book, Xilinx Corp., 1992.