Accepted at the International Conference on Technical Informatics ConTI’94, Timisoara, România, 16-19 November 1994.

ADDITION USING CONSTRAINED THRESHOLD GATES①
or
❝When Neural Networks Go VLSI❞

V. Beiu†,‡, J.A. Peperstraete†, J. Vandewalle† and R. Lauwereins†,②

† Katholieke Universiteit Leuven, Department of Electrical Engineering, Div. ESAT, Kardinaal Mercierlaan 94, B-3001 Heverlee, Belgium
‡ on leave of absence from “Politehnica” University of Bucharest, Department of Computer Science, Spl. Independentei 313, 77206 Bucharest, România

Abstract. In this paper we show that efficient VLSI implementations of ADDITION are possible using constrained threshold gates (i.e. gates with limited fan-in and limited range of weights). We introduce a class of Boolean functions F▲ and, while proving that every f∆ ∈ F▲ is linearly separable, we discover that each f∆ function can be built starting from the previous one (f∆−2) by copying its synaptic weights. As the G-functions computing the carry bits form a subclass of F▲, we are able to build a set of “neural networks” for ADDITION with the fan-in (∆) as a parameter, having depth = O(lgn/lg∆)③ and size = O(nlgn/lg∆). By taking ∆ = O(lgn), which keeps the weights polynomially bounded, depth = O(lgn/lglgn) and size = O(nlgn/lglgn). Further directions for research are pointed out in the conclusions.

1. INTRODUCTION

In the following we shall consider feedforward neural networks (NNs) made of linear threshold gates (TGs). The arguments for this choice are that TGs borrow from both analog and digital solutions, and that in the last several years we have witnessed a growing number of theoretical investigations of TGs. TGs have received more attention as microelectronic technology has matured to the point where it is possible to realize TGs in silicon with area and time performances comparable to classical logic gates (AND, OR), and to foresee the VLSI implementation of larger NNs [1, 2]. TGs are a challenging alternative to classic Boolean solutions due to their intrinsic nonlinearity and their solid theoretical background [3, 4]. The renewed interest in TGs is proven by many articles from the late 80s and 90s showing that TGs are much more powerful than logic gates [5–11], as well as by proposals for implementations [12–15], or even designs [16].

① This research work was partly carried out in the framework of a Concerted Research Action of the Flemish Community, entitled “Applicable Neural Networks,” and partly supported by a Doctoral Grant offered to V. Beiu by KULeuven. The scientific responsibility is assumed by the authors.
② Senior Research Associate of the Belgian National Fund for Scientific Research.
③ In this paper all the logarithms are base 2 and will be denoted by lg.


A linear TG computes a Boolean function (BF) f : {0,1}^n → {0,1}, where the input vector is Z = (z_0, z_1, …, z_{n−1}) ∈ {0,1}^n:

    f(Z) = 1 if ∑_{i=0}^{n−1} w_i z_i + t ≥ 0,  and  f(Z) = 0 if ∑_{i=0}^{n−1} w_i z_i + t < 0,    (1)

and the w_i are called synaptic weights, with t known as the threshold. Equivalently, one can write f(Z) = sgn( ∑_{i=0}^{n−1} w_i z_i + t ), where sgn is the signum function. A feedforward NN will be a feedforward circuit of TGs. Two important cost functions used to estimate the complexity of such NNs are: (i) depth, which is the number of layers, and (ii) size, representing the number of TGs in the circuit. These two values are linked to T = delay (depth) and A = area (size) of a VLSI chip [17, 18]. While this is true for logic gates of bounded fan-in, TGs (even with bounded fan-in) do not closely follow this proportionality. There is more than one reason:

● first, the area of the connections counts when unbounded fan-in TGs are used, as has been thoroughly investigated in [19], leading to the result that the “area required for inter-node connectivity grows as the cube of a node’s fan-in;”
● second, the silicon area occupied by a bounded fan-in TG can easily be put in relation to its associated incoming weights and threshold, as they have to be physically realized.

Few articles have looked at such complexity measures. Besides [19], we should mention [20], which has considered edge complexity, and [7, 21], who have “calculated the number of bits needed to represent all the weights (since comparing the number of nodes is inadequate for comparing the complexity of neural nets as the nodes themselves could implement quite complex functions).” One also finds sharp limitations on the NNs we can implement:

● the maximal input connectivity of one TG, the maximum fan-in ∆, cannot grow over a certain limit, this being one of “the most significant barriers” [22];
● the maximal ratio between the largest and the smallest weights of one TG is limited by the precision of present day analog technologies [23].
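To make eq. 1 concrete, here is a minimal Python sketch of a linear TG; the helper names (`tg`, `AND2`, `OR2`) are ours, not from the paper:

```python
# A linear threshold gate (eq. 1): output 1 iff the weighted sum of the
# binary inputs plus the threshold t is non-negative.
def tg(weights, t, z):
    return 1 if sum(w * x for w, x in zip(weights, z)) + t >= 0 else 0

# The classical logic gates are the simplest examples of TGs:
AND2 = lambda z: tg([1, 1], -2, z)   # 1 only for z = (1, 1)
OR2  = lambda z: tg([1, 1], -1, z)   # 1 unless z = (0, 0)
```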
All of these hardly explored parameters, and the tradeoffs between them, have motivated research to find algorithms for reducing the complexity of NNs [13, 24]. A lot of work has been devoted to finding minimum size and/or minimum constant-depth TG circuits [6, 8, 10, 11, 14], but little is known about the tradeoffs between these two cost functions [15], and even less about how design parameters like fan-in, weights and thresholds influence the overall area and time performances. It would be useful to find answers to such problems for any BF, but since for the general case only existential exponential bounds are known [8, 15], it is important to isolate classes of functions whose implementation is simpler (polynomial) than that of others, and to find efficient solutions for their synthesis [14, 25]. This paper proceeds along these lines and shows that efficient VLSI implementations of a class of BFs using TGs are possible. In Section 2 we present some theoretical results which will be used in Section 3 for implementing ADDITION. By having the fan-in as an additional parameter we will be able to develop a full set of solutions covering depth from O(lgn) to O(lgn/lglgn) and size from O(nlgn) to O(nlgn/lglgn). Several conclusions and future directions for research end the paper.


2. A CLASS OF LINEARLY SEPARABLE FUNCTIONS

We define a class of BFs denoted by F▲. This is the class of BFs f∆ of ∆ input variables, with ∆ even, f∆ = f∆(g_{∆/2−1}, e_{∆/2−1}, …, g_0, e_0), computing:

    f∆ = ∨_{j=0}^{∆/2−1} [ g_j ∧ ( ∧_{k=j+1}^{∆/2−1} e_k ) ] .    (2)

One restriction is that the input variables are pair-dependent, meaning that we can group the ∆ input variables in ∆/2 pairs of two input variables each, (g_{∆/2−1}, e_{∆/2−1}), …, (g_0, e_0), and that in each such pair one variable is “dominant” (i.e. when a dominant variable is 1, the other variable of the pair will also be 1). This restriction is not as strict as it might appear, as one can use either the direct or the negated form of the input variables. Formally:

    F▲ = { f∆ | f∆ : {0,1}^∆ → {0,1}, ∆/2 ∈ IN*, with f∆ = ∨_{j=0}^{∆/2−1} [ g_j ∧ ( ∧_{k=j+1}^{∆/2−1} e_k ) ], and g_i ⇒ e_i } .

Theorem 1    F▲ is a class of linearly separable functions.

Proof    The proof is constructive, showing how to build a TG for every f∆ ∈ F▲. If ∆ = 4, eq. 2 becomes f_4(g_1, e_1, g_0, e_0) = g_1 ∨ (e_1 ∧ g_0), and from [3, 4, 26] it is well known that this is a linearly separable BF:

    f_4(g_1, e_1, g_0, e_0) = sgn( 2g_1 + e_1 + g_0 − 2 ) .    (3)
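Eq. 3 can be checked exhaustively; the following is a small sketch of ours, enumerating all inputs that respect the pair-dependence restriction:

```python
from itertools import product

def tg(weights, t, z):
    return 1 if sum(w * x for w, x in zip(weights, z)) + t >= 0 else 0

# f4(g1, e1, g0, e0) = g1 OR (e1 AND g0), realized with weights (2, 1, 1, 0)
# and threshold -2 (eq. 3), checked on every input respecting g_i => e_i.
for g1, e1, g0, e0 in product((0, 1), repeat=4):
    if (g1 and not e1) or (g0 and not e0):
        continue  # violates the pair-dependence restriction
    assert tg([2, 1, 1, 0], -2, (g1, e1, g0, e0)) == (g1 or (e1 and g0))
```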

Refining eq. 2 we can determine the following recursive version:

    f_{∆+2} = ∨_{j=0}^{∆/2} [ g_j ∧ ( ∧_{k=j+1}^{∆/2} e_k ) ]
            = { ∨_{j=0}^{∆/2−1} [ g_j ∧ ( ∧_{k=j+1}^{∆/2−1} e_k ) ∧ e_{∆/2} ] } ∨ ( g_{∆/2} ∧ 1 )
            = g_{∆/2} ∨ ( e_{∆/2} ∧ f∆(g_{∆/2−1}, e_{∆/2−1}, …, g_0, e_0) ) = g_{∆/2} ∨ ( e_{∆/2} ∧ f∆ ) .    (4)

Suppose the claim is true for ∆ (i.e. f∆ is linearly separable), so that:

    f∆ = sgn( ∑_{i=0}^{∆/2−1} v_i g_i + ∑_{i=0}^{∆/2−1} w_i e_i + t∆ ) .    (5)

As hypothesis for the recursion we shall also consider that all the weights are positive integers, while only the thresholds are negative integers (easy to verify for ∆ = 4 by looking at eq. 3: v_1 = 2, w_1 = 1, v_0 = 1, w_0 = 0, t_4 = −2). To constructively prove that f_{∆+2} is linearly separable we build f_{∆+2}:

■ copy all the corresponding weights from f∆;
■ add two weights v_{∆/2} and w_{∆/2} for the additional input variables g_{∆/2} and e_{∆/2}:

    v_{∆/2} = 1 + ∑_{i=0}^{∆/2−1} w_i    and    w_{∆/2} = ∑_{i=0}^{∆/2−1} v_i ;    (6)

■ change the threshold to:

    t_{∆+2} = −1 − ∑_{i=0}^{∆/2−1} v_i − ∑_{i=0}^{∆/2−1} w_i .    (7)

These lead to:

    f_{∆+2} = sgn( ( v_{∆/2} g_{∆/2} + ∑_{i=0}^{∆/2−1} v_i g_i ) + ( w_{∆/2} e_{∆/2} + ∑_{i=0}^{∆/2−1} w_i e_i ) + t_{∆+2} ) .    (8)

We shall now verify that eq. 5 (f∆) and eq. 8 (f_{∆+2}) satisfy the recursion (eq. 4). The following three cases have to be considered:

■ If g_{∆/2} = 1, then f_{∆+2} = 1 (see eq. 4), and by the pair-dependence hypothesis e_{∆/2} = 1; eq. 8 becomes:

    f_{∆+2} = sgn( ( v_{∆/2} + ∑_{i=0}^{∆/2−1} v_i g_i ) + ( w_{∆/2} + ∑_{i=0}^{∆/2−1} w_i e_i ) + t_{∆+2} ) .

The worst case—due to the fact that the weights are positive—is when all the other inputs are 0. By substituting eq. 6 and eq. 7 we obtain:

    f_{∆+2} = sgn( v_{∆/2} + w_{∆/2} + t_{∆+2} ) = sgn(0) = 1 .

■ If g_{∆/2} = 0 we have to analyse two other cases.

● First suppose that e_{∆/2} = 0. Now f_{∆+2} = 0 (see eq. 4) and eq. 8 can be rewritten:

    f_{∆+2} = sgn( ∑_{i=0}^{∆/2−1} v_i g_i + ∑_{i=0}^{∆/2−1} w_i e_i + t_{∆+2} )

and even if all the other input variables are 1, the value of t_{∆+2} (eq. 7) is large enough that f_{∆+2} = sgn(−1) = 0.

● The last and most complicated case is when e_{∆/2} = 1, which makes f_{∆+2} = f∆. Starting again from eq. 8 we obtain:

    f_{∆+2} = sgn( ∑_{i=0}^{∆/2−1} v_i g_i + ( w_{∆/2} + ∑_{i=0}^{∆/2−1} w_i e_i ) + t_{∆+2} )

and by substitution of eq. 6 and eq. 7 it becomes:

    f_{∆+2} = sgn( ∑_{i=0}^{∆/2−1} v_i g_i + ∑_{i=0}^{∆/2−1} v_i + ∑_{i=0}^{∆/2−1} w_i e_i − 1 − ∑_{i=0}^{∆/2−1} v_i − ∑_{i=0}^{∆/2−1} w_i )
            = sgn( ∑_{i=0}^{∆/2−1} v_i g_i + ∑_{i=0}^{∆/2−1} w_i e_i − 1 − ∑_{i=0}^{∆/2−1} w_i ) .

Due to the recursion hypothesis (eq. 5), the first two sums are −t∆ + ε, with ε ≥ 0 if f∆ = 1, and ε < 0 if f∆ = 0, so:

    f_{∆+2} = sgn( −t∆ + ε − 1 − ∑_{i=0}^{∆/2−1} w_i )

and replacing t∆ as given by eq. 7 (reverse recursion):

    f_{∆+2} = sgn( 1 + ∑_{i=0}^{∆/2−2} v_i + ∑_{i=0}^{∆/2−2} w_i + ε − 1 − ∑_{i=0}^{∆/2−1} w_i ) = sgn( ∑_{i=0}^{∆/2−2} v_i + ε − w_{∆/2−1} ) .

Finally, use eq. 6 as reverse recursion to obtain:

    f_{∆+2} = sgn( ∑_{i=0}^{∆/2−2} v_i + ε − ∑_{i=0}^{∆/2−2} v_i ) = sgn(ε) = f∆ .

This concludes the proof. ❏
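The constructive proof above translates directly into code. The following sketch is ours (the names `build_f_delta` and `f_bool` are hypothetical); it builds the weights by the recursion of eqs. 6 and 7 and verifies eq. 5 against the defining formula (eq. 2) for small ∆:

```python
from itertools import product

def build_f_delta(delta):
    """Weights (v for the g-inputs, w for the e-inputs) and threshold t of
    f_delta, built by the recursion of eqs. 6-7 from the base case of eq. 3."""
    v, w, t = [1, 2], [0, 1], -2                  # f4: v0, v1 / w0, w1 / t4
    while 2 * len(v) < delta:
        nv, nw = 1 + sum(w), sum(v)               # eq. 6
        t = -1 - sum(v) - sum(w)                  # eq. 7
        v.append(nv)
        w.append(nw)
    return v, w, t

def f_bool(g, e):
    """f_delta by its defining formula (eq. 2)."""
    m = len(g)
    return int(any(g[j] and all(e[j + 1:m]) for j in range(m)))

for delta in (4, 6, 8, 10):
    v, w, t = build_f_delta(delta)
    m = delta // 2
    for bits in product((0, 1), repeat=delta):
        g, e = bits[:m], bits[m:]
        if any(gi and not ei for gi, ei in zip(g, e)):
            continue                              # pair-dependence: g_i => e_i
        s = sum(a * b for a, b in zip(v + w, g + e)) + t
        assert (1 if s >= 0 else 0) == f_bool(g, e)
```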

Remark 1    Clearly every f∆ ∈ F▲ can be implemented by one TG having fan-in limited by ∆ − 1 (as e_{∆/2−1} is not used).

Remark 2    It should be mentioned that copying the weights of a linearly separable function to build another one (even if changing the threshold) does not in general give rise to a linearly separable function. This has been proven in [22], where the conclusion was that “no direct mapping of weights exists between fully and limited-interconnect nets.” Theorem 1 shows that there are particular BFs for which such a direct mapping of weights can be found.

The weights and the thresholds for the TGs implementing the first three f∆ functions (∆ = 4, 6, 8) can be seen in Fig. 1. We now proceed to show that these TGs can satisfy constraints concerning weights and thresholds by proving fan-in dependent upper bounds.

Corollary 1

The absolute values of the weights and threshold of the TG implementing any f∆ ∈ F▲ are bounded by 2^{∆/2}.

Proof    By solving the system of recurrent equations 6 with v_1 = 2, w_1 = 1, v_0 = 1, w_0 = 0, we find that for all i ≥ 4, v_i = w_i = 5 · 2^{i−3}. Replacing i with the largest possible value we get the maximal weights: v_{∆/2−1} = w_{∆/2−1} = 5 · 2^{∆/2−4} = (5/16) · 2^{∆/2} < 2^{∆/2}. From eq. 6 and eq. 7: t∆ = −v_{∆/2−1} − w_{∆/2−1}, so the maximum threshold is:

    |t∆| = 5 · 2^{∆/2−3} = (5/8) · 2^{∆/2} < 2^{∆/2} ,    (9)

concluding the proof. ❏
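The closed form of Corollary 1 can be double-checked numerically against the recursion; a sketch of ours:

```python
def weights(delta):
    # recursion of eqs. 6-7 starting from eq. 3 (v0=1, v1=2, w0=0, w1=1, t4=-2)
    v, w, t = [1, 2], [0, 1], -2
    while 2 * len(v) < delta:
        nv, nw = 1 + sum(w), sum(v)
        t = -1 - sum(v) - sum(w)
        v.append(nv)
        w.append(nw)
    return v, w, t

for delta in range(10, 31, 2):
    v, w, t = weights(delta)
    m = delta // 2
    assert v[m - 1] == w[m - 1] == 5 * 2 ** (m - 4)   # v_i = w_i = 5 * 2^(i-3)
    assert -t == v[m - 1] + w[m - 1] == 5 * 2 ** (m - 3)
    assert max(max(v), max(w), -t) < 2 ** m           # all below 2^(delta/2)
```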

Figure 1.    The series of weights: 1, 1, 2, 3, 2, 5, 5 (10, 10, 20, 20, …) and the corresponding thresholds: −2, −5, −10 (−20, −40, …) for f4, f6, f8 (f10, f12, …).


This result can be slightly improved. Keeping in mind that the area of a VLSI implementation of any TG can be directly linked to the values of the weights and threshold of that TG, the lower these values are the better. We do not know how to lower the weights any further, but we can practically eliminate the threshold by diminishing it to a constant.

Corollary 2

Any f∆ ∈ F▲ function can be realized by a TG which has constant threshold.

Proof    The proof is based on a well-known theorem from the switching theory of TG networks [3, 4, 26]: ❝If sgn( ∑_{i=1}^{n} w_i x_i + w_0 ) is the threshold gate realization of f(x_1, …, x_n), then sgn( ( ∑_{i=1, i≠k}^{n} w_i x_i − w_k x_k ) + (w_0 + w_k) ) is the realization of f*(x_1, …, x̄_k, …, x_n).❞ We implement f∆* = f∆*(g_{∆/2−1}, ē_{∆/2−1}, …, g_0, ē_0) instead of f∆. By replacing eq. 6 and eq. 7 in eq. 8:

    f_{∆+2} = sgn( ( 1 + ∑_{i=0}^{∆/2−1} w_i ) g_{∆/2} + ∑_{i=0}^{∆/2−1} v_i g_i + ( ∑_{i=0}^{∆/2−1} v_i ) e_{∆/2} + ∑_{i=0}^{∆/2−1} w_i e_i + ( −1 − ∑_{i=0}^{∆/2−1} v_i − ∑_{i=0}^{∆/2−1} w_i ) ) ,

and now, using the above mentioned theorem to negate all the e-inputs, we compute:

    t*_{∆+2} = −1 − ∑_{i=0}^{∆/2−1} v_i − ∑_{i=0}^{∆/2−1} w_i + ∑_{i=0}^{∆/2−1} v_i + ∑_{i=0}^{∆/2−1} w_i = −1 . ❏
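The effect of the theorem is that negating every e-input flips the sign of its weight and shifts the threshold to −1, without changing the weighted sum on corresponding inputs. A small sketch of ours verifying this:

```python
from itertools import product

def weights(delta):
    # recursion of eqs. 6-7 starting from eq. 3
    v, w, t = [1, 2], [0, 1], -2
    while 2 * len(v) < delta:
        nv, nw = 1 + sum(w), sum(v)
        t = -1 - sum(v) - sum(w)
        v.append(nv)
        w.append(nw)
    return v, w, t

for delta in (4, 6, 8):
    v, w, t = weights(delta)
    assert t + sum(w) == -1        # negating every e-input leaves threshold -1
    m = delta // 2
    for bits in product((0, 1), repeat=delta):
        g, e = bits[:m], bits[m:]
        s_orig = sum(a * b for a, b in zip(v, g)) + sum(a * b for a, b in zip(w, e)) + t
        s_star = sum(a * b for a, b in zip(v, g)) - sum(a * (1 - b) for a, b in zip(w, e)) - 1
        assert s_orig == s_star    # same weighted sum, hence the same output
```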



Remark 3    If we consider the area equal to the sum of the (absolute) weights plus the threshold (a much more accurate estimate of the area than the size [6, 21, 27]), the previous corollary reduces the area by 33%! This claim follows by summing the weights:

    A_w(∆) = (1 + 1 + 2 + 3 + 2) + 10 · ∑_{i=0}^{∆/2−4} 2^i = 9 + 10 · (2^{∆/2−3} − 1) ≅ 10 · 2^{∆/2−3}

and remembering that the threshold was |t∆| = 5 · 2^{∆/2−3} (eq. 9).
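The 33% figure can be reproduced numerically (our sketch; the ratio is only approached for larger ∆):

```python
def weights(delta):
    # recursion of eqs. 6-7 starting from eq. 3
    v, w, t = [1, 2], [0, 1], -2
    while 2 * len(v) < delta:
        nv, nw = 1 + sum(w), sum(v)
        t = -1 - sum(v) - sum(w)
        v.append(nv)
        w.append(nw)
    return v, w, t

v, w, t = weights(20)
area_f      = sum(v) + sum(w) + (-t)   # threshold |t| = 5 * 2^(delta/2 - 3)
area_f_star = sum(v) + sum(w) + 1      # constant threshold -1
assert abs(area_f_star / area_f - 2 / 3) < 0.01   # roughly one third saved
```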

The weights and the thresholds for the TGs implementing the first three f∆∗ functions (∆ = 4, 6, 8) have been drawn in Fig. 2.

Figure 2.    The alternate series of weights: 1, −1, 2, −3, 2, −5, 5 (−10, 10, −20, 20, …) having constant threshold (−1), for f4*, f6*, f8* (f10*, f12*, …).


3. ADDITION

The results obtained so far can be used to compute ADDITION. We define the ADDITION of two n-bit binary numbers, an augend X = x_{n−1} … x_0 and an addend Y = y_{n−1} … y_0, as the unsigned sum of the addend added to the augend, S = s_n s_{n−1} … s_0. A well established method for computing the s_i is [28–31]:

    c_i = (x_i ∧ y_i) ∨ (x_i ∧ c_{i−1}) ∨ (y_i ∧ c_{i−1}) ,    c_{−1} = 0 ,
    s_i = x_i ⊕ y_i ⊕ c_{i−1} = (x̄_i ∧ ȳ_i ∧ c_{i−1}) ∨ (x̄_i ∧ y_i ∧ c̄_{i−1}) ∨ (x_i ∧ ȳ_i ∧ c̄_{i−1}) ∨ (x_i ∧ y_i ∧ c_{i−1}) ,

for i = 0, 1, …, n−1, and s_n = c_{n−1}. The c_i are known as the “carry” bits. Historically much attention has been paid to the tradeoff between delay (depth) and number of gates (size), but later attention switched and focused on the VLSI area complexity measure, by looking at how to connect the gates in simple and regular ways [28]. Several adders built out of AND-OR bounded fan-in logic gates have:

● 2n − 1 delay and 5n − 3 gates (school method [32]);
● 4lgn delay and 35n − 6 gates (carry-look-ahead);
● 4lgn delay and 14n − lgn − 10 gates [28];
● 3lgn + 2 delay and 3nlgn + 12.5n − 8 gates (conditional sum [33]);
● (2+ε)lgn delay and (2+ε)nlgn + 5n gates [34];
● 2lgn delay and 2.5nlgn + 5n − 1 gates [33];
● 2lgn + 2k + 2 delay and n(8 + 6/2^k) gates, for 0 ≤ k ≤ lgn (prefix algorithm [32]);
● lgn + 7√(2lgn) + 16 delay and 9n gates (Krapchenko [32]).

Some authors [34, 35] have formulated the problem of minimizing the latency in carry-skip and block carry-lookahead adders as a multidimensional dynamic programming. Others [36] have investigated implementations based on spanning trees. On the whole, a lot of effort has been devoted to practical implementations [33, 34, 36, 37]. Out of these, at least two papers made the remark that a way to reduce the number of logic levels (and correspondingly the circuit latency) is to increase the fan-in [34], or equivalently to group more bits [36].
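The carry recurrence above is easy to sanity-check in software; the sketch below is ours (`add` is a hypothetical helper, not part of the paper's construction):

```python
def add(x, y, n):
    """ADDITION of two n-bit numbers via the school-method carry recurrence."""
    xb = [(x >> i) & 1 for i in range(n)]
    yb = [(y >> i) & 1 for i in range(n)]
    c, s = 0, []                                   # c_{-1} = 0
    for i in range(n):
        s.append(xb[i] ^ yb[i] ^ c)                # s_i = x_i XOR y_i XOR c_{i-1}
        c = (xb[i] & yb[i]) | (xb[i] & c) | (yb[i] & c)
    s.append(c)                                    # s_n = c_{n-1}
    return sum(b << i for i, b in enumerate(s))

assert add(13, 200, 8) == 213
assert add(1023, 1, 10) == 1024
```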
But they mentioned that “no practical method for doing this has been presented in the literature.” From the completely different point of view of NNs, ADDITION has been considered a challenging and useful function [10, 12]. It has also been proven that a depth-2 circuit of AND-OR logic gates for ADDITION must have exponential size. In [15] the two constructions detailed for ADDITION are based on AND-OR logic gates. As these gates can be simulated by TGs, they have been considered as unbounded fan-in gates, the results being: (i) a depth-7 TG circuit of size O(nlgn) [15, Lemma 4, p. 1408], and (ii) a depth-3 TG circuit of size O(n^2) [15, Theorem 7, p. 1410]. We mention that, because the BFs from Step 3 and Step 4 of Lemma 4 [15] are f∆ functions, the depth-7 construction can immediately be shrunk to depth-5 by allowing TGs in the intermediate layers. Going for a lower depth (from 7 to 3) increases the size complexity; the solution using TGs falls in between (depth-5). But this result can be generalized by showing how to bridge the gap between the logarithmic depth solutions and the O(lgn/lglgn) depth ones. Theorem 2

The ADDITION of two n-bit numbers can be computed by a NN with polynomially bounded integer weights and thresholds in depth = O(lgn/lg∆), size = O(nlgn/lg∆) and area = O(2^{∆/2} · nlgn/lg∆), for all the integer values of the fan-in ∆ in the range 4 to O(lgn).


Proof    We use a ∆-ary tree to compute the carries. From [28] it is known that the carry chain can be computed based on an associative operator “❍”:

    (g, p) ❍ (g′, p′) = ( g ∨ (p ∧ g′), p ∧ p′ ) ,    (10)

and (G_i, P_i) = (g_i, p_i) ❍ (G_{i−1}, P_{i−1}) = (g_i, p_i) ❍ (g_{i−1}, p_{i−1}) ❍ … ❍ (g_0, p_0) for all 2 ≤ i ≤ n. It has been proven that c_i = G_i. Here g_i is the “carry generate”, p_i is the “carry propagate”, G_i can be imagined as a “block carry generate” (the G-functions, or “triangles” [32]), and P_i can be imagined as a “block carry propagate.” The “carry generate” is g_i = x_i ∧ y_i; for the “carry propagate” we can use either p_i = x_i ⊕ y_i or p_i = x_i ∨ y_i (see [15, 38]).

The solution lends itself quite easily to a mesh of binary trees, where each node is a (G_i, P_i) processor [28]. P_i is an AND function and can be implemented by one TG. G_i is an f∆ function (eq. 10), so it can also be implemented by one TG (Theorem 1). Replacing the AND-OR gates of [28] by TGs decreases the depth to 2lgn (instead of 4lgn). All the P_i TGs have 2 inputs; all the G_i TGs have 3 inputs (∆ − 1 inputs for ∆ = 4). The natural idea of increasing the fan-in can now be applied. As the G-functions (the “triangles” [32]) are F▲ functions, the depth for computing any of the carries will be:

    depth_CARRY(n, ∆) = ⌈ lgn / (lg∆ − 1) ⌉ = O( lgn / lg∆ ) .

The structure for computing all the carries is formed by a superposition of ∆-ary trees, each one using as many nodes as possible from the others. These “overlapping” trees will have a size lower than the product between the depth and the width of these trees (2n, as they have to cover the inputs):

    size_CARRY(n, ∆) < 2n ⌈ lgn / (lg∆ − 1) ⌉ = O( nlgn / lg∆ ) .

To compute ADDITION we need a first layer of 2n gates computing the g_i and p_i (with fan-in = 2), and one last layer computing the s_i (with fan-in = 4), if the TG circuit computing the carries is slightly modified in its output layer. These fan-in = ∆ + 1 (!) nodes will compute four functions (x_{i+1} ∧ y_{i+1} ∧ G_i, x̄_{i+1} ∧ ȳ_{i+1} ∧ G_i, x̄_{i+1} ∧ y_{i+1} ∧ Ḡ_i, x_{i+1} ∧ ȳ_{i+1} ∧ Ḡ_i) as in [15]. The depth is increased by 3 and the size by 7n (4n in the first layer, 2n in the modified last layer for computing the carries, and n in the last layer computing the s_i). If the area for implementing one f∆ function is proportional to the sum of its incoming weights (we neglect the area of the interconnections):

    A_TG(∆) = (1 + 1 + 2 + 3 + 2) + 10 · ∑_{i=0}^{∆/2−4} 2^i = 9 + 10 · (2^{∆/2−3} − 1) ≅ 5 · 2^{∆/2−2} ,

the area needed for ADDITION will be:

    A_ADD(n, ∆) = 5 · 2^{∆/2−2} · 2n ⌈ lgn / (lg∆ − 1) ⌉ + 3n + 6n + 2n · 5 · 2^{∆/2−2} + 4n ≅
                ≅ (5nlgn / (2lg∆)) · 2^{∆/2} + (5n/2) · 2^{∆/2} + 13n = O( (nlgn / lg∆) · 2^{∆/2} ) . ❏
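The associative operator of eq. 10 can be exercised with a plain serial fold (a sketch of ours; the paper's circuit evaluates the same prefixes with overlapping ∆-ary trees):

```python
def carries(xb, yb):
    """Carry bits c_i = G_i via the associative operator of eq. 10,
    evaluated here as a left-to-right fold."""
    combine = lambda a, b: (a[0] | (a[1] & b[0]), a[1] & b[1])   # eq. 10
    G, P, out = 0, 1, []                     # identity element: (0, 1)
    for xi, yi in zip(xb, yb):
        g, p = xi & yi, xi | yi              # carry generate / carry propagate
        G, P = combine((g, p), (G, P))
        out.append(G)                        # c_i = G_i
    return out

x, y, n = 397, 731, 11
xb = [(x >> i) & 1 for i in range(n)]
yb = [(y >> i) & 1 for i in range(n)]
c = carries(xb, yb)
s = [xb[0] ^ yb[0]] + [xb[i] ^ yb[i] ^ c[i - 1] for i in range(1, n)] + [c[n - 1]]
assert sum(b << i for i, b in enumerate(s)) == x + y
```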


Figure 3.    (a) The size of ADDITION for: 1 - carry look-ahead; 2 - Kelliher et al.; 3 - Wei and Thompson; 4 - Brent and Kung; 5 - prefix algorithms; 6 - Krapchenko; 7 - school method; as well as for BPVL at several fan-in values; (b) the depth for the same cases. The best previously known solution is drawn with a thicker line.

For ∆ = lgn: size_ADD(n, lgn) < 2nlgn/lglgn + 7n and the weights are polynomially bounded. This compares favorably with the other known solutions (Fig. 3a). Here BPVL comes from the names of the authors and is followed by the maximum fan-in. Still, the real advantage comes from the lower depth: depth_ADD(n, lgn) < lgn/(lglgn − 1) + 3 (Fig. 3b). In parentheses, the result that the G-functions (triangles) are linearly separable functions is not surprising, as they are increasing functions of their inputs; but no constructive proof had been given in the literature. As an example, one well-known article [15] has used TGs to simulate AND and OR gates. Our proof that the G-functions are linearly separable reduces the depth-7 construction presented there to depth-5, while keeping the same O(nlgn) size. The main advantage of our construction over classical AND-OR solutions is that we can increase the fan-in so as to compute “generate” and “propagate” by trees having blocks ∆ times larger. These results can be used to compute an “optimum” fan-in for the AT and AT² measures.

Corollary 3

The lowest values for the AT and AT² VLSI complexity measures of ADDITION are to be found in the close vicinity of ∆ = 4 and ∆ = 6…8, respectively.

Proof

These claims follow by simply computing the derivatives of AT and AT² with respect to ∆. To do so, the functions have to be considered continuous (by neglecting the ceilings), and that is why we cannot give an exact result, but only a “close vicinity.” ❏
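Under continuous estimates distilled from the proof of Theorem 2 (our simplification: T ≈ lgn/(lg∆ − 1), A ≈ 5n · 2^(∆/2−2) · T, lower-order terms dropped), the AT² optimum can be located numerically:

```python
from math import log2

def at2(n, d):
    # continuous estimates, ceilings dropped, lower-order terms ignored
    T = log2(n) / (log2(d) - 1)            # depth
    A = 5 * n * 2 ** (d / 2 - 2) * T       # area
    return A * T * T

n = 1024
best = min(range(4, 17), key=lambda d: at2(n, d))
assert best in (6, 7, 8)                   # the AT^2 optimum sits near 6..8
```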

The AT² measure can be seen in Fig. 4 for two different estimates of the area: (a) area ≈ size; and (b) the more precise area ≈ ∑_{all TGs} ∑_i (|w_i| + |t|). Simulation results are in agreement with the “optimal” theoretical values of the fan-in: see the plots for AT² in Fig. 4b being minimized by ∆ = 8 for a wide range of values of n. Varying the fan-in in the range 4 to O(lgn), the full set of solutions can be found. As we increase n the connectivity pattern, although very regular, becomes more and more complex (see Fig. 5), which suggests that we should start counting the area of the interconnections—which we have neglected (a valid assumption for typical size adders).


Figure 4.    The AT² complexity measure for ADDITION: 1 - carry look-ahead; 2 - Kelliher et al.; 3 - Wei and Thompson; 4 - Brent and Kung; 5 - prefix algorithms; 6 - Krapchenko; 7 - school method; as well as for BPVL at several fan-in values: (a) when using the estimates A ≈ size and T ≈ depth; (b) when using the estimates A ≈ ∑_{all TGs} ∑_i (|w_i| + |t|) and T ≈ depth.

4. CONCLUSIONS AND OPEN PROBLEMS

In this paper we have defined a class of linearly separable functions F▲ and proved that every f∆ ∈ F▲ can be easily constructed starting from the previous f∆−2 function by simply copying the weights. We have also proved a direct relation between the limited fan-in and the limited values of the weights and thresholds of f∆ functions, and shown that by taking alternating signs for the weights we can eliminate the thresholds (for f∆ functions). Finally we have used these results to enhance the known performances of ADDITION. Even if the weights of F▲ functions are of the same order of magnitude, 2^{∆/2}, the “addition” performed by these TGs is less “complex” than the addition of the inputs written in radix 2: 1, 1, 2, 3, 2, 5, 5, … versus 1, 2, 2, 4, 4, 8, … Further research should be pursued to find other classes of functions having the property of F▲ functions, and/or to find similar relations between low level parameters (like fan-in) and global ones (like area and time) for other classes of functions. It remains an open question how to further lower the depth, size and/or area of ADDITION. With respect to more precise VLSI estimates, one should include the area of the interconnections and take into account the additional delay introduced by long wires. We conclude that, as the fan-in influences not only the size and the depth of a TG circuit but also its range of weights and thresholds, it could be used by VLSI designers as a fine tuning parameter for both the area and the time performances of the final neural chip.

Figure 5.    The interconnection pattern for computing the carries when n = 64 and ∆ = 8.


REFERENCES [1] H.P. Graf, E. Sackinger, B. Boser and L.D. Jackel, Recent Developments of Electronic Neural Nets in USA and Canada, in: U. Ramacher, U. Rückert and J.A. Nossek, eds., Microelectronics for Neural Networks (Kyrill & Method Verlag, Münich, 1991) 471-488. [2] M.A. Holler, VLSI Implementations of Learning and Memory Systems: A Review, in R.P. Lipmann, J.E. Moody and D.S. Touretzky, eds., Advances in Neural Information Processing (Morgan Kaufmann, San Mateo, 1991) 993-1000. [3] S.T. Hu, Threshold Logic (University of California Press, Berkeley, 1965). [4] C.L. Sheng, Threshold Logic (Academic Press, New York, 1969). [5] A. Albrecht, On Bounded-Depth Threshold Circuits for Pattern Functions, in: I. Aleksander and J. Taylor, eds., Artificial Neural Networks (North-Holland, Amsterdam, 1992) vol. I, 135-138. [6] J. Bruck, Harmonic Analysis of Polynomial Threshold Functions, SIAM J. on Disc. Math. 3(2) (1990) 168-177. [7] J. Bruck and J. Goodman, On the Power of Neural Networks for Solving Hard Problems, J. of Complexity 6 (1990) 129-135. [8] J. Bruck and R. Smolensky, Polynomial Threshold Functions, AC0 Functions and Spectral Norms, SIAM J. Comput. 21(1) (1992) 33-42. [9] E. Mayoraz, On the Power of Networks of MAJORITY Functions, in: A. Prieto, ed., Lecture Notes in Computer Science 540 (Springer-Verlag, 1991) 78-85. [10] K.-Y. Siu and J. Bruck, On the Power of Threshold Circuits with Small Weights, SIAM J. on Disc. Math. 4(3) (1991) 423-435. [11] K.-Y. Siu and J. Bruck, On the Dynamic Range of Linear Threshold Elements, SIAM J. on Disc. Math. to appear. [12] N. Alon and J. Bruck, Explicit Construction of Depth-2 MAJORITY Circuits for COMPARISON and ADDITION, Tech. Rep. RJ 8300, IBM Research, Aug. 1991. [13] V. Beiu, J.A. Peperstraete and R. Lauwereins, Algorithms for Fan-In Reduction, in: Proc. IJCNN’92, Beijing, China (1992) vol. III, 204-209. [14] K.-Y. Siu and J. Bruck, Neural Computation of Arithmetic Functions, Proc. IEEE 78(10) (1990) 1669-1675. 
[15] K.-Y. Siu, V. Roychowdhury and T. Kailath, Depth-Size Tradeoffs for Neural Computations, IEEE Trans. on Comp. 40(12) (1991) 1402-1412. [16] R. Lauwereins and J. Bruck, Efficient Implementation of a Neural Multiplier, in: U. Ramacher, U. Rückert and J.A. Nossek, eds., Microelectronics for Neural Networks (Kyrill & Method Verlag, Münich, 1990) 217-230. [17] C.A. Mead and L. Conway, Introduction to VLSI Systems (Addison-Wesley, Reading, 1980). [18] J.D. Ullman, Computational Aspects of VLSI (Computer Science Press, Rockville, 1984). [19] D. Hammerstrom, The Connectivity Analysis of Simple Association, in: D.Z. Anderson, ed., Neural Information Processing Systems (American Institute of Physics, New York, 1988) 338-347. [20] R. Paturi and M. Saks, On Threshold Circuits for Parity, in: Proc. IEEE Symp. Found. Comp. Sci. (1990). [21] R.C. Williamson, ε-Entropy and the Complexity of Feedforward Neural Networks, in: R.P. Lipmann, J.E. Moody and D.S. Touretzky, eds., Advances in Neural Information Processing (Morgan Kaufmann, San Mateo, 1991) 946-952. [22] M.R. Walker, S. Haghighi, A. Afghan and A. Akers, Training a Limited-Interconnect, Synthetic Neural IC, in: D.S. Touretzky, ed., Advances in Neural Information Processing (Morgan Kaufmann, San Mateo, 1989) 777-784. [23] J.S. Shawe-Taylor, M. H.G. Anthony and W. Kern, Classes of Feedforward Neural Nets and Their Circuit Complexity, Neural Networks 5(6) (1992) 971-977.


[24] V. Beiu, J.A. Peperstraete and R. Lauwereins, Simpler Neural Networks by Fan-In Reduction, in: J.-C. Rault, ed., Neural Networks & Their Applications (EC2, Nanterre, 1992) 589-600. [25] V. Beiu, J. Peperstraete, J. Vandewalle and R. Lauwereins, Efficient Decomposition of COMPARISON and Its Applications, in: M. Verleysen, ed., European Symp. Artif. Neural Networks (D facto, Brussels, 1993) 45-50. [26] P.E. Wood, Jr., Switching Theory (McGraw-Hill, New York, 1968). [27] V. Beiu, J.A. Peperstraete and R. Lauwereins, Using Threshold Gates to Implement Sigmoid Nonlinearity, in: I. Aleksander and J. Taylor, eds., Artificial Neural Networks (North-Holland, Amsterdam, 1992) vol. II, 1447-1450. [28] R.P. Brent and H.T. Kung, A Regular Layout for Parallel Adders, IEEE Trans. Comp. 31(3) (1982) 260-264. [29] K. Hwang, Computer Arithmetic: Principles, Architecture and Design (John Wiley & Sons, New York, 1979). [30] H. Ling, High Speed Binary Adder, IBM J. Res. Develop. 25(3) (1981) 156-166. [31] S. Waser and M.J. Flynn, Introduction to Arithmetic of Digital Systems Designers (Holt, Rinehart and Winston, New York, 1982). [32] I. Wegener, The Complexity of Boolean Functions (John Wiley & Sons, New York, 1987). [33] T.P. Kelliher, R.M. Owens, M.J. Irwin, and T.-T. Hwang, ELM – A Fast Addition Algorithm Discovered by a Program, IEEE Trans. on Comp. 41(9) (1992) 1181-1184. [34] B.W.Y. Wei and C.D. Thompson, Area-Time Optimal Adder Design, IEEE Trans. on Comp. 39(5) (1990) 666-675. [35] P.K. Chang, M.D.F. Schlag, C.D. Thomborson and V.G. Oklobdzija, Delay Optimization of Carry-Skip Adders and Block Carry-Lookahead Adders Using Multidimensional Programing, IEEE Trans. on Comp. 41(8) (1992) 920-930. [36] T. Lynch and E.E. Swartzlander, Jr., A Spanning Tree Carry Lookahead Adder, IEEE Trans. on Comp. 41(8) (1992) 931-939. [37] N.T. Quach and M.J. Flynn, High-Speed Addition in CMOS, IEEE Trans. on Comp. 41(12) (1992) 1612-1615. [38] R.E. Ladner and M.J. Fischer, Parallel Prefix Computations, J. ACM 27(4) (1980) 831-838.
