Area Efficient Implementations Of Fixed-template

968

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: FUNDAMENTAL THEORY AND APPLICATIONS, VOL. 45, NO. 9, SEPTEMBER 1998

Transactions Brief

Area Efficient Implementations of Fixed-Template CNN’s

Mancia Anguita, Francisco J. Pelayo, Ignacio Rojas, and Alberto Prieto

Abstract—Implementations of fixed-template cellular neural networks (CNN’s) with reduced circuit complexity are presented. Considerable improvements in area without performance degradation have been obtained by: 1) using single-polarity signals, which reduce the number of transistors required for signal replication and for generating the pseudo-linear output function; 2) using simple current-mode circuits to implement the pseudo-linear output function; and 3) searching for network parameter configurations that solve a particular application with less hardware complexity when using the proposed circuit implementation. Experimental results for a CCD-CNN chip prototype with a density of 230 cells per square millimeter (mm²) are also reported.

Index Terms—Area-efficient implementation, fixed-template CNN, rewrite of cell state equation, simple circuits for the pseudo-linear function, single-polarity signals.

I. INTRODUCTION

The CNN model introduced by Chua and Yang [1] has been widely studied because of its interesting features, mainly for image processing tasks. A CNN is basically an array of locally interconnected analog processing elements, or cells, operating in parallel, whose dynamic behavior is determined by the cell connectivity pattern (neighborhood extent) and a set of configurable parameters. The time evolution of the state of a cell $c$ in an $N \times M$-cell CNN is described by the differential equation [1]

$$\tau\,\frac{dx_c(t)}{dt} = -x_c(t) + g_c(t) = -x_c(t) + \sum_{n \in N_R(c)} A_{n-c}\, y_n(t) + \sum_{n \in N_R(c)} B_{n-c}\, u_n + I, \qquad 1 \le c \le N M,\; |x_n(0)| \le 1,\; |u_n| \le 1 \qquad (1)$$

where $n$ denotes a generic cell belonging to the neighborhood of cell $c$, $N_R(c)$, with radius $R$. $N_1(c)$ is the set of $3 \times 3$ cells centered on $c$, $N_1(c) = \{c-N-1,\, c-N,\, c-N+1,\, c-1,\, c,\, c+1,\, c+N-1,\, c+N,\, c+N+1\}$; $N_2(c)$ is the set of $5 \times 5$ cells centered on $c$, and so on. $x_c$ is the state of cell $c$, and $y_n$ is the output of cell $n$, defined in terms of the nonlinear function

$$y_n = f(x_n) = \tfrac{1}{2}\left(|x_n + 1| - |x_n - 1|\right), \qquad y_n \in [-1, 1]. \qquad (2)$$

$u_n$ is the input to cell $n$, $I$ is an offset term, and the matrices $A$ and $B$ are called the feedback and control templates, respectively. Depending on the values of the cloning template components, the offset term, and the initial states, the resulting CNN is configured to perform a given processing task on the inputs.

Since the publication of the two papers by Chua and Yang [1], a number of VLSI approaches have been proposed to approximate the CNN model, as well as other network models inspired by the original one. Test results of various working chips [2]–[12] have also been reported, implementing CNN’s with either fixed [2], [5], [7], [11] or programmable parameters [3], [4], [6], [8]–[12]. Although programmable implementations offer, in general, a better performance/cost ratio, fixed-template CNN implementations are useful for carrying out specific processing tasks on images with a higher pixel density and speed than the programmable approach allows.

In order to reduce the silicon area of fixed-template CNN implementations, we have considered the following three points: 1) how to rewrite the cell state equation to obtain a simpler circuit implementation; 2) the use of simple circuits to implement the cell components; and 3) the search for new parameter configurations that lead to lower circuit complexity for the particular circuit implementation to be used. A translation of the cell state equation leading to a simpler CNN implementation is proposed in Section II [point 1 above]. Section III presents simple current-mode circuits to implement the pseudo-linear function [point 2]. Section IV gives examples of fixed-template CNN implementations using current mirrors and the cell output limiters presented in Section III. The parameter configurations used in these examples offer good efficiency (performance versus hardware cost) for the proposed circuit implementation [point 3]. Section IV also includes test results of a chip prototype for a CNN with eight cells implementing one of the examples. Finally, Section V presents some conclusions.

Manuscript received April 14, 1997; revised October 20, 1997. The authors are with the Departamento de Electrónica y Tecnología de Computadores, Facultad de Ciencias, Universidad de Granada, 18071 Granada, Spain (e-mail: [email protected]). Publisher Item Identifier S 1057-7122(98)06502-7.

II. CNN WITH SINGLE-POLARITY SIGNALS (STATE, INPUT, AND OUTPUT)

1057–7122/98$10.00 © 1998 IEEE

Fig. 1. Current-mode circuit proposed to implement the pseudo-linear function in a single-polarity CNN: (a) limiter configuration, (b) relationship between the limit current $2I_L$ and the limitation voltage $V_{2L}$, (c) input–output characteristic, and (d) HSPICE simulation of transistors M1 and M2.

The implementations of CNN chips in [2]–[12] use cell signals (input, state, and output) that take both positive and negative values. However, by adding shift values to the CNN cell input, state, and output so that only single-polarity signals are used, the area required to implement fixed-template CNN’s can be reduced significantly without losing CNN accuracy or functionality. Dense integrated implementations can be obtained because the number of transistors otherwise employed to obtain the limited output $y_c$ and the weighted terms [see (1)] is reduced by practically half. The error in copying a double-polarity signal depends on the errors of the two single-polarity signals (currents) needed to generate it, $\sigma^2(I_1 - I_2) = \sigma^2(I_1) + \sigma^2(I_2)$; this error is therefore greater than the error incurred when copying just a single-polarity signal. The designer may take advantage of this to improve the accuracy of the CNN implementation. Functionality, in turn, depends on the chosen CNN mathematical model and on the accuracy of the implemented circuit. With single-polarity signals the mathematical model does not change and the accuracy may even improve, so CNN functionality is not reduced compared with a double-polarity implementation.

In order to operate with only positive states, a reference shift current $I_R$ is added to the state current of each cell, $I_{x_c}$, such that $I_R \ge |I_{RN}|$, where $I_{RN}$ is the negative limit of the state current. According to the first work cited in [1], $|I_{RN}|$ has a known value for each parameter configuration:

$$|I_{RN}| = I_L + \left(\sum_{n \in N_R(c)} |A_{n-c}| + \sum_{n \in N_R(c)} |B_{n-c}|\right) I_L - I. \qquad (3)$$
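Equation (3) can be checked numerically. The Python sketch below is an illustration under stated assumptions, not the authors' design procedure: it evaluates $|I_{RN}|$ for a hypothetical 1 × 3 feedback template $(1, 2, -1)$ with $B = 0$ and $I = 0$, takes $I_L = 0.6$ µA (half of the $2I_L = 1.2$ µA limit current reported for the prototype), sets $I_R = |I_{RN}|$, and integrates (1) for a single cell with forward Euler (a simpler stand-in for the Runge–Kutta method used for Table I) while the neighbours hold the outputs that pull the state down hardest. The helper names `pwl_output`, `negative_state_limit`, and `min_shifted_state` are hypothetical.

```python
# Sketch (hypothetical helper names, illustrative values): check that the
# reference shift I_R = |I_RN| from (3) keeps a cell's shifted state current
# non-negative. Currents are in microamperes; time is in units of tau.


def pwl_output(x, i_l):
    """Pseudo-linear output (2) in current units: clamp x to [-I_L, I_L]."""
    return max(-i_l, min(i_l, x))


def negative_state_limit(a_weights, b_weights, i_l, i_bias):
    """|I_RN| = I_L + (sum|A| + sum|B|) * I_L - I, following (3)."""
    weight_sum = sum(abs(a) for a in a_weights) + sum(abs(b) for b in b_weights)
    return i_l + weight_sum * i_l - i_bias


def min_shifted_state(a_self, a_neigh, y_neigh, i_l, i_bias, i_r, x0,
                      h=0.05, steps=400):
    """Forward-Euler integration of (1) for one cell with B = 0.

    Neighbour outputs are held fixed at y_neigh. Returns the smallest value
    of the shifted state I_R + I_x reached along the trajectory.
    """
    x = x0
    lowest = i_r + x
    for _ in range(steps):
        neighbour_term = sum(a * y for a, y in zip(a_neigh, y_neigh))
        g = a_self * pwl_output(x, i_l) + neighbour_term + i_bias
        x += h * (-x + g)  # tau * dx/dt = -x + g, with dt = h * tau
        lowest = min(lowest, i_r + x)
    return lowest


I_L = 0.6             # µA: half of the 2*I_L = 1.2 µA limit current
A_SELF = 2.0          # hypothetical 1x3 feedback template (1, 2, -1)
A_NEIGH = (1.0, -1.0)
I_BIAS = 0.0          # offset term I; template B assumed to be zero

i_rn = negative_state_limit((A_NEIGH[0], A_SELF, A_NEIGH[1]), (), I_L, I_BIAS)
i_r = i_rn  # smallest reference shift with I_R >= |I_RN|

# Worst case for the lower bound: each neighbour output pulls the state down,
# and the cell starts from the most negative allowed state x(0) = -I_L.
y_worst = tuple(-I_L if a > 0 else I_L for a in A_NEIGH)
lowest = min_shifted_state(A_SELF, A_NEIGH, y_worst, I_L, I_BIAS, i_r, x0=-I_L)
print(f"|I_RN| = {i_rn:.2f} uA, lowest I_R + I_x = {lowest:.3f} uA")
```

With these assumed values the script reports $|I_{RN}| = 3$ µA, and the shifted state settles near 0.6 µA, about $I_L$ above zero, so along this particular trajectory the bound from (3) holds with some margin. Changing `A_SELF`, `A_NEIGH`, or `I_BIAS` repeats the check for other hypothetical templates.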

Moreover, to work with positive inputs and outputs, the current $I_L$ that limits these variables is added to them; that is, $0 \le I_L + I_{u_c} \le 2I_L$ and $0 \le I_L + I_{y_c} \le 2I_L$. In terms of the above shifted currents, the state equation of the Chua–Yang CNN model (1) can be rewritten as

$$\tau\,\frac{dI_{x_c}(t)}{dt} = -(I_R + I_{x_c}(t)) + I_R + I_{g_c}(t) = -(I_R + I_{x_c}(t)) + \sum_{n \in N_R(c)} (I_L + I_{y_n}(t))\, A_{n-c} + \sum_{n \in N_R(c)} (I_L + I_{u_n})\, B_{n-c} + I_{\mathrm{off}} \qquad (4)$$

where $I_{\mathrm{off}} = I_R + I - \left(\sum_{n \in N_R(c)} A_{n-c} + \sum_{n \in N_R(c)} B_{n-c}\right) I_L$.

In a similar way, the state equation of the FSR CNN model [4], [11] can be written as

$$\tau\,\frac{dI_{y_c}(t)}{dt} = -(I_L + I_{y_c}(t)) + I_L + I_{g_c}(t) = -(I_L + I_{y_c}(t)) + \sum_{n \in N_R(c)} (I_L + I_{y_n}(t))\, A_{n-c} + \sum_{n \in N_R(c)} (I_L + I_{u_n})\, B_{n-c} + I_o \qquad (5)$$

where $I_o = I_L + I - \left(\sum_{n \in N_R(c)} A_{n-c} + \sum_{n \in N_R(c)} B_{n-c}\right) I_L$.

In the FSR model, the cell state ($x$) coincides with the cell output ($y$); both therefore take values in the range $[-1, 1]$.

III. PROPOSED CURRENT-MODE CIRCUITS TO IMPLEMENT THE PSEUDOLINEAR FUNCTION

For a unipolar Chua–Yang CNN cell, the shifted cell output $I_L + I_y$ may be obtained from the shifted state $I_R + I_x$ using either of the two circuits in Figs. 1 and 2. In each of them, a current source subtracts $I_R - I_L$ from $I_R + I_x$, which limits the circuit output signal to a minimum value. The reference voltage $V_{2L}$ makes transistor M2 limit the output current to the maximum value $2I_L$ [see Figs. 1(c) and 2(c)]. In the configuration of Fig. 1, the gate voltage of transistor M1 ($V_y$) reproduces the limited output. In the circuit of Fig. 2, the M1 gate voltage reproduces the limited output when this transistor operates in saturation, but depends on both the M1 and M2 gate voltages when M1 works in its linear region. For an FSR-CNN cell the current source is not required, so the output limiter reduces to just transistors M1 and M2 in both circuit implementations. Figs. 1 and 2 also show HSPICE simulation results of transistors M1 and M2 of the limiter circuits. The limiter in Fig. 2

Fig. 2. Alternative current-mode circuit proposed to implement the pseudo-linear function in a single-polarity CNN. (a) Limiter configuration. (b) Relationship between the limit current $2I_L$ and the limitation voltage $V_{2L}$. (c) Input–output characteristic. (d) HSPICE simulation of transistors M1 and M2.

suffers from a strong Early effect, which can be reduced by replacing transistor M1 with a cascode configuration. Observe that both limiter circuits may also be used in double-polarity implementations by providing the appropriate shift sources at the limiter input and output. Monte Carlo simulations show a lower cell output error in CNN’s with the proposed limiters than with more conventional ones based on differential pairs [2], [8], [10] or on cascaded current mirrors [4]–[7]. In the simulations, deviations in the mobility and threshold voltage have been assumed for each transistor, with both Gaussian and uniform distributions. The values used to estimate the

parameter deviation are those given in [13] for transistors of 25 and 9 µm².

IV. CNN IMPLEMENTATION

A high-density current-mode fixed-template CNN can be implemented using the limiter circuits shown above for the pseudo-linear output function, and current mirrors to obtain and reproduce the single-polarity weighted terms [see the state equations (1), (4), and (5)]. FSR-CNN cells for connected component detection (CCD) and for shadow creation using the proposed circuit


Fig. 3. CMOS current-mode implementation of an FSR-CNN cell for CCD [panels (a)–(c)].

Fig. 4. CMOS current-mode implementation of an FSR-CNN cell for shadow creation. The use of 1 instead of 2 as values of a reduces circuit complexity but considerably worsens circuit error tolerance.

implementation are shown in Fig. 3 [15], [16] and Fig. 4 [16], respectively. To perform CCD, the parameter configuration proposed in [14] has been employed. Shadow creation configurations have been proposed in [17] and [18]; however, higher performance (illustrated by the simulation examples in Table I) and lower hardware complexity are obtained with the configuration shown in Fig. 4. More generally, an additional reduction in complexity for the circuit

implementation proposed above is obtained using CNN parameter configurations with a smaller number of parameters, integer and near-unity values, a higher number of parameters of equal value, and no template B. A CNN chip for CCD with eight cells based on the circuit in Fig. 3 has been designed, fabricated, and tested [16]. Each cell uses only 16 transistors, with a density of 230 cells per square millimeter

TABLE I
COMPARATIVE SIMULATIONS FOR THE SHADOW CREATION CONFIGURATIONS PROPOSED IN [16]–[18] THAT ILLUSTRATE THE ERROR TOLERANCE (IN THE WEIGHTED TERMS AND OFFSET) AND SPEED OF THE CNN PROGRAMMED WITH EACH CONFIGURATION. THESE SIMULATION RESULTS HAVE BEEN GENERATED FOR AN FSR-CNN (SIMILAR PERFORMANCE IS OBTAINED FOR THE CHUA–YANG MODEL) USING THE FOURTH-ORDER RUNGE–KUTTA METHOD WITH STEPS OF $T = 0.05\tau$ [$\tau$ IS THE CELL TIME CONSTANT; SEE (1), (4), AND (5)]. NETWORK OUTPUT AT TV INTERVALS IS SHOWN IN THE SIMULATIONS. THE MULTIPLICATIVE AND ADDITIVE ERRORS FOR I AND EACH WEIGHTED TERM HAVE BEEN GENERATED BY A UNIFORM DISTRIBUTION (UD). THE ERRORS FOR THE SAME CELL c ARE EQUAL IN THE SIMULATIONS OF DIFFERENTLY CONFIGURED NETWORKS (FOR THE TWO SIMULATIONS WITH MULTIPLICATIVE ERRORS, 1 AND 2, THEY ARE EQUAL FOR THE EXAMPLES WITH THE SAME LABEL, 1 OR 2). AS CAN BE DEDUCED, MULTIPLICATIVE ERRORS AFFECT THE OUTPUT IMAGE WHEN THE ERROR BOUND EXCEEDS 1/6 FOR THE PARAMETERS IN [16] AND [17] AND 1/7 FOR THE PARAMETERS IN [18]; ADDITIVE ERRORS AFFECT THE OUTPUT IMAGE WHEN IT EXCEEDS 1/3 FOR THE PARAMETERS IN [16] AND [17] AND 1/6 FOR THE PARAMETERS IN [18].

(without photosensors) being obtained for the 1.2-µm 2m–2p (2m: double metal; 2p: double poly) process of AMS (Austria Mikro Systeme International), using one metal layer for interconnections. If photosensors are included, a cell density of 160 cells per square millimeter is obtained. These densities are better than those of previous current-mode implementations for CCD (designs with double-polarity signals) in [2] (based on differential pairs and cascode current mirrors) and in [7] and [11] (based on cascode current mirrors). Without photosensors, the design in [2] employs 36 transistors per cell (31.39 cells per mm² in a 2-µm 2m–2p CMOS process), and the designs in [7] and [11] employ 40 transistors per cell (120 cells per mm² in a 1.6-µm 2m–1p process). Fig. 5 shows experimental measurements from the integrated prototype with 8 cells. We have verified correct network operation for supply voltages between 1.8 and 5 V and for current units ranging from nanoamperes to microamperes. Using a power-supply voltage of 1.8 V and a limit current of $2I_L = 1.2$ µA, about 3 µs is needed to obtain

the CCD computation in a row of 8 cells. Using a power supply of 3 V and a limit current of $2I_L = 1.2$ µA, the cell consumes 13 µW.

V. CONCLUSIONS

A considerable improvement in area for fixed-template CNN implementations without performance degradation has been obtained by: 1) using single-polarity signals, which reduce by approximately half the number of transistors required to obtain the weighted terms and the pseudo-linear output function; 2) using simple current-mode circuits to implement the pseudo-linear output function, based on two transistors for the FSR-CNN model with unipolar signals; and 3) searching for network parameter configurations for a particular application that require less hardware complexity with the proposed circuit implementation (configurations with fewer nonzero parameters, with integer and near-unity values, with a higher number of parameters of equal value, and without template B).


Fig. 5. Experimental measurements from the integrated CCD prototype with 8 cells. (a) Experimental measurements captured by an acquisition board. (b) Oscillogram showing cell speed for the last input test in (a). The upper plot shows the digital active-low signal that starts the CNN operation. The lower plot shows the shifted output $I_L + I_{y_c}$ of the 8th cell for this experiment. This current is converted to voltage by a 1-MΩ resistor.

The brief has also presented the particular design and experimental results of a chip with eight cells for CCD based on the proposed circuit implementation. This prototype achieves a cell density of 230 cells per mm² (16 transistors), improving on previous CCD-CNN implementations in [2] (31.39 cells per mm², 36 transistors) and in [7] and [11] (120 cells per mm², 40 transistors).

REFERENCES

[1] L. O. Chua and L. Yang, “Cellular neural networks: Theory,” IEEE Trans. Circuits Syst., vol. 35, pp. 1257–1272, Oct. 1988. See also “Cellular neural networks: Applications,” IEEE Trans. Circuits Syst., vol. 35, pp. 1273–1290, Oct. 1988.
[2] J. M. Cruz and L. O. Chua, “A CNN chip for connected component detection,” IEEE Trans. Circuits Syst., vol. 38, pp. 812–816, July 1991.
[3] H. Harrer, J. A. Nossek, and R. Stelzl, “An analog implementation of discrete-time cellular neural networks,” IEEE Trans. Circuits Syst., vol. 39, pp. 466–476, May 1992.
[4] A. Rodríguez-Vázquez, S. Espejo, R. Domínguez-Castro, J. L. Huertas, and E. Sánchez-Sinencio, “Current-mode techniques for the implementation of continuous and discrete time cellular neural networks,” IEEE Trans. Circuits Syst. II, vol. 40, pp. 132–146, Mar. 1993.
[5] J. E. Varrientos, E. Sánchez-Sinencio, and J. Ramírez-Angulo, “A current-mode cellular neural networks implementation,” IEEE Trans. Circuits Syst. II, vol. 40, pp. 147–156, Mar. 1993.
[6] M. Anguita, F. J. Pelayo, A. Prieto, and J. Ortega, “Analog CMOS implementation of a discrete-time CNN with programmable cloning templates,” IEEE Trans. Circuits Syst. II, vol. 40, pp. 215–219, Mar. 1993.
[7] S. Espejo, A. Rodríguez-Vázquez, R. Domínguez-Castro, J. L. Huertas, and E. Sánchez-Sinencio, “Smart-pixel cellular neural networks in analog current-mode CMOS technology,” IEEE J. Solid-State Circuits, vol. 29, Aug. 1994.
[8] M. Salerno, F. Sargeni, and V. Bonaiuto, “6×6DPCNN: A programmable mixed analog-digital chip for cellular neural networks,” in Proc. CNNA’96, Sevilla, Spain, June 1996, pp. 451–460.
[9] A. Paasio, A. Dawidziuk, K. Halonen, and V. Porra, “Current mode cellular neural network with digitally adjustable template coefficients,” in Proc. IEEE Microneuro’94, Turin, Italy, 1994, pp. 268–272.
[10] P. Kinget and M. S. J. Steyaert, “A programmable analog cellular neural network CMOS chip for high speed image processing,” IEEE J. Solid-State Circuits, vol. 30, Mar. 1995.
[11] S. Espejo, “VLSI design and modeling of CNN’s,” Ph.D. dissertation, Univ. Sevilla, Mar. 1994.
[12] M. Anguita, F. J. Pelayo, F. J. Fernandez, and A. Prieto, “A low-power CMOS implementation of programmable CNN’s with embedded photo-sensors,” IEEE Trans. Circuits Syst. I, vol. 44, pp. 149–153, 1997.
[13] M. J. M. Pelgrom, A. C. J. Duinmaijer, and A. P. G. Welbers, “Matching properties of MOS transistors,” IEEE J. Solid-State Circuits, vol. 24, pp. 1433–1439, Oct. 1989.
[14] T. Matsumoto, L. O. Chua, and H. Suzuki, “CNN cloning template: Connected component detector,” IEEE Trans. Circuits Syst., vol. 37, pp. 633–635, May 1990.
[15] M. Anguita, F. J. Pelayo, E. Ros, D. Palomar, and A. Prieto, “Focal-plane and multiple chip VLSI approaches to CNN’s,” in Analog Integrated Circuits and Signal Processing, to be published.
[16] M. Anguita, “Implementación de arquitecturas VLSI para Redes Neuronales Celulares (CNN’s)” [Implementation of VLSI architectures for cellular neural networks (CNN’s)], Ph.D. dissertation, Dep. Electrónica y Tecnología de Computadores, Universidad de Granada, Spain, July 1996.
[17] T. Matsumoto, L. O. Chua, and H. Suzuki, “CNN cloning template: Shadow detector,” IEEE Trans. Circuits Syst., vol. 37, pp. 1070–1073, Aug. 1990.
[18] L. O. Chua and P. Thiran, “An analytic method for designing simple cellular neural networks,” IEEE Trans. Circuits Syst., vol. 38, pp. 1332–1341, Nov. 1991.
