Tutorials and Reviews, International Journal of Bifurcation and Chaos, Vol. 9, No. 1 (1999) 1–48 © World Scientific Publishing Company
UNIVERSAL CNN CELLS

RADU DOGARU∗ and LEON O. CHUA
Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720, USA
∗Department of Applied Electronics and Information Engineering, Polytechnic University of Bucharest, Bvd. Iuliu Maniu, 1, Sect 6, Bucharest, Romania

Received February 5, 1998; Revised July 15, 1998

A cellular neural/nonlinear network (CNN) [Chua, 1998] is a biologically inspired system where computation emerges from a collection of simple nonlinear locally coupled cells. This paper reviews our recent research results, beginning from the standard uncoupled CNN cell, which can realize only linearly separable local Boolean functions, and ending with a generalized universal CNN cell capable of realizing arbitrary Boolean functions. The key element in this evolutionary process is the replacement of the linear discriminant (offset) function w(σ) = σ in the “standard” CNN cell in [Chua, 1998] by a piecewise-linear function defined in terms of only absolute value functions. As in the case of the standard CNN cells, the excitation σ evaluates the correlation between a given input vector u formed by the outputs of the neighboring cells, and a template vector b, which is interpreted in this paper as an orientation vector. Using the theory of canonical piecewise-linear functions [Chua & Kang, 1977], the discriminant function $w(\sigma) = z + z_0\sigma - s\sum_{k=1}^{m}(-1)^k|\sigma - z_k|$ is found to guarantee universality and its parameters can be easily determined. In this case, the number m of additional parameters and absolute value functions is bounded by $m < 2^n - 1$, where n is the number of all inputs (n = 9 for a 3 × 3 template). An even more compact representation where $m < n - 1$ is also presented, which is based on a special form of a piecewise-linear function; namely, a multi-nested discriminant: $w(\sigma) = s(z_m + |z_{m-1} + \cdots |z_1 + |z_0 + \sigma|||)$. Using this formula, the “benchmark” Parity function with an arbitrary number of inputs n is found to have an analytical solution with a complexity of only $m = O(\log_2 n)$.
1. Introduction
A cellular neural/nonlinear network (CNN) [Chua, 1998] is any discrete regular spatial architecture, such as a lattice, made of cells (described by a dynamical system) which are coupled to other neighbor cells within a prescribed “sphere of influence”. In most cases, the couplings are identical and the cells are described by nonlinear ordinary differential equations. This homogeneity is an essential feature for implementing a large number of cells in a given technology, in a manner similar to the design of computer memory chips. Unlike memory chips, however, where each cell is isolated from its neighbors but exchanges information with other cells via a central processor and certain digital controllers, in the CNN chips computation emerges by simply coupling the neighboring cells. Moreover, since the CNN cells are analog systems, the state space is continuous, and includes the binary computations as a special case. In fact, it was shown in [Chua, 1998] that any Cellular Automata [Toffoli & Margolus, 1987] with binary states can be realized as a special case of a Generalized Cellular Automata, which is essentially a CNN with a discrete-time loop which applies the output of each cell at time “t” to its input at the following time step “t+1”. Since all cells are operated simultaneously, the computation
speed of CNN devices can be at least a thousand times faster than the speed of current digital signal processors (DSP). The type of computation performed inside a CNN is a brain-like type of computation, i.e. it relies on the emergence and interactions of certain patterns of cell activities. Therefore the problem to be solved by a CNN is usually posed as follows: Find the cell structure and coupling parameters so that a desired pattern of interaction will emerge. For an important class of CNN systems, namely, Reaction–Diffusion CNNs, the local activity theory [Chua, 1998] can be used to find a well-defined domain in the cell parameter space, called an edge of chaos, where emergent computations typically occur [Dogaru & Chua, 1998b, 1998d]. Increased computing speed and versatility can be achieved by inventing a universal cell structure which can be programmed to implement any local Boolean logic operation directly. In Sec. 5 of this paper we will demonstrate that by using the canonical form for expressing piecewise-linear (PWL) functions introduced in [Chua & Kang, 1977], any local Boolean function can be represented via a generalized CNN cell; namely, a canonical PWL universal cell. Moreover, it is shown that the cell parameters can be easily determined via an analytical procedure. Since arbitrary Boolean functions can be directly represented, there is no need for the decomposition (via a CNN program called a chromosome [Chua, 1998]) required by the standard CNN cells, [Crounse et al., 1997], and therefore the computation speed is maximized. Moreover, the use of a piecewise-linear discriminant opens new and interesting perspectives towards understanding the relationship between the local function performed by the cell and the potential emergence of useful dynamics in the CNN formed by such coupled cells. 
Instead of a statistical analysis over the entire space of abstractly defined Boolean functions [Langton, 1990], a deterministic analysis based on the identification of failure boundaries [Chua, 1998] in the cell parameter space can lead to better insights. For a canonical PWL cell the failure boundaries in the cell parameter space can be identified, as well as their membership in certain behavioral classes of interest, such as the corresponding edge of chaos domain defined for Reaction–Diffusion CNNs. Compared to the standard CNN cell, the canonical PWL cell contains no additional multipliers but only an additional set of “m” absolute-value function terms, and the same number of additive threshold parameters. The value of m depends on the complexity of the local function and can range from 0 (for the case of linearly separable functions, where the canonical PWL cell reduces to the standard cell) to $2^n - 1$, where n is the number of cell inputs. It is shown that for the complex Boolean “Parity” function (with n inputs),¹ the complexity of a canonical PWL cell is m = O(n). The cell complexity m can be reduced even further by choosing a special form of a piecewise-linear discriminant function; viz. a multi-nested formula, as shown in Sec. 6. In this case, the number m of additional parameters is only linearly dependent on the number of inputs (i.e. 0 < m < n − 1, and m = O(n)) in the worst case. Using this approach, Boolean functions considered to be highly complex, such as the “Parity” function (with n inputs), were found to admit implementations with a sub-linear complexity; viz. $m = O(\log_2 n)$!

The structure of this paper is as follows: Section 2 introduces several issues of interest from the perspective of designing more versatile CNN cells; viz. the CNN cell structure, the cell requirements, and an overview of Boolean functions and their representations. Section 3 introduces the key concepts of orientation vector and projection tape, which are the basis for designing optimal piecewise-linear discriminant functions. Section 4 discusses methods for finding an optimal orientation which leads to the most compact solution, and which minimizes the number m of additional parameters. Using a valid projection tape, a canonical piecewise-linear discriminant function can be designed and further optimized, as shown in Sec. 5. The most compact CNN cell, based on a multi-nested discriminant function, is presented in Sec. 6.
¹ The “Parity” Boolean function is known for its “complexity” and is usually used as a “benchmark” in evaluating the complexity of adaptive nonlinear systems [Hassoun, 1994].

2. Preliminaries

Throughout this paper we will restrict ourselves to the case of an uncoupled CNN cell [Chua, 1998] described by the following equations:

Standard CNN cell

$$\sigma = \sum_{k,l \in \{i-1,i,i+1\}\times\{j-1,j,j+1\}} b_{kl} u_{kl} = \sum_{i=1}^{9} b_i u_i \tag{1}$$

$$\dot{x}_{ij} = -x_{ij} + a_{ij} f(x_{ij}) + \sigma + z \tag{2}$$

$$y_{ij} = f(x_{ij}) = \frac{1}{2}\left(|x_{ij} + 1| - |x_{ij} - 1|\right), \tag{3}$$
where {i, j} are two integer labels indicating the position of the CNN cell $C_{i,j}$ within a two-dimensional grid [Chua, 1998], {k, l} are similar indices indicating the position of the neighboring cells, $u_{kl}$ represents the nine inputs coming from the cell itself and from its eight neighbors, $x_{ij}$ is the scalar state variable associated with the CNN cell, and $y_{ij}$ is the associated output. The scalar variable σ is called an excitation, and in the case of the standard CNN cell, it is computed as a linear correlation between the feed-forward (controlling) template vector $b = [b_1, \ldots, b_n]$, which is a repacked version of the B template [Chua, 1998], and its associated input vector $u = [u_1, u_2, \ldots, u_n]$, as defined in (1). The second notation, with the index “i” replacing the pair of indices {k, l}, is more general and can be applied to arbitrary choices of CNN architectures (e.g. spherical as in a C$_{60}$ molecule) and spheres of influence. From this perspective, n represents the number of cell inputs and $R^n$ is the cell input space. It was proved in [Chua, 1998] that when the central (self) feedback coefficient $a_{ii} > 1$, and $a_{ij} = 0$ for $i \neq j$, the cell dynamics starting from $x_{ij}(0) = 0$ converges towards a stable steady state for which $y_{ij}(\infty) = \mathrm{sgn}(\sigma + z)$. The “infinity” symbol here denotes the time for the dynamics to reach a steady state output, and represents a small transient period (settling time) which depends on the implementation technology. In the case of current VLSI technology, it is usually on the order of nanoseconds ($10^{-9}$ s). It follows that an uncoupled CNN cell maps a continuous input space $R^n$ into a binary (Boolean) output space. In the special case where the inputs are binary, the cell can realize various local Boolean logic functions.
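The convergence statement above can be illustrated numerically. The following is a minimal sketch, not part of the original paper: it integrates Eq. (2) with forward Euler from x(0) = 0 and compares the settled output of Eq. (3) with sgn(σ + z). The self-feedback value a = 2, the step size, the number of steps, and the function names are all assumptions chosen only for illustration.

```python
def f(x):
    """Standard CNN output nonlinearity, Eq. (3)."""
    return 0.5 * (abs(x + 1.0) - abs(x - 1.0))


def steady_state_output(sigma, z, a=2.0, dt=0.01, steps=5000):
    """Integrate Eq. (2) from x(0) = 0 with forward Euler and return y = f(x).

    a, dt and steps are illustrative assumptions (a > 1 is required by the text).
    """
    x = 0.0
    for _ in range(steps):
        x += dt * (-x + a * f(x) + sigma + z)
    return f(x)


def sgn(v):
    """Sign function for nonzero arguments."""
    return 1.0 if v > 0 else -1.0


# Compare the simulated steady-state output with sgn(sigma + z).
samples = [(3.0, -1.0), (-5.0, 2.0), (0.5, -0.4), (1.0, -1.5)]
results = [(steady_state_output(s, z), sgn(s + z)) for s, z in samples]
for (s, z), (y, expected) in zip(samples, results):
    print(f"sigma={s:+.1f} z={z:+.1f} y={y:+.3f} sgn={expected:+.0f}")
```

The case σ + z = 0 is deliberately left out, since the sign is then undefined and x(0) = 0 is an equilibrium of the sketch.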
Since the excitation σ in the standard CNN cell in [Chua, 1998] is obtained as a linear correlation between the inputs and the b template vector, the standard CNN cell can implement only a limited number of local Boolean functions, viz. the linearly separable ones. Any Boolean function that is not linearly separable must be realized using a series of templates (called a chromosome in [Chua, 1998]) and implemented via a CNN universal chip. In what follows we will show that by replacing σ with a nonlinear discriminant function $w(\sigma, z_0, \ldots, z_m, s)$ of σ, any Boolean function
can be realized with the same cell structure defined by Eqs. (1)–(3). The most important result is that our generalized cell model requires no additional correlation unit except the one already used by the standard CNN cell defined in (1). The formula for the generalized cell, henceforth referred to as a “universal CNN cell” in the sense that each of the $2^{2^9} = 2^{512} \approx 10^{154}$ local Boolean functions of nine input variables $(u_1, u_2, \ldots, u_9)$ can be realized by this generalized CNN cell, is given below:

Universal (Piecewise-linear) CNN cell

$$\sigma = \sum_{k,l \in \{i-1,i,i+1\}\times\{j-1,j,j+1\}} b_{kl} u_{kl} = \sum_{i=1}^{9} b_i u_i \tag{1′}$$

$$\dot{x}_{ij} = -x_{ij} + a_{ij} f(x_{ij}) + w(\sigma, s, z_0, z_1, \ldots, z_m) \tag{2′}$$

$$y_{ij} = f(x_{ij}) = \frac{1}{2}\left(|x_{ij} + 1| - |x_{ij} - 1|\right) \tag{3′}$$

where for convenience the discriminant function w (also called an offset in standard CNN cells [Chua, 1998]) is a nonlinear piecewise-linear (PWL) function to be defined in Secs. 5 and 6, and $\{s, z_0, \ldots, z_m\}$ is an additional set of m + 2 parameters (compared to the standard CNN cell). We must stress that (1′)–(3′) is not the only model capable of realizing a universal CNN cell. However, this model has the advantage of simplicity and tractability due to its piecewise-linear nature. Another advantage is that it includes the standard CNN cell as a special case and therefore can be considered as a natural extension of the standard CNN cell. Moreover, it is important to observe that in our approach, (1′) performs a dimensionality reduction via a projection from the n-dimensional input space to a scalar, one-dimensional axis (the projection axis) corresponding to the excitation variable σ. As a consequence, the determination of the nonlinear discriminant function w(σ) is dramatically simplified, as shown in Secs. 5 and 6. By additional optimization of the orientation vector $b = [b_1, \ldots, b_n]$ associated with the projection axis, an optimal, or near-optimal, template $b^* = [b_1^*, \ldots, b_n^*]$ can be found to minimize the number m of additional parameters. Since we have assumed that $a_{ii} > 1$, and $a_{ij} = 0$ for $i \neq j$, it follows from Eqs. (2′) and (3′) that:

Steady state CNN output equation

$$y_{ij}(\infty) = \mathrm{sgn}(w(\sigma)) \tag{4′}$$
The following is a simplified model of the canonical PWL universal cell, where only the input–output relationship is retained for the purpose of this paper. In any hardware realization, however, one must include the small (approximately $10^{-9}$ s via current VLSI technologies) transient time associated with the dynamics described by Eq. (2′):

Universal (Piecewise-linear) CNN cell: Steady State Input–Output Representation

$$\sigma = \sum_{k,l \in \{i-1,i,i+1\}\times\{j-1,j,j+1\}} b_{kl} u_{kl} = \sum_{i=1}^{9} b_i u_i \tag{1″}$$

$$y = \mathrm{sgn}[w(\sigma, s, z, z_1, \ldots, z_{m-1})] \tag{2″}$$
2.1. What would be an “ideal” CNN cell?

An ideal CNN cell is an abstract concept, providing a reference for comparing various CNN cell models. As shown in this paper, some of the features of an ideal cell may conflict with each other in practice so that certain tradeoffs between them may be necessary. The following are the most important features of an “ideal” cell:

2.1.1. Universality

Universality is concerned with the possibility of using the same physical structure of the CNN cell for implementing arbitrary Boolean functions, by simply changing (programming) the cell parameter values. It is also important that the ideal cell contains as few cell parameters as possible. As a general rule, a universal cell is less compact (i.e. it has more cell parameters and a higher complexity in physical implementation) than a cell dedicated to implementing a specific function, or a restricted class of functions. For example, if one would like to use a CNN cell only for implementing the parity function with nine inputs² (Parity9), the best choice (leading to the simplest realization) for w(σ) would be:

$$w_{PAR9}(\sigma) = 1 - |-2 + |-4 + |-8 + |-7 + \sigma||||, \tag{5}$$

with

$$b = [1, 1, 1, 1, 1, 1, 1, 1, 1]$$

where we have repacked the nine entries in the B template into a “ribbon” as detailed in [Chua, 1998]. By choosing other b templates (orientation vectors), and/or by changing the numerical parameters within $w_{PAR9}(\sigma)$, we can realize many other Boolean functions. However, the cell defined by $w_{PAR9}(\sigma)$ in (5) is not universal, even if the constant parameters are changed, because it cannot realize certain Boolean functions with nine inputs regardless of the parameters. Nevertheless, compared to the standard CNN cell defined by w(σ) = σ + z, the cell model (5) is much closer to universality since it can realize many linearly nonseparable Boolean functions.

² The Parity9 function returns 1 if and only if there is an odd number of active ($u_i = 1$) inputs, and −1 otherwise.

2.1.2. Compactness

This feature is concerned with the number of elementary physical units required to implement a CNN cell. There are two aspects influencing compactness: the number of free parameters and the complexity of the nonlinear discriminant function. Although the standard linear cell is not universal, it is the most compact: it has a realization requiring only n synapses associated with the linear correlation $\sigma = bu^T$ and an additional threshold parameter. Compactness can be expressed as a function of n (the number of cell inputs). From this perspective, we are interested in determining universal cells whose implementation complexity has a polynomial rather than an exponential dependence on n. As shown in Sec. 6, compactness of a universal CNN cell can be pushed to its limits in the case of the universal multi-nested PWL CNN cell, where universality is achieved with only 2n + 1 parameters. Compactness is a feature which conflicts with robustness. The more compact a cell is, the less robust it will be. The reason is that any local Boolean function with “n” inputs requires $N = 2^n$ bits to be unambiguously specified. A compact universal cell has to preserve this information in its definition and therefore it must distribute it among the cell parameters. For example, an arbitrary (local) Boolean function with nine inputs (e.g. the standard local logic for two-dimensional CNNs) requires 512 bits to be specified. For the case of the multi-nested PWL cell (to be presented in Sec. 6), where universality is achieved with 2n + 1 parameters, each parameter requires, on average, at least 512/(2·10) ≈ 26 bits. At the opposite extreme, the pyramidal universal cell in [Dogaru et al., 1998e] requires $2^n$ parameters: in this case, since each parameter is associated with only 1 bit of information, the realization is maximally robust but clearly not compact at all.
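The Parity9 discriminant of Eq. (5) can be checked exhaustively. The sketch below is an illustration rather than part of the original paper: it enumerates all 512 input vectors in {−1, +1}⁹, computes σ with the all-ones orientation vector b, and compares sgn(w_PAR9(σ)) with a directly computed parity. The helper names `w_par9`, `parity`, and `cell_output` are hypothetical.

```python
from itertools import product


def w_par9(sigma):
    """Multi-nested discriminant of Eq. (5)."""
    return 1 - abs(-2 + abs(-4 + abs(-8 + abs(-7 + sigma))))


def parity(u):
    """+1 if an odd number of inputs equal +1, otherwise -1 (footnote 2)."""
    return 1 if sum(1 for ui in u if ui == 1) % 2 == 1 else -1


def cell_output(u, b):
    """Steady-state output y = sgn(w(sigma)) with sigma = b . u."""
    sigma = sum(bi * ui for bi, ui in zip(b, u))
    return 1 if w_par9(sigma) > 0 else -1


b = [1] * 9  # the "ribbon" form of the all-ones B template
mismatches = sum(
    cell_output(u, b) != parity(u) for u in product((-1, 1), repeat=9)
)
print("mismatches over all 512 inputs:", mismatches)  # prints 0
```

With this b, σ only takes the ten odd values −9, −7, …, 9, and w_PAR9 alternates between −1 and +1 on them, which is why four absolute-value terms suffice.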
2.1.3. Robustness

A CNN cell is defined by a certain set of real parameters, as shown in (1′)–(3′). When both the input and the output spaces are discrete (in our case, binary), the parameter space associated with a given cell structure is partitioned into compact domains separated by failure boundaries [Chua, 1998]. Each domain corresponds to the realization of a particular Boolean function and therefore there is a continuum of cell parameters, all of them realizing the same Boolean function. A visualization of the failure boundaries for the case of a simple standard CNN cell is given in [Chua, 1998]. Within each region, the most robust set of parameters can be determined as the point in the cell parameter space which maximizes its distance from the failure boundary. The degree of robustness of this cell parameter point depends on the volume of its associated region. Since the profile of the failure boundaries is determined by the nonlinear function w(σ) in (2′), it follows that the robustness of a CNN cell is closely related to w(σ). Observe that universality and robustness are conflicting features. The more functions we can represent with the same structure, the more domains have to “compete” for the same parameter space and thus the less robust each function will be. Fortunately, robustness can be improved by increasing the dimension of the parameter space. However, in this case we have to accept a reduction in the cell compactness, as illustrated above. It is important to note that in general, the shape of the regions separated by the failure boundaries in a high-dimensional parameter space is difficult to determine analytically. Therefore, a robust solution is relatively difficult to define in analytical terms. In the case of the canonical PWL universal cell (to be presented in Sec. 5), however, an analytical robust solution can be easily identified in the $\{s, z, z_1, \ldots, z_{m-1}\}$ cell parameter subspace since the failure boundaries defined by w(σ) are simply points and the separating regions are simply segments in the one-dimensional space of the projection tape.
2.1.4. Capability of evolution This last feature is concerned with the capability of a cell to adjust its parameters so that it can “learn” new functions by evolutionary interactions with the external world. Basically this feature assumes that the design (or learning) algorithm is simple enough to admit an on-chip implementation so that the cell can rapidly adapt to novel tasks. Here, by evolution we mean mutations that may take place in the cell parameter space so that the cell can realize novel Boolean functions. There are two major components; namely, evolution by design and evolution by interactions. The first case corresponds to the genetic inheritance in the living systems, and is achieved by loading the CNN cell with a particular gene [Chua, 1998], chosen from a genome (previously determined by a human expert) so that the associated CNN will perform a prescribed task (e.g. contour detection). While any useful CNN cell model should provide a cell parameter identification (design) procedure, in many cases the following aspect of evolution is usually neglected. Evolution by interactions assumes that cells are allowed to mutate (change) their parameters as a result of their interaction with other cells, or with certain input stimuli provided by a specific problem. The goal is to optimize a cost function that cannot be described analytically in terms of a local Boolean function. For example, in defining the function of a “corner detector” [Chua, 1998] there are certain subjective issues which may lead human experts to generate different CNN genes to implement such subjective tasks. In such cases, one may consider a set of images with various types of corners and let the CNN cell evolve towards an optimum gene which minimizes the error between the desired output images and the actual ones. In this paper we will focus on more versatile CNN genes [Chua, 1998] that can support evolution by interactions. 
Such features may be also of interest from the perspective of building evolvable systems [Mange & Tomassini, 1998] capable of complex tasks such as self-repair and self-reproduction of their components.
2.2. Local binary computation Local computation in a CNN is a special case of Boolean computation where the inputs have some particular topological significance with respect to the CNN grid. However, at the cell level, the grid topology can be ignored and therefore the task is
simply to represent a Boolean function of the cell inputs. Since the CNN is made of many identical cells, we would like to invent CNN cells that are as compact as possible. For versatility, we would also like the cells to be universal. Indeed, let us consider again the Parity9 function. Using the standard digital design approach, 8 XOR gates with 2 inputs each are required to implement this Boolean function. However, using the cell realization (5), the number of active devices is reduced to only 4, corresponding to the realization of the absolute value functions. In the standard digital design, the addition of universality will lead to complicated hardware realizations, a universal cell from this perspective being nothing else than a RAM with $2^n$ bits. We will show in this paper that an analog approach will lead to much more compact realizations of Boolean functions. In what follows we will consider the general case of implementing arbitrary Boolean functions of n input variables. According to the convention used in the CNN literature, a “0” (or “false”) logic level is coded with −1, while a “1” (or “true”) logic level is coded with +1. Let us now consider several methods for representing a Boolean function:
2.2.1. Using a truth table

A truth table has $N = 2^n$ rows, corresponding to the same number of possible configurations of the input vector $u = [u_1, u_2, \ldots, u_n]$. For each possible input configuration, the binary output (−1 or 1) associated with the Boolean function is presented in an additional column Y. In principle, as long as both inputs and the output are presented in the table, there is no need for a special ordering of rows and columns. However, it is a common practice to accept a certain ordering preference. We will adopt the system in [Chua, 1998], where the leftmost column contains the most significant input bit and the rightmost column the least significant one. Therefore, in the first row all inputs are −1, while they are all +1 in the last row of the table. It is convenient to associate an index j to each row, where $j = 0, \ldots, 2^n - 1$, with the convention that j = 0 corresponds to the last row in the table. For the above ordering convention the following equation defines entirely the input entries in the truth table:

$$u_{i,j} = \begin{cases} +1 & \text{if } \operatorname{mod}(j, 2^{n-i+1}) < 2^{n-i} \\ -1 & \text{else} \end{cases} \tag{6}$$

where mod(m, n) is the remainder of m/n. The binary output corresponding to each row j is denoted as $\gamma_j \in \{-1, 1\}$.
2.2.2. As a decoding tape

Assuming the ordering scheme described above, one should note that there is no need to write down the input entries. Therefore, a much more compact representation of a Boolean function was proposed in [Chua, 1998] in the form of an N-dimensional vector $Y(ID) = [\gamma_{N-1}, \gamma_{N-2}, \ldots, \gamma_0]$, where $N = 2^n$. Using the color code in [Chua, 1998] (red for $\gamma_j = +1$, and blue for $\gamma_j = -1$), each Boolean function can be represented as a colored strip called a decoding tape, which is reminiscent of a gene in biological systems. The decoding tape is a useful concept, giving us a one-dimensional view of the Boolean function rather than a complicated description in the form of a truth table or a spatial representation. In fact, as we will see in the next subsection, the decoding tape can be treated as a special case of a more general construct called a projection tape. The integer ID is a function identification number which is the decimal equivalent of the binary vector Y. Let us consider the Boolean function “Parity3”.³ Its associated decoding tape is Y(105) = [−1, 1, 1, −1, 1, −1, −1, 1]. The equivalent binary number is obtained by substituting −1 with 0. The resulting equivalent binary string is given by: $01101001_2 = 1 + 2^3 + 2^5 + 2^6 = 105_{10}$.
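The row ordering of Eq. (6) and the decoding-tape identifier can be reproduced in a few lines. This is a minimal sketch under the conventions stated above (row index j = 0 is the last row, inputs coded as ±1); the function names are hypothetical and not taken from the paper.

```python
def truth_table_inputs(n):
    """Input rows for j = 2**n - 1 down to 0 (top to bottom), from Eq. (6)."""
    rows = []
    for j in range(2**n - 1, -1, -1):
        row = [
            1 if (j % (2 ** (n - i + 1))) < 2 ** (n - i) else -1
            for i in range(1, n + 1)
        ]
        rows.append(row)
    return rows


def decoding_tape(func, n):
    """Y = [gamma_{N-1}, ..., gamma_0] for a Boolean function on {-1, +1}^n."""
    return [func(u) for u in truth_table_inputs(n)]


def function_id(tape):
    """Decimal value of the tape after substituting 0 for -1."""
    return int("".join("1" if g == 1 else "0" for g in tape), 2)


def parity(u):
    """+1 if an odd number of inputs equal +1, otherwise -1."""
    return 1 if sum(1 for ui in u if ui == 1) % 2 == 1 else -1


tape3 = decoding_tape(parity, 3)
print(tape3, function_id(tape3))  # [-1, 1, 1, -1, 1, -1, -1, 1] 105
print(function_id(decoding_tape(parity, 2)))  # 6
```

The first row produced is all −1 and the last is all +1, matching the ordering adopted from [Chua, 1998].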
2.2.3. As a hyper-dimensional hypercube

In the input space, each vector $u_j = [u_{1,j}, u_{2,j}, \ldots, u_{n,j}]$ corresponds to a vertex $V_j$ of an n-dimensional hypercube. By assigning to each vertex a binary value $Y(V_j) = \gamma_j$, the result is a hyper-spatial representation of the Boolean function Y as a geometrical object. To specify a Boolean function we use the color red to code all vertices corresponding to $\gamma_j = 1$ and we use the color blue to code all vertices corresponding to $\gamma_j = -1$. For example, in Fig. 1, the Boolean “Parity2” function (ID = 6) with two inputs is represented by the colors of $2^2 = 4$ vertices of a square. In Figs. 2 and 3, two other Boolean functions with three inputs (ID = 105, and ID = 142) are represented via the $2^3 = 8$ vertices of a cube. The entire set of 256 Boolean functions with three inputs is represented graphically by a cube in $R^3$ in [Chua, 1998]. The spatial representation of a Boolean function can be extended to any type of data, including ill-defined problems normally arising from various pattern classification tasks. In such cases, each vertex $V_j$ will correspond to a stimulus-desired output pair, where the stimulus is the input vector $u_j = [u_{1,j}, u_{2,j}, \ldots, u_{n,j}]$ and the desired output is $Y(V_j) = \gamma_j$. Since the procedures for the cell design described in Secs. 4 and 5 can be extended to arbitrary ill-defined input–output mappings, the canonical PWL CNN cell can be applied not only for implementing arbitrary Boolean functions, but also for various pattern classification and signal processing tasks. Such cells actually represent a more compact and simpler design alternative to adaptive structures, such as multilayer perceptrons (MLP) [Hassoun, 1995], or other types of neuro-fuzzy networks.

³ The Parity3 function returns +1 if and only if one or three inputs are +1 while the remaining one(s) are −1.
3. Orientations and Projection Tapes

The main advantage of the spatial representation of a Boolean function as a hypercube with color-labeled vertices is that it can provide hints on how to separate the “blue” vertices from the “red” ones, i.e. how to define a valid discriminant function w(u). It is clear that for any Boolean function, one can define an infinite number of nonlinear discriminant functions to separate the red vertices from the blue vertices. The universality requirement imposes that a canonical formula should describe it so that any change in its associated Boolean realization must correspond only to a change of the parameters. It is also desirable that these parameters be determined via a simple and fast algorithm. The general equation of the discriminant hypersurface w(u) = 0 for input space dimensions larger than n = 2 leads to some complicated topologies, making it rather difficult to map into a canonical piecewise-linear formula [Kahlert & Chua, 1992] with an arbitrary number of inputs. In the case of linearly separable Boolean functions, various convergent solutions exist, such as the classical “perceptron learning” algorithm, the LMS algorithm [Hassoun, 1995] or the use of such linear programming techniques as the Simplex algorithm. In this case, w(u) = 0 corresponds to the equation of a hyperplane: $b_1 u_1 + b_2 u_2 + \cdots + b_n u_n + z = 0$.

In the case of linearly nonseparable functions, the main difficulty in deriving a discriminant function analytically can be traced to the high dimensionality of the input space, which gives rise to complicated geometrical shapes which are difficult to interpret. Multilayer perceptrons [Hassoun, 1995] combine linear threshold gates (also called neurons) so that the discriminant functions are obtained by intersecting a certain number of (nonparallel) hyperplanes, each of which is associated with a particular neuron unit in a layer. The multilayer perceptron approach has two disadvantages when considered from the perspective of a CNN cell implementation:

(i) The structure of a multilayer perceptron is not compact; each hyperplane is associated with a correlation (neuron) unit of the type described by Eq. (1) and since the hyperplanes are not parallel there exists no exact technique to predict the exact number of such units required by a specific problem. Therefore, a physical cell containing the maximal number of units (given by certain upper bounds as shown in [Hassoun, 1995]) must be realized in order to ensure that any Boolean function can be realized via such networks. Since it is known that for hard problems the number of neuron units can increase exponentially with the number of inputs, the multilayer perceptron solution is definitely not compact.

(ii) The complexity of the design algorithm (called a learning algorithm in the multilayer perceptron literature) also increases with the (previously unknown) number of units in the network and there is no guarantee that it will lead to a locally optimal solution, let alone the global optimal solution. In many cases, such algorithms will not converge to a valid realization, indicating that the assumed structure (number of units/layers) must be changed via some complicated algorithm which is not well suited for “on-chip” implementation solutions.

In what follows we will demonstrate that all of these drawbacks can be overcome by taking a different approach to finding the discriminant function. According to our approach, (1) provides a unique projection from the n-dimensional input space into a one-dimensional real axis. This projection corresponds to only one neuron unit, exactly as in the case of the (linear) perceptron. What changes with respect to the (linear) perceptron is the discriminant function. By choosing a canonical piecewise-linear (PWL) function [Chua & Kang, 1977], a simple design algorithm can provide a guaranteed solution for realizing any Boolean function. It is just a matter of additional optimization to find a better orientation vector so that the number of absolute value function terms in the PWL discriminant is minimized. In any case, a guaranteed solution always exists and by additional optimization, an extremely compact realization can be achieved.

Fig. 1. Piecewise-linear realizations of the Boolean function “Parity2” with two variables. The upper plot represents the input square, where each vertex represents a particular input vector and their color assignments define the output Boolean function. The projection tape associated with the orientation vector b and the orthogonal projections of the four vertices $V_0$, $V_1$, $V_2$ and $V_3$ on the tape are transcribed onto a horizontal axis in the lower plot. (a) A valid projection tape with 2 transitions for the Boolean function Y = [−1, 1, 1, −1] (ID = 6, or “Parity2”), defined by the default orientation vector b = [2, 1]. (b) Another valid projection tape with 2 transitions, defined by the orientation vector b = [1, 1]. Observe that in this case, two different vertices ($V_1$ and $V_2$) having the same color in the input space are projected onto the same point on the projection tape. (c) The orientation vector b = [0.5, 1] also leads to a valid canonical piecewise-linear realization of the “Parity2” Boolean function, using only one absolute value function. (d) The orientation b = [0, 1] leads to a conflicting situation: vertices from the input square having different colors are projected onto the same point on the projection tape. Such an orientation vector is not acceptable since it leads to an invalid projection tape.

By using the canonical PWL cell model (1′)–(3′), the complex separating hypersurface w(u) = 0 is replaced by a collection of parallel hyperplanes: $\sigma = bu^T = t_k$, $k = 1, \ldots, tr$, where “tr” denotes a positive integer called the “transition” number to be defined below, and where $t_k$ is one of the “tr” real roots of the scalar nonlinear equation w(σ) = 0. The sign of the output function will remain invariant for any input vector which lies between such planes, but changes to the opposite sign when an input vector crosses a separating hyperplane [as shown in Fig. 2(d)]. From this perspective, the class of linearly separable functions is just a special case, when there is only one transition; namely, tr = 1. In what follows we will show that for any Boolean function there is at least a default orientation so that a solution based on “tr” separating hyperplanes always exists. Moreover, the number “tr” of parallel separating hyperplanes can be minimized by additional optimization of the orientation vector so that, if the function is linearly separable, the algorithm converges to tr = 1. This approach not only allows for a simplification of the discriminant function design, but also gives a much more natural perspective on the distinction between a linearly separable and a linearly nonseparable Boolean function. In fact, the optimal transition number “tr” (i.e. the minimum number of separating hyperplanes) can be used to characterize the complexity of a Boolean function, or of any problem specified by a set of input–output samples.

Remark. Since any of the separating hyperplanes is defined by an equation $\sigma = bu^T = t_k$, it follows that the orientation vector b is perpendicular to these planes. Indeed, let us consider an arbitrary
Fig. 2. (following pages) Transitions, robustness and discriminant functions for the realization of Boolean logic with piecewise-linear CNN cells. The case of the “Parity3” function (with 3 inputs), Y = [−1, 1, 1, −1, 1, −1, −1, 1], is considered. In Figs. 2(a)–2(c), the upper plot represents the input (hyper) cube, where each vertex corresponds to a particular input vector and the color assignment of all vertices defines the Boolean function. The lower plot represents the projection tape, where all data required by the realization algorithm are clearly marked. (a) An inversion of the default orientation (b = [−4, −2, 1]) leads to a projection tape with 5 transitions, as shown in the lower plot. The polynomial discriminant function w(σ) = (σ − 6)(σ − 2)σ(σ + 2)(σ + 6) has a simple structure, but leads to an impractical realization because five multiplication operators are required. (b) The number of transitions can be reduced by rotating the projection axis: only 3 transitions are found on the projection tape defined by the orientation vector b = [−4, −2, 3]. The associated piecewise-linear discriminant function is a special case of Eq. (22) obtained by applying the realization algorithm given by Eqs. (23)–(26). Observe that the robustness vector is unbalanced and hence further improvement of the orientation vector is still expected. (c) The near-optimal orientation vector b = [−4, −2, 4] leads to a balanced robustness vector while keeping the number of transitions minimal (equal to 3). (d) Another valid realization of the same function via the orientation vector b = [−1, 1, 1]. Here, the three parallel separating (hyper) planes associated with the discriminant function w(σ) are represented with different colors within the input cube.
oriented line segment $\mathbf{u}_{\alpha\beta} = \mathbf{u}_\beta - \mathbf{u}_\alpha$ lying on a separating hyperplane, joining the point $\mathbf{u}_\alpha = [u_{\alpha 1}, u_{\alpha 2}, \ldots, u_{\alpha n}]$ with the point $\mathbf{u}_\beta = [u_{\beta 1}, u_{\beta 2}, \ldots, u_{\beta n}]$. Since both $\mathbf{u}_\alpha$ and $\mathbf{u}_\beta$ belong to the same hyperplane, it follows that $\mathbf{b}(\mathbf{u}_\alpha)^T = t_k$ and $\mathbf{b}(\mathbf{u}_\beta)^T = t_k$. By subtracting the former equation from the latter it follows that $\mathbf{b}(\mathbf{u}_\beta - \mathbf{u}_\alpha)^T = 0$, i.e. b is perpendicular to any line segment in a separating hyperplane, and therefore b is an orientation vector perpendicular to all these planes.

Example 1. Let us consider first a simple linearly nonseparable example; namely, the “Parity2” function. Its graphical representation in the input space is shown in Fig. 1. It consists of four vertices, two colored in blue and two colored in red.
Following the CNN coding scheme presented in Sec. 2.8.3 of [Chua, 1998], the Parity2 Boolean function of 2 variables is represented uniquely by the binary code [0 1 1 0], or equivalently, by the color code [blue red red blue] at the corresponding vertices [V3, V2, V1, V0] = [(−1, −1), (−1, 1), (1, −1), (1, 1)] in Fig. 1(a). In terms of CNN variables, which are real numbers, the above Parity2 binary code translates into the real number representation [−1 1 1 −1], where the “binary” symbols “0” and “1” code for the “real” numbers “−1” and “1”, respectively. In terms of the binary number [0 1 1 0], the “Parity2” Boolean function, or truth table, has a decimal equivalent of $ID = 0 \cdot 2^3 + 1 \cdot 2^2 + 1 \cdot 2^1 + 0 \cdot 2^0 = 6$, which represents uniquely its identification number (ID). The above standard compact CNN coding and identification scheme will be used throughout this paper. For example, the Boolean function [1 −1 −1 −1 1 1 1 −1] in Fig. 3(a) can be simply coded by its equivalent decimal number $1 \cdot 2^7 + 1 \cdot 2^3 + 1 \cdot 2^2 + 1 \cdot 2^1 = 142$. In this paper, we define a projection axis to be a straight oriented line (parallel to the orientation vector b) passing through the origin of the input
space and which is perpendicular to tr parallel separating hyperplanes, where tr ≥ 1. In Figs. 1–3 it is depicted as a 2-color “line-segment” intersecting the origin. The positive semiaxis (upper half of the line segment) is colored in light cyan, while the negative semiaxis (lower half line segment) is colored in green. The projection axis is free to “rotate” around the origin of the n-dimensional input space. The orientation of the projection axis in the input space is specified by the orientation vector b = [b1 , b2 , . . . , bn ] formed by repacking the nine entries of the B template, as defined in [Chua, 1998]. Mathematically, the projection axis is defined as the geometrical loci of the vector P = [tb1 , tb2 , . . . , tbn ] where t ∈ (−∞, +∞). Several different orientations, as well as their corresponding orientation vectors are shown in Figs. 1(a)–1(c). Observe that the same vertices are projected onto different positions on the projection axis corresponding to different orientations. In what follows we will introduce several tools which will allow us to understand better the mechanisms for optimizing the orientation.
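The coding and identification scheme of Example 1 can be sketched in a few lines of Python. The function names `boolean_id` and `truth_list` are illustrative and do not come from the paper; the only assumption is the ordering stated above, in which the output vector lists the vertices from V_{N−1} down to V_0.

```python
def boolean_id(y):
    """Decimal ID of a Boolean function given as [Y(V_{N-1}), ..., Y(V_0)], entries +1/-1."""
    func_id = 0
    for value in y:
        # "-1" codes for the binary symbol 0 and "+1" for the binary symbol 1
        func_id = 2 * func_id + (1 if value == 1 else 0)
    return func_id


def truth_list(func_id, n):
    """Inverse mapping: the +1/-1 output vector [Y(V_{N-1}), ..., Y(V_0)] of an n-input function."""
    return [1 if (func_id >> k) & 1 else -1 for k in range(2 ** n - 1, -1, -1)]


print(boolean_id([-1, 1, 1, -1]))                # Parity2
print(boolean_id([1, -1, -1, -1, 1, 1, 1, -1]))  # function of Fig. 3
print(truth_list(105, 3))                        # Parity3
```

With this convention the output of vertex V_k is simply bit k of the ID, which is the form used when a truth table has to be looked up vertex by vertex.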
3.1. Projection tapes

In the n-dimensional input space, a Boolean function is entirely specified by a set of pairs (vertex, color of vertex): $\{(V_0, Y(V_0)), (V_1, Y(V_1)), \ldots, (V_{N-1}, Y(V_{N-1}))\}$, where $N = 2^n$. The problem of realizing a given Boolean function is to find an n-dimensional discriminant function so that it will separate the “red” vertices from the “blue” ones. The problem can be dramatically simplified by simply projecting the vertices and their colors onto the projection axis. The result is called a projection tape (PT), where the scalar projections of the vertices and their colors are transcribed onto the projection axis, in accordance with the following simple
Fig. 3. (following pages) The same representations as in Figs. 2(a)–2(c) for the case of a linearly separable Boolean function; viz., Y = [1, −1, −1, −1, 1, 1, 1, −1] (ID = 142): (a) The default orientation vector b = [1, 2, 4] leads to a nonoptimal projection tape with 5 transitions. Its associated discriminant function w(σ) requires four absolute value function terms. (b) Using permutations and inversions (the first and the last elements of b were permutated, and the resulting first element was multiplied by −1) of the default orientation vector, the number of transitions can be further reduced to 3, when b = [−4, 2, 1]. The solution is still not optimal in this case since it requires a discriminant function with two absolute value function terms even though the Boolean function is linearly separable. (c) Allowing mutations in the orientation vector and selecting the best solution b = [−2, 2, 3], one finds the optimal projection tape, which requires only 1 transition, leading to a linear discriminant function.
projection transcription rule: Define

$$v_j = \sum_{i=1}^{n} b_i u_{ij} = \mathbf{b} \cdot V_{\alpha_j}^T, \qquad j = 0, \ldots, N-1 \tag{7}$$

where $\alpha_j \in \{0, 1, \ldots, N-1\}$ denotes the index of the vertex $V_{\alpha_j}$, and where the index “j” is arranged so that $v_0 \le v_1 \le \cdots \le v_j \le \cdots \le v_{N-1}$. Therefore, the projection tape PT is specified by the set:

$$PT = \{(v_j, Y(V_{\alpha_j})) \mid j = 0, \ldots, N-1\} \tag{8}$$

Observe that PT consists of N pairs, each pair being composed of a real scalar representing the excitation associated with an input vertex and a binary number representing the desired output (plus or minus) of the cell for that input vertex. Therefore, the projection tape PT(b, ID) depends not only on the orientation of its associated projection axis but also on the associated Boolean function (specified simply by its identification number ID). In fact, the projection tape is a complete representation of a Boolean function of n variables in a one-dimensional space where the test variable for the discriminant function is the scalar excitation $\sigma = \sum_{i=1}^{n} b_i u_i = \mathbf{b} \cdot \mathbf{u}^T$. Using this simple transformation, the task of finding a discriminant function to realize an arbitrary Boolean function reduces to that of finding a nonlinear function w(σ) of one variable σ, so that $\mathrm{sgn}(w(v_j)) = Y(V_{\alpha_j})$, ∀j. In each of Figs. 1–3, a specific Boolean function is chosen to illustrate the concept of a projection tape, which is represented both by a hypercube associated with the Boolean function, and also by a box below each hypercube representation of the Boolean function.

3.2. Default orientations

For the case of the “Parity2” function, let us start with the orientation b = [2, 1], called a default orientation in this paper, which will be defined below for the general case of an arbitrary number of inputs. Observe in Fig. 1(a) that the resulting projection tape is characterized by four equidistant projected vertices: $PT(\mathbf{b}, 6) = \{(v_0, -1), (v_1, +1), (v_2, +1), (v_3, -1)\}$, where

$$\{\alpha_0 = 3, v_0 = -3\}, \quad \{\alpha_1 = 2, v_1 = -1\}, \quad \{\alpha_2 = 1, v_2 = +1\}, \quad \{\alpha_3 = 0, v_3 = +3\}.$$

Let us first observe that:

$$\alpha_j = 2^n - 1 - j \tag{9}$$

and

$$v_j = -2^n + 1 + 2j \tag{10}$$

where j = 0, . . . , N − 1. This follows from our choice of the default orientation specified above, which in the general case of n inputs is defined by:

$$\mathbf{b} = [2^{n-1}, 2^{n-2}, \ldots, 2^0] \tag{11}$$
Remark. Since the default orientation will always project a vertex from the “hypercube” input space onto a unique point on the projection tape, there is no overlap with other projected vertices. Therefore the default orientation defined by (11) will always lead to a valid projection tape defined as one having no overlapping projection points.
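The transcription rule (7)–(8) and the default orientation (11) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper names are hypothetical, and the vertex numbering (a set bit of the index k maps to the coordinate −1, so that V_0 = (1, …, 1)) is an assumption inferred from the vertex list of Example 1.

```python
def vertex(k, n):
    """Coordinates of V_k (assumed convention: a set bit of k maps to -1, so V_0 = (1, ..., 1))."""
    return [-1 if (k >> (n - 1 - i)) & 1 else 1 for i in range(n)]


def output(func_id, k):
    """Desired output Y(V_k): bit k of the ID, with 0 -> -1 and 1 -> +1."""
    return 1 if (func_id >> k) & 1 else -1


def default_orientation(n):
    """Eq. (11): b = [2^(n-1), ..., 2^0]."""
    return [2 ** (n - 1 - i) for i in range(n)]


def projection_tape(b, func_id):
    """List of (v_j, alpha_j, Y(V_alpha_j)) sorted by increasing excitation v_j, Eqs. (7)-(8)."""
    n = len(b)
    tape = []
    for k in range(2 ** n):
        v = sum(bi * ui for bi, ui in zip(b, vertex(k, n)))
        tape.append((v, k, output(func_id, k)))
    return sorted(tape, key=lambda entry: entry[0])


# Parity2 (ID = 6) with the default orientation b = [2, 1]
for v, alpha, y in projection_tape(default_orientation(2), 6):
    print(v, alpha, y)
```

For the default orientation the sorted indices and excitations follow Eqs. (9) and (10) for any n, which gives a quick consistency check of the assumed vertex numbering.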
On any valid projection tape one can identify transitions between intervals where consecutive projected vertices are labeled in “red”, and intervals where consecutive projected vertices are labeled in “blue”. We will see in the following that this transition map, and its special case, called a robust transition map, are key concepts towards defining a discriminant function. The default orientation is not the only orientation leading to a valid decoding tape [Chua, 1998]; in fact most of the arbitrarily chosen orientations will lead to such a tape. For example, any permutation, and/or inversion (multiplication with −1), of elements in (11) will lead to a valid projection tape, as can be easily checked.
3.3. Valid and not-valid projection tapes

Note that in general, the number of distinct projected vertices on a projection tape can be smaller than N. For example, in Fig. 1(b), using the orientation b = [1, 1] there are only three distinct points
on the projection tape since $(v_1, 1) = (v_2, 1) = (0, 1)$. Observe that in this case, the two vertices satisfy $Y(V_{\alpha_1}) = Y(V_{\alpha_2}) = 1$, i.e. they have the same color, and therefore projecting them from the hypercube onto the same point on the projection axis does not lead to a conflicting situation. However, in some cases, different vertices having different colors may project onto the same point on the projection axis, thereby leading to a conflicting situation, henceforth called a not-valid projection tape. An example of a not-valid projection tape is shown in Fig. 1(d), where $v_0 = v_1 = -1$ and $v_2 = v_3 = +1$ but where $(Y(V_{\alpha_0}) = Y(V_1) = 1) \ne (Y(V_{\alpha_1}) = Y(V_3) = -1)$ and $Y(V_{\alpha_2}) \ne Y(V_{\alpha_3})$. Such situations are degenerate; they correspond to failure boundaries⁴ in the parameter space of the orientation vector b. Such failure boundaries are hypersurfaces separating compact domains characterized by distinct orderings $\Omega = \{\alpha_0, \alpha_1, \ldots, \alpha_{N-1}\}$. A not-valid projection tape is not robust, and any small perturbation in the parameter space will lead to a valid projection tape corresponding to one of the domains bordering the failure boundary associated with the not-valid projection tape. The situation illustrated in Fig. 1(d), for b = [0, 1], corresponds to a point on the failure boundary separating a domain for which Ω = {3, 1, 2, 0} from a domain where Ω = {1, 3, 0, 2}. A point in the former domain corresponds to the valid projection tape shown in Fig. 1(c). It is interesting to observe that the valid projection tape in Fig. 1(b) was obtained using the orientation vector b = [1, 1], which lies on a separating boundary between the domains associated with Ω = {3, 1, 2, 0} and Ω = {3, 2, 1, 0}, respectively. However, this boundary is not a failure boundary with respect to the objective function of interest; viz., the number of valid transitions.
Indeed, even if the order of the projected vertices changes when the orientation vector is rotated and passed over the boundary b = [1, 1] from Fig. 1(a) to Fig. 1(c), via Fig. 1(b), we found that the number of transitions remains constant (equal to 2) in this example and therefore, with respect to the “number of transitions” objective function, the orientation vector b = [1, 1] does not lie on a failure boundary.
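The distinction between valid and not-valid projection tapes can be checked mechanically. The sketch below uses hypothetical helper names and the vertex numbering of Example 1 (V_0 = (1, …, 1), assumed); it groups the vertices by their excitation and flags a tape as not valid when two vertices of different colors share a point. Exact equality of excitations is used, which is safe for the small integer and half-integer orientations tried here.

```python
def projected_points(b, func_id):
    """Map each distinct excitation v to the set of colors of the vertices projected onto it."""
    n = len(b)
    points = {}
    for k in range(2 ** n):
        u = [-1 if (k >> (n - 1 - i)) & 1 else 1 for i in range(n)]  # vertex V_k
        v = sum(bi * ui for bi, ui in zip(b, u))                     # excitation b . u^T
        y = 1 if (func_id >> k) & 1 else -1                          # color Y(V_k)
        points.setdefault(v, set()).add(y)
    return points


def is_valid_tape(b, func_id):
    """A tape is valid when no projected point carries two different colors."""
    return all(len(colors) == 1 for colors in projected_points(b, func_id).values())


# The four orientations of Fig. 1 for Parity2 (ID = 6)
for b in ([2, 1], [1, 1], [0.5, 1], [0, 1]):
    print(b, len(projected_points(b, 6)), is_valid_tape(b, 6))
```

Only b = [0, 1] is rejected, while b = [1, 1] merges two vertices of the same color into one point and remains valid.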
3.4. Transitions and robust transitions

In a valid decoding tape, a transition exists if $Y(v_j) \ne Y(v_{j+1})$ and is defined as any real number $t_k$ lying between $v_j$ and $v_{j+1}$; i.e. $v_j < t_k < v_{j+1}$. In other words, every change in the sign (or color) of the projected vertices, which is observed when the projection tape is scanned from minus infinity to plus infinity, is counted as a transition. Since transition points are roots of the discriminant function w(σ), they contain the most important information of the projection tape. While any $t_k$ which satisfies $v_j < t_k < v_{j+1}$, where $Y(v_j) \ne Y(v_{j+1})$, qualifies as a transition point (by definition), the most robust transition is obtained by choosing:

$$t_k = \frac{v_j + v_{j+1}}{2}. \tag{12}$$
In what follows, we will assume all transition points are calculated from (12) and therefore are robust transitions. For the sake of simplicity they will be simply called “transitions”. An arbitrary orientation vector b and an arbitrary Boolean function ID will lead to a projection tape characterized by the transition vector $T(\mathbf{b}, ID) = \{t_1, t_2, \ldots, t_k, \ldots, t_{tr}\}$. The degree of robustness $r_k$ associated with each transition $t_k$ is defined by:

$$r_k = v_{j+1} - v_j. \tag{13}$$
The robustness vector R(b, ID) associated with the orientation vector b and a Boolean function ID is defined by $R(\mathbf{b}, ID) = \{r_1, r_2, \ldots, r_k, \ldots, r_{tr}\}$. One key advantage of the one-dimensional projection tape is that it allows us to evaluate the robustness of a particular CNN cell. Indeed, we can define the robustness associated with a given projection tape by the positive number $r = \min_{k=1,\ldots,tr} r_k$. A value of r close to 0 indicates a nonrobust solution corresponding to an orientation which is close to the failure boundary associated with a not-valid projection tape. In such situations, small changes in the orientation vector can lead to a Boolean function which is different from the prescribed one.
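Equations (12) and (13) translate directly into code. The following sketch (the function name `transition_map` is hypothetical, and the vertex numbering V_0 = (1, …, 1) is the assumption taken from Example 1) returns the sign s of the first projected point, the transition vector T and the robustness vector R of a valid tape, and is tried on the three Parity3 orientations of Figs. 2(a)–2(c).

```python
def transition_map(b, func_id):
    """Return (s, T, R) for a valid projection tape; raise ValueError for a not-valid one."""
    n = len(b)
    color = {}
    for k in range(2 ** n):
        u = [-1 if (k >> (n - 1 - i)) & 1 else 1 for i in range(n)]  # vertex V_k
        v = sum(bi * ui for bi, ui in zip(b, u))                     # excitation
        y = 1 if (func_id >> k) & 1 else -1                          # color Y(V_k)
        if color.setdefault(v, y) != y:
            raise ValueError("not-valid projection tape: conflicting colors at %r" % v)
    points = sorted(color)
    s = color[points[0]]                      # sign (color) of the first projected point
    T, R = [], []
    for lo, hi in zip(points, points[1:]):
        if color[lo] != color[hi]:
            T.append((lo + hi) / 2)           # robust transition, Eq. (12)
            R.append(hi - lo)                 # degree of robustness, Eq. (13)
    return s, T, R


for b in ([-4, -2, 1], [-4, -2, 3], [-4, -2, 4]):   # Parity3, ID = 105
    s, T, R = transition_map(b, 105)
    print(b, "s =", s, "T =", T, "R =", R, "r =", min(R))
```

Distinct vertices that project onto the same point with the same color are merged before the scan, so a tape such as the one of Fig. 2(c) is handled without special cases.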
⁴ In [Chua, 1998] a failure boundary was defined as a hypersurface separating parameter domains of a cell which correspond to different Boolean function realizations. We extend this notion here to the case where each separated domain is associated with a class of realizable Boolean functions but where the same order $\Omega = \{\alpha_0, \alpha_1, \ldots, \alpha_{N-1}\}$ is preserved.
Example 2. Consider next the Boolean Parity function with 3 inputs, defined by ID = 105 [Chua, 1998]. For a permutation followed by inversions of the default orientation; namely, for the orientation vector b = [−4, −2, 1], we obtain a projection tape with tr = 5 transition points, as shown in Fig. 2(a). In this case, the transition vector is given by $T = \{t_1, t_2, t_3, t_4, t_5\} = \{-6, -2, 0, 2, 6\}$. Together with the sign (color) of the first projected vertex, $s = Y(v_0) = Y(V_{\alpha_0})$, the pair (s, T) constitutes a complete set of parameters required by the design algorithm to be developed in Sec. 5. In other words, the discriminant function w(σ) associated with a prescribed Boolean function and a prescribed orientation vector b is completely specified by this set of parameters.
The simplest choice for a discriminant function, from a purely mathematical point of view, is a polynomial. Since the roots of the discriminant function are the transition points, the polynomial can be immediately written without any auxiliary design algorithm as follows:

$$w(\sigma) = s(-1)^{tr}(\sigma - t_1)(\sigma - t_2) \cdots (\sigma - t_{tr}). \tag{14}$$

In the above example, the color of $v_0$ is “blue” (corresponding to s = −1), and there are 5 transitions for the orientation specified above. Since the robustness vector in this case is composed of five equal components, each equal to 2, we have r = 2. Therefore, a valid discriminant function is given by the polynomial: w(σ) = (σ − 6)(σ − 2)σ(σ + 2)(σ + 6). Although useful as a mathematical construct, a polynomial discriminant function is not attractive in practical implementations because it requires many additional multipliers. Instead, a piecewise-linear discriminant can perform the same task with the addition of only (tr − 1) absolute value function units, but without any multiplication. Since an absolute value operator is much simpler to implement in current electronic technologies than multipliers, the piecewise-linear discriminant function is a better choice from the perspective of electronic realizations. It also leads to a simpler mathematical analysis when used in CNN systems. A complete description for designing piecewise-linear discriminants via the theory of canonical piecewise-linear representations presented in [Chua & Kang, 1977] is given in Sec. 5.
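As a quick numerical check of Eq. (14), the sketch below builds the polynomial discriminant from the pair (s, T) of Example 2 and evaluates its sign at the projected vertices. The function name is hypothetical; the projected values are the odd integers −7, …, 7, since for b = [−4, −2, 1] every excitation has the form ±4 ± 2 ± 1.

```python
def polynomial_discriminant(s, transitions):
    """Eq. (14): w(sigma) = s * (-1)^tr * prod_k (sigma - t_k)."""
    tr = len(transitions)

    def w(sigma):
        value = s * (-1) ** tr
        for t in transitions:
            value *= sigma - t
        return value

    return w


# Parity3 with b = [-4, -2, 1]: s = -1 and T = {-6, -2, 0, 2, 6}
w = polynomial_discriminant(-1, [-6, -2, 0, 2, 6])
projected = [-7, -5, -3, -1, 1, 3, 5, 7]
signs = [1 if w(v) > 0 else -1 for v in projected]
print(signs)
```

The sign starts at s for the leftmost point and flips each time a transition is crossed, which is exactly the color pattern read off the projection tape.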
4. Finding the Optimal Orientation

Finding the optimal orientation vector b∗ is geometrically equivalent to “rotating” the projection axis about its origin in the n-dimensional hypercube input space with the goal of minimizing the number of transitions tr (coarse optimization). This process may be followed by a “fine optimization”, where the goal is to maximize the robustness parameter r. Such a process is illustrated in Figs. 2(a)–2(c), where the “rotation” is generated by picking an arbitrary value for the orientation vector b. Observe that the number of transitions reduces from 5 in Fig. 2(a) to 3 in Fig. 2(b). The situation in Fig. 2(b), however, corresponds to an “unbalanced” robustness vector R = [4, 2, 4], with a robustness of r = 2. By further rotating the projection axis, a “balanced” robustness vector corresponding to an optimized robustness of r = 4 is obtained, as shown in Fig. 2(c).

Example 3. Consider next the case of a linearly separable function (ID = 142). Starting from the default orientation vector b = [1, 2, 4], the number of transitions is tr = 5, as shown in Fig. 3(a). Since we know that the function is linearly separable (in general this is unknown), it is clear that this solution is not an optimal one. However, even for this nonoptimized orientation, there is a valid realization via the following discriminant function: w(σ) = −σ + |σ + 3| − |σ + 1| + |σ − 1| − |σ − 3|.
In fact, this is an important advantage of our approach; viz., it offers a fast, valid solution even if it may not be optimal. Starting from this nonoptimal solution and by using an evolutionary approach, the orientation can be improved so that the number of additional parameters and absolute value functions is reduced as much as possible. Another aspect of our optimization approach is that both linearly separable and linearly not separable functions are treated equally, and the minimum number of additional parameters m that is ultimately needed will give an indication of the complexity of the Boolean function being realized. To optimize the orientation vector, one simple approach is to try all possible permutations and inversions of the initial (default) orientation vector. In this example, if we choose the orientation vector b = [−4, 2, 1], we would obtain a smaller number of transitions in the projection tape, in this case tr = 3, as shown in Fig. 3(b). While in many
cases this simple optimization procedure leads to a better solution, it does not necessarily lead to the optimal one. For n inputs, there are $2^n n!$ possible permutations and inversions of the default orientation vector. Therefore, this optimization method is applicable only to a relatively small number of inputs. For example, in the case of n = 9, corresponding to the typical CNN cell, it takes about 30 minutes for the function “perms” in MATLAB running on a Pentium 233 MHz processor to generate the entire set of 9! = 362880 permutations of the orientation vector. We have found experimentally that a near-optimal solution (in terms of tr) can be found much faster if random values are simply allocated to the orientation vector. Among a “population” of random mutations, the set of orientations minimizing tr is selected. This algorithm can be applied further using specific techniques of genetic algorithms [Koza, 1992], which are particularly well suited for this problem. Applying such an approach to this example, we obtain the optimal orientation vector b∗ = [−2, 2, 3]. It corresponds to the linearly separable solution (tr = 1) shown in Fig. 3(c). In this case the discriminant function is still a canonical piecewise-linear form, but with no absolute value function terms; viz., w(σ) = −σ. The problem of seeking an optimal orientation can be viewed as a classical optimization problem where the goal function (the number of transitions tr, or a combination of tr and the robustness r) has an unknown dependence on the parameters to be optimized. In such cases, we can apply genetic and evolutionary algorithms [Koza, 1992], as well as techniques based on directed random search and reinforcement of the type described in [Harth & Pandya, 1988]. Note that none of these techniques can guarantee convergence towards the optimal solution.
However, since any starting solution leads to a valid realization (except for degenerate cases corresponding to not-valid decoding tapes, which can be avoided by small random mutations), such techniques are very similar in nature to the real-life evolution of species. This may also support the assumption that most real-life systems are operated in a near-optimal rather than optimal manner [Kauffman, 1995]. Moreover, techniques based on genetic algorithms can be translated into “on-chip” solutions which exploit the parallel nature of the CNN and of the chaotic dynamics discovered from
simple cells such as Chua’s circuit [Madan, 1993]. In particular, the entire population of physically implemented cells can simultaneously evolve via mutations induced from an additional layer of chaotic CNN cells, while simple circuits attached to each cell count the number of transitions, thereby selecting the cell with the best performance. In the next step, the best orientation vector will be duplicated at all cells in the CNN and a new cycle of chaotic mutations will start. If the new “best orientation” is better than in the previous cycle, a new duplication step will take place. If not, a new cycle of chaotic mutations will start with cell parameters inherited from the previous cycle. By construction, such a process can never increase the objective function (the number of transitions) from one cycle to the next. Since it is a parallel process running in an analog system, it can be expected to reach near-optimal solutions in a much shorter computing time than the same algorithm on a classic digital computer (using sequential processing). An open question is whether there exists an analytical procedure which is capable of finding the optimal orientation for an arbitrary Boolean function with n inputs. Another open question is how the optimal solution (the number of transitions, or the parameter m in (1′)–(3′)) depends on the number of inputs. Our experimental results with randomly generated Boolean functions indicate that near-optimal solutions found with random search, or with directed random search methods, have an exponential dependence on n; viz., $\bar{m} = O(2^n)$, where $\bar{m}$ is the average of the m parameters of the near-optimal solutions obtained with the above mentioned algorithms. There are, however, several special classes of Boolean functions for which optimization can be achieved easily. We will discuss two such cases; viz., the totalistic and the semitotalistic functions.
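The two software search strategies discussed in this section (exhaustive permutations and inversions of the default orientation, and selection among random orientations) can be sketched as below. This is only an illustration under stated assumptions: the helper names, the integer range [−4, 4] for the random orientations, the number of trials and the seed are all arbitrary choices, not values from the paper, and the vertex numbering V_0 = (1, …, 1) is the one inferred from Example 1. The on-chip chaotic-mutation scheme is not modeled.

```python
import random
from itertools import permutations, product


def count_transitions(b, func_id):
    """Number of transitions tr on the projection tape, or None if the tape is not valid."""
    n = len(b)
    color = {}
    for k in range(2 ** n):
        u = [-1 if (k >> (n - 1 - i)) & 1 else 1 for i in range(n)]
        v = sum(bi * ui for bi, ui in zip(b, u))
        y = 1 if (func_id >> k) & 1 else -1
        if color.setdefault(v, y) != y:
            return None                      # conflicting colors on one point
    ordered = [color[v] for v in sorted(color)]
    return sum(1 for a, c in zip(ordered, ordered[1:]) if a != c)


def signed_permutation_search(b0, func_id):
    """Try all 2^n n! permutations and inversions of b0; return (best tr, best b)."""
    best = None
    for perm in permutations(b0):
        for signs in product((1, -1), repeat=len(b0)):
            b = [sg * x for sg, x in zip(signs, perm)]
            tr = count_transitions(b, func_id)
            if tr is not None and (best is None or tr < best[0]):
                best = (tr, b)
    return best


def random_search(b0, func_id, trials=1000, seed=0):
    """Keep the best of `trials` random integer orientations; b0 must give a valid tape."""
    rng = random.Random(seed)
    best = (count_transitions(b0, func_id), list(b0))
    for _ in range(trials):
        b = [rng.randint(-4, 4) for _ in b0]
        tr = count_transitions(b, func_id)
        if tr is not None and tr < best[0]:  # not-valid tapes are simply skipped
            best = (tr, b)
    return best


print(count_transitions([1, 2, 4], 142), count_transitions([-2, 2, 3], 142))
print(signed_permutation_search([4, 2, 1], 142))
print(random_search([1, 2, 4], 142))
```

For ID = 142 the exhaustive search over signed permutations cannot go below tr = 3, whereas orientations outside that family, such as b∗ = [−2, 2, 3], reach tr = 1.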
Totalistic functions were defined in [Wolfram, 1984] as Boolean functions where the output depends only on the sum of the input variables. While in his work Wolfram usually assumes linearly separable totalistic functions (his discriminant function being a linear one), we will extend this terminology in this paper to any piecewise-linear discriminant function with an arbitrary number of transitions on its associated projection tape. By definition, any totalistic Boolean function has already optimized its orientation vector by restricting it to b = [1, 1, . . . , 1], i.e. all components are 1.
Fig. 4. Two realizations of the universal PWL CNN cells for implementing the Boolean function “Parity4” (Parity with four inputs). This cell allows pattern replication in a generalized cellular automata [Chua, 1998] where the cell inputs are connected to the “east”, “west”, “north” and “south” neighboring cell outputs. (a) The default orientation vector leads to a nonoptimal solution where the projection tape is characterized by 10 transitions, leading to a discriminant function w(σ) with nine absolute value function terms. (b) Since the Boolean function Parity4 is totalistic, the optimal realization is found by simply prescribing the orientation vector b∗ = [1, 1, 1, 1]. The resulting projection tape requires only 4 transitions, leading to a much simpler discriminant function w(σ) with only three absolute value function terms.
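The two Parity4 realizations of Fig. 4 can be verified numerically. In the sketch below the helper names are hypothetical; the parity rule follows the definition given in the footnote to Example 4 (+1 for an odd number of +1 inputs), and `w_parity4` is the canonical PWL discriminant w(σ) = 3 − |σ + 2| + |σ| − |σ − 2| quoted in that example.

```python
from itertools import product


def parity(u):
    """+1 if an odd number of inputs are in the +1 state, otherwise -1."""
    return 1 if sum(1 for x in u if x == 1) % 2 == 1 else -1


def count_transitions(b, func):
    """Number of color changes along the projection tape of `func` for orientation b."""
    color = {}
    for u in product((-1, 1), repeat=len(b)):
        sigma = sum(bi * ui for bi, ui in zip(b, u))
        y = func(u)
        if color.setdefault(sigma, y) != y:
            raise ValueError("not-valid projection tape")
    ordered = [color[s] for s in sorted(color)]
    return sum(1 for a, c in zip(ordered, ordered[1:]) if a != c)


def w_parity4(sigma):
    """Canonical PWL discriminant for Parity4 with b = [1, 1, 1, 1]."""
    return 3 - abs(sigma + 2) + abs(sigma) - abs(sigma - 2)


print(count_transitions([8, 4, 2, 1], parity))   # default orientation
print(count_transitions([1, 1, 1, 1], parity))   # totalistic orientation
realized = all((1 if w_parity4(sum(u)) > 0 else -1) == parity(u)
               for u in product((-1, 1), repeat=4))
print(realized)
```

Running the same count with b = [1, …, 1] for other input sizes gives tr = n, in line with the linear growth claimed for the parity class.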
Example 4. The cells implementing the Parity4 function⁵ are known to possess interesting duplication properties when implemented on a cellular automaton [Toffoli & Margolus, 1987]. The default orientation vector b = [8, 4, 2, 1] yields a projection tape with tr = 10 transitions, as shown in Fig. 4(a). No permutation and/or inversion of the default orientation vector can give a better result, as expected, since in the case of totalistic functions the output does not depend on the specific position of the inputs. However, since the function is totalistic (a simple test can be made by permuting all inputs and observing that the associated decoding tape does not change), the optimal orientation vector is b = [1, 1, 1, 1]. Indeed, in this case, the resulting projection tape will have only tr = 4 transitions, as shown in Fig. 4(b), where realizations via the canonical PWL discriminant function corresponding to both an optimal and a nonoptimal orientation are presented. The optimal canonical PWL discriminant function for implementing the Parity4 function is: w(σ) = 3 − |σ + 2| + |σ| − |σ − 2|. It can be easily shown that for the general case of the “parity” function with n inputs, the

⁵ The “Parity4” cell returns +1 if and only if there is an odd number of inputs (coming from north, south, east and west neighboring cells) in the +1 state. Otherwise, the cell returns −1.
Fig. 5. Two realizations of the PWL CNN cells for implementing the semitotalistic Boolean function “Life” with 9 inputs. In generalized cellular automata [Chua, 1998] this cell is responsible for producing the “Game of Life” behavior [Conway et al., 1982]. (a) An orientation vector taking into account the semitotalistic character of the “Life” function, but where the central element was arbitrarily chosen to be b5 = −8, leads to a near-optimal projection tape with 4 transitions. (b) Further optimization of the central parameter leads to an optimal solution with only 2 transitions with b5 = 0.50. The resulting cell is the simplest for implementing the “Game of Life”, requiring only one absolute value function term.
orientation vector associated with the definition of the totalistic function is an optimal one, with tr = n transitions. This result is very important because it contradicts the widely accepted reasoning that the class of parity functions represents the most “complicated” Boolean function. In fact, this class of functions yields a “complexity” of the associated piecewise-linear cell which grows only linearly with the number of inputs, while randomly selected Boolean functions were found to have an exponential increase in the number of transitions with the number of inputs. Moreover, for the class of parity functions it is shown in Sec. 6 that a complexity of O(log(n)) can always be achieved if the discriminant function is a multi-nested piecewise-linear formula. Semitotalistic functions are functions where only one of the n inputs has a “privileged” role; all others being “indifferent” to their location within the sphere of influence, as in the case of totalistic functions. Therefore, by definition, the optimal orientation is to be found among the family b = [1, 1, 1, . . . , λ, 1, 1, . . . , 1], where the coefficient λ is a real number. This parameter is subject to further optimization and its position within the orientation vector corresponds to the position of the “privileged” input. Simple tests on the truth table of a Boolean function (which require permutations of the inputs and inspection of the resulting decoding tapes) can detect whether a function is semitotalistic and if so, identify the “privileged” input. An additional process of optimization can be initiated to find the best value of λ. Since this involves
a one-dimensional search process, it is a much less complex problem than that of optimizing an entire orientation vector.

Example 5. Consider next the classic “Game of Life” Boolean function [Conway et al., 1982] with 9 inputs whose “gene decoding book” (truth table with 512 entries) is given in [Chua, 1998]. If the default orientation b = [256, 128, 64, 32, 16, 8, 4, 2, 1] is used, the corresponding projection tape will have tr = 156 transitions. Using an optimization based on a random search with 1000 trials, we obtain the following vector: b = [−15103, −14370, −6967, −10319, −13535, −5599, −14752, −5833, −14912], with tr = 79 transitions, which is still far from optimal. However, when we take advantage of the observation that the Boolean function “Life” is semitotalistic, we obtain a dramatically improved near-optimal orientation vector b = [1, 1, 1, 1, −8, 1, 1, 1, 1] whose projection tape has only 4 transitions, as shown in Fig. 5(a). Its associated canonical PWL discriminant function reduces to: w(σ) = 10 − |σ + 11| + |σ + 2| − |σ − 6|. This already very good “performance” can be improved further by optimizing the parameter λ cited above, changing it in increments of 0.5 (from −8 to +0.5). The resulting orientation vector b∗ = [1, 1, 1, 1, 0.5, 1, 1, 1, 1] yields only tr = 2 transitions, which is the minimum possible,⁶ as shown in Fig. 5(b). The discriminant function associated with this projection tape is simplified further to: w(σ) = 1.5 − |σ + 2.5|. We must stress that this is the most compact implementation for a cell realizing the Boolean “Life” function:
Simplest CNN realization of the Game of Lif e 1 5 3 w = − u1 + u2 + u3 + u4 + u5 + u6 + u7 + u8 + 2 2 2 x˙ ij = −xij + 2f (xij ) + w 1 yij = f (xij ) = (|xij + 1| − |xij − 1|) . 2
Footnote 6. We know that “Life” is not a linearly separable Boolean function, therefore the number of transitions on the projection tape must satisfy tr > 1. Since tr = 2 is the smallest value satisfying this property, the associated orientation vector is an optimal one with respect to the number of transitions.
Fig. 6. Realization of the universal CNN cell via a canonical piecewise-linear discriminant function w(σ). Once a valid (and possibly optimal) orientation vector has been found, the parameters {s, z, m, z0, . . . , zm} of the discriminant function w(σ) can be determined directly from the associated projection tape.
Compared with the standard CNN cell, the above optimal CNN realization of the Game of Life has only one additional parameter and one absolute value function. A dedicated CNN chip with a high cell density can be implemented in hardware. No other implementation in the framework of digital technologies can outperform our above optimal implementation of the “Game of Life”.
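The transition counts and the discriminant quoted for “Life” can be checked by brute force over all 512 input vertices. The following Python sketch is an illustration only, not code from the paper: it assumes the {−1, +1} coding used above, takes the fifth input (index 4) as the centre cell, and the helper names `life`, `count_transitions` and `w_life` are hypothetical.

```python
from itertools import product

def life(u):
    """Next state (+1 alive, -1 dead) of the centre cell u[4], in {-1,+1} coding."""
    alive_neighbours = sum(1 for i, x in enumerate(u) if i != 4 and x == 1)
    if u[4] == 1:
        return 1 if alive_neighbours in (2, 3) else -1
    return 1 if alive_neighbours == 3 else -1

def count_transitions(b, f, n=9):
    """Number of colour changes along the projection tape sigma = b . u.

    Returns None when two vertices with different outputs project onto the
    same point, i.e. when b is not a valid orientation for f.
    """
    tape = {}
    for u in product((-1, 1), repeat=n):
        sigma = sum(bi * ui for bi, ui in zip(b, u))
        if tape.setdefault(sigma, f(u)) != f(u):
            return None
    colours = [tape[s] for s in sorted(tape)]
    return sum(1 for a, c in zip(colours, colours[1:]) if a != c)

def w_life(u):
    """Discriminant of Example 5: w = 1.5 - |sigma + 2.5| with orientation b*."""
    b_star = [1, 1, 1, 1, 0.5, 1, 1, 1, 1]
    sigma = sum(bi * ui for bi, ui in zip(b_star, u))
    return 1.5 - abs(sigma + 2.5)

tr_semi = count_transitions([1, 1, 1, 1, -8, 1, 1, 1, 1], life)
tr_opt = count_transitions([1, 1, 1, 1, 0.5, 1, 1, 1, 1], life)
all_match = all((w_life(u) > 0) == (life(u) == 1)
                for u in product((-1, 1), repeat=9))
print(tr_semi, tr_opt, all_match)   # 4 2 True
```

The same `count_transitions` helper could serve as the objective function in the one-dimensional search over λ described above.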
5. Canonical PWL Universal CNN Cell
In the previous sections we have introduced the concept of a projection tape as a one-dimensional representation which makes it possible to design a simple piecewise-linear discriminant function for realizing arbitrary Boolean functions. We have pointed out that the main problem with the projection tapes is to find an optimal or near-optimal orientation vector which minimizes the number of transitions.
Additional fine tuning of the orientation vector can also be made in order to maximize the robustness r. As already illustrated in Sec. 3, to design the discriminant function w(σ), the only information required is the transition vector T = {t1, t2, . . . , tk, . . . , ttr} and the sign (color) s = Y(Vα0) associated with an optimal, or near-optimal, orientation vector. In what follows we will derive the design algorithm for the canonical piecewise-linear universal CNN cell using results from [Chua & Kang, 1977]. Any continuous piecewise-linear function of one variable σ can be represented uniquely by the following canonical piecewise-linear representation:
$$ w(\sigma) = z + z_0 \sigma + \sum_{k=1}^{m} \beta_k |\sigma - z_k| \tag{15} $$
where m is the number of linear segments minus one. Each linear segment connects two consecutive breakpoints (i.e., intersections between two adjacent linear segments) located at σ = zk and σ = zk+1 on the σ axis. The parameters z, z0, βk can be determined from the following formulas [Chua & Kang, 1977]:
$$ z_0 = \frac{1}{2}(m_0 + m_m) \tag{16} $$
$$ \beta_k = \frac{1}{2}(m_k - m_{k-1}), \quad k = 1, \ldots, m \tag{17} $$
$$ z = -z_0 \theta - \sum_{k=1}^{m} \beta_k |\theta - z_k| \tag{18} $$
where mk is the slope of the linear segment joining the breakpoint (zk, w(zk)) with the breakpoint (zk+1, w(zk+1)), as shown in Fig. 6, and θ is any root of the discriminant function: w(θ) = 0. The slopes m0 and mm should be specified explicitly since they correspond to the leftmost and rightmost segments respectively, and therefore have only one breakpoint each. In order to apply these formulas to our problem, we need to specify the breakpoints zk as well as their associated slopes mk. Given the transition points T = {t1, t2, . . . , tk, . . . , ttr} and s = Y(Vα0), the following choice leads to a simple design solution:
$$ m_0 = -s \tag{19} $$
$$ m_k = (-1)\, m_{k-1}, \quad k = 1, \ldots, m \tag{20} $$
$$ z_k = \frac{t_k + t_{k+1}}{2}, \quad k = 1, \ldots, m \tag{21} $$
It can be easily checked that (19)–(21) lead to a valid discriminant function which intersects the σ axis exactly at the transition points T = {t1, t2, . . . , tk, . . . , ttr} and has the prescribed sign (color) at the projected vertices. Moreover, it follows that m = tr − 1. Therefore the following formula represents the canonical PWL discriminant function of our universal CNN cell:
The canonical PWL CNN cell discriminant function:
$$ w(\sigma) = z + z_0 \sigma - s \sum_{k=1}^{m} (-1)^k |\sigma - z_k| \tag{22} $$
where the gene parameters m, s, z, z0, z1, . . . , zm are determined below from the two elements defining the associated projection tape; namely, the transition vector T = {t1, t2, . . . , tk, . . . , ttr} and s = Y(Vα0) = Y(v0):
The canonical PWL cell gene:
$$ m = tr - 1, \quad s = Y(V_{\alpha_0}) \tag{23} $$
$$ z_0 = \begin{cases} 0, & \text{if } m \text{ is an odd integer} \\ -s, & \text{if } m \text{ is an even integer} \end{cases} \tag{24} $$
$$ z_k = \frac{t_k + t_{k+1}}{2}, \quad k = 1, \ldots, m \tag{25} $$
$$ z = -z_0 t_1 + s \sum_{k=1}^{m} (-1)^k |t_1 - z_k| \tag{26} $$
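The gene formulas (23)–(26) translate almost line by line into code. The sketch below is a minimal Python illustration under stated assumptions: the function names `canonical_pwl_gene` and `w_canonical` are hypothetical, and the two “Life” transition vectors used as examples are not listed in the text but are inferred here from the roots of the two discriminants quoted at the end of Sec. 4.

```python
def canonical_pwl_gene(T, s):
    """Gene of the canonical PWL cell from a projection tape, Eqs. (23)-(26).

    T : sorted transition points t_1 < ... < t_tr
    s : colour Y(v0) of the leftmost projected vertex (+1 or -1)
    """
    m = len(T) - 1                                       # Eq. (23)
    z0 = 0 if m % 2 == 1 else -s                         # Eq. (24)
    zk = [(T[k] + T[k + 1]) / 2 for k in range(m)]       # Eq. (25)
    # Python index k = 0..m-1 corresponds to k = 1..m in the text.
    z = -z0 * T[0] + s * sum((-1) ** (k + 1) * abs(T[0] - zk[k])
                             for k in range(m))          # Eq. (26)
    return m, s, z, z0, zk

def w_canonical(sigma, gene):
    """Canonical PWL discriminant, Eq. (22)."""
    m, s, z, z0, zk = gene
    return z + z0 * sigma - s * sum((-1) ** (k + 1) * abs(sigma - zk[k])
                                    for k in range(m))

# "Life" with b = [1,1,1,1,-8,1,1,1,1]: transitions inferred as {-13,-9,5,7}
gene4 = canonical_pwl_gene([-13, -9, 5, 7], s=-1)
# "Life" with b* = [1,1,1,1,0.5,1,1,1,1]: transitions inferred as {-4,-1}
gene2 = canonical_pwl_gene([-4, -1], s=-1)
print(gene4)   # z = 10, breakpoints -11, -2, 6
print(gene2)   # z = 1.5, breakpoint -2.5
```

Both genes reproduce the discriminants stated earlier, 10 − |σ + 11| + |σ + 2| − |σ − 6| and 1.5 − |σ + 2.5|.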
6. Multi-Nested Piecewise-Linear Universal CNN Cells
In the previous sections we have proved that any local Boolean function can be realized with a CNN cell described by (1′)–(3′), where the discriminant function w(σ) is defined by the canonical piecewise-linear formula (22). Compared to a standard CNN cell, this realization requires “m + 2” additional parameters and “m” absolute value functions; both are determined by a simple and exact design procedure via (23)–(26). Moreover, an evolutionary approach can be applied to find an optimal or near-optimal orientation vector b, which minimizes the number “m” of additional absolute value functions. It follows from Eq. (23) that the number m depends linearly on the number of transitions on the projection tape. In the worst case of a poorly optimized projection tape, or of a complex Boolean function, the number of transitions can increase exponentially with the number of inputs n. It follows that m ≤ 2^n. Since a CNN cell usually has n = 9 inputs, in the worst case we may need up to 2^9 = 512 additional absolute value functions. One possibility to overcome this situation is to optimize the projection tape as discussed in Sec. 4. Unfortunately, except for the case of totalistic and semitotalistic Boolean functions, there is presently no exact procedure for determining the optimal orientation vector b∗, and therefore we cannot predict in advance the bound m∗ associated with the optimal orientation. Even though for totalistic Boolean functions it can be proved that m∗ ≤ n, in the general case of arbitrary Boolean functions we have to admit an exponential dependence of the type m∗ ≤ 2^n unless new theoretical results can be found to provide better bounds for m∗.
In this section, we will present another approach to overcome the disadvantage of a large number of transitions. This approach relies on a novel discriminant function w(σ). The multi-nested discriminant is a piecewise-linear formula where the number m of absolute value functions and of additional parameters increases only with log2(tr), where tr represents as before the number of transitions on the projection tape. As shown in [Dogaru & Chua, 1998f], any Boolean function can be realized with a multi-nested discriminant requiring only m = n − 1 absolute value functions and m + 1 = n additional parameters. The expression of the multi-nested discriminant function is given by:
$$ w(\sigma) = s\left( z_m + \left| z_{m-1} + \cdots + \left| z_1 + | z_0 + \sigma | \right| \cdots \right| \right) \tag{27} $$
By replacing the canonical piecewise-linear discriminant function (22) with (27) in the defining equations of the CNN cell (1′)–(2′), we obtain a cell which has only 2n + 1 parameters but can still realize any Boolean function! The price paid for this dramatic reduction in complexity is that the 2^m roots of w(σ) = 0 are not independent and therefore, an additional restriction must be imposed on the orientation vector. This restriction stipulates that the transitions on the associated projection tape must match the root distribution structure specified by a bifurcation tree to be defined in the next subsection. Therefore, unlike the canonical piecewise-linear CNN cell, the optimization of the orientation vector is tightly coupled with the process of determining the additional parameters {s, z0, z1, . . . , zm}, and except for the case of totalistic functions, there is currently no explicit algorithm for designing this CNN cell. Instead, evolutionary algorithms based on directed random search and/or genetic mutations are used to solve this nonlinear optimization problem. To illustrate our procedure, we will present a table which contains the genes (sets of parameters {s, z0, z1, . . . , zm} and b) of all Boolean functions with n = 4 inputs, using a multi-nested representation (27) with m = 3 nests.
6.1. Bifurcation tree for multi-nested discriminant function
Equation (27) can be recast in the following recursive form:
w0(σ) = z0 + σ
FOR k = 1, . . . , m
    wk(σ) = zk + |wk−1(σ)|
END        (27′)
w(σ) = s wm(σ)        (28)
where k is an index associated with the level of “nesting”. Observe that, at each level of nesting “k”, the number of linear segments of the intermediate discriminant function wk(σ) is double that of wk−1(σ). In particular, when m = 0, (27) and (27′) correspond to the linear discriminant function (with one linear segment). After k = m levels of “nesting”, the resulting discriminant function w(σ) = wm(σ) will contain 2^m segments, as shown in Fig. 7. Since it is a piecewise-linear function, (27) always admits a canonical PWL representation. However, instead of the 2^m absolute value functions required by the canonical representation, due to the recursive form of (27′), only m absolute value functions are now required. The converse is not true, i.e. not every function described by a canonical PWL representation admits a multi-nested piecewise-linear form (27′). This “irreversibility” is reflected by a special structure of the roots of the equation wm(σ) = 0. In contrast to the canonical PWL representation, where the roots are independent, in the case of (27′) the roots are subject to the following constraints. By choosing the parameters z0, z1, . . . , zm so that:
$$ |z_m| < |z_{m-1}| < \cdots < |z_1| \quad \text{and} \quad z_1, z_2, \ldots, z_m < 0 \tag{29} $$
it can be easily proved that wm(σ) = 0 has 2^m roots given by:
$$ \sigma_q = -z_0 + \sum_{k=1}^{m} z_k \Psi_k \tag{30} $$
where Ψk ∈ {−1, 1}, and q is the decimal equivalent of the binary number $\overline{\Psi_1 \Psi_2 \ldots \Psi_m}$, where the bar denotes changing each “−1” to “0”. Equations (29) and (30) impose constraints on the parameters in (27) and on the positions of the roots on the projection tape. Therefore, in order to use (27) as a discriminant function, the orientation vector must be chosen so that (30) is satisfied and the cell parameters {z1, z2, . . . , zm} obey Eq. (29).
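The recursion (27′) and the root formula (30) can be cross-checked numerically. The Python sketch below is illustrative only; the names `nested_w` and `nested_roots` are hypothetical, and the parameter values are those given for the bifurcation tree of Fig. 7, Z = [z0, z1, z2] = [−2, −4, −2], which satisfy (29).

```python
from itertools import product

def nested_w(sigma, s, z):
    """Multi-nested discriminant in recursive form (27'); z = [z0, z1, ..., zm]."""
    w = z[0] + sigma                 # w_0
    for zk in z[1:]:
        w = zk + abs(w)              # w_k = z_k + |w_{k-1}|
    return s * w                     # Eq. (28)

def nested_roots(z):
    """Roots predicted by Eq. (30); meaningful only when constraint (29) holds."""
    z0, rest = z[0], z[1:]
    return sorted(-z0 + sum(zk * psi for zk, psi in zip(rest, signs))
                  for signs in product((-1, 1), repeat=len(rest)))

Z = [-2, -4, -2]
roots = nested_roots(Z)
print(roots)                                 # [-4, 0, 4, 8]
print([nested_w(r, 1, Z) for r in roots])    # [0, 0, 0, 0]
```

With m = 2 nests the formula yields 2^m = 4 roots, spaced four units apart for these values.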
In fact (29) is not a strong restriction, since arbitrary values are allowed for the threshold parameters {z1, z2, . . . , zm}. However, when (29) holds, the number and the position of the roots are easier to control in developing the analytical techniques in Secs. 6.3 and 6.4. In a graphical representation, (30) can be associated with a bifurcation tree, as shown in Fig. 7 for the particular case of m = 2 nests. The term “bifurcation” is used here because, for each additional level of nesting, the number of roots is doubled in a manner reminiscent of the period-doubling bifurcation observed in many nonlinear chaotic systems. The structure of the bifurcation tree is characterized by a main trunk positioned at σ^0_0 = −z0 on the projection tape. This “trunk” corresponds to the unique root of the linear discriminant function, w0(σ^0_0) = 0. For each additional level of nesting k, the roots σ^k_q of the intermediate discriminant function wk(σ) = 0 can be calculated using the recursive definition (27′). For example, at level 1, by imposing w1(σ) = 0 in (27′) and assuming that (29) is satisfied, it follows that the number of roots doubles and they can be determined by simply solving the piecewise-linear equation w1(σ) = 0 as follows:
$$ z_1 + (z_0 + \sigma) = 0, \quad \text{if } \sigma > -z_0 \tag{31a} $$
Fig. 7. Bifurcation tree for a universal multi-nested CNN cell. For each additional level of “nesting” the number of roots (transitions) of the discriminant function doubles instead of incrementing by one, as in the case of the canonical PWL cell. In this example we considered a Boolean function with 3 inputs and two levels of nesting. Therefore, given an appropriate orientation vector, the multi-nested discriminant leads to a more compact representation. However, the positions of the transitions on the projection tape are now restricted to lie within a configuration determined by a bifurcation tree (colored in green). This leads to a more complicated procedure for finding the optimal orientation and the remaining cell parameters. Each level of nesting corresponds to a “branching” in the bifurcation tree corresponding to “root doubling” reminiscent of the “period doubling” scenario in nonlinear dynamics.
or
$$ z_1 - (z_0 + \sigma) = 0, \quad \text{if } \sigma < -z_0 . \tag{31b} $$
If z1 < 0 both alternatives are possible, leading to a doubling of the number of roots (in this case from 1 root at nesting level 0 to 2 roots at nesting level 1); namely:
$$ \sigma^1_0 = -z_0 - z_1 \quad \text{and} \quad \sigma^1_1 = -z_0 + z_1 . \tag{32} $$
If z1 ≥ 0 (therefore contradicting (29)), neither (31a) nor (31b) has a solution. Indeed, (31a) has the formal solution σ = −z0 − z1, but since z1 ≥ 0 this solution does not satisfy the constraint σ > −z0. A similar situation occurs for (31b). In terms of the bifurcation tree shown in Fig. 7, this situation corresponds to a “collision” between the two resulting branches. Observe that (32) is a special case of (30), when m = 1. Here we used the notation t^m_q to represent the “q”th root of the discriminant function associated with the nesting level m. In terms of the bifurcation tree, (31a) and (31b) are equivalent to generating a set of symmetrical branches centered on the trunk associated with the previous nesting level, as shown in Fig. 7 for the particular case Z = [z0, z1, z2] = [−2, −4, −2]. At each additional level of nesting, such branches bifurcate into twice as many branches following the same scenario. Looking at the representation of the bifurcation tree in Fig. 7, it is clear that “branch collisions” can be avoided if and only if |zm| < · · · < |z2| < |z1| and zk < 0, k = 1, . . . , m, which is in fact the condition (29). Therefore, a “branch collision” leading to a decrease in the number of roots occurs whenever at least one of the inequalities in (29) is violated.
6.2. Uniform multi-nested cells and their bifurcation trees
Let us consider a special case of (29) which always leads to a uniform distribution of roots on the projection tape. This case corresponds to the additional constraint:
$$ z_k = -\eta\, 2^{-k}, \quad k = 1, \ldots, m \tag{33} $$
where η is an arbitrary scaling coefficient.
Since (33) specifies explicitly the bias parameters {z1, z2, . . . , zm}, a dramatic simplification in our design method is expected. Indeed, in this case our design procedure reduces to that of finding a uniform orientation, i.e. an orientation vector b which will generate a uniform projection tape to match the uniform multi-nested discriminant function. We conjecture that for any Boolean function there always exists at least one uniform orientation. This conjecture has been verified for n ≤ 3 inputs, and partially verified for n = 4, as shown in Sec. 6.4. It follows from (30) that any root of the equation w(σ) = 0 is given by:
$$ \sigma_q = -z_0 - \eta \sum_{k=1}^{m} 2^{-k} \Psi_k \tag{34} $$
It can be easily checked that the distance between two consecutive roots on the σ axis is constant, and that it corresponds to a one-bit change (from −1 to +1) in the least significant position Ψm of the binary number Ψ1 Ψ2 . . . Ψm. Therefore, for m levels of nesting, the distance between two consecutive roots is given by
$$ \Delta_m = \eta\, 2^{-m} \left( +1 - (-1) \right) = \eta\, 2^{-m+1} \tag{35} $$
and the leftmost root σ0 on the σ axis is given by
$$ \sigma_0 = -z_0 - \eta \left( 1 - 2^{-m} \right) . \tag{36} $$
For the multi-nested PWL function (with m = 2 nests) presented in Fig. 7, the bifurcation tree is a uniform one, with η = 2^(m+1) = 8. As shown in the same figure, it splits the projection tape into uniform segments containing an equal number of “red” and “blue” projected vertices, with a distance of four units between two consecutive roots (at nesting level m = 2).
6.2.1. The uniform multi-nested discriminant as an analog-to-digital converter
It is interesting to observe that the multi-nested discriminant function (27), with the threshold parameters defined by (33), performs a very important function in signal processing; namely the conversion from analog values (in this case, the excitation σ) into binary ones. Indeed, if one considers
Footnote 7. A Gray code is a binary code associated with a decimal number so that any increment or decrement of the decimal number will correspond to the change of only 1 bit in its associated Gray code.
the sign (color) sk = sgn(wk(σ)) of each intermediate discriminant function in (27′), it is easy to check that the binary word S = s0 s1 s2 . . . sm is a Gray code (see footnote 7) of the analog input σ. Such structures have been proposed for high-speed video signal analog-to-digital conversion [Fiedler & Seitzer, 1978] and, with certain technological improvements, they currently represent the most compact and fast analog-to-digital converters. There is also a nonlinear dynamics interpretation for the recursive definition (27′); viz., in the uniform case, each nesting level in (27′) is equivalent to a “tent map” [Ott, 1993] transformation, as pointed out in [Kennedy, 1995].
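The Gray-code property can be illustrated with a few lines of Python. This is a sketch under assumed values, not taken from the paper: η = 8, m = 2 and z0 = 0 are chosen only for illustration, `sign_word` is a hypothetical helper, and one sample point is taken inside each interval delimited by the roots of w0, w1 and w2.

```python
def sign_word(sigma, z):
    """Signs (s_0, ..., s_m) of the intermediate discriminants w_k(sigma) in (27')."""
    w = z[0] + sigma
    word = [1 if w > 0 else -1]
    for zk in z[1:]:
        w = zk + abs(w)
        word.append(1 if w > 0 else -1)
    return tuple(word)

eta, m, z0 = 8.0, 2, 0.0                                    # illustrative values
z = [z0] + [-eta * 2.0 ** (-k) for k in range(1, m + 1)]    # Eq. (33)

# For these values the roots of w_0, w_1, w_2 are -6, -4, -2, 0, 2, 4, 6,
# so the odd integers below fall one per interval.
samples = [-7, -5, -3, -1, 1, 3, 5, 7]
words = [sign_word(x, z) for x in samples]
for x, word in zip(samples, words):
    print(x, word)

# Number of sign changes between the words of neighbouring intervals.
hamming = [sum(a != b for a, b in zip(p, q)) for p, q in zip(words, words[1:])]
print(hamming)   # [1, 1, 1, 1, 1, 1, 1]
```

Each step from one interval to the next flips exactly one sign, and the eight words are all distinct, which is the defining property recalled in footnote 7.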
6.2.2. Uniform orientations and projection tapes
A uniform projection tape associated with a Boolean function ID is obtained for a uniform orientation bu. As shown in Secs. 3.4 and 5, for a given orientation vector b, a projection tape is completely specified by its associated transition vector T = {t1, t2, . . . , ttr}, and by the sign (color) of the leftmost projected vertex: Y(Vα0). A uniform projection tape must satisfy the additional constraint:
$$ t_{j+1} - t_j = \delta, \quad j = 1, \ldots, tr - 1 . \tag{37} $$
Experimental results on arbitrary Boolean functions indicate that the robustness restriction (12) imposed on the transition vector of a projection tape dramatically reduces the chances of finding an associated uniform orientation. For example, in the case n = 3, it was impossible to find uniform orientations which satisfy (12) for 24 out of 256 functions. However, when this restriction was removed, thereby allowing transitions to be located in arbitrary positions between projected vertices, uniform orientations were found for these 24 functions. Therefore, in what follows we will remove restriction (12) from the definition of projection tapes. In effect, some Boolean functions will have less robust realizations than others. As shown in Sec. 5, by removing the restriction (33) which defines uniform multi-nested discriminant functions, the robustness will increase at the expense of storing the additional information represented by the parameters {z1, z2, . . . , zm} for any prescribed Boolean function. We conjecture that for any Boolean function there is at least one uniform orientation leading to a uniform projection tape. Given a uniform projection tape, the realization via a uniform multi-nested cell is straightforward, as shown in the next subsection.
6.3. Boolean realizations via uniform multi-nested cells; an analytic approach
If a Boolean function has a uniform projection tape with tr roots and the distance between consecutive roots is δ, a uniform multi-nested cell can be simply designed so that its associated bifurcation tree will match the projection tape:
Uniform multi-nested CNN universal cell realization procedure
Assumption: At least one uniform projection tape exists.
1. Choose
$$ m = \lceil \log_2(tr) \rceil \tag{38} $$
where ⌈x⌉ denotes the smallest integer not less than x (the ceiling operator).
2. Following (35), determine η so that δ = ∆m. It follows that:
$$ \eta = \delta\, 2^{m-1} . \tag{39} $$
3. Since the roots of a uniform multi-nested discriminant must satisfy (34), determine the bias z0 so that (34) is satisfied. In the general case where tr is not a power of two, there are many valid choices (see footnote 8). However, we will always choose the first transition t1 in the projection tape to be the first root σ0 of the uniform multi-nested discriminant. Therefore σ0 = −z0 − η(1 − 2^(−m)) = t1, where t1 is the leftmost transition on the projection tape. It follows that:
$$ z_0 = -\eta \left( 1 - 2^{-m} \right) - t_1 = -t_1 - \eta + \delta/2 . \tag{40} $$
4. Determine the sign parameter s in (27). From (27′) it follows that at any level of nesting k, k ≥ 1, the leftmost segment of the piecewise-linear discriminant wk(σ) has a slope equal to −1. This is true for the last level of nesting “m” as well. It follows from (27′) that the sign of the discriminant function when σ < σ0 is equal to s. Since σ0 = t1 and there is only one projected vertex v0 to the left of t1 on the projection tape, it follows that:
$$ s = \begin{cases} Y(v_0) = Y(V_{\alpha_0}), & \text{if } m \geq 1 \\ -Y(v_0) = -Y(V_{\alpha_0}), & \text{if } m = 0 \end{cases} \tag{41} $$
since for m = 0 the slope of the linear discriminant function is +1.
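The four design steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names `uniform_nested_gene` and `nested_w` are hypothetical, and the two test tapes are the parity tapes with orientation b = [1, . . . , 1], for which the output is assumed to be +1 when the number of +1 inputs is odd.

```python
import math

def uniform_nested_gene(T, y_v0):
    """Design steps 1-4, Eqs. (38)-(41), for a uniform projection tape.

    T    : equally spaced transition points (spacing delta)
    y_v0 : colour Y(v0) of the leftmost projected vertex (+1 or -1)
    Returns (s, [z0, z1, ..., zm]).
    """
    tr = len(T)
    delta = T[1] - T[0] if tr > 1 else 1.0      # delta cancels out when tr == 1
    if any(abs((b - a) - delta) > 1e-9 for a, b in zip(T, T[1:])):
        raise ValueError("projection tape is not uniform")
    m = math.ceil(math.log2(tr))                            # Eq. (38)
    eta = delta * 2.0 ** (m - 1)                            # Eq. (39)
    z0 = -T[0] - eta + delta / 2                            # Eq. (40)
    zk = [-eta * 2.0 ** (-k) for k in range(1, m + 1)]      # Eq. (33)
    s = y_v0 if m >= 1 else -y_v0                           # Eq. (41)
    return s, [z0] + zk

def nested_w(sigma, s, z):
    """Multi-nested discriminant, recursive form (27')."""
    w = z[0] + sigma
    for zk in z[1:]:
        w = zk + abs(w)
    return s * w

s4, z4 = uniform_nested_gene([-3, -1, 1, 3], y_v0=-1)          # 4-input parity tape
s9, z9 = uniform_nested_gene(list(range(-8, 9, 2)), y_v0=-1)   # 9-input parity tape
print(s4, z4)   # -1 [0.0, -2.0, -1.0]
print(s9, z9)   # -1 [-7.0, -8.0, -4.0, -2.0, -1.0]
```

Evaluating `nested_w` at the projected vertices σ = 2k − n (k inputs equal to +1) gives the expected alternating colours for both tapes.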
Example 6. Let us consider again the Parity4 function, for which we have obtained earlier a realization via the canonical PWL cell in Sec. 3 (Example 4). We know that this function is a totalistic Boolean function, so in a search for a uniform orientation (which leads to a uniform projection tape) the most natural choice would be b = [1, 1, 1, 1]. If the cell is considered from the perspective of having 9 inputs (e.g. a Parity9 function in a two-dimensional CNN grid), and knowing which are the “active” inputs, the orientation vector can be rewritten as b = [0, 1, 0, 1, 0, 1, 0, 1, 0].
Indeed, as shown in Fig. 8, its associated decoding tape is defined by the transition vector T = {−3, −1, 1, 3} and it is a uniform projection tape with δ = 2 and t1 = −3. Applying the above algorithm, a matching bifurcation tree is simply determined as follows:
m = ⌈log2(4)⌉ = 2,  η = δ 2^(m−1) = 2 · 2 = 4,  z0 = −t1 − η + δ/2 = 3 − 4 + 1 = 0.
Using the definition (33) for uniform multi-nested discriminant functions, the remaining values of the threshold parameters can be easily determined as follows:
z1 = −η 2^(−1) = −2,  z2 = −η 2^(−2) = −1.
Similarly, the sign parameter is given by: s = Y(v0) = −1.
Example 7. Let us now consider the Parity9 Boolean function, i.e. a function which determines whether the number of +1 inputs is odd or even. An equivalent definition (within the binary formalism {−1, 1}) of the Parity9 function is that it computes the product of the 9 binary inputs. Since this Boolean function is totalistic, a good choice for the orientation vector is b = [1, 1, 1, 1, 1, 1, 1, 1, 1]. Indeed, this choice leads to a uniform projection tape defined by the transition vector T = {−8, −6, −4, −2, 0, +2, +4, +6, +8} with tr = 9 and δ = 2, as shown in Fig. 9. The simple design algorithm defined by Eqs. (38)–(40) can be applied again, leading to the following realization:
m = ⌈log2(9)⌉ = 4,  η = δ 2^(m−1) = 2 · 8 = 16,  z0 = −t1 − η + δ/2 = 8 − 16 + 1 = −7.
Using the definition (33) for the uniform multi-nested discriminant functions, the remaining values of the threshold parameters can be easily determined as follows:
z1 = −η 2^(−1) = −8,  z2 = −η 2^(−2) = −4,  z3 = −η 2^(−3) = −2,  z4 = −η 2^(−4) = −1.
Similarly, the sign parameter is given by: s = Y(v0) = −1. Observe that in this case, the rightmost seven branches of the bifurcation tree (2^m − tr = 16 − 9) remain unused, and they will intersect the projection tape in a “don’t care” region void of any projected vertices. It is also important to observe that from a hardware realization perspective, only four nonlinear devices (absolute value operators) and four additional parameters are required to implement the Parity9 function via the above uniform multi-nested cell.
Footnote 8. Assuming that the 2^m − tr additional roots of the uniform multi-nested function lie in the “don’t care” regions on the projection tape, i.e. regions where there is no input vertex projected.
Theorem 6.1. The parity function with n inputs admits the simplest realization, having a complexity of O(log2(n)), via a uniform multi-nested PWL CNN cell.
Proof. The proof is constructive, and it consists of applying the design procedure defined by (38)–(40). It can be easily verified that the orientation vector b = [1, 1, . . . , 1] (n coefficients 1) is a uniform orientation vector which leads to a uniform projection tape with n transitions (tr = n) defined by the transition vector T = {−n + 1, −n + 3, . . . , n − 1},
Fig. 8. Three steps for realizing a multi-nested CNN cell for the totalistic Boolean function “Parity4” (also illustrated in Fig. 4): (1) The orientation vector is based on the totalistic character of the function. (2) The associated projection tape has 4 uniform transitions, thereby determining a “matching” bifurcation tree structure associated with the uniform multi-nested formula. (3) As a result of the matching process, the bias and the sign parameters z0 , z1 , z2 , s are found, leading to a realization with only two absolute value terms instead of three such terms required by the optimal realization of the same Boolean function via the canonical PWL universal CNN cell (Fig. 4).
Fig. 9. Three steps for realizing a multi-nested CNN cell for the totalistic Boolean function “Parity9”: (1) The orientation vector is based on the totalistic character of the function. (2) The associated projection tape has 9 uniform transitions, thereby determining a “matching” bifurcation tree structure associated with the multi-nested formula. (3) As a result of the matching process the bias and the sign parameters z0 , z1 , . . . , z4 , s are found, leading to a realization with only four absolute value terms instead of eight such terms required by the optimal realization of the same Boolean function via the canonical PWL universal CNN cell.
and having δ = 2. Therefore, by applying the above design algorithm, it follows that:
m = ⌈log2(n)⌉,  η = δ 2^(m−1) = ]n[,
where ]n[ denotes the smallest power of 2 which is not smaller than n, and
z0 = −t1 − η + δ/2 = n − ]n[.
Using definition (33) for uniform multi-nested discriminant functions, the remaining values of the threshold parameters can be easily determined as follows:
$$ z_k = -\eta\, 2^{-k} = -\frac{]n[}{2^k}, \quad k = 1, \ldots, m . $$
Similarly, the sign parameter is given by: s = Y(v0) = −1.
This result is extremely significant because the simplest realization of parity functions with n inputs reported in the literature has a linear complexity, since it requires M = O(n − 1) Parity2 (or XOR) gates. Since parity functions are widespread in modern information processing systems, their function being often associated with parity checksums for error detection and/or correction, our solution via a uniform multi-nested CNN cell offers a dramatic reduction in complexity compared to any existing (digital) solution. Indeed, tasks like checking the parity of binary words of 64 bits are standard operations performed currently in computers. Observe that even though 63 XOR gates would be required to accomplish this task using a conventional approach, by using our CNN cell, only six simple nonlinear devices (absolute value functions) and seven additive parameters (biases) are required. The projection of the input onto the uniform orientation bu is realized in this case by a simple summation of the inputs, which can be achieved with minimal hardware cost via KCL (Kirchhoff’s Current Law).
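The closed-form gene from the proof can be checked for a range of word lengths. The Python sketch below is illustrative: `parity_gene` and `cell_output` are hypothetical names, the orientation is assumed to be b = [1, . . . , 1] so that σ = 2k − n for an input with k entries equal to +1, and the parity output is taken as +1 when k is odd, consistent with s = Y(v0) = −1.

```python
import math

def parity_gene(n):
    """Closed-form gene from the proof of Theorem 6.1 (orientation b = [1, ..., 1])."""
    m = math.ceil(math.log2(n))
    p = 2 ** m                                   # ]n[ : smallest power of two >= n
    z = [n - p] + [-p / 2 ** k for k in range(1, m + 1)]
    return -1, z                                 # s = Y(v0) = -1

def cell_output(ones, n):
    """Cell output for an input with `ones` entries equal to +1 (sigma = 2*ones - n)."""
    s, z = parity_gene(n)
    w = z[0] + (2 * ones - n)
    for zk in z[1:]:
        w = zk + abs(w)
    return 1 if s * w > 0 else -1

ok = all(cell_output(k, n) == (1 if k % 2 else -1)
         for n in range(2, 65) for k in range(n + 1))
nests_64 = len(parity_gene(64)[1]) - 1
print(ok, nests_64)   # True 6
```

For n = 64 the gene contains m = 6 nests and seven biases z0, . . . , z6, matching the count given in the text.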
6.4. Enumerating Boolean realizations via uniform multi-nested cells
In the previous subsection, we have presented an exact design solution for the case of a uniform projection tape, assuming that a uniform orientation had
been determined. For certain classes of Boolean functions, including the parity function, finding such an orientation is a simple process. Moreover, it is not necessary that every totalistic function must use the orientation vector b = [1, 1, 1, . . . , 1] in order to obtain a uniform projection tape. In the case of arbitrary Boolean functions, the task of finding optimal and uniform orientations can be a difficult one in the absence of a formal theory for an exact search procedure. However, any of the methods described in Sec. 4 can be applied, by simply replacing the objective function to be minimized. Instead of a minimal number of transitions, now we are seeking orientations with a uniform distribution of transitions. Although the number of transitions is not so important anymore, it would be convenient to find uniform orientations having a minimum number of transitions. Another difference from the case of canonical PWL functions is that, in the case of uniform multi-nested discriminant functions, there is yet no formal proof to guarantee the existence of a uniform projection tape for arbitrary Boolean functions. In other words, in this case, we cannot provide the equivalent of a default orientation as a “starting” solution. Yet, based on our successful results in enumerating uniform multi-nested CNN cell realizations with 3 and 4 inputs, we conjecture that any Boolean function admits at least one uniform projection tape. In this section we will examine several results obtained simply by enumerating the Boolean realizations using uniform multi-nested discriminants. In this case, instead of seeking a valid realization for a specific Boolean function, we are concerned with finding a uniform multi-nested CNN cell realization for the entire set of Boolean functions with n inputs. For n ≤ 4, we will show that a simple algorithm for exploring the parameter space will lead to a solution (i.e. find the complete table of parameters). 
For the case of uniform multi-nested discriminants let us assume that the uniform structure of the projection tape will induce a uniform partition of the parameter space [b1 , b2 , . . . , bn ], i.e., we assume that the failure boundary (with respect to our “objective” (goal) function of finding a uniform orientation) has the geometry of a regular mesh separating identical polyhedral domains, each one associated with a particular Boolean function realization. Hence, let us assume bi = −q + p, i = 1, . . . , n where q is an integer which measures the “resolution” of our search algorithm, and
p = 0, . . . , 2q. We will also use the same exploration strategy for z0, i.e. z0 = −q + p. Therefore, for a fixed value of q, there are (n + 1)^(2q+1) possible orientation vectors to explore. Since we expect to find at least one valid orientation for each of the 2^(2^n) possible Boolean functions, it follows that necessarily (n + 1)^(2q+1) > 2^(2^n). This inequality gives some hints for choosing q. Observe that a larger q requires a larger computation time, while a smaller q has a higher probability that some uniform orientations will be missed. According to this inequality, it follows that q ≥ 3 and q ≥ 4, for n = 3 and n = 4, respectively. The following algorithm will allow us to derive a table of parameters, each row realizing a Boolean function with n inputs. Consequently, as in [Chua, 1998], the set of parameters in each row which uniquely defines a Boolean function of n inputs is called a universal CNN gene.
Universal CNN gene enumeration algorithm: (Algorithm for finding uniform multi-nested realizations for all Boolean functions with n inputs)
(1) INITIALIZATION:
  (1.1) Choose q and η; determine {z1, z2, . . . , zm} using (33).
  (1.2) Initialize a table TAB(ID) with 2^(2^n) entries, ID = 0, . . . , 2^(2^n) − 1. Each row contains a uniform multi-nested “gene” with the gene parameters listed in the order [s, z0, b1, b2, . . . , bn]. All genes are initialized with 0.
(2) FOR z0 = −q, −q + 1, . . . , q
      FOR b1 = −q, −q + 1, . . . , q
        · · ·
          FOR bn = −q, −q + 1, . . . , q
            FOR s ∈ {−1, 1}
              Exploratory gene: G = [s, z0, b1, b2, . . . , bn]
              Determine ID(G) by using the multi-nested formula (27)
              IF TAB(ID) IS EMPTY, TAB(ID) = G.
Observe that although the above algorithm is rather simple, its running time (complexity) is O((2q + 1)^(n+1)), where q is proportional to 2^(n−1). For n = 3, the running time is of the order of tens of minutes on a Pentium 233 MHz processor. For n > 4, the running time becomes unreasonably large if q = 2^(n−1).
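The enumeration loop can be sketched compactly in Python for small n. The code below is an illustration under explicit assumptions, not the authors' program: the function name `enumerate_genes` is hypothetical, m = n − 1 nests are used, the numbering of the functions (bit j of ID set when vertex j maps to +1, with vertices in the order produced by `itertools.product`) is an assumed convention that may differ from the one in [Chua, 1998], and genes for which some vertex falls exactly on a root of w(σ) are skipped.

```python
from itertools import product

def enumerate_genes(n, q, eta):
    """Exhaustive search over integer genes [s, z0, b1, ..., bn] with entries in [-q, q]."""
    m = n - 1
    zk = [-eta * 2.0 ** (-k) for k in range(1, m + 1)]     # Eq. (33)
    vertices = list(product((-1, 1), repeat=n))             # the 2^n input vertices
    table = {}                                               # ID -> first gene found
    for z0, *b in product(range(-q, q + 1), repeat=n + 1):
        outputs = []
        for u in vertices:
            w = z0 + sum(bi * ui for bi, ui in zip(b, u))    # w_0 = z0 + sigma
            for z in zk:
                w = z + abs(w)                               # recursion (27')
            outputs.append(w)
        if any(w == 0 for w in outputs):
            continue        # a vertex sits exactly on a root: output undefined
        for s in (-1, 1):
            # Assumed ID convention: bit j is 1 when vertex j maps to +1.
            ident = sum(1 << j for j, w in enumerate(outputs) if s * w > 0)
            table.setdefault(ident, [s, z0, *b])
    return table

table = enumerate_genes(n=3, q=4, eta=6.0)
print(len(table), "of", 2 ** (2 ** 3), "functions found")
```

The loop visits (2q + 1)^(n+1) combinations of z0 and b, each with two signs, in line with the running time stated above. The number of functions found for a given q and η is left as an output of the run rather than asserted here.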
The parameter η has to be optimized for each n, and its value is directly related to the volume of the compact domain in the parameter space associated with the Boolean function being realized.

Example 8. Uniform multi-nested realizations for all Boolean functions with 3 inputs.
Using the above universal CNN gene enumeration algorithm, a complete table of genes for all Boolean functions with 3 inputs was derived and is shown in Table 1. This table was derived by starting with q = 4. A similar table was presented in [Dogaru & Chua, 1998f] using a different approach, i.e. by learning parameters using a gradient descent procedure. The table in that case was derived for a nonuniform multi-nested formula (where each gene has two additional parameters $z_1$, $z_2$). In the table presented here, these two parameters are the same for all entries; viz. $z_1 = -3$ and $z_2 = -3/2$, corresponding to η = 1.5q = 6. The enumeration algorithm presented above is found to be about 50 times faster than the gradient learning algorithm, and the resulting table is more compact. A further improvement can be easily achieved by gradually reducing and minimizing the number of nests.

Example 9. Uniform multi-nested realizations for n = 4 inputs.

(a) Fixed m = n − 1 = 3. For the case of n = 4, we started with q = 4 and η = 1.5q = 6. In this case only 17 856 (about 27%) of the total of 65 536 Boolean functions were found to have a uniform multi-nested realization. By gradually increasing the value of q and repeating the algorithm without discarding the information obtained from previous runs, we were able to fill the table with 60 108 genes (i.e. 91% of the entire set of Boolean functions with 4 inputs) in almost one month of continuous running. The remaining genes were obtained for q = 16 and η = 1.5q = 24. By picking random values for the gene parameters (instead of enumerating all possible combinations from −q to q), each of the remaining Boolean functions can be realized in about 10 hours. Hence we have opted for a different approach to determine the realizations of the remaining 9% of the functions. This approach assumes a nonuniform multi-nested realization, where the bias parameters are free to vary and are adjusted together with the orientation vector via a multipurpose optimization algorithm (Alopex) presented in detail in the next
Table 1. Uniform multi-nested CNN cell realizations with 2 nests for the entire set of Boolean functions with 3 inputs. Observe that two of the bias parameters (z1 , and z2 ) are independent of the Boolean function, therefore only five parameters s, z0 , b1 , b2 , b3 listed in the table will determine the “gene” associated with a particular function ID. For any function where ID > 127 the associated realization is the one listed in this table for the function 255 − ID but where the sign parameter s is inverted (multiplied by −1).
subsection. The Alopex algorithm acts on a given Boolean function by trying to find a valid gene via (27). For n = 4, this algorithm can find a realization of an arbitrary Boolean function (in the sense of finding its gene) in about 1 minute (on a Sun Ultra-Sparc 10 workstation under Matlab). For some functions, several runs with different initial conditions may be needed before a valid realization is found.

(b) Variable m. In this case, we are interested in estimating how many Boolean functions admit representations with a given minimum number m of nests. The value q = 16 was used as a compromise between speed and accuracy. In all cases, we use η = 1.5q = 24. The results show that among the 60 108 functions for which 3-nested realizations were found, only 60 108 − 55 805 = 4303 require 3 nests. The remaining 55 805 require at most 2 nests. Among them, 55 805 − 15 646 = 40 159 functions require m = 2 nests, while the remaining 15 646 functions require only m ≤ 1. Of these, 1882 functions are linearly separable, and therefore only 15 646 − 1882 = 13 764 functions effectively require m = 1 nest. In percentages, this result is summarized in the pie-chart shown in Fig. 10(a). Even though this result is not complete (due to the difficulty in finding realizations for 8.2% of the Boolean functions), it gives us a glimpse of the distribution of Boolean functions with respect to the number of nests. Observe that most of the Boolean functions (61.2%) can be realized with only m = 2 nests, while the maximum number of nests (m = 3) is required for only 6.56% of the total number of cases. This result suggests that the maximal number of nests (m = n − 1) is required to realize only a relatively small number of Boolean functions. This observation is consistent with that found for the multi-nested realizations of Boolean functions with 3 inputs presented in [Dogaru & Chua, 1998f]. In that case, only 9.37% of the functions were found to require the maximal number of nests (m = 2), while the largest group of functions (50%) was realized with only m = 1, as shown in Fig. 10(b).
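The bookkeeping behind these percentages for n = 4 can be checked with a short script. It is a minimal sketch that uses only the counts quoted in the text; the variable names are hypothetical. The computed shares agree with the quoted 61.2%, 6.56% and 8.2% if those figures are read as truncated rather than rounded.

```python
# Number of Boolean functions with n = 4 inputs.
TOTAL = 2 ** (2 ** 4)

# Cumulative counts quoted in the text: functions realizable with at most m nests.
at_most = {0: 1882, 1: 15646, 2: 55805, 3: 60108}

# Functions that require exactly m nests, plus those with no realization found.
counts = {
    0: at_most[0],
    1: at_most[1] - at_most[0],
    2: at_most[2] - at_most[1],
    3: at_most[3] - at_most[2],
    "unresolved": TOTAL - at_most[3],
}
shares = {k: 100.0 * v / TOTAL for k, v in counts.items()}

for k, v in counts.items():
    print(f"m = {k}: {v:6d} functions ({shares[k]:.2f}%)")
```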
6.5. Boolean function realization via optimization with evolutionary strategies

For a limited number of inputs, exploratory search of the orientation vector parameter space can provide a reliable method for realizing multi-nested CNN cells with a uniform roots structure, as shown in the previous sections. However, this approach becomes inefficient for n ≥ 4. In this section we will consider a general purpose optimization method, called “the Alopex algorithm” [Harth & Pandya, 1988], which can be used to realize any Boolean function with any number of input variables. The essence of this method is simple: Given a multi-nested structure (27) with a well defined number of nests, assume the parameter s to be either +1 or −1, and generate a random gene. In this case the gene $G = [s, z_0, z_1, \ldots, z_m, b_1, \ldots, b_n]$ is a vector with m + n + 1 real-valued parameters. In the case of a completely random search, at each step a random mutation is generated and an objective function F is evaluated. The function F, in our case, is defined to be the distance between the desired Boolean function Y and the effective Boolean function realized via (27) with the assumed gene. Whenever a gene is found that lowers F, it is stored, and the process continues until F reaches its global minimum value F = 0. The goal function in our case is defined as follows:

$$F = 0.5 \sum_{j=0}^{2^n - 1} \left| Y(V_j) - \mathrm{sigm}\big(w(\sigma(V_j, G), G), \rho\big) \right| \qquad (42)$$

where $w(\sigma(V_j, G), G)$ is given by (27) for inputs corresponding to the vertex $V_j$ and having its parameters defined by the actual gene G. The discontinuous “sign” function from equation (20′) is replaced here with a smoother approximation; viz., the sigmoid function:

$$\mathrm{sigm}(x, \rho) = \frac{1 - \exp(-\rho x)}{1 + \exp(-\rho x)} \qquad (43)$$
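As a rough illustration of (42) and (43), the goal function can be evaluated as follows. This is a sketch under stated assumptions, not the authors' code. The multi-nested discriminant is written out from its form w(σ) = s(z_m + |z_{m−1} + · · · |z_1 + |z_0 + σ|||), the ordering of the vertices $V_j$ is a hypothetical choice, and the sigmoid is computed through the algebraically identical form tanh(ρx/2) to avoid overflow for large |x|.

```python
import math


def nested_w(sigma, s, z):
    """Multi-nested discriminant s*(z_m + |z_{m-1} + ... |z_1 + |z_0 + sigma|||)."""
    acc = z[0] + sigma
    for zk in z[1:]:
        acc = zk + abs(acc)
    return s * acc


def sigm(x, rho):
    """Eq. (43); (1 - exp(-rho*x)) / (1 + exp(-rho*x)) equals tanh(rho*x/2)."""
    return math.tanh(0.5 * rho * x)


def goal(Y, s, z, b, rho=0.5):
    """Eq. (42): F = 0.5 * sum_j |Y(V_j) - sigm(w(sigma(V_j)), rho)|.

    Y is a list of +1/-1 target outputs, one per vertex. Assumed (hypothetical)
    vertex order: vertex j has u_i = +1 when bit i of j is set, otherwise -1.
    """
    n = len(b)
    total = 0.0
    for j in range(2 ** n):
        u = [1 if (j >> i) & 1 else -1 for i in range(n)]
        sigma = sum(bi * ui for bi, ui in zip(b, u))
        total += abs(Y[j] - sigm(nested_w(sigma, s, z), rho))
    return 0.5 * total


if __name__ == "__main__":
    xnor = [1, -1, -1, 1]  # +1 when both inputs agree
    print(goal(xnor, s=1, z=[0, -1.5], b=[1, 1]))
    print(goal(xnor, s=1, z=[0, -15.0], b=[10, 10]))
```

In this sketch each mismatched vertex contributes close to 1 to F once the sigmoid saturates, so F can be read as an approximate count of wrong output bits.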
Replacing the sign function with the sigmoid was found useful for achieving a faster convergence. By trial and error, the most convenient value for the sigmoid gain was found to be ρ = 0.5 for our optimization problem. In practice, the process of minimizing F by random search can last for a very long time, being somewhat analogous to the natural evolution of species by genetic mutations. The Alopex algorithm [Harth & Pandya, 1988] is a directed random search in the sense that it seeks a dramatic reduction in the time required for the evolution by reinforcing those mutations which resulted in greater reductions in the goal function F. Therefore, the
Fig. 10. Minimal number of “nests” required for realizing a multi-nested PWL cell for arbitrary Boolean functions. The entire sets of possible Boolean functions with 4 inputs and 3 inputs are shown in (a) and (b), respectively. Observe that in both cases, most of the Boolean functions require m − 1 nests, where m is the maximum number of nests, which is linearly proportional to the number of inputs. (a) The majority (61%) of the Boolean functions require only two absolute value function terms (nests) for realizing a uniform multi-nested CNN cell, while the maximum number of nests is 3. Due to the large amount of time required to explore the cell parameter space for all 65 536 Boolean functions, no realizations were found for 8% of the functions. Further training procedures and the elimination of the uniformity restriction showed that all of these functions can be realized with no more than 3 nests (most require 3 nests). (b) Uniform multi-nested PWL cells require only 1 nest to realize 50% of the 256 Boolean functions with 3 inputs, while the maximum number of nests (m = 2) is required for the realization of only 9% of the functions.
“path” in the parameter space towards an acceptable solution is not completely random; instead, it takes into account the correlation between random mutations and the objective function. An additional parameter T, called temperature, influences the degree to which the mutations are either directed or left to be completely random. At the beginning of the Alopex process, the temperature is chosen to be high, which corresponds to a random search. Then the temperature is lowered during successive iterations so that the effects of mutations on the objective function can be taken into account, thereby achieving a faster convergence towards an optimal solution. In a random search, the mutation ∆G is a binary vector with as many components as there are parameters in the gene, and each component is −1 or +1 with equal probability 1/2. Therefore no direction in the parameter space is privileged. Formally, this situation can be expressed as:

$$\Delta G = [g_1, g_2, \ldots, g_k, \ldots, g_{n+m+1}], \qquad (44)$$

where $P(g_k = -1) = P(g_k = 1) = 0.5$.

The mutation strength is determined by an additional parameter θ > 0. Independent of the type of search (random or directed), the resulting “mutant” gene is:

$$G = G + \theta \Delta G. \qquad (45)$$

In a directed search, the probabilities of the mutation vector are not necessarily equal to 1/2. Instead they are determined using the following steps:

(a) Determine the correlation between the last mutation $g_k^-$ and its effect on changing the goal function from $F^{--}$ (from two iterations earlier) to $F^-$:

$$c_k = g_k^- (F^- - F^{--}), \quad k = 1, \ldots, m + n + 1. \qquad (46)$$

(b) Determine the following threshold parameter (which is influenced by the actual temperature T):

$$t_k = \frac{1}{1 + \exp(c_k / T)}. \qquad (47)$$

Observe that $t_k \approx 0.5$ if the temperature is much higher than $|c_k|$, and that it can range from 0 to 1 depending on $c_k$ when $T < |c_k|$.

(c) Determine the new component $g_k$ of the mutation vector:

$$g_k = \mathrm{sgn}(t_k - \xi), \qquad (48)$$

where ξ is a random variable uniformly distributed on the interval [0, 1]. Such a variable may correspond to a chaotic signal in a practical realization.

The above Alopex algorithm for finding a multi-nested gene for a given Boolean function Y can be summarized as follows:

(1) INITIALIZATION:
    (1.1) Algorithm parameters: n_stps = 4000, ρ = 0.5, T = 1, θ = 1/n, win_temp = 10.
    (1.2) Generate an arbitrary random gene,⁹ $G = [G_1, G_2, \ldots, G_k, \ldots, G_{m+n+1}]$, where $G_k = \xi - 0.5$, i.e., a random variable uniformly distributed between −0.5 and 0.5.
    (1.3) Generate a null vector of previous mutations and the maximum value possible for the previous goal functions: $\Delta G^- = [g_1^-, g_2^-, \ldots, g_k^-, \ldots, g_{m+n+1}^-] = [0, 0, \ldots, 0]$, $F^- = F^{--} = 2^{n-1}$.
(2) FOR step = 1 to n_stps
    (2.1) Generate a “directed” mutation ∆G using (46)–(48) and update the gene with (45).
    (2.2) Evaluate the objective function F using (42).
    (2.3) IF F < 0.45, test whether the actual gene is a valid realization by employing (20′), where w(σ) is given by (27) and its parameters are defined by the actual gene.
          IF (the realization is valid) EXIT
          END
    (2.4) Prepare the next step: $\Delta G^- = \Delta G$, $F^{--} = F^-$, $F^- = F$.

In addition, the Alopex algorithm has a schedule for updating the temperature, after each pool
Fig. 11. Two examples which used the modified “Alopex” algorithm to “learn” the orientation vectors and the bias parameters so that a multi-nested CNN cell represents a given Boolean function with 4 inputs. In each figure the left plot indicates the evolution of the objective function F during the training process, and the right plot indicates the evolution of the “temperature” T parameter of the algorithm. As expected, both the objective function and the temperature decrease with some fluctuations. (a) A valid solution was found after 4000 iterations, if the number of nests was prescribed at m = 3. (b) When the number of nests is prescribed at m = 2, the “Alopex” algorithm is unable to find a valid solution indicating that for this particular function the maximum number of nests is required. The resulting solution corresponds to a Boolean function which differs from the original in only 1 of 16 bits.
Table 2. Excerpt from a table containing multi-nested CNN cell realizations with 3 nests of Boolean functions with 4 inputs. All rows in color black correspond to uniform multi-nested realizations (i.e. where the set of parameters [z1 , z2 , z3 ] = [−12, −6, −3] is independent of the function’s ID). In these cases, the orientation vectors and the bias parameter z0 were determined by the enumeration algorithm described in Sec. 6.4. The remaining rows (colored in green and red) indicate realizations found via the “Alopex” algorithm described in Sec. 6.5. The red row corresponds to a function with the bias parameter z1 = 0, thus indicating that two absolute value function terms have merged into one, corresponding to a 2-nested implementation.
of win_temp steps. The new temperature is computed as the variance of a vector formed by all correlation coefficients $c_k$ evaluated with (46) during the last win_temp steps. Although the algorithm parameters listed in the “Initialization” section of the algorithm were found to be the most convenient for solving Boolean functions with 4 inputs, they can be used for a larger number of inputs with minor adjustments.

Example 10. Consider the Boolean function with 4 inputs defined by ID = 43567. By choosing arbitrarily the sign parameter s = −1, the Alopex algorithm was first executed for m = 3. As shown in Fig. 11(a), the result is a valid gene which was obtained after 4000 iterations. However, any attempt to find a realization with m = 2 failed, as shown in Fig. 11(b), where the algorithm saturates on a solution which differs by only 1 bit (F = 1) from the optimal one. Observe that the goal function F does not decrease monotonically, as it would in the case of a well-tuned gradient algorithm. This is the effect of the partially random search, which allows for a much better search strategy in the parameter space by avoiding being trapped in various local minima. It is also interesting to observe that by allowing the biases $z_1$, $z_2$, $z_3$ to vary, the final solution does not usually satisfy the restriction (29).

⁹ Or a gene previously derived from an unfinished optimization process.
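The directed search of (44)–(48), together with the temperature schedule, can be sketched as below. This is a minimal sketch under stated assumptions and should not be read as the authors' Matlab implementation. The vertex ordering is a hypothetical choice, and the exponent in (47) is clamped and the temperature is kept above a small floor purely to avoid numerical overflow and division by zero. The names `evaluate` and `alopex` are hypothetical. Convergence within a given number of steps is not guaranteed.

```python
import math
import random
import statistics


def nested_w(sigma, s, z):
    """Multi-nested discriminant s*(z_m + |z_{m-1} + ... |z_1 + |z_0 + sigma|||)."""
    acc = z[0] + sigma
    for zk in z[1:]:
        acc = zk + abs(acc)
    return s * acc


def evaluate(Y, s, gene, n, rho):
    """Return (F from (42), True if sgn(w) matches Y at every vertex)."""
    z, b = gene[:len(gene) - n], gene[len(gene) - n:]
    total, exact = 0.0, True
    for j in range(2 ** n):
        u = [1 if (j >> i) & 1 else -1 for i in range(n)]  # assumed vertex order
        w = nested_w(sum(bi * ui for bi, ui in zip(b, u)), s, z)
        total += abs(Y[j] - math.tanh(0.5 * rho * w))  # sigmoid (43) as tanh
        exact = exact and w * Y[j] > 0
    return 0.5 * total, exact


def alopex(Y, n, m, s=1, n_stps=4000, rho=0.5, T=1.0, win_temp=10, seed=0):
    """Search for a gene [z0..zm, b1..bn] realizing Y; returns (gene or None, steps)."""
    rng = random.Random(seed)
    theta = 1.0 / n                                    # mutation strength
    size = m + n + 1                                   # s is held fixed
    G = [rng.random() - 0.5 for _ in range(size)]      # step (1.2)
    dG_prev = [0] * size                               # step (1.3)
    F_prev = F_prev2 = 2.0 ** (n - 1)
    pool = []                                          # correlations since last T update
    for step in range(1, n_stps + 1):
        c = [g * (F_prev - F_prev2) for g in dG_prev]  # (46)
        # (47); the clamp on the exponent is an implementation safeguard
        t = [1.0 / (1.0 + math.exp(max(-50.0, min(50.0, ck / T)))) for ck in c]
        dG = [1 if tk > rng.random() else -1 for tk in t]   # (48)
        G = [Gk + theta * gk for Gk, gk in zip(G, dG)]      # (45)
        F, exact = evaluate(Y, s, G, n, rho)                # step (2.2)
        if F < 0.45 and exact:                              # step (2.3)
            return G, step
        dG_prev, F_prev2, F_prev = dG, F_prev, F            # step (2.4)
        pool.extend(c)
        if step % win_temp == 0:
            # New temperature: variance of the pooled c_k (floor is an assumption).
            T = max(statistics.pvariance(pool), 1e-6)
            pool = []
    return None, n_stps


if __name__ == "__main__":
    target = [1, -1, -1, 1]  # a 2-input function that is not linearly separable
    gene, steps = alopex(target, n=2, m=1)
    print("found" if gene is not None else "not found", "after", steps, "steps")
```

If a run returns no gene, it can be repeated with a different `seed`, in the spirit of the remark that several runs with different initial conditions may be needed.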
Example 11. Completion of the entire genome for CNN cells with 4 inputs.
As shown in Sec. 6.4, using an algorithm based on enumeration, the genes for more than 91% of the entire set of 65536 Boolean functions were found via a uniform multi-nested CNN cell realization. The Alopex algorithm was then used to complete the table successfully, in the sense that for any arbitrary Boolean function with 4 inputs, its corresponding gene is found in this table. The file containing the table as well as an auxiliary Matlab function to access it are stored on the ftp site: ftp://fred.eecs.berkeley.edu/pub/radu/full4.tar, or can be obtained by request from the authors. An excerpt from this table is listed in Table 2; namely, 24 lines starting with the function ID = 43565. The rows corresponding to realizations obtained via the enumeration algorithm in Sec. 6.4 are printed in black color, while those obtained using the Alopex algorithm (followed by additional quantization to get integer parameters) are printed in green or red color. Observe that [z1 , z2 , z3 ] = [−12, −6, −3] for any realization obtained via the enumeration algorithm, as expected. In the case of realizations generated via the Alopex algorithm, the biases have distinct values for each realization. In the case of the realization of the Boolean function with ID = 43577 (printed in red color), the Alopex algorithm yields a solution with z1 = 0, which is equivalent to a 2-nested realization.
7. Concluding Remarks

In this paper we review recent results concerning a novel approach for designing uncoupled CNN cells with an arbitrary number of inputs, which are capable of universal computation over the binary space. Since uncoupled CNN cells were proved in [Chua, 1998] to have a steady state solution corresponding to a static input–output mapping, the term “universal computation” is used here in the sense that the same CNN cell structure, or gene, can realize any Boolean function by merely changing the gene parameters. The determination of the gene associated with a desired Boolean function is called a realization problem. Our approach is completely different from other solutions reported in the literature, and it offers the most compact representation, a very important criterion when many cells are to be implemented physically, such as on a CNN chip. The central ideas in defining the new type of CNN cells are:

(i) Replacement of the linear discriminant function w(σ) = σ in the standard CNN cell with piecewise-linear functions in either a canonical form [Sec. 5, Eq. (22)], or a new form called a multi-nested discriminant function [Sec. 6, Eq. (27)]. The rest of the cell structure [Eqs. (1) and (3) in Sec. 2] is kept unchanged. Both piecewise-linear discriminants have simple explicit formulas, admit recursive implementations, and are characterized by the addition of m bias parameters. The case m = 0 corresponds to the linear discriminant in either (22) or (27), while for m = 1, (22) and (27) are equivalent.

(ii) A new interpretation for the feed-forward template b, now called an orientation vector, and the introduction of a novel concept; viz., a projection tape (Sec. 3), which is a one-dimensional projection of all vertices defining a dichotomy¹⁰ via the linear transformation $\sigma = b u^T$. By employing these new concepts we were able to show that the distinction between a “linearly” and a “not linearly” separable Boolean function is not essential. In particular, we have shown in Secs. 3 and 4 that there is always a set of m parallel hyperplanes which can separate arbitrary Boolean dichotomies.

There are several differences between the two discriminant functions (22) and (27). While (22) leads to a simple design algorithm [Sec. 5, Eqs. (23)–(26)], and is guaranteed to have a valid representation for any Boolean function, it has a complexity (given by m) which appears to grow exponentially with the number n of inputs. On the other hand, (27) was introduced to reduce the implementation complexity, which is now at most linear in the number of inputs n. However, for arbitrary Boolean functions it leads to a more complicated gene realization, as shown in Sec. 6.5.
It is remarkable that for a certain class of Boolean functions with an arbitrary number of inputs, including totalistic and semitotalistic functions, both types of cells have simple and exact realization procedures, as shown in Secs. 4 and 6.3, respectively. Moreover, for cells with up to 4 inputs, the complete genomes were found using enumeration techniques, complemented for n = 4 by the Alopex algorithm, as shown in Secs. 6.4 and 6.5. Among the most remarkable results of our new approach based on the multi-nested formula (27) is the proof of a theorem showing that the parity function with an arbitrary number n of inputs admits a realization with a physical complexity of only $O(\log_2 n)$ (Sec. 6.3). This result outperforms the complexity reported for such realizations via digital techniques (based on combining elementary gates), which have a complexity bound of O(n). This result clearly demonstrates the potential of analog computation for the case of binary inputs and outputs, and anticipates that CNN cells with piecewise-linear discriminants of the type (22) or (27) can outperform another class of analog computing devices; viz., the multilayer perceptrons (MLP). For the latter, the parity functions are considered to be “hard” and the complexity of the MLPs implementing such functions is reported to grow at least as $O(n^\alpha)$. In fact, by using (22), the same parity function with n inputs was found to admit a discriminant function with a number of parameters m = O(n), which is less than that required by many other arbitrarily selected Boolean functions. Therefore, in the light of these results, we can conjecture that the parity function is far from being “among the most complex functions”, as is widely accepted. In fact, the novel types of cells described here offer, via their realization algorithms, a more natural way to characterize the complexity of a Boolean function, assuming that we can guarantee that our realization algorithm will lead to an optimal realization (i.e., $m = m^*$).

¹⁰ A (binary) dichotomy is the specification of the (two) classes of vertices in the hypercube representation of a Boolean function having the same sign attribute (or the same color in the graphical representation).
Unfortunately, the design methods described in this paper are guaranteed to provide optimal realizations only for a limited class of Boolean functions, which includes the special cases of totalistic and semitotalistic functions discussed above and, partially, the methods based on enumeration. A major goal for further research is therefore to develop better realization algorithms. However, we should stress that, as shown in Secs. 3 and 6.5, respectively, both types of cells can use evolutionary algorithms to evolve their gene realizations. Therefore, even with the present algorithms, it is just a matter of time to run such algorithms and let them evolve into other genes which may be very close to the optimal ones, if not optimal. Another important result, described in Sec. 4, concerns the simplest CNN cell for implementing the Boolean function “Life”. Its associated complexity is found to be $m^* = 1$, for either (22) or (27). If we define a complexity index $\omega = m^*/n$ for CNN cells defined by the multi-nested discriminant (27), we would obtain $\omega_{\mathrm{Life}} = 1/9$. It would be interesting to determine whether there is a relationship between this complexity measure, defined above for a local static Boolean function, and the “behavioral” complexity observed from a Generalized Cellular Automata [Chua, 1998] which connects uncoupled CNN cells via a spatiotemporal feedback loop.
Acknowledgments

This work is partially supported by the DOD Office of Naval Research under grant number N00014-981-0594.
References

Aizenberg, N. N., Aizenberg, I. N. & Krivosheev, G. A. [1995] “Multi-valued neurons: Learning, networks, application to image recognition and extrapolation of temporal series,” Lecture Notes in Computer Science 930, eds. Mira, J. & Sandoval, F., pp. 389–395.
Chua, L. O. & Kang, S. M. [1977] “Section-wise piecewise-linear functions: Canonical representation, properties, and applications,” Proc. IEEE 65(6), 915–929.
Chua, L. O. & Yang, L. [1988] “Cellular neural networks: Theory and applications,” IEEE Trans. Circuits Syst. 35, 1257–1290.
Chua, L. O. [1998] CNN: A Paradigm for Complexity (World Scientific, Singapore).
Chua, L. O. [1999] “Passivity and complexity,” IEEE Trans. Circuits Syst. I 46(1), 71–81.
Berlekamp, E., Conway, J. H. & Guy, R. K. [1982] Winning Ways for Your Mathematical Plays (Academic, NY), Vol. 2, Chapter 2, pp. 817–850.
Crounse, K. R., Fung, E. L. & Chua, L. O. [1997] “Efficient implementation of neighbourhood logic for cellular automata via the cellular neural network universal machine,” IEEE Trans. Circuits Syst. I 44(4), 355–361.
Dogaru, R. & Chua, L. O. [1998a] “Rectification neural networks: A novel adaptive architecture and its application for implementing the local logic of cellular neural networks,” U.C. Berkeley, Electronics Research Laboratory Memorandum No. UCB/ERL M98/4, 15 January 1998. Also available from ftp://fred.eecs.berkeley.edu/pub/radu/papers/rnn.ps
Dogaru, R. & Chua, L. O. [1998b] “Edge of chaos and local activity domain of FitzHugh–Nagumo Equation,” Int. J. Bifurcation and Chaos 8(2), 211–257.
Dogaru, R. & Chua, L. O. [1998c] “Edge of chaos and local activity domain of the Brusselator CNN,” Int. J. Bifurcation and Chaos 8(6), 1107–1130.
Dogaru, R. & Chua, L. O. [1998d] “Edge of chaos and local activity domain of the Gierer–Meinhardt CNN,” Int. J. Bifurcation and Chaos 8(12), 2321–2340.
Dogaru, R., Chua, L. O. & Crounse, K. [1998e] “Pyramidal cells: A novel class of adaptive coupling cells and their applications for cellular neural networks,” IEEE Trans. Circuits Syst. I 45(10), 1077–1090.
Dogaru, R. & Chua, L. O. [1998f] “CNN genes for one-dimensional cellular automata: A multi-nested piecewise-linear approach,” Int. J. Bifurcation and Chaos 8(10), 1987–2001.
Fielder, U. & Seitzer, D. [1979] “A high-speed 8 bit A/D converter based on a gray-code multiple folding circuit,” IEEE J. Solid-State Circuits 14(3), 547–551.
Güzelis, C. & Göknar, I. C. [1991] “A canonical representation for piecewise-affine maps and its application to circuit analysis,” IEEE Trans. Circuits Syst. I 38(11), 1342–1354.
Haring, D. R. [1966] “Multi-threshold threshold elements,” IEEE Trans. Electron. Comput. EC-15(1), 45–65.
Harth, E. & Pandya, A. S. [1988] “Dynamics of ALOPEX process: Application to optimization problems,” Biomathematics and Related Computational Problems, ed. Ricciardi, L. M. (Kluwer Academic Publishers), pp. 459–471.
Hassoun, M. H. [1995] Fundamentals of Artificial Neural Networks (MIT Press).
Kahlert, C. & Chua, L. O. [1992] “The complete canonical piecewise-linear representation. Part I: The geometry of the domain space,” IEEE Trans. Circuits Syst. I 39(3), 222–236.
Kennedy, M. P. [1995] “A nonlinear dynamics interpretation of algorithmic A/D conversion,” Int. J. Bifurcation and Chaos 5(3), 891–893.
Koza, J. R. [1994] Genetic Programming II (MIT Press, Cambridge, MA).
Langton, C. G. [1990] “Computation at the edge of chaos: Phase transitions and emergent computation,” Physica D42, 12–37.
Madan, R. N. [1993] Chua’s Circuit: A Paradigm for Chaos (World Scientific, Singapore).
Mange, D. & Tomassini, M. (eds.) [1998] Bio-Inspired Computing Machines (Press Polytechniques et Universitaires Romandes, Lausanne).
Nauta, B. & Venes, A. G. W. [1995] “A 70-MS/s 110-mW 8-b CMOS folding and interpolating A/D converter,” IEEE J. Solid-State Circuits 30(12), 1302–1308.
Nemes, L., Chua, L. O. & Roska, T. [1998] “Implementation of arbitrary Boolean functions on the CNN universal machine,” Int. J. Circuits Theor. Appl. 26(6), 593–610.
Ott, E. [1993] Chaos in Dynamical Systems (Cambridge University Press, UK).
Roska, T. & Chua, L. O. [1993] “The CNN universal machine: An analogic array computer,” IEEE Trans. Circuits Syst. II: Analog and Digital Signal Processing 40, 163–173.
Toffoli, T. & Margolus, N. [1987] Cellular Automata Machines (MIT Press, Cambridge, MA).
Wolfram, S. [1984] “Universality and complexity in cellular automata,” Physica D10, 1–35.