UNIFYING RESULTS IN CNN THEORY USING DELTA OPERATOR

Martin Hänggi1, Hari C. Reddy2∗, and George S. Moschytz1

1 Signal and Information Processing Laboratory, Swiss Federal Institute of Technology, 8092 Zurich, Switzerland
[email protected]
2 Dept. of Electrical Engineering, California State University, Long Beach, CA 90840, USA
[email protected]
ABSTRACT

By means of the delta operator, a new type of CNN, the (δ, c)-CNN, is introduced. It is a superclass of continuous-time (CT) and discrete-time (DT) CNNs with any saturation, high-gain, or hard-limiting sign nonlinearity. It is shown that the (δ, c)-CNN allows continuous transition between different types of nonlinearities and between CT- and DT-CNNs, providing a unifying framework for CNN theory. In particular, the problem of optimally robust template design can be dealt with in a unified manner.

1. INTRODUCTION
Continuous-time cellular neural networks (CT-CNNs) were introduced in [1]. In this paper, we consider two-dimensional and spatially invariant networks with a single layer. Their dynamics is governed by a system of n = MN differential equations,

$$C \frac{d x_i(t)}{dt} = -\frac{1}{R} x_i(t) + \sum_{k \in N_i} \bigl( a_k f_c(x_k(t)) + b_k u_k \bigr) + I, \qquad (1)$$

where $N_i$ denotes the neighborhood of the cell $C_i$. The CNN is characterized by the template set A and B containing $a_k$ and $b_k$, respectively, and a bias I. $f_c(\cdot)$ is the piecewise linear saturation function (Fig. 1)

$$f_c(x) = \frac{1}{2c} \bigl( |x + c| - |x - c| \bigr), \qquad 0 \le c \le 1. \qquad (2)$$

Figure 1: The nonlinear output function $f_c(\cdot)$

Note that $f_c(x)$ is linear for |x| < c with a slope of 1/c and that

$$\lim_{c \to 0} f_c(x) = \operatorname{sgn}(x). \qquad (3)$$

This nonlinearity is a generalization of the sat(·) ≡ $f_1(\cdot)$ function in [1]. The boundary condition for both state and input is assumed to be −1 throughout the paper.

For the implementation of CNNs as sampled-data systems [2], discrete-time CNNs (DT-CNNs) were proposed in [3]. Their basic dynamic equation is

$$x_i[k+1] = \sum_{m \in N_i} \bigl( a_m f_c(x_m[k]) + b_m u_m \bigr) + I. \qquad (4)$$

In the original definition, the DT-CNN was restricted to the sgn(·) ≡ $f_0(\cdot)$ nonlinearity. This paper aims at providing a unifying framework for CT- and DT-CNNs with different types of nonlinearities. In Sec. 2, we introduce the (δ, c)-CNN as a superset of the classes of CT- and DT-CNNs with sat(·) and sgn(·) output functions. Sec. 3 investigates robustness and template design issues, and Sec. 4 concludes the paper.

∗This work was carried out while Prof. H.C. Reddy was a visiting professor at the Institute for Integrated Systems (IIS) at ETHZ as a guest of Prof. Qiuting Huang.

V-547 0-7803-5474-5/99/$10.00(C)1999 IEEE

2. THE (δ, c)-CNN

The delta operator, also called forward Euler operator, is defined on the set of causal sequences by [4]

$$\delta x[k] = \frac{x[k+1] - x[k]}{T}, \qquad k \in \mathbb{Z}. \qquad (5)$$

Regarding x[k] = x(kT) as a discrete-time sequence, sampled from a continuous-time signal x(t), the delta operator is an approximation of its derivative. In the limiting case of T → 0, we obtain

$$\lim_{T \to 0} \delta x[k] = \dot{x}(t) \big|_{t = kT}. \qquad (6)$$

The dynamic equation of a delta-operator based discrete-time CNN is defined by [5, 6]

$$C \, \delta x_{ij}[k] = -\frac{1}{R} x_{ij}[k] + A * f_c(x_{ij}[k]) + B * u_{ij} + I, \qquad (7)$$
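As a numerical illustration of (5) and (6), the following minimal Python sketch applies the delta operator to samples of x(t) = sin t and compares the result with the exact derivative cos t. The helper names `delta` and `max_error` and the choice of test signal are assumptions made for illustration only; they do not appear in the cited literature.

```python
import math


def delta(x, T):
    """Delta (forward Euler) operator (5) applied to a sequence x[k] = x(kT)."""
    return [(x[k + 1] - x[k]) / T for k in range(len(x) - 1)]


def max_error(T, n=50):
    """Largest gap between delta x[k] and the exact derivative of sin(t) at t = kT."""
    x = [math.sin(k * T) for k in range(n + 1)]
    dx = delta(x, T)
    return max(abs(dx[k] - math.cos(k * T)) for k in range(n))


# The deviation from the derivative shrinks as the sampling period T decreases.
for T in (0.1, 0.01, 0.001):
    print(T, max_error(T))
```

For this test signal the printed deviation decreases with T, which is the behaviour expressed by the limit (6).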
Figure 2: Block diagram of the δ-CNN

In (7), ∗ denotes the spatial convolution operator. By expanding (7), we obtain

$$x_{ij}[k+1] = \Bigl( 1 - \frac{T}{\tau} \Bigr) x_{ij}[k] + \frac{T}{C} \bigl( A * f_c(x_{ij}[k]) + B * u_{ij} + I \bigr) \qquad (8)$$
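A single update of (8) can be sketched in a few lines of Python. To keep the example short, it assumes a single row of cells with 1×3 templates and the boundary value −1 used throughout the paper; the function names `f` and `step`, the argument order, and the sample values are illustrative assumptions. The time scale is taken as the product RC.

```python
def f(x, c):
    """Output function f_c of (2); c = 0 is taken as the sign function, cf. (3)."""
    if c == 0:
        return float((x > 0) - (x < 0))
    return (abs(x + c) - abs(x - c)) / (2 * c)


def step(x, u, A, B, I, T, c, R=1.0, C=1.0):
    """One update of (8) for a single row of cells and 1x3 templates A, B.

    Template entry j weights the neighbour at offset j - 1, as in the sum of (1).
    Cells beyond the row ends are fixed at -1 (state and input).
    """
    y = [-1.0] + [f(v, c) for v in x] + [-1.0]   # outputs, padded with f_c(-1) = -1
    up = [-1.0] + list(u) + [-1.0]               # inputs, padded with -1
    tau = R * C
    new = []
    for i in range(len(x)):
        w = sum(A[j] * y[i + j] + B[j] * up[i + j] for j in range(3)) + I
        new.append((1 - T / tau) * x[i] + (T / C) * w)
    return new


x0, u = [0.5, -0.5], [1.0, -1.0]
A, B, I = [0.0, 2.0, 0.0], [0.0, 1.0, 0.0], 0.0
print(step(x0, u, A, B, I, T=1.0, c=1.0))   # T = RC: old state drops out of the update
print(step(x0, u, A, B, I, T=0.5, c=1.0))   # intermediate sampling period
```

With T equal to RC the factor in front of the old state vanishes, so the update has the form of (4) scaled by R; for small T the same function performs a forward Euler step of (1).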
Here τ = RC is the internal time scale of the analog system. By inspection of (8), we can see that CT-CNNs and DT-CNNs are the limiting cases of δ-CNNs, as T → 0 and T → τ, respectively. In the block diagram (Fig. 2), the continuous-time integrator ∫ and the shift operator q ◦−• $z^{-1}$, respectively, are replaced by the delta inverse operator $\delta^{-1}$. Hence, by an appropriate choice of T and c, the (δ, c)-CNN qualifies as a CT- or DT-CNN with any of the nonlinearities sat(·) or sgn(·). In the following, for the sake of simplicity but with no loss in generality, we normalize both R and C to unity.

Since the edges in the (T, c)-plane (Fig. 3) are reached by a continuous change of T and c, any value 0 ≤ T ≤ 1, 0 ≤ c ≤ 1 is possible, which is particularly interesting for two reasons. On the one hand, high-gain CNNs (c < 1) [7] and numerically forward Euler integrated CT-CNNs [8] are covered as well. On the other hand, it can be expected that theoretical results for (δ, c)-CNNs are applicable to CT- and DT-CNNs and vice versa.

Figure 3: The (T, c) plane, with the SGN-CTCNN at (T, c) = (0, 0), the SGN-DTCNN at (1, 0), the SAT-CTCNN at (0, 1), and the SAT-DTCNN at (1, 1)

For our considerations in the next section, we restrict ourselves to CNN operations that are feasible by all (δ, c)-CNNs. This excludes tasks with equilibria $|y^*| = |f_c(x^*)| < 1$ (gray-scale output) and tasks whose equilibria depend on mutual dynamical influence of neighboring cells, such as
the Laplacian operator [9]. Most of the CNN tasks, namely the so-called locally regular ones [8, 10], including propagating operations such as shadowing, connected component detection, hole filling, and global connectivity detection, however, can be done by all (δ, c)-CNNs. This implies that the equilibria of (δ, c)-CNNs do not depend on T and c. Solving (8) for its equilibrium yields

$$x_{ij}^* = R \cdot \bigl( A * f_c(x_{ij}^*) + B * u_{ij} + I \bigr), \qquad (9)$$

where, in fact, T is cancelled out, and c does not matter due to the assumption $|f_c(x^*)| = 1$. The stability is not influenced by T or c, either. Stability proofs do not rely on a particular choice of nonlinearity but include the whole class of odd, bounded, and monotonic functions, which clearly includes $f_c(\cdot)$. For locally regular templates and (δ, c)-CNNs with 0 ≤ T ≤ 1, stability is proved in [8]. Furthermore, local regularity implies bipolar input and initial state; this is a necessary restriction for our discussion of robustness issues in the next section.

3. ROBUSTNESS AND TEMPLATE DESIGN

The robustness of a CNN template set is a measure which quantifies the degree by which a template set can be altered while still producing the desired output. In programs for CNN VLSI chips, it is crucial that all templates have a certain degree of robustness, since their values cannot be guaranteed to be reproduced exactly by the analog circuit, but suffer from perturbations of typically 5-10%. The relative robustness of a CNN template set is defined to be the maximal relative simultaneous perturbation the template can bear before the CNN ceases to operate correctly.

Definition 1 The relative robustness D(·) of a template set $\mathcal{T}$ is

$$D(\mathcal{T}) = D(p) = \max_{\alpha} \bigl\{ \alpha \;\big|\; y_\infty\bigl( p \circ (\mathbf{1} + \alpha \mathbf{1}^{\pm}) \bigr) = y_\infty(p) \;\; \forall\, \mathbf{1}^{\pm} \in \{-1, 1\}^m \bigr\}.$$

p is the vector of all m non-zero entries in the template set $\mathcal{T}$, and ◦ denotes componentwise vector multiplication. The center element of the A-template is denoted by $a_c$.
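The quantifier in Definition 1 ranges over all $2^m$ sign patterns. The following Python sketch enumerates the perturbed template vectors and searches for the largest admissible α by bisection. It is only a sketch under stated assumptions: `same_output` is a hypothetical user-supplied predicate standing in for the comparison of the final outputs, the search interval [0, 1] is assumed, and correctness is assumed to be monotone in α. The toy predicate at the end is not a CNN task; it merely makes the example runnable.

```python
from itertools import product


def perturbed(p, alpha):
    """All 2^m vectors p o (1 + alpha * s) with sign vectors s in {-1, 1}^m."""
    for s in product((-1, 1), repeat=len(p)):
        yield [v * (1 + alpha * si) for v, si in zip(p, s)]


def relative_robustness(p, same_output, tol=1e-6):
    """Bisection on alpha in [0, 1]; assumes correctness is monotone in alpha."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if all(same_output(q) for q in perturbed(p, mid)):
            lo = mid   # every perturbed template still works: alpha can grow
        else:
            hi = mid   # at least one sign pattern fails: alpha is too large
    return lo


def toy_rule(q):
    """Hypothetical stand-in for 'the CNN still produces the desired output'."""
    return sum(q) > 0


print(relative_robustness([2.0, 1.0, -1.0], toy_rule))
```

For the toy predicate the worst-case sign pattern gives 2 − 4α > 0, so the search settles just below α = 0.5. In an actual study, `same_output` would have to simulate the network and compare its settled output with the unperturbed one.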
The slope of the dynamic route for |x| < c is $-1 + a_c/c$; $a_c > c$ therefore guarantees that all stable equilibria lie in the saturated region $|x^*| > c$, resulting in a bipolar output ($|f_c(x)| = 1$ ∀ |x| > c). Fig. 4 shows two dynamic routes with a single equilibrium point, while in Fig. 5, a dynamic route with two stable equilibria is depicted. Note that for all stable equilibria $|x^*| > c$.

Locally regular tasks are fully characterized by a set of local rules specifying the sign of a cell's state derivative for different configurations of neighboring cells. If a perturbed template set leads to a change in the derivative, then (at least) one local rule is violated. This violation is critical if it results in a wrong output at equilibrium.
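The dynamic route described above can be sketched for a single cell as follows. The sketch assumes R = C = 1 and lumps the contributions of neighbors, inputs, and bias into one constant `w`; that simplification, like the function names, is an assumption made for illustration and is not taken from the paper.

```python
def f(x, c):
    """Output function f_c of (2); c = 0 is taken as the sign function, cf. (3)."""
    if c == 0:
        return float((x > 0) - (x < 0))
    return (abs(x + c) - abs(x - c)) / (2 * c)


def dynamic_route(x, a_c, w, c):
    """delta x as a function of x for one cell (R = C = 1), with constant offset w."""
    return -x + a_c * f(x, c) + w


def stable_equilibria(a_c, w, c):
    """Stable equilibria for a_c > c > 0; they can only lie on the saturated branches."""
    eq = []
    if a_c + w > c:       # branch x > c: f_c = +1, delta x = -x + a_c + w
        eq.append(a_c + w)
    if -a_c + w < -c:     # branch x < -c: f_c = -1, delta x = -x - a_c + w
        eq.append(-a_c + w)
    return eq


a_c, c = 2.0, 0.5
# Slope in the linear region, estimated from two points with |x| < c.
print((dynamic_route(0.25, a_c, 0.0, c) - dynamic_route(0.0, a_c, 0.0, c)) / 0.25)
print(stable_equilibria(a_c, 0.0, c))   # no offset: one equilibrium on each branch
print(stable_equilibria(a_c, 2.0, c))   # large offset: a single equilibrium remains
```

The first line reproduces the slope −1 + a_c/c of the linear region; the other two correspond to the situations of Fig. 5 (two stable equilibria) and Fig. 4 (a single equilibrium), respectively.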
Figure 4: Phase plot with two dynamic routes, $a_c > c$.

Figure 5: Phase plot with two stable equilibria, $a_c > c$.

Definition 2 A critical point in the trajectory of a cell is a point $x_c[k_c]$, where the smallest absolute perturbation in δx[k] is sufficient to cause a wrong output at equilibrium.

Lemma 1 The set of critical points $\mathcal{X}_c$ is

$$\mathcal{X}_c := \bigl\{ x[k] \;\big|\; |x[k]| = c \bigr\}.$$

Proof: A wrong output occurs if and only if the perturbation creates a new equilibrium point in any of the saturation regions, or if an existing equilibrium is eliminated. Suppose that the dynamic route of a cell suffers from a translation parallel to the δx axis, caused by perturbations in the template set. Since the dynamic route in the linear region |x| < c is a straight line with positive slope in the phase plot, equilibrium points are created or eliminated exactly when one of the points $\delta x|_{|x|=c}$ toggles its sign. Any horizontal translation is not more critical than this vertical shift, due to $|x^* - c| > \delta x|_{|x|=c}$.

Lemma 1 can be verified in Figs. 4 and 5, where the safety margin $\delta x|_{|x|=c}$ is marked. Note that bipolar initialization is assumed.

A single local rule for a CNN task may be formulated as

$$-x + k^t p > 0 \quad \text{or} \quad -x + k^t p < 0, \qquad (10)$$

depending on the desired sign for δx. k is a vector of the same dimension as the template vector p and represents the (bipolar) configuration of the neighboring cells. If (10) is satisfied at the critical points, then it is guaranteed that the correct operation is performed. Taking $a_c$ out of p and noting that $|f_c(x_c)| = 1$, we obtain

$$(-c + a_c) f_c(x_c) + \hat{k}^t \hat{p} \gtrless 0 \qquad (11)$$

for reduced coefficient and template vectors $\hat{k}$ and $\hat{p}$, respectively. Defining $\tilde{a}_c := a_c - c$ and introducing a template vector $\tilde{p}$ which includes $\tilde{a}_c$ instead of $a_c$, we may write (11) as

$$\tilde{k}^t \tilde{p} > 0 \qquad (12)$$

by appropriately choosing the signs of the coefficients in $\tilde{k}$. With q as the number of local rules, we define a coefficient matrix $K \in \{-1, 1\}^{q \times m}$ to be

$$K = \begin{bmatrix} \tilde{k}_1^t \operatorname{sgn}(\tilde{k}_1^t \tilde{p}) \\ \tilde{k}_2^t \operatorname{sgn}(\tilde{k}_2^t \tilde{p}) \\ \vdots \\ \tilde{k}_q^t \operatorname{sgn}(\tilde{k}_q^t \tilde{p}) \end{bmatrix}. \qquad (13)$$

By means of K, the characterization of a CNN task simplifies to the following linear and homogeneous system of inequalities

$$(K \tilde{p})_i > 0 \quad \forall\, 1 \le i \le q, \qquad (14)$$

which is solved for the optimally robust template vector by [10, 11]

$$\tilde{p}_{\mathrm{opt}}(\gamma) = (K^t K)^{-1} K^t \gamma \mathbf{1} \qquad (15)$$

with $\mathbf{1} = [1, 1, \ldots, 1]^t$. γ is the minimum safety margin of the template set, and

$$D\bigl(\tilde{p}_{\mathrm{opt}}(\gamma)\bigr) = \frac{\gamma}{\|\tilde{p}_{\mathrm{opt}}(\gamma)\|_1 + c} = \frac{1}{\|\tilde{p}_{\mathrm{opt}}(1)\|_1 + \frac{c}{\gamma}} \qquad (16)$$

is its relative robustness. To obtain the optimum template for any value of c ≤ 1, we have to add c to the value for $\tilde{a}_c$ from (15). The safety margin does not depend on c. In the case of c = 0 (SGN-CNN), (15) directly yields the optimum template, and the relative robustness is larger than for any c > 0 due to the smaller denominator in (16). If we have an optimum template for any c, we can easily determine the optimum template for any other c′ simply by adding c′ − c to $a_c$. All other template entries remain unchanged. We are now ready to state the following theorem which is already proved by the above deduction.

Theorem 1 (Robust Template Design for (δ, c)-CNNs.) If, for any (δ, c)-CNN,

$$A = \begin{bmatrix} a_1 & a_2 & a_3 \\ a_4 & a_c & a_6 \\ a_7 & a_8 & a_9 \end{bmatrix}; \quad B; \quad I$$
is an optimally robust template set, then the optimally robust template set for a (δ, c′)-CNN is

$$A = \begin{bmatrix} a_1 & a_2 & a_3 \\ a_4 & a_c - c + c' & a_6 \\ a_7 & a_8 & a_9 \end{bmatrix}; \quad B; \quad I,$$

independent of the value of T.

The robustness given by (16) is monotonically decreasing with increasing c. To directly design an optimum template vector, $\tilde{p}_{\mathrm{opt}}(\gamma)$ has to be calculated (see (15)) yielding the solution for c = 0. Simple addition of c to $a_c$ gives the solution for any c ≤ 1.

Example: Connected component detection. For a SAT-CTCNN, the connected component detector

$$A = [\gamma \;\; \gamma + 1 \;\; -\gamma]; \quad B = [0]; \quad I = 0$$

is optimally robust with a safety margin of γ and a robustness of γ/(3γ + 1) [10]. The very same template set is optimal for any (δ, c)-CNN with c = 1, including SAT-DTCNNs. For a (δ, c)-CNN with sgn-nonlinearity,

$$A = [\gamma \;\; \gamma \;\; -\gamma]; \quad B = [0]; \quad I = 0$$

is optimal with 33% robustness, irrespective of γ. In general,

$$A = [\gamma \;\; \gamma + c \;\; -\gamma]; \quad B = [0]; \quad I = 0$$

is the optimally robust connected component detector.

4. CONCLUSIONS

(δ, c)-CNNs were introduced as a new class of CNNs. They are based on the delta operator which allows a continuous transition between continuous-time and discrete-time systems. CT- and DT-CNNs appear as limiting cases for T → 0 and T → 1, respectively. The output nonlinearity is defined by the parameter c denoting the width of the linear region. Four important standard types of CNNs are located as the four corners in the (T, c) plane, bounded by 0 ≤ T, c ≤ 1. Hence, (δ, c)-CNNs comprise all these standard CNNs and allow the development of a unified theory. While stability issues do not depend on T or c, the robustness of a template set decreases slightly with increasing c. Restricting ourselves to the important class of locally regular templates, the analysis of the dynamic route shows that results for robust template design can be generalized in a very direct fashion for (δ, c)-CNNs. Surprisingly, c has simply to be added to the center element of the A-template in order to obtain the optimally robust template for any c. All other template parameters remain unchanged. Further investigations of (δ, c)-CNNs will include other aspects of CNN theory such as the settling time.
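The center-element rule and the dependence of the robustness on c summarized above can be illustrated with the connected component detector of Sec. 3. The Python sketch below is a minimal illustration, not the design procedure of [10, 11]: the function names are assumptions, the 1×3 template layout follows the example in Sec. 3, and the vector passed as `p_tilde` is the c = 0 solution quoted there.

```python
def shift_center(A, c_old, c_new):
    """Theorem 1 for a 1x3 A-template: only the center element is changed."""
    left, center, right = A
    return [left, center - c_old + c_new, right]


def robustness(p_tilde, c, gamma):
    """Relative robustness (16): gamma / (||p_tilde||_1 + c)."""
    return gamma / (sum(abs(v) for v in p_tilde) + c)


gamma = 1.0
p_tilde = [gamma, gamma, -gamma]   # connected component detector for c = 0 (B = 0, I = 0)

for c in (0.0, 0.5, 1.0):
    A = shift_center(p_tilde, 0.0, c)
    print(c, A, robustness(p_tilde, c, gamma))
```

For c = 1 this reproduces the robustness γ/(3γ + 1) of the SAT-CNN template, for c = 0 the value 1/3 of the SGN-CNN template, and the values in between decrease monotonically with c.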
5. REFERENCES

[1] Leon O. Chua and Lin Yang, “Cellular Neural Networks: Theory,” IEEE Transactions on Circuits and Systems–I, vol. 35, no. 10, pp. 1257–1272, Oct. 1988.
[2] Hubert Harrer and Josef A. Nossek, “An analog implementation of discrete-time CNNs,” IEEE Transactions on Neural Networks, vol. 3, 1992.
[3] Hubert Harrer and Josef A. Nossek, “Discrete-time cellular neural networks,” International Journal of Circuit Theory and Applications, vol. 20, Sept. 1992.
[4] Graham C. Goodwin, Richard H. Middleton, and H. Vincent Poor, “High-speed Digital Signal Processing and Control,” Proceedings of the IEEE, vol. 80, no. 2, pp. 240–259, Feb. 1992.
[5] Hari C. Reddy and George S. Moschytz, “Unified Cellular Neural Network Cell Dynamical Equation Using Delta Operator,” in IEEE International Symposium On Circuits And Systems, Hong Kong, June 1997, vol. 1, pp. 577–580.
[6] Bahram Mirzai, Hari C. Reddy, and George S. Moschytz, “Robustness of Delta Operator Based Cellular Neural Networks,” in IEEE International Symposium On Circuits And Systems, Hong Kong, June 1997, vol. 1, pp. 761–764.
[7] Ari Paasio, Adam Dawidziuk, and Veikko Porra, “VLSI implementation of cellular neural network universal machine,” in IEEE International Symposium On Circuits And Systems, Hong Kong, June 1997, vol. 1, pp. 545–548.
[8] Martin Hänggi, Hari C. Reddy, and George S. Moschytz, “The CNN Sampling Theorem,” IEEE Transactions on Circuits and Systems–I, 1998, submitted for publication.
[9] Leon O. Chua and Lin Yang, “Cellular Neural Networks: Applications,” IEEE Transactions on Circuits and Systems–I, vol. 35, no. 10, pp. 1273–1290, Oct. 1988.
[10] Martin Hänggi and George S. Moschytz, “An Exact and Direct Analytical Method for the Design of Optimally Robust CNN Templates,” IEEE Transactions on Circuits and Systems–I, vol. 46, no. 2, Feb. 1999.
[11] Martin Hänggi and George S. Moschytz, “Making CNN Templates Optimally Robust,” in International Symposium on Nonlinear Theory and its Applications, Crans-Montana, Switzerland, Sept. 1998, vol. 3, pp. 935–938.