Fault Tolerant CNN Template Design and Optimization Based on Chip Measurements

Péter Földesy, László Kék, Tamás Roska, Ákos Zarándy and Guszti Bártfai
Analogical and Neural Computing Laboratory, Hungarian Academy of Sciences, H-1111, Kende u. 13-17., Budapest, Hungary. Phone: +36-1-209-5263, Fax: +36-1-209-5264

ABSTRACT: This paper proposes a generic method for finding non-propagating Cellular Neural Network (CNN) templates that can be implemented reliably on a given CNN Universal Machine chip. The method has two main components: (i) adaptive optimization of templates based on measurements of actual CNN chips; (ii) simplification and decomposition of Boolean operators into a sequence of simpler ones that work correctly and more robustly on a given chip. Examples are presented using two concrete stored-program CNNUM chips to demonstrate the effectiveness of the proposed method, whose advantages and limitations are also discussed.
1. Introduction

Cellular Neural Networks (CNNs) are arrays of locally and regularly interconnected neurons, or cells, whose global functionality is defined by a small number of parameters that specify the operation of the component cells as well as the connection weights between them [1]. They are particularly suitable for hardware implementation due to their local connectivity. A common phenomenon of analog VLSI circuits is parameter scattering, which causes them to behave differently from their intended (ideal) functionality. This is also true for implementations of the CNN Universal Machine chip [2, 4], where parameter deviations introduce errors into the output of various Boolean and analog operators. To compensate for these inaccuracies, several template optimization methods have been developed [e.g., 8, 9, 10, 11, 17], most of which are based on generating test patterns (or images) for the whole chip to be used during the optimization process. The main drawback of these approaches is that the mostly ad hoc (global) test patterns cover only a small fraction of the input space, which greatly reduces the chance of good generalization. Furthermore, the templates resulting from optimization may still not work reliably on a given chip. In this paper, we introduce a generic method for finding non-propagating template values that can be reliably implemented on a given CNN Universal Machine chip. The method has two main components: (i) adaptive optimization of templates based on measurements of actual CNN chips; (ii) simplification and decomposition of Boolean operators into a sequence of simpler ones that work correctly and more robustly on a given chip. The actual implementation of the method depends on the types of allowable operators and on the order in which the above two steps are applied. In the present implementation, the adaptive template optimization step is applied first.
If a single template still fails at some cells/pixels after optimization, it is decomposed into two or more robust templates using the decomposition method presented here. We use two stored-program CNN chips as examples to illustrate the method: one is a 22x20-cell chip with binary input-output [5], the other is a 14x14-cell chip with analog input-output [6]. In section 2, the adaptive template optimization method for both binary and analog input-output chips is introduced. Section 3 describes a method for decomposing complex templates into simpler ones that can be implemented reliably on CNN chips with binary inputs. Experimental examples are shown in section 4 to illustrate the effectiveness of the method. Finally, conclusions are drawn in section 5.
2. Adaptive template optimization

In real CNN VLSI chips, the actual template values stored (and used) at each cell differ from the ideal ones due to parameter scattering introduced during the fabrication process. As a result, some cells respond erroneously to certain inputs. Modifying the template elements will, in general, affect the number of erroneous cells (in the case of binary outputs) or the magnitude of the error at each cell (in the case of analog outputs). This leads naturally to the application of optimization methods that aim to minimize this error. The complexity and effectiveness of these methods depend crucially on (i) how well the chip is modeled and (ii) how the error is minimized. Here we propose a method that models a CNN chip with a single linear neuron (ADALINE), which is optimized via the LMS gradient descent method [13]. The main idea is to map the many output values of a CNN chip, in response to identical local inputs, into a single "cumulative" output (see figure 1) by calculating the normalized sum (or average) of the individual cell outputs. This way we obtain a model of the entire chip that is functionally equivalent to a single ADALINE [13].
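The chip-as-ADALINE idea can be sketched in a few lines of Python. The array size, the mismatch magnitudes and the erosion-like example template below are illustrative assumptions, not measurements from the chips discussed in this paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical chip: NC cells share the nominal template (B, z), but each
# cell's stored values are perturbed by fabrication mismatch.
NC = 400                                    # e.g. a 20x20 array
B_DEV = rng.normal(0.0, 0.05, (NC, 9))      # per-cell template deviations
Z_DEV = rng.normal(0.0, 0.05, NC)           # per-cell bias deviations

def ideal_output(u, B, z):
    """Steady-state output of an ideal non-propagating cell (eq. 1),
    assuming A00 = 1 so the feedback term vanishes."""
    return np.sign(B @ u + z)

def cumulative_output(u, B, z):
    """Average of the individual (noisy) cell outputs: the whole chip
    collapses into a single ADALINE-like node with output in [-1, 1]."""
    y = np.sign((B + B_DEV) @ u + (z + Z_DEV))
    return y.mean()

# Example: an erosion-like 3x3 control template applied to one local input
B = np.array([0, 1, 0, 1, 1, 1, 0, 1, 0], dtype=float)
z, u = -4.0, np.ones(9)                     # all-black neighbourhood
print(ideal_output(u, B, z), cumulative_output(u, B, z))
```

With small mismatch the cumulative output stays close to the ideal ±1 value; larger deviations pull it toward the interior of [-1, 1], exactly the gray output of figure 1.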
[Figure 1 schematic: ideal output, actual output (see footnote 1), Σ, cumulative output.]

Figure 1. This figure shows how the output of an entire array of cells can be mapped onto a single output value. An ideal CNN cell implementing a Boolean operator (on the left) produces only black (+1) or white (-1) output (for a given local input, not shown here), resulting in an output array of a single color (white here, in the middle). In practice, some cells of the array will produce erroneous output (black in this case), which causes the cumulative output to be non-binary (shown gray on the far right). The model generalizes to cells implementing analog operators as well.

More precisely, the ideal steady-state output of a non-propagating CNN cell is defined by the following algebraic equation [3]:

y_s = \operatorname{sgn}\Big[ (A_{0,0} - 1)\,x(0) + \sum_{(k,l) \in S_r(0,0)} B_{kl}\,u_{kl} + z \Big]    (1)
where x(0) is the initial state of the cell, u_{kl} are the input values (input vector u) and B_{kl} are the control template values (template vector B) in the S_r neighborhood of the cell, z is the bias and A_{0,0} is the self-feedback. If we assume that A_{0,0} and x(0) are constant, then (1) is equivalent to an ADALINE with threshold output. The individual cell outputs y_{ci} (i = 1, ..., N_c, where N_c is the total number of cells) will not always be equal to y_s due to the random variations of B_i and z_i. We now define the cumulative output of the cells, \bar{y}_c, as their average, i.e.,

\bar{y}_c \equiv \frac{1}{N_c} \sum_{i=1}^{N_c} y_{ci}.

Since the whole chip is controlled via a single (cell-independent, or space-invariant) B template and z bias, we can define the cumulative squared error of the chip for a given local input (which is identical at each cell), template and bias as

E_u(u, B, z) \equiv (y_s - \bar{y}_c)^2.

The average of E_u over a given training set defines the mean squared error (MSE) of the chip for a given template and bias as

E(B, z) \equiv \frac{1}{N_s} \sum_{u \in TS_u} E_u(u, B, z),

where N_s is the size of the training set TS_u. The template optimization task is now defined as the minimization of E by varying B and z. Since the exact form of the function E(B, z) for a given chip is generally not known, iterative optimization methods can be applied. We used the LMS training algorithm in the set-up shown in figure 2. The initial B and z are the ideal values, which are fed to a CNN simulator to produce the desired (target) output for each input u in the training set. These values are adjusted in each training iteration according to the LMS learning rule until the minimum of E is reached.
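The training loop just described can be sketched as follows. The per-cell mismatch values, the chip size and the demo training set are hypothetical stand-ins for a real chip; because of the sign nonlinearity, the delta rule here works on an approximation of the true error surface, as the paper notes.

```python
import numpy as np

rng = np.random.default_rng(1)

# Per-cell mismatch of a hypothetical 100-cell chip (illustrative values)
B_DEV = rng.normal(0.0, 0.1, (100, 9))
Z_DEV = rng.normal(0.0, 0.1, 100)

def chip(u, B, z):
    """Cumulative (mean) output of the mismatched cells for local input u."""
    return np.sign((B + B_DEV) @ u + (z + Z_DEV)).mean()

def target(u, B0, z0):
    """Desired output from an ideal simulator with the nominal template."""
    return np.sign(B0 @ u + z0)

def lms_train(B0, z0, train_set, mu=0.05, epochs=20):
    """LMS/delta-rule update: treat the chip as a single ADALINE and move
    (B, z) along the negative gradient of the squared cumulative error."""
    B, z = B0.copy(), z0
    for _ in range(epochs):
        for u in train_set:
            err = target(u, B0, z0) - chip(u, B, z)   # y_s - y_c
            B += mu * err * u                          # delta rule on weights
            z += mu * err                              # bias as weight on +1
    return B, z

# Demo: adapt an erosion-like template on two extreme local inputs
B0 = np.array([0., 1., 0., 1., 1., 1., 0., 1., 0.])
train = [np.ones(9), -np.ones(9)]
B1, z1 = lms_train(B0, -4.0, train)
```

In practice the training set would contain the measured local input patterns, and the trained (B1, z1) would be written back to the chip.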
Footnote 1: Note that the actual output of each cell in response to a given local input pattern cannot be calculated by the CNN chip in one run, since the local inputs overlap. Therefore, the output maps shown in this article were generated in several runs, with the same input pattern appearing at different cell locations.
[Figure 2 schematic: training set → input → CNN chip (B, z) → Σ → cumulative output, compared with the ideal output from simulation; the error drives the LMS update.]

Figure 2. Training set-up. The ideal output is produced by a CNN simulator and compared with the actual (cumulative) output of the chip. The error is used to modify the template and bias values.

Figure 3 shows the response of the analog CNNUM chip [6] to two random (analog) input vectors before and after training, which took only 4 iterations. As can be seen, some cells were still incorrect after training, which means that E did not reach zero. This raises the question of whether zero error can, in theory, be reached with this method.
[Figure 3 panels: input pattern 1 → actual output before and after training; input pattern 2 → actual output before and after training.]
Figure 3. This figure shows how individual cells (excluding the boundaries) responded to two input patterns before and after training using the "edge gray" template [18] on the analog input-output CNNUM chip [6]. The upper left pixels, which were added for clarity only, show the desired response to the corresponding input (black and white, respectively).

Statement: Given a CNN chip and its model ADALINE node (as defined above), the LMS training algorithm is guaranteed to find a global minimum of E for a given training set, which is not necessarily zero.

The truth of this statement can be seen by first noting that with the ADALINE linear chip model, the error surface E(B, z), as "seen" by the optimization method, is a paraboloid, which guarantees that the local minimum found by LMS is also global. The error will not be zero in general, however, because (i) the chip is controlled via a global template and bias only, and (ii) the ADALINE model gives only an approximation of the real E(B, z) error surface. In order to eliminate the error completely with this method, we would either need a more sophisticated (possibly nonlinear) model of the chip, or need to be able to control individual cells of the chip, or both. For Boolean operators, we propose an alternative solution, based on the systematic simplification and decomposition of templates. In this case, it is possible to generate all input combinations (for r = 1 the total number of input patterns is 2^9 = 512), which are used in the optimization process.
3. Fault Tolerant Template Design for Local Boolean Operators

Theoretically, every local Boolean operator (function) can be implemented using one or more CNN templates. When the logic function is linearly separable, it can be realized by a single template. Otherwise the function needs to be decomposed into a sequence of linearly separable functions (operators), i.e., CNN templates.

3.1 Fault Tolerant Templates

In practice, all the templates to be used must be fault tolerant. Suppose a binary input/output CNN chip is given. Let T be an arbitrary 3x3 linear template realizing a certain binary logic function F of 9 variables. T can be optimized in order to obtain a more robust template realizing F. T is said to be fault tolerant with regard to the given CNN chip if it gives the ideal result for all 2^9 = 512 inputs at every CNN cell, that is, if F is realized by T on each CNN cell. So far, on-chip experiments in the CNN literature have shown the correctness of a given template only for some inputs; this, however, does not prove the correctness of the template in general.
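The fault-tolerance criterion above is an exhaustive check over all 512 local input patterns. A minimal sketch of such a check, in simulation with an ideal thresholded template rather than a physical chip (the erosion-style function and the template values are our illustrative choices):

```python
import itertools
import numpy as np

def realizes(B, z, F):
    """Check that template (B, z) computes Boolean function F for every
    one of the 2**9 = 512 local input patterns (inputs coded +1/-1)."""
    for bits in itertools.product([-1.0, 1.0], repeat=9):
        u = np.array(bits)
        y = 1.0 if B @ u + z > 0 else -1.0
        if y != F(u):
            return False
    return True

# Example: erosion ("all 9 neighbours black") is linearly separable,
# so a single thresholded template realizes it exactly.
B = np.ones(9)
z = -8.0                                  # fires only when all nine inputs are +1
F = lambda u: 1.0 if np.all(u == 1.0) else -1.0
print(realizes(B, z, F))                  # prints True
```

On a real chip, the same loop would feed each of the 512 patterns to every cell and compare the measured outputs against F; a template is accepted only if no cell ever disagrees.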
Template optimization methods, including the one discussed in the previous section, may not always produce fault tolerant templates. In such cases, the original template can be replaced by a sequence of fault tolerant ones. We propose an approach to this "fault tolerant template sequence generation" problem.

3.2 Logic Function Minimization

Measurements made on the given CNN chip [5] using the CNN chip prototyping system [10] have shown that the robustness of a template grows as the number of its zero elements increases. Therefore, a possible route to fault tolerant template design is the generation of a sequence of robust templates, that is, templates having "enough" zero elements. The method rests on logic function minimization [15], which provides the disjunctive normal form (DNF) of the function. An obvious solution to the "fault tolerant template design" problem is to generate (one or more) fault tolerant templates for each prime implicant and to perform the logic relations according to the DNF. This, however, may result in a large number of templates, which may not be efficient in a real application. Our idea is to manipulate the prime implicants in order to decrease the number of templates needed, for instance by finding common terms in the set of prime implicants. A method of template generation for a logic term (prime implicant) is proposed here. Consider the logic expressions

L_1 = \prod_{i=1}^{9} u_i^*   and   L_2 = \sum_{i=1}^{9} u_i^*,

where \prod (\sum) denotes the logic AND (OR) operation on the literals u_i^*, respectively, and

u_i^* = 1, if u_i is not present in the expression L_1;
u_i^* = 0, if u_i is not present in the expression L_2;
u_i^* = u_i, if u_i^* is the uncomplemented form of u_i;
u_i^* = \bar{u}_i, if u_i^* is the complemented (negated) form of u_i.
It is obvious that the template structure to be used is the full 3x3 control template

B = [ b_1 b_2 b_3 ; b_4 b_5 b_6 ; b_7 b_8 b_9 ].

u_i^* = 1 (0) means that u_i is not present in the logic expression L_1 (L_2), respectively; hence the corresponding template element b_i must be equal to 0. Otherwise

b_i = +1, if u_i^* = u_i;   b_i = -1, if u_i^* = \bar{u}_i.

The bias value z is calculated as

z = 1 - s for L_1,   z = s - 1 for L_2,

where s is the number of literals present in the logic expression L_1 (L_2), respectively.

The following is an illustrative example using the LCP template [16]. The minimized form of the Boolean logic function realized by the LCP template is F(u) = u_4 u_5 u_6 u_8 (u_7 + u_9). The realization of this function by two fault tolerant templates is outlined below.

LCP TEMPLATE DECOMPOSITION:

[The original LCP template and the decomposed templates TEM1 and TEM2 were given as (A, B, z) value boxes; their numeric entries were lost in extraction.]

Result: LCP ⇔ TEM1 AND TEM2
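The element-and-bias rule above is mechanical, so it can be verified exhaustively. The following sketch (the 0-based variable indexing and the example terms are ours, not from the paper) builds (b, z) for a single AND or OR term and checks it against the term's truth table over all 512 inputs, with inputs coded +1 (black) / -1 (white):

```python
import itertools

def term_template(literals, kind):
    """Build (b, z) for a product term (kind='AND', i.e. L1) or a sum term
    (kind='OR', i.e. L2).  `literals` maps variable index 0..8 to True
    (uncomplemented u_i) or False (complemented u_i); absent variables get
    b_i = 0.  With s literals present, z = 1 - s for AND and s - 1 for OR."""
    b = [0] * 9
    for i, positive in literals.items():
        b[i] = 1 if positive else -1
    s = len(literals)
    z = (1 - s) if kind == 'AND' else (s - 1)
    return b, z

def evaluate(b, z, u):
    """Thresholded template output for local input u (coded +1/-1)."""
    return 1 if sum(bi * ui for bi, ui in zip(b, u)) + z > 0 else -1

def term_value(literals, kind, u):
    """Ground-truth value of the logic term itself."""
    vals = [(u[i] == 1) if positive else (u[i] == -1)
            for i, positive in literals.items()]
    ok = all(vals) if kind == 'AND' else any(vals)
    return 1 if ok else -1

# Exhaustive check over all 512 inputs for an example AND term u3 * u4 * not(u6)
lits = {3: True, 4: True, 6: False}
b, z = term_template(lits, 'AND')
assert all(evaluate(b, z, u) == term_value(lits, 'AND', u)
           for u in itertools.product([-1, 1], repeat=9))
```

The check passes because each matched literal contributes +1 and each mismatched literal -1 to the weighted sum, so the sum plus bias is exactly 1 when the term is satisfied and at most -1 otherwise.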
During our experiments we analyzed all the binary input/output CNN templates in [16]. It turned out that every template that needed decomposition could be replaced by two fault tolerant templates.
4. Two experimental examples

In this section, we demonstrate how the two components of the generic method (presented in sections 2 and 3) can be combined to design templates for Boolean operators that work reliably on a binary input CNNUM chip [5]. We use an erosion template and the above-mentioned LCP template. First we attempt to correct the errors of the chip via adaptive optimization (as described in section 2). The optimization of the erosion template was successful. Figure 4 illustrates the results of the optimization; the template values before and after training were the following:

[Template value tables, partially lost in extraction. Recoverable entries of the ideal template: zero feedback, control elements 0 and 1, bias -4. Recoverable entries of the optimized template: control elements such as 1.3, 1.23, 0.99, 0.85, 0.97 and 0.94, near-zero entries 0.02 and 0.05, and bias -5.95. Input patterns and the actual outputs before and after training appeared as image panels.]
Figure 4. This figure shows the cumulative output of the chip for the erosion template for two typical input patterns before and after training (i.e., using the original and optimized templates, respectively). Gray pixels represent "don't care" values.

For the LCP template, the error cannot be eliminated completely by optimization alone (see figure 5). Therefore, we first simplified and decomposed the Boolean function of the template (as described in section 3), which resulted in two simpler templates, both of which can be implemented accurately (after optimization) on the given chip. Figure 5 illustrates the results of template optimization before and after the decomposition step. The TEM1 and TEM2 templates (see section 3.2) were optimized to eliminate erroneous cells, which resulted in the following templates (denoted TEM1* and TEM2*):

[TEM1* and TEM2* value tables, partially lost in extraction. Surviving entries include 0.51, 1.37, 1.23, 1.39, -5.24, -1.44, 1.42, -0.05, -0.19, -0.07, 0.02, 0.05 and 0.94. Input patterns and the actual outputs before and after training appeared as image panels.]
Figure 5. The boxes in the upper row show the cumulative output of the chip for two input patterns using the LCP template before and after training. It shows clearly that the decomposition step was necessary as the optimization method could not eliminate the error completely for the original LCP template. The boxes in the lower row illustrate that the decomposed templates (TEM1*, TEM2*) yield an error-free result for two input patterns.
5. Conclusions

We have proposed a method for finding non-propagating template values that can be reliably implemented on a binary input-output CNN Universal Machine chip. It can be considered an implementation of a generic method that is not restricted to Boolean operators or to the particular template decomposition method discussed here. In our current method, an optimization step is applied first to attempt to eliminate all erroneous cells of a chip. If it fails, i.e., some cells still respond incorrectly, a template decomposition step is applied in order to replace the original template with a sequence of more robust ones. The optimization step is then reapplied to each of the resulting templates to eliminate all chip errors. We have demonstrated through simple examples that this method works correctly. We regard the proposed method as a first step towards a fully automated method for finding robust templates on any given CNN chip. In the future, we intend to explore this, as well as issues such as improved chip models (e.g., ones that take the statistical distribution of chip template values into account [7]), different training methods or variants (e.g., batch mode vs. on-line adaptation), the effect of self-feedback, and the introduction of space-variant parameters in template optimization.
6. Acknowledgments This research has been supported by the Grant No. N68171-97-C-9038 of the Office of Naval Research and by the Hungarian Academy of Sciences. The comments of Dr. Péter Szolgay are gratefully acknowledged.
7. References

[1] L. O. Chua and L. Yang, "Cellular neural networks: Theory and Applications", IEEE Transactions on Circuits and Systems, Vol. 35, pp. 1257-1290, 1988. [2] L. O. Chua and L. Yang, "The CNN Paradigm", IEEE Transactions on Circuits and Systems-I, vol. CAS-40, pp. 147-156, March 1993. [3] L. O. Chua and B. E. Shi, "Multiple Layer Cellular Neural Network: A Tutorial", in Algorithms and Parallel VLSI Architecture, Vol. A, edited by F. Deprette and A. Van der Veen, pp. 137-168, Elsevier, 1991. [4] T. Roska and L. O. Chua, "The CNN Universal Machine: An Analogic Array Computer", IEEE Transactions on Circuits and Systems-II, vol. 40, pp. 163-173, March 1993. [5] R. Dominguez-Castro, S. Espejo, A. Rodriguez-Vazquez, R. Carmona, "A CNN Universal Chip in CMOS Technology", Proc. of the third IEEE Int. Workshop on Cellular Neural Networks and their Applications (CNNA-94), pp. 91-96, Rome, 1994. [6] J. M. Cruz, L. O. Chua, and T. Roska, "A Fast, Complex and Efficient Test Implementation of the CNN Universal Machine", Proc. of the third IEEE Int. Workshop on Cellular Neural Networks and their Applications (CNNA-94), pp. 61-66, Rome, Dec. 1994. [7] M. Pelgrom et al., "Matching properties of MOS transistors", IEEE Journal of Solid-State Circuits, vol. 24, no. 5, pp. 1433-1439, 1989. [8] T. Kozek, T. Roska, L. O. Chua, "Genetic Algorithm for CNN Template Learning", IEEE Transactions on Circuits and Systems, vol. 40, no. 6, pp. 392-402, 1993. [9] T. Szirányi, M. Csapodi, "Texture Classification by CNN and Genetic Learning", Proc. of IEEE Int. Conf. Pattern Recognition, Jerusalem, ICPR'94, vol. III, pp. 381-383, 1994. [10] B. Chandler, Cs. Rekeczky, "CNN Template Optimization by Adaptive Simulated Annealing", Proc. of Int. Symposium on Nonlinear Theory and its Appl., Japan, NOLTA'96, pp. 445-448, 1996. [11] C. Güzeliş, S. Karamahmut, "Recurrent Perceptron Learning for Completely Stable Cellular Neural Networks", CNNA-94, pp. 177-182, Rome, 1994.
[12] CADETWin (CNN Applications Development Environment and Toolkit) and CCPS (CNN Chip Prototyping System), User's Guide, Analogical and Neural Computing Laboratory, Computer and Automation Institute, Hungarian Academy of Sciences (MTA-SzTAKI), Budapest, 1997. [13] B. Widrow and M. E. Hoff, "Adaptive Switching Circuits", Stanford Electron Laboratories, Stanford, CA, Technical Report 1553-1, June 30, 1960. [14] Á. Zarándy, J. Cruz, P. Szolgay, P. Földesy, L. O. Chua, and T. Roska, "Functional Measurements of the First Analog Input/Output CNN Universal Chip", DNS-4-1997, MTA SzTAKI. [15] K. H. Rosen, "Discrete Mathematics and Its Applications", Second Edition, McGraw-Hill, Inc., 1991. [16] "CNN Software Library (Templates and Algorithms) Version 7.2", edited by T. Roska, L. Kék, L. Nemes, Á. Zarándy, M. Brendel and P. Szolgay, Computer and Automation Institute of the Hungarian Academy of Sciences, Budapest, 1998. [17] R. Tetzlaff, R. Kunz, and G. Geis, "Analysis of Cellular Neural Networks with Parameter Deviations", IEEE Proceedings of the ECCTD'97, Budapest, 1997, pp. 650-654. [18] L. Chua and T. Roska, "Cellular Neural Networks: Foundations and Primer", Lecture Notes for the course EE129 at University of California at Berkeley, 1997.