Automatic Chip-Specific CNN Template Optimization using Adaptive Simulated Annealing

Samuel Xavier de Souza, Müştak E. Yalçın, Johan A.K. Suykens, Joos Vandewalle∗

Abstract: This paper proposes a method for automatically tuning cellular neural network (CNN) templates for a given CNN Universal Machine (CNN-UM) chip so that the chip responds in the same fashion as a simulator, i.e. to minimize or even eliminate the erroneous behavior of actual CNN-UM chips. The approach uses measurements of the actual chip as part of the cost function of the adaptive simulated annealing (ASA) algorithm to find an optimal template given an initial approximation, e.g. a template used for a simulator. The tuned templates are therefore customized versions that are expected to be much less sensitive to imperfections in the manufacturing process and to other causes of erroneous behavior of CNN-UM chips. Results are presented for the binary and gray scale input cases. The automatic tuning was able to find better templates for all considered tasks. It is expected that, once mature, this technique will give CNN-UM chips enough reliability to compete with digital systems in terms of robustness, in addition to their advantages in speed.
1 INTRODUCTION
A Cellular Neural Network (CNN) [1] is a locally interconnected analog processor array that is very suitable for image processing because of its arrangement in a regular two-dimensional grid. A few years after its invention, image processing applications could be executed thanks to another invention: a programmable CNN, the so-called CNN Universal Machine (CNN-UM) [2]. The space-invariant local interconnections characteristic of CNNs yield a small set of free parameters called templates, which uniquely determine the array behavior. When the CNN and the CNN-UM were invented, in 1988 and 1992 respectively, templates were designed for ideal structures simulated on digital computers. Several design methods were developed for these simulators [3][4][5]. The CNN Software Library [6] contains a collection of designed templates.

Nowadays, with analog VLSI technology [7], considerably larger CNN-UMs can be implemented on a single chip [8]. These chips can perform image processing tasks with extremely high throughput, in the order of tera-operations per second [9]. This performance makes CNN chips very suitable for a wide range of image processing tasks, especially real-time applications [10]. Nevertheless, the chip parameters are slightly different from the ideal ones. This happens mainly because of noise in the electrical components of a cell as well as imperfections in the fabrication process, resulting in erroneous behavior of cells for some tasks. When the early templates designed for the simulators were first tested on VLSI chip implementations, many of them were found to work incorrectly [11]. Consequently, new chip-independent robust template design methods were developed [12][13][14][15] for the purpose of generating templates that are more tolerant of parameter deviations and noise. However, even the most robust templates do not guarantee fully correct behavior of a given CNN chip [11].

This paper describes a chip-specific approach that automatically tunes CNN templates in order to correct erroneous behavior of CNN-UM chips and make them respond in the same way as a simulator. The approach uses measurements of actual CNN-UM chips, in particular the ACE4k chip [8]. The measurements are used as part of the cost function of the adaptive simulated annealing (ASA) algorithm [17][18] to find an optimal template given an initial approximation, e.g. the most robust designed template. The method also searches for the best template among the range of templates found to be optimal in the experiments. The final templates are therefore customized versions that are presumed to be insensitive, or at least less sensitive, to imperfections in the manufacturing process and to other causes of erroneous behavior of CNN-UM chips.

This paper is organized as follows. The next section gives more insight into the erroneous behavior of CNN chips. Section 3 explains in more detail the process of tuning the templates using the ASA algorithm. Section 4 is dedicated to the results of the performed experiments.

∗ K.U.Leuven, ESAT-SCD-SISTA, Kasteelpark Arenberg 10, B-3001 Heverlee (Leuven), Belgium, e-mail: {samuel.xavierdesouza,mustak.yalcin,johan.suykens,joos.vandewalle}@esat.kuleuven.ac.be, tel.: +32-16321709, fax: +32-16-321970.

2 ERRONEOUS CHIP BEHAVIOR
The erroneous behavior observed in VLSI implementations of the CNN-UM is considered to be the result of a combination of causes: noise in the electrical components of cells, and parameter variations introduced during the fabrication process. Analog VLSI implementation can only guarantee rough accuracy with respect to the ideal parameter values and, additionally, template parameters have a discrete range of implementable values [7]. These so far unavoidable and undesirable features make CNN-UM chips lose reliability for certain template operations. Some templates developed for use in ideal CNNs, like AVERTRSH [6], can even produce different results on a chip for runs with the same input and initial conditions (Figure 1). Such an observation leads to the conclusion that CNN-UM chips are also susceptible to errors due to post-manufacturing interference, such as noise and temperature, in addition to errors caused by VLSI manufacturing imperfections.
Figure 1: Erroneous behavior of the ACE4k CNN-UM chip: a) input; b), c), and d) different outputs.
3 TEMPLATE TUNING
The approach described in this paper adjusts the template parameters of a given operation in order to compensate for the inherent parameter deviations of a given chip, so that the resulting template allows a more reliable operation. The adjustment, or tuning, of templates is performed by a search around the parameter values of the initial template. Adaptive Simulated Annealing [17][18] was chosen for this search owing to its speed and robustness as a method for finding a global optimum in complex non-linear problems with multiple local optima. Section 3.1 details the task of tuning templates for a given chip.

For template operations with binary output, the discrete nature of the results combined with the continuous nature of the template values makes it possible to find several templates that are all considered optimal. The method described here also searches for the best of these optima and consequently introduces the concept of chip-specific robustness, meaning that the chosen template is expected to be the most robust among those considered optimal for the given chip. Section 3.2 explains the concept.

Another chip-specific effort was carried out in [16]. That method was called template optimization, instead of template design or learning, and also aims to minimize the errors of a given CNN chip by modifying template values using measurements of the real chip. It combines gradient descent optimization with decomposition of ideal CNN templates. Furthermore, it only considers the class of uncoupled CNN operations, whereas the approach presented here is not restricted to this class of templates, owing to the use of a search method that does not depend on gradient information of the error. The loss in speed caused by the use of a global optimization method can be neglected because the tuning only needs to be done once for each chip and for each type of template.

3.1 Template Optimization - Gross Tune
The goal of chip-specific template optimization methods is to find, for a given chip, modified template values whose result matches the result of a given initial template simulated on ideal CNN-UM models. For a fully correct optimization of CNN templates, a very important step is the choice of the training set, which is composed of a set of triplets $\theta$ containing the input $u$, the initial state $x$, and the desired output $y^d$. In [16] the importance of this step is discussed and a method to compose the training set is proposed.

The cost function chosen for the ASA algorithm is a normalized version of the cost function used in [3] for learning purposes. Equation (1) shows this cost function, where $p$ denotes the parameter vector, i.e. the probing template, $\theta$ is the current training triplet, $k$ is the number of cells, $y_i^d$ is the value of the $i$th pixel of the desired output and $y_i(\infty)$ is the corresponding value of the steady-state output, whose values are acquired from direct chip measurements. Hence, the cost function $g(p, \theta)$ of the probe template $p$ for the input and initial state contained in the triplet $\theta$ gives the RMS value of the distance between the desired output vector $y^d$ and the steady-state output $y(\infty)$. The objective of the ASA algorithm is, therefore, to minimize $g(p, \theta)$ given an initial template $p_{init}$.

$$ g(p, \theta) = \frac{1}{\sqrt{k}} \sqrt{\sum_{i=1}^{k} \left( y_i^d - y_i(\infty) \right)^2} \qquad (1) $$

Imposing an initial approximation $p_{init}$ may seem of little importance for a global optimization method like ASA. In this approach, however, the approximation is used to set the boundaries of the search, since the objective is tuning and not learning, for which the whole parameter range would be used instead. Namely, the boundaries of the search are $p_{min,i} = p_{init,i} - b$ for the lower bound and $p_{max,i} = p_{init,i} + b$ for the upper bound, where $i$ is the index of each template parameter and $b$ is a small value. Observe that two assumptions are made here: the initial template is assumed to work fully correctly on the simulator, and the parameter deviations introduced during fabrication are assumed to be smaller than $b$. Narrow boundaries for the ASA search decrease the duration of the optimization and allow the use of lower annealing temperatures, resulting in a much more efficient optimization.

Once the training set and the search boundaries are defined, the optimization can be performed. The procedure is finished and considered successful when the cost function becomes smaller than a certain end condition value. In most cases, the end value of the cost function is zero, meaning that, for the final template, the results of the chip match the simulator results perfectly.
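As a rough illustration of this section, the following Python sketch evaluates the cost function of Equation (1) and builds the search box around $p_{init}$. The chip is replaced by a hypothetical stand-in, `measure_chip`, a one-parameter threshold operation with a fixed offset playing the role of a fabrication deviation. This stand-in, all names, and all numeric values are assumptions made for the example and are not taken from the experiments. A plain uniform random search is used in place of ASA only to keep the sketch short.

```python
import math
import random


def rms_cost(y_desired, y_measured):
    """Equation (1): RMS distance between desired and measured outputs."""
    k = len(y_desired)
    if k == 0 or k != len(y_measured):
        raise ValueError("outputs must be non-empty and of equal length")
    squared = sum((yd - y) ** 2 for yd, y in zip(y_desired, y_measured))
    return math.sqrt(squared) / math.sqrt(k)


def search_bounds(p_init, b):
    """Tuning box: p_init[i] - b <= p[i] <= p_init[i] + b."""
    return [pi - b for pi in p_init], [pi + b for pi in p_init]


def simulate(p, u):
    """Ideal model (assumed): binary threshold of u shifted by p[0]."""
    return [1.0 if ui + p[0] > 0 else -1.0 for ui in u]


CHIP_OFFSET = 0.3  # hypothetical deviation of the chip from the ideal model


def measure_chip(p, u):
    """Hypothetical stand-in for a steady-state chip measurement."""
    return [1.0 if ui + p[0] + CHIP_OFFSET > 0 else -1.0 for ui in u]


# One training pair: input u and the desired output from the ideal model.
p_init = [0.0]
u = [-0.2, 0.1, 0.5, -0.6]
y_desired = simulate(p_init, u)

cost_init = rms_cost(y_desired, measure_chip(p_init, u))

# Uniform random search inside the box (a simple substitute for ASA).
p_min, p_max = search_bounds(p_init, b=0.5)
rng = random.Random(0)
best_p, best_cost = list(p_init), cost_init
end_condition = 0.0
for _ in range(200):
    if best_cost <= end_condition:
        break
    probe = [rng.uniform(lo, hi) for lo, hi in zip(p_min, p_max)]
    cost = rms_cost(y_desired, measure_chip(probe, u))
    if cost < best_cost:
        best_p, best_cost = probe, cost

print(cost_init, best_p, best_cost)
```

With these made-up numbers the initial template misclassifies one of the four pixels on the stand-in chip, and any probe that cancels the offset closely enough drives the cost to zero. A whole interval of probes does so, which is the situation described above for binary output operations, where several templates can be considered optimal.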
3.2 Chip-Specific Robustness - Fine-Tune

The optimal template obtained by the method of the previous section may not be unique for the binary output case, as explained previously: small variations of the final template may result in the same output. The optimization process of Section 3.1 therefore performs a gross tuning, whose result is the initial approximation for a fine-tuning in which further improvements are applied to the final optimum. In order to do the fine-tuning and find the best among the optimal templates, the ASA algorithm now uses lower annealing temperatures and the parameter boundaries are shrunk. In addition, another cost function is used which, instead of including the measurement of one template run, embeds several different measurements. Equation (2) presents this function, where $r$ denotes the number of runs executed for the triplet $\theta$, and $e_j$ is a vector of the same size as $p$, sampled anew for each run $j$, in which each element is Gaussian noise with zero mean and very small variance.

$$ g_{fine}(p, \theta) = \frac{1}{r} \sum_{j=1}^{r} g(p + e_j, \theta) \qquad (2) $$

Assume that the set of template optima is located in an interval of real parameter values for each component. Then the addition of different samples of $e$ to the probe template $p$ in $g_{fine}()$ will statistically make this function minimal when the elements of $p$ lie in the middle of the corresponding range of optima. As a result, the template optimum found by the fine-tuning will be very close to the most robust template for specific use in the given chip. Chip-specific robustness is thus the concept of robustness within a given chip, i.e. the most chip-specific robust template for a given operation is the one least sensitive to disturbances of the chip parameters, caused for instance by noise in components or temperature variations. Figure 2 depicts the concept for one component of the parameter vector.

Figure 2: Chip-specific robustness: illustration for one component of the parameter vector. (Labels in the plot: e; theoretical lower bound; theoretical upper bound; min. value; max. value; $p_{min}$; $p_{init}$; $p_{max}$; gross tuning optimum; chip-robust optimum.)

4 EXPERIMENTS

The experiments were performed on the ACE4k CNN-UM chip (64x64 processor cells) with the aid of the Matlab environment. The ASA algorithm used measurements from the chip on the fly, with a number of iterations in the order of tens of thousands and a latency of about 50 ms per measurement. The method described here was applied to several gray scale and binary input template operations on images of 64x64 pixels. In particular, results of templates for edge detection, averaging (AVERTRSH), and half-toning are shown here. Random images were used in the training sets, with the desired outputs generated by a simulator. The erroneous behavior of the chip was reduced in every addressed case, and in some cases eliminated. Figure 3 presents some results. In Figure 3, the error of the edge detection operation was reduced from 0.7719 to zero. For the average task, the error was reduced from 0.6370 to 0.1683, and for half-toning from 0.4357 to 0.2563. For some tasks, like edge detection, some of the ideal template values exceeded the limits allowed on the chip and were trimmed, resulting in total disturbance of the original behavior. Even in these cases, the tuning showed excellent results.

Figure 3: a) original image; b) simulator output; c) non-optimized chip output; d) optimal results. Rows: edge detection, average, halftoning.

5 CONCLUSIONS

Despite the extraordinary speed performance of CNN-UM chips for image processing tasks, digital systems are still the majority in the field owing to their superior reliability. The development of a method toward chip-specific robustness contributes to diminishing this superiority. The method described here works well for all tested stable binary output template operations. For the case of gray scale outputs, a more elaborate approach that also takes the transient time into account has to be developed and will be considered in future research. Using an optimization method that does not rely on information about the gradient of the cost function allowed this approach to efficiently tune not only uncoupled templates but also coupled ones. The fact that a method like ASA requires more execution time than common local optimization methods can be neglected, since the optimization is expected to be done only once, e.g. prior to first use.

Chip-specific robust tuning of templates provides a method to place parameter values in the middle of a correct operating range. For already optimized templates, this minimizes the erroneous behavior of CNN chips due to parameter variations caused by post-manufacturing disturbances, e.g. temperature and noise, which may cause the parameter values to fall outside the correct working range.

Acknowledgments. This research work was carried out at the ESAT laboratory and the Interdisciplinary Center of Neural Networks ICNN of the Katholieke Universiteit Leuven, in the framework of the Belgian Programme on Interuniversity Poles of Attraction, initiated by the Belgian State, Prime Minister’s Office for Science, Technology and Culture (IUAP P4-02, IUAP P4-24, IUAP-V), the Concerted Action Project MEFISTO of the Flemish Community and the FWO project Collective Behavior and Optimization: an Interdisciplinary Approach and ESPRIT IV 27077 (DICTAM). JS is a postdoctoral researcher with the Fund for Scientific Research FWO - Flanders.

References

[1] L.O. Chua and L. Yang. Cellular Neural Networks: Theory and applications. IEEE Trans. Circuits and Syst., 35:1257–1290, 1988.
[2] T. Roska and L.O. Chua. The CNN Universal Machine: an Analogic Array Computer. IEEE Trans. Circuits and Syst., 40(II):163–173, March 1993.
[3] T. Kozek, T. Roska, and L. O. Chua. Genetic Algorithm for CNN Template Learning. IEEE Trans. Circuits and Syst., 40(I):392–402, March 1993.
[4] B. Chandler, C. Rekeczky, Y. Nishio, and A. Ushida. Adaptive Simulated Annealing in CNN Template Learning. IEICE Trans. Fundamentals, E82(2):398–402, Feb. 1999.
[5] C. Güzelis, S. Karamahmut, and I. Genç. A recurrent perceptron learning algorithm for cellular neural networks. ARI - Interdiscip. J. of Phys. and Eng. Sciences, 51(4):296–309, 1999.
[6] T. Roska, L. Kék, L. Nemes, Á. Zarándy, M. Brendel, and P. Szolgay. "CNN Software Library" in CADETWin. Computer and Automation Institute of the Hungarian Academy of Sciences, Budapest, 1998.
[7] S. Espejo, R. Domínguez-Castro, R. Carmona, and A. Rodríguez-Vázquez. A CNN Universal Chip in CMOS Technology. Int. J. of Circuit Th. & Appl., 24:93–109, Jan-Feb 1996.
[8] G. Liñán, S. Espejo, R. Domínguez-Castro, and A. Rodríguez-Vázquez. ACE4k: An analog I/O 64x64 visual microprocessor chip with 7-bit analog accuracy. Int. J. of Circuit Th. & Appl., 30(2-3):89–116, 2002.
[9] L. O. Chua, T. Roska, T. Kozek, and Á. Zarándy. CNN Universal chips crank up computing power. IEEE Circuits and Devices, 12(4):18–28, 1996.
[10] K. R. Crounse and L. O. Chua. Methods for image processing and pattern formation in Cellular Neural Networks: a tutorial. IEEE Trans. Circuits and Syst., 42(10):583–601, Oct. 1995.
[11] T. Roska, L. Kék, L. Nemes, Á. Zarándy, M. Brendel, and P. Szolgay. CADETWin. Computer and Automation Institute of the Hungarian Academy of Sciences, Budapest, 1998.
[12] J. A. Nossek. Design and Learning with Cellular Neural Networks. Int. J. of Circuit Th. & Appl., 24:15–24, 1996.
[13] B. Mirzai, D. Lim, and G. S. Moschytz. Robust CNN Templates: Theory and Simulations. In Proc. of IEEE Int. Workshop on Cellular Neural Networks and Their Applicat. (CNNA'96), pages 393–398, Sevilla, 1996.
[14] P. Kinget and M. Steyaert. Evaluation of CNN Template Robustness Toward VLSI Implementation. Int. J. of Circuit Th. & Appl., 24(1):93–110, 1996.
[15] Á. Zarándy. The Art of CNN Template Design. Int. J. of Circuit Th. & Appl., 27(1):5–23, 1999.
[16] P. Földesy, L. Kék, Á. Zarándy, T. Roska, and G. Bártfai. Fault-Tolerant Design of Analogic CNN Templates and Algorithms - Part I: The Binary Output Case. IEEE Trans. Circuits and Syst., 46(2):312–322, February 1999.
[17] L. Ingber. Very Fast Simulated Re-Annealing. J. of Mathematical Computer Modelling, 12:967–973, 1989.
[18] L. Ingber. Adaptive simulated annealing (ASA). Version 24.1 source code at http://www.ingber.com, 2002.