The Computational Infrastructure for Cellular Visual Microprocessors
P. Szolgay, Á. Zarándy, S. Zöld, T. Roska, P. Földesy, L. Kék, T. Kozek, K. László, I. Petrás, Cs. Rekeczky, I. Szatmári and D. Bálya
Analogic and Neural Computing Systems Laboratory, Computer and Automation Institute, Hungarian Academy of Sciences, P.O.B. 63, H-1502 Budapest, Hungary, e-mail:
[email protected]
Abstract --- A new computational paradigm is emerging for spatio-temporal problems: analogic CNN array computing. Its elementary instructions and programming techniques, however, are drastically different from those of any other computer. These elementary instructions represent complex, spatio-temporal, nonlinear, dynamic phenomena including all the standard and exotic properties of simple image processing operators as well as waves, patterns, and "evolving systems". Meanwhile, a computational infrastructure has also emerged [6], interfacing this revolutionary computing technology to digital systems and enabling the development of high-level software. The main goal was to provide easy-to-use tools and hardware interfacing elements, and to put most of the sophistication in the chips and chip sets.
I. INTRODUCTION
The CNN Universal Machine (CNN-UM) architecture, invented in 1992 [2], is a novel spatio-temporal array computer. It is also termed the analogic CNN computer because analog spatio-temporal dynamics are combined with logic operations, embedded in a stored-programmable framework. The enormous computing power (0.18-1.3 tera equivalent floating-point operations per second per cm²) of the first experimental chips ([3] and [4]), which use standard CMOS fabrication technology, is the consequence of an entirely new processing method. Meanwhile, a computational infrastructure has also emerged [6], interfacing this revolutionary computing technology to digital systems and enabling the development of high-level software. The two development systems, one for chip prototyping and one for algorithm design, have been developed hand in hand. The main goal was to provide easy-to-use tools and hardware interfacing elements, and to put most of the sophistication in the chips and chip sets.

The analogic CNN array computer has all the ingredients of the stored-programmable computer (high-level language, compiler, operating system, assembly and machine code). The elementary instructions and the algorithmic techniques, however, are drastically different from those of any other computer. These elementary instructions represent complex, spatio-temporal, nonlinear, dynamic phenomena including all the standard and exotic properties of simple image processing operators as well as waves, patterns, and "evolving systems". The key theoretical novelty is that all these phenomena can be generated by varying the local interconnection patterns in a simple array, called a cellular neural/nonlinear network (CNN), embedded in a new computer architecture called the CNN Universal Machine. The physical implementation may be analogic CMOS, optical, emulated digital, and so on. The analogic spatio-temporal CNN algorithms represent a new algorithmic world of CNN programming, which is independent of the physical implementation. We may characterize such a program as a sequence of flashes in a branching and converging flow, where the flashes, i.e., the CNN operations, are performed by the physics of the silicon or the optical processes.

In this paper, the computational infrastructure of this new computing paradigm is described. We present the CNN Chip Prototyping System (CCPS), which makes the different CNN-UM chips usable, and the CNN Application Development Environment and Toolkit (CADETWin), an effective tool for developing analogic CNN algorithms. The high-level language, called Alpha, contains general as well as CNN-specific instructions. The Alpha compiler generates code for software simulators, emulated digital PC add-in boards, analogic CNN chip platforms, and self-contained CNN Engine Boards.

Section II describes the high-level Alpha language and its compiler as well as the assembly language (Analogic Macro Code, AMC); these are the two different ways to define an analogic CNN algorithm. The basic analogic CNN instructions, subroutines and some programs are stored in the CNN software library. In Section III, the CNN Application Development Environment and Toolkit (CADETWin) is introduced, and in Section IV, the CNN Chip Prototyping System is described.
II. THE ALPHA AND AMC LANGUAGES AND COMPILER [9]
For the description of analogic CNN algorithms, a high-level language called Alpha (along with its compiler) has been developed. In writing and developing analogic CNN algorithms, the emphasis is placed on both the algorithmic aspects and their stored-programmable physical implementation. Therefore, we can define the parameters and limitations of the CNN-UM chips, of the Engine Board [12], and of the software simulator that are to implement the actual CNN chip-set architecture [7]. The block diagram of the complete software framework is shown in Figure 1.

[Figure 1 diagram: Algorithm (flow diagram, templates and subroutines) -> Alpha source code -> Alpha compiler -> AMC (Analogic Macro Code), the macro code for different CNN implementations -> one of: simulator (SimCNN) running on a Pentium chip in a PC; CNN-UM chip in the CCPS; chip set in the Engine Board; emulated digital CNN-UM]
Figure 1. The levels of the software and the core engines

In an analogic CNN algorithm [4],[5] the key instruction is a subroutine call implementing a CNN template operation. Its general form is as follows:

    TEMi(INPUT, STATE, OUTPUT, time, boundary condition)

where TEMi is the template identifier loaded at the beginning of the program or a function, INPUT and STATE are the input and the initial state memories, OUTPUT is the output memory, time is the duration of the CNN dynamics, and the boundary condition is defined by one of three options:
• fixed values of u_kl and y_kl at the virtual boundary cells,
• zero flux,
• periodic.
In addition to the different boundary conditions, BIAS (z_ij, an offset map), MASK (a fixed state mask), CONTROL (some control signals of the cell array), and the LAM and LLM variables can be defined for each cell of the N x M CNN array. The most important operators are templates, local logic, and local arithmetic functions.
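To make the template call and its three boundary options concrete, the following is a minimal sketch rather than the simulator's actual code. It assumes the standard CNN state equation of [1.a], dx/dt = -x + A*y + B*u + z with a piecewise-linear output, integrated by forward Euler. The function names `boundary_value` and `run_template`, the step size `dt`, and the convention of passing either a number (fixed value) or the strings "zero_flux" and "periodic" are all assumptions made for illustration.

```python
def boundary_value(img, i, j, boundary):
    """Value seen at (i, j); outside the array the boundary rule applies."""
    rows, cols = len(img), len(img[0])
    if 0 <= i < rows and 0 <= j < cols:
        return img[i][j]
    if boundary == "zero_flux":       # replicate the nearest edge cell
        return img[min(max(i, 0), rows - 1)][min(max(j, 0), cols - 1)]
    if boundary == "periodic":        # wrap around (torus)
        return img[i % rows][j % cols]
    return float(boundary)            # fixed value at the virtual cells


def run_template(A, B, z, u, x0, time, boundary, dt=0.05):
    """Forward-Euler integration of dx/dt = -x + A*y + B*u + z (3x3 templates)."""
    rows, cols = len(u), len(u[0])
    sat = lambda v: max(-1.0, min(1.0, v))    # piecewise-linear output y = f(x)
    x = [row[:] for row in x0]
    for _ in range(round(time / dt)):
        y = [[sat(v) for v in row] for row in x]
        new_x = []
        for i in range(rows):
            new_row = []
            for j in range(cols):
                acc = z - x[i][j]
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        acc += A[di + 1][dj + 1] * boundary_value(y, i + di, j + dj, boundary)
                        acc += B[di + 1][dj + 1] * boundary_value(u, i + di, j + dj, boundary)
                new_row.append(x[i][j] + dt * acc)
            new_x.append(new_row)
        x = new_x
    return [[sat(v) for v in row] for row in x]


# Usage: an uncoupled template with centre feedback 2 drives each cell
# to the saturated value with the sign of its initial state.
ZERO = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
A = [[0, 0, 0], [0, 2, 0], [0, 0, 0]]
print(run_template(A, ZERO, 0.0, [[0.0, 0.0]], [[0.5, -0.5]], 10, -1))
```

In this sketch a fixed boundary value is applied to both the input and the output of the virtual cells, mirroring the u_kl, y_kl convention described above.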
Description of the Alpha program elements
The Alpha language is a simple, CNN-specific, algorithmic (procedure-oriented) language. Its purpose is to provide a notation for describing simple algorithms, i.e., flows of analogic operations, and to hide the inherent complexity of CNN details. The language provides a unified interface, of which different subsets can be implemented depending on the actual realization of the analogic operations.

The main features of the Alpha language
Alpha is built up of the following language constructs:
• An algorithm is defined as a main program (called PROCESS) and a set of user-defined procedures (called FUNCTIONs).
• The definitions of the implementation features and algorithmic features of chips and simulators are stored in separate definition files, which can be referred to by using specific keywords.
• Constants used in the program are defined in a separate program segment.
• The image memories (local memories) available on chips or in the simulators are defined as array variables in a separate program segment (marked by the CHIP keyword).
• The image memories and common variables available on the engine board are defined in the BOARD segment of the program. In the current implementation of the language, all variables are global for the whole program.
• There is a separate language construct for referring to CNN template file(s). The main significance of this feature is that the underlying system (the CNN Universal Machine) can be regarded as a special architecture whose dynamic operations are defined at run time for the target architecture using the template file(s).
• The analogic operations in a program are either built-in arithmetic, logic, and assignment statements or user-defined procedures.
• The language supports most of the usual control constructs such as loop statements, conditional statements, and procedure call statements (built-in or user-defined).
• The language contains facilities for specifying “logical” files as input and output media (representing “physical” signal arrays).
• There is a special built-in procedure to control the optical input of a chip architecture.
The formal definition of the statements in Alpha will not be detailed here. The following example shows how to call a hole filler template [9], get the input from LLM1 and the initial state from LLM2, store the result in LLM3, sample the output at t = 5·τ_CNN, and use a fixed-value boundary condition of u_kl = y_kl = -1 (white):

    hole(LLM1, LLM2, LLM3, 5, -1);

The compiler of the Alpha language
The output of the Alpha compiler is a middle-level complex code called the Analogic Macro Code (AMC), which can be interpreted by other system components.

The main features of the compiler
The compiler uses three main input files:
• the definition and limitations of the target architecture (the CNN-UM chip realisations and Engine Board parameters),
• the source program in the Alpha language,
• the template definition file (or files).
The compiler produces the following main output files:
• the file of the target code (AMC description),
• a diagnostic file with error messages and a program listing,
• a file with supplementary information regarding the program arguments.
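Returning to the hole-filler call shown earlier in this section, the sketch below imitates hole(LLM1, LLM2, LLM3, 5, -1) in Python. It is not Alpha code and not the library's template: the template values (A, B_CENTER, Z) are assumed, illustrative values of the kind quoted for hole filling in the CNN literature, and the function `hole`, the Euler step `dt`, and the test image are hypothetical. Since the assumed control template has only a centre entry, the fixed boundary value matters only for the output y.

```python
# Assumed hole-filler template (illustrative values, not taken from this paper).
A = [[0, 1, 0], [1, 3, 1], [0, 1, 0]]   # feedback template
B_CENTER = 4                             # control template: centre entry only
Z = -1                                   # bias


def sat(v):
    """Piecewise-linear CNN output function."""
    return max(-1.0, min(1.0, v))


def hole(u, x0, time, boundary, dt=0.05):
    """Python stand-in for hole(INPUT, STATE, OUTPUT, time, boundary); returns OUTPUT."""
    rows, cols = len(u), len(u[0])
    x = [row[:] for row in x0]
    for _ in range(round(time / dt)):
        y = [[sat(v) for v in row] for row in x]

        def y_at(i, j):
            # Fixed-value boundary: virtual cells output the constant `boundary`.
            inside = 0 <= i < rows and 0 <= j < cols
            return y[i][j] if inside else float(boundary)

        x = [[x[i][j] + dt * (Z - x[i][j] + B_CENTER * u[i][j]
                              + sum(A[di + 1][dj + 1] * y_at(i + di, j + dj)
                                    for di in (-1, 0, 1) for dj in (-1, 0, 1)))
              for j in range(cols)] for i in range(rows)]
    return [[sat(v) for v in row] for row in x]


# 5x5 test image: a black (+1) ring enclosing a one-pixel white (-1) hole.
W, K = -1.0, 1.0
llm1 = [[W, W, W, W, W],
        [W, K, K, K, W],
        [W, K, W, K, W],
        [W, K, K, K, W],
        [W, W, W, W, W]]
llm2 = [[K] * 5 for _ in range(5)]       # initial state: all black
llm3 = hole(llm1, llm2, 5, -1)           # white boundary, sampled at t = 5
```

With these assumed values, the white boundary propagates inward through the background until it reaches the object, so the enclosed pixel stays black while the outer background turns white.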
The assembly-level code AMC
The common assembly-level programming language of the different CNN implementations is the Analogic Macro Code (AMC). It contains image handling instructions, integer variable handling instructions, and branching and looping instructions. The image handling instructions can be analog CNN operations, pixel-by-pixel local logic operations, image transfer operations, image resolution conversions, or global test operations. (A test operation checks, for example, whether a binary image contains any black pixel.) The integer variable handling group contains simple instructions only. The branching and looping group contains the basic conditional branch and subroutine calling instructions.
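The instruction classes listed above can be illustrated with a short sketch. This is plain Python, not AMC syntax, and every name in it (`local_and`, `shift_right`, `any_black`, `longest_run`) is hypothetical. It combines a pixel-by-pixel local logic operation, an image transfer, an integer variable, and a global test that drives a conditional loop.

```python
BLACK, WHITE = 1, -1


def local_and(a, b):
    """Pixel-by-pixel local logic: black only where both images are black."""
    return [[BLACK if p == BLACK and q == BLACK else WHITE
             for p, q in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def shift_right(img):
    """Image transfer with a one-pixel shift; vacated pixels become white."""
    return [[WHITE] + row[:-1] for row in img]


def any_black(img):
    """Global test: does the binary image contain any black pixel?"""
    return any(p == BLACK for row in img for p in row)


def longest_run(img):
    """Length of the longest horizontal black run, found by repeated shrinking."""
    count = 0                                   # integer variable
    while any_black(img):                       # global test as branch condition
        img = local_and(img, shift_right(img))  # shrink every run by one pixel
        count += 1
    return count


picture = [[1, 1, 1, -1],
           [-1, 1, -1, -1]]
print(longest_run(picture))
```

The loop terminates because each pass removes one pixel from every black run, so the global test eventually reports an all-white image.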
Analogic CNN Software Library (CSL) [11]
The CNN templates and some analogic subroutines and algorithms are stored in the Analogic CNN Software Library. The templates are categorised according to their function (e.g. shape extraction, motion estimation, colour processing, depth detection, etc.). Within the shape category, the following types of functions are considered: basic image processing, mathematical morphology, spatial logic, and texture. There are also templates for global optimisation, neuromorphic modelling, elementary combinatorics, and for solving some partial differential equations.
III. FUNCTIONAL SPECIFICATION OF THE CADETWIN COMPONENTS [15]
CADETWin assists in designing and testing analogic CNN algorithms in the following phases:
− in designing and optimising templates,
− in designing template sequences/subroutines,
− in testing algorithms in real-life conditions by providing interfaces to standard optical input devices,
− in developing analogic algorithms by using the high-level analogic CNN language Alpha, the same one as in the CNN Chip Prototyping System,
− in designing applications.
The main components of the CADETWin system are as follows:
• Integrated software environment to develop CNN applications, which can store and visualise inputs and results (VisMouse),
• Multi-layer simulators for testing CNN instructions (templates) and sequences of instructions (SimCNN),
• High-level language to design analogic algorithms (Alpha),
• CNN template designer and optimisation program (TemMaster),
• Analogic CNN software library (CSL).
CNN simulator (SimCNN) under the Visual Mouse software platform (VisMouse)
A global system overview and the implementation-independent functional organisation of the program are shown in Figure 2. The main blocks are as follows:
• VisMouse: the software platform of a CNN-type image processing system in a Windows environment. Features: (i) data routing and visualisation, (ii) control over the subsystems.
• System Files: configuration, script, and record files.
• Input: image (sequence) from hard disk or frame grabber.
• Output: processed image sequence in a VisMouse frame.
• Template and Image Library: image sequences and CNN templates used in simulations.
• SimCNN: multi-layer CNN simulator.
• Interfaces: interface to visualisation and numerical computation software, e.g. to MATLAB from MathWorks Inc.
VisMouse receives its input (an image sequence representing 2- or 3-dimensional data) from hard disk or through a frame grabber. It preprocesses the input images and passes them to different subsystems for further processing. It can also display and store the final results. VisMouse has an easy-to-use user interface. Figure 3 shows the general outlook of the VisMouse platform.

Figure 3. General outlook of the VisMouse platform

[Figure 2 block labels: Template, Subroutine and Image Library; Template Designer (TemMaster); Input (image sequence from imagers, frame grabber or hard disk); SimCNN (CNN simulation on various hardware platforms); System Files (configuration, DDE script and record files); Applications (specific applications processing 2- or 3-dimensional data); VisMouse; Output (processed image sequence)]
TemMaster is a software package for assisting users in solving the following problems, mainly within the class of uncoupled CNNs with binary input and output:
• optimisation of templates concerning robustness,
• visualisation of local logic functions given in several formats,
• calculation of optimally robust templates or template sequences that realise an arbitrary Boolean function of fewer than 10 variables.
In some well-defined cases, the program supports coupled template design as well.
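The kind of check involved in designing uncoupled binary templates can be sketched as follows. This is not TemMaster's algorithm. It assumes the standard result for an uncoupled cell with centre feedback greater than 1 and zero initial state, namely that the settled output is the sign of w = z + Σ b_k·u_k for inputs u_k in {-1, +1}. The function `template_margin` and its unnormalised margin are hypothetical stand-ins for a robustness measure.

```python
from itertools import product


def template_margin(b, z, target):
    """Smallest |w| over all input patterns if sign(w) realises `target`, else None.

    b      : control weights b_k of the uncoupled template
    z      : bias
    target : function mapping a tuple of +/-1 inputs to True (black) or False (white)
    """
    margin = None
    for u in product((-1, 1), repeat=len(b)):
        w = z + sum(bk * uk for bk, uk in zip(b, u))
        wanted = 1 if target(u) else -1
        if w * wanted <= 0:          # wrong sign (or undecided): not realised
            return None
        margin = abs(w) if margin is None else min(margin, abs(w))
    return margin


logic_and = lambda u: all(v == 1 for v in u)
logic_xor = lambda u: u[0] != u[1]

print(template_margin((1, 1), -1, logic_and))   # AND is realised by these weights
print(template_margin((1, 1), -1, logic_xor))   # these weights do not realise XOR
```

Because the margin here is unnormalised, scaling all weights scales it too; a practical robustness measure would normalise by the template size, which is left out of this sketch.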
[Figure 2 block label, continued: Interfaces (numerical computation and visualisation)]
Figure 2. Implementation-independent functional organisation of the VisMouse system.

The most important feature of SimCNN is the capability of simulating multi-layer CNN arrays, even with different cell densities on different layers. There is no restriction imposed on which layers are interconnected. The templates describing these interconnections can be linear or non-linear. Both continuous-time and discrete-time CNN (DTCNN) arrays can be simulated. Logic and arithmetic operations can also be performed. Animation files can be created to store a whole transient, which can then be replayed later without being recomputed.

IV. THE CNN CHIP PROTOTYPING SYSTEM [14]
The analogic CNN chip-prototyping system has been designed and built for two main purposes: one is to test CNN Universal Chips (which we call cP) or chip sets [7], the other is to run analogic CNN algorithms once the chips have been tested and verified. Analogic CNN algorithms can be defined by using a high-level language, Alpha [9]. An analogic CNN algorithm library [11] containing numerous CNN instructions (cloning templates) and subroutines is available, which makes the development of Alpha programs easier. The Alpha compiler generates an intermediate assembly code, called Analogic Macro Code (AMC), that can be used for programming the cP directly. The user may also describe his/her analogic CNN algorithm in AMC. Each type of cP has its own dedicated platform connected through the CNN Platform bus. The platform bus is common to all CNN chips. The binary control as well as the digital and logic data streams for the CNN chips are communicated through the CNN Platform bus. The template sequence, local logic operator sequence,
switch configuration sequence, code for the global analogic control unit, and I/O information have to be downloaded to program the CNN-UM chip.

The CNN Chip Prototyping System has a three-level structure (Figure 4). The top level is a PC (host machine). It provides the user interface and controls the CNN Prototyping System board (CPS board).

The middle level is the CPS board, controlled by the intermediate code (AMC). This code is either generated by the Alpha compiler or entered directly as mnemonic code. The intermediate machine code itself is independent of the currently used CNN Universal Chip; only its execution method differs from chip to chip because of the hardware differences. We call the program that executes the AMC the “External CNN ‘Operating System’ (COS)”, because it fully controls all the resources of the CNN chip and runs on an external digital processor. This level is driven by a TMS320C25 digital signal processor (DSP). Its output is a standard CNN Physical Interface code (CPI code).

The bottom level is the CNN Platform, which hosts the CNN chip. Its role is to adapt the CNN Platform bus signals (CPI code) of the CPS board to the current CNN chip. For example, this platform board performs A/D and D/A conversion, analog multiplexing, sampling and holding, analog level shifting, control code decoding, and so on. Because CNN chip designs differ, a different CNN Platform has to be designed for each new CNN chip type.

The PC - CPS board communication is done via the standard ISA bus (AT bus). The CPS board - CNN Platform communication is performed through the CNN Platform bus (using ribbon cables).
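The division of labour between the three levels can be pictured with a small sketch. It is a toy model only: the classes `Platform` and `COS`, the function `host_run`, and the instruction mnemonics LOAD_TEMPLATE, RUN and READ are all hypothetical and do not reproduce the real AMC or CPI codes. The point it illustrates is that the same chip-independent program can be executed against different platform objects.

```python
class Platform:
    """Bottom level: adapts chip-independent commands to one chip type."""

    def __init__(self, name, rows, cols):
        self.name, self.rows, self.cols = name, rows, cols
        self.log = []                      # commands that reached the chip

    def send(self, command, payload=None):
        # A real platform would do level shifting, A/D and D/A conversion, etc.
        self.log.append((command, payload))


class COS:
    """Middle level: executes chip-independent macro instructions."""

    def __init__(self, platform):
        self.platform = platform

    def execute(self, program):
        for op, arg in program:
            if op == "LOAD_TEMPLATE":
                self.platform.send("template", arg)
            elif op == "RUN":
                self.platform.send("start", arg)
            elif op == "READ":
                self.platform.send("readout", arg)
            else:
                raise ValueError(f"unknown instruction: {op}")


def host_run(program, platform):
    """Top level: the host hands the intermediate code to the board."""
    COS(platform).execute(program)
    return platform.log


program = [("LOAD_TEMPLATE", "hole"), ("RUN", 5), ("READ", "LLM3")]
log_a = host_run(program, Platform("cP400", 22, 20))
log_b = host_run(program, Platform("cP200", 14, 14))
```

Keeping the chip-specific behaviour inside the platform object is what lets the program and the interpreter stay unchanged when a new chip is added.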
The CNN Platforms accompanying the CCPS can currently host the following CNN Universal Machine chips:
(i) cP400 CNN Universal Chip [3], containing 22x20 elementary processors with on-chip black/white optical input and black/white electrical output,
(ii) cP200 CNN Universal Chip [4] with 14x14 CNN cells, analog input and dual (analog and binary) output,
(iii) cP150 discrete-time CNN Universal Chip [13] with 12x12 CNN cells,
(iv) cP2500 CNN Universal Chip [14] with 48x48 CNN cells.
[Figure 4 diagram, three levels. PC: Alpha description of an algorithm, Alpha compiler, executable program code for PC, display, interfaces, image library, template library, video. ISA bus: input of the CPS board (analogic machine code (AMC), image data, template data) and output of the CPS board (image data, binary decision code fields). CNN Prototyping System board (CPS): External CNN “Operating System” (COS) running. CNN Platform bus: electrical input, control, template and data signals (CPI code) towards the platform, electrical output data signals back. CNN Platform: level shifters, sample/hold, multiplexers, and the CNN chip with optical input]
Figure 4. The architecture of the CNN Chip Prototyping System (CCPS)
The CNN platform
The CNN platform has a very specific role in the CCPS, because this component of the system hosts the CNN-UM chip itself. Physically, it is a relatively small printed circuit board. It is the only part that has to be redesigned and rebuilt when a new CNN-UM chip is adapted to the system. From a hardware point of view, the platform hosts the CNN-UM chip and includes some bus drivers and latches for temporary storage of long control and data words, a programmable logic device (PLD) for controlling the latches, and some voltage or bias references and a stabilized voltage supply for the analog CNN chip. Furthermore, if the CNN-UM chip has analog input and/or output, the platform contains all the analog devices (A/D and D/A converters, sample-and-holds, level shifters). In the new designs, the Analog RAM (ARAM) will also be placed on the board, and we are planning to install some video input-output devices as well.

The roles and advantages of the platform are as follows:
• It interfaces the CPS board to the chip, providing all the data and control signals the chip needs for correct operation.
• Some components that logically belong to the CNN-UM chip but were not implemented on the die (like template memory, reference voltages, etc.) are built on the platform.
• It physically separates the noisy digital power supply from the analog power supply.
• It is a part of the system that can be redesigned quickly and easily, hence the adaptation of a new chip does not require a huge effort.
The first version of the CNN chip prototyping system was designed and built in 1994. Since then, six different platforms have been designed, each hosting a different chip. Three of the chips came from Seville [3] (some of them were not programmable), one came from Berkeley/Munich [13], one from Berkeley [4], and one from Helsinki [14].
[Figure 6 panels: (a) INPUT_PICTURE, (b) OUTPUT_PICTURE]
Figure 6. The input and the output images of the algorithm for texture segmentation (image size: 192x154)
V. EXAMPLE
The aim of the following simple analogic CNN program is to separate two binary textures of predefined types in the input image. In the first step, the textures are separated by the local Ratio Of Black-and-white pixels (ROB). In the next step, a ‘small killer’ template [9] enhances the separation. Next, a majority-vote-taker template [9] is used. Finally, a closing operation smooths the result. This analogic CNN algorithm was developed in the Alpha language in CADETWin. Then the CNN Engine Board of the cP400 was used to run the application. The elementary CNN operations used in the algorithm (texture segmentation, small killer, dilation, and erosion templates) can be found in the CNN Software Library [11] as TX_RACC3, SMKILLER, DILATION, and EROSION, respectively.

[Figure 5 flow: BINARY INPUT PICTURE -> texture segmentation template -> small killer template -> majority-vote-taker template -> morphological closing (dilation template, then erosion template) -> BINARY OUTPUT PICTURE]
Figure 5. The flow diagram of an analogic CNN algorithm for binary texture segmentation

VI. CONCLUSION
Analog VLSI implementation of the CNN Universal Machine exhibits supercomputing power in the range of teraops. The analogic CNN algorithms were designed on the CADETWin system. The CNN Chip Prototyping System was designed and implemented to measure and program these chips in a standard PC environment by using a high-level language (Alpha) and a low-level language (AMC). To interface the different CNN-UM chips, CNN platforms were designed.
REFERENCES
[1.a] L.O. Chua and L. Yang, "Cellular neural networks: Theory", IEEE Trans. on Circuits and Systems, Vol. 35, pp. 1257-1272, 1988.
[1.b] L.O. Chua and L. Yang, "Cellular neural networks: Applications", IEEE Trans. on Circuits and Systems, Vol. 35, pp. 1273-1290, 1988.
[2] T. Roska and L.O. Chua, "The CNN Universal Machine: an analogic array computer", IEEE Transactions on Circuits and Systems-II, Vol. 40, pp. 163-173, March 1993.
[3] R. Dominguez-Castro, S. Espejo, A. Rodríguez-Vázquez, R. Carmona, P. Földesy, Á. Zarándy, P. Szolgay, T. Szirányi and T. Roska, "A 0.8 µm CMOS 2-D programmable mixed-signal focal-plane array-processor with on-chip binary imaging and instructions storage", Vision Chip with Local Logic and Image Memory, IEEE J. of Solid State Circuits, 1997.
[4] J.M. Cruz, L.O. Chua, and T. Roska, "A Fast, Complex and Efficient Test Implementation of the CNN Universal Machine", Proc. of the third IEEE Int. Workshop on Cellular Neural Networks and their Application (CNNA94), pp. 61-66, Rome, Dec. 1994.
[5] F. Werblin, T. Roska and L.O. Chua, "The analogic cellular neural network as a bionic eye", International Journal of Circuit Theory and Applications, Vol. 23, pp. 541-549, 1995.
[6] T. Roska, P. Szolgay, Á. Zarándy, P. Venetianer, A. Radványi, T. Szirányi, "On a CNN chip-prototyping system", Proc. of CNNA'94, Rome, pp. 375-380, 1994.
[7] T. Roska, "CNN chip set architecture and the Visual Mouse", Proc. of CNNA'96, pp. 369-374, Seville, 1996.
[8] T. Roska, L.O. Chua and Á. Zarándy, "Language, compiler, and operating system for the CNN supercomputer", Report UCB/ERL M93/34, University of California, Berkeley, 1993.
[9] S. Zöld, "CNN Alpha Language and Compiler", Report DNS-10-1997, Computer and Automation Research Institute, Budapest, 1997.
[10] T. Szirányi, M. Csapodi, "Texture Classification by CNN and Genetic Learning", Report DNS-8-1993, Analogical and Neural Computing Lab., Computer and Automation Institute, Hungarian Academy of Sciences (MTA SzTAKI), Budapest, 1993, and Proc. of IEEE Int. Conf. Pattern Recognition, Jerusalem, ICPR'94, vol. III, pp. 381-383, 1994.
[11] "CNN Software Library" in CADETWin (ed. by T. Roska, L. Kék, L. Nemes, Á. Zarándy, M. Brendel and P. Szolgay), MTA-SZTAKI, Budapest, 1998.
[12] P. Földesy, P. Szolgay, "A CNN Engine Board", Proc. of ECCTD'97 in Design Automation Day, pp. 199-204, Budapest, 1997.
[13] H. Harrer, J.A. Nossek, T. Roska, L.O. Chua, "A Current-mode DTCNN Universal Chip", Proc. of IEEE Intl. Symposium on Circuits and Systems, pp. 135-138, 1994.
[14] A. Paasio, A. Dawidziuk, K. Halonen, V. Porra, "Minimum Size 0.5 Micron CMOS Programmable 48 by 48 CNN Test Chip", Proceedings of the ECCTD'97, Budapest, September 1997, pp. 154-156.
[15] CADETWin User's Manual, MTA-SZTAKI, Budapest, 1998.
[16] CCPS User's Manual, MTA-SZTAKI, Budapest, 1998.