A Tool for the Implementation of Heavy-Computational-Load Control Functions in Embedded Systems
J.A. Rodríguez Mondéjar
F. de Cuadra García
O. Nieto-Taladriz García
Dpto. de Electrónica y Automática ICAI UPCO Alberto Aguilera 23, 28015 Madrid
[email protected]
Instituto de Investigación Tecnológica (IIT) ICAI UPCO Alberto Aguilera 23, 28015 Madrid
[email protected]
Dpto. de Ingeniería Electrónica ETSIT UPM Ciudad Universitaria s/n, 28040 Madrid
[email protected]
Abstract - This paper presents a tool for incorporating heavy-computational-load control functions in low-cost embedded systems. The proposed tool recursively applies piecewise multilinear interpolation to approximate the target control function. The paper presents the theoretical basis of the proposed method and the software tool developed to implement it. The main implementation issues are explained in detail. The tool includes automatic C and VHDL code generators that implement the best hardware or software solution in terms of computation time and cost.
I. INTRODUCTION
Very often the solution of practical problems, such as those that appear in automation and industrial control, requires control functions with a heavy computational load. The excessive computation time, or the need for sophisticated hardware, makes the direct use of these functions in low-cost embedded systems nonviable [1]. For instance, reference [2] surveys the problems associated with the direct implementation of functions obtained using artificial intelligence techniques, the most important drawback being their non-deterministic timing behavior. This paper presents an alternative approach that shows many practical advantages with respect to the previously mentioned techniques [1]. Our approach makes use of the fact that these functions (slow-functions from now on) do not necessarily have a complicated topography. The control functions used in engineering are usually smooth and have good local behavior. This means that their values can be approximated with low error using interpolation or extrapolation techniques [3]. For example, Fig. 1 shows the unit step response of second-order systems as a function of time and damping. The function is smooth, but it is hard to compute on simple hardware. Moreover, there is usually no need for great precision in the computations, owing to the robustness of the (engineering) system. These two facts allow the use of interpolation techniques to replace the slow-function by another function with a far simpler structure and a low implementation cost, which is fast and totally predictable. In this work, we use recursive piecewise multilinear interpolation to construct this approximation function, following the results of [3]. This function is called the abacus, following the nomenclature of [3].

[Figure 1: surface plot of f(t,ζ) over the axes t and ζ.]
Fig. 1. Example of slow function with smooth behavior.
II. RECURSIVE MULTILINEAR INTERPOLATION
The basic method to build the abacus divides the domain of the original control function according to a Cartesian grid with non-uniform step. The values of the slow-function at the vertices (nodes) of the mesh are stored in memory. The approximate value of the slow-function is calculated by multilinear interpolation from the stored values. Therefore, the abacus is a piecewise multilinear function (splines of degree 1) [4]. With this model, the abacus has three blocks:
1. Memory block. It stores the constants (target function values at the nodes of the grid). The Cartesian grid simplifies the organization of the data.
2. Searching block. It selects from memory the values needed, according to the mesh, to interpolate at the given point.
3. Interpolation block. It calculates the approximate value (abacus function) with this formula:

$$ f^{a}(x) = \frac{1}{\mathrm{Vol}_{hyper}} \sum_{vertex} \mathrm{Vol}_{vertex}(x) \cdot f(vertex) \qquad (1) $$
where $\mathrm{Vol}_{hyper}$ is the volume of the interval (or hyperinterval) of the mesh to which the point belongs, and $\mathrm{Vol}_{vertex}$ is the volume of the region between the point and the vertex opposite to the indicated vertex of that interval. This formula is derived from the Lagrange formula [4]. For the case of 2 dimensions the formula becomes:

$$ f^{a}(x, y) = \frac{S_{00} F_{00} + S_{01} F_{01} + S_{10} F_{10} + S_{11} F_{11}}{S_{00} + S_{01} + S_{10} + S_{11}} \qquad (2) $$
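As an illustration only, Eq. (2) can be sketched in C for a single rectangular cell. The struct layout, the function name and the index convention (first index along x, second along y) are assumptions made for this sketch and are not taken from the tool. Each corner value is weighted by the area of the rectangle between the point and the opposite corner.

```c
#include <assert.h>

/* One rectangular grid cell [x0,x1] x [y0,y1]; fij is the stored
 * slow-function value at corner (xi, yj). Hypothetical layout. */
typedef struct {
    double x0, x1, y0, y1;
    double f00, f01, f10, f11;
} cell2d;

/* Evaluate Eq. (2) inside the cell. */
double abacus_bilinear(const cell2d *c, double x, double y)
{
    assert(c->x0 < c->x1 && c->y0 < c->y1);
    assert(x >= c->x0 && x <= c->x1 && y >= c->y0 && y <= c->y1);

    /* Sij: area between (x, y) and the corner opposite to (xi, yj). */
    double s00 = (c->x1 - x) * (c->y1 - y);
    double s01 = (c->x1 - x) * (y - c->y0);
    double s10 = (x - c->x0) * (c->y1 - y);
    double s11 = (x - c->x0) * (y - c->y0);

    /* The four areas always add up to the area of the cell. */
    double total = (c->x1 - c->x0) * (c->y1 - c->y0);

    return (s00 * c->f00 + s01 * c->f01 + s10 * c->f10 + s11 * c->f11) / total;
}
```

Because the denominator is simply the cell area, it can be precomputed per cell, which leaves a single division per evaluation.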
ZOOM 1 in Fig. 2 shows the meaning of the terms in Eq. (2). The most important properties of the abacus are:
• The searching block and the interpolation block depend only on the interface of the slow control function: number of inputs, input format and output format. The operations involved are comparisons, additions, multiplications and a single division (if it is necessary). Their simplicity makes the computation time totally predictable with low-cost hardware.
• Flexible solution: the memory stores the topography of the slow-function. To change the slow-function to another one with the same interface, only the memory values must be changed (provided the memory size is sufficient). The allowed error between the slow-function and the abacus, together with the slow-function topography, determines the minimum memory size. In the limit, with zero error, the abacus becomes the classical lookup table [7].
These properties make the abacus suitable for low-cost embedded systems (microprocessor or ASIC based).
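The three blocks can be illustrated with a minimal one-dimensional sketch in C. It assumes a single zoom level, floating-point data and a bipartition search. All names are hypothetical, and this is not the code emitted by the tool.

```c
#include <assert.h>
#include <stddef.h>

/* Memory block: node positions (strictly increasing, non-uniform step)
 * and the slow-function values stored at those nodes. */
typedef struct {
    const double *node;   /* node[0] < node[1] < ... < node[n-1] */
    const double *value;  /* value[i] = f(node[i]) */
    size_t n;             /* number of nodes, n >= 2 */
} abacus1d;

/* Searching block: bipartition search for the interval i with
 * node[i] <= x <= node[i+1]. It uses only comparisons, and the loop
 * runs at most ceil(log2(n-1)) times. */
static size_t abacus1d_search(const abacus1d *a, double x)
{
    size_t lo = 0, hi = a->n - 1;
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (x < a->node[mid])
            hi = mid;
        else
            lo = mid;
    }
    return lo;
}

/* Interpolation block: Eq. (1) in one dimension. */
double abacus1d_eval(const abacus1d *a, double x)
{
    assert(a->n >= 2);
    assert(x >= a->node[0] && x <= a->node[a->n - 1]);

    size_t i = abacus1d_search(a, x);
    double w_left  = a->node[i + 1] - x;  /* length opposite to node i */
    double w_right = x - a->node[i];      /* length opposite to node i+1 */
    return (w_left * a->value[i] + w_right * a->value[i + 1])
           / (a->node[i + 1] - a->node[i]);
}
```

The operation count depends only on the number of nodes, not on the stored values, which is what makes the evaluation time predictable.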
The first problem of abacus construction is to calculate the grid. Every point on the slow-function domain must satisfy the error equation:
$$ \left| f(x) - f^{a}(x) \right| \le e(x) \qquad (3) $$
where f is the slow-function, f^a is the abacus and e(x) is the error function at each point of the domain. This natural definition of the error is hard to manage because of the absolute value function. Except in simple cases, there is no efficient algorithm to calculate the optimal grid in n-dimensional approximation problems [5]. Reference [3] describes an algorithm that has proved to be very efficient for calculating a grid (one possible solution). The algorithm is based on the following idea: if the error condition of Eq. (3) is fulfilled at all points of the domain, it is also fulfilled on the edges that form the border of this domain. We assume that the border points belong to the slow-function domain. The n-dimensional problem thus becomes, initially, a 1-dimensional approximation problem on the edges of the domain. We solve it by applying the Adaptive Approximation algorithm [3]. Convergence is assured by the Jackson theorem [6]. This division of the edges leads to the first grid. The next step is to verify that the error condition is satisfied within each interval of this first grid. We use the Hooke & Jeeves algorithm to calculate the maximum error. The hyperintervals that do not fulfill the condition are processed separately in the same manner (the term "recursive" comes from here). Figure 2 shows the mechanism.
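[Figure 2: grid ZOOM 0 with a nested grid ZOOM 1; inside a ZOOM 1 cell, the point (x,y), the node values F00, F01, F10, F11 and the areas S00, S01, S10, S11.]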
Fig. 2. Recursive multilinear Interpolation
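The idea of refining the grid along one edge until the error condition of Eq. (3) holds can be sketched as follows. This is a deliberately simplified stand-in: it uses plain midpoint bisection and checks the error at a fixed number of sample points, so it is neither the Adaptive Approximation algorithm of [3] nor the Hooke & Jeeves search. The constant error bound e, the sample count, the depth limit and all names are assumptions.

```c
#include <assert.h>
#include <stddef.h>

typedef double (*slow_fn)(double);

enum { ERR_SAMPLES = 16, MAX_DEPTH = 20 };  /* assumed tuning constants */

/* Hypothetical stand-in for a slow-function along one edge of the domain. */
double example_slow_fn(double x)
{
    return x * x * x;
}

/* Largest sampled |f - f_a| on [a, b], where f_a is the straight line
 * through (a, f(a)) and (b, f(b)). */
static double max_sampled_error(slow_fn f, double a, double b)
{
    double fa = f(a), fb = f(b), worst = 0.0;
    for (int k = 1; k < ERR_SAMPLES; ++k) {
        double t = (double)k / ERR_SAMPLES;
        double err = f(a + t * (b - a)) - ((1.0 - t) * fa + t * fb);
        if (err < 0.0)
            err = -err;
        if (err > worst)
            worst = err;
    }
    return worst;
}

/* Append the nodes of (a, b] to nodes[], splitting at the midpoint while
 * the sampled error exceeds e. Returns the new node count, or 0 if the
 * array is full. */
static size_t refine(slow_fn f, double a, double b, double e,
                     double *nodes, size_t count, size_t cap, int depth)
{
    if (depth > 0 && max_sampled_error(f, a, b) > e) {
        double m = 0.5 * (a + b);
        count = refine(f, a, m, e, nodes, count, cap, depth - 1);
        if (count == 0)
            return 0;
        return refine(f, m, b, e, nodes, count, cap, depth - 1);
    }
    if (count >= cap)
        return 0;
    nodes[count] = b;
    return count + 1;
}

/* Build a non-uniform grid on the edge [a, b]; returns the node count. */
size_t build_edge_grid(slow_fn f, double a, double b, double e,
                       double *nodes, size_t cap)
{
    assert(a < b && e > 0.0 && cap >= 2);
    nodes[0] = a;
    return refine(f, a, b, e, nodes, 1, cap, MAX_DEPTH);
}
```

A call such as `build_edge_grid(example_slow_fn, 0.0, 2.0, 0.05, nodes, 64)` fills `nodes` with a grid that is denser where the function bends more. A sampled check can miss narrow error peaks, which is one reason a proper maximum search is used in the method described above.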
Nevertheless, the direct application of this method in real-time systems has several problems:
• Fixed-point abacus. In [3] only floating-point arithmetic is used, whereas fixed-point arithmetic is very common in low-cost systems. With a fixed-point abacus, the Hooke & Jeeves algorithm suffers from serious problems with local minima. These have been solved by replacing the fixed-point abacus by two floating-point functions that enclose it.
• Excessive number of nested zooms. This makes the searching block slow. To avoid it, the information of the lower-level zoom is used to recalculate the grid of the higher-level zoom. In Fig. 2, this is equivalent to using in ZOOM 0 the samples that divide the axes of ZOOM 1.
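To make the fixed-point case concrete, the following C sketch interpolates between two stored node values using only integer operations. It assumes unsigned 8-bit inputs and outputs and an interval width that is a power of two, so that the final division reduces to a right shift. These formats and the function name are illustrative assumptions, not the formats produced by the tool.

```c
#include <assert.h>
#include <stdint.h>

/* Linear interpolation between two stored 8-bit node values with integer
 * arithmetic only. The interval starts at x0 and is (1 << log2_width)
 * wide; f0 and f1 are the values at its two ends. */
uint8_t abacus_fixed_lerp(uint8_t x, uint8_t x0, unsigned log2_width,
                          uint8_t f0, uint8_t f1)
{
    assert(log2_width >= 1 && log2_width <= 8);
    uint32_t width = (uint32_t)1 << log2_width;
    assert(x >= x0 && (uint32_t)(x - x0) <= width);

    uint32_t w1 = (uint32_t)(x - x0);  /* weight of f1 */
    uint32_t w0 = width - w1;          /* weight of f0 */

    /* The weighted sum is at most width * 255, so it fits easily. */
    uint32_t acc = w0 * f0 + w1 * f1;
    acc += width >> 1;                   /* round to nearest */
    return (uint8_t)(acc >> log2_width); /* division by width as a shift */
}
```

For example, with `x0 = 16`, a width of 16 (`log2_width = 4`), `f0 = 100` and `f1 = 200`, the call with `x = 24` returns 150. Rounding makes the fixed-point result differ from the floating-point one by up to half a unit in the last place, which is the kind of discontinuity that complicates the maximum-error search mentioned above.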
III. TOOL FOR THE IMPLEMENTATION
A software tool has been developed that implements the calculation of the abacuses outlined in the previous section. Figure 3 shows the work methodology of the tool. The designer provides the tool with a C function that computes the slow-function. The rest of the information (interface of the function with the real world, maximum error allowed, maximum nesting depth of zooms, type of implementation, etc.) is entered through script files or interactively. The most important characteristics of the tool are:
• The abacus can be implemented in standard floating point or in fixed point with different formats (unsigned, sign-magnitude, two's-complement, non-standard floating point) and with different data sizes (standard sizes in the microprocessor case and custom sizes in an ASIC implementation).
• Automatic generation of C code and VHDL code (compatible with SYNOPSYS Behavioral Compiler).
• Study of time and size for each of the possible solutions. For C, the tool calculates the required memory and the worst-case number of operations. With these operation counts and the data sheet of the microprocessor, the response time can be estimated. For an ASIC, the tool provides the memory size, the number of gates and the number of clock cycles. To shorten development time, the program uses a built-in database of typical ASIC solutions implemented with SYNOPSYS Behavioral Compiler (the main constraint is the minimization of memory accesses). Figure 4 shows the database for a 3-input abacus (8/10/12/14/16 bits). Figure 5 outlines the architecture of the hardware abacus.

[Figure 3: flow diagram. Labels: CLIENT, COMPUTATIONAL PROBLEM, ISOLATED, HIGH PRECISION ABACUS, SLOW FUNCTION, ERROR APPROXIMATION, AUTOMATIC ABACUS GENERATOR, TYPE, ABACUS, DESIGN LIBRARY, LOOK UP STRATEGY, C, COMPILER, VHDL, SYNTHESIZER, ABACUS HARDWARE.]
Fig. 3. Work methodology of the tool.

[Figure 4: two panels for 3 variables, Total Area (gates) and Calculator Cycles, with the solutions IFD, ID and IB plotted for the four cases 0 ws − Area Opt, 0 ws − Time Opt, 1 ws − Area Opt and 1 ws − Time Opt.]
Fig. 4. Area and interpolation cycles database of a 3-input abacus for 4 cases: memory without wait state (ws) and area optimization, memory without wait state and time optimization, memory with 1 wait state and area optimization, and memory with 1 wait state and time optimization. IFD: adaptive floating point operations with directory search. ID: fixed point operations with directory search. IB: fixed point operations with bipartition search.
• Study of the error in the approximation.

[Figure 5: block diagram with the blocks INPUT, SEARCHING B., MEMORY, INTERPOLATOR and FUNCTION.]
Fig. 5. Abacus hardware

IV. EXAMPLE OF APPLICATION
We present two examples. The first example outlines the results of applying the tool to the following 2-dimensional function:

$$ f = x \cdot y \cdot \sin(x + y/2) \cdot \cos(10 \cdot x) $$

[The source shows two detached superscripts, 2 and 3, next to this equation; their positions could not be recovered.]

Figure 6 displays this function. It is not a complicated function, except when it must be implemented on an 8-bit microcontroller such as the 8051, and it shows how the abacus works. Figure 8 shows the abacus solution with no limit on the nesting of zooms and an error of 1%. Figure 7 shows the abacus grid.

[Figure 6: surface plot of the slow function.]
Fig. 6. Slow function

[Figure 7: grid plot titled "dci: 1036 ptos Profu 3", both axes from −600 to 600.]
Fig. 7. Abacus grid

[Figure 8: surface plot titled "dci: 1036 ptos Profu 3".]
Fig. 8. Abacus
The VHDL code generated for this function is implemented in ASIC technology (LSI10K) with these parameters:
• Memory with 1 wait state.
• Area-restricted synthesis.
The results are:
• Estimated combinational area: 846 gates.
• Estimated sequential area: 1550 gates.
• Estimated total area: 2396 gates.
• Maximum clock cycles: 68 cycles.
• Average clock cycles: 46 cycles.
In this circuit, the tool replaced the final division of the interpolation by a shift operation, because the area of every interval is a power of 2. The tool also optimized the memory references.

The second example is the implementation of a 3-input function for the control of a SCARA robot arm. Table 1 shows different implementations of this function provided by the tool. The implementation is in C on a 16-bit microcontroller. The size of all the variables (inputs and output) is 8 bits. The approximation error is smaller than 1%. For each strategy, the table gives the number of operations classified by type and the required memory size. With this table the designer chooses the most suitable design for the application. In this case, and if there are no other restrictions, the second column presents the most favorable solution. If it is implemented on a Siemens 80167 at 20 MHz, the estimated response time of the function is below 30 microseconds. The last line of the table allows a comparison with the lookup table case.

Zoom architecture   Directory     Directory   Tree          Tree
Search method       Bipartition   Index       Bipartition   Index
Memory access       50            34          85            67
Function calls      6             0           15            0
Multiplications     27            27          36            33
Additions           69            37          88            43
Divisions           1             1           1             1
Comparisons         39            9           58            15
Memory (Bytes)      6436          11684       6532          12628
Look up table: 16 MBytes, 1 memory access
Table 1. Different implementations of a scara-robot arm control.
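The 16 MBytes quoted for the lookup table follow directly from the interface: a full table needs one entry for every combination of input values. A small C check of this arithmetic is sketched below; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed by a full lookup table with one entry for every
 * combination of input values. */
uint64_t lookup_table_bytes(unsigned n_inputs, unsigned bits_per_input,
                            unsigned bytes_per_entry)
{
    assert(n_inputs * bits_per_input < 64);
    return ((uint64_t)1 << (n_inputs * bits_per_input)) * bytes_per_entry;
}
```

With three 8-bit inputs and an 8-bit output, `lookup_table_bytes(3, 8, 1)` gives 2^24 = 16777216 bytes, that is, 16 MBytes. The abacus variants in Table 1 need between 6436 and 12628 bytes, more than a thousand times less, in exchange for a few tens of arithmetic operations per evaluation.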
V. CONCLUSIONS
In this paper, a tool has been presented that implements slow control functions by replacing them with a faster and cheaper approximation. The results obtained are very satisfactory. The designer can quickly explore the space of possible designs, thanks to the flexibility of the tool and its built-in database.

VI. REFERENCES
1. P. A. Laplante, "Real-Time Systems Design and Analysis", IEEE Press, 1993.
2. L. Motus, "Timing Problems and their Handling at System Integration", in Artificial Intelligence in Industrial Decision Making, Control and Automation, Kluwer Academic Publishers, 1995.
3. F. de Cuadra, "El Problema General de la Optimización del Diseño por Ordenador: Aplicación de Técnicas de Ingeniería del Conocimiento", Doctoral thesis, Universidad Pontificia Comillas, Madrid, 1990.
4. G. Hämmerlin, K. Hoffmann, "Numerical Mathematics", Springer-Verlag New York Inc., 1991.
5. Paul Dierckx, "Curve and Surface Fitting with Splines", Oxford University Press, Oxford, 1995.
6. Carl de Boor, "A Practical Guide to Splines", Springer-Verlag, New York, 1978.
7. "Table Look-up and Interpolation on the TMS320C2xx", Texas Instruments, 1996.