Library of Adaptive Regular Arrays for Convolution-like Computations

Toomas P. Plaks
London South Bank University

April 25, 2003

Introduction

Recent years have shown increasing interest in configurable, or adaptive, computing [13]. This approach aims at implementing algorithms in a computational space consisting of a huge number of elementary computing cells, which can be configured, or adapted, for solving a given problem. In this case, we say that an algorithm is implemented in space and time, in contrast to conventional computers, where an algorithm is implemented in time, i.e. is carried out step by step on a single processor. The adaptive computing approach integrates the flexibility of programming conventional computers with the efficiency of dedicated hardware devices on ASICs. This new computing platform demands new design tools and methods [5, 7, 12].

One way to improve the efficiency of designers is to develop libraries of programs for adaptive computing devices [6]. The adaptive computing platform places new requirements on such libraries. For example, a conventional program module for matrix multiplication is parameterized with respect to the problem size, i.e. the size of the matrices. For adaptive computing, new parameters must be added. In different applications, the matrix multiplication program needs to be implemented with different performances [18]; thus, parameters describing the performance and the available area should be added. The performance depends on the number of elementary computing cells allocated to the program, i.e. on the size of the space on the adaptive computing platform that is assigned to the particular program at a particular instance. Utilizing a different number of computing cells requires a different degree of parallelism in the implementation of a given program.

In general, program modules for an adaptive computing platform must be adaptable to fit the space-time restrictions of a particular application and instance of use. For example, Fig. 1 represents an adaptive computational space where a program module A is implemented using different space-time configurations. Configurations A1 and A2 have the same number of inputs/outputs and the same performance, but different space shapes and different input/output positioning. Configurations A3 and A4 have an increased, but different, performance achieved by an increased number of inputs/outputs, which have different positioning.

[Fig. 1. Configuring of a program module A: four configurations A1-A4 with different space shapes, input/output positions and performance.]

The current approaches to developing libraries for reconfigurable computing platforms introduce modules that are parameterized with respect to the problem size and word size parameters [6, 7]. Conventionally, an algorithm representation uses a sequential approach, which often makes it unsuitable for direct and efficient implementation in space-time. Thus, the algorithm must be converted into a suitable form, i.e. a form of space and time. The theory of mapping algorithms into space-time is known as the theory of regular processor arrays [19, 20, 21]. The main aim of this theory is minimizing the time or the number of processors required for solving a given problem. The same theory can be used and extended for adapting algorithms to different space-time shapes. One approach to configuring algorithms and regular processor arrays is the iso-plane method [17].
The basic idea of this method is to increase the parallelism in the problem representation, which, together with algebraic transformations, yields so-called piecewise regular arrays with improved performance and topological structure in comparison with conventional regular processor arrays. Using the iso-plane method, different topologies (space shapes) for the same problem can be synthesized automatically at the algorithm level [15]. Such configuring of algorithms can be efficiently applied to the class of regular iterative algorithms, which represent an important class of algorithms in signal and image processing applications [8, 19].
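To fix ideas, the following is a minimal Java sketch of 1-D convolution (FIR filtering) written as a regular iterative loop nest; the class and method names are illustrative and are not part of the library itself. Every iteration (i, j) performs the same multiply-accumulate, and all inter-iteration dependencies are constant vectors, which is exactly the property that space-time mapping exploits.

    /** 1-D convolution y_i = sum_j w_j * x_{i-j} as a nested for loop. */
    public class Convolution {
        static double[] convolve(double[] x, double[] w) {
            int n = x.length, k = w.length;
            double[] y = new double[n];
            for (int i = 0; i < n; i++) {               // output sample index
                for (int j = 0; j < k && j <= i; j++) { // filter tap index
                    y[i] += w[j] * x[i - j];            // uniform dependence structure
                }
            }
            return y;
        }
    }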
Configuring Technique

As a theoretical basis, we use regular array theory, which represents an algorithm using the so-called polytope model [11]. In this model, the dependencies between computations are represented as vectors in an n-dimensional space, where n is the number of for loops used to describe the algorithm. Different regular array descriptions are given by different mappings of the dependence vectors into a new coordinate system that can be treated as space and time; hence, space-time mapping. Finding an optimal mapping requires solving an integer linear programming problem, which is a time-consuming task.

To provide a large variety of arrays with different performance and shape, we bring together three sources: (1) classical regular array theory [9], which provides the theoretical model, (2) the iso-plane method [17], to increase the parallelism, and (3) multidimensional time [3, 4], to increase the descriptive power of the polytope model. As a result, the range of different array structures for a given problem is increased, and so is the adaptability of a module to particular application requirements. The higher degree of parallelism serves two aims: to increase the performance, or to decrease the power consumption by using lower clock rates.

The description of an adaptive module includes previously found space-time mappings, so reconfiguring does not require solving integer linear programming problems, which is inevitable when using regular array design tools such as MMAlpha [1], LooPo [10] or SPADE [14]. Thus, configuring of program modules does not require a considerable amount of computation and can be done at run time [2]. It should also be mentioned that not all array structures can be synthesized automatically. At the same time, such arrays can be developed by the designer following the basic ideas of regular array theory and can be described using the space-time mapping approach. Thus, we can increase the range of regular arrays beyond what may be synthesized using conventional tools.
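As a standard worked example of such a mapping (a textbook-style illustration, not a derivation taken from the library itself), consider the 1-D convolution loop nest above, with index space \(\{(i,j) : 0 \le i < N,\ 0 \le j < K\}\) and constant dependence vectors \(d_y = (0,1)^T\) (partial-sum accumulation), \(d_w = (1,0)^T\) (resident weights) and \(d_x = (1,1)^T\) (travelling inputs). One admissible affine space-time mapping is
\[
\begin{pmatrix} t \\ p \end{pmatrix}
= \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}
  \begin{pmatrix} i \\ j \end{pmatrix},
\qquad \text{i.e.}\quad t = i + j,\; p = j .
\]
The schedule vector \(\lambda = (1\ 1)\) satisfies the causality condition \(\lambda d > 0\) for all three dependence vectors, and the allocation \(p = j\) yields the classical 1-dimensional systolic array of \(K\) processing elements. Different choices of the mapping matrix for the same loop nest yield arrays with other shapes and performances.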
Adaptive Module for Convolution-like Computations

Convolution-like computations constitute an important class of algorithms in signal and image processing, including filtering, interpolation, correlation, etc. All these computations have a similar computational structure, i.e. the same polytope model, or they can be transformed into the same model by an affine transformation. The only difference is in the operations performed during execution.

An Adaptive Module for convolution-like computations (we consider here only the case of 1-D convolution) consists of two parts: (1) a scheme, which describes the possible regular structures of the algorithm, and (2) an interpretation, which describes the operations to be performed by the PEs. The scheme (or structure) is the part of the module used for implementing the space-time mappings and for configuring a nested for loop algorithm onto the regular array structures. The scheme of an Adaptive Module itself consists of two parts: (1) the source description and (2) the reconfiguration information. The interpretation determines the particular algorithm, i.e. correlation, filtering, etc. An Adaptive Module may be implemented (1) fully on an adaptive computational space, i.e. fully in parallel, (2) on a microprocessor, i.e. fully sequentially, or (3) partially on a computational space and partially on a microprocessor.

An Adaptive Module is parameterized with respect to three groups of parameters: (1) the problem parameters (type of operators, problem size and word size), (2) the real-time parameters (throughput rate and latency time), and (3) the hardware parameters (the size and shape of the available computational space, together with input/output positioning). Before an Adaptive Module is loaded into a computational space, a configuration is chosen and the corresponding regular structure is generated, as sketched at the end of this section.

In this paper we present a method for the simple and efficient description and generation of a wide range of different array structures. These structures are implemented in the Java language. The paper considers a family of scalable processor array structures (1- and 2-dimensional structures with different input/output positioning and performance) for convolution-like computations, developed using the iso-plane method [16, 17]. For example, a 2-dimensional array for 1-dimensional convolution may have a shorter latency than the classical 1-dimensional array, which is important when the number of filter taps is high. Alternatively, a 2-dimensional array may have parallel inputs/outputs, which improves the overall throughput. These two properties can also be combined, resulting in both a short latency and a high performance. In these arrays, the degree of parallelism (i.e. the number of processing elements, or the occupied silicon area) is higher than in the classical arrays, which makes it possible to reduce the clock frequency and thus lower the power consumption.
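The following is a hypothetical Java sketch (all names and figures are illustrative, not the library's actual API) of how such a module can carry its reconfiguration information: the scheme stores previously found space-time mappings together with their performance and area characteristics, and configuring reduces to selecting a stored mapping against the real-time and hardware parameters, with no integer linear program solved at run time.

    import java.util.Arrays;
    import java.util.List;

    public class ConvolutionModule {

        /** One precomputed space-time mapping, part of the module's scheme. */
        static final class Mapping {
            final int dims, latency, throughput, rows, cols;
            Mapping(int dims, int latency, int throughput, int rows, int cols) {
                this.dims = dims; this.latency = latency;
                this.throughput = throughput; this.rows = rows; this.cols = cols;
            }
        }

        // Illustrative entries: a classical 1-D array and a 2-D array with
        // parallel I/O (shorter latency, higher throughput, larger area).
        private final List<Mapping> scheme = Arrays.asList(
            new Mapping(1, 32, 1, 1, 32),
            new Mapping(2, 12, 4, 4, 8));

        /** Select a stored configuration that meets the real-time and
         *  hardware constraints; returns null if none fits, in which case
         *  the module falls back to a sequential implementation. */
        Mapping configure(int maxLatency, int minThroughput,
                          int availRows, int availCols) {
            for (Mapping m : scheme) {
                if (m.latency <= maxLatency && m.throughput >= minThroughput
                        && m.rows <= availRows && m.cols <= availCols) {
                    return m;
                }
            }
            return null;
        }
    }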
References

[1] Alpha. http://www.irisa.fr/EXTERNE/projet/api/ALPHA/welcome.html.
[2] R. Bittner and P. Athanas. Wormhole run-time reconfiguration. In FPGA '97: ACM/SIGDA International Symposium on Field Programmable Gate Arrays. ACM, 1997.

[3] P. Feautrier. Some efficient solutions to the affine scheduling problem. Part I. One-dimensional time. Int. J. Parallel Programming, 21(5):313–347, 1992.

[4] P. Feautrier. Some efficient solutions to the affine scheduling problem. Part II. Multidimensional time. Int. J. Parallel Programming, 21(6):389–420, 1992.

[5] J. Hammes, R. Rinker, W. Böhm, and W. Najjar. Cameron: High-level language compilation for reconfigurable systems. In PACT'99, 1999.

[6] P. James-Roxby. Designing application-specific cores using JBits: A run-time parametrizable FIR filter. In J. Schewel, P. M. Athanas, P. B. James-Roxby, and J. T. McHenry, editors, Reconfigurable Technology: FPGAs and Reconfigurable Processors for Computing and Communication III, Proc. SPIE Vol. 4525, pages 18–26, 2001.

[7] A. Koch. Compilation for adaptive computing systems using complex parameterized hardware objects. J. of Supercomputing, 21(2):179–190, February 2002.

[8] S. Y. Kung. VLSI Array Processors. Prentice Hall, Englewood Cliffs, N.J., USA, 1988.

[9] D. Lavenier, P. Quinton, and S. Rajopadhye. Advanced systolic design. In K. K. Parhi and T. Nishitani, editors, Digital Signal Processing for Multimedia Systems, Signal Processing Series, chapter 23, pages 657–692. Marcel Dekker, 1999.

[10] Lehrstuhl für Programmierung, Universität Passau. The polyhedral loop parallelizer: LooPo. http://www.fmi.uni-passau.de/loopo/.
[11] C. Lengauer. Loop parallelization in the polytope model. In E. Best, editor, CONCUR'93, Lecture Notes in Computer Science 715, pages 398–416. Springer-Verlag, 1993.

[12] W. Luk and S. McKeever. Pebble: A language for parameterised and reconfigurable hardware design. In Field-Programmable Logic and Applications, LNCS 1482, pages 9–18. Springer-Verlag, 1998.

[13] W. H. Mangione-Smith, B. Hutchings, D. Andrews, et al. Seeking solutions in configurable computing. Computer, 30(12):38–43, December 1997.

[14] J. G. Nash. Automatic generation of systolic array designs for reconfigurable computing. In T. P. Plaks and P. M. Athanas, editors, Engineering of Reconfigurable Systems and Algorithms, ERSA'02. Proc. of the International Conference, Las Vegas, Nevada, USA, June 24–27, 2002, pages 176–182. CSREA Press, 2002.

[15] T. P. Plaks. Formal derivation of multilayered hardware/software structures. In S. Liu, J. A. McDermid, and M. G. Hinchey, editors, Proc. ICFEM 2000, Third IEEE International Conference on Formal Engineering Methods, York, England, UK, 4–6 Sept. 2000, pages 5–13. IEEE Computer Society Press, 2000.

[16] T. P. Plaks. Spatially reconfigurable module for FIR filters. In J. Schewel, P. M. Athanas, P. B. James-Roxby, and J. T. McHenry, editors, Reconfigurable Technology: FPGAs and Reconfigurable Processors for Computing and Communication III, Proc. SPIE Vol. 4525, pages 107–115, 2001.

[17] T. P. Plaks. Configuring of algorithms in mapping into hardware. Journal of Supercomputing, 21(2):161–177, February 2002.

[18] QuickSilver Technology. Chapter 2: Requirements for adaptive computing. Homepage: http://www.quicksilvertech.com.
[19] P. Quinton and Y. Robert. Systolic Algorithms and Architectures. Prentice Hall / Masson, 1991.

[20] S. V. Rajopadhye and R. Fujimoto. Synthesizing systolic arrays from recurrence equations. Parallel Computing, 14(2):163–189, June 1990.

[21] J. Teich and L. Thiele. Partitioning of processor arrays: A piecewise regular approach. INTEGRATION, 14(3):297–332, 1993.