Journal of Circuits, Systems, and Computers © World Scientific Publishing Company
A FPGA CORE GENERATOR FOR EMBEDDED CLASSIFICATION SYSTEMS
DAVIDE ANGUITA, LUCA CARLINO, ALESSANDRO GHIO∗, SANDRO RIDELLA Dept. of Biophysical and Electronic Engineering, University of Genova, Via Opera Pia 11a Genova, I-16145, Italy
We describe a Core Generator for Pattern Recognition tasks. According to user requirements, the tool generates the hardware description of a digital architecture that implements a Support Vector Machine, one of the current state–of–the–art algorithms for pattern recognition. The output of the Core Generator is a hardware core description in a high–level language, suitable for mapping onto a reconfigurable device such as a Field Programmable Gate Array (FPGA). As an example of the use of our tool, we compare different solutions by targeting several reconfigurable devices, and we implement the recognition part of a machine vision system for automotive applications.
1. Introduction
A desirable goal in building more effective embedded systems for real–world applications is to increase on–board intelligence through, for example, Pattern Recognition or Machine Learning algorithms 1. These algorithms require a significant amount of computational power, which is not always available on embedded systems because of their severe resource constraints (e.g. size, dissipation, etc.). One way to overcome this problem is to reduce the complexity of the algorithms at the expense of their accuracy 2,3. An alternative is to provide sufficient computational capability by supporting the main processing unit with a special–purpose co–processor, which is, in general, more efficient than a general–purpose one. A good candidate for this task is a Field Programmable Gate Array (FPGA), which can be easily configured to implement the desired computation: the main processing unit deals with most of the work, while the FPGA is dedicated to the resource–consuming algorithm. Several examples of this approach have recently appeared in the literature, thanks to the development, in recent years, of high–performance FPGAs and the corresponding programming environments: they target robot control 4, digital signal processing 5, DNA sequencing 6, etc.
∗ Corresponding author. E-mail: [email protected].
Among the algorithms that are of interest for embedding recognition capabilities in a processing system, we focus our attention on the Support Vector Machine (SVM) 7,8. The SVM was developed mainly for pattern recognition tasks and can be seen as a next–generation Artificial Neural Network (ANN). In particular, it resembles the Radial Basis Function (RBF) and Multi–Layer Perceptron (MLP) networks 9,10, but gives up their biological plausibility in favour of solid statistical foundations 11. The algorithm builds a classifier during a training phase, using a set of labeled patterns (the training set); then, the parameters of the trained classifier are frozen and the classifier is used to predict the most plausible class of any new pattern (the on–line or feedforward phase). Although some solutions have been proposed in the literature to perform the training phase on the device itself 12, this step is usually performed off–line on a conventional computer, and the trained classifier is then downloaded to the embedded system 13,14,15. While the original SVM has mostly been implemented on general–purpose computers 16, several variants have been proposed to target special–purpose digital architectures, thereby allowing its realization on FPGAs and embedded systems 17,18. Unfortunately, the hardware implementation of an SVM is not straightforward, even when only the feedforward phase is implemented: the user faces several options and can choose between different architectures, each with its own advantages and drawbacks. Depending on the user requirements and the characteristics of the target device, the architecture must be designed to guarantee the best trade–off between resource utilization (memory, logic gates, adders, embedded multipliers, etc.) and performance (reliability of the output, throughput, maximum clock frequency, latency, etc.).
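The feedforward phase described above, in which the parameters of the trained classifier are frozen and only used for prediction, can be illustrated with a minimal software sketch. It assumes a two–class SVM with a Gaussian (RBF) kernel; the function names, the kernel choice and the numerical values of the frozen parameters are hypothetical and serve only as an illustration, not as the architecture generated by the tool.

```python
import math

def rbf_kernel(u, v, gamma):
    """Gaussian (RBF) kernel; the kernel choice is an assumption of this sketch."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)

def svm_feedforward(x, support_vectors, labels, alphas, gamma, bias=0.0):
    """Classify pattern x using parameters frozen after an off-line training phase."""
    score = bias
    for sv, y, alpha in zip(support_vectors, labels, alphas):
        # Each support vector contributes its weighted similarity to x.
        score += alpha * y * rbf_kernel(sv, x, gamma)
    # The sign of the accumulated score selects one of the two classes.
    return 1 if score >= 0 else -1

# Hypothetical frozen parameters, as if downloaded from a conventional computer.
SUPPORT_VECTORS = [(0.0, 0.0), (1.0, 1.0)]
LABELS = [-1, 1]
ALPHAS = [1.0, 1.0]
GAMMA = 1.0

print(svm_feedforward((0.9, 0.9), SUPPORT_VECTORS, LABELS, ALPHAS, GAMMA))
print(svm_feedforward((0.1, 0.1), SUPPORT_VECTORS, LABELS, ALPHAS, GAMMA))
```

The loop over the support vectors is a sequence of multiply–accumulate operations. In hardware, these are the operations that compete for adders, multipliers and memory, which is where the trade–offs listed above arise.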
For designing the hardware architecture of an SVM, a tool that generates an optimized hardware core, according to the user requirements, can be particularly useful. The idea of using a tool for hardware design and optimization, starting from an algorithmic description, was quite successful in the past, when Silicon Compilers 19,20 were developed to automatically generate the layout of an integrated circuit, taking user specifications as inputs. Core Generators and Hardware Compilers 21,22,23 represent the modern evolution of Silicon Compilers. The user can customize them through a Graphical User Interface (GUI) or high–level source code, and the output consists of the behavioural or structural description of the architecture in a high–level Hardware Description Language (e.g. VHDL or Verilog). Our proposal belongs to this framework and allows application developers to easily include in their design a state–of–the–art pattern recognition module, optimized according to their needs and system constraints. In particular, the main objective of this work is to describe a Core Generator for SVMs or, in other words, a tool that, according to user requirements, generates an optimized hardware description of a digital architecture, which targets a current–generation FPGA and implements the feedforward phase of a trained SVM.
This paper is organized as follows: first, we briefly review a hardware–friendly version of the SVM, that is, a reformulation of the SVM using fixed–point arithmetic, which is better suited to resource–constrained systems where floating–point units are usually avoided. Then, we propose several digital architectures that can be exploited for implementing the SVM: each architecture is optimal according to some criteria, and the main purpose of the Core Generator is to choose the best one with respect to the application, the user's requirements and the main characteristics of the target device. Finally, we show the use of the Core Generator by targeting four different FPGAs, from 300K up to 4M equivalent logic gates. Both the performance and the resource requirements for implementing the SVM core are detailed and compared, along with the best architecture selected by the Core Generator.
2. A Support Vector Machine for digital hardware
The algorithm targeted by our analysis is the homogeneous SVM: this version is theoretically equivalent to the conventional one 24 and is more amenable to hardware implementation 18. Let us suppose that the dataset, which must be learned by the SVM, is composed of l patterns {x1, ..., xl}, where xi ∈