Automatic Mapping of OpenCV Based Systems on New Heterogeneous SoCs

Francisco-José Sanchis-Cases, Antonio Martínez-Álvarez, Sergio Cuenca-Asensi
Department of Computer Technology, University of Alicante
Email: [email protected]
Abstract—In this paper, a new approach to facilitate the development of heterogeneous embedded vision systems by means of a software-friendly design flow is proposed. Our work is based on a framework called AMAZynq, especially focused on leveraging the resources of Xilinx Zynq devices. Starting from OpenCV code, the user explicitly defines the HW/SW partition, and AMAZynq produces the corresponding hardware and software infrastructures. The result is a SoC that not only matches the original functionality, but also accelerates it by means of an automatically generated and verified HLS core.

Keywords—Heterogeneous SoCs, Vision processing, High-Level Synthesis
[Fig. 1. AMAZynq Block Diagram and Data Flow. OpenCV code annotated with directives/pragmas enters the AMAZynq Compiler, which generates an HLS Project (Vivado HLS), an EDK Project (.mhs, built from Templates and reference Architectures) whose implementation yields the .bit/.bin files, and a SW Project that is cross-compiled with the ARM Compiler into an .elf using the Linux AXI4 driver API.]
I. INTRODUCTION
Image and video processing is a computationally intensive task with vast applicability and ongoing interest. This processing commonly implies exploiting a heterogeneous architecture (FPGA [1], GPU [2], CPU [3]) by tuning the system to the diverse degrees of inherent parallelism, as well as properly adjusting the data representation for a given application. Although the FPGA offers inherent advantages for implementing such systems, its applicability is limited by two main factors. On the one hand, FPGA design is a time-consuming and expensive task. Although IPs and the new untimed C-to-Hardware compilers [4], [5] facilitate HW design, the learning curve is steep for image processing engineers, and reference designs and computational models are not enough to reduce this knowledge gap. On the other hand, the integration of IP cores within a CPU architecture requires specialized hardware/software skills, especially when DMA transfers and an OS are involved. In this context, our work presents an effort to increase the abstraction level in the development of embedded vision systems, taking advantage of the new generation of C-to-Hardware compilers. Our approach is based on a software-friendly design flow which facilitates the migration from a pure software OpenCV project to the new heterogeneous Xilinx Zynq platform. Furthermore, as a secondary goal, the migration is designed to be carried out by engineers from different domains, not only FPGA engineers.
II. AMAZYNQ FRAMEWORK
The design environment comprises several independent blocks (Fig. 1), which bring scalability and flexibility, allowing the use of third-party HLS OpenCV library contributions. The HLS OpenCV primitives form a HW core library built on the Xilinx Vivado HLS tool. The library can be populated with HW cores from Xilinx or from third parties; currently, two libraries are in use: the Xilinx HLS OpenCV library and our AMAZynq HLS OpenCV library.
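To illustrate the shape of such a primitive, the following minimal sketch shows a width-parameterizable streaming operator written with the standard Vivado HLS types (ap_uint, hls::stream). The operator name az_scale and its interface are our own illustrative assumptions, not taken from either library; the point is that the output width is a template parameter, so a compiler can narrow it below OpenCV's standard widths without touching the primitive's source.

#include <ap_int.h>
#include <hls_stream.h>

// Illustrative width-parameterizable primitive (hypothetical name and
// interface). Requires W_IN >= W_OUT; each pixel is truncated to the
// output width chosen at instantiation time.
template <int W_IN, int W_OUT>
void az_scale(hls::stream< ap_uint<W_IN> > &in,
              hls::stream< ap_uint<W_OUT> > &out,
              int rows, int cols) {
    for (int i = 0; i < rows * cols; ++i) {
#pragma HLS PIPELINE II=1
        ap_uint<W_IN> px = in.read();
        out.write((ap_uint<W_OUT>)(px >> (W_IN - W_OUT)));
    }
}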
The AMAZynq Compiler (AZ Compiler) processes the input code and derives an equivalent HW/SW model. Its outcome depends on user directives and pragmas included in the original OpenCV code. These pragmas allow the specification of the HW components of the system, as well as the control of several low-level parameters such as the data width at different locations of the datapath. The AZ Compiler generates three projects: an HLS Project, an EDK Project and a SW Project. The HLS Project includes the HLS description of a HW accelerator (HW core), the interface specification with the corresponding AXI buses, and the testbench. The AZ Compiler extracts the functions and data dependencies from the original code and builds the equivalent HW datapath using the HLS OpenCV primitives. In addition, it can perform optimizations at different levels. At the operator level, the compiler can apply simplifications to reduce the number of HW primitives. At the architectural level, it can identify cases where resources can be shared to reduce HW usage. Finally, at the data representation level and in accordance with optional user directives, the compiler can adjust the width of the output variable of every OpenCV function, reducing resource usage while still obtaining the required accuracy. OpenCV only defines standard data widths; therefore, to preserve accuracy, variables are usually declared wider than really needed. The HLS OpenCV primitives are not just clones of the original ones, as they are able to deal with different data widths in a transparent manner.
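As an illustration, the following sketch shows how an annotated OpenCV source could look. The pragma spelling (#pragma AZ ...) is hypothetical, since this paper does not reproduce the exact directive grammar; what matters is that each pragma marks a call for HW migration and, optionally, constrains the width of its output variable.

#include <opencv2/opencv.hpp>

int main() {
    cv::VideoCapture cap(0);          // acquisition remains on the CPU in this sketch
    cv::Mat frame, gray, gx, disp;
    cap >> frame;

    #pragma AZ hw                     // hypothetical: migrate this call to a HW core
    cv::cvtColor(frame, gray, CV_BGR2GRAY);

    #pragma AZ hw width(gx, 12)       // hypothetical: narrow the 16-bit output to 12 bits
    cv::Sobel(gray, gx, CV_16S, 1, 0);

    // normalization and display stay in SW
    cv::normalize(gx, disp, 0, 255, cv::NORM_MINMAX, CV_8U);
    cv::imshow("gradient", disp);
    cv::waitKey(0);
    return 0;
}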
As mentioned above, the testbench of the HLS Project is also generated. Using Vivado HLS, the user can automatically verify the functionality of the core against the original OpenCV code using still images, video files or live video acquired by means of a webcam. The C-style simulation performed by Vivado HLS can reveal discrepancies in accuracy, indicating that data widths need to be adjusted in the AZ Compiler phase. Once verified, the core is synthesized by Vivado HLS, producing an IP core ready for integration into the EDK Project by means of the corresponding bus adapters and interfaces. These interfaces are automatically declared and configured by the AZ Compiler. For this task, the compiler knows the options that the specific Reference Architecture (RA) offers among the AXI buses, but it also has to figure out the number and nature of the data transactions between HW and SW. The AZ Compiler identifies each parameter of every OpenCV function and classifies it among video streams, compile-time (ct) parameters and run-time (rt) parameters. Ct parameters are usually constants that can be used to guide the synthesis of the OpenCV primitives, so they do not need to be transmitted during system operation. Rt parameters, on the contrary, can be used to modify the processes carried out by the HW core at run time. In any case, this information has to be known by the AZ Compiler to take the correct decisions. Currently, the AMAZynq framework incorporates a mechanism for making this information accessible to the compiler; it also allows the information to be updated when the library is updated or when new HLS primitives are contributed by users or third parties.
The EDK Project is defined by means of a Xilinx Hardware Specification file (.mhs). This file is a product of the compiler, which makes use of a specific RA. It includes the instantiation of the HW core synthesized from the HLS Project and its support infrastructure, such as the VideoDMA, AXI buses, camera interface, etc. A database of RAs allows the user to choose among different frameworks to fit the system. The choice depends on the system specification and requirements. For example, image acquisition could be resolved by attaching the camera to the CPU or directly to the HW accelerator; the video stream could be redirected from the HW core to the CPU for further processing or sent to a monitor controller for display; the system could even need several cameras working concurrently. Each RA has to include all the corresponding HW and SW infrastructures. Currently, Xilinx provides several RAs, and the database is prepared to be enriched with the contributions of third parties. Ideally, the user has minimal interaction with the EDK tool to generate the bitstream for the FPGA.
The SW Project comprises the SW side of the original OpenCV code plus calls to functions and procedures that handle the data transactions between SW and HW. This SW is ARM compatible and ready to run on Linux. To build the SW Project, the AZ Compiler takes advantage of software libraries and Templates.
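The generated SW side could then take a shape similar to the following sketch. The az_* routines and the device path are illustrative placeholders for the transaction-handling calls mentioned above, which this paper does not name; only the OpenCV calls are real API.

// Hypothetical shape of the generated SW side (az_* names are assumptions).
az_core *core = az_core_open("/dev/az_core");     // driver handle (assumed)

cv::VideoCapture cap(0);
cv::Mat frame, result;
cap >> frame;
result.create(frame.size(), CV_8UC1);

// rt parameter written to the core, e.g. through an AXI4-Lite register (assumed)
az_core_set_param(core, "k", 0.04f);

// frame streamed to the HW core and the processed stream collected back,
// e.g. through the VideoDMA (assumed interface)
az_vdma_send(core, frame.data, frame.total() * frame.elemSize());
az_vdma_recv(core, result.data, result.total() * result.elemSize());

// post-processing left in SW, as in the use case below
cv::normalize(result, result, 0, 255, cv::NORM_MINMAX, CV_8U);
cv::imshow("output", result);
cv::waitKey(0);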
III. USE CASE

In order to assess the functionality of the framework, a case study was performed using the Harris Corner detector [6]. Harris Corner is a medium-complexity algorithm that has been widely used in different Computer Vision systems [7]. Although the Harris Corner filter is included as a native top-level function in OpenCV, its definition in terms of the low-level primitives (convolutions, color space conversions, data normalization, etc.) was used to test the tool; a sketch of this decomposition is given after Table I. In this case study, most of the OpenCV functions were marked for migration to HW (cv::{Sobel, cvtColor, boxFilter, calcHarris, Filter2D}); only the normalization function and the display routines were left in charge of the CPU. Table I shows the resource usage of the HW accelerator generated by the AZ Compiler using two different HLS OpenCV primitive libraries: the Xilinx library and our AMAZynq library. As can be seen, there is a significant difference in block RAM usage, due to the fact that the AMAZynq library is designed to leverage the optimizations of the AZ Compiler. The final SoC was tested on the ZedBoard platform.

TABLE I. HW IMPLEMENTATION (EDK REPORT)

                                     BRAM_18K  DSP48E    FF    LUT  SLICE
Harris Lib1 (Xilinx HLS OpenCV)            18      40  6259   4168   4843
Harris Lib2 (AMAZynq HLS OpenCV)            7      52  8200   6516   4638
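For reference, the primitive-level decomposition used in this case study can be sketched in plain OpenCV as follows; kernel sizes and the sensitivity k are illustrative values, not parameters reported above.

#include <opencv2/opencv.hpp>

// Harris response built from the low-level primitives instead of the
// native top-level call: gradients, structure-tensor smoothing, response.
cv::Mat harrisFromPrimitives(const cv::Mat &bgr, double k = 0.04) {
    cv::Mat gray, ix, iy;
    cv::cvtColor(bgr, gray, CV_BGR2GRAY);          // color space conversion
    cv::Sobel(gray, ix, CV_32F, 1, 0, 3);          // horizontal gradient
    cv::Sobel(gray, iy, CV_32F, 0, 1, 3);          // vertical gradient

    cv::Mat ixx, iyy, ixy;                         // smoothed structure-tensor terms
    cv::boxFilter(ix.mul(ix), ixx, CV_32F, cv::Size(5, 5));
    cv::boxFilter(iy.mul(iy), iyy, CV_32F, cv::Size(5, 5));
    cv::boxFilter(ix.mul(iy), ixy, CV_32F, cv::Size(5, 5));

    cv::Mat det = ixx.mul(iyy) - ixy.mul(ixy);     // det(M)
    cv::Mat tr  = ixx + iyy;                       // trace(M)
    cv::Mat response = det - k * tr.mul(tr);       // R = det(M) - k*trace(M)^2

    cv::Mat out;                                   // normalization stays on the CPU
    cv::normalize(response, out, 0, 255, cv::NORM_MINMAX, CV_8U);
    return out;
}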
IV. CONCLUSION
This paper has presented a compiler-based framework for the semi-automatic synthesis and implementation of embedded vision systems on Xilinx Zynq platforms. The design entry is pure OpenCV code, slightly modified by means of pragmas and user directives. As future work, we want to extend the compiler to test and calculate the error in SW simulation; the tool will then be able to launch a battery of syntheses and choose the fittest one depending on the use case. The compiler also offers the possibility of including different strategies to automatically explore the design space in search of different trade-offs.

ACKNOWLEDGMENT

This work has been funded by the 2010 Research National Plan in Spain of the Ministry of Science and Innovation under project TEC2010-22095-C03-01. We also thank the Xilinx University Program and Juanjo Noguera from Xilinx Research Labs Ireland for their support and encouragement.

REFERENCES
[1] M. Bin Mohamed Shukor, L. H. Hiung, and P. Sebastian, "Implementation of real-time simple edge detection on FPGA," in Intelligent and Advanced Systems (ICIAS 2007), International Conference on, 2007, pp. 1404–1406.
[2] M. Marengoni and D. Stringhini, "High level computer vision using OpenCV," in Graphics, Patterns and Images Tutorials (SIBGRAPI-T), 2011 24th SIBGRAPI Conference on, 2011, pp. 11–24.
[3] S. Sankaraiah and R. Deepthi, "Highly optimized OpenCV based cell phone," in Sustainable Utilization and Development in Engineering and Technology (STUDENT), 2011 IEEE Conference on, 2011, pp. 47–52.
[4] Xilinx, "Vivado high-level synthesis." [Online]. Available: http://www.xilinx.com/products/design-tools/vivado/index.htm
[5] Impulse Accelerated Technologies, "Accelerate software algorithms on FPGAs." [Online]. Available: http://www.impulseaccelerated.com/
[6] C. Harris and M. Stephens, "A combined corner and edge detector," in Alvey Vision Conference, vol. 15, Manchester, UK, 1988, p. 50.
[7] P.-Y. Hsiao, C.-L. Lu, and L.-C. Fu, "Multilayered image processing for multiscale Harris corner detection in digital realization," Industrial Electronics, IEEE Transactions on, vol. 57, no. 5, pp. 1799–1805, 2010.