2012 4th International Conference on Intelligent and Advanced Systems (ICIAS2012)
FPGA-based Hardware Implementation of Optical Flow Constraint Equation of Horn and Schunck
Ruzali Rustam, Nor Hisham Hamid and Fawnizu Azmadi Hussin
Centre for Intelligent Signal and Imaging Research, Electrical and Electronic Engineering Department
Universiti Teknologi Petronas, Bandar Seri Iskandar, 31750 Tronoh, Perak, Malaysia.
[email protected], {hishmid, fawnizu}@petronas.com.my
Abstract - In hardware implementation, the same algorithm can be mapped onto different hardware architectures, and these architectures often differ in the number representation they use. In this work, two hardware architectures of the optical flow constraint equation of Horn and Schunck (OFCE-HS) are presented and compared. The first architecture (OFCE-HS MZ) is a previous design that uses integer representation throughout. The second architecture (OFCE-HS RH) is our design, which combines integer and fractional representations. The hardware designs are developed with Xilinx System Generator using a HW-SW co-simulation scheme. The results show that the proposed architecture performs better than the previous one: it reduces both noise and hardware resource usage.
Keywords - hardware architecture, optical flow, OFCE-HS MZ, OFCE-HS RH, integer, fraction, HW-SW co-simulation, loop process, Xilinx System Generator (XSG), FPGA.
I. INTRODUCTION
Fixed-point integer and fractional representations are more commonly used in hardware implementations than single- or double-precision floating-point representations, because they reduce hardware resources and improve speed. However, when the same complex algorithm is implemented with different representations (i.e. integer or fraction), the final results differ slightly because of bit-truncation errors. In some cases these differences are unimportant or even invisible. In digital image and video processing, however, they are easily observed, and they often affect both noise and resource usage. The objective of this work is to show the effect of using different hardware architectures for the optical flow constraint equation of Horn and Schunck (OFCE-HS) [1]. The previous hardware architecture [2-4] implemented OFCE-HS using integer representation only; consequently, it produces more noise and uses more resources. To overcome this problem, this work uses the architecture proposed in [5], which combines integer and fractional representations instead of using integers only. The rest of this paper is organized as follows. Section II reviews related work. Section III presents the methodology in three subsections: (A) Xilinx System Generator, (B) the hardware design of OFCE-HS, and (C) the hardware design of the CORE. Section IV describes the experimental setup, Section V presents the results and discussion, and Section VI concludes the paper.
978-1-4577-1967-7/12/$26.00 ©2011 IEEE
II. RELATED WORK
The optical flow constraint equation (OFCE) is an algorithm proposed by Horn and Schunck (HS) in 1981 [1]. According to the evaluation in [6], OFCE-HS was the first technique for estimating optical flow and is one of the best. Hardware implementations of OFCE-HS have been reported by Martin et al. [2-4] and Cobos et al. [7-8]; of these, the work of Martin et al. performs better. This paper continues our previous work [5], which aims to improve on the performance of the design proposed by Martin et al. Two architectures are presented and compared: the OFCE-HS of Martin, Zuloaga et al. (referred to as OFCE-HS MZ) and our own (OFCE-HS RH), shown in Fig. 1 and Fig. 2, respectively.
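For reference, the optical flow constraint equation and the iterative update obtained by combining it with the smoothness constraint, in the standard form given by Horn and Schunck [1], are reproduced below. Here E_x, E_y and E_t denote the spatio-temporal brightness gradients, (u, v) the velocities, \bar{u} and \bar{v} local averages of the velocities, \alpha^2 the smoothness weighting term and n the iteration index. The symbols follow [1] and are not taken from the block diagrams of Fig. 1 and Fig. 2.

```latex
\begin{aligned}
&E_x u + E_y v + E_t = 0, \\
&u^{n+1} = \bar{u}^{n} - \frac{E_x \left( E_x \bar{u}^{n} + E_y \bar{v}^{n} + E_t \right)}{\alpha^{2} + E_x^{2} + E_y^{2}}, \\
&v^{n+1} = \bar{v}^{n} - \frac{E_y \left( E_x \bar{u}^{n} + E_y \bar{v}^{n} + E_t \right)}{\alpha^{2} + E_x^{2} + E_y^{2}}.
\end{aligned}
```

Both updates share the quotient (E_x \bar{u} + E_y \bar{v} + E_t)/(\alpha^2 + E_x^2 + E_y^2), so a datapath may either form this quotient once or divide the two full products separately.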
Figure 1. Previous OFCE-HS MZ Architecture in [2-4].
Figure 2. Proposed OFCE-HS RH Architecture [5].
The fundamental difference between the two designs is that OFCE-HS MZ uses two dividers (Fig. 1), whereas OFCE-HS RH uses only one divider (Fig. 2). OFCE-HS MZ uses two dividers to avoid the bit-truncation error of its integer divider. The intention of this work is to reduce the two division operations to one and to reduce the noise that appears in the previous work. The proposed architecture therefore has only one division operation, as illustrated by the naïve dataflow diagram in Fig. 2a. This is achieved by combining integer and fractional arithmetic, as shown by the datapath in Fig. 2b.
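The role of divider placement can be illustrated in software. The sketch below is a hypothetical integer model, not the datapath of Fig. 1 or Fig. 2. It assumes that the velocity correction of the Horn-Schunck update [1] is E_x·P/D for u and E_y·P/D for v, with P = E_x ū + E_y v̄ + E_t and D = α² + E_x² + E_y², and compares three options: one integer divider forming P/D first, two integer dividers dividing each full product (the strategy attributed to OFCE-HS MZ above), and one divider with a fixed-point fractional output (the strategy of OFCE-HS RH). The helper names, the 8-bit fractional width FRAC_BITS and truncation toward zero are illustrative assumptions.

```python
FRAC_BITS = 8            # assumed fractional width of the divider output
SCALE = 1 << FRAC_BITS   # 2**FRAC_BITS


def trunc_div(a: int, b: int) -> int:
    """Integer division truncated toward zero (assumed divider behaviour)."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def hs_terms(ex, ey, et, u_bar, v_bar, alpha2):
    """Shared numerator P and denominator D of the Horn-Schunck update."""
    p = ex * u_bar + ey * v_bar + et
    d = alpha2 + ex * ex + ey * ey
    return p, d


# Each function returns the corrections subtracted from (u_bar, v_bar).
def correction_single_integer_divider(ex, ey, p, d):
    """One integer divider on P/D: the ratio is truncated before multiplying."""
    ratio = trunc_div(p, d)
    return ex * ratio, ey * ratio


def correction_two_dividers(ex, ey, p, d):
    """Two integer dividers: multiply first, then divide each product."""
    return trunc_div(ex * p, d), trunc_div(ey * p, d)


def correction_one_divider(ex, ey, p, d):
    """One divider with a fixed-point fractional output, shared by u and v."""
    ratio_q = trunc_div(p * SCALE, d)  # P/D with FRAC_BITS fractional bits
    # Convert back to real values only so the three options can be compared.
    return ex * ratio_q / SCALE, ey * ratio_q / SCALE


if __name__ == "__main__":
    p, d = hs_terms(ex=40, ey=10, et=300, u_bar=0, v_bar=0, alpha2=100)
    print(correction_single_integer_divider(40, 10, p, d))
    print(correction_two_dividers(40, 10, p, d))
    print(correction_one_divider(40, 10, p, d))
    print(40 * p / d, 10 * p / d)  # exact corrections for reference
```

With E_x = 40, E_y = 10, E_t = 300, ū = v̄ = 0 and α² = 100, the exact corrections are about 6.667 and 1.667. A single integer divider truncates P/D to zero and loses the correction entirely, two integer dividers return (6, 1), and the one fractional divider returns (6.5625, 1.640625), retaining most of the sub-integer part with a single division.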
III. METHODOLOGY
A. Xilinx System Generator
Xilinx System Generator (XSG) [9] is a system-level modeling tool that enables hardware design in the model-based Simulink design environment from MathWorks. In other words, XSG runs as an application within Simulink (an extended Simulink environment) and provides a modeling environment that is well suited to hardware design. This allows hardware designers to design and simulate their entire hardware within the extended Simulink environment, using software (SW) co-simulation and hardware-software (HW-SW) co-simulation. SW co-simulation uses Xilinx ISE [10] as the underlying software, together with Matlab/Simulink [11], to validate and evaluate the designs. In addition to both software packages, HW-SW co-simulation requires a target hardware platform onto which the design is downloaded, i.e. the XtremeDSP Starter Platform with a Spartan-3A DSP 1800A FPGA [12-14], together with the Platform Cable USB II [15-16].
In this work, the design methodology with SW and HW-SW co-simulation proceeds as follows:
1) The hardware designs are first modeled using the Xilinx blockset [17] and placed within the FPGA boundary [18].
2) To validate and evaluate its reliability, each hardware design is placed, as a tested design, in a validation system environment and simulated in SW co-simulation mode. The validation system (i.e. a high-level testbench abstraction) consists of other designs modeled with the Simulink blockset [11] that together build a comprehensive system environment.
3) After simulation, the hardware design is implemented on the target hardware platform to measure its performance, i.e. speed (timing analysis) and resource usage, using the Timing and Power tool [18].
4) Finally, the hardware design is compiled using the hardware co-simulation tool [18] to obtain an FPGA bitstream block, which is used to test the hardware design running in the HW-SW co-simulation system.
Figure 3. Hardware Design of OFCE-HS MZ using Xilinx Blocksets of XSG
Figure 4. Hardware Design of OFCE-HS RH using Xilinx Blocksets of XSG
B. Hardware Design of OFCE-HS
Two hardware architectures of OFCE-HS are designed and tested: OFCE-HS MZ, proposed by Martin et al. [2-4], and OFCE-HS RH, proposed in this work. Their hardware designs are shown in Fig. 3 and Fig. 4, respectively, and are built from the basic math blocks of the Xilinx blockset. In addition, each design includes a Buffer OFCE-HS and a Weight Factor block, marked by dashed lines in Fig. 3 and Fig. 4; their detailed designs are shown in Fig. 5 and Fig. 6, respectively. The Buffer OFCE-HS supports the loop process, i.e. the minimum number of iterations required by the algorithm. Through its internal control circuit, it buffers the new raw data of each frame of the image sequence for the initial pass and retains the old raw data for the subsequent iterations. It is built from an SP-RAM (single-port random access memory) and control logic blocks (Fig. 5).
Figure 5. Hardware Design of Buffer OFCE-HS.
Figure 6. Hardware Design of Weight Factor.
The division functions of both OFCE-HS architectures (marked by dashed lines in Fig. 3 and Fig. 4) use the Divider Generator block of the Xilinx blockset, as shown in Fig. 7, configured with the radix-2 algorithm [17] as the basic math component. Two types of divider are used: the divider with integer output for OFCE-HS MZ (Fig. 7a) and the divider with fractional output for OFCE-HS RH (Fig. 7b). Both dividers are equipped with a zero-detection circuit that prevents a division error when a number is divided by zero, as shown in Fig. 7.
Figure 7. Hardware Design of Divider Circuit.
C. Hardware Design of CORE
Because of its loop process, the OFCE-HS cannot work without an additional block that implements the smoothness constraint equation of Horn and Schunck (SCE-HS) [1]. In this work, OFCE-HS and SCE-HS are therefore combined into a block called the CORE, shown in Fig. 8. The CORE also includes Line Delayers to obtain the appropriate loop process. The CORE for OFCE-HS RH uses the same design, with OFCE-HS RH in place of OFCE-HS MZ.
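The loop process performed by the CORE (OFCE-HS combined with SCE-HS), with the frame data held by the Buffer OFCE-HS while the iterations run, can be mirrored by a floating-point software reference model. The sketch below is such a model under stated assumptions: it uses NumPy, the weighted neighbourhood average of Horn and Schunck [1] (weights 1/6 for edge neighbours and 1/12 for diagonal neighbours) for ū and v̄, replicated image borders, and an illustrative smoothness weight alpha2. Only the default of ten iterations comes from the experimental setup of this work. A model of this kind could serve as a golden reference for the outputs of the OFCE-HS MZ and RH designs; it does not reproduce their fixed-point behaviour.

```python
import numpy as np

# Horn-Schunck neighbourhood weights [1]: 1/6 for edge neighbours, 1/12 for diagonals
AVG_KERNEL = np.array([[1 / 12, 1 / 6, 1 / 12],
                       [1 / 6, 0.0, 1 / 6],
                       [1 / 12, 1 / 6, 1 / 12]])


def local_average(field):
    """Weighted 3x3 neighbourhood average (u_bar or v_bar), replicating borders."""
    padded = np.pad(field, 1, mode="edge")
    h, w = field.shape
    out = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            out += AVG_KERNEL[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def horn_schunck(ex, ey, et, alpha2=100.0, iterations=10):
    """Iterate the HS update; ex, ey, et are computed once and reused every loop."""
    ex, ey, et = (np.asarray(g, dtype=float) for g in (ex, ey, et))
    u = np.zeros_like(ex)
    v = np.zeros_like(ex)
    denom = alpha2 + ex ** 2 + ey ** 2  # does not change between iterations
    for _ in range(iterations):
        u_bar, v_bar = local_average(u), local_average(v)  # smoothness term (SCE-HS)
        ratio = (ex * u_bar + ey * v_bar + et) / denom     # constraint term (OFCE-HS)
        u = u_bar - ex * ratio
        v = v_bar - ey * ratio
    return u, v
```

Keeping the gradients fixed across iterations mirrors the role of a buffer: the new frame data is captured once and reused for every pass of the loop. For a uniform field with E_x = 1, E_y = 0, E_t = -1 and alpha2 = 1, each iteration halves the distance of u to the true value 1, so after ten iterations u = 1 - 2^-10.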
Figure 8. Hardware Design of CORE.
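The divider circuits of Fig. 7 (Section III-B) can be understood through a behavioral model of radix-2 division, which produces one quotient bit per step. The sketch below is such a model written under stated assumptions and is not the Xilinx Divider Generator core itself: it handles unsigned operands only, uses a restoring-style compare-and-subtract step (a hardware core may use a different formulation, e.g. non-restoring, that yields the same quotient), and models the zero-detection circuit by forcing the output to zero when the divisor is zero. Setting frac_bits = 0 loosely corresponds to the integer-output divider of Fig. 7a and frac_bits > 0 to the fraction-output divider of Fig. 7b. The function name, the 16-bit operand width and the zero-output convention are assumptions.

```python
def radix2_divide(dividend: int, divisor: int, width: int = 16, frac_bits: int = 0) -> int:
    """Unsigned radix-2 division, one quotient bit per iteration.

    Returns floor(dividend * 2**frac_bits / divisor), i.e. the quotient as a
    fixed-point integer with `frac_bits` fractional bits.
    """
    if divisor == 0:
        return 0  # assumed zero-detection behaviour: suppress the division
    if not (0 <= dividend < (1 << width)) or not (0 < divisor < (1 << width)):
        raise ValueError("operands must be unsigned and fit in `width` bits")
    # Appending frac_bits zeros to the dividend yields the fractional quotient bits.
    numerator = dividend << frac_bits
    remainder = 0
    quotient = 0
    for i in range(width + frac_bits - 1, -1, -1):
        # Shift the next numerator bit into the partial remainder.
        remainder = (remainder << 1) | ((numerator >> i) & 1)
        quotient <<= 1
        if remainder >= divisor:  # trial subtraction succeeds: quotient bit is 1
            remainder -= divisor
            quotient |= 1
        # otherwise the partial remainder is kept unchanged (the "restore" case)
    return quotient


if __name__ == "__main__":
    print(radix2_divide(7, 2))               # integer output
    print(radix2_divide(7, 2, frac_bits=4))  # 4 fractional bits
    print(radix2_divide(5, 0))               # divide by zero is caught
```

Dividing 7 by 2 gives 3 with integer output, and 56 with four fractional bits, which represents 56 / 2^4 = 3.5; the extra quotient bits are exactly what an integer-output divider discards.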
IV. EXPERIMENTAL SETUP
To validate and evaluate the reliability of OFCE-HS, a validation system is developed. It consists of a high-level testbench abstraction and the “Tested Designs” (i.e. the CORE of OFCE-HS MZ or RH), as shown in Fig. 9. The high-level testbench consists of the Multimedia File, Buffer I/O (Frame to Serial and Serial to Frame blocks), the control signal generator (CSG) of the CORE, the Gradient Processor 2x2x2, Adders (Add) and Video Display blocks.
Figure 9. Validation System of OFCE-HS.
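The Gradient Processor 2x2x2 supplies the spatio-temporal gradients E_x, E_y and E_t to the Tested Designs. Its internal structure is not detailed here; one plausible software model, assumed for illustration, is the 2x2x2-cube estimator of Horn and Schunck [1], in which each gradient is the average of four first differences taken over a cube of neighbouring pixels in two consecutive frames. The function names and the choice of returning one estimate per cube (so the outputs are one row and one column smaller than the frames) are assumptions.

```python
import numpy as np


def _cube_corners(frame):
    """Pixels (i, j), (i, j+1), (i+1, j), (i+1, j+1) for every 2x2 block."""
    return frame[:-1, :-1], frame[:-1, 1:], frame[1:, :-1], frame[1:, 1:]


def gradients_2x2x2(frame0, frame1):
    """Estimate E_x, E_y, E_t as averages of four first differences over each
    2x2x2 cube spanning two consecutive frames (Horn-Schunck estimator [1]).

    Returns three arrays of shape (H-1, W-1), one estimate per cube.
    """
    a0, b0, c0, d0 = _cube_corners(np.asarray(frame0, dtype=float))
    a1, b1, c1, d1 = _cube_corners(np.asarray(frame1, dtype=float))
    ex = 0.25 * ((b0 - a0) + (d0 - c0) + (b1 - a1) + (d1 - c1))  # along columns (x)
    ey = 0.25 * ((c0 - a0) + (d0 - b0) + (c1 - a1) + (d1 - b1))  # along rows (y)
    et = 0.25 * ((a1 - a0) + (b1 - b0) + (c1 - c0) + (d1 - d0))  # between frames (t)
    return ex, ey, et


if __name__ == "__main__":
    ramp = np.tile(np.arange(5.0), (4, 1))      # brightness increases along x
    ex, ey, et = gradients_2x2x2(ramp, ramp + 2)  # whole image brightens by 2
    print(ex[0, 0], ey[0, 0], et[0, 0])
```

For a horizontal brightness ramp whose intensity rises by 2 between frames, the estimator returns E_x = 1, E_y = 0 and E_t = 2 at every cube.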
The Multimedia File block supplies the image sequences of a video file to the Tested Designs, and the Video Display block displays the input image sequences and the output (optical flow representation). Because the Multimedia File and Video Display blocks operate on frame-based data, the data must be converted to and from a serial representation by the Buffer I/O blocks. The Gradient Processor 2x2x2 produces the spatio-temporal gradients required by the Tested Designs. The Adder (Add) combines the CORE outputs, i.e. the velocities (u, v), to provide the optical flow representation. For the loop process, the CSG of the CORE generates the control signals that tell the CORE (OFCE-HS and SCE-HS) when to start processing new raw data and when to perform the loop. In this experiment, the number of loops is set to ten, applied frame by frame, as shown by the parameter setup in Fig. 10. After validation and reliability evaluation, the hardware designs of OFCE-HS (MZ/RH) are implemented on the target hardware platform described in Section III-A.
Figure 10. Parameter setup for ten loop process with frame by frame.
V. RESULTS AND DISCUSSION
Fig. 11 and Fig. 12 show the simulation results of OFCE-HS (MZ/RH) running in the validation system environment. In each figure, the original images are in the middle, the images of OFCE-HS MZ are on the left and our images of OFCE-HS RH are on the right. Fig. 11 shows the results for a basic video (round5fps.avi) of a black round object moving on a white background. Fig. 12 shows the results for a real video (traffic.avi) of cars moving within a static scene.
Figure 11. Simulation results with black-round movement. The original image is in the middle, the result of OFCE-HS MZ (i.e. Cmzhs_...) is on the left and our result of OFCE-HS RH (i.e. Crhhs_f...) is on the right.
Figure 12. Simulation results with car movements.
Table I shows the speed performance of both hardware designs, obtained with the timing and power analysis tools for the target hardware, i.e. the Spartan-3A DSP 1800A FPGA. Table II shows the resource usage reported by the same tools.
TABLE I. REPORT OF TIMING ANALYSIS
Timing | OFCE-HS MZ | OFCE-HS RH
Min. Period (ns) | 9.517 | 9.783
Max. Frequency (MHz) | 105.075 | 102.218
TABLE II. REPORT OF RESOURCE USAGE
Resource | OFCE-HS MZ (Number) | OFCE-HS MZ (Utilization) | OFCE-HS RH (Number) | OFCE-HS RH (Utilization)
Slices | 2,002 | 12% | 1,018 | 6%
Flip Flops | 2,779 | 8% | 1,451 | 2%
4-input LUTs | 1,329 | 3% | 829 | 2%
BUFGMUXs | 1 | 4% | 1 | 4%
DSP48As | 7 | 8% | 7 | 8%
RAMB16BWERs | 3 | 3% | 3 | 3%
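The derived figures can be checked directly from Tables I and II: the maximum frequency is the reciprocal of the minimum period, and the relative saving of OFCE-HS RH for each resource is (MZ - RH)/MZ. The short check below uses only the reported numbers; the dictionary and function names are illustrative.

```python
# Reported values: Table I (minimum period, ns) and Table II (resource counts)
MIN_PERIOD_NS = {"OFCE-HS MZ": 9.517, "OFCE-HS RH": 9.783}
RESOURCES = {  # resource: (OFCE-HS MZ, OFCE-HS RH)
    "Slices": (2002, 1018),
    "Flip Flops": (2779, 1451),
    "4-input LUTs": (1329, 829),
}


def max_frequency_mhz(period_ns: float) -> float:
    """f_max = 1 / T_min; with T_min in ns, 1000 / T_min gives MHz."""
    return 1000.0 / period_ns


def reduction_percent(mz: int, rh: int) -> float:
    """Resource saving of OFCE-HS RH relative to OFCE-HS MZ, in percent."""
    return 100.0 * (mz - rh) / mz


if __name__ == "__main__":
    for design, period in MIN_PERIOD_NS.items():
        print(f"{design}: {max_frequency_mhz(period):.3f} MHz")
    for name, (mz, rh) in RESOURCES.items():
        print(f"{name}: {reduction_percent(mz, rh):.2f}% fewer in OFCE-HS RH")
```

Running it reproduces the maximum frequencies of 105.075 MHz and 102.218 MHz, and resource reductions of 49.15% (slices), 47.79% (flip-flops) and 37.62% (4-input LUTs).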
Optical flow is defined as the distribution of apparent velocities of movement of brightness patterns in an image sequence, which can arise from relative motion of the scene [2]. Intuitively, when an object moves within a static scene, only the moving object should appear in the optical flow, and the static scene (background) should not. For the two videos (round5fps.avi and traffic.avi) in Fig. 11 and Fig. 12, the ideal optical flow therefore shows only the moving black round object and the moving cars; anything appearing outside them is noise. In the first simulation (Fig. 11), OFCE-HS MZ produces noise in the form of straight lines to the left of and below the object, whereas OFCE-HS RH produces no visible noise. In the second simulation (Fig. 12), OFCE-HS MZ again produces noise, appearing as sketches of the scene, while almost no noise appears for OFCE-HS RH. Comparing both simulation results, OFCE-HS RH performs better than OFCE-HS MZ in terms of noise reduction.
In terms of speed, both hardware designs perform almost the same, as shown in Table I (maximum frequencies of 105.075 MHz and 102.218 MHz); there is no significant difference. For hardware resource usage, however, there are significant differences between OFCE-HS MZ and OFCE-HS RH, as shown in Table II. Compared with OFCE-HS MZ, OFCE-HS RH uses 49.15% fewer slices, 47.79% fewer flip-flops and 37.62% fewer 4-input LUTs.
VI. CONCLUSION
In this work, the hardware architectures OFCE-HS MZ and OFCE-HS RH have been compared. OFCE-HS MZ is built using integer arithmetic throughout, with the aim of reducing bit-truncation errors, increasing speed and reducing resource usage, whereas OFCE-HS RH combines integer and fractional arithmetic. The experiments show that OFCE-HS MZ performs worse than OFCE-HS RH because it produces considerably more noise. In addition, OFCE-HS MZ uses more resources because of its two dividers. In conclusion, integer-only arithmetic does not always reduce bit-truncation errors, increase speed or reduce resource usage in a hardware implementation. For the hardware implementation of OFCE-HS, combining integer and fractional arithmetic gives better results, as shown in this work.
REFERENCES
[1] B. K. P. Horn and B. G. Schunck, "Determining Optical Flow," Massachusetts Institute of Technology, 1980.
[2] J. L. Martín, et al., "Hardware implementation of optical flow constraint equation using FPGAs," Comput. Vis. Image Underst., vol. 98, pp. 462-490, 2005.
[3] A. Zuloaga, J. L. Martin, and J. Ezquerra, "Hardware architecture for optical flow estimation in real time," ICIP 98, pp. 972-976, 1998.
[4] A. Zuloaga, U. Bidarte, J. L. Martin, and J. Ezquerra, "Optical flow estimator using VHDL for implementation in FPGA," Proceedings XIII Design of Circuits and Systems Conference, pp. 36-41, November 1998.
[5] Ruzali Rustam, Nor Hisham Hamid, and Fawnizu Azmadi Hussin, "Hardware architecture of OFCE-HS for hardware implementation," in 4th International Conference on Machine Vision (ICMV), Singapore, 2011.
[6] Computer Vision Research Group, Dept. of Computer Science, Univ. of Otago, New Zealand. (2007), Optical Flow Algorithm Evaluation. Available: http://of-eval.sourceforge.net/.
[7] P. C. Arribas, "Real time hardware vision system design," presented at the Proceedings of the 9th WSEAS International Conference on Systems, Athens, Greece, 2005.
[8] P. Cobos and F. Monasterio-Huelin, "FPGA Implementation of the Horn & Schunck Optical Flow Algorithm for Motion Detection in Real Time Images," Proc. of XIII Design of Circuits and Integrated Systems Conference, DCISSOH98, pp. 616-621, 1998.
[9] Xilinx Inc. Xilinx System Generator. Available: http://www.xilinx.com/tools/sysgen.htm.
[10] Xilinx Inc. Xilinx ISE Software. Available: http://www.xilinx.com/products/design-tools/ise-designsuite/index.htm.
[11] MathWorks Inc. Matlab and Simulink Software. Available: http://www.mathworks.com.
[12] Xilinx Inc. XtremeDSP Starter Platform - Spartan-3A DSP 1800A Edition. Available: http://www.xilinx.com/products/boards-andkits/HW-SD1800A-DSP-SB-UNI-G.htm.
[13] Xilinx Inc. (Jan. 30, 2009), Spartan-3A DSP Starter Platform. User Guide (UG454) 1.1. Available: http://www.xilinx.com/support/documentation/user_guides/ug454.pdf.
[14] Xilinx Inc. (June 12, 2008), Getting Started with the Spartan-3A DSP S3D1800A Starter Platform. User Guide (UG485) 1.1. Available: http://www.xilinx.com/support/documentation/user_guides/ug485.pdf.
[15] Xilinx Inc. Platform Cable USB II. Available: http://www.xilinx.com/products/boards-and-kits/HW-USB-II-G.htm.
[16] Xilinx Inc. (June 9, 2008), Platform Cable USB II: Advance Product Specification. Data Sheet (DS593) 1.2. Available: http://www.xilinx.com/support/documentation/data_sheets/ds593.pdf.
[17] Xilinx Inc. (Sept. 21, 2010), System Generator for DSP Reference Guide. User Guide (UG638) 12.3. Available: http://www.xilinx.com/support/documentation/user_guides/ug638.pdfc.
[18] Xilinx Inc. (Sept. 21, 2010), System Generator for DSP User Guide. User Guide (UG640) 12.3. Available: http://www.xilinx.com/support/documentation/user_guides/ug640.pdf.