FPGA-Based Reconfigurable Unit for Image Filtering in Frequency Domain

Luis M. Ledesma-Carrillo, Misael Lopez-Ramirez, Ana L. Martinez-Herrera, Eduardo Cabal-Yepez, Arturo Garcia-Perez

Division de Ingenierias, Campus Irapuato-Salamanca, Universidad de Guanajuato, Carr. Salamanca-Valle km 3.5+1.8, Comunidad de Palo Blanco, 36700 Salamanca, Guanajuato, Mexico. {[email protected]}

Abstract— Digital filtering is a key step of image processing in many applications. Because of its importance, this work presents a general FPGA-based reconfigurable architecture for real-time, online image filtering in the frequency domain. The proposed FPGA-based implementation is portable to distinct platforms from different vendors. Results obtained from different study cases show the capability and performance of the proposed hardware implementation, which applies any user-designed filtering operation to an image and outperforms its software counterpart by two orders of magnitude.

Keywords- 2D-FFT; Frequency domain; FPGA; Generic architecture; Image filtering; Real-time image processing.

I. INTRODUCTION

The two-dimensional fast Fourier transform (2D-FFT) and its inverse (2D-IFFT) play an important role in digital image processing, providing filtering solutions for applications such as robotics, optics, telecommunications, and bio-imaging, among others. Filtering is used for image processing operations like denoising, edge detection, recognition, segmentation, smoothing, and restoration [1], [2]. For instance, in [3], meteorological images are filtered utilizing the Fourier transform to predict storms by analyzing the frequency contents of an image taken by a satellite. In [4], different types of digital filters, namely the average filter, the median filter, and the Wiener filter, are described and compared for removing noise present in transmission electron microscopy images of nanomaterials. In [5], a method based on median filtering to compute the total variation with respect to the central pixel in a filter window is introduced to detect edges of nuclei in microscopy images. In [6], different low-pass filtering schemes are applied to restore a grayscale image corrupted by salt-and-pepper noise; the efficiency of each filtering scheme is evaluated through mean square error, signal-to-noise ratio, peak signal-to-noise ratio, and signal-to-noise ratio improvement. In [7], a filter-learning model is built based on the theory of function approximation and the generalization of the partial differential equation model for image deblurring. From these examples, it is clear that digital filtering is a key step of image processing in many applications.

978-1-4799-2079-2/13/$31.00 ©2013 IEEE

Because of its importance for image processing, different implementations have been proposed to compute the 2D-FFT in hardware. For instance, in [8], a 4096-point radix-8 fast Fourier transform (FFT) is designed and implemented on a field-programmable gate array (FPGA). The proposed architecture utilizes a pipeline structure to process 4096 points in 20.48 µs at 100 MHz. In [9], a complex, single-precision, floating-point, one-dimensional fast Fourier transform (1D-FFT) implementation is proposed and compared in performance against graphics processing unit (GPU) accelerators. In [10], an implementation of Laplacian sharpening and the 2D-FFT in an FPGA is used to assess the quality of the iris image in a Texas Instruments digital signal processor (DSP) based iris recognition system. In [11], a real-time, embedded, automated image recognition system is introduced utilizing a combination of a DSP to implement the distance-classifier correlation filter algorithm, and an FPGA to implement computational bottlenecks like the 2D-FFT and 2D-IFFT operations. In [12], a high-level framework for the implementation of the FFT in an FPGA for real-time image processing applications is presented. The proposed architecture is implemented in two Celoxica-Rc1000 development boards utilizing the hardware description language (HDL) Handel-C. In [13], the design and realization of a high-level Handel-C based framework is reported; it includes a wide range of FFT algorithms like radix-2, radix-4, split-radix, and the fast Hartley transform for the implementation of 1D and 2D FFTs in image filtering applications. In [14], a stream, coarse-grained, application-specific pipeline architecture described as a data flow graph (DFG) is proposed for accelerating 2D-FFT computation in embedded and multimedia applications. A major disadvantage of the methods described above is their lack of portability among different FPGA platforms, in part due to the utilization of object-oriented programming (OOP) languages like Handel-C or DFG during their description.
Another disadvantage of utilizing OOP languages for hardware description is that they do not guarantee optimal resource utilization in the FPGA device, since they are not proper hardware description languages but programming languages that allow expressing algorithms without worrying too much about how the underlying computation engine works [15].

The contribution of this work is the development of a generic, FPGA-based, reconfigurable architecture for online image filtering in the frequency domain, in real time. The proposed generic architecture implements the 2D-FFT and the 2D-IFFT in hardware utilizing the very high speed integrated circuits hardware description language (VHDL). Different from previous approaches, the generic VHDL-based hardware description allows portability to different FPGA platforms for image filtering in distinct applications. To assess the performance and effectiveness of the proposed generic hardware architecture, five different study cases in common image processing applications are used: 1) denoising, 2) edge detection, 3) low-pass filtering, 4) high-pass filtering, and 5) deblurring. Obtained results show the versatility and usefulness of the proposed generic, FPGA-based, reconfigurable architecture for online image filtering in the frequency domain.

II. THEORETICAL BACKGROUND

A. Discrete Fourier Transform and Its Inverse

The N-point discrete Fourier transform (DFT) is defined by

$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{kn}, \qquad k = 0, 1, 2, \ldots, N-1 \tag{1}$$

where

$$W_N = e^{-j 2\pi / N} \tag{2}$$

is known as the twiddle factor, and x(n) is the time-discrete signal. The numbers of complex multiplications and additions required for an N-point DFT computation are $N^2$ and $N(N-1)$, respectively. The inverse discrete Fourier transform (IDFT) is defined by

$$x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k)\, W_N^{-kn}, \qquad n = 0, 1, 2, \ldots, N-1 \tag{3}$$

where X(k) is the frequency-domain complex signal. The twiddle factor $W_N$ is used with a change of sign. The numbers of complex multiplications and additions for the IDFT computation are the same as those for the DFT calculation [2].

B. Radix-2 Fast Fourier Transform

The fast Fourier transform (FFT) is an efficient algorithm that consists of decomposing an N-point DFT into successively smaller DFTs in order to exploit the symmetry and periodicity of the twiddle factor $W_N$ [16]. It separates the original discrete-time sequence into its even and odd components to compute the frequency-domain complex signal, as described from (4) to (6).

$$X_{even}(k) = \sum_{n=0}^{N/2-1} x(2n)\, W_N^{2kn} \tag{4}$$

$$X_{odd}(k) = \sum_{n=0}^{N/2-1} x(2n+1)\, W_N^{2kn} \tag{5}$$

$$X(k) = X_{even}(k) + W_N^{k}\, X_{odd}(k) \tag{6}$$

This last expression is known as time decimation due to the reorganization of the discrete-time samples into two sets, and it is called the radix-2 butterfly. Hence, for an N-point FFT computation just $(N/2)\log_2 N$ complex multiplications and $N \log_2 N$ complex additions are required. The basic operation of the radix-2 butterfly is depicted in Fig. 1.

Figure 1. Time-decimation radix-2 butterfly.

C. 2D-Discrete Fourier Transform

The 2D-DFT of an N×N matrix is defined in a similar way as the 1D-DFT, but considering both dimensions.

$$X(k_1, k_2) = \sum_{n_1=0}^{N-1} \sum_{n_2=0}^{N-1} x(n_1, n_2)\, e^{-j 2\pi (k_1 n_1 + k_2 n_2)/N} \tag{7}$$

In (7), 0 ≤ k1 ≤ N-1, 0 ≤ k2 ≤ N-1, and x(n1,n2) is the N×N discrete-time sampled matrix. To obtain the 2D-FFT of an N×N discrete-time matrix, it is necessary to apply the 1D-FFT on each row n2, followed by the 1D-FFT on each column n1 of the matrix x(n1,n2).

III. FREQUENCY-DOMAIN IMAGE FILTERING

The basic filtering operation in the time domain is the convolution. For images, this operation is fast and easy as long as the input image and the filter order are relatively small. However, for large images and filter orders it is faster and easier to carry out this operation in the frequency domain utilizing the 2D-FFT and the Fourier transform property for time-domain convolution, which transforms this operation into a multiplication. The image filtering algorithm in the frequency domain is described in Fig. 2, and it involves three steps:

Step 1: 2D-FFT computation of the input image.
Step 2: Complex multiplication of the input image and the applied filter in the frequency domain.
Step 3: 2D-IFFT computation of the result obtained from step 2.

In Fig. 2, i(x,y) is the discrete-time N×N input image. H(u,v) and G(u,v) are the transfer function and the filtered image, respectively, in the frequency domain. Finally, g(x,y) is the processed image in the time domain. The filter H(u,v) depends on the application; for instance, denoising through a Gaussian filter, edge detection utilizing a Laplacian filter, low-pass filtering, high-pass filtering, or deblurring through a Wiener filter.

Figure 2. Scheme for image filtering in frequency domain.

IV. IMAGE FILTERING HARDWARE IMPLEMENTATION

Fig. 3 shows the main components for FPGA-based image filtering in frequency domain. From previous sections, it is clear that image filtering in frequency domain directly depends on the 2D-FFT and 2D-IFFT implementation. A Control Unit provides synchronization and control signals to carry out all the different tasks on each component, and an RS232 interface is used for optional communication with an external PC for user interfacing purposes and subsequent analyses.
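As a software reference for the data path of Fig. 3 (2D-FFT, point-to-point multiplication by H(u,v), 2D-IFFT), the following is a minimal sketch and not the authors' implementation. It assumes NumPy's `fft2` and `ifft2` as stand-ins for the hardware blocks; the function name, the 8×8 test image, and the one-row shift kernel are hypothetical and chosen only because the expected result is easy to check.

```python
import numpy as np

def filter_in_frequency_domain(image, transfer_function):
    """Software model of the data path: 2D-FFT, multiply by H(u,v), 2D-IFFT."""
    spectrum = np.fft.fft2(image)               # I(u,v)
    filtered = spectrum * transfer_function     # G(u,v) = I(u,v) H(u,v)
    return np.fft.ifft2(filtered)               # g(x,y), returned as complex values

# Hypothetical 8x8 test image (not one of the images used in the paper).
rng = np.random.default_rng(0)
image = rng.random((8, 8))

# Hypothetical filter: an impulse delayed by one row, so filtering must
# reproduce a circular shift of the image by one row.
kernel = np.zeros((8, 8))
kernel[1, 0] = 1.0
H = np.fft.fft2(kernel)                         # transfer function of the kernel

g = filter_in_frequency_domain(image, H)
```

Because the product of spectra corresponds to a circular convolution, the real part of `g` equals the input shifted cyclically by one row, which gives a simple check that a hardware result can be compared against.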

Figure 3. Proposed generic hardware architecture for image filtering in frequency domain.

A. Two-Port RAM

In image processing, information is managed as a matrix; therefore, a two-port RAM with the following features is required for the study cases treated here:
• Two 32-bit input data ports (IDA and IDB) and two 32-bit output data ports (ODA and ODB).
• Two 16-bit address buses (AA and AB).
• Two control signals for enabling data transfers to the memory during writing operations (WEA and WEB).

Fig. 4 shows a block diagram of this two-port RAM, whose main feature is its capability of reading from and writing to two different memory locations at the same time. Two more RAMs are used for partial image processing results and filter coefficients. The address format defined on buses AA and AB for matrix storage and management uses the 8 most significant bits (MSB) to represent rows and the remaining 8 least significant bits (LSB) to represent columns in a 256×256 image. Data are stored in fixed-point format (FXPF), which varies depending on the dimensions of the analyzed matrix (image) and the applied filter coefficients.

Figure 4. Two-port RAM for matrix storage and row/column management.

B. 2D-FFT and 2D-IFFT

The 2D-FFT block is in charge of obtaining the frequency-domain transformation I(u,v) of the input image i(x,y). The main component of the 2D-FFT block is the radix-2 FFT computation, which carries out the frequency transformation on each row and column of i(x,y). The 2D-FFT internal structure is depicted in Fig. 5. The input parameters to the 2D-FFT block are the real and imaginary image information, Di_real and Di_imag, respectively, the operation control OPC to select the FFT or the IFFT, and the image size N. First, the FFT is computed on each row of the image; then, the same operation is applied on each column. The FFT computation sequence is controlled by the Count Row and Count Column elements. The outputs Do_real and Do_imag are generated on each FFT computation. Through the Addr_RAM bus, the 2D-FFT block provides the corresponding memory addresses to the RAM in charge of storing the obtained results.

Figure 5. 2D-FFT structure for image transformation into the frequency domain.
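The row/column addressing and the row-then-column transform order described above can be sketched as follows. This is a hedged software model rather than the VHDL design: `pack_address`, `load_image`, `read_image`, and `fft2_row_column` are hypothetical names, the RAM is modeled as a flat complex array of 2^16 words, and NumPy's 1D `fft`/`ifft` stand in for the radix-2 core selected through OPC.

```python
import numpy as np

ROW_SHIFT = 8      # row index occupies the 8 MSB of the 16-bit address
COL_MASK = 0xFF    # column index occupies the 8 LSB

def pack_address(row, col):
    """Build a 16-bit address from a row and a column index."""
    return (row << ROW_SHIFT) | (col & COL_MASK)

def load_image(image):
    """Store an n x n image (n <= 256) in a flat 2^16-word RAM model."""
    n = image.shape[0]
    ram = np.zeros(1 << 16, dtype=complex)
    for row in range(n):
        for col in range(n):
            ram[pack_address(row, col)] = image[row, col]
    return ram

def read_image(ram, n):
    """Read the n x n matrix back from the RAM model."""
    return np.array([[ram[pack_address(row, col)] for col in range(n)]
                     for row in range(n)])

def fft2_row_column(ram, n, inverse=False):
    """Apply the 1D transform to every row, then to every column."""
    transform = np.fft.ifft if inverse else np.fft.fft
    out = ram.copy()
    for row in range(n):                       # row pass
        addrs = [pack_address(row, col) for col in range(n)]
        out[addrs] = transform(out[addrs])
    for col in range(n):                       # column pass
        addrs = [pack_address(row, col) for row in range(n)]
        out[addrs] = transform(out[addrs])
    return out
```

With this layout a row occupies consecutive addresses, while a column is read with a stride of 256 words, so both passes reduce to simple counters on the two halves of the address.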

C. Radix-2 FFT

The FFT computation consists of a radix-2 decimation in time, a look-up table (LUT) containing the twiddle factors, and an address generation unit that obtains the corresponding twiddle factor from the LUT and stores the computed values in the RAMs. The FFT internal structure depicted in Fig. 6 has the following parameters: the input data Di_real and Di_imag, and the number of points N, which defines the address generation unit outputs G and Addr_RAM. These outputs provide the corresponding twiddle factor address to the LUT, and the input and output addresses to the two-port RAMs for proper management of the image being processed. An arithmetic inverter is implemented to carry out the FFT or the IFFT, using $W_N^{kn}$ or $W_N^{-kn}$, respectively, as selected through the signal OPC.

Figure 8. Point-to-point complex multiplication.

Figure 6. Radix-2 FFT hardware implementation.
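To illustrate how a twiddle-factor LUT, a per-stage LUT address step, and a sign inversion for the inverse transform fit together, here is a minimal floating-point sketch of an iterative radix-2 decimation-in-time FFT. It is an assumption-laden model, not the reported hardware: the names `twiddle_lut`, `bit_reverse`, and `fft_radix2` are hypothetical, the `inverse` flag plays the role of OPC, and the inverse is scaled by 1/N at the end instead of the per-stage halving used in the paper's fixed-point design.

```python
import numpy as np

def twiddle_lut(n):
    """LUT with the N/2 twiddle factors W_N^k = exp(-j*2*pi*k/N)."""
    k = np.arange(n // 2)
    return np.exp(-2j * np.pi * k / n)

def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of `index` (input reordering)."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result

def fft_radix2(x, inverse=False):
    """Iterative radix-2 decimation-in-time FFT for power-of-two lengths."""
    n = len(x)
    bits = n.bit_length() - 1
    if n != 1 << bits:
        raise ValueError("length must be a power of two")
    lut = twiddle_lut(n)
    if inverse:
        lut = np.conj(lut)          # sign change of the exponent: W_N^{-kn}
    data = np.array([x[bit_reverse(i, bits)] for i in range(n)], dtype=complex)
    half = 1
    while half < n:
        stride = n // (2 * half)    # LUT address step for this stage
        for start in range(0, n, 2 * half):
            for j in range(half):
                w = lut[j * stride]
                top = data[start + j]
                bottom = data[start + j + half] * w
                data[start + j] = top + bottom          # butterfly sum
                data[start + j + half] = top - bottom   # butterfly difference
        half *= 2
    return data / n if inverse else data
```

A single LUT of N/2 entries serves every stage because the twiddle factor needed at a stage with butterflies of size 2h is $W_N^{jN/(2h)}$, which is the LUT entry at address j times the stride.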

Fig. 7 shows the Radix-2 block structure, which is the main operation in the FFT computation. It carries out time decimation on the two real inputs, A_real and B_real, and the two imaginary inputs, A_imag and B_imag, applying the twiddle factors Tw_real and Tw_imag to perform the basic FFT operation (the butterfly), and it provides the real and imaginary outputs separately. The Radix-2 block consists of four 32-bit multipliers and three 64-bit adders and subtracters. On each iteration, the 64-bit result is truncated to its 32 MSB and divided by 2 to avoid overflow.

Figure 7. Hardware architecture for the radix-2 time decimation.
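The butterfly arithmetic described above can be modeled with integers as in the sketch below. Several details are assumptions that the text does not state: the twiddle factors are taken to be signed Q1.31 values, "truncation to the 32 MSB" is modeled as an arithmetic right shift by 31 bits, and the function names are hypothetical. The paper's actual fixed-point format varies with image size and filter coefficients.

```python
import cmath

FRAC_BITS = 31  # assumption: twiddle factors stored as signed Q1.31 values

def to_fixed(value):
    """Quantize a real number in [-1, 1) to a signed 32-bit Q1.31 integer."""
    scaled = int(round(value * (1 << FRAC_BITS)))
    return max(-(1 << 31), min((1 << 31) - 1, scaled))

def butterfly_fixed(a_re, a_im, b_re, b_im, tw_re, tw_im):
    """Radix-2 butterfly on integers: returns ((A + B*W)/2, (A - B*W)/2)."""
    # Four multiplications; 32-bit operands give products of up to 64 bits.
    rr = b_re * tw_re
    ii = b_im * tw_im
    ri = b_re * tw_im
    ir = b_im * tw_re
    # Combine the products, then drop the fractional bits (keep the upper part).
    t_re = (rr - ii) >> FRAC_BITS
    t_im = (ri + ir) >> FRAC_BITS
    # Sum and difference outputs, each halved so values cannot grow per stage.
    top = ((a_re + t_re) >> 1, (a_im + t_im) >> 1)
    bottom = ((a_re - t_re) >> 1, (a_im - t_im) >> 1)
    return top, bottom

# Hypothetical operands: A = 1000 + 2000j, B = 3000 - 1000j, W = exp(-j*pi/4).
w = cmath.exp(-1j * cmath.pi / 4)
top, bottom = butterfly_fixed(1000, 2000, 3000, -1000,
                              to_fixed(w.real), to_fixed(w.imag))
```

Halving on every one of the log2(N) stages means the forward transform is produced scaled by 1/N, which is one common way to keep fixed-point words from overflowing at the cost of some precision in the low-order bits.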

D. Point-to-Point Multiplication

The point-to-point multiplication between the 2D-FFT of the input image I(u,v) and the filter coefficients H(u,v) in (8) is carried out through the point-to-point complex multiplication (9) on each row and column of the image. Fig. 8 depicts the hardware architecture to perform this operation.

$$G(u,v) = I(u,v)\, H(u,v) \tag{8}$$

$$G_{real} + j\,G_{imag} = (I_{real} H_{real} - I_{imag} H_{imag}) + j\,(I_{real} H_{imag} + I_{imag} H_{real}) \tag{9}$$
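Equation (9) maps directly onto four real multiplications, one subtraction, and one addition per pixel. The short sketch below shows this on separate real and imaginary planes; the function name is hypothetical and floating-point NumPy arrays are assumed in place of the fixed-point words used in hardware.

```python
import numpy as np

def pointwise_complex_multiply(i_re, i_im, h_re, h_im):
    """Element-wise complex product on separate real and imaginary planes."""
    g_re = i_re * h_re - i_im * h_im    # two multipliers and one subtracter
    g_im = i_re * h_im + i_im * h_re    # two multipliers and one adder
    return g_re, g_im
```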

The point-to-point complex multiplication is carried out through four 32-bit multipliers, one 64-bit adder, and one 64-bit subtracter. It is worth noting that, because of the two-port RAM structure, it is possible to carry out two complex multiplications in parallel with a pipeline structure to speed up the processing time.

V. TESTS AND RESULTS

This section presents the experiments carried out to test the performance of the proposed hardware implementation for image filtering in the frequency domain, the resource utilization on an FPGA device from Xilinx, and the results obtained in each treated case.

A. Study Cases

Five different study cases were used for testing the performance of the proposed generic hardware implementation for image filtering in the frequency domain: a) denoising, b) edge detection, c) low-pass filtering, d) high-pass filtering, and e) deblurring. In each study case a 256×256 grayscale image was used, as shown in Fig. 9. The information exchange between the FPGA and a standard PC was carried out through an RS232 interface.

Figure 9. Tested images: a) Cameraman with Gaussian noise, b) Lena for edge detection, c) Siemens star for low-pass and high-pass filtering, d) Lena with blurring.

B. Hardware Implementation Results

The proposed generic unit for image filtering in the frequency domain was implemented on FPGA devices from two different vendors: the Virtex 6 ML605 from Xilinx and the Stratix III DE3-260 from Altera. Table I summarizes the hardware implementation results for processing a 256×256 grayscale image as the ratio between used and available resources in the device, together with the maximum operating frequency of the hardware processing unit.

TABLE I. RESOURCE UTILIZATION OF THE PROPOSED FPGA-BASED HARDWARE PROCESSING UNIT

Resource Utilization    Xilinx Virtex 6 ML605    Altera Stratix III DE3-260
Programmable Logic      1%                       1%
Memory Blocks           61%                      57%
DSP Blocks              6%                       6%
Max. Op. Freq.          66 MHz                   77 MHz

C. Image Filtering Results

For the denoising study case, the image under test (Fig. 9a) was obtained by applying Gaussian noise with mean m=0 and variance v=0.01. A Gaussian filter with standard deviation σ=0.5 was applied. Fig. 10 shows the results obtained for image denoising.

Figure 10. Denoising of an image utilizing a Gaussian filter.

For the edge detection case, a Laplacian filter was applied to the image in Fig. 9b. Obtained results are shown in Fig. 11.

Figure 11. Edge detection through a Laplacian filter.

The image in Fig. 9c was used for low-pass and high-pass filtering. This image is known as the Siemens star and contains high frequencies (bright spokes) in its middle, which decrease (become darker) when moving away from it. Obtained results are shown in Fig. 12 and Fig. 13 for low-pass and high-pass filtering, respectively.

Figure 12. Low-pass filtering results.

Figure 13. High-pass filtering results.

The image in Fig. 9d was used for the deblurring study case. The blurred effect was obtained by applying the MATLAB function fspecial('motion') to Lena. The image restoration (deblurring) was carried out through a Wiener filter with a delay parameter α=0.003. Obtained results are shown in Fig. 14.

Finally, Table II shows the elapsed time for image filtering in hardware utilizing a Xilinx Virtex 6 device, and in software utilizing MATLAB 7.3.0.499 on a 2.4 GHz Intel Core Duo processor. From these results, the proposed generic hardware implementation outperforms its software counterpart by 2 orders of magnitude.
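As a worked check of the performance figures, taking the elapsed times reported in Table II (15.88 ms for the FPGA and 6.83 s for MATLAB) and assuming one 256×256 image is processed per elapsed-time interval, the speed-up and the equivalent frame rate follow as:

```latex
\text{speed-up} = \frac{6.83\ \text{s}}{15.88\ \text{ms}}
                = \frac{6830\ \text{ms}}{15.88\ \text{ms}} \approx 430
                \quad (\text{between } 10^{2} \text{ and } 10^{3})

\text{frame rate} = \frac{1}{15.88\ \text{ms}} \approx 63\ \text{frames per second}
```

The ratio of roughly 430 is what the text summarizes as two orders of magnitude.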

Figure 14. Deblurring of an image through a Wiener filter.

TABLE II. ELAPSED TIME FOR IMAGE PROCESSING IN THE PROPOSED GENERIC HARDWARE IMPLEMENTATION AND IN MATLAB

                FPGA        MATLAB
Elapsed time    15.88 ms    6.83 s

D. Discussion

From the results obtained in the different study cases, it was shown that the generic FPGA-based hardware implementation for image filtering in the frequency domain appropriately applies any user-designed filtering operation to an image. This is verified by comparing the results obtained from the proposed FPGA-based hardware processing unit against its software counterpart implementation in MATLAB, where practically the same results are produced in both cases. Regarding performance, the hardware processing unit is two orders of magnitude faster than its software implementation counterpart, providing a real-time solution for image filtering by processing around 63 fps. Different from previous approaches based on Handel-C [12], [13] or data flow graph (DFG) [14] descriptions, the proposed FPGA-based implementation is highly portable to different platforms of distinct vendors because of its VHDL-based hardware description, and it fits on a single die, contrary to reviewed approaches that used two development boards [12].

VI. CONCLUSIONS

Digital filtering is a key step of image processing in many applications. Because of its importance, different hardware implementations have been proposed. Unfortunately, most of them are application-specific approaches that have limitations in the applied filtering technique and in portability because of the hardware description method utilized. This work contributes a generic, FPGA-based, reconfigurable architecture for real-time, online image filtering in the frequency domain. Results obtained from the different study cases, 1) denoising, 2) edge detection, 3) low-pass filtering, 4) high-pass filtering, and 5) deblurring, show that the proposed FPGA-based hardware processing unit is highly efficient at applying any user-selected filtering operation to an image. Furthermore, thanks to its generic reconfigurable architecture and portability, the FPGA-based hardware processing unit can be extended to process images with different dimensions for distinct applications in real time, outperforming its software counterpart implementation by two orders of magnitude.

Future Work: The generic FPGA-based hardware implementation unit introduced in this paper for image filtering in the frequency domain is considered as preceding work for applications in digital cameras to fix optical-system errors by applying digital post-processing to the captured image.

Acknowledgments: This work was supported in part by the University of Guanajuato, PIFI 2012, and by the National Council on Science and Technology (CONACYT), Mexico, under Scholarships 254859, 254860, 269243.

REFERENCES

[1] R. C. Gonzalez, R. E. Woods, Digital Image Processing, 3rd Edition, Upper Saddle River, NJ, Prentice Hall, 2007.
[2] E. O. Brigham, The fast Fourier Transform and its Applications, 1st Edition, Upper Saddle River, NJ, Prentice Hall, 1988.
[3] O. Raaf, A. E. H. Adane, “Pattern recognition filtering and bidimensional FFT-based detection of storms in meteorological radar images,” Digital Signal Processing, vol. 22, pp. 734–743, Sep. 2012.
[4] H. S. Kushwaha, S. Tanwar, K. S. Rathore, S. Srivastava, “De-noising Filters for TEM (Transmission Electron Microscopy) Image of Nanomaterials,” in Proc. Second International Conference on Advanced Computing & Communication Technologies (ACCT), Rohtak, Haryana, 2012, pp. 276–281.
[5] X. Xu, Z. Yang, Y. Wang, “A method based on rank-ordered filter to detect edges in cellular image,” Pattern Recognition Letters, vol. 30, pp. 634–640, Apr. 2009.
[6] C. S. Panda, S. Patnaik, “Filtering and Performance Evaluation for Restoration of Grayscale Image Corrupted by Salt & Pepper Noise Using Low Pass Filtering Schemes,” in Proc. 2nd International Conference on Emerging Trends in Engineering and Technology (ICETET), Nagpur, India, 2009, pp. 940–945.
[7] L. Wang, Y. Huang, X. Luo, Z. Wang, S. Luo, “Image deblurring with filters learned by extreme learning machine,” Neurocomputing, vol. 74, pp. 2464–2474, Sep. 2011.
[8] M. Sun, L. Tian, D. Dai, “Radix-8 FFT Processor Design Based on FPGA,” in Proc. 5th International Congress on Image and Signal Processing (CISP), Chongqing, China, 2012, pp. 1453–1457.
[9] K. Pereira, P. Athanas, H. Lin, W. Feng, “Spectral method characterization on FPGA and GPU accelerators,” in Proc. International Conference on Reconfigurable Computing and FPGAs (ReConFig), Cancun, Mexico, 2011, pp. 487–492.
[10] Y. Zhang, X. F. Lu, H. L. Lu, W. Liu, “Iris image quality assessment based on FPGA coprocessor,” in Proc. IET International Communication Conference on Wireless Mobile and Computing (CCWMC), Shanghai, China, 2009, pp. 555–558.
[11] S. Neema, J. Scott, T. Bapty, “Real time reconfigurable image recognition system,” in Proc. 18th IEEE Instrumentation and Measurement Technology Conference (IMTC), Budapest, Hungary, 2001, pp. 350–355.
[12] I. S. Uzun, A. Amira, F. Besaali, “A reconfigurable coprocessor for high-resolution image filtering in real time,” in Proc. 10th IEEE International Conference on Electronics, Circuits and Systems (ICECS), Sharjah, United Arab Emirates, 2003, pp. 192–195.
[13] I. S. Uzun, A. Amira, A. Bouridane, “FPGA implementations of fast Fourier transforms for real-time signal and image processing,” IEE Proceedings - Vision, Image and Signal Processing, vol. 152, pp. 283–296, Jun. 2005.
[14] W. Wang, B. Duan, C. Zhang, P. Zhang, N. Sun, “Accelerating 2D FFT with non-power-of-two problem size on FPGA,” in Proc. International Conference on Reconfigurable Computing and FPGAs (ReConFig), Cancun, Mexico, 2010, pp. 208–213.
[15] Celoxica Limited, DK4-Handel-C Language Reference Manual, Celoxica LTD, 2005.
[16] J. W. Cooley and J. W. Tukey, “An algorithm for the machine computation of the complex Fourier series,” Mathematics of Computation, vol. 19, pp. 297–301, Apr. 1965.
