LETTER
IEICE Electronics Express, Vol.11, No.24, 1–12
Real-time hardware accelerator for single image haze removal using dark channel prior and guided filter

Zhengfa Liang a), Hengzhu Liu, Botao Zhang, and Benzhang Wang
School of Computer, National University of Defense Technology, Deya Road No. 109, Kaifu District, Changsha 410073, Hunan, China
a) [email protected]
Abstract: This paper presents a real-time hardware accelerator, implemented on an FPGA chip, for single image haze removal using the dark channel prior and the guided filter. This algorithm is one of the state-of-the-art approaches recently proposed, but its large amount of computation limits its real-time processing ability. We therefore design an FPGA-based hardware accelerator for single image haze removal that takes full advantage of the powerful parallel processing ability of the hardware and the parallelism of the algorithm. Specifically, 1) the dark channel calculation and the atmospheric light calculation of the algorithm are modified to reduce the amount of computation; 2) two pipelines are applied in the guided filtering to speed up the processing; 3) in addition, a fast mean filtering technique is used to accelerate the mean filtering, which is the main computation of the guided filter, by avoiding redundant computation. To the best of our knowledge, this is also the first FPGA design for single image haze removal using the dark channel prior and guided filtering. The design processes a 720 × 576 image in 13.74 ms at 100 MHz and gives almost the same results as the original algorithm.

Keywords: haze removal, dark channel prior, guided filtering, hardware accelerator, real-time processing

Classification: Integrated circuits

References
© IEICE 2014
DOI: 10.1587/elex.11.20141002
Received October 20, 2014; Accepted November 11, 2014; Publicized November 28, 2014; Copyedited December 25, 2014
[1] S. K. Nayar and S. G. Narasimhan: Proc. Seventh IEEE International Conf. Computer Vision (1999) 820. DOI:10.1109/ICCV.1999.790306
[2] R. Kimmel and M. Elad: Int. J. Comput. Vis. 52 (2003) 7. DOI:10.1023/A:1022314423998
[3] S. G. Narasimhan and S. K. Nayar: IEEE Trans. Pattern Anal. Mach. Intell. 25 (2003) 713. DOI:10.1109/TPAMI.2003.1201821
[4] M.-J. Seow and V. K. Asari: Neurocomputing 69 (2006) 954. DOI:10.1016/j.neucom.2005.07.003
[5] S. G. Narasimhan and S. K. Nayar: Proc. IEEE Workshop Color and Photometric Methods in Computer Vision, in Conjunction with IEEE Int. Conf. Computer Vision (2003). http://www.cs.cmu.edu/~ILIM/publications/PDFs/NN-CPMCV03.pdf
[6] R. Fattal: Proc. ACM SIGGRAPH (2008). DOI:10.1145/1360612.1360671
[7] K. He, J. Sun and X. Tang: IEEE Trans. Pattern Anal. Mach. Intell. 33 (2011) 2341. DOI:10.1109/TPAMI.2010.168
[8] K. He, J. Sun and X. Tang: ECCV (2010) 1. DOI:10.1007/978-3-642-15549-9_1
[9] S. Rakshit, A. Ghosh and B. U. Shankar: Pattern Recognit. 40 (2007) 890. DOI:10.1016/j.patcog.2006.02.008
[10] J. Zhang, H. Liu, W. Hu, D. Liu and B. Zhang: IEICE Electron. Express 9 (2012) 765. DOI:10.1587/elex.9.765
1 Introduction
Images of outdoor scenes are usually degraded by the turbid medium (e.g., particles and water droplets) in the atmosphere due to atmospheric absorption and scattering [1]. The degraded images lose contrast and color fidelity, which directly affects the performance of computer vision applications in outdoor scenes, such as object recognition and tracking in outdoor video surveillance, highway monitoring, satellite remote sensing, aerial reconnaissance, etc. Therefore, haze removal (or dehazing) is highly desired in consumer/computational photography and computer vision applications to improve their reliability and robustness. In past studies, many haze removal methods have been put forward, and they can be roughly classified into two major categories. One is based on image enhancement, such as the retinex algorithm [2], contrast restoration [3], homomorphic filtering [4], etc. These methods heuristically assume a constant visual depth of the scene, which goes against the fact that the haze depends on the unknown depth of the scene, so they cannot remove the haze naturally as the scene depth varies. The other kind of method is based on the atmospheric physics model [5], which describes the imaging process of a hazy image. These methods have made significant progress in recent years by relying on stronger priors or assumptions. Fattal [6] estimates the albedo of the scene and the medium transmission under the assumption that the transmission and the surface shading are locally uncorrelated. This approach is physically sound and can produce impressive results. However, it cannot handle heavily hazy images well and may fail in cases where the assumption is broken. He et al. [7] proposed the dark channel prior, that is, most local patches in outdoor haze-free images contain some pixels whose intensity is very low in at least one color channel, and used it to estimate the depth information from the comparison between the hazy and the haze-free image. The dark channel prior is simple but effective for single image haze removal. Nevertheless, the transmission map estimated by the dark channel prior is rough and needs to be refined. In [7] a soft matting method is used to refine the transmission map by solving the matting Laplacian matrix, which has high computational complexity. In [8], He et al. proposed the guided filter and showed that the guided filter
is closely related to the matting Laplacian matrix. When the guided filter is used to refine the transmission map, the processing time can be reduced to about 1/250 of that of the soft matting method. So the haze removal algorithm using the dark channel prior and guided filtering becomes very practical and is widely studied. In this paper, we design an FPGA-based hardware accelerator for single image haze removal using the dark channel prior and guided filter. The design takes full advantage of the powerful parallel processing ability of the hardware and the parallelism of the algorithm. Firstly, we modify the dark channel calculation and the atmospheric light calculation of the algorithm to reduce the amount of computation. Secondly, we apply two pipelines in the guided filtering part to speed up the processing. In addition, a fast mean filtering technique is used to accelerate the mean filtering, which is the main computation task in the guided filter, by avoiding redundant computation. The design achieves real-time processing for 720 × 576 images. The haze removal result of the hardware accelerator is compared with that of the original algorithm implemented in MATLAB. Experiments show that they are almost the same.
2 Algorithm of single image haze removal using dark channel prior and guided filter
2.1 Atmospheric scattering model
The atmospheric scattering model [5] is widely used to describe the formation mechanism of a hazy image and for haze removal. It is simplified as Eq. (1):

    I(x) = J(x) t(x) + A (1 - t(x))    (1)
where I(x) is the observed intensity, J(x) is the scene radiance, A is the global atmospheric light, and t(x) is the medium transmission describing the portion of the light that is not scattered and reaches the camera. The goal of haze removal is to recover t, A, and J from I.

2.2 Estimating the transmission by dark channel prior and guided filter
The method uses the dark channel prior to estimate the transmission t(x). The dark channel prior is a statistical regularity of outdoor haze-free images: most local patches in outdoor haze-free images contain some pixels whose intensity is very low in at least one color channel [7]. For an arbitrary image J, its dark channel J^{dark} is given by

    J^{dark}(x) = \min_{y \in \Omega(x)} ( \min_{c \in \{r,g,b\}} J^c(y) )    (2)
where J^c is a color channel of J and \Omega(x) is a local patch centered at x. If J is an outdoor haze-free image, the intensity of J's dark channel is low and tends to be zero:

    J^{dark} \to 0    (3)
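For illustration, the following is a minimal NumPy sketch of the dark channel in Eq. (2); the function name and the default patch size are our own illustrative choices, not part of the hardware design.

import numpy as np

def dark_channel(J, patch=15):
    # Dark channel of Eq. (2): per-pixel minimum over color channels, followed by a
    # minimum over the local patch Omega(x). J: H x W x 3 array, values in [0, 1].
    h, w, _ = J.shape
    min_rgb = J.min(axis=2)
    pad = patch // 2
    padded = np.pad(min_rgb, pad, mode='edge')
    dark = np.empty((h, w), dtype=J.dtype)
    for y in range(h):                      # simple sliding-window minimum (unoptimized)
        for x in range(w):
            dark[y, x] = padded[y:y + patch, x:x + patch].min()
    return dark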
So when we apply the dark channel prior to Eq. (1), assuming that A is given, we have

    t(x) = 1 - I^{dark}(x) / A    (4)

But the transmission given by Eq. (4) is coarse and needs to be refined. The guided filter [8] is one of the fastest edge-preserving filters; it computes the filtering output of the input image by considering the content of a guidance image. The output image obtains sufficient edge detail from the guidance image and also retains the overall characteristics of the input image. The guided filter can be used to refine the transmission map in Eq. (4) as follows:

    \tilde{t}(x) = GF(t, I)    (5)
where GF is a guided filter with input image t and guidance image I. For the details of the guided filter, readers can refer to [8]. Here we give its pseudocode in Algorithm 1. In Algorithm 1, fmean is a mean filter. A naive mean filter calculates the sum of all pixels in a local patch every time and then computes the mean for the center pixel. In fact, the intermediate results obtained for the previous patch can be reused to calculate the next pixel's mean via a moving sum. This method is called the fast mean filter technique [9]; it reduces the number of additions to approximately 1/n of the original number, where n × n is the patch size, and it completely eliminates the division in favor of a shift operation if n is a power of 2. The pseudocode of the fast mean filter is given in Algorithm 2.

2.3 Estimating the atmospheric light
When estimating the atmospheric light, the top 0.1 percent brightest pixels in the dark channel are first picked [7]. These pixels are usually the most haze-opaque. Among these pixels, the pixel with the highest intensity in the input image I is selected as the atmospheric light.

2.4 Recovering the dehazed image
After the transmission map has been refined and the atmospheric light has been estimated, we can recover the haze-free image according to Eq. (1). But the term J(x)t(x) may be very close to zero when t(x) is close to zero, so the directly recovered dehazed image J is prone to noise. Therefore, the transmission t(x) is restricted to a lower bound t_0, i.e., a small amount of haze is preserved in very dense haze regions:

    J = (I - A) / \max(\tilde{t}, t_0) + A    (6)
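As a quick illustration of Eq. (6), the following is a minimal NumPy sketch of the recovery step; the default lower bound of 0.1 follows the setting used later in Section 3.3, and the function name is our own choice.

import numpy as np

def recover(I, t_refined, A, t0=0.1):
    # Eq. (6): clamp the transmission to a lower bound t0, then invert the scattering model.
    # I: H x W x 3 hazy image in [0, 1]; t_refined: H x W; A: scalar or length-3 vector.
    t = np.maximum(t_refined, t0)[..., np.newaxis]   # broadcast over the color channels
    J = (I - A) / t + A
    return np.clip(J, 0.0, 1.0)                      # keep the result in the valid range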
Algorithm 1. Guided Filter
Input: filtering input image p, guidance image I, patch width n, regularization ε
Output: filtering output image q
1: mean_I = fmean(I)
   mean_p = fmean(p)
   mean_II = fmean(I .* I)
   mean_Ip = fmean(I .* p)
2: var_I = mean_II - mean_I .* mean_I
   cov_Ip = mean_Ip - mean_I .* mean_p
3: a = cov_Ip ./ (var_I + ε)
   b = mean_p - a .* mean_I
4: mean_a = fmean(a)
   mean_b = fmean(b)
5: q = mean_a .* I + mean_b

Algorithm 2. 2D Fast Mean Filter Via Moving Sum
Input: image I of size h × w, patch width n
Output: output image P of size h × w
for row = 0 to (h-1)
    Store the sums of pixel grey levels of columns 0, 1, …, n-1 in s[0], s[1], …, s[n-1]
    Sum = s[0] + s[1] + … + s[n-1]
    P[row][0] = mean(Sum)
    j = 0
    for col = 1 to (w-1)
        Sum = Sum - s[j]
        s[j] = sum of pixel grey levels in column (col + n - 1)
        Sum = Sum + s[j]
        P[row][col] = mean(Sum)
        j = j + 1
        if j >= n
            j = 0
        end
    end
end
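To make Algorithms 1 and 2 concrete, the following is a minimal NumPy sketch of a grayscale guided filter built on a box mean computed from an integral image (the moving-sum idea of Algorithm 2). The function names and the cumulative-sum formulation are our own illustration, not the hardware implementation.

import numpy as np

def box_mean(img, n):
    # Mean over an n x n window using a 2D cumulative sum, avoiding redundant additions.
    pad = n // 2
    p = np.pad(img.astype(np.float64), pad, mode='edge')
    c = np.zeros((p.shape[0] + 1, p.shape[1] + 1))
    c[1:, 1:] = p.cumsum(axis=0).cumsum(axis=1)       # integral image with a zero border
    h, w = img.shape
    s = c[n:n + h, n:n + w] - c[:h, n:n + w] - c[n:n + h, :w] + c[:h, :w]
    return s / (n * n)

def guided_filter(p, I, n, eps):
    # Grayscale guided filter, following the steps of Algorithm 1.
    # p: filtering input image, I: guidance image (both H x W, float).
    mean_I, mean_p = box_mean(I, n), box_mean(p, n)
    mean_II, mean_Ip = box_mean(I * I, n), box_mean(I * p, n)
    var_I = mean_II - mean_I * mean_I
    cov_Ip = mean_Ip - mean_I * mean_p
    a = cov_Ip / (var_I + eps)
    b = mean_p - a * mean_I
    return box_mean(a, n) * I + box_mean(b, n)

In the haze removal context, p is the coarse transmission map and I is the gray scale map of the hazy image.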
3 The implementation of the hardware accelerator
From the description of the algorithm in Section 2, we can see that the major computation task lies in the guided filtering part, which contains several mean filtering operations. Fortunately, these mean filtering operations can be done simultaneously and accelerated by the fast mean filter technique. At the same time, the guided filtering can be sped up by the pipeline technique. In addition, because the algorithm involves a large amount of data storage and exchange, the data access scheme needs to be carefully designed. In this section, we describe the implementation of the hardware accelerator for the single image haze removal task, as shown in Fig. 1. The image is first obtained from a camera and written into DDR memory in RGB format. Then the hardware accelerator reads the image from the DDR memory. The processing can be divided into three steps:
Fig. 1. The hardware architecture of single image haze removal using the dark channel prior and guided filter. Different colors represent different major components of the algorithm. Red: algorithm input and output part; Blue: DDR memory; Yellow: coarse transmission map and atmospheric light estimation part; Green: refining the transmission using the guided filter; Orange: original image recovery part.
1) The gray scale map and the dark channel of the image are calculated simultaneously. The dark channel is used to estimate the atmospheric light and the coarse transmission map. The gray scale map is stored in a FIFO, and it will be used for the guided filtering.
2) The coarse transmission map is refined by guided filtering.
3) The refined transmission map, together with the atmospheric light estimated before, is used to recover the original image. After recovery, the image is stored back into the DDR memory.
In the rest of this section, we give the implementation of these three steps of haze removal in detail; a software-level sketch of the overall flow is given below.
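As a software-level reference of these three steps (not the hardware datapath), the following sketch combines the helpers sketched in Section 2; the regularization value eps and the default parameters are our own illustrative choices.

import numpy as np

def dehaze(img, patch=15, n=128, eps=1e-3, t0=0.1):
    # img: H x W x 3 hazy image in [0, 1]; assumes dark_channel, guided_filter and
    # recover as sketched above.
    dark = dark_channel(img, patch)                     # step 1: dark channel
    gray = 0.25 * img[..., 0] + 0.5 * img[..., 1] + 0.25 * img[..., 2]
    A = gray.flat[np.argmax(dark)]                      # atmospheric light, Eq. (8)
    t_coarse = 1.0 - dark / A                           # coarse transmission, Eq. (4)
    t_refined = guided_filter(t_coarse, gray, n, eps)   # step 2: refine with the guided filter
    return recover(img, t_refined, A, t0)               # step 3: recover the haze-free image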
3.1 Coarse transmission map, atmospheric light and gray scale map calculation
Firstly, the dark channel and the gray scale map of the input image are calculated. In [7], the dark channel of pixel x is defined as Eq. (2) and the transmission is estimated as Eq. (4). Since the coarse transmission map will be refined later, we simplify the calculation of the dark channel in our design. We adjust the patch size to 8 × 8 and, instead of a sliding window, we choose non-overlapping windows, so within each patch the dark channel has the same value. To show the efficiency of this adjustment, we compare our method with that in [7], where the patch size is 15 × 15 with a sliding window. For a 720 × 576 image, the latter requires 225 × 720 × 576 = 93,312,000 comparison operations to calculate the dark channel of the image, while our method needs only 64 × 90 × 72 = 414,720 comparison operations. The quantity of computation is greatly reduced
with almost the same haze removal effect as the original algorithm. The gray scale map of the input image is given by

    gray = 0.25 R + 0.5 G + 0.25 B    (7)

which is not exactly the same as the classical definition, but it makes almost no difference and is much easier to implement in hardware, because we can simply use shift (cut-off) operations instead of multiplications to reduce the resource utilization. Finally, the atmospheric light is simply set to the gray value of the pixel whose dark channel value is the largest:

    A = gray(\arg\max_x I^{dark}(x))    (8)
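The following is a minimal NumPy sketch of this simplified front end (shift-friendly gray map of Eq. (7), block-wise dark channel, atmospheric light of Eq. (8) and coarse transmission of Eq. (4)); the array layout and names are our own illustrative assumptions.

import numpy as np

def coarse_front_end(img):
    # img: H x W x 3 uint8 hazy image, with H and W assumed to be multiples of 8.
    h, w, _ = img.shape
    # Gray map, Eq. (7): the coefficients 0.25 / 0.5 / 0.25 reduce to shifts in hardware.
    r = img[..., 0].astype(np.uint16)
    g = img[..., 1].astype(np.uint16)
    b = img[..., 2].astype(np.uint16)
    gray = ((r >> 2) + (g >> 1) + (b >> 2)).astype(np.uint8)

    # Block-wise dark channel: minimum over color and over each non-overlapping 8x8 patch.
    min_rgb = img.min(axis=2)
    blocks = min_rgb.reshape(h // 8, 8, w // 8, 8)
    dark_blocks = blocks.min(axis=(1, 3))                     # one value per 8x8 patch
    dark = np.kron(dark_blocks, np.ones((8, 8), np.uint8))    # expand back to full resolution

    # Atmospheric light, Eq. (8): gray value where the dark channel is largest.
    idx = np.unravel_index(np.argmax(dark), dark.shape)
    A = max(int(gray[idx]), 1)

    # Coarse transmission, Eq. (4).
    t = 1.0 - dark.astype(np.float64) / float(A)
    return gray, dark, A, t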
Although several approximations are exploited, they make almost no difference to the haze removal results. The hardware architecture for this part is shown in Fig. 2.
Fig. 2. The hardware architecture of the processing procedure for calculating the gray scale map, coarse transmission map and atmospheric light.
In addition, because the gray value of a pixel can be obtained after 2 clock cycles while the dark channel needs 2 + 64 = 66 clock cycles, a FIFO for buffering the gray values is necessary to keep them synchronized. The FIFO is used as a ping-pong buffer, and its architecture is shown in Fig. 3.
3.2 Refining the transmission map by guided filter
As the coarse transmission map and the gray value map are calculated, each is stored into a ping-pong buffer with a data width of 72 bits and a memory depth of 512. The 72 bits contain 8 data words in 9-bit format (the highest bit is the sign bit). When the buffers are full, the data within them are exported to another two buffers, which are used to buffer the input data or intermediate data of the guided filtering, as shown in Fig. 1. These two buffers contain 16 banks
Fig. 3. Ping-pong buffer for buffering the coarse transmission map and the gray value map.
respectively. The data width of each bank is the same as that of the ping-pong buffer. In our design, four fast mean filter modules are implemented for the guided filtering. Here, we divide the guided filtering into two steps: 1) In the first step, mean_I, mean_p, mean_II and mean_Ip are calculated by the four fast mean filter modules simultaneously to obtain a and b, which are then written into DDR. 2) a and b are read back from DDR and fed to the ping-pong buffer. Then two of the four fast mean filter modules are reused to calculate mean_a and mean_b, which are finally used to refine the transmission map as shown in Algorithm 1. The multipliers here are implemented with the IP cores provided by Xilinx. The architecture of the fast mean filter is shown in Fig. 4. The patch size of the mean filter is 128 × 128. As described before, every row of each buffer can store 8 × 16 = 128 data, which exactly matches the patch width, so we can feed data to the fast mean filters conveniently. The divider is implemented by the CORDIC method [10]. The whole processing procedure is sped up by the pipeline technique. Fig. 5 shows the flow of the first step.
Fig. 4. The scheme of the fast mean filter. The register outputs the result of the mean filter for the current row after 128 clock cycles. When calculating the mean of the next row, the FIFO is reset to restart the mean filter.
Fig. 5. The first step of the guided filter, which is sped up by the pipeline technique.
3.3 Recovering the haze-free image
With the atmospheric light obtained in Section 3.1 and the refined transmission map obtained in Section 3.2, the haze removal can be done by Eq. (6). Fig. 6 shows the haze removal module of the design. The threshold value t_0 is set to 0.1. The R, G and B channels are processed simultaneously. Here, the pipeline technique is applied as well: as soon as each pixel of mean_a and mean_b is generated, it is used to recover the value of the corresponding pixel.
4 Experiments and results
The whole design is implemented on the ZedBoard, which is based on the Xilinx All Programmable SoC Zynq-7000 that tightly integrates a high-performance ARM Cortex-A9 with programmable logic. The ZedBoard is chosen for the needs of our project, and we use only its FPGA part. The development environment is Xilinx ISE 14.4, with ModelSim SE 10.1c integrated as the
Fig. 6. Recovering the haze-free image from the refined transmission map and the atmospheric light.

Table I. FPGA resource requirements for haze removal
resource type      resource used    total available    ratio
Slice              3980             13300              29%
Slice LUT          10404            53200              19%
Slice Register     10897            106400             10%
BRAM36E1           38               140                27%
DSP48E1            75               200                34%
simulation tool. In addition, a simulation of the original algorithm implemented in MATLAB is done to compare the haze removal results. The resource requirements of the design are given in Table I. All memories, including the FIFOs of the mean filters, are mapped to BRAM resources, while the multipliers are mapped to DSP resources. In total, the hardware accelerator needs 29% of the Slices, 27% of the BRAM and 34% of the DSP resources of the ZedBoard. The main computation task lies in the fast mean filtering part. The patch size of the mean filter is set to 128 × 128, which determines the number of memory and multiplier units of the guided filter; if we reduce the patch size, a large amount of FPGA logic resources can be saved. Considering that the maximum clock frequency of the DSP can be configured to 450 MHz, we set the frequency of the accelerator to 100 MHz and that of the DSP logic IP to 400 MHz. Fig. 7 gives the haze removal results for three scenes. We can see that our design achieves almost the same results as the original algorithm, and the details of the scene obscured by haze are recovered admirably. In Table II, we give the difference between the two haze-free images obtained by the original algorithm and the hardware accelerator. As shown, the error between the two images is very small. In our experiments, it takes about 4 seconds to process a 720 × 576 image in MATLAB on a PC with a 3.2 GHz Intel i5 processor, but only about 13.74 ms in our hardware accelerator operating at 100 MHz. So our design achieves real-time processing for the haze removal task.
Fig. 7. Haze removal results for three scenes. First row: input hazy images. Second row: refined transmission maps by the original algorithm. Third row: refined transmission maps by our hardware accelerator. Fourth row: recovered images by the original algorithm. Fifth row: recovered images by our hardware accelerator.

Table II. The analysis of the error between Fig. 7 (fourth row) and Fig. 7 (fifth row)

Group    maximal error    minimal error    mean error in channel R    G         B
(1)      0.6388           0                0.0548                     0.0520    0.0480
(2)      0.3937           0                0.0239                     0.0257    0.0275
(3)      0.3403           0                0.0811                     0.0812    0.0818
5 Conclusion
In this paper, we design an FPGA hardware accelerator for single image haze removal based on the dark channel prior and guided filter. We modify the calculation of the dark channel and the atmospheric light so that the algorithm can be implemented in hardware more easily and the computation is greatly reduced, while making almost no difference to the results of the original algorithm. Fast mean filtering and pipeline techniques are applied to speed up the processing. The hardware accelerator is able to process about 80 hazy images (720 × 576) per second, which meets the requirement of real-time processing.