Cost-effective VLSI Design of Non Linear Image Processing Filters

*Sergio Saponara, *Michele Cassiano, **Stefano Marsi, *Riccardo Coen, *Luca Fanucci
*DIIEIT, University of Pisa, via G. Caruso, I-56122, Pisa, Italy
**DEEI, University of Trieste, via A. Valerio 10, I-34127, Trieste, Italy
Email: {sergio.saponara, michele.cassiano, riccardo.coen, luca.fanucci}@iet.unipi.it, [email protected]

Abstract

This paper presents a design methodology suitable for the cost-effective and real-time implementation of nonlinear image processing algorithms. Starting from high-level functional descriptions, the proposed optimization flow simplifies the designer's duty to achieve a low complexity and low power realization in CMOS technology (FPGA and/or ASIC) with low accuracy loss for the implemented algorithm. As an application case study, the paper describes the design of a system, based on a Retinex-like algorithm, to improve the visual quality of images acquired in bad lighting conditions.
1. Introduction

As proved in the literature [1-10], the use of non linear operators in image processing allows for a higher visual quality than classic linear solutions such as FIR and IIR filtering structures. To avoid artifacts like image blurring, overshoot and undershoot, non linear operators have been proposed for several tasks such as denoising, deblocking, image enhancement, luminance correction, dynamic range compression and interpolation. The application fields include video communication and entertainment, video-surveillance and tele-assistance in health care, military and security systems, ambient intelligence, and active safety in automotive. Non linear functions are usually implemented in real time on DSP platforms, or they can be executed off-line on a PC. However, the software-based approach is not suitable for applications such as battery-powered terminals or single-chip embedded systems, where real-time operation, low complexity and low power are required. The above specifications can be met by designing VLSI macrocells targeted, through a semi-custom flow, at FPGA or ASIC technology. To reduce the high development time and cost of the VLSI approach, a design flow tailored to the cost-effective hardware implementation of non linear algorithms has to be devised. The main problems to be faced are (i) the definition of precise criteria to linearize non linear operators and (ii) the definition of the machine arithmetic. A rapid exploration of such a design space is desirable and an intellectual property (IP) reuse approach has to be followed for the development of the VLSI cells. Linearization methodologies for the low-complexity implementation of non linear functions in embedded systems have been studied in the literature [11,12]. The
proposed solutions address basic non linear functions, e.g. a single operator such as a sine or a logarithm, while complex processing systems combine several non linear operators in the same algorithm. Furthermore, the known works [11,12] refer to one-dimensional data structures and adopt only objective criteria to drive the linearization process. As will be further discussed in the paper, both subjective and objective criteria are required when optimizing bidimensional image processing tasks, to avoid the appearance of visible artifacts on the final output. With reference to the implementation of Retinex non-linear algorithms [2-9], this paper presents a design methodology to efficiently cope with the above issues. Our target is achieving a cost-effective and flexible VLSI implementation while keeping the same performance as the original, infinite precision non linear algorithm. Hereafter, Section 2 describes the case study of Retinex-like non linear operators. Section 3 introduces the overall optimization flow. Sections 4 and 5 detail the linearization phase and the definition of the machine arithmetic. In Section 6 the simulation and implementation results obtained for the case study validate the proposed methodology. Conclusions are drawn in Section 7.
Figure 1. Block diagram of Retinex-based operators
2. Retinex-like Non Linear Algorithms

In the Retinex theory, first proposed in [2], an image is expressed as the pixel-by-pixel product of the ambient illumination y and the reflectance r of the scene objects. Based on Retinex, several non-linear operators have been proposed in the literature for image contrast enhancement, correction of images acquired in bad lighting conditions, and dynamic range control in logarithmic sensors [3-9]. All these filters exploit the similar structure sketched in Fig. 1: a nonlinear edge-preserving low-pass filter F (e.g. a Recursive Rational Filter is adopted in [4]) is used to estimate the illumination y. Then the reflectance information r is obtained by division. These two components are split into different elaboration chains which operate non linear point-to-point transformations, e.g. luminance correction Γ and reflectance enhancement β in [3,4]. After the Γ transformation
the illumination component is linearly stretched to cover the whole input range (0÷255 in case of 8-bit input pixels). Eventually the two components are recombined by multiplication to produce the output of the system. In case of inputs from logarithmic sensors, the division and multiplication in Fig. 1 are replaced by subtraction and sum, respectively. As an example, Fig. 2a shows a portion of an image acquired in bad lighting conditions. Applying classical histogram equalization leads to the result visualized in Fig. 2b: while the image gets clearer, detail blurring comes up. The Retinex algorithm in [3,4], instead, obtains the result in Fig. 2c, solving the problems of image contrast and brightness together.

Figure 2. a) Original image, b) histogram equalization, c) Retinex

While the block diagram in Fig. 1 seems simple, the description of each block involves the use of mathematical artifices. In Fig. 3, the Γ curves for three different values of the parameter γ are displayed. In Fig. 4, the β transformation is shown for three values of the parameter b. The exact equations for the Γ and β operators are:
Figure 3. Γ transformation (luminance correction)
Γ(y) = 255 · (y/255)^(γ·(1 + y/255))   and   β(r) = 2 / (1 + e^(−b·log r))    (1)

The low-pass filter F is based on a recursive configuration whose coefficients are non-linear functions of the input pixel in(n,m) and of its neighbours within a 3×3-sample mask:

y(n,m) = [So · fo + Sv · fv + in(n,m)] / (So + Sv + 1)    (2)

where:

fo = α · y(n−1,m) + (1−α) · in(n,m)    (3)
fv = α · y(n,m−1) + (1−α) · in(n,m)    (4)
So = 10^−2 / {10^−5 + [log10((in(n−1,m)+1) / (in(n+1,m)+1))]^2}    (5)
Sv = 10^−2 / {10^−5 + [log10((in(n,m−1)+1) / (in(n,m+1)+1))]^2}    (6)

Figure 4. β transformation (contrast enhancement)
The parameter α controls the cut-off frequency. In Fig. 5, So and Sv are plotted on a logarithmic scale. From now on, the logarithm argument in equations (5)-(6) will be referred to as X. Non linearity plays its role both in the Γ and β transformations and in the expressions of the filter coefficients. It is immediately evident that the Γ and β operators produce easily visible effects on the system output, while the effects of So and Sv are less visible to the observer. They are used, indeed, to limit the overshoot and halo effects generated when the input image presents sharp luminance variations. Thus their approximation does not greatly influence the final image quality. This means that, when trying to reduce the design complexity, the implementation loss has to be minimized in particular for Γ and β. Similar problems have to be faced in other kinds of image processing applications where non linear algorithms have been successfully applied [1-10].
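For illustration purposes, the point transformations of equation (1) and one step of the recursive filter of equations (2)-(6) can be transcribed in a short structural C sketch; all function names, the row-major image layout and the width constant W are illustrative assumptions, not the authors' implementation.

#include <math.h>

#define W 352   /* image width (CIF), illustrative */

/* Point transformations of eq. (1): y is the illumination (0..255),
   r the (strictly positive) reflectance. */
double gamma_map(double y, double g) {
    return 255.0 * pow(y / 255.0, g * (1.0 + y / 255.0));
}
double beta_map(double r, double b) {
    return 2.0 / (1.0 + exp(-b * log(r)));
}

/* Edge-sensitivity coefficient of eqs. (5)-(6): small across sharp
   luminance transitions (large |log10 X|), large on flat areas. */
static double edge_coeff(double a, double c) {
    double lg = log10((a + 1.0) / (c + 1.0));   /* log10 of the argument X */
    return 1e-2 / (1e-5 + lg * lg);
}

/* One output sample of the recursive rational filter F, eqs. (2)-(6);
   in and y are row-major images of width W, alpha sets the cut-off. */
double filter_sample(const double *in, const double *y, int n, int m,
                     double alpha) {
    double So = edge_coeff(in[(n - 1) * W + m], in[(n + 1) * W + m]);
    double Sv = edge_coeff(in[n * W + m - 1], in[n * W + m + 1]);
    double fo = alpha * y[(n - 1) * W + m] + (1.0 - alpha) * in[n * W + m];
    double fv = alpha * y[n * W + m - 1]  + (1.0 - alpha) * in[n * W + m];
    return (So * fo + Sv * fv + in[n * W + m]) / (So + Sv + 1.0);
}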
Figure 5. So (Sv) transformation vs. the logarithm argument X
3. Proposed Methodology

This Section describes the proposed design steps to rapidly achieve a cost-effective implementation of non linear filtering structures. The optimization flow is sketched in Fig. 6. The first part of the design is developed in a C/Matlab environment. In this stage the design space is explored to achieve a linearized structural C model of the filter with finite precision arithmetic. Starting from the structural C model, a parametric VLSI architecture (an IP macrocell modelled through VHDL) is derived and synthesized in physical resources, in both FPGA and ASIC technologies. Most of the designer's effort lies in the first part, where each choice has to be made foreseeing how it will influence the final result in terms of complexity and implementation loss. This first step is split into two stages: linearization of the non linear operators and, subsequently, bit-true arithmetic definition. The final aim of this process is an optimized model which faithfully describes in a high-level language how the implemented system will behave. This model is constantly compared to the original algorithm, because a low implementation loss is a fundamental design goal. Some criteria are therefore needed to evaluate the faithfulness of the optimized model vs. the original algorithm. Two different criteria are proposed: an objective criterion requiring a PSNR (Peak Signal-to-Noise Ratio, widely used in the image processing research community) higher than 30 dB, and a subjective one based on visual perception and comparison of the relevant outputs. For good optimization results both criteria have to be applied. In Sections 4 to 6 the proposed design flow is applied to the Retinex-like operator described in Section 2. The design of the architecture in Fig. 1 can be split into the design of two sub-modules: one concerning the generation of the y and r images by the recursive filter F plus the divider, and one concerning the elaboration of the two separated channels by the Γ and β blocks plus the final recombination.
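As a reference for the objective criterion, the PSNR between the original (infinite precision) output and the optimized model output can be computed as in the following C sketch; 8-bit data and the function name are illustrative assumptions.

#include <math.h>

/* PSNR (dB) between the reference and the optimized 8-bit outputs,
   both of npix pixels; peak value 255 assumed for 8-bit data. */
double psnr_db(const unsigned char *ref, const unsigned char *opt, int npix)
{
    double mse = 0.0;
    for (int i = 0; i < npix; i++) {
        double d = (double)ref[i] - (double)opt[i];
        mse += d * d;
    }
    mse /= npix;
    return 10.0 * log10(255.0 * 255.0 / mse);   /* > 30 dB required */
}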
Figure 6. Optimization flow
4. Linearization of Non-linear Operators

4.1 General considerations
Figure 7. Examples of linear piece-wise (left) and constant piece-wise (right) approximation
The linearization phase is the key step in the proposed flow. The constraints for the final implementation are low circuit complexity and small performance loss. A limited quantity of involved resources does not allow a faithful implementation of the original algorithmic description, and hence the accuracy loss is high; on the contrary, a lot of physical resources are required for the original algorithm to be faithfully reproduced. So an optimal trade-off has to be found. For a rapid exploration of the design space, valid for any application case, we consider two basic linearization approaches: linear piece-wise and constant piece-wise. Fig. 7 shows an example of these two approximations with reference to the Γ transformation for a fixed parameter value (γ = 0.25). Both cases are suitable for approximating a non linear transformation from an input variable to an output, even making provision for the presence of a parameter changing the function shape (e.g. the Γ and β graphs in Figs. 3, 4). The presence of an edge at a particular abscissa value in Fig. 7 indicates the crossing from one piece to another, that is, a change of the approximate output expression. This established, the designer's effort consists in choosing how many pieces the piece-wise has to be composed of and in distributing the edges on the abscissa axis according to an appropriate law. To find the optimal edge-distribution law, a statistical analysis of the visual signal is useful: visualizing some image histograms, also at intermediate levels of elaboration, suggests the zones where a higher precision is needed, that is, where to accumulate the majority of the edges. Note that the algorithmic performance loss is determined by both the number and the position of the edges, while the final circuit complexity is mainly determined by the edge number. Indeed, in both approaches (linear and constant piece-wise) it is necessary to evaluate in advance, according to the value of a particular pixel input, which is the zone of the
piece-wise involved in the transformation. This requires a comparison with the edge values, implemented by a comparator array; thus, the higher the number of edges, the higher the comparator-stage complexity. For an exhaustive complexity analysis the rest of the resources have to be considered as well. Let NC be the number of pieces of the constant piece-wise, NL the number of pieces of the linear piece-wise, and K the number of possible shape parameters; expressing in round brackets the size of the involved look-up tables (LUT), ROMs and comparator arrays (comp), we can estimate the complexity of the two approaches. In the constant piece-wise approach, NC comparators are needed plus a look-up table containing the NC × K output values of the constant piece-wises; the LUT is addressed by the comparator output and by the selected parameter (Fig. 8a). Therefore the implementation cost C1 of the constant piece-wise approach can be expressed as:

C1 = comp(NC) + LUT(NC × K)    (7)

In the linear piece-wise approach, besides NL comparators, some other computational resources are needed. It is necessary to store in a ROM memory all the parameters characterizing the segments that set up the piece-wises, that is, NL × K slopes and NL × K offsets. The ROM is addressed by the output of the comparator array and by the parameter (Fig. 8b). Moreover, a multiplier and an adder (Mul&Add) are needed to execute the product of the slope by the input and to add the offset to this product. The implementation cost C2 of the linear piece-wise approach can be expressed as:

C2 = comp(NL) + ROM(2 × NL × K) + 1 Mul&Add    (8)

Looking at Fig. 7, it is easy to realize that, setting NL equal to NC, the linear piece-wise case allows for a better faithfulness. Simulation results for several test images and different non-linear filtering structures proved that, to obtain the same PSNR values, it is typically necessary to set an NC value 3 times higher than NL. Working under these conditions, the LUT size is nearly equivalent to the ROM size and the comparator logic overhead in the first case counterbalances the need for a multiplier and an adder in the second one. Thus the complexity analysis leads to similar results in the two cases. Since the complexity results for equal PSNR values are comparable, another aspect has to be considered for the selection of the most suitable linearization approach. A subjective analysis of the visual results suggests that a constant piece-wise approach is inadvisable for the transformation of quantities directly observable at the output of the system, like the illumination and reflectance transformations Γ and β in the proposed case study: for equal PSNR values, the visual quality of the constant piece-wise approach is worse than that of the linear piece-wise one. As shown in Figs. 9-10, using the constant piece-wise approach to approximate Γ (9a) or β (10a) causes bad effects on the output image. In the first case some round halos due to the output quantization of the illumination component are observable; in the second one, the quantization of the reflectance component causes a sort of granular noise to appear. The linear piece-wise approach (9b and 10b) guarantees better quality results in both cases. Instead, the constant piece-wise approach is advisable when the non linear transformation involves quantities not directly observable at the system output, like the filter coefficients depending on So and Sv.
Figure 8. Block diagram for the two approaches
Figure 9. a) Γ with constant piece-wise approach; b) Γ with linear piece-wise approach (quantization noise disappears in b). The image is originally noisy
Figure 10. a) β with constant piece-wise approach; b) β with linear piece-wise approach (granular noise disappears in b)
In this case the quantization does not produce noticeable effects on the system output: a worse approximation can be accepted and hence, adopting the constant piece-wise, the implementation cost is lower.
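For illustration, the two datapaths of Fig. 8 can be emulated in the structural C model as below; the table layouts and names are illustrative assumptions, with the NC or NL pieces selected by a comparator array as in equations (7)-(8).

/* Comparator array: index of the piece containing x
   (np pieces separated by np-1 edges). */
static int piece_index(double x, const double *edge, int np)
{
    int i = 0;
    while (i < np - 1 && x >= edge[i])
        i++;
    return i;
}

/* Constant piece-wise (Fig. 8a): one LUT access, cost C1 of eq. (7);
   k selects one of the K shape parameters. */
double const_pw(double x, int k, const double *edge, int NC,
                const double *lut)                 /* NC x K output values */
{
    return lut[k * NC + piece_index(x, edge, NC)];
}

/* Linear piece-wise (Fig. 8b): ROM of slopes/offsets plus one
   multiplier and one adder, cost C2 of eq. (8). */
double linear_pw(double x, int k, const double *edge, int NL,
                 const double *slope, const double *offset)  /* NL x K each */
{
    int i = k * NL + piece_index(x, edge, NL);
    return slope[i] * x + offset[i];
}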
4.2 Linearization of Retinex-like Non linear Operators

The general linearization strategy described in Section 4.1 is applied hereafter to the case study of the Retinex-like operator (see Section 2). The final optimized algorithm has to respect the PSNR specification of at least 30 dB. To respect such a specification, we have to consider that part of the implementation loss is due to the approximation of the first sub-module (recursive rational filter F plus divider) and another part is due to the approximation of the second sub-module (Γ and β blocks). After linearization, the step of bit-true arithmetic definition further increases the implementation loss. Therefore, when linearizing each of the two sub-modules, we have to respect a safety margin of at least 5 dB, that is, a total of 35 dB of PSNR. This margin is needed because, while optimizing each sub-module, the other is supposed to be ideal, that is, behaving exactly as theoretically described. As far as the approximation of the filter F is concerned, the optimization of the So and Sv transformations has to be addressed. In this case, the constant piece-wise approach has been used for the reasons exposed in Section 4.1. Considering the mathematical definition of these two functions, it is easy to verify that, representing them on a logarithmic scale, they are symmetric functions of the argument X, as confirmed by the graph plotted in Fig. 5. This observation suggests that just one side of the curve needs to be implemented: for input values smaller than 1 the transformation can be accomplished by the right side of the curve, once the input has been replaced by its reciprocal. Next, the total number of edges and the relevant distribution law have to be decided. In this stage, the local slope of the curve and the results of a statistical analysis have been taken into consideration. Referring to an image representing a white swan on a dark background (Fig. 2), the histogram of the So (Sv) matrix has been plotted for the input range X ∈ [1,2]. This choice is due to the fact that computer simulations proved that only a small percentage (5%) of the inputs assumes values greater than 2. In Fig. 11 the histogram is plotted for the reference image, dividing the range [1,2] into 8 equal parts; similar results have been obtained for other test input videos. The distribution law for the edges has been decided according to a precise criterion: they have to be placed so that every input gap between two consecutive edges has roughly the same probability of occurrence. For instance, since the first bin has a probability of occurrence ten times greater than the fourth bin, it will contain ten times more edges than the fourth. As far as the total number of edges is concerned, Fig. 12 plots the PSNR results as a
function of the number of edges used in the constant piece-wise technique. As a comparison, the dotted line in Fig. 12 represents the PSNR results obtained with an 8-edge linear piece-wise solution. Using at least 23 edges for the constant piece-wise technique achieves equal or better PSNR results than the 8-edge linear piece-wise one (37.5 dB, with a 7.5 dB margin vs. the 30 dB target). For the second system sub-module (luminance and reflectance processing by the Γ and β blocks) the linear piece-wise approach has been used, because its results are directly observable at the output of the system, as discussed in Section 4.1 and shown in Figs. 9, 10. The total PSNR results from the joint effect of the Γ and β approximations; obviously the least precise approximation is the one which limits the PSNR value the most. The illumination information has a greater impact on the observer than the reflectance, so a coarser approximation can be used for the β transformation, provided that the function of detail enhancement is accomplished. This means forcing the β block to be the limiting one for the PSNR value, which must still respect the overall 35 dB specification. The Γ approximation has been performed first, keeping β at its infinite precision description.
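The equal-probability placement criterion described above can be sketched in C as follows; the histogram layout, the bin-to-abscissa mapping (x0, dx) and all names are illustrative assumptions.

/* Place ne edges so that each of the ne+1 resulting intervals of the
   histogram hist[0..nbins-1] carries (roughly) equal probability mass. */
void place_edges(const unsigned *hist, int nbins, double x0, double dx,
                 int ne, double *edge)
{
    unsigned long total = 0, acc = 0;
    int b = 0;
    for (int i = 0; i < nbins; i++)
        total += hist[i];
    for (int e = 1; e <= ne; e++) {
        unsigned long target = (unsigned long)e * total / (ne + 1);
        while (b < nbins && acc < target)
            acc += hist[b++];
        edge[e - 1] = x0 + b * dx;   /* quantile position of edge e */
    }
}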
Figure 11. Histogram of So (Sv) matrix in the input range X∈[1,2]
Figure 12. PSNR (dB) vs. number of edges in the So (Sv) approximation by constant piece-wise approach
Figure 13. PSNR (dB) in Γ approximation by linear piece-wise approach
Figure 14. PSNR (dB) in β approximation by linear piece-wise approach

Then the approximation of β has been carried out using the linearized Γ function. The Γ optimization metric has been high accuracy, to the detriment of complexity. Fig. 13 shows, for the Γ linearization, the PSNR results as a function of the γ parameter for different choices of the edge number and distribution law. Varying γ, the shape of the Γ transformation changes and the approximation degree of the piece-wise changes too. First, using the same number of edges (e.g. 11 in Fig. 13), we changed the distribution law. A greater concentration of edges near the axis origin is useful because in this zone the Γ transformation changes more rapidly and is more sensitive to parameter variations (see Fig. 3), so a higher precision is needed there. Once discovered that the cubic law is the most suitable, we increased the edge number and observed saturation over 16 edges. So the choice was to implement 16 edges in cubic progression. Similar PSNR values can be achieved with a 40-edge constant piece-wise approach, but with the result suffering from the bad effects shown in Figs. 9a and 10a. The β optimization metric has been low complexity, to the detriment of accuracy. Fig. 14 plots the PSNR results as a function of the parameter b for four different implementations of the β block. These PSNR values have been obtained adopting for Γ the aforesaid 16-edge cubic approximation. Because of the law describing β, we chose an exponential progression as the edge distribution law. As reported in Fig. 14, the least complex solution respecting the 35 dB specification is the one involving 3 edges. Similar PSNR values can be achieved with a 10-edge constant piece-wise approach, but with the result suffering from the bad effects shown in Figs. 9a and 10a. Note that the PSNR results in Figs. 12 to 15 are averaged on several test inputs (acquired in different lighting conditions, see for example Figs. 2, 9 and 10), leading to the same conclusions for the linearization process. The same procedure has been followed for the divider and the linear stretching blocks. Also in these cases a statistical analysis helped to optimize the edge placement, and a LUT-based constant piece-wise approach turned out to be the best one in both cases.
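The cubic progression adopted for Γ and the exponential progression adopted for β can be generated, for instance, as in the following hypothetical C helpers (the index-to-abscissa mappings are illustrative assumptions):

#include <math.h>

/* ne edges in cubic progression over [0, xmax]: denser near the
   origin, where Γ changes more rapidly. */
void cubic_edges(int ne, double xmax, double *edge)
{
    for (int i = 1; i <= ne; i++) {
        double t = (double)i / (ne + 1);
        edge[i - 1] = xmax * t * t * t;
    }
}

/* ne edges in exponential progression over [xmin, xmax] (xmin > 0),
   used for the β transformation of the reflectance. */
void exp_edges(int ne, double xmin, double xmax, double *edge)
{
    for (int i = 1; i <= ne; i++) {
        double t = (double)i / (ne + 1);
        edge[i - 1] = xmin * pow(xmax / xmin, t);
    }
}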
5. Bit-true Arithmetic Definition
The linearization process described in Section 4 has been carried out using a float type for the numbers involved, so the approximated functions are also executed in floating point arithmetic with 64-bit precision. Translating these operators into physical resources, a further implementation loss has to be faced: low complexity hardware cannot support floating point operations with such a precision, while fixed point arithmetic can be implemented in low complexity hardware macrocells much more easily. So the model chosen for the arithmetical representation is fixed point, where the integer part uses the minimum number of bits for a correct representation of the signal dynamic (in our case study 2 bits for reflectance and 8 bits for luminance). Negative numbers are represented in two's complement convention, requiring 1 sign bit. Furthermore, an N-bit fractional part is used. N should be determined according to a trade-off between the complexity of the arithmetical operators and the processing accuracy: the higher N, the smaller the implementation loss, but at the expense of an increased complexity. We solved this problem by simulating the real system operative mode through a structural C model where a floating point value is transformed to a fixed point value by cutting off the least significant bits. This approach is based on the predefined C function floor, returning the integer part of a float input. The operation carried out on every input and output of the arithmetical operators is the following:

out = floor(in · 2^N) / 2^N

where N is the number of fractional bits used for the fixed point representation. Fig. 15 plots the PSNR result of the proposed Retinex-like operator, averaged on several test inputs, as a function of the number of fractional bits.
When increasing the number of fractional bits, at the beginning a proportional increase of the PSNR performance is achieved. After a cut-off value the graph saturates at a fixed PSNR result (32 dB in Fig. 15). This is because the piece-wise approximation error prevails and makes a further improvement of the arithmetic precision useless. This observation can be adopted as a practical methodology to define the arithmetic of the implemented system. Generally speaking, the cut-off value should be the solution of the trade-off between complexity and precision; in our case study 5÷6 bits is a right choice. Besides this general procedure, other considerations can be made. For example, the precision of some quantities can be differentiated to lower the complexity while keeping the same processing accuracy. Particularly, we analyzed the possibility of turning the edge precision down to a few fractional bits. In fact, there is no need to strictly fix the edge values: they can range over a small gap without making the approximation worse. This way the complexity of the comparators can be reduced while keeping a good approximation of the implemented algorithm with respect to the original one. In the considered case study our final choice has been a 6-bit precision for the fractional part of every involved quantity, except for the edges, which are represented with a 2-bit precision.
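In the structural C model, the quantization rule above can be wrapped in a small helper applied at every operator input and output; a minimal sketch with illustrative names:

#include <math.h>

/* Emulate fixed point with N fractional bits: out = floor(in * 2^N) / 2^N */
double fix_pt(double in, int N)
{
    double scale = (double)(1 << N);
    return floor(in * scale) / scale;
}

/* In the case study: 6 fractional bits for data, 2 for the edges, e.g.
   double y_q    = fix_pt(y, 6);
   double edge_q = fix_pt(edge, 2);  */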
Figure 15. PSNR (dB) vs. the number of fractional bits
6. Architecture Design and Synthesis

After the linearization and bit-true arithmetic definition of Sections 4 and 5, the optimization flow goes on with the architectural design in an HDL (typically VHDL/Verilog). The more accurate the C optimization has been, the easier the phase of HDL architecture description is: a detailed description of the hardware in C language shortens the architectural design time. Furthermore, an accurate tuning of the optimized algorithm in the C environment increases the chances of success in the verification step; while tuning in the C environment is a very fast process, HDL simulation is several times slower. It is important to notice that, using the constant or linear piece-wise linearization approach, the whole architecture is easily implementable with few look-up tables, comparators and Mul&Add structures, whose operand sizes have been determined in the machine arithmetic design step. As an example, Fig. 16 shows the block diagram of the recursive rational filter (see Section 2) that estimates the ambient illumination y from the input image. This is undoubtedly the most limiting part in the complexity optimization process, because it is the block requiring the largest quantity of computational resources. While the logic synthesis is made agile by the above approach, storage represents the major constraint for the implementation: some matrices are necessary to keep the information at the intermediate processing levels. In Table 1 the RAM required to implement the Retinex-like operator of Section 2 is quoted for different frame formats. In Table 2 the timing requirements are specified as the minimum clock frequencies the system has to work at in order to elaborate a still image in 1 second. The architecture of the whole Retinex-like algorithm has been synthesized, within the Xilinx ISE environment, on a low-cost FPGA device such as the Spartan-3 xc3s1000.
Figure 16. Architecture of the recursive rational filter

Table 1. RAM for storage of different formats

Format              RAM
QCIF (176 × 144)    86.7 KB
CIF (352 × 288)     347 KB
VGA (640 × 480)     1.03 MB
SDTV (720 × 576)    1.38 MB

Table 2. Clock frequencies to elaborate different formats in 1 second

Format              fclock
QCIF (176 × 144)    2.10 MHz
CIF (352 × 288)     8.41 MHz
VGA (640 × 480)     25.5 MHz
SDTV (720 × 576)    34.4 MHz
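The frequencies in Table 2 scale linearly with the frame size, corresponding to roughly 83 clock cycles per pixel; under this assumption (our inference from the table values, not a figure stated explicitly here), the minimum clock for a generic format can be estimated as in the following sketch:

/* Minimum clock (Hz) to process one width x height frame per second,
   for a given per-pixel cycle cost (about 83 inferred from Table 2). */
double min_clock_hz(int width, int height, double cycles_per_pixel)
{
    return (double)width * height * cycles_per_pixel;
}

/* Example: min_clock_hz(352, 288, 83.0) is about 8.41 MHz (CIF, Table 2). */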
Figure 17. Power (mW) of the Retinex FPGA design processing 1 image/s vs. image format

Positive results have been obtained for the logic complexity: only 39% of the device is used (26% due to the splitting of the video information into luminance and reflectance components), besides the utilization of 4 dedicated multipliers out of the 24 available. With this kind of device, the timing requirements quoted in Table 2 are easily achievable. The generated IP macrocell therefore compares well with specific implementations of Retinex-like operators such as [8], targeting the same FPGA device. Using such a component the logic function can be easily accomplished, but off-chip RAM has to be used to provide the required memorization support, unless very simple formats (e.g. 125 × 86 pixels as in [8]) are addressed. A synthesis of the same VHDL on a 0.18 µm CMOS standard-cell technology has been performed within the Synopsys environment, with reference to the real-time processing of the CIF format at 30 frames/s. The obtained ASIC has a complexity of 90 Kgates for the logic plus 347 KB of RAM (see Table 1). As far as the energy cost is concerned, Fig. 17 reports the power spent by the FPGA implementation of our architecture to process the different still image formats in 1 second. Independently of the supported image format, 140 mW are due to the quiescent power contribution. The reported data have been obtained using the XPower tool from Xilinx.
7. Conclusion

A design methodology suitable for the cost-effective and real-time implementation of non-linear image processing algorithms has been presented in the paper. The proposed optimization flow aims at simplifying the designer's duty to achieve a low complexity realization in FPGA or ASIC technology with a low accuracy loss for the implemented algorithm. First, starting from high-level functional descriptions, the design space is explored to achieve a linearized structural C model of the algorithm with finite arithmetic precision. For the design space exploration both subjective and objective criteria are adopted, while a statistical analysis of typical input images drives the optimization process. Starting from the structural C model, a parametric VLSI architecture (an IP macrocell modelled through VHDL) has been derived and synthesized in ASIC and FPGA technology. As an application case study, the design of a Retinex-like processing system for the correction of images acquired in bad lighting conditions has also been presented. The generated IP macrocell implements the original algorithm with a low accuracy loss and a complexity suitable for implementation on a low-cost FPGA device. Real-time processing is achieved for the main formats with limited power consumption, in the order of hundreds of mW.
8. Acknowledgements This work has been partially supported by the PRIN 2003 "Low-power electronic systems for advanced multimedia applications" and FIRB "Reconfigurable platforms for wideband wireless communications" projects by the Italian Ministry for Instruction, University and Research.
9. References

[1] S.K. Mitra, G.L. Sicuranza, "Nonlinear image processing", Academic Press, 2001
[2] E. Land, J. McCann, "Lightness and retinex theory", Journal of the Optical Society of America, vol. 61, pp. 1-11, 1971
[3] G. Orsini, G. Ramponi, P. Carrai, R. Di Federico, "A modified retinex for image contrast enhancement and dynamic control", Proc. IEEE ICIP03, Barcelona, Sept. 2003
[4] S. Marsi, G. Ramponi, S. Carrato, "Image contrast enhancement using a recursive rational filter", Proc. IEEE IST04, Stresa, Italy, May 2004
[5] M. Ogata, T. Tsuchiya, T. Kubozono, K. Ueda, "Dynamic range compression based on illumination compensation", IEEE Trans. Cons. Electr., vol. 47, n. 3, pp. 548-558, 2001
[6] S. Carrato, "A pseudo-retinex approach for the visualization of high dynamic range images", Proc. 5th COST 276 Workshop, Prague, Oct. 2003, pp. 15-20
[7] S. Marsi, S. Carrato, G. Ramponi, B. Crespi, "Video dynamic range compression for logarithmic CMOS imagers", Proc. 6th COST 276 Workshop, Thessaloniki, Greece, May 2004
[8] A. Ukovich, S. Marsi, S. Carrato, "Feasibility study of the real-time implementation of an algorithm for high dynamics video data on a low-cost FPGA", Proc. 7th COST 276 Workshop, Ankara, Turkey, Nov. 2004
[9] S. Marsi, S. Carrato, G. Ramponi, B. Crespi, "A nonlinear pseudo-retinex for dynamic range compression in CMOS imaging systems", Proc. IEEE NSIP03, Grado, Italy, June 2003
[10] S. Saponara, L. Fanucci, P. Terreni, "Design of a low-power VLSI macrocell for nonlinear adaptive video noise reduction", EURASIP Journal on Applied Signal Processing, vol. 2004, n. 12, pp. 1921-1930, 2004
[11] D. Lee, W. Luk, J. Villasenor, P. Cheung, "Hierarchical segmentation schemes for function evaluation", Proc. IEEE Conf. on Field Programmable Technology, Dec. 2003, pp. 92-99
[12] S. Catunda, O. Saavedra, "Constraints definition and evaluation of piecewise polynomial approximation functions for embedded systems", Proc. IEEE Instrumentation and Measurement Technology Conference, May 2002, pp. 1103-1108